THE OXFORD HANDBOOK OF

MUSIC AND THE BRAIN


Edited by
MICHAEL H. THAUT
and
DONALD A. HODGES
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s
objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a
registered trade mark of Oxford University Press in the UK and in certain other countries

© Oxford University Press 2019

The moral rights of the authors have been asserted

First Edition published in 2019


Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, without the prior permission in writing of Oxford
University Press, or as expressly permitted by law, by licence or under terms agreed with the
appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of
the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any
acquirer

Published in the United States of America by Oxford University Press


198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data


Data available

Library of Congress Control Number: 2019943710

ISBN 978–0–19–880412–3
ebook ISBN 978–0–19–252613–7

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford
disclaims any responsibility for the materials contained in any third party website referenced in this
work.
TABLE OF CONTENTS

List of Contributors

SECTION I INTRODUCTION

1. The Neuroscientific Study of Music: A Burgeoning Discipline
Donald A. Hodges and Michael H. Thaut

SECTION II MUSIC, THE BRAIN, AND CULTURAL CONTEXTS

2. Music Through the Lens of Cultural Neuroscience
Donald A. Hodges

3. Cultural Distance: A Computational Approach to Exploring Cultural Influences on Music Cognition
Steven J. Morrison, Steven M. Demorest, and Marcus T. Pearce

4. When Extravagance Impresses: Recasting Esthetics in Evolutionary Terms
Bjorn Merker

SECTION III MUSIC PROCESSING IN THE HUMAN BRAIN

5. Cerebral Organization of Music Processing
Thenille Braun Janzen and Michael H. Thaut

6. Network Neuroscience: An Introduction to Graph Theory Network-Based Techniques for Music and Brain Imaging Research
Robin W. Wilkins

7. Acoustic Structure and Musical Function: Musical Notes Informing Auditory Research
Michael Schutz

8. Neural Basis of Rhythm Perception
Christina M. Vanden Bosch der Nederlanden, J. Eric T. Taylor, and Jessica A. Grahn

9. Neural Basis of Music Perception: Melody, Harmony, and Timbre
Stefan Koelsch

10. Multisensory Processing in Music
Frank Russo

SECTION IV NEURAL RESPONSES TO MUSIC: COGNITION, AFFECT, LANGUAGE

11. Music and Memory
Lutz Jäncke

12. Music and Attention, Executive Function, and Creativity
Psyche Loui and Rachel E. Guetta

13. Neural Correlates of Music and Emotion
Patrik N. Juslin and Laura S. Sakka

14. Neurochemical Responses to Music
Yuko Koshimori

15. The Neuroaesthetics of Music: A Research Agenda Coming of Age
Elvira Brattico

16. Music and Language
Daniele Schön and Benjamin Morillon

SECTION V MUSICIANSHIP AND BRAIN FUNCTION

17. Musical Expertise and Brain Structure: The Causes and Consequences of Training
Virginia B. Penhune

18. Genomics Approaches for Studying Musical Aptitude and Related Traits
Irma Järvelä

19. Brain Research in Music Performance
Eckart Altenmüller, Shinichi Furuya, Daniel S. Scholz, and Christos I. Ioannou

20. Brain Research in Music Improvisation
Michael G. Erkkinen and Aaron L. Berkowitz

21. Neural Mechanisms of Musical Imagery
Timothy L. Hubbard

22. Neuroplasticity in Music Learning
Vesa Putkinen and Mari Tervaniemi

SECTION VI DEVELOPMENTAL ISSUES IN MUSIC AND THE BRAIN

23. The Role of Musical Development in Early Language Acquisition
Anthony Brandt, Molly Gebrian, and L. Robert Slevc

24. Rhythm, Meter, and Timing: The Heartbeat of Musical Development
Laurel J. Trainor and Susan Marsh-Rollo

25. Music and the Aging Brain
Laura Ferreri, Aline Moussard, Emmanuel Bigand, and Barbara Tillmann

26. Music Training and Cognitive Abilities: Associations, Causes, and Consequences
Swathi Swaminathan and E. Glenn Schellenberg

27. The Neuroscience of Children on the Autism Spectrum with Exceptional Musical Abilities
Adam Ockelford

SECTION VII MUSIC, THE BRAIN, AND HEALTH

28. Neurologic Music Therapy in Sensorimotor Rehabilitation
Corene Thaut and Klaus Martin Stephan

29. Neurologic Music Therapy for Speech and Language Rehabilitation
Yune S. Lee, Corene Thaut, and Charlene Santoni

30. Neurologic Music Therapy Targeting Cognitive and Affective Functions
Shantala Hegde

31. Musical Disorders
Isabelle Royal, Sébastien Paquette, and Pauline Tranchant

32. When Blue Turns to Gray: The Enigma of Musician’s Dystonia
David Peterson and Eckart Altenmüller

SECTION VIII THE FUTURE

33. New Horizons for Brain Research in Music
Michael H. Thaut and Donald A. Hodges

Index

LIST OF CONTRIBUTORS

Eckart Altenmüller, Institute of Music Physiology and Musicians’ Medicine (IMMM), University
of Music, Drama and Media, Germany
Aaron L. Berkowitz, Department of Neurology, Brigham and Women’s Hospital, Harvard Medical
School, USA
Emmanuel Bigand, CNRS, UMR5022, Laboratoire d’Etude de l’Apprentissage et du
Développement, Université de Bourgogne, France and Institut Universitaire de France, France
Anthony Brandt, The Shepherd School of Music, USA
Elvira Brattico, Center for Music in the Brain (MIB), Department of Clinical Medicine, Aarhus
University, Denmark and The Royal Academy of Music, Aarhus/Aalborg, Denmark
Thenille Braun Janzen, Music and Health Science Research Collaboratory (MaHRC), University of
Toronto, Canada
Steven M. Demorest, Northwestern University, USA
Michael G. Erkkinen, Department of Neurology, Brigham and Women’s Hospital, USA
Laura Ferreri, Cognition and Brain Plasticity Group, Bellvitge Biomedical Research Institute,
Hospitalet de Llobregat, Barcelona and Department of Cognition, Development and Educational
Psychology, University of Barcelona, Spain. Laboratoire d’Etude des Mécanismes Cognitifs,
Université Lumière Lyon 2, 69676 Lyon, France
Shinichi Furuya, Sony Computer Science Laboratories Inc., Japan
Molly Gebrian, University of Wisconsin-Eau Claire, Department of Music and Theatre Arts, USA
Jessica A. Grahn, Brain and Mind Institute, Western University, Canada
Rachel E. Guetta, The National Center for PTSD, VA Boston Healthcare System, USA
Shantala Hegde, Clinical Neuropsychology and Cognitive Neuroscience Center and Music
Cognition Laboratory, Department of Clinical Psychology, National Institute of Mental Health and
Neurosciences, Bengaluru, India
Donald A. Hodges, University of North Carolina at Greensboro, USA
Timothy L. Hubbard, Arizona State University, USA and Grand Canyon University, USA
Christos I. Ioannou, Institute of Music Physiology and Musicians’ Medicine (IMMM), University
of Music, Drama and Media, Germany
Lutz Jäncke, Division of Neuropsychology, Institute of Psychology, University of Zurich,
Switzerland
Irma Järvelä, Department of Medical Genetics, University of Helsinki, Finland
Patrik N. Juslin, Department of Psychology, Uppsala University, Sweden
Stefan Koelsch, Department for Biological and Medical Psychology, University of Bergen, Norway
Yuko Koshimori, Music and Health Research Collaboratory (MaHRC), University of Toronto,
Canada
Yune S. Lee, Department of Speech and Hearing Science, The Ohio State University, USA
Psyche Loui, Northeastern University, USA
Susan Marsh-Rollo, Auditory Development Lab, McMaster University, Canada
Bjorn Merker, Independent Scholar, Kristianstad, Sweden
Benjamin Morillon, Institut de Neurosciences des Systèmes, Aix-Marseille Université & INSERM,
Marseille, France
Steven J. Morrison, University of Washington, USA
Aline Moussard, Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal
(CRIUGM), Canada
Adam Ockelford, School of Education, University of Roehampton, London, UK
Sébastien Paquette, International Laboratory for Brain, Music and Sound Research (BRAMS),
Université de Montréal, Québec, Canada
Marcus T. Pearce, Queen Mary University of London, UK and Aarhus University, Denmark
Virginia B. Penhune, Department of Psychology, Concordia University, Canada
David Peterson, Institute for Neural Computation, University of California San Diego, USA
Vesa Putkinen, Turku PET Centre, University of Turku, Turku, Finland
Isabelle Royal, Département de psychologie, Université de Montréal, Québec, Canada
Frank Russo, Ryerson University, Canada
Laura S. Sakka, Department of Psychology, Uppsala University, Sweden
Charlene Santoni, Faculty of Music, University of Toronto, Canada
E. Glenn Schellenberg, Department of Psychology, University of Toronto Mississauga, Canada
Daniel S. Scholz, Institute of Music Physiology and Musicians’ Medicine (IMMM), University of
Music, Drama and Media, Germany
Daniele Schön, Institut de Neurosciences des Systèmes, Aix-Marseille Université & INSERM,
France
Michael Schutz, Institute for Music and the Mind, McMaster University, Canada
L. Robert Slevc, Department of Psychology, University of Maryland, USA
Klaus Martin Stephan, SRH Gesundheitszentrum Bad Wimpfen, Germany
Swathi Swaminathan, Rotman Research Institute, Baycrest Health Sciences, Canada
J. Eric T. Taylor, Brain and Mind Institute, Western University, Canada
Mari Tervaniemi, Cognitive Brain Research Unit, Department of Psychology and Logopedics,
Faculty of Medicine, University of Helsinki, Helsinki, Finland and Cicero Learning, Faculty of
Educational Sciences, University of Helsinki, Helsinki, Finland
Corene Thaut, Faculty of Music, University of Toronto, Canada
Michael H. Thaut, Music and Health Science Research Collaboratory (MaHRC), University of
Toronto, Canada
Barbara Tillmann, CNRS, Lyon Neuroscience Research Center, Auditory Cognition and
Psychoacoustics team, France and University of Lyon, France
Laurel J. Trainor, Department of Psychology, Neuroscience & Behavior, McMaster University,
Canada
Pauline Tranchant, Département de psychologie, Université de Montréal, Canada
Christina M. Vanden Bosch der Nederlanden, Brain and Mind Institute, Western University,
Canada
Robin W. Wilkins, University of North Carolina at Greensboro, USA
SECTION I

INTRODUCTION

CHAPTER 1

THE NEUROSCIENTIFIC STUDY OF MUSIC: A BURGEONING DISCIPLINE

DONALD A. HODGES AND MICHAEL H. THAUT

This book is the result of a considerable amount of effort by fifty-four
authors from thirteen countries. Beyond that, it represents the work of
hundreds of researchers over the past fifty years or so. The neuroscientific
study of music, or neuromusical research as it may be called, has grown and
expanded significantly over several decades. The purpose of this chapter is
twofold. The first portion provides a brief historical perspective on music
and neuroscience. The second presents an overview of the eight sections
and thirty-three chapters of this book.

VIGNETTES FROM BYGONE DAYS

Space limitations do not permit a detailed historical overview of
neuromusical research. Rather, the intent is to provide glimpses of early,
pioneering efforts. In 1977, R. A. Henson included historical notes on
neuromusical research in the ground-breaking book on music and the brain
he edited along with Macdonald Critchley (Critchley & Henson, 1977).
John Brust (2003) also provided a historical perspective. More recently,
Eckart Altenmüller, Stanley Finger, and François Boller edited a two-
volume set on music, neurology, and neuroscience (2015a, 2015b) that
provides far greater depth and detail. The first volume focuses on historical
connections and perspectives and the second on evolution, the musical
brain, and medical conditions and therapies. From these and other sources,
here are a few glimpses into the growing field of music–brain research.

• Franz Joseph Gall (1758–1828), the founder of phrenology, identified
music as one of the twenty-seven faculties of the mind (Elling,
Finger, & Whitaker, 2015); in Fig. 1, you can see the music faculty,
listed as Tune, just above the eye. Among many others who pursued
this notion, Madam Luise Cappiani (1901) gave an address at the
American Institute of Phrenology in which she discussed phrenology,
physiology, and psychology in connection with music and singing.
• In the 1860s and 1870s, British neurologist John Hughlings Jackson
(1835–1911) made cogent observations about children who could not
speak but who could sing (Lorch & Greenblatt, 2015). Speaking of
one speechless child, Jackson said, “It is worthy of remark that when
he sings he can utter certain words … but he can only do so while
singing” (Jackson, 1871, p. 430). By 1888, German neurologist
August Knoblauch (1863–1919) had coined the term “amusia”
(Graziano & Johnson, 2015) and created a model with five music
centers: an auditory center for the perception of musical tones, a
motor center for musical production, an idea center for the analysis
and comprehension of music, a visual system for reading musical
notation, and a motor system for writing musical notation (Johnson &
Graziano, 2003). Damage to any of these five centers could lead to
nine disorders, grouped into perception or production impairments.
Richard Wallaschek (1860–1917), John Edgren (1849–1929), and
others also investigated the loss of musical abilities in relation to
brain function (Henson, 1977).
• The first electroencephalographic (EEG) recording in humans was made by
Hans Berger in 1924 (Haas, 2003). Less than twenty-five years later,
researchers were studying musicogenic epilepsy by means of EEG
(Shaw & Hill, 1947). By the mid-1970s, investigators were utilizing
event-related potentials (ERPs) in relation to music (Schwent,
Snyder, & Hillyard, 1976). They found N100 responses (negative
waves peaking between 80 and 120 ms after the onset of a stimulus)
reflecting pre-attentive perception of pitch changes; a toy sketch of
this averaging approach appears after Fig. 1 below.
• In 1981, Roland, Skinhøj, and Lassen asked participants to make
same-different judgments on tone-rhythm patterns taken from the
Seashore Tests of Musical Talent while undergoing positron emission
tomography (PET) scans. They found widespread activations,
including differences between left and right hemispheric processing.
• Roland Beisteiner reported on three experiments conducted in Vienna
in 1995 in which he used functional magnetic resonance imaging
(fMRI), along with direct current EEG (DC-EEG) and
magnetoencephalography (MEG), to demonstrate the viability of
these methods in the study of music. Finger and hand movements,
approximating those used in playing the piano, elicited strong
activations in primary and supplementary motor cortices. Since that
time, fMRI has become a predominant methodology in neuromusical
research.
• Recent years have seen the development of several additional
methodologies, including transcranial magnetic stimulation (TMS),
voxel based morphometry (VBM), tensor based morphometry
(TBM), diffusion tensor imaging (DTI), and genomics approaches.
Also, new data analysis techniques are being developed, such as
network science (described by Wilkins, this volume).
FIGURE 1. A phrenological map of the brain. Music is listed as “Tune” and appears just above the
eye.
Source: By William Walker Atkinson, 1862–1932 [No restrictions], via Wikimedia
Commons.
https://upload.wikimedia.org/wikipedia/commons/7/71/How_to_know_human_nature-_its_inner_states_and_outer_forms_%281919%29_%2814784651435%29.jpg
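
Because single-trial EEG is dominated by ongoing background activity, ERP components such as the N100 are extracted by averaging many stimulus-locked epochs. The following toy Python sketch illustrates the idea on simulated data; all amplitudes, latencies, and trial counts are invented for illustration and are not taken from the studies cited above.

    import numpy as np

    # Toy illustration: average stimulus-locked EEG epochs, then locate the
    # N100 as the most negative point in the 80-120 ms window. Sampling at
    # 1 kHz means one sample per millisecond.
    t = np.arange(-100, 400)                                 # time axis in ms
    rng = np.random.default_rng(0)

    # 50 simulated trials: noise plus a negative deflection near 100 ms.
    n100 = -4.0 * np.exp(-((t - 100) ** 2) / (2 * 15 ** 2))  # microvolts
    trials = n100 + rng.normal(0, 5, size=(50, t.size))

    erp = trials.mean(axis=0)                  # averaging suppresses the noise
    window = (t >= 80) & (t <= 120)
    latency = t[window][np.argmin(erp[window])]
    print(f"N100 peak: {latency} ms, {erp[window].min():.1f} microvolts")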

From these earliest explorations into music and the brain, neuromusical
research has exploded in recent decades, as indicated in Fig. 2. What began
as fledgling, pioneering efforts from the 1940s to the 1960s has burgeoned
into a relative flood of publications in the 2000s.
FIGURE 2. The number of published articles obtained from a simple “music and brain” search in
PubMed (https://www.ncbi.nlm.nih.gov/pubmed/).
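
Counts like those plotted in Fig. 2 can be approximated programmatically through NCBI’s public E-utilities interface to PubMed. The short Python sketch below is one way to do so; the search term and the per-decade year bins are assumptions for illustration, not the exact query behind the figure.

    import json
    import urllib.parse
    import urllib.request

    EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    def pubmed_count(term, year):
        """Number of PubMed records matching `term` published in `year`."""
        params = urllib.parse.urlencode({
            "db": "pubmed",
            "term": term,
            "datetype": "pdat",  # filter on publication date
            "mindate": str(year),
            "maxdate": str(year),
            "retmode": "json",
        })
        with urllib.request.urlopen(f"{EUTILS}?{params}") as resp:
            return int(json.load(resp)["esearchresult"]["count"])

    # One count per decade; NCBI asks clients to keep request rates modest.
    for year in range(1950, 2020, 10):
        print(year, pubmed_count("music and brain", year))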

Given their variety and ubiquity, human musical experiences are
complex and mysterious. Philosophers, ethnomusicologists, music theorists,
and many others have spilled countless barrels of ink trying to explicate the
phenomenon of music. Why do we respond to music so powerfully? What
does it mean? Why do we have it at all? Explaining how music “works” in
the human brain is no less daunting. Of necessity, neuroscientists frequently
take a reductionist approach (Bickle, 2003; Krakauer, Ghazanfar, Gomez-
Marin, MacIver, & Poeppel, 2017). Findings from work going on at one
level (e.g., networks) are not necessarily integrated into work at another
level (e.g., genomics). Furthermore, results are often parsed according to
methodology (e.g., fMRI and ERP). As stated, some of this is of necessity;
after all, notions derived from activations generated across 30 minutes of
music listening and monitored by fMRI are not immediately compatible
with results from an experimental design with musical stimuli of just a few
seconds as recorded by MEG.
To avoid a crazy-quilt, scattershot view of music, broad overviews
attempting to blend disparate findings have appeared from time to time in
the literature. Whether in articles (e.g., Peretz & Zatorre, 2005; Warren,
2008), chapters (e.g., Marin & Perry, 1999; Schlaug, 2003), or books (e.g.,
Critchley & Henson, 1977; Koelsch, 2012), these reviews are critically
important in moving us toward a more coherent, unified understanding of
music in the brain. There are certain advantages to the singular
perspective of one or two authors, or to a discussion focused within a
limited word count. The present volume, on the other hand, has strengths in the diversity
and expertise of fifty-four authors who have written approximately 350,000
words on music and neuroscience. In the next portion of this chapter, we
provide an overview of their thirty-three chapters.
CHAPTER OVERVIEWS

As this introductory chapter comprises the first section, these overviews
will concentrate on sections II through VIII.

II. Music, the Brain, and Cultural Contexts
2. Music through the lens of cultural neuroscience, Donald A.
Hodges.
3. Cultural distance: A computational approach to exploring cultural
influences on music cognition, Steven J. Morrison, Steven M.
Demorest, and Marcus T. Pearce.
4. When extravagance impresses: Recasting esthetics in evolutionary
terms, Bjorn Merker.

The three chapters in Section II aim to put the neuroscientific study of
music into a larger cultural context. First, Donald Hodges revisits a
long-standing notion that musical experiences have both biological and cultural
underpinnings. Biology and culture are so intertwined that there is no clear
way to separate the two, and no need to, either. Rather, the new field of
cultural neuroscience provides increased understanding of how biological
and cultural aspects constrain and enhance each other. Next, Steven
Morrison, Steven Demorest, and Marcus Pearce present a model of cultural
distance, a computational means of determining how closely the music
from disparate cultures relates. Unfamiliar music whose statistical patterns
of pitch and rhythm closely approximate one’s own may be easier to
process than music with widely divergent patterns. Such a model may be
useful in future neuroimaging studies of cross-cultural music processing. In
the final chapter in this section, Bjorn Merker presents a persuasive
argument that our human aesthetic responses to music arise from elements
at play in the development of large and complex birdsong repertoires.
Responses among birds may range from boredom to interest/curiosity. In
humans, a hedonic reversal leads to being impressed, being moved, or to
awe and sublimity at the extreme. Taken together, these three chapters
remind us that findings from the neuroscientific study of music must always
be placed into broader cultural contexts in order for a full and complete
understanding.
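
The computational flavor of the cultural distance model can be conveyed in miniature. The chapter itself develops the actual model; the toy Python sketch below only illustrates the general strategy of comparing statistical patterns of pitch across corpora, here by measuring the divergence between pitch-interval distributions. The two “corpora,” the smoothing, and the choice of Jensen–Shannon divergence are all illustrative assumptions, not the authors’ method.

    import math
    from collections import Counter

    def interval_distribution(melodies, smoothing=1.0):
        """Smoothed distribution of successive pitch intervals, in semitones."""
        counts = Counter()
        for melody in melodies:              # a melody is a list of MIDI pitches
            counts.update(b - a for a, b in zip(melody, melody[1:]))
        support = range(-12, 13)             # restrict to within one octave
        total = sum(counts[i] + smoothing for i in support)
        return {i: (counts[i] + smoothing) / total for i in support}

    def jensen_shannon(p, q):
        """Symmetric divergence between two distributions; 0 means identical."""
        m = {i: 0.5 * (p[i] + q[i]) for i in p}

        def kl(a, b):
            return sum(a[i] * math.log2(a[i] / b[i]) for i in a if a[i] > 0)

        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    # Two tiny invented "corpora" of melodies as MIDI pitch sequences.
    corpus_a = [[60, 62, 64, 65, 67, 69, 71, 72], [72, 71, 69, 67, 65, 64, 62, 60]]
    corpus_b = [[60, 63, 65, 66, 67, 70, 72], [72, 70, 67, 66, 65, 63, 60]]
    print(jensen_shannon(interval_distribution(corpus_a),
                         interval_distribution(corpus_b)))

On this toy measure, higher divergence would suggest that listeners enculturated to one corpus should find the other harder to process.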

III. Music Processing in the Human Brain
5. Cerebral organization of music processing, Thenille Braun Janzen
and Michael H. Thaut.
6. Network neuroscience: An introduction to graph theory network-
based techniques for music and brain imaging research, Robin W.
Wilkins.
7. Acoustic structure and musical function: Musical notes informing
auditory research, Michael Schutz.
8. Neural basis of rhythm perception, Christina M. Vanden Bosch der
Nederlanden, J. Eric T. Taylor, and Jessica A. Grahn.
9. Neural basis of music perception: Melody, harmony, and timbre,
Stefan Koelsch.
10. Multisensory processing in music, Frank Russo.

Authors in Section III explore what we know about how music is
processed in the human brain. Thenille Braun Janzen and Michael Thaut
present an organizational scheme based upon ascending auditory pathways,
auditory-frontal networks, auditory-motor networks, and auditory-limbic
networks. The most advanced research has moved beyond asking which parts of the
brain are involved at specific points in the processing stream and is
beginning to look increasingly at how these various brain regions interact in
real time. The complexity of music processing, involving aspects such as
preference, socio-cultural contexts, musical expertise, and so on, poses a
daunting challenge, but substantial progress is being made. One
advancement, according to Robin Wilkins, is network science, which
utilizes graph theory techniques and analysis as a means of understanding
structural and functional connectivity in the brain. Network science moves
us closer to learning how the brain communicates with itself in the dynamic
process of responding to music. A further advantage may be that it allows
for monitoring task performance during much longer music listening
conditions than brief excerpts. Michael Schutz continues the discussion in
the next chapter with a more fine-grained examination of how micro-timing
changes in musical stimuli are processed in the brain as music unfolds over
time. Constant, rapid fluctuations in overtone spectra require sophisticated
neural tracking mechanisms. Indeed, one of the deficiencies of early
synthesized music, and to some extent some auditory perception research, is
a lack of ecological validity in terms of temporally invariant musical
stimuli.
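
As a loose illustration of the graph-theoretic workflow Wilkins describes, the following Python sketch derives a toy functional-connectivity network from region-of-interest (ROI) time series and reports two simple measures. The random data and the binarization threshold are assumptions for demonstration; real pipelines involve extensive preprocessing and principled thresholding.

    import numpy as np

    rng = np.random.default_rng(1)
    n_rois, n_timepoints = 8, 200
    timeseries = rng.normal(size=(n_rois, n_timepoints))  # stand-in for fMRI ROI signals

    fc = np.corrcoef(timeseries)                 # functional connectivity matrix
    np.fill_diagonal(fc, 0)                      # ignore self-connections
    adjacency = (np.abs(fc) > 0.10).astype(int)  # binarize at a chosen threshold

    degree = adjacency.sum(axis=1)                        # edges per node
    density = adjacency.sum() / (n_rois * (n_rois - 1))   # fraction of possible edges
    print("node degrees:", degree)
    print("network density:", round(float(density), 2))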
In the next chapter, Christina Vanden Bosch der Nederlanden, J. Eric T.
Taylor, and Jessica Grahn provide an overview of the research on how the
brain processes and produces musical rhythms. Auditory-motor networks
are particularly important in beat finding and other rhythmic processes. Our
brain’s ability to perceive and produce rhythms has wide-ranging
implications for many aspects of human behavior. Stefan Koelsch expands
the discussion into an examination of the neural underpinnings of melodic,
harmonic, and timbral perception. Numerous and widespread brain regions
are involved in processing music. Because infants and individuals without
formal music training can process melody, harmony, and timbre
successfully, musicality is clearly a natural ability of the human brain.
Although much of the extant research focuses on particular sensory
modalities, ultimately a more ecologically valid understanding arises from
the integration of multiple sensory inputs and this topic is taken up by Frank
Russo. An integrated, multisensory view of music processing involves
auditory, visual, somatosensory, vestibular, and motor systems. This
necessarily involves extensive, widely-distributed but locally-specialized
neural networks (Sergent, Zuck, Terriah, & MacDonald, 1992).
Overall, the six chapters of Section III remind us that music is a whole
brain experience, with numerous intertwining and interacting neural
networks. Enormous progress has been made in ferreting out all the
disparate components and their entangled interrelationships, especially with
the advent of rapidly evolving technologies but there are still puzzles left to
solve.
IV. Neural Responses to Music
11. Music and memory, Lutz Jäncke.
12. Music and attention, executive function, and creativity, Psyche Loui
and Rachel Guetta.
13. Neural correlates of music and emotion, Patrik Juslin and Laura
Sakka.
14. Neurochemical responses to music, Yuko Koshimori.
15. The neuroaesthetics of music: A research agenda coming of age,
Elvira Brattico.
16. Music and language, Daniele Schön and Benjamin Morillon.

The six chapters comprising Section IV delve into the ways the brain
responds to music. Once again, we see multiple overlapping and mutually
reinforcing domains. All meaningful musical experiences involve memory
in one way or another. Lutz Jäncke explores discrete, music-only, and
shared memory systems that involve auditory processing, episodic,
autobiographic, semantic, and implicit memories, as well as motor
programs, emotion, and motivation. Each of these components has neural
correlates designed for encoding, storing, and retrieving musical memories.
Such a diffuse and distributed network may help explain commonly
reported musical influences on non-musical memory formation. Psyche
Loui and Rachel Guetta tackle relationships between music and attention,
executive function, and creativity. The topic of attention in music can be
informed by general theories of attention, as well as those specifically
applied to musical stimuli. Passive music listening experiences are less
likely to affect executive functions, but research is ongoing concerning
whether and to what extent active musicing affects executive functions in
terms of near and far transfers and in terms of relevant neural mechanisms.
Attention and executive functions, along with their attendant brain
networks, are both connected to musical creativity.
Patrik Juslin and Laura Sakka provide a thorough and detailed review of
neuroimaging studies related to music and emotion. Although certain brain
regions have been more or less consistently implicated in the processing of
musical emotions, much is still unclear. For example, it is not always
certain in some experimental designs whether participants are “merely”
perceiving or actually experiencing musical emotions. Juslin and Sakka
provide methodological recommendations for moving the field forward.
Neurochemical responses are the basis for musical emotions, and Yuko
Koshimori reviews recent work in this emerging field. Musical experiences
induce the release of neurotransmitters (e.g., dopamine, serotonin, and
acetylcholine), neuropeptides (e.g., beta-endorphin, oxytocin, and arginine
vasopressin), steroid hormones (e.g., cortisol), and peripheral immune
biomarkers. In addition to the main area of research concerning
neurochemical responses in music listening and music performance
experiences, another primary course of investigation involves the
intentional manipulation of neurochemicals via music in a variety of health
and wellness issues (e.g., Parkinson’s disease, chronic pain, and stress).
Elvira Brattico’s discussion of neuroaesthetics combines but also moves
beyond the previous chapters in this section; this is another emerging field
that demonstrates the maturing of neuromusical research. Building on
decades of previous work in music perception, cognition, and more recently
emotion, neuroaesthetics investigates matters such as brain areas involved
in liking, preference, and aesthetic judgments. While this undoubtedly
introduces more subjectivity into the discussion, it also moves us closer to a
core human experience that lies at the root of music’s importance.
Music and language are both ubiquitous aspects of the human
experience and questions about the nature of and relationships between the
two have been asked and the answers debated for centuries. Now
neuroscientists are posing new questions, such as “to what extent are music
and language processed in distinct, shared, or homologous networks?”
Daniele Schön and Benjamin Morillon give answers to this and related
questions based on current evidence. They also discuss the effects of
musical experiences on language acquisition and skills.
As was the case with Section III, Section IV demonstrates the
tremendous complexity of human musical experiences from a
neuroscientific standpoint. Steadily, patiently, over a period of time and
with new technologies and methodologies, a clearer picture is emerging.

V. Musicianship and Brain Function
17. Musical expertise and brain structure: The causes and
consequences of training, Virginia Penhune.
18. Genomics approaches for studying musical aptitude and related
traits, Irma Järvelä.
19. Brain research in music performance, Eckart Altenmüller, Shinichi
Furuya, Daniel Scholz, and Christos Ioannou.
20. Brain research in music improvisation, Michael Erkkinen and
Aaron Berkowitz.
21. Neural mechanisms of musical imagery, Timothy Hubbard.
22. Neuroplasticity in music learning, Vesa Putkinen and Mari
Tervaniemi.

Authors of the six chapters comprising Section V are all concerned with
unraveling knotty issues surrounding the ways musicianship and brain
function interact with each other. Virginia Penhune begins with the notion
that musical training affects numerous brain structures, including gray and
white matter, auditory cortex and association areas, motor regions, frontal
regions, and parietal cortex. Some variances between adult musicians and
non-musicians may be due to pre-existing differences, but sufficient
research exists to support the contention that long-term musical training
produces many of these changes. Penhune also discusses reasons why
music has such strong effects on brain plasticity.
Irma Järvelä takes us on a tour of genomics, specifically the role of
genetics in human musicality. Genes influencing inner ear development,
auditory pathways, and cognition are all linked to musical aptitude. In
addition, genomics research suggests that music and language have a
common evolutionary heritage and that genes play a role in the effects
music has on the body. Eckart Altenmüller, Shinichi Furuya, Daniel Scholz,
and Christos Ioannou examine the contributions that prolonged extensive
goal-directed practice, multisensory-motor integration, high arousal, and
emotional and social rewards make toward inducing brain plasticity. They
discuss motor planning and control, and finally musician’s dystonia, that is,
plasticity-induced loss of skills or what they call de-expertise.
Michael Erkkinen and Aaron Berkowitz review neuroimaging studies of
music improvisation. Using PET, fMRI, tDCS (transcranial direct current
stimulation), and EEG, researchers have implicated numerous brain regions
involved in the spontaneous creation of music. Overall, improvisation
activates a broad network of brain regions involving cognitive control and
monitoring, motor planning and execution, multimodal sensation,
motivation, emotional/limbic processing, and language regions. Timothy
Hubbard describes and discusses auditory and motor neural mechanisms
supporting musical imagery. Involuntary musical imagery includes
anticipatory musical imagery, musical hallucinations, schizophrenia,
earworms, and synesthesia. Embodied musical imagery is covered in such
examples as spatial and force metaphors, the role of mimicry, the distinction
between the inner ear and inner voice, the effects of mental practice on
performance, musical imagery and dance, and musical affect.
Vesa Putkinen and Mari Tervaniemi are concerned with neural plasticity
in music learning. Focusing primarily on studies employing ERPs derived
from EEG and MEG, they found evidence to support the contention that
musical training enhances domain-general auditory processing skills,
though far transfer to executive functions is less certain. They also contend
that training alone does not account for all the differences between
musicians and non-musicians, as self-selection is a confound in terms of
predisposing factors.
These six chapters push beyond the nature of passive music listening
situations into the realm of active musicing experiences. While we cannot
pretend that we fully understand what is transpiring in the brain of Daniel
Barenboim as he conducts a Mahler symphony, by fits and starts, patient
marching, and occasional leaping, we are moving forward.

VI. Developmental Issues in Music and the Brain
23. The role of musical development in early language acquisition,
Anthony Brandt, Molly Gebrian, and Robert Slevc.
24. Rhythm, meter, and timing: The heartbeat of musical development,
Laurel J. Trainor and Susan Marsh-Rollo.
25. Music and the aging brain, Laura Ferreri, Aline Moussard,
Emmanuel Bigand, and Barbara Tillmann.
26. Music training and cognitive abilities: Associations, causes, and
consequences, Swathi Swaminathan and E. Glenn Schellenberg.
27. The neuroscience of children on the autism spectrum with
exceptional musical abilities, Adam Ockelford.

Throughout the lifespan, musical experiences have consequences for
brain development. Anthony Brandt, Molly Gebrian, and Robert Slevc
examine the role of early musical experiences on language acquisition.
Evidence suggests that speech is initially processed by infants as a type of
music. Initially entangled in the child’s brain, speech and music gradually
develop into independent modalities. Though many of the differences
between speech and music starkly divide them, timbral aspects of phonemes
and prosodic elements of melodic and rhythmic inflection provide a
common bridge.
Laurel Trainor and Susan Marsh-Rollo focus on the special role that
rhythmic elements play in musical development. Initially, infants use timing
cues to perceive and respond to emotional information. As they become
enculturated to their surroundings, they develop oscillatory brain rhythms
that link auditory and motor aspects of entrainment. Eventually, perceptual
awareness of the synchronicity of movements among people enables them
to make reliable judgments of trust and friendship. Laura Ferreri, Aline
Moussard, Emmanuel Bigand, and Barbara Tillmann report on the role
music can play in improving cognition and promoting well-being and social
connection at the other end of the lifespan. Divided into two major sections,
the first concentrates on music’s contributions to healthy aging, including
underlying brain regions. The second examines the role of music-based
therapeutic approaches dealing with age-related issues such as memory,
language, motor functions, and emotions and well-being.
Swathi Swaminathan and E. Glenn Schellenberg review relationships
between music training and cognitive abilities. Positive associations are
reported for measures of general cognitive, visuospatial, and language
abilities, as well as academic achievement and healthy aging. However,
with the exception of some linkages between musical training and specific
language skills, causal evidence is lacking, inconsistent, or weak. In the
final chapter in this section, Adam Ockelford presents a neuroscientific
model accounting for exceptional musicianship among some children on the
autism spectrum. In these special cases, children process language and
everyday sounds as if they were music. For these individuals, then, music
takes precedence over language and other everyday sounds.
From birth to death, and in all cognitive conditions, music plays an
important role in the human experience. We have known this anecdotally
and now we are beginning to understand requisite brain processes.

VII. Music, the Brain, and Health
28. Neurologic Music Therapy in sensorimotor rehabilitation, Corene
Thaut and Klaus Martin Stephan.
29. Neurologic Music Therapy for speech and language rehabilitation,
Yune Lee, Corene Thaut, and Charlene Santoni.
30. Neurologic Music Therapy targeting cognitive and affective
functions, Shantala Hegde.
31. Musical disorders, Isabelle Royal, Sébastien Paquette, and Pauline
Tranchant.
32. When blue turns to gray: The enigma of musician’s dystonia, David
Peterson and Eckart Altenmüller.

The great preponderance of neuromusical research is basic research,
an attempt to understand how music is processed in the brain. To date, the
strongest forays into applied research come in the area of health. The five
chapters in Section VII demonstrate the tremendous strides that have been
taken in utilizing the power of music for more healthy living.
Music is important in the development, rehabilitation, and maintenance
of sensorimotor function, especially as it relates to neurologic disorders.
Corene Thaut and Klaus Martin Stephan discuss the role of neurologic
music therapy (NMT) in the facilitation of motor function in such
populations as those with Parkinson’s disease, stroke, traumatic brain injury
(TBI), multiple sclerosis, cerebral palsy, autism, and the healthy elderly.
They cover acquired movement disorders, degenerative diseases, and
developmental disorders.
Yune Lee, Corene Thaut, and Charlene Santoni explore the efficacy of
using NMT interventions for the treatment of dysarthria, apraxia of speech,
aphasia, fluency, sensory deficits, voice disorders, and dyslexia. Eight
standardized clinical techniques in the speech and language domain include
Melodic Intonation Therapy (MIT), Musical Speech Stimulation
(MUSTIM), Rhythmic Speech Cueing (RSC), Vocal Intonation Therapy
(VIT), Oral Motor and Respiratory Exercises (OMREX), Therapeutic
Singing (TS), Developmental Speech and Language Training Through
Music (DSLM), and Symbolic Communication Training Through Music
(SYCOM). Built-in temporal processes for both rhythm and speech are
mediated by corticostriatal circuitries comprising the basal ganglia, the
supplementary motor area (SMA), the premotor cortex, and the frontal
operculum.
Shantala Hegde discusses the use of NMT to improve cognitive and
affective functioning in such neurological conditions as TBI,
stroke/cerebrovascular accident, dementia, other degenerative conditions
like Parkinson’s disease, and in major psychiatric conditions such as
schizophrenia, bipolar affective disorders, as well as common psychiatric
conditions such as anxiety and depression. Music can play an important role
in cognitive rehabilitation as it engages auditory, motor, language,
cognitive, and emotional functions across cortical and subcortical brain
regions. Although early results are promising, considerably more research
using standardized NMT techniques is needed.
Isabelle Royal, Sébastien Paquette, and Pauline Tranchant focus their
attention on musical deficiencies due to congenital or acquired amusia and
musical anhedonia. Some individuals are born with an inability to process
pitch or rhythm; others acquire such deficits as a result of brain trauma or
stroke. Musical anhedonia may affect approximately 2 percent of the
population; even though these individuals are able to interpret music’s
emotional content, they derive no pleasure from it. Collectively, the study
of amusia provides a unique opportunity to study neural structures
underlying music processing.
David Peterson and Eckart Altenmüller investigate musician’s dystonia
(MD), the enigmatic disorder that selectively interferes with the voluntary
motor control necessary for musical performance. MD includes such
pathological features as abnormalities in inhibition, sensorimotor
integration, and plasticity at many levels of the central nervous system.
Increasing understanding of the underlying neurological processes may lead
to improved management and possibly prevention of MD.
Centuries of music therapy in a broad sense of the term (e.g., as in the
role of the shaman or medicine man in many societies worldwide) and
decades of “modern” music therapy have clearly demonstrated the healing
powers of music. We are just now, however, at the cusp of explaining these
effects from a neuroscientific standpoint. Ensuing years will undoubtedly
see tremendous progress in these applications.

VIII. The Future
33. New horizons for brain research in music, Michael Thaut and
Donald Hodges.

In the final chapter, our aim is to identify noteworthy developments in
music–brain research and to highlight a few key areas for future research. As
demonstrated throughout this book, significant strides are being made in a
wide variety of important areas, including network modeling and
connectivity analyses, genomics and neurotransmitter imaging, and clinical
neuroscience research. Somewhat lagging is neuroimaging work in
musician’s health, music education, and collaborative efforts with music
philosophers.
A few final comments concerning the content of this book: anyone
reading multiple chapters is likely to discover that there are some overlaps
in coverage. That is, subtopics may be discussed in more than one chapter.
We chose not to delete most of these places during the editorial process for
two main reasons: (1) Subtopics frequently need to be reintroduced in
various chapters to provide context for the main topic at hand. (2) In using
slightly different wording or citing different sources, various authors
provide a richer understanding. Contrarily, there are still a few topics that
are not covered in this volume. To do so would require expanded coverage
beyond what is possible at this point. Furthermore, it should be noted that
research in certain areas is moving so quickly that new findings are
changing our understanding on a very short timescale. Rapid release of
individual chapters online counteracts this problem to a certain extent, and
we are extremely pleased with the contributions these authors
have made to the literature on music and the brain.
REFERENCES
Altenmüller, E., Finger, S., & Boller, F. (Eds.). (2015a). Music, neurology, and neuroscience:
Historical connections and perspectives. Progress in Brain Research Vol. 216. Amsterdam:
Elsevier.
Altenmüller, E., Finger, S., & Boller, F. (Eds.). (2015b). Music, neurology, and neuroscience:
Evolution, the musical brain, and medical conditions and therapies. Progress in Brain Research,
Vol. 217. Amsterdam: Elsevier.
Beisteiner, R. (1995). DC-EEG, MEG and FMRI as investigational tools for music processing. In R.
Steinberg (Ed.), Music and the mind machine: The psychophysiology and psychopathology of the
sense of music (pp. 243–249). Berlin: Springer Verlag.
Bickle, J. (2003). Philosophy and neuroscience: A ruthlessly reductive account. Dordrecht: Kluwer
Academic Publishers.
Brust, J. (2003). Music and the neurologist: A historical perspective. In I. Peretz & R. Zatorre (Eds.),
The cognitive neuroscience of music (pp. 181–191). Oxford: Oxford University Press.
Cappiani, L. (1901). Phrenology, physiology, and psychology in connection with music and singing.
The Phrenological Journal and Science of Health (1870–1911) 3(2), 58–60.
Critchley, M., & Henson, R. (Eds.). (1977). Music and the brain: Studies in the neurology of music.
Springfield, IL: Charles C. Thomas.
Elling, P., Finger, S., & Whitaker, H. (2015). Franz Joseph Gall and music: The faculty and the
bump. In E. Altenmüller, S. Finger, & F. Boller (Eds.), Music, neurology, and neuroscience:
Historical connections and perspectives. Progress in Brain Research, Vol. 216 (pp. 3–32).
Amsterdam: Elsevier.
Graziano, A., & Johnson, J. (2015). Music, neurology, and psychology in the nineteenth century. In
E. Altenmüller, S. Finger, & F. Boller (Eds.), Music, neurology, and neuroscience: Historical
connections and perspectives. Progress in Brain Research, Vol. 216 (pp. 33–49). Amsterdam:
Elsevier.
Haas, L. (2003). Hans Berger (1873–1941), Richard Caton (1842–1926), and
electroencephalography. Journal of Neurology, Neurosurgery, and Psychiatry 74(1), 9.
Henson, R. (1977). Neurological aspects of musical experience. In M. Critchley & R. Henson (Eds.),
Music and the brain (pp. 3–21). Springfield, IL: Charles C. Thomas.
Jackson, J. (1871). National hospital for the paralysed and epileptic: Singing by speechless (aphasic)
children. The Lancet 2, 430–431.
Johnson, J., & Graziano, A. (2003). August Knoblauch and amusia: A nineteenth-century cognitive
model of music. Brain and Cognition 51(1), 102–114.
Koelsch, S. (2012). Brain and music. Oxford: Wiley-Blackwell.
Krakauer, J., Ghazanfar, A., Gomez-Marin, A., MacIver, M., & Poeppel, D. (2017). Neuroscience
needs behavior: Correcting a reductionist bias. Neuron 93(3), 480–490.
Lorch, M., & Greenblatt, S. (2015). Singing by speechless (aphasic) children: Victorian medical
observations. In E. Altenmüller, S. Finger, & F. Boller (Eds.), Music, neurology, and neuroscience:
Historical connections and perspectives. Progress in Brain Research, Vol. 216 (pp. 53–72).
Amsterdam: Elsevier.
Marin, O., & Perry, D. (1999). Neurological aspects of music perception and performance. In D.
Deutsch (Ed.), The psychology of music (2nd ed., pp. 653–724). San Diego: Academic Press.
Peretz, I., & Zatorre, R. (2005). Brain organization for music processing. Annual Review of
Psychology 56, 89–114.
Roland, P. E., Skinhøj, E., & Lassen, N. A. (1981). Focal activations of human cerebral cortex during
auditory discrimination. Journal of Neurophysiology 45(6), 1139–1151.
Schlaug, G. (2003). The brain of musicians. In I. Peretz & R. Zatorre (Eds.), The cognitive
neuroscience of music (pp. 366–381). Oxford: Oxford University Press.
Schwent, V. L., Snyder, E., & Hillyard, S. A. (1976). Auditory evoked potentials during multichannel
selective listening: Role of pitch and localization cues. Journal of Experimental Psychology:
Human Perception and Performance 2(3), 313–325.
Sergent, J., Zuck, E., Terriah, S. & MacDonald, B. (1992). Distributed neural network underlying
musical sight-reading and keyboard performance. Science 257(3), 106–109.
Shaw, D., & Hill, D. (1947). A case of musicogenic epilepsy. Journal of Neurology, Neurosurgery,
and Psychiatry 10(3), 107.
Warren, J. (2008). How does the brain process music? Clinical Medicine 8(1), 32–36.
SECTION II

MUSIC, THE BRAIN, AND CULTURAL CONTEXTS
CHAPTER 2

MUSIC THROUGH THE LENS OF CULTURAL NEUROSCIENCE

DONALD A. HODGES

INTRODUCING CULTURAL NEUROSCIENCE

Scholars have long recognized the co-equal roles biology and culture play
in the phenomenon we call music (e.g., Blacking, 1973). Fifty years ago,
Gaston (1968), quoting Dobzhansky, expressed the idea clearly and
succinctly. In asking how we developed characteristics of humanness, he
wrote:
To begin to answer this question, it is not necessary to separate the biology from the culture
of man [italics in the original]. They go hand in hand. “The fact which must be stressed,
because it has frequently been missed or misrepresented, is that the biological and cultural
evolutions are parts of the same process” (Dobzhansky, 1962, p. 22). This means that the
part of man’s culture we call music has a biological as well as a cultural basis. (p. 11)

To be certain, the pendulum of our understanding has sometimes swung
toward one and away from the other, as nature is favored over nurture
and vice versa. However, for the moment, let us take it as axiomatic that
both are necessary. Even so, “the problem of reconciling ‘cultural’ and
‘biological’ approaches to music, and indeed to the nature of mind itself,
remains” (Cross & Morley, 2009, p. 61). The purpose of this chapter, then,
is not to debate that biology and culture are both necessary components of
human musical experiences, nor to determine the extent of the contribution
from each, but rather to examine some of the recent evidence that supports
this contention. One reason to take another look at an old, and perhaps well-
established concept, is to add newer understandings from the field of
cultural neuroscience.
Cultural neuroscience is an emerging field of study that has arisen as a
means of investigating relationships between culture and brain (Chiao, Li,
Seligman, & Turner, 2016; Han et al., 2013). Chiao (2009) sees three
components of the cultural neuroscience toolbox:

• Cultural psychologists investigate what cultural values, beliefs, and
practices influence human behavior and how they do so.
• Neuroscientists use a variety of approaches to determine the role of
the brain.
• Neurogeneticists investigate genetic regulation of brain mechanisms
that support cognitive, emotional, and social behaviors.

Using these three components, Han and Ma (2015) proposed a
culture–behavior–brain (CBB) loop model of human development (Fig. 1).
Culturally contextualized behaviors (CC-Behavior) occur within a specific
cultural context but may not occur outside that culture. Culturally voluntary
behaviors (CV-Behavior) are guided by specific cultural mores that become
embedded in the brain. Genes moderate culture–brain interactions by
affecting brain anatomy and some behavioral and cognitive characteristics;
likewise, there are mutual gene–culture influences. Some of these genetic
influences take place over thousands of years and some occur within a
given lifespan.
FIGURE 1. Illustration of the CBB loop model of human development. Cultural environments
contextualize human behaviors. Learning novel cultural beliefs and the practice of different
behavioral scripts in turn modify the functional organization of the brain. The modified brain then
guides individual behavior to voluntarily fit into a cultural context and meanwhile to modify current
cultural environments. Direct interactions also occur between culture and brain without overt
behavior. Abbreviations: CBB, culture–behavior–brain; CC-Behavior, culturally contextualized
behavior; CV-Behavior, culturally voluntary behavior.
Reprinted from Trends in Cognitive Sciences 19(11), Shihui Han and Yina Ma, A
culture-behavior-brain loop model of human development, pp. 666–676, Figure 1,
doi.org/10.1016/j.tics.2015.08.010, Copyright © 2015 Elsevier Ltd. All rights reserved.

Because a full explication of cultural neuroscience would require an
extended discussion beyond this chapter, a more straightforward way to
approach cultural neuroscience is to examine the implications of the
following: “Cultural practices adapt to neural constraints, and the brain
adapts to cultural practice” (Ambady & Bharucha, 2009, p. 342). Let us
examine both of these in turn, specifically as they relate to music.

Cultural Practices Adapt to Neural Constraints


Although it is difficult to predict precise biological limits for human
performance, a reasonable assumption is that biological factors place
restrictions on human musicality. We can hear musical pitches only within a
delimited frequency range, typically 20 Hz–20,000 Hz at the extremes. We
can sing only so high; for example, Mozart stretched the limits when he
wrote an F above high C in the Queen of the Night aria from The Magic
Flute (“Der Hölle Rache kocht in meinem Herzen” from Die Zauberflöte).
Even so, musicians are capable of amazing feats. Smith (1953) reported
that one pianist performed the 6266 notes of Schumann’s Toccata in C
Major, Op. 7 in 4’20” at a rate of 24.1 notes per second. Toscanini was
credited with a phenomenal memory, reportedly having memorized 250
symphonic works and 100 operas (Marek, 1975). Although it is certainly
possible for someone to play these pieces faster or memorize more scores,
surely there must be some limits. Perceptually and cognitively, Wagner and
Chinese operas push many listeners to the extreme. Going beyond human
limits, however, Cage’s Organ2/ASLSP (As SLow aS Possible) is currently
being performed in a church in Halberstadt, Germany in what is projected
to take 639 years (Wakin, 2006). At this speed, it is possible for any person
to hear only a fraction of the entire performance.

The Brain Adapts to Cultural Practice


Just as the brain shapes what we do, what we do shapes the brain.
Neurologist Frank Wilson (1998) wrote a compelling account of how the
brain and the hand co-evolved. Over time, developmental changes allowed
us to use our hands for an increasingly wider variety of tasks, such as
grasping, throwing, pounding, manipulating tools, and so on, and these
newly-acquired skills, in turn, spurred further brain development. Of
course, it is not just the hand in isolation. In chipping stone tools, for
example, listening carefully to the sound of the stone being shaped is
critical to a successful result, as one extra strike may cause the rock to
break. Creating bone flutes (Conard, Malina, & Münzel, 2009) or
lithophones (Cross, Zubrow, & Cowan, 2002), rock percussion instruments
out of flint blades, would require similar interactions of hand, ear, and
brain. In the case of flutes, tinkering with where to place finger holes and
how to direct the air (i.e., whether as a notched, block, or transverse flute)
requires considerable ingenuity (Kunej & Turk, 2000). Wilson encapsulates
these ideas in speaking about the co-evolution of the brain and the musical
hand:
What we are left with when we seek to explain musical talent on a biological basis seems
best characterized as an assembly of neurologic and behavioral potentials that arise from
within and are uniquely defined by specific cultures. (1998, p. 224)

Another example of cultural practice influencing brain development can be
seen in the organization of the hearing mechanism. In tonotopic
organization, the frequency map established on the basilar membrane in the
inner ear is maintained throughout the auditory pathway all the way to the
auditory cortex. Pantev and colleagues (1998) demonstrated that for trained
musicians a pitch map overlays the frequency map, as responses were 25
percent larger to piano tones than to pure tones; this was not true for
controls who had never learned to play a musical instrument. Similarly,
violinists and trumpeters showed more robust responses to tones from their
instrument than to pure tones (Pantev, Roberts, Schultz, Engelien, & Ross,
2001). Since musical tones from Western instruments (i.e., piano, violin,
trumpet) are cultural artifacts, it is difficult to account for these results
unless the brain has adapted itself to environmental experiences.
In contrast to the two ends of a continuum (i.e., either nature or nurture),
human behavior, generally, and musical behavior, specifically, are a
combination of the two. In the following sections, we will briefly examine
genetic influences on musical behavior, neural plasticity, cultural influences
on innate infant responses to music, the search for music universals, and
cross-cultural music research.

GENETIC INFLUENCES ON MUSICAL BEHAVIOR

Genetic instructions provide another example of biological restrictions that
can be modified by environmental experiences. Although genes provide
instructions that influence nearly everything about us, including both
physical features (e.g., hair and eye color) and behavior, genetic instructions
are not inviolable; rather, daily living and life’s experiences influence gene
expression, including those associated with learning and memory (Rampon
et al., 2000). However, interpreting gene–environment interactions is not
without difficulty. What makes the situation so problematic is that some
environmental circumstances that might influence genetic expression are
themselves open to genetic influence. In reviewing the status of current
understanding, Manuck and McCaffery state that, “… it seems reasonable
to assume that most dimensions of measured experience will have both
environmental and genetic determinants …” (2014, p. 63), even if there is
no clear way of disentangling the two.
Ullén, Hambrick, and Mosing (2016) discussed interactions between
environment and genetic instructions in the development of expertise. In
contrast to a focus on deliberate practice as the sole determiner of expert
performance, they proposed a multifactorial gene–environment interaction
model (MGIM) of expert performance (Fig. 2). According to this model,
expertise results from an array of factors that work in tandem. High-level
expertise (e.g., musical performance) cannot simply be a matter of enough
hours of deliberate practice. Genetic and non-genetic factors, along with
their interactions, are necessary. For example, in a large study of twins (N =
10,500), genetic influences accounted for the amount of practice time (69
percent of the variance in males and 41 percent in females) (Mosing,
Madison, Pedersen, Kuja-Halkola, & Ullén, 2014).
FIGURE 2. Schematic summary of main elements of the multifactorial gene–environment
interaction model (MGIM). At the phenotypic level (upper part), the MGIM assumes that
psychological traits such as abilities, personality, interests, and motivation are associated with the
domain and intensity of practice. Specific examples of variables that have been shown to be
involved in various forms of expertise are provided in italics under each general heading. Practice
will cause adaptations of neural mechanisms involved in expertise and can also influence relevant
physical body properties. Furthermore, neural mechanisms related to trait differences may impact
expertise independently of practice. Both genetic and non-genetic factors (lower part) influence the
various variables that are involved in expertise at the phenotypic level. These influences are likely to
be complex and involve both gene–environment interaction effects and covariation between genes
and environment (G–E covariation).
Reprinted from Psychological Bulletin 142(4), Fredrik Ullén, David Zachary Hambrick, and
Miriam Anna Mosing, Rethinking expertise: A multifactorial gene–environment interaction
model of expert performance, pp. 427–446, doi.org/10.1037/bul0000033, Copyright © 2016
American Psychological Association.
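
To see concretely how such twin designs apportion variance, the classical shortcut (Falconer's formulas) estimates genetic and environmental components directly from the monozygotic (MZ) and dizygotic (DZ) twin correlations. The Python sketch below is only an illustration with hypothetical correlations; Mosing et al. (2014) fitted full structural equation models to their data rather than this shortcut.

    # Illustrative sketch of classical twin-study variance decomposition
    # (Falconer's formulas). Correlations are hypothetical, not the values
    # reported by Mosing et al. (2014).

    def ace_estimates(r_mz, r_dz):
        """Split trait variance into A (additive genetic), C (shared
        environment), and E (nonshared environment) components."""
        a = 2 * (r_mz - r_dz)  # MZ pairs share ~twice the segregating genes of DZ pairs
        c = 2 * r_dz - r_mz    # twin similarity not attributable to genes
        e = 1 - r_mz           # nonshared environment plus measurement error
        return {"A": a, "C": c, "E": e}

    # Hypothetical twin correlations for lifetime practice time
    print(ace_estimates(r_mz=0.70, r_dz=0.35))
    # -> approximately {'A': 0.7, 'C': 0.0, 'E': 0.3}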

Contemporary research is providing increasingly refined understandings of genetic–musical behavior interactions. For example, gene expression is
differentially upregulated or downregulated for music listening or for music
performance (Kanduri et al., 2015a, b). Excellent reviews of the role of genetics in music, documenting the interaction between genes and environment, are found in Mosing, Peretz, & Ullén (2018), Yi, McPherson, Peretz, Berkovic, & Wilson (2014), and Yi, McPherson, & Wilson (2018). See also
Chapter 18.

NEURAL PLASTICITY

Musicians are models of neural plasticity (Münte, Altenmüller, & Jäncke, 2002). That is, many changes have been documented in the brains of
musicians as a result of training. Table 1 is not intended to be an exhaustive
list, either of neural adaptations or number of relevant sources, but rather to
show only a few of the ways that adult musicians’ brains have been
modified by music learning experiences. Several investigators have
concluded that these changes are more likely a result of intense music
learning experiences than that these musicians were born with “different”
brains (Hyde et al., 2009; Norton et al., 2005; Schlaug, Norton, Overy, &
Winner, 2005; Schlaug et al., 2009). In a confirmatory study of identical twins in which one member of each pair had taken piano lessons and the other had not, the twins showed significant differences in brain anatomy attributable to musical training (Manzano & Ullén, 2018).
Table 1. Changes in musicians’ brains

Region | Change | Source

Anatomical changes
Cerebellum | Greater volume in males, but not females | Hutchinson et al., 2003
Corpus callosum | Area 3 of CC enlarged | Schlaug et al., 2009
Gray matter | Greater volume in motor, auditory, and visuospatial areas | Bermudez & Zatorre, 2005; Gaser & Schlaug, 2003
Sensorimotor cortex | Identifying markers in precentral cortex for string players (RH) and pianists (LH) | Bangert & Schlaug, 2006
White matter | Positive correlations between amount of practice time and white matter organization | Bengtsson et al., 2005

Functional changes
Auditory cortex | Increased cortical representation for musical tones over pure tones | Pantev et al., 1998, 2001
Multimodal integration areas | Increased activity in convergence zones | Hodges et al., 2005
RH motor cortex | Increased cortical representation for string players | Elbert et al., 1995
Secondary auditory cortex | Superior sound localization in conductors | Münte et al., 2001
Temporal and frontal lobes | Enhanced MMN for chord alterations | Koelsch et al., 1999; Tervaniemi et al., 1999
Visual cortex | Minimal deactivation of visual cortex during difficult auditory tasks | Hodges et al., 2010

RH = right hemisphere; LH = left hemisphere; CC = corpus callosum; MMN = mismatch negativity, a component of event-related potentials in response to a violation of an expected rule (e.g., a wrong note in a tonal musical passage).

Actually, formal study (in musical parlance, practice) is not necessary for musical experiences to elicit changes in the brain. With the possible
exception of those with congenital amusia (Peretz, Brattico, Järvenpää, &
Tervaniemi, 2009), nearly everyone learns the music of the surrounding
culture, even in the absence of formal training. For example, people
generally have no trouble successfully processing the accompanying
musical track while watching movies and television. This was confirmed in
a study in which scores on combined music aptitude tests were normally
distributed in a population, suggesting that “moderate musical aptitude is
common and does not need formal training” (Oikkonen & Järvelä, 2014, p.
1104).
One of the critical challenges infants face is to make sense of what
initially appears to be a chaotic world. Fortunately, they come into the
world remarkably able to detect patterns and structures in the environment
based on the frequency with which they are encountered. Moreover, they
are able to do this often in the absence of explicit feedback. Statistical
learning, as it is called, is foundational for understanding how we process
both auditory (Saffran, Aslin, & Newport, 1996; Saffran, Johnson, Aslin, &
Newport, 1998) and visual stimuli (Kirkham, Slemmer, & Johnson, 2002;
Turk-Browne, Jungé, & Scholl, 2005). Music and language are the two
primary auditory inputs that have been studied. Regarding music, statistical
learning plays a role in the perception of melody (Creel, Newport, & Aslin,
2004), harmony (Jonaitis & Saffran, 2009), timbre (Tillmann & McAdams,
2004), and the acquisition of absolute pitch (Saffran, 2003; Saffran &
Griepentrog, 2001). Gestalt organizing principles appear to be important in
the statistical learning process (Creel, Newport, & Aslin, 2004).
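
The “statistics” at issue are, in the simplest case, transitional probabilities between adjacent events: high within coherent units, low across unit boundaries. The Python sketch below illustrates that logic with hypothetical three-tone “words”; these are stand-ins, not the actual stimuli used by Saffran and colleagues.

    # Minimal sketch of statistical learning as transitional probabilities,
    # in the spirit of Saffran et al. (1996, 1998).
    import random
    from collections import Counter, defaultdict

    def transitional_probabilities(sequence):
        """Estimate P(next | current) for every adjacent pair."""
        pair_counts = Counter(zip(sequence, sequence[1:]))
        first_counts = Counter(sequence[:-1])
        probs = defaultdict(dict)
        for (a, b), n in pair_counts.items():
            probs[a][b] = n / first_counts[a]
        return probs

    # A continuous tone stream built by randomly concatenating "words,"
    # as in the familiarization phase of a statistical learning study.
    words = ["ABC", "DEF", "GHI"]  # hypothetical three-tone words
    random.seed(0)
    stream = "".join(random.choice(words) for _ in range(300))

    tp = transitional_probabilities(stream)
    print(f"within-word  P(B|A) = {tp['A']['B']:.2f}")  # -> 1.00
    print(f"between-word P(D|C) = {tp['C']['D']:.2f}")  # -> ~0.33

A listener tracking these statistics could segment the stream into its three “words” without any explicit feedback, which is the essence of the behavioral findings.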
Work on the neural structures involved in statistical learning is just
beginning (e.g., Karuza et al., 2013); however, there is every reason to
believe that advancements in this area will continue to be made. In the
meantime, additional support for innate neural structures subserving music
came with the discovery that congenital amusics (persons with music
processing deficits) can learn unfamiliar words as easily as controls, but not
musical patterns (Peretz, Saffran, Schön, & Gosselin, 2012); in other words,
mere exposure is not sufficient without the requisite intact neural
mechanisms.
Experience-expectant processes (e.g., language and music) are largely
driven by genes; the brain prepares itself, largely through genetic processes,
to learn any language(s) that the person might encounter (Kuhl & Rivera-
Gaxiola, 2008). Experience-dependent processes (e.g., English or Spanish;
jazz or Chinese opera) rely more on learning experiences. Thus, infants
have the capability of processing any musical style they might encounter
(Hannon & Trehub, 2005; Winkler, Háden, Ladinig, Sziller, & Honing,
2009), but the particular musical style or styles depends upon the
environment in which they are raised. Galván (2010) created a model
whereby neural plasticity is a result of both development and learning (Fig.
3). Rather than being independent, autonomous processes, development and
learning are part of a continuum. Genetic instructions and learning
experiences work together to shape the brain. Experience-expectant
mechanisms rely more on development, while experience-dependent
mechanisms rely more on learning.

FIGURE 3. This working model illustrates that development and learning exist on a continuum, as each independently and simultaneously influences neural plasticity. While development is largely guided by experience-expectant mechanisms, it also receives input from experience-dependent mechanisms. Similarly, learning is mostly guided by experience-dependent mechanisms, but also receives experience-expectant input.
Reprinted from Human Brain Mapping 31(6), Adriana Galván, Neural plasticity of development and learning, pp. 879–890, Figure 1, doi.org/10.1002/hbm.21029, Copyright © 2010, John Wiley and Sons.

Looking for explanations of how these changes occur in the brain leads
us to two basic brain development processes, neural pruning and
myelination, both of which have been implicated in musical studies. Each
process is driven by both genetic instructions and lived experiences.
Neural Pruning
Early on in development, the brain overproduces synapses, the connections
between neurons (Berk, 2017). Different brain regions peak at different
times, but by age 2 there may be as many as 50 percent more synapses in a
given area than will be present during adulthood (Stiles, Reilly, Levine,
Trauner, & Nass, 2012). Following the peak of this rapid proliferation of
synapses, a protracted period of decline extends throughout childhood and
into early adulthood. Operating on a “use it or lose it” basis, unused
synapses are selectively pruned, leaving a sculpted brain (Gogate, Giedd,
Janson, & Rapoport, 2001). The number of possible connections—roughly 100 trillion (10^14) synapses in the cerebral cortex—is far too great to be
determined by genetics alone. Rather, the general outlines are genetically
programmed, with selective pruning guided by sensory and motor
experience, psychoactive drugs, gonadal hormones, parent–child
relationships, peer relationships, stress, intestinal flora, and diet (Kolb &
Gibb, 2011). Changes in cortical thickness as a result of pruning are
associated with behavior.
Sculpting the brain is not simply a matter of deleting unused cells and
synapses. At the same time this is happening, new synapses are being
formed throughout the lifetime. Synapses formed early in life are
“expecting” certain life experiences that will prune them into optimal
networks. Later forming synapses are more localized and specific to
particular learning experiences. “Thus, experiences are changing neural
networks by both adding and pruning synapses” (Kolb & Gibb, 2011, p.
268).

Myelination
As neurons communicate among themselves within neural networks, each neuron has numerous dendrites for input but only one axon for output. Over
time, axons are covered in a fatty sheath called myelin that enhances
transmission speed up to 100 times and improves efficiency (Zull, 2002).
Genetic instructions drive myelination in a process that moves through the
brain from bottom to top and back to front. Thus, it is only in one’s early to
mid-20s that the frontal lobes are fully myelinated, and increasing
myelination is related to enhanced cognitive functioning (Webb, Monk, &
Nelson, 2001). Because myelin is white in appearance, the core of the brain
is called white matter; here, billions of fibers connect different regions of
gray matter into neural networks (Filley, 2005).
Although genetic instructions are essential, myelination is also
responsive to learning experiences, as “neurons that wire together fire
together” and “neurons that fire apart wire apart—or neurons out of sync
fail to link” (Doidge, 2007, pp. 63–64). In other words, when we engage
repeatedly in a thought or action (e.g., practicing scales), the neural
network(s) supporting those processes becomes stronger with repetitive
stimulation (Fields, 2009). Specifically, learning experiences elicit more
wrappings of the axon, making message transmission increasingly efficient.
Thus, for example, Bengtsson et al. (2005) found that piano practice induced changes in white matter; these changes were greater during childhood than during adolescence or adulthood.
Improved efficiency comes at a cost, as myelination decreases flexibility
in neural responses. That is, the brain places restrictions on itself such that
what is learned limits what can be learned (Quartz, 2003). The more attuned
to surrounding cultural expressions (e.g., language, music, etc.) children
become, the less responsive they are to other cultural expressions (Pons,
Lewkowicz, Soto-Faraco, & Sebastián-Gallés, 2009). Responding
appropriately to unfamiliar tonal and rhythmic structures becomes more
difficult once one has learned the music of the surrounding culture (Patel,
Meltzoff, & Kuhl, 2004).

CULTURAL INFLUENCES ON INNATE INFANT RESPONSES TO MUSIC

There is significant evidence that the fetus responds to sounds during the
last trimester before birth, as activations in the primary auditory cortex were
recorded using fMRI in the left hemisphere of fetuses at 33 weeks gestation
(Jardri et al., 2008). Newborns as early as 1–3 days old responded to music
with activations in both hemispheres (Perani et al., 2010). Excerpts of
Western tonal music registered primarily in the right hemisphere (RH),
while altered or dissonant versions reduced RH responses and activated left
inferior frontal cortex and limbic structures. “These results demonstrate that
the infant brain shows a hemispheric specialization in processing music as
early as the first postnatal hours” (Perani et al., 2010, p. 4758). Similarly,
using near-infrared spectroscopy (NIRS), researchers found that neonates
registered speech and music sounds in both hemispheres, with more
coherent responses to speech in the left hemisphere (LH) (Kotilahti et al.,
2010). Regarding music specifically, researchers used event-related
potentials (ERP) to determine that newborns can process musical pitch
intervals (Stefanics et al., 2009), distinguish pitch from timbre (Háden et
al., 2009), detect the beat in music (Winkler et al., 2009), and create
expectations for tonal patterns (Carral et al., 2005). While one cannot rule
out the effects of learning entirely, it seems clear that we come into the
world prepared to process musical sounds.
The foregoing suggests inborn proclivities for musical processing, but
not predetermined responses to specific styles of music. After reviewing the
literature, Hannon and Trainor (2007) concluded that neural networks
“become increasingly specialized for encoding the musical structure of a
particular culture” (p. 470). As an example, Shahin, Roberts, & Trainor
(2004) found that auditory evoked potentials in 4- and 5-year-olds were
larger in those who received Suzuki music lessons compared to controls
who did not; even larger responses were generated by tones from the
instrument studied (i.e., piano or violin).
To conclude this section, we look at two studies in which researchers
examined the effects of enculturation more closely. Mehr and colleagues
conducted several experiments designed to explore the ways in which
infants imbue music with social meanings. Five-month-old infants heard
one of two novel songs presented by a parent, by a toy, or by a friendly but
unfamiliar adult in person and subsequently via video (Mehr, Song, &
Spelke, 2016). Later, these infants heard two novel individuals sing the
familiar and then the unfamiliar song. Those infants who had previously
heard a parent sing the familiar song preferred (i.e., looked longer at) the
new person singing it rather than the new person singing the unfamiliar
song. The amount of exposure to the song received at home was correlated
with the length of selective attention. These effects were not found in the
infants who initially heard the familiar song emanating from a toy or a
socially-unrelated person. Thus, songs sung by caretakers embody social
meanings for five-month-old infants.
In an extension, eleven-month-old infants were randomly assigned to
one of two groups; one group listened to one of two novel songs sung by a
parent, while the others heard a song that emanated from a toy activated by
a parent (Mehr & Spelke, 2017). Subsequently, they viewed a video of two
new people, each singing one of the songs. In a following silent condition,
two people appeared next to each other, each presenting and endorsing an
object, such as a small stuffed toy or models of an apple or pear, and the
infant was allowed to reach toward the objects. Preference was indicated by
eye gaze and touching. Infants in both groups chose the object presented by
the singer of the familiar song. Clearly, infants preferred familiar songs
regardless of whether they were learned by hearing the parent sing or by
playing with a musical toy. Even though both groups chose the familiar
song, infants who heard parents sing the song gazed longer at the object
than those who heard it coming from a toy. Again, music was imbued with
social meanings.

THE SEARCH FOR MUSIC UNIVERSALS

René Dubos (1981) coined the term invariants, by which he meant characteristics of human culture that are universal in a general sense but
particularized in each culture. Language, clothing, and shelter are some
examples, as are art and music. A common way to approach an
understanding of the ubiquity of music around the world is to separate
universal from culture-specific features. Ethnomusicologists have taken on
this challenge in many articles (e.g., Boiles, 1984; List, 1984; Merriam,
1964; Nettl, 1977, 1983, 2000, 2005; Nketia, 1984). By universal, they
mean “more common than not” or “typical” and certainly not that every
culture employs a particular feature. There will nearly always be
exceptions. Nevertheless, there is abundant evidence to support the
contention that all human societies engage in what may be called or
recognized as music (Cross, 2007, 2009–2010; Cross & Morley, 2009;
Nettl, 2005). Such universal behavior is likely supported by underlying
biological mechanisms (Turner & Ioannides, 2009), such as genetic
influences (see Chapter 18).
One line of support for music’s long-standing role in human
development comes from archaeological findings. Although the earliest
evidence of art is shrouded in the mists of time, there are tantalizing hints
such as the Venus of Tan-Tan, a quartzite sculpture dated from between
300,000 and 500,000 years ago (DeFelipe, 2011) or cave paintings from 64,800 years ago (Hoffmann et al., 2018). Granted, these earliest findings are
controversial, having been created by Homo heidelbergensis and Homo
neanderthalensis, respectively, and not direct evidence of music. However,
there is reason to believe that music was also part of the early human behavioral repertoire (e.g., Mithen, 2006). Here are just a few examples of
supporting evidence:

• 70,000 years ago: Cave paintings depict a bow, which anthropologists
contend was used as a musical instrument as well as a weapon
(Kendig & Levitt, 1982); musical bows have been found worldwide
(Mason, 1897).
• 60,000 years ago: Artifacts in a cave in Lebanon indicate ceremonies
involving singing and dancing (Constable, 1973). This was made
more plausible when a video was made of a contemporary Aboriginal Australian executing a cave painting in the presence of singing and dancing as part of a religious ritual (Mumford, 1967). Acoustically, the best places for singing and chanting are those caves with the most art, and rooms with poor acoustics rarely have paintings (Allman,
1994; Cross & Morley, 2009; Morley, 2006).
• 40–20,000 years ago: Cave paintings of musicians and dancers
(Prideaux, 1973) are found along with whistles, pipes, flutes, and
bone and rock percussion instruments (Blake & Cross, 2008; Cross et
al., 2002; Dams, 1985; Kunej & Turk, 2000).

Of course, a more extensive treatment of this topic would provide many more details, but even these few examples should suffice to show that
humans have always and everywhere been musical.
Singing is common among all cultures (Lomax, 1968), as is the singing
of lullabies and dancing to music (McDermott, 2008). Lullabies appear to
possess common features (Trehub, Unyk, & Trainor, 1993). The use of
musical instruments is so common as to be nearly universal, if not
completely so (Wade, 2009). Instruments are often classified into
idiophones (struck instruments such as gongs and rattles), membranophones
(drums), aerophones (flutes and other wind instruments), chordophones
(stringed instruments), corpophones (body percussion and hand clapping),
and electrophones (mechanical and electrical instruments) (Hornbostel &
Sachs, 1992; Wade, 2009). Drake and Bertrand (2001) proposed five
candidates for universals in temporal processing: segmentation and
grouping, predisposition towards regularity, active search for regularity,
temporal zone for optimal processing, and predisposition toward simple
duration ratios. Even some basic emotions appear to be recognized in music
cross-culturally (Adachi, Trehub, & Abe, 2004; Balkwill & Thompson,
1999; Balkwill, Thompson, & Matsunaga, 2004; Fritz et al., 2009),
although subtle emotions are strongly affected by culture (Davies, 2010;
Gregory & Varney, 1996).
Given the enormous variety of music and musicing around the world,
and even given the fact that some cultures do not have a specific word for
music in their language (Cross & Morley, 2009; Dissanayake, 2009), it
should be no surprise that there is scant agreement on universal features.
However, Brown and Jordania (2011) proposed four types of universals:

• Type 1: Conserved Universals occur in all musical utterances and
include the use of discrete pitches, octave equivalence, phrase
structures, and so on.
• Type 2: Predominant Patterns occur in all musical systems or styles
and include musical scales with seven or fewer pitches per octave, a
predominance of precise rhythms, use of idiophones and drums, and
so on.
• Type 3: Common Patterns. Musical patterns that, while not universal,
are widespread. Examples might include the unity of Jewish musical
traditions following the diaspora, with Ashkenazic styles in Russia
and northern Europe and Sephardic styles in Persia, India, Spain, and
the Mediterranean basin (Bahat, 1980). Another example would be
religious music, such as Buddhist or Christian musical practices in
many different countries.
• Type 4: Range Universals. Particular categories of music or musical
behavior are expressed across a wide range of possibilities. For
example, all music could be placed into a classification of multipart
textures, from monophony, heterophony, homophony, to polyphony.

The first three categories are based strongly on Nettl’s (2000) gradient-
of-universality approach. The authors then provide a list of seventy items
related to music’s sound structures (i.e., pitch, rhythm, melodic structure
and texture, form, vocal style, expressive devices, and instruments) and
extra-musical features (i.e., contexts, contents, and behavior). Just as the
twelve tones of Western music’s chromatic scale provide for an infinite
number of realizations, so might these putative universals provide the
structure of human music within which the cultural variations are also
infinite. Continued research, however, is critically necessary.
Some candidates for relatively universal functions and roles of music in
worldwide cultures have also been offered (Table 2).
Table 2. Functions and roles of music

Music provides the function of:
Emotional expression M
Aesthetic enjoyment M
Entertainment M&G
Communication M&G
Symbolic representation M, C, & G
Physical response/coordination of action M&C
Enforcing conformity to social norms M
Validation of social institutions and religious rituals M&G
Contribution to the continuity and stability of culture M
Contribution to the integration of society M
Regulation of an individual’s emotional, cognitive, or physiological state C
Mediation between self and other C
Traditional roles of music include:
Lullabies G
Games G
Work music G
Dancing G
Storytelling G
Ceremonies and festivals G
Battle G
Ethnic or group identity G
Salesmanship G
Healing G
Trance G
Court music G

C = Clayton (2009), G = Gregory (1997), M = Merriam (1964).

On balance, evidence supports the notion that biological and cultural aspects combine and interact to create whatever may be universal about
music. Ongoing cross-cultural music research is critical to advancing our
understanding.

CROSS-CULTURAL MUSIC RESEARCH

In thinking about cross-cultural music research, it should be noted that one of the difficulties in our current understanding of neurocognition is the fact
that 94 percent of the participants in psychological experiments come from
only 12 percent of the world’s population (Arnett, 2008), and 90 percent of
published neuroimaging studies come from Western countries (Chiao,
2009). This is likely even more true for music cognition. An alarming exacerbation is the rapid Westernization of the globe: it will soon be much more difficult to find listeners who have not been exposed to Western music. Time is running out for us to have access to indigenous,
authentic musical performers and listeners.
A few tentative conclusions can be drawn from the relatively small
number of cross-cultural music research studies published:

1. The most general finding is that enculturation strongly affects how one interprets and understands music from within and without the
home culture (Curtis & Bharucha, 2009; Demorest, Morrison,
Beken, & Jungbluth, 2008; Demorest, Morrison, Nguyen, &
Bodnar, 2016; Kessler, Hansen, & Shepard, 1984). The cultural
distance hypothesis suggests that musical processing is more
efficient and accurate when unfamiliar music is similar to one’s
own cultural music and less so the farther removed the unfamiliar
music becomes (Demorest & Morrison, 2016; see also Chapter 3).
2. Given the caveats for music universals from the previous section,
there are probable cognitive and emotional processes that support
all musical experiences but these can be highly modified by
enculturation (e.g., Krumhansl et al., 2000; Laukka, Eerola,
Thingujan, & Yamasaki, 2013; Neuhaus, 2003).
3. Certain basic emotions (e.g., happiness, sadness, anger) may be
identifiable in unfamiliar music, but less so emotions that are more
culture specific (Balkwill & Thompson, 1999; Balkwill et al., 2004;
Fritz et al., 2009; Laukka et al., 2013). There is some evidence that
psychophysical variables or acoustic cues (e.g., tempo, loudness,
complexity, etc.) play a role in determining emotional expressions
(Balkwill & Thompson, 1999; Balkwill et al., 2004).
4. Music enculturation begins early in infancy (Morrison, Demorest,
& Stambaugh, 2008; Soley & Hannon, 2010; Trainor, Marie, Gerry,
Whiskin, & Unrau, 2012); bimusicalism, similar to bilingualism,
can result from sufficient early exposure (Wong, Roy, & Margulis,
2009). Once well established, however, enculturated processes may
be somewhat resistant to change through training (Morrison,
Demorest, Campbell, Bartolome, & Roberts 2013).
5. Active musicing is more efficient than passive exposure in
establishing enculturated music processes (Trainor et al., 2012);
however, passive exposure, in the form of statistical learning, is
sufficient for inculcating a basic understanding of one’s own
cultural music (Drake & Ben El Heni, 2003).

Regarding brain responses in cross-cultural research, a few additional points can be made. Familiar (i.e., from one’s own culture) and unfamiliar music may elicit responses in similar (Demorest & Morrison, 2003) or nearby brain regions (Matsunaga, Yokosawa, & Abe,
2012). Also, brain activations to culturally unfamiliar music may differ more in degree than in substance (Demorest & Osterhout, 2012; Morrison & Demorest, 2009; Morrison, Demorest, Aylward, Cramer, & Maravilla, 2003; Nan, Knösche, & Friederici, 2006). However, different brain regions
may also be activated in response to familiar and unfamiliar music
depending on specific tasks required of participants (Nan, Knösche, Zysset,
& Friederici, 2008). Finally, cultural experiences influence both the
perception and memory of music at behavioral and neurological levels
(Demorest et al., 2010).
Taken as a whole, cross-cultural research supports the main contention
of this chapter, namely that musical experiences are an intricate and
complicated combination of biological and cultural processes. Because
biological mechanisms may influence how enculturation proceeds and
enculturation may impose biological constraints, the two are highly
interrelated. As stated at the outset, our purpose is not to attempt to separate the two, as doing so is not only impossible but creates an artificial schism; rather, it is to recognize that one informs the other.

CONCLUSION

Viewed through the lens of cultural neuroscience, the central thesis of this
chapter is that biological and cultural aspects of musical experiences are
inextricably intertwined. Virtually nothing about musical experiences is
purely biological or purely cultural. We might consider a tree with its root
system as a visual analogy (Fig. 4). Let the trunk represent musicality as a
universal aspect of being human. Let the branches represent major cultural
traditions and the smaller twigs and leaves stand for particular musical
genres and styles.1 The cultural distance hypothesis (Demorest & Morrison,
2016) suggests that leaves on the same branch (i.e., nearby musical styles)
are more understandable to listeners than leaves on the opposite side of the
tree. Supporting the visible part of the tree is a dense, deep-seated root
system. These roots represent the supporting biological and cultural
underpinnings of music. Each root is an amalgam of biological and cultural
aspects, such that it is impossible to disentangle the Gordian knot.
FIGURE 4. A visual analogy for human musicality. The roots represent biological and cultural
underpinnings. The trunk represents musicality as a universal aspect of humankind. The branches
represent different cultural traditions and the twigs and leaves represent particular musical genres
and styles.

It is important to remember that the object of study in neuromusical research is not a brain that sits in a jar on a shelf in some lab; it is inside a
living person with a personality, with all manner of proclivities,
potentialities, and internal and external motivations and influences. Being
mindful of these biocultural interactions does not mean that it is possible to
separate biology from culture, but rather that research findings must be
interpreted with an awareness of these mutual influences. Let us hope that
research within a cultural neuroscience perspective will proceed at an ever-
increasing pace so that we can learn as much as possible about the
biocultural aspects of music before it is too late and there are no more
indigenous, authentic musicians and music listeners to study.

REFERENCES
Adachi, M., Trehub, S., & Abe, J.-I. (2004). Perceiving emotion in children’s songs across age and
culture. Japanese Psychological Research 46(4), 322–336.
Allman, W. (1994). The stone age present. New York: Simon & Schuster.
Ambady, N., & Bharucha, J. (2009). Culture and the brain. Current Directions in Psychological
Science 18(6), 342–345.
Arnett, J. (2008). The neglected 95 percent: Why American psychology needs to become less
American. American Psychologist 63(7), 602–614.
Bahat, A. (1980). The musical traditions of the oriental Jews. The World of Music 22(2), 46–55.
Balkwill, L.-L., & Thompson, W. (1999). A cross-cultural investigation of the perception of emotions
in music: Psychophysical and cultural cues. Music Perception 17(1), 43–64.
Balkwill, L.-L., Thompson, W., & Matsunaga, R. (2004). Recognition of emotion in Japanese,
Western, and Hindustani music by Japanese listeners. Japanese Psychological Research 46(4),
337–349.
Bangert, M., & Schlaug, G. (2006). Specialization of the specialized in features of external human
brain morphology. European Journal of Neuroscience 24(6), 1832–1834.
Bengtsson, S., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano
practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9),
1148–1150.
Berk, L. (2017). Development through the lifespan. New York: Pearson Education.
Bermudez, P., & Zatorre, R. (2005). Differences in gray matter between musicians and nonmusicians.
The neurosciences and music II: From perception to performance. Annals of the New York
Academy of Sciences 1060, 395–399.
Blacking, J. (1973). How musical is man? Seattle: University of Washington Press.
Blake, E., & Cross, I. (2008). Flint tools as portable sound-producing objects in the upper paleolithic
context: An experimental study. In P. Cunningham, J. Heeb, & R. Paardekooper (Eds.),
Experiencing archaeology by experiment (pp. 1–19). Oxford: Oxbow Books.
Boiles, C. (1984). Universals of musical behavior: A taxonomic approach. The World of Music 26(2),
50–64.
Brown, S., & Jordania, J. (2011). Universals in the world’s musics. Psychology of Music 41(2), 229–
248.
Carral, V., Huotilainen, M., Ruusuvirta, T., Fellman, V., Näätänen, R., & Escera, C. (2005). A kind of
auditory “primitive intelligence” already present at birth. European Journal of Neuroscience
21(11), 3201–3204.
Chiao, J. (2009). Cultural neuroscience: A once and future discipline. Progress in Brain Research
178, 287–304.
Chiao, J., Li, S., Seligman, R., & Turner, R. (Eds.). (2016). The Oxford handbook of cultural
neuroscience. Oxford: Oxford University Press.
Clayton, M. (2009). The social and personal functions of music in cross-cultural perspective. In S.
Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 35–44).
Oxford: Oxford University Press.
Conard, N., Malina, M., & Münzel, C. (2009). New flutes document the earliest musical tradition in
southwestern Germany. Nature 460(7256), 737–740.
Constable, G. (1973). The Neanderthals. New York: Time-Life Books.
Creel, S., Newport, E., & Aslin, R. (2004). Distant melodies: Statistical learning of nonadjacent
dependencies in tone sequences. Journal of Experimental Psychology 30(5), 1119–1130.
Cross, I. (2007). Music and cognitive evolution. In L. Barrett & R. Dunbar (Eds.), The Oxford
handbook of evolutionary psychology (pp. 649–667). Oxford: Oxford University Press.
Cross, I. (2009–2010). The evolutionary nature of musical meaning. Musicæ Scientiæ, Special Issue
2009–2010, 179–200.
Cross, I., & Morley, I. (2009). The evolution of music: Theories, definitions and the nature of the
evidence. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality (pp. 61–81). Oxford:
Oxford University Press.
Cross, I., Zubrow, E., & Cowan, F. (2002). Musical behaviours and the archaeological record: A
preliminary study. In J. Mathieu (Ed.), Experimental archaeology. British Archaeological Reports
International Series 1035 (pp. 25–34). Oxford: BAR Publishing.
Curtis, M. E., & Bharucha, J. J. (2009). Memory and musical expectation for tones in cultural
context. Music Perception 26(4), 365–375.
Dams, L. (1985). Paleolithic lithophones: Descriptions and comparisons. Oxford Journal of
Archaeology 4(1), 31–46.
Davies, S. (2010). Emotions expressed and aroused by music: Philosophical perspectives. In P. Juslin
& J. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 15–43).
Oxford: Oxford University Press.
DeFelipe, J. (2011). The evolution of the brain, the human nature of cortical circuits, and intellectual
creativity. Frontiers in Neuroanatomy 5(29), 1–17.
Demorest, S., & Morrison, S. (2003). Exploring the influence of cultural familiarity and expertise on
neurological responses to music. Annals of the New York Academy of Sciences 999, 112–117.
Demorest, S., & Morrison, S. (2016). Quantifying culture: The cultural distance hypothesis of
melodic expectancy. In J. Chiao, S.-C. Li, R. Seligman, & R. Turner (Eds.), The Oxford handbook
of cultural neuroscience (pp. 183–196). Oxford: Oxford University Press.
Demorest, S., Morrison, S., Beken, M., & Jungbluth, D. (2008). Lost in translation: An enculturation
effect in music memory performance. Music Perception 25(3), 213–223.
Demorest, S., Morrison, S., Nguyen, V., & Bodnar, E. (2016). The influence of contextual cues on
cultural bias in music memory. Music Perception 33(5), 590–600.
Demorest, S., Morrison, S., Stambaugh, L., Beken, M., Richards, T., & Johnson, C. (2010). An fMRI
investigation of the cultural specificity of music memory. Social Cognitive and Affective
Neuroscience 5(2–3), 282–291.
Demorest, S., & Osterhout, L. (2012). ERP responses to cross-cultural melodic expectancy
violations. Annals of the New York Academy of Sciences 1252, 152–157.
Dissanayake, E. (2009). Root, leaf, blossom, or bole: Concerning the origin and adaptive function of
music. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality: Exploring the basis of
human companionship (pp. 17–30). Oxford: Oxford University Press.
Dobzhansky, T. (1962). Mankind evolving. New Haven, CT: Yale University Press.
Doidge, N. (2007). The brain that changes itself. New York: Penguin.
Drake, C., & Ben El Heni, J. (2003). Synchronizing with music: Intercultural differences. Annals of
the New York Academy of Sciences 999, 429–437.
Drake, C., & Bertrand, D. (2001). The quest for universals in temporal processing in music. Annals
of the New York Academy of Sciences 930, 17–27.
Dubos, R. (1981). Celebrations of life. New York: McGraw-Hill.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical
representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Fields, D. (2009). The other brain. New York: Simon & Schuster.
Filley, C. (2005). White matter and behavioral neurology. In J. Ulmer, L. Parsons, M. Moseley, & J.
Gabrieli (Eds.), White matter in cognitive neuroscience. Annals of the New York Academy of
Sciences 1064, 162–183.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., … Koelsch, S. (2009).
Universal recognition of three basic emotions in music. Current Biology 19(7), 573–576.
Galván, A. (2010). Neural plasticity of development and learning. Human Brain Mapping 31(6),
879–890.
Gaser, C., & Schlaug, G. (2003). Gray matter differences between musicians and nonmusicians.
Annals of the New York Academy of Sciences 999, 514–517.
Gaston, E. (1968). Man and music. In E. Gaston (Ed.), Music in therapy (pp. 7–29). New York:
Macmillan.
Gogate, N., Giedd, J., Janson, K., & Rapoport, J. (2001). Brain imaging in normal and abnormal
brain development: New perspectives for child psychiatry. Clinical Neuroscience Research 1(4),
283–290.
Gregory, A. (1997). The roles of music in society: The ethnomusicological perspective. In D.
Hargreaves & A. North (Eds.), The social psychology of music (pp. 123–140). Oxford: Oxford
University Press.
Gregory, A., & Varney, N. (1996). Cross-cultural comparisons in the affective response to music.
Psychology of Music 24(1), 47–52.
Háden, G., Stefanics, G., Vestergaard, M., Denham, S., Sziller, I., & Winkler, I. (2009). Timbre-
independent extraction of pitch in newborn infants. Psychophysiology 46(1), 69–74.
Han, S., & Ma, Y. (2015). A culture–behavior–brain loop model of human development. Trends in
Cognitive Sciences 19(11), 666–676.
Han, S., Northoff, G., Vogeley, K., Wexler, B., Kitayama, S., & Varnum, M. (2013). A cultural
neuroscience approach to the biosocial nature of the human brain. Annual Review of Psychology
64, 335–359.
Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal
training on development. Trends in Cognitive Sciences 11(11), 466–472.
Hannon, E. E., & Trehub, S. E. (2005). Metrical categories in infancy and adulthood. Psychological
Science 16(1), 48–55.
Hodges, D., Burdette, J., & Hairston, D. (2005). Aspects of multisensory perception: The integration
of visual and auditory information processing in musical experiences. In G. Avanzini, L. Lopez, S.
Koelsch, & M. Majno (Eds.), The neurosciences and music II: From perception to performance.
Annals of the New York Academy of Sciences 1060, 175–185.
Hodges, D., Hairston, W., Maldjian, J., & Burdette, J. (2010). Keeping an open mind’s eye:
Mediation of cross-modal inhibition in music conductors. In S. M. Demorest, S. J. Morrison, & P.
S. Campbell (Eds.), Proceedings of the 11th International Conference on Music Perception and
Cognition (ICMPC 11) (pp. 415–416). Seattle, Washington.
Hoffmann, D., Standish, C., Garcia-Diez, M., Pettitt, P., Milton, J., Zilhão, J., … Pike, A. (2018). U-
Th dating of carbonate crusts reveals Neandertal origin of Iberian cave art. Science 359(6378),
912–915.
Hornbostel, E., & Sachs, C. (1992). Classification of musical instruments. In H. Meyers (Ed.),
Ethnomusicology: An introduction (pp. 444–461). New York: W. W. Norton.
Hutchinson, S., Lee, L., Gaab, N., & Schlaug, G. (2003). Cerebellar volume of musicians. Cerebral
Cortex 13(9), 943–949.
Hyde, K., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A., & Schlaug, G. (2009). Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025.
Jardri, R., Pins, D., Houfflin-Debarge, V., Chaffiotte, C., Rocourt, N., Pruvo, J.-P., … Thomas, P.
(2008). Fetal cortical activation to sound at 33 weeks of gestation: A functional MRI study.
NeuroImage 42(1), 10–18.
Jonaitis, E., & Saffran, J. (2009). Learning harmony: The role of serial statistics. Cognitive Science
33(5), 951–968.
Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A., Lähdesmäki, H., & Järvelä, I. (2015a). The
effect of music performance on the transcriptome of professional musicians. Scientific Reports 5,
9506. doi:10.1038/srep09506
Kanduri, C., Raijas, P., Ahvenaninen, M., Phillips, A., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä,
I. (2015b). The effect of listening to music on human transcriptome. PeerJ 3, e830.
doi:10.7717/peerj.830
Karuza, E., Newport, E., Aslin, R., Starling, S., Tivarus, M., & Bavelier, D. (2013). The neural
correlates of statistical learning in a word segmentation task: An fMRI study. Brain and Language
127(1), 46–54.
Kendig, F., & Levitt, G. (1982). Overture: Sex, math and music. Science Digest 90(1), 72–73.
Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music in
Bali and the West. Music Perception 2(2), 131–165.
Kirkham, N., Slemmer, J., & Johnson, S. (2002). Visual statistical learning in infancy: Evidence for a
domain general learning mechanism. Cognition 83(2), B35–B42.
Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in
musicians. NeuroReport 10(6), 1309–1313.
Kolb, B., & Gibb, R. (2011). Brain plasticity and behaviour in the developing brain. Journal of
Canadian Child Adolescent Psychiatry 20(4), 265–276.
Kotilahti, K., Nissilä, I., Näsi, T., Lipiäinen, L., Noponen, T., Meriläinen, P., … Fellman, V. (2010).
Hemodynamic responses to speech and music in newborn infants. Human Brain Mapping 31(4),
595–603.
Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Järvinen, T., & Louhivuori, J. (2000).
Cross-cultural music cognition: Cognitive methodology applied to North Sami yoiks. Cognition
76(1), 13–58.
Kuhl, P., & Rivera-Gaxiola, M. (2008). Neural substrates of language acquisition. Annual Review of
Neuroscience 31, 511–534.
Kunej, D., & Turk, I. (2000). New perspectives on the beginnings of music: Archaeological and
musicological analysis of a middle Paleolithic bone “flute.” In N. Wallin, B. Merker, & S. Brown
(Eds.), The origins of music (pp. 235–268). Cambridge, MA: MIT Press.
Laukka, P., Eerola, T., Thingujan, N., & Yamasaki, T. (2013). Universal and culture-specific factors
in the recognition and performance of musical affect expressions. Emotion 13(3), 434–449.
List, G. (1984). Concerning the concept of the universal and music. The World of Music 26(2), 40–
47.
Lomax, A. (1968). Folk song style and culture. New Brunswick, NJ: Transaction Books.
McDermott, J. (2008). The evolution of music. Nature 453(7193), 287–288.
Manuck, S., & McCaffery, J. (2014). Gene–environment interaction. Annual Review of Psychology
65, 41–70.
Manzano, O., & Ullén, F. (2018). Same genes, different brains: Neuroanatomical differences between
monozygotic twins discordant for musical training. Cerebral Cortex 28(1), 387–394.
Marek, G. (1975). Toscanini. London: Vision Press.
Mason, O. (1897). Geographical description of the musical bow. American Anthropologist 10(11),
377–380.
Matsunaga, R., Yokosawa, K., & Abe, J. (2012). Magnetoencephalography evidence for different
brain subregions serving two musical cultures. Neuropsychologia 50(14), 3218–3227.
Mehr, S., Song, L., & Spelke, E. (2016). For 5-month-old infants, melodies are social. Psychological
Science 27(4), 486–501.
Mehr, S., & Spelke, E. (2017). Shared musical knowledge in 11-month-old infants. Developmental
Science 21(2), e12542. doi:10.1111/desc.12542
Merriam, A. (1964). The anthropology of music. Chicago, IL: Northwestern University Press.
Mithen, S. (2006). The singing Neanderthals: The origins of music, language, mind, and society.
Cambridge, MA: Harvard University Press.
Morley, I. (2006). The evolutionary origins and archaeology of music: An investigation into the
prehistory of human musical capacities and behaviors (Doctoral dissertation). University of
Cambridge, Cambridge. Darwin College Research Reports, DCRR-002. Retrieved from
https://www.darwin.cam.ac.uk/drupal7/sites/default/files/Documents/publications/dcrr002.pdf
Morrison, S., & Demorest, S. (2009). Cultural constraints on music perception and cognition. In J. Y.
Chiao (Ed.), Progress in brain research, Vol. 178: Cultural neuroscience: Cultural influences on
brain function (pp. 67–77). Amsterdam: Elsevier.
Morrison, S., Demorest, S., Aylward, E., Cramer, S., & Maravilla, K. (2003). fMRI investigation of
cross-cultural music comprehension. NeuroImage 20(1), 378–384.
Morrison, S., Demorest, S., Campbell, P., Bartolome, S., & Roberts, J. (2013). Effect of intensive
instruction on elementary students’ memory for culturally unfamiliar music. Journal of Research
in Music Education 60(4), 363–374.
Morrison, S., Demorest, S., & Stambaugh, L. (2008). Enculturation effects in music cognition: The
role of age and music complexity. Journal of Research in Music Education 56(2), 118–129.
Mosing, M., Madison, G., Pedersen, N., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not
make perfect: No causal effect of music practice on music ability. Psychological Science 25(9),
1795–1803.
Mosing, M., Peretz, I., & Ullén, F. (2018). Genetic influences on music expertise. In D. Hambrick, G.
Campitelli, & B. Macnamara (Eds.), The science of expertise: Behavioral, neural, and genetic
approaches to complex skill (pp. 272–282). New York: Routledge.
Mumford, L. (1967). The myth of the machine. New York: Harcourt Brace Jovanovich.
Münte, T., Altenmüller, E., & Jäncke, L. (2002). The musician’s brain as a model of neuroplasticity.
Nature Reviews Neuroscience 3(6), 473–478.
Münte, T., Kohlmetz, C., Nager, W., & Altenmüller, E. (2001). Superior auditory spatial tuning in
conductors. Nature 409(6820), 580.
Nan, Y., Knösche, T., & Friederici, A. (2006). The perception of musical phrase structure: A cross-
cultural ERP study. Brain Research 1094(1), 179–191.
Nan, Y., Knösche, T., Zysset, S., & Friederici, A. (2008). Cross-cultural music phrase processing: An
fMRI study. Human Brain Mapping 29(3), 312–328.
Nettl, B. (1977). On the question of universals. The World of Music 19, 2–13.
Nettl, B. (1983). The study of ethnomusicology. Urbana, IL: University of Illinois Press.
Nettl, B. (2000). An ethnomusicologist contemplates universals in musical sound and culture. In N.
Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 463–472). Cambridge, MA: MIT
Press.
Nettl, B. (2005). The study of ethnomusicology: Thirty-one issues and concepts. Champaign, IL:
University of Illinois Press.
Neuhaus, C. (2003). Perceiving musical scale structures: A cross-cultural event-related brain
potentials study. Annals of the New York Academy of Sciences 999, 184–188.
Nketia, J. (1984). Universal perspectives in ethnomusicology. The World of Music 26(2), 3–20.
Norton, A., Winner, E., Cronin, K., Overy, K., Lee, D., & Schlaug, G. (2005). Are there pre-existing
neural, cognitive, or motoric markers for musical ability? Brain and Cognition 59(2), 124–134.
Oikkonen, J., & Järvelä, I. (2014). Genomics approaches to study musical aptitude. Bioessays 36(11),
1102–1108.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L., & Hoke, M. (1998). Increased
auditory cortical representation in musicians. Nature 392(6678), 811–814.
Pantev, C., Roberts, L., Schultz, M., Engelien, A., & Ross, B. (2001). Timbre-specific enhancement
of auditory cortical representations in musicians. Neuroreport 12(1), 169–174.
Patel, A., Meltzoff, A., & Kuhl, P. (2004). Cultural differences in rhythm perception: What is the
influence of native language? In S. Lipscomb, R. Ashley, R. Gjerdingen, & P. Webster (Eds.),
Proceedings of the 8th International Conference on Music Perception and Cognition. Evanston,
IL: Northwestern University. CD-ROM.
Perani, D., Saccuman, M., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010).
Functional specializations for music processing in the human newborn brain. Proceedings of the
National Academy of Sciences 107(10), 4758–4763.
Peretz, I., Brattico, E., Järvenpää, M., & Tervaniemi, M. (2009). The amusic brain: In tune, out of
key, and unaware. Brain 132(5), 1277–1286.
Peretz, I., Saffran, J., Schön, D., & Gosselin, N. (2012). Statistical learning of speech, not music, in
congenital amusia. Annals of the New York Academy of Sciences 1252, 361–367.
Pons, F., Lewkowicz, D., Soto-Faraco, S., & Sebastián-Gallés, N. (2009). Narrowing of intersensory
speech perception in infancy. Proceedings of the National Academy of Sciences, 106(26), 10598–
10602.
Prideaux, T. (1973). Cro-Magnon man. New York: Time-Life Books.
Quartz, S. (2003). Learning and brain development: A neural constructivist perspective. In P. Quinlan
(Ed.), Connectionist models of development (pp. 279–309). New York: Psychology Press.
Rampon, C., Jiang, C., Dong, H., Tang, Y.-P., Lockhart, D., Schultz, P., … Hu, Y. (2000). Effects of
environmental enrichment on gene expression in the brain. Proceedings of the National Academy
of Sciences 97(23), 12880–12884.
Saffran, J. (2003). Absolute pitch in infancy and adulthood: The role of tonal structure.
Developmental Science 6(1), 35–47.
Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science
274(5294), 1926–1928.
Saffran, J., & Griepentrog, G. (2001). Absolute pitch in infant auditory learning: Evidence for
developmental reorganization. Developmental Psychology 37(1), 74–85.
Saffran, J., Johnson, E., Aslin, R., & Newport, E. (1998). Statistical learning of tone sequences by
human infants and adults. Cognition 70(1), 27–52.
Schlaug, G., Forgeard, M., Zhu, L., Norton, A., Norton, A., & Winner, E. (2009). Training-induced
neuroplasticity in young children. The neurosciences and music III. Annals of the New York
Academy of Sciences 1169, 205–208.
Schlaug, G., Norton, A., Overy, K., & Winner, E. (2005). Effects of music training on the child’s
brain and cognitive development. The neurosciences and music II: From perception to
performance. Annals of the New York Academy of Sciences 1060, 219–230.
Shahin, A., Roberts, L., & Trainor, L. (2004). Enhancement of auditory cortical development by
musical experience in children. NeuroReport 15(12), 1917–1921.
Smith, H. (1953). From fish to philosopher. Garden City, NY: Doubleday Anchor.
Soley, G., & Hannon, E. (2010). Infants prefer the musical meter of their own culture: A cross-
cultural comparison. Developmental Psychology 46(1), 286–292.
Stefanics, G., Háden, G., Sziller, I., Balázs, L., Beke, A., & Winkler, I. (2009). Newborn infants
process pitch intervals. Clinical Neurophysiology 120(2), 304–308.
Stiles, J., Reilly, J., Levine, S., Trauner, D., & Nass, R. (2012). Neural plasticity and cognitive
development: Insights from children with perinatal brain injury. Oxford: Oxford University Press.
Tervaniemi, M., Kujala, A., Alho, K., Virtanen, J., Ilmoniemi, R., & Näätänen, R. (1999). Functional
specialization of the human auditory cortex in processing phonetic and musical sounds: A
magnetoencephalographic (MEG) study. NeuroImage 9(3), 330–336.
Tillmann, B., & McAdams, S. (2004). Implicit learning of musical timbre sequences: Statistical
regularities confronted with acoustical (dis)similarities. Journal of Experimental Psychology:
Learning, Memory, and Cognition 30(5), 1131–1142.
Trainor, L., Marie, C., Gerry, D., Whiskin, E., & Unrau, A. (2012). Becoming musically enculturated:
Effects of music classes for infants on brain and behavior. Annals of the New York Academy of
Sciences 1252, 129–138.
Trehub, S., Unyk, A., & Trainor, L. (1993). Adults identify infant-directed music across cultures. Infant
Behavior and Development 16(2), 193–211.
Turk-Browne, N., Jungé, J., & Scholl, B. (2005). The automaticity of visual statistical learning.
Journal of Experimental Psychology: General 134(4), 552–564.
Turner, R., & Ioannides, A. (2009). Brain, music and musicality: Inferences from neuroimaging. In S.
Malloch & C. Trevarthen (Eds.), Communicative Musicality (pp. 147–181). Oxford: Oxford
University Press.
Ullén, F., Hambrick, D., & Mosing, M. (2016). Rethinking expertise: A multifactorial gene–
environment interaction model of expert performance. Psychological Bulletin 142(4), 427–446.
Wade, B. (2009). Thinking musically: Experiencing music, expressing culture (2nd ed.). New York:
Oxford University Press.
Wakin, D. (2006). John Cage’s long music composition in Germany changes a note. New York Times,
May 6. Retrieved September 26, 2017 from
http://www.nytimes.com/2006/05/06/arts/music/06chor.html
Webb, S., Monk, C., & Nelson, C. (2001). Mechanisms of postnatal neurobiological development:
Implications for human development. Developmental Neuropsychology 19(2), 147–171.
Wilson, F. (1998). The hand: How its use shapes the brain, language, and human culture. New York:
Vintage Books.
Winkler, I., Háden, G., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the
beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471.
Wong, P., Roy, A., & Margulis, E. (2009). Bimusicalism: The implicit dual enculturation of cognitive
and affective systems. Music Perception 27(2), 81–88.
Yi, T., McPherson, G., Peretz, I., Berkovic, S., & Wilson, S. (2014). The genetic basis of music
ability. Frontiers in Psychology 5(658), 1–19.
Yi, T., McPherson, G., & Wilson, S. (2018). The molecular genetic basis of music ability and music-
related phenotypes. In D. Hambrick, G. Campitelli, & B. Macnamara (Eds.), The science of
expertise: Behavioral, neural, and genetic approaches to complex skill (pp. 283–303). New York:
Routledge.
Zull, J. (2002). The art of changing the brain: Enriching the practice of teaching by exploring the
biology of learning. Sterling, VA: Stylus Publishing.

1 To be more accurate, each leaf should have a different shape to represent the individuality of
various musical styles.
CHAPTER 3

CULTURAL DISTANCE: A COMPUTATIONAL APPROACH TO EXPLORING CULTURAL INFLUENCES ON MUSIC COGNITION

STEVEN J. MORRISON, STEVEN M. DEMOREST, AND MARCUS T. PEARCE

As with many psychological constructs, much of what has been reported in research on the cognitive processing of music is limited to data collected
from individuals from a small subset of cultural contexts (Henrich, Heine,
& Norenzayan, 2010). Further, the music that is typically employed for the
purposes of testing and exploration tends to be drawn from a similarly small
set of music practices and mostly consists of music constructed within the
Western diatonic framework. This includes Western classical music as well
as many North American and Western European folk and popular genres.
This is striking given that music is often regarded as a particularly
prominent and powerful manifestation of culture. Music is a common way
for individuals to assert cultural identity (Frith, 1996) and, as such, its value
arguably lies as much in its cultural and stylistic distinctiveness as in any
universal qualities it may possess.
Musical systems are somewhat closed in that each describes a set of
practices and conventions within which performances, pieces, or whatever
might be the appropriate musical “unit” are understood and evaluated.
These same practices and conventions can also serve as touchstones against
which individuals push in the spirit of creativity and innovation. People
come to inhabit a musical system due to various combinations of formal
learning—conservatory training, for example, as a means of gaining
knowledge of avant-garde art music—and informal learning—becoming
steeped in Cajun music as a result of growing up in the southern region of
the US state of Louisiana. In this chapter, our purpose is to emphasize
music as an intercultural phenomenon. As such, we will not focus on the
particularities of any specific music cultural tradition, nor will we examine
the concept of musical universals or the structural or acoustical candidates
for such a distinction. Rather, we will dedicate our attention to interactions
between music cultures, to what happens when music moves across cultural
boundaries.
From a sociological perspective, it has been useful to view the construct
of culture from a somewhat dichotomous perspective in which the notion of
the cultural insider can be contrasted with that of the cultural outsider
(Merton, 1972). Contemporary scholarship has drawn attention to the
complexity of this comparison and the considerable subjectivity that lies at
the heart of such an often oversimplified bifurcation (for an examination of
this issue in the field of music research, see Trulsson & Burnard, 2016).
Although music is often associated with cultural identity and therefore
susceptible to insider/outsider categorization, the ease with which an
individual interacts with any given culture’s music may be more nuanced.
Culture-based differences in the way listeners and performers interact with
and respond to music are often delineated by ethnic identity or geographical
location which are, in turn, generally treated as categorical constructs. As
such, they tend to oversimplify complex relationships, obscure considerable
within-group variability, and, most critically for the present purpose, do not
hold up well when considering a brain-based understanding of music
processing.
The cultural dimension of music provides context for critical tests of
music as a neurological phenomenon. The conclusion that particular brain
regions or neurological pathways are associated with human music
processing can be tested by examining whether such relationships are
evident across musical and cultural contexts. Likewise, the strength or
extent of neural activity may offer insights into the ways in which particular
music parameters function within musical systems.
Cultural roots of music practice also offer a critical test of principles of
formal musical learning. Teaching and learning practices often vary from
culture to culture and, given that they are often directed toward within-
culture music, likely interact with the idiosyncratic elements of the music
being taught. The prospect of learning—even at a fundamental level—an
unfamiliar music tradition as a performer or as a listener provides a context
in which culture-general learning strategies or pathways might be tested.
Similarly, it provides a framework in which “from the ground up” skill or
schema development can be observed, particularly through more informal
learning pathways in which exposure and self-directed discovery feature
prominently. At the neurological level, learning within a culturally
unfamiliar context might provide further evidence of experience-based
neural plasticity as well as potential interactions with already-learned music
conventions.
Given the incremental nature of music learning (formal or informal) and
the imprecision of insider/outsider classifications, cross-cultural studies of
both music perception and music learning would benefit from a more
nuanced view of cross-cultural differences in musical traditions, one that is
more continuous than categorical. Below we will explore the construct of
cultural distance as one potential approach. Cultural distance has been
examined at a societal level (Hofstede, 1983) through the development of a
suite of measures found to effectively account for culture-based variability
among workers. Since its publication, this construct has been used primarily
in the fields of business and economics; however, it has also been employed
in a number of cross-cultural designs including, occasionally, those related
to music (Baek, 2015). The principle of cultural distance—as a way to
conceptualize a culture-specific phenomenon in relation to its manifestation
in other cultures—is evident in research on more specific cultural practices,
as well. Kuhl (1991), for example, posited a “perceptual magnet effect” to
explain early language learning processes and the manner in which infants’
speech perception quickly gravitates toward commonly used phonemic
prototypes. Similarly, individuals demonstrate better memory (Golby,
Gabrieli, Chiao, & Eberhardt, 2001) as well as better recognition of
emotional expression (Chiao et al., 2008) for same-race faces. In both
instances, more differentiated face recognition correlated with increased
neural activity in the fusiform area and the amygdala, respectively.
In this chapter, we will provide a brief overview of cross-cultural
research in music cognition. We will consider studies that have compared
individuals’ interactions with culturally familiar and unfamiliar music,
those that have compared responses by participants from different cultural
backgrounds, and those that have employed fully comparative designs in
which participants of different cultural backgrounds interact with each
other’s music tradition. Among the previous research, we will summarize
some of our own recent work that has focused on identifying musical
parameters—specifically pitch and rhythm—that appear to make a
particularly strong contribution to the differences arising from cross-cultural
music interactions. Based on this work, we will then describe the construct
of cultural distance as a conceptual and analytical means of interpreting and
perhaps predicting cross-cultural responses to music.

REVIEW OF THE LITERATURE

The purpose of this review is to provide a brief overview of topics in music
cognition that have been explored through a cross-cultural lens. (For more
thorough treatment of this topic see reviews by Morrison & Demorest, 2009
and Patel & Demorest, 2013.) Researchers have explored the cross-cultural
perception of music emotion, preference, musical structures of scale and
key, rhythm and meter, and larger formal elements, as well as musical
memory. Participants in these studies have spanned the gamut from infancy
to adulthood, offering a picture of how culture influences music cognition
and how that influence changes with age and experience.

Cross-Cultural Explorations of Emotion


The single largest body of cross-cultural research in music cognition has to
do with the recognition of emotion in music. With the exception of a very
small number of studies (e.g., Egermann, Fernando, Chuen, & McAdams,
2015), the research has focused not on emotion induction, or how music
makes you feel, but on the ability to recognize emotional states present in
music stimuli. On the surface, this seems a curious choice given the
somewhat flexible nature of emotion recognition even within a cultural
group. However, emotion proves to be an excellent choice for exploring
cultural universality versus particularity in music cognition because
emotions refer not just to cognitive categories, but to physical states that
can be mimicked acoustically (Juslin, 2000; Juslin & Laukka, 2003). Cross-
cultural studies have explored Western listeners’ perceived emotion in
music of India (Balkwill, 2006; Balkwill & Thompson, 1999; Balkwill,
Thompson, & Matsunaga, 2004; Deva & Virmani, 1975; Gregory &
Varney, 1996; Keil & Keil, 1966), perception of Western music by non-
Western listeners, including Congolese pygmies (Egermann et al., 2015)
and the Mafa people of northern Cameroon (Fritz et al., 2009), Western
listeners' perception of Congolese pygmy music (Egermann et al., 2015),
and the cross-cultural communication of emotion involving performers and
listeners from Swedish, Indian, and Japanese music cultures (Laukka,
Eerola, Thingujam, Yamasaki, & Beller, 2013).
The findings can be summarized briefly as follows: A limited set of
emotions can be recognized in music regardless of cultural familiarity. The
emotions most consistently recognized (happy, sad, angry) vary in arousal
in ways that mimic physiological states. Other emotion recognition
judgments show influences of cultural familiarity. There are several theories
of emotion recognition that attempt to model this combination of
psychophysical and cultural cues in emotion recognition judgments. One of
the first theories was the Cue Redundancy Model (CRM) proposed by
Balkwill and Thompson (1999). According to this model, emotions in
music are decoded by attending to cues in the musical stimulus consisting
of psychophysical cues (sound intensity, tempo, melodic complexity, pitch
range, etc.) and culture-specific cues like the use of a certain instrument or
tonality to communicate a particular emotional state. This allows in-culture
listeners to use more information in their emotion recognition judgments,
but it also allows out-of-culture listeners to access basic emotional
information regardless of familiarity. The authors later proposed a more
refined model called Fractionating Emotional Systems or FES (Thompson
& Balkwill, 2010). FES attempts to explain how the culture-specific and
culture-general cues proposed in CRM function in development. They
propose that all emotion communication is built on a phylogenetic base of
shared cues involved in being human. As we age, we incorporate
ontogenetic cues for both music and language prosody into our emotional
vocabulary in a more culturally specific way. Fritz (2013) has proposed a
“dock-in” model of emotion recognition that is consistent with previous
models in stating “all music cultures contain both universal and culture-
specific features” (p. 514). It differs from previous models in that it
proposes that different cultures may “dock in” to only a subset of universal
music codes and that cross-cultural understanding can be explained in part
by the overlap in universal features employed. This notion of overlap
between cultures is similar to the cultural distance hypothesis discussed
below, though the basis for comparing cultural systems is based on a
simulation of the cognitive processing of musical structure rather than a
comparison of stimulus features.
When evaluating the findings of cross-cultural research in emotion
perception it is important to keep in mind that, of all of the studies listed,
only three (Egermann et al., 2015; Gregory & Varney, 1996; Laukka et al.,
2013) were fully comparative, that is, featuring both listeners and musical
stimuli from all cultures involved (Patel & Demorest, 2013). It may be
difficult to generalize these findings to other non-Western listeners or
musics. While the experience of emotions is a human universal, the notion
that music contains an emotional message rather than a functional or social
one, may be a somewhat culturally specific one. Given that most of the
studies cited here asked listeners from Western or Western-influenced
cultures to identify the emotions in non-Western music, and that much of
that music came from a single non-Western culture (India), it is difficult to
determine the cultural appropriateness of emotion judgments in music. As
Fritz (2013) observed in relation to one specific comparison involving
members of a society indigenous to a remote region of Cameroon, “the
musical expression of a variety of emotions like fearfulness and sadness,
while recognized in the Western stimuli by the Mafa participants, are—
according to interviews with Mafa individuals—never represented in the
traditional music of the Mafa people” (p. 512).
Cross-Cultural Explorations of Music Preference

Music preference research also explores affective responses to music, not in
terms of how music codes affect and emotion, but rather by examining the
conditions under which listeners experience pleasure when hearing music.
As LeBlanc proposed in his theoretical model, “Music preference decisions
are based upon the interaction of input information and the characteristics
of the listener, with input information consisting of the musical stimulus
and the listener’s cultural environment” (1982, p. 29). Music educators have
long been interested in music preference as a cross-cultural phenomenon in
part due to their commitment to providing a culturally diverse music
education. Researchers in music education have looked at how children’s
preference for music of other cultures develops and its relationship to
familiarity and other musical features.
Researchers have explored the musical qualities that might influence
preference judgments across cultures (Demorest & Schultz, 2004; Flowers,
1980; Fung, 1994; Morrison & Yeh, 1999; Shehan, 1981) and whether
instruction in a culture’s music can influence preference (Heingartner &
Hall, 1974; Shehan, 1985). As with the research on emotion, the bulk of
studies explore how Western listeners respond to non-Western music and
are not fully comparative. Findings show that preference for culturally
unfamiliar music can be increased with exposure—most of these studies
were conducted in formal educational settings among school-age and
college populations—but it does not extend to novel pieces from the
culture. Also, students prefer music that has properties of their culture such
as westernized arrangements of non-Western music (Demorest & Schultz,
2004). To summarize: the more culturally familiar music sounds, the more
likely listeners are to like it. However, while exposure
can increase preference for out-of-culture music, it does so only for learned
pieces and does not generalize to the style as a whole (Shehan, 1985).

Cross-Cultural Explorations of Musical Structure


One of the debates surrounding music and culture is the extent to which
there are deep structures in music that are relatively invariant across
cultures (cf. Brown & Jordania, 2013). Given humans’ shared biology and
the apparent human need to engage in musical behavior, it is plausible that
certain structural features would be present in most, if not all, musics.
Through cross-cultural explorations of musical structure, researchers have
sought to identify some of the structural features or perceptual processes
that work across cultures as well as the points at which music cognition
becomes more culturally bound.

Scale and Key Perception


Some of the earliest cross-cultural work done on scale perception included
infants (Lynch & Eilers, 1991, 1992; Lynch, Eilers, Oller, & Urbano, 1990;
Lynch, Eilers, Oller, Urbano, & Wilson, 1991; Lynch, Short, & Chua,
1995). In a series of studies the authors tested whether pitch deviations
could be detected when presented in the context of familiar (major/minor)
versus unfamiliar (pelog) scale contexts. They found that deviations were
better detected in familiar scale contexts for both adults and children, with
the exception of infants aged 6–12 months, who performed similarly in both.
these studies represent an important early attempt to examine scale
perception, they were hampered by methodological issues pertaining to the
way in which stimuli were created and the possible interference of absolute
pitch strategies.
There has been a significant amount of work examining whether tonal
relationships or tonal hierarchies (Krumhansl & Shepard, 1979) can be
perceived by out-of-culture listeners (Castellano, Bharucha, &
Krumhansl, 1984; Kessler, Hansen, & Shepard, 1984; Krumhansl, 1995;
Krumhansl, Louhivuori, Toiviainen, Jarvinen, & Eerola, 1999; Krumhansl
et al., 2000). The research has included music and participants from a
variety of cultures in the designs and the findings have been mixed. The
general sense is that out-of-culture listeners can employ more global
strategies involving tone proximity and frequency of occurrence within the
stimulus materials to mimic insider tonality judgments, but only up to a
point. When judgments become more complex (Krumhansl et al., 1999,
2000) or require specific cultural knowledge (Curtis & Bharucha, 2009),
cultural influences on tonal cognition become more pronounced. This
suggests that tonality perception, like emotion perception, provides both
general and specific cues for listeners depending on their cultural
background.
Two recent fully comparative studies (Raman & Dowling, 2016, 2017)
demonstrate the relative influence of global versus cultural factors in
tonality judgments. In a series of four experiments across two studies the
authors explored the sensitivity of Western and Carnātic trained musicians
to two types of modulations in Carnātic melodies. The rāgamālikā
modulation is more typical in Carnātic music and corresponds to the less
frequent parallel minor (C major to C minor) modulation in Western music.
The grahabēdham modulation is less common in Carnātic music, but more
common in Western music as it corresponds to a modulation to the relative
minor (C major to A minor). They tested modulation identification (both
accuracy and speed), tonal profiles, and active probe tone response during
modulation. While results varied somewhat across the different
experiments, they found, in general, that cultural background influenced
speed and accuracy in modulation detection with Indian listeners more
accurate overall. Response time varied by the cultural familiarity of the
modulation, with Indians faster for rāgamālikās and Westerners faster for
grahabēdhams. They also found that Western musicians’ tone profile
responses, while relying on global information about frequency and
distribution of tones, were sometimes influenced by a misapplication of
Western major/minor judgments in Carnātic tone profiles. The authors
reference the Cue Redundancy Model reviewed above as a possible
explanation for the mix of global and cultural cues employed by both
groups of musicians.
Other approaches to cross-cultural tonal cognition have included event-
related potential (ERP) responses to tasks involving out-of-culture scale
violations (Neuhaus, 2003; Renninger, Wilson, & Donchin, 2006) and
melodic expectancy violations (Demorest & Osterhout, 2012). In general,
listeners were less sensitive to out-of-culture scale deviations unless they
could detect the deviations using a culture-specific strategy. Another area of
research has addressed whether linguistic background shapes musical
ability. Researchers have found that tonal language speakers are generally
better at general pitch discrimination (Giuliano, Pfordresher, Stanley,
Narayana, & Wicha, 2011; Pfordresher & Brown, 2009; Wong et al., 2012)
and even at pitch accuracy in singing (Pfordresher & Brown, 2009) than
those from non-tonal linguistic backgrounds. The authors suggest that fine-
grained pitch processing is central to the acquisition of a tonal language and
therefore better developed among these individuals (Pfordresher & Brown,
2009).

Rhythm and Meter Perception


Rhythm and meter perception has received much more attention in music
cognition over the last ten to fifteen years, and with that attention has come
a commensurate increase in cross-cultural exploration. Researchers have
examined when infants’ responses to meter become culturally biased
(Hannon & Trehub, 2005a, 2005b; Soley & Hannon, 2010), the influence of
linguistic rhythm on rhythm perception (Hannon, 2009; Iversen, Patel, &
Ohgushi, 2008; Patel & Daniele, 2003; Yoshida et al., 2010), and cultural
influences on rhythmic perception and performance (Cameron, Bentley, &
Grahn, 2015; Drake & Ben El Heni, 2003; Polak, London, & Jacoby, 2016;
Stobart & Cross, 2000).
In all of these investigations researchers have found varying degrees of
cultural influence in rhythm processing in adults and infants, with infants
demonstrating a preference for the meters of their home culture as early as
4–8 months (Soley & Hannon, 2010), even when those meters were more
complex. Unlike adults, monocultural infants were equally responsive to
metric violations within both familiar and unfamiliar meters (Hannon &
Trehub, 2005a), and infants as old as 12 months demonstrated enough
flexibility to “reset” their perceptual responses with sufficient exposure to
an unfamiliar meter (Hannon & Trehub, 2005b). While language
acquisition has often been a focus of research on tonal cognition, several studies have
found relationships between the rhythmic qualities of language and musical
rhythms (Hannon, 2009; Patel & Daniele, 2003) and rhythm grouping
(Iversen et al., 2008; Yoshida et al., 2010) of instrumental music from the
culture.
In a recent fully comparative study, Cameron and colleagues (2015)
tested Western-born and East African musicians’ performance on three
rhythmic tasks: discriminating between two patterns, reproducing rhythm
patterns, and tapping a steady beat to rhythmic patterns. Patterns were
drawn from East African and Western music and the authors predicted that
musicians would show a cultural advantage for all three tasks. As with
previous cross-cultural work, however, they found that while the two
performance tasks (rhythm reproduction and beat tapping) showed an in-
culture advantage, the groups were equally adept at rhythm discrimination.
This study was particularly noteworthy for including both perception and
performance measures, as many studies feature one or the other.

Phrasing and Form


Researchers have explored the influence of enculturation on phrase
boundary perception (Nan, Knösche, & Friederici, 2006; Nan, Knösche,
Zysset, & Friederici, 2008) and musical tension (Wong, Chan, Roy, &
Margulis, 2011) through neuroscientific measures. Two fully comparative
ERP studies (Nan, Knösche, & Friederici, 2009; Nan et al., 2006) tested
Chinese and German musicians’ and non-musicians’ ability to detect phrase
boundaries cross-culturally in unfamiliar excerpts. Results showed a clear
in-culture advantage on the behavioral task, and early positive ERP
components (100–450 ms) distinguished the two groups of participants for
Chinese music (familiar only to the Chinese participants). Both groups
exhibited a Closure Positive Shift, neurologically suggesting they were
sensitive to phrase boundaries in both cultures. A follow-up study with only
German participants used an fMRI paradigm (Nan et al., 2008) to scan
participants while they heard phrased and unphrased examples of Western
and Chinese melodies that they were asked to classify by culture. All
participants were better at recognizing in-culture examples and the
researchers found that participants exhibited generally higher activation
when listening to the Chinese melodies in regions associated with attention
and auditory processing suggesting that out-of-culture music is more
demanding for those processes.
In most of the studies reviewed thus far, there are differences between in-
culture and out-of-culture responses to a variety of musical tasks, from
emotion and preference to basic musical structures. However, the results are
almost always tempered by an awareness that some aspects of music
processing can be done without relying on culturally specific strategies,
using more global cues and responding to familiar sounding aspects of
unfamiliar cultures. In the next section, we review a series of studies on
cross-cultural music memory that have led us to propose a possible
explanatory framework for musical enculturation.
Cross-Cultural Explorations of Music Memory

In a series of experiments over the last decade or so we have used
recognition memory as a way of assessing how effectively in-culture and
out-of-culture music is processed. The studies have explored both
behavioral (Demorest, Morrison, Beken, & Jungbluth, 2008; Morrison,
Demorest, Campbell, Bartolome, & Roberts, 2012; Morrison, Demorest, &
Stambaugh, 2008) and neurological (Demorest et al., 2010; Morrison,
Demorest, Aylward, Cramer, & Maravilla, 2003) responses to culturally
familiar (Western or Turkish) and culturally unfamiliar (Turkish or Chinese)
music. In addition, we explored whether memory performance was
influenced by training (Demorest et al., 2008; Morrison et al., 2012) or
complexity (Morrison et al., 2008). The primary finding of this research has
been that there is an “enculturation effect,” or cultural bias, in listening such
that culturally unfamiliar music is consistently less effectively processed
even when considering matters of age, training, and complexity. Further,
this effect appears in both Western-born and non-Western-born listeners. This
finding was strengthened by the work of another group that tested memory
and tension judgments in monomusical and bimusical participants in the
United States and India (Wong, Roy, & Margulis, 2009) and found a similar
recognition memory effect for monomusical, but not for bimusical,
participants. It should be noted that in most cases out-of-culture recognition
memory was above chance and demonstrated improvement with repeated
testing (Morrison et al., 2012); however, the observed difference between
in- and out-of-culture memory performance remained.
Despite the consistency of the enculturation effect, we did not have a
good explanation for its cause: that is, what aspect of out-of-culture music
was interfering with listeners’ ability to hear and remember it? What was so
unfamiliar about culturally unfamiliar music? Was it timbre, tonality,
rhythm, melody, or some combination? In a recent study (Demorest,
Morrison, Nguyen, & Bodnar, 2016), we sought to strip away contextual
variables in an attempt to attenuate or eliminate the effects of enculturation
on memory performance. We also explored the possible influence of music
preference as a variable influencing attention and memory. Western-born
participants (N = 128) were randomly assigned to conditions in which they
heard the same music excerpts presented in one of three contexts: full
instrumental ensemble (the original version), a single-voice melody on
piano, or a single-voice isochronous pitch sequence also on piano. In each
condition participants heard a block of three longer Western art music
excerpts and a block of three longer Turkish art music excerpts in a
counterbalanced order. After each example, they were asked to rate their
preference for the excerpt. After each set of three examples they completed
a twelve-item recognition memory test with six targets (taken from the
excerpts heard previously) and six foils (taken from a musically different
and previously unheard part of the same pieces). Regardless of the listening
condition, participants demonstrated superior memory for in-culture
examples, suggesting that none of the contextual changes eliminated the memory
disadvantage for out-of-culture music. In-culture memory performance was
influenced by context, but out-of-culture memory performance was not.
Preference was higher overall for in-culture music, but there was no
significant correlation between preference scores and memory performance
across cultures. This suggested that the process of enculturation involved a
kind of informal learning of deeper structure involving commonly heard
sequences of pitch relationships.
Based on these findings we concluded, “If our understandings of out-of-
culture music are filtered through in-culture expectations, then a
comparison of the statistical properties of a listener’s home culture with that
of an unfamiliar culture might yield predictive information about
subsequent memory performance” (Demorest et al., 2016, p. 597). We
labeled the notion of a statistical comparison between music cultures across
one or more selected parameters as cultural distance (Demorest &
Morrison, 2016) in an effort to convey the potentially continuous rather
than dichotomous relationship among music cultural practices. In the next
section, we will discuss the construct of cultural distance as an explanatory
framework and present illustrative work in cross-cultural corpus analysis
that lends support to its central premise.

CULTURAL DISTANCE

Throughout the body of research that examines cross-cultural cognitive
processes associated with music, the logic of the underlying design
typically sets individuals and/or music examples from one cultural
background in contrast with individuals and/or music from another cultural
background. Such designs impose a dichotomous relationship between that
which is culturally familiar or culturally similar and that which is unfamiliar
or dissimilar. On one level, this might be seen as reflecting the in-group and
out-group dynamic. However, such bifurcation blurs the fluidity that
characterizes musical interactions (Cross, 2008). That is, from the point of
view of an individual encultured in a particular music tradition, the music of
a culturally unfamiliar tradition may seem surprisingly accessible in one
case or virtually impenetrable in another. It is this distinction—and the
continuum of increasing or decreasing similarity from one’s own music—
that we propose can be productively explored using the concept of cultural
distance (Demorest & Morrison, 2016).
The way in which an individual interacts with music is mediated by the
properties common to the prevailing music of that individual’s culture. The
music on which one was “brought up” provides the framework by which
subsequent music experiences are judged as typical or atypical. Put another
way, the statistical likelihood of events that characterize the music of one’s
home culture governs not only the way in which one interacts with novel
pieces from within that same cultural tradition, but also with music from
culturally unfamiliar music traditions. One scans for common and familiar
patterns both where they are likely to be found and where they may not be
likely at all. This situation suggests a way in which an individual’s
responses to and facility with culturally unfamiliar music may be
interpreted or, indeed, predicted. Specifically, we have hypothesized that
the degree to which the musics of any two cultures differ in the statistical patterns of pitch
and rhythm will predict how well a person from one of the cultures can process the music of
the other. (Demorest & Morrison, 2016, p. 189)

Based on this cultural distance hypothesis, music cultures with considerable
overlap of patterns would likely allow for more efficient and effective
processing that might be observed through such responses as recognition
memory, error detection, phrase parsing, or metric identification, to name a
few.
In order to test this proposition, we first need a way to ascertain the
statistical properties of structural parameters considered typical of a given
culture’s music. IDyOM (Information Dynamics of Music; Pearce, 2005) is
a computational model of auditory expectation that uses statistical learning
and probabilistic prediction to acquire and process internal representations
of the structure of a musical style. Using the intervallic content of melody
as an illustration, IDyOM generates a probability distribution over the set of
possible intervals leading to each note in the melody. These distributions are
conditioned upon the preceding musical context and the prior musical
experience of the model. The probability of
each note can be log-transformed to yield its information content according
to the model (MacKay, 2003), which reflects how unexpected the model
finds a note in a particular context. IDyOM is a variable-order Markov
model (Begleiter, El-Yaniv, & Yona, 2004; Bell, Cleary, & Witten, 1990;
Bunton, 1997; Cleary & Teahan, 1997) which uses a multiple-viewpoint
framework (Conklin & Witten, 1995) to represent music. This means that
IDyOM has several features that go beyond the capabilities of standard
Markov (or n-gram) models: first, it combines predictions from models of
different order (using different length contexts for prediction); second, it
adapts the maximum order used depending on the context; third, it
combines predictions from a long-term model (intended to reflect effects of
long-term exposure to a musical style) and a short-term model (reflecting
dynamic learning of repeated structure within a given piece of music); and
fourth, it is able to combine models of different representations of the
musical surface (e.g., chromatic pitch, pitch contour, pitch interval and scale
degree for predicting pitch; duration, duration ratio, duration contour for
predicting rhythm).
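To make this concrete, the sketch below illustrates the core statistical idea in Python: train a simple model of pitch-interval transitions on a corpus and score a melody's mean information content as -log2 of each note's probability. It is a deliberately minimal stand-in, with hypothetical function names and add-alpha smoothing of our own choosing, not IDyOM's variable-order, multiple-viewpoint implementation.

```python
import math
from collections import defaultdict

def train_interval_model(corpus):
    """Count transitions from one melodic pitch interval to the next.

    `corpus` is a list of melodies, each a list of (MIDI) pitches.
    IDyOM combines models of many orders and viewpoints; this single
    first-order model is illustrative only."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in corpus:
        intervals = [b - a for a, b in zip(melody, melody[1:])]
        for context, nxt in zip(intervals, intervals[1:]):
            counts[context][nxt] += 1
    return counts

def information_content(model, melody, alpha=1.0, alphabet=25):
    """Mean information content, -log2 p(note | context), of a melody's
    intervals, with add-alpha smoothing so that intervals never seen in
    training still receive non-zero probability."""
    intervals = [b - a for a, b in zip(melody, melody[1:])]
    ics = []
    for context, nxt in zip(intervals, intervals[1:]):
        seen = model.get(context, {})
        total = sum(seen.values()) + alpha * alphabet
        p = (seen.get(nxt, 0) + alpha) / total
        ics.append(-math.log2(p))  # high values mark unexpected notes
    return sum(ics) / len(ics) if ics else 0.0
```

Under such a model, a melody built from interval patterns common in the training corpus receives a low mean information content, while one built from rare patterns receives a high value.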
IDyOM has been shown to predict accurately Western listeners’ pitch
expectations in behavioral, physiological, and EEG studies (e.g., Egermann,
Pearce, Wiggins, & McAdams, 2013; Hansen & Pearce, 2014; Omigie,
Pearce, & Stewart, 2012; Omigie, Pearce, Williamson, & Stewart, 2013;
Pearce, 2005; Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010). In
many circumstances, IDyOM provides a more accurate model of listeners’
pitch expectations than static rule-based models (e.g., Narmour, 1990;
Schellenberg, 1997). Rule-based models consist of fixed rules (e.g., a small
interval is expected to be followed by another small interval in the same
direction) which cannot be modified by experience and therefore do not
predict any differences in perception between music cultures. Although
such models may describe the perception of listeners from a given culture
they do not constitute accurate models of cognition since they cannot
account for the observed effects of enculturation reviewed above, and they
often prove less accurate than IDyOM in accounting for within-culture
perception (Hansen & Pearce, 2014; Pearce, 2005; Pearce, Ruiz, et al.,
2010). Furthermore, IDyOM accounts well for other psychological
processes in music perception, including similarity perception (Pearce &
Müllensiefen, 2017), recognition memory performance (Agres, Abdallah, &
Pearce, 2018), phrase boundary perception (Pearce, Müllensiefen, &
Wiggins, 2010), and aspects of emotional experience (Egermann et al.,
2013; Gingras et al., 2015; Sauvé, Sayad, Dean, & Pearce, 2017).
To illustrate the construct of cultural distance, we trained three IDyOM
models to simulate listeners with enculturation in three different musical
styles: first, a Western model trained on a corpus of European folk songs to
simulate the perception of a Western listener enculturated in Western tonal
music; second, a Chinese model trained on a corpus of Chinese folk songs
to simulate the perception of a Chinese listener enculturated in Chinese
traditional music; and third, a Turkish model trained on a corpus of Turkish
Makam melodies to simulate the perception of a Turkish listener
enculturated in Turkish Makam music. The corpus of Western tonal music
consists of 769 German folk songs from the Essen Folk Song Collection
(Schaffrath, 1992, 1994, 1995), extracted from the datasets fink and erk.
The corpus of Chinese music consists of 858 Chinese folk songs from the
Essen Folk Song Collection, extracted from the datasets han and natmin.
The corpus of Turkish Makam music consists of 805 Makam melodies
extracted from the SymbTR database (Karaosmanoğlu, 2012).1 See Table 1
for further details of the corpora used to train the model simulations.
Empty and non-monophonic compositions were first removed from all
corpora. Furthermore, we removed duplicate compositions using a
conservative procedure that considers two compositions duplicates if they
share the same opening four melodic pitch intervals regardless of rhythm.
The pitch system used in Turkish Makam music is microtonal and does not
precisely map onto the Western (approximately) twelve-fold equal division
of the octave (Bozkurt, Ayangil, & Holzapfel, 2014). Since IDyOM’s pitch
matching is exact this would cause the Western and Chinese models to
assign zero probabilities to every pitch in the Turkish corpus. A simple
(though not unproblematic) way of addressing this issue is to round each
pitch in the Turkish corpus to the nearest semitone, which enables
comparisons to be made between the corpora. For studies with Western
participants, this corresponds to the assumption that listeners perceive
microtonal pitches categorically, aggregating microtonal pitches to the
nearest semitone category. There is some evidence that listeners do in fact
perceive pitch categorically in this way, at least in certain circumstances
(Burns & Campbell, 1994; Perlman & Krumhansl, 1996). In this example,
any responses among Western listeners that demonstrated differences
between Western melodies and these “pitch-Westernized” Turkish melodies
would underestimate the dissimilarity experienced between the two corpora,
conservatively producing type II errors (false negatives) rather than type I
errors (false positives).
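A minimal sketch of these two preprocessing steps, assuming each melody is a list of pitches in (possibly fractional) MIDI semitones and using illustrative helper names:

```python
def deduplicate(corpus):
    """Drop compositions whose opening four melodic pitch intervals
    match those of an already-kept composition (the conservative
    duplicate criterion described above; rhythm is ignored)."""
    seen, kept = set(), []
    for melody in corpus:
        key = tuple(round(b - a, 2) for a, b in zip(melody, melody[1:5]))
        if key not in seen:
            seen.add(key)
            kept.append(melody)
    return kept

def round_to_semitones(melody):
    """Round microtonal pitches (fractional MIDI semitones) to the
    nearest equal-tempered semitone, as applied to the Turkish corpus."""
    return [round(p) for p in melody]
```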
Each model was used to make both within-culture and between-culture
predictions. For the within-culture predictions, IDyOM estimates the
information content of every event in every composition in the corpus,
using ten-fold cross-validation (Kohavi, 1995) to create training and test
sets from the same corpus. For between-culture predictions, IDyOM is first
trained on the within-culture corpus (e.g., the Western corpus for the
Western model) and then estimates the information content of every note in
every composition in a different corpus representing the comparison culture
(e.g., the Chinese or Turkish corpus for the Western model). IDyOM was
configured to use only its long-term model (or LTM, simulating long-term
exposure to a musical style) trained on the appropriate corpus; the short-
term model (simulating dynamic learning of repeated patterns within a
piece of music) was not used. Other than these differences regarding
training corpora, all models were configured identically using the default
parameters described in Pearce (2005). In all cases, information content was
averaged across notes for each composition yielding a value representing
the mean unpredictability of that composition for a given model.
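Continuing the sketch, and reusing the hypothetical train_interval_model and information_content helpers from above, the two procedures might look as follows (IDyOM's actual configuration and cross-validation machinery differ in detail):

```python
def within_culture_ic(corpus, k=10):
    """Ten-fold cross-validation: each composition's mean information
    content comes from a model trained on the other folds of its own
    corpus, so no melody is predicted by a model that saw it."""
    ics = [0.0] * len(corpus)
    for fold in range(k):
        train = [m for j, m in enumerate(corpus) if j % k != fold]
        model = train_interval_model(train)
        for j in range(fold, len(corpus), k):
            ics[j] = information_content(model, corpus[j])
    return ics

def between_culture_ic(home_corpus, other_corpus):
    """Train on the full home corpus, then score every composition in
    the comparison corpus (simulating an enculturated listener hearing
    out-of-culture music)."""
    model = train_interval_model(home_corpus)
    return [information_content(model, m) for m in other_corpus]
```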
For each comparison between cultures (Western vs. Turkish, Western vs.
Chinese, Turkish vs. Chinese), we then plot the data for each composition
in the two corresponding corpora: information content for one model is
plotted on the abscissa while information content for the second model is
plotted on the ordinate. The line of equality (x = y) indicates equivalence
between the two models. Compositions lying on this line do not distinguish
the two cultures, being equally predictable for each model; in other words,
they should be equally familiar and predictable to listeners enculturated in
either of the two cultures. Positions near the origin represent compositions
that are simple within both cultures—that is, they are highly predictable
insofar as most incidences of a selected feature are quite common—while
positions far from the origin represent compositions that are complex—
unpredictable, uncommon—within both cultures. Positions further away
from the line of equality represent compositions that are predictable for the
simulated model of one culture but unpredictable for the simulated model of
the other culture. Distance from the line of equality, therefore, provides a
quantitative measure of cultural distance based on information-theoretic
modeling of enculturation in musical styles. Fig. 1A illustrates how cultural
distance is computed for a comparison between IDyOM models trained on
the Western corpus and the Chinese corpus using a pitch interval
representation. By rotating the data points through 45°, Fig. 1B shows the
same data with Cultural Distance on the ordinate and culture-neutral
complexity on the abscissa. In this example, IDyOM correctly classifies 98
percent of the folk songs by culture (Chinese vs. Western).
FIGURE 1. Modeling cultural distance between the Western and Chinese corpora using a pitch
interval representation. A: The information content of the Western model plotted against that of the
Chinese model with the x = y line shown. B: A 45° rotation of A such that the ordinate represents
cultural distance and the abscissa culture-neutral complexity. For each style, the ten compositions
with most extreme cultural distance are highlighted.
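The rotation can be written out explicitly. A minimal sketch, assuming x and y are a composition's mean information content under the two models being compared:

```python
import math

def rotate_45(x, y):
    """Change of coordinates from Fig. 1A to Fig. 1B. Given a
    composition's mean information content under two models (x, y),
    returns (culture-neutral complexity, cultural distance). Points on
    the line x = y map to a cultural distance of zero; the 1/sqrt(2)
    scaling follows from a pure rotation, and the published measure may
    normalize differently."""
    return (x + y) / math.sqrt(2), (y - x) / math.sqrt(2)
```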

As mentioned above, IDyOM is capable of modeling different attributes
of the musical surface and combining the predictions made by those
models. For each comparison between cultures, cultural distance is
computed for models predicting pitch structure alone (using a
representation of pitch interval), rhythmic structure alone (using a
representation of inter-onset interval), and for models using a combined
representation of pitch and rhythmic structure (for which a melodic event is
represented as a pair of values, one for the preceding pitch interval and one
for the preceding inter-onset interval).
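As a sketch of this linked representation (again with illustrative names), each event becomes a pair that the same sequence-modeling machinery can treat as a single symbol:

```python
def linked_events(pitches, onsets):
    """Encode each melodic event as a (pitch interval, inter-onset
    interval) pair. A model over these pairs predicts pitch and rhythm
    jointly; dropping one element of the pair recovers the pitch-only
    or rhythm-only representations used above."""
    return [(p2 - p1, o2 - o1)
            for (p1, o1), (p2, o2) in zip(zip(pitches, onsets),
                                          zip(pitches[1:], onsets[1:]))]
```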
For each cultural comparison and each of the three representations, ten
compositions with the highest Cultural Distance were selected for each of
the two cultures compared. These compositions are highlighted in Fig. 1 for
the pitch interval representation. Table 2 shows the mean Cultural Distance
values for each combination of cultural comparison and model
representation for the corpus as a whole and for the ten selected
compositions. Note that this Cultural Distance measure reflects both
corpora included in the comparison. Thus, there is only partial overlap
between the different comparisons (e.g., five of the ten Chinese songs
selected in the German comparison are the same as those selected in the
Turkish comparison; five for the two Turkish comparisons and two for the
two German comparisons). Note also that this Cultural Distance measure
may be asymmetrical such that one culture is on average more distant from
the second than the second is from the first (e.g., in the case of the Western
and Chinese comparison, see Table 2). For all three cultural comparisons, as
shown in Table 2, the IDyOM simulations produce positive correlations
between the cultures for rhythm predictions, much more so than for pitch
predictions, which yield no correlation (Western/Chinese), a small positive
correlation (Western/Turkish), or a moderate negative correlation
(Turkish/Chinese). This suggests that pitch is a more important indicator of
cultural distance between these styles than rhythm. For each of the three
representations used in each of the three comparisons, one-sample t-tests
indicate that the mean cultural distance is significantly different from zero
(p < 0.01) for both corpora involved in the comparison.
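Once per-composition cultural distances are in hand, the selection of extreme compositions described above is straightforward; a sketch, assuming cds holds one signed cultural-distance value per composition of a given corpus:

```python
def most_distant(compositions, cds, n=10):
    """Return the n compositions with the largest cultural distance,
    applied separately to each corpus in a comparison. Averaging `cds`
    within each corpus gives the per-corpus means of Table 2, which
    need not be symmetric across the two directions of comparison."""
    ranked = sorted(zip(compositions, cds), key=lambda pair: pair[1],
                    reverse=True)
    return [comp for comp, _ in ranked[:n]]
```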

Limitations

The analysis of two or more types of music along any given musical
parameter (for example, pitch as in the illustration above) or combination of
parameters imposes the assumption that such an analysis is valid within
each music type. While a music tradition such as Western art music (at least
that from or deriving from the common practice period of approximately
the mid-seventeenth to early twentieth centuries) has a well-established
history of analysis and interpretation based, in part, on both sequential and
concurrent pitch interval relationships, the same may not be said of other
traditions. Tools such as IDyOM offer the flexibility to examine cultural
distance according to a variety of individual or combinations of musical
parameters. Nevertheless, any specific configuration runs the risk of
privileging one parametric hierarchy over another. Thus, in terms of cross-
cultural research, such statistical models will virtually always impose the
perspective of a particular music tradition, at least to some degree.
This limitation has ramifications for fully comparative studies in that the
degree to which a parameter holds primacy for one set of participants may
not hold true for the other. Much as emotion recognition, so familiar to the
experience of westernized listeners, did not figure meaningfully in the
music tradition of the Mafa (Fritz, 2013), the statistical likelihood of
patterns of pitch may contribute less to musical thinking among Rwandans,
and more among North Americans (as in Cameron et al., 2015), than does the
complexity of patterns of rhythm. In this way, cultural distance is a tool
through which one can isolate norms for one or more musical parameters as
well as provide a particular perspective on musical meaning-making.
A related limitation is that IDyOM currently requires symbolic score-
like input in which notes are represented as discrete events with discrete
properties (e.g., onset time, pitch). This does not readily accommodate
musical cultures which depend heavily on timbral, dynamic, or textural
changes. The same is true of musical cultures that have no written tradition,
where the distinction between composition and performance is blurred or
nonexistent or where music is inextricably combined with other modes of
communication (Cross, 2014).
Despite the emphasis here on the advantageous aspects of familiarity,
novelty is, without question, an attractive characteristic of music. Models of
musical expectancy (e.g., Huron, 2006; Meyer, 1956) describe the interest
inherent in and stimulation derived from that which is unfamiliar and
surprising in music. The constant curiosity for new musical ideas suggests
ongoing willingness to explore less “predictable” musical scenarios. With
much of the world’s music readily—and in many cases instantly—
accessible, such willingness leads as easily to unfamiliar music traditions as
to the remoter corners of one’s own. We have used cultural distance as a
means of explaining processing difficulties (as operationalized by
recognition memory); however, it is equally viable as a tool to examine
such positive aspects of music experience as interest and surprise. Although
Cook (2008) was referring specifically to musicologists, his description can
arguably be construed more broadly: “Practically all of us are at least to
some degree musically multilingual … as a result one understands even the
tradition(s) in which one is most ‘at home’ as options amongst other
options, understands them in relation to other traditions rather than as
absolutes” (p. 63).

Research on cross-cultural music interactions has demonstrated that
responses to culturally familiar and unfamiliar music, as well as responses
by individuals encultured in different music traditions, can be either
remarkably similar or strikingly different depending on the task and the
music presented. Theoretical models such as Cue Redundancy (Balkwill &
Thompson, 1999) or Fritz’s (2013) dock-in model, have framed cross-
cultural music interactions as consisting of culture-general and culture-
specific components. The manner in which these models account for areas
of overlap between music cultures and distinctions unique to each music
culture fits well with recent research findings as well as with the concept of
cultural distance. However, absent from their construal of shared and
unique features is a middle ground of “culturally specific but similar”
components that, while mutually proprietary and uniquely meaningful to
each culture, may be somewhat accommodating to strategies for listening,
performing, and meaning-making deployed by individuals from outside the
culture.
This similar-but-not-shared aspect of the cultural distance construct can
help account for memory responses, reported above, to out-of-culture music
that were less successful than for in-culture music but were still above
chance (e.g., Demorest et al., 2008). Likewise, it also provides an
explanation in cases where listeners have applied familiar listening
strategies to culturally unfamiliar music only to encounter ultimate
confusion (e.g., Curtis & Bharucha, 2009). Eventually, the trajectory of
complexity within a culturally unfamiliar system takes a listener or
performer past the point where learned patterns can accommodate. On the whole,
responses to musics that demonstrate considerable overlap may show
greater consistency than those to musics with very few points of
commonality. Thus, one can make a distinction between the apparent “ease”
with which an individual can move between music cultures and the more
likely case of greater opportunities afforded by some unfamiliar music
cultures to successfully deploy familiar strategies.
This is potentially useful for neurological investigations of music
processing. Responses to culturally unfamiliar music have generally been
reported to differ more by degree than by presence or location. That is,
music appears to recruit similar neural systems regardless of its cultural
familiarity, though the strength or extent of that activity may differ
according to the music encountered (e.g., Nan et al., 2008; Demorest et al.,
2010). The model of cultural distance is a tool that provides a continuous
rather than categorical conceptualization of cross-cultural music research
designs. Such a correlational approach may lend itself well to the fine-
grained, incremental, and plastic manner in which neurological processes
and pathways develop and are deployed.
We are not suggesting that through the learning of an unfamiliar array of
patterns one can gain access to the full, rich experience of culturally
situated musical contexts. Music represents a broad range of activities and
relationships that may only have tenuous connections to structural
parameters like melodic or rhythmic intervals. Much of music’s meaning is
derived from where, when, and how it occurs quite apart from how it is put
together (Small, 1998). Rather, we suggest that cultural distance may be a
useful lens through which specific aspects of the cognitive processing of
music—particularly musical structure—may be predicted, investigated,
analyzed, and interpreted.
Much of the research on cross-cultural musical interactions has involved
measurement of such things as memory, affective response, detection of
differences, verbal or written description, and preference. In virtually all
cases these outcomes were prompted through listening tasks, a way of
experiencing music that, while ecologically valid and obviating any need
for previous training, is covert and arguably accommodating of varied
interpretations and strategies. In contrast, investigations of cross-cultural
performance contexts may yield new insights into the ways in which
individuals navigate unfamiliar musical terrain. More directly observable
performance-based interactions may shed additional light on the processes
by which one grapples with, accommodates, or eventually gains facility
with musics that are differently organized.
Earlier we posed the question of what happens when music crosses
cultural boundaries. The construct of cultural distance provides a more
graduated, incremental way of conceptualizing the relationship between the
familiar and the unfamiliar. It allows for the fluidity characteristic of
musical interactions, recognizes the porous nature of music categorization,
and accounts for the variability found within any music tradition. For
research purposes, cultural distance offers a way by which dichotomous
models of music—insider/outsider, familiar/unfamiliar, own/other—can be
refined to test a more nuanced picture of musical meaning-making. In this
way, cross-cultural music interactions might be viewed less as the crossing
of a boundary and more as the undertaking of a trip.

REFERENCES
Agres, K., Abdallah, S., & Pearce, M. T. (2018). Information-theoretic properties of auditory
sequences dynamically influence expectation and memory. Cognitive Science 42(1), 43–76.
Baek, Y. M. (2015). Relationship between cultural distance and cross-cultural music video
consumption on YouTube. Social Science Computer Review 33(6), 730–748.
Balkwill, L. L. (2006). Perceptions of emotion in music across cultures. Paper presented at Emotional
Geographies: The Second International & Interdisciplinary Conference, May, Queen’s University,
Kingston, Canada.
Balkwill, L. L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of
emotion in music: Psychophysical and cultural cues. Music Perception 17(1), 43–64.
Balkwill, L. L., Thompson, W. F., & Matsunaga, R. (2004). Recognition of emotion in Japanese,
Western, and Hindustani music by Japanese listeners. Japanese Psychological Research 46(4),
337–349.
Begleiter, R., El-Yaniv, R., & Yona, G. (2004). On prediction using variable order Markov models.
Journal of Artificial Intelligence Research 22, 385–421.
Bell, T. C., Cleary, J. G., & Witten, I. H. (1990). Text compression. Englewood Cliffs, NJ: Prentice
Hall.
Bozkurt, B., Ayangil, R., & Holzapfel, A. (2014). Computational analysis of Makam music in
Turkey: Review of state-of-the-art and challenges. Journal of New Music Research 43(1), 3–23.
Brown, S., & Jordania, J. (2013). Universals in the world’s musics. Psychology of Music 41(2), 229–
248.
Bunton, S. (1997). Semantically motivated improvements for PPM variants. The Computer Journal
40(2–3), 76–93.
Burns, E. M., & Campbell, S. L. (1994). Frequency and frequency-ratio resolution by possessors of
absolute and relative pitch: Examples of categorical perception. Journal of the Acoustical Society
of America 96(5), 2704–2719.
Cameron, D. J., Bentley, J., & Grahn, J. A. (2015). Cross-cultural influences on rhythm processing:
Reproduction, discrimination, and beat tapping. Frontiers in Psychology 6, 366. Retrieved from
https://doi.org/10.3389/fpsyg.2015.00366
Castellano, M. A., Bharucha, J. J., & Krumhansl, C. L. (1984). Tonal hierarchies in the music of
north India. Journal of Experimental Psychology: General 113(3), 394–412.
Chiao, J. Y., Iidaka, T., Gordon, H. L., Nogawa, J., Bar, M., Aminoff, E., … Ambady, N. (2008).
Cultural specificity in amygdala response to fear faces. Journal of Cognitive Neuroscience 20(12),
2167–2174.
Cleary, J. G., & Teahan, W. J. (1997). Unbounded length contexts for PPM. The Computer Journal
40(2–3), 67–75.
Conklin, D., & Witten, I. H. (1995). Multiple viewpoint systems for music prediction. Journal of
New Music Research 24(1), 51–73.
Cook, N. (2008). We are all (ethno)musicologists now. In H. Stobart (Ed.), The new (ethno)-
musicologies (pp. 48–70). Lanham, MD: Scarecrow Press.
Cross, I. (2008). Musicality and the human capacity for culture. Musicae Scientiae 12(1 Suppl.),
147–167.
Cross, I. (2014). Music and communication in music psychology. Psychology of Music 42(6), 809–
819.
Curtis, M. E., & Bharucha, J. J. (2009). Memory and musical expectation for tones in cultural
context. Music Perception 26(4), 365–375.
Demorest, S. M., & Morrison, S. J. (2016). Quantifying culture: The cultural distance hypothesis of
melodic expectancy. In J. Y. Chiao, S.-C. Li, R. Seligman, & R. Turner (Eds.), The Oxford
handbook of cultural neuroscience (pp. 183–194). Oxford: Oxford University Press.
Demorest, S. M., Morrison, S. J., Beken, M. N., & Jungbluth, D. (2008). Lost in translation: An
enculturation effect in music memory performance. Music Perception 25(3), 213–223.
Demorest, S. M., Morrison, S. J., Beken, M. N., Stambaugh, L. A., Richards, T. L., & Johnson, C.
(2010). Music comprehension among western and Turkish listeners: fMRI investigation of an
enculturation effect. Social Cognitive and Affective Neuroscience 5, 282–291.
Demorest, S. M., Morrison, S. J., Nguyen, V. Q., & Bodnar, E. N. (2016). The influence of contextual
cues on cultural bias in music memory. Music Perception 33(5), 590–600.
Demorest, S. M., & Osterhout, L. (2012). ERP responses to cross-cultural melodic expectancy
violations. Annals of the New York Academy of Sciences 1252, 152–157.
Demorest, S. M., & Schultz, S. J. (2004). Children’s preference for authentic versus arranged
versions of world music recordings. Journal of Research in Music Education 52(4), 300–313.
Deva, B. C., & Virmani, K. G. (1975). A study in the psychological response to ragas. (Research
Report II of Sangeet Natak Akademi). New Delhi, India: Indian Musicological Society.
Drake, C., & Ben El Heni, J. (2003). Synchronizing with music: Intercultural differences. Annals of
the New York Academy of Sciences 999, 429–437.
Egermann, H., Fernando, N., Chuen, L., & McAdams, S. (2015). Music induces universal emotion-
related psychophysiological responses: Comparing Canadian listeners to Congolese Pygmies.
Frontiers in Psychology 5, 1341. Retrieved from https://doi.org/10.3389/fpsyg.2014.01341
Egermann, H., Pearce, M. T., Wiggins, G. A., & McAdams, S. (2013). Probabilistic models of
expectation violation predict psychophysiological emotional responses to live concert music.
Cognitive, Affective & Behavioral Neuroscience 13(3), 533–553.
Flowers, P. J. (1980). Relationship between two measures of music preference. Contributions to
Music Education 8, 47–54.
Frith, S. (1996). Music and identity. In S. Hall & P. Du Gay (Eds.), Questions of cultural identity (pp.
108–127). London: Sage Publications.
Fritz, T. (2013). The dock-in model of music culture and cross-cultural perception. Music Perception:
An Interdisciplinary Journal 30(5), 511–516.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., … Koelsch, S. (2009).
Universal recognition of three basic emotions in music. Current Biology 19(7), 573–576.
Fung, C. V. (1994). Undergraduate nonmusic majors’ world music preference and multicultural
attitudes. Journal of Research in Music Education 42(1), 45–57.
Gingras, B., Pearce, M. T., Goodchild, M., Dean, R. T., Wiggins, G., & McAdams, S. (2015).
Linking melodic expectation to expressive performance timing and perceived musical tension.
Journal of Experimental Psychology: Human Perception & Performance 42(4), 594–609.
Giuliano, R. J., Pfordresher, P. Q., Stanley, E. M., Narayana, S., & Wicha, N. Y. (2011). Native
experience with a tone language enhances pitch discrimination and the timing of neural responses
to pitch change. Frontiers in Psychology 2, 146. Retrieved from
https://doi.org/10.3389/fpsyg.2011.00146
Golby, A. J., Gabrieli, J. D., Chiao, J. Y., & Eberhardt, J. L. (2001). Differential responses in the
fusiform region to same-race and other-race faces. Nature Neuroscience 4, 845–850.
Gregory, A. H., & Varney, N. (1996). Cross-cultural comparisons in the affective response to music.
Psychology of Music 24(1), 47–52.
Hannon, E. E. (2009). Perceiving speech rhythm in music: Listeners classify instrumental songs
according to language of origin. Cognition 111(3), 403–409.
Hannon, E. E., & Trehub, S. E. (2005a). Metrical categories in infancy and adulthood. Psychological
Science 16(1), 48–55.
Hannon, E. E., & Trehub, S. E. (2005b). Tuning in to musical rhythms: Infants learn more readily
than adults. Proceedings of the National Academy of Sciences 102(35), 12639–12643.
Hansen, N. C., & Pearce, M. T. (2014). Predictive uncertainty in auditory sequence processing.
Frontiers in Psychology 5, 1–17. Retrieved from https://doi.org/10.3389/fpsyg.2014.01052
Heingartner, A., & Hall, J. V. (1974). Affective consequences in adults and children of repeated
exposure to auditory stimuli. Journal of Personality and Social Psychology 29(6), 719–723.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral
and Brain Sciences 33(2–3), 61–83.
Hofstede, G. (1983). National cultures in four dimensions: A research-based theory of cultural
differences among nations. International Studies of Management & Organization 13(1–2), 46–74.
Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA:
MIT Press.
Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on
auditory experience. Journal of the Acoustical Society of America 124, 2263–2271.
Juslin, P. N. (2000). Cue utilization in communication of emotion in music performance: Relating
performance to perception. Journal of Experimental Psychology: Human Perception and
Performance 26(6), 1797–1812.
Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal emotion and music
performance: Different channels, same code? Psychological Bulletin 129(5), 770–814.
Karaosmanoğlu, M. K. (2012). A Turkish Makam music symbolic database for music information
retrieval: SymbTr. In Proceedings of the 13th ISMIR Conference, Porto, Portugal, 223–228.
Keil, A., & Keil, C. (1966). A preliminary report: The perception of Indian, Western, and Afro-
American musical moods by American students. Ethnomusicology 10(2), 153–173.
Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music in
Bali and the West. Music Perception 2(2), 131–165.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model
selection. In Proceedings of the Fourteenth International Joint Conference on Artificial
Intelligence (Vol. 2, pp. 1137–1145). San Mateo, CA: Morgan Kaufmann.
Krumhansl, C. L. (1995). Music psychology and music theory: Problems and prospects. Music
Theory Spectrum 17(1), 53–80.
Krumhansl, C. L., Louhivuori, J., Toiviainen, P., Jarvinen, T., & Eerola, T. (1999). Melodic
expectation in Finnish spiritual folk hymns: Convergence of statistical, behavioral, and
computational approaches. Music Perception 17(2), 151–195.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within
a diatonic context. Journal of Experimental Psychology: Human Perception and Performance
5(4), 579–594.
Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Järvinen, T., & Louhivuori, J. (2000).
Cross-cultural music cognition: Cognitive methodology applied to North Sami Yoiks. Cognition
76(1), 13–58.
Kuhl, P. K. (1991). Human adults and human infants show a “perceptual magnet effect” for the
prototypes of speech categories, monkeys do not. Attention, Perception, & Psychophysics 50(2),
93–107.
Laukka, P., Eerola, T., Thingujam, N. S., Yamasaki, T., & Beller, G. (2013). Universal and culture-
specific factors in the recognition and performance of musical affect expressions. Emotion 13(3),
434–449.
LeBlanc, A. (1982). An interactive theory of music preference. Journal of Music Therapy 19(1), 28–
45.
Lynch, M. P., & Eilers, R. E. (1991). Children’s perception of native and nonnative musical scales.
Music Perception 9(1), 121–131.
Lynch, M. P., & Eilers, R. E. (1992). A study of perceptual development for musical tuning.
Perception & Psychophysics 52(6), 599–608.
Lynch, M. P., Eilers, R. E., Oller, D. K., & Urbano, R. C. (1990). Innateness, experience, and music
perception. Psychological Science 1(4), 272–276.
Lynch, M. P., Eilers, R. E., Oller, K. D., Urbano, R. C., & Wilson, P. (1991). Influences of
acculturation and musical sophistication on perception of musical interval patterns. Journal of
Experimental Psychology: Human Perception and Performance 17(4), 967–975.
Lynch, M. P., Short, L. B., & Chua, R. (1995). Contributions of experience to the development of
musical processing in infancy. Developmental Psychobiology 28(7), 377–398.
MacKay, D. J. C. (2003). Information theory, inference, and learning algorithms. Cambridge:
Cambridge University Press.
Merton, R. K. (1972). Insiders and outsiders: A chapter in the sociology of knowledge. American
Journal of Sociology 78(1), 9–47.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Morrison, S. J., & Demorest, S. M. (2009). Cultural constraints on music perception and cognition.
Progress in Brain Research 178, 67–77.
Morrison, S. J., Demorest, S. M., Aylward, E. H., Cramer, S. C., & Maravilla, K. R. (2003). fMRI
investigation of cross-cultural music comprehension. NeuroImage 20(1), 378–384.
Morrison, S. J., Demorest, S. M., Campbell, P. S., Bartolome, S. J., & Roberts, J. C. (2012). Effect of
intensive instruction on elementary students’ memory for culturally unfamiliar music. Journal of
Research in Music Education 60(4), 363–374.
Morrison, S. J., Demorest, S. M., & Stambaugh, L. A. (2008). Enculturation effects in music
cognition: The role of age and music complexity. Journal of Research in Music Education 56(2),
118–129.
Morrison, S. J., & Yeh, C. S. (1999). Preference responses and use of written descriptors among
music and nonmusic majors in the United States, Hong Kong, and the People’s Republic of China.
Journal of Research in Music Education 47(1), 5–17.
Nan, Y., Knösche, T. R., & Friederici, A. D. (2006). The perception of musical phrase structure: A
cross-cultural ERP study. Brain Research 1094(1), 179–191.
Nan, Y., Knösche, T. R., & Friederici, A. D. (2009). Non-musicians’ perception of phrase boundaries
in music: A cross-cultural ERP study. Biological Psychology 82(1), 70–81.
Nan, Y., Knösche, T. R., Zysset, S., & Friederici, A. D. (2008). Cross-cultural music phrase
processing: An fMRI study. Human Brain Mapping 29(3), 312–328.
Narmour, E. (1990). The analysis and cognition of basic melodic structures: The implication-
realization model. Chicago, IL: University of Chicago Press.
Neuhaus, C. (2003). Perceiving musical scale structures: A cross-cultural event-related brain
potentials study. Annals of the New York Academy of Sciences 999, 184–188.
Omigie, D., Pearce, M. T., & Stewart, L. (2012). Tracking of pitch probabilities in congenital amusia.
Neuropsychologia 50(7), 1483–1493.
Omigie, D., Pearce, M. T., Williamson, V. J., & Stewart, L. (2013). Electrophysiological correlates of
melodic processing in congenital amusia. Neuropsychologia 51(9), 1749–1762.
Patel, A. D., & Daniele, J. R. (2003). An empirical comparison of rhythm in language and music.
Cognition 87(1), B35–B45.
Patel, A. D., & Demorest, S. M. (2013). Comparative music cognition: Cross-species and cross-
cultural studies. In D. Deutsch (Ed.), The psychology of music (3rd ed., pp. 647–681). London:
Academic Press.
Pearce, M. T. (2005). The construction and evaluation of statistical models of melodic structure in
music perception and composition (Doctoral dissertation). Department of Computing, City
University, London.
Pearce, M. T., & Müllensiefen, D. (2017). Compression-based modelling of musical similarity
perception. Journal of New Music Research 46(2), 135–155.
Pearce, M. T., Müllensiefen, D., & Wiggins, G. A. (2010). Melodic grouping in music information
retrieval: New methods and applications. In Z. W. Ras & A. Wieczorkowska (Eds.), Advances in
music information retrieval (pp. 364–388). Berlin: Springer.
Pearce, M. T., Ruiz, M. H., Kapasi, S., Wiggins, G. A., & Bhattacharya, J. (2010). Unsupervised
statistical learning underpins computational, behavioural and neural manifestations of musical
expectation. NeuroImage 50(1), 302–313.
Perlman, M., & Krumhansl, C. L. (1996). An experimental study of internal interval standards in
Javanese and Western musicians. Music Perception 14(2), 95–116.
Pfordresher, P. Q., & Brown, S. (2009). Enhanced production and perception of musical pitch in tone
language speakers. Attention, Perception, & Psychophysics 71(6), 1385–1398.
Polak, R., London, J., & Jacoby, N. (2016). Both isochronous and non-isochronous metrical
subdivision afford precise and stable ensemble entrainment: A corpus study of Malian djembe
drumming. Frontiers in Neuroscience 10, 285. Retrieved from
https://doi.org/10.3389/fnins.2016.00285
Raman, R., & Dowling, W. J. (2016). Real-time probing of modulations in South Indian classical
(Carnatic) music by Indian and Western musicians. Music Perception 33(3), 367–393.
Raman, R., & Dowling, W. J. (2017). Perception of modulations in south Indian classical (Carnatic)
music by student and teacher musicians: A cross-cultural study. Music Perception 34(4), 424–437.
Renninger, L. B., Wilson, M. P., & Donchin, E. (2006). The processing of pitch and scale: An ERP
study of musicians trained outside of the western musical system. Empirical Musicology Review
1(4), 185–197.
Sauvé, S., Sayad, A., Dean, R. T., & Pearce, M. T. (2017). Effects of pitch and timing expectancy on
musical emotion. arXiv Preprint, 1708.03687.
Schaffrath, H. (1992). The ESAC databases and MAPPET software. Computing in Musicology 8, 66.
Schaffrath, H. (1994). The ESAC electronic songbooks. Computing in Musicology 9, 78.
Schaffrath, H. (1995). The Essen folksong collection. In D. Huron (Ed.), Database containing 6,255
folksong transcriptions in the Kern format and a 34-page research guide [computer database].
Menlo Park, CA: CCARH.
Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy.
Music Perception 14(3), 295–318.
Shehan, P. K. (1981). Student preferences for ethnic music styles. Contributions to Music Education
9, 21–28.
Shehan, P. K. (1985). Transfer of preference from taught to untaught pieces of non-Western music
genres. Journal of Research in Music Education 33(3), 149–158.
Small, C. (1998). Musicking: The meanings of performing and listening. Middletown, CT: Wesleyan
University Press.
Soley, G., & Hannon, E. E. (2010). Infants prefer the musical meter of their own culture: A cross-
cultural comparison. Developmental Psychology 46(1), 286–292.
Stobart, H., & Cross, I. (2000). The Andean anacrusis? Rhythmic structure and perception in Easter
songs of northern Potosi, Bolivia. British Journal of Ethnomusicology 9(2), 63–92.
Thompson, W. F., & Balkwill, L. L. (2010). Cross-cultural similarities and differences. In P. N. Juslin
& J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 755–
790). New York: Oxford University Press.
Trulsson, Y. H., & Burnard, P. (2016). Insider, outsider or cultures in-between. In P. Burnard, E.
Mackinlay, & K. Powell (Eds.), The Routledge international handbook of intercultural arts
research (pp. 115–125). New York: Routledge.
Wong, P. C. M., Chan, A. H. D., Roy, A., & Margulis, E. H. (2011). The bimusical brain is not two
monomusical brains in one: Evidence from musical affective processing. Journal of Cognitive
Neuroscience 23(12), 4082–4093.
Wong, P. C., Ciocca, V., Chan, A. H., Ha, L. Y., Tan, L. H., & Peretz, I. (2012). Effects of culture on
musical pitch perception. PloS ONE 7(4), e33424.
Wong, P. C. M., Roy, A. K., & Margulis, E. H. (2009). Bimusicalism: The implicit dual enculturation
of cognitive and affective systems. Music Perception 27(2), 81–88.
Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010).
The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic
study. Cognition 115(2), 356–361.

1. The Essen Folk Song Collection was retrieved from http://kern.humdrum.org/cgi-bin/browse?l=/essen. The SymbTr database was retrieved from https://github.com/MTG/SymbTr.
CHAPTER 4

WHEN EXTRAVAGANCE IMPRESSES: RECASTING ESTHETICS IN EVOLUTIONARY TERMS

BJORN MERKER

When we constrain language by meter and rhyme in poetry, or when we
adorn mundane earthenware pottery with decorative markings, we are
making matters more complicated than utility or instrumental purposes
dictate. Whole art forms, such as music, lay claim to human resources
without yielding obvious returns in survival benefits. A candidate benefit
such as the promotion of group cohesion through music-mediated bonding
(Huron, 2001) begs the question of why humans need music to bond when
non-human animals bond perfectly well without it (Lim & Young, 2006).
We share with them the system of socially contingent circulation of
“hormones of affiliation” (oxytocin and vasopressin, Heinrichs, von
Dawans, & Domes, 2009), so enhanced expression of relevant receptors in
nuclei of the basal forebrain (Kelly & Goodson, 2014) would seem to
provide a less cumbersome way to increase our bonding propensities. Even
assuming music does play a role in the human case (Pearce, Launay, &
Dunbar, 2015), the evolutionary question of how and why music acquired a
capacity to facilitate bonding remains (Pinker, 1997, p. 528).
The paucity of well-supported utilitarian accounts of the function of
music has led some to regard music as a by-product of other mechanisms of
the mind (Pinker, 1997, p. 534) or as a culturally invented “technology”
(Patel, 2008, p. 400). The approach to be detailed in what follows traces the
human propensity to expend resources on the arts to the same selection
pressures that have compelled a number of species of non-human animals to
maintain cultural traditions featuring large repertoires of elaborate song on a
learned basis. In tracing that analogy we will uncover a
psychological/neural mechanism specifically involved in esthetic judgments
that frames the question of the emotional impact of music in a new way. To
do so, we need to approach the relevant animal displays from first
principles onwards to arrive at the recent elaboration of Zahavi’s handicap
principle (Zahavi, 1975) into the “developmental stress hypothesis” for the
function of large and complex song repertoires in birds with vocal learning
(Hasselquist, Bensch, & von Schantz, 1996; MacDougall-Shackleton &
Spencer, 2012; Nowicki, Searcy, & Peters, 2002a).

SQUANDERING AS ASSET, I: THE EVOLUTIONARY LOGIC

Among songbirds with vocal production learning (Janik & Slater, 1997)
there is an association between a high duty cycle for song (i.e., a large
amount of continuous singing per day), large song repertoires, and high
pattern variety among songs (Baylis, 1982). This correlation is presumably
driven by the fact that protracted production of monotonous singing loses
the attention of its audience by the ubiquitous mechanism of habituation
(Hartshorne, 1956; Kroodsma, 1978; Sachs, 1967; Sokolov, 1963).
Persistent singing is energetically costly and takes place at the expense of
useful activities such as foraging. Why, then, prolong the song display
beyond the boredom threshold, thus incurring the additional cost of
acquiring the means to produce elaborate song? Whence the waste and
frivolity of virtuoso performance?
Because the costs of signaling are paid for by the same metabolic engine
that foots the bill for survival, the very fact of surviving with the added
burden of exaggerated signaling is in itself informative. It supplies proof
positive that the signaler is capable of sustaining that additional burden. The
capacity is therefore necessarily an aspect of signaler quality, a
circumstance Amotz Zahavi codified in what he named the “handicap
principle” (Zahavi, 1975), a principle that completes the Darwinian theory
of sexual selection (Darwin, 1871).
The logic of the handicap principle is quite general, and is by no means
limited to promoting potential genetic benefits to offspring. A male in an
agonistic interaction with a conspecific needs to assess the actual fighting
ability of the rival, and not his genetic potential. Similarly a female in a
species with obligate bi-parental care must assess the extent of a
prospective mate’s capacity to invest in the care of offspring. For a variety
of reasons that capacity can and does vary independently of the genes he
contributes to those offspring. In these cases and others, a display of excess
capacity in the form of elaborate signaling can indicate capacity in the
relevant behavioral dimension, provided such signaling actually relates,
directly or indirectly, to abilities and resources employed in the behavioral
dimension of interest to the receiver.
An example involving a direct relationship between signaling and a
desired or feared quality in the signaler is physical fitness itself. Loud
singing for many hours on a daily basis proves that the singer has the
energy reserves, predator vigilance, stamina, and foraging ability to sustain
such behavior without succumbing. For a receiver this means that those
same resources are available for other uses should the animal’s
circumstances or needs require it. Indirect relationships between signaling
and signaler qualities can be quite remote, as illustrated by numerous
laboratory and field studies inspired by the “developmental stress
hypothesis” over the past two decades (Hasselquist et al., 1996; Nowicki et
al., 2002a).
The learned acquisition of elaborate song is a protracted and demanding
sequence of intertwined perceptual, attentional, memory, and motor
challenges that unfolds after hatching in a still developing brain. The
sequence of passive song memorization, followed by stage-wise practice of
vocal skill spanning over weeks and months, interacts with and feeds back
upon the development and neural maturation of an elaborate system of
interconnected forebrain nuclei dedicated to song learning and production
(Iwaniuk & Nelson, 2003; reviewed in Nowicki et al., 2002a). Thus, the
size of the song control nuclei of the mature songbird forebrain correlates
not only with average song repertoire size across species, but with the
repertoire size and song proficiency achieved by individuals within a
species (Gahr, 2000; Garamszegi & Eens, 2004). The latter circumstance is
of central biological significance, because repertoire size and song
proficiency are factors used by females in choosing a mate (Nowicki,
Searcy, & Peters, 2002b).
Each sequential stage of this delicately tuned two-way interaction
between neural development and behavior is susceptible to perturbation by
a variety of external stressors and disturbances (hence the name
“developmental stress hypothesis”). They include, but are not limited to,
immune challenges, disease and parasites, nutritional status dependent on
parental provisioning and later the bird’s own foraging ability,
environmental pollutants, and disruptions at the nest (reviewed by
MacDougall-Shackleton & Spencer, 2012). The management of such
encumbrances consumes developmental resources which otherwise would
have been available for the practice-dependent growth of the song system.
A large repertoire and proficient song performance accordingly can only
be acquired by an individual who as a nestling was cared for by well-
functioning parents, who grew up in a secure nest, was subsequently
unencumbered by disease and parasites, and—in possession of sharp
faculties, memory capacity, foraging ability, and predator vigilance—
engaged in hundreds of hours of successful singing practice. Whatever
impairs the post-hatching growth of a bird’s system of song nuclei, and
whatever keeps the bird from attending to and practicing song is later
evident as deficits in the size and perfection of its mature song repertoire.
This makes a large repertoire of complex song a direct causal reflection of
an individual’s successful passage through a demanding and varied
developmental obstacle course.
The more demanding the performance to be acquired, the more
comprehensive a measure of an individual’s personal history and qualities
lies implicit in the perfected, mature songbout. In effect, then, an
individual’s level of song proficiency sums up, in a single performance, the
entire developmental history of the singer, and as such provides an all-
round certificate of competence, of all-around individual phenotypic
quality. It tells its audience, in a way impossible to counterfeit, that the
singer comes from, as it were, “a good background.” Potential mates and
rivals thus do well to take a singer displaying mastery and virtuosity
seriously.
Though none of this is likely to be accomplished without an adequate
genetic background, it is the finished phenotype and not the genotype that
fights with rivals and helps a bonded female provision her offspring and
defend the nest. Hence the importance of markers for phenotypic quality
when decisions in these regards have to be made on the spot during a brief
breeding season. That is what the expert songbout provides, conferring on
the singer high priority as mate or rival. Provided, that is, that there are ears
competent to assess the quality of the songbout, and to discriminate an
outstanding performance from a middling one. This in turn leads us to the
crux of cultural esthetics, namely the means by which receivers judge
performance quality and the critical dependence of those means on the
cultural song tradition within which the performance takes place.

SQUANDERING AS ASSET, II: A BULWARK AGAINST BLUFF

The circumstances outlined in the previous section help us understand why
a brown thrasher accumulates a song repertoire estimated to contain over
1800 separate melodies (Kroodsma & Parker, 1977), or how the sounds of
as many as 76 different species of birds from two continents can be
identified in the song repertoire of a single marsh warbler individual
(Dowsett-Lemaire, 1979). As we have seen, the sole reason to take a
performance seriously is the protracted and demanding process of its
acquisition. It is only the lengthy and exacting course of pattern acquisition
and vocal skill practice that makes a songbout an all-round index of
phenotypic quality. Accordingly, 1800 melodies produced by impromptu
invention on the spot ought to impress less than the same number acquired
by meticulous copying from the local song tradition. Why should this be
so?
Only the local song tradition (or other local sounds in the case of bird
mimics) provides the intended audience with a standard or norm by which
to judge the extent of a singing individual’s proficiency and repertoire
coverage. The listeners or judges grew up in the same general neighborhood
as the singer. They were therefore exposed to the same song tradition and
other ambient sounds, and committed them to memory even if, as is the
case for the females of some species, they do not themselves sing (the
females of many species do in fact sing, see Riebel, 2003). Females are
sensitive both to how much has been learned by a male and to how well his
performance matches the shared standard, and they make their mating
decisions accordingly (Nowicki et al., 2002b).
Only against the background of the intimate knowledge of the local lore
shared between performer and audience is it possible to tell the extent to
which a given performer has achieved mastery. By the same token, potential
usurpers without apprenticeship in that lore but attempting to fake it need
not apply. Even in species that copy the sounds of other species or
environmental sounds into their repertoire (perhaps generations ago: Baylis,
1982), and thus lack species-specific constraints on the patterns that may be
acquired, the repertoire is typically acquired, first and foremost, from the
local song tradition carried by conspecifics. Because only exact duplication
of the received pattern proves that the bird actually attended to and
discriminated the perceptual details of its model and then practiced its
articulatory complexities to perfection, copying fidelity is part of the
standard. A whole circuit in the song control system of birds—the so-called
anterior forebrain pathway—is dedicated to using auditory feedback of the
bird’s own singing voice to gradually shape its vocal output to match the
model stored in memory (see Konishi, 2004 for review).
Complexity itself is therefore not the point of the performance. Random
strings abound in complexity—in fact, by one measure they are ultimately
complex (see, for example, Grassberger, 1986, Fig. 1)—but their
complexity is of a kind that does not lend itself to comparative assessment.
If individuals are to be compared one with another, the extent of their
acquisition of content from a common, shared, pool of content, namely the
local song tradition or soundscape, is essential for ranking their
performances. If that tradition and soundscape is richer than what any
single individual can easily master, then the extent of an individual’s
repertoire “coverage” of that material, that is, the size of an individual’s
tradition-based repertoire, is a veridical measure of his or her song learning
capacity.
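The point that sheer complexity cannot serve as a standard is easy to make concrete. The following minimal sketch (Python; the seven-letter "song alphabet," motif, and string lengths are all illustrative assumptions, not drawn from the literature cited above) uses compressed size as a crude stand-in for descriptive complexity: a random bout scores as highly complex by this measure, yet it is the tradition-based bout, built by faithful copying of shared material, that affords comparison against a common standard.

import random
import zlib

def descriptive_complexity(s: str) -> int:
    # Length in bytes of the zlib-compressed string: a crude, computable
    # stand-in for Kolmogorov-style descriptive complexity.
    return len(zlib.compress(s.encode("utf-8")))

random.seed(1)
alphabet = "ABCDEFG"  # toy "song alphabet" (illustrative assumption)

random_bout = "".join(random.choice(alphabet) for _ in range(600))
motif = "CDEFGFEDCDE"  # a fixed "traditional" motif, also illustrative
traditional_bout = motif * 55  # faithful copying of shared material

# The random bout resists compression far more than the tradition-based
# bout, which collapses to little more than its motif; yet only the
# latter can be ranked against other singers' command of the same pool.
print(descriptive_complexity(random_bout))
print(descriptive_complexity(traditional_bout))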
FIGURE 1. Schematic depiction of the “information dimension” outlined in the text. It is a
composite of concepts and findings of three authors studying responses to novelty in the 1960s
(Bindra, 1959; Sachs, 1967; Sokolov, 1963). The dimension spans from maximal certainty (minimal
prediction error) at the bottom, to maximal uncertainty (maximal prediction error) at the top.
Behavioral reactions are presented in the left-hand column, and inferred psychological states in the
right-hand column.

We can conjecture that only a performance that draws on a sufficiently
broad sample of the listener’s recognition memory for the local song
tradition—a sample large enough to challenge and even tax the listeners’
powers of apprehension—will be taken seriously as a proof of competence.
The better each song string reproduces the traditional model the better will
their aggregate fill this function.
A judgment of competence might have to be upgraded to one of mastery
if, in addition to featuring extensive coverage of the traditional lore and
high fidelity reproduction, the performance starts pushing the limits of the
listener’s recognition memory. This would happen when the songbout
includes material that in fact forms part of the tradition but was “missed” by
the listener’s own ontogenetic acquisition process, or when it features
patterns introduced by the singer as virtuoso embellishments. Under such
circumstances the listener has good reason to take the performance
seriously indeed. In either case the performer is giving proof of a capacity
beyond that possessed by the listener, given the proviso that in either case
such “excess” material (excess from the standpoint of the listener) fits
seamlessly into the framework of the received form.
It is copying fidelity (supported by what I have called a “conformal
motive,” Merker, 2005) that gives the resulting cultural song tradition the
temporal inertia needed for it to serve as a standard of judgment. It in effect
stabilizes the tradition against too rapid accumulation of inevitable copying
errors. Thus stabilized, it provides individual learners with a vehicle by
means of which to advertise the quality of their developmental history
through the quality and scope of their command of the local tradition, and
their audience with a standard by which to judge the same performance. We
turn now to the inner workings of the listener’s responsiveness in this
regard.

SQUANDERING AS ASSET, III: SURRENDER TO MASTERY

Judgments of a singer’s performance are fraught with consequences for
listeners, be they potential rivals or mates. In the case of the opposite sex it
determines the partner with whom one or more breeding seasons—even a
lifetime—will be spent, and in the case of same-sex rivalry it determines
matters as important as the quality of the territory on which foraging and
the rearing of offspring will take place. Much therefore hinges on the
assessment of the songbout that serves as a proxy for the phenotypic
qualities it underwrites, as covered in the previous two sections. How then
to compare and judge the streams of intricately patterned sound emanating
from the throats of singers (perhaps not even visible to their judges)?
Something must intervene in the psychology of the
receiver/judge/listener between apprehension of the songbout and the real-
life choice the receiver makes on its basis. That something can hardly be
formal analysis of the contents of the songbout, but ought to be some form
of intuitive summary measure of the extent to which the songbout taps and
taxes the listener’s knowledge of the local song tradition. Some form of
global emotional summary thus lies close at hand.
As we have seen, the repertoire size, model fidelity, pattern complexity,
and ease or elegance of delivery of a songbout must be measured against
the local song tradition as its standard. It is assessable, therefore, only
against a background of prior familiarity with that local song tradition (or
soundscape, in the case of mimics). A principal function of the bulging
forebrain system of warm-blooded animals (i.e., birds and mammals, which
are large-brained compared to the rest of the animal kingdom) is to
determine the extent to which current sensory afference pushes or exceeds
the boundaries of prior stimulus familiarity.
This quantity has variously been called surprisal (Tribus, 1961), novelty
(Berlyne, 1960; Bindra, 1959; Sachs, 1967; Sokolov, 1963), surprisingness
(Kamin, 1969), prediction error (not named as such: Rescorla & Wagner,
1972), and expectancy violation (Meyer, 1956). It has been related
specifically to esthetics by Berlyne (1971). Though there are differences in
emphasis and detail behind these names, they all have a shared functional
principle at their core, readily interpretable in informal Bayesian terms
(Rohrmeier and Koelsch, 2012). The operation of this principle is captured
by the free energy formulation of the logistics of bi-directional learning
networks pioneered by Geoffrey Hinton and colleagues (Hinton & Zemel,
1994), subsequently popularized by Karl Friston and others (Clark, 2013;
Friston, 2002).
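The shared formal core of these constructs can be stated in one line: the surprisal of an event is the negative log-probability assigned to it by the receiver's predictive model, so that improbable continuations carry large prediction error. A minimal sketch, assuming a first-order (bigram) model of note transitions trained on a toy "tradition" (the corpus, note names, and add-one smoothing are illustrative assumptions only):

import math
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Count note-to-note transitions across the corpus (the "tradition").
    counts = defaultdict(Counter)
    for melody in corpus:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts

def surprisal(counts, context, note, alphabet_size=7):
    # -log2 P(note | context), with add-one smoothing so that transitions
    # never heard before receive finite but large prediction error.
    total = sum(counts[context].values()) + alphabet_size
    p = (counts[context][note] + 1) / total
    return -math.log2(p)

tradition = ["CDEFGFE", "CDECDE", "EFGFEDC"]  # toy corpus
model = train_bigram(tradition)
print(surprisal(model, "C", "D"))  # common in the tradition: low surprisal
print(surprisal(model, "C", "B"))  # never heard: high prediction error

On this reading, the dimension depicted in Fig. 1 tracks a running aggregate of such prediction errors.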
Implemented through an elaborate neural system which besides its
neocortical parts involves the hippocampus, amygdala, and diencephalic
and midbrain way stations (see Merker, 2007a, Fig. 3 and Merker, 2013a,
Fig. 2) this function converts the informational content of sensory
experience to a running emotional summary in real time of the extent to
which the pattern of current afference exceeds the bounds of prior
experience. When those bounds are exceeded, this system signals caution,
apprehension, fear, and even terror, along a dimension that represents the
magnitude of novelty, expectancy violation, or prediction error. For present
purposes, it suffices to conceive of movement along this dimension to be
signaled by increasing levels of central activation. Central activation is
reflected in cortical gamma oscillations (Merker, 2013b), and peripherally
in the specifically cholinergic aspect of sympathetic activity reflected in
skin conductance changes (Shields, MacDowell, Fairchild, & Campbell,
1987), which vary linearly with the intensity of psychological (emotional)
activation (Bradley & Lang, 2000).
FIGURE 2. The same spectrum of psychological states as in the right-hand column of Fig. 1,
paired with their counterparts in the domain of esthetics. The context of existential safety that frames
esthetic experience occasions a “hedonic reversal” of valence in the upper reaches of the esthetic
information dimension, here designated “danger zone.”

Long before the recent formal treatments of this system were
inaugurated, its behavioral and psychological aspects had been studied by
psychologists and physiologists interested in the learning dynamics of
habituation to novelty already cited. Their results can be summarized in
terms of a pattern of graded responsiveness along a single
psychological/emotional dimension. Cognitively it spans a spectrum from
total certainty to total uncertainty, behaviorally from sleep to freezing, and
emotionally from boredom to terror. Between the latter two extremes lies a
gradient of emotional states ranging from mild interest, to active curiosity,
caution, and fear, as depicted in Fig. 1.
When the prior stimulus familiarity of such a system, stored as
recognition memory, includes a massive repertoire of local song, acquired
during the intensive song learning stage of ontogeny, the normal operation
of this system renders it a sensitive detector of the extent to which a
currently experienced songbout pushes or exceeds the boundaries of the
listener’s recognition memory. To the extent that it does, the system will
deliver the selfsame real-time emotional summary along the central
activation dimension for that songbout as for any other sensory experience.
An impoverished sample of the local song tradition will be experienced as
“boring.” An adequate sample rendered with confidence or flair will be
experienced as “interesting.” Finally, a bout whose pattern richness taxes
the limits of the listener’s recognition memory would be experienced as
apprehension or fear, were it not for the fact that it is set apart from other
activities by its character of performance, or “display” in behavioral biology
terms, and is recognized as such by all concerned.
Thus framed and contextually constrained, the superior performance
induces not outright apprehension or fear in the listener, but a “tamed”
version of the same in the form of being “touched,” “moved,” “impressed,”
and—at the high end of the informational dimension—even “awed” by
what is heard (cf. Konečni, 2005, 2015). This hedonic shift instantiates, in
other words, the principle that in a context of safety, negative emotions may
undergo a “hedonic reversal” to be experienced as positive (Apter, 1982;
Bloom, 2010; Strohminger, 2013).
The principal proposal of this chapter, then, is this: The biological roots
of the esthetic emotions, animal and human, are to be found in this
informational dimension of telencephalic operations in large-brained
species. So far these esthetic emotions have been discussed primarily in the
context of human responses to art (Berlyne, 1971; Konečni, 2005, 2015;
Konečni, Brown, & Wanic, 2008; Kuehnast, Wagner, Wassiliwizky,
Jacobsen, & Mennighaus, 2014; Scherer & Zentner, 2001; Scherer, Zentner,
& Schacht, 2001–2002). For brevity, I propose to use the expressions
“moving” and “being moved” (Konečni, 2005, 2015), and at times the
equivalent “impressing,” “impressive,” and “being impressed,” as
shorthand for phenomena associated with the mid-range of emotional
responsiveness to esthetic stimuli, flanked by “interest” at the less intense,
and by “awe” at the more intense, end of the range, as depicted in Fig. 2.
The reason the heart of a Bengalese finch female starts beating faster on
hearing a tape recording of an accomplished male singer (Okanoya, 2004)
would accordingly be “because she is moved or impressed” by what she
hears. And if we are indeed on the grounds of emotion we should be able to
specify an action tendency or behavioral bias promoted by that emotion
(Ekman, 1999; Fontaine & Scherer, 2013; Frijda, 1987; Izard, 2007). In
view of what has gone before the answer is not far to seek: the action
tendency promoted by the more intense levels of being impressed is that of
“yielding,” “surrender,” “submission,” or “capitulation” to the source of the
impressive performance, be the performer a potential mate or a rival. In a
sense, the hedonic reversal from “fear” to “being moved” or “impressed” is
reflected in a replacement of the behavioral tendency to escape by a
tendency to yield or surrender. And if, finally, we ask for the eliciting
stimulus or antecedent that evokes the emotion of being impressed, the
answer can only be “an outstanding performance.”
We can accordingly sum up this excursion into the biology of
“squandering as asset” by saying that an “outstanding performance” before
listeners conversant with the tradition to which the performance belongs
will move or impress those listeners, and that their emotional response of
being impressed is realized in an action tendency toward “surrender,”
directed at the performer exhibiting mastery through the performance.
This ascription of the effect of esthetic stimuli to the operation of the
information-related emotional dimension sets them clearly apart from both
motivational systems in general (their hedonic aspects included, for which
see Bloom, 2010) and from the domain of basic emotions as a whole
(Ekman, 1999). The boredom-to-awe spectrum is but the full unpacking of
one of the basic emotions, variously named “interest” (Izard, 2007) or
“surprise” (Ekman, 1999). In keeping with the “cerebral” nature of this
emotional spectrum, the neural system for learning, producing,
apprehending, and judging song in birds with vocal learning is concentrated
in the telencephalon of their forebrain (Jarvis, 2007).
Finally, to counter possible misunderstanding of the role assigned to
emotion in the present proposal: the fact that the process of assessing the
merits of a songbout is mediated by an emotional variable (“being moved or
impressed”) by no means implies that the patterns of the song somehow
“portray emotion,” are about emotion, or are a vehicle for communicating
emotions. They portray nothing outside of themselves. What they
communicate is command of repertoire, complexity, and mastery of
execution, not anything encoded, language-like, in those patterns (more on
this in the section “The Psychological Impact of Music”). When performed
by an accomplished singer, a listener attuned to the relevant song tradition
registers appreciation of the performance in the form of being interested,
moved, or awed, according to the degree of command of tradition and
virtuosity displayed therein. The emotion is about the pattern, and not the
other way around. That the pattern in turn reflects the all-round phenotypic
qualities of the performer is what allows esthetics to be cast in evolutionary
terms, if the argument developed in this chapter has any merit.

THE HUMAN CASE
We are now ready to make a swift transition to human arts and esthetics,
and we do so via human music on the plausible assumption that the first
form of human music proper was song. In fact, song may have preceded
speech in our evolutionary history, perhaps in the form of song and dance in
a group setting as a first form of the human arts (Merker, 2005, 2008). A
strong reason to make these assumptions is provided by the fact that
humans, unlike our closest relatives among the apes, indeed unlike any
other primate, are vocal learners, and more specifically, are vocal
production learners (Janik & Slater, 1997; see also Doupé & Kuhl, 1999).
Among non-human animals, this capacity for learning to reproduce by
voice novel sound patterns originally received by ear has most commonly
evolved to serve learned song. Therefore, the default assumption regarding
the function for which our own capacity for vocal learning originated would
be song as well. If so, learned song preceded speech in our evolutionary
ancestry (see Merker, 2012, 2015 for details), and we have landed squarely
in the constellation of factors outlined in previous sections as critical for the
origin and maintenance of complex cultural traditions of ritual lore in
animals with learned song. As a biological trait, this would include the
motivational mechanism of a conformal motive ensuring fidelity to tradition
(Merker, 2005), the role of prior familiarity in appreciation (cf. Madison &
Schiölde, 2017), as well as the ultimate purpose for taking on the burden of
acquisition, namely to impress a competent audience with one’s command
of the shared lore. As we saw in the section “Squandering as Asset, II,”
fidelity to tradition coupled with a shared exposure history furnishes a
standard of judgment short of which the tradition eventually collapses into
idiosyncratic caprice without grounds for comparing one performance with
another.
Trends in Western art over the past century have tended to obscure the
fundamental nature of this connection. In fact, it has been actively
combatted as a fetter on the exercise of untrammeled creativity. In good
agreement with the present perspective, the history of contemporary art
accordingly abounds in examples of idiosyncratic caprice exercised in the
absence of shared criteria for comparing one performance or creation with
another. Proof of this assertion surfaces from time to time in the form of
adventitious revelations that expose the arbitrary nature of the judgments
involved (Cheston, 2015; Jordan-Smith, 1960; Museum of Hoaxes, 2005.
Also: Wikipedia entries for Disumbrationism, Pierre Brassau).
Each step toward such a state of affairs typically meets with opposition
when it first occurs. Presumably this trend in the serious arts of Western
culture (poetry, painting, and music first and foremost, though not limited to
these) would not have proceeded as far as it has were it not for a more
general cultural ambience in the West emphasizing the inherent value of
novelty and the sanctity of artistic freedom, buttressed by the Romanticism
myth of artistic genius. That myth emphasizes the role of rare artistic
endowment over that of diligent mastery of a tradition in the genesis of
great art (Smith, 1924; Waterhouse, 1926). This cultural ambience
eventually ripened into outright celebration of iconoclasm in the course of
the twentieth century. Yet even then, with each advance of idiosyncratic
license, voices were raised in protest, sometimes trenchantly so.
One illustrative example occurred when a faction of musicians in the
modern jazz genre abandoned all reliance on traditional form in what they
styled “free form jazz.” The jazz bassist and band leader Charles Mingus, a
creative musician by no means a stranger to innovation, witnessed a key
event in this development. It was Ornette Coleman’s controversial 1960
performances at the New York City “Five Spot” jazz club. Mingus
commented: “… if the free-form guys could play the same tune twice, then
I would say they were playing something … Most of the time they use their
fingers on the saxophone and they don’t even know what’s going to come
out. They’re experimenting” (Wikipedia entry “Charles Mingus”). On
another occasion he noted, “They don’t even know their Parker” (B.
Merker, personal observation).
Note that Mingus’ comments by no means are directed against creativity
or innovation as such. They remind us, rather, of the necessity, under
circumstances where freedom is in fact possible because the means of
artistic expression are learned, of a shared exposure history to ground
substantive assessment of artistic merit. It is that shared background that
supplies the crucial “common currency” by which alone the informational
emotion of “being impressed” serves as an index of comparative value
across different performances. Without that anchoring in a shared tradition,
the emotional reaction of being impressed becomes as arbitrary and
idiosyncratic as the performances themselves. The bulwark against bluff has
been broken.
The reaction of being impressed by outstanding artistic creations for
which one has been prepared by an appropriate exposure history is
ubiquitous across the arts. It must not be confounded with the kind of
emotional responses that originate in personal associations forged in the
course of significant life events. A tune that figured prominently in a
teenage romantic infatuation may, when encountered years or even decades
later, compel strong feelings on an associative basis without reflecting on
the tune’s artistic merits (Konečni, 2005; Rauhe, 2003; Scherer & Zentner,
2001).
It is otherwise when we encounter a piece of music, perhaps for the first
time, for which our listening history of the genre to which it belongs has
equipped us to appreciate its masterfully patterned content, and we groan
and even weep in admiration (Gabrielsson, 2011; Scherer et al., 2001–2002;
see also Konečni, 2005). We may even feel our skin covered in goose-
bumps, and a shiver or chill traverse our spine (reviewed in Hunter &
Schellenberg, 2010; see further Gabrielsson, 2011; Scherer & Zentner,
2001; Silvia & Nusbaum, 2011; Vickhoff, Åström, & Theorell, 2012). But
what a peculiar way to express our admiration, by sighing, groaning, chills,
and even tears!
Our analysis of animal cultural esthetics allows us to make sense of
these peculiar behavioral and physiological tokens of being impressed. An
ordinary trigger for bodily reactions such as shivers, goose-bumps
(piloerection), or chills is genuine fear (Marks, 1969, pp. 2, 39). They are
the peripheral expressions of the central fear state, as it engages the
autonomic (sympathetic) nervous system on an automatic, involuntary
basis. These low-level autonomic (involuntary) reactions apparently remain
patent even under circumstances where an esthetic stimulus taps the fear
range of the information dimension, but on contextual grounds undergoes a
hedonic reversal, as already covered. Thus the shivers, chills, and goose-
bumps betray the origin of the emotional impact of strong esthetic
experiences in the fear range of the informational dimension depicted in
Figs. 1 and 2, in good agreement with the present interpretive framework
(cf. Benedek & Kaernbach, 2011).
Similarly for the sighing, groaning, and weeping elicited by strong
esthetic experiences. Tears appear to be the most common bodily response
to strong experiences of music (Gabrielsson, 2011; Scherer et al., 2001–
2002; see also Konečni, 2005). A prominent ordinary setting for such
reactions is the experience of personal loss, for which such reactions serve a
largely involuntary expressive function (e.g., Averill, 1979; Frijda, 1988).
As we saw in the previous section, the action tendency contingent on being
esthetically moved should be a readiness, indeed an urge, to yield, submit,
surrender, or capitulate to the source of an impressive performance, be it a
rival who has bested us by a masterly performance, or a suitor who has
penetrated our defenses by the same.
In either case, loss is implicit in the act of surrender. Being bested by a
rival is attended by a direct loss of status and its perquisites. What hovers in
the evolutionary background, as we have seen, is the potential for physical
attack from an agonist whose masterful performance, according to the
developmental stress hypothesis, advertises his all-round superior
phenotypic characteristics. In surrendering to a suitor, one loses freedom of
choice in matters as important as the parentage of one’s offspring, along
with loss of personal independence for the considerable stretch of time that
the partnership will last.
More abstractly conceived, a certain giving up (loss) of self is implicit in
every act of submission. Arthur Schopenhauer emphasized “forgetfulness of
self” in discussing esthetics, and its special relation to experiences of the
sublime which he illustrated by way of landscape painting (Schopenhauer,
1844/1966, vol. I, pp. 200ff., vol. II. pp. 369ff.). For a recent discussion of
this important (and once celebrated) topic in esthetics, see Konečni (2005,
2011). Absorption in the pattern-stream of a musical performance promotes
forgetfulness of anything extraneous, including one’s sense of self. Such
absorption, given the requisite level of background familiarity, will be all
the more complete and compelling in the case of outstanding performances,
not only because their masterly patterning invites it, even compels it, but
because they tax our powers of apprehension. Then self-surrender and
forgetfulness of self may reach a peak, a circumstance that may bear on the
psychology of transcendental and religious experiences that are a prominent
aspect of strong experiences of music (Gabrielsson, 2011).
What drives tears to our eyes even though we are not actually sad or
grieving, then, is the tacit sense of loss coupled to the action tendency of
surrender promoted by an outstanding performance. The phenomenon is not
even strictly confined to arts and esthetics: similar responses can occur on
witnessing an outstanding performance in, say, sports. To prevent
misunderstanding, note that none of this is to be taken to mean that the
connection between such reactions and surrender is directly present to the
minds of those experiencing them. The listening mind is typically absorbed
in the pattern of the performance, far from the sadness of loss or the cold
hand of fear. In keeping with the hedonic reversal invoked here, happiness
and joy are typical of these intense experiences (Gabrielsson, 2011).
These caveats regarding what might be present to the mind of the
listener/beholder do not mean, however, that the evolutionary logic of
capitulating to the originator of a masterful display necessarily is a matter
of our distant ancestry only. There is no dearth of examples of strangers
soliciting casual amorous liaisons with famous creators of art on the basis
of encountering their creations alone (Lipsius, 1919; Miller, 2000, p. 331;
see also Nettle & Clegg, 2006).
In sum, only where artistry is embedded in a tradition that poses a
challenge of acquisition for its practitioners and also has shaped the
sensibilities of the intended audience does the latter’s emotional response of
being impressed provide a measure of artistic merit. It is only when both
conditions are met that a causal connection between intuitive response and
artistic merit is in fact patent, according to the psychological account given
in the section “Squandering as Asset, III.” Such was typically the state of
affairs throughout human cultures until the advent of modernity in the West,
and even there it still holds for its popular culture within any of its given
subcultures. By the same token, where anything can be art, nothing in fact
is.

THE PSYCHOLOGICAL IMPACT OF MUSIC: NEITHER “MEANING” NOR “EMOTION”

The thesis that art generally, and music specifically, exerts its effect via the
informational dimension defined in the section “Squandering as Asset, III”
has obvious consequences for the much discussed issue of “music and
meaning” and its subdomain “music and emotion” (Davies, 1994; Juslin &
Sloboda, 2001; Meyer, 1956; Robinson, 1997). It does so by allowing us to
draw a principled distinction between the undoubted psychological impact
of music on the one hand, and questions of its carrying meaning as well as
of its portraying or inducing emotions on the other.
Strictly speaking only the sentences of language “mean” at all, as most
trenchantly argued by Staal (1989). The multilevel combinatorics of
phonemes and morphemes by which language performs its arbitrary (in the
sense of conventional) mapping between the form of utterances and their
meaning (compare “bord,” “Tisch,” and “table” for the selfsame type of
object in Swedish, German, and English, respectively) constitutes a bona
fide code for representing and conveying meaning. This code is so detailed
and comprehensive that virtually every difference in the strings of
phonemes that make up sentences makes a difference in the information
conveyed by those sentences. This lexically semanticized syntactic code
turns sequential patterns of vocally produced sounds into statements about
things that bear not the slightest resemblance to those sound sequences
themselves (see the “table” example above). Thus we think and
communicate about objects, events, matters of fact, states of the world,
ideas, intentions, beliefs and desires, without limit, using the same few
dozen phonemes to do so. This is what it means to “mean,” namely that
something encodes something other than itself, which is its meaning.
Much of what compels our interest, carries significance, and recruits our
psychological engagement—in short, has psychological impact—does so
without the detour of meaning in this sense. Our non-linguistic perception
and cognition quite generally operates on patterns of sensory input by
discriminating, segmenting, grouping, classifying and generalizing within
and across them, and not by using them as vehicles for encoding matters
other than themselves. Something need not, in other words, mean in order
to be meaningful: witness the experience of a magnificent sunset as but one
of an infinitude of cases in point.
Viewed in this light, the patterns of music define themselves as
perceptual objects that engage the informational dimension of our
perceptual/cognitive capacities in the manner of auditory analogs of
visually presented arabesques or the shifting patterns of a turning
kaleidoscope, to use Hanslick’s felicitous metaphors (Hanslick, 1854). As
such they need to “sound good” (and “better,” and “best”), not to refer to
circumstances other than themselves. In so doing they exploit the limitless
pattern-generativity music conquers for itself by a combinatorics of
“particulate” elements drawn from the discretized continua of pitch and
duration (for which see Abler, 1989; Merker, 2002; Merker, Morley, &
Zuidema, 2015). This limitless generativity fits ill with a conception of
music as a device for portraying or evoking the limited set of subjective
states that make up our emotions. Not only is the empirical evidence
supporting that conception weak (Konečni, 2003, 2008; Konečni et al.,
2008; Scherer, 2003), but weighty arguments have been leveled against it,
arguments for which Hanslick’s 1854 essay is still the unsurpassed locus
classicus (Hanslick, 1854; see also Davies, 1994; Zangwill, 2004).
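The scale of the "limitless pattern-generativity" invoked above is easy to gauge with a back-of-envelope count; the particular figures (twelve pitches, four durations, sixteen notes) are illustrative assumptions only.

# With 12 discrete pitches and 4 discrete durations (illustrative figures),
# each note can be realized in 12 * 4 = 48 ways, so melodies of a mere
# 16 notes already number 48 ** 16, on the order of 10 ** 26 distinct
# sequences: vastly more than any tradition, or any listener's memory,
# could ever sample.
pitches, durations, length = 12, 4, 16
print((pitches * durations) ** length)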
It is the common experience of having been moved by music, even to
the point of tears or chills in some instances, that has lent credence to the
notion that music, somehow, is “about” emotions, or exerts its effects by
inducing them. This “being moved” or being impressed by music is indeed
an emotional response. But as we saw in the section “Squandering as Asset,
III,” that response moves up and down the intensity dimension of a single
one of the basic emotions, rather than across them. The fact that music has
an emotional impact in this sense must accordingly be sharply distinguished
from the claim that music is about emotions in the plural, or aims at
evoking emotions, again in the plural, which in either case it would have to
do to fit the metaphor of being a “language of emotions” (Spencer, 1911).
To the extent that the patterns of music refer to circumstances other than
themselves (e.g., storms, battles, or dancing peasants in programmatic
music) they tend to do so by dynamically or otherwise mimicking,
resembling, or caricaturing the things to be evoked (Hanslick, 1854). That
is not how language carries meaning except in the special cases of
onomatopoeia and some of the uses of prosody, both of which lie outside of
the central coding device that gives the sentences of language their unique
and unbounded capacity to mean. So even when music is intended to mean
—which is far from always the case—it does not mean, it mimics.
In song and music without lyrics it is the vocal or instrumental patterns
themselves that are the information conveyed. As detailed in previous
sections, our emotional response to these patterns concerns the inducing
patterns themselves as they interact with the background of our prior
musical familiarity. When that background relates to the genre to which the
patterns belong, the specificity, scope, and intricacy of the interaction is
commensurately enhanced. It is here that the infinite pattern generativity of
music comes into its own. It furnishes the makings of the untold structural
devices (variation and repetition, various symmetries and asymmetries, etc.)
needed to create temporal trajectories capable of sustaining our interest in
the face of the ubiquitous habituability of our cognitive equipment, a
habituability that converts every novelty to “old hat” in the course of a few
encounters. It is not to our emotions that this content is addressed in the first
place, but to our imagination, as Eduard Hanslick, following Arthur
Schopenhauer, insisted (Hanslick, 1854; Schopenhauer, 1844/1966, vol. II,
pp. 447ff.).1
Engaging the contours of our recognition memory the ever-varied
patterns of music trigger a variety of familiarity-based expectancies which
their temporally unfolding melodic, rhythmic, and harmonic patterns
confirm, violate, or complement, generating tensions, their resolution, and
new expectancies in ever-shifting peregrinations across the sensibility
landscape sculpted by the listener’s history of prior exposure (Meyer, 1956;
Narmour, 1977; Schopenhauer, 1844/1966, vol. II, p. 455). For a given
musical listening experience, it is the cumulative effect—presumably in
“leaky integrator” fashion—of the particular sequence of twists and turns
along the temporal trajectory of this interaction that determines how far up
the information dimension toward “awe” a given experience of music takes
us and thus the extent to which it moves or impresses us. In this sense
“being moved” or “impressed” is a specifically esthetic emotion. It may
even be deemed the esthetic emotion (Konečni, 2005, 2011, 2015),
generated by hedonic reversal in the danger zone of the information
dimension.
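To make the "leaky integrator" notion concrete, a minimal computational sketch follows; the decay constant and the sequence of momentary impact values are hypothetical illustrative choices, not quantities proposed in this chapter.

```python
# Minimal sketch of a discrete-time leaky integrator: the running state
# accumulates each momentary impact while earlier input decays away.
# `leak` and the impact values below are hypothetical illustrations.

def leaky_integrate(impacts, leak=0.9):
    """Return the integrator's trajectory over a sequence of impacts.

    leak close to 1.0 retains the past; leak close to 0.0 forgets it.
    """
    state = 0.0
    trajectory = []
    for x in impacts:
        state = leak * state + x  # old evidence decays, new input adds
        trajectory.append(state)
    return trajectory

# A build-up of expectancy-laden moments followed by quiet resolution:
# the cumulative state, not any single moment, sets the peak intensity.
print(leaky_integrate([0.2, 0.5, 1.0, 0.8, 0.1, 0.0]))
```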
In both cultural history and the listening history of individuals, the
infinite space of musical combinatorics differentiates into occupied
subregions according to genre (cf. Merker, 2002, pp. 11–12). Individual
specimens that make up any given region of this space will exhibit greater
or lesser elegance with greater or lesser degrees of well-formedness (see,
e.g., Lerdahl & Jackendoff, 1983) and greater or lesser efficacy in stirring a
given listener’s imagination on encounter. And just as in other perceptual
and cognitive systems, pattern invariants are bound to be extracted across
the sampled space in accordance with a variety of shared structural
characteristics, clustering musical impressions under high-level descriptors.
Thus the categories “nostalgia” (sentimental, dreamy, melancholic),
“power” (energetic, triumphant, heroic), and “tension” (agitated, nervous,
impatient, irritated) were extracted by Zentner and colleagues from responses to a
diverse sample of European classical music (Zentner, Grandjean, & Scherer,
2008). The authors interpret their results in terms of a model of music-
specific emotions. They might also be construed in terms of high-level
intuitive (statistical) pattern-classification, ranging across the vast and
multiform world of musical patterns that have accumulated in a given
musical culture and whose uneven sampling has shaped the musical
familiarity and sensibilities of any given listener.
The perspective on the psychological impact of music presented here by
no means relegates music to an abstract domain of formalist
connoisseurship. Some of its patterns—say of rhythmic music meant to
support dancing—access a presumably species-specific predisposition for
bodily entrainment to isochrony-based auditory patterns, and help optimize
such entrainment (Merker, 2014; Merker, Madison, & Eckerdal, 2009). The
central role of music in youth and popular culture suffices to dispel any
overly formalist notion of its nature, and fits well with the evolutionary
perspective on esthetics presented here.

To sum up: the emotional impact of music is best understood not by analogy to the meaning encoded in language, nor by assimilation to the
biology of basic emotions, but through the behavioral biology of the
Zahavian handicap principle (Zahavi, 1975) and its psychological
ramifications. Where handicaps take the form of esthetic displays—from
the peacock’s tail to the vocal artistry of pied butcherbirds (Taylor, 2009)—
mechanisms for judging their quality must be in place, typically in the
medium of an emotional dimension spanning from boredom, via
interest/curiosity, to being impressed, with awe and a sense of sublimity at
its high end.
As I have been at pains to make credible, the elaboration of Zahavi’s
handicap principle in the developmental stress hypothesis for the size and
complexity of birdsong repertoires provides an eminently plausible
interpretive framework for the nature and function of human song and
music as well. It dispels the appearance of frivolity encumbering our
expenditure of effort and resources on acquiring and producing the pattern
richness of human song and music. By exact analogy to the case of learned
birdsong, it gives us a means to display command and mastery of a trove of
culturally patterned and transmitted lore. Such command and mastery
serves not only as a badge of competence in the culture, but as a certificate
of the phenotypic traits needed to achieve that competence.
In our case today, music is not alone in providing such a shorthand
certificate of phenotypic competence. It was eventually supplemented by
language performing that same function among others. The two domains
share not only the pattern generativity of a combinatorics of discrete
elements, but also the mechanism of vocal learning, and the cerebral
equipment for pattern-assessment. It is even possible that language grew out
of song in a glacial movement of contextual semanticization of song
repertoires, as detailed in Merker (2012).
For music, in the setting of a cultural tradition of pattern familiarity
shared between performer and listener, the circumstances reviewed here
allow a given performance to be appreciated, and even to be assessed, on
occasion, as an outstanding one. And that, I submit, is when extravagance
impresses, and what is more, when it rightfully should impress, according
to the recasting of esthetics in evolutionary terms that has been the burden
of this chapter.

References
Abler, W. L. (1989). On the particulate principle of self-diversifying systems. Journal of Social and
Biological Structures 12(1), 1–13.
Apter, M. J. (1982). The experience of motivation: The theory of psychological reversals. New York:
Academic Press.
Averill, J. R. (1979). The functions of grief. In C. Izard (Ed.), Emotions in personality and
psychopathology (pp. 339–368). New York: Plenum Press.
Baylis, J. R. (1982). Avian vocal mimicry: Its function and evolution. In D. E. Kroodsma & E. H.
Miller (Eds.), Acoustic communication in birds (pp. 51–83). New York: Academic Press.
Benedek, M., & Kaernbach, C. (2011). Physiological correlates and emotional specificity of human
piloerection. Biological Psychology 86(3), 320–329.
Berlyne, D. E. (1960). Conflict, arousal, and curiosity. New York: McGraw-Hill.
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton Century.
Bindra, D. (1959). Stimulus change, reactions to novelty, and response decrement. Psychological
Review 66(2), 96–103.
Bloom, P. (2010). How pleasure works. New York: W. W. Norton.
Bradley, M. M., & Lang, P. J. (2000). Measuring emotion: Behavior, feeling and physiology. In R. D.
Lane, L. Nadel, & G. Ahern (Eds.), Cognitive neuroscience of emotion (pp. 242–276). New York:
Oxford University Press.
Cheston, P. (2015). Artist in legal row claims “former workshop sold her paint-spattered carpet as
genuine works.” Retrieved from http://www.standard.co.uk/news/london/artist-in-legal-row-
claims-former-workshop-sold-her-paint-spattered-carpet-as-genuine-works-a2947666.html
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive
science. Behavioral and Brain Sciences 36(3), 181–204.
Darwin, C. (1871). The descent of man and selection in relation to sex. New York: D. Appleton &
Company.
Davies, S. (1994). Musical meaning and expression. Ithaca, NY: Cornell University Press.
Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms.
Annual Review of Neuroscience 22, 567–631.
Dowsett-Lemaire, F. (1979). The imitative range of the song of the Marsh Warbler, Acrocephalus
palustris, with special reference to imitations of African birds. Ibis 121(4), 453–468.
Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and
emotion (pp. 45–60). Chichester: John Wiley and Sons.
Fontaine, J. J. R., & Scherer, K. R. (2013). Emotion is for doing: The action tendency component. In
J. J. R. Fontaine, K. R. Scherer, & C. Soriano (Eds.), Components of emotional meaning: A
sourcebook (Chapter 11). Oxford Scholarship Online. Oxford: Oxford University Press.
doi:10.1093/acprof:oso/9780199592746.001.0001
Frijda, N. H. (1987). Emotion, cognitive structure, and action tendency. Cognition and Emotion 1(2),
115–143.
Frijda, N. H. (1988). Laws of emotion. American Psychologist 43(5), 349–358.
Friston, K. (2002). Functional integration and inference in the brain. Progress in Neurobiology 68(2),
113–143.
Gabrielsson, A. (2011). Strong experiences with music. Oxford: Oxford University Press.
Gahr, M. (2000). Neural song control system of hummingbirds: Comparison to swifts, vocal learning
(songbirds) and nonlearning (suboscines) passerines, and vocal learning (budgerigars) and
nonlearning (dove, owl, gull, quail, chicken) nonpasserines. Journal of Comparative Neurology
426(2), 182–196.
Garamszegi, L. Z., & Eens, M. (2004). Brain space for a learned task: Strong intraspecific evidence
for neural correlates of singing behavior in songbirds. Brain Research Reviews 44(2–3), 187–193.
Grassberger, P. (1986). Toward a quantitative theory of self-generated complexity. International
Journal of Theoretical Physics 25(9), 907–938.
Hanslick, E. (1854). Vom musikalisch Schönen. Beiträge zur Revision der Ästhetik der Tonkunst.
Leipzig: Weigel.
Hartshorne, C. (1956). The monotony threshold in singing birds. Auk 73, 176–192.
Hasselquist, D., Bensch, S., & von Schantz, T. (1996). Correlation between song repertoire, extra-
pair paternity and offspring survival in the great reed warbler. Nature 381(6579), 229–232.
Heinrichs, M., von Dawans, B., & Domes, G. (2009). Oxytocin, vasopressin, and human social
behavior. Frontiers in Neuroendocrinology 30(4), 548–557.
Hinton, G. E., & Zemel, R. S. (1994). Autoencoders, minimum description length, and Helmholtz
free energy. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information
processing systems 6 (pp. 3–10). San Mateo, CA: Morgan Kaufmann.
Hunter, P. G., & Schellenberg, E. G. (2010). Music and emotion. In M. R. Jones, R. R. Fay, & A. N.
Popper (Eds.), Music perception (pp. 129–164). New York: Springer.
Huron, D. (2001). Is music an evolutionary adaptation? Annals of the New York Academy of Sciences
930, 43–61.
Iwaniuk, A. N., & Nelson, J. E. (2003). Developmental differences are correlated with relative brain
size in birds: A comparative analysis. Canadian Journal of Zoology 81(12), 1913–1928.
Izard, C. E. (2007). Basic emotions, natural kinds, emotion schemas, and a new paradigm.
Perspectives on Psychological Science 2(3), 260–280.
Janik, V. M., & Slater, P. J. B. (1997). Vocal learning in mammals. Advances in the Study of Behavior
26, 59–99.
Jarvis, E. D. (2007). Neural systems for vocal learning in birds and humans: A synopsis. Journal of
Ornithology 148, 35–44.
Jordan-Smith, P. (1960). The road I came; some recollections and reflections concerning changes in
American life and manners since 1890. Caldwell, Idaho: Caxton Printers.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford:
Oxford University Press.
Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In R. Church & B. Campbell
(Eds.), Punishment and aversive behavior (pp. 279–296). New York: Appleton-Century-Crofts.
Kelly, A. M., & Goodson, J. L. (2014). Social functions of individual vasopressin–oxytocin cell
groups in vertebrates: What do we really know? Frontiers in Neuroendocrinology 35(4), 512–529.
Konečni, V. J. (2003). Review of P. N. Juslin and J. A. Sloboda (Eds.), Music and emotion: Theory
and research. Music Perception 20, 332–341.
Konečni, V. J. (2005). The aesthetic trinity: Awe, being moved, thrills. Bulletin of Psychology and the
Arts 5(2), 27–44.
Konečni, V. J. (2008). Does music induce emotion? A theoretical and methodological analysis.
Psychology of Aesthetics, Creativity, and the Arts 2(2), 115–129.
Konečni, V. J. (2011). Aesthetic trinity theory and the sublime. Philosophy Today 55, 64–73.
Konečni, V. J. (2015). Being moved as one of the major aesthetic emotional states: A commentary on
“Being moved: linguistic representation and conceptual structure.” Frontiers in Psychology 6, 343.
Konečni, V. J., Brown, A., & Wanic, R. (2008). Comparative effects of music and recalled life-events
on emotional state. Psychology of Music 36(3), 289–308.
Konishi, M. (2004). The role of auditory feedback in birdsong. In H. P. Ziegler & P. Marler (Eds.),
The behavioral neurobiology of birdsong. Annals of the New York Academy of Sciences 1016, 463–
475.
Kroodsma, D. E. (1978). Continuity and versatility in birdsong: Support for the monotony threshold
hypothesis. Nature 274(5672), 681–683.
Kroodsma, D. E., & Parker, L. D. (1977). Vocal virtuosity in the brown thrasher. Auk 94, 783–785.
Kuehnast, M., Wagner, V., Wassiliwizky, E., Jacobsen, T., & Menninghaus, W. (2014). Being moved:
Linguistic representation and conceptual structure. Frontiers in Psychology: Emotion Science 5,
1242.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT
Press.
Lim, M. M., & Young, L. J. (2006). Neuropeptidergic regulation of affiliative behavior and social
bonding in animals. Hormones and Behavior 50(4), 506–517.
Lipsius, I. M. (1919). Liszt und die Frauen. Leipzig: Breitkopf & Härtel.
MacDougall-Shackleton, S. A., & Spencer, K. A. (2012). Developmental stress and birdsong:
Current evidence and future directions. Journal of Ornithology 153(Suppl. 1), S105–S117.
Madison, G., & Schiölde, G. (2017). Repeated listening increases the liking for music regardless of
its complexity: Implications for the appreciation and aesthetics of music. Frontiers in
Neuroscience 11, 147.
Marks, I. M. (1969). Fears and phobias. New York: Academic Press.
Merker, B. (2002). Music: The missing Humboldt system. Musicae Scientiae 6(1), 3–21.
Merker, B. (2005). The conformal motive in birdsong, music and language: An introduction. In G.
Avanzini, L. Lopez, S. Koelsch, & M. Majno (Eds.), The neurosciences and music II: From
perception to performance. Annals of the New York Academy of Sciences 1060, 17–28.
Merker, B. (2007a). Consciousness without a cerebral cortex: A challenge for neuroscience and
medicine. Behavioral and Brain Sciences 30(1), 63–134.
Merker, B. (2007b). Music at the limits of the mind. In G. Kugiumutzakis (Ed.), Sympantiki Armonia,
Musike kai Epistimi. Ston Miki Theodoraki [Universal harmony, music and science. In honour of
Mikis Theodorakis]. Heraklion: Crete University Press.
Merker, B. (2008). Ritual foundations of human uniqueness. In S. Malloch & C. Trevarthen (Eds.),
Communicative musicality (pp. 45–59). Oxford: Oxford University Press.
Merker, B. (2012). The vocal learning constellation: Imitation, ritual culture, encephalization. In N.
Bannan & S. Mithen (Eds.), Music, language and human evolution (pp. 215–260). Oxford: Oxford
University Press.
Merker, B. (2013a). The efference cascade, consciousness, and its self: Naturalizing the first person
pivot of action control. Frontiers in Psychology 4, article 501, 1–20.
Merker, B. (2013b). Cortical gamma oscillations: The functional key is activation, not cognition.
Neuroscience & Biobehavioral Reviews 37(3), 401–417.
Merker, B. (2014). Groove or swing as distributed rhythmic consonance: Introducing the groove
matrix. Frontiers in Human Neuroscience 8, article 454, 1–4.
Merker, B. (2015). Seven theses on the biology of music and language. Signata 6, 195–213.
Merker, B., Madison, G., & Eckerdal, P. (2009). On the role and origin of isochrony in human
rhythmic entrainment. Cortex 45(1), 4–17.
Merker, B., Morley, I., & Zuidema, W. (2015). Five fundamental constraints on theories of the
origins of music. Philosophical Transactions of the Royal Society of London: Biology 370(1664), 20140095. doi:10.1098/rstb.2014.0095
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Miller, G. F. (2000). The mating mind: How sexual choice shaped the evolution of human nature.
New York: Doubleday.
Museum of Hoaxes (2005). Monkey art fools expert. Retrieved from:
http://hoaxes.org/weblog/comments/monkey_art_fools_expert
Narmour, E. (1977). Beyond Schenkerism: The need for alternatives in music analysis. Chicago, IL:
University of Chicago Press.
Nettle, D., & Clegg, H. (2006). Schizotypy, creativity and mating success in humans. Proceedings of
the Royal Society of London B: Biological Sciences 273, 611–615. doi:10.1098/rspb.2005.3349
Nowicki, S., Searcy, W. A., & Peters, S. (2002a). Brain development, song learning and mate choice
in birds: A review and experimental test of the “nutritional stress hypothesis.” Journal of
Comparative Physiology A: Sensory, Neural, and Behavioral Physiology 188, 1003–1004.
Nowicki, S., Searcy, W. A., & Peters, S. (2002b). Quality of song learning affects female response to
male bird song. Proceedings of the Royal Society of London B: Biological Sciences 269, 1949–
1954.
Okanoya, K. (2004). Song syntax in Bengalese finches: Proximate and ultimate analyses. Advances
in the Study of Behavior 34, 297–345.
Patel, A. D. (2008). Music, language, and the brain. Oxford: Oxford University Press.
Pearce, E., Launay, J., & Dunbar, R. I. M. (2015). The ice-breaker effect: Singing mediates fast social
bonding. Royal Society Open Science 2, 150221. Retrieved from
http://dx.doi.org/10.1098/rsos.150221
Pinker, S. (1997). How the mind works. New York: Penguin Putnam.
Rauhe, H. (2003). Musik heilt und befreit. In H. G. Bastian & G. Kreutz (Eds.), Musik und
Humanität. Interdisziplinäre Grundlagen für (musikalische) Erziehung und Bildung (pp. 182–
191). Mainz: Schott.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the
effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.),
Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-
Crofts.
Riebel, K. (2003). The “mute” sex revisited: Vocal production and perception learning in female
songbirds. Advances in the Study of Behavior 33, 49–86.
Robinson, J. (Ed.). (1997). Music and meaning. Ithaca, NY: Cornell University Press.
Rohrmeier, M. A., & Koelsch, S. (2012). Predictive information processing in music cognition. A
critical review. International Journal of Psychophysiology, 83, 164–175.
Sachs, E. (1967). Dissociation of learning in rats and its similarities to dissociative states in man. In
J. Zubin & H. Hunt (Eds.), Comparative psychopathology: Animal and human (pp. 249–304). New
York: Grune and Stratton.
Scherer, K. R. (2003). Why music does not produce basic emotions. In R. Breslin (Ed.), Proceedings
of the Stockholm Music Acoustic Conference, 2 vols., Vol. 1 (pp. 25–28). Retrieved from
http://www.speech.kth.se/smac03
Scherer, K. R., & Zentner, M. R. (2001). Emotional effects of music: Production rules. In P. N. Juslin
& J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361–392). Oxford: Oxford
University Press.
Scherer, K. R., Zentner, M. R., & Schacht, A. (2001–2002). Emotional states generated by music: An
exploratory study of music experts. Musicae Scientiae, Special Issue: Current trends in the study of
music and emotion, 149–171.
Schopenhauer, A. (1844/1966). The world as will and representation (2nd ed.; orig. ed. 1819). Trans.
E. F. J. Payne, 2 vols. New York: Dover.
Shields, S. A., MacDowell, K. A., Fairchild, S. B., & Campbell, M. L. (1987). Is mediation of
sweating cholinergic, adrenergic, or both? A comment on the literature. Psychophysiology 24(3),
312–319.
Silvia, P. J., & Nusbaum, E. C. (2011). On personality and piloerection: Individual differences in
aesthetic chills and other unusual aesthetic experiences. Psychology of Aesthetics, Creativity, and
the Arts 5(3), 208–214.
Smith, L. P. (1924). Four words: Romantic, originality, creative, genius. Oxford: Clarendon Press.
Sokolov, E. N. (1963). Higher nervous functions: The orienting reflex. Annual Review of Physiology
25, 545–580.
Spencer, H. (1911). On the origin and function of music. In Essays on education and kindred subjects
(pp. 312–330). London: J. M. Dent & Sons.
Staal, F. (1989). Rules without meaning. New York: Peter Lang.
Strohminger, N. S. (2013). The hedonics of disgust (Doctoral dissertation). University of Michigan.
Retrieved from https://deepblue.lib.umich.edu/handle/2027.42/97960
Taylor, H. (2009). Towards a species songbook: Illuminating the vocalisations of the Australian pied
butcherbird (Cracticus nigrogularis) (Doctoral dissertation). University of Western Sydney.
Tribus, M. (1961). Thermodynamics and thermostatics: An introduction to energy, information and
states of matter, with engineering applications. New York: Van Nostrand.
Vickhoff, B., Åström, R., & Theorell, T. (2012). Musical piloerection. Music and Medicine 4, 82–89.
Waterhouse, F. A. (1926). Romantic “originality.” The Sewanee Review 34, 40–49.
Zahavi, A. (1975). Mate selection: A selection for a handicap. Journal of Theoretical Biology 53(1),
205–214.
Zangwill, N. (2004). Against emotion: Hanslick was right about music. British Journal of Aesthetics
44(1), 29–43.
Zentner, M., Grandjean, D., & Scherer, K. R. (2008). Emotions evoked by the sound of music:
Characterization, classification, and measurement. Emotion 8(4), 494–521.

1. For Schopenhauer’s influence on Hanslick, see Merker (2007b), to which can be added the fact
that the “smoking gun” of that influence in Hanslick’s final paragraph was eliminated from all but the
first edition of Hanslick’s famous essay.
SECTION III

MUSIC PROCESSING IN THE HUMAN BRAIN
CHAPTER 5

CEREBRAL ORGANIZATION OF MUSIC PROCESSING

THENILLE BRAUN JANZEN AND MICHAEL H. THAUT

Understanding the neural underpinnings of music processing is a central theme in cognitive neuroscience, as evidenced by the growing body of
literature on this topic. Neuroimaging research conducted over the past 20 years has successfully mapped several cortical and subcortical brain regions
that support music processing. This chapter provides a broad panorama of
the current knowledge concerning the anatomical and functional basis of
music processing in the healthy brain. For that, we focus our attention on
core brain networks implicated in music processing, emphasizing the
anatomical and functional interactions between cortical and subcortical
areas within auditory-frontal networks, auditory-motor networks, and
auditory-limbic networks. Finally, we review recent studies investigating
how brain networks organize themselves in a naturalistic music listening
context. The term network here refers to a collection of regions
that are activated to support a particular function, referencing structural and
functional connections between these regions. With that, we move beyond
the “where” and “when” of task-related activity to start understanding how
different brain networks interact to support cognitive, perceptual, and motor
functions.

Neural Basis of Music Processing in the Healthy Brain

The Ascending Auditory Pathways


Music perception begins with the decoding of acoustic information.
Acoustic signals such as voices and music enter the human ear and trigger a
cascade of signal transpositions along the auditory pathways (Fig. 1).
Incoming auditory signals are transmitted by the outer and middle ear to the
cochlea of the inner ear, where acoustic information is translated into neural
activity. Acoustic properties such as sound frequency are represented
tonotopically in the basilar membrane of the cochlea, which refers to the
systematic topographical arrangement of neurons as a function of their
response to tones of different frequencies. This tonotopic organization is
found throughout the auditory neuraxis (Humphries, Liebenthal, & Binder,
2010; Zatorre, 2002).
FIGURE 1. The neural auditory pathway consists of an interconnecting cascade of processing
nodes from the cochlear nucleus (CN) up to primary auditory cortex (AC) and higher-level auditory
regions in superior temporal cortex (STC). Abbreviations: CN, cochlear nucleus; SOC, superior
olivary complex; IC, inferior colliculus; HC, hippocampus; MGB, medial geniculate body; AC,
auditory cortex; STC, superior temporal cortex.
Reprinted from Progress in Neurobiology 123(1), Sascha Frühholz, Wiebke Trost, and Didier
Grandjean, The role of the medial temporal limbic system in processing emotions in voice
and music, pp. 1–17, https://doi.org/10.1016/J.PNEUROBIO.2014.09.003, Copyright © 2014
Elsevier Ltd. All rights reserved.
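The tonotopic frequency-place mapping just described can be given a quantitative form. The sketch below uses the standard Greenwood approximation for the human cochlea; the parameter values are the commonly cited psychoacoustic estimates, not figures given in this chapter.

```python
# Minimal sketch of the cochlear tonotopic map using the Greenwood
# approximation f(x) = A * (10**(a*x) - k) for the human cochlea.
# A = 165.4 Hz, a = 2.1, k = 0.88 are standard published estimates.

def greenwood_frequency(position):
    """Characteristic frequency (Hz) at a relative basilar-membrane
    position, where 0.0 is the apex (low) and 1.0 the base (high)."""
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * position) - k)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"position {x:.2f} -> {greenwood_frequency(x):8.1f} Hz")
# Spans roughly 20 Hz at the apex to about 20 kHz at the base.
```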

Outside of the cochlea, dendrites of the spiral ganglion cells synapse with the base of the hair cells located in the organ of Corti on the basilar
membrane. Triggered by the movement of the hair cells on the basilar
membrane, the spiral ganglion cells are the first neurons to fire an action
potential in the auditory pathway and transmit all the brain’s auditory input
via their axons synapsing with the dendrites of the cochlear nuclei (Amunts,
Morosan, Hilbig, & Zilles, 2012; Froud et al., 2015; Nayagam, Muniak, &
Ryugo, 2011).
The majority of the fibers (70 percent) cross over to the opposite
hemisphere starting at the level of the cochlear nuclei (contralateral
pathway), while some remain on the same incoming side (ipsilateral
pathway). The acoustic information is highly preprocessed by a series of
brainstem nuclei before reaching the cortex. Basic acoustic features such as
sound intensity, signal onsets, periodicity, and signal location are extracted
in the cochlear nucleus, lateral lemniscus, and the superior olivary complex.
A secondary pathway originates in the ventral cochlear nucleus, from which some fibers project to the reticular formation, a general arousal system in the lower brainstem. Descending (efferent) fiber tracts
from the reticular formation form the audio-spinal pathway by connecting
with the motor neurons in the spinal cord to innervate reflexive motor
responses to sound and to prime motor neural excitability (Horn, 2006;
Huffman & Henson, 1990; Rossignol & Melvill Jones, 1976). The
secondary ascending (afferent) pathway inhibits lower auditory centers to
elevate hearing thresholds and alert the cortex to incoming auditory signals.
In the primary ascending pathway, the superior olivary complex is the
first relay station of the brainstem where cochlear inputs from both left and
right sides converge, providing the anatomical basis for the processing of
sound location by measuring timing and sound intensity differences
between incoming left and right signals to determine sound angles (Grothe,
2000; Tollin, 2003). More complex spectral and temporal decoding of the
acoustic signals occurs in the inferior colliculus. Functional magnetic
resonance imaging research with animals has shown that the spectral and
temporal dimensions of the acoustic signals are distinctly mapped in the
inferior colliculus, indicating that, in addition to the tonotopic maps, the
temporal envelope of the acoustic signals is also topographically
represented in the inferior colliculus (Baumann et al., 2011). The last cross-
lateral projections are at the inferior colliculus level.
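The timing-difference computation attributed to the superior olivary complex can be illustrated with a simple geometric model. The sketch below uses the textbook sine approximation with an assumed interaural distance of 0.18 m; both the model and its parameters are illustrative simplifications, not values from the studies cited above.

```python
import math

# Minimal sketch of azimuth estimation from an interaural time
# difference (ITD), using the simplified sine model. HEAD_WIDTH and
# the model itself are assumed textbook approximations.

SPEED_OF_SOUND = 343.0  # m/s in air
HEAD_WIDTH = 0.18       # m, approximate interaural distance

def itd_for_angle(azimuth_deg):
    """ITD (s) for a source at a given azimuth (0 deg = straight ahead)."""
    return HEAD_WIDTH * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

def angle_for_itd(itd):
    """Azimuth (deg) implied by a measured ITD under the same model."""
    return math.degrees(math.asin(itd * SPEED_OF_SOUND / HEAD_WIDTH))

itd = itd_for_angle(30.0)  # roughly 0.26 ms for a source 30 deg off-center
print(f"ITD = {itd * 1e6:.0f} us -> azimuth = {angle_for_itd(itd):.1f} deg")
```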
The last subcortical node in the primary ascending pathway is the medial
geniculate body, which is comprised of multiple subdivisions. The ventral
nucleus of the medial geniculate body is tonotopically organized and is the
main ascending route to the primary auditory cortex, while its other
subdivisions project widely to both primary and non-primary auditory
cortex. Importantly, the auditory pathway does not consist only of
ascending projections; it also has rich top-down projections that are critical
for modulation of neural responses in the subcortical auditory centers and
for learning-induced plasticity (Bajo, Nodal, Moore, & King, 2010; Suga &
Ma, 2003). In general, conduction in the auditory pathway is faster and
stronger for the contralateral pathway.
The human auditory cortex is located in the posterior part of the superior
temporal lobe, covering Heschl’s gyrus and parts of the planum temporale and the posterior superior temporal gyrus. More specifically, the primary auditory cortex is largely located in the medial part of Heschl’s gyrus (corresponding to Brodmann’s area BA41), and its core auditory
region is tonotopically organized such that different subregions of the
cortex are sensitive to different frequency bands (Langers, 2014; Norman-
Haignere, Kanwisher, & McDermott, 2013). The primary auditory cortex
performs fine-grained and specific analysis of acoustic features, such as
frequency (Da Costa et al., 2011; Humphries et al., 2010; Warren,
Uppenkamp, Patterson, & Griffiths, 2003) and spectro-temporal modulation
(Schönwiesner & Zatorre, 2009), playing a key role in the transformation
of acoustic features into auditory percepts (e.g., from sound frequency into
pitch percept) (Griffiths & Warren, 2004). Several lesion studies and
functional imaging research have identified the lateral Heschl’s gyrus as a
pitch-sensitive area, suggesting that pitch percepts are represented in this
particular cortical region of the auditory cortex (for review, see Zatorre &
Zarate, 2012).
After the initial decoding of acoustic information in the primary auditory
cortex, the information is transmitted to the secondary auditory cortex
(located in the planum temporale and the planum polare) and to higher-level
associative cortex in the superior temporal cortex and superior temporal
sulcus. Areas of the non-primary auditory cortex are involved in a number
of functions crucial for establishing a cognitive representation of the
acoustic environment, including the representation of auditory objects
(auditory Gestalt formation), which entails processes such as the analysis of
the contour of a melody, spatial grouping, extraction of inter-sound
relationships, and stream segregation (Griffiths & Warren, 2002, 2004; for
review, see Koelsch, 2011).
Within the non-primary auditory cortex, there are multiple differentiated
networks that have distinct functional roles (Cammoun et al., 2015). There
is consistent evidence indicating that the superior temporal gyrus—both
anterior and posterior to Heschl’s gyrus—plays an important role in
melodic processing (for review, see Janata, 2015; Peretz & Zatorre, 2005;
Zatorre & Zarate, 2012). For instance, the superior temporal lobe (including
both the superior temporal gyrus and the superior temporal sulcus) has been
identified in studies examining melodic contour processing (Lee, Janata,
Frost, Hanke, & Granger, 2011; Patterson, Uppenkamp, Johnsrude, &
Griffiths, 2002; Schindler, Herdener, & Bartels, 2013; Tramo, Shah, &
Braida, 2002), perception of melodic intervals (Klein & Zatorre, 2015),
sound spectral envelope (Warren, Jennings, & Griffiths, 2005), and
categorical perception of major and minor chords (Klein & Zatorre, 2011).
Interestingly, studies have shown that the posterior region of the auditory
cortex is more sensitive to changes in pitch height (which refers to
the spectral weighting of a sound), whereas more anterior areas are more
sensitive to changes in pitch chroma (which is a feature related to the
relative position of a pitch within a scale), indicating that pitch dimensions
may have distinct representations in the human auditory cortex (Warren et
al., 2003). Recently emerging evidence suggests that the parietal cortex and
posterior regions of the superior temporal sulcus are key brain areas for
multisensory integration, where information from auditory, visual, tactile,
and multisensory stimuli converges via a patchy distribution of inputs,
followed by integration in the intervening cortex (Beauchamp, Argall,
Bodurka, Duyn, & Martin, 2004; Beauchamp, Nath, & Pasalar, 2010;
Beauchamp, Yasar, Frye, & Ro, 2008).
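Returning to the height/chroma distinction reported just above: the two pitch dimensions are easy to separate formally. The sketch below uses MIDI note numbers as a convenient pitch code; this encoding is an illustrative convention, not a formalism used in the studies cited.

```python
# Minimal sketch separating pitch chroma (position within the octave)
# from pitch height (which octave), using the MIDI convention that
# note 60 is C4. The encoding is illustrative, not from this chapter.

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def chroma(midi_note):
    """Pitch class: the chroma dimension, cyclic with period 12."""
    return NOTE_NAMES[midi_note % 12]

def height(midi_note):
    """Octave number: the height dimension, independent of chroma."""
    return midi_note // 12 - 1

for note in (60, 72, 84):  # three C's an octave apart
    print(note, chroma(note), height(note))  # same chroma, rising height
```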
Functional differences have also been reported between the left and right
auditory cortices, whereby the left auditory cortical areas have a higher
degree of temporal sensitivity, whereas corresponding areas on the right
auditory cortex have a greater spectral resolution (Andoh & Zatorre, 2011;
Cha, Zatorre, & Schönwiesner, 2016; Perani, 2012; Santoro et al., 2014;
Stewart, Overath, Warren, Foxton, & Griffiths, 2008; Tervaniemi et al.,
2000; Warrier et al., 2009). Notably, research has repeatedly shown a right-
hemisphere bias toward fine-grained spectral processing and a
preferential response in the left hemisphere for temporal features of sounds,
which supports the hypothesis that these functional asymmetries at early
stages of auditory processing may be related to the intrinsic properties of
each cortical hemisphere (Zatorre & Zarate, 2012). However, the pattern of
activation between hemispheres can be modulated by stimulus complexity
and/or task demands (Brechmann & Scheich, 2005; Hyde, Peretz, &
Zatorre, 2008; Schön, Gordon, & Besson, 2005; Stewart et al., 2008) or
music training (Ohnishi et al., 2001; Proverbio, Orlandi, & Pisanu, 2016).
With respect to music perception, the findings outlined thus far reveal a
hierarchical organization of auditory processing (Stewart et al., 2008;
Wessinger et al., 2001; see also de Heer, Huth, Griffiths, Gallant, &
Theunissen, 2017). The primary auditory cortex plays a crucial role in
extracting individual pitches and pitch changes within the melody, whereas
non-primary auditory areas are involved in determining relationships
between pitches to define the melody contour. More abstract processes
required to establish syntactic relationships and meaning occur largely in
regions outside of the auditory cortex, including the frontal cortex.

Auditory-Frontal Networks
The transformation of the auditory information into a musically meaningful
tonal context involves several areas of the frontal cortex. Studies of music
syntax, utilizing primarily expectancy violation paradigms, have
demonstrated that regions of the inferior frontal gyrus respond to harmonic
expectancy violations (Bianco et al., 2016; Janata, Birk, et al., 2002;
Koelsch et al., 2002; Koelsch, Fritz, Schulze, Alsop, & Schlaug, 2005;
Maess, Koelsch, Gunter, & Friederici, 2001; Seger et al., 2013; Tillmann,
Janata, & Bharucha, 2003). Reports have repeatedly indicated that the
cortical network comprising the inferior frontolateral cortex (corresponding
to BA44), inferior frontal gyrus, the anterior portion of the superior
temporal gyrus, and the ventral premotor cortex, is involved in the
processing of musical structure (for review, see Koelsch, 2006, 2011). This
network appears to be specialized in establishing syntactic relationships by
evaluating the harmonic relationship between incoming tonal information
and a preceding harmonic sequence, thus detecting musical-structural
irregularities and organizing fast short-term predictions of upcoming
musical events (Koelsch, 2006). Recent imaging research has also
suggested that rhythmic and melodic deviations in musical sequences may
recruit different cortical areas—pitch deviations engage a neural network
comprising auditory cortices, inferior frontal and prefrontal areas, whereas
rhythmic deviations of a musical sequence recruit neural networks
involving the posterior parts of the auditory cortices and parietal areas
(Lappe, Lappe, & Pantev, 2016; Lappe, Steinsträter, & Pantev, 2013).
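The logic of expectancy-violation paradigms can be illustrated with a toy statistical model: a listener that has internalized chord-transition statistics assigns high surprisal to harmonically irregular continuations. The corpus, chord labels, and smoothing rule below are hypothetical simplifications, not stimuli or models from the studies cited.

```python
import math
from collections import Counter, defaultdict

# Minimal sketch of harmonic expectancy as transition statistics: learn
# bigram counts over Roman-numeral chord labels, then score how
# unexpected a continuation is. Corpus and labels are toy examples.

corpus = [["I", "IV", "V", "I"], ["I", "ii", "V", "I"], ["I", "V", "I"]]
vocab = {chord for progression in corpus for chord in progression}

counts = defaultdict(Counter)
for progression in corpus:
    for prev, nxt in zip(progression, progression[1:]):
        counts[prev][nxt] += 1

def surprisal(prev, nxt):
    """-log2 P(next | prev), with add-one smoothing over the vocabulary.
    Larger values mean a more unexpected (irregular) continuation."""
    total = sum(counts[prev].values()) + len(vocab)
    return -math.log2((counts[prev][nxt] + 1) / total)

print(f"V -> I : {surprisal('V', 'I'):.2f} bits")   # expected cadence
print(f"V -> IV: {surprisal('V', 'IV'):.2f} bits")  # irregular, higher
```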
These findings are in accordance with the dual-pathway model of
auditory processing, which hypothesizes that two auditory processing
pathways originate from the primary auditory cortex, each contributing to
processing different higher-order aspects of auditory stimuli (Belin &
Zatorre, 2000; Bizley & Cohen, 2013; Hickok & Poeppel, 2007;
Rauschecker & Scott, 2009). The anterior-ventral auditory pathway—which
projects from anterior superior temporal gyrus to anterior inferior frontal
gyrus and prefrontal areas—is predominantly involved in perceiving
auditory objects and processing auditory spectral features. For instance, it
has been shown that the inferior frontal gyrus and related areas of the
ventrolateral prefrontal cortex are activated during phonological and
semantic processing, non-verbal auditory sound detection (Kiehl, Laurens,
Duty, Forster, & Liddle, 2001), discrimination and auditory feature
detection (Gaab, Gaser, Zaehle, Jancke, & Schlaug, 2003; Zatorre,
Bouffard, & Belin, 2004), and auditory working memory (Kaiser, Ripper,
Birbaumer, & Lutzenberger, 2003), which reinforces the assumption that
these areas play a more fundamental role in auditory processing. On the
other hand, the posterior-dorsal stream—which connects posterior superior
temporal gyrus with posterior inferior frontal gyrus, posterior parietal
cortex, and premotor cortex—has been implicated in extracting spectral
motion and temporal components of an auditory stimulus, thus processing
how frequencies change over time (review: Plakke & Romanski, 2014;
Zatorre & Zarate, 2012). Recent evidence indicates that the dorsal pathway
of auditory processing also plays an important role in calculating and
comparing pitch or temporal manipulations within a context and using this
auditory information to select and prepare appropriate motor responses
(Belin & Zatorre, 2000; Chen, Rae, & Watkins, 2012; Foster, Halpern, &
Zatorre, 2013; Hickok & Poeppel, 2007; Loui, 2015; Saur et al., 2008;
Warren, Wise, & Warren, 2005).
Frontal cortex activity has also been associated with cognitive demands
or the stimulus properties within a task. Tasks that require maintenance and
rehearsal of musical information activate the working memory functional
network, comprising the ventrolateral premotor cortex (encroaching Broca’s
area), dorsal premotor cortex, the planum temporale, inferior parietal lobe,
the anterior insula, and subcortical structures (Koelsch et al., 2009; Royal et
al., 2016; Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011). The
medial prefrontal cortex (primarily the medial orbitofrontal region) appears
to be particularly engaged in tasks requiring self-referential judgments
(Alluri et al., 2013; Zysset, Huber, Ferstl, & von Cramon, 2002), musical
semantic memory (Groussard et al., 2010; Platel, Baron, Desgranges,
Bernard, & Eustache, 2003), and music-evoked autobiographical memory
(Janata, 2009; Von Der Heide, Skipper, Klobusicky, & Olson, 2013). Other areas, such as parietal and ventrolateral prefrontal regions, are
differentially activated depending on the relative attentional demands of the
tasks (Alho, Rinne, Herron, & Woods, 2014; Janata, Tillmann, & Bharucha,
2002; Maidhof & Koelsch, 2011; Satoh, Takeda, Nagata, Hatazawa, &
Kuzuhara, 2001).
Involuntary musical imagery—that is, the spontaneous experience of
having music looping in one’s head—is associated with cortical thickness in
regions of the right frontal and temporal cortices as well as the anterior
cingulate and left angular gyrus (Farrugia, Jakubowski, Cusack, & Stewart,
2015). On the other hand, voluntary musical imagery—the generation of
mental representation of music or musical attributes in the absence of real
sound input—engages secondary auditory cortices, the parietal cortex,
inferior frontal regions, the supplementary motor area (SMA) and pre-SMA
(Brown & Martinez, 2007; Halpern, Zatorre, Bouffard, & Johnson, 2004;
Harris & De Jong, 2014; Peretz et al., 2009; Zatorre, Halpern, Perry, Meyer,
& Evans, 1996). Neural activity in motor areas during perception or mental
imagery of sounds has been repeatedly reported when musicians listen to a
well-rehearsed musical sequence (Bangert et al., 2006; D’Ausilio,
Altenmüller, Olivetti Belardinelli, & Lotze, 2006; Harris & De Jong, 2014;
Haueisen & Knösche, 2001) or when pianists watch silent video recordings
of hands playing a keyboard (Baumann et al., 2007; Bianco et al.,
2016; Hasegawa et al., 2004). Activation of the fronto-parietal motor-
related network (comprising Broca’s area, the premotor region, intraparietal
sulcus, and inferior parietal region) was also found when non-musicians
listened to a piano piece they learned to play (Lahav, Saltzman, & Schlaug,
2007). These studies collectively show that the mere perception or mental
imagery of sounds (which would normally be associated with a specific
action) can automatically trigger representations of the movement necessary
to produce these sounds, providing strong evidence that perception and
action are intrinsically coupled in the human brain and in cognition (for
review, see Keller, 2012; Maes, Leman, Palmer, & Wanderley, 2014;
Novembre & Keller, 2014).
Auditory-Motor Networks
Projections from motor cortex to the auditory cortex are an architectural
feature common to many animal species (Schneider, Nelson, & Mooney,
2014). Animal models have indeed proven important for investigating the synaptic and circuit mechanisms by which the motor cortex
interacts with auditory cortical activity (e.g., Merchant, Perez, Zarco, &
Gamez, 2013; Nelson et al., 2013; Roberts et al., 2017; Schneider &
Mooney, 2015). For instance, a recent study in mice found that axons from
the secondary motor cortex make synapses onto both excitatory and
inhibitory neurons in deep and superficial layers of the auditory cortex and
that a subset of these neurons extends axons to various subcortical areas
important for auditory processing (Nelson et al., 2013). The analysis of
local field potentials of behaving macaques has also provided valuable
insight regarding the neural underpinnings for beat synchronization,
showing, for example, that beta-band oscillations may enable
communication between distributed circuits involving the striato-thalamo-
cortical network during rhythm perception and production (for a review, see
Merchant & Bartolo, 2018; Merchant, Grahn, Trainor, Rohrmeier, & Fitch,
2015; see also Chapter 8).
Recent research has also identified fiber projections transmitting
auditory signals into motor regions in the human brain (Fernández-Miranda
et al., 2015). Fernández-Miranda and colleagues demonstrated that the left
and right arcuate fascicle, a white matter fiber tract that links lateral
temporal cortex with frontal areas, is segmented into subtracts with distinct
fiber terminations (Fig. 2). One set of fibers terminates at the ventral
precentral and caudal middle frontal gyri (BA4, BA6), providing direct
projections from auditory cortex to motor areas (primary motor cortex,
premotor cortex).
FIGURE 2. Subtracts of the left arcuate fascicle with terminations on primary motor cortex and
premotor cortex, corresponding to Brodmann areas BA6 and BA4 (ventral precentral and caudal
middle frontal gyri).
Reprinted by permission from Brain Structure and Function 220 (3), Asymmetry,
connectivity, and segmentation of the arcuate fascicle in the human brain, Juan C. Fernández-
Miranda, Yibao Wang, Sudhir Pathak, Lucia Stefaneau, Timothy Verstynen, and Fang-Cheng
Yeh, pp. 1665–1680, https://doi.org/10.1007/s00429-014-0751-7 © Springer-Verlag Berlin
Heidelberg, 2014.

Further evidence of functional coordination between auditory and motor cortices has been provided by a robust body of neuroimaging research.
Studies have shown that listening to and encoding auditory rhythms
internally increases auditory-motor brain connectivity (Chen, Penhune, &
Zatorre, 2008a; Chen, Zatorre, & Penhune, 2006; Fujioka, Trainor, Large,
& Ross, 2012; Grahn & Brett, 2007), and that the coupling among cortical
motor and auditory areas is strengthened with musical training (Chen,
Penhune, & Zatorre, 2008b; Grahn & Rowe, 2009; Palomar-García,
Zatorre, Ventura-Campos, Bueichekú, & Ávila, 2017). Studies have also
found that corticospinal excitability is modulated by music with a strong
beat (“groove”), which suggests that merely listening to musical rhythm
elicits activity in motor-output pathways from the primary motor cortex to
the spinal cord (Giovannelli et al., 2013; Michaelis, Wiener, & Thompson,
2014; Stupacher, Hove, Novembre, Schütz-Bosbach, & Keller, 2013).
Further evidence of auditory-motor coupling at spinal cord level is provided
by research showing that delivering transcranial magnetic stimulation in
time with the music facilitates corticospinal excitability in muscles involved
in foot tapping (i.e., tibialis anterior, gastrocnemius) (Wilson & Davey,
2002; see also Thaut, McIntosh, Prassas, & Rice, 1992), and that the degree
of corticospinal excitability depends on musical training, being greater in
trained musicians (D’Ausilio et al., 2006; Stupacher et al., 2013). Finally,
extensive neurophysiological evidence indicates that auditory and motor
regions communicate through oscillatory activity and that the cortical loop
between these areas generates temporal predictions that are crucial in
auditory perceptual learning and for the perception of, and entrainment to,
musical rhythms (Fujioka et al., 2012; Large, Herrera, & Velasco, 2015;
Large & Snyder, 2009; Ross, Barat, & Fujioka, 2017; for review: Merchant
et al., 2015; Morillon & Baillet, 2017; Ross, Iversen, & Balasubramaniam,
2016). Therefore, evidence at multiple levels of inquiry suggests that there
is a strong functional and anatomical link between auditory and motor-
related areas, and that many components of the motor system are deeply
involved in auditory perceptual learning, in the generation of predictions, as
well as in the perception of, and entrainment to, musical rhythms.
Interconnectivity between auditory and motor-related areas is crucial for
time perception and for the production of timed movements. Temporal
processing and sensorimotor synchronization involve complex functional
networks comprising several distant cortical and subcortical brain areas,
including the cerebellum, the basal ganglia (predominantly the putamen),
thalamus, the SMA and pre-SMA, premotor cortex (PMC), and the auditory
cortex (for review: Chauvigné, Gitau, & Brown, 2014; Iversen &
Balasubramaniam, 2016; Leow & Grahn, 2014; Merchant et al., 2015; Teki,
Grube, & Griffiths, 2012). Although the specific role of each area is still
emerging, recent studies have reached consensus that there are at least two
distinct networks involved in timing—one is centered on the role of the
cerebellum in the processing of sensory prediction errors, motor adaptation,
and duration-based timing, and the second is based on the role of the basal
ganglia and the SMA on beat-based timing and internally driven rhythmic
movements.

Cortico-Cerebellar Network
The cerebellum receives segregated projections from prefrontal, frontal,
parietal, and superior temporal regions via the pontine nuclei in the
brainstem (Fig. 3). Output projections are then sent from the cerebellar
cortex to specialized deep cerebellar nuclei, which in turn project back, via
the thalamus, to the region of the cerebral cortex from which the initial
projection originated (Koziol, Budding, & Chidekel, 2011; Schmahmann &
Pandya, 1997). These parallel cortico-cerebellar loops place the cerebellum
in a unique position to use all the information it receives from the neocortex
to build, through a learning process, an internal “model” that contains all of
the dynamic processes that are required to perform a specific movement or
behavior. This feedforward information (or efference copy) is used to
generate a representation of the expected sensory consequences of that
command, and to compute error signals that can produce online changes to
adjust its execution and/or to improve future predictions (for review, see
Sokolov, Miall, & Ivry, 2017; Wolpert, Miall, & Kawato, 1998). Indeed,
research has demonstrated that the cerebellum is key in establishing sensory
prediction errors by processing signal discrepancies between the expected
sensory consequences of a stimulus/movement and the actual sensory input
(Baumann et al., 2015; Koziol et al., 2014; Manto et al., 2012; Tseng,
Diedrichsen, Krakauer, Shadmehr, & Bastian, 2007). These error signals are
essential for sensorimotor control, motor adaptation, and learning because
they allow rapid adjustments in the motor output and refinement of future
sensory predictions in order to reduce the variability of subsequent actions
(Doyon, Penhune, & Ungerleider, 2003; Petter, Lusk, Hesslow, & Meck,
2016; Shadmehr, Smith, & Krakauer, 2010; Sokolov et al., 2017).
FIGURE 3. Diagram of cortico-cerebellar and basal ganglia-thalamo-cortical networks and the
intricate connectivity between these circuits. The basal ganglia-thalamo-cortical timing network
normally involves the SMA, PFC, Striatum, PPC, GPe, Th, STN, VTA, and SN. The cerebellar
network involves the Cb Cortex, PN, DN, and IO. Note that the cerebellum is also connected to
multiple cortical and subcortical regions, and that reciprocal connections between the basal ganglia
and the cerebellum are not illustrated. Abbreviations: PFC, prefrontal cortex; SMA, supplementary
motor area; PPC, posterior parietal cortex; Th, thalamus; GPe, globus pallidus; STN, sub thalamic
nuclei; SN, substantia nigra; VTA, ventral tegmental area; PN, pontine nuclei; DN, dentate nucleus;
Cb, cerebellar cortex; IO, inferior olive.
Reproduced with permission from Petter et al. (2016).
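The forward-model computation just described can be reduced to a few lines: predict the sensory consequence of a motor command from an efference copy, compare the prediction with actual feedback, and let the prediction error adapt the internal model. The scalar "plant," learning rate, and trial loop below are hypothetical simplifications, not an implementation from the cited literature.

```python
# Minimal sketch of a cerebellar-style forward model: predict sensory
# consequences of a command, then adapt on the prediction error.
# The scalar gain "plant" and learning rate are assumed toy values.

class ForwardModel:
    def __init__(self, gain_estimate=1.0, learning_rate=0.3):
        self.gain = gain_estimate  # internal estimate of body/environment
        self.lr = learning_rate

    def predict(self, command):
        return self.gain * command  # expected sensory consequence

    def update(self, command, feedback):
        error = feedback - self.predict(command)  # sensory prediction error
        self.gain += self.lr * error * command    # error-driven adaptation
        return error

true_gain = 1.5  # the actual mapping from command to sensory feedback
model = ForwardModel()
for trial in range(6):
    command = 1.0
    error = model.update(command, true_gain * command)
    print(f"trial {trial}: prediction error = {error:+.3f}")  # shrinks
```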

A growing body of research evidence indicates that cortico-cerebellar networks are predominantly engaged in movement synchronization to
externally cued stimuli, but less involved in self-paced or internally guided
motor behaviors (Brown, Martinez, & Parsons, 2006; Buhusi & Meck,
2005; Chauvigné et al., 2014; Del Olmo, Cheeran, Koch, & Rothwell, 2007;
Grahn & Rowe, 2013; Manto et al., 2012; Thaut et al., 2009; Witt, Laird, &
Meyerand, 2008). These findings concur with the cerebellum’s role in
integration of sensory and motor information, basic sensory prediction
related to motor timing, and temporal adaptation during sensorimotor
synchronization (Diedrichsen, Criscimagna-Hemminger, & Shadmehr,
2007; Gao et al., 1996; Manto et al., 2012; Mayville, Jantzen, Fuchs,
Steinberg, & Kelso, 2002; Rao et al., 1997; Schwartze, Keller, & Kotz,
2016; Shadmehr et al., 2010; Thaut, Demartin, & Sanes, 2008; Tseng et al.,
2007). The premotor cortex is also known to play a role in movements
guided by external sensory stimuli and is thought to be particularly involved
in aspects of prediction related to motor timing and temporal adaptation,
and in integrating higher-order features of sound with the appropriately
timed and organized motor response (Chapin et al., 2010; Chen et al.,
2008b; Jahanshahi et al., 1995; Jäncke, Loose, Lutz, Specht, & Shah, 2000;
Kornysheva & Schubotz, 2011; Pecenka, Engel, & Keller, 2013; Schubotz,
2007). Studies have indeed identified fronto-olivocerebellar pathways that
connect the dorsal portions of the dentate nucleus in the cerebellum to
motor areas such as the primary motor cortex and the premotor cortex
(Dum, 2002; Middleton & Strick, 2001; Schmahmann & Pandya, 1997).
The olivocerebellar network is thought to be an important neural loop in
the cerebellar adaptation of sensorimotor forward models due to its capacity
to directly modulate the output signals sent from the cerebellum back to
sensorimotor cortical areas (Koziol et al., 2011; Sokolov et al., 2017). The
inferior olive is a brainstem nucleus that receives significant projections
from the sensorimotor cortex and is one of the main sources of input to the
cerebellar cortex. Excitatory neurons originating in the inferior olive project axons, known as climbing fibers, to Purkinje cells in the cerebellar cortex and the deep cerebellar nuclei. This microcircuit is completed with
Purkinje cells in the cerebellar cortex sending inhibitory projections to the
deep cerebellar nuclei (including the dentate nucleus), which in turn send
projections back to the inferior olive and to the cerebral cortex via the
thalamus (Fig. 3). Some models suggest that this cortico-cerebellar network
is involved in detecting sequences of cortical input activity and generating
precisely timed output activity in response, hence contributing to the
optimization and coordination of neocortical network activity involved in
cognitive and motor processes (Durstewitz, 2003; Fatemi et al., 2012, p.
792; Mauk & Buonomano, 2004; Medina & Mauk, 2000; Molinari et al.,
2005; Molinari, Leggio, & Thaut, 2007; Thaut et al., 2009). Alternatively,
other theories hypothesize that the olivocerebellar circuit has the
electrophysiological characteristics of a neural clock capable of generating
accurate absolute timing signals, suggesting that the cerebellum is
specialized for providing an explicit temporal representation (Allman, Teki,
Griffiths, & Meck, 2014; Ashe & Bushara, 2014; Ivry, Spencer, Zelaznik, &
Diedrichsen, 2002; Spencer, Ivry, & Zelaznik, 2005; Teki et al., 2012).
Recently converging evidence indicates that the cerebellum is also
implicated in measuring and storing the absolute duration of sub-second
time intervals of discrete perceptual events (for review: Allman et al., 2014;
Petter et al., 2016; Teki et al., 2012). Several studies have demonstrated that
the cerebellum is crucial for perceptual tasks requiring temporal
discrimination, processing of target duration, detecting the timing onset of
discrete perceptual events, detecting violations of temporal expectancies,
and processing complex temporal events such as polyrhythmic stimuli and
non-metric rhythms (Grahn & Rowe, 2009; Grube, Cooper, Chinnery, &
Griffiths, 2010; Kotz, Stockert, & Schwartze, 2014; O’Reilly, Mesulam, &
Nobre, 2008; Paquette, Fujii, Li, & Schlaug, 2017; Schwartze, Rothermich,
Schmidt-Kassow, & Kotz, 2011; Teki, Grube, Kumar, & Griffiths, 2011;
Tesche & Karhu, 2000; Thaut et al., 2008). Recent functional imaging and
transcranial stimulation research demonstrated that the cerebellar lobules VI
and VIIA in the vermis are especially active in perceptual tasks involving
duration-based timing (Grube, Cooper, et al., 2010; Grube, Lee, Griffiths,
Barker, & Woodruff, 2010; Keren-Happuch, Chen, Ho, & Desmond, 2014;
Lee et al., 2007; O’Reilly et al., 2008).
The notion that distinct cerebellar regions are activated depending on the
context and the different aspects of timing is supported by neuroimaging
studies demonstrating that the cerebellum is topographically organized so
that different regions of the cerebellum manage information from different
domains (Kelly & Strick, 2003; Keren-Happuch et al., 2014; Koziol et al.,
2011; Stoodley & Schmahmann, 2009, 2010). Although the cerebellum has
been long known for its importance in motor behavior and timing, current
research has firmly established the cerebellum’s critical role in modulating
cognitive functions including attention, emotion, executive function,
language, working memory, and music perception (for review, see Baumann
et al., 2015; Buckner, 2013; Koziol et al., 2014; Sokolov et al., 2017).
Recent studies indeed suggest that the cerebellum plays a role in processing
pitch and timbre (Alluri et al., 2012; Parsons, 2012; Parsons, Petacchi,
Schmahmann, & Bower, 2009; Pfordresher, Mantell, Brown, Zivadinov, &
Cox, 2014; Thaut, Trimarchi, & Parsons, 2014; Toiviainen, Alluri, Brattico,
Wallentin, & Vuust, 2014). For instance, Thaut and colleagues (2014)
described common and distinct neural substrates underlying processing of
the different components of rhythmic structure (i.e., pattern, meter, tempo),
and also showed that melody processing induced activity in different regions compared to rhythm (e.g., right anterior insula and various cerebellar areas).
Another study showed that alterations of auditory feedback during piano
performance, particularly pitch disruptions, increased activity in the
cerebellum (Pfordresher et al., 2014), which is aligned with the
understanding that the cerebellum is involved in monitoring sensory
prediction errors, including pitch information. The cerebellum has also been
implicated in the processing of affective sounds (Alluri et al., 2015;
Pallesen et al., 2005; for review: Frühholz, Trost, & Kotz, 2016), and in
working memory tasks such as recognizing musical motifs (e.g., Burunat,
Alluri, Toiviainen, Numminen, & Brattico, 2014; see also Ito, 2008; Marvel
& Desmond, 2010), supporting the idea that the cerebellum is a
multipurpose neural mechanism capable of influencing a wide range of
functional processes.

Basal Ganglia-Thalamo-Cortical Network


Mounting evidence suggests that a distributed network comprising the basal
ganglia (particularly the putamen), thalamus, and cortical areas such as the
SMA and pre-SMA, premotor cortex and auditory cortex, is engaged in beat
perception (for review: Leow & Grahn, 2014; Merchant et al., 2015; Petter
et al., 2016; Teki et al., 2012). The basal ganglia are thought to play a key
role in predicting upcoming events based on a relative timing mechanism,
that is, where temporal intervals are coded relative to a periodic beat
interval (Grahn & Brett, 2007; Grahn, Henry, & McAuley, 2011; Grahn &
Rowe, 2013; Grube, Cooper, et al., 2010; Grube, Lee, et al., 2010; Kotz,
Brown, & Schwartze, 2016; Nozaradan, Schwartze, Obermeier, & Kotz,
2017; Teki et al., 2011). These findings are consistent with studies showing
the involvement of the basal ganglia in reward prediction, associative
learning, and harmonic processing (e.g., Salimpoor, Benovoy, Larcher,
Dagher, & Zatorre, 2011; Salimpoor, Zald, Zatorre, Dagher, & McIntosh,
2015; Seger et al., 2013). Functional connectivity between basal ganglia
(putamen), cortical motor areas (premotor cortex and SMA), and auditory
cortex increases significantly when listening to rhythms with a clear beat,
suggesting that the basal ganglia and the SMA are important for the
representation of pulse and rhythm even in the absence of movement (Chen
et al., 2008a; Grahn & Brett, 2007; Grahn & Rowe, 2009; Stupacher et al.,
2013). Neural pathways connecting the basal ganglia and the SMA have
been identified in studies using in vivo imaging tractography (Akkal, Dum,
& Strick, 2007; Lehéricy et al., 2004), showing that corticostriatal
connections are part of a distributed network that supports different aspects
of timing (Fig. 3).
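As a concrete illustration of the seed-based connectivity measure behind such findings, the minimal Python sketch below simulates putamen, SMA, and auditory-cortex time series under strong versus weak shared drive and compares their pairwise correlations. The simulated signals, coupling values, and region assignments are illustrative assumptions on our part, not data or code from the studies cited above.

import numpy as np

rng = np.random.default_rng(1)
n_vols = 200                                   # fMRI volumes per condition

def simulate(coupling):
    # Shared drive mimics co-activation across the three regions
    shared = rng.normal(size=n_vols)
    noise = rng.normal(size=(3, n_vols))
    return coupling * shared + noise           # rows: putamen, SMA, auditory

def fc(x, y):
    # Functional connectivity as the Pearson correlation of two time series
    return np.corrcoef(x, y)[0, 1]

beat = simulate(coupling=1.0)                  # rhythm with a clear beat
no_beat = simulate(coupling=0.2)               # irregular rhythm

for name, ts in (("beat", beat), ("no beat", no_beat)):
    print(f"{name:8s} putamen-SMA r = {fc(ts[0], ts[1]):+.2f}  "
          f"putamen-auditory r = {fc(ts[0], ts[2]):+.2f}")

Run on these synthetic series, the beat condition yields substantially higher interregional correlations, mirroring the pattern reported for rhythms with a clear beat.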
There are strong indications that the basal ganglia (putamen) and SMA
are predominantly involved in maintaining an internal representation of
beat intervals in sensorimotor tasks (beat continuation). This notion is
supported by studies showing that there is greater activation of the putamen
and SMA during the continuation phase of synchronization-continuation
tasks, that is, when the external reference cues are no longer available
(Cunnington, Bradshaw, & Iansek, 1996; Grahn & Rowe, 2013; Halsband,
Ito, Tanji, & Freund, 1993; Rao et al., 1997). These findings concur with
research describing the role of the SMA in timed movements performed in
the absence of any pacing stimulus (i.e., self-paced or internally guided
motor behaviors) (Coull, Vidal, & Burle, 2016; Harrington & Jahanshahi,
2016; Lima, Krishnan, & Scott, 2016; Nachev, Kennard, & Husain, 2008;
Witt et al., 2008). Activity in the SMA and basal ganglia during internally
generated movements has also been investigated in non-human primates
(for review: Merchant & Bartolo, 2018; Merchant et al., 2015). Analyses
of local field potentials in behaving macaques have demonstrated, for
instance, greater beta-band (15–30 Hz) activity in the putamen during
the continuation phase of synchronization-continuation
tasks, suggesting that beta-band oscillations may enable communication
between a distributed set of circuits including the motor cortico-basal
ganglia-thalamo-cortical circuit (Bartolo, Prado, & Merchant, 2014).
Interestingly, the study also found gamma-band activity (30–50 Hz) at some
local field recording sites in the putamen during the synchronization phase of the task,
suggesting that the putamen may also be involved in local computations
associated with sensorimotor processing during beat synchronization.
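Band-limited signals such as the putaminal beta-band activity described above are commonly quantified by band-pass filtering a local field potential and taking the Hilbert envelope as instantaneous amplitude. The sketch below illustrates this on a synthetic signal; the sampling rate, filter order, and simulated beta burst are illustrative assumptions, not parameters from Bartolo et al. (2014).

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0                                    # sampling rate (Hz), assumed
t = np.arange(0.0, 5.0, 1.0 / fs)
rng = np.random.default_rng(0)

# Synthetic LFP: noise plus a 22 Hz (beta) burst in the second half
lfp = rng.normal(0.0, 1.0, t.size) + np.sin(2 * np.pi * 22 * t) * (t > 2.5)

# 4th-order Butterworth band-pass over the beta band (15-30 Hz)
b, a = butter(4, [15 / (fs / 2), 30 / (fs / 2)], btype="band")
beta = filtfilt(b, a, lfp)
envelope = np.abs(hilbert(beta))               # instantaneous beta amplitude

print(f"mean beta envelope, first half:  {envelope[t < 2.5].mean():.2f}")
print(f"mean beta envelope, second half: {envelope[t >= 2.5].mean():.2f}")

The envelope is markedly larger in the second half, where the beta burst was injected, which is the kind of task-phase contrast reported for the continuation phase of the tapping task.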
The physiological mechanism underlying the processing of temporal
information involving the basal ganglia-thalamo-cortical circuit is likely
mediated by dopamine receptors located on corticostriatal neurons in the
nigrostriatal pathway (for review, see Agostino & Cheng, 2016; Allman et
al., 2014; Buhusi & Meck, 2005; Petter et al., 2016). Evidence suggests that
striatal medium spiny neurons in the dorsal striatum (comprising putamen
and caudate nucleus) are crucial to duration discrimination in the seconds-
to-minutes range due to their role in large-scale oscillatory networks
connecting mesolimbic, nigrostriatal, and mesocortical dopaminergic
systems (Buhusi & Meck, 2005; Merchant, Harrington, & Meck, 2013).
The striatal beat-frequency model suggests that the neural mechanisms of
interval timing are based on the entrainment of the oscillatory activity of
striatal neurons and cortical neural oscillators (Matell & Meck, 2004). The
role of dopamine in interval timing accuracy and precision is supported by
studies showing that patients with disorders that involve dopaminergic
pathways, such as Parkinson’s disease, Huntington’s disease, and
schizophrenia, have difficulties in timing-related tasks, and that
dopaminergic medication can ameliorate these issues (Harrington et al.,
2011; Jahanshahi et al., 2010; see review in Allman & Meck, 2012; Coull,
Cheng, & Meck, 2011). A recent study also showed that dopamine precursor
depletion in healthy individuals attenuated activity in the putamen and
SMA and directly interfered with the processing of temporal information
(Coull, Hwang, Leyton, & Dagher, 2012). Pharmacological studies have
also made significant advances in understanding how dopamine affects the
activity of corticostriatal circuits and what roles the different dopaminergic
receptors play in timing behavior (for review, see Agostino & Cheng, 2016;
Narayanan, Land, Solder, Deisseroth, & DiLeone, 2012).
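The coincidence-detection logic of the striatal beat-frequency model described above can be made concrete with a minimal simulation. The sketch below is an illustrative assumption on our part (the oscillator frequencies, target interval, and cosine-based detector are arbitrary choices), not an implementation from Matell and Meck (2004): cortical oscillators are phase-reset at interval onset, and a striatal detector responds maximally when the oscillators return to the phase pattern stored at the target time.

import numpy as np

freqs = np.array([5.0, 7.3, 9.1, 11.7, 13.4])  # oscillator frequencies (Hz)
target = 1.2                                   # interval to be timed (s)
t = np.arange(0.0, 3.0, 0.001)                 # time since onset (s)

phase = 2 * np.pi * np.outer(freqs, t)         # phases, all reset to 0 at onset
stored = 2 * np.pi * freqs * target            # phase pattern at the target time

# Coincidence detector: mean phase agreement with the stored pattern;
# equals 1.0 exactly when every oscillator matches its stored phase.
output = np.mean(np.cos(phase - stored[:, None]), axis=0)

print(f"detector peaks at t = {t[np.argmax(output)]:.3f} s (target {target} s)")

Because the frequencies are not harmonically related, the full phase pattern recurs only at the trained interval within this window, so the detector output peaks at the target time.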
Taken together, these findings make clear that cortico-cerebellar and
basal ganglia-thalamo-cortical networks play complementary roles in temporal
perception and motor timing; the challenge for future studies is to
further understand how these networks interact in both motor and non-
motor functions. Recently emerging evidence from neuroanatomical studies
using virus transneuronal tracers demonstrates that the cerebellum and the
basal ganglia are reciprocally connected and that these subcortical
structures are indeed part of an integrated network (Bostan, Dum, & Strick,
2013; Caligiore et al., 2017; Chen, Fremont, Arteaga-Bracho, &
Khodakhah, 2014; Kotz et al., 2016; Pelzer, Melzer, Timmermann, von
Cramon, & Tittgemeyer, 2017). Models discussing the possible ways in
which the cortico-cerebellar and striato-thalamo-cortical networks may
integrate to support time perception and sensorimotor synchronization have
been recently proposed, instigating further investigations (Lusk, Petter,
Macdonald, & Meck, 2016; Petter et al., 2016; Teki et al., 2012).

Auditory-Limbic Networks


The limbic and the auditory systems are highly interconnected and form an
important part of the core neural network involved in affective sound
processing (Frühholz et al., 2016). Direct and indirect pathways between
the auditory system and limbic areas have been described in the literature
(Fig. 4B) (Frühholz, Trost, & Grandjean, 2014; Janak & Tye, 2015). For
instance, the amygdala (specifically, the lateral part of the basolateral
complex) receives direct projections from the superior temporal cortex
(LeDoux, 2007), and animal research suggests that there may also be a
direct connection with the primary auditory cortex (Reser, Burman,
Richardson, Spitzer, & Rosa, 2009). The amygdala is also interconnected
with subcortical nodes of the ascending auditory pathway, receiving direct
projections from the medial geniculate body and sending projections to the
inferior colliculus, supporting the notion that less complex sounds (i.e.,
short high-intensity sounds or aversive sounds) may be transmitted to the
amygdala through a fast subcortical circuit (Fig. 4B) (Frühholz et al., 2016;
Pannese, Grandjean, & Frühholz, 2016). Recent theories suggest that this
direct link between the auditory thalamus and the amygdala plays an
important role in fast responses to sound whereas a “slow” network
projecting from thalamus to primary auditory cortex to association cortex to
amygdala may govern interpretive labeling/understanding responses during
music processing and music-evoked emotions (Huron, 2006; Juslin &
Västfjäll, 2008).
FIGURE 4. (A) The neural auditory ascending pathway. (B) Amygdala and hippocampal
connections to the auditory system. The amygdala receives direct input from the MGB of the
thalamus (line 1) and from higher-level auditory cortex in STC (line 2), which both project to the lateral
nucleus of the basolateral (l) complex of the amygdala. Tracing studies in animals also report
connections between AC and the amygdala (dashed line 2). The basal nucleus (b) of the basolateral
complex has efferent connection to the IC (line 3). The accessory nucleus (ac), the medial (m), and
the central nucleus (c) are not directly connected to the auditory system. The hippocampus (hc)
shows direct (line 2) and indirect (line 1) connections to the auditory cortex. A direct connection
exists from the CA1 region to the higher-level auditory cortex (line 2). Indirect connections mainly
provide input to the hippocampal formation by connections from the STC to the parahippocampal
gyrus (phg), to the perirhinal cortex (prc) and the entorhinal cortex (erc), all line 1, which figure as
input relays to the hippocampus. Abbreviations: MGB, medial geniculate body; STC, superior
temporal cortex; IC, inferior colliculus; CN, cochlear nucleus; SOC, superior olivary complex; AC,
auditory cortex; SUB, subiculum; DG, dentate gyrus.
Reprinted with permission from Frühholz et al. (2014).

Mounting evidence from functional neuroimaging research shows that
music can modulate activity in several brain areas of the limbic system,
such as the amygdala, the hippocampal formation, right ventral striatum
(including the nucleus accumbens) extending into the ventral pallidum,
caudate nucleus, insula, the cingulate cortex, and the orbitofrontal cortex
(for review: Koelsch, 2014; Zatorre, 2015). Studies have demonstrated that
music perceived as joyful elicits a strong response in the superficial
nuclei group of the amygdala, an area that seems to be particularly involved
in extracting the social significance of signals that convey basic socio-
affective information (Koelsch et al., 2013; Koelsch & Skouras, 2014;
Lehne, Rohrmeier, & Koelsch, 2013). Activity changes in response to
joyful, unpleasant, or sad music were also found in the (right) laterobasal
amygdala, an area that has been implicated in acquisition, encoding, and
retrieval of both positive and negative associations, and processing cues that
predict either positive or negative reinforcement (Brattico et al., 2011;
Koelsch et al., 2013; Koelsch, Fritz, v. Cramon, Müller, & Friederici, 2006;
Mitterschiffthaler, Fu, Dalton, Andrew, & Williams, 2007; Pallesen et al.,
2005). The laterobasal amygdala is involved in the regulation of neural
input into the hippocampal formation, another area that responds to music-
evoked emotions such as tenderness, peacefulness, nostalgia, or wonder
(Burunat et al., 2014; Choppin et al., 2016; Koelsch et al., 2013;
Mitterschiffthaler et al., 2007; Trost, Ethofer, Zentner, & Vuilleumier, 2012;
for review: Koelsch, 2014). The hippocampus, in turn, receives projections
from the auditory system; these are, however, mediated by the
parahippocampal gyrus, the perirhinal cortex, and the entorhinal cortex
(Fig. 4B) (for review, see Frühholz et al., 2014; Koelsch, 2014).
Changes in the ventral striatum (including the nucleus accumbens) have
also been found in response to pleasant music (Blood & Zatorre, 2001;
Koelsch et al., 2006; Menon & Levitin, 2005; Mueller et al., 2015;
Salimpoor et al., 2013; Zatorre & Salimpoor, 2013). In particular, the
nucleus accumbens has been shown to respond to intense feelings of music-
evoked pleasure and reward (Blood & Zatorre, 2001; Salimpoor et al.,
2011, 2013), suggesting that functional connectivity between the auditory
cortex and ventral striatum (including the nucleus accumbens) is crucial for
experiencing pleasure in music (Martínez-Molina, Mas-Herrero, Rodríguez-
Fornells, Zatorre, & Marco-Pallarés, 2016; Sachs, Ellis, Schlaug, & Loui,
2016; Salimpoor et al., 2013). Music-evoked pleasure can lead to dopamine
release in distinct anatomical areas: an increase in dopamine availability
in the dorsal striatum is associated with the anticipation of reward,
whereas an increase in dopamine in the ventral striatum occurs during the
rewarding experience itself (Blood & Zatorre, 2001; Menon & Levitin, 2005; Salimpoor et al., 2011,
2015; Zatorre & Salimpoor, 2013).
Aesthetic pleasure results from the integration between subcortical
dopaminergic regions and higher-order cortical areas of the brain (for
review, see Salimpoor et al., 2015). It has been shown, for instance, that
functional connectivity between the nucleus accumbens and the auditory
cortex, as well as the fronto-striatal circuit (involving ventral and dorsal
subdivisions of the striatum and frontal areas such as inferior frontal gyri,
prefrontal cortex, and orbitofrontal cortex) predicts whether individuals will
decide to purchase a song (Salimpoor et al., 2013). Recently emerging data
from transcranial magnetic stimulation research further support the direct
role of the fronto-striatal circuit in both the affective responses and
motivational aspects of music-induced reward (Mas-Herrero, Dagher, &
Zatorre, 2017). The ventromedial prefrontal cortex and adjacent
orbitofrontal cortex are involved in high-level emotional processing, such
as reward detection and valuation, and are the main cortical inputs to the
nucleus accumbens, again reinforcing the notion that fronto-striatal circuits
are highly involved in integrating and evaluating reward-related stimuli and
in guiding decisions about them (for review: Haber & Knutson, 2010; Salimpoor et
al., 2015; see also Chapter 14).
Recent findings suggest that the auditory cortex also plays a crucial role
in the emotional processing of sounds, beyond mere acoustical analysis
(Frühholz et al., 2016; Koelsch, Skouras, & Lohmann, 2018). Koelsch et al.
(2018) found that fear stimuli (compared with joy stimuli) evoked higher
network centrality in both anterior and posterior auditory association cortex,
suggesting that the auditory cortex may play a central role in the affective
processing of auditory information. Moreover, findings also indicated that
the auditory cortex is functionally connected with a widespread network
involved in emotion processing, which includes limbic/paralimbic
structures (cingulate, insular, parahippocampal, and orbitofrontal cortex, as
well as the ventral striatum), and also extra-auditory neocortical areas
(visual, somatosensory, and motor-related areas, and attentional structures).
These results expand the traditional view that sensory cortices have mere
perceptual functions and highlight the importance of investigating the
functional connectivity between brain regions.
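To illustrate what such a centrality analysis involves, the short sketch below builds a functional connectivity matrix from regional time series and ranks nodes by eigenvector centrality, under which a node is central when it connects strongly to other central nodes. The region labels, simulated data, and the specific choice of eigenvector centrality are our illustrative assumptions, not the pipeline of Koelsch et al. (2018).

import numpy as np

rng = np.random.default_rng(2)
labels = ["auditory", "insula", "cingulate", "visual", "motor"]
ts = rng.normal(size=(5, 300))                 # 5 regions x 300 time points
ts[0] += 0.8 * ts[1] + 0.8 * ts[2]             # make "auditory" a hub by construction

conn = np.abs(np.corrcoef(ts))                 # weighted adjacency matrix
np.fill_diagonal(conn, 0.0)                    # ignore self-connections

# Eigenvector centrality: the leading eigenvector of the adjacency matrix
eigvals, eigvecs = np.linalg.eigh(conn)
centrality = np.abs(eigvecs[:, -1])            # eigenvector of the largest eigenvalue

for label, c in sorted(zip(labels, centrality), key=lambda p: -p[1]):
    print(f"{label:10s} centrality = {c:.2f}")

In this toy network the "auditory" node, which was given strong links to two other regions, dominates the centrality ranking, paralleling the reported centrality of auditory association cortex during fear stimuli.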

Brain Network Interactions


Recent advances in neuroimaging analysis methods have allowed
researchers to address questions of functional connectivity, interregional
coupling, and networked computations that go beyond the “where” and
“when” of task-related activity, providing new insights about how different
brain networks interact to support cognitive, perceptual, and motor
functions (Friston, 2011). Among the topics recently explored in music
neuroscience is how brain networks organize themselves in a naturalistic
music listening situation, wherein data acquisition takes place while
participants listen to entire songs in an uninterrupted fashion, thus
emulating real-life listening experiences (Alluri et al., 2012, 2013, 2015;
Burunat et al., 2014; Koelsch & Skouras, 2014; Koelsch et al., 2018; Lehne
et al., 2013; Sachs et al., 2016; Toiviainen et al., 2014). Studies applying
novel data-driven methods to fMRI data to investigate the neural correlates
of musical feature processing have found, for instance, that timbral feature
processing during naturalistic listening conditions engages sensory and
default mode network cerebrocortical areas as well as cognitive areas of the
cerebellum, whereas musical pulse and tonality processing recruit cortical
and subcortical cognitive, motor, and emotion-related circuits (Alluri et al.,
2012; Toiviainen et al., 2014). Orbitofrontal cortex and the anterior
cingulate cortex, which are associated with aesthetic judgments and self-
referential appraisal, are also recruited while listening to full musical pieces
(Alluri et al., 2013; Reybrouck & Brattico, 2015; Sachs et al., 2016).
Moreover, music containing lyrics seems particularly to increase activity in
the left auditory cortex, corroborating the hypothesized left-hemispheric
lateralization of linguistic processing (Alluri et al., 2013; Brattico et al., 2011). Collectively, these
findings confirm the notion that music processing requires timely
coordination of large-scale cognitive, motor, and limbic brain circuitry.
Research has also demonstrated that music preference and music
expertise can modulate functional brain connectivity during passive music
listening. A recent study has found that the default mode network—a
network of interacting brain regions that is important for internally-focused
thoughts—was more functionally connected when people listened to
unfamiliar music they liked compared to music they disliked, and that
listening to one’s favorite music increased connectivity between auditory
brain areas and the hippocampus (Wilkins, Hodges, Laurienti, Steen, &
Burdette, 2014). These findings were recently expanded by a study showing
that musicians and non-musicians use different neural networks during
music listening (Alluri et al., 2017). Whole-brain network analysis revealed
that, while the dominant hubs during passive music listening in non-
musicians encompassed regions related to the default mode network, in
musicians the primary neural hubs engaged during music listening
comprised cerebral and cerebellar sensorimotor regions. Moreover, the
study also showed that musicians have enhanced connectivity in the motor
and somatosensory homunculi representing the upper limbs and torso during the
listening task, suggesting that experts tend to process music using an action-
based approach whereas non-musicians use a perception-based approach
(Alluri et al., 2017; see also Moore, Schaefer, Bastin, Roberts, & Overy,
2014).
Evidence for the reconfiguration of human brain functional networks
during music listening has also been provided by electroencephalography
(EEG) studies (Adamos, Laskaris, & Micheloyannis, 2018; Klein, Liem,
Hänggi, Elmer, & Jäncke, 2016; Rogenmoser, Zollinger, Elmer, & Jäncke,
2016; Sänger, Müller, & Lindenberger, 2012; Wu et al., 2012; Wu, Zhang,
Ding, Li, & Zhou, 2013). Overall, findings concur that music processing
reorganizes functional neural synchrony by increasing intraregional and
interregional oscillatory synchronization. These findings reinforce the
view that music, like other higher cognitive tasks, requires the organized,
cooperative activation of distributed cortical and subcortical regions
(Bhattacharya & Petsche, 2005).
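A standard index behind such EEG synchrony findings is the phase-locking value (PLV): the length of the mean unit vector of the instantaneous phase differences between two channels (1 indicates perfect locking, 0 none). The sketch below computes it on synthetic channels; the signals, frequencies, and noise levels are illustrative assumptions rather than data from the studies cited above.

import numpy as np
from scipy.signal import hilbert

fs = 250.0                                     # sampling rate (Hz), assumed
t = np.arange(0.0, 4.0, 1.0 / fs)
rng = np.random.default_rng(3)

common = np.sin(2 * np.pi * 10 * t)            # shared 10 Hz alpha component
ch1 = common + 0.5 * rng.normal(size=t.size)
ch2 = common + 0.5 * rng.normal(size=t.size)
ch3 = rng.normal(size=t.size)                  # independent channel

def plv(x, y):
    # Phase-locking value from the Hilbert-derived instantaneous phases
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

print("synchronized pair PLV =", round(plv(ch1, ch2), 2))
print("independent pair  PLV =", round(plv(ch1, ch3), 2))

In practice the channels would first be band-pass filtered to the rhythm of interest; here the shared 10 Hz component dominates, so the synchronized pair yields a PLV near 1 and the independent pair a value near 0.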

Uncovering the neural underpinnings of music processing is a central theme
in cognitive neuroscience, as evidenced by the robust body of literature on
this topic. Neuroimaging research conducted over the past 20 years has
successfully identified several brain regions involved in the complex set of
cognitive processes underlying music perception, memory, emotion, and
performance, providing the foundation upon which research has started to
explore how these different brain regions interact to support music
processing. This chapter provides a broad panorama of the current
knowledge concerning the anatomical and functional basis of music
processing through a network perspective. Starting with the trajectory of
auditory stimuli through the ascending auditory pathway, we described how
interactions between auditory and frontal cortical areas are crucial for
transforming acoustic information into a musically meaningful tonal
context and for integrating sound events over time in working memory,
and we outlined the role of frontal areas in autobiographical memories,
attention, and musical imagery. Anatomical and functional coordination
between auditory and motor-related areas was also discussed in order to
understand how cortical and subcortical areas are involved in sensorimotor
synchronization and temporal processing, focusing more specifically on the
roles of cortico-cerebellar and basal ganglia-thalamo-cortical networks.
Auditory and limbic interactions were also discussed in relation to affective
sound processing and music-evoked emotions, also pointing to the
importance of the integration between subcortical dopaminergic regions and
higher-order cortical areas for aesthetic pleasure. Finally, we reviewed
recent studies investigating how brain networks organize themselves in a
naturalistic music listening context. Collectively, this robust body of
literature suggests that music processing requires timely coordination of
large-scale cognitive, motor, and limbic brain networks, setting the stage for
a new generation of music neuroscience research on the dynamic
organization of brain networks underlying music processing.

References
Adamos, D. A., Laskaris, N., & Micheloyannis, S. (2018). Harnessing functional segregation across
brain rhythms as a means to detect EEG oscillatory multiplexing during music listening. Journal of
Neural Engineering 15, 036012.
Agostino, P. V., & Cheng, R. K. (2016). Contributions of dopaminergic signaling to timing accuracy
and precision. Current Opinion in Behavioral Sciences 8, 153–160.
Akkal, D., Dum, R. P., & Strick, P. L. (2007). Supplementary motor area and presupplementary
motor area: Targets of basal ganglia and cerebellar output. Journal of Neuroscience 27(40),
10659–10673.
Alho, K., Rinne, T., Herron, T. J., & Woods, D. L. (2014). Stimulus-dependent activations and
attention-related modulations in the auditory cortex: A meta-analysis of fMRI studies. Hearing
Research 307, 29–41.
Allman, M. J., & Meck, W. H. (2012). Pathophysiological distortions in time perception and timed
performance. Brain 135(3), 656–677.
Allman, M. J., Teki, S., Griffiths, T. D., & Meck, W. H. (2014). Properties of the internal clock: First-
and second-order principles of subjective time. Annual Review of Psychology 65, 743–771.
Alluri, V., Brattico, E., Toiviainen, P., Burunat, I., Bogert, B., Numminen, J., & Kliuchko, M. (2015).
Musical expertise modulates functional connectivity of limbic regions during continuous music
listening. Psychomusicology 25(4), 443–454.
Alluri, V., Toiviainen, P., Burunat, I., Kliuchko, M., Vuust, P., & Brattico, E. (2017). Connectivity
patterns during music listening: Evidence for action-based processing in musicians. Human Brain
Mapping 38(6), 2955–2970.
Alluri, V., Toiviainen, P., Jääskeläinen, I. P., Glerean, E., Sams, M., & Brattico, E. (2012). Large-
scale brain networks emerge from dynamic processing of musical timbre, key and rhythm.
NeuroImage 59(4), 3677–3689.
Alluri, V., Toiviainen, P., Lund, T. E., Wallentin, M., Vuust, P., Nandi, A. K., … Brattico, E. (2013).
From Vivaldi to Beatles and back: Predicting lateralized brain responses to music. NeuroImage 83,
627–636.
Amunts, K., Morosan, P., Hilbig, H., & Zilles, K. (2012). Auditory system. In J. K. Mai & G.
Paxinos (Eds.), The human nervous system (3rd ed., pp. 1270–1300). London: Elsevier.
Andoh, J., & Zatorre, R. J. (2011). Interhemispheric connectivity influences the degree of modulation
of TMS-induced effects during auditory processing. Frontiers in Psychology 2, 161.
Ashe, J., & Bushara, K. (2014). The olivo-cerebellar system as a neural clock. In H. Merchant & V.
de Lafuente (Eds.), Neurobiology of interval timing: Advances in experimental medicine and
biology (pp. 155–166). New York: Springer.
Bajo, V. M., Nodal, F. R., Moore, D. R., & King, A. J. (2010). The descending corticocollicular
pathway mediates learning-induced auditory plasticity. Nature Neuroscience 13(2), 253–260.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., … Altenmüller, E.
(2006). Shared networks for auditory and motor processing in professional pianists: Evidence from
fMRI conjunction. NeuroImage 30(3), 917–926.
Bartolo, R., Prado, L., & Merchant, H. (2014). Information processing in the primate basal ganglia
during sensory-guided and internally driven rhythmic tapping. Journal of Neuroscience 34(11),
3910–3923.
Baumann, O., Borra, R. J., Bower, J. M., Cullen, K. E., Habas, C., Ivry, R. B., … Sokolov, A. A.
(2015). Consensus paper: The role of the cerebellum in perceptual processes. Cerebellum 14(2),
197–220.
Baumann, S., Griffiths, T. D., Sun, L., Petkov, C. I., Thiele, A., & Rees, A. (2011). Orthogonal
representation of sound dimensions in the primate midbrain. Nature Neuroscience 14(4), 423–425.
Baumann, S., Koeneke, S., Schmidt, C. F., Meyer, M., Lutz, K., & Jancke, L. (2007). A network for
audio-motor coordination in skilled pianists and non-musicians. Brain Research 1161(1), 65–78.
Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling
multisensory integration: Patchy organization within human STS multisensory cortex. Nature
Neuroscience 7(11), 1190–1192.
Beauchamp, M. S., Nath, A. R., & Pasalar, S. (2010). fMRI-guided transcranial magnetic stimulation
reveals that the superior temporal sulcus is a cortical locus of the McGurk effect. Journal of
Neuroscience 30(7), 2414–2417.
Beauchamp, M. S., Yasar, N. E., Frye, R. E., & Ro, T. (2008). Touch, sound and vision in human
superior temporal sulcus. NeuroImage 41(3), 1011–1020.
Belin, P., & Zatorre, R. J. (2000). “What,” “where” and “how” in auditory cortex. Nature
Neuroscience 3(10), 965–966.
Bhattacharya, J., & Petsche, H. (2005). Phase synchrony analysis of EEG during music perception
reveals changes in functional connectivity due to musical expertise. Signal Processing 85(11),
2161–2177.
Bianco, R., Novembre, G., Keller, P. E., Kim, S.-G., Scharf, F., Friederici, A. D., …
Sammler, D. (2016). Neural networks for harmonic structure in music perception and action.
NeuroImage 142, 454–464.
Bizley, J. K., & Cohen, Y. E. (2013). The what, where and how of auditory-object perception. Nature
Reviews Neuroscience 14(10), 693–707.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity
in brain regions implicated in reward and emotion. Proceedings of the National Academy of
Sciences 98(20), 11818–11823.
Bostan, A. C., Dum, R. P., & Strick, P. L. (2013). Cerebellar networks with the cerebral cortex and
basal ganglia. Trends in Cognitive Sciences 17(5), 241–254.
Brattico, E., Alluri, V., Bogert, B., Jacobsen, T., Vartiainen, N., Nieminen, S., & Tervaniemi, M.
(2011). A functional MRI study of happy and sad emotions in music with and without lyrics.
Frontiers in Psychology 2, 308.
Brechmann, A., & Scheich, H. (2005). Hemispheric shifts of sound representation in auditory cortex
with conceptual listening. Cerebral Cortex 15(5), 578–587.
Brown, S., & Martinez, M. J. (2007). Activation of premotor vocal areas during musical
discrimination. Brain and Cognition 63(1), 59–69.
Brown, S., Martinez, M. J., & Parsons, L. M. (2006). The neural basis of human dance. Cerebral
Cortex 16(8), 1157–1167.
Buckner, R. L. (2013). The cerebellum and cognitive function: 25 years of insight from anatomy and
neuroimaging. Neuron 80(3), 807–815.
Buhusi, C. V., & Meck, W. H. (2005). What makes us tick? Functional and neural mechanisms of
interval timing. Nature Reviews Neuroscience 6(10), 755–765.
Burunat, I., Alluri, V., Toiviainen, P., Numminen, J., & Brattico, E. (2014). Dynamics of brain
activity underlying working memory for music in a naturalistic condition. Cortex 57, 254–269.
Caligiore, D., Pezzulo, G., Baldassarre, G., Bostan, A. C., Strick, P. L., Doya, K., … Herreros, I.
(2017). Consensus paper: Towards a systems-level view of cerebellar function: The interplay
between cerebellum, basal ganglia, and cortex. Cerebellum 16(1), 203–229.
Cammoun, L., Thiran, J. P., Griffa, A., Meuli, R., Hagmann, P., & Clarke, S. (2015).
Intrahemispheric cortico-cortical connections of the human auditory cortex. Brain Structure &
Function 220(6), 3537–3553.
Cha, K., Zatorre, R. J., & Schönwiesner, M. (2016). Frequency selectivity of voxel-by-voxel
functional connectivity in human auditory cortex. Cerebral Cortex 26(1), 211–224.
Chapin, H. L., Zanto, T., Jantzen, K. J., Kelso, J. A. S., Steinberg, F., & Large, E. W. (2010).
Neural responses to complex auditory rhythms: The role of attending. Frontiers in Psychology 1,
547–558.
Chauvigné, L. A. S., Gitau, K. M., & Brown, S. (2014). The neural basis of audiomotor entrainment:
An ALE meta-analysis. Frontiers in Human Neuroscience 8, 776.
Chen, C. H., Fremont, R., Arteaga-Bracho, E. E., & Khodakhah, K. (2014). Short latency cerebellar
modulation of the basal ganglia. Nature Neuroscience 17(12), 1767–1775.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008a). Listening to musical rhythms recruits motor
regions of the brain. Cerebral Cortex 18(12), 2844–2854.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008b). Moving on time: Brain network for auditory-
motor synchronization is modulated by rhythm complexity and musical training. Journal of
Cognitive Neuroscience 20(2), 226–239.
Chen, J. L., Rae, C., & Watkins, K. E. (2012). Learning to play a melody: An fMRI study examining
the formation of auditory-motor associations. NeuroImage 59(2), 1200–1208.
Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006). Interactions between auditory and dorsal
premotor cortex during synchronization to musical rhythms. NeuroImage 32(4), 1771–1781.
Choppin, S., Trost, W., Dondaine, T., Millet, B., Drapier, D., Vérin, M., … Grandjean, D. (2016).
Alteration of complex negative emotions induced by music in euthymic patients with bipolar
disorder. Journal of Affective Disorders 191, 15–23.
Coull, J. T., Cheng, R. K., & Meck, W. H. (2011). Neuroanatomical and neurochemical substrates of
timing. Neuropsychopharmacology 36(1), 3–25.
Coull, J. T., Hwang, H. J., Leyton, M., & Dagher, A. (2012). Dopamine precursor depletion impairs
timing in healthy volunteers by attenuating activity in putamen and supplementary motor area.
Journal of Neuroscience 32(47), 16704–16715.
Coull, J. T., Vidal, F., & Burle, B. (2016). When to act, or not to act: That’s the SMA’s question.
Current Opinion in Behavioral Sciences 8, 14–21.
Cunnington, R., Bradshaw, J. L., & Iansek, R. (1996). The role of the supplementary motor area in
the control of voluntary movement. Human Movement Science 15(5), 627–647.
D’Ausilio, A., Altenmüller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity
of the motor cortex while listening to a rehearsed musical piece. European Journal of
Neuroscience 24(3), 955–958.
Da Costa, S., van der Zwaag, W., Marques, J. P., Frackowiak, R. S. J., Clarke, S., & Saenz, M.
(2011). Human primary auditory cortex follows the shape of Heschl’s gyrus. Journal of
Neuroscience 31(40), 14067–14075.
de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L., & Theunissen, F. E. (2017). The
hierarchical cortical organization of human speech processing. Journal of Neuroscience 37(27),
6539–6557.
Del Olmo, M. F., Cheeran, B., Koch, G., & Rothwell, J. C. (2007). Role of the cerebellum in
externally paced rhythmic finger movements. Journal of Neurophysiology 98(1), 145–152.
Diedrichsen, J., Criscimagna-Hemminger, S. E., & Shadmehr, R. (2007). Dissociating timing and
coordination as functions of the cerebellum. Journal of Neuroscience 27(23), 6291–6301.
Doyon, J., Penhune, V., & Ungerleider, L. G. (2003). Distinct contribution of the cortico-striatal and
cortico-cerebellar systems to motor skill learning. Neuropsychologia 41(3), 252–262.
Dum, R. P. (2002). An unfolded map of the cerebellar dentate nucleus and its projections to the
cerebral cortex. Journal of Neurophysiology 89(1), 634–639.
Durstewitz, D. (2003). Self-organizing neural integrator predicts interval times through climbing
activity. Journal of Neuroscience 23(12), 5342–5353.
Farrugia, N., Jakubowski, K., Cusack, R., & Stewart, L. (2015). Tunes stuck in your brain: The
frequency and affective evaluation of involuntary musical imagery correlate with cortical structure.
Consciousness and Cognition 35, 66–77.
Fatemi, S. H., Aldinger, K. A., Ashwood, P., Bauman, M. L., Blaha, C. D., Blatt, G. J., …Welsh, J. P.
(2012). Consensus paper: Pathological role of the cerebellum in autism. Cerebellum 11(3), 777–
807.
Fernández-Miranda, J. C., Wang, Y., Pathak, S., Stefaneau, L., Verstynen, T., & Yeh, F. C. (2015).
Asymmetry, connectivity, and segmentation of the arcuate fascicle in the human brain. Brain
Structure & Function 220(3), 1665–1680.
Foster, N. E. V., Halpern, A. R., & Zatorre, R. J. (2013). Common parietal activation in musical
mental transformations across pitch and time. NeuroImage 75, 27–35.
Friston, K. J. (2011). Functional and effective connectivity: A review. Brain Connectivity 1(1), 13–
36.
Froud, K. E., Wong, A. C. Y., Cederholm, J. M. E., Klugmann, M., Sandow, S. L., Julien, J.-P., …
Housley, G. D. (2015). Type II spiral ganglion afferent neurons drive medial olivocochlear reflex
suppression of the cochlear amplifier. Nature Communications 6(1), 7115.
Frühholz, S., Trost, W., & Grandjean, D. (2014). The role of the medial temporal limbic system in
processing emotions in voice and music. Progress in Neurobiology 123, 1–17.
Frühholz, S., Trost, W., & Kotz, S. A. (2016). The sound of emotions: Towards a unifying neural
network perspective of affective sound processing. Neuroscience & Biobehavioral Reviews 68,
96–110.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous
sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–
1802.
Gaab, N., Gaser, C., Zaehle, T., Jancke, L., & Schlaug, G. (2003). Functional anatomy of pitch
memory: An fMRI study with sparse temporal sampling. NeuroImage 19(4), 1417–1426.
Gao, J. H., Parsons, L. M., Bower, J. M., Xiong, J., Li, J., & Fox, P. T. (1996). Cerebellum implicated
in sensory acquisition and discrimination rather than motor control. Science 272(5261), 545–547.
Giovannelli, F., Banfi, C., Borgheresi, A., Fiori, E., Innocenti, I., Rossi, S., … Cincotta, M. (2013).
The effect of music on corticospinal excitability is related to the perceived emotion: A transcranial
magnetic stimulation study. Cortex 49(3), 702–710.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of
Cognitive Neuroscience 19(5), 893–906.
Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions
in beat perception: Audition primes vision, but not vice versa. NeuroImage 54(2), 1231–1243.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians
and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Grahn, J. A., & Rowe, J. B. (2013). Finding and feeling the musical beat: Striatal dissociations
between detection and prediction of regularity. Cerebral Cortex 23(4), 913–921.
Griffiths, T. D., & Warren, J. D. (2002). The planum temporale as a computational hub. Trends in
Neurosciences 25(7), 348–353.
Griffiths, T. D., & Warren, J. D. (2004). What is an auditory object? Nature Reviews Neuroscience
5(11), 887–892.
Grothe, B. (2000). The evolution of temporal processing in the medial superior olive, an auditory
brainstem structure. Progress in Neurobiology 61(6), 581–610.
Groussard, M., Viader, F., Hubert, V., Landeau, B., Abbas, A., Desgranges, B., … Platel, H. (2010).
Musical and verbal semantic memory: Two distinct neural networks? NeuroImage 49(3), 2764–
2773.
Grube, M., Cooper, F. E., Chinnery, P. F., & Griffiths, T. D. (2010). Dissociation of duration-based
and beat-based auditory timing in cerebellar degeneration. Proceedings of the National Academy of
Sciences 107(25), 11597–11601.
Grube, M., Lee, K. H., Griffiths, T. D., Barker, A. T., & Woodruff, P. W. (2010). Transcranial
magnetic theta-burst stimulation of the human cerebellum distinguishes absolute, duration-based
from relative, beat-based perception of subsecond time intervals. Frontiers in Psychology 1, 171.
Haber, S. N., & Knutson, B. (2010). The reward circuit: Linking primate anatomy and human
imaging. Neuropsychopharmacology 35(1), 4–26.
Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation
of auditory imagery for familiar melodies. Cerebral Cortex 9(7), 697–704.
Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural
correlates of perceived and imagined musical timbre. Neuropsychologia 42(9), 1281–1292.
Halsband, U., Ito, N., Tanji, J., & Freund, H. J. (1993). The role of premotor cortex and the
supplementary motor area in the temporal control of movement in man. Brain 116(1), 243–266.
Harrington, D. L., Castillo, G. N., Greenberg, P. A., Song, D. D., Lessig, S., Lee, R. R., & Rao, S. M.
(2011). Neurobehavioral mechanisms of temporal processing deficits in Parkinson’s disease. PLoS
ONE 6(2), e17461.
Harrington, D. L., & Jahanshahi, M. (2016). Reconfiguration of striatal connectivity for timing and
action. Current Opinion in Behavioral Sciences 8, 78–84.
Harris, R., & De Jong, B. M. (2014). Cerebral activations related to audition-driven performance
imagery in professional musicians. PLoS ONE, 9(4), e93681.
Hasegawa, T., Matsuki, K. I., Ueno, T., Maeda, Y., Matsue, Y., Konishi, Y., & Sadato, N. (2004).
Learned audio-visual cross-modal associations in observed piano playing activate the left planum
temporale: An fMRI study. Cognitive Brain Research 20(3), 510–518.
Haueisen, J., & Knösche, T. R. (2001). Involuntary motor activity in pianists evoked by music
perception. Journal of Cognitive Neuroscience 13(6), 786–792.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews
Neuroscience 8(5), 393–402.
Horn, A. K. E. (2006). The reticular formation. Progress in Brain Research 151, 127–155.
Huffman, R. F., & Henson, O. W. (1990). The descending auditory pathway and acousticomotor
systems: Connections with the inferior colliculus. Brain Research Reviews 15(3), 295–323.
Humphries, C., Liebenthal, E., & Binder, J. R. (2010). Tonotopic organization of human auditory
cortex. NeuroImage 50(3), 1202–1211.
Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA:
MIT Press.
Hyde, K. L., Peretz, I., & Zatorre, R. J. (2008). Evidence for the role of the right auditory cortex in
fine pitch resolution. Neuropsychologia 46(2), 632–639.
Ito, M. (2008). Control of mental activities by internal models in the cerebellum. Nature Reviews
Neuroscience 9(4), 304–313.
Iversen, J. R., & Balasubramaniam, R. (2016). Synchronization and temporal processing. Current
Opinion in Behavioral Sciences 8, 175–180.
Ivry, R. B., Spencer, R. M., Zelaznik, H. N., & Diedrichsen, J. (2002). The cerebellum and event
timing. Annals of the New York Academy of Sciences 978, 302–317.
Jahanshahi, M., Jenkins, I. H., Brown, R. G., Marsden, C. D., Passingham, R. E., & Brooks, D. J.
(1995). Self-initiated versus externally triggered movements: I. An investigation using
measurement of regional cerebral blood flow with PET and movement-related potentials in normal
and Parkinson’s disease subjects. Brain 118(4), 913–933.
Jahanshahi, M., Jones, C. R. G., Zijlmans, J., Katzenschlager, R., Lee, L., Quinn, N., … Lees, A. J.
(2010). Dopaminergic modulation of striato-frontal connectivity during motor timing in
Parkinson’s disease. Brain 133(3), 727–745.
Janak, P. H., & Tye, K. M. (2015). From circuits to behaviour in the amygdala. Nature 517(7534),
284–292.
Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral
Cortex 19(11), 2579–2594.
Janata, P. (2015). Neural basis of music perception. In G. G. Celesia & G. Hickok (Eds.), Handbook
of clinical neurology: The human auditory system (Vol. 129, pp. 187–205). Amsterdam: Elsevier.
Janata, P., Birk, J., Van Horn, J., Leman, M., Tillmann, B., & Bharucha, J. J. (2002). The cortical
topography of tonal structures underlying Western music. Science 298(5601), 2167–2170.
Janata, P., Tillmann, B., & Bharucha, J. J. (2002). Listening to polyphonic music recruits domain-
general attention and working memory circuits. Cognitive, Affective & Behavioral Neuroscience
2(2), 121–140.
Jäncke, L., Loose, R., Lutz, K., Specht, K., & Shah, N. (2000). Cortical activations during paced
finger-tapping applying visual and auditory pacing stimuli. Cognitive Brain Research 10(1–2), 51–
66.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying
mechanisms. Behavioral and Brain Sciences 31(5), 559–575.
Kaiser, J., Ripper, B., Birbaumer, N., & Lutzenberger, W. (2003). Dynamics of gamma-band activity
in human magnetoencephalogram during auditory pattern working memory. NeuroImage 20(2),
816–827.
Keller, P. E. (2012). Mental imagery in music performance: Underlying mechanisms and potential
benefits. Annals of the New York Academy of Sciences 1252(1), 206–213.
Kelly, R. M., & Strick, P. L. (2003). Cerebellar loops with motor cortex and prefrontal cortex of a
nonhuman primate. Journal of Neuroscience 23(23), 8432–8444.
Keren-Happuch, E., Chen, S. H. A., Ho, M. H. R., & Desmond, J. E. (2014). A meta-analysis of
cerebellar contributions to higher cognition from PET and fMRI studies. Human Brain Mapping
35(2), 593–615.
Kiehl, K. A., Laurens, K. R., Duty, T. L., Forster, B. B., & Liddle, P. F. (2001). Neural sources
involved in auditory target detection and novelty processing: An event-related fMRI study.
Psychophysiology 38(1), 133–142.
Klein, C., Liem, F., Hänggi, J., Elmer, S., & Jäncke, L. (2016). The “silent” imprint of musical
training. Human Brain Mapping 37(2), 536–546.
Klein, M. E., & Zatorre, R. J. (2011). A role for the right superior temporal sulcus in categorical
perception of musical chords. Neuropsychologia 49(5), 878–887.
Klein, M. E., & Zatorre, R. J. (2015). Representations of invariant musical categories are decodable
by pattern analysis of locally distributed BOLD responses in superior temporal and intraparietal
sulci. Cerebral Cortex 25(7), 1947–1957.
Koelsch, S. (2006). Significance of Broca’s area and ventral premotor cortex for music-syntactic
processing. Cortex 42(4), 518–520.
Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model.
Frontiers in Psychology 2, 110.
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3),
170–180.
Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005). Adults and children processing
music: An fMRI study. NeuroImage 25(4), 1068–1076.
Koelsch, S., Fritz, T., v. Cramon, D. Y., Müller, K., & Friederici, A. D. (2006). Investigating emotion
with music: An fMRI study. Human Brain Mapping 27(3), 239–250.
Koelsch, S., Gunter, T. C., v. Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D. (2002).
Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2),
956–966.
Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Müller, K., & Gruber, O. (2009). Functional
architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping 30(3),
859–873.
Koelsch, S., & Skouras, S. (2014). Functional centrality of amygdala, striatum and hypothalamus in a
“small-world” network underlying joy: An fMRI study with music. Human Brain Mapping 35(7),
3485–3498.
Koelsch, S., Skouras, S., Fritz, T., Herrera, P., Bonhage, C., Küssner, M. B., & Jacobs, A. M. (2013).
The roles of superficial amygdala and auditory cortex in music-evoked fear and joy. NeuroImage
81, 49–60.
Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential
for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13(1),
e0190057.
Kornysheva, K., & Schubotz, R. I. (2011). Impairment of auditory-motor timing and compensatory
reorganization after ventral premotor cortex stimulation. PLoS ONE 6(6), e21421.
Kotz, S. A., Brown, R. M., & Schwartze, M. (2016). Cortico-striatal circuits and the timing of action
and perception. Current Opinion in Behavioral Sciences 8, 42–45.
Kotz, S. A., Stockert, A., & Schwartze, M. (2014). Cerebellum, temporal predictability and the
updating of a mental model. Philosophical Transactions of the Royal Society B: Biological
Sciences 369(1658), 20130403.
Koziol, L. F., Budding, D., Andreasen, N., D’Arrigo, S., Bulgheroni, S., Imamizu, H., …Yamazaki,
T. (2014). Consensus paper: The cerebellum’s role in movement and cognition. Cerebellum 13(1),
151–177.
Koziol, L. F., Budding, D. E., & Chidekel, D. (2011). Sensory integration, sensory processing, and
sensory modulation disorders: Putative functional neuroanatomic underpinnings. Cerebellum
10(4), 770–792.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor
recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2),
308–314.
Langers, D. R. M. (2014). Assessment of tonotopically organised subdivisions in human auditory
cortex using volumetric and surface-based cortical alignments. Human Brain Mapping 35(4),
1544–1561.
Lappe, C., Lappe, M., & Pantev, C. (2016). Differential processing of melodic, rhythmic and simple
tone deviations in musicians: An MEG study. NeuroImage 124, 898–905.
Lappe, C., Steinsträter, O., & Pantev, C. (2013). Rhythmic and melodic deviations in musical
sequences recruit different cortical areas for mismatch detection. Frontiers in Human
Neuroscience 7, 260.
Large, E. W., Herrera, J. A., & Velasco, M. J. (2015). Neural networks for beat perception in musical
rhythm. Frontiers in Systems Neuroscience 9, 159.
Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York
Academy of Sciences 1169, 46–57.
LeDoux, J. (2007). The amygdala. Current Biology 17(20), R868–R874.
Lee, K.-H., Egleston, P. N., Brown, W. H., Gregory, A. N., Barker, A. T., & Woodruff, P. W. R.
(2007). The role of the cerebellum in subsecond time perception: Evidence from repetitive
transcranial magnetic stimulation. Journal of Cognitive Neuroscience 19(1), 147–157.
Lee, Y. S., Janata, P., Frost, C., Hanke, M., & Granger, R. (2011). Investigation of melodic contour
processing in the brain using multivariate pattern-based fMRI. NeuroImage 57(1), 293–300.
Lehéricy, S., Ducros, M., Krainik, A., Francois, C., Van De Moortele, P. F., Ugurbil, K., & Kim, D. S.
(2004). 3-D diffusion tensor axonal tracking shows distinct SMA and pre-SMA projections to the
human striatum. Cerebral Cortex 14(12), 1302–1309.
Lehne, M., Rohrmeier, M., & Koelsch, S. (2013). Tension-related activity in the orbitofrontal cortex
and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9(10),
1515–1523.
Leow, L. A., & Grahn, J. A. (2014). Neural mechanisms of rhythm perception: Present findings and
future directions. Advances in Experimental Medicine and Biology 829, 325–338.
Lima, C. F., Krishnan, S., & Scott, S. K. (2016). Roles of supplementary motor areas in auditory
processing and auditory imagery. Trends in Neurosciences 39(8), 527–542.
Loui, P. (2015). A dual-stream neuroanatomy of singing. Music Perception 32(3), 232–241.
Lusk, N. A., Petter, E. A., Macdonald, C. J., & Meck, W. H. (2016). Cerebellar, hippocampal, and
striatal time cells. Current Opinion in Behavioral Sciences 8, 186–192.
Maes, P.-J., Leman, M., Palmer, C., & Wanderley, M. M. (2014). Action-based effects on music
perception. Frontiers in Psychology 4, 1008.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in
Broca’s area: An MEG study. Nature Neuroscience 4(5), 540–545.
Maidhof, C., & Koelsch, S. (2011). Effects of selective attention on syntax processing in music and
language. Journal of Cognitive Neuroscience 23(9), 2252–2267.
Manto, M., Bower, J. M., Conforto, A. B., Delgado-García, J. M., Da Guarda, S. N. F., Gerwig, M.,
… Timmann, D. (2012). Consensus paper: Roles of the cerebellum in motor control: The diversity
of ideas on cerebellar involvement in movement. Cerebellum 11, 457–487.
Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J.
(2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of
Sciences 113(46), E7337–E7345.
Marvel, C. L., & Desmond, J. E. (2010). Functional topography of the cerebellum in verbal working
memory. Neuropsychology Review 20(3), 271–279.
Mas-Herrero, E., Dagher, A., & Zatorre, R. J. (2017). Modulating musical reward sensitivity up and
down with transcranial magnetic stimulation. Nature Human Behaviour 2(1), 27–32.
Matell, M. S., & Meck, W. H. (2004). Cortico-striatal circuits and interval timing: Coincidence
detection of oscillatory processes. Cognitive Brain Research 21(2), 139–170.
Mauk, M. D., & Buonomano, D. V. (2004). The neural basis of temporal processing. Annual Review
of Neuroscience 27, 307–340.
Mayville, J. M., Jantzen, K. J., Fuchs, A., Steinberg, F. L., & Kelso, J. A. S. (2002). Cortical and
subcortical networks underlying syncopated and synchronized coordination revealed using fMRI.
Human Brain Mapping 17(4), 214–229.
Medina, J. F., & Mauk, M. D. (2000). Computer simulation of cerebellar information processing.
Nature Neuroscience 3(Suppl.), 1205–1211.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological
connectivity of the mesolimbic system. NeuroImage 28(1), 175–184.
Merchant, H., & Bartolo, R. (2018). Primate beta oscillations and rhythmic behaviors. Journal of
Neural Transmission 125, 461–470.
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A
neural perspective across humans and non-human primates. Philosophical Transactions of the
Royal Society B: Biological Sciences 370(1664), 20140093.
Merchant, H., Harrington, D. L., & Meck, W. H. (2013). Neural basis of the perception and
estimation of time. Annual Review of Neuroscience 36, 313–336.
Merchant, H., Perez, O., Zarco, W., & Gamez, J. (2013). Interval tuning in the primate medial
premotor cortex as a general timing mechanism. Journal of Neuroscience 33(21), 9082–9096.
Michaelis, K., Wiener, M., & Thompson, J. C. (2014). Passive listening to preferred motor tempo
modulates corticospinal excitability. Frontiers in Human Neuroscience 8, 252.
Middleton, F. A., & Strick, P. L. (2001). Cerebellar projections to the prefrontal cortex of the primate.
Journal of Neuroscience 21(2), 700–712.
Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R. (2007). A
functional MRI study of happy and sad affective states induced by classical music. Human Brain
Mapping 28(11), 1150–1162.
Molinari, M., Leggio, M. G., Filippini, V., Gioia, M. C., Cerasa, A., & Thaut, M. H. (2005).
Sensorimotor transduction of time information is preserved in subjects with cerebellar damage.
Brain Research Bulletin 67, 448–458.
Molinari, M., Leggio, M. G., & Thaut, M. H. (2007). The cerebellum and neural networks for
rhythmic sensorimotor synchronization in the human brain. Cerebellum 6(1), 18–23.
Moore, E., Schaefer, R. S., Bastin, M. E., Roberts, N., & Overy, K. (2014). Can musical training
influence brain connectivity? Evidence from diffusion tensor MRI. Brain Sciences 4(2), 405–427.
Morillon, B., & Baillet, S. (2017). Motor origin of temporal predictions in auditory attention.
Proceedings of the National Academy of Sciences 114(42), E8913–E8921.
Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J. J., … Möller, H. E. (2015).
Investigating the dynamics of the brain response to music: A central role of the ventral
striatum/nucleus accumbens. NeuroImage 116, 68–79.
Nachev, P., Kennard, C., & Husain, M. (2008). Functional role of the supplementary and pre-
supplementary motor areas. Nature Reviews Neuroscience 9, 856–869.
Narayanan, N. S., Land, B. B., Solder, J. E., Deisseroth, K., & DiLeone, R. J. (2012). Prefrontal D1
dopamine signaling is required for temporal control. Proceedings of the National Academy of
Sciences 109(50), 20726–20731.
Nayagam, B. A., Muniak, M. A., & Ryugo, D. K. (2011). The spiral ganglion: Connecting the
peripheral and central auditory systems. Hearing Research 278(1–2), 2–20.
Nelson, A., Schneider, D. M., Takatoh, J., Sakurai, K., Wang, F., & Mooney, R. (2013). A circuit for
motor cortical modulation of auditory cortical activity. Journal of Neuroscience 33(36), 14342–
14353.
Norman-Haignere, S., Kanwisher, N., & McDermott, J. H. (2013). Cortical pitch regions in humans
respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior
auditory cortex. Journal of Neuroscience 33(50), 19451–19469.
Novembre, G., & Keller, P. E. (2014). A conceptual review on action-perception coupling in the
musicians’ brain: What is it good for? Frontiers in Human Neuroscience 8, 603.
Nozaradan, S., Schwartze, M., Obermeier, C., & Kotz, S. A. (2017). Specific contributions of basal
ganglia and cerebellum to the neural tracking of rhythm. Cortex 95, 156–168.
O’Reilly, J. X., Mesulam, M. M., & Nobre, A. C. (2008). The cerebellum predicts the timing of
perceptual events. Journal of Neuroscience 28(9), 2252–2260.
Ohnishi, T., Matsuda, H., Asada, T., Aruga, M., Hirakata, M., Nishikawa, M., … Imabayashi, E.
(2001). Functional anatomy of musical perception in musicians. Cerebral Cortex 11(8), 754–760.
Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005).
Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance
imaging study. Annals of the New York Academy of Sciences 1060, 450–453.
Palomar-García, M. Á., Zatorre, R. J., Ventura-Campos, N., Bueichekú, E., & Ávila, C. (2017).
Modulation of functional connectivity in auditory-motor networks in musicians compared with
nonmusicians. Cerebral Cortex 27(5), 2768–2778.
Pannese, A., Grandjean, D., & Frühholz, S. (2016). Amygdala and auditory cortex exhibit distinct
sensitivity to relevant acoustic features of auditory emotions. Cortex 85, 116–125.
Paquette, S., Fujii, S., Li, H. C., & Schlaug, G. (2017). The cerebellum’s contribution to beat interval
discrimination. NeuroImage 163, 177–182.
Parsons, L. M. (2012). Exploring the functional neuroanatomy of music performance, perception, and
comprehension. The Cognitive Neuroscience of Music 930(1), 211–231.
Parsons, L. M., Petacchi, A., Schmahmann, J. D., & Bower, J. M. (2009). Pitch discrimination in
cerebellar patients: Evidence for a sensory deficit. Brain Research 1303, 84–96.
Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing of
temporal pitch and melody information in auditory cortex. Neuron 36(4), 767–776.
Pecenka, N., Engel, A., & Keller, P. E. (2013). Neural correlates of auditory temporal predictions
during sensorimotor synchronization. Frontiers in Human Neuroscience 7, 380.
Pelzer, E. A., Melzer, C., Timmermann, L., von Cramon, D. Y., & Tittgemeyer, M. (2017). Basal
ganglia and cerebellar interconnectivity within the human thalamus. Brain Structure and Function
222(1), 381–392.
Perani, D. (2012). Functional and structural connectivity for language and music processing at birth.
Rendiconti Lincei 23(3), 305–314.
Peretz, I., Gosselin, N., Belin, P., Zatorre, R. J., Plailly, J., & Tillmann, B. (2009). Music lexical
networks: The cortical organization of music recognition. Annals of the New York Academy of
Sciences 1169, 256–265.
Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of
Psychology 56, 89–114.
Petter, E. A., Lusk, N. A., Hesslow, G., & Meck, W. H. (2016). Interactive roles of the cerebellum
and striatum in sub-second and supra-second timing: Support for an initiation, continuation,
adjustment, and termination (ICAT) model of temporal processing. Neuroscience & Biobehavioral
Reviews 71, 739–755.
Pfordresher, P. Q., Mantell, J. T., Brown, S., Zivadinov, R., & Cox, J. L. (2014). Brain responses to
altered auditory feedback during musical keyboard production: An fMRI study. Brain Research
1556, 28–37.
Plakke, B., & Romanski, L. M. (2014). Auditory connections and functions of prefrontal cortex.
Frontiers in Neuroscience 8, 199.
Platel, H., Baron, J. C., Desgranges, B., Bernard, F., & Eustache, F. (2003). Semantic and episodic
memory of music are subserved by distinct neural networks. NeuroImage 20(1), 244–256.
Proverbio, A. M., Orlandi, A., & Pisanu, F. (2016). Brain processing of consonance/dissonance in
musicians and controls: A hemispheric asymmetry revisited. European Journal of Neuroscience
44(6), 2340–2356.
Rao, S. M., Harrington, D. L., Haaland, K. Y., Bobholz, J. A., Cox, R. W., & Binder, J. R. (1997).
Distributed neural systems underlying the timing of movements. Journal of Neuroscience 17(14),
5528–5535.
Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhuman
primates illuminate human speech processing. Nature Neuroscience 12(6), 718–724.
Reser, D. H., Burman, K. J., Richardson, K. E., Spitzer, M. W., & Rosa, M. G. P. (2009). Connections
of the marmoset rostrotemporal auditory area: Express pathways for analysis of affective content
in hearing. European Journal of Neuroscience 30(4), 578–592.
Reybrouck, M., & Brattico, E. (2015). Neuroplasticity beyond sounds: Neural adaptations following
long-term musical aesthetic experiences. Brain Sciences 5(1), 69–91.
Roberts, T. F., Hisey, E., Tanaka, M., Kearney, M. G., Chattree, G., Yang, C. F., … Mooney, R.
(2017). Identification of a motor-to-auditory pathway important for vocal learning. Nature
Neuroscience 20(7), 978–986.
Rogenmoser, L., Zollinger, N., Elmer, S., & Jäncke, L. (2016). Independent component processes
underlying emotions during natural music listening. Social Cognitive and Affective Neuroscience
11(9), 1428–1439.
Ross, B., Barat, M., & Fujioka, T. (2017). Sound-making actions lead to immediate plastic changes
of neuromagnetic evoked responses and induced β-band oscillations during perception. Journal of
Neuroscience 37(24), 5948–5959.
Ross, J. M., Iversen, J. R., & Balasubramaniam, R. (2016). Motor simulation theories of musical beat
perception. Neurocase 22(6), 558–565.
Rossignol, S., & Melvill Jones, G. (1976). Audio-spinal influence in man studied by the H-reflex and
its possible role on rhythmic movements synchronized to sound. Electroencephalography and
Clinical Neurophysiology 41(1), 83–92.
Royal, I., Vuvan, D. T., Zendel, B. R., Robitaille, N., Schönwiesner, M., & Peretz, I. (2016).
Activation in the right inferior parietal lobule reflects the representation of musical structure
beyond simple pitch discrimination. PLoS ONE 11(5), e0155291.
Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects human aesthetic
responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically
distinct dopamine release during anticipation and experience of peak emotion to music. Nature
Neuroscience 14(2), 257–264.
Salimpoor, V. N., Van Den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J.
(2013). Interactions between the nucleus accumbens and auditory cortices predict music reward
value. Science 340(6129), 216–219.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and
the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Sänger, J., Müller, V., & Lindenberger, U. (2012). Intra- and interbrain synchronization and network
properties when playing guitar in duets. Frontiers in Human Neuroscience 6, 312.
Santoro, R., Moerel, M., De Martino, F., Goebel, R., Ugurbil, K., Yacoub, E., & Formisano, E.
(2014). Encoding of natural sounds at multiple spectral and temporal resolutions in the human
auditory cortex. PLoS Computational Biology 10(1), e1003412.
Satoh, M., Takeda, K., Nagata, K., Hatazawa, J., & Kuzuhara, S. (2001). Activated brain regions in
musicians during an ensemble: A PET study. Cognitive Brain Research 12(1), 101–108.
Saur, D., Kreher, B. W., Schnell, S., Kummerer, D., Kellmeyer, P., Vry, M.-S., … Weiller, C. (2008).
Ventral and dorsal pathways for language. Proceedings of the National Academy of Sciences
105(46), 18035–18040.
Schindler, A., Herdener, M., & Bartels, A. (2013). Coding of melodic gestalt in human auditory
cortex. Cerebral Cortex 23(12), 2987–2993.
Schmahmann, J. D., & Pandya, D. N. (1997). The cerebrocerebellar system. International Review of
Neurobiology 41, 31–38, 38a, 39–60.
Schneider, D. M., & Mooney, R. (2015). Motor-related signals in the auditory system for listening
and learning. Current Opinion in Neurobiology 33, 78–84.
Schneider, D. M., Nelson, A., & Mooney, R. (2014). A synaptic and circuit basis for corollary
discharge in the auditory cortex. Nature 513(7517), 189–194.
Schön, D., Gordon, R. L., & Besson, M. (2005). Musical and linguistic processing in song
perception. Annals of the New York Academy of Sciences 1060(1), 71–81.
Schönwiesner, M., & Zatorre, R. J. (2009). Spectro-temporal modulation transfer function of single
voxels in the human auditory cortex measured with high-resolution fMRI. Proceedings of the
National Academy of Sciences 106(34), 14611–14616.
Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new
framework. Trends in Cognitive Sciences 11(5), 211–218.
Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of
verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping 32(5),
771–783.
Schwartze, M., Keller, P. E., & Kotz, S. A. (2016). Spontaneous, synchronized, and corrective timing
behavior in cerebellar lesion patients. Behavioural Brain Research 312, 285–293.
Schwartze, M., Rothermich, K., Schmidt-Kassow, M., & Kotz, S. A. (2011). Temporal regularity
effects on pre-attentive and attentive processing of deviance. Biological Psychology 87(1), 146–
151.
Seger, C. A., Spiering, B. J., Sares, A. G., Quraini, S. I., Alpeter, C., David, J., & Thaut, M. H.
(2013). Corticostriatal contributions to musical expectancy perception. Journal of Cognitive
Neuroscience 25(7), 1062–1077.
Shadmehr, R., Smith, M. A., & Krakauer, J. W. (2010). Error correction, sensory prediction, and
adaptation in motor control. Annual Review of Neuroscience 33, 89–108.
Sokolov, A. A., Miall, R. C., & Ivry, R. B. (2017). The cerebellum: Adaptive prediction for
movement and cognition. Trends in Cognitive Sciences 21(5), 313–332.
Spencer, R. M. C., Ivry, R. B., & Zelaznik, H. N. (2005). Role of the cerebellum in movements:
Control of timing or movement transitions? Experimental Brain Research 161(3), 383–396.
Stewart, L., Overath, T., Warren, J. D., Foxton, J. M., & Griffiths, T. D. (2008). fMRI evidence for a
cortical hierarchy of pitch pattern processing. PLoS ONE 3(1), e1470.
Stoodley, C. J., & Schmahmann, J. D. (2009). Functional topography in the human cerebellum: A
meta-analysis of neuroimaging studies. NeuroImage 44(2), 489–501.
Stoodley, C. J., & Schmahmann, J. D. (2010). Evidence for topographic organization in the
cerebellum of motor control versus cognitive and affective processing. Cortex 46(7), 831–844.
Stupacher, J., Hove, M. J., Novembre, G., Schütz-Bosbach, S., & Keller, P. E. (2013). Musical
groove modulates motor cortex excitability: A TMS investigation. Brain and Cognition 82(2),
127–136.
Suga, N., & Ma, X. (2003). Multiparametric corticofugal modulation and plasticity in the auditory
system. Nature Reviews Neuroscience 4(10), 783–794.
Teki, S., Grube, M., & Griffiths, T. D. (2012). A unified model of time perception accounts for
duration-based and beat-based timing mechanisms. Frontiers in Integrative Neuroscience 5, 90.
Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based
and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812.
Tervaniemi, M., Medvedev, S. V., Alho, K., Pakhomov, S. V., Roudas, M. S., Van Zuijen, T. L., &
Näätänen, R. (2000). Lateralized automatic auditory processing of phonetic versus musical
information: A PET study. Human Brain Mapping 10(2), 74–79.
Tesche, C. D., & Karhu, J. J. T. (2000). Anticipatory cerebellar responses during somatosensory
omission in man. Human Brain Mapping 9(3), 119–142.
Thaut, M. H., Demartin, M., & Sanes, J. N. (2008). Brain networks for integrative rhythm formation.
PLoS ONE 3(5), e2312.
Thaut, M. H., McIntosh, G. C., Prassas, S. G., & Rice, R. R. (1992). Effect of rhythmic auditory
cuing on temporal stride parameters and EMG patterns in normal gait. Neurorehabilitation and
Neural Repair 6(4), 185–190.
Thaut, M. H., Stephan, K. M., Wunderlich, G., Schicks, W., Tellmann, L., Herzog, H., … Hömberg,
V. (2009). Distinct cortico-cerebellar activations in rhythmic auditory motor synchronization.
Cortex 45(1), 44–53.
Thaut, M. H., Trimarchi, P., & Parsons, L. (2014). Human brain basis of musical rhythm perception:
Common and distinct neural substrates for meter, tempo, and pattern. Brain Sciences 4(2), 428–
452.
Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical
priming. Annals of the New York Academy of Sciences 999, 209–211.
Toiviainen, P., Alluri, V., Brattico, E., Wallentin, M., & Vuust, P. (2014). Capturing the musical brain
with Lasso: Dynamic decoding of musical features from fMRI data. NeuroImage 88, 170–180.
Tollin, D. J. (2003). The lateral superior olive: A functional role in sound source localization.
Neuroscientist 9(2), 127–143.
Tramo, M. J., Shah, G. D., & Braida, L. D. (2002). Functional role of auditory cortex in frequency
processing and pitch perception. Journal of Neurophysiology 87(1), 122–139.
Trost, W., Ethofer, T., Zentner, M., & Vuilleumier, P. (2012). Mapping aesthetic musical emotions in
the brain. Cerebral Cortex 22(12), 2769–2783.
Tseng, Y., Diedrichsen, J., Krakauer, J. W., Shadmehr, R., & Bastian, A. J. (2007). Sensory prediction
errors drive cerebellum-dependent adaptation of reaching. Journal of Neurophysiology 98(1), 54–
62.
Von Der Heide, R. J., Skipper, L. M., Klobusicky, E., & Olson, I. R. (2013). Dissecting the uncinate
fasciculus: Disorders, controversies and a hypothesis. Brain 136(6), 1692–1707.
Warren, J. D., Jennings, A. R., & Griffiths, T. D. (2005). Analysis of the spectral envelope of sounds
by the human brain. NeuroImage 24(4), 1052–1057.
Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch chroma
and pitch height in the human brain. Proceedings of the National Academy of Sciences 100(17),
10038–10042.
Warren, J. E., Wise, R. J. S., & Warren, J. D. (2005). Sounds do-able: Auditory-motor
transformations and the posterior temporal plane. Trends in Neurosciences 28(12), 636–643.
Warrier, C., Wong, P., Penhune, V., Zatorre, R., Parrish, T., Abrams, D., & Kraus, N. (2009). Relating
structure to function: Heschl’s gyrus and acoustic processing. Journal of Neuroscience 29(1), 61–
69.
Wessinger, C. M., VanMeter, J., Tian, B., Van Lare, J., Pekar, J., & Rauschecker, J. P. (2001).
Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance
imaging. Journal of Cognitive Neuroscience 13(1), 1–7.
Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2014). Network science
and the effects of music preference on functional brain connectivity: From Beethoven to Eminem.
Scientific Reports 4(1), 6130.
Wilson, E. M. F., & Davey, N. J. (2002). Musical beat influences corticospinal drive to ankle flexor
and extensor muscles in man. International Journal of Psychophysiology 44(2), 177–184.
Witt, S. T., Laird, A. R., & Meyerand, M. E. (2008). Functional neuroimaging correlates of finger-
tapping task variations: An ALE meta-analysis. NeuroImage 42(1), 343–356.
Wolpert, D. M., Miall, R. C., & Kawato, M. (1998). Internal models in the cerebellum. Trends in
Cognitive Sciences 2(9), 338–347.
Wu, J., Zhang, J., Ding, X., Li, R., & Zhou, C. (2013). The effects of music on brain functional
networks: A network analysis. Neuroscience 250, 49–59.
Wu, J., Zhang, J., Liu, C., Liu, D., Ding, X., & Zhou, C. (2012). Graph theoretical analysis of EEG
functional connectivity during music perception. Brain Research 1483, 71–81.
Zatorre, R. J. (2002). Auditory cortex. In V. S. Ramachandran (Ed.), Encyclopedia of the Human
Brain (pp. 289–301). Amsterdam: Elsevier.
Zatorre, R. J. (2015). Musical pleasure and reward: Mechanisms and dysfunction. Annals of the New
York Academy of Sciences 1337(1), 202–211.
Zatorre, R. J., Bouffard, M., & Belin, P. (2004). Sensitivity to auditory object features in human
temporal neocortex. Journal of Neuroscience 24(14), 3637–3642.
Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind’s
ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience
8(1), 29–46.
Zatorre, R. J., & Salimpoor, V. N. (2013). From perception to pleasure: Music and its neural
substrates. Proceedings of the National Academy of Sciences 110(Suppl. 2), 10430–10437.
Zatorre, R. J., & Zarate, J. (2012). Cortical processing of music. In D. Poeppel, T. Overath, A. N.
Popper, & R. R. Fay (Eds.), The human auditory cortex: Springer handbook of auditory research
(Vol. 43, pp. 261–294). New York: Springer.
Zysset, S., Huber, O., Ferstl, E., & von Cramon, D. Y. (2002). The anterior frontomedian cortex and
evaluative judgment: An fMRI study. NeuroImage 15(4), 983–991.
CHAPTER 6

NETWORK NEUROSCIENCE: AN INTRODUCTION TO GRAPH THEORY NETWORK-BASED TECHNIQUES FOR MUSIC AND BRAIN IMAGING RESEARCH

ROBIN W. WILKINS

In this chapter, I provide an introduction to network neuroscience techniques and methods that may be successfully applied to neuroimaging
data for brain-based music research. Included in this chapter is a
background to the field of network science more broadly, as an approach to
the study of complex systems, in addition to the currently accepted
graph theory techniques and applied analysis methods. The focus of the
chapter is on two main components. The first is an introductory overview of
some of the specific network-based techniques that may be applied to
neuroimaging data for understanding structural and functional brain
connectivity. For those interested in pursuing the effects of music on brain
connectivity, it is important to understand there is a difference between
network-based brain connectivity analyses and other conventional
correlation measures of connectivity analysis. This is particularly true
within the most prominent area of resting-state connectivity research
(Biswal, Van Kylen, & Hyde, 1997; Biswal, Yetkin, Haughton, & Hyde, 1995;
Fox et al., 2005; Greicius, Krasnow, Reiss, & Menon, 2003) as well as the
default mode network (Broyd et al., 2009; Buckner, Andrews-Hanna, &
Schacter, 2008; Raichle, 2001). At present, terms such as “brain networks,”
“functional connectivity,” or “brain connectivity” frequently appear in the
brain imaging literature. Nonetheless, readers are cautioned that all
connectivity terms are not scientifically interchangeable or mathematically
equal in their approach. Non-trivial statistical differences depend on
whether network-based or correlational statistical methods are used to
analyze and describe structural and functional brain connectivity (Bassett &
Sporns, 2017; Bullmore & Sporns, 2009; Greicius et al., 2003; Stam, 2014).
In addition, the field of network neuroscience is currently highly active and
readers will find more refined network measures being generated and
reported regularly. Thus, as a neuroscientific frontier, the second section of
this chapter provides some of the more promising implications from the
application of these network neuroscience techniques for advancing our
understanding of the effects of music on structural and functional brain
network connectivity. Ultimately, the supportive evidence found from the
application of these techniques may prove useful for a host of neurological
questions and neurorehabilitation avenues surrounding musical experiences
and the brain.

Overview of Network Science


Network-based approaches to the study of complex systems have become
ubiquitous in a wide variety of research areas (Barabási & Albert, 1999;
Newman, 2003; Watts & Strogatz, 1998). Steeped in the mathematical
foundation of graph theory (Euler, 1736), network methods have led to a
greater understanding of the interactions between components in systems as
disparate as social networks, biological systems, communication arrays, and
transportation networks (Barabási, 2002; Newman, 2003; Watts, 2003;
Watts & Strogatz, 1998). In addition, the fields of neuroscience and
neuroimaging have greatly benefited from a network science approach
(Bassett & Sporns, 2017; Stam & Reijneveld, 2007). Studying the brain as a
complex system presents an opportunity to understand how structural and
functional features contribute to dynamic mental phenomena of the brain.
Importantly, network-based methods move opportunities in experimental
design forward and progress beyond correlation analyses of neuroimaging
data by providing more advanced statistical measures to evaluate whole
brain connectivity (Bassett & Bullmore, 2006; Bullmore & Sporns, 2009;
Fox, Zhang, Snyder, & Raichle, 2009; Sporns, Chialvo, Kaiser, & Hilgetag,
2004; Sporns, Tononi, & Kötter, 2005). Here, the brain is subdivided into
regions (represented as network nodes) and interregional interactions
(represented as network edges) estimated from structural or functional
neuroimaging modalities, including functional magnetic resonance imaging
(fMRI), electroencephalography (EEG), diffusion tensor imaging (DTI),
and magnetoencephalography (MEG) (Friston, Frith, Turner, &
Frackowiak, 1995; Logothetis, 2008; Stam & Reijneveld, 2007; Tuch,
Reese, Wiegell, & Wedeen, 2003; Wedeen, Hagmann, Tseng, Reese, &
Weisskoff, 2005). Recent advances in a network-based understanding of the brain mark a dramatic departure from conventional brain-activation focused experiments and the statistical analysis methods formerly applied to neuroimaging data (Savoy, 2005; Shirer, Ryali,
Rykhlevskaia, Menon, & Greicius, 2012). Now, rather than trying to
understand brain function through isolated areas of brain response
activation, researchers are able to explore neurological responses
throughout the entire brain, as an interconnected system. This knowledge,
that the brain is a complex system, is transforming our more traditional
understanding of the brain (Bassett & Sporns, 2017; Betzel et al., 2012).
Approaching the brain as a system presents an opportunity to uncover
patterns in interregional interactions that are not apparent with conventional
neuroimaging approaches to experimental design and analysis methods
(Bassett, Khambhati, & Grafton, 2017; Bassett & Sporns, 2017; He &
Evans, 2010; Sporns et al., 2005). This is specifically advantageous to
questions surrounding music and brain imaging research. Unlike
conventional neuroimaging analyses, a focal impetus behind network-based
analyses arises from the hypothesis that a network approach provides for a
more accurate representation of the brain as an interconnected system, an
organizational property that is often overlooked in more conventional
neuroscientific approaches (Telesford, Simpson, Burdette, Hayasaka, &
Laurienti, 2011). Perhaps more importantly, network methods allow for a
statistically principled investigation of different brain states and
neurological disorders under a common representational framework
(Bassett & Bullmore, 2009; Moussa et al., 2011; Sporns et al., 2004).
Network-based methods not only refine the outcomes of existing
techniques, but also typify a paradigm shift for representing the brain’s
structure and functional connectivity dynamics. This approach offers
quantitatively different maps, where networks, consisting of nodes (e.g.,
voxels of neurons or brain regions) and links (e.g., anatomical or functional
connections) are endowed with topological properties. Studying the brain at
these various levels has led to the emergence of substantial evidence from
the newer field of network neuroscience, a now firmly established brain-
based scientific frontier (Bassett & Sporns, 2017). Within the brain, music
affects an intricate set of complex neural processing systems (Alluri et al.,
2012, 2013; Koelsch, 2009; Schlaug, 2001, 2009a; Thaut, Demartin, &
Sanes, 2008; Wilkins, 2015; Wilkins, Hodges, Laurienti, Steen, & Burdette,
2012, 2014; Zatorre, Evans, Meyer, & Gjedde, 1992). These include
structural components associated with sensory processing as well as
functional elements implicated in memory, cognition, and mood fluctuation.
Because music affects such diverse systems in the brain, it is an ideal
candidate for analysis using a network-based approach (Guye, Bettus,
Bartolomei, & Cozzone, 2010; Wilkins, 2015).
A network approach represents a conceptual revolution beyond standard
statistical approaches by bringing together researchers from a variety of
disciplines to work on complex problems that defy understanding through
confinement within any single discipline (West, 2011). With recent
technological and analytical advances, we are witnessing an explosion in
the quantity of network data and the subsequent comprehensiveness of
information gleaned by generating network-based maps of complex
systems’ data at each spatiotemporal scale. Importantly, network-based
methods offer a natural mathematical framework that not only refines the outcomes of existing statistical analysis techniques, but also typifies a paradigm shift for representing complex systems’ structure and dynamics.
New and highly detailed information may now be extrapolated from the intricacies of complex systems (Mitchell, 2009; Strogatz, 2001).
Consequently, new and rewarding solutions are being obtained to address
problems important to society (Wang, González, Hidalgo, & Barabási,
2009; West, 2011). For readers interested in learning more about the
emerging area of networks, the book Linked gives a user-friendly account of
developments in the study of networks (Barabási, 2002). In addition, Six
Degrees offers a sociologist’s view of historical discoveries, both old and
new (Watts, 2003).

Introduction to Network Metrics


As a field of interdisciplinary statistical physics, network science provides a
host of robust statistical techniques and methods for investigating the
structure and function of complex systems that display behaviors that defy
explanation by the study of the systems’ elements in isolation (Barabási &
Albert, 1999; Girvan & Newman, 2002). Network science is based on the
branch of mathematics called graph theory (Euler, 1736; Newman, 2003). A
graph is simply a mathematical representation of any real-world network
that is made up of interconnected elements. In its most basic form, a
graphed network is a collection of points, referred to as vertices or nodes, connected together by lines, referred to as links or edges (see Fig. 1). A simple graph is a set of nodes together with a set of edges. Nodes represent the fundamental
elements of the system, such as people, and the edges represent the
connections between the pairs of nodes, such as friendships between pairs
of people. Thus, a network is basically defined as a set of nodes or vertices
where the connections between them are measured as links or edges. It is
important to note that networks can be either directed or undirected,
depending on the type of network and the data provided. Undirected
networks are networks where information is passed to and from any given
node in no particular or specific flow pattern. Directed networks, on the
other hand, imply that information flows in a unilateral direction. Finally,
networks can be weighted or unweighted, depending on the choice of the
type of network. For a more detailed discussion, see Newman (2006).
FIGURE 1. Demonstration of a network. This network comprises 13 nodes. Nodes are shown
as numbered circles. Nodes are connected to other nodes within the network by edges or links
(shown as connecting lines).
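
To make these definitions concrete, the following minimal Python sketch uses the freely available networkx library (one common choice for graph analysis, adopted here as an assumption of convenience rather than a tool discussed in this chapter); the node labels and edge weights are arbitrary illustrations.

import networkx as nx

# An undirected network: information may flow in either direction.
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 1), (3, 4)])

# A directed network: each edge implies a unilateral direction of flow.
D = nx.DiGraph()
D.add_edges_from([(1, 2), (2, 3)])

# A weighted network: each edge carries a strength (e.g., a correlation).
W = nx.Graph()
W.add_weighted_edges_from([(1, 2, 0.8), (2, 3, 0.3)])

print(G.number_of_nodes(), G.number_of_edges())  # 4 nodes, 4 edges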

The most basic network metric is degree. Within a network, the degree of a node is simply the number of connections the node has to other
nodes within the rest of the network (Bullmore & Sporns, 2009; Strogatz,
2001). The degrees of all the nodes within the network form a degree
distribution (Amaral, Scala, Barthelemy, & Stanley, 2000). In random
networks, where all connections are equally possible, the degree
distribution is typically Gaussian (i.e., normal) with a symmetrically
centered distribution. Complex networks, on the other hand, generally result
in a non-Gaussian degree distribution with a long tail toward high degree
nodes.
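
As a brief sketch (assuming the networkx library introduced above, on a purely synthetic random network), degree and the degree distribution can be computed as follows:

import networkx as nx

# A random (Erdos-Renyi) network: every connection equally probable.
G = nx.erdos_renyi_graph(n=100, p=0.1, seed=1)

# Degree of one node: its number of connections to other nodes.
print(G.degree[0])

# Degree distribution: entry k counts the nodes that have degree k.
# For a random network, this histogram is approximately Gaussian.
print(nx.degree_histogram(G))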
In the depiction of the network shown in Fig. 2, nodes are connected by
links. Node 9 has edges or links that connect it to four other nodes within
the network. Thus, Node 9 in Fig. 2 has a degree of four. The path length between two nodes is calculated by measuring the minimum number of edges information must pass through when traveling from one node to another on its way to its final destination within the network. The path
length measurement can be compared to a similar network with the same
number of nodes and the probability of a randomly generated set of
connection links within the same network. Thus, in any network collection
of nodes, the degree of the collection can be compared to the degree that
might occur in a randomly connected network of the same size or density
(i.e., the total number of nodes within the network). In Fig. 3, we can see
that nodes within a network can have an equal probability of connecting to
each and every other node within the network. If all nodes in the network
connect to all the other possible neighboring nodes, we would say that the
network is regular (i.e., completely connected). If, on the other hand, we
investigated the possibility of the connections of a node within a random
network, we would see a different result. In random networks, all degree
connections are equally probable, resulting in a Gaussian degree distribution.
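
A corresponding sketch for path length, again on a synthetic random network with arbitrary node labels, might look as follows:

import networkx as nx

G = nx.erdos_renyi_graph(n=50, p=0.1, seed=2)

if nx.is_connected(G):
    # Path length: the minimum number of edges information must pass
    # through when traveling from one node to another.
    print(nx.shortest_path_length(G, source=0, target=10))

    # Characteristic path length of the whole network, which can be
    # compared against a random network of the same size and density.
    print(nx.average_shortest_path_length(G))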

FIGURE 2. Demonstration of the Network Statistic Degree. This figure depicts the degree of a
network node. In this network, Node 9 has edge connections to four other nodes in the network.
Thus, Node 9 has a degree of four. Note that Node 9 also connects to Nodes 6, 7, 8, and 11 within
the network but does not connect to Node 10 or Node 12.

FIGURE 3. Depiction of three networks: regular, small-world, and random. This figure
demonstrates differences in connections within three networks that have the same number of nodes.
The regular network has connections with all neighboring nodes but no long-range connections. The
random network has haphazard connections throughout the network. In contrast, the small-world
network has primarily nearest neighbor connections but also some long-range connections across the
network. This is referred to as the “small-world” effect. Small-world networks have been revealed to
be a property of the brain.

As shown in Fig. 3, in the regular network we can see that each node is
connected to each and every other neighboring node, but does not have
long-range connections to nodes across the network. The regular network is
considered completely connected. However, in a random network, node
connections are arbitrary. Thus, in contrast to both the regular and the
random network, the small-world network depicted in the center of Fig. 3
shows that most nodes connect to neighboring nodes. However, this
network has a few nodes with long-range connections to other network
nodes. Thus, while the regular network has a lot of node-to-nearest-
neighbor connections, the small-world network also has a few distinct long-
range nodal connections that, in turn, generate close proximity through
direct connectivity (Amaral et al., 2000). These direct connections are
found regardless of node location (i.e., regional proximity). This
phenomenon of a “small-world” effect is a widely recognized characteristic
of complex brain networks (Bassett & Bullmore, 2006; Watts & Strogatz,
1998).
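
The small-world effect depicted in Fig. 3 can also be demonstrated numerically with the Watts-Strogatz model; in the following sketch (the network size and rewiring probabilities are illustrative choices), a small rewiring probability preserves high clustering while sharply shortening path lengths:

import networkx as nx

n, k = 100, 6  # 100 nodes, each initially wired to its 6 nearest neighbors

# p is the probability that an edge is rewired to a randomly chosen node:
# p = 0 yields a regular lattice, p = 1 an essentially random network,
# and a small p a small-world network (Watts & Strogatz, 1998).
for p in (0.0, 0.1, 1.0):
    G = nx.connected_watts_strogatz_graph(n, k, p, seed=3)
    print(p, nx.average_clustering(G), nx.average_shortest_path_length(G))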
In random networks, all node degree connections are possible. In most
complex systems, however, high degree nodes tend to connect to other high
degree nodes. In other words, the network does not scale regularly. A scale-
free network is a network where the degree distribution follows a power
law. Thus, in complex systems, rather than high degree nodes exhibiting
random connection to any particular node, high degree nodes tend to self-
select by connecting to other high degree nodes and therefore generate a
non-Gaussian distribution that is scale-free. Intuitively, when considered as
a characteristic framework for understanding the brain, this makes sense.
The brain selectively utilizes its high degree connections as resources in an
efficient fashion in order to coordinate a host of widely distributed system-
level functions. These complex networks are termed “scale-free.” To recap,
nodes in complex systems, such as the brain, generally have a non-Gaussian
degree distribution, often with a long tail toward a high degree. Complex
brain networks exhibit characteristics of small-world networks where nodes
tend to connect to other nodes in disparate regions of the network
(Bullmore & Sporns, 2012). Finally, the degree distributions of nodes in
complex networks are scale-free and follow a power law (Barabási &
Albert, 1999).
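
A preferential attachment (Barabási-Albert) model reproduces this scale-free behavior; the following sketch, with illustrative parameters, exposes the characteristic long tail of a few very high degree nodes:

import networkx as nx

# Preferential attachment: each new node connects preferentially to
# existing high degree nodes, producing a power-law degree distribution.
G = nx.barabasi_albert_graph(n=1000, m=3, seed=4)

degrees = sorted((d for _, d in G.degree()), reverse=True)
print("highest degrees:", degrees[:5])               # a few large hubs
print("median degree:", degrees[len(degrees) // 2])  # most nodes low degree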
If the nearest neighbors of a node are also directly connected to each
other they form a cluster (Watts & Strogatz, 1998). Nodes that tend to
cluster are considered hubs (see Fig. 4). As the term implies, hubs function
as connection “interchanges” within the network. The clustering coefficient
quantifies the number of connections that exist between the nearest
neighbors of a node as a proportion of the maximum number of possible
connections. Random networks have a low average clustering whereas
complex networks typically have high clustering. Those nodes with high
degrees, as hubs, are considered central to the network and can demonstrate
their importance to the overall functioning brain network. This is important
when considering application to the brain. Understanding brain function,
and how it may be structurally or functionally altered or remediated via
musical experiences, has important implications for understanding the
effects of music and musical training as well as treating a variety of
neurological conditions and disorders (El Haj, Fasotti, & Allain, 2012;
Hodges & Wilkins, 2015; Hyde et al., 2009; Schlaug, 2009a; Wilkins, 2015;
Wilkins et al., 2012, 2014; Wilkins et al., 2018; Wong, Skoe, Russo, Dees,
& Kraus, 2007).

FIGURE 4. Demonstration of a hub. Node 7, shown as a darker circle, is central to all the other
nodes in the network and is therefore a hub. Note that Node 7 has a degree of five, but due to its high
centrality, Node 7 is also considered a hub within the entire network.
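
As a sketch of these ideas (on a synthetic network, with hubs operationalized here simply as the highest degree nodes, which is only one of several criteria in the literature), clustering and candidate hubs can be computed as follows:

import networkx as nx

G = nx.barabasi_albert_graph(n=200, m=4, seed=5)

# Clustering coefficient of one node: the proportion of possible
# connections among its nearest neighbors that actually exist.
print(nx.clustering(G, 0))

# Average clustering across the whole network; complex networks
# typically show higher values than comparable random networks.
print(nx.average_clustering(G))

# Candidate hubs: the five highest degree nodes in the network.
hubs = sorted(G.degree(), key=lambda nd: nd[1], reverse=True)[:5]
print("candidate hubs (node, degree):", hubs)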

Hubs are part of a class of network measurements termed centrality. Centrality analysis measures how many of the shortest paths between pairs
of nodes information must pass through on its way to its final destination
within the network (Zuo et al., 2011; see Fig. 4). Centrality measures are currently an active area of research, and there are
several specific mathematical approaches to calculating unique
characteristics of centrality metrics in the brain including: betweenness,
eigenvector, and leverage centrality, among others (Borgatti, 2005; Joyce,
Laurienti, Burdette, & Hayasaka, 2010; Newman, 2005). In concept,
centrality functions like highway interchanges or subway “transfer-stops”
by calculating those nodes, as central hubs, that play an important
functional role in the network. A node with high centrality, as a hub, is
considered crucial to the network. As one could envision in Fig. 4, if the
central hub is damaged or removed the network will become fragmented
and communication across the network will be affected accordingly.
Conversely, yet perhaps equally enticingly, if a hub were able to be restored
or trained, there would be functional implications as well.
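
The consequence of hub damage can be simulated directly; in this sketch (a synthetic tree-like network in which hubs are structurally critical), the node with the highest betweenness centrality is removed and the resulting fragmentation is counted:

import networkx as nx

G = nx.barabasi_albert_graph(n=100, m=1, seed=6)  # a tree-like network

# Betweenness centrality: the share of shortest paths between pairs of
# nodes that pass through a given node on the way to their destinations.
bc = nx.betweenness_centrality(G)
hub = max(bc, key=bc.get)

print("components before removal:", nx.number_connected_components(G))
G.remove_node(hub)  # simulate damage to, or removal of, the central hub
print("components after removal:", nx.number_connected_components(G))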
Evidence indicates that the function of a complex network requires the
maintenance of specific hubs that have high degree connections as node
clusters. These hubs, importantly, are not necessarily adjacent and may be
located in widely distributed brain regions (Bullmore & Sporns, 2012).
Hubs are commonly distinguished as provincial hubs, which have a high within-module degree and a low participation coefficient, and connector hubs, which have a high participation coefficient. However, the most widely accepted metric
currently substantiated in the brain imaging literature is the “rich club,”
those regions with densely interconnected connector hubs (Bullmore &
Sporns, 2012). The selection and removal of a few critical nodes that are
hubs can inflict havoc and potentially dismantle the entire functional or
structural network (Albert, Jeong, & Barabási, 2000). Again, this has
implications for the brain. Evidence from network neuroscience has
exposed how the brain’s network resilience to attack helps guard against its fragility and potential vulnerabilities. Damage within brain regions or specific trauma to particular brain network hubs would likely have an impact on the brain’s functional network. Conversely, if external stimuli such as
music or experiences in musical training can potentially re-route
connections to specific hubs in brain regions important for healthy brain
function, or even temporarily restore hub connections within traumatized
regions, research suggests the brain may experience enhanced or
therapeutic functional results (see Fig. 7) (Raglio et al., 2015; Sachs, Ellis,
Schlaug, & Loui, 2016; Shirer et al., 2012; Sihvonen et al., 2017; Thaut et
al., 2009; Wilkins et al., 2012, 2014). This would also be demonstrated in
related functional brain concepts within the neuroimaging literature such as
neuroplasticity, neurorestoration, and neurorehabilitation (Herholz &
Zatorre, 2012; Kraus & Chandrasekaran, 2010; Schlaug, 2009a, 2009b;
Zatorre & Samson, 1991).
Assortativity is the correlation between the degrees of connected nodes.
Positive assortativity indicates that high degree nodes tend to preferentially
self-select to connect with other high degree nodes. Again, these degree
distributions, where high degree nodes connect to other high degree nodes,
result in the “small-world phenomenon” (Barabási & Albert, 1999; Watts &
Strogatz, 1998). A negatively assortative network, on the other hand,
indicates that high degree nodes tend to connect to low degree nodes.
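
Assortativity can be quantified in a single call; a brief sketch follows (on a synthetic random network, where the coefficient is expected to be near zero):

import networkx as nx

# Degree assortativity: the correlation between the degrees of connected
# nodes. Positive values indicate that high degree nodes preferentially
# connect to other high degree nodes.
G = nx.erdos_renyi_graph(n=300, p=0.05, seed=7)
print(nx.degree_assortativity_coefficient(G))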
Community structure is a network metric for the measurement of the
interconnectedness of nodes within a network (Newman & Girvan, 2004).
Somewhat similar in concept to the partition approach when similar types
of houses can be mapped into local nearby geographic sections or
neighborhoods, community structure measures the topological
configuration of the network by partitioning the network to calculate those
nodes that exhibit and share more inner connections than outer node
connections (see Fig. 5). Community structure analysis is performed by
creating non-overlapping collections of highly interconnected nodes, or
“modules” of nodes, that are statistically more connected to each other than
to other nodes within the overall network (Girvan & Newman, 2002;
Newman & Girvan, 2004). Modules are subsets of strongly connected
nodes within the brain network. Modularity is defined as the quality of a
particular partition of the network into modules (Newman & Girvan, 2004).
Computationally, modularity (often referred to as Q) reflects the number of
links between nodes within a module minus what would be expected given
a random distribution of links between all nodes regardless of modules.
This value ranges up to a maximum of 1, with values near zero indicating no more community structure than expected by chance and higher values reflecting stronger community structure. In brief, in order to calculate the consistency of
modular organization across time, the networks are first partitioned into
distinct modules (i.e., separate communities) using a choice of algorithm
approaches such as those found in Blondel, Guillaume, Lambiotte, and
Lefebvre (2008), among others. These methods include optimization
algorithms for modularity analysis that operate by identifying, through an
iterative process, partitions of the network into subsets of highly connected
nodes compared to other connected nodes’ modularity. In community
structure detection procedures, the brain network is partitioned through
multiple iterations, as repetitive calculations to detect which subdivisions
throughout the entire network have modules that result in the maximum
number of within-group edges and the minimum number of between-group
edges (Newman & Girvan, 2004).
FIGURE 5. Community Structure. This figure depicts how a network (left panel) can be analyzed
into separate communities. Community structure is a statistical detection procedure that measures
those nodes that exhibit more highly interconnected nodes, compared to other nodal connections
within the network. This network has three sub-graphed communities (shown in green, red, and blue
circles, middle and right panels). Notice that each community is still sparsely connected, through
connector hubs, to other nodes that are in other communities. Communities can be highly connected
despite their spatial or regional proximity within the brain. Community structure is a statistic that is
also referred to as “modularity.”
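
For illustration, modularity optimization in the spirit of Blondel et al. (2008) is available in recent versions of networkx; the following sketch partitions a standard benchmark network (Zachary’s karate club, a small social network bundled with the library) and reports Q:

import networkx as nx
from networkx.algorithms import community

# A benchmark social network with well-known community structure.
G = nx.karate_club_graph()

# Louvain-style modularity optimization: iteratively merges nodes into
# modules so as to maximize the modularity Q of the partition.
modules = community.louvain_communities(G, seed=8)
Q = community.modularity(G, modules)

print("number of modules:", len(modules))
print("modularity Q:", round(Q, 2))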

Community detection procedures are computationally intensive and are impacted by the choice of node parcellation schemes. In addition, an atlas
or region-of-interest (ROI) based network will necessarily be different from
a voxel-based network, due to the size of the network and node selection.
Robust network partitioning into modules requires partitioning the
individual network into modules across multiple iterations, in order to
capture the most representative modular structure (Blondel et al., 2008;
Fortunato, 2010; Newman, 2006). In an effort to calculate module
comparisons based on groups of people or different conditions, datasets
from groups of people or conditions can be further strengthened through the
application of an additional statistical procedure termed Scaled Inclusivity
(Steen, Hayasaka, Joyce, & Laurienti, 2011). Scaled Inclusivity takes into
account each subject’s modules and then cross-compares them to each and every other person’s modules to determine which subject’s modules are most representative of the group (Stanley et al., 2013; Steen et al., 2011; Wilkins et al., 2014). Importantly, Scaled Inclusivity also accounts for the absence of a node within each person’s module and thus “scales”
the calculation accordingly. Again, there are several different community
detection procedures that divide the functional subsets within the network
across the brain topology and are measured through several different
optimization procedures (Blondel et al., 2008; Fortunato, 2010; Mucha,
Richardson, Macon, Porter, & Onnela, 2010). Community structure
analysis, that calculates nodes that share connections with each other as
non-overlapping groups, is also called modularity (Newman, 2006). In
closing this introductory section on network methods, there are a host of
robust statistical graph theory approaches that can be used to describe
networks that are beyond the scope of this chapter including, but not limited
to: multiplex, multilayer, multislice, multitype, hierarchical, multiweighted,
interacting, interdependent, and coupled networks. For a complete review
of fundamental brain network measurements, see Rubinov and Sporns
(2010).
In summary, there are numerous network-based metrics that can be
applied to brain imaging data. In any network, there can be different—yet
potentially equally informative—measurements about the components of a
network. These graph theory techniques account for characteristics of the
network by measuring specific components and their unique interactions
(Telesford et al., 2011). The choice of nodes for network generation
frequently varies from study to study. It is important to stress that the choice
of node parcellation scheme and procedure is key for understanding the
robustness of a particular network and subsequent results. Research has
substantiated that voxel-based brain imaging networks differ substantially
from region or atlas based networks in terms of choice of nodal parcellation
(Cohen et al., 2008; Craddock, James, Holtzheimer, Hu, & Mayberg, 2012;
Hayasaka & Laurienti, 2010; Mumford et al., 2010; Stanley et al., 2013).
Depending on the type of imaging modality (e.g., fMRI, EEG, DSI, DTI, or
MEG), the choice of node parcellation scheme(s) and approach to the actual
node selection will, necessarily, be different. Currently, there is an absence
of a fully agreed upon approach to node selection and studies can range
from single neuron to voxel-based as well as brain regions-of-interest
primarily determined by the neuroimaging literature brain atlases
(Craddock et al., 2012; Power et al., 2011; Stanley et al., 2013; Wang, Zuo,
& He, 2010). This inherently alters how the connectivity results and
analyses are interpreted. Network-based statistics performed on a 90-node brain network are obviously going to differ from those performed on a 21,000-node voxel-based network. The means of node selection in brain
networks largely determines the subsequent neurobiological interpretation
of the network results (Butts, 2009). Readers are again encouraged to
determine whether research reports have selected nodes based on results
from previous neuroimaging literature a priori, somewhat like a predefined
seek-and-search, which may eliminate important information before the
results and analyses are performed, or whether the brain network and
statistics were generated without biases a priori and the subsequent analyses
performed without prior intentional selection toward findings in any
particular region or specific area of the brain. Again, neither is necessarily
“better” than the other, but it is certainly worth making the distinction as the
field of music and brain connectivity research moves forward. In closing,
this section highlights the fundamental graph theory metrics from network
science. Each network-based statistic provides a different layer of
information that leads to a fuller understanding of brain connectivity.

Generating Brain Networks: Steps for Network-Based Neuroimaging Analysis

Generating a brain network requires multiple processing steps for analyses.
In brief, functional magnetic resonance imaging (fMRI) or other
neuroimaging data (EEG, MEG) are acquired. Once the data are acquired,
several statistical procedures are applied to prepare the data for network
analysis. These procedures are typically performed as data preprocessing
steps but are frequently reported under the data processing section within
peer-reviewed research reports. The preprocessing of fMRI data involves
skull stripping of the acquired neuroimaging data (i.e., revealing the brain
only) and the application of several essential statistical procedures that
include motion correction, slice timing correction, realignment, co-
registration of structural and functional images, normalization, and
smoothing. An excellent explanation of the statistical techniques used on
fMRI data may be found in Lindquist et al. (2018). Processing fMRI data
for network-based analysis is only performed after completion of the
preprocessing and correlation procedures through a series of statistical steps
via command line data processing. There are several fMRI data processing
applications available online such as the FMRIB Software Library (FSL), AFNI, FreeSurfer (including its diffusion analysis tool TRACULA), and Statistical Parametric Mapping (SPM).
Brain network generation and analysis is currently an active area of
research. Due to this fact, network-based analyses include emerging
procedures and new statistical methods that are being created and applied,
with new results being published regularly. Beyond the more conventional connectivity analyses, generating brain networks (i.e., graph theory based networks) requires several more advanced statistical procedures subsequent to the data processing phase. Due to the high
computational load, network procedures and approaches as well as most
state-of-the-art network processing and analyses are still managed through
various in-house data processing scripts, typically in UNIX/LINUX,
MATLAB, and/or Python computing languages. However, there are several
useful network toolkits and software applications that are freely available
including the Brain Connectivity Toolbox, the Functional Connectivity (CONN) Toolbox, and GraphVar (Kruschwitz, List, Waller, Rubinov, & Walter, 2015; Rubinov & Sporns, 2010; Whitfield-Gabrieli & Nieto-Castanon, 2012). Due to the nature of network neuroscience as an emerging
field in brain science, there is also the option of developing new statistical
network measurements and approaches, including more advanced computer
scripts, that apply to specific procedures or statistical analyses. At present,
most of these are created for a new network property or for comparing
different properties. This process will continue as the field grows and will
certainly further advance our understanding of both structural and
functional brain networks in terms of cognition and perception, in addition
to neurological health and disease. These newer network analyses statistics
and algorithms are typically published in methods sections and are
frequently reported under methods as “in-house” processing scripts, many
times in the supplemental methods section of a peer-reviewed publication.
It is quite common for new network statistics and in-house processing
scripts to be employed for working with fMRI data for network analysis.
Thus, apart from the aforementioned network statistics, the field remains to
be defined fully in terms of which newer network methods are considered
sufficiently robust as “gold standards.” Again, researchers are cautioned
that this is particularly true for node parcellation and node choice selection
(Stanley et al., 2013).
For any network analysis, once the fMRI data have been processed, a
connectivity matrix must be generated. In brief, for connectivity analysis
(often referred to as “functional connectivity”), a cross-correlation
procedure is applied between each node and each and every other node.
Current neuroimaging technology limits functional brain network analysis
to nodes above the millimeter scale, meaning that many potentially
interacting neurons and synapses will be represented as individual nodes in
human brain networks. Once the cross-correlation (i.e., the connectivity)
matrix is generated, a thresholding statistic is applied to the data. A set of
statistical thresholding procedures are performed on each correlation so that
the resulting matrix can be binarized to reveal the strongest connections in
the network. Thresholding is currently another active area of network
research (Van den Heuvel et al., 2017). Thresholding intuitively eliminates
at least some of the brain network connections. Correlation matrices can be
measured through thresholding iterations across all possible data points,
from 0.01 to 1. Indeed, thresholding procedures have been applied across
varying data points and examined for their robust characteristics. For
example, too liberal a threshold (e.g., 0.95 or 1.0, retaining nearly all connections) necessarily includes both exceedingly strong and very weak correlations, and the resulting thresholded matrix is therefore not informative. However, results
reveal that similarly sized networks show less inter-subject network
fragmentation with thresholds set at 0.2, 0.25, or 0.3. There are currently
several different statistical approaches to thresholding (proportional, relative, and absolute, among others), and the choice among them is not inconsequential (Van den Heuvel et al., 2017). Researchers interested in reading more
about different consequences that may result from varying threshold
statistical approaches will find more detailed information in Van Wijk,
Stam, and Daffertshofer (2010) and Van den Heuvel et al. (2017). Again,
the goal of thresholding the correlation matrix is to preserve the strongest
connections and density of the network. Additionally, thresholding
procedures are implemented to prevent excessive fragmentation and
inadvertent insertion of randomness into the data, while simultaneously
eliminating the weaker connections. All thresholding is performed on the
connectivity matrices prior to applying any graph theory statistics for
network-based analyses. This thresholding procedure is a widely accepted and fundamental step prior to any network-based analysis, and its result is the adjacency matrix (Aij). It is important to note that, unlike typical correlation
analyses of resting-state data with music as functional connectivity analyses
(e.g., intrinsic connectivity, radial connectivity) oftentimes reported in the
brain imaging literature, all advanced network-based statistics and analyses
are performed on the adjacency matrix data. Thus, the choices of
parcellation scheme in terms of node selection and thresholding procedures
are critical for examining brain networks. A current lack of an agreed upon
approach to node selection has led to the analysis of functional brain
networks across an extensive range of scales. While individual neurons may
be considered as nodes, this has only been successful for simpler nervous systems, such as that of C. elegans (Sporns & Kötter, 2004; Towlson, Vertes,
Ahnert, Schafer, & Bullmore, 2013). It is still not currently possible to non-
invasively image or computationally analyze the brain’s estimated 100
billion neurons, each with ∼7,000 synapses (Stanley et al., 2013).
Presently, a comprehensive and unanimously agreed upon nodal definition
is still outstanding, making the selection of node options one of the more
central challenges in network analyses of neuroimaging data (Stanley et al.,
2013). Again, readers are encouraged to note that not all connectivity
approaches reported in the neuroimaging literature are mathematically or
statistically interchangeable in their approaches. While prevalent brain
connectivity literature employs correlation procedures, network-based
(graph theory) connectivity methods stem from the field of network science.
A full explanation of the technical and statistical steps used in brain
imaging is found in the wide set of fMRI literature, although several articles
highlight components of these techniques and network region-of-interest or
voxel-based network comparisons (Hagberg, Schult, & Swart, 2008;
Hayasaka & Laurienti, 2010). A complete review of statistics for fMRI data
can be found in Lindquist et al. (2018). Fig. 6 is a pictorial description of a
more typical data processing stream and network generation pipeline. The
pipeline depicted here is for fMRI data. Each of these steps must be
performed before any network-based statistics can be applied to individual
datasets and any network-based statistical comparisons can be made across
groups of people.
FIGURE 6. Processing stream for brain network analysis. Functional time series are correlated and
then binarized through thresholding procedures to create an adjacency matrix, representing the
strongest connections between every possible pair of nodes. The adjacency matrix is subsequently
mapped onto brain space following network-based statistical analyses. For network analysis,
functional magnetic resonance imaging (fMRI) data is processed in multiple steps through what is
typically referred to as a pipeline.
Reproduced from Wilkins (2015).
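
The pipeline in Fig. 6 can be sketched in a few lines of Python with numpy and networkx; in this minimal example, the time series are synthetic stand-ins for preprocessed fMRI data, the 90-node parcellation is hypothetical, and the 20 percent density threshold is one of the values discussed above:

import numpy as np
import networkx as nx

rng = np.random.default_rng(9)
n_nodes, n_timepoints = 90, 200                    # hypothetical parcellation
ts = rng.standard_normal((n_timepoints, n_nodes))  # stand-in time series

# 1. Cross-correlate every node's time series with every other node's.
corr = np.corrcoef(ts.T)   # n_nodes x n_nodes connectivity matrix
np.fill_diagonal(corr, 0)  # discard self-connections

# 2. Proportional thresholding: keep the strongest 20 percent of
#    connections, then binarize to obtain the adjacency matrix Aij.
density = 0.20
upper = corr[np.triu_indices(n_nodes, k=1)]
cutoff = np.quantile(upper, 1 - density)
adjacency = (corr >= cutoff).astype(int)

# 3. Apply graph theory statistics to the resulting network.
G = nx.from_numpy_array(adjacency)
print("density:", round(nx.density(G), 2))
print("average clustering:", round(nx.average_clustering(G), 2))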

In summary, in terms of some of the broader categories of network statistical properties and their role in the analyses of the overall brain
network (Rubinov & Sporns, 2010), there are particular metrics useful for
brain segregation, integration, and influence. Examples of segregation of
brain networks include clustering, motifs, and community structure or
modularity. Integration of brain networks includes distance, path length,
and efficiency measures, among others, while influence includes network
metrics of degree, participation, and betweenness (Bassett & Sporns, 2017;
Bullmore & Sporns, 2009). Thus, neuroimaging investigators are cautioned,
in regard to network-based brain imaging studies with music, to select the most robust node measures and network statistics possible for each imaging modality in order to avoid spurious results.

Implications for Music and Brain Research

Since the original network-based investigation into the effects of music on the brain, “Network Science: A New Method for Investigating the
Complexity of Musical Experiences in the Brain” (Wilkins et al., 2012),
that paper and those that followed have generated new insight into how and
why music affects network-based functional and structural brain
connectivity using EEG, DTI, DSI, and fMRI data (Fauvel et al., 2014;
Hodges & Wilkins, 2015; Karmonik et al., 2016; Koelsch, Skouras, &
Lohmann, 2018; Liu, Abu-Jamous, et al., 2017; Liu, Brattico et al., 2017;
Wilkins, 2015; Wilkins et al., 2014; Wu et al., 2012; Wu, Zhang, Ding, Liu,
& Zhou, 2013). The evidence resulting from a network-based approach to
the brain (Bassett & Bullmore, 2006; Bassett & Sporns, 2017; Bullmore &
Sporns, 2009) provides us with substantial confirmation that network
neuroscience not only advances our understanding of the brain, but
simultaneously holds promise for new understandings regarding the effects
of music and musical training on structural and functional brain networks in
both neurological health and disease as well as various compromised and
functional brain states (Bigand et al., 2015; Blum et al., 2017; Fauvel et al.,
2014; Gaser & Schlaug, 2003; Greicius, 2008; Gusnard, Akbudak,
Shulman, & Raichle, 2001a, 2001b; Karmonik et al., 2016; Koelsch et al.,
2018; Magee, Clark, Tamplin, & Bradt, 2017; Moussa et al., 2011; Raglio et
al., 2015; Sihvonen et al., 2017; Wilkins et al., 2018; Wu et al., 2013).
Network neuroscience presents opportunities in experimental designs
previously beyond the scope of classic neuroimaging analyses (i.e., “one
region-one behavior”). While conventional activation-style designs for
traditional experimental neuroimaging research are still valid, being able to
pursue questions about the brain’s entire system in a statistically principled
manner presents an opportunity to advance our understanding of music and
the brain. Newer evidence suggests that music may provide a means to
affect information flow in the brain network (Karmonik et al., 2016) as well
as changes in functional measures that accompany gray matter volume
changes from musical expertise (Fauvel et al., 2014). Results reveal the
brain functional network responds to preferred music listening by creating
communities within pivotal regions of the default mode network, a network widely accepted to be important to self-reflective and mind-wandering processes (Wilkins et al., 2014), and that a
favorite song can spontaneously separate the functional network into
distinct communities between the auditory cortex and the hippocampus, a
region recognized for memory encoding. Dynamic functional connectivity
analyses of data collected while people were listening to stimuli of
continuous music previously suggested to influence anxiety and anger show
significant measures of intrinsic connectivity within the salience network
(Lindquist et al., 2018). More recent evidence suggests that whole brain
responses to naturalistic music listening spontaneously alters the resting
brain to stimulate significant hubs within attentional control regions of the
anterior cingulate, highlighting how the network system may potentially
optimize or restore aspects of neurological function by resourcing
attentional circuit-breaker mechanisms (Wilkins et al., 2018). Compared to
the brain at rest, network analyses also indicate a significant reduction in
betweenness centrality within the amygdala during naturalistic music, a
region implicated in emotional responses linked to anxiety and avoidance
behaviors, suggesting a systems-level decrease in these affective responses
while listening to ambient background music (Wilkins et al., 2018). Recent
evidence also reveals that significant functional network characteristics of
different auditory regions are exhibited during music-evoked emotional
experiences of fear and joy (Koelsch et al., 2018).
The substantial questions and promising potential surrounding the
effects of brain responses to musical experiences have been, in many ways,
outside the scope of previously available tools and the more conventional
brain activation-based experimental approaches and analyses techniques. It
is easy to understand how music and brain imaging investigators at all
levels occasionally may have a sense of unease in dealing comprehensively
with a network-based approach to music and the brain. Under such
circumstances, it is tempting for neuroimaging scientists who are pursuing
questions about music to remain within the confines of conventional
activation analyses. A similar historical response can be found when
neuroimaging scientists were first considering the connectivity of the
Default Mode Network: “The suggested link between the processing taking
place at rest and its physiology is one that can have no direct relevance for
neuroimaging” (Morcom & Fletcher, 2007, p. 1075; for a complete update
on this commentary see also Raichle, 2001). This type of statement is
arguably true, if one’s experimental music and brain horizons are limited to
previous techniques and analyses in functional neuroimaging science. This
chapter suggests, however, that such a finite agenda will be depleted
eventually if not nourished by the broader implications and understanding
of brain function that these emerging network science techniques may provide.
In closing, the main objective of this chapter is to highlight the graph theory methods and network science evidence that persuade us towards complex systems thinking and the field of network neuroscience. While conventional approaches provide evidence of brain activation to music, a network-based approach takes a different perspective, which is that a full understanding of brain activity—including brain responses to musical experiences—critically depends on studying the brain as a complex system (Bassett & Sporns, 2017; Bullmore & Sporns, 2009; Wilkins, 2015) through the application of network (graph theory) techniques (Bassett & Bullmore, 2009). A network-based analysis provides the statistical rigor needed to study detailed patterns of neural connections throughout the entire system of the brain. This approach can be applied to data collected while people are listening to continuous music, as well as to comparisons of brain responses to different types of music and of the brains of people with musical training (Fig. 7; Wilkins et al., 2012). These complex connections, or brain networks, help reveal the architectural and functional scaffolding that ultimately illuminates the brain’s dynamic behaviors as robust statistical connectivity patterns. These include the brain’s intrinsic (i.e., resting-state) activity, such as that within the default mode network regions of the brain, which may be altered while listening to music (Broyd et al., 2009; Raichle, 2001; Wilkins et al., 2014).
FIGURE 7. Depiction of high degree hubs based on musical genre. Note the consistency of high degree hubs in the auditory regions while people (N = 21) listened to continuous classical music, in this case Beethoven’s Symphony No. 1, Mvt. 1, performed by the London Symphony Orchestra. A 21,000 × 21,000 voxel-based matrix was used for the network-based statistical analyses.
Reproduced from Wilkins et al. (2012, pp. 282–283). © 2012 by the International Society for the Arts, Sciences and Technology, published by MIT Press.
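The “high degree hubs” shown in Fig. 7 can be sketched along the same lines. This is a hedged illustration only: the hub criterion used here (degree more than one standard deviation above the mean) and the toy data are assumptions, and the original analysis operated on a far larger voxel-based matrix.

```python
import numpy as np
import networkx as nx

# Toy connectivity graph standing in for a music-listening scan.
rng = np.random.default_rng(1)
ts = rng.standard_normal((200, 64))
fc = np.abs(np.corrcoef(ts.T))
np.fill_diagonal(fc, 0)
G = nx.from_numpy_array((fc >= np.percentile(fc, 90)).astype(int))

# Degree of each node, i.e., its number of connections.
degree = dict(G.degree())
values = np.array(list(degree.values()))

# Flag "hubs" as nodes whose degree exceeds the mean by one standard deviation.
cutoff = values.mean() + values.std()
hubs = [node for node, d in degree.items() if d > cutoff]
print(f"{len(hubs)} candidate hub nodes")
```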

Although fundamental questions about music and the brain remain unresolved, network science offers key tools that hold promise for answering questions about complex systems in new ways. As the field continues to advance, network neuroscience and the study of brain connectivity, through network-based statistics, will open new experimental and theoretical avenues for understanding how structural brain connectivity gives rise to dynamic brain function. The discussion in this chapter, in particular, illustrates how network-based approaches may advance fundamental questions surrounding the promising effects of music in neurological research and rehabilitation (Hodges & Wilkins, 2015; Kotchoubey, Pavlov, & Kleber, 2015; Thaut et al., 2008). As a computationally robust field, network neuroscience provides a new mathematical framework for investigating complex systems that goes beyond conventional approaches to experimental design and neuroimaging research. Methods from graph theory provide a robust, well-established framework for assessing brain connectivity, both locally and globally, offering a rigorous means of non-invasively exploring activity across the entire human brain (Bullmore & Sporns, 2009; Rubinov & Sporns, 2010). Analyses can reveal patterns of both structural and functional brain connectivity. A network neuroscience approach thus provides unprecedented opportunities for examining the effects of musical experiences on the human brain. The methods and techniques presented here offer researchers an opportunity to pursue questions that may further advance the field of music and brain research, deepening our scientific understanding of the effects of music on the brain.

REFERENCES
Albert, R., Jeong, H., & Barabási, A.-L. (2000). Error and attack tolerance of complex networks.
Nature 406(6794), 378–382.
Alluri, V., Toiviainen, P., Jääskeläinen, I. P., Glerean, E., Sams, M., & Brattico, E. (2012). Large-scale
brain networks emerge from dynamic processing of musical timbre, key and rhythm. NeuroImage
59(4), 3677–3689.
Alluri, V., Toiviainen, P., Lund, T. E., Wallentin, M., Vuust, P., Nandi, A. K., … Brattico, E. (2013).
From Vivaldi to Beatles and back: Predicting lateralized brain responses to music. NeuroImage 83,
627–636.
Amaral, L. A., Scala, A., Barthelemy, M., & Stanley, H. E. (2000). Classes of small-world networks.
Proceedings of the National Academy of Sciences 97(21), 11149–11152.
Barabási, A.-L. (2002). Linked: The new science of networks. Cambridge, MA: Perseus Publishing.
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science 286(5439),
509–512.
Bassett, D. S., & Bullmore, E. (2006). Small-world brain networks. Neuroscientist 12(6), 512–523.
Bassett, D. S., & Bullmore, E. (2009). Human brain networks in health and disease. Current Opinion
in Neurology 22(4), 340–347.
Bassett, D. S., Khambhati, A. N., & Grafton, S. T. (2017). Emerging frontiers of neuroengineering: A
network science of brain connectivity. Annual Review of Biomedical Engineering 19, 327–352.
Bassett, D. S., & Sporns, O. (2017). Network neuroscience. Nature Neuroscience 20(3), 353–364.
Betzel, R. F., Erickson, M. A., Abell, M., O’Donnell, B. F., Hetrick, W. P., & Sporns, O. (2012).
Synchronization dynamics and evidence for a repertoire of network states in resting EEG.
Frontiers in Computational Neuroscience 6. Retrieved from
https://doi.org/10.3389/fncom.2012.00074
Bigand, E., Tillmann, B., Peretz, I., Zatorre, R. J., Lopez, L., & Majno, M. (Eds.). (2015). The
neurosciences and music V: Cognitive stimulation and rehabilitation. Annals of the New York
Academy of Sciences 1337.
Biswal, B. B., Kylen, J. V., & Hyde, J. S. (1997). Simultaneous assessment of flow and BOLD
signals in resting-state functional connectivity maps. NMR in Biomedicine 10(4–5), 165–170.
Biswal, B. B., Yetkin, F. Z., Haughton, V. M., & Hyde, J. S. (1995). Functional connectivity in the
motor cortex of resting human brain using echo-planar MRI. Magnetic Resonance in Medicine
34(4), 537–541.
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of
communities in large networks. Journal of Statistical Mechanics 2008. Retrieved from
https://doi.org/10.1088/1742-5468/2008/10/P10008
Blum, K., Simpatico, T., Febo, M., Rodriquez, C., Dushaj, K., Li, M., … Badgaiyan, R. D. (2017).
Hypothesizing music intervention enhances brain functional connectivity involving dopaminergic
recruitment: Common neuro-correlates to abusable drugs. Molecular Neurobiology 54(5), 3753–
3758.
Borgatti, S. (2005). Centrality and network flow. Social Networks 27(1), 55–71.
Broyd, S. J., Demanuele, C., Debener, S., Helps, S. K., James, C. J., & Sonuga-Barke, E. J. S. (2009).
Default-mode brain dysfunction in mental disorders: A systematic review. Neuroscience &
Biobehavioral Reviews 33(3), 279–296.
Buckner, R. L., Andrews-Hanna, J. R., & Schacter, D. L. (2008). The brain’s default mode network:
Anatomy, function, and relevance to disease. Annals of the New York Academy of Sciences 1124,
1–38.
Bullmore, E., & Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural
and functional systems. Nature Reviews Neuroscience 10(3), 186–198.
Bullmore, E., & Sporns, O. (2012). The economy of brain network organization. Nature Reviews
Neuroscience 13(5), 336–349.
Butts, C. T. (2009). Revisiting the foundations of network analysis. Science 325(5939), 414–416.
Cohen, A. L., Fair, D. A., Dosenbach, N. U. F., Miezin, F. M., Dierker, D., & Van Essen, D. C.
(2008). Defining functional areas in individual human brains using resting functional connectivity
MRI. NeuroImage 41, 45–57.
Craddock, R. C., James, G. A., Holtzheimer, P. E., Hu, X. P., & Mayberg, H. S. (2012). A whole
brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping
33(8), 1914–1928.
El Haj, M., Fasotti, L., & Allain, P. (2012). The involuntary nature of music-evoked autobiographical
memories in Alzheimer’s disease. Consciousness and Cognition 21(1), 238–246.
Euler, L. (1736). Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae
Scientiarum Imperialis Petropolitanae 8, 128–140. Reprinted and translated in N. L. Biggs, E. K.
Lloyd, & R. J. Wilson, Graph Theory 1736–1936 (pp. 3–8). Oxford: Oxford University Press,
1976.
Fauvel, B., Groussard, M., Chetelat, G., Fouquet, M., Landeau, B., Eustache, F., … Platel, H. (2014).
Morphological brain plasticity induced by musical expertise is accompanied by modulation of
functional connectivity at rest. NeuroImage 90, 179–188.
Fortunato, S. (2010). Community detection in graphs. Physics Reports 486(3–5), 75–174.
Fox, M. D., Snyder, A. Z., Vincent, J. L., Corbetta, M., Van Essen, D. C., & Raichle, M. E. (2005).
The human brain is intrinsically organized into dynamic, anticorrelated functional networks.
Proceedings of the National Academy of Sciences 102(27), 9673–9678.
Fox, M. D., Zhang, D., Snyder, A. Z., & Raichle, M. E. (2009). The global signal and observed
anticorrelated resting state brain networks. Journal of Neurophysiology 101(6), 3270–3283.
Friston, K. J., Frith, C. D., Turner, R., & Frackowiak, R. S. (1995). Characterizing evoked
hemodynamics with fMRI. NeuroImage 2(2), 157–165.
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians.
Journal of Neuroscience 23(27), 9240–9245.
Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks.
Proceedings of the National Academy of Sciences 99(12), 7821–7826.
Greicius, M. (2008). Resting-state functional connectivity in neuropsychiatric disorders. Current
Opinion in Neurology 21(4), 424–430.
Greicius, M., Krasnow, B., Reiss, A. L., & Menon, V. (2003). Functional connectivity in the resting
brain: A network analyses of the default mode hypothesis. Proceedings of the National Academy of
Sciences 100(1), 253–258.
Gusnard, D. A., Akbudak, E., Shulman, G. L., & Raichle, M. E. (2001a). Medial prefrontal cortex
and self-referential mental activity: Relation to a default mode of brain function. Proceedings of
the National Academy of Sciences 98(7), 4259–4264.
Gusnard, D. A., Akbudak, E., Shulman, G. L., & Raichle, M. E. (2001b). Role of medial prefrontal
cortex in a default mode of brain function. NeuroImage 13(6), S414.
Guye, M., Bettus, G., Bartolomei, F., & Cozzone, P. (2010). Graph theoretical analysis of structural
and functional connectivity MRI in normal and pathological brain networks. Magnetic Resonance
Materials in Physics, Biology and Medicine 23(5–6), 409–421.
Hagberg, A. A., Schult, D. A., & Swart, P. J. (2008). Exploring network structure, dynamics, and
function using NetworkX. In G. Varoquaux, T. Vaught, & J. Millman (Eds.), Proceedings of the
7th Python in Science Conference (SciPy2008) (pp. 11–15). Pasadena, CA.
Hayasaka, S., & Laurienti, P. J. (2010). Comparison of characteristics between region- and voxel-
based network analyses in resting-state fMRI data. NeuroImage 50(2), 499–508.
He, Y., & Evans, A. (2010). A review of structural and functional brain connectivity. Current
Opinion in Neurology 23(4), 341–350.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity:
Behavior, function, and structure. Neuron 76(3), 486–502.
Hodges, D. A., & Wilkins, R. W. (2015). How and why does music move us? Answers from
psychology and neuroscience. Music Education Journal 101(4), 41–47.
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009).
Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025.
Joyce, K. E., Laurienti, P. J., Burdette, J. H., & Hayasaka, S. (2010). A new measure of centrality for
brain networks. PLoS ONE 5(8), e12200.
Karmonik, C., Brandt, A., Anderson, J. R., Brooks, F., Lytle, J., Silverman, E., & Frazier, J. T. (2016).
Music listening modulates functional connectivity and information flow in the human brain. Brain
Connectivity 6(8), 632–641.
Koelsch, S. (2009). A neuroscientific perspective on music therapy. Annals of the New York Academy
of Sciences 1169, 374–384.
Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential
for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13(1),
e0190057.
Kotchoubey, B., Pavlov, Y. G., & Kleber, B. (2015). Music in research and rehabilitation of disorders
of consciousness: Psychological and neurophysiological foundations. Frontiers in Psychology 6,
1763. Retrieved from https://doi.org/10.3389/fpsyg.2015.01763
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.
Nature Reviews Neuroscience 11(8), 599–605.
Kruschwitz, J. D., List, D., Waller, L., Rubinov, M. & Walter, H. (2015). GraphVar: A user-friendly
toolbox for comprehensive graph analyses of functional brain connectivity. Journal of
Neuroscience Methods 245, 107–115.
Lindquist, K. A., Pendl, S., Brooks, J. A., Wilkins, R. W., Kraft, R. A., & Gao, W. (2018). Dynamic
functional connectivity of intrinsic networks during emotions. NeuroImage. Under review.
Liu, C., Abu-Jamous, B., Brattico, E., & Nandi, A. K. (2017). Towards tunable consensus clustering
for studying functional brain connectivity during affective processing. International Journal of
Neural Systems 27(2), 1650042. doi:10.1142/S0129065716500428
Liu, C., Brattico, E., Abu-Jamous, B., Pereira, C. S., Jacobsen, T., & Nandi, A. K. (2017). Effect of
explicit evaluation on neural connectivity related to listening to unfamiliar music. Frontiers in
Human Neuroscience 11, 611. Retrieved from https://doi.org/10.3389/fnhum.2017.00611
Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature 453(7197),
869–878.
Magee, W. L., Clark, I., Tamplin, J., & Bradt, J. (2017). Music interventions for acquired brain injury.
Cochrane Database of Systematic Reviews 1, CD006787. doi:10.1002/14651858.CD006787.pub3
Mitchell, M. (2009). Complexity: A guided tour. Oxford: Oxford University Press.
Morcom, A. M., & Fletcher, P. C. (2007). Does the brain have a baseline? Why we should be
resisting a rest. NeuroImage 37(4), 1073–1082.
Moussa, M. N., Vechlekar, C. D., Burdett, J. H., Steen, M. R., Hugenschmidt, C. E., & Laurienti, P. J.
(2011). Changes in cognitive state alter human functional brain networks. Frontiers in Human
Neuroscience 5, 1–15. Retrieved from https://doi.org/10.3389/fnhum.2011.00083
Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J. P. (2010). Community structure
in time-dependent, multiscale, and multiplex networks. Science 328(5980), 876–878.
Mumford, J. A., Horvath, S., Oldham, M. C., Langfelder, P., Geschwind, D. H., & Poldrack, R. A.
(2010). Detecting network modules in fMRI time series: A weighted network analysis approach.
NeuroImage 52(4), 1465–1476.
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review 45, 167–
256.
Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics
46(5), 323–351.
Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the
National Academy of Sciences 103(23), 8577–8582.
Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks.
Physical Review E 69(2 Pt. 2), 026113.
Power, J. D., Cohen, A. L, Nelson, S. M., Wig, G. S., Barnes, K. A., Church, J. A., … Petersen, S. E.
(2011). Functional network organization of the human brain. Neuron 72(4), 665–678.
Raglio, A., Attardo, L., Gontero, G., Rollino, S., Groppo, E., & Granieri, E. (2015). Effects of music
and music therapy on mood in neurological patients. World Journal of Psychiatry 5(1), 68–78.
Raichle, M. E. (2001). A default mode of brain function. Proceedings of the National Academy of
Sciences 98(2), 676–682.
Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: Uses and
interpretations. NeuroImage 52(3), 1059–1069.
Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects aesthetic
responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891.
Savoy, R. A. (2005). Experimental design in brain activation MRI: Cautionary tales. Brain Research
Bulletin 67, 361–365.
Schlaug, G. (2001). The brain of musicians. A model for functional and structural adaptation. Annals
of the New York Academy of Sciences 930, 281–299.
Schlaug, G. (2009a). Listening to music facilitates brain recovery processes. Annals of the New York
Academy of Sciences 1169, 372–373.
Schlaug, G. (2009b). Music, musicians, and brain plasticity. In S. Hallam, I. Cross, & M. Thaut
(Eds.), The Oxford handbook of music psychology (pp. 197–207). Oxford: Oxford University
Press.
Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white matter tracts of
chronic aphasic patients undergoing intense intonation-based speech therapy. Annals of the New
York Academy of Sciences 1169, 385–394.
Shirer, W. R., Ryali, S., Rykhlevskaia, E., Menon, V., & Greicius, M. D. (2012). Decoding subject-
driven cognitive states with whole-brain connectivity patterns. Cerebral Cortex 22(1), 158–165.
Sihvonen, A. J., Sarkamo, T., Leo, V., Tervaniemi, M., Altenmuller, E., & Soinila, S. (2017). Music-
based interventions in neurological rehabilitation. The Lancet Neurology 16(8), 648–660.
Sporns, O., Chialvo, D., Kaiser, M., & Hilgetag, C. (2004). Organization, development and function
of complex brain networks. Trends in Cognitive Sciences 8(9), 418–425.
Sporns, O., & Kotter, R. (2004). Motifs in brain networks. PLoS Biology 2(11), e369.
Sporns, O., Tononi, G., & Kötter, R. (2005). The human connectome: A structural description of the
human brain. PLoS Computational Biology 1(4), e42.
Stam, C. J. (2014). Modern network science of neurological disorders. Nature Reviews Neuroscience
15, 683–695.
Stam, C. J., & Reijneveld, J. P. (2007). Graph theoretical analysis of complex networks in the brain.
Nonlinear Biomedical Physics 1, 3. doi:10.1186/1753-4631-1-3
Stanley, M. L., Moussa, M. N., Paolini, B. M., Lyday, R., Burdette, J. H., & Laurienti, P. J. (2013).
Defining nodes in complex networks. Frontiers in Computational Neuroscience 7, 169. Retrieved
from https://doi.org/10.3389/fncom.2013.00169
Steen, M., Hayasaka, S., Joyce, K., & Laurienti, P. (2011). Assessing the consistency of community
structure in complex networks. Physical Review E 84(1–2), 016111.
Strogatz, S. H. (2001). Exploring complex networks. Nature 410(6825), 268–276.
Telesford, Q. K., Simpson, S. L., Burdette, J. H., Hayasaka, S., & Laurienti, P. J. (2011). The brain as
a complex system: Using network science as a tool for understanding the brain. Brain Connectivity
1(4), 295–308.
Thaut, M. H., Demartin, M., & Sanes, J. N. (2008). Brain networks for integrative rhythm formation.
PLoS ONE 3, e2312.
Thaut, M. H., Gardiner, J. C., Holmberg, D., Horwitz, J., Kent, L., Andrews, G., … McIntosh, G. R.
(2009). Neurologic music therapy improves executive function and emotional adjustment in
traumatic brain injury rehabilitation. Annals of the New York Academy of Sciences 1169, 406–416.
Towlson, E., Vertes, P. E., Ahnert, S., Schafer, W. R., & Bullmore, E. T. (2013). The rich club of the
C. elegans neuronal connectome. Journal of Neuroscience 33(15), 6380–6387.
Tuch, D. S., Reese, T. G., Wiegell, M. R., & Wedeen, V. J. (2003). Diffusion MRI of complex neural
architecture. Neuron 40(5), 885–895.
Van den Heuvel, M. P., de Lange, S. C., Zalesky, A., Seguin, C., Yeo, B. T. T., & Schmidt, R. (2017).
Proportional thresholding in resting-state fMRI functional connectivity networks and
consequences for patient-control connectome studies: Issues and recommendations. NeuroImage
152, 437–449.
Van Wijk, B. C., Stam, C. J., & Daffertshofer, A. (2010). Comparing brain networks of different size
and connectivity density using graph theory. PloS ONE 5, e13701.
Wang, J., Zuo, X., & He, Y. (2010). Graph-based network analysis of resting-state functional MRI.
Frontiers in Systems Neuroscience 4, 16. Retrieved from https://doi.org/10.3389/fnsys.2010.00016
Wang, P., González, M. C., Hidalgo, C. A., & Barabási, A.-L. (2009). Understanding the spreading patterns
of mobile phone viruses. Science 324(5930), 1071–1076.
Watts, D. J. (2003). Six degrees: The science of a connected age. New York: W. W. Norton.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature
393(6684), 440–442.
Wedeen, V. J., Hagmann, P., Tseng, W. Y., Reese, T. G., & Weisskoff, R. M. (2005). Mapping complex
tissue architecture with diffusion spectrum magnetic resonance imaging. Magnetic Resonance in
Medicine 54(6), 1377–1386.
West, B. J. (2011). Overview 2010 of ARL program on network science for human decision making.
Frontiers in Physiology 2, 76. Retrieved from https://doi.org/10.3389/fphys.2011.00076
Whitfield-Gabrieli, S., & Nieto-Castanon, A. (2012). Conn: A functional connectivity toolbox for
correlated and anticorrelated brain networks. Brain Connectivity 2(3).
doi:10.1089/brain.2012.0073
Wilkins, R. W. (2015). Network science and the effects of music on the human brain (Doctoral
dissertation). University of North Carolina at Greensboro.
Wilkins, R. W., Giridharan, S., Johnston, M., Brooks, J. A., Lindquist, K. A., & Kraft, R. A. (2018).
Changes in resting-state functional brain networks during naturalistic music listening. In
preparation.
Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2012). Network science:
A new method for investigating the complexity of musical experiences in the brain. Leonardo
45(3), 282–283.
Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2014). Network science
and the effects of music preference on functional brain connectivity: From Beethoven to Eminem.
Scientific Reports 4, 6130. doi: 10.1038/srep06130
Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human
brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422.
Wu, J., Zhang, J., Ding, X., Liu, D., & Zhou, C. (2013). The effects of music on brain functional
networks: A network analyses. Neuroscience 250, 49–59.
Wu, J., Zhang, J., Liu, C., Liu, D., Ding, X., & Zhou, C. (2012). Graph theoretical analysis of EEG
functional connectivity during music perception. Brain Research 1483, 71–81.
Zatorre, R., Evans, A., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch
discrimination in speech processing. Science 256(5058), 846–849.
Zatorre, R., & Samson, S. (1991). Role of the right temporal neocortex in retention of pitch in
auditory short-term memory. Brain 114(6), 2403–2417.
Zuo, X., Ehmke, R., Mennes, M., Imperati, D., Castellanos, F. X., Sporns, O., & Milham, M. P. (2011).
Network centrality in the human functional connectome. Cerebral Cortex 22(8), 1862–1875.
CHAPTER 7

ACOUSTIC STRUCTURE AND MUSICAL FUNCTION: MUSICAL NOTES INFORMING AUDITORY RESEARCH

MICHAEL SCHUTZ

INTRODUCTION
Beethoven’s Fifth Symphony has intrigued audiences for generations. In
opening with a succinct statement of its four-note motive, Beethoven deftly
lays the groundwork for hundreds of measures of musical development,
manipulation, and exploration. Analyses of this symphony are legion
(Schenker, 1971; Tovey, 1971), informing our understanding of the piece’s
structure and historical context, not to mention the human mind’s
fascination with repetition. In his intriguing book The first four notes,
Matthew Guerrieri deconstructs the implications of this brief motive (2012),
illustrating that great insight can be derived from an ostensibly limited
grouping of just four notes. Extending that approach, this chapter takes an
even more targeted focus, exploring how groupings related to the harmonic
structure of individual notes lend insight into the acoustical and perceptual
basis of music listening.
Extensive overviews of auditory perception and basic acoustical
principles are readily available (Moore, 1997; Rossing, Moore, & Wheeler,
2013; Warren, 2013), discussing the structure of many sounds, including
those important to music. Additionally, several texts now focus specifically
on music perception and cognition (Dowling & Harwood, 1986; Tan,
Pfordresher, & Harré, 2007; Thompson, 2009). Therefore this chapter
focuses on a previously under-discussed topic within the subject of musical
sounds—the importance of temporal changes in their perception. This
aspect is easy to overlook, as the perceptual fusion of overtones makes it
difficult to consciously recognize their individual contributions. Yet
changes in the amplitudes of specific overtones excited by musical
instruments as well as temporal changes in the relative strengths of those
overtones play a crucial role in musical timbre. Western music has
traditionally focused on properties such as pitch and rhythm, yet
contemporary composers are increasingly interested in timbre, to the point
where it can on occasion even serve as a composition’s primary focus
(Boulez, 1987; Hamberger, 2012). And although much previous scientific
research on the neuroscience of music as well as music perception has
focused on temporally invariant tones, there has been increasing recognition
in the past decade that broadening our toolbox of stimuli is important to
elucidating music’s psychological and neurological basis. Consequently,
understanding the role of temporal changes in musical notes holds
important implications for psychologists, musicians, and neuroscientists
alike.
Traditional musical scores give precise information regarding the
intensity of each instrument throughout a composition in the form of
dynamic markings. But for obvious practical reasons scores never specify
the rapid intensity changes found in each overtone of an individual note. At most, composers hint at their preferences through descriptive terms such as “sharper/duller” or vague instructions (“as if off in the distance”); otherwise, performers rely on stylistic considerations to make such decisions—e.g., by following period-specific performance practice.
the harmonic structure of a note as well as changes in its harmonic structure
over time are natural consequences of an instrument’s physical structure.
For example, the rapid decay of energy in harmonics shortly after the onset
of a vibraphone note contrasts with the long sustain of its fundamental—
contributing to its characteristic sound.
Musical notation clearly reflects changes in the intensity of collections of notes (e.g., crescendos, sfz) but never the changes within notes themselves. While understandable, this decision mirrors the lack of
attention to changes in overtone intensity in many psychophysical
descriptions of sound—as well as perceptual experiments with auditory
stimuli. This is unfortunate, as these intensity changes play an important
role in efforts to synthesize “realistic” sounding musical notes—an issue of
great relevance to composers creating electronic music. These also play an
important role in discussions of tone quality so crucial to music educators
training young ears, not to mention sound editors/engineers exploring
which dynamic changes are important to capture and preserve when
recording/mixing/compressing high quality audio. This chapter summarizes
research on both the perceptual grouping of overtones and their rapid
temporal changes, placing it in a broader context by highlighting
connections to another important topic—how individual notes are
perceptually grouped into chords. Finally, it concludes with a discussion of
mounting evidence that auditory stimuli devoid of complex temporal
changes may lead to experimental outcomes that fail to generalize to real-world listening—and on occasion can suggest errant theoretical frameworks and
basic principles.

GROUPING NOTES: DECONSTRUCTING COMPLEX HARMONIES

The vertical alignment of notes gives rise to musical harmonies ranging
from lush to biting—from soothing to scary. Consequently, composers
carefully design complex groupings whose musical effects hinge on small
changes in their arrangement. For example, major and minor chords differ
significantly in their neural processing (Pallesen et al., 2005; Suzuki et al.,
2008) and evoke distinct affective responses (Eerola, Friberg, & Bresin,
2013; Heinlein, 1928; Hevner, 1935). Yet from the standpoint of acoustic
structure this change is small—a half-step in the third (i.e., “middle note”)
of a musical chord (Aldwell, Schachter, & Cadwallader, 2002). In absolute
terms, this represents a relatively small shift in the raw acoustic information
—moving one of three notes the smallest permissible musical distance.
From a raw acoustic perspective, this is particularly unremarkable in a
richly orchestrated passage, yet the shift from major to minor can lead to
significant changes in a passage’s character. Individuals with cochlear implants—which offer relatively coarse pitch discrimination—are often unable to hear these distinctions and consequently find music listening problematic (Wang et al., 2012). Fortunately, most listeners hear these changes quite readily, as
evidenced by a literature on the detection of “out of key” notes shifted by a
mere semi-tone (Koelsch & Friederici, 2003; Pallesen et al., 2005).
Although musical acculturation occurring at a relatively young age
(Corrigall & Trainor, 2010, 2014) aids this process, even musically
untrained individuals are capable of detecting small changes (Schellenberg,
2002).
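The acoustic smallness of this shift is easy to quantify. As a hedged illustration, assuming equal temperament (in which each semitone multiplies frequency by 2^(1/12)), the following compares the thirds that distinguish C major from C minor:

```python
# Equal-tempered frequencies for the thirds of C major vs. C minor triads.
C4 = 261.63                        # Hz, middle C
semitone = 2 ** (1 / 12)           # frequency ratio of one half-step

major_third = C4 * semitone ** 4   # E4,  ~329.63 Hz
minor_third = C4 * semitone ** 3   # Eb4, ~311.13 Hz

print(f"E4 = {major_third:.2f} Hz, Eb4 = {minor_third:.2f} Hz, "
      f"difference = {major_third - minor_third:.2f} Hz")
```

The roughly 18 Hz difference between these thirds is all that separates the two chord qualities acoustically.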
Notes of different pitch are often grouped together into a single musical
object—a chord. Typically consisting of three or more individual notes,
chords function as a “unit” and together lay out the harmonic framework or
backbone of a musical passage. The specific selection of simultaneous notes
(i.e., harmonically building chords) has profound effects on the listening
experience of audiences, forming one of the key building blocks of strong
physiological responses to music (Lowis, 2002; Sloboda, 1991). The
masterful selection of notes, rhythms, and instruments requires both
intuition and craft, and basic principles are articulated in numerous treatises
on composition (Clough & Conley, 1984), and guidelines to orchestration
(Alexander & Broughton, 2008; Rimsky-Korsakov, 1964). Yet another
aspect of musical sound’s vertical structure plays a crucial role in the
listening experience, even if it is under less direct control by composers—
the “vertical structure” (i.e., harmonic content) of individual notes—as well
as the time-varying changes to these components. This topic forms the primary focus of this chapter, for much as the study of individual notes can lend insight into our perception of musical passages, studying the rich, time-varying structure of concurrent harmonics can lend insight into our perception of individual notes.

GROUPING HARMONICS: DECONSTRUCTING INDIVIDUAL NOTES
The complexities in composers’ grouping of individual notes into chords
are well known (Aldwell et al., 2002), yet the musical importance of
individual harmonics is less transparent, even though single notes produced
by musical instruments contain incredible sophistication and nuance
(Hjortkjaer, 2013). Musical instruments produce sounds rich in overtones,
which for pitched instruments generally consist of harmonics at integer
multiples of the fundamental (Dowling & Harwood, 1986; Tan et al., 2010),
as well as other non-harmonic energy (particularly during a sound’s onset).
The lawful structure of these overtones serves as an important binding cue,
triggering a decision by the perceptual system to blend overtones such that
“the listener is not usually directly aware of the separate harmonics”
(Dowling & Harwood, 1986, p. 24). Although some musicians develop the
ability to “hear out” individual components of their instruments (Jourdain,
1997, p. 35), in general this collection of frequencies fuses into a single
musical unit. Consequently for practical matters the complex structure of
individual notes is of less musical interest than the composer’s complex
selection of structural cues (Broze & Huron, 2013; Huron & Ollen, 2003;
Patel & Daniele, 2003; Poon & Schutz, 2015), or the performer’s
interpretation of those cues (Chapin, Jantzen, Kelso, Steinberg, & Large,
2010).
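The “lawful structure” of these overtones can be stated compactly: for an idealized pitched tone, the nth harmonic sits at n times the fundamental. A minimal sketch (the 220 Hz fundamental is assumed purely for illustration):

```python
# Harmonic series of an idealized pitched tone: integer multiples of f0.
f0 = 220.0                                   # Hz (A3), illustrative fundamental
harmonics = [n * f0 for n in range(1, 9)]
print(harmonics)                             # [220.0, 440.0, 660.0, ..., 1760.0]
```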
Although the musical importance of small note-to-note variations in
amplitude with respect to phrasing and expressivity (Bhatara, Tirovolas,
Duan, Levy, & Levitin, 2011; Repp, 1995) is widely recognized, the small
moment-to-moment amplitude variations in individual overtones have
received less research attention. Musical sounds contain overtones shifting
in their relative strength over time (Jourdain, 1997, p. 35), and some
textbooks explicitly note the importance of these dynamic changes
(Thompson, 2009, p. 59). Yet the role of spectra is often presented as time-invariant, described through summaries of spectral content that disregard how a note’s spectrum changes over time.
Musical instruments produce notes rich in temporal variation—not only
in their overall amplitudes, but even with respect to the envelopes of
individual harmonics. For example, Fig. 1 visualizes a musical note
performed on the trumpet (left panel) and clarinet (right panel), based on
instrument sounds provided by the University of Iowa Electronic Music
studios (Fritts, 1997). The intensity (z axis) of energy extracted from each
harmonic (x axis) is graphed over time (y axis). These 3D visualizations
illustrate the temporal complexity of harmonics bound into the percept of a
single note. In fact, even divorced from melodic context, expressive timing, performers’ phrasing intentions, and numerous other considerations, the analysis of isolated notes affords invaluable insight. Small temporal variations in each overtone play
a key role in the degree to which synthesized notes sound “real” rather than
“artificial.” Highly trained musicians can routinely produce different
variations on a single note (“brighter” or “more legato,” “shimmery,” etc.),
which involve intentionally varying both the balance and temporal changes
in a note’s overtones.

FIGURE 1. Visualization of single notes produced by a trumpet (left) and clarinet (right),
illustrating their complex temporal structure. Although the trumpet spectrum changes more
dynamically than the clarinet, each partial is in constant flux.
The goal of these 3D figures is to illustrate the dynamic nature of the harmonic structure of
musical tones. Consequently they are not complete acoustical analyses (which are readily
available elsewhere), but serve to highlight information lost in temporally invariant power
spectra.
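The structure visualized in Fig. 1 can be mimicked through additive synthesis, giving each harmonic its own time-varying amplitude envelope. The envelope shapes below are invented for illustration (loosely echoing the vibraphone example above, in which upper harmonics decay faster than the fundamental); they are not measurements of any instrument.

```python
import numpy as np

sr = 44100                                    # sample rate in Hz
t = np.linspace(0, 1.0, sr, endpoint=False)   # one second of audio
f0 = 261.63                                   # illustrative fundamental (middle C)

note = np.zeros_like(t)
for n in range(1, 7):
    # Invented per-harmonic envelope: higher harmonics decay faster.
    env = np.exp(-t * (1 + 2 * n)) / n
    note += env * np.sin(2 * np.pi * n * f0 * t)

note /= np.max(np.abs(note))                  # normalize to the range [-1, 1]
```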

As tones synthesized without adequate temporal changes often sound uninteresting or “fake,” composers of electronic music, producers, instrument manufacturers, and other musical professionals pay top dollar for high quality audio samples of instruments needed for their artistic purposes. Some creators of electronic music prefer samples of real musical sounds over efforts to synthesize these sounds (Risset & Wessel, 1999), in part due to the difficulty of accurately realizing the temporal changes in individual musical notes, as well as our sensitivity to small changes (or the lack thereof) in electronically generated tones. From a
psychological perspective, what is so crucial about the structure of
individual notes? What are the acoustic differences between life-like and
dull renditions of individual instruments?
The importance of dynamic changes in an individual note’s harmonics
can be most usefully understood within the context of musical timbre—a
complex, multidimensional property that has proven incredibly challenging
to even define, let alone explain. Unfortunately for timbre enthusiasts, this
property is often treated as a “miscellaneous category” (Dowling &
Harwood, 1986, p. 63) accounting for the perceptual experience of
“everything about a sound which is neither loudness nor pitch” (ANSI,
1994; Erickson, 1975). In other words, timbre is often defined less by what
it is than what it is not (Risset & Wessel, 1999). This oppositional approach
is sensible given the multitude of acoustic factors known to play a role in its
perception (Caclin, McAdams, Smith, & Winsberg, 2005; McAdams,
Winsberg, Donnadieu, de Soete, & Krimphoff, 1995).

APPROACHES TO STUDYING MUSICAL TIMBRE

One particularly useful technique for studying musical timbre is multidimensional scaling (MDS), which allows for exploration without prior assumptions about which acoustic properties are most important.
studies using this approach will present a variety of individual notes
matched for pitch and intensity, asking participants to rate their similarity
(or more often, dissimilarity). Analysis of dissimilarity ratings affords
construction of a multidimensional space allowing for visualization of the
“perceptual distance” between different pairs of notes. Early studies found
spectral properties play a crucial role (Miller & Carterette, 1975), and
subsequent work has refined our understanding of their role on both the
neural (Tervaniemi, Schröger, Saher, & Näätänen, 2000) and perceptual
(Grey & Gordon, 1978; Trehub, Endman, & Thorpe, 1990) levels.
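As a hedged sketch of the MDS logic (not the analysis of any particular study cited here), a precomputed matrix of averaged pairwise dissimilarity ratings can be projected into a two-dimensional “timbre space”; the ratings below are invented:

```python
import numpy as np
from sklearn.manifold import MDS

# Invented symmetric dissimilarity ratings for five hypothetical instruments.
dissim = np.array([
    [0.0, 2.1, 5.3, 4.8, 6.0],
    [2.1, 0.0, 4.9, 4.2, 5.5],
    [5.3, 4.9, 0.0, 1.8, 3.1],
    [4.8, 4.2, 1.8, 0.0, 2.6],
    [6.0, 5.5, 3.1, 2.6, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)   # 2-D coordinates; distances between rows
                                     # approximate the rated dissimilarities
```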
Consequently, the role of spectra in timbre is well explained in numerous
textbooks on auditory perception and music cognition (Dowling &
Harwood, 1986; Tan et al., 2010; Thompson, 2009, p. 48), typically through
visualizations of power spectra, similar to Fig. 2.

FIGURE 2. Power spectra of trumpet and clarinet. These plots accurately convey the trumpet’s
energy at many harmonics in contrast to the clarinet’s energy primarily at odd numbered harmonics.
However, power spectra fail to convey any information about the temporal changes in harmonic
amplitude so crucial to a sound’s timbre.

Power spectra provide a useful, time-invariant summary of the relative harmonic strength. By collapsing along the temporal dimension shown in
Fig. 1, Fig. 2 summarizes one of the characteristic distinctions between
brass and woodwind instruments—that trumpets produce energy at all
harmonics, whereas clarinets primarily emphasize alternate harmonics. Yet
power spectra fail to capture the dynamic changes prominent in natural
musical instruments, and the perceptual difference between synthesizing the
information represented in Fig. 1 and Fig. 2 is striking. For interactive
demonstrations of these differences, pedagogical tools useful for both
teaching and research purposes are freely available from
www.maplelab.net/pedagogy.
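For readers wishing to reproduce a plot like Fig. 2, a power spectrum is simply the squared magnitude of a Fourier transform, with all timing information discarded. A minimal sketch using an assumed toy decaying tone:

```python
import numpy as np

sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
note = np.exp(-3 * t) * np.sin(2 * np.pi * 220 * t)   # toy decaying tone

spectrum = np.abs(np.fft.rfft(note)) ** 2     # power at each frequency bin
freqs = np.fft.rfftfreq(len(note), d=1 / sr)  # bin frequencies in Hz
peak = freqs[np.argmax(spectrum)]             # ~220 Hz; the collapse over time
                                              # erases the decaying envelope
```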
The shortcomings of power spectra are clear in cases where temporal
cues play key roles not only in the realism of a musical sound, but in the
distinction between different musical timbres. For example, the top row of
Fig. 3 shows power spectra for notes produced on the trombone vs. cello.1
This visual similarity in power spectra is somewhat surprising, given the
markedly different methods of sound production in these instruments—a
brass tube driven by lips on a mouthpiece vs. a bow drawn across a string.
Additionally, cellos and trombones function differently in most musical
compositions, suggesting their perception is distinct. Although this
distinction is not apparent from their power spectra, it is clear in the middle
row of Fig. 3 showing changes in harmonic strength over time. The bottom
row provides a visualization of tones synthesized using the power spectra in
the first row—illustrating what is retained and what is lost in time-invariant
visualizations of musical sounds.
FIGURE 3. Visualizations of trombone (left) and cello (right). Panels in top row illustrate
similarity in these instruments’ power spectra, despite the clear acoustical differences shown in the
middle panels. Bottom panels visualize tones synthesized using static power spectra (i.e., ignoring
temporal changes in the strength of individual harmonics).

Certain aspects of temporal dynamics are recognized as playing an
important role in musical timbre. For example, both the rise time (initial
onset) of notes (Grey, 1977; Krimphoff, McAdams, & Winsberg, 1994) as
well as gross temporal structure—amplitude envelope—have been shown to
be important (Iverson & Krumhansl, 1993). As an extreme example,
reversing the temporal structure of a note qualitatively changes its timbre,
such that a piano note played “backwards” sounds more like a reed-organ
than a piano (Houtsma, Rossing, & Wagenaars, 1987). It is important to
note that in this case the power spectra for piano notes played either
forwards or backwards are identical—yet the experience of listening to
these renditions differs markedly. Even beyond dramatic changes such as
backwards listening, temporal changes are known to play an important role
in sounds from natural instruments. However, interest in the connection
between temporal dynamics and timbre has largely focused on a sound’s
onset (Gordon, 1987; Strong & Clark, 1967) rather than changes throughout
its sustain period. For example, past studies have shown that insensitivity to
a tone’s onset correlates with reading deficits (Goswami, 2011). Tone onset
is also crucial to distinguishing between musical timbres (Skarratt, Cole, &
Gellatly, 2009), and their removal leads to confusion of instruments
otherwise easily differentiable (Saldanha & Corso, 1964).2
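The backwards-piano observation can be verified directly: reversing a signal in time leaves its power spectrum unchanged even though the percept differs markedly. A minimal check with an assumed toy tone:

```python
import numpy as np

sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
forward = np.exp(-4 * t) * np.sin(2 * np.pi * 440 * t)   # toy decaying tone
backward = forward[::-1]                                  # "played backwards"

# Identical power spectra despite radically different temporal structure.
print(np.allclose(np.abs(np.fft.rfft(forward)),
                  np.abs(np.fft.rfft(backward))))         # True
```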

THE USE OF TEMPORAL VARIATION IN SOUNDS FOR MUSIC PERCEPTION RESEARCH

Although temporal changes in the strengths of individual harmonics clearly play an important role in musical sounds, these changes are rightly recognized by experimental psychologists as potentially confounding (or at least introducing noise into) perceptual experiments. Not only do different instruments (along with variations in mouthpieces, mallets, bows, etc.) make consistency challenging when using natural musical tones, but the complexity of changes in recordings of nominally steady-state notes also runs contrary to the level of control desirable for scientific experimentation. If an
experimenter’s goal is to explore the role of pitch difference in auditory
stream segregation, short pure tones with minimal amplitude variation offer
clear benefits for drawing strong, replicable conclusions elucidating some
aspects of our auditory perceptual organization. Consequently, the high
degree of emphasis placed upon tightly constrained, easily reproducible
stimuli incentivizes the use of simplified tones lacking temporal variation
beyond simplistic onsets and offsets. This raises important questions about
what kinds of stimuli are used to assess auditory perception. Although
simplified sounds aid researchers in avoiding problematic confounds, their
over-use could lead to challenges with generalizing their findings to natural
sounds with the kinds of temporal variations shown in Fig. 1.
In order to explore the kinds of sounds used in research on music perception, my team surveyed 118 empirical papers published in the journal Music Perception, covering experiments dating back to the journal’s inception in 1983, based on a previous comprehensive bibliometric survey (Tirovolas &
Levitin, 2011). Primarily interested in determining the amount of amplitude
variation found in the temporal structures of auditory stimuli, we classified
every stimulus used in each of the 212 surveyed experiments as either “flat”
(i.e., lacking temporal variation), “percussive” (decaying notes such as
those produced by the piano, cowbell, or marimba), or “other”—sounds
such as those produced by sustained instruments like the French horn or
human voice. Fig. 4 illustrates examples of each stimulus class.
FIGURE 4. Wave forms of different sounds found in the survey of stimuli used in Music
Perception (Schutz & Vaisberg, 2014).
Reproduced from Music Perception: An Interdisciplinary Journal 31(3), Michael Schutz and
Jonathan M. Vaisberg, Surveying the temporal structure of sounds used in music perception,
pp. 288–296, doi:10.1525/mp.2014.31.3.288, Copyright © 2014, The Regents of the
University of California.
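The “flat” and “percussive” categories from this survey are easy to instantiate: the same carrier shaped either by an unvarying envelope or by an exponentially decaying one. The parameter values below are illustrative, not those of any surveyed study:

```python
import numpy as np

sr = 44100
t = np.linspace(0, 0.5, int(sr * 0.5), endpoint=False)
carrier = np.sin(2 * np.pi * 440 * t)

flat_tone = carrier.copy()                  # constant amplitude, abrupt offset
percussive_tone = np.exp(-8 * t) * carrier  # rapid onset, gradual decay
```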

The most surprising outcome from this survey was that although most
articles included a wealth of technical information on spectral structure,
duration, and the exact model of headphones or speakers used to present the
stimuli, about 35 percent failed to define the stimuli’s temporal structure.
This finding is not unique to Music Perception—my team found similar
problems with under-specification in the journal Attention, Perception &
Psychophysics (Gillard & Schutz, 2013). More importantly, beyond under-specification, both surveys revealed a strong bias against sounds with the kinds of temporal variations common to musical instruments. Although flat tones lend themselves well to tight experimental control and consistent replication amongst different labs, they fail to capture the richness of the sounds forming the backbone of the musical listening experience. Yet they remain prominent across a wide range of tasks purportedly designed to illuminate generalizable principles of auditory perception.
Prominent researchers have noted that the world is “[not] replete with
examples of naturally occurring auditory pedestals [i.e., flat amplitude
envelopes]” (Phillips, Hall, & Boehnke, 2002, p. 199). Yet flat tones, clearly far removed from the complexity of natural musical sounds (as shown in Fig. 5), appear to be the norm in research on auditory perception. Note that each of the three musical instruments visualized
not only exhibits constant temporal changes, but temporal changes in the
amplitudes of each individual harmonic. This dynamic fluctuation contrasts
starkly with the flat tones favored in auditory perception research shown in
the bottom right panel. This over-fixation on sounds lacking meaningful
amplitude variation is not confined to behavioral work; a large-scale review
of auditory neuroscience research concluded with a note of caution that
important properties and functions of the auditory system will only be fully
understood when researchers begin employing envelopes that “involve
modulation in ways that are closer to real-world tasks faced by the auditory
system” (Joris, Schreiner, & Rees, 2004, p. 570). The acoustic distance
between the temporally dynamic musical sounds and temporally
constrained flat tones common in auditory perception and neuroscience
research raises important questions about the degree to which theories and
models derived from these experiments generalize to musical listening. The
complexities of balancing competing needs for experimental control and
ecological relevance are significant, and will serve as the focus of the
following section.
FIGURE 5. Single notes produced by an oboe (upper left), French horn (upper right), and viola
(lower left) illustrate their temporal complexity. Although their specific mix of harmonics varies,
these instruments all exhibit constant changes in the strength of each harmonic over the tone’s
duration. This temporal complexity contrasts strongly with the temporal simplicity of the flat tone
depicted in the lower right panel, which lacks temporal variation beyond abrupt onsets/offsets and exhibits no change in the relative strength of its harmonics.

ON THE MERITS AND CHALLENGES OF SIMPLIFIED STIMULI

This focus on tightly constrained stimuli is not necessarily problematic;
control of extraneous variables is essential to researchers’ ability to draw
strong conclusions from individual experiments. Consistency in the
synthesis of stimuli amongst different labs holds many advantages with
respect to replication, an issue of increasing importance to the field as a
whole. And in some circumstances the real-world associations inherent in
temporally complex sounds can pose obstacles to answering key questions.
For example, researchers exploring acoustic attributes of unpleasant sounds
illustrate that frequency range (Kumar, Forster, Bailey, & Griffiths, 2008),
spectral “roughness” (Terhardt, 1974), and the relative mix of harmonics-to-
noise (Ferrand, 2002) are key factors—issues important for engineers
designing human–computer auditory interfaces. Yet a direct ranking of sounds shows that the sound of vomiting is regarded as one of the most unpleasant (Cox, 2008), an outcome related less to its specific acoustic properties than to its obvious real-world associations (McDermott, 2012). In some cases these
real-world associations may be regarded as confounds obfuscating the
general principles at hand.
Therefore, in some inquiries aimed at understanding the relationship
between acoustic structure and perceptual response, it is not only reasonable
but actually necessary to use sounds devoid of referents. This issue of
disentangling the effects attributable to associations vs. acoustic features is
of particular importance in the perception of music, given the rich and
complex relationship between music, memory, and emotion. Familiar
compositions can evoke memories as a result of past associations—for
example from a history of personal listening/performance (Schulkind,
Hennis, & Rubin, 1999) or use in film sound tracks (e.g., those used by
Vuoskoski and Eerola, 2012). Indeed songs from popular television shows
are so familiar they have even been used to assess the pervasiveness of
absolute pitch amongst the general population (Schellenberg & Trehub,
2003). Consequently, synthesized tones lacking real-world associations
serve a useful purpose in advancing our understanding of auditory
perception.
However, although artificial sounds devoid of real-world associations
that afford precise control/replication offer advantages in certain
circumstances, their simplicity can pose barriers to fully understanding
music perception. In fact, auditory psychophysics’ focus on “control”
(Neuhoff, 2004) and the study of isolated parameters absent their natural
context (Gaver, 1993) is an issue of long-standing concern in some corners
of the auditory perception community. This is of particular importance to
understanding music, as composers, performers, conductors, and recording
engineers devote great attention to slight nuances of musical timbre. Yet the
same differences so useful in artistic creation often serve as confounds
within the realm of auditory psychophysics. This raises important questions
about the types of stimuli that should be used in experiments designed to
address questions related to music listening. Can artificial sounds abstracted
from our day-to-day musical experiences lead to experimental outcomes
that generalize to listening outside the laboratory?
Perceptual experiments exploring audio-visual integration in musical
contexts offer a useful case study in the consequences of ignoring the role
of musical sounds’ dynamic temporal structures. A large body of audio-
visual integration research using temporally simplistic sounds has
concluded that vision rarely influences auditory evaluations of duration3
(Fendrich & Corballis, 2001; Walker & Scott, 1981; Welch & Warren,
1980). However, a musical experiment exploring ongoing debate amongst
percussionists led to a surprising break with widely accepted theory. In that
series of studies an internationally acclaimed musician attempted to create
long and short notes on the marimba—a tuned, wooden bar instrument
similar to the xylophone. Notes on the marimba are percussive (Fig. 4,
middle panel)—with continuous temporal variation in their structure as the
energy transferred into the bar (by striking) gradually dissipates as a result
of friction, air resistance, etc. Whether or not the duration of these notes can
be intentionally varied has been long debated in the percussion community
(Schutz & Manning, 2012). However, an assessment of an expert
percussionist’s ability to control note duration demonstrated that these
gestures are in fact acoustically inconsequential, but trigger an illusion in
which the longer physical gesture used to strike the instrument affects
perception of the resulting note’s duration (Schutz & Lipscomb, 2007).
Musical implications (Schutz, 2008) aside, this finding represents a clear
break from previously accepted views on the integration of sight and sound
(Fendrich & Corballis, 2001; Walker & Scott, 1981; Welch & Warren,
1980).
The surprising ability of percussionists to shape perceived note duration
despite previous experimental work to the contrary stems in large part from
a bias in the temporal structure of stimuli used in auditory research.
Subsequent experiments illustrate that movements derived from the
percussionists’ gesture (Schutz & Kubovy, 2009b) integrate with sounds
exhibiting decaying envelopes (e.g., piano notes, produced from the impact
of a hammer on string), but fail to integrate with the sustained tones
produced by the clarinet or French horn (Schutz & Kubovy, 2009a). As the
clarinet differs in many properties from the marimba and piano, a direct test
of temporal structure using pure tones (i.e., sine waves) shaped with
decaying vs. invariant amplitude envelopes found that visual information integrated with the temporally dynamic percussive tones, but
not the temporally invariant flat tones previously used in audio-visual
integration experiments (Schutz, 2009).
This distinction between the outcomes of experiments with tones using
temporally dynamic vs. static amplitude envelopes is important in assessing
the degree to which lab-based tasks inform our understanding of listening in
the real world. For example, temporal structure can play a key role in the
well-known audio-visual bounce effect (ABE), in which two circles
approach each other, overlap, and then move back to their original starting points.
Although this ambiguous display can be perceived as depicting circles
either “bouncing off” or “passing through” one another, a brief tone
coincident with the moment of overlap enhances the likelihood of seeing a
bounce (Sekuler, Sekuler, & Lau, 1997). However, not all sounds affect this
integrated perception in the same way. Sounds synthesized with decaying
envelopes mimicking impact events trigger significantly more bounce
percepts than their mirror images (Grassi & Casco, 2009). The temporal
structure of individual tones also plays a role in a variety of “general”
perceptual tasks assessed primarily using tones lacking dynamic temporal
changes, leading to different experimental outcomes in tasks ranging from
learning associations (Schutz, Stefanucci, Baum, & Roth, 2017) to
perceiving pitches (Neuhoff & McBeath, 1996), assessing event duration
(Vallet, Shore, & Schutz, 2014), and segmenting auditory streams (Iverson,
1995).
Overlooking the importance of temporal structure in auditory perception
can even lead to misguided theoretical claims used to inform ongoing
research programs. For example, as discussed previously a great deal of
audio-visual integration research involves temporally simplified tones
ensuring experimental control. However, the role of the natural connection between sight and sound has been considered in discussions regarding the “unity assumption” (Welch, 1999) and/or “identity decision” (Bedford, 2004). That research explores the idea that event unity between
sight and sound plays an important role in the binding decision, such that
stimuli perceived as “going together” are more likely to bind. For example,
in the well-known “ventriloquist effect” the sound of a ventriloquist’s voice
is perceptually bound with concurrent lip movements of their puppets
(Abry, Cathiard, Robert-Ribes, & Schwartz, 1994; Bonath et al., 2007).
Unfortunately, the natural real-world relationships between sights and
sounds often pose challenges for the controlled manipulations so important
to experimental research. For example, tightly controlled, psychophysically
inspired studies of multimodal speech help clarify the importance of event
unity in multisensory integration. Gender-matched faces and voices—the sound of a male producing a syllable paired with the lip movements of either a male or a female articulating that syllable—bind more strongly than gender-mismatched faces and voices (Vatakis & Spence, 2007). This finding offers strong evidence for the unity assumption, raising important questions about the degree to which it applies to auditory stimuli beyond speech.
A series of experiments assessing the role of the unity assumption with
musical stimuli involved pairing the sound of a piano note and plucked
guitar string with video recordings of the movements used to produce these
sounds. Following the earlier procedures of Vatakis and Spence (2007), this approach found no evidence of the unity assumption playing a role in this non-speech musical task (or with other stimuli, such as a hammer striking ice vs. a bouncing ball).
This outcome contributed to the conclusion that the unity assumption
applied only to speech stimuli (Vatakis, Ghazanfar, & Spence, 2008).
However, as summarized below, subsequent research found strong evidence for the unity assumption in non-speech tasks once the importance of auditory temporal structure was considered.
The piano and guitar sounds used by Vatakis et al. (2008) exhibited
similar amplitude envelopes—a property defining the gross temporal
structure of a sound (i.e., the summation of changes in the amplitudes of
spectral components). Building on those approaches, my team assessed
binding using musical notes produced by the marimba and cello—sounds
with clearly differentiable amplitude envelopes—and found evidence for the
unity assumption (Chuen & Schutz, 2016).
In hindsight, the traditional focus on flat tones in auditory psychophysics
research helped obfuscate the similarity in the temporal structure of the
guitar and piano notes used by Vatakis et al. (2008). Given the relatively
small proportion of auditory perception studies using natural sounds, this
oversight is understandable; indeed, their use of natural sounds in
psychophysics experiments is laudable given the field’s general focus on
temporally invariant stimuli, which “often seems to have limited direct
relevance for understanding the ability to recognize the nature of complex
natural acoustic source events” (Pastore, Flint, Gaston, & Solomon, 2008,
p. 13).
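To make this distinction concrete, the following sketch (mine, not the chapter’s; all parameter values are illustrative) synthesizes the kind of time-invariant “flat” tone favored in psychophysics alongside a percussive tone with an exponentially decaying envelope and its time-reversed mirror image—the contrast used by Grassi and Casco (2009):

```python
import numpy as np

SR = 44100  # sample rate in Hz; frequency, duration, and decay values are illustrative

def sine_tone(freq=440.0, dur=0.5, sr=SR):
    """A pure tone to which different amplitude envelopes can be applied."""
    t = np.arange(int(dur * sr)) / sr
    return np.sin(2 * np.pi * freq * t)

def flat_envelope(n, ramp=0.01, sr=SR):
    """Time-invariant ("flat") envelope: brief linear on/off ramps avoid clicks."""
    env = np.ones(n)
    k = int(ramp * sr)
    env[:k] = np.linspace(0.0, 1.0, k)
    env[-k:] = np.linspace(1.0, 0.0, k)
    return env

def percussive_envelope(n, tau=0.08, sr=SR):
    """Impact-like envelope: abrupt onset followed by an exponential decay."""
    return np.exp(-np.arange(n) / (tau * sr))

carrier = sine_tone()
flat_tone = carrier * flat_envelope(len(carrier))                # typical lab stimulus
impact_tone = carrier * percussive_envelope(len(carrier))        # mimics a struck event
ramped_tone = carrier * percussive_envelope(len(carrier))[::-1]  # mirror-image control
```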
From these examples, it is clear that the time-varying structure of natural
sounds (or lack thereof) can meaningfully influence the outcomes of
psychological experiments. This is true whether researchers aim to explore
natural listening or to better understand the theoretical
structure and function of the auditory system. This issue holds important
implications even for experiments aimed at elucidating generalized
principles of perceptual processing rather than explicitly assessing the role
of dynamic temporal changes. Together, these concerns are consistent with
those raised previously by proponents of ecological acoustics such as John
Neuhoff, who argue that “the perception of dynamic, ecologically valid
stimuli is not predicted well by the results of many traditional experiments
using static stimuli” (2004, p. 5).

Traditional studies of specific sequences of notes, such as the four-note
opening of Beethoven’s Fifth Symphony, provide useful insight into both
the theoretical structure of musical passages, as well as their larger cultural
relevance. Much as the constant movement of pitches and rhythms gives
rise to lively melodies, the continual variations in temporal structure (for
multiple simultaneous harmonics) play an important role in musical
listening. However, as this information is not notated in musical scores and
is often under-emphasized in scientific discourse, the importance of these
dynamic changes is not always fully recognized. This “insight” is well
understood amongst those involved in sound synthesis and virtual modeling
of musical instruments. However, the need for tight experimental control
over stimuli in auditory perception and auditory neuroscience research has
incentivized the use of simple, time-invariant flat tones.
Although they offer important methodological benefits, their distance from
musical sounds can pose limitations on their ability to inform our
understanding of natural listening. With modern recording and sound
synthesis approaches we now have the ability to generate auditory stimuli
exhibiting the rich temporal variation of natural musical sounds, while also
affording the precise control so crucial for avoiding confounds—raising
exciting new possibilities for future innovation and discovery. Looking
toward the future, research assessing core questions of auditory perception
using temporally complex sounds will help clarify the degree to which
existing theories and models apply to our perception of natural sounds such
as those produced by musical instruments.

ACKNOWLEDGMENTS
Funding for this research was provided by the Natural Sciences and
Engineering Research Council of Canada (NSERC), Social Sciences and
Humanities Research Council of Canada (SSHRC), and the Ontario Early
Researcher Award (ERA). I would like to thank Maxwell Ng for his
assistance in creating the visualizations of the instrument sounds used
throughout this chapter.

REFERENCES
Abry, C., Cathiard, M. A., Robert-Ribes, J., & Schwartz, J. L. (1994). The coherence of speech in
audio-visual integration. Current Psychology of Cognition 13, 52–59.
Acoustical Society of America Standards Secretariat (1994). Acoustical Terminology ANSI S1.1–
1994 (ASA 111-1994). American National Standard. ANSI/Acoustical Society of America.
Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration.
Current Biology 14(3), 257–262.
Aldwell, E., Schachter, C., & Cadwallader, A. (2002). Harmony & voice leading (3rd ed.). Boston,
MA: Schirmer.
Alexander, P. L., & Broughton, B. (2008). Professional orchestration: The first key. Solo instruments
& instrumentation note, volume 1 (3rd ed.). Petersburg, VA: Alexander Publishing.
Bedford, F. L. (2004). Analysis of a constraint on perception, cognition, and development: One
object, one place, one time. Journal of Experimental Psychology: Human Perception and
Performance 30(5), 907–912.
Bhatara, A., Tirovolas, A. K., Duan, L. M., Levy, B., & Levitin, D. J. (2011). Perception of emotional
expression in musical performance. Journal of Experimental Psychology: Human Perception and
Performance 37(3), 921–934.
Bonath, B., Noesselt, T., Martinez, A., Mishra, J., Schwiecker, K., Heinze, H.-J., & Hillyard, S. A.
(2007). Neural basis of the ventriloquist illusion. Current Biology 17(19), 1697–1703.
Boulez, P. (1987). Timbre and composition—timbre and language. Contemporary Music Review
2(1), 161–171.
Broze, Y., & Huron, D. (2013). Is higher music faster? Pitch–speed relationships in Western
compositions. Music Perception: An Interdisciplinary Journal 31(1), 19–31.
Caclin, A., McAdams, S., Smith, B. K., & Winsberg, S. (2005). Acoustic correlates of timbre space
dimensions: A confirmatory study using synthetic tones. Journal of the Acoustical Society of
America 118(1), 471–482.
Chapin, H., Jantzen, K., Kelso, J. A. S., Steinberg, F., & Large, E. W. (2010). Dynamic emotional and
neural responses to music depend on performance expression and listener experience. PLoS ONE
5, 1–14.
Chuen, L., & Schutz, M. (2016). The unity assumption facilitates cross-modal binding of musical,
non-speech stimuli: The role of spectral and amplitude cues. Attention, Perception, &
Psychophysics 78(5), 1512–1528.
Clough, J., & Conley, J. (1984). Basic harmonic progressions. New York: W. W. Norton.
Corrigall, K. A., & Trainor, L. J. (2010). Musical enculturation in preschool children: Acquisition of
key and harmonic knowledge. Music Perception: An Interdisciplinary Journal 28(2), 195–200.
Corrigall, K. A., & Trainor, L. J. (2014). Enculturation to musical pitch structure in young children:
Evidence from behavioral and electrophysiological methods. Developmental Science 17(1), 142–
158.
Cox, T. J. (2008). Scraping sounds and disgusting noises. Applied Acoustics 69(12), 1195–1204.
Dowling, W. J., & Harwood, D. L. (1986). Music cognition. Orlando, FL: Academic Press.
Eerola, T., Friberg, A., & Bresin, R. (2013). Emotional expression in music: Contribution, linearity,
and additivity of primary musical cues. Frontiers in Psychology 4, 1–12. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00487
Erickson, R. (1975). Sound structure in music. Berkeley, CA: University of California Press.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a
statistically optimal fashion. Nature 415(6870), 429–433.
Fendrich, R., & Corballis, P. M. (2001). The temporal cross-capture of audition and vision.
Perception & Psychophysics 63(4), 719–725.
Ferrand, C. T. (2002). Harmonics-to-noise ratio: An index of vocal aging. Journal of Voice 16(4),
480–487.
Fritts, L. (1997). University of Iowa Electronic Music Studios. University of Iowa. Retrieved from
http://theremin.music.uiowa.edu/MIS.html
Gaver, W. (1993). What in the world do we hear? An ecological approach to auditory event
perception. Ecological Psychology 5(1), 1–29.
Gillard, J., & Schutz, M. (2013). The importance of amplitude envelope: Surveying the temporal
structure of sounds in perceptual research. In Proceedings of the Sound and Music Computing
Conference (pp. 62–68). Stockholm, Sweden.
Gordon, J. W. (1987). The perceptual attack time of musical tones. Journal of the Acoustical Society
of America 82(1), 88–105.
Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. Trends in
Cognitive Sciences 15(1), 3–10.
Grassi, M., & Casco, C. (2009). Audiovisual bounce-inducing effect: Attention alone does not
explain why the discs are bouncing. Journal of Experimental Psychology: Human Perception and
Performance 35(1), 235–243.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the
Acoustical Society of America 61(5), 1270–1277.
Grey, J. M., & Gordon, J. W. (1978). Perceptual effects of spectral modifications on musical timbres.
Journal of the Acoustical Society of America 63(5), 1493–1500.
Guerrieri, M. (2012). The first four notes: Beethoven’s Fifth and the human imagination. New York:
Alfred A. Knopf.
Hamberger, C. L. (2012). The evolution of Schoenberg’s Klangfarbenmelodie: The importance of
timbre in modern music. The Pennsylvania State University. Retrieved from
https://etda.libraries.psu.edu/files/final_submissions/8130
Heinlein, C. P. (1928). The affective characters of the major and minor modes in music. Journal of
Comparative Psychology 8, 101–142.
Hevner, K. (1935). The affective character of the major and minor modes in music. American
Journal of Psychology 47(1), 103–118.
Hjortkjaer, J. (2013). The musical brain. In J. O. Lauring (Ed.), An introduction to neuroaesthetics:
The neuroscientific approach to aesthetic experience, artistic creativity, and arts appreciation (pp.
211–244). Copenhagen: Museum Tusculanum Press.
Houtsma, A. J. M., Rossing, T. D., & Wagennars, W. M. (1987). Auditory demonstrations on
compact disc [audio CD]. New York: Acoustical Society of America/Eindhoven: Institute for
Perception Research.
Huron, D., & Ollen, J. (2003). Agogic contrast in French and English themes: Further support for
Patel and Daniele (2003). Music Perception: An Interdisciplinary Journal 21(2), 267–271.
Iverson, P. (1995). Auditory stream segregation by musical timbre: Effects of static and dynamic
acoustic attributes. Journal of Experimental Psychology: Human Perception and Performance 21,
751–763.
Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attributes of musical timbre. Journal of
the Acoustical Society of America 94, 2594–2603.
Joris, P. X., Schreiner, C. E., & Rees, A. (2004). Neural processing of amplitude-modulated sounds.
Physiological Reviews 84, 541–577.
Jourdain, R. (1997). Music, the brain, and ecstasy: How music captures our imagination. New York:
William Morrow and Company.
Kendall, R. A. (1986). The role of acoustic signal partitions in listener categorization of musical
phrases. Music Perception 4(2), 185–213.
Koelsch, S., & Friederici, A. D. (2003). Toward the neural basis of processing structure in music.
Annals of the New York Academy of Sciences 999, 15–28.
Krimphoff, J., McAdams, S., & Winsberg, S. (1994). Caractérisation du timbre des sons complexes.
II. Analyses acoustiques et quantification psychophysique. Journal de Physique IV Colloque 4,
625–628.
Kumar, S., Forster, H. M., Bailey, P., & Griffiths, T. D. (2008). Mapping unpleasantness of sounds to
their auditory representation. Journal of the Acoustical Society of America 124(6), 3810–3817.
Lowis, M. J. (2002). Music as a trigger for peak experiences among a college staff population.
Creativity Research Journal 14(3–4), 351–359.
McAdams, S., Winsberg, S., Donnadieu, S., de Soete, G., & Krimphoff, J. (1995). Perceptual scaling
of synthesized musical timbres: Common dimensions, specificities, and latent subject classes.
Psychological Research 58(3), 177–192.
McDermott, J. (2012). Auditory preferences and aesthetics: Music, voices, and everyday sounds. In
R. J. Dolan & T. Sharot (Eds.), Neuroscience of preference and choice: Cognitive and neural
mechanisms (pp. 227–257). London: Academic Press.
Miller, J. R., & Carterette, E. C. (1975). Perceptual space for musical structures. Journal of the
Acoustical Society of America 58(3), 711–720.
Moore, B. C. J. (1997). An introduction to the psychology of hearing (4th ed.). London: Academic
Press.
Neuhoff, J. G. (2004). Ecological psychoacoustics (J. G. Neuhoff, Ed.). Amsterdam:
Elsevier/Academic Press.
Neuhoff, J. G., & McBeath, M. K. (1996). The Doppler illusion: The influence of dynamic intensity
change on perceived pitch. Journal of Experimental Psychology: Human Perception and
Performance 22(4), 970–985.
Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005).
Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance
imaging study. Annals of the New York Academy of Sciences 1060, 450–453.
Pastore, R. E., Flint, J., Gaston, J. R., & Solomon, M. J. (2008). Auditory event perception: The
source–perception loop for posture in human gait. Perception & Psychophysics 70(1), 13–29.
Patel, A. D., & Daniele, J. R. (2003). Stress-timed vs. syllable-timed music? A comment on Huron
and Ollen (2003). Music Perception: An Interdisciplinary Journal 21(2), 273–276.
Phillips, D. P., Hall, S. E., & Boehnke, S. E. (2002). Central auditory onset responses, and temporal
asymmetries in auditory perception. Hearing Research 167(1–2), 192–205.
Poon, M., & Schutz, M. (2015). Cueing musical emotions: An empirical analysis of 24-piece sets by
Bach and Chopin documents parallels with emotional speech. Frontiers in Psychology 6, 1–13.
Retrieved from https://doi.org/10.3389/fpsyg.2015.01419
Repp, B. H. (1995). Quantitative effects of global tempo on expressive timing in music performance:
Some perceptual evidence. Music Perception: An Interdisciplinary Journal 13(1), 39–57.
Rimsky-Korsakov, N. (1964). Principles of orchestration (M. Steinberg, Ed.). New York: Dover.
Risset, J.-C., & Wessel, D. L. (1999). Exploration of timbre by analysis and synthesis. In D. Deutsch
(Ed.), The Psychology of Music (pp. 113–169). San Diego, CA: Gulf Professional Publishing.
Rossing, T. D., Moore, R. F., & Wheeler, P. A. (2013). The science of sound (3rd ed.). London:
Pearson Education.
Saldanha, E. L., & Corso, J. F. (1964). Timbre cues and the identification of musical instruments.
Journal of the Acoustical Society of America 36(11), 2021–2026.
Schellenberg, E. G. (2002). Asymmetries in the discrimination of musical intervals: Going out-of-
tune is more noticeable than going in-tune musical intervals. Music Perception: An
Interdisciplinary Journal 19(2), 223–248.
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychological
Science 14(3), 262–266.
Schenker, H. (1971). Analysis of the first movement. In E. Forbes (Ed.), Beethoven Symphony No. 5
in C minor (pp. 164–182). New York: W. W. Norton.
Schulkind, M. D., Hennis, L. K., & Rubin, D. C. (1999). Music, emotion, and autobiographical
memory: They’re playing your song. Memory & Cognition 27(6), 948–955.
Schutz, M. (2008). Seeing music? What musicians need to know about vision. Empirical Musicology
Review 3(3), 83–108.
Schutz, M. (2009). Crossmodal integration: The search for unity (Dissertation). University of
Virginia.
Schutz, M., & Kubovy, M. (2009a). Causality and cross-modal integration. Journal of Experimental
Psychology: Human Perception and Performance 35(6), 1791–1810.
Schutz, M., & Kubovy, M. (2009b). Deconstructing a musical illusion: Point-light representations
capture salient properties of impact motions. Canadian Acoustics 37(1), 23–28.
Schutz, M., & Lipscomb, S. (2007). Hearing gestures, seeing music: Vision influences perceived tone
duration. Perception 36(6), 888–897.
Schutz, M., & Manning, F. (2012). Looking beyond the score: The musical role of percussionists’
ancillary gestures. Music Theory Online 18, 1–14.
Schutz, M., Stefanucci, J., Baum, S. H., & Roth, A. (2017). Name that percussive tune: Associative
memory and amplitude envelope. Quarterly Journal of Experimental Psychology 70(7), 1323–
1343.
Schutz, M., & Vaisberg, J. M. (2014). Surveying the temporal structure of sounds used in music
perception. Music Perception: An Interdisciplinary Journal 31(3), 288–296.
Sekuler, R., Sekuler, A. B., & Lau, R. (1997). Sound alters visual motion perception. Nature
385(6614), 308.
Skarratt, P. A., Cole, G. G., & Gellatly, A. R. H. (2009). Prioritization of looming and receding
objects: Equal slopes, different intercepts. Attention, Perception, & Psychophysics 71(4), 964–970.
Sloboda, J. (1991). Music structure and emotional response: Some empirical findings. Psychology of
Music 19(2), 110–120.
Strong, W., & Clark, M. (1967). Perturbations of synthetic orchestral wind-instrument tones. Journal
of the Acoustical Society of America 41(2), 277–285.
Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., … Yanai, K. (2008).
Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive,
Affective, & Behavioral Neuroscience 8(2), 126–131.
Tan, S.-L., Pfordresher, P. Q., & Harré, R. (2007). Psychology of music: From sound to significance.
New York: Psychology Press.
Terhardt, E. (1974). On the perception of periodic sound fluctuations (roughness). Acta Acustica
United with Acustica 30, 201–213.
Tervaniemi, M., Schröger, E., Saher, M., & Näätänen, R. (2000). Effects of spectral complexity and
sound duration on automatic complex-sound pitch processing in humans: A mismatch negativity
study. Neuroscience Letters 290, 66–70.
Thompson, W. F. (2009). Music, thought, and feeling: Understanding the psychology of music. New
York: Oxford University Press.
Tirovolas, A. K., & Levitin, D. J. (2011). Music perception and cognition research from 1983 to
2010: A categorical and bibliometric analysis of empirical articles in Music Perception. Music
Perception: An Interdisciplinary Journal 29(1), 23–36.
Tovey, D. F. (1971). The Fifth Symphony. In E. Forbes (Ed.), Beethoven Symphony No. 5 in C minor
(pp. 143–150). New York: W. W. Norton.
Trehub, S. E., Endman, M. W., & Thorpe, L. A. (1990). Infants’ perception of timbre: Classification
of complex tones by spectral structure. Journal of Experimental Child Psychology 49(2), 300–313.
Vallet, G., Shore, D. I., & Schutz, M. (2014). Exploring the role of amplitude envelope in duration
estimation. Perception 43(7), 616–630.
Vatakis, A., Ghazanfar, A. A., & Spence, C. (2008). Facilitation of multisensory integration by the
“unity effect” reveals that speech is special. Journal of Vision 8(9), 1–11.
Vatakis, A., & Spence, C. (2007). Crossmodal binding: Evaluating the “unity assumption” using
audiovisual speech stimuli. Perception & Psychophysics 69(5), 744–756.
Vuoskoski, J. K., & Eerola, T. (2012). Can sad music really make you sad? Indirect measures of
affective states induced by music and autobiographical memories. Psychology of Aesthetics,
Creativity, and the Arts 6, 1–10.
Walker, J. T., & Scott, K. J. (1981). Auditory-visual conflicts in the perceived duration of lights,
tones and gaps. Journal of Experimental Psychology: Human Perception and Performance 7(6),
1327–1339.
Wang, S., Liu, B., Dong, R., Zhou, Y., Li, J., Qi, B., … Zhang, L. (2012). Music and lexical tone
perception in Chinese adult cochlear implant users. The Laryngoscope 122, 1353–1360.
Warren, R. M. (2013). Auditory perception: A new synthesis. Amsterdam: Elsevier.
Welch, R. B. (1999). Meaning, attention, and the “unity assumption” in the intersensory bias of
spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Musseler (Eds.),
Cognitive contributions to the perception of spatial and temporal events (pp. 371–387).
Amsterdam: Elsevier.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy.
Psychological Bulletin 88(3), 638–667.

1. All analyses of notes in this chapter are based on additional samples from the University of
Iowa Electronic Music Studios (Fritts, 1997).
2. However, presenting notes without transients as part of a melodic sequence (rather than as
isolated tones) may mitigate this confusion (Kendall, 1986).
3. Provided that the acoustic information is of sufficient quality (Alais & Burr, 2004; Ernst &
Banks, 2002).
CHAPTER 8

NEURAL BASIS OF
RHYTHM PERCEPTION

CHRISTINA M. VANDEN BOSCH DER NEDERLANDEN, J. ERIC T. TAYLOR, AND JESSICA A. GRAHN

To experience music, listeners must be able to pick up on the temporal
relationships among events as they unfold. These temporal relationships are
characterized by the rhythm, or the pattern of time intervals between the
onsets of events in music (see Fig. 1). Unlike sculpture or painting, music
and dance cannot be perceived or produced without comprehending rhythmic structure.
One of the most intriguing phenomena in music is that when we listen to
rhythm, we perceive a regular, recurring pulse or beat (Cooper & Meyer,
1960; Large, 2008), which allows us to bob our heads and clap our hands in
time to the music. This psychologically generated beat does not always
have to align with the note onsets in a rhythm, as evidenced by the fact that
we mentally continue the beat through gaps in the music (see Fig. 1). We
further organize the musical beat into alternations of strong and weak beats
at multiple hierarchical timescales, called meter (Epstein, 1995; Lerdahl &
Jackendoff, 1983). The meter helps us distinguish between, for example, a
waltz (i.e., triple meter) and a march (i.e., duple meter) depending on
whether we hear the strong beat fall on every three or two beats,
respectively.
FIGURE 1. Rhythm is represented by the black dots on each line, whereas the beat is represented
by the bold black lines, which occur on every other event (duple meter). (A) A simple metrical
pattern, in which events fall on the beat more often than not. (B) A complex metrical pattern,
with some events occurring on the beat while many others do not. (C) A syncopated rhythm, in
which note events always occur off the beat.

Despite the ease with which humans pick up on the beat and synchronize
their movements to music, it is not trivial to understand how human brains
perceive and process rhythm. Musical rhythms are beat-based, which means
that the pattern of onsets gives rise to the feeling of an underlying pulse or
framework. Perceiving a beat can make it easier to predict and act on
upcoming events in a rhythmic sequence. However, many other naturally
occurring rhythms in our environment do not have a regular pulse or beat,
such as walking, talking, or the car engine turning over. Such rhythms are
called non-beat-based. Different mechanisms have been proposed to
account for the way that humans encode beat- and non-beat-based rhythms.
Absolute timing mechanisms encode exact durations of all intervals in a
sequence, whereas relative timing mechanisms encode when intervals start
and stop in relation to the beat. If there is no regular beat, then absolute
timing is likely necessary to encode the rhythm. However, with a beat,
relative timing may be used. There is evidence for distinct neural networks
associated with absolute and relative timing (Teki, Grube, Kumar, &
Griffiths, 2011), with participants relying on either mechanism depending
on the nature of the rhythm and task demands.
A number of approaches are used to understand rhythm processing,
incorporating methodologies based on behavior, neuroimaging, and patient
studies. Measuring behavior is fundamental to our understanding of rhythm
because it can provide a direct measure of how we move to music.
However, much behavioral research is correlational—that is, distinct
measures of stimulus characteristics and tapping variability may be related
to one another, but the stimulus characteristics may not cause tapping
variability as there may be a third unmeasured variable that is the true
influencer of performance. Although neuroimaging approaches are ideal for
discovering more about when and where in the brain rhythm is processed,
some neuroimaging studies fail to include (or are unable to include for
methodological reasons) behavioral measures of rhythm processing, which
makes it difficult to determine exactly how differences in neural activation
relate to real-world outcomes. That is, simply because there are differences
in activation for two different rhythms does not necessarily mean that
participants will also perceive them differently. Finally, studies of patients
with brain damage or dysfunction provide significant insights, but rely on
natural accidents, which do not lead to the same amount or location of
damage in each individual. This makes it difficult to understand what
lesioned areas are truly necessary for rhythm processing or whether a
particular combination of areas is required to perceive rhythm. Of course, a
combination of these approaches, together with methods that focally
disrupt ongoing neural processing, such as transcranial magnetic
stimulation (TMS), has given rise to a rich literature on human rhythm
perception. This chapter will review the current literature on the neural
basis of rhythm perception, which highlights important brain areas for
perceiving a beat, and how the human brain entrains to rhythms in music.

FINDING THE BEAT

To understand how the brain processes rhythm, especially rhythms that
have a beat (as is the case in most music), it is first necessary to create
rhythmic stimuli that are capable of inducing the percept of a strong beat,
and similar stimuli that either do not induce a beat percept at all, or only
weakly so. This allows researchers to compare scenarios when participants
feel the beat more or less strongly (or not at all), but other aspects of the
task are equivalent (e.g., the presence of acoustically similar sounds, having
to listen to or reproduce rhythms). The stimuli must be as physically similar
as possible so that when activation of strong and weak beat rhythms is
compared, activation differences will not arise from other differences
between stimuli apart from the percept of a beat. To solve this problem,
researchers take advantage of the fact that the strength of a beat percept can
be driven by perceptual accents, or perceived emphases on certain tones in a
rhythm that generally mark the beat. These perceptual accents differ from
physical accents (e.g., changes in loudness or pitch on certain notes),
because they arise from timing of the tones, even though the physical
properties of all the tones are identical (Brochard, Abecasis, Potter, Ragot,
& Drake, 2003; Povel & Okkerman, 1981). For instance, people perceive
accents on every other event in an evenly spaced sequence (e.g., tick-tock
of a clock), or perceive accents on notes that are preceded or followed by
long silent intervals, even if the events are the same duration, loudness, and
pitch as events not surrounded by silence. By creating sequences that differ
in their pattern of temporal onsets, but are identical in all other respects
(e.g., number of tones, duration, loudness, pitch), researchers can create
rhythms with varying degrees of a beat percept that are matched in other
ways. For example, in metric simple rhythms (see Fig. 1A), the timing of
the tones is selected to create regularly occurring perceptual accents, which
induce a clear and steady beat. In these cases, the intervals between tones
would be comprised of whole-integer ratios (e.g., 2, 2, 1, 1, 2, in which
numbers represent multiples of an arbitrary time interval (e.g., 1 = 250 ms)
between tone onsets). These can be compared to metric complex rhythms
(see Fig. 1B), in which tone onsets are syncopated: the perceptual accents
occur irregularly—they do not always coincide with the beat, and thus do
not induce a strong beat percept (Povel & Essens, 1985). The irregular
perceptual accents of complex rhythms also make it more difficult to
synchronize with them (Patel, Iversen, Chen, & Repp, 2005). A third
category of rhythmic sequences is non-metric, in which onsets occur in
non-integer ratio sequences (e.g., successive onsets separated by 1, 1.3, 0.4,
and 1.7 base intervals) rather than the integer ratio sequences common to metric
simple and complex rhythms. These stimuli sound almost random, so much
so that listeners have a strong inclination to reproduce them by incorrectly
tapping back integer ratio sequences (Collier & Wright, 1995; Essens, 1986;
Ravignani, Delgado, & Kirby, 2017). Importantly, the perceptual difference
between simple and complex rhythms is a product of whether perceptual
accents coincide with tone onsets, as both types of rhythms are composed of
integer-ratio intervals.
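As a concrete illustration of this stimulus logic, the sketch below (not from the chapter) converts interval lists into onset times. The simple and non-metric interval values come from the examples above, while the complex reordering is hypothetical, chosen so that the same integer intervals no longer produce regular perceptual accents:

```python
import numpy as np

BASE = 0.250  # seconds; the chapter's example treats 1 unit as 250 ms

def onset_times(intervals, base=BASE):
    """Turn inter-onset intervals (in multiples of `base`) into onset times.
    The first tone sounds at time 0; each interval sets the gap to the next tone."""
    iois = np.asarray(intervals, dtype=float) * base
    return np.concatenate(([0.0], np.cumsum(iois)[:-1]))

simple = onset_times([2, 2, 1, 1, 2])          # metric simple: regular accents
complex_ = onset_times([1, 2, 2, 1, 2])        # metric complex: same integer intervals,
                                               # reordered so accents fall irregularly
nonmetric = onset_times([1.0, 1.3, 0.4, 1.7])  # non-integer ratios, near-random sounding
```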
Now that we have covered the notion of metricality and the distinction
between simple and complex rhythms, we can dive into recent findings
from cognitive neuroscience that describe how neural processes give rise to
rhythm perception. The basic logic of fMRI studies involves measuring the
brain’s online metabolic activity in at least two different experimental
conditions, and comparing the pattern of differences. These differences
indicate which neural structures respond during a behavior or cognitive
function. For example, comparing brain responses to simple, complex, and
non-metric rhythms that differ in the strength of beat perception but are
otherwise perceptually similar, might pinpoint the neural structures
involved in perceiving the beat. Using this approach, greater activity has
been observed in certain motor structures, namely the basal ganglia (BG) and
supplementary motor area (SMA), when participants hear metric simple rhythms
(which have a clear beat) compared to complex or non-metric stimuli
(which have little or no beat; Grahn & Brett, 2007). Greater activity
occurred regardless of participants’ musical training, suggesting a
fundamental role for those areas in rhythm perception. Responses in other
motor areas, namely the premotor cortex (PMC) and cerebellum, were
observed for all sequences, and did not vary depending on the presence or
absence of a beat. All these motor areas were implicated in more general
timing and sequence processing as well (Chen, Penhune, & Zatorre, 2008),
but the basal ganglia and SMA appear to be particularly responsive to beat
processing. A similar study revealed a distributed network that predicts
individual differences in beat perception, in which strong beat perceivers
display greater SMA, ventrolateral prefrontal cortex (PFC), and medial PFC
activity compared to weak beat perceivers. The activated network in strong
beat perceivers was more distributed through frontal and motor areas,
whereas the activated network in weak beat perceivers was more limited to
auditory processing (Chen, Zatorre, & Penhune, 2006). While the degree of
motor response in activated networks differs depending on the task and the
stimuli, one conclusion is clear: rhythm perception recruits participation
from a network of motor areas, even when movements are not required. The
recruitment of basal ganglia (specifically, the putamen) during beat
perception was further confirmed by a later fMRI study that examined the
neural responses to the beat when the beat was induced by different types of
accents, or emphases (Grahn & Rowe, 2009). The beat was either
emphasized by changes in loudness on beat tones (strong external accents),
or perceptual accents created by timing of the tones without loudness
changes as described in metric simple stimuli above (weak external
accents), or the tones were unaccented, and thus any beat imposed by
listeners was generated internally (internal accents). Regardless of the
accent used to induce beat perception, greater putamen activity was
observed relative to control non-beat rhythms. Moreover, greater
connectivity (taken as a measure of communication between brain areas)
was observed between the putamen and the SMA, PMC, and auditory
cortex. In musicians, externally and internally generated beats activated
different subcomponents of the motor network, modulating connectivity
between PMC and auditory cortex.
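The subtraction logic described above can be sketched in a few lines. The values below are hypothetical per-participant activation estimates (e.g., GLM betas) for one region of interest; a paired test on the within-participant difference asks whether the region responds more strongly when a beat is perceived, all else being matched:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 20  # hypothetical number of participants
beta_simple = rng.normal(1.0, 0.5, size=n)   # activation for metric simple (clear beat)
beta_complex = rng.normal(0.6, 0.5, size=n)  # activation for metric complex (weak beat)

# Paired comparison of the two conditions within participants
t, p = stats.ttest_rel(beta_simple, beta_complex)
print(f"beat > weak-beat contrast: t({n - 1}) = {t:.2f}, p = {p:.3f}")
```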
fMRI studies provide insights into the metabolic state of the brain under
different conditions, but there are other physiological markers we can use to
decipher the processes involved in rhythm perception. TMS can be used to
briefly activate the connections between the brain, spinal cord, and distal
musculature, giving us a snapshot of the body’s corticospinal excitability at
any given moment. Simply put, a strong magnetic field is induced on the
surface of a participant’s head using a powerful electric current, safely
contained within an insulated coil. This field passes a few centimeters
through the head and induces a small and localized electric field inside the
brain that causes neurons within it to fire. This means experimenters can
directly and non-invasively stimulate neuronal firing. The neuronal firing
triggered by TMS delivered to primary motor cortex (M1) results in
involuntary muscle twitches. Measuring the amplitude of the electrical
signal that causes the muscle to contract, called a motor evoked potential
(MEP), at the muscle gives a reliable index of corticospinal excitability, or
the motor system’s readiness for action. For example, stimulating M1 in
pianists elicits greater amplitude MEPs from hand muscles when they listen
to a piece they have played compared to an unfamiliar piece (D’Ausilio,
Altenmüller, Olivetti Belardinelli, & Lotze, 2006), suggesting that their
motor system automatically responds to pieces they have learned. Rhythm
researchers have used this TMS-MEP logic to measure the motor system’s
excitability during beat perception. For example, MEP amplitudes
measured from the ankle were greater when TMS pulses were delivered in
time with the beats of strong beat compared to weak beat rhythms
(Cameron, Stewart, Pearce, Grube, & Muggleton, 2012). Increased MEP
amplitude in response to the beat is in line with the aforementioned fMRI
literature on greater basal ganglia and SMA activations during perception of
simple compared to complex sequences. Increases in excitability also
happen in response to music. In a recent study, musicians listened to “high-
groove” or “low-groove” music while receiving TMS, where groove is “a
musical quality that makes us want to move with the rhythm or beat”
(Stupacher, Hove, Novembre, Schutz-Bosbach, & Keller, 2013, p. 127).
MEP amplitudes were greater for high-groove versus low-groove music,
and this effect was more pronounced for pulses delivered on versus off the
beat, indicating that motor system readiness was greatest on
the beat. Although these studies do not directly implicate the basal ganglia
in rhythm perception—these structures are too deep within the brain for the
transcranial magnetic field to penetrate—the MEPs are measured
downstream of the central nervous system, suggesting possible modulation
of the entire motor system.
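As a minimal sketch of this TMS-MEP logic (my own illustration; the sampling rate and search window are assumptions, reflecting only that MEPs typically emerge within tens of milliseconds of the pulse), corticospinal excitability can be quantified as the peak-to-peak amplitude of the EMG deflection following each pulse:

```python
import numpy as np

def mep_amplitude(emg, pulse_index, sr=5000, window=(0.015, 0.050)):
    """Peak-to-peak MEP amplitude from one EMG sweep.

    emg         : 1-D EMG trace (e.g., in millivolts)
    pulse_index : sample index at which the TMS pulse was delivered
    window      : post-pulse search window in seconds (illustrative values)
    """
    lo = pulse_index + int(window[0] * sr)
    hi = pulse_index + int(window[1] * sr)
    segment = emg[lo:hi]
    return segment.max() - segment.min()

# Averaging mep_amplitude() over pulses delivered on strong- vs. weak-beat
# rhythms would then index beat-related changes in corticospinal excitability.
```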
The last source of evidence for the role of the motor system in rhythm
perception comes from neuropsychological cases. There are patients whose
rhythm perception has been altered by a disease or disorder, or by lesions
due to stroke. For example, Parkinson’s disease (PD) is characterized by
rigidity of movement, tremor, and slowness, and is caused by the
progressive deterioration of the dopaminergic pathway in the basal ganglia.
Given the aforementioned review of the basal ganglia’s role in rhythm
perception, and PD patients’ documented difficulty with perception and
production of isochronous rhythms (Harrington, Haaland, & Hermanowitz,
1998), Grahn and Brett (2009) surmised that this patient population would
also display difficulties with beat perception. Indeed, healthy older
participants found it easier to perceive simple, beat-based rhythms versus
complex rhythms, whereas PD patients did not display this advantage,
indicating they were less able to use the simple rhythms’ beat-based structure
to perform the discrimination task. The authors concluded that healthy
basal ganglia appear to be necessary for processing rhythms with a strong
beat. In a follow-up to this study, PD patients who were on L-DOPA, a drug
that increases dopamine, discriminated simple rhythms better (but complex
rhythms worse) when they were on versus off their medication (Cameron,
Pickett, Earhart, & Grahn, 2016). Rhythm discrimination performance was
correlated with the severity of the Parkinson’s disease. Taken together, the
results indicated that healthy dopaminergic function influences beat-based
timing. Finally, the ability to adapt to changes in tempo is severely
hampered by focal basal ganglia lesions due to stroke (Schwartze, Keller,
Patel, & Kotz, 2011). Thus, overall, these neuropsychological studies point
to an essential role of the basal ganglia in normal rhythm perception.

OSCILLATORY MECHANISMS

An alternative way of examining beat perception is through the neural
dynamics of excitation and inhibition, which lead to cyclical activity
changes in populations of neurons. These cyclical changes are called
oscillations. Neuronal activity oscillates spontaneously in the brain, but
when listeners receive rhythmic input, the phase and period of ongoing
neural oscillations can be influenced to match, or phase-lock to, the
incoming signal (e.g., Picton, John, Dimitrijevic, & Purcell, 2003;
Schroeder, Lakatos, Kajikawa, Partan, & Puce, 2008). Indeed, rhythmic
stimuli like music or language can act as a pacing signal that allows
listeners to more accurately attend to relevant information in the continuous
signal (Henry & Obleser, 2012). This finding is directly consistent with
behavioral work from the dynamic attending literature that finds
fluctuations in attention over time (Jones & Boltz, 1989; Large & Jones,
1999). That is, attention fluctuates periodically, with peaks of concentrated
attention that occur more strongly with increasing periodicity of a stimulus.
Oscillatory attentional and neural dynamics help explain a large body of
literature in music cognition that finds better performance for events (pitch
or interval discrimination) happening on a strong beat—when attention may
be at its peak in the oscillatory phase—compared to a weak beat (for recent
review see Henry & Herrmann, 2014).
The ongoing oscillatory neural dynamics of the brain also make
behavioral predictions for musical rhythm and beat perception. Neural
resonance theory (Large, 2008; Large & Snyder, 2009) demonstrates that
the interaction of rhythmic input with a bank of ongoing neural oscillators
can give rise to several key facets of human rhythm and beat perception
described throughout this chapter. As described above, the way that humans
experience music is as a stable regular pattern in time. However, the surface
rhythm is often not periodic. A rhythm may initially have several events
that fall on the beat, but events will also fall on both strong and weak
metrical positions (Figs. 1A and 1B) and may even begin to consistently fall
on the off-beat, as is the case in syncopated rhythms (Fig. 1C). The
mathematical model of neural resonance theory predicts that perception of
the beat in this instance would remain stable because the initial rhythmic
input from rhythms of Figs. 1A and 1B acts to reset the phase and period of
ongoing neural oscillators, leading to a beat percept that is quite persistent,
even in the face of conflicting evidence. Further, the physics of ongoing
oscillators allow listeners to maintain a rhythmic pulse, even in the absence
of environmental input (e.g., through a silent gap in a song). Finally, a key
prediction from neural resonance theory illustrates that neural oscillations
resonate with rhythmic input, which results in peaks of activation at
harmonics or subharmonics (integer multiples or subdivisions) of the input rhythm. These
harmonics (3:1 or 2:1) and subharmonics (1:3 or 1:2) are related to the way
that listeners hear perceptual accents on alternating events (see section on
“Oscillatory Mechanisms”), and place stronger perceptual emphasis on
certain tones, such as downbeats, in metrical groupings. Taken together,
neural resonance theory explains how humans (a) perceive a regular pulse
from irregular rhythmic input (i.e., in the absence of strictly periodic input,
such as a metronome), (b) maintain the feeling of a pulse or beat that
persists when sound ceases, and (c) experience alternations of strong and
weak beats and begin to organize music into a hierarchical metrical
framework. That is, listeners “hear musical events in relation to these
patterns because they are intrinsic to the physics of the neural systems
involved in perceiving, attending, and responding to auditory stimuli”
(Large & Snyder, 2009, p. 52).
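A toy phase-resetting oscillator—a drastic simplification of the full neural resonance model, with illustrative parameters and no period adaptation—captures two of these properties: its phase is nudged toward alignment at each stimulus onset, and its internal pulse persists after the input stops:

```python
import numpy as np

def entrained_ticks(onsets, period=0.5, coupling=0.2, dt=0.001, t_end=8.0):
    """Return the times at which an internal oscillator 'ticks' (phase wraps).

    The phase advances at the oscillator's intrinsic rate; at each stimulus
    onset it is corrected toward phase zero, so the internal pulse locks to
    the input yet keeps going through gaps and after the input ends.
    """
    onset_steps = set(np.round(np.asarray(onsets) / dt).astype(int))
    phase, ticks = 0.0, []
    for i in range(int(t_end / dt)):
        phase += dt / period                      # intrinsic oscillation
        if i in onset_steps:                      # phase correction at an event
            phase -= coupling * np.sin(2 * np.pi * phase) / (2 * np.pi)
        if phase >= 1.0:                          # one cycle = one felt 'beat'
            phase -= 1.0
            ticks.append(i * dt)
    return ticks

# Isochronous input that stops at 4 s; ticks continue past the silence.
beats = entrained_ticks(np.arange(0.0, 4.0, 0.5))
```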
Neural oscillatory perspectives on rhythm perception have been fruitful
because they rely on the temporal dynamics of rhythm processing instead of
relying on an index of rhythm processing at a single moment in time or as
an average of brain activity over time. One popular way to examine beat
perception has been through the frequency tagging approach. This
methodology allows researchers to characterize at what rates listeners hear
strong events happening, or have heightened attention, in musical rhythms
(e.g., events happening at a rate of 2 or 3 Hz). A landmark study
demonstrated that when participants heard a metronome (i.e., evenly
spaced, unaccented sequence of tones), but were asked to perceive the
metronome in groupings of two or three tones, there was a peak in the
power of the EEG spectrum related to the particular frequency they
imagined. Thus, when participants heard the rhythm in groupings of two,
there was greater power at the frequency related to a binary grouping
compared to the slower frequency related to a ternary grouping and vice
versa (Nozaradan, Peretz, Missal, & Mouraux, 2011). When participants
were trained to move to an ambiguous rhythm in either a duple or a triple
meter, there was subsequently greater power in the EEG spectrum at the
frequency they moved to (Chemin, Mouraux, & Nozaradan, 2014), even
when simply listening and no longer moving. Such findings demonstrate
that beat perception is not simply stimulus-driven, but that listeners can and
do impose a beat on a sequence that can be observed in neural activity.
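In code, the frequency-tagging logic amounts to comparing spectral power at candidate beat and meter frequencies. This sketch is mine, not from the studies above; the 2.4/1.2/0.8 Hz values simply illustrate a beat rate with its binary and ternary grouping frequencies:

```python
import numpy as np

def tagged_power(eeg, sr, freqs):
    """Power of the EEG amplitude spectrum at a set of target frequencies."""
    spectrum = np.abs(np.fft.rfft(eeg)) ** 2
    fft_freqs = np.fft.rfftfreq(len(eeg), d=1.0 / sr)
    return {f: spectrum[np.argmin(np.abs(fft_freqs - f))] for f in freqs}

# e.g., compare power at the binary (1.2 Hz) and ternary (0.8 Hz) grouping
# frequencies of a 2.4 Hz metronome across imagery conditions.
sr = 256
eeg = np.random.randn(60 * sr)  # stand-in for 60 s of recorded EEG
power = tagged_power(eeg, sr, freqs=[0.8, 1.2, 2.4])
```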
Oscillatory activity in particular frequency bands—bands unrelated to the
particular frequency of the input stimulus—is also important for
characterizing rhythm processing. When researchers used
electroencephalography to examine induced activity (i.e., activity not
phase-locked to stimulus onset) rather than evoked activity (i.e., activity
phase-locked to stimulus onset), they found that high-frequency oscillatory
activity from 20–60 Hz followed the pattern observed in many behavioral
studies of rhythm processing—listeners do not simply react to note onsets after they occur.
Instead, they anticipate a beat and even “feel” that beat when a note is
occasionally omitted. These researchers showed that there was a peak in
induced high-frequency oscillations that occurred in anticipation of tone
onsets even when the tone was omitted (Snyder & Large, 2005). Further
studies found separate functional roles for beta (15–30 Hz) and gamma (30–
80 Hz) frequency bands (Fujioka, Trainor, Large, & Ross, 2009). Activity
in the beta band reflected motor processing and was important for
coordinating auditory-motor interactions when processing the beat in music,
whereas gamma-band activity was associated with the same endogenous
anticipatory processing of the beat found in previous studies. In a follow-up
study, Fujioka and colleagues (Fujioka, Trainor, Large, & Ross, 2012)
found that induced beta-band activity increased in anticipation of the beat
and varied with tempo, further suggesting an endogenous generator. In
contrast, a sharp decrease in induced beta occurred immediately after the
onset of the beat, but this decrease followed the same pattern regardless of
stimulus presentation characteristics, suggesting that beta desynchronization
was simply a response to hearing a tone, and not reflective of anticipation.
Importantly, this activity originates both from auditory cortex and from
sensorimotor cortex, again highlighting the role beta-band activity plays in
coordinating auditory-motor interactions (Fujioka et al., 2012). These
auditory-motor interactions in the beta band could have important
consequences for preparation of motor movements that are important in
more ecologically valid musical experiences, such as when the listener
grooves along with the music. Again, these studies highlight that rhythm
processing is not just a faithful tracking of acoustic input, but rather
involves the perception of beat and meter related periodicities that are not
necessarily part of a stimulus. These studies also highlight the integral role
motor processing plays in the perception of the beat in music, even when
listeners are not moving or tapping along to the music.
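A standard way to isolate such induced activity—sketched here under common assumptions, not as the cited studies’ exact pipeline—is to remove the across-trial average (the evoked response), band-pass filter the residual, and take its Hilbert envelope:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def induced_band_envelope(trials, sr, band=(15.0, 30.0)):
    """Mean induced (non-phase-locked) band envelope across trials.

    trials : array of shape (n_trials, n_samples), time-locked to stimulus onset
    band   : frequency band in Hz (15-30 Hz beta here; gamma would be ~30-80 Hz)
    """
    residual = trials - trials.mean(axis=0, keepdims=True)  # remove evoked part
    b, a = butter(4, [band[0] / (sr / 2), band[1] / (sr / 2)], btype="band")
    filtered = filtfilt(b, a, residual, axis=1)
    return np.abs(hilbert(filtered, axis=1)).mean(axis=0)   # induced envelope
```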
There is a growing interest in neuroscientific investigations of rhythm
processing to characterize the way that humans entrain to music by looking
at oscillatory dynamics at the beat frequency and by considering auditory-
motor dynamics in other frequency bands. While many approaches have
shed considerable light on the ways listeners perceive rhythm, it is
important to carefully consider how we interpret whether differences in
peak power of the EEG spectrum are related to beat perception or to stimulus
characteristics (Henry, Herrmann, & Grahn, 2017). Neural resonance theory
has shown that many facets of beat perception in humans emerge naturally
from the physical interactions of multiple internal oscillators and rhythmic
input, but it does not explain all aspects of beat perception, such as how
children learn to be better perceivers of the beat in music, or how beat
perception changes based on culture or musical experience. Future research
is needed to understand more about how oscillatory activity in the brain
interacts with music experience and how such experiences are maintained
or weighted in such a dynamical system.

LANGUAGE AND MUSIC

Music and language are both important forms of human communication.
Although there are many similarities between these two domains, one of the
key similarities related to rhythm processing is that music and language
both unfold sequentially in time and are hierarchically structured (Patel,
2003). Yet the temporal characteristics of how they unfold are different
(Ding et al., 2017). As is clear from the discussion thus far, listeners
perceive a musical beat that, despite surface irregularities, leads to the
perception of beat events unfolding at regular intervals, with alternating
strong and weak beats according to the meter of the music. In language,
there is a long history of debate over whether there are isochronous units in
speech rhythms between successive syllable onsets or between successive
stressed syllable onsets. However, careful annotation of speech intervals at
consonant and vowel onsets has yielded little evidence for regularity
between syllables or stressed syllables in the acoustic signal,
although other patterns of more or less vocalic variability did emerge
(Grabe & Low, 2002; but see Brown, Pfordresher, & Chow, 2017). Despite
the lack of evidence for an isochronous interval in speech, spoken
utterances still contain rhythmic peaks in the acoustic signal—albeit less
regular than those in music—that are important for helping listeners form
expectations. There is growing evidence that better neural tracking of the
rhythmic syllable onsets in speech is important for language comprehension
(e.g., Peelle, Gross, & Davis, 2013). Given the importance of understanding
rhythmic relationships in language, there is growing interest in how musical
training or musical ability is related to language abilities in a wide range of
listeners.
A large body of literature has focused on the relationship between
reading and rhythm processing. For instance, compared to age and reading
level matched peers, individuals with dyslexia are worse at neurally
tracking low-frequency (e.g., delta and theta frequency bands) temporal
information in the speech signal (Power, Colling, Mead, Barnes, &
Goswami, 2016). Delta (0–4 Hz) and theta (4–8 Hz) frequencies roughly
correspond to the rate at which phrases and syllables unfold in the speech
stream, respectively. Some researchers have even posited that the deficits
related to phonological awareness and analysis in developmental dyslexia
are actually caused by temporal processing deficits. In particular, some
evidence suggests that adults with dyslexia oversample the speech stream in
high-frequency oscillatory bands that may be related to phonological
onsets, which leads to greater power at frequencies that may be irrelevant
for processing phonetic information in speech (Lehongre, Ramus,
Villiermet, Schwartz, & Giraud, 2011). Further evidence of a relationship
between language and temporal processing abilities comes from individuals
with specific language impairments, including dyslexia, for whom there are
positive correlations between musical training and language outcomes, with
unique predictive power coming from rhythm perception skills (Flaugnacco
et al., 2015; Habib et al., 2016; Zuk et al., 2017). Enhanced language
processing as a result of music ability may be particularly related to beat-
based processing abilities, and not better encoding of rhythmic intervals in
general, as studies have shown that regularity detection is particularly
related to language and literacy in adults from a wide range of language
backgrounds (Bekius, Cope, & Grube, 2016; Grube, Cooper, & Griffiths,
2013).
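The envelope-tracking measures discussed here typically start from the slow amplitude envelope of the speech signal. A minimal extraction sketch follows (my own, with illustrative cutoff values); correlating the band-limited envelope with similarly filtered EEG/MEG is one simple index of neural speech tracking:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def speech_envelope(audio, sr, cutoff=10.0):
    """Slow amplitude envelope: Hilbert magnitude, then low-pass filtering."""
    env = np.abs(hilbert(audio))
    b, a = butter(4, cutoff / (sr / 2), btype="low")
    return filtfilt(b, a, env)

def band_limited(x, sr, band):
    """Restrict a signal to one band, e.g., delta (~0.5-4 Hz) or theta (4-8 Hz)."""
    b, a = butter(4, [band[0] / (sr / 2), band[1] / (sr / 2)], btype="band")
    return filtfilt(b, a, x)

# e.g., delta-band fluctuations of the envelope, at the rough rate of phrases:
# delta_env = band_limited(speech_envelope(audio, sr), sr, band=(0.5, 4.0))
```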
While the studies outlined above show evidence that neurally following
rhythmic input in speech is important for developing normal language and
reading skills and that music training seems to be related to behavioral
language outcomes, there are very few studies that have compared beat
perception or production abilities and related them directly to neural
tracking of rhythmic input. However, one study has shown that there is an
association between rhythm production abilities in preschoolers and
encoding of fundamental frequency in a single utterance (i.e., “da”) through
auditory brainstem response (Woodruff-Carr, White-Schwoch, Tierney,
Strait, & Kraus, 2014). Further research is necessary to establish a link
between rhythm perception or production and neural tracking of low-
frequency information in speech. Tracking the rhythmic fluctuations in the
amplitude envelope of speech may also be different from the types of
rhythmic entrainment discussed above. Therefore, it is important to
determine whether these language and reading studies rely on similar
mechanisms as musical rhythm entrainment, such that neural activity
can remain entrained to the syllable rate even when the utterance is
removed, or whether these studies represent a more stimulus-dependent
rhythmic processing than musical rhythms.

DEVELOPING RHYTHM

Rhythm perception is an important skill for myriad domains, including
music, language, and movement, among others. The ubiquity of rhythmic
information in our everyday environment highlights the importance of
developing rhythm processing skills early in life. Indeed, rhythm
processing also seems to be important for social-emotional development.
Children who are able to synchronize to music show not only better parsing
of events unfolding in time, but also better social-emotional
processing as a result of synchronization. After playing musical instruments
together, 4-year-olds showed higher rates of spontaneous helping compared
to children who were not encouraged to synchronize their actions with a
partner (Kirschner & Tomasello, 2010). A similar pattern can be found as
young as 14 months of age in a paradigm that induces synchrony between a
child and an adult by having the experimenter bounce the child in an
infant carrier in synchrony with that adult (Cirelli, Einarson, & Trainor,
2014). Infants bounced synchronously showed more prosocial helping
behaviors than children bounced out of synchrony. It is clear that beat
processing is advantageous for normal development, but indexing beat
perception in infancy is difficult given the limitations young infants have
making overt behavioral responses. This is where neural measures are
particularly useful for examining beat perception at the earliest stages of
development.
Newborns listening to musical sequences show larger neural mismatch
responses when an omission occurs on a strong beat compared to a weak
beat (Winkler, Haden, Ladinig, Sziller, & Honing, 2009), providing
evidence for beat processing in humans from birth. Further, using the
frequency-tagging approach described above, infants who heard an
ambiguous rhythm that could be perceived in either a duple or triple meter
had peaks corresponding to the beat and both metrical frequencies (Cirelli,
Spinelli, Nozaradan, & Trainor, 2016). However, infants with either more
experience in music classes or more musically engaged parents showed
greater peaks in the EEG spectrum related to duple compared to triple meter
perception (Brochard et al., 2003). This finding is in line with culture-
specific patterns suggesting that Western listeners prefer simple integer
ratios, with a bias toward duple meter groupings. Later in childhood, even
when children are capable of making behavioral or motor responses, it
remains unclear whether immature beat perception or an immature motor
system is at the root of differences
between the way children and adults process rhythm and beat. For instance,
although children move when they hear music, there is little evidence that
they actually synchronize their movements to the beat, which makes it
unclear whether children are poor beat perceivers or poor dancers. As
described above, beta-band activity reflects auditory-motor interactions, and
researchers have used this approach to show that beat processing may not
become mature until after age 7. Seven-year-olds’ beta-band activity during
beat processing only showed the adult-like pattern of desynchronization and
subsequent rebound in anticipation of the beat for slow rhythms, not fast
rhythms (Cirelli, Bosnyak, et al., 2014). These findings in the beta band
align well with behavioral findings suggesting that sensorimotor
synchronization with music does not reach adult-like accuracy until 8 or 9
years of age (McAuley, Jones, Holub, Johnston, & Miller, 2006). Together
these findings suggest that beat perception is intact from birth, but that the
auditory-motor processing capabilities required to synchronize movements
to music develop well into childhood.

COMPARATIVE PERSPECTIVES: ENTRAINMENT AND RHYTHM PERCEPTION

Clues to the neural mechanisms of rhythm perception can be gleaned from
comparative studies between humans and other species that have similar
abilities, but different brains. Some of the best examples of rhythmic
entrainment come from various bird species, such as cockatiels (Patel,
Iversen, Bregman, & Schulz, 2009), certain parrots (Schachner, Brady,
Pepperberg, & Hauser, 2009), and budgerigars (Hasegawa, Okanoya,
Hasegawa, & Seki, 2011), who can all bob their heads in time with a simple
rhythm. Although none of these animals appear to reach human
sophistication with rhythmic entrainment—for example, they have
difficulty with complex rhythms, or with adaptation to novel tempos—their
ability to synchronize with simple rhythms has led some researchers to
theorize that beat perception is a corollary ability to the development of
vocal learning (Patel, 2006). Although this idea is further supported by the
presence of simple synchronization abilities in other vocal, non-bird species
such as bonobos, chimpanzees, and possibly elephants (Hattori, Tomonaga,
& Matsuzawa, 2013; Large & Gray, 2015; Poole, Tyack, Stoeger-Horwath,
& Watwood, 2005), recent demonstrations of rhythmic entrainment in a sea
lion—a species that is decidedly not a vocal learner—pose complications for the theory
that beat perception is a corollary development to vocal learning (Cook,
Rouse, Wilson, & Reichmuth, 2013). The sea lion not only synchronizes to
simple rhythms, but also satisfies more stringent tests of rhythmic entrainment
previously observed only in humans, such as being able to adapt to changes
in tempo.
Most cross-species research is done on monkeys rather than on better vocal
learners (like the aforementioned bird species), for reasons of convenience
(e.g., established monkey neurophysiology labs, similar brains) and monkeys'
closer evolutionary relationship to humans. Monkey rhythmic entrainment is impoverished by
comparison to humans: macaques can time intervals very accurately (Zarco,
Merchant, Prado, & Mendez, 2009), and can synchronize with simple
isochronous sequences, but their responses are reactive rather than
anticipatory, tending to follow each stimulus rather than predict it. Online
measures of neural activity (i.e., local field
potentials, LFPs) during synchronization tasks indicate that monkeys’
putaminal cells are interval-sensitive, with different populations
representing different durations by bursts of gamma- or beta-band
oscillations (Bartolo, Prado, & Merchant, 2014). Thus, monkeys are good at
timing the individual intervals that make up a rhythm, but they do not
appear to synchronize as accurately as humans do when multiple intervals
are presented in a sequence. Monkeys also do not appear to process non-
isochronous rhythms the way that humans do. EEG studies in monkeys
show no event-related potentials (ERPs) corresponding to unexpected
events (as indexed by the mismatch negativity, or MMN, to unexpected beat
omissions); the MMN represents the detection of something out-of-place—
if the monkeys don’t perceive the beat in the rhythm, then the omissions
won’t be out of place (Honing, Merchant, Háden, Prado, & Bartolo, 2012).
However, simple rhythmic deviants do elicit changes in gaze and expression,
as well as in auditory cortex LFPs (Selezneva et al., 2013), indicating some
sensitivity to rhythmic grouping even without beat perception. Structurally,
these sequential timing tasks rely in humans on the motor cortico-basal-
ganglia-thalamo-cortical (mCBGT) circuit. The monkey analogue to this
network also appears to be heavily involved in motor timing and
sequencing (Merchant, Pérez, Zarco, & Gámez, 2013). However, the
reciprocal connections between auditory cortex and the mCBGT circuit that
exist in humans are not matched in monkeys. Instead, the monkey mCBGT
appears to be more strongly connected to visual cortex, which may explain
why monkeys lack strong rhythmic entrainment, and perform better on
visual synchrony tasks than auditory ones (for a review, see Merchant &
Honing, 2014). This structural discrepancy, along with the behavioral
differences between humans and monkeys, may explain why strong
rhythmic entrainment appears to be a decidedly human ability.
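A simple way to quantify the reactive-versus-anticipatory distinction described above is the mean signed asynchrony between taps and metronome beats. Below is a hedged sketch with simulated tap times; the specific means and variances are invented for illustration and do not come from the studies cited above:

```python
import numpy as np

def mean_asynchrony(tap_times, beat_times):
    """Mean signed tap-beat asynchrony (s): negative values indicate
    anticipatory tapping (human-like), positive values reactive
    tapping (as reported for macaques). Illustrative sketch only."""
    # Match each tap to its nearest metronome beat.
    asynchronies = [tap - beat_times[np.argmin(np.abs(beat_times - tap))]
                    for tap in tap_times]
    return float(np.mean(asynchronies))

beats = np.arange(0, 10, 0.5)                                     # 2 Hz sequence
human_taps = beats + np.random.normal(-0.03, 0.01, beats.size)    # anticipates
monkey_taps = beats + np.random.normal(+0.25, 0.05, beats.size)   # reacts

print("human-like:", mean_asynchrony(human_taps, beats))    # ~ -0.03 s
print("monkey-like:", mean_asynchrony(monkey_taps, beats))  # ~ +0.25 s
```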
Cross-Modal Influences on Rhythm Perception

Thus far, this chapter has focused on rhythm perception in the auditory
modality. However, rhythms include temporally patterned stimuli in any
modality. For example, the isochronous blinking of a car’s turn signal is a
visual rhythm, and your phone’s vibrating notification is a tactile rhythm. In
this section, we will discuss how rhythm is perceived in non-auditory
modalities, focusing on vision. Predictably, the neural correlates of rhythm
perception differ between modalities, but some are also shared. These
shared substrates might be a clue to the neural representation of rhythm in a
pure, temporal sense, uncontaminated by modality-specific processing: the
sine qua non of rhythm perception.
Like audition, vision is sensitive to temporal regularities in the
environment. For example, visual-spatial attention is biased toward reliably
repeating patterns (Zhao, Al-Aidroos, & Turk-Browne, 2013). Unlike
audition, rhythmic visual stimuli, such as a blinking dot, do not lead to a
strong sense of beat: Auditory rhythms are reproduced and remembered
better than visual ones (Glenberg, Mann, Altman, Forman, & Procise, 1989;
Collier & Logan, 2000, respectively). While it is true that audition generally
has better temporal sensitivity than vision (e.g., Goldstone & Lhamon,
1972), this does not explain why auditory rhythms give rise to a sense of
beat and visual ones do not. Recently, researchers have instantiated visual
rhythms with more dynamic stimuli in an attempt to capitalize on the visual
system’s sensitivity to motion and acceleration. A blinking stimulus isn’t
visually natural, but a moving one is. Concordantly, rotating bars and
bouncing balls can give rise to a sense of beat in a manner similar to
auditory stimuli (Grahn, 2012; Hove, Iversen, Zhang, & Repp, 2013;
Iversen, Patel, Nicodemus, & Emmorey, 2015). Even more naturalistic
stimuli, like watching a dancer or following a conductor’s baton, give rise
to timing advantages illustrative of beat perception (Luck & Sloboda, 2009;
Su & Salazar-López, 2016). The message from this new literature is that
although audition surpasses the other modalities in temporal processing,
rhythm processing is possible in other modalities when the stimuli are
crafted to suit that sense's priorities.
Given that visual rhythm processing is possible, how does the brain do
it? One possibility is that visual rhythm processing piggy-backs on the
rhythmically superior auditory and motor resources. According to this view,
visual rhythm perception involves the creation of an internal auditory
rhythm to accompany visual stimuli. Evidence for this perspective was
demonstrated in an fMRI task where participants watched or heard
rhythmic stimuli in counterbalanced blocks of a tempo adaptation task;
visual sequences produced a stronger sense of beat and stronger bilateral
putamen activity when preceded by the auditory task block versus with no
prior auditory experience with the task (Grahn, Henry, & McAuley, 2011).
This change in brain response during the visual task following the auditory
block resembled the activation observed in auditory tasks alone (Grahn &
Brett, 2007). When the visual task preceded the auditory block, there was
no enhancement to rhythm perception or brain response in the basal
ganglia, indicating that it was not simply a practice effect. This study used
blinking visual rhythms that do not readily elicit rhythm perception, so the
authors suggested that the observed behavior and brain responses reflected
the co-opting of typical auditory rhythm perception to achieve the
perception of a visual rhythm. In a later fMRI study with discrete and
moving visual and auditory stimuli, this putamen activity was shown to
reflect a supra-modal rhythm perception response: Activity in the putamen
corresponded to the strength of synchrony with an ongoing rhythm,
regardless of the modality and without prior auditory experience with the
stimuli (Hove, Fairhurst, Kotz, & Keller, 2013). This idea of a supra-modal,
or modality-general, process underpinning rhythm perception received
further support from a study measuring ERPs in response to temporal
expectancy violations in an adaptive tempo task with auditory and visual
stimuli (Pasinski, McAuley, & Snyder, 2016). That study found larger-amplitude
ERPs for the auditory task, but a similar pattern of responses across modalities,
again suggesting the presence of a modality-general rhythm perception
network, likely rooted in the basal ganglia and the motor system.

Individual Differences and Musical Training
Rhythm processing abilities vary widely in the general population. It is not
difficult to run into someone who proclaims that she has two left feet, and
there is evidence of individuals who are actually “beat deaf.” These
individuals cannot align their movements to the beat of a musical piece
despite being able to synchronize to a metronome (Phillips-Silver et al.,
2011). Differences in experience, such as music training, can enhance
rhythm perception and production. But individual differences in abilities
that are associated with beat perception can lead people to encode, store,
and act on auditory information in different ways. For instance, better
auditory short-term memory (STM) and regularity detection are
associated with better rhythm abilities, especially when reproducing
longer rhythms (Grahn & Schuit, 2012). Music training also accounts for
unique variance in rhythm reproduction abilities compared to auditory STM
and regularity detection, although musical training may only influence
rhythm perception abilities in certain tasks (Bauer, Kreutz, & Herrmann,
2015; Grahn & Brett, 2007; Geiser, Ziegler, Jancke, & Meyer, 2009).
Regularity detection is also correlated with activation in auditory-motor
areas, including left SMA and left dorsal and ventral premotor areas, which
may indicate that people who are better at detecting the beat in music also
rely more heavily on transforming rhythms into auditory-motor
representations instead of relying purely on auditory cues (Grahn & Schuit,
2012). These findings are similar to previous work showing that strong beat
perceivers showed greater activation in SMA compared to weak beat
perceivers, when listening to an ambiguous rhythm (Grahn & McAuley,
2009). Individual motor abilities may also be important for predicting
individual differences in preferred tempo (120 bpm or 2 Hz), which is the
rate at which listeners feel most comfortable tapping to music or a
metronome (McAuley et al., 2006). An individual’s specific peak frequency
in the beta range, assessed during a motor tapping task, predicts preferred
tempo (Bauer et al., 2015), providing additional evidence that auditory-
motor interactions can lead to differences in the way that people prefer to
entrain to music.
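As a worked example of how preferred (spontaneous motor) tempo is typically quantified, the sketch below estimates tempo from self-paced tap times. The tap data are simulated, and 60 divided by the median inter-tap interval is one common estimator, not the only one:

```python
import numpy as np

def preferred_tempo_bpm(tap_times):
    """Estimate spontaneous motor tempo from self-paced tap times:
    tempo (bpm) = 60 / median inter-tap interval (s)."""
    itis = np.diff(np.sort(np.asarray(tap_times)))
    return 60.0 / float(np.median(itis))

# Taps roughly every 0.5 s correspond to the ~2 Hz (120 bpm) preferred
# tempo reported by McAuley et al. (2006).
taps = np.cumsum(np.random.normal(0.5, 0.02, 40))
print(round(preferred_tempo_bpm(taps)), "bpm")
```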
Although much of the literature on rhythm and beat perception makes
claims about commonalities across individuals in the neural processing of
rhythms, there is considerable variation in the way humans respond to
rhythms. These individual differences are particularly important to consider
when trying to use rhythm as a therapeutic tool, as in patients with PD.
Although rhythmic stimulation may have seemingly miraculous effects for
some individuals, there are many others for whom rhythmic stimulation
may have no effect or perhaps even a negative effect on gait (Leow, Parrott,
& Grahn, 2014; Nombela et al., 2013). Further research is necessary to
characterize what factors lead to these individual differences in neural
processing of rhythm, including auditory-motor interactions, musical
background, and biological differences to better target interventions to the
individual.

Musical Joint Action

So far, we have examined rhythm from the perspective of the perceiver and his or
her brain. Realistically, rhythms must also have creators, making rhythm
perception an inherently social topic: It depends upon the perception of
others’ actions. One of the themes of this chapter has been the contribution
of the motor system to the perception of rhythm; unsurprisingly, it may be
through the shared architecture of our motor systems that we perceive
music and rhythm so fluently when expressed by other people.
The idea of motor system involvement in rhythm perception follows
from the discovery of the mirror neuron system in monkeys and analogous
systems in humans (see Rizzolatti & Craighero, 2004, for a review), a
network that responds not only to one’s own movements, but also to seeing
or hearing movements of others. This discovery was rapidly adopted to
explain motor simulation: that we unconsciously mimic, or concurrently
represent, others’ movements within our own motor system (Gallese &
Goldman, 1998). It is useful to think of motor simulation as a way to
represent observed actions by the same motoric structures that execute
them. Later, motor simulation was employed to explain the empathic nature
of movement in art (Freedberg & Gallese, 2007). Evidence for the shared
representation of action in art observation has been demonstrated in dance
(Cross, Hamilton, & Grafton, 2006) and painting (Leder, Bär, & Topolinski,
2012; Taylor, Witt, & Grimaldi, 2012), but it is most prominently espoused
in music, explaining findings such as the automatic activation of hand-
controlling motor areas in pianists while listening to piano performance
(Haueisen & Knösche, 2001), the co-activation of auditory areas in
violinists when they mimic violin actions (Lotze, Scheler, Tan, Braun, &
Birbaumer, 2003), and various effects describing interference between
music listening and musical performance, which occurs because both
processes depend upon activation of the motor system (e.g., Drost, Rieger,
Brass, Gunter, & Prinz, 2005; Drost, Rieger, & Prinz, 2007; Taylor & Witt,
2015).
As we have seen, rhythm perception involves the motor system: Feeling
the beat is an inherently motoric phenomenon. We can apply the logic of
motor simulation to the challenging demands of timing and joint action in
music. Perceiving rhythm in a social setting may also require the concurrent
representation of others’ actions in the listener’s motor system. In an
inventive TMS study, pianists were required to play the right-hand part of a
duet where the left-hand part was either rehearsed by them at an earlier time
or unrehearsed. This left-hand part would undergo regular changes in
timing that the subject would adapt to. Right-hemisphere (read: left-hand)
TMS interfered with tempo adaptation only for duets where the subject had
previously rehearsed the left-handed accompanying piece. This indicates
that keeping time with a duet partner involved the online co-representation
of that partner’s piece, which was disrupted by the TMS (Novembre, Ticini,
Schütz-Bosbach, & Keller, 2013). This is evidence for the role of motor
simulation in the perception of rhythm and timing during joint musical
action. This flexible adaptation is not surprising given the motor system’s
ability to represent the temporal dynamics of observed actions (Press, Cook,
Blakemore, & Kilner, 2011). This co-representation of observed and
executed actions is important for any kind of rhythmic cooperation. To
study rhythmic cooperation, a group of researchers created a virtual partner
in an adaptive timing task so that the degree of timing cooperation could be
tightly controlled. Subjects tapping along with the virtual partner’s
changing rhythm exhibited increased activity in premotor areas when the
partner was cooperative versus difficult to follow along with, suggesting
that this kind of rhythmic co-action depends on simulated internal
representations of the co-actor (Fairhurst, Janata, & Keller, 2012).

Conclusion
As we have seen, the neuroscience of rhythm is a vibrant field of study that
can be approached from many angles. Despite this variety, there has been a
common thread throughout this chapter: the involvement of motor
processes during the perception of rhythm and beat. Given the role of the
motor system in fundamental timing processes, it is not surprising that
similar networks should become involved with the perception of rhythm.
What interests us is the reliability and the variety of motor system
participation in rhythm processing, whether it is the recruitment of the basal
ganglia during the perception of strong beats as revealed by fMRI (see
section on “Feeling the Beat”), the heightened corticospinal excitability of
toe-tapping as revealed by TMS (see section on “Feeling the Beat”), the
auditory-motor coordination of beta-band patterns recorded from EEG (see
section on “Oscillatory Mechanisms”), the co-development of movement
and rhythm production in children (see section on “Development of
Rhythm”), or the use of others’ actions to guide joint action (see section on
“Musical Joint Action”). These
are just a few of the many ways in which the auditory and motor systems
interact to produce the rich experience of rhythm perception and
production. Promising new research offers these auditory-motor interactions
as the basis for therapies to help patients with neurodegenerative diseases of
the basal ganglia (e.g., Spaulding et al., 2013), or patients with
developmental language disorders, such as dyslexia (e.g., Flaugnacco et al.,
2015). Our ability to neurally process rhythms is important not only for
being able to clap along to our favorite song, but also for examining
fundamental psychological questions, ranging from individual differences in
perception and production to what distinguishes humans from other
species. Future work on the neural bases of rhythm perception has the
potential to inform a wide range of domains including aesthetics, evolution,
and human perception and production.

References
Bartolo, R., Prado, L., & Merchant, H. (2014). Information processing in the primate basal ganglia
during sensory-guided and internally driven rhythmic tapping. Journal of Neuroscience 34(11),
3910–3923.
Bauer, A. K., Kreutz, G., & Herrmann, C. S. (2015). Individual musical tempo preference correlates
with EEG beta rhythm. Psychophysiology 52(4), 600–604.
Bekius, A., Cope, T., & Grube, M. (2016). The beat to read: A cross-lingual link between rhythmic
regularity perception and reading skill. Frontiers in Human Neuroscience 10, 425. Retrieved from
https://doi.org/10.3389/fnhum.2016.00425
Brochard, R., Abecasis, D., Potter, D., Ragot, R., & Drake, C. (2003). The “ticktock” of our internal
clock: Direct brain evidence of subjective accents in isochronous sequences. Psychological
Science 14(4), 362–366.
Brown, S., Pfordresher, P. Q., & Chow, I. (2017). A musical model of speech rhythm.
Psychomusicology: Music, Mind, and Brain 27(2), 95–112.
Cameron, D. J., Pickett, K. A., Earhart, G. M., & Grahn, J. A. (2016). The effect of dopaminergic
medication on beat-based auditory timing in Parkinson’s disease. Frontiers in Neurology 7, 19.
Retrieved from https://doi.org/10.3389/fneur.2016.00019
Cameron, D. J., Stewart, L., Pearce, M. T., Grube, M., & Muggleton, N. G. (2012). Modulation of
motor excitability by metricality of tone sequences. Psychomusicology: Music, Mind, and Brain
22(2), 122–128.
Chemin, B., Mouraux, A., & Nozaradan, S. (2014). Body movement selectively shapes the neural
representation of musical rhythms. Psychological Science 25(12), 2147–2159.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor
regions of the brain. Cerebral Cortex 18(12), 2844–2854.
Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006). Interactions between auditory and dorsal
premotor cortex during synchronization to musical rhythms. NeuroImage 32(4), 1771–1781.
Cirelli, L. K., Bosnyak, D., Manning, F. C., Spinelli, C., Marie, C., Fujioka, T., … Trainor, L. J.
(2014). Beat-induced fluctuations in auditory cortical beta-band activity: Using EEG to measure
age-related changes. Frontiers in Psychology 5, 742. Retrieved from
https://doi.org/10.3389/fpsyg.2014.00742
Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial
behavior in infants. Developmental Science 17(6), 1003–1011.
Cirelli, L. K., Spinelli, C., Nozaradan, S., & Trainor, L. J. (2016). Measuring neural entrainment to
beat and meter in infants: Effects of music background. Frontiers in Neuroscience 10, 229.
Retrieved from https://doi.org/10.3389/fnins.2016.00229
Collier, G. L., & Logan, G. (2000). Modality differences in short-term memory for rhythms. Memory
& Cognition 28(4), 529–538.
Collier, G. L., & Wright, C. E. (1995). Temporal rescaling of simple and complex ratios in rhythmic
tapping. Journal of Experimental Psychology: Human Perception and Performance 21(3), 602–
627.
Cook, P., Rouse, A., Wilson, M., & Reichmuth, C. (2013). A California sea lion (Zalophus
californianus) can keep the beat: Motor entrainment to rhythmic auditory stimuli in a non vocal
mimic. Journal of Comparative Psychology 127(4), 412–427.
Cooper, G., & Meyer, L. B. (1960). The rhythmic structure of music. Chicago, IL: University of
Chicago Press.
Cross, E. S., Hamilton, A. F. D. C., & Grafton, S. T. (2006). Building a motor simulation de novo:
Observation of dance by dancers. NeuroImage 31(3), 1257–1267.
D’Ausilio, A., Altenmuller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity
of the motor cortex while listening to a rehearsed musical piece. European Journal of
Neuroscience 24(3), 955–958.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in
speech and music. Neuroscience & Biobehavioral Reviews 81(Part B), 181–187.
Drost, U. C., Rieger, M., Brass, M., Gunter, T. C., & Prinz, W. (2005). Action-effect coupling in
pianists. Psychological Research 69(4), 233–241.
Drost, U. C., Rieger, M., & Prinz, W. (2007). Instrument specificity in experienced musicians.
Quarterly Journal of Experimental Psychology 60(4), 527–533.
Epstein, D. (1995). Shaping time: Music, the brain, and performance. New York: Macmillan.
Essens, P. J. (1986). Hierarchical organization of temporal patterns. Perception & Psychophysics
40(2), 69–73.
Fairhurst, M. T., Janata, P., & Keller, P. E. (2012). Being and feeling in sync with an adaptive virtual
partner: Brain mechanisms underlying dynamic cooperativity. Cerebral Cortex 23(11), 2592–2600.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training
increases phonological awareness and reading skills in developmental dyslexia: A randomized
control trial. PLoS ONE 10(9), e0138715.
Freedberg, D., & Gallese, V. (2007). Motion, emotion and empathy in esthetic experience. Trends in
Cognitive Sciences 11(5), 197–203.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2009). Beta and gamma rhythms in human
auditory cortex during musical beat processing. Annals of the New York Academy of Sciences 1169,
89–92.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds
is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802.
Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading.
Trends in Cognitive Sciences 2(12), 493–501.
Geiser, E., Ziegler, E., Jancke, L., & Meyer, M. (2009). Early electrophysiological correlates of meter
and rhythm processing in music perception. Cortex 45(1), 93–102.
Glenberg, A. M., Mann, S., Altman, L., Forman, T., & Procise, S. (1989). Modality effects in the
coding reproduction of rhythms. Memory & Cognition 17(4), 373–383.
Goldstone, S., & Lhamon, W. T. (1972). Auditory-visual differences in human temporal judgment.
Perceptual and Motor Skills 34(2), 623–633.
Grabe, E., & Low, L. (2002). Durational variability in speech and the rhythm class hypothesis. In N.
Warner & C. Gussenhoven (Eds.), Papers in laboratory phonology 7 (pp. 515–546). Berlin:
Mouton de Gruyter.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental
Brain Research 220(1), 51–61.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of
Cognitive Neuroscience 19(5), 893–906.
Grahn, J. A., & Brett, M. (2009). Impairment of beat-based rhythm discrimination in Parkinson’s
disease. Cortex 45(1), 54–61.
Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions
in beat perception: Audition primes vision, but not vice versa. NeuroImage 54(2), 1231–1243.
Grahn, J. A., & McAuley, J. D. (2009). Neural bases of individual difference in beat perception.
NeuroImage 47(4), 1894–1903.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians
and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Grahn, J. A., & Schuit, D. (2012). Individual difference in rhythmic ability: Behavioral and
neuroimaging investigations. Psychomusicology: Music, Mind, and Brain 22(2), 105–121.
Grube, M., Cooper, F. E., & Griffiths, T. D. (2013). Auditory temporal-regularity processing
correlates with language and literacy skill in early adulthood. Cognitive Neuroscience 3(3–4), 225–
230.
Habib, M., Lardy, C., Desiles, T., Commeiras, C., Chobert, J., & Besson, M. (2016). Music and
dyslexia: A new musical training method to improve reading and related disorders. Frontiers in
Psychology 7, 26. Retrieved from https://doi.org/10.3389/fpsyg.2016.00026
Harrington, D. L., Haaland, K. Y., & Hermanowitz, N. (1998). Temporal processing in the basal
ganglia. Neuropsychology 12(1), 3–12.
Hasegawa, A., Okanoya, K., Hasegawa, T., & Seki, Y. (2011). Rhythmic synchronization tapping to
an audio-visual metronome in budgerigars. Scientific Reports 1, 120. doi:10.1038/srep00120
Hattori, Y., Tomonaga, M., & Matsuzawa, T. (2013). Spontaneous synchronized tapping to an
auditory rhythm in a chimpanzee. Scientific Reports 3, 1566. doi:10.1038/srep01566
Haueisen, J., & Knösche, T. R. (2001). Involuntary motor activity in pianists evoked by music
perception. Journal of Cognitive Neuroscience 13(6), 786–792.
Henry, M. J., & Herrmann, B. (2014). Low-frequency neural oscillations support dynamic attending
in temporal context. Timing and Time Perception 2(1), 62–86.
Henry, M. J., Herrmann, B., & Grahn, J. A. (2017). What can we learn about beat perception by
comparing brain signals and stimulus envelopes? PLoS ONE 12(2), e0172454.
Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains slow neural oscillations and
optimizes human listening behavior. Proceedings of the National Academy of Sciences 109(49),
20095–20100.
Honing, H., Merchant, H., Háden, G. P., Prado, L., & Bartolo, R. (2012). Rhesus monkeys (Macaca
mulatta) detect rhythmic groups in music, but not the beat. PloS ONE 7(12), e51369.
Hove, M. J., Fairhurst, M. T., Kotz, S. A., & Keller, P. E. (2013). Synchronizing with auditory and
visual rhythms: An fMRI assessment of modality differences and modality appropriateness.
NeuroImage 67, 313–321.
Hove, M. J., Iversen, J. R., Zhang, A., & Repp, B. H. (2013). Synchronization with competing visual
and auditory rhythms: Bouncing ball meets metronome. Psychological Research 77(4), 388–398.
Iversen, J. R., Patel, A. D., Nicodemus, B., & Emmorey, K. (2015). Synchronization to auditory and
visual rhythms in hearing and deaf individuals. Cognition 134, 232–244.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review
96(3), 459–491.
Kirschner, S., & Tomasello, M. (2010). Joint music making promotes prosocial behavior in 4-year-
old children. Evolution and Human Behavior 31(5), 354–364.
Large, E. W. (2008). Resonating to musical rhythm: Theory and experiment. In S. Grondin (Ed.),
Psychology of time (pp. 189–231). Bingley: Emerald.
Large, E. W., & Gray, P. M. (2015). Spontaneous tempo and rhythmic entrainment in a bonobo (Pan
paniscus). Journal of Comparative Psychology 129(4), 317–328.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying
events. Psychological Review 106(1), 119–159.
Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York
Academy of Sciences 1169, 46–57.
Leder, H., Bär, S., & Topolinski, S. (2012). Covert painting simulations influence aesthetic
appreciation of artworks. Psychological Science 23(12), 1479–1481.
Lehongre, K., Ramus, F., Villiermet, N., Schwartz, D., & Giraud, A.-L. (2011). Altered low-gamma
sampling in auditory cortex accounts for the three main facets of dyslexia. Neuron 72(6), 1080–
1090.
Leow, L.-A., Parrott, T., & Grahn, J. A. (2014). Individual differences in beat perception affect gait
responses to low- and high-groove music. Frontiers in Human Neuroscience 8, 1–12. Retrieved
from https://doi.org/10.3389/fnhum.2014.00811
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT
Press.
Lotze, M., Scheler, G., Tan, H. R., Braun, C., & Birbaumer, N. (2003). The musician’s brain:
Functional imaging of amateurs and professionals during performance and imagery. NeuroImage
20(3), 1817–1829.
Luck, G., & Sloboda, J. A. (2009). Spatio-temporal cues for visually mediated synchronization.
Music Perception: An Interdisciplinary Journal 26(5), 465–473.
McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our
lives: Life span development of timing and event tracking. Journal of Experimental Psychology:
General 135(3), 348–367.
Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment?
Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience 7, 274.
Retrieved from https://doi.org/10.3389/fnins.2013.00274
Merchant, H., Pérez, O., Zarco, W., & Gámez, J. (2013). Interval tuning in the primate medial
premotor cortex as a general timing mechanism. Journal of Neuroscience 33(21), 9082–9096.
Nombela, C., Rae, C. L., Grahn, J. A., Barker, R. A., Owen, A. M., & Rowe, J. B. (2013). How often
does music and rhythm improve patients’ perception of motor symptoms in Parkinson’s disease?
Journal of Neurology 260(5), 1404–1405.
Novembre, G., Ticini, L. F., Schütz-Bosbach, S., & Keller, P. E. (2013). Motor simulation and the
coordination of self and other in real-time joint action. Social Cognitive and Affective
Neuroscience 9(8), 1062–1068.
Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the neuronal entrainment to
beat and meter. Journal of Neuroscience 31(28), 10234–10240.
Pasinski, A. C., McAuley, J. D., & Snyder, J. S. (2016). How modality specific is processing of
auditory and visual rhythms? Psychophysiology 53(2), 198–208.
Patel, A. D. (2003). Rhythm in language and music. Annals of the New York Academy of Sciences
999, 140–143.
Patel, A. D. (2006). Musical rhythm, linguistic rhythm, and human evolution. Music Perception: An
Interdisciplinary Journal 24(1), 99–104.
Patel, A. D., Iversen, J. R., Bregman, M. R., & Schulz, I. (2009). Experimental evidence for
synchronization to a musical beat in a nonhuman animal. Current Biology 19(10), 827–830.
Patel, A. D., Iversen, J. R., Chen, Y., & Repp, B. H. (2005). The influence of metricality and
modality on synchronization with a beat. Experimental Brain Research 163(2), 226–238.
Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked responses to speech in human auditory
cortex are enhanced during comprehension. Cerebral Cortex 23(6), 1378–1387.
Phillips-Silver, J., Toiviainen, P., Gosselin, N., Piche, O., Nozaradan, S., Palmer, C., & Peretz, I.
(2011). Born to dance but beat deaf: A new form of congenital amusia. Neuropsychologia 49(5),
961–969.
Picton, T. W., John, M. S., Dimitrijevic, A., & Purcell, D. (2003). Human auditory steady-state
responses. International Journal of Audiology 42, 177–219.
Poole, J. H., Tyack, P. L., Stoeger-Horwath, A. S., & Watwood, S. (2005). Animal behaviour:
Elephants are capable of vocal learning. Nature 434(7032), 455–456.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception: An
Interdisciplinary Journal 2(4), 411–440.
Povel, D. J., & Okkerman, H. (1981). Accents in equitone sequences. Perception & Psychophysics
30(6), 565–572.
Power, A. J., Colling, L. J., Mead, N., Barnes, L., & Goswami, U. (2016). Neural encoding of the
speech envelope by children with developmental dyslexia. Brain and Language 160, 1–10.
Press, C., Cook, J., Blakemore, S. J., & Kilner, J. (2011). Dynamic modulation of human motor
activity when observing actions. Journal of Neuroscience 31(8), 2792–2800.
Ravignani, A., Delgado, T., & Kirby, S. (2017). Musical evolution in the lab exhibits rhythmic
universals. Nature Human Behaviour 1(1), 0007. doi:10.1038/s41562-016-0007
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience
27, 169–192.
Schachner, A., Brady, T. F., Pepperberg, I. M., & Hauser, M. D. (2009). Spontaneous motor
entrainment to music in multiple vocal mimicking species. Current Biology 19(10), 831–836.
Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscillations and
visual amplification of speech. Trends in Cognitive Sciences 12(3), 106–113.
Schwartze, M., Keller, P. E., Patel, A. D., & Kotz, S. A. (2011). The impact of basal ganglia lesions
on sensorimotor synchronization, spontaneous motor tempo, and the detection of tempo changes.
Behavioural Brain Research 216(2), 685–691.
Selezneva, E., Deike, S., Knyazeva, S., Scheich, H., Brechmann, A., & Brosch, M. (2013). Rhythm
sensitivity in macaque monkeys. Frontiers in Systems Neuroscience 7. Retrieved from
https://doi.org/10.3389/fnsys.2013.00049
Snyder, J. S., & Large, E. W. (2005). Gamma-band activity reflects the metric structure of rhythmic
tone sequences. Cognitive Brain Research 24(1), 117–126.
Spaulding, S. J., Barber, B., Colby, M., Cormack, B., Mick, T., & Jenkins, M. E. (2013). Cueing and
gait improvement among people with Parkinson’s disease: A meta-analysis. Archives of Physical
Medicine and Rehabilitation 94(3), 562–570.
Stupacher, J., Hove, M. J., Novembre, G., Schutz-Bosbach, S., & Keller, P. E. (2013). Musical
groove modulates motor cortex excitability: A TMS investigation. Brain and Cognition 82(2),
127–136.
Su, Y. H., & Salazar-López, E. (2016). Visual timing of structured dance movements resembles
auditory rhythm perception. Neural Plasticity 2016, 1678390. doi:10.1155/2016/1678390
Taylor, J. E. T., & Witt, J. K. (2015). Listening to music primes space: Pianists, but not novices,
simulate heard actions. Psychological Research 79(2), 175–182.
Taylor, J. E. T., Witt, J. K., & Grimaldi, P. J. (2012). Uncovering the connection between artist and
audience: Viewing painted brushstrokes evokes corresponding action representations in the
observer. Cognition 125(1), 26–36.
Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based
and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812.
Winkler, I., Haden, G. P., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the
beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471.
Woodruff Carr, K., White-Schwoch, T., Tierney, A. T., Strait, D. L., & Kraus, N. (2014). Beat
synchronization predicts neural speech encoding and reading readiness in preschoolers.
Proceedings of the National Academy of Sciences 111(40), 14559–14564.
Zarco, W., Merchant, H., Prado, L., & Mendez, J. C. (2009). Subsecond timing in primates:
Comparison of interval production between human subjects and rhesus monkeys. Journal of
Neurophysiology 102(6), 3191–3202.
Zhao, J., Al-Aidroos, N., & Turk-Browne, N. B. (2013). Attention is spontaneously biased toward
regularities. Psychological Science 24(5), 667–677.
Zuk, J., Bishop-Liebler, P., Ozernov-Palchik, O., Moore, E., Overy, K., Welch, G., & Gaab, N.
(2017). Revisiting the “enigma” of musicians with dyslexia: Auditory sequencing and speech
abilities. Journal of Experimental Psychology: General 146(4), 495–511.
CHAPTER 9

NEURAL BASIS OF MUSIC PERCEPTION: MELODY, HARMONY, AND TIMBRE

STEFAN KOELSCH

“Music” is a special case of sound. As opposed to animal song and
drumming (e.g., birdsong, ape drumming, etc.), music is produced by
humans. As opposed to noise, and noise-textures (e.g., wind, fire crackling,
rain, water bubbling, etc.), musical sounds have a structural organization. In
the time domain, the most fundamental principle of musical structure is the
temporal organization of sounds based on an isochronous grid (the tactus, or
“beat”), although there are notable exceptions (such as some kinds of
meditation music, or some pieces of modern art music, such as the famous
Atmosphères by Ligeti). In the frequency (pitch) domain, the most
fundamental principle of musical structure is an organization of pitches
according to the overtone series, resulting in simple (e.g., pentatonic)
scales. Note that the production of overtone-based scales is, in turn, rooted
in the perceptual properties of the auditory system, especially in octave and
“fifth equivalence” (Terhardt, 1991), and that inharmonic spectra (e.g., of
inharmonic metallophones) give rise to different scales, such as the pelog
and slendro scales (Sethares, 2005). Thus, for a vast amount of musical
traditions around the globe and through human history, these two principles
(isochronous beat and scale-pitch) build the nucleus of a universal musical
grammar. Out of this nucleus, a seemingly infinite number of musical
systems, styles, and compositions evolved, and this evolution appears to
have followed computational principles described, for example, by the
Chomsky hierarchy and its extensions (Rohrmeier, Zuidema, Wiggins, &
Scharff, 2015), that is, local relationships between sounds based on a
finite-state grammar, nonlocal relationships between sounds based on a context-
free grammar, and possibly a context-sensitive grammar (Rohrmeier et al.,
2015). Note that the term “language” also refers to structured sounds that
are produced by humans. Similar to music, spoken language also has
melody, rhythm, accents, and timbre. However, in language normally only
one individual speaks at a time (otherwise the language cannot be
understood, and the sound is unpleasant). By contrast, music immediately
affords, by virtue of its fundamental structural principles, that several
individuals produce sounds together (while the music still makes sense and
sounds good). In this sense, language is the music of the individual, and
music is the language of the group. The fact that music can only be
produced by humans is afforded by the uniquely human ability to
synchronize movements (including vocalizations) flexibly in a group to an
external pulse (see also Merchant & Honing, 2014; Merker, Morley, &
Zuidema, 2015). Finally, several scholars have noted that “language,” in
turn, is a special case of music. For example, Ulrich (personal
communication) once noted that language is music distorted by
(propositional) semantics. In this regard, the terms “music” and “language”
both refer to structured sounds that are produced by humans as a means of
social interaction, expression, diversion, or evocation of emotion, with
language, in addition, affording the property of propositional semantics.
The following sections will review neuroscientific research about the
perception of musical sounds, in particular with regard to the structural
processing of melodies and harmonies.
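The overtone-based pitch organization described above can be made concrete with a little arithmetic: folding the overtone series of a fundamental into a single octave (octave equivalence) yields the simple frequency ratios from which pentatonic and other simple scales are built. A minimal numeric sketch (the 100 Hz fundamental is an arbitrary choice):

```python
# The first overtones of a 100 Hz fundamental, reduced into one octave
# (ratios between 1 and 2), yield the fifth (3/2), major third (5/4),
# etc. -- the raw material of simple, overtone-based scales.
from fractions import Fraction

f0 = 100  # fundamental frequency in Hz (arbitrary choice)
for n in range(1, 9):
    ratio = Fraction(n, 1)
    while ratio >= 2:          # octave equivalence: fold into one octave
        ratio /= 2
    print(f"overtone {n}: {n * f0:4d} Hz -> ratio {ratio} above the tonic")
```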

We Do Not Only Hear with Our Cochlea
The auditory system evolved phylogenetically from the vestibular system.
Interestingly, the vestibular nerve contains a substantial number of
acoustically responsive fibers. The otolith organs (saccule and utricle) are
sensitive to sounds and vibrations (Todd, Paillard, Kluk, Whittle, &
Colebatch, 2014), and the vestibular nuclear complex in the brainstem
exerts a major influence on spinal (and ocular) motoneurons in response to
loud sounds with low frequencies, or with sudden onsets (Todd et al., 2014;
Todd & Cody, 2000). Moreover, both the vestibular nuclei and the auditory
cochlear nuclei in the brainstem project to the reticular formation (also in
the brainstem), and the vestibular nucleus also projects to the parabrachial
nucleus, a convergence site for vestibular, visceral, and autonomic
processing in the brainstem (Balaban & Thayer, 2001; Kandler & Herbert,
1991). Such projections initiate and support movements and contribute to
the arousing effects of music. Thus, subcortical processing of sounds gives
rise not only to auditory sensations but also to muscular and
autonomic responses, and the stimulation of motoneurons and autonomic
neurons by low-frequency beats might contribute to the human impetus to
“move to the beat” (Grahn & Rowe, 2009; Todd & Cody, 2000). In addition
to vibrations of the vestibular apparatus and cochlea, sounds also evoke
resonances in vibration receptors, that is, in the Pacinian corpuscles (which
are sensitive from 10 Hz to a few kHz, and located mainly in the skin, the
retroperitoneal space in the belly, the periosteum of the bones, and the sex
organs), and maybe even responses in mechanoreceptors of the skin that
detect pressure. The famous international concert percussionist Dame
Evelyn Glennie is profoundly deaf and hears mainly through vibrations felt
in the skin (personal communication with Dame Glennie), and probably in
the vestibular organ. Thus, we hear not only with our cochlea, but also
with the vestibular apparatus and mechanoreceptors distributed throughout
our body.

Auditory Feature Extraction in the Brainstem and Thalamus

Neural activity originating in the auditory nerve is progressively
transformed in the auditory brainstem, as indicated by different neural
response properties for the periodicity of sounds, timbre (including
roughness, or consonance/dissonance), sound intensity, and interaural
disparities in the superior olivary complex and the inferior colliculus
(Geisler, 1998; Langner & Ochse, 2006; Pickles, 2008; Sinex, Guzik, Li, &
Henderson Sabes, 2003). Already the inferior colliculi can initiate flight-
and defensive behavior in response to threatening stimuli (even before the
acoustic information reaches the auditory cortex; Cardoso, Coimbra, &
Brandão, 1994; Lamprea et al., 2002), providing evidence of relatively
elaborated auditory processing already in the brainstem. This stays in
contrast to the visual system: already Philip Bard (1934) observed that
decortication (removing the neocortex) led to blindness in cats and dogs,
but not to deafness. Although the hearing thresholds appeared to be
elevated, the animals were capable of differentiating sounds. From the
thalamus (particularly over the medial geniculate body) neural impulses are
mainly projected into the auditory cortex (but note that the thalamus also
projects auditory impulses into the amygdala and the medial orbitofrontal
cortex; Kaas, Hackett, & Tramo, 1999; LeDoux, 2000; Öngür & Price,
2000). The exact mechanisms underlying pitch perception are not known
(and will not be discussed here), but it is clear that both space information
(originating from the tonotopic organization of the cochlea) and time
information (originating from the integer time intervals of neural spiking in
the auditory nerve) contribute to pitch perception (Moore, 2008).
Importantly, the auditory pathway consists not only of bottom-up but
also of top-down projections; nuclei such as the dorsal nucleus of the
inferior colliculus presumably receive even more descending than
ascending projections from diverse auditory cortical fields (Huffman &
Henson, 1990). Given the massive top-down projections within the auditory
pathway, it also becomes increasingly obvious that top-down predictions
play an important role in pitch perception (Malmierca, Anderson, &
Antunes, 2015). Within the predictive coding framework (currently one of
the dominant theories on sensory perception), such top-down projections
are thought to afford passing on backward predictions, while forward
sensory information is passed bottom-up, signaling prediction errors, that is,
sensory information that does not match a prediction (Friston, 2010).
Numerous studies have investigated the decoding of frequency information in the
auditory brainstem using the frequency-following response (FFR; Kraus &
Chandrasekaran, 2010). The FFR can be elicited pre-attentively and is
thought to originate mainly from the inferior colliculus (but note also that it
is likely that the auditory cortex is at least partly involved in shaping the
FFRs, e.g., by virtue of top-down projections to the inferior colliculus,
referred to above). Using FFRs, Wong and colleagues (Wong, Skoe, Russo,
Dees, & Kraus, 2007) measured brainstem responses to three Mandarin
tones that differed only in their (F0) pitch contours. Participants were
amateur musicians and non-musicians, and results revealed that musicians
had more accurate encoding of the pitch contour of the phonemes (as
reflected in the FFRs) than non-musicians. This finding indicates that the
auditory brainstem is involved in the encoding of pitch contours of speech
information (vowels), and that the correlation between the FFRs and the
properties of the acoustic information is modulated by musical training.
Similar training effects on FFRs elicited by syllables with a dipping pitch
contour have also been observed in native English speakers (non-musicians)
after a training period of 14 days (with eight 30-minute sessions; Song,
Skoe, Wong, & Kraus, 2008). The latter results show the contribution of the
brainstem in language learning and its neural plasticity in adulthood.
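To illustrate the kind of analysis behind such FFR findings, the sketch below tracks an F0 contour frame by frame via autocorrelation and correlates the "response" contour with the stimulus contour. The synthetic signal, noise level, and frame size are assumptions; real FFR studies use averaged brainstem recordings and more robust pitch trackers:

```python
import numpy as np

def track_f0(signal, fs, frame=0.040, fmin=80.0, fmax=400.0):
    """Frame-wise F0 estimate via autocorrelation -- a simplified stand-in
    for FFR pitch-tracking analyses."""
    n = int(frame * fs)
    lags = np.arange(int(fs / fmax), int(fs / fmin))
    contour = []
    for start in range(0, len(signal) - n, n):
        x = signal[start:start + n] - signal[start:start + n].mean()
        ac = np.correlate(x, x, mode="full")[x.size - 1:]
        contour.append(fs / lags[np.argmax(ac[lags])])
    return np.array(contour)

# Encoding fidelity: correlate the response contour with the stimulus
# contour (e.g., a dipping Mandarin-like tone), as in Wong et al. (2007).
fs = 8000.0
t = np.arange(0, 0.4, 1 / fs)
stim_f0 = 220 - 80 * np.sin(np.pi * t / 0.4)           # dipping contour
stimulus = np.sin(2 * np.pi * np.cumsum(stim_f0) / fs)
response = stimulus + 0.5 * np.random.randn(t.size)    # noisy "FFR"
r = np.corrcoef(track_f0(response, fs), track_f0(stimulus, fs))[0, 1]
print("stimulus-to-response pitch correlation:", round(r, 2))
```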
A study by Strait and colleagues (Strait, Kraus, Skoe, & Ashley, 2009)
also reported musical training effects on the decoding of the acoustic
features of an affective vocalization (an infant’s unhappy cry), as reflected
in auditory brainstem potentials. This suggests (a) that the auditory
brainstem is involved in the auditory processing of communicated states of
emotion (which substantially contributes to the decoding and understanding
of affective prosody), and (b) that musical training can lead to a finer tuning
of such (subcortical) processing.

Acoustical Equivalency of “Timbre” and “Phoneme”
With regard to a comparison between music and speech, it is worth
mentioning that, in terms of acoustics, there is no difference between a
phoneme and the timbre of a musical sound (and it is only a matter of
convention that some phoneticians rather use terms such as “vowel quality”
or “vowel color,” instead of “timbre”). Both are characterized by the two
physical correlates of timbre: spectrum envelope (i.e., differences in the
relative amplitudes of the individual “harmonics,” or “overtones”) and
amplitude envelope (also sometimes called the amplitude contour or energy
contour of the sound wave, i.e., the way that the loudness of a sound
changes over time, particularly with regard to the on- and offset of a sound).
Aperiodic sounds can also differ in spectrum envelope (see, e.g., the
difference between /ʃ/ and /s/), and timbre differences related to amplitude
envelope play a role in speech, e.g., in the shape of the attack for /b/ vs. /w/
and /ʃ/ vs. /tʃ/.
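The two physical correlates of timbre named above can be computed directly. The following sketch uses a synthetic tone (all frequencies and amplitudes are illustrative) to extract a spectral envelope as the relative magnitudes of the harmonics, and an amplitude envelope by rectifying and smoothing the waveform:

```python
import numpy as np

fs = 16000.0
t = np.arange(0, 0.5, 1 / fs)

# A synthetic "vowel-like" tone: harmonics of 150 Hz whose relative
# amplitudes define the spectral envelope, shaped by a slow attack
# (the amplitude envelope). All parameters are illustrative.
harmonic_amps = {150: 1.0, 300: 0.6, 450: 0.8, 600: 0.2}
sound = sum(a * np.sin(2 * np.pi * f * t) for f, a in harmonic_amps.items())
sound *= np.minimum(t / 0.05, 1.0)          # 50 ms linear attack

# Spectral envelope: relative magnitudes at the harmonic frequencies.
spectrum = np.abs(np.fft.rfft(sound)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
for f in harmonic_amps:
    print(f, "Hz:", round(spectrum[np.argmin(np.abs(freqs - f))], 3))

# Amplitude envelope: rectify and smooth with a short moving average.
win = int(0.01 * fs)
envelope = np.convolve(np.abs(sound), np.ones(win) / win, mode="same")
```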

Auditory Feature Extraction in the Auditory


Cortex
As mentioned earlier, auditory information is projected mainly via the
subdivisions of the medial geniculate body into the primary auditory cortex
(PAC, corresponding to Brodmann’s area 41) and adjacent secondary
auditory fields (corresponding to Brodmann’s areas 42 and 52; for a
detailed description of primary auditory “core,” and secondary auditory
“belt” fields, as well as their connectivity, see Kaas & Hackett, 2000). With
regard to the functional properties of primary and secondary auditory fields,
a study by Petkov and colleagues (Petkov, Kayser, Augath, & Logothetis,
2006) showed that, in the macaque monkey, all of the PAC core areas, and
most of the surrounding belt areas, show a tonotopic organization (the
tonotopic organization is clearest in the field A1, and some belt areas seem
to show only weak, or no, tonotopic organization). These auditory areas
perform a more fine-grained, and more specific, analysis of acoustic
features compared to the auditory brainstem. For example, Tramo and
colleagues (Tramo, Shah, & Braida, 2002) reported that a patient with
bilateral lesion of the PAC (a) had normal detection thresholds for sounds
(i.e., the patient could say whether there was a tone or not), but (b) had
elevated thresholds for determining whether two tones had the same pitch
or not (i.e., the patient had difficulties to detect fine-grained frequency
differences between two subsequent tones), and (c) had markedly increased
thresholds for determining the pitch direction (i.e., the patient had great
difficulty saying whether the second tone was higher or lower in pitch
than the first tone, even though he could tell that the two tones differed).1 Note
that the auditory cortex is also involved in a number of other functions,
such as auditory sensory memory, extraction of inter-sound relationships,
discrimination and organization of sounds as well as sound patterns, stream
segregation, automatic change detection, and multisensory integration (for
reviews see Hackett & Kaas, 2004; Winkler, 2007; some of these functions
are also mentioned further in the following). Moreover, the (primary)
auditory cortex is involved in the transformation of acoustic features (such
as frequency information) into percepts (such as pitch height and pitch
chroma). For example, a sound with the frequencies 200 Hz, 300 Hz, and
400 Hz is transformed into the pitch percept of 100 Hz. Lesions of the
(right) PAC result in a loss of the ability to perceive residue pitch (or
“virtual pitch”) in both animals (Whitfield, 1980) and humans (Zatorre,
1988), and neurons in the anterolateral region of the PAC show responses to
a missing fundamental frequency (Bendor & Wang, 2005). Moreover,
magnetoencephalographic (MEG) data indicate that response properties in
the PAC depend on whether or not a missing fundamental of a complex
tone is perceived (Patel & Balaban, 2001; data were obtained from
humans). Note, however, that combination tones emerge already in the
cochlea, and that the periodicity of complex tones is coded in the spike
pattern of auditory brainstem neurons; therefore, different mechanisms
contribute to the perception of residue pitch on at least three different levels
(basilar membrane, brainstem, and auditory cortex). However, the studies
by Zatorre (1988) and Whitfield (1980) suggest that, compared to the
brainstem or the basilar membrane, the auditory cortex plays a more
prominent role for the transformation of acoustic features into auditory
percepts (such as the transformation of information about the frequencies of
a complex sound, as well as about the periodicity of a sound, into a pitch
percept).
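The 200/300/400 Hz example from the text can be verified numerically: although no energy is present at 100 Hz, the waveform repeats every 10 ms, and a simple autocorrelation-based pitch estimate (one of several possible residue-pitch models, sketched here under simplified assumptions) recovers the 100 Hz percept:

```python
import numpy as np

fs = 16000.0
t = np.arange(0, 0.2, 1 / fs)

# The example from the text: partials at 200, 300, and 400 Hz, with no
# energy at the 100 Hz fundamental.
sound = sum(np.sin(2 * np.pi * f * t) for f in (200.0, 300.0, 400.0))

# The waveform's periodicity still corresponds to 100 Hz, which an
# autocorrelation picks out as the strongest non-zero lag.
ac = np.correlate(sound, sound, mode="full")[sound.size - 1:]
lags = np.arange(int(fs / 500), int(fs / 50))   # search 50-500 Hz
pitch = fs / lags[np.argmax(ac[lags])]
print("perceived (residue) pitch estimate:", round(pitch, 1), "Hz")  # ~100.0
```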
Warren and colleagues (Warren, Uppenkamp, Patterson, & Griffiths,
2003) reported that changes in pitch chroma involve auditory regions
anterior of the PAC (covering parts of the planum polare) more strongly
than changes in pitch height. Conversely, changes in pitch height appear to
involve auditory regions posterior of the PAC (covering parts of the planum
temporale) more strongly than changes in pitch chroma (Warren et al.,
2003). Moreover, with regard to functional differences between the left and
the right PAC, as well as neighboring auditory association cortex, several
studies suggest that the left auditory cortex (AC) has a higher resolution of
temporal information than the right AC, and that the right AC has a higher
spectral resolution than the left AC (Hyde, Peretz, & Zatorre, 2008; Perani
et al., 2010; Zatorre, Belin, & Penhune, 2002).
Finally, the auditory cortex also prepares acoustic information for further
conceptual and conscious processing. For example, with regard to the
meaning of sounds, even a short single tone can sound “bright,” “rough,”
or “dull.” That is, the timbre of a single sound is already
capable of conveying meaning information.
Operations within the (primary and adjacent) auditory cortex related to
auditory feature analysis are reflected in electrophysiological recordings in
brain-electric responses that have latencies of about 10 to 100 ms,
particularly middle-latency responses, including the auditory P1 (a response
with positive polarity and a latency of around 50 ms), and the later auditory
N100 component (the N1 is a response with negative polarity and a latency
of around 100 ms). Such brain-electric responses are also referred to as
“event-related potentials” (ERPs) or “evoked potentials.”

Echoic Memory and Gestalt Formation

While auditory features are extracted, the acoustic information enters the
auditory sensory memory (or “echoic memory”), and representations of
auditory Gestalten (Griffiths & Warren, 2004) or “auditory objects” are
formed. The auditory sensory memory (ASM) retains information only for a
few seconds, and information stored in the ASM fades quickly. The ASM is
thought to store physical features of sounds (such as pitch, intensity,
duration, location, timbre, etc.), sound patterns, and even abstract features
of sound patterns (e.g., Paavilainen, Simola, Jaramillo, Näätänen, &
Winkler, 2001). Operations of the ASM are at least partly reflected
electrically in the mismatch negativity (MMN, e.g., Näätänen, Tervaniemi,
Sussman, Paavilainen, & Winkler, 2001). The MMN is an ERP with
negative polarity and a peak-latency of about 100–200 ms and appears to
receive its main contributions from neural sources located in the PAC and
adjacent auditory (belt) fields, with additional (but smaller) contributions
from frontal cortical areas (for reviews, see Deouell, 2007; Schönwiesner et
al., 2007).
Auditory sensory memory operations are indispensable for music
perception; therefore, practically all MMN studies are inherently related to,
and relevant for, the understanding of the neural correlates of music
processing. As will be outlined below, numerous MMN studies have
contributed to this issue (a) by investigating different response properties of
the ASM to musical and speech stimuli, (b) by using melodic and rhythmic
patterns to investigate auditory Gestalt formation, and/or (c) by studying
effects of long- and short-term musical training on processes underlying
ASM operations. The latter studies in particular have contributed substantially
to our understanding of neuroplasticity (i.e., to changes in neuronal
structure and function due to experience), and thus to our understanding of
the neural basis of learning (for a review see Tervaniemi, 2009). Here,
suffice it to say that MMN studies showed effects of long-term musical
training on the processing of sound localization, pitch, melody, rhythm,
musical key, timbre, tuning, and timing (e.g., Koelsch, Schröger, &
Tervaniemi, 1999; Putkinen, Tervaniemi, Saarikivi, de Vent, & Huotilainen,
2014; Rammsayer & Altenmüller, 2006; Tervaniemi, Castaneda, Knoll, &
Uther, 2006; Tervaniemi, Janhunen, Kruck, Putkinen, & Huotilainen, 2016).
Auditory oddball paradigms were also used to investigate processes of
melodic and rhythmic grouping of tones occurring in tone patterns (such
grouping is essential for auditory Gestalt formation, see also Sussman,
2007), as well as effects of long-term musical training on these processes.
These studies showed effects of musical training (a) on the processing of
melodic patterns (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004;
Tervaniemi, Ilvonen, Karma, Alho, & Näätänen, 1997; Tervaniemi,
Rytkönen, Schröger, Ilmoniemi, & Näätänen, 2001; Zuijen, Sussman,
Winkler, Näätänen, & Tervaniemi, 2004; in these studies, patterns consisted
of four or five tones), (b) on the encoding of the number of elements in a
tone pattern (Zuijen, Sussman, Winkler, Näätänen, & Tervaniemi, 2005),
and (c) on the processing of patterns consisting of two voices (Fujioka,
Trainor, Ross, Kakigi, & Pantev, 2005).
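As a concrete sketch of the oddball designs used in such studies, the following code generates a sequence of frequent standard tone patterns with rare deviants. The deviance probability, pattern pitches, and the no-two-deviants-in-a-row constraint are illustrative assumptions rather than parameters of any specific study:

```python
import random

# A minimal auditory oddball design: a frequent "standard" 4-tone
# pattern with rare deviants differing in the final tone.
standard = [440.0, 494.0, 523.0, 587.0]       # 4-tone standard pattern
deviant  = [440.0, 494.0, 523.0, 659.0]       # last tone changed

sequence = []
for _ in range(100):
    # 10% deviants, never two in a row (a common constraint).
    if random.random() < 0.10 and (not sequence or sequence[-1] != deviant):
        sequence.append(deviant)
    else:
        sequence.append(standard)

print(sum(p == deviant for p in sequence), "deviant patterns out of 100")
```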
The formation of auditory Gestalten entails processes of perceptual
separation, as well as processes of melodic, rhythmic, timbral, and spatial
grouping. Such processes have been summarized under the concepts of
auditory scene analysis and auditory stream segregation (Bregman, 1994).
Grouping of acoustic events follows Gestalt principles such as similarity,
proximity, and continuity (for acoustic cues used for perceptual separation
and auditory grouping see Darwin, 1997, 2008). In everyday life, such
operations are not only important for music processing, but also, for
instance, for separating a speaker’s voice during a conversation from other
sound sources in the environment. That is, these operations are important
because their function is to recognize and to follow acoustic objects, and to
establish a cognitive representation of the acoustic environment. It appears
that the planum temporale (which is part of the auditory association cortex)
is a crucial structure for auditory scene analysis and stream segregation,
particularly due to its role for the processing of pitch intervals and sound
sequences (Griffiths & Warren, 2002; Patterson, Uppenkamp, Johnsrude, &
Griffiths, 2002; Snyder & Elhilali, 2017).
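A classic stimulus for studying stream segregation is the ABA_ triplet sequence; the sketch below generates one under assumed parameters (tone duration, gap, and frequencies are illustrative). With a small A-B frequency separation, listeners tend to hear a single "galloping" stream; with a large separation, two segregated streams:

```python
import numpy as np

# Illustrative ABA_ triplet sequence used in stream-segregation studies.
fs = 16000.0
tone_dur, gap = 0.08, 0.02

def tone(freq):
    t = np.arange(0, tone_dur, 1 / fs)
    return np.sin(2 * np.pi * freq * t) * np.hanning(t.size)  # smooth on/offset

silence = np.zeros(int(gap * fs))
rest = np.zeros(int((tone_dur + gap) * fs))     # the "_" slot
a, b = tone(500.0), tone(800.0)                 # large separation: two streams
triplet = np.concatenate([a, silence, b, silence, a, silence, rest])
sequence = np.tile(triplet, 10)
print(round(sequence.size / fs, 2), "s of stimulus")
```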

Musical Expectancy Formation: Processing Local Dependencies

Processing regularities of successive sounds can be performed based on
two different principles: first, based on the regularities inherent in the
acoustical properties of the sounds, for example, pitch (after a sequence of
several sounds with the same pitch, a sound with a different pitch sounds
irregular). This type of processing is assumed to be performed by the
auditory sensory memory, and processing of irregular sounds is reflected in
the MMN (discussed earlier). Note that the extraction of the regularity
underlying such sequences does not require memory capabilities beyond the
auditory sensory memory (i.e., the regularity is extracted in real time, on a
moment-to-moment basis). I have referred previously to such syntactic
processes as “knowledge-free structuring” (Koelsch, 2012).
Second, the local arrangement of elements in language and music
includes numerous regularities that cannot simply be extracted on a
moment-to-moment basis but have to be learned over an extended period of
time (“local” refers here to the arrangement of adjacent, or directly
succeeding, elements). For example, it usually takes months, or even years,
to learn the syntax of a language, and it takes a considerable amount of
exposure and learning to establish (implicit) knowledge of the statistical
regularities of a certain type of music. I have referred previously to such
syntactic processes as “musical expectancy formation” (Koelsch, 2012).
An example of local dependencies in music captured by “musical
expectancy formation” is the bigram table of chord transition probabilities
extracted from a corpus of Bach chorales in a study by Rohrmeier and
Cross (2008). That table, for example, showed that after a dominant seventh
chord, the most likely chord to follow is the tonic. It also showed that a
supertonic is nine times more likely to follow a tonic than a tonic is to
follow a supertonic. This is important because the acoustic similarity of tonic and
supertonic is the same in both cases, and therefore it is very difficult to
explain this statistical regularity simply based on acoustic similarity. Rather,
this regularity is specific for this kind of major-minor tonal music, and thus
has to be learned (over an extended period of time) to be represented
accurately in the brain of a listener.
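To make this concrete, the following minimal Python sketch shows how such a bigram table of chord transition probabilities can be estimated from a corpus of chord-label sequences. The toy corpus, the chord labels, and the function name bigram_transition_probabilities are invented for illustration; the sketch does not reproduce Rohrmeier and Cross’s (2008) data or code.

```python
from collections import Counter, defaultdict

def bigram_transition_probabilities(chord_sequences):
    """Estimate P(next chord | current chord) from a corpus of
    chord-label sequences, i.e., a bigram transition table."""
    counts = defaultdict(Counter)
    for sequence in chord_sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {chord: {following: n / sum(followers.values())
                    for following, n in followers.items()}
            for chord, followers in counts.items()}

# Toy corpus with Roman-numeral chord labels (illustrative only).
corpus = [
    ["I", "IV", "V7", "I"],
    ["I", "II", "V7", "I"],
    ["I", "VI", "II", "V7", "I"],
]
table = bigram_transition_probabilities(corpus)
print(table["V7"])  # {'I': 1.0}: the tonic is the most likely chord after V7
```

Applied to a large corpus such as the Bach chorales, the same counting procedure yields asymmetries of the kind described above (e.g., tonic-to-supertonic versus supertonic-to-tonic transitions).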
Notably, even non-musicians are sensitive to such statistical regularities
and pick up statistical structures without explicit intent. This ability is
explored within the frameworks of statistical learning (Saffran, Aslin, &
Newport, 1996) and implicit learning (Cleeremans, Destrebecqz, & Boyer,
1998), both of which have been argued to investigate the same underlying
learning phenomenon (Dienes, 2012; Perruchet & Pacton, 2006). Although
statistical learning appears to be domain-general (Conway & Christiansen,
2005), it has most prominently been investigated in the context of language
acquisition, especially word learning (for a review see Romberg & Saffran,
2010), as well as music (for reviews see Ettlinger, Margulis, & Wong, 2011;
François & Schön, 2014; Rohrmeier & Rebuschat, 2012). With regard to
statistical learning paradigms, word learning has been argued to be
grounded, at least in part, in sequence prediction: in a continuous stream of
syllables, sequences of events linked with high statistical conditional
probability likely correspond to words, whereas syllable transitions with
low predictability are likely to indicate word boundaries (François &
Schön, 2014; Marcus, Vijayan, Rao, & Vishton, 1999; Saffran, Newport, &
Aslin, 1996). Thus, tracking conditional probability relations between
syllables has been regarded as highly relevant for the extraction of
candidate word forms (Hay, Pelucchi, Estes, & Saffran, 2011; Saffran,
2001). In music, representations of musical regularities guiding local
dependencies serve the formation of a musical expectancy (“musical” is
italicized here to clearly differentiate this type of expectancy formation
from the formation of expectancies based simply on acoustic regularities).
In addition, integrating information across the extracted units eventually
reveals distributional properties (Hunt & Aslin, 2010; Thiessen, Kronstein,
& Hufnagle, 2013). Extracted statistical properties provide an important
basis for predictions which guide the processing of sensory information
(Friston, 2010; Friston & Kiebel, 2009; Thiessen et al., 2013). Stimuli that
are hard to predict (e.g., the syllable after a word boundary) have been
hypothesized to increase processing load (Friston, 2010; Friston & Kiebel,
2009). Such an increase in processing load has been found to be reflected
neurophysiologically in ERP components such as the N100 and the N400:
during successful stream segmentation, word-onsets evoke larger N100 and
N400 ERPs compared to more predictable positions within the word in
adults (e.g., Abla, Katahira, & Okanoya, 2008; François, Chobert, Besson,
& Schön, 2013; Francois & Schön, 2011, 2014; Schön & François, 2011;
Teinonen & Huotilainen, 2012), and similar ERP responses have been
observed even in newborns (Teinonen, Fellman, Näätänen, Alku, &
Huotilainen, 2009).
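The following sketch makes this boundary-finding logic explicit, assuming a toy stream in which within-word syllable transitions are fully predictable and cross-boundary transitions are not. The nonce words follow the style of Saffran et al.’s (1996) stimuli, but the stream construction, the 0.5 threshold, and the function names are illustrative assumptions rather than the procedure of any cited study.

```python
from collections import Counter, defaultdict

def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from one
    continuous syllable stream."""
    pair_counts = defaultdict(Counter)
    for a, b in zip(stream, stream[1:]):
        pair_counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in pair_counts.items()}

def segment(stream, tp, threshold=0.5):
    """Posit a word boundary wherever the transitional probability
    dips below the threshold (a simple dip heuristic)."""
    words, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tp[a].get(b, 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Continuous stream of three trisyllabic nonce "words" in a
# pseudorandom order; each word is split into its three syllables.
words = ["tupiro", "golabu", "bidaku"]
order = [0, 0, 1, 0, 2, 1, 1, 2, 2] * 4
stream = [words[k][i:i + 2] for k in order for i in (0, 2, 4)]
print(sorted(set(segment(stream, transitional_probabilities(stream)))))
# -> ['bidaku', 'golabu', 'tupiro']
```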
When participants learn local dependencies (i.e., statistical regularities
underlying the succession of sounds), irregular sounds elicit a statistical
MMN (or sMMN; Koelsch, Busch, Jentschke, & Rohrmeier, 2016), which
is maximal between approximately 130 and 220 ms and has a frontal distribution
(Daikoku, Yatomi, & Yumoto, 2014; Furl et al., 2011; Koelsch et al., 2016;
Paraskevopoulos, Kuchenbuch, Herholz, & Pantev, 2012). So far, this has
been investigated in statistical learning paradigms in which participants are
presented, over a period of several dozen minutes, with streams of
“triplets” (i.e., sounds arranged in groups of three), designed such
that the succession of tones within and between triplets follows exactly
specified statistical regularities.
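As a schematic illustration of how such a stream might be constructed (the tone labels, the triplet inventory STANDARD_TRIPLETS, and the 10 percent deviant rate are invented placeholders, not parameters of the cited studies), consider:

```python
import random

# Hypothetical triplet inventory: each "word" is a fixed, ordered
# sequence of three tone labels (standing in for pitches).
STANDARD_TRIPLETS = [("A", "C", "E"), ("D", "F", "B"), ("G", "E", "C")]

def make_stream(n_triplets, deviant_rate=0.1, seed=0):
    """Concatenate randomly chosen triplets into one continuous stream.
    Within-triplet transitions are fully predictable; occasionally the
    final tone is swapped, yielding a low-probability 'statistical
    deviant' of the kind expected to elicit an sMMN."""
    rng = random.Random(seed)
    stream, labels = [], []
    for _ in range(n_triplets):
        triplet = list(rng.choice(STANDARD_TRIPLETS))
        is_deviant = rng.random() < deviant_rate
        if is_deviant:
            triplet[2] = rng.choice([t for t in "ABCDEFG" if t != triplet[2]])
        stream.extend(triplet)
        labels.extend(["std", "std", "dev" if is_deviant else "std"])
    return stream, labels

tones, labels = make_stream(200)
print(labels.count("dev"), "deviant tones among", len(tones), "tones")
```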
It is important to understand that, within the Chomsky hierarchy, a finite
state automaton is required to process both the regularities underlying the
generation of the physical MMN (phMMN) and abstract-feature MMN
(afMMN) on the one side (i.e., “knowledge-free structuring”), and the
sMMN on the other (i.e., “musical expectancy formation”). In other words,
a finite state grammar is sufficient to process these two types of regularities.
However, they are represented psychologically and neurophysiologically in
fundamentally different ways (because the processing of regularities that do
not require long-term memory, i.e., “knowledge-free structuring,” differs
neurocognitively from the processing of regularities stored in long-term
memory, i.e. “musical expectancy formation”). The local transition
probabilities underlying the generation of the phMMN and afMMN are
stored in auditory sensory memory (and if the probabilities change, the
sensory representations of the new transition probabilities are dynamically
updated). By contrast, deviants in statistical learning paradigms, like those
employed in the MEG studies described above (Daikoku et al., 2014;
Daikoku, Yatomi, & Yumoto, 2015; Furl et al., 2011; Koelsch et al., 2016;
Paraskevopoulos et al., 2012) require an extended period of learning, and
the mismatch response associated with statistical learning reflects the
processing of local dependencies based on (implicit) knowledge about
statistical regularities. That is, the mismatch response associated with
statistical learning is based on memory representations beyond the
capabilities of sensory memory. With regard to music, this also means that
fundamentally different neurocognitive systems process different types of
local syntactic dependencies in music, even though they can be captured by
the same (finite state) automaton within the Chomsky hierarchy.
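That both types of local regularity sit at the finite-state level can be illustrated with a minimal sketch: an automaton whose entire memory is the previous event, and which flags any transition whose probability falls below a threshold. The class name, the probabilities, and the threshold below are hypothetical placeholders.

```python
class BigramFSA:
    """Minimal finite-state sketch: the automaton's entire state is the
    previous symbol, so only local (adjacent) dependencies can be
    evaluated, which is the level of the Chomsky hierarchy at which
    both phMMN/afMMN- and sMMN-type regularities can be captured."""

    def __init__(self, transition_probs, threshold=0.1):
        self.tp = transition_probs   # e.g., taken from a bigram table
        self.threshold = threshold
        self.state = None            # previous symbol (finite memory)

    def step(self, symbol):
        """Return True if the incoming transition is 'irregular'."""
        irregular = (self.state is not None and
                     self.tp.get(self.state, {}).get(symbol, 0.0)
                     < self.threshold)
        self.state = symbol
        return irregular

# Hypothetical probabilities: the tonic (I) is very likely after the
# dominant (V); the supertonic (II) is not.
fsa = BigramFSA({"I": {"V": 0.5, "II": 0.3}, "V": {"I": 0.8, "II": 0.05}})
for chord in ["I", "V", "II"]:
    print(chord, "irregular:", fsa.step(chord))   # only V -> II is flagged
```

What such a one-symbol memory cannot do is track dependencies between non-adjacent events at arbitrary distances; this is where the hierarchical structures discussed in the next section come in.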

MUSICAL STRUCTURE BUILDING: PROCESSING NONLOCAL DEPENDENCIES

As described in the previous section, tonal music involves representations
of single events and local relationships on short timescales. However, many
composers designed nested hierarchical syntactic structures spanning longer
timescales, potentially up to entire movements of symphonies and sonatas
(Salzer, 1962; Schenker, 1956). Hierarchical syntactic structure (involving
the potential for nested nonlocal dependencies) is a key component of the
human language capacity (Chomsky, 1995; Fitch & Hauser, 2004;
Friederici, Bahlmann, Heim, Schubotz, & Anwander, 2006; Hauser,
Chomsky, & Fitch, 2002; Nevins, Pesetsky, & Rodrigues, 2009), and is
frequently produced and perceived in everyday life. For example, in the
sentence “the boy who helped Peter kissed Mary,” the subject relative
clause “who helped Peter” is nested into the main clause “the boy kissed
Mary,” creating a nonlocal hierarchical dependency between “the boy” and
“kissed Mary.”2 Music theorists have described analogous hierarchical
structures for music. Schenker (1956) was the first to describe musical
structures as organized hierarchically, in a way that musical events are
elaborated (or prolonged) by other events in a recursive fashion. According
to this principle, for example, a phrase (or set of phrases) can be conceived
of as an elaboration of a basic underlying tonic–dominant–tonic
progression. Schenker further argued that this principle can be expanded to
even larger musical sequences, up to entire musical movements. In addition,
Hofstadter (1979) was one of the first to argue that a change of key
embedded in a superordinate key (such as a tonal modulation away from
and returning to an initial key) constitutes a prime example of recursion in
music. Based on similar ideas, several theorists have developed formal
descriptions of the analysis of hierarchical structures in music (Lerdahl &
Jackendoff, 1983; Rohrmeier, 2011; Steedman, 1984), including the
Generative Theory of Tonal Music (GTTM) by Lerdahl and Jackendoff
(1983), and the Generative Syntax Model (GSM) by Rohrmeier (2011).
Humans are capable of processing hierarchically organized structures
including nonlocal dependencies in music (Dibben, 1994; Koelsch,
Rohrmeier, Torrecuso, & Jentschke, 2013; Lerdahl & Krumhansl, 2007;
Serafine, Glassman, & Overbeeke, 1989), driven by the human capacity to
perceive and produce hierarchical, potentially recursive structures
(Chomsky, 1995; Hauser et al., 2002; Jackendoff & Lerdahl, 2006). Using
chorales by J. S. Bach (see Figure 1) a recent study (Koelsch et al., 2013)
showed that hierarchically incorrect final chords of a musical period
(violating the nonlocal prolongation of the beginning of the period) elicit a
negative brain-electric potential which is maximal between 150 and 300 ms
and has a frontal preponderance.
FIGURE 1. Nonlocal dependencies in music. (a) Original version of J. S. Bach’s chorale Liebster
Jesu, wir sind hier. The first phrase ends on an open dominant (see chord with fermata) and the
second phrase ends on a tonic (dotted rectangle). The tree structure above the scores represents a
schematic diagram of the harmonic dependencies. The two thick vertical lines (separating the first
and the second phrase) visualize that the local dominant (V, rectangle above the fermata) is not
immediately followed by a resolving tonic chord, but implies its resolution with the final tonic
(indicated by the dotted arrow). The same dependency exists between initial and final tonic
(indicated by the solid arrow). This illustrates the nonlocal (long-distance) dependency between the
initial and final tonic regions and tonic chords, respectively (also illustrated by the solid arrow). The
chords belonging to a key other than the initial key (see function symbols in square brackets)
represent one level of embedding. (b) Modified version (the first phrase, i.e. notes up to the fermata,
was transposed downwards by the pitch interval of one fourth, see light gray scores). The tree
structure above the scores illustrates that the second phrase is not compatible with an expected tonic
region (indicated by the dotted line), and that the last chord (a tonic of a local cadence, dotted
rectangle) neither prolongs the initial tonic, nor closes the open dominant (see solid and dotted lines
followed by question mark). In both (a) and (b), roman numerals indicate scale degrees. T, S, and D
indicate the main tonal functions (tonic, subdominant, dominant) of the respective part of the
sequence. Square brackets indicate scale degrees relative to the local key (in the original version,
the function symbols in square brackets indicate that the local key of C major is a subdominant
region of the initial key of G major).

Note that the term “hierarchical” is used here to refer to a syntactic
organizational principle of musical sequences by which elements are
organized in terms of subordination and dominance relationships (Lerdahl
& Jackendoff, 1983; Rohrmeier, 2011; Steedman, 1984). Such hierarchical
structures can be established through the recursive application of rules,
analogous to the establishment of hierarchical structures in language
(Chomsky, 1995). In both linguistics and music theory, such hierarchical
dependency structures are commonly represented using tree graphs. The
term “hierarchical” is sometimes also used in a different sense, namely to
indicate that certain pitches, chords, or keys within pieces occur more
frequently than others and thus establish a frequency-based ranking of
structural importance (Krumhansl & Cuddy, 2010). That is not the sense
intended here.
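To illustrate why such nested dependencies exceed finite-state power, the following toy recursive-descent recognizer implements a deliberately simplified phrase rule, loosely inspired by such tree models but emphatically not the GTTM or the GSM: a phrase opens on a tonic, may embed a complete phrase, and must be closed by a dominant resolving to a tonic. Matching openings and closures at arbitrary embedding depth requires recursion (i.e., at least context-free machinery); the function names and the chord-token encoding are illustrative assumptions.

```python
def parse_phrase(tokens, i=0):
    """Recognize the toy rule  phrase -> 'I' [phrase] 'V' 'I'  starting
    at position i; return the position after the phrase, or None."""
    if i < len(tokens) and tokens[i] == "I":
        i += 1
        j = parse_phrase(tokens, i)   # optional embedded phrase (recursion)
        if j is not None:
            i = j
        if i + 1 < len(tokens) and tokens[i] == "V" and tokens[i + 1] == "I":
            return i + 2              # dominant resolved: phrase closed
    return None

def is_well_formed(tokens):
    return parse_phrase(tokens) == len(tokens)

print(is_well_formed(["I", "V", "I"]))                    # True
print(is_well_formed(["I", "I", "V", "I", "V", "I"]))     # True: one embedding
print(is_well_formed(["I", "I", "V", "I", "V", "II"]))    # False: tonic never closed
```

In the well-formed embedded example, the inner "I V I" plays the role of a local region (such as a modulation), while the outer opening tonic remains dependent on the final dominant-tonic closure, mirroring the nonlocal dependency illustrated in Figure 1.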
Numerous other studies using EEG, MEG, and fMRI have previously
investigated processing of musical syntax using melodies (with regular and
irregular tones) or chord sequences (with regular and irregular harmonies,
for reviews see Koelsch, 2009, 2012; Patel, 2008). In all of these studies,
the processes of “musical expectancy formation” (involving processing of
local dependencies) and “musical structure building” (involving processing
of hierarchically organized nonlocal dependencies) were confounded (as is
usually the case in “real” music). For example, in the sequences shown in
Figure 2b, the final chord of the upper sequence is a tonic (I), which is the
most likely chord to follow a dominant (V). The final chord of the lower
sequence is a supertonic (II), which is less likely to follow a dominant.
Thus, the local transition probability from V to II is lower than from V to I
(in other words, the local dependency of I on V is stronger, i.e., more
regular, than of II on V).

FIGURE 2. (a) Examples of chord functions: The chord built on the first scale tone is denoted as
the tonic, the chord on the second tone as the supertonic, and the chord on the fifth tone as the
dominant. (b) The dominant-tonic progression represents a regular ending of a harmonic sequence
(top), the dominant-supertonic progression is less regular and unacceptable as a marker of the end of
a harmonic progression (bottom sequence, the arrow indicates the less regular chord). (c) ERPs
elicited in a passive listening condition by the final chords of the two sequence types shown in (b).
Both sequence types were presented in pseudorandom order equiprobably in all twelve major keys.
Brain responses to irregular chords clearly differ from those to regular chords (best seen in the
black difference wave, regular subtracted from irregular chords). The first difference between the
two waveforms is maximal around 200 ms after the onset of the fifth chord (ERAN, indicated by the
long arrow) and taken to reflect processes of music-syntactic analysis. The ERAN is followed by an
N5 taken to reflect processes of harmonic integration (short arrow). (d) Activation foci (small
spheres) reported by functional imaging studies on music-syntactic processing using chord sequence
paradigms (Koelsch, Fritz, et al., 2005; Maess et al., 2001; Koelsch et al., 2002; Tillmann et al.,
2003) and melodies (Janata et al., 2002). Large gray disks show the mean coordinates of foci
(averaged for each hemisphere across studies; coordinates refer to standard stereotaxic space).
Reprinted from Trends in Cognitive Sciences, 9(12), Stefan Koelsch and Walter A. Siebel, Towards a
neural basis of music perception, pp. 578–584, Copyright © 2005 Elsevier Ltd. All rights reserved.
At the same time, the final tonic “prolongs” the initial tonic, whereas the
final supertonic does not. Therefore, the nonlocal dependency between
initial and final chord is fulfilled in the upper sequence and violated in the
bottom sequence. Figure 2c shows brain-electric responses to the final
chords of the sequences shown in Figure 2b: the irregular supertonics elicit
an ERAN (early right anterior negativity, indicated by the arrow) compared
to the regular tonic chords. Importantly, as described earlier, the ERAN
elicited here is a conglomerate of the sMMN (due to processing the local
dependency violation) and the “hierarchical ERAN” (due to the processing
of the nonlocal dependency violation). A study by Zhang and colleagues
(Zhang, Zhou, Chang, & Yang, 2018), however, nicely showed effects of
nonlocal context on local harmonic processing using the ERAN.
The ERAN has a larger amplitude in individuals with musical training,
is reduced by strong attentional demands, but can be elicited even if
participants ignore the musical stimulus (for a review see Koelsch, 2012).
Most studies reporting an ERAN used harmonies as stimuli, but the ERAN
can also be elicited by melodies (e.g., Carrus, Pearce, & Bhattacharya,
2013; Fiveash, Thompson, Badcock, & McArthur, 2018; Miranda &
Ullman, 2007; Zendel, Lagrois, Robitaille, & Peretz, 2015). Moreover, a
study by Sun and colleagues (Sun, Liu, Zhou, & Jiang, 2018) reported that
the ERAN can also be elicited by rhythmic syntactic processing.
Interestingly, a study by Przysinda and colleagues (Przysinda, Zeng, Maves,
Arkin, & Loui, 2017) showed differential ERAN responses in classical and
jazz musicians depending on their preferences for irregular, or unusual
harmonies. The ERAN is relatively immune to predictions: the ERAN
latency, but not amplitude, is influenced by veridical expectations (Guo &
Koelsch, 2016). However, Vuvan and colleagues (Vuvan, Zendel, & Peretz,
2018) reported that random feedback (including false feedback) on
participants’ detection of out-of-key tones in melodies modulated the
ERAN amplitude, possibly suggesting that attention-driven changes in the
confidence in predictions (i.e., changes in the precision of predictions)
might alter the ERAN amplitude. Recent studies also report that the ERAN
is absent in individuals with “amusia” (Sun, Lu, et al., 2018), or that pitch-
judgment tasks can eliminate the ERAN in amusics (Zendel et al., 2015).
In children, the ERAN becomes visible around the age of 30 months
(Jentschke, Friederici, & Koelsch, 2014), and several studies have reported
ERAN responses in pre-school children (Corrigall & Trainor, 2014;
Jentschke, Koelsch, Sallat, & Friederici, 2008; Koelsch, Grossmann,
Gunter, Hahne, & Friederici, 2003). Children with specific language
impairment show a reduced (or absent) ERAN (Jentschke et al., 2008),
whereas neurophysiological correlates of language-syntactic processing are
developed earlier, and more strongly in children with musical training
(Jentschke & Koelsch, 2009).
Functional neuroimaging studies using chord sequences (similar to those
shown in Figure 2b, e.g., Koelsch et al., 2002; Koelsch, Fritz, Schulze,
Alsop, & Schlaug, 2005; Maess, Koelsch, Gunter, & Friederici, 2001;
Tillmann, Janata, & Bharucha, 2003; Villarreal, Brattico, Leino, Østergaard,
& Vuust, 2011) or melodies (Janata, Tillmann, & Bharucha, 2002) suggest
that music-syntactic processing involves the pars opercularis of the inferior
frontal gyrus (corresponding to BA 44v; Amunts et al., 2010) bilaterally,
but with right-hemispheric weighting (see the spheres in Figure 2d). It
seems likely that the involvement of BA 44v in music-syntactic processing
is mainly due to the hierarchical processing of (syntactic) information: This
part of Broca’s area is involved in the hierarchical processing of syntax in
language (e.g., Friederici et al., 2006; Makuuchi, Bahlmann, Anwander, &
Friederici, 2009), the hierarchical processing of action sequences (e.g.,
Fazio et al., 2009; Koechlin & Jubault, 2006), and possibly also in the
processing of hierarchically organized mathematical formulas and termini
(Friedrich & Friederici, 2009; although activation in the latter study cannot
clearly be assigned to BA 44 or BA 45). Finally, using an artificial musical
grammar, a recent study by Cheung and colleagues (Cheung, Meyer,
Friederici, & Koelsch, 2018) reported activation of BA 44v associated with
the processing of nonlocal (nested) dependencies (however, note that
dependencies in that study were not hierarchically organized).
It appears that inferior BA 44 is not the only structure involved in
music-syntactic processing: additional structures include the superior part
of the pars opercularis (Koelsch et al., 2002), ventral premotor cortex
(PMCv; Janata et al., 2002; Koelsch, Fritz, et al., 2005; Parsons, 2001), and
the anterior portion of the STG (Koelsch, Fritz, et al., 2005). The PMCv
possibly contributes to the processing of local music-syntactic dependencies
(i.e., information based on a finite state grammar): activations of PMCv
have been reported in a variety of functional imaging studies on auditory
processing using musical stimuli, linguistic stimuli, auditory oddball
paradigms, pitch discrimination tasks, and serial prediction tasks,
underlining the importance of these structures for the sequencing of
structural information, the recognition of structure, and the prediction of
sequential information (Janata & Grafton, 2003). With regard to language,
Friederici (2004) reported that activation foci of functional neuroimaging
studies on the processing of hierarchically organized long-distance
dependencies and transformations are located in the posterior IFG (with the
mean of the coordinates reported in that article being located in the inferior
pars opercularis), whereas activation foci of functional neuroimaging
studies on the processing of local dependency violations are located in the
PMCv (see also Friederici et al., 2006; Makuuchi et al., 2009; Opitz &
Kotz, 2011). Moreover, patients with lesions in the PMCv show disruption
of the processing of finite state, but not phrase-structure grammar (Opitz &
Kotz, 2011).
That is, in the abovementioned experiments that used chord sequence
paradigms to investigate the processing of harmonic structure, the music-
syntactic processing of the chord functions probably involved processing of
both finite state grammar (local dependencies) and phrase-structure (or
“context-free”) grammar (hierarchically organized nonlocal dependencies).
The music-syntactic analysis involved a computation of the harmonic
relation between a chord function and the context of preceding chord
functions (phrase-structure grammar). Such a computation is more difficult
(and less common) for irregular than for regular chord functions, and this
increased difficulty is presumably reflected in a stronger activation of
(inferior) BA 44 in response to irregular chords. In addition, the local
transition probability from the penultimate to the final chord is lower for the
dominant–supertonic progression than for the dominant–tonic progression
(finite state grammar), and the computation of the (less predicted) lower-
probability progression is presumably reflected in a stronger activation of
PMCv in response to irregular chords. The stronger activation of both BA
44 and PMCv appears to correlate with the perception of a music-
syntactically irregular chord as “unexpected” (although emotional effects of
irregular chords probably originate from BA 47, discussed below).
Note that the ability to process context-free grammar is available to
humans, whereas non-human primates are apparently not able to master
such grammars (Fitch & Hauser, 2004). Thus, it is highly likely that only
humans can adequately process music-syntactic information at the phrase-
structure level. It is also worth noting that numerous studies showed that
even “non-musicians” (i.e., individuals who have not received formal
musical training) have a highly sophisticated (implicit) knowledge about
musical syntax (e.g., Tillmann, Bharucha, & Bigand, 2000). Such
knowledge is presumably acquired during listening experiences in everyday
life.
Finally, it is important to note that violations of musical expectancies
also have emotional effects, such as surprise, or tension (Huron, 2006;
Koelsch, 2014; Lehne & Koelsch, 2015; Meyer, 1956). Consequently,
musical irregularity confounds emotion-eliciting effects, and it is difficult to
disentangle cognitive and emotional effects of music-syntactic irregularities
in neuroscientific experiments. For example, a study by Koelsch and
colleagues (Koelsch, Fritz, et al., 2005) reported activation foci in both BA
44 and BA 47 (among other structures) in response to musical expectancy
violations, and a study by Levitin and Menon (2005) reported activation of
BA 47 (without BA 44) in response to scrambled (unpleasant) vs. normal
music. BA 47 is paralimbic, five-layered palaeocortex (not neocortex), and
activation of this region with musical irregularities is most likely due to
emotional effects (this is also consistent with an fMRI study reporting that
musical tension correlates with neural activity in BA 47; Lehne, Rohrmeier,
& Koelsch, 2014). Note that, because BA 47 is not neocortex, it is
problematic to consider this region as a “language area.” Moreover, BA 47
is adjacent to BA 44/45/46, thus activation foci originating in Broca’s area
can easily be misplaced in BA 47. Based on receptorarchitectonic (and
cytoarchitectonic) data, a study by Amunts et al. (2010) showed that BA 47
does not cluster together with BA 44/45/46 (Broca’s area in the wider
sense), nor with BA 6 (PMC).
As mentioned earlier, hierarchical processing of syntactic information
from different domains (such as music and language) requires contributions
from neural populations located in BA 44. However, it is still possible that,
although such neural populations are located in the same brain area, entirely
different (non-overlapping) neural populations serve the syntactic
processing of music and language within the same area. That is, perhaps the
neural populations mediating language-syntactic processing in BA 44 are
different from neural populations mediating music-syntactic processing in
the same area. Therefore, the strongest evidence for shared neural resources
for the syntactic processing of music and language stems from experiments
that revealed interactions between music-syntactic and language-syntactic
processing (Carrus et al., 2013; Fedorenko, Patel, Casasanto, Winawer, &
Gibson, 2009; Koelsch, Gunter, Wittfoth, & Sammler, 2005; Patel, Iversen,
Wassenaar, & Hagoort, 2008; Slevc, Rosenberg, & Patel, 2009; Steinbeis &
Koelsch, 2008). In these studies, chord sequences or melodies were played
simultaneously with (visually presented) sentences, and it was shown, for
example, that the ERAN elicited by irregular chords interacted with the left
anterior negativity (LAN) elicited by linguistic (morpho-syntactic)
violations (Koelsch, Gunter, et al., 2005; Steinbeis & Koelsch, 2008). Thus,
music-syntactic processes can interfere with language-syntactic processes.
In summary, neurophysiological studies show that music- and language-
syntactic processes engage overlapping resources (presumably located in
the inferior frontolateral cortex), and evidence showing that these resources
underlie music- and language-syntactic processing is provided by
experiments showing interactions between ERP components reflecting
music- and language-syntactic processing (in particular LAN and ERAN).
Importantly, such interactions are observed in the absence of interactions
between LAN and MMN, that is, in the absence of interactions between
language-syntactic and acoustic deviance processing (the latter reflected in
the MMN), and in the absence of interactions between the ERAN and the
N400 (i.e., between music-syntactic and language-semantic processing). Therefore,
the reported interactions between LAN and ERAN are syntax-specific and
cannot be observed in response to any kind of irregularity.

CONCLUDING REMARKS

As a concluding remark I would like to emphasize that even individuals
without formal musical training show sophisticated abilities with regard to
the decoding of musical information, the acquisition of knowledge about
musical syntax, the processing of musical information according to that
knowledge, and the understanding of music. This finding supports the
notion that musicality is a natural ability of the human brain. Such musical
abilities are important for making music together in groups, and thus for the
beneficial social effects promoted by musical group activities (such as
cooperation and social cohesion, e.g., Koelsch, 2014; Tarr, Launay, &
Dunbar, 2014). The natural musical abilities of humans are also important
for the acquisition and the processing of language. For example,
differentiating vowels, consonants, and lexical tones is a highly
sophisticated capability of the human auditory system. Tonal languages rely
on a meticulous decoding of pitch information, and both tonal and non-
tonal languages require an accurate analysis of speech prosody to decode
the structure and meaning of speech. Infants use such prosodic cues to acquire
information about word and phrase boundaries (possibly even about word
meaning). The assumption of an intimate connection between music and
speech is corroborated by the reviewed findings of overlapping and shared
neural resources for music and language processing in both adults and
children. These findings suggest that the human brain, particularly at an
early age, does not treat language and music as separate domains, but rather
treats language as a special case of music, and music as a special case of
sound.

REFERENCES
Abla, D., Katahira, K., & Okanoya, K. (2008). On-line assessment of statistical learning by event-
related potentials. Journal of Cognitive Neuroscience 20(6), 952–964.
Amunts, K., Lenzen, M., Friederici, A. D., Schleicher, A., Morosan, P., Palomero-Gallagher, N., &
Zilles, K. (2010). Broca’s region: Novel organizational principles and multiple receptor mapping.
PLoS Biology 8(9), e1000489.
Balaban, C. D., & Thayer, J. F. (2001). Neurological bases for balance–anxiety links. Journal of
Anxiety Disorders 15(1), 53–79.
Bard, P. (1934). On emotional expression after decortication with some remarks on certain theoretical
views: Part II. Psychological Review 41(5), 424.
Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory cortex.
Nature 436(7054), 1161–1165.
Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge,
MA: MIT Press.
Cardoso, S. H., Coimbra, N. C., & Brandão, M. L. (1994). Defensive reactions evoked by activation
of NMDA receptors in distinct sites of the inferior colliculus. Behavioural Brain Research 63(1),
17–24.
Carrus, E., Pearce, M. T., & Bhattacharya, J. (2013). Melodic pitch expectation interacts with neural
responses to syntactic but not semantic violations. Cortex 49(8), 2186–2200.
Cheung, V., Meyer, L., Friederici, A. D., & Koelsch, S. (2018). The right inferior frontal gyrus
processes hierarchical non-local dependencies in music. Scientific Reports 8, 3822.
doi:10.1038/s41598-018-22144-9
Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press.
Cleeremans, A., Destrebecqz, A., & Boyer, M. (1998). Implicit learning: News from the front. Trends
in Cognitive Sciences 2(10), 406–416.
Conway, C. M., & Christiansen, M. H. (2005). Modality-constrained statistical learning of tactile,
visual, and auditory sequences. Journal of Experimental Psychology: Learning, Memory, and
Cognition 31(1), 24–39.
Corrigall, K. A., & Trainor, L. J. (2014). Enculturation to musical pitch structure in young children:
Evidence from behavioral and electrophysiological methods. Developmental Science 17(1), 142–
158.
Daikoku, T., Yatomi, Y., & Yumoto, M. (2014). Implicit and explicit statistical learning of tone
sequences across spectral shifts. Neuropsychologia 63, 194–204.
Daikoku, T., Yatomi, Y., & Yumoto, M. (2015). Statistical learning of music-and language-like
sequences and tolerance for spectral shifts. Neurobiology of Learning and Memory 118, 8–19.
Darwin, C. J. (1997). Auditory grouping. Trends in Cognitive Sciences 1(9), 327–333.
Darwin, C. J. (2008). Listening to speech in the presence of other sounds. Philosophical Transactions
of the Royal Society B: Biological Sciences 363(1493), 1011–1021.
Deouell, L. Y. (2007). The frontal generator of the mismatch negativity revisited. Journal of
Psychophysiology 21(3/4), 188–203.
Dibben, N. (1994). The cognitive reality of hierarchic structure in tonal and atonal music. Music
Perception 12(1), 1–25.
Dienes, Z. (2012). Conscious versus unconscious learning of structure. In P. Rebuschat & J. Williams
(Eds.), Statistical learning and language acquisition (pp. 337–364). Berlin: Walter de Gruyter.
Ettlinger, M., Margulis, E. H., & Wong, P. C. (2011). Implicit memory in music and language.
Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00211
Fazio, P., Cantagallo, A., Craighero, L., D’Ausilio, A., Roy, A. C., Pozzo, T., … Fadiga, L. (2009).
Encoding of human action in Broca’s area. Brain 132(7), 1980–1988.
Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in
language and music: Evidence for a shared system. Memory & Cognition 37(1), 1–19.
Fitch, W. T., & Hauser, M. D. (2004). Computational constraints on syntactic processing in a
nonhuman primate. Science 303(5656), 377–380.
Fiveash, A., Thompson, W. F., Badcock, N. A., & McArthur, G. (2018). Syntactic processing in
music and language: Effects of interrupting auditory streams with alternating timbres.
International Journal of Psychophysiology 129(1), 31–40.
François, C., Chobert, J., Besson, M., & Schön, D. (2013). Music training for the development of
speech segmentation. Cerebral Cortex 23(9), 2038–2043.
Francois, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical and
linguistic structures. Cerebral Cortex 21(10), 2357–2365.
François, C., & Schön, D. (2014). Neural sensitivity to statistical regularities as a fundamental
biological process that underlies auditory learning: The role of musical practice. Hearing Research
308, 122–128.
Friederici, A. D. (2004). Processing local transitions versus long-distance syntactic hierarchies.
Trends in Cognitive Sciences 8(6), 245–247.
Friederici, A. D., Bahlmann, J., Heim, S., Schubotz, R. I., & Anwander, A. (2006). The brain
differentiates human and non-human grammars: Functional localization and structural
connectivity. Proceedings of the National Academy of Sciences 103(7), 2458–2463.
Friedrich, R., & Friederici, A. D. (2009). Mathematical logic in the human brain: Syntax. PLoS ONE
4(5), e5599.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience
11(2), 127–138.
Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy principle. Philosophical
Transactions of the Royal Society B: Biological Sciences 364(1521), 1211–1221.
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances
automatic encoding of melodic contour and interval structure. Journal of Cognitive Neuroscience
16(6), 1010–1021.
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2005). Automatic encoding of
polyphonic melodies in musicians and nonmusicians. Journal of Cognitive Neuroscience 17(10),
1578–1592.
Furl, N., Kumar, S., Alter, K., Durrant, S., Shawe-Taylor, J., & Griffiths, T. D. (2011). Neural
prediction of higher-order auditory sequence statistics. NeuroImage 54(3), 2267–2277.
Geisler, C. D. (1998). From sound to synapse: Physiology of the mammalian ear. New York: Oxford
University Press.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians
and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Griffiths, T. D., & Warren, J. D. (2002). The planum temporale as a computational hub. Trends in
Neurosciences 25(7), 348–353.
Griffiths, T. D., & Warren, J. D. (2004). What is an auditory object? Nature Reviews Neuroscience
5(11), 887–892.
Guo, S., & Koelsch, S. (2016). Effects of veridical expectations on syntax processing in music:
Event-related potential evidence. Scientific Reports 6, 19064. doi:10.1038/srep19064
Hackett, T. A., & Kaas, J. (2004). Auditory cortex in primates: Functional subdivisions and
processing streams. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 215–232).
Cambridge, MA: MIT Press.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it,
and how did it evolve? Science 298(5598), 1569–1579.
Hay, J. F., Pelucchi, B., Estes, K. G., & Saffran, J. R. (2011). Linking sounds to meanings: Infant
statistical learning in a natural language. Cognitive Psychology 63(2), 93–106.
Hofstadter, D. R. (1979). Gödel, Escher, Bach. New York: Basic Books.
Huffman, R. F., & Henson, O. W. (1990). The descending auditory pathway and acousticomotor
systems: Connections with the inferior colliculus. Brain Research Reviews 15(3), 295–323.
Hunt, R. H., & Aslin, R. N. (2010). Category induction via distributional analysis: Evidence from a
serial reaction time task. Journal of Memory and Language 62(2), 98–112.
Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA:
MIT Press.
Hyde, K. L., Peretz, I., & Zatorre, R. J. (2008). Evidence for the role of the right auditory cortex in
fine pitch resolution. Neuropsychologia 46(2), 632–639.
Jackendoff, R., & Lerdahl, F. (2006). The capacity for music: What is it, and what’s special about it?
Cognition 100(1), 33–72.
Janata, P., & Grafton, S. T. (2003). Swinging in the brain: Shared neural substrates for behaviors
related to sequencing and music. Nature Neuroscience 6(7), 682–687.
Janata, P., Tillmann, B., & Bharucha, J. J. (2002). Listening to polyphonic music recruits domain-
general attention and working memory circuits. Cognitive, Affective, & Behavioral Neuroscience
2(2), 121–140.
Jentschke, S., Friederici, A. D., & Koelsch, S. (2014). Neural correlates of music-syntactic
processing in two-year old children. Developmental Cognitive Neuroscience 9, 200–208.
Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax
processing in children. NeuroImage 47(2), 735–744.
Jentschke, S., Koelsch, S., Sallat, S., & Friederici, A. D. (2008). Children with specific language
impairment also show impairment of music-syntactic processing. Journal of Cognitive
Neuroscience 20(11), 1940–1951.
Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right human
auditory cortex for perceiving pitch direction. Brain 123(1), 155–163.
Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing streams in
primates. Proceedings of the National Academy of Sciences 97(22), 11793–11799.
Kaas, J. H., Hackett, T. A., & Tramo, M. J. (1999). Auditory processing in primate cerebral cortex.
Current Opinion in Neurobiology 9(2), 164–170.
Kandler, K., & Herbert, H. (1991). Auditory projections from the cochlear nucleus to pontine and
mesencephalic reticular nuclei in the rat. Brain Research 562(2), 230–242.
Koechlin, E., & Jubault, T. (2006). Broca’s area and the hierarchical organization of human behavior.
Neuron 50(6), 963–974.
Koelsch, S. (2009). Music-syntactic processing and auditory memory: Similarities and differences
between ERAN and MMN. Psychophysiology 46(1), 179–190.
Koelsch, S. (2012). Brain and music. Chichester: Wiley-Blackwell.
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3),
170–180.
Koelsch, S., Busch, T., Jentschke, S., & Rohrmeier, M. (2016). Under the hood of statistical learning:
A statistical MMN reflects the magnitude of transitional probabilities in auditory sequences.
Scientific Reports 6, 19741. doi:10.1038/srep19741
Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005). Adults and children processing
music: An fMRI study. NeuroImage 25(4), 1068–1076.
Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., & Friederici, A. D. (2003). Children
processing music: Electric brain responses reveal musical competence and gender differences.
Journal of Cognitive Neuroscience 15(5), 683–693.
Koelsch, S., Gunter, T. C., Cramon, D. Y. von, Zysset, S., Lohmann, G., & Friederici, A. D. (2002).
Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2),
956–966.
Koelsch, S., Gunter, T. C., Wittfoth, M., & Sammler, D. (2005). Interaction between syntax
processing in language and in music: An ERP study. Journal of Cognitive Neuroscience 17(10),
1565–1577.
Koelsch, S., Rohrmeier, M., Torrecuso, R., & Jentschke, S. (2013). Processing of hierarchical
syntactic structure in music. Proceedings of the National Academy of Sciences 110(38), 15443–
15448.
Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in
musicians. Neuroreport 10(6), 1309–1313.
Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive
Sciences 9(12), 578–584.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.
Nature Reviews Neuroscience 11(8), 599–605.
Krumhansl, C. L., & Cuddy, L. L. (2010). A theory of tonal hierarchies in music. Music Perception
36, 51–87.
Lamprea, M. R., Cardenas, F. P., Vianna, D. M., Castilho, V. M., Cruz-Morales, S. E., & Brandão, M.
L. (2002). The distribution of Fos immunoreactivity in rat brain following freezing and escape
responses elicited by electrical stimulation of the inferior colliculus. Brain Research 950(1–2),
186–194.
Langner, G., & Ochse, M. (2006). The neural basis of pitch and harmony in the auditory system.
Musicae Scientiae 10(1), 185.
LeDoux, J. E. (2000). Emotion circuits in the brain. Annual Review of Neuroscience 23, 155–184.
Lehne, M., & Koelsch, S. (2015). Toward a general psychological model of tension and suspense.
Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.00079
Lehne, M., Rohrmeier, M., & Koelsch, S. (2014). Tension-related activity in the orbitofrontal cortex
and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9(10),
1515–1523.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT
Press.
Lerdahl, F., & Krumhansl, C. L. (2007). Modeling tonal tension. Music Perception 24(4), 329–366.
Levitin, D. J., & Menon, V. (2005). The neural locus of temporal structure and expectancies in music:
Evidence from functional neuroimaging at 3 tesla. Music Perception: An Interdisciplinary Journal
22(3), 563–575.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in the
area of Broca: An MEG-study. Nature Neuroscience 4(5), 540–545.
Makuuchi, M., Bahlmann, J., Anwander, A., & Friederici, A. D. (2009). Segregating the core
computational faculty of human language from working memory. Proceedings of the National
Academy of Sciences 106(20), 8362–8367.
Malmierca, M. S., Anderson, L. A., & Antunes, F. M. (2015). The cortical modulation of stimulus-
specific adaptation in the auditory midbrain and thalamus: A potential neuronal correlate for
predictive coding. Frontiers in Systems Neuroscience 9, 19. Retrieved from
https://doi.org/10.3389/fnsys.2015.00019
Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule learning by seven-month-old
infants. Science 283(5398), 77–80.
Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment?
Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience 7, 274.
Retrieved from https://doi.org/10.3389/fnins.2013.00274
Merker, B., Morley, I., & Zuidema, W. (2015). Five fundamental constraints on theories of the
origins of music. Philosophical Transactions of the Royal Society B: Biological Sciences
370(1664), 20140095.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Miranda, R. A., & Ullman, M. T. (2007). Double dissociation between rules and memory in music:
An event-related potential study. NeuroImage 38(2), 331–345.
Moore, B. C. J. (2008). An introduction to the psychology of hearing (5th ed.). Bingley: Emerald.
Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). “Primitive
intelligence” in the auditory cortex. Trends in Neurosciences 24(5), 283–288.
Nevins, A., Pesetsky, D., & Rodrigues, C. (2009). Pirahã exceptionality: A reassessment. Language
85(2), 355–404.
Öngür, D., & Price, J. L. (2000). The organization of networks within the orbital and medial
prefrontal cortex of rats, monkeys and humans. Cerebral Cortex 10(3), 206–219.
Opitz, B., & Kotz, S. A. (2011). Ventral premotor cortex lesions disrupt learning of sequential
grammatical structures. Cortex 48(6), 664–673.
Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R., & Winkler, I. (2001). Preattentive extraction
of abstract feature conjunctions from auditory stimulation as reflected by the mismatch negativity
(MMN). Psychophysiology 38(2), 359–365.
Paraskevopoulos, E., Kuchenbuch, A., Herholz, S. C., & Pantev, C. (2012). Statistical learning
effects in musicians and non-musicians: An MEG study. Neuropsychologia 50(2), 341–349.
Parsons, L. (2001). Exploring the functional neuroanatomy of music performance, perception, and
comprehension. Annals of the New York Academy of Sciences 930, 211–231.
Patel, A. D. (2008). Music, language, and the brain. Oxford: Oxford University Press.
Patel, A. D., & Balaban, E. (2001). Human pitch perception is reflected in the timing of stimulus-
related cortical activity. Nature Neuroscience 4(8), 839–844.
Patel, A. D., Iversen, J. R., Wassenaar, M., & Hagoort, P. (2008). Musical syntactic processing in
agrammatic Broca’s aphasia. Aphasiology 22(7), 776–789.
Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing of
temporal pitch and melody information in auditory cortex. Neuron 36(4), 767–776.
Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010).
Functional specializations for music processing in the human newborn brain. Proceedings of the
National Academy of Sciences 107(10), 4758–4763.
Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two
approaches. Trends in Cognitive Sciences 10(5), 233–238.
Petkov, C. I., Kayser, C., Augath, M., & Logothetis, N. K. (2006). Functional imaging reveals
numerous fields in the monkey auditory cortex. PLoS Biology 4(7), e215.
Pickles, J. O. (2008). An introduction to the physiology of hearing (3rd ed.). Bingley: Emerald.
Przysinda, E., Zeng, T., Maves, K., Arkin, C., & Loui, P. (2017). Jazz musicians reveal role of
expectancy in human creativity. Brain and Cognition 119, 45–53.
Putkinen, V., Tervaniemi, M., Saarikivi, K., de Vent, N., & Huotilainen, M. (2014). Investigating the
effects of musical training on functional brain development with a novel melodic MMN paradigm.
Neurobiology of Learning and Memory 110, 8–15.
Rammsayer, T., & Altenmüller, E. (2006). Temporal information processing in musicians and
nonmusicians. Music Perception 24(1), 37–48.
Rohrmeier, M. (2011). Towards a generative syntax of tonal harmony. Journal of Mathematics and
Music 5(1), 35–53.
Rohrmeier, M., & Cross, I. (2008). Statistical properties of tonal harmony in Bach’s chorales. In
K. Miyazaki, M. Adachi, Y. Hiraga, Y. Nakajima, & M. Tsuzaki (Eds.), Proceedings of the 10th
International Conference on Music Perception and Cognition (CD-ROM). ICMPC.
Rohrmeier, M., & Rebuschat, P. (2012). Implicit learning and acquisition of music. Topics in
Cognitive Science 4(4), 525–553.
Rohrmeier, M., Zuidema, W., Wiggins, G. A., & Scharff, C. (2015). Principles of structure building
in music, language and animal song. Philosophical Transactions of the Royal Society B: Biological
Sciences 370(1664), 20140097.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley
Interdisciplinary Reviews: Cognitive Science 1(6), 906–914.
Saffran, J. R. (2001). Words in a sea of sounds: The output of infant statistical learning. Cognition
81(2), 149–169.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants.
Science 274(5294), 1926–1928.
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional
cues. Journal of Memory and Language 35(4), 606–621.
Salzer, F. (1962). Structural hearing: Tonal coherence in music (Vol. 1). New York: Dover
Publications.
Schenker, H. (1956). Neue musikalische theorien und phantasien: Der freie satz (2nd ed.). Vienna:
Universal Edition.
Schön, D., & François, C. (2011). Musical expertise and statistical learning of musical and linguistic
structures. Frontiers in Psychology 2, 167. Retrieved from https://doi.org/10.3389/fpsyg.2011.00167
Schönwiesner, M., Novitski, N., Pakarinen, S., Carlson, S., Tervaniemi, M., & Näätänen, R. (2007).
Heschl’s gyrus, posterior superior temporal gyrus, and mid-ventrolateral prefrontal cortex have
different roles in the detection of acoustic changes. Journal of Neurophysiology 97(3), 2075–2082.
Serafine, M. L., Glassman, N., & Overbeeke, C. (1989). The cognitive reality of hierarchic structure
in music. Music Perception 6(4), 397–430.
Sethares, W. A. (2005). The gamelan. In W. A. Sethares, Tuning, timbre, spectrum, scale (pp. 165–
187). Berlin: Springer.
Sinex, D. G., Guzik, H., Li, H., & Henderson Sabes, J. (2003). Responses of auditory nerve fibers to
harmonic and mistuned complex tones. Hearing Research 182(1–2), 130–139.
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced
reading time evidence for shared processing of linguistic and musical syntax. Psychonomic
Bulletin & Review 16(2), 374–381.
Snyder, J. S., & Elhilali, M. (2017). Recent advances in exploring the neural underpinnings of
auditory scene perception. Annals of the New York Academy of Sciences 1396, 39–55.
Song, J. H., Skoe, E., Wong, P. C. M., & Kraus, N. (2008). Plasticity in the adult human auditory
brainstem following short-term linguistic training. Journal of Cognitive Neuroscience 20(10),
1892–1902.
Steedman, M. J. (1984). A generative grammar for jazz chord sequences. Music Perception 2(1), 52–
77.
Steinbeis, N., & Koelsch, S. (2008). Shared neural resources between music and language indicate
semantic processing of musical tension-resolution patterns. Cerebral Cortex 18(5), 1169–1178.
Strait, D. L., Kraus, N., Skoe, E., & Ashley, R. (2009). Musical experience and neural efficiency:
Effects of training on subcortical processing of vocal expressions of emotion. European Journal of
Neuroscience 29(3), 661–668.
Sun, L., Liu, F., Zhou, L., & Jiang, C. (2018). Musical training modulates the early but not the late
stage of rhythmic syntactic processing. Psychophysiology 55(2), e12983.
Sun, Y., Lu, X., Ho, H. T., Johnson, B. W., Sammler, D., & Thompson, W. F. (2018). Syntactic
processing in music and language: Parallel abnormalities observed in congenital amusia.
NeuroImage: Clinical 19, 640–651.
Sussman, E. S. (2007). A new view on the MMN and attention debate: The role of context in
processing auditory events. Journal of Psychophysiology 21(3), 164–175.
Tarr, B., Launay, J., & Dunbar, R. I. (2014). Music and social bonding: “Self–other” merging and
neurohormonal mechanisms. Frontiers in Psychology 5, 1096. Retrieved from
https://doi.org/10.3389/fpsyg.2014.01096
Teinonen, T., Fellman, V., Näätänen, R., Alku, P., & Huotilainen, M. (2009). Statistical language
learning in neonates revealed by event-related brain potentials. BMC Neuroscience 10(1), 21.
Teinonen, T., & Huotilainen, M. (2012). Implicit segmentation of a stream of syllables based on
transitional probabilities: An MEG study. Journal of Psycholinguistic Research 41(1), 71–82.
Terhardt, E. (1991). Music perception and sensory information acquisition: Relationships and low-
level analogies. Music Perception: An Interdisciplinary Journal 8(3), 217–239.
Tervaniemi, M. (2009). Musicians—same or different? Annals of the New York Academy of Sciences
1169, 151–156.
Tervaniemi, M., Castaneda, A., Knoll, M., & Uther, M. (2006). Sound processing in amateur
musicians and nonmusicians: Event-related potential and behavioral indices. Neuroreport 17(11),
1225–1228.
Tervaniemi, M., Ilvonen, T., Karma, K., Alho, K., & Näätänen, R. (1997). The musical brain: Brain
waves reveal the neurophysiological basis of musicality in human subjects. Neuroscience Letters
226(1), 1–4.
Tervaniemi, M., Janhunen, L., Kruck, S., Putkinen, V., & Huotilainen, M. (2016). Auditory profiles
of classical, jazz, and rock musicians: Genre-specific sensitivity to musical sound features.
Frontiers in Psychology 6, 1900. Retrieved from https://doi.org/10.3389/fpsyg.2015.01900
Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J., & Näätänen, R. (2001). Superior
formation of cortical memory traces for melodic patterns in musicians. Learning & Memory 8(5),
295–300.
Thiessen, E. D., Kronstein, A. T., & Hufnagle, D. G. (2013). The extraction and integration
framework: A two-process account of statistical learning. Psychological Bulletin 139(4), 792–814.
Tillmann, B., Bharucha, J., & Bigand, E. (2000). Implicit learning of tonality: A self-organized
approach. Psychological Review 107(4), 885–913.
Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical
priming. Cognitive Brain Research 16(2), 145–161.
Todd, N. P. M., & Cody, F. W. (2000). Vestibular responses to loud dance music: A physiological
basis of the “rock and roll threshold”? Journal of the Acoustical Society of America 107(1), 496–
500.
Todd, N. P. M., Paillard, A., Kluk, K., Whittle, E., & Colebatch, J. (2014). Vestibular receptors
contribute to cortical auditory evoked potentials. Hearing Research 309, 63–74.
Tramo, M. J., Shah, G. D., & Braida, L. D. (2002). Functional role of auditory cortex in frequency
processing and pitch perception. Journal of Neurophysiology 87(1), 122–139.
Villarreal, E. A. G., Brattico, E., Leino, S., Østergaard, L., & Vuust, P. (2011). Distinct neural
responses to chord violations: A multiple source analysis study. Brain Research 1389, 103–114.
Vuvan, D. T., Zendel, B. R., & Peretz, I. (2018). Random feedback makes listeners tone-deaf.
Scientific Reports 8(1), 7283.
Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch chroma
and pitch height in the human brain. Proceedings of the National Academy of Sciences 100(17),
10038–10042.
Whitfield, I. (1980). Auditory cortex and the pitch of complex tones. Journal of the Acoustical
Society of America 67(2), 644–647.
Winkler, I. (2007). Interpreting the mismatch negativity. Journal of Psychophysiology 21(3–4), 147–
163.
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes
human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422.
Zatorre, R. J. (1988). Pitch perception of complex tones and human temporal-lobe function. Journal
of the Acoustical Society of America 84, 566–572.
Zatorre, R. J. (2001). Neural specializations for tonal processing. Annals of the New York Academy of
Sciences 930, 193–210.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music
and speech. Trends in Cognitive Sciences 6(1), 37–46.
Zendel, B. R., Lagrois, M.-É., Robitaille, N., & Peretz, I. (2015). Attending to pitch information
inhibits processing of pitch information: The curious case of amusia. Journal of Neuroscience
35(9), 3815–3824.
Zhang, J., Zhou, X., Chang, R., & Yang, Y. (2018). Effects of global and local contexts on chord
processing: An ERP study. Neuropsychologia 109, 149–154.
Zuijen, T. L. von, Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2004). Grouping of
sequential sounds: An event-related potential study comparing musicians and nonmusicians.
Journal of Cognitive Neuroscience 16(2), 331–338.
Zuijen, T. L. von, Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2005). Auditory
organization of sound sequences by a temporal or numerical regularity: A mismatch negativity
study comparing musicians and non-musicians. Cognitive Brain Research 23(2–3), 270–276.
1. For similar results obtained from patients with (right) PAC lesions see Johnsrude, Penhune, and
Zatorre (2000) and Zatorre (2001).
2. Note that a finite state automaton will only (mis)understand that “Peter kissed Mary”!
CHAPTER 10

MULTISENSORY PROCESSING IN MUSIC

FRANK RUSSO

Definitions of music tend to be unimodal in nature, often including some
version of the idea that music is organized sound with aesthetic intent. Even
philosophical treatises that attempt to define music in broad terms tend to
overlook multisensory aspects (Nattiez, 1990; Thomas, 1983). However,
multisensory aspects abound. For instance, the facial expressions and body
gestures of a performer may be perceived through the visual system and the
mechanical vibrations produced by a musical instrument may be perceived
through the somatosensory system. Sensorimotor networks may also give
rise to cascade effects. For example, motor activity in response to a beat
may give rise to micro-movements of the head and torso, which may in turn
lead to vestibular stimulation. When the motor activity becomes entrained it
may serve as its own channel of sensory input. As such, the perception of
music is often multisensory, integrating inputs from auditory, visual,
somatosensory, vestibular, and motor areas.
This chapter has three main sections. The first provides an overview of
theory and evidence regarding multisensory processing. The second
considers auditory-only processing with a focus on lateralization, basic
modularity, and pathways. This sets the stage for the final section, which
considers non-auditory and multisensory processing of pitch, timbre, and
rhythm. In each subsection corresponding to a dimension of music,
psychophysical evidence is presented before reviewing the extant
neuroscientific evidence. Where no neuroscientific evidence exists,
proposals have been made about the types of neural processing that may be
involved.

MULTISENSORY PROCESSING

It has often been noted that speech is perceived by eye and by ear. This is
normally characterized as an opportunity to minimize uncertainty, as it allows the brain to capitalize on converging cues. However, it can also
represent a sensory-processing challenge in that information from across
two channels must somehow be bound together into a common
representation. This challenge may be even greater in music given the
additional channels of sensory information that are routinely involved and
the intentional use of uncertainty as a compositional device. Nevertheless,
under most conditions, multisensory information in music is successfully
integrated yielding a coherent and stable multisensory percept.
Information from across the senses may be integrated in a manner that is
cognitive or perceptual (Schutz, 2008). Cognitive integration takes place
after information from two or more channels has been processed
independently (see review and meta-analysis concerning audio-visual music
by Platz & Kopiez, 2012). A classic musical example of this type of
integration is the influence of performer attractiveness on judgments of
performance quality (Wapnick, Mazza, & Darrow, 2000). In this example,
information from one channel does not so much alter perception in another channel as influence how those perceptions are evaluated.
Another musical example that reflects cognitive multisensory integration
concerns the “blue note” in live jazz and blues. Blue notes are often
accompanied by a visual display that conveys negatively valenced emotion
(e.g., wincing of the eyes, shaking, or rolling the head back). Thompson,
Graham, and Russo (2005) sought to assess the effect of this practice by
using twenty clips of a blues concert performed by B.B. King. Although all
of the selected clips possessed some level of dissonance, half were
performed with a relatively neutral facial expression. Two groups of
participants were asked to provide judgments of dissonance. One group
made judgments in an auditory-only condition and the other made
judgments in an auditory-visual condition. Results revealed that visual
information influenced judgments of dissonance, such that the difference
between dissonant and neutral performances was greater in the audio-visual
condition. However, it would be erroneous to conclude that information
from the visual and auditory channel had been integrated at the level of
perceptual representation.
Integration at the perceptual level is said to take place when information
from across the senses is integrated in a manner that is automatic and pre-
attentive (Arieh & Marks, 2008; Spence, 2011). All of the multisensory
examples considered in the rest of this chapter meet these simple criteria.
However, the neural mechanisms allowing for perceptual integration are by
no means uniform. To foreshadow, there are at least three main types of
mechanisms that have been implicated. The mechanisms vary with respect
to network size but all involve some form of direct or indirect
communication between primary sensory areas of the brain (see Fig. 1).
FIGURE 1. Schematic diagram of brain circuitry underpinning three mechanisms of multisensory
integration (STS = Superior Temporal Sulcus; IFG = Inferior Frontal Gyrus; S = Somatosensory
Cortex; A = Auditory Cortex; V = Visual Cortex). Top panel diagrams first mechanism involving
primary sensory areas only. Second panel diagrams second mechanism involving the first
mechanism in addition to a known multisensory area, superior-temporal sulcus (STS). Bottom panel
diagrams third mechanism that may be described as sensorimotor. It builds on the second
mechanism adding feedback connections from a known motor planning area, inferior frontal gyrus
(IFG). Subcortical contributions from the superior colliculus not diagrammed.

First, a basic form of multisensory integration occurs when unisensory input activates areas of primary sensory cortex that are not normally
associated with that input. This phenomenon has been observed following
sensory deprivation that is permanent (e.g., blindness) or temporary (e.g.,
blindfold), suggesting a role for rapid cortical plasticity (Merabet et al.,
2008). Complementary evidence has been found using unisensory
information. For example, auditory cortex may be activated by lip reading
in the context of silent speech (Calvert et al., 1997) or silent tactile
stimulation (Foxe et al., 2002). Although not strictly multisensory, these
examples reveal the existence of lateral connections between primary
sensory areas and suggest the potential for integration without the
involvement of higher-order multisensory areas (Foxe & Schroeder, 2005).
Second, evidence has been observed for a “superadditive” neural
response to multisensory input that is greater than the neural response for
equivalent unisensory inputs. Most of the evidence for superadditivity comes from single-cell recordings in the superior colliculus of animal models (Stein & Meredith, 1993). However, using non-invasive
imaging methods, evidence for superadditivity has also been found in the
cerebral cortex. For instance, a superadditive response has been observed in
superior temporal sulcus using audio-tactile and audio-visual stimuli
(Beauchamp, Yasar, Frye, & Ro, 2008). This body of evidence suggests a
mechanism for multisensory integration that relies on hierarchical
processing involving the progressive convergence of pathways.
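This "superadditive" pattern is usually quantified against the best unisensory response. A common index from the superior colliculus literature (Meredith & Stein, 1986) expresses multisensory enhancement (ME) as a percentage gain:

```latex
% Multisensory enhancement: CM is the response to the combined stimulus;
% SM_max is the larger of the two unisensory responses.
\mathrm{ME} = \frac{CM - SM_{\max}}{SM_{\max}} \times 100\%
```

Superadditivity proper is the stricter case in which the combined response also exceeds the sum of the two unisensory responses.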
Finally, evidence is emerging from electrophysiological, neuroimaging,
and brain stimulation studies for the functional role of connectivity across
broad expanses of sensorimotor cortex (Frith & Hasson, 2016; Keil, Müller,
Ihssen, & Weisz, 2012; Luo, Liu, & Poeppel, 2010; Luo & Poeppel, 2007).
Synchronized oscillations across multisensory and motor areas may serve to
integrate and select task-relevant information from across the senses.
Sensory input may feed forward leading to a predictive motor code that is
informed by priors (empirically based expectations about movement
patterns). In turn, this predictive code can feed back to multisensory areas
allowing for comparison with incoming sensory input (Kilner, Friston, &
Frith, 2007). This body of evidence emphasizes the inherent uncertainty
that exists in sensory information and the important role that the motor system can play in resolving that uncertainty. This sensorimotor
mechanism allows for context-sensitive multisensory integration that relies
on feedforward and feedback connections (Senkowski, Schneider, Foxe, &
Engel, 2008).
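To give a feel for this loop, the toy sketch below (an illustration of the feedforward/feedback idea only, not any published model) treats the motor prediction as a running estimate that is corrected by the prediction error between each incoming sensory sample and the current prediction:

```python
import numpy as np

def predictive_loop(sensory, prior, gain=0.3):
    """Toy predictive-coding loop: the current prediction is compared with
    each incoming sensory sample and updated by the weighted prediction
    error. A gain near 0 trusts the prior; a gain near 1 trusts the senses."""
    estimate = prior
    history = []
    for sample in sensory:
        error = sample - estimate            # feedback: prediction error
        estimate = estimate + gain * error   # feedforward: corrected prediction
        history.append(estimate)
    return np.array(history)

rng = np.random.default_rng(0)
stream = 1.0 + 0.5 * rng.standard_normal(50)      # noisy input around a true value of 1.0
print(round(predictive_loop(stream, prior=0.0)[-1], 2))  # settles near 1.0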
In addition to investigating the particular mechanisms underpinning
multisensory integration, research has attempted to explain the extent to
which the different senses will contribute to the perception of a
multisensory stimulus. The likelihood of integrating information from
across the senses is lawfully related to the extent to which information
about a signal appears to overlap in space and time. In other words, when
the audio and visual aspects of a signal are delayed in time or separated in
space, the likelihood of integration is reduced. In addition, the law of
inverse effectiveness states that multisensory integration is inversely
proportional to the effectiveness of the strongest unisensory response
(Meredith & Stein, 1986; Stein & Meredith, 1993). Hence, if an auditory input is robust enough on its own to support some functional goal, it will be resistant to influence from non-auditory information. If the auditory input is
weak due to a compromised sensory system, perceptual ambiguity, or
masking from noise, then the likelihood of integrating information from
other senses increases.
Maximum-likelihood estimation (MLE) methods have been used to
model psychophysical as well as neural findings (Alais & Burr, 2004; Ernst
& Banks, 2002; Gu, Angelaki, & DeAngelis, 2008; Rohe & Noppeney,
2015). Based on Bayesian probability theory, MLE models are essentially a
weighted linear sum that combines signals from different senses (Angelaki,
Gu, & DeAngelis, 2009; Ernst & Bülthoff, 2004). The weight assigned to
each signal is determined by stimulus or perceiver characteristics that
influence signal reliability. Like the inverse effectiveness rule, the critical
assumption in this approach is that inherent uncertainty exists in sensory
information.
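A minimal sketch of this weighting scheme, under the standard assumption of independent Gaussian noise in each channel, is given below. The weights fall directly out of the channel reliabilities (inverse variances), so an unreliable channel is automatically down-weighted, in the spirit of the inverse effectiveness rule described above:

```python
def mle_combine(estimates, sigmas):
    """Maximum-likelihood (reliability-weighted) cue combination.
    estimates: per-sense estimates of the same quantity (e.g., event time in ms)
    sigmas: per-sense noise standard deviations (smaller = more reliable)."""
    reliabilities = [1.0 / s**2 for s in sigmas]
    weights = [r / sum(reliabilities) for r in reliabilities]
    fused = sum(w * e for w, e in zip(weights, estimates))
    fused_sigma = (1.0 / sum(reliabilities)) ** 0.5  # never worse than the best cue
    return fused, fused_sigma

# Reliable audition (sigma = 20 ms) vs. noisier vision (sigma = 60 ms):
print(mle_combine([100.0, 130.0], [20.0, 60.0]))  # fused estimate lands at 103 ms
```

Note that the fused standard deviation is never larger than that of the most reliable cue, which is the signature prediction of MLE integration (Ernst & Banks, 2002).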
AUDITORY PROCESSING

Despite the extensive involvement of non-auditory areas in music processing, there is no mistaking that the auditory cortex is the central hub
for processing music in the neurotypical brain. Rather than an
undifferentiated whole, the auditory cortex is best understood as a
collection of modules that work together as an “auditory network” enabling
the processing of separate dimensions of music. The exposition of these
modules is briefly reviewed here to allow for comparison with processing
of the same dimensions as experienced by other senses. While more
exhaustive reviews of auditory neuroscience may be found elsewhere in this
volume, the focus of the brief review provided here sets the stage for
subsequent discussion of evidence for non-auditory input activating
auditory cortex.
The area known as the auditory core exists in both hemispheres
including the superior temporal gyrus of the temporal lobe and extending
into the lateral sulcus as well as the transverse temporal gyrus, which runs toward the center of the brain. The latter is often referred to as Heschl’s
gyrus, which is the first structure in the cortex that reflects the tonotopic
map that originates in the cochlea. Some research has suggested the
existence of separate caudal and rostral tonotopic maps with mirror-like
orientations (Formisano et al., 2003). Additional tonotopic maps have been
found in the belt area surrounding the core (Rauschecker, Tian, & Hauser,
1995; Rauschecker, Tian, Pons, & Mishkin, 1997). Beyond the belt area lies
a tertiary area of auditory cortex known as the parabelt. The parabelt is
thought to have functionally distinct subdivisions (Kaas & Hackett, 2000).
The caudal subdivision abuts and is interconnected with the superior
temporal sulcus. Together, this caudal subdivision of the parabelt and the
superior temporal sulcus constitute the posterior hub of the auditory-motor
pathway (more details on pathways below).
An early PET study by Zatorre and Belin (2001) indicated that in both
hemispheres, temporal variation of auditory input engages the core, whereas
spectral variation engages the belt. However, responses to temporal features
(i.e., with relevance for rhythm) were clearly biased toward the left and
responses to spectral features (i.e., with relevance for pitch and timbre)
were clearly biased toward the right. This apparent pattern of hemispheric
specialization has been further validated by the results of
neuropsychological studies involving patients with cortical lesions. In
general, patients with lesions in the right hemisphere have more impaired
pitch processing than do those with lesions in the left hemisphere. For
example, lesions in the right hemisphere lead to weaker pitch discrimination
(Milner, 1962; Johnsrude, Penhune, & Zatorre, 2000), weaker perception of
the missing fundamental (Zatorre, 1988), weaker sensitivity to pitch
direction (Johnsrude, Penhune, & Zatorre, 2000), and weaker sensitivity to
the global pitch contour (Peretz, 1990).
On the basis of neurophysiological and psychophysical data, Poeppel
(2001) proposed a similar (but speech-specific) hemispheric specialization
that focused on the window of temporal integration. He proposed that the
left hemisphere had a short integration window (20–50 ms) that supports
processing of formant transitions and that the right hemisphere had a long
integration window (150–250 ms) that supports processing of intonation
contours. This specialization may ultimately be rooted in differences in the
volume of white-matter tissue across the two hemispheres. A post-mortem
study by Anderson, Southern, and Powers (1999) found a higher volume of
white matter tissue in the belt area of the left hemisphere compared to the
right due to greater thickness of the myelin sheathing. More recent
neuroimaging research has further validated this proposed explanation for
hemispheric specialization. For example, Hyde, Peretz, & Zatorre (2008)
found that activation in the right hemisphere increased parametrically as a
function of the pitch distance between consecutive tones. In contrast, they
observed only a coarse-grain differentiation in the left hemisphere. Auditory
evoked potentials have also been used to elucidate hemispheric
specialization. Neurons in the right hemisphere have been found to possess
sharper frequency tuning than those in the left hemisphere (Liégeois-
Chauvel, Giraud, Badier, Marquis, & Chauvel, 2012).

Pathways
Much like the “what” (ventral) and “where” (dorsal) visual pathways
originally proposed to explain functional organization in the visual system
(Goodale & Milner, 1992), there are two main auditory pathways leading
out of auditory cortex and terminating in frontal areas (Zatorre, Chen, &
Penhune, 2007). A ventral auditory pathway is thought to be involved
primarily with category-based representations (e.g., phonemes). A dorsal
“auditory-motor” pathway is thought to be specialized for sensorimotor
translations of time-varying information that is not categorical. This
pathway may be particularly important in the context of learning a new
piece of music (Lahav, Saltzman, & Schlaug, 2007; Schalles & Pineda,
2015), perceiving emotion in music (McGarry, Pineda, & Russo, 2015;
Thompson et al., 2005; Vines, Krumhansl, Wanderley, Dalca, & Levitin,
2011), and in the type of feedback monitoring required for performance,
particularly in continuous pitch instruments like voice or violin (Loui, 2015;
Zatorre et al., 2007). The auditory-motor pathway involves reciprocal
connections between inferior frontal gyrus and posterior subdivisions of the
superior temporal gyrus (auditory parabelt) and superior temporal sulcus
(multisensory area).

MULTISENSORY PROCESSING OF PITCH

Visuomotor Influences
Numerous studies have demonstrated that the size of a sung melodic
interval can be judged directly through the visual system. When videos of
sung melodic intervals are presented to observers without audio, they are
able to accurately scale them according to size (Thompson & Russo, 2007).
This ability does not appear to require music or vocal training, which argues
against a cognitive account based on long-term memory associations, and
further suggests that some aspects of the visual information provide reliable
cues for judging interval size. Video-based tracking has shown that larger
intervals possess more head movement, eyebrow raising, and mouth
opening. The influence of visual information on perception of size in sung
melodic intervals persists even under point-light presentation conditions in
which the dynamic information in the display is retained while eliminating
static visual cues (Abel, Li, Russo, Schlaug, & Loui, 2016).
The visual channel continues to influence the perceived size of sung
melodic intervals even when audio is present (Russo, Sandstrom, &
Maksimowski, 2011; Thompson et al., 2005; Thompson, Russo, &
Livingstone, 2010). The mouth area may be particularly important in
judging the size of sung melodic intervals as reducing the level of audibility
in an audio-visual presentation (by increasing the level of background noise) causes observers to increase the proportion of gaze directed toward the mouth
(Russo et al., 2011). However, the visual influence on auditory judgments
has been found to be attenuated for participants with an early onset of musical training (Abel et al., 2016). One interpretation of this finding is that
early-trained musicians possess a stronger audio-motor representation of
sung melodic intervals. This enhancement in priors may allow them to
focus on auditory input or rely less heavily on non-auditory input when
presented with multisensory musical stimuli. This prioritization may be
further reinforced through experience playing in groups where orthogonal
streams of audio and visual information may co-exist. But how can we be
sure that vision influences melodic pitch processing at a perceptual (vs.
cognitive) level?
One behavioral means of assessing whether multisensory integration is
perceptual is to utilize a dual-task paradigm. Thompson et al. (2010)
presented participants with sung melodic intervals accompanied by facial
expressions used to perform a small or large interval (two and nine
semitones, respectively). Participants were asked to count the number of 0’s
and 1’s that were superimposed over the singer’s face during performance
of each interval. The conditions were blocked by digit speed (300 or 700
msec per digit) as well as task demand (single or dual task). Results
revealed that the influence of the visual information on auditory judgments
of sung melodic interval size was not moderated by cognitive load. These
findings suggest that the integration was automatic and pre-attentive.
The cortical underpinnings of this example of multisensory integration
in music may originate in motion-selective areas of the dorsal visual pathway, such as the middle temporal (MT) and medial superior temporal (MST) areas. Both of these areas are adjacent to the posterior bank of the superior
temporal sulcus, a known multisensory area that projects to premotor areas
allowing for sensorimotor translations of dynamic sensory input (Kilner,
2011). There are also reciprocal connections from premotor to superior
temporal sulcus allowing for a predictive coding model of action involving
sensory representations (Kilner et al., 2007). This type of predictive coding
may be particularly important in shaping auditory judgments of action on
the basis of visual input alone or in situations where auditory input is
ambiguous for some reason (e.g., an individual with severe hearing loss or
an individual with normal hearing listening in low signal-to-noise
conditions).
The mechanism proposed for visual perception and audio-visual
integration of melodic pitch information involves feedforward and feedback
connections along the dorsal stream. Feedforward connections provide
multisensory input to motor planning areas. Feedback connections provide
predictive coding of movement informed by priors that can be compared
with incoming sensory information (Kilner et al., 2007; Maes, Leman,
Palmer, & Wanderley, 2014). In the case of individuals with severe hearing
loss, there may also be an additional contribution owed to visual activation
of the auditory cortex in belt areas (Finney, Fine, & Dobkins, 2001; Röder, Stock, Bien,
Neville, & Rösler, 2002). Research in animal models suggests that belt
areas undergo profound plastic changes following a period of auditory
deprivation, which leads in some cases to enhanced visual processing.
Lomber, Meredith, and Kral (2010) showed that deactivation of posterior
belt areas selectively eliminates enhancements to visual localization,
whereas deactivation of the dorsal belt areas eliminates enhancement of
visual motion detection.
Transcranial magnetic stimulation (TMS) has been used as one means of
investigating the assumed involvement of motor areas in processing sung
melodic intervals (Royal, Lidji, Théoret, Russo, & Peretz, 2015). Non-
musicians were given brief training that enabled them to apply a label to
intervals of different size (e.g., unison, octave, etc.). Following training,
facilitative TMS was applied over motor cortex, while participants observed
a pitch interval label that was immediately followed by the audio-visual
presentation of a sung interval. Participants were required to make a forced-
choice judgment regarding whether the pitch interval label matched the
pitch interval contained in the two-note vocal melody. Motor-evoked
potentials recorded from the mouth muscles contralateral to the hemisphere
receiving stimulation were found to increase relative to baseline for large
pitch intervals and decrease for small pitch intervals, suggesting that some
type of motor simulation was taking place.
Another line of evidence in support of motor involvement in perception
of song may be found in EEG research investigating the sensorimotor (or
mu) wave. The oscillatory generators of the sensorimotor wave can be
found in the inferior frontal gyrus, and to a lesser extent in the inferior
parietal lobe. The sensorimotor wave becomes desynchronized when an
individual moves intentionally or when they observe others moving
intentionally, and the extent of desynchronization is enhanced under
multisensory presentation conditions (Kaplan & Iacoboni, 2007; McGarry,
Russo, Schalles, & Pineda, 2012). These data have been interpreted as
evidence of an internal simulation involving motor planning and
proprioception. While some controversy exists regarding the putative mirror
system responsible for the sensorimotor wave (Hickok, 2009), its
responsiveness to observation of intentional action is less equivocal. A
meta-analysis by Fox et al. (2016), involving eighty-five studies, found
significant event-related desynchronization during observation of
intentional action (Cohen’s d = 0.31, N = 1,508). With regard to music
stimuli, evidence has been found for sensorimotor desynchronization in
response to audio-only presentations of isolated sung notes (Lévêque &
Schön, 2013) and audio-visual presentations of sung melodic intervals
(McGarry et al., 2015). Although it seems likely, it remains to be
determined whether sensorimotor desynchronization in response to song is
greater in multisensory compared to unisensory presentation conditions.
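In this literature, the degree of desynchronization is commonly indexed as the percentage change in mu-band power during movement or observation relative to a pre-stimulus baseline; one common convention is:

```latex
% Event-related desynchronization (ERD) as a percentage power change;
% negative values indicate desynchronization (mu suppression).
\mathrm{ERD}\% = \frac{P_{\text{event}} - P_{\text{baseline}}}{P_{\text{baseline}}} \times 100
```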

Somatosensory Influences
Because all sound arises from a source of mechanical vibration, it should be
no surprise that evidence exists for perception of pitch and other musical
dimensions on the basis of vibrotactile input (i.e., mechanical vibration of
the skin). Detection thresholds for vibrotactile stimuli show peak sensitivity
around 250 Hz, and a sharp decline in sensitivity (i.e., larger thresholds)
below 100 Hz (Hopkins, Maté-Cid, Fulford, Seiffert, & Ginsborg, 2016;
Morioka & Griffin, 2005; Verrillo, 1992). Thresholds are also smaller in
smooth (vs. hairy) skin due to increased mechanoreceptor density (Verrillo
& Bolanowski, 1986), and with large (vs. small) contactor areas due to
effects of spatial summation (Morioka & Griffin, 2005). Pitch
discrimination thresholds obtained with vibrotactile stimuli tend to be about
five times greater than those obtained with auditory stimuli (Branje,
Maksimowski, Karam, Fels, & Russo, 2010; Verrillo, 1992). In addition to
this relatively poor pitch discrimination ability, there is no convincing
psychophysical evidence for vibrotactile pitch discriminations beyond about
1,000 Hz.
Single-cell recordings in macaques have revealed that low-frequency
vibrotactile stimuli can activate belt areas of auditory cortex (Schroeder et
al., 2001). Convergent evidence has been found in imaging studies
involving adults with normal hearing. Low-frequency vibrotactile stimuli have been shown to activate auditory cortex bilaterally (Levänen, Jousmäki,
& Hari, 1998), particularly in posterior belt areas (Schürmann, Caetano,
Hlushchuk, Jousmäki, & Hari, 2006). The extent of auditory activations
observed in deaf participants is more widespread than that observed in
normal hearing participants (Auer, Bernstein, Sungkarat, & Singh, 2007),
likely due to neuroplastic changes following sensory deprivation. One
question resulting from this work is whether activation of auditory areas by
vibrotactile stimuli is direct or whether it is the result of projections from
somatosensory areas.
Using MEG, Caetano and Jousmäki (2006) were able to track the time
course of vibrotactile activations. They presented normal hearing
participants with 200 Hz vibrotactile stimuli delivered to the fingertips. An
initial response was observed in somatosensory cortex, peaking around 60
ms, followed by transient auditory responses in auditory and secondary
somatosensory cortices between 100 and 200 ms. Finally, a sustained
response was observed in auditory cortex between 200 and 700 ms.
Although these studies all presented unisensory stimuli, taken together the findings suggest a hierarchical mechanism for audio-tactile integration involving a progressive convergence of auditory and somatosensory pathways. One of the main areas of sensory convergence in
the cortex appears to be the posterior subdivisions of the auditory parabelt
and the superior temporal sulcus (see Fig. 2).
FIGURE 2. Schematic sagittal view of the human brain featuring modules and pathways that are
involved in the multisensory perception of music.

MULTISENSORY PROCESSING OF TIMBRE

Visuomotor Influences
Saldaña and Rosenblum (1993) presented participants with audio-visual recordings of cello tones in which bowing and plucking were crossed across the senses. So, for example, observers were presented with a multisensory
stimulus in which the audio channel consisted of an unequivocal plucking
sound and the visual channel presented an unequivocal bowing movement.
Much like the “McGurk effect” upon which this study is based (McGurk &
MacDonald, 1976), auditory judgments were influenced by visual
information. For instance, plucking sounds were more likely to be heard as
bowing when accompanied by bowing visual movement. The authors
interpreted their results with regard to an automatic internal motor
simulation that is driven by auditory and visual information. Much like the
explanation for sung melodic pitch, an internal motor simulation may have
provided a predictive coding model of action involving sensory
representations. The output of the predictive coding model may have been
integrated with direct auditory input at the level of the superior temporal
sulcus. Consistent with this interpretation, fMRI work involving
multisensory speech has consistently implicated the superior temporal
sulcus and superior temporal gyrus (Callan et al., 2003, 2004; Jones &
Callan, 2003). Similar evidence has been found with multisensory tool use
and the extent of activation in the superior temporal sulcus appears to
adhere to the law of inverse effectiveness (Stevenson & James, 2009).

Somatosensory Influences
Several studies have investigated the ability to discriminate timbre using
vibrotactile stimuli. Russo, Ammirante, and Fels (2012) found that deaf and
hearing observers were able to accurately distinguish instrument timbres on
the basis of vibrotactile information alone. Deaf and hearing participants
were also able to distinguish timbre on the basis of synthetic tones that
differed only with regard to spectral envelope (dull vs. bright). This ability
persisted even though numerous controls were put in place to ensure that
participants received no trace of residual auditory input. Russo et al. (2012)
proposed that vibrotactile discrimination involves the cortical integration of
spectral information filtered through frequency-tuned mechanoreceptors.
There are four known channels that respond to touch (Bolanowski,
Gescheider, Verrillo, & Checkosky, 1988), and each is sensitive to a unique
range of the frequency spectrum. This allows the mechanoreceptors to
collectively code for spectral shape in the same way that has been proposed
for critical bands in the auditory system (Makous, Friedman, & Vierck,
1995). It would only take two such channels to allow for the coding of
spectral tilt. A follow-up study revealed that deaf participants are able to
discriminate sung vowels and that extent of difference in spectral tilt
between pairs strongly predicted their discriminability (Ammirante, Russo,
Good, & Fels, 2013).
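A minimal sketch of this two-channel idea is given below: if one frequency-tuned channel passes low frequencies and another passes higher frequencies, the ratio of energy in the two channels rises with spectral brightness. The filter edges and test tones are illustrative placeholders, not measured mechanoreceptor tuning curves:

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 2000  # Hz; sampling rate covering the vibrotactile-relevant band
low_sos = butter(4, 80, btype="lowpass", fs=fs, output="sos")            # "low" channel
high_sos = butter(4, [150, 400], btype="bandpass", fs=fs, output="sos")  # "high" channel

def rms(x):
    return np.sqrt(np.mean(x ** 2))

def tilt_code(signal):
    """Ratio of high- to low-channel RMS: a crude two-channel code for spectral tilt."""
    return rms(sosfilt(high_sos, signal)) / rms(sosfilt(low_sos, signal))

t = np.arange(0, 1.0, 1 / fs)
dull = np.sin(2 * np.pi * 50 * t) + 0.1 * np.sin(2 * np.pi * 250 * t)    # steep tilt
bright = np.sin(2 * np.pi * 50 * t) + 0.8 * np.sin(2 * np.pi * 250 * t)  # shallow tilt
print(tilt_code(bright) > tilt_code(dull))  # True: the brighter tone yields a higher ratio
```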
In addition to the influence of vibrotactile stimulation on passive
reception of timbre, it seems likely that such stimulation provides
performers with valuable timbre information during active performance
(Marshall & Wanderley, 2011). As an example, the string vibrations of a
piano are detectable at the level of the key press. Vibration detection
thresholds are reduced under natural playing conditions involving active
touch (Papetti, Järveläinen, Giordano, Schiesser, & Fröhlich, 2017) and the co-occurrence of sound at the same frequency (Ro, Hsu, Yasar, Elmore, & Beauchamp, 2009). Perhaps not surprisingly, the perception of
sound quality as evaluated by the performer has been shown to be
influenced by vibration that is felt through the keys (Fontana, Papetti,
Järveläinen, & Avanzini, 2017).
To date, there have been no neural studies investigating auditory-tactile
perception of timbre. However, it seems likely that this ability would
depend on direct projections from somatosensory cortex to posterior belt
areas of auditory cortex (see top panel of Fig. 1). These direct projections
are likely to be right lateralized because of thinner myelin sheathing in the
right auditory cortex (Anderson et al., 1999), which may better support
communication across frequency channels, thus enabling spectral analysis.

MULTISENSORY PROCESSING OF RHYTHM

Visuomotor Influences
Rhythm involves the metrical patterning and grouping of tones that is
shaped by intensity and duration. Visual influences have been found to
affect the ability to track rhythm as well as the low-level dimensions that
contribute to rhythm (e.g., loudness and duration). Because percussionists cannot independently control the intensity and duration of the notes that they produce, the use of gestures may be particularly important in shaping these dimensions (Schutz, 2008). Rosenblum and
Fowler (1991) recorded handclaps of varying intensity. They presented
participants with audio-visual pairings of the handclaps that were either
congruent or incongruent. Although participants were asked to base
loudness judgments only on what they heard, the visual information
presented had a systematic influence on loudness judgments.
Schutz and colleagues have shown that expressive gestures are also able
to influence the duration of a performed note. Their initial study utilized
recordings of notes performed on a marimba with “long” and “short”
gestures (Schutz & Lipscomb, 2007). Audio and visual channels were
recombined to form congruent and incongruent audio-visual pairings. These
pairings were presented to listeners and they were asked to make duration
estimations on the basis of sound alone. Although the auditory content of
the recordings had no effect on estimations of duration, the visual
presentation influenced perceived duration such that long gestures
lengthened notes and short gestures shortened notes. This effect persisted
even when visual content was substituted with a point-light display,
suggesting that the effect was based on the dynamics of visual movement
(Schutz & Kubovy, 2009).
The ability to synchronize to metrical structures created by discrete
visual flashes has been found to be inferior to synchronization with discrete
auditory tones that have the same temporal characteristics (Patel, Iversen,
Chen, & Repp, 2005). However, the auditory advantage is almost
eliminated if visual rhythms are presented using continuous stimuli such as
a bouncing ball (Grahn, 2012; Hove, Fairhurst, Kotz, & Keller, 2013;
Iversen, Patel, Nicodemus, & Emmorey, 2015). Imaging results have shown
that activation in the putamen, a key timing area involved in motor planning
and beat perception (Grahn & Brett, 2007), parallels results obtained with
sensorimotor synchronization tasks. In particular, continuous visual stimuli
led to greater activation of the putamen than did visual flashes, approaching
activation levels obtained with auditory beeps. This finding suggests that
the ability to synchronize to metrical structure is not simply contingent on
the channel of sensory input but also on the nature of stimulus presentation
(Grahn, 2012; Hove et al., 2013; Ross, Iversen, & Balasubramaniam, 2016).
While discrete events are optimal with auditory stimuli, continuous events
lead to better outcomes with visual stimuli. Some evidence suggests that the
deaf possess some advantage in tracking visual rhythms (Iversen et al.,
2015). The latter finding may be owed to neuroplastic changes resulting
from sensory deprivation and life-long experience with signing (Bavelier et
al., 2000, 2001). Referring back to Fig. 1, the strength of direct visual input
to auditory-motor pathways is likely enhanced in deaf individuals.
Many studies have used EEG to assess neural entrainment to the beat.
When the frequency of the beat is within the range of human movement
(e.g., 1 to 4 Hz), large swathes of cortex entrain to that frequency. These
neural oscillations will persist even after a rhythmic stimulus has been
temporarily paused. Depending on when the rhythmic stimulus is resumed,
the entrained neural oscillations will either increase or decrease in power
(Simon & Wallace, 2017). Power decreases when the rhythmic stimulus
anticipates the beat (too early) and it increases when the rhythmic stimulus
is resumed on the beat (on time). However, if the beat is resumed as an
audio-visual event, there is no modulation of power in the entrained neural
oscillations. These findings reveal that multisensory inputs are not
equivalent to auditory inputs with respect to entrainment. One interpretation
is that multisensory input is “highly reliable or salient” and that resources
should be allocated to processing it independently from the oscillations
manifesting from the original auditory-only beat. This pattern of neural
findings may also help to explain results from sensorimotor synchronization
studies revealing superior synchronization using multisensory rhythms
compared with auditory-only rhythms (Elliott, Wing, & Welchman, 2010;
Varlet, Marin, Issartel, Schmidt, & Bardy, 2012).
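Entrainment of this kind is typically quantified by frequency tagging: transforming the EEG to the frequency domain and comparing the spectral amplitude at the beat frequency against neighboring bins. The sketch below applies this logic to synthetic data (a made-up 2 Hz "beat" component in noise; not a reanalysis of any study):

```python
import numpy as np

fs, beat_hz, dur = 250, 2.0, 60   # sampling rate (Hz), beat frequency, seconds
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(1)
# Synthetic "EEG": an entrained 2 Hz component buried in broadband noise.
eeg = 0.8 * np.sin(2 * np.pi * beat_hz * t) + rng.standard_normal(t.size)

amps = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
beat_bin = int(np.argmin(np.abs(freqs - beat_hz)))
# Signal-to-noise ratio: amplitude at the beat frequency vs. neighboring bins.
noise = np.r_[amps[beat_bin - 5:beat_bin - 1], amps[beat_bin + 2:beat_bin + 6]]
print(f"SNR at {beat_hz} Hz: {amps[beat_bin] / noise.mean():.1f}")
```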
Although visual influences on the perception of rhythm can be powerful,
it is important to acknowledge that many listeners will choose to listen with
their eyes closed under challenging conditions. One interpretation of this
phenomenon is that the visual information is somehow distracting. In a task
involving temporal order judgments of varying complexity, researchers
found progressively greater deactivation of visual cortical areas as temporal
asynchronies approached discrimination thresholds (Hairston et al., 2008).
This finding is perhaps best understood from the perspective of the inverse
effectiveness rule (Stein & Meredith, 1993), whereby deactivation of the
visual cortex protects against integration of potentially aberrant timing
information from the visual system in a task that is well handled by
audition.

Somatosensory Influences
Some evidence exists for the somatosensory system contributing to the
perception of rhythm. Tranchant et al. (2017) asked deaf and hearing
participants to synchronize movements to a vibrotactile beat delivered
through a vibrating platform. Hearing participants were also asked to
synchronize movements to the same beat delivered through audition and
without vibrotactile stimulation. Results revealed that most participants
were able to synchronize to the vibrotactile beat with no differences
between groups. However, for hearing participants, synchronization
performance was better in the auditory condition than in the vibrotactile
condition.
Other studies have demonstrated that sensorimotor synchronization to a
beat is possible using vibrotactile stimulation applied to the fingertip
(Brochard, Touzalin, Després, & Dufour, 2008; Elliott et al., 2010), toe
(Müller et al., 2008), or to the back (Ammirante, Patel, & Russo, 2016).
Findings have revealed that synchronization to a simple (metronomic)
vibrotactile beat can be as accurate as synchronization to an auditory beat
but only under certain conditions. For example, Müller et al. (2008) found
equivalence on the fingertip but not the toe and Ammirante et al. (2016)
found equivalence on the back, but only when a large portion of the back
was stimulated. Presumably, spatial summation (involving integration of
information across receptors), improved the somatosensory response to
rhythmic information (Gescheider, Bolanowski, Pope, & Verrillo, 2002).
Ammirante et al. (2016) also included an audio-tactile condition to
investigate multisensory integration. Results indicated that sensorimotor
synchronization to audio was consistently equivalent to auditory-tactile synchronization, regardless of contactor size. These results may be interpreted with respect
to the maximum likelihood estimation model (Ernst & Banks, 2002), where
auditory information represents a highly reliable cue that is resistant to
integration with information from a somewhat less reliable channel of
sensory input (vibrotactile).
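To make this interpretation concrete, plugging hypothetical noise values into the MLE combination rule introduced earlier, say σA = 20 ms for audition and σT = 60 ms for touch, shows why the tactile cue adds little:

```latex
% Fused (audio-tactile) noise under MLE with hypothetical sigmas:
\sigma_{AT} = \left(\frac{1}{\sigma_A^{2}} + \frac{1}{\sigma_T^{2}}\right)^{-1/2}
            = \left(\frac{1}{400} + \frac{1}{3600}\right)^{-1/2} \approx 19\ \text{ms}
```

This is a predicted gain of roughly 1 ms over audition alone, likely too small to detect in a behavioral synchronization task.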
The results of Ammirante et al. (2016) may also be considered with
respect to sensorimotor models of perception (Fig. 1, Panel 3). The Action
Simulation for Auditory Prediction (ASAP) model suggests that our ability
to find the beat in rhythm is based on an internal simulation of periodic
motor activity (Patel & Iversen, 2014). A secondary hypothesis posited in
the model is that beat perception evolved from mechanisms required for
verbal communication, as both involve periodic timing and the integration
of motor and auditory information. This hypothesis is supported in part by
the observation that beat synchronization exists robustly in vocal-learning
species that are only distantly related to humans (e.g., parrots and elephants)
and not at all in non-human primates (Merchant, Grahn, Trainor,
Rohrmeier, & Fitch, 2015).
As vocal communication is primarily based in the auditory modality it
follows that cognitive and neurological timing mechanisms would show a
preference for auditory stimuli. Again, this prediction is confirmed by
evidence demonstrating that sensorimotor synchronization to auditory
stimuli tends to be superior to sensorimotor synchronization to visual or
vibrotactile stimuli. Current research in my lab led by Sean Gilmore is
using EEG and source analysis to investigate the extent to which neural
entrainment to the beat is possible under audio-only, vibrotactile-only, and audio-vibrotactile conditions. On the basis of the behavioral results of
Ammirante et al. (2016), we expect to find that neural entrainment in motor
planning areas will be weakest for vibrotactile stimuli and that no
differences will exist between audio and audio-tactile conditions.

Movement-Based Influences
Both passive and active head movements are capable of stimulating the
vestibular system (Cullen & Roy, 2004). Given that people actively move
their heads while listening to music it would seem that vestibular
stimulation is commonplace in music listening. Moreover, given that
vestibular cortex is extensively connected with other sensory systems it
stands to reason that there are ample opportunities for multisensory
integration in music that involve the vestibular system. Phillips-Silver and Trainor (2005) assessed the contribution of the vestibular system to multisensory rhythm perception using an ambiguous auditory rhythm. Such a rhythm can be encoded in duple form (a march) or in triple form (a waltz). The
rhythms were presented to infants while they were bounced on every
second or every third beat. On the basis of a head-turn preference
procedure, researchers were able to conclude that when infants were
bounced on every second beat, they were coding the ambiguous rhythm in
duple form, and when they were bounced on every third beat they coded the
rhythm in triple form. A follow-up experiment in the same study showed
that blindfolding infants mitigated but did not eliminate the effect, which
confirms that this example of multisensory integration in rhythm does not
depend on visual perception.
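As a schematic illustration of this encoding difference (an illustrative pattern only, not the study's actual stimulus), the same isochronous stream becomes a march or a waltz depending on which beats receive a movement-based accent:

```python
beats = 12
duple = ["X" if i % 2 == 0 else "." for i in range(beats)]   # bounced on every 2nd beat
triple = ["X" if i % 3 == 0 else "." for i in range(beats)]  # bounced on every 3rd beat
print("duple (march): ", " ".join(duple))   # X . X . X . X . X . X .
print("triple (waltz):", " ".join(triple))  # X . . X . . X . . X . .
```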
Two other studies by Trainor and colleagues have confirmed that these
effects of auditory-vestibular integration in music persist into adulthood. In
one study, adults were trained to bounce in duple or triple time while
listening to an ambiguous rhythm. A subsequent listening test showed that
adults identified an auditory version of the rhythm pattern with accented
beats that matched their bouncing experience as more similar than a version
whose accents did not match (Phillips-Silver & Trainor, 2007). Because this
study involved self-motion it was not able to separate out the contributions
of vestibular and proprioceptive cues. However, a follow-up study
involving direct galvanic stimulation of the vestibular system was able to
provide evidence that auditory and vestibular information are integrated in
rhythm perception in adults even in the absence of movement. In single cell
recordings involving animal models, the posterior parietal cortex appears to
be a likely locus of multisensory integration involving vestibular input
(Bremmer, Schlack, Duhamel, Graf, & Fink, 2001). This area happens to be
proximal to other cortical areas that have been implicated as contributing to
multisensory processing (i.e., posterior superior temporal gyrus, auditory
parabelt, and medial temporal areas).
Other researchers have considered the consequences of multisensory
integration resulting from moving to the beat. Manning and Schutz (2013)
had participants move or simply listen to an isochronous beat. A final tone
was presented following a brief pause and participants were asked whether
it was consistent with the timing of the preceding sequence. Accuracy in
this timing task was superior in the movement condition. In a follow-up
study, it was found that the accuracy gains in this timing task are greater in
percussionists than in non-percussionists, suggesting a role for experience
with moving to the beat (Manning & Schutz, 2016). It seems likely that the
multisensory timing cues resulting from moving to the beat would lead to
stronger neural entrainment to the beat. Indeed, EEG research involving an
ambiguous rhythm has shown that entrainment is stronger after participants
have been trained to move to the rhythm in a way that suggests a binary or
ternary form (Chemin, Mouraux, & Nozaradan, 2014). In addition, the
entrainment gains were detectable at frequencies related to the meter of
movement.
SUMMARY AND CONCLUSIONS

This chapter has provided theory and evidence regarding multisensory processing in music. Three mechanisms were proposed and a broad range
of evidence was reviewed. Fig. 3 provides a schematic depiction of this
review focusing on brain areas and connections that underpin multimodal
processing of pitch, timbre, and rhythm. Solid lines are used to indicate
connections that have been validated using multiple lines of evidence.
Dashed lines are used to indicate connections that are more theoretical with
only limited validation. Regardless of the evidential status, the proposed
connection strength is reflected by line thickness. Due to space
considerations, this review has been necessarily selective in topics
considered. A more exhaustive consideration of the subject could have
broadened the focus to include multisensory perception of lyrics (Quinto,
Thompson, Russo, & Trehub, 2010), expressivity (Vuoskoski, Thompson,
Clarke, & Spence, 2014), and emotion (Thompson, Russo, & Quinto, 2008;
Vines et al., 2011), as well as examples of multisensory integration that are
better understood from an associative or cognitive perspective (e.g., North,
2012; North, Hargreaves, & McKendrick, 1999; Wapnick et al., 2000).
Nonetheless, this chapter has attempted to make the case that our
conceptualization of music should be multisensory. Although the majority
of individuals will justifiably focus on sound as the core of music
processing, a more inclusive and nuanced consideration of music takes a
multisensory perspective, involving the integration of inputs from auditory,
visual, somatosensory, vestibular, and motor areas.
FIGURE 3. Schematic representation of cortical connections supporting multisensory perception of
music.
ACKNOWLEDGMENTS
Funding supporting this research was provided by a Discovery Grant from
the Natural Sciences and Engineering Research Council of Canada
(NSERC). I would like to thank Fran Copelli for assistance with figures and
discussion of concepts. Sean Gilmore and Michael Schutz provided
valuable feedback on earlier drafts of this chapter.

REFERENCES
Abel, M. K., Li, H. C., Russo, F. A., Schlaug, G., & Loui, P. (2016). Audiovisual interval size
estimation is associated with early musical training. PLoS ONE 11(10), 1–12.
Alais, D., & Burr, D. (2004). Ventriloquist effect results from near-optimal bimodal integration.
Current Biology 14(3), 257–262.
Ammirante, P., Patel, A. D., & Russo, F. A. (2016). Synchronizing to auditory and tactile
metronomes: A test of the auditory-motor enhancement hypothesis. Psychonomic Bulletin &
Review 23(6), 1882–1890.
Ammirante, P., Russo, F. A., Good, A., & Fels, D. I. (2013). Feeling voices. PLoS ONE 8(1), 1–5.
Anderson, B., Southern, B. D., & Powers, R. E. (1999). Anatomic asymmetries of the posterior
superior temporal lobes: A postmortem study. Neuropsychiatry, Neuropsychology, and Behavioral
Neurology 12(4), 247–254.
Angelaki, D. E., Gu, Y., & DeAngelis, G. C. (2009). Multisensory integration: Psychophysics,
neurophysiology, and computation. Current Opinion in Neurobiology 19(4), 452–458.
Arieh, Y., & Marks, L. E. (2008). Cross-modal interaction between vision and hearing: A speed-
accuracy analysis. Perception & Psychophysics 70(3), 412–421.
Auer, E. T., Bernstein, L. E., Sungkarat, W., & Singh, M. (2007). Vibrotactile activation of the
auditory cortices in deaf versus hearing adults. Neuroreport 18(7), 645–648.
Bavelier, D., Brozinsky, C., Tomann, A., Mitchell, T., Neville, H., & Liu, G. (2001). Impact of early
deafness and early exposure to sign language on the cerebral organization for motion processing.
Journal of Neuroscience 21(22), 8931–8942.
Bavelier, D., Tomann, A., Hutton, C., Mitchell, T., Corina, D., Liu, G., & Neville, H. (2000). Visual
attention to the periphery is enhanced in congenitally deaf individuals. Journal of Neuroscience
20(17), RC93.
Beauchamp, M. S., Yasar, N. E., Frye, R. E., & Ro, T. (2008). Touch, sound and vision in human
superior temporal sulcus. NeuroImage 41(3), 1011–1020.
Bolanowski, S. J., Gescheider, G. A., Verrillo, R. T., & Checkosky, C. M. (1988). Four channels
mediate the mechanical aspects of touch. Journal of the Acoustical Society of America 84(5),
1680–1694.
Branje, C., Maksimowski, M., Karam, M., Fels, D. I., & Russo, F. A. (2010). Vibrotactile display of
music on the human back. Proceedings of the 3rd International Conference on Advances in
Computer–Human Interactions, ACHI 2010 (pp. 154–159). Retrieved from
https://doi.org/10.1109/ACHI.2010.40
Bremmer, F., Schlack, A., Duhamel, J. R., Graf, W., & Fink, G. R. (2001). Space coding in primate
posterior parietal cortex. NeuroImage 14(1), S46–S51.
Brochard, R., Touzalin, P., Després, O., & Dufour, A. (2008). Evidence of beat perception via purely
tactile stimulation. Brain Research 1223, 59–64.
Caetano, G., & Jousmäki, V. (2006). Evidence of vibrotactile input to human auditory cortex.
NeuroImage 29(1), 15–28.
Callan, D. E., Jones, J. A., Munhall, K., Callan, A. M., Kroos, C., & Vatikiotis-Bateson, E. (2003).
Neural processes underlying perceptual enhancement by visual speech gestures. Neuroreport
14(17), 2213–2218.
Callan, D. E., Jones, J. A., Munhall, K., Kroos, C., Callan, A. M., & Vatikiotis-Bateson, E. (2004).
Multisensory integration sites identified by perception of spatial wavelet filtered visual speech
gesture information. Journal of Cognitive Neuroscience 16(5), 805–816.
Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C. R., McGuire, P. K., …
David, A. S. (1997). Activation of auditory cortex during silent lipreading. Science 276(5312),
593–596.
Chemin, B., Mouraux, A., & Nozaradan, S. (2014). Body movement selectively shapes the neural
representation of musical rhythms. Psychological Science 25(12), 2147–2159.
Cullen, K. E., & Roy, J. E. (2004). Signal processing in the vestibular system during active versus
passive head movements. Journal of Neurophysiology 91(5), 1919–1933.
Elliott, M. T., Wing, A. M., & Welchman, A. E. (2010). Multisensory cues improve sensorimotor
synchronisation. European Journal of Neuroscience 31(10), 1828–1835.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a
statistically optimal fashion. Nature 415(6870), 429–433.
Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive
Sciences 8(4), 162–169.
Finney, E. M., Fine, I., & Dobkins, K. R. (2001). Visual stimuli activate auditory cortex in the deaf. Nature Neuroscience 4(12), 1171–1173.
Fontana, F., Papetti, S., Järveläinen, H., & Avanzini, F. (2017). Detection of keyboard vibrations and
effects on perceived piano quality. Journal of the Acoustical Society of America 142(5), 2953–
2967.
Formisano, E., Kim, D. S., Di Salle, F., Van De Moortele, P. F., Ugurbil, K., & Goebel, R. (2003).
Mirror-symmetric tonotopic maps in human primary auditory cortex. Neuron 40(4), 859–869.
Fox, N. A., Yoo, K. H., Bowman, L. C., Cannon, E. N., Ferrari, P. F., Bakermans-Kranenburg, M. J.,
… Van IJzendoorn, M. H. (2016). Assessing human mirror activity with EEG mu rhythm: A meta-
analysis. Psychological Bulletin 142(3), 291–313.
Foxe, J. J., & Schroeder, C. E. (2005). The case for feedforward multisensory convergence during
early cortical processing. Neuroreport 16(5), 419–423.
Foxe, J. J., Wylie, G. R., Martinez, A., Schroeder, C. E., Javitt, D. C., Guilfoyle, D., … Murray, M.
M. (2002). Auditory-somatosensory multisensory processing in auditory association cortex: An
fMRI study. Journal of Neurophysiology 88(1), 540–543.
Frith, C. D., & Hasson, U. (2016). Mirroring and beyond: Coupled dynamics as a generalized
framework for modelling social interactions. Philosophical Transactions of the Royal Society B:
Biological Sciences 371(1693), 20150366. Retrieved from https://doi.org/10.1098/rstb.2015.0366
Gescheider, G. A., Bolanowski, S. J., Pope, J. V., & Verrillo, R. T. (2002). A four-channel analysis of
the tactile sensitivity of the fingertip: Frequency selectivity, spatial summation, and temporal
summation. Somatosensory and Motor Research 19(2), 114–124.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends
in Neurosciences 15(1), 20–25.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental
Brain Research 220(1), 51–61.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of
Cognitive Neuroscience 19(5), 893–906.
Gu, Y., Angelaki, D. E., & DeAngelis, G. C. (2008). Neural correlates of multisensory cue
integration in macaque MSTd. Nature Neuroscience 11(10), 1201–1210.
Hairston, W. D., Hodges, D. A., Casanova, R., Hayasaka, S., Kraft, R., Maldjian, J. A., & Burdette, J.
H. (2008). Closing the mind’s eye: Deactivation of visual cortex related to auditory task difficulty.
Neuroreport 19(2), 151–154.
Hickok, G. (2009). Eight problems for the mirror neuron theory of action understanding in monkeys
and humans. Journal of Cognitive Neuroscience 21(7), 1229–1243.
Hopkins, C., Maté-Cid, S., Fulford, R., Seiffert, G., & Ginsborg, J. (2016). Vibrotactile presentation
of musical notes to the glabrous skin for adults with normal hearing or a hearing impairment:
Thresholds, dynamic range and high-frequency perception. PLoS ONE 11(5), e0155807. Retrieved
from https://doi.org/10.1371/journal.pone.0155807
Hove, M. J., Fairhurst, M. T., Kotz, S. A., & Keller, P. E. (2013). Synchronizing with auditory and
visual rhythms: An fMRI assessment of modality differences and modality appropriateness.
NeuroImage 67, 313–321.
Hyde, K. L., Peretz, I., & Zatorre, R. J. (2008). Evidence for the role of the right auditory cortex in
fine pitch resolution. Neuropsychologia 46(2), 632–639.
Iversen, J. R., Patel, A. D., Nicodemus, B., & Emmorey, K. (2015). Synchronization to auditory and
visual rhythms in hearing and deaf individuals. Cognition 134, 232–244.
Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right human
auditory cortex for perceiving pitch direction. Brain 123(1), 155–163.
Jones, J. A., & Callan, D. E. (2003). Brain activity during audiovisual speech perception: An fMRI
study of the McGurk effect. Neuroreport 14(8), 1129–1133.
Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing streams in
primates. Proceedings of the National Academy of Sciences 97(22), 11793–11799.
Kaplan, J. T., & Iacoboni, M. (2007). Multimodal action representation in human left ventral
premotor cortex. Cognitive Processing 8(2), 103–113.
Keil, J., Müller, N., Ihssen, N., & Weisz, N. (2012). On the variability of the McGurk effect:
Audiovisual integration depends on prestimulus brain states. Cerebral Cortex 22(1), 221–231.
Kilner, J. M. (2011). More than one pathway to action understanding. Trends in Cognitive Sciences
15(8), 352–357.
Kilner, J. M., Friston, K. J., & Frith, C. D. (2007). Predictive coding: An account of the mirror
neuron system. Cognitive Processing 8(3), 159–166.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor
recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2),
308–314.
Levänen, S., Jousmäki, V., & Hari, R. (1998). Vibration-induced auditory-cortex activation in a
congenitally deaf adult. Current Biology 8(15), 869–872.
Lévêque, Y., & Schön, D. (2013). Listening to the human voice alters sensorimotor brain rhythms.
PLoS ONE 8(11), 1–10.
Liégeois-Chauvel, C., Giraud, K., Badier, J. M., Marquis, P., & Chauvel, P. (2012). Intracerebral
evoked potentials in pitch perception reveal a functional asymmetry of human auditory cortex.
Annals of the New York Academy of Sciences 930, 117–132.
Lomber, S. G., Meredith, M. A., & Kral, A. (2010). Cross-modal plasticity in specific auditory
cortices underlies visual compensations in the deaf. Nature Neuroscience 13(11), 1421–1427.
Loui, P. (2015). A dual-stream neuroanatomy of singing. Music Perception: An Interdisciplinary
Journal 32(3), 232–241.
Luo, H., Liu, Z., & Poeppel, D. (2010). Auditory cortex tracks both auditory and visual stimulus
dynamics using low-frequency neuronal phase modulation. PLoS Biology 8(8), e1000445.
Retrieved from http://dx.plos.org/10.1371/journal.pbio.1000445.g007
Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in
human auditory cortex. Neuron 54(6), 1001–1010.
McGarry, L. M., Pineda, J. A., & Russo, F. A. (2015). The role of the extended MNS in emotional
and nonemotional judgments of human song. Cognitive, Affective, & Behavioral Neuroscience
15(1), 32–44. https://doi.org/10.3758/s13415-014-0311-x
McGarry, L. M., Russo, F. A., Schalles, M. D., & Pineda, J. A. (2012). Audio-visual facilitation of
the mu rhythm. Experimental Brain Research 218(4), 527–538.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature 264(5588), 746–748.
Maes, P.-J., Leman, M., Palmer, C., & Wanderley, M. M. (2014). Action-based effects on music
perception. Frontiers in Psychology 4. Retrieved from https://doi.org/10.3389/fpsyg.2013.01008
Makous, J. C., Friedman, R. M., & Vierck, C. J. (1995). A critical band filter in touch. Journal of
Neuroscience 15(4), 2808–2818.
Manning, F. C., & Schutz, M. (2013). “Moving to the beat” improves timing perception.
Psychonomic Bulletin and Review 20(6), 1133–1139.
Manning, F. C., & Schutz, M. (2016). Trained to keep a beat: Movement-related enhancements to
timing perception in percussionists and non-percussionists. Psychological Research 80(4), 532–
542.
Marshall, M. T., & Wanderley, M. M. (2011). Examining the effects of embedded vibrotactile
feedback on the feel of a digital musical instrument. New Interfaces for Musical Expression (June),
399–404.
Merabet, L. B., Hamilton, R., Schlaug, G., Swisher, J. D., Kiriakopoulos, E. T., Pitskel, N. B., …
Pascual-Leone, A. (2008). Rapid and reversible recruitment of early visual cortex for touch. PLoS
ONE 3(8). Retrieved from https://doi.org/10.1371/journal.pone.0003046
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A
neural perspective across humans and non-human primates. Philosophical Transactions of the
Royal Society B: Biological Sciences 370(1664), 20140093. Retrieved from
https://doi.org/10.1098/rstb.2014.0093
Meredith, M. A., & Stein, B. E. (1986). Visual, auditory, and somatosensory convergence on cells in
superior colliculus results in multisensory integration. Journal of Neurophysiology 56(3), 640–662.
Milner, B. (1962). Laterality effects in audition. In V. B. Mountcastle (Ed.), Interhemispheric
relations and cerebral dominance (pp. 177–195). Baltimore, MD: Johns Hopkins University Press.
Morioka, M., & Griffin, M. J. (2005). Thresholds for the perception of hand-transmitted vibration:
Dependence on contact area and contact location. Somatosensory and Motor Research 22(4), 281–
297.
Müller, K., Aschersleben, G., Schmitz, F., Schnitzler, A., Freund, H. J., & Prinz, W. (2008). Inter-
versus intramodal integration in sensorimotor synchronization: A combined behavioral and
magnetoencephalographic study. Experimental Brain Research 185(2), 309–318.
Nattiez, J.-J. (1990). Music and discourse: Toward a semiology of music. Princeton, NJ: Princeton
University Press.
North, A. C. (2012). The effect of background music on the taste of wine. British Journal of
Psychology,103(3), 293–301.
North, A. C., Hargreaves, D. J., & McKendrick, J. (1999). The influence of in-store music on wine
selections. Journal of Applied Psychology 84(2), 271–276.
Papetti, S., Jarvelainen, H., Giordano, B. L., Schiesser, S., & Frohlich, M. (2017). Vibrotactile
sensitivity in active touch: Effect of pressing force. IEEE Transactions on Haptics 10(1), 113–122.
Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The
Action Simulation for Auditory Prediction (ASAP) hypothesis. Frontiers in Systems Neuroscience
8, 57. Retrieved from https://doi.org/10.3389/fnsys.2014.00057
Patel, A. D., Iversen, J. R., Chen, Y., & Repp, B. H. (2005). The influence of metricality and
modality on synchronization with a beat. Experimental Brain Research 163(2), 226–238.
Peretz, I. (1990). Processing of local and global musical information by unilateral brain-damaged
patients. Brain 113(4), 1185–1205.
Phillips-Silver, J., & Trainor, L. J. (2005). Feeling the beat: Movement influences infant rhythm
perception. Science 308(5727), 1430.
Phillips-Silver, J., & Trainor, L. J. (2007). Hearing what the body feels: Auditory encoding of
rhythmic movement. Cognition 105(3), 533–546.
Platz, F., & Kopiez, R. (2012). When the eye listens: A meta-analysis of how audio-visual
presentation enhances the appreciation of music performance. Music Perception 30(1), 71–83.
Poeppel, D. (2001). Pure word deafness and the bilateral processing of the speech code. Cognitive
Science 25(5), 679–693.
Quinto, L., Thompson, W. F., Russo, F. A., & Trehub, S. E. (2010). A comparison of the McGurk
effect for spoken and sung syllables. Attention, Perception, & Psychophysics 72(6), 1450–1454.
Rauschecker, J. P., Tian, B., & Hauser, M. (1995). Processing of complex sounds in the macaque
nonprimary auditory cortex. Science 268(5207), 111–114.
Rauschecker, J. P., Tian, B., Pons, T., & Mishkin, M. (1997). Serial and parallel processing in rhesus
monkey auditory cortex. Journal of Comparative Neurology 382(1), 89–103.
Ro, T., Hsu, J., Yasar, N. E., Caitlin Elmore, L., & Beauchamp, M. S. (2009). Sound enhances touch
perception. Experimental Brain Research 195(1), 135–143.
Röder, B., Stock, O., Bien, S., Neville, H., & Rösler, F. (2002). Speech processing activates visual
cortex in congenitally blind humans. European Journal of Neuroscience 16(5), 930–936.
Rohe, T., & Noppeney, U. (2015). Cortical hierarchies perform Bayesian causal inference in
multisensory perception. PLoS Biology 13(2). Retrieved from
https://doi.org/10.1371/journal.pbio.1002073
Rosenblum, L. D., & Fowler, C. A. (1991). Audiovisual investigation of the loudness-effort effect for
speech and nonspeech events. Journal of Experimental Psychology: Human Perception and
Performance 17(4), 976–985.
Ross, J. M., Iversen, J. R., & Balasubramaniam, R. (2016). Motor simulation theories of musical beat
perception. Neurocase 22(6), 558–565.
Royal, I., Lidji, P., Théoret, H., Russo, F. A., & Peretz, I. (2015). Excitability of the motor system: A
transcranial magnetic stimulation study on singing and speaking. Neuropsychologia 75, 525–532.
Russo, F. A., Ammirante, P., & Fels, D. I. (2012). Vibrotactile discrimination of musical timbre.
Journal of Experimental Psychology: Human Perception and Performance 38(4), 822–826.
Russo, F. A., Sandstrom, G. M., & Maksimowski, M. (2011). Mouth versus eyes: Gaze fixation
during perception of sung interval size. Psychomusicology: Music, Mind, and Brain 21(1–2), 98–
107.
Saldaña, H. M., & Rosenblum, L. D. (1993). Visual influences on auditory pluck and bow judgments.
Perception & Psychophysics 54(3), 406–416.
Schalles, M. D., & Pineda, J. A. (2015). Musical sequence learning and EEG correlates of
audiomotor processing. Behavioural Neurology, 2015. Retrieved from
https://doi.org/10.1155/2015/638202
Schroeder, C. E., Lindsley, R. W., Specht, C., Marcovici, A., Smiley, J. F., & Javitt, D. C. (2001).
Somatosensory input to auditory association cortex in the macaque monkey. Journal of
Neurophysiology 85(3), 1322–1327.
Schürmann, M., Caetano, G., Hlushchuk, Y., Jousmäki, V., & Hari, R. (2006). Touch activates human
auditory cortex. NeuroImage 30(4), 1325–1331.
Schutz, M. (2008). Seeing music? What musicians need to know about vision. Empirical Musicology
Review 3(3), 83–108.
Schutz, M., & Kubovy, M. (2009). Deconstructing a musical illusion: Point-light representations
capture salient properties of impact motions. Canadian Acoustics 37(1) 23–28.
Schutz, M., & Lipscomb, S. (2007). Hearing gestures, seeing music: Vision influences perceived tone
duration. Perception 36(6), 888–897.
Senkowski, D., Schneider, T. R., Foxe, J. J., & Engel, A. K. (2008). Crossmodal binding through
neural coherence: Implications for multisensory processing. Trends in Neurosciences 31(8), 401–
409.
Simon, D. M., & Wallace, M. T. (2017). Rhythmic modulation of entrained auditory oscillations by
visual inputs. Brain Topography 30(5), 565–578.
Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, &
Psychophysics 73(4), 971–995.
Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press.
Stevenson, R. A., & James, T. W. (2009). Audiovisual integration in human superior temporal sulcus:
Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage
44(3), 1210–1223.
Thomas, C. (1983). Music as heard: A study in applied phenomenology. New Haven, CT: Yale
University Press.
Thompson, W. F., Graham, P., & Russo, F. A. (2005). Seeing music performance: Visual influences
on perception and experience. Semiotica 156(1/4), 203–227.
Thompson, W. F., & Russo, F. A. (2007). Facing the music. Psychological Science 18(9), 756–757.
Thompson, W., Russo, F., & Livingstone, S. (2010). Facial expressions of singers influence perceived
pitch relations. Psychonomic Bulletin & Review 17(3), 317–322.
Thompson, W. F., Russo, F. A., & Quinto, L. (2008). Audio-visual integration of emotional cues in
song. Cognition & Emotion 22(8), 1457–1470.
Tranchant, P., Shiell, M. M., Giordano, M., Nadeau, A., Peretz, I., & Zatorre, R. J. (2017). Feeling
the beat: Bouncing synchronization to vibrotactile music in hearing and early deaf people.
Frontiers in Neuroscience 11. Retrieved from https://doi.org/10.3389/fnins.2017.00507
Varlet, M., Marin, L., Issartel, J., Schmidt, R. C., & Bardy, B. G. (2012). Continuity of visual and
auditory rhythms influences sensorimotor coordination. PLoS ONE 7(9). Retrieved from
https://doi.org/10.1371/journal.pone.0044082
Verrillo, R. T. (1992). Vibration sensation in humans. Music Perception: An Interdisciplinary Journal
9(3), 281–302.
Verrillo, R. T., & Bolanowski, S. J. (1986). The effects of skin temperature on the psychophysical
responses to vibration on glabrous and hairy skin. Journal of the Acoustical Society of America
80(2), 528–532.
Vines, B. W., Krumhansl, C. L., Wanderley, M. M., Dalca, I. M., & Levitin, D. J. (2011). Music to
my eyes: Cross-modal interactions in the perception of emotions in musical performance.
Cognition 118(2), 157–170.
Vuoskoski, J. K., Thompson, M. R., Clarke, E. F., & Spence, C. (2014). Crossmodal interactions in
the perception of expressivity in musical performance. Attention, Perception, & Psychophysics,
76(2), 591–604.
Wapnick, J., Mazza, J. K., & Darrow, A. A. (2000). Effects of performer attractiveness, stage
behaviour, and dress on evaluation of children’s piano performances. Journal of Research in Music
Education 323(4), 323–335.
Zatorre, R. J. (1988). Pitch perception of complex tones and human temporal-lobe function. Journal
of the Acoustical Society of America 84(2), 566–572.
Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cortex.
Cerebral Cortex 11(10), 946–953.
Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor
interactions in music perception and production. Nature Reviews Neuroscience 8, 547–558.
SECTION IV

NEURAL RESPONSES TO MUSIC: COGNITION, AFFECT, LANGUAGE
CHAPTER 11

MUSIC AND MEMORY

LUTZ JÄNCKE

Music listening, music composing, and music-making are strongly
associated with memory processes. For example, when we listen to music
we might remember the title, the melody, the singer or musicians, and the
circumstances in which we heard the music for the first time. It is also
possible that we catch the gist when listening to a particular piece of music
without explicitly knowing the details of the piece. These are the most
obvious memory aspects associated with music. However, some people
might even be able to remember a single tone or a tone interval without
relying on a reference tone. Music might also help to boost our memory and
help us to consolidate what we have learned. These few examples
demonstrate that music is associated in many ways with memory processes.
In this chapter, I will discuss these associations and provide some examples
of and future applications for music supporting memory processes. But
before I examine the typical music-related memory aspects, I discuss some
basic principles of the human memory system.

Human Memory in General
Human memory comprises several parts: (1) sensory memory, (2) short-
term memory, (3) working memory, and (4) long-term memory. The
sensory memory stores sensory information for a very short period. This
memory system is strongly associated with neural networks processing
sensory information. At this stage, the information is not yet processed, interpreted, or encoded. The working memory system is a central system not only for memory processes; it is pivotal for many, if not all, cognitive functions. The main functions controlled by working memory are often termed “maintenance and manipulation” to express the fact that working memory
not only holds but also manipulates information. To hold information for a
short period of time without any cognitive manipulation is a matter of short-
term memory. Manipulation, which is a main pillar of the working memory
system, is strongly related to executive functions, pattern recognition, long-
term memory, encoding for long-term memory, language and music
comprehension, problem solving, and even creativity. All of this is accomplished with the participation of the working memory system. Thus, this
system is pivotal for nearly all music functions and particularly for music
memory. The neural networks involved in working memory processes are not focal but are distributed over many brain areas, owing to the many functions associated with working memory. In long-term memory,
encoded material is stored for longer time periods, sometimes even
extremely long—up to many decades. Long-term memory is divided into an
explicit and implicit memory system. The explicit memory system contains
consciously available information and comprises the semantic and episodic
memory. The semantic memory contains conscious memory of facts while
the episodic memory is a system for holding events, memory traces
associated with places, times, emotions, and other concept-based
knowledge of an experience. This explicit memory (sometimes also called
declarative memory) is not a simple store; it is rather a mechanism
constructing the past on the basis of stored and new information using
specific strategies (e.g., retrieval schemas, which will be described later).
The neural underpinnings of the explicit memory system are relatively
complex and contain so-called “bottleneck structures” in mesiotemporal
brain areas (including the hippocampus) and networks in temporal, parietal,
as well as frontal brain areas. Thus, the explicit memory system is based on
a distributed network with a mesiotemporal core system. The implicit
memory system contains information that is not easy to verbalize but can be
used without consciously thinking about it. The networks controlling this
implicit memory system do not overlap with the neural networks for the
explicit memory system. The neural networks for the implicit memory
mainly comprise premotor, cerebellar, and basal ganglia structures.

Memory Processes during Music Listening

The psychological processes and the neural underpinnings of music listening have been studied quite intensively. These studies have shown that
music is processed in a cascade of steps that begins with the segregation
within the auditory stream, followed by the extraction and integration of a
variety of acoustic features, leading to cognitive memory-related processes
that induce personal, often emotional, experiences. Thus, listening to music
can be conceived of as a hierarchical continuous serial-to-parallel
conversion during which the auditory stream (stream of tones and chords) is integrated into melody chunks, and these melody chunks are then integrated into an entire melody (Fig. 1). For this serial-to-parallel conversion, working memory processes are pivotal, since the tonal and/or musical information is stored temporarily and continuously manipulated.
FIGURE 1. Schematic description of the serial-to-parallel conversion, which can be conceived of
as a form of integration of serial information on different levels. t1–t10 represent different tones
presented in serial order. m1 to m3 are the integrated tones combining to form melody fragments. At
the next level, these melody fragments are integrated into a larger melody cluster or even into the
entire musical piece.
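To make this serial-to-parallel conversion concrete, here is a minimal sketch (in Python) of the hierarchical integration depicted in Fig. 1; the function name, the fixed fragment size, and the tone labels are illustrative assumptions rather than a model from the literature:

```python
# Toy illustration of the serial-to-parallel conversion in Fig. 1:
# a serial tone stream (t1..t10) is first integrated into melody
# fragments, which are then integrated into one melody representation.
# The fixed fragment size is an arbitrary simplification; real chunk
# boundaries depend on musical structure.

def chunk(stream, size):
    """Group a serial stream into fragments of at most `size` elements."""
    return [tuple(stream[i:i + size]) for i in range(0, len(stream), size)]

tones = [f"t{i}" for i in range(1, 11)]   # t1 .. t10

fragments = chunk(tones, 4)   # level 1: tones -> melody fragments
melody = tuple(fragments)     # level 2: fragments -> entire melody

print(fragments)
print(melody)
```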

The sound sequences are woven into a melodic contour of pitch and
rhythm. These melodic contours do not appear to arise from bottom-up processes alone, since the listener is not a passive receiver but is actively engaged in processing the music. In this context, the listener uses
acoustic memories, aesthetic judgments, and expectations and combines
them to understand and interpret the particular piece of music (the schema
concept is discussed later in the chapter). Thus, the listener stores many
aspects of the auditory stimuli—such as pitch, pitch interval, timbre, and
rhythm—in memory. Based on this stored information, the listener
constructs an integrated memory of the particular melody. In the following,
I will describe some memory processes associated with tone, tone interval,
and melody processing in more detail.
Tone Memory
Even non-musicians are relatively good at remembering and recognizing
single tones or the pitch of a melody. For example, in an experiment
conducted by Gaab and colleagues (Gaab, Gaser, Zaehle, Jancke, &
Schlaug, 2003), non-musicians performed well in pitch memory tasks
during which they were asked to decide whether the last or second-to-last tone of a tone sequence was the same as or different from the first tone. The recognition rate for the tones was astonishingly high, with an
accuracy of about 66 percent. The authors also conducted fMRI
measurements during pitch memory learning. When relating the pitch
memory performance to the task-related hemodynamic responses, they found that activity in the bilateral supramarginal gyrus and the dorsolateral cerebellum was significantly correlated with good task performance. The authors suggest that, besides the auditory cortex, the supramarginal gyrus
and the dorsolateral cerebellum may play a critical role in short-term
storage of pitch information.
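A minimal mock-up of such a pitch memory trial might look as follows (Python; the tone frequencies, sequence length, and match probability are illustrative assumptions and do not reproduce the exact stimuli of Gaab and colleagues):

```python
import random

# Toy mock-up of a pitch memory trial in the spirit of the task
# described above: a tone sequence is presented and the listener must
# decide whether a probe tone (the last or second-to-last tone)
# matches the first tone. Frequencies and probabilities are arbitrary.
TONES_HZ = [261.63, 277.18, 293.66, 311.13, 329.63, 349.23]  # C4 upward

def make_trial(length=5, p_match=0.5):
    seq = [random.choice(TONES_HZ) for _ in range(length)]
    probe_pos = random.choice([-1, -2])   # last or second-to-last tone
    if random.random() < p_match:
        seq[probe_pos] = seq[0]           # force a "same" trial
    return seq, probe_pos, seq[probe_pos] == seq[0]

sequence, position, is_same = make_trial()
print(sequence, position, "same" if is_same else "different")
```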
Absolute pitch listeners are much better at memorizing tones and chords. Absolute pitch (AP) is defined as the ability to identify a note
without relying on a reference tone (Levitin & Rogers, 2005; Takeuchi &
Hulse, 1993). It is a rare ability with an incidence of about 1 percent in the general population, although speakers of tonal languages show a higher rate (Deutsch, Henthorn, Marvin, & Xu, 2006). Absolute pitch is thought
to originate from an intertwining of genetic factors (Gregersen, Kowalsky,
Kohn, & Marvin, 1999), early exposure to music (Gregersen, Kowalsky,
Kohn, & Marvin, 2001), and intensity of musical training (Gregersen et al.,
2001). Currently, a two-component model is discussed to explain this extraordinary ability. According to this model, AP is constituted by one perceptual mechanism (“categorical perception”) and two cognitive mechanisms, namely “pitch memory” (i.e., explicit memory) and “pitch labeling” (i.e., implicit associative memory), whereby pitch labeling has been suggested as the load-bearing skeleton of AP. Several neurophysiological and neuroanatomical studies support this
suggestion. One main finding in this context is that frontotemporal areas are
strongly activated during tone listening and tone memory tasks in AP
listeners and that these regions are specifically and strongly functionally
and anatomically interconnected (Elmer, Rogenmoser, Kühnis, & Jäncke,
2015; Rogenmoser, Elmer, & Jäncke, 2015; Zatorre, Perry, Beckett,
Westbury, & Evans, 1998). Although these findings are interesting and
important for understanding the neural underpinnings of tone perception
and tone memory, listening to single tones and remembering them are not adequate tasks for understanding music listening and the associated memory processes in their entirety.

Tone Interval Memory


More important for understanding music-related memory is to characterize the psychological and neurophysiological processes that are operative during tone sequence and melody listening tasks. Even non-musicians are very good at recognizing melodies, based largely upon the
relative sizes of the intervals between successive pitches. This ability is
robustly preserved even when the entire frequency range of the music is
shifted up or down (i.e., during transposition). This so-called relative pitch (RP) processing is strongly shaped, or even entirely acquired, early during development. For example, Trainor and colleagues
(Trainor, McDonald, & Alain, 2002) showed that 5.5- to 6.5-month-old
infants preferred to listen to a particular melody that they had heard repeatedly (compared to a novel melody). In this experiment, the authors
also demonstrated that the AP information was more or less unimportant.
Most important, however, was the long-term representation of the melody,
which is based on the tone intervals. In a further electrophysiological
experiment (Plantinga & Trainor, 2005), it was shown that RP interval
processing occurs in a more or less automatic fashion, as demonstrated by
mismatch negativities (MMN) to deviations of known pitch intervals. Since
the MMN is commonly regarded as a neurophysiological marker of pre-attentive change detection, the authors concluded that pitch interval perception occurs automatically. Further studies have
substantiated these findings by showing that the encoding accuracy
increases with increasing length of the tone sequences (Lee, Janata, Frost,
Martinez, & Granger, 2015). The authors interpret these findings as support
for the idea that it is easier for subjects to apply particular Gestalt principles to longer than to shorter tone sequences.

Tonal Working Memory


When we listen to music we integrate the incoming sequential auditory
information. That makes it necessary to hold some auditory information for
a short period of time in memory and to combine this with the next
incoming sounds of a melody. Thus, we have to hold auditory information,
and based on our knowledge about the musical structure, we combine the
tone sequences into melodies. Without such a mechanism, it would be
impossible for us to follow and understand even the shortest musical piece.
From this description, it is clear that short-term memory processes
(maintaining auditory information for a short period of time) as well as
cognitive processes (manipulating, combining, and prediction) are involved
here. This combination of maintenance and manipulation of incoming
stimuli has led to the formulation of the working memory (WM) concept.
The classical WM model was mostly developed using verbal material
(Baddeley & Hitch, 1974). According to this model, verbal information is
processed by a phonological loop, which is further subdivided into a
passive storage component (phonological store) and an active rehearsal
mechanism (articulatory rehearsal process). The passive storage component
is assumed to store auditory or speech-based information for a few seconds.
In addition, an attentional control system (the central executive) controls
and supervises the phonological loop. In a later version of the WM model,
the mutual interaction between long-term memory (LTM) and WM was
recognized by proposing an episodic buffer (Baddeley, 2010). Recent
developments have led to a more domain-general model of WM (Cowan,
2011; Oberauer & Lewandowsky, 2011). This new model proposes
polymodal LTM representations of items, which are activated either by
incoming sensory input or by volition, thus becoming available for
attentional selection. Based on these theoretical contributions, we now
accept that WM is a system with limited capacity binding information from
the phonological loop, storing information in a multimodal code, and
enabling the interaction between WM and LTM under the supervision of
attention and executive control.

Behavioral Findings of Tonal Working Memory


Although the classical WM model is well elaborated, it has been unclear whether musical information (e.g., tones, chords, and timbre) is processed within the WM system similarly to verbal information. As mentioned above,
the classical WM model has been designed on the basis of verbal
information and does not explicitly specify whether the phonological loop
also processes non-verbal information. In behavioral WM studies, one
typically influences the rehearsal process by introducing stimuli that are similar to those that should be held in mind (e.g., the phonological similarity effect). Other paradigms manipulate the length of the to-be-remembered items (e.g., the word- or sequence-length effect).
An important part of the classical WM model is that verbal information
can be maintained in verbal WM by internal articulatory rehearsal (within
the phonological loop). But does such an internal rehearsal also exist for
pitch and timbre information? Only a few studies have tried to answer this question to date, and they have come to conflicting conclusions (for an excellent summary see the review by Schulze &
Koelsch, 2012). However, as Schulze and Koelsch (2012) correctly point
out, the conflicting results are mainly based on the different paradigms and stimulus settings used. Nevertheless, on closer inspection of these
findings, a more or less clear picture emerges. There is clear evidence that a
tonal WM indeed exists in which tonal information is rehearsed. However,
the subjects must be able to rehearse the material. Rehearsal is possible if
the subjects are familiar with the tone information and when the to-be-
remembered tone information is salient enough (i.e., when tones are used
whose frequencies correspond to the frequencies of the Western chromatic
scale, or if the frequency differences between the used tones are not smaller
than one semitone).
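For orientation, the size of a semitone can be stated precisely for the equal-tempered Western scale (a standard acoustical fact, not a claim of the studies reviewed here). A tone $k$ semitones above a reference frequency $f_0$ has frequency

$$ f_k = f_0 \cdot 2^{k/12}, $$

so one semitone corresponds to a frequency ratio of $2^{1/12} \approx 1.0595$, that is, a difference of roughly 6 percent.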
Behavioral studies directly comparing verbal and tonal WM are
relatively rare. Early studies (Deutsch, 1970; Salamé & Baddeley, 1989)
reported that tones or instrumental music as intervening stimuli interfered
more strongly with WM tasks for tones than for phonemes or syllables.
Thus, these studies were taken as support for a specialized tonal and verbal
WM system. However, Semal and colleagues (Semal, Demany, Ueda, &
Hallé, 1996) discovered that the frequency relations between the
intervening stimuli and the standard stimuli are most important for explaining the results of the behavioral WM experiments. Rather, they identified that pitch similarity of the intervening stimuli (words or tones) had a greater effect on performance than the particular modality (verbal or tonal)
of the intervening stimuli. Thus, the pitches for both verbal and tonal
stimuli are processed in the same WM system.
This auditory WM system always comes into play when the to-be-
remembered information is auditorily coded. For example, in a suppression
experiment (during which the subjects had to either sing or speak during the
retention period), recognition accuracy for both tone and digit sequences decreased, regardless of whether the suppression material was verbal or non-verbal (Schendel & Palmer, 2007). Thus, again this experiment
demonstrates that musical or verbal suppression does not selectively impair
verbal or tonal WM. A further experiment uncovered expertise-related
influences on the tonal WM system (Williamson, Baddeley, & Hitch, 2010).
The results of this experiment showed decreased performance if the tone
sequences consisted of more proximal (similar) pitches compared to more
distal (dissimilar) pitches, an effect resembling the phonological similarity
effect in the verbal WM domain.
Thus, one should rather speak of an auditory WM in which auditory information of the same (or similar) pitch determines interference effects. These interference effects are independent of whether the standard or intervening stimuli are tones, vowels, syllables, or words. In other words, everything that sounds similar interferes. There is a single
auditory-based WM system that is used for all auditory information,
regardless of whether it is verbal or non-verbal auditory material. This
might explain how musical training can improve verbal working memory
(discussed later).
However, in the context of music listening, one has to keep in mind that some acoustic information, such as specific timbre or pitch information, cannot be rehearsed. In this case, the subjects
cannot take advantage of the phonological loop (or general auditory
rehearsal mechanism). In such situations, the auditory information is
retained for a short period of time in specific feature maps.
Neuroanatomical Correlates of Working Memory
With the advent of modern brain imaging techniques, it is now possible to
identify the neural networks that are involved in controlling WM processes.
In the past, several studies have examined the neural underpinnings of
auditory WM using verbal material. These studies have shown that mainly
Broca’s area and premotor areas are core regions involved in the internal
rehearsal of verbal material (for a review of these studies see Schulze &
Koelsch, 2012). Besides these core regions, the insular cortex and the
cerebellum seem also to be involved in internal rehearsal of verbal
information. The neural underpinning of the phonological store has been
suggested to rely on parietal areas including the inferior and superior
parietal lobules and on the posterior perisylvian brain (particularly
including the left posterior planum temporale). While parietal brain areas
most likely reflect increased engagement of attentional resources (which,
incidentally, nicely fits with the pivotal role of attention in WM processes
according to the new domain-general WM models: Oberauer &
Lewandowsky 2011), the left posterior planum temporale is possibly
involved in the temporary storage of verbal information during WM tasks.
On the basis of these findings, and because posterior perisylvian brain areas
also support speech processing, it has been proposed that they act as an
auditory–motor interface for WM (Hickok & Poeppel, 2007). These
findings suggest a dual-stream model of speech processing with a ventral
stream involved in speech comprehension (supporting lexical access) and a
left dominant dorsal stream comprising the planum temporale enabling
sensory–motor integration. Through this stream the perceived speech
signals are mapped onto articulatory representations in frontal brain areas (Elmer, Hänggi, Meyer, & Jäncke, 2013).
Far fewer neuroimaging studies have directly investigated the neural
underpinnings of WM for tones. However, the few studies that have examined tonal WM revealed that only in non-musicians were all structures involved in tonal WM also involved in verbal WM. In summary,
consistently across studies (Schulze, Mueller, & Koelsch, 2011, 2013;
Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011), data obtained from
non-musicians indicate a considerable overlap of neural resources
underlying WM for both verbal and tonal information. This common network is a mainly left-lateralized fronto-parietal circuit including Broca’s area, parietal areas, and the planum temporale.

Memory for Music


When we listen to music, we often recognize the musical piece quite well.
Sometimes we remember the title of the musical piece or even further
information like the text, composer, and the main instruments. Listeners
are sometimes even very accurate in reproducing familiar music by singing
and moving rhythmically to the music (Frieler et al., 2013; Halpern, 1989;
Levitin, 1994). Similar to verbal and non-verbal memory, musical memory
can be divided into implicit (unconscious), semantic, and episodic musical
memory (the latter two memory systems are conscious) (Platel, 2005).
The implicit musical memory can best be seen in neurological patients.
For example, Johnson and colleagues (Johnson, Kim, & Risse, 1985)
exploited the so-called mere exposure effect in the context of music
listening experiments. This mere exposure effect was first demonstrated and described by Zajonc (1968): subjects tend to develop a preference for items simply because they have been repeatedly exposed to them. In the study by Johnson and
colleagues, Korsakoff’s syndrome patients preferred an unfamiliar musical
piece after only one previous presentation, compared to new musical pieces.
However, these patients did very poorly in a music recognition test. Halpern
and O’Connor (2000) found the same dissociation in normal elderly
listeners, who were at chance in recognizing just-presented melodies.
However, these subjects liked these musical pieces better than new
melodies. A similar distinction between explicit and implicit music memory
was drawn by Samson and Peretz (2005). On the basis of a comprehensive
analysis of neurological patients suffering from lesions in either the right or
the left temporal lobe, they concluded that right temporal lobe structures
have a crucial role in the formation of melody representations that support
priming and memory recognition, which are both more implicit memory
processes, whereas left-sided temporal lobe structures are more involved in
the explicit retrieval of melodies. Mere exposure effects have also been
shown in healthy subjects (Green, Bærentsen, Stødkilde-Jørgensen,
Roepstorff, & Vuust, 2012; Honing & Ladinig, 2009). These and further
similar studies in this area gave rise to the suggestion that there is indeed an
implicit musical memory, which demonstrates different features compared
to explicit musical memory. Implicit musical memory in normal and healthy subjects appears, for example, during incidental music listening, during which we might move or hum without explicitly knowing which musical piece we are listening to. Nowadays this happens quite often, especially when we use our mobile devices (e.g., smartphones) while we stroll through the street, drive a car, or jog.
The semantic musical memory is defined as memory for music excerpts
without associating them with the context in which the listener learned the
excerpt. Thus, we do not associate and remember the temporal (when) or
spatial (where) circumstances under which we have encoded and learned
the musical piece. Musical semantic memory may represent a form of musical lexicon, separate from the verbal lexicon, even though strong links certainly exist between them. Interestingly, musical pieces can be associated
with non-music semantic memory as Koelsch and colleagues have shown
(Koelsch et al., 2004). They demonstrated that short music excerpts can
prime concrete and even abstract words. Even when the musical pieces
were unknown to the subject, this priming effect occurred. Obviously,
music can carry meaning. The precise psychological mechanism responsible for this interesting association between music and meaning is currently not entirely understood, but this study shows that musical information is strongly embedded in a distributed memory network.
Episodic musical memory, on the other hand, is defined as the capacity
to recognize a musical excerpt for which the spatiotemporal context during
learning can be recalled (when, where, under which circumstances, and
with which people). A particular form of episodic musical memory is the
autobiographical musical memory. This memory component is activated when we
listen to music which is strongly associated with past experiences of our
own life. A further memory concept, which is similar to the
autobiographical memory, is the so-called memory for “nostalgia.”
Nostalgia has been defined as an affective process sometimes
accompanying autobiographical memories (Wildschut, Sedikides, Arndt, &
Routledge, 2006), giving rise to (mostly) positive and (sometimes) negative affect (such as sadness). Nostalgia is strongly associated with personality traits, which explains the obvious inter-individual differences in the presence of this effect.
The different facets of musical memory have been the focus of
substantial research in recent years. Based on this research, we now know
that the different musical memory systems mentioned earlier can be
modulated by different psychological aspects comprising (1) intrinsic
musical features such as timbre or tempo, (2) the emotional and arousal
components, and (3) individual schemas and musical structure. A further and relatively new issue influencing music memory processes pertains to (4) the particular brain activation pattern during encoding and retrieval of music information. In the following, I will discuss these
issues in more detail.

Intrinsic Features of Musical Pieces


Halpern and Müllensiefen (2008) manipulated timbre and tempo in order to
examine their influence on implicit and explicit memory for musical pieces.
They asked their study participants to encode forty unfamiliar short tunes.
After that, the participants were asked to give explicit and implicit memory
ratings for a list of eighty tunes, which included forty that had previously
been heard. To measure implicit memory, a rating of the pleasantness of old
and new melodies was used. Measures reflecting explicit memory
performance were obtained by calculating the difference between the
recognition confidence ratings of old and new melodies. Half of the forty
previously heard tunes differed in timbre or tempo in comparison with the
first exposure. Change in timbre and tempo both impaired explicit memory
measures, and change in tempo also made implicit tune recognition worse.
These findings support the hypothesis that an implicit musical memory indeed exists, and they furthermore show that implicit music memory is influenced only by tempo variations, whereas explicit music memory is influenced by both timbre and tempo.

Emotion and Arousal Induced by Music


Several studies have shown that emotion and arousal evoked by musical
pieces influence retrieval and recognition of music. The main finding is that
emotional and arousing musical pieces are remembered better than pieces
which are less emotional and arousing (Alonso, Dellacherie, & Samson,
2015; Eschrich, Münte, & Altenmüller, 2005, 2008; Ferreri & Rodriguez-
Fornells, 2017; Parks & Clancy Dollinger, 2014; Peretz et al., 2009;
Vieillard & Gilet, 2013) (but for contradictory results, see Altenmüller,
Siggel, Mohammadi, Samii, & Münte, 2014). The reason for this memory
enhancing effect is thought to be based on at least two different and partly interacting mechanisms: (1) activation of the mesolimbic system, and (2) an increase in the number of associations within the semantic associative network.
Emotional and rewarding music strongly activates the mesolimbic reward system (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015). The
mesolimbic system is a relatively small brain system (including the nucleus
accumbens and the ventromedial prefrontal cortex), which is important for
the control of emotion, reward, and learning and which is mediated mainly
by dopamine. Dopamine is also widely recognized to be the critical
transmitter involved in addiction processes, for example, in virtually all forms of drug abuse (including heroin, alcohol, cocaine, and nicotine abuse). Even psychological addictions (e.g., gaming) are associated with
particular activations within the dopamine system (Kühn et al., 2011). But
other forms of rewards such as positive social interactions likewise activate
dopaminergic neurons and are powerful aids to attention and learning
(Keitz, Martin-Soelch, & Leenders, 2003). Dopamine is thought to
strengthen the synaptic potentiation in memory networks activated during
learning and consolidation of the music material. Thus, dopamine also
promotes plastic adaptations in brain areas involved in the control of trained
and practiced tasks. A further transmitter involved in music listening is
serotonin. Serotonin levels are significantly higher when subjects are
exposed to music they find pleasing (Evers & Suhr, 2000). Several (mostly
animal) studies have suggested a particular role of serotonin in learning and
memory processes (Meneses & Liy-Salmeron, 2012). However, it is not
entirely clear whether serotonin plays a positive or inhibitory role in
memory formation. It may actually be memory enhancing in one brain area
and inhibiting in another. Nevertheless, these transmitter systems (together
with several others) may support learning and memory processes. However,
one has to acknowledge that not only positively evaluated and rewarding
music is preferentially stored in musical memory but also negative or
simply arousing music. That these non-rewarding musical pieces are strongly represented in music memory can be better explained by the associative memory models, which will be described in the next paragraph.
However, it should be kept in mind that this model is also useful in
explaining the role of emotion in general, irrespective of valence and
arousal.
In the context of the semantic associative network model of memory
formation (Bower, 1981) or the Search of Associative Memory (SAM)
model (Raaijmakers & Shiffrin, 1981), it has also been proposed that
emotions are used as contextual information linked to the to-be-
remembered item. These models assume that emotions are represented in a
network of nodes together with the musical piece. Thus stimulation and
“activation” of emotion nodes would create a form of spreading activation
that lowers the threshold of excitation of all associatively linked nodes and
thus helps to retrieve the music memory trace from memory. We will return to this model and its extension later on. This model is particularly suited to
explain why even unpleasant music might be remembered well. This issue
has not been studied so far, but from introspection it is known that we
sometimes heavily dislike particular musical pieces despite recognizing
them relatively accurately.
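A minimal sketch of how such spreading activation might be simulated is given below (Python); the network layout, link weights, and decay factor are arbitrary illustrations and not the actual formalism of the SAM or Bower models:

```python
# Toy spreading-activation sketch: activating an "emotion" node spreads
# activation along weighted associative links (e.g., to a melody trace),
# effectively lowering the retrieval threshold of linked nodes.

EDGES = {
    "sadness":   {"melody_X": 0.8, "rainy_day": 0.5},
    "melody_X":  {"title_X": 0.7, "sadness": 0.8},
    "rainy_day": {"melody_X": 0.3},
    "title_X":   {},
}

def spread(source, steps=2, decay=0.5):
    activation = {node: 0.0 for node in EDGES}
    activation[source] = 1.0
    for _ in range(steps):
        updated = dict(activation)
        for node, act in activation.items():
            for neighbor, weight in EDGES[node].items():
                updated[neighbor] += act * weight * decay  # spread along links
        activation = updated
    return activation

print(spread("sadness"))  # "melody_X" accumulates activation via the emotion node
```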

Individual Schemas and Music Structure


Different listeners may understand the same musical piece in very different
ways. They may have varying degrees of appreciation for the musical
structure and they may differ on how it fits into the cultural context. In
order to describe and understand how we individually perceive and memorize music, I will use the well-known schema concept (Piaget, 1923). A schema is a mental framework that organizes our knowledge of and expectations about a domain. In other words, schemas are a form of cognitive heuristic that automatically makes assumptions about the music and, although not completely accurate, enables us to make quick judgments. Schemas are a
product of our experiences and can be adjusted or refined throughout our
entire lives. These schemas help us to understand various musical pieces;
they can influence our music memories, or influence what musical piece we
pay our attention to, and thus affect the chunks of information that are
available for encoding long-term music memories. Additionally, when we try to retrieve a musical memory, schemas can help us to piece the memory together.
These schemas determine how (and whether) we encode, consolidate,
and remember a particular piece of music. A schema can even prevent us
from encoding, for example, when we are not interested in or strongly
dislike a particular musical piece. In such situations, we will not focus our
attention on this piece and at the end we will remember it poorly. On the
other hand, it is possible that a particular musical piece fits perfectly to a
stored schema (which incidentally is positively evaluated), in which case
we direct our attention to this piece of music and insert it preferentially into
our memory system. Incidentally, we know from several neurophysiological
studies that focusing attention on a particular auditory stimulus enhances
neural activation in the auditory cortex (Jäncke, Mirzazade, & Shah, 1999).
Thus, attention gives rise to focal neural activation increases in specific
brain areas and thus can influence learning, consolidation, and improved
retrieval of stored information.
While schemas depend on the individual subject and how the subject
“organizes” the neural networks and mental structures for processing
incoming information, the musical structure itself also plays a pivotal role
in learning and remembering musical pieces.
There are long and short pieces; some of them are monotonous while others vary dynamically across the entire piece. Some pieces use several musical themes appearing in different forms while others only use one more or less simple theme. In other words, musical structure is defined by the degree of change within different levels of the musical piece. Many researchers use the terms “information” or “complexity” to describe the musical structure (Werbik, 1971). In this context, information is inversely related to redundancy. If the next note in a piece of music is largely determined by the preceding notes, it conveys little new information about the piece. Thus,
a musical piece containing complicated changes on many levels of its
structure contains more information than a piece that is repetitive and for
which the next notes and beats are easily predictable on the basis of the
preceding notes and beats. In the context of music memory, it is obvious
that complexity of the musical piece affects how it is encoded, consolidated,
and recalled. The more complicated (and complex) a musical piece, the
more difficult it is to encode and remember it. However, whether we can
learn and remember complicated and complex music also depends on our
mental structure and the schemas we have available for music perception
and music memory. Those who have mental structures for complex music
will find it easier to learn and retrieve them. Thus, there should be a strong
interaction between the mental structure for music and the musical structure
itself for forming musical memory. As far as I know, this has not been
studied explicitly in the music domain. However, in other domains it has
frequently been shown that experts (with specific and optimized mental
structures) are partly exceptional in discriminating, learning, and
recognizing information from their fields of expertise (Gobet, 1998;
Rawson & Van Overschelde, 2008). Thus, it is most likely that expertise in
music (even low level expertise) will have substantial influence on music
memory. Nevertheless, it is left to future studies to show that the available
mental structure for music indeed has an influence on musical memory.
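This information-theoretic notion of musical complexity can be made concrete with a toy calculation: the less predictable a note is given its predecessor, the higher its surprisal, i.e., -log2 P(note | previous note). The following sketch (Python; the letter-coded tune and the simple bigram model are illustrative assumptions, not a validated complexity measure) estimates mean surprisal from transition counts:

```python
import math
from collections import Counter

# Estimate first-order (bigram) surprisal: -log2 P(note | previous note).
# Repetitive, redundant sequences yield low surprisal (little new
# information per note); varied sequences yield high surprisal.
notes = list("CCGGAAGFFEEDDC")  # rough letter rendering of a simple tune

bigram_counts = Counter(zip(notes, notes[1:]))
context_counts = Counter(notes[:-1])

surprisals = [
    -math.log2(bigram_counts[(prev, nxt)] / context_counts[prev])
    for prev, nxt in zip(notes, notes[1:])
]
print(f"mean surprisal: {sum(surprisals) / len(surprisals):.2f} bits per note")
```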

Brain Activation during Encoding and Retrieval of Music


Only a few studies have examined the neural underpinnings of music
memory so far. The few fMRI studies have uncovered mostly similar
findings (Altenmüller et al., 2014; Ford, Addis, & Giovanello, 2011;
Gagnepain et al., 2017; Groussard et al., 2010; Janata, 2009; Margulis,
Mlsna, Uppunda, Parrish, & Wong, 2009; Plailly, Tillmann, & Royet, 2007;
Platel, 2005; Watanabe, Yagishita, & Kikyo, 2008). However, there are also
some differences depending on the paradigm used and the particular music
memory system studied. All studies identified a strong involvement of bilateral temporal brain areas, including the primary and secondary auditory
cortex (within the superior temporal gyrus) and temporal brain areas known
to be involved in language and memory processing (the middle and inferior
temporal gyrus). In addition, all studies reported the involvement of frontal
brain areas during music recognition. Mostly, the left inferior frontal cortex
is involved. When it comes to episodic music memory, bilateral frontal
cortex activations have also been reported with slightly right-sided
dominance. Sometimes the precuneus has also been reported as being
activated during episodic music memory tasks. When autobiographical
music memory is tested, hemodynamic responses in default-mode network
(DMN) regions increase, including lateral parietal, temporal, medial
prefrontal, and posterior cingulate cortices (Ford et al., 2011; Janata, 2009).
Although these studies adequately demonstrate that a distributed cortical network is involved in music memory processes, one has to keep in mind that the fMRI environment and the obtained hemodynamic measures are not optimal for studying the neural underpinnings of music processing in general and music memory processes in particular. The loud and partly annoying measurement environment is suboptimal for music presentation and even for fine-grained cognitive processes. A major drawback of many
fMRI studies (and those mentioned earlier) is the fact that mostly very short
fragments of musical pieces (10–20 seconds) have been used, which may
have precluded the complex cognitive and emotional processes associated
with natural music listening. In addition, the hemodynamic responses are
slow and only partly correlate with the underlying neurophysiological
activations (Logothetis, 2008).
In future experiments it would be extremely helpful to study the neural
underpinnings of the different music memory systems using silent and less
annoying neurophysiological measurement techniques, such as EEG, MEG, or NIRS, which provide the possibility of working with natural music stimuli. Currently, there are no studies using the types of experimental
paradigms that were used in the aforementioned fMRI studies. Thus, it is of
utmost importance to study the neurophysiological oscillations, intracortical
current densities, and coherences during music memory tasks. This would
provide the opportunity to study the neural underpinnings of music memory
processes using more natural experimental situations.
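As an illustration of the kind of coherence analysis this would involve, the following sketch (Python with NumPy and SciPy; the simulated signals, sampling rate, and window length are arbitrary stand-ins for real EEG channels) estimates the magnitude-squared coherence between two signals sharing an alpha-band component:

```python
import numpy as np
from scipy.signal import coherence

# Simulate two "channels" sharing a 10 Hz (alpha-band) component plus
# independent noise, then estimate their magnitude-squared coherence.
fs = 250                                   # sampling rate (Hz)
t = np.arange(0, 30, 1 / fs)               # 30 s of data
alpha = np.sin(2 * np.pi * 10 * t)         # shared 10 Hz oscillation
ch1 = alpha + 0.8 * np.random.randn(t.size)
ch2 = alpha + 0.8 * np.random.randn(t.size)

f, Cxy = coherence(ch1, ch2, fs=fs, nperseg=512)  # Welch-based estimate
print(f"coherence near 10 Hz: {Cxy[np.argmin(np.abs(f - 10))]:.2f}")
```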
Until now, many music perception studies have been published using
these techniques and more natural music stimuli. Since music perception
implicitly makes use of music memory processes, these studies have
uncovered findings that are also interesting for music memory research. For
example, listening to natural music is associated with activations in
distributed neural systems comprising bilaterally temporal and frontal brain
areas (Jäncke & Alahmadi, 2016). In addition, particular coherences
between adjacent and distant brain areas are obvious during music
perception (Bhattacharya & Petsche, 2001; Bhattacharya, Petsche,
Feldmann, & Rescher, 2001; Bhattacharya, Petsche, & Pereda, 2001;
Jäncke, 2012) and other music-related tasks (Bangert & Altenmüller, 2003).
Thus, these studies partly correspond with fMRI studies in showing that
music perception (and thus partly music memory) is controlled via a
distributed neural network binding together brain systems involved in
auditory, memory, attention, sequence processing, and executive functions.
These neurophysiological findings could be used to understand the
possible enhancing effects of music on cognitive tasks (which I will
summarize in the next section). In his review article, Wolfgang Klimesch
summarized his EEG findings on memory research (Klimesch, 1999) and
reported that “good” and “bad” memory performers substantially differ in
terms of the time courses of event-related desynchronizations (ERD) in the
upper alpha and theta band during a semantic judgment task. The results indicate that, within the first 1000 ms after presentation of the test stimuli, good memory performance is associated with a significantly larger extent of
alpha band desynchronization. The opposite holds true for the theta band
where good memory performance is reflected by a larger extent of
synchronization during the first 1000 ms. In this respect, the phasic
responses of these frequency bands reflect the quality and performance of
the memory. In addition, tonic changes of these frequency bands are also
related to the performance in memory, cognition, and perception. For
example, increased tonic alpha band and decreased theta band power are
associated with increased performance in various cognitive and perceptual
tasks. Given these findings, it seems an obvious strategy to influence the tonic and phasic oscillations in the alpha and theta bands in certain brain areas in such a way that the functions performed by these brain areas run
optimally. This has been done by Klimesch and colleagues (Klimesch,
Sauseng, & Gerloff, 2003). They induced increased alpha-band oscillation
in parietal brain areas using transcranial magnetic stimulation (TMS) prior
to the performance of spatial intelligence tasks. By doing this, they
increased the tonic alpha band power in parietal areas. As a result of this
manipulation, the subjects substantially improved their cognitive
performance. Thus, it is conceivable that music listening might influence
brain activation in a similar way leading to an improvement in several
ongoing cognitive processes.
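For orientation, the ERD measure referred to above is conventionally quantified relative to a pre-stimulus reference interval (this is the standard convention in the ERD/ERS literature, not a formula specific to Klimesch's review):

$$ \mathrm{ERD}\% = \frac{R - A}{R} \times 100, $$

where $R$ is the band power in the reference interval and $A$ is the band power in the activation (test) interval; positive values indicate desynchronization (a power decrease) and negative values indicate synchronization.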

Music as a Memory Enhancer
Can music be used as a memory enhancer? When asking this question, one
has to distinguish which aspect of memory should benefit from music. In
fact there are different influences of music on memory performance. First,
we have to discuss whether musicians or non-professional but musically
trained subjects benefit from musical training in terms of improved memory
performance (e.g., improved verbal working memory or improved long-
term memory). Second, does background music exert beneficial or detrimental effects on cognitive functions? Third, can music be used to
enhance memory functions? And fourth, is music beneficial for clinical
samples? In the following I will summarize some of the important findings
in this field.

Musical Proficiency and Memory


An often-asked question in the context of music research is whether musicians outperform non-musicians in non-musical memory tasks. In other words, is there a kind of transfer from music proficiency to non-
musical abilities? A very recent meta-analysis aimed to clarify whether
musicians indeed perform better than non-musicians in various memory
tasks (Talamini, Altoè, Carretti, & Grassi, 2017). By searching published
work on this topic in international databases, they collected twenty-nine
studies that used fifty-three different memory tasks (e.g., working memory
and long-term memory tasks with different materials). For these studies and
memory tests, they calculated Hedges’ g, a measure of effect size adjusted for small samples. These g values were interpreted according to the
criteria suggested by Cohen (1988): small effect = 0.2 to 0.5; medium effect
= 0.5 to 0.8; large effect > 0.8. Using this measure, they uncovered that
musicians performed better than non-musicians in terms of long-term
memory (small effect: g = 0.29), short-term memory (medium effect: g =
0.57), and working memory (medium effect: g = 0.56). They also controlled
for the influence of moderator variables (e.g., stimulus material: tonal,
verbal, or visuospatial) and identified that the musician’s advantage for
short-term and working memory was larger with tonal stimuli, moderate
with verbal stimuli, and small or null with visuospatial stimuli. Thus, one is
relatively safe in concluding that musicians are really better, even in non-
music related memory processes. But why are they better?
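For reference, Hedges’ g is Cohen’s d corrected for small-sample bias (a standard definition, not specific to this meta-analysis):

$$ g = J \cdot \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad J \approx 1 - \frac{3}{4(n_1 + n_2) - 9}, $$

where $\bar{x}_i$, $s_i^2$, and $n_i$ are the group means, variances, and sample sizes.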
Currently, several possibilities are available to explain this finding, most prominently (1) a kind of Pygmalion effect or (2) a consequence of musical training.
According to the Pygmalion (or Rosenthal) effect, musicians might perform
better than non-musicians because the researchers expected musicians to do
better, which might induce an improvement in their performance. However,
differences between musicians and non-musicians have not been reported
for all cognitive tasks (Schellenberg, 2001). There are only a few tasks
(including memory functions) for which musicians show enhanced
performance. Another possibility could be that individuals with better
memory are more likely to become musicians. This is also not very likely
since individuals with good memory can become very skilled and
successful in other domains outside the music business. They could become
good academics, economists, or philosophers. Thus, this hypothesis is not
very helpful in explaining the memory advantage in musicians. On the other
hand, a better memory might be a consequence of music training. This musical training might have positively influenced (1) auditory processing, (2) overlapping neural networks for speech and music functions, and (3) active learning strategies, such as chunking and sensorimotor integration.
Improved auditory processing has been demonstrated in many
experiments (Kühnis, Elmer, & Jäncke, 2014; Marie, Magne, & Besson,
2011). This improved ability could be helpful in memory tasks, especially when stimuli are presented aurally, because better auditory encoding of the
item to be remembered could strengthen the trace of the stimulus in the
listener’s memory. In addition, encoding makes use of the phonological/tonal/auditory loop of the working memory system (described earlier). Thus, musicians might exploit their superior auditory functions to accomplish early auditory encoding more efficiently than
non-musicians. Incidentally, two studies (Okhrei, Kutsenko, & Makarchuk,
2017; Talamini, Carretti, & Grassi, 2016) revealed no difference between
musicians and non-musicians in short-term memory tasks when verbal
stimuli were presented visually, thus supporting the hypothesis that auditory
encoding is the important link here. A further possible reason for the
superior memory performance in musicians could be based on the strong
overlap between neural networks and psychological functions involved in
speech and music processing. For example, phonological awareness,
reading ability, and music perception are controlled by overlapping
networks (Anvari, Trainor, Woodside, & Levy, 2002; Flaugnacco et al.,
2015). Music performance is a multisensory activity involving the association of music notation with the sounds of the notes and the corresponding motor responses. These associations have to be built up while learning to play a musical instrument. This particular type of training is initially effortful and demands
attentional and executive control. Music training might therefore enhance active learning strategies, such as chunking and attentional control, functions that are essential to developing a good memory.

Influence of Background Music on Learning and Recall
The influence of background music on various tasks and cognitive processes has been studied and discussed for a long time. A meta-
analysis conducted by Kämpfe and colleagues (Kämpfe, Sedlmeier, &
Renkewitz, 2010) revealed that background music does not have a uniform
effect on the performance of tasks. Based on these findings, one might tentatively conclude that any effect of background music on cognitive function is attributable to general arousal and mood changes rather than to music-specific mechanisms (Schellenberg & Weiss, 2013).
In one study (Jaencke & Sandmann, 2010), EEG activity was recorded during the encoding of verbal material. The authors found no influence of background music on verbal learning. There was, however, a substantially stronger alpha band desynchronization during the first 800–1200 ms after presentation of the to-be-learned stimulus when background music was playing. Four seconds later this changed to a substantial alpha band synchronization. According to the results presented by Klimesch (1999), this could indicate that background music slightly improves the neural underpinnings of encoding (indicated by the phasic alpha band desynchronization), followed by more efficient consolidation (indicated by the later and more tonic alpha band synchronization). These neurophysiological changes, however, did not translate into behavior, since memory performance was unchanged by background music. In a further study, Kussner and colleagues (Kussner, de Groot, Hofman, & Hillen, 2016) reported inconsistent effects of background music on learning performance. While they found no influence of background music
on learning in the first experiment, the exact replication in a second
experiment revealed a beneficial effect of background music. However, they found that beta band power measured at baseline before the learning experiment (which served as an index of trait arousal) correlated with learning performance. Thus, general arousal, as indexed by resting-state beta band activity, may mark a favorable starting state for subsequent learning.
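To make the EEG measures in this paragraph concrete, the following minimal sketch shows how alpha band (de)synchronization can be quantified as a percent change in band power relative to a pre-stimulus baseline. The sampling rate, window boundaries, and the simulated signal are illustrative assumptions, not the pipeline of the cited studies:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def alpha_erd(eeg, fs, baseline, window):
    """Percent change in alpha (8-12 Hz) power relative to a pre-stimulus
    baseline: negative values indicate event-related desynchronization (ERD),
    positive values event-related synchronization (ERS)."""
    b, a = butter(4, [8 / (fs / 2), 12 / (fs / 2)], btype="band")
    alpha = filtfilt(b, a, eeg)                   # alpha-band filtered signal
    power = np.abs(hilbert(alpha)) ** 2           # instantaneous alpha power
    base = power[baseline[0]:baseline[1]].mean()  # pre-stimulus reference
    win = power[window[0]:window[1]].mean()       # e.g., 800-1200 ms post-stimulus
    return 100 * (win - base) / base

fs = 250                          # sampling rate in Hz (an assumption)
eeg = np.random.randn(5 * fs)     # placeholder single-trial signal, 5 s long
# Baseline: first second; test window: 800-1200 ms after a stimulus at t = 2 s
print(alpha_erd(eeg, fs, (0, fs), (int(2.8 * fs), int(3.2 * fs))))
```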
Whether background music might positively influence learning and
memory in clinical populations or in elderly subjects suffering from age-
dependent memory declines is still disputed. Some studies have shown that
background music can enhance memory performance in the elderly. Using
NIRS, Ferreri et al. (2014) reported improved learning during background
music listening in elderly subjects. This learning improvement was accompanied by decreased prefrontal cortex blood flow, which the authors interpreted as reflecting less activation and less "disturbing effort" during encoding. If these findings hold up in replications, they will open new perspectives for countering the decline in episodic memory performance often seen in the elderly.

Music as Memory Modulator in Healthy Subjects


A recent set of studies revealed that emotional arousal evoked by music can enhance memory consolidation (Judde & Rickard, 2010). The authors of this study presented music excerpts either immediately, 20 minutes, or 45 minutes after encoding of verbal material. During this post-learning period, the subjects relaxed. One week later the same subjects took part in a retention experiment in which they were tested on whether they remembered the words they had learned. Retention performance was significantly enhanced, regardless of valence, when music presentation occurred at 20 minutes, but not immediately or 45 minutes
after encoding. The authors explain this facilitatory effect of music
presentation on long-term memory in the context of what is currently
known about the time course of memory consolidation. Memory consolidation is time-dependent: the biochemical processes that modulate synapses, including the release of various hormones into the bloodstream (e.g., epinephrine, norepinephrine, and cortisol), need some time (at least 25 minutes) to develop and to install the new and altered synaptic contacts in the memory networks (McGaugh, 2000). Thus, when arousing music (irrespective of valence) is presented roughly 20–25 minutes post-learning, memory consolidation is enhanced. In a subsequent experiment the authors demonstrated that memory for emotional material was attenuated when relaxing music was presented during the post-learning phase (Rickard, Wong, & Velik, 2012). Relaxing music may thus counter the heightened arousal that would otherwise strengthen the consolidation of negative and unwanted emotional memories.
A number of studies have investigated how memory performance
changes when the words to be learned are sung (Calvert & Tart, 1993;
Kilgour, Jakobson, & Cuddy, 2000; McElhinney & Annett, 1996;
Tamminen, Rastle, Darby, Lucas, & Williamson, 2017; Wallace, 1994). The
learning materials have included words, lyrics, and ballads. Although these studies differ in the particular paradigms used, they all came to more or less the same conclusion: sung verbal material is better recalled than spoken material. Notably, the benefit of the sung modality increased as familiarity with the melody increased, and in some studies it was entirely restricted to conditions in which the song was familiar to the participants (Calvert & Tart, 1993; Tamminen et al., 2017). These results are best explained in the context of the SAM theory: when new information is encoded, it is easier and more efficient to "connect" it with already stored memory traces, and familiar music constitutes precisely such a trace. In other words, familiar music offers a scaffold to which new information can be attached.
Michael Thaut and his colleagues (Peterson & Thaut, 2007) examined
the neural underpinnings associated with the presentation and encoding of
sung verbal material compared to spoken verbal material. For this they
measured EEG during the presentation of the material to be learned and calculated what they called "learning-related changes in coherence" (LRCC) to quantify changes in oscillatory coupling between scalp electrodes over the course of learning. Using this measure, they found increased
coherences within and between left and right frontal areas in several
frequency bands during encoding of sung words. These results are
interpreted as support for the hypothesis that verbal learning in the context
of musical presentation strengthens coherent oscillations in frontal cortical
networks, which are known to be involved in encoding and retrieval of
memory information. Although the neurophysiological findings are
compelling and consistent with what we know about the neural
underpinnings of working memory and other memory processes (i.e.,
changes of brain synchronization during learning and retrieval), there was
no difference in terms of behavioral performance for the sung or spoken
material. It is possible that the learning material was too easy and thus induced ceiling effects, or that the sung material did not draw strongly enough on familiar melodies.
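As an illustration of the coherence measure underlying the LRCC approach, the following sketch computes spectral coherence between two synthetic "electrode" signals with scipy and contrasts an "early" with a "late" learning block. The coupling values and frequency band are illustrative assumptions; this is not Peterson and Thaut's actual analysis pipeline:

```python
import numpy as np
from scipy.signal import coherence

fs = 250                              # sampling rate in Hz (an assumption)
t = np.arange(0, 10, 1 / fs)
shared = np.sin(2 * np.pi * 10 * t)   # a common 10 Hz oscillatory component

def block(coupling):
    """Two synthetic 'electrodes' whose shared oscillation is scaled by
    `coupling`, a toy stand-in for an early vs. late learning block."""
    ch1 = shared + 0.5 * np.random.randn(t.size)
    ch2 = coupling * shared + 0.5 * np.random.randn(t.size)
    return ch1, ch2

f, coh_early = coherence(*block(0.2), fs=fs, nperseg=512)
f, coh_late = coherence(*block(0.8), fs=fs, nperseg=512)

# A crude LRCC-style quantity: the change in band coherence over learning
band = (f >= 8) & (f <= 12)
print("delta coherence (8-12 Hz):", coh_late[band].mean() - coh_early[band].mean())
```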

Music as Memory Modulator in Neurological Patients
There is currently substantial interest in finding non-invasive interventions
to rehabilitate the cognitive impairments of neurological patients. In
particular, patients suffering from memory impairments have been targeted
in recent research in order to identify possible beneficial effects of music on
memory impairments. One of the first and most important studies with
neurological patients has been published by Särkämo et al. (2008).
In a single-blind, randomized, controlled experiment with sixty stroke patients, of whom twenty listened daily for one hour to self-selected music, the study revealed substantial improvements in verbal memory and focused attention only for those patients listening to self-selected music. The control subjects (who listened to audio books or received no additional stimulation) did not show any improvement in these cognitive functions. This was one of the first studies to demonstrate beneficial effects of music listening on cognitive recovery in neurological patients.
A number of studies have shown that patients with Alzheimer's disease (AD) recognize lyrics they heard sung more reliably than lyrics they heard spoken (Simmons-Stern, Budson, & Ally, 2010). Beyond this improvement, such patients also showed substantial gains in memorizing the semantic content of lyrics learned in the sung modality (Simmons-Stern et al., 2012). Other studies have shown that material learned in the sung condition was relatively robust, since the patients recognized these stimuli relatively well even after longer periods of time
(Moussard, Bigand, Belleville, & Peretz, 2012, 2014; Palisson et al., 2015).
Incidentally, similar findings have been shown in multiple sclerosis patients
(Thaut, Peterson, McIntosh, & Hoemberg, 2014).
Although some of the beneficial effects of sung material were relatively
small (e.g., Moussard et al., 2014), they can all be explained within the same theoretical framework. We know that music processing is associated with activation of a widely distributed neural network comprising many brain areas. This increases the likelihood that, even when some brain areas are strongly degenerated, other parts of the network remain intact and can be recruited for encoding and consolidation. A further possibility is that musical information serves as "context" information to which newly learned information can be "attached" (similar to the SAM theory proposed for memory processes in healthy subjects).

A Model of Musical Memory

In principle, musical memory is not that different from the "classical" memory system. However, there are some fundamental and important differences when it comes to natural music and how it is processed, stored, and retrieved. Music is a dynamic stimulus evolving over time, so listeners have to integrate the incoming sequential auditory information and apply specific memory-based mechanisms (Gestalt perception, chunking, etc.) to bind this sequence into a musical piece. Thus, music listening is not only a matter of simple auditory
information processing, it is much more, since several psychological
functions are involved, from working memory to several aspects of long-
term memory. What is, however, special for music perception and music
memory is the fact that widely distributed neural networks are involved in
perceiving and recognizing musical pieces. Figure 2 demonstrates
schematically the specific nature of the musical memory, which is partly
derived from the SAM model proposed by Raaijmakers and Shiffrin (1981)
and from Kalveram’s model of inverse processing (Kalveram & Seyfarth,
2009).

FIGURE 2. Schematic description of the memory system associating music information with many
non-music aspects.

As one can see from Fig. 2, auditory information is fed into the memory storage, which can be conceived of as a kind of correlation storage in which auditory information (a_t) is associated with many different kinds of information, resulting in a set of efferences (e_t). In this sense, music information can be
associated with motor programs, which is particularly important for
musicians who have learned to generate music by manipulating
instruments. For that they need specific and highly specialized motor
programs, which they can use to operate their instrument. But non-musicians also possess (possibly even innate) audio-motor coupling, which becomes obvious when we listen to rhythmic music: we tend to move in time with the musical rhythm. Besides motor
programs, episodic, autobiographical, semantic, and implicit memory
information are also associated with music information. Even emotion and
motivation can be related to the incoming auditory information. These
associations can be conceived of as correlations of varying strength, with
the correlation strength depending on the frequency of repetition and the
salience of the associated information. Some of these correlations give rise
to conscious perceptions (explicit memory) while others remain
unconscious (implicit memory). Executive functions can enhance the processing of incoming information by directing attention to particular information, ultimately boosting the neural activation of the areas involved in processing it. We can also apply executive functions to direct attention to particular correlations, increasing their likelihood of producing an appropriate efference; this can, in turn, suppress or inhibit other correlations. In this context, perceptual and memory schemas can be applied according to which we select or enhance incoming information or the pattern of correlations within the storage. The model can also be operated in an inverse fashion. For example, when people want to change their current mood, they might "activate" the correlations within the storage associated with a particular emotion. This will "activate" images of those musical pieces that evoke the desired emotion. Thus, the goal (to evoke a particular emotion) is fed into the storage, which activates the efferences yielding the emotion in question.
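The correlation-storage idea sketched above can be illustrated with a simple linear associative memory, in which associations between auditory codes (a_t) and efferences (e_t) are stored as outer products, repetition strengthens the correlations, and the same matrix supports both forward (cue to efference) and inverse (goal to cue) retrieval. All dimensions and patterns below are hypothetical, and this is only a caricature of the model in Fig. 2:

```python
import numpy as np

rng = np.random.default_rng(1)
dim_a, dim_e = 64, 32            # sizes of auditory (a_t) and efference (e_t) codes
W = np.zeros((dim_e, dim_a))     # the "correlation storage"

def normalize(v):
    return v / np.linalg.norm(v)

# Store associations as outer products; repetition strengthens the correlation
patterns = [(normalize(rng.standard_normal(dim_a)),
             normalize(rng.standard_normal(dim_e))) for _ in range(5)]
for a, e in patterns:
    for _ in range(3):           # "frequency of repetition" scales the strength
        W += np.outer(e, a)

# Forward operation: an auditory cue retrieves its associated efference
a0, e0 = patterns[0]
print("forward match:", float(normalize(W @ a0) @ e0))    # close to 1

# Inverse operation: a desired efference (e.g., a target mood) retrieves
# the auditory pattern most strongly correlated with it
print("inverse match:", float(normalize(W.T @ e0) @ a0))  # close to 1
```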
No generation has listened to music as often as ours. According to a 2016 US survey estimate, more than 90 percent of the population reported listening to music, for an average of 25 hours a week (Nielsen, 2017). It is therefore unsurprising that music is a frequently used cue for autobiographical memories, since music becomes associated with a great deal of everyday information. Music can thus serve as an efficient means of accessing and stimulating our autobiographical memory.
Music is a complex stimulus that carries much information and evolves over time, which may be why music processing is associated with distributed neural network activations. It is evidently relatively easy to link musical information to multimodal information: music can carry meaning, emotion, and episodic non-music information, and it can also trigger and control motor behavior. This multiplicity and versatility of the music network offer many possibilities for anchoring new information, and could explain why several music-related learning strategies improve memory functions. However, future studies are needed to show whether music interventions can reliably improve memory functions in both healthy and neurologically impaired subjects.

References
Alonso, I., Dellacherie, D., & Samson, S. (2015). Emotional memory for musical excerpts in young and older adults. Frontiers in Aging Neuroscience 7, 23. Retrieved from https://doi.org/10.3389/fnagi.2015.00023
Altenmüller, E., Siggel, S., Mohammadi, B., Samii, A., & Münte, T. F. (2014). Play it again, Sam: Brain correlates of emotional music recognition. Frontiers in Psychology 5, 114. Retrieved from https://doi.org/10.3389/fpsyg.2014.00114
Anvari, S. H., Trainor, L. J., Woodside, J., & Levy, B. A. (2002). Relations among musical skills,
phonological processing, and early reading ability in preschool children. Journal of Experimental
Child Psychology 83(2), 111–130.
Baddeley, A. (2010). Working memory. Current Biology 20(4), R136–R140.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), Psychology of
Learning and Motivation (Vol. 8, pp. 47–89). New York: Academic Press.
Bangert, M., & Altenmüller, E. O. (2003). Mapping perception to action in piano practice: A
longitudinal DC-EEG study. BMC Neuroscience 4, 26. Retrieved from
https://doi.org/10.1186/1471-2202-4-26
Bhattacharya, J., & Petsche, H. (2001). Enhanced phase synchrony in the electroencephalograph
gamma band for musicians while listening to music. Physical Review E: Statistical, Nonlinear, and
Soft Matter Physics 64(1 Pt. 1), 012902.
Bhattacharya, J., Petsche, H., Feldmann, U., & Rescher, B. (2001). EEG gamma-band phase
synchronization between posterior and frontal cortex during mental rotation in humans.
Neuroscience Letters 311(1), 29–32.
Bhattacharya, J., Petsche, H., & Pereda, E. (2001). Long-range synchrony in the gamma band: Role
in music perception. Journal of Neuroscience 21(16), 6329–6337.
Bower, G. H. (1981). Mood and memory. The American Psychologist 36, 129–148.
Calvert, S. L., & Tart, M. (1993). Song versus verbal forms for very-long-term, long-term, and short-
term verbatim recall. Journal of Applied Developmental Psychology 14(2), 245–260.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York: Lawrence
Erlbaum Associates.
Cowan, N. (2011). The focus of attention as observed in visual working memory tasks: Making sense
of competing claims. Neuropsychologia 49, 1401–1406.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in immediate memory. Science
168(3939), 1604–1605.
Deutsch, D., Henthorn, T., Marvin, E., & Xu, H. (2006). Absolute pitch among American and
Chinese conservatory students: Prevalence differences, and evidence for a speech-related critical
period. Journal of the Acoustical Society of America 119(2), 719–722.
Elmer, S., Hänggi, J., Meyer, M., & Jäncke, L. (2013). Increased cortical surface area of the left
planum temporale in musicians facilitates the categorization of phonetic and temporal speech
sounds. Cortex 49(10), 2812–2821.
Elmer, S., Rogenmoser, L., Kühnis, J., & Jäncke, L. (2015). Bridging the gap between perceptual and
cognitive perspectives on absolute pitch. Journal of Neuroscience 35(1), 366–371.
Eschrich, S., Münte, T. F., & Altenmüller, E. O. (2005). Remember Bach: An investigation in
episodic memory for music. Annals of the New York Academy of Sciences 1060, 438–442.
Eschrich, S., Münte, T. F., & Altenmüller, E. O. (2008). Unforgettable film music: The role of
emotion in episodic long-term memory for music. BMC Neuroscience 9, 48. Retrieved from
https://doi.org/10.1186/1471-2202-9-48
Evers, S., & Suhr, B. (2000). Changes of the neurotransmitter serotonin but not of hormones during
short time music perception. European Archives of Psychiatry and Clinical Neuroscience 250(3),
144–147.
Ferreri, L., Bigand, E., Perrey, S., Muthalib, M., Bard, P., & Bugaiska, A. (2014). Less effort, better
results: How does music act on prefrontal cortex in older adults during verbal encoding? An fNIRS
study. Frontiers in Human Neuroscience 8, 301. Retrieved from
https://doi.org/10.3389/fnhum.2014.00301
Ferreri, L., & Rodriguez-Fornells, A. (2017). Music-related reward responses predict episodic
memory performance. Experimental Brain Research 235(12), 3721–3731.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schon, D. (2015). Music training
increases phonological awareness and reading skills in developmental dyslexia: A randomized
control trial. PLoS ONE 10(9), e0138715.
Ford, J. H., Addis, D. R., & Giovanello, K. S. (2011). Differential neural activity during search of
specific and general autobiographical memories elicited by musical cues. Neuropsychologia 49(9),
2514–2526.
Frieler, K., Fischinger, T., Schlemmer, K., Lothwesen, K., Jakubowski, K., & Müllensiefen, D.
(2013). Absolute memory for pitch: A comparative replication of Levitin’s 1994 study in six
European labs. Musicae Scientiae: The Journal of the European Society for the Cognitive Sciences
of Music 17(3), 334–349.
Gaab, N., Gaser, C., Zaehle, T., Jancke, L., & Schlaug, G. (2003). Functional anatomy of pitch
memory: An fMRI study with sparse temporal sampling. NeuroImage 19(4), 1417–1426.
Gagnepain, P., Fauvel, B., Desgranges, B., Gaubert, M., Viader, F., Eustache, F., … Platel, H. (2017).
Musical expertise increases top-down modulation over hippocampal activation during familiarity
decisions. Frontiers in Human Neuroscience 11, 472. Retrieved from
https://doi.org/10.3389/fnhum.2017.00472
Gobet, F. (1998). Expert memory: A comparison of four theories. Cognition 66(2), 115–152.
Green, A. C., Bærentsen, K. B., Stødkilde-Jørgensen, H., Roepstorff, A., & Vuust, P. (2012). Listen,
learn, like! Dorsolateral prefrontal cortex involved in the mere exposure effect in music. Neurology
Research International 2012, 846270. Retrieved from http://dx.doi.org/10.1155/2012/846270
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (1999). Absolute pitch: Prevalence,
ethnic variation, and estimation of the genetic component. American Journal of Human Genetics
65(3), 911–913.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2001). Early childhood music education
and predisposition to absolute pitch: Teasing apart genes and environment. American Journal of
Medical Genetics 98(3), 280–282.
Groussard, M., La Joie, R., Rauchs, G., Landeau, B., Chetelat, G., Viader, F., … Platel, H. (2010).
When music and long-term memory interact: Effects of musical expertise on functional and
structural plasticity in the hippocampus. PLoS ONE 5(10), e13225.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition 17(5),
572–581.
Halpern, A. R., & Müllensiefen, D. (2008). Effects of timbre and tempo change on memory for
music. Quarterly Journal of Experimental Psychology 61(9), 1371–1384.
Halpern, A. R., & O'Connor, M. G. (2000). Implicit memory for music in Alzheimer's disease. Neuropsychology 14(3), 391–397.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews
Neuroscience 8(5), 393–402.
Honing, H., & Ladinig, O. (2009). Exposure influences expressive timing judgments in music.
Journal of Experimental Psychology: Human Perception and Performance 35(1), 281–288.
Jaencke, L., & Sandmann, P. (2010). Music listening while you learn: No influence of background
music on verbal learning. Behavioral and Brain Functions 6, 3. Retrieved from
https://doi.org/10.1186/1744-9081-6-3
Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral
Cortex 19(11), 2579–2594.
Jäncke, L. (2012). The dynamic audio-motor system in pianists. Annals of the New York Academy of
Sciences 1252, 246–252.
Jäncke, L., & Alahmadi, N. (2016). Detection of independent functional networks during music
listening using electroencephalogram and sLORETA-ICA. Neuroreport 27(6), 455–461.
Jäncke, L., Mirzazade, S., & Shah, N. J. (1999). Attention modulates activity in the primary and the
secondary auditory cortex: A functional magnetic resonance imaging study in human subjects.
Neuroscience Letters 266(2), 125–128.
Johnson, M. K., Kim, J. K., & Risse, G. (1985). Do alcoholic Korsakoff’s syndrome patients acquire
affective reactions? Journal of Experimental Psychology: Learning, Memory, and Cognition 11(1),
22–36.
Judde, S., & Rickard, N. (2010). The effect of post-learning presentation of music on long-term
word-list retention. Neurobiology of Learning and Memory 94(1), 13–20.
Kalveram, K. T., & Seyfarth, A. (2009). Inverse biomimetics: How robots can help to verify concepts
concerning sensorimotor control of human arm and leg movements. Journal of Physiology 103(3–
5), 232–243.
Kämpfe, J., Sedlmeier, P., & Renkewitz, F. (2010). The impact of background music on adult
listeners: A meta-analysis. Psychology of Music 39(4), 424–448.
Keitz, M., Martin-Soelch, C., & Leenders, K. L. (2003). Reward processing in the brain: A
prerequisite for movement preparation? Neural Plasticity 10(1–2), 121–128.
Kilgour, A. R., Jakobson, L. S., & Cuddy, L. L. (2000). Music training and rate of presentation as
mediators of text and song recall. Memory & Cognition 28(5), 700–710.
Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A
review and analysis. Brain Research Reviews 29(2), 169–195.
Klimesch, W., Sauseng, P., & Gerloff, C. (2003). Enhancing cognitive performance with repetitive
transcranial magnetic stimulation at human individual alpha frequency. European Journal of
Neuroscience 17(5), 1129–1133.
Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., & Friederici, A. D. (2004). Music,
language and meaning: Brain signatures of semantic processing. Nature Neuroscience 7(3), 302–
307.
Kühn, S., Romanowski, A., Schilling, C., Lorenz, R., Mörsen, C., Seiferth, N., … IMAGEN
Consortium (2011). The neural basis of video gaming. Translational Psychiatry 1(11), e53.
Kühnis, J., Elmer, S., & Jäncke, L. (2014). Auditory evoked responses in musicians during passive
vowel listening are modulated by functional connectivity between bilateral auditory-related brain
regions. Journal of Cognitive Neuroscience 26(12), 2750–2761.
Kussner, M. B., de Groot, A. M., Hofman, W. F., & Hillen, M. A. (2016). EEG beta power but not
background music predicts the recall scores in a foreign-vocabulary learning task. PLoS ONE
11(8), e0161387.
Lee, Y. S., Janata, P., Frost, C., Martinez, Z., & Granger, R. (2015). Melody recognition revisited:
Influence of melodic Gestalt on the encoding of relational pitch information. Psychonomic Bulletin
& Review 22(1), 163–169.
Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence from the production of learned
melodies. Perception & Psychophysics 56, 414–423.
Levitin, D. J., & Rogers, S. E. (2005). Absolute pitch: Perception, coding, and controversies. Trends
in Cognitive Sciences 9(1), 26–33.
Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature 453(7197),
869–878.
McElhinney, M., & Annett, J. M. (1996). Pattern of efficacy of a musical mnemonic on recall of
familiar words over several presentations. Perceptual and Motor Skills 82(2), 395–400.
McGaugh, J. L. (2000). Memory: A century of consolidation. Science 287(5451), 248–251.
Margulis, E. H., Mlsna, L. M., Uppunda, A. K., Parrish, T. B., & Wong, P. C. M. (2009). Selective
neurophysiologic responses to music in instrumentalists with different listening biographies.
Human Brain Mapping 30(1), 267–275.
Marie, C., Magne, C., & Besson, M. (2011). Musicians and the metric structure of words. Journal of
Cognitive Neuroscience 23(2), 294–305.
Meneses, A., & Liy-Salmeron, G. (2012). Serotonin and emotion, learning and memory. Reviews in
the Neurosciences 23(5–6), 543–553.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2012). Music as an aid to learn new verbal
information in Alzheimer’s disease. Music Perception: An Interdisciplinary Journal 29(5), 521–
531.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2014). Learning sung lyrics aids retention in
normal ageing and Alzheimer’s disease. Neuropsychological Rehabilitation 24(6), 894–917.
Nielsen (2017). Nielsen Music year-end report 2016. Retrieved from
http://www.nielsen.com/us/en/press-room/2017/nielsen-releases-2016-us-year-end-music-
report.html
Oberauer, K., & Lewandowsky, S. (2011). Modeling working memory: A computational
implementation of the Time-Based Resource-Sharing theory. Psychonomic Bulletin & Review
18(1), 10–45.
Okhrei, A., Kutsenko, T., & Makarchuk, M. (2017). Performance of working memory of musicians
and non-musicians in tests with letters, digits, and geometrical shapes. Biologija 62(4), 207–215.
Palisson, J., Roussel-Baclet, C., Maillet, D., Belin, C., Ankri, J., & Narme, P. (2015). Music enhances
verbal episodic memory in Alzheimer’s disease. Journal of Clinical and Experimental
Neuropsychology 37(5), 503–517.
Parks, S. L., & Clancy Dollinger, S. (2014). The positivity effect and auditory recognition memory
for musical excerpts in young, middle-aged, and older adults. Psychomusicology: Music, Mind,
and Brain 24(4), 298–308.
Peretz, I., Gosselin, N., Belin, P., Zatorre, R. J., Plailly, J., & Tillmann, B. (2009). Music lexical
networks. Annals of the New York Academy of Sciences 1169, 256–265.
Peterson, D. A., & Thaut, M. H. (2007). Music increases frontal EEG coherence during verbal
learning. Neuroscience Letters 412(3), 217–221.
Piaget, J. (1923). Le langage et la pensée chez l'enfant: Études sur la logique de l'enfant. Retrieved
from
http://pubman.mpdl.mpg.de/pubman/item/escidoc:2375486/component/escidoc:2375485/Piaget_1
923_language_pensee_enfant.pdf
Plailly, J., Tillmann, B., & Royet, J.-P. (2007). The feeling of familiarity of music and odors: The
same neural signature? Cerebral Cortex 17(11), 2650–2658.
Plantinga, J., & Trainor, L. J. (2005). Memory for melody: Infants use a relative pitch code.
Cognition 98(1), 1–11.
Platel, H. (2005). Functional neuroimaging of semantic and episodic musical memory. Annals of the
New York Academy of Sciences 1060, 136–147.
Raaijmakers, J. G., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review
88(2), 93–134.
Rawson, K. A., & Van Overschelde, J. P. (2008). How does knowledge promote memory? The
distinctiveness theory of skilled memory. Journal of Memory and Language 58(3), 646–668.
Rickard, N. S., Wong, W. W., & Velik, L. (2012). Relaxing music counters heightened consolidation
of emotional memory. Neurobiology of Learning and Memory 97(2), 220–228.
Rogenmoser, L., Elmer, S., & Jäncke, L. (2015). Absolute pitch: Evidence for early cognitive
facilitation during passive listening as revealed by reduced P3a amplitudes. Journal of Cognitive
Neuroscience 27(3), 623–637.
Salamé, P., & Baddeley, A. (1989). Effects of background music on phonological short-term memory.
Quarterly Journal of Experimental Psychology Section A 41(1), 107–122.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and
the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Samson, S., & Peretz, I. (2005). Effects of prior exposure on music liking and recognition in patients
with temporal lobe lesions. Annals of the New York Academy of Sciences 1060, 419–428.
Särkämö, T., Tervaniemi, M., Laitinen, S., Forsblom, A., Soinila, S., Mikkonen, M., … Hietanen, M.
(2008). Music listening enhances cognitive recovery and mood after middle cerebral artery stroke.
Brain: A Journal of Neurology 131, 866–876.
Schellenberg, E. G. (2001). Music and nonmusical abilities. Annals of the New York Academy of
Sciences 930, 355–371. Reprinted in G. E. McPherson (Ed.), The child as musician: A handbook
of musical development (2nd ed., pp. 149–176). Oxford: Oxford University Press, 2016.
Schellenberg, E. G., & Weiss, M. W. (2013). Music and cognitive abilities. In D. Deutsch (Ed.), The
Psychology of Music (3rd ed., pp. 499–550). London: Academic Press.
Schendel, Z. A., & Palmer, C. (2007). Suppression effects on musical and verbal memory. Memory &
Cognition 35(4), 640–650.
Schulze, K., & Koelsch, S. (2012). Working memory for speech and music. Annals of the New York
Academy of Sciences 1252, 229–236.
Schulze, K., Mueller, K., & Koelsch, S. (2011). Neural correlates of strategy use during auditory
working memory in musicians and non-musicians. European Journal of Neuroscience 33(1), 189–
196.
Schulze, K., Mueller, K., & Koelsch, S. (2013). Auditory stroop and absolute pitch: An fMRI study.
Human Brain Mapping 34(7), 1579–1590.
Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of
verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping 32,
771–783.
Semal, C., Demany, L., Ueda, K., & Hallé, P. A. (1996). Speech versus nonspeech in pitch memory.
Journal of the Acoustical Society of America 100(2 Pt. 1), 1132–1140.
Simmons-Stern, N. R., Budson, A. E., & Ally, B. A. (2010). Music as a memory enhancer in patients
with Alzheimer’s disease. Neuropsychologia 48(10), 3164–3167.
Simmons-Stern, N. R., Deason, R. G., Brandler, B. J., Frustace, B. S., O’Connor, M. K., Ally, B. A.,
& Budson, A. E. (2012). Music-based memory enhancement in Alzheimer’s disease: Promise and
limitations. Neuropsychologia 50(14), 3295–3303.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin 113(2), 345–361.
Talamini, F., Altoè, G., Carretti, B., & Grassi, M. (2017). Musicians have better memory than
nonmusicians: A meta-analysis. PLoS ONE 12(10), e0186773.
Talamini, F., Carretti, B., & Grassi, M. (2016). The working memory of musicians and nonmusicians.
Music Perception: An Interdisciplinary Journal 34(2), 183–191.
Tamminen, J., Rastle, K., Darby, J., Lucas, R., & Williamson, V. J. (2017). The impact of music on
learning and consolidation of novel words. Memory 25(1), 107–121.
Thaut, M. H., Peterson, D. A., McIntosh, G. C., & Hoemberg, V. (2014). Music mnemonics aid
verbal memory and induce learning-related brain plasticity in multiple sclerosis. Frontiers in
Human Neuroscience 8, 395. Retrieved from https://doi.org/10.3389/fnhum.2014.00395
Trainor, L. J., McDonald, K. L., & Alain, C. (2002). Automatic and controlled processing of melodic
contour and interval information measured by electrical brain activity. Journal of Cognitive
Neuroscience 14(3), 430–442.
Vieillard, S., & Gilet, A.-L. (2013). Age-related differences in affective responses to and memory for
emotions conveyed by music: A cross-sectional study. Frontiers in Psychology 4, 711. Retrieved
from https://doi.org/10.3389/fpsyg.2013.00711
Wallace, W. T. (1994). Memory for music: Effect of melody on recall of text. Journal of
Experimental Psychology: Learning, Memory, and Cognition 20(6), 1471–1485.
Watanabe, T., Yagishita, S., & Kikyo, H. (2008). Memory of music: Roles of right hippocampus and
left inferior frontal gyrus. NeuroImage 39(1), 483–491.
Werbik, H. (1971). Informationsgehalt und emotionale Wirkung von Musik. Mainz: B. Schott.
Wildschut, T., Sedikides, C., Arndt, J., & Routledge, C. (2006). Nostalgia: Content, triggers,
functions. Journal of Personality and Social Psychology 91(5), 975–993.
Williamson, V. J., Baddeley, A. D., & Hitch, G. J. (2010). Musicians’ and nonmusicians’ short-term
memory for verbal and musical sequences: Comparing phonological similarity and pitch
proximity. Memory & Cognition 38(2), 163–175.
Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social
Psychology 9(2 pt. 2), 1–27.
Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998). Functional
anatomy of musical processing in listeners with absolute pitch and relative pitch. Proceedings of
the National Academy of Sciences 95(6), 3172–3177.
CHAPTER 12

MUSIC AND ATTENTION, EXECUTIVE FUNCTION, AND CREATIVITY

PSYCHE LOUI AND RACHEL E. GUETTA

Attention is "the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence" (James, 1890, p. 403).
Executive functions are “a family of top-down mental processes needed
when you have to concentrate and pay attention … three core EFs:
inhibition [inhibitory control, including self-control (behavioral inhibition)
and interference control (selective attention and cognitive inhibition)],
working memory (WM), and cognitive flexibility (also called set shifting,
mental flexibility, or mental set shifting and closely linked to creativity)”
(Diamond, 2013, pp. 1–2).
Creativity is “the ability to produce work that is novel (i.e., original,
unexpected), high in quality, and appropriate (i.e., useful, meets task
constraints)” (Sternberg, Lubart, Kaufman, & Pretz, 2005, p. 351).
How does music, as “organized sound” (Varèse & Wen-Chung, 1966),
intersect with these cognitive capacities of the human mind? In this chapter,
we provide a general overview of the contemporary research at the
intersection of music and attention, executive functions, and creativity. On
one hand, we see that musical sounds provide an optimal stimulus set with
which to understand the fundamental properties of attention, executive
functions, and creativity. On the other hand, music also offers a window
through which researchers may assess effects of long-term training on more
general cognitive function, as well as neurocognitive development
throughout the lifespan.

Music and Attention

There are many ways to conceptualize the vast literature on attention. Perhaps as a result, research on the intersection between attention and
music has been similarly fragmented. Nevertheless, research on music and
attention has followed the trends of psychology and neuroscience more
generally, and musical stimuli have served as a useful model to tease apart
several models of attention. Here we provide a general overview of the
disparate theories on attention, before turning to its intersection with the
work on music more specifically.

Theories of Attention
Patel's OPERA hypothesis (Patel, 2011b) posits that one of several reasons why music training benefits the neural encoding of speech is attention: music engages brain networks that are shared between music and speech and that are associated with focused attention. Attention has been thought of in
terms of early versus late selection theories, and in terms of its operation
over space and time. Early selection theories focus on sensory processing
and more exogenous (reflexive) sources of information, whereas late
selection theories focus more on feature selection and more cognitive,
endogenous operations.
Early and late selection theories differ in where they locate perceptual selection, enhancement, and cognitive focus along the perceptual-cognitive pathway, that is, along the gradient from primary to association areas of the human cortex, and hence in when, temporally, attentional processes operate. Evidence for early selection comes from findings from the
dichotic listening paradigm in which event-related brain potentials were
recorded. The amplitude of the N1, an event-related brain response
generated in response to sounds, is enhanced in response to sounds in the
attended ear relative to the unattended ear (Woldorff & Hillyard, 1991).
Magnetoencephalography work subsequently localized the source of this attentional enhancement to the auditory cortex (Woldorff et al., 1993). Since the auditory cortex is part of the primary sensory cortices, the finding that attention modulates processing in this early cortical way station as early as 100 ms after sound presentation provides convincing evidence for early selection.
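The logic of such ERP experiments is simple to illustrate: epochs time-locked to the stimulus are averaged so that random noise cancels and the stereotyped response remains. The sketch below simulates an attention-related N1 enhancement; the sampling rate, trial counts, and effect sizes are invented for illustration and do not come from the cited studies:

```python
import numpy as np

fs = 500                                  # sampling rate in Hz (an assumption)
n_trials, n_samples = 200, fs             # 1-s epochs, stimulus onset at t = 0
t = np.arange(n_samples) / fs

def simulate_epochs(n1_gain):
    """Toy epochs with an N1-like negative deflection around 100 ms."""
    n1 = -n1_gain * np.exp(-((t - 0.1) ** 2) / (2 * 0.015 ** 2))
    return n1 + 2.0 * np.random.randn(n_trials, n_samples)

attended = simulate_epochs(n1_gain=3.0)    # attention boosts the N1
unattended = simulate_epochs(n1_gain=1.5)

# The ERP is the average across trials, time-locked to the stimulus:
# random noise cancels, the stereotyped response remains
erp_att = attended.mean(axis=0)
erp_unatt = unattended.mean(axis=0)

# Mean amplitude in an 80-120 ms "N1 window" for each condition
win = (t >= 0.08) & (t <= 0.12)
print("N1 attended:  ", erp_att[win].mean())
print("N1 unattended:", erp_unatt[win].mean())
```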
Theories that posit relatively late selection conceptualize attention as a
feature-based or object-based operation. In particular, the feature integration
theory (Treisman & Gelade, 1980) posits that attention operates by
combining pre-attentively selected features within a busy scene. Support for
this comes from illusory conjunctions, in which unattended features of
visual objects, such as color and shape, are sometimes combined to give
rise to an illusory percept of a nonexistent object. While this theory has received much interest, the definitions of features in vision may not transfer so readily to audition. In the auditory modality, stimulus
representation has been described as hierarchical, as shown by
psychophysical and modeling studies. At the lowest rung of the hierarchy of
stimulus representation there are “primitive features” such as acoustic
frequency, whereas at higher levels there are more complex, emergent
features such as virtual pitch, which combine with other features to form
objects. Attention can be enhanced by cueing at the appropriate level, thus
reducing uncertainty (Hafter & Saberi, 2001). Object-based attention offers
a direct comparison between visual and auditory processing. Much like the
visual system combines features to form objects, the auditory system forms
objects by grouping together sound elements that share features such as
frequency and harmonic structure (Shinn-Cunningham, 2008). The
temporal evolution of these features is especially relevant for object
formation in the auditory system. At a low-level timescale, the auditory
system may group together sounds based on similar fine-grained temporal
features such as attack time, while at a higher-level timescale, distinct tones
may be grouped together based on temporal proximity to give rise to beat
perception. Beat perception has been proposed as an attentional mechanism,
through which different temporal objects such as tones are combined
together to form larger units such as rhythms and phrases (De Freitas,
Liverence, & Scholl, 2014; Grahn, 2012). The rhythmic effects of attention
over time will be revisited later in this section.
Evidence for late selection in auditory neurophysiology comes from
findings of later attention-related enhancements in event-related brain
responses such as the P300 (Purves et al., 2008). In addition, cases of late
selection are supported by the neuropsychological literature, in which
patients with lesions in the right parietal cortex present with lack of
awareness of their contralesional (usually left) visual field. In such cases, the successful perception of one feature can sometimes reduce the detectability of another, simultaneously present feature, a condition known as extinction. In the auditory/musical modality, interesting evidence comes
from the use of an auditory illusion in a case of auditory extinction
(Deouell, Deutsch, Scabini, Soroker, & Knight, 2007). This study took
advantage of Deutsch’s scale illusion, in which presenting subjects with
alternating high-pitched and low-pitched tones to the left and right ear
paradoxically leads to the percept of a stream of high tones in the right ear
and low tones in the left (Deutsch, 1974). When the patient with auditory neglect was presented with scale illusion stimuli, he reported hearing only the high-pitched stream. The fact that he heard the right-lateralized stream, rather than merely the right-ear stimulus, suggests that some forms of perceptual analysis, such as the formation of auditory streams, are completed before attention operates and before its disruption in hemispatial neglect, thus providing support for late selection.
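For the curious, a dichotic stimulus of the kind used in the scale illusion can be constructed in a few lines: successive tones of a simultaneously ascending and descending scale alternate between the two ears. This is a rough sketch (the tone durations, envelope, and output handling are arbitrary choices), not Deutsch's original stimulus specification:

```python
import numpy as np

fs = 44100                                # audio sampling rate
dur = 0.25                                # duration of each tone in seconds
t = np.arange(int(fs * dur)) / fs

def tone(freq):
    return np.sin(2 * np.pi * freq * t) * np.hanning(t.size)

# One octave of C major, played ascending and descending at the same time
asc = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]
desc = asc[::-1]

left, right = [], []
for i, (up, down) in enumerate(zip(asc, desc)):
    if i % 2 == 0:                        # successive tones alternate ears
        left.append(tone(up)); right.append(tone(down))
    else:
        left.append(tone(down)); right.append(tone(up))

stereo = np.stack([np.concatenate(left), np.concatenate(right)], axis=1)
# e.g., soundfile.write("scale_illusion.wav", stereo, fs)  # if soundfile is installed
```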
A third line of literature supports a combination of early and late
selection theories in showing attention-related enhancements of mid-latency
brain responses to sound. For example, the mismatch negativity, an event-
related potential generated around 200 ms after the onset of unexpected
sounds, is both pre-attentively generated and modulated by attention
(Woldorff, Hillyard, Gallen, Hampson, & Bloom, 1998). More specific to
the music literature, the Early Right Anterior Negativity (ERAN), an event-
related potential in response to unexpected musical chords such as the
Neapolitan chord (Koelsch, Gunter, Friederici, & Schröger, 2000), is also
pre-attentively generated but modulated by attention: when subjects directed their attention away from the auditory stimuli in a visual task, Neapolitan chords nevertheless elicited an ERAN; however, its amplitude was larger in the attended condition (Loui, Grent-'T-Jong, Torpey, & Woldorff, 2005). Taken together, the best available
resolution for the debate on early versus late selection holds that attention
acts on multiple levels of the perceptual-cognitive or primary-association
continuum, by selecting relevant features and processing them more fully at
more sensory stages, and also by combining selected features to form
coherent objects, streams, or scenes at later association stages.

Selection and Filtering


While the controversy amongst the early versus late selection camps
continues, other work has focused on the roles of attention for selecting and
filtering (Hafter, Sarampalis, & Loui, 2008). Perhaps the most common
example of attentional filtering is the famous cocktail party effect, our
remarkable ability to focus on one speaker amidst a noisy environment
(Cherry, 1953). In contrast, Broadbent (1982) noted that peripheral stimuli
may also take over attention and processing, such as in the “breakthrough of
the unattended” phenomenon.
Bregman’s (1994) theory of auditory scene analysis posits that we
stream or segregate distinct auditory stimuli by means of top-down
knowledge as well as bottom-up perceptual processing, based on acoustic
features such as frequency and amplitude co-modulation. This auditory
stream segregation, the dividing of our world into separate sound-emitting
objects, helps us to make sense of the sounds around us. Music listening,
thus, entails many aspects of analyzing a busy auditory scene. In Western
music, for instance, at various times we are continually separating and
fusing the different voices within the musical surface to perceive melody
and harmony. This act of auditory scene analysis requires selective and
divided attention, and interacts with training (Loui & Wessel, 2007). In
music, the objects to which we attend may pertain to horizontal aspects
such as melody, vertical aspects such as harmony, timbral aspects including
spectral centroid, and amplitude envelope. Attended features or objects may
also be music-theoretically defined components such as specific chord
changes and harmonies. They may also pertain to rhythm, meter, and/or
larger-scale musical structure such as form.

Attending to Musical Pitch and Harmonicity


The musical surface is rich with different types of information, all of which
can direct our attention as we listen. Frequency, pitch, and harmonicity can
act as predictive cues, guiding our attention toward the cued feature. Early
psychophysical work showed that subjects were better at detecting tones presented at an expected frequency as well as at an expected pitch, giving rise to the idea that cues can combine hierarchically, as reviewed above (Hafter & Saberi, 2001). However, cues do not have to share
perceptual features with the target in order to drive attention: Voluntary
attention to a cue frequency heightens sensitivity for a different target
frequency; furthermore, a visual cue can direct attention toward an auditory
frequency (Hafter, Schlauch, & Tang, 1993). Thus, auditory sensitivity
increases not only for what is physically presented, but also for what is attended. Signal detection is easier when the signal shares perceptual features with the attended cue, enabling involuntary or exogenous cueing, but also whenever the cue provides information that endogenously (voluntarily) reduces uncertainty and increases predictability about the target in an ongoing task.
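Cueing effects of this kind are typically quantified with signal detection measures. The sketch below computes d' from hypothetical hit and false-alarm counts for cued versus uncued tone-detection trials; the counts, and the log-linear correction used to keep the rates away from 0 and 1, are illustrative choices rather than values from the cited studies:

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity from raw response counts.
    A log-linear correction keeps z-scores finite for perfect rates."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical tone-detection counts for cued (valid) vs. uncued trials
print("cued:  ", d_prime(hits=45, misses=5, false_alarms=8, correct_rejections=42))
print("uncued:", d_prime(hits=32, misses=18, false_alarms=10, correct_rejections=40))
```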
These effects of endogenous cueing also guide expectations in a higher-
level musical context. Based on our long-term knowledge from
encountering music in our culture, humans have developed expectations for
commonly co-occurring musical structures such as in harmony, melody, and
musical syntax. Reaction time studies have shown that our knowledge of
musical syntax can act as a prime or a cue that directs attention toward
musically expected stimuli, thus reducing reaction time for harmonically
expected musical structures and increasing reaction time for unexpected
structures (Bharucha & Stoeckig, 1986; Marmel, Tillmann, & Dowling,
2008). This enhanced attentional processing due to the priming effect of
tonality is not tied to tasks that involve reacting to the feature of musical
expectation itself; its effects even spread to visual processing (Escoffier &
Tillmann, 2008). The priming effect of tonal expectations has been shown
in non-musicians as well as musicians, suggesting that they result from
implicitly learned expectations rather than from explicit musical training
(Bigand, Poulin, Tillmann, Madurell, & D’Adamo, 2003). However, the
effect of tonal expectations does depend on selective attention: when the
task is to attend selectively to the melodic contour of a chord progression,
musically trained subjects were more affected by unexpected harmonies,
showing both reaction time costs and benefits relative to musically
untrained subjects, who were slower overall but not affected by different
unexpected chord progressions (Loui & Wessel, 2007). This again points to
the analysis of complex musical materials (such as chord progressions with
different voices) as auditory scenes with different streams of information in
local as well as global contexts, a view echoed in other cognitive and
electrophysiological studies (Justus & List, 2005; List, Justus, Robertson, &
Bentin, 2007).

Temporal Attention, Prediction, and Entrainment of Musical Stimuli
In addition to operating over different points in frequency, pitch, and
harmonicity, attention also operates over time. Perhaps the most influential recent view of how music can contribute to the study of attention comes from the idea that music unfolds over time in the form of rhythm, the pattern of inter-onset intervals that enables the cognitive system to chunk incoming sound stimuli in a hierarchical manner (Longuet-Higgins & Lee, 1982; Povel & Essens, 1985). The idea that
attention is temporally based is not incompatible with the object-based
views of attention reviewed earlier in this chapter, but more recently there
has been a shift of interest specifically toward how attention changes
dynamically over time. This is modeled by the Dynamic Attending Theory,
which posits that attention fluctuates in rhythmically predictable pulses,
giving rise to different levels of detection and identification to stimuli
presented at different times relative to the attentional rhythm (Jones, 1976;
Jones & Boltz, 1989; Jones, Moynihan, MacKenzie, & Puente, 2002).
Compelling evidence for the Dynamic Attending Theory comes from
psychophysical studies, in which subjects were better at same-different
judgments in pitch when the pitch to be judged occurred at a rhythmically
predictable time (Jones et al., 2002).
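The core intuition of the Dynamic Attending Theory, namely that attentional energy is allocated in pulses peaking at rhythmically expected moments, can be caricatured with a periodic gain function. The sketch below is a deliberately simplified stand-in (a von Mises-shaped pulse with an arbitrary sharpness parameter), not the full oscillator model of Jones and colleagues:

```python
import numpy as np

def attentional_gain(t, period, phase=0.0, kappa=5.0):
    """Periodic attentional pulse peaking at expected beat times.
    kappa sets how sharply attention is focused around those times."""
    angle = 2 * np.pi * (t / period) - phase
    pulse = np.exp(kappa * np.cos(angle))      # von Mises-shaped pulse
    return pulse / np.exp(kappa)               # normalized so the peak is 1

period = 0.6                                   # beat period in seconds (100 bpm)
on_beat = attentional_gain(np.array([1.2]), period)    # an expected beat time
off_beat = attentional_gain(np.array([1.5]), period)   # halfway between beats
print(f"gain on the beat: {on_beat[0]:.2f}, off the beat: {off_beat[0]:.2f}")
```

On this toy model, a pitch judgment probed on the beat receives near-maximal attentional gain, while one probed between beats receives almost none, mirroring the behavioral advantage for rhythmically expected targets reported by Jones et al. (2002).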
The study of rhythmic attention has recently become closely tied to the
study of rhythmic oscillations in the brain. The idea that there are intrinsic
rhythmic fluctuations in the brains of humans and other mammals is not
new, going back to the late 1800s and popularized by Hans Berger in the
1920s (Millett, 2001). Berger discovered that, by recording electrical signals from the human scalp, he could observe spontaneous electrical fluctuations of the electroencephalogram (EEG) at a rate of ~10 Hz, which he termed the alpha rhythm. The power of alpha-band activity is highest during
states of rest and relaxation. In contrast, activity increases in different
frequency bands such as beta (~20 Hz), gamma (>30 Hz), and delta (2–4
Hz) have been observed during different mental states. These bands of
oscillatory activity, and the physical relationships between them, are
hypothesized to have functional significance for enabling long-range
neuronal communications across the brain. In particular, beta activity is
shown to track the beat during the perception and imagery of rhythmic
music (Fujioka, Ross, & Trainor, 2015). Rhythmic synchronization to the
beat frequency is strongest over the motor areas (Nozaradan, Zerouali,
Peretz, & Mouraux, 2013), suggesting an involvement of the motor system
in attending to the beat, consistent with fMRI work (Grahn & Brett, 2007).
Furthermore, bursts of activity in the beta band are found to originate in the left sensorimotor cortex and to influence activity in the auditory cortex,
suggesting that the motor system, with its intrinsic oscillatory activity in the
beta band, guides rhythmic attention in the auditory system (Morillon &
Baillet, 2017). Together, the recent literature shows that musical rhythm
drives auditory attention via the entrainment of oscillatory neuronal activity
at multiple frequencies, which originates in the motor system but is tightly
coupled with the auditory system. In addition to being important for
understanding attention to musical rhythm, these findings also pertain to
speech, which contains multiple temporal modulations at specific
frequencies (Ding et al., 2017). Selective attention in the real world likely
entails listening at various timescales, which affects different patterns of
neural and behavioral entrainment (Henry, Herrmann, & Obleser, 2015).
Understanding how the brain organizes these fluctuating rhythms may have
implications for designing music targeted toward enhancing attention. New
approaches to music composition have inserted rhythmic components (e.g., fast rhythmic amplitude modulations) into the musical stimulus to target specific neuronal oscillations, with the ultimate goal of improving cognition
(James et al., 2017). This approach is promising as it may offer therapeutic
possibilities for music-based training of executive functions that makes use
of the rhythmic temporal properties of attention to achieve optimal goal-
directed behavior.
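The frequency-tagging logic used in studies such as Nozaradan et al. (2013) can be illustrated in a few lines: if neural activity entrains to the beat, the amplitude spectrum of the steady-state EEG shows a peak at the beat frequency that stands out from neighboring frequencies. The signal, beat rate, and noise band below are simulated assumptions, not recorded data:

```python
import numpy as np

fs = 250                       # EEG sampling rate in Hz (an assumption)
dur = 60                       # one minute of steady listening
t = np.arange(0, dur, 1 / fs)
beat_f = 2.4                   # beat frequency in Hz (144 beats per minute)

# Toy EEG: a weak oscillation entrained at the beat frequency plus noise
eeg = 0.3 * np.sin(2 * np.pi * beat_f * t) + np.random.randn(t.size)

# Frequency tagging: the amplitude spectrum of the steady-state response
spectrum = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

peak = spectrum[np.argmin(np.abs(freqs - beat_f))]
noise = spectrum[(freqs > beat_f + 0.5) & (freqs < beat_f + 1.5)].mean()
print(f"amplitude at the beat frequency: {peak:.3f} (neighboring noise: {noise:.3f})")
```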

Music and Executive Functions

Executive functions (EFs) include processes related to planning and self-control, as well as attention, working memory, mental inhibition, and
cognitive flexibility (Diamond, 2013). This subset of cognitive function
enables us to readily manipulate and prioritize information, filter through
distractors, balance our thoughts, and switch between tasks to optimize
cognitive performance. Without these processes, we would not be able to
concentrate on important tasks, think before acting, adapt to unexpected
challenges, resist temptations, or generally function cognitively in our daily
lives. The fundamental EFs, namely inhibition, interference control,
working memory, and cognitive flexibility, play important roles in
development, intelligence, and social and cognitive health.
The question as to whether and how EFs are enhanced through either
passive music listening or more active long-term musical training has
gained increased attention. The proposition that music and musical training
may influence executive functioning has been a topic of debate in recent
years, perhaps first widely popularized in media and public interest by the
Mozart Effect (Rauscher, Shaw, & Ky, 1993). The idea that merely listening
to music could improve our grades in school, our ability to focus, or even
our general IQ was at once exciting and applicable, not to mention
marketable. Since the inception of the Mozart Effect, however, research has
debunked the idea that passively listening to Mozart can transfer to
cognitive gains outside of the musical domain. And so, the questions
remain: Does music training confer non-musical advantages? If so, how?
The long-term effect of music training is arguably the most active area of
music and the brain research today. This section will delineate the current
theories and literature on what potential effects music may have on EFs, the
roles of near versus far transfer, and the functions of specific neural
mechanisms on EFs and transfer.
Since the Mozart Effect has largely been discredited, the focus in music cognition research has shifted to the long-term, more effortful effects of musical training. Unlike passive listening, long-term music training engages more of
our neural and cognitive circuitry and thus can be expected to induce
structural and functional plastic changes in the brain. The importance of
discerning whether musical training promotes any advantages for EFs relates to the question of skill transfer. The transfer and generalization of learning and skills from one area to another can increase general cognitive capacities. Near transfer occurs within a specific
modality (e.g., music and speech) whereas far transfer occurs between two
less obviously related domains (e.g., music and IQ or music and conflict
monitoring). While nearer forms of transfer between music and related
areas have been demonstrated, far transfer is harder to prove.

Association Studies Suggesting Near Transfer


Approaches to studying near transfer, as a means of understanding the possible effects of music on related cognitive abilities and EFs, include association studies looking at groups of children and adults, some musically trained and some untrained. From these comparison studies between subjects with different
levels of musical training, we know that training has measurable effects on
the brain as indicated by auditory evoked responses, such as those
generated from the brainstem (Kraus & Chandrasekaran, 2010).
Patel's OPERA hypothesis postulates that musical training benefits the neural encoding of speech when five conditions are met, the first of which is overlap between neural resources for music and speech (Patel, 2011a). This is
supported by many known associations between musical training, speech,
and language skills. For instance, musical training is associated with better auditory skills such as pitch discrimination, which relates to children's reading abilities and phonemic knowledge, providing evidence of an association between musical abilities and the EFs needed for reading and linguistic processing (Lamb & Gregory, 1993). Children with better pitch perception
and production abilities also perform better at phonemic awareness tests
even after controlling for intelligence and musical training (Loui, Kroog,
Zuk, Winner, & Schlaug, 2011), providing additional support for shared
neural resources for musical (pitch) and speech (phonemic) awareness.
Advantages of pitch discrimination generalize to tasks that involve the
perception of pitch in speech, and may be generally helpful in non-musical,
cognitive tasks (Lolli, Lewenstein, Basurto, Winnik, & Loui, 2015). Still, association studies of near transfer are difficult to interpret because of potential confounds, such as parental income and education, and other factors stemming from the non-random allocation of participants.
Theoretically, an influential model that has been proposed to underlie near transfer between music and language is Patel's shared syntactic integration resource hypothesis (SSIRH; Patel, 2003). The SSIRH proposes that syntax in language and music share a common set of processes, executed in temporal and frontal brain regions. Evidence for such shared processing came from a self-paced reading study in which reaction times and reading comprehension were especially taxed when readers had to integrate syntactically ambiguous grammar and harmonic violations simultaneously (Slevc, Rosenberg, & Patel, 2009). Supporting the SSIRH, these findings reinforce the theory that music and language draw on a common pool of limited processing resources for integrating incoming elements into syntactic structures. The resolution of perceptual and cognitive conflicts, or cognitive control, has thus been implicated in both musical and linguistic processing. This demonstration of interactive effects between the two modalities suggests the possibility of near transfer between the syntactic processing of music and that of language.
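The logic of this interaction test can be illustrated with a minimal sketch in Python. All data below are simulated and the condition means are invented for illustration; this is not the analysis or the values reported by Slevc, Rosenberg, and Patel (2009). If linguistic and musical syntax share limited resources, the reading-time cost of a syntactic ambiguity should be super-additively larger when it coincides with a harmonic violation.

```python
# Illustrative sketch (simulated data) of the 2x2 interaction logic:
# if music and language share syntactic integration resources, the
# reading-time cost of a syntactic ambiguity should grow when it
# coincides with an out-of-key (harmonically unexpected) chord.
import numpy as np

rng = np.random.default_rng(0)
n = 40  # simulated trials per cell

means = {
    ("in-key", "unambiguous"): 400,    # ms, hypothetical values
    ("in-key", "ambiguous"): 450,
    ("out-of-key", "unambiguous"): 420,
    ("out-of-key", "ambiguous"): 540,  # super-additive cost built in
}
cell = {cond: (m + rng.normal(0, 30, n)).mean() for cond, m in means.items()}

# Interaction: ambiguity cost under harmonic violation minus the
# ambiguity cost under in-key harmony.
interaction = ((cell[("out-of-key", "ambiguous")]
                - cell[("out-of-key", "unambiguous")])
               - (cell[("in-key", "ambiguous")]
                  - cell[("in-key", "unambiguous")]))
print(f"super-additive ambiguity cost: {interaction:.0f} ms")
```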
Although the SSIRH posits shared resources between music and language, the nature of this resource is unclear. Slevc and Okada (2015) suggest that cognitive control, and the implicated prefrontal cortical mechanisms, may be one shared resource between the musical and linguistic domains. And while research on the intersection of music and language has not historically focused on EFs, the idea that cognitive control may underlie syntactic processing in both domains is worth noting. These points of convergence between music and language processing, together with the notion of transfer, may help to explain a possible mechanism by which musical training enhances cognitive functions such as EFs.
These findings have generalizable implications for immediate and long-term cognitive transfer from musical training to, say, reading exercises, and vice versa. Slevc and Okada's proposal that cognitive control may be a shared resource between the musical and linguistic domains is important for understanding how the detection and resolution of conflict occur when expectations are violated and interpretations must be reworked, as in the case of grammatical and harmonic violations. By this account, music processing involves not just the incremental processing and integration of musical elements as they occur sequentially, but also the generation of musical predictions and expectations, which must sometimes be prioritized and revised in response to evolving musical input.
An additional study investigating the relationship between music and
EFs evaluated musical experience and its ability to predict individual
differences on inhibition, updating, and set-switching in both auditory and
visual modalities (Slevc, Davey, Buschkuehl, & Jaeggi, 2016). Notably, musical ability did predict better performance on both auditory and visual updating tasks, even when controlling for a variety of potential confounds such as age, handedness, bilingualism, and socio-economic status. Musical ability was not, however, clearly related to inhibitory control, and was unrelated to set-switching behavior.
Such mixed results show that the extra-musical gains associated with musical ability are not limited to auditory processes, but are instead tied to specific aspects of EFs. This supports a process-specific but modality-general relationship between musical experience and non-musical aspects of cognition, thereby also bolstering the potential of near and far transfer.
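The covariate-adjustment logic behind such association analyses can be made concrete with a minimal sketch. The data and variable names below are entirely hypothetical, and ordinary least squares stands in for the more elaborate analyses actually used in such studies.

```python
# Minimal sketch of covariate adjustment in an association study:
# does musical ability still predict an executive-function score
# once confounds are regressed out? All data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical standardized predictors: musical ability plus confounds.
music = rng.normal(0, 1, n)
age = rng.normal(0, 1, n)
ses = rng.normal(0, 1, n)   # socio-economic status

# Simulated updating score: partly music, partly confounds, plus noise.
updating = 0.4 * music + 0.3 * age + 0.2 * ses + rng.normal(0, 1, n)

# Design matrix with an intercept column; fit OLS via least squares.
X = np.column_stack([np.ones(n), music, age, ses])
beta, *_ = np.linalg.lstsq(X, updating, rcond=None)

# beta[1] estimates the association of musical ability with updating,
# holding age and SES constant.
print(f"adjusted effect of musical ability: {beta[1]:.2f}")
```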

Far Transfer
The hypothesis that music training enhances EFs assumes that far transfer
of cognitive skills takes place as a result of training; however, far transfer
has not been reliable across studies (Sala & Gobet, 2017b). On one hand,
cross-sectional studies comparing musicians and non-musicians have shown positive associations with EF: Adult musicians perform better on measures of
cognitive flexibility, working memory, and verbal fluency, and musically
trained children also perform better on behavioral and fMRI indices of
verbal fluency, rule representation, and task switching (Zuk, Benjamin,
Kenyon, & Gaab, 2014). On the other hand, cross-sectional studies are still
limited by the fundamental possibility that results may be due to similar
confounds as the association studies, such as differences in parental
education, socio-economic status (although these were mostly controlled for
in the previous study), or some aspect of exposure in the home environment
that is outside of the experimenter’s control, as well as pre-existing
differences before initiating training. Only long-term differences in EF performance that survive controlling for these potential confounds would provide a convincing basis for the possibility of far transfer.

Longitudinal Studies on Far Transfer


Longitudinal studies aim to eliminate these confounds, and the randomized
controlled trial is still hailed as the gold standard for such experimental
designs. In that regard, some longitudinal studies do provide support for
music-to-EF transfer. Several longitudinal studies have tested the effects of
music lessons on IQ. Preschool children who received weekly music
training for six months showed higher gains on performance IQ tests than
musically untrained counterparts, with effects being observable as early as
the age of 3 (Gromko & Poorman, 1998). Still, some of these extra-musical gains could be attributable to non-musical factors such as time spent with the class and with the instructor, which were not provided to the no-treatment control group. Thus, an active control group is an important improvement to the design of such longitudinal studies. A 2004 longitudinal study tested the relationship between music lessons and general intelligence, as indexed by IQ (Schellenberg, 2004). The study assigned 144 children to either music lessons on keyboard or voice, or to control groups with
to either music lessons on keyboard or voice, or to control groups with
either drama lessons or no lessons. Children in the two music groups
exhibited greater increases in full-scale IQ from pre- to post-lessons, as
measured by the WISC-III (Wechsler, 1991). Although the effect was fairly
small, the demonstrated enhancements generalized across all IQ subtests,
index scores, and standardized measures of academic achievement. Further,
the drama group exhibited improvements in measures of social behavior
that were not evident amongst the music group. Here, the presence of active
control groups provides more substantial evidence for the possibility of far
transfer.

Behavioral Changes and Neural Mechanisms


In addition to a drama lesson control group, other studies have compared
music training against sports and visual art training as active control groups.
One study compared the effects of two interactive computerized training
programs in music and visual art on preschool children (Moreno et al.,
2011). Children in the music group showed enhanced performance on
verbal intelligence measures after only 20 days of training. Furthermore,
this boosted performance was positively correlated with changes in event-
related potential (ERP) measures during an executive-function task (the
go/no-go task, requiring cognitive control and inhibition), here
demonstrating far transfer. Such longitudinal studies with randomized,
active control groups provide the most impressive evidence of the far
transfer effects of music to extra-musical gains.
In another longitudinal behavioral and ERP study, Habibi and colleagues
(2016) compared children in music training, children in sports training, and
a no-training matched control group. Children with musical training showed
an improvement in their ability to detect auditory changes, as measured by
cortical auditory evoked potentials to musical notes after one year of
training. Specifically, the P1 amplitude, an ERP measure of auditory
cortical activity, decreased significantly for all three groups, though with
the largest decrease in the music group from baseline to year 2 (Habibi,
Cahn, Damasio, & Damasio, 2016). A particularly robust difference
between the three groups was the decrease in P1 amplitude and latency in the music group elicited by piano tones in the passive task. As decreased P1 amplitude and latency are observed in adults, these results may suggest accelerated maturation of auditory processing as a result of music training.
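As an illustration of how a measure like the P1 is derived, the following sketch averages simulated epoched EEG trials and extracts a peak amplitude and latency. The data are simulated, and the 50-100 ms search window is an assumption for illustration rather than the window used by Habibi and colleagues.

```python
# Illustrative sketch (simulated EEG) of extracting a P1 amplitude and
# latency: average epoched trials into an ERP, then find the peak in an
# early positive window.
import numpy as np

fs = 500                                    # sampling rate in Hz
times = np.arange(-0.1, 0.4, 1 / fs)        # epoch from -100 to +400 ms
n_trials = 100

rng = np.random.default_rng(1)
# Simulate a Gaussian "P1" component peaking near 70 ms, plus noise.
p1_shape = 2.0 * np.exp(-((times - 0.07) ** 2) / (2 * 0.01 ** 2))
epochs = p1_shape + rng.normal(0, 1.5, (n_trials, times.size))

erp = epochs.mean(axis=0)                   # trial-averaged ERP

window = (times >= 0.05) & (times <= 0.10)  # illustrative P1 window
idx = np.argmax(erp[window])
amplitude = erp[window][idx]                # arbitrary units here
latency_ms = times[window][idx] * 1000

print(f"P1 ~{amplitude:.2f} uV at ~{latency_ms:.0f} ms")
```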
Combining cross-sectional and longitudinal data in a behavioral and
fMRI study in children and adults, Ellis and colleagues showed that
musically trained subjects were superior at melodic discrimination, with the
number of hours of practice predicting the behavioral improvement.
Interestingly, the underlying changes in brain activity involved increased
leftward asymmetry in the supramarginal gyrus (SMG). Longitudinal fMRI
data showed changes in activity of the left SMG during melodic
discrimination that correlated with hours of practice, after controlling for
age and previous training (Ellis, Bruijn, Norton, Winner, & Schlaug, 2013).
As the left SMG is a region implicated in short-term auditory working memory, these training-related changes in left SMG activity may suggest improved working memory function over time, achieved by co-opting brain areas belonging to systems not normally engaged by music.
It is worth noting that while Moreno et al. showed transfer to a non-
auditory task, Habibi et al. and Ellis et al. showed effects of long-term
training on neural processing of sounds, which did not involve transfer per
se. Nevertheless, the neural mechanisms that changed as a result of training,
that is, the left SMG and the neural generators of the P1, may be relatively
domain-general, respectively subserving working memory and auditory
processing more generally. The combined use of neuroimaging,
electrophysiology, and behavioral tasks is fruitful for investigating transfer
effects of musical training, as it provides clues as to the underlying neural
mechanism behind transfer. The evolution of functional neural signatures
over the course of longitudinal studies may be informative not only of how
music training affects the brain, but also of how neural processes develop
more generally throughout the lifespan.

Negative Findings
Studies reviewed thus far have reported positive effects for near transfer, and more limited but nevertheless encouraging results for far transfer.
However, not all reports have been positive, and the effect sizes of far
transfer have been small, as shown by a recent meta-analysis of the far
transfer effects of musical training (Sala & Gobet, 2017a, b). Mehr and
colleagues found no reliable evidence for non-musical cognitive benefits
from brief preschool music lessons (Mehr, Schachner, Katz, & Spelke,
2013). Preschool children were either given music classes, arts instruction,
or no lessons. After six weeks, the participants were assessed in four
distinct cognitive areas in which older arts students have been reported to
excel: spatial-navigational reasoning, visual form analysis, numerical
discrimination, and receptive vocabulary. At first, music class participants
showed greater spatial-navigational ability than those in the visual arts
class, while children from the visual arts class showed greater visual form
analysis ability than children from the music class. However, the
researchers were unable to replicate this trend. In the end, the children who
were provided with music classes performed no better overall than those
with visual arts or no classes. These findings demand caution in interpreting
other positive findings for enhanced executive functioning as a result of
music instruction. It may be important to note, however, that the brief
training sessions from this study do not readily compare to long-term
musical training. Furthermore, the selection of transfer tasks needs to take
into account the underlying mechanism that could lead to transfer.

Conclusions and Implications


While the popularized Mozart Effect rests on highly confounded evidence, the benefits of long-term musical training on EFs seem promising. Music may also have protective effects against age-related hearing loss: For instance, oscillatory neural activity in older adults entrains less flexibly to speech-paced rhythms, especially during focused attention (Henry, Herrmann, Kunke, & Obleser, 2017). While neural entrainment to speech is disrupted in older age, it may be that extended music lessons, which bolster speech perception at a younger age, protect against some of this deterioration later in life (White-Schwoch, Carr, Anderson, Strait, & Kraus, 2013).
further understanding the influences of musical training on executive
function is crucial, as our ability to flexibly manipulate mental information
is not only necessary for successful functioning in everyday life, but also
has implications throughout our lifetime.
Music and Creativity

While executive function pertains to the ability of the cognitive system to work with conflicting constraints, creativity pertains to relatively
unconstrained thought processes. Thinking “outside of the box” is a
foremost marvel of the human mind. The ability to be creative, or to
produce output that is at once novel and unexpected, yet useful and
appropriate, requires some domain-specific knowledge (Csikszentmihalyi,
1996; Sternberg & Lubart, 1999; Sternberg et al., 2005). While the exact
mechanisms contributing to the creative processes are still unknown, there
is evidence that creativity relies on real-time contributions of multiple
constituent mental processes (Goldenberg, Mazursky, & Solomon, 1999).
These mental processes involve selective attention and stream segregation,
long-term and autobiographical as well as working memory, idea generation
and evaluation, and expectation and prediction, as well as the ability to
switch between these processes. Creativity, then, incorporates some of the
fundamental EFs, such as attention and mental flexibility. Creativity does
differ from other components of executive function, however, in its form of
thought. While executive function entails the ability to engage in
deliberation and strongly constrained thinking, creative thinking has fewer
deliberate constraints (Christoff, Irving, Fox, Spreng, & Andrews-Hanna,
2016). And it is due to this relatively unconstrained nature that the study of
creativity has been more elusive and imprecise. In a creativity task there is
no single correct answer, yet there are more and less creative answers.
The standard definition of creativity is bipartite: for a work to be
considered creative, it has to be both novel and useful/appropriate (Runco
& Jaeger, 2012). Historical and empirical musicologists have long been
interested in finding novelty in pieces of music relative to their context.
This is important both for better understanding of existing works, and for
the possibility of generating novel works (Collins, 2016). In contrast, the
usefulness of music is difficult, if not impossible, to define. Most might
expect that for artistic domains including music, the concept of usefulness
in music opens up more questions than it answers, and is therefore not a
good definition at all. Appropriateness is easier to define as being within the
stylistic or genre-based context, for example, sonata form, variations on a
theme, or classical versus jazz versus experimental music improvisation. To
be considered appropriate, one has to stay primarily within an expected
genre, or within the style. In that regard, creativity in music must be
considered within its historical and stylistic context. This dependence on the
environment applies to creativity more generally, which must be considered
relative to the domain, the field, and the creator (Csikszentmihalyi, 1996).

Musical Improvisation as a Model of Creativity


Psychological studies on creativity and music have considered creativity as
a set of cognitive functions. The study of musical improvisation offers a
window into creativity, which is predicated upon novel combinations of
existing skills (Limb & Braun, 2008). A systematic literature review of the neuroscience of musical improvisation shows shared neural networks between musical improvisation and other forms of creativity, such as artistic or scientific creativity; generally, a network of prefrontal regions is involved in musical improvisation as well as in other forms of creativity (Beaty, 2015). At the same time, there are also some differences between
music, artistic, and scientific creativity (e.g., insight problems). As shown
in a meta-analysis of fMRI studies on creativity (Boccia, Piccardi, Palermo,
Nori, & Palmiero, 2015), musical creativity often involves auditory-motor
networks, such as the supplementary motor areas, in addition to other
prefrontal regions that are consistently active in creativity studies.
Improvisation training is fundamentally cognitive training (Biasutti,
2015). Teaching improvisation in the classroom can not only increase
creativity among students (Norgaard, 2017), it may also inform cognitive
theories of creativity and improvisation (Norgaard, Spencer, & Montiel,
2013). A critical review of PET, fMRI, and EEG studies on creativity
showed that although there is some convergence on the importance of the
prefrontal cortex, there are nevertheless many holes in the literature that
would benefit from further investigation (Sawyer, 2011). A systematic understanding of musical improvisation, one that combines methods from music information retrieval, psychophysics and psychometrics, and cognitive neuroscience, will be useful for a thorough understanding of what creativity means and how to foster it in pedagogy.
Neuroimaging Studies of Music and Creativity
(For a detailed overview of neuroimaging studies on improvisation, see
Chapter 20).
With the advent of fMRI and the engineering of MR-compatible musical
instruments (Hollinger, Steele, Penhune, Zatorre, & Wanderley, 2007), it
became possible to observe functional correlates of human brain activity
during jazz improvisation, comparing it to the closest possible non-improvised control condition. The first fMRI study on jazz improvisation compared improvised
versus overlearned conditions in novel melodies and musical scales (Limb
& Braun, 2008). Results showed many loci of activations, with a general
trend of more activity in mesial regions during improvisation, especially in
the prefrontal cortex. Another fMRI study looked at piano improvisation as
an auditory-motor sequencing problem (Bengtsson, Csikszentmihalyi, &
Ullén, 2007). This study also compared the task of improvisation against
the task of reproducing a previously created improvisation from memory.
The most significant difference in brain activity between improvisation and reproduction conditions was found in the pre-supplementary motor area (pre-SMA); however, the improvisation condition also showed higher
activity in dorsolateral prefrontal cortex and dorsal premotor cortex.
Together, results are consistent with Limb and Braun (2008) in identifying a
network of interacting prefrontal areas active during improvisation.
Similarly, another fMRI study on musical improvisation (Berkowitz &
Ansari, 2008) tested similar experimental and control conditions of
improvisation versus reproduction, but with the additional comparison
between rhythmic and melodic improvisation and control conditions.
Results showed more activations as well as deactivations for melodic
improvisation relative to rhythmic improvisation, with effects centered
around motor planning regions in the frontal lobe, specifically the premotor
cortex.
Freestyle rap is another form of musical creativity that involves heavy
use of rhythmic improvisation as opposed to melodic improvisation. One
fMRI study compared brain activity during spontaneous freestyle rap to
conventional rehearsed performance (Liu et al., 2012). During the freestyle
condition, rap artists showed an upregulation of mesial regions (presumably
important for idea generation and/or self-referential processes) and downregulation of lateral regions associated with rule-based processing. The mesial
regions are part of a larger group of regions that are intrinsically correlated
in their activity, together known as the Default Network (Fox & Raichle,
2007). In contrast, the lateral regions, such as the dorsolateral prefrontal
cortex, are part of a larger network consistently active during executive
functions (Executive Control Network) (Shirer, Ryali, Rykhlevskaia,
Menon, & Greicius, 2012).
Although most studies in musical creativity have shown improvisation-
related activity in prefrontal regions (including the medial prefrontal cortex,
the dorsolateral prefrontal cortex, the cingulate cortex, and the pre-SMA),
other studies have observed activity in the classic language and emotion
networks. One study showed activity in the inferior frontal gyrus, also
known as Broca’s area, while jazz musicians were interacting by “trading
fours” (Donnay, Rankin, Lopez-Gonzalez, Jiradejvong, & Limb, 2014) and
improvising to communicate a specific positive or negative emotional intent
(McPherson, Barrett, Lopez-Gonzalez, Jiradejvong, & Limb, 2016). Broca’s
area is also the known neural generator of the ERAN, an
electrophysiological marker for the processing of musically unexpected
events (Maess, Koelsch, Gunter, & Friederici, 2001), and recent work has
shown a larger ERAN in jazz improvising musicians, suggesting increased
involvement of Broca’s area following improvisation training (Przysinda,
Zeng, Maves, Arkin, & Loui, 2017). Functional connectivity results from fMRI also showed that the duration of improvisation experience was negatively correlated with connectivity among fronto-parietal areas in the executive control network, but positively correlated with functional connectivity between areas within the auditory-motor network (Pinho, De Manzano, Fransson, Eriksson, & Ullén, 2014).
areas important for auditory-motor functions, including the language
network, are as intrinsic to musical creativity as the aforementioned default
and executive control networks.
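The quantification behind such findings can be sketched as follows. Functional connectivity between two regions is commonly computed as the Pearson correlation of their BOLD time series, which can then be correlated with a behavioral variable across participants. The data, region names, and built-in experience effect below are purely illustrative.

```python
# Illustrative sketch (simulated data) of a brain-behavior analysis in
# the spirit of Pinho et al. (2014): per-subject connectivity between
# two regions, then a group-level correlation with experience.
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_timepoints = 30, 200

experience = rng.uniform(0, 20, n_subjects)  # years of improvisation

connectivity = np.empty(n_subjects)
for i in range(n_subjects):
    # A shared signal that grows with experience yields higher
    # premotor-auditory connectivity for more experienced subjects.
    shared = rng.normal(0, 1, n_timepoints) * (experience[i] / 20)
    premotor = shared + rng.normal(0, 1, n_timepoints)
    auditory = shared + rng.normal(0, 1, n_timepoints)
    connectivity[i] = np.corrcoef(premotor, auditory)[0, 1]

# Group-level brain-behavior correlation.
r = np.corrcoef(experience, connectivity)[0, 1]
print(f"experience-connectivity correlation: r = {r:.2f}")
```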

Data-Driven Correlates of Creativity


While the literature has generally defined creativity as the tendency to
produce novel and appropriate output, the determination of creativity in the
output has generally required the consensual assessment of multiple raters
(Amabile, 1982), a relatively time-consuming technique that can be
sensitive to bias on the part of the raters. With recent advances in musical
information retrieval, it may be fruitful to relate the definition of creativity
to information that can be gleaned from the creative output itself. Since
people who are more creative tend to produce more fluent, original, and
flexible output (Silvia, Beaty, & Nusbaum, 2013), it may be useful to
operationally define creativity as the fluent production of high information content. Information theory offers many possible measures, most notably entropy, first defined by Shannon (1948) and subsequently used in neuroscience (Friston, 2010) and in music cognition (Hansen & Pearce, 2014). Tools such as the music information retrieval toolbox (Lartillot & Toiviainen, 2007) now provide relatively data-driven measures of musical information content such as entropy, as well as harmonic movement, spectral centroid change, and onset detection.
Applying these types of information retrieval techniques to musical
performances may yield useful information about the player’s creativity.
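As a concrete illustration of entropy as an index of information content, the following minimal sketch computes the Shannon entropy of a melody's pitch-class distribution. The note sequences are toy examples; real analyses would draw on richer features of the kind provided by toolboxes such as MIRtoolbox.

```python
# Minimal sketch of entropy as an index of information content in a
# melody (Shannon, 1948): estimate a pitch-class distribution from the
# note sequence and compute H = -sum(p * log2 p). Toy data only.
import numpy as np
from collections import Counter

def pitch_entropy(midi_notes):
    """Shannon entropy (bits) of the pitch-class distribution."""
    classes = [n % 12 for n in midi_notes]
    counts = np.array(list(Counter(classes).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

repetitive = [60, 62, 60, 62, 60, 62, 60, 62]  # two pitch classes
varied = [60, 63, 67, 70, 58, 65, 61, 66]      # eight pitch classes

print(f"repetitive line: {pitch_entropy(repetitive):.2f} bits")
print(f"varied line:     {pitch_entropy(varied):.2f} bits")
```

A more varied improvisation spreads probability over more pitch classes and therefore yields higher entropy, matching the intuition that fluent, original output carries more information.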
A new and potentially fruitful approach comes from relating entropy
from musical production to brain structure to reveal brain–behavior
correlations, an approach beginning to be adopted in recent studies (Arkin,
Przysinda, Pfeifer, Zeng, & Loui, 2019; Zeng, Przysinda, Pfeifer, Arkin, &
Loui, 2018). As data-driven approaches become increasingly sophisticated, it becomes even more important to relate studies of music and of the brain through unifying approaches to data that might inform both fields. We can move toward biomarkers of creativity by rigorously defining outcome measures and relating them to brain data. In this way, music offers a promising avenue for conceptualizing creativity.

Personality and Cognitive Profiles of Creative Musicians


Examining personality and cognitive profiles of creative musicians has also
lent interesting insight into the neuropsychological study of creativity. Jazz
musicians tend to be more creative, as measured by the Divergent Thinking
Test (Benedek, Borovnjak, Neubauer, & Kruse-Weber, 2014). These differences are not specific to the musical domain, but generalize to domain-general indicators of divergent thinking. Kleinmintz and colleagues (Kleinmintz, Goldstein, Mayseless, Abecasis, & Shamay-Tsoory, 2014) also showed higher divergent thinking scores and alternative uses task performance in improvising musicians, an effect mediated by idea evaluation. Furthermore, Przysinda and colleagues (2017) showed higher scores on the divergent thinking task among jazz musicians.
In terms of personality measures, Benedek and colleagues (2014)
showed different personality profiles in jazz and improvisational musicians.
Specifically, jazz and improv musicians are more open to experience, as are
jazz listeners (Rentfrow & Gosling, 2003). This is consistent with the
creativity literature in general: there is a consistent statistical association
between creativity and openness to experience (McCrae, 1987). Although
this association is well replicated, the direction of causality is unknown.
Perhaps being open to experience makes you more creative; perhaps being
creative makes you more open to experience, or perhaps both are due to
some other variable(s). Hopefully the neurocognitive knowledge of
creativity will inform better music making in performance and in the
classroom (Biasutti, 2015), while improving understanding of how musical
knowledge might transfer to extra-musical outcomes in other areas of
cognition.

Following the definition of music as organized sound that we adopted at the beginning of this chapter, we have now seen that organized sounds are generated
in many situations that are barely musical, if at all. For example,
experimental stimuli in an auditory research study are intentionally
organized sounds that vary in their musicality. The extent to which these
intentional sounds become perceived as music may involve our attention
toward its context and the many elements of the musical surface. The
literature we reviewed also shows that fundamentally, the human mind is
“an anticipator, an expectation-generator” (Dennett, 2008). As expectation
shapes all that we experience, how we perceive music also depends on our
expectation. Music interfaces with many aspects of cognition: from
attention, which is linked to stimulus processing and selection, to creativity,
which involves generating new stimuli as well as reacting to them. At
another level, music requires and influences executive function, the collection of our brain's central executive processes that we must deploy to interact with music.
Open questions pertain to the intersection of these three sections: Does
better executive function give rise to better creativity? Or are the two
constructs inversely related? How does attention to specific elements of the
musical surface enable or enhance creativity? Understanding these
seemingly disparate aspects of cognitive function as interrelated can drive
the formulation of new and interesting research questions, which might
inform our understanding of music as well as cognitive science more
generally.

References
Amabile, T. M. (1982). Social psychology of creativity: A consensual assessment technique. Journal
of Personality and Social Psychology 43(5), 997–1013.
Arkin, C., Przysinda, E., Pfeifer, C., Zeng, T., & Loui, P. (2017). Information content predicts
creativity in musical improvisation: A behavioral and voxel-based morphometry study. Under
review.
Arkin, C., Przysinda, E., Pfeifer, C., Zeng, T., & Loui, P. (2019). Grey matter correlates of creativity
in musical improvisation. Under review.
Beaty, R. E. (2015). The neuroscience of musical improvisation. Neuroscience & Biobehavioral
Reviews 51, 108–117.
Benedek, M., Borovnjak, B., Neubauer, A. C., & Kruse-Weber, S. (2014). Creativity and personality
in classical, jazz and folk musicians. Personality and Individual Differences 63, 117–121.
Bengtsson, S. L., Csikszentmihalyi, M., & Ullén, F. (2007). Cortical regions involved in the
generation of musical structures during improvisation in pianists. Journal of Cognitive
Neuroscience 19, 830–842.
Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: The neural correlates
of musical improvisation. NeuroImage 41(2), 535–543.
Bharucha, J. J., & Stoeckig, K. (1986). Reaction time and musical expectancy: Priming of chords.
Journal of Experimental Psychology: Human Perception and Performance 12(4), 403–410.
Biasutti, M. (2015). Pedagogical applications of the cognitive research on music improvisation.
Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.00614
Bigand, E., Poulin, B., Tillmann, B., Madurell, F., & D’Adamo, D. A. (2003). Sensory versus
cognitive components in harmonic priming. Journal of Experimental Psychology: Human
Perception and Performance 29(1), 159–171.
Boccia, M., Piccardi, L., Palermo, L., Nori, R., & Palmiero, M. (2015). Where do bright ideas occur
in our brain? Meta-analytic evidence from neuroimaging studies of domain-specific creativity.
Frontiers in Psychology 6, 1195. Retrieved from https://doi.org/10.3389/fpsyg.2015.01195
Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge,
MA: MIT Press.
Broadbent, D. E. (1982). Task combination and selective intake of information. Acta Psychologica
50(3), 253–290.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears.
Journal of the Acoustical Society of America 25(5), 975–979.
Christoff, K., Irving, Z. C., Fox, K. C., Spreng, R. N., & Andrews-Hanna, J. R. (2016). Mind-
wandering as spontaneous thought: A dynamic framework. Nature Reviews Neuroscience 17(11),
718–731.
Collins, D. (2016). The act of musical composition: Studies in the creative process. New York:
Routledge.
Csikszentmihalyi, M. (1996). Creativity: Flow and the psychology of discovery and invention. New
York: HarperCollins.
De Freitas, J., Liverence, B. M., & Scholl, B. J. (2014). Attentional rhythm: A temporal analogue of
object-based attention. Journal of Experimental Psychology: General 143(1), 71–76.
Dennett, D. C. (2008). Kinds of minds: Toward an understanding of consciousness. New York: Basic
Books.
Deouell, L. Y., Deutsch, D., Scabini, D., Soroker, N., & Knight, R. T. (2007). No disillusions in
auditory extinction: Perceiving a melody comprised of unperceived notes. Frontiers in Human
Neuroscience 1, 15. Retrieved from https://doi.org/10.3389/neuro.09.015.2007
Deutsch, D. (1974). An illusion with musical scales. Journal of the Acoustical Society of America
56(S1). Retrieved from https://doi.org/10.1121/1.1914084.
Diamond, A. (2013). Executive functions. Annual Review of Psychology 64, 135–168.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in
speech and music. Neuroscience & Biobehavioral Reviews 81(Part B), 181–187.
Donnay, G. F., Rankin, S. K., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2014). Neural
substrates of interactive musical improvisation: An fMRI study of “trading fours” in jazz. PLoS
ONE 9, e88665.
Ellis, R. J., Bruijn, B., Norton, A. C., Winner, E., & Schlaug, G. (2013). Training-mediated leftward
asymmetries during music processing: A cross-sectional and longitudinal fMRI analysis.
NeuroImage 75, 97–107.
Escoffier, N., & Tillmann, B. (2008). The tonal function of a task-irrelevant chord modulates speed
of visual processing. Cognition 107(3), 1070–1083.
Fox, M. D., & Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with
functional magnetic resonance imaging. Nature Reviews Neuroscience 8(9), 700–711.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience
11, 127–138.
Fujioka, T., Ross, B., & Trainor, L. (2015). Beta-band oscillations represent auditory beat and its
metrical hierarchy in perception and imagery. Journal of Neuroscience 35(45), 15187–15198.
Goldenberg, J., Mazursky, D., & Solomon, S. (1999). Creative sparks. Science 285(5433), 1495–
1496.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental
Brain Research 220(1), 51–61.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of
Cognitive Neuroscience 19(5), 893–906.
Gromko, J. E., & Poorman, A. S. (1998). The effect of music training on preschoolers’ spatial-
temporal task performance. Journal of Research in Music Education 46(2), 173–181.
Habibi, A., Cahn, B. R., Damasio, A., & Damasio, H. (2016). Neural correlates of accelerated
auditory processing in children engaged in music training. Developmental Cognitive Neuroscience
21, 1–14.
Hafter, E. R., & Saberi, K. (2001). A level of stimulus representation model for auditory detection
and attention. Journal of the Acoustical Society of America 110, 1489. Retrieved from
https://doi.org/10.1121/1.1394220
Hafter, E. R., Sarampalis, A., & Loui, P. (2008). Auditory attention and filters. In W. Yost (Ed.),
Auditory perception of sound sources (pp. 115–142). Dordrecht: Springer.
Hafter, E. R., Schlauch, R. S., & Tang, J. (1993). Attending to auditory filters that were not
stimulated directly. Journal of the Acoustical Society of America 94, 743–747. Retrieved from
https://doi.org/10.1121/1.408203
Hansen, N. C., & Pearce, M. T. (2014). Predictive uncertainty in auditory sequence processing.
Frontiers in Psychology 5, 1052. Retrieved from https://doi.org/10.3389/fpsyg.2014.01052
Henry, M. J., Herrmann, B., Kunke, D., & Obleser, J. (2017). Aging affects the balance of neural
entrainment and top-down neural modulation in the listening brain. Nature Communications 8,
15801. doi:10.1038/ncomms15801
Henry, M. J., Herrmann, B., & Obleser, J. (2015). Selective attention to temporal features on nested
time scales. Cerebral Cortex 25(2), 450–459.
Hollinger, A., Steele, C., Penhune, V., Zatorre, R., & Wanderley, M. (2007). fMRI-compatible
electronic controllers. In Proceedings of the 7th international conference on New Interfaces for
Musical Expression (pp. 246–249). New York: ACM. doi:10.1145/1279740.1279790
James, T., Przysinda, E., Sampaio, G., Woods, K. J. P., Hewett, A., Morillon, B., & Loui, P. (2017).
Acoustic effects on oscillatory markers of sustained attention. Presentation at the International
Conference on Auditory Cortex. Banff, Canada.
James, W. (1890). The principles of psychology. New York: Henry Holt.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and
memory. Psychological Review 83(5), 323–355.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review
96(3), 459–491.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-
driven attending in dynamic arrays. Psychological Science 13(4), 313–319.
Justus, T., & List, A. (2005). Auditory attention to frequency and time: An analogy to visual local-
global stimuli. Cognition 98(1), 31–51.
Kleinmintz, O. M., Goldstein, P., Mayseless, N., Abecasis, D., & Shamay-Tsoory, S. G. (2014).
Expertise in musical improvisation and creativity: The mediation of idea evaluation. PLoS ONE 9,
e101568.
Koelsch, S., Gunter, T. C., Friederici, A. D., & Schröger, E. (2000). Brain indices of music
processing: Nonmusicians are musical. Journal of Cognitive Neuroscience 12(3), 520–541.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.
Nature Reviews Neuroscience 11, 599–605.
Lamb, S. J., & Gregory, A. H. (1993). The relationship between music and reading in beginning
readers. Educational Psychology 13(1), 19–27.
Lartillot, O., & Toiviainen, P. (2007). A Matlab toolbox for musical feature extraction from audio. In
Proceedings of the 10th International Conference on Digital Audio Effects (pp. 237–244).
Bordeaux, France. Retrieved from http://dafx.labri.fr/main/papers/p237.pdf
Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An fMRI
study of jazz improvisation. PLoS ONE 3, e1679.
List, A., Justus, T., Robertson, L. C., & Bentin, S. (2007). A mismatch negativity study of local–
global auditory processing. Brain Research 1153, 122–133.
Liu, S., Chow, H. M., Xu, Y., Erkkinen, M. G., Swett, K. E., Eagle, M. W., … Braun, A. R. (2012).
Neural correlates of lyrical improvisation: An fMRI study of freestyle rap. Scientific Reports 2,
834. doi:10.1038/srep00834
Lolli, S., Lewenstein, A. D., Basurto, J., Winnik, S., & Loui, P. (2015). Sound frequency affects
speech emotion perception: Results from congenital amusia. Frontiers in Psychology 6. Retrieved
from https://doi.org/10.3389/fpsyg.2015.01340
Longuet-Higgins, H. C., & Lee, C. S. (1982). The perception of musical rhythms. Perception 11(2),
115–128.
Loui, P., Grent-’T-Jong, T., Torpey, D., & Woldorff, M. (2005). Effects of attention on the neural
processing of harmonic syntax in Western music. Cognitive Brain Research 25(3), 678–687.
Loui, P., Kroog, K., Zuk, J., Winner, E., & Schlaug, G. (2011). Relating pitch awareness to phonemic
awareness in children: Implications for tone-deafness and dyslexia. Frontiers in Psychology 2, 111.
Retrieved from https://doi.org/10.3389/fpsyg.2011.00111
Loui, P., & Wessel, D. (2007). Harmonic expectation and affect in Western music: Effects of attention
and training. Perception & Psychophysics 69(7), 1084–1092.
McCrae, R. R. (1987). Creativity, divergent thinking, and openness to experience. Journal of
Personality and Social Psychology 52(6), 1258–1265.
McPherson, M. J., Barrett, F. S., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2016).
Emotional intent modulates the neural substrates of creativity: An fMRI study of emotionally
targeted improvisation in jazz musicians. Scientific Reports 6, 18460. doi:10.1038/srep18460
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in
Broca’s area: An MEG study. Nature Neuroscience 4, 540–545.
Marmel, F., Tillmann, B., & Dowling, W. J. (2008). Tonal expectations influence pitch perception.
Perception & Psychophysics 70(5), 841–852.
Mehr, S. A., Schachner, A., Katz, R. C., & Spelke, E. S. (2013). Two randomized trials provide no
consistent evidence for nonmusical cognitive benefits of brief preschool music enrichment. PLoS ONE 8(12), e82007.
Millett, D. (2001). Hans Berger: From psychic energy to the EEG. Perspectives in Biology and
Medicine 44(4), 522–542.
Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Short-
term music training enhances verbal intelligence and executive function. Psychological Science
22(11), 1425–1433.
Morillon, B., & Baillet, S. (2017). Motor origin of temporal predictions in auditory attention.
Proceedings of the National Academy of Sciences 114(42), E8913–E8921.
Norgaard, M. (2017). Developing musical creativity through improvisation in the large performance
classroom. Music Educators Journal 103(3), 34–39.
Norgaard, M., Spencer, J., & Montiel, M. (2013). Testing cognitive theories by creating a pattern-
based probabilistic algorithm for melody and rhythm in jazz improvisation. Psychomusicology:
Music, Mind, and Brain 23(4), 243–254.
Nozaradan, S., Zerouali, Y., Peretz, I., & Mouraux, A. (2013). Capturing with EEG the neural
entrainment and coupling underlying sensorimotor synchronization to the beat. Cerebral Cortex
25(3), 736–747.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience 6, 674–681.
Patel, A. D. (2011a). Why would musical training benefit the neural encoding of speech? The
OPERA hypothesis. Frontiers in Psychology 2, 142. Retrieved from
https://doi.org/10.3389/fpsyg.2011.00142
Patel, A. D. (2011b). Why does musical training benefit the neural encoding of speech? A new
hypothesis. Journal of the Acoustical Society of America 130, 2398. Retrieved from
https://doi.org/10.1121/1.3654612
Pinho, A. L., De Manzano, O., Fransson, P., Eriksson, H., & Ullén, F. (2014). Connecting to create:
Expertise in musical improvisation is associated with increased functional connectivity between
premotor and prefrontal areas. Journal of Neuroscience 34(18), 6156–6163.
Povel, D.-J., & Essens, P. (1985). Perception of temporal patterns. Music Perception: An
Interdisciplinary Journal 2(4), 411–440.
Przysinda, E., Zeng, T., Maves, K., Arkin, C., & Loui, P. (2017). Jazz musicians reveal role of
expectancy in human creativity. Brain and Cognition 119, 45–53.
Purves, D., Cabeza, R., Huettel, S. A., Labar, K. S., Platt, M. L., Woldorff, M. G., & Brannon, E. M.
(2008). Cognitive neuroscience. Sunderland: Sinauer Associates.
Rauscher, F. H., Shaw, G. L., & Ky, C. N. (1993). Music and spatial task performance. Nature
365(6447), 611.
Rentfrow, P. J., & Gosling, S. D. (2003). The do re mi’s of everyday life: The structure and
personality correlates of music preferences. Journal of Personality and Social Psychology 84(6),
1236–1256.
Runco, M. A., & Jaeger, G. J. (2012). The standard definition of creativity. Creativity Research
Journal 24(1), 92–96.
Sala, G., & Gobet, F. (2017a). Does far transfer exist? Negative evidence from chess, music, and
working memory training. Current Directions in Psychological Science 26(6), 515–520.
Sala, G., & Gobet, F. (2017b). When the music’s over: Does music skill transfer to children’s and
young adolescents’ cognitive and academic skills? A meta-analysis. Educational Research Review
20, 55–67.
Sawyer, K. (2011). The cognitive neuroscience of creativity: A critical review. Creativity Research
Journal 23(2), 137–154.
Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science 15(8), 511–514.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal
27(3), 379–423.
Shinn-Cunningham, B. G. (2008). Object-based auditory and visual attention. Trends in Cognitive
Sciences 12(5), 182–186.
Shirer, W. R., Ryali, S., Rykhlevskaia, E., Menon, V., & Greicius, M. D. (2012). Decoding subject-
driven cognitive states with whole-brain connectivity patterns. Cerebral Cortex 22(1), 158–165.
Silvia, P. J., Beaty, R. E., & Nusbaum, E. C. (2013). Verbal fluency and creativity: General and
specific contributions of broad retrieval ability (Gr) factors to divergent thinking. Intelligence
41(5), 328–340.
Slevc, L. R., Davey, N. S., Buschkuehl, M., & Jaeggi, S. M. (2016). Tuning the mind: Exploring the
connections between musical ability and executive functions. Cognition 152, 199–211.
Slevc, L. R., & Okada, B. M. (2015). Processing structure in language and music: A case for shared
reliance on cognitive control. Psychonomic Bulletin & Review 22(3), 637–652.
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced
reading time evidence for shared processing of linguistic and musical syntax. Psychonomic
Bulletin & Review 16(2), 374–381.
Sternberg, R. J., & Lubart, T. (1999). The concept of creativity: Prospects and paradigms. In R.
Sternberg (Ed.), Handbook of creativity 1 (pp. 3–15). Cambridge: Cambridge University Press.
Sternberg, R. J., Lubart, T. I., Kaufman, J. C., & Pretz, J. E. (2005). Creativity. In K. Holyoak & R.
G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 351–370).
Cambridge: Cambridge University Press.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive
Psychology 12(1), 97–136.
Varèse, E., & Wen-Chung, C. (1966). The liberation of sound. Perspectives of New Music 5(1), 11–
19.
White-Schwoch, T., Carr, K. W., Anderson, S., Strait, D. L., & Kraus, N. (2013). Older adults benefit
from music training early in life: Biological evidence for long-term training-driven plasticity.
Journal of Neuroscience 33(45), 17667–17674.
Woldorff, M. G., Gallen, C. C., Hampson, S. A., Hillyard, S. A., Pantev, C., Sobel, D., & Bloom, F.
E. (1993). Modulation of early sensory processing in human auditory cortex during auditory
selective attention. Proceedings of the National Academy of Sciences 90(18), 8722–8726.
Woldorff, M. G., & Hillyard, S. A. (1991). Modulation of early auditory processing during selective
listening to rapidly presented tones. Electroencephalography and Clinical Neurophysiology 79(3),
170–191.
Woldorff, M. G., Hillyard, S. A., Gallen, C. C., Hampson, S. R., & Bloom, F. E. (1998).
Magnetoencephalographic recordings demonstrate attentional modulation of mismatch-related
neural activity in human auditory cortex. Psychophysiology 35(3), 283–292.
Zeng, T., Przysinda, E., Pfeifer, C., Arkin, C., & Loui, P. (2017). Structural connectivity predicts
success in musical improvisation. Under review.
Zeng, T., Przysinda, E., Pfeifer, C., Arkin, C., & Loui, P. (2018). White matter connectivity reflects
success in musical improvisation. bioRxiv.
Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive
functioning in musicians and non-musicians. PLoS ONE 9, e99868.
CHAPTER 13

NEURAL CORRELATES OF MUSIC AND EMOTION

PATRIK N. JUSLIN AND LAURA S. SAKKA

When it comes to explaining the universal attraction of music as a human phenomenon, few aspects loom larger than the emotional responses it
arouses. Music listeners may experience anything from startle reflexes and
changes in arousal to discrete emotions such as happiness, sadness, interest,
and nostalgia—as well as profound aesthetic emotions (Juslin, 2019). Such
experiences are the “driving force” behind most people’s engagement with
music, and might have far-reaching implications for their well-being and
health (e.g., MacDonald, Kreutz, & Mitchell, 2012; Thaut & Wheeler,
2010).
When systematic studies of music and emotion finally took off, around
the millennium (Juslin & Sloboda, 2001), it was inevitable that
neuropsychological research would play a role in that trend. While imaging
studies could constrain psychological theorizing, psychological theories
could guide imaging studies and help to organize their findings. Coinciding
with a reappraisal of the role of emotion in human behavior in the
neurosciences (Damasio, 1994), the end of the 1990s saw the first brain
imaging studies focusing on emotions in music (Blood, Zatorre, Bermudez,
& Evans, 1999).
Mapping the neural correlates of emotional responses to music turned
out to be more difficult than initially expected, however. Even such a
seemingly delimited domain as emotion appears to involve a wide range of
subcortical and cortical areas, distributed across the brain (Koelsch, 2014);
and unfortunately, the relevant brain regions do not come in neat little
packages that can be interpreted easily by researchers. Hence, accounting for the neural correlates of musical emotions may turn out to be one of the great challenges in the neuroscience of music.
The goal of this chapter is to offer a theoretical and empirical review of
studies of the neural correlates of emotional responses to music, carried out
over the last thirty-five years. The remainder of the chapter is structured as
follows: First, we provide basic definitions and distinctions of the field of
musical affect. Second, we present a theoretical framework, which could
serve to organize the field. Third, we review seventy-eight empirical
studies, published between 1982 and 2016. We distinguish different
empirical approaches in these studies and draw general conclusions based
on their results. Finally, we consider the implications of these findings and
offer some methodological recommendations for future studies.

Musical Affect: Definitions and Distinctions

Emotions belong to the field of affect, which covers a range of phenomena. The common and defining feature is valence (i.e., the evaluation of an
object, person, or event as being positive or negative). Most researchers
also require a certain degree of arousal, in order to distinguish affect from
purely cognitive judgments. Accordingly, musical affect could comprise
anything from preference (e.g., liking a piece) and mood (a mild, objectless,
and long-lasting affective state, e.g., feeling gloomy after hearing sad music
in the background all morning) to aesthetic judgment (e.g., rating a
composition as valuable as “art”). Most brain studies to date, however, have
arguably focused on emotions, as defined by Juslin (2011, p. 114):
Emotions are relatively brief, intense, and rapidly changing reactions to potentially
important events (subjective challenges or opportunities) in the external or internal
environment—often of a social nature—which involve a number of subcomponents
(cognitive changes, subjective feelings, expressive behavior, and action tendencies) that are
more or less ‘synchronized’ during an emotional episode.

Changes in the intensity, quality, and complexity of an emotion could occur from moment to moment, and such changes can be captured in terms of
shifts along such emotion dimensions as arousal and valence (Russell,
1980). However, emotions may also be analyzed in terms of qualitatively
distinct categories (e.g., joy, sadness, awe, nostalgia), which remain
throughout an episode (Izard, 1977). Both categorical and dimensional
approaches receive some support in empirical studies (e.g., Harmon-Jones,
Harmon-Jones, & Summerell, 2017), though we agree with Zentner’s
(2010) view that dimensional models are ultimately unable to do justice to
the richness or specificity of emotional responses to music.
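For concreteness, the dimensional view can be rendered in a few lines of code: a momentary emotional state is represented as a point in valence-arousal space, and its change over a listening episode as a trajectory of such points. The quadrant labels below are illustrative glosses, not a validated mapping from dimensions to categories.

```python
# Tiny sketch of the dimensional view (Russell, 1980): a momentary
# emotional state as (valence, arousal) coordinates, loosely mapped
# onto a circumplex quadrant. Labels are illustrative only.
def quadrant(valence, arousal):
    if valence >= 0 and arousal >= 0:
        return "positive-activated (e.g., joy, excitement)"
    if valence >= 0:
        return "positive-deactivated (e.g., calm, contentment)"
    if arousal >= 0:
        return "negative-activated (e.g., fear, anger)"
    return "negative-deactivated (e.g., sadness, gloom)"

# Hypothetical moment-to-moment ratings on -1..1 scales while listening.
trajectory = [(0.6, 0.7), (0.2, -0.3), (-0.5, -0.6)]
for v, a in trajectory:
    print(f"valence={v:+.1f}, arousal={a:+.1f} -> {quadrant(v, a)}")
```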
Most researchers in the domain seem to agree that music can influence
emotions (for reviews, see Juslin & Sloboda, 2010), so the primary aim of
current research is rather to understand the nature of this process—how it
“works.” In the following section, we describe a framework that can serve
to organize and guide research.
First, we need to make a distinction between perception and induction of
emotions: We may simply perceive (or recognize) an emotion expressed in
the music or we may actually feel an emotion in ourselves. The distinction
is important, because different psychological processes—and hence
different neural substrates—may be involved, depending on the type of
process.
Whenever practically feasible, it is advisable to measure multiple emotion components (self-reported feeling, expressive behavior, psychophysiology)
in order to draw more valid conclusions about the occurrence of an aroused
emotion. (If researchers do not find a coherent response in multiple emotion
components, there is reason to suspect that “only” perception of emotion
has occurred.)

Psychological Mechanisms: A Theoretical Framework
To explain emotional responses to music, we need to uncover the
psychological mechanisms that produce perceived or induced emotion.
Broadly speaking, the mechanism refers to those causal processes through
which an outcome is brought into being. In the present context, this
involves a functional (i.e., psychological) description of what the brain is
“doing” in principle (e.g., retrieving a memory). Such a process description
at the psychological level must not be confused with the separate question
of where in the brain the process is implemented, or with the
phenomenological experience it seeks to explain (Dennett, 1987).
Several authors have proposed possible mechanisms underlying
perception and induction of emotions in music, typically involving one or a
few possibilities (see Berlyne, 1971; Clynes, 1977; Juslin, 2001; Langer,
1957; Meyer, 1956; Scherer & Zentner, 2001; Sloboda & Juslin, 2001).
Space limitations prevent us from reviewing previous work here, but a
parsimonious way to organize current theory is provided by the ICINAS-
BRECVEMAC framework, fully described in Juslin (2019) and briefly
summarized below.

Emotion Perception
The first part of the acronym ICINAS-BRECVEMAC stands for Iconic-
Intrinsic-Associative, and refers to three ways in which music carries
emotional meaning. Although the case can be made that emotion perception
is a more straightforward process than emotion induction, even perceived
emotions may need to be decomposed into different subprocesses.
Accordingly, based on the seminal distinction made by Dowling and
Harwood (1986), Juslin (2013b) proposes that there are three distinct
“layers” of musical expression of emotion. Each layer corresponds to a
specific type of coding of emotional meaning (see Fig. 1).
FIGURE 1. Multiple-layer conceptualization of musical expression of emotions.
Reproduced from Patrik N. Juslin, What does music express? Basic emotions and beyond,
Frontiers in Psychology: Emotion Science 4(596), Figure 2, doi: 10.3389/fpsyg.2013.00596
© 2013 Juslin. This work is licensed under the Creative Commons Attribution License (CC
BY 3.0). It is attributed to the author Patrik N. Juslin.

The core layer is based on iconically coded basic emotions. Icon refers
to how music carries emotional meaning based on a formal resemblance
between the music and other events that have an emotional tone (such as
emotional speech and gesture). This core layer may explain findings of
cross-modal parallels (Juslin & Laukka, 2003) and universal recognition of
basic emotions (i.e., sadness, happiness, anger, fear, and love/tenderness) in
both speech (Bryant & Barrett, 2008) and music (Fritz et al., 2009).
The core layer may be extended, qualified, and even modified by two
additional layers based on intrinsic and associative coding, respectively,
which enable listeners to perceive also more complex or ambiguous
emotions. The two additional layers are less cross-culturally invariant and
depend more on the context and the listener’s individual learning (Juslin,
2019). Intrinsic coding refers to how music carries meaning based on
syntactic relationships within the music itself, how one part of the music
may “refer” to another part of the music, thus contributing to shifting levels
of stability, tension, or arousal (“affective trajectories”; e.g., Spitzer, 2013).
Associative coding, finally, refers to how music carries emotional
meaning based on a more arbitrary association (e.g., temporal or spatial
contiguity); a piece of music can be perceived as expressive of an emotion
just because something in the music (e.g., a melodic theme) has been
repeatedly linked with other emotionally meaningful events in the past—
either through chance or by design (e.g., Wagner’s “Leitmotif” strategy; see
Dowling & Harwood, 1986).
To illustrate this further in a musical piece, the overall emotion category
or broad “emotional tone” (e.g., sadness) might be specified by iconically
coded features (e.g., slow tempo, minor mode, low and often falling pitch
contour, legato articulation); this basic emotion category is given
“expressive shape” by intrinsically coded features (e.g., local structural
features such as syncopations, dissonant intervals, and melodic
appoggiaturas), creating “tension” and “release,” which contribute to more
time-dependent and complex nuances of the same emotion category (e.g.,
sadness vs. hopelessness); to this we add the final and more personal layer
of expression (e.g., that the listener associates the piece with a particular
person, event, or physical location).
It appears plausible that the three sources of perceived emotions—which
might occur alone or in combination—involve partly different neural
correlates (Juslin, 2019).

Emotion Induction
Our main focus in this chapter will be on induced emotion, which appears
to be more complex in terms of its neural substrates. Here, a multi-
mechanism framework is clearly called for. The second part of the ICINAS-
BRECVEMAC acronym refers to nine psychological mechanisms for
induction of emotions (listed below), which may be activated by music (and
other stimuli).
An evolutionary perspective on human perception of sounds suggests
that the survival of our ancient ancestors depended on their ability to detect
patterns in sounds, derive meaning from them, and adjust their behavior
accordingly (Juslin, 2013a; cf. Hodges & Sebald, 2011). This behavioral
function can be achieved in a multitude of ways, reflecting the phylogenetic
origin of our emotions.
The human brain did not develop from scratch. It is the result of a long
evolutionary process, during which newer brain structures were gradually
imposed on older structures (Gärdenfors, 2003). Brain circuits are laid out
like the concentric layers of an onion, functional layer upon functional
layer. One consequence of this arrangement, which is the result of natural
selection rather than design, is that emotion can be evoked at multiple levels
of the brain (Juslin, 2019).
Hence, the first author of this chapter has postulated a set of induction
mechanisms involving (more or less) distinct brain networks, which have
developed gradually and in a specific order during evolution—from simple
reflexes to complex judgments. Different mechanisms rely on different
kinds of mental representation (e.g., associative, analogical,
sensorimotoric), which serve to guide future action. All mechanisms have
in common that they can be triggered by a “musical event” (broadly defined
as music, listener, and context). The mechanisms are:

• Brainstem reflex, a hard-wired attention response to subjectively “extreme” values of basic acoustic features, such as loudness, speed, and timbre (e.g., Davis, 1984); you may become startled and surprised by the loud beginning of a rock song during a live concert.
• Rhythmic entrainment, a gradual adjustment of an internal body rhythm, such as heart rate, towards an external rhythm in the music (e.g., Harrer & Harrer, 1977; see the sketch after this list); you may experience excitement when your heart rate gradually becomes synchronized with a captivating and slightly faster rhythm in a piece of techno music at a nightclub.
• Evaluative conditioning, a regular pairing of a piece of music and
other positive or negative stimuli leading to a conditioned association
(e.g., Blair & Shimp, 1992); you may feel happy when you happen to
hear a song that has repeatedly occurred in festive contexts
previously.
• Contagion, an internal “mimicry” of the perceived voice-like
emotional expression of the music (e.g., Juslin, 2001); you may
experience sadness when you hear a slow, quiet, low-pitched
performance of a classical piece on the cello, featuring much vibrato
and rubato.
• Visual imagery, inner images of an emotional character conjured up
by the listener through a metaphorical mapping of the musical
structure (Osborne, 1980); you may become relaxed when you
indulge in mental images of a landscape suggested by a piece of
“new-age” music.
• Episodic memory, a conscious recollection of a particular event from
the listener’s past that is “triggered” by the music (Baumgartner,
1992); you may experience nostalgia when a song evokes a vivid
personal memory from the specific time you met your current partner
in life.
• Musical expectancy, a response to the gradual unfolding of the
syntactical structure of the music, and its expected or unexpected
continuations (Meyer, 1956); you may feel anxious due to uncertainty
created by phrases without a clear tonal center in an “avant-garde”
piece.
• Aesthetic judgment, a subjective evaluation of the aesthetic value of
the music, based on an individual set of weighted criteria (Juslin,
2013a); you may take pleasure in the exceptional beauty of a Bach
composition, or may admire the exceptional skills of a great
performer.
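To make the entrainment mechanism slightly more concrete, the following is a minimal numerical sketch, assuming a standard phase-oscillator (Kuramoto-style) model of entrainment rather than anything proposed in this chapter; the frequencies and coupling strength are arbitrary illustrative values.

```python
import math

# Minimal phase-oscillator sketch of rhythmic entrainment (an illustrative
# assumption, not a model from this chapter): an internal rhythm with
# natural frequency f_int is gradually pulled toward an external musical
# rhythm f_ext through a Kuramoto-style coupling term.

def entrain(f_int=1.1, f_ext=1.4, coupling=2.5, dt=0.01, seconds=60.0):
    """Return the internal rhythm's instantaneous frequency (Hz) over time."""
    phase_int = phase_ext = 0.0
    freqs = []
    for _ in range(int(seconds / dt)):
        # The sine term nudges the internal phase toward the external phase;
        # locking occurs here because the coupling exceeds the frequency gap.
        velocity = 2 * math.pi * f_int + coupling * math.sin(phase_ext - phase_int)
        phase_int += velocity * dt
        phase_ext += 2 * math.pi * f_ext * dt
        freqs.append(velocity / (2 * math.pi))
    return freqs

rates = entrain()
print(f"start: {rates[0]:.2f} Hz, end: {rates[-1]:.2f} Hz")  # 1.10 Hz -> ~1.40 Hz
```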

In addition to these eight mechanisms, music can also arouse emotions through the default mechanism for induction of emotions: cognitive goal appraisal (Scherer, 1999). You may become annoyed when a neighbor plays music late at night, blocking your goal of going to sleep. Cognitive appraisal appears less important in musical settings, however (Juslin, Liljeström, Västfjäll, Barradas, & Silva, 2008). For further elaboration and predictions for each mechanism, see Juslin (2019).
One implication of the framework is that before one can understand an
emotion in any given situation, it is necessary to know which of these
mechanisms is in operation. This is because each mechanism has its own
process characteristics, in terms of information focus, key brain regions,
degree of cultural impact and learning, ontogenetic development, induced
emotions, induction speed, availability to consciousness, dependence on
musical structure, and so forth.
Armed with these theoretical principles of music and emotion, we are
ready to take a look at the empirical work carried out to date. Our review
will be restricted to studies that explicitly focus on musical affect.
(Aesthetic responses are reviewed in Chapter 15, this volume.)
Review of Empirical Studies

General Overview
In this section, we summarize seventy-eight neuropsychological studies,
published between 1982 and 2016 (see Appendix table). Studies have been
grouped with regard to methodology: PET/fMRI (38 studies, 49 percent),
EEG (22 studies, 28 percent), lesions (16 studies, 20 percent), and dichotic
listening (2 studies, 3 percent). They are described in terms of listeners,
musical stimuli, contrast/design, method, main findings, and type of affect
(e.g., measuring induced vs. perceived emotions; categories, dimensions,
preferences). The categorization concerning induced vs. perceived emotion
is not entirely straightforward, because brain studies do not always
distinguish the processes in the design. (Previous reviews of the field have tended to intermix studies focusing on induced and perceived emotion.)
Sample size varies depending on method—PET/fMRI (M = 16.31), EEG
(M = 32.00), lesions (M = 14.44), and dichotic listening (M = 18.00)—but
tends to be relatively small overall. Note that PET/fMRI and EEG studies
have focused mostly on induced emotion, whereas lesion and dichotic
listening studies have focused mostly on perceived emotion. Blood flow
studies have used mostly fMRI (as opposed to PET) and “real” (as opposed
to synthesized) music, and have mostly adopted dimensional (66 percent) as
opposed to discrete (34 percent) approaches to emotion. EEG studies have
also used mostly “real” music—but have adopted dimensional (34 percent)
and discrete (31 percent) approaches to a roughly equal degree. Lesion studies have (in contrast to other studies) used mainly synthesized music, and have mostly studied discrete emotions (75 percent) rather than dimensions (38 percent); note that the two approaches are not mutually exclusive, which is why these percentages sum to more than 100. Such differences between studies that use different
methods should clearly be kept in mind when interpreting the overall
results.
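To make these figures concrete, the study counts and per-method mean sample sizes quoted above can be aggregated into an overall mean; the short script below is our own tabulation, assuming the overall mean is the study-weighted average of the per-method means.

```python
# Aggregating the per-method figures reported above (our tabulation; the
# counts and mean sample sizes are taken directly from the review).
methods = {
    "PET/fMRI":           {"studies": 38, "mean_n": 16.31},
    "EEG":                {"studies": 22, "mean_n": 32.00},
    "lesions":            {"studies": 16, "mean_n": 14.44},
    "dichotic listening": {"studies":  2, "mean_n": 18.00},
}

total = sum(m["studies"] for m in methods.values())  # 78 studies in all
# Study-weighted overall mean sample size across the whole corpus.
overall_n = sum(m["studies"] * m["mean_n"] for m in methods.values()) / total

for name, m in methods.items():
    print(f"{name:<19} {m['studies']:>2} studies "
          f"({100 * m['studies'] / total:.1f}%), mean N = {m['mean_n']:.2f}")
print(f"total: {total} studies, overall mean N = {overall_n:.1f}")  # ~20.4
```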

Empirical Approaches
The “contrast” and “emotion” columns in the appendix table are suggestive
of the kind of empirical approach adopted in each study. Some early studies
tended to use an open-ended exploratory approach, which simply presents
listeners with supposedly “emotional” music, to see which regions might be
affected. Although such an approach was defensible in the early stages, it
makes it difficult to interpret the results (e.g., “It is not possible to
disentangle the different subcomponents of the activation due to limitations
of this experimental design,” Alfredson, Risberg, Hagberg, & Gustafson,
2004, p. 165). Thus, for instance, it may not be clear whether the study has
measured perceived or induced emotion, in the absence of control
conditions or converging measures.
We identify at least five possible approaches in the neuropsychological
study of emotions, which can serve different aims. These have been
adopted, implicitly or explicitly, in music studies to highly varying degrees.
We briefly summarize these approaches, before looking closer at the actual
data.

1. A first approach appears to serve mainly to demonstrate that stimuli do arouse emotions by comparing the results to previous studies of emotions. Although most musicians and listeners would seem to take the emotional powers of music for granted, it has been a matter of some controversy whether music really evokes emotions (Kivy, 1990). A
landmark study by Blood and Zatorre (2001) revealed—for the first time
—that pleasurable responses to music influence “core” regions of the
brain already linked to emotion, such as the amygdala, the hippocampus,
and the ventral striatum.

The demonstration of blood-flow changes in such regions appeared to make musical emotions more “real” in the eyes of some observers. But data of this kind were sometimes oversold: a lot was made of the finding that
enjoyment of music involves the same “reward circuits” in the brain as
other forms of pleasure such as food, sex, and drugs (e.g., the nucleus
accumbens); yet this discovery is not that surprising. It would have been far
more surprising to discover unique “reward circuits” only for music. The
major conclusion of this approach is that “the brain areas affected by
emotions to music are similar to those reported in other brain studies of
emotion.”
2. A second approach speaks to the previously discussed distinction
between perceived and induced emotions. A meta-analysis of PET and
fMRI studies of perception and induction of emotion in general (outside
music) by Wager et al. (2008) concluded that the two processes involve
peak activations of different brain regions, supporting the idea that these
are distinct processes. Some authors argue that the processes can be
distinguished in terms of prefrontal activation, such that perceived
emotion activates mainly the right hemisphere (regardless of the
emotion) whereas evoked emotion is lateralized according to valence:
positive emotions in the left hemisphere, negative in the right (e.g.,
Blonder, 1999; Davidson, 1995).

To the best of our knowledge, no music study thus far has directly
contrasted perception and induction of emotion, but attempts to interpret
data along those lines have been made (Juslin & Sloboda, 2001, p. 456). We
review further evidence below. The preliminary conclusion of this approach
is that “perception and induction of emotions may involve different patterns
of brain activation.”

3. A third approach, already hinted at above, aims mainly to discriminate neural patterns of affective responses with regard to their valence (positive/negative). This approach has been adopted by several studies in the general emotion field. For instance, Chikazoe and colleagues (Chikazoe, Lee, Kriegeskorte, & Anderson, 2014) identified activation patterns that correlated significantly with the degree of positive or negative valence experienced by subjects. A similar approach
is often used in music. In fact, in our estimation, the use of an explicit
(“positive vs. negative,” “pleasant vs. unpleasant”) or implicit (“happy
vs. sad,” “consonant vs. dissonant”) valence dimension is the most
common approach in blood-flow studies. Several studies indicate that
positive affect is handled in the left hemisphere, whereas negative affect
is handled in the right (see Altenmüller, Schurmann, Lim, & Parlitz,
2002; Daly et al., 2014; Flores-Gutiérrez et al., 2007; Schmidt &
Trainor, 2001; Tsang, Trainor, Santesso, Tasker, & Schmidt, 2001).

Not all of the studies seem to follow this pattern, however—at least at first sight. A problem is that in some cases, it is difficult to know for sure
whether a study has measured perceived or evoked emotion, since multi-
component indices were not used. For instance, it is an open question
whether ratings of pleasantness of music in some studies are just that
(ratings of the stimuli) or whether they index feelings of pleasure. If there is
insufficient control over which process is actually elicited in studies, this
can explain the mixed findings. We submit that the results suggest some
degree of specificity in terms of valence, but the nature of these patterns
and their interpretation remain contested. Yet, a preliminary conclusion of
this approach is that “neural correlates can distinguish the valence of
musically aroused affect.”

4. A fourth approach seeks to obtain links between discrete emotions and neural structures. This is part of an ongoing debate about whether there is emotion-specificity in responding more generally. Some
neuroscientists claim to have been able to distinguish neural activity in
terms of discrete emotions (see Damasio et al., 2000; Kassam, Markey,
Cherkassky, Loewenstein, & Just, 2013; Murphy, Nimmo-Smith, &
Lawrence, 2003; Saarimäki et al., 2016).

We should clearly acknowledge, however, that the hypothesis of emotion-specific activation remains controversial. A recent review failed to obtain any evidence that discrete emotions can be consistently localized to distinct brain regions (cf. Clark-Polner, Wager, Satpute, & Barrett, 2016). This is sometimes cited as evidence against a discrete emotions approach. However, the very same review also failed to obtain evidence of specific regions linked with dimensions such as valence! Hence, the authors argue that
the localization hypothesis for affective states—whether discrete or
dimensional—is flawed in general.
Previous neuropsychological studies may, indeed, have been too eager to
localize particular emotions in specific parts of the brain. Similar tendencies may also be found in music research—for instance,
linking the amygdala to fear perception (Peretz, 2001), and the
hippocampus to tender emotions (Koelsch, 2014), although both these
structures are clearly involved in a much wider range of emotions. There is
a risk here that neuroscientists “claim” certain areas as “music-specific” or
“emotion-specific” when, in fact, they are neither.
In our view, both the proponents and critics of the emotion-specificity
approach have tended to confuse causal mechanisms with affective
outcomes: there is no reason to assume emotion specificity in the former
(e.g., a “memory area” may be active across emotions), even though there
is specificity in the felt emotions (nostalgia vs. awe). In a meta-analysis,
Lindquist and colleagues (Lindquist, Wager, Kober, Bliss-Moreau, &
Barrett, 2012) observed a set of interacting brain regions commonly
involved in basic psychological operations of both an emotional and non-
emotional nature during emotion experience, across a range of discrete
emotion categories. The authors argue that this finding is consistent with a
“constructive” approach to emotion (Barrett, 2017). However, it is equally
consistent with the BRECVEMAC framework presented earlier.
The major conclusion of this approach, then, is that “although there may
be some limited level of emotion specificity in regions linked to conscious
emotional experience, most areas involve domain-general processes (such
as memory) which are active not only during emotions.” This, then, leads us
to the fifth and final approach.

5. The fifth approach focuses on underlying psychological processes or brain functions; that is, mechanisms (e.g., Cabeza & Nyberg, 2000).
By carefully isolating distinct psychological processes in the
experimental design, one can link neural correlates to mental functions.
For example, episodic memories might involve a partly distinct brain
network from conditioned responses. This approach is the “essence” of
neuropsychology and has been successful in the neurosciences more
generally. Yet this approach is still rare in the music field (Janata, 2009;
Steinbeis, Koelsch, & Sloboda, 2006).

Over time, one may discern a change from basic lateralization studies (e.g.,
dichotic listening) and a search for individual brain structures to a
consideration of more complex and distributed networks. But we are not
aware of any study of neural correlates so far that contrasts different
psychological mechanisms. (We will consider such an approach later in the
chapter.) Thus, the increasing awareness of the role of mechanisms has not
yet translated into concrete designs. This becomes clear when we take a
closer look at the findings.
Summary of Brain Imaging Data
At the current stage, the data that are potentially most informative when it
comes to pinpointing neural correlates of musical emotion come from the
(38) brain imaging studies conducted to date. Tables 1 and 2 summarize the
main findings for perceived and induced emotion, respectively, in terms of
broad brain areas for which blood-flow changes have been reported. Ideally,
the interpretation should be made in terms of “networks” (Bressler &
Menon, 2010), rather than “isolated” regions, but current results do not yet
enable such interpretations.
Some broad conclusions can be drawn based on the findings. First,
music listening can cause changes in blood flow in “core” regions for
emotional processing. Second, as noted by Peretz (2010, p. 119), “there is
not a single, unitary emotional system underlying all emotional responses to
music.” On the contrary, a fairly broad range of cortical and subcortical
brain regions seem to be linked to musical emotions. Most of these belong
to the (extended) limbic system and include the amygdala, the
hippocampus, the striatum (including nucleus accumbens), the cingulate
cortex, the insula, the prefrontal and orbitofrontal cortex, the cerebellum,
the frontal gyrus, the parahippocampal gyrus, and various brainstem
structures.
The data in Tables 1 and 2 also enable us to compare induced and
perceived emotions. As may be seen, there is some overlap between the
brain regions reported. This could reflect two things: (a) that there is some degree of overlap in the neural correlates of these processes, or (b) that
studies have not sufficiently distinguished between the processes—such that
some studies that ostensibly focus on induced emotion have measured
perceived emotion and vice versa; or that some studies measure both
processes at the same time—leading to “noisy” data. Few studies have
measured multiple components of emotion so as to enhance the validity of
conclusions about induced emotions (discussed at the beginning of this
chapter).
In any case, note that there are certain differences in the findings for the
two processes: Only for induction of emotion have several studies reported
changes in the amygdala, the striatum (including nucleus accumbens), and
the hippocampus. At least some of these areas may thus distinguish induced
emotions from mere perception, though studies that directly contrast the
two processes under controlled conditions are clearly required to confirm
this hypothesis.1
Beyond these simple and relatively trivial conclusions, interpretations of
the findings tend to become more difficult and “impressionistic” in nature.
Given a general lack of “process-pure” manipulations of mechanisms,
researchers have to rely on “informed speculations” about the possible role
of different brain structures and networks. These are typically based on
general knowledge of the brain, but tend to be relatively vague. This is
because the analyses involve very broad brain areas which have been
proposed to be involved in a wide range of different psychological
processes; that is, they have poor “selectivity” (Poldrack, 2006) when it
comes to “revealing” specific psychological processes.
Koelsch (2014, p. 172) submits that observed changes in the amygdala
“could be because music is perceived as a stimulus with social significance
owing to its communicative properties.” This is, indeed, one possibility—
but we really do not know. And even if this notion is correct, it does not
offer very precise information about the functional role of the amygdala. An
additional problem is that this form of “reverse inference” about cognitive
process is not deductively valid (Poldrack, 2006). Normally, we would infer
from brain imaging data that “when cognitive process X is active, then
brain area Y is active”—not the other way around.
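Poldrack’s (2006) point can be stated in Bayesian terms: the strength of the inference from “area Y is active” back to “process X was engaged” depends on how selectively Y responds to X. The toy calculation below is our own illustration; all probabilities are invented purely to show the logic.

```python
# Toy Bayes-rule illustration of reverse inference (after Poldrack, 2006).
# All probabilities here are made-up numbers chosen only to show the logic.

def p_process_given_activation(p_act_if_x, p_act_if_not_x, prior_x=0.5):
    """P(process X | activation in Y), via Bayes' rule."""
    p_act = p_act_if_x * prior_x + p_act_if_not_x * (1 - prior_x)
    return p_act_if_x * prior_x / p_act

# High selectivity: area Y rarely activates unless process X is engaged.
print(f"{p_process_given_activation(0.8, 0.1):.2f}")  # 0.89 -> strong inference

# Poor selectivity (typical of broad, multi-purpose regions): the same
# activation is nearly as likely without X, so it tells us little.
print(f"{p_process_given_activation(0.8, 0.7):.2f}")  # 0.53 -> weak inference
```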
Let us be clear: This is not a matter of competence. Informed
speculations and interpretations by distinguished neuroscientists like Stefan
Koelsch or Isabelle Peretz are as good as they get. The problem is rather
that in the absence of process-specific experimental manipulation in the
field as a whole, theoretical interpretations are rendered difficult for a
number of reasons.
A first problem is that brain imaging “cannot disentangle correlation
from causation” (Peretz, 2010, p. 114); a related problem is that results
from imaging studies tend to be “overinclusive” (Peretz, 2010, p. 114);
therefore, “it is not always easy to determine if the activity is related to
emotional or non-emotional processing of the musical structure” (Peretz,
2010, p. 112). Indeed, the same brain structure can serve different roles both
within and across domains (Kreutz & Lotze, 2007). In addition, as implied
by the ICINAS-BRECVEMAC framework, cognition and emotion are not
neatly separated in the brain: specific cognitive processes may be involved
depending on the mechanism responsible for the perceived or induced
emotion.
The specific listener task (self-report of felt affect, ratings of melodies,
or mere listening) may also affect the patterns of brain
activation/deactivation, and so may differences with respect to the music
stimuli (“real” vs. “synthesized” music, “familiar” vs. “unfamiliar,” “self-
selected” vs. “experimenter-selected”). All of these issues conspire to make
interpretations of findings from brain imaging studies problematic. This has
not prevented researchers from suggesting how to organize the findings
with regard to the processes of perception and induction, respectively.

Perception of Emotions
Double dissociations between emotional judgments and melody recognition (Peretz & Gagnon, 1999), and between emotional judgments and basic music perception (Peretz, Gagnon, & Bouchard, 1998), initially led Peretz (2001) to postulate an “emotion module,” dedicated to
perception of emotion in music. Subsequently, she proposed that a more
distributed network, originally evolved to process vocal emotions, has been
“invaded” by music, such that emotional speech and emotional music will
share neural resources (Peretz, 2010). This idea has received some support
(Escoffier, Zhong, Schirmer, & Qiu, 2013) and is in line with documented
parallels in emotions between speech and music (Juslin & Laukka, 2003).
Studies on emotions in speech suggest a network of areas primarily in the
(right) frontal and parietal lobes, including the inferior frontal gyrus
(Schirmer & Kotz, 2006).
The possibility of cross-modal parallels can be explored in the context of
the present results (Table 1). For perceived emotions, the most frequently
reported regions are frontal areas (73 percent of studies) and the frontal
gyrus (45 percent). Note that Escoffier et al. (2013) found that tracing of
emotions in both speech and music was related to activity in the medial superior frontal gyrus (SFG). Moreover, Nair, Large, Steinberg, and Kelso (2002) discovered that
listening to expressive (as compared to “mechanical”) music performances
increased activity in the right inferior frontal gyrus. These findings seem
consistent with the “shared-resources hypothesis” (further evidence of a
shared neural code was recently reported by Paquette, Takerkart, Saget,
Peretz, & Belin, 2018).
There are some additional brain regions implicated in emotion-
perception studies. Curiously, there are three studies (27 percent) that report
changes in the cerebellum during perceived emotion, and five studies (45
percent) that report changes in the anterior cingulate cortex (which also occurs in evoked emotion; cf. Table 2). We return to these findings later.
It has further been argued that the perception of dissonance is linked to
the parahippocampal gyrus (Blood et al., 1999). This notion receives
support from lesion studies showing that this basic ability suffers after
damage to the parahippocampal gyrus (Gosselin et al., 2006). Only two of
eleven studies (18 percent) report changes in the amygdala (see Table 1)—
though it has been found that recognition of “scary” music suffers after
damage to the amygdala (Gosselin et al., 2005). It cannot be completely
ruled out that the two studies really measured evoked emotion rather than
just perceived (since they featured unpleasant stimuli that may have evoked
some negative emotion). In summary, the most consistent results are that
perception of emotions in music involves the frontal cortex and the frontal
gyrus—and, perhaps, some right hemisphere lateralization (Bryden, Ley, &
Sugarman, 1982).

Induction of Emotions
For induction of emotions, a larger number of brain regions have been
reported (Table 2). The most frequently reported areas include the amygdala (63 percent of studies), the frontal cortex (70 percent; the prefrontal cortex specifically, 37 percent), the ventral striatum/nucleus accumbens (NAc) (44 percent), the hippocampus (52 percent), the insula (48 percent), and the anterior cingulate cortex (41 percent). However, note that the results vary considerably from study to study, in ways that are not easy to
explain. For example, it may be seen in Table 2 that there are numerous
additional regions that were reported in only one or a few studies. These
include the parahippocampus, the thalamus, the basal ganglia, the
cerebellum, motor regions, and the brainstem.
One approach to this problem is to look for areas that are consistently
activated across studies in the hope that this will reveal an emotion network
that is invariably involved in the process. Thus, for instance, Koelsch and
colleagues (Koelsch, Siebel, & Fritz, 2010) argue that a network consisting
of the amygdala, the hippocampus, the parahippocampus, the temporal
poles, and the pregenual cingulate cortex may play a consistent role in
emotional processing of music. But is there support for the idea of a set of
brain regions that are consistently activated?
Close inspection of Table 2 reveals that few brain regions are reported in more than about half of the studies that purported to measure induced
emotions. If areas are not consistently found to be influenced, how is this to
be interpreted? Some of this variability is surely due to methodological
problems and consequent measurement error. This could include differences
in how regions of interest (ROI) are defined, or in the assumptions made in
the analysis. But assuming that limbic regions were “prime suspects” in the
analyses, the variability is still too large to be accounted for by (only) this
factor.
In principle, one may argue that if these studies have tried to measure
emotion and the listed regions are not consistently activated across studies,
then either these areas are not related to emotions, or these studies have not
consistently managed to induce any emotion. However, a different
interpretation suggested by the BRECVEMAC framework (and supported
by meta-analyses of “general” emotion findings; cf. Lindquist et al., 2012)
is that the variability is due to different psychological mechanisms being
activated in different investigations (depending on the musical stimuli, the
listeners, and the situation, as well as the experimental procedure). This
possibility is elaborated in the following section.
Toward Mechanistic Predictions about Affect

If neuropsychology “aims to relate neural mechanisms to mental functions” (Peretz, 2010, p. 99), and most previous studies have not tried to manipulate
mechanisms that involve distinct mental functions (discussed earlier), it is
hard to resist the conclusion that studies in this field have somehow
attempted to do neuropsychology, although without the psychology. There
is one exception: Janata (2009) focused specifically on the process of
autobiographical memory and found that dorsal regions of the medial
prefrontal cortex responded to the relative degree of autobiographical
salience of musical stimuli (rated post-hoc).
We believe that a more principled approach, which aims to target
specific mechanisms, might lead to more interpretable results (Juslin,
Barradas, & Eerola, 2015; Juslin, Harmat, & Eerola, 2014). Based on the
assumptions that most studies of musical emotion have lacked the needed
specificity, in terms of stimulus manipulation and procedures, to separate
different underlying mechanisms, and that neuroscience studies in general
psychology have reached a higher level of theoretical sophistication, we
propose hypotheses from various sub-domains (e.g., memory, imagery,
language). These might be tested in designs that manipulate specific
mechanisms, in a humble attempt to uncover more mechanism-specific
brain networks (Juslin, 2019).
Emotional responses to music can be expected to involve three general
types of brain regions: (1) brain regions always involved during music
perception (e.g., the primary auditory cortex), (2) regions always involved
in the conscious experience of emotion, regardless of the “source” of the
emotion (candidates may include the rostral anterior cingulate and the
medial prefrontal cortex; see, e.g., Lane, 2000, pp. 356–358), and (3)
regions involved in information-processing that differs depending on the
mechanism that caused the emotion. The last category of regions may
involve processes (e.g., syntactic processing, episodic memory) that do not
in themselves imply that emotions have been aroused: They may also occur
in the absence of emotions (e.g., Pessoa, 2013). Based on these notions, we
propose the following (preliminary) hypotheses for emotion induction.
(Neural correlates of aesthetic judgments are discussed in Chapter 15, this
volume).
Brainstem reflexes involve the reticulospinal tract, which travels from
the reticular formation of the brain stem, and the intralaminar nuclei of the
thalamus (Davis, 1984; Kinomura, Larsson, Gulyás, & Roland, 1996).
“Alarm signals” to auditory events can be emitted as early as at the level of
the inferior colliculus of the brainstem (Brandao, Melo, & Cardoso, 1993),
producing startle reflexes and increased arousal. Studies show that the
reticulospinal tract is required for the acoustic startle response, because
lesions in this tract abolish the response (Boulis, Kehne, Miserendino, &
Davis, 1990). Yet, although the neural circuitry that “mediates” the acoustic
startle is located entirely within the brainstem, the system can be modulated
by higher neural tracts (Miserendino, Sananes, Melia, & Davis, 1990).
Rhythmic entrainment has been less examined, but could involve neural
oscillation patterns to rhythmic stimulation in early auditory areas, motor
areas (sensorimotor cortex, supplementary motor area), the cerebellum, and
the basal ganglia (see Fujioka, Trainor, Large, & Ross, 2012; Tierney &
Kraus, 2013; Trost et al., 2014), perhaps primed early on by reticulospinal
pathways in the brainstem (Rossignol & Melvill Jones, 1976). The
cerebellum could be particularly important in “active” entrainment
(coordination of a motor response; e.g., Grahn, Henry, & McAuley, 2011),
whereas the caudate nucleus of the basal ganglia could be the crucial area
during “passive” entrainment to auditory stimulation (Trost et al., 2014).
Evaluative conditioning (EC) involves particularly the lateral nucleus of
the amygdala and the interpositus nucleus of the cerebellum (e.g., Fanselow
& Poulos, 2005; Johnsrude, Owen, White, Zhao, & Bohbot, 2000;
Sacchetti, Scelfo, & Strata, 2005). Hippocampal activation may also occur,
if the EC depends strongly on the context, but only the amygdala seems to
be required for EC to occur (LeDoux, 2000). The timing of the delivery of the conditioned stimulus (CS) and unconditioned stimulus (US) is important, which may explain why the cerebellum is active in conditioning (like another time-dependent
process—rhythmic entrainment). We argue that the amygdala is mainly
involved in the evaluation of the stimulus whereas the cerebellum is
involved in the timing of the response (Cabeza & Nyberg, 2000).
Emotional contagion from music will presumably include brain regions
for the perception of emotions from the voice (and, hence, presumably of
emotions from voice-like characteristics of music), mainly right-lateralized
inferior frontal areas (including the frontal gyrus) and the basal ganglia
(Adolphs, Damasio, & Tranel, 2002; Paulmann, Ott, & Kotz, 2011;
Schirmer & Kotz, 2006), and also “mirror neurons” in premotor regions, in
particular regions involved in perceiving emotional vocalizations (e.g.,
Paquette et al., 2018; Warren et al., 2006; cf. Koelsch, Fritz, von Cramon,
Müller, & Friederici, 2006).
Visual imagery involves visual representations in the occipital lobe that
are spatially mapped and activated in a “top-down” manner during imagery
(Charlot, Tzourio, Zilbovicius, Mazoyer, & Denis, 1992; Goldenberg,
Podreka, Steiner, Franzén, & Deecke, 1991). This requires the intervention
of an attention-demanding process of image generation, which appears to
have a left temporo-occipital localization (e.g., Farah, 2000). Self-reported
imagery vividness correlates with activation of the visual cortex in imaging
studies (Cui, Jeter, Yang, Montague, & Eagleman, 2007), which may also
be activated during music listening (e.g., Thornton-Wells et al., 2010).
Episodic memory can be divided into various stages (e.g., encoding,
retrieval). The conscious experience of recollection of an episodic memory
seems to involve the medial temporal lobe, especially hippocampus (e.g.,
Nyberg, McIntosh, Houle, Nilsson, & Tulving, 1996) and the medial
prefrontal cortex (Gilboa, 2004; for similar results in music, see Janata,
2009). Additional areas correlated with episodic memory retrieval include
the precuneus (Wagner, Shannon, Kahn, & Buckner, 2005), the entorhinal
cortex (Haist, Gore, & Mao, 2001), and the amygdala (in the case of
emotional memories; Dolcos, LaBar, & Cabeza, 2005).
Musical expectancy refers to expectancies involving syntactical relationships between different parts of the musical structure (Meyer, 1956),
somewhat akin to a syntax in language. Lesion studies indicate that several
areas of the left perisylvian cortex are involved in various aspects of
syntactical processing (Brown, Hagoort, & Kutas, 2000), and parts of
Broca’s area increase their activity when sentences increase in syntactical
complexity (Caplan, Alpert, & Waters, 1998; Stromswold, Caplan, Alpert,
& Rauch, 1996; for music, see Maess, Koelsch, Gunter, & Friederici, 2001).
Musical expectancy also involves monitoring of conflicts between expected
and actual music sequences. This may recruit parts of the anterior cingulate
(Botvinick, Cohen, & Carter, 2004) or orbitofrontal cortex (Koelsch, 2014).
It should be noted that nearly all of the brain regions proposed above
have been reported in at least one imaging study of music listening; and
many have been reported frequently. Detailed predictions for neural
correlates of emotion perception, based on the ICINAS-BRECVEMAC
framework (Juslin, 2019), have not been proposed earlier, but the reported
blood-flow changes are at least consistent with sources of perceived emotions in terms of iconic similarity with emotional speech (e.g., the right frontal gyrus), intrinsically coded tension in musical structure (e.g., the anterior cingulate cortex), and associative coding based on classical conditioning (e.g., the cerebellum; see Table 1).
Overlapping brain areas between evoked and perceived emotion (Tables
1 and 2) could reflect similar processes—such as emotion perception
(prefrontal brain areas, involved in the induction mechanism contagion) and
conflict monitoring (the anterior cingulate cortex, which is involved in both
intrinsic sources of perceived emotions and the expectancy mechanism for
emotion induction). We emphasize, however, that all “post-hoc”
speculations of this type must be treated with caution: The relevant
distinctions between processes must be made at the stage of experimental
design (Juslin et al., 2014, 2015), rather than in the interpretations
afterwards.

Concluding Remarks: A Field in Need of Advancement?

Nearly a decade ago, Peretz (2010) observed that the neuropsychology of music and emotion was in its infancy. Yet she seemed optimistic: “It is
remarkable how much progress has been accomplished over the last
decade” (Peretz, 2010, p. 119). We take a slightly more pessimistic view on
the current state of the art: the field may have become a “toddler,” but the
results are fragmented.
Most studies seem to make sense, when considered on their own, but the
different studies do not add up to a consistent “big picture.” When it comes
to understanding which brain regions are involved in music and emotion,
and their respective role in the underlying processes, it is not obvious that
the field has advanced much, as compared to the seminal studies carried out
nearly twenty years ago (Blood & Zatorre, 2001). Yes, some brain areas
have been (more or less) consistently reported across different studies—but
we still do not know which roles they play.
We suggest that this reflects the lack of a systematic research program,
which truly attempts to link specific psychological processes to brain
networks. The previously outlined ICINAS-BRECVEMAC framework
provides one promising way to address this issue. We recognize, however,
that there may be other ways of “slicing the pie.” The important thing is that
we do not try to eat the pie randomly, because that is bound to get messy.
We argue that future research designs need to become increasingly
sensitive to psychological process distinctions. To this end, we propose
three ways of enhancing progress in the domain:

(1) actively manipulating and contrasting different psychological mechanisms in the same experimental design (cf. Juslin et al.,
2014);
(2) employing convergent measures to support conclusions about the
engagement of each mechanism (see Juslin et al., 2015, Table 7)
and about whether perception or induction of emotion has occurred
(e.g., Lundqvist, Carlsson, Hilmersson, & Juslin, 2009);
(3) analyzing sets of regions as networks (as opposed to analyzing single regions), in order to increase the selectivity of response in the brain region of interest (Poldrack, 2006; cf. Koelsch, Skouras, & Lohmann, 2018); a minimal sketch of this idea follows the list.
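As a minimal sketch of what recommendation (3) amounts to in practice, one can correlate ROI time courses and read off network edges, rather than testing each region in isolation. The region names and data below are placeholders, and published analyses (e.g., Koelsch, Skouras, & Lohmann, 2018) use far more sophisticated pipelines.

```python
import numpy as np

# Minimal sketch of a network-level analysis (placeholder data; real
# pipelines are far more sophisticated). We correlate ROI time courses
# and treat strong correlations as edges of a functional network.
rng = np.random.default_rng(0)
rois = ["amygdala", "hippocampus", "NAc", "ACC", "insula"]

timeseries = rng.standard_normal((len(rois), 200))   # fake BOLD: 5 ROIs x 200 volumes
timeseries[:3] += 0.8 * rng.standard_normal(200)     # give three ROIs a shared driver

corr = np.corrcoef(timeseries)                       # ROI-by-ROI connectivity matrix
adjacency = (np.abs(corr) > 0.3) & ~np.eye(len(rois), dtype=bool)

for i, j in zip(*np.nonzero(np.triu(adjacency))):    # list the network's edges
    print(f"edge: {rois[i]} -- {rois[j]} (r = {corr[i, j]:.2f})")
print("degree per ROI:", dict(zip(rois, adjacency.sum(axis=1).tolist())))
```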

We also recommend the use of more systematic control conditions (i.e., contrasts) to rule out “alternative interpretations”—contrasting different
mechanisms not only with one another, but also with “non-emotional”
music listening, listening to “mere sounds,” to silence, etc., in order to
isolate the brain networks that are selectively involved in: music listening
per se; emotions in general; specific emotion categories and dimensions;
psychological mechanisms; and more domain-general cognitive processes
(e.g., attention). One hitherto unexplored possibility is to use transcranial
magnetic stimulation (Pascual-Leone, Davey, Rothwell, Wassermann, &
Puri, 2002) to disrupt brain activity at crucial times and locations to prevent
mechanisms from becoming activated by music events.
Some scholars argue that an understanding of musical emotions is
important in order to better understand emotions in general (Koelsch et al.,
2010). Indeed, because music engages so fully with our emotions, music
can sometimes reveal the nature of our “emotional machinery” more clearly
than the stimuli normally used to study emotions. The fact that music
appears to be so “abstract”—meaning that our “post-hoc” rationalizations
for emotions cannot be made to easily fit—may help us to think more
clearly about the “true” causes of our emotions (Juslin, 2019).
Current theory of music and emotion suggests that responses are
mediated by a wide range of mechanisms, rather than just cognitive
appraisal. However, there is an unfortunate disconnect between theory in
the field and empirical studies of the neural correlates, which prevents brain
studies from realizing their full potential; when psychological theory
becomes reflected in the experimental design of brain imaging studies, that
is when things are bound to get exciting.
References
(Articles marked * are included in the empirical review)
Adolphs, R., Damasio, H., & Tranel, D. (2002). Neural systems for recognition of emotional
prosody: A 3-D lesion study. Emotion 2, 23–51.
*Alfredson, B. B., Risberg, J., Hagberg, B., & Gustafson, L. (2004). Right temporal lobe activation
when listening to emotionally significant music. Applied Neuropsychology 11, 161–166.
*Altenmüller, E., Schurmann, K., Lim, V. K., & Parlitz, D. (2002). Hits to the left, flops to the right:
Different emotions during listening to music are reflected in cortical lateralisation patterns.
Neuropsychologia 40, 2242–2256.
*Ball, T., Rahm, B., Eickhoff, S. B., Schulze-Bonhage, A., Speck, O., & Mutschler, I. (2007).
Response properties of human amygdala subregions: Evidence based on functional MRI combined
with probabilistic anatomical maps. PLoS ONE 2, e307.
Barrett, L. F. (2017). How emotions are made: The secret life of the brain. Boston: Houghton Mifflin
Harcourt.
Baumgartner, H. (1992). Remembrance of things past: Music, autobiographical memory, and
emotion. Advances in Consumer Research 19, 613–620.
*Baumgartner, T., Esslen, M., & Jäncke, L. (2005). From emotion perception to emotion experience:
Emotions evoked by pictures and classical music. International Journal of Psychophysiology 60,
34–43.
*Baumgartner, T., Lutz, K., Schmidt, C. F., & Jäncke, L. (2006). The emotional power of music:
How music enhances the feeling of affective pictures. Brain Research 1075, 151–164.
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton Century Crofts.
Blair, M. E., & Shimp, T. A. (1992). Consequences of an unpleasant experience with music: A
second-order negative conditioning perspective. Journal of Advertising 21, 35–43.
Blonder, L. X. (1999). Brain and emotion relations in culturally diverse populations. In A. L. Hinton
(Ed.), Biocultural approaches to the emotions (pp. 275–296). Cambridge: Cambridge University
Press.
*Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with
activity in brain regions implicated in reward and emotion. Proceedings of the National Academy
of Sciences 98, 11818–11823.
*Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant
and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience 2,
382–387.
Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate
cortex. Trends in Cognitive Sciences 8, 539–546.
Boulis, N. M., Kehne, J. H., Miserendino, M. J. D., & Davis, M. (1990). Differential blockade of
early and late components of acoustic startle following intrathecal infusion of 6-cyano-7-
nitroquinoxaline-2,3-dione (CNQX) or D, L-2-amino-5-phosphonovaleric acid (AP-5). Brain
Research 520, 240–246.
Brandao, M. L., Melo, L. L., & Cardoso, S. H. (1993). Mechanisms of defense in the inferior
colliculus. Behavioral Brain Research 58, 49–55.
*Brattico, E., Alluri, V., Bogert, B., Jacobsen, T., Vartiainen, N., Nieminen, S., & Tervaniemi, M.
(2011). A functional MRI study of happy and sad emotions in music with and without lyrics.
Frontiers in Psychology 2, 308. Retrieved from https://doi.org/10.3389/fpsyg.2011.00308
Bressler, S. L., & Menon, V. (2010). Large-scale brain networks in cognition: Emerging methods and
principles. Trends in Cognitive Sciences 14, 277–290.
Brown, C. M., Hagoort, P., & Kutas, M. (2000). Postlexical integration processes in language
comprehension: Evidence from brain-imaging research. In M. S. Gazzaniga (Ed.), The new
cognitive neurosciences (2nd ed., pp. 881–895). Cambridge, MA: MIT Press.
*Brown, S., Martinez, M. J., & Parsons, L. M. (2004). Passive music listening spontaneously
engages limbic and paralimbic systems. Neuroreport 15, 2033–2037.
Bryant, G. A., & Barrett, H. C. (2008). Vocal emotion recognition across disparate cultures. Journal
of Cognition and Culture 8, 135–148.
*Bryden, M. P., Ley, R. G., & Sugarman, J. H. (1982). A left-ear advantage for identifying the
emotional quality of tonal sequences. Neuropsychologia 20, 83–87.
Cabeza, R., & Nyberg, L. (2000). Imaging cognition II: An empirical review of 275 PET and fMRI
studies. Journal of Cognitive Neuroscience 12, 1–47.
Caplan, D., Alpert, N., & Waters, G. (1998). Effects of syntactic structure and propositional number
on patterns of regional cerebral blood flow. Journal of Cognitive Neuroscience 10, 541–542.
*Caria, A., Venuti, P., & de Falco, S. (2011). Functional and dysfunctional brain circuits underlying
emotional processing of music in autism spectrum disorders. Cerebral Cortex 21, 2838–2849.
*Chapin, H., Jantzen, K., Kelso, J. S., Steinberg, F., & Large, E. (2010). Dynamic emotional and
neural responses to music depend on performance expression and listener experience. PloS ONE 5,
e13812.
Charlot, V., Tzourio, N., Zilbovicius, M., Mazoyer, B., & Denis, M. (1992). Different mental imagery
abilities result in different regional cerebral blood flow activation patterns during cognitive tasks.
Neuropsychologia 30, 565–580.
Chikazoe, J., Lee, D. H., Kriegeskorte, N., & Anderson, A. K. (2014). Population coding of affect
across stimuli, modalities, and individuals. Nature Neuroscience 17, 1114–1122.
Clark-Polner, E., Wager, T. D., Satpute, A. B., & Barrett, L. F. (2016). Neural fingerprinting: Meta-
analysis, variation, and the search for brain-based essences in the science of emotion. In L. F.
Barrett, M. Lewis, & J. M. Haviland-Jones (Eds.), Handbook of emotions (4th ed., pp. 146–165).
New York: Guilford Press.
Clynes, M. (1977). Sentics: The touch of emotions. New York: Doubleday.
Cui, X., Jeter, C. B., Yang, D., Montague, P. R., & Eagleman, D. M. (2007). Vividness of mental
imagery: Individual variability can be measured objectively. Vision Research 47, 474–478.
*Daly, I., Malik, A., Hwang, F., Roesch, E., Weaver, J., Kirke, A., … Nasuto, S. J. (2014). Neural
correlates of emotional responses to music: An EEG study. Neuroscience Letters 573, 52–57.
*Daly, I., Williams, D., Hallowell, J., Hwang, F., Kirke, A., Malik, A., … Nasuto, S. J. (2015).
Music-induced emotions can be predicted from a combination of brain activity and acoustic
features. Brain and Cognition 101, 1–11.
Damasio, A. (1994). Descartes’ error: Emotion, reason, and the human brain. New York: Avon
Books.
Damasio, A. R., Grabowski, T. J., Bechara, A., Damasio, H., Ponto, L. L. B., Parvizi, J., & Hichwa,
R. D. (2000). Subcortical and cortical brain activity during the feeling of self-generated emotions.
Nature Neuroscience 3, 1049–1056.
Davidson, R. J. (1995). Cerebral asymmetry, emotion, and affective style. In R. J. Davidson & K.
Hugdahl (Eds.), Brain asymmetry (pp. 361–387). Cambridge, MA: MIT Press.
Davis, M. (1984). The mammalian startle response. In R. C. Eaton (Ed.), Neural mechanisms of
startle behavior (pp. 287–342). New York: Plenum Press.
Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press.
Dolcos, F., LaBar, K. S., & Cabeza, R. (2005). Remembering one year later: Role of the amygdala
and the medial temporal lobe memory system in retrieving emotional memories. Proceedings of
the National Academy of Sciences 102, 2626–2631.
Dowling, W. J., & Harwood, D. L. (1986). Music cognition. New York: Academic Press.
*Eldar, E., Ganor, O., Admon, R., Bleich, A., & Hendler, T. (2007). Feeling the real world: Limbic
response to music depends on related content. Cerebral Cortex 17, 2828–2840.
*Escoffier, N., Zhong, J., Schirmer, A., & Qiu, A. (2013). Emotions in voice and music: Same code,
same effect? Human Brain Mapping 34, 1796–1810.
Fanselow, M. S., & Poulos, A. M. (2005). The neuroscience of mammalian associative learning.
Annual Review of Psychology 56, 207–234.
Farah, M. J. (2000). The neural bases of mental imagery. In M. S. Gazzaniga (Ed.), The new
cognitive neurosciences (2nd ed., pp. 965–974). Cambridge, MA: MIT Press.
*Field, T., Martinez, A., Nawrocki, T., Pickens, J., Fox, N. A., & Schanberg, S. (1998). Music shifts
frontal EEG in depressed adolescents. Adolescence 33, 109–116.
*Flores-Gutiérrez, E. O., Díaz, J.-L., Barrios, F. A., Favila-Humara, R., Guevara, M. A., del Río-
Portilla, Y., & Corsi-Cabrera, M. (2007). Metabolic and electric brain patterns during pleasant and
unpleasant emotions induced by music masterpieces. International Journal of Psychophysiology
65, 69–84.
*Flores-Gutiérrez, E. O., Díaz, J.-L., Barrios, F. A., Guevara, M. Á., del Río-Portilla, Y., Corsi-
Cabrera, M., & del Flores-Gutiérrez, E. O. (2009). Differential alpha coherence hemispheric
patterns in men and women during pleasant and unpleasant musical emotions. International
Journal of Psychophysiology 71, 43–49.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., … Koelsch, S. (2009).
Universal recognition of three basic emotions in music. Current Biology 19, 1–4.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds
is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32, 1791–1802.
*Gagnon, L., & Peretz, I. (2000). Laterality effects in processing tonal and atonal melodies with
affective and nonaffective task instructions. Brain & Cognition 43, 206–210.
Gärdenfors, P. (2003). How homo became sapiens: On the evolution of thinking. Oxford: Oxford
University Press.
Gilboa, A. (2004). Autobiographical and episodic memory: One and the same? Evidence from
prefrontal activation in neuroimaging studies. Neuropsychologia 42, 1336–1349.
Goldenberg, G., Podreka, I., Steiner, M., Franzén, P., & Deecke, L. (1991). Contributions of occipital
and temporal brain regions to visual and acoustic imagery: A SPECT study. Neuropsychologia 29,
695–702.
*Gosselin, N., Peretz, I., Hasboun, D., Baulac, M., & Samson, S. (2011). Impaired recognition of
musical emotions and facial expressions following anteromedial temporal lobe excision. Cortex
47, 1116–1125.
*Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emotion
recognition from music. Neuropsychologia 45, 236–244.
*Gosselin, N., Peretz, I., Noulhiane, M., Hasboun, D., Beckett, C., Baulac, M., & Samson, S. (2005).
Impaired recognition of scary music following unilateral temporal lobe excision. Brain 128, 628–
640.
*Gosselin, N., Samson, S., Adolphs, R., Noulhiane, M., Roy, M., Hasboun, D., … Peretz, I. (2006).
Emotional responses to unpleasant music correlates with damage to the parahippocampal cortex.
Brain 129, 2585–2592.
*Goydke, K. N., Altenmüller, E., Möller, J., & Münte, T. (2004). Changes in emotional tone and
instrumental timbre are reflected by the mismatch negativity. Cognitive Brain Research 21, 351–
359.
Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions
in beat perception: Audition primes vision, but not vice versa. NeuroImage 54, 1231–1243.
*Green, A. C., Baerentsen, K., Stodkilde-Jorgensen, H., Wallentin, M., Roepstorff, A., & Vuust, P.
(2008). Music in minor activates limbic structure: A relationship with dissonance? Neuroreport 19,
711–715.
*Griffiths, T. D., Warren, J. D., Dean, J. L., & Howard, D. (2004). “When the feeling’s gone”: A
selective loss of musical emotion. Journal of Neurology, Neurosurgery & Psychiatry 75, 344–345.
Haist, F., Gore, J. B., & Mao, H. (2001). Consolidation of human memory over decades revealed by
functional magnetic resonance imaging. Nature Neuroscience 4, 1139–1145.
Harmon-Jones, E., Harmon-Jones, C., & Summerell, E. (2017). On the importance of both
dimensional and discrete models of emotion. Behavioral Sciences 7, 66.
Harrer, G., & Harrer, H. (1977). Music, emotion, and autonomic function. In M. Critchley & R. A.
Henson (Eds.), Music and the brain: Studies in the neurology of music (pp. 202–216). London:
William Heinemann Medical Books.
Hodges, D., & Sebald, D. (2011). Music in the human experience: An introduction to music
psychology. New York: Routledge.
*Hsieh, S., Hornberger, M., Piguet, O., & Hodges, J. R. (2012). Brain correlates of musical and facial
emotion recognition: Evidence from the dementias. Neuropsychologia 50, 1814–1822.
Izard, C. E. (1977). The emotions. New York: Plenum Press.
*Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral
Cortex 19, 2579–2594.
*Jeong, J.-W., Diwadkar, V. A., Chugani, C. D., Sinsoongsud, P., Muzik, O., Behen, M. E., …
Chugani, D. C. (2011). Congruence of happy and sad emotion in music and faces modifies cortical
audiovisual activation. NeuroImage 54, 2973–2982.
Johnsrude, I. S., Owen, A. M., White, N. M., Zhao, W. V., & Bohbot, V. (2000). Impaired preference
conditioning after anterior temporal lobe resection in humans. Journal of Neuroscience 20, 2649–
2656.
Juslin, P. N. (2001). Communicating emotion in music performance: A review and a theoretical
framework. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp.
309–337). Oxford: Oxford University Press.
Juslin, P. N. (2011). Music and emotion: Seven questions, seven answers. In I. Deliège & J. Davidson
(Eds.), Music and the mind (pp. 113–135). Oxford: Oxford University Press.
Juslin, P. N. (2013a). From everyday emotions to aesthetic emotions: Toward a unified theory of
musical emotions. Physics of Life Reviews 10, 235–266.
Juslin, P. N. (2013b). What does music express? Basic emotions and beyond. Frontiers in
Psychology: Emotion Science 4, 596.
Juslin, P. N. (2019). Musical emotions explained. Oxford: Oxford University Press.
Juslin, P. N., Barradas, G., & Eerola, T. (2015). From sound to significance: Exploring the
mechanisms underlying emotional reactions to music. American Journal of Psychology 128, 281–
304.
Juslin, P. N., Harmat, L., & Eerola, T. (2014). What makes music emotionally significant? Exploring
the underlying mechanisms. Psychology of Music 42, 599–623.
Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music
performance: Different channels, same code? Psychological Bulletin 129, 770–814.
Juslin, P. N., Liljeström, S., Västfjäll, D., Barradas, G., & Silva, A. (2008). An experience sampling
study of emotional reactions to music: Listener, music, and situation. Emotion 8, 668–683.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford:
Oxford University Press.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2010). Handbook of music and emotion: Theory, research,
applications. Oxford: Oxford University Press.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying
mechanisms. Behavioral and Brain Sciences 31, 559–575.
*Kamiyama, K. S., Abla, D., Iwanaga, K., & Okanoya, K. (2013). Interaction between musical
emotion and facial expression as measured by event-related potentials. Neuropsychologia 51, 500–
505.
Kassam, K. S., Markey, A. R., Cherkassky, V. L., Loewenstein, G., & Just, M. A. (2013). Identifying
emotions on the basis of neural activation. PLoS ONE 8, e66032.
*Khalfa, S., Schon, D., Anton, J. L., & Liégeois-Chauvel, C. (2005). Brain regions involved in the
recognition of happiness and sadness in music. Neuroreport 16, 1981–1984.
Kinomura, S., Larsson, J., Gulyás, B., & Roland, P. E. (1996). Activation by attention of the human
reticular formation and thalamic intralaminar nuclei. Science 271, 512–515.
Kivy, P. (1990). Music alone: Reflections on a purely musical experience. Ithaca, NY: Cornell
University Press.
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15,
170–180.
*Koelsch, S., Fritz, T., & Schlaug, G. (2008). Amygdala activity can be modulated by unexpected
chord functions during music listening. Neuroreport 19, 1815–1819.
*Koelsch, S., Fritz, T., von Cramon, D. Y., Müller, K., & Friederici, A. D. (2006). Investigating
emotion with music: An fMRI study. Human Brain Mapping 27, 239–250.
*Koelsch, S., Kilches, S., Steinbeis, N., & Schelinski, S. (2008). Effects of unexpected chords and of
performer’s expression on brain responses and electrodermal activity. PLoS ONE 3, e2631.
*Koelsch, S., Remppis, A., Sammler, D., Jentschke, S., Mietchen, D., Fritz, T., … Siebel, W. A.
(2007). A cardiac signature of emotionality. European Journal of Neuroscience 26, 3328–3338.
Koelsch, S., Siebel, W. A., & Fritz, T. (2010). Functional neuroimaging. In P. N. Juslin & J. A.
Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 313–344).
Oxford: Oxford University Press.
*Koelsch, S., Skouras, S., Fritz, T., Herrera, P., Bonhage, C., Küssner, M. B., & Jacobs, A. M.
(2013). The roles of superficial amygdala and auditory cortex in music-evoked fear and joy.
NeuroImage 81, 49–60.
Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential
for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13, e0190057.
Kreutz, G., & Lotze, M. (2007). Neuroscience of music and emotion. In W. Gruhn & F. Rauscher
(Eds.), Neurosciences in music pedagogy (pp. 143–167). New York: Nova.
*Kreutz, G., Ott, U., & Wehrum, S. (2006). Cerebral correlates of musically-induced emotions: An
fMRI-study. In M. Baroni et al. (Eds.), Proceedings of the Ninth International Conference on
Music Perception and Cognition (ICMPC). Bologna, August 22–26.
Lane, R. D. (2000). Neural correlates of conscious emotional experience. In R. D. Lane & L. Nadel
(Eds.), Cognitive neuroscience of emotion (pp. 345–370). Oxford: Oxford University Press.
Langer, S. K. (1957). Philosophy in a new key. Cambridge, MA: Harvard University Press.
LeDoux, J. E. (2000). Cognitive-emotional interactions: Listen to the brain. In R. D. Lane & L.
Nadel (Eds.), Cognitive neuroscience of emotion (pp. 129–155). Oxford: Oxford University Press.
*Lehne, M., Rohrmeier, M., & Koelsch, S. (2014). Tension-related activity in the orbitofrontal cortex
and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9, 1515–
1523.
*Lerner, Y., Papo, D., Zhdanov, A., Belozersky, L., & Hendler, T. (2009). Eyes wide shut: Amygdala
mediates eyes-closed effect on emotional experience with music. PLoS ONE 4, e6230.
*Liégeois-Chauvel, C., Bénar, C., Krieg, J., Delbé, C., Chauvel, P., Giusiano, B., & Bigand, E.
(2014). How functional coupling between the auditory cortex and the amygdala induces musical
emotion: A single case study. Cortex 60, 82–93.
*Lin, Y.-P., Wang, C.-H., Jung, T.-P., Wu, T.-L., Jeng, S.-K., Duann, J.-R., & Chen, J.-H. (2010).
EEG-based emotion recognition in music listening. IEEE Transactions on Biomedical Engineering
57, 1798–1806.
Lindquist, K. A., Wager, T. D., Kober, H., Bliss-Moreau, E., & Barrett, L. F. (2012). The brain basis
of emotion: A meta-analytic review. Behavioral and Brain Sciences 35, 121–143.
*Logeswaran, N., & Bhattacharya, J. (2009). Crossmodal transfer of emotion by music. Neuroscience
Letters 455, 129–133.
Lundqvist, L.-O., Carlsson, F., Hilmersson, P., & Juslin, P. N. (2009). Emotional responses to music:
Experience, expression, and physiology. Psychology of Music 37, 61–90.
MacDonald, R., Kreutz, G., & Mitchell, L. (Eds.). (2012). Music, health, and well-being. Oxford:
Oxford University Press.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in
Broca’s area: An MEG study. Nature Neuroscience 4, 540–545.
*Matthews, B. R., Chang, C.-C., De May, M., Engstrom, J., & Miller, B. L. (2009). Pleasurable
emotional response to music: A case of neurodegenerative generalized auditory agnosia.
Neurocase 15, 248–259.
*Mazzoni, M., Moretti, P., Pardossi, L., Vista, M., & Muratorio, A. (1993). A case of music
imperception. Journal of Neurology, Neurosurgery & Psychiatry 56, 322–324.
*Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological
connectivity of the mesolimbic system. NeuroImage 28, 175–184.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago: University of Chicago Press.
Miserendino, M. J. D., Sananes, C. B., Melia, K. R., & Davis, M. (1990). Blocking of acquisition but
not expression of conditioned fear-potentiated startle by NMDA antagonists in the amygdala.
Nature 345, 716–718.
*Mitterschiffthaler, M. T., Fu, C. H., Dalton, J. A., Andrew, C. M., & Williams, S. C. (2007). A
functional MRI study of happy and sad affective states induced by classical music. Human Brain
Mapping 28, 1150–1162.
*Mizuno, T., & Sugishita, M. (2007). Neural correlates underlying perception of tonality-related
emotional contents. Neuroreport 18, 1651–1655.
*Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J., … Möller, H. E. (2015).
Investigating the dynamics of the brain response to music: A central role of the ventral
striatum/nucleus accumbens. NeuroImage 116, 68–79.
Murphy, F. C., Nimmo-Smith, I., & Lawrence, A. D. (2003). Functional neuroanatomy of emotions:
A meta-analysis. Cognitive, Affective, & Behavioral Neuroscience 3, 207–233.
*Nair, D. G., Large, E. W., Steinberg, F., & Kelso, J. A. S. (2002). Perceiving emotion in expressive
piano performance: A functional MRI study. In K. Stevens et al. (Eds.), Proceedings of the 7th
International Conference on Music Perception and Cognition, July 2002 (CD rom). Adelaide,
Australia: Causal Productions.
Nyberg, L., McIntosh, A. R., Houle, S., Nilsson, L.-G., & Tulving, E. (1996). Activation of medial-
temporal structures during episodic memory retrieval. Nature 380, 715–717.
*Omar, R., Hailstone, J. C., Warren, J. E., Crutch, S. J., & Warren, J. D. (2010). The cognitive
organization of music knowledge: A clinical analysis. Brain 133, 1200–1213.
*Omar, R., Henley, S., Bartlett, J. W., Hailstone, J. C., Gordon, E., Sauter, D. A., … Warren, J. D.
(2011). The structural neuroanatomy of music emotion recognition: Evidence from frontotemporal
lobar degeneration. NeuroImage 56, 1814–1821.
Osborne, J. W. (1980). The mapping of thoughts, emotions, sensations, and images as responses to
music. Journal of Mental Imagery 5, 133–136.
*Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S.
(2005). Emotion processing of major, minor, and dissonant chords: A functional magnetic
resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453.
Paquette, S., Takerkart, S., Saget, S., Peretz, I., & Belin, P. (2018). Cross-classification of musical
and vocal emotions in the auditory cortex. Annals of the New York Academy of Sciences 1423,
329–337.
Pascual-Leone, A., Davey, N. J., Rothwell, J., Wassermann, E. M., & Puri, B. K. (Eds.). (2002).
Handbook of transcranial magnetic stimulation. Oxford: Oxford University Press.
Paulmann, S., Ott, D. V. M., & Kotz, S. A. (2011). Emotional speech perception unfolding in time:
The role of the basal ganglia. PLoS ONE 6, e17694.
*Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010).
Functional specializations for music processing in the human newborn brain. Proceedings of the
National Academy of Sciences 107, 4758–4763.
*Pereira, C. S., Teixeira, J., Figueiredo, P., Xavier, J., Castro, S. L., & Brattico, E. (2011). Music and
emotions in the brain: Familiarity matters. PLoS ONE 6, e27241.
Peretz, I. (2001). Listen to the brain: A biological perspective on musical emotions. In P. N. Juslin &
J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 105–134). Oxford: Oxford
University Press.
Peretz, I. (2010). Towards a neurobiology of musical emotions. In P. N. Juslin & J. A. Sloboda
(Eds.), Handbook of music and emotion: Theory, research, applications (pp. 99–126). Oxford:
Oxford University Press.
*Peretz, I., & Gagnon, L. (1999). Dissociation between recognition and emotional judgment for
melodies. Neurocase 5, 21–30.
*Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants,
immediacy, and isolation after brain damage. Cognition 68, 111–141.
Pessoa, L. (2013). The cognitive-emotional brain: From interactions to integration. Cambridge, MA:
MIT Press.
*Petrini, K., Crabbe, F., Sheridan, C., & Pollick, F. E. (2011). The music of your emotions: Neural
substrates involved in detection of emotional correspondence between auditory and visual music
actions. PloS ONE 6, e19165.
Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in
Cognitive Sciences 10, 59–63.
Rossignol, S., & Jones, G. (1976). Audio-spinal influence in man studied by the H-reflex and its
possible role on rhythmic movements synchronized to sound. Electroencephalography and
Clinical Neurophysiology 41, 83–92.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology
39, 1161–1178.
Saarimäki, H., Gotsopoulos, A., Jääskeläinen, I. P., Lampinen, J., Vuilleumier, P., Hari, R., …
Nummenmaa, L. (2016). Discrete neural signatures of basic emotions. Cerebral Cortex 26, 2563–
2573.
Sacchetti, B., Scelfo, B., & Strata, P. (2005). The cerebellum: Synaptic changes and fear
conditioning. The Neuroscientist 11, 217–227.
*Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. (2011). Anatomically distinct
dopamine release during anticipation and experience of peak emotion to music. Nature
Neuroscience 14, 257–262.
*Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J.
(2013). Interactions between the nucleus accumbens and auditory cortices predict music reward
value. Science 340, 216–219.
*Sammler, D., Grigutsch, M., Fritz, T., & Koelsch, S. (2007). Music and emotion:
Electrophysiological correlates of the processing of pleasant and unpleasant music.
Psychophysiology 44, 293–304.
*Satoh, M., Nakase, T., Nagata, K., & Tomimoto, H. (2011). Musical anhedonia: Selective loss of
emotional experience in listening to music. Neurocase 17, 410–417.
Scherer, K. R. (1999). Appraisal theories. In T. Dalgleish & M. Power (Eds.), Handbook of cognition
and emotion (pp. 637–663). Chichester: Wiley.
Scherer, K. R., & Zentner, M. R. (2001). Emotional effects of music: Production rules. In P. N. Juslin
& J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361–392). Oxford: Oxford
University Press.
Schirmer, A., & Kotz, S. A. (2006). Beyond the right hemisphere: Brain mechanisms mediating vocal
emotional processing. Trends in Cognitive Sciences 10, 24–30.
*Schmidt, B., & Hanslmayr, S. (2009). Resting frontal EEG alpha-asymmetry predicts the evaluation
of affective musical stimuli. Neuroscience Letters 460, 237–240.
*Schmidt, L. A., & Trainor, L. J. (2001). Frontal brain electrical activity (EEG) distinguishes valence
and intensity of musical emotions. Cognition & Emotion 15, 487–500.
*Schmidt, L. A., Trainor, L. J., & Santesso, D. L. (2003). Development of frontal encephalogram
(EEG) and heart rate (ECG) responses to affective musical stimuli during the first 12 months of
post-natal life. Brain and Cognition 52, 27–32.
*Shahabi, H., & Moghimi, S. (2016). Toward automatic detection of brain responses to emotional
music through analysis of EEG effective connectivity. Computers in Human Behavior 58, 231–
239.
*Singer, N., Jacoby, N., Lin, T., Raz, G., Shpigelman, L., Gilam, G., … Hendler, T. (2016). Common
modulation of limbic network activation underlies musical emotions as they unfold. NeuroImage
141, 517–529.
Sloboda, J. A., & Juslin, P. N. (2001). Psychological perspectives on music and emotion. In P. N.
Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 71–104). Oxford:
Oxford University Press.
Spitzer, M. (2013). Sad flowers: Affective trajectory in Schubert’s Trockne Blumen. In T. Cochrane,
B. Fantini, & K. R. Scherer (Eds.), The emotional power of music (pp. 7–21). Oxford: Oxford
University Press.
*Spreckelmeyer, K. N., Altenmüller, E., Colonius, H., & Münte, T. F. (2013). Preattentive processing
of emotional musical tones: A multidimensional scaling and ERP study. Frontiers in Psychology 4,
656. doi:10.3389/fpsyg.2013.00656
*Spreckelmeyer, K. N., Kutas, M., Urbach, T. P., Altenmüller, E., & Münte, T. F. (2006). Combined
perception of emotion in pictures and musical sounds. Brain Research 1070, 160–170.
*Steinbeis, N., & Koelsch, S. (2009). Understanding the intentions behind man-made products elicits
neural activity in areas dedicated to mental state attribution. Cerebral Cortex 19, 619–623.
*Steinbeis, N., Koelsch, S., & Sloboda, J. A. (2006). The role of musical structure in emotion:
Investigating neural, physiological, and subjective emotional responses to harmonic expectancy
violations. Journal of Cognitive Neuroscience 18, 1380–1393.
Stromswold, K., Caplan, D., Alpert, N., & Rauch, S. (1996). Localization of syntactic comprehension
by positron emission tomography. Brain and Language 52, 452–473.
*Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., … Yanai, K. (2008).
Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive,
Affective, & Behavioral Neuroscience 8, 126–131.
Thaut, M. H., & Wheeler, B. L. (2010). Music therapy. In P. N. Juslin & J. A. Sloboda (Eds.),
Handbook of music and emotion: Theory, research, applications (pp. 819–848). Oxford: Oxford
University Press.
Thornton-Wells, T. A., Cannistraci, C. J., Anderson, A. W., Kim, C. Y., Eapen, M., Gore, J. C., … Dykens, E. M. (2010). Auditory attraction: Activation of visual cortex by music and sound in
Williams syndrome. American Journal on Intellectual and Developmental Disabilities 115, 172–
189.
Tierney, A., & Kraus, N. (2013). The ability to move to a beat is linked to the consistency of neural
responses to sound. Journal of Neuroscience 33, 14981–14988.
*Trost, W., Ethofer, T., Zentner, M. R., & Vuilleumier, P. (2012). Mapping aesthetic musical
emotions in the brain. Cerebral Cortex 22, 2769–2783.
Trost, W., Frühholz, S., Schön, D., Labbé, C., Pichon, S., Grandjean, D., & Vuilleumier, P. (2014).
Getting the beat: Entrainment of brain activity by musical rhythm and pleasantness. NeuroImage
103, 55–64.
*Tsang, C. D., Trainor, L. J., Santesso, D. L., Tasker, S. L., & Schmidt, L. A. (2001). Frontal EEG
responses as a function of affective musical features. Annals of the New York Academy of Sciences
930, 439–442.
Wager, T. D., Barrett, L. F., Bliss-Moreau, E., Lindquist, K. A., Duncan, S., Kober, H., & Mize, J.
(2008). The neuroimaging of emotion. In M. Lewis, J. M. Haviland-Jones, & L. F. Barrett (Eds.),
Handbook of emotions (3rd ed., pp. 249–267). New York: Guilford Press.
Wagner, A. D., Shannon, B. J., Kahn, I., & Buckner, R. L. (2005). Parietal lobe contributions to
episodic memory retrieval. Trends in Cognitive Sciences 9, 445–453.
Warren, J. E., Sauter, D. A., Eisner, F., Wiland, J., Dresner, M. A., Wise, R. J., … Scott, S. K. (2006).
Positive emotions preferentially engage an auditory-motor “mirror” system. Journal of
Neuroscience 26, 13067–13075.
Zentner, M. R. (2010). Homer’s prophecy: An essay on music’s primary emotions. Music Analysis
29, 102–125.

1. The dopaminergic mesolimbic reward pathway involving the nucleus accumbens is a prime suspect when it comes to positive emotions. The hippocampus and the amygdala are both considered key areas for emotional memories, which are presumably less involved during the comparatively “neutral” perception of emotions.
CHAPTER 14

NEUROCHEMICAL RESPONSES TO MUSIC

YUKO KOSHIMORI

Music is used in various settings in everyday life and can modulate our mood, emotion, arousal, motivation, and movement. These music-induced
effects can be objectively assessed using neuroimaging techniques and
peripheral biomarkers. For example, functional neuroimaging studies have
demonstrated that listening to music alters brain activity in various brain
regions in the mesocorticolimbic pathways such as the anterior cingulate
cortex, orbitofrontal cortex, insula, amygdala, hippocampus, and ventral
striatum, which are implicated in reward, motivation, and emotional
behaviors, in addition to the brain regions in the motor pathways such as the
premotor and supplementary motor areas, thalamus, basal ganglia, and
cerebellum. These neuroimaging studies reveal important anatomical
information that also allows us to infer what brain functions music can
modulate. However, within the same anatomical region, different
neuroreceptors are expressed. Knowledge of the neurochemical functions
can uncover more specific effects of music on various brain functions and
help us to better understand the effects of music on brain pathology.
Neurochemical functions can be measured using positron emission
tomography (PET). PET imaging is a nuclear medicine imaging technique
to quantify the chemical/biological processes of molecules in vivo by
injecting radiolabeled molecules (i.e., radioligands; Venneti, Lopresti, &
Wiley, 2013). Radioligands typically resemble endogenous biological molecules and bind specific biological targets, which allows the distribution of those targets to be mapped in the brain. The radioligand is
synthesized by labeling a precursor molecule with short-lived radionuclides
such as carbon-11 (t1/2 = 20.4 min) or fluorine-18 (t1/2 = 109.8 min). After
the radioligand is injected intravenously, it enters the bloodstream, crosses
the blood–brain barrier, and binds target receptors or proteins in the brain.
The radioligands can be agonists that induce downstream signaling in a
manner similar to the endogenous molecules or antagonists that block the
receptor and prevent it from being available to the endogenous molecules
(Gunn, Slifstein, Searle, & Price, 2015). Radioligands have been developed for multiple targets, including the dopamine (DA), serotonin (5-HT), norepinephrine (NE), opioid, and acetylcholine (ACh) systems (Gunn et al., 2015). PET imaging with these radioligands can therefore reveal music-induced neurotransmitter release at target neuroreceptors. On the
other hand, major limitations of PET imaging are cost, invasiveness, and
limited accessibility. Because of these limitations, there have been few
studies using PET imaging to investigate music-induced neurochemical
changes. Accordingly, the research findings discussed in this chapter are
primarily based on the molecular concentrations or secretion rate of the
peripheral biomarkers in blood (e.g., plasma and platelets), saliva, and
urine. It should be noted that some central and peripheral chemicals serve
different functions (e.g., norepinephrine), and whether some peripheral
measures reflect the central measures is debatable (e.g., oxytocin), which is
discussed later in this chapter. This chapter covers neurotransmitters
including DA, 5-HT, NE, and ACh; neuropeptides such as beta (β)-
endorphin, oxytocin (OT), and arginine vasopressin (AV), as well as their
receptors and associated genes; steroid hormones such as cortisol; and
peripheral immune biomarkers.
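As a brief aside on feasibility, the short half-lives quoted above directly limit how long a PET scan can usefully run. A minimal Python sketch of the standard exponential-decay relation, A(t) = A0 * 2^(-t / t1/2), is given below; the 60-minute scan duration is an assumed example for illustration, not a value from the studies discussed in this chapter.

    def remaining_fraction(t_min: float, half_life_min: float) -> float:
        """Fraction of injected radioactivity remaining after t_min minutes."""
        return 0.5 ** (t_min / half_life_min)

    # Half-lives quoted above for the two common PET radionuclides.
    half_lives_min = {"carbon-11": 20.4, "fluorine-18": 109.8}

    for nuclide, t_half in half_lives_min.items():
        # A 60-minute scan is a hypothetical duration, chosen for illustration.
        frac = remaining_fraction(60.0, t_half)
        print(f"{nuclide}: {frac:.1%} of activity remains after 60 min")

Run as written, this prints roughly 13 percent remaining for carbon-11 versus 68 percent for fluorine-18, which is part of why carbon-11 radioligands typically require an on-site cyclotron and tightly timed scan protocols.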

DOPAMINE SYSTEMS
Dopamine (DA) is synthesized in the cytosol of catecholaminergic neurons
in the ventral tegmental area (VTA) and in the substantia nigra pars
compacta (SNpc) of the brain. VTA DA neurons project to the ventral striatum/nucleus accumbens (NAc), amygdala, and hippocampus, as well as to medial prefrontal regions such as the orbitofrontal and anterior cingulate cortices, whereas SNpc DA neurons project to the dorsal striatum (i.e., putamen and caudate nucleus). These DA pathways are commonly referred to as the mesocorticolimbic and nigrostriatal dopamine pathways, respectively. The
former pathway is associated with emotional/motivational functions
whereas the latter pathway is more involved in the executive/cognitive and
sensorimotor functions (Solís & Moratalla, 2018). There are five dopamine
receptor subtypes, which are classified based on their functional properties
and subdivided into D1-like and D2-like families. The D1-like family
consists of D1 and D5 receptors and the D2-like family consists of D2, D3,
and D4 receptors. DA is also synthesized in the adrenal medulla and acts as
a hormone along with other catecholamines (see section “Norepinephrine
Systems”).
Numerous functional neuroimaging studies have demonstrated that
music alters brain activity in the DA pathways associated with
reward/motivation (Menon & Levitin, 2005; Salimpoor, van den Bosch,
Kovacevic, McIntosh, & Dagher, 2013), emotion/pleasure (Koelsch, 2014;
Mueller et al., 2015; Salimpoor, Benovoy, Larcher, Dagher, & Zatorre,
2011), as well as motor functions (Grahn & Rowe, 2009). However, to date
there has been only one study that investigated dopaminergic transmission
in the ventral and dorsal striatum associated with musical pleasure
(Salimpoor et al., 2011). This study employed PET with [11C]raclopride, a D2/D3 receptor antagonist, and found greater DA release in the bilateral dorsal and ventral striatum, most notably in the right caudate nucleus and the right NAc, when participants listened to self-selected pleasurable music compared to neutral music. Furthermore, greater DA release in the right caudate nucleus was associated with a greater number of pleasure peaks, or “chills,” whereas greater DA release in the right NAc was associated with more intense chills. These
anatomically distinct roles of the subregions in music listening were further
elucidated by analyzing their temporal brain activation using functional
magnetic resonance imaging (fMRI). The increased brain activity in the
right caudate nucleus occurred several seconds prior to experiencing the
pleasurable peak while the enhanced activity in the right NAc occurred
during the pleasurable moments. The authors interpreted these findings as
indicating that the former structure is involved in the anticipation and
prediction of pleasure and the latter structure, in experiencing pleasure. This
study demonstrated that musical pleasure is associated with DA release in
the ventral striatum, particularly in the NAc. However, as only individuals
who regularly experience “chills” during music listening were selected to
participate in the study, further research is needed to investigate whether
DA is also released during listening to pleasurable music in those who have
never experienced “chills.”
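A brief methodological note: in [11C]raclopride studies such as this one, DA release is inferred rather than observed directly. Released dopamine competes with the radioligand for D2/D3 receptors, so release appears as a reduction in the non-displaceable binding potential (BP_ND) relative to a control condition. The minimal Python sketch below illustrates this standard percent-change computation; the region-of-interest values are invented placeholders, not data from Salimpoor et al. (2011).

    def percent_bp_reduction(bp_control: float, bp_music: float) -> float:
        """Percent reduction in [11C]raclopride binding potential (BP_ND).

        A positive value is conventionally read as increased endogenous
        dopamine release in the music condition, because released dopamine
        competes with the radioligand for D2/D3 receptors.
        """
        return 100.0 * (bp_control - bp_music) / bp_control

    # Hypothetical region-of-interest values for illustration only:
    # (BP_ND with neutral music, BP_ND with pleasurable music).
    rois = {
        "right caudate nucleus": (2.50, 2.28),
        "right NAc": (2.10, 1.95),
    }

    for roi, (bp_neutral, bp_pleasurable) in rois.items():
        change = percent_bp_reduction(bp_neutral, bp_pleasurable)
        print(f"{roi}: {change:.1f}% BP_ND reduction")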
In addition to musical pleasure, the role of DA in music perception and
auditory-motor entrainment was investigated in Parkinson’s disease (PD), which is primarily characterized by loss of dopaminergic neurons in the SNpc,
resulting in depletion of dopaminergic input to the dorsal striatum. The
involvement of DA in these functions was investigated when people with
PD were on and off dopaminergic medication. One study showed that
dopaminergic medication improved music perception (Cameron, Pickett,
Earhart, & Grahn, 2016). However, it is unknown whether the improvement was due to the medication or to a practice effect, which was also observed in the healthy participants. Another study, using PET with [11C]DTBZ, a radioligand that binds the vesicular monoamine transporter 2 (VMAT2), did not find a strong association between dopaminergic denervation and auditory-motor task
performance (Miller et al., 2013). However, when the people with PD were
grouped based on the similarity of dopaminergic denervation, the auditory-
motor synchronization accuracy paralleled the pattern of denervation. On
the other hand, two fMRI studies did not implicate DA in
auditory-motor entrainment (Elsinger et al., 2003; Jahanshahi et al., 2010).
In addition to these central dopaminergic functions, dopamine-
associated gene expression and peripheral dopaminergic levels were also
investigated. The expression of alpha-synuclein (SNCA) that maintains DA
neuronal homeostasis (Murphy, Rueter, Trojanowski, & Lee, 2000;
Oczkowska, Kozubski, Lianeri, & Dorszewska, 2014) was upregulated in
professional musicians after a two-hour concert performance compared to
after a music-free control condition (Kanduri, Kuusi, et al., 2015), as well
as in listeners with musical experiences longer than 10 years and those with
high musical aptitude after listening to 20 minutes of classical music
(Kanduri, Raijas, et al., 2015). The latter study also reported that the
upregulation was absent after a music-free control condition or in listeners
with no significant musical experience (Kanduri, Kuusi, et al., 2015;
Kanduri, Raijas, et al., 2015).
A few studies investigated the dopaminergic levels in the peripheral
samples as well as associated psychological measures using an auditory
stimulus and music. Two studies reported decreased dopaminergic levels:
one in the urine sample following daily listening to binaural beats for 60
days in healthy adults, who also reported decreased trait anxiety (Wahbeh,
Calabrese, & Zwickey, 2007), and the other in the plasma sample
following a 12-week dance movement therapy (DMT) that combines music,
light exercise, and sensory stimulation in female adolescents with mild
depression whose psychological distress was also reduced (Jeong et al.,
2005). One study reported no change in the plasma dopaminergic level after listening to music (high- or low-uplifting) in healthy adults who had performed a stress-inducing task (Hirokawa & Ohira, 2003). However, it is unknown whether the stress-inducing task affected the
plasma dopaminergic level before listening to music.
To summarize, there is some evidence that music enhances
dopaminergic function. Listening to pleasurable music induces dopamine
release, and music upregulates SNCA expression, which may facilitate
dopaminergic neurotransmission. However, these responses may occur only
in specific people (i.e., those with extensive musical training or those who
regularly experience “chills” by listening to pleasurable music). Further
studies are needed to investigate these effects in individuals with varying
music education/training levels and listening habits and experiences, as well
as using different genres of music. Music may also be able to reduce the
peripheral DA levels and psychological disturbances. Future studies
including both clinical and healthy control participants are needed to clarify
these effects. In addition, some PD studies are suggestive of the role of DA
in music perception and auditory cuing. Further pharmacological studies in people with PD are needed to address the limitations of previous work.

ENDOGENOUS OPIOID SYSTEMS
The central endogenous opioid system (EOS) includes three opioid peptides (β-endorphin, enkephalins, and dynorphin) and their three receptors: mu (μ), delta (δ), and kappa (κ) (Benarroch, 2012). Neurons
containing β-endorphin are localized in two main areas, the arcuate nucleus
of the hypothalamus and the nucleus tractus solitarius of the brainstem,
which send widespread projections to the rest of the brain. Enkephalin and
dynorphin are primarily located in local neurons. Opioid receptors are
widely but differentially expressed throughout the central nervous system
(CNS): the μ receptor is the most abundant receptor in the cerebral cortex,
amygdala, thalamus, brainstem, dorsal horn and dorsal root ganglion (DRG)
neurons; the δ receptor is mostly expressed in the olfactory system,
striatum, limbic cortex, dorsal horn and DRG neurons; and the κ receptor in
the claustrum, striatum, hippocampus, hypothalamus, brainstem, and dorsal
horn. The EOS is involved in various functions such as reward, pain modulation,
stress responses, and autonomic control.
In the previous section, the study by Salimpoor et al. (2011) showed that
musical pleasure was associated with DA release in the ventral striatum.
However, the central EOS, acting synergistically with DA, likely plays a primary role in the positive affect or pleasure induced by music (Chanda
& Levitin, 2013). Both endogenous and exogenous opioids can activate DA
neurons in the VTA (Hjelmstad, Xia, Margolis, & Fields, 2013) that innervate the NAc. Both the EOS and the dopamine system are involved in
reward mechanisms, which consist of liking/pleasure (core reactions to
hedonic experience of receiving a reward), wanting (motivational aspect),
and learning (association and cognitive representations) (Berridge &
Kringelbach, 2015), but animal research favors the notion that the EOS but
not the DA system generates pleasure. Specifically, stimulation of μ opioid
receptors in the rostrodorsal part of the medial shell in the NAc, the
posterior ventral pallidum, as well as anterior orbitofrontal cortex and
posterior insula (Castro & Berridge, 2017) enhances pleasure. In fact, two
studies reported that blocking opioid receptors attenuated musical thrills
(Goldstein, 1980) and pleasure (Mallik, Chanda, & Levitin, 2017) in
response to participant-selected music.
In addition to the role of the central EOS in musical pleasure, several
studies investigated plasma levels of β-endorphin, which is released from
the anterior pituitary gland (see also the section “Neuroendocrine Systems
II”), acts as a hormone associated with stress (Kreutz, Murcia, & Bongard,
2012), and therefore functions differently from the β-endorphin in the CNS
(Veening, Gerrits, & Barendregt, 2012).
Listening to techno music increased the plasma β-endorphin level, accompanied by increases in other psychophysiological measures and changes in emotional states, whereas listening to classical music did not affect them (Gerra et al., 1998). Interestingly, this study also showed an association between β-endorphin responses to music and personality traits: higher β-endorphin responses were associated with lower novelty seeking.
In contrast to enhanced EOS responses, a decrease in the plasma
concentration of β-endorphin was reported in response to experimenter-
selected relaxation music, accompanied by a reduction in worries, fear,
and blood pressure in coronary patients (Vollert, Störk, Rose, & Möckel,
2003) as well as after a single one-hour singing session in choirs affected by
cancer including carers, bereaved carers, and patients (Fancourt, Williamon,
et al., 2016). In this latter study, the β-endorphin level showed negative
correlations with the levels of immune biomarkers and a positive
correlation with another stress biomarker. Another study reported a
decrease in response to experimenter-selected classical music and imagery,
but not to music only or imagery only (McKinney, Tims, Kumar, & Kumar,
1997).
In summary, musical pleasure is associated with the central EOS and
music induces changes in the plasma concentration of β-endorphin. As the
EOS plays an important role in various functions and has two different
functional systems (i.e., central and peripheral systems), more research is
needed to replicate and extend existing literature. Suggested future studies
include PET studies that investigate both EOS and dopamine systems
associated with musical pleasure/reward, studies that investigate the effects
of different genres of music and personal traits on the release of β-
endorphin, as well as studies that investigate the effects of music on EOS
associated with pain modulation (i.e., μ opioid receptors in the central pain
network; Benarroch, 2012) and stress regulation including both healthy
participants and those with pain and stress. When the cerebrospinal fluid
(CSF) or peripheral β-endorphin levels are assessed, the diurnal fluctuations
should be taken into account. Moreover, it should be noted that plasma β-endorphin responses only weakly reflect those in the CSF, although the two are not entirely independent (Veening et al., 2012).
SEROTONIN SYSTEMS

Serotonin (5-HT) is synthesized in the raphe nuclei of the brainstem. Some of the 5-HT neurons project to the dorsal cochlear nucleus (DCN) and others send ascending projections to the inferior colliculus (IC), in which auditory neurons express multiple subtypes of 5-HT receptors (Hurley &
Sullivan, 2012). A few studies demonstrated that the pharmacological
stimulation of 5-HT receptors altered auditory perception and thereby
subjective feelings in healthy participants. Specifically, a serotonin 2A receptor (5-HT2A) agonist altered the neural response to both participant-selected,
personally meaningful music and experimenter-selected non-meaningful
music (Barrett, Preller, Herdener, Janata, & Vollenweider, 2017), enhanced
the emotion induced by experimenter-selected music (Kaelen et al., 2015),
as well as enhanced subjective experiences (mental imagery) accompanied by greater brain connectivity during listening to experimenter-selected
music (Kaelen et al., 2016). These studies suggest that the variance in the
neuroreceptor expression may play a role in subjective musical experiences.
Several studies investigated genetic associations between 5-HT systems
and musical ability/behavior using conventional genetic approaches such as
genome-wide linkage scans, association studies, copy number variation
studies, and candidate gene analyses. However, the associations are weak
and inconclusive. In a small sample, musical traits were associated with the
protocadherin-alpha gene (Ukkola-Vuoti et al., 2013), which is important for maturation of serotonergic projections (Katori et al., 2009), and with the galactose mutarotase gene, which plays a role in 5-HT release and in
membrane trafficking of 5-HT transporter (Djurovic et al., 2009). The
serotonin transporter gene (SLC6A4), which regulates 5-HT supply to the receptors, has been associated with musical memory (Granot et al., 2007)
and choir participation (Morley et al., 2012) whereas it showed weak
associations with musical aptitude (Ukkola, Onkamo, Raijas, Karma, &
Järvelä, 2009) and no association with active music listening (Ukkola-Vuoti
et al., 2011).
Serotonin is also implicated in behavioral states such as stress and
emotional behavior (Hurley & Sullivan, 2012), as well as in various psychiatric and neurologic disorders such as depression, anxiety disorders, obsessive-compulsive disorder, dementia, and post-traumatic stress disorder
(Bandelow et al., 2017). One study measured the platelet content of 5-HT as
a model of neural biochemistry and reported a decrease in response to experimenter-selected unpleasant music compared to pleasant music. This suggests that unpleasant music induces emotional stress or negative emotions, leading to 5-HT release and a decrease in the intracellular 5-HT content of serotonergic neurons, as reflected by the 5-HT content of platelets (Evers & Suhr, 2000). Another study reported that the plasma serotonin
concentration increased in female adolescents with mild depression who
had DMT whereas it decreased in those who had no intervention (Jeong et
al., 2005). However, the 5-HT levels did not significantly differ between
these two groups after the experimental session. There are also two studies
that reported no 5-HT changes following music interventions (Kumar et al.,
1999; Wahbeh et al., 2007).
In summary, the literature shows weak evidence of associations between
5-HT systems and music. However, music modulates the activity in brain
regions associated with emotion (Koelsch, 2014) and musical activities can
influence social behavior and interaction. Similarly, the 5-HT systems play
an important role in emotional behavior and social interaction (Hurley &
Sullivan, 2012). In addition, 5-HT closely interacts with neuropeptides—
oxytocin and arginine vasopressin that are implicated in social behavior and
social reward (Albers, 2015; Dölen, Darvishzadeh, Huang, & Malenka,
2013) and that have been associated with social aspects of music and
musical activities. Therefore, more studies are needed to fully understand
the relationships between central 5-HT systems and music.

NEUROENDOCRINE SYSTEMS I (POSTERIOR PITUITARY)

Two neuropeptides released from the posterior pituitary are oxytocin (OT)
and arginine vasopressin (AV). They are highly conserved across species
(Johnson & Young, 2017) and modulate social behaviors (Bachner-Melman
& Ebstein, 2014), including social cognition (Donaldson & Young, 2008)
and social affiliation (Insel, 2010), as well as reproductive behaviors. They
are also implicated in psychiatric disorders such as autism spectrum
disorder (Bachner-Melman & Ebstein, 2014; Donaldson & Young, 2008).
OT and AV are collectively called nonapeptides because they are composed
of nine amino acid residues (Acher & Chauvet, 1995). They are
predominantly synthesized in the magnocellular neurons in the
hypothalamic supraoptic and paraventricular nuclei and released centrally
and peripherally into the circulation through the posterior pituitary (Johnson
& Young, 2017) and thereby act as neuromodulators or neurohormones
(Bachner-Melman & Ebstein, 2014; Donaldson & Young, 2008). Although
several nonapeptide receptors have been identified in the brain, the OT receptor
(OTR), vasopressin receptor 1a (V1aR), and vasopressin receptor 1b
(V1bR) have been a major focus of investigation. These nonapeptide
receptors are expressed throughout auditory and mesolimbic pathways
(Johnson & Young, 2017).

Oxytocin
Several studies investigated peripheral OT responses to musical activities.
A single 30-minute singing lesson increased the serum OT level compared
to the baseline level in both professional and amateur singers (Grape,
Sandgren, Hansson, Ericson, & Theorell, 2003). Compared to a chatting
group, a singing group showed an increase in the salivary OT level and improved psychological well-being (Kreutz et al., 2012). In another
study, the plasma OT level increased in a small sample of four singers after
improvised singing, but it did not change after pre-composed singing
(Keeler et al., 2015). Furthermore, a group of boys with mild emotional
disturbance aged between 8 and 12 years showed an increased level of
salivary OT in the free session of group drumming compared to the practice
session, an effect that was not observed in girls of the same age or in an older group of boys (Yuhi et al., 2017).
In contrast to these findings, two studies reported that group singing
reduced the OT levels. One study found a decrease in the salivary OT level
after choir singing (Schladt et al., 2017). However, this change was not
observed after solo singing in the same participants. Instead, the OT level
increased after solo singing. Another study reported that singing in a single
70-minute choir rehearsal was associated with a decrease in the salivary OT
level across three populations affected by cancer (Fancourt, Williamon, et
al., 2016).
In addition to these studies with musical activities, the effect of passive
listening was investigated in two studies. An elevated plasma OT level was reported in cardiac surgery patients who listened passively to experimenter-selected “soothing” music (soft and relaxing, 60–80 beats per minute, at a volume of 50–60 dB) for 30 minutes one day after the surgery, but not in those who rested without listening to music (Nilsson, 2009). An elevated
plasma OT level was also observed in participants with Williams Syndrome
(WS) who listened to their favorite music that elicited positive emotions
(Dai et al., 2012).

Arginine Vasopressin
Arginine vasopressin receptor 1A (AVPR1A) is one of the main genes that
have been associated with musical activities and related behaviors in
genome-wide linkage and association studies (Bachner-Melman et al.,
2005; Granot et al., 2007; Mariath et al., 2017; Ukkola et al., 2009; Ukkola-
Vuoti et al., 2011). The AVPR1A microsatellites have been associated with
musical working memory (Granot et al., 2007; but also see Granot,
Uzefovsky, Bogopolsky, & Ebstein, 2013), musical aptitude (Ukkola et al.,
2009), active music listening (Ukkola-Vuoti et al., 2011), and a wide range
of musical abilities (e.g., musical abilities associated with tempo, rhythm,
dynamics, vocality, and pitch, as well as creativity and development of
musical ideas and accompaniment) (Mariath et al., 2017). Except for these
genetic studies, the relationships between AV and music have been little
explored. The only study that has measured the AV level was the study by
Dai et al. (2012) mentioned above, which also found an increase in the AV
level in participants with WS.
In summary, music induces peripheral OT responses, and there is some genetic association between AVPR1A and music. However, the directional
changes are not consistent among those OT studies. The elevated OT levels
are generally implicated in positive social experiences (Chanda & Levitin,
2013). However, OT is also released in response to various kinds of stress
(Brown, Cardoso, & Ellenbogen, 2016; de Jong et al., 2015; Pierrehumbert
et al., 2010). The reduction in the OT level may reflect lower arousal and
stress during choir singing (Schladt et al., 2017). Taking blood samples may
cause stress and increase the OT level in some participants at the baseline
measurement, confounding the findings. Alternatively, the inconsistent
findings may also be partially derived from different sampling methods.
Some studies measured the OT levels in plasma and others measured the
salivary OT. These peripheral levels are used as a proxy for the central OT
levels. However, there are no strong correlations between central (CSF) and
peripheral measures as well as between the peripheral measures (Carson et
al., 2015; Hoffman, Brownley, Hamer, & Bulik, 2012; Javor et al., 2014;
Lefevre et al., 2017; Valstad et al., 2017). Other possible factors explaining
the inconsistencies include the influence of gonadal steroids (Insel, 2010)
and subjective experiences of the musical activities employed in the study
(Yuhi et al., 2017). The baseline measurement in a healthy control group is
also needed to understand the directional changes and interactions when
clinical populations are studied.
To date, there has been only one study that investigated the effects of
music on both neuropeptides in a clinical population and no study in
healthy participants. For future studies measuring both neuropeptides, it
should be noted that OT and AV show similar directional changes for some
social behaviors such as pair bonding (Caldwell, 2017), whereas they show
different effects in some cases, and opposite effects for aggression (Ferris,
1992; MacLean et al., 2017), anxiety and stress (Bachner-Melman &
Ebstein, 2014; Heinrichs, von Dawans, & Domes, 2009), and social
approach (Thompson & Walton, 2004). In fact, several electrophysiological
experiments revealed their differential regulation of excitatory projections in the limbic system (Campbell-Smith, Holmes, Lingawi, Panayi, & Westbrook, 2015; Huber, Veinante, & Stoop, 2005; Lubin, Elliot, Black, & Johns, 2003; Numan et al., 2010). Furthermore, animal research suggests that the neuroanatomical distribution of their receptors may be critical for determining function.

NEUROENDOCRINE SYSTEMS II (ANTERIOR PITUITARY)
A neuroendocrine system commonly referred to as the hypothalamic-pituitary-adrenal (HPA) axis releases cortisol as its main effector hormone
(Spencer, Chun, Hartsock, & Woodruff, 2018). Cortisol plays an important
role in circadian and stress regulation. The basal cortisol levels fluctuate in
a circadian fashion in the absence of stressors, and the levels rise in response to acute physical or psychological stressors as well as to circadian entrainment. Circadian and stress-induced cortisol secretion is determined
by the neurohormone, corticotropin releasing factor (CRF) produced in and
secreted from the medial paraventricular nucleus of the hypothalamus. In
response to CRF, the anterior pituitary produces and secretes
adrenocorticotropic hormone (ACTH) and β-endorphin (see also the section
“Endogenous Opioid Systems”). Triggered by ACTH, cortisol is
synthesized in the adrenal cortex. It passively diffuses into the adrenal vein
and is carried throughout the circulatory system. In addition to CRF,
vasopressin (AVP) is also involved in this secretory process.
The cortisol levels discussed in this section are primarily salivary measures unless mentioned otherwise. Salivary cortisol is a valid and reliable measure of the unbound hormone in blood (Kirschbaum &
Hellhammer, 1994). Cortisol has been most studied as a stress biomarker in
response to music (Chanda & Levitin, 2013; Fancourt, Ockelford, & Belai,
2014; Hodges, 2010). There is a general consensus that relaxing music
regardless whether it is experimenter- or participant-selected reduces
cortisol levels (Beaulieu-Boire et al., 2013; Chanda & Levitin, 2013; Chen,
Sung, Lee, & Chang, 2015; Fancourt et al., 2014; Hodges, 2010; Jayamala,
Lakshmanagowda, Pradeep, & Goturu, 2015; Kreutz et al., 2012; Mejía-
Rubalcava, Alanís-Tavira, Mendieta-Zerón, & Sánchez-Pérez, 2015; but
also see null findings by Chen et al., 2015; Chlan, Engeland, & Savik,
2013; Good et al., 2013; Tan, McPherson, Peretz, Berkovic, & Wilson,
2014). However, when experimenter-selected relaxing music and
participant-selected music from a choice of genres were compared,
participant-selected music was more effective in reducing the cortisol level
as shown by a prolonged effect post-surgery (Leardi et al., 2007). Another
study suggests that the sound of rippling water may be more effective than
relaxing music in reducing the cortisol level (Thoma et al., 2013). However,
neither was significantly different from the control condition without
acoustic stimulation.
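Because basal cortisol also fluctuates diurnally, studies that take repeated saliva samples often summarize each participant's response as an area under the curve rather than as isolated readings. The minimal Python sketch below shows the common trapezoidal “area under the curve with respect to ground” (AUCg) computation; the sampling times and cortisol values are hypothetical, chosen only to illustrate the arithmetic.

    def auc_ground(times_min, values):
        """Trapezoidal area under the curve with respect to ground (AUCg),
        a common summary statistic for repeated salivary cortisol samples."""
        auc = 0.0
        for i in range(1, len(times_min)):
            dt = times_min[i] - times_min[i - 1]
            auc += (values[i] + values[i - 1]) / 2.0 * dt
        return auc

    # Hypothetical samples (nmol/L) at 0, 15, 30, and 45 minutes around
    # a music-listening session; the values are illustrative only.
    times = [0, 15, 30, 45]
    cortisol = [12.0, 10.5, 8.8, 8.0]
    print(f"AUCg = {auc_ground(times, cortisol):.1f} nmol*min/L")

Within-person comparisons of such summaries (e.g., a music session versus a silent control at the same time of day) help separate music-related changes from the underlying circadian decline.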
In addition to the effect of relaxing music, several studies investigated
the effect of stimulating music on the cortisol levels. Several studies reported that stimulating music also reduced the cortisol levels in female adolescents with chronic depression (Field et al., 1998), in surgical patients (Koelsch et al., 2011), in participants with hypertension (Möckel et al., 1995), in dancers (Quiroga Murcia, Kreutz, Clift, & Bongard, 2010), in participants with lung infection (le Roux, Bouic, & Bester, 2007), and in healthy males (Ooishi, Mukai, Watanabe, Kawato, & Kashino, 2017), whereas other
studies found an increase in healthy participants (Brownley, McMurray, &
Hackney, 1995; Gerra et al., 1998; Hébert, Béland, Dionne-Fournelle,
Crête, & Lupien, 2005; Karageorghis et al., 2017). These results suggest
that stimulating music can either attenuate or enhance the cortisol level,
which may depend on participant characteristics and/or their preference of
music.
Furthermore, there are some studies suggesting that music in general
may reduce the cortisol levels. For example, both participant-selected chill-
inducing music and music they disliked significantly reduced the cortisol
levels in both male and female participants (Fukui & Toyoshima, 2013).
Additionally, listening to music regardless of genre (Mozart, Strauss, and
ABBA) led to a significant reduction in the serum cortisol concentrations,
which were also significantly lower than those in the silence
condition (Trappe & Voit, 2016). Furthermore, another study reported that
both repetitive drumming and instrumental meditation music decreased the
cortisol levels (Gingras, Pohler, & Fitch, 2014). Taken together, cortisol
appears to be responsive to music in general.
Cortisol responses to music were also investigated in surgical patients
before, during, and after surgery. Listening to participant-selected
and experimenter-selected music during and post-surgery prevented the
cortisol level from increasing, and/or decreased the cortisol level post-
surgery (Graversen & Sommer, 2013; Nilsson, 2009; Schneider,
Schedlowski, Schürmeyer, & Becker, 2001; Tabrizi, Sahraei, & Rad, 2012;
but also see Lin, Lin, Huang, Hsu, & Lin, 2011). Comparing different
periods of time of music listening, one study reported that listening to
experimenter-selected relaxing music following the surgery was most
effective in reducing the level of serum cortisol relative to listening to
music in the pre- and peri-operative periods (Nilsson, Unosson, & Rawal,
2005). Altogether, these studies suggest that listening to music is beneficial
to surgical patients by reducing cortisol levels. Schneider et al. (2001)
reported that the majority of patients in the music group attributed the beneficial effect of music to distraction. Thus, how music exerts
the beneficial effect on the cortisol level and other behavioral measures in
surgical patients needs further investigation.
The counteracting effect of music on the elevated cortisol levels induced
by acute stressors was also studied in younger healthy participants.
Experimenter-selected relaxing music helped to reduce the cortisol level
immediately following a psychological stressor whereas the silence
condition led to an increase during the same recovery period (Khalfa, Dalla
Bella, Roy, Peretz, & Lupien, 2003), suggesting that relaxing music
facilitates faster recovery from the stressor. On the other hand,
experimenter-selected relaxing music did not lower the cortisol levels after
the exposure to a psychological stressor whereas it prevented stress-induced
increases in heart rate, systolic blood pressure, and anxiety compared to the
silence condition (Knight & Rickard, 2001). Another study used an acute
physiological stressor and demonstrated that tapping to the experimenter-
selected positive music post-stressor was associated with more positive
mood and stronger cortisol responses (i.e., increase) compared to tapping to
the neutral music (Koelsch et al., 2016). The positive mood was also
associated with the greater cortisol response to the acute stressor in the
music group. Authors interpreted these findings as indicating that the
stronger cortisol response may reflect an early sign of immuno-enhancing
response to the acute stressor, but not a higher stress level because the
music group overall had a more positive mood (Koelsch et al., 2016). There
was no effect of music on the level of ACTH in this study. The inconsistent
findings of these studies may be partly due to the different types of stressors used and to how music was applied.
Moreover, the effects of group musical activities on endocrine responses
have been studied. Singing was associated with a reduction in endocrine
responses (Fancourt, Aufegger, & Williamon, 2015; Fancourt, Williamon,
et al., 2016; Schladt et al., 2017). Cortisol reduction was greater for choir
singing than solo singing, accompanied by a reduction in the salivary OT
(Schladt et al., 2017). In addition, the effect of group singing on the
endocrine responses was modulated by the conditions of performance
(Fancourt et al., 2015). More specifically, reduced levels of cortisol and
cortisone were only observed in the low-stress condition (singing without
an audience) compared to the high-stress condition (singing in a live
concert). On the other hand, no endocrine changes were found following a
single session of group drumming (Bittman et al., 2001) or multiple
sessions of group drumming (Fancourt, Perkins, et al., 2016) in healthy
participants.
Music therapy, in which musical and other activities are led by a
therapist, showed mixed results. Following guided imagery and music (GIM) therapy, which combines relaxation techniques and listening to classical music, the cortisol level was reduced relative to the silence
condition in healthy participants (McKinney et al., 1997) and in individuals
on sick-leave (Beck, Hansen, & Gold, 2015). On the other hand, there were
no endocrine responses to an individualized music therapy in older healthy
adults (Suzuki, Kanamori, Nagasawa, Tokiko, & Takayuki, 2007); to an
individualized music therapy or to a multisensory stimulation environment
including auditory stimulation in older adults with severe dementia
(Valdiglesias et al., 2017); or to movement music therapy in older healthy
adults (Shimizu et al., 2013).
One study investigated the social effect of music listening on the cortisol
level (Linnemann, Strahler, & Nater, 2016). Listening to music in the
presence of others (mostly friends), but not listening alone, attenuated the
secretion of cortisol. However, the presence of others alone significantly
explained the variance in the cortisol level. In addition, the findings of this
study should be interpreted with caution since the time intervals between
music listening and the cortisol measurement were unknown.
Interestingly, listening to music for relaxation was associated with
significant reductions in the subjective stress level and in the cortisol
concentration in healthy participants (Linnemann, Ditzen, Strahler, Doerr,
& Nater, 2015). Moreover, the reduction in the cortisol level was not
associated with the perception of music as relaxing. The authors
emphasized the importance of non-musical, contextual factors such as
reasons for music listening. It would be interesting to compare the cortisol
response to a non-musical control activity for relaxation. This study also
showed that listening to music for distraction increased the stress level,
which contrasts with the findings in surgical patients (Schneider et al., 2001).
This may be due to differences in participants’ characteristics and/or
circumstances.
The effects of music on endocrine measures may be mediated by sex.
For example, testosterone showed opposite responses in men and women: music decreased testosterone in men but increased it in women
(Fukui & Toyoshima, 2013; Fukui & Yamashita, 2003). In addition, music
may have differential effects on men and women. In one study, after
strenuous exercise, men and women showed different trajectories of the
cortisol levels during a recovery period with music. This was observed
regardless of musical tempo (Karageorghis et al., 2017). In another study,
the cortisol level decreased more steeply in men relative to women in both
choir and solo singing (Schladt et al., 2017). In contrast, other studies did
not find any sex effect on the level of cortisol associated with music
listening (Fukui & Yamashita, 2003; Nater, Abbruzzese, Krebs, & Ehlert,
2006).
The studies discussed above included adult participants. Several studies
investigated the cortisol responses to music in younger age groups. In
schoolchildren, extra two-hour musical activities (singing, moving, dancing, or playing instruments) during a school year resulted in a reduction of the cortisol level measured in the afternoon at the end of the school year. However, this result reached statistical significance only when a one-tailed t-test was used (Lindblad, Hogmark, & Theorell, 2007). In preterm
infants, exposure to live instrumental music reduced the cortisol level along
with improvement of other measures for oxygen desaturations, apneas, and
pain (Schwilling et al., 2015). On the other hand, recorded lullabies did not
affect the cortisol level or sleep–wake behavior (Dorn et al., 2014).
Another study also did not find any effect of recorded lullaby combined
with touch on the cortisol level (Qiu et al., 2017). This study, however,
showed that following the intervention, blood β-endorphin was significantly increased, accompanied by decreased pain responses.
To summarize, it is relatively conclusive that music reduces the cortisol
level. The beneficial effects of music may be associated with distraction
from aversive states (Chanda & Levitin, 2013) in the context of acute
stressors (Linnemann, Kappert, et al., 2015) and/or the listener’s intention behind music listening (Linnemann, Ditzen, et al., 2015). Further studies are
needed to clarify how music exerts beneficial effects on stress biomarkers.
Endocrine responses have primarily been studied in association with stress, where multiple factors can affect the findings, for example whether the stressor is acute or chronic (Koelsch et al., 2016) or whether it is psychological, physiological, or physical. In addition,
appropriate stress response differs depending on the circadian phase
(Spencer et al., 2018). Therefore, more studies are warranted to further elucidate the effect of music on stress responses.

NOREPINEPHRINE SYSTEMS

Norepinephrine (NE) neurons are located in the brainstem, primarily in the locus coeruleus (LC), whose axons widely project to the cerebral cortex,
limbic regions, thalamus, and cerebellum as well as to the spinal cord. The
major NE projection from the LC is thought to play an important role in
stress responses and various psychiatric disorders (Hurley, Flashman,
Chow, & Taber, 2010). In addition, the NE neurons located in the caudal
pons and medulla are involved in the function of the sympathetic nervous
system (SNS), regulating the autonomic responses of heart rate, blood
pressure, and respiration. The activation of the SNS induced by physical or
psychological stressors releases NE, which stimulates the adrenal glands
that synthesize and secrete hormonal norepinephrine, epinephrine, and
dopamine (Kreutz et al., 2012).
Music has been studied as an intervention to reduce stress and to
normalize the SNS. A single therapeutic session using relaxing music
reduced the plasma level of epinephrine in critically ill patients, which was
also accompanied by reductions in the amount of sedative drug required,
in blood pressure, and in heart rate (Conrad et al., 2007). Similarly, another
study reported that music therapy sessions using familiar music lowered the
plasma NE level in the elderly with dementia and cardiovascular disease
compared to those without music therapy (Okada et al., 2009). The patients
in the music therapy group also showed improvement in other SNS
measures and a reduction in the number of congestive heart failure episodes. On the
other hand, music therapy sessions increased both NE and epinephrine
levels in males with Alzheimer’s disease (Kumar et al., 1999). The
increased levels were normalized at a six-week follow-up.
Two studies demonstrated the differential effects of stimulating and
relaxing music. Experimenter-selected slow-rhythm (classical) music
decreased the plasma NE level, whereas experimenter-selected fast-rhythm
music (from action movies) increased the plasma epinephrine level in
healthy male participants (Yamamoto et al., 2003). These changes did not
affect subsequent exercise performance. Similarly, using salivary
alpha-amylase as a surrogate biomarker of SNS activity, energizing music
increased the activity whereas relaxing music decreased it in healthy
participants (Linnemann, Ditzen, et al., 2015).
One study showed effects of music, and of music genre, on NE and
epinephrine levels in patients but not in healthy participants
(Möckel et al., 1995). Hypertensive participants who selected modern
classical music from a choice of preselected genres showed a reduction in
the NE level, whereas those who selected meditative music showed a
reduced epinephrine level. The participants in both groups also showed
reductions in other stress biomarkers.
Another study, in Muslim participants listening to music during a dental
procedure, showed that experimenter-selected religious Islamic music
reduced the plasma NE level whereas classical music increased it,
accompanied by changes in the same direction in systolic blood pressure.
The pre- to post-procedure differences in NE levels and systolic blood
pressure in the religious music group also differed significantly from those
in the classical music and no-music groups (Maulina, Djustiana, & Shahib,
2017). These studies suggest that the personal significance of music may
play an important role in exerting positive effects on hormonal and
physiological measures.
In contrast, there were no catecholaminergic changes in response to
experimenter-selected stimulating music (Hirokawa & Ohira, 2003) or
experimenter-selected relaxing music in healthy participants (Gerra et al.,
1998; Hirokawa & Ohira, 2003); experimenter-selected relaxing music in
post-operative critically ill patients (Conrad et al., 2007); participant-
selected music from a list, from a choice of genres, or according to their own preferences in
preoperative patients (Lin et al., 2011; Schneider et al., 2001; Wang,
Kulkarni, Dolev, & Kain, 2002) or in those receiving ventilator support
(Chlan, Engeland, & Anthony, 2007); or participant-selected music from a
choice of genre in patients under general anesthesia (Migneault et al., 2004)
or in post-operative patients (Lin et al., 2011). Furthermore, experimenter-
selected positive music from various genres and with varying tempi that
evoked feelings of pleasure and happiness did not change NE levels
compared to a neutral auditory stimulus following an acute physiological
stressor (Koelsch et al., 2016).
To summarize, the literature shows conflicting results on peripheral
catecholaminergic responses to music. Music tends to decrease
catecholamine levels in some individuals with medical conditions. The
tempo/rhythm of music may be an important factor influencing the
responses. This may reflect the fact that the auditory nuclei in the
brainstem and midbrain that encode auditory temporal information
(Griffiths, Uppenkamp, Johnsrude, Josephs, & Patterson, 2001) are
innervated by the NE system originating in the LC (Levitt & Moore, 1979;
Thompson, 2003). The beneficial effects of religious Islamic music
observed among Muslim participants (Maulina et al., 2017) suggest
top-down regulation by music of the NE system and SNS. The underlying
mechanisms for the effects of music on the SNS are also discussed in
review papers (Fancourt et al., 2014; Juslin & Västfjäll, 2008).

Peripheral Immune System

The immune system functions to protect and defend the body against
infection and damage from foreign organisms and toxins, while maintaining
checks and balances to prevent self-reactivity. It has two branches: innate
and adaptive immunity. The innate immune responses occur immediately
following an insult and are the first component of the immune system to be
activated against invasion (Turvey & Broide, 2010). They include activation
of immune cells such as granulocytes and monocytes/macrophages, and
secretion of pro-inflammatory cytokines such as interleukin (IL)-1β, IL-6,
tumor necrosis factor alpha (TNF-α), and interferon-gamma (IFN-γ) to
upregulate the acute inflammatory response. In contrast, the adaptive
immune system, consisting of B cells and T cells, is slower acting with its
responses occurring days to weeks after exposure. Unlike the innate
immune system, the adaptive immune system is capable of memory and is
able to adjust in response to pathogens. Anti-inflammatory cytokines
include IL-1Ra, IL-4, IL-6, IL-10, IL-11, IL-13, and TNF-β, which
modulate the inflammatory immune response to prevent the harmful effects
of prolonged immune system activation. It should be noted that immune
cells such as natural killer (NK) cells and dendritic cells cannot be clearly
classified as innate or adaptive, and that some cytokines have both pro- and
anti-inflammatory properties depending on the amount of cytokine
expressed, the length of time it is expressed, or which form of receptor it
activates (Rainville, Tsyglakova, & Hodes, 2018).

Immune Cells
Group drumming led to an increase in NK cell levels in healthy
adults (Bittman et al., 2001) and increased CD4+ T cell and memory T cell
counts in older adults, but not in younger adults (Koyama et al., 2009).
In contrast, listening to experimenter-selected relaxing music during surgery
decreased NK cell levels, an effect not observed in patients who
chose their music from a preselected set (Leardi et al., 2007).

Cytokines
Among cytokines, IL-6 (which presents both pro- and anti-inflammatory
properties) has been the most researched in association with music. Music
therapy sessions using relaxing music reduced IL-6 levels, accompanied by
reductions in SNS biomarkers, in surgical patients (Conrad et al., 2007) and
in the elderly with cerebrovascular disease and dementia (Okada et al.,
2009). Experimenter-selected classical music also decreased the IL-6 level
among older adults who liked the genre, accompanied by an increase in the
expression of μ opioid receptors (Stefano, Zhu, Cadet, Salamon, &
Mantione, 2004), whereas it did not change the levels of other cytokines.
On the other hand, group
drumming exercises led to an increase in the IL-6 level, along with
increased levels of pro-inflammatory IFN-γ in older adults, which was not
observed in younger adults (Koyama et al., 2009). Because Koyama et al.
(2009) also reported increased CD4+ T cell and memory T cell counts only
in older adults, the increased IL-6 level may be anti-inflammatory.
Although it appeared that IL-6 showed “the greatest levels of
responsiveness” (Fancourt et al., 2014, p. 18), more recent studies showed
otherwise (Beaulieu-Boire et al., 2013; Fancourt, Perkins, et al., 2016;
Fancourt, Williamon, et al., 2016; Koelsch et al., 2016). Further research is
needed to determine whether or not IL-6 is a sensitive immune biomarker in
response to music.
Other cytokines have also shown responses to music. The level of anti-
inflammatory IL-1 increased, along with a reduction in cortisol, in response
to participant-selected music compared to control conditions (Bartlett,
Kaufman, & Smeltekop, 1993). In another study, anti-inflammatory IL-4
increased, accompanied by a reduction in a pro-inflammatory marker,
monocyte chemoattractant protein (MCP), in response to multiple group
drumming sessions (Fancourt, Perkins, et al., 2016). Another study reported
increased inflammatory markers, including pro-inflammatory IL-2 and
soluble IL-2 receptor α; anti-inflammatory IL-4; and IL-17, which displays
both pro- and anti-inflammatory profiles, along with improved affect and
reductions in cortisol, β-endorphin, and OT levels, in response to a single
session of singing in choirs of people affected by cancer (Fancourt,
Williamon, et al., 2016). One study found that Mozart, but not Beethoven
or Schubert, downregulated the levels of anti-inflammatory IL-4, IL-10, and
IL-13 and upregulated the levels of pro-inflammatory cytokines such as
IFN-γ and IL-12, which was also associated with alleviated allergic skin
responses (Kimata, 2003). The findings reported by Kimata (2003) may
reflect an enhancement of pro-inflammatory responses induced by music,
similar to the increased cortisol responses following an acute physiological
stressor (Koelsch et al., 2016).

Immunoglobulin A
Along with these peripheral immune biomarkers, immunoglobulin A (IgA)
is one of the most commonly studied immune biomarkers associated with
music. Immunoglobulin A is a major serum immunoglobulin that is
predominantly produced in the bone marrow and mediates various
protective functions through interaction with specific receptors and immune
mediators (Woof & Kerr, 2006). Immunoglobulin A is also the principal
antibody class in the external secretions that bathe the vast mucosal surfaces
of the gastrointestinal, respiratory, and genitourinary tracts, and it plays an
important role in first-line immune protection. Secretory and serum IgA
have different biochemical and immunochemical properties and are
produced by cells with different organ distributions. Therefore, different
methods of immunization can induce either secretory or serum IgA
responses or a combination of both.
In general, research has yielded consistent results: music increases the
concentrations or secretion rate of secretory IgA (S-IgA) (Chanda &
Levitin, 2013; Fancourt et al., 2014; Hodges, 2010), suggesting that music
enhances immunity in healthy individuals. Furthermore, the S-IgA increase
was greater when engaging in group singing compared to passive listening
(Beck, Cesario, Yousefi, & Enamoto, 2000; Kreutz, Bongard, Rohrmann,
Hodapp, & Grebe, 2004; Kuhn, 2002). Another study showed that S-IgA
increased only in response to “designer music” intended to evoke positive
feelings, but not to relaxing (new age) or rock music (McCraty, Atkinson, &
Rein, 1996).
However, there are also a few studies reporting no changes in the levels
of IgA. In two studies, the serum levels of IgA did not change in patients
who listened to experimenter-selected calming music post-surgery (Nilsson
et al., 2005) or joyful music (which was described to the patients as
“relaxing” acoustic stimulation to reduce noise) before, during, or after
surgery (Koelsch et al., 2011). The absence of a music effect on plasma IgA
concentrations may be due to the effects of local anesthetic infiltration
(Nilsson et al., 2005) or to differences between S-IgA and serum IgA in
their responses to music (Woof & Kerr, 2006). Furthermore, two studies
reported no changes following stressors such as eating adverse/allergenic
food (Kejr et al., 2010) or a stressful cognitive task (Hirokawa & Ohira,
2003). The immunoenhancing effect of music may thus be limited to
healthy individuals not exposed to stressors.
To summarize, there is some evidence that music induces changes in
immune biomarkers. S-IgA appears to respond most consistently and
robustly to music in healthy individuals. The music-induced increase in
S-IgA is interpreted as immunoenhancement. Future studies could
investigate how long this effect lasts and whether musical experience and
habits modulate it. Although music induces responses in other immune
biomarkers, interpretation can be challenging owing to inconsistency in the
direction of change of cytokines with different inflammatory properties.
An interesting observation from animal research is that individual
differences in the peripheral immune system influence the development of
stress susceptibility, as demonstrated by higher circulating levels of IL-6
and leukocytes in susceptible mice compared to resilient and control mice
(Hodes et al., 2014; Rainville et al., 2018). Therefore, it may be useful to
separate participants depending on the baseline level of immune
biomarkers. Furthermore, as immune biomarkers are closely connected with
hormones (Yovel, Shakhar, & Ben-Eliyahu, 2001), sex may need to be
accounted for in the study design.
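One way to implement such baseline stratification is a simple median split on the baseline biomarker, testing whether the music-induced change differs between high- and low-baseline subgroups while recording sex for each participant. A minimal sketch in Python, assuming a hypothetical data set with columns il6_baseline, il6_post, and sex (all names and values invented for illustration):

    import pandas as pd
    from scipy import stats

    # Hypothetical participant data; values are illustrative only.
    df = pd.DataFrame({
        "sex":          ["F", "M", "F", "M", "F", "M", "F", "M"],
        "il6_baseline": [1.2, 3.4, 0.9, 2.8, 1.1, 3.0, 1.4, 2.5],
        "il6_post":     [1.0, 2.9, 1.0, 2.2, 0.9, 2.6, 1.3, 2.0],
    })

    # Music-induced change in IL-6 for each participant.
    df["delta"] = df["il6_post"] - df["il6_baseline"]

    # Median split on the baseline level, following the logic of the
    # stress-susceptibility findings in mice (Hodes et al., 2014).
    df["group"] = (df["il6_baseline"] > df["il6_baseline"].median()
                   ).map({True: "high_baseline", False: "low_baseline"})

    # Does the response to music differ between baseline subgroups?
    high = df.loc[df["group"] == "high_baseline", "delta"]
    low  = df.loc[df["group"] == "low_baseline", "delta"]
    print(stats.ttest_ind(high, low))
    print(df.groupby(["group", "sex"])["delta"].mean())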

Cholinergic System

Cholinergic neurons are localized in the basal forebrain, pedunculopontine
tegmental nucleus (PPT), and laterodorsal tegmental nucleus (LDT). The
latter two nuclei are collectively termed the pontomesencephalic tegmentum
(PMT) nuclei. The PMT nuclei send widespread projections to the spinal
cord, thalamus, basal forebrain, and frontal cortex. Major acetylcholine
(ACh) receptors include nicotinic (nAChR) and muscarinic (mAChR)
receptors, which are expressed in the auditory system (Metherate, 2011;
Morley & Happe, 2000). Cholinergic modulation of auditory functions is
well studied, and animal research has demonstrated that cholinergic neurons
respond to simple auditory stimuli such as pure tones (Koyama, Jodo, &
Kayama, 1994) and clicks (Reese, Garcia-Rill, & Skinner, 1995a, b).
However, whether acoustic information, including music, induces
cholinergic responses in the human brain is unknown.
In animal research, some neurons in the primary auditory cortex send
direct glutamatergic projections to the superior olivary complex, as well as
to the PMT, which innervates the IC and the auditory thalamus (Motts &
Schofield, 2010). These observations suggest that auditory stimuli
activating the primary auditory cortex may be able to affect the activity of
cholinergic neurons in the PMT, thereby influencing various functions such
as arousal, the sleep–wake cycle, motor control, and motivation and reward
behavior (Schofield, 2010). Cholinergic PMT cells are connected with
dopaminergic neurons in the VTA (Chen, Nakamura, Kawamura,
Takahashi, & Nakahara, 2006; Pan & Hyland, 2005), and these connections
are likely to be involved in reward behavior (Pan & Hyland, 2005), whereas
the connections between the PPT and the BG are associated with motor
functions. The cholinergic PPT neurons responsive to clicks (Reese et al.,
1995a, b) may underlie part of the mechanisms for auditory–motor
entrainment. There is also a network in which the mediodorsal nucleus of
the thalamus projects to cholinergic and non-cholinergic neurons in the
globus pallidus that in turn project to the auditory cortex (Moriizumi &
Hattori, 1992), which may also be associated with auditory–motor functions.

Discussion and Future Directions

Research demonstrates that music induces responses in neurochemicals as
well as in peripheral hormones and immune biomarkers, along with
concomitant functional changes. Some of these have been extensively
studied and yield relatively consistent responses (e.g., a reduction in
cortisol and an increase in S-IgA), whereas others are little studied and/or
show inconsistent results. In addition, few studies have directly investigated
CNS responses. As the neuroscientific study of music and the clinical
application of music are of growing interest, more studies employing more
rigorous designs are needed to elucidate and confirm the neurochemical
responses to music and acoustic information.
Future studies need to consider participant characteristics such as age,
sex, trait and state levels of depression and anxiety, baseline neurochemical
levels, polymorphisms associated with music ability, music
education/training levels, and music listening habits and preferences. At the
same time, more studies are needed to investigate the effects of these
individual characteristics on neurochemical responses in order to determine
important confounding variables in music studies. Moreover, studies
including clinical populations or older healthy adults need to include
control groups consisting of participants without the medical conditions, or
younger healthy adults, to determine whether the target group differs from
the control group in baseline neurochemical measures and in how it reacts
to the music intervention.
The existing literature has used different methods to evaluate the
molecular responses to music. Some studies simply compared levels
between pre- and post-music intervention, and others additionally included
silent control conditions. In addition, some studies used passive listening
and others used group musical activities. In order to isolate the specific
effects of music, future studies need to include control conditions well
matched with the music conditions in terms of attention, engagement, and
interaction (e.g., passive listening to music versus passive listening to an
audio book, as suggested by Chanda & Levitin, 2013).
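The design point can be made explicit: a pre/post comparison within the music condition alone confounds music with nonspecific factors (rest, attention, the passage of time), whereas the music-specific effect is the condition difference in change scores, i.e., the condition-by-time interaction. A minimal sketch in Python with hypothetical cortisol values, using an audio book condition as the matched control (all numbers invented for illustration):

    import numpy as np
    from scipy import stats

    # Hypothetical salivary cortisol values (nmol/L), pre and post.
    music_pre      = np.array([14.2, 12.8, 15.1, 13.5, 14.8, 12.9])
    music_post     = np.array([11.9, 11.5, 13.0, 12.1, 12.6, 11.8])
    audiobook_pre  = np.array([14.0, 13.1, 14.9, 13.2, 14.5, 13.0])
    audiobook_post = np.array([13.2, 12.6, 14.1, 12.8, 13.7, 12.5])

    # A within-condition pre/post test cannot separate music from
    # nonspecific factors such as rest, attention, and time.
    print(stats.ttest_rel(music_post, music_pre))

    # The music-specific effect is the between-condition difference in
    # change scores (equivalent to the condition x time interaction).
    music_delta     = music_post - music_pre
    audiobook_delta = audiobook_post - audiobook_pre
    print(stats.ttest_ind(music_delta, audiobook_delta))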
In general, research shows that participant-selected music elicits greater
responses than experimenter-selected music. When experimenter-selected
music is used in experimental studies or clinical settings, participants’
ratings of the selected music on emotional dimensions and liking may help
in understanding the findings or in accounting for some of the
inconsistency and variance of the responses, although subjective and
objective hedonic reactions do not always coincide (Berridge &
Kringelbach, 2015). More specific descriptions of the music may also help
to clarify which components of music are important for inducing such
responses.
Furthermore, concomitant measures of other relevant biomarkers and
physiological/emotional/behavioral data are useful for determining whether
observed neurochemical responses are beneficial. Demonstrated
correlations may help in interpreting the findings and the underlying
mechanisms. Moreover, findings based on peripheral measures used to
infer brain function should be interpreted with caution unless the measures
are well-validated proxies for central measures. In addition, the timing of
measurement is important for some biomarkers; thus, multiple
measurements over a period of time may capture a more distinct response.
To date, there is only one study directly addressing neurochemical
changes associated with music listening (Salimpoor et al., 2011), which
used PET with a D2/D3 receptor antagonist, [11C]raclopride (see section
“Dopamine Systems” for details). However, D2/D3 receptor agonists such
as [11C]-(+)-PHNO (Rabiner & Laruelle, 2010; Willeit et al., 2006) may be
more advantageous for investigating functional changes in the ventral
striatum, because they are more sensitive to competition from endogenous
dopamine following administration of dopamine-releasing stimuli than the
antagonist [11C]raclopride (Narendran et al., 2010; Shotbolt et al., 2012;
Willeit et al., 2008) and show up to 20-fold higher affinity for D3 over D2
receptors, providing higher sensitivity and allowing better quantification
of the D3 receptor subtype in the ventral striatum (Graff-Guerrero et al.,
2008; Narendran et al., 2006).
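In such competition paradigms, endogenous neurotransmitter release is conventionally indexed as the percentage reduction in the non-displaceable binding potential (BP_ND) from a baseline scan to a stimulus scan; a more competition-sensitive agonist tracer simply yields a larger expected reduction for the same amount of released dopamine. A minimal sketch of this standard computation in Python, with hypothetical regional BP_ND values (not taken from any cited study):

    # Hypothetical regional BP_ND values from a baseline scan and a
    # music-listening scan; purely illustrative numbers.
    bp_nd = {
        "ventral_striatum": {"baseline": 2.40, "music": 2.16},
        "dorsal_striatum":  {"baseline": 2.90, "music": 2.73},
    }

    for region, bp in bp_nd.items():
        # Percent reduction in BP_ND: larger reductions imply greater
        # displacement of the radiotracer by endogenously released
        # dopamine during music listening.
        delta = 100.0 * (bp["baseline"] - bp["music"]) / bp["baseline"]
        print(f"{region}: {delta:.1f}% reduction in BP_ND")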
Three radioligands have been developed for opioid receptors: [11C]-
carfentanil, targeting µ opioid receptors (Frost et al., 1989); [11C]-
diprenorphine, for non-selective opioid receptors (Jones et al., 1994); and
the more recent [11C]-LY2795050, targeting κ opioid receptors (Naganawa
et al., 2015). For the neuroimmune system, a number of radioligands have
been developed targeting the translocator protein (TSPO), which is
localized to the outer mitochondrial membrane of glial cells and has been
used as a biomarker of the neuroimmune system and of neuroinflammation
in normal aging and in various diseases and disorders (Gunn et al., 2015).
In addition, radioligands for 5-HT and cholinergic receptor subtypes as
well as for NE have been developed.
When PET studies are conducted, demographic characteristics such as
age and sex, as well as body mass, are important confounding variables to
take into account (Gunn et al., 2015). In addition, polymorphisms can have
a great impact on binding; for example, a TSPO polymorphism produces
three different binding phenotypes (Owen et al., 2011).
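In practice, such confounders are typically handled by entering them as covariates in the group or condition comparison, with TSPO genotype coded as a binding-class factor (high-, mixed-, or low-affinity binders). A minimal sketch in Python with statsmodels, on a hypothetical TSPO-PET data set (all variable names and values invented for illustration):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical TSPO-PET data; vt is the regional total volume of
    # distribution, tspo_bin the genotype-derived binding class.
    df = pd.DataFrame({
        "vt":       [3.1, 4.0, 2.8, 3.6, 4.4, 2.5, 3.9, 3.3],
        "group":    ["music", "control"] * 4,
        "age":      [24, 31, 45, 38, 29, 52, 41, 35],
        "sex":      ["F", "M", "F", "M", "F", "M", "F", "M"],
        "bmi":      [21.5, 24.0, 27.2, 23.1, 22.8, 26.5, 25.0, 24.4],
        "tspo_bin": ["HAB", "HAB", "MAB", "HAB", "MAB", "HAB", "MAB", "MAB"],
    })

    # Adjust the group comparison for age, sex, body mass, and TSPO
    # binding class (cf. Owen et al., 2011; Gunn et al., 2015).
    model = smf.ols("vt ~ group + age + sex + bmi + tspo_bin", data=df).fit()
    print(model.params)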
Along with these neuroreceptors and proteins that can be studied using
PET imaging, the recent development of proton magnetic resonance
spectroscopy at high magnetic field strengths allows more reliable
estimation of the amino acids glutamine, glutamate, and gamma-
aminobutyric acid (Ciurleo, Di Lorenzo, Bramanti, & Marino, 2014). This
neuroimaging technique may shed light on cortico-cortical interactions and
the top-down modulations exerted by music. Moreover, pharmacological
studies in a double-blind, placebo-controlled crossover design, combined
with the more accessible fMRI, would help to elucidate the role of
neurochemicals in music-associated complex functions such as cognition
and emotional behavior.

References
Acher, R., & Chauvet, J. (1995). The neurohypophysial endocrine regulatory cascade: Precursors,
mediators, receptors, and effectors. Frontiers in Neuroendocrinology 16(3), 237–289.
Albers, H. E. (2015). Species, sex and individual differences in the vasotocin/vasopressin system:
Relationship to neurochemical signaling in the social behavior neural network. Frontiers in
Neuroendocrinology 36, 49–71.
Bachner-Melman, R., Dina, C., Zohar, A. H., Constantini, N., Lerer, E., Hoch, S., … Ebstein, R. P.
(2005). AVPR1a and SLC6A4 gene polymorphisms are associated with creative dance
performance. PLoS Genetics 1(3), 394–403. Retrieved from
https://doi.org/10.1371/journal.pgen.0010042
Bachner-Melman, R., & Ebstein, R. P. (2014). The role of oxytocin and vasopressin in emotional and
social behaviors. Handbook of Clinical Neurology 124, 53–68.
Bandelow, B., Baldwin, D., Abelli, M., Bolea-Alamanac, B., Bourin, M., Chamberlain, S. R., …
Riederer, P. (2017). Biological markers for anxiety disorders, OCD and PTSD: A consensus
statement. Part II: Neurochemistry, neurophysiology and neurocognition. World Journal of
Biological Psychiatry 18(3), 162–214.
Barrett, F. S., Preller, K. H., Herdener, M., Janata, P., & Vollenweider, F. X. (2017). Serotonin 2A
receptor signaling underlies LSD-induced alteration of the neural response to dynamic changes in
music. Cerebral Cortex (December), 1–12. Retrieved from https://doi.org/10.1093/cercor/bhx257
Bartlett, D., Kaufman, D., & Smeltekop, R. (1993). The effects of music listening and perceived
sensory experience on the immune system as measured by interleukin-1 and cortisol. Journal of
Music Therapy 30(4), 194–209.
Beaulieu-Boire, G., Bourque, S., Chagnon, F., Chouinard, L., Gallo-Payet, N., & Lesur, O. (2013).
Music and biological stress dampening in mechanically-ventilated patients at the intensive care
unit ward: A prospective interventional randomized crossover trial. Journal of Critical Care 28(4),
442–450.
Beck, B. D., Hansen, A. M., & Gold, C. (2015). Coping with work-related stress through guided
imagery and music (GIM): Randomized controlled trial. Journal of Music Therapy 52(3), 323–
352.
Beck, R. J., Cesario, T. C., Yousefi, A., & Enamoto, H. (2000). Choral singing, performance
perception, and immune system changes in salivary immunoglobulin A and cortisol. Music
Perception: An Interdisciplinary Journal 18(1), 87–106.
Benarroch, E. E. (2012). Endogenous opioid systems: Current concepts and clinical correlations.
Neurology 79, 807–814.
Berridge, K. C., & Kringelbach, M. L. (2015). Pleasure systems in the brain. Neuron 86(3), 646–664.
Bittman, B., Berk, L., Felten, D., Westengard, J., Simonton, O., Pappas, J., & Ninehouser, M. (2001).
Composite effects of group drumming music therapy on modulation of neuroendocrine-immune
parameters in normal subjects. Alternative Therapies 7(1), 38–47.
Brown, C. A., Cardoso, C., & Ellenbogen, M. A. (2016). A meta-analytic review of the correlation
between peripheral oxytocin and cortisol concentrations. Frontiers in Neuroendocrinology 43, 19–
27.
Brownley, K. A., McMurray, R. G., & Hackney, A. C. (1995). Effects of music on physiological and
affective responses to graded treadmill exercise in trained and untrained runners. International
Journal of Psychophysiology 19(3), 193–201.
Caldwell, H. K. (2017). Oxytocin and vasopressin: Powerful regulators of social behavior.
Neuroscientist 23(5), 517–528.
Cameron, D. J., Pickett, K. A., Earhart, G. M., & Grahn, J. A. (2016). The effect of dopaminergic
medication on beat-based auditory timing in Parkinson’s disease. Frontiers in Neurology 7, 1–8.
Retrieved from https://doi.org/10.3389/fneur.2016.00019
Campbell-Smith, E. J., Holmes, N. M., Lingawi, N. W., Panayi, M. C., & Westbrook, R. F. (2015).
Oxytocin signaling in basolateral and central amygdala nuclei differentially regulates the
acquisition, expression, and extinction of context-conditioned fear in rats. Learning & Memory
22(5), 247–257.
Carson, D. S., Berquist, S. W., Trujillo, T. H., Garner, J. P., Hannah, S. L., Hyde, S. A., … Parker, K.
J. (2015). Cerebrospinal fluid and plasma oxytocin concentrations are positively correlated and
negatively predict anxiety in children. Molecular Psychiatry 20(9), 1085–1090.
Castro, D. C., & Berridge, K. C. (2017). Opioid and orexin hedonic hotspots in rat orbitofrontal
cortex and insula. Proceedings of the National Academy of Sciences 114(43), E9125–E9134.
Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences
17(4), 179–191.
Chen, C. J., Sung, H. C., Lee, M. S., & Chang, C. Y. (2015). The effects of Chinese five-element
music therapy on nursing students with depressed mood. International Journal of Nursing Practice
21(2), 192–199.
Chen, J., Nakamura, M., Kawamura, T., Takahashi, T., & Nakahara, D. (2006). Roles of
pedunculopontine tegmental cholinergic receptors in brain stimulation reward in the rat.
Psychopharmacology 184(3–4), 514–522.
Chlan, L. L., Engeland, W. C., & Anthony, A. (2007). Influence of music on the stress response in
patients receiving mechanical ventilatory support: A pilot study. American Journal of Critical
Care 16(2), 141–146.
Chlan, L. L., Engeland, W. C., & Savik, K. (2013). Does music influence stress in mechanically
ventilated patients? Intensive and Critical Care Nursing 29(3), 121–127.
Ciurleo, R., Di Lorenzo, G., Bramanti, P., & Marino, S. (2014). Magnetic resonance spectroscopy:
An in vivo molecular imaging biomarker for Parkinson’s disease? BioMed Research International
2014, 519816. Retrieved from https://doi.org/10.1155/2014/519816
Conrad, C., Niess, H., Jauch, K., Bruns, C., Hartl, W., & Welker, L. (2007). Overture for growth
hormone: Requiem for interleukin-6? Critical Care Medicine 35(12), 2709–2713.
Dai, L., Carter, C. S., Ying, J., Bellugi, U., Pournajafi-Nazarloo, H., & Korenberg, J. R. (2012).
Oxytocin and vasopressin are dysregulated in Williams syndrome, a genetic disorder affecting
social behavior. PLoS ONE 7(6), e38513. Retrieved from
https://doi.org/10.1371/journal.pone.0038513
de Jong, T. R., Menon, R., Bludau, A., Grund, T., Biermeier, V., Klampfl, S. M., … Neumann, I. D.
(2015). Salivary oxytocin concentrations in response to running, sexual self-stimulation,
breastfeeding and the TSST: The Regensburg Oxytocin Challenge (ROC) study.
Psychoneuroendocrinology 62, 381–388.
Djurovic, S., Le Hellard, S., Kähler, A. K., Jönsson, E. G., Agartz, I., Steen, V. M., … Andreassen,
O. A. (2009). Association of MCTP2 gene variants with schizophrenia in three independent
samples of Scandinavian origin (SCOPE). Psychiatry Research 168(3), 256–258.
Dölen, G., Darvishzadeh, A., Huang, K. W., & Malenka, R. C. (2013). Social reward requires
coordinated activity of nucleus accumbens oxytocin and serotonin. Nature 501(7466), 179–184.
Donaldson, Z. R., & Young, L. J. (2008). Oxytocin, vasopressin, and the neurogenetics of sociality.
Science 322(5903), 900–904. Correction (2009): Science 323(5920), 1429.
Dorn, F., Wirth, L., Gorbey, S., Wege, M., Zemlin, M., Maier, R. F., & Lemmer, B. (2014). Influence
of acoustic stimulation on the circadian and ultradian rhythm of premature infants. Chronobiology
International 31(9), 1062–1074.
Elsinger, C. L., Rao, S. M., Zimbelman, J. L., Reynolds, N. C., Blindauer, K. A., & Hoffmann, R. G.
(2003). Neural basis for impaired time reproduction in Parkinson’s disease: An fMRI study.
Journal of the International Neuropsychological Society 9(7), 1088–1098.
Evers, S., & Suhr, B. (2000). Changes of the neurotransmitter serotonin but not of hormones during
short time music perception. European Archives of Psychiatry and Clinical Neuroscience 250(3),
144–147.
Fancourt, D., Aufegger, L., & Williamon, A. (2015). Low-stress and high-stress singing have
contrasting effects on glucocorticoid response. Frontiers in Psychology 6, 1–5. Retrieved from
https://doi.org/10.3389/fpsyg.2015.01242
Fancourt, D., Ockelford, A., & Belai, A. (2014). The psychoneuroimmunological effects of music: A
systematic review and a new model. Brain, Behavior, and Immunity 36, 15–26.
Fancourt, D., Perkins, R., Ascenso, S., Carvalho, L. A., Steptoe, A., & Williamon, A. (2016). Effects
of group drumming interventions on anxiety, depression, social resilience and inflammatory
immune response among mental health service users. PLoS ONE 11(3), 1–16. Retrieved from
https://doi.org/10.1371/journal.pone.0151136
Fancourt, D., Williamon, A., Carvalho, L. A., Steptoe, A., Dow, R., & Lewis, I. (2016). Singing
modulates mood, stress, cortisol, cytokine and neuropeptide activity in cancer patients and carers.
Ecancermedicalscience 10, 1–13. Retrieved from https://doi.org/10.3332/ecancer.2016.631
Ferris, C. (1992). Role of vasopressin in aggressive and dominant/subordinate behaviors. Annals of
the New York Academy of Sciences 652, 212–226.
Field, T., Martinez, A., Nawrocki, T., Pickens, J., Fox, N., & Schanberg, S. (1998). Music shifts
frontal EEG in depressed adolescents. Adolescence 33(129), 109–116.
Frost, J. J., Douglass, K. H., Mayberg, H. S., Dannals, R. F., Links, J. M., Wilson, A. A., … Wagner,
H. N. (1989). Multicompartmental analysis of [11C]-Carfentanil binding to opiate receptors in
humans measured by positron emission tomography. Journal of Cerebral Blood Flow &
Metabolism 9(3), 398–409.
Fukui, H., & Toyoshima, K. (2013). Influence of music on steroid hormones and the relationship
between receptor polymorphisms and musical ability: A pilot study. Frontiers in Psychology 4, 1–
8. Retrieved from https://doi.org/10.3389/fpsyg.2013.00910
Fukui, H., & Yamashita, M. (2003). The effects of music and visual stress on testosterone and
cortisol in men and women. Neuro Endocrinology Letters 24(3–4), 173–180.
Gerra, G., Zaimovic, A., Franchini, D., Palladino, M., Giucastro, G., Reali, N., … Brambilla, F.
(1998). Neuroendocrine responses of healthy volunteers to “techno-music”: Relationships with
personality traits and emotional state. International Journal of Psychophysiology 28, 99–111.
Gingras, B., Pohler, G., & Fitch, W. T. (2014). Exploring shamanic journeying: Repetitive drumming
with shamanic instructions induces specific subjective experiences but no larger cortisol decrease
than instrumental meditation music. Plos ONE 9(7). Retrieved from
https://doi.org/10.1371/journal.pone.0102103
Goldstein, A. (1980). Thrills in response to music and other stimuli. Physiological Psychology 8(1),
126–129.
Good, M., Albert, J. M., Arafah, B., Anderson, G. C., Wotman, S., Cong, X., … Ahn, S. (2013).
Effects on postoperative salivary cortisol of relaxation/music and patient teaching about pain
management. Biological Research for Nursing 15(3), 318–329.
Graff-Guerrero, A., Willeit, M., Ginovart, N., Mamo, D., Mizrahi, R., Rusjan, P., … Kapur, S.
(2008). Brain region binding of the D2/3 agonist [11C]-(+)- PHNO and the D2/3 antagonist
[11C]raclopride in healthy humans. Human Brain Mapping 29(4), 400–410.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians
and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Granot, R. Y., Frankel, Y., Gritsenko, V., Lerer, E., Gritsenko, I., Bachner-Melman, R., … Ebstein, R.
P. (2007). Provisional evidence that the arginine vasopressin 1a receptor gene is associated with
musical memory. Evolution and Human Behavior 28(5), 313–318.
Granot, R. Y., Uzefovsky, F., Bogopolsky, H., & Ebstein, R. P. (2013). Effects of arginine vasopressin
on musical working memory. Frontiers in Psychology 4, 1–12. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00712
Grape, C., Sandgren, M., Hansson, L., Ericson, M., & Theorell, T. (2003). Does singing promote
well-being? An empirical study of professional and amateur singers during a singing lesson.
Integrative Physiological & Behavioral Science 38(1), 65–74.
Graversen, M., & Sommer, T. (2013). Perioperative music may reduce pain and fatigue in patients
undergoing laparoscopic cholecystectomy. Acta Anaesthesiologica Scandinavica 57(8), 1010–
1016.
Griffiths, T. D., Uppenkamp, S., Johnsrude, I., Josephs, O., & Patterson, R. D. (2001). Encoding of
the temporal regularity of sound in the human brainstem. Nature Neuroscience 4(6), 633–637.
Gunn, R. N., Slifstein, M., Searle, G. E., & Price, J. C. (2015). Quantitative imaging of protein
targets in the human brain with PET. Physics in Medicine and Biology 60(22), R363–R411.
Hébert, S., Béland, R., Dionne-Fournelle, O., Crête, M., & Lupien, S. J. (2005). Physiological stress
response to video-game playing: The contribution of built-in music. Life Sciences 76(20), 2371–
2380.
Heinrichs, M., von Dawans, B., & Domes, G. (2009). Oxytocin, vasopressin, and human social
behavior. Frontiers in Neuroendocrinology 30(4), 548–557.
Hirokawa, E., & Ohira, H. (2003). The effects of music listening after a stressful task on immune
functions, neuroendocrine responses, and emotional states in college students. Journal of Music
Therapy 40(3), 189–211.
Hjelmstad, G. O., Xia, Y., Margolis, E. B., & Fields, H. L. (2013). Opioid modulation of ventral
pallidal afferents to ventral tegmental area neurons. Journal of Neuroscience 33(15), 6454–6459.
Hodes, G. E., Pfau, M. L., Leboeuf, M., Golden, S. A., Christoffel, D. J., Bregman, D., … Russo,
S. J. (2014). Individual differences in the peripheral immune system promote resilience versus
susceptibility to social stress. Proceedings of the National Academy of Sciences 111(45),
16136–16141.
Hodges, D. (2010). Psychophysiological measures. In P. Juslin & J. Sloboda (Eds.), Handbook of
music and emotion: Theory, research, applications (pp. 279–312). Oxford: Oxford University
Press.
Hoffman, E. R., Brownley, K. A., Hamer, R. M., & Bulik, C. M. (2012). Plasma, salivary, and
urinary oxytocin in anorexia nervosa: A pilot study. Eating Behaviors 13(3), 256–259.
Huber, D., Veinante, P., & Stoop, R. (2005). Vasopressin and oxytocin excite distinct neuronal
populations in the central amygdala. Science 308(5719), 245–248.
Hurley, L. M., & Sullivan, M. R. (2012). From behavioral context to receptors: Serotonergic
modulatory pathways in the IC. Frontiers in Neural Circuits 6, 1–17. Retrieved from
https://doi.org/10.3389/fncir.2012.00058
Hurley, R. A., Flashman, L. A., Chow, T. W., & Taber, K. H. (2010). The brainstem: Anatomy,
assessment, and clinical syndromes. Journal of Neuropsychiatry and Clinical Neuroscience 22(1),
2–6. Retrieved from https://doi.org/10.1176/appi.neuropsych.23.2.121
Insel, T. R. (2010). The challenge of translation in social neuroscience: A review of oxytocin,
vasopressin, and affiliative behavior. Neuron 65(6), 768–779.
Jahanshahi, M., Jones, C. R. G., Zijlmans, J., Katzenschlager, R., Lee, L., Quinn, N., … Lees, A. J.
(2010). Dopaminergic modulation of striato-frontal connectivity during motor timing in
Parkinson’s disease. Brain 133(3), 727–745.
Javor, A., Riedl, R., Kindermann, H., Brandstatter, W., Ransmayr, G., & Gabriel, M. (2014).
Correlation of plasma and salivary oxytocin in healthy young men: Experimental evidence. Neuro
Endocrinology Letters 35(6), 470–473.
Jayamala, A. K., Lakshmanagowda, P. B., Pradeep, G. C. M., & Goturu, J. (2015). Impact of music
therapy on breast milk secretion in mothers of premature newborns. Journal of Clinical and
Diagnostic Research 9(4), CC04–CC06.
Jeong, Y. J., Hong, S. C., Lee, M. S., Park, M. C., Kim, Y. K., & Suh, C. M. (2005). Dance
movement therapy improves emotional responses and modulates neurohormones in adolescents
with mild depression. International Journal of Neuroscience 115(12), 1711–1720.
Johnson, Z. V., & Young, L. J. (2017). Oxytocin and vasopressin neural networks: Implications for
social behavioral diversity and translational neuroscience. Neuroscience & Biobehavioral Reviews
76, 87–98.
Jones, A. K. P., Cunningham, V. J., Ha-Kawa, S. K., Fujiwara, T., Liyii, Q., Luthra, S. K., … Jones,
T. (1994). Quantitation of [11C]diprenorphine cerebral kinetics in man acquired by PET using
presaturation, pulse-chase and tracer-only protocols. Journal of Neuroscience Methods 51(2), 123–
134.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying
mechanisms. Behavioral and Brain Sciences 31(5), 559–575; discussion 575–621.
Kaelen, M., Barrett, F. S., Roseman, L., Lorenz, R., Family, N., Bolstridge, M., … Carhart-Harris, R.
L. (2015). LSD enhances the emotional response to music. Psychopharmacology 232(19), 3607–
3614.
Kaelen, M., Roseman, L., Kahan, J., Santos-Ribeiro, A., Orban, C., Lorenz, R., … Carhart-Harris, R.
(2016). LSD modulates music-induced imagery via changes in parahippocampal connectivity.
European Neuropsychopharmacology 26(7), 1099–1109.
Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A. K., Lähdesmäki, H., & Järvelä, I. (2015). The
effect of music performance on the transcriptome of professional musicians. Scientific Reports 5,
1–7. Retrieved from https://doi.org/10.1038/srep09506
Kanduri, C., Raijas, P., Ahvenainen, M., Philips, A. K., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä,
I. (2015). The effect of listening to music on human transcriptome. PeerJ 3, e830. Retrieved from
https://doi.org/10.7717/peerj.830
Karageorghis, C. I., Bruce, A. C., Pottratz, S. T., Stevens, R. C., Bigliassi, M., & Hamer, M. (2017).
Psychological and psychophysiological effects of recuperative music postexercise. Medicine &
Science in Sports & Exercise 50(4), 739–746.
Katori, S., Hamada, S., Noguchi, Y., Fukuda, E., Yamamoto, T., Yamamoto, H., … Yagi, T. (2009).
Protocadherin-α family is required for serotonergic projections to appropriately innervate target
brain areas. Journal of Neuroscience 29(29), 9137–9147.
Keeler, J. R., Roth, E. A., Neuser, B. L., Spitsbergen, J. M., Waters, D. J. M., & Vianney, J.-M.
(2015). The neurochemistry and social flow of singing: Bonding and oxytocin. Frontiers in Human
Neuroscience 9, 1–10. Retrieved from https://doi.org/10.3389/fnhum.2015.00518
Kejr, A., Gigante, C., Hames, V., Krieg, C., Mages, J., König, N., … Diel, F. (2010). Receptive music
therapy and salivary histamine secretion. Inflammation Research 59(Suppl. 2), 217–218.
Khalfa, S., Dalla Bella, S., Roy, M., Peretz, I., & Lupien, S. J. (2003). Effects of relaxing music on
salivary cortisol level after psychological stress. Annals of the New York Academy of Sciences 999,
374–376.
Kimata, H. (2003). Listening to Mozart reduces allergic skin wheal responses and in vitro allergen-
specific IgE production in atopic dermatitis patients with latex allergy. Behavioral Medicine 29(1),
15–19.
Kirschbaum, C., & Hellhammer, D. H. (1994). Salivary cortisol in psychoneuroendocrine research:
Recent developments and applications. Psychoneuroendocrinology, 19(4), 313–333.
Knight, W. E. J., & Rickard, N. S. (2001). Relaxing music prevents stress-induced increase in
subjective anxiety, systolic blood pressure, and heart rate in healthy males and females. Journal of
Music Therapy 34(4), 254–272.
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3),
170–180.
Koelsch, S., Boehlig, A., Hohenadel, M., Nitsche, I., Bauer, K., & Sack, U. (2016). The impact of
acute stress on hormones and cytokines, and how their recovery is affected by music-evoked
positive mood. Scientific Reports 6, 1–11. Retrieved from https://doi.org/10.1038/srep23008
Koelsch, S., Fuermetz, J., Sack, U., Bauer, K., Hohenadel, M., Wiegel, M., … Heinke, W. (2011).
Effects of music listening on cortisol levels and propofol consumption during spinal anesthesia.
Frontiers in Psychology 2, 1–9. Retrieved from https://doi.org/10.3389/fpsyg.2011.00058
Koyama, M., Wachi, M., Utsuyama, M., Bittman, B., Hirokawa, K., & Kitagawa, M. (2009).
Recreational music-making modulates immunological responses and mood states in older adults.
Journal of Medical and Dental Sciences 56, 79–90.
Koyama, Y., Jodo, E., & Kayama, Y. (1994). Sensory responsiveness of “broad-spike” neurons in the
laterodorsal tegmental nucleus, locus coeruleus and dorsal raphe of awake rats: Implications for
cholinergic and monoaminergic neuron-specific responses. Neuroscience 63(4), 1021–1031.
Kreutz, G., Bongard, S., Rohrmann, S., Hodapp, V., & Grebe, D. (2004). Effects of choir singing or
listening on secretory immunoglobulin A, cortisol, and emotional state. Journal of Behavioral
Medicine 27(6), 623–635.
Kreutz, G., Murcia, C., & Bongard, S. (2012). Psychoneuroendocrine research on music and health:
An overview. In R. MacDonald, G. Kreutz, & L. Mitchell (Eds.), Music, health, and wellbeing (pp.
457–476). Oxford: Oxford University Press.
Kuhn, D. (2002). The effects of active and passive participation in musical activity on the immune
system as measured by salivary immunoglobulin A (SIgA). Journal of Music Therapy 39(1), 30–
39.
Kumar, A., Tims, F., Cruess, D., Mintzer, M., Ironson, G., Loewenstein, D., … Kumar, M. (1999).
Music therapy increases serum melatonin levels in patients with Alzheimer’s disease. Alternative
Therapies in Health Medicine 5(9), 49–57.
Leardi, S., Pietroletti, R., Angeloni, G., Necozione, S., Ranalletta, G., & Del Gusto, B. (2007).
Randomized clinical trial examining the effect of music therapy in stress response to day surgery.
British Journal of Surgery 94(8), 943–947.
le Roux, F., Bouic, P., & Bester, M. (2007). The effect of Bach’s Magnificat on emotions, immune,
and endocrine parameters during physiotherapy treatment of patients with infectious lung
conditions. Journal of Music Therapy 44(2), 156–168.
Lefevre, A., Mottolese, R., Dirheimer, M., Mottolese, C., Duhamel, J. R., & Sirigu, A. (2017). A
comparison of methods to measure central and peripheral oxytocin concentrations in human and
non-human primates. Scientific Reports 7(1), 17222. Retrieved from
https://doi.org/10.1038/s41598-017-17674-7
Levitt, P., & Moore, R. Y. (1979). Origin and organization of brainstem catecholamine innervation in
the rat. Journal of Comparative Neurology 186(4), 505–528.
Lin, P. C., Lin, M. L., Huang, L. C., Hsu, H. C., & Lin, C. C. (2011). Music therapy for patients
receiving spine surgery. Journal of Clinical Nursing 20(7–8), 960–968.
Lindblad, F., Hogmark, Å., & Theorell, T. (2007). Music intervention for 5th and 6th graders: Effects
on development and cortisol secretion. Stress and Health 23(1), 9–14.
Linnemann, A., Ditzen, B., Strahler, J., Doerr, J. M., & Nater, U. M. (2015). Music listening as a
means of stress reduction in daily life. Psychoneuroendocrinology 60, 82–90.
Linnemann, A., Kappert, M. B., Fischer, S., Doerr, J. M., Strahler, J., & Nater, U. M. (2015). The
effects of music listening on pain and stress in the daily life of patients with fibromyalgia
syndrome. Frontiers in Human Neuroscience 9, 1–10. Retrieved from
https://doi.org/10.3389/fnhum.2015.00434
Linnemann, A., Strahler, J., & Nater, U. M. (2016). The stress-reducing effect of music listening
varies depending on the social context. Psychoneuroendocrinology 72, 97–105.
Lubin, D., Elliot, J., Black, M., & Johns, J. (2003). An oxytocin antagonist infused into the central
nucleus of the amygdala increases maternal aggressive behavior. Behavioral Neuroscience 117(2),
195–201.
McCraty, R., Atkinson, M., & Rein, G. (1996). Music enhances the effect of positive emotional states
on salivary IgA. Stress Medicine 12, 167–175.
McKinney, C. H., Tims, F. C., Kumar, A. M., & Kumar, M. (1997). The effect of selected classical
music and spontaneous imagery on plasma β-endorphin. Journal of Behavioral Medicine 20(1),
85–99.
MacLean, E. L., Gesquiere, L. R., Gruen, M. E., Sherman, B. L., Martin, W. L., & Carter, C. S.
(2017). Endogenous oxytocin, vasopressin, and aggression in domestic dogs. Frontiers in
Psychology 8. Retrieved from https://doi.org/10.3389/fpsyg.2017.01613
Mallik, A., Chanda, M. L., & Levitin, D. J. (2017). Anhedonia to music and mu-opioids: Evidence
from the administration of naltrexone. Scientific Reports 7, 1–8. Retrieved from
https://doi.org/10.1038/srep41952
Mariath, L. M., da Silva, A. M., Kowalski, T. W., Gattino, G. S., De Araujo, G. A., Figueiredo, F. G.,
… Schuch, J. B. (2017). Music genetics research: Association with musicality of a polymorphism
in the AVPR1A gene. Genetics and Molecular Biology 40(2), 421–429.
Maulina, T., Djustiana, N., & Shahib, M. N. (2017). The effect of music intervention on dental
anxiety during dental extraction procedure. The Open Dentistry Journal 11(1), 565–572.
Mejía-Rubalcava, C., Alanís-Tavira, J., Mendieta-Zerón, H., & Sánchez-Pérez, L. (2015). Changes
induced by music therapy to physiologic parameters in patients with dental anxiety.
Complementary Therapies in Clinical Practice 21(4), 282–286.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological
connectivity of the mesolimbic system. NeuroImage 28(1), 175–184.
Metherate, R. (2011). Functional connectivity and cholinergic modulation in auditory cortex.
Neuroscience & Biobehavioral Reviews 35(10), 2058–2063.
Migneault, B., Girard, F., Albert, C., Chouinard, P., Boudreault, D., Provencher, D., … Girard, D. C.
(2004). The effect of music on the neurohormonal stress response to surgery under general
anesthesia. Anesthesia & Analgesia 98(2), 527–532.
Miller, N. S., Kwak, Y., Bohnen, N. I., Müller, M. L. T. M., Dayalu, P., & Seidler, R. D. (2013). The
pattern of striatal dopaminergic denervation explains sensorimotor synchronization accuracy in
Parkinson’s disease. Behavioural Brain Research 257, 100–110.
Möckel, M., Störk, T., Vollert, J., Röcker, L., Danne, O., Hochrein, H., … Frei, U. (1995). Stress
reduction through listening to music: Effects on stress hormones, hemodynamics and mental state
in patients with arterial hypertension and in healthy persons. Deutsche Medizinische Wochenschrift
120(21), 745–752.
Moriizumi, T., & Hattori, T. (1992). Choline acetyltransferase-immunoreactive neurons in the rat
entopeduncular nucleus. Neuroscience 46(3), 721–728.
Morley, A. P., Narayanan, M., Mines, R., Molokhia, A., Baxter, S., Craig, G., … Craig, I. (2012).
AVPR1A and SLC6A4 polymorphisms in choral singers and non-musicians: A gene association
study. PLoS ONE 7(2), 2–8. Retrieved from https://doi.org/10.1371/journal.pone.0031763
Morley, B. J., & Happe, H. K. (2000). Cholinergic receptors: Dual roles in transduction and
plasticity. Hearing Research 147(1–2), 104–112.
Motts, S. D., & Schofield, B. R. (2010). Cholinergic and non-cholinergic projections from the
pedunculopontine and laterodorsal tegmental nuclei to the medial geniculate body in guinea pigs.
Frontiers in Neuroanatomy 4, 1–8. Retrieved from https://doi.org/10.3389/fnana.2010.00137
Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J., … Möller, H. E. (2015).
Investigating the dynamics of the brain response to music: A central role of the ventral
striatum/nucleus accumbens. NeuroImage 116, 68–79.
Murphy, D. D., Rueter, S. M., Trojanowski, J. Q., & Lee, V. M. (2000). Synucleins are
developmentally expressed, and alpha-synuclein regulates the size of the presynaptic vesicular
pool in primary hippocampal neurons. Journal of Neuroscience 20(9), 3214–3220.
Naganawa, M., Zheng, M.-Q., Henry, S., Nabulsi, N., Lin, S.-F., Ropchan, J., … Huang, Y. (2015).
Test-retest reproducibility of binding parameters in humans with 11C-LY2795050, an antagonist
PET radiotracer for the κ opioid receptor. Journal of Nuclear Medicine 56(2), 243–248.
Narendran, R., Mason, N. S., Laymon, C. M., Lopresti, B. J., Velasquez, N. D., May, M. A., …
Frankle, W. G. (2010). A comparative evaluation of the dopamine D(2/3) agonist radiotracer [11C]
(-)-N-propyl-norapomorphine and antagonist [11C]raclopride to measure amphetamine-induced
dopamine release in the human striatum. Journal of Pharmacology and Experimental Therapeutics
333(2), 533–539.
Narendran, R., Slifstein, M., Guillin, O., Hwang, Y., Hwang, D. R., Scher, E., … Laruelle, M. (2006).
Dopamine (D2/3) receptor agonist Positron Emission Tomography radiotracer [11C]-(+)-PHNO is
a D3 receptor preferring agonist in vivo. Synapse 60(7), 485–495.
Nater, U. M., Abbruzzese, E., Krebs, M., & Ehlert, U. (2006). Sex differences in emotional and
psychophysiological responses to musical stimuli. International Journal of Psychophysiology
62(2), 300–308.
Nilsson, U. (2009). Soothing music can increase oxytocin levels during bed rest after open-heart
surgery: A randomised control trial. Journal of Clinical Nursing 18(15), 2153–2161.
Nilsson, U., Unosson, M., & Rawal, N. (2005). Stress reduction and analgesia in patients exposed to
calming music postoperatively: A randomized controlled trial. European Journal of
Anaesthesiology 22(2), 96–102.
Numan, M., Bress, J. A., Ranker, L. R., Gary, A. J., DeNicola, A. L., Bettis, J. K., & Knapp, S. E.
(2010). The importance of the basolateral/basomedial amygdala for goal-directed maternal
responses in postpartum rats. Behavioural Brain Research 214(2), 368–376.
Oczkowska, A., Kozubski, W., Lianeri, M., & Dorszewska, J. (2014). Mutations in PRKN and SNCA
genes important for the progress of Parkinson’s disease. Current Genomics 14(8), 502–517.
Okada, K., Kurita, A., Takase, B., Otsuka, T., Kodani, E., Kusama, Y., … Mizuno, K. (2009). Effects
of music therapy on autonomic nervous system activity, incidence of heart failure events, and
plasma cytokine and catecholamine levels in elderly patients with cerebrovascular disease and
dementia. International Heart Journal 50(1), 95–110.
Ooishi, Y., Mukai, H., Watanabe, K., Kawato, S., & Kashino, M. (2017). Increase in salivary
oxytocin and decrease in salivary cortisol after listening to relaxing slow-tempo and exciting fast-
tempo music. PLoS ONE 12(12), 1–16. Retrieved from
https://doi.org/10.1371/journal.pone.0189075
Owen, D. R. J., Gunn, R. N., Rabiner, E. A., Bennacef, I., Fujita, M., Kreisl, W. C., … Parker, C. A.
(2011). Mixed-affinity binding in humans with 18-kDa translocator protein ligands. Journal of
Nuclear Medicine 52(1), 24–32.
Pan, W. X., & Hyland, B. I. (2005). Pedunculopontine tegmental nucleus controls conditioned
responses of midbrain dopamine neurons in behaving rats. Journal of Neuroscience 25(19), 4725–
4732.
Pierrehumbert, B., Torrisi, R., Laufer, D., Halfon, O., Ansermet, F., & Beck Popovic, M. (2010).
Oxytocin response to an experimental psychosocial challenge in adults exposed to traumatic
experiences during childhood or adolescence. Neuroscience 166(1), 168–177.
Qiu, J., Jiang, Y.-F., Li, F., Tong, Q.-H., Rong, H., & Cheng, R. (2017). Effect of combined music
and touch intervention on pain response and β-endorphin and cortisol concentrations in late
preterm infants. BMC Pediatrics 17(1), 1–7. Retrieved from https://doi.org/10.1186/s12887-016-
0755-y
Quiroga Murcia, C., Kreutz, G., Clift, S., & Bongard, S. (2010). Shall we dance? An exploration of
the perceived benefits of dancing on well-being. Arts & Health 2(2), 149–163.
Rabiner, E. A., & Laruelle, M. (2010). Imaging the D3 receptor in humans in vivo using [11C](+)-
PHNO positron emission tomography (PET). International Journal of Neuropsychopharmacology
13(3), 289–290.
Rainville, J. R., Tsyglakova, M., & Hodes, G. E. (2018). Deciphering sex differences in the immune
system and depression. Frontiers in Neuroendocrinology (August). Retrieved from
https://doi.org/10.1016/j.yfrne.2017.12.004
Reese, N. B., Garcia-Rill, E., & Skinner, R. D. (1995a). Auditory input to the pedunculopontine
nucleus: I. Evoked potentials. Brain Research Bulletin 37(3), 257–264.
Reese, N. B., Garcia-Rill, E., & Skinner, R. D. (1995b). Auditory input to the pedunculopontine
nucleus: II. Unit responses. Brain Research Bulletin 37(3), 265–273.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically
distinct dopamine release during anticipation and experience of peak emotion to music. Nature
Neuroscience 14(2), 257–262.
Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., & Dagher, A. Z. R. (2013).
Interactions between the nucleus accumbens and auditory cortices predict music reward value.
Science 340(6129), 216–219.
Schladt, T. M., Nordmann, G. C., Emilius, R., Kudielka, B. M., de Jong, T. R., & Neumann, I. D.
(2017). Choir versus solo singing: Effects on mood, and salivary oxytocin and cortisol
concentrations. Frontiers in Human Neuroscience 11, 1–9. Retrieved from
https://doi.org/10.3389/fnhum.2017.00430
Schneider, N., Schedlowski, M., Schürmeyer, T. H., & Becker, H. (2001). Stress reduction through
music in patients undergoing cerebral angiography. Neuroradiology 43(6), 472–476.
Schofield, B. R. (2010). Projections from auditory cortex to midbrain cholinergic neurons that project
to the inferior colliculus. Neuroscience 166(1), 231–240.
Schwilling, D., Vogeser, M., Kirchhoff, F., Schwaiblmair, F., Boulesteix, A. L., Schulze, A., &
Flemmer, A. W. (2015). Live music reduces stress levels in very low-birthweight infants. Acta
Paediatrica (Oslo, Norway), 104(4), 360–367.
Shimizu, N., Umemura, T., Hirai, T., Tamura, T., Sato, K., & Kusaka, Y. (2013). Effects of movement
music therapy with the naruko clapper on psychological, physical and physiological indices among
elderly females: A randomized controlled trial. Gerontology 59(4), 355–367.
Shotbolt, P., Tziortzi, A. C., Searle, G. E., Colasanti, A., Van Der Aart, J., Abanades, S., … Rabiner,
E. A. (2012). Within-subject comparison of [11C]-(+)-PHNO and [11C]raclopride sensitivity to
acute amphetamine challenge in healthy humans. Journal of Cerebral Blood Flow and Metabolism
32(1), 127–136.
Solís, O., & Moratalla, R. (2018). Dopamine receptors: Homomeric and heteromeric complexes in
l‑DOPA‑induced dyskinesia. Journal of Neural Transmission 1, 1–8. Retrieved from
https://doi.org/10.1007/s00702-018-1852-x
Spencer, R. L., Chun, L. E., Hartsock, M. J., & Woodruff, E. R. (2018). Glucocorticoid hormones are
both a major circadian signal and major stress signal: How this shared signal contributes to a
dynamic relationship between the circadian and stress systems. Frontiers in Neuroendocrinology
49, 52–71.
Stefano, G. B., Zhu, W., Cadet, P., Salamon, E., & Mantione, K. J. (2004). Music alters constitutively
expressed opiate and cytokine processes in listeners. Medical Science Monitor: International
Medical Journal of Experimental and Clinical Research 10(6), MS18–MS27.
Suzuki, M., Kanamori, M., Nagasawa, S., Tokiko, I., & Takayuki, S. (2007). Music therapy-induced
changes in behavioral evaluations, and saliva chromogranin A and immunoglobulin A
concentrations in elderly patients with senile dementia. Geriatrics & Gerontology International
7(1), 61–71.
Tabrizi, E. M., Sahraei, H., & Rad, S. M. (2012). The effect of music on the level of cortisol, blood
glucose and physiological variables. EXCLI Journal 11, 556–565.
Tan, Y. T., McPherson, G. E., Peretz, I., Berkovic, S. F., & Wilson, S. J. (2014). The genetic basis of
music ability. Frontiers in Psychology 5, 1–19. Retrieved from
https://doi.org/10.3389/fpsyg.2014.00658
Thoma, M. V., La Marca, R., Brönnimann, R., Finkel, L., Ehlert, U., & Nater, U. M. (2013). The
effect of music on the human stress response. PLoS ONE 8(8), 1–12. Retrieved from
https://doi.org/10.1371/journal.pone.0070156
Thompson, A. M. (2003). Pontine sources of norepinephrine in the cat cochlear nucleus. Journal of
Comparative Neurology 457(4), 374–383.
Thompson, R. R., & Walton, J. C. (2004). Peptide effects on social behavior: Effects of vasotocin and
isotocin on social approach behavior in male goldfish (Carassius auratus). Behavioral
Neuroscience 118(3), 620–626.
Trappe, H.-J., & Voit, G. (2016). The cardiovascular effect of musical genres. Deutsches Ärzteblatt
International 113(20), 347–352.
Turvey, S. E., & Broide, D. H. (2010). Innate immunity. Journal of Allergy and Clinical Immunology
125(2 Suppl. 2), S24–S32.
Ukkola-Vuoti, L., Kanduri, C., Oikkonen, J., Buck, G., Blancher, C., Raijas, P., … Järvelä, I. (2013).
Genome-wide copy number variation analysis in extended families and unrelated individuals
characterized for musical aptitude and creativity in music. PLoS ONE 8(2). Retrieved from
https://doi.org/10.1371/journal.pone.0056356
Ukkola-Vuoti, L., Oikkonen, J., Onkamo, P., Karma, K., Raijas, P., & Järvelä, I. (2011). Association
of the arginine vasopressin receptor 1A (AVPR1A) haplotypes with listening to music. Journal of
Human Genetics 56(4), 324–329.
Ukkola, L., Onkamo, P., Raijas, P., Karma, K., & Järvelä, I. (2009). Musical aptitude is associated
with AVPR1A-Halotypes. PLoS ONE 4(5), e5534. Retrieved from
https://doi.org/10.1371/journal.pone.0005534
Valdiglesias, V., Maseda, A., Lorenzo-López, L., Pásaro, E., Millán-Calenti, J. C., & Laffon, B.
(2017). Is salivary chromogranin A a valid psychological stress biomarker during sensory
stimulation in people with advanced dementia? Journal of Alzheimer’s Disease 55(4), 1509–1517.
Valstad, M., Alvares, G. A., Egknud, M., Matziorinis, A. M., Andreassen, O. A., Westlye, L. T., &
Quintana, D. S. (2017). The correlation between central and peripheral oxytocin concentrations: A
systematic review and meta-analysis. Neuroscience & Biobehavioral Reviews 78, 117–124.
Veening, J. G., Gerrits, P. O., & Barendregt, H. P. (2012). Volume transmission of beta-endorphin via
the cerebrospinal fluid: A review. Fluids and Barriers of the CNS 9(1), 1. Retrieved from
https://doi.org/10.1186/2045-8118-9-16
Venneti, S., Lopresti, B. J., & Wiley, C. A. (2013). Molecular imaging of microglia/macrophages in
the brain. Glia 61(1), 10–23. Retrieved from https://doi.org/10.1002/glia.22357
Vollert, J., Störk, T., Rose, M., & Möckel, M. (2003). Musik als begleitende therapie bei koronarer
herzkrankheit. Deutsche Medizinische Wochenschrift, 128, 2712–2716.
Wahbeh, H., Calabrese, C., & Zwickey, H. (2007). Binaural beat technology in humans: A pilot study
to assess psychologic and physiologic effects. Journal of Alternative and Complementary
Medicine 13(1), 25–32.
Wang, S., Kulkarni, L., Dolev, J., & Kain, Z. (2002). Music and preoperative anxiety: A randomized,
controlled study. Anesthesia & Analgesia 94(6), 1489–1494.
Willeit, M., Ginovart, N., Graff, A., Rusjan, P., Vitcu, I., Houle, S., … Kapur, S. (2008). First human
evidence of d-amphetamine induced displacement of a D2/3agonist radioligand: A [11C]-(+)-
PHNO positron emission tomography study. Neuropsychopharmacology 33(2), 279–289.
Willeit, M., Ginovart, N., Kapur, S., Houle, S., Hussey, D., Seeman, P., & Wilson, A. A. (2006).
High-affinity states of human brain dopamine D2/3 receptors imaged by the agonist [11C]-(+)-
PHNO. Biological Psychiatry 59(5), 389–394.
Woof, J. M., & Ken, M. A. (2006). The function of immunoglobulin A in immunity. Journal of
Pathology 208(2), 270–282.
Yamamoto, T., Ohkuwa, T., Itoh, H., Kitoh, M., Terasawa, J., Tsuda, T., … Sato, Y. (2003). Effects of
pre-exercise listening to slow and fast rhythm music on supramaximal cycle performance and
selected metabolic variables. Archives of Physiology and Biochemistry 111(3), 211–214.
Yovel, G., Shakhar, K., & Ben-Eliyahu, S. (2001). The effects of sex, menstrual cycle, and oral
contraceptives on the number and activity of natural killer cells. Gynecologic Oncology 81(2),
254–262.
Yuhi, T., Kyuta, H., Mori, H.-A., Murakami, C., Furuhara, K., Okuno, M., … Higashida, H. (2017).
Salivary oxytocin concentration changes during a group drumming intervention for maltreated
school children. Brain Sciences 7, 152. Retrieved from https://doi.org/10.3390/brainsci7110152
CHAPTER 15

THE NEUROAESTHETICS
OF MUSIC: A RESEARCH
AGENDA COMING OF AGE

ELVIRA BRATTICO

Historically, the study of music, how it is perceived and appreciated and
how it is created (composed) and produced (performed), has been
approached in two broadly distinct ways. On one hand, music has been
studied as a succession of compositions and composers and how these are
acclaimed in different epochs. This “humanistic” approach uses the
descriptive methods of history, sociology, and philosophy, and it is often
identified with musicology proper. Within this approach, philosophical
aesthetics of music finds its place (Scruton, 1999): the goal is to describe
the change of musical taste over time, namely the explicit or tacit
principles that govern the consensus on what is considered musically
acceptable and admirable (“beautiful”) and what is not. The peculiarity of
this “humanistic” approach is its attention to the work of a single composer
or musician, narrated so as to evidence the uniqueness and exceptionality of
his or her work and its non-replicable contribution to humanity (Zeki, 2014).
On the other hand, music has also been studied analytically with
methods resembling natural sciences more than humanities. Music theory in
primis and systematic musicology in secundis have evidenced the
conventions that underlie music composition, namely the recipes for
creating music, derived from the work of generally recognized composers,
and the constant laws of perception that govern how music is understood
and appreciated. With the advent of cognitive science, this “systematic”
approach, grounded on the scientific method, has been inspired by the
computer metaphor in the search for universal rules that govern how we
perceive, appreciate, and produce music (Sloboda, 1985). The search for the
laws of music perception and cognition has profited from neurological
findings in which patients with brain lesions in auditory temporal areas
showed a loss of musical perceptual abilities accompanied by
preservation of other auditory perceptual skills (Peretz & Zatorre, 2005).
These studies, when supported by the opposite findings, namely when
showing a double dissociation between music and language perception,
provided the grounds for the initial influential models of music perception
and production, listing a set of modules, each dedicated to encapsulated and
automatic subskills (Peretz & Coltheart, 2003). This line of research,
bridging systematic musicology with brain-lesion studies, has seen its
climax in the 1980s and early 1990s.
The 1990s, called the “decade of the brain,” also witnessed a surge of
interest in answering epistemological (perception- and cognition-related)
questions with experiments on healthy volunteers using methods borrowed
from neurophysiology and neurology. New brain scanning devices such as
magnetoencephalography (MEG, measuring the magnetic fields around ion
currents produced by neurotransmission) and functional magnetic resonance
imaging (fMRI, measuring neuron-activity-dependent hemoglobin changes
in blood flow in the brain) allowed access to the study of music brain
functions to a broader group of researchers, without the need to study rare
brain-lesion patients. Healthy volunteers could be increasingly measured
during music tasks without causing any harm to them, apart from the short-
lasting discomfort of the experimental session. This variation of the
systematic approach peaked in the 2000s decade and is called “cognitive
neuroscience of music” (Levitin & Tirovolas, 2009; Peretz & Zatorre, 2003;
Samson, Dellacherie, & Platel, 2009) or more simply “neurosciences of
music” (Altenmüller et al., 2012; Bigand & Tillmann, 2015). According to
these accounts, music corresponds to a biological function, involving
universal features that are shared by all humans ontogenetically (since birth)
and phylogenetically (since the appearance of Homo sapiens). More complex
models of music perception, cognition, and emotions started to emerge,
incorporating findings that pointed at shared rather than modular neural
resources dedicated to music, in relation to other auditory functions
(Frühholz, Trost, & Kotz, 2016; Koelsch & Siebel, 2005; Patel, 2008).
Hence, in cognitive neuroscience of music, the main goals have been, and
still are, the search for brain specializations for music (as opposed to
speech), the determination of the neural foundations of music perception,
emotion, and production, and the identification of music effects on other
brain functions. Overall, the predominant topics and models within
cognitive neuroscience of music leave little space to aesthetic processes
such as evaluative judgments, appreciation, and taste formation.
In recent years, though, we have been witnessing a paradigm shift within
the “systematic” approach, centered on a revised conceptualization of
music, one that might ultimately reconcile this approach with the traditional
“humanistic” one. This shift has been initiated by studies that have focused
on the subjective experience of music listening, rather than the objective,
physical attributes of it. In these studies, experimental participants were
asked to bring their own music to the laboratory, and their individual
reactions to the music heard became the focus of investigation,
irrespective of which object induced those reactions (Blood & Zatorre,
2001; Brattico et al., 2016). This experience is referred to as aesthetic when
it originates in association with an artistic, human-made object without clear
utilitarian functions. In several philosophical conceptualizations, in art and
music what matters is the phenomenological content of the individual
experience. The scientific method applied to the study of this experience is
called empirical aesthetics, when mainly behavioral methods are used, or
neuroaesthetics, when also brain research techniques are applied. In
empirical aesthetics and neuroaesthetics, researchers strive to fragment the
aesthetic experience into subprocesses or stages that can be studied
separately and that, when replicated, can produce a predictable outcome.
However, since the human mind possesses an embodied craving for beauty,
harmony, and symmetry, some artistic-object features that generate an
aesthetic experience occur more frequently than others (Chatterjee &
Vartanian, 2016; Conway & Rehding, 2013; Pearce et al., 2016; Smith,
2005).
Indeed, art and music are forms of human expression that are as old as
our species is (Aubert et al., 2014; Curtis, 2006). Hence, the aesthetic
experience of music (and other arts) must be a biological as well as a
cultural phenomenon. This point of view does not in any way downplay the
act of creation, but rather emphasizes the fact that an aesthetic experience
has aspects that are amenable for analysis in terms of biological
frameworks. A recent cross-cultural study (Savage, Brown, Sakai, &
Currie, 2015) provides further support for studying the musical features that
elicit aesthetic experiences common to all humans. This study showed that the well-studied
statistically predominant perceptual and cognitive features of music (pitch:
use of discrete pitches, small intervals and melodic contours; rhythm:
isochronous beat and multiples of beats; form: short phrases lasting less
than 9 sec) are accompanied by other features that have been thus far
marginalized in scientific investigation, namely instrumentation (concurrent
use of voice and instruments), performance style (chest voice), and social
context (performed in groups and by males). These features relate to aspects
of music that are relevant in an aesthetic experience and that have been thus
far mainly related to cultural transmission rather than biology: for instance,
mastering the style of the music is often a prerequisite for reaching a
positive aesthetic outcome and the type of social context is also a
determinant of a musical aesthetic experience. In line with this, a meta-
analysis has summarized the reasons for listening to music (Schäfer,
Sedlmeier, Städtler, & Huron, 2013), illustrating, from the subjective
experiential viewpoint, how music can be addressed by scientific
investigation. Among the 129 surveyed reasons from the literature, three
main factors emerged: social relatedness, self-awareness, and mood
regulation/arousal. The last factor supports previous claims that music
listening behavior is explained by the emotional and aesthetic impact of
music. The other two factors have been less studied with neuroscience
methods, also due to the limitations intrinsic to the experimental setup.
In the present chapter, I first describe the general framework of
neuroaesthetics of art that has inspired the advocated paradigm shift from
music neuroscience to music neuroaesthetics, and then provide some
putative reasons for the slow emergence of this field of research, as opposed
to the neuroaesthetics of visual arts. Then, I list some of the main findings
obtained within music neuroaesthetics that have been organized in the few
models existing in the literature. The discussion is dedicated to the frontiers
in the study of intra-subject neural interactions between brain areas that
give rise to aesthetic responses and to the latest attempts to capture
neural attributes of inter-subject interactions during musical performance.

The New Science of Neuroaesthetics: Definitions and Aims

The term neuroaesthetics was first coined by Semir Zeki almost two
decades ago (Zeki, 1999) to indicate a multidisciplinary field of research,
focused at first on visual art, merging a long history of philosophical and
empirical aesthetics with the methodology of cognitive and affective
neurosciences (Chatterjee, 2011; Chatterjee & Vartanian, 2014, 2016;
Conway & Rehding, 2013; Nadal & Pearce, 2011; Pearce et al., 2016).
Neuroaesthetics seeks to understand the neural principles underlying the
different processes that compose a human aesthetic experience with an
artistic object (Livingstone & Hubel, 2002).
An aesthetic experience has been defined as a psychological state
determined by interaction with an object to which we intend to attribute
(evaluate/appraise) positive or negative qualities according to perceptual,
cognitive, affective, or cultural criteria. It is intrinsically different from
other affective experiences due to a special attitude (also referred to as
focus, stance, or pre-classification) toward the object. According to a
Kantian notion, this aesthetic stance is often characterized by being
disinterested, distanced from the primary emotional needs of the organism
(Leder, Gerger, Brieber, & Schwarz, 2014). According to a somewhat
tautological definition, an aesthetic experience is “an experience of a
particular kind of object that has been designed to produce such an
experience” (Bundgaard, 2015, p. 788). According to this
conceptualization, an aesthetic experience arises when, through perceptual-
representational processes, we attribute to the stimulus a meaning based on
aesthetic evaluation. While there exist some universal laws of preference
for some stimulus configurations (e.g., according to Gestalt laws humans
tend to like symmetry, equilibrium, and order due to the organizing functions
of the organism; Cupchik, 2007; Eysenck, 1942), the stimulus alone is not
by itself the source of an aesthetic experience. Rather, it is the intentional
relation and attitude that the subject has with the stimulus. Because of this,
subjectivity is intrinsic in aesthetic responses. A stimulus that is
aesthetically appealing to one person can be repulsive to another. These
variations derive from both the internal state, including the personal
experience of previous encounters with the stimulus, and the attitudes
toward the stimulus, the current mood, and the innate biological
predispositions for processing the stimulus and for having an aesthetic
experience as a whole (Pelowski, Markey, Forster, Gerger, & Leder, 2017).
In line with this conceptualization, the research field of neuroaesthetics is
dedicated to studying how the brain facilitates the human capacity for
experiencing phenomena as “aesthetic” and for creating objects that evoke
such experience. To delve into these aims, one can choose two possible
directions of investigation, as also conceptualized by Brattico (2015),
Cupchik and colleagues (Cupchik, Vartanian, Crawley, & Mikulis, 2009),
Jacobsen and Beudt (2017), and Pelowski et al. (2017): on one hand, the
bottom-up perceptual facilitation of aesthetic responses based on the
physical properties of an artwork, and, on the other hand, the feedback and
feedforward relationship between top-down, intentional orientation of
attention and the artwork. Following Redies (2015), this dualism in
how aesthetic phenomena are studied can be represented as a dichotomy
between formalist and contextual theories. Formalist theories propose that
the aesthetic experience relies on formal properties of the stimulus (e.g.,
symmetry, sensual beauty), which are considered to be universal and based
on human brain physiology. Often in these theories, aesthetic responses to
art are described as automatic and independent of conscious control
(Zeki, 2013). In turn, in contextual theories the aesthetic experience
depends on the intention of the artist and the circumstances under which the
artwork has been created and is displayed. Some of these theories focus on
contemporary abstract art, characterized by a lesser role given to sensory
features (Jacobsen, 2014; Leder, Belke, Oeberst, & Augustin, 2004;
Pelowski, Markey, Lauring, & Leder, 2016). Some proposals also attempt a
reconciliation between the two opposite stands, modeling the impact of top-
down and bottom-up factors depending on the type of artistic stimulus that
is at hand. For instance, in the model by Redies (2015), external
information, meaning the stimulus features and context in which it is
displayed, is distinct from internal representation, meaning the subjective
representation and reaction to the stimulus by the beholder. In this particular
model, aesthetic experience is reached only with favorable encoding and
cognitive mastering of the stimulus. In most proposals, mainly focused on
visual art (Pearce et al., 2016; Pelowski et al., 2016), the aesthetic
experience seems to emerge from the interaction of cognitive, affective, and
evaluative processes, involving at least three different brain processes: (a)
an enhancement of low-level sensory processing; (b) high-level top-down
processing and activation of cortical areas involved in evaluative judgment;
(c) an engagement of the reward circuit, including cortical and subcortical
regions.
The initial efforts within neuroaesthetics of visual art involved
measurements of subjects’ brain activity while they evaluated the beauty or
preference of artistic versus natural pictures (e.g., Vartanian & Goel, 2004),
while they rated the beauty or correctness of abstract visual patterns (e.g.,
Jacobsen & Höfel, 2003), or while they viewed abstract, still life,
landscape, or portrait pictures classified as beautiful, ugly, or neutral prior
to the brain scanning session (e.g., Kawabata & Zeki, 2004). After these
inspiring works, a great number of publications using neuroimaging and
neurophysiological techniques have followed. Current neuroaesthetic
research has fractionated human responses to art into the main outcomes of
aesthetic emotions (e.g., pleasure, being moved, interest), preference (e.g.,
conscious liking), and judgment (e.g., beauty), associating to each of them a
replicable and reliable pattern of neural and physiological activity (Brattico
et al., 2016; Brattico, Bogert, & Jacobsen, 2013; Brattico & Pearce, 2013;
Chatterjee & Vartanian, 2014, 2016; Istok, Brattico, Jacobsen, Ritter, &
Tervaniemi, 2013; Jacobsen, 2014; Leder, Markey, & Pelowski, 2015;
Nieminen, Istok, Brattico, Tervaniemi, & Huotilainen, 2011; Pearce et al.,
2016; Pelowski et al., 2016; Reybrouck & Brattico, 2015). In these
proposals, aesthetic emotions are the subjective feelings elicited by an
artistic object whereas aesthetic judgments are defined as subjective
evaluations based on an individual set of criteria. Moreover, several factors
affecting the aesthetic experience have been targeted by neuroscientific
investigation: environment, intentions, familiarity, expertise, and attitudes.
In the latest overarching proposal called the Vienna Integrated Model of Art
Perception or VIMAP (Pelowski et al., 2017), bottom-up processing of
low-level, artwork-derived features, comprising perceptual analysis, implicit memory
integration, and explicit classification, is conjoined with top-down factors.
Among those latter factors, cognitive mastery, namely the matching of all
information collected in previous processing stages to existing predictions
and schemata, plays a central role and leads to the creation of meaning and
associations. Brain substrates of the different stages of the visual aesthetic
experience have also been identified: visual cortices for feature analysis,
dorsolateral prefrontal cortex for cognitive mastery, default-mode network
regions and error-monitoring regions of the anterior cingulate cortex, limbic
regions (particularly the insula and amygdala) for controlling emotions, and
orbitofrontal cortex for integrating signals from cognitive and emotional
brain regions and issuing aesthetic judgments.
While initial efforts, and indeed the majority, have concentrated on
visual art (paintings), researchers keen on the neuroaesthetic approach
have lately expanded their interest toward several other artistic
domains, such as sculpture (Di Dio, Macaluso, & Rizzolatti, 2007),
architecture (Coburn, Vartanian, & Chatterjee, 2017), dance (Calvo-Merino,
Glaser, Grezes, Passingham, & Haggard, 2005; Calvo-Merino, Jola, Glaser,
& Haggard, 2008), and poetry (Wassiliwizky, Koelsch, Wagner, Jacobsen,
& Menninghaus, 2017). In the past few years, the field has seen a fast
growth with several special issues of journals and books (e.g., Huston,
Nadal, Mora, Agnati, & Cela Conde, 2015; Martindale, Locher, & Petrov,
2007), reviews (Chatterjee, 2011; Chatterjee and Vartanian, 2014, 2016;
Leder & Nadal, 2014; Nadal et al., 2008; Pearce et al., 2016; Pelowski et
al., 2016, 2017), and conferences (e.g., Nadal & Pearce, 2011). While
critiques do exist (Tallis, 2008, 2011), and are indeed welcome for a healthy
scientific debate, in the past two years the status of neuroaesthetics,
especially for visual arts, has changed from that of contingent or trendy to
that of a mature discipline (Chatterjee, 2011; Leder & Nadal, 2014; Pearce
et al., 2016).

Neuroaesthetics: A Research Agenda for Music

Similar to other artistic domains, music is phylogenetically universal: it has
existed across all human cultures and epochs. It might be even older than
our Homo sapiens species: a flute with two holes carved in a bear bone was
found in 1996 in a cave in Slovenia that was inhabited by Neanderthals
(Aubert et al., 2014; Seghdi & Brattico, in press). Music is also
ontogenetically universal considering that it is the first form of
communication between a newborn and a parent and the last one to
disappear when all other cognitive functions have been worn away by
neurodegenerative decay (Golden et al., 2017; Jacobsen et al., 2015;
Matrone & Brattico, 2015). Music shares all these aspects, namely
universality, evolutionary functions, emotional impact, and expressivity, with
other forms of art. Moreover, music is characterized by responses that are
aesthetic in nature, since they involve a variety of emotional processes that
typically are associated with and temporally precede evaluative (subjective)
decisions to consciously like the music heard and attribute to it (objective)
properties of beauty, mastery, or interest, as well as to seek the same
experience again. These processes form a motivational learning loop that
ultimately generates a set of preferences and habits called musical taste. For
instance, the top reasons why we listen to music (Laukka, 2007; McDonald
& Stewart, 2008) and even why we become musicians (Juslin & Laukka,
2004; Sloboda, 1992) are related to the aesthetic responses that music
evokes: enjoyment, being moved, entertainment, and beauty. Also, when
asked to name the adjectives that best describe the aesthetic value of music,
hundreds of university students indicated “beautiful” as the most common
word (Istok et al., 2009).
Hence, cognitive neuroscience can regard music as a form of expressive
art, rather than an auditory domain to be contrasted with the other auditory
domains of speech/language, as proposed in a first essay dedicated to the
emerging field (Brattico & Pearce, 2013), aligning itself to the recent
progress of neuroaesthetics. Along this line of thought, already in the late
1800s, the German philosopher Eduard Hanslick (1825–1904) underlined
the strong links between music and aesthetics as opposed to the utilitarian
function of speech: “Speech and music have their centres of gravity at
different points, around which the characteristics of each are grouped: and
while all specific laws of music will centre in its independent forms of
beauty, all laws of speech will turn upon the correct use of sounds as a
medium of expressing ideas” (Hanslick, 1954, pp. 94–95). In a second
essay dedicated to music neuroaesthetics (Hodges, 2016), the field was
described as counting two distinct research agendas. The first one is a
“broad” agenda that studies music perception, cognition, and emotion,
without explicit reference to aesthetics or to any aesthetic concept, and
which can be identified with the broader field of cognitive neuroscience of
music. The second one is a research agenda of “narrow” scope that can be
identified as the “core” neuroaesthetics of music, since it deals primarily
with aesthetic processes, and it explicitly refers to preference, aesthetic
emotions, and beauty (or other aesthetic) judgments.
The growing body of studies under the umbrella of the “core”
neuroaesthetics of music often does not explicitly refer to any specific model
of the musical aesthetic experience, but typically contains the word
“aesthetic” when describing findings. The goals of the “core”
neuroaesthetics of music are to determine how the neuronal processing of
multisensorial signals leads to aesthetic responses during music listening
and performance. Aesthetic responses include emotions (such as sensory
and conscious pleasure or enjoyment, being moved), liking or preference,
and aesthetic judgment. The present chapter aims at identifying the main
themes that separate music neuroaesthetics from the broader cognitive
neurosciences of music (see Fig. 1).
FIGURE 1. Diagram illustrating the standing of the field of neuroaesthetics of music within
broader human cognitive neuroscience studies.

Existing Models of the Musical Aesthetic Experience

Even though the past few years have witnessed several studies on aesthetic-
related phenomena during music listening, the scientific questions asked
have often been addressed without any explicit reference to overarching
aesthetic frameworks, differently from what happens in visual research
(Brattico & Pearce, 2013; Hodges, 2016). In a critical integrative analysis
of thirty-one empirical aesthetic studies conducted between 1990 and 2015
(selected from an initial pool of 1,450 references) (Tiihonen,
Brattico, Maksimainen, Wikgren, & Saarikallio, 2017), it was noted that
scientific investigations of pleasure, one of the main subjective aesthetic
responses to any artwork, have been contextualized within aesthetic
frameworks and concepts for the visual modality (studies using stimuli
from figurative arts, such as painting or sculptures) whereas they were
linked to basic neuroscientific literature on primary pleasure (or the absence
of it) for the music modality. This analysis confirms that visual empirical
aesthetics and neuroaesthetics are active fields with a number of established
and well-recognized frameworks, whereas research on music is dominated by
sensory and basic-emotion models.
The current situation can be attributed to the scarcity of brain-based
models of aesthetic processes in music, leading to few attempts at
overarching interpretations of the individual neuroscientific findings
obtained. One of these models (illustrated in Fig. 2) is characterized by a
chronometric distinction of the information processing stages leading to
aesthetic responses. This and further developments by the same authors
establish a distinction between pre-attentive, low-level perceptual and
emotional stages, and reflective processes involving cognitive control
(Brattico, 2015; Brattico et al., 2013; Brattico & Pearce, 2013; Nieminen et
al., 2011; Reybrouck & Brattico, 2015). These stages lead to the three main
outcomes of an aesthetic experience, namely emotion, preference, and
judgment (Brattico, 2015; Brattico & Pearce, 2013). These previous
accounts include a locationist view combined with a temporal information
processing description of the brain mechanisms involved in the aesthetic
experience of music: each temporally evolving stage depends on a distinct
set of specific brain structures. The final outcomes of the aesthetic
experience require the succession of all previous stages in order to
materialize. For instance, according to Brattico et al. (2013), conscious
liking judgments can be issued after the brainstem, thalamus, and limbic
regions have quickly reacted to salient features of the sound, and after the
frontotemporal cortex has encoded and integrated those sound features with
learned cognitive schemata, using parietal and action observation neural
resources for attributing emotional connotations to the sounds (see Fig. 2).
If all these stages are successfully completed, and if limbic, prefrontal, and
mentalizing brain regions are conspicuously activated, then a liking
judgment, possibly accompanied also by a beauty verdict, is issued.
FIGURE 2. A schematic representation of a previous framework concerning the timing,
localization, and effects of neural processes contributing to aesthetic experience (modified from
Brattico et al., 2013). The lower block shows how the various processes evolve as a function of time,
beginning from the first sensory analyses to the main outcomes of aesthetic emotions, preference,
and judgments. The upper block illustrates their rough anatomic locations and connections in the
human brain. ABR = auditory brainstem response; LPP = late positive potential.

Other influential models that inform research on music neuroaesthetics,
although not explicitly referring to the aesthetic experience as a whole, have
targeted either music-induced emotions or mainly pleasure (irrespective
of other emotions). The most influential model of music-induced emotions
was first proposed by Juslin and Västfjäll (2008) and, in its current form,
identifies eight main mechanisms that are supposed to explain the induction
of any musical emotion. The six initially proposed were brainstem reflexes
(the automatic reactions to salient, potentially
important, features of sounds), evaluative conditioning (deriving from
repeated pairing of music to positive or negative stimuli), emotional
contagion (when music mimics a bodily or vocal emotional expression),
visual imagery (association with visual images during listening), episodic
memory (elicitation of a memory for a particular event), and musical
expectancy. This latter mechanism has been strongly linked with two
important forces accounting for a rewarding musical experience:
predictability and surprise (Huron, 2006). During listening, we use our
former encounters with music and our implicit knowledge of musical
conventions to consciously or implicitly anticipate the outcomes of the
musical “paths,” wondering where they might lead us (Huron, 2006, 2009).
According to Huron (2006, 2009), Imagination, Tension, Prediction,
Reaction, and Appraisal (ITPRA) create a loop leading to musical pleasure:
anticipating future events in music through imagination creates both
physiological and psychological tension, and both unconscious and
conscious predictions for specific features are formed; the final outcome is
a reaction leading to a conscious appraisal response (whether the outcome is
good, bad, or something in between). In a summarizing effort, Vuust &
Kringelbach (2010) identified a dichotomy between extra-musical
mechanisms that rely, for example, on associations with past events or other
emotional sounds and the intra-musical mechanism of anticipation.
According to this latter mechanism, a musical structure would be
aesthetically pleasing when it optimally challenges learned predictions for
incoming events (Vuust & Witek, 2014; Witek, Clarke, Wallentin,
Kringelbach, & Vuust, 2014; Witek, Kringelbach, & Vuust, 2015). In the
brain, dopaminergic neurotransmission between ventral tegmental area,
ventral striatum (including the nucleus accumbens), amygdala, and insula
up to the orbitofrontal cortex, is associated with desire for a reward,
especially when it comes as unexpected (prediction error), whereas dorsal
striatum and opioid neurotransmission seem to be related with the actual
pleasurable reaction (Berridge & Kringelbach, 2015; Kringelbach &
Berridge, 2017; Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011).
To the initial six emotion-inducing mechanisms proposed by Juslin and
Västfjäll (2008), another two were added by Juslin (2013). The first one
was rhythmic entrainment. This mechanism is particularly interesting from
the neuroscience perspective as it has been linked to the neural mechanism
that synchronizes the firing frequency of neuronal assemblies to the pulse of
the music heard (Large & Snyder, 2009), although not at tempi below 1
note per second (Doelling & Poeppel, 2015). In some cases, this neuronal
entrainment can be observed even in the spectral domain. For instance, a
dissonant sound seems to elicit neuronal activity that periodically oscillates
at the same frequency as the beats (amplitude modulations) of the sound
(Fishman et al., 2001; Pallesen et al., 2015).
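To make the entrainment idea concrete, the following minimal Python sketch (an illustration of mine, not drawn from the cited studies) simulates a noisy signal containing an oscillation at a 2 Hz beat rate and recovers that rate as a spectral peak; the sampling rate, duration, and noise level are arbitrary assumptions.

```python
import numpy as np

fs = 250.0                      # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)    # 10 seconds of signal
beat_hz = 2.0                   # a 120-bpm pulse
rng = np.random.default_rng(1)

# Simulated response: an oscillation at the beat rate buried in noise
signal = np.cos(2 * np.pi * beat_hz * t) + rng.standard_normal(t.size)

# The entrained component appears as a peak at the beat frequency
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
peak_hz = freqs[np.argmax(spectrum[1:]) + 1]    # skip the DC bin
print(f"spectral peak at {peak_hz:.2f} Hz (beat rate was {beat_hz} Hz)")
```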
The other added mechanism was aesthetic judgment (Juslin, 2013), which
begins when a special aesthetic attitude is adopted and which is based on a set
of individual criteria determining the preference for or rejection of a particular
musical piece. Aesthetic judgment, according to Juslin (2013), accounts for
the special nature of music-induced emotions that distinguish them from
mundane emotions (such as sadness for a sudden loss) as well as for the
common incidence of mixed emotions induced by music representing
negative emotions but producing pleasurable feelings of enjoyment.
Notably, in this model, the distinction between emotions, preference, and
judgments is made, similarly to Brattico et al. (2013), although it differs
from that model in its lesser emphasis on temporally succeeding, neurally
distinct processes.
The most comprehensive accounts of the aesthetic experience (Brattico
et al., 2013; Hargreaves & North, 2010; Hodges, 2016; Juslin, 2013) also
cover the context, namely the external physical environment surrounding the
individual during a musical activity. The listening experience changes
depending on whether it is consumed alone or with peers, in a concert hall
or at home. The listener, that is, the internal state of the individual
(attention, intention, attitude, motivation, personality) cannot be omitted
either (Brattico et al., 2013; Brattico, 2015; Hargreaves & North, 2010;
Hodges, 2016; Reybrouck and Brattico, 2015); a specific internal state can
either impose an incidental music consumption, in the case of a distracted
person with no intention to have any musical exposure, or cause a full
aesthetic experience with positive responses, in the case of the avid
concertgoer.

Main Brain Structures Responsible for Aesthetic Responses to Music

In the information processing models of the aesthetic experience presented
above, the extraction of acoustic features in brainstem, thalamus, and
sensory cortices is the first necessary stage. In music (but also in visual art,
according to some models), emotional responses, described also as
reactions or reflexes, occur already at an early stage and are closely
predicted by the physical content of the stimulus (Brattico et al., 2013;
Pearce, 2015; Reybrouck & Brattico, 2015). For instance, a rough,
dissonant chord can alone excite neuronal assemblies in the limbic system,
such as the amygdala and parahippocampal gyrus (Blood, Zatorre,
Bermudez, & Evans, 1999; Gosselin et al., 2006; Pallesen et al., 2005). In
the case of early emotional reactions to sounds, causing immediate sensory
pleasure, limbic regions can be activated even without the involvement of
higher-order brain areas. A dissociation between fast and slow routes for
pleasure (described in Brattico, 2015; Kringelbach & Berridge, 2017) is
visible in studies involving tasks that distract subjects from deliberate
evaluation of sounds. For instance, in Bogert et al. (2016) limbic regions
were activated in response to emotionally stereotypical music clips only
when subjects were focusing their attention on descriptive aspects of the
sounds, whereas they were downregulated when subjects had to direct their
conscious attention to the emotions expressed by the music.
An intermediate stage of the aesthetic experience, explicitly mentioned
in Brattico et al. (2013) and Juslin (2013) as well as several visual models
(Pelowski et al., 2016, 2017), includes integration of features and the
modulation by existing cultural knowledge. This stage requires the
involvement of lateral prefrontal cortex, particularly the inferior frontal
gyrus, and premotor areas. These brain regions have been repeatedly
implicated in the detection of incongruous sound events violating
expectations based on previous knowledge of musical conventions. The
predictive coding theory of brain function suggests that in both auditory and
frontal regions of the brain, prior predictions are continuously applied top-
down to the incoming signal and, when an error occurs between priors and
the actual signal, predictions are updated in a bottom-up feedback loop that
minimizes free energy (Friston, 2005; Vuust, Ostergaard, Pallesen, Bailey,
& Roepstorff, 2009). These prediction errors can be measured by using the
event-related potential (ERP) technique and by focusing on brain responses
such as the N100 or the mismatch negativity (MMN) or the early right
anterior negativity (ERAN) (Koelsch, 2011; Koelsch & Siebel, 2005),
tracking the information content (probabilistically based on the occurrence
of sounds in the preceding context) or subjective expectancy of sounds
(Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010).
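As a toy illustration of what "information content" means here, consider the Python fragment below (a sketch of my own, using a simple zeroth-order frequency model rather than the actual probabilistic models used in this literature, such as IDyOM): each note's unexpectedness is the negative log probability of that note given the counts of notes heard so far. The 12-symbol pitch-class alphabet and Laplace smoothing are illustrative assumptions.

```python
import math
from collections import Counter

melody = ["C", "E", "G", "C", "E", "G", "C", "F#"]

for i, note in enumerate(melody[1:], start=1):
    context = Counter(melody[:i])       # note occurrences in the preceding context
    # Laplace smoothing over an assumed 12-pitch-class alphabet,
    # so unseen notes still receive a non-zero probability
    p = (context[note] + 1) / (i + 12)
    ic = -math.log2(p)                  # high IC = improbable, unexpected note
    print(f"note {note:>2}: p = {p:.2f}, IC = {ic:.2f} bits")
```

Under this toy model the closing F#, never heard before, receives the highest information content, mirroring the prediction-error signals indexed by the MMN and ERAN.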
During the intermediate stage, discrete emotions expressed by music are
perceived and possibly even induced. While in Juslin’s (2013) model,
emotions are considered as an outcome of the different psychological and
neural mechanisms activated during a listening experience, in Brattico et
al.’s (2013) model emotions are perceived and felt before other aesthetic
outcomes occur. Support for this view comes from studies showing the
independence between conscious, thought-related aesthetic processes and
emotional processes (Bogert et al., 2016; Brielmann & Pelli, 2017; Liu et
al., in press). A recent meta-analysis of fMRI studies on musical emotions
highlights a set of regions in the brain that form the core of the functional
network that processes musical emotions, namely nucleus accumbens,
amygdala, hippocampus, insula, cingulate cortex, orbitofrontal cortex, and
temporal pole (Koelsch, 2014).
One kind of emotional response to music is conscious pleasure or
enjoyment, closely related to liking and preference. In existing models,
enjoyment and conscious liking are described as aesthetic outcomes since
they require a deliberate decision and an evaluative act deriving from the
integration of the preceding cognitive and emotional information processing
stages (Brattico et al., 2013; Juslin, 2013). From the brain perspective,
conscious pleasure and liking, often accompanied by the bodily response of
chills, have been consistently associated with activity of mesolimbic brain
regions of the reward circuit, including the nucleus accumbens, the ventral
tegmental area, the amygdala, the insula, the orbitofrontal cortex, and the
ventromedial prefrontal cortex, which rely on the neurotransmitter
dopamine (Blood & Zatorre, 2001; Blum et al., 2010; Chanda & Levitin,
2013; Koelsch, 2014; Salimpoor et al., 2013; Zatorre, 2015).
A third kind of aesthetic outcome is aesthetic judgment (“this music is
beautiful”). As visible from Table 1, only a few studies have analyzed
aesthetic judgments. Indeed, beauty is the most frequently mentioned criterion
when freely associating a word with the aesthetic value of music (Istok et al., 2009). A
series of studies has aimed at contrasting aesthetic versus cognitive responses to
the same musical stimuli in order to evidence the specificity and
chronometry of the neural mechanisms that govern aesthetic processes
occurring during music listening. The first of these studies (Brattico,
Jacobsen, De Baene, Glerean, & Tervaniemi, 2010) was conducted using
electroencephalography (EEG): subjects were asked to judge the same 180
musical sequences while they were either deciding if the sequences sounded
correct or incorrect (descriptive task) or they were deciding whether they
liked them or not (evaluative task). Results showed larger frontal
negativities for the evaluative than for the descriptive task, suggesting that
more neural resources are involved in “aesthetic” listening. In terms of brain structures, the
orbitofrontal cortex is repeatedly found active in association with beauty
judgments of music (similarly to beauty judgments of visual art) (Brattico et
al., submitted; Ishizu & Zeki, 2011).
Promising Avenues: Neural Interactions in Musical Aesthetic Responses

Recent years have seen a change in the way brain physiology is described,
from a locationist view where each structure subserves one or a few main
functions, to a distributed view where the brain is described as a complex
dynamic system and where the interactions between its components govern
cognitive functions (Bassett & Sporns, 2017; Medaglia, Lynall, & Bassett,
2015). This novel view derives from the technological and scientific
progress of network neuroscience, namely the marriage between network
science and cognitive neuroscience (Bassett & Sporns, 2017). Network
techniques are mathematical tools for describing complex systems organized
as networks that change over time (dynamics) (Medaglia et al., 2015;
Newman, 2010).
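As a concrete, if highly simplified, illustration of such techniques, the short Python sketch below (my own example; the eight regions, random time series, and 0.3 threshold are arbitrary assumptions) builds a functional connectivity matrix from simulated regional time series and derives two elementary graph measures with the networkx library.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_rois, n_timepoints = 8, 200
# Stand-in for regional (ROI) fMRI time series
timeseries = rng.standard_normal((n_rois, n_timepoints))

# Functional connectivity: pairwise Pearson correlations between regions
fc = np.corrcoef(timeseries)

# Threshold the correlations to obtain an undirected graph (no self-loops)
adjacency = (np.abs(fc) > 0.3) & ~np.eye(n_rois, dtype=bool)
graph = nx.from_numpy_array(adjacency.astype(int))

print("node degrees:", dict(graph.degree()))
print("mean clustering coefficient:", nx.average_clustering(graph))
```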
In previous overviews of the music neuroaesthetic field (Brattico &
Pearce, 2013; Hodges, 2016), studies from network neuroscience have
received little mention. Indeed, most studies on functional connectivity have
been published in the past two years. For instance, it has been recently
found that functional connectivity between the superior temporal gyrus
(where the auditory cortex is located), the inferior frontal cortex (where
hierarchical predictions for sounds are computed), and reward regions
determines the pleasurable rewarding responses to music, or the absence of
them (Martínez-Molina, Mas-Herrero, Rodriguez-Fornells, Zatorre, &
Marco-Pallares, 2016; Sachs, Ellis, Schlaug, & Loui, 2016; Salimpoor et
al., 2013; Wilkins, Hodges, Laurienti, Steen, & Burdette, 2014). For
instance, in a study where subjects had to decide how much money they
would use to buy songs, it was found that the connections between the
nucleus accumbens and its surrounding regions (the amygdala and the
hippocampus) predicted how much a participant would spend on each song
(Salimpoor et al., 2013).
The importance of the neural interactions between the nucleus
accumbens and the auditory cortex in determining aesthetic pleasure to
music has also been highlighted by studies aiming at identifying the neural
sources of individual differences in pleasurable reactions to musical sounds
(Keller et al., 2013; Martínez-Molina et al., 2016; Sachs et al., 2016). These
studies originate from the recently established empirical observation that
music is not universally liked and appreciated; rather, individuals vary
greatly in their sensitivity to musical reward, ranging from musicophiles,
characterized by an acute craving for music and increased responsiveness and
interest for musical sounds (Sacks, 2007), to musical anhedonics, with a
total indifference to music (Mas-Herrero, Marco-Pallares, Lorenzo-Seva,
Zatorre, & Rodriguez-Fornells, 2013; Mas-Herrero, Zatorre, Rodriguez-
Fornells, & Marco-Pallares, 2014). A recent study using diffusion tensor
imaging (DTI) has evidenced that the white-matter tracts between the
posterior portion of the superior temporal lobe and emotion- and reward-
processing regions such as the anterior insula and the medial prefrontal
cortex explain the individual differences in reward sensitivity to music
(Sachs et al., 2016). In that study, reward sensitivity was quantified with the
amount of chills experienced by each individual combined with the degree
of physiological changes (heart rate and skin conductance response) during
listening to music inducing chills versus neutral music. Another study
(Martínez-Molina et al., 2016) used the newly developed Barcelona Music
Reward Questionnaire (BMRQ) to identify music-specific anhedonic, hedonic, and
hyperhedonic subjects. They were measured with fMRI during a music
listening task where they rated the pleasantness of the music excerpts, and a
gambling task, where they either won or lost a symbolic amount of money.
Results evidenced decreased regional activity in the ventral striatum
(including the nucleus accumbens) in anhedonics and increased regional
activity in hyperhedonics as well as downregulated functional connectivity
between this area and the right superior temporal gyrus in anhedonics.
These results were obtained only in relation to pleasantness responses to
the music and not with the gambling task.
These findings are not confined to receptive pleasure during listening
but also relate to the desire to move to rhythmic aspects of the music. A
study by Witek et al. (forthcoming) found local changes in directed
effective connectivity between motor (dorsomedial prefrontal) and reward
(striatal) networks during maximal rhythm-induced pleasurable urge to
move. In addition, they showed that maximal pleasurable desire to move to
sound was predicted by a meta-stable brain network organization, namely a
neural organization lying between an ordered and a disordered state
(computed as whole-brain shuffling speed of effective connectivity
matrices) (Deco, Kringelbach, Jirsa, & Ritter, 2017).
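One standard way to quantify such metastability in the Deco and Kringelbach tradition is the variability over time of the Kuramoto order parameter computed over regional phase signals. The Python sketch below illustrates that proxy on synthetic phases; it is not the exact "shuffling speed" measure of the study above, and all parameters are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_regions, n_samples = 10, 1000
# Synthetic regional phases drifting at slightly different rates, plus noise
rates = 0.05 + 0.01 * rng.standard_normal(n_regions)
phases = np.cumsum(np.repeat(rates[:, None], n_samples, axis=1), axis=1)
phases += 0.3 * rng.standard_normal((n_regions, n_samples))

# Kuramoto order parameter R(t): 1 = full synchrony, 0 = incoherence
order = np.abs(np.mean(np.exp(1j * phases), axis=0))
metastability = order.std()    # fluctuation between ordered and disordered states
print(f"mean synchrony: {order.mean():.2f}, metastability: {metastability:.3f}")
```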
These and other studies compellingly demonstrate that functional
connectivity between the superior temporal gyrus (where the auditory
cortex is located), the inferior frontal cortex (where hierarchical predictions
for sounds are computed), and reward regions of the brain are linked with
pleasurable rewarding responses to music, or the absence of them
(Martínez-Molina et al., 2016; Sachs et al., 2016; Salimpoor et al., 2013;
Wilkins et al., 2014). Notably, the neural transmission between these brain
areas is regulated by the monoamine neurotransmitter dopamine, which has
been linked to incentive salience and motivation for acting, namely to the
“wanting” phase of the reward cycle (Kringelbach & Berridge, 2017). A
very recent investigation has discovered a molecular link between affective
sensitivity to (musical) sounds and dopamine functionality (Quarto et al.,
2017): a functional variation in a dopamine receptor gene modulates the
impact of sounds on mood states and emotion-related prefrontal and striatal
brain activity.
The studies reviewed above, while having the important merit of revealing
the complex architecture subserving the rewarding experience of music
listening, have not examined whether this experience can arise
spontaneously, even with casual listening, or whether it requires focused
attention and a particular attitude (that is sometimes referred to as aesthetic
stance). A recent study (Liu et al., in press) contrasted conditions varying in
the type of focused attentional involvement toward the music requested
from subjects. Similarly to previous findings (Bogert et al., 2016; Brattico
et al., 2016; Liu, Abu-Jamous, Brattico, & Nandi, 2017), the study observed
a co-activation in a network of mesiotemporal limbic structures, including
the nucleus accumbens, in response to the liked musical stimuli,
irrespective of whether subjects were focusing on making a conscious
liking evaluation or not. Functional connectivity within prefrontal and
parieto-occipital regions was instead obtained for the liking judgments.

Future Challenges and Perspectives

Until now, the musical experience has been analyzed from the point of view
of the subject. Yet, music (like other arts) can represent a means of
communication between the judgmental intentions of the perceiver and the
meaning-making intentions of the composer/artist. The act of meaning
attribution, which is essential to an aesthetic experience, as argued, for
example, by Chatterjee and Vartanian (2014), Pearce et al. (2016), Leder et
al. (2004), and Menninghaus et al. (2017), cannot exist without the
assignment of an intention to the agent producing the artistic object
(Acquadro, Congedo, & De Riddeer, 2016). Modern neuroscience offers
unprecedented opportunities to capture the essence of such aesthetic
processes, thanks to the hyperscanning approach, namely the synchronized
brain recordings of two or more persons doing an experimental task
together (Hari, Henriksson, Malinen, & Parkkonen, 2015; Konvalinka &
Roepstorff, 2012; Zhdanov et al., 2015; Zhou, Bourguignon, Parkkonen, &
Hari, 2016).
Even if, at present, “core” neuroaesthetics of music does not account much
for motor production, the mirror neuron or action observation
system (a set of neurons in the fronto-parietal regions of the brain that
responds when watching others doing a motor action; Freedberg & Gallese,
2007; Gallese & Freedberg, 2007; Rizzolatti, Fadiga, Gallese, & Fogassi,
1996) has been proposed as a key mechanism allowing aesthetic responses
to music in an interactive situation (Molnar-Szakacs & Overy, 2006).
According to one model (Molnar-Szakacs & Overy, 2006), music is
described as hierarchically organized sequences of motor acts
synchronous with auditory information, activating both the auditory
cortex and motor regions of the action observation network in the posterior
inferior frontal gyrus (BA 44) and adjacent premotor cortex. In this model,
the anterior insula serves to evaluate the internal visceral changes derived
from music and relay these changes to activity in the limbic system, which
ultimately is responsible for the complex affective experiences originating
from music listening. The co-activation of the same motor systems in
musician and perceiver is supposed to allow the co-representation and
sharing of emotions during an aesthetic musical experience. Hence, future
studies using hyperscanning techniques might measure the aesthetic value
of a musical interaction and determine the responsible neural mechanisms.
Initial investigations measuring the inter-subject coupling of
electroencephalographic signals (especially in the beta frequency range) from
guitarists playing in a duet prove the feasibility of this approach
(Lindenberger, Li, Gruber, & Müller, 2009; Müller, Sanger, &
Lindenberger, 2013).
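A simplified sketch of one such inter-subject coupling analysis, assuming scipy and synthetic signals in place of real duet recordings, is given below: two noisy "EEG" channels sharing a 20 Hz beta rhythm are band-pass filtered, their instantaneous phases extracted with the Hilbert transform, and their phase-locking value (PLV) computed. Filter order, band edges, and noise levels are illustrative choices, not settings from the cited studies.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(2)
# Two noisy signals sharing a 20 Hz (beta-band) rhythm with a fixed phase lag
eeg_a = np.sin(2 * np.pi * 20 * t) + 0.8 * rng.standard_normal(t.size)
eeg_b = np.sin(2 * np.pi * 20 * t + 0.5) + 0.8 * rng.standard_normal(t.size)

# Band-pass filter to the beta band (13-30 Hz)
b, a = butter(4, [13, 30], btype="bandpass", fs=fs)
phase_a = np.angle(hilbert(filtfilt(b, a, eeg_a)))
phase_b = np.angle(hilbert(filtfilt(b, a, eeg_b)))

# Phase-locking value: 1 = perfectly locked phases, 0 = no consistent relation
plv = np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))
print(f"inter-subject beta-band PLV: {plv:.2f}")
```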
To conclude, the agenda of the neuroaesthetics of music, by addressing
questions related to intra- and inter-subjectivity during a musical activity,
comes close to the essence of music and of what we are as humans. While
there still is the risk of “biologism,” researchers working under the music
neuroaesthetics umbrella reach out to the “humanistic” approach since they
strive to explain how “musical appreciation is dependent on culture,
memory, mood and many other factors such as personal taste” (Tallis, 2011,
p. 54).

References
Acquadro, M. A., Congedo, M., & De Ridder, D. (2016). Music performance as an experimental
approach to hyperscanning studies. Frontiers in Human Neuroscience 10, 242. Retrieved from
https://doi.org/10.3389/fnhum.2016.00242
Alluri, V., & Toiviainen, P. (2015). Musical expertise modulates functional connectivity of limbic
regions during continuous music listening. Psychomusicology: Music, Mind, and Brain 25(4),
443–454.
Altenmüller, E., Demorest, S. M., Fujioka, T., Halpern, A. R., Hannon, E. E., Loui, P., … Zatorre, R.
J. (2012). Introduction to the neurosciences and music IV: Learning and memory. Annals of the
New York Academy of Sciences 1252, 1–16.
Aubert, M., Brumm, A., Ramli, M., Sutikna, T., Saptomo, E. W., Hakim, B., … Dosseto, A. (2014).
Pleistocene cave art from Sulawesi, Indonesia. Nature 514(7521), 223–227.
Bassett, D. S., & Sporns, O. (2017). Network neuroscience. Nature Neuroscience 20(3), 353–364.
Berns, G. S., Capra, C. M., Moore, S., & Noussair, C. (2010). Neural mechanisms of the influence of
popularity on adolescent ratings of music. NeuroImage 49(3), 2687–2696.
Berridge, K. C., & Kringelbach, M. L. (2015). Pleasure systems in the brain. Neuron 86(3), 646–664.
Bigand, E., & Tillmann, B. (2015). Introduction to the neurosciences and music V: Cognitive
stimulation and rehabilitation. Annals of the New York Academy of Sciences 1337, vii–ix.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity
in brain regions implicated in reward and emotion. Proceedings of the National Academy of
Sciences 98(20), 11818–11823.
Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant
and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience
2(4), 382–387.
Blum, K., Chen, T. J., Chen, A. L., Madigan, M., Downs, B. W., Waite, R. L., … Gold, M. S. (2010).
Do dopaminergic gene polymorphisms affect mesolimbic reward activation of music listening
response? Therapeutic impact on Reward Deficiency Syndrome (RDS). Medical Hypotheses 74(3),
513–520.
Bogert, B., Numminen-Kontti, T., Gold, B., Sams, M., Numminen, J., Burunat, I., … Brattico, E.
(2016). Hidden sources of joy, fear, and sadness: Explicit versus implicit neural processing of
musical emotions. Neuropsychologia 89, 393–402.
Brattico, E. (2015). From pleasure to liking and back: Bottom-up and top-down neural routes to the
aesthetic enjoyment of music. In M. Nadal, J. P. Huston, L. Agnati, F. Mora, & C. J. Cela Conde
(Eds.), Art, aesthetics, and the brain (pp. 303–318). Oxford: Oxford University Press.
Brattico, E., Bogert, B., Alluri, V., Tervaniemi, M., Eerola, T., & Jacobsen, T. (2016). It’s sad but I
like it: The neural dissociation between musical emotions and liking in experts and laypersons.
Frontiers in Human Neuroscience 9, 676. Retrieved from
https://doi.org/10.3389/fnhum.2015.00676
Brattico, E., Bogert, B., & Jacobsen, T. (2013). Toward a neural chronometry for the aesthetic
experience of music. Frontiers in Psychology 4, 206. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00206
Brattico, E., Brattico, P., & Jacobsen, T. (2009). The origins of the aesthetic enjoyment of music: A
review of the literature. Musicae Scientiae 13(2), 15–39.
Brattico, E., Brusa, A., Fernandes, H. M., Jacobsen, T., Gaggero, G., Toiviainen, P., Vuust, P., &
Proverbio, A. M. (submitted). The beauty and the brain: Investigating the neural correlates of
musical beauty during a realistic listening experience.
Brattico, E., Jacobsen, T., De Baene, W., Glerean, E., & Tervaniemi, M. (2010). Cognitive vs.
affective listening modes and judgments of music: An ERP study. Biological Psychology 85(3),
393–409.
Brattico, E., & Pearce, M. T. (2013). The neuroaesthetics of music. Psychology of Aesthetics,
Creativity, and the Arts 7, 48–61.
Brattico, P., Brattico, E., & Vuust, P. (2017). Global sensory qualities and aesthetic experience of
music. Frontiers in Neuroscience 11. Retrieved from https://doi.org/10.3389/fnins.2017.00159
Brielmann, A. A., & Pelli, D. G. (2017). Beauty requires thought. Current Biology 27(10), 1506–
1513 e3.
Brown, S., Gao, X., Tisdelle, L., Eickhoff, S. B., & Liotti, M. (2011). Naturalizing aesthetics: Brain
areas for aesthetic appraisal across sensory modalities. NeuroImage 58(1), 250–258.
Bundgaard, H. (2015). Feeling, meaning, and intentionality: A critique of the neuroaesthetics of
beauty. Phenomenology and the Cognitive Sciences 14(4), 781–801.
Calvo-Merino, B., Glaser, D. E., Grezes, J., Passingham, R. E., & Haggard, P. (2005). Action
observation and acquired motor skills: An FMRI study with expert dancers. Cerebral Cortex 15(8),
1243–1249.
Calvo-Merino, B., Jola, C., Glaser, D. E., & Haggard, P. (2008). Towards a sensorimotor aesthetics of
performing art. Consciousness and Cognition 17(3), 911–922.
Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences
17(4), 179–193.
Chapin, H., Jantzen, K., Kelso, J. A., Steinberg, F., & Large, E. (2010). Dynamic emotional and
neural responses to music depend on performance expression and listener experience. PLoS ONE
5(12), e13812.
Chatterjee, A. (2011). Neuroaesthetics: A coming of age story. Journal of Cognitive Neuroscience
23(1), 53–62.
Chatterjee, A., & Vartanian, O. (2014). Neuroaesthetics. Trends in Cognitive Sciences 18(7), 370–
375.
Chatterjee, A., & Vartanian, O. (2016). Neuroscience of aesthetics. Annals of the New York Academy
of Sciences 1369, 172–194.
Coburn, A., Vartanian, O., & Chatterjee, A. (2017). Buildings, beauty, and the brain: A neuroscience
of architectural experience. Journal of Cognitive Neuroscience 29(9), 1521–1531.
Conway, B. R., & Rehding, A. (2013). Neuroaesthetics and the trouble with beauty. PLoS Biology 11,
e1001504.
Cupchik, G. C. (2007). A critical reflection on Arnheim’s Gestalt theory of aesthetics. Psychology of
Aesthetics, Creativity, and the Arts 1(1), 16–24.
Cupchik, G. C., Vartanian, O., Crawley, A., & Mikulis, D. J. (2009). Viewing artworks: Contributions
of cognitive control and perceptual facilitation to aesthetic experience. Brain and Cognition 70(1),
84–91.
Curtis, G. (2006). The cave painters. New York: Anchor Books.
Deco, G., Kringelbach, M. L., Jirsa, V. K., & Ritter, P. (2017). The dynamics of resting fluctuations
in the brain: Metastability and its dynamical cortical core. Scientific Reports 7, 3095.
doi:10.1038/s41598-017-03073-5
Di Dio, C., Macaluso, E., & Rizzolatti, G. (2007). The golden beauty: Brain response to classical and
renaissance sculptures. PLoS ONE 2(11), e1201.
Doelling, K. B., & Poeppel, D. (2015). Cortical entrainment to music and its modulation by
expertise. Proceedings of the National Academy of Sciences 112(45), E6233–E6242.
Eysenck, H. J. (1942). The experimental study of the “good Gestalt”: A new approach. Psychological
Review 49(4), 344–364.
Fishman, Y. I., Volkov, I. O., Noh, M. D., Garell, P. C., Bakken, H., Arezzo, J. C., … Steinschneider,
M. (2001). Consonance and dissonance of musical chords: Neural correlates in auditory cortex of
monkeys and humans. Journal of Neurophysiology 86(6), 2761–2788.
Freedberg, D., & Gallese, V. (2007). Motion, emotion and empathy in esthetic experience. Trends in
Cognitive Sciences 11(5), 197–203.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B:
Biological Sciences 360(1456), 815–836.
Frühholz, S., Trost, W., & Kotz, S. A. (2016). The sound of emotions: Towards a unifying neural
network perspective of affective sound processing. Neuroscience & Biobehavioral Reviews 68, 96–110.
Gallese, V., & Freedberg, D. (2007). Mirror and canonical neurons are crucial elements in esthetic
response. Trends in Cognitive Sciences 11(10), 411.
Golden, H. L., Clark, C. N., Nicholas, J. M., Cohen, M. H., Slattery, C. F., Paterson, R. W., …
Warren, J. D. (2017). Music perception in dementia. Journal of Alzheimer’s Disease 55(3), 933–
949.
Gosselin, N., Samson, S., Adolphs, R., Noulhiane, M., Roy, M., Hasboun, D., … Peretz, I. (2006).
Emotional responses to unpleasant music correlates with damage to the parahippocampal cortex.
Brain 129(10), 2585–2592.
Hanslick, E. (1954). On the musically beautiful. Indianapolis: Hackett (English translation from the
8th ed. 1891).
Hargreaves, D. J., & North, A. C. (2010). Experimental aesthetics and liking for music. In P. N.
Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications
(pp. 515–546). Oxford: Oxford University Press.
Hari, R., Henriksson, L., Malinen, S., & Parkkonen, L. (2015). Centrality of social interaction in
human brain function. Neuron 88(1), 181–193.
Hodges, D. A. (2016). The neuroaesthetics of music. In S. Hallam, I. Cross, & M. Thaut (Eds.), The
Oxford handbook of music psychology (2nd ed., pp. 247–262). Oxford: Oxford University Press.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA:
MIT Press.
Huron, D. (2009). Aesthetics. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of
music psychology (pp. 151–159). Oxford: Oxford University Press.
Huston, J. P., Nadal, M., Agnati, L., Mora, F., & Cela Conde, C. J. (Eds.). (2015). Art, aesthetics and
the brain. Oxford: Oxford University Press.
Ishizu, T., & Zeki, S. (2011). Toward a brain-based theory of beauty. PLoS ONE 6, e21852.
Istok, E., Brattico, E., Jacobsen, T., Krohn, K., Mueller, M., & Tervaniemi, M. (2009). Aesthetic
responses to music: A questionnaire study. Musicae Scientiae 13, 183–206.
Istok, E., Brattico, E., Jacobsen, T., Ritter, A., & Tervaniemi, M. (2013). “I love rock ’n’ roll”: Music
genre preference modulates brain responses to music. Biological Psychology 92(2), 142–151.
Jacobsen, J. H., Stelzer, J., Fritz, T. H., Chetelat, G., La Joie, R., & Turner, R. (2015). Why musical
memory can be preserved in advanced Alzheimer’s disease. Brain 138(8), 2438–2450.
Jacobsen, T. (2014). Domain specificity and mental chronometry in empirical aesthetics. British
Journal of Psychology 105(4), 471–473.
Jacobsen, T., & Beudt, S. (2017). Domain generality and domain specificity in aesthetic appreciation.
New Ideas in Psychology 47, 97–102.
Jacobsen, T., & Höfel, L. (2003). Descriptive and evaluative judgment processes: Behavioral and
electrophysiological indices of processing symmetry and aesthetics. Cognitive, Affective, &
Behavioral Neuroscience 3(4), 289–299.
Juslin, P. N. (2013). From everyday emotions to aesthetic emotions: Towards a unified theory of
musical emotions. Physics of Life Reviews 10(3), 235–266.
Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A
review and a questionnaire study of everyday listening. Journal of New Music Research 33(3),
217–238.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying
mechanisms. Behavioral and Brain Sciences 31(5), 559–575.
Kawabata, H., & Zeki, S. (2004). Neural correlates of beauty. Journal of Neurophysiology 91(4),
1699–1705.
Keller, J., Young, C. B., Kelley, E., Prater, K., Levitin, D. J., & Menon, V. (2013). Trait anhedonia is
associated with reduced reactivity and connectivity of mesolimbic and paralimbic reward
pathways. Journal of Psychiatric Research 47(10), 1319–1328.
Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model.
Frontiers in Psychology 2, 110. Retrieved from https://doi.org/10.3389/fpsyg.2011.00110
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15,
170–180.
Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive
Sciences 9(12), 578–584.
Konvalinka, I., & Roepstorff, A. (2012). The two-brain approach: How can mutually interacting
brains teach us something about social interaction? Frontiers in Human Neuroscience 6, 215.
Retrieved from https://doi.org/10.3389/fnhum.2012.00215
Kornysheva, K., von Cramon, D. Y., Jacobsen, T., & Schubotz, R. I. (2010). Tuning-in the beat: Aesthetic
appreciation of musical rhythms correlates with a premotor activity boost. Human Brain Mapping
31(1), 48–64.
Kringelbach, M. L., & Berridge, K. C. (2017). The affective core of emotion: Linking pleasure,
subjective well-being, and optimal metastability in the brain. Emotion Review 9(3), 191–199.
Kühn, S., & Gallinat, J. (2012). The neural correlates of subjective pleasantness. NeuroImage 61(1),
289–294.
Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York
Academy of Sciences 1169, 46–57.
Laukka, P. (2007). Uses of music and psychological well-being among the elderly. Journal of
Happiness Studies 8(2), 215–241.
Leder, H., Belke, B., Oeberst, A., & Augustin, D. (2004). A model of aesthetic appreciation and
aesthetic judgements. British Journal of Psychology 95(4), 489–508.
Leder, H., Gerger, G., Brieber, D., & Schwarz, N. (2014). What makes an art expert? Emotion and
evaluation in art appreciation. Cognition and Emotion 28, 1137–1147.
Leder, H., Markey, P. S., & Pelowski, M. (2015). Aesthetic emotions to art: What they are and what
makes them special. Comment on “The quartet theory of human emotions: An integrative and
neurofunctional model” by S. Koelsch et al. Physics of Life Reviews 13, 67–70.
Leder, H., & Nadal, M. (2014). Ten years of a model of aesthetic appreciation and aesthetic
judgments: The aesthetic episode—developments and challenges in empirical aesthetics. British
Journal of Psychology 105(4), 443–464.
Lehne, M., & Koelsch, S. (2015). Tension-resolution patterns as a key element of aesthetic
experience: Psychological principles and underlying brain mechanisms. In J. P. Huston, M. Nadal,
F. Mora, L. Agnati, & C. J. Cela Conde (Eds.), Art, aesthetics, and the brain (pp. 285–302).
Oxford: Oxford University Press.
Levitin, D. J., & Tirovolas, A. K. (2009). Current advances in the cognitive neuroscience of music.
Annals of the New York Academy of Sciences 1156, 211–231.
Lindenberger, U., Li, S. C., Gruber, W., & Müller, V. (2009). Brains swinging in concert: Cortical
phase synchronization while playing guitar. BMC Neuroscience 10, 22. Retrieved from
https://doi.org/10.1186/1471-2202-10-22
Liu, C., Abu-Jamous, B., Brattico, E., & Nandi, A. K. (2017). Towards tunable consensus clustering
for studying functional brain connectivity during affective processing. International Journal of
Neural Systems 27(2). doi:10.1142/S0129065716500428
Liu, C., Brattico, E., Abu-Jamous, B., Pereira, C. S., Jacobsen, T., & Nandi, A. K. (in press). Effect
of explicit evaluation on the neural connectivity related to listening to unfamiliar music. Frontiers
in Human Neuroscience. Retrieved from https://doi.org/10.3389/fnhum.2017.00611
Livingstone, M., & Hubel, D. H. (2002). Vision and art: The biology of seeing. New York: Harry N.
Abrams.
McDonald, C., & Stewart, L. (2008). Uses and functions of music in congenital amusia. Music
Perception 25(4), 345–355.
Martindale, C., Locher, P., & Petrov, V. M. (2007). Evolutionary and neurocognitive approaches to
aesthetics, creativity and the arts. Amityville, NY: Baywood Publishing.
Martínez-Molina, N., Mas-Herrero, E., Rodriguez-Fornells, A., Zatorre, R. J., & Marco-Pallares, J.
(2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of
Sciences 113, E7337–E7345.
Mas-Herrero, E., Dagher, A., & Zatorre, R. J. (2018). Modulating musical reward sensitivity up and
down with transcranial magnetic stimulation. Nature Human Behaviour 2, 27–32.
Mas-Herrero, E., Marco-Pallares, J., Lorenzo-Seva, U., Zatorre, R. J., & Rodriguez-Fornells, A.
(2013). Individual differences in music reward experiences. Music Perception 31(2), 118–138.
Mas-Herrero, E., Zatorre, R. J., Rodriguez-Fornells, A., & Marco-Pallares, J. (2014). Dissociation
between musical and monetary reward responses in specific musical anhedonia. Current Biology
24(6), 699–704.
Matrone, C., & Brattico, E. (2015). The power of music on Alzheimer’s disease and the need to
understand the underlying molecular mechanisms. Journal of Alzheimer’s Disease and
Parkinsonism 5. doi:10.4172/2161-0460.1000196
Medaglia, J. D., Lynall, M. E., & Bassett, D. S. (2015). Cognitive network neuroscience. Journal of
Cognitive Neuroscience 27(8), 1471–1491.
Menninghaus, W., Wagner, V., Hanich, J., Wassiliwizky, E., Jacobsen, T., & Koelsch, S. (2017). The
distancing-embracing model of the enjoyment of negative emotions in art reception. Behavioral
and Brain Sciences 40, 1–58.
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological
connectivity of the mesolimbic system. NeuroImage 28(1), 175–184.
Molnar-Szakacs, I., & Overy, K. (2006). Music and mirror neurons: From motion to “e”motion.
Social Cognitive and Affective Neuroscience 1(3), 235–241.
Montag, C., Reuter, M., & Axmacher, N. (2011). How one’s favorite song activates the reward
circuitry of the brain: Personality matters! Behavioural Brain Research 225(2), 511–514.
Müller, V., Höfel, L., Brattico, E., & Jacobsen, T. (2010). Aesthetic judgments of music in experts
and laypersons: An ERP study. International Journal of Psychophysiology 76(1), 40–51.
Müller, V., Sanger, J., & Lindenberger, U. (2013). Intra- and inter-brain synchronization during
musical improvisation on the guitar. PLoS ONE 8, e73852.
Nadal, M., Munar, E., Capo, M. A., Rossello, J., & Cela-Conde, C. J. (2008). Towards a framework
for the study of the neural correlates of aesthetic preference. Spatial Vision 21(3–5), 379–396.
Nadal, M., & Pearce, M. T. (2011). The Copenhagen neuroaesthetics conference: Prospects and
pitfalls for an emerging field. Brain and Cognition 76(1), 172–183.
Newman, M. E. J. (2010). Networks: An introduction. Oxford: Oxford University Press.
Nieminen, S., Istok, E., Brattico, E., Tervaniemi, M., & Huotilainen, M. (2011). The development of
aesthetic responses to music and their underlying neural and psychological mechanisms. Cortex
47(9), 1138–1146.
Pallesen, K. J., Bailey, C. J., Brattico, E., Gjedde, A., Palva, J. M., & Palva, S. (2015). Experience
drives synchronization: The phase and amplitude dynamics of neural oscillations to musical chords
are differentially modulated by musical expertise. PLoS ONE 10, e0134211.
Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005).
Emotion processing of major, minor, and dissonant chords: a functional magnetic resonance
imaging study. Annals of the New York Academy of Sciences 1060, 450–453.
Patel, A. (2008). Music, language, and the brain. Oxford: Oxford University Press.
Pearce, M. T. (2015). Effects of expertise on the cognitive and neural processes involved in musical
appreciation. In J. P. Huston, M. Nadal, F. Mora, L. Agnati, & C. J. Cela Conde (Eds.), Art,
aesthetics, and the brain (pp. 319–338). Oxford: Oxford University Press.
Pearce, M. T., Ruiz, M. H., Kapasi, S., Wiggins, G. A., & Bhattacharya, J. (2010). Unsupervised
statistical learning underpins computational, behavioural, and neural manifestations of musical
expectation. NeuroImage 50(1), 302–313.
Pearce, M. T., Zaidel, D. W., Vartanian, O., Skov, M., Leder, H., Chatterjee, A., & Nadal, M. (2016).
Neuroaesthetics: The cognitive neuroscience of aesthetic experience. Perspectives on
Psychological Science 11(2), 265–279.
Pelowski, M., Markey, P. S., Forster, M., Gerger, G., & Leder, H. (2017). Move me, astonish me …
delight my eyes and brain: The Vienna Integrated Model of top-down and bottom-up processes in
Art Perception (VIMAP) and corresponding affective, evaluative, and neurophysiological
correlates. Physics of Life Reviews 21, 80–125.
Pelowski, M., Markey, P. S., Lauring, J. O., & Leder, H. (2016). Visualizing the impact of art: An
update and comparison of current psychological models of art experience. Frontiers in Human
Neuroscience 10, 160. doi:10.3389/fnhum.2016.00160
Pereira, C. S., Teixeira, J., Figueiredo, P., Xavier, J., Castro, S. L., & Brattico, E. (2011). Music and
emotions in the brain: Familiarity matters. PloS ONE 6(11), e27241.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6, 688–691.
Peretz, I., & Zatorre, R. (Eds.). (2003). The cognitive neuroscience of music. Oxford: Oxford
University Press.
Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of
Psychology 56, 89–114.
Quarto, T., Fasano, M. C., Taurisano, P., Fazio, L., Antonucci, L. A., Gelao, B., … Brattico, E.
(2017). Interaction between DRD2 variation and sound environment on mood and emotion-related
brain activity. Neuroscience 341, 9–17.
Redies, C. (2015). Combining universal beauty and cultural context in a unifying model of visual
aesthetic experience. Frontiers in Human Neuroscience 9, 218. Retrieved from
https://doi.org/10.3389/fnhum.2015.00218
Reybrouck, M., & Brattico, E. (2015). Neuroplasticity beyond sounds: Neural adaptations following
long-term musical aesthetic experiences. Brain Sciences 5(1), 69–91.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of
motor actions. Cognitive Brain Research 3(2), 131–141.
Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects human aesthetic
responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891.
Sacks, O. (2007). Musicophilia: Tales of music and the brain. New York: Vintage.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically
distinct dopamine release during anticipation and experience of peak emotion to music. Nature
Neuroscience 14, 257–262.
Salimpoor, V. N., Van Den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J.
(2013). Interactions between the nucleus accumbens and auditory cortices predict music reward
value. Science 340(6129), 216–219.
Salimpoor, V. N., & Zatorre, R. J. (2013). Neural interactions that give rise to musical pleasure.
Psychology of Aesthetics, Creativity, and the Arts 7, 62–75.
Samson, S., Dellacherie, D., & Platel, H. (2009). Emotional power of music in patients with memory
disorders: Clinical implications of cognitive neuroscience. Annals of the New York Academy of
Sciences 1169, 245–255.
Savage, P. E., Brown, S., Sakai, E., & Currie, T. E. (2015). Statistical universals reveal the structures
and functions of human music. Proceedings of the National Academy of Sciences 112, 8987–8992.
Schäfer, T., Sedlmeier, P., Städtler, C., & Huron, D. (2013). The psychological functions of music
listening. Frontiers in Psychology 4, 511. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00511
Scruton, R. (1999). The aesthetics of music. Oxford: Oxford University Press.
Seghdi, N., & Brattico, E. (in press). The phylogenetic roots of music. Biokulturelle Menneske.
Sloboda, J. A. (1985). The musical mind. Oxford: Oxford University Press.
Sloboda, J. A. (1992). Empirical studies of emotional response to music. In M. R. Jones & S.
Holleran (Eds.), Cognitive Bases of Musical Communication (pp. 33–46). Washington, DC:
American Psychological Association.
Smith, C. U. (2005). Evolutionary neurobiology and aesthetics. Perspectives in Biology and
Medicine 48(1), 17–30.
Steinbeis, N., & Koelsch, S. (2009). Understanding the intentions behind man-made products elicits
neural activity in areas dedicated to mental state attribution. Cerebral Cortex 19(3), 619–623.
Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., … Yanai, K. (2008).
Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive,
Affective, & Behavioral Neuroscience 8(2), 126–131.
Tallis, R. (2008). The limitations of a neurological approach to art: Review of Neuroarthistory: From
Aristotle and Pliny to Baxandall and Zeki by John Onians (Yale University Press, 2008). Lancet
372, 19–20.
Tallis, R. (2011). Reflections of a metaphysical flaneur. London and New York: Routledge.
Tiihonen, M., Brattico, E., Maksimainen, J., Wikgren, J., & Saarikallio, S. (2017). Constituents of
music and visual-art related pleasure: A critical integrative literature review. Frontiers in
Psychology 8, 1218. Retrieved from https://doi.org/10.3389/fpsyg.2017.01218
Trost, W., Ethofer, T., Zentner, M., & Vuilleumier, P. (2012). Mapping aesthetic musical emotions in
the brain. Cerebral Cortex 22(12), 2769–2783.
Trost, W., Frühholz, S., Cochrane, T., Cojan, Y., & Vuilleumier, P. (2015). Temporal dynamics of
musical emotions examined through intersubject synchrony of brain activity. Social Cognitive and
Affective Neuroscience 10(12), 1705–1721.
Trost, W., Frühholz, S., Schön, D., Labbé, C., Pichon, S., Grandjean, D., & Vuilleumier, P. (2014).
Getting the beat: Entrainment of brain activity by musical rhythm and pleasantness. NeuroImage
103, 55–64.
Vartanian, O., & Goel, V. (2004). Neuroanatomical correlates of aesthetic preference for paintings.
Neuroreport 15(5), 893–897.
Vuust, P., & Kringelbach, M. L. (2010). The pleasure of making sense of music. Interdisciplinary
Science Reviews 35(2), 166–182.
Vuust, P., Ostergaard, L., Pallesen, K. J., Bailey, C., & Roepstorff, A. (2009). Predictive coding of
music: Brain responses to rhythmic incongruity. Cortex 45(1), 80–92.
Vuust, P., & Witek, M. A. (2014). Rhythmic complexity and predictive coding: A novel approach to
modeling rhythm and meter perception in music. Frontiers in Psychology 5, 1111. Retrieved from
https://doi.org/10.3389/fpsyg.2014.01111
Wassiliwizky, E., Koelsch, S., Wagner, V., Jacobsen, T., & Menninghaus, W. (2017). The emotional
power of poetry: Neural circuitry, psychophysiology and compositional principles. Social
Cognitive and Affective Neuroscience 12(8), 1229–1240.
Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2014). Network science
and the effects of music preference on functional brain connectivity: From Beethoven to Eminem.
Scientific Reports 4, 6130. doi:10.1038/srep06130
Witek, M. A., Clarke, E. F., Wallentin, M., Kringelbach, M. L., & Vuust, P. (2014). Syncopation,
body-movement and pleasure in groove music. PLoS ONE 9, e94446.
Witek, M. A., Gilson, M., Clarke, E. F., Wallentin, M., Deco, G., Kringelbach, M. L., & Vuust, P.
(forthcoming). The brain dynamics of musical groove: Whole-brain modelling of effective
connectivity reveals increased metastability of reward and motor networks. Nature
Communications.
Witek, M. A., Kringelbach, M. L., & Vuust, P. (2015). Musical rhythm and affect: Comment on “The
quartet theory of human emotions: An integrative and neurofunctional model” by S. Koelsch et al.
Physics of Life Reviews 13, 92–94.
Zatorre, R. J. (2015). Musical pleasure and reward: Mechanisms and dysfunction. Annals of the New
York Academy of Sciences 1337, 202–211.
Zeki, S. (1999). Inner vision: An exploration of art and the brain. Oxford: Oxford University Press.
Zeki, S. (2013). Clive Bell’s “Significant Form” and the neurobiology of aesthetics. Frontiers in
Human Neuroscience 7, 730. Retrieved from https://doi.org/10.3389/fnhum.2013.00730
Zeki, S. (2014). Neurobiology and the humanities. Neuron 84(1), 12–14.
Zhdanov, A., Nurminen, J., Baess, P., Hirvenkari, L., Jousmäki, V., Mäkelä, J. P., … Parkkonen, L.
(2015). An internet-based real-time audiovisual link for dual MEG recordings. PLoS ONE 10,
e0128485.
Zhou, G., Bourguignon, M., Parkkonen, L., & Hari, R. (2016). Neural signatures of hand kinematics
in leaders vs. followers: A dual-MEG study. NeuroImage 125, 731–738.
CHAPTER 16

MUSIC AND LANGUAGE

DANIELE SCHÖN AND BENJAMIN MORILLON

While music and language may differ in terms of their structures and
functions, they share the distinctive feature of being dynamically organized
in time; the information they carry is intrinsically contained in the temporal
dimension. A frequently asked question is whether music and language are
processed by similar or different brain regions, neural networks, or cortical
oscillatory processes, and to what extent this brain circuitry is specialized
for these stimuli compared to others. In order to tackle these issues, it is worth
keeping in mind some principles. Nikolaas Tinbergen and David Marr
described different levels of analysis that must, in their view, be
taken into account if one wants to understand behavior and complex
systems (Marr, 1982; Tinbergen, 1963). Marr’s three levels of analysis
(computational, algorithmic, and implementational) are particularly suited
to study brain functions. Because music and language differ in terms of
surface acoustic features and convey different purposes, the computations
needed to process them differ. On the other hand, at the implementation
level, the same organ and a myriad of cells process both music and
language. The key program in modern cognitive neurosciences is thus to
tackle the algorithmic level (Poeppel, 2012): Are similar or different
algorithms involved in the processing of music and language? And what are
they? In this chapter, we will begin with a historical perspective, where the
human brain is described from a phrenological viewpoint. Then, we will
describe the common functions and operations in music and language, the
methodological limitations in current approaches, and portray the resource-
sharing hypothesis. We will then describe the interdependency between
music and language, notably how musical training improves language
skills, before trying to bridge music and language in a single context. We
will conclude by describing a promising avenue: studies that adopt a
dynamical standpoint to understand music and language.

ON THE MODULARITY OF MUSIC AND LANGUAGE

From a historical perspective, the comparison of music and language
brain functions dates back to the early observations of deficits acquired
following a brain lesion. Since then, language and musical disorders have
been described with different terms: aphasia and amusia. This
distinction comes along with a deeper separation between the language and
music domains, which at the end of the nineteenth century had been the
object of structural or historical formalization along very different paths:
language was analyzed as a formal system of discrete elements, while music
was viewed, from a historical perspective, as an artistic behavior. Language
and music are thus viewed as two highly distinct human domains. In this
context, the observation of selective impairment of language or musical
abilities fits in very well and also complies with the idea that different
functions are implemented in different brain regions.
The birth of cognitive sciences was strongly influenced by this vision of
language as a specific and uniquely human function with dedicated neural
structures and music as a different human “artistic” function. At the end of
the 1950s Noam Chomsky was convinced that the principles underlying
language structure are biologically determined: every individual has the
same language potential because it is genetically transmitted, independently
of socio-cultural differences. This scientific and political view of language
development has had a tremendous impact on the fields of linguistics,
cognitive sciences, and neurosciences. It stands in clear contrast with that of
another giant of psychology, B. F. Skinner. Skinner considered the mind as
a tabula rasa whereon only experience could add knowledge. The two
giants faced each other in an intellectual duel. Chomsky's (1959) most
famous attack is the poverty-of-the-stimulus argument: a child exposed to a
limited amount of linguistic stimuli is nonetheless able to generalize to
new linguistic constructions using the rules acquired from the initial set.
According to Chomsky, the trial and error learning mechanisms defended
by the behaviorists would not be an appropriate model of language
acquisition, since language is acquired by listening to correct sentences. This
observation, as well as the fact that a confined brain lesion such as in
Broca’s area may induce a specific language deficit (agrammatism), led to
Chomsky’s suggestion that syntactic knowledge may be partly innate.
Curiously, Chomsky did not remark that music acquisition follows principles
very similar to those of language acquisition: early acquisition, generativity,
and learning from correct structures.
Chomsky’s work strongly inspired that of Jerry Fodor who in the early
1980s wrote The modularity of mind (1983). The mind (and the brain)
would be organized in independent modules with specific functions. Again
Fodor’s view is strongly influenced by and reinforces the results of the
neuropsychological literature, digging deeper and deeper into specific
deficits following focal brain lesions. The functioning of the brain seems
quite simple: every region has a specific functional role, and a lesion
causes a deficit that may be very specific, for instance affecting
noun and verb processing independently (Hillis & Caramazza, 1995).
It is within this context that the field of neuropsychology of music
develops, beyond previous anecdotal accounts. As in every new field, the
desire to gain identity and acknowledgment is strong. Music is thus studied
as a special human faculty with dedicated brain areas. This vision is also
constrained by some intrinsic limitations of the field of neuropsychology of
music. First, research on musical skills in brain lesion patients requires a
neurologist or neuropsychologist with a musical background. Indeed, while
testing language skills may appear a simple task, assessing musical abilities
definitely requires special skills, even more so in the era of “pencil and
paper.” The second limitation is the Western idea that music is the
prerogative of a few people, called musicians, and thus it only makes sense
to assess musical abilities in experienced musicians such as composers,
conductors, or performers with musical education (Basso & Capitani, 1985;
Luria, Tsevetkova, & Futer, 1965). Altogether this gives access to a limited
amount of data strongly influenced by the modular approach, with musical
functions clearly distinct from other human abilities. This is the vision that
is well summarized in the article entitled “Modularity of music processing”
(Peretz & Coltheart, 2003): several single case studies are used to defend
not only the hypothesis of modularity of music and language, but also the
modularity of different levels of music processing.
However, focusing on a single function, even more so when using a single
methodological approach (for instance, brain lesions), will systematically
lead toward a modular interpretation of reality. In other words, focusing
only on language syntactic processing in Broca’s aphasics will necessarily
lead one to conclude that the left inferior frontal operculum is involved (or
not) in syntactic processing. This may be in turn interpreted in a modular
perspective: syntax is independently and specifically processed in the left
frontal operculum. By contrast a comparative approach will give a broader
and more complex picture. Patel (2003), considering language and musical
syntax, claims that, while these may seem very different, there are several
commonalities, such as the need to build an integrated flow of information
that takes into account a certain number of rules. Here we can clearly see all
the power of the comparative approach that requires us to go beyond a
circular definition of cognitive function (e.g., syntax is syntax) in order to
compare apparently different functions (e.g., syntax and harmony) that can
possibly be redefined in terms of a more elementary function with greater
psychobiological validity. In the case of syntax and harmony, finding
common substrates requires one to redefine the object of the study (i.e.,
some elementary operation common to both).
With the advent of the neuroimaging era, while the first two decades
have been dominated by a modular approach, the last decade has put the
accent on the importance of the network and its connections. Cognitive
neurosciences have also gained access to the functioning of non-
pathological brains using highly sophisticated experimental designs. This
has allowed a breakdown of both language and music processing into more
elementary operations. While the search for biomarkers has somewhat
consolidated the innatist model, several major criticisms have been
developed further. For instance, studies on the zebra finch, a species of bird
well known for its ability to learn new songs, showed that it is the
learning process that alters the neuronal circuits. The maturation of synaptic
inhibition onto premotor neurons is correlated with learning but not age
(Vallentin, Kosche, Lipkind, & Long, 2016). This shows that even in a
species wherein one could think that the rules governing song acquisition
are genetically encoded, the environment plays an important role. Of course,
putting a zebra finch in a cage with a cat (and assuming that the cat did not
eat the bird) would not allow the bird to learn how to meow; that is to say,
genes do play a major role. When considering the case of
language and music, two extremely refined forms of communication, while
their species-specificity in humans is certainly genetically encoded, this does not
imply that whatever allows the development of language is specific to
language and not shared with music. In other words, if language and music
are specific to humans in their capacity to convey an extraordinary amount
of information, one should not misinterpret this in terms of different
evolutionary or developmental trajectories of language and music.
Psychology of music and neurosciences of music are recent fields of
research. The major limitation of new disciplines (and of humans) is their
strong desire to build their own identity, which often occurs to the detriment
of considering neighboring disciplines (and identities). Our field has also
yielded to this temptation by building musical cognitive models that have
initially ignored other potentially inspiring and similar domains, such as
language for instance. We will see now that music shares several cognitive
operations with language.

COMMON FUNCTIONS AND OPERATIONS IN MUSIC AND LANGUAGE

Both language and music serve a highly sophisticated communicative
function. While we will refrain here from giving a definition of what is
language and what is music, it is important to keep in mind that both require
a great number of different perceptual and cognitive operations.
To perceive both music and language, one of the first operations that
needs to be implemented is the ability to discriminate sounds. The two
phonemes [d] and [t] are quite similar but need to be distinguished, as is
the case for a C and a B in music or for the same pitch played by an oboe or a
bassoon. Sounds can be characterized in terms of a limited number of
spectral features and these features are relevant to both musical and
linguistic sounds. The analysis of the acoustic features of sounds takes
place in the cochlea and in several subcortical relays up to the primary
auditory cortex. There is a suggestion that the auditory cortex may be
asymmetric in the use of temporal windows of analysis, with the left
auditory cortex preferring short windows of integration and the right
auditory cortex preferring longer windows (Giraud et al., 2007; Poeppel,
2003; Zatorre, Belin, & Penhune, 2002). This hypothesis has been used to
defend the idea that language, requiring short windows of analysis to
discriminate consonants, is preferentially processed in the left hemisphere,
while music, requiring longer windows of analysis to discriminate pitch, is
preferentially processed in the right hemisphere. While the debate is still
open, one should keep in mind that language perception is not just
consonant discrimination, but also requires us to take into account other
features, such as pitch and stress, that require longer windows of
analysis. On the other hand, music is often considered, in our Western society
and by non-musicians, as relying mostly on pitch discrimination. However,
any good musician will claim that an extremely important feature of music
is sound quality, which, unlike pitch, is not stationary and requires short
analysis windows. The scenario is thus more complicated than it is often
depicted, and the idea of the cortex performing parallel processing on any
acoustic input, yielding the extraction of complementary pieces of
information, seems necessary to overcome the simplistic monolithic
distinction between language and music.
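To make the window-of-integration idea concrete, one can compute a short-time Fourier transform of the same signal with two window lengths: a short window resolves rapid transients (consonant-like events) but blurs frequency, while a long window resolves pitch but blurs time. The sketch below is purely illustrative; the synthetic signal and the 25 and 250 ms window lengths are our assumptions, not parameters taken from the studies cited above.

```python
import numpy as np
from scipy.signal import stft

fs = 16000  # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
# Synthetic signal: a steady 220 Hz tone plus a brief broadband transient
signal = np.sin(2 * np.pi * 220 * t)
signal[int(0.5 * fs):int(0.5 * fs) + 40] += np.random.randn(40)

for win_ms in (25, 250):  # hypothetical short vs. long integration windows
    nperseg = int(fs * win_ms / 1000)
    freqs, times, Z = stft(signal, fs=fs, nperseg=nperseg)
    print(f"{win_ms:3d} ms window: {freqs[1] - freqs[0]:6.1f} Hz frequency resolution, "
          f"{(times[1] - times[0]) * 1000:6.1f} ms time step")
```

The short window localizes the transient precisely but cannot separate nearby frequencies; the long window does the opposite, which is the tradeoff the hemispheric-asymmetry hypothesis builds on.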
Generating a different pattern of neuronal responses for every sound
would yield, in everyday life, an infinite number of sound
representations. This is why sounds are categorized. Two acoustically
different tokens of [b] will thus be perceived as a unique [b]. Two different
high Es of the violin will be perceived as E, even if one is slightly lower
than the other; an A and a C note of a piano will be perceived as “piano”
sounds. Categorization is necessary and common to both language and
music and it allows us to make sense of the world, by reducing its intrinsic
variety to a finite and limited number of categories. Categorical
representations of sounds are possibly distributed across neuronal
populations within the human auditory cortices, including primary auditory
areas (Belin, Zatorre, Lafaille, Ahad, & Pike, 2000; Liebenthal, Binder,
Spitzer, Possing, & Medler, 2005; Rauschecker & Scott, 2009; Rauschecker
& Tian, 2000; Staeren, Renvall, De Martino, Goebel, & Formisano, 2009),
although motor regions seem also to play a role in representing, for
instance, phonemic acoustic features (Cheung, Hamilton, Johnson, &
Chang, 2016).
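Categorization of this many-to-one kind is easy to illustrate: continuous acoustic tokens are mapped onto a small set of discrete labels. A minimal sketch, clustering invented voice-onset times from a hypothetical [b]–[p] continuum with a two-category k-means; nothing here models the actual neural code.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical voice-onset times (ms): [b]-like tokens near 10 ms, [p]-like near 60 ms
vot = np.concatenate([rng.normal(10, 5, 50), rng.normal(60, 5, 50)])

# Two-category k-means in one dimension (Lloyd's algorithm)
centroids = np.array([0.0, 100.0])  # crude initial guesses
for _ in range(10):
    labels = np.abs(vot[:, None] - centroids).argmin(axis=1)  # nearest centroid
    centroids = np.array([vot[labels == k].mean() for k in (0, 1)])

print("category centroids (ms):", centroids.round(1))
# Acoustically different tokens sharing a label are "heard" as the same phoneme
```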
We rarely perceive sounds in isolation, but rather in a complex flow.
This requires us to build a structure that evolves in time, taking into account
the different phonemes of a sentence or tones of a melody. Building such a
structure requires at the very least a working memory capacity that allows
us to manipulate sound representations. Sounds are grouped into larger units
and this grouping depends upon our previous experience with these sounds.
In other words we take advantage of our previous experience with the world
and build multiple statistical distributions of sounds. Different distributions
will account for different grouping strategies: for instance, streaming a
specific voice or musical instrument in a cocktail party or in a musical
ensemble (Elhilali & Shamma, 2008); grouping phonemes together or tones
to build words or melodies according to the transitional probabilities of
phonemes or tones (Saffran, Aslin, & Newport, 1996; Saffran, Johnson,
Aslin, & Newport, 1999; Schön et al., 2008). These statistical distributions
are built on the memory traces of what we have previously perceived and
strongly influence our upcoming perception of the world. In fact, following
these statistical distributions, several rules may emerge that allow us to
simplify even more the complex and continuous auditory flow. Importantly,
in both language and music, the distributions can also be computed over
symbolic units.
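The transitional probabilities invoked by these studies are simple conditional bigram statistics, P(next | current), and dips in them mark plausible word or motif boundaries. A minimal sketch under invented assumptions (three made-up tone "words" concatenated at random, in the spirit of Saffran et al., 1996):

```python
import random
from collections import Counter

# Hypothetical continuous stream built from three tone "words"
words = ["CEG", "DFA", "EGB"]
random.seed(0)
stream = "".join(random.choice(words) for _ in range(200))

pairs = Counter(zip(stream, stream[1:]))  # bigram counts
firsts = Counter(stream[:-1])             # occurrences as the first element

def tp(a, b):
    """Transitional probability P(b | a)."""
    return pairs[(a, b)] / firsts[a] if firsts[a] else 0.0

# Within-word transitions are high; between-word transitions are low
print("P(E|C) =", round(tp("C", "E"), 2))  # within the word "CEG"
print("P(D|G) =", round(tp("G", "D"), 2))  # across a word boundary
```

A learner tracking these statistics can segment the continuous stream without any explicit boundary markers, for tone sequences and syllable sequences alike.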
These distributions or internal models at different feature levels have
two major consequences. The first, cited earlier, is that they allow us to
generate new sequences having similar statistical properties—in other
words, new sentences or melodies complying with the rules of the musical or
linguistic system. The second is that they allow us to make accurate
predictions about upcoming events. Listening to a person speaking or playing,
we are able to anticipate, to a certain degree, what is going to
be said or played, and when. Considering the very fast and changing nature of the
auditory flow, this ability is of utmost importance and it explains why
sounds (phonemes or tones) missing from a speech or musical signal can be
restored by the brain and appear to be heard (DeWitt & Samuel, 1990). In
this respect music is particularly challenging because it may require us to
anticipate simultaneously several distinct streams of features. For instance,
when listening to a symphony orchestra or a string quartet, several melodic
lines take place at the same time and need to be anticipated in order to
perceive a sense of continuity in the music.
Overall, language and music are characterized by a limited set of
acoustic features, categorized by the human brain into a limited set of
representations, and subjected to similar rules of statistical learning.

OVERLAP AND RESOURCE SHARING

Since most research in cognitive neuroscience has been guided by the
assumption that brain regions are specialized for a given function, studies
on music and language have addressed the question of whether music and
language share common neural substrates. This has often been referred to as
the notion of overlap (Patel, 2011). The idea is simple. If one could show
that there is a strong overlap for music and language processing, this would
go against a modular and domain-specific view. However, there are more
problems with this approach than one might imagine. We will review them
briefly in the following section together with some neuroimaging findings.
The first problem is purely methodological. Indeed, many
published works using fMRI, including those comparing music and speech
processing, use a subtraction logic. Namely, results are a statistical contrast
that only allows us to see which areas show a greater signal compared to
another condition. This is referred to as the tip-of-the-iceberg problem. Indeed,
it may well be that by contrasting a language and a music task one finds a
peak in a given region. This is then interpreted as a specific area dedicated
to language (or to music, depending upon the direction of the subtraction;
see for instance Rogalsky & Hickok, 2011). However, this completely
ignores the possibility that there is a large common substrate that is
invisible when making the subtraction (e.g., 100 and 101 share 100, but
101–100 only shows 1). This approach is Manichean and suffers from its
lack of quantitative descriptions. These studies have, therefore, a
methodological bias toward highlighting differences rather than
commonalities.
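The tip-of-the-iceberg problem can be simulated in a few lines: when two conditions share a large common activation, the subtractive contrast reveals only the small difference and hides the shared substrate. All activation values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels = 1000

shared = rng.uniform(5, 10, n_voxels)  # large substrate active in both tasks
music = shared + rng.normal(0, 0.1, n_voxels)
language = shared + rng.normal(0, 0.1, n_voxels)
language[:50] += 1.0  # a small, truly language-preferring patch

contrast = language - music  # the subtraction an fMRI contrast would report
print("voxels with |contrast| > 0.5:", int(np.sum(np.abs(contrast) > 0.5)))  # ~50
print("voxels active in BOTH tasks:",
      int(np.sum((music > 1.0) & (language > 1.0))))  # ~1000: the hidden iceberg
```

The contrast map shows only the fifty "language-preferring" voxels, while the thousand voxels of common substrate disappear from the result entirely.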
A second series of problems concerns the experimental designs that have been
used. Indeed, only a few studies have directly compared music and language
processing using the same participants and the same experiment. Comparing
results across studies will also tend to show differences that may not be due
to brain computations but to differences in the populations, acquisition, or
analysis pipelines. Even when assessing music and language processing in
the same participant, there remains the challenge of comparing comparable
conditions. This goes beyond the fact that speech and music stimuli are by
nature acoustically different, insofar as if this were the only difference it
should affect only the primary auditory cortex. The real challenge is to
define the proper elementary operation and balance the difficulty level of
the task across linguistic and musical stimuli. Defining the operation is
already quite challenging because it requires a “good” model of what to
compare. Of course comparing music and language does not make any
sense, because there is no such a thing as a function for music in the brain.
Thus, music and language need to be reduced to more elementary functions
as described earlier. But even comparing syntactic processing is not trivial.
Indeed, one needs to choose which syntactic level to compare in language
(syntactic embedding or gender agreement do not imply the same
operations) and find a good analog in music. Then, the researcher is still
left with a complicated issue, that of the difficulty level. For instance, in
comparing the role of pitch in music and in language prosody, one should
ascertain that the difficulty level of the task is comparable across materials
rather than using a fixed criterion (e.g., detect a 15 percent pitch change)
that may be trivial with music but not with speech (Schön, Magne, &
Besson, 2004).
Another important issue is raised by Peretz and colleagues:
It is important to keep in mind that neural overlap does not necessarily entail neural sharing.
The neural circuits established for musicality may be intermingled or adjacent to those used
for a similar function in language and yet be neurally separable. For example, mirror
neurons are interspersed among purely motor-related neurons in pre-motor regions of the
macaque cortex (Rizzolatti & Craighero, 2004). Similarly, the neurons responsible for the
computation of some musical feature may be interspersed among neurons involved in
similar aspects in speech.
(Peretz, Vuvan, Lagrois, & Armony, 2015, p. 3)

The problem that is raised here is the scale problem of human anatomy.
Historically, there has been a very rough distinction of music and language
in terms of hemispheric dominance and this led many people to believe that
language is processed by the left hemisphere and music by the right
hemisphere. We now clearly know that this is not the case (Lindell, 2006;
Vigneau et al., 2011). Then, there have been more specific claims that the
left Broca’s area would be language specific, but this has also been
falsified, by showing for instance that musical harmony (Koelsch et al.,
2002; Maess, Koelsch, Gunter, & Friederici, 2001) and rhythm processing
(Herdener et al., 2012) are mediated by the same regions processing
language syntax (Friederici & Kotz, 2003). Further work based on
multivariate pattern analysis has shown that within overlapping regions,
distinct brain patterns of responses can be found to linguistic and musical
sounds (Abrams et al., 2010; Fedorenko, McDermott, Norman-Haignere, &
Kanwisher, 2012). However, these differences could be accounted for in
terms of differences in the stimuli manipulation or in the task. For instance,
Abrams et al. (2010) compared scrambled versions of music and speech to
normal music and speech and used a fixed scrambling window of 350 ms.
As the authors acknowledge, it could be that music and speech have
inherently different acoustical regularities and structures, rendering one
material more “scrambled” than the other. Also, different patterns of
activation in common brain areas may result from the same neural
population reacting differently to music and language (Kunert & Slevc,
2015).
The argument raised by Peretz advocates for the possibility of music-dedicated
neurons, adjacent to language-dedicated neurons. While this is of
course a non-falsifiable hypothesis for the moment, one should not think of
music or language as a whole, but in terms of precisely defined elementary
operations. If these operations are required by both language and music
material, then there would be no reason for the brain to produce two
extremely intermingled networks computing the same algorithm. On the
other hand, it is clear that the rules determining gender agreement or those
affecting tonality modulations are necessarily represented in different neural
networks. Thus, claiming that differences may always subsist at a smaller
scale is a recursive argument that does not really add much to the debate
(besides the fact that at a quantum level, music and language can be
described by the same equations). In our view, the major advances will not
come from single-unit recordings showing neurons specific to the last chord
of a particular Haydn piano sonata, but rather from neurocomputational
models precisely describing what particular operations are subtended by a
given neural network when listening to speech and to music.
A more promising approach seems to us to study whether two different
levels of music and language processing interact or not. Indeed, the
interaction is a measure of the extent to which two processes influence each
other and as such it can be used to infer that one process is not independent
of the other. Several studies have tackled this issue by using interference
paradigms. For instance, Slevc and colleagues (Slevc, Rosenberg, & Patel,
2009) measured the reading time of garden-path sentences and found that it
was influenced by simultaneous presentation of irrelevant harmonically
unexpected chords while it was not affected by timbrally unexpected chords
(e.g., a different instrument). These results have been interpreted as
evidence for shared music–language resources processing structural
(syntactic) relations. Because the task-irrelevant music is processed
automatically, it consumes some resources, resulting in suboptimal processing
of the language syntactic relations. Other studies have used this approach to
show an interaction between melodic and syntactic processing (Fedorenko,
Patel, Casasanto, Winawer, & Gibson, 2009), harmonic and syntactic
processing but not semantic processing (Hoch, Poulin-Charronnat, &
Tillmann, 2011), and harmonic processing and word recall (Fiveash &
Pammer, 2014). This has also been coupled with electrophysiological
measures, confirming that unexpected melodic or harmonic events affect
the syntax-related left anterior negativity (Carrus, Pearce, & Bhattacharya,
2013; Koelsch, Gunter, Wittfoth, & Sammler, 2005). Interestingly, Sammler
et al. (2013) showed a co-localization of early components elicited by
musical and linguistic syntactic deviations using intracranial recordings.
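Formally, these interference paradigms test a statistical interaction in a factorial design: shared resources predict that the cost of a linguistic manipulation is amplified when the concurrent musical event is unexpected. A schematic sketch on simulated reading times (the cell labels, effect sizes, and noise level are all invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40  # hypothetical number of trials per cell

def reading_times(base_ms):
    """Simulated reading times (ms) around a cell-specific baseline."""
    return base_ms + rng.normal(0, 30, n)

cells = {
    ("easy", "expected"):   reading_times(500),
    ("easy", "unexpected"): reading_times(510),
    ("hard", "expected"):   reading_times(550),
    ("hard", "unexpected"): reading_times(600),  # super-additive cost = interaction
}

# Interaction contrast: (hard,unexp - hard,exp) vs. (easy,unexp - easy,exp)
diff_hard = cells[("hard", "unexpected")] - cells[("hard", "expected")]
diff_easy = cells[("easy", "unexpected")] - cells[("easy", "expected")]
res = stats.ttest_ind(diff_hard, diff_easy)
print(f"interaction: t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
```

A reliable interaction of this kind is what licenses the inference that the two processes draw on a common pool of resources, whereas purely additive costs would not.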
Surprisingly, few neuroimaging studies have exploited possibly the most
natural setting to compare music and language, namely a stimulus that
combines both speech and music: song. The use of songs has the clear
advantage of solving the problem of using different stimuli in the language
and musical task. Schön et al. (2010) used an interference paradigm based
on sung sentences and showed that the processing demands of melodic and
lexical/phonological processing interact in a large network including
bilateral temporal cortex and left inferior frontal cortex. Importantly, most
voxels sensitive to the lexical/phonological manipulation are also sensitive
to the interaction between the lexical/phonological and the melodic
dimensions. In other words, there seem to be very few voxels that are
involved in lexical/phonological processing and are not influenced by melodic
structure (see Fig. 1).
FIGURE 1. Number of surviving voxels for the main effect of lexical/phonological dimension as a
function of the threshold of the interaction between phonological and melodic dimensions. The
dotted vertical line indicates the p-value of 0.05 for the mask. The right edge corresponds to a very
conservative p-value (adapted from Schön et al., 2010).
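The analysis summarized in Fig. 1 amounts to a counting procedure: among voxels showing a main effect of the lexical/phonological dimension, count how many also survive as the threshold on the interaction term is tightened. A schematic sketch with simulated voxel-wise p-values (all numbers invented; the simulation reproduces only the procedure, not the empirical overlap):

```python
import numpy as np

rng = np.random.default_rng(3)
n_voxels = 5000
# Simulated voxel-wise p-values, skewed toward small values
p_main = rng.uniform(0, 1, n_voxels) ** 3         # main effect: lexical/phonological
p_interaction = rng.uniform(0, 1, n_voxels) ** 3  # phonological-by-melodic interaction

main_voxels = p_main < 0.001  # voxels with a reliable main effect

for thr in (0.05, 0.01, 0.001, 0.0001):
    surviving = np.sum(main_voxels & (p_interaction < thr))
    print(f"interaction threshold p < {thr}: {surviving} main-effect voxels survive")
```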

Similarly, Sammler et al. (2010), using an adaptation paradigm, showed a
strong integration between melodic and phonological levels in song in the
dorsal pathway with a degree of integration decaying toward anterior
regions of the left STS, possibly resulting from the processing of meaning
of words. This integration of melodic and phonological dimensions is also in
line with the finding that a sung language is more easily learned than a
spoken language (Schön et al., 2008). Kunert and colleagues (Kunert,
Willems, Casasanto, Patel, & Hagoort, 2015) showed an effect of musical
harmonic deviancy on language syntax processing in the left inferior frontal
gyrus. Notably this effect was not present when the deviancy in the musical
stimulus was limited to the acoustic level (louder sound). Interestingly the
authors also showed, in a behavioral study, an effect of the syntactic
structure of sentences on the performance of a musical harmonic judgment
task, confirming the idea of shared resources.
One may wonder how to reconcile these data suggesting shared resources
with the “older” data from neuropsychological studies pointing
rather to a specificity and independence of several levels of language and
music processing. However, very few studies have tried to systematically
assess the co-existence of language and musical deficits, even for the most
studied language deficit following a lesion in Broca’s area. Ani Patel was
the first to investigate brain-damaged individuals, and more specifically
aphasic individuals with grammatical comprehension problems in language,
in order to see whether they also have a deficit in processing structural
musical relations (Patel, Iversen, Wassenaar, & Hagoort, 2008). Broca’s
aphasic patients and controls had to judge whether or not a set of sentences
contained a grammatical or semantic error. A similar task was used
with harmonic errors introduced into musical chord sequences. In a second
experiment participants were tested using an implicit harmonic priming
procedure. Both experiments showed that the aphasic patients have an
impaired musical syntactic processing. Importantly, this took place in
absence of low-level deficits, and with a preserved short-term memory for
pitch patterns. This scenario is complicated by the fact that not all
agrammatic patients may necessarily show a musical deficit (Slevc, Faroqi-
Shah, Saxena, & Okada, 2016). On a similar line, Sammler and colleagues
(Sammler, Koelsch, & Friederici, 2011) showed a reduction or extinction of
the typical electrophysiological marker of musical syntax processing in
agrammatic patients with a lesion in the left inferior frontal cortex. These
results are consistent with the hypothesis that Broca’s area supports a
rather domain-general “syntactic” computation, but a huge amount of
work remains to be done with brain-lesioned patients.

MUSIC TRAINING AND LANGUAGE SKILLS

We have seen that the approach of studying music and language brain
correlates is limited by a number of methodological problems that make it
rather complex to interpret the results in terms of resource sharing.
Another way to address the resource-sharing hypothesis is
to investigate whether music training affects the way the brain processes
language, and vice versa. The reasoning is the following. Musical expertise
requires intense training, often starting at an early age. As a result of
learning, all the operations required by music perception and production
will be affected by this training and become more efficient. If some of these
operations are also required by language perception and production, then
one should be able to observe a more efficient processing whenever the
appropriate language processing levels are investigated. By contrast with
the approach described above, the validation of this hypothesis does not
necessarily require brain imaging data, insofar as behavioral differences can
be taken as evidence that resource sharing exists. Psychologists and some
neuroscientists often use the term “transfer of learning.” This term is,
however, rather vague as it seems to point to some sort of magic transfer of
learning from one domain to another or from one function to another
function without specifying how this transfer would actually take place.
However, an alternative explanation is to hypothesize that these so-called
transfer effects are simply due to an elementary function that is shared by
both music and language processing. According to this view there is no
transfer taking place, but only sharing of functions and resources.
Importantly, while there is no clear way of showing how transfer could be
possibly implemented, shared elementary operations can be defined via
careful experimental manipulations.
Considering the early steps of sound analysis helps to clarify this point.
The group of Nina Kraus has studied for many years the effect of music
training on sound perception in general, including speech. Using EEG and
focusing on high frequency (>200 Hz) neural responses, possibly
principally occurring at the subcortical level, this group of researchers has
shown that, compared to non-musicians, musicians have a stronger
representation of several features of speech sounds, including the
fundamental frequency (Wong, Skoe, Russo, Dees, & Kraus, 2007), the
harmonics (Kraus & Chandrasekaran, 2010), and rapid transients that may
be important in distinguishing consonants (Parbery-Clark, Tierney, Strait, &
Kraus, 2012). Overall, the correlation between the neural response and the
stimulus is greater in musicians than in non-musicians, and this holds
independently of whether the stimulus is a music or a speech sound
(Musacchia, Sams, Skoe, & Kraus, 2007). Most importantly this correlation
is more resistant to acoustic noise in musicians. In other words, musicians
seem to be able to filter out the noise better than non-musicians (Parbery-
Clark, Skoe, & Kraus, 2009). Interestingly, some of these differences can be
observed in adults who had a few years of music training during childhood,
thus showing that these changes persist over time and do not necessarily require
long-lasting and intense training (Skoe & Kraus, 2012).
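The neural-tracking measure used in this line of work is, at its core, a correlation between the stimulus (or its envelope) and the recorded response. A minimal sketch with synthetic data; the signal-to-noise values assigned to "musician" and "non-musician" are invented purely to illustrate the computation:

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 1000  # sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
stimulus = np.sin(2 * np.pi * 4 * t)  # a slow 4 Hz "envelope"

def neural_response(stim, snr):
    """Simulated response: attenuated copy of the stimulus plus neural noise."""
    return snr * stim + rng.normal(0, 1, stim.size)

for label, snr in (("non-musician", 0.2), ("musician", 0.5)):
    resp = neural_response(stimulus, snr)
    r = np.corrcoef(stimulus, resp)[0, 1]  # stimulus-to-response correlation
    print(f"{label}: r = {r:.2f}")
```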
Moreover, these differences induced by music training are not simply
due to a better processing of any sound feature. Indeed, results of a recent
experiment show that music training can facilitate the selective processing
of certain relevant features of speech. In this study, Intartaglia and
colleagues (Intartaglia, White-Schwoch, Kraus, & Schön, 2017) compared
French and American participants listening to an American English phoneme
that does not exist in French. The comparison of the neural signatures showed that
American participants had a more robust representation compared to French
participants. The differences concerned the high formant frequencies that
are necessary to encode the specific features of consonants and vowels.
They then tested French musicians and the differences with the Americans
disappeared. In other words, music training seems to allow a better
encoding of the relevant features of speech sounds, even when these sounds
are not familiar.
When interpreting these overall results one should keep in mind that two
possible non-exclusive explanations co-exist. First, the subcortical relays
may be more efficient in sound processing due to massive bottom-up
processing. In this case one can clearly see that there is no need to advocate
for a transfer effect. There is a dedicated auditory subcortical network that
processes both musical and linguistic sounds. If this network becomes more
efficient via intensive musical training, then speech processing will also
benefit from the enhanced efficiency. Second, the cortical regions are
known to send efferent signals to the subcortical relays and these
modulatory top-down signals may play a role in enhancing the
representation of certain features of sounds or in reducing the noise (Strait,
Kraus, Parbery-Clark, & Ashley, 2010; Tenenbaum, Kemp, Griffiths, &
Goodman, 2011). In this perspective, the changes are possibly due to an
enhanced connectivity that allows a finer cortical modulation of
subcortical activity.
Independently of whether these enhanced subcortical representations
reflect a bottom-up or a top-down modulation, these results are important in
interpreting the differences that may be observed at a more integrated level.
Indeed, differences observed at a phonological, syntactic, or prosodic level
may result from a cascade effect of early auditory processing differences.
The studies on prosody and phoneme perception in musicians are
particularly sensitive to this issue. Indeed, pitch is important in speech at
the supra-segmental level, signaling the emotional content of an utterance
(Kotz et al., 2003), the linguistic structure (Steinhauer, Alter, & Friederici,
1999), and certain syntactic features, such as whether the utterance is a
question or not (Astésano, Besson, & Alter, 2004). Pitch contour also plays
a role at the segmental level in tone languages, where it serves a
linguistically contrastive function.
Musicians are more accurate in detecting subtle pitch variations in both
music and speech prosody. These variations in speech prosody are detected
earlier by musicians' brains and elicit event-related potentials that are more
clearly distinguishable from those evoked by normal speech (Schön et al.,
2004). This has been
replicated with 8-year-old musician children (Magne, Schön, & Besson,
2006). Music lessons also seem to promote sensitivity to emotions
conveyed by speech prosody. Indeed, musically trained adults perform
better than untrained adults in discrimination and identification of
emotional prosody (Thompson, Schellenberg, & Husain, 2004). Finally,
musicians are more accurate at identifying, reproducing, or discriminating
Mandarin tones (Gottfried & Riester, 2000; Gottfried, Staby, & Ziemer,
2004; Marie, Delogu, Lampis, Belardinelli, & Besson, 2011). However, as
previously stated, it is difficult to know to what extent these differences are
due to cortical or subcortical plasticity. Considering that anatomical
differences have been observed in the auditory cortex
(Benner et al., 2017; Kleber et al., 2016; Schlaug, Jäncke, Huang, Staiger,
& Steinmetz, 1995; Shahin, Bosnyak, Trainor, & Roberts, 2003), it seems
reasonable to believe that the whole auditory network is modified by music
training, thus affecting speech processing at multiple levels.
Interestingly, previous studies provided evidence for a positive
relationship between the function or the anatomy of the planum temporale
and performance during syllable categorization (Elmer, Hänggi, Meyer, &
Jäncke, 2013). Recently, Elmer and colleagues (Elmer, Hänggi, & Jäncke,
2016) provided evidence for a relationship between planum temporale
connectivity, musicianship, and phonetic categorization. They found an
increased connectivity between the left and right plana temporalia in
musicians compared to non-musicians. This increased connectivity
positively correlated with performance in a phonetic categorization task
as well as with musical aptitudes. Indeed, music training seems to affect the
sensitivity to some acoustic features that are important for the
categorization of syllables, in particular temporal features such as voice-onset time (Chobert,
Marie, François, Schön, & Besson, 2011; Zuk et al., 2013).
Very few studies have examined whether musical expertise influences
the processing of speech temporal structure. While speech is not
isochronous, it contains several nested temporal hierarchies
(Cummins & Port, 1998; Ghitza, 2011; Giraud & Poeppel, 2012).
Musicians outperform non-musicians when asked to judge the lengthening
of a syllable in a sentence (Marie, Magne, & Besson, 2011). Also,
independently of whether musicians direct attention to the temporal or
semantic content, they are more sensitive to subtle changes in the temporal
structure of speech than non-musicians (Magne et al., 2006; Marie, Delogu,
et al., 2011). Milovanov et al. (2009) reported a positive correlation
between musical aptitudes and sensitivity to syllable discrimination in
children. In artificial language learning, speech segmentation results from
the capacity to parse a continuous stream of syllables and to build and
maintain probabilistic relationships between the different elements
(syllables) that compose words. François and Schön (2011) showed that
musicians have improved segmentation skills compared to non-musicians.
Indeed, when listening to a new stream of an artificial language, they are
faster and more accurate at segmenting the continuous stream. After only
one year of music training, children already show an improvement in
speech segmentation (François, Chobert, Besson, & Schön, 2012). This ability, namely
discovering word boundaries in the continuous stream of natural speech, is
of utmost importance during language learning in the first years of life
(Saffran et al., 1996).
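The statistical-learning computation underlying such segmentation can be illustrated with a minimal Python sketch (hypothetical code; the syllable stream below is a toy example in the style of Saffran et al., 1996): transitional probabilities are high within words and dip at word boundaries.

from collections import Counter

def transitional_probabilities(syllables):
    # P(next syllable | current syllable) for each adjacent pair
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: c / first_counts[pair[0]] for pair, c in pair_counts.items()}

# Toy stream built from the "words" tu-pi-ro and go-la-bu
stream = "tu pi ro go la bu tu pi ro tu pi ro go la bu go la bu".split()
tps = transitional_probabilities(stream)
# Within-word transitions (e.g., tu -> pi) have TP = 1.0, whereas
# across-word transitions (e.g., ro -> go) have lower TP: the dips
# mark candidate word boundaries.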
The evidence concerning an effect of music training on the semantic and
syntactic levels of language is rather scarce. One study showed that music
training seems to influence semantic aspects of language processing
(Dittinger et al., 2016). However, in this study, French participants had to
learn new words taken from Thai. Thus, the differences may be due to the
perceptual difficulty of the task, namely discriminating Thai tokens that
differed in pitch or vowel length. At the neural level, results indicate an
increased functional connectivity in the ventral and dorsal streams of the
left hemisphere during retrieval of novel words in musicians compared to
non-musicians (Dittinger, Valizadeh, Jäncke, Besson, & Elmer, 2018). An
effect of musical expertise on syntactic processing was shown by Jentschke
and Koelsch (2009), with earlier and larger evoked responses to syntactic
errors in children with musical training. However, others have reported that
differences are absent at the behavioral level and that musical expertise
modulates only the topographical distribution, not the amplitude, of
responses evoked by syntactic violations (Fitzroy & Sanders, 2013). Thus,
the evidence that music training affects semantic and syntactic processing
in language is not yet compelling, and further studies are needed.
Overall, while the theoretical framework of transfer of learning remains
uncertain, a substantial amount of data points to an improvement induced
by music training at different levels of speech and language processing.
Patel (2014) has tried to formalize the conditions under which music
training may be beneficial to speech processing. In the OPERA hypothesis
(Overlap, Precision, Emotion, Repetition, and Attention) he suggests that,
in order for music training to enhance speech processing, music and speech
need to share sensory or cognitive processing mechanisms, and music must
place higher demands on these mechanisms than speech does. These
mechanisms are tightly bound to the emotional reward system engaged by
music (Salimpoor et al., 2013). The last ingredients of music-induced,
speech-related neural plasticity would be the fact that music training
requires repeating sound patterns and gestures for an enormous amount of
time under conditions of highly focused attention.

BEYOND MUSIC LESSONS

When considering the effects of music training on speech and language
abilities, one should keep in mind that most of the studies described here
compared adult professional musicians to a group of adult non-musicians.
This comparison has two methodological weaknesses. The first concerns
the possibility of pre-existing differences, namely that musicians may
already have differed from non-musicians before starting to make music.
The second is that music training is a complex activity, often involving
individual lessons, group activities, theory classes, and so on. This makes it
impossible to know which factors of music training had an impact on
speech and language abilities.
Both criticisms can be addressed by running longitudinal studies
assessing the absence of differences before the beginning of music training
(Chobert, François, Velay, & Besson, 2012; François et al., 2012), and
comparing the music-training group with a control group involved in an
activity with a similar setting (e.g., visual arts, theater). However, this
approach is time- and cost-intensive, insofar as it requires following two
groups of children for a long period of time (often one year), testing them at
least twice, and coordinating the two training programs.
There is an alternative methodological approach that is somewhat in
between the interference or interaction approach and the group comparison
described earlier. The idea is to test the effect of music stimulation on
speech perception. This has proven particularly successful in the temporal
domain. Indeed, the structures of speech and music share a similar
hierarchical temporal scaffolding (Haegens & Golumbic, 2018; Schön &
Tillmann, 2015). A series of studies has shown that priming the temporal
structure of speech using a musical rhythmic prime can induce a speech
processing benefit (Cason, Astésano, & Schön, 2015; Cason & Schön,
2012; Chern, Tillmann, Vaughan, & Gordon, 2018; Przybylski et al., 2013).
These studies showed a benefit of rhythmic priming both in phoneme
detection and in a grammaticality judgment task. This approach has been
particularly effective with language-impaired populations. For instance,
passive listening to a regular rhythmic prime improved performance in a
grammaticality judgment task in children with dyslexia or specific language
impairment (SLI, Bedoin, Brisseau, Molinier, Roch, & Tillmann, 2016;
Przybylski et al., 2013) and patients with a basal ganglia lesion (Kotz,
Gunter, & Wonneberger, 2005). While these results support the importance
for language processing of temporal predictions (the ability to anticipate
upcoming events in time), it is not clear whether the benefit at the
grammatical level is mediated by a selective effect at the syntactic level or
by improved speech perception. For instance, Cason and colleagues (Cason,
Hidalgo, Isoard, Roman, & Schön, 2015) have shown that priming the
temporal structure of a sentence with music improved phoneme perception
in hearing-impaired children.
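The rhythmic primes used in these studies are, at their simplest, click or percussion sequences whose onsets match (or regularize) the metrical structure of the upcoming sentence. A minimal Python sketch of such a prime (hypothetical code; all parameter values are illustrative only):

import numpy as np

def click_track(onsets_s, fs=44100, click_ms=10, freq=1000):
    # One short sine "click" at each requested onset time (in seconds)
    track = np.zeros(int(fs * (max(onsets_s) + 0.5)))
    t = np.arange(int(fs * click_ms / 1000)) / fs
    click = np.sin(2 * np.pi * freq * t)
    for onset in onsets_s:
        i = int(onset * fs)
        track[i:i + len(click)] += click
    return track

# A temporally regular prime at 2 Hz lasting 4 seconds:
prime = click_track(np.arange(0, 4, 0.5))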
Most of these studies have used a passive listening approach. However,
an active approach, requiring the intervention of the audio-motor network,
seems to have a stronger effect than passive listening (Cason, Astésano, &
Schön, 2015; Morillon & Baillet, 2017; Morillon, Schroeder, & Wyart,
2014). An interesting avenue for the future is to test the effect of a single
session of music training on several levels of speech and language
processing. This seems to us a good compromise between all the above-
mentioned approaches, insofar as it avoids the criticism of pre-existing
differences and allows strict control of the content of the training session
without "reducing" music to passive listening to an isochronous
metronome. Recently, Hidalgo and colleagues (Hidalgo, Falk, & Schön,
2017) used this type of approach to investigate temporal adaptation in
speech interaction in hearing-impaired children. They showed that a 30-
minute session of active rhythmic training facilitated access to the
temporal structure of verbal interactions and improved performance in a
simple turn-taking task.
One of the factors prompting research in the domain of music and
language is the possibility of using music to remediate language
impairment. Fundamental research thus supports the therapeutic approach
of using music to recover impaired functions, by defining which aspects of
music training benefit language processing and at which levels of processing.
While it is not the aim of this chapter to review this literature (see Chapter
29 by Lee, Thaut, and Santoni, this volume), it is important to note that the
underlying neuroscientific models supporting the use of music in language
rehabilitation have changed. For instance, the development of melodic
intonation therapy to recover language function in non-fluent aphasic
patients was somewhat driven by the idea that patients can learn a new way
to speak through singing by using the right hemisphere (Albert, Sparks, &
Helm, 1973; Zumbansen, Peretz, & Hébert, 2014). Forty years later, our
knowledge of the spatiotemporal dynamics underlying music and language
on the one hand, and of the pathophysiology of language disorders on the
other, has been refined. Concerning non-fluent aphasia, Stahl and
colleagues (Stahl, Kotz, Henseler, Turner, & Geyer, 2011) have shown that
rhythmic training may be the most relevant aspect of the musical
intervention, rather than the melodic aspect, especially when patients
present a lesion of the basal ganglia, a subcortical structure involved in
motor coordination and the processing of temporal information (Kotz &
Schwartze, 2010).
Interestingly, several recent works on the use of music for language
rehabilitation point to an important role of the rhythmic aspect of music.
More precisely, musical training targeted toward improving rhythmic
perception and production has resulted in improved phonological and
reading skills (Bhide, Power, & Goswami, 2013; Cogo-Moreira, de Avila,
Ploubidis, & de Jesus Mari, 2013; Flaugnacco et al., 2015; Moore,
Branigan, & Overy, 2017; Overy, 2000). These results suggest a shared
substrate and point to temporal processing as playing a major role in
language processing. This fits with the temporal sampling framework
proposed by Goswami (2011) for dyslexia and by extension for SLI.
Building on the neural resonance theory positing internal oscillators guiding
attention over time (Large & Jones, 1999), Goswami suggests that deficits
in syllabic segmentation and other sequential processes may result from
impaired rhythmic entrainment leading to difficulties in sampling
information over time. Along a similar line, Tierney and Kraus (2014)
proposed the precise auditory timing hypothesis (PATH), which suggests that
neural entrainment in auditory and motor cortex, and the interaction
between them, underlies many of the behavioral aspects of both language
and music processing. We will now describe music and language with a
temporal focus.

A TEMPORAL FOCUS ON MUSIC AND LANGUAGE

As of today, the most promising approach to understanding information
processing seems to us to be the adoption of a dynamical emphasis, focusing on
the temporal dimension. This perspective can be operationalized in
complementary ways, ranging from portraying temporal regularities within
sensory inputs to investigating time-resolved neural patterns of activity
implicated in sensory processing, both in terms of frequency-resolved
neural oscillations and neural network dynamics. The underlying
motivation is to describe information processing at the algorithmic (or
representational) level, as first proposed by David Marr (Marr, 1982;
Poeppel, 2012); in other words, to understand how the system does what it
does, and more precisely what representations it uses, how they emerge, and
how they are manipulated. Describing the time constants or the temporal
profile of activity of each of these neural algorithms constitutes a
preliminary stage toward this ultimate goal. While this approach can be
carried out separately for music and language, a direct comparison of the
two is also useful to distinguish general processing steps from more specific
ones.
In the speech domain, David Poeppel has theorized this approach in the
“asymmetric sampling in time” hypothesis (Giraud and Poeppel, 2012;
Poeppel, 2003). Basically, speech can be described as a multi-timescale
signal, with a hierarchical organization composed of phonemic, syllabic,
and prosodic information (among others). At the neural level, both parallel
and sequential processing occurs, with gamma (~30 Hz), theta (~5 Hz), and
delta (~2 Hz) oscillations being specifically engaged by these multi-
timescale, quasi-rhythmic properties of speech, and tracking its dynamics.
Giraud and Poeppel argue that such neural oscillations “are foundational in
speech and language processing, ‘packaging’ incoming information into
units of the appropriate temporal granularity” (Giraud & Poeppel, 2012, p.
511). Interestingly, music is also characterized by a multi-timescale
structure, with rhythm and meter hierarchically organized (Vuust & Witek,
2014). However, in an acoustic characterization of the temporal
modulations in music and speech, Ding and colleagues (2017) recently
highlighted that their temporal modulation rates differ. While the main
tempo of music is around 2 Hz (120 bpm), a temporal modulation around 5
Hz primarily characterizes speech, which corresponds to the syllabic rate.
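The kind of acoustic characterization performed by Ding and colleagues can be approximated with a simple envelope analysis. The following Python sketch is a deliberately simplified, hypothetical version (their published method uses a cochlear filterbank and narrowband envelopes rather than a single broadband envelope):

import numpy as np
from scipy.signal import hilbert

def modulation_spectrum(audio, fs, env_fs=100):
    # Broadband amplitude envelope, downsampled, then Fourier-analyzed
    envelope = np.abs(hilbert(audio))
    step = int(fs / env_fs)
    envelope = envelope[:len(envelope) // step * step].reshape(-1, step).mean(axis=1)
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(len(envelope), d=1 / env_fs)
    return freqs, spectrum

# Expected result: modulation energy peaking near ~2 Hz for music
# and near ~5 Hz for speech.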
At least two complementary avenues can be drawn from this result.
First, the distinction between music and speech modulation properties
could be at the origin of some of their computational differences. In a
fascinating paradigm, Oded Ghitza showed that intelligibility of time-
compressed speech can be greatly enhanced if periods of silence of the
appropriate duration are inserted (Ghitza, 2011; Ghitza & Greenberg, 2009).
Oscillation-based models of speech perception best explain these data,
where optimum intelligibility is achieved when the syllable rhythm is
within the range of the theta-frequency brain rhythms (~4–10 Hz),
comparable to the rate at which segments and syllables are articulated in
conversational speech. Follow-up experiments were performed in the music
domain, where participants had to identify the musical key of time-
compressed short melodic sequences (Farbood, Marcus, & Poeppel, 2013;
Farbood, Rowland, Marcus, Ghitza, & Poeppel, 2015). These studies showed
that the insertion of silence gaps was beneficial to performance, in line with
the speech experiments, providing compelling clues about possible oscillatory
mechanisms underlying the segmentation of auditory information. However,
the two experiments in the music domain were not conclusive with regard
to the preferred rate of processing, observed at 2–3 Hz or 5–7 Hz,
respectively. While the former result would be compatible with the fact that
the main tempo of music is around 2 Hz, suggesting that the distinctions
between music and speech acoustic modulation properties are a productive
attribute of their respective perceptual analysis, the latter would be
compatible with the idea that the auditory cortex parses information at the
theta rate, and that such sampling operates rather independently of the
nature of the acoustic signal (music or speech).
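Ghitza's "repackaging" manipulation can be sketched as follows (hypothetical Python code; the published studies use pitch-preserving time compression such as PSOLA rather than the naive decimation shown here): time-compressed speech is chopped into short packets, and silent gaps are inserted so that the packet rate falls back into the theta range.

import numpy as np

def repackage(audio, fs, compression=3, packet_ms=40, gap_ms=80):
    # Naive time compression by decimation (illustration only)
    compressed = audio[::compression]
    packet = int(fs * packet_ms / 1000)
    gap = np.zeros(int(fs * gap_ms / 1000))
    chunks = [compressed[i:i + packet] for i in range(0, len(compressed), packet)]
    # Insert a silent gap after each packet of compressed speech
    return np.concatenate([np.concatenate([c, gap]) for c in chunks])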
Second, the characteristic most clearly shared by music and language
acoustic signals is that both have strong temporal constraints (i.e., a
salient main modulation rate, at ~2 and ~5 Hz, respectively), leading to
strong temporal predictions. Temporal predictions are believed to play a
fundamental role in the way we sample sensory information, in particular in
the auditory domain (Jones, 1976; Nobre & van Ede, 2018; Schroeder &
Lakatos, 2009). Behavioral experiments show that anticipating the moment
of occurrence of an upcoming event optimizes its processing by improving
the quality of sensory information (Jaramillo & Zador, 2011; Morillon,
Schroeder, Wyart, & Arnal, 2016; Rohenkohl, Cravo, Wyart, & Nobre,
2012). Current theories and empirical findings suggest that this
enhancement is achieved by the entrainment of low-frequency neuronal
oscillations, which temporally modulates the excitability of task-relevant
neuronal populations (Cravo, Rohenkohl, Wyart, & Nobre, 2013; Large &
Jones, 1999; Schroeder & Lakatos, 2009). Such entrainment, principally
observed in sensory cortices (Besle et al., 2011; Lakatos et al., 2013), would
be possible thanks to the downward propagation of temporal prediction
signals, recently shown to originate in the motor system (Morillon &
Baillet, 2017). These signals would be responsible for the predictive
alignment of the neuronal excitability phase of ongoing oscillations in
sensory cortex with upcoming events, possibly through top-down phase-
reset (e.g., Park, Ince, Schyns, Thut, & Gross, 2015; Stefanics et al., 2010).
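A standard way of quantifying such entrainment is to measure the phase of low-frequency oscillations at event onsets and its consistency across events. A minimal, hypothetical Python sketch of this type of analysis (function and parameter names are our own):

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def phase_at_events(signal, fs, event_samples, band=(1.0, 3.0)):
    # Delta-band phase at each event onset, plus inter-trial phase
    # coherence (ITC): 1 = perfectly consistent phase alignment.
    b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    phase = np.angle(hilbert(filtfilt(b, a, signal)))
    phases = phase[np.asarray(event_samples)]
    itc = np.abs(np.mean(np.exp(1j * phases)))
    return phases, itc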
A recent proposition by Arnal and colleagues (Rimmele, Morillon,
Poeppel, & Arnal, submitted) is that time estimation relies on the neural
recycling of action circuits (Coull, 2011) and is implemented by internal,
non-conscious “simulation” of movements in most ecological situations
(Arnal, 2012; Arnal & Giraud, 2012; Schubotz, 2007). On this view,
temporal predictions correspond to a covert form of active sensing
(Morillon, Hackett, Kajikawa, & Schroeder, 2015; Schroeder, Wilson,
Radman, Scharfman, & Lakatos, 2010). In other words, the efferent motor
signals that are generated when synchronizing our actions to predictable
events are also generated during the passive perception of such regularities
(Arnal, 2012; Patel & Iversen, 2014). When temporal regularities occur on
the timescale of natural actions/movements, the motor system is recruited
(Chen, Penhune, & Zatorre, 2008; Du & Zatorre, 2017; Grahn & Rowe,
2012; Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015; Teki, Grube,
Kumar, & Griffiths, 2011; Zatorre, Chen, & Penhune, 2007). The great
richness of the repertoire of motor schemes (gestures) makes it possible to
simulate (and predict) the occurrence of sensory events with great accuracy
and to process them with greater precision (Morillon et al., 2016; Schubotz,
2007), offering a flexible tool to precisely predict “when” and select
relevant information in time. Given the finesse of our motor expertise and
the amazing complexity of our repertoire of actions, this means that we can
use internal simulation of action to anticipate temporal trajectories. This
conception is compatible with various forms of “motor theories” of speech
perception (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967), in
which the covert simulation of actions can lead to a given sensory
configuration.
The role of temporal predictions, while critical in both music and
language, differs between them in multiple ways. First, music is much more
rhythmic than speech, hence predictions are more precise. Second, while
temporal predictions have a primarily contextual role in language, helping
to optimize the extraction of relevant information, they serve a much more
fundamental purpose in music. Indeed, musical rhythm has a remarkable
capacity to move our minds and bodies. This is because it is part of the
information content in itself, rather than being a contextual cue (as in
language). In a compelling review article, Vuust and Witek (2014)
hypothesize that music exploits general principles of brain functioning,
notably its organization as a Bayesian, predictive system, to optimize our
pleasure and desire to move. In any case, these distinctions highlight that
music stimulates the dorsal auditory stream much more than language does,
as this pathway is involved in audio-motor transformation (Hickok &
Poeppel, 2007) and temporal information processing (Morillon & Baillet,
2017). As a consequence, musical training or musical stimulation
strengthens the connectivity between auditory and motor cortices, which
has beneficial effects on speech comprehension (Falk, Lanzilotti, & Schön,
2017), especially in noisy conditions (Du & Zatorre, 2017), and on
phonological and reading skills in children (Flaugnacco et al., 2015), as
described earlier. Overall, while music and language differ in both structure
and function, they share the property of being temporal in essence.
Adopting a dynamical approach thus seems the most promising avenue for
understanding how the human brain interacts with this type of multisensory
environment.

REFERENCES
Abrams, D. A., Bhatara, A., Ryali, S., Balaban, E., Levitin, D. J., & Menon, V. (2010). Decoding
temporal structure in music and speech relies on shared brain resources but elicits different fine-
scale spatial patterns. Cerebral Cortex 21(7), 1507–1518.
Albert, M. L., Sparks, R. W., & Helm, N. A. (1973). Melodic intonation therapy for aphasia. Archives
of Neurology 29, 130–131.
Arnal, L. (2012). Predicting “when” using the motor system’s beta-band oscillations. Frontiers in
Human Neuroscience 6, 225. Retrieved from https://doi.org/10.3389/fnhum.2012.00225
Arnal, L., & Giraud, A. L. (2012). Cortical oscillations and sensory predictions. Trends in Cognitive
Sciences 16(7), 390–398.
Astésano, C., Besson, M., & Alter, K. (2004). Brain potentials during semantic and prosodic
processing in French. Cognitive Brain Research 18(2), 172–184.
Basso, A., & Capitani, E. (1985). Spared musical abilities in a conductor with global aphasia and
ideomotor apraxia. Journal of Neurology, Neurosurgery & Psychiatry 48(5), 407–412.
Bedoin, N., Brisseau, L., Molinier, P., Roch, D., & Tillmann, B. (2016). Temporally regular musical
primes facilitate subsequent syntax processing in children with specific language impairment.
Frontiers in Neuroscience 10. Retrieved from https://doi.org/10.3389/fnins.2016.00245
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human
auditory cortex. Nature 403(6767), 309–312.
Benner, J., Wengenroth, M., Reinhardt, J., Stippich, C., Schneider, P., & Blatow, M. (2017).
Prevalence and function of Heschl’s gyrus morphotypes in musicians. Brain Structure and
Function 222(8), 1–17.
Besle, J., Schevon, C. A., Mehta, A. D., Lakatos, P., Goodman, R. R., McKhann, G. M., …
Schroeder, C. E. (2011). Tuning of the human neocortex to the temporal dynamics of attended
events. Journal of Neuroscience 31(9), 3176–3185.
Bhide, A., Power, A., & Goswami, U. (2013). A rhythmic musical intervention for poor readers: A
comparison of efficacy with a letter-based intervention. Mind, Brain, and Education 7(2), 113–
123.
Carrus, E., Pearce, M. T., & Bhattacharya, J. (2013). Melodic pitch expectation interacts with neural
responses to syntactic but not semantic violations. Cortex 49(8), 2186–2200.
Cason, N., Astésano, C., & Schön, D. (2015). Bridging music and speech rhythm: Rhythmic priming
and audio-motor training affect speech perception. Acta Psychologica 155, 43–50.
Cason, N., Hidalgo, C., Isoard, F., Roman, S., & Schön, D. (2015). Rhythmic priming enhances
speech production abilities: Evidence from prelingually deaf children. Neuropsychology 29(1),
102.
Cason, N., & Schön, D. (2012). Rhythmic priming enhances the phonological processing of speech.
Neuropsychologia 50(11), 2652–2658.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor
regions of the brain. Cerebral Cortex 18(12), 2844–2854.
Chern, A., Tillmann, B., Vaughan, C., & Gordon, R. L. (2018). New evidence of a rhythmic priming
effect that enhances grammaticality judgments in children. Journal of Experimental Child
Psychology 173, 371–379.
Cheung, C., Hamilton, L. S., Johnson, K., & Chang, E. F. (2016). The auditory representation of
speech sounds in human motor cortex. eLife 5, e12577.
Chobert, J., François, C., Velay, J. L., & Besson, M. (2012). Twelve months of active musical
training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and
voice onset time. Cerebral Cortex 24(4), 956–967.
Chobert, J., Marie, C., François, C., Schön, D., & Besson, M. (2011). Enhanced passive and active
processing of syllables in musician children. Journal of Cognitive Neuroscience 23(12), 3874–
3887.
Chomsky, N. (1959). A review of B. F. Skinner’s Verbal Behavior. Language 35(1), 26–58.
Cogo-Moreira, H., de Avila, C. R. B., Ploubidis, G. B., & de Jesus Mari, J. (2013). Effectiveness of
music education for the improvement of reading skills and academic achievement in young poor
readers: A pragmatic cluster-randomized, controlled clinical trial. PloS ONE 8(3), e59984.
Coull, J. T. (2011). Discrete neuroanatomical substrates for generating and updating temporal
expectations. In S. Dehaene & E. Brannon (Eds.), Space, time and number in the brain: Searching
for the foundations of mathematical thought (pp. 87–101). Amsterdam: Elsevier.
Cravo, A. M., Rohenkohl, G., Wyart, V., & Nobre, A. C. (2013). Temporal expectation enhances
contrast sensitivity by phase entrainment of low-frequency oscillations in visual cortex. Journal of
Neuroscience 33(9), 4002–4010.
Cummins, F., & Port, R. (1998). Rhythmic constraints on stress timing in English. Journal of
Phonetics 26(2), 145–171.
DeWitt, L. A., & Samuel, A. G. (1990). The role of knowledge-based expectations in music
perception: Evidence from musical restoration. Journal of Experimental Psychology: General
119(2), 123–144.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in
speech and music. Neuroscience & Biobehavioral Reviews 81(B), 181–187.
Dittinger, E., Barbaroux, M., D’Imperio, M., Jäncke, L., Elmer, S., & Besson, M. (2016).
Professional music training and novel word learning: From faster semantic encoding to longer-
lasting word representations. Journal of Cognitive Neuroscience 28(10), 1584–1602.
Dittinger, E., Valizadeh, S. A., Jäncke, L., Besson, M., & Elmer, S. (2018). Increased functional
connectivity in the ventral and dorsal streams during retrieval of novel words in professional
musicians. Human Brain Mapping 39(2), 722–734.
Du, Y., & Zatorre, R. J. (2017). Musical training sharpens and bonds ears and tongue to hear speech
better. Proceedings of the National Academy of Sciences 5, 201712223. Retrieved from
https://doi.org/10.1073/pnas.1712223114
Elhilali, M., & Shamma, S. A. (2008). A cocktail party with a cortical twist: How cortical
mechanisms contribute to sound segregation. Journal of the Acoustical Society of America 124(6),
3751–3771.
Elmer, S., Hänggi, J., & Jäncke, L. (2016). Interhemispheric transcallosal connectivity between the
left and right planum temporale predicts musicianship, performance in temporal speech
processing, and functional specialization. Brain Structure and Function 221(1), 331–344.
Elmer, S., Hänggi, J., Meyer, M., & Jäncke, L. (2013). Increased cortical surface area of the left
planum temporale in musicians facilitates the categorization of phonetic and temporal speech
sounds. Cortex 49(10), 2812–2821.
Falk, S., Lanzilotti, C., & Schön, D. (2017). Tuning neural phase entrainment to speech. Journal of
Cognitive Neuroscience 29(8), 1378–1389.
Farbood, M. M., Marcus, G., & Poeppel, D. (2013). Temporal dynamics and the identification of
musical key. Journal of Experimental Psychology Human Perception & Performance 39(4), 911–
918.
Farbood, M. M., Rowland, J., Marcus, G., Ghitza, O., & Poeppel, D. (2015). Decoding time for the
identification of musical key. Attention, Perception, & Psychophysics 77(1), 28–35.
Fedorenko, E., McDermott, J. H., Norman-Haignere, S., & Kanwisher, N. (2012). Sensitivity to
musical structure in the human brain. Journal of Neurophysiology 108(12), 3289–3300.
Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in
language and music: Evidence for a shared system. Memory & Cognition 37(1), 1–9.
Fitzroy, A. B., & Sanders, L. D. (2013). Musical expertise modulates early processing of syntactic
violations in language. Frontiers in Psychology 3, 603. Retrieved from
https://doi.org/10.3389/fpsyg.2012.00603
Fiveash, A., & Pammer, K. (2014). Music and language: Do they draw on similar syntactic working
memory resources? Psychology of Music 42(2), 190–209.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training
increases phonological awareness and reading skills in developmental dyslexia: A randomized
control trial. PLoS ONE 10(9), e0138715.
Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT
Press.
François, C., Chobert, J., Besson, M., & Schön, D. (2012). Music training for the development of
speech segmentation. Cerebral Cortex 23(9), 2038–2043.
François, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical and
linguistic structures. Cerebral Cortex 21(10), 2357–2365.
Friederici, A. D., & Kotz, S. A. (2003). The brain basis of syntactic processes: Functional imaging
and lesion studies. NeuroImage 20, S8–S17.
Ghitza, O. (2011). Linking speech perception and neurophysiology: Speech decoding guided by
cascaded oscillators locked to the input rhythm. Frontiers in Psychology 2. Retrieved from
https://doi.org/10.3389/fpsyg.2011.00130
Ghitza, O., & Greenberg, S. (2009). On the possible role of brain rhythms in speech perception:
Intelligibility of time-compressed speech with periodic and aperiodic insertions of silence.
Phonetica 66, 113–126.
Giraud, A. L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S., & Laufs, H. (2007).
Endogenous cortical rhythms determine cerebral specialization for speech perception and
production. Neuron 56(6), 1127–1134.
Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging
computational principles and operations. Nature Neuroscience 15(4), 511–517.
Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. Trends in
Cognitive Sciences 15(1), 3–10.
Gottfried, T. L., & Riester, D. (2000). Relation of pitch glide perception and Mandarin tone
identification. Journal of the Acoustical Society of America 108(5), 2604.
Gottfried, T. L., Staby, A. M., & Ziemer, C. J. (2004). Musical experience and Mandarin tone
discrimination and imitation. Journal of the Acoustical Society of America 115(5), 2545.
Grahn, J. A., & Rowe, J. B. (2012). Finding and feeling the musical beat: Striatal dissociations
between detection and prediction of regularity. Cerebral Cortex 23(4), 913–921.
Haegens, S., & Golumbic, E. Z. (2018). Rhythmic facilitation of sensory processing: A critical
review. Neuroscience & Biobehavioral Reviews 86, 150–165.
Herdener, M., Humbel, T., Esposito, F., Habermeyer, B., Cattapan-Ludewig, K., & Seifritz, E.
(2012). Jazz drummers recruit language-specific areas for the processing of rhythmic structure.
Cerebral Cortex 24(3), 836–843.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews
Neuroscience 8, 393–402.
Hidalgo, C., Falk, S., & Schön, D. (2017). Speak on time! Effects of a musical rhythmic training on
children with hearing loss. Hearing Research 351, 11–18.
Hillis, A. E., & Caramazza, A. (1995). Representation of grammatical categories of words in the
brain. Journal of Cognitive Neuroscience 7(3), 396–407.
Hoch, L., Poulin-Charronnat, B., & Tillmann, B. (2011). The influence of task-irrelevant music on
language processing: Syntactic and semantic structures. Frontiers in Psychology 2. Retrieved from
https://doi.org/10.3389/fpsyg.2011.00112
Intartaglia, B., White-Schwoch, T., Kraus, N., & Schön, D. (2017). Music training enhances the
automatic neural processing of foreign speech sounds. Scientific Reports 7(1), 12631.
Jaramillo, S., & Zador, A. M. (2011). The auditory cortex mediates the perceptual effects of acoustic
temporal expectation. Nature Neuroscience 14, 246–251.
Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax
processing in children. NeuroImage 47(2), 735–744.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and
memory. Psychological Review 83(5), 323–355.
Kleber, B., Veit, R., Moll, C. V., Gaser, C., Birbaumer, N., & Lotze, M. (2016). Voxel-based
morphometry in opera singers: Increased gray-matter volume in right somatosensory and auditory
cortices. NeuroImage 133, 477–483.
Koelsch, S., Gunter, T. C., Cramon, D. Y. V., Zysset, S., Lohmann, G., & Friederici, A. D. (2002).
Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2),
956–966.
Koelsch, S., Gunter, T. C., Wittfoth, M., & Sammler, D. (2005). Interaction between syntax
processing in language and in music: An ERP study. Journal of Cognitive Neuroscience 17(10),
1565–1577.
Kotz, S. A., Gunter, T. C., & Wonneberger, S. (2005). The basal ganglia are receptive to rhythmic
compensation during auditory syntactic processing: ERP patient data. Brain and Language 95(1),
70–71.
Kotz, S. A., Meyer, M., Alter, K., Besson, M., von Cramon, D. Y., & Friederici, A. D. (2003). On the
lateralization of emotional prosody: An event-related functional MR investigation. Brain &
Language 86(3), 366–376.
Kotz, S. A., & Schwartze, M. (2010). Cortical speech processing unplugged: A timely subcortico-
cortical framework. Trends in Cognitive Sciences 14(9), 392–399.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.
Nature Reviews Neuroscience 11(8), 599–605.
Kunert, R., & Slevc, L. R. (2015). A commentary on: “Neural overlap in processing music and
speech.” Frontiers in Human Neuroscience 9. Retrieved from
https://doi.org/10.3389/fnhum.2015.00330
Kunert, R., Willems, R. M., Casasanto, D., Patel, A. D., & Hagoort, P. (2015). Music and language
syntax interact in Broca’s area: An fMRI study. PloS One 10(11), e0141069.
Lakatos, P., Musacchia, G., O’Connel, M. N., Falchier, A. Y., Javitt, D. C., & Schroeder, C. E.
(2013). The spectrotemporal filter mechanism of auditory selective attention. Neuron 77, 750–761.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying
events. Psychological Review 106(1), 119–159.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of
the speech code. Psychological Review 74, 431–461.
Liebenthal, E., Binder, J. R., Spitzer, S. M., Possing, E. T., & Medler, D. A. (2005). Neural substrates
of phonemic perception. Cerebral Cortex 15(10), 1621–1631.
Lindell, A. K. (2006). In your right mind: Right hemisphere contributions to language processing and
production. Neuropsychology Review 16(3), 131–148.
Luria, A. R., Tsevetkova, L. S., & Futer, D. S. (1965). Aphasia in a composer. Journal of the
Neurological Sciences 2(3), 288–292.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in
Broca's area: An MEG study. Nature Neuroscience 4(5), 540–545.
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music
and language better than nonmusician children: Behavioral and electrophysiological approaches.
Journal of Cognitive Neuroscience 18(2), 199–211.
Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of musical
expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive
Neuroscience 23(10), 2701–2715.
Marie, C., Magne, C., & Besson, M. (2011). Musicians and the metric structure of words. Journal of
Cognitive Neuroscience 23(2), 294–305.
Marr, D. (1982). Vision. San Francisco, CA: Freeman.
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W.T. (2015). Finding the beat: A neural
perspective across humans and non-human primates. Philosophical Transactions of the Royal
Society B: Biological Sciences 370(1664), 20140093. doi:10.1098/rstb.2014.0093
Milovanov, R., Huotilainen, M., Esquef, P. A., Alku, P., Välimäki, V., & Tervaniemi, M. (2009). The
role of musical aptitude and language skills in preattentive duration processing in school-aged
children. Neuroscience Letters 460(2), 161–165.
Moore, E., Branigan, H. & Overy, K. (2017). Exploring the role of auditory-motor synchronisation in
the transfer of music to language skills in dyslexia. Outstanding Poster Award talk at
Neurosciences and Music VI conference.
Morillon, B., & Baillet, S. (2017). Motor origin of temporal predictions in auditory attention.
Proceedings of the National Academy of Sciences 114(42), E8913–E8921.
Morillon, B., Hackett, T. A., Kajikawa, Y., & Schroeder, C. E. (2015). Predictive motor control of
sensory dynamics in auditory active sensing. Current Opinion in Neurobiology 31, 230–238.
Morillon, B., Schroeder, C. E., & Wyart, V. (2014). Motor contributions to the temporal precision of
auditory attention. Nature Communications 5, 5255.
Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal prediction in lieu of
periodic stimulation. Journal of Neuroscience 36(8), 2342–2347.
Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical
auditory and audiovisual processing of speech and music. Proceedings of the National Academy of
Sciences 104(40), 15894–15898.
Nobre, A. C., & van Ede, F. (2018). Anticipated moments: Temporal structure in attention. Nature
Reviews Neuroscience 19, 34–38.
Overy, K. (2000). Dyslexia, temporal processing and music: The potential of music as an early
learning aid for dyslexic children. Psychology of Music 28(2), 218–229.
Parbery-Clark, A., Skoe, E., & Kraus, N. (2009). Musical experience limits the degradative effects of
background noise on the neural processing of sound. Journal of Neuroscience 29(45), 14100–
14107.
Parbery-Clark, A., Tierney, A., Strait, D. L., & Kraus, N. (2012). Musicians have fine-tuned neural
distinction of speech syllables. Neuroscience 219, 111–119.
Park, H., Ince, R. A. A., Schyns, P. G., Thut, G., & Gross, J. (2015). Frontal top-down signals
increase coupling of auditory low-frequency oscillations to continuous speech in human listeners.
Current Biology 25(12), 1649–1653.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience 6(7), 674–681.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA
hypothesis. Frontiers in Psychology 2, 142. doi:10.3389/fpsyg.2011.00142
Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes speech?
The expanded OPERA hypothesis. Hearing Research 308, 98–108.
Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The
Action Simulation for Auditory Prediction (ASAP) hypothesis. Frontiers in System Neuroscience
8, 57. Retrieved from https://doi.org/10.3389/fnsys.2014.00057
Patel, A. D., Iversen, J. R., Wassenaar, M., & Hagoort, P. (2008). Musical syntactic processing in
agrammatic Broca’s aphasia. Aphasiology 22(7–8), 776–789.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6(7), 688–
691.
Peretz, I., Vuvan, D., Lagrois, M. É., & Armony, J. L. (2015). Neural overlap in processing music
and speech. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664),
20140090.
Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral
lateralization as “asymmetric sampling in time.” Speech Communication 41(1), 245–255.
Poeppel, D. (2012). The maps problem and the mapping problem: Two challenges for a cognitive
neuroscience of speech and language. Cognitive Neuropsychology 29(1–2), 34–55.
Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., … Tillmann, B.
(2013). Rhythmic auditory stimulation influences syntactic processing in children with
developmental language disorders. Neuropsychology 27(1), 121–131.
Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhuman
primates illuminate human speech processing. Nature Neuroscience 12, 718–724.
Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of “what” and
“where” in auditory cortex. Proceedings of the National Academy of Sciences 97(22), 11800–
11806.
Rimmele, J. M., Morillon, B., Poeppel, D., & Arnal, L. H. (submitted). The proactive and flexible
sense of timing.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience
27, 169–192.
Rogalsky, C., & Hickok, G. (2011). The role of Broca’s area in sentence comprehension. Journal of
Cognitive Neuroscience 23(7), 1664–1680.
Rohenkohl, G., Cravo, A. M., Wyart, V., & Nobre, A. C. (2012). Temporal expectation improves the
quality of sensory information. Journal of Neuroscience 32(24), 8424–8428.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants.
Science 274(5294), 1926–1928.
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone
sequences by human infants and adults. Cognition 70(1), 27–52.
Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J.
(2013). Interactions between the nucleus accumbens and auditory cortices predict music reward
value. Science 340(6129), 216–219.
Sammler, D., Baird, A., Valabrègue, R., Clément, S., Dupont, S., Belin, P., & Samson, S. (2010). The
relationship of lyrics and tunes in the processing of unfamiliar songs: A functional magnetic
resonance adaptation study. Journal of Neuroscience 30(10), 3572–3578.
Sammler, D., Koelsch, S., Ball, T., Brandt, A., Grigutsch, M., Huppertz, H. J., … Friederici, A. D.
(2013). Co-localizing linguistic and musical syntax with intracranial EEG. NeuroImage 64, 134–
146.
Sammler, D., Koelsch, S., & Friederici, A. D. (2011). Are left fronto-temporal brain areas a
prerequisite for normal music-syntactic processing? Cortex 47(6), 659–673.
Schlaug, G., Jäncke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum
size in musicians. Neuropsychologia 33(8), 1047–1055.
Schön, D., Boyer, M., Moreno, S., Besson, M., Peretz, I., & Kolinsky, R. (2008). Songs as an aid for
language acquisition. Cognition 106(2), 975–983.
Schön, D., Gordon, R., Campagne, A., Magne, C., Astésano, C., Anton, J. L., & Besson, M. (2010).
Similar cerebral networks in language, music and song perception. NeuroImage 51(1), 450–461.
Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch
processing in both music and language. Psychophysiology 41(3), 341–349.
Schön, D., & Tillmann, B. (2015). Short- and long-term rhythmic interventions: Perspectives for
language rehabilitation. Annals of the New York Academy of Sciences 1337, 32–39.
Schroeder, C. E., & Lakatos, P. (2009). Low-frequency neuronal oscillations as instruments of
sensory selection. Trends in Neurosciences 32(1), 9–18.
Schroeder, C. E., Wilson, D. A., Radman, T., Scharfman, H., & Lakatos, P. (2010). Dynamics of
active sensing and perceptual selection. Current Opinion in Neurobiology 20, 172–176.
Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new
framework. Trends in Cognitive Sciences 11(5), 211–218.
Shahin, A., Bosnyak, D. J., Trainor, L. J., & Roberts, L. E. (2003). Enhancement of neuroplastic P2
and N1c auditory evoked potentials in musicians. Journal of Neuroscience 23(13), 5545–5552.
Skoe, E., & Kraus, N. (2012). A little goes a long way: How the adult brain is shaped by musical
training in childhood. Journal of Neuroscience 32(34), 11507–11510.
Slevc, L. R., Faroqi-Shah, Y., Saxena, S., & Okada, B. M. (2016). Preserved processing of musical
structure in a person with agrammatic aphasia. Neurocase 22(6), 505–511.
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced
reading time evidence for shared processing of linguistic and musical syntax. Psychonomic
Bulletin & Review 16(2), 374–381.
Staeren, N., Renvall, H., De Martino, F., Goebel, R., & Formisano, E. (2009). Sound categories are
represented as distributed patterns in the human auditory cortex. Current Biology 19(6), 498–502.
Stahl, B., Kotz, S. A., Henseler, I., Turner, R., & Geyer, S. (2011). Rhythm in disguise: Why singing
may not hold the key to recovery from aphasia. Brain 134(10), 3083–3093.
Stefanics, G., Hangya, B., Hernadi, I., Winkler, I., Lakatos, P., & Ulbert, I. (2010). Phase entrainment
of human delta oscillations can mediate the effects of expectation on reaction speed. Journal of
Neuroscience 30(41), 13578–13585.
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of
prosodic cues in natural speech processing. Nature Neuroscience 2(2), 191–196.
Strait, D. L., Kraus, N., Parbery-Clark, A., & Ashley, R. (2010). Musical experience shapes top-down
auditory mechanisms: Evidence from masking and auditory attention performance. Hearing
Research 261(1), 22–29.
Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based
and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812.
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind:
Statistics, structure, and abstraction. Science 331(6022), 1279–1285.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: Do music
lessons help? Emotion 4(1), 46–64.
Tierney, A., & Kraus, N. (2014). Auditory-motor entrainment and phonological skills: Precise
auditory timing hypothesis (PATH). Frontiers in Human Neuroscience 8. Retrieved from
https://doi.org/10.3389/fnhum.2014.00949
Tinbergen, N. (1963). On aims and methods of ethology. Ethology 20, 410–433.
Vallentin, D., Kosche, G., Lipkind, D., & Long, M. A. (2016). Inhibition protects acquired song
segments during vocal learning in zebra finches. Science 351(6270), 267–271.
Vigneau, M., Beaucousin, V., Hervé, P. Y., Jobard, G., Petit, L., Crivello, F., … Tzourio-Mazoyer, N.
(2011). What is right-hemisphere contribution to phonological, lexico-semantic, and sentence
processing? Insights from a meta-analysis. NeuroImage 54(1), 577–593.
Vuust, P., & Witek, M. A. G. (2014). Rhythmic complexity and predictive coding: A novel approach
to modeling rhythm and meter perception in music. Frontiers in Psychology 5, 1111. Retrieved
from https://doi.org/10.3389/fpsyg.2014.01111
Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human
brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music
and speech. Trends in Cognitive Sciences 6(1), 37–46.
Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor
interactions in music perception and production. Nature Reviews Neuroscience 8, 547–558.
Zuk, J., Ozernov-Palchik, O., Kim, H., Lakshminarayanan, K., Gabrieli, J. D. E., Tallal, P., & Gaab,
N. (2013). Enhanced syllable discrimination thresholds in musicians. PLoS ONE 8(12), e80546.
Zumbansen, A., Peretz, I., & Hébert, S. (2014). Melodic intonation therapy: Back to basics for future
research. Frontiers in Neurology 5. Retrieved from https://doi.org/10.3389/fneur.2014.00007
SECTION V

MUSICIANSHIP AND BRAIN FUNCTION
CHAPTER 17

MUSICAL EXPERTISE AND BRAIN STRUCTURE: THE CAUSES AND
CONSEQUENCES OF TRAINING

VIRGINIA B. PENHUNE

Over the past twenty years, brain imaging studies have demonstrated that
music training can change brain structure, predominantly in the auditory-
motor network that underlies music performance. These studies have also
shown that brain structural variation is related to performance on a range of
musical tasks, and that even short-term training can result in brain plasticity.
In this chapter, we will argue that the observed differences in brain structure
between experts and novices derive from at least four sources. First, there
may be pre-existing individual differences in structural features supporting
specific skills that predispose people to undertake music training. Second,
lengthy and consistent training likely produces structural change in the
brain networks tapped by performance through repeated cycles of
prediction, feedback, and error-correction that drive learning. Third, the
timing of practice during specific periods of development may result in
brain changes that do not occur at other periods of time, and which may
promote future learning and plasticity. Fourth, both the rewarding nature of
music itself, as well as the reward value of practice and accurate
performance may make music training a particularly effective driver of
brain plasticity.

STRUCTURAL BRAIN DIFFERENCES IN ADULT MUSICIANS

There is now a relatively large body of brain imaging data showing
differences in gray- (GM) and white-matter (WM) architecture between
musicians and non-musicians (see Fig. 1). In adults, all of these studies are
cross-sectional, and typically compare music students or professionals with
controls selected to have very little music training. One of the most
common and expected findings is that music training is associated with
enhancements in auditory regions, particularly Heschl’s gyrus (HG), the
region of primary auditory cortex. These studies have found that musicians
commonly show greater gyrification of HG (Schneider et al., 2002;
Schneider et al., 2005), and greater GM volume or cortical thickness (CT)
in this region (Bermudez, Lerch, Evans, & Zatorre, 2009; Foster & Zatorre,
2010; Gaser & Schlaug, 2003; Karpati, Giacosa, Foster, Penhune, & Hyde,
2017; Schneider et al., 2002, 2005). These differences have been shown to
be related to indices of music proficiency (Schneider et al., 2002, 2005),
hours of music practice (Foster & Zatorre, 2010), variations in EEG and
MEG responses to auditory signals (Schneider et al., 2002, 2005), and
performance on melody discrimination and rhythm reproduction tasks
(Foster & Zatorre, 2010; Karpati et al., 2017).
FIGURE 1. Regions of the dorsal auditory pathway affected by music training. Illustrates brain
regions found to show structural changes in musicians compared to non-musicians. These include
the auditory (superior temporal gyrus, STG), parietal, premotor cortex (PMC), and inferior frontal
gyrus (IFG) regions in the dorsal auditory pathway, as well as the connecting fibers of the arcuate
fasciculus. Also pictured are the cerebellum and corticospinal tract (CST). Regions not shown are
the corpus callosum and basal ganglia.

The second most common finding is enhancement in motor regions of
the brain, including GM in primary motor, premotor, and parietal regions,
as well as the cerebellum and basal ganglia. In addition, consistent increases
have been observed in white-matter pathways, including the corpus
callosum, descending motor tracts, and sensorimotor connections. One of
the first studies in this domain found that the length of the central sulcus,
and by inference the size of the motor cortex (M1), was larger in trained
musicians, and that earlier onset of training was related to greater length
(Amunts et al., 1997). This finding has been replicated in subsequent
studies using whole-brain analysis techniques (Bermudez et al., 2009; Gaser
& Schlaug, 2003). Differences between musicians and non-musicians have
also been observed in the corpus callosum (CC), the primary white-matter
pathway connecting the two hemispheres. In another early investigation, it
was found that the surface area of the anterior half of the CC was larger in
musicians, and that this difference was greatest for those who began
training before age 7 (Schlaug, Jancke, Huang, Staiger, & Steinmetz, 1995).
Musicians have also been found to have greater white-matter integrity in the
CC as measured using diffusion tensor imaging (DTI), with these measures
being related to hours of practice (Bengtsson et al., 2005), as well as to the
age of training onset and to performance on a sensory-motor synchronization task (Steele,
Bailey, Zatorre, & Penhune, 2013). In the descending motor pathways,
changes in DTI measures have been observed to be related to hours of
practice in childhood (Bengtsson et al., 2005). Changes in subcortical
structures have also been observed, with a recent study reporting that
musicians have greater gray-matter volume in the putamen (Vaquero et al.,
2016), and others showing enhancements in cerebellar gray- (Gaser &
Schlaug, 2003; Hutchinson, Lee, Gaab, & Schlaug, 2003) and white-matter
(Abdul-Kareem, Stancak, Parkes, Al-Ameen, et al., 2011). However, a more
recent study from our laboratory using cerebellar-specific segmentation
techniques found no differences in either gray- or white-matter volumes
between musicians and non-musicians, but that musicians who began
training before age 7 had reduced volumes in cerebellar regions specifically
related to motor timing (Baer et al., 2015).
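The "white-matter integrity" measures referred to here are typically scalar indices derived from the diffusion tensor, most often fractional anisotropy (FA). For reference, FA can be computed from the tensor's three eigenvalues as in this short Python sketch (the formula is standard; the example eigenvalues are illustrative only):

import numpy as np

def fractional_anisotropy(evals):
    # FA from the three diffusion-tensor eigenvalues; higher FA means
    # more directionally coherent (anisotropic) diffusion.
    l = np.asarray(evals, dtype=float)
    md = l.mean()  # mean diffusivity
    return np.sqrt(1.5 * np.sum((l - md) ** 2) / np.sum(l ** 2))

fractional_anisotropy([1.7e-3, 0.3e-3, 0.3e-3])  # coherent bundle, FA ~ 0.80
fractional_anisotropy([0.8e-3, 0.7e-3, 0.7e-3])  # near-isotropic, FA ~ 0.08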
Other regions found to differ between musicians and non-musicians are
in frontal and parietal cortex, including regions important for language (pars
opercularis and triangularis; areas 44 and 45) and working memory
(dorsolateral: 9/46; and ventrolateral prefrontal cortex: 47/12). Enhanced
GM density has been observed in areas 44/45, related to years of music
experience (Abdul-Kareem, Stancak, Parkes, & Sluming, 2011; James et
al., 2014; Sluming et al., 2002) and to performance on a test of absolute
pitch (Bermudez et al., 2009). Importantly, musicians have also
been found to have greater white-matter integrity as measured with DTI in
the arcuate fasciculus, the pathway connecting auditory, parietal, and
inferior frontal regions (Halwani, Loui, Ruber, & Schlaug, 2011). Musicians
have also been reported to have greater cortical thickness in DLPFC; and
interregional variability in cortical thickness is correlated across a broader
range of auditory and motor regions in musicians compared to controls
(Bermudez et al., 2009). Finally, several studies have reported greater gray-
matter volume in parietal regions (Foster & Zatorre, 2010; Gaser &
Schlaug, 2003; James et al., 2014), which are engaged in sensorimotor
transformations and planning that are relevant for playing a musical
instrument (Andersen & Cui, 2009; Gogos et al., 2010; Rauschecker, 2011).
In particular, Foster and Zatorre (2010) found that both gray-matter volume
and cortical thickness were related to performance on a test of melodic
discrimination in a group of people with varying levels of music
experience.
Taken together, cross-sectional studies in adult musicians provide
evidence that long-term practice produces structural changes in regions of
the dorsal auditory-motor network that has been shown in functional
imaging studies to be recruited during playing (Brown, Zatorre, & Penhune,
2015; Chen, Penhune, & Zatorre, 2008; Herholz & Zatorre, 2012;
Novembre & Keller, 2014).

Developmental Investigations of Training-Related Plasticity

Studying effects of music training in childhood is important because that is
when lessons typically begin, but also because we know that sensorimotor
experience during early sensitive periods in development can have
differential impacts on long-term brain plasticity. The first longitudinal
study in children examined the effects of 15 months of piano training
in 6- to 8-year-olds (Hyde et al., 2009). Longitudinal studies are critical
because they allow us to establish more direct causal connections between
training and any observed changes in the brain. This study found that
children who received training did not differ from untrained children at
baseline, but showed gray-matter enhancements in auditory and motor
cortex, as well as enlargement of the corpus callosum. Most importantly, the
volume of auditory cortex was found to be related to performance on tests
of melody and rhythm discrimination, and the volume of motor cortex was
found to be related to performance on a test of fine-motor skill. These
results are supported by a second longitudinal study which found that 6- to
8-year-old children participating in a music training program were found to
have greater WM integrity in the CC after two years (Habibi et al., 2017).
There was also some evidence of reduced cortical thinning in right
compared to left posterior auditory cortex. Taken together, these
longitudinal results indicate that even relatively short-term training in
childhood can produce changes in behavior and brain structure. Most
importantly, changes occurred in the same regions of the auditory-motor
network—auditory cortex, M1, and the CC—that have been shown to differ
after long-term training in adults. The parallel between longitudinal changes
in childhood and cross-sectional findings in adults supports the inference
that the structural differences observed in adults are indeed the result of
training.
The only other anatomical study in children found that in a large group
of 8- to 10-year-olds, the volume of HG was larger in those who practiced
more, and was associated with measures of music aptitude, as well as
behavioral and MEG measures of auditory processing (Seither-Preisler,
Parncutt, & Schneider, 2014). This is consistent with a longitudinal EEG
study in children showing enhancements of auditory evoked responses to
musical features (Putkinen, Tervaniemi, Saarikivi, Ojala, & Huotilainen,
2014). Interestingly, however, no changes in HG volume were observed
when examining possible longitudinal effects after 13 months of additional
training. Further, hierarchical regression models predicting HG volume
found that aptitude accounted for a greater proportion of the variance than
practice time. The authors interpreted these last two findings as indicating
that anatomical predispositions make a greater contribution to musical
outcomes than training. However, it is also possible that training-related
plastic changes had already occurred in the period preceding the study.
Most children began lessons between 6 and 7 years old, and thus had
already been playing for one to two years.
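The hierarchical regression logic behind this conclusion can be made concrete with a small sketch: enter aptitude as the first predictor of HG volume, add practice time in a second step, and compare the gain in explained variance (ΔR²). The data below are simulated purely for illustration, using statsmodels (assumed available), and do not reproduce the values of Seither-Preisler et al. (2014):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 60                                            # hypothetical sample of children
aptitude = rng.normal(size=n)
practice = 0.3 * aptitude + rng.normal(size=n)    # practice correlates with aptitude
hg_volume = 0.6 * aptitude + 0.1 * practice + rng.normal(size=n)

step1 = sm.OLS(hg_volume, sm.add_constant(aptitude)).fit()
step2 = sm.OLS(hg_volume, sm.add_constant(np.column_stack([aptitude, practice]))).fit()
print(f"R^2, aptitude only:       {step1.rsquared:.3f}")
print(f"R^2, aptitude + practice: {step2.rsquared:.3f}")
print(f"Delta R^2 for practice:   {step2.rsquared - step1.rsquared:.3f}")
```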
The issue of whether predispositions or training contribute most to
observed structural differences between musicians and non-musicians has
long been debated, with little data that can directly contribute to settling the
argument. As will be discussed further in this chapter, some data from
untrained adults show that individual differences in specific anatomical
features are related to performance or learning of musical tasks, providing
indirect evidence that pre-existing anatomical features may mediate the
potential to acquire musical skills (Foster & Zatorre, 2010; Li et al., 2014;
Paquette, Fujii, Li, & Schlaug, 2017; Schneider et al., 2005). The finding
described earlier, that HG volume is larger in children who practice more
but does not change with further training, can also be considered evidence for a
pre-existing structural feature associated with musical skill (Seither-Preisler
et al., 2014). Work with twins has shown that the propensity to practice is
heritable, and that genes appear to account for a large portion of the
variance in music abilities (Mosing, Madison, Pedersen, Kuja-Halkola, &
Ullén, 2014). However, a very recent study from this same group compared
brain structure in monozygotic twins discordant for music practice. They
found that the twins who played had greater cortical thickness in auditory
and motor regions as well as WM enhancements in the corpus callosum
compared to those who did not (de Manzano & Ullén, 2018). These
findings provide the most definitive support yet for the causal effect of
music training on brain structure. In an effort to synthesize these apparently
opposing results, the authors have proposed a gene–environment interaction
model of musical skill and its impact on the brain (Ullén, Hambrick, &
Mosing, 2016). This model proposes that multiple genetic predispositions
subserving specifically musical skills, such as auditory and motor abilities,
as well as non-specific cognitive and personality factors contribute to the
likelihood that someone will engage in training. They also hypothesize that
environmental factors interact with genetic predispositions to either
promote or discourage persistence. We would further propose that the
timing of music experience interacts with both predispositions and
normative brain maturation to influence long-term behavioral and brain
plasticity (see Fig. 2).
FIGURE 2. Gene–maturation–environment interactions. Illustrates the interaction between genes,
brain maturation, and specific training. Genetic variation leads to individual differences in brain
structures for musical aptitudes such as auditory perception and motor dexterity. Genetic variation
also regulates other non-specific aptitudes, such as cognitive skills and personality factors, including
openness and the propensity to practice. Maturation produces normative changes that peak at
different times depending on the brain region. Experience, such as music training, then interacts with
both pre-existing individual differences, and normative maturation to change brain structure and
plasticity. Experience also feeds back on genes through gene–environment interactions that can
further enhance or limit plasticity.

The Interaction between Development and Training

A very important question in understanding the effect of music training on
brain structure is the interaction between brain development and music
training. Anecdotal evidence from the lives of famous musicians suggests
that an early start of training can promote the development of extraordinary
skill in adulthood (Jorgensen, 2001). Evidence from animal and human
studies also shows that early experience, such as specific auditory exposure
(Chang & Merzenich, 2003; de Villers-Sidani, Chang, Bao, & Merzenich,
2007), or enriched sensorimotor environments (Kolb et al., 2012), can have
long-term effects on behavior and the brain.
Two important early studies provided suggestive evidence that the
impact of music training on brain structure was related to the age of start,
with those who begin earlier showing greater enhancements in the size of
M1 (Amunts et al., 1997) or the surface area of the corpus callosum
(Schlaug et al., 1995). However, without specific controls, the age of start
of training is typically confounded with the total years of training, making it
impossible to attribute the observed differences to the age at which training
began. In addition, these studies did not link the observed neuroanatomical
differences to relevant behavior.
To address these issues, a series of studies have compared behavior and
brain structure in early- (ET < age 7) and late-trained (LT > age 7)
musicians (see Fig. 3; see also Baer et al., 2015; Bailey & Penhune, 2010,
2012, 2013; Bailey, Zatorre, & Penhune, 2014; Steele et al., 2013; Vaquero
et al., 2016). In these studies we matched ET and LT groups on important
potential confounding variables including: years of music experience, years
of formal training, and hours of current practice. In addition, we assessed
cognitive measures such as non-verbal IQ and auditory working memory
which might be thought to be related to the capacity for early training. Most
importantly, we assessed performance on relevant musical skills, such as
rhythm reproduction and melody discrimination. The age 7 cut-off for ET
and LT groups was initially drawn from the study by Schlaug et al. (1995)
and was essentially arbitrary. However, using a large sample of behavioral
data, we have been able to show that the likely age range where early
training has its strongest effect is between 7 and 9 (Bailey & Penhune,
2013). Behaviorally our studies have shown that adult musicians who begin
training before age 7 outperform those who begin later on rhythm
reproduction and melody discrimination tasks (Bailey & Penhune, 2010,
2012). Drawing on this work, we collected a large sample of ET and LT
musicians with behavioral, T1 and DTI data. Analysis using deformation-
based morphometry on the T1 data found that ET musicians show
enlargement in the region of the ventral premotor cortex (vPMC), and that
the volume of this region is related to performance on the rhythm
synchronization task (Bailey et al., 2014). These findings are consistent
with fMRI studies showing that vPMC is active when both musicians and
non-musicians are performing the same rhythm task (Chen et al., 2008). In
the same sample, DTI measures showed that ET musicians also had
enhanced WM integrity in the posterior mid-body of the corpus callosum,
the location of fibers connecting M1 and PMC in the two hemispheres
(Steele et al., 2013). We interpreted these findings based on data about
normative maturation in these regions, and the relative contribution of
genes and environment to their variability. A large, cross-sectional
developmental sample showed that GM volumes in anterior motor regions,
including M1 and PMC, have their peak period of growth between 6 and 8
years old (Giedd et al., 1999). Similarly, the size of the anterior region of
the CC shows its peak increase at the same time (Westerhausen et al.,
2011), and variability of this region is more strongly influenced by
environmental than genetic factors (Chiang et al., 2009). Based on these
data, we can hypothesize that early training at the time of peak maturational
change in motor regions and the CC may enhance brain plasticity. In
addition, the relatively stronger contribution of environment to the size of
anterior CC in adults suggests that it might be more susceptible to the
impact of music training. We interpreted these findings as demonstrating a
scaffold, or metaplastic, effect where early training promotes brain
plasticity which is sustained or augmented by later practice (Steele et al.,
2013).
Our findings in the PMC and CC appear to tell a straightforward story in
which early training produces enlargement or enhancement of brain
structure. However, more recent findings make it clear that reality is not so
simple. Using the same sample described earlier, we examined GM and
WM volumes in the cerebellum using a novel multi-atlas segmentation
technique that labels all thirteen lobules in both hemispheres (Baer et al.,
2015). In addition, we tested these musicians and controls on a classic
auditory-motor tapping and continuation task (Repp, 2005). The cerebellum
has been linked to a range of sensory and motor timing functions that are
likely to be relevant for music training and performance (Koziol et al.,
2014; Sokolov, Miall, & Ivry, 2017). And, as described earlier, previous
work had found greater cerebellar GM volume in trained musicians (Gaser
& Schlaug, 2003; Hutchinson et al., 2003). However, the results of our
study showed that ET musicians had smaller volumes of cerebellar lobules
IV, V, and VI compared to LT musicians. Strikingly, earlier age of start,
greater music experience, and better timing performance were all correlated
with smaller cerebellar volumes. Better timing performance was
specifically associated with smaller volumes of right lobule VI which has
been functionally linked to perceptual and motor timing (E, Chen, Ho, &
Desmond, 2014; Ivry, Spencer, Zelaznik, & Diedrichsen, 2002). This is
consistent with another recent study which found that early-trained pianists
had smaller GM volume in the right putamen, and lower timing variability
when playing scales (Vaquero et al., 2016).
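Timing performance in tapping and continuation tasks of this kind is commonly summarized as the variability of the inter-tap intervals, for example as a coefficient of variation (see Repp, 2005). A minimal scoring sketch with hypothetical tap times:

```python
import numpy as np

# Hypothetical tap onsets (s) from a continuation phase with a 0.5 s target interval
tap_times = np.array([0.00, 0.51, 1.02, 1.49, 2.01, 2.53, 3.02])
itis = np.diff(tap_times)              # inter-tap intervals
cv = itis.std(ddof=1) / itis.mean()    # coefficient of variation = timing variability
print(f"mean ITI = {itis.mean():.3f} s, CV = {cv:.3f}")
```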
FIGURE 3. Findings from studies examining structural differences between early- (ET; before age
7) and late-trained (LT; after age 7) musicians. Panel A on the left is taken from Bailey et al., 2014
and shows GM enhancement in the ventral premotor cortex (vPMC) in ET musicians. Panel A on
the right is taken from Steele et al., 2013 and shows enhanced FA in the posterior midbody of the
corpus callosum. Panel B on the left is taken from Vaquero et al., 2016 and shows reduced GM in
the putamen in ET musicians. Panel B on the right is taken from Baer et al., 2015 and shows reduced
volume of left cerebellar lobule VIIIa. The graphs at the bottom of each panel show the relationship
of volume changes with the age of onset of training.

So, why does training affect the cerebellum differently than the cortex,
and how do these findings challenge our understanding of the effects of
early experience? There are several features of cerebellar anatomy that may
explain this result. First, developmental studies show that peak growth in
the cerebellum occurs much later than in most of the cortex, between the
ages of 12 and 18 (Tiemeier et al., 2010). Thus early experience may have a
different effect on cerebellar plasticity, such that experience leads to greater
efficiency and reduced expansion. Second, the cerebellum is unique in
being structurally homogeneous, with identical cytoarchitecture and
input–output circuitry throughout (Schmahmann, 1997). In the motor
system, the cerebellar circuits are known to play a role in error-correction
and optimization. Because these circuits are uniform across the structure, it
is hypothesized that they perform the same role in optimizing a wide variety
of functions in the regions to which the cerebellum is connected (Balsters, Whelan,
Robertson, & Ramnani, 2013; Koziol et al., 2014; Sokolov et al., 2017).
The cerebellar regions that are smaller in ET musicians in our study are
connected to frontal motor and association regions, including M1, PMC,
and prefrontal cortex (Diedrichsen, Balsters, Flavell, Cussans, & Ramnani,
2009; Kelly & Strick, 2003). Based on this information, it is possible that
training-related skills and cortical expansion might be supported by greater
optimization and reduced expansion in the cerebellum. If this is true, then
cortical and cerebellar changes with training should be inversely related.

Anatomical Substrates: Talent and Training

Differences in brain structure between musicians and non-musicians have
generally been attributed to long and intensive training. However, it is more
likely that they result from an interaction between training-induced
plasticity and pre-existing individual differences in the brain that predispose
certain people to engage in music (see Fig. 2). While there is little direct
evidence for specific brain features that predispose an individual to become
a musician, evidence from studies of individual differences in music ability
and response to training can provide some clues. Individual differences in
auditory and motor regions of untrained individuals have been linked to
performance on specific musical tasks, and to the ability to learn to play an
instrument. GM concentrations in auditory regions and the amygdala were
found to be correlated with interval discrimination in a large sample
unselected for music training (Li et al., 2014). Similarly, in a sample
selected to have a range of musical experience, GM concentration and
cortical thickness in auditory and parietal regions were found to be related
to the ability to discriminate melodies that had been transposed (Foster &
Zatorre, 2010). Finally, a recent study found that cerebellar volumes were
related to beat perception in musicians (Paquette et al., 2017). Individual
differences in WM tracts connecting auditory and motor regions, and in
motor output pathways have been found to be related to faster learning of
short melodies (Engel et al., 2014). Further, WM integrity in the left arcuate
fasciculus and the temporal segment of the CC have been found to predict
individual differences in auditory-motor synchronization (Blecher, Tal, &
Ben-Shachar, 2016). Findings showing that brain structural features can
predict musical skills are consistent with results in related domains, where
the volume of auditory cortex was found to be associated with the ability to
learn linguistic pitch discrimination (Wong et al., 2008), and the volume of
both auditory cortex (Golestani, Molko, Dehaene, LeBihan, & Pallier, 2007;
Golestani, Paus, & Zatorre, 2002) and the arcuate fasciculus has been
found to be related to foreign language sound learning (Vaquero,
Rodriguez-Fornells, & Reiterer, 2017).
Very importantly, however, aptitude for music training likely relies on
more than pure auditory or motor skill. Heritability studies show that the
propensity to practice appears to be genetically transmitted (Mosing et al.,
2014), and that personality variables such as “openness to experience” are
also associated with lifetime practice (Butkovic, Ullén, & Mosing, 2015).
Thus, an individual with exceptional pre-existing skills must also have the
right personality characteristics to undertake long-term training, and the
openness to engage with new people, places, and ideas. A talented
individual who does not like to practice, or hates stress, travel, and
challenge is unlikely to become a professional musician.

Bringing It All Together

Taken together, the current data on brain structure in musicians suggest
that there may be pre-existing structural features—likely in the auditory-
motor network supporting musical skill—that predispose individuals to
pursue music training. Once training begins, the long-term effects on
behavior and brain structure depend on the age of start, and thus on the
interaction between training and the maturational trajectories of these
regions and their connections. Early training may produce a type of scaffold
or metaplasticity effect. Metaplasticity is a term that originates from studies
of hippocampal learning mechanisms, and denotes the idea that experience
can change the potential for plasticity of a synapse (for review see
Altenmüller & Furuya, 2016; Herholz & Zatorre, 2012). When applied to
the context of music, it is the idea that training during specific phases of
brain development can have long-term effects on how those regions change
in response to future experience. Evidence for metaplastic effects resulting
from music training comes from studies showing that musicians have
enhanced learning of sensory and motor skills (Herholz, Boh, & Pantev,
2011; Ragert, Schmidt, Altenmüller, & Dinse, 2004; Rosenkranz,
Williamon, & Rothwell, 2007), and greater increases in M1 activity during
learning (Hund-Georgiadis & von Cramon, 1999). Thus we can think of
early training as a scaffold on which later training can build (Bailey et al.,
2014; Steele et al., 2013). Along with these training-specific metaplastic
effects, evidence from heritability studies indicates that skills and abilities
not specific to music may also contribute to promoting or limiting
plasticity; these include the propensity to practice (Mosing et al., 2014), as
well as personality and cognitive variables that can support training (Butkovic
et al., 2015).

Why Is Music So Effective in Driving Brain Plasticity?
Why does music training produce such robust changes in brain structure?
One very obvious answer is practice—lots of practice. For the studies
reviewed here, the average length of training for musicians was 15–20
years. This is the equivalent of thousands of hours of practice across a large
portion of the person’s life. While the idea that simply practicing long
enough will result in expertise has been largely debunked (for review, see
Mosing et al., 2014), long-term, consistent practice is strongly associated
with expertise in a range of domains (Macnamara, Hambrick, & Oswald,
2014). Further, in the studies reviewed here, the length of training is
typically strongly related to both structural brain differences and task
performance. The impact of practice on brain organization is supported by
studies in animals showing that practice on new motor tasks is associated
with expanded representations in motor areas (Elbert, Pantev, Wienbruch,
Rockstroh, & Taub, 1995; Nudo, Milliken, Jenkins, & Merzenich, 1996),
changes in MR measures of gray- and white-matter (Scholz, Allemang-
Grand, Dazai, & Lerch, 2015; Scholz, Niibori, Frankland, & Lerch, 2015),
and increased numbers of synapses and dendritic spines (Kleim, Barnaby, et
al., 2002; Kleim, Freeman, et al., 2002; Kleim et al., 2004). Neuronal
changes in gray matter that are related to learning include neurogenesis,
synaptogenesis, and changes in neuronal morphology. In white matter,
changes related to learning include increases in the number of axons,
axon diameter, packing density of fibers, and myelination
(Zatorre, Fields, & Johansen-Berg, 2012).
A second reason that music training may be particularly effective in
driving brain plasticity is the highly specific nature of practice. The
majority of musicians are experts on a single instrument; thus they perform
millions of repetitions of the same movements, and listen attentively to an
even larger number of associated sounds. When practicing, a musician
imagines and plans a precise sequence of sounds and the movements
required to produce them. Once the plan is set in motion, they use auditory
and somatosensory information to detect subtle deviations in sound and
movement, implementing adjustments to enhance performance. Practice is
therefore a repeated prediction, feedback, and error-correction cycle.
Auditory-motor prediction is thought to be a central function of the dorsal
stream, particularly of the premotor cortex. Brain imaging studies have
shown increased activity in the PMC when people listen to melodies that
they have learned to play (Chen, Rae, & Watkins, 2012; Lahav, Saltzman,
& Schlaug, 2007), and recent work from our laboratory has shown that
transcranial magnetic stimulation (TMS) over dorsal PMC disrupts
learning of auditory-motor associations (Lega, Stephan, Zatorre, &
Penhune, 2016). Feedback and error-correction are key components of
motor learning (Diedrichsen, Shadmehr, & Ivry, 2010; Sokolov et al., 2017;
Wolpert, Diedrichsen, & Flanagan, 2011), and studies of both motor and
sensory learning show that functional and structural changes in the brain are
driven by decreases in error and improved precision. For example, learning
to juggle (Scholz, Klein, Behrens, & Johansen-Berg, 2009), balance on a
tilting board (Taubert et al., 2010), or to perform a complex visuomotor task
(Lakhani et al., 2016; Landi, Baguear, & Della-Maggiore, 2011) have all
been shown to produce changes in gray- or white-matter architecture that
were related to decreases in error with learning. Thus error-driven learning,
particularly during periods of high developmental plasticity may be an
important contributor to structural brain changes measured in adult
musicians.
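A simple way to make this cycle quantitative is the linear phase-correction model used in the sensorimotor synchronization literature (Repp, 2005), in which a fixed fraction of each timing error is corrected on the next movement. The gain below is an arbitrary value chosen for illustration:

```python
alpha = 0.4          # assumed correction gain: fraction of the error corrected per tap
asynchrony = 0.060   # start 60 ms ahead of the beat
for tap in range(1, 6):
    asynchrony -= alpha * asynchrony   # e(n+1) = (1 - alpha) * e(n)
    print(f"tap {tap}: asynchrony = {asynchrony * 1000:5.1f} ms")
```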
Another reason that music training may be so successful in producing
brain plasticity is that it is inherently multisensory. To produce music,
performers must learn to link sounds to actions, but they must also link
visual, somatosensory, and proprioceptive feedback to these sounds and
actions. As described earlier, training is a prediction–feedback–error-
correction cycle in which musicians use all their sensory resources to
produce the perfect sound. Sounds are linked to actions relatively rapidly, as
has been shown by changes in the strength of motor activity during passive
listening to learned melodies after short-term training (Bangert et al., 2006;
D’Ausilio, Altenmüller, Olivetti Belardinelli, & Lotze, 2006; Lega et al.,
2016; Stephan, Brown, Lega, & Penhune, 2016). In particular, it was shown
that learning to play a melody resulted in greater changes in the activity of
auditory cortex than learning to remember the melody by listening alone
(Lappe, Herholz, Trainor, & Pantev, 2008). This may partly be based on
strong intrinsic connections between the auditory and motor systems (Chen
et al., 2012; Poeppel, 2014; Zatorre, Chen, & Penhune, 2007). But it can
also be hypothesized that co-activation of circuits deriving from multiple
senses may drive plasticity even more strongly than input from a single
sense (Lee & Noppeney, 2011, 2014).
A final feature of music training that is likely crucial in promoting
plasticity is the rewarding nature of performance. There are three aspects of
reward that may stimulate plasticity: first, the rewarding nature of music
itself that is experienced through playing; second, the intrinsic reward of
performing, both for the player and through the acclaim it may bring; and
finally, the potentially rewarding nature of practice and the pleasure of
accurate performance. The intrinsic pleasure derived from music appears to
be common to most people (Mas-Herrero, Marco-Pallares, Lorenzo-Seva,
Zatorre, & Rodriguez-Fornells, 2011), and is hypothesized to be based on
the same dopamine-modulated, predictive systems that regulate reward in
other domains with direct biological consequences, including drugs, food,
sex, and money (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015).
Thus learning to produce a rewarding stimulus, such as music, is likely to
be rewarding to the player.
We also know that learning and brain plasticity are strongly affected by
the reward value of what is learned. Animal studies show that brain
plasticity associated with auditory learning is greater when the information
to be learned is rewarded, or behaviorally relevant. For example, the
responses of neurons in the auditory cortex of ferrets were modulated by the
reward value of stimuli (David, Fritz, & Shamma, 2012). Further, pairing a
tone with stimulation of dopamine circuits in the brainstem increased the
selectivity of responding in auditory neurons tuned to the same tone (Bao,
Chan, & Merzenich, 2001). Importantly, dopamine has been shown to
modulate motor learning in both humans and animals (Floel et al., 2005;
Tremblay et al., 2009, 2010); possibly through the reinforcement and habit-
formation circuitry of the striatum (Graybiel & Grafton, 2015; Haith &
Krakauer, 2013). Thus, if the output of practice, a beautiful piece of music,
is rewarding and stimulates dopamine release, then playing such a piece
should promote learning. It is also likely that the social benefits of playing
music add to this type of reward.
Finally, humans seem to have a strong internal motivation to practice
and perfect many skills, even if those skills do not have immediate
physiological, psychological, or social outcomes. In addition to music,
people spend hours perfecting their golf swing, playing video games, or
baking elaborate cakes. All of these skills require practice, and the outcome
of practice is often not immediate. Thus we hypothesize that practice itself
may be rewarding, and that the prediction–feedback–error-correction cycle
that is important for learning, may be motivating across a range of domains.
When musicians are learning a new and challenging piece, or perfecting an
old one, they know exactly what they want it to sound like. This
representation is translated into a motor plan, and both the imagined
outcome and the plan become predictions against which they will measure
their performance. When musicians attempt to play the piece, they will
likely make errors, which lead to corrections and learning; but when they
play the piece as imagined, they experience the reward of accurate
performance. Because error feedback and reward are so important for
learning, these mechanisms seem like strong candidates for promoting brain
plasticity, but have been little explored.

Where Do We Go From Here?

Bringing together the data from this review, we suggest three directions for
future research.

(1) Currently, most studies examine GM and WM differences
separately, or do not directly link them through analysis. Analyses
typically target differences in individual regions, when it is very
likely that plasticity changes occur at the network level.
Additionally, groups are defined a priori rather than using data-
driven approaches based on participant characteristics such as training
duration or age-of-start (see the sketch following this list). Implementing these kinds of analyses
requires large samples with multiple imaging measures. This
implies a multi-center, data-sharing approach where standard
behavioral and imaging protocols are implemented to allow
aggregation of results.
(2) A related goal for music neuroscientists in the next ten years should
be the establishment of standardized test batteries with age-based
norms that can be administered across locations. A number of
groups have been working on the development of tests aimed at
children and adults (Dalla Bella et al., 2017; Ireland, Parker, Foster,
& Penhune, in press; Mullensiefen, Gingras, Musil, & Stewart,
2014; Peretz et al., 2013). Important features of such batteries are
availability, standardized administration, and up-to-date norms.
(3) Studies targeting gene–maturation–environment interactions that
will allow us to understand the complex interactions between pre-
existing individual differences in ability, and the type and timing of
music training. Music-specific databases and standard instruments
would contribute to the feasibility of such work.
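As a concrete illustration of the data-driven grouping suggested in point (1), musicians could be clustered on continuous training variables rather than split at a fixed age cut-off. The sketch below uses simulated data and scikit-learn (assumed available), purely to show the approach:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Simulated musicians: age at which training began and total years of training
age_of_start = np.concatenate([rng.normal(6, 1, 40), rng.normal(11, 2, 40)])
years_training = np.concatenate([rng.normal(15, 3, 40), rng.normal(10, 3, 40)])
X = np.column_stack([age_of_start, years_training])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for k in range(2):
    print(f"cluster {k}: mean age of start = {age_of_start[labels == k].mean():.1f}")
```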

References
Abdul-Kareem, I. A., Stancak, A., Parkes, L. M., Al-Ameen, M., Alghamdi, J., Aldhafeeri, F. M., …
Sluming, V. (2011). Plasticity of the superior and middle cerebellar peduncles in musicians
revealed by quantitative analysis of volume and number of streamlines based on diffusion tensor
tractography. Cerebellum 10(3), 611–623.
Abdul-Kareem, I. A., Stancak, A., Parkes, L. M., & Sluming, V. (2011). Increased gray matter
volume of left pars opercularis in male orchestral musicians correlate positively with years of
musical performance. Journal of Magnetic Resonance Imaging 33(1), 24–32.
Altenmüller, E., & Furuya, S. (2016). Brain plasticity and the concept of metaplasticity in skilled
musicians. Advances in Experimental Medicine and Biology 957, 197–208.
Amunts, K., Schlaug, G., Jancke, L., Steinmetz, H., Schleicher, A., Dabringhaus, A., & Zilles, K.
(1997). Motor cortex and hand motor skills: Structural compliance in the human brain. Human
Brain Mapping 5(3), 206–215.
Andersen, R. A., & Cui, H. (2009). Intention, action planning, and decision making in parietal-frontal
circuits. Neuron 63(5), 568–583.
Baer, L., Park, M., Bailey, J., Chakravarty, M., Li, K., & Penhune, V. (2015). Regional cerebellar
volumes are related to early musical training and finger tapping performance. NeuroImage 109,
130–139.
Bailey, J. A., & Penhune, V. B. (2010). Rhythm synchronization performance and auditory working
memory in early- and late-trained musicians. Experimental Brain Research 204(1), 91–101.
Bailey, J. A., & Penhune, V. B. (2012). A sensitive period for musical training: Contributions of age
of onset and cognitive abilities. Annals of the New York Academy of Sciences 1252, 163–170.
Bailey, J. A., & Penhune, V. B. (2013). The relationship between the age of onset of musical training
and rhythm synchronization performance: Validation of sensitive period effects. Frontiers in
Neuroscience 7, 227. doi:10.3389/fnins.2013.00227
Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2014). Early musical training is linked to gray matter
structure in the ventral premotor cortex and auditory-motor rhythm synchronization performance.
Journal of Cognitive Neuroscience 26(4), 755–767.
Balsters, J. H., Whelan, C. D., Robertson, I. H., & Ramnani, N. (2013). Cerebellum and cognition:
Evidence for the encoding of higher order rules. Cerebral Cortex 23(6), 1433–1443.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., … Altenmüller, E.
(2006). Shared networks for auditory and motor processing in professional pianists: Evidence from
fMRI conjunction. NeuroImage 30(3), 917–926.
Bao, S., Chan, V. T., & Merzenich, M. M. (2001). Cortical remodelling induced by activity of ventral
tegmental dopamine neurons. Nature 412(6842), 79–83.
Bengtsson, S., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano
practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9),
1148–1150.
Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2009). Neuroanatomical correlates of
musicianship as revealed by cortical thickness and voxel-based morphometry. Cerebral Cortex
19(7), 1583–1596.
Blecher, T., Tal, I., & Ben-Shachar, M. (2016). White matter microstructural properties correlate with
sensorimotor synchronization abilities. NeuroImage 138, 1–12.
Brown, R. M., Zatorre, R. J., & Penhune, V. B. (2015). Expert music performance: Cognitive, neural,
and developmental bases. Progress in Brain Research 217, 57–86.
Butkovic, A., Ullén, F., & Mosing, M. A. (2015). Personality-related traits as predictors of music
practice: Underlying environmental and genetic influences. Personality and Individual Differences
74, 133–138.
Chang, E. F., & Merzenich, M. M. (2003). Environmental noise retards auditory cortical
development. Science 300(5618), 498–502.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Moving on time: Brain network for auditory-
motor synchronization is modulated by rhythm complexity and musical training. Journal of
Cognitive Neuroscience 20(2), 226–239.
Chen, J. L., Rae, C., & Watkins, K. E. (2012). Learning to play a melody: An fMRI study examining
the formation of auditory-motor associations. NeuroImage 59(2), 1200–1208.
Chiang, M. C., Barysheva, M., Shattuck, D. W., Lee, A. D., Madsen, S. K., Avedissian, C., …
Thompson, P. M. (2009). Genetics of brain fiber architecture and intellectual performance. Journal
of Neuroscience 29(7), 2212–2224.
D’Ausilio, A., Altenmüller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity
of the motor cortex while listening to a rehearsed musical piece. European Journal of
Neuroscience 24(3), 955–958.
Dalla Bella, S., Farrugia, N., Benoit, C. E., Begel, V., Verga, L., Harding, E., & Kotz, S. A. (2017).
BAASTA: Battery for the Assessment of Auditory Sensorimotor and Timing Abilities. Behavior
Research Methods 49(3), 1128–1145.
David, S. V., Fritz, J. B., & Shamma, S. A. (2012). Task reward structure shapes rapid receptive field
plasticity in auditory cortex. Proceedings of the National Academy of Sciences 109(6), 2144–2149.
de Manzano, O., & Ullén, F. (2018). Same genes, different brains: Neuroanatomical differences
between monozygotic twins discordant for musical training. Cerebral Cortex 28(1), 387–394.
de Villers-Sidani, E., Chang, E. F., Bao, S., & Merzenich, M. M. (2007). Critical period window for
spectral tuning defined in the primary auditory cortex (A1) in the rat. Journal of Neuroscience
27(1), 180–189.
Diedrichsen, J., Balsters, J. H., Flavell, J., Cussans, E., & Ramnani, N. (2009). A probabilistic MR
atlas of the human cerebellum. NeuroImage 46(1), 39–46.
Diedrichsen, J., Shadmehr, R., & Ivry, R. B. (2010). The coordination of movement: Optimal
feedback control and beyond. Trends in Cognitive Sciences 14(1), 31–39.
E, K. H., Chen, S. H., Ho, M. H., & Desmond, J. E. (2014). A meta-analysis of cerebellar
contributions to higher cognition from PET and fMRI studies. Human Brain Mapping 35(2), 593–
615.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical
representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Engel, A., Hijmans, B. S., Cerliani, L., Bangert, M., Nanetti, L., Keller, P. E., & Keysers, C. (2014).
Inter-individual differences in audio-motor learning of piano melodies and white matter fiber tract
architecture. Human Brain Mapping 35(5), 2483–2497.
Floel, A., Breitenstein, C., Hummel, F., Celnik, P., Gingert, C., Sawaki, L., … Cohen, L. G. (2005).
Dopaminergic influences on formation of a motor memory. Annals of Neurology 58(1), 121–130.
Foster, N. E., & Zatorre, R. J. (2010). Cortical structure predicts success in performing musical
transformation judgments. NeuroImage 53(1), 26–36.
Gaser, C., & Schlaug, G. (2003). Brain structure differences between musicians and non-musicians.
Journal of Neuroscience 23(27), 9240–9245.
Giedd, J., Blumenthal, J., Jeffries, N., Castellanos, F., Liu, H., Zijdenbos, A., … Rapoport, J. (1999).
Brain development during childhood and adolescence: A longitudinal MRI study. Nature
Neuroscience 2(10), 861–863.
Gogos, A., Gavrilescu, M., Davison, S., Searle, K., Adams, J., Rossell, S. L., … Egan, G. F. (2010).
Greater superior than inferior parietal lobule activation with increasing rotation angle during
mental rotation: An fMRI study. Neuropsychologia 48(2), 529–535.
Golestani, N., Molko, N., Dehaene, S., LeBihan, D., & Pallier, C. (2007). Brain structure predicts the
learning of foreign speech sounds. Cerebral Cortex 17(3), 575–582.
Golestani, N., Paus, T., & Zatorre, R. (2002). Anatomical correlates of learning novel speech sounds.
Neuron 35, 997–1010.
Graybiel, A. M., & Grafton, S. T. (2015). The striatum: Where skills and habits meet. Cold Spring
Harbor Perspectives in Biology 7(8), a021691. doi:10.1101/cshperspect.a021691
Habibi, A., Damasio, A., Ilari, B., Veiga, R., Joshi, A. A., Leahy, R. M., … Damasio, H. (2017).
Childhood music training induces change in micro and macroscopic brain structure: Results from a
longitudinal study. Cerebral Cortex 1–12. doi:10.1093/cercor/bhx286
Haith, A. M., & Krakauer, J. W. (2013). Model-based and model-free mechanisms of human motor
learning. Advances in Experimental Medicine and Biology 782, 1–21.
Halwani, G. F., Loui, P., Ruber, T., & Schlaug, G. (2011). Effects of practice and experience on the
arcuate fasciculus: Comparing singers, instrumentalists, and non-musicians. Frontiers in
Psychology 2, 156. doi:10.3389/fpsyg.2011.00156
Herholz, S. C., Boh, B., & Pantev, C. (2011). Musical training modulates encoding of higher-order
regularities in the auditory cortex. European Journal of Neuroscience 34(3), 524–529.
Herholz, S. C., & Zatorre, R. (2012). Musical training as a framework for brain plasticity: Behavior,
function, and structure. Neuron 76(3), 486–502.
Hund-Georgiadis, M., & von Cramon, D. (1999). Motor-learning-related changes in piano players
and non-musicians revealed by functional magnetic-resonance signals. Experimental Brain
Research 125(4), 417–425.
Hutchinson, S., Lee, L. H., Gaab, N., & Schlaug, G. (2003). Cerebellar volume of musicians.
Cerebral Cortex 13(9), 943–949.
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009).
Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025.
Ireland, K., Parker, A., Foster, N., & Penhune, V. (in press). Rhythm and melody tasks for school-
aged children with and without musical training: Age-equivalent scores and reliability. Frontiers in
Auditory Cognitive Neuroscience.
Ivry, R. B., Spencer, R. M., Zelaznik, H. N., & Diedrichsen, J. (2002). The cerebellum and event
timing. Annals of the New York Academy of Sciences 978, 302–317.
James, C. E., Oechslin, M. S., Van De Ville, D., Hauert, C. A., Descloux, C., & Lazeyras, F. (2014).
Musical training intensity yields opposite effects on grey matter density in cognitive versus
sensorimotor networks. Brain Structure & Function 219(1), 353–366.
Jorgensen, H. (2001). Instrumental learning: Is an early start a key to success? British Journal of
Music Education 18(3), 227–239.
Karpati, F. J., Giacosa, C., Foster, N. E. V., Penhune, V. B., & Hyde, K. L. (2017). Dance and music
share gray matter structural correlates. Brain Research 1657, 62–73.
Kelly, R., & Strick, P. (2003). Cerebellar loops with motor cortex and prefrontal cortex of a non-
human primate. Journal of Neuroscience 23(23), 8432–8444.
Kleim, J. A., Barnaby, S., Cooper, N., Hogg, T., Reidel, C., Remple, M., & Nudo, R. (2002). Motor
learning-dependent synaptogenesis is localized to functionally reorganized motor cortex.
Neurobiology of Learning and Memory 77(1), 63–77.
Kleim, J. A., Freeman, J. H., Jr., Bruneau, R., Nolan, B. C., Cooper, N. R., Zook, A., & Walters, D.
(2002). Synapse formation is associated with memory storage in the cerebellum. Proceedings of
the National Academy of Sciences 99(20), 13228–13231.
Kleim, J. A., Hogg, T., VandenBerg, P., Cooper, N., Bruneau, R., & Remple, M. (2004). Cortical
synaptogenesis and motor map reorganization occur during late, but not early, phase of motor skill
learning. Journal of Neuroscience 24(3), 628–633.
Kolb, B., Mychasiuk, R., Muhammad, A., Li, Y., Frost, D. O., & Gibb, R. (2012). Experience and the
developing prefrontal cortex. Proceedings of the National Academy of Sciences 109(Suppl. 2),
17186–17193.
Koziol, L. F., Budding, D., Andreasen, N., D’Arrigo, S., Bulgheroni, S., Imamizu, H., … Yamazaki,
T. (2014). Consensus paper: The cerebellum’s role in movement and cognition. Cerebellum 13(1),
151–177.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor
recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2),
308–314.
Lakhani, B., Borich, M. R., Jackson, J. N., Wadden, K. P., Peters, S., Villamayor, A., … Boyd, L. A.
(2016). Motor skill acquisition promotes human brain myelin plasticity. Neural Plasticity 2016,
7526135. doi:10.1155/2016/7526135
Landi, S. M., Baguear, F., & Della-Maggiore, V. (2011). One week of motor adaptation induces
structural changes in primary motor cortex that predict long-term memory one year later. Journal
of Neuroscience 31(33), 11808–11813.
Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-
term unimodal and multimodal musical training. Journal of Neuroscience 28(39), 9632–9639.
Lee, H., & Noppeney, U. (2011). Long-term music training tunes how the brain temporally binds
signals from multiple senses. Proceedings of the National Academy of Sciences 108(51), E1441–
E1450.
Lee, H., & Noppeney, U. (2014). Music expertise shapes audiovisual temporal integration windows
for speech, sinewave speech, and music. Frontiers in Psychology 5, 868.
doi:10.3389/fpsyg.2014.00868
Lega, C., Stephan, M. A., Zatorre, R. J., & Penhune, V. (2016). Testing the role of dorsal premotor
cortex in auditory-motor association learning using transcranial magnetic stimulation (TMS).
PLoS ONE 11(9), e0163380.
Li, X., De Beuckelaer, A., Guo, J., Ma, F., Xu, M., & Liu, J. (2014). The gray matter volume of the
amygdala is correlated with the perception of melodic intervals: a voxel-based morphometry study.
PLoS ONE 9(6), e99889.
Macnamara, B. N., Hambrick, D. Z., & Oswald, F. L. (2014). Deliberate practice and performance in
music, games, sports, education, and professions: A meta-analysis. Psychological Science 25(8),
1608–1618.
Mas-Herrero, E., Marco-Pallares, J., Lorenzo-Seva, U., Zatorre, R. J., & Rodriguez-Fornells, A.
(2011). Individual differences in music reward experiences. Music Perception 31(2), 118–138.
Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not
make perfect: No causal effect of music practice on music ability. Psychological Science 25(9),
1795–1803.
Mullensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: An
index for assessing musical sophistication in the general population. PLoS ONE 9(2), e89642.
Novembre, G., & Keller, P. E. (2014). A conceptual review on action-perception coupling in the
musicians’ brain: What is it good for? Frontiers in Human Neuroscience 8, 603.
doi:10.3389/fnhum.2014.00603
Nudo, R., Milliken, G., Jenkins, W., & Merzenich, M. (1996). Use-dependent alterations of
movement representations in primary motor cortex of adult squirrel monkeys. Journal of
Neuroscience 16(2), 785–807.
Paquette, S., Fujii, S., Li, H. C., & Schlaug, G. (2017). The cerebellum’s contribution to beat interval
discrimination. NeuroImage 163, 177–182.
Peretz, I., Gosselin, N., Nan, Y., Caron-Caplette, E., Trehub, S. E., & Beland, R. (2013). A novel tool
for evaluating children’s musical abilities across age and culture. Frontiers in Systems
Neuroscience 7, 30. doi:10.3389/fnsys.2013.00030
Poeppel, D. (2014). The neuroanatomic and neurophysiological infrastructure for speech and
language. Current Opinion in Neurobiology 28, 142–149.
Putkinen, V., Tervaniemi, M., Saarikivi, K., Ojala, P., & Huotilainen, M. (2014). Enhanced
development of auditory change detection in musically trained school-aged children: A
longitudinal event-related potential study. Developmental Science 17(2), 282–297.
Ragert, P., Schmidt, A., Altenmüller, E., & Dinse, H. (2004). Superior tactile performance and
learning in professional pianists: Evidence for meta-plasticity in musicians. European Journal of
Neuroscience 19(2), 473–478.
Rauschecker, J. (2011). An expanded role for the dorsal auditory pathway in sensorimotor control
and integration. Hearing Research 271, 16–25.
Repp, B. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic
Bulletin and Review 12(6), 969–992.
Rosenkranz, K., Williamon, A., & Rothwell, J. C. (2007). Motorcortical excitability and synaptic
plasticity is enhanced in professional musicians. Journal of Neuroscience 27(19), 5200–5206.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and
the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Schlaug, G., Jancke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum
size in musicians. Neuropsychologia 33(8), 1047–1055.
Schmahmann, J. (1997). The cerebrocerebellar system. In J. Schmahmann (Ed.), The Cerebellum and
Cognition (Vol. 41, pp. 31–55). San Diego, CA: Academic Press.
Schneider, P., Scherg, M., Dosch, H., Specht, H., Gutschalk, A., & Rupp, A. (2002). Morphology of
Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians. Nature
Neuroscience 5(7), 688–694.
Schneider, P., Sluming, V., Roberts, N., Scherg, M., Goebel, R., Specht, H. J., … Rupp, A. (2005).
Structural and functional asymmetry of lateral Heschl’s gyrus reflects pitch perception preference.
Nature Neuroscience 8(9), 1241–1247.
Scholz, J., Allemang-Grand, R., Dazai, J., & Lerch, J. P. (2015). Environmental enrichment is
associated with rapid volumetric brain changes in adult mice. NeuroImage 109, 190–198.
Scholz, J., Klein, M. C., Behrens, T. E., & Johansen-Berg, H. (2009). Training induces changes in
white-matter architecture. Nature Neuroscience 12(11), 1370–1371.
Scholz, J., Niibori, Y., Frankland, P. W., & Lerch, J. P. (2015). Rotarod training in mice is associated
with changes in brain structure observable with multimodal MRI. NeuroImage 107, 182–189.
Seither-Preisler, A., Parncutt, R., & Schneider, P. (2014). Size and synchronization of auditory cortex
promotes musical, literacy, and attentional skills in children. Journal of Neuroscience 34(33),
10937–10949.
Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-based
morphometry reveals increased gray matter density in Broca’s area in male symphony orchestra
musicians. NeuroImage 17(3), 1613–1622.
Sokolov, A. A., Miall, R. C., & Ivry, R. B. (2017). The cerebellum: Adaptive prediction for
movement and cognition. Trends in Cognitive Sciences 21(5), 313–332.
Steele, C. J., Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2013). Early musical training and white-
matter plasticity in the corpus callosum: Evidence for a sensitive period. Journal of Neuroscience
33(3), 1282–1290.
Stephan, M. A., Brown, R., Lega, C., & Penhune, V. (2016). Melodic priming of motor sequence
performance: The role of the dorsal premotor cortex. Frontiers in Neuroscience 10, 210.
doi:10.3389/fnins.2016.00210
Taubert, M., Draganski, B., Anwander, A., Muller, K., Horstmann, A., Villringer, A., & Ragert, P.
(2010). Dynamic properties of human brain structure: Learning-related changes in cortical areas
and associated fiber connections. Journal of Neuroscience 30(35), 11670–11677.
Tiemeier, H., Lenroot, R. K., Greenstein, D. K., Tran, L., Pierson, R., & Giedd, J. N. (2010).
Cerebellum development during childhood and adolescence: A longitudinal morphometric MRI
study. NeuroImage 49(1), 63–70.
Tremblay, P. L., Bedard, M. A., Langlois, D., Blanchet, P. J., Lemay, M., & Parent, M. (2010).
Movement chunking during sequence learning is a dopamine-dependent process: A study
conducted in Parkinson’s disease. Experimental Brain Research 205(3), 375–385.
Tremblay, P. L., Bedard, M. A., Levesque, M., Chebli, M., Parent, M., Courtemanche, R., &
Blanchet, P. J. (2009). Motor sequence learning in primate: Role of the D2 receptor in movement
chunking during consolidation. Behavioural Brain Research 198(1), 231–239.
Ullén, F., Hambrick, D. Z., & Mosing, M. A. (2016). Rethinking expertise: A multifactorial gene–
environment interaction model of expert performance. Psychological Bulletin 142(4), 427–446.
Vaquero, L., Hartmann, K., Ripolles, P., Rojo, N., Sierpowska, J., Francois, C., … Altenmüller, E.
(2016). Structural neuroplasticity in expert pianists depends on the age of musical training onset.
NeuroImage 126, 106–119.
Vaquero, L., Rodriguez-Fornells, A., & Reiterer, S. M. (2017). The left, the better: White-matter
brain integrity predicts foreign language imitation ability. Cerebral Cortex 27(8), 3906–3917.
Westerhausen, R., Luders, E., Specht, K., Ofte, S. H., Toga, A. W., Thompson, P. M., … Hugdahl, K.
(2011). Structural and functional reorganization of the corpus callosum between the age of 6 and 8
years. Cerebral Cortex 21(5), 1012–1017.
Wolpert, D. M., Diedrichsen, J., & Flanagan, J. R. (2011). Principles of sensorimotor learning.
Nature Reviews Neuroscience 12(12), 739–751.
Wong, P. C., Warrier, C. M., Penhune, V. B., Roy, A. K., Sadehh, A., Parrish, T. B., & Zatorre, R. J.
(2008). Volume of left Heschl’s gyrus and linguistic pitch learning. Cerebral Cortex 18(4), 828–
836.
Zatorre, R. J., Chen, J., & Penhune, V. (2007). When the brain plays music: Sensory-motor
interactions in music perception and production. Nature Reviews Neuroscience 8, 547–558.
Zatorre, R. J., Fields, R. D., & Johansen-Berg, H. (2012). Plasticity in gray and white: Neuroimaging
changes in brain structure during learning. Nature Neuroscience 15(4), 528–536.
CHAPTER 18

GENOMICS APPROACHES
FOR STUDYING MUSICAL
APTITUDE AND RELATED
TRAITS

IRMA JÄRVELÄ

Genomics Approaches for Studying Human Traits

Every cell in the human body contains 46 chromosomes, made up of ~3
billion nucleotides containing about 20,000 individual genes (Dixon-
Salazar & Gleeson, 2010). Of the 20,000 genes, to date the functions of
4,000 genes have been uncovered (http://www.omim.org/). About 1.5
percent of the genome encodes the proteins that form the building blocks of
human tissues and organs. The human cerebral cortex is made
up of ~20 billion neurons, each of which makes an average of 7,000
synaptic contacts (Dixon-Salazar & Gleeson, 2010). The human brain
exhibits a higher expression of genes for synaptic transmission and
plasticity and higher energy metabolism compared to other primates
(Cáceres et al., 2003). Genomic approaches enable the study of biological
phenomena in an unbiased and hypothesis-free fashion, without any
knowledge of the biological background of the phenotype of interest
(Lander, 2011). Molecular genetic analyses can be applied to study human
traits based on their molecular properties rather than anatomic regions. The
advent of next-generation sequencing technology has facilitated the
identification of individual genetic variants (“genetic selfies”) at
decreasing cost (Lindor, Thibodeau, & Burke, 2017). This has been
exemplified in medical research, where thousands of genes that cause
inherited diseases or predisposition to common diseases have been
identified.
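Rough arithmetic with the figures quoted above puts these quantities in perspective (a back-of-the-envelope sketch, not data from the cited sources):

```python
genome_size = 3e9        # nucleotides in the haploid genome
n_genes = 20_000
coding_fraction = 0.015  # ~1.5 percent of the genome encodes protein

coding_bases = genome_size * coding_fraction
print(f"protein-coding sequence: ~{coding_bases / 1e6:.0f} Mb")             # ~45 Mb
print(f"average coding length per gene: ~{coding_bases / n_genes:.0f} bp")  # ~2,250 bp
```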
Molecular genetic studies are based on Mendelian rules: children inherit
half of their genes from their mother and half from their
father, and the inherited variants remain the same throughout life. This
is the unique strength of DNA studies in the identification of genetic
variants associated with human traits. Using statistical methods, genetic loci
and alleles can be identified in the human genome that are associated with
the trait under study. Genes with their pathways located in the associated
regions are the candidate genes whose functions can explain the biological
characteristics of a trait under study. Environmental factors (lifestyle) can
affect the expression and regulation of genes. The effect of the
environmental triggers on the expression and regulation of the genes can be
studied for example by RNA- and microRNA-sequencing in humans and
model organisms. Methods of genomics and bioinformatics can be applied
to combine the data to identify genes and alleles, their regulation, and the
pathways linked to musical aptitude and music-related behavioral traits
(e.g., music education, listening, performing, creating music; see Fig. 1).
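The association logic described above can be sketched for a single variant: compare allele counts between groups defined by the trait and test whether they differ. The counts below are invented for illustration, using scipy (assumed available); genome-wide studies repeat such tests across millions of variants with stringent multiple-testing correction:

```python
from scipy.stats import chi2_contingency

# Hypothetical allele counts at one variant:  allele A, allele a
counts = [[180, 120],   # high music-test scorers
          [140, 160]]   # low music-test scorers
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```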
FIGURE 1. The mode of inheritance of human traits spans from monogenic, that is, caused by a
single gene, to multifactorial inheritance caused by numerous predisposing variants and
environmental factors. Based on genetic and genomics studies musical aptitude is inherited as a
multifactorial trait for which both predisposing genetic variants and exposure to music as an
environmental factor are needed (Oikkonen et al., 2015; Park et al., 2012; Pulli et al., 2008).

Musical Aptitude as a Biological Trait

Musical practices represent distinctive cognitive abilities of humans. In
biological (genetic) terms, musical aptitude represents a complex cognitive
trait in humans involving the auditory pathway (inner ear,
brainstem, auditory cortex) and several other brain regions. Music is
sound that is detected by hair cells in the inner ear. These sounds are
transmitted as electrical signals through the midbrain to the auditory cortex.
About 1 percent of all human genes have a function in hearing; of them at
least 80 are known to cause hearing loss (http://hereditaryhearingloss.org/)
(Atik, Bademci, Diaz-Horta, Blanton, & Tekin, 2017). Brains are naturally
very sensitive to environmental exposure to music (Perani et al., 2010) and
music training (see, e.g., Herholz & Zatorre, 2012; Koelsch, 2010). This
sensitivity is age-dependent, as it is for language (Penhune & de Villers-
Sidani, 2014; White, Hutka, Williams, & Moreno, 2013) or vocal learning
in songbirds (Rothenberg, Roeske, Voss, Naguib, & Tchernichovski, 2014).
The sensitivity may be linked to the emotional content characteristic of musical
sounds, which affects human body functions (Nakahara, Masuko,
Kinoshita, Francis, & Furuya, 2011; Salimpoor, Benovoy, Larcher, Dagher,
& Zatorre, 2011). However, the molecular mechanisms and biological
pathways mediating the effects of music remain largely unknown.
It may be that the ability to detect musical sounds serves as a
prerequisite for appreciating music. This ability is called musical aptitude in
this chapter. Musical aptitude can include abilities, for example, to perceive
and understand intensity, pitch, timbre and tone duration, and the rhythm
and structure they form in music. Carl Seashore developed a battery of tests
consisting of six subtests that measure pitch, intensity, time, consonance,
tonal memory, and rhythm (Seashore, Lewis, & Saetveit, 1960). The
Seashore tests for pitch (SP) and for time (ST) consist of pair-wise
comparisons of the physical properties of sound, and are used to measure
simple sensory capacities such as the ability to detect small differences in
tone pitch or length. Karma (1994) developed a music test (KMT) to
measure the structure of music that includes recognition of melodic contour,
grouping, and relational pitch processing. Auditory structuring ability can
be defined as the ability to identify structure in sound over time (detecting
sound patterns in time) (Karma, 1994). A similar kind of pattern recognition
is found in many other fields like sport and poetry (comprising language
and speech) that resembles gestalt principles in recognition of music
structure (Justus & Hutsler, 2005). In zebra finches, identification of
acoustic features of song syllables (pitch and timbre) and the species-
specific typical gap durations (rhythm) between song syllables are detected
by different neural cells (Araki, Bandi, & Yazaki-Sugiyama, 2016).
Temporal coding of inter-syllable silent gaps seems to be preserved when
birds are exposed to different song environments, suggesting that temporal
gap coding is innate and species-specific whereas syllable morphology
coding is more experience dependent (Araki et al., 2016). The detection of
gaps resembles the detection of pauses in musical structure in humans. It is
related to processing tones over time, which evokes anticipatory responses
because of the cognitive expectations and predictive cues involved in
listening to music (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015).
Notably, combined music test scores (KMT, SP, ST) were normally
distributed among participants with no specific music education
(Oikkonen & Järvelä, 2014), suggesting that the ability to detect pitch, time,
and sound patterns is common in populations with no music training.
Abilities that animals exhibit without the need for training are referred to as
innate traits. The possession of a natural musical ability may explain why
musical practices are common and present in all societies.
It has been observed that musicianship clusters in families. How much of
this aggregation is due to genetic and/or environmental factors, such as
exposure to music? Several studies have been performed to analyze the
inheritance of musical traits. In a twin study using a Distorted Tunes Test
(DTT) (the subjects’ task was to recognize wrong tones incorporated into
simple popular melodies) the correlation between the test scores was 0.67 in
monozygous and 0.44 in dizygous twins (Drayna, Manichaikul, de Lange,
& Snieder, 2001). The heritability (defined as the proportion of the total
variance of the phenotype that is genetic, h2 = VG/VP, where VG is genetic
variance and VP is the overall variance of the phenotype) of the auditory
structuring ability test (Karma Music Test, KMT) was 0.46 in the Finnish
families examined (Oikkonen et al., 2015). Carl Seashore’s subtests of pitch
(SP) and time discrimination (ST) measure the ability to detect small
differences between two sequentially presented tones. The heritability
estimates were 0.68 and 0.21 for SP and ST, respectively, and the heritability
of the combined KMT, SP, and ST score (COMB) was 0.60 (Oikkonen et al.,
2015). The heritability of a pitch perception accuracy (PPA) test based on
singing was 40 percent (Park, Lee, Kim, & Ju, 2012). A genetic component
has also been demonstrated in rare music phenotypes such as congenital
amusia (Peretz, Cummings, & Dube, 2007) and absolute pitch (AP)
(Baharloo, Service, Risch, Gitschier, & Freimer, 2000). Congenital amusia
is often referred to as “tone deafness” and is a disorder in which a subject’s
ability to perceive or produce music is disturbed. A recent family
aggregation study showed that the sibling relative risk (λS) was estimated
to be 10.8, which suggests a genetic contribution to the trait (Peretz et al.,
2007). Another extreme trait is absolute pitch (AP). AP refers to the ability
to identify and name pitches without a reference pitch, and the sibling
relative risk (λS) has been estimated to range from 7.8 to 15.1 (Baharloo et
al., 2000). In fact, music perception belongs to a class of human cognitive
abilities that has been shown to be highly familial. In the Finnish families,
52 percent of the professional musicians had one or both parents who were
also professional musicians (Fig. 2).

FIGURE 2. Parental music education is related to children’s music education. High music
education is common among parents of professional musicians (n = 100).
Reproduced from Irma Järvelä, Genomics studies on musical aptitude, music perception, and
practice, Annals of the New York Academy of Sciences, Special Issue: The Neurosciences and
Music 6, p. 2, Figure 1, doi:10.1111/nyas.13620, Copyright © 2018, New York Academy of
Sciences.
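To make these heritability and familiality figures concrete, the short Python sketch below applies Falconer's classic twin formula, h2 = 2(rMZ - rDZ), to the Distorted Tunes Test correlations quoted above, and computes a sibling relative risk λS from prevalence figures. Falconer's formula is a textbook approximation rather than the variance-component models used in the cited studies, and the prevalences are hypothetical, chosen only to reproduce the reported λS of about 10.8.

```python
# Illustrative calculations only; the cited studies used more elaborate
# variance-component models than Falconer's approximation.

def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Heritability estimate from twin correlations: h2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)

# Distorted Tunes Test twin correlations reported by Drayna et al. (2001)
print(f"DTT h2 ~ {falconer_h2(r_mz=0.67, r_dz=0.44):.2f}")  # ~0.46

def sibling_relative_risk(risk_siblings: float, risk_population: float) -> float:
    """lambda_S: risk to siblings of affected individuals / population risk."""
    return risk_siblings / risk_population

# Hypothetical prevalences, chosen only to reproduce lambda_S ~ 10.8
print(f"lambda_S ~ {sibling_relative_risk(0.432, 0.04):.1f}")
```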
Evolution and Musical Aptitude

Evolution is based on genetic alleles that are transmitted through
generations during history. Music cultures can develop in diverse directions,
but they are linked to genetic alleles in the human genome. These alleles
are responsible for biologically determined human traits. Favorable alleles
are enriched in the gene pool showing high allele frequencies associated
with the beneficial trait, whereas damaging alleles that cause harmful
effects tend to disappear from the gene pool. The universality of music in
all societies suggests that beneficial alleles do underlie music-related
behavior. However, it is not known what distinguishes humans from other
primates with regard to musical ability, or what the biological
determinants underlying artistic cognitive traits are.
It is notable that modern humans have an auditory center that functions
identically to that of the first primates that lived millions of years ago
(Parker et al., 2013). Adaptive convergent sequence evolution has also been
found in echolocating bats and dolphins (Montealegre-Z, Jonsson, Robson-
Brown, Postles, & Robert, 2012), implying that numerous genes are linked
not only to hearing but also vision. Interestingly, several birdsong genes
were shown to be upregulated when humans listen to and perform music (Guo
et al., 2007; Horita et al., 2012; Kanduri, Kuusi, et al., 2015; Kanduri,
Raijas, et al., 2015; Pfenning et al., 2014). These data suggest that the
machinery to facilitate the hearing of sounds is highly conserved. It
facilitates communication via sounds important for the survival of humans
and other species. Vocal learning in songbirds shows similar features to
those found in humans (Araki et al., 2016). Recent studies on songbirds
have shown that there exist two different types of brain cells in the bird
auditory cortex that register song syllables in zebra finches (Araki et al.,
2016). One type identifies the acoustic features of song syllables (pitch and
timbre) that are affected more by the environment whereas the other type
detects the species-specific typical gap durations (rhythm) between song
syllables which are preserved (Araki et al., 2016).
Advanced cognitive abilities are characteristic of humans and are likely
to be the recent result of positive selection (Sabeti et al., 2006). For
example, FOXP2, which has been implicated in human speech and language, has
been under positive selection during recent human evolution (Enard et al.,
2002).
As genetic evolution is much slower than cultural evolution, we and
others (Honing, Ten Cate, Peretz, & Trehub, 2015) hypothesize that the
genetic variants associated with musical aptitude have a pivotal role in the
development of music culture. In comparison, in songbirds, the evolution of
song culture is the result of a multigenerational process where the song is
developed by vertical transmission in a species-specific fashion, suggesting
genetic constraints (Lipkind & Tchernichovski, 2011). This emphasizes the
importance of the selection of parental singing skills and their genetic
background in evolution.
According to the Mendelian rules, half of each parent's genes are directly
inherited by the offspring. In fact, the genetic component is larger, as the
parental alleles that are not transmitted to the children still shape parental
behavior and thereby affect the children's development. Concordantly, Hambrick
et al. (2014) have shown that training in music accounts for about 30 percent
of the variance in music performance among musicians, implying that other
factors, including genes, have a larger effect. In a Swedish twin study, it was found
that willingness to practice music is an independent personality trait that has
a high heritability (40–70 percent) (Mosing, Madison, Pedersen, Kuja-
Halkola, & Ullén, 2014). These results point to a greater and independent
role of genetic factors contributing to music perception and practice.
Genomic approaches can be used to identify the regions of positive
selection in the human genome. Variations in the music test scores of
auditory structuring ability (Karma Music Test; KMT), Carl Seashore’s
subtests of pitch (SP) and time discrimination (ST) suggest that the alleles
may have been targeted for selection. When three methods for detecting
selection (the haplotype-based methods haploPS and XP-EHH, and the allele
frequency-based method FST) were applied to the combined phenotype of the
three music test scores described earlier (COMB), hundreds of genes were
found in the selection regions (Liu et al., 2016). Several of them were known to be
involved in auditory perception and inner ear development (DICER1,
FGF20, CUX1, SPARC, KIF3A, TGFB3, LGR5, GPR98, PAX8, COL11A1,
USH2A, PROX1). The findings are consistent with the convergent evolution
of genes related to auditory processes and communication in other species
(Montealegre-Z et al., 2012; Parker et al., 2013; Zhang et al., 2014). Some
genes were known to affect cognition and memory (e.g., GRIN2B, IL1A,
IL1B, RAPGEF5) and reward mechanisms (RGS9). Interestingly, several
genes were linked to song perception and production in songbirds (e.g.,
FOXP1, RGS9, GPR98, GRIN2B, VLDLR). Of these, GPR98, which is expressed in
the song control nuclei of a vocalizing songbird (the zebra finch), has been
found to be under positive selection in the songbird lineage (Pfenning et al.,
2014).
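As a methodological aside, the allele-frequency-based FST statistic named above can be sketched in a few lines. The version below is the basic two-population Wright estimator for a biallelic SNP, a simplification of the estimators actually used by Liu et al. (2016), and the allele frequencies are invented for illustration.

```python
# Minimal two-population F_ST sketch; real selection scans such as Liu et
# al. (2016) combine this idea with haplotype methods (haploPS, XP-EHH).

def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SNP."""
    p_bar = (p1 + p2) / 2.0                    # mean allele frequency
    h_t = 2.0 * p_bar * (1.0 - p_bar)          # expected total heterozygosity
    h_s = (2*p1*(1 - p1) + 2*p2*(1 - p2)) / 2  # mean within-group heterozygosity
    return (h_t - h_s) / h_t if h_t > 0 else 0.0

print(fst_two_pops(0.9, 0.2))  # strong differentiation -> elevated F_ST (~0.49)
print(fst_two_pops(0.5, 0.5))  # identical frequencies  -> F_ST = 0.0
```

SNPs whose FST values fall in the extreme tail of the genome-wide distribution are flagged as candidate targets of selection.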
Some hypotheses could be constructed based on previous biological
knowledge about the identified genes. FOXP2 has been implicated in an
inherited language disorder (Lai, Fisher, Hurst, Vargha-Khadem, &
Monaco, 2001) that causes disturbances in the ability to detect timing
(rhythm) but not pitch in music (Alcock, Passingham, Watkins, & Vargha-
Khadem, 2000). This is concordant with the different brain cells that are
responsible for pitch and timing in songbirds (Araki et al., 2016). FOXP1
and another candidate gene VLDLR, a direct target gene of human FOXP2
(Ayub et al., 2013; Vernes et al., 2007), belong to the singing-regulated
gene networks in the zebra finch. VLDLR, the very-low-density lipoprotein
receptor gene, is a member of the Reelin pathway, which affects learned
vocalization (Hilliard, Miller, Fraley, Horvath, & White, 2012). GRIN2B is
associated with learning, brain plasticity, and cognitive performance in
humans (Kauppi, Nilsson, Adolfsson, Eriksson, & Nyberg, 2011) and is
among the ten prioritized genes in a convergent analysis of musical traits
in animals and humans (Oikkonen, Onkamo, Järvelä, & Kanduri, 2016).
RGS9 is expressed in the striatum and belongs to the regulator of G-
protein signaling (RGS) gene family that plays a key role in regulating
intracellular signaling of G-protein coupled receptors, such as dopamine
receptors. The data support previous findings of the role of the dopaminergic
pathway and its link to the reward mechanism as molecular determinants in the
positive selection of music (Salimpoor et al., 2011). This preliminary study
identified a large number of functionally relevant candidate genes that may
underlie the evolution of music. Further studies may give a more accurate
picture after methods to analyze polygenic selection become available
(Qian, Deng, Lu, & Xu, 2013).

G -W L A
A M T
Assigning genetic markers to a trait such as musical
aptitude first requires a definition of the phenotype. As musical aptitude is a
complex cognitive trait, it is likely that its individual components have
distinct molecular backgrounds. Each of these components (subphenotypes)
can be analyzed separately, and they can also be combined.
In a genome-wide study of musical aptitude, nearly 800 family members
were phenotyped for auditory structuring ability (Karma Music Test, KMT)
(Karma, 1994) and for perception of pitch and time in music (Seashore et
al., 1960), and a combined score of the three aforementioned tests (COMB)
was also analyzed. When the family material was analyzed with 660,000
genetic markers, several genetic loci were found in the human genome (Oikkonen et
al., 2015). The identified loci contained candidate genes that affect inner ear
development and neurocognitive processes, which are necessary traits for
music perception. The highest probability of linkage was obtained at 4q22
(Oikkonen et al., 2015). Earlier, chromosome 4q22 was found in a smaller
family material using a microsatellite marker scan (Pulli et al., 2008).
strongest association (in unrelated subjects) was found upstream of GATA
binding protein 2 (GATA2) at chromosome 3q21.3. GATA2 is a relevant
candidate gene as it regulates the development of cochlear hair cells
(Haugas, Lilleväli, Hakanen, & Salminen, 2010) and the inferior colliculus
(IC) (Lahti, Achim, & Partanen, 2013) important in tonotopic mapping, that
is, the processing of sounds of different frequency in the brain.
Interestingly, GATA2 is abundantly expressed in dopaminergic neurons
(Scherzer et al., 2008) that release dopamine during emotional arousal to
music (Salimpoor et al., 2011). Several plausible candidate genes were
located at 4p14 with the highest probability of linkage in the family study
(Oikkonen et al., 2015). Pitch perception accuracy (SP) was linked to a region
next to the protocadherin 7 gene (PCDH7), expressed in the cochlear (Lin et al.,
2012) and amygdaloid (Hertel, Redies, & Medina, 2012) complexes.
PCDH7 is a relevant candidate gene for pitch perception, functioning in the
hair cells of the cochlea that recognize pitches (Gosselin, Peretz, Johnsen,
& Adolphs, 2007). The amygdala is the emotional center of the human
brain affected by music (Koelsch, 2010). Interestingly, the homologous
gene PCDH15 also affects hair cell sensory transduction; together with
cadherin type 23 (CDH23) (another candidate gene, at chromosome 16) it
forms a tip-link in sensory hair cells (Sotomayor, Weihofen,
Gaudet, & Corey, 2012). Moreover, the PCDHA gene cluster was found in
a CNV study of musical aptitude (Ukkola-Vuoti et al., 2013). Platelet-
derived growth factor receptor alpha polypeptide (PDGFRA) is expressed
in the hippocampus (Di Pasquale et al., 2003), associated with learning and
memory. The potassium channel tetramerisation domain containing 8
(KCTD8) is expressed in the spiral ganglion of the cochlea (Metz,
Gassmann, Fakler, Schaeren-Wiemers, & Bettler, 2011). KCTD8 also
interacts with the GABA receptors GABRB1 and GABRB2; of these,
the GABRB1 protein is reduced in schizophrenia, bipolar disorder, and major
depression, diseases that severely affect human cognition and mood
regulation (Fatemi, Folsom, Rooney, & Thuras, 2013). Cholinergic
receptor, nicotinic alpha 9 (neuronal) (CHRNA9) (Katz et al., 2004) and the
paired-like homeobox 2b (PHOX2B) (Ousdal et al., 2012) on chromosome 4
also affect inner ear development. In addition, PHOX2B increases amygdala
activity and autonomic functions (blood pressure, heart rate, and
respiration) that are reported to be affected by music (Blood & Zatorre,
2001). The genome-wide analyses performed on Mongolian families using
the pitch perception accuracy (PPA) test identified a partly shared genetic
region on chromosome 4q (Park et al., 2012). The statistically most
significant locus found in a genome-wide linkage study of absolute pitch
(AP) is located at 8q24.21 (Theusch, Basu, & Gitschier, 2009). The results
suggest that musical aptitude is an innate ability that is affected by several
predisposing genetic variants (Fig. 1).
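The core statistical step of such a genome-wide scan can be illustrated with a toy single-marker association test: regress the quantitative music-test score on genotype dosage at one SNP, then repeat across all markers. The sketch below uses simulated data; real analyses such as Oikkonen et al. (2015) additionally model family structure and covariates, which this toy version omits.

```python
# Toy per-SNP association test on simulated data (not the actual pipeline).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
genotype = rng.integers(0, 3, size=n)        # 0/1/2 copies of the minor allele
# Simulated phenotype: small additive allele effect plus noise
score = 0.3 * genotype + rng.normal(size=n)

result = stats.linregress(genotype, score)
print(f"allele effect = {result.slope:.2f}, p = {result.pvalue:.1e}")

# With ~660,000 markers, each test must clear a genome-wide significance
# threshold (conventionally p < 5e-8) to survive multiple testing.
```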
Genome-wide copy number variation (CNV) analysis revealed that regions
containing candidate genes for neuropsychiatric disorders were associated
with musical aptitude (Ukkola-Vuoti et al., 2013). A deletion covering the
protocadherin-a gene cluster 1–9 (PCDHA 1–9) was associated with low
music test scores (COMB) both in familial and sporadic cases. PCDHAs
affect synaptogenesis and maturation of the serotonergic projections in the
brain, and Pcdha mutant mice show abnormalities in learning and memory
(Katori et al., 2009).

T E M P
P H T
Music acts as an environmental trigger. Numerous studies have shown that
listening to and performing classical music have an effect on the human
body (Blood & Zatorre, 2001; Salimpoor et al., 2011). When comparing
genome-wide RNA expression profiles before and after listening to
classical music and after a “music-free” control session, the activity of
genes involved in dopamine secretion and transport (SNCA, RTN4, and
SLC6A8), and learning and memory (SNCA, NRGN, NPTN, RTN4) was
enhanced (Kanduri, Raijas, et al., 2015). Of these genes, SNCA (George,
Jin, Woods, & Clayton, 1995), NRGN (Wood, Olson, Lovell, & Mello,
2008), and RGS2 affect song learning and singing in songbirds (Clayton,
2000) suggesting a shared evolutionary background of sound perception
between vocalizing birds and humans. It is noteworthy that the effect of
music was only detectable in musically experienced listeners. The lack of
the effect of music in novices could be explained by differences in the
amount of exposure to music that is known to affect brain structure and
function (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995; Gaser &
Schlaug, 2003), unfamiliarity with the music (Salimpoor, Benovoy, Longo,
Cooperstock, & Zatorre, 2009), or musical anhedonia (Martínez-Molina,
Mas-Herrero, Rodríguez-Fornells, Zatorre, & Marco-Pallarés, 2016). In
addition, listening to music increased the expression of the target genes of
the dopaminoceptive neuronal glucocorticoid receptor (NR3C1), which
increases the synaptic concentration of dopamine linked to rewarding and
reinforcing properties (Ambroggi et al., 2009). It is of note that NR3C1 is
also a key molecule in the regulation of addictive behavior.
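The before/after design of these transcriptome studies can be illustrated with a toy paired analysis: a per-gene paired test on expression values followed by false discovery rate control. The sketch below runs on simulated data with a hand-rolled Benjamini-Hochberg correction; the published studies relied on dedicated RNA-seq pipelines rather than this bare-bones procedure.

```python
# Toy paired differential-expression test (simulated data, not real RNA-seq).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_genes, n_subjects = 1000, 10
before = rng.normal(5.0, 1.0, size=(n_genes, n_subjects))   # log-expression
after = before + rng.normal(0.0, 0.3, size=(n_genes, n_subjects))
after[:50] += 0.8        # pretend 50 genes respond to the music session

t, p = stats.ttest_rel(after, before, axis=1)  # paired test per gene

# Benjamini-Hochberg adjusted p-values
order = np.argsort(p)
scaled = p[order] * n_genes / (np.arange(n_genes) + 1)
adjusted = np.minimum.accumulate(scaled[::-1])[::-1][np.argsort(order)]
print(f"genes significant at FDR < 0.05: {(adjusted < 0.05).sum()}")
```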
Music performance by professional musicians involves a wide spectrum
of cognitive and multisensory motor skills, whose molecular basis is largely
unknown. The effect of music performance on the genome-wide peripheral
blood transcriptome of professional musicians was analyzed by collecting
RNA-samples before and after a two-hour concert performance and after a
“music-free” control session. The upregulated genes were found to affect
dopaminergic neurotransmission, motor behavior, neuronal plasticity, and
neurocognitive functions including learning and memory. Specifically,
performance of music by professional musicians increased the expression of
FOS, DUSP1, and SNCA, as well as genes involved in catecholamine
biosynthesis and dopamine metabolism (Kanduri, Kuusi, et al., 2015).
Interestingly, SNCA, FOS, and
DUSP1 are involved in song perception and production in songbirds. Thus,
both listening to and performing music partially shared the same genes as
those affected in songbird singing. It is noteworthy that although the brains
of songbirds are small, they have about double the neuronal density of
primate brains of the same mass. Thus, the large number of
neurons can contribute to the neural basis of cognitive capacity (Enard,
2016).
In both listening to and performing music (Kanduri, Kuusi, et al., 2015;
Kanduri, Raijas, et al., 2015), one of the strongest activations was detected
in the alpha-synuclein gene (SNCA), which has a physiological role in the
development of nerve cells and in the release of neurotransmitters, especially
dopamine, from presynaptic cells. Dopamine is central to motor functions;
listening to music also upregulated genes known to affect the growth and
plasticity of nerve cells and downregulated genes linked to neurodegeneration
(Kanduri, Raijas, et al., 2015). SNCA is located in the best linkage region of musical aptitude
on chromosome 4q22.1 and is regulated by GATA2 residing at 3q21, the
region with the most significant association with musical aptitude, thus
linking the results of the GWA study and transcriptional profiling studies to
the same locus (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015;
Oikkonen et al., 2015) (Fig. 3). GATA2 is abundantly expressed in
dopaminergic neurons and binds to intron-1 of endogenous neuronal SNCA
to regulate its expression. The results are in agreement with
neurophysiological studies where increases in endogenous dopamine have
been detected in the striatum when listening to music (Blood & Zatorre,
2001). Interestingly, SNCA is a causative gene for Parkinson’s disease (with
disturbed dopamine metabolism) (Petrucci, Ginevrino, & Valente, 2016)
and variations in SNCA predispose to Lewy-body dementia (Peuralinna et
al., 2008).
Listening to music and music performance had partially different effects
on gene expression. Some genes such as ZNF223 and PPP2R3A were
downregulated after music listening but upregulated after music
performance (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015).
ZNF223 is a zinc-finger transcription regulator similar to the immediate
early response gene (IEG) ZNF225 (also known as ZENK or EGR1) that
regulates the song control system of songbirds (Dong & Clayton, 2008).
PPP2R3A, abundantly expressed in the striatum, is known to integrate the
effects of dopamine and other neurotransmitters (Ahn et al., 2007). Other
IEGs such as FOS and DUSP1, which are known to be active in the
song control nuclei of songbirds, were upregulated only after music
performance (Kanduri, Kuusi, et al., 2015), not after music listening
(Kanduri, Raijas, et al., 2015). Many other song perception-related genes in
songbirds like RGS2 were found to be differentially regulated after listening
to music, but not after music performance (Kanduri, Kuusi, et al., 2015;
Kanduri, Raijas, et al., 2015). The differences are plausibly due, for
example, to the different types of musical activity and the different study
subjects.

FIGURE 3. The results of DNA- and RNA-studies of music-related traits converge at chromosome
4q22. The alpha-synuclein gene (SNCA) upregulated by listening to music (Kanduri, Raijas, et al.,
2015) and music performance by professional musicians (Kanduri, Kuusi, et al., 2015) is located at
the most significant region of musical aptitude (Oikkonen et al., 2015; Park et al., 2012; Pulli et al.,
2008) and regulated by GATA2, associated with musical aptitude (Oikkonen et al., 2015).
Reproduced from Irma Järvelä, Genomics studies on musical aptitude, music perception, and
practice, Annals of the New York Academy of Sciences, Special Issue: The Neurosciences and
Music 6, p. 4, Fig. 2, doi:10.1111/nyas.13620, Copyright © 2018, New York Academy of
Sciences.

At the molecular level, auditory perception processes have been shown
to exhibit convergent evolution across species (Sotomayor et al., 2012;
Zhang et al., 2014). Among them is protocadherin 15 (PCDH15), also found
in a human genome-wide association study of musical aptitude (Oikkonen et
al., 2015). Also, gene expression specializations have been detected in the
regions of the brain that are essential for auditory perception and
production, both in humans and songbirds (Pfenning et al., 2014; Salimpoor
et al., 2011; Whitney et al., 2014).

Convergent Analyses

Integration of data from various species helps to prioritize the genes most
relevant to the phenotype. A rich literature exists about genes affecting the
vocal learning of different species, especially songbirds (Clayton, 2013;
Pfenning et al., 2014) and recently, data have been gathered about candidate
genes associated with human musical traits (Kanduri, Kuusi, et al., 2015;
Kanduri, Raijas, et al., 2015; Liu et al., 2016; Oikkonen et al., 2015; Park et
al., 2012; Pulli et al., 2008). When the known candidate genes for musical
aptitude, music listening, and music performance are ranked together with
genes identified in vocalizing animal species, data about brain- and
tissue-specific molecules and pathways can be utilized, which is
not possible in human studies alone. Convergent analysis of genes
identified in vocalizing animals and human music-related traits revealed
that the most common candidate genes were activity-dependent immediate
early genes (IEGs) including EGR1, FOS, ARC, BDNF, and DUSP1
(Oikkonen et al., 2016). IEGs respond to sensory and motor stimuli in the
brain. Of these, EGR1 is widely expressed in brain regions that affect
cognition, emotional response, and sensitivity to reward in the rat (Duclot &
Kabbaj, 2017). EGR1 is upregulated by song perception and production in
songbirds (Avey, Kanyo, Irwin, & Sturdy, 2008; Drnevich et al., 2012).
Interestingly, EGR1 is the only gene ranked highly across all human phenotypes
(music listening, music performance, and musical aptitude). In contrast,
PHIP, noradrenalin, and NR4A2 were ranked among the top molecules in
the whole sample as well as within music listening studies, but not within
music performance (e.g., singing) related studies, whereas DUSP1, PKIA,
and DOPEY2 were the top genes specifically in music practice. These
results support at least partially different molecular backgrounds for music-
related processes. FOS and DUSP1 were activated when professional
musicians played a concert (Kanduri, Kuusi, et al., 2015). Other candidate
genes like FOXP2 and GRIN2B have been shown to be critical for vocal
communication in songbirds (Haesler et al., 2004) and cognitive
development, including speech in humans (Hu, Chen, Myers, Yuan, &
Traynelis, 2016), and they are located in the selection regions for musical
aptitude (Liu et al., 2016). There are still limitations in comparative studies
as avian genomes contain only about 70 percent of the number of human
genes (Zhang et al., 2014).
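The ranking logic of such a convergent analysis can be illustrated with a toy score that simply counts how many independent evidence sets report each gene. The gene lists below are illustrative fragments drawn from this chapter, not the full data of Oikkonen et al. (2016).

```python
# Toy convergence ranking across evidence sets (illustrative gene lists).
from collections import Counter

evidence = {
    "aptitude":    {"GATA2", "PCDH7", "GRIN2B", "EGR1"},
    "listening":   {"SNCA", "RGS2", "NRGN", "EGR1"},
    "performance": {"SNCA", "FOS", "DUSP1", "EGR1"},
    "songbirds":   {"EGR1", "FOS", "DUSP1", "SNCA", "RGS2"},
}

counts = Counter(gene for genes in evidence.values() for gene in genes)
for gene, n_sets in counts.most_common(5):
    print(gene, n_sets)   # EGR1 appears in every set, as noted in the text
```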
Convergent evidence for genes involved in functions like cognition,
learning, and memory has been reported in music-related activities
(Oikkonen et al., 2016). Several pathways were identified describing the
interaction and function of the identified genes. Among them, the CDK5
signaling pathway regulates cognitive functions in the brain. Interestingly,
the MEK gene, a member of the CDK5 signaling pathway, is necessary for
song learning in songbirds (London & Clayton, 2008). There is a partially
shared genetic predisposition for musical abilities and general cognition
(Mosing et al., 2014; Mosing, Madison, Pedersen, & Ullén, 2016). Human
cognitive capacity has evolved rapidly; therefore, it is highly likely that
human-specific pathways and genes underlie human musical abilities.
Obviously, cognition-related genes are a plausible group of candidate genes
for elucidating the more recent evolution of music-related traits.

Biological Background of Creative Activities in Music

Creativity in music is an essential part of the development of music culture
and industry. Creative activity in music (composing, improvising, or
arranging) is common (Oikkonen et al., 2016). Some evidence for the
biological basis of creativity in music has been obtained from brain imaging
studies where composing (Brown, Martinez, & Parsons, 2006) and
improvising musical pieces have been shown to affect several regions in the brain
such as the medial prefrontal cortex, premotor areas, and auditory cortex
(Dietrich & Kanso, 2010; Limb & Braun, 2008; Liu et al., 2012). Listening
to music has been shown to increase dopamine release in a human PET study
(Salimpoor et al., 2011).
So far, genomics approaches have rarely been applied to creative
activities in general or to musical activities specifically. Dopaminergic
genes appeared to be upregulated in genomic studies of musical aptitude
and related traits (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015;
Oikkonen et al., 2015), and some of them, such as FOS and FOXP2 have
also been found in songbirds (Murugan, Harward, Scharff, & Mooney,
2013; Nordeen, Holtzman, & Nordeen, 2009). The dopamine D4 receptor
gene (DRD4) is an interesting candidate for creativity in music. It mediates
dopamine signaling at the neuronal synapses. Two of its signaling variants
(7R and 2R) have been associated not only with novelty seeking and
altruism, but also with financial risk taking and heavy drinking; this kind of
behavior can be seen as a sensitivity toward influences from the
environment (Kitayama, King, Hsu, Liberzon, & Yoon, 2016). It may be
that the carriers of the 7R/2R variants have the capacity to adopt new ways
of behavior, such as creating new music (Kitayama et al., 2016) (Fig. 4).
This may serve as an example of how genetic variants can affect cultural
evolution (Kim & Sasaki, 2014). In fact, many composers are known to
have composed music describing actual occurrences in society.

FIGURE 4. Several music-related traits are found to be linked to dopaminergic metabolism.


In a large epidemiological study from Sweden, individuals in
creative professions were more likely to suffer from bipolar disorder
(Kyaga et al., 2011), and creative professions were overrepresented among
the first-degree relatives of patients with neuropsychiatric disorders (e.g.,
schizophrenia and bipolar disorder), indicating familial co-segregation of
creativity and neuropsychiatric disorders (Kyaga et al., 2013). When the
known genes and alleles associated with neuropsychiatric disorders (Lee,
Ripke, Neale, & Cross-Disorder Group, 2013; Schizophrenia Working
Group of the Psychiatric Genomics Consortium, 2014) were analyzed
among artistic professions including musicians, the risk alleles were more
prevalent in the artistic professions (Power et al., 2015). When professional
musicians played a traditional classical concert, several genes reported to be
mutated in neuropsychiatric or neurodegenerative diseases were affected
(Kanduri, Kuusi, et al., 2015). This finding may reflect the creative
activities plausibly involved in music performance. Thus, molecular genetic
studies give evidence that artistic creativity and neuropsychiatric disorders
partially share the same predisposing genetic variants. Creativity is likely
rewarding, whereas diseases cause suffering. It is currently not known
which of the numerous risk alleles of neuropsychiatric disorders are
required, or which individual, family, and environmental protective
or risk factors underlie complex phenotypes like creativity and
neuropsychiatric disorders.

Empirical research on the biological background of music-related human
traits has been introduced using genomics methods. Genes affecting inner
ear development, dopaminergic systems, learning, and memory were found
as candidate genes for musical aptitude, listening to and performing music.
In addition, several genes previously known to affect vocal learning in
songbirds were identified as candidate genes for music perception and
practice. Activity-dependent immediate early genes (IEGs) were the genes
most commonly top-ranked across humans and songbirds in convergent analyses.
IEGs like EGR1 are critical mediators of gene–environment interactions
characterized by rapid and dynamic responses to neuronal activity and
reward-related synaptic plasticity (Duclot & Kabbaj, 2017), also reported in
music-related studies (Salimpoor et al., 2011; Schneider et al., 2002). IEGs
could thus serve as plausible candidate genes to mediate the effects of
music as an environmental factor. Replication studies and studies using
epigenomics methods are warranted to further elucidate the biological
background of music-related traits.

References
Ahn, J. H., Sung, J. Y., McAvoy, T., Nishi, A., Janssens, V., Goris, J., … Nairn, A. C. (2007). The
B″/PR72 subunit mediates Ca2+-dependent dephosphorylation of DARPP-32 by protein
phosphatase 2A. Proceedings of the National Academy of Sciences 104(23), 9876–9881.
Alcock, K. J., Passingham, R. E., Watkins, K., & Vargha-Khadem, F. (2000). Pitch and timing
abilities in inherited speech and language impairment. Brain & Language 75(1), 34–46.
Ambroggi, F., Turiault, M., Milet, A., Deroche-Gamonet, V., Parnaudeau, S., Balado, E., … Tronche,
F. (2009). Stress and addiction: Glucocorticoid receptor in dopaminoceptive neurons facilitates
cocaine seeking. Nature Neuroscience 12(3), 247–249.
Araki, M., Bandi, M. M., & Yazaki-Sugiyama, Y. (2016). Mind the gap: Neural coding of species
identity in birdsong prosody. Science 354(6317), 1282–1287.
Atik, T., Bademci, G., Diaz-Horta, O., Blanton, S. H., & Tekin, M. (2017). Whole-exome sequencing
and its impact in hereditary hearing loss. Genetics Research 97, e4.
doi:10.1017/S001667231500004X
Avey, M. T., Kanyo, R. A., Irwin, E. L., & Sturdy, C. B. (2008). Differential effects of vocalization
type, singer and listener on ZENK immediate early gene response in black-capped chickadees
(Poecile atricapillus). Behavioural Brain Research 188(1), 201–208.
Ayub, Q., Yngvadottir, B., Chen, Y., Xue, Y., Hu, M., Vernes, S. C., … Tyler-Smith, C. (2013).
FOXP2 targets show evidence of positive selection in European populations. American Journal of
Human Genetics 92(5), 696–706.
Baharloo, S., Service, S. K., Risch, N., Gitschier, J., & Freimer, N. B. (2000). Familial aggregation of
absolute pitch. American Journal of Human Genetics 67(3), 755–758.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity
in brain regions implicated in reward and emotion. Proceedings of the National Academy of
Sciences 98(20), 11818–11823.
Brown, S., Martinez, M. J., & Parsons, L. M. (2006). Music and language side by side in the brain: A
PET study of the generation of melodies and sentences. European Journal of Neuroscience 23(10),
2791–2803.
Cáceres, M., Lachuer, J., Zapala, M. A., Redmond, J. C., Kudo, L., Geschwind, D. H., … Barlow, C.
(2003). Elevated gene expression levels distinguish human from non-human primate brains.
Proceedings of the National Academy of Sciences 100(22), 13030–13035.
Clayton, D. F. (2000). The genomic action potential. Neurobiology of Learning and Memory 74(3),
185–216.
Clayton, D. F. (2013). The genomics of memory and learning in songbirds. Annual Review of
Genomics and Human Genetics 14, 45–65.
Di Pasquale, G., Davidson, B. L., Stein, C. S., Martins, I., Scudiero, D., Monks, A., & Chiorini, J. A.
(2003). Identification of PDGFR as a receptor for AAV-5 transduction. Nature Medicine 9, 1306–
1312.
Dietrich, A., & Kanso, R. (2010). A review of EEG, ERP, and neuroimaging studies of creativity and
insight. Psychological Bulletin 136(5), 822–848.
Dixon-Salazar, T. J., & Gleeson, J. G. (2010). Genetic regulation of human brain development:
Lessons from Mendelian diseases. Annals of the New York Academy of Sciences 1214, 156–167.
Dong, S., & Clayton, D. F. (2008). Partial dissociation of molecular and behavioral measures of song
habituation in adult zebra finches. Genes, Brain and Behavior 7(7), 802–809.
Drayna, D., Manichaikul, A., de Lange, M., & Snieder, H. (2001). Genetic correlates of musical pitch
recognition in humans. Science 291(5510), 1969–1972.
Drnevich, J., Replogle, K. L., Lovell, P., Hahn, T. P., Johnson, F., Mast, T. G., … Clayton, D. F.
(2012). Impact of experience-dependent and -independent factors on gene expression in songbird
brain. Proceedings of the National Academy of Sciences 109(Suppl. 2), 17245–17252.
Duclot, F., & Kabbaj, M. (2017). The role of Early Growth Response 1 (EGR1) in brain plasticity
and neuropsychiatric disorders. Frontiers in Behavioral Neuroscience 11, 35. Retrieved from
https://doi.org/10.3389/fnbeh.2017.00035
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical
representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Enard, W. (2016). The molecular basis of human brain evolution. Current Biology 26(20), R1109–
R1117.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S., Wiebe, V., Kitano, T., … Pääbo, S. (2002).
Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418(6900), 869–
872.
Fatemi, S. H., Folsom, T. D., Rooney, R. J., & Thuras, P. D. (2013). Expression of GABAA α2-, β1-,
and ε-receptors are altered significantly in the lateral cerebellum of subjects with schizophrenia,
major depression and bipolar disorder. Translational Psychiatry 3, e303. doi:10.1038/tp.2013.64
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians.
Journal of Neuroscience 23(27), 9240–9245.
George, J. M., Jin, H., Woods, W. S., & Clayton, D. F. (1995). Characterization of a novel protein
regulated during the critical period for song learning in the zebra finch. Neuron 15, 361–372.
Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emotion
recognition from music. Neuropsychologia 45(2), 236–244.
Guo, Y. P., Sun, X., Li, C., Wang, N. Q., Chan, Y. S., & He, J. (2007). Corticothalamic
synchronization leads to c-fos expression in the auditory thalamus. Proceedings of the National
Academy of Sciences 104(28), 11802–11807.
Haesler, S., Wada, K., Nshdejan, A., Morrisey, E. E., Lints, T., Jarvis, E. D., & Scharff, C. (2004).
FoxP2 expression in avian vocal learners and non-learners. Journal of Neuroscience 24(13), 3164–
3175.
Hambrick, D. Z., Oswald, F. L., Altmann, E. M., Meinz, E. J., Gobet, F., & Campitelli, G. (2014).
Deliberate practice: Is that all it takes to become an expert? Intelligence 45, 34–45.
Haugas, M., Lilleväli, K., Hakanen, J., & Salminen, M. (2010). Gata2 is required for the
development of inner ear semicircular ducts and the surrounding perilymphatic space.
Developmental Dynamics 239(9), 2452–2469.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity:
Behavior, function, and structure. Neuron 76(3), 486–502.
Hertel, N., Redies, C., & Medina, L. (2012). Cadherin expression delineates the divisions of the
postnatal and adult mouse amygdala. Journal of Comparative Neurology 520(17), 3982–4012.
Hilliard, A. T., Miller, J. E., Fraley, E. R., Horvath, S., & White, S. A. (2012). Molecular
microcircuitry underlies functional specification in a basal ganglia circuit dedicated to vocal
learning. Neuron 73(3), 537–552.
Honing, H., Ten Cate, C., Peretz, I., & Trehub, S. E. (2015). Without it no music: Cognition, biology
and evolution of musicality. Philosophical Transactions of the Royal Society of London B:
Biological Sciences 370(1664): 20140088. doi:10.1098/rstb.2014.0088
Horita, H., Kobayashi, M., Liu, W.-C., Oka, K., Jarvis, E. D., & Wada, K. (2012). Specialized motor-
driven dusp1 expression in the song systems of multiple lineages of vocal learning birds. PLoS
ONE 7, e42173.
Hu, C., Chen, W., Myers, S. J., Yuan, H., & Traynelis, S. F. (2016). Human GRIN2B variants in
neurodevelopmental disorders. Journal of Pharmacological Sciences 132(2), 115–121.
Justus, T., & Hutsler, J. J. (2005). Fundamental issues in the evolutionary psychology of music:
Assessing innateness and domain specificity. Music Perception 23(1), 1–27.
Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A. K., Lähdesmäki, H., & Järvelä, I. (2015). The
effect of music performance on the transcriptome of professional musicians. Scientific Reports 5,
9506. doi:10.1038/srep09506
Kanduri, C., Raijas, P., Ahvenainen, M., Philips, A. K., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä,
I. (2015). The effect of listening to music on human transcriptome. PeerJ 3, e830. Retrieved from
https://doi.org/10.7717/peerj.830
Karma, K. (1994). Auditory and visual temporal structuring: How important is sound to musical
thinking? Psychology of Music 22(1), 20–30.
Katori, S., Hamada, S., Noguchi, Y., Fukuda, E., Yamamoto, T., Yamamoto, H., … Yagi, T. (2009).
Protocadherin-alpha family is required for serotonergic projections to appropriately innervate
target brain areas. Journal of Neuroscience 29(29), 9137–9147.
Katz, E., Elgoyhen, A. B., Gómez-Casati, M. E., Knipper, M., Vetter, D. E., Fuchs, P. A., &
Glowatski, E. (2004). Developmental regulation of nicotinic synapses on cochlear inner hair cells.
Journal of Neuroscience 24(36), 7814–7820.
Kauppi, K., Nilsson, L.-G., Adolfsson, R., Eriksson, E. & Nyberg, L. (2011). KIBRA polymorphism
is related to enhanced memory and elevated hippocampal processing. Journal of Neuroscience 31,
14218–14222.
Kim, H. S., & Sasaki, J. Y. (2014). Cultural neuroscience: Biology of the mind in cultural contexts.
Annual Review of Psychology 65, 487–514.
Kitayama, S., King, A., Hsu, M., Liberzon, I., & Yoon, C. (2016). Dopamine-system genes and
cultural acquisition: The norm sensitivity hypothesis. Current Opinion in Psychology 8, 167–174.
Koelsch, S. (2010). Towards a neural basis of music-evoked emotions. Trends in Cognitive Sciences
14(3), 131–137.
Kyaga, S., Landén, M., Boman, M., Hultman, C. M., Långström, N., & Lichtenstein, P. (2013).
Mental illness, suicide and creativity: 40-year prospective total population study. Journal of
Psychiatric Research 47(1), 83–90.
Kyaga, S., Lichtenstein, P., Boman, M., Hultman, C., Långström, N., & Landén, M. (2011).
Creativity and mental disorder: Family study of 300,000 people with severe mental disorder.
British Journal of Psychiatry 199(5), 373–379.
Lahti, L., Achim, K., & Partanen, J. (2013). Molecular regulation of GABAergic neuron
differentiation and diversity in the developing midbrain. Acta Physiologica (Oxford) 207(4), 616–
627.
Lai, C. S., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F., & Monaco, A. P. (2001). A forkhead-
domain gene is mutated in a severe speech and language disorder. Nature 413(6855), 519–523.
Lander, E. S. (2011). Initial impact of the sequencing of the human genome. Nature 470, 187–197.
Lee, S. H., Ripke, S., Neale, B. M., & Cross-Disorder Group of the Psychiatric Genomics
Consortium (2013). Genetic relationship between five psychiatric disorders estimated from
genome-wide SNPs. Nature Genetics 45(9), 984–994.
Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An
FMRI study of jazz improvisation. PLoS ONE 3, e1679.
Lin, J., Yan, X., Wang, C., Guo, Z., Rolfs, A., & Luo, J. (2012). Anatomical expression patterns of
delta-protocadherins in developing chicken cochlea. Journal of Anatomy 221(6), 598–608.
Lindor, N. M., Thibodeau, S., & Burke, W. (2017). Whole-genome sequencing in healthy people.
Mayo Clinic Proceedings 92(1), 159–172.
Lipkind, D., & Tchernichovski, O. (2011). Colloquium paper: Quantification of developmental
birdsong learning from the subsyllabic scale to cultural evolution. Proceedings of the National
Academy of Sciences 108(Suppl. 3), 15572–15579.
Liu, S., Chow, H. M., Xu, Y., Erkkinen, M. G., Swett, K. E., Eagle, M. W., … Braun, A. R. (2012).
Neural correlates of lyrical improvisation: An fMRI study of freestyle rap. Scientific Reports 2,
834. doi:10.1038/srep00834
Liu, X., Kanduri, C., Oikkonen, J., Karma, K., Raijas, P., Ukkola-Vuoti, L., … Järvelä, I. (2016).
Detecting signatures of positive selection associated with musical aptitude in the human genome.
Scientific Reports 6, 21198. doi:10.1038/srep21198
London, S. E., & Clayton, D. F. (2008). Functional identification of sensory mechanisms required for
developmental song learning. Nature Neuroscience 11(5), 579–586.
Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J.
(2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of
Sciences 113(46), E7337–E7345.
Metz, M., Gassmann, M., Fakler, B., Schaeren-Wiemers, N., & Bettler, B. (2011). Distribution of the
auxiliary GABAB receptor subunits KCTD8, 12, 12b, and 16 in the mouse brain. Journal of
Comparative Neurology 519(8), 1435–1454.
Montealegre-Z, F., Jonsson, T., Robson-Brown, K. A., Postles, M., & Robert, D. (2012). Convergent
evolution between insect and mammalian audition. Science 338(6109), 968–971.
Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not
make perfect: No causal effect of music practice on music ability. Psychological Science 25(9),
1795–1803.
Mosing, M. A., Madison, G., Pedersen, N. L., & Ullén, F. (2016). Investigating cognitive transfer
within the framework of music practice: Genetic pleiotropy rather than causality. Developmental
Science 19(3), 504–512.
Murugan, M., Harward, S., Scharff, C., & Mooney, R. (2013). Diminished FoxP2 levels affect
dopaminergic modulation of corticostriatal signaling important to song variability. Neuron 80(6),
1464–1476.
Nakahara, H., Masuko, T., Kinoshita, H., Francis, P. R., & Furuya, S. (2011). Performing music can
induce greater modulation of emotion-related psychophysiological responses than listening to
music. International Journal of Psychophysiology 81(3), 152–158.
Nordeen, E. J., Holtzman, D. A., & Nordeen, K. W. (2009). Increased Fos expression among
midbrain dopaminergic cell groups during birdsong tutoring. European Journal of Neuroscience
30(4), 662–670.
Oikkonen, J., Huang, Y., Onkamo, P., Ukkola-Vuoti, L., Raijas, P., Karma, K., … Järvelä, I. (2015).
A genome-wide linkage and association study of musical aptitude identifies loci containing genes
related to inner ear development and neurocognitive functions. Molecular Psychiatry 20(2), 275–
282.
Oikkonen, J., & Järvelä, I. (2014). Genomics approaches to study musical aptitude. Bioessays 36(11),
1102–1108.
Oikkonen, J., Onkamo, P., Järvelä, I., & Kanduri, C. (2016). Convergent evidence for the molecular
basis of musical traits. Scientific Reports 6, 39707. doi:10.1038/srep39707
Ousdal, O. T., Anand Brown, A., Jensen, J., Nakstad, P. H., Melle, I., Agartz, I., … Andreassen, O.
A. (2012). Associations between variants near a monoaminergic pathways gene (PHOX2B) and
amygdala reactivity: A genome-wide functional imaging study. Twin Research and Human
Genetics 15(3), 273–285.
Park, H., Lee, S., Kim, H. J., & Ju, Y. S. (2012). Comprehensive genomic analyses associate UGT8
variants with musical ability in a Mongolian population. Journal of Medical Genetics 49(12), 747–
752.
Parker, J., Tsagkogeorga, G., Cotton, J. A., Liu, Y., Provero, P., Stupka, E., & Rossiter, S. J. (2013).
Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470),
228–231.
Penhune, V., & de Villers-Sidani, E. (2014). Time for new thinking about sensitive periods. Frontiers
in Systems Neuroscience 8, 55. Retrieved from https://doi.org/10.3389/fnsys.2014.00055
Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010).
Functional specializations for music processing in the human newborn brain. Proceedings of the
National Academy of Sciences 107(10), 4758–4763.
Peretz, I., Cummings, S., & Dube, M. P. (2007). The genetics of congenital amusia (tone deafness):
A family-aggregation study. American Journal of Human Genetics 81(3), 582–588.
Petrucci, S., Ginevrino, M., & Valente, E. M. (2016). Phenotypic spectrum of alpha-synuclein
mutations: New insights from patients and cellular models. Parkinsonism & Related Disorders
22(Suppl. 1), S16–S20.
Peuralinna, T., Oinas, M., Polvikoski, T., Paetau, A., Sulkava, R., Niinistö, L., … Myllykangas, L.
(2008). Neurofibrillary tau pathology modulated by genetic variation of alpha-synuclein. Annals of
Neurology 64(3), 348–352.
Pfenning, A. R., Hara, E., Whitney, O., Rivas, M. V., Wang, R., Roulhac, P. L., … Jarvis, E. D.
(2014). Convergent transcriptional specializations in the brains of humans and song-learning birds.
Science 346(6215), 1256846–1256846.
Power, R. A., Steinberg, S., Bjornsdottir, G., Rietveld, C. A., Abdellaoui, A., Nivard, M. M., …
Stefansson, K. (2015). Polygenic risk scores for schizophrenia and bipolar disorder predict
creativity. Nature Neuroscience 18(7), 953–955.
Pulli, K., Karma, K., Norio, R., Sistonen, P., Göring, H. H., & Järvelä, I. (2008). Genome-wide
linkage scan for loci of musical aptitude in Finnish families: Evidence for a major locus at 4q22.
Journal of Medical Genetics 45(7), 451–456.
Qian, W., Deng, L., Lu, D., & Xu, S. (2013). Genome-wide landscapes of human local adaptation in
Asia. PLoS ONE 8, e54224.
Rothenberg, D., Roeske, T. C., Voss, H. U., Naguib, M., & Tchernichovski, O. (2014). Investigation
of musicality in birdsong. Hearing Research 308, 71–83.
Sabeti, P. C., Schaffner, S. F., Fry, B., Lohmueller, J., Varilly, P., Shamovsky, O., … Lander, E. S.
(2006). Positive natural selection in the human lineage. Science 312(5780), 1614–1620.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically
distinct dopamine release during anticipation and experience of peak emotion to music. Nature
Neuroscience 14(2), 257–262.
Salimpoor, V. N., Benovoy, M., Longo, G., Cooperstock, J. R., & Zatorre, R. J. (2009). The
rewarding aspects of music listening are related to degree of emotional arousal. PLoS ONE 4,
e7487.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and
the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Scherzer, C. R., Grass, J. A., Liao, Z., Pepivani, I., Zheng, B., Eklund, A. C., … Schlossmacher, M.
G. (2008). GATA transcription factors directly regulate the Parkinson’s disease-linked gene alpha-
synuclein. Proceedings of the National Academy of Sciences 105(31), 10907–10912.
Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). Biological insights
from 108 schizophrenia-associated genetic loci. Nature 511(7510), 421–427.
Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002).
Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians.
Nature Neuroscience 5(7), 688–694.
Seashore, C., Lewis, D., & Saetveit, J. (1960). Seashore measures of musical talents. New York:
Psychological Corporation.
Sotomayor, M., Weihofen, W. A., Gaudet, R., & Corey, D. P. (2012). Structure of a force-conveying
cadherin bond essential for inner-ear mechanotransduction. Nature 492(7427), 128–132.
Theusch, E., Basu, A., & Gitschier, J. (2009). Genome-wide study of families with absolute pitch
reveals linkage to 8q24.21 and locus heterogeneity. American Journal of Human Genetics 85(1),
112–119.
Ukkola-Vuoti, L., Kanduri, C., Oikkonen, J., Buck, G., Blancher, C., Raijas, P., … Järvelä, I. (2013).
Genome-wide copy number variation analysis in extended families and unrelated individuals
characterized for musical aptitude and creativity in music. PLoS ONE 8, e56356.
Vernes, S. C., Spiteri, E., Nicod, J., Groszer, M., Taylor, J. M., Davies, K. E., … Fisher, S. E. (2007).
High-throughput analysis of promoter occupancy reveals direct neural targets of FOXP2, a gene
mutated in speech and language disorders. American Journal of Human Genetics 81(6), 1232–
1250.
White, E. J., Hutka, S. A., Williams, L. J., & Moreno, S. (2013). Learning, neural plasticity and
sensitive periods: Implications for language acquisition, music training and transfer across the
lifespan. Frontiers in Systems Neuroscience 7, 90. Retrieved from
https://doi.org/10.3389/fnsys.2013.00090
Whitney, O., Pfenning, A. R., Howard, J. T., Blatti, C. A., Liu, F., Ward, J. M., … Jarvis, E. D.
(2014). Core and region-enriched networks of behaviorally regulated genes and the singing
genome. Science 346(6215), 1256780.
Wood, W. E., Olson, C. R., Lovell, P. V., & Mello, C. V. (2008). Dietary retinoic acid affects song
maturation and gene expression in the song system of the zebra finch. Developmental
Neurobiology 68(10), 1213–1224.
Zhang, G., Li, C., Li, Q., Li, B., Larkin, D. M., Lee, C., … Wang, J. (2014). Comparative genomics
reveals insights into avian genome evolution and adaptation. Science 346(6215), 1311–1320.
CHAPTER 19

BRAIN RESEARCH IN
MUSIC PERFORMANCE

ECKART ALTENMÜLLER, SHINICHI FURUYA, DANIEL S.
SCHOLZ, AND CHRISTOS I. IOANNOU

Music performance is based on extensive training and playing experience.
It provides an excellent model for studying changes in brain functions and
structures along with increasing expertise, a phenomenon usually referred
to as plasticity of the human brain. Especially in professional musicians,
the demands placed on the nervous system by music performance are very high
and provide a uniquely rich multisensory and motor experience to the
player. As confirmed by neuroimaging studies, playing music depends on a
strong coupling of perception and action mediated by sensory, motor, and
multimodal integration areas distributed throughout the brain. A pianist, for
example, must draw on a whole set of complex skills, including translating
visual analysis of musical notation into motor actions, coordinating
multisensory information with bimanual motor activity, developing fine
motor skills in both hands coupled with metric precision, and monitoring
auditory feedback to fine-tune a performance as it progresses.
In this chapter, we summarize research on the effects of musical training
on brain function, brain connectivity, and brain structure. First, we address
factors inducing and continuously driving brain plasticity in dedicated
musicians, arguing that prolonged goal-directed practice, multisensory–
motor integration, high arousal, and emotional and social rewards
contribute to these plasticity-induced brain adaptations. Subsequently, we
briefly review the neuroanatomy and neurophysiology underpinning
musical activities by focusing on the perception of sound, integration of
sound and movement, and the physiology of motor planning and motor
control. We then review the literature on functional changes in
brain activation and brain connectivity along with the acquisition of musical
skills. In the following section, we focus on structural adaptations in the gray
matter of the brain and in fiber tract density associated with music learning.
We critically discuss the findings that structural changes are mostly seen
when starting musical training after the age of 7 years, whereas functional
optimization is more effective before this age. Finally, we briefly address
the phenomenon of de-expertise, reviewing studies which provide evidence
that intensive music-making can induce dysfunctional changes which are
accompanied by a degradation of skilled motor behavior, also termed
“musician’s dystonia” (see Peterson & Altenmüller, this volume). This
condition, which is frequently highly disabling, mainly affects male
classical musicians with a history of compulsive working behavior, anxiety
disorder, or chronic pain. We conclude with a concise summary of the role
of brain plasticity, meta-plasticity, and maladaptive plasticity in the
acquisition and loss of musicians’ expertise.

Performing Music Drives Brain Plasticity

Performing music at a professional level is one of the most demanding and
fascinating human experiences. Singing and playing an instrument involve
the precise execution of very fast and, in many instances, extremely
complex movements that must be structured and coordinated with
continuous auditory, somatosensory, and visual feedback. Furthermore, it
requires retrieval of musical, motor, and multisensory information from
both short-term and long-term memory and relies on continuous planning of
an ongoing performance in working memory. The consequences of motor
actions have to be anticipated, monitored, and adjusted almost in real-time
(Brown, Penhune, & Zatorre, 2015). At the same time, music should be
expressive, requiring the performance to be enriched with a complex set of
innate and acculturated emotional gestures.
Practice is required to develop all of these skills and to execute these
complex tasks. Ericsson and colleagues (Ericsson, Krampe, & Tesch-
Römer, 1993) undertook one of the most influential studies on practice,
with students at the Berlin Academy of Music. They considered not only
time invested in practice but also quality of practice, and proposed the
concept of “deliberate practice” as a prerequisite for attaining excellence.
Deliberate practice combines goal-oriented, structured, and effortful
practicing with motivation, resources, and focused attention. Ericsson and
colleagues argued that a major distinction between professional and
amateur musicians, and generally between more successful versus less
successful learners, is the amount of deliberate practice undertaken during
the many years required to develop instrumental skills to a high level
(Ericsson & Lehmann, 1996). Extraordinarily skilled musicians therefore
exert a great deal more effort and concentration during their practice than
less skilled musicians, and are more likely to plan, imagine, monitor, and
control their playing by focusing their attention on what they are practicing
and how it can be improved. Furthermore, they tend to build up a
network of supportive peers, frequently involving family and friends.
The concept of deliberate practice has been refined since it became clear
that not only the amount of deliberate practice, but also the point in life at
which intense goal-directed practice begins are important variables. In the
auditory domain, for example, critical periods—“windows of
opportunity”—exist for the acquisition of so-called “absolute” or “perfect”
pitch. Absolute pitch denotes the ability to name pitches without a reference
pitch. It is mediated by auditory long-term memory and is strongly linked to
intense early musical experience, usually before the age of 7 years
(Baharloo, Johnston, Service, Gitschier, & Freimer, 1998; Miyazaki, 1988;
Sergeant, 1968). However, genetic predisposition may play a role since
absolute pitch is more common in certain East Asian populations and may
run in families (Baharloo, Service, Risch, Gitschier, & Freimer, 2000;
Gregersen, Kowalsky, Kohn, & Marvin, 2001). In the sensorimotor domain,
early practice before age 7 years leads to optimized and more stable motor
programs (Furuya, Klaus, Nitsche, Paulus, & Altenmüller, 2014) and to
smaller yet more efficient neuronal networks, compared to practice
commencing later in life (Vaquero et al., 2016). This means that for specific
sensorimotor skills, such as fast and independent finger movements,
sensitive periods exist during development and maturation of the central
nervous system, comparable to those for auditory and somatosensory skills
(Ragert, Schmidt, Altenmüller, & Dinse, 2003).
The issue of nature vs. nurture (genetic predisposition vs.
environmental influences and training) in musical skills is complex, since
the success of training is itself subject to genetic variability. General
observation suggests that outcomes will not be identical for all individuals
receiving the same amount of training. Evidence supporting the
contribution of pre-existing individual differences comes from a large
Swedish twin study showing that the propensity to practice is partially
heritable (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014).
Corrigall and colleagues investigated the contribution of cognitive and
personality variables to music training, showing that those who engage in
music perform better on cognitive tasks, have better-educated parents, and
describe themselves as more “open to experience” on personality scales
(Corrigall, Schellenberg, & Misura, 2013). Findings are also beginning to
accumulate in the music performance domain, indicating that learning
outcomes can be predicted in part based on pre-existing structural or
functional brain features (Herholz, Coffey, Pantev, & Zatorre, 2016). A
convincing example of dysfunctional genetic predisposition is the inability
to acquire auditory skills in congenital amusia, a hereditary condition
characterized by absent or highly deficient pitch perception (Gingras,
Honing, Peretz, Trainor, & Fisher, 2015). In the sensorimotor domain,
musician’s dystonia, the loss of motor control in skilled movements while
playing an instrument, has a strong genetic background in about one-third
of affected musicians (Schmidt et al., 2009).
On the other hand, training is clearly necessary for musical expertise,
with a large number of researchers reporting that the length of musical
experience is strongly correlated with performance on a range of musical
tasks, as well as with brain function and structure (Amunts et al., 1997;
Bengtsson et al., 2005; Bermudez, Lerch, Evans, & Zatorre, 2008; Chen,
Penhune, & Zatorre, 2008a; Oechslin, Imfeld, Loenneker, Meyer, & Jäncke,
2010). Predispositions and experience contribute to musical expertise, and
the relative balance between the two factors may differ in specific aspects
of the many different musical subskills. Furthermore, it seems that there
exist early sensitive periods during which musical stimulation or training of
subskills has to take place in order to establish fertile ground for growing
extraordinary expertise later in life. This is best illustrated by the scaffold
metaphor (Steele, Bailey, Zatorre, & Penhune, 2013). An early start to
training develops the “scaffold” for building a “skyscraper-like” level of
expertise later in life, whereas a late start of training allows only for
moderate results even after long and intense training. Of course these
scaffolds may differ from one domain to the next. For example, an
outstanding virtuoso like the legendary pianist Lang Lang, known for his
breathtaking finger dexterity, may require both highly relevant inherited
traits and intense early sensorimotor training. Other musicians such as the
late French singer Edith Piaf, known for her emotional expressivity but
somewhat lacking in technique, may have started technical exercises late in
life but had genetic and biographical conditions allowing her to build up
emotional depth, a character trait we feel and value, despite the difficulty in
operationalizing it for precise study.
Performing music at a professional level relies on a range of subskills,
which are represented in different, though overlapping, brain networks.
Auditory skills such as the abovementioned perfect pitch, sensitivity to
timing variations (e.g., “groove”) and to micro-pitches (e.g., tuning of a
violin), or auditory long-term memory (e.g., memorizing a 12-tone series),
are mainly processed in the temporal lobes of both hemispheres with a right
hemisphere bias (Zatorre, 2001). However, signs of auditory and musical
expertise can already be detected in the ascending auditory pathway at the
brainstem level (Skoe & Kraus, 2013). Sensorimotor skills, such as low
two-point discrimination thresholds (the ability to discern that two nearby
objects touching the skin are two distinct points) and high tactile sensitivity
(e.g., left fifth finger in professional violinists), bimanual or four-limb
coordination (e.g., for piano and organ playing), fast finger movements
(e.g., right hand arpeggios on the classical guitar), or complex hand
postures (e.g., left hand on the electric guitar), are represented in premotor,
motor, and parietal cortical areas, and in subcortical brain structures such as
the basal ganglia and the cerebellum (Altenmüller & Furuya, 2015).
Emotional and performance skills are supported by individualized
prefrontal and orbitofrontal cortical regions and by the limbic system. Self-
monitoring, anticipation of the consequences of one’s actions, motivation,
and focusing attention (all contributing to goal-directed “deliberate”
practice), recruit a highly diverse network, including lateral prefrontal
cortices, parietal cortices, limbic structures, and particularly motivational
pathways, including the accumbens nucleus, and memory structures such as
the hippocampus (Zatorre & Salimpoor, 2013). All of these regions and the
interconnecting nerve fibers are subject to modifications in function and
structure in association with musical practice, a phenomenon which is based
on brain plasticity.
Brain plasticity denotes the general ability of our central nervous
system to adapt throughout the lifespan to changing environmental
conditions, body biomechanics, and new tasks. Brain plasticity is most
typically observed for complex tasks with high behavioral relevance
activating circuits involved in emotion, motivation, and reward. The
continued activities of accomplished musicians are ideal for providing the
prerequisites of brain plasticity (for a review see Schlaug, 2015). In musical
expertise, the abovementioned processes are accompanied by changes in the
function of the brain’s neuronal networks, resulting from a strengthening of
synaptic connections, and by changes in its gross structure. With respect to
mechanisms and microstructural effects of plasticity, our understanding of
the molecular and cellular processes underlying these adaptations is far
from complete. Brain plasticity may occur on different time scales. For
example, the efficiency and size of synapses may be modified in a time
window of seconds to minutes, while the growth of new synapses and
dendrites may require hours to days. An increase in gray matter density,
which mainly reflects an enlargement of neurons due to increased
metabolism, needs at least several weeks. White matter density also
increases as a consequence of musical training. This effect is primarily due
to an enlargement of myelin cells which wrap around the nerve fibers
(axons), greatly contributing to the velocity of the electrical
impulses traveling along them. Under conditions requiring rapid
information transfer and high temporal precision these myelin cells adapt
by growing, and as a consequence nerve conduction velocity increases.
Finally, brain regions involved in specific tasks may be enlarged after long-
term training due to the growth of structures supporting nervous function,
for example, in the blood vessels that are necessary for oxygen and glucose
transportation (for a comprehensive review see Taubert, Villringer, &
Ragert, 2012).
There are four main reasons why we believe that these effects on brain
plasticity are more pronounced in music performance than in other skilled
activities. First, the intensity of goal-directed training is extremely high;
students admitted to a German state conservatory have spent an average of
10 years and 10,000 hours of deliberate practice in order to pass the
demanding entrance examinations (Ericsson et al., 1993). Second, related to
the above, musical training in those individuals who later become
professional musicians usually starts very early, sometimes before age 6
years when the adaptability of the central nervous system is at its highest.
Third, musical activities are strongly linked to conditions of high arousal
and positive emotions, but also to stressors such as music performance
anxiety. Neuroactive hormones, such as adrenalin (arousal), endorphins
(joy), dopamine (rewarding experience), and stress hormones (fear of
failure) support neuroplastic adaptations. Fourth, performing music in
public is frequently accompanied by strong social feelings best described as
a sense of connectedness and meaning. As a consequence, increased release
of oxytocin and serotonin will similarly enhance plastic adaptations
(Zatorre & Salimpoor, 2013).
However, we should be careful in claiming that music produces more
prominent plastic adaptations in the brain compared to other skilled
activities, as the methodology of group comparisons in brain plasticity
research might produce a bias. For example, group comparisons of
professional classical pianists with “non-musicians,” such as our own
study (Vaquero et al., 2016), might be influenced by differences in
sample homogeneity. As opposed to many skilled activities, such as playing
golf or other sports, or creative professions such as writing or
painting, classical pianists experience similar acculturation from a very
young age and take part in highly homogeneous activities due to the
canonical nature of their training: they study similar etudes by Hanon,
Czerny, and Chopin for many years, and this may well produce more
uniform brain adaptations that dominate any individual changes. In other
pursuits such as the visual arts, creative writing, architecture, jazz
improvisation, and music composition, individualized training may produce
more diverse effects that are masked in group statistics.
Brain Regions Involved in Performing Music: A Quick Overview

Playing a musical instrument or singing at a professional level requires
highly refined auditory, sensorimotor, and emotional-communicative skills
that are acquired over many years of extensive training, and that have to be
stored and maintained through further regular practice. Auditory feedback
is needed to improve and perfect performance, and activity of emotion-
related brain areas is required to render a performance vivid and touching.
Performance-based music-making therefore relies primarily on a highly
developed auditory–motor–emotion integration capacity, which is reflected
on the one hand in increased neuronal connectivity and on the other hand in
functional and structural adaptations of brain areas supporting these
activities. In the following, we give a quick overview of the many brain
regions involved in making music (for a review see Brown et al., 2015).
Music perception involves primary and secondary auditory areas (A1,
A2) and auditory association areas (AA) in the two temporal lobes. The
primary auditory area, localized in the upper portion of the temporal lobe in
Heschl’s gyrus, receives its main input from the inner ears via the ascending
auditory pathway. It is mainly involved in basic auditory processing such as
pitch and loudness perception, perception of time structures, and spectral
decomposition. The left primary auditory cortex is specialized in the rapid
analysis of time structures, such as differences in voice onset times when
articulating “da” or “ta.” The right, on the other hand, deals primarily with
the spectral decomposition of sounds. The secondary auditory areas
surround the primary area in a belt-like formation. More complex auditory
features such as timbre are processed in the secondary auditory areas
(Koelsch, 2011). Finally, in the auditory association areas, auditory gestalt
perception takes place. Auditory gestalts can be understood, for example, as
pitch-time patterns like melodies and words. In about 95 percent of right-
handers, and in roughly 70 percent of left-handers, Wernicke’s area in the left posterior portion
of the upper temporal lobe is specialized in language decoding (Kraus,
McGee, & Koch, 1998).
In contrast to the early auditory processing of simple acoustic structures,
listening to music is a far more complex task. Music is experienced not only
as an acoustic structure over time, but also as patterns, associations,
emotions, expectations, and so on. Such experiences rely on a complex set
of perceptive, cognitive, and emotional operations. Integrated over time,
and frequently linked to biographic memories, they enable us to experience
strong emotions, processed in structures of the limbic system such as the
ventral tegmental area of the mesencephalon or the accumbens nucleus in
the basal forebrain (Salimpoor et al., 2013). Memories and social emotions
evoked during music listening and playing involve the hippocampus, deep
in the temporal lobe, and the dorsolateral prefrontal cortex, mainly in the
right hemisphere.
Making music relies on voluntary skilled movements which involve
four cortical regions in both hemispheres: the primary motor area (M1)
located in the precentral gyrus directly in front of the central sulcus; the
supplementary motor area (SMA), located anterior to M1 in the frontal
lobe and on the inner (medial) side of the cortex; the cingulate motor area
(CMA) below the SMA and above the corpus callosum on the inner
(medial) side of the hemisphere; and the premotor area (PMA), which is
located adjacent to the lateral aspect of the primary motor area (see Fig. 1).
FIGURE 1. Brain regions involved in sensory and motor music processing. (The abbreviation “a”
stands for “area.”) Left hemisphere is shown in the foreground (lower right); right hemisphere in the
background (upper left). The numbers relate to the respective Brodmann’s areas, a labeling of
cortical regions according to the fine structure of the nervous tissue.

SMA, PMA, and CMA can be described as secondary motor areas
because they are used to process movement patterns rather than simple
movements. In addition to cortical regions, the motor system includes the
subcortical structures of the basal ganglia, and the cerebellum. Steady
kinaesthetic feedback is also required to control any guided motor action
and comes from the primary somatosensory area (S1) behind the central
sulcus in the parietal lobe. This lobe is involved in many aspects of
movement processing and is an area where information from multiple
sensory regions converges. In the posterior parietal area, body coordinates
in space are monitored and calculated, and visual information is transferred
into these coordinates. As far as musicians are concerned, this area is
prominently activated during tasks involving multisensory integration, for
example, during sight-reading, the playing of complex pieces of music
(Haslinger et al., 2005), and the transformation of musical pitch information
into movement coordinates (Brown et al., 2013) and of musical notation
into corresponding motor actions (Stewart et al., 2003).
The primary motor area (M1) represents the movements of body parts
distinctly, in systematic order. The representation of the leg is located on the
top and the inner side of the hemisphere, the arm in the upper portion, and
the hand and mouth in the lower portion of M1. This representation of
distinct body parts in corresponding brain regions is called “somatotopic”
or “homuncular” order. Just as the motor homunculus is represented upside-
down, so too is the sensory homunculus on the other side of the central
sulcus. The proportions of both the motor and the sensory homunculi are
markedly distorted since they are determined by the density of motor and
sensory innervations of the respective body parts. For example, control of
fine movements of the tongue requires many more nerve fibers transmitting
information to this muscle, compared to control of the muscles of the back.
Therefore, the hand, lips, and tongue require almost two-thirds of the
neurons in this area (Roland & Zilles, 1996). However, as further explained
below, the relative representation of body parts may be modified by usage.
Moreover, the primary motor area does not simply represent individual
muscles; multiple muscular representations are arranged in a complex way
so as to allow the execution of simple types of movements rather than the
activation of a specific muscle. This process is a consequence of the fact
that a two-dimensional array of neurons in M1 has to code for three-
dimensional movements in space (Gentner & Classen, 2006). Put more
simply, our brain does not represent muscles but rather movements.
The supplementary motor area (SMA) is mainly involved in the
sequencing of complex movements and in the triggering of movements
based on internal cues. It is particularly engaged when the execution of a
sequential movement depends on internally stored and memorized
information. It therefore is important for both rhythm and pitch processing
because of its role in sequencing and the hierarchical organization of
movement (Hikosaka & Nakamura, 2002). Skilled musicians and non-
musicians engage the SMA either when performing music or when
imagining listening to or performing music (de Manzano & Ullén, 2012;
Herholz & Zatorre, 2012). This finding suggests that the SMA may be
crucial for experts’ ability to plan music segment-by-segment during
performance.
The premotor area (PMA) is primarily engaged when our motor system
has to react to external stimuli, such as acoustic or visual prompts.
Anticipation, planning, and preparation of movement patterns in response to
visual cues have been attributed to the function of PMA (Stetson &
Anderson, 2015). It is involved in the learning, execution, and recognition
of limb movements and seems to be particularly concerned with the
integration of visual information, which is necessary for movement
planning. The PMA is also responsible for processing complex rhythms
(Chen, Penhune, & Zatorre, 2008b).
The function of the cingulate motor area (CMA) is still under debate.
Electrical stimulation and brain imaging studies demonstrate its
involvement in movement selection in situations when movements are
critical to obtain reward or avoid punishment. This fact points towards close
links between the cingulate gyrus and the emotion processing limbic
system. The CMA may therefore play an important role in mediating
cortical cognitive and limbic-emotional functions, for example, in error
processing during a musical performance (Herrojo-Ruiz, Jabusch, &
Altenmüller, 2009).
The basal ganglia, located deep inside the cerebral hemispheres, are
interconnected reciprocally via the thalamus to the motor and sensory
cortices, thus constituting a loop of information flow between cortical and
subcortical areas. They are indispensable for any kind of voluntary action
and play a crucial role in organizing sequences of motor actions. The basal
ganglia are therefore the structures mainly involved in automation of skilled
movements such as sequential finger movements (Seger, 2006). Their
special function consists of selecting appropriate motor actions and
comparing the goal and course of those actions with previous experience.
The middle putamen in particular seems to be involved in storing fast and
automated movement programs. It is subject to plastic adaptations in
professional musicians. Furthermore, in the basal ganglia the flow of
information between the cortex and the limbic emotional systems, in
particular the amygdala and the accumbens nucleus, converges. It is
therefore assumed that the basal ganglia process and control the emotional
evaluation of motor behavior in terms of expected reward or punishment
(for a review see Haber, 2003).
The cerebellum is an essential contributor to the timing and accuracy of
fine-tuned movements. It is thought to play a role in correcting errors and in
learning new skills. The cerebellum has been hypothesized to be part of a
network including parietal and motor cortex that encodes predictions of the
internal models of these skills. The term “internal model” refers to a neural
process that simulates the response of the motor system in order to estimate
the outcome of a motor command. The cerebellum is connected to almost
all regions of the brain, including those important for memory and higher
cognitive functions. It has been proposed that this structure serves as a
universal control system that contributes to learning, and to optimizing a
range of functions across the brain (Ramnani, 2014).

The Effects of Musical Training on Brain Function

With advanced techniques, brain function can be precisely assessed.
Activity changes of brain networks, connectivity measures between brain
areas on small and large scales, and even the number of nerve cells
activated in response to musical stimuli can be estimated (for a review of
methodology, see Altenmüller, Münte, & Gerloff, 2004).
The neural bases of refined auditory processing in musicians are well
understood. In 1998, Pantev and colleagues provided a first indication that
extensive musical training can plastically alter receptive functions (Pantev
et al., 1998). Equivalent current dipole strength, a measure of mass neuronal
activation, was computed from evoked magnetic fields generated in
auditory cortex in response to piano tones and to pure tones of equal
fundamental frequency and loudness. In musicians, the responses to piano
tones (but not to pure tones) were ~25 percent larger than in non-musicians.
In a study of violinists and trumpeters, this effect was most pronounced for
tones from each musician’s own type of instrument (Hirata, Kuriki, &
Pantev, 1999). In a similar way, evoked neural responses to subtle
alterations in rhythm or pitch are much more pronounced in musicians than
in non-musicians (Münte, Nager, Beiss, Schroeder, & Altenmüller, 2003).
Even functions such as sound localization that operate on basic acoustic
properties have shown effects of plasticity and expertise amongst different
groups of musicians. A conductor, more than any other musician, is likely
to depend on spatial localization for successful performance. For example,
they might need to direct their attention to a particular player in a large orchestra.
In one study, professional conductors were found to be better than pianists
and non-musicians at separating adjacent sound sources in the periphery of
the auditory field. This behavioral selectivity was paralleled by modulation
of evoked brain responses, which were selective for the attended source in
conductors, but not in pianists or non-musicians (Münte, Kohlmetz, Nager,
& Altenmüller, 2001). These functional adaptations are not restricted to the
auditory cortex, but can be observed in subcortical areas of the ascending
auditory pathway: musically trained individuals have enhanced brainstem
representations of musical sound wave-forms (Wong, Skoe, Russo, Dees, &
Kraus, 2007).
Refined somatosensory perception constitutes another basis of high-
level performance. The kinaesthetic sense is especially important. It allows
for control and feedback of muscle and tendon tension as well as joint
positions, which enables continuous monitoring of finger, hand, and lip
position in the frames of body and instrument coordinates (e.g., the
keyboard, the mouthpiece). Intensive musical training has also been
associated with an expansion of the functional representation of finger or
hand maps, as demonstrated in magnetoencephalography (MEG) studies.
For example, the somatosensory representation of the left fifth digit in
string players was found to be larger than that of non-musicians (Elbert,
Pantev, Wienbruch, Rockstroh, & Taub, 1995). Musicians who had begun
training early in life (<13 years) demonstrated larger cortical representation
of this digit compared to those who started to play their instruments later.
This finding is reflected at a behavioral level in lower two-point
discrimination thresholds at the fingertips of musicians who started their
training earlier (Ragert et al., 2003). By contrast, weight-discrimination
ability did not differ between musicians and non-musicians, although
individual differences among musicians predict fine motor control well
(Hosoda & Furuya, 2016).
In motor brain function, changes corresponding to the acquisition of
musical expertise can also be observed with electrophysiological methods.
These are mainly related to reduced motor excitability thresholds (Pascual-
Leone, 2001; Ridding, Brouwer, & Nordstrom, 2000), changes in the motor
programs encoded in the motor cortex (Gentner et al., 2010), changes in
motor receptive fields of trained motor patterns (Pascual-Leone, Grafman,
& Hallett, 1994), and changes in sensorimotor integration (Rosenkranz et
al., 2005). For example, a non-invasive brain stimulation study facilitated
fine motor control in non-musicians but degraded it in trained pianists,
suggesting a nonlinearity of cortical motor functions
(Furuya et al., 2014). In another study, auditory and premotor cortices were
co-activated when novices learned to play piano. In a longitudinal study,
Bangert and Altenmüller (2003) showed that the formation of such
multisensory connections between auditory and motor areas needs less than
six weeks of regular piano training. This finding demonstrates how brain
adaptations dynamically accompany musical learning processes. A further
causal link between training and auditory–motor integration has been
shown by findings of enhanced premotor recruitment in generating tonal
patterns after specific training on the production of those patterns (Lahav,
Saltzman, & Schlaug, 2007). This auditory–motor integration circuit is
activated in a somatotopical and time-locked manner (Furukawa, Uehara, &
Furuya, 2017).
Activation of motor co-representations can occur in trained pianists
not only by listening to piano tunes (Bangert et al., 2006), but also by
observing pianists’ finger movements. When pianists observed video
sequences of a moving hand at the piano, activation was found in additional
brain areas as compared to musically naïve subjects (Haslinger et al., 2005).
In addition to the hand area in the primary motor cortex, secondary auditory
cortices in the temporal lobe, and polymodal association cortices in the
dorsolateral premotor cortex and the parietal cortex were activated.
Furthermore, the hand areas of the cerebellum were active. This extended
neuronal network corresponds to a mirror neuron network: a group of
functionally connected areas involved in imitation of movements and
learning through observation (Rizzolatti, Fadiga, Gallese, & Fogassi, 1996).
For musical practice, it follows that careful demonstration
at the instrument may enhance learning. Teaching methods based on
demonstration and imitation are widely used at all levels of musical
training, and would appear to be particularly effective in cases where
teachers demonstrate an action or series of actions that are carefully and
methodically observed by the student.
Practicing through listening and/or observation can be considered
special cases of mental training. Narrowly defined, mental training is
understood as the vivid imagination of movement sequences without
physically performing them. As with observation of actions, essentially the
same brain regions are active as when the imagined action is actually performed; that is,
the primary motor cortex, the supplementary motor cortex, and the
cerebellum (Kuhtz-Buschbeck et al., 2003). In a study investigating mental
training of finger movement sequences of different complexities, brain
activation increased with the degree of difficulty of the imagined motor
task. Furthermore, when mental practice was continued over a period of several
days, the involved brain regions showed plastic adaptations. Although these
adaptations were less dramatic than if the motor tasks were practiced
physically, mental training produced a clear improvement in task
performance as assessed by finger-tapping tests.
Many researchers have used functional magnetic resonance imaging
(fMRI) to compare neural activities between musicians and non-musicians.
Differences in activity have been observed across many brain regions when
individuals are asked to perform musical tasks involving discrimination
(e.g., Foster & Zatorre, 2010), working memory (e.g., Gaab, Gaser, &
Schlaug, 2006), or production (Bangert et al., 2006; Kleber, Veit,
Birbaumer, Gruzelier, & Lotze, 2010). Despite the heterogeneity of the
tasks used, an area that was commonly differentially activated in many of
these studies was the posterior superior temporal gyrus, which is important
for auditory gestalt perception, spectro-temporal processing, and auditory–
motor transformations (Warren, Wise, & Warren, 2005). A recent study
identified the left superior temporal gyrus as the region that is most linked
with musical training, in terms of cumulative practice hours (Ellis et al.,
2012). As we will see below, morphometric studies have found larger
amounts of gray matter in this region related to expertise and specific
auditory skills, such as the possession of perfect pitch (Gaser & Schlaug,
2003).

The Effects of Musical Training on Brain Structure

Since the age of phrenology, neuroscientists have tried to relate
extraordinary skills to changes in brain anatomy. For example, at the
beginning of the twentieth century, Auerbach (1906–1913) reported that the
middle and posterior thirds of the superior temporal gyrus were larger than
normal in several post-mortem studies of the brains of famous musicians.
Modern brain imaging techniques such as high-resolution magnetic
resonance imaging (MRI), voxel-based morphometry (VBM), and tensor-
based morphometry (TBM) allow precise determination of gray and white
matter volume in predefined brain regions. A relatively new technique that
can be used to study differences in fiber tract volume and direction is
diffusion tensor imaging (DTI). This provides information about white
matter microstructures by measuring diffusion properties of water
molecules that move preferentially along the myelin sheaths of axons. The
directionality of diffusion is quantified as fractional anisotropy (FA), a measure
allowing the assessment of orientation and direction of axons and their
degree of myelination (Bandettini, 2009).
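To make this measure concrete, the following minimal sketch (in Python, with invented eigenvalues; an illustration only, not the analysis pipeline of any study cited here) computes FA from the three eigenvalues of a diffusion tensor:

    import numpy as np

    def fractional_anisotropy(eigenvalues):
        # FA is 0 for perfectly isotropic diffusion and approaches 1 when
        # water diffuses along a single axis, as in a coherent, strongly
        # myelinated fiber tract.
        lam = np.asarray(eigenvalues, dtype=float)
        deviation = lam - lam.mean()
        # Standard definition: normalized variance of the eigenvalues.
        return np.sqrt(1.5) * np.linalg.norm(deviation) / np.linalg.norm(lam)

    # Hypothetical eigenvalues (mm^2/s): a voxel in a coherent white matter
    # tract (fast diffusion along the axons) vs. an isotropic voxel.
    print(fractional_anisotropy([1.7e-3, 0.3e-3, 0.3e-3]))  # ~0.80
    print(fractional_anisotropy([1.0e-3, 1.0e-3, 1.0e-3]))  # 0.0

Higher FA in a tract is commonly read as tighter, more coherent fiber organization, which is why FA features prominently in the training studies discussed below.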
In the auditory domain, functional adaptations, such as increased
sensitivity to sounds, are accompanied by anatomical changes in primary or
secondary auditory cortices of the superior temporal gyrus and the temporal
plane (Bermudez et al., 2008; Gaser & Schlaug, 2003; Schneider et al.,
2005; Zatorre, Chen, & Penhune, 2007). A study by Schneider and
colleagues (2002) in professional musicians, amateurs, and non-musicians
is especially meaningful. These authors not only found an enlargement of
the primary auditory cortex (Heschl’s gyrus) related to increased cumulative
life practice time in professional musicians, but also demonstrated that this
enlargement was accompanied by more pronounced neuronal
representations for pure tones in the same region, as reflected in the dipole
size of evoked neuromagnetic fields. Behavioral tests in the same three
groups of subjects revealed that the volume of Heschl’s gyrus was
positively related to auditory working memory and gestalt
perception, as operationalized in a musical aptitude test. These behavioral,
functional, and anatomical changes seem to be causally linked to musical
training according to a longitudinal study with children (Hyde et al., 2009).
Fifteen 6-year-old children received fifteen months of piano training and
showed not only improved auditory perception, but also enlarged gray
matter of the right primary auditory cortex as compared to sixteen age-
matched controls.
In absolute pitch possessors, a pronounced leftward asymmetry of the
temporal plane was found (Schlaug, Jäncke, Huang, & Steinmetz, 1995). It
was also demonstrated that in musicians with absolute pitch, the posterior
superior temporal gyrus is connected to a region within the middle temporal
gyrus which has been associated with categorical perception (Loui, Li,
Hohmann, & Schlaug, 2011). Thus, the connections between the posterior
part of the superior temporal gyrus and the middle temporal gyrus may play
a role in determining whether or not someone develops absolute pitch,
alongside early exposure to music.
In the sensorimotor domain, extensive musical practice during
childhood and adolescence might have a strong effect on the maturation and
the development of brain structures involved. Keyboard players have been a
preferred group for studying structural brain changes, due to the high demands on
bimanual dexterity and the possibility of assessing behaviors such as the speed
and regularity of finger movements with MIDI technology (Amunts et al.,
1997; Bangert et al., 2006). In the first study that examined structural
differences between musicians and non-musicians, Schlaug and
collaborators (Schlaug, Jäncke, Huang, Staiger, & Steinmetz, 1995) showed
that professional musicians (pianists and string players) had a larger middle
section of the corpus callosum compared to a non-musician control group.
This finding was ascribed to an increase in myelination in the crossing
fibers of the hand areas of both hemispheres, related to the high demands on
bimanual coordination. Different research groups using a range of
methodological approaches have replicated this finding (Gärtner et al.,
2013; Öztürk, Tascioglu, Aktekin, Kurtoglu, & Erden, 2002; Steele et al.,
2013). A causal relationship between piano training and enlargement of the
corpus callosum was established in the abovementioned longitudinal study
by Hyde et al. (2009). Other fiber tracts have been investigated in
musicians: in a DTI study with pianists, Bengtsson and colleagues (2005)
found that the size of several white matter tracts correlated with the
estimated amount of musical practice during childhood. These structures
included the posterior limb of the internal capsule, a part of the
corticospinal tract descending from the motor cortex to the spinal cord, and
fiber tracts connecting the temporal and frontal lobes. Although the total
number of practice hours during childhood was lower than in adolescence
and adulthood, these adaptations support the idea that the central nervous
system exhibits greater plastic capacities during early stages of
development and maturation periods. However, some studies have reported
lower fractional anisotropy in musicians in the corticospinal tract
connecting primary motor areas with the spinal cord (Imfeld, Oechslin,
Meyer, Loenneker, & Jäncke, 2009), and in the arcuate fasciculus, the fiber
tract connecting auditory and premotor regions (Halwani, Loui, Rüber, &
Schlaug, 2011). According to Schlaug (2015), these discrepant results may
be explained by the fact that in musicians these fiber tracts are aligned in a
less parallel manner than in non-musicians, due to increased axonal sprouting
and more branching of axons. In the future, imaging technologies may provide a more
fine-grained picture of nervous tissues.
Concerning the size of primary motor cortex, various findings have
been reported. In pianists, the depth of the central sulcus, often used as a
marker of primary motor cortex size, was greater than in non-musicians in
both hemispheres, with the difference more pronounced in the right hemisphere,
corresponding to non-dominant left-hand function (Amunts et al., 1997;
Schlaug, 2001). It was argued that years of manual motor practice of the
non-dominant left hand produced this effect on the right hemisphere. For
the dominant right hand and left hemisphere this effect was believed to be
masked, since it undergoes some form of fine-motor training in everyone
who writes and performs other skilled sensorimotor tasks with that hand. As
was observed for the corpus callosum, the size of the primary motor cortex
was related to the age at onset of instrumental musical training, with earlier
starts associated with larger size. Again, a causal relationship was established in the
abovementioned longitudinal study in child piano novices, with an increase
in gray matter density in the right motor hand area associated with fifteen
months of piano training (Hyde et al., 2009). A recent investigation into
middle-aged pianists revealed some interesting details concerning the effect
of ongoing expertise on lifelong plasticity (Gärtner et al., 2013). Pianists
who continued to give concerts and practice for a minimum of three hours a
day showed not only larger motor hand areas, but also larger foot areas in
the sensorimotor cortices of both hemispheres than pedagogues who had
majored in piano performance, but who had practiced for less than two
hours a day over the last ten years. This result relates to the important role
of pedaling in piano performance. Pedaling is a highly refined skill
requiring spatiotemporal control in the range of millimeters and
milliseconds in order to adaptively modulate color, expressivity, and
loudness of the music.
Structural brain differences have been reported in musicians who play
different instruments (Bangert & Schlaug, 2006). The
omega-shaped folding of the precentral gyrus, which is associated with
hand and finger movement representation, was found to be more prominent
in the left hemisphere for keyboard players, but more prominent in the
right hemisphere for string players. This structural difference is likely to
reflect an adaptation to the specific demands of different musical
instruments. Obviously, the rapid and spatiotemporally precise movements
of the left hand in string players are a stronger stimulus for plastic
adaptations compared to the right hand bowing movements requiring the
fine-tuned balance of fingers at the frog of the bow, and precise movements
of wrist and arm. Gaser and Schlaug (2003) compared professional pianists,
amateur musicians, and non-musicians and reported increased gray matter
(GM) volume in professional musicians not only in primary motor,
somatosensory, and premotor areas, but also in multisensory parietal
integration areas and in cerebellar brain regions. Modeling musical
expertise with the same three-group population, James and collaborators
(James et al., 2013) reported an intricate pattern of increased/decreased
GM. In particular, musicians showed GM density increases in areas related
to higher-order cognitive processes (such as the fusiform gyrus or the
inferior frontal gyrus), whereas GM decreases were found in sensorimotor
regions (such as perirolandic and striatal areas). These reductions in GM
were interpreted as reflecting a higher degree of automaticity of motor skills
in more expert musicians. Like GM, white matter differs between
instrumentalists: for example, string players show larger FA in the corpus
callosum than pianists (Vollmann et al., 2014).
It is now well established that along with increasing expertise, not only
enlargement but also reduction of neural structure can be observed. This
was first established in a study of pianists targeting the middle putamen in
the basal ganglia, a brain region involved in automation of motor programs.
Granert and colleagues (Granert, Peller, Jabusch, Altenmüller, & Siebner,
2011) measured the skill level of piano playing via temporal accuracy in a
scale-playing task. These authors found that the higher the level of piano
playing, the smaller the volume of gray matter in the right putamen. This
reduction was ascribed to an optimization process of neuronal networks
within the putamen, leading to fewer, but more efficient and stable dendritic
and axonal connections in this area of the motor basal ganglia loop.
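To make concrete what a regularity measure derived from such a scale-playing task can look like, here is a minimal sketch (in Python, with invented MIDI onset times; the actual evenness metric used by Granert and colleagues may differ in detail):

    import numpy as np

    def ioi_unevenness(onsets_ms):
        # Temporal unevenness of a played scale: the standard deviation of
        # inter-onset intervals (IOIs) between successive keystrokes, in ms.
        # Lower values indicate more regular playing.
        iois = np.diff(np.asarray(onsets_ms, dtype=float))
        return iois.std(ddof=1)

    # Hypothetical MIDI note onsets (ms) for scale runs at a nominal
    # 125 ms per note.
    expert = [0, 126, 249, 375, 501, 624, 750, 876]
    novice = [0, 140, 242, 390, 497, 655, 741, 900]
    print(ioi_unevenness(expert))  # small SD: even playing
    print(ioi_unevenness(novice))  # larger SD: uneven playing

Such a single scalar per performance is what allows studies of this kind to correlate behavioral skill with regional gray matter volume.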
Until recently, it remained an open question to what degree these
structural and functional brain changes are influenced by age at onset of
musical activity and by cumulative practice hours over particular periods of
life. These factors have often been confounded, and it was generally
believed that early commencement of musical activity, along with increased
life practice time, resulted in enlarged neural representations underpinning
auditory or sensorimotor skills. Steele and colleagues (2013) were the first
to investigate the morphology of the corpus callosum in a way that allowed them to
compare its white matter organization in early- and late-trained musicians
who had been matched for years of training and experience. They found
that early-trained musicians had greater connectivity in the posterior part of
the corpus callosum and that fractional anisotropy in this region was related
both to age at onset of training and to sensorimotor synchronization
performance. They concluded that training before the age of 7 years
resulted in changes in white matter connectivity that may serve as a scaffold
upon which ongoing experience can build. Inspired by this work, and
because that study analyzed neither gray matter density nor the size of
specific brain areas, we designed a similar brain morphometry study in a group
of thirty-six award-winning professional pianists (Vaquero et al., 2016). We
kept cumulative life practice time constant, but split the group into twenty-
one pianists who had started their musical training before age 7, and another
group of fifteen who had started after that age. We compared brain anatomy
between these groups, and between musicians and age-matched medical
students who were non-musicians. In addition, twenty-eight pianists from
the sample completed a scale-playing task, in order for us to obtain an
objective measure of their pianistic abilities and temporal precision.
Compared with non-musicians, pianists showed more gray matter in regions
associated with learning (hippocampus), sensory and motor control and
processing (putamen and thalamus), emotional processing and the reward
system (amygdala), and with auditory and language processing (left
superior temporal cortex). However, they also showed less gray matter in
regions involved in sensory and motor control (postcentral gyrus) and
processing of musical stimuli (right superior temporal cortex), as well as
structures that have been related to music-score reading (supramarginal
gyrus). Moreover, among the pianists it was observed that the size of the
right putamen correlated significantly with the age at which music training
began: the earlier they started to play the piano, the smaller the volume of
gray matter in the right putamen (see Fig. 2). In keeping with the
interpretation of the results of Granert et al. (2011) reported above, pianists
who started earlier in life appear to have optimized the functionality of neural structures
involved in sensorimotor processing, motor learning, and motor memory.
This is reflected in the behavioral task: those pianists who had started their
musical training before age 7 played with higher regularity than those who
started after that age, even though all of the pianists practiced for the same
number of hours around the time of the study and had achieved the same
level of proficiency. This provides scientific support for common
knowledge, expressed in proverbs such as “a tree must bend while it is
young.” With respect to brain sciences, it is an interesting phenomenon,
showing that even for highly complex motor tasks, sensitive periods in the
nervous system exist (Furuya, Klaus, et al., 2014). However, as we have
seen, such windows of opportunity can depend on domain, genetics, and
continuing training.

FIGURE 2. Summary of the results of the study on pianists by Vaquero et al. (2016). Explanations
are given in the figure and text.
Reprinted from NeuroImage, 126, Lucía Vaquero, Karl Hartmann, Pablo Ripollés, Nuria
Rojo, Joanna Sierpowska, Clément François, Estela Càmara, Floris Tijmen van Vugt,
Bahram Mohammadi, Amir Samii, Thomas F. Münte, Antoni Rodríguez-Fornells, and Eckart
Altenmüller, Structural neuroplasticity in expert pianists depends on the age of musical
training onset, pp. 106–119, Copyright © 2015 Elsevier Ltd. All rights reserved.
De-Expertise: Musician’s Dystonia as a Syndrome of Maladaptive Plasticity

Brain plasticity is not always beneficial. Overtraining, fear of failure,
chronic pain, and other stressors may trigger a deterioration of motor
control and initiate a process of degradation of skill. We propose to term
this plasticity-induced loss of skills “de-expertise.” Approximately one or
two in 100 professional musicians suffer from a loss of voluntary control of
their extensively trained, refined, and complex sensorimotor skills. This
condition is generally referred to as focal dystonia, violinists’ cramp, or
pianists’ cramp. In many cases, focal dystonia is so disabling that it
prematurely ends the artist’s professional career (Altenmüller, Ioannou, &
Lee, 2015; see also Peterson & Altenmüller, this volume). The various
symptoms that can mark the beginning of the disorder include subtle loss of
control in fast passages, finger-curling, lack of precision in forked
fingerings in woodwind players, irregularity of trills, fingers sticking on the
keys, involuntary flexion of the bowing thumb in strings, and impairment of
control of the embouchure in woodwind and brass players in certain
registers. At this stage, most musicians believe that the reduced precision of
their movements is due to a technical problem. As a consequence, they
intensify their efforts, but this often only exacerbates the problem.
Musician’s dystonia (MD) has been described for almost every
instrument, including keyboard, strings, plucked instruments, woodwind,
brass, percussion, and folk instruments such as bagpipes and accordion. In a
recent study of the epidemiology of 369 German professional musicians
suffering from MD, keyboard players were most common, at 27.1 percent
(compared with 29 percent of the healthy musician population; Altenmüller,
Baur, Hofmann, Lim, & Jabusch, 2012), followed by woodwind (21.7
percent, vs. 15 percent healthy musicians) and brass players (20.9 percent,
vs. 10 percent healthy musicians). When subdividing these instrumental
groups into single instruments, piano represented 22 percent of the total,
guitar 15.2 percent, flute 9.7 percent, and violin 7.6 percent (Lee, Heiß,
Eich, Ioannou, & Altenmüller, 2018).
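A simple back-of-the-envelope calculation makes the over- and under-representation implicit in these figures explicit, by dividing each instrument group’s share among MD patients by its share among healthy musicians (a sketch in Python using only the percentages quoted above):

    # Share (%) among MD patients vs. among healthy professional musicians,
    # taken from the figures quoted in the text above.
    shares = {
        "keyboard": (27.1, 29.0),
        "woodwind": (21.7, 15.0),
        "brass":    (20.9, 10.0),
    }

    for group, (patients, healthy) in shares.items():
        # A ratio above 1 means the group is over-represented among patients.
        print(f"{group}: {patients / healthy:.2f}x its expected share")

    # keyboard: 0.93x, woodwind: 1.45x, brass: 2.09x

On these numbers, brass players appear among dystonia patients at roughly twice the rate their share of the healthy population would predict, whereas keyboard players are represented almost exactly in proportion.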
The interplay of predisposing and triggering factors in musician’s
dystonia has become a topic of intense research (Jabusch & Altenmüller,
2004; Jabusch, Müller, & Altenmüller, 2004; Ioannou & Altenmüller, 2014;
Ioannou, Furuya, & Altenmüller, 2016). As predisposing factors, male
gender and genetic susceptibility to malfunction of neuronal networks
involved in sensorimotor representations and pathways play a role (Schmidt
et al., 2009). Furthermore, psychological traits such as elevated anxiety and
obsessive-compulsive disorder (OCD) have been documented. These may
be partly interrelated: male gender and genetic susceptibility for
malfunction of sensorimotor pathways may have a common origin, such as
certain genes located on the X-chromosome. OCD and anxiety can in turn
lead to dystonia-triggering behaviors, such as overuse of muscles and
exaggerated, repetitive training. The following triggering factors have been
identified: high motor demands; extra-instrumental activities, such as
writing and typing (Baur, Jabusch, & Altenmüller, 2011); late onset of
training; playing classical music with high demands on precise
temporospatial control; increased general psychological and muscular
tension; and destabilization of overlearned sensory motor programs due to
imposed technique changes or sensory disturbances following nerve or soft-
tissue injury (for a review, see Altenmüller et al., 2015; Peterson &
Altenmüller, this volume).

Brain Changes Associated with Loss of Skilled Control

The etiology of focal dystonia is not completely understood at present, but
it is probably multifactorial. Most studies of focal dystonia reveal
abnormalities in three main areas: (a) reduced inhibition in the motor
system at cortical, subcortical and spinal levels; (b) altered sensory
perception and integration; and (c) impaired sensorimotor integration. All
of these changes are believed to primarily originate from dysfunctional
brain plasticity.
A lack of inhibition is a common finding in studies of patients with all
forms of dystonia (for a review see Lin & Hallett, 2009). Fine motor control
in general requires a subtle balance in neural circuits between excitation and
inhibition. This balance is particularly important in allowing precise and
smooth hand movements. For example, rapid individuated finger
movements in piano playing require selective and specific activation of
muscles to move the intended finger in the desired manner, and to inhibit
movements of uninvolved fingers (Furuya, Oku, Miyazaki, & Kinoshita,
2015). In patients suffering from hand dystonia, electromyographic
recordings have revealed abnormally prolonged muscle firing with co-
contraction of antagonistic muscles and overflow of activation to
inappropriate muscles (Furuya & Altenmüller, 2013). Lack of inhibition is
found at multiple levels of the nervous system. At the spinal level, it leads
to reduced reciprocal inhibition of antagonistic muscle groups producing
co-contraction, for example of wrist flexor and extensor muscles. This in
turn produces a feeling of stiffness and immobility, and frequently leads to
abnormal postures with predominant flexion of the wrist due to the relative
strength of the flexor muscles. Abnormal inhibition has also been
demonstrated at the cortical level by using non-invasive transcranial
magnetic stimulation to measure intracortical inhibition (Sommer et al.,
2002). Interestingly, at this level abnormal inhibition is frequently seen in
both hemispheres, despite unilateral symptoms. This points towards a more
generalized form of inhibition deficit. Finally, lack of inhibition is also seen
in more complex tasks, such as when movement preparation is required
prior to scale playing, and for sudden movement inhibition following a stop
signal in pianists (Herrojo-Ruiz et al., 2009). The ubiquitous demonstration
of deficient inhibition is suggestive of a common underlying genetic cause.
However, it has to be emphasized that none of these electrophysiological
effects allow diagnosis on an individual level, since the variability in both
healthy and dystonic musicians is extremely large.
Altered sensory perception may also be a sign of maladaptive brain
plasticity. Several researchers have demonstrated that the ability to perceive
two stimuli as temporally or spatially separate is impaired in patients with
musician’s dystonia, whether sensation is via the fingertips (in hand
dystonia), or the lips (in embouchure dystonia). This behavioral deficit is
mirrored in findings of the cortical somatosensory representation of fingers
or lips. It has been demonstrated with various functional brain imaging
methods that in the somatosensory cortex the topographical locations of
sensory inputs from individual fingers overlap more in patients with
musician’s cramp than in healthy controls (Elbert et al., 1998). Similarly, lip
representation may be altered in patients suffering from embouchure
dystonia (Haslinger, Altenmüller, Castrop, Zimmer, & Dresel, 2010). Other
abnormalities include elevated temporal discrimination thresholds, a marker
of basal ganglia dysfunction found to be relevant to the pathogenesis of
focal dystonia (Termsarasab et al., 2015). Since in healthy musicians an
increase of the size of sensory finger representations has been interpreted as
an adaptive plastic change to support the current needs and experiences of
the individual (see above, Elbert et al., 1995), it could be speculated that
these changes overdevelop in musicians suffering from dystonia, shifting
brain plasticity from being beneficial to maladaptive (Rosenkranz et al.,
2005). In this context it is worth recalling that local pain and intensified
sensory input due to nerve entrapment, trauma, or muscle overuse have
been described as potential triggers of dystonia. There are clear parallels in
abnormal cortical processing of sensory information and cortical
reorganization between patients with chronic pain and those with focal
dystonia. An animal model of focal dystonia established in overtrained
monkeys supports this suggestion; repetitive movements induced both types
of symptoms—pain syndromes as well as dystonic movements. Mapping of
neural receptive fields has demonstrated a distortion of cortical
somatosensory representations (Byl, Merzenich, & Jenkins, 1996),
suggesting that overtraining and practice-induced alterations in cortical
processing may play a role in focal hand dystonia.
Impaired sensorimotor integration also plays an important role in the
pathophysiology of musician’s dystonia. This is best illustrated by the
“sensory trick” phenomenon: some musicians suffering from dystonia show
a marked improvement of fine motor control when playing with a latex
glove, or when holding an object (such as a small rubber eraser) between the
fingers, thus changing the somatosensory input. In experimental settings,
vibrating stimuli lead to a worsening of musician’s dystonia. In one study,
when transcranial magnetic stimulation was used in conjunction with
muscle vibration, motor evoked potentials decreased in agonist muscles and
increased in antagonist muscles (Rosenkranz, Altenmüller, Siggelkow, &
Dengler, 2000; Rosenkranz et al., 2005). These data again suggest an
altered central integration of sensory input in musician’s dystonia, which
might be due to the failure to link the proprioceptive input to the
appropriate motor cortical area. Several retraining therapies aim to reverse
these effects of sensorimotor disintegration. Sensory
retraining in the form of tactile discrimination practice can ameliorate
motor symptoms, suggesting that the abovementioned sensory
abnormalities may drive the motor disorder. Interestingly, a positive
response to the sensory trick phenomenon is linked to a better outcome in
attempts to re-educate musicians with dystonia (Paulig, Jabusch, Großbach,
Boullet, & Altenmüller, 2014).
Innovative brain imaging techniques recently demonstrated that
musician’s dystonia is also a network disorder. Assessment of changes in
functional brain networks and in neural connectivity between different brain
regions revealed that patients with musician’s dystonia show altered
network architecture characterized by abnormal expansion or shrinkage of
neural cell assemblies. These changes include the breakdown of basal
ganglia–cerebellar interaction, loss of a pivotal region of information
transfer in the premotor cortex, and pronounced reduction of connectivity
within the sensorimotor and fronto-parietal regions (e.g., Battistella,
Termsarasab, Ramdhani, Fuertinger, & Simonyan, 2017; Strübing, Ruiz,
Jabusch, & Altenmüller, 2012). These abnormalities have been further
characterized by significant connectivity changes in the primary
sensorimotor and inferior parietal cortices. Therefore, musician’s dystonia
likely represents a disorder of large-scale functional networks. However, the
specific role of these networks and their inter-individual variability remain
to be clarified.
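To make the kind of analysis concrete: the following is a minimal sketch of how a region-by-region functional connectivity matrix of the sort compared between patients and controls can be computed, assuming the Python library nilearn; the atlas and image file names are placeholders, not data from the studies cited above.

```python
# Sketch: a region-by-region functional connectivity matrix of the kind
# compared between patients and controls. File names are placeholders.
from nilearn.maskers import NiftiLabelsMasker
from nilearn.connectome import ConnectivityMeasure

# Average the BOLD signal within each region of a parcellation atlas
masker = NiftiLabelsMasker(labels_img="atlas_parcellation.nii.gz",
                           standardize=True)
time_series = masker.fit_transform("subject_bold.nii.gz")  # (scans, regions)

# Pearson correlation between every pair of regional time courses
conn = ConnectivityMeasure(kind="correlation")
matrix = conn.fit_transform([time_series])[0]  # (regions, regions)
```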
There are currently several treatment methods available for musician’s
dystonia. Novel strategies aim to reverse the maladaptive plastic changes,
for example with inhibition of overactive motor areas on the affected side,
alongside activation of the contralateral “healthy” motor cortex, whilst
musicians perform in-phase symmetrical finger exercises on a keyboard
(Furuya, Nitsche, Paulus, & Altenmüller, 2014). Retraining may also be
successful, but usually requires several years to succeed (van Vugt, Boullet,
Jabusch, & Altenmüller, 2014). Symptomatic treatment through temporary
weakening of the cramping muscles by injecting botulinum toxin has
proven to be helpful in other cases; however, since the injections need to be
applied regularly every three to five months during one’s career, this
approach does not offer a good solution for young patients. Thus, the
challenge is to prevent young musicians from acquiring such a disorder.
The components of such a prevention program include reasonable practice
schedules, economic technique, prevention of muscle overuse and pain,
mental practice, avoidance of exaggerated perfectionism, and psychological
support with respect to self-confidence.

Brain Plasticity Promoting and Restricting Expert Performance in Musicians

In the preceding paragraphs, we have demonstrated how musical activities, such as learning to master an instrument and to perform in public, induce
brain plasticity. These central nervous adaptations are in most cases
beneficial but in some circumstances may be detrimental, as illustrated in
musician’s dystonia. Age at commencement of practice, amount and quality
of practice, genetic predisposition, and accompanying conditions, such as
stressors or muscular overuse, determine the quality and nature (adaptive or
maladaptive) of these brain changes. Furthermore, the brain’s “sensitive
periods,” when it is best shaped, seem not only to depend on hereditary
factors but also to vary in different sensory, motor, and cognitive domains.
We propose the concept of meta-plasticity, conceptualized above with a
scaffold metaphor: early musical training stabilizes the sensorimotor system
and provides neuroprotective effects with respect to the development of
focal dystonia (see Fig. 3). These effects are maintained for the whole
lifespan: those musicians who start early not only develop superior auditory
and sensorimotor skills, they also show less age-induced decline of
sensorimotor and cognitive functions (Krampe & Ericsson, 1996; Meinz,
2000). Thus, intense musical training in childhood can bring about lifelong
change in both structure and functions of auditory, sensorimotor, and
emotional systems. These not only enhance musical skill acquisition and
guard against disorders triggered by extensive training, but also serve as
ingredients for better shaping lifelong neuronal development.
FIGURE 3. Different time courses of skill acquisition in earlier and later “starters,” and in
musicians suffering from dystonia. Neuronal networks underpinning specific skills are optimized in
childhood. This allows more effective acquisition during pre-adolescence and adolescence. Skill
level continues to improve during adulthood, and stabilizes as related activities cease. Early-
optimized neuronal networks are more stable and less susceptible to maladaptive changes, such as
occur in musician’s dystonia. Here late inception of training and specific trigger factors may lead to
a deterioration of sensorimotor skills.

We would like to conclude our chapter with a general remark. As emphasized above, the complex neurophysiological processes involved in
musical training and expert performance are not restricted to sensorimotor
brain circuits, but also involve memory, imagination, creativity, and—most
importantly—emotional communicative skills. The most brilliant virtuosos
will not move their listeners if imagination, color, fantasy, and emotion are
not part of their artistic expression. These qualities are often not trained
solely within a practice studio, but depend on and may be linked to
experience from daily life, human relationships, a rich artistic environment,
and emotional depth. Such factors that profoundly influence the aesthetic
quality of music performance can be subject to expertise research; however,
they are presently inaccessible to neuroscientific methodology. Important
steps here will include the development of more fine-grained imaging
technologies and the integration of findings from brain morphology, nerve
cell metabolism, connectivity measures, and neurotransmitter activities, at
the individual level (Amunts & Zilles, 2015). Along with experimental
paradigms that include meaningful behavioral measures, such research may
eventually enable us to uncover the secrets of musical creativity and its
emotional power.

Acknowledgments
This work was in part supported by the DYSTRACT grant, awarded from
the BMBF to authors EA and DS.

References
Altenmüller, E., Baur, V., Hofmann, A., Lim, V. K., & Jabusch, H. C. (2012). Musician’s cramp as
manifestation of maladaptive brain plasticity: Arguments from instrumental differences. Annals of
the New York Academy of Sciences 1252, 259–265.
Altenmüller, E., & Furuya, S. (2015). Planning and performance. In S. Hallam, I. Cross, & M. Thaut
(Eds.), The Oxford handbook of music psychology (2nd ed., pp. 529–546). Oxford: Oxford
University Press.
Altenmüller, E., Ioannou, C. I., & Lee, A. (2015). Apollo’s curse: Neurological causes of motor
impairments in musicians. Progress in Brain Research 217, 89–106.
Altenmüller, E., Münte, T. F., & Gerloff, C. (2004). Neurocognitive functions and the EEG. In E. Niedermeyer & F. Lopes da Silva (Eds.), Electroencephalography (5th ed., pp. 661–682). Baltimore, MD: Lippincott Williams & Wilkins.
Amunts, K., Schlaug, G., Jäncke, L., Steinmetz, H., Schleicher, A., Dabringhaus, A., & Zilles, K.
(1997). Motor cortex and hand motor skills: Structural compliance in the human brain. Human
Brain Mapping 5(3), 206–215.
Amunts, K., & Zilles, K. (2015). Architectonic mapping of the human brain beyond Brodmann.
Neuron 88(6), 1086–1107.
Auerbach, S. (1906–1913). Zur Lokalisation des musikalischen Talentes im Gehirn und am Schädel.
Archives of Anatomy and Physiology (1906, 197–230; 1908, 31–38; 1911, 1–10; 1913 (Suppl.),
89–96).
Baharloo, S., Johnston, P. A., Service, S. K., Gitschier, J., & Freimer, N. B. (1998). Absolute pitch:
An approach for identification of genetic and nongenetic components. American Journal of
Human Genetics 62(2), 224–231.
Baharloo, S., Service, S. K., Risch, N., Gitschier, J., & Freimer, N. B. (2000). Familial aggregation of
absolute pitch. American Journal of Human Genetics 67(3), 755–758.
Bandettini, P. A. (2009). What is new in neuroimaging methods? Annals of the New York Academy of
Sciences 1156, 260–293.
Bangert, M., & Altenmüller, E. (2003). Mapping perception to action in piano practice: A
longitudinal DC-EEG-study. BMC Neuroscience 4, 26–36.
Bangert, M., Peschel, T., Rotte, M., Drescher, D., Hinrichs, H., Schlaug, G., … Altenmüller, E.
(2006). Shared networks for auditory and motor processing in professional pianists: Evidence from
fMRI conjunction. NeuroImage 30(3), 917–926.
Bangert, M., & Schlaug, G. (2006). Specialization of the specialized in features of external brain
morphology. European Journal of Neuroscience 24(6), 1832–1834.
Battistella, G., Termsarasab, P., Ramdhani, R. A., Fuertinger, S., & Simonyan, K. (2017). Isolated
focal dystonia as a disorder of large-scale functional networks. Cerebral Cortex 27(2), 1203–1215.
Baur, V., Jabusch, H. C., & Altenmüller, E. (2011). Behavioral factors influence the phenotype of
musician’s dystonia. Movement Disorders 26(9), 1780–1781.
Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano
practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9),
1148–1150.
Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2008). Neuroanatomical correlates of
musicianship as revealed by cortical thickness and voxel-based morphometry. Cerebral Cortex
19(7), 1583–1596.
Brown, R. M., Chen, J. L., Hollinger, A., Palmer, C., Penhune, V., & Zatorre, R. J. (2013). Repetition
suppression in auditory-motor regions to pitch and temporal structure in music. Journal of
Cognitive Neuroscience 25(2), 313–328.
Brown, R. M., Penhune, V. B., & Zatorre, R. (2015). Expert music performance: Cognitive, neural,
and developmental bases. Progress in Brain Research 217, 57–86.
Byl, N. N., Merzenich, M. M., & Jenkins, W. M. (1996). A primate genesis model of focal dystonia
and repetitive strain injury: I. Learning-induced dedifferentiation of the representation of the hand
in the primary somatosensory cortex in adult monkeys. Neurology 47(2), 508–520.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008a). Moving on time: Brain network for auditory-
motor synchronization is modulated by rhythm complexity and musical training. Journal of
Cognitive Neuroscience 20(2), 226–239.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008b). Listening to musical rhythms recruits motor
regions of the brain. Cerebral Cortex 18(12), 2844–2854.
Corrigall, K. A., Schellenberg, E. G., & Misura, N. M. (2013). Music training, cognition, and
personality. Frontiers in Psychology 4, 222. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00222
De Manzano, Ö., & Ullén, F. (2012). Activation and connectivity patterns of the presupplementary
and dorsal premotor areas during free improvisation of melodies and rhythms. NeuroImage 63(1),
272–280.
Elbert, T., Candia, V., Altenmüller, E., Rau, H., Rockstroh, B., Pantev, C., & Taub, E. (1998).
Alteration of digital representations in somatosensory cortex in focal hand dystonia. Neuroreport
9(16), 3571–3575.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical
representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Ellis, R. J., Norton, A., Overy, K., Winner, E., Alsop, D., & Schlaug, G. (2012). Differentiating
maturational and training influences on fMRI activation during music processing. NeuroImage
60(3), 1902–1912.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the
acquisition of expert performance. Psychological Review 100(3), 363–406.
Ericsson, K. A., & Lehmann, A. C. (1996). Expert and exceptional performance: Evidence of
maximal adaptation to task constraints. Annual Review of Psychology 47, 273–305.
Foster, N. E., & Zatorre, R. J. (2010). A role for the intraparietal sulcus in transforming musical pitch
information. Cerebral Cortex 20(6), 1350–1359.
Furukawa, Y., Uehara, K., & Furuya, S. (2017). Expertise-dependent motor somatotopy of music
perception. Neuroscience Letters 650, 97–102.
Furuya, S., & Altenmüller, E. (2013). Finger-specific loss of independent control of movements in
musicians with focal dystonia. Neuroscience 247, 152–163.
Furuya, S., Klaus, M., Nitsche, M. A., Paulus, W., & Altenmüller, E. (2014). Ceiling effects prevent
further improvement of transcranial stimulation in skilled musicians. Journal of Neuroscience
34(41), 13834–13839.
Furuya, S., Nitsche, M. A., Paulus, W., & Altenmüller, E. (2014). Surmounting retraining limits in
musicians’ dystonia by transcranial stimulation. Annals of Neurology 75(5), 700–707.
Furuya, S., Oku, T., Miyazaki, F., & Kinoshita, H. (2015). Secrets of virtuoso: Neuromuscular
attributes of motor virtuosity in expert musicians. Scientific Reports 5, 15750.
Gaab, N., Gaser, C., & Schlaug, G. (2006). Improvement-related functional plasticity following pitch
memory training. NeuroImage 31(1), 255–263.
Gärtner, H., Minnerop, M., Pieperhoff, P., Schleicher, A., Zilles, K., Altenmüller, E., & Amunts, K.
(2013). Brain morphometry shows effects of long-term musical practice in middle-aged keyboard
players. Frontiers in Psychology 4, 636. Retrieved from https://doi.org/10.3389/fpsyg.2013.00636
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians.
Journal of Neuroscience 23(27), 9240–9245.
Gentner, R., & Classen, J. (2006). Modular organization of finger movements by the human central
nervous system. Neuron 52(4), 731–742.
Gentner, R., Gorges, S., Weise, D., aufm Kampe, K., Buttmann, M., & Classen, J. (2010). Encoding
of motor skill in the corticomuscular system of musicians. Current Biology 20(20), 1869–1874.
Gingras, B., Honing, H., Peretz, I., Trainor, L. J., & Fisher, S. E. (2015). Defining the biological
bases of individual differences in musicality. Philosophical Transactions of the Royal Society of
London B: Biological Sciences 370(1664), 20140092.
Granert, O., Peller, M., Jabusch, H. C., Altenmüller, E., & Siebner, H. R. (2011). Sensorimotor skills
and focal dystonia are linked to putaminal grey-matter volume in pianists. Journal of Neurology,
Neurosurgery, and Psychiatry 82(11), 1225–1231.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2001). Early childhood music education
and predisposition to absolute pitch: Teasing apart genes and environment. American Journal of
Medical Genetics 98(3), 280–282.
Haber, S. N. (2003). The primate basal ganglia: Parallel and integrative networks. Journal of Chemical
Neuroanatomy 26(4), 317–330.
Halwani, G. F., Loui, P., Rüber, T., & Schlaug, G. (2011). Effects of practice and experience on the
arcuate fasciculus: Comparing singers, instrumentalists, and non-musicians. Frontiers in
Psychology 2, 156. Retrieved from https://doi.org/10.3389/fpsyg.2011.00156
Haslinger, B., Altenmüller, E., Castrop, F., Zimmer, C., & Dresel, C. (2010). Sensorimotor
overactivity as a pathophysiologic trait of embouchure dystonia. Neurology 74(22), 1790–1797.
Haslinger, B., Erhard, P., Altenmüller, E., Schroeder, U., Boecker, H., & Ceballos-Baumann, A. O.
(2005). Transmodal sensorimotor networks during action observation in professional pianists.
Journal of Cognitive Neuroscience 17(2), 282–293.
Herholz, S. C., Coffey, E. B., Pantev, C., & Zatorre, R. J. (2016). Dissociation of neural networks for
predisposition and for training-related plasticity in auditory-motor learning. Cerebral Cortex 26(7),
3125–3134.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity:
Behavior, function, and structure. Neuron 76(3), 486–502.
Herrojo-Ruiz, M., Jabusch, H. C., & Altenmüller, E. (2009). Detecting wrong notes in advance:
Neuronal correlates of error monitoring in pianists. Cerebral Cortex 19(11), 2625–2639.
Herrojo-Ruiz, M., Senghaas, P., Grossbach, M., Jabusch, H. C., Bangert, M., Hummel, F., …
Altenmüller E. (2009). Defective inhibition and inter-regional phase synchronization in pianists
with musician’s dystonia (MD): An EEG study. Human Brain Mapping 30(8), 2689–2700.
Hikosaka, O., & Nakamura, K. (2002). Central mechanisms of motor skill learning. Current Opinion
in Neurobiology 12(2), 217–222.
Hirata, Y., Kuriki, S., & Pantev, C. (1999). Musicians with absolute pitch show distinct neural
activities in the auditory cortex. Neuroreport 10(5), 999–1002.
Hosoda, M., & Furuya, S. (2016). Shared somatosensory and motor functions in musicians. Scientific
Reports 6, 37632.
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009).
Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025.
Imfeld, A., Oechslin, M. S., Meyer, M., Loenneker, T., & Jäncke, L. (2009). White matter plasticity
in the corticospinal tract of musicians: A diffusion tensor imaging study. NeuroImage 46(3), 600–
607.
Ioannou, C. I., & Altenmüller, E. (2014). Psychological characteristics in musician’s dystonia: A new
diagnostic classification. Neuropsychology 61, 80–88.
Ioannou, C. I., Furuya, S., & Altenmüller, E. (2016). The impact of stress on motor performance in
skilled musicians suffering from focal dystonia: Physiological and psychological characteristics.
Neuropsychologia 85, 226–236.
Jabusch, H. C., & Altenmüller, E. (2004). Anxiety as an aggravating factor during onset of focal
dystonia in musicians. Medical Problems of Performing Artists 19(2), 75–81.
Jabusch, H. C., Müller, S. V., & Altenmüller, E. (2004). High levels of perfectionism and anxiety in
musicians with focal dystonia. Movement Disorders 19(10), 990–991.
James, C. E., Oechslin, M. S., Van De Ville, D., Hauert, C. A., Descloux, C., & Lazeyras, F. (2013).
Musical training intensity yields opposite effects on grey matter density in cognitive versus
sensorimotor networks. Brain Structure and Function 219(1), 53–66.
Kleber, B., Veit, R., Birbaumer, N., Gruzelier, J., & Lotze, M. (2010). The brain of opera singers:
Experience-dependent changes in functional activation. Cerebral Cortex 20(5), 1144–1152.
Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model.
Frontiers in Psychology 2, 110. Retrieved from https://doi.org/10.3389/fpsyg.2011.00110
Krampe, R. T., & Ericsson, K. A. (1996). Maintaining excellence: Deliberate practice and elite
performance in young and older pianists. Journal of Experimental Psychology: General 125(4),
331–359.
Kraus, N., McGee, T. J., & Koch, D. B. (1998). Speech sound representation, perception and
plasticity: A neurophysiologic perspective. Audiology & Neurotology 3, 168–182.
Kuhtz-Buschbeck, J. P., Mahnkopf, C., Holzknecht, C., Siebner, H., Ulmer, S., & Jansen, O. (2003).
Effector-independent representations of simple and complex imagined finger movements: A
combined fMRI and TMS study. European Journal of Neuroscience 18(12), 3375–3387.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor
recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2),
308–314.
Lee, A., Heiß, P., Eich, C., Ioannou, C. I., & Altenmüller, E. (2018). Phenomenology, risk-factors and
treatment outcome in 369 musicians with focal dystonia. Submitted to Journal of Clinical
Movement Disorders.
Lin, P. T., & Hallett, M. (2009). The pathophysiology of focal hand dystonia. Journal of Hand
Therapy 22(2), 109–114.
Loui, P., Li, H. C., Hohmann, A., & Schlaug, G. (2011). Enhanced cortical connectivity in absolute
pitch musicians: A model for local hyperconnectivity. Journal of Cognitive Neuroscience 23(4),
1015–1026.
Meinz, E. J. (2000). Experience-based attenuation of age-related differences in music cognition tasks.
Psychology and Aging 15(2), 297–312.
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors. Perception &
Psychophysics 44(6), 501–512.
Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not
make perfect: No causal effect of music practice on music ability. Psychological Science 25(9), 1795–1803.
Münte, T. F., Kohlmetz, C., Nager, W., & Altenmüller, E. (2001). Neuroperception: Superior auditory
spatial tuning in professional conductors. Nature 409(6820), 580.
Münte, T. F., Nager, W., Beiss, T., Schroeder, C., & Altenmüller, E. (2003). Specialization of the
specialized: Electrophysiological investigations in professional musicians. Annals of the New York
Academy of Sciences 999, 131–139.
Oechslin, M. S., Imfeld, A., Loenneker, T., Meyer, M., & Jäncke, L. (2010). The plasticity of the
superior longitudinal fasciculus as a function of musical expertise: A diffusion tensor imaging
study. Frontiers in Human Neuroscience 3, 76. Retrieved from
https://doi.org/10.3389/neuro.09.076.2009
Oztürk, A. H., Tascioglu, B., Aktekin, M., Kurtoglu, Z., & Erden, I. (2002). Morphometric
comparison of the human corpus callosum in professional musicians and non-musicians by using
in vivo magnetic resonance imaging. Journal of Neuroradiology 29(1), 29–34.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased
auditory cortical representation in musicians. Nature 392, 811–814.
Pascual-Leone, A. (2001). The brain that plays music and is changed by it. Annals of the New York
Academy of Sciences 930, 315–329.
Pascual-Leone, A., Grafman, J., & Hallett, M. (1994). Modulation of cortical motor output maps
during development of implicit and explicit knowledge. Science 263(5151), 1287–1289.
Paulig, J., Jabusch, H. C., Großbach, M., Boullet, L., & Altenmüller, E. (2014). Sensory trick
phenomenon improves motor control in pianists with dystonia: Prognostic value of glove-effect.
Frontiers in Psychology 5, 1012. Retrieved from https://doi.org/10.3389/fpsyg.2014.01012
Ragert, P., Schmidt, A., Altenmüller, E., & Dinse, H. R. (2003). Superior tactile performance and
learning in professional pianists: Evidence for meta-plasticity in musicians. European Journal of
Neuroscience 19(2), 473–478.
Ramnani, N. (2014). Automatic and controlled processing in the corticocerebellar system. Progress
in Brain Research 210, 255–285.
Ridding, M. C., Brouwer, B., & Nordstrom, M. A. (2000). Reduced interhemispheric inhibition in
musicians. Experimental Brain Research 133(2), 249–253.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of
motor actions. Brain Research: Cognitive Brain Research 3(2), 131–141.
Roland, P. E., & Zilles, K. (1996). Functions and structures of the motor cortices in humans. Current
Opinion in Neurobiology 6(6), 773–781.
Rosenkranz, K., Altenmüller, E., Siggelkow, S., & Dengler, R. (2000). Alteration of sensorimotor
integration in musician’s cramp: Impaired focussing of proprioception. Clinical Neurophysiology
111(11), 2036–2041.
Rosenkranz, K., Williamon, A., Butler, K., Cordivari, C., Lees, A. J., & Rothwell, J. C. (2005).
Pathophysiological differences between musician’s dystonia and writer’s cramp. Brain 128(4),
918–931.
Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J.
(2013). Interactions between the nucleus accumbens and auditory cortices predict music reward
value. Science 340(6129), 216–219.
Schlaug, G. (2001). The brain of musicians: A model for functional and structural plasticity. Annals
of the New York Academy of Sciences 930, 281–299.
Schlaug, G. (2015). Musicians and music making as a model for the study of brain plasticity.
Progress in Brain Research 217, 37–55.
Schlaug, G., Jäncke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum
size in musicians. Neuropsychologia 33(8), 1047–1055.
Schlaug, G., Jäncke, L., Huang, Y., & Steinmetz, H. (1995). In vivo evidence of structural brain
asymmetry in musicians. Science 267(5198), 699–701.
Schmidt, A., Jabusch, H. C., Altenmüller, E., Hagenah, J., Brüggemann, N., Lohmann, K., … Klein,
C. (2009). Etiology of musician's dystonia: Familial or environmental? Neurology 72(14), 1248–
1254.
Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002).
Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians.
Nature Neuroscience 5(7), 688–694.
Schneider, P., Sluming, V., Roberts, N., Scherg, M., Goebel, R., Specht, H. J., … Rupp, A. (2005).
Structural and functional asymmetry of lateral Heschl’s gyrus reflects pitch perception preference.
Nature Neuroscience 8(9), 1241–1247.
Seger, C. A. (2006). The basal ganglia in human learning. Neuroscientist 12(4), 285–290.
Sergeant, D. (1968). Experimental investigation of absolute pitch. Journal of Research in Music
Education 17(1), 135–143.
Skoe, E., & Kraus, N. (2013). Musical training heightens auditory brainstem function during
sensitive periods in development. Frontiers in Psychology 4, 622. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00622
Sommer, M., Ruge, D., Tergau, F., Beuche, W., Altenmüller, E., & Paulus, W. (2002). Spatial
distribution of intracortical inhibition and facilitation in focal dystonia. Movement Disorders 17,
1017–1025.
Steele, C. J., Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2013). Early musical training and white-
matter plasticity in the corpus callosum: Evidence for a sensitive period. Journal of Neuroscience
33(3), 1282–1290.
Stetson, C., & Anderson, C. A. (2015). Early planning in frontal and parietal cortex in a simplified
task. Journal of Neurophysiology 113(10), 3915–3922.
Stewart, L., Henson, R., Kampe, K., Walsch, V., Turner, R., & Frith, U. (2003). Brain changes after
learning to read and play music. NeuroImage 20(1), 71–83.
Strübing, F., Ruiz, M. H., Jabusch, H. C., & Altenmüller, E. (2012). Error monitoring is altered in
musician’s dystonia: Evidence from ERP-based studies. Annals of the New York Academy of
Sciences 1252, 192–199.
Taubert, M., Villringer, A., & Ragert, P. (2012). Learning-related gray and white matter changes in
humans: An update. Neuroscientist 18(4), 320–325.
Termsarasab, P., Ramdhani, R. A., Battistella, G., Rubien-Thomas, E., Choy, M., Farwell, I. M., &
Simonyan, K. (2015). Neural correlates of abnormal sensory discrimination in laryngeal dystonia.
Neuroimage Clinical 10, 18–26.
van Vugt, F. T., Boullet, L., Jabusch, H. C., & Altenmüller, E. (2014). Musician’s dystonia in
pianists: Long-term evaluation of retraining and other therapies. Parkinsonism & Related
Disorders 20(1), 8–12.
Vaquero, L., Hartmann, K., Ripolles, P., Rojo, N., Sierpowska, J., François, C., … Altenmüller, E.
(2016). Structural neuroplasticity in expert pianists depends on the age of musical training onset.
NeuroImage 126, 106–119.
Vollmann, H., Ragert, P., Conde, V., Villringer, A., Classen, J., Witte, O. W., & Steele, C. J. (2014).
Instrument specific use-dependent plasticity shapes the anatomical properties of the corpus
callosum: A comparison between musicians and non-musicians. Frontiers in Behavioral
Neuroscience 8, 245. Retrieved from https://doi.org/10.3389/fnbeh.2014.00245
Warren, J. E., Wise, R. J., & Warren, J. D. (2005). Sounds do-able: Auditory-motor transformations
and the posterior temporal plane. Trends in Neurosciences 28(12), 636–643.
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes
human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422.
Zatorre, R. J. (2001). Neural specializations for tonal processing. Annals of the New York Academy of
Sciences 930, 193–210.
Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor
interactions in music perception and production. Nature Reviews Neuroscience 8(7), 547–558.
Zatorre, R. J., & Salimpoor, V. N. (2013). From perception to pleasure: Music and its neural
substrates. Proceedings of the National Academy of Sciences 110(Suppl. 2), 10430–10437.
CHAPTER 20

BRAIN RESEARCH IN MUSIC IMPROVISATION

MICHAEL G. ERKKINEN AND AARON L. BERKOWITZ

Improvisation—the ability to spontaneously generate novel musical compositions in the moment of performance—is a fundamental aspect of
nearly all musical performance traditions across the world and throughout
history. In addition to its importance in musical performance, musical
improvisation is a specific instance of generative creativity, a capacity
necessary for day-to-day cognitive functions such as language. Several
studies over the past decade have begun to illuminate our neuroscientific
understanding of improvisational musical creativity. This chapter reviews
the experimental methods and results of these studies, what they help us to
understand about the neural substrates of improvisation, and what this
understanding elucidates about the cognitive underpinnings of creativity
more broadly (for a review of studies of improvisation in the arts beyond
music, see Beaty, Benedek, Silvia, & Schacter, 2016).
Because brain research on musical improvisation is relatively new
compared to other areas of the music cognition literature, we present a
detailed review of major studies performed to date, organized by
experimental method (functional magnetic resonance imaging (fMRI),
positron emission tomography (PET), transcranial direct current stimulation
(tDCS), and electroencephalography (EEG)). Following the summaries of
individual articles, the discussion section summarizes this literature and its
implications for the understanding of the cognitive and neurobiological
basis of improvisation.

Functional Magnetic Resonance Imaging (fMRI)

fMRI evaluates changes in blood oxygenation to determine areas of increased or decreased brain activity during cognitive tasks. Neural activity
during an experimental condition of interest is compared to neural activity
during control conditions to evaluate which region(s) of the brain appear to
be associated with the particular cognitive task being studied.
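As a concrete illustration of this contrast logic, the following is a minimal sketch assuming the Python library nilearn; the onsets, durations, repetition time, and file name are illustrative, not values from any study reviewed here.

```python
# Sketch: first-level GLM and a task-versus-control contrast. Onsets,
# durations, TR, and file names are illustrative.
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

events = pd.DataFrame({
    "onset":      [0, 30, 60, 90],          # block onsets in seconds
    "duration":   [30, 30, 30, 30],
    "trial_type": ["task", "control", "task", "control"],
})

model = FirstLevelModel(t_r=2.0, hrf_model="spm")
model = model.fit("run_bold.nii.gz", events=events)

# Voxels more active during the condition of interest than during control
z_map = model.compute_contrast("task - control", output_type="z_score")
```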

Cortical Regions Involved in the Generation of Musical Structures during Improvisation in Pianists (Bengtsson, Csikszentmihalyi, & Ullén, 2007)
Design
Eleven male concert pianists were shown template musical scores and
instructed to create new or embellished melodies based on the score (the
“Improv” condition). The subjects were asked to remember their
improvised melodies, and to reproduce them “as faithfully as possible” in
the immediately subsequent condition (the “Reproduce” condition). In 5 of
the 11 subjects, the authors also included a condition where subjects
improvised melodies based on the score, but without needing to remember
and reproduce them later (the “FreeImprov” condition). This condition was
designed to reduce the impact of increased working memory demands on
the improvisational task. The “Improv” and “Reproduce” conditions were
designed to keep the musical and motor output the same across tasks, thus
isolating the improvisational element. Behavioral measures included the
number of notes played, the difference between the template score and the
improvised performance, the types of musical modifications (e.g., grace
notes, substitutions, etc.), and the accuracy of the reproduced melodies.

Results
Behavioral: Subjects accurately reproduced their improvisations during the
“Reproduce” condition, and a comparison between “Improv” and
“Reproduce” showed no differences in musical or motor output. In the
comparison of “FreeImprov” versus “Improv,” the subjects played more
notes, for a longer period, and made more musical modifications during the
former condition.
fMRI: The comparison of “Improv” and “Reproduce” conditions
demonstrated increased signal in the right dorsolateral prefrontal cortex
(DLPFC), right pre-supplementary motor area (pre-SMA), bilateral dorsal
premotor cortex (PMd), left posterior superior temporal gyrus (STG) near
the temporoparietal junction (TPJ), right fusiform gyrus, and bilateral
middle occipital gyri. Improvisational complexity correlated with increased
activity within the pre-SMA. There were no significant differences in brain
activity in the “FreeImprov” versus “Improv” contrast.

Conclusions/Highlighted Discussion
The authors hypothesized that the DLPFC may “maintain and execute an
overall plan for the improvisation through top-down influences on the
activity in subordinate premotor areas,” thus guiding the selection of actions
best suited for a particular improvisation to be executed by the motor
system. The authors suggest the pre-SMA may be involved in the temporal
aspects of response selection during improvisation, including “decisions
about timing and rhythmic patterning.” The rostral PMd may be involved in
guiding the selection of responses based on visual cues (in this case musical
notation), and transforming those cues into motor sequences. The authors
speculate that the medially-located pre-SMA may be related to rhythm and
the lateral rostral PMd to melody, given these regions’ known roles in
temporal and visual motor integration, respectively. The posterior STG
activation was felt to represent increased processing in auditory-motor
feedback loops required to guide the improvisation based on the ongoing
performance, increased auditory monitoring and working memory demands,
or the online retrieval of musical motifs from long-term memory. The
bilateral fusiform and middle occipital activations were felt to reflect
increased visual processing of the musical scores when using them to guide
improvisation. In sum, these authors conclude that improvisation requires a
brain network relying heavily on higher-order cognitive and motor centers
in conjunction with auditory memory/feedback, and, when score-guided,
higher-order visual regions.

Generation of Novel Motor Sequences: The Neural Correlates of Musical Improvisation (Berkowitz & Ansari, 2008)
Design
Thirteen classically-trained pianists were instructed to perform under
conditions of rhythmic and melodic freedom and constraint. The melodic
constraint consisted of playing one of several simple melodies with a fixed
order of five sequential pitches, and the rhythmic constraint was playing
quarter notes on the beat with a metronome. Using a 2×2 block design,
subjects were asked to improvise novel five-note melodies with and without
rhythmic constraint, and to improvise new rhythms with and without the
melodic constraint. The effects of “rhythmic freedom” (i.e., all improvised
versus constrained rhythm tasks) and “melodic freedom” (i.e., all
improvised versus constrained melodic tasks) were assessed. Behavioral
measures included how well the subjects adhered to the rhythmic and
melodic tasks, including the inter-press interval variability, variety of note
combinations, and percentage of unique note sequences.
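Such measures are simple to compute; the following minimal sketch, using invented key-press times and note sequences, illustrates inter-press-interval variability and the percentage of unique note sequences.

```python
# Sketch: two behavioral measures of the kind described above, computed
# on invented data (times in seconds, pitches as MIDI numbers).
import numpy as np

press_times = np.array([0.00, 0.52, 1.01, 1.49, 2.03])
ipi = np.diff(press_times)                   # inter-press intervals
ipi_variability = ipi.std() / ipi.mean()     # coefficient of variation

trials = [(60, 62, 64, 65, 67),              # one five-note melody per trial
          (60, 64, 62, 67, 65),
          (60, 62, 64, 65, 67)]
pct_unique = 100 * len(set(trials)) / len(trials)  # % unique sequences
```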

Results
Behavioral: Subjects were largely adherent to the rhythmic and melodic
constraints, and demonstrated a higher percentage of unique melodic
sequences when improvising.
fMRI: Melodic improvisation was associated with increased activity in
the anterior cingulate cortex (ACC), left ventral premotor cortex (PMv) and
inferior frontal gyrus (IFG), bilateral dorsal premotor cortex (PMd), and left
cerebellum.
Rhythmic improvisation was associated with increased activity in the
left ACC, PMd, IFG/PMv, sensorimotor cortex, superior parietal lobule
(SPL), and inferior parietal lobule.
The interaction of melodic and rhythmic freedom—the condition of total
improvisational freedom—did not reveal any unique areas of activity
beyond that of melodic or rhythmic freedom considered separately.
However, a conjunction analysis evaluating which areas were involved with
both rhythmic and melodic freedom demonstrated a three-region network:
the left IFG, PMd, and ACC.
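A conjunction of this kind is commonly implemented as a minimum-statistic test across the two contrast maps. The following minimal sketch assumes the Python library nilearn; the file names and the z = 3.1 threshold are placeholders, not the study's actual values.

```python
# Sketch: minimum-statistic conjunction of two "freedom" contrasts.
# File names and the threshold are placeholders.
from nilearn.image import math_img

conjunction = math_img(
    "np.minimum(a, b) * ((a > 3.1) & (b > 3.1))",
    a="zmap_melodic_freedom.nii.gz",
    b="zmap_rhythmic_freedom.nii.gz",
)
```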

Conclusions/Highlighted Discussion
The authors suggest the areas of common activation in both rhythmic and
melodic freedom—IFG/PMv, ACC, and PMd—to be involved in “the
generation of possible musical phrases, selection among them, and
execution of the decided-upon motor output,” respectively.

Neural Substrates of Spontaneous Musical Performance: An fMRI Study of Jazz Improvisation (Limb & Braun, 2008)
Design
Six professional jazz pianists were instructed to play improvised and
rehearsed melodies both under musically constrained and unconstrained
conditions. In the constrained condition (“Scale-Improv”), the subject was
limited to notes of the C-major scale and could only play quarter notes set
to the beat of a metronome; the constrained control condition (“Scale-Ctrl”)
consisted of repeated playing of the scale, ascending then descending paced
by a metronome. In the unconstrained condition (“Jazz-Improv”), the
subject improvised melodies without rhythmic constraint to background
musical accompaniment. The control condition was performing memorized
melodies given to the musicians beforehand (“Jazz-Ctrl”) with background
accompaniment. Behavioral measures included the total number and scalar
distribution of notes.
Results
Behavioral: There were no differences in the motor output between
improvisational and control tasks in either constrained or unconstrained
condition, suggesting the functional MRI findings were not solely
attributable to the magnitude of motor or musical output.
fMRI: The comparison of improvisational tasks with controls, in both
the constrained and unconstrained conditions, was associated with increased
neural activity across an extensive network involving prefrontal (bilateral
medial frontopolar cortex, left dorsal ACC), motor (bilateral PMv, bilateral
PMd, bilateral SMA, left IFG, primary motor cortex, right cerebellum),
auditory (right STG, bilateral anterior middle temporal gyrus [MTG],
superior temporal sulcus [STS], inferior temporal gyrus [ITG]), limbic,
parietal (bilateral SMG, intraparietal sulcus [IPS], and SPL), and visual
areas (bilateral middle and superior occipital gyri, left fusiform gyrus).
There was also a widespread pattern of deactivations involving the
prefrontal (bilateral DLPFC, dorsal medial PFC, bilateral lateral
orbitofrontal cortex), limbic (bilateral hypothalamus, amygdala,
hippocampus, and parahippocampal gyri, temporopolar area, ventral
striatum), insula (bilateral posterior, middle, and anterior), motor basal
ganglia (bilateral caudate and putamen), and heteromodal sensory areas
(posterior STS, angular gyrus, PCC). The listed activations and
deactivations seen during improvisational tasks were present when
compared against both control and rest tasks, suggesting the findings are
not merely driven by increased motor and cognitive demands of
improvisation. Unique effects of jazz improvisation beyond that of the
constrained form of improvisation (“Scale”) were not reported.

Conclusions/Highlighted Discussion
The authors highlight the dissociation of activity within the prefrontal
cortex seen during musical improvisation—widespread deactivation of
lateral regions and focal activation with the medial regions—and interpret
this pattern to suggest that “musical creativity vis-à-vis improvisation may
be a result of the combination of intentional, internally generated self-
expression (MPFC-mediated) with the suspension of self-monitoring and
related processes (lateral OFC and DLPFC-mediated) that typically regulate
conscious control of goal-directed, predictable, or planned actions.” The
authors speculate that the increased neocortical activity in multiple sensory
and motor regions may reflect task-related demands (e.g., temporal activity
reflects processing of highly-structured acoustic information, motor region
activity with the generation/selection of novel motor programs) and raise
the possibility that these changes may be triggered in a top-down manner
driven by the prefrontal areas. The authors speculate that deactivations
within parts of the limbic system may reflect the positive emotional valence
associated with the creative act, as this is also seen when listening to
pleasurable music. The authors suggest the medial PFC serves as “an index
of internally motivated behaviors” and performs “a broad-based integrative
function, [that] combines multiple cognitive operations in the pursuit of
higher behavioral goals.” The lateral PFC is conceived of as the site of
conscious monitoring, where goal-directed behaviors are evaluated and
corrected, and is “responsible for planning, stepwise implementation, and
on-line adjustment of behavioral sequences that require retention of
preceding steps in working memory.” This state of medial activation and
lateral deactivation within the PFC may allow for highly-creative and
unconstrained actions, which when directed toward musical improvisation,
may allow for creative expression in this highly-trained group of musicians.

Expertise-Related Deactivation of the Right Temporoparietal Junction during Musical Improvisation (Berkowitz & Ansari, 2010)
Design
This study builds on the earlier work (2008) by the same authors, and
sought to uncover how expertise impacts the improvisational neuroanatomy.
The experimental tasks were the same as their 2008 study (see Berkowitz &
Ansari, 2008, described earlier), but the study compared classically-trained
pianists with an average of 13 years of experience (n = 13) with non-
musicians who had less than 3 years of musical training (n = 15).
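Group comparisons of this kind are typically run as a second-level model over each subject's first-level contrast map. The following minimal sketch assumes the Python library nilearn; the file names are placeholders.

```python
# Sketch: two-sample (expert vs. novice) comparison over per-subject
# improvise-minus-control contrast maps. File names are placeholders.
import pandas as pd
from nilearn.glm.second_level import SecondLevelModel

n_experts, n_novices = 13, 15
contrast_maps = [f"sub-{i:02d}_improv_minus_control_z.nii.gz"
                 for i in range(1, n_experts + n_novices + 1)]

design = pd.DataFrame({
    "expert": [1] * n_experts + [0] * n_novices,
    "novice": [0] * n_experts + [1] * n_novices,
})

model = SecondLevelModel().fit(contrast_maps, design_matrix=design)
z_map = model.compute_contrast("expert - novice", output_type="z_score")
```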

Results
Behavioral: There was no difference in the level of novelty between experts
and novices during the improvisational tasks.
fMRI: Experts had significantly reduced activity in one region—the right TPJ—when compared with non-experts during improvisational tasks. This was true under conditions of melodic improvisation but not rhythmic improvisation.

Conclusions/Highlighted Discussion
The finding that TPJ deactivation was associated with expertise while
improvising was unexpected, as the authors had hypothesized that musical
expertise may be associated with increased activity in the network of
regions described in their prior study (Berkowitz & Ansari, 2008). TPJ
deactivation was felt to reflect a training-related utilization of a “top-down,”
goal-directed cognitive strategy during improvisation. They review
evidence that the TPJ, as part of the broader ventral attention network, is
normally involved in “bottom-up, stimulus-driven” attentional processing,
and is important for reorienting attention to behaviorally relevant sensory
stimuli. The authors suggest that during improvisational tasks, experts may
more effectively filter out potentially task-irrelevant stimuli by inhibiting
the TPJ, reflecting a top-down, goal-driven approach to the task.

Goal-Independent Mechanisms for Free Response Generation: Creative and Pseudo-Random Performance Share Neural Substrates (de Manzano & Ullén, 2012b)
Design
The novel design feature of the study is the use of a musical baseline task
employing external cues—sight-reading. This allowed for a distinction
between states of internal and external (e.g., cued) generation of musical
expression.
Fifteen professional concert pianists with improvisational experience
had musical scores presented visually along with task instructions. The
number of notes allowed in any task ranged from 2 (F and G) to 6 (F major scale) to 12 (chromatic scale). The conditions were “Notes,” where
scores showed pseudo-random sequences of pitches and subjects were
asked to play them as accurately as possible; “Improvise,” where the scores
left the melody unspecified (i.e., used percussion notation) and subjects
were asked to “do musical improvisations”; and “Random,” where the
scores left the melody unspecified and subjects were asked to “press the
keys in a random fashion.”
Behavioral assessments included the number of notes/keystrokes and the
accuracy of the reproduced melody/rhythm. In terms of neuroimaging,
standard contrasts were applied to the MRI signal (e.g.,
“Improvise”—“Notes”). Shared regions of overlapping signal across
contrasts (i.e., conjunction analyses) were also applied that distinguished
internally-generated versus externally-cued actions (i.e., conjunction of
“Random”—“Notes” and “Improvise”—“Notes”). Parametric analyses
using the number of keys (2, 6, 12) as a modulator of brain activity were
obtained for each contrast (i.e., what brain regions were parametrically
modulated by the number of keys during improvisational versus constrained tasks).
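In event-based GLM tools such as the Python library nilearn, a parametric modulator of this kind can be expressed as an optional column of the event table; the sketch below is illustrative only (in practice the modulator is usually mean-centered and an unmodulated regressor is retained).

```python
# Sketch: a parametric modulator (number of available keys) attached to
# improvisation blocks via nilearn's optional "modulation" column.
import pandas as pd

events = pd.DataFrame({
    "onset":      [0, 20, 40],            # seconds (illustrative)
    "duration":   [15, 15, 15],
    "trial_type": ["improvise"] * 3,
    "modulation": [2, 6, 12],             # key-set size per block
})
# Fitting a FirstLevelModel with these events scales each block's
# regressor by its key-set size, so a contrast on "improvise" tests for
# activity that tracks the modulator.
```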

Results
Behavioral: The performance data showed that the pianists accurately
performed the presented melody during the “Notes” condition, and that
larger intervals were played during the “Notes” and “Random” sections
compared with “Improvise.”
fMRI: Internally-generated actions (conjunction of “Improvise” +
“Random” versus “Notes”) were associated with increased activity in
prefrontal regions (right ACC, left DLPFC, left dorsal medial PFC),
bilateral motor regions (bilateral pre-SMA and cerebellum), bilateral IFG,
and bilateral insula. The contrast of “Improvise”—“Random” did not reveal
any unique areas of neural activity. The contrast of
“Random”—“Improvise,” by comparison, showed a widespread pattern of
bilateral activations, including frontal, temporal, parietal, occipital, insular,
and cerebellar regions. This pattern was maintained when controlling for
the larger interval sizes seen in “Random.” The largest clusters were within
TPJ, medial and lateral PMC, DLPFC, and cerebellum. A parametric
analysis using the number of keys (2, 6, 12) as a modulator of brain activity
did not identify regions with significant correlations during “Improvise”
contrasted with either “Random” or “Notes.”

Conclusions/Highlighted Discussion
The authors suggest that the observed pattern of activation associated with
internally-generated tasks, namely the bilateral IFG extending into the
DLPFC, right ACC, pre-SMA, bilateral insula, and cerebellum, is “likely
involved in general functions important for a wide-range of free generation
tasks, e.g., attention to action, monitoring of responses in working memory,
response inhibition, and selection.” Several areas reported previously in
Bengtsson et al. (2007) were not seen, including the bilateral rostral PMd,
TPJ, and occipital regions, and the authors suggest this to reflect differences
in the baseline condition, as sight-reading may require additional visual
processing and visually-guided selection of motor sequences. Bilateral IFG
activations, which were not observed in Bengtsson et al. (2007), were felt to
result from comparatively increased musical freedom, and are thought to
contribute to cognitive processes important for the generation of novel
musical phrases, including “retrieval and selection of semantic information
from long-term memory, rule maintenance, and sequential control in
general including syntax processing.”
The widespread increased activity seen with random keyboard presses
(compared with musical improvisation) was felt to reflect task demands:
“far from being a simple model of free choice, [pseudo-random generation]
is a complex and novel task with high demands on attention, planning,
working memory, and executive functions.”
This study suggests musical improvisation utilizes prefrontal, high-order
motor, inferior frontal, and insular regions, but that these regions are not
specific for musical improvisation, as this network is also active during
performance of internally-generated actions that are not musical in nature.

Activation and Connectivity Patterns of the Presupplementary and Dorsal Premotor Areas during Free Improvisation of Melodies and Rhythms (de Manzano & Ullén, 2012a)
Design
Fifteen professional concert pianists played four-bar musical phrases with a
number of rhythmic and melodic constraints. The subjects were cued
visually with a score, and were asked to use a similar number of keystrokes
across tasks. The experimental conditions included “Notes,” where the
displayed score specified both the notes and rhythm to be played;
“Melody,” where the score specified the rhythm (using percussion notation)
but the melody was undefined; “Rhythm,” where the score specified the
melody (using filled note heads only) but the rhythm was undefined; and
“Free,” where the score left both melody and rhythm unspecified.
Behavioral measures included the number of notes/keystrokes played,
the accuracy of reproduced melody/rhythm, and the duration of the
performance.
The fMRI technique utilized was a region of interest (ROI) analysis,
using the pre-SMA and PMd as ROIs, and examining whether activity in
these regions was associated with improvised melodies (“Free” + “Melody”
versus “Rhythm” + “Notes”) or improvised rhythms (“Free” + “Rhythm”
versus “Melody” + “Notes”). A psychophysiological interaction (PPI)
functional connectivity analysis was also performed, using the pre-SMA
and left PMd as seeds during melodic and rhythmic improvisation.
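The core of a PPI analysis is an interaction regressor formed by multiplying the seed region's time course with the centered task regressor. The sketch below assumes the Python library nilearn; the seed coordinate, file name, and block structure are invented placeholders, not those of the study.

```python
# Sketch: building a PPI interaction regressor. Coordinates, file names,
# and the block structure are placeholders.
import numpy as np
from nilearn.maskers import NiftiSpheresMasker

# Seed time course from a 6 mm sphere (hypothetical pre-SMA coordinate);
# assumes a run of 200 scans with TR = 2 s.
seed = NiftiSpheresMasker(seeds=[(-4, -2, 56)], radius=6,
                          standardize=True, t_r=2.0)
seed_ts = seed.fit_transform("run_bold.nii.gz")[:, 0]

# Psychological regressor: alternating 20-scan improvise/control blocks
psych = np.tile(np.repeat([1.0, -1.0], 20), 5)   # length 200

# Interaction term; it enters a GLM alongside the seed and task
# regressors, and its weights index task-dependent coupling.
ppi_regressor = seed_ts * (psych - psych.mean())
```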

Results
Behavioral: There were more reproduction errors during constrained
improvisational (e.g., “Melody”) than non-improvisational (“Notes”) tasks,
although the absolute number of errors was small.
fMRI: The ROI analysis looking specifically at the pre-SMA and PMd
showed melodic improvisation to be associated with increased activity in
both the left PMd and bilateral SMA, whereas rhythmic improvisation
demonstrated only bilateral pre-SMA activity. Functional connectivity
studies demonstrated increased connectivity of pre-SMA with the
cerebellum (right > left, lobules VI/VII) during rhythmic improvisation
when compared with melodic improvisation or “Notes.” Melodic
improvisation was not associated with increased pre-SMA functional
connectivity with other brain regions. The left PMd was not associated with
changes in function connectivity during either task. The GLM (general
linear model) contrast of free improvisation (“Free”—“Notes”) was
associated with increased activity in prefrontal, motor, and lateral frontal
network, including the pre-SMA, left PMd, left DLPFC extending into the
left IFG (pars triangularis), and decreased activity within the bilateral
inferior occipital gyrus, right precentral gyrus extending across the midline
into the right superior parietal cortex, bilateral medial frontal gyrus, left
superior parietal lobe, and right inferior parietal lobe.

Conclusions/Highlighted Discussion
This study demonstrated differential activation of high-order motor regions
during musical tasks that isolate melodic and improvisational freedom, and
functional connectivity of these regions to other brain regions varies
depending on the task. The authors argue that the pre-SMA plays a critical
role in motor timing and the hierarchical control and sequencing of
movements, which is important in both melody and rhythm generation. The
authors suggest that the PMd may be important for melodic and spatial
processing based on evidence showing the region to be important for the
cognitive aspects of visuomotor integration and spatial targeting of
movement sequences.
The regions associated with free improvisation (“Free”—“Notes”)—the
pre-SMA, dorsal PMC, and DLPFC—are known to be associated with
“explicit processing of novel motor sequences.” The authors contrast their
results with that of Limb and Braun (2008), which showed task-related
deactivations in these areas in musicians with expertise specifically in
improvisation, and suggest that expertise may make the tasks less
cognitively demanding. Expert improvisers may utilize “implicit, routine,
and automated behavior” strategies, which is reflected neurologically in “a
more caudal distribution of activity in the SMA and PMd.”
The observed increase in functional connectivity of the pre-SMA with
the cerebellum during rhythmic improvisation was anticipated given the
cerebellum’s role in motor timing, and demonstrates the region’s ability to
modulate its interactions with other areas based on task demands.
Neural Correlates of Lyrical Improvisation: An
fMRI Study of Freestyle Rap (Liu et al., 2012)
Design
This study sought to understand musical improvisation using a different
creative modality than the keyboard studies described earlier: freestyle rap.
Twelve freestyle rap artists with more than five years of experience rapped
over an eight-measure instrumental track at 85 beats per minute, and the
experimental conditions included “Conventional,” where the subjects were
asked to rap a memorized lyric they had been given prior to the scanning
session; and “Improvise,” where the subjects were asked to improvise
lyrics. Behavioral measurements included blinded ratings of the creative
use of language and rhythm in the improvised compositions (by an expert
panel), and the number of syllables per minute. The subjects also completed
standard bedside neurological tests of generative verbal fluency, with both
phonological and semantic constraints.

Results
Behavioral: The subjects generated the same number of syllables in both
the “Improvise” and “Conventional” conditions. On tests of verbal fluency,
the subjects scored above the 80th percentile when compared with age and
education matched controls in both semantic and phonemic tests,
suggesting superior abilities.
fMRI: The main neuroimaging contrast designed to isolate the unique
aspects of improvisation—“Improvise” versus “Conventional”—revealed
increased activity within several functional networks, including
prefrontal (left medial PFC extending from the frontopolar cortex to the
pre-SMA), language/perisylvian (left IFG, left MTG, left STS, and left
fusiform), and motor (left cingulate motor area [CMA], left pre-SMA, left
dorsal PMC, left caudate, left globus pallidus, and right posterior
cerebellum and vermis) regions. There was decreased activity within the
right DLPFC extending from orbital to superior regions.
A parametric analysis using the expert-rated creativity scores as
predictors of regional activity showed higher creativity scores during
“Improvise” to be associated with higher activity in the left posterior and
middle temporal gyrus, left medial PFC near the superior frontal sulcus, and
the left PCC.
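Parametric analyses of this type regress a per-subject covariate (here, the expert creativity rating) on the subjects' contrast maps at the group level. The following minimal sketch assumes the Python library nilearn; the ratings and file names are invented placeholders.

```python
# Sketch: creativity ratings as a group-level covariate. Ratings and
# file names are invented placeholders.
import pandas as pd
from nilearn.glm.second_level import SecondLevelModel

subjects = [f"sub-{i:02d}" for i in range(1, 13)]
design = pd.DataFrame({
    "intercept": 1.0,
    "creativity": [3.2, 4.1, 2.8, 3.9, 4.5, 3.0,
                   2.5, 4.8, 3.6, 3.3, 4.0, 2.9],
}, index=subjects)

maps = [f"{s}_improvise_vs_conventional_z.nii.gz" for s in subjects]
model = SecondLevelModel().fit(maps, design_matrix=design)
z_map = model.compute_contrast("creativity", output_type="z_score")
```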
A functional connectivity analysis showed that during “Improvise”
(versus “Conventional”), activity in the left medial PFC (seed selected by
the parametric results) had reduced connectivity with the right DLPFC and
increased connections with the anterior perisylvian (e.g., left IFG) and
cortical motor areas (e.g., CMA, ACC, pre-SMA, and PMd). To define the
extent of this improvisation-associated medial PFC network further,
functional connectivity studies were repeated using the areas of increased
medial PFC connectivity as seeds (e.g., left IFG, pre-SMA, CMA); these
regions all showed increased connectivity with the left amygdala. When the
left amygdala was used as a seed, there was a widespread, bilaterally-
distributed neural network of increased connectivity, which included the
right IFG, right inferior parietal lobule, and bilateral insula.

Conclusions/Highlighted Discussion
The study showed similar results to Limb and Braun (2008), with a
dissociation of medial activation and lateral deactivation within the PFC
during musical improvisation. The authors suggest that this pattern reflects
“a state in which internally-motivated, stimulus-independent behaviors are
allowed to unfold in the absence of conscious volitional control.” The
medial PFC, which regulates motivational drive and guides self-generated
behaviors, is normally regulated by the DLPFC, where executive control
occurs and ongoing adjustments are made “to ensure that actions conform
to explicit goals.” The authors speculate that information from the medial
PFC could bypass the DLPFC via its rich connections to the CMA, an area
known to integrate affective and cognitive representations to guide
behavior. The deactivations of the right DLPFC and other elements of the
dorsal attention network (e.g., intraparietal sulcus) are explained: “top-
down attentional processes mediated by this network may be attenuated
during improvisation, consistent with the notion that a state of defocused
attention enables the generation of novel, unexpected associations that
underlie spontaneous creative activity.”
The areas of activation during improvisational tasks tended to be
lateralized to the left, whereas the deactivations were more right-lateralized
(e.g., DLPFC). The authors suggest the dominant hemisphere activations
are consistent given the unique demands of freestyle rap, an inherently
language-based musical form, and may reflect “spontaneous phonetic
encoding and articulation of rapidly selected words during improvisation …
and spontaneous incorporation into established rhythmic patterns … which
may place additional demands on these regions.”
The widespread, bilaterally-distributed network identified during the
functional connectivity analyses using the medial PFC as the initial seed
may underlie “multi-modal sensory processing and the representation of
subjective experience, and that as a whole, this entire network is more
effectively coupled during spontaneous behavior—perhaps facilitating what
has been described as a psychological ‘flow’ state.”
The correlation of creativity scores with neural activity in the MTG and
STS may reflect superior verbal fluency, as these areas are important for
accessing the mental lexicon, and the medial PFC—also associated with
higher scores of creativity—may suggest a role for motivation/drive in
innovative compositions.
The authors argue that the DLPFC deactivations, which were not
reported in other studies of musical improvisation, may be the result of
fewer secondary cognitive demands associated with other studies’ tasks.

Neural Correlates of Musical Creativity: Differences between High and Low Creative Subjects (Villarreal et al., 2013)
Design
This study was designed to examine rhythmic improvisation, and look for
differences in subjects deemed to have high or low rhythmic creative
capacities. Twenty-four music therapy students were presented with one of
fourteen rhythms played on a cymbal, and asked to either repeat the rhythm
they just heard (“Repeat”) or create a new rhythm based on the presented
rhythmic pattern (“Create”).
Behavioral measurements included the number of variations from the
original sequence (“fluidity”) and the type of variations used (“flexibility”).
Based on these performance measures, the subjects were divided into two
groups—a less creative group (LCG) and a high creative group (HCG).
MRI analyses included both standard task comparisons and a parametric
analysis based on fluidity and flexibility scores.
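
For readers unfamiliar with parametric fMRI analyses, the sketch below illustrates the general logic under stated assumptions: a per-trial behavioral score is mean-centered and used to scale a second regressor, so that its fitted weight tests whether the BOLD response scales with the score. All names and values (onsets, scores, the placeholder voxel signal) are illustrative and are not taken from Villarreal et al. (2013).

```python
import numpy as np
from scipy.stats import gamma

n_scans, tr = 200, 2.0
onsets = np.array([20.0, 60.0, 100.0, 140.0])  # hypothetical "Create" onsets (s)
flexibility = np.array([2.0, 5.0, 1.0, 4.0])   # hypothetical per-trial scores

def hrf(t):
    # crude double-gamma haemodynamic response function
    return gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)

task = np.zeros(n_scans)
mod = np.zeros(n_scans)
for onset, score in zip(onsets, flexibility):
    i = int(onset / tr)
    task[i] += 1.0
    mod[i] += score - flexibility.mean()       # mean-center the modulator

h = hrf(np.arange(0.0, 30.0, tr))
X = np.column_stack([np.convolve(task, h)[:n_scans],  # main task effect
                     np.convolve(mod, h)[:n_scans],   # parametric effect
                     np.ones(n_scans)])               # intercept

y = np.random.default_rng(0).standard_normal(n_scans)  # placeholder voxel signal
beta = np.linalg.lstsq(X, y, rcond=None)[0]
# beta[1] is the parametric weight: does BOLD scale with flexibility?
```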

Results
Behavioral: There was a wide but bimodal distribution in both fluidity and
flexibility scores, and the parameters were strongly correlated. This allowed
for a relatively clear separation into the HCG and LCG groups.
fMRI: Comparisons of HCG versus LCG (when “Create” and “Repeat”
tasks were combined) demonstrated increased signal in the left pre- and
postcentral gyrus and left DLPFC. The main improvisational contrast,
“Create” versus “Repeat,” revealed increased signal in the SMA, DLPFC,
and right ventral lateral PFC, when both groups were collapsed into one
(HCG+LCG). When examining the groups separately for this contrast, the
HCG showed increased activity in the left DLPFC, right insula, and right
ventral lateral PFC, whereas the LCG showed increased signal in the left
precentral gyrus and SMA. The contrast of these maps—“Create” versus
“Repeat” for the HCG only compared with the LCG—revealed only an
uncorrected activation within the left DLPFC and right insula.
Parametric analysis showed that flexibility scores covaried with signal in
the left DLPFC, right ventral lateral PFC, and right insula.

Conclusions/Highlighted Discussion
The authors argue that the left DLPFC and right insular activations seen in
the HCG during improvisation, combined with these regions’ positive
correlation with creativity scores in the parametric analysis, represent
“widespread integration of networks associated with cognitive,
motivational, and emotional processes,” which may be important for
novel idea generation. The DLPFC activity observed in the HCG, similar
to that suggested in Bengtsson et al. (2007), reflects “a greater focus of
attention, greater reliance on working memory to retain diverse musical
images in their mind while other images were being processed, greater
inhibition of interfering stimuli to avoid adhering to the original rhythmical
patterns, and greater amount of manipulating to organize their products into
unique and recognizable original combinations.” The tasks employed are
similar to those used by Bengtsson et al. (2007), and may help explain the
discrepant DLPFC findings with Limb and Braun (2008) and Liu et al.
(2012). The insular activity, via its interactions with other regions, “serves
to develop subjective emotional and motivational states and to translate
these states into specific action plans. … and the correlation between
anterior insula activation and creativity … likely reflects a positive
association between the capacity to integrate information and creativity
level.”
The LCG showed only SMA activity, and this was attributed to
the SMA’s role in a network that includes cortical (SMA, IPL), basal
ganglia, and cerebellar structures to integrate sensory and motor
information during performances involving rhythmic movements in
response to auditory stimuli. The authors suggest SMA may have been
more prominent in the LCG because their compositions did not differ from
the originally-presented rhythmic stimuli.

Connecting to Create: Expertise in Musical Improvisation Is Associated
with Increased Functional Connectivity between Premotor and Prefrontal
Areas (Pinho, de Manzano, Fransson, Eriksson, & Ullén, 2014)
Design
This study examined the role of musical expertise as it relates to the neural
correlates of musical improvisation. Thirty-nine pianists with a wide variety
of musical and improvisational experience improvised melodies under a
number of conditions, namely “Tonal,” where they improvised using six
different pitches from a Western musical scale (major or minor); “Atonal,”
where they improvised using six different pitches chosen at random, with
the constraints that they not come from a single Western scale and that they
include at least one interval greater than a third; “Happy,” where they were
asked to improvise a “happy” melody
without pitch constraints; and “Fearful,” where they were asked to
improvise a “fearful” melody without pitch constraints. The subjects also
completed a survey where they estimated their total practice hours on the
piano, hours spent specifically on classical training, and hours spent
improvising.
Behavioral measurements included performance measures, including
how accurately subjects adhered to the task instructions (e.g., did they use
the correct notes of the scale), musical complexity, and the survey data.
Neuroimaging measurements included standard contrasts across
conditions, although the main contrast of interest was improvisation versus
rest. A functional connectivity analysis was also performed, which sought
to correlate the survey on experience with the regional functional
connectivity during musical improvisation.
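
As a rough illustration of this kind of analysis, the sketch below computes seed-based functional connectivity per subject and then correlates it with self-reported improvisation hours across subjects. Array shapes, the seed index, and all data are invented placeholders, not the study's actual pipeline.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_subj, n_time, n_rois = 39, 300, 90
data = rng.standard_normal((n_subj, n_time, n_rois))  # placeholder ROI timecourses
hours_improv = rng.uniform(0, 5000, n_subj)           # placeholder survey estimates

seed = 10  # index of a hypothetical right PMd ROI
fc = np.empty((n_subj, n_rois))
for s in range(n_subj):
    for r in range(n_rois):
        # per-subject correlation of seed timecourse with each region
        fc[s, r] = pearsonr(data[s, :, seed], data[s, :, r])[0]

z = np.arctanh(np.clip(fc, -0.999, 0.999))  # Fisher z-transform
# Across subjects: does seed connectivity scale with improvisation training?
r_exp = np.array([pearsonr(hours_improv, z[:, r])[0] for r in range(n_rois)])
```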

Results
Behavioral: There was significant variability in the amount of experience
amongst the pianists, both overall and specifically for improvisation. The
amount of improvisational training did not correlate with measures of
musical complexity.
fMRI: There were strong correlations of the BOLD signal across all four
experimental conditions, and as such, the conditions were collapsed into
one (“Improvisation”) for the purposes of imaging analysis. Improvisational
experience was negatively correlated with neural activity during
“Improvisation” (contrasted with rest) in several right hemisphere regions,
namely the DLPFC, IFG, anterior insula, and angular gyrus. In a functional
connectivity analysis, improvisational experience was associated with
higher connectivity between prefrontal, premotor, and motor regions of the
frontal lobe during improvisation (contrasted with rest). This was shown
using six different seed regions within the bilateral DLPFC, pre-SMA, and
PMd, although the most extensive connectivity was seen using the right
PMd. Additional areas of increased functional connectivity outside the
frontal lobe were also observed using these seeds, including the parietal,
posterior temporal, primary sensorimotor, and cerebellar regions. The
regions affected by improvisational experience in each of these analyses—
right hemisphere regions for neural activity, and bilateral frontal regions in
the functional connectivity analysis—were non-overlapping in their
anatomical distribution. All of these effects were independent of the amount
of classical piano experience or the age of the pianists.

Conclusions/Highlighted Discussion
This study demonstrates a link between the type of training and the
functional neuroanatomy underlying improvised musical performance. The
authors suggest that “greater functional connectivity of the frontal brain
regions seen in the most experienced participants may reflect a more
efficient integration of representations of musical structures at different
levels of abstraction. A higher functional connectivity with the seed regions
was observed with premotor regions and parietal and prefrontal association
cortex and the cerebellum, suggesting the training-related functional
reorganizations may affect both cognitive and sensorimotor aspects of
improvisation.” The authors suggest that the reduced activations of the right
DLPFC and parietal regions observed in those with more extensive
improvisational experience may indicate “automation and reduced top-
down cognitive control,” similar to what was reported by Limb and Braun
(2008) and Liu et al. (2012). The authors explain the finding of training
being associated with reduced brain activity but increased connectivity
between regions during the task of musical improvisation as signifying that
“skilled improvisational performance may thus be characterized by both
lower demands on executive control and a more efficient interaction within
the network of involved brain areas.”

Addressing a Paradox: Dual Strategies for Creative Performance in
Introspective and Extrospective Networks (Pinho, Ullén, Castelo-Branco,
Fransson, & de Manzano, 2016)
Design
This study employed the same methods as Pinho et al. (2014), where 39
pianists were asked to improvise melodies on a keyboard under a number of
conditions, including “Tonal,” “Atonal,” “Happy,” and “Fearful,” as
described above. The purpose was to compare improvisational tasks with
an “emotional” intention (e.g., happy, fearful) against those governed by
explicit pitch-set rules (e.g., tonal, atonal). Similar performance
measurements were gathered, including the accuracy of performance
against the task instructions and characterization of musical complexity.
MRI measurements included GLM contrasts of the BOLD signal across
tasks, and functional connectivity between the DLPFC and other brain
regions during the different pitch sets or emotional character of the
generated melodies.
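
A minimal sketch of condition-dependent connectivity of the sort described follows: the correlation between a DLPFC seed and a target region is computed separately within pitch-set and emotional blocks and compared across subjects. Block boundaries and all data are invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, ttest_rel

rng = np.random.default_rng(0)
n_subj, n_time = 39, 400
dlpfc = rng.standard_normal((n_subj, n_time))   # placeholder seed timecourses
target = rng.standard_normal((n_subj, n_time))  # placeholder target region

pitch_idx = np.arange(0, 200)    # hypothetical volumes in Tonal/Atonal blocks
emo_idx = np.arange(200, 400)    # hypothetical volumes in Happy/Fearful blocks

def fisher_z(a, b):
    # Fisher z-transformed correlation between two timecourses
    return np.arctanh(pearsonr(a, b)[0])

z_pitch = np.array([fisher_z(dlpfc[s, pitch_idx], target[s, pitch_idx])
                    for s in range(n_subj)])
z_emo = np.array([fisher_z(dlpfc[s, emo_idx], target[s, emo_idx])
                  for s in range(n_subj)])

# Paired test across subjects: does connectivity differ by condition?
t, p = ttest_rel(z_pitch, z_emo)
```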

Results
Behavioral: Performers used more keystrokes and showed higher musical
complexity during the emotional conditions (“Happy” and “Fearful”)
compared with the pitch-set conditions (“Tonal” and “Atonal”).
fMRI: In the contrast of the pitch-set conditions versus the emotional
conditions, there was increased activity within the bilateral DLPFC
(extended on the right throughout the middle frontal gyrus into the PMd),
inferior parietal lobes, inferior temporal gyri, left inferior occipital gyri, and
left cerebellum. During the opposite contrast—emotional versus pitch-set
conditions—there was increased signal within the left dorsal medial PFC
(in the superior medial gyrus), left medial orbital gyrus, bilateral IFG,
bilateral insula (extending into the amygdala), left STG, left mid-cingulate,
right precentral gyrus, left central sulcus, right Rolandic operculum, and
bilateral occipital gyri.
The right DLPFC seed ROI used in the functional connectivity analyses
was chosen as the region of overlap in the DLPFC between the GLM
contrast of pitch-set versus emotion and a previously reported DLPFC area
whose activity is related to improvisation practice (Pinho et al., 2014).
Functional connectivity during the pitch-set conditions (compared with
the emotional conditions) showed increased connectivity of the right DLPFC with
motor areas (bilateral PMd, left PMv, left SMA), auditory areas (bilateral
STG), left primary sensorimotor cortex, left parietal lobe, and right
cerebellum. During emotional conditions, the right DLPFC showed more
connectivity with parts of the default-mode network (medial PFC and
medial parietal regions). The left DLPFC showed a similar connectivity
pattern as the right DLPFC during emotional conditions, but not during
pitch-set conditions.

Conclusions/Highlighted Discussion
This study demonstrates that improvisation-associated neural activity and
connectivity are modulated by emotional and musical constraints. The
pitch-set task, the authors suggest, requires “an explicit approach to creative
thinking,” and consequently the DLPFC is more active and functionally
connected to premotor, sensorimotor, and cerebellum, which are important
for “integrating goal-oriented information, that is, internal (musical) and
external (response set) constraints, for attentional selection, that is,
cognitive control of action sequencing and motor execution.” The authors
argue that “top-down executive control extends to the level of motor
execution,” and the DLPFC, PMd, and parietal cortex “constitute an
‘intentional framework’ for sensorimotor processing.”
Emotional improvisation, in contrast, may rely on a more “implicit”
strategy, which is reflected in reduced DLPFC activity and its increased
connections with parts of the default mode network. The increased activity of the
medial PFC during emotional improvisation is notable given its role in
“representing the affective meaning of stimuli,” and its “functional
interconnections with cortical, striatal, and limbic regions … [that] allow
convergence of sensorimotor integration and visceromotor control in the
processing of emotionally salient information and regulation of behavior.”
The authors point to evidence of tonal representations in the medial PFC,
which “may enable associative processes between music, emotion, and
memories.” During emotional conditions, the IFG controls response
selection “based on retrieval and sequencing processes that … utilized
internalized musical syntactic rules and semantic associations.”
The authors argue these two modes of musical improvisation represent
two neurological “meta-systems,” one an executive system “where the
DLPFC drives integration of sensory, autonomic, and goal-related
information to implement adaptive control,” and another an integrative
system “constituted primarily by the default mode network, where largely
automated processes in specialized brain systems are organized under the
influence of the MPFC for the flexible integration of exogenous and
endogenous information.” An individual may shift between these two
cognitive modes depending on their training and the improvisational
context.
Neural Substrates of Interactive Musical
Improvisation: An fMRI Study of “Trading Fours”
in Jazz (Donnay, Rankin, Lopez-Gonzalez,
Jiradejvong, & Limb, 2014)
Design
This study sought to examine musical improvisation as it occurs with an
interlocutor, as in “trading fours” in jazz. Eleven professional musicians
proficient in jazz piano performance interacted musically with an
interlocutor by alternating four-bar phrases with each other. The constraints
on these interactions between the musician pair characterized the
experimental conditions, which included “Scale-Control,” where only
quarter notes and repeated playing of the D Dorian scale were permitted;
“Scale-Improv,” where melodies were improvised using the D Dorian scale,
but only quarter notes were allowed; “Jazz-Control,” where subjects played
a memorized composition with background accompaniment; and “Jazz-
Improv,” where melodies and rhythms were unrestricted and played with
background accompaniment.
Behavioral assessments included measurement of the performance and
quantification of the musical interactions between the subject and
interlocutor, including note density, pitch class distribution, pitch class
transition, duration distribution, duration transition, interval distribution,
interval transitions, and melodic complexity.
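
To make a few of these quantities concrete, the sketch below computes note density, a pitch class distribution, and a pitch class transition matrix from a toy event list. The event data and the representation are illustrative stand-ins, not the study's materials.

```python
import numpy as np

# Toy performance: (onset_seconds, midi_pitch, duration_seconds) per note
events = [(0.0, 62, 0.5), (0.5, 64, 0.5), (1.0, 65, 1.0), (2.0, 62, 0.5)]
onsets = np.array([e[0] for e in events])
pitches = np.array([e[1] for e in events])
durations = np.array([e[2] for e in events])

# Note density: notes per second over the performed span
density = len(events) / (onsets[-1] + durations[-1] - onsets[0])

# Pitch class distribution: proportion of each of the 12 pitch classes
pc_dist = np.bincount(pitches % 12, minlength=12) / len(pitches)

# Pitch class transitions: row-normalized counts of successive moves
trans = np.zeros((12, 12))
for a, b in zip(pitches[:-1] % 12, pitches[1:] % 12):
    trans[a, b] += 1
trans /= np.maximum(trans.sum(axis=1, keepdims=True), 1)
```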

Results
Behavioral: There were more notes played in the “Jazz-Improv” condition
compared with the “Jazz-Control,” with the comparable “Scale” conditions
showing no difference. Melodic complexity was highest and most variable
for “Jazz-Improv.” The melodies traded in phrase pairings were related to
each other in terms of duration, pitch, interval, and melodic complexity.
fMRI: The main contrast in the MRI data, improvised melodies versus
controls, revealed a widespread pattern of activation and deactivation in
both “Scale” and “Jazz” conditions. Areas of increased activity included
language (bilateral IFG pars opercularis and triangularis, bilateral [right
more so than left] posterior STG within Wernicke’s area), prefrontal
(bilateral DLPFC), motor (bilateral SMA), parietal areas (bilateral IPL,
bilateral SPL), bilateral SMG, and bilateral middle occipital gyrus. Areas of
decreased signal included prefrontal areas (bilateral dorsal prefrontal cortex
over the superior frontal gyrus and middle frontal gyrus), default mode
areas (bilateral angular gyrus, bilateral precuneus), and motor areas
(bilateral precentral gyrus). This pattern of BOLD signal change was
similar in both “Jazz” and “Scale” paradigms.
Functional connectivity measured during improvisational exchanges
revealed increased connectivity between the left and right IFG, and anti-
correlations between the bilateral IFG and STG, and the left IFG and
bilateral angular gyri.

Conclusions/Highlighted Discussion
The study demonstrates that improvised musical exchanges are associated
with increased activity in a network that includes traditional perisylvian
language areas and their right-sided homologues (e.g., IFG, posterior STG),
prefrontal and attentional regions (bilateral DLPFC, IPL), premotor/motor
areas (e.g., bilateral SMA, precentral gyrus), and parietal regions (e.g., SPL,
IPL). The right IFG was felt to be important for the “detection of task-
relevant cues, such as those involved in the identification of salient
harmonic and rhythmic elements,” and the right STG important for auditory
short-term memory, as would be required to keep track of the interlocutor’s
ongoing improvisations. The bilateral IFG is important in syntactic
processing of music and speech, and the STG has been implicated in
harmonic processing. The authors suggest a link between linguistic and
musical discourse, and point to shared regions of activity in this study and
those using a speech interlocutor, as well as similarities in their hierarchical
structures, and propose that both utilize a “common neural network for
syntactic operations.”
Increased activity within the DLPFC during improvisation was felt to
represent increased conscious self-monitoring of musical behavior in the
social musical setting, and possibly also increased working memory demands
associated with trading fours. The authors speculate that increased activity
in the sensorimotor areas represents a “primed” state “as the musician
prepares to execute unplanned ideas in a spontaneous context.”
The authors suggest that the functional deactivations within the bilateral
angular gyrus, and its reduced connectivity with left IFG, may be
“indicative of the lesser role semantic processing has in moment-to-moment
recall and improvisatory musical generation whereby only musical syntactic
information is exchanged and explicit meaning is intangible and possibly
superfluous.”
This study suggests that social paired musical improvisations may utilize
inferior frontal systems important for hierarchical structuring of musical
and linguistic discourse (e.g., musical syntax), and require increased
working memory demands and harmonic processing. Areas important for
the communication of explicit semantic ideas (and their functional
connections) are less active during these exchanges, suggesting a de-
emphasis of these features in musical conversation.

Emotional Intent Modulates the Neural Substrates of Creativity: An fMRI
Study of Emotionally Targeted Improvisation in Jazz Musicians
(McPherson, Barrett, Lopez-Gonzalez, Jiradejvong, & Limb, 2016)
Design
This study sought to examine the relationship between musical improvisation
and emotional processing. Twelve professional jazz pianists with greater
than five years of professional experience were asked to improvise
melodies in response to emotional cues. Subjects were shown individual
photographs representing one of three emotional valence states (i.e.,
positive, ambiguous, negative), and were asked to improvise melodies that
best represented the presented facial expression. The three experimental
conditions, “Positive,” “Negative,” and “Ambiguous” were contrasted with
a lower level musical baseline task, where the subject played ascending and
descending chromatic scales (“Chromatic”).
Behavioral measurements included musical performance features,
including note density (notes per second), note duration distribution
(variable length of individual notes), note maxima and minima (highest and
lowest pitch), mode, and key.

Results
Behavioral: The emotional valence of the facial expressions was associated
with differences in performance, as “positive” improvisations were most apt
to be performed in a major key (71% of the time, compared with 31% for
negative and 46% for ambiguous), had higher note maxima (whereas
“negative” conditions had lower note minima), the highest note density
(followed by ambiguous, then negative), and significantly more notes of
shorter duration.
fMRI: When combining all groups, improvisation (versus chromatic)
was associated with increased signal in the left IFG, and decreased signal in
the bilateral medial and lateral frontopolar cortex, DLPFC, angular gyrus,
precuneus, and bilateral mid-cingulate.
Emotional valence was associated with different regions of brain activity
during improvisation. Positive improvisations were associated with
decreased signal within left hippocampus, and more extensive deactivation
in the DLPFC, angular gyrus, and precuneus compared with
negative/ambiguous. Both negative and ambiguous improvisations were
associated with increased activity in the bilateral SMA, and negative
improvisations were associated with decreased signal within the bilateral
hippocampi.
During improvisational blocks, the contrast of positive versus
ambiguous showed increased activity within limbic areas (left
hippocampus, left amygdala, right parahippocampal gyrus). The contrast of
negative versus ambiguous revealed increased signal in dorsal medial
prefrontal (right ACC [BA9]), posterior default mode (left angular gyrus
[BA39]), high-order sensory (SMG [BA40]), and limbic regions (right
hippocampus), and decreased signal within motor (right cerebellum, left
primary motor [BA4]), and auditory areas (bilateral Heschl’s gyrus).
Negative and ambiguous versus positive revealed increased signal in
prefrontal (bilateral frontopolar cortex [BA10]), right ACC (BA 32), right
insula (BA13 and 47), and perisylvian areas (right SMG [BA40], bilateral
middle temporal [BA22]). The contrast of positive versus negative revealed
increased signal only within the right cerebellum. Viewing the emotional
expression itself was not associated with any significant differences in brain
activity, which suggests the observed difference during improvisation did
not simply reflect viewing the emotional stimulus itself.
Functional connectivity analyses using seeds within the left amygdala
and left insula revealed changes in connectivity associated with the
emotional valence of the stimulus. During positive improvisations (versus
chromatic), the left amygdala had reduced connectivity with the left
cerebellum, and the left insula had lower connectivity with areas important
for attention and executive functioning (left superior frontal gyrus, bilateral
middle frontal gyrus), high-order sensory processing (left SMG), and
primary sensorimotor functions (precentral and postcentral gyri), and increased
connectivity with visual areas (middle occipital gyrus). During negative
improvisations, the left amygdala had lower connectivity with the right IFG
and left postcentral gyrus. When contrasting positive versus negative
emotions during improvisational trials (not versus chromatic), the left
amygdala had greater connectivity with left-sided attention/executive areas
(superior medial and superior frontal gyri, IPS), ACC, and high-order
sensory areas (SMG). Using the same contrast, the left insula showed
increased connectivity with the Rolandic operculum and reduced
connectivity with midbrain (including substantia nigra).

Conclusions/Highlighted Discussion
The study reveals a network of brain regions important for musical
improvisation (e.g., deactivations within the angular gyrus, precuneus,
medial PFC; activations of the IFG) and demonstrates that activity in these
and other regions is altered by the intended emotional valence of the
compositions. Positive improvisations were associated with robust
deactivations of the DLPFC, which, in association with a lack of increased
activity in the SMA (which is active during tasks requiring continuous
monitoring of motor output), may “indicate that positive improvisation
induces a deeper state of flow than negative or ambiguous improvisation.”
Negative improvisations are associated with increased insular connectivity
with the substantia nigra, a midbrain nucleus containing neurons with
dopaminergic projections to subcortical reward centers. The insula is known
to represent afferent information about internal body states, and the authors
suggest that negative improvisations may be associated with “binding of
visceral awareness” within the insula without any “real-life” negative
consequences, creating a potentially rewarding situation. This may depend
on maintaining “cognitive distance” from the performance, which they
argue is substantiated by the finding of increased activity within the SMA
and frontopolar cortex during negative improvisations, which are regions
known to be involved in cognitive control and self-monitoring.
The authors suggest that positive and negative musical improvisation
may be pleasurable by different mechanisms: “While positive emotional
targets enable more widespread hypofrontality and deeper flow states
during spontaneous creativity, negative emotional targets may be more
closely linked to a stronger visceral experience and greater activity in
reward processing areas of the brain during improvisation.” This study
demonstrates that emotional intent activates different neural networks
during musical improvisation, and that positive and negative emotions
utilize different aspects of attentional, limbic, and sensory processing
during the generation of novel melodies.

Positron Emission Tomography (PET)

In PET imaging, radioactively-labeled molecules important for blood flow
and metabolism (e.g., glucose, oxygen) are injected intravenously. The
tracer is taken up by different brain regions based on the metabolic demands
of the local tissue, which correlates with neural activity.

Music and Language Side by Side in the Brain: A PET Study of the
Generation of Melodies and Sentences (Brown, Martinez, & Parsons, 2006)
Design
This study was designed to investigate the functional neuroanatomical link
between the spontaneous, improvisational aspects of language and music.
The investigative approach involved tasks where subjects improvised
musical and linguistic ideas. The subjects were ten university students with
musical experience, but not necessarily expertise; to be eligible they needed
to demonstrate proficiency in accurately reproducing presented melodies
vocally in key and with superimposed harmonies. The subjects completed
the following experimental tasks: (1) “melody generation,” where
incomplete, novel six-second melodies were presented aurally and subjects
were asked to generate and sing “an appropriate phrase” that completed
them using the syllable /da/; (2) “sentence generation,” where incomplete,
novel sentence fragments were presented and subjects were asked to
generate “semantically and syntactically appropriate” phrases that
completed the fragments; and (3) “rest,” where subjects sat with their eyes
closed.
Behavioral measures included a measurement of the melodic and verbal
responses. Standard voxel-wise PET analyses were performed during each
of the tasks, and group whole-brain flow images for the rest conditions were
subtracted from the two experimental tasks, to reveal the functional
anatomy specific to melody and sentence generation above that of rest.
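
The subtraction logic itself is simple; a toy sketch, assuming spatially normalized group flow images stored as arrays, might look like the following (all shapes and data are placeholders).

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, shape = 10, (64, 64, 40)                  # placeholder volume dimensions
melody = rng.standard_normal((n_subj, *shape))    # flow images, melody task
sentence = rng.standard_normal((n_subj, *shape))  # flow images, sentence task
rest = rng.standard_normal((n_subj, *shape))      # flow images, rest

# Group-average task image minus group-average rest image
melody_vs_rest = melody.mean(axis=0) - rest.mean(axis=0)
sentence_vs_rest = sentence.mean(axis=0) - rest.mean(axis=0)
# Voxels surviving a statistical threshold on these difference images would
# constitute the task-specific networks reported by the authors.
```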

Results
The authors found that melody generation (relative to rest) was associated
with increased PET signal in the SMA, pre-SMA, primary motor, lateral
premotor, frontal operculum, anterior insula, primary auditory, secondary
auditory, and superior temporal polar cortices. The SMA, primary motor,
and frontal opercular signals were bilateral; the auditory signals comprised
smaller foci in the right hemisphere and a more widespread distribution on
the left. An extensive subcortical network was also identified, including
thalamus, putamen, globus pallidus, caudate, midbrain, pons, and
cerebellum. The bilateral parieto-occipital cortices were deactivated. A
similar, overlapping network was identified for sentence generation. The
network of shared activation included the bilateral SMA, left primary
motor, bilateral premotor, left IFG (pars triangularis), left primary auditory,
bilateral secondary auditory, anterior insular, and left anterior cingulate
cortices; the subcortical areas were nearly identical between the two tasks.
Regions specific for melody generation were the dorsal right temporal pole
and right frontal operculum.

Conclusions/Highlighted Discussion
The authors identified a wide regional network that is associated with the
generation of new melodies that includes motor, language, auditory, limbic,
and subcortical areas. The authors suggest that these regions support
processes integral to improvisation, including “(i) accessing rules of
harmony, and (ii) re-ordering, rhythmically altering, re-harmonizing, or
concatenating the stimulus or recalled musical associations to generate
musically-appropriate phrases.” The opercular and planum polare
activations are hypothesized to subserve the “use of implicit knowledge for
harmonic and melodic rules,” the premotor, basal ganglia, and cerebellar
activity to subserve the “representation of rhythmic musical features” such
as meter, and the insula to reflect “kinaesthetically based musical
expressivity.” The other areas—the SMA, ACC, premotor areas, basal
ganglia—are “likely to be involved in the improvised manipulation of
musical structures or perceived in the stimulus,” and also to aid in the
response selection of “generated possibilities to determine the next note in a
phrase.” The authors suggest that music and language share many
resources, including for audition and vocalization, and use parallel
resources for phonological generativity of different semantic units.

Transcranial Direct Current Stimulation (tDCS)

Transcranial direct current stimulation (tDCS) utilizes an externally applied
electrical current to stimulate a brain region. tDCS can increase or decrease
activity in the targeted brain region depending on factors such as the
polarity of stimulation and intrinsic properties of the neural populations
of the stimulated brain regions.

Anodal tDCS to Right Dorsolateral Prefrontal Cortex Facilitates
Performance for Novice Jazz Improvisers but Hinders Experts (Rosen et al.,
2016)
Design
This study investigated how musical improvisation is affected when tDCS
is directed toward the right DLPFC, given conflicting findings in this region
in prior studies of musical improvisation. Seventeen jazz piano players with
a range of improvisational experience improvised melodies with both hands
on a full-size keyboard over background accompaniment. Improvisational
blocks consisted of 6 sixteen-bar jazz songs. The subjects completed three
sessions, with each session consisting of rest, an improvisational block, and
ending with additional non-musical cognitive tasks (not analyzed). Each
session used a different form of tDCS stimulation over the right DLPFC (at
the F4 electrode), including a “sham” condition (only 30 seconds of
stimulation), anodal stimulation (designed to “turn on” the region), and
cathodal stimulation (designed to “turn off” the region). After each session,
subjects were asked to choose their best performances. In addition, expert
judges rated the compositions on creativity, technical proficiency, and
aesthetic appeal. The musicians also completed a questionnaire about their
musical background (improvisational experience, musical style, etc.).

Results
The individual components of the expert ratings of performance were all
positively correlated with one another, and were collapsed into a single
“quality” score. The musical scores improved with increasing prior
improvisational experience.
When all subjects were considered together, tDCS (i.e., sham, anodal,
cathodal) did not affect musical quality. However, there was a stimulation-
by-expertise interaction: right DLPFC stimulation increased the musical quality in
less experienced subjects (anodal more than cathodal) and decreased quality
in experts (anodal only).

Conclusions/Highlighted Discussion
Based on data from prior reports, the authors hypothesized that anodal
stimulation to the right DLPFC would improve the improvisational
performance in less experienced individuals by enhancing top-down
conscious control mechanisms (“Type 2 processes”), and that cathodal
stimulation would improve performance in more experienced improvisers
by enhancing implicit, automatic performance (“Type 1 processes”) that is
hypothesized to occur in hypofrontal states with expertise. The finding that
anodal (i.e., activating) stimulation led to increased quality of performance
in novices and decreased quality in experts is consistent with this idea. The
authors suggest that right DLPFC stimulation may enhance cognitive
processes that are important for creativity more generally, such as working
memory, attention, inhibitory control, and visuospatial memory. The authors
argue that right DLPFC anodal stimulation may also activate and strengthen
a functionally-connected network of brain regions including prefrontal,
premotor, and motor areas, which may “appear similar to more experienced
musicians,” or it may increase theta coherence, which is believed to
integrate “widely distributed neural networks that underlie creativity.”
Experts do not benefit from this stimulation, they argue, because it
disrupted their highly-trained neural networks by recruiting explicit, top-
down processing, “similar to what happens when one attends to the
components of a well-learned skill, causing performance decrements.”
The authors argue that cathodal stimulation did not have the expected,
opposite effects to that of anodal stimulation due to the unclear inhibitory
effects of cathodal stimulation or compensation from other cognitive
domains. Novices may have benefited from cathodal stimulation for similar
reasons as experts, by allowing them to “perform using a more bottom-up
approach.”

Electroencephalography (EEG)

Electroencephalography uses scalp electrodes to record electrical signals
from the brain.

The Brain Network Underpinning Novel Melody Creation (Adhikari et al.,
2016)
Design
This study sought to examine the electrophysiological signatures associated
with musical improvisation using EEG. Nineteen experienced musicians
with piano proficiency were tested using five experimental conditions,
“Play-Prelearned,” “Play-Improvised,” “Imagine-Prelearned,” “Imagine-
Improvised,” and “Rest.” The subjects performed melodies on a keyboard
during the “Play” conditions, and imagined melodies during the “Imagine”
conditions. The “Prelearned” conditions consisted of playing (or imagining)
one of four eight-quarter note melodies memorized by subjects before the
test session. Improvisational sessions were restricted to quarter notes within
the same tonal range as the “Prelearned” melodies. All conditions
(including rest) were paced by a metronome.
Behavioral data included performance accuracy (melodic and rhythmic)
and an originality score.
During all sessions, 64-channel EEG recording data was collected, and
spectral measures of peak amplitude, coherence, and Granger causality
(“directional causal influence from one oscillatory process to another”)
were calculated at different nodes and compared between conditions.
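
As a hedged illustration of two of these measures, the sketch below estimates peak alpha-band amplitude from a Welch power spectrum and runs a pairwise Granger causality test between two simulated channels. Channel names, the coupling, and all parameters are invented; the study's actual node-level pipeline was more involved.

```python
import numpy as np
from scipy.signal import welch
from statsmodels.tsa.stattools import grangercausalitytests

fs = 256
rng = np.random.default_rng(0)
x = rng.standard_normal(fs * 60)                         # simulated channel
y = 0.5 * np.roll(x, 5) + rng.standard_normal(fs * 60)   # lag-coupled channel

# Spectral peak amplitude in the alpha band via a Welch power spectrum
freqs, psd = welch(x, fs=fs, nperseg=fs * 2)
alpha = (freqs >= 8) & (freqs <= 12)
peak_alpha = psd[alpha].max()

# Granger causality: does x help predict y beyond y's own past?
# Column order is [effect, cause] for statsmodels.
res = grangercausalitytests(np.column_stack([y, x]), maxlag=10, verbose=False)
p_value = res[10][0]["ssr_ftest"][1]  # F-test p-value at lag 10
```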

Results
Subjects’ performance was generally accurate: they performed the correct
order of tones on 88 percent of prelearned trials, and only slightly
anticipated the metronomic beat.
During play conditions, improvisation (versus prelearned) was
associated with higher peak amplitudes over left frontal, left central,
bilateral parietal, and bilateral occipital nodes. The same contrast during
imagined performance showed a similar area of peak amplitude difference
over the left frontal region, and novel regions within right lateral temporal
areas. The anatomical sources of the EEG signal during these tasks were
calculated to correspond to the left superior frontal gyrus (SFG), SMA, left
IPL, DLPFC, and right superior temporal gyrus (STG).
There was globally increased alpha power in all leads (most pronounced
in the parieto-occipital regions) during prelearned versus improvised
performances during the play tasks, and slightly increased beta power in the
frontal and parietal regions. There were no power differences in any
frequency range when comparing the “Imagine-Improvise” and “Imagine-
Prelearned” conditions.
The Granger causality analysis revealed dynamic intra-network
interactions. During overt musical performance, improvisation was
associated with decreased causal influences from the SFG to SMA, SMA to
IPL, and IPL to SFG. The strength of the connectivity between these
regions was also negatively correlated with the originality of the
compositions.

Conclusions/Highlighted Discussion
The authors propose that the finding of increased alpha power during the
overt prelearned tasks reflects top-down inhibition or suppression of
potentially interfering alternative responses (e.g., the other three prelearned
melodies). The increase in beta power during the prelearned task may
reflect “improvement in cerebral integrative and motor functions,” and
“planning and execution of motor movements.”
The authors suggest the SMA may be involved in motor readiness,
motor imagery involving covert vocalizations, and “monitoring of current
and planned motor movements.” The left IPL and right STG are believed to
be involved in a feedback loop involving somatosensory and auditory
perception. The frontal areas (SFG, right DLPFC) may be involved in
cognitive control of musical improvisation. The causality analyses showing
reduced influence of the SFG to SMA to IPL to SFG loop during
improvisation aligns with the hypofrontality hypothesis, whereby “top-
down control may inhibit a creative process driven by bottom-up
processes.” The authors argue that during more complex improvisations, the
information flow is reversed through parts of the network (e.g., the SFG
receives information from the SMA), resulting in bottom-up processing
during more creative output.
The authors conclude that “creative performance in a real-time musical
improvisational task involves regions that may function outside of the top-
down control networks usually seen in traditional decision-making tasks.”
This may be driven by the time constraints related to the task, wherein
deliberate decision making about individual note choices is not possible,
resulting in reliance on “bottom-up processes to control note choices using
aesthetic rules that our advanced musician participants have internalized
during a lifetime of music engagement.”
Creativity as a Distinct Trainable Mental State: An
EEG Study of Musical Improvisation (Lopata,
Nowicki, & Joanisse, 2017)
Design
This study used EEG to evaluate three questions about the neural substrates
underlying improvisation. The first was whether there is a difference in
frontal alpha activity between musical improvisation, rote playback, and
passive listening, since synchronous frontal alpha oscillations are
hypothesized to serve as a marker of implicit, bottom-up “Type 1” creative
processes (see above). The second was to look for changes in alpha
synchronization associated with improvisational expertise/training, and the
third was to assess whether changes in alpha frequency correlate with the
quality of improvised compositions, as rated by experts. Twenty-two
musicians with a wide variety of musical experience (range: 4–48 years;
mean 18.5; SD 11.7) were split into two groups, one with formal
institutional training in improvisation (“FITI”) and the other without (“Non-
FITI”). Prior to testing, the musicians were shown three charts of 16 bars of
chord progressions and given the diatonic structures for each progression
(e.g., C-blues, G-major), but without overlying melodies. The experimental
tasks were performed in the same order each time: “Listen,” where a
melody was played and the subjects passively listened; “Learn,” where
subjects actively learned to play the melody on a keyboard; “Imagine
Playback,” where subjects imagined playing the prior melody; “Actual
Playback,” where subjects overtly played the learned melody; “Imagine
Improvisation,” where the subjects imagined improvising melodies over the
chord progressions; and “Actual Improvisation,” where they improvised
over the chords.
Behavioral measures included an expert assessment of the creativity of
the improvised performances via a questionnaire.
EEG data included measurements of upper alpha range power (10–12
Hz), which were calculated for each condition (versus a pre-stimulus
reference interval) at each electrode, and a measurement of synchronization
across electrodes.
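
A minimal sketch of this kind of measure, assuming task and pre-stimulus baseline epochs per electrode, is shown below: upper-alpha band power is computed in each interval and expressed as a relative change, with positive values indicating synchronization. The sampling rate, epoch lengths, and data are illustrative.

```python
import numpy as np
from scipy.signal import welch

def band_power(sig, fs, lo=10.0, hi=12.0):
    # mean power in the upper alpha band from a Welch spectrum
    freqs, psd = welch(sig, fs=fs, nperseg=fs)
    return psd[(freqs >= lo) & (freqs <= hi)].mean()

fs = 250
rng = np.random.default_rng(0)
baseline = rng.standard_normal((64, fs * 4))   # 64 electrodes, 4 s pre-stimulus
task = rng.standard_normal((64, fs * 30))      # 64 electrodes, 30 s task epoch

# Relative power change per electrode; positive = alpha synchronization
ers = np.array([(band_power(task[ch], fs) - band_power(baseline[ch], fs))
                / band_power(baseline[ch], fs)
                for ch in range(64)])
```
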
Results
In both the FITI and non-FITI groups, there was increased frontal alpha
synchronization in the right hemisphere during “Listen,” “Playback,” and
“Improvisation” tasks. During “Improvisation,” however, the FITI group
showed increased right hemisphere alpha power compared with the non-
FITI group. In the FITI group, alpha synchronization was higher during
improvisation compared with both “Listen” and “Playback,” suggesting a
unique interaction in alpha synchrony with improvisation and expertise.
Also in the FITI group there were positive correlations between the left and
right hemisphere frontal alpha synchrony for all tasks.
In the non-FITI group, there was a strong negative correlation of right
hemisphere alpha synchrony with both musical and improvisational
experience during “Improvisation”; this was true to a lesser extent during
“Listen” and “Playback.” In the FITI group, there were positive correlations
of left hemisphere alpha synchrony and age, musical experience, and
improvisational experience during all tasks.
There were no significant differences in creativity scores between FITI
and non-FITI groups, but in the FITI group only, there was a positive
correlation between creativity scores and right hemisphere alpha
synchronization.

Conclusions/Highlighted Discussion
The study demonstrates that frontal alpha synchronization is associated with
musical improvisation, which is enhanced by formal training experience,
and is associated with more creative performances in the most experienced
improvisers. The authors interpreted the increased frontal alpha
synchronization during improvisation to be “evidence of an underlying
creative mental state characterized by immersion in a Type 1 spontaneous
processing mode,” that suggests “top-down processing and internal focus of
attention,” and not merely “a suppression of executive functions and
logical-rational thought processes.” The authors speculate that the increased
right hemisphere alpha synchronization in the FITI group during
“Improvisation” may “support the view of a special role of right frontal brain
areas in the generation of original ideas, and as benefitting from expertise
and development through training.”
In the non-FITI group, the finding of negative correlations of right
hemisphere alpha synchronization with musical and improvisational
experience during improvisation is interpreted to reflect this group’s “lack
of immersion in Type 1 spontaneous processing,” and suggests an
engagement with music that is more deliberate than spontaneous. The
authors argue that the correlation between right frontal alpha synchrony and
creativity scores in those with FITI suggest that “Type 1 spontaneous
processing tends to yield higher quality improvised performance” in these
experts.

Summary and Discussion

The reviewed literature investigating the neuroanatomical substrates
underlying musical improvisation explores different aspects of this complex
behavior, including: the differential neural processing of rhythm and
melody during musical improvisation (Berkowitz & Ansari, 2008; Pinho et
al., 2016), the role of emotion (McPherson et al., 2016; Pinho et al., 2016),
the relationship to language (Brown et al., 2006; Liu et al., 2012), the
impact of an interlocutor (Donnay et al., 2014), and how expertise and
training (Berkowitz & Ansari, 2010; Lopata et al., 2017; Pinho et al., 2014),
the creative output (Lopata et al., 2017; Villarreal et al., 2013), and direct
electrical stimulation (Rosen et al., 2016) modulate neural activity within
the identified networks underlying musical improvisation.
Although these studies varied widely in their task design, overlap is seen
in a broad network of brain regions involved in cognitive control and
monitoring, motor planning and execution, multimodal sensation,
motivation, emotional/limbic processing, and language regions.
The brain networks involved in musical improvisation perform domain-
general processes that are recruited for the spontaneous generation of
music. For example, de Manzano and Ullén (2012b) showed that many of the
regions implicated in musical improvisation are also active when generating
random keystrokes, which suggests that musical improvisation involves
networks that are important for tasks involving freely-generated actions
more broadly. Brown et al. (2006) demonstrated that musical improvisation
recruits regions also involved in sentence generation. Analyzing known
functions of brain regions involved in improvisation provides insights into
the domain-general cognitive modalities that contribute to musical
improvisation. In the following, we review the implications of the described
studies (for additional reviews, see Beaty, 2015; Beaty et al., 2016).

Attentional Networks and the Prefrontal Cortex
Almost all studies of musical improvisation in the literature implicate the
prefrontal cortex, a region important for a number of cognitive, behavioral,
and affective functions. A theme that emerges in the reviewed studies is the
distinctive roles of the medial and lateral aspects of the PFC, regions known
to subserve different cognitive processes.
Medial PFC: The medial PFC (de Manzano & Ullén, 2012b; Limb &
Braun, 2008; Liu et al., 2012) is important for internally-focused attention,
self-generated actions, motivation, social cognition, and self-referential
thinking. It has widespread connections with limbic, heteromodal sensory,
and other prefrontal areas. In musical improvisation, the medial PFC may
be important in the coordination and expression of internally-motivated
behaviors, serving an integrative role combining multiple cognitive
processes in the pursuit of internal goals (Limb & Braun, 2008).
Improvisation of music with an emotional intention is associated with
increased medial PFC activity as part of a broader network involving the
insula and IFG (Pinho et al., 2016). Activity in more caudal medial
prefrontal regions is associated with higher creativity scores during
improvised rap, and this region has second- and third-order connections
with a widespread, bilateral network involving limbic, motor, and
multimodal perceptual areas (Liu et al., 2012). Facilitation of these
connections during improvised rap may be important in enhancing the
creative product (Liu et al., 2012). Liu et al. (2012) also report the medial
PFC to be functionally disconnected from the DLPFC during
improvisational tasks, a finding the authors speculate may allow for
unplanned, spontaneous idea generation outside the constraint of top-down
conscious control.
ACC: The dorsal ACC is activated in several studies of musical
improvisation (Berkowitz & Ansari, 2008; de Manzano & Ullén, 2012b; Lu
et al., 2015). The ACC is sometimes considered an extension of the medial
PFC given its anatomical proximity (just posterior), and has rich
connections with cognitive/attentional, affective, and motor areas. It is
connected with both “top-down” and “bottom-up” attentional networks, and
is important for error detection and monitoring, as well as reward-based
learning. The ACC participates in the selection of appropriate actions based
on predicted reward/affective values of competing plans, and monitors
errors in these predictions. This region and its function may serve an
important role in musical decision making during the moment-to-moment
unfolding of musical improvisation.
Dorsolateral PFC: The DLPFC—in contrast to the medial PFC—is a
prefrontal region that is important in “top-down” processing and externally-
directed attention, and plays a critical role in executive functions including
working memory, planning, and multi-step cognitive processes. Conflicting
findings have been shown in the DLPFC in studies of musical
improvisation. It has been reported as having either increased (Bengtsson et
al., 2007; de Manzano & Ullén, 2012a, 2012b; Donnay et al., 2014; Pinho
et al., 2016; Villarreal et al., 2013) or decreased (Donnay et al., 2014; Limb
& Braun, 2008; Liu et al., 2012; McPherson et al., 2016; Pinho et al., 2014,
2016) activity, depending on the study and the tasks involved.
In studies demonstrating increased DLPFC activation, it has been
reported to reflect top-down guidance of motor planning and response
selection (Bengtsson et al., 2007), reliance on working memory, inhibition
of competing stimuli (Rosen et al., 2016; Villarreal et al., 2013), integration
of goal-oriented information for attentional selection (Pinho et al., 2016),
and increased conscious self-monitoring when engaging with a musical
interlocutor (Donnay et al., 2014). Direct current stimulation of the right
DLPFC was associated with an increased quality of improvisational musical
performance in novices (Rosen et al., 2016), suggesting that activity in the
region can facilitate creative performance in certain subjects.
During musical improvisation, reduced lateral PFC activity is suggested
to represent a suspension of top-down, goal-directed, conscious control and
self-monitoring functions, which allows more remote associations and
unplanned, less predictable solutions to unfold (Limb & Braun, 2008). This
is most commonly reported dorsally (Limb & Braun, 2008; Liu et al.,
2012), but also more ventrally extending into the lateral orbitofrontal cortex
(Limb & Braun, 2008; Liu et al., 2012). This mechanism of creative
expression is supported by electrophysiological studies showing reduced
causal influences from the DLPFC (e.g., SFG) on premotor and parietal
areas during improvisation, with a reduced strength of connection between
these areas predictive of more creative performance (Adhikari et al., 2016).
This hypofrontality mechanism may be a marker of expertise, as
improvisational experience is negatively correlated with right DLPFC
activity (Pinho et al., 2014). In EEG studies, alpha synchronization, a
hypothesized marker of spontaneous, internally-focused attention, is
enhanced in subjects with formal improvisational training, and is associated
with increased musical creativity scores (Lopata et al., 2017). Stimulation
of the right DLPFC using tDCS was associated with a reduction in the
quality of improvisational performance in experts (Rosen et al., 2016). These effects may be
limited to those with specific improvisational training, as opposed to
musical training more broadly, as several studies requiring only the latter
(Berkowitz & Ansari, 2008; Villarreal et al., 2013) demonstrate increased
DLPFC activity during improvisation, and those with more creative
products showed increased activity within the DLPFC (Villarreal et al.,
2013).
DLPFC activity, and its functional connectivity, can be modulated by the
nature of the improvisational task, which may be a reflection of the
cognitive approach to creative expression (Pinho et al., 2016). Tasks that
require an explicit approach to creativity (e.g., limited to a specific pitch-
set) are associated with relative increases in DLPFC activity when
compared to those requiring implicit strategies (e.g., improvise an emotion)
(Pinho et al., 2016). The pitch-set improvisational task in Pinho et al.
(2016) was associated with increased DLPFC functional connectivity with
motor, auditory, and parietal regions—possibly reflecting an intentional,
top-down executive network—and the emotional improvisational tasks with
increased connections with default mode regions—a more integrative
network. Improvising with the intent of expressing positive emotions is
associated with reduced DLPFC activity (McPherson et al., 2016).

Motor Regions
Motor areas control the body’s movements and include primary motor
cortex (e.g., precentral gyrus) and high-order regions important for
planning, sequencing, initiation, and monitoring of movement (e.g., PMd,
PMv, SMA, pre-SMA), emotionally-guided movement (e.g., CMA),
patterning and sequencing of movements (basal ganglia), and coordination
of movements (cerebellum). Given that musical improvisation can only be
externalized through movement, it is not surprising that all of these regions
are involved in musical improvisation.
The SMA and pre-SMA are important in the selection, initiation, timing,
and monitoring of motor movements, and are thought to play a role in the
rhythmic patterning during improvisational tasks (Bengtsson et al., 2007;
Brown et al., 2006; de Manzano & Ullén, 2012a; Donnay et al., 2014; Liu
et al., 2012; Villarreal et al., 2013) and hierarchical control of motor
sequencing (de Manzano & Ullén, 2012a). The pre-SMA is more strongly
connected with the cerebellum during tasks of rhythmic improvisation (de
Manzano & Ullén, 2012a), which highlights its role in timing, and also with
the limbic areas during freestyle rap (Liu et al., 2012), suggesting an
interaction beyond that of other motor areas. The connection between
higher-order motor regions may be modulated by training, as more
experienced improvisers show increased connections of the SMA and PMd
with a widespread network involving prefrontal, premotor, motor, parietal,
and auditory regions (Pinho et al., 2014).
The PMd is reported in many studies (Bengtsson et al., 2007; Berkowitz
& Ansari, 2008; Brown et al., 2006; de Manzano & Ullén, 2012a; Limb &
Braun, 2008; Liu et al., 2012). It is suggested to play a role in sensorimotor
integration, whereby sensory information (often visual) is used to guide the
sequencing and planning of motor movements; it is also important for
internally-generated actions, and is connected with prefrontal areas. The
region may be important in reading musical notation (Bengtsson et al.,
2007), melodic performance (Bengtsson et al., 2007), and more broadly in
top-down, explicit processing of novel motor sequencing (de Manzano &
Ullén, 2012a).
The CMA is thought to guide the selection of voluntary movements
based on expected rewards, and is known to integrate limbic information to
guide motor behaviors. It is associated with increased activity during
musical improvisation (Liu et al., 2012). During freestyle rap, the CMA
shows increased functional connectivity with the amygdala as part of a
broader network integrating affective, motor, and perceptual processes (Liu
et al., 2012). The authors speculate it may represent an alternative pathway
of behavioral expression occurring outside DLPFC-mediated, explicit
motor selection.

Limbic/Affective Processing
Given the emotional nature of musical improvisation, it is not surprising
that limbic areas are involved. These regions help to represent emotion,
motivation, and memory, and are connected richly with the autonomic
nervous system, which provides information about internal body states.
There are reports of reduced activity in limbic regions, including the hypothalamus,
amygdala, hippocampus, parahippocampal gyrus, temporopolar cortex, and
ventral striatum, which may be indicative of the positive emotional valence
associated with improvising (Limb & Braun, 2008). During freestyle rap,
the medial PFC shows increased functional connectivity with the amygdala
via the IFG, CMA, and pre-SMA (Liu et al., 2012).
Insula: The insula is important in representing subjective emotional and
motivational states as it receives interoceptive inputs from the body via the
autonomic nervous system and integrates this information with sensory,
limbic, hedonic, and cognitive inputs. It is also important in salience
detection, and serves an important role in switching between different large-
scale networks (e.g., central executive and default mode networks).
Successful translation of highly-integrated emotional representations into
specific motor plans may be important for creative expression under certain
conditions, a process that may be enhanced when guided by the medial
PFC. Activity in the insular cortices has been reported in several studies of
musical improvisation (de Manzano & Ullén, 2012b; Limb & Braun, 2008;
Pinho et al., 2016; Villarreal et al., 2013), with reports of both increased
(Brown et al., 2006; de Manzano & Ullén, 2012b; Pinho et al., 2016;
Villarreal et al., 2013) and decreased (Limb & Braun, 2008) activity. When
comparing groups with high versus low creativity scores on a task of
rhythmic improvisation, the right insula was associated with increased
creativity scores, and its activity was positively correlated with higher
scores (Villarreal et al., 2013). Improvisation with emotional intent is
associated with bilateral insular activation (Pinho et al., 2016), and the
expression of negatively-valenced emotional improvisations is associated
with increased insular connectivity with the midbrain substantia nigra
(McPherson et al., 2016).

Language Areas
The left IFG is critical for expressive language and syntax/grammar,
functioning as part of Broca’s area. The IFG may be involved in other
functions, including response inhibition, mirroring of external motor
movements, generative verbal fluency, and hierarchical motor sequencing.
The IFG has been implicated in several studies of improvisation (Berkowitz
& Ansari, 2008; de Manzano & Ullén, 2012a, 2012b; Donnay et al., 2014;
Limb & Braun, 2008; Liu et al., 2012; McPherson et al., 2016; Pinho et al.,
2016), and is thought to play a role in the generation and selection of motor
sequences (Berkowitz & Ansari, 2008), novel musical phrases (de Manzano
& Ullén, 2012b), detection of salient harmonic and rhythmic elements
(Donnay et al., 2014), and hierarchical structuring of musical phrases
(Donnay et al., 2014). During freestyle rap, the medial PFC shows enhanced
functional connectivity with the left IFG as part of a broader, integrative
network (Liu et al., 2012). Donnay et al. (2014) demonstrated the IFG to be functionally
disconnected from areas important for communication of explicit semantic
information (e.g., angular gyrus) during improvisational tasks that utilize a
musical interlocutor; the authors suggest that in musical communication
explicit semantic knowledge is superfluous and “acoustic-phonologic-analysis”
areas are paramount.

Sensory Processing
Sensory information is represented in the cortex in a hierarchical manner.
Incoming sensory information is initially processed as simple unimodal
representations (primary sensory areas), and progressively organized into
complex unimodal representations (secondary sensory cortex), then later
combined with other sensory modalities in heteromodal regions. These
regions are subject to both bottom-up and top-down regulation. A number
of studies report increased activity within primary and unimodal sensory
areas (Bengtsson et al., 2007; Brown et al., 2006; Limb & Braun, 2008).
This may be related to task demands, suggesting a role for increased
sensory processing during improvisation, or possibly a release phenomenon
with reduced top-down inhibition (Limb & Braun, 2008).
Auditory: Auditory sensory streams are located within the superior and
lateral temporal areas, and musical improvisation is associated with
activation of these areas, including the STG (Bengtsson et al., 2007; Brown
et al., 2006; Donnay et al., 2014; Limb & Braun, 2008; Liu et al., 2012;
Pinho et al., 2016), MTG (Limb & Braun, 2008; Liu et al., 2012), and ITG
(Limb & Braun, 2008). The posterior superior temporal areas, a region
known to be important in highly-structured auditory processing, are active
during improvisation (Bengtsson et al., 2007; Limb & Braun, 2008; Liu et
al., 2012) and are involved in auditory working memory (Bengtsson et al.,
2007; Donnay et al., 2014). This region may be part of an auditory-motor feedback loop where
auditory information is utilized online to guide the next musical idea via
higher-order motor planning through instrumental performance (Bengtsson
et al., 2007), and may aid the retrieval of stored musical motifs (Bengtsson
et al., 2007). The posterior temporal regions may also be important in
harmonic processing (Donnay et al., 2014). Activity in this region is
associated with higher scores of creativity during improvised rap (Liu et al.,
2012).
Somatosensory: The SPL—a secondary somatosensory region—has
been reported to have increased activity in several studies (Berkowitz &
Ansari, 2008; Donnay et al., 2014; Limb & Braun, 2008), but not all (Liu et
al., 2012). Activity in primary somatosensory areas is also reported. This
may reflect task demands, although in these studies the motor output (and
thus somatosensory feedback) was similar in the improvisational tasks and
controls; such activity was therefore suggested to represent a “generalized
intensification of activity in all sensory modalities” associated with musical
spontaneity (Limb & Braun, 2008).
Visual: The occipital cortex, which is the site of hierarchical visual
processing, including the fusiform and lingual gyri, was activated in a
number of studies (Bengtsson et al., 2007; Donnay et al., 2014; Limb &
Braun 2008; Liu et al., 2012; Pinho et al., 2016), and has been reported to
reflect visual demands associated with using a musical score to guide
improvisation (Bengtsson et al., 2007).

Heteromodal Sensory Processing and the Parietal Lobes
The parietal lobe is important for many cognitive functions, as it is situated
between multiple sensory areas (somatosensory, auditory, visual) and has
widespread connections with the frontal cognitive and motor areas.
Depending on the subregion, it is involved in top-down attentional
processing and executive functioning (IPS) and bottom-up attentional
processing (TPJ), serves as a sensory guide for movement, and binds
discrete visual elements into a coherent sensory “whole”; it is important for
a diversity of higher cognitive processes such as visual imagination, mirror
phenomena, calculations, navigation, feelings of familiarity, and
knowledge of directionality.
Angular gyrus: The angular gyrus is a heteromodal sensory region that
is part of the default mode network, an interconnected set of cortical
regions that subserve internal states characterized by defocused attention, mind-
wandering, and recalling autobiographical information. It lies at the
temporoparietal junction (TPJ), which is also implicated in bottom-up
attentional processes as part of a broader ventral attention network
involving the ventral frontal cortex. This network is important for
identifying and orienting to behaviorally relevant stimuli that occur
unexpectedly. Several studies show reduced activity in TPJ during
improvisational tasks (Berkowitz & Ansari, 2010; Brown et al., 2006;
Donnay et al., 2014; Limb & Braun, 2008; Liu et al., 2012; McPherson et
al., 2016). This may reflect the role of expertise, as Berkowitz and Ansari
(2010) showed reduced TPJ activity in experts compared to novices, and
Pinho et al. (2014) show activity in this region during improvisational tasks
to be inversely correlated with improvisational (but not overall musical)
experience. The role these deactivations play in spontaneous musical
expression is uncertain, but may reflect a broad reduction in top-down
attentional control and increased automation of complex behavior (Pinho et
al., 2014), or increased top-down control with explicit, goal-directed
suppression of bottom-up stimuli that compete for attention (Berkowitz &
Ansari, 2010). EEG evidence is more supportive of the former hypothesis
(Adhikari et al., 2016).
In addition to the lateral parietal regions, musical improvisation is
associated with other parietal areas such as the PCC, a default-mode
network region with strong connections to the medial PFC, lateral
parietal areas, and temporolimbic areas. Deactivation in the PCC is seen
during improvisation (Limb & Braun, 2008; Liu et al., 2012), although
increased activity in this region is correlated with higher scores of creativity
during improvised rap (Liu et al., 2012). Improvisation was also associated
with reduced activity within the precuneus (McPherson et al., 2016), a
region near the PCC with overlapping functionality. Increased bilateral
activity of the SMG is reported in several studies (Donnay et al., 2014;
Limb & Braun, 2008), but not all (Liu et al., 2012), and may relate to its
role in preparing to execute learned motor actions (i.e., praxis).

Improvising is associated with changes in brain regions involved in
attention, higher-order motor processing, limbic processing, unimodal and
multimodal sensory processing, and linguistic processing. Activity in these
regions is not unique to musical improvisation, but rather subserves
domain-general cognitive processes that are recruited when improvising.
From a cognitive neuroscience perspective, improvisation can be seen as a
process in which auditory-motor representations are retrieved from memory
storage, selected and combined based on stylistic rule-based constraints,
and then executed through the motor system based on real-time
sensorimotor and emotional evaluation. This is analogous to what occurs in
spoken language.
Expertise in improvisation appears to require various types of attentional
shifts: from top-down attention involving the DLPFC (and dorsal attention
network) to a state in which conscious monitoring is suspended and more
creative products are generated spontaneously and implicitly; but also, at
times, inhibition of bottom-up processing (rTPJ
deactivation) in order to preserve top-down, goal-directed states of internal
motivation, represented by activation of the medial prefrontal areas, which
themselves interface with a widespread network involving sensory, motor,
and limbic regions during improvisation. This medial-lateral prefrontal
dissociation of activity, seen most clearly in experts, may underlie the
reported psychological state of flow, whereby complex, goal-directed
actions are allowed to be expressed effortlessly.
Musical improvisation provides a unique substrate for the study of the
neural basis of creativity, providing insights into how domain-general
cognitive processes can themselves be creatively recombined in real time to
create spontaneous works of art.

REFERENCES
Adhikari, B. M., Norgaard, M., Quinn, K. M., Ampudia, J., Squirek, J., & Dhamala, M. (2016). The
brain network underpinning novel melody creation. Brain Connectivity 6(10), 772–785.
Beaty, R. E. (2015). The neuroscience of musical improvisation. Neuroscience & Biobehavioral
Reviews 51, 108–117.
Beaty, R. E., Benedek, M., Silvia, P. J., & Schacter, D. L. (2016). Creative cognition and brain
network dynamics. Trends in Cognitive Sciences 20(2), 87–95.
Bengtsson, S. L., Csikszentmihalyi, M., & Ullén, F. (2007). Cortical regions involved in the
generation of musical structures during improvisation in pianists. Journal of Cognitive
Neuroscience 19(5), 830–842.
Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: The neural correlates
of musical improvisation. NeuroImage 41(2), 535–543.
Berkowitz, A. L., & Ansari, D. (2010). Expertise-related deactivation of the right temporoparietal
junction during musical improvisation. NeuroImage 49(1), 712–719.
Brown, S., Martinez, M. J., & Parsons, L. M. (2006). Music and language side by side in the brain: A
PET study of the generation of melodies and sentences. European Journal of Neuroscience 23(10),
2791–2803.
de Manzano, O., & Ullén, F. (2012a). Activation and connectivity patterns of the presupplementary
and dorsal premotor areas during free improvisation of melodies and rhythms. NeuroImage 63(1),
272–280.
de Manzano, O., & Ullén, F. (2012b). Goal-independent mechanisms for free response generation:
Creative and pseudo-random performance share neural substrates. NeuroImage 59(1), 772–780.
Donnay, G. F., Rankin, S. K., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2014). Neural
substrates of interactive musical improvisation: An fMRI study of “trading fours” in jazz. PLoS
ONE 9(2), e88665.
Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An fMRI
study of jazz improvisation. PLoS ONE 3(2), e1679.
Liu, S., Chow, H. M., Xu, Y., Erkkinen, M. G., Swett, K. E., Eagle, M. W., … Braun, A. R. (2012).
Neural correlates of lyrical improvisation: An fMRI study of freestyle rap. Scientific Reports 2,
834. doi:10.1038/srep00834
Lopata, J. A., Nowicki, E. A., & Joanisse, M. F. (2017). Creativity as a distinct trainable mental state:
An EEG study of musical improvisation. Neuropsychologia 99, 246–258.
Lu, J., Yang, H., Zhang, X., He, H., Luo, C., & Yao, D. (2015). The brain functional state of music
creation: An fMRI study of composers. Scientific Reports 5, 12277. doi:10.1038/srep12277
McPherson, M. J., Barrett, F. S., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2016).
Emotional intent modulates the neural substrates of creativity: An fMRI study of emotionally
targeted improvisation in jazz musicians. Scientific Reports 6, 18460. doi:10.1038/srep18460
Pinho, A. L., de Manzano, O., Fransson, P., Eriksson, H., & Ullén, F. (2014). Connecting to create:
Expertise in musical improvisation is associated with increased functional connectivity between
premotor and prefrontal areas. Journal of Neuroscience 34(18), 6156–6163.
Pinho, A. L., Ullén, F., Castelo-Branco, M., Fransson, P., & de Manzano, O. (2016). Addressing a
paradox: Dual strategies for creative performance in introspective and extrospective networks.
Cerebral Cortex 26(7), 3052–3063.
Rosen, D. S., Erickson, B., Kim, Y. E., Mirman, D., Hamilton, R. H., & Kounios, J. (2016). Anodal
tDCS to right dorsolateral prefrontal cortex facilitates performance for novice jazz improvisers but
hinders experts. Frontiers in Human Neuroscience 10, 579. Retrieved from
https://doi.org/10.3389/fnhum.2016.00579
Villarreal, M. F., Cerquetti, D., Caruso, S., Schwarcz López Aranguren, V., Gerschcovich, E. R.,
Frega, A. L., & Leiguarda, R. C. (2013). Neural correlates of musical creativity: Differences
between high and low creative subjects. PLoS ONE 8(9), e75427.
CHAPTER 21

NEURAL MECHANISMS OF MUSICAL IMAGERY

TIMOTHY L. HUBBARD

In the early years of the cognitive approach to psychology, cognitive
processes were considered analogous to software and the brain was
considered analogous to hardware. Software and hardware can be viewed as
relatively independent, and so there was not much focus on the neural
mechanisms of cognitive processes. However, with the development of
brain imaging technologies that allowed examination of functioning in
intact living brains, researchers began to make significant advances in
linking different cognitive processes with different neural mechanisms, and
questions about the neural mechanisms of cognition became more central.
Music offered an excellent venue for investigation of neural mechanisms of
cognition (e.g., Peretz & Zatorre, 2003) and brain plasticity (e.g., Herholz
& Zatorre, 2012; Schlaug, 2015). The importance of understanding neural
mechanisms of cognition was underscored by the emergence of the notion
of embodied cognition, an approach which suggests that cognitive
functioning is influenced by characteristics and properties of embodied
experience (e.g., Barsalou, 2008; Gibbs, 2005; Shapiro, 2010; Wilson,
2002). Indeed, there have recently been calls for an embodied cognition
approach to the study of music (e.g., Cox, 2016). Most papers in the
psychology and neuroscience of music have focused on perception, cognition,
and performance (e.g., Levitin & Tirovolas, 2009), with less attention to
musical imagery. This chapter will focus on neural mechanisms of
musical imagery across a range of domains.
Music is generally considered an auditory stimulus, but perceptual and
cognitive representation of music can involve non-auditory (e.g.,
kinesthetic) information, and musical imagery involves auditory and non-
auditory components. Studies involving auditory and non-auditory
components of musical imagery and that have implications for
understanding neural mechanisms of musical imagery are considered.
Studies involving only behavioral or psychophysical measures of musical
imagery are reviewed in Hubbard (2010, 2013a, 2013b, 2018, forthcoming)
and are not considered here unless those studies have implications for
understanding neural mechanisms of musical imagery. Studies involving
neuroscience of music that do not generate testable predictions regarding
musical imagery are also not considered here. The similarity of imagery and
perception of musical stimuli is addressed, and results from studies
involving behavioral and psychophysical methods, clinical studies of brain-
damaged individuals, and physiological data involving electrophysiology
and brain imaging are considered. Involuntary musical imagery is
addressed, and examples involving anticipatory musical imagery, musical
hallucinations, musical imagery accompanying schizophrenia, earworms,
and the relative lack of musical imagery in synesthesia are considered.
Embodied musical imagery is addressed, and examples involving spatial
and force metaphors, the role of mimicry, the distinction between the inner
ear and inner voice, the effects of mental practice on performance, musical
imagery and dance, and musical affect are considered. A brief summary and
conclusions are then presented.

IMAGERY AND PERCEPTION OF MUSIC

Imagery often seems to exhibit perception-like qualities, and a starting point
for many studies of musical imagery involves the similarity of imagery and
perception. There have been three main approaches to examining the
relationship between imagery and perception, and these involve (a)
behavioral and psychophysical studies; (b) studies of patients with brain
damage; and (c) brain-imaging methods such as electroencephalography
(EEG), positron emission tomography (PET), and functional magnetic
resonance imaging (fMRI).

Behavioral and Psychophysical


There are many similarities in behavioral and psychophysical data
regarding musical imagery and music perception. Properties of perceived
and imaged musical tones such as pitch (Hubbard & Stoeckig, 1988) and
timbre (Crowder, 1989) prime subsequently perceived tones with matching
properties. Imaged tempo for a familiar tune matches the typical
performance tempo for that tune (Halpern, 1988b; Jakubowski, Farrugia, &
Stewart, 2016), and studies in which participants scanned through an
imaged melody found that relative latencies between notes are preserved
(e.g., Halpern, 1988a; Zatorre, Halpern, & Bouffard, 2010; Zatorre,
Halpern, Perry, Meyer, & Evans, 1996). Musical images preserve harmonic
relatedness and tonality (Hubbard & Stoeckig, 1988; Vuvan & Schmuckler,
2011) and exhibit a weak form of absolute pitch (Halpern, 1989;
Schellenberg & Trehub, 2003). Pitch acuity is similar in perceived and
imaged musical pitch, but temporal acuity is worse in musical imagery than
in perception (Janata & Paroo, 2006). Not surprisingly, pitch acuity in
imagery is better in participants with more musical training (Cebrian &
Janata, 2010b). Right-handed experimental participants instructed to image
a voice often localize that voice on their right side (Prete, Marzoli,
Brancucci, & Tommasi, 2016), consistent with the right ear advantage for
speech, and it could be hypothesized that side preferences found in music
perception should be found for musical imagery. In general, findings are
consistent with hypotheses that musical imagery preserves structural and
temporal properties of a musical stimulus and that imagery of musical
stimuli involves many of the same neural mechanisms as music perception.
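
As an illustration of how such timing claims are typically quantified, the sketch below correlates between-note scanning latencies in imagery with the inter-onset intervals of the same melody as performed; all values are invented for illustration and are not from the cited studies.

```python
from scipy.stats import pearsonr

# Hypothetical inter-onset intervals (seconds) between successive notes of a
# performed melody, and the latencies one participant needs to "scan"
# between the same notes in imagery.
performed_ioi = [0.50, 0.50, 1.00, 0.50, 0.50, 1.00, 0.75]
imagery_ioi = [0.55, 0.48, 1.10, 0.52, 0.47, 0.95, 0.80]

r, p = pearsonr(performed_ioi, imagery_ioi)
print(f"r = {r:.2f}, p = {p:.3f}")  # a high r indicates preserved relative timing
```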

Brain Damage
Although there have been numerous studies of the effects of brain damage
on music perception, cognition, and production (for reviews, see Marin &
Perry, 1999; Peretz & Zatorre, 2005; Stewart, von Kriegstein, Warren, &
Griffiths, 2006), there have been relatively few studies of musical imagery
in patients with brain damage. The studies that have been reported typically
compared performance involving imagery in brain-damaged patients with
performance on the same task in a control group. Patients with damage to
the right temporal lobe performed worse on pitch comparisons in imagery
and in perception than did patients with damage to the left temporal lobe or
control participants (Zatorre & Halpern, 1993). Halpern (2003) suggested
these lesion data and subsequent imaging data (from Zatorre et al., 1996)
demonstrated the right superior temporal gyrus is involved in comparisons
of pitch in imagery (see also Samson & Zatorre, 1991). Patients with right
temporal lobe damage to the area including Heschl’s gyrus do not perceive
a missing fundamental (Zatorre, 1988), and this is consistent with a role for
this area in top-down representation of pitch. Patients with right hemisphere
damage have difficulty in processing information regarding musical interval
and musical contour (Liégeois-Chauvel, Peretz, Babaï, Laguitton, &
Chauvel, 1998; Peretz, 1990) and in identification of sad music (Khalfa,
Schon, Anton, & Liégeois-Chauvel, 2005), and this predicts such patients
would have similar difficulties in musical imagery. More positively, music
influences brain plasticity, and so it could be predicted that musical imagery
might be useful in the treatment of some neurological damage or disorders
(e.g., melodic intonation therapy, Peretz, 2013; also Bringas et al., 2015;
Sabaté, Llanos, & Rodriguez, 2008; Särkämö, Altenmüller, Rodriguez-
Fornells, & Peretz, 2016).
Clinical studies of individuals with trauma-induced amusia (e.g., Marin
& Perry, 1999; Satoh, 2014) or congenital amusia (e.g., Peretz, 2013) have
shed light on neural mechanisms of music processing, but musical imagery
has typically not been studied in such individuals. Amusias might have a
basis in perception or memory (Peretz, 2002); to the extent an amusia
involves dysfunction of memory, imagery might be impacted (e.g., Satoh,
2014, explicitly identifies memory as internal imagery), but to the extent an
amusia involves dysfunction in perception, imagery might be relatively
spared. Also, parallels between types of amusia and types of aphasia (e.g.,
receptive, production) suggest there may be some overlap in neural
mechanisms that process music and neural mechanisms that process
language (cf. Besson & Schön, 2003; Marin & Perry, 1999; Patel, 2008).
Additionally, findings that patients with amusia have difficulty in spatial
tasks such as mental rotation (Douglas & Bilkey, 2007; but see Tillmann et
al., 2010), coupled with findings that some types of musical imagery
manipulation involve cortical areas implicated in mental rotation (Zatorre et
al., 2010), suggest such patients might have impaired musical imagery.
Studies of patients with amusia suggest music functions are not as strongly
lateralized as language functions (Alossa & Castelli, 2009), and this has
been confirmed in non-patient studies as well (e.g., Parsons, 2003; Platel et
al., 1997). Also, presence of amusia predicts deficits in auditory emotion
recognition in schizophrenia, and this might reflect development of music
and language from the same musical protolanguage (Kantrowitz et al.,
2014).

Physiological Measures
Many studies recorded physiological measures in an attempt to understand
neural mechanisms of musical imagery. These studies typically involved
electrophysiology such as EEG and event-related potential (ERP) or brain
imaging such as PET and fMRI (for review, see Koelsch, 2012).

Electrophysiology
Imaging a melody results in more high-band synchronized alpha than does
perceiving a melody (Schaefer, Vlek, & Desain, 2011; Villena-González,
López, & Rodríguez, 2016), and alpha is increased during imagery of more
complex tones (van Dijk, Nieuwenhuis, & Jensen, 2010). Emitted potentials
occur when a musical note is expected but not presented (Cebrian & Janata,
2010b; Janata, 2001), and these are similar to evoked potentials elicited by
presentation of a musical note. There are differences in size of the N1 in
response to a perceived tone as a function of image accuracy and whether
preceding tones were imaged or perceived (Cebrian & Janata, 2010a). If a
participant deliberately generates an auditory image appropriate to a
stimulus seen in a visual picture, P2 and LPC are increased (Wu, Mai,
Chan, Zheng, & Luo, 2006). A larger mismatch in loudness or pitch
between imaged tones and subsequent perceived tones elicits a larger N2
(Wu, Mai, Yu, Qin, & Luo, 2010) and lower-pitched or louder images and
percepts evoke a larger N1 and LPC (Wu, Yu, Mai, Wei, & Luo, 2011).
Accented beats in a sequence of imaged or perceived beats result in a larger
positive amplitude after 180–250 milliseconds and a larger negative
amplitude after 350 milliseconds (Vlek, Schaefer, Gielen, Farquhar, &
Desain, 2011). Relatedly, rhythmic aspects of melody are more easily
isolated in EEG than are pitch or melody-driven aspects (Schaefer, Desain,
& Suppes, 2009). Mismatch negativity is evoked in musicians for perceived
and for imaged musical stimuli (Herholz, Lappe, Knief, & Pantev, 2008;
Yumoto et al., 2005). Continuation of a lyric in imagery during an
unexpected silent gap in familiar music results in several changes in
perceptual, attentional, and cognitive components of ERPs (Gabriel et al.,
2016). In highly trained musicians, ERPs while reading a visual musical
score are indistinguishable from ERPs while listening to auditory notes
(Simoens & Tervaniemi, 2013). In general, imagery of a musical stimulus
results in generation of ERP or EEG patterns similar to those generated by
perception of a musical stimulus.
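
The alpha-band findings above depend on quantifying band-limited EEG power. A minimal sketch of one common approach (band-pass filtering plus the Hilbert envelope) on synthetic signals follows; the sampling rate, amplitudes, and condition labels are assumptions for illustration, not parameters from the cited studies.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def alpha_power(eeg, fs):
    # Band-pass a single channel to 8-12 Hz, then average the squared
    # Hilbert envelope to estimate mean alpha-band power.
    b, a = butter(4, [8, 12], btype="bandpass", fs=fs)
    return np.mean(np.abs(hilbert(filtfilt(b, a, eeg))) ** 2)

# Synthetic comparison: a stronger 10 Hz component stands in for an
# "imagery" condition, a weaker one plus noise for "perception".
fs = 250
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)
imagery = 2.0 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
perception = 0.5 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
print(alpha_power(imagery, fs) > alpha_power(perception, fs))  # True
```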

Brain Imaging
There have been numerous studies involving brain imaging during
processing of musical stimuli (for reviews, see Koelsch, 2010, 2012; also
Peretz & Zatorre, 2003) and changes in the brain related to musical training
(Wan & Schlaug, 2013). There is substantial overlap of cortical areas
activated in musical imagery and activated in music perception, especially
in Wernicke’s area and its right hemisphere homologue (Zhang, Chen, Wen,
Lu, & Liu, 2017) and auditory association areas (e.g., Daselaar, Porat,
Huijbers, & Pennartz, 2010; Herholz, Halpern, & Zatorre, 2012; Zatorre et
al., 1996). Spontaneous imagery during an unexpected gap in a well-known
musical piece (Kraemer, Macrae, Green, & Kelley, 2005) or during a silent
gap prior to the start of an expected music track on a familiar CD (Leaver,
van Lare, Zielinski, Halpern, & Rauschecker, 2009) involves activation of
auditory association areas as well as prefrontal and motor areas. Auditory
imagery may activate frequency-specific regions in primary auditory cortex
(Oh, Kwon, Yang, & Jeong, 2013). When participants listen to four-part
harmony, there is greater activation in bilateral temporal lobes, cingulate
gyrus, and medial cerebellum when participants focus on the harmony as a
whole, but greater activation of superior parietal, bilateral precuneus, and
bilateral orbital frontal cortices if participants focus on a particular (e.g.,
alto) line (Satoh, Takeda, Nagata, Hatazawa, & Kuzuhara, 2001). Judgment
of similarities of perceived timbres and of imaged timbres results in similar
cortical activation (Halpern, Zatorre, Bouffard, & Johnson, 2004):
secondary auditory cortex and supplementary motor cortex are activated in
both imagery and perception, but primary auditory cortex is activated only
in perception (see also Zhang et al., 2017). Indeed, passive listening to
music by musicians (Haueisen & Knösche, 2001) and non-musicians
(Perrone-Capano, Volpicelli, & di Porzio, 2017) who remain motionless
results in activation of cortical motor areas.
Participants who self-report more vivid musical imagery exhibit greater
activation in right superior temporal gyrus and prefrontal cortex (Herholz et
al., 2012) and in right parietal cortex (Zatorre et al., 2010). Higher self-
reported vividness of auditory imagery correlates with gray matter volume
in left inferior parietal lobe, medial superior frontal gyrus, middle frontal
gyrus, and left supplementary motor area (Lima et al., 2015). Application of
TMS over the right hemisphere (to disrupt cortical activation) disrupts pitch
discrimination (Halpern, 2003). Imagery reversal of a musical stimulus (i.e.,
scanning backward through a melody; Zatorre et al., 2010) activates
intraparietal sulcus and ventrolateral and dorsolateral frontal cortex (areas
involved in manipulating sensory information). Musicians who read a
musical score initially exhibit activation in occipital areas that spreads to
midline parietal and then to left temporal auditory association areas and
right premotor areas, and this pattern could reflect emergence of notational
audiation, that is, auditory imagery of a piece of music that is evoked by
reading the musical score of that piece (Schürmann, Raij, Fujiki, & Hari,
2002). Participants instructed to image a single note exhibit activation of
bilateral superior temporal gyri, medial and inferior frontal gyri, and
precuneus (Yoo, Lee, & Choi, 2001). Overall, brain imaging studies
generally support the idea that neural mechanisms are shared between
imagery and perception and between imagery and production (see later
subsection on “Mental Practice and Performance”), although there are
exceptions (e.g., primary auditory cortex is less likely to be activated during
imagery than during perception).

INVOLUNTARY MUSICAL IMAGERY
The majority of laboratory studies of musical imagery involve images
created in response to a stimulus or task demand, and as noted above, these
studies suggest such imagery generally recruits neural mechanisms similar
to those used in music perception, cognition, and performance. However,
musical imagery can be involuntary and occur spontaneously and without
conscious control. Five types of involuntary musical imagery are
considered here, namely (a) anticipatory musical imagery, (b) musical
hallucinations, (c) musical imagery in schizophrenia, (d) earworms, and (e)
synesthesia.

Anticipatory Musical Imagery


Involuntary musical imagery can reflect anticipation of an upcoming or
ongoing musical stimulus. As noted earlier, when participants encounter an
unexpected silent gap when listening to a familiar melody, they often report
continuation of the melody in imagery; such continuation is linked with
activation in auditory association areas and, when linguistic information
(e.g., lyrics) is not available, in primary auditory cortex (Kraemer et al.,
2005). Similarly, listeners who expect a musical stimulus but are presented
with silence exhibit emitted potentials similar to the evoked potentials that
occur when a musical stimulus is perceived (Janata, 2001). As noted earlier,
when listening to a familiar CD, participants often experience mental
imagery of an upcoming track during the silent period before that track;
such imagery is linked with activity in rostral prefrontal cortex and motor
areas (Leaver et al., 2009). Notational audiation can be considered
anticipatory musical imagery, as content of audiation anticipates what
would be heard if the notated music was performed. Indeed, ERPs in highly
trained musicians during visual note reading are indistinguishable from
ERPs during auditory note perception (Simoens & Tervaniemi, 2013). The
existence of anticipatory musical imagery is consistent with hypotheses that
imagery is an internal predictive process (e.g., Neisser, 1976; Tian &
Poepple, 2012) and that anticipatory musical imagery might be linked with
expectations that contribute to musical affect (cf. Huron, 2006; Juslin &
Västfjäll, 2008).
Musical Hallucinations
In voluntary musical imagery, individuals have volitional control over
imagery and are aware that the sound does not emanate from a stimulus in
the environment. In musical hallucinations, there is no volitional control
over imagery and sounds are perceived to emanate from objects in the
environment. Musical hallucinations are classified as idiopathic if they
occur in the absence of associated psychopathology (other than hearing
impairment) and as sympathetic if they are associated with concurrent
psychopathology such as depression or schizophrenia (Coebergh, Lauw,
Bots, Sommer, & Blom, 2015). Common etiological factors for musical
hallucinations are brain injury, epilepsy, psychiatric disorder, and
intoxication/pharmacology (Evers, 2006; Evers & Ellger, 2004). Musical
hallucinations can accompany hearing loss (e.g., Hammeke, McQuillen, &
Cohen, 1983), possibly because lack of auditory input disinhibits cortical
mechanisms of auditory imagery and perception (Griffiths, 2000). Patients
with hearing loss might experience musical imagery rather than other types
of auditory imagery because music is more predictable and repetitive than
are other types of auditory stimuli (Kumar et al., 2014). Dysfunction of
temporal cortex (e.g., Kasai, Asada, Yumoto, Takeya, & Matsuda, 1999),
right hemisphere focal brain lesions (Berrios, 1991; but see Keshavan,
Davis, Steingard, & Lishman, 1992; Kumar et al., 2014), and activity in
right superior temporal gyrus (Penfield & Perot, 1963), posterior middle
right temporal lobe (Griffiths, Jackson, Spillane, Friston, & Frackowiak,
1997), superior temporal sulcus (Bernardini, Attademo, Blackmon, &
Devinsky, 2017), and cerebellum (Griffiths, 2000) are linked to the
presence of musical hallucinations. However, given that cerebral
localization of music processing is dependent upon musical background and
experience, the relationship between neural mechanisms and musical
hallucinations could exhibit significant individual differences.

Schizophrenia
The most commonly investigated psychopathology within auditory imagery
literature is schizophrenia. Although the majority of investigations of
schizophrenia focused on auditory hallucinations involving verbal stimuli
(e.g., Cho & Wu, 2013; Evans, McGuire, & David, 2000; Johns et al., 2001;
McGuire et al., 1996; Shergill, Bullmore, Simmons, Murray, & McGuire,
2000), cases of musical hallucinations have been documented. Saba and
Keshavan (1997) documented sixteen patients with schizophrenia who
reported musical hallucinations. Musical imagery in schizophrenia is
typically hallucinatory (not under voluntary control), and Baba and
colleagues (Baba, Hamada, & Koca, 2003) suggested a model of musical
hallucination in schizophrenia in which musical imagery becomes more
obsessive in quality, is perceived as originating outside the individual, and
is ultimately accepted as part of the self. The content of musical
hallucinations in schizophrenia is often described as religious, and this is
consistent with observations that delusions in schizophrenia often contain
religious themes (e.g., Galant-Swafford & Bota, 2015). Brain imaging
acquired during a schizophrenic patient’s musical hallucinations revealed
increased activity in right orbitofrontal cortex (Bleich-Cohen, Hendler,
Pashinian, Faragian, & Poyurovsky, 2011). Relatedly, differences in brain
activation patterns of patients with schizophrenia and controls when spoken
sentences were imaged in another person’s voice, but not when sentences
were imaged in the participant’s own voice (McGuire et al., 1995), suggest
articulatory information related to the inner voice (discussed later in the
chapter) might be overly represented in schizophrenia.

Earworms
Perhaps the fastest growing area of research on musical imagery during the
past several years involves earworms (also referred to as involuntary
musical imagery, stuck-song syndrome, brain worms, sticky music,
intrusive musical imagery, and perpetual music track; see Williams, 2015).
Earworms are a fragment of a song or melody that repeatedly and
involuntarily occupies an individual’s awareness. Unlike musical
hallucinations, earworms are generally not considered to reflect
psychopathology and are usually not considered distressing by those who
experience them (Beaty et al., 2013; Halpern & Bartlett, 2011; Hemming &
Merrill, 2015). Research on earworms has focused on descriptive
phenomenology and behavioral correlates (for summary, see Hubbard,
forthcoming), and there has been little consideration of neural mechanisms
of earworms. Levitin (2007) suggested earworms occur when neural areas
representing a specific piece of music get stuck in “playback mode.”
Farrugia and colleagues (Farrugia, Jakubowski, Cusack, & Stewart, 2015)
found that frequency of occurrence of earworms was related to cortical
thickness in the right frontal and temporal cortices and anterior cingulate,
whereas affective aspects of involuntary musical imagery were related to
gray matter volume in right temporopolar and parahippocampal cortices. It
could be predicted that neural mechanisms previously shown to be involved
in voluntary musical imagery might be activated during earworms, and
there might be additional (or lack of) activation in other areas or differences
in time course of activation that reflect the involuntary nature of earworms
(e.g., differences in voluntary voice imagery and involuntary voice
hallucinations; Linden et al., 2011).

Synesthesia
Synesthesia occurs if a stimulus in one dimension or modality induces
systematic and idiosyncratic perceptual experience of a specific stimulus in
a different dimension or modality (e.g., hearing a specific sound induces
visual experience of a specific color, e.g., Baron-Cohen & Harrison, 1997;
Cytowic, 2002; Robertson & Sagiv, 2005). Reports of synesthesia in which
a non-musical stimulus elicits musical imagery are rare (e.g., see listings in
Cytowic & Eagleman, 2011; Day, 2016), and perhaps the most well-known
is that of composer Jean Sibelius, who experienced different musical chords
when viewing different colors (Pearce, 2007).1 It might be tempting to
consider musical hallucinations or earworms as forms of synesthesia, but
neither musical hallucinations nor earworms match the typical
phenomenology of synesthesia (e.g., specific synesthetic experiences are
evoked by specific stimuli and are consistent over long periods of time). Most
research on neural mechanisms of synesthesia focused on color-grapheme
synesthesia (in which perception of letters or numerals induced experience
of color; e.g., Rouw & Scholte, 2010), and there has been little research
involving neural mechanisms of synesthesia involving musical imagery.
Possible neural mechanisms of synesthesia involve activation
(Ramachandran & Hubbard, 2001) or disinhibition (e.g., Grossenbacher &
Lovelace, 2001) of cross-connections between sensory areas, and so one
speculative possibility is that lack of evoked musical imagery in synesthesia
might be related to the general lack of activation in primary auditory cortex
during musical imagery. Another speculative possibility is that musical
perception already involves non-auditory (e.g., kinesthetic) elements, and
so activation of other non-auditory information in musical imagery is not
experienced as synesthesia per se.

EMBODIED MUSICAL IMAGERY

Explanations of musical phenomena that are based on properties of the
human body have a long history (e.g., dissonance and consonance reflect
beat interference along the basilar membrane; Greenwood, 1961; see also
Hodges, 2009). However, recent developments in cognitive science suggest
characteristics of embodied experience more actively influence perception,
cognition, and action (e.g., motor theory of speech perception, Liberman &
Mattingly, 1985; mirror neurons, Iacoboni, 2009; Oztop, Kawato, & Arbib,
2006). Indeed, observations that music spontaneously engages our bodies in
multiple ways (e.g., tapping along with a beat, attributing an accent pattern
to isochronous beats) suggest music offers a promising venue in which to
investigate embodied cognition (e.g., see Reybrouck, 2001). Aspects of
embodiment that are relevant to musical imagery include (a) spatial and
force metaphors, (b) use of mimicry, (c) the inner ear and inner voice
distinction, (d) mental practice and performance, (e) the relationship
between music and dance, and (f) musical affect.

Spatial and Force Metaphors


Much of human cognition is based on metaphor (Lakoff & Johnson, 1980),
and many prevalent metaphors reflect properties of embodiment and might
influence image schemata (including motor imagery) and other aspects of
cognition (Lakoff & Johnson, 1999). One example involves the notion of
pitch height (for discussion, see Cox, 2016). Faster auditory frequencies are
judged to be “higher” in pitch than are slower auditory frequencies, and
responding to stimuli in specific spatial locations is usually improved when
visual stimuli higher in the picture plane are associated with faster auditory
frequencies (Deroy, Fernandez-Prieto, Navarra, & Spence, 2018; Elkin &
Leuthold, 2011; Keller, Dalla Bella, & Koch, 2010). Related to pitch height
are notions that a sequence of notes forms a contour and that melody moves
in steps and leaps such that notes successive in time are represented as
motion in space (Johnson & Larson, 2003). More broadly, Larson (2012)
suggested analogues of physical inertia, gravitational attraction, and
magnetism occur in music, and Hubbard (2017) addressed the possibility of
an analogue of momentum in music. Eitan and Granot (2006; Eitan &
Timmers, 2010) identify many motion metaphors in musical space (e.g.,
crescendo is associated with approach and with acceleration). Neural
mechanisms of spatial and force metaphors have not received extensive
research, although it could be predicted that cortical areas involved in
processing motion information could be activated (e.g., much as a still
photograph depicting a specific direction of motion activates cortical
motion processing areas; e.g., Senior et al., 2000; Senior, Ward, & David,
2002) in both music perception and musical imagery.

Mimicry
Listening to or recalling music has been suggested to involve motor
mimicry (Cox, 2016). Many musical sounds provide information regarding
the human motor action that produced those sounds (including information
involving spatial and force metaphors), and musical imagery is often
accompanied by visual or motor images related to the sound source (Godøy,
2001). Cox suggested an important component of music comprehension is
imitating, either overtly or covertly, the sound-producing actions of
performers. Such imitative movements might involve movements
appropriate to playing an instrument or subvocal imitation of musical
sounds, and musical features (e.g., pitch, duration, strength, etc.) might be
represented mimetically; indeed, even simple tapping along with the beat
might be considered mimicry. Western popular music has been dominated
by music that is easily singable or danceable (Cox, 2016), and this is
consistent with the importance of embodiment and mimicry in music
processing. Such mimicry might involve overt physical action or covert
firing of mirror neurons. Mirror neurons can be activated by sounds
associated with a given action (Kohler et al., 2002), and so might be
involved in neural activity relevant to singing or playing an instrument. One
consequence of such mimicry is that musical imagery involves kinesthetic
and proprioceptive information (Hubbard, 2013b), and the importance of
kinesthetic and proprioceptive information in auditory and musical imagery
is seen in the distinction between the inner ear and inner voice and in
separate roles of auditory imagery and kinesthetic imagery in mental
practice and performance.

Inner Ear and Inner Voice


In addition to perceiving sounds generated by stimuli in the environment,
humans perceive sounds they generate with their bodies, most commonly
vocalizations (e.g., speaking, singing). Just as listening to external sounds
or generating vocalizations involve the ear or the voice, respectively,
auditory imagery of external sound or vocalization has been hypothesized
to involve the “inner ear” or “inner voice,” respectively (see Hubbard,
2010, 2013b). A distinction between the inner ear and inner voice
underscores one way in which musical imagery (and auditory imagery in
general) reflects embodied experience, as elements of the inner voice are
linked to articulatory gestures involved in speech, singing, or other sound
production. The distinction between the inner ear and inner voice is often
related to Baddeley’s model of working memory (Baddeley, 1986, 2000),
which contains a phonological store used for retention of auditory material
and an articulatory rehearsal mechanism that recodes stimuli for the
phonological store. More specifically, the inner ear is linked to a passive
phonological store and the inner voice is linked to a more active articulatory
rehearsal mechanism. Evidence for the existence of such separate processes
is based on a variety of findings (see Hubbard, 2010, 2013b, 2018,
forthcoming). Just as the phonological store and articulatory rehearsal
mechanism are separate structures or processes that generally work together
but can be experimentally separated, Smith and colleagues (Smith, Wilson,
& Reisberg, 1995) suggested the inner voice and inner ear are separate
structures or processes that generally work together but can be
experimentally separated.
Activation of motor areas in musical and auditory imagery has been
found in multiple studies (for review, see Lima, Krishnan, & Scott, 2016;
Zatorre & Halpern, 2005). Consistent with this, if participants cannot
subvocalize during an experimental task involving auditory imagery, thus
interfering with potential articulatory activity, performance on some tasks
involving auditory imagery is affected (e.g., Reisberg, Smith, Baxter, &
Sonenshine, 1989; Smith et al., 1995); this suggests motor activity in the
form of articulatory gestures influences at least some auditory imagery (see
also Aleman & van’t Wout, 2004). When tasks specified by Smith et al.
(1995) to utilize the inner ear or inner voice were given to schizophrenia
patients, there were no differences in performance (Evans et al., 2000), and
this is consistent with confusion of self-generated and other-generated
vocalization in schizophrenia. Studies of Halpern, Zatorre, and colleagues
(Halpern et al., 2004; Halpern & Zatorre, 1999; Zatorre et al., 1996) suggest
motor areas are activated in auditory imagery of instrumental (non-vocal)
stimuli (Halpern & Zatorre, 1999), and this is consistent with Baddeley and
Logie’s (1992) claim that the articulatory mechanism is involved in
rehearsal of non-vocal stimuli. Relatedly, activation of cerebellar regions
involved in control of the tongue and lips occurs during musical imagery
(Herholz et al., 2012). Evidence of subvocalization in musical imagery is
found in studies of notational audiation, as recognition of a familiar melody
embedded within a larger musical score is disrupted more by phonatory
interference than by rhythmic or auditory interference (e.g., Brodsky,
Henik, Rubenstein, & Zorman, 2003), and EMG activity near the larynx is
increased during reading of a musical score (Brodsky, Kessler, Rubenstein,
Ginsborg, & Henik, 2008).

Mental Practice and Performance


The distinction between the inner ear and inner voice suggests motor
information contributes to auditory imagery, and a role for motor
information in auditory imagery can be seen in studies of the effects of
musical imagery in mental practice and performance. The role of mental
imagery in musical performance is reviewed in detail in Keller (2012), and
findings most relevant to understanding neural mechanisms of musical
imagery are briefly considered here. When string players performed or
imaged a performance of a specific piece, the times taken to play or image
were highly correlated, and frontal lobes, cerebellum, parietal lobe, and
supplementary motor area, but not primary auditory cortex, were activated
during imagery (Langheim, Callicott, Mattay, Duyn, & Weinberger, 2002).
When professional or amateur violinists performed or imaged a
performance, somatosensory cortex was activated, and activation was more
focal in professionals during both imagery and performance (Lotze, Scheler,
Tan, Braun, & Birbaumer, 2003); it was speculated that musical training
strengthened connections between auditory and movement areas of the
cortex. Violinists exhibited activation of bilateral frontal opercular regions
in preparation for and during musical imagery of performance and during
performance (Kristeva, Chakarov, Schulte-Mönting, & Spreer, 2003) and
exhibited activation in bilateral frontal opercular regions and in
sensorimotor, premotor, and supplementary motor areas during imagery of
performance and during performance (Nirkko, Baader, Loevblad, Milani, &
Wiesendanger, 2000). Coherence of EEG recorded from near the
supplementary motor area of a violoncellist was highest while imagining
playing scales, lower when imagining playing a familiar piece by Bach, and
lowest when listening to the same piece by Bach (Petsche, von Stein, &
Filz, 1996).
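
Spectral coherence of this kind indexes frequency-specific coupling between two recording sites. A minimal sketch using SciPy's magnitude-squared coherence on synthetic channels follows; the site labels and signal model are illustrative assumptions, not a reconstruction of the cited recordings.

```python
import numpy as np
from scipy.signal import coherence

# Two synthetic "channels" sharing a 10 Hz component.
fs = 250
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(2)
shared = np.sin(2 * np.pi * 10 * t)             # activity common to both sites
sma_site = shared + rng.standard_normal(t.size)  # hypothetical SMA-region channel
other_site = 0.8 * shared + rng.standard_normal(t.size)

f, coh = coherence(sma_site, other_site, fs=fs, nperseg=2 * fs)
band = (f >= 8) & (f <= 12)
print(f"mean 8-12 Hz coherence: {coh[band].mean():.2f}")
```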
Pianists who were presented with a musical score and imaged playing it
or played it on a silent keyboard exhibited overlap in activation of premotor
areas in imagery and in performance, and activation was greater during
performance; however, primary motor cortex and posterior parietal cortex
were active during performance and not during imagery (Meister et al.,
2004). Pianists and non-musicians passively listened to a short piano
melody or arbitrarily pressed keys on a soundless keyboard, and in both
tasks pianists exhibited increased activation in dorsolateral and inferior
frontal cortex, superior temporal gyrus, supramarginal gyrus, and
supplementary motor and premotor areas (Bangert et al., 2006). Analogous
similarities are observed with comparison of imagery and perception.
Pianists who listened to familiar pieces exhibited activation in motor
regions appropriate for which fingers would have produced the notes
(Haueisen & Knösche, 2001) and exhibited activation in auditory areas
when they watched a silent video of someone fingering piano keys
(Haslinger et al., 2005). Similarly, when pictures of hand configurations for
playing guitar chords were shown, guitar players exhibited greater
activation in inferior parietal and ventral premotor cortex than did musically
untrained observers (Vogt et al., 2007). However, even though there is
overlap between neural areas activated during imagery and neural areas
activated during perception or performance, there are unique elements to
each (e.g., see Zhang et al., 2017), and it is not necessarily the case that a
common area of activation implies a similar mental representation (for
discussion, see Linke & Cusack, 2015).
Experimental participants who scored higher on a test of auditory
imagery performed better on a subsequent performance following practice
on a silent keyboard in which auditory feedback was not provided (Highben
& Palmer, 2004). Guitarists or vocalists who used mental practice and
physical practice performed best with a mixture of mental and physical
practice (Theiler & Lippmann, 1995), and mental practice was more
effective when musical pieces were relatively easy and less effective than
physical practice when musical pieces were more difficult (Cahn, 2008).
Pitch encoding of piano students is enhanced if those students make finger
tapping movements as if they were playing a piano (Mikumo, 1994), and
pitch acuity in auditory imagery and ability to synchronize in a tapping task
are positively correlated (Pecenka & Keller, 2009). Relatedly, imaged
tempo of popular music is more accurate when individuals tap as they
image (Jakubowski et al., 2016). Imaged singing activates parietal and
motor areas including Broca’s area and its right hemisphere homologue
(e.g., Baumann et al., 2007) and also activates areas associated with
emotional processing including anterior cingulate cortex, anterior temporal
lobe, and bilateral amygdala (Kleber, Birbaumer, Veit, Trevorrow, & Lotze,
2007). A case study of a pianist suggested imagery aided in managing tasks
and integration of intention and action (Davidson-Kelly, Schaeffer, Moran,
& Overy, 2015). In general, mental practice can facilitate subsequent
performance (Driskell, Copper, & Moran, 1994; Lotze, 2013), presumably
because musical imagery reinforces or strengthens connections made during
physical practice, and this suggests a role of motor activation and motor
processes in musical imagery.
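
Tempo accuracy in such tapping studies is typically computed from inter-tap intervals relative to the tune's canonical tempo. A minimal sketch follows, with the canonical tempo and tap timestamps invented for illustration.

```python
import numpy as np

def tapped_tempo_bpm(tap_times):
    # Estimate tempo (beats per minute) from the mean inter-tap interval,
    # given tap timestamps in seconds.
    return 60.0 / np.mean(np.diff(tap_times))

canonical_bpm = 120.0  # assumed known tempo of the imagined tune
taps = np.array([0.00, 0.49, 1.01, 1.52, 2.00, 2.51])  # invented tap times
error_pct = 100 * abs(tapped_tempo_bpm(taps) - canonical_bpm) / canonical_bpm
print(f"tempo error: {error_pct:.1f}%")  # ~0.4% for these values
```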
Dance
Perhaps the most obvious form of embodiment of musical information is
dance, which involves production of bodily movements that map onto
properties of music. In general, movements of the body can parallel (mimic)
movements in music (e.g., slowing near the end of a movement, as when
runners slow before stopping and ritardandi occur at the end of a musical
piece, e.g., Friberg & Sundberg 1999), and movements of the body in
response to a specific piece of music can reflect the rhythm, tempo, meter,
and articulation of that music (Fraisse, 1982; Mitchell & Gallaher, 2001).
Whether musical imagery influences kinesthetic information in dance, and
whether kinesthetic information in dance influences musical imagery, is not
known. Given that auditory imagery preserves structural and temporal
information of the referent stimulus (Hubbard, 2010, 2013a, 2013b),
coupled with structural similarity of music and dance (Krumhansl &
Schenck 1997; Vines, Krumhansl, Wanderley, & Levitin, 2006),
relationships between kinesthetic imagery of dance and auditory imagery of
music (which would contain kinesthetic information) could be predicted.
Such relationships might influence behavior (e.g., musical imagery of
ascending or high pitches might facilitate rising or sustained bodily
movement, musical imagery of legato musical notes might facilitate
smoother bodily movement, etc.) as well as produce similar patterns of
cortical activation. Relatedly, findings that auditory stimuli facilitate non-
dance body movements (e.g., in Parkinson’s disease; Rizzonelli, Kim,
Gladow, & Mainka, 2017; Sabaté et al., 2008; Thaut et al., 1996) suggest
auditory imagery of music might be useful in the treatment of motor
disorders.

Musical Affect
The evolutionary origins of music have been linked to communication of
emotional information (e.g., Bryant, 2013; Snowdon, Zimmerman, &
Altenmüller, 2015), and this might account for the common observation of
a link between music and affect (for review, see Juslin & Sloboda, 2001).
Indeed, music perception increases activation in mesocorticolimbic areas,
especially in the amygdala and hippocampus (e.g., Blood & Zatorre, 2001;
for review, Koelsch, 2010). Listening to music is linked with release of
dopamine in dorsal and ventral striatum, and the amount of dopamine
released appears related to the amount of pleasure experienced (Salimpoor,
Benovoy, Larcher, Dagher, & Zatorre, 2011). Furthermore, perception of
music is linked with an increase of oxytocin (Chanda & Levitin, 2013),
which is linked with social bonding. To the extent that musical imagery
involves activation of the same neural mechanisms as music perception,
cognition, and production, then musical imagery would presumably be
linked with affect. Indeed, as noted earlier, the majority of cases of
earworms are generally pleasant. Also, if images function as anticipatory
predictive processes (e.g., Neisser, 1976; Tian & Poeppel, 2012), then
matching of musical imagery to subsequent music perception might result
in positive affect resulting from a successful prediction (cf. expectancy as a
contributor to emotion; Huron, 2006; Juslin & Västfjäll, 2008). Given that
perceived music might activate different cortical areas as a function of
whether that music is perceived as happy or sad (Khalfa et al., 2005;
Mitterschiffthaler, Fu, Dalton, Andrew, & Williams, 2007), analogous
patterns of cortical activation could be predicted during music imagery.

SUMMARY AND CONCLUSIONS

Musical imagery is phenomenologically similar to music perception,
cognition, and production, and studies of musical imagery are often
modeled on studies of music perception, cognition, and production. Studies
of musical imagery have included behavioral and psychophysical measures,
clinical studies of brain-damaged patients, and electroencephalography and
brain imaging measures. These studies often found or suggested parallels
between neural mechanisms involved in music perception, cognition, and
production and neural mechanisms involved in musical imagery. Musical
imagery leads to emitted potentials similar to evoked potentials in music
perception, and mismatches between perception and imagery can influence
ERP components such as N1, N2, P2, LPC, and MMN. Auditory
association areas, as well as frontal and prefrontal cortex, are activated during
musical imagery, and the right temporal lobe seems critical for generation
and judgment of pitch. Greater vividness of musical imagery is linked with
greater activation in right superior temporal gyrus and prefrontal cortex, and
manipulation of musical imagery activates intraparietal and frontal regions
activated in other spatial tasks. However, there are some differences in
activation patterns; for example, primary auditory cortex is usually
activated in music perception but is usually not activated in music imagery.
Overall, neural mechanisms involved in musical imagery, like neural
mechanisms involved in music perception, cognition, and production, are
distributed throughout the cerebral hemispheres and the cerebellum.
An initially surprising finding was that motor areas of the cortex are
often activated during musical imagery. This suggests that motor
information might contribute to musical imagery, and in fact, motor
information has been suggested to contribute to auditory imagery more
generally. Researchers proposed a distinction between the inner ear, which
involves auditory information, and the inner voice, which involves
articulatory information in addition to auditory information. Studies in
which the possibility of subvocalization was manipulated support such a
distinction. Relatedly, studies of imagery in musical practice and
performance highlight how motor activation and information contribute to
musical imagery and how musical imagery contributes to performance.
Similarly, engagement of the motor system (e.g., tapping along with the
beat) improves accuracy of musical imagery, and there is greater activity in
motor areas for musicians observing a musical performance on their trained
instrument than for non-musicians observing the same performance. The
role of the motor system in musical imagery is consistent with an embodied
cognition approach and with spatial and force metaphors in the
representation of music. Relatedly, mimicry in the form of covert (e.g.,
neural activation) or overt action is involved in music perception and
musical imagery, and music perception and musical imagery might
influence our motor system (e.g., dance). Indeed, given the connection
between motor activation in music and effects of music on brain plasticity,
it could be predicted that musical imagery might be a useful adjunct in
treatment of some motor disorders.
Musical imagery occurs in a wide range of domains. Imagery can be
voluntary, and it is these voluntary images that previously received the most
study. Musical imagery can also occur involuntarily, and examples of
involuntary musical imagery include anticipatory musical imagery,
pathologies such as musical hallucinations and schizophrenia, and
earworms. Anticipatory musical imagery predicts upcoming musical
experience, and this is similar to the predictive aspects of other types of
imagery and might contribute to musical affect. Relatedly, affective
reactions to perceived music are linked to specific neurochemicals and
areas of cortical activation, and it could be predicted that musical imagery
might involve those same mechanisms. Earworms reflect the common
experience of a melodic fragment that individuals “cannot get out of their
heads,” and although there have recently been many studies focusing on
descriptive phenomenology and behavioral correlates of earworms, there
have been few studies examining neural mechanisms of earworms. It can be
predicted that neural mechanisms of involuntary imagery will overlap
with the neural mechanisms involved in voluntary imagery, and any
observed differences in activation patterns could inform not just theories of
musical representation, but theories of cognitive control more generally.
Overall, musical imagery occurs in a variety of situations; involves neural
mechanisms involved in music perception, cognition, and production; is an
important part of subjective experience; and reflects the embodied nature of
cognition.

References
Aleman, A., & van’t Wout, M. (2004). Subvocalization in auditory-verbal imagery: Just a form of
motor imagery? Cognitive Processing 5(4), 228–231.
Alossa, N., & Castelli, L. (2009). Amusia and musical functioning. European Neurology 61(5), 269–
277.
Baba, A., Hamada, H., & Kocha, H. (2003). Musical hallucinations in schizophrenia. 2. Relations with
verbal hallucinations. Psychopathology 36(2), 104–110.
Baddeley, A. D. (1986). Working memory. New York: Oxford University Press.
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in
Cognitive Sciences 4(11), 417–423.
Baddeley, A. D., & Logie, R. H. (1992). Auditory imagery and working memory. In D. Reisberg
(Ed.), Auditory imagery (pp. 179–197). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., … Altenmüller, E.
(2006). Shared networks for auditory and motor processing in professional pianists: Evidence from
fMRI conjunction. NeuroImage 30(3), 917–926.
Baron-Cohen, S., & Harrison, J. E. (Eds.). (1997). Synaesthesia: Classic and contemporary readings.
Cambridge, MA: MIT Press/Blackwell.
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology 59, 617–645.
Baumann, S., Koeneke, S., Schmidt, C. F., Meyer, M., Lutz, K., & Jäncke, L. (2007). A network for
audio-motor coordination in skilled pianists and non-musicians. Brain Research 1161, 65–78.
Beaty, R. E., Burgin, C. J., Nusbaum, E. C., Kwapil, T. R., Hodges, D. A., & Silvia, P. J. (2013).
Music to the inner ears: Exploring individual differences in musical imagery. Consciousness and
Cognition 22(4), 1163–1173.
Bernardini, F., Attademo, L., Blackmon, K., & Devinsky, O. (2017). Musical hallucinations: A brief
review of functional neuroimaging findings. CNS Spectrums 22(5), 397–403.
Berrios, G. E. (1991). Musical hallucinosis: A statistical analysis of 46 cases. Psychopathology 24(6),
356–360.
Besson, M., & Schön, D. (2003). Comparison between language and music. In I. Peretz & R. J.
Zatorre (Eds.), The cognitive neuroscience of music (pp. 269–293). New York: Oxford University
Press.
Bleich-Cohen, M., Hendler, T., Pashinian, A., Faragian, S., & Poyurovsky, M. (2011). Obsessive
musical hallucinations in a schizophrenic patient: Psychopathological and fMRI characteristics.
CNS Spectrums 16(7), 153–156.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity
in brain regions implicated in reward and emotion. Proceedings of the National Academy of
Sciences 98(20), 11818–11823.
Bringas, M. L., Zaldivar, M., Rojas, P. A., Martinez-Montes, K., Chongo, D. M., Ortega, M. A., …
Valdes-Sosa, P. A. (2015). Effectiveness of music therapy as an aid to neurorestoration of children
with severe neurological disorders. Frontiers in Neuroscience 9, 427. Retrieved from
https://doi.org/10.3389/fnins.2015.00427
Brodsky, W., Henik, A., Rubenstein, B. S., & Zorman, M. (2003). Auditory imagery from musical
notation in expert musicians. Perception & Psychophysics 65(4), 602–612.
Brodsky, W., Kessler, Y., Rubenstein, B. S., Ginsborg, J., & Henik, A. (2008). The mental
representation of music notation: Notational audiation. Journal of Experimental Psychology:
Human Perception and Performance 34(2), 427–445.
Bryant, G. A. (2013). Animal signals and emotion in music: Coordinating affect across groups.
Frontiers in Psychology 4, 990. Retrieved from https://doi.org/10.3389/fpsyg.2013.00990
Cahn, D. (2008). The effects of varying ratios of physical and mental practice, and task difficulty on
performance of a tonal pattern. Psychology of Music 36, 179–191.
Cebrian, A. N., & Janata, P. (2010a). Electrophysiological correlates of accurate mental image
formation in auditory perception and imagery tasks. Brain Research 1342, 39–54.
Cebrian, A. N., & Janata, P. (2010b). Influences of multiple memory systems on auditory mental
image acuity. Journal of the Acoustical Society of America 127, 3189–3202.
Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences
17(4), 179–193.
Cho, R., & Wu, W. (2013). Mechanisms of auditory verbal hallucination in schizophrenia. Frontiers
in Psychiatry 4, 155. Retrieved from https://doi.org/10.3389/fpsyt.2013.00155
Coebergh, J. A. F., Lauw, R. F., Bots, R., Sommer, I. E. C., & Blom, J. D. (2015). Musical
hallucinations: Review of treatment effects. Frontiers in Psychology 6, 814. Retrieved from
https://doi.org/10.3389/fpsyg.2015.00814
Cox, A. (2016). Music and embodied cognition. Bloomington, IN: Indiana University Press.
Crowder, R. G. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human
Perception and Performance 15(3), 472–478.
Cytowic, R. E. (2002). Synesthesia: A union of the senses (2nd ed.). Cambridge, MA: MIT Press.
Cytowic, R. E., & Eagleman, D. M. (2011). Wednesday is indigo blue: Discovering the brain of
synesthesia. Cambridge, MA: MIT Press.
Daselaar, S. M., Porat, Y., Huijbers, W., & Pennartz, C. M. (2010). Modality-specific and modality-
independent components of the human imagery system. NeuroImage 52(2), 677–685.
Davidson-Kelly, K., Schaeffer, R. S., Moran, N., & Overy, K. (2015). “Total Inner Memory”:
Deliberate uses of multimodal musical imagery during performance preparation.
Psychomusicology: Music, Mind and Brain 25(1), 83–92.
Day, S. A. (2016). Synesthetes: A handbook. CreateSpace Independent Publishing Platform.
Deroy, O., Fernandez-Prieto, I., Navarra, J., & Spence, C. (2018). Unraveling the paradox of spatial
pitch. In T. L. Hubbard (Ed.), Spatial biases in perception and cognition (pp. 77–93). New York:
Cambridge University Press.
Douglas, K. M., & Bilkey, D. K. (2007). Amusia is associated with deficits in spatial processing.
Nature Neuroscience 10(7), 915–921.
Driskell, J. E., Copper, C., & Moran, A. (1994). Does mental practice enhance music performance?
Journal of Applied Psychology 79(4), 481–492.
Eitan, Z., & Granot, R. Y. (2006). How music moves: Musical parameters and listeners’ images of
motion. Music Perception 23(3), 221–247.
Eitan, Z., & Timmers, R. (2010). Beethoven’s last piano sonata and those who follow crocodiles:
Cross-domain mappings of auditory pitch in a musical context. Cognition 114(3), 405–422.
Elkin, J., & Leuthold, H. (2011). The representation of pitch in auditory imagery: Evidence from S-R
compatibility and distance effects. Journal of Cognitive Psychology 23(1), 76–91.
Evans, C. L., McGuire, P. K., & David, A. S. (2000). Is auditory imagery defective in patients with
auditory hallucinations? Psychological Medicine 30(1), 137–148.
Evers, S. (2006). Musical hallucinations. Current Psychiatry Reports 8(3), 205–210.
Evers, S., & Ellger, T. (2004). The clinical spectrum of musical hallucinations. Journal of the
Neurological Sciences 227(1), 55–65.
Farrugia, N., Jakubowski, K., Cusack, R., & Stewart, L. (2015). Tunes stuck in your brain: The
frequency and affective evaluation of involuntary musical imagery correlate with cortical structure.
Consciousness and Cognition 35, 66–77.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149–181).
New York: Academic Press.
Friberg, A., & Sundberg, J. (1999). Does music performance allude to locomotion? A model of final
ritardandi derived from measurements of stopping runners. Journal of the Acoustical Society of
America 105(3), 1469–1484.
Gabriel, D., Wong, T. C., Nicolier, M., Giustiniani, J., Mignot, C., Noiret, N., … Vandel, P. (2016).
Don’t forget the lyrics! Spatiotemporal dynamics of neural mechanisms spontaneously evoked by
gaps of silence in familiar and newly learned songs. Neurobiology of Learning and Memory 132,
18–28.
Galant-Swafford, J., & Bota, R. (2015). Musical hallucinations in schizophrenia. Mental Illness 7(1),
6065.
Gibbs, R. W. (2005). Embodiment and cognitive science. New York: Cambridge University Press.
Godøy, R. I. (2001). Imagined action, excitation, and resonance. In R. I. Godøy & H. Jørgensen
(Eds.), Musical imagery (pp. 237–250). New York: Taylor & Francis.
Greenwood, D. D. (1961). Critical bandwidth and the frequency coordinates of the basilar membrane.
Journal of the Acoustical Society of America 33, 1344–1356.
Griffiths, T. D. (2000). Musical hallucinosis in acquired deafness: Phenomenology and brain
substrate. Brain 123(10), 2065–2076.
Griffiths, T. D., Jackson, M. C., Spillane, J. A., Friston, K. J., & Frackowiak, R. S. J. (1997). A
neural substrate for musical hallucinosis. Neurocase 3(3), 167–172.
Grossenbacher, P. G., & Lovelace, C. T. (2001). Mechanisms of synesthesia: Cognitive and
physiological constraints. Trends in Cognitive Sciences 5(1), 36–41.
Halpern, A. R. (1988a). Mental scanning in auditory imagery for songs. Journal of Experimental
Psychology: Learning, Memory, and Cognition 14, 434–443.
Halpern, A. R. (1988b). Perceived and imaged tempos of familiar songs. Music Perception 6(2),
193–202.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition 17(5),
572–581.
Halpern, A. R. (2003). Cerebral substrates of musical imagery. In I. Peretz & R. Zatorre (Eds.), The
cognitive neuroscience of music (pp. 217–230). New York: Oxford University Press.
Halpern, A. R., & Bartlett, J. C. (2011). The persistence of musical memories: A descriptive study of
earworms. Music Perception 28(4), 425–432.
Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation
of auditory imagery for familiar melodies. Cerebral Cortex 9(7), 697–704.
Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural
correlates of perceived and imagined musical timbre. Neuropsychologia 42(9), 1281–1292.
Hammeke, T. A., McQuillen, M. P., & Cohen, B. A. (1983). Musical hallucinations associated with
acquired deafness. Journal of Neurology, Neurosurgery, and Psychiatry 46(6), 570–572.
Haslinger, B., Erhard, P., Altenmüller, E., Schroeder, U., Boecker, H., & Ceballos-Baumann, A. O.
(2005). Transmodal sensorimotor networks during action observation in professional pianists.
Journal of Cognitive Neuroscience 17(2), 282–293.
Haueisen, J., & Knösche, T. (2001). Involuntary motor activity in patients evoked by music
perception. Journal of Cognitive Neuroscience 13(6), 786–792.
Hemming, J., & Merrill, J. (2015). On the distinction between involuntary musical imagery, musical
hallucinosis, and musical hallucinations. Psychomusicology: Music, Mind, and Brain 25(4), 435–
442.
Herholz, S. C., Halpern, A. R., & Zatorre, R. J. (2012). Neuronal correlates of perception, imagery,
and memory for familiar tunes. Journal of Cognitive Neuroscience 24(6), 1382–1397.
Herholz, S. C., Lappe, C., Knief, A., & Pantev, C. (2008). Neural basis of music imagery and the
effect of musical expertise. European Journal of Neuroscience 28(11), 2352–2360.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity:
Behavior, function, and structure. Neuron 76(3), 486–502.
Highben, Z., & Palmer, C. (2004). Effects of auditory and motor mental practice in memorized piano
performance. Bulletin of the Council for Research in Music Education 159, 58–65.
Hodges, D. A. (2009). Bodily responses to music. In S. Hallam, I. Cross, & M. Thaut (Eds.), The
Oxford handbook of music psychology (2nd ed., pp. 183–196). New York: Oxford University
Press.
Hubbard, T. L. (2010). Auditory imagery: Empirical findings. Psychological Bulletin 136(2), 302–
329.
Hubbard, T. L. (2013a). Auditory aspects of auditory imagery. In S. Lacey & R. Lawson (Eds.),
Multisensory imagery (pp. 51–76). New York: Springer.
Hubbard, T. L. (2013b). Auditory imagery contains more than audition. In S. Lacey & R. Lawson
(Eds.), Multisensory imagery (pp. 221–247). New York: Springer.
Hubbard, T. L. (2017). Momentum in music: Musical succession as physical motion.
Psychomusicology: Music, Mind, and Brain 27(1), 14–30.
Hubbard, T. L. (2018). Some methodological and conceptual considerations in studies of auditory
imagery. Auditory Perception and Cognition 1, 6–41.
Hubbard, T. L. (forthcoming). Some anticipatory, kinesthetic, and dynamic aspects of auditory
imagery. In M. Grimshaw, M. Walther-Hansen, & M. Knakkergaard (Eds.), The Oxford handbook
of sound and imagination. New York: Oxford University Press.
Hubbard, T. L., & Stoeckig, K. (1988). Musical imagery: Generation of tones and chords. Journal of
Experimental Psychology: Learning, Memory, and Cognition 14, 656–667.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA:
MIT Press.
Iacoboni, M. (2009). Imitation, empathy, and mirror neurons. Annual Review of Psychology 60, 653–
670.
Jakubowski, K., Farrugia, N., & Stewart, L. (2016). Probing imagined tempo for music: Effects of
motor engagement and musical experience. Psychology of Music 44(6), 1274–1288.
Janata, P. (2001). Brain electrical activity evoked by mental formation of auditory expectations and
images. Brain Topography 13(3), 169–193.
Janata, P., & Paroo, K. (2006). Acuity of auditory images in pitch and time. Perception &
Psychophysics 68(5), 829–844.
Johns, L. C., Rossell, S., Frith, C., Ahmad, F., Hemsley, D., Kuipers, E., & McGuire, P. K. (2001).
Verbal self-monitoring and auditory verbal hallucinations in patients with schizophrenia.
Psychological Medicine 31(4), 705–715.
Johnson, M. L., & Larson, S. (2003). “Something in the way she moves” - Metaphors of musical
motion. Metaphor and Symbol 18(2), 63–84.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. New York:
Oxford University Press.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying
mechanisms. Behavioral and Brain Sciences 31(5), 559–621.
Kantrowitz, J. T., Scaramello, N., Jakubovitz, A., Lehrfeld, J. M., Laukka, P., Elfenbein, H. A., …
Javitt, D. C. (2014). Amusia and protolanguage impairments in schizophrenia. Psychological
Medicine 44(13), 2739–2748.
Kasai, K., Asada, T., Yumoto, M., Takeya, J., & Matsuda, H. (1999). Evidence of functional
abnormality in the right auditory cortex during musical hallucinations. Lancet 354, 1703–1704.
Keller, P. E. (2012). Mental imagery in musical performance: Underlying mechanisms and potential
benefits. Annals of the New York Academy of Sciences 1252, 206–213.
Keller, P. E., Dalla Bella, S., & Koch, I. (2010). Auditory imagery shapes movement timing and
kinematics: Evidence from a musical task. Journal of Experimental Psychology: Human
Perception and Performance 36(2), 508–513.
Keshavan, M. S., Davis, A. S., Steingard, S., & Lishman, W. A. (1992). Musical hallucinosis: A
review and synthesis. Neuropsychiatry, Neuropsychology, & Behavioral Neurology 5(3), 211–223.
Khalfa, S., Schon, D., Anton, J.-L., & Liégeois-Chauvel, C. (2005). Brain regions involved in the
recognition of happiness and sadness in music. Neuroreport 16(18), 1981–1984.
Kleber, B., Birbaumer, N., Veit, R., Trevorrow, T., & Lotze, M. (2007). Overt and imagined
singing of an Italian aria. NeuroImage 36(3), 889–900.
Koelsch, S. (2010). Towards a neural basis of music-evoked emotions. Trends in Cognitive Sciences
14(3), 131–137.
Koelsch, S. (2012). Brain and music. Cambridge, MA: Wiley-Blackwell.
Kohler, E., Keysers, C., Umiltà, M. A., Fogassi, L., Gallese, V., & Rizzolatti, G. (2002). Hearing
sounds, understanding actions: Action representation in mirror neurons. Science 297(5582), 846–
848.
Kraemer, D. J. M., Macrae, C. N., Green, A. E., & Kelley, W. M. (2005). Musical imagery: Sound of
silence activates auditory cortex. Nature 434, 158.
Kristeva, R., Chakarov, V., Schulte-Mönting, J., & Spreer, J. (2003). Activation of cortical areas in
music execution and imagining: A high-resolution EEG study. NeuroImage 20(3), 1872–1883.
Krumhansl, C. L., & Schenck, D. L. (1997). Can dance reflect the structural and expressive qualities
of music? A perceptual experiment on Balanchine’s choreography of Mozart’s Divertimento No.
15. Musicae Scientiae 1, 63–85.
Kumar, S., Sedley, W., Barnes, G. R., Teki, S., Friston, K. J., & Griffiths, T. D. (2014). A brain basis
for musical hallucinations. Cortex 52(100), 86–97.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to
western thought. New York: Basic Books.
Langheim, F. J., Callicott, J. H., Mattay, V. S., Duyn, J. H., & Weinberger, D. R. (2002). Cortical
systems associated with covert music rehearsal. Neuroimage 16(4), 901–908.
Larson, S. (2012). Musical forces: Motion, metaphor and meaning in music. Bloomington, IN:
Indiana University Press.
Leaver, A. M., van Lare, J., Zielinski, B., Halpern, A. R., & Rauschecker, J. P. (2009). Brain
activation during anticipation of sound sequences. Journal of Neuroscience 29(8), 2477–2485.
Levitin, D. J. (2007). This is your brain on music: The science of a human obsession. New York:
Penguin Group.
Levitin, D. J., & Tirovolas, A. K. (2009). Current advances in the cognitive neuroscience of music.
Annals of the New York Academy of Sciences 1156, 211–231.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised.
Cognition 21, 1–36.
Liégeois-Chauvel, C., Peretz, I., Babaï, M., Laguitton, V., & Chauvel, P. (1998). Contribution of
different cortical areas in the temporal lobes in music processing. Brain 121(10), 1853–1867.
Lima, C. F., Krishnan, S., & Scott, S. K. (2016). Roles of supplementary motor areas in auditory
processing and auditory imagery. Trends in Neurosciences 39(8), 527–542.
Lima, C. F., Lavan, N., Evans, S., Agnew, Z., Halpern, A. R., Shanmugalingam, P., … Scott, S. K.
(2015). Feel the noise: Relating individual differences in auditory imagery to the structure and
function of sensorimotor systems. Cerebral Cortex 25(11), 4638–4650.
Linden, D. E. J., Thornton, K., Kuswanto, C. N., Johnston, S. J., van de Ven, V., & Jackson, M. C.
(2011). The brain’s voices: Comparing nonclinical auditory hallucinations and imagery. Cerebral
Cortex 21(2), 330–337.
Linke, A. C., & Cusack, R. (2015). Flexible information coding in human auditory cortex during
perception, imagery, and STM of complex sounds. Journal of Cognitive Neuroscience 27(7),
1322–1333.
Lotze, M. (2013). Kinesthetic imagery of musical performance. Frontiers in Human Neuroscience 7,
280. Retrieved from https://doi.org/10.3389/fnhum.2013.00280
Lotze, M., Scheler, G., Tan, H. R., Braun, C., & Birbaumer, N. (2003). The musician’s brain:
Functional imaging of amateurs and professionals during performance and imagery. Neuroimage
20(3), 1817–1829.
McGuire, P. K., Silbersweig, D. A., Wright, I., Murray, R. M., David, A. S., Frackowiak, R. S. J., &
Frith, C. D. (1995). Abnormal monitoring of inner speech: A physiological basis for auditory
hallucinations. Lancet 346, 596–600.
McGuire, P. K., Silbersweig, D. A., Murray, R. M., David, A. S., Frackowiak, R. S., & Frith, C. D.
(1996). Functional anatomy of inner speech and auditory verbal imagery. Psychological Medicine
26(1), 29–38.
Marin, O. S. M., & Perry, D. W. (1999). Neurological aspects of music perception and performance.
In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 653–724). New York: Academic Press.
Meister, I. G., Krings, T., Foltys, H., Boroojerdi, B., Müller, M., Töpper, R., & Thron, A. (2004).
Playing piano in the mind: An fMRI study on music imagery and performance in pianists.
Cognitive Brain Research 19(3), 219–228.
Mikumo, M. (1994). Motor encoding strategy for pitches of melodies. Music Perception 12(2), 175–
197.
Mitchell, R. W., & Gallaher, M. C. (2001). Embodying music: Matching music and dance in memory.
Music Perception 19(1), 65–85.
Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R. (2007). A
functional MRI study of happy and sad affective states induced by classical music. Human Brain
Mapping 28(11), 1150–1162.
Neisser, U. (1976). Cognition and reality: Principles and implications of cognitive psychology. New
York: W. H. Freeman.
Nirkko, A. C., Baader, A. P., Loevblad, K.-O., Milani, P., & Wiesendanger, M. (2000). Cortical
representation of music production in violin players: Behavioral assessment and functional
imaging of finger sequencing, bimanual coordination and music specific brain activation.
NeuroImage 11(5), S106.
Oh, J., Kwon, J. H., Yang, P. S., & Jeong, J. (2013). Auditory imagery modulates frequency-specific
areas in the human auditory cortex. Journal of Cognitive Neuroscience 25(2), 175–187.
Oztop, E., Kawato, M., & Arbib, M. (2006). Mirror neurons and imitation: A computationally guided
review. Neural Networks 19(3), 254–271.
Parsons, L. M. (2003). Exploring the functional neuroanatomy of music performance, perception, and
comprehension. In I. Peretz & R. Zatorre (Eds.), The cognitive neuroscience of music (pp. 247–
268). New York: Oxford University Press.
Patel, A. D. (2008). Music, language, and the brain. New York: Oxford University Press.
Pearce, J. M. S. (2007). Synaesthesia. European Neurology 57(2), 120–124.
Pecenka, N., & Keller, P. E. (2009). Auditory pitch imagery and its relationship to musical
synchronization. Annals of the New York Academy of Sciences 1169, 282–286.
Penfield, W., & Perot, P. (1963). The brain’s record of auditory and visual experience. Brain 86(4),
595–696.
Peretz, I. (1990). Processing of local and global musical information by unilateral brain-damaged
patients. Brain 113(4), 1185–1205.
Peretz, I. (2002). Brain specialization for music. Neuroscientist 8(4), 374–382.
Peretz, I. (2013). The biological foundations of music: Insight from congenital amusia. In D. Deutsch
(Ed.). The psychology of music (3rd ed., pp. 551–564). New York: Academic Press.
Peretz, I., & Zatorre, R. J. (Eds.). (2003). The cognitive neuroscience of music. New York: Oxford
University Press.
Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of
Psychology 56, 89–114.
Perrone-Capano, C., Volpicelli, F., & di Porzio, U. (2017). Biological bases of human musicality.
Reviews in the Neurosciences 28(3), 235–245.
Petsche, H., von Stein, A., & Filz, O. (1996). EEG aspects of mentally playing an instrument.
Cognitive Brain Research 3(2), 115–123.
Platel, H., Price, C., Baron, J.-C., Wise, R., Lambert, J., Frackowiak, R. S. J., … Eustache, F. (1997).
The structural components of music perception: A functional anatomical study. Brain: A Journal
of Neurology 120(2), 229–243.
Prete, G., Marzoli, D., Brancucci, A., & Tommasi, L. (2016). Hearing it right: Evidence of
hemispheric lateralization in auditory imagery. Hearing Research 332, 80–86.
Ramachandran, V. S., & Hubbard, E. M. (2001). Synaesthesia: A window into perception, thought
and language. Journal of Consciousness Studies 8(12), 3–34.
Reisberg, D., Smith, J. D., Baxter, D. A., & Sonenshine, M. (1989). “Enacted” auditory images are
ambiguous; “pure” auditory images are not. Quarterly Journal of Experimental Psychology A:
Human Experimental Psychology 41(3), 619–641.
Reybrouck, M. (2001). Musical imagery between sensory processing and ideomotor simulation. In R.
I. Godøy & H. Jørgensen (Eds.), Musical imagery (pp. 117–135). New York: Taylor & Francis.
Rizzonelli, M., Kim, J. H., Gladow, T., & Mainka, S. (2017). Musical stimulation with feedback in
gait training for Parkinson’s disease. Psychomusicology: Music, Mind, and Brain 27, 213–218.
Robertson, L. C., & Sagiv, N. (Eds.). (2005). Synesthesia: Perspectives from cognitive neuroscience.
New York: Oxford University Press.
Rouw, R., & Scholte, H. S. (2010). Neural basis of individual differences in synesthetic experiences.
Journal of Neuroscience 30(18), 6205–6213.
Saba, P. R., & Keshavan, M. S. (1997). Musical hallucinations and musical imagery: Prevalence and
phenomenology in schizophrenic inpatients. Psychopathology 30, 185–190.
Sabaté, M., Llanos, C., & Rodriguez, M. (2008). Integration of auditory and kinesthetic information
in motion: Alterations in Parkinson’s disease. Neuropsychology 22(4), 462–468.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically
distinct dopamine release during anticipation and experience of peak emotion to music. Nature
Neuroscience 14, 257–262.
Samson, S., & Zatorre, R. J. (1991). Recognition memory for text and melody of songs after
unilateral temporal lobe lesion: Evidence for dual encoding. Journal of Experimental Psychology:
Learning, Memory, and Cognition 17(4), 793–804.
Särkämö, T., Altenmüller, E., Rodriguez-Fornells, A., & Peretz, I. (2016). Editorial. Music, brain,
and rehabilitation: Emerging therapeutic applications and potential neural mechanisms. Frontiers
in Human Neuroscience 10, 103. Retrieved from https://doi.org/10.3389/fnhum.2016.00103
Satoh, M. (2014). Musical processing in the brain: A neuropsychological approach through cases
with amusia. Austin Journal of Clinical Neurology 1(2), 1009.
Satoh, M., Takeda, K., Nagata, N., Hatazawa, J., & Kuzuhara, S. (2001). Activated brain regions in
musicians during an ensemble: A PET study. Cognitive Brain Research 12(1), 101–108.
Schaefer, R. S., Desain, P., & Suppes, P. (2009). Structural decomposition of EEG signatures of
melodic processing. Biological Psychology 82(3), 253–259.
Schaefer, R. S., Vlek, R. J., & Desain, P. (2011). Music perception and imagery in EEG: Alpha band
effects of task and stimulus. International Journal of Psychophysiology 82(3), 254–259.
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychological
Science 14(3), 262–266.
Schlaug, G. (2015). Musicians and music making as a model for the study of brain plasticity.
Progress in Brain Research 127, 37–55.
Schürmann, M., Raij, T., Fujiki, N., & Hari, R. (2002). Mind’s ear in a musician: Where and when in
the brain. NeuroImage 16(2), 434–440.
Senior, C., Barnes, J., Giampietroc, V., Simmons, A., Bullmore, E. T., Brammer, M., & David, A. S.
(2000). The functional neuroanatomy of implicit-motion perception or “representational
momentum.” Current Biology 10(1), 16–22.
Senior, C., Ward, J., & David, A. S. (2002). Representational momentum and the brain: An
investigation of the functional necessity of V5/MT. Visual Cognition 9(1), 81–92.
Shapiro, L. (2010). Embodied cognition. New York: Routledge.
Shergill, S. S., Bullmore, E., Simmons, A., Murray, R., & McGuire, P. (2000). Functional anatomy of
auditory verbal imagery in schizophrenic patients with auditory hallucinations. American Journal
of Psychiatry 157(10), 1691–1693.
Simoens, V. L., & Tervaniemi, M. (2013). Auditory short-term memory activation during score
reading. PLoS ONE 8(1), e53691.
Smith, J. D., Wilson, M., & Reisberg, D. (1995). The role of subvocalization in auditory imagery.
Neuropsychologia 33(11), 1433–1454.
Snowdon, C. T., Zimmerman, E., & Altenmüller, E. (2015). Music evolution and neuroscience.
Progress in Brain Research 217, 17–34.
Spiller, M. J., Jonas, C. N., Simner, J., & Jansari, A. (2015). Beyond visual imagery: How modality-
specific is enhanced mental imagery in synesthesia? Consciousness and Cognition 31, 73–85.
Stewart, L., von Kriegstein, K., Warren, J. D., & Griffiths, T. D. (2006). Music and the brain:
Disorders of musical listening. Brain 129(10), 2533–2553.
Thaut, M. H., McIntosh, G. C., Rice, R. R., Miller, R. A., Rathbun, J., & Brault, J. M. (1996).
Rhythmic auditory stimulation in gait training with Parkinson’s disease patients. Movement
Disorders 11(2), 193–200.
Theiler, A. M., & Lippman, L. G. (1995). Effects of mental practice and modeling on guitar and
vocal performance. Journal of General Psychology 122(4), 329–343.
Tian, X., & Poeppel, D. (2012). Mental imagery of speech: Linking motor and perceptual systems
through internal simulation and estimation. Frontiers in Human Neuroscience 6, 314. Retrieved
from https://doi.org/10.3389/fnhum.2012.00314
Tillmann, B., Jolicoeur, P., Ishihara, M., Gosselin, N., Bertrand, O., Rossetti, Y., & Peretz, I. (2010).
The amusic brain: Lost in music, but not in space. PLoS ONE 5(4), e10173.
van Dijk, H., Nieuwenhuis, I. L., & Jensen, O. (2010). Left temporal alpha band activity increases
during working memory retention of pitches. European Journal of Neuroscience 31(9), 1701–
1707.
Villena-González, M., López, V., & Rodríguez, E. (2016). Data of ERPs and spectral alpha power
when attention is engaged on visual or verbal/auditory imagery. Data in Brief 7, 882–888.
Vines, B. W., Krumhansl, C. L., Wanderley, M. M., & Levitin, D. J. (2006). Cross-modal interactions
in the perception of musical performance. Cognition 101(1), 80–113.
Vlek, R. J., Schaefer, R. S., Gielen, C. C. A. M., Farquhar, J. D. R., & Desain, P. (2011). Shared
mechanisms in perception and imagery of auditory accents. Clinical Neurophysiology 122(8),
1526–1532.
Vogt, S., Buccino, G., Wohlschlager, A. M., Canessa, N., Shah, N. J., Zilles, K., … Fink, G. R.
(2007). Prefrontal involvement in imitation learning of hand actions: Effects of practice and
expertise. NeuroImage 37(4), 1371–1383.
Vuvan, D. T., & Schmuckler, M. A. (2011). Tonal hierarchy representations in auditory imagery.
Memory & Cognition 39(3), 477–490.
Wan, C. Y., & Schlaug, G. (2013). Brain plasticity induced by musical training. In D. Deutsch (Ed.).
The psychology of music (3rd ed., pp. 565–581). New York: Academic Press.
Williams, T. I. (2015). The classification of involuntary musical imagery: The case for earworms.
Psychomusicology: Music, Mind, and Brain 25(1), 5–13.
Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review 9(4), 625–636.
Wu, J., Mai, X., Chan, C. C. H., Zheng, Y., & Luo, Y. (2006). Event related potentials during mental
imagery of animal sounds. Psychophysiology 43(6), 592–597.
Wu, J., Mai, X., Yu, Z., Qin, S., & Luo, Y. (2010). Effects of discrepancy between imagined and
perceived sounds on the N2 component of the event-related potential. Psychophysiology 47(2),
289–298.
Wu, J., Yu, Z., Mai, X., Wei, J., & Luo, Y. (2011). Pitch and loudness information encoded in
auditory imagery as revealed by event-related potentials. Psychophysiology 48(3), 415–419.
Yoo, S. S., Lee, C. U., & Choi, B. G. (2001). Human brain mapping of auditory imagery: Event-
related functional MRI study. Neuroreport 12(14), 3045–3049.
Yumoto, M., Matsuda, M., Itoh, K., Uno, A., Karino, S., Saitoh, O., … Kaga, K. (2005). Auditory
imagery mismatch negativity elicited in musicians. Neuroreport 16(11), 1175–1178.
Zatorre, R. J. (1988). Pitch perception of complex tones and human temporal-lobe function. Journal
of the Acoustical Society of America 84(2), 566–572.
Zatorre, R. J., & Halpern, A. R. (1993). Effect of unilateral temporal-lobe excision on perception and
imagery of songs. Neuropsychologia 31(3), 221–232.
Zatorre, R. J., & Halpern, A. R. (2005). Mental concerts: Musical imagery and the auditory cortex.
Neuron 47(1), 9–12.
Zatorre, R. J., Halpern, A. R., & Bouffard, M. (2010). Mental reversal of imagined melodies: A role
for the posterior parietal cortex. Journal of Cognitive Neuroscience 22(4), 775–789.
Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind’s
ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience
8(1), 29–46.
Zhang, Y., Chen, G., Wen, H., Lu, K. H., & Liu, Z. (2017). Musical imagery involves Wernicke’s
area in bilateral and anti-correlated network interactions in musicians. Scientific Reports 7(1),
17068.

1. Interestingly, a number of prominent composers are suspected of or have admitted to
experiencing synesthesia in which musical stimuli evoked different colors or other visual qualities
(e.g., Leonard Bernstein, Duke Ellington, Billy Joel, Franz Liszt, Olivier Messiaen, Nikolai Rimsky-
Korsakov, Alexander Scriabin). The greater prevalence of non-musical imagery (e.g., visual color)
triggered by a musical stimulus (e.g., pitch), coupled with the relative lack of musical imagery (e.g.,
pitch) triggered by a non-musical stimulus (e.g., visual color), is consistent with findings that
auditory stimuli evoke non-auditory qualities in a large percentage of synesthetes, but non-auditory
stimuli evoke auditory qualities in a very small percentage of synesthetes (Spiller, Jonas, Simner, &
Jansari, 2015). Also, it is not clear if the apparent lack of auditory imagery induced in synesthesia is
due to a limitation of synesthesia or to a bias in reporting.
CHAPTER 22

NEUROPLASTICITY IN
MUSIC LEARNING

VESA PUTKINEN AND MARI TERVANIEMI

One of the central aims of neuroscience is to understand how experience
shapes the brain. Animal studies have demonstrated experience-driven
neuroplasticity at various spatial and temporal scales, with some of the
earliest evidence coming from studies investigating the effects of enriched
environments on brain function and anatomy (Buonomano & Merzenich,
1998; van Praag, Kempermann, & Gage, 2000). During the last three decades,
musical training has emerged as a popular model for brain plasticity in
humans (Herholz & Zatorre, 2012). Since mastering a musical instrument
requires years of intense deliberate practice and relies on a multitude of
perceptual and cognitive skills, it is a reasonable prediction that the brains
of musicians and musical laypersons should differ in many respects. Indeed,
since the mid-1990s, an ever-increasing number of neuroimaging and
electrophysiological studies have reported differences between adult
musicians and non-musicians in neural markers of sensory processing and
motor function as well as neural correlates of higher-order, domain-general
cognitive functions. Although these findings are typically discussed in the
context of expertise and plasticity, it is also widely recognized that, in
addition to training, predisposing factors most likely also contribute to these
group differences. Obviously, cross-sectional studies in adults cannot tease
apart the contribution of these factors. As a consequence, several research
groups have started to conduct longitudinal studies in children in an attempt to
establish the causal role of training in driving structural and functional
differentiation of the brain.
Structural imaging studies have provided evidence for numerous
differences in neural architecture between adult musicians and non-
musicians including differences in gray matter anatomy of auditory and
somatosensory systems (Bermudez, Lerch, Evans, & Zatorre, 2009; Gaser
& Schlaug, 2003; Schneider et al., 2002), extra-sensory regions (Bermudez
et al., 2009; Gaser & Schlaug, 2003; Sluming et al., 2002), and the
organization of the white matter tracts (Bengtsson et al., 2005; Halwani,
Loui, Rüber, & Schlaug, 2011; Schlaug, Jäncke, Huang, Staiger, &
Steinmetz, 1995). Here, we focus mainly on studies that have investigated
functional reorganization of the musician brain and, in particular, on studies
employing event-related potentials (ERPs) or fields (ERFs) derived from
electroencephalography (EEG) and magnetoencephalography (MEG),
respectively. This literature spans over twenty years and includes many of
the first reports of the neural correlates of musical expertise, and notably,
the majority of the more recent longitudinal studies conducted in this
framework thus far.

Electrophysiological Markers of Enhanced Cortical Sound Processing in Musicians

Some of the earliest evidence for neuroplastic effects of musical training
came from studies that employed source-modeling of electromagnetic brain
responses to investigate cortical somatosensory and auditory representations
in adult musicians (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995;
Pantev et al., 1998). It was found that string players showed stronger
cortical responses to pneumatic stimulation of the left-hand digits used to
control the pitch of the instrument whereas no differences between the
musicians and control subjects were found for the right-hand digits. In the
auditory domain, musicians showed stronger responses to piano tones than
non-musicians but not to pure tones, indicating that auditory cortical
representations were enhanced in musicians particularly for acoustically
rich, musical sounds. Other early electrophysiological evidence indicated
that the tuning of the auditory cortical representations in musicians was
strongest for the timbre of the musicians’ trained instrument (Pantev,
Roberts, Schulz, Engelien, & Ross, 2001). Consequently, these findings
were interpreted as evidence for training-induced plasticity—a conclusion
bolstered by the finding that the magnitude of the enhancement was
correlated with the years of musical training/age of training onset (Elbert et
al., 1995; Pantev et al., 1998).
Also in the mid-1990s, several research groups began to examine how
the auditory systems of musicians and non-musicians generate expectations
of upcoming auditory events based on preceding auditory input. In the first
of these studies, musicians and non-musicians listened to familiar and
unfamiliar musical sound sequences and evaluated whether the ending tone
was harmonically or rhythmically congruous or not (Besson & Faïta, 1995;
Besson, Faïta, & Requin, 1994). Musicians outperformed the controls in the
behavioral task—except for the incongruities in the familiar musical
phrases which were detected by both groups with equal success—and
showed augmented and faster late-latency brain responses (termed late
positive component, LPC) to the incongruent endings relative to the non-
musicians, indicating that formal knowledge of musical rules modifies the
neural mechanisms underlying musical expectancies (Besson & Faïta,
1995). However, the LPC did not differ significantly between the groups in
another condition where the task was simply to listen to the musical phrases
but not explicitly decide whether the ending tones were congruous or
incongruous. These were the first demonstrations that ERPs are sensitive to
differences in musical sophistication and indicated that musical training
could modify relatively late (400–800 ms after stimulus onset) post-
perceptual cognitive processes related to the overt detection of the
unexpected musical events.
Later studies found evidence that the superior sound discrimination in
musicians extended to earlier stages of the auditory processing stream and
was evident even for sound processing that occurs outside the focus of
attention. One example of an ERP index used in such studies is the early
right anterior negativity (ERAN) that is elicited by harmonically inappropriate
chords within chord cadences. The ERAN peaks around 200 ms and its
neural sources have been localized to bilateral inferior frontal gyri, with
particularly strong contribution from the right-hemisphere homologue of
Broca’s area (Maess, Koelsch, Gunter, & Friederici, 2001). The ERAN is
thought to rely on learned music-syntactic rules (Koelsch, 2009) and can be
obtained from non-musicians (Koelsch, Gunter, Friederici, & Schröger,
2000) and even from young children (Koelsch et al., 2003). However, the
amplitude of this response is augmented in musicians (Koelsch, Schmidt, &
Kansok, 2002) and musically trained children (Jentschke & Koelsch, 2009)
indicating that, even though implicit learning of harmonic regularities
typical of one's culture is sufficient for the pre-attentive detection of
harmony violations, formal training has an additional enhancing effect on
this ability.
The mismatch negativity (MMN) has been used particularly widely to
examine early processing of musical sound regularities in musicians and non-
musicians. The MMN has been conceptualized as a response to sounds that
deviate from predictions that are generated based on regularities that the
auditory system automatically extracts from the sound environment
(Näätänen, Paavilainen, Rinne, & Alho, 2007; Näätänen, Tervaniemi,
Sussman, Paavilainen, & Winkler, 2001). The MMN appears to originate
from auditory cortical and prefrontal sources (Rinne, Alho, Ilmoniemi,
Virtanen, & Näätänen, 2000; Schönwiesner et al., 2007) and is typically
seen as a negative peak in the ERP between 100 and 250 ms from the onset
of a sound that deviates from some regular aspects of a sound stream. These
regularities can range from simple ones such as repeating pitch to more
abstract rules (Näätänen et al., 2001; Paavilainen, 2013). According to the
most widely adopted theoretical account, the MMN is related to the
adjustment of the regularity representations in light of the new sound
information when the expectations based on these representations are
disconfirmed (Winkler, Denham, & Nelken, 2009). The “automaticity” of
the MMN refers to the fact that this response can be elicited even when
participants are not actively attending to the stimuli but for example
focusing on watching a silent video or performing a dichotic listening task
(Näätänen et al., 2007). This property of the MMN helps to rule out biases
introduced by differences in motivation or alertness between musicians and
non-musicians. Furthermore, MMN-like responses can be obtained from
infants and even from newborn babies, thereby allowing the
investigation of early musically relevant perceptual skills and their
maturation (Hannon & Trainor, 2007). Finally, the amplitude of the MMN
has been found to correlate with the accuracy of overt stimulus
discrimination (Näätänen et al., 2007) and to increase with short-term
laboratory training mimicking aspects of musical training (Lappe, Herholz,
Trainor, & Pantev, 2008; Menning, Roberts, & Pantev, 2000;
Paraskevopoulos, Kuchenbuch, Herholz, & Pantev, 2012a). Thus, the MMN
appears to reflect individual differences in sound discrimination skills and is
sensitive to plasticity of the auditory system.
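To make the preceding description concrete, the following minimal Python sketch (assuming only NumPy) illustrates how an MMN is typically quantified from oddball data: the deviant-minus-standard difference wave is computed from trial-averaged responses, and its most negative peak is located in the 100–250 ms window. The simulated signals, sampling rate, trial counts, and component latencies are illustrative assumptions, not parameters taken from any study cited here.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 500                                  # assumed sampling rate (Hz)
t = np.arange(-0.1, 0.5, 1 / fs)          # epoch: -100 to 500 ms around sound onset

def simulate_epochs(n_trials, mmn_amp):
    """Simulate single-trial epochs containing an N1 and an optional MMN."""
    n1 = -2.0 * np.exp(-((t - 0.10) ** 2) / (2 * 0.02 ** 2))      # N1 near 100 ms
    mmn = mmn_amp * np.exp(-((t - 0.18) ** 2) / (2 * 0.03 ** 2))  # MMN near 180 ms
    noise = rng.normal(0, 5.0, size=(n_trials, t.size))           # trial-to-trial noise
    return n1 + mmn + noise

standards = simulate_epochs(n_trials=800, mmn_amp=0.0)   # frequent, regular sounds
deviants = simulate_epochs(n_trials=200, mmn_amp=-3.0)   # rare, rule-violating sounds

# The MMN is the deviant-minus-standard difference wave of the averaged ERPs
difference = deviants.mean(axis=0) - standards.mean(axis=0)

# Quantify it as the most negative peak in the 100-250 ms window
win = (t >= 0.10) & (t <= 0.25)
peak = np.argmin(difference[win])
print(f"MMN: {difference[win][peak]:.2f} uV at {t[win][peak] * 1000:.0f} ms")
```

In group comparisons such as those reviewed here, it is this peak (or the mean amplitude in a fixed latency window) that is contrasted between musicians and non-musicians, or tracked across measurement points in the longitudinal studies discussed later in the chapter.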
The earliest MMN study to compare musicians and non-musicians found
stronger MMNs in the musicians to occasional mistuning of the third of a
repeating major triad chord while no group differences were found for
MMNs elicited by occasional pitch changes in simple pure tones (Koelsch,
Schröger, & Tervaniemi, 1999). Thus, the enhanced neural pitch
discrimination in musicians was specific for the more musically meaningful
context. Subsequent studies have reported larger or earlier MMNs in
musicians to various types of sound changes including sound omissions
(Rüsseler, Altenmüller, Nager, Kohlmetz, & Münte, 2001), changes in
contour or interval structure of melodies (Brattico, Näätänen, & Tervaniemi,
2001; Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004, 2005; Tervaniemi,
Rytkönen, Schröger, Ilmoniemi, & Näätänen, 2001), the temporal and
numerical organization of sound groups (van Zuijen, Sussman, Winkler,
Näätänen, & Tervaniemi, 2004, 2005), and to infrequent non-prototypical
chords (Brattico et al., 2008). Interestingly, electromagnetic MMN-like
mismatch responses elicited by audio-visual incongruities have been found
to engage different cortical regions and connectivity patterns in musicians
and non-musicians, which could be due to musicians’ training in sight
reading (Paraskevopoulos, Kraneburg, Herholz, Bamidis, & Pantev, 2015;
Paraskevopoulos, Kuchenbuch, Herholz, & Pantev, 2012b).
In these studies, the musicians had received training in classical music.
This framework was a fruitful starting point for this line of research since
the intensity and rigor of the training could be expected to lead to large
effects sizes. Furthermore, the training history of the musicians was
typically well documented and thereby the brain data could be readily
correlated with the years/hours of instrumental practice. In early 2000,
however, studies began to examine music experts from other genres also in
order to test whether the brain basis of the musical expertise is more or less
the same for all musicians, or whether fine-grained differences, based on
different demands set by various instruments or genres, could be reflected
in the functional brain indices.
The first studies in this framework indicated that not only classical
musicians but also rock and jazz musicians are more accurate than
laypersons in discriminating various sound features such as timing (jazz
musicians) and spatial sound source location (amateur rock musicians)
(Tervaniemi, Castaneda, Knoll, & Uther, 2006; Vuust et al., 2005). Studies
comparing jazz, rock, and classical musicians to laypersons have found that
jazz musicians showed superior neural discrimination for the majority of
sound features in a multi-feature MMN paradigm (Vuust, Brattico,
Seppänen, Näätänen, & Tervaniemi, 2012) and that musicians with
backgrounds in one of these genres display different MMN response profiles
across different types of changes in melodic sound patterns (Tervaniemi,
Janhunen, Kruck, Putkinen, & Huotilainen, 2015).
The relationship between speech and music continues to be debated, but
these domains clearly have some interesting similarities and appear to rely
on some of the same neural mechanisms (Patel, 2011). This raises the
question of whether musicians, being more sensitive to various features in
music, are also superior in the processing of speech sounds. Indeed, Tervaniemi
et al. (2009) and Marie and colleagues (Marie, Kujala, & Besson, 2012)
showed musicians to be more sensitive to some acoustic changes in speech
sounds, while a more recent study found that musicians displayed larger
MMNs to pitch, vowel, duration, and voice-onset time changes in spoken
syllables (Kühnis, Elmer, Meyer, & Jäncke, 2013). However, in the study of
Tervaniemi et al. (2009), the enhanced brain responses to pitch changes
were not observed in musicians in a passive listening condition, but only
when they were instructed to listen to the sounds, indicating that enhanced
neural sound discrimination in musicians might, in some situations, become
apparent only with top-down modulation of sensory processing (see also
Tervaniemi, Just, Koelsch, Widmann, & Schröger, 2005).

Subcortical Enhancement of Auditory Processing
The auditory brainstem response (ABR) is another electrophysiological
measure that has been widely used to compare sound encoding between
musicians and non-musicians (Kraus & Chandrasekaran, 2010). The ABR
elicited by complex sounds like phonemes or musical intervals is typically
characterized by a transient series of peaks within the first few milliseconds
followed by a sustained response that closely mimics the periodic features of
the sound. As the nomenclature suggests, the ABR is thought to originate
mainly from brainstem nuclei (Chandrasekaran & Kraus, 2010) although
there is emerging evidence for a cortical contribution to the response
(Coffey, Herholz, Chepesiuk, Baillet, & Zatorre, 2016; Coffey, Musacchia,
& Zatorre, 2017). The sustained portion of the ABR, termed the frequency
following response (FFR), preserves the spectro-temporal features of the
stimulus with high fidelity. Therefore, the FFR lends itself to the study of
encoding of sound features that are important for pitch and timbre
processing in music but also essential for differentiating speech sounds.
Indeed, the majority of the studies employing the FFR in musicians and
non-musicians have been conducted using speech sounds and indicate that,
relative to musically untrained controls, musicians show more robust coding
of the spectrum of speech stimuli or faster or stronger neural responses at
the very early stages of sound encoding (Kraus & Chandrasekaran, 2010;
Strait & Kraus, 2014). Furthermore, there is evidence to suggest that these
group differences are particularly pronounced in challenging listening
conditions such as in the presence of background noise (Coffey, Mogilever,
& Zatorre, 2017; Strait & Kraus, 2014). Such findings have raised hopes
that musical training could be used to alleviate problems of speech-in-noise
perception and other auditory processing deficits that can occur in language
and other neurological disorders as well as in normal aging (Alain, Zendel,
Hutka, & Bidelman, 2014; Skoe & Kraus, 2010).
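As a rough illustration of what “preserving the spectro-temporal features of the stimulus” means analytically, the Python sketch below simulates an FFR as an attenuated, delayed, noisy copy of a periodic stimulus and computes two measures commonly reported in this literature: the spectral amplitude at the stimulus fundamental and the lag-optimized stimulus-to-response correlation. All signal parameters here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 10_000                          # assumed sampling rate (Hz)
t = np.arange(0, 0.2, 1 / fs)        # 200 ms stimulus
f0 = 100                             # fundamental frequency (Hz)

# Periodic, speech-like stimulus: fundamental plus two harmonics
stim = (np.sin(2 * np.pi * f0 * t)
        + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
        + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))

# Simulated FFR: mimics the stimulus periodicity with a neural
# transmission delay (~8 ms here) and added background noise
delay = int(0.008 * fs)
ffr = 0.3 * np.roll(stim, delay) + rng.normal(0, 0.2, t.size)

# Measure 1: spectral amplitude at the fundamental ("robustness" of encoding)
spectrum = np.abs(np.fft.rfft(ffr)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
print(f"FFR amplitude at F0: {spectrum[np.argmin(np.abs(freqs - f0))]:.3f}")

# Measure 2: stimulus-to-response correlation, maximized over lag (fidelity)
lags = range(int(0.015 * fs))
r = [np.corrcoef(stim[:-lag or None], ffr[lag:])[0, 1] for lag in lags]
best = int(np.argmax(r))
print(f"Peak stimulus-response r = {max(r):.2f} at {best / fs * 1000:.1f} ms lag")
```

Larger amplitudes at the fundamental and higher stimulus-to-response correlations are, in essence, what “more robust coding” and higher-fidelity encoding refer to in the group comparisons above.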
The first of these studies used the FFR to investigate the encoding of
pitch contours in spoken Mandarin Chinese syllables (Wong, Skoe, Russo,
Dees, & Kraus, 2007). In Mandarin Chinese, the meaning of syllables is
dependent on the pitch contour and a previous study had shown that native
speakers of this language display enhanced pitch tracking as indexed by the
ABR (Krishnan, Gandour, Ananthakrishnan, & Vijayaraghavan, 2015). The
study by Wong et al. in turn found evidence for a more robust pitch tracking
in musicians than in non-musicians. As none of the subjects had previous
exposure to Mandarin, the results suggest a generalization of the enhanced
sound encoding in musicians to foreign speech sounds. A later study found
that musicians had enhanced ABRs to cello tones and spoken syllables (/da/), both
in the early transient portion of the response and in the FFR time window
(Musacchia, Sams, Skoe, & Kraus, 2007). Since these seminal studies, the
enhanced encoding of linguistic and non-linguistic sounds in musicians as
indexed by the ABR has been replicated numerous times (Strait & Kraus,
2014) also in children and adolescents (discussed below) as well as in aging
participants (Alain et al., 2014). Results from laboratory training in sound
identification indicate that pitch tracking of the ABR is boosted even by
short-term experience (Song, Skoe, Wong, & Kraus, 2008) and thereby
support (but obviously do not prove) the notion that the higher quality
sound representation in musicians may be attributable to experience.
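The pitch-tracking measure used in the Mandarin-tone studies can be sketched along the following lines: a short-time autocorrelation estimates the fundamental frequency in successive windows of the stimulus and of the response, and the correlation between the two F0 contours indexes tracking accuracy. The window lengths, frequency range, and noise level in this Python sketch are illustrative assumptions rather than the published analysis parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 8_000
t = np.arange(0, 0.25, 1 / fs)

# Rising pitch contour (100 -> 140 Hz), loosely like a Mandarin rising tone
f0 = 100 + 160 * t
stim = np.sin(2 * np.pi * np.cumsum(f0) / fs)
resp = 0.4 * stim + rng.normal(0, 0.3, t.size)   # noisy simulated response

def f0_contour(x, fs, win=0.04, step=0.01, fmin=80, fmax=200):
    """Short-time autocorrelation F0 estimate, one value per window."""
    n, hop = int(win * fs), int(step * fs)
    lags = np.arange(int(fs / fmax), int(fs / fmin))
    contour = []
    for start in range(0, x.size - n, hop):
        seg = x[start:start + n] - x[start:start + n].mean()
        ac = np.correlate(seg, seg, mode="full")[seg.size - 1:]
        contour.append(fs / lags[np.argmax(ac[lags])])   # best-period lag -> F0
    return np.array(contour)

stim_f0 = f0_contour(stim, fs)
resp_f0 = f0_contour(resp, fs)

# Pitch-tracking accuracy: correlation between stimulus and response contours
r = np.corrcoef(stim_f0, resp_f0)[0, 1]
print(f"Stimulus-to-response F0 contour correlation: r = {r:.2f}")
```

Under a scheme of this kind, the stronger pitch tracking reported for musicians corresponds to a response contour that follows the stimulus contour more closely.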
In sum, ABR studies indicate that frequent engagement with musical
sounds might tune sound processing in the nuclei along the auditory
pathway to fine-grained acoustic information that is important for sound
processing in both music and speech. The mechanism underlying the ABR
enhancement in musicians is unclear, but has been speculated to be driven
by top-down influence through descending (cortico-fugal) pathways from
the cortex to the auditory brainstem (Kraus & Chandrasekaran, 2010). The
enhanced auditory skills of musicians indeed tend to be accompanied by
above-average performance in non-auditory tasks that tap into higher-order
cognitive processes such as executive functions (discussed below) which, in
some studies, have been found to correlate with the degree of enhancement
in ABR indices of sound encoding (e.g., Strait, Kraus, Parbery-Clark, &
Ashley, 2010).

Musical Training and the Developing Brain

Studying the effects of musical activities on brain development is of great
theoretical and practical value since such studies have the potential to reveal
the antecedents of the functional group differences seen in adults with and
without musical training and are the only way to establish whether early
musical activities support the development of cognitive skills that are
important to academic achievement and well-being. Cross-sectional and
longitudinal studies indicate that differences in brain function and structure
between musically trained and untrained children start to emerge already
after a few months to two years of musical training (Chobert, François,
Velay, & Besson, 2012; Hyde et al., 2009; Kraus et al., 2014). Importantly,
a few controlled intervention studies in children carried out thus far have
provided initial evidence for the causal role of practice in neural advantages
of music training (Chobert et al., 2012; Kraus et al., 2014; Nan et al., 2018).
One of the first studies to be conducted in the framework of musical
training and brain plasticity in childhood recorded electric brain potentials
evoked by piano, violin, and pure tones in 4- to 5-year-old children enrolled
in music lessons and in control children not active in music (Shahin,
Roberts, & Trainor, 2004). The early P1 response was larger in the music
group for all tones whereas the following P2 was enhanced specifically for
the instrument of practice (piano or violin). In a subsequent longitudinal
study, the P2 response of children learning to play the violin became
enhanced for violin timbre during a one-year follow-up, while it remained
unchanged for noise sounds used as the control material (Fujioka, Ross,
Kakigi, Pantev, & Trainor, 2006). A more recent longitudinal study found
evidence that an El Sistema-based music program facilitated the maturation
of both passive auditory processing and active sound discrimination of
musical sounds as indexed by the N1-P2 and P300 components,
respectively, when compared to sports-based intervention and no-
intervention control groups (Habibi, Cahn, Damasio, & Damasio, 2016). In
the speech domain, another study found that brain responses to pitch
incongruities in both music and speech differentiated 8-year-old children
with and without musical training (Magne, Schön, & Besson, 2006). This
was followed by longitudinal studies with random assignment that showed
evidence for a causal contribution of training in shaping neural responses to
pitch changes in speech (Moreno, Marques, Santos, Santos, & Besson,
2009) as well as to responses reflecting speech segmentation (François,
Chobert, Besson, & Schön, 2012).
The first MMN studies in musically trained children were cross-
sectional and reported enhanced MMNs in musically trained school-aged
children for frequency changes in violin tones (Meyer et al., 2011) and for
changes from major chords to minor chords (Virtala, Huotilainen, Putkinen,
Makkonen, & Tervaniemi, 2012). Putkinen and colleagues recorded
auditory ERPs longitudinally in children who attended a public elementary
school that integrates instrument lessons, orchestra practice, and music
theory studies into the daily curriculum (Putkinen, Tervaniemi, Saarikivi,
Ojala, & Huotilainen, 2014). The control group consisted of children who
did not play a musical instrument, had non-musical hobbies, and were
matched to the music group with regard to socio-economic status (parental
education and income). The MMN elicited by occasional minor chords
presented among major chords increased in amplitude more in the music
group than in the control group between the ages of 7 and 13 years. Along
the same lines, a related study (Putkinen, Tervaniemi, Saarikivi, de Vent, &
Huotilainen, 2014) found that the MMNs elicited by changes in melody,
rhythm, timbre, and tuning increased in amplitude more with age in the
music group than in the control group between 9 and 11 years. Neither
study found significant differences in MMN amplitude at the baseline
measurement, indicating that there was no pre-training enhancement in
neural sound discrimination in the music group.
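To make the logic of such oddball paradigms concrete, the following
minimal sketch (in Python, with simulated data; an illustration, not any
study's actual analysis pipeline) estimates the MMN as the deviant-minus-
standard difference of averaged waveforms. All numbers (trial counts,
latencies, amplitudes) are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated single-trial epochs (trials x time samples); a real study
    # would obtain these by segmenting the EEG around each chord onset.
    # 500 samples at 1 kHz = 0-500 ms post-stimulus.
    n_standards, n_deviants, n_times = 400, 80, 500
    standards = rng.normal(0.0, 1.0, (n_standards, n_times))
    deviants = rng.normal(0.0, 1.0, (n_deviants, n_times))

    # Give the deviants (e.g., occasional minor chords among major chords)
    # an extra negativity in the typical MMN latency window (~100-250 ms).
    deviants[:, 100:250] -= 1.5

    # Averaging across trials yields the ERPs; the MMN is conventionally
    # quantified from the deviant-minus-standard difference waveform.
    difference_wave = deviants.mean(axis=0) - standards.mean(axis=0)
    mmn_window = difference_wave[100:250]
    print(f"MMN peak amplitude: {mmn_window.min():.2f} (arbitrary units)")
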
Chobert and colleagues conducted a longitudinal study for two school
years in children who were randomly assigned to music or painting classes
(Chobert et al., 2012). The children received tuition in these activities in 45-
minute sessions twice a week during the first school year and once a week
during the second. MMNs to changes in syllable frequency, duration, and
voice-onset time (VOT) were recorded before training and after six and
twelve months. The MMNs to syllable duration and VOT changes increased
in amplitude after twelve months of training in the children who were
involved in music training but not in those taking painting classes. There
were no group differences before or at six months after the onset of training.
Longitudinal studies have also examined how training affects the
development of sound encoding reflected by the ABR. In a randomized
longitudinal study, Kraus et al. (2014) found that children between the ages
of 6 and 9 who participated in a community-based musical training program
(the Harmony Project) for two years showed evidence for a more precise
differentiation of speech sounds, as indexed by the spectra of ABRs
elicited by two different stop consonants. Longitudinal
studies in adolescents have reported evidence for enhanced maturation of
the ABR and earlier emergence of adult-like cortical responses to speech
sounds over two years of music lessons (Tierney, Krizman, & Kraus, 2015;
Tierney, Krizman, Skoe, Johnston, & Kraus, 2013).
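As a rough illustration of what differentiation indexed by the spectra of
ABRs can mean in practice (a hedged sketch, not the analysis used by Kraus
and colleagues), one can compare the magnitude spectra of the averaged
responses to the two consonants: the more the spectra diverge, the more
distinctly the two sounds are encoded. The waveforms below are simulated
placeholders.

    import numpy as np

    # Simulated averaged ABRs to two stop-consonant syllables; real
    # responses would be recorded at a high sampling rate (here 20 kHz).
    sr = 20000
    t = np.arange(0, 0.05, 1 / sr)  # 50 ms of response
    abr_ga = np.sin(2 * np.pi * 300 * t)   # placeholder response to /ga/
    abr_ba = np.sin(2 * np.pi * 250 * t)   # placeholder response to /ba/

    # Magnitude spectra of the two averaged responses.
    spectrum_ga = np.abs(np.fft.rfft(abr_ga))
    spectrum_ba = np.abs(np.fft.rfft(abr_ba))

    # One simple differentiation index: the distance between the spectra.
    # Larger values = more distinct neural encoding of the two consonants.
    differentiation = np.linalg.norm(spectrum_ga - spectrum_ba)
    print(f"Spectral differentiation index: {differentiation:.1f}")
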
In addition to more conventional training regimes, computerized
learning environments have been used to investigate the effects of musical
vs. foreign language training in childhood (Janus, Lee, Moreno, &
Bialystok, 2016; Moreno, Lee, Janus, & Bialystok, 2015). In both domains,
corresponding elements—perception, reading, and production—were
taught. In one study, thirty-six 4- to 6-year-old English-speaking children
received either French or music training for twenty days, two hours a day
(Moreno et al., 2015). The children were assigned to the music and
language training groups in a pseudo-random manner and, in a test-
training-retest design, were assessed with EEG and with neurocognitive
tests. After the
intervention, both groups showed enhanced brain responses in the trained
domain (music group—music sounds; French group—French vowels) and,
correspondingly, reduced responses in the untrained domain. The study also
indicated that these changes persisted one year after the training had ended.
There appears to be reasonable empirical support for the conclusion that
musical activities in childhood can benefit sound processing skills that are
important not only in music but also in the language domain. Whether
musical training can enhance higher-order cognitive functions is a more
contentious issue. In the next section, we turn to evidence for the putative
benefits of musical training for executive functions, a question that has
been the focus of a considerable number of studies in recent years (Moreno
& Bidelman, 2014).

Does Music Improve Executive Functions?

Executive functions refer to top-down control mechanisms such as
inhibition, set-shifting, working memory, and selective attention (Diamond,
2013; Friedman & Miyake, 2017). These functions support many higher-
order processes (e.g., planning, decision making) and predict various
societally important phenomena ranging from academic performance to
healthy lifestyle choices (Titz & Karbach, 2014). Inhibition, set-shifting,
and working memory are widely considered the core subcomponents of
executive functions (Friedman & Miyake, 2017). Learning to play a
musical instrument places heavy demands on these functions, and it
therefore stands to reason that musical training could be associated with above-
average performance in a range of executive function tasks either because
these functions help one to persist in musical training and/or because
musical training enhances executive functions.
A number of studies have found support for this assertion by showing
that musically trained adults and children outperform untrained peers in
tasks of inhibition, set-shifting, and working memory (Bialystok & DePape,
2009; George & Coch, 2011; Hansen, Wallentin, & Vuust, 2013; Ho,
Cheung, & Chan, 2003; Jaschke, Honing, & Scherder, 2018; Moradzadeh,
Blumenthal, & Wiseheart, 2015; Saarikivi, Putkinen, Tervaniemi, &
Huotilainen, 2016; Zuk, Benjamin, Kenyon, & Gaab, 2014). A few fMRI
studies have investigated the neural underpinnings of these behavioral
differences and found that musicians recruit prefrontal and other cortical
and subcortical regions more strongly than non-musicians during working
memory and task-switching tasks (Pallesen et al., 2010; Schulze, Mueller,
& Koelsch, 2011; Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011;
Zuk et al., 2014). For instance, Pallesen et al. (2010) found that musicians
showed stronger activity than non-musicians during a working memory task
in various prefrontal and parietal areas, the anterior cingulate, the insula,
and the precentral gyrus. There is also evidence that
musicians display structural changes in prefrontal regions that support
executive functions (Bermudez et al., 2009; Gaser & Schlaug, 2003;
Sluming et al., 2002).
The first study to investigate the neural underpinnings of executive
functions in musically trained children used a task designed to tap into task-
switching and found stronger activation in ventrolateral prefrontal cortex
and supplementary motor area (SMA) in the music group relative to age-
matched control children (Zuk et al., 2014). A subsequent study
investigated the neural correlates of inhibition in children around the age of
9 years who had participated in an El Sistema-based music training program
for two years. The results indicated that on the incongruent trials of a Color-
Word Stroop task, the children in the music group showed stronger
activation in the anterior cingulate, inferior frontal gyrus, pre-SMA/SMA,
and insula relative to an active control group participating in sports training
and a passive control group who did not participate in an extra-curricular
program (Sachs, Kaplan, Der Sarkissian, & Habibi, 2017).
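A minimal sketch of Color-Word Stroop trial construction makes the
inhibition demand concrete (an illustration of the task's logic only, not the
stimuli used in the study): on incongruent trials the printed word conflicts
with its ink color, so naming the ink requires suppressing the prepotent
reading response.

    import random

    random.seed(1)
    COLORS = ["red", "green", "blue", "yellow"]

    def make_trial(congruent):
        """Return (word, ink_color) for one Color-Word Stroop trial."""
        word = random.choice(COLORS)
        ink = word if congruent else random.choice(
            [c for c in COLORS if c != word])
        return word, ink

    # Alternate congruent and incongruent trials for illustration; the
    # critical contrast is performance on the incongruent trials.
    for i in range(6):
        word, ink = make_trial(congruent=(i % 2 == 0))
        print(f'word "{word.upper()}" printed in {ink} ink')
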
Using pseudo-random group allocation, Moreno et al. (2011) compared
the neurocognitive effects of computerized interventions for music and
visual arts. As in the study described in the previous section (Moreno et al.,
2015), the intervention lasted for twenty days, with the children asked to
practice twice a day for one hour at a time. The final analyses included
forty-eight participants who were 4 to 6 years of age. The music
intervention improved the children’s verbal abilities, and this improvement
was paralleled by facilitation of the neural indices of executive functions.
No comparable improvements were observed in the children in the visual
arts group. This suggests that a relatively short but very intensive music
intervention can improve general cognitive functions that are necessary for
many learning activities. In the second study, using the same intensive
learning environment for music vs. French, Janus et al. (2016) reported
significant improvements in the executive functions of their 4- to 6-year-old
children—again within just twenty days.
Collectively, the studies reviewed here suggest that the musician
advantage might extend to domain-general executive functions. It should be
noted, however, that results across studies are not very consistent with
regard to which subcomponent of executive functions differentiates
musicians from non-musicians, and some studies have also failed
to find evidence for transfer from musical training to domain-general
cognitive abilities (e.g., Chobert et al., 2012; Costa-Giomi, 1999; Moreno et
al., 2009). Furthermore, most of the evidence for enhanced executive
functions in musicians comes from correlational studies that cannot
disentangle whether these group differences reflect the effects of training or
pre-training differences in cognitive capacity. A recent meta-analysis
concluded that the evidence for the benefits of musical training on working
memory performance suffers from confounds and that the support for such
transfer effects is weak (Sala & Gobet, 2017b). More generally, the notion
of far transfer—i.e., the assertion that training in one domain could
generalize to other, only distantly related, domains—has lately met with
increasing skepticism (Sala & Gobet, 2017a; Ullén, Hambrick, & Mosing,
2016).
Predispositions in Musical Aptitude and Neurocognitive Capacities

Evidence for heritability in different measures of musical ability has
challenged the view that deliberate practice alone can explain the musician
advantages reviewed in this chapter (Ullén et al., 2016). Although
heritability and experience-dependent plasticity are not mutually exclusive,
these studies show that genetic factors influence the perceptual,
motivational, and cognitive abilities involved in music processing,
influences that should not be ignored when interpreting the group
differences between musicians and non-musicians. Studies have identified
candidate genes that might influence musical aptitude (for a review, see the
chapter by Järvelä, this volume), while twin studies indicate a substantial
genetic component in indices of brain structure and function (Peper,
Brouwer, Boomsma, Kahn, & Hulshoff Pol, 2007; Van Beijsterveldt & Van
Baal, 2002), in some musically relevant perceptual and motor skills, and in
the amount of practice musicians engage in (Drayna, Manichaikul, de
Lange, Snieder, & Spector, 2001; Mosing, Madison, Pedersen, Kuja-
Halkola, & Ullén, 2014; Ullén, Mosing, & Madison, 2015; Vinkhuyzen,
Van der Sluis, Posthuma, & Boomsma, 2009). On the other hand, a recent
study of identical twins discordant for musical training (de Manzano &
Ullén, 2018) found differences in brain structure between musically trained
and non-trained siblings despite their identical genotype, indicating that
these structural differences were due to training.
Recent evidence suggests that genetic factors do not have a uniform
impact on tonal vs. rhythmic processing. Using online listening tasks
adapted from the well-established test of amusia, the Montreal Battery of
Evaluation of Amusia (MBEA), Seesjärvi and colleagues investigated the
relative contribution of genetic and environmental factors to individual
variation in music perception in 384 twins (Seesjärvi et al.,
2016). The participants performed three listening tasks that required the
detection of pitch differences in pairs of melodies and key or rhythm
incongruities within single melodies. The first task involved a working
memory component, since it required the comparison of two melodies and
resembled tasks that were previously used in studies reporting strong
genetic components in music perception skills (e.g., Drayna et al., 2001).
The performance in the latter two tasks, in turn, relied less on working
memory and tapped into tonal and rhythmic perception abilities that are
implicitly learned even by non-musicians. There were additive genetic
effects in the pitch task, with no shared-environment effects, while the
opposite was found for the key task. Variation in rhythm task
performance, in turn, was mainly explained by a strong non-shared
environmental effect. The authors concluded that this pattern of results
suggests that the contribution of genetic and environmental factors to
music perception depends on the degree to which the perceptual skill in
question relies on formally vs. implicitly acquired knowledge of musical
structures.
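The decomposition underlying such twin analyses can be illustrated with
Falconer's classical approximation, a simplification of the structural-
equation ACE models actually fitted in studies of this kind; the twin
correlations below are hypothetical.

    # Falconer's approximation for the classical twin design.
    # A = additive genetics, C = shared environment, E = non-shared
    # environment (plus measurement error). Correlations are hypothetical.
    r_mz = 0.60   # similarity of identical (monozygotic) twins on a task
    r_dz = 0.30   # similarity of fraternal (dizygotic) twins

    A = 2 * (r_mz - r_dz)     # additive genetic effects
    C = 2 * r_dz - r_mz       # shared environmental effects
    E = 1 - r_mz              # non-shared environmental effects

    print(f"A = {A:.2f}, C = {C:.2f}, E = {E:.2f}")
    # A pattern in which r_mz is scarcely higher than r_dz, and both are
    # modest, would instead point to a strong non-shared environmental
    # effect, as reported for the rhythm task.
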

Conclusions and Future Directions

Longitudinal studies in children are important for establishing whether
music and other arts can indeed facilitate real-life learning and academic
skills either directly, by influencing relevant cognitive faculties, or
indirectly, by increasing motivation and engagement in learning activities.
Randomized controlled trials are of course the gold standard for
establishing causality, but they pose many practical difficulties, especially
for long-term follow-ups, where more naturalistic studies might be the only
option. Two ongoing large-scale studies in the United States are currently
examining the efficacy of musical training programs implemented in
community settings (for first results from these studies, see Habibi et al.,
2016, 2017; Kraus et al., 2014). In parallel, we have initiated three
intervention studies in community settings. The first of them was
established in Finnish kindergartens, where weekly music playschool and
dance programs were integrated into the regular daycare routines
(Linnavalli, Putkinen, Lipsanen, Huotilainen, & Tervaniemi, 2018). In the
second study,
elementary school teachers receive assistance in implementing physical
activity and musical programs as a weekly part of the school day of 10-
year-old children. The third study is ongoing in an elementary
school in Beijing where 9- to 10-year-old children receive extra-curricular
lessons in music or in the English language two to three times a week. In all
three studies, the interventions are preceded and followed by
neurocognitive and EEG measurements and, in the schools, also by tests of
academic achievement. Our aim is to determine whether these easy-to-
implement and motivating music interventions can benefit brain
development and facilitate the learning of academic as well as social skills.
This chapter has reviewed studies on functional differences between
musicians and non-musicians in sound processing and executive functions.
This literature provides fairly strong empirical and theoretical grounds for
concluding that musical training enhances domain-general auditory
processing skills, while the evidence for far transfer from musical training
to executive functions is more mixed. It also appears likely that training
alone
cannot explain all the variation between musicians and non-musicians in
neurocognitive skills. Cross-sectional studies will continue to elucidate the
neural bases of exceptional musical ability and provide hypotheses
regarding the effects of musical training on the brain but self-selection
complicates the interpretation of such studies in terms of the contribution of
predisposing factors and brain plasticity. If these caveats are kept in mind
and the hypothesized neuroplastic effects of musical training are tested in
longitudinal studies, musical training will continue to serve as a useful
model for neural plasticity in humans.

References
Alain, C., Zendel, B. R., Hutka, S., & Bidelman, G. M. (2014). Turning down the noise: The benefit
of musical training on the aging auditory brain. Hearing Research 308, 162–173.
Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano
practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9),
1148–1150.
Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2009). Neuroanatomical correlates of
musicianship as revealed by cortical thickness and voxel-based morphometry. Cerebral Cortex
19(7), 1583–1596.
Besson, M., & Faïta, F. (1995). An event-related potential (ERP) study of musical expectancy:
Comparison of musicians with nonmusicians. Journal of Experimental Psychology: Human
Perception and Performance 21(6), 1278–1296.
Besson, M., Faïta, F., & Requin, J. (1994). Brain waves associated with musical incongruities differ
for musicians and non-musicians. Neuroscience Letters 168(1), 101–105.
Bialystok, E., & DePape, A.-M. (2009). Musical expertise, bilingualism, and executive functioning.
Journal of Experimental Psychology: Human Perception and Performance 35(2), 565–574.
Brattico, E., Näätänen, R., & Tervaniemi, M. (2001). Context effects on pitch perception in musicians
and nonmusicians: Evidence from event-related-potential recordings. Music Perception: An
Interdisciplinary Journal 19(2), 199–222.
Brattico, E., Pallesen, K. J., Varyagina, O., Bailey, C., Anourova, I., Järvenpää, M., … Tervaniemi,
M. (2008). Neural discrimination of nonprototypical chords in music experts and laymen: An
MEG study. Journal of Cognitive Neuroscience 21(11), 2230–2244.
Buonomano, D. V., & Merzenich, M. M. (1998). Cortical plasticity: From synapses to maps. Annual
Review of Neuroscience 21, 149–186.
Chandrasekaran, B., & Kraus, N. (2010). The scalp-recorded brainstem response to speech: Neural
origins and plasticity. Psychophysiology 47(2), 236–246.
Chobert, J., François, C., Velay, J.-L., & Besson, M. (2012). Twelve months of active musical
training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and
voice onset time. Cerebral Cortex 24(4), 956–967.
Coffey, E. B. J., Herholz, S. C., Chepesiuk, A. M. P., Baillet, S., & Zatorre, R. J. (2016). Cortical
contributions to the auditory frequency-following response revealed by MEG. Nature
Communications 7, 11070. doi:10.1038/ncomms11070
Coffey, E. B. J., Mogilever, N., & Zatorre, R. J. (2017). Speech-in-noise perception in musicians: A
review. Hearing Research 352, 49–69.
Coffey, E. B. J., Musacchia, G., & Zatorre, R. J. (2017). Cortical correlates of the auditory frequency-
following and onset responses: EEG and fMRI evidence. Journal of Neuroscience 37(4), 830–838.
Costa-Giomi, E. (1999). The effects of three years of piano instruction on children’s cognitive
development. Journal of Research in Music Education 47(3), 198–212.
de Manzano, Ö., & Ullén, F. (2018). Same genes, different brains: Neuroanatomical differences
between monozygotic twins discordant for musical training. Cerebral Cortex 28(1), 387–394.
Diamond, A. (2013). Executive functions. Annual Review of Psychology 64, 135–168.
Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001). Genetic correlates of
musical pitch recognition in humans. Science 291(5510), 1969–1972.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical
representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
François, C., Chobert, J., Besson, M., & Schön, D. (2012). Music training for the development of
speech segmentation. Cerebral Cortex 23(9), 2038–2043.
Friedman, N. P., & Miyake, A. (2017). Unity and diversity of executive functions: Individual
differences as a window on cognitive structure. Cortex 86, 186–204.
Fujioka, T., Ross, B., Kakigi, R., Pantev, C., & Trainor, L. J. (2006). One year of musical training
affects development of auditory cortical-evoked fields in young children. Brain 129(10), 2593–
2608.
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances
automatic encoding of melodic contour and interval structure. Journal of Cognitive Neuroscience
16(6), 1010–1021.
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2005). Automatic encoding of
polyphonic melodies in musicians and nonmusicians. Journal of Cognitive Neuroscience 17(10),
1578–1592.
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians.
Journal of Neuroscience 23(27), 9240–9245.
George, E. M., & Coch, D. (2011). Music training and working memory: An ERP study.
Neuropsychologia 49(5), 1083–1094.
Habibi, A., Cahn, B. R., Damasio, A., & Damasio, H. (2016). Neural correlates of accelerated
auditory processing in children engaged in music training. Developmental Cognitive Neuroscience
21, 1–14.
Habibi, A., Damasio, A., Ilari, B., Veiga, R., Joshi, A. A., Leahy, R. M., … Damasio, H. (2017).
Childhood music training induces change in micro and macroscopic brain structure: Results from a
longitudinal study. Cerebral Cortex, 1–12. Retrieved from https://doi.org/10.1093/cercor/bhx286
Halwani, G. F., Loui, P., Rüber, T., & Schlaug, G. (2011). Effects of practice and experience on the
arcuate fasciculus: Comparing singers, instrumentalists, and non-musicians. Frontiers in
Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00156
Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal
training on development. Trends in Cognitive Sciences 11(11), 466–472.
Hansen, M., Wallentin, M., & Vuust, P. (2013). Working memory and musical competence of
musicians and non-musicians. Psychology of Music 41(6), 779–793.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity:
behavior, function, and structure. Neuron 76(3), 486–502.
Ho, Y. C., Cheung, M. C., & Chan, A. S. (2003). Music training improves verbal but not visual
memory: Cross-sectional and longitudinal explorations in children. Neuropsychology 17(3), 439–
450.
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009).
Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025.
Janus, M., Lee, Y., Moreno, S., & Bialystok, E. (2016). Effects of short-term music and second-
language training on executive control. Journal of Experimental Child Psychology 144, 84–97.
Jaschke, A. C., Honing, H., & Scherder, E. J. (2018). Longitudinal analysis of music education on
executive functions in primary school children. Frontiers in Neuroscience 12, 103. Retrieved from
https://www.frontiersin.org/articles/10.3389/fnins.2018.00103
Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax
processing in children. NeuroImage 47(2), 735–744.
Koelsch, S. (2009). Music-syntactic processing and auditory memory: Similarities and differences
between ERAN and MMN. Psychophysiology 46(1), 179–190.
Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., Schröger, E., & Friederici, A. D. (2003).
Children processing music: Electric brain responses reveal musical competence and gender
differences. Journal of Cognitive Neuroscience 15(5), 683–693.
Koelsch, S., Gunter, T., Friederici, A. D., & Schröger, E. (2000). Brain indices of music processing:
“Nonmusicians” are musical. Journal of Cognitive Neuroscience 12(3), 520–541.
Koelsch, S., Schmidt, B.-H., & Kansok, J. (2002). Effects of musical expertise on the early right
anterior negativity: An event-related brain potential study. Psychophysiology 39(5), 657–663.
Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in
musicians. Neuroreport 10(6), 1309–1313.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.
Nature Reviews Neuroscience 11(8), 599–605.
Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., & White-Schwoch, T.
(2014). Music enrichment programs improve the neural encoding of speech in at-risk children.
Journal of Neuroscience 34(36), 11913–11918.
Krishnan, A., Gandour, J. T., Ananthakrishnan, S., & Vijayaraghavan, V. (2015). Language
experience enhances early cortical pitch-dependent responses. Journal of Neurolinguistics 33,
128–148.
Kühnis, J., Elmer, S., Meyer, M., & Jäncke, L. (2013). The encoding of vowels and temporal speech
cues in the auditory cortex of professional musicians: An EEG study. Neuropsychologia 51(8),
1608–1618.
Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-
term unimodal and multimodal musical training. Journal of Neuroscience 28(39), 9632–9639.
Linnavalli, T., Putkinen, V., Lipsanen, J., Huotilainen, M., & Tervaniemi, M. (2018). Music
playschool enhances children’s linguistic skills. Scientific Reports 8(1), 8767.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in
Broca’s area: An MEG study. Nature Neuroscience 4(5), 540–545.
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music
and language better than nonmusician children: Behavioral and electrophysiological approaches.
Journal of Cognitive Neuroscience 18(2), 199–211.
Marie, C., Kujala, T., & Besson, M. (2012). Musical and linguistic expertise influence pre-attentive
and attentive processing of non-speech sounds. Cortex 48(4), 447–457.
Menning, H., Roberts, L. E., & Pantev, C. (2000). Plastic changes in the auditory cortex induced by
intensive frequency discrimination training. Neuroreport 11(4), 817–822.
Meyer, M., Elmer, S., Ringli, M., Oechslin, M. S., Baumann, S., & Jäncke, L. (2011). Long-term
exposure to music enhances the sensitivity of the auditory system in children. European Journal of
Neuroscience 34(5), 755–765.
Moradzadeh, L., Blumenthal, G., & Wiseheart, M. (2015). Musical training, bilingualism, and
executive function: A closer look at task switching and dual-task performance. Cognitive Science
39(5), 992–1020.
Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Short-
term music training enhances verbal intelligence and executive function. Psychological Science
22(11), 1425–1433.
Moreno, S., & Bidelman, G. M. (2014). Examining neural plasticity and cognitive benefit through
the unique lens of musical training. Hearing Research 308, 84–97.
Moreno, S., Lee, Y., Janus, M., & Bialystok, E. (2015). Short-term second language and music
training induces lasting functional brain changes in early childhood. Child Development 86(2),
394–406.
Moreno, S., Marques, C., Santos, A., Santos, M., & Besson, M. (2009). Musical training influences
linguistic abilities in 8-year-old children: More evidence for brain plasticity. Cerebral Cortex
19(3), 712–723.
Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not
make perfect: No causal effect of music practice on music ability. Psychological Science 25(9),
1795–1803.
Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical
auditory and audiovisual processing of speech and music. Proceedings of the National Academy of
Sciences 104(40), 15894–15898.
Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in
basic research of central auditory processing: A review. Clinical Neurophysiology 118(12), 2544–
2590.
Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). “Primitive
intelligence” in the auditory cortex. Trends in Neurosciences 24(5), 283–288.
Nan, Y., Liu, L., Geiser, E., Shu, H., Gong, C. C., Dong, Q., … Desimone, R. (2018). Piano training
enhances the neural processing of pitch and improves speech perception in Mandarin-speaking
children. Proceedings of the National Academy of Sciences 115(28), E6630–E6639.
Paavilainen, P. (2013). The mismatch-negativity (MMN) component of the auditory event-related
potential to violations of abstract regularities: A review. International Journal of Psychophysiology
88(2), 109–123.
Pallesen, K. J., Brattico, E., Bailey, C. J., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S.
(2010). Cognitive control in auditory working memory is enhanced in musicians. PLoS ONE 5(6),
e11120.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased
auditory cortical representation in musicians. Nature 392(6678), 811–814.
Pantev, C., Roberts, L. E., Schulz, M., Engelien, A., & Ross, B. (2001). Timbre-specific
enhancement of auditory cortical representations in musicians. Neuroreport 12(1), 169–174.
Paraskevopoulos, E., Kraneburg, A., Herholz, S. C., Bamidis, P. D., & Pantev, C. (2015). Musical
expertise is related to altered functional connectivity during audiovisual integration. Proceedings
of the National Academy of Sciences 112(40), 12522–12527.
Paraskevopoulos, E., Kuchenbuch, A., Herholz, S. C., & Pantev, C. (2012a). Evidence for training-
induced plasticity in multisensory brain structures: An MEG study. PLoS ONE 7(5), e36534.
Paraskevopoulos, E., Kuchenbuch, A., Herholz, S. C., & Pantev, C. (2012b). Musical expertise
induces audiovisual integration of abstract congruency rules. Journal of Neuroscience 32(50),
18196–18203.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA
hypothesis. Frontiers in Psychology 2, 142. doi:10.3389/fpsyg.2011.00142
Peper, J. S., Brouwer, R. M., Boomsma, D. I., Kahn, R. S., & Hulshoff Pol, H. E. (2007). Genetic
influences on human brain structure: A review of brain imaging studies in twins. Human Brain
Mapping 28(6), 464–473.
van Praag, H., Kempermann, G., & Gage, F. H. (2000). Neural consequences of environmental
enrichment. Nature Reviews Neuroscience 1(3), 191–198.
Putkinen, V., Tervaniemi, M., Saarikivi, K., de Vent, N., & Huotilainen, M. (2014). Investigating the
effects of musical training on functional brain development with a novel melodic MMN paradigm.
Neurobiology of Learning and Memory 110, 8–15.
Putkinen, V., Tervaniemi, M., Saarikivi, K., Ojala, P., & Huotilainen, M. (2014). Enhanced
development of auditory change detection in musically trained school-aged children: A
longitudinal event-related potential study. Developmental Science 17(2), 282–297.
Rinne, T., Alho, K., Ilmoniemi, R. J., Virtanen, J., & Näätänen, R. (2000). Separate time behaviors of
the temporal and frontal mismatch negativity sources. NeuroImage 12(1), 14–19.
Rüsseler, J., Altenmüller, E., Nager, W., Kohlmetz, C., & Münte, T. F. (2001). Event-related brain
potentials to sound omissions differ in musicians and non-musicians. Neuroscience Letters 308(1),
33–36.
Saarikivi, K., Putkinen, V., Tervaniemi, M., & Huotilainen, M. (2016). Cognitive flexibility
modulates maturation and music-training-related changes in neural sound discrimination.
European Journal of Neuroscience 44(2), 1815–1825.
Sachs, M., Kaplan, J., Der Sarkissian, A., & Habibi, A. (2017). Increased engagement of the
cognitive control network associated with music training in children during an fMRI Stroop task.
PLoS ONE 12(10), e0187254.
Sala, G., & Gobet, F. (2017a). Does far transfer exist? Negative evidence from chess, music, and
working memory training. Current Directions in Psychological Science 26(6), 515–520.
Sala, G., & Gobet, F. (2017b). When the music’s over: Does music skill transfer to children’s and
young adolescents’ cognitive and academic skills? A meta-analysis. Educational Research Review
20, 55–67.
Schlaug, G., Jäncke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum
size in musicians. Neuropsychologia 33(8), 1047–1055.
Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002).
Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians.
Nature Neuroscience 5(7), 688–694.
Schönwiesner, M., Novitski, N., Pakarinen, S., Carlson, S., Tervaniemi, M., & Näätänen, R. (2007).
Heschl’s gyrus, posterior superior temporal gyrus, and mid-ventrolateral prefrontal cortex have
different roles in the detection of acoustic changes. Journal of Neurophysiology 97(3), 2075–2082.
Schulze, K., Mueller, K., & Koelsch, S. (2011). Neural correlates of strategy use during auditory
working memory in musicians and non-musicians. European Journal of Neuroscience 33(1), 189–
196.
Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of
verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping 32(5),
771–783.
Seesjärvi, E., Särkämö, T., Vuoksimaa, E., Tervaniemi, M., Peretz, I., & Kaprio, J. (2016). The nature
and nurture of melody: A twin study of musical pitch and rhythm perception. Behavior Genetics
46(4), 506–515.
Shahin, A., Roberts, L. E., & Trainor, L. J. (2004). Enhancement of auditory cortical development by
musical experience in children. Neuroreport 15(12), 1917–1921.
Skoe, E., & Kraus, N. (2010). Auditory brainstem response to complex sounds: A tutorial. Ear and
Hearing 31(3), 302–324.
Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-based
morphometry reveals increased gray matter density in Broca’s area in male symphony orchestra
musicians. NeuroImage 17(3), 1613–1622.
Song, J. H., Skoe, E., Wong, P. C., & Kraus, N. (2008). Plasticity in the adult human auditory
brainstem following short-term linguistic training. Journal of Cognitive Neuroscience 20(10),
1892–1902.
Strait, D. L., & Kraus, N. (2014). Biological impact of auditory expertise across the life span:
Musicians as a model of auditory learning. Hearing Research 308(Suppl. C), 109–121.
Strait, D. L., Kraus, N., Parbery-Clark, A., & Ashley, R. (2010). Musical experience shapes top-down
auditory mechanisms: Evidence from masking and auditory attention performance. Hearing
Research 261(1), 22–29.
Tervaniemi, M., Castaneda, A., Knoll, M., & Uther, M. (2006). Sound processing in amateur
musicians and nonmusicians: Event-related potential and behavioral indices. Neuroreport 17(11),
1225–1228.
Tervaniemi, M., Janhunen, L., Kruck, S., Putkinen, V., & Huotilainen, M. (2015). Auditory profiles
of classical, jazz, and rock musicians: Genre-specific sensitivity to musical sound features.
Frontiers in Psychology 6. Retrieved from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4703758/
Tervaniemi, M., Just, V., Koelsch, S., Widmann, A., & Schröger, E. (2005). Pitch discrimination
accuracy in musicians vs. nonmusicians: An event-related potential and behavioral study.
Experimental Brain Research 161(1), 1–10.
Tervaniemi, M., Kruck, S., De Baene, W., Schröger, E., Alter, K., & Friederici, A. D. (2009). Top-
down modulation of auditory processing: Effects of sound context, musical expertise and
attentional focus. European Journal of Neuroscience 30(8), 1636–1642.
Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J., & Näätänen, R. (2001). Superior
formation of cortical memory traces for melodic patterns in musicians. Learning & Memory 8(5),
295–300.
Tierney, A. T., Krizman, J., & Kraus, N. (2015). Music training alters the course of adolescent
auditory development. Proceedings of the National Academy of Sciences 112(32), 10062–10067.
Tierney, A. T., Krizman, J., Skoe, E., Johnston, K., & Kraus, N. (2013). High school music classes
enhance the neural processing of speech. Frontiers in Psychology 4. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00855
Titz, C., & Karbach, J. (2014). Working memory and executive functions: Effects of training on
academic achievement. Psychological Research 78(6), 852–868.
Ullén, F., Hambrick, D. Z., & Mosing, M. A. (2016). Rethinking expertise: A multifactorial gene–
environment interaction model of expert performance. Psychological Bulletin 142(4), 427–446.
Ullén, F., Mosing, M. A., & Madison, G. (2015). Associations between motor timing, music practice,
and intelligence studied in a large sample of twins. Annals of the New York Academy of Sciences
1337, 125–129.
Van Beijsterveldt, C. E. M., & Van Baal, G. C. M. (2002). Twin and family studies of the human
electroencephalogram: A review and a meta-analysis. Biological Psychology 61(1), 111–138.
van Zuijen, T. L., Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2004). Grouping of
sequential sounds: An event-related potential study comparing musicians and nonmusicians.
Journal of Cognitive Neuroscience 16(2), 331–338.
van Zuijen, T. L., Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2005). Auditory
organization of sound sequences by a temporal or numerical regularity: A mismatch negativity
study comparing musicians and non-musicians. Brain Research: Cognitive Brain Research 23(2–
3), 270–276.
Vinkhuyzen, A. A., Van der Sluis, S., Posthuma, D., & Boomsma, D. I. (2009). The heritability of
aptitude and exceptional talent across different domains in adolescents and young adults. Behavior
Genetics 39(4), 380–392.
Virtala, P., Huotilainen, M., Putkinen, V., Makkonen, T., & Tervaniemi, M. (2012). Musical training
facilitates the neural discrimination of major versus minor chords in 13-year-old children.
Psychophysiology 49(8), 1125–1132.
Vuust, P., Brattico, E., Seppänen, M., Näätänen, R., & Tervaniemi, M. (2012). The sound of music:
Differentiating musicians using a fast, musical multi-feature mismatch negativity paradigm.
Neuropsychologia 50(7), 1432–1443.
Vuust, P., Pallesen, K. J., Bailey, C., van Zuijen, T. L., Gjedde, A., Roepstorff, A., & Østergaard, L.
(2005). To musicians, the message is in the meter: Pre-attentive neuronal responses to incongruent
rhythm are left-lateralized in musicians. NeuroImage 24(2), 560–564.
Winkler, I., Denham, S. L., & Nelken, I. (2009). Modeling the auditory scene: Predictive regularity
representations and perceptual objects. Trends in Cognitive Sciences 13(12), 532–540.
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes
human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422.
Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive
functioning in musicians and non-musicians. PLoS ONE 9(6), e99868.
SECTION VI

DEVELOPMENTAL ISSUES IN MUSIC AND THE BRAIN
CHAPTER 23

THE ROLE OF MUSICAL DEVELOPMENT IN EARLY LANGUAGE ACQUISITION

ANTHONY BRANDT, MOLLY GEBRIAN, AND L. ROBERT SLEVC

Language is a foundation of human culture, and the transmission of
language from caregiver to child is one of human society’s most universal
and cherished tasks. As language is passed down, infants benefit from both
explicit and implicit learning. Caregivers tutor them with infant-directed
speech or motherese (Falk, 2004; Fernald, 1992). Infants also
unconsciously internalize the conversations that happen around them
(Perruchet & Pacton, 2006; Saffran, Aslin, & Newport, 1996). Thanks to
direct tutelage and less focused exposure, normally developing children are
able to learn their native tongue within the first few years of life.
As Perani writes, “The way or mechanism through which language is
acquired and mastered is one of the core questions in the domain of human
sciences” (Perani, 2012, p. 306). Scientists have long speculated about
whether music is implicated in this process. After all, just like language,
music is ubiquitous in the world’s cultures: every normally developing
human is born with the ability to appreciate music; populations everywhere
sing as well as speak. Given increasing evidence for shared neural resources
for these two forms of human expression in adults (Patel, 2012), is it
possible that they are even more entangled in infancy? Is music involved in
early language acquisition?
At first glance, it might be difficult to see how. Language is referential.
From our daily conversations to the loftiest tomes, it is our way of
transmitting information. As Jackendoff (2009, p. 197) writes:
Language is essentially a mapping between sound and “propositional” or “conceptual”
thought. The messages it conveys can be about people, objects, places, actions, or any
manner of abstraction. Language can convey information about past, future, visible things,
invisible things, and what is not the case.

Jackendoff continues: “None of these functions can be satisfied by music.”
Music can be put to many uses: it can express emotional states, as in a love
song; it can depict physical phenomena, such as a Tuvan throat singer’s
imitation of a waterfall; it can evoke spiritual enlightenment, as in religious
chant; it can be a display of stamina, as in an Inuit vocal competition; it can
even present mathematical structure, as in the Fibonacci proportions in Béla
Bartók’s string quartets. But it has little of the declamatory power of
language. Language is built to say things like “We’re out of eggs—can you
please run to the store?” Any musician would labor in vain to express
something that concrete. In cultures around the world, distinguishing
between the real and the hypothetical (“You should have bought the other
car”) and past, present, and future (“The coupon expired yesterday”) are a
routine and vital part of linguistic communication; as Jackendoff writes,
music is virtually incapable of expressing these quotidian concepts.
Although there is undeniably some overlap, music—in all its diverse
manifestations around the globe—doesn’t mimic or reproduce the functions
of language; rather, it complements them. Or as Victor Hugo put it: “Music
expresses that which cannot be put into words” (Hugo, 1864, p. 44).
Part and parcel of this contrast is that language and music are organized
differently. Language is a combination of vocabulary and syntax. Music
may often consist of recurring patterns, tropes, and even formulaic gestures
—but there’s no dictionary for motives. And the syntactic distinctions
between subject, verb, and object—virtually universal in language—have
no correspondence in music. It’s for this reason that you can’t faithfully
translate one genre of music into another: it’s fruitless to search for a
gamelan version of a Beethoven string quartet, or the Gagaku version of a
country-and-western song (Jackendoff, 2009).
There are other distinctions as well. Whether the tala of Indian classical
music or the twelve-bar blues of jazz, the use of cyclic form is widespread
in musical cultures. This is particularly true when the music is participatory:
the structural predictability gives spectators the confidence to join in. Cyclic
structure is not requisite or absolute—but even repertoire that involves
unsynchronized or loosely coordinated ensemble performance, such as
Japanese Gagaku, is often underpinned by repeating metric structures
(Harich-Schneider, 1954). But whereas cyclic structure is perhaps the most
elemental and resilient musical structure, it is only marginally relevant to
language. We typically don’t express ourselves in loops: a linguistic
argument is narrative rather than circular.
Similarly, a great deal of indigenous music sets a tempo or pace—and
sticks with it. For instance, in gamelan music, each large section is
characterized by an underlying pulse; during a transitional section called the
pathetan, most of the instruments drop out as the pulse shifts; then, when a
new pulse is established, the full ensemble again joins in (Spiller, 2004).
Linguistic communication doesn’t dictate speed in that way. The speed of
verbal communication sways with our thoughts: hesitating when we’re
searching for the right word or trying to recall something, rushing ahead
when we’re surer of ourselves or excited.
Thus, the contrast between language and music—both in function and
rhetoric—is quite stark. It is perhaps not surprising, then, that Norman-
Haignere and colleagues (Norman-Haignere, Kanwisher, & McDermott,
2015) recently identified patterns of neural activity associated with musical
stimuli that were dissociable from activity for speech and environmental
sounds. To the adult mind, language and music are not easily confused.
Jackendoff urges “caution in drawing strong connections between language
and music, both in the contemporary human brain and in their evolutionary
roots” (Jackendoff, 2009, p. 203).
But consider the experience of listening to a language you don’t know:
what you hear is a vocal performance, which varies in timbre, melody, and
rhythm. When you don’t understand the words, speech is a type of music.
And that is how infants are first exposed to language.
The Music of Speech

Is it accurate to describe speech as musical? Phonemes are the basic units of
speech, prosody the way speech is delivered. Both of these have musical
attributes.
Phonemes—the distinct units of sound in a language that make up words
—involve different attack characteristics and acoustic spectra. As a result,
distinguishing phonemes involves rapid temporal processing similar to the
discrimination of musical timbre: for instance, the distinctions between the
consonants s and k happen on the order of 25–50 ms—the same as the time
window for the contrast between a cymbal and woodblock (Hukin &
Darwin, 1995; Robinson & Patterson, 1995; Shepard, 1980).
The timbral characteristics of phonemes are put to musical use in scat
singing and bebop, in which nonsense syllables are used for their sonic
appeal. This is playfully illustrated by the song “Who Put the Bomp?”
(Mann, 1961):
Who put the bomp
In the bomp bah bomp bah bomp?
Who put the ram
In the rama lama ding dong? …
I’d like to shake his hand
He made my baby
Fall in love with me …

Milton Babbitt’s Phonemena is a more cerebral form of scat singing: the
text consists of varying combinations of twelve vowel-based sounds and
twenty-four consonants. As in bebop, Babbitt’s phonemes are not bearers of
meaning, but rather serve purely musical functions (Kostelanetz, 1987). On
a tight deadline and unable to get a colleague at the United Nations to
provide African lyrics in time, Lionel Richie invented a nonsense language
for his 1983 hit song All Night Long: “Tom bo il de ay de moi ya, Hey
Jambo Jumbo” (Fleming, 2013).
Every language has its distinctive sonic characteristics, based on its
inventory of phonemes. For instance, the Xhosa language has distinctive
vocal “clicks.” Bantu doesn’t include the English cluster sch, so “school” in
Bantu is “sukulu.” English doesn’t include the Russian tsch, and Japanese
doesn’t have the American r. Languages vary widely in their sonic
inventory: the Taa language of southern Africa is estimated to have upwards
of two hundred phonemes, while Hawaiian has fewer than two dozen
(Rousseau, 2016). Understanding and speaking any language involves
mastering its unique phonemic palette, as well as the combinations in which
its phonemes occur.
Meanwhile, prosody refers to the melodic and rhythmic inflection of
speech. In tonal languages such as Hmong and Mandarin, melodic
inflection is a determinant of meaning. For instance, in Hmong, depending
on how it is spoken, the syllable paw can have seven different meanings:
“female,” “ball,” “thorn,” “paternal grandmother,” “pancreas,” “to see,” and
“to throw” (McWhorter, 2015). The Chinese poem “The Lion-Eating Poet
in the Stone Den” consists only of the syllable shi, repeated 92 times. But
by speaking shi with the appropriate pitch contours, the spoken text tells the
story of a poet named Shih Shih who hunts lions at a market, takes them
back to his stone den, and tries to eat ten of them (Forsyth, 2012). In the
Bantu languages of Africa, successive syllables oscillate between “register
tones.” The use of vocal registers is once again a determinant of meaning:
in the Ghanaian language Akan, the words for “good,” “fan,” and “father”
are all papa, pronounced with different register tones (McWhorter, 2015).
Even in non-tonal languages such as English, prosody is essential to
verbal communication, helping to demarcate word and phrase boundaries,
create emphases, and distinguish questions from statements: when spoken,
the sentences “She’s next in line” and “She’s next in line?” can only be told
apart thanks to their melodic inflection.
Rhythm is also an important component of prosody. In stress-timed
languages like English, accented syllables are elongated, whereas in
syllable-timed languages like Japanese, they’re not. Ramus and Mehler
(1999) devised a study in which they “smoothed out” the phonetic
differences between English and Japanese and tested whether adults could
still correctly identify the source tongue. They found that French speakers
could indeed discriminate between the two languages based on little more
than their rhythmic patterns.
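The rhythm-class contrast driving this result has a standard quantitative
expression: in related work, Ramus, Nespor, and Mehler (1999)
characterized languages by %V, the proportion of utterance duration
occupied by vocalic intervals, and deltaC, the standard deviation of
consonantal interval durations. A minimal sketch with hypothetical interval
durations:

    import statistics

    # Durations (seconds) of successive vocalic and consonantal intervals
    # measured from one utterance; these example values are hypothetical.
    vocalic = [0.08, 0.12, 0.07, 0.10, 0.09]
    consonantal = [0.05, 0.15, 0.06, 0.18, 0.07, 0.12]

    percent_v = 100 * sum(vocalic) / (sum(vocalic) + sum(consonantal))
    delta_c = statistics.stdev(consonantal)

    print(f"%V = {percent_v:.1f}, deltaC = {delta_c * 1000:.0f} ms")
    # Stress-timed languages such as English tend toward lower %V and
    # higher deltaC than syllable-timed languages such as Japanese.
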
The melody and rhythm of speech play a special role in the Ewe and
Yoruba tribes of Africa. Both tribes use “talking drums” to mimic their
tonal languages: by transposing the prosodic features of their speech into
percussion riffs, the Ewe have elaborate “shouting” contests and the Yoruba
communicate over large distances (Batuman, 2012).
The prosody of a non-tonal language like English is put to musical use
in Steve Reich’s Different Trains for string quartet and electronic tape. For
this meditation on the contrasting voyages of American and European Jews
in the 1940s, Reich interviewed riders of the transcontinental railroad along
with Holocaust survivors. As the electronic tape plays snippets of these
spoken commentaries—“from New York to Los Angeles,” “Black crows
invaded our country many years ago”—the string quartet imitates their
pitch and rhythmic inflections, turning his subjects’ speech into musical
motives (Reich, 1989). Similarly, jazz artist Jason Moran “transcribed” a
lecture by artist Adrian Piper into a piano solo by imitating the melody and
rhythm of her speech and then adopted the resulting musical line as the
basis for a jazz improvisation (Moran, 2006).
This link between prosody and music was clearly demonstrated in an
experiment by Diana Deutsch (Deutsch, Henthorn, & Lapidis, 2011).
Deutsch looped the recorded phrase “sometimes behave so strangely” over
and over and found that, after about ten repetitions, adult listeners heard the
looped words as sung rather than spoken. Deutsch further noted that when
she scrambled the order of the syllables, the listeners once again heard the
text as normal speech. She surmised that, when the listeners no longer
needed to attend to the meaning of the phrase, they began to pay more
attention to its prosody; but when the syllables were scrambled, that
nullified the effect, because the listeners were once again trying to figure
out what the words meant.
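Deutsch’s manipulation is simple to reproduce. The sketch below (file
names are hypothetical, and it relies on the third-party soundfile package)
merely concatenates ten copies of a recorded phrase, which, with suitable
material, is all it takes to elicit the effect:

    import numpy as np
    import soundfile as sf  # third-party: pip install soundfile

    # Load a short spoken phrase and loop it ten times, after Deutsch's
    # "sometimes behave so strangely" demonstration. File names are
    # hypothetical placeholders.
    phrase, sample_rate = sf.read("spoken_phrase.wav")

    # np.tile repeats along the frame axis for mono (1-D) or stereo (2-D).
    reps = (10,) + (1,) * (phrase.ndim - 1)
    looped = np.tile(phrase, reps)

    sf.write("spoken_phrase_x10.wav", looped, sample_rate)
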
Still, is it appropriate to describe speech as “musical”? Peretz (2001, p.
440) has proposed that the “two anchorage points of brain specialization for
music are the encoding of pitch along musical scales and the ascribing of a
regular pulse to incoming events.” Speech has neither. However, neither
does some indigenous music: for instance, many types of throat singing
involve neither traditional scales nor a fixed pulse. Because, as
Cross notes, music is “cultural, variable, and particular” (Cross, 2001, p.
32), we have argued that music is best defined as “creative play with sound,
in which there is an attention to sound’s acoustic properties, irrespective of
any referential meaning” (Brandt, Gebrian, & Slevc, 2012). Given that it
involves close attention to pitch, rhythm, and timbre, speech can be viewed
as a special type of music—especially if it is a language you don’t speak.
Music may at times seem to be more “rehearsed” or programmed than
conversational speech: a Beethoven sonata will sound nearly identical from
performance to performance. But that is not the case in improvisatory
traditions: a Japanese shakuhachi performer or Indian sitar player will never
perform the same way twice. In “free jazz,” the musicians are not bound by
shared pulse or harmony and often spontaneously incorporate extended
techniques such as over-blowing, key clicks, pitch bends, and multiphonics.
Taking the broadest possible definition of music, speech can be viewed as a
form of musical improvisation: it is “a concert of phonemes and syllables,
melodically inflected by prosody” (Brandt et al., 2012, p. 4). Each language
is distinguished by its inventory of possible sounds and its conventions of
melodic and rhythmic performance. The question is: does this matter to
infants?

Music and Language Perception Abilities at Birth

In societies across the globe, getting children conversant with their native
tongue as fast as possible is a universal goal. Thus, children’s ability to
learn language is an essential constraint on its structure. As Deacon (1997,
p. 110) writes, “The structure of a language is under intense selection
pressure because in its reproduction from generation to generation, it must
pass through a narrow bottleneck: children’s minds.” Do language’s musical
features help to facilitate this process?
Despite their inability to speak and understand language, newborns
display an impressive sensitivity to a variety of linguistic contrasts. This
sensitivity has often been cited as evidence that the ability to learn language
is innate (e.g., Eimas, Siqueland, Jusczyk, & Vigorito, 1971; Vouloumanos
& Werker, 2007). However, infants are most responsive to the sounds of
words, not to their meaning: they are drawn first to the musical aspects of
language.
For instance, as they encounter the world for the first time, newborns are
famously able to discriminate the phonemes of all languages (Dehaene-
Lambertz & Dehaene, 1994; Eimas et al., 1971; Werker & Tees, 1984). As
discussed above, this reflects a sensitivity to (vocal) timbre. Although less
research has been done to probe musical timbre perception in newborns, 3-
to 4-day-old infants can organize auditory streams on the basis of timbre,
much as adults do (McAdams & Bertoncini, 1997). Older infants remain
highly sensitive to timbre: 6-month-olds have long-term memory for the
timbre of folk songs, and 7- to 8.5-month-old infants can differentiate tones
that differ only in their spectral structure (Trainor, Wu, & Tsang, 2004;
Trehub, Endman, & Thorpe, 1990). Timbre appears to be so salient to
infants that it can actually affect their ability to recognize and discriminate
other basic features of music and speech (for summaries, see Costa-Giomi
& Davila, 2014; Creel, 2016). For instance, infants take longer to learn
words spoken by different speakers than when they are spoken by just one
person (Jusczyk, Pisoni, & Mullennix, 1992). Trainor and colleagues (2004)
familiarized infants with a melody played on a single instrument and found
that the infants could not recognize this same melody when played on a
different instrument. The salience of timbre extends through preschool:
when asked to associate sounds with visual stimuli, preschoolers do so more
readily with timbral contrasts than with pitch contours (Creel, 2016). Even
adults show memory facilitation dependent on timbre (e.g., Halpern &
Müllensiefen, 2008; Radvansky & Potter, 2000).
Newborns are also sensitive to the rhythmic features of language: they
distinguish between languages of different rhythmic classes—stress-timed
or syllable-timed—whether or not the contrast includes their native
language (Nazzi, Bertoncini, & Mehler, 1998). Although newborns prefer
their native language (Moon, Cooper, & Fifer, 1993), this seems to reflect a
preference for the rhythmic class (stress patterns) of their native language: it
is not until 4 months of age that infants can reliably tell the difference
between languages of the same rhythmic class (Bosch & Sebastian-Galles,
1997; Gervain & Mehler, 2010; Nazzi et al., 1998).
In addition to being sensitive to linguistic timbre and rhythm, infants can
also discriminate the characteristic prosody (or melody) of their native
language (Friederici, 2006) and even show evidence of discriminating
affective prosody in the first two days of life (Cheng, Lee, Chen, Wang, &
Decety, 2012). Research on infant cries also sheds light on the importance
of melodic abilities in linguistic development: over the first few months of
life, melodic complexity of crying increases (Wermke & Mende, 2009) and
infants who do not show this increasing melodic complexity show poorer
language performance two years later (Wermke, Leising, & Stellzig-
Eisenhauer, 2007). The “melody” of infants’ cries reflects the prosody of
their native language, further support for the idea that infants are sensitive
to the musical aspects of the language to which they are most exposed
(Mampe, Friederici, Christophe, & Wermke, 2009; Prochnow, Erlandsson,
Hesse, & Wermke, 2017).
Additional evidence that the musical features of language are salient to
infants comes from the way we talk to them: in “baby talk” or motherese,
melodic contours are exaggerated, speech is slower, and rhythmic stresses
are emphasized. Researchers have debated why we talk to babies that way.
Some have argued that the function of motherese is limited to emotional
communication (Trainor, Austin, & Desjardins, 2000). Others propose that,
in addition to communicating emotion, baby talk also engages infants’
attention (Fernald, 1989): for instance, a parent’s pitch contours vary
depending on whether the child is smiling and/or maintaining eye contact or
not (Stern, Spieker, & MacKain, 1982). Across a variety of languages,
mothers modulate their vocal timbre in similar ways when speaking to
infants versus adults, perhaps as a way to draw their infants’ attention—
again highlighting the salience timbre has for infants (Piazza, Iordan, &
Lew-Williams, 2017). Other researchers emphasize the didactic role of
motherese, observing that mothers lengthen the vowels in content words
and exaggerate word and sentence boundaries (Kuhl et al., 1997; Saint-
Georges et al., 2013).
The functions of motherese may certainly change in the course of
development: as Saint-Georges and colleagues write (Saint-Georges et al.,
2013, p. 9), “Mothers adjust their infant-directed speech to infants’ age,
cognitive abilities and linguistic level.” The universality of motherese (Falk,
2004; Fernald, 1992) highlights how crucial it is to the caregiver–child
relationship. Infants are drawn to the musical features of speech—and that
attraction helps them to engage socially and to learn.
Various other evidence illuminates the extensive sensitivity infants have
to the musical sounds of language. For instance, newborns can use different
patterns of lexical stress to discriminate individual words (Sansavini,
Bertoncini, & Giovanelli, 1997); can use acoustic cues that signal word
boundaries (Christophe, Dupoux, Bertoncini, & Mehler, 1994); can
distinguish content words from function words based on their different
acoustic characteristics (Shi, Werker, & Morgan, 1999); and appear to be
sensitive to the prosodic boundaries in sentences (Pannekamp, Weber, &
Friederici, 2006). A long-held debate in language acquisition research is
how infants solve the so-called “bootstrapping problem” of
connecting sounds to meaning. It may be that infants use the musical aspects of
language (timbre, rhythm, melody) as the scaffolding on which they hang
their later developments in semantic and syntactic comprehension. Infants
are listening for how their language is composed and using this musical
information to support later linguistic developments.
Still, early life is the hardest period to study and the evidence is at times
contradictory, especially when it comes to timbre perception: for instance,
infants born ten weeks premature can discriminate a ba/da contrast, but
have more difficulty discriminating two different speakers
(Mahmoudzadeh, Wallois, Kongolo, Goudjil, & Dehaene-Lambertz, 2016),
and infants can recognize different phones before they can recognize
different voices (Dehaene-Lambertz, 2017). This discrepancy highlights the
importance of continuing to probe the aural abilities of newborns.
If the precocious discrimination abilities we see in newborns are
domain-general and not limited to language, we would expect to find
similarly precocious music perception abilities. Indeed, young infants have
very fine-grained pitch discrimination abilities, distinguishing
pitch contrasts as small as a third of a half-step (Olsho,
Schoon, Sakai, Turpin, & Sperduto, 1982). Newborns can also detect a
deviant pitch, even when the timbre of the pitches is also changing (Háden
et al., 2009), and can extract pitch patterns and use them in a predictive
manner (Háden, Németh, Török, & Winkler, 2015)—further evidence of
their advanced pitch-processing abilities.
In the rhythmic domain, newborns can detect the beat in music, noticing
when an important rhythmic event is removed from a repeating drum
pattern (Winkler, Háden, Ladinig, Sziller, & Honing, 2009). Newborns can
also distinguish between small changes (60–100 ms) in the length of two
tones (Čėponiené et al., 2002; Cheour et al., 2002). By two months of age,
infants can discriminate an isochronous sequence of tones from a non-
isochronous sequence (Demany, McKenzie, & Vurpillot, 1977). Far more
research has been done on the speech perception abilities than the music
perception abilities of newborns and very young infants, but existing
evidence suggests that infants’ music perception abilities are every bit as
sensitive as their language perception abilities at birth. This further supports
the idea that these are domain-general sound processing capabilities, not
unique to either music or language.
Culture-Specific Music and Language Perception from 6 to 12 Months of Age

Gradually, infants’ sound perception abilities become more refined and
culture-specific and begin to segregate more clearly into music perception
and language perception capabilities. Tellingly, this refinement proceeds
along a remarkably similar track for both music and language (see Fig. 1).
At 6 months, infants can still discriminate all of the phonetic contrasts in
the world’s languages (Cheour et al., 1998; Rivera-Gaxiola, Klarman,
Garcia-Sierra, & Kuhl, 2005), although they do show evidence of being
attuned to the vowel sounds of their native language over other languages
(Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992). Similarly, 6-month-
old infants can detect changes in a melody made of pitches from Javanese
scales or Western scales equally well, whereas adults have a difficult time
with the Javanese scale melodies (Lynch, Eilers, Kimbrough Oller, &
Urbano, 1990). By 9 months, Western infants are more like adults: they
have more difficulty with the Javanese melodies (Lynch & Eilers, 1992). In
the language domain, by 8 months, infants cannot discriminate non-native
vowel contrasts, although they can still discriminate non-native consonant
contrasts. By 10–12 months, this consonant discrimination ability disappears as well (Polka &
Werker, 1994; Werker & Tees, 1984).
FIGURE 1. Parallel development in music and language milestones from 6 to 12 months. Regular
text denotes parallel development. Italics denote related, but not analogous development. Bold text
denotes language-only development. See main text for citations not listed here. (1) Six-month-olds
can discriminate changes in Western and Javanese scales, can discriminate simple and complex
meters, and can discriminate the phonemes of all languages. (2) Nine-month-olds can detect pitch or
timing changes more easily in strong metrical structures and more easily process duple meter (more
common) than triple meter (less common; Bergeson & Trehub, 2006). (3) Twelve-month-olds can
better detect mistuned notes in Western scales than in Javanese scales and have more difficulty
detecting changes in complex than simple meters. (4) Between 6 and 8 months, infants can
discriminate consonant from dissonant intervals, but have difficulty discriminating between different
consonant intervals (Schellenberg & Trainor, 1996). (5) Between 6 and 8 months, infants can no
longer discriminate non-native vowel contrasts, but can still discriminate non-native consonant
contrasts. (6) Trehub & Thorpe (1989). (7) At 7.5–8 months, English-speaking infants show a bias
for stress-initial words and are sensitive to prosodic and frequency cues to word order.
Adapted from Frontiers in Psychology 3, p. 327, Figure 1, Anthony Brandt, Molly Gebrian,
and L. Robert Slevc, Music and early language acquisition, doi: 10.3389/fpsyg.2012.00327,
© 2012 Brandt, Gebrian, and Slevc. Reprinted under the terms of the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Rhythmic perception abilities become more culture-specific as well. At
7.5 months, English-speaking infants show a preference for stress-initial
words (Jusczyk, Hohne, & Bauman, 1999) and by 9 months, infants are
especially sensitive to the stress patterns in their native language (for a
review, see Jusczyk, 2000). As noted above, word segmentation in infants is
initially based on rhythmic information and it is only by 10.5 months that
infants can use non-stress-based cues to segment words (Jusczyk et al.,
1999). In music, 6-month-old infants can detect changes in complex and
simple meters equally well, whereas Western adults have a hard time with
complex meters (Hannon & Trehub, 2005a). By 12 months, Western infants
also show this same difficulty with complex meters (Hannon & Trehub,
2005b).
All of this refinement in the language domain lays the groundwork for
the eventual understanding of meaning and syntax. At 8 months, infants are
sensitive to the word order rules in their native language (a necessary
ingredient in syntax comprehension in many languages), but largely through
prosodic information and word frequency (Gervain, Nespor, Mazuka,
Horie, & Mehler, 2008; Hochmann, Endress, & Mehler, 2010; Nespor et al.,
2008). At 9 months, infants show evidence of understanding their first
words (Friederici, 2006), and at this point, semantic and syntactic
development takes over. Typically developing infants begin to talk between
11 and 13 months, experience an explosion in their vocabulary between 18
and 24 months, and reach a high point in syntactic learning between 18
and 36 months (Friederici, 2006; Kuhl, 2010). These developmental stages
show a clear trajectory: the further removed an aspect of language is from
music (referential meaning, grammar), the later it is learned.
Culture-Specific Music and Language in Childhood
Parallels between music and language development do not stop after the
first year of life. Indeed, they continue throughout childhood, until
children’s musical and linguistic sensitivity reaches adult levels (see Fig. 2).
One challenge in comparing music and language development after infancy
is that, whereas linguistic ability is often measured against the general
population, musical ability is often (implicitly) measured against the
expertise of professional musicians. This has contributed to the idea that
language ability is an innate skill all typically developing humans possess,
whereas musical skill is due to “talent” or a “gift” that is slower to mature.
Although it certainly takes a tremendous amount of hard work and
dedication to master the viola or the trumpet, acquiring the musical
conventions of your native culture is no slower or more difficult than learning
your native language.
FIGURE 2. Parallel development in music and language milestones from 2 to 12 years. Regular
text denotes parallel development. Italics denote related, but not analogous development. See main
text for references. (1) Two-year-olds can repeat brief, sung phrases with identifiable rhythm and
contour. (2) Eighteen-month-olds produce two-word utterances; 2-year-olds tend to eliminate
function words, but not content words. (3) Two-year-olds show basic knowledge of word order
constraints. (4) Three-year-olds have some knowledge of key membership and harmony and sing
“outline songs.” (5) Four- to six-year olds show knowledge of scale and key membership and detect
changes more easily in diatonic melodies than in non-diatonic ones. Five-year-olds show a typical
electrophysiological response to unexpected chords (the early right anterior negativity, or ERAN),
but do not detect a melodic change that implies a change in harmony. (6) At 5 years, processing of
function words depends on semantic context and brain activation is not function-specific for
semantic vs. syntactic processing (unlike adults). (7) Six-year-olds are able to speak in complete,
well-formed sentences. (8) Seven-year-olds have a knowledge of Western tonal structure comparable
to adults’ and can detect melodic changes that imply a change in harmony. (9) Only after 10 years of
age do children show adult-like electrophysiological responses to syntactic errors (Hahne, Eckstein,
& Friederici, 2004).
Adapted from Frontiers in Psychology 3, p. 327, Figure 2, Anthony Brandt, Molly Gebrian,
and L. Robert Slevc, Music and early language acquisition, doi: 10.3389/fpsyg.2012.00327,
© 2012 Brandt, Gebrian, and Slevc. Reprinted under the terms of the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Between 2 and 3 years of age, children acquire basic knowledge of
syntax (e.g., Höhle, Weissenborn, Schmitz, & Ischebeck, 2001), although at
this age, syntax and semantics are still interdependent (Brauer & Friederici,
2007; Friederici, 1983). This is still true at age 5. During the same period,
children are mastering the syntax of their culture’s music, which in Western
music means knowledge of key membership and harmony (Corrigall &
Trainor, 2009). Again, up through age 5, musical syntactic knowledge is
incomplete and still very much dependent on context, much like what is
seen in language (Koelsch et al., 2003; Trainor & Trehub, 1994; Trehub,
Cohen, Thorpe, & Morrongiello, 1986). By the age of 6, children appear to
have mastered the syntax of their native language (Scott, 2004; Nuñez et al.,
2011) and by 7 years of age, a child’s knowledge of the tonal structure in
their culture’s music is comparable to an adult’s (McMullen & Saffran,
2004; Speer & Meeks, 1985; Trainor & Trehub, 1994). The learning of
more complex syntactic structures in language continues through age 10
(Friederici, 1983), as does the refinement of musical abilities. It is not until 8 to 10 years of age
that children’s pitch discrimination abilities reach adult levels (Werner &
Marean, 1996) and sensitivity to implied harmonies reaches adult levels by
age 12 (Costa-Giomi, 2003).
This parallel developmental trajectory is remarkable, especially given
that all of the papers cited above studied children from Western cultures,
which prioritize language learning far above musical learning in school
curricula. In fact, children who are given music lessons reach musical
developmental milestones sooner than their peers who do not take music
lessons (for a review, see Trainor & Corrigall, 2010). It is difficult to sustain
the argument that musical learning is slower and more effortful in the face
of this parallel development, especially when music learning is not given
nearly the same emphasis in our culture or schools.
The ability to make music also follows a remarkably parallel track to the
development of children’s speech throughout childhood. There are
relatively few studies of children’s singing abilities, perhaps because in
Western culture, we tend to separate musicians from non-musicians. This
separation is not true for many non-Western societies and was not the case
historically: modern Western society is unusual in its exclusion of singing
(and dancing) from everyday life (Cross, 2001). Despite this, the
development of singing ability appears to proceed similarly across cultures
and keeps pace with the ability to speak. For instance, around the age of 2,
children begin to produce short linguistic utterances (Friederici, 2006;
Gervain & Mehler, 2010). At this same age, they can reproduce simple
musical fragments with identifiable rhythm and contour (Dowling, 1999).
Later, 2- to 3-year-olds tend to eliminate function words (but not content
words, like nouns and verbs) from their spontaneous speech (Gerken,
Landau, & Remez, 1990). Similarly, when 3-year-olds sing, they have a
tendency to mix bits of songs from their own culture with their own original
vocal improvisations, singing so-called “outline songs” that follow the
general contour of melodies from their culture (Davidson, 1994;
Hargreaves, 1996; Moog, 1976). Thus, in both speaking and singing,
toddlers are uttering the gist of what they want to express (using the most
important words, nouns and verbs in language, and getting the essential
contour of songs correct), while leaving out the more nuanced details
(function words in language, more detailed and precise pitch content in
music). In Western societies, the ability to sing continues to mature until
around age 11 (e.g., Howard, Angus, & Welch, 1994; Welch, 2002),
although in societies that emphasize singing ability on par with speaking
ability, this improvement happens earlier and to a greater extent (Kreutzer,
2001; Welch, 2009).
Is the Newborn Brain Specialized for Language Processing?
Despite the evidence above, many have claimed that music and language
are separate systems, and are processed as such by the brain (e.g., Peretz &
Coltheart, 2003). Some researchers claim that this separation is innate based
on evidence that infants show left hemisphere lateralization for speech at
birth (e.g., Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002). However,
a close look at this research reveals a more complicated picture. Bilateral
activation for both music and language has been found in a number of
studies, ranging from those that include full sentences and/or musical
phrases (Fava, Hull, Baumbauer, & Bortfeld, 2014; Perani, 2012), to those
that use tightly controlled speech-like sounds that test specific parameters
of the linguistic acoustic signal (Dehaene-Lambertz, 2000; Kotilahti et al.,
2010; Minagawa-Kawai, Cristià, Vendelin, Cabrol, & Dupoux, 2011a).
Fava and colleagues (2014) used full sentences and musical phrases and
found no differences in activation. Perani and colleagues (2011) found right
lateralization for speech that closely parallels the activation they observed
to music in an earlier study (Perani et al., 2010).
Researchers who argue that music and language processing are separate
in the brain often point to rapid temporal processing as evidence for this
separation (e.g., Boemio, Fromm, Braun, & Poeppel, 2005; Zatorre &
Belin, 2001). The argument is that rapid temporal processing is required for
language comprehension, but not for music perception. However, as
described earlier, the perception of musical timbre requires processing at the
same rate as phonemes: for instance, in a percussion solo, rapid temporal
processing is at least as salient as slower melodic processing. Nevertheless,
even taken at face value, this research has produced conflicting results.
While the right hemisphere is believed to respond more strongly to
spectral information, the left hemisphere shows greater sensitivity to rapid
temporal contrasts, a sensitivity taken to underlie the supposed innate
predisposition for language perception. As noted earlier, Mahmoudzadeh and colleagues
(2016) found that infants born 10 weeks preterm could distinguish
consonants differing along a temporal continuum (specifically, a ba/da
contrast varying in voice onset time and so requiring rapid temporal
processing), but not between different speakers (whose voices presumably
differ mostly in spectral, not temporal, dimensions). Because the infants
were born so prematurely, the authors argued that this is evidence for an
early advantage for the processing of fast changes in the acoustic signal that
is likely genetic. However, an earlier study on premature infants by this
same research group (Mahmoudzadeh et al., 2013) found right hemisphere
activation increases to both a change in phoneme and a change in speaker (a
left hemisphere increase was also observed, but only for the phoneme
change). Minagawa-Kawai and colleagues (2011a) tested sounds that
differed in only their temporal composition or only their spectral
composition and found no evidence for hemispheric asymmetries; in fact,
both hemispheres were equally activated in the temporal condition.
Dehaene-Lambertz, who has long argued that music and language are
separate and that language perception is innate and prioritized in the infant
brain, has recently acknowledged that the brain activation to language in
young infants is complex and that the leftward preference sometimes seen for
linguistic stimuli may have more to do with the fact that temporally
responsive neurons on the left mature earlier, rather than reflecting anything
special about language per se (Dehaene-Lambertz, 2017).
What is more fully supported by the research, at least to date, is the idea
that functional specialization emerges as a result of exposure and learning.
In 2011, Minagawa-Kawai and colleagues proposed such a developmental
scenario, arguing that
language acquisition recruits several specialized (but not necessarily domain-specific)
learning subsystems … The establishment of feature based, categorical phonetic units in the
extraction of words and rules on the basis of hierarchical and adjacent regularities requires
specific learning algorithms that are especially efficient in the left hemisphere and, as a
result, speech perception comes to be left-lateralized as a function of experience.
(Minagawa-Kawai, Cristià, & Dupoux, 2011b, p. 219)
This hypothesis predicts that a second language learned in adulthood will be
less left-lateralized at lower levels of proficiency, which is exactly what is found
(Dehaene et al., 1997; Perani et al., 1996, 1998). In addition, sound pairs
only elicit asymmetrical activation when they constitute a contrast in the
speaker’s native language (Dehaene-Lambertz, 1997; Minagawa-Kawai,
Mori, & Sato, 2005; Näätänen et al., 1997), but after extensive training with
a non-native contrast, there is a shift to left-hemisphere dominance (Best &
Avery, 1999; Zhang et al., 2009). Gervain (2015, p. 16) echoes this idea:
“Features of the native language are processed in an increasingly lateralized
fashion in a network of focal brain areas, as processing turns from
acoustic/auditory to linguistic in nature, whereas non-native sound patterns
are handled in a more distributed and more bilateral way.”
Linked Developmental Deficits
Other evidence for the inseparability of music and language during early
life comes from linked deficits in individuals with abnormal linguistic or
musical development. For example, the speech perception and reading
deficits associated with developmental dyslexia have often been linked to
underlying problems with auditory processing, including processing of
rapid temporal changes in speech (for a review, see Hämäläinen, Salminen,
& Leppänen, 2013). Given the discussion above, it is unsurprising that
these deficits impact music as well. In particular, children with dyslexia
tend to also show deficits with musical rhythm (e.g., synchronizing with a
metronome) and (while not as often studied) musical timbre (e.g.,
Goswami, Huss, Mead, Fosker, & Verney, 2013; Huss, Verney, Fosker,
Mead, & Goswami, 2011; Overy, Nicolson, Fawcett, & Clarke, 2003).
Assuming a crucial link between musical and linguistic development, one
might imagine that musical experience could help treat dyslexia. In fact,
musical (especially rhythm) training can help remediate linguistic deficits in
dyslexia (e.g., Flaugnacco et al., 2015), although music training is clearly
no panacea given that dyslexic musicians show good musical abilities
despite pronounced reading deficits (Weiss, Granot, & Ahissar, 2014).
Note, however, that dyslexic musicians do show other types of perceptual
and auditory deficits including discrimination of amplitude envelope cues
(which relies on rapid temporal processing) and auditory working memory
(Weiss et al., 2014; Zuk et al., 2017). Thus developmental dyslexia appears
to be a deficit not only of language but also of music.
Similar parallels emerge in other purportedly language-specific deficits
such as specific language impairment (SLI). Individuals with SLI, whose
primary deficit involves syntactic processing, also show deficits in the
processing of musical (harmonic) structure (Jentschke, Koelsch, Sallat, &
Friederici, 2008) and in rhythmic processing (e.g., Cumming, Wilson,
Leong, Colling, & Goswami, 2015), and they show more accurate grammatical
processing following rhythmic stimulation in a priming paradigm (Bedoin,
Brisseau, Molinier, Roch, & Tillmann, 2016; Przybylski et al., 2013).
Production deficits in SLI are also associated with productive deficits in
music—specifically, in pitch-matching and melody reproduction (Clément,
Planchou, Béland, Motte, & Samson, 2015).
Even deficits in language production such as developmental stuttering
have been linked to deficits with musical rhythm (Wieland, McAuley,
Dilley, & Chang, 2015), and suggestive relationships between language and
music deficits appear in other disorders not linked specifically to language.
For example, autistic children often have age-appropriate responses to
music but suffer from language disabilities. This has been viewed as
evidence that music and language must involve innately distinct neural
networks, with the music network healthy and the language one impaired.
However, in an fMRI study of autistic children, Lai and colleagues (Lai,
Pantazatos, Schneider, & Hirsch, 2012, p. 961) found that, “paradoxically,
brain regions associated with these functions typically overlap”: while song
and speech activated the same networks, the response to song was vigorous
whereas that to speech was subdued. And while the causes
are not yet fully understood, there is growing evidence for the efficacy of
musical engagement and therapy on language (and other) outcomes in
autism spectrum disorders (e.g., Geretsegger, Elefant, Mössler, & Gold,
2014).
Of course, the prediction is not just that linguistic deficits show
concomitant musical processing problems, but also that developmental
deficits in music should have consequences for language development.
Congenital amusia, the most well-studied deficit of musical development, is
primarily associated with deficits in pitch perception and/or pitch memory,
although it can also include deficits in the processing of temporal (rhythmic)
aspects of music (for reviews, see Peretz & Hyde, 2003; Tillmann, Albouy, &
Caclin, 2015). While early conceptions of amusia suggested a deficit
specific to music, more recent work has found related deficits in processing
of linguistic pitch, both for lexical tones (in tone-language speakers; e.g.,
Liu, Patel, Fourcin, & Stewart, 2010; Wang & Peng, 2014), and for the
recognition of emotional prosody in non-tone languages (Thompson, Marin,
& Stewart, 2012). Speech perception deficits in congenital amusia appear to
extend to non-pitch-based aspects of language as well (e.g., Liu, Jiang,
Wang, Xu, & Patel, 2015), showing subtle, but widespread, linguistic
consequences of congenital amusia.
One deficit that severely impacts both music and speech processing is
deafness. Given that deaf individuals can successfully learn full-fledged
sign languages, deafness could be seen as a notable counterexample to
many of the claims made here. Note, however, that music is not just an
auditory stimulus, but also a kinesthetic and visual one, and the rhythmic
“babbling” characteristic of sign-exposed infants (e.g., Petitto, Holowka,
Sergio, & Ostry, 2001) may lay the foundation for the temporal processing
abilities underlying later linguistic acquisition in signers.
Entangled Music and Language in Adulthood
It can be difficult for adults to conceive of music and language being treated
as one and the same in the infant brain because they are so obviously
different once we reach maturity. However, this entanglement between
music and language persists throughout our lives. Research on the
perception of tunes and lyrics shows evidence of integrated processing at
the pre-lexical, phonemic processing stage in adults (Sammler et al., 2010).
Sine-wave speech, which initially sounds like meaningless whistles, can not
only be perceived as speech after training, but also activates speech areas
(specifically the left posterior superior temporal sulcus) once it is perceived
as speech (Möttönen et al., 2006). Silbo Gomero, a whistled language used on
La Gomera in the Canary Islands, activates areas of the brain normally
associated with spoken language perception in proficient whistlers, but not
in those who do not speak the language (Carreiras, Lopez, Rivero, &
Corina, 2005).
Something as abstract as syntax may also be rooted in the music of
language. Kreiner and Eviatar (2014) argue that syntactic comprehension is
rooted in prosody: syntactic and prosodic boundaries largely correspond in
spoken language and this helps aid syntactic comprehension (see also
Heffner & Slevc, 2015). They note that prosody helps disambiguate unclear
syntactic structures (such as garden-path sentences), which can be
misleading in print, but rarely in spoken conversation. As argued above, this
melody of language is what infants first use, but even as adults, it continues
to underlie our syntactic understanding. There is also evidence for a
correlation between rhythm processing and word stress processing in
normal adults, another entanglement of music and language (Hausen,
Torppa, Salmela, Vainio, & Särkämö, 2013).
Patel (2012) has proposed a framework for the adult brain in which music
and language share neural resources for the perceptual and cognitive
tasks they have in common. Given the evidence for infants’
sensitivities to the musical features of speech, the co-development of
musical and linguistic abilities, and shared developmental disorders, it
seems plausible that music and language are even more deeply entangled in
the newborn brain and that modularity emerges in the course of
development. Speech may be privileged in the infant brain (Shultz,
Vouloumanos, Bennett, & Pelphrey, 2014), but it is first experienced as a
vocal performance whose musical features are what engage the newborn’s
attention. It is only later that the child begins to apprehend the referential
function of words, and the music of words begins to sink into the
background. As Deutsch’s “looped speech” experiment shows, the music of
speech is never really absent, even in a non-tonal language—it is just that
we pay less attention to it. Music—and by extension, poetry—may give
human culture ways to creatively engage with the features of our aural
imaginations that conversational speech does not prioritize.
A burning question of human cognition is whether language is innate.
Are we language animals, with a universal grammar encoded in our genes
(Chomsky, 1980)? Or do we have a strong biological need to communicate
—and an aptitude for learning how to do so? The jury is still out, but there
has been a gradual shift toward viewing language as a cultural inheritance
rather than a genetic one. Iterated learning—in which initially random data
becomes coherent over time as generations of subjects instruct one another
—is a plausible way of describing the emergence of both language and
music (Kirby, Griffiths, & Smith, 2014; Ravignani, Delgado, & Kirby,
2016; Smith, Kirby, & Brighton, 2003). The study of music’s role in early
language acquisition may have important implications: when we learn
language, the music of speech comes first, thereby providing a key
mechanism by which language is transmitted from generation to generation.
In a newborn’s first months, speech is akin to bebop: musical attention to
how languages are composed—through their unique phonemic inventory
and prosody—helps an infant born into any community learn its native
tongue. The same acuities used in these early developmental stages—
sensitivities to timbre, pitch, and rhythm, and the ability to recognize their
consistencies—embody musical aptitude later in life. Early language
acquisition thus lies at the crossroads of music and language and provides
tantalizing glimpses into what it means to be human.
References
Albin, D. D., & Echols, C. H. (1996). Stressed and word-final syllables in infant-directed speech.
Infant Behavior and Development 19(4), 401–418.
Batuman, E. (2012). Talking drums. The New Yorker, July 9. Retrieved from
https://www.newyorker.com/culture/culture-desk/talking-drums
Bedoin, N., Brisseau, L., Molinier, P., Roch, D., & Tillmann, B. (2016). Temporally regular musical
primes facilitate subsequent syntax processing in children with specific language impairment.
Frontiers in Neuroscience 10. Retrieved from https://doi.org/10.3389/fnins.2016.00245
Bergeson, T. R., & Trehub, S. E. (2006). Infants’ perception of rhythmic patterns. Music Perception
23(4), 345–360.
Best, C. T., & Avery, R. A. (1999). Left-hemisphere advantage for click consonants is determined by
linguistic significance and experience. Psychological Science 10(1), 65–70.
Boemio, A., Fromm, S., Braun, A., & Poeppel, D. (2005). Hierarchical and asymmetric temporal
sensitivity in human auditory cortices. Nature Neuroscience 8(3), 389–395.
Bosch, L., & Sebastián-Gallés, N. (1997). Native-language recognition abilities in 4-month-old
infants from monolingual and bilingual environments. Cognition 65(1), 33–69.
Brandt, A., Gebrian, M., & Slevc, L. R. (2012). Music and early language acquisition. Frontiers in
Psychology 3. Retrieved from https://doi.org/10.3389/fpsyg.2012.00327
Brauer, J., & Friederici, A. D. (2007). Functional neural networks of semantic and syntactic
processes in the developing brain. Journal of Cognitive Neuroscience 19(10), 1609–1623.
Carreiras, M., Lopez, J., Rivero, F., & Corina, D. (2005). Linguistic perception: Neural processing of
a whistled language. Nature 433(7021), 31–32.
Čėponiené, R., Kushnerenko, E., Fellman, V., Renlund, M., Suominen, K., & Näätänen, R. (2002).
Event-related potential features indexing central auditory discrimination by newborns. Cognitive
Brain Research 13(1), 101–113.
Cheng, Y., Lee, S. Y., Chen, H. Y., Wang, P. Y., & Decety, J. (2012). Voice and emotion processing in
the human neonatal brain. Journal of Cognitive Neuroscience 24(6), 1411–1419.
Cheour, M., Čėponiené, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K., & Näätänen, R. (1998).
Development of language-specific phoneme representations in the infant brain. Nature
Neuroscience 1, 351–353.
Cheour, M., Čėponiené, R., Leppänen, P., Alho, K., Kujala, T., Renlund, M., & Näätänen, R. (2002).
The auditory sensory memory trace decays rapidly in newborns. Scandinavian Journal of
Psychology 43(1), 33–39.
Chomsky, N. (1980). Rules and representations. New York: Columbia University Press.
Christophe, A., Dupoux, E., Bertoncini, J., & Mehler, J. (1994). Do infants perceive word
boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the
Acoustical Society of America 95(3), 1570–1580.
Clément, S., Planchou, C., Béland, R., Motte, J., & Samson, S. (2015). Singing abilities in children
with Specific Language Impairment (SLI). Frontiers in Psychology 6. Retrieved from
https://doi.org/10.3389/fpsyg.2015.00420
Corrigall, K. A., & Trainor, L. J. (2009). Effects of musical training on key and harmony perception.
Annals of the New York Academy of Sciences 1169, 164–168.
Costa-Giomi, E. (2003). Young children’s harmonic perception. Annals of the New York Academy of
Sciences 999, 477–484.
Costa-Giomi, E., & Davila, Y. (2014). Infants’ discrimination of female singing voices. International
Journal of Music Education 32(3), 324–332.
Creel, S. C. (2016). Ups and downs in auditory development: Preschoolers’ sensitivity to pitch
contour and timbre. Cognitive Science 40(2), 373–403.
Cross, I. (2001). Music, cognition, culture, and evolution. Annals of the New York Academy of
Sciences 930, 28–42.
Cumming, R., Wilson, A., Leong, V., Colling, L. J., & Goswami, U. (2015). Awareness of rhythm
patterns in speech and music in children with specific language impairments. Frontiers in Human
Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00672
Davidson, L. (1994). Song singing by young and old: A developmental approach to music. In R.
Aiello with J. Sloboda (Eds.), Musical perceptions (pp. 99–130). New York: Oxford University
Press.
Deacon, T. W. (1997). The symbolic species: The coevolution of language and the brain. New York:
W. W. Norton.
Dehaene, S., Dupoux, E., Mehler, J., Cohen, L., Paulesu, E., Perani, D., & Le Bihan, D. (1997).
Anatomical variability in the cortical representation of first and second language. NeuroReport
8(17), 3809–3815.
Dehaene-Lambertz, G. (1997). Electrophysiological correlates of categorical phoneme perception in
adults. NeuroReport 8(4), 919–924.
Dehaene-Lambertz, G. (2000). Cerebral specialization for speech and non-speech stimuli in infants.
Journal of Cognitive Neuroscience 12(3), 449–460.
Dehaene-Lambertz, G. (2017). The human infant brain: A neural architecture able to learn language.
Psychonomic Bulletin and Review 24(1), 48–55.
Dehaene-Lambertz, G., & Dehaene, S. (1994). Speed and cerebral correlates of syllable
discrimination in infants. Nature 370, 1–4.
Dehaene-Lambertz, G., Dehaene, S., & Hertz-Pannier, L. (2002). Functional neuroimaging of speech
perception in infants. Science 298(5600), 2013–2015.
Demany, L., McKenzie, B., & Vurpillot, E. (1977). Rhythm perception in early infancy. Nature
266(5604), 718–719.
Deutsch, D., Henthorn, T., & Lapidis, R. (2011). Illusory transformation from speech to song.
Journal of the Acoustical Society of America 129(4), 2245–2252.
Dowling, W. J. (1999). The development of music perception and cognition. In D. Deutsch (Ed.), The
Psychology of Music (2nd ed.; pp. 603–625). London: Academic Press.
Eimas, P., Siqueland, E., Jusczyk, P., & Vigorito, J. (1971). Speech perception in infants. Science
171(3968), 202–206.
Falk, D. (2004). Prelinguistic evolution in early hominins: Whence motherese? Behavioral and Brain
Sciences 27(4), 491–503; discussion: 503–583.
Fava, E., Hull, R., Baumbauer, K., & Bortfeld, H. (2014). Hemodynamic responses to speech and
music in preverbal infants. Child Neuropsychology 20(4), 430–448.
Fernald, A. (1989). Intonation and communicative intent in mothers’ speech to infants: Is the melody
the message? Child Development 60(6), 1497–1510.
Fernald, A. (1992). Human maternal vocalizations to infants as biologically relevant signals: An
evolutionary perspective. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind:
Evolutionary psychology and the generation of culture (pp. 391–428). Oxford: Oxford University
Press.
Fernald, A., & Mazzie, C. (1991). Prosody and focus in speech to infants and adults. Developmental
Psychology 27(2), 209–221.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training
increases phonological awareness and reading skills in developmental dyslexia: A randomized
control trial. PLoS ONE 10(9), e0138715.
Fleming, K. (2013). Five surprising facts about Richie’s classic “All Night Long.” New York Post,
September 21. Retrieved from http://nypost.com/2013/09/21/five-surprising-facts-about-richies-
classic-all-night-long/
Forsyth, M. (2012). The etymologicon: A circular stroll through the hidden connections of the
English language. New York: Berkley Books.
Fox, C. (1990). Steve Reich’s “Different Trains.” Tempo, New Series, No. 172 (March), 2–8.
Friederici, A. D. (1983). Children’s sensitivity to function words during sentence comprehension.
Linguistics 21, 717–739.
Friederici, A. D. (2006). The neural basis of language development and its impairment. Neuron
52(6), 941–952.
Geretsegger, M., Elefant, C., Mössler, K. A., & Gold, C. (2014). Music therapy for people with
autism spectrum disorder. Cochrane Database of Systematic Reviews 17(6), CD004381.
doi:10.1002/14651858.CD004381.pub3
Gerken, L., Landau, B., & Remez, R. (1990). Function morphemes in young children’s speech
perception and production. Developmental Psychology 26(2), 204–216.
Gervain, J. (2015). Plasticity in early language acquisition: The effects of prenatal and early
childhood experience. Current Opinion in Neurobiology 35, 13–20.
Gervain, J., & Mehler, J. (2010). Speech perception and language acquisition in the first year of life.
Annual Review of Psychology 61, 191–218.
Gervain, J., Nespor, M., Mazuka, R., Horie, R., & Mehler, J. (2008). Bootstrapping word order in
prelexical infants: A Japanese–Italian cross-linguistic study. Cognitive Psychology 57(1), 56–74.
Goswami, U., Huss, M., Mead, N., Fosker, T., & Verney, J. P. (2013). Perception of patterns of
musical beat distribution in phonological developmental dyslexia: Significant longitudinal
relations with word reading and reading comprehension. Cortex 49(5), 1363–1376.
Háden, G. P., Németh, R., Török, M., & Winkler, I. (2015). Predictive processing of pitch trends in
newborn infants. Brain Research 1626, 14–20.
Háden, G. P., Stefanics, G., Vestergaard, M. D., Denham, S. L., Sziller, I., & Winkler, I. (2009).
Timbre-independent extraction of pitch in newborn infants. Psychophysiology 46(1), 69–74.
Hahne, A., Eckstein, K., & Friederici, A. D. (2004). Brain signatures of syntactic and semantic
processes during children’s language development. Journal of Cognitive Neuroscience 16(7), 1302–1318.
Halpern, A. R., & Müllensiefen, D. (2008). Effects of timbre and tempo change on memory for
music. Quarterly Journal of Experimental Psychology 61(9), 1371–1384.
Hämäläinen, J. A., Salminen, H. K., & Leppänen, P. H. (2013). Basic auditory processing deficits in
dyslexia: Systematic review of the behavioral and event-related potential/field evidence. Journal of
Learning Disabilities 46(5), 413–427.
Hannon, E. E., & Trehub, S. E. (2005a). Metrical categories in infancy and adulthood. Psychological
Science 16(1), 48–55.
Hannon, E. E., & Trehub, S. E. (2005b). Tuning in to musical rhythms: Infants learn more readily
than adults. Proceedings of the National Academy of Sciences 102(35), 12639–12643.
Hargreaves, D. J. (1996). The development of artistic and musical competence. In I. Deliege & J.
Sloboda (Eds.), Musical beginnings (pp. 145–170). Oxford: Oxford University Press.
Harich-Schneider, E. (1954). The rhythmical patterns in Gagaku and Bugaku. Leiden: E. J. Brill.
Hausen, M., Torppa, R., Salmela, V. R., Vainio, M., & Särkämö, T. (2013). Music and speech
prosody: A common rhythm. Frontiers in Psychology 4. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00566
Heffner, C. C., & Slevc, L. R. (2015). Prosodic structure as a parallel to musical structure. Frontiers
in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.01962
Hochmann, J. R., Endress, A. D., & Mehler, J. (2010). Word frequency as a cue for identifying
function words in infancy. Cognition 115(3), 444–457.
Höhle, B., Weissenborn, J., Schmitz, M., & Ischebeck, A. (2001). Discovering word order
regularities: The role of prosodic information for early parameter setting. In J. Weissenborn & B.
Höhle (Eds.), Approaches to bootstrapping: Phonological, lexical, syntactic and
neurophysiological aspects of early language acquisition (pp. 249–265). Amsterdam: John
Benjamins.
Howard, D. M., Angus, J. A., & Welch, G. F. (1994). Singing pitching accuracy from years 3 to 6 in
a primary school. Proceedings of the Institute of Acoustics 16(5), 223–230.
Hugo, V. (1864). William Shakespeare. Paris: A. Lacroix, Verboeckhoeven et Cie.
Hukin, R. W., & Darwin, C. J. (1995). Comparison of the effect of onset asynchrony on auditory
grouping in pitch matching and vowel identification. Perception and Psychophysics 57(2), 191–
196.
Huss, M., Verney, J. P., Fosker, T., Mead, N., & Goswami, U. (2011). Music, rhythm, rise time
perception and developmental dyslexia: Perception of musical meter predicts reading and
phonology. Cortex 47(6), 674–689.
Jackendoff, R. (2009). Parallels and nonparallels between language and music. Music Perception
26(3), 195–204.
Jentschke, S., Koelsch, S., Sallat, S., & Friederici, A. D. (2008). Children with specific language
impairment also show impairment of music-syntactic processing. Journal of Cognitive
Neuroscience 20(11), 1940–1951.
Jusczyk, P. W. (2000). The discovery of spoken language. Cambridge, MA: MIT Press.
Jusczyk, P. W., Hohne, E. A., & Bauman, A. (1999). Infants’ sensitivity to allophonic cues to word
segmentation. Perception and Psychophysics 61(8), 1465–1476.
Jusczyk, P. W., Pisoni, D. B., & Mullennix, J. (1992). Some consequences of stimulus variability on
speech processing by 2-month-old infants. Cognition 43(3), 253–291.
Kirby, S., Griffiths, T., & Smith, K. (2014). Iterated learning and the evolution of language. Current
Opinion in Neurobiology 28, 108–114.
Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., Schröger, E., & Friederici, A. D. (2003).
Children processing music: Electric brain responses reveal musical competence and gender
differences. Journal of Cognitive Neuroscience 15(5), 683–693.
Kostelanetz, R. (1987). Notes on Milton Babbitt as text-sound artist. Perspectives of New Music
25(1/2), 280–284.
Kotilahti, K., Nissilä, I., Näsi, T., Lipiäinen, L., Noponen, T., Meriläinen, P., … Fellman, V. (2010).
Hemodynamic responses to speech and music in newborn infants. Human Brain Mapping 31(4),
595–603.
Kreiner, H., & Eviatar, Z. (2014). The missing link in the embodiment of syntax: Prosody. Brain &
Language 137, 91–102.
Kreutzer, N. (2001). Song acquisition among rural Shona-speaking Zimbabwean children from
birth to 7 years. Journal of Research in Music Education 49(3), 198–211.
Kuhl, P. K. (2010). Brain mechanisms in early language acquisition. Neuron 67(5), 713–727.
Kuhl, P. K., Andruski, J. E., Chistovich, I., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., …
Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants.
Science 277(5326), 684–686.
Kuhl, P. K., Williams, K., Lacerda, F., Stevens, K., & Lindblom, B. (1992). Linguistic experience
alters phonetic perception in infants by 6 months of age. Science 255(5044), 606–608.
Lai, G., Pantazatos, S. P., Schneider, H., & Hirsch, J. (2012). Neural systems for speech and song in
autism. Brain 135(3), 961–975.
Liu, F., Jiang, C., Wang, B., Xu, Y., & Patel, A. D. (2015). A music perception disorder (congenital
amusia) influences speech comprehension. Neuropsychologia 66, 111–118.
Liu, F., Patel, A. D., Fourcin, A., & Stewart, L. (2010). Intonation processing in congenital amusia:
Discrimination, identification and imitation. Brain 133(6), 1682–1693.
Lynch, M. P., & Eilers, R. E. (1992). A study of perceptual development for musical tuning.
Perception and Psychophysics 52(6), 599–608.
Lynch, M. P., Eilers, R. E., Kimbrough Oller, D., & Urbano, R. C. (1990). Innateness, experience,
and music perception. Psychological Science 1(4), 272–276.
McAdams, S., & Bertoncini, J. (1997). Organization and discrimination of repeating sound sequences
by newborn infants. Journal of the Acoustical Society of America 102(5), 2945–2953.
McMullen, E., & Saffran, J. R. (2004). Music and language: A developmental comparison. Music
Perception 21(3), 289–311.
McWhorter, J. (2015). The world’s most musical languages. The Atlantic, November 13. Retrieved
from https://www.theatlantic.com/international/archive/2015/11/tonal-languages-linguistics-
mandarin/415701/
Mahmoudzadeh, M., Dehaene-Lambertz, G., Fournier, M., Kongolo, G., Goudjil, S., Dubois, J., &
Wallois, F. (2013). Syllabic discrimination in premature human infants prior to complete formation
of cortical layers. Proceedings of the National Academy of Sciences 110(12), 4846–4851.
Mahmoudzadeh, M., Wallois, F., Kongolo, G., Goudjil, S., & Dehaene-Lambertz, G. (2016).
Functional maps at the onset of auditory inputs in very early preterm human neonates. Cerebral
Cortex 27(4), 2500–2512.
Mampe, B., Friederici, A. D., Christophe, A., & Wermke, K. (2009). Newborns’ cry melody is
shaped by their native language. Current Biology 19, 1994–1997.
Mann, B. (1961). Who Put the Bomp. ABC-Paramount 10237.
Minagawa-Kawai, Y., Cristià, A., & Dupoux, E. (2011b). Cerebral lateralization and early speech
acquisition: A developmental scenario. Developmental Cognitive Neuroscience 1(3), 217–232.
Minagawa-Kawai, Y., Cristià, A., Vendelin, I., Cabrol, D., & Dupoux, E. (2011a). Assessing signal-
driven mechanisms in neonates: Brain responses to temporally and spectrally different sounds.
Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00135
Minagawa-Kawai, Y., Mori, K., & Sato, Y. (2005). Different brain strategies underlie the categorical
perception of foreign and native phonemes. Journal of Cognitive Neuroscience 17(9), 1376–1385.
Miranda, R. A., & Ullman, M. T. (2007). Double dissociation between rules and memory in music:
An event-related potential study. NeuroImage 38(2), 331–345.
Moog, H. (1976). The musical experience of the pre-school child. Trans. C. Clarke. London: Schott.
Moon, C., Cooper, R. P., & Fifer, W. P. (1993). Two-day-olds prefer their native language. Infant
Behavior and Development 16(4), 495–500.
Moran, J. (2006). Artist-in-residence. New York: Blue Note Records.
Möttönen, R., Calvert, G. A., Jääskeläinen, I. P., Matthews, P. M., Thesen, T., Tuomainen, J., &
Sams, M. (2006). Perceiving identical sounds as speech or non-speech modulates activity in the
left posterior superior temporal sulcus. NeuroImage 30(2), 563–569.
Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., & Allik, J.
(1997). Language-specific phoneme representations revealed by electric and magnetic brain
responses. Nature 385(6615), 432–434.
Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination by newborns: Toward an
understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and
Performance 24(3), 756–766.
Nespor, M., Shukla, M., van de Vijver, R., Avesani, C., Schraudolf, H., & Donati, C. (2008).
Different phrasal prominence realization in VO and OV languages. Lingue e Linguaggio 7(2), 1–
28.
Norman-Haignere, S., Kanwisher, N. G., & McDermott, J. H. (2015). Distinct cortical pathways for
music and speech revealed by hypothesis-free voxel decomposition. Neuron 88(6), 1281–1296.
Nuñez, S. C., Dapretto, M., Katzir, T., Starr, A., Bramen, J., Kan, E., … Sowell, E. R. (2011). fMRI
of syntactic processing in typically developing children: Structural correlates in the inferior frontal
gyrus. Developmental Cognitive Neuroscience 1(3), 313–323.
Olsho, L. W., Schoon, C., Sakai, R., Turpin, R., & Sperduto, V. (1982). Auditory frequency
discrimination in infancy. Developmental Psychology 18(5), 721–726.
Overy, K., Nicolson, R. I., Fawcett, A. J., & Clarke, E. F. (2003). Dyslexia and music: Measuring
musical timing skills. Dyslexia 9(1), 18–36.
Pannekamp, A., Weber, C., & Friederici, A. D. (2006). Prosodic processing at the sentence level in
infants. NeuroReport 17(6), 675–678.
Patel, A. D. (2012). Language, music, and the brain: A resource-sharing framework. In P. Rebuschat,
M. Rohrmeier, J. A. Hawkins, & I. Cross (Eds.), Language and music as cognitive systems (pp.
204–223). Oxford: Oxford University Press.
Perani, D. (2012). Functional and structural connectivity for language and music processing at birth.
Rendiconti Lincei 23(3), 305–314.
Perani, D., Dehaene, S., Grassi, F., Cohen, L., Cappa, S. F., Dupoux, E., … Mehler, J. (1996). Brain
processing of native and foreign languages. NeuroReport 7(15–17), 2439–2444.
Perani, D., Paulesu, E., Galles, N. S., Dupoux, E., Dehaene, S., Bettinardi, V., & Mehler, J. (1998).
The bilingual brain: Proficiency and age of acquisition of the second language. Brain 121(10),
1841–1852.
Perani, D., Saccuman, M. C., Scifo, P., Anwander, A., Spada, D., Baldoli, C., … Friederici, A. D.
(2011). Neural language networks at birth. Proceedings of the National Academy of Sciences
108(38), 16056–16061.
Perani, D., Saccumann, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010).
Functional specializations for music processing in the human newborn brain. Proceedings of the
National Academy of Sciences 107(10), 4758–4763.
Peretz, I. (2001). The biological foundations of music. In E. Dupoux (Ed.), Language, Brain, and
Cognitive Development: Essays in Honor of Jacques Mehler (pp. 435–466). Cambridge, MA: MIT
Press.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6(7), 688–
691.
Peretz, I., & Hyde, K. L. (2003). What is specific to music processing? Insights from congenital
amusia. Trends in Cognitive Sciences 7(8), 362–367.
Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two
approaches. Trends in Cognitive Sciences 10(5), 233–238.
Petitto, L. A., Holowka, S., Sergio, L. E., & Ostry, D. (2001). Language rhythms in baby hand
movements. Nature 413(6851), 35–36.
Piazza, E. A., Iordan, M. C., & Lew-Williams, C. (2017). Mothers consistently alter their unique
vocal fingerprints when communicating with infants. Current Biology 27(20), 3162–3167.
Polka, L., & Werker, J. F. (1994). Developmental changes in perception of nonnative vowel contrasts.
Journal of Experimental Psychology: Human Perception and Performance 20(2), 421–435.
Prochnow, A., Erlandsson, S., Hesse, V., & Wermke, K. (2017). Does a “musical” mother tongue
influence cry melodies? A comparative study of Swedish and German newborns. Musicae
Scientiae (October). doi:10.1177/1029864917733035
Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., & Tillmann, B.
(2013). Rhythmic auditory stimulation influences syntactic processing in children with
developmental language disorders. Neuropsychology 27(1), 121–131.
Radvansky, G. A., & Potter, J. K. (2000). Source cuing: Memory for melodies. Memory and
Cognition 28(5), 693–699.
Ramus, F., & Mehler, J. (1999). Language identification with suprasegmental cues: Study based on
speech resynthesis. Journal of the Acoustical Society of America 105, 512–521.
Ravignani, A., Delgado, T., & Kirby, S. (2016). Musical evolution in the lab exhibits rhythmic
universals. Nature Human Behaviour. doi:10.1038/s41562-016-0007
Reich, S. (1989). Different Trains/Electric Counterpoint. Nonesuch 979176-2.
Rivera-Gaxiola, M., Klarman, L., Garcia-Sierra, A., & Kuhl, P. K. (2005). Neural patterns to speech
and vocabulary growth in American infants. NeuroReport 16, 495–498.
Robinson, K., & Patterson, R. D. (1995). The duration required to identify the instrument, the octave,
or the pitch chroma of a musical note. Music Perception 13, 1–14.
Rousseau, B. (2016). Which language uses the most sounds? Click 5 times for the answer. The New
York Times, November 25. Retrieved from https://www.nytimes.com/2016/11/25/world/what-in-
the-world/click-languages-taa-xoon-xoo-botswana.html
Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by eight-month-old infants. Science
274(5294), 926–928.
Saint-Georges, C., Chetouani, M., Cassel, R., Apicella, F., Mahdhaoui, A., Muratori, F., & Cohen, D.
(2013). Motherese in interaction: At the cross-road of emotion and cognition? (A systematic
review). PLoS ONE, 8(10). Retrieved from https://doi.org/10.1371/journal.pone.0078103
Sammler, D., Baird, A., Valabrègue, R., Clément, S., Dupont, S., Belin, P., & Samson, S. (2010). The
relationship of lyrics and tunes in the processing of unfamiliar songs: A functional magnetic
resonance adaptation study. Journal of Neuroscience 30(10), 3572–3578.
Sansavini, A., Bertoncini, J., & Giovanelli, G. (1997). Newborns discriminate the rhythm of
multisyllabic stressed words. Developmental Psychology 33(1), 3–11.
Schellenberg, E. G., & Trainor, L. J. (1996). Sensory consonance and the perceptual similarity of
complex-tone harmonic intervals: Tests of adult and infant listeners. Journal of the Acoustical
Society of America 100(5), 3321–3328.
Scott, C. (2004). Syntactic ability in children and adolescents with language and learning disabilities.
In R. A. Berman (Ed.), Language Development Across Childhood and Adolescence (pp. 111–134).
Amsterdam: John Benjamins.
Shepard, R. N. (1980). Multidimensional scaling, tree-fitting, and clustering. Science 210(4468),
390–398.
Shi, R., Werker, J. F., & Morgan, J. L. (1999). Newborn infants’ sensitivity to perceptual cues to
lexical and grammatical words. Cognition 72(2), B11–B21.
Shultz, S., Vouloumanos, A., Bennett, R. H., & Pelphrey, K. (2014). Neural specialization for speech
in the first months of life. Developmental Science 17(5), 766–774.
Smith, K., Kirby, S., & Brighton, H. (2003). Iterated learning: A framework for the emergence of
language. Artificial Life 9, 371–386.
Speer, J. R., & Meeks, P. U. (1985). School children’s perception of pitch in music.
Psychomusicology 5, 49–56.
Spiller, H. (2004). The traditional sounds of Indonesia. Santa Barbara, CA: ABC-CLIO.
Stern, D. N., Spieker, S., & MacKain, K. (1982). Intonation contours as signals in maternal speech to
prelinguistic infants. Developmental Psychology 18(5), 727–735.
Thompson, W. F., Marin, M. M., & Stewart, L. (2012). Reduced sensitivity to emotional prosody in
congenital amusia rekindles the musical protolanguage hypothesis. Proceedings of the National
Academy of Sciences 109(46), 19027–19032.
Tillmann, B., Albouy, P., & Caclin, A. (2015). Congenital amusias. In G. G. Celesia & G. S. Hickok
(Eds.), The human auditory system: Fundamental organization and clinical disorder (3rd ed.; pp.
589–605). Amsterdam: Elsevier.
Trainor, L. J., Austin, C. M., & Desjardins, N. (2000). Is infant-directed speech a result of the vocal
expression of emotion? Psychological Science 11(3), 188–195.
Trainor, L. J., & Corrigall, K. A. (2010). Music acquisition and effects of musical experience. In M.
Riess Jones, R. R. Fay, & A. N. Popper (Eds.), Music Perception (Vol. 36; pp. 89–127). New York:
Springer.
Trainor, L. J., & Trehub, S. E. (1994). Key membership and implied harmony in Western tonal
music: Developmental perspectives. Attention, Perception, and Psychophysics 56(2), 125–132.
Trainor, L. J., Wu, L., & Tsang, C. D. (2004). Long-term memory for music: Infants remember tempo
and timbre. Developmental Science 7(3), 289–296.
Trehub, S., Cohen, A., Thorpe, L., & Morrongiello, B. (1986). Development of the perception of
musical relations: Semitone and diatonic structure. Journal of Experimental Psychology: Human
Perception and Performance 12, 295–301.
Trehub, S. E., Endman, M. W., & Thorpe, L. A. (1990). Infants’ perception of timbre: Classification
of complex tones by spectral structure. Journal of Experimental Child Psychology 49(2), 300–313.
Trehub, S. E., & Thorpe, L. A. (1989). Infants’ perception of rhythm: Categorization of auditory
sequences by temporal structure. Canadian Journal of Psychology/Revue canadienne de
psychologie 43(2), 217–229.
Vouloumanos, A., & Werker, J. F. (2007). Listening to language at birth: Evidence for a bias for
speech in neonates. Developmental Science 10(2), 159–164.
Wang, X., & Peng, G. (2014). Phonological processing in Mandarin speakers with congenital amusia.
Journal of the Acoustical Society of America 136(6), 3360–3370.
Weiss, A. H., Granot, R. Y., & Ahissar, M. (2014). The enigma of dyslexic musicians.
Neuropsychologia 54, 28–40.
Welch, G. F. (2002). Early childhood musical development. In L. Bresler & C. Thompson (Eds.), The
arts in children’s lives: Context, culture and curriculum (pp. 113–128). Dordrecht: Kluwer.
Welch, G. F. (2009). Evidence of the development of vocal pitch matching ability in children.
Japanese Journal of Music Education Research 21, 1–13.
Werker, J. F., & Tees, R. (1984). Cross-language speech perception: Evidence for perceptual
reorganization during the first year of life. Infant Behavior and Development 7, 49–63.
Wermke, K., Leising, D., & Stellzig-Eisenhauer, A. (2007). Relation of melody complexity in
infants’ cries to language outcome in the second year of life: A longitudinal study. Clinical
Linguistics and Phonetics 21(11–12), 961–973.
Wermke, K., & Mende, W. (2009). Musical elements in human infants’ cries: In the beginning is the
melody. Musicae Scientiae 13(2), 151–175.
Werner, L. A., & Marean, G. C. (1996). Human auditory development. Madison, WI: Brown
Benchmark.
Wieland, E. A., McAuley, J. D., Dilley, L. C., & Chang, S. E. (2015). Evidence for a rhythm
perception deficit in children who stutter. Brain & Language 144, 26–34.
Winkler, I., Háden, G. P., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the
beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471.
Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cortex.
Cerebral Cortex 11(10), 946–953.
Zhang, Y., Kuhl, P. K., Imada, T., Iverson, P., Pruitt, J., Stevens, E. B., & Nemoto, I. (2009). Neural
signatures of phonetic learning in adulthood: A magnetoencephalography study. NeuroImage
46(1), 226–240.
Zuk, J., Bishop-Liebler, P., Ozernov-Palchik, O., Moore, E., Overy, K., Welch, G., & Gaab, N.
(2017). Revisiting the “enigma” of musicians with dyslexia: Auditory sequencing and speech
abilities. Journal of Experimental Psychology: General 146(4), 495–511.
CHAPTER 24

RHYTHM, METER, AND TIMING: THE HEARTBEAT OF MUSICAL DEVELOPMENT

LAUREL J. TRAINOR AND SUSAN MARSH-ROLLO

Everything we perceive, think, feel, and do unfolds over time. For
example, the notes of music and the syllables of speech only make sense in
the context of the preceding and forthcoming sounds in the sequences in
which they are embedded. Many motor acts are also rhythmic, from
heartbeats and breathing to locomotion, articulating speech, and playing a
musical instrument. From regularities in the rhythmic surface (i.e., the
pattern of temporal intervals defined by sound onsets) of music, adults can
extract a beat, a quasi-steady (quasi-isochronous) internally constructed
tempo, to which they can entrain motor movements such as tapping,
clapping, and dancing (e.g., Gjerdingen, 1989; London, 2004; Repp, 2005;
Repp & Su, 2013). Beat extraction is complex in that although it often
matches periodicities in the input rhythm, a beat can be perceived even
when half or more of the event onsets in the rhythmic surface are off
the perceived beat (syncopated) (Brochard, Abecasis, Potter, Ragot, &
Drake, 2003; Tal et al., 2017). From a piece of music, adults can typically
extract different beat tempos that are hierarchically organized, forming a
metrical hierarchy (Essens & Povel, 1985; Hannon, Nave-Blodgett, &
Nave, 2018; Jones & Boltz, 1989; Lerdahl & Jackendoff, 1983). Typically,
every second or third beat from one metrical beat level is maintained at the
next level of the hierarchy, although other patterns are also possible, such as
alternating groups of two and three beats, which form groups of five at the
next level of the hierarchy.
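To make the nesting of metrical levels concrete, the following illustrative sketch (our toy construction, not taken from any cited study; the tempo and groupings are hypothetical) derives duple, triple, and alternating 2 + 3 levels from a basic isochronous beat:

```python
# An illustrative sketch (tempo and groupings are hypothetical) of
# nested metrical levels derived from a basic beat.

def isochronous_beats(tempo_bpm=120, n=12):
    """Onset times (in seconds) of an isochronous basic beat level."""
    period = 60.0 / tempo_bpm
    return [i * period for i in range(n)]

def every_kth(beats, k):
    """A higher metrical level keeping every k-th beat (k=2 duple, k=3 triple)."""
    return beats[::k]

def groups_of_2_and_3(beats):
    """A non-isochronous level: alternating groups of 2 and 3 basic beats,
    forming repeating five-beat cycles with a 3:2 duration ratio."""
    out, i, j = [], 0, 0
    while i < len(beats):
        out.append(beats[i])
        i += (2, 3)[j % 2]
        j += 1
    return out

basic = isochronous_beats()      # 0.0, 0.5, 1.0, ... s at 120 bpm
print(every_kth(basic, 2))       # duple level: onsets every 1.0 s
print(groups_of_2_and_3(basic))  # onsets alternating 1.0 s and 1.5 s apart
```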
Rhythms and their derived underlying metrical structures are powerful
for several reasons. First, they provide an organizational structure on which
incoming sound events can be hung. Second, they provide a means for
chunking or phrasing the incoming information into meaningful units
(Dowling, 1973). Third, because of their underlying regularity, rhythms
enable prediction of when important information is expected to occur next,
so that attention can be deployed at the most important time points for
optimizing perceptual processing (e.g., Chang, Bosnyak, & Trainor, under
review b; Ding et al., 2017; Fujioka, Trainor, Large, & Ross, 2012; Haegens
& Zion-Golumbic, 2018; Jones, Moynihan, MacKenzie, & Puente, 2002;
Large & Jones, 1999; Nobre, Correa, & Coull, 2007; Schroeder & Lakatos,
2009; van Ede, Niklaus, & Nobre, 2017). The ubiquity of rhythms in
biological systems, and evidence that hearing an auditory rhythm involves
auditory–motor connections (Fujioka et al., 2012; Grahn & Brett, 2007;
Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015; Patel & Iversen,
2014; Trainor & Zatorre, 2015; Zatorre, Chen, & Penhune, 2007), makes
rhythmic processes central to many aspects of development.
It should be noted that rhythms and metrical hierarchies do not depend
completely on isochronous timing. Indeed, deviations from completely
regular timing are also important and are often used expressively in music
(James, Michel, Britz, Vuilleumier, & Hauert, 2012; Rankin, Large, & Fink,
2009; Repp, 1992). For example, phrases typically speed up in the middle
and slow down at the end (Palmer, 1989; Repp, 1992; Todd, 1985), and
prolonging particular notes can give them emphasis and increase
expectations for the next note (Huron, 2006; Meyer, 1956).
We present the position here that timing, meter, and rhythm are the most
fundamental aspects of music, on which other aspects of music, such as
pitch structures, dynamics, and phrasing, are built. This chapter explores the
development of musical timing, meter, and rhythm, without which musical
perception and performance would not be possible. This chapter is not
intended as a complete review of the literature; rather it examines some of
the major research findings in perceptual and sensorimotor development in
individuals and across social contexts.

EARLY PERCEPTION OF BEAT, METER, AND RHYTHM

From at least two months of age, infants detect tempo changes
(Baruch & Drake, 1997) and can discriminate different rhythm patterns
composed of the same interval durations but in different orders, such as
100–600–300 ms versus 600–300–100 ms (Chang & Trehub, 1977;
Demany, McKenzie, & Vurpillot, 1977; Lewkowicz, 2003). Impressively,
they can make such discriminations across changes in the tempo and pitch
level of the comparison patterns (Trehub & Thorpe, 1989). Young infants
also show sensitivity to phrase and grouping structures in music. For
example, they are more sensitive to small timing perturbations inserted in
the middle of a phrase than at phrase boundaries, where an elongation is
likely to occur in any case (Jusczyk & Krumhansl, 1993; Thorpe & Trehub,
1989; Trainor & Adams, 2000). However, the temporal cues infants use to
determine grouping boundaries of short sequences may be influenced by the
language they are exposed to in their environment. Trainor and Adams
(2000) found that English-learning infants marked the ends of short
perceptual groups with longer tones, whereas Yoshida et al. (2010) found
that Japanese-learning infants marked the beginnings of perceptual groups
with longer tones, consistent with linguistic accent structures in the
respective languages.
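That infants can recognize a rhythm across changes in tempo implies comparison of relative rather than absolute durations. The toy sketch below (a simple normalization of our own, not the analysis used in the cited studies) illustrates why relative interval patterns are tempo-invariant:

```python
# Two interval patterns count as "the same rhythm" if they match after
# normalizing by total duration; absolute durations change with tempo,
# but relative timing does not. (Real code would compare with a tolerance
# rather than exact float equality.)

def normalize(intervals_ms):
    total = sum(intervals_ms)
    return [d / total for d in intervals_ms]

a = [100, 600, 300]   # original pattern
b = [200, 1200, 600]  # same rhythm at half tempo
c = [600, 300, 100]   # same intervals reordered: a different rhythm

print(normalize(a) == normalize(b))  # True: same relative pattern
print(normalize(a) == normalize(c))  # False: order differs
```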
Infants also show precocious sensitivity to beat and metrical structure.
For example, newborns exposed to a rhythmic pattern with a 4/4 time
structure show a larger event-related potential (ERP) response in
electroencephalographic (EEG) recordings to omissions on strong
compared to weak beats (Winkler, Háden, Ladinig, Sziller, & Honing,
2009), suggesting sensitivity to metrical structure. However, this study
needs to be replicated, as whether a beat was strong or weak was
confounded with the number of sounds that were omitted. By 7 months of
age, however, there is clear evidence of metrical processing. After
habituation to repeated trials containing rhythms in either duple or triple
meter, infants showed renewed interest for a rhythm presented with the
other (novel) meter (Hannon & Johnson, 2005). In another study, infants who
were bounced on either every second or third beat of a repeating six-beat
ambiguous rhythm pattern subsequently preferred to listen to a version of
the rhythm pattern with accents added on either every second or every third
beat that matched the meter set up during the bouncing experience,
indicating both discrimination between meters and early involvement of the
motor system in the perception of meter (Phillips-Silver & Trainor, 2005).
In adults, listening to a rhythm pattern entrains neural activity at the
perceived beat and meter frequencies (i.e., tempos) (Fujioka et al., 2012;
Fujioka, Ross, & Trainor, 2015; Nozaradan, 2014; Nozaradan, Peretz, &
Mouraux, 2012; Tal et al., 2017). Evidence from EEG studies also indicates
that at least as young as 7 months, neural oscillations in auditory cortex
entrain to both beat and metrical levels of rhythm patterns (Cirelli, Spinelli,
Nozaradan, & Trainor, 2016). Seven-month-old and 15-month-old infants
listened to a repeating six-beat rhythm pattern that could be interpreted as
either in duple or triple meter, as in Phillips-Silver and Trainor (2005). EEG
recordings were subjected to Fourier analysis following the frequency-
tagging methods of Nozaradan and colleagues (Nozaradan, Peretz, Missal,
& Mouraux, 2011). Peaks in the frequency spectrum were found that
corresponded to the beat frequency as well as to both duple and triple
metrical interpretations, indicating neural entraining at both beat and meter
frequencies in both age groups. Interestingly, at 7 months, the amplitude at
the duple meter frequency was enhanced in infants engaged in infant–parent
music classes compared to those not enrolled in music classes. At 15
months, beat and both meter frequencies were all enhanced in infants whose
parents were musically trained compared to those whose parents were not
musically trained. Thus, early in development neural circuits are sensitive
to the temporal structure of incoming auditory rhythmic patterns.
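The logic of this frequency-tagging analysis can be sketched as follows. The code is purely illustrative: the sampling rate, recording length, beat frequency, and "EEG" signal are invented stand-ins, not parameters or data from the studies above:

```python
# A toy sketch of frequency-tagging logic: Fourier-transform a steady-state
# signal and read out amplitude at the beat and meter frequencies.
import numpy as np

fs = 250.0                    # assumed EEG sampling rate (Hz)
dur = 60.0                    # assumed recording length (s)
t = np.arange(0, dur, 1 / fs)

beat_hz = 2.4                 # assumed basic beat frequency
# Simulated "EEG": entrained responses at the beat and at its duple
# (beat/2) and triple (beat/3) metrical interpretations, plus noise.
eeg = (np.sin(2 * np.pi * beat_hz * t)
       + 0.5 * np.sin(2 * np.pi * beat_hz / 2 * t)
       + 0.3 * np.sin(2 * np.pi * beat_hz / 3 * t)
       + np.random.randn(t.size))

spectrum = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# Peaks stand out at the beat and both meter frequencies.
for f in (beat_hz, beat_hz / 2, beat_hz / 3):
    idx = np.argmin(np.abs(freqs - f))
    print(f"{f:.2f} Hz amplitude: {spectrum[idx]:.3f}")
```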
Just as tonality, pitch, and harmonic structures vary across the musical
systems used in different cultures, so do metrical structures (Hannon et al.,
2018; Hannon & Trainor, 2007; Trainor & Hannon, 2013). In Western
music, durations most commonly stand in simple 2:1 ratios (e.g., a march
meter), with 3:1 as the next most common ratio (e.g., a waltz meter).
However, in many parts of the world (e.g., Africa, the Balkans, South Asia,
South America), more complex metrical patterns that create a non-
isochronous beat at one or more levels of the metrical hierarchy are
common (Hannon, Soley, & Ullal, 2012; London, 2004). For example, an
isochronous beat at a basic level of the hierarchy might be grouped into
alternating groups of three and two beats at the next level of the hierarchy,
creating a five-beat pattern. The alternating groups of two and three beats
create a more complex duration ratio of 3:2. Western adults without
exposure to music with such metrical patterns are much better at
discriminating, remembering, reproducing, and tapping to rhythm patterns
with simple compared to complex meters (e.g., Essens, 1986; Essens &
Povel, 1985; Fraisse, 1982; Hannon & Trehub, 2005a; Repp, London, &
Keller, 2005; Snyder, Hannon, Large, & Christiansen, 2006). However,
adults who grew up in cultures employing complex meter do not show
processing differences between familiar rhythms with simple and complex
meters (e.g., Hannon & Trehub, 2005a; Hannon, Soley, & Ullal, 2012).
Just as infants learn the particular language(s) in their environment,
becoming more sensitive to the phonemic structure of that language, and
less sensitive to alternative phonemic structures by their first birthday (Kuhl
et al., 2006; Werker & Tees, 2005), a similar process of perceptual
narrowing occurs in music acquisition, such that infants become specialized
at processing both the tonal (Gerry, Unrau, & Trainor, 2012; Lynch &
Eilers, 1992; Lynch, Eilers, Oller, & Urbano, 1990; Trainor, Marie, Gerry,
Whiskin, & Unrau, 2012; Trainor & Trehub, 1992, 1994) and metrical
structures (Gerry, Faux, & Trainor, 2010; Hannon & Trehub, 2005a, 2005b)
in the music they experience in their environment (Hannon & Trainor,
2007; Trainor & Corrigall, 2010; Trainor & Hannon, 2013; Trainor &
Unrau, 2012). With respect to metrical processing, at 4 to 6 months of age,
Western infants notice if an extra beat is added to a 7/4 meter as well as if a
beat is dropped from an 8/4 meter (Hannon & Trehub, 2005a). However,
performance on the 7/4 meter declines between 7 and 12 months, such that
12-month-old Western infants, like Western adults, perform very poorly on
this task (Hannon & Trehub, 2005b). That these declines for non-native
meters are driven by experience is reinforced by findings that listening
experience can speed up or slow down (or even reverse) the perceptual
narrowing. As far as slowing down or reversing perceptual narrowing,
providing daily listening experience with non-Western non-isochronous
meters reinstates sensitivity to the non-isochronous meters in
12-month-olds (Hannon & Trehub, 2005b).
be a window of sensitivity for reversing perceptual narrowing for meter, as
Western 5- to 7-year-old children show some, but not full, reinstatement
after a similar listening experience with non-isochronous meters, whereas
adults show no evidence of reinstatement after such experience (Hannon,
Vanden Bosch der Nederlanden, & Tichko, 2012). As far as speeding up
perceptual narrowing, Gerry et al. (2010) found that 7-month-old infants
enrolled in Kindermusik classes showed a listening preference for a rhythm
with accents on every second beat compared to the same rhythm with
accents on every third beat, the former meter being more common in
Western music, whereas infants not enrolled in music classes did not show
this preference. In general, a preference for native meters may be evident
prior to perceptual narrowing. Soley and Hannon (2010) found that a
preference for isochronous over non-isochronous meters increases between
4 and 8 months in Western infants whereas no listening preferences are
evident during this age period in Turkish infants.
The bias for culture-specific meters continues into childhood. Einarson
and Trainor (2015, 2016) developed a child-friendly version of the Beat
Alignment Task (BAT) (Iversen & Patel, 2008) that included music with
both simple and complex meters (cBAT). In this task, children watch pairs
of short video excerpts of puppets drumming to musical excerpts. One
puppet drums on the beat and the other puppet’s drumming is either at the
wrong tempo or misaligned in phase with the music. The children decide
which puppet is the best drummer for a band. Western 5-year-old children
were at chance levels on music with complex meters, but performed
significantly better (and above chance levels) on music with simple meters.
Together, the infant and child studies indicate that perceptual narrowing for
the metrical structures common in the music of one’s culture develop early
and are maintained in childhood, raising the interesting question of whether
best pedagogical practice might be to expose infants and young children to
complex meters if the goal is to provide them with the perceptual tools to
understand the rhythms of music from around the world.

TIMING AND RHYTHM IN INFANT-DIRECTED SINGING

For most infants, their first experience of music is likely hearing their
mother sing, and a diary study of North American mothers, where
opportunities to listen to recorded music abound, indicates that most
mothers still sing to their infants many times during the day, such as when
bathing them, playing, feeding, during diaper changes, in the car, and at
sleep time (Trehub et al., 1997). Infants often experience their parents’
singing while being held and rocked or walked rhythmically, or while
feeling their parent tap their back or touch other body parts rhythmically
during the song, so that from early ages, infants experience musical rhythms
in a multisensory context involving hearing, movement, and vision. Singing
to infants appears to be a spontaneous intuitive response to the presence of
an infant, and universal across human cultures. Furthermore, across
cultures, Western adults were able to discern lullabies that were intended for
infants from other songs matched in tempo and general style (Trehub,
Unyk, & Trainor, 1993), suggesting that infant-directed singing might have
been an evolutionary adaptation that helped infants to survive. The one
song category that Western adults found difficult to distinguish from
lullabies was love songs (Trehub & Trainor, 1998), suggesting that lullabies
may express and communicate the deep emotional bonds between parents
and their infants. Music appears to be particularly effective at controlling
infants’ states—when left alone, hearing their mothers’ infant-directed
singing was found to keep infants happier for considerably longer than
hearing their mothers’ infant-directed speech (Corbeil, Trehub, & Peretz,
2016), and cortisol levels were found to decrease in infants when their
mothers sang to them (Shenfield, Trehub, & Nakata, 2003).
Just as adults use a different speaking style when talking to infants
compared to adults, termed infant-directed or musical speech (Fernald,
1991; Papoušek, Papoušek, & Symmes, 1991), they sing differently to
infants than they sing in other circumstances (Trehub & Trainor, 1998).
Trainor (1996) recorded mothers singing the same song when their infant
was present and when their infant was absent and found that adults were
highly accurate at identifying the infant-directed versions. Furthermore,
using a preferential looking paradigm, she found that infants preferred to
listen to the infant-directed versions. In addition to being sung at a higher
pitch and in a more loving tone of voice, in comparison to non-infant-
directed singing, infant-directed singing also differs in timing and rhythmic
features. It is generally slower in tempo and has exaggerated structural
features, such as enhanced phrase boundaries, rhythm, and grouping
(Longhi, 2009; Trainor, Clark, Huntley, & Adams, 1997). For example,
infant-directed singing contains longer pauses between phrases (Trainor et
al., 1997). There is also evidence that mothers exaggerate the hierarchical
beat structure of songs when singing to infants, using both acoustic accents
and body movements to do so (Longhi, 2009). They particularly emphasize
upbeats, which is interesting in that upbeats provide anticipatory
information that a downbeat is expected to follow. And while infants are not
yet able to synchronize movements precisely to the beats of music, infants
in this study made more synchronous movements to beats at the beginnings
and ends of phrases than in the middle, suggesting some understanding of
the temporal structure of phrases in infant-directed singing with
exaggerated rhythmic cues. There is some evidence that depressed mothers
do not employ the full repertoire of infant-directed singing features,
generally singing faster and with less expression than non-depressed
mothers (de l’Etoile & Leider, 2011), and possibly compromising
communication with their infants.
Two basic categories of infant-directed singing have been identified:
lullabies, where the intention is to help a fussy infant to fall asleep, and
playsongs, where the intention is to rouse the infant, interact with them in
play, and direct their attention to interesting people and things in the
environment (Rock, Trainor, & Addison, 1999; Trainor, 1996). These two
categories arise more from the style or manner in which the caregiver sings
than the structural content of the music. Indeed, Rock et al. (1999) recorded
mothers singing a song of their choice to their infant, once in a lullaby style
and once in a playsong style. Adult raters were 100 percent accurate at
identifying which were lullabies and which playsongs, indicating that these
styles are highly distinct. Furthermore they rated playsongs as sounding
more rhythmic, clipped, and accented compared to lullabies, which were
rated as sounding smoother. Importantly, infants show differential behaviors
in response to lullaby and playsong renditions of the same song (Rock et al.,
1999) and prefer faster tempos for playsongs but not for lullabies (Conrad,
Walsh, Allen, & Tsang, 2011). That such timing and rhythmic differences
are likely universal across human cultures and musical systems suggests
that the perceptual, emotional, and social consequences of these temporal
features may have evolutionary origins.
Singing to infants is a social interaction that requires temporal
coordination between the caregiver and the infant. Such temporal
coordination during the first months after birth appears to promote
communication, the development of successful social interactions, and
emotion regulation (Ilari, 2016; Malloch & Trevarthen, 2009). These social
consequences of rhythmic interactions are discussed later in the chapter.

RHYTHM, PREDICTION, NEURAL OSCILLATIONS, AND DEVIATIONS FROM REGULARITY

The regularity of rhythms enables prediction of when the next beat is likely
to occur (Large & Jones, 1999; Trainor & Zatorre, 2015), which can aid in
preparing for incoming information and focusing attention at information-
rich points in time. Predictive timing is critical for the perception of stimuli
such as speech and music that unfold rapidly over time and are fleeting in
that once each note or phoneme ends, the next begins, and it is not possible
to hear the input again. Indeed there is considerable evidence that the adult
brain is continually predicting the future and comparing its predictions with
what actually occurs (Fujioka et al., 2012, 2015; Herrmann, Henry,
Haegens, & Obleser, 2016; Morillon & Schroeder, 2015). In the case of
incorrect predictions, an error signal is generated which can engage
attention and lead to additional processing and learning (Arnal & Giraud,
2012; Chang, Bosnyak, & Trainor, 2018; Ding et al., 2017; Haegens & Zion
Golumbic, 2018; Nobre et al., 2007; Nobre & van Ede, 2018; Schroeder &
Lakatos, 2009; Schröger, Marzecová, & SanMiguel, 2015). In adults, both
behavioral and neural evidence indicates that the perception of sounds at
beat onsets presented in rhythmic contexts is enhanced (Arnal & Giraud,
2012; Chang et al., under review b; Haegens & Zion-Golumbic, 2018;
Henry & Obleser, 2012; Herrmann et al., 2016; Jones et al., 2002; Morillon,
Schroeder, Wyart, & Arnal, 2016; Nobre & van Ede, 2018). Predictive
processes are evident very early in infancy in that occasional unexpected
changes (deviants) in isochronous sound sequences lead to ERP mismatch
responses (MMRs) in EEG recordings (Basirat, Dehaene, & Dehaene-
Lambertz, 2014; Háden, Németh, Török, & Winkler, 2015; He, Hotson, &
Trainor, 2007; Trainor, 2012; Trainor & He, 2013; Trainor & Zatorre,
2015).
The neural mechanisms that underlie predictive timing are beginning to
be understood in the adult brain in terms of neural oscillations. Specifically,
low frequency neural oscillations (delta band, ~1–3 Hz) phase align with
the onsets of beats in an auditory rhythmic stimulus such that predictive
timing is enhanced (Arnal, Poeppel, & Giraud, 2015; Bauer, Bleichner,
Jaeger, Thorne, & Debener, 2018; Calderone, Lakatos, Butler, &
Castellanos, 2014; Henry, Herrmann, & Obleser, 2014; Henry & Obleser,
2012; Schroeder & Lakatos, 2009; Stefanics et al., 2010). The power in
higher frequency oscillations (beta band, ~20 Hz) is also modulated by
auditory rhythmic stimuli such that beta power decreases after a beat onset
and rebounds so as to reach maximum amplitude at the expected time of the
next beat, dependent on the tempo of the rhythmic input (Fujioka et al.,
2012, 2015). This rebound time appears to be a neural signature of timing
prediction in the brain, and beta oscillations are proposed to reflect
attentional processes leading to enhanced perception at particular time
points (Arnal & Giraud, 2012; Chang et al., 2018, under review b; Iversen,
Repp, & Patel, 2009; Snyder & Large, 2005). Beta oscillations also appear
to be associated with capture of attention (Chang et al., 2018).
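A hedged outline of the beta-power analysis just described might look like the following; all parameters are invented and the signal is simulated noise, so this is a sketch of the logic rather than the cited studies' pipeline:

```python
# Sketch: bandpass the signal around ~20 Hz (beta), take the Hilbert
# amplitude envelope, and average it across beats time-locked to onsets.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0                 # assumed sampling rate (Hz)
ioi = 0.6                  # assumed inter-onset interval (s)
n_beats = 50
sig = np.random.randn(int(fs * ioi * (n_beats + 1)))  # stand-in for EEG/MEG

# 15-25 Hz bandpass (cutoffs normalized by the Nyquist frequency).
b, a = butter(4, [15 / (fs / 2), 25 / (fs / 2)], btype="band")
beta_env = np.abs(hilbert(filtfilt(b, a, sig)))  # beta amplitude envelope

# Epoch the envelope around each beat onset and average across beats.
spb = int(fs * ioi)                        # samples per beat
onsets = np.arange(1, n_beats) * spb
epochs = np.stack([beta_env[o:o + spb] for o in onsets])
mean_env = epochs.mean(axis=0)
# In real data, mean_env dips after the beat onset and rebounds so that
# it peaks near the expected time of the next beat.
print(mean_env.round(3))
```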
Very little research has examined the development of neural oscillations
involved in predictive processes in rhythmic contexts. However, one study
compared beta oscillations in 7-year-old children and adults in response to
isochronous beat sequences at different tempos (Cirelli, Bosnyak, et al.,
2014). Beta power entrainment to the tempo of the input was found, but the
responses of children were noisier than those of adults, and were
measurable only over a narrower range of tempos than in adults. This
suggests that the neural oscillatory responses underlying predictive timing
follow a protracted developmental trajectory. Clearly more research is
needed to understand the brain development underlying rhythmic predictive
processes.
While neural oscillation studies of predictive timing have focused on
stimuli with constant tempos, in real music performances the tempo
typically modulates continuously (James et al., 2012; Palmer, 1989; Rankin
et al., 2009; Repp, 1992; Todd, 1985). These timing perturbations are not
random, but interact with the structure and content of the music. In
particular, expressive timing emphasizes phrase boundaries by lengthening
phrase-final notes or chords, and plays with temporal expectations by, for
example, elongating notes or chords that embody harmonic tension, thus
delaying their resolution (London, 2004; Repp, 2005; Repp & Su, 2013).
Perceptual studies indicate that both musically untrained adults (Clarke &
Krumhansl, 1990; Deliege, 1987; Palmer & Krumhansl, 1987; Peretz, 1989)
and infants (Krumhansl & Jusczyk, 1990; Trainor & Adams, 2000) are
sensitive to phrase boundaries. Even non-musicians produce phrase-final
lengthening in their performances (Kragness & Trainor, 2016), suggesting
that it is not a learned performance technique, but is based on intrinsic
temporal expectations. Specifically, Kragness and Trainor (2016) used a
self-paced tapping paradigm in which non-musician adults pressed a key to
get the next chord in an unfamiliar sequence of chords. In their renditions,
adults tended to speed up in the middle of phrases defined by typical
Western cadences, and to slow down at the ends of phrases, even after
metrical regularity and melodic contour were controlled.
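The measurement logic of this self-paced paradigm is simple enough to illustrate directly; in the sketch below, the key-press times and phrase-final positions are invented for demonstration:

```python
# Given the times at which a participant pressed the key for each chord,
# compare inter-onset intervals (IOIs) at phrase-final versus mid-phrase
# positions. Phrase-final lengthening appears as larger IOIs at phrase ends.

press_times = [0.00, 0.52, 1.01, 1.49, 2.22,   # phrase 1 (final IOI lengthened)
               2.71, 3.18, 3.66, 4.55]         # phrase 2
phrase_final_positions = {3, 7}                # indices of phrase-final IOIs

iois = [t2 - t1 for t1, t2 in zip(press_times, press_times[1:])]
final = [ioi for i, ioi in enumerate(iois) if i in phrase_final_positions]
middle = [ioi for i, ioi in enumerate(iois) if i not in phrase_final_positions]

print(sum(final) / len(final))    # ~0.81 s: mean phrase-final IOI
print(sum(middle) / len(middle))  # ~0.49 s: mean mid-phrase IOI
```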
One possible explanation for phrase-final lengthening is that ends of
phrases tend to be points of high entropy in that it is difficult to predict what
will come next, whereas points in the middle of phrases tend to be of low
entropy in that it is relatively easy to predict the next note or chord (Pearce,
Müllensiefen, & Wiggins, 2010; Pearce & Wiggins, 2006). The uncertainty
at phrase boundaries might require more processing time, leading to a
natural slowing. Further evidence for an entropy explanation comes from a
developmental study in which children as young as 3 years were found to
dwell longer at phrase endings, although sophistication in the cues used to
detect phrase boundaries increased between 3 and 7 years of age (Kragness
& Trainor, 2018). That very young children produce phrase-final
lengthening in their musical productions is consistent with the possibility
that it is based on intrinsic processing properties of perception rather than
reflecting learning of a particular musical performance style.
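The entropy account can be made concrete with a worked toy example. Shannon entropy, H = −Σ p log₂ p, quantifies uncertainty over possible continuations; the probability distributions below are invented purely for illustration:

```python
# Shannon entropy over hypothetical next-event distributions: low mid-phrase
# (one continuation strongly expected), high at phrase boundaries (many
# continuations equally likely), where extra processing time may be needed.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

mid_phrase = [0.8, 0.1, 0.05, 0.05]    # one continuation highly expected
phrase_end = [0.25, 0.25, 0.25, 0.25]  # many continuations equally likely

print(entropy(mid_phrase))  # ~1.02 bits: low uncertainty
print(entropy(phrase_end))  # 2.00 bits: high uncertainty
```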
These studies suggest, first, that the brain entrains to the beat and meter
in auditory rhythms early in development, but that this entrainment and its
relation to attention and error monitoring continues to develop for many
years and, second, that although a steady beat is typically experienced when
listening to music, timing perturbations in isochrony that follow the
structure and content of the music are present early in childhood, suggesting
that beat perception and the neural processes underlying it are not strictly
isochronous, but involve an interaction between time and context.

TIMING AND MUSICAL EMOTION

Adults are good at identifying basic emotions in music such as happiness,
sadness, anger, fear, and tenderness (e.g., Balkwill & Thompson, 1999;
Balkwill, Thompson, & Matsunaga, 2004; Fritz et al., 2009; Juslin &
Laukka, 2003; Mohn, Argstatter, & Wilker, 2011), and non-musicians
appear to perform as well as musicians at this (Bigand, Vieillard, Madurell,
Marozeau, & Dacquet, 2005; Juslin, 1997). Tempo, rhythm, loudness,
articulation, pitch, and tonality are among the cues used in music for
emotional communication (Gabrielsson & Lindström, 2010; Juslin &
Timmers, 2010). Although tonality cues to emotion vary considerably
across cultures and musical systems, features such as tempo, loudness, and
complexity appear to operate similarly, and adults use such cues to identify
the emotions in music from an unfamiliar culture (Balkwill & Thompson,
1999; Balkwill et al., 2004). When musicians play the same piece of music
in different ways to express different emotions, listeners use timing and
intensity cues to identify those emotions (Behrens & Green, 1993; Juslin,
1997, 2000; Laukka & Gabrielsson, 2000).
A number of studies show that children as young as 3 to 4 years of age
can categorize music as expressing happiness or sadness (e.g., Adachi,
Trehub, & Abe, 2004; Cunningham & Sterling, 1988; Dolgin & Adelson,
1990; Esposito & Serio, 2007; Gerardi & Gerken, 1995; Giomo, 1993;
Gregory, Worrall & Sarge, 1996; Kastner & Crowder, 1990; Kratus, 1993;
Nawrot, 2003). A couple of studies show that even 5-month-old infants
discriminate melodies expressing happiness from those expressing sadness
(Flom, Gentile, & Pick, 2008; Flom & Pick, 2012). The ability to identify
more complex emotions such as anger and fear develops later, with some
competence emerging by 5 to 6 years of age (Cunningham & Sterling,
1988; Giomo, 1993; Kratus, 1993; Terwogt & van Grinsven, 1991).
The particular cues used by children in the studies reviewed here are
generally not known. However, two studies specifically separated timing
from pitch cues. Mote (2011) found that children as young as 4 years used
tempo to distinguish happy from sad music. Dalla Bella and colleagues
(Dalla Bella, Peretz, Rousseau, & Gosselin, 2001) varied both tempo and
mode (major/minor) independently and found that 5-year-old children used
tempo to distinguish happy from sad music, but it was not until 6 years of
age that children used mode. When singing to express emotions, children
between 4 and 12 years of age increase their tempo, loudness, and pitch
height to express happiness compared to sadness (Adachi & Trehub, 1998).
Furthermore, these cues can be used by children from different cultures to
decipher the intended emotion of other children’s singing (Adachi et al.,
2004).
It has been proposed that emotions can be conceptualized along two
continuous dimensions, valence and arousal (Russell, 1980). In adults, high
arousal emotions, including joy, excitement, fear, and anger, are typically
associated with fast tempos, staccato articulation (disconnected notes), and
high sound levels whereas low arousal emotions, including peacefulness,
tenderness, sadness, and grief are typically associated with slow tempos,
legato articulation (connected notes), and low sound levels (Gabrielsson &
Juslin, 1996; Gagnon & Peretz, 2003; Ilie & Thompson, 2006; Juslin, 2000;
Juslin & Timmers, 2010). One study examined how children use timing and
sound intensity to convey emotions varying in valence and arousal, using a
self-pacing method in which children pressed a key to get successive chords
in musical pieces (Kragness, Baksh, Battcock, & Trainor, 2017). Children
played each musical piece four times, expressing joy (high arousal, high
valence), sadness (low arousal, low valence), peacefulness (low arousal,
high valence), or anger (high arousal, low valence) on each rendition. With
their key presses, children were able to control tempo (onset-to-onset
duration), articulation (note duration relative to onset-to-onset), and
loudness. By 5 years of age, children used faster tempos for high-arousal
emotions (joy, anger) than low-arousal emotions (peacefulness, sadness).
By 7 years of age, children also used articulation, playing shorter notes to
express high-arousal than low-arousal emotions. In sum, children use tempo
and articulation to express emotion from a fairly young age, even in the
absence of musical performance experience.
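The three cues children controlled in this paradigm map onto simple computations over key-press records. A minimal sketch, with invented data and field layout:

```python
# From (onset, offset, velocity) key-press records: tempo = mean
# onset-to-onset duration, articulation = note duration relative to
# onset-to-onset, loudness = mean key velocity.

presses = [  # (onset_s, offset_s, velocity 0-127); hypothetical data
    (0.00, 0.20, 96), (0.40, 0.58, 99), (0.80, 0.97, 94), (1.20, 1.41, 98),
]

iois = [nxt[0] - cur[0] for cur, nxt in zip(presses, presses[1:])]
tempo_cue = sum(iois) / len(iois)                    # s per chord
artic_cue = sum((off - on) / ioi                     # proportion of IOI sounded
                for (on, off, _), ioi in zip(presses, iois)) / len(iois)
loud_cue = sum(v for _, _, v in presses) / len(presses)

print(tempo_cue, artic_cue, loud_cue)
# Faster tempo_cue and smaller artic_cue (more staccato) would be expected
# for high-arousal renditions such as joy or anger.
```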
In adults, musical emotions are often conveyed or heightened by
deviations from regular timing, termed expressive timing (James et al.,
2012; Rankin et al., 2009; Repp, 1992). Meyer (1956) proposed that such
deviations are one way to play with expectations and that musical emotions
arise from general-purpose physiological responses to prediction errors (see
also Huron, 2006; Trainor & Zatorre, 2015). Although expressive timing
has not been studied developmentally to our knowledge, a couple of studies
suggest that infants, unlike adults, prefer regularity over timing variability.
Nakata and Mitani (2005) found infants prefer to listen to the more regular
of two rhythms. Trainor et al. (2012) found that infants overall had no
preference for a version of Chopin’s Waltz in A-flat, op. 69, No. 1 played
expressively by Dinu Lipatti compared to one that was computer generated
with metronomic timing. Together these studies suggest that children use
basic timing cues such as overall tempo for emotional processing in music,
but that it takes some time for children to become sensitive to timing cues
involving deviations from regularity, as are found in expressive timing.

DEVELOPMENT OF AUDITORY–MOTOR ENTRAINMENT AND SOCIAL EFFECTS OF SYNCHRONIZED MOVEMENT

It is often noted that across cultures past and present music serves social
functions, including action coordination, communication, and social
cohesion (Cirelli, 2018; Cirelli, Trehub, & Trainor, 2018; D’Ausilio,
Novembre, Fadiga, & Keller, 2015; Ilari, 2016; Patel & Iversen, 2014;
Trainor, 2015; Trainor & Cirelli, 2015). Indeed music is present at virtually
all important social occasions including weddings, funerals, religious
rituals, parties, sporting events, and political rallies. A number of
researchers have suggested that the main function of joint musical
experiences among adults is the increased social cohesion that results from
synchronous movement (e.g., Bispham, 2006; Brown & Volgsten, 2006;
Fitch, 2006; Huron, 2001; McNeill, 1995; Merker, 2000). It has also been
suggested that music similarly enhances the social relationships between
infants and their caregivers, with the coordinated interaction that music
engenders increasing attachment, bonding, emotional recognition, and self-
regulation in early development (Cirelli, 2018; Cirelli, Trehub, & Trainor,
2018; Dissanayake, 2012; Ilari, 2016; Malloch & Trevarthen, 2009; Trainor
& Cirelli, 2015).
Many motor movements across species are rhythmic, including
heartbeats, locomotion (e.g., walking, running, skipping, swimming, wing
flapping), pulsating in fireflies, and sound productions from speech in
humans to chirping in crickets (Ackermann, 2008; Bentley & Hoy, 1974;
Buck, 1935, 1937, 1988; Kelso, Saltzman, & Tuller, 1986; Partridge, 1982;
Peelle & Davis, 2012; Weimerskirch, Martin, Clerquin, Alexandre, &
Jiraskova, 2001). However, spontaneous synchronization of movements to
an external auditory beat appears to be relatively rare among non-human
species (Merchant & Honing, 2014; Merchant et al., 2015; Patel, Iversen,
Bregman, & Schulz, 2009; Schachner, Brady, Pepperberg, & Hauser, 2009),
but very common and seemingly effortless in most human adults (Iversen &
Patel, 2008; Patel & Iversen, 2014; Repp, 2005; Repp & Su, 2013; Trainor,
2015).
Despite the ease with which adults achieve auditory–motor rhythmic
coupling, it appears to take a long time to develop (Cirelli, Trehub, &
Trainor, 2018; Drake, 1993; Einarson & Trainor, 2013, 2015, 2016;
Fitzpatrick, Schmidt, & Lockman, 1996; Luck & Toiviainen, 2006; Phillips-
Silver & Trainor, 2005; Provasi & Bobin-Bègue, 2003; Trainor & Cirelli,
2015; Van Noorden & De Bruyn, 2009; Zentner & Eerola, 2010), so
interactions in early development involving synchronous movement likely
rely on the caregiver to achieve the synchrony. Zentner and Eerola (2010)
analyzed the movements of a European sample of infants while they
listened to music and found no evidence for precise synchronization with
the tempo of the music. However, infants moved more in response to music
than to speech, and moved differently to music with faster compared to
slower tempos. Such early responses are likely influenced by early
experiences; in a follow-up study, Ilari (2015) found increased spontaneous
rhythmic movements to music in a Brazilian sample of infants compared to
those in the European sample of Zentner and Eerola (2010).
At least in Western cultures, children younger than 4 years of age
generally have difficulty entraining to a beat.
One study found that 2.5-year-old children only succeeded at tapping to a
beat when the tempo was around their spontaneous tapping rate of about
400 ms onset-to-onset (Provasi & Bobin-Bègue, 2003). Another study
reported that 3-year-olds performed poorly in general at clapping to a beat
(Fitzpatrick et al., 1996). And although children younger than 4 years of age
readily engage in whole body movements in response to music, their
hopping, swaying, and circling are not generally entrained to the tempo of
the music (Eerola, Luck, & Toiviainen, 2006). However, by 4 years of age,
clear motoric entrainment to a beat emerges (Drake, Jones, & Baruch, 2000;
Eerola et al., 2006; Endedijk et al., 2015; Fitzpatrick et al., 1996; McAuley,
Jones, Holub, Johnston, & Miller, 2006; Provasi & Bobin-Bègue, 2003),
although school-aged children still perform worse than adults (Einarson &
Trainor, 2013; Van Noorden & De Bruyn, 2009).
Interestingly, evidence for auditory–motor entrainment can be seen at
younger ages when the task is embedded in a social situation. While not
synchronization, coordination between mothers and infants aged 3 to 9
months is evident in infancy, and correlates with self-regulation, future IQ,
and development of empathy (Feldman, 2007). Furthermore, during infant-
directed singing, infants’ head, body, hand, and leg movements coordinate
most with the music at the beginnings and ends of phrases (Longhi, 2009).
At somewhat older ages, precursors of entrainment can be seen in the
spontaneous drumming of pairs of children in social situations; for instance,
at ages 2 and 3 years, children will stop and start drumming when a partner
does so, but only 4-year-olds appear to adapt the tempo of their drumming
to that of their partner, even though all children showed tempo stability (i.e.,
the ability to produce relatively isochronous drumming sequences)
(Endedijk et al., 2015).
Interestingly, one study indicates that children as young as 2.5 years of
age will entrain their drumming to that of an adult social partner (Kirschner
& Tomasello, 2009). That this ability does not rely simply on visual cues is
evident in that the children performed better when drumming with a human
social partner than with a machine that hit the drum. Unlike when
entraining to a predetermined stimulus as in most laboratory studies, in real
musical interactions between people, all participants can adaptively adjust
their timing in response to the other musicians (D’Ausilio et al., 2015;
Keller, Novembre, & Hove, 2014; Nakata & Trainor, 2015). Indeed, the
information flow among members of a musical ensemble can be measured
through correlational and directional causal analyses of movement or EEG
(Chang, Livingstone, Bosnyak, & Trainor, 2017; Lindenberger, Li, Gruber,
& Müller, 2009; Sänger, Müller, & Lindenberger, 2012). With such
approaches, it is possible that socially adaptive musical interactions could
be measured in young children.
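As a greatly simplified stand-in for such directional analyses (lagged correlation rather than the Granger-causal methods used in the cited work), the sketch below recovers which of two simulated movement time series leads the other:

```python
# Cross-correlate two performers' movement time series at different lags;
# the lag with maximal correlation suggests who leads whom.
import numpy as np

rng = np.random.default_rng(0)
leader = rng.standard_normal(1000)
follower = np.roll(leader, 5) + 0.5 * rng.standard_normal(1000)  # trails by 5 samples

def corr_at_lag(x, y, lag):
    """Correlation between x and y with y shifted by `lag` samples."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]

lags = range(-20, 21)
best = max(lags, key=lambda L: corr_at_lag(leader, follower, L))
print(best)  # ~5: the follower trails the leader by about 5 samples
```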
Cultural experience also has an effect. Parental reports suggest that
children in Brazil engage to a greater extent in such social music making
than do children from Germany, and children from Brazil show greater
propensity to spontaneously synchronize their drumming with another
person than children from Germany (Kirschner & Ilari, 2014).
Several studies indicate that when adults move in synchrony with each
other, they subsequently cooperate more, like and trust each other more,
remember more about each other, and engage in more altruistic acts (e.g.,
Anshel & Kipper, 1988; Hove & Risen, 2009; Launay, Dean, & Bailes,
2013; Macrae, Duffy, Miles, & Lawrence, 2008; Tarr, Launay, & Dunbar,
2014; Valdesolo & DeSteno, 2011; Valdesolo, Ouyang, & DeSteno, 2010;
Wiltermuth & Heath, 2009; Woolhouse, Tidhar, Demorest, Morrison, &
Campbell, 2010). Furthermore, synchronized drumming can increase
activation in the caudate, a brain region linked to expectation and reward
(Kokal, Engel, Kirschner, & Keysers, 2011). Music provides an ideal
context for facilitating synchronous movement between people. Rhythmic
regularity in music enables prediction of when the next beat is expected,
and therefore advanced planning of the motor movements necessary to
synchronize with the beat. When people hear the same music and
synchronize their movements with the beat of that music, they necessarily
become synchronized with each other.
Although infants cannot precisely entrain their movements to an
auditory beat, they often experience such synchronization when they are
held and walked or rocked to singing and other music. The social effects of
synchronization with others begin to appear around the end of the first year
after birth, a time during which infants’ social understanding is undergoing
rapid development (Dunfield, Kuhlmeier, O’Connell, & Kelley, 2011). For
example, at 12 months, but not at 9 months, infants preferred a toy bear that
rocked in sync with them over one that rocked out-of-sync with them
(Tunçgenç, Cohen, & Fawcett, 2015). By 14 months, infants will engage in
overt helping behaviors, for example, picking up and handing back a
marker “accidentally” dropped by an experimenter engaged in drawing a
picture, or picking up and handing back a clothespin “accidentally” dropped
by an experimenter hanging clothes on a line (Warneken & Tomasello,
2006, 2007, 2009). At 14 months, infants are not yet able to move in sync to
music, but they can be bounced in sync to music when held by an assistant
in an infant carrier. Cirelli, Einarson, and Trainor (2014) had an assistant
bounce infants to “Twist and Shout” by the Beatles while the infant faced
forward in the carrier across from the experimenter, who either bounced in
sync with them (i.e., at the same tempo) or out of sync (at a different
tempo), by having the experimenter bounce according to a click track
delivered to her over headphones. They found that after less than 3 minutes
of such bouncing, infants were more likely to help the experimenter if they
experienced synchronous compared to asynchronous bouncing in a series of
helping tasks as just described. Furthermore, the increased helpfulness was
targeted at the person that the infant experienced the bouncing with—
infants showed no increased helpfulness toward a neutral experimenter who
was present during the bouncing episode, but did not move to the music
(Cirelli, Wan, & Trainor, 2014). However, if infants were shown a skit that
either indicated that a second experimenter was a “friend” of the bouncer, or
was simply an “acquaintance,” infants who bounced in sync with the
experimenter transferred their helpfulness to the friend but not to the
acquaintance. This suggests that infants use movement synchrony as one
cue to identify who is in their social group and who is not.
Infants also form expectations for future behaviors between other people
(third party relationships) by observing how they interact. It appears that
infants begin to use synchrony as a cue for third party relationships around
the same age. Fawcett and Tunçgenç (2017) found that 15-month-old but
not 12-month-old infants who watched bears move either in sync or out of
sync expected those who moved in sync to affiliate socially. Cirelli, Wan,
Johanis, and Trainor (2018) found that 12- and 15-month-old infants who
watched videos of two women bouncing either in sync with each other or
out of sync were surprised when those who bounced asynchronously
subsequently displayed friendly behavior, although no significant
expectations were observed for those who bounced synchronously.
The use of synchronous movement to music as a cue to social affiliation
continues into childhood. For example, after clapping together
synchronously, 4- to 6-year-old children are more likely to help each other
compared to children who experienced asynchronous clapping (Tunçgenç &
Cohen, 2016). Even passive synchronous versus asynchronous movement
(children were pushed on swings) at 4 years of age results in increased
coordination and cooperation between children (Rabinowitch & Meltzoff,
2017).
Interestingly, moving infants synchronously versus asynchronously with
an experimenter in the absence of music has similar prosocial effects on
subsequent helping behavior as when the music is present (Cirelli, Wan, et
al., 2017). This suggests that music may facilitate synchronous movement,
but that experiencing music together per se might not be crucial for
increasing prosocial behavior. However, infants bounced with no music
were much less happy and cooperative than those bounced with music
(Cirelli, Wan, et al., 2017), suggesting that the music does play a role in
infant emotion regulation, which is likely helpful for encouraging prosocial
behavior. Furthermore, synchronous movement between the experimenter
and infant increases helping even when the musical beat is irregular and
movements are therefore synchronized but not isochronous (Cirelli, Wan, &
Trainor, 2014). Interestingly, anti-phase bouncing between an infant and
experimenter (i.e., the experimenter is at maximum height when the infant
is at minimum height and vice versa) appears to be as powerful as in-phase
bouncing for eliciting helping behaviors from infants, suggesting that the
mechanism is not one of simple self-similarity (Cirelli, Wan, & Trainor,
2014). Together, these results suggest that the role of music in encouraging
prosocial behavior is one of promoting synchronous movement. If
synchronous movement can be achieved in the absence of music, it is still a
powerful force for encouraging prosocial behavior. However, music is an
ideal stimulus for promoting synchronous movement because its temporal
regularity enables the prediction necessary for coordinated movement.

ASSOCIATIONS WITH DEVELOPMENTAL DISORDERS

Given that rhythms organize our movements and our communication


systems—music and language—and given that neural processes involve
rhythmic oscillations, it is not surprising that when timing and/or rhythmic
processes are compromised in development, the consequences can be
severe. Indeed, deficits in timing and rhythmic processing have been linked
to major developmental disorders including dyslexia, developmental
coordination disorder (DCD), autism, attention deficit (hyperactivity)
disorder (ADD/ADHD), and stuttering (Bhat & Srinivasan, 2013;
Debrabant, Gheysen, Caeyenberghs, Van Waelvelde, & Vingerhoets, 2013;
Falter & Noreika, 2014; Goswami, 2011; Hardy & LaGasse, 2013; Isaksson
et al., 2018; Rosenblum & Regev, 2013; Toplak, Dockstader, & Tannock,
2006; Wieland, McAuley, Dilley, & Chang, 2015; Williams, Woollacott, &
Ivry, 1992). Furthermore, there is high comorbidity among these disorders,
raising the possibility of a common timing deficit (Brown-Lum & Zwicker,
2015; Iversen, Berg, Ellertsen, & Tønnessen, 2005; King-Dowling,
Missiuna, Rodriguez, Greenway, & Cairney, 2015; McLeod, Langevin,
Goodyear, & Dewey, 2014; Piek & Dyck, 2004; Reiersen, Constantino, &
Todd, 2008; Watemberg, Waiserberg, Zuk, & Lerman-Sagie, 2007).
Taking the example of dyslexia, languages are rhythmically organized
around the syllable level. Although not strictly isochronous, syllables in
syllable-timed languages (e.g., French), and stressed syllables in stress-
timed languages (e.g., English) are perceived as being roughly isochronous.
Syllables thus provide a basic scaffold for a hierarchy of metrical levels and
are crucial for perceiving and ordering the speech sounds (phonemes)
within them, as well as for grouping syllables into words and phrases (e.g.,
Ding, Melloni, Zhang, Tian, & Poeppel, 2016; Giraud et al., 2007; Giraud
& Poeppel, 2012).
Dyslexia is defined as difficulties with letter-sound (phoneme) mappings
during reading (Goswami, 2011), but the underlying core deficit appears to
be difficulty in the extraction of syllable-level rhythmic structure, from
which the phonemes comprising the syllables can be discerned (e.g.,
Goswami, 2011; Goswami, Huss, Mead, Fosker, & Verney, 2013). The
deficit is not specific to language as both the ability to perceive rhythmic
and metrical structure and the ability to entrain finger taps to an auditory
beat are impaired in dyslexia (Flaugnacco et al., 2014; Huss, Verney,
Fosker, Mead, & Goswami, 2011; Overy, Nicolson, Fawcett, & Clarke,
2003; Thomson, Fryer, Maltby, & Goswami, 2006; Thomson & Goswami,
2008; Wolff, 2002). A few studies suggest that the presentation of rhythmic
cues can enhance linguistic processing (Cason, Astésano, & Schön, 2015;
Cason & Schön, 2012; Schön & Tillmann, 2015; Przybylski et al., 2013).
For example, using both behavioral and EEG measures, Cason, Schön, and
colleagues (Cason & Schön, 2012; Cason et al., 2015) found superior
processing of whether the last syllable in a sentence contained a particular
phoneme when it was preceded by a rhythmic cue that matched the prosody
(metrical structure) of the sentence compared to a rhythmic cue that did not
match. In another study, Przybylski et al. (2013) presented children with
rhythmic or arrhythmic primes followed by a spoken sentence and found
better determination of whether the sentence was syntactically correct or
not after the rhythmic primes.
There are also a number of reports that rhythmic musical training can
lessen dyslexic symptoms (Bhide, Power, & Goswami, 2013; Cogo-
Moreira, de Avila, Ploubidis, & de Jesus Mari, 2013; Flaugnacco et al.,
2015; Habib et al., 2016; Schön & Tillmann, 2015; Taub, McGrew, &
Keith, 2007; Thomson, Leong, & Goswami, 2013). For example, Habib et
al. (2016) found that eighteen hours of musical training with a strong
emphasis on rhythmic exercises led to improvements in reading and other
linguistic and non-linguistic skills in children between 7 and 12 years of age.
In a registered clinical trial, Flaugnacco et al. (2015) tested a sample of
forty-eight children with dyslexia between the ages of 8 and 11 years. They
found that seven months of musical training based on Kodály and Orff
approaches that were adapted to have a strong emphasis on timing and
rhythm led to superior phonological and reading outcomes compared to a
control painting program (all children also received conventional treatment
for dyslexia).
Timing and rhythm deficits are also associated with developmental
coordination disorder (DCD), which is defined as significant fine and/or
gross motor deficits that interfere with everyday living and are not
accompanied by intellectual impairment or other identifiable physical
disorder (American Psychiatric Association, 2013). Despite the fact that
dyslexia and DCD appear to involve different mechanisms, there is high
comorbidity between them (Brown-Lum & Zwicker, 2017; Gomez &
Sirigu, 2015). DCD is associated with deficits in motor planning, motor
sequencing, temporal prediction, perceptual timing, and tapping to a
predictable visual sequence (Debrabant et al., 2013; Estil, Ingvaldsen, &
Whiting, 2002; Volman & Geuze, 1998; Williams, Thomas, Maruff, Butson,
& Wilson, 2006; Wilmut & Wann, 2008; Wilson, Ruddock, Smits-
Engelsman, Polatajko, & Blank, 2013; Zwicker, Missiuna, & Boyd, 2009).
A few studies have reported that children with DCD are also impaired at
entraining their tapping to an auditory beat (Rosenblum & Regev, 2013;
Williams et al., 1992). Given that simply perceiving an auditory beat
engages the motor system, Trainor and colleagues (Trainor, Chang, Cairney,
& Li, 2018) proposed that auditory timing and rhythm perception is a core
deficit of DCD. Indeed in an initial study, these researchers found marked
deficits in rhythm and duration perception using both behavioral and EEG
measures (Chang, Chan, Li, Cairney, & Trainor, 2017).
This review of dyslexia and DCD suggests that timing and rhythm
processing may be core deficits in both disorders and relate to their high
comorbidity. Less research has been conducted on this question with respect
to ADD/ADHD and autism, but it is possible that a timing deficit may be
common to all of these developmental disorders. Importantly, the success of
rhythmic training in treating dyslexia suggests that auditory and movement
interventions involving timing and rhythm training might also be effective
in DCD, autism, and ADHD.

Young infants are sensitive to timing, rhythm, and meter, which helps them
to organize inputs such as speech and music into hierarchical meaningful
structures, and to enhance processing of auditory streams that unfold over
time by using the regularities of rhythms to predict when important
upcoming information will occur. Tempo and expressive timing are used in
both caregivers’ infant-directed singing and in children’s early musical
productions to convey emotional information. Auditory–motor connections
can be seen early in development in the influence of movement on metrical
interpretation, and in the influence of synchronous movement between an
infant and an adult on infants’ altruistic helping behaviors. At the same
time, it takes considerable development before children become proficient
at entraining their motor movements to musical beats. The critical
importance of timing and rhythm in development is evident in the strong
associations between poor skills in these domains and developmental
disorders including dyslexia, DCD, autism, attention deficits, and stuttering.
Early diagnosis of poor timing and rhythm skills holds promise for early
assessment of risk for developmental disorders and age-appropriate
interventions that can put young children on a better developmental
trajectory.

ACKNOWLEDGMENTS
The writing of this chapter was supported by grants from the Natural
Science and Engineering Research Council of Canada, the Canadian
Institute of Health Research, and the Social Sciences and Humanities
Research Council of Canada.
REFERENCES
Ackermann, H. (2008). Cerebellar contributions to speech production and speech perception:
Psycholinguistic and neurobiological perspectives. Trends in Neurosciences 31(6), 265–272.
Adachi, M., & Trehub, S. E. (1998). Children’s expression of emotion in song. Psychology of Music
26(2), 133–153.
Adachi, M., Trehub, S. E., & Abe, J. I. (2004). Perceiving emotion in children’s songs across age and
culture. Japanese Psychological Research 46(4), 322–336.
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th
ed.). Washington, DC: APA.
Anshel, A., & Kipper, D. A. (1988). The influence of group singing on trust and cooperation. Journal
of Music Therapy 25(3), 145–155.
Arnal, L. H., & Giraud, A. L. (2012). Cortical oscillations and sensory predictions. Trends in
Cognitive Sciences 16(7), 390–398.
Arnal, L. H., Poeppel, D., & Giraud, A. L. (2015). Temporal coding in the auditory cortex. In G.
Celesia & G. Hickok (Eds.), Handbook of clinical neurology, Vol. 129: The human auditory system
(pp. 85–98). Amsterdam: Elsevier.
Balkwill, L. L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of
emotion in music: Psychophysical and cultural cues. Music Perception: An Interdisciplinary
Journal 17(1), 43–64.
Balkwill, L. L., Thompson, W. F., & Matsunaga, R. I. E. (2004). Recognition of emotion in Japanese,
Western, and Hindustani music by Japanese listeners. Japanese Psychological Research 46(4),
337–349.
Baruch, C., & Drake, C. (1997). Tempo discrimination in infants. Infant Behavior and Development
20(4), 573–577.
Basirat, A., Dehaene, S., & Dehaene-Lambertz, G. (2014). A hierarchy of cortical responses to
sequence violations in three-month-old infants. Cognition 132(2), 137–150.
Bauer, A. K. R., Bleichner, M. G., Jaeger, M., Thorne, J. D., & Debener, S. (2018). Dynamic phase
alignment of ongoing auditory cortex oscillations. NeuroImage 167, 396–407.
Behrens, G. A., & Green, S. B. (1993). The ability to identify emotional content of solo
improvisations performed vocally and on three different instruments. Psychology of Music 21(1),
20–33.
Bentley, D., & Hoy, R. R. (1974). The neurobiology of cricket song. Scientific American 231(2), 34–
45.
Bhat, A. N., & Srinivasan, S. (2013). A review of “music and movement” therapies for children with
autism: Embodied interventions for multisystem development. Frontiers in Integrative
Neuroscience 7, 22. Retrieved from https://doi.org/10.3389/fnint.2013.00022
Bhide, A., Power, A., & Goswami, U. (2013). A rhythmic musical intervention for poor readers: A
comparison of efficacy with a letter-based intervention. Mind, Brain, and Education 7(2), 113–
123.
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005). Multidimensional scaling
of emotional responses to music: The effect of musical expertise and of the duration of the
excerpts. Cognition & Emotion 19(8), 1113–1139.
Bispham, J. (2006). Rhythm in music: What is it? Who has it? And why? Music Perception: An
Interdisciplinary Journal 24(2), 125–134.
Brochard, R., Abecasis, D., Potter, D., Ragot, R., & Drake, C. (2003). The “ticktock” of our internal
clock: Direct brain evidence of subjective accents in isochronous sequences. Psychological
Science 14(4), 362–366.
Brown, S., & Volgsten, U. (Eds.). (2006). Music and manipulation: On the social uses and social
control of music. New York: Berghahn.
Brown-Lum, M., & Zwicker, J. G. (2015). Brain imaging increases our understanding of
developmental coordination disorder: A review of literature and future directions. Current
Developmental Disorders Reports 2(2), 131–140.
Brown-Lum, M., & Zwicker, J. G. (2017). Neuroimaging and occupational therapy: Bridging the gap
to advance rehabilitation in developmental coordination disorder. Journal of Motor Behavior
49(1), 98–110.
Buck, J. B. (1935). Synchronous flashing of fireflies experimentally induced. Science 81, 339–340.
Buck, J. B. (1937). Studies on the firefly. I. The effects of light and other agents on flashing in
Photinus pyralis, with special reference to periodicity and diurnal rhythm. Physiological Zoology
10(1), 45–58.
Buck, J. B. (1988). Synchronous rhythmic flashing of fireflies. II. Quarterly Review of Biology 63(3),
265–289.
Calderone, D. J., Lakatos, P., Butler, P. D., & Castellanos, F. X. (2014). Entrainment of neural
oscillations as a modifiable substrate of attention. Trends in Cognitive Sciences 18(6), 300–309.
Cason, N., Astésano, C., & Schön, D. (2015). Bridging music and speech rhythm: Rhythmic priming
and audio-motor training affect speech perception. Acta Psychologica 155, 43–50.
Cason, N., & Schön, D. (2012). Rhythmic priming enhances the phonological processing of speech.
Neuropsychologia 50(11), 2652–2658.
Chang, A., Bosnyak, D., & Trainor, L. J. (2018). Beta oscillatory power modulation reflects the
predictability of pitch change. Cortex, 106, 248–260.
Chang, A., Chan, J., Li, Y.-C., Cairney, J., & Trainor, L. J. (2017). Auditory timing deficits in
developmental coordination disorder. Presented at the 1st Conference of the Timing Research
Forum, Strasbourg, France, October 23–25.
Chang, A., Livingstone, S. R., Bosnyak, D. J., & Trainor, L. J. (2017). Body sway reflects leadership
in joint music performance. Proceedings of the National Academy of Sciences 114(21), E4134–
E4141.
Chang, H. W., & Trehub, S. E. (1977). Infants’ perception of temporal grouping in auditory patterns.
Child Development 48(4), 1666–1670.
Cirelli, L. K. (2018). How interpersonal synchrony facilitates early prosocial behavior. Current
Opinion in Psychology 20, 35–39.
Cirelli, L. K., Bosnyak, D., Manning, F. C., Spinelli, C., Marie, C., Fujioka, T., … Trainor, L. J.
(2014). Beat-induced fluctuations in auditory cortical beta-band activity: Using EEG to measure
age-related changes. Frontiers in Psychology 5, 742. Retrieved from
https://doi.org/10.3389/fpsyg.2014.00742
Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial
behavior in infants. Developmental Science 17(6), 1003–1011.
Cirelli, L. K., Spinelli, C., Nozaradan, S., & Trainor, L. J. (2016). Measuring neural entrainment to
beat and meter in infants: Effects of music background. Frontiers in Neuroscience 10, 229.
Retrieved from https://doi.org/10.3389/fnins.2016.00229
Cirelli, L. K., Trehub, S. E., & Trainor, L. J. (2018). Rhythm and melody as social signals for infants.
Annals of the New York Academy of Sciences. doi:10.1111/nyas.13580
Cirelli, L. K., Wan, S. J., Johanis, T. C., & Trainor, L. J. (2018). Infants’ use of interpersonal
asynchrony as a signal for third-party affiliation. Music & Science 1.
doi:10.1177/2059204317745855
Cirelli, L. K., Wan, S. J., Spinelli, C., & Trainor, L. J. (2017). Effects of interpersonal movement
synchrony on infant helping behaviors: Is music necessary? Music Perception: An
Interdisciplinary Journal 34(3), 319–326.
Cirelli, L. K., Wan, S. J., & Trainor, L. J. (2014). Fourteen-month-old infants use interpersonal
synchrony as a cue to direct helpfulness. Philosophical Transactions of the Royal Society B:
Biological Sciences 369(1658), 20130400.
Clarke, E. F., & Krumhansl, C. L. (1990). Perceiving musical time. Music Perception: An
Interdisciplinary Journal 7(3), 213–251.
Cogo-Moreira, H., de Avila, C. R. B., Ploubidis, G. B., & de Jesus Mari, J. (2013). Effectiveness of
music education for the improvement of reading skills and academic achievement in young poor
readers: A pragmatic cluster-randomized, controlled clinical trial. PloS ONE 8(3), e59984.
Conrad, N. J., Walsh, J., Allen, J. M., & Tsang, C. D. (2011). Examining infants’ preferences for
tempo in lullabies and playsongs. Canadian Journal of Experimental Psychology/Revue
canadienne de psychologie expérimentale 65(3), 168–172.
Corbeil, M., Trehub, S. E., & Peretz, I. (2016). Singing delays the onset of infant distress. Infancy
21(3), 373–391.
Cunningham, J. G., & Sterling, R. S. (1988). Developmental change in the understanding of affective
meaning in music. Motivation and Emotion 12(4), 399–413.
Dalla Bella, S., Peretz, I., Rousseau, L., & Gosselin, N. (2001). A developmental study of the
affective value of tempo and mode in music. Cognition 80(3), B1–B10.
D’Ausilio, A., Novembre, G., Fadiga, L., & Keller, P. E. (2015). What can music tell us about social
interaction? Trends in Cognitive Sciences 19(3), 111–114.
Debrabant, J., Gheysen, F., Caeyenberghs, K., Van Waelvelde, H., & Vingerhoets, G. (2013). Neural
underpinnings of impaired predictive motor timing in children with developmental coordination
disorder. Research in Developmental Disabilities 34(5), 1478–1487.
de l’Etoile, S. K., & Leider, C. N. (2011). Acoustic parameters of infant-directed singing in mothers
with depressive symptoms. Infant Behavior and Development 34(2), 248–256.
Deliege, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff’s
grouping preference rules. Music Perception: An Interdisciplinary Journal 4(4), 325–359.
Demany, L., McKenzie, B., & Vurpillot, E. (1977). Rhythm perception in early infancy. Nature
266(5604), 718–719.
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical
linguistic structures in connected speech. Nature Neuroscience 19(1), 158–164.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in
speech and music. Neuroscience & Biobehavioral Reviews 81(B), 181–187.
Dissanayake, E. (2012). The earliest narratives were musical. Research Studies in Music Education
34(1), 3–14.
Dolgin, K. G., & Adelson, E. H. (1990). Age changes in the ability to interpret affect in sung and
instrumentally-presented melodies. Psychology of Music 18(1), 87–98.
Dowling, W. J. (1973). Rhythmic groups and subjective chunks in memory for melodies. Perception
& Psychophysics 14(1), 37–40.
Drake, C. (1993). Reproduction of musical rhythms by children, adult musicians, and adult
nonmusicians. Perception & Psychophysics 53(1), 25–33.
Drake, C., Jones, M. R., & Baruch, C. (2000). The development of rhythmic attending in auditory
sequences: Attunement, referent period, focal attending. Cognition 77(3), 251–288.
Dunfield, K., Kuhlmeier, V. A., O’Connell, L., & Kelley, E. (2011). Examining the diversity of
prosocial behavior: Helping, sharing, and comforting in infancy. Infancy 16(3), 227–247.
Eerola, T., Luck, G., & Toiviainen, P. (2006). An investigation of pre-schoolers’ corporeal
synchronization with music. In Proceedings of the 9th International Conference on Music
Perception and Cognition (pp. 472–476). The Society for Music Perception and Cognition and
European Society for the Cognitive Sciences of Music, Bologna.
Einarson, K. M., & Trainor, L. J. (2013). Five-year-old children’s beat perception and beat
synchronization abilities. Frontiers in Human Neuroscience. Conference abstract: 14th Rhythm
Production and Perception Workshop, Birmingham, September 11–13.
Einarson, K. M., & Trainor, L. J. (2015). The effect of visual information on young children’s
perceptual sensitivity to musical beat alignment. Timing & Time Perception 3(1–2), 88–101.
Einarson, K. M., & Trainor, L. J. (2016). Hearing the beat: Young children’s perceptual sensitivity to
beat alignment varies according to metric structure. Music Perception: An Interdisciplinary
Journal 34(1), 56–70.
Endedijk, H. M., Ramenzoni, V. C., Cox, R. F., Cillessen, A. H., Bekkering, H., & Hunnius, S.
(2015). Development of interpersonal coordination between peers during a drumming task.
Developmental Psychology 51(5), 714–721.
Esposito, A., & Serio, M. (2007). Children’s perception of musical emotional expressions. In A.
Esposito, M. Faundez-Zanuy, E. Keller, & M. Marinaro (Eds.), Verbal and nonverbal
communication behaviours: COST Action 2102 international workshop, Vietri sul Mare, Italy,
March 29–31, 2007. Revised selected and invited papers (pp. 51–64). Berlin: Springer.
Essens, P. J. (1986). Hierarchical organization of temporal patterns. Perception & Psychophysics
40(2), 69–73.
Essens, P. J., & Povel, D. J. (1985). Metrical and nonmetrical representations of temporal patterns.
Perception & Psychophysics 37(1), 1–7.
Estil, L., Ingvaldsen, R., & Whiting, H. (2002). Spatial and temporal constraints on performance in
children with movement co-ordination problems. Experimental Brain Research 147(2), 153–161.
Falter, C. M., & Noreika, V. (2014). Time processing in developmental disorders: A comparative
view. In V. Arstila & D. Lloyd (Eds.), Subjective time: The philosophy, psychology, and
neuroscience of temporality (pp. 557–598). Cambridge, MA: MIT Press.
Fawcett, C., & Tunçgenç, B. (2017). Infants’ use of movement synchrony to infer social affiliation in
others. Journal of Experimental Child Psychology 160, 127–136.
Feldman, R. (2007). Parent–infant synchrony and the construction of shared timing: Physiological
precursors, developmental outcomes, and risk conditions. Journal of Child Psychology and
Psychiatry 48(3–4), 329–354.
Fernald, A. (1991). Prosody in speech to children: Prelinguistic and linguistic functions. Annals of
Child Development 8, 43–80.
Fitch, W. T. (2006). The biology and evolution of music: A comparative perspective. Cognition
100(1), 173–215.
Fitzpatrick, P., Schmidt, R. C., & Lockman, J. J. (1996). Dynamical patterns in the development of
clapping. Child Development 67(6), 2691–2708.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training
increases phonological awareness and reading skills in developmental dyslexia: A randomized
control trial. PLoS ONE 10(9), e0138715.
Flaugnacco, E., Lopez, L., Terribili, C., Zoia, S., Buda, S., Tilli, S., … Schön, D. (2014). Rhythm
perception and production predict reading abilities in developmental dyslexia. Frontiers in Human
Neuroscience 8, 392. doi:10.3389/fnhum.2014.00392
Flom, R., Gentile, D. A., & Pick, A. D. (2008). Infants’ discrimination of happy and sad music.
Infant Behavior and Development 31(4), 716–728.
Flom, R., & Pick, A. D. (2012). Dynamics of infant habituation: Infants’ discrimination of musical
excerpts. Infant Behavior and Development 35(4), 697–704.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149–180). San Diego: Academic Press.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., … Koelsch, S. (2009).
Universal recognition of three basic emotions in music. Current Biology 19(7), 573–576.
Fujioka, T., Ross, B., & Trainor, L. J. (2015). Beta-band oscillations represent auditory beat and its
metrical hierarchy in perception and imagery. Journal of Neuroscience 35(45), 15187–15198.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds
is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the
performer’s intention and the listener’s experience. Psychology of Music 24(1), 68–91.
Gabrielsson, A., & Lindström, E. (2010). The role of structure in the musical expression of emotions.
In P. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research,
applications (pp. 367–400). Oxford: Oxford University Press.
Gagnon, L., & Peretz, I. (2003). Mode and tempo relative contributions to “happy-sad” judgements
in equitone melodies. Cognition & Emotion 17(1), 25–40.
Gerardi, G. M., & Gerken, L. (1995). The development of affective responses to modality and
melodic contour. Music Perception: An Interdisciplinary Journal 12(3), 279–290.
Gerry, D. W., Faux, A. L., & Trainor, L. J. (2010). Effects of Kindermusik training on infants’
rhythmic enculturation. Developmental Science 13(3), 545–551.
Gerry, D. W., Unrau, A., & Trainor, L. J. (2012). Active music classes in infancy enhance musical,
communicative and social development. Developmental Science 15(3), 398–407.
Giomo, C. J. (1993). An experimental study of children’s sensitivity to mood in music. Psychology of
Music 21(2), 141–162.
Giraud, A. L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S., & Laufs, H. (2007).
Endogenous cortical rhythms determine cerebral specialization for speech perception and
production. Neuron 56(6), 1127–1134.
Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging
computational principles and operations. Nature Neuroscience 15(4), 511–517.
Gjerdingen, R. O. (1989). Meter as a mode of attending: A network simulation of attentional
rhythmicity in music. Intégral 3, 67–91.
Gomez, A., & Sirigu, A. (2015). Developmental coordination disorder: Core sensori-motor deficits,
neurobiology and etiology. Neuropsychologia 79(B), 272–287.
Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. Trends in
Cognitive Sciences 15(1), 3–10.
Goswami, U., Huss, M., Mead, N., Fosker, T., & Verney, J. P. (2013). Perception of patterns of
musical beat distribution in phonological developmental dyslexia: Significant longitudinal
relations with word reading and reading comprehension. Cortex 49(5), 1363–1376.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of
Cognitive Neuroscience 19(5), 893–906.
Gregory, A. H., Worrall, L., & Sarge, A. (1996). The development of emotional responses to music in
young children. Motivation and Emotion 20(4), 341–348.
Habib, M., Lardy, C., Desiles, T., Commeiras, C., Chobert, J., & Besson, M. (2016). Music and
dyslexia: A new musical training method to improve reading and related disorders. Frontiers in
Psychology 7, 26. doi:10.3389/fpsyg.2016.00026
Háden, G. P., Németh, R., Török, M., & Winkler, I. (2015). Predictive processing of pitch trends in
newborn infants. Brain Research 1626, 14–20.
Haegens, S., & Zion Golumbic, E. (2018). Rhythmic facilitation of sensory processing: A critical
review. Neuroscience & Biobehavioral Reviews 86, 150–165.
Hannon, E. E., Vanden Bosch der Nederlanden, C. M., & Tichko, P. (2012). Effects of perceptual
experience on children’s and adults’ perception of unfamiliar rhythms. Annals of the New York
Academy of Sciences 1252, 92–99.
Hannon, E. E., & Johnson, S. P. (2005). Infants use meter to categorize rhythms and melodies:
Implications for musical structure learning. Cognitive Psychology 50(4), 354–377.
Hannon, E. E., Nave-Blodgett, J. E., & Nave, K. M. (2018). The developmental origins of the
perception and production of musical rhythm. Child Development Perspectives.
doi:10.1111/cdep.12285
Hannon, E. E., Soley, G., & Ullal, S. (2012). Familiarity overrides complexity in rhythm perception:
A cross-cultural comparison of American and Turkish listeners. Journal of Experimental
Psychology: Human Perception and Performance 38(3), 543–548.
Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal
training on development. Trends in Cognitive Sciences 11(11), 466–472.
Hannon, E. E., & Trehub, S. E. (2005a). Metrical categories in infancy and adulthood. Psychological
Science 16(1), 48–55.
Hannon, E. E., & Trehub, S. E. (2005b). Tuning in to musical rhythms: Infants learn more readily
than adults. Proceedings of the National Academy of Sciences 102(35), 12639–12643.
Hardy, M. W., & LaGasse, A. B. (2013). Rhythm, movement, and autism: Using rhythmic
rehabilitation research as a model for autism. Frontiers in Integrative Neuroscience 7, 19.
Retrieved from https://doi.org/10.3389/fnint.2013.00019
He, C., Hotson, L., & Trainor, L. J. (2007). Mismatch responses to pitch changes in early infancy.
Journal of Cognitive Neuroscience 19(5), 878–892.
Henry, M. J., Herrmann, B., & Obleser, J. (2014). Entrained neural oscillations in multiple frequency
bands comodulate behavior. Proceedings of the National Academy of Sciences 111(41), 14935–
14940.
Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains slow neural oscillations and
optimizes human listening behavior. Proceedings of the National Academy of Sciences 109(49),
20095–20100.
Herrmann, B., Henry, M. J., Haegens, S., & Obleser, J. (2016). Temporal expectations and neural
amplitude fluctuations in auditory cortex interactively influence perception. NeuroImage 124,
487–497.
Hove, M. J., & Risen, J. L. (2009). It’s all in the timing: Interpersonal synchrony increases affiliation.
Social Cognition 27(6), 949–960.
Huron, D. (2001). Is music an evolutionary adaptation? Annals of the New York Academy of Sciences
930(1), 43–61.
Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA:
MIT Press.
Huss, M., Verney, J. P., Fosker, T., Mead, N., & Goswami, U. (2011). Music, rhythm, rise time
perception and developmental dyslexia: Perception of musical meter predicts reading and
phonology. Cortex 47(6), 674–689.
Ilari, B. (2015). Rhythmic engagement with music in early childhood: A replication and extension.
Journal of Research in Music Education 62(4), 332–343.
Ilari, B. (2016). Music in the early years: Pathways into the social world. Research Studies in Music
Education 38(1), 23–39.
Ilie, G., & Thompson, W. F. (2006). A comparison of acoustic cues in music and speech for three
dimensions of affect. Music Perception: An Interdisciplinary Journal 23(4), 319–330.
Isaksson, S., Salomäki, S., Tuominen, J., Arstila, V., Falter-Wagner, C. M., & Noreika, V. (2018). Is
there a generalized timing impairment in autism spectrum disorders across time scales and
paradigms? Journal of Psychiatric Research 99, 111–121.
Iversen, J. R., & Patel, A. D. (2008). The Beat Alignment Test (BAT): Surveying beat processing
abilities in the general population. In K. Miyazaki, M. Adachi, Y. Hiraga, Y. Nakajima, & M.
Tsuzaki (Eds.), Proceedings of the 10th International Conference on Music Perception and
Cognition (ICMPC 10) (pp. 465–468).
Iversen, J. R., Repp, B. H., & Patel, A. D. (2009). Top-down control of rhythm perception modulates
early auditory responses. Annals of the New York Academy of Sciences 1169, 58–73.
Iversen, S., Berg, K., Ellertsen, B., & Tønnessen, F. E. (2005). Motor coordination difficulties in a
municipality group and in a clinical sample of poor readers. Dyslexia 11(3), 217–231.
James, C. E., Michel, C. M., Britz, J., Vuilleumier, P., & Hauert, C. A. (2012). Rhythm evokes action:
Early processing of metric deviances in expressive music by experts and laymen revealed by ERP
source imaging. Human Brain Mapping 33(12), 2751–2767.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review
96(3), 459–491.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-
driven attending in dynamic arrays. Psychological Science 13(4), 313–319.
Jusczyk, P. W., & Krumhansl, C. L. (1993). Pitch and rhythmic patterns affecting infants’ sensitivity
to musical phrase structure. Journal of Experimental Psychology: Human Perception and
Performance 19(3), 627–640.
Juslin, P. N. (1997). Emotional communication in music performance: A functionalist perspective
and some data. Music Perception: An Interdisciplinary Journal 14(4), 383–418.
Juslin, P. N. (2000). Cue utilization in communication of emotion in music performance: Relating
performance to perception. Journal of Experimental Psychology: Human Perception and
Performance 26(6), 1797–1813.
Juslin, P. N., & Laukka, P. (2003). Emotional expression in speech and music. Annals of the New
York Academy of Sciences 1000, 279–282.
Juslin, P. N., & Timmers, R. (2010). Expression and communication of emotion in music
performance. In P. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory,
research, applications (pp. 453–489). Oxford: Oxford University Press.
Kastner, M. P., & Crowder, R. G. (1990). Perception of the major/minor distinction: IV. Emotional
connotations in young children. Music Perception: An Interdisciplinary Journal 8(2), 189–201.
Keller, P. E., Novembre, G., & Hove, M. J. (2014). Rhythm in joint action: Psychological and
neurophysiological mechanisms for real-time interpersonal coordination. Philosophical
Transactions of the Royal Society B: Biological Sciences 369(1658), 20130394.
doi:10.1098/rstb.2013.0394
Kelso, J. A., Saltzman, E. L., & Tuller, B. (1986). The dynamical perspective on speech production:
Data and theory. Journal of Phonetics 14(1), 29–59.
King-Dowling, S., Missiuna, C., Rodriguez, M. C., Greenway, M., & Cairney, J. (2015). Reprint of
“Co-occurring motor, language and emotional–behavioral problems in children 3–6 years of age.”
Human Movement Science 42, 344–351.
Kirschner, S., & Ilari, B. (2014). Joint drumming in Brazilian and German preschool children:
Cultural differences in rhythmic entrainment, but no prosocial effects. Journal of Cross-Cultural
Psychology 45(1), 137–166.
Kirschner, S., & Tomasello, M. (2009). Joint drumming: Social context facilitates synchronization in
preschool children. Journal of Experimental Child Psychology 102(3), 299–314.
Kokal, I., Engel, A., Kirschner, S., & Keysers, C. (2011). Synchronized drumming enhances activity
in the caudate and facilitates prosocial commitment—if the rhythm comes easily. PLoS ONE 6(11),
e27272.
Kragness, H. E., Baksh, A., Battcock, A., & Trainor, L. J. (2017). Children’s use of expressive cues
in music: A developmental self-pacing study. Presented at the Neurosciences & Music VI
conference, Boston, USA, June 15–18.
Kragness, H. E., & Trainor, L. J. (2016). Listeners lengthen phrase boundaries in self-paced music.
Journal of Experimental Psychology: Human Perception and Performance 42(10), 1676–1686.
Kragness, H. E., & Trainor, L. J. (2018). Young children pause on phrase boundaries in self-paced
music listening: The role of harmonic cues. Developmental Psychology 54(5), 842–856.
Kratus, J. (1993). A developmental study of children’s interpretation of emotion in music.
Psychology of Music 21(1), 3–19.
Krumhansl, C. L., & Jusczyk, P. W. (1990). Infants’ perception of phrase structure in music.
Psychological Science 1(1), 70–73.
Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a
facilitation effect for native language phonetic perception between 6 and 12 months.
Developmental Science 9(2), F13–F21.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying
events. Psychological Review 106(1), 119–159.
Laukka, P., & Gabrielsson, A. (2000). Emotional expression in drumming performance. Psychology
of Music 28(2), 181–189.
Launay, J., Dean, R. T., & Bailes, F. (2013). Synchronization can influence trust following virtual
interaction. Experimental Psychology 60(1), 1–11.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT
Press.
Lewkowicz, D. J. (2003). Learning and discrimination of audiovisual events in human infants: The
hierarchical relation between intersensory temporal synchrony and rhythmic pattern cues.
Developmental Psychology 39(5), 795–804.
Lindenberger, U., Li, S. C., Gruber, W., & Müller, V. (2009). Brains swinging in concert: Cortical
phase synchronization while playing guitar. BMC Neuroscience 10(1), 22.
London, J. (2004). Hearing in time: Psychological aspects of musical meter (2nd ed.). New York:
Oxford University Press.
Longhi, E. (2009). “Songese”: Maternal structuring of musical interaction with infants. Psychology of Music 37(2), 195–213.
Luck, G., & Toiviainen, P. (2006). Ensemble musicians’ synchronization with conductors’ gestures:
An automated feature-extraction analysis. Music Perception: An Interdisciplinary Journal 24(2),
189–200.
Lynch, M. P., & Eilers, R. E. (1992). A study of perceptual development for musical tuning.
Perception & Psychophysics 52(6), 599–608.
Lynch, M. P., Eilers, R. E., Oller, D. K., & Urbano, R. C. (1990). Innateness, experience, and music
perception. Psychological Science 1(4), 272–276.
McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our
lives: Life span development of timing and event tracking. Journal of Experimental Psychology:
General 135(3), 348–367.
McLeod, K. R., Langevin, L. M., Goodyear, B. G., & Dewey, D. (2014). Functional connectivity of
neural motor networks is disrupted in children with developmental coordination disorder and
attention-deficit/hyperactivity disorder. NeuroImage: Clinical 4, 566–575.
McNeill, W. H. (1995). Keeping together in time. Cambridge, MA: Harvard University Press.
Macrae, C. N., Duffy, O. K., Miles, L. K., & Lawrence, J. (2008). A case of hand waving: Action
synchrony and person perception. Cognition 109(1), 152–156.
Malloch, S. E., & Trevarthen, C. E. (2009). Communicative musicality: Exploring the basis of human
companionship. New York: Oxford University Press.
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A
neural perspective across humans and non-human primates. Philosophical Transactions of the
Royal Society B: Biological Sciences 370(1664), 20140093. doi:10.1098/rstb.2014.0093
Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment?
Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience 7, 274.
Retrieved from https://doi.org/10.3389/fnins.2013.00274
Merker, B. (2000). Synchronous chorusing and human origins. In N. L. Wallin, B. Merker, & S.
Brown (Eds.), The origins of music (pp. 315–327). Cambridge, MA: MIT Press.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Mohn, C., Argstatter, H., & Wilker, F. W. (2011). Perception of six basic emotions in music.
Psychology of Music 39(4), 503–517.
Morillon, B., & Schroeder, C. E. (2015). Neuronal oscillations as a mechanistic substrate of auditory
temporal prediction. Annals of the New York Academy of Sciences 1337, 26–31.
Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal prediction in lieu of
periodic stimulation. Journal of Neuroscience 36(8), 2342–2347.
Mote, J. (2011). The effects of tempo and familiarity on children’s affective interpretation of music.
Emotion 11(3), 618–622.
Nakata, T., & Mitani, C. (2005). Influences of temporal fluctuation on infant attention. Music
Perception: An Interdisciplinary Journal 22(3), 401–409.
Nakata, T., & Trainor, L. J. (2015). Perceptual and cognitive enhancement with an adaptive timing
partner: Electrophysiological responses to pitch change. Psychomusicology: Music, Mind, and
Brain 25(4), 404–415.
Nawrot, E. S. (2003). The perception of emotional expression in music: Evidence from infants,
children and adults. Psychology of Music 31(1), 75–92.
Nobre, A. C., Correa, A., & Coull, J. T. (2007). The hazards of time. Current Opinion in
Neurobiology 17(4), 465–470.
Nobre, A. C., & van Ede, F. (2018). Anticipated moments: Temporal structure in attention. Nature
Reviews Neuroscience 19(1), 34–48.
Nozaradan, S. (2014). Exploring how musical rhythm entrains brain activity with
electroencephalogram frequency-tagging. Philosophical Transactions of the Royal Society B:
Biological Sciences 369(1658), 20130393. doi:10.1098/rstb.2013.0393
Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the neuronal entrainment to
beat and meter. Journal of Neuroscience 31(28), 10234–10240.
Nozaradan, S., Peretz, I., & Mouraux, A. (2012). Selective neuronal entrainment to the beat and
meter embedded in a musical rhythm. Journal of Neuroscience 32(49), 17572–17581.
Overy, K., Nicolson, R. I., Fawcett, A. J., & Clarke, E. F. (2003). Dyslexia and music: Measuring
musical timing skills. Dyslexia 9(1), 18–36.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental
Psychology: Human Perception and Performance 15(2), 331–346.
Palmer, C., & Krumhansl, C. L. (1987). Pitch and temporal contributions to musical phrase
perception: Effects of harmony, performance timing, and familiarity. Perception & Psychophysics
41(6), 505–518.
Papoušek, M., Papoušek, H., & Symmes, D. (1991). The meanings of melodies in motherese in tone
and stress languages. Infant Behavior and Development 14(4), 415–440.
Partridge, B. L. (1982). The structure and function of fish schools. Scientific American 246(6), 114–
123.
Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The
Action Simulation for Auditory Prediction (ASAP) hypothesis. Frontiers in Systems Neuroscience
8, 57. Retrieved from https://doi.org/10.3389/fnsys.2014.00057
Patel, A. D., Iversen, J. R., Bregman, M. R., & Schulz, I. (2009). Experimental evidence for
synchronization to a musical beat in a nonhuman animal. Current Biology 19(10), 827–830.
Pearce, M. T., Müllensiefen, D., & Wiggins, G. A. (2010). The role of expectation and probabilistic
learning in auditory boundary perception: A model comparison. Perception 39(10), 1367–1391.
Pearce, M. T., & Wiggins, G. A. (2006). Expectation in melody: The influence of context and
learning. Music Perception: An Interdisciplinary Journal 23(5), 377–405.
Peelle, J. E., & Davis, M. H. (2012). Neural oscillations carry speech rhythm through to
comprehension. Frontiers in Psychology 3, 320. Retrieved from
https://doi.org/10.3389/fpsyg.2012.00320
Peretz, I. (1989). Clustering in music: An appraisal of task factors. International Journal of
Psychology 24(1–5), 157–178.
Phillips-Silver, J., & Trainor, L. J. (2005). Feeling the beat: Movement influences infant rhythm
perception. Science 308(5727), 1430.
Piek, J. P., & Dyck, M. J. (2004). Sensory-motor deficits in children with developmental coordination
disorder, attention deficit hyperactivity disorder and autistic disorder. Human Movement Science
23(3–4), 475–488.
Provasi, J., & Bobin-Bègue, A. (2003). Spontaneous motor tempo and rhythmical synchronisation in
2½- and 4-year-old children. International Journal of Behavioral Development 27(3), 220–231.
Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., … Tillmann, B.
(2013). Rhythmic auditory stimulation influences syntactic processing in children with
developmental language disorders. Neuropsychology 27(1), 121–131.
Rabinowitch, T. C., & Meltzoff, A. N. (2017). Synchronized movement experience enhances peer
cooperation in preschool children. Journal of Experimental Child Psychology 160, 21–32.
Rankin, S. K., Large, E. W., & Fink, P. W. (2009). Fractal tempo fluctuation and pulse prediction.
Music Perception: An Interdisciplinary Journal 26(5), 401–413.
Reiersen, A. M., Constantino, J. N., & Todd, R. D. (2008). Co-occurrence of motor problems and
autistic symptoms in attention-deficit/hyperactivity disorder. Journal of the American Academy of
Child & Adolescent Psychiatry 47(6), 662–672.
Repp, B. H. (1992). Diversity and commonality in music performance: An analysis of timing
microstructure in Schumann’s “Träumerei.” Journal of the Acoustical Society of America 92(5),
2546–2568.
Repp, B. H. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic
Bulletin & Review 12(6), 969–992.
Repp, B. H., London, J., & Keller, P. E. (2005). Production and synchronization of uneven rhythms at
fast tempi. Music Perception: An Interdisciplinary Journal 23(1), 61–78.
Repp, B. H., & Su, Y. H. (2013). Sensorimotor synchronization: A review of recent research (2006–
2012). Psychonomic Bulletin & Review 20(3), 403–452.
Rock, A. M., Trainor, L. J., & Addison, T. L. (1999). Distinctive messages in infant-directed lullabies
and play songs. Developmental Psychology 35(2), 527–534.
Rosenblum, S., & Regev, N. (2013). Timing abilities among children with developmental
coordination disorders (DCD) in comparison to children with typical development. Research in
Developmental Disabilities 34(1), 218–227.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology
39(6), 1161–1178.
Sänger, J., Müller, V., & Lindenberger, U. (2012). Intra- and interbrain synchronization and network
properties when playing guitar in duets. Frontiers in Human Neuroscience 6, 312. Retrieved from
https://doi.org/10.3389/fnhum.2012.00312
Schachner, A., Brady, T. F., Pepperberg, I. M., & Hauser, M. D. (2009). Spontaneous motor
entrainment to music in multiple vocal mimicking species. Current Biology 19(10), 831–836.
Schön, D., & Tillmann, B. (2015). Short- and long-term rhythmic interventions: Perspectives for
language rehabilitation. Annals of the New York Academy of Sciences 1337, 32–39.
Schroeder, C. E., & Lakatos, P. (2009). Low-frequency neuronal oscillations as instruments of
sensory selection. Trends in Neurosciences 32(1), 9–18.
Schröger, E., Marzecová, A., & SanMiguel, I. (2015). Attention and prediction in human audition: A
lesson from cognitive psychophysiology. European Journal of Neuroscience 41(5), 641–664.
Shenfield, T., Trehub, S. E., & Nakata, T. (2003). Maternal singing modulates infant arousal.
Psychology of Music 31(4), 365–375.
Snyder, J. S., Hannon, E. E., Large, E. W., & Christiansen, M. H. (2006). Synchronization and
continuation tapping to complex meters. Music Perception: An Interdisciplinary Journal 24(2),
135–146.
Snyder, J. S., & Large, E. W. (2005). Gamma-band activity reflects the metric structure of rhythmic
tone sequences. Cognitive Brain Research 24(1), 117–126.
Soley, G., & Hannon, E. E. (2010). Infants prefer the musical meter of their own culture: A cross-
cultural comparison. Developmental Psychology 46(1), 286–292.
Stefanics, G., Hangya, B., Hernádi, I., Winkler, I., Lakatos, P., & Ulbert, I. (2010). Phase entrainment
of human delta oscillations can mediate the effects of expectation on reaction speed. Journal of
Neuroscience 30(41), 13578–13585.
Tal, I., Large, E. W., Rabinovitch, E., Wei, Y., Schroeder, C. E., Poeppel, D., & Golumbic, E. Z.
(2017). Neural entrainment to the beat: The “missing-pulse” phenomenon. Journal of
Neuroscience 37(26), 6331–6341.
Tarr, B., Launay, J., & Dunbar, R. I. (2014). Music and social bonding: “Self-other” merging and
neurohormonal mechanisms. Frontiers in Psychology 5, 1096. Retrieved from
https://doi.org/10.3389/fpsyg.2014.01096
Taub, G. E., McGrew, K. S., & Keith, T. Z. (2007). Improvements in interval time tracking and
effects on reading achievement. Psychology in the Schools 44(8), 849–863.
Terwogt, M. M., & Van Grinsven, F. (1991). Musical expression of moodstates. Psychology of Music
19(2), 99–109.
Thomson, J. M., Fryer, B., Maltby, J., & Goswami, U. (2006). Auditory and motor rhythm awareness
in adults with dyslexia. Journal of Research in Reading 29(3), 334–348.
Thomson, J. M., & Goswami, U. (2008). Rhythmic processing in children with developmental
dyslexia: Auditory and motor rhythms link to reading and spelling. Journal of Physiology-Paris
102(1–3), 120–129.
Thomson, J. M., Leong, V., & Goswami, U. (2013). Auditory processing interventions and
developmental dyslexia: A comparison of phonemic and rhythmic approaches. Reading and
Writing 26(2), 139–161.
Thorpe, L. A., & Trehub, S. E. (1989). Duration illusion and auditory grouping in infancy.
Developmental Psychology 25(1), 122–127.
Todd, N. (1985). A model of expressive timing in tonal music. Music Perception: An
Interdisciplinary Journal 3(1), 33–57.
Toplak, M. E., Dockstader, C., & Tannock, R. (2006). Temporal information processing in ADHD:
Findings to date and new methods. Journal of Neuroscience Methods 151(1), 15–29.
Trainor, L. J. (1996). Infant preferences for infant-directed versus noninfant-directed playsongs and
lullabies. Infant Behavior and Development 19(1), 83–92.
Trainor, L. J. (2012). Musical experience, plasticity, and maturation: Issues in measuring
developmental change using EEG and MEG. Annals of the New York Academy of Sciences 1252,
25–36.
Trainor, L. J. (2015). The origins of music in auditory scene analysis and the roles of evolution and
culture in musical creation. Philosophical Transactions of the Royal Society B: Biological Sciences
370(1664), 20140089. doi: 10.1098/rstb.2014.0089
Trainor, L. J., & Adams, B. (2000). Infants’ and adults’ use of duration and intensity cues in the
segmentation of tone patterns. Perception & Psychophysics 62(2), 333–340.
Trainor, L. J., Chang, A., Cairney, J., & Li, Y. C. (2018). Is auditory perceptual timing a core deficit of developmental coordination disorder? Annals of the New York Academy of Sciences 1423, 30–39.
Trainor, L. J., & Cirelli, L. (2015). Rhythm and interpersonal synchrony in early social development.
Annals of the New York Academy of Sciences 1337, 45–52.
Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. A. (1997). The acoustic basis of preferences for
infant-directed singing. Infant Behavior and Development 20(3), 383–396.
Trainor, L. J., & Corrigall, K. A. (2010). Music acquisition and effects of musical experience. In M.
Riess Jones, R. Fay, & A. Popper (Eds.), Music perception (pp. 89–127). New York: Springer.
Trainor, L. J., & Hannon, E. E. (2013). Musical development. In D. Deutsch (Ed.), The psychology of
music (3rd ed., pp. 423–497). London: Academic Press.
Trainor, L. J., & He, C. (2013). Auditory and musical development. In P. D. Zelazo (Ed.), The Oxford
handbook of developmental psychology, Vol. 1: Body and mind (pp. 310–337). Oxford: Oxford
University Press.
Trainor, L. J., Marie, C., Gerry, D., Whiskin, E., & Unrau, A. (2012). Becoming musically
enculturated: Effects of music classes for infants on brain and behavior. Annals of the New York
Academy of Sciences 1252, 129–138.
Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants’ and adults’ sensitivity to western
musical structure. Journal of Experimental Psychology: Human Perception and Performance
18(2), 394–402.
Trainor, L. J., & Trehub, S. E. (1994). Key membership and implied harmony in Western tonal
music: Developmental perspectives. Perception & Psychophysics 56(2), 125–132.
Trainor, L. J., & Unrau, A. (2012). Development of pitch and music perception. In L. Werner, R. R.
Fay, & A. N. Popper (Eds.), Springer handbook of auditory research: Human auditory
development (pp. 223–254). New York: Springer.
Trainor, L. J., & Zatorre, R. J. (2015). The neurobiology of musical expectations from perception to
emotion. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology
(2nd ed., pp. 285–306). Oxford: Oxford University Press.
Trehub, S. E., & Thorpe, L. A. (1989). Infants’ perception of rhythm: Categorization of auditory
sequences by temporal structure. Canadian Journal of Psychology/Revue canadienne de
psychologie 43(2), 217–229.
Trehub, S. E., & Trainor, L. J. (1998). Singing to infants: Lullabies and play songs. Advances in
Infancy Research 12, 43–78.
Trehub, S. E., Unyk, A. M., Kamenetsky, S. B., Hill, D. S., Trainor, L. J., Henderson, J. L., & Saraza,
M. (1997). Mothers’ and fathers’ singing to infants. Developmental Psychology 33(3), 500–507.
Trehub, S. E., Unyk, A. M., & Trainor, L. J. (1993). Adults identify infant-directed music across
cultures. Infant Behavior and Development 16(2), 193–211.
Tunçgenç, B., & Cohen, E. (2016). Movement synchrony forges social bonds across group divides.
Frontiers in Psychology 7, 782. Retrieved from https://doi.org/10.3389/fpsyg.2016.00782
Tunçgenç, B., Cohen, E., & Fawcett, C. (2015). Rock with me: The role of movement synchrony in
infants’ social and nonsocial choices. Child Development 86(3), 976–984.
Valdesolo, P., & DeSteno, D. (2011). Synchrony and the social tuning of compassion. Emotion 11(2),
262–266.
Valdesolo, P., Ouyang, J., & DeSteno, D. (2010). The rhythm of joint action: Synchrony promotes
cooperative ability. Journal of Experimental Social Psychology 46(4), 693–695.
van Ede, F., Niklaus, M., & Nobre, A. C. (2017). Temporal expectations guide dynamic prioritization
in visual working memory through attenuated α oscillations. Journal of Neuroscience 37(2), 437–
445.
Van Noorden, L., & De Bruyn, L. (2009). The development of synchronization skills of children 3 to
11 years old. In Proceedings of ESCOM—7th Triennial Conference of the European Society for the
Cognitive Sciences of Music. Jyväskylä, Finland: University of Jyväskylä.
Volman, M. C. J., & Geuze, R. H. (1998). Relative phase stability of bimanual and visuomanual
rhythmic coordination patterns in children with a developmental coordination disorder. Human
Movement Science 17(4–5), 541–572.
Warneken, F., & Tomasello, M. (2006). Altruistic helping in human infants and young chimpanzees.
Science 311(5765), 1301–1303.
Warneken, F., & Tomasello, M. (2007). Helping and cooperation at 14 months of age. Infancy 11(3),
271–294.
Warneken, F., & Tomasello, M. (2009). Varieties of altruism in children and chimpanzees. Trends in
Cognitive Sciences 13(9), 397–402.
Watemberg, N., Waiserberg, N., Zuk, L., & Lerman-Sagie, T. (2007). Developmental coordination
disorder in children with attention-deficit-hyperactivity disorder and physical therapy intervention.
Developmental Medicine & Child Neurology 49(12), 920–925.
Weimerskirch, H., Martin, J., Clerquin, Y., Alexandre, P., & Jiraskova, S. (2001). Energy saving in
flight formation. Nature 413(6857), 697–698.
Werker, J. F., & Tees, R. C. (2005). Speech perception as a window for understanding plasticity and
commitment in language systems of the brain. Developmental Psychobiology 46(3), 233–251.
Wieland, E. A., McAuley, J. D., Dilley, L. C., & Chang, S. E. (2015). Evidence for a rhythm
perception deficit in children who stutter. Brain & Language 144, 26–34.
Williams, H. G., Woollacott, M. H., & Ivry, R. (1992). Timing and motor control in clumsy children.
Journal of Motor Behavior 24(2), 165–172.
Williams, J., Thomas, P. R., Maruff, P., Butson, M., & Wilson, P. H. (2006). Motor, visual and
egocentric transformations in children with developmental coordination disorder. Child: Care,
Health and Development 32(6), 633–647.
Wilmut, K., & Wann, J. (2008). The use of predictive information is impaired in the actions of
children and young adults with developmental coordination disorder. Experimental Brain Research
191(4), 403–418.
Wilson, P. H., Ruddock, S., Smits-Engelsman, B., Polatajko, H., & Blank, R. (2013). Understanding
performance deficits in developmental coordination disorder: A meta-analysis of recent research.
Developmental Medicine & Child Neurology 55(3), 217–228.
Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science 20(1), 1–
5.
Winkler, I., Háden, G. P., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the
beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471.
Wolff, P. H. (2002). Timing precision and rhythm in developmental dyslexia. Reading and Writing
15(1–2), 179–206.
Woolhouse, M., Tidhar, D., Demorest, S., Morrison, S., & Campbell, P. (2010). Group dancing leads
to increased person-perception. In Proceedings of the 11th International Conference on Music
Perception and Cognition (pp. 605–608). Seattle, WA: University of Washington.
Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010).
The development of perceptual grouping biases in infancy: A Japanese–English cross-linguistic
study. Cognition 115(2), 356–361.
Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor
interactions in music perception and production. Nature Reviews Neuroscience 8(7), 547–558.
Zentner, M., & Eerola, T. (2010). Rhythmic engagement with music in infancy. Proceedings of the
National Academy of Sciences 107(13), 5768–5773.
Zwicker, J. G., Missiuna, C., & Boyd, L. A. (2009). Neural correlates of developmental coordination
disorder: A review of hypotheses. Journal of Child Neurology 24(10), 1273–1281.
CHAPTER 25

MUSIC AND THE AGING BRAIN

LAURA FERRERI, ALINE MOUSSARD, EMMANUEL BIGAND, AND BARBARA TILLMANN

Jonathan Swift wrote in Thoughts on Various Subjects: “Every man desires to live long, but no man would be old.” Getting older often carries
with it a series of physical, cognitive, emotional, and social troubles, which
research in psychology and neuroscience is constantly trying to reduce.
One of the main topics in research on aging concerns cognitive decline,
namely the decreased cognitive functioning associated with increasing age
in the adult portion of the lifespan (Salthouse, 2016). Healthy older adults
usually perform worse than young adults in numerous cognitive tasks,
especially when they involve memory processes and executive functions
(e.g., Buckner, 2004; Van Petten et al., 2004). These behavioral outcomes
have been linked to changes in brain structure and function, such as the
shrinkage of cerebral regions (Raz et al., 2005), prefrontal reorganization
(Cabeza, 2002; Cabeza, Anderson, Locantore, & McIntosh, 2002), as well
as micro- and macro-structural alterations of brain connectivity (Fjell,
Sneve, Grydeland, Storsve, & Walhovd, 2017). Furthermore, normal aging can be associated with difficulties in emotional and social domains, often reflected in late-life depression (Aziz & Steffens, 2013; Meltzer et al., 1998) and in social isolation and disconnectedness (Cornwell & Waite, 2009).
These problems become particularly difficult to address in pathological aging (i.e., dementia), where severe cognitive, motor, and emotional impairments dramatically affect patients’ and caregivers’ lives. Therefore, in a rapidly aging society (e.g., Alzheimer’s Disease International, 2015), it has become urgent to investigate the mechanisms promoting successful aging and to identify interventions that can prevent, limit, or rehabilitate cognitive and emotional impairments.
In this framework, music emerges as a particularly promising stimulus. Because it can stimulate the whole brain, modulating cerebral activity in areas involved in cognitive, motor, and emotional processes (Zatorre, 2005), music is increasingly considered a powerful tool for improving cognition while promoting well-being and social connection. Furthermore, the use of music for brain stimulation seems particularly appropriate in older adults, who can perform similarly to younger adults in music perception tasks (Halpern, Bartlett, & Dowling, 1995, 1998; Johnson et al., 2011) and who show well-preserved musical memory, even in cases where episodic memory is impaired (Baird & Samson, 2009; Cuddy, Sikka, & Vanstone, 2015).
In this chapter, we review the main studies on music and aging to present the beneficial effects of music on cognition and well-being. To this aim, we first focus on the effects of music in normal aging, considering both musical expertise and simple musical exposure. A specific section is then devoted to the underlying brain processes. Finally, we consider music-based therapeutic approaches in pathological aging.

NORMAL AGING: MUSIC FOR COGNITIVE PROCESSES AND WELL-BEING

Music is a complex stimulus capable of modifying cognitive function and emotional status. Here, we consider (1) the positive effects of music on
cognition in normal aging, contrasting different levels of musical activity,
such as musical expertise, short-term training, and passive musical
exposure, and (2) the effects of music on emotion and well-being. We then
discuss the potential neural substrates of such effects.
Cognition
The Effect of Musical Expertise
Music is particularly effective at promoting neural plasticity, namely the unique ability of the human brain to modify its structure and function throughout the life course in response to changes within the body or in the external environment. Musical expertise strongly promotes plastic changes in the brain (Dalla Bella, 2016), and several studies comparing young adult musicians to non-musicians have demonstrated a positive effect of music-driven brain plasticity on many cognitive processes (see Wan & Schlaug, 2010, for a review). Encouraged by these promising findings in younger adults, researchers have more recently started to explore whether, and to what extent, these positive effects of musical practice can also be observed in the elderly, and thus whether practicing music could contribute to reducing typical age-related cognitive decline.
Remarkably, older musicians have been shown to perform better than non-musicians in several behavioral tasks involving speech perception (Parbery-Clark, Strait, Anderson, Hittner, & Kraus, 2011; Parbery-
(Parbery-Clark, Strait, Anderson, Hittner, & Kraus, 2011; Parbery-Clark,
Anderson, Hittner, & Kraus, 2012), working memory (Amer, Kalender,
Hasher, Trehub, & Wong, 2013; Hanna-Pladdy & Gajewski, 2012; Parbery-
Clark et al., 2011), long-term memory (Gooding, Abner, Jicha, Kryscio, &
Schmitt, 2014), language (Hanna-Pladdy & MacKay, 2011), visuospatial
processing (Hanna-Pladdy & Gajewski, 2012), and executive functions,
such as planning, fluency, inhibition, and switching abilities (Amer et al.,
2013; Hanna-Pladdy & Gajewski, 2012; Moussard, Bermudez, Alain, Tays,
& Moreno, 2016; Parbery-Clark et al., 2011; Zendel & Alain, 2012).
These behavioral differences are supported by differences at the neural level. For example, Moussard et al. (2016) recently showed that during a go/no-go task, older musicians performed better than older non-musicians in response inhibition and showed increased amplitudes of the N2 and P3 components, electrophysiological signatures typically associated with cognitive control processes in this type of task.
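To make such ERP measures concrete, the sketch below shows how N2 and P3 mean amplitudes could in principle be quantified from go/no-go EEG epochs using the open-source MNE-Python library. It is a minimal illustration, not the pipeline of Moussard et al. (2016); the file name, trigger codes, electrode, and time windows are all hypothetical assumptions.

import mne

# Hypothetical recording and trigger codes, for illustration only.
raw = mne.io.read_raw_fif("gonogo_raw.fif", preload=True)
raw.filter(0.1, 30.0)  # band-pass range typical for ERP analyses
events = mne.find_events(raw)
epochs = mne.Epochs(raw, events, event_id={"go": 1, "nogo": 2},
                    tmin=-0.2, tmax=0.8, baseline=(None, 0), preload=True)

# Average the no-go trials, where response inhibition is required.
nogo = epochs["nogo"].average()

# Mean amplitude at a fronto-central electrode in illustrative time windows.
for component, (t0, t1) in {"N2": (0.20, 0.35), "P3": (0.35, 0.60)}.items():
    amplitude = nogo.copy().pick("FCz").crop(t0, t1).data.mean()
    print(f"{component} mean amplitude: {amplitude * 1e6:.2f} microvolts")

Group comparisons (e.g., older musicians versus non-musicians) would then be run on such per-participant amplitudes.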
The observed beneficial effects of musical expertise depend on several factors, such as the onset of musical training, proficiency, and the amount of formal training or current practice, which may have specific influences on
different aspects of cognition. For example, Hanna-Pladdy and MacKay (2011) analyzed the predictors of musicians’ cognitive performance using stepwise multiple regression and reported that the onset of training predicted visual memory performance (both immediate and delayed). Current musical activity was more closely linked to switching abilities, while total years of practice were linked to delayed visual memory performance and visuomotor speed. When comparing three groups of older
adults with low, medium, and high musical expertise (defined by a
questionnaire assessing music theory knowledge), Gooding et al. (2014)
found that the high proficiency group performed significantly better in an
episodic memory task than the group with lower musical proficiency.
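For readers less familiar with the stepwise regression approach mentioned above, the sketch below illustrates a simple forward selection procedure in Python; the dataset, variable names, and p-value entry criterion are hypothetical assumptions, not the published analysis of Hanna-Pladdy and MacKay (2011).

import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per musician, a cognitive outcome plus
# training-history predictors.
df = pd.read_csv("older_musicians.csv")
outcome = "delayed_visual_memory"
candidates = ["training_onset_age", "total_years_practice", "current_practice_hours"]

selected = []
while True:
    remaining = [v for v in candidates if v not in selected]
    if not remaining:
        break
    # Refit the model with each remaining predictor added in turn,
    # and keep the one with the smallest p-value.
    pvals = {}
    for var in remaining:
        X = sm.add_constant(df[selected + [var]])
        pvals[var] = sm.OLS(df[outcome], X).fit().pvalues[var]
    best = min(pvals, key=pvals.get)
    if pvals[best] >= 0.05:  # conventional entry threshold
        break
    selected.append(best)

print("Predictors retained:", selected)

The order of entry and the final set of retained predictors indicate which aspects of training history carry independent predictive value for a given cognitive outcome.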
Furthermore, formal musical training in early- to mid-life might promote neural plasticity and lead to structural changes that potentially improve cognitive performance over time. This hypothesis is supported by brain findings showing that even a moderate amount of music training (4 to 14 years) is associated with neural enhancement during speech perception many decades after training has stopped, indicating that musical practice during childhood and young adulthood may carry relevant biological benefits into older adulthood (White-Schwoch, Carr, Anderson, Strait, & Kraus, 2013).
These beneficial effects of musical training across the lifespan in healthy individuals raise the question of whether such benefits might also affect the development of neurodegenerative diseases typical of pathological aging, such as Alzheimer’s disease (AD). According to the concept of cognitive reserve (Stern, 2002, 2009), stimulating life experiences (e.g., educational attainment, occupational activity, leisure activities, social networks) are
associated with higher resilience to age-related brain diseases and to
reduced risks of developing dementia. As music is known to be cognitively
stimulating and to promote social integration, it is reasonable to think that it
could contribute to building such reserve throughout the life course. In line
with these predictions, Wilson and colleagues (Wilson, Boyle, Yang, James,
& Bennett, 2015) found that musical training during youth was associated
with lower risk of developing mild cognitive impairment (MCI) in a cohort
of 964 older adults. Interestingly, similar benefits were found for high levels
of foreign language instruction (see Craik, Bialystok, & Freedman, 2010).
An interesting approach linking music to cognitive reserve comes from twin
studies. When comparing twins (both monozygotic and
dizygotic) who played a musical instrument in older adulthood to their non-
musician co-twins, Balbag and colleagues (Balbag, Pedersen, & Gatz,
2014) found that the musician twins were less likely to develop dementia
and cognitive impairment, supporting the idea that music may act as a
protective factor against normal and pathological cognitive decline. In the
same vein, Verghese et al. (2003) showed that frequent participation in musical activities was associated with a decreased risk of dementia (but see also Verghese et al., 2006, for more mixed results regarding the risk of MCI, where only the frequency of participation in leisure activities, and not the type of activity per se, was associated with beneficial outcomes).
Although the findings described above appear quite promising, we must note that the interpretation of differences between musicians and non-musicians, as assessed in correlational designs, is always subject to
several limitations: (1) The actual contributions of “nature and nurture” to
both musical skills and cognitive differences between musicians and non-
musicians, and thus the “cause or consequence” pattern linking music
practice and cognition, need to be clarified. It could be, for example, that
individuals who started musical training had higher levels of cognitive
functioning already at baseline, and this could, at least in part, explain
cohort differences. In the same vein, it is also possible that those who keep playing music at an older age do so because they have better-preserved cognitive abilities, and thus can better handle music practice. (2) People practicing leisure activities, such as music, usually report a more socially integrated lifestyle and/or a higher education level and socio-economic status, which are confounding variables that can crucially modulate the aging process. (3) Cross-sectional studies provide little information about the minimum amount of musical practice needed to observe a positive effect of musical training on cognition, which in turn limits the recommendations that can be offered to adults who have not practiced music earlier in life: in other words, is it still worth starting musical practice in late adulthood? In sum, to provide causal evidence for music-induced benefits in the older adult population, intervention or longitudinal studies become necessary. Although specifically pinpointing the long-lasting effect of musical training across the entire life course would require a huge (and hardly feasible) longitudinal study, intervention studies measuring the effects of short-term musical practice bring highly valuable insights into music-induced changes.

The Effect of Short-Term Musical Training


Studies on children and young adults have shown that even short-term
musical practice can modulate behavioral outcomes and brain
structures. For example, adult non-musicians who learned to play a five-
finger sequence on a keyboard over only five days (Pascual-Leone, 2001),
or who were trained to play piano and read musical notation over fifteen
weeks (Stewart et al., 2003), showed better behavioral performance (such as
reading music and playing the keyboard) together with cortical
reorganization (e.g., increased activation of the superior parietal cortex,
critical for sensorimotor integration).
Importantly, although brain plasticity mechanisms are reduced in older
populations, environmental enrichment can strongly modulate and slow
down this reduction (see Mora, Segovia, & Del Arco, 2007). Accordingly,
short-term musical interventions have also led to beneficial effects in older
adult populations. Bugos and colleagues (Bugos, Perlstein, McCrae,
Brophy, & Bedenbaugh, 2007) found that six months of piano lessons (a 30-
min lesson and three hours of practice per week) were enough to
significantly improve performance on cognitive tests involving motor
skills, executive functions, and working memory in a group of non-
musician adults aged between 60 and 85, compared to a non-active control
group. Similar results were obtained by Seinfeld and colleagues (Seinfeld,
Figueroa, Ortiz-Gil, & Sanchez-Vives, 2013) after four months of daily
piano training. In this study, an effort was made to quantify the leisure activities
practiced by individuals in the control group (e.g., physical activity,
computer use, language or painting lessons, etc.), ensuring that they too
received some degree of stimulation; this suggests that adding music
training to people’s regular activities may make a difference. Furthermore,
in line with previous findings, results from Alain and colleagues (Alain et
al., 2019) show that three months of music lessons involving rhythmic
training with percussive instruments, music theory, and singing were
sufficient to modulate the electrophysiological response during
various cognitive tasks in older adults.
It is, however, important to mention some methodological limitations of
those studies. (1) Sample sizes are quite small (e.g., experimental group n =
16, Bugos et al., 2007; n = 13, Seinfeld et al., 2013) and participants have
particularly high levels of education, which does not necessarily allow the
results to be generalized to the whole population of older adults. (2)
Appropriate control conditions (i.e., active controls with a similar amount or
frequency of activities), as well as random assignment, would be
required to draw solid conclusions about the effects of such training
programs. (3) Long-term follow-up would be informative about the long-
lasting effects of such interventions, and crucial for providing evidence of
music-driven reductions in age-related decline over time and in dementia
prevalence.
Despite these limitations, the findings are promising in suggesting
that even short-term musical training, in the form of intensive training
programs requiring motor, multisensory, and cognitive integration, may be
able to induce changes at behavioral and neural levels, and to strengthen
cognition in the elderly. This is particularly encouraging, as it suggests that
music could potentially contribute to the development of cognitive reserve
through late-life (and not only lifelong) stimulation.

The Effect of Passive Musical Exposure


Another set of studies has shown that simply listening to music can also
temporarily enhance cognition. For example, simple passive exposure to
background music during cognitive tasks has been shown to improve older
adults’ cognitive processes, such as word fluency (Thompson, Moulin,
Hayre, & Jones, 2005), processing speed (Bottiroli, Rosi, Russo, Vecchi, &
Cavallini, 2014), working memory (Mammarella, Fairfield, & Cornoldi,
2007), and declarative memory (Bottiroli et al., 2014; Ferreri et al., 2014)
(see El Haj, Omigie, & Clément, 2014, for an exception). Bottiroli et al.
(2014) measured declarative memory and processing speed in a group of
older adults listening to negatively and positively valenced classical music
(compared to control conditions of silence and white noise). Their results
showed that processing speed was better with a musical background of
positive valence, and that declarative memory benefited from both negative
and positive background music when compared to the control conditions. In
a study investigating episodic memory and prefrontal cortex modulation by
music, Ferreri et al. (2014) found that encoding words with a musical (i.e.,
instrumental jazz/blues) background (versus silence) resulted in higher
memory performance and less engagement of the prefrontal cortex. The
authors suggested that music may act as a facilitating factor for memory
performance by easing the verbal encoding and disengaging the prefrontal
cortex, whose impairments in older adults are usually related to episodic
memory deficits. In contrast to the studies presented above, in which musical
activity is thought to strengthen cognitive processes, cognitive enhancement
during or following music listening is considered a temporary boost
in the efficiency of brain processes in general, probably due to
global arousal and emotional effects (see the section “The Underpinning Brain
Mechanisms”).
These results, together with those on musical training discussed
earlier, lead to three main conclusions: (1) lifelong musical expertise seems
to reduce cognitive decline, possibly by contributing to building some form
of cognitive reserve; (2) musical interventions in older age can
improve cognitive functions even in the absence of prior musical expertise,
suggesting a positive effect of late-life training; (3) as both active music
training and passive music listening can result in improved cognitive
performance, it is reasonable to think that music engages numerous and
diverse neural mechanisms. The next two sections present studies
investigating the effects of music on non-cognitive domains, such as emotions
and well-being, as well as the potential brain mechanisms implicated in
the cognitive and emotional effects of music in the elderly.

Emotions and Well-Being


Difficulties in emotional and social spheres, which can lead to depression
and loneliness, threaten older adults’ well-being (see Luanaigh & Lawlor,
2008). These difficulties might be due to intense life changes (e.g.,
retirement, bereavement), cognitive and physical decline, hormonal
changes, or changes in social relationships. Music has the tremendous
power to evoke strong emotions and intense pleasure, thus modulating
subjective mood and arousal (see Koelsch, 2014). Emotions evoked by
music are associated with physiological responses (such as changes in skin
conductance and heart rate, or hormone release) and with brain activation in the
mesolimbic system (see Zatorre & Salimpoor, 2013). Furthermore, musical
activities often serve social functions, promoting social contact, empathy,
cooperation, and a sense of belonging with others (Koelsch, 2014). Critically,
the influence of music in everyday life (Cohen, Bailey, & Nilsson, 2002;
Laukka, 2007), as well as its emotional impact (Saarikallio, 2011), seems to
remain highly important across the lifespan and across cultures (Grau-Sánchez
et al., 2017).
Older adults report that music helps them connect with other
people, develop self-identity, and increase self-esteem, thereby improving
quality of life and decreasing feelings of isolation and loneliness (see Grau-
Sánchez et al., 2017). Furthermore, music, as an emotional stimulus, can be
used as an aid for expressing and experiencing spirituality and escaping
from everyday life through imagination or the evocation of
autobiographical memories (Hays & Minichiello, 2005). A questionnaire
administered to older adults living in Sweden (Laukka, 2007) highlighted
that positive emotions were among the most frequently felt emotions in
response to music. Furthermore, participants reported that they listened to
music in response to psychological needs, such as emotional functions (e.g.,
mood regulation, relaxation, pleasure), and needs related to issues of
identity, belonging, and agency. Notably, both positive emotions and
satisfaction of psychological needs are crucial factors for well-being (Deci
& Ryan, 2000; Diener, Oishi, & Lucas, 2003).
In support of the hypothesis that music can modulate arousal and
improve quality of life, Lai and Good (2005) showed that listening to forty-
five minutes of relaxing music at bedtime improves sleep quality, duration,
and efficiency, thus reducing daytime dysfunction in a group of older
adults with sleeping disorders (see also Luanaigh & Lawlor, 2008). In line
with these findings, Chan and colleagues (Chan, Chan, Mok, Tse, & Yuk,
2009) found that listening to music before going to sleep significantly
reduces older adults’ depression scores, together with their heart rate, blood
pressure, and respiratory rate. However, these studies usually
compare a musical intervention with an untreated control group; the
absence of a proper control (i.e., testing another type of active intervention)
may therefore limit the interpretation of the results.
Beyond music listening, practicing a musical activity can significantly
contribute to health and emotional benefits (Hallam & Creech, 2016; see
also Menec, 2003). Piano lessons in older non-musicians (Seinfeld et al.,
2013) and participation in community choirs (Kreutz, Bongard, Rohrmann,
Hodapp, & Grebe, 2004; Lamont, Murray, Hale, & Wright-Bevans, 2018)
resulted in lower levels of depression, as well as increased positive mood,
quality of life, and social interaction, when compared to other leisure
activities (e.g., painting or physical exercise) or to participants’ subjective state
before the musical intervention.
In sum, music can strongly modulate not only cognitive performance,
but also emotions, mood, and arousal in older adults, thus improving well-
being and social connection. This calls for a deeper understanding of the
brain mechanisms involved in the positive effects of music in aging.

The Underpinning Brain Mechanisms


The studies on cognition and emotion reviewed above suggest that the positive
effect of music on aging might be due to the stimulation of numerous brain
processes, and several mechanisms may explain the observed benefits. Here,
we examine several possible and interconnected explanations. First, we
focus on the overlap between musical and non-musical activities and the
idea that the creation or strengthening of shared networks promotes so-called
“transfer effects” of music practice on non-musical tasks. In the context of
aging, this hypothesis will be linked to the concept of cognitive reserve.
The second hypothesis focuses on music-induced physiological effects,
that is, mood and reward modulations due to the emotional or arousing aspects
of music.
Musical and non-musical activities share cognitive processes. For
example, an fMRI study from Sammler et al. (2010) revealed that speech
and songs are processed at varying degrees of integration along the axis of
the superior temporal sulcus and gyrus and the precentral gyrus, indicating
that language and music share numerous features reflected in the observed
overlap in brain circuitry (see also Patel, 2011; Patel, Gibson, Ratner,
Besson, & Holcomb, 1998; Tillmann, 2012 for similarities between music
and language). Similarly, brain regions activated by rhythmic auditory
stimulation (e.g., music with regular beats), such as the basal ganglia and
the cerebellar-thalamocortical networks, are also crucial for the control of
motor functioning (Dalla Bella, Benoit, Farrugia, Schwartze, & Kotz,
2015). The observation that several brain regions involved in music
perception and production are also involved in other functions may explain
the positive transfer of training-related benefits to non-musical domains,
such as movement, language, memory, and executive functions (see
Schellenberg, 2003). This becomes of particular interest in aging, where
being engaged in stimulating activities throughout the lifespan contributes
to building cognitive reserve. Cognitive reserve has been proposed as a
model to explain inter-individual differences in the severity of cognitive
aging and clinical dementia (Whalley, Deary, Appleton, & Starr, 2004) and
is thought to act through two main mechanisms: (1) strengthening existing
networks, making them more robust and resilient to age-related disruptions;
(2) increasing brain flexibility and facilitating the recruitment of new or
alternative networks, thus allowing compensatory mechanisms to achieve a
task despite a potential disruption. Based on studies reviewed above, it is
reasonable to think that music could promote cognitive reserve through
these two mechanisms and thus act as a “neuroprotector” (Omigie &
Samson, 2014). This is supported, for example, by findings revealing that
musicians, in comparison to non-musicians, show less of the typical age-
related brain volume reductions in dorsolateral prefrontal cortex and left
inferior frontal gyrus (Sluming et al., 2002). Such hypotheses could also
explain the behavioral advantage shown by musicians in cognitive tasks.
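In this literature, reserve is often tested statistically as a moderation effect: a reserve proxy (such as years of musical training or education) is examined for whether it attenuates the impact of brain pathology on cognition. As a purely illustrative sketch, and not the specific model of any study cited here, such an analysis can be written as

$$\text{Cognition} = \beta_0 + \beta_1\,\text{Pathology} + \beta_2\,\text{Reserve} + \beta_3\,(\text{Pathology} \times \text{Reserve}) + \varepsilon,$$

where a negative β1 captures the cognitive cost of age-related pathology and a positive interaction term β3 indicates that higher reserve flattens that cost; such an interaction is the statistical signature of the protective effect hypothesized for musical activity.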
A second explanation focuses on music-related changes in emotions,
physiological arousal, and pleasure, usually associated with increased
activation in the core emotion network, including the amygdala and hippocampus,
and in mesolimbic striatal regions associated with dopamine release, especially the
nucleus accumbens (Salimpoor et al., 2013). Modulations of activation in
these regions can be associated with emotional and cognitive changes. Indeed,
music promotes positive emotions and modulates levels of relaxation,
pleasure, and motivation, thus increasing well-being and social connection
(see the section “Emotions and Well-Being” above). Modulations of arousal,
such as reduced levels of anxiety and agitation, can also enhance attention
mechanisms crucial for cognitive performance (Peck, Girard, Russo, &
Fiocco, 2016). Likewise, as emotionally valenced stimuli are usually easier
to remember (see Christianson, 2014), the high emotional intensity driven by
music may promote the encoding and retrieval of to-be-remembered
information, thus resulting in increased memory performance even in the
elderly (see Jäncke, 2008). Furthermore, dopamine transmission in the
mesolimbic reward system promotes memory formation in the
hippocampus through the loop connecting the ventral tegmental area/substantia
nigra and the hippocampus (Lisman, Grace, & Duzel, 2011), and could therefore
drive the music-related increase in performance on learning and memory
tasks (Ferreri & Rodriguez-Fornells, 2017).
Interestingly, the stimulating and emotional power of music may
induce physiological changes at the cellular level and promote brain plasticity (see
Dalla Bella, 2016; Wan & Schlaug, 2010), suggesting that music can
not only strengthen and protect existing networks, but also stimulate and
reorganize them through plastic changes. This is particularly relevant when
considering the decline of plasticity across the lifespan (Stiles,
2000). Furthermore, as plastic changes are related to neural myelination
(Gibson et al., 2014), it may be that music stimulation increases and
preserves not only gray matter volume, but also brain connectivity, which is
usually impaired by aging processes (Fjell et al., 2017).
In sum, the benefits of music in aging could be explained by some
overlap in brain resources and by changes in mood and arousal that promote the
strengthening of existing networks and stimulate brain plasticity. This
ultimately points to music as a powerful tool against aging-related
emotional and cognitive impairments. Although further research is needed
to specifically pinpoint all the brain mechanisms explaining the positive
effect of music on the aging brain, fundamental research in neuroscience
has substantially increased our understanding of music-related changes at
behavioral and neural levels. The observation that diverse music-related
neural processes are involved in aging suggests that musical interventions
can be employed to stimulate a broad range of impaired functions in aging.
The time is ripe to further investigate and develop new perspectives for
neuroscience-informed rehabilitation techniques.

Pathological Aging: Neuroscience-Informed Therapeutic Rehabilitation through Music

Dementia refers to progressive and irreversible neurodegenerative brain damage that constitutes the most frequent form of pathological aging. It
represents one of the greatest health, social, and economic challenges of our
time, with 46 million people living with dementia worldwide, and a
projection of over 131 million people in 2050 (Alzheimer’s Disease
International, 2015). Numerous musical interventions in the clinical field
have focused on dementia patients, in particular when suffering from AD,
the most common form of dementia. A relevant factor justifying musical
interventions in dementia is related to the fact that abilities such as music
perception and musical memory may be relatively spared in patients
suffering from dementia (see Baird & Samson, 2009, 2015 for reviews). For
example, a recent study on music perception reported that AD patients often
perform as well as healthy controls in basic perceptual skills, such as
temporal and timbre processing, musical scene analysis or tune recognition
(Golden et al., 2017). In addition, musical memory has been shown to be
spared in dementia patients despite severe memory deficits, for example
allowing patients with severe AD to learn and recall new songs (Baird,
Umbach, & Thompson, 2017b). Therefore, music remains accessible to
most patients regardless of cognitive integrity, and thus constitutes a unique
stimulation tool, in addition to being an especially appropriate means of
communication between patients and caregivers (Ogay, 1995). Several
music-based therapeutic approaches have been employed, depending on the
stage of dementia and the therapeutic goals. For example, music has been
used as a mnemonic supporting the encoding or retrieval of information
(early stage), as reminiscence therapy for improving episodic memory
(early to moderate stage), as an arousing/calming approach for apathy, anxiety,
or aggressiveness (moderate to severe stage), and as a unique and rare way
to communicate with patients (severe stage). Here, in light of the brain
mechanisms previously considered, and focusing specifically on dementias,
we review and discuss music-based therapeutic
approaches according to different rehabilitative goals: memory, language,
movement, and emotions and well-being.

Memory
Because of degeneration in the medial temporal and prefrontal lobes, encoding
and retrieving information is among the most affected abilities in AD
dementia, and is of particular interest for music interventions. Several findings
revealed that dementia patients with severe memory deficits can show a
surprisingly robust musical memory (see Baird & Samson, 2015). A
possible neuroanatomical explanation is that brain regions specifically
involved in musical memory, such as the caudal anterior cingulate and the
ventral pre-supplementary motor area, are also regions relatively spared in
the first stages of AD (Jacobsen et al., 2015). The question thus arises as to
whether music can be used to promote the encoding and retrieval of non-
musical material in dementia patients. It has been shown that presenting
verbal information in a sung rather than spoken version improves its later
recognition at immediate (Simmons-Stern, Budson, & Ally, 2010;
Simmons-Stern et al., 2012) or delayed (i.e., four weeks; Moussard, Bigand,
Belleville, & Peretz, 2012, 2014a) recall at a mild stage of AD. This
supports the hypothesis that music may act as a good anchor point for
verbal information, in turn enhancing its retrieval (see Ferreri & Verga,
2016). Different results come from Baird and colleagues (Baird, Samson,
Miller, & Chalmers, 2017a), who observed no beneficial effect of the sung
versus spoken modality on subsequent immediate (30 min) or
delayed (24 hours) recall. These contrasting findings might be due to
experimental differences. As discussed by Baird et al. (2017a), using only
one learning session, rather than several learning sessions over weeks
(Moussard et al., 2012, 2014a), may have prevented the potential
benefit of music from emerging. Thus, music could represent an efficient
mnemonic for learning or relearning information in the early stages of AD, but only if
sufficient time and learning trials are allocated to encoding the
information.
Other findings based on non-verbal memory tasks support music-related
memory improvement in dementia. In another study by Moussard and
colleagues (Moussard, Bigand, Belleville, & Peretz, 2014b), mild AD
participants were asked to learn a sequence of gestures in synchrony (i.e.,
shadowing the experimenter) or not (i.e., observing the experimenter),
performed with music or a metronome beat. Results showed that AD
patients learned better in the music condition when tested at
immediate (but not delayed, i.e., 10-min) recall. Several studies also showed
that music facilitates the recall of autobiographical (i.e., personal) memories
in people suffering from AD. El Haj and colleagues (El Haj, Fasotti, &
Allain, 2012a; El Haj, Clément, Fasotti, & Allain, 2013) found higher
quality (in terms of speed of recall, content specificity, and grammatical
complexity) of autobiographical memories when recalled after having
listened to music. These studies compared music (selected by the patients or
by the experimenter) to a silent control condition. Critically, Foster and
Valentine (2001) showed that a beneficial effect of background sound on
autobiographical memory for recent (but not for remote) events was also
observed for cafeteria noise (instead of music). Hence, it could be argued
that it is not music itself, but rather more general auditory stimulation,
that drives the observed positive effect. However, in a further study, El Haj and
colleagues (El Haj, Postal, & Allain, 2012b) found better autobiographical
memories after exposure to patient-selected music compared to another
musical condition (i.e., Vivaldi’s Four Seasons), thus supporting the pivotal,
and probably particularly arousing, role of personally relevant music
(see also Lord & Garner, 1993). Remarkably, most of these
studies also reported an emotional component: in AD patients,
autobiographical memories triggered by preferred music were rated higher in emotional
content (El Haj et al., 2012a), with a prevalence of positive over negative
content (El Haj et al., 2012b), compared with memories retrieved during silence or
with music chosen by the experimenter. Consistent with the hypothesis that
music-driven modulations of emotion can enhance memory (Jäncke,
2008), these findings suggest that music may enhance autobiographical
recall by promoting positive emotional memories (El Haj et al., 2012b; see
Irish et al., 2006, for an alternative explanation based on anxiety reduction).

Language
While numerous studies have investigated the role of music in memory in
dementia, few investigations exist in the language domain. Findings from
studies of non-degenerative diseases related to brain trauma or
vascular problems, such as stroke, suggest that music can be a useful tool
for language rehabilitation. Melodic intonation therapy (MIT), a speech
therapy based on the potential rehabilitative effects of singing and rhythmic
movement, has been described as a valid intervention for improving post-
stroke aphasia in younger and older adults (e.g., Belin et al., 1996; Racette,
Bard, & Peretz, 2006; see Zumbansen, Peretz, & Hébert, 2014 for a
review). Consistent with the hypothesis that music stimulation may
enhance cognitive functions by promoting brain connectivity, Schlaug and
colleagues (Schlaug, Marchina, & Norton, 2009) showed an increased number
of fibers in, and increased volume of, the arcuate fasciculus when comparing
patients’ brains before and after MIT. Crucially, the arcuate fasciculus is a white
matter tract connecting several regions involved in language production,
such as the superior temporal lobe, premotor regions and the posterior inferior
frontal gyrus, and the primary motor cortex.
Although to our knowledge no study has explored MIT interventions in
dementia care, positive results of music interventions on language function
have also been found in AD patients. Brotons and Koger (2000) observed
that four sessions of music therapy (i.e., listening to songs related to a
conversation topic) significantly improved both speech content and fluency
in a group of ten AD patients. However, considering the small sample size
of the study and the lack of similar investigations, further research is needed
to better understand the effect of musical interventions on language in
dementia. This becomes particularly relevant when considering the
progressive loss of language functioning usually observed in dementias,
such as AD.

Motor Functions
Although findings on AD patients suggest that music may help motor
functions by enhancing gesture learning (Moussard et al., 2014b), the
investigation of motor rehabilitation through music mainly concerns the
study of patients with post-stroke motor impairment or Parkinson’s disease
(PD). PD is characterized by severe movement impairments (e.g.,
bradykinesia or akinesia, limb rigidity, and postural instability) that usually
cause gait disorders, such as small steps, lower cadence and reduced gait
speed, festination, and freezing (Dalla Bella et al., 2015). Several studies
clearly revealed that rhythm, and more specifically isochronous stimulation
through metronome or music (i.e., with its underlying beats), can
significantly improve gait (Thaut et al., 1996; see De Dreu, Van Der Wilk,
Poppe, Kwakkel, & Van Wegen, 2012 and Dalla Bella et al., 2015 for meta-
analysis and review). In particular, rhythmic auditory stimulation (i.e., when
PD patients are asked to walk along with the auditory cue) has been shown
to improve gait velocity, cadence, and stride length (e.g., McIntosh, Brown,
Rice, & Thaut, 1997; Thaut et al., 1996) and reduce freezing episodes (e.g.,
Arias & Cudeiro, 2010). Such improvements have been shown to persist
even in the absence of auditory stimulation following cueing-based
training programs, thus leading to long-lasting effects (see Dalla Bella et al.,
2015). Importantly, rhythmic auditory stimulation has been shown to be an
effective intervention for gait rehabilitation not only in PD patients, but also
in individuals with hemiparetic stroke (Thaut, McIntosh, & Rice, 1997; Thaut
et al., 2007). One possible explanation for this beneficial effect is
that external auditory cues, by promoting neural entrainment, generate
temporal expectations that allow the listener to predict the occurrence of the next event
(Jones, 1976; Large, 2008). Such rhythm-driven expectations can help
regularize and stabilize movement when the sensorimotor network
underlying temporal processing is impaired, as in PD or stroke patients, by
reinforcing compensatory neural networks able to enhance motor behavior
(Benoit et al., 2014; Nombela, Hughes, Owen, & Grahn, 2013).
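The entrainment account sketched above can be illustrated with a minimal phase-oscillator model in the spirit of Large’s (2008) resonance framework; the equation and coupling parameter below are illustrative assumptions rather than a model taken from the studies cited in this section. Let θ be the phase of an internal neural oscillator with intrinsic frequency ω, driven by a periodic auditory cue with phase θ_stim:

$$\frac{d\theta}{dt} = \omega + K \sin\!\big(\theta_{\text{stim}}(t) - \theta(t)\big), \qquad K > 0.$$

When the coupling strength K is sufficiently large relative to the mismatch between ω and the cue’s frequency, the internal oscillator phase-locks to the cue, so that its phase anticipates the timing of upcoming beats; it is this predictive locking that is thought to help compensate for impaired internal timing in PD or stroke.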

Emotions and Well-Being


While the impact of music on emotion and well-being is important in
normal aging, it becomes crucial in pathological aging. Problems such as
apathy, anxiety, depression, and agitation indeed pose hard challenges in
dementia care.
Apathy is a behavioral and psychological symptom of dementia.
Unlike depression, it is characterized by diminished initiative
and engagement in activities, and usually correlates with decreased
cognitive functioning (Levy et al., 1998) and dopamine transmission (David
et al., 2008). Consistent with the changes in physiological
arousal and dopamine transmission observed in response to music
stimulation, musical interventions have been shown to significantly reduce
apathy levels in patients with dementia (Holmes, Knights, Dean,
Hodkinson, & Hopkins, 2006). In the lay-audience documentary Alive
Inside, this effect is famously illustrated by the case of Henry, who seems
to suddenly come back to life after being exposed to his favorite jazz
music.
Several studies also showed that music interventions (such as listening
to music, singing, or playing instruments; see Ueda, Suzukamo, Sato, &
Izumi, 2013 for a review), especially if long-lasting (i.e., more than three
months), can significantly decrease levels of anxiety and depression in
people suffering from dementia (Clément, Tonini, Khatir, Schiaratura, &
Samson, 2012; Guetin et al., 2009; Janata, 2012; Narme et al., 2014;
Sakamoto, Ando, & Tsutou, 2013; Sung, Chang, Lee, & Lee, 2006;
Svansdottir & Snaedal, 2006), in turn improving their quality of life (see
also Stegemöller, 2017). For example, Sakamoto et al. (2013) found that
both passive and interactive (i.e., clapping, singing, and dancing) music
listening elicited positive emotional responses in severe AD patients (when
compared to a non-intervention control group of patients). However, as
previously discussed, a proper control condition is needed before an
actual music-specific positive effect can be claimed. Two studies compared musical
interventions to another pleasurable activity (i.e., cooking) in patients with
AD or mixed dementia (Clément et al., 2012; Narme et al., 2014). While
one study observed no differences in the short term, but a significantly more
positive emotional state for the music group in the long term (Clément et al.,
2012), the other study (Narme et al., 2014) revealed that both music and
cooking interventions led to positive changes in the patients’ behavior and
emotional state, also reducing caregiver distress. These results suggest that
the positive effect of music on emotions and arousal reduction might be
driven by the pleasantness of the activity rather than by the music itself (see
Samson, Clément, Narme, Schiaratura, & Ehrlé, 2015 for methodological
requirements for non-pharmacological clinical trials). This leads to two
main considerations: (1) In light of the hedonic impact on patients’ well-
being, it is important to note that music is probably one of the most
pleasurable activities for humans. As such, music could be considered a
special stimulus able to engage a broader range of the population than
other types of interventions. (2) As discussed by Narme et al.
(2014), the therapist’s preference for, and involvement in, the proposed interventions
play a crucial role and can affect the therapeutic outcome. This calls for the
employment of professional personnel (i.e., trained music therapists) able to
exploit the full potential of music interventions and adapt them to each
patient’s needs. As most studies investigating the effect of music on the
reduction of agitation compare musical interventions solely to
standard care (see Baird & Samson, 2015; Narme et al., 2014), the
existing research is probably not sufficient to claim a reliable effect of
music stimulation specifically, and calls for more controlled clinical trials.
Nevertheless, a strong link between music and well-being in dementia
seems to exist. Exploring the relationship between music and personal
identity, McDermott and colleagues (McDermott, Orrell, & Ridder, 2014)
recently collected interviews from patients with dementia, family carers,
staff, and music therapists. Their analyses resulted in a model, the
“psychosocial model of music in dementia,” in which music emerges as a
powerful and reliable stimulus for promoting self-identity and personal
connectedness, ultimately improving well-being (see Ogay, 1995 for music
therapy and personality in dementia patients; see also Norberg, Melin, &
Asplund 2003 and Castro et al., 2015 for music interventions in the final
stage of dementia and in patients with disorders of consciousness).

Conclusions and Future Directions

Research on normal and pathological aging has revealed that music is an interesting and powerful means of promoting cognition and well-being in
older adult populations: music constitutes an enjoyable and social activity,
accessible to anyone regardless of their background (e.g., educational
attainment, previous musical experience). The positive effect of music in
aging seems to rely on diverse, complex, and interacting neural processes
promoting brain plasticity, transfer, and compensatory mechanisms that
improve behavioral outcomes at emotional, cognitive, and social levels.
While numerous positive and encouraging results propose music as a
unique tool for preventing cognitive decline and rehabilitating deficits
related to neurodegenerative diseases, it is also worth mentioning that
relevant limitations and open questions arise from the existing literature.
As argued above, biases such as the lack of a proper control group or
condition and small sample sizes may affect the power and reliability of the
results. Inconsistencies in results across studies may also stem
from methodological differences related to the type of musical intervention
(e.g., active training, passive listening), its duration and frequency, as
well as the type of music employed (familiar, unfamiliar, selected by the
participant or by the experimenter). More experimental rigor and systematic
manipulations are therefore needed in further investigations, and would
help clarify the actual contribution of music (listening, practice) to the
observed beneficial effects. Accordingly, it will be critical for further
research to compare different types of music training with other stimulating
leisure activities, and to provide evidence for the minimal optimal dose (i.e.,
the type, number, and duration of musical interventions) required to
observe positive outcomes (perhaps depending on the targeted population,
i.e., healthy older adults or patients).
Furthermore, although numerous brain mechanisms are thought to
underlie the observed behavioral findings, only a few neuroimaging studies
have been conducted on music and aging. Further research investigating the
underlying neural mechanisms is therefore needed to better understand how
music acts on the aging brain, thus allowing the design of more fine-grained
musical interventions.
Clarifying these open questions may afford effective contributions not
only to the research community but also, and most importantly, to
society. Results on musical practice and its long-term effects in diminishing
cognitive decline indeed highlight the importance of facilitating music
activities across the lifespan. The hypothesis that musical training may act
as a neuroprotective factor would strongly support music classes in the
educational system (see Kraus et al., 2014; White-Schwoch et al., 2013), and
would offer a valid, pleasant, and affordable tool for
stimulation and for the prevention of neurodegenerative diseases in the elderly
population. Accordingly, results on the effectiveness of short-term musical
exposure and practice in older age, and of musical interventions in clinical
settings, endorse the integration of music activities and music-therapy
programs into healthcare as a means of hindering, or at least diminishing,
cognitive decline and decreasing pathological aging-related deficits.
In sum, music emerges as a powerful tool for improving cognitive functions
and well-being, acting on numerous brain processes in the older adult
population. Nevertheless, other leisure activities (e.g., learning a second
language, playing complex games, etc.) could also lead to similar positive
effects. It therefore remains essential for future research and clinical
applications to develop individualized protocols depending on patients’
needs (e.g., improvement of mood, memory, etc.) and personal preferences,
and to compare the different intervention domains.
References
Alain, C., Moussard, A., Singer, J., Lee, Y., Bidelman, G. M., & Moreno, S. (2019). Music and
visual art training modulate brain activity in older adults. Frontiers in Neuroscience 13.
Alzheimer’s Disease International (2015). World Alzheimer Report, 2015. The global impact of
Dementia 2015. An analysis of prevalence, incidence, costs and trends. London: Alzheimer’s
Disease International (ADI).
Amer, T., Kalender, B., Hasher, L., Trehub, S. E., & Wong, Y. (2013). Do older professional
musicians have cognitive advantages? PLoS ONE 8(8), e71630.
Arias, P., & Cudeiro, J. (2010). Effect of rhythmic auditory stimulation on gait in Parkinsonian
patients with and without freezing of gait. PLoS ONE 5(3), e9675.
Aziz, R., & Steffens, D. C. (2013). What are the causes of late-life depression? Psychiatric Clinics of
North America 36(4), 497–516.
Baird, A., & Samson, S. (2009). Memory for music in Alzheimer’s disease: Unforgettable?
Neuropsychology Review 19(1), 85–101.
Baird, A., & Samson, S. (2015). Music and dementia. Progress in Brain Research 217, 207–235.
Baird, A., Samson, S., Miller, L., & Chalmers, K. (2017a). Does music training facilitate the
mnemonic effect of song? An exploration of musicians and nonmusicians with and without
Alzheimer’s dementia. Journal of Clinical and Experimental Neuropsychology 39(1), 9–21.
Baird, A., Umbach, H., & Thompson, W. F. (2017b). A nonmusician with severe Alzheimer’s
dementia learns a new song. Neurocase 23(1), 36–40.
Balbag, M. A., Pedersen, N. L., & Gatz, M. (2014). Playing a musical instrument as a protective
factor against dementia and cognitive impairment: A population-based twin study. International
Journal of Alzheimer’s Disease 2014, 836748. doi:10.1155/2014/836748
Belin, P., Zilbovicius, M., Remy, P., Francois, C., Guillaume, S., Chain, F., … Samson, Y. (1996).
Recovery from nonfluent aphasia after melodic intonation therapy: A PET study. Neurology 47(6),
1504–1511.
Benoit, C. E., Dalla Bella, S., Farrugia, N., Obrig, H., Mainka, S., & Kotz, S. A. (2014). Musically
cued gait-training improves both perceptual and motor timing in Parkinson’s disease. Frontiers in
Human Neuroscience 8. Retrieved from https://doi.org/10.3389/fnhum.2014.00494
Bottiroli, S., Rosi, A., Russo, R., Vecchi, T., & Cavallini, E. (2014). The cognitive effects of listening
to background music on older adults: Processing speed improves with upbeat music, while
memory seems to benefit from both upbeat and downbeat music. Frontiers in Aging Neuroscience
6. Retrieved from https://doi.org/10.3389/fnagi.2014.00284
Brotons, M., & Koger, S. M. (2000). The impact of music therapy on language functioning in
dementia. Journal of Music Therapy 37(3), 183–195.
Buckner, R. L. (2004). Memory and executive function in aging and AD: Multiple factors that cause
decline and reserve factors that compensate. Neuron 44(1), 195–208.
Bugos, J. A., Perlstein, W. M., McCrae, C. S., Brophy, T. S., & Bedenbaugh, P. H. (2007).
Individualized piano instruction enhances executive functioning and working memory in older
adults. Aging and Mental Health 11(4), 464–471.
Cabeza, R. (2002). Hemispheric asymmetry reduction in older adults: The HAROLD model.
Psychology and Aging 17(1), 85–100.
Cabeza, R., Anderson, N. D., Locantore, J. K., & McIntosh, A. R. (2002). Aging gracefully:
Compensatory brain activity in high-performing older adults. NeuroImage 17(3), 1394–1402.
Castro, M., Tillmann, B., Luauté, J., Corneyllie, A., Dailler, F., André-Obadia, N., & Perrin, F.
(2015). Boosting cognition with music in patients with disorders of consciousness.
Neurorehabilitation and Neural Repair 29(8), 734–742.
Chan, M. F., Chan, E. A., Mok, E., Tse, K., & Yuk, F. (2009). Effect of music on depression levels
and physiological responses in community-based older adults. International Journal of Mental
Health Nursing 18(4), 285–294.
Christianson, S. A. (Ed.). (2014). The handbook of emotion and memory: Research and theory. New
York: Psychology Press.
Clément, S., Tonini, A., Khatir, F., Schiaratura, L., & Samson, S. (2012). Short and longer term
effects of musical intervention in severe Alzheimer’s disease. Music Perception: An
Interdisciplinary Journal 29(5), 533–541.
Cohen, A., Bailey, B., & Nilsson, T. (2002). The importance of music to seniors. Psychomusicology:
A Journal of Research in Music Cognition 18(1–2), 89–102.
Cornwell, E. Y., & Waite, L. J. (2009). Social disconnectedness, perceived isolation, and health
among older adults. Journal of Health and Social Behavior 50(1), 31–48.
Craik, F. I. M., Bialystok, E., & Freedman, M. (2010). Delaying the onset of Alzheimer disease:
Bilingualism as a form of cognitive reserve. Neurology 75(19), 1726–1729.
Cuddy, L. L., Sikka, R., & Vanstone, A. (2015). Preservation of musical memory and engagement in
healthy aging and Alzheimer’s disease. Annals of the New York Academy of Sciences 1337, 223–
231.
Dalla Bella, S. (2016). Music and brain plasticity. In S. Hallam, I Cross, & M. Thaut (Eds.), The
Oxford handbook of music psychology (pp. 325–342). Oxford: Oxford University Press.
Dalla Bella, S., Benoit, C. E., Farrugia, N., Schwartze, M., & Kotz, S. A. (2015). Effects of musically
cued gait training in Parkinson’s disease: Beyond a motor benefit. Annals of the New York
Academy of Sciences 1337, 77–85.
David, R., Koulibaly, M., Benoit, M., Garcia, R., Caci, H., Darcourt, J., & Robert, P. (2008). Striatal
dopamine transporter levels correlate with apathy in neurodegenerative diseases: A SPECT study
with partial volume effect correction. Clinical Neurology and Neurosurgery 110(1), 19–24.
De Dreu, M. J., Van Der Wilk, A. S. D., Poppe, E., Kwakkel, G., & Van Wegen, E. E. H. (2012).
Rehabilitation, exercise therapy and music in patients with Parkinson’s disease: A meta-analysis of
the effects of music-based movement therapy on walking ability, balance and quality of life.
Parkinsonism & Related Disorders 18, S114–S119.
Deci, E. L., & Ryan, R. M. (2000). The “what” and “why” of goal pursuits: Human needs and the
self-determination of behavior. Psychological Inquiry 11(4), 227–268.
Diener, E., Oishi, S., & Lucas, R. E. (2003). Personality, culture, and subjective well-being:
Emotional and cognitive evaluations of life. Annual Review of Psychology 54, 403–425.
El Haj, M., Clément, S., Fasotti, L., & Allain, P. (2013). Effects of music on autobiographical verbal
narration in Alzheimer’s disease. Journal of Neurolinguistics 26(6), 691–700.
El Haj, M., Fasotti, L., & Allain, P. (2012a). The involuntary nature of music-evoked
autobiographical memories in Alzheimer’s disease. Consciousness and Cognition 21(1), 238–246.
El Haj, M., Omigie, D., & Clément, S. (2014). Music causes deterioration of source memory:
Evidence from normal ageing. Quarterly Journal of Experimental Psychology 67(12), 2381–2391.
El Haj, M., Postal, V., & Allain, P. (2012b). Music enhances autobiographical memory in mild
Alzheimer’s disease. Educational Gerontology 38(1), 30–41.
Ferreri, L., Bigand, E., Perrey, S., Muthalib, M., Bard, P., & Bugaiska, A. (2014). Less effort, better
results: How does music act on prefrontal cortex in older adults during verbal encoding? An fNIRS
study. Frontiers in Human Neuroscience 8. Retrieved from
https://doi.org/10.3389/fnhum.2014.00301
Ferreri, L., & Rodriguez-Fornells, A. (2017). Music-related reward responses predict episodic
memory performance. Experimental Brain Research 235(12), 3721–3731.
Ferreri, L., & Verga, L. (2016). Benefits of music on verbal learning and memory. Music Perception:
An Interdisciplinary Journal 34(2), 167–182.
Fjell, A. M., Sneve, M. H., Grydeland, H., Storsve, A. B., & Walhovd, K. B. (2017). The
disconnected brain and executive function decline in aging. Cerebral Cortex 27(3), 2303–2317.
Foster, N. A., & Valentine, E. R. (2001). The effect of auditory stimulation on autobiographical recall
in dementia. Experimental Aging Research 27(3), 215–228.
Gibson, E. M., Purger, D., Mount, C. W., Goldstein, A. K., Lin, G. L., Wood, L. S., … Monje, M.
(2014). Neuronal activity promotes oligodendrogenesis and adaptive myelination in the
mammalian brain. Science 344(6183), 1252304.
Golden, H. L., Clark, C. N., Nicholas, J. M., Cohen, M. H., Slattery, C. F., Paterson, R. W., …
Warren, J. D. (2017). Music perception in dementia. Journal of Alzheimer’s Disease 55(3), 933–
949.
Gooding, L., Abner, E. L., Jicha, G. A., Kryscio, R. J., & Schmitt, F. A. (2014). Musical training and
late-life cognition. Journal of Alzheimer’s Disease and Other Dementias 29, 333–343.
Grau-Sánchez, J., Foley, M., Hlavová, R., Muukkonen, I., Ojinaga-Alfageme, O., Radukic, A., …
Hundevad, B. (2017). Exploring musical activities and their relationship to emotional well-being
in elderly people across Europe: A study protocol. Frontiers in Psychology 8. Retrieved from
https://doi.org/10.3389/fpsyg.2017.00330
Guetin, S., Portet, F., Picot, M. C., Pommié, C., Messaoudi, M., Djabelkir, L., … Touchon, J. (2009).
Effect of music therapy on anxiety and depression in patients with Alzheimer’s type dementia:
Randomised, controlled study. Dementia and Geriatric Cognitive Disorders 28(1), 36–46.
Hallam, S., & Creech, A. (2016). Can active music making promote health and well-being in older
citizens? Findings of the music for life project. London Journal of Primary Care 8(2), 21–25.
Halpern, A. R., Bartlett, J. C., & Dowling, W. J. (1995). Aging and experience in the recognition of
musical transpositions. Psychology and Aging 10(3), 325–342.
Halpern, A. R., Bartlett, J. C., & Dowling, W. J. (1998). Perception of mode, rhythm, and contour in
unfamiliar melodies: Effects of age and experience. Music Perception: An Interdisciplinary
Journal 15(4), 335–355.
Hanna-Pladdy, B., & Gajewski, B. (2012). Recent and past musical activity predicts cognitive aging
variability: Direct comparison with general lifestyle activities. Frontiers in Human Neuroscience
6. Retrieved from https://doi.org/10.3389/fnhum.2012.00198
Hanna-Pladdy, B., & MacKay, A. (2011). The relation between instrumental musical activity and
cognitive aging. Neuropsychology 25(3), 378–386.
Hays, T., & Minichiello, V. (2005). The contribution of music to quality of life in older people: An
Australian qualitative study. Ageing & Society 25(2), 261–278.
Holmes, C., Knights, A., Dean, C., Hodkinson, S., & Hopkins, V. (2006). Keep music live: Music
and the alleviation of apathy in dementia subjects. International Psychogeriatrics 18(4), 623–630.
Irish, M., Cunningham, C. J., Walsh, J. B., Coakley, D., Lawlor, B. A., Robertson, I. H., & Coen, R.
F. (2006). Investigating the enhancing effect of music on autobiographical memory in mild
Alzheimer’s disease. Dementia and Geriatric Cognitive Disorders 22(1), 108–120.
Jacobsen, J. H., Stelzer, J., Fritz, T. H., Chételat, G., La Joie, R., & Turner, R. (2015). Why musical
memory can be preserved in advanced Alzheimer’s disease. Brain 138(8), 2438–2450.
Janata, P. (2012). Effects of widespread and frequent personalized music programming on agitation
and depression in assisted living facility residents with Alzheimer-type dementia. Music and
Medicine 4(1), 8–15.
Jäncke, L. (2008). Music, memory and emotion. Journal of Biology 7(6), 21.
Johnson, J. K., Chang, C. C., Brambati, S. M., Migliaccio, R., Gorno-Tempini, M. L., Miller, B. L.,
& Janata, P. (2011). Music recognition in frontotemporal lobar degeneration and Alzheimer
disease. Cognitive and Behavioral Neurology: Official Journal of the Society for Behavioral and
Cognitive Neurology 24(2), 74–84.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and
memory. Psychological Review 83(5), 323–355.
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3),
170–183.
Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., & White-Schwoch, T.
(2014). Music enrichment programs improve the neural encoding of speech in at-risk children.
Journal of Neuroscience 34(36), 11913–11918.
Kreutz, G., Bongard, S., Rohrmann, S., Hodapp, V., & Grebe, D. (2004). Effects of choir singing or
listening on secretory immunoglobulin A, cortisol, and emotional state. Journal of Behavioral
Medicine 27(6), 623–635.
Lai, H. L., & Good, M. (2005). Music improves sleep quality in older adults. Journal of Advanced
Nursing 49(3), 234–244.
Lamont, A., Murray, M., Hale, R., & Wright-Bevans, K. (2018). Singing in later life: The anatomy of
a community choir. Psychology of Music 46(3), 424–439.
Large, E. W. (2008). Resonating to musical rhythm: Theory and experiment. In S. Grondin (Ed.), The
psychology of time (pp. 189–232). Bingley: Emerald Group.
Laukka, P. (2007). Uses of music and psychological well-being among the elderly. Journal of
Happiness Studies 8(2), 215–241.
Levy, M. L., Cummings, J. L., Fairbanks, L. A., Masterman, D., Miller, B. L., Craig, A. H., …
Litvan, I. (1998). Apathy is not depression. Journal of Neuropsychiatry and Clinical
Neurosciences 10(3), 314–319.
Lisman, J., Grace, A. A., & Duzel, E. (2011). A neoHebbian framework for episodic memory: Role
of dopamine-dependent late LTP. Trends in Neurosciences 34(10), 536–547.
Lord, T. R., & Garner, J. E. (1993). Effects of music on Alzheimer patients. Perceptual and Motor
Skills 76(2), 451–455.
Luanaigh, C. Ó., & Lawlor, B. A. (2008). Loneliness and the health of older people. International
Journal of Geriatric Psychiatry 23(12), 1213–1221.
McDermott, O., Orrell, M., & Ridder, H. M. (2014). The importance of music for people with
dementia: The perspectives of people with dementia, family carers, staff and music therapists.
Aging & Mental Health 18(6), 706–716.
McIntosh, G. C., Brown, S. H., Rice, R. R., & Thaut, M. H. (1997). Rhythmic auditory-motor
facilitation of gait patterns in patients with Parkinson’s disease. Journal of Neurology,
Neurosurgery & Psychiatry 62(1), 22–26.
Mammarella, N., Fairfield, B., & Cornoldi, C. (2007). Does music enhance cognitive performance in
healthy older adults? The Vivaldi effect. Aging Clinical and Experimental Research 19(5), 394–
399.
Meltzer, C. C., Smith, G., DeKosky, S. T., Pollock, B. G., Mathis, C. A., Moore, R. Y., … Reynolds,
C. F. (1998). Serotonin in aging, late-life depression, and Alzheimer’s disease: The emerging role
of functional imaging. Neuropsychopharmacology 18(6), 407–430.
Menec, V. H. (2003). The relation between everyday activities and successful aging: A 6-year
longitudinal study. Journals of Gerontology Series B: Psychological Sciences and Social Sciences
58(2), S74–S82.
Mora, F., Segovia, G., & Del Arco, A. (2007). Aging, plasticity and environmental enrichment:
Structural changes and neurotransmitter dynamics in several areas of the brain. Brain Research
Reviews 55(1), 78–88.
Moussard, A., Bermudez, P., Alain, C., Tays, W., & Moreno, S. (2016). Life-long music practice and
executive control in older adults: An event-related potential study. Brain Research 1642, 146–153.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2012). Music as an aid to learn new verbal
information in Alzheimer’s disease. Music Perception: An Interdisciplinary Journal 29(5), 521–
531.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2014a). Learning sung lyrics aids retention in
normal ageing and Alzheimer’s disease. Neuropsychological Rehabilitation 24(6), 894–917.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2014b). Music as a mnemonic to learn gesture
sequences in normal aging and Alzheimer’s disease. Frontiers in Human Neuroscience 8.
Retrieved from https://doi.org/10.3389/fnhum.2014.00294
Narme, P., Clément, S., Ehrlé, N., Schiaratura, L., Vachez, S., Courtaigne, B., … Samson, S. (2014).
Efficacy of musical interventions in dementia: Evidence from a randomized controlled trial.
Journal of Alzheimer’s Disease 38(2), 359–369.
Nombela, C., Hughes, L. E., Owen, A. M., & Grahn, J. A. (2013). Into the groove: Can rhythm
influence Parkinson’s disease? Neuroscience & Biobehavioral Reviews 37(10), 2564–2570.
Norberg, A., Melin, E., & Asplund, K. (2003). Reactions to music, touch and object presentation in
the final stage of dementia: An exploratory study. International Journal of Nursing Studies 40(5),
473–479.
Ogay, S. (1995). La maintenance de la personnalité du sujet dément par la musicothérapie.
Psychologie médicale 27, 104–105.
Omigie, D., & Samson, S. (2014). A protective effect of musical expertise on cognitive outcome
following brain damage? Neuropsychology Review 24(4), 445–460.
Parbery-Clark, A., Anderson, S., Hittner, E., & Kraus, N. (2012). Musical experience strengthens the
neural representation of sounds important for communication in middle-aged adults. Frontiers in
Aging Neuroscience 4. Retrieved from https://doi.org/10.3389/fnagi.2012.00030
Parbery-Clark, A., Strait, D. L., Anderson, S., Hittner, E., & Kraus, N. (2011). Musical experience
and the aging auditory system: Implications for cognitive abilities and hearing speech in noise.
PLoS ONE 6(5), e18082.
Pascual-Leone, A. (2001). The brain that plays music and is changed by it. Annals of the New York
Academy of Sciences 930, 315–329.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA
hypothesis. Frontiers in Psychology 2, 142. Retrieved from
https://doi.org/10.3389/fpsyg.2011.00142
Patel, A. D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P. J. (1998). Processing syntactic
relations in language and music: An event-related potential study. Journal of Cognitive
Neuroscience 10(6), 717–733.
Peck, K. J., Girard, T. A., Russo, F. A., & Fiocco, A. J. (2016). Music and memory in Alzheimer’s
disease and the potential underlying mechanisms. Journal of Alzheimer’s Disease 51(4), 949–959.
Racette, A., Bard, C., & Peretz, I. (2006). Making non-fluent aphasics speak: Sing along! Brain
129(10), 2571–2584.
Raz, N., Lindenberger, U., Rodrigue, K. M., Kennedy, K. M., Head, D., Williamson, A., … Acker, J.
D. (2005). Regional brain changes in aging healthy adults: General trends, individual differences
and modifiers. Cerebral Cortex 15(11), 1676–1689.
Saarikallio, S. (2011). Music as emotional self-regulation throughout adulthood. Psychology of Music
39(3), 307–327.
Sakamoto, M., Ando, H., & Tsutou, A. (2013). Comparing the effects of different individualized
music interventions for elderly individuals with severe dementia. International Psychogeriatrics
25(5), 775–784.
Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J.
(2013). Interactions between the nucleus accumbens and auditory cortices predict music reward
value. Science 340(6129), 216–219.
Salthouse, T. A. (2016). Theoretical perspectives on cognitive aging. New York: Routledge.
Sammler, D., Baird, A., Valabrègue, R., Clément, S., Dupont, S., Belin, P., & Samson, S. (2010). The
relationship of lyrics and tunes in the processing of unfamiliar songs: A functional magnetic
resonance adaptation study. Journal of Neuroscience 30(10), 3572–3578.
Samson, S., Clément, S., Narme, P., Schiaratura, L., & Ehrlé, N. (2015). Efficacy of musical
interventions in dementia: Methodological requirements of nonpharmacological trials. Annals of
the New York Academy of Sciences 1337, 249–255.
Schellenberg, E. G. (2003). Does exposure to music have beneficial side effects? In I. Peretz & R. J.
Zatorre (Eds.), The cognitive neuroscience of music (pp. 430–448). New York: Nova Science
Press.
Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white-matter tracts of
patients with chronic Broca’s aphasia undergoing intense intonation-based speech therapy. Annals
of the New York Academy of Sciences 1169, 385–394.
Seinfeld, S., Figueroa, H., Ortiz-Gil, J., & Sanchez-Vives, M. V. (2013). Effects of music learning
and piano practice on cognitive function, mood and quality of life in older adults. Frontiers in
Psychology 4. Retrieved from https://doi.org/10.3389/fpsyg.2013.00810
Simmons-Stern, N. R., Budson, A. E., & Ally, B. A. (2010). Music as a memory enhancer in patients
with Alzheimer’s disease. Neuropsychologia 48(10), 3164–3167.
Simmons-Stern, N. R., Deason, R. G., Brandler, B. J., Frustace, B. S., O’Connor, M. K., Ally, B. A.,
& Budson, A. E. (2012). Music-based memory enhancement in Alzheimer’s disease: Promise and
limitations. Neuropsychologia 50(14), 3295–3303.
Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-based
morphometry reveals increased gray matter density in Broca’s area in male symphony orchestra
musicians. NeuroImage 17(3), 1613–1622.
Stegemöller, E. L. (2017). The neuroscience of speech and language. Music Therapy Perspectives
35(2), 107–112.
Stern, Y. (2002). What is cognitive reserve? Theory and research application of the reserve concept.
Journal of the International Neuropsychological Society 8(3), 448–460.
Stern, Y. (2009). Cognitive reserve. Neuropsychologia 47(10), 2015–2028.
Stewart, L., Henson, R., Kampe, K., Walsh, V., Turner, R., & Frith, U. (2003). Brain changes after
learning to read and play music. NeuroImage 20(1), 71–83.
Stiles, J. (2000). Neural plasticity and cognitive development. Developmental Neuropsychology
18(2), 237–272.
Sung, H. C., Chang, S. M., Lee, W. L., & Lee, M. S. (2006). The effects of group music with
movement intervention on agitated behaviours of institutionalized elders with dementia in Taiwan.
Complementary Therapies in Medicine 14(2), 113–119.
Svansdottir, H. B., & Snaedal, J. (2006). Music therapy in moderate and severe dementia of
Alzheimer’s type: A case-control study. International Psychogeriatrics 18(4), 613–621.
Thaut, M. H., Leins, A. K., Rice, R. R., Argstatter, H., Kenyon, G. P., McIntosh, G. C., … Fetter, M.
(2007). Rhythmic auditory stimulation improves gait more than NDT/Bobath training in near-
ambulatory patients early poststroke: A single-blind, randomized trial. Neurorehabilitation and
Neural Repair 21(5), 455–459.
Thaut, M. H., McIntosh, G. C., & Rice, R. R. (1997). Rhythmic facilitation of gait training in
hemiparetic stroke rehabilitation. Journal of the Neurological Sciences 151(2), 207–212.
Thaut, M. H., McIntosh, G. C., Rice, R. R., Miller, R. A., Rathbun, J., & Brault, J. M. (1996).
Rhythmic auditory stimulation in gait training for Parkinson’s disease patients. Movement
Disorders 11(2), 193–200.
Thompson, R. G., Moulin, C. J. A., Hayre, S., & Jones, R. W. (2005). Music enhances category
fluency in healthy older adults and Alzheimer’s disease patients. Experimental Aging Research
31(1), 91–99.
Tillmann, B. (2012). Music and language perception: Expectations, structural integration, and
cognitive sequencing. Topics in Cognitive Science 4(4), 568–584.
Ueda, T., Suzukamo, Y., Sato, M., & Izumi, S. I. (2013). Effects of music therapy on behavioral and
psychological symptoms of dementia: A systematic review and meta-analysis. Ageing Research
Reviews 12(2), 628–641.
Van Petten, C., Plante, E., Davidson, P. S., Kuo, T. Y., Bajuscak, L., & Glisky, E. L. (2004). Memory
and executive function in older adults: Relationships with temporal and prefrontal gray matter
volumes and white matter hyperintensities. Neuropsychologia 42(10), 1313–1335.
Verghese, J., LeValley, A., Derby, C., Kuslansky, G., Katz, M., Hall, C., … Lipton, R. (2006). Leisure
activities and the risk of amnestic mild cognitive impairment in the elderly. Neurology 66(6), 821–
827.
Verghese, J., Lipton, R. B., Katz, M. J., Hall, C. B., Derby, C. A., Kuslansky, G., … Buschke, H.
(2003). Leisure activities and the risk of dementia in the elderly. New England Journal of Medicine
348(25), 2508–2516.
Wan, C. Y., & Schlaug, G. (2010). Music making as a tool for promoting brain plasticity across the
life span. The Neuroscientist 16(5), 566–577.
Whalley, L. J., Deary, I. J., Appleton, C. L., & Starr, J. M. (2004). Cognitive reserve and the
neurobiology of cognitive aging. Ageing Research Reviews 3(4), 369–382.
White-Schwoch, T., Carr, K. W., Anderson, S., Strait, D. L., & Kraus, N. (2013). Older adults benefit
from music training early in life: Biological evidence for long-term training-driven plasticity.
Journal of Neuroscience 33(45), 17667–17674.
Wilson, R. S., Boyle, P. A., Yang, J., James, B. D., & Bennett, D. A. (2015). Early life instruction in
foreign language and music and incidence of mild cognitive impairment. Neuropsychology 29(2),
292–302.
Zatorre, R. (2005). Music, the food of neuroscience? Nature 434(7031), 312–315.
Zatorre, R. J., & Salimpoor, V. N. (2013). From perception to pleasure: Music and its neural
substrates. Proceedings of the National Academy of Sciences 110(Suppl. 2), 10430–10437.
Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory
processing. Psychology and Aging 27(2), 410–417.
Zumbansen, A., Peretz, I., & Hébert, S. (2014). Melodic intonation therapy: Back to basics for future
research. Frontiers in Neurology 5. Retrieved from https://doi.org/10.3389/fneur.2014.00007
CHAPTER 26

MUSIC TRAINING AND COGNITIVE ABILITIES: ASSOCIATIONS, CAUSES, AND CONSEQUENCES

SWATHI SWAMINATHAN AND E. GLENN SCHELLENBERG

Over the years, a large body of research has examined associations between
music lessons and non-musical cognitive abilities, with the aim of
determining whether music training improves cognition. Because most
studies have correlational designs, conclusions of causation are precluded.
Thus, it remains a matter of great debate whether and under what conditions
music lessons improve non-musical abilities.
Musicians’ brains are structurally and functionally different from those
of non-musicians (for reviews see Gaser & Schlaug, 2003; Herholz &
Zatorre, 2012). Except in rare instances (e.g., Elbert, Pantev, Wienbruch,
Rockstroh, & Taub, 1995), these differences do not inform the issue of
causation. Individual differences in demographics, personality, cognitive
ability, and so on, could be associated with brain development and with the
likelihood of taking music lessons. In the present chapter, our focus is on
associations between music training and behavior.
As one would expect, musically trained individuals tend to perform
better than other individuals on tasks that require them to perceive and
discriminate sequences of tones or beats (Bhatara, Yeung, & Nazzi, 2015;
Law & Zentner, 2012; Slevc, Davey, Buschkuehl, & Jaeggi, 2016;
Swaminathan & Schellenberg, 2017; Swaminathan, Schellenberg, & Khalil,
2017; Swaminathan, Schellenberg, & Venkatesan, 2018; Wallentin, Nielsen,
Friis-Olivarius, Vuust, & Vuust, 2010). Music training is also associated
with a wide range of non-musical abilities (for reviews see Schellenberg &
Weiss, 2013; Swaminathan & Schellenberg, 2016). Moreover, a limited
amount of experimental evidence suggests that music training causes
improvements in non-musical abilities, at least in some circumstances (e.g.,
Jaschke, Honing, & Scherder, 2018; Kaviani, Mirbaha, Pournaseh, &
Sagan, 2014; Portowitz, Lichtenstein, Egorova, & Brand, 2009;
Schellenberg, 2004; Slater et al., 2015).
Based on findings of widespread correlations and a handful of
encouraging experimental results, researchers have proposed that music
training is the perfect model for investigating plasticity and transfer of
learning (e.g., Herholz & Zatorre, 2012; Münte, Altenmüller, & Jäncke,
2002; Wan & Schlaug, 2010). Consequently, correlations between music
training and non-musical abilities are considered to provide evidence of
causal effects. This tendency is problematic for at least three reasons.
First of all, the link between music training and non-musical abilities is
not clear-cut. For example, experiments sometimes fail to document
improvements in cognitive abilities after taking music lessons (e.g.,
Butzlaff, 2000; Mehr, Schachner, Katz, & Spelke, 2013). Those that
succeed often adopt non-standard pedagogies, such as training music-
listening skills rather than teaching participants to sing or play an
instrument (Degé & Schwarzer, 2011; Moreno, Bialystok, et al., 2011; for a
review see Swaminathan & Schellenberg, 2016). When children are
assigned to more standard conservatory-style lessons at no cost to their
parents (Schellenberg, 2004; Slater et al., 2015), the learning process bears
little resemblance to the real world, because parents do not insist that their
children practice between lessons. Even in correlational studies,
associations with music training are not always evident (e.g., Boebinger et
al., 2015; Brandler & Rammsayer, 2003; Helmbold, Rammsayer, &
Altenmüller, 2005; Ruggles, Freyman, & Oxenham, 2014; Schellenberg &
Moreno, 2010; Swaminathan & Schellenberg, 2017).
The second problem is more theoretical, concerning the relation between
correlation and causation. As every student in an introductory psychology
course learns, correlation does not imply causation. Nevertheless, this is not
a reciprocal relation, and causation definitely implies correlation. In other
words, if music training causes improvements in cognitive abilities, one
should rightly expect this effect to be evident in everyday life, such that
individuals who take years of music lessons exhibit the documented
positive effects. In short, when a correlational study is well designed and
adequately powered, a null result provides direct negative evidence against
the hypothesized effect, whereas a positive effect is simply consistent with a
putative causal association.
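
To make "adequately powered" concrete, consider how many participants a correlational study needs before a null result is informative. The sketch below (our illustration in Python; the target effect sizes are assumptions, not values reported in this chapter) uses the standard Fisher z approximation:

import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation of size r
    with a two-tailed test, via the Fisher z transformation."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value, two-tailed
    z_beta = norm.ppf(power)           # quantile for the desired power
    z_r = math.atanh(r)                # Fisher z of the target correlation
    return math.ceil(((z_alpha + z_beta) / z_r) ** 2 + 3)

print(n_for_correlation(0.15))  # ~347 participants for a small effect
print(n_for_correlation(0.30))  # ~85 for a medium-sized effect

By this yardstick, a correlational study with only a few dozen participants cannot distinguish a small training effect from no effect at all.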
A third major issue involves far transfer, when training in a domain
such as music leads to better performance or faster learning in a different
(i.e., non-musical) domain. Although near transfer—to a highly similar task
—is common, it is still unclear whether far-transfer effects are even
possible, despite more than a century of research (e.g., Brody, 1992; Jensen,
1969, 1998; Thorndike & Woodworth, 1901a, 1901b). For example,
interventions designed specifically to improve general cognitive abilities,
such as working memory, fluid intelligence, or academic performance,
yield weak or variable results (Guo, Ohsawa, Suzuki, & Sekiyama, 2018;
Love, Chazan-Cohen, Raikes, & Brooks-Gunn, 2013; Melby-Lervåg &
Hulme, 2013; Melby-Lervåg, Reddick, & Hulme, 2016; Rapport, Orban,
Kofler, & Friedman, 2013; Shipstead, Redick, & Engle, 2012; Weicker,
Villringer, & Thöne-Otto, 2016). In fact, recent meta-analyses find weak to
no evidence that (1) chess instruction leads to better cognitive skills (Sala &
Gobet, 2016), (2) working-memory training enhances cognitive ability or
academic achievement (Sala & Gobet, 2017b; Soveri, Antfolk, Karlsson,
Salo, & Laine, 2017), or (3) video-game playing improves cognition (Sala,
Tatlidil, & Gobet, 2018). With this larger context in mind, the putative
cognitive-training effects of music lessons should be considered with
caution. In fact, a meta-analysis that examined directly whether music
training has far-transfer effects on non-musical cognitive abilities reported
similarly skeptical results (Sala & Gobet, 2017a).
In the remainder of this chapter, we first review the correlational and
experimental evidence for music-training effects. Subsequently, we propose
an analytic strategy as a possible way forward.
A REVIEW OF EXISTING EVIDENCE

Researchers have examined whether music training is associated with
general cognitive abilities, visuospatial skills, and language skills.
Associations have also been studied in applied contexts, such as educational
settings and interventions designed to promote healthy aging. In this
section, we review these findings, paying close attention to inconsistencies
in the literature.

Music Training and General Cognitive Abilities


Musically trained children and adults typically have higher IQ scores than
their untrained counterparts (Gibson, Folley, & Park, 2009; Gruhn, Galley,
& Kluth, 2003; Ho, Cheung, & Chan, 2003; Schellenberg, 2011a, 2011b;
Schellenberg & Mankarious, 2012; Trimmer & Cuddy, 2008). In some
instances, duration of training is associated positively with IQ or other
measures of general cognitive ability (Corrigall, Schellenberg, & Misura,
2013; Degé, Wehrum, Stark, & Schwarzer, 2014; Schellenberg, 2006;
Swaminathan et al., 2017, 2018; Swaminathan & Schellenberg, 2018). In
other words, as the amount of music training increases, so does IQ. Because
intelligence-test scores predict educational and career outcomes, as well as
health and longevity (e.g., Deary, Strand, Smith, & Fernandes, 2007; Judge,
Higgins, Thoresen, & Barrick, 1999; Spinath, Spinath, Harlaar, & Plomin,
2006), correlations are often interpreted optimistically, as evidence that
music training promotes wide-ranging cognitive benefits that have
implications for an individual’s success in life. An alternative view of these
correlational findings, however, is that enrolling in music lessons,
particularly for extended durations of time, is the consequence of better
intellectual functioning.
Nevertheless, there is some experimental evidence indicating that music
lessons cause small improvements in IQ scores, which we will now
summarize and evaluate. In one study, 144 6-year-olds were assigned
randomly to one year of keyboard or vocal music lessons or to control
conditions (drama lessons or no lessons at all; Schellenberg, 2004). Before
the intervention period began, the groups did not differ in their scores on the
Wechsler Intelligence Scale for Children—III (WISC-III). After the
intervention, all groups showed improvements on the WISC-III. These
across-the-board improvements likely resulted from attending school or a
retesting effect. A more provocative result revealed that children who
received music lessons (keyboard or voice) showed larger improvements
than their counterparts in the control groups. The effect was evident only
when the two music groups were contrasted directly with the two control
groups, however, and it was small (< 3 points), less than the average intra-
individual difference between two administrations of the same test.
Moreover, at post-test when parents were questioned about their child’s
practice habits, it became clear that children in the music groups practiced
minimally (10–15 min per week). In any event, the observed effect could
have stemmed from the school-like structure of the music lessons, which
differed from the play-like structure of the drama lessons, and led to better
test-taking skills. Alternatively, the effect may have been a Type I error. As
a side note, an interesting non-musical result was that the children in the
drama group had the largest improvements in social behavior.
In a more recent study of preschoolers in Tehran, children assigned to
three months of weekly music lessons made statistically significant post-test
gains on a standardized Farsi version of the Stanford-Binet IQ test (Kaviani
et al., 2014). There was no evidence of improvement in the control group.
The control group in the Iranian study was a passive control group (i.e., no
lessons at all), however, which makes it impossible to attribute the positive
findings of the musically trained group to the actual music training, rather
than to other aspects of the intervention (e.g., contact with an adult
instructor).
Another recent longitudinal study was conducted in the Netherlands
(Jaschke et al., 2018). Randomization to different conditions involved entire
schools rather than individual children (as in Portowitz et al., 2009). Two
schools were assigned randomly to a music-training intervention, and two
to a visual-arts program. The remaining two schools received the standard
Dutch curriculum. A fourth group comprised children who were taking
music lessons outside of school and assigned to the music intervention. The
authors reported that the visual-arts group showed more improvement than
the other groups in visuospatial ability, whereas the two music groups
showed larger improvements in verbal IQ and executive functions (planning
and inhibition). In short, children from different schools had different rates
of improvement. It is impossible, however, to attribute the response patterns
to the different interventions. Other differences between schools, such as
teaching quality, may have played a major role, and no conclusions can be
drawn about any analysis that included the fourth group because of self-
selection.
Relatively weak positive results such as these are further undermined by a fair
dose of mixed or null findings. One issue is that enhancements are more
likely on some IQ tests than on others. For example, group differences are
less likely to be evident when a test of fluid intelligence is used as the
measure of general cognitive ability (Bialystok & DePape, 2009; Brandler
& Rammsayer, 2003; Helmbold et al., 2005; Schellenberg & Moreno, 2010;
Swaminathan & Schellenberg, 2017), rather than a test that includes
measures of crystallized intelligence, such as vocabulary (Jaschke et al.,
2018; Kaviani et al., 2014; Schellenberg, 2004). It is also clear that music
training is associated with IQ in some groups but not in others. For
example, university music majors, who have presumably invested a lot of
time and effort in acquiring musical skills, do not necessarily show an IQ
advantage compared to students at the same level majoring in other
disciplines (Brandler & Rammsayer, 2003; Helmbold et al., 2005). In other
words, the association between music training and cognitive ability is
strongest when music training is an add-on activity rather than the
participant’s primary focus. Otherwise, one would expect professional
musicians (e.g., Celine Dion, members of symphony orchestras) to be
geniuses.
There are other reasons to be cautious about the putative causal effect of
music training on intelligence. For one, correlations between music lessons
and cognitive ability may be explained by personality factors, particularly
the Openness-to-Experience trait (Corrigall et al., 2013; Corrigall &
Schellenberg, 2015). In other words, musically trained individuals may
perform well on intelligence tests, at least in part, because they tend to be
curious and particularly interested in learning new things (including, but not
limited to, music). Moreover, common genetic factors underlie intelligence
and the propensity to practice music (Mosing, Madison, Pedersen, & Ullén,
2016).
Findings from studies that examined personality or genetics raise the
possibility that the association between music training and general cognitive
ability in correlational studies and quasi-experiments is largely a reflection
of pre-existing differences. Moreover, despite some experimental evidence
for modest IQ enhancements after music training (Jaschke et al., 2018;
Kaviani et al., 2014; Portowitz et al., 2009; Schellenberg, 2004), other
experiments and longitudinal studies failed to find general cognitive
improvements. For example, one longitudinal study in Hong Kong found no
evidence for an IQ enhancement after six months of training (Ho et al.,
2003, study 2). When researchers in Massachusetts assigned preschool
children randomly to either six weeks of group music lessons or no lessons
at all, they found no advantage in cognitive abilities for the children who
took music lessons (Mehr et al., 2013, experiment 2). Such inconsistent
results suggest that music training may not always result in cognitive
advantages, or that the effect is very small.
One possibility is that music lessons lead to intellectual advantages only
if they train some intermediate capacity that mediates the association
between music training and intelligence. For example, it has been suggested
that executive functions such as attention, a capacity closely associated with
general cognitive ability (Salthouse, 2005), can be trained (Rueda, Rothbart,
McCandliss, Saccamanno, & Posner, 2005). Working memory is similarly
thought to be trainable (Klingberg, 2010; cf. Melby-Lervåg & Hulme,
2013), and there is some evidence that working-memory training transfers
to improvements in fluid intelligence (Jaeggi, Buschkuehl, Jonides, &
Perrig, 2008). This particular report of far transfer has been questioned,
however, because of the study’s methodological irregularities, and because
there was no evidence that the effect was long-lasting (Conway, Getz,
Macnamara, & Engel de Abreu, 2011; Mackintosh, 2011; Moody, 2009).
Nevertheless, it is still possible that music lessons train executive
functions, including working memory, which in turn promote general
cognitive enhancements (Degé, Kubicek, & Schwarzer, 2011; Hannon &
Trainor, 2007; Posner, Rothbart, Sheese, & Kieras, 2008; Schellenberg &
Peretz, 2008). In fact, musically trained adults perform better than their
untrained counterparts on auditory (Bialystok & DePape, 2009; Roden,
Grube, Bongard, & Kreutz, 2014; Zuk, Benjamin, Kenyon, & Gaab, 2014)
as well as non-auditory (Bialystok & DePape, 2009; Okada & Slevc, 2018;
Zuk et al., 2014) tests of executive functions, as do musically trained
children and teenagers (Degé et al., 2011; Herrero & Carriedo, 2018;
Jaschke et al., 2018). In one study of 9- to 11-year-olds, however, music
training was associated with IQ but not with executive functions other than
auditory working memory (Schellenberg, 2011a). Virtually identical results
were evident when 6- to 8-year-olds were randomly assigned to a six-week
music-training intervention, with more improvement, relative to controls,
on a test of working memory but not on other measures of executive
functions (Guo et al., 2018). Thus, it remains unclear whether executive
functions mediate the effect of music training on cognition.
A second possibility is that the type of music training plays a role
(Swaminathan & Schellenberg, 2016). Private music lessons (where one
teacher attends to one student or a very small group of students) emphasize
individual accomplishment and skill mastery. Group-based lessons (e.g.,
training in a high-school band), on the other hand, are more likely to
emphasize collective outcomes over individual ones. Private music training
may be more effective than group-based lessons at improving scores on
tests of cognitive ability, which by definition measure individual ability and
accomplishment. Alternatively, individual differences in cognitive ability
may influence who takes private lessons. Either way, the association could
be limited to the developed world, where private lessons are common. In
developing countries, and throughout history, music making is and has been
typically a group activity, in which virtually everyone takes part.
Considered as a whole, although associations between music training
and intelligence are evident in many circumstances, it is unclear whether
music training causes improvements in cognitive ability. If music lessons
do improve intelligence or general cognitive ability, the effect appears to be:
(1) small, (2) evident only among some individuals, or (3) a likely
consequence of taking lessons that emphasize individual achievement.
More generally, we know that far-transfer effects are very rare, and that
parsimony rules the day in the world of science. In short, a simpler
explanation of the available data is that high-functioning individuals are
more likely than other individuals to take music lessons.

Associations with Specific Cognitive Abilities


Despite evidence of associations between music training and domain-
general abilities such as general intelligence or IQ, it has often been
suggested that musical abilities are more strongly related to some non-
musical, cognitive abilities than they are to others. For example, a case has
been made for special overlaps between musical and visuospatial skills
(Leng, Shaw, & Wright, 1990; Rauscher & Shaw, 1998). Others have
argued for associations with language skills, specifically that musically
trained individuals exhibit enhanced performance on lower-level tasks that
involve speech perception (e.g., Kraus & Chandrasekaran, 2010; Patel,
2003, 2011; Skoe & Kraus, 2012). These theories imply that the benefits of
music training are especially likely to transfer to skills that are trained more
directly during music lessons, such as navigating a piano keyboard or
reading musical notation (which transfers to visuospatial skills), or listening
skills more generally (which extend to speech perception). In this
subsection, we review evidence for training-related transfer to the
visuospatial and language domains.

Associations with Visuospatial Skills


Music training is associated with visuospatial skills. In fact, advantages on
visual and spatial-reasoning tasks are evident in studies of musically trained
adults (Bidelman, Hutka, & Moreno, 2013; Brochard, Dufour, & Deprés,
2004; Faßhauer, Frese, & Evers, 2015; Jakobson, Lewycky, Kilgour, &
Stoesz, 2008; Patston & Tippett, 2011; Sluming, Brooks, Howard, Downes,
& Roberts, 2007; Stoesz, Jakobson, Kilgour, & Lewycky, 2007) and
children (Bilhartz, Bruhn, & Olson, 2000; Costa-Giomi, 1999; Gromko &
Poorman, 1998; Hassler, Birbaumer, & Feil, 1985; Rauscher & Hinton,
2011; Rauscher & Zupan, 2000). For example, musically trained adults
outperform their untrained counterparts on tests of visuospatial short-term
memory (i.e., Corsi blocks; Bidelman et al., 2013), on tasks that require
them to recreate line drawings from short-term or long-term memory
(Jakobson et al., 2008), and when they are asked to determine whether two
three-dimensional shapes are the same but rotated in space (Sluming et al.,
2007). Examples from children indicate that music training predicts better
performance on a task that asks them to remember the order of different
colored beads on a string (Bilhartz et al., 2000), and when they are required
to arrange blocks to form the shape of a previously seen staircase (Rauscher
& Zupan, 2000).
Results from experimental studies provide only weak indications that
music training actually causes improvements in visuospatial skills. For
example, one study assigned preschool children to six months of 10–15 min
of weekly keyboard lessons and 30 min of daily voice lessons, voice lessons
only, computer training, or no lessons (Rauscher et al., 1997). Only the
children in the keyboard/singing group exhibited improvement on the
Object Assembly subtest of the Wechsler Preschool and Primary Scale of
Intelligence—Revised (WPPSI-R). An unequivocal interpretation of these
findings depends on the internal validity of the design, however, and it is
doubtful that the computer training (with commercially available
educational software) was an appropriate and equally engaging control
activity. Moreover, the singing-only group had less contact with an adult
instructor compared to the keyboard/singing group. Finally, children were
not assigned randomly to the four conditions. Although a review of other
studies from the same laboratory provided converging results (Rauscher &
Hinton, 2011), there were not enough methodological details provided in
the review to be confident about the findings.
Nevertheless, one meta-analysis of experimental studies concluded that
music training causes improvements in spatial skills, even though six of the
fifteen studies included in the analysis were conducted by a single research
group (i.e., Rauscher and colleagues; Hetland, 2000). More recent studies
report mixed results. For example, Mehr et al. (2013) conducted two
experiments. In one, children were randomly assigned to either music or
visual-arts lessons for six weeks. After the intervention, the music group
outperformed the art group on a spatial-navigation task, while the art group
outperformed the music group on a geometry-perception task. In a second
experiment, no significant group differences were evident when a new
group of preschool children was randomly assigned to either six weeks of
music lessons or no lessons at all.
One might question whether the effect of music training on visuospatial
skills is evident only after a longer duration of training (i.e., longer than six
weeks). Although this is a reasonable proposal, when Costa-Giomi (1999)
assigned 9-year-olds to three years of piano lessons or no lessons, the
piano-trained children had better spatial abilities after one and two years of
lessons, but not after three years. In short, although there is ample evidence
that music training is associated positively with visuospatial abilities,
evidence that music training causes the association is weak and
inconsistent.

Associations with Language Abilities


Because both language and music are rule-bound means of auditory
communication, associations between musical and linguistic processing
have received much attention from scholars who conduct research in these
domains. The available evidence documents that musically trained
individuals are better than their untrained counterparts at detecting
linguistic stress patterns (Kolinsky, Cuvelier, Goetry, Peretz, & Morais,
2009), and at perceiving pitch and intonation in speech (Besson, Schön,
Moreno, Santos, & Magne, 2007; Dankovičová, House, Crooks, & Jones,
2007; Delogu, Lampis, & Belardinelli, 2010; Good et al., 2017; Magne,
Schön, & Besson, 2006; Marques, Moreno, Castro, & Besson, 2007;
Thompson, Schellenberg, & Husain, 2004; Wong, Skoe, Russo, Dees, &
Kraus, 2007; Wu et al., 2015). In some instances, they also tend to be better
at perceiving speech under challenging conditions, such as comprehending
speech in noise (Parbery-Clark, Skoe, Lam, & Kraus, 2009; Strait & Kraus,
2011; Strait, Parbery-Clark, O’Connell, & Kraus, 2013; Tierney, Krizman,
Skoe, Johnston, & Kraus, 2013; Swaminathan et al., 2015; but see
Boebinger et al., 2015; Ruggles et al., 2014), or perceiving acoustically
degraded vowel sounds (Bidelman & Krishnan, 2010). Musicians also show
advantages in speech-segmentation skills (François, Chobert, Besson, &
Schön, 2013) and phonological perception (Chobert, François, Velay, &
Besson, 2014; Chobert, Marie, François, Schön, & Besson, 2011; Zuk et al.,
2013), and their brainstems appear to make higher-fidelity representations
of speech stimuli (e.g., Kraus et al., 2014; Parbery-Clark, Tierney, Strait, &
Kraus, 2012; Strait, O’Connell, Parbery-Clark, & Kraus, 2014; Strait,
Parbery-Clark, Hittner, & Kraus, 2012; Weiss & Bidelman, 2015).
In addition to speech-specific auditory advantages, musically trained
individuals show advantages on higher-level cognitive tests of verbal ability
including verbal short-term (Chan, Ho, & Cheung, 1998; Franklin et al.,
2008; Hansen, Wallentin, & Vuust, 2013; Ho et al., 2003), working
(Franklin et al., 2008), and long-term memory (Franklin et al., 2008). In
short-term and long-term tests of verbal memory, participants are required
to read and recall a list of unrelated words, either immediately (short-term
memory) or after a delay (long-term memory). In tests of working memory,
listeners are required to remember a list of letters or numbers, but between
each presentation of a to-be-remembered item, a secondary task requires
them to determine whether a sentence makes sense, or to solve an addition
problem.
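
For concreteness, the following sketch generates one such working-memory ("complex span") trial; the letter pool and the arithmetic probes are our own illustrative choices, not items from any standardized test.

import random

def operation_span_trial(set_size=4):
    """One complex-span trial: letters to remember, interleaved with
    arithmetic judgments that serve as the secondary task."""
    letters, probes = [], []
    for _ in range(set_size):
        letters.append(random.choice("BCDFGHJKLMNPQRSTVWXZ"))
        a, b = random.randint(1, 9), random.randint(1, 9)
        shown = a + b + random.choice([0, 0, 1, -1])  # sometimes wrong on purpose
        probes.append((f"{a} + {b} = {shown}?", shown == a + b))
    return letters, probes  # the letters are recalled in order at the end

letters, probes = operation_span_trial()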
Music training also predicts enhanced performance on tests of
vocabulary (Forgeard, Winner, Norton, & Schlaug, 2008; Piro & Ortiz,
2009) and second-language ability (Petitto, 2008; Posedel, Emery, Souza, &
Fountain, 2012; Swaminathan & Gopinath, 2013; Talamini, Grassi,
Toffalini, Santoni, & Carretti, 2018). One of the most reliable findings is
that music training is correlated positively with phonological awareness (a
skill important to the development of reading), which refers to the ability to
perceive and segment phonological elements of speech (Gromko, 2005;
Overy, 2003; Wandell, Dougherty, Ben-Shachar, Deutsch, & Tsang, 2008).
It is therefore not surprising that music training is also associated positively
with reading ability (Butzlaff, 2000; Corrigall & Trainor, 2011; Moreno et
al., 2009; Standley, 2008; Swaminathan et al., 2018).
Some experimental evidence supports the idea that music training can
promote language abilities. For example, Degé and Schwarzer (2011)
randomly assigned preschool children to daily training in music,
phonological skills, or sports. After twenty weeks, the phonological
awareness of children in the music group matched that of children in the
program designed specifically to improve these skills. Both groups
outperformed the sports group, which ruled out the role of normal
maturation.
Similar results have been reported in children with atypical language
development. For example, one study assigned children with dyslexia to six
weeks of a rhythm intervention, a commercially available phoneme-
discrimination intervention, or a passive control group (Thomson, Leong, &
Goswami, 2013). The rhythm intervention involved copying and
synchronizing to non-speech rhythms on a hand drum, speaking and
clapping to words in rhythm, and playing computerized games intended to
train basic auditory skills linked to rhythm perception. Relative to the
control group, both the rhythm group and the phonological skills group
improved on tests of phonological processing. Another experiment
randomly assigned children with dyslexia to thirty weeks of either music
lessons (focused primarily on rhythm) or painting lessons (Flaugnacco et
al., 2015). Despite similar performance at pre-test, after the intervention, the
music group made larger gains on phonological and reading skills compared
to the painting group.
Moreno and colleagues (Moreno, Bialystok, et al., 2011) assigned
preschool children to twenty days of computerized training in music or
visual arts. The children were tested on the Block Design and Vocabulary
subtests of the WPPSI-III before and after the intervention. The music
group made significant post-test gains on the Vocabulary subtest but not the
Block Design subtest. Importantly, the arts group did not make any gains,
which indicates that vocabulary improvements were specific to music
training. In another article from the same sample of children (Moreno,
Friesen, & Bialystok, 2011), the researchers reported that the music group
was better at learning to map arbitrary visual symbols to words, a skill that
is likely to be important for the development of reading.
The successful music-lesson interventions in the experimental studies
described above were relatively short-term, and focused on listening rather
than learning to play an instrument. Thus, listening training that focuses on
music, particularly rhythm and timing, may indeed help children perceive
rapid temporal changes in speech, such as those that distinguish adjacent
phonemes. When children are trained with more standard pedagogies,
however, the results tend to be weaker.
For example, in a recent longitudinal study that examined group-based
music lessons, differences compared to a control group emerged only after
extended training (Slater et al., 2015). Specifically, children improved on a
test that measured their ability to perceive speech in noise after two years of
community-based music lessons, which were taught using an established
and successful curriculum. Nevertheless, children who received only one
year of the same lessons did not make any statistically significant gains.
Moreover, other studies failed to find correlations between music training
and language skills (Boebinger et al., 2015; Ruggles et al. 2014;
Swaminathan & Schellenberg, 2017), which implies that typical music
lessons may not always improve speech and language abilities, or that the
effect is relatively weak.
In general, however, music lessons that emphasize listening skills and
temporal (rhythm) perception appear to promote phonological awareness
and speech perception, at least for some groups of individuals. These
improvements can, in turn, facilitate learning to read.
Music Training and Cognitive Performance in
Real-World Contexts
The studies described in previous sections raise the possibility that music
training causes small improvements on standardized tests and laboratory-
based measures of speech perception and other aspects of cognitive ability.
If such effects exist, they would have little importance unless they extend to
performance in real-world situations. We now turn to two such situations:
academic achievement and healthy aging.

Music Training and Academic Achievement


Participation in school-based musical activities predicts academic
performance in later years (Catterall, Chapleau, & Iwanaga, 1999;
Gouzouasis, Guhn, & Kishor, 2007; Winner & Cooper, 2000). For example,
a meta-analysis of ten years of data from the American College Board
found that high-school students with training in the arts, including music,
performed better than students without any arts training on the SAT
(formerly the Scholastic Aptitude Test; Vaughn & Winner, 2000). (SAT
scores are used as a basis for admission to undergraduate colleges. Thus, the
SAT is administered routinely to high-school seniors.) Longer duration of
musical participation is also known to be associated with better SAT scores
(Vaughn & Winner, 2000), and with higher average grades in school
(Catterall et al., 1999; Schellenberg, 2006; Wetter, Koerner, & Schwaninger,
2009). These associations tend to be broad and general, rather than
restricted to one or two school subjects. For example, the mathematics and
geometry scores of musically trained participants tend to be higher than
those of their untrained counterparts (Catterall et al., 1999; Cheek & Smith, 1999;
Gardiner, Fox, Knowles, & Jeffry, 1996; Gouzouasis et al., 2007; Graziano,
Peterson, & Shaw, 1999; Spelke, 2008; Vaughn, 2000; Vaughn & Winner,
2000), as is the case with language-related academic outcomes (Vaughn &
Winner, 2000). Nevertheless, some types of music lessons may be more
strongly associated with academic outcomes than others. For example,
Canadian adolescents with keyboard lessons show advantages on high-
school English tests, but those with vocal music training do not
(Gouzouasis et al., 2007).
Interestingly, students who enroll in music classes—even theory-based
music history classes—appear to demonstrate academic advantages
compared to students who report having no training in any form of fine arts
(Vaughn & Winner, 2000). In other words, actual instrumental or vocal
training may not be unique in its association with academic performance.
Indeed, high-school students who participate in any type of arts activity
show advantages on the SAT, with drama students showing the strongest
advantages (Vaughn & Winner, 2000). Moreover, children participating in
sports are just as likely as arts participants to win academic awards (Winner,
Goldstein, & Vincent-Lancrin, 2012), and they perform no differently from
musically trained children on tests of mathematical ability (Spelke, 2008).
Thus, participation in any type of extracurricular activity, not just music,
predicts academic performance.
It is also unclear whether participation in music (or other) activities
causes academic advantages. It is equally likely, if not more likely, that pre-
existing individual differences in academic ability determine musical
participation. Indeed, grades in elementary school predict participation in
middle-school (Kinney, 2008) and high-school (Frakes, 1985) music
classes, a timeline that rules out a causal role for music training. Moreover,
better academic performance predicts longer participation in subsequent
musical activities (e.g., Kinney, 2010; Klinedinst, 1991).
The association between music training and academic performance
could also be an artifact of a third variable. For example, socio-economic
affluence is associated with better scholastic performance (Sirin, 2005), as
well as with musical participation (Corenblum & Marshall, 1998; Kinney,
2010; Klinedinst, 1991). However, the correlation between training and
scholastic achievement is evident across socio-economic status (SES) levels
(Catterall et al., 1999; Fitzpatrick, 2006) and persists even after holding
SES constant (Corrigall et al., 2013; Degé et al., 2014; Schellenberg, 2006,
2011a, 2011b; Schellenberg & Mankarious, 2012), which indicates that the
association between music training and academic achievement is at least
partly independent of SES.
Pre-existing personality differences also appear to play a role. For
example, musically trained children tend to do even better in school than
one would predict from their elevated IQ scores (Corrigall et al., 2013;
Degé et al., 2014; Schellenberg, 2006). This “special” association between
music training and school performance disappears when conscientiousness
is controlled in addition to IQ (Corrigall et al., 2013). In other words, in
addition to being smart, musically trained children tend to be particularly
hard-working and diligent, which explains why they do particularly well in
school.
Finally, the results of longitudinal and experimental studies provide little
evidence of a causal role for music training on academic performance. For
example, one longitudinal study found evidence for a scholastic advantage
after two years of piano lessons but not after the third year (Costa-Giomi,
2004). Another one-year longitudinal study found no evidence of improved
performance on a standardized test of academic achievement, although
children with music training had larger improvements, in absolute terms, on
each of five subtests (Schellenberg, 2004). The most compelling negative
result comes from a government-funded project in the UK that was
organized by the Education Endowment Foundation and the National
Centre for Social Research. Over 900 children in Year 2 (7-year-olds) from
nineteen schools were randomly assigned to music training (strings or
voice), or to a control group that had drama lessons (Haywood et al., 2015).
All participants received weekly training for thirty-two weeks in groups of
approximately ten children. Improvements in mathematical abilities and
literacy skills were similar for the music and drama groups, and there was
no difference between the two music groups.
Meta-analytic reviews that include findings from published and
unpublished sources also report no evidence for a causal role of music
training in scholastic achievement (Winner & Cooper, 2000; Winner et al.,
2012). As noted earlier, the most recent meta-analysis (Sala & Gobet,
2017b) found a very small association between music training and
academic achievement, but this effect was due to contributions from poorly
designed studies. Specifically, such associations are more likely to be
evident when (1) studies do not have random assignment, such that self-
selection plays a role in choosing to take music lessons, and (2) the control
group is passive (no activity) rather than active, such that non-musical
aspects of the music training (e.g., structured learning environment,
additional contact with an adult teacher) are implicated. In sum, there is
little evidence to support the notion that music training causes
improvements in scholastic performance, despite much evidence that music
training is associated positively with academic achievement.
Music Training and Healthy Aging
Older adults often experience declines in cognitive abilities, such as deficits
in executive functions and difficulties with hearing in noisy environments
(for reviews see Alain, Zendel, Hutka, & Bidelman, 2014; Salthouse, 2004).
Because music training may cause small improvements in these skills in
normally maturing children, it is plausible that it could also slow the onset
of aging-related declines. To date, only a handful of studies have examined
this possibility.
The available evidence suggests that middle-aged and older adults who
have practiced music throughout their lives tend to outperform age-matched
non-musicians on auditory perception tests, such as speech perception in
noise (Parbery-Clark, Strait, Anderson, Hittner, & Kraus, 2011; Zendel &
Alain, 2012), categorical perception of speech sounds (Bidelman & Alain,
2015), frequency discrimination (Grassi, Meneghetti, Toffalini, & Borella,
2017), and the detection of gaps and mistuned harmonics in tones (Grassi et
al., 2017; Zendel & Alain, 2009, 2012, 2013). In fact, some evidence
implies that older-adult musicians perform almost as well as young adults at
detecting mistuned harmonics (Zendel & Alain, 2013). In one instance,
even a small amount of music training in childhood was related to better
temporal precision in speech-evoked neural responses later in life (White-
Schwoch, Carr, Anderson, Strait, & Kraus, 2013).
Middle-aged and older adults with music training also tend to show
advantages in executive-function tasks, including auditory attention and
working memory, verbal immediate recall, and verbal fluency (Amer,
Kalender, Hasher, Trehub, & Wong, 2013; Fauvel et al., 2014; Grassi et al.,
2017; Hanna-Pladdy & Gajewski, 2012; Parbery-Clark et al., 2011).
Advantages appear to be strongest for individuals who began music lessons
earlier rather than later in life (Fauvel et al., 2014). In the visuospatial
domain, however, the evidence is less clear. Whereas two studies found no
evidence for visual working-memory advantages in older musicians
(Hanna-Pladdy & Gajewski, 2012; Parbery-Clark et al., 2011), three others
reported that musicians outperform non-musicians on a visuospatial span
task, the Simon Task, or other tests of visuospatial ability (Amer et al.,
2013; Grassi et al., 2017; Hanna-Pladdy & MacKay, 2011).
One recent study of adults over 64 years of age compared those who
were currently singing or playing a musical instrument to other participants
(Mansens, Deeg, & Comijs, 2017). The older participants who were making
music, particularly those who were playing an instrument, had higher scores
on tests of episodic memory, executive functions, and attention. It is
unknown, however, whether these individuals also played music earlier in
life. In any event, playing music later in life appears to be a marker of
healthy aging.
At the very least, these studies suggest that those who are inclined to
engage in musical activities early in life are also likely to show cognitive
advantages later in life, as they do when they are younger. The findings do
not inform the issue of rates of cognitive decline. Thus, whether music
training can be used to preserve cognitive abilities or even slow down
cognitive aging processes is still an open question.
Suggestive evidence, however, comes from one experimental study in
which 60- to 85-year-olds were assigned randomly to six months of piano
lessons or a no-lessons (passive) control condition (Bugos, Perlstein,
McCrae, Brophy, & Bedenbaugh, 2007). The music group improved on two
of five tests of executive function, whereas the control group did not appear
to make gains on any of the tests. It is not clear, however, whether
improvements in the intervention group were due to music training per se.
As noted, the effect could be due to non-musical aspects of the training,
such as an additional opportunity to engage with someone (the instructor),
or simply the knowledge among the piano-group participants that they were
somehow “special” because they were receiving an intervention. Moreover,
it is not clear whether such improvements were long-lasting.
In sum, although it is possible that (1) music training in childhood may
buffer against cognitive declines that are evident later in life, and (2)
musical engagement in late adulthood could preserve or even improve
already declining abilities, not enough evidence is available at the present
time to confirm or disprove these hypotheses.

ONE WAY FORWARD: MEASURING MUSIC APTITUDE AND MUSIC TRAINING

As noted, most of the research on music lessons and non-musical cognitive
abilities is correlational in design. Such designs preclude inferences of
causality. Experimental studies with random assignment are better suited to
studying causal direction but are relatively rare because they are expensive
to carry out. Moreover, training received in the context of an experiment is,
no doubt, quite different from the experience of music lessons in the real
world. For example, attrition limits the length of training in experimental
studies. In the one-year study by Schellenberg (2004) with 144 children at
baseline, 12 students (1 in 12, or 8.3 percent) dropped out and were not
available for post-test. In the two-year study by Jaschke et al. (2018), 30 of
176 children (17.0 percent) who were tested at baseline were not available
at post-test. Moreover, random assignment in experiments excludes
motivational factors that promote long-term musical participation in the real
world. In other words, correlational studies can capture ecologically valid
information that experiments cannot.
Is there a way to improve the interpretability of correlational findings?
In recent research, we made one such attempt by measuring duration of
music training, performance on music aptitude tests, and other possible
confounding variables (e.g., SES; Swaminathan & Schellenberg, 2017,
2018; Swaminathan et al., 2017, 2018). Music aptitude is typically
quantified using tests that measure listeners’ ability to perceive, remember,
and discriminate melodies and rhythms (Gordon, 1965; Seashore, Lewis, &
Saetveit, 1960). On each trial, the listener decides whether two musical
sequences are identical. On trials with non-identical sequences, one event
(i.e., a tone or drumbeat) in the second sequence is altered in pitch or time.
In pedagogical contexts, aptitude is a measure of musical proclivities that
should lead to subsequent success in musical activities, including training.
As one would expect, music training does indeed predict performance on
aptitude tests (e.g., Law & Zentner, 2012; Wallentin et al., 2010), but the
causal direction is unclear. Importantly, music aptitude is also associated
with performance on tests of general cognitive abilities and language
abilities (for review, see Schellenberg & Weiss, 2013).
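
As a concrete illustration of the same/different trials described above, the sketch below builds a standard sequence and a comparison sequence in which, on "different" trials, exactly one event is altered in pitch or in time. The pitch range and durations are assumptions for illustration, not parameters of the Gordon or Seashore tests.

import random

def make_trial(n_notes=6, same=True):
    """Return a (standard, comparison) pair of tone sequences, where each
    tone is a (MIDI pitch, duration in beats) pair."""
    standard = [(random.randint(60, 72), random.choice([0.5, 1.0]))
                for _ in range(n_notes)]
    comparison = list(standard)
    if not same:
        i = random.randrange(n_notes)
        pitch, dur = comparison[i]
        if random.random() < 0.5:               # alter one event in pitch...
            comparison[i] = (pitch + random.choice([-2, 2]), dur)
        else:                                   # ...or alter it in time
            comparison[i] = (pitch, 1.5 - dur)  # swap 0.5 <-> 1.0 beats
    return standard, comparison

standard, comparison = make_trial(same=False)
# The listener's task: judge whether the two sequences are identical.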
In our statistical analyses, we examined individual differences in
performance on a non-musical variable (e.g., a test of speech perception or
intelligence) as a function of music aptitude, with music training held
constant, and music training, with music aptitude held constant. Depending
on the particular research question, we also held constant other measures
with overlapping variance, such as socio-economic status or personality.
These analyses of partial associations allowed for more nuanced
investigation about the relative role of learning and the environment (e.g.,
music training) on the one hand, and natural abilities (e.g., music aptitude,
intelligence) on the other hand. With training held constant, associations of
non-musical skills with performance on a music-aptitude test indicate that
the association between musical and non-musical skills is independent of
training, and possibly precedes it. With aptitude held constant, associations
of non-musical skills with training provide more convincing evidence for
training effects.
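
The logic of these partial associations can be demonstrated with a small simulation (ours, not the authors' data). If aptitude drives both training and a non-musical outcome, the simple correlation between training and the outcome is positive, yet the partial correlation, with aptitude held constant, collapses toward zero:

import numpy as np

rng = np.random.default_rng(0)
n = 500
aptitude = rng.normal(size=n)                   # pre-existing ability
training = 0.5 * aptitude + rng.normal(size=n)  # aptitude predicts lessons
outcome = 0.6 * aptitude + rng.normal(size=n)   # e.g., speech perception

def partial_corr(x, y, covar):
    """Correlate x and y after regressing the covariate out of both."""
    Z = np.column_stack([np.ones(len(covar)), covar])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(training, outcome)[0, 1])       # simple r: clearly positive
print(partial_corr(training, outcome, aptitude))  # near zero with aptitude held constant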
Although partial correlations, like simple correlations, do not allow for
inferences of causation, this method serves to contextualize the size and
location of hypothesized training effects relative to pre-existing associations
between musical and non-musical abilities. When we used this method with
adult participants, music aptitude was associated with intelligence
(Swaminathan et al., 2017) and speech perception skills (Swaminathan &
Schellenberg, 2017) when training was held constant, but music training
was not associated with either outcome when music aptitude was held
constant. In the case of speech perception, there was not even a simple
association between music training and performance on a test of the ability
to discriminate speech sounds that are relevant (i.e., phonemic) in a foreign
language (Zulu) but not in English. Our interpretation of these data was that
pre-existing differences in music aptitude and cognitive ability predict
music training. Although taking music lessons may go on to increase music
aptitude and cognitive abilities further (Figure 1), such training effects are
likely to play a relatively small role in the overall picture.

FIGURE 1. Individuals with high cognitive ability and music aptitude have an increased likelihood
of taking music lessons, which could then go on to improve cognitive and musical abilities.
In other correlational research, we used a similar approach to examine
the association between music training and music aptitude (Swaminathan &
Schellenberg, 2018). The simple association between the two variables was
significant, as in previous research, with music training accounting for 24.5
percent of the variance in music aptitude. When socio-economic status,
openness to experience, short-term memory, and general cognitive ability
were considered jointly with music training, the predictive power of the
model increased to 36.7 percent. Music training continued to have the
largest partial association, accounting independently for 6.2 percent of the
variance in music aptitude. Note that the reduction in variance explained
(from 24.5 percent to 6.2 percent) highlights the overlap between music
training and non-musical variables, which are typically overlooked in this
line of research. When the non-musical variables were considered jointly,
they accounted uniquely for 12.2 percent of the variance in music aptitude
(with music training held constant). In other words, music aptitude was
predicted better by the non-musical variables than it was by music training
alone. These findings highlight that music aptitude is more than the simple
consequence of music training. Although music training might be the best
predictor variable, other, non-musical variables play an important role.
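
The variance-partitioning logic in the preceding paragraph can be sketched with nested regression models. The simulated coefficients below are arbitrary, so the printed values will not reproduce the 24.5, 36.7, 6.2, and 12.2 percent figures, but the computation has the same structure: a predictor's unique contribution is the drop in R² when it is removed from the full model.

import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit, intercept included."""
    X = np.column_stack([np.ones(len(y)), X])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(1)
n = 400
ses, openness, stm, g = rng.normal(size=(4, n))  # non-musical predictors
training = 0.4 * g + 0.3 * openness + rng.normal(size=n)
aptitude = 0.5 * training + 0.4 * g + 0.3 * stm + rng.normal(size=n)

full = r_squared(np.column_stack([training, ses, openness, stm, g]), aptitude)
without_training = r_squared(np.column_stack([ses, openness, stm, g]), aptitude)
training_alone = r_squared(training.reshape(-1, 1), aptitude)

print(training_alone)           # variance explained by training alone
print(full)                     # all predictors considered jointly
print(full - without_training)  # unique contribution of training (delta R^2)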
In a fourth study (Swaminathan et al., 2018), we used the same approach
to examine the association between music training and reading ability
among adults who were native or non-native speakers of English. As in
previous research, reading ability was positively associated with duration of
music training. We also found that reading ability increased in tandem with
general cognitive ability, and it was better among native than non-native
speakers of English. When these variables were considered jointly, general
cognitive ability and native-language status had significant partial
associations with reading ability, but music training did not. In other words,
associations between music training and reading may be an artifact of other
variables that are typically ignored in this line of research.

The evidence that music training causes improvements in non-musical
domains is very weak, except for the effect of rhythm- and listening-based
training, which appears to improve fine-grained listening skills in general;
sharper listening can, in turn, enhance the ability to isolate and segment the sounds of
speech. Because isolating speech sounds and matching them with letters or
groups of letters is crucial for reading, rhythm- and listening-based training
may go on to improve reading skills, particularly for those who have
difficulty with reading (i.e., young children and children with dyslexia).
These positive effects are evident primarily as a consequence of specially
designed interventions that focus specifically on rhythm training and
analytical listening. Typical conservatory-style training may not have the
same effects, or it may have much weaker ones.
Otherwise, although there is ample evidence that music lessons are
predictive of benefits in general cognitive abilities, visuospatial abilities, or
language abilities, the causal evidence is very weak. Large-sample, long-
term studies with random assignment to music lessons are virtually
impossible to conduct because of cost and attrition. Moreover, when such
efforts are made, the results may fail to generalize broadly because it is
difficult to know what one is studying when motivation, personality, music
aptitude, demographics, and general cognitive ability are held constant. In
the real world, these factors play a key role in determining who takes music
lessons, particularly for long durations of time.
In short, evidence that traditional music pedagogies have non-musical
cognitive benefits is scarce and unconvincing. Most of the available
correlational evidence can be explained parsimoniously: high-functioning
children are more likely than other children to take music lessons and to
perform well on tests of many sorts. We therefore advocate a different
approach—correlational designs that attempt to account for as many
alternative explanations as possible. At the very least, this approach allows
researchers to be sure that they are studying a real-world phenomenon,
rather than an experimental or pedagogical artifact.
Future research on music and non-musical abilities is likely to find
nuanced results if individual differences in music aptitude and other
variables, such as SES, personality, and general cognitive ability (Corrigall
& Schellenberg, 2015; Corrigall et al., 2013), are considered alongside
music training. In other words, the causes of music training may be just as
important as its consequences.
Acknowledgments
Supported by a grant from the Natural Sciences and Engineering Research
Council of Canada awarded to EGS.

References
Alain, C., Zendel, B. R., Hutka, S., & Bidelman, G. M. (2014). Turning down the noise: The benefit
of musical training on the aging auditory brain. Hearing Research 308, 162–173.
Amer, T., Kalender, B., Hasher, L., Trehub, S. E., & Wong, Y. (2013). Do older professional
musicians have cognitive advantages? PLoS ONE 8(8), e71630.
Besson, M., Schön, D., Moreno, S., Santos, A., & Magne, C. (2007). Influence of musical expertise
and musical training on pitch processing in music and language. Restorative Neurology and
Neuroscience 25(3–4), 399–410.
Bhatara, A., Yeung, H. H., & Nazzi, T. (2015). Foreign language learning in French speakers is
associated with rhythm perception, but not with melody perception. Journal of Experimental
Psychology: Human Perception and Performance 41(2), 277–282.
Bialystok, E., & DePape, A.-M. (2009). Musical expertise, bilingualism, and executive functioning.
Journal of Experimental Psychology: Human Perception and Performance 35(2), 565–574.
Bidelman, G. M., & Alain, C. (2015). Musical training orchestrates coordinated neuroplasticity in
auditory brainstem and cortex to counteract age-related declines in categorical vowel perception.
Journal of Neuroscience 35(3), 1240–1249.
Bidelman, G. M., Hutka, S., & Moreno, S. (2013). Tone language speakers and musicians share
enhanced perceptual and cognitive abilities for musical pitch: Evidence for bidirectionality
between the domains of language and music. PLoS ONE 8(4): e60676.
Bidelman, G. M., & Krishnan, A. (2010). Effects of reverberation on brainstem representation of
speech in musicians and nonmusicians. Brain Research 1355, 112–125.
Bilhartz, T. D., Bruhn, R. A., & Olson, J. E. (2000). The effect of early music training on child
cognitive development. Journal of Applied Developmental Psychology 20(4), 615–636.
Boebinger, D., Evans, S., Rosen, S., Lima, C.F., Manly, T., & Scott, S. K. (2015). Musicians and non-
musicians are equally adept at perceiving masked speech. Journal of the Acoustical Society of
America 137(1), 378–387.
Brandler, S., & Rammsayer, T. H. (2003). Differences in mental abilities between musicians and non-
musicians. Psychology of Music 31(2), 123–138.
Brochard, R., Dufour, A., & Després, O. (2004). Effect of musical expertise on visuospatial abilities:
Evidence from reaction times and mental imagery. Brain and Cognition 54(2), 103–109.
Brody, N. (1992). Intelligence (2nd ed.). San Diego, CA: Academic Press.
Bugos, J. A., Perlstein, W. M., McCrae, C. S., Brophy, T. S., & Bedenbaugh, P. H. (2007).
Individualized piano instruction enhances executive functioning and working memory in older
adults. Aging and Mental Health 11(4), 464–471.
Butzlaff, R. (2000). Can music be used to teach reading? Journal of Aesthetic Education 34(3–4),
167–178.
Catterall, J., Chapleau, R., & Iwanaga, J. (1999). Involvement in the arts and human development:
General involvement and intensive involvement in music and theatre arts. In E. Fiske (Ed.),
Champions of change: The impact of the arts on learning (pp. 1–18). Washington, DC: The Arts
Education Partnership and The President’s Committee on the Arts and the Humanities.
Chan, A. S., Ho, Y. C., & Cheung, M. C. (1998). Music training improves verbal memory. Nature
396(6707), 128.
Cheek, J. M., & Smith, L. R. (1999). Music training and mathematics achievement. Adolescence
34(136), 759–761.
Chobert, J., François, C., Velay, J. L., & Besson, M. (2014). Twelve months of active musical
training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and
voice onset time. Cerebral Cortex 24(4), 956–967.
Chobert, J., Marie, C., François, C., Schön, D., & Besson, M. (2011). Enhanced passive and active
processing of syllables in musician children. Journal of Cognitive Neuroscience 23(12), 3874–
3887.
Conway, A. R. A., Getz, S. J., Macnamara, B., & Engel de Abreu, P. M. J. (2011). Working memory
and fluid intelligence. In R. J. Sternberg & S. B. Kaufman (Eds.), The Cambridge handbook of
intelligence (pp. 394–418). Cambridge: Cambridge University Press.
Corenblum, B., & Marshall, E. (1998). The band played on: Predicting students’ intentions to
continue studying music. Journal of Research in Music Education 46(1), 128–140.
Corrigall, K. A., & Schellenberg, E. G. (2015). Predicting who takes music lessons: Parent and child
characteristics. Frontiers in Psychology 6, 282. doi: 10.3389/fpsyg.2015.00282
Corrigall, K. A., Schellenberg, E. G., & Misura, N. M. (2013). Music training, cognition, and
personality. Frontiers in Psychology 4, 222. doi: 10.3389/fpsyg.2013.00222
Corrigall, K. A., & Trainor, L. J. (2011). Associations between length of music training and reading
skills in children. Music Perception 29(2), 147–155.
Costa-Giomi, E. (1999). The effects of three years of piano instruction on children’s cognitive
development. Journal of Research in Music Education 47(3), 198–212.
Costa-Giomi, E. (2004). Effects of three years of piano instruction on children’s academic
achievement, school performance and self-esteem. Psychology of Music 32(2), 139–152.
Dankovičová, J., House, J., Crooks, A., & Jones, K. (2007). The relationship between musical skills,
music training, and intonation analysis skills. Language and Speech 50(2), 177–225.
Deary, I. J., Strand, S., Smith, P., & Fernandes, C. (2007). Intelligence and educational achievement.
Intelligence 35(1), 13–21.
Degé, F., Kubicek, C., & Schwarzer, G. (2011). Music lessons and intelligence: A relation mediated
by executive functions. Music Perception 29(2), 195–201.
Degé, F., & Schwarzer, G. (2011). The effect of a music program on phonological awareness in
preschoolers. Frontiers in Psychology 2, 124. doi: 10.3389/fpsyg.2011.00124.
Degé, F., Wehrum, S., Stark, R., & Schwarzer, G. (2014). Music lessons and academic self-concept in
12- to 14-year-old children. Musicae Scientiae 18(2), 203–215.
Delogu, F., Lampis, G., & Belardinelli, M. O. (2010). From melody to lexical tone: Musical ability
enhances specific aspects of foreign language perception. European Journal of Cognitive
Psychology 22(1), 46–61.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical
representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Faßhauer, C., Frese, A., & Evers, S. (2015). Musical ability is associated with enhanced auditory and
visual cognitive processing. BMC Neuroscience 16(1), 1. Retrieved from
https://doi.org/10.1186/s12868-015-0200-4
Fauvel, B., Groussard, M., Mutlu, J., Arenaza-Urquijo, E. M., Eustache, F., Desgranges, B., & Platel,
H. (2014). Musical practice and cognitive aging: Two cross-sectional studies point to phonemic
fluency as a potential candidate for a use-dependent adaptation. Frontiers in Aging Neuroscience
6, 227. doi:10.3389/fnagi.2014.00227
Fitzpatrick, K. R. (2006). The effect of instrumental music participation and socioeconomic status on
Ohio fourth-, sixth-, and ninth-grade proficiency test performance. Journal of Research in Music
Education 54(1), 73–84.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training
increases phonological awareness and reading skills in developmental dyslexia: A Randomized
control trial. PloS ONE 10(9), e0138715.
Forgeard, M., Winner, E., Norton, A., & Schlaug, G. (2008). Practicing a musical instrument in
childhood is associated with enhanced verbal ability and nonverbal reasoning. PLoS ONE 3(10),
e3566.
Frakes, L. (1985). Differences in music achievement, academic achievement, and attitude among
participants, dropouts, and nonparticipants in secondary school music. Dissertation Abstracts
International 46, 370A. University Microfilms No. AAC8507938.
François, C., Chobert, J., Besson, M., & Schön, D. (2013). Music training for the development of
speech segmentation. Cerebral Cortex 23(9), 2038–2043.
Franklin, M. S., Moore, K. S., Yip, C., Jonides, J., Rattray, K., & Moher, J. (2008). The effects of
musical training on verbal memory. Psychology of Music 36(3), 353–365.
Gardiner, M., Fox, A., Knowles, F., & Jeffry, D. (1996). Learning improved by arts training. Nature
381(6580), 284.
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians.
Journal of Neuroscience 23(27), 9240–9245.
Gibson, C., Folley, B. S., & Park, S. (2009). Enhanced divergent thinking and creativity in musicians:
A behavioral and near-infrared spectroscopy study. Brain and Cognition 69(1), 162–169.
Good, A., Gordon, K. A., Papsin, B. C., Nespoli, G., Hopyan, T., Peretz, I., & Russo, F. A. (2017).
Benefits of music training for perception of emotional speech prosody in deaf children with
cochlear implants. Ear & Hearing 38(4), 455–464.
Gordon, E. E. (1965). Music aptitude profile. Chicago: GIA.
Gouzouasis, P., Guhn, M., & Kishor, N. (2007). The predictive relationship between achievement and
participation in music and achievement in core Grade 12 academic subjects. Music Education
Research 9(1), 81–92.
Grassi, M., Meneghetti, C., Toffalini, E., & Borella, E. (2017). Auditory and cognitive performance
in elderly musicians and nonmusicians. PLoS ONE 12(11), e0187881.
Graziano, A. B., Peterson, M., & Shaw, G. L. (1999). Enhanced learning of proportional math
through music training and spatial-temporal training. Neurological Research 21(2), 139–152.
Gromko, J. E. (2005). The effect of music instruction on phonemic awareness in beginning readers.
Journal of Research in Music Education 53(3), 199–209.
Gromko, J. E., & Poorman, A. S. (1998). Developmental trends and relationships in children’s aural
perception and symbol use. Journal of Research in Music Education 46(1), 16–23.
Gruhn, W., Galley, N., & Kluth, C. (2003). Do mental speed and musical abilities interact? Annals of
the New York Academy of Sciences 999, 485–496.
Guo, X., Ohsawa, C., Suzuki, A., & Sekiyama, K. (2018). Improved digit span in children after a 6-
week intervention of playing a musical instrument: An exploratory randomized controlled trial.
Frontiers in Psychology 8, 2303. doi: 10.3389/fpsyg.2017.02303
Hanna-Pladdy, B., & Gajewski, B. (2012). Recent and past musical activity predicts cognitive aging
variability: Direct comparison with general lifestyle activities. Frontiers in Human Neuroscience
6, 198. doi: 10.3389/fnhum.2012.00198
Hanna-Pladdy, B., & MacKay, A. (2011). The relation between instrumental musical activity and
cognitive aging. Neuropsychology 25(3), 378–386.
Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal
training on development. Trends in Cognitive Sciences 11(11), 466–472.
Hansen, M., Wallentin, M., & Vuust, P. (2013). Working memory and musical competence of
musicians and non-musicians. Psychology of Music 41(6), 779–793.
Hassler, M., Birbaumer, N., & Feil, A. (1985). Musical talent and visuo-spatial abilities: A
longitudinal study. Psychology of Music 13(2), 99–113.
Haywood, S., Griggs, J., Lloyd, C., Morris, S., Kiss, Z., & Skipp, A. (2015). Creative futures: Act,
sing, play. Evaluation report and executive summary. London: Educational Endowment
Foundation.
Helmbold, N., Rammsayer, T., & Altenmüller, E. (2005). Differences in primary mental abilities
between musicians and nonmusicians. Journal of Individual Differences 26, 74–85.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity:
Behavior, function, and structure. Neuron 76(3), 486–502.
Herrero, L., & Carriedo, N. (2018). Differences in updating processes between musicians and non-
musicians from late childhood to adolescence. Learning and Individual Differences 61, 188–195.
Hetland, L. (2000). Learning to make music enhances spatial reasoning. Journal of Aesthetic
Education 34(3–4), 179–238.
Ho, Y., Cheung, M., & Chan, A. S. (2003). Music training improves verbal but not visual memory:
Cross-sectional and longitudinal explorations in children. Neuropsychology 17(3), 439–450.
Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Perrig, W. J. (2008). Improving fluid intelligence with
training on working memory. Proceedings of the National Academy of Sciences 105, 6829–6833.
Jakobson, L. S., Lewycky, S. T., Kilgour, A. R., & Stoesz, B. M. (2008). Memory for verbal and
visual material in highly trained musicians. Music Perception 26(1), 41–55.
Jaschke, A. C., Honing, H., & Scherder, E. J. A. (2018). Longitudinal analysis of music education on
executive functions in primary school children. Frontiers in Neuroscience 12, 103.
doi:10.3389/fnins.2018.00103
Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational
Review 39(1), 1–123.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Judge, T. A., Higgins, C. A., Thoresen, C. J., & Barrick, M. R. (1999). The big five personality traits,
general mental ability, and career success across the life span. Personnel Psychology, 52(3), 621–
652.
Kaviani, H., Mirbaha, H., Pournaseh, M., & Sagan, O. (2014). Can music lessons increase the
performance of preschool children in IQ tests? Cognitive Processing 15(1), 77–84.
Kinney, D. W. (2008). Selected demographic variables, school music participation and achievement
test scores of urban middle school students. Journal of Research in Music Education 56(2), 145–
161.
Kinney, D. W. (2010). Selected nonmusic predictors of urban students’ decisions to enroll and persist
in middle school band programs. Journal of Research in Music Education 57(4), 334–350.
Klinedinst, R. E. (1991). Predicting performance achievement and retention of fifth-grade
instrumental students. Journal of Research in Music Education 39(3), 225–238.
Klingberg, T. (2010). Training and plasticity of working memory. Trends in Cognitive Sciences 14(7),
317–324.
Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in
musicians. Neuroreport 10, 1309–1313.
Kolinsky, R., Cuvelier, H., Goetry, V., Peretz, I., & Morais, J. (2009). Music training facilitates
lexical stress processing. Music Perception 26(3), 235–246.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.
Nature Reviews Neuroscience 11, 599–605.
Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., & White-Schwoch, T.
(2014). Music enrichment programs improve the neural encoding of speech in at-risk children.
Journal of Neuroscience 34(36), 11913–11918.
Law, L. N. C., & Zentner, M. (2012). Assessing musical abilities objectively: Construction and
validation of the Profile of Music Perception Skills. PLoS ONE 7(12), e52508.
Leng, X., Shaw, G. L., & Wright, E. L. (1990). Coding of musical structure and the trion model of
cortex. Music Perception 8(1), 49–62.
Love, J. M., Chazan-Cohen, R., Raikes, H., & Brooks-Gunn, J. (2013). What makes a difference:
Early Head Start evaluation findings in a developmental context. Monographs of the Society for
Research in Child Development 78(1), 1–173.
Mackintosh, N. J. (2011). IQ and human intelligence (2nd ed.). Oxford: Oxford University Press.
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music
and language better than nonmusician children: Behavioral and electrophysiological approaches.
Journal of Cognitive Neuroscience 18(2), 199–211.
Mansens, D., Deeg, D. J. H., & Comijs, H. C. (2017). The association between singing and/or
playing a musical instrument and cognitive functions in older adults. Aging & Mental Health.
doi:10.1080/13607863.2017.1328481
Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a
foreign language better than non-musicians: Behavioral and electrophysiological evidence. Journal
of Cognitive Neuroscience 19(9), 1453–1463.
Mehr, S. A., Schachner, A., Katz, R. C., & Spelke, E. S. (2013). Two randomized trials provide no
consistent evidence for nonmusical cognitive benefits of brief preschool music enrichment. PLoS
ONE 8(12), e82007.
Melby-Lervåg, M., & Hulme, C. (2013). Is working memory training effective? A meta-analytic
review. Developmental Psychology 49(2), 270–291.
Melby-Lervåg, M., Redick, T. S., & Hulme, C. (2016). Working memory training does not improve
performance on measures of intelligence or other measures of “far transfer.” Perspectives on
Psychological Science 11(4), 512–534.
Moody, D. E. (2009). Can intelligence be increased by training on a task of working memory?
Intelligence 37, 327–328.
Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Short-
term music training enhances verbal intelligence and executive function. Psychological Science
22(11), 1425–1433.
Moreno, S., Friesen, D., & Bialystok, E. (2011). Effect of music training on promoting preliteracy
skills: Preliminary causal evidence. Music Perception 29(2), 165–172.
Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical
training influences linguistic abilities in 8-year-old children: More evidence for brain plasticity.
Cerebral Cortex 19(3), 712–723.
Mosing, M. A., Madison, G., Pedersen, N. L., & Ullén, F. (2016). Investigating cognitive transfer
within the framework of music practice: Genetic pleiotropy rather than causality. Developmental
Science 19(3), 504–512.
Münte, T. F., Altenmüller, E., & Jäncke, L. (2002). The musician’s brain as a model of
neuroplasticity. Nature Reviews Neuroscience 3(6), 473–478.
Okada, B. M., & Slevc, R. L. (2018). Individual differences in musical training and executive
functions: A latent variable approach. Memory & Cognition. doi:10.3758/s13421-018-0822-8
Overy, K. (2003). Dyslexia and music. Annals of the New York Academy of Sciences 999, 497–505.
Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician enhancement for speech in
noise. Ear and Hearing 30(6), 653–661.
Parbery-Clark, A., Strait, D. L., Anderson, S., Hittner, E., & Kraus, N. (2011). Musical experience
and the aging auditory system: Implications for cognitive abilities and hearing speech in noise.
PLoS ONE 6(5), e18082.
Parbery-Clark, A., Tierney, A., Strait, D. L., & Kraus, N. (2012). Musicians have fine-tuned neural
distinction of speech syllables. Neuroscience 219, 111–119.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience 6, 674–681.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA
hypothesis. Frontiers in Psychology 2, 142. doi: 10.3389/fpsyg.2011.00142
Patston, L. L. M., & Tippett, L. J. (2011). The effect of background music on cognitive performance
in musicians and nonmusicians. Music Perception 29(2), 173–183.
Petitto, L. (2008). Arts education, the brain, and language. In B. Rich & C. Asbury (Eds.), Learning,
arts, and the brain: The Dana Consortium report on arts and cognition (pp. 93–104). New
York/Washington, DC: The Dana Foundation.
Piro, J. M., & Ortiz, C. (2009). The effect of piano lessons on the vocabulary and verbal sequencing
skills of primary grade students. Psychology of Music 37(3), 325–347.
Portowitz, A., Lichtenstein, O., Egorova, L., & Brand, E. (2009). Underlying mechanisms linking
music education and cognitive modifiability. Research Studies in Music Education 31(2), 107–128.
Posedel, J., Emery, L., Souza, B., & Fountain, C. (2012). Pitch perception, working memory, and
second-language phonological production. Psychology of Music 40(4), 508–517.
Posner, M., Rothbart, M. K., Sheese, B. E., & Kieras, J. (2008). How arts training influences
cognition. In B. Rich & C. Asbury (Eds.), Learning, arts, and the brain: The Dana Consortium
report on arts and cognition (pp. 1–10). New York/Washington, DC: The Dana Foundation.
Rapport, M. D., Orban, S. A., Kofler, M. J., & Friedman, L. M. (2013). Do programs designed to
train working memory, other executive functions, and attention benefit children with ADHD? A
meta-analytic review of cognitive, academic, and behavioral outcomes. Clinical Psychology
Review 33(8), 1237–1252.
Rauscher, F. H., & Hinton, S. C. (2011). Music instruction and its diverse extra-musical benefits.
Music Perception 29(2), 215–226.
Rauscher, F. H., & Shaw, G. L. (1998). Key components of the Mozart effect. Perceptual and Motor
Skills 86(3), 835–841.
Rauscher, F. H., Shaw, G. L., Levine, L. J., Wright, E. L., Dennis, W. R., & Newcomb, R. L. (1997).
Music training causes long-term enhancement of preschool children’s spatial-temporal reasoning.
Neurological Research 19(1), 2–8.
Rauscher, F. H., & Zupan, M. A. (2000). Classroom keyboard instruction improves kindergarten
children’s spatial-temporal performance: A field experiment. Early Childhood Research Quarterly
15(2), 215–228.
Roden, I., Grube, D., Bongard, S., & Kreutz, G. (2014). Does music training enhance working
memory performance? Findings from a quasi-experimental longitudinal study. Psychology of
Music 42(2), 284–298.
Rueda, M. R., Rothbart, M. K., McCandliss, B. D., Saccomanno, L., & Posner, M. I. (2005).
Training, maturation and genetic influences on the development of executive attention.
Proceedings of the National Academy of Sciences 102, 14931–14936.
Ruggles, D. R., Freyman, R. L., & Oxenham, A. J. (2014). Influence of musical training on
understanding voiced and whispered speech in noise. PLoS ONE 9(1), e86980.
Sala, G., & Gobet, F. (2016). Do the benefits of chess instruction transfer to academic and cognitive
skills? A meta-analysis. Educational Research Review 18, 46–57.
Sala, G., & Gobet, F. (2017a). When the music’s over: Does music skill transfer to children’s and
young adolescents’ cognitive and academic skills? A meta-analysis. Educational Research Review
20, 55–67.
Sala, G., & Gobet, F. (2017b). Working memory training in typically developing children: A meta-
analysis of the available evidence. Developmental Psychology 53, 671–685.
Sala, G., Tatlidil, K. S., & Gobet, F. (2018). Video game training does not enhance cognitive ability:
A comprehensive meta-analytic investigation. Psychological Bulletin 144, 111–139.
Salthouse, T. A. (2004). What and when of cognitive aging. Current Directions in Psychological
Science 13(4), 140–144.
Salthouse, T. A. (2005). Relations between cognitive abilities and measures of executive functioning.
Neuropsychology 19(4), 532–545.
Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science 15(8), 511–514.
Schellenberg, E. G. (2006). Long-term positive associations between music lessons and IQ. Journal
of Educational Psychology 98(2), 457–468.
Schellenberg, E. G. (2011a). Examining the association between music lessons and intelligence.
British Journal of Psychology 102(3), 283–302.
Schellenberg, E. G. (2011b). Music lessons, emotional intelligence, and IQ. Music Perception 29(2),
185–194.
Schellenberg, E. G., & Mankarious, M. (2012). Music training and emotion comprehension in
childhood. Emotion 12(5), 887–891.
Schellenberg, E. G., & Moreno, S. (2010). Music lessons, pitch processing and g. Psychology of
Music 38(2), 209–221.
Schellenberg, E. G., & Peretz, I. (2008). Music, language and cognition: Unresolved issues. Trends in
Cognitive Sciences 12(2), 45–46.
Schellenberg, E. G., & Weiss, M. W. (2013). Music and cognitive abilities. In D. Deutsch (Ed.), The
psychology of music (3rd ed.). San Diego, CA: Elsevier.
Seashore, C. E., Lewis, D., & Saetveit, J. G. (1960). The Seashore measures of musical talents. New
York: Psychological Corporation.
Shipstead, Z., Redick, T. S., & Engle, R. W. (2012). Is working memory training effective?
Psychological Bulletin 138, 628–654.
Sirin, S. R. (2005). Socioeconomic status and academic achievement: A meta-analytic review of
research. Review of Educational Research 75(3), 417–453.
Skoe, E., & Kraus, N. (2012). Human subcortical auditory function provides a new conceptual
framework for considering modularity. In P. Rebuschat, M. Rohrmeier, J. A. Hawkins, & I. Cross
(Eds.), Language and music as cognitive systems (pp. 269–282). Oxford: Oxford University Press.
Slater, J., Skoe, E., Strait, D. L., O’Connell, S., Thompson, E., & Kraus, N. (2015). Music training
improves speech-in-noise perception: Longitudinal evidence from a community-based music
program. Behavioural Brain Research 291, 244–252.
Slevc, L. R., Davey, N. S., Buschkuehl, M., & Jaeggi, S. M. (2016). Tuning the mind: Exploring the
connections between musical ability and executive functions. Cognition 152, 199–211.
Sluming, V., Brooks, J., Howard, M., Downes, J. J., & Roberts, N. (2007). Broca’s area supports
enhanced visuospatial cognition in orchestral musicians. Journal of Neuroscience 27(14), 3799–
3806.
Soveri, A., Antfolk, J., Karlsson, L., Salo, B., & Laine, M. (2017). Working memory training
revisited: A multi-level meta-analysis of n-back training studies. Psychonomic Bulletin & Review
24(4), 1077–1096.
Spelke, E. (2008). Effects of music instruction on developing cognitive systems at the foundation of
mathematics and science. In B. Rich & C. Asbury (Eds.), Learning, arts, and the brain: The Dana
Consortium report on arts and cognition (pp. 17–49). New York/Washington, DC: The Dana
Foundation.
Spinath, B., Spinath, F. M., Harlaar, N., & Plomin, R. (2006). Predicting school achievement from
general cognitive ability, self-perceived ability, and intrinsic value. Intelligence 34(4), 363–374.
Standley, J. M. (2008). Does music instruction help children learn to read? Evidence of a meta-
analysis. Update: Applications of Research in Music Education 27(1), 17–32.
Stoesz, B., Jakobson, L., Kilgour, A., & Lewycky, S. (2007). Local processing advantage in
musicians: Evidence from disembedding and constructional tasks. Music Perception 25(2), 153–
165.
Strait, D. L., & Kraus, N. (2011). Can you hear me now? Musical training shapes functional brain
networks for selective auditory attention and hearing speech in noise. Frontiers in Psychology 2,
113. doi: 10.3389/fpsyg.2011.00113
Strait, D. L., O’Connell, S., Parbery-Clark, A., & Kraus, N. (2014). Musicians’ enhanced neural
differentiation of speech sounds arises early in life: Developmental evidence from ages 3 to 30.
Cerebral Cortex 24(9), 2512–2521.
Strait, D. L., Parbery-Clark, A., Hittner, E., & Kraus, N. (2012). Musical training during early
childhood enhances the neural encoding of speech in noise. Brain and Language 123(3), 191–201.
Strait, D. L., Parbery-Clark, A., O’Connell, S., & Kraus, N. (2013). Biological impact of preschool
music classes on processing speech in noise. Developmental Cognitive Neuroscience 6, 51–60.
Swaminathan, J., Mason, C. R., Streeter, T. M., Best, V., Kidd Jr, G., & Patel, A. D. (2015). Musical
training, individual differences and the cocktail party problem. Scientific Reports 5, 11628.
doi:10.1038/srep11628
Swaminathan, S., & Gopinath, J. K. (2013). Music training and second-language English
comprehension and vocabulary skills in Indian children. Psychological Studies 58(2), 164–170.
Swaminathan, S., & Schellenberg, E. G. (2016). Music training. In T. Strobach & J. Karbach (Eds.),
Cognitive training: An overview of features and applications (pp. 137–144). New York: Springer.
Swaminathan, S., & Schellenberg, E. G. (2017). Musical competence and phonemic perception in a
foreign language. Psychonomic Bulletin and Review 24(6), 1929–1934.
Swaminathan, S., & Schellenberg, E. G. (2018). Musical competence is predicted by music training,
cognitive abilities, and personality. Manuscript submitted for publication.
Swaminathan, S., Schellenberg, E. G., & Khalil, S. (2017). Revisiting the association between music
lessons and intelligence: Training effects or music aptitude? Intelligence 62, 119–124.
Swaminathan, S., Schellenberg, E. G., & Venkatesan, K. (2018). Explaining the association between
music training and reading in adults. Journal of Experimental Psychology: Learning, Memory, and
Cognition. doi: 10.1037/xlm0000493
Talamini, F., Grassi, M., Toffalini, E., Santoni, R., & Carretti, B. (2018). Learning a second language:
Can music aptitude or music training have a role? Learning and Individual Differences 64, 1–7.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: Do music
lessons help? Emotion 4(1), 46–64.
Thomson, J. M., Leong, V., & Goswami, U. (2013). Auditory processing interventions and
developmental dyslexia: A comparison of phonemic and rhythmic approaches. Reading and
Writing 26(2), 139–161.
Thorndike, E. L., & Woodworth, R. S. (1901a). The influence of improvement in one mental function
upon the efficiency of other functions (I). Psychological Review 8, 247–261.
Thorndike, E. L., & Woodworth, R. S. (1901b). The influence of improvement in one mental
function upon the efficiency of other functions (II). The estimation of magnitudes. Psychological
Review 8, 384–395.
Tierney, A., Krizman, J., Skoe, E., Johnston, K., & Kraus, N. (2013). High school music classes
enhance the neural processing of speech. Frontiers in Psychology 4, 855. doi:
10.3389/fpsyg.2013.00855.
Trimmer, C. G., & Cuddy, L. L. (2008). Emotional intelligence, not music training, predicts
recognition of emotional speech prosody. Emotion 8(6), 838–849.
Vaughn, K. (2000). Music and mathematics: Modest support for the oft-claimed relationship. Journal
of Aesthetic Education 34(3–4), 149–166.
Vaughn, K., & Winner, E. (2000). SAT scores of students who study the arts: What we can and
cannot conclude about the association. Journal of Aesthetic Education 34(3–4), 77–89.
Wallentin, M., Nielsen, A. H., Friis-Olivarius, M., Vuust, C., & Vuust, P. (2010). The Musical Ear
Test: A new reliable test for measuring musical competence. Learning and Individual Differences
20(3), 188–196.
Wan, C. Y., & Schlaug, G. (2010). Music making as a tool for promoting brain plasticity across the
life span. The Neuroscientist 16(5), 566–577.
Wandell, B., Dougherty, R. F., Ben-Shachar, M., Deutsch, G. K., & Tsang, J. (2008). Training in the
arts, reading, and brain imaging. In B. Rich & C. Asbury (Eds.), Learning, arts, and the brain: The
Dana Consortium report on arts and cognition (pp. 51–59). New York/Washington, DC: The Dana
Foundation.
Weicker, J., Villringer, A., & Thöne-Otto, A. (2016). Can impaired working memory functioning be
improved by training? A meta-analysis with a special focus on brain injured patients.
Neuropsychology 30(2), 190–212.
Weiss, M. W., & Bidelman, G. M. (2015). Listening to the brainstem: Musicianship enhances
intelligibility of subcortical representations for speech. Journal of Neuroscience 35(4), 1687–1691.
Wetter, O. E., Koerner, F., & Schwaninger, A. (2009). Does musical training improve school
performance? Instructional Science 37(4), 365–374.
White-Schwoch, T., Carr, K. W., Anderson, S., Strait, D. L., & Kraus, N. (2013). Older adults benefit
from music training early in life: Biological evidence for long-term training-driven plasticity.
Journal of Neuroscience 33(45), 17667–17674.
Winner, E., & Cooper, M. (2000). Mute those claims: No evidence (yet) for a causal link between
arts study and academic achievement. Journal of Aesthetic Education 34(3–4), 11–75.
Winner, E., Goldstein, T. R., & Vincent-Lancrin, S. (2012). The impact of arts education: What do
we know? Paris: Organisation for Economic Co-operation and Development.
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes
human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10, 420–422.
Wu, H., Ma, X., Zhang, L., Liu, Y., Zhang, Y., & Shu, H. (2015). Musical experience modulates
categorical perception of lexical tones in native Chinese speakers. Frontiers in Psychology 6, 436.
doi:10.3389/fpsyg.2015.00436
Zendel, B. R., & Alain, C. (2009). Concurrent sound segregation is enhanced in musicians. Journal
of Cognitive Neuroscience 21(8), 1488–1498.
Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory
processing. Psychology and Aging 27(2), 410–417.
Zendel, B. R., & Alain, C. (2013). The influence of lifelong musicianship on neurophysiological
measures of concurrent sound segregation. Journal of Cognitive Neuroscience 25(4), 503–516.
Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive
functioning in musicians and non-musicians. PLoS ONE 9(6), e99868.
Zuk, J., Ozernov-Palchik, O., Kim, H., Lakshminarayanan, K., Gabrieli, J. D., Tallal, P., & Gaab, N.
(2013). Enhanced syllable discrimination thresholds in musicians. PLoS ONE 8(12), e80546.
CHAPTER 27

THE NEUROSCIENCE OF
CHILDREN ON THE
AUTISM SPECTRUM WITH
EXCEPTIONAL MUSICAL
ABILITIES

ADAM OCKELFORD

This chapter considers the exceptional musicianship that characterizes some children on the autism spectrum who have learning difficulties, for whom
many areas of life that most of us take for granted—speech and language,
emotional intelligence, and social skills—present sometimes
insurmountable challenges. Yet as developing musicians these children may
function in much the same way as infant prodigies (McPherson, 2016).
How is this possible, and what does it tell us about their evolving musical minds? We begin by revisiting William Gaver’s “ecological” analysis of
hearing in the context of autism.

Gaver’s Ecological Theory of Listening and Autism
In his “ecological” approach to understanding auditory perception, Gaver
describes how, in everyday contexts, listeners typically privilege the
function of sounds over their acoustic properties (Gaver, 1993). He gives
the example of a pedestrian walking along an alley when a sound starts to
emerge from behind: that of a car with a large and powerful engine. It is
possible, Gaver contends, that the person concerned may attend to the
sound’s timbre, noticing whether it is rough or smooth, or bright or dull.
Paying attention to these attributes, which have to do with the quality of the sound itself, Gaver terms “musical” listening. It is more likely, however, in the situation described, that the pedestrian will notice that the car is
approaching rather quickly from behind and that the sound of its engine is
echoing off the narrow walls of the alley. There is a need to move quickly to
get out of the vehicle’s way! This Gaver terms “everyday” listening: the
experience of focusing on the significance of an event rather than its
acoustic properties. However, there are people for whom this unthinking
prioritization of function over form does not appear to occur—among them,
many children with so-called “classic” autism—the type first identified by
Leo Kanner in 1943.
Autism is a lifelong neurological condition that typically manifests
itself within the first two or three years of childhood (see, for example,
Boucher, 2009; Frith, 2003; Wing, 2003). Its effects pervade the whole of a
child’s development. According to the Diagnostic and Statistical Manual of
Mental Disorders, Fifth Edition (the so-called “DSM-5”), published by the
American Psychiatric Association in 2013, Autism Spectrum Disorder is
identified through two criteria (APA, 2013, 299.00 (F84.0)):

A. Persistent deficits in social communication and social interaction across multiple contexts, as manifested by the following [illustrative examples]:
1. Deficits in social-emotional reciprocity, ranging, for example,
from abnormal social approach and failure of normal back-and-
forth conversation; to reduced sharing of interests, emotions, or
affect; to failure to initiate or respond to social interactions.
2. Deficits in nonverbal communicative behaviors used for social
interaction, ranging, for example, from poorly integrated verbal
and nonverbal communication; to abnormalities in eye contact
and body language or deficits in understanding and use of
gestures; to a total lack of facial expressions and nonverbal
communication.
3. Deficits in developing, maintaining, and understanding
relationships, ranging, for example, from difficulties adjusting
behavior to suit various social contexts; to difficulties in
sharing imaginative play or in making friends; to absence of
interest in peers.
B. Restricted, repetitive patterns of behavior, interests, or activities, as
manifested by at least two of the following [illustrative examples]:
1. Stereotyped or repetitive motor movements, use of objects, or
speech (e.g., simple motor stereotypies, lining up toys or
flipping objects, echolalia, idiosyncratic phrases).
2. Insistence on sameness, inflexible adherence to routines, or ritualized patterns of verbal or nonverbal behavior (e.g., extreme distress at small changes, difficulties with transitions, rigid thinking patterns, greeting rituals, need to take same route or eat same food every day).
3. Highly restricted, fixated interests that are abnormal in intensity or focus (e.g., strong attachment to or preoccupation with unusual objects, excessively circumscribed or perseverative interests).
4. Hyper- or hyporeactivity to sensory input or unusual interests in
sensory aspects of the environment (e.g., apparent indifference
to pain/temperature, adverse response to specific sounds or
textures, excessive smelling or touching of objects, visual
fascination with lights or movement).

Criterion B, Example 4 is of particular interest to us here: the observation that the perceptual qualities of objects may be more important
than their function. This is corroborated by the accounts of parents of
children on the autism spectrum, who often report a deep-rooted fascination
with sound for its own sake (Ockelford, 2013, pp. 13, 14). For instance, one
mother reports that her son Jack “is obsessed with the beeping sound of the
microwave when its cooking cycle comes to an end. He can’t bear to leave
the kitchen till it’s stopped. And just lately, he’s become very interested in
the whirr of the tumble-drier too.” Another describes how her 4-year-old
daughter just repeats what is said to her. “For a long time, she didn’t speak
at all, but now I might say, ‘Hello, Anna’, and she will reply ‘Hello, Anna’.
I ask ‘Do you want to play with your toys’ and she just says ‘Play with your
toys’, though I don’t think she really knows what I mean.” A father relates
how his son Ben “wants to listen to the jingles that he downloads from the
internet all the time—16 hours a day if we let him. He doesn’t even play
them all the way through: sometimes just the first couple of seconds of a
clip, over and over again. He must have heard them thousands of times, but
he never seems to get bored.” Freddie, according to his mother, is obsessed
with sound too. He “constantly flicks any glasses, bowls, pots or pans that
are within reach.” He once “emptied out the dresser—and even brought in
half a dozen flowerpots from the garden—and lined everything up on the
floor. Then he sat and ‘played’ his new instrument for hours.” Finally,
Romy goes through phases of only pretending to play the notes on her
keyboard—touching the keys with her fingers but not actually pressing
them down. She also “introduces everyday sounds that she hears into her
improvising. For example, she plays the complicated descending harmonic
sound of the airplanes coming in to land at Heathrow as chords, and
somehow integrates them into the music she is playing.”
What causes some autistic children to hear sounds in this way? And
what impact does this idiosyncratic style of auditory perception have on the
way they engage with music? To contextualize these questions, let us
consider the manner in which so-called “neurotypical” infants come to
process sound. If Gaver’s ecological model of hearing is correct, then there
must be a stage in auditory development at which the separation between
“functional” and “perceptual” listening occurs. Moreover, there is a further
category of listening that Gaver does not mention: that pertaining to
language, which is ultimately based on the perception and cognition of
speech sounds. The separation of music and language perception ties in
with evidence from neuroscience, which suggests that, while the two
domains share some neurological resources, they also have dedicated
processing pathways (Patel, 2012). These are distinct from those activated
by environmental sounds (Norman-Haignere, Kanwisher, & McDermott,
2015).
As yet, it is unknown how and when the three types of auditory
processing, relating to everyday sounds, music, and speech, become
established in the architecture of the brain following the development of
hearing from around four months before birth (Lecanuet, 1996). There is a
growing body of evidence that musical engagement is essential to language
acquisition (Brandt, Gebrian, & Slevc, 2012), suggesting that the neural
correlates of music perception emerge first. My own work, the “zygonic”
theory of music-structural understanding (Ockelford, 2017), supports this
view.

Zygonic Theory

The theory sets out from the reductionist position that music can be
regarded as a system of perceived sonic variables. Some of these, such as
duration, have a single axis of variability, while others, like timbre, are
multidimensional in nature; some, including loudness, gauge qualities of the sound, while others detail its perceived location in time or space; and some, like
pitch, pertain to individual notes, while others, such as tonality, are
characteristic of a group. Despite this diversity, all these variables have a
common characteristic: they each have many potential modes of existence,
or “values,” whose range represents the freedom of choice available to
composers. Conversely, each may be deemed to be constrained or “ordered”
to the extent that its value is reckoned to be subject to restriction. While
some of the causes of perceived sonic constraint may lie beyond a
composer’s immediate control (the selection of timbre will be dictated by
the availability of performers, for example, and a singer may be unable to
reach a particular pitch), and while external influences (such as the cross-
media effects of song-texts, for instance) often have a bearing, most—and
certainly the most important—perceived sonic restrictions in fact function
intra-musically, through the process of repetition. In short, a value may be
thought to be ordered if it is reckoned to exist in imitation of another.
Since the vast majority of listeners are quite unaware of this type of
cognitive activity, clearly it need not operate at a conscious level. Yet it
must be universally present, if only subliminally, otherwise an orderly
sequence of sounds would prove no more effective a means of musical
communication than a random one, which is not the case. While the
acknowledgment of the central role of repetition in musical structure is
widespread in the music-theoretical and music-psychological literatures, the
function of imitation is less well understood. In this respect, zygonic theory
most closely resonates with the thinking of Cone (1987, p. 237), who
asserts, in relation to the derivation of musical material, that “y is derived
from x (y ← x), or, to use the active voice, x generates y (x → y), if y
resembles x and y follows x. By ‘resembles’, I mean ‘sounds like’.”
In zygonic theory, the connections between x and y that Cone identifies,
through which a sense of derivation (or generation) is imagined to exist, are
termed “zygonic relationships” (after the Greek “zygon,” meaning a yoke
connecting two similar things). This single theoretical concept bequeaths a
vast perceptual legacy, with many manifestations: potentially involving any
perceived aspect of sound; existing over different periods of perceived time;
and operating within the same and between different pieces, performances,
and hearings. Zygonic relationships may function in a number of different
ways: reactively, for example, in assessing the relationship between two
extant values, or proactively, in ideating a value as an orderly continuation
from one presented. They may operate between anticipated or remembered
values, or even those that are wholly imagined, only ever existing in the
mind.
In music-structural terms, zygonic relationships function at three
hierarchical levels: between individual events (notes or chords), groups
(motifs, “hooks,” “licks,” or riffs) and frameworks (imaginary matrices of
pitch and time, whose elements have different perceived probabilities of
occurrence according to a listener’s previous exposure to pieces in a
particular style or genre). Recognizing imitation between individual musical
events is thought to take the least mental processing power of all forms of
structure (Ockelford, 2017, p. 187), since it requires at most two or three
items of musical information, in the form of notes or chords or the intervals
between them, to be held in working memory and compared. The temporal
envelope within which such structures occur is constrained, sometimes
extending to little more than Edmund Husserl’s “perceived present.”
Recognizing relationships between motifs is cognitively more demanding:
organization of this kind necessarily involves four events or more, since at
least two are required to create a group (and a minimum of two groups is
required). The timespans of such structures are potentially greater than
those involving events alone, and may even implicate long-term memory.
There is likely to be a greater degree of abstraction from the perceptual
“surface” too. Imitative links between frameworks of pitch and onset times
appear to be the most cognitively demanding of all. They depend on the
existence of long-term “schematic” memories, whereby the details of the
perceptual surface of music and individual connections perceived between
musical events are not encoded in long-term memory discretely or
independently, but are combined with many thousands of other similar data
to create probabilistic networks of relationships between notional
representations of pitch and perceived time. That is, large amounts of
perceptual information are merged to enable the deep level of cognitive
abstraction to occur.
To sum up: the cognitive correlates of musical structure grow in
complexity as one moves from events, to groups and then frameworks,
reflecting an increasing amount of perceptual input, experienced over
longer periods of time, and processed and stored using progressively more
abstract forms of mental representation. Moreover, the cognitive operations
pertaining to higher levels of structure must build on and incorporate those
required to process lower levels, since connections between groups
comprise series of relationships between events, and links between
frameworks are established by acknowledging the correspondences that
exist between groups.
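As a concrete illustration of the two lower levels of this hierarchy, the following is a minimal sketch, in Python, that scans a toy pitch sequence for imitation between events (a pitch repeating an earlier pitch) and between groups (a later motif repeating or transposing an earlier one, detected by matching interval patterns). The melody, the fixed motif length, and the matching rules are hypothetical simplifications; the sketch is not an implementation of zygonic theory itself, and the framework level, which depends on long-term statistical learning, is omitted.

```python
# Pitches as MIDI numbers: a motif (C-D-E) stated, repeated,
# then transposed up a perfect fourth.
melody = [60, 62, 64, 60, 62, 64, 65, 67, 69]

# Event level: a later pitch heard as imitating an earlier identical pitch.
event_links = [(i, j) for i in range(len(melody))
               for j in range(i + 1, len(melody))
               if melody[i] == melody[j]]

def intervals(segment):
    """Interval pattern (semitone differences between successive pitches)."""
    return tuple(b - a for a, b in zip(segment, segment[1:]))

# Group level: a later motif imitating an earlier one via repetition or
# transposition, detected by comparing interval patterns.
motif_len = 3
motifs = [melody[k:k + motif_len] for k in range(len(melody) - motif_len + 1)]
group_links = [(i, j, motifs[j][0] - motifs[i][0])  # (earlier, later, shift)
               for i in range(len(motifs))
               for j in range(i + 1, len(motifs))
               if intervals(motifs[i]) == intervals(motifs[j])]

print("event-level imitations:", event_links)
print("group-level imitations (transposition in semitones):", group_links)
```

Run on this toy melody, the group-level scan finds both an exact repetition (a shift of 0 semitones) and a transposition of 5 semitones, the two simplest ways in which one motif can be heard as deriving from another.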

The Neurotypical Development of Music and Auditory Processing

To what extent does this hierarchy of music structures tie in with the
development of children’s understanding of music? It seems that very
young children have a built-in propensity to imitate others, and that this
plays a part in early interactive sound-making using individual musical
sounds (Meltzoff & Prinz, 2002). Similarly, in analyzing preverbal
communication in babies from just two to seven months old, Papoušek
(1996) found that up to half of these infants’ vocal sounds are part of
reciprocal matching sequences that the children engage in with their
mothers. These findings complement work by other researchers showing
that babies less than five months of age can replicate individual pitches
(Kessen, Levine, & Wendrich, 1979), copy changes in pitch (Kuhl &
Meltzoff, 1982), and emulate vowel-like sounds made by others. Each of
these forms of interaction involves imitation at the level of events—
showing engagement with musical sounds at the first level of the structural
hierarchy.
Engagement at the next level first appears from seven to eleven months,
when babies repeat and vary groups of sounds, using them as the basic units
of structure, through babbling that, according to Papoušek (1996, p. 106),
involves producing “short musical patterns or phrases that soon become the
core units for a new level of vocal practicing and play.” Gradually, groups
of sounds may be linked through repetition or transposition to form chains,
and the first self-sufficient improvised pieces emerge. Welch (2006) notes
that, between the ages of one and two, a typical spontaneous song
comprises repetitions of a brief melodic phrase at different pitch centers.
These are unlike adult singing, however, because “they lack a framework of
stable pitches (a scale) and use a very limited set of contours in one song”
(Dowling, 1982, pp. 416, 417). From the age of two-and-a-half, so-called
“potpourri” melodies may appear (Moog, 1976, p. 115), which borrow and
may transform features and fragments from other, standard songs that have
been assimilated into the child’s own spontaneous singing (Hargreaves,
1986, p. 73). These self-generated melodies, which use materials derived
from a repertoire that is familiar from a child’s musical culture, are termed
“referent-guided improvisation” by Mang (2005).
Finally, from around the age of four (the age can vary considerably), two
advances occur, which pertain to the third level of the music-structural
hierarchy: frameworks. First, children develop the capacity to abstract an
underlying pulse from the surface rhythm of songs and other pieces
(meaning that they can perform “in time” to a regular beat that is provided). Second, children’s singing acquires “tonal stability,” with the
clear projection of a key center across all the phrases of a piece
(Hargreaves, 1986, pp. 76, 77). These abilities imply a cognizance of
repetition at a deeper structural level in the “background” organization of
music.
How does this compare with language development? Children start to
understand a few key words towards the end of their first year, and will
begin to speak using some of these words from around the age of 12
months. Around this time, they develop the capacity to process short
phrases, and from 18–24 months they learn to juxtapose words in pairs
themselves. Over the next two years, these become amalgamated into
longer and more complex sentences, which are generated through an
intuitive understanding of the syntax of the language (or languages) to
which a child is exposed (Saxton, 2010).
So much for the development of music and language processing in the
early years. We can surmise that the third category in the ecological model
of auditory perception—“everyday” sounds—must perceptually be the most
primitive of all, since it requires less cognitive processing than either music
or speech. Hence we can reasonably assume that, early on in human
development, the brain typically treats all sound in the same way, and that music processing starts to emerge as a distinct strand first, followed by language. We can speculate that “everyday” sounds form the residue that is
left.
Based on these assumptions, a developmental model of ecological
auditory perception can be constructed along the following lines (see Fig. 1). Note the underlying assumption that, in addition to their shared neural
resources, music and language come to have other, distinct neural correlates
during the first year of life. Since the precise nature of the sounds that
constitute speech and music varies from one culture to another, it is
appropriate to regard the model as indicative rather than prescriptive.
FIGURE 1. A visual representation of how the emerging streams of music and language processing
arise in auditory development.

The Development of Auditory Perception in Some Children on the Autism Spectrum

So much for “neurotypical” development. What of children on the autism spectrum, though? The parents’ descriptions cited above suggest that
certain types of sound, particularly those that have a special salience for a
given individual, or that they find singularly pleasing, such as the whirring
of the tumble drier, have little or no functional significance for some autistic
children. Rather, there is a tendency for them to be processed primarily in
terms of their sounding qualities, in the same way that the elements of
music are. Beyond this, it also appears that everyday sounds involving
repetition or regular change (for example, the beeping of a microwave) may
be processed in music-structural terms. That is to say, some children on the
autism spectrum hear repetition that is generated mechanically or
electronically as being imitative (see Fig. 2).
FIGURE 2. Some everyday sounds may be processed as music among children on the autism
spectrum.

There is another possibility that should be acknowledged: that the autistic children who are preoccupied with the sounding qualities of certain everyday objects and the repetitive patterns that some of them make don’t hear these auditory phenomena in a musical way (as being derived one from another through imitation) but purely as environmental regularities. By extension, it
could be the case that the same children don’t hear music in a “musical”
way, either, but merely as patterned sequences of sounds, to which no sense
of human agency is transferred by imitation. Why should this be so? One
explanation would be because such children did not engage in the early
vocal interactions with carers—“communicative musicality” (Malloch &
Trevarthen 2009)—that, early in life, may embed a sense of imitation in
sounds that are repeated (Ockelford, 2017). However, the accounts of
Freddie appropriating everyday sound-makers (flower pots) to be used as
musical instruments, and Romy reproducing the whines of jet engines of
airplanes coming in to land and integrating them into her improvisation at
the piano, suggest that some autistic children, at least, do perceive everyday
sounds in a musical way.
It is conceivable that this tendency is reinforced by the ubiquity of music
in the lives of young children (Lamont, 2008); in the developed world, they
are typically surrounded by electronic games and gadgets, toys, mobile
phones, MP3 players, computers, iPads, TVs, radios, and so on, all of which
emanate music in some form. Music is to be found in much of the wider
human environment too, including cafés, restaurants, shops, cinemas,
waiting rooms, cars and airplanes, and at many religious gatherings and
other public ceremonies. Given that children are inundated with non-
functional (musical) sounds, designed, in one way or another, to influence
emotional states and behavior, perhaps we should not be surprised that the sounds with which they often co-occur, sounds that to neurotypical ears are functional, should come to be processed in the same way.
The manner in which some children on the autism spectrum perceive the
world can have other consequences too. For instance, the development of
language can be affected, resulting in “echolalia”—a distinctive form of
speech widely reported among autistic children (Mills, 1993; Sterponi &
Shankey, 2013) that was first defined as the meaningless repetition of words
or phrases (Fay, 1967, 1973). It appears, however, that echolalia actually
fulfills a range of functions in verbal interaction (Prizant, 1979), including
turn-taking and affirmation, and it often finds a place in non-interactive
contexts too, serving as a self-reflective commentary or rehearsal strategy
(McEvoy, Loveland, & Landry, 1988; Prizant & Duchan, 1981). Given the
zygonic hypothesis that imitation lies at the heart of musical structure
(Ockelford, 2013), it could be argued that one cause of echolalia is the
organization of language (in the absence of semantics and syntax) through
the structure (repetition) that is present in all music. It is as though words
are treated as musical objects in their own right, to be manipulated not
according to their meaning or grammatical function, but purely through
their sounding qualities. This implies a second modification to the
ecological model of auditory development (see Fig. 3).

FIGURE 3. Speech may also be processed musically by some children on the autism spectrum.
It is worth noting that echolalia is not only found in the context of
“special” development; it is a feature of “typical” language acquisition in
young children too (Mcglone-Dorrian & Potter, 1984) when, it seems, the
urge to imitate what is heard outstrips semantic understanding. This accords
with a stage in the ecological model of auditory development when the two
strands of communication through sound—language and music—are not
yet cognitively distinct, and supports the notion that musical development
precedes the onset of language.
For some children on the autism spectrum, music itself can become
“super-structured” with additional repetition, as the account of Ben (above)
shows; it is common for children on the autism spectrum to play snippets of
pieces or videos with music over and over again. It is as though the high
proportion of repetition that characterizes music (which is at least 80
percent—see Ockelford, 2005) is insufficient for the mind that craves
structure, and so it makes even more. In conversing with autistic adults who
are able to verbalize why, as children, they would repeat musical excerpts in
this way, it seems that the main reason for obsessively repeating a
particularly fascinating series of sounds (apart from the sheer enjoyment
that the regularity brings) is that they could hear more and more in the
sequence concerned as they listened to it again and again. Bearing in mind
that most music tends to be highly complex, with many events occurring
simultaneously (and given that even individual notes tend to comprise many
pitches in the form of harmonics), to the child with finely tuned auditory
perception, there are many different things to attend to in even a few
seconds of music, and an even greater number of potential relationships
between sounds to fathom. So, for example, while listening to a passage for
orchestra one hundred times may be extremely tedious to the “neurotypical”
ear, which can detect only half a dozen composite events, each fused in
perception, to the mind of the autistic child, which can break down the
sequence into a dozen different melodic lines, the stimulus may be
captivating.

Absolute Pitch
One of the consequences of an early preoccupation with the “musical”
qualities of sounds appears to be the development of “absolute pitch”—or
“AP.” This is the ability to identify or produce pitches in isolation from
others. In the Western population, the capacity is very rare, with an
estimated prevalence of 1 in 10,000 (Takeuchi & Hulse, 1993). However,
among those on the autism spectrum, the position is markedly different;
recent estimates, derived from parental questionnaires, vary between 8
percent (n = 118; Vamvakari, 2013) and 21 percent (n = 305; Reese, 2014).
These figures are broadly supported by DePape, Hall, Tillmann, and Trainor
(2012) who found that 11 percent of 27 high-functioning adolescents with
autism had AP. It is unusual to find such high orders of difference in the
incidence of a perceptual ability and, evidently, there is something distinct
in the way that the parts of the brain responsible for pitch memory wire
themselves up in a significant minority of autistic children.
Although AP is a useful skill in “neurotypical” musicians—including
elite performers—it is an indispensable factor in the development of
performance skills in autistic children with learning difficulties—so-called
“musical savants” (Miller, 1989). It seems to be this unusual ability that
both motivates and enables some young children with a very limited general
understanding of the world around them, from the age of 24 months or so, to pick
out tunes or chords on instruments that they encounter (sometimes more or
less by chance) at home or elsewhere. Often the instrument concerned will
be an electric keyboard or piano. The children’s early experiments in
producing music may well occur with no adult intervention—or, indeed,
awareness of what they are doing. It is my contention that AP has this
impact since each pitch sounds distinct, and potentially can elicit a powerful
emotional response; hence, being able to reproduce these at will must surely
be an intoxicating experience. But more than this, having AP makes
learning to play by ear manageable, in a way that “relative pitch”—the
capacity to process the differences between pitches (“intervals”)—does not.
To understand why this should be so, consider a typical playground chant,
based on the intervals of a minor third, a perfect fourth, and a major second
(Fig. 4).
FIGURE 4. An archetypal playground chant.

In so-called "neurotypical" individuals, motifs such as this are likely to be cognitively encoded, stored, and retrieved as a series of differences
between notes (although some degree of absolute pitch memory will exist—
a child would know if the chant were an octave too high, for example). For
children with AP, though, the position is quite different, since they can
capture the pitch data from the melody as a series of self-sufficient values,
rather than a sequence of intervals. So, in seeking to remember and repeat
groups of notes over extended periods of time, they have certain processing
advantages over their neurotypical peers, who, by extracting and storing
pitch information at a higher level of abstraction, lose the “surface detail.”
Observe that there are apparent disadvantages to “absolute” representations
of pitch too since, by regarding qualia in isolation, listeners cannot take
advantage of the patterns that exist through the repetition of intervals, and
so greater demands are made on memory. However, as the brain’s long-term
storage capacity is so large, this is not a serious problem; indeed, having an
exceptional memory is something that is common to many children with
autism and all savants.
It is the capacity for “absolute pitch data capture” that, in my view,
explains why children with AP who are on the autism spectrum and have learning difficulties are able to develop instrumental skills at an early age
with no formal tuition. This is because, for them, reproducing groups of
notes that they have heard is merely a question of remembering a series of
one-to-one mappings between given pitches as they sound and (very often)
the keys on a keyboard that produce them. The crucial thing is that these
relationships are invariant: once learnt, they can service a lifetime of music
making, through which they are constantly reinforced. Conversely, were a
child with “relative pitch” to try to play by ear, he or she would have a far
more difficult task. Children in this position need to become proficient in
the complicated process of calculating how the intervals that are perceived
map onto the distances between keys, which, due to the asymmetries of the
keyboard, are likely to differ according to what would necessarily be an
arbitrary starting point.
Take, for example, the interval that exists between the first two notes of
the playground chant (a minor third) shown in Fig. 4: this can be produced
through no fewer than twelve distinct key combinations, comprising one of
four underlying patterns. The complexity of the situation is compounded by
the fact that virtually the same physical leap between other keys may sound
different (a major third) according to its location on the keyboard (Fig. 5).
FIGURE 5. The different mechanisms involved in playing by ear using “absolute” and “relative”
pitch processing.
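The combinatorics behind this claim are easy to verify mechanically. The following minimal sketch (an editorial illustration in Python, not part of the original account; numbering pitch classes from C = 0 is an assumption of the example) enumerates the twelve chromatic starting positions of a minor third and counts the distinct black/white key patterns they produce:

```python
# Count the key combinations and the underlying black/white patterns
# for a minor third (3 semitones). Pitch classes are numbered C = 0.
BLACK_KEYS = {1, 3, 6, 8, 10}  # C#, D#, F#, G#, A#

def key_color(pitch_class):
    return "black" if pitch_class % 12 in BLACK_KEYS else "white"

INTERVAL = 3  # a minor third, in semitones

combinations = [(start, (start + INTERVAL) % 12) for start in range(12)]
patterns = {(key_color(lo), key_color(hi)) for lo, hi in combinations}

print(len(combinations), "distinct key combinations")  # -> 12
print(len(patterns), "underlying physical patterns")   # -> 4
```

Running the sketch confirms the figures given above: twelve key combinations, reducible to four physical patterns.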
It is important to point out that children with AP who learn to play
rapidly acquire relative pitch processing skills too, enabling them to play
melodies beginning on different notes. Indeed, it is not unusual for them to
learn to reproduce pieces fluently in every key. This may seem
contradictory, given the processing advantage conferred by being able to
encode pitches as perceptual identities in their own right, each mapping
uniquely onto a particular note on the keyboard. The reality of almost all
pieces of music, however, is that melodic (and harmonic) motifs variously
appear at different pitches through transposition and so, to make sense of
music, young children with AP need to learn to process pitch relatively as
well as absolutely (Stalinski & Schellenberg, 2010).
In summary: the difference in pitch processing between musicians who
are AP possessors and those who are not can be characterized with
reference to an imaginary cognitive “ladder” of pitch, upon which the
values of a given framework (a major or minor scale, for example) exist as
rungs. Now, for a child with relative pitch, the ladder (whose configuration
becomes clear from a rapid analysis of incoming intervals) is movable: it
can exist at any pitch height and still offer a satisfactory pitch framework
for a given piece. However, for a child with AP, the position of the ladder is
fixed. So he or she has the advantage of both recognizing a particular pitch
ladder, and knowing where it sits in “pitch-space.”
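The two encoding strategies can be made concrete in a few lines. In the illustrative sketch below (the MIDI numbers are assumed stand-ins for the chant in Fig. 4: G4–E4–A4–G4–E4, i.e., a minor third, a perfect fourth, and a major second), the absolute code stores the pitch values themselves, while the relative code stores only the successive intervals, which survive transposition unchanged:

```python
# Absolute versus relative encoding of a short melody (illustrative).
chant = [67, 64, 69, 67, 64]  # assumed MIDI stand-ins for the chant

absolute = list(chant)                                # the fixed "ladder"
relative = [b - a for a, b in zip(chant, chant[1:])]  # the movable "ladder"

transposed = [note + 5 for note in chant]  # the same chant, a fourth higher

# Transposition leaves the relative code untouched but shifts the absolute one.
assert [b - a for a, b in zip(transposed, transposed[1:])] == relative
assert transposed != absolute

print("absolute:", absolute)  # each value maps to exactly one key
print("relative:", relative)  # key-independent, but must be re-mapped to keys
```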

The Impact of AP on the Musical Engagement of Children and Young People Who Are on the Autism Spectrum

What is the impact of AP on the musical engagement of children who are
on the autism spectrum likely to be? There is no one answer to this
question, since the individuals will vary hugely in terms of their preferences
and motivations. I have written at length about the extraordinary life of
Derek Paravicini (Ockelford, 2009), who is what Treffert (2009) would call
a “prodigious” musical savant. It is simply not possible to imagine Derek
without his piano playing, which embodies the way that he thinks, the way
he feels, and the way he relates to other people. A description of the way he
engages with environmental sounds and speech is set out below, taken from
a blog designed to raise awareness of autism and musicality. But there are
many other children on the autism spectrum with whom I have worked who
are no less “special” in their different ways. In this context, I offer two
further accounts of children whom I have taught for a number of years.
They too are taken from awareness-raising blogs.

Derek
After many hours of the same dull drone – auditory chewing gum that has long since lost its
flavor or interest – there is a sudden, almost imperceptible change in the humming of the plane’s
engines. I glance outside and see that, at last, we are over the Nevada desert. Only an hour or so
now until we hit Los Angeles.
The young man sitting next to me – noticeably upright in his seat – stiffens slightly as he hears
the tiny deviation in sound.
‘F sharp’, he intones. ‘It’s F sharp, Adam.’
He leans towards me, demanding a response, and the sun bounces off his trademark Prada
sunglasses, but without penetrating the world of darkness beneath.
‘Yes, Derek’, I reply, ‘We’ll soon be landing at LAX.’
‘Landing at LAX’, he echoes, apparently relishing the sound of the words – and their import – in
equal measure.
‘And I will see Dana, and I will play the piano’, he continues.
‘Yes, Derek.’ I offer the same reply again, the sound of my voice as much as the words offering a
reassurance forged in a relationship of many years – as Derek’s teacher, mentor and friend.
‘You’ll play the piano.’
Repetition confers calm, a hint of a smile crosses Derek’s features, and he relaxes back in his
seat.
Derek Paravicini – blind autistic savant, musician extraordinaire, learning disabled genius,
unflagging companion – is on his way to California to perform in a series of concerts: grist to his
globe-trotting mill.
For him, airplanes are one of life’s many mysteries: a series of awkward slopes and steps to be
negotiated; well-meaning helping hands; a waft of warm, stale air; ‘doors to automatic and cross-
check’; the sound of the engines starting up. Soon the seat seems to move and bump about, then
steadiness; a long, vibrating steadiness. Les Mis on the headphones – once, twice, three times?
At last, everything goes into reverse, and abruptly, we’re off the plane. Now there are new
voices, new accents. A new hotel. Oatmeal instead of porridge for breakfast. And … finally …
the piano. At last, something familiar. Every note a close friend. The band plays the same as in
England. The clapping is familiar too, though people seem to clap louder in America.
‘Good job, Derek!’ ‘Awesome!’ ‘Can you smile for the photo?’ Derek wrinkles his nose, and
everyone laughs, infectiously. He catches the humor, and smiles as well. Music has worked its
magic, as it always does.1
This account illustrates how, even as a young adult, Derek’s propensity
for processing everyday sounds and language in a musical way, acquired as
a child, is still evident. For him, the sound of the jet engine has to be
accepted as a feature of air travel, but is not understood; hence it remains
for him at the level of pure auditory input, which he hears as musical notes.
And there are traces too of his childhood tendency to repeat words—
echolalia—as much for their sounds as their meanings, as he copies the
ends of my contributions to our conversation. These are testament to his idiosyncratic neural circuitry, produced both by the hypoxia that accompanied his extreme prematurity and by the exceptional cognitive environment that his developing brain consequently had to endure, without visual input and with limited capacity to process language.
And yet, with the necessary support, Derek can function well: meeting
new people, interacting with them in unrehearsed social situations, and
tolerating unfamiliar environments with equanimity. Because those around him celebrate the advantages and ameliorate the disadvantages of his autism spectrum condition, Derek enjoys a quality of life that would be the envy of many, and he is acknowledged internationally as a "special" musician.
The current cohort of autistic children with whom I work visit me, with
their parents, in a large practice room at the University of Roehampton.
There are two pianos, to avoid potential difficulties over personal space. A
number of the children rarely say a word. Some, like Romy, don’t speak at
all. She converses through her playing, telling me what piece she would like
next, and indicating when she’s had enough. Sometimes, she will tease me
by apparently suggesting one thing when she means another. In this way,
jokes are shared and, sometimes, feelings of sadness too. For Romy, music
replaces words and truly functions as a proxy language, with the exceptional neurological correlates that this must entail.

Romy
On Sunday mornings, at 10.00am, I steel myself for Romy’s arrival. I know that the next two
hours will be an exacting test of my musical mettle. Yet Romy has severe learning difficulties,
and she doesn’t speak at all. She is musical to the core, though; she lives and breathes music – it
is the very essence of her being. With her passion comes a high degree of particularity; Romy
knows precisely which piece she wants me to play, at what tempo, and in which key. And woe
betide me if I get it wrong.
When we started working together, six years ago, mistakes and misunderstandings occurred all
too frequently since, as it turned out, there were very few pieces that Romy would tolerate: for
example, the theme from Für Elise (never the middle section); the Habanera from Carmen; and
some snippets from ‘Buckaroo Holiday’ (the first movement of Aaron Copland’s Rodeo).
Romy’s acute neophobia meant that even one note of a different piece would evoke shrieks of
fear-cum-anger, and the session could easily grow into an emotional conflagration.
So gradually, gradually, over weeks, then months, and then years, I introduced new pieces –
sometimes, quite literally, at the rate of one note per session. On occasion, if things were
difficult, I would even take a step back before trying to move on again the next time. And,
imperceptibly at first, Romy’s fears started to melt away. The theme from Brahms’s Haydn
Variations became something of an obsession, followed by the slow movement of Beethoven’s
Pathetique sonata. Then it was Joplin’s The Entertainer, and Rocking All Over the World by
Status Quo.
Over the six years, Romy’s jigsaw box of musical pieces – fragments ranging from just a few
seconds to a minute or so in length – has filled up at an ever-increasing rate. Now it’s
overflowing, and it’s difficult to keep up with Romy’s mercurial musical mind; mixing and
matching ideas in our improvised sessions, and even changing melodies and harmonies so they
mesh together, or to ensure that my contributions don’t!
As we play, new pictures in sound emerge and then retreat as a kaleidoscope of ideas whirls
between us. Sometimes a single melody persists for fifteen minutes, even half an hour. For
Romy, no matter how often it is repeated, a fragment of music seems to stay fresh and vibrant.
At other times, it sounds as though she is trying to play several pieces at the same time – she just
can’t get them out quickly enough, and a veritable nest of earworms wriggle their way onto the
piano keyboard. Vainly I attempt to herd them into a common direction of musical travel.
So here I am, sitting at the piano in Roehampton, on a Sunday morning in mid-November,
waiting for Romy to join me (not to be there when she arrives is asking for trouble). I’m
limbering up with a rather sedate rendition of the opening of Chopin’s Etude in C major, Op. 10,
No. 1 when I hear her coming down the corridor, vocalizing with increasing fervor. I feel the
tension rising, and as her father pushes open the door, she breaks away from him, rushes over to
the piano and, with a shriek and an extraordinarily agile sweep of her arm, elbows my right hand
out of the way at the precise moment that I was going to hit the D an octave above middle C. She
usurps this note to her own ends, ushering in her favorite Brahms-Haydn theme. Instantly, Romy
smiles, relaxes and gives me the choice of moving out of the way or having my lap appropriated
as an unwilling cushion on the piano stool. I choose the former, sliding to my left onto a chair
that I’d placed earlier in readiness for the move that I knew I would have to make.
I join in the Brahms, and encourage her to use her left hand to add a bass line. She tolerates this
up to the end of the first section of the theme, but in her mind she’s already moved on, and
without a break in the sound, Romy steps onto the set of A Little Night Music, gently noodling
around the introduction to Send in the Clowns. But it’s in the wrong key – G instead of E flat –
which I know from experience means that she doesn’t really want us to go into the Sondheim
classic, but instead wants me to play the first four bars (and only the first four bars) of
Schumann’s Kleine Studie Op. 68, No. 14. Trying to perform the fifth bar would, in any case, be
futile since Romy’s already started to play … now, is it I am Sailing or O Freedom? The opening
ascent from D through E to G could signal either of those possibilities. Almost tentatively, Romy
presses those three notes down and then looks at me and smiles, waiting, and knowing that
whichever option I choose will be the wrong one. I just shake my head at her and plump for O
Freedom, but sure enough Rod Stewart shoves the Spiritual out of the way before it has time to
draw a second breath.
From there, Romy shifts up a gear to the Canon in D – or is it really Pachelbel’s masterpiece?
With a deft flick of her little finger up to a high A, she seems to suggest that she wants Streets of
London instead (which uses the same harmonies). I opt for Ralph McTell, but another flick, this
time aimed partly at me as well as the keys, shows that Romy actually wants Beethoven’s
Pathetique theme – but again, in the wrong key (D). Obediently I start to play, but Romy takes
us almost immediately to A flat (the tonality that Beethoven originally intended). As soon as I’m
there, though, Romy races back up the keyboard again, returning to Pachelbel’s domain. Before
I’ve had time to catch up, though, she’s transformed the music once more; now we’re hearing the
famous theme from Dvorak’s New World Symphony.
I pause to recover my thoughts, but Romy is impatiently waiting for me to begin the
accompaniment. Two or three minutes into the session, and we’ve already touched on twelve
pieces spanning 300 years of Western music and an emotional range to match. Yet, here is a girl
who in everyday life is supposed to have no ‘theory of mind’ – the capacity to put yourself in
other people’s shoes and think what they are thinking. Here is someone who is supposed to lack
the ability to communicate. Here is someone who functions, apparently, at an 18-month level.
But I say here is a joyous musician who amazes all who hear her. Here is a girl in whom extreme
ability and disability coexist in the most extraordinary way. Here is someone who can reach out
through music and touch one’s emotions in a profound way. If music is important to us all, for
Romy it is truly her lifeblood.2

How did Romy, severely learning disabled, become such a talented, if idiosyncratic, musician? According to the theory set out above, it was her
early inability to process language, in tandem with her inability to grasp the
portent of many everyday sounds, that enhanced her ability to process all
sounds in a musical way. The two were inextricably linked. Indeed, without
the former, we can surmise that the latter would never have developed.
Romy’s AP means that, for her, every note on the piano is instantly
recognizable. But more than this, for Romy, each pitch provides a stable
point of reference in an otherwise capricious world. And it’s not just notes
on the piano that function for Romy in this way. In her mind, each of the
notes in any piece of music sounds distinct. While, for most of us, musical
sounds pass by unremarkably in perceptual terms, for Romy, different
notes, different chords, can affect her profoundly: an E flat major harmony
can make her quiver with excitement, for example, while G7 can make her
cry.
In itself, though, AP is insufficient to make a “special” musician; that
takes at least 7,000 hours of practice (Sloboda, Davidson, Howe, & Moore,
1996). How, then, did Romy acquire her musical skills? Like many autistic
children early in life, she developed an obsession, which in her case was a
small electronic keyboard, whose notes lit up in the sequence needed to
play one of a number of simple tunes. As far as Romy was concerned, this
musical toy was one of only a few things with which she could
meaningfully interact, and whose logic she could understand.
Unsurprisingly, she spent hundreds of hours playing with it. The keyboard
was comfortingly predictable in comparison with any human being—
even her devoted family, whose language and behavior differed subtly from
one occasion to another, as all human interaction does. The keyboard,
though, invariably responded to Romy in the same way. Whenever she
pressed a particular key, it always sounded the same as it did before. Here
was something in the environment that Romy could predict and control.
And so, through countless hours of self-directed exploration as a toddler,
Romy discovered where all the notes (whose sounds she could hear in her
head) are on the keyboard. Today, as a teenager, for Romy to play the piano
merely requires her to hear a tune in her head (available to her through the
internal library of songs, stored as series of absolute auditory images) and
play along with it, pressing down the correct keys in sequence as their
pitches sound in her head (see Fig. 5). And this approach works not only for
music. As we noted above, she will reproduce the sounds of the jet engines
of planes as they descend towards Heathrow Airport, for example, and she
unhesitatingly copies any ringtones that interrupt her piano lessons.
AP can have other consequences for children on the autism spectrum
too. The absolute representation of sounds in their heads appears to fuel
musical imagination in a way that is more vivid, more visceral even, than
the relative memory of intervals alone. And, although formal research is yet
to be undertaken, the anecdotal accounts of parents and teachers suggest
that earworms are widespread, shown most obviously in some children's
incessant vocalizing of melodic fragments. With minds full of tunes that
seem to be playing the whole time, external sounds can be at best
superfluous and at worst an irritation, as the following account of a session
with Freddie, then eleven years old, shows.

Freddie
‘Why’s he doing that?’ Freddie’s father, Simon, sounded more than usually puzzled by the antics
of his son.
After months of displacement activity, Freddie was finally sitting next to me at the piano, and
looked as though this time he really were about to play. A final fidget and then his right hand
moved towards the keys. With infinite care, he placed his thumb on middle C as he had watched
me do before – but without pressing it down. Silently, he moved to the next note (D), which he
feathered in a similar way, using his index finger, then with the same precision he touched E, F,
and G, before coming back down the soundless scale to an inaudible C.
I couldn’t help smiling.
‘Fred, we need to hear the notes!’
My comment was rewarded with a deep stare, right into my eyes. Through them, almost. It was
always hard to know what Freddie was thinking, but on this occasion he did seem to understand
and was willing to respond to my request, since his thumb went back to C. Again, the key
remained un-pressed, but this time he sang the note (perfectly in tune), and then the next one,
and the next, until the five-finger exercise was complete.
In most children (assuming that they had the necessary musical skills), such behavior would
probably be regarded as an idiosyncratic attempt at humor or even mild naughtiness. But Freddie
was being absolutely serious and was pleased, I think, to achieve what he’d been asked to do, for
he had indeed enabled me to hear the notes!
He stared at me again, evidently expecting something more, and without thinking I leant
forward.
‘Now on this one, Fred’, I said, touching C sharp.
Freddie gave the tiniest blink and a twitch of his head, and I imagined him, in a fraction of a
second, making the necessary kinesthetic calculations. Without hesitation or error, he produced
the five-finger exercise again, this time using a mixture of black and white notes. Each pressed
silently. All sung flawlessly.
And then, spontaneously, he was off up the keyboard, beginning the same pentatonic pattern on
each of the twelve available keys. At my prompting, Freddie re-ran the sequence with his left
hand – his unbroken voice hoarsely whispering the low notes.
So logical. Why bother to play the notes if you know what they sound like already?
So apparently simple a task, and yet … such a difficult feat to accomplish: the whole
contradiction of autism crystallized in a few moments of music making.3

As I later said to Freddie's father, if I had wanted to teach a "neurotypical" child to do what his son had achieved with little or no
apparent effort, it would probably have taken many lessons, and hundreds
of hours of practice for the pupil to master the relationship between the
Western tonal system and the asymmetrical (yet regular) layout of the piano
keyboard. Yet Freddie had done it merely by watching and listening to what
I had done, attending to the streams of notes flowing by, extracting the
implicit rules of Western musical syntax, and using these to create patterns
of sounds anew. The crucial point is that I had never played the full
sequence of scales to Freddie that he subsequently produced. He had
worked out the necessary structures intuitively, merely through exposure to
music. Here is a “special” musician indeed.
Conclusion: The Neurology of Autistic Children's Exceptional Musical Abilities

This chapter sets out a theory of how some children on the autism spectrum
develop prodigious musical talent as a consequence of the way that they
perceive everyday sounds and speech—in musical terms. In a significant
minority of cases, this leads to the development of AP, which, given access
to an appropriate instrument (typically a keyboard), enables such children to
learn to play by ear. This skill is often acquired entirely through their own
efforts and typically first manifests itself in the early years. The neural
correlates of this exceptional development are yet to be explored through
brain imaging, which in the case of children severely affected by autism,
who tend to function successfully only in familiar environments, presents
significant challenges (although one or two passive studies in the field of
music and language have been undertaken; see, for example, Lai,
Pantazatos, Schneider, & Hirsch, 2012; Sharda, Midha, Malik, Mukerji, &
Singh, 2015). It is surely an area worth exploring, however, not only for the
light it would shed on our knowledge of exceptionality, but for the fresh
perspectives that human diversity offers the understanding of our species as
a whole. This is possible because we exist on continua of interests, abilities,
and traits, and it is my contention that, by analyzing the behaviors, and their neural correlates, of those who function at the extremes of our tribe's natural
neurodiversity, we can better understand the ordinary, everyday, musical
engagement that is characteristic of us all. Most importantly, it’s my belief
that, through the prism of the overtly remarkable, we can discover the
uncelebrated exceptionality in each of us, for whether autistic or
neurotypical, we are all musical by design (Ockelford, 2017, p. 9).

References
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th
ed.). Washington, DC: APA.
Boucher, J. (2009). The autistic spectrum: Characteristics, causes, and practical issues. London:
Sage Publications.
Brandt, A., Gebrian, M., & Slevc, L. R. (2012). Music and early language acquisition. Frontiers in
Psychology 3. Retrieved from https://doi.org/10.3389/fpsyg.2012.00327
Cone, E. (1987). On derivation: Syntax and rhetoric. Music Analysis 6(3), 237–256.
DePape, A.-M. R., Hall, G. B. C., Tillmann, B., & Trainor, L. J. (2012). Auditory processing in high-
functioning adolescents with autism spectrum disorder. PloS ONE 7(9), e44084.
Dowling, J. (1982). Melodic information processing and its development. In D. Deutsch (Ed.), The
psychology of music (pp. 413–429). New York: Academic Press.
Fay, W. H. (1967). Childhood echolalia. Folia Phoniatrica et Logopaedica 19(4), 297–306.
doi:10.1159/000263153
Fay, W. H. (1973). On the echolalia of the blind and of the autistic child. Journal of Speech and
Hearing Disorders 38(4), 478–489.
Frith, U. (2003). Autism: Explaining the enigma. Oxford: Wiley-Blackwell.
Gaver, W. W. (1993). What in the world do we hear? An ecological approach to auditory event
perception. Ecological Psychology 5(1), 1–29.
Hargreaves, D. (1986). The developmental psychology of music. Cambridge: Cambridge University
Press.
Kessen, W., Levine, J., & Wendrich, K. (1979). The imitation of pitch in infants. Infant Behavior and
Development 2, 93–99.
Kuhl, P., & Meltzoff, A. (1982). The bimodal perception of speech in infancy. Science 218(4577),
1138–1141.
Lai, G., Pantazatos, S., Schneider, H., & Hirsch, J. (2012). Neural systems for speech and song in
autism. Brain 135(3), 961–975.
Lamont, A. (2008). Young children’s musical worlds: Musical engagement in 3.5-year-olds. Journal
of Early Childhood Research 6(3), 247–261.
Lecanuet, J.-P. (1996). Prenatal auditory experience. In I. Deliège & J. Sloboda (Eds.), Musical
beginnings (pp. 3–34). Oxford: Oxford University Press.
McEvoy, R. E., Loveland, K. A., & Landry, S. H. (1988). The functions of immediate echolalia in
autistic children: A developmental perspective. Journal of Autism and Developmental Disorders
18(4), 657–668.
Mcglone-Dorrian, D., & Potter, R. E. (1984). The occurrence of echolalia in three year olds’
responses to various question types. Communication Disorders Quarterly 7(2), 38–47.
McPherson, G. (Ed.). (2016). Musical prodigies: Interpretations from psychology, education,
musicology, and ethnomusicology. New York: Oxford University Press.
Malloch, S., & Trevarthen, C. (Eds.). (2009). Communicative musicality: Exploring the basis of
human companionship. New York: Oxford University Press.
Mang, E. (2005). The referent of early children’s songs. Music Education Research 7(1), 3–20.
Meltzoff, A., & Prinz, W. (2002). The imitative mind: Development, evolution and brain bases.
Cambridge: Cambridge University Press.
Miller, L. (1989). Musical savants: Exceptional skill and mental retardation. Hillsdale, NJ: Lawrence
Erlbaum.
Mills, A. (1993). Visual handicap. In D. Bishop & K. Mogford (Eds.), Language development in
exceptional circumstances (pp. 150–164). Hove: Psychology Press.
Moog, H. (1976). The musical experiences of the pre-school child. Trans. C. Clarke. London: Schott.
Norman-Haignere, S., Kanwisher, N. G., & McDermott, J. H. (2015). Distinct cortical pathways for
music and speech revealed by hypothesis-free voxel decomposition. Neuron 88(6), 1281–1296.
Ockelford, A. (2005). Repetition in music: Theoretical and metatheoretical perspectives. Farnham:
Ashgate.
Ockelford, A. (2009). In the key of genius: The extraordinary life of Derek Paravicini. London:
Random House.
Ockelford, A. (2013). Applied musicology: Using zygonic theory to inform music education, therapy,
and psychology research. New York: Oxford University Press.
Ockelford, A. (2017). Comparing notes: How we make sense of music. London: Profile Books.
Papoušek, M. (1996). Intuitive parenting: A hidden source of musical stimulation in infancy. In I.
Deliège & J. Sloboda (Eds.), Musical beginnings (pp. 88–112). Oxford: Oxford University Press.
Patel, A. D. (2012). Language, music, and the brain: A resource-sharing framework. In P. Rebuschat,
M. Rohmeier, J. A. Hawkins, & I. Cross (Eds.), Language and music as cognitive systems (pp.
204–223). Oxford: Oxford University Press.
Prizant, B. M. (1979). An analysis of the functions of immediate echolalia in autistic children.
Dissertation Abstracts International 39(9-B), 4592–4593.
Prizant, B. M., & Duchan, J. F. (1981). The functions of immediate echolalia in autistic children.
Journal of Speech and Hearing Disorders 46(3), 241–249.
Reese, A. (2014). The effect of exposure to structured musical activities on communication skills and
speech for children and young adults on the autism spectrum (Doctoral dissertation). University of
Roehampton, London.
Saxton, M. (2010). Child language: Acquisition and development. London: Sage Publications.
Sharda, M., Midha, R., Malik, S., Mukerji, S., & Singh, N. C. (2015). Fronto-temporal connectivity
is preserved during sung but not spoken word listening, across the autism spectrum. Autism
Research 8(2), 174–186.
Sloboda, J. A., Davidson, J. W., Howe, M. J., & Moore, D. G. (1996). The role of practice in the
development of performing musicians. British Journal of Psychology 87(2), 287–309.
Stalinski, S. M., & Schellenberg, E. G. (2010). Shifting perceptions: Developmental changes in
judgments of melodic similarity. Developmental Psychology 46(6), 1799–1803.
Sterponi, L., & Shankey, J. (2013). Rethinking echolalia: Repetition as interactional resource in the
communication of a child with autism. Journal of Child Language 41(2), 275–304.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin 113(2), 345–361.
Treffert, D. (2009). The savant syndrome: An extraordinary condition. A synopsis: Past, present,
future. Philosophical Transactions of the Royal Society B: Biological Sciences 364(1522), 1351–
1357.
Vamvakari, T. (2013). My child and music: A survey exploration of the musical abilities and interests
of children and young people diagnosed with autism spectrum conditions (Master’s dissertation).
University of Roehampton, London.
Welch, G. (2006). The musical development and education of young children. In B. Spodek & O.
Saracho (Eds.), Handbook of research on the education of young children (pp. 251–267). Mahwah,
NJ: Lawrence Erlbaum.
Wing, L. (2003). The autistic spectrum: A guide for parents and professionals. London: Robinson.

1. http://www.jkp.com/jkpblog/2013/04/music-language-autism/
2. https://blog.oup.com/2012/12/music-proxy-language-autisic-children/
3. http://www.huffingtonpost.com/adam-ockelford/autism-genius_b_4118805.html
SECTION VII

MUSIC, THE BRAIN, AND HEALTH
CHAPT E R 28

NEUROLOGIC MUSIC THERAPY IN SENSORIMOTOR REHABILITATION

CORENE THAUT AND KLAUS MARTIN STEPHAN

Impairment of movement timing is often one of the most disturbing features for patients with neurological disorders, particularly in cerebrovascular disease and degenerative disorders such as Parkinson's disease (PD), and can result in debilitating motor timing deficits with regard to the manipulative ability of the upper extremities (e.g., in some patients after
stroke) and to gait dynamics (e.g., after stroke with basal ganglia or
cerebellar lesions or in patients with PD). Fortunately, basic science and
clinical research supporting the use of music in the rehabilitation,
maintenance, and development of movements of both the upper and lower
extremities with a variety of neurologic disorders has grown tremendously
over the last twenty-five years. Starting in the early 1990s, a series of
research papers (McIntosh, Brown, Rice, & Thaut, 1997; McIntosh, Rice,
Hurt, & Thaut, 1998; Miller, Thaut, McIntosh, & Rice, 1996; Thaut,
McIntosh, Prassas, & Rice, 1993; Thaut, Schleiffers, & Davis, 1991)
became the foundation for investigating the importance of rhythm for
movement of both the upper and lower extremities in normal and
neurologically impaired individuals. Since then, a substantial amount of
research has shown the effect of rhythm and timing on optimization of
motor planning and motor execution through entrainment of movement
patterns, priming of the auditory motor pathway, and cueing of the
movement period.
Rhythmic entrainment, or the ability of the motor system to couple with the auditory system and assume a common period, provided the first testable
motor theory for the use of auditory rhythm and music in therapy. Rhythmic
auditory cueing accesses biological auditory-motor networks that create
fast, temporally precise and stable synchronization mechanisms between
sensory input and motor output (Stephan et al., 2002; Thaut, Hoemberg,
Kenyon, & Hurt, 1998; Thaut & Kenyon, 2003). Neuroanatomically these
synchronization mechanisms depend on a distributed set of circuits
including a motor cortical-basal ganglia-thalamo-cortical circuit. The supplementary motor area (SMA) and
putamen are basic nodes for beat perception and motor performance
(Bengtsson et al., 2009; Grahn & Brett, 2007). Furthermore, the auditory
and motor systems are closely linked from a peripheral level (cochlear root
neurons synapsing with reticulospinal neurons) to frontotemporal pathways
involving the arcuate fasciculus (Fernández-Miranda et al., 2015;
Schmahmann & Pandya, 2008) and to cortico-cortical loops between motor
and auditory cortices connected through delta and beta oscillatory activity
(Arnal, 2012; Arnal, Doelling, & Poeppel, 2015; Fujioka, Trainor, Large, &
Ross, 2012). There is an ongoing debate about the relative contributions of
the basal ganglia (especially corticostriatal loops) and the cerebellum to timing. Recent studies suggest that the cerebellum is mainly involved in absolute, duration-based timing when stimuli are presented irregularly, but not in relative timing based on a regular beat, while the striatum appears to be involved in both absolute and relative timing (see Teki, Grube, & Griffiths, 2012; Teki, Grube, Kumar, &
Griffiths, 2011). For an overview see Merchant, Grahn, Trainor, Rohrmeier,
and Fitch (2015).
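One standard way of formalizing such auditory-motor synchronization, drawn from the sensorimotor tapping literature rather than from the studies above, is a first-order linear phase-correction model: on each beat, the mover cancels a fixed fraction of the preceding tap-to-cue asynchrony. The sketch below (the correction gain and noise level are assumed, illustrative values) shows taps converging on, and staying locked to, an isochronous cue:

```python
import random

# Linear phase correction: cancel a fraction alpha of each asynchrony,
# plus Gaussian motor timing noise. Parameter values are assumptions.
def simulate_tapping(n_beats=50, alpha=0.4, motor_sd=0.010, start=0.150):
    """Return tap-to-cue asynchronies (seconds) for an isochronous cue."""
    asynchrony, series = start, []
    for _ in range(n_beats):
        series.append(asynchrony)
        asynchrony = (1 - alpha) * asynchrony + random.gauss(0.0, motor_sd)
    return series

random.seed(1)
taps = simulate_tapping()
print(f"first asynchrony: {taps[0] * 1000:+.0f} ms")               # far off the beat
print(f"mean of last ten: {sum(taps[-10:]) / 10 * 1000:+.1f} ms")  # locked in
```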
Priming of motor activity is the ability of an external sensory (e.g.,
auditory) cue to stimulate recruitment of spinal motor neurons, reducing the
amount of time required for the muscles to respond to a given motor
command. During walking this results in decreased variability of the muscle
activation patterns in the lower extremities. Evidence for priming and timing of the motor system via reticulospinal pathways was demonstrated as early as 1967 (Paltsev & Elner) and 1976 (Rossignol & Melvill Jones). Recently it has been shown that, during sensory entrainment, the corticostriatal system is already activated (for an overview, see Sameiro-Barbosa & Geiser, 2016), which may provide an alternative anatomical basis for sensorimotor synchronization and priming. Using EEG, Crasta and
colleagues (Crasta, Thaut, Anderson, Davies, & Gavin, 2018) demonstrated
that auditory priming improves neural synchronization in auditory-motor
entrainment. This fits well with the observations obtained during normal
gait in the early 1990s by Thaut and coworkers.
Thaut et al. (1993) investigated the effect of auditory rhythm on priming
of the lower extremities by looking at temporal parameters of the stride
cycle and EMG activity in normal gait. In the rhythmic condition, subjects
improved stride rhythmicity between the right and left lower extremities,
showed delayed onset and shorter duration of gastrocnemius muscle
activity, and increased integrated amplitude ratios for the gastrocnemius
muscle. These results provided evidence that more focused and consistent
muscle activity occurs during push-off when a rhythmic auditory cue is
present, due to a priming effect that results in a more efficient recruitment
of motor units in the spinal cord. The conclusions of this study led to the
further exploration of the effect of rhythmic auditory cueing on temporal
stride parameters and EMG patterns in patients with stroke and hemiparetic
gait, which also demonstrated similar results (Thaut et al., 1993).
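The EMG measures in question are computationally simple. The sketch below is illustrative only; the threshold rule, the sampling rate, and the toy envelope are assumptions of the example, not the analysis pipeline of the studies cited:

```python
# Onset, duration, and integrated amplitude of a muscle burst, computed
# from a rectified, smoothed EMG envelope by simple thresholding.
def burst_measures(envelope, fs, threshold):
    """Return (onset s, duration s, integrated amplitude) of the burst."""
    active = [i for i, v in enumerate(envelope) if v >= threshold]
    if not active:
        return None
    onset = active[0] / fs
    duration = (active[-1] - active[0] + 1) / fs
    integrated = sum(envelope[i] for i in active) / fs  # area while active
    return onset, duration, integrated

fs = 1000  # Hz (assumed sampling rate)
# Toy stride cycle: quiet, a push-off burst, quiet again.
envelope = [0.02] * 600 + [0.40] * 250 + [0.02] * 150

onset, duration, integrated = burst_measures(envelope, fs, threshold=0.1)
print(f"onset {onset:.2f} s, duration {duration:.2f} s, "
      f"integrated amplitude {integrated:.3f} (arbitrary units)")
```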
During rhythmic auditory cueing the auditory stimulus does not provide
cues for the endpoints of a movement, but provides an external cue for the
duration of the movement period to help scale the timing and kinematics of
the movement between the beats. Most often, arm and hand motions are
used as discrete movements which are non-rhythmic in nature (e.g., to push
or grasp something), whereas leg movements are often repeated over longer
periods of time and are intrinsically rhythmic in nature (e.g., during
walking). Thus, therapeutic interventions must support arm and leg function
differentially, addressing the range of spatial, temporal, and muscular
dynamics which influence motor behavior during the various stages of
motor control. Parietal and premotor areas are well known to be involved in
the preparatory activity of motor action under the “supervision” of
prefrontal cortex. Even subliminal changes of an auditory interstimulus
interval will alter cortical and subcortical activity in prefrontal areas,
thalamus, and parietal and premotor areas. These changes in brain activity
correlate with specific changes of motor behavior (Stephan et al., 2002).
Furthermore, external auditory stimuli can influence not only the timing,
but also the spatiotemporal kinematics of motor behavior (Thaut, Kenyon,
Hurt, McIntosh, & Hoemberg, 2002). Thus, external auditory stimuli are
able to “access” parietal and premotor areas—presumably during the
preparatory phase—and influence the exact timing and the spatiotemporal
pattern of movements in a predictive way. Music has a wide range of
rhythmic, melodic, harmonic, and dynamic-acoustical elements, and research has shown it to be an effective therapeutic tool for influencing both upper and lower extremity discrete, serial, and continuous movements.
Experts in sports, such as professional rowers, have used musical sonification during training as feedback in order to control the rhythmicity and quality of the movement, and to assess the degree of synchrony between different athletes (Schaffert & Mattes, 2015).

Neurologic Music Therapy

Neurologic Music Therapy (NMT) is a research-based system of clinical techniques guided by neuroscientific principles of music perception, cognition, and performance, and designed to address functional
goals in the areas of sensorimotor, speech/language, and cognitive
rehabilitation. Unlike many traditional approaches that focus on teaching
compensatory strategies based on using the unimpaired functions to
reintegrate patients into everyday activities, NMT techniques focus on
directly addressing impairment and restoration of function based on current
knowledge of how music can aid in cortical reorganization, motor learning,
and neuromuscular re-education.
In NMT, there are three standardized rhythmic-musical applications for
rehabilitation, development, and maintenance of sensorimotor function:
Rhythmic Auditory Stimulation (RAS), Patterned Sensory Enhancement
(PSE), and Therapeutic Instrumental Music Performance (TIMP) (Thaut &
Hoemberg, 2014).
Rhythmic Auditory Stimulation (RAS) is a neurologic music therapy
technique which can be used to facilitate the rehabilitation, development,
and maintenance of movements that are intrinsically biologically
rhythmical. RAS uses the physiological effects of auditory rhythm on the
motor system to improve the control of movement in rehabilitation of
functional, stable, and adaptive gait patterns in patients with significant gait
deficits due to neurological impairment (Thaut, Nickel, Kenyon, Meissner,
& McIntosh, 2005). Driven by the principles of rhythmic entrainment,
priming of the auditory motor pathways, and cueing of the movement
period, research has shown RAS to be effective as both an immediate
entrainment stimulus to provide rhythmic cues during movement, and as a
facilitating stimulus for training in order to achieve more functional gait
patterns. Studies investigating the effects of rhythmic auditory stimulation
on Parkinson’s disease (de Dreu, van der Wilk, Poppe, Kwakkel, & van
Wegen, 2012; Kadivar, Corcos, Foto, & Hondzinski, 2011), stroke (Schauer
& Mauritz, 2003; Song et al., 2015; Song & Ryu, 2016; Suh et al., 2014),
traumatic brain injury (Hurt, Rice, McIntosh, & Thaut, 1998), multiple
sclerosis (Baram & Miller, 2007; Conklyn et al., 2010), spinal cord injuries
(de l’Etoile, 2008), and spastic diplegic cerebral palsy (Baram & Lenger,
2012; Kim et al., 2011; Kim, Kwak, Park, & Cho, 2012) continue to show
the significant impact of rhythm on gait kinematics through better posture,
more appropriate step rates (step cadence) and stride length, and more
efficient and symmetric muscle activation patterns in the lower extremities
during walking. PET and fMRI studies have shown that external acoustic
stimuli before or during movements lead to additional activations of dorsal
premotor areas, which influence the timing of movements. As Parkinsonian
patients are known to have a deficit in internally monitoring and adjusting kinematic gait parameters, this additional influence via premotor areas may
help to compensate for some of these deficits. A Cochrane review of music
therapy for acquired brain injury (Bradt, Magee, Dileo, Wheeler, &
McGilloway, 2010) suggested that rhythmic auditory stimulation may also
be beneficial for improving gait parameters in patients with stroke,
including gait velocity, cadence, stride length, and gait symmetry.
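In practice, RAS cueing is typically delivered as a metronome, or as metrically accented music, matched to the patient's step rate and then advanced gradually. The sketch below illustrates one way such a cue schedule might be generated; the baseline cadence and the 5 and 10 percent increments are assumed, illustrative numbers rather than a prescribed protocol:

```python
# Generate metronome click times for an RAS-style cue schedule.
def cue_times(cadence_spm, duration_s):
    """Return click times (s) for a metronome at `cadence_spm` steps/minute."""
    interval = 60.0 / cadence_spm
    t, times = 0.0, []
    while t < duration_s:
        times.append(round(t, 3))
        t += interval
    return times

baseline = 96.0                     # steps per minute (assumed assessment value)
for factor in (1.00, 1.05, 1.10):   # baseline, then +5% and +10% stages
    cadence = baseline * factor
    clicks = cue_times(cadence, duration_s=60.0)
    print(f"{cadence:5.1f} steps/min -> {len(clicks)} cues in 60 s, "
          f"inter-cue interval {60.0 / cadence:.3f} s")
```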
Patterned Sensory Enhancement (PSE) is a technique which uses the
rhythmic, melodic, harmonic, and dynamic-acoustical elements of music to
provide temporal, spatial, and force cues for movements which are not
intrinsically rhythmic by nature, but reflect functional exercise, movement
patterns, and activities of daily living. Unlike RAS, which focuses on
oscillatory movements such as gait and arm swing, PSE is applied to non-
biologically rhythmic movements such as arm and hand movements, pregait
and advanced gait exercises, and functional movement sequences such as
dressing or sit-to-stand transfers. In addition to temporal cues, PSE creates
sonification of movement through musical patterns, harmonies, dynamic
elements, and pitch, which help organize single, discrete motions (e.g., arm
and hand movements during reaching and grasping), into functional
movement patterns and sequences. The functional anatomy of PSE is
difficult to grasp. Depending on the exact nature of the motor task, different
cortical and subcortical parietal, premotor, and presumably also prefrontal
areas will be involved. It might be that PSE is more concerned with re-
establishment and optimization of cortical, cortico-subcortical and cortico-
cerebellar circuits.
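As a concrete illustration of what sonification of movement can mean here, the following sketch (a deliberate simplification, not a published PSE protocol; the parabolic effort profile is an assumption) maps the progress of a single reach onto rising pitch, with loudness peaking at the assumed point of greatest effort:

```python
# Sonify a single reach: pitch tracks progress toward the target (spatial
# cue); loudness follows an assumed mid-reach effort peak (force cue).
def sonify_reach(n_steps=8, low_midi=60, high_midi=72):
    """Map normalized reach progress to (MIDI pitch, MIDI velocity) events."""
    events = []
    for k in range(n_steps + 1):
        progress = k / n_steps                  # 0.0 at start, 1.0 at target
        pitch = round(low_midi + progress * (high_midi - low_midi))
        velocity = round(40 + 87 * (1 - (2 * progress - 1) ** 2))  # peaks mid-reach
        events.append((pitch, velocity))
    return events

for pitch, velocity in sonify_reach():
    print(f"MIDI note {pitch:3d}, velocity {velocity:3d}")
```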
For many years, research has shown that elements of music such as
rhythm and pitch can help develop and re-establish three-dimensional
movement trajectories, allowing for training of the specific aspects of
space, time, and muscular dynamics of functional movement. Significant
improvements in upper limb control and kinematics when paired with
patterned musical cues have been shown across a wide range of populations
including stroke (Buetefish, Hummelsheim, Denzler, & Mauritz, 1995; Luft
et al., 2004; McCombe Waller, Harris-Love, Liu, & Whitall, 2006; Malcolm,
Massie, & Thaut, 2009; Thaut, Kenyon, et al., 2002; Thaut, Schicks,
McIntosh, & Hoemberg, 2002), Parkinson’s disease (Brown, Thaut,
Benjamin, & Cooke, 1993; Ma, Hwang, & Lin, 2009; Mak, 2006; Son &
Kim, 2015), cerebral palsy (Peng et al., 2011; Wang et al., 2013), and Down
syndrome (Robertson, Chua, Maraj, Kao, & Weeks, 2002; Robertson, Van
Gemmert, & Maraj, 2002).
High intensity training with frequent repetition is important for
successful rehabilitation of fine and gross motor control of the upper
extremities; therefore, PSE not only provides the auditory structures to drive
and enhance the movement, but can also incorporate familiar songs and
musical targets which can create an additional motivational component to
therapy, often resulting in more repetitions of exercises and higher
compliance with home exercise programs.
Therapeutic Instrumental Music Performance (TIMP) is the playing of
musical instruments in order to exercise and stimulate functional movement
patterns. When implementing TIMP, appropriate musical instruments are
selected in a therapeutically meaningful way in order to emphasize range of
motion, endurance, strength, functional hand movements, finger dexterity,
and limb coordination (Chadwick & Clark, 1980; Elliott, 1982). During
TIMP, instruments are not typically played in the traditional manner, but are
placed in different locations to facilitate practice of the desired functional
movements (Thaut, 2005).
Engaging in musical instrument playing requires a close interaction
between the sensorimotor, auditory, and visual systems. Using instruments
as targets to practice movement allows for feedback and feedforward
interactions between the auditory and premotor areas of the cortex, as well
as engaging the cerebellum and the basal ganglia. The instrument provides
the spatial parameters, while the auditory feedback provides input to make
adjustments to the timing, muscular dynamics, and positioning of the
movement. The sensory feedback provided in this audio-motor interaction
is also essential for the potential increase of plasticity (Herholz & Zatorre,
2012). Plasticity is known to be enhanced during the early and subacute
phase after lesions, such as stroke (post-lesional plasticity). Therefore,
many interventions in stroke rehabilitation try to target this “window of
opportunity” to enhance recovery.
Research has shown that repetition is extremely important for learning
and training movements. Through instruments such as the keyboard and
percussion, or creative use of music technology, TIMP exercises can
provide the opportunity to perform repetitive movements at various speeds
and combinations, incorporating both unilateral and bilateral fine and gross
motor skills. Through appropriate placement of instruments to facilitate
repetition, discrete movement of the fingers, arms, and legs can be trained,
as well as sequential movements of different limbs, incorporating bilateral
engagement of both the upper and lower extremities. Beneficial effects with
regard to both fine and gross motor control have been observed in subacute
and chronic populations including stroke (Altenmüller, Marco-Pallares,
Münte, & Schneider, 2009; Buetefish et al., 1995; Grau-Sánchez et al., 2013; Schneider, Münte, Rodriguez-Fornells, Sailer, & Altenmüller, 2010; Thaut, Kenyon, et al., 2002; Thaut, McIntosh, & Rice, 1997; Whitall, McCombe Waller, Silver, & Macko, 2000; Yoo, 2009), PD (Bernatzky, Bernatzky, Hesse, Staffen, & Ladurner, 2004; Bukowska, 2016; Bukowska, Krężałek, Mirek, Bujas, & Marchewka, 2015; Pacchetti et al., 2000),
cerebellar patients (Molinari et al., 2005), traumatic brain injury (Chong,
Cho, & Kim, 2014), Down syndrome (Ringenbach et al., 2014), and
cerebral palsy (Chong, Cho, Jeong, & Kim, 2013; Turova et al., 2017). In
addition to using TIMP as a learning and training tool for movements,
Kojovic et al. (2012) saw a significant reduction in dystonia symptoms and in electromyographic activity in the neck and orbicularis oculi muscles when patients played the piano, compared with listening to music or imagining playing.
Comparing the three interventions, RAS and PSE serve primarily as
aids, which help to optimize movement trajectories for the performance of
rhythmical or discrete motor tasks. In some patients they may also be used
as tools to promote motor learning. TIMP also facilitates optimization of
movement by providing the timing and spatial structure for the movements;
however, it adds visual and auditory feedback through the instruments, which assists in motor learning and neuromuscular re-education.

Acquired Movement Disorders

Acquired brain injury (ABI) is defined as an injury to the brain which occurred after birth and is not hereditary, congenital, or degenerative. More
specifically, ABIs are typically a result of an ischemic stroke, hemorrhage
in the brain, lack of oxygen to the brain (hypoxia/anoxia), infections in the
brain, toxic exposure, brain tumors, or a traumatic force to the head causing
focal or diffuse trauma. It is one of the leading causes of death and
disability in adult populations, and can cause a range of temporary and
permanent motor, cognitive, and speech dysfunctions, depending on the
type of injury and the range of the severity.
With regard to recovery, children have the advantage that their degree of cerebral plasticity is much greater than that of adults; they therefore tend to recover better than adults from acquired motor dysfunctions, and the same plasticity helps them when disease strikes in childhood.
A commonality among all acquired brain injuries is that patients are
affected at a specific time point and that, depending on the severity of the neuronal damage, there will afterwards be a chance of recovery. Acquired
brain injuries often require the rehabilitation and retraining of movements
that are biologically intrinsically rhythmic as well as discrete movements
that are not intrinsically driven by an underlying rhythm. Research has
shown that despite acquired injury to the brain, the structural properties of
music can often access rhythmic entrainment mechanisms, and successfully
retrain movement by creating a stable anticipatory timescale, priming the
auditory motor pathway, and therefore optimizing kinematic trajectory
patterns. Motor deficits in gait after an ABI vary depending on the affected area. In stroke, gait is often characterized by hemiplegia, with sensory deficits and altered muscle tone on the side opposite the lesion. In moderately affected patients, who can walk at least a few meters with some help, this leads to altered gait parameters such as the duration of ground contact, stride length, and load on the two sides, and generally to a reduction in gait velocity.
Traumatic brain injuries often present with characteristics similar to stroke; in these patients, however, plegia may present either unilaterally or bilaterally.
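The gait parameters referred to throughout this section are straightforward to derive from heel-strike events. The sketch below uses synthetic, hemiparetic-style numbers (not patient data) to compute velocity, cadence, mean stride time, and a simple temporal symmetry ratio:

```python
# Spatiotemporal gait parameters from alternating left/right heel strikes.
def gait_parameters(heel_strikes_s, distance_m):
    """Assumes strikes alternate L, R, L, R, ... over `distance_m` meters."""
    steps = len(heel_strikes_s) - 1
    total_time = heel_strikes_s[-1] - heel_strikes_s[0]
    step_times = [b - a for a, b in zip(heel_strikes_s, heel_strikes_s[1:])]
    left, right = step_times[0::2], step_times[1::2]
    return {
        "velocity_m_per_s": distance_m / total_time,
        "cadence_steps_per_min": 60.0 * steps / total_time,
        "mean_stride_time_s": 2.0 * total_time / steps,  # stride = two steps
        "symmetry_L_over_R": (sum(left) / len(left)) / (sum(right) / len(right)),
    }

strikes = [0.0, 0.9, 1.5, 2.4, 3.0, 3.9, 4.5]  # longer step times on one side
print(gait_parameters(strikes, distance_m=4.2))
```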
A substantial amount of research over the last twenty-five years has
provided new insights into the application of rhythm and timing to optimize
motor planning and movement execution in ABI in both subacute (Kim &
Oh, 2012; Roerdink, Bank, Peper, & Beek, 2011; Spaulding et al., 2013;
Thaut et al., 1997, 2007) and chronic phases (Cha, Kim, Hwang, & Chung,
2014; Hurt, Rice, McIntosh, & Thaut, 1998; Kim & Oh, 2012). Consistent
results show that gait velocity, cadence, and stride length increased significantly more during training with RAS than under control conditions. A recent review (Yoo & Kim, 2016) summarized these
findings and gave a positive recommendation towards regular use of this
technique for the subacute and the chronic phases after stroke. Both
metronome tones alone and metronome tones combined with music were effective types of cueing.
While the previously mentioned studies primarily explored the
entrainment of biologically intrinsic rhythms of neural gait oscillators, other
studies have also explored discrete movements such as arm and hand
movements that are not driven by an underlying biological rhythm
(Altenmüller et al., 2009; Ford, Wagenaar, & Newell, 2007; Grau-Sánchez
et al., 2013; Luft et al., 2004; Malcolm et al., 2009; Schmitz, Kroeger, &
Effenberg, 2014; Thaut, Kenyon, et al., 2002; Whitall et al., 2000). Thaut
and colleagues investigated auditory rhythm as a timekeeper to modify the
onset, duration, and variability of electromyographic (EMG) patterns in the
biceps and triceps during the performance of a gross motor task. They found decreased variability in muscle activity when auditory rhythm was present, indicating more efficient use of the muscles, which could enable a patient to perform a task more accurately and for a longer period of time (Thaut, Schleiffers, & Davis, 1991).
Additionally, Grau-Sánchez et al. (2013) found that piano playing improved
scores on the Action Research Arm Test (ARAT), Arm Paresis Score, and
the Box and Block Test (BBT) in stroke patients, consistent with other
studies looking at this population (Altenmüller et al., 2009; Rojo et al.,
2011). The above studies give strong support for the use of both Patterned
Sensory Enhancement (PSE) and Therapeutic Instrumental Music
Performance (TIMP) with these patients. Thus, in acquired movement dysfunctions, both supportive approaches and motor learning approaches that increase plasticity are useful.

Degenerative Movement Disorders

Movement disorders are characterized as neurological conditions that affect
the speed, fluency, quality, and ease of movement. They can be hereditary
or acquired (e.g., caused by medication side effects, environmental factors,
or injury) and can present with lack of control of both voluntary and
involuntary movements. Degenerative movement disorders are progressive
by nature, and result in decreased function over time. Some of the most
common movement disorders include: Parkinson’s disease and
Parkinsonism, ataxia, dystonia, Huntington’s disease, Tourette syndrome,
and essential tremor. Due to the degenerative nature of these disorders, the major goal of intervention is to adjust movement performance to the patient's dwindling resources and to use those resources as efficiently as possible.

Parkinsonian Syndromes
Parkinson’s disease, the most common movement disorder, is an idiopathic
neurodegenerative disorder associated with progressive loss of
dopaminergic neurons in the basal ganglia, due to the deterioration of the
substantia nigra. Typical symptoms include progressive loss of muscle
control, which leads to bradykinesia (slowing of movements), resting limb
tremor (trembling of the limbs and head while at rest), postural instability,
rigidity (stiffness), and gait instability resulting in impaired balance. Key treatment goals when working with persons with PD include increasing heel strike to promote longer stride lengths and reduce festinating gait patterns, increasing step cadence and walking speed, improving movement initiation and functional balance, and decreasing the risk of falls.
Many studies have looked at Parkinson’s disease and the effects of Rhythmic Auditory Stimulation (RAS) as an external timekeeper to facilitate movement sequences that are not receiving the appropriate internal timing
cues from the basal ganglia. Findings have shown that persons with PD, on
and off medication, were able to improve their walking patterns through
better posture; more appropriate step cadence and increased stride length;
and more efficient and symmetric muscle activation patterns (McIntosh et
al., 1997; Miller et al., 1996; Richards, Malouin, Bedard, & Cioni, 1992;
Thaut, McIntosh, et al., 1996). Additionally, McIntosh et al. (1998) looked
at long-term carry-over after a five-week RAS treatment program, finding
that it took an average of 5 weeks for velocity scores to return to baseline.
Falls are among the biggest contributors to loss of independent living,
long-term institutionalization, and increased mortality (Johnell, Melton,
Atkinson, O’Fallon, & Kurland, 1992). The risk of falls in a person with PD is substantially higher than in healthy elderly adults, raising not only serious safety concerns but also enormous human and healthcare costs associated with falling. Wood, Bilclough, Bowron, and
Walker (2002) found that out of 109 subjects with idiopathic PD and a mean Hoehn and Yahr rating of 2, 68 percent experienced falls over a one-year
period. Thaut and colleagues (Thaut, Rice, Braun Janzen, Hurt-Thaut, & McIntosh, 2018) compared the effects of a continuous 24-week RAS treatment program with an intermittent RAS program (8 weeks of RAS training, 8 weeks without, over 24 weeks). Changes in ankle dorsiflexion, cadence, velocity, stride length, the Berg Balance Scale, fear of falling, the TUG test, and the frequency and severity of falls were evaluated. The findings offered evidence that both continuous and intermittent RAS treatment over time can be effective tools to reduce falls in persons with Parkinson’s disease; however, continuous RAS treatment resulted in significantly greater gains in dorsiflexion, cadence, velocity, and stride length, and in a greater reduction of severity level 1 falls and fear of falling. These results suggest that there is only a limited carry-over effect for RAS in PD patients, presumably due to their pathophysiological deficit, and they encourage the use of ongoing home training programs outside the therapy setting.
About one third of people with Parkinson’s disease experience freezing episodes when initiating gait, changing direction, navigating around obstacles, or moving in small spaces. Numerous studies have shown the
effectiveness of rhythmic auditory cueing on the reduction of freezing
episodes (e.g., Frazzitta, Maestri, Uccellini, Bertotti, & Abelli, 2009;
Freedland et al., 2002; Howe, Lövgreen, Cody, Ashton, & Oldham, 2003;
McIntosh et al., 1997; Morris, Suteerawattananon, Etnyre, Jankovic, &
Protas, 2004; Thaut, McIntosh, et al., 1996; Willems et al., 2006).
Additionally, when looking at kinematic changes due to the immediate entrainment effects of RAS gait training, Picelli et al. (2010) found increased hip range of motion and power during the push-off phase of gait and decreased ankle dorsi-plantar flexion with rhythmic cueing. Other studies of RAS training programs found a slight increase in dorsiflexion over 5 weeks (Pau et al., 2016) and significant increases over 8-week and 6-month training programs (Hurt-Thaut, 2014; Thaut et al., 2018). Until now the sequelae of Parkinson’s disease have mainly been treated with RAS approaches; however, discrete dysfunctions such as freezing episodes may also benefit from specifically tailored PSE approaches.

Huntington’s Disease
Huntington’s disease is a hereditary neurodegenerative disorder which
results in motor disturbances such as hyperkinesia or dystonia, slow
execution of movements, and poor coordination. Perceptual timing is even
more impaired than in Parkinson patients (Cope, Grube, Singh, Burn, &
Griffiths, 2014). The hyperkinetic choreatic movements often coexist with
bradykinesia, and gait can present with a wide base of support, increased
lateral sway, variability in swing and stance phases, difficulty with
frequency modulation, and poor initiation of movement.
Thaut and colleagues (Thaut, Lange, Miltner, Hurt, & Hoemberg, 1996;
Thaut, Miltner, Lange, Hurt, & Hoemberg, 1999) explored velocity
modulation and rhythmic synchronization of gait in persons with
Huntington’s disease, providing the first evidence that rhythmic facilitation
could influence mobility in this population. High variability in frequency entrainment was seen across subjects, with exact phase and period matching highly impaired. Comparisons of self-paced walking, rhythmic metronome cueing, and music found that subjects were able to modulate their gait velocity significantly during both self-paced walking and metronome cueing, but not during the music condition. Given these patients’ prominent deficits in timing perception, a “simple” sensory signal may be most useful for them. Rhythmic facilitation improved locomotor function after a short training period, although disease progression had a clear impact on gait parameters.
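As a rough illustration of how phase and period matching are typically quantified in such synchronization studies, the sketch below computes mean asynchrony (phase error) and period error from step-onset times relative to metronome onsets; this is a generic sensorimotor synchronization measure, not the specific analysis pipeline of the studies cited above.

# Generic phase/period matching measures (illustrative only).
def synchronization_errors(step_onsets, cue_onsets, cue_period):
    # Phase error: mean signed asynchrony between each step and its cue.
    asynchronies = [s - c for s, c in zip(step_onsets, cue_onsets)]
    phase_error = sum(asynchronies) / len(asynchronies)
    # Period error: mean inter-step interval minus the cue period.
    intervals = [b - a for a, b in zip(step_onsets, step_onsets[1:])]
    period_error = sum(intervals) / len(intervals) - cue_period
    return phase_error, period_error

# Example: a cue every 0.6 s; the stepper lags the beat and runs slow,
# so both errors come out positive.
steps = [0.05, 0.68, 1.31, 1.93, 2.56]
cues = [0.0, 0.6, 1.2, 1.8, 2.4]
print(synchronization_errors(steps, cues, 0.6))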

Parkinsonism
Parkinsonism is a general term used to describe impairments in motor
function presenting with similar characteristics to Parkinson’s disease such
as akinesia, hypokinesia, bradykinesia, motor blocks, rigidity, and problems
with the initiation of cyclical movements. These symptoms are also found
in movement disorders of different etiology, such as vascular Parkinsonism and drug-induced Parkinsonism, and in related disorders that also affect additional systems, such as progressive supranuclear palsy (PSP) and multiple system atrophy (MSA). Generally, these patients do not respond as well to L-Dopa medication as patients with Parkinson’s disease. Furthermore, Cope et al. (2014) showed that timing perception in patients with MSA is more impaired than in Parkinson’s disease and similar to that in Huntington’s disease.
Little is known about the effects of rhythmic auditory cueing on
Parkinsonism; however, given the strong effects seen in some of the related
disorders, further research in this area is warranted.

Multiple Sclerosis
Multiple sclerosis is a prevalent autoimmune disease of the central nervous system in which progressive demyelination and the resulting scar tissue cause widespread neurological sensory, motor, and cognitive symptoms such as paresthesia, progressive hemiparesis, ataxia, fatigue, and
depression. Gait and postural dysfunctions are common in patients with
multiple sclerosis and can affect static and dynamic stability, motor control,
and coordination, leading to an increased risk of falls and decreased quality
of life. Typical gait characteristics include reduced gait velocity, stride
length, cadence, and increased step width, asymmetric gait, and increased
double limb support time.
Only recently have a number of studies bridged this gap in the literature by looking at the effects of auditory rhythmic cueing on gait in people with multiple sclerosis (Conklyn et al., 2010; Seebacher, Kuisma, Glynn, & Berger, 2015, 2016; Shahraki, Sohrabi, Torbati, Nikkhah, & NaeimiKia, 2017). A systematic review of the effects of rhythmic auditory cueing in gait rehabilitation for multiple sclerosis (Ghai, Ghai, & Effenberg, 2017, 2018; Ghai, Ghai, Schmitz, & Effenberg, 2018) suggested evidence for a positive impact of rhythmic auditory cueing on timed 25-foot walk test performance and on spatiotemporal gait parameters: gait velocity, stride length, and cadence.
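For readers unfamiliar with how these spatiotemporal parameters relate, it may help to state the standard identity linking them; with cadence in steps per minute and stride length in meters (one stride comprising two steps), gait velocity is

\[ v\ \text{(m/s)} = \frac{\text{cadence (steps/min)} \times \text{stride length (m)}}{120}. \]

For example, a cadence of 100 steps/min with a 1.2 m stride length corresponds to 1.0 m/s, so cueing-induced gains in cadence or stride length translate directly into velocity gains.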
The premise for using rhythmic cueing to learn, train, and retrain
movement is built on a feedforward model of rhythm driving the motor and
kinematic changes of movement. Baram and Miller (2007), however,
studied how self-generated auditory feedback through an external apparatus
can serve as a non-imposing reference which can provide a constant
awareness of gait quality and an instantaneous sensory response to changes
in gait for people with multiple sclerosis. Results of this study may provide
evidence for the use of an auditory feedback system to enhance patient
awareness and effort to improve gait quality.
In recent years, basic and clinical research has provided more effective drug treatments for patients with MS. For a growing number of patients this changes the therapeutic goal: to halt the progression of MS rather than “only” to slow it. This change of treatment strategy may open the possibility of using TIMP more intensively with these patients.
Healthy Elderly
Several factors related to the normal process of aging can affect strength, agility, flexibility, and muscle tone, leading to sensorimotor changes and safety risks in this population.
Decreased bone density or osteoporosis can not only decrease stability,
but also make bones more vulnerable to breaks. Decreased or lack of
physical activity can result in poor muscle tone, decreased strength, and loss
of bone mass and flexibility, putting someone at higher risk for falls and
injury. Age-related visual impairments such as cataracts and glaucoma can
alter depth perception, visual acuity, and peripheral vision, making it more
difficult to safely maneuver through one’s environment. Medications can reduce mental alertness, impair balance and gait, and cause drops in systolic blood pressure while standing. Additionally, environmental hazards such as poor lighting, loose rugs, lack of grab bars, objects on the floor, or unstable furniture can create risks for falling.
Listening to music, and sometimes dancing to it, is quite popular among the elderly. In this chapter, however, we concentrate on the use of music to prevent falls. In a 2012 Cochrane review, Gillespie and colleagues assessed the effects of interventions designed to reduce the incidence of falls in older people living in the community by examining 159 randomized controlled trials with 79,193 participants (Gillespie et al., 2012).
The conclusions of this review were that multifactorial assessment and
intervention programs—such as monitoring medication, treatment of visual
problems, fall prevention education, and non-slip shoes—reduce the rate of
falls, but not the risk of falling. The only interventions which consistently
reduced both the rate and risk of falling were group and home-based
exercise programs and home safety assessments. Hurt-Thaut (2014) found that healthy elderly participants achieved statistically significant increases in degrees of dorsiflexion, velocity, cadence, stride length, and Berg Balance Scale scores when participating in either a continuous or an intermittent 6-month rhythm-based exercise and walking program.

Developmental Disorders
Autism Spectrum Disorder
Autism spectrum disorder (ASD) is a neurodevelopmental disorder that is often characterized by deficits in social interaction and communication and by unusual behaviors such as clumsy, uncoordinated, or repetitive movement and poor balance and postural control (Fournier et al., 2010). Only in recent
years has research attention turned to the delays in fundamental motor
development (e.g., oral motor control, coordination, gait) in ASD compared
to typically developing children, and how those delays can directly
influence interpersonal social exchange, cognitive functions such as
attention and executive function, and the acquisition and development of
written and spoken language. Torres et al. (2013) published the most accessed paper on ASD from a movement perspective, laying out a broad theoretical framework for researching, treating, and tracking autism. Torres’s work has been at the forefront of research into a complex system for analyzing micro-movements as a reflection of the layers of multidirectional internal and external influences on the central and peripheral nervous systems during goal-oriented movement in response to cognitive, motor, and social tasks (Torres & Donnellan, 2015).
Although there is a limited body of research looking specifically at the
influence of elements of music on sensorimotor function in this population,
based on the principles of auditory motor entrainment, rhythm-based
interventions such as RAS, PSE, and TIMP could aid in regulatory control
of proprioceptive movement and provide adaptive mechanisms to decrease
movement variability, smooth movement trajectories, and improve gait
parameters such as symmetry and stability.

Cerebral Palsy
Cerebral palsy (CP) is a chronic disability of the central nervous system
characterized by abnormal control of movement and posture. A person with
CP can present with quadriplegia (both arms and legs affected), diplegia (primarily the legs affected), or hemiplegia (one side of the body affected). Motor symptoms can vary widely, ranging from minor difficulty with fine motor movements, such as grasping and manipulating objects, to significantly impaired muscular and motor control of all four limbs.
A few studies have looked at the effects of rhythm and instrument
playing in this population. In an investigation of the effects of RAS in adults with CP (Kim et al., 2011), kinematic analysis revealed significant
increases in the anterior tilt of the pelvis and hip flexion at initial contact
with RAS training; however, there were no statistical differences in knee,
ankle, and foot kinematic parameters. Furthermore, Kim et al. (2012) looked at the effects of RAS versus neurodevelopmental treatment (Bobath) on gait patterns in adults with cerebral palsy over an intensive 3-week training period. Findings indicated that RAS significantly increased cadence, velocity, stride length, and step length, and produced significant increases in overall normalization of gait as measured by gait deviation index scores, compared to the neurodevelopmental treatment group. In contrast, the neurodevelopmental treatment group showed significant decreases in cadence, velocity, stride length, and step length, with a significant increase in step time; however, neurodevelopmental treatment produced significant improvements in internal and external rotation of the hip joints. In support of TIMP, Chong et al. (2013) explored finger
exercises on the keyboard as a tool to increase manual dexterity and
velocity in adults with cerebral palsy, finding improvements after twelve
30-minute TIMP sessions.

Summary and Conclusions

Since the 1990s, a strong body of research evidence has set the foundation
for the use of rhythm and music as important tools in the development,
rehabilitation, and maintenance of sensorimotor function, particularly in the
treatment of neurologic disorders. Through external rhythmic cueing,
rhythmic entrainment optimizes the execution of a motor pattern by priming
the motor system and creating anticipatory rhythmic templates to allow for
optimal anticipation, motor planning, and execution of movement (Thaut,
McIntosh, & Hoemberg, 2014). While the temporal structures in music
remain the central elements when using music in the treatment of
sensorimotor dysfunction, other elements such as pitch, dynamics, and
harmony can also enhance and shape complex movements such as arm and
hand movements that are not intrinsically rhythmic. Neurologic Music
Therapy is a research-based system of clinical techniques guided by music
and neuroscience principles of music perception, cognition, and
performance. In the area of sensorimotor rehabilitation, three standardized techniques (RAS, PSE, and TIMP) have become well accepted in the treatment of impairment and the restoration of function, based on current knowledge of how music can aid cortical reorganization, motor learning, and neuromuscular re-education.

References
Altenmüller, E., Marco-Pallares, J., Münte, T. F., & Schneider, S. (2009). Neural reorganization
underlies improvement in stroke-induced motor dysfunction by music-supported therapy. Annals
of the New York Academy of Sciences 1169, 395–405.
Arnal, L. H. (2012). Predicting “when” using the motor system’s beta-band oscillations. Frontiers in
Human Neuroscience 6. Retrieved from https://doi.org/10.3389/fnhum.2012.00225
Arnal, L. H., Doelling, K. B., & Poeppel, D. (2015). Delta–beta coupled oscillations underlie
temporal prediction accuracy. Cerebral Cortex 25(9), 3077–3085.
Baram, Y., & Lenger, R. (2012). Gait improvement in patients with cerebral palsy by visual and
auditory feedback. Neuromodulation 15(1), 48–52.
Baram, Y., & Miller, A. (2007). Auditory feedback control for improvement of gait in patients with
multiple sclerosis. Journal of Neurological Sciences 254(1–2), 90–94.
Bengtsson, S. L., Ullén, F., Henrik Ehrsson, H., Hashimoto, T., Kito, T., Naito, E., … Sadato, N.
(2009). Listening to rhythms activates motor and premotor cortices. Cortex 45(1), 62–71.
Bernatzky, G., Bernatzky, P., Hesse, H. P., Staffen, W., & Ladurner, G. (2004). Stimulating music
increases motor coordination in patients afflicted by Morbus Parkinson. Neuroscience Letters 361,
4–8.
Bradt, J., Magee, W. L., Dileo, C., Wheeler, B. L., & McGilloway, E. (2010). Music therapy for
acquired brain injury. Cochrane Database of Systematic Reviews 7, CD006787.
doi:10.1002/14651858.CD006787.pub2
Brown, S. H., Thaut, M. H., Benjamin, J., & Cooke, J. D. (1993). Effects of rhythmic auditory cueing
on temporal sequencing of complex arm movements. Proceedings of the Society for Neuroscience
227(2).
Buetefish, C., Hummelsheim, H., Denzler, P., & Mauritz, K. H. (1995). Repetitive training of isolated
movements improves the outcome of motor rehabilitation of the centrally paretic hand. Journal of
Neurological Sciences 130(1), 59–68.
Bukowska, A. A. (2016). Influence of neurologic music therapy to improve the activity level in a
group of patients with PD. Nordic Journal of Music Therapy 25(1), 14.
Bukowska, A. A., Krężałek, P., Mirek, E., Bujas, P., & Marchewka, A. (2015). Neurologic music
therapy training for mobility and stability rehabilitation with Parkinson’s disease: A pilot study.
Frontiers in Human Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00710
Cha, Y., Kim, Y., Hwang, S., & Chung, Y. (2014). Intensive gait training with rhythmic auditory
stimulation in individuals with chronic hemiparetic stroke: A pilot randomized controlled study.
Neurorehabilitation 35(4), 681–688.
Chadwick, D. M., & Clark, C. A. (1980). Adapting music instruments for the physically
handicapped. Music Educators Journal 67(3), 56–59.
Chong, H. J., Cho, S. R., Jeong, E., & Kim, S. J. (2013). Finger exercise with keyboard playing in
adults with cerebral palsy: A preliminary study. Journal of Exercise Rehabilitation 9(4), 420–425.
Chong, H. J., Cho, S. R., & Kim, S. J. (2014). Hand rehabilitation using MIDI keyboard playing in
adolescents with brain damage: A preliminary study. Neurorehabilitation 34(1), 147–155.
Conklyn, D., Stough, D., Novak, E., Paczak, S., Chemali, K., & Bethoux, F. (2010). A home-based
walking program using rhythmic auditory stimulation improves gait performance in patients with
multiple sclerosis: A pilot study. Neurorehabilitation and Neural Repair 24(9), 835–842.
Cope, T. E., Grube, M., Singh, B., Burn, D. J., & Griffiths, T. D. (2014). The basal ganglia in
perceptual timing: Timing performance in multiple system atrophy and Huntington’s disease.
Neuropsychologia 52(100), 73–81.
Crasta, J. E., Thaut, M. H., Anderson, C. W., Davies, P. L., & Gavin, W. J. (2018). Auditory priming
improves neural synchronization in auditory-motor entrainment. Neuropsychologia 117, 102–112.
de Dreu, M. J., van der Wilk, A. S., Poppe, E., Kwakkel, G., & van Wegen, E. E. (2012).
Rehabilitation, exercise therapy and music in patients with Parkinson’s disease: A meta-analysis of
the effects of music-based movement therapy on walking ability, balance and quality of life.
Parkinsonism & Related Disorders 18(Suppl. 1), 114–119.
de l’Etoile, S. K. (2008). The effect of rhythmic auditory stimulation on the gait parameters of
patients with incomplete spinal cord injury: An exploratory pilot study. International Journal of
Rehabilitation Research 31(2), 155–157.
Elliott, B. (1982). Guide to the selection of musical instruments with respect to physical ability and
disability. Saint Louis, MO: MMB Music, Inc.
Fernández-Miranda, J. C., Wang, Y., Pathak, S., Stefaneau, L., Verstynen, T., & Yeh, F. C. (2015).
Asymmetry, connectivity, and segmentation of the arcuate fascicle in the human brain. Brain
Structure & Function 220(3), 1665–1680.
Ford, M., Wagenaar, R., & Newell, K. (2007). The effects of auditory rhythms and instruction on
walking patterns in individuals post stroke. Gait and Posture 26(1), 150–155.
Fournier, K. A., Kimberg, C. I., Radonovich, K. J., Tillman, M. D., Chow, J. W., Lewis, M. H., …
Hass, C. J. (2010). Decreased static and dynamic postural control in children with autism spectrum
disorders. Gait Posture 32(1): 6–9.
Frazzitta, G., Maestri, R., Uccellini, D., Bertotti, G., & Abelli, P. (2009). Rehabilitation treatment of
gait in patients with Parkinson’s disease with freezing: A comparison between two physical
therapy protocols using visual and auditory cues with or without treadmill training. Movement
Disorders 24(8), 1139–1143.
Freedland, R. L., Festa, C., Sealy, M., McBean, A., Elghazaly, P., Capan, A., … Rothman, J. (2002).
The effects of pulsed auditory stimulation on various gait measurements in persons with
Parkinson’s disease. Neurorehabilitation 17(1), 81–87.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds
is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802.
Ghai, S., Ghai, I., & Effenberg, A. O. (2017). Effect of rhythmic auditory cueing on gait in cerebral
palsy: A systematic review and meta-analysis. Neuropsychiatric Disease and Treatment 14, 43–59.
Ghai, S., Ghai, I., & Effenberg, A. O. (2018). Effect of rhythmic auditory cueing on aging gait: A
systematic review and meta-analysis. Aging and Disease 9(5), 901–923.
Ghai, S., Ghai, I., Schmitz, G., & Effenberg, A. O. (2018). Effect of rhythmic auditory cueing on
Parkinsonian gait: A systematic review and meta-analysis. Scientific Reports 8, 506.
Gillespie, L. D., Robertson, M. C., Gillespie, W. J., Sherrington, C., Gates, S., Clemson, L. M., &
Lamb, S. E. (2012). Interventions for preventing falls in older people living in the community.
Cochrane Database of Systematic Reviews 9, CD007146.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of
Cognitive Neuroscience 19(5), 893–906.
Grau-Sánchez, J., Amengual, J. L., Rojo, N., Veciana de las Heras, M., Montero, J., Rubio, F., …
Rodríguez-Fornells, A. (2013). Plasticity in the sensorimotor cortex induced by music-supported
therapy in stroke patients: A TMS study. Frontiers in Human Neuroscience 7. Retrieved from
https://doi.org/10.3389/fnhum.2013.00494
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity:
Behavior, function, and structure. Neuron 76(3), 486–502.
Howe, T. E., Lövgreen, B., Cody, F. W., Ashton, V. J., & Oldham, J. A. (2003). Auditory cues can
modify the gait of persons with early-stage Parkinson’s disease: A method for enhancing
Parkinsonian walking performance. Clinical Rehabilitation 17(4), 363–367.
Hurt, C. P., Rice, R. R., McIntosh, G. C., & Thaut, M. H. (1998). Rhythmic auditory stimulation in
gait training for patients with traumatic brain injury. Journal of Music Therapy 35(4), 228–241.
Hurt-Thaut, C. P. (2014). Rhythmic auditory stimulation to reduce falls in healthy elderly and
patients with Parkinson’s disease (Doctoral dissertation). UMI dissertation publishing, 3635683.
Johnell, O., Melton, L. J. III, Atkinson, E. J., O’Fallon, W. M., & Kurland, L. T. (1992). Fracture risk
in patients with Parkinsonism: A population-based study in Olmsted County, Minnesota. Age and
Ageing 21(1), 32–38.
Kadivar, Z., Corcos, D. M., Foto, J., & Hondzinski, J. M. (2011). Effect of step training and rhythmic
auditory stimulation on functional performance in Parkinson patients. Neurorehabilitation and
Neural Repair 25(7), 626–635.
Kim, J. S., & Oh, D. W. (2012). Home-based auditory stimulation training for gait rehabilitation of
chronic stroke patients. Journal of Physical Therapy Science 24(8), 775–777.
Kim, S., Kwak, E., Park, E., & Cho, S. (2012). Differential effects of rhythmic auditory stimulation
and neurodevelopmental treatment/Bobath on gait patterns in adults with cerebral palsy: A
randomized controlled trial. Clinical Rehabilitation 26(10), 904–914.
Kim, S. J., Kwak, E. E., Park, E. S., Lee, D. S., Kim, K. J., Song, J. E., & Cho, S. R. (2011). Changes
in gait patterns with rhythmic auditory stimulation in adults with cerebral palsy.
Neurorehabilitation 29(3), 233–241.
Kojovic, M., Pareés, I., Sadnicka, A., Kassavetis, P., Rubio-Agusti, I., Saifee, T. A., & Bhatia, K. P.
(2012). The brighter side of music in dystonia. Archives of Neurology 69(7), 917–919.
Luft, A. R., McCombe-Waller, S., Whitall, J., Forrester, L. W., Macko, R., Sorkin, J. D., … Hanley,
D. F. (2004). Repetitive bilateral arm training and motor cortex activation in chronic stroke: A
randomized controlled trial. Journal of the American Medical Association 292(15), 1853–1861.
Ma, H. I., Hwang, W. J., & Lin, K. C. (2009). The effects of two different auditory stimuli on
functional arm movement in persons with Parkinson’s disease: A dual-task paradigm. Clinical
Rehabilitation 23(3), 229–237.
McCombe Waller, S., Harris-Love, M., Liu, W., & Whitall, J. (2006). Temporal coordination of the
arms during bilateral simultaneous and sequential movements in patients with chronic hemiparesis.
Experimental Brain Research 168(3), 450–454.
McIntosh, G. C., Brown, S. H., Rice, R. R., & Thaut, M. H. (1997). Rhythmic auditory-motor
facilitation of gait patterns in patients with Parkinson’s disease. Journal of Neurology,
Neurosurgery, and Psychiatry 62(1), 22–26.
McIntosh, G. C., Rice, R. R., Hurt, C. P., & Thaut, M. H. (1998). Long-term training effects of
rhythmic auditory stimulation on gait in patients with Parkinson’s disease. Movement Disorders
13(2), 212 [Abstract].
Mak, M. (2006). Feed-forward audio-visual cues could enhance sit-to-stand in Parkinsonian patients.
Proceedings of the 4th World Congress for Neurorehabilitation, F1B-7.
Malcolm, M. P., Massie, C., & Thaut, M. H. (2009). Rhythmic auditory-motor entrainment improves
hemiparetic arm kinematics during reaching movements: A pilot study. Topics in Stroke
Rehabilitation 16(1), 69–79.
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A neural
perspective across humans and non-human primates. Philosophical Transactions of the Royal
Society B: Biological Sciences 370(1664), 20140093.
Miller, R. A., Thaut, M. H., McIntosh, G. C., & Rice, R. R. (1996). Components of EMG symmetry
and variability in Parkinsonian and healthy elderly gait. Electroencephalography and Clinical
Neurophysiology/Electromyography and Motor Control 101(1), 1–7.
Molinari, M., Leggio, M., Filippini, V., Gioia, M., Cerasa, A., & Thaut, M. (2005). Sensorimotor
transduction of time information is preserved in subjects with cerebellar damage. Brain Research
Bulletin 67(6), 448–458.
Morris, G. S., Suteerawattananon, M., Etnyre, B. R., Jankovic, J., & Protas, E. J. (2004). Effects of
visual and auditory cues on gait in individuals with Parkinson’s disease. Journal of the
Neurological Sciences 219(1–2), 63–69.
Pacchetti, C., Mancini, F., Aglieri, R., Fundaro, C., Martignoni, E., & Nappi, G. (2000). Active music
therapy in Parkinson’s disease: An integrative model method for motor and emotional
rehabilitation, Psychosomatic Medicine 62(3), 386–393.
Paltsev, Y. I., & Elner, A. M. (1967). Change in the functional state of the segmental apparatus of the
spinal cord under the influence of sound stimuli and its role in voluntary movement. Biophysics
12, 1219–1226.
Pau, M., Corona, F., Pili, R., Casula, C., Sors, F., Agostini, T., … Murgia, M. (2016). Effects of
physical rehabilitation integrated with rhythmic auditory stimulation on spatio-temporal and
kinematic parameters of gait in Parkinson’s disease. Frontiers in Neurology 7. Retrieved from
https://doi.org/10.3389/fneur.2016.00126
Peng, Y. C., Lu, W. T., Wang, T. H., Chen, Y. L., Liao, H. F., Lin, K. H., & Tang, P. F. (2011).
Immediate effects of therapeutic music on loaded sit-to-stand movement in children with spastic
diplegia. Gait & Posture 33(2), 274–278.
Picelli, A., Camin, M., Tinazzi, M., Vangelista, A., Cosentino, A., Fiaschi, A., & Smania, N. (2010).
Three-dimensional motion analysis of the effects of auditory cueing on gait pattern in patients with
Parkinson’s disease: A preliminary investigation. Neurological Sciences 31(4), 423–430.
Richards, C. L., Malouin, F., Bedard, P. J., & Cioni, M. (1992). Changes induced by L-Dopa and
sensory cues on the gait of Parkinsonian patients. In M. Wollacot & F. Horak (Eds.), Posture and
gait: Control mechanisms (pp. 126–129). Eugene, OR: University of Oregon Books.
Ringenbach, S. D., Zimmerman, K., Chen, C. C., Mulvey, G. M., Holzapfel, S. D., Weeks, D. J., &
Thaut, M. H. (2014). Adults with Down syndrome performed repetitive movements fast with
continuous music cues. Journal of Motor Learning and Development 2(3), 47–54.
Robertson, S. D., Chua, R., Maraj, B. K., Kao, J. C., & Weeks, D. J. (2002). Bimanual coordination
dynamics in adults with Down syndrome. Motor Control 6(4), 388–407.
Robertson, S. D., Van Gemmert, A. W., & Maraj, B. K. (2002). Auditory information is beneficial for
adults with Down syndrome in a continuous bimanual task. Acta Psychologica 110(2), 213–229.
Roerdink, M., Bank, P. J., Peper, C. L., & Beek, P. J. (2011). Walking to the beat of different drums:
Practical implications for the use of acoustic rhythms in gait rehabilitation. Gait & Posture 33(1),
690–694.
Rojo, N., Amengual, J., Juncadella, M., Rubio, F., Camara, E., Marco-Pallares, J., … Altenmüller, E.
(2011). Music-supported therapy induces plasticity in the sensorimotor cortex in chronic stroke: A
single-case study using multimodal imaging (fMRI-TMS). Brain Injury 25(7–8), 787–793.
Rossignol, S., & Melvill Jones, G. (1976). Audiospinal influences in man studied by the H-reflex and
its possible role in rhythmic movement synchronized to sound. Electroencephalography &
Clinical Neurophysiology 41(1), 83–92.
Sameiro-Barbosa, C. M., & Geiser, E. (2016). Sensory entrainment mechanisms in auditory
perception: Neural synchronization cortico-striatal activation. Frontiers in Neuroscience 10.
Retrieved from https://doi.org/10.3389/fnins.2016.00361
Schaffert, N., & Mattes, K. J. (2015). Effects of acoustic feedback training in elite-standard para-
rowing. Journal of Sports Science 33(4), 411–418.
Schauer, M., & Mauritz, K. H. (2003). Musical motor feedback (MMF) in walking hemiparetic
stroke patients: Randomized trials of gait improvement. Clinical Rehabilitation 17(7), 713–722.
Schmahmann, J. D., & Pandya, D. N. (2008). Disconnection syndromes of basal ganglia, thalamus,
and cerebrocerebellar systems. Cortex 44(8), 1037–1066.
Schmitz, G., Kroeger, D., & Effenberg, A. O. (2014). A mobile sonification system for stroke
rehabilitation. Paper presented at the 20th International Conference on Auditory Display, New
York.
Schneider, S., Münte, T., Rodriguez-Fornells, A., Sailer, M., & Altenmüller, E. (2010). Music-
supported training is more efficient than functional motor training for recovery of fine motor skills
in stroke patients. Music Perception: An Interdisciplinary Journal 27(4), 271–280.
Seebacher, B., Kuisma, R., Glynn, A., & Berger, T. (2015). Rhythmic cued motor imagery and
walking in people with multiple sclerosis: A randomised controlled feasibility study. Pilot and
Feasibility Studies 1, 25. doi:10.1186/s40814-015-0021-3
Seebacher, B., Kuisma, R., Glynn, A., & Berger, T. (2016). The effect of rhythmic-cued motor
imagery on walking, fatigue and quality of life in people with multiple sclerosis: A randomised
controlled trial. Multiple Sclerosis Journal 23(2), 286–296.
Shahraki, M., Sohrabi, M., Torbati, H. T., Nikkhah, K., & NaeimiKia, M. (2017). Effect of rhythmic
auditory stimulation on gait kinematic parameters of patients with multiple sclerosis. Journal of
Medicine and Life 10(1), 33–37.
Son, H., & Kim, E. (2015). Kinematic analysis of arm and trunk movements in the gait of
Parkinson’s disease patients based on external signals. Journal of Physical Therapy Science
27(12), 3783–3786.
Song, G. B., & Ryu, H. J. (2016). Effects of gait training with rhythmic auditory stimulation on gait
ability in stroke patients. Journal of Physical Therapy Science 28(5), 1403–1406.
Song, J. H., Zhou, P. Y., Cao, Z. H., Ding, Z. G., Chen, H. X., & Zhang, G. B. (2015). Rhythmic
auditory stimulation with visual stimuli on motor and balance function of patients with Parkinson’s
disease. European Review for Medical and Pharmacological Sciences 19(11), 2001–2007.
Spaulding, S. J., Barber, B., Colby, M., Cormack, B., Mick, T., & Jenkins, M. E. (2013). Cueing and
gait improvement among people with Parkinson’s disease: A meta-analysis. Archives of Physical
Medicine and Rehabilitation 94(3), 562–570.
Stephan, K. M., Thaut, M. H., Wunderlich, G., Schicks, W., Tian, B., Tellmann, L., … Hoemberg, V.
(2002). Conscious and subconscious sensorimotor synchronization: Prefrontal cortex and the
influence of awareness. NeuroImage 15(2), 345–352.
Suh, J. H., Han, S. J., Jeon, S. Y., Kim, H. J., Lee, J. E., Yoon, T. S., & Chong, H. J. (2014). Effect of
rhythmic auditory stimulation on gait and balance in hemiplegic stroke patients.
Neurorehabilitation 34(1), 193–199.
Teki, S., Grube, M., & Griffiths, T. D. (2012). A unified model of time perception accounts for
duration-based and beat-based timing mechanisms. Frontiers in Integrative Neuroscience 5.
Retrieved from https://doi.org/10.3389/fnint.2011.00090
Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based
and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812.
Thaut, M. H. (2005). The future of music in therapy and medicine. Annals of the New York Academy
of Sciences 1060, 303–308.
Thaut, M. H., & Hoemberg, V. (Eds.) (2014). The Oxford handbook of neurologic music therapy.
Oxford: Oxford University Press.
Thaut, M. H., Hoemberg, V., Kenyon, G., & Hurt, C. P. (1998). Rhythmic entrainment of hemiparetic
arm movements in stroke patients. Proceedings of the Society for Neuroscience 653(7) [Abstract].
Thaut, M. H., & Kenyon, G. P. (2003). Rapid motor adaptations to subliminal frequency shifts during
syncopated rhythmic sensorimotor synchronization. Human Movement Science 22(3), 321–338.
Thaut, M. H., Kenyon, G. P., Hurt, C. P., McIntosh, G. C., & Hoemberg, V. (2002). Kinematic
optimization of spatiotemporal patterns in paretic arm training with stroke patients.
Neuropsychologia 40(7), 1073–1081.
Thaut, M. H., Lange, H., Miltner, R., Hurt, C. P., & Hoemberg, V. (1996). Rhythmic entrainment of
gait patterns in Huntington’s disease patients. Proceedings of the Society for Neuroscience 727(6)
[Abstract].
Thaut, M. H., Leins, A. K., Rice, R. R., Argstatter, H., Kenyon, G. P., McIntosh, G. C., & Fetter, M.
(2007). Rhythmic auditory stimulation improves gait more than NDT/Bobath training in near-
ambulatory patients early post-stroke: A single-blind, randomized trial. Neurorehabilitation and
Neural Repair 21(5), 455–459.
Thaut, M. H., McIntosh, G. C., & Hoemberg, V. (2014). Neurobiological foundations of neurologic
music therapy: Rhythmic entrainment and the motor system. Frontiers in Psychology 5. Retrieved
from https://doi.org/10.3389/fpsyg.2014.01185
Thaut, M. H., McIntosh, G. C., Prassas, S. G., & Rice, R. R. (1993). The effect of auditory rhythmic
cuing on stride and EMG patterns in hemiparetic gait of stroke patients. Journal of Neurologic
Rehabilitation 7(1), 9–16.
Thaut, M. H., McIntosh, G. C., & Rice, R. R. (1997). Rhythmic facilitation of gait training in
hemiparetic stroke rehabilitation. Journal of Neurological Sciences 151(2), 207–212.
Thaut, M. H., McIntosh, G. C., Rice, R. R., Miller, R. A., Rathbun, J., & Brault, J. M. (1996).
Rhythmic auditory stimulation in gait training for Parkinson’s disease patients. Movement
Disorders 11(2), 193–200.
Thaut, M. H., Miltner, R., Lange, H. W., Hurt, C. P., & Hoemberg, V. (1999). Velocity modulation
and rhythmic synchronization of gait in Huntington’s disease. Movement Disorders 14(5), 808–
819.
Thaut, M. H., Nickel, A., Kenyon, G. P., Meissner, N., & McIntosh, G. C. (2005). Rhythmic auditory
stimulation (RAS) for gait training in hemiparetic stroke rehabilitation: An international
multicenter study. Proceedings of the Society for Neuroscience 756(6).
Thaut, M. H., Rice, R. R., Braun Janzen, T., Hurt-Thaut, C. P., & McIntosh, G. C. (2018). Rhythmic
auditory stimulation for reduction of falls in Parkinson’s disease: A randomized controlled study.
Clinical Rehabilitation, July 23. doi:10.1177/0269215518788615
Thaut, M. H., Schicks, W., McIntosh, G. C., & Hoemberg, V. (2002). The role of motor imagery and
temporal cuing in hemiparetic arm rehabilitation. Neurorehabilitation and Neural Repair 16, 115.
Thaut, M. H., Schleiffers, S., & Davis, W. B. (1991). Analysis of EMG activity in biceps and triceps
muscle in a gross motor task under the influence of auditory rhythm. Journal of Music Therapy 28,
64–88.
Torres, E. B., Brincker, M., Isenhower, R. W., Yanovich, P., Stigler, K. A., Nurnberger, J. I., … José,
J. V. (2013). Autism: The micro-movement perspective. Frontiers in Integrative Neuroscience 7.
Retrieved from https://doi.org/10.3389/fnint.2013.00032
Torres, E. B., & Donnellan, A. M. (2015). Editorial for research topic “Autism: The movement
perspective.” Frontiers in Integrative Neuroscience 9. Retrieved from
https://doi.org/10.3389/fnint.2015.00012
Turova, V., Alves-Pinto, A., Ehrlich, S., Blumenstein, T., Cheng, G., & Lampe, R. (2017). Effects of
short-term piano training on measures of finger tapping, somatosensory perception and motor-
related brain activity in patients with cerebral palsy. Neuropsychiatric Disease and Treatment 13,
2705–2718.
Wang, T. H., Peng, Y. C., Chen, Y. L., Lu, T. W., Liao, H. F., Tang, P. F., & Shieh, J. Y. (2013). A
home-based program using patterned sensory enhancement improves resistance exercise effects for
children with cerebral palsy: A randomized controlled trial. Neurorehabilitation and Neural Repair
27(8), 684–694.
Whitall, J., McCombe Waller, S., Silver, K. H., & Macko, R. F. (2000). Repetitive bilateral arm
training with rhythmic auditory cueing improves motor function in chronic hemiparetic stroke.
Stroke 31(10), 2390–2395.
Willems, A. M., Nieuwboer, A., Chavret, F., Desloovere, K., Dom, R., Rochester, L., … Van Wegen,
E. (2006). The use of rhythmic auditory cues to influence gait in patients with Parkinson’s disease,
the differential effect for freezers and non-freezers, an explorative study. Disability and
Rehabilitation 28(11), 721–728.
Wood, B. H., Bilclough, J. A., Bowron, A., & Walker, R. W. (2002). Incidence and prediction of falls
in Parkinson’s disease: A prospective multidisciplinary study. Journal of Neurology, Neurosurgery
& Psychiatry 72(6), 721–725.
Yoo, J. (2009). The role of therapeutic instrumental music performance in hemiparetic arm
rehabilitation. Music Therapy Perspectives 27(1), 16–24.
Yoo, G. E., & Kim, S. J. (2016). Rhythmic auditory cueing in motor rehabilitation for stroke patients:
Systematic review and meta-analysis. Journal of Music Therapy 53(2), 149–177.
Zatorre, R. J., Halpern, A. R., & Herholz, S. C. (2012). Neuronal correlates of perception, imagery,
and memory for familiar tunes. Journal of Cognitive Neuroscience 24(6), 1382–1397.
CHAPTER 29

NEUROLOGIC MUSIC THERAPY FOR SPEECH AND LANGUAGE REHABILITATION

YUNE S. LEE, CORENE THAUT, AND CHARLENE SANTONI

There are anecdotal and clinical reports—some of which trace back hundreds of years—that music, especially singing, increases speech fluency in individuals with profound speech deficits. Both short- and long-term music-based interventions exist which can address developmental, rehabilitative, and adaptive speech goals. A growing body of behavioral evidence supports the efficacy of music training for various speech and language impairments including dyslexia, specific language impairment (SLI), aphasia, dysarthria, apraxia of speech, fluency disorders, voice disorders, and hearing loss. Despite this accumulation of findings, it has remained largely elusive how music can elicit neural changes that help to mediate speech and language processes in the brain. In recent years, however, a burgeoning volume of neuroimaging studies has begun to yield promising evidence regarding the efficacy of Neurologic Music Therapy (NMT) interventions for speech rehabilitation by demonstrating neural reorganization. For example, Wan and colleagues
(Wan, Zheng, Marchina, Norton, & Schlaug, 2014) showed that intensive melodic intonation therapy (MIT) induced changes in the structural connectivity of the undamaged right hemisphere in patients with non-fluent chronic aphasia.
What aspect of music (e.g., pitch, rhythm, melody, dynamics) plays the pivotal role in the transfer from music to speech ability? Although melody may seem like the most important feature, recent evidence suggests that rhythm plays a critical role in the facilitation and recovery of speech during music-based intervention (Fujii & Wan, 2014; Stahl, Kotz, Henseler,
Turner, & Geyer, 2011). Behaviorally, rhythm performance predicts some
linguistic abilities including grammar and phonological processing (Gordon
et al., 2015). Neurologically, there is substantial overlap between rhythm
and speech circuitries along the speech-motor network (Kotz, Schwartze, &
Schmidt-Kassow, 2009; Kraus & Chandrasekaran, 2010). More specifically,
the built-in temporal processes—necessary for both music and speech—are
mediated by corticostriatal circuitries comprising the basal ganglia, the
supplementary motor area (SMA), the premotor cortex, and the frontal
operculum (Kotz & Schwartze, 2010). In particular, the basal ganglia serve
as a central hub in analyzing patterns of temporal sequences of sensory or
motoric events (Kotz & Schmidt-Kassow, 2015). Accordingly, a body of evidence indicates a functional role of the basal ganglia ranging from beat perception and production (Grahn & Brett, 2009) to speech and
Parkinson’s disease, PD) show speech and language deficits as well as
motor deficits (Friederici, Kotz, Werheid, Hein, & Von Cramon, 2003;
Grahn & Brett, 2009; Kotz, Frisch, Von Cramon, & Friederici, 2003).
Consequently, PD patients, whose basal ganglia are dysfunctional, are unable to detect temporal cues in speech (Farrugia et al., 2014; Kotz & Gunter, 2015) or syntactic violations in language (Friederici et al., 2003; Kotz & Gunter, 2015), and fail to modulate their speech rate during speaking tasks (Skodda & Schlegel, 2008).
Such rhythm and timing deficits can also stem from mutations of genes regulating a key neurotransmitter system involved in temporal processing (Wiener, Lohoff, & Coslett, 2011; Wiener, Lee, Lohoff, & Coslett, 2014). For example, DRD2 polymorphism can reduce the density of dopamine D2 receptors in the basal ganglia (Rowe et al., 1999), which can potentially affect timing and rhythmic processes. Accordingly, Wiener et al. (2014) showed that polymorphism of the DRD2 gene can lead to poor temporal judgment. Similarly, Wong and colleagues (Wong, Ettlinger, & Zheng, 2013) reported an association with poor performance on grammar sequencing. Both studies also related DRD2 polymorphism to differential functional magnetic resonance imaging (fMRI) activity in the basal ganglia.
Intriguingly, patients with dysfunctional basal ganglia benefit from external rhythmic cueing when performing speech and language tasks. For example, Kotz and Gunter (2015) demonstrated that the P600 electroencephalography (EEG) component—a hallmark of syntactic processing—was restored when a PD patient was primed with march music prior to a grammar judgment task. Correspondingly, in the developmental domain, children with SLI performed better on syntactic judgment tasks when primed by music with a regular beat pattern than by music with an irregular beat pattern (Przybylski et al., 2013) or by environmental sound lacking beat components entirely (Bedoin, Brisseau, Molinier, Roch, & Tillmann, 2016). In sum, current findings suggest that there is a tight coupling between speech and music, and that rhythmic processes are mediated by dedicated neural and genetic mechanisms.
In this chapter, we provide a comprehensive overview of Neurologic Music Therapy for various speech and language disorders. Neurologic Music Therapy is an evidence-based system of standardized clinical techniques based on scientific knowledge of music perception and production and their effects on non-musical brain and behavior function (Thaut & Hoemberg, 2014). In the speech and
language domain, there are eight standardized techniques: Melodic
Intonation Therapy (MIT), Musical Speech Stimulation (MUSTIM),
Rhythmic Speech Cueing (RSC), Vocal Intonation Therapy (VIT), Oral
Motor and Respiratory Exercises (OMREX), Therapeutic Singing (TS),
Developmental Speech and Language Training Through Music (DSLM),
and Symbolic Communication Training Through Music (SYCOM). We will
illustrate how each of these NMT techniques is applied to rehabilitation of
speech and language impairments.

Dysarthria
Motor speech disorders (MSDs) can occur due to neurologic impairments
affecting the planning, programming, control, or execution of speech.
MSDs include the dysarthrias and apraxia of speech (Duffy, 2005, p. 5).
Dysarthria is a collective term referring to a neuropathophysiologic
disruption in the activation and control (e.g., strength, speed, range of
motion, tone, coordination) of the muscles necessary for speech production.
Dysarthria, therefore, can affect the respiratory, phonatory, resonatory,
articulatory, and prosodic aspects of speech. Several categories exist:
flaccid, spastic, hypokinetic, hyperkinetic, ataxic, upper motor neuron, and
mixed; all resulting from damage or disturbance to the upper or lower
motor neurons, basal ganglia, or cerebellum (Darley, Aronson, & Brown,
1969; Duffy, 2005).
Singing and speech share the same proprioceptive feedback system.
Guenther’s Directions into Velocities of Articulators (DIVA) model
describes a segmental theory of speech motor control which proposes that
speech segments are coded by the central nervous system (CNS) as
auditory-temporal and somatosensory-temporal goal regions, and that two control mechanisms drive a speech sound map: feedforward and feedback. The
feedforward mechanism outlines how the CNS sends anticipatory pre-
programmed instructions about movements by relying on past experiences
in movement planning, execution, and error correction. The feedback
mechanism provides scaffolding for how speech movement is controlled
based on the sensory input the CNS receives, which may indicate deviation
from the planned movement (Guenther, 2006; Guenther & Vladusich, 2012;
Tourville & Guenther, 2011). In the domain of speech and language
rehabilitation, the task of singing could be theorized as able to induce
neuromotor retraining via the formation of new motor command
relationships within the feedback mechanism that stimulate learning within
the feedforward mechanism, thereby causing the CNS to re-calibrate or
reset its motor program for communication. In addition, since singing
naturally lends itself to heightening various elements of speech production
as an augmentative form of vocal loading, respiratory shaping, resonant
voicing, exaggerated articulation, and prosodic phrasing, singing could also
be theorized as able to modulate motor neuron activity, carrying with it
implications for rehabilitation (Cohen, 1994; Natke, Donath, & Kalveram,
2003; Tonkinson, 1994). What follows is a review of current singing-related voice therapy strategies prescribed for specific motor speech disordered populations and their outcomes. The main aim of this review is to elucidate the practice of singing as a therapeutic science with reproducible effects.
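To make the feedforward/feedback distinction concrete, the following schematic sketch (not the DIVA model's actual equations; all names, gains, and the simulated "vocal tract" are illustrative assumptions) shows a control loop in which feedback error both corrects the current movement and incrementally updates the stored feedforward command, the kind of recalibration through which repeated singing practice is theorized above to reset the motor program for communication.

# Schematic feedforward/feedback control loop (illustrative only; not
# the DIVA model's actual equations).
def practice_trial(target, feedforward, plant, gain=0.8, learning_rate=0.3):
    command = feedforward.get(target, 0.0)    # anticipatory (feedforward) command
    outcome = plant(command)                  # sensory consequence of the movement
    error = target - outcome                  # feedback mismatch
    corrected = command + gain * error        # online feedback correction
    # Learning: fold part of the correction back into the stored command,
    # so future productions need less online repair.
    feedforward[target] = command + learning_rate * (corrected - command)
    return feedforward

# Example: a "vocal tract" that undershoots by 20%; repeated practice of
# one target gradually recalibrates the stored feedforward command.
ff = {}
for _ in range(30):
    ff = practice_trial(1.0, ff, plant=lambda u: 0.8 * u)
print(round(ff[1.0], 3))  # ~1.248, compensating the 20% undershoot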
There is a substantial body of literature reporting positive outcomes for singing tasks used as a means of voice therapy in dysarthric populations. In traumatic brain injury and stroke, singing-induced gains
have been documented in areas related to maximum phonation time,
intensity, speech rate, prosody, vocal range, and overall intelligibility
(Baker, Wigram, & Gold, 2005; Cohen, 1988, 1992; Kim & Jo, 2013;
Tamplin, 2008). Therapeutic outcomes using NMT speech techniques such
as VIT, TS, and OMREX in Parkinson’s disease have also revealed
significant improvements in the areas of hypomimia, vocal intensity,
fundamental frequency, maximum phonation time, prosody, articulation,
and better lung function test scores, overall (Caligiuri, 1989; Canavan,
Evans, Foy, Langford, & Proctor, 2012; DeStewart, Willemse, Maassen, &
Horstink, 2003; Di Benedetto et al., 2009; Elefant, Baker, Lotan, Lagesen,
& Skeie, 2012; Haneishi, 2001; Stegemöller, Radig, Hibbing, Wingate, &
Sapienza, 2017; Tanner, 2012; Tanner, Rammage, & Liu, 2016; Tautscher-
Basnett, Tomantschger, Keglevic, & Freimuller, 2006). Accordingly, in
earlier work by Bellaire, Yorkston, and Beukelman (1986), the modification
of the breathing pattern of mildly dysarthric speakers resulted in the
amelioration of prosodic repertoire. Similarly, in 1993, Cohen and Masse
applied a singing intervention to persons with neurogenic communication
disorders, symptomatic of multiple sclerosis, cerebral palsy, Parkinson’s
disease, and cerebrovascular accident. Findings revealed improvements in
intelligibility ratings, vocal intensity, and vocal range.
The significance of singing in human development has firm roots in our evolutionary inheritance: Charles Darwin theorized that early human ancestors originally communicated using a catalog of song-like expressions lacking words or meaning (Darwin, 1872/1988).
Furthermore, recent research describes a phenomenon known as infant-directed speech, or musical speech, as catalytic for preverbal communication and an important stage of language learning in the earliest stages of life (Fernald, 1989; Trainor, Clark, Huntley, & Adams, 1997). As such, singing’s current emergence (or re-emergence) as speech’s keen and remunerative partner implies that the pairing has been evident all along, and that prescribing singing training for motor speech disordered populations reflects a more refined understanding of where we came from.

Apraxia of Speech

Apraxia affects the sensorimotor programming, planning, or preparation
(e.g., velum elevation, tongue placement) needed for directing movements
that result in volitional speech production (Yorkston, Beukelman, Strand, &
Hakel, 2010, p. 7). Messages from the brain to the mouth become disrupted,
resulting in an inability to move the articulators to execute speech sounds
correctly. Apraxia can range from mild to a complete loss of the ability to
produce speech. The disorder exists in two forms: congenital (childhood
apraxia of speech—CAS) and acquired (apraxia of speech—AOS).
Furthermore, “although AOS can involve all speech subsystems, it is predominantly a disorder of articulation and prosody” (Ballard et al., 2015, p. 316). While this area of research is still in its infancy, the most significant body of clinical evidence for treating apraxia utilizes rhythm as the main cueing mechanism. In NMT, representative techniques include MIT and RSC, with some additional prescription of OMREX and TS.
In RSC, “speech rate control via auditory rhythm is used to improve temporal characteristics such as fluency, articulatory rate, pause time, and intelligibility of speaking” (Mainka & Mallien, 2014, p. 150). Since 1988, several single-case studies in the literature have pointed to the positive effects of metronomic pacing treatment for the rehabilitation of
Martinez, 2000). More recent research has developed. Brendal and Ziegler
(2008) compared a metrical pacing treatment with an articulatory treatment
on a sample of ten patients with post-stroke induced mild to severe aphasia.
Post-therapy, the metrical stimulation treatment group exhibited
improvements in articulatory and suprasegmental accuracy, while the
articulatory treatment group displayed improvement in articulation alone. In
another study, using a metronomic rate control and hand tapping task within
a single-subject baseline design on a patient with mild AOS, Mauszycki and
Wambaugh’s (2008) results indicated improvement in sound production
accuracy and total utterance duration during repetition tasks. Aitken
Dunham (2010) designed a single-subject study comparing the efficacy of a
speech therapy treatment program with a speech and music therapy-
combined treatment program. The music therapy protocol was established
through the work of Kim and Tomaino (2008), and included elements of
RSC, MIT, OMREX, and TS. Results revealed that while both treatment
groups showed improvement post-therapy, the greatest treatment effect was
found following the combined therapy protocol. Finally, using a single-subject design comparing repeated practice alone with repeated practice in tandem with a rate/rhythm control strategy in ten speakers with chronic AOS, Wambaugh and colleagues (2012) found improvement in articulation with the repeated practice treatment and mild additional gains when rate/rhythm control tasks were added.
Several studies have looked at the use of MIT with apraxia of speech
populations; however, due to the small sample sizes and lack of consistent
protocols, it is difficult to draw conclusions without further investigation
and caution is recommended when interpreting the results. In 1975, Keith and Aronson reported the case of a 48-year-old woman with aphasia and apraxia of speech, three years post-stroke. After several weeks of speech therapy, progress was muted, so a singing task was prescribed, whereupon the patient proved able to sing and articulate words in song. While transfer to speech was not without mild aphasic and apraxic error, the presence of speech, and the effectiveness thereof, warranted acknowledgment and prompted further clinical investigation. Krauss and
Galloway (1982) ran a study comparing a traditional speech therapy
protocol to one that included MIT as a warm-up across a single-subject case
study on two boys with CAS and additional developmental delays.
Outcomes for both subjects indicated improvements in phrase length, noun
retrieval, and verbal imitation. Furthermore, Helfrich-Miller (1984)
provided a report on three case studies involving children with apraxia of
speech who were prescribed MIT over a period of one to four years. Gains
were reported in phoneme acquisition, speech sequencing, and overall intelligibility ratings. However, because the patients were also receiving speech therapy during this time, the conclusion that MIT caused these gains should be treated with caution. In 2011, Martikainen and Korpilahti examined the effectiveness of combining MIT with the Touch-Cue Method (TCM) in the single case of a 4-year-old girl with CAS. Findings revealed a decrease in
speech sound errors along with an increase in sequencing abilities post MIT.
This progress continued when TCM was added, with the child producing whole words. The outcomes of all of the aforementioned work indicate the need for further study in this domain to improve the communicative efficiency of people with AOS and CAS. Both MIT and RSC can be valuable compensatory facilitators of speech and language encoding and, in turn, of speech and language production.

Aphasia

Aphasia is a communication disorder which can affect a person’s use of expressive and receptive language. Aphasia is typically caused by acquired brain injury, but can also present in degenerative brain and nervous system disorders such as frontotemporal dementia. Applications of
MIT and MUSTIM as used in NMT have primarily focused on non-fluent
aphasia and primary progressive aphasia.
Broca’s aphasia, also referred to as expressive or non-fluent aphasia,
results from damage to the language network in the left frontal lobe of the
brain, Brodmann’s areas 44 and 45. Broca’s aphasia is characterized by either complete loss of the ability to produce meaningful speech or severely reduced speech output limited to short utterances. Vocabulary access is halting and laborious, with an impaired ability to organize and control linguistic content, which often consists of non-propositional speech. Speech can be perseverative, with disordered syntax, grammar, and structure. The person
with expressive aphasia may understand speech relatively well and be able
to read, but be limited in writing (The American Speech-Language-Hearing
Association, 2017). For over a hundred years, it has been noted that people
with aphasia frequently have the ability to sing familiar, overlearned songs,
which are accessed through the unimpaired right hemisphere, a region heavily involved in the emotional color and expression as well as the melodic and rhythmic aspects of both singing and speech. It was not until the 1970s that researchers formalized this observation into a standardized treatment process for people with Broca’s aphasia (Sparks, Helm, & Albert, 1974; Sparks & Holland, 1976). Since then, successful applications of rhythmic and melodic intonation for aphasia have been seen across languages and cultures (Bonakdarpour, Eftekharzadeh, & Ashayeri, 2003; Cortese, Riganello, Arcuri, Pignataro, & Buglione, 2015; Haro-Martínez et al., 2017; Popovici, 1995).
Melodic Intonation Therapy (MIT) is a technique that draws on the typically unaffected ability of a person with aphasia to sing familiar songs, using the melodic and rhythmic elements of speech to teach them to sing, and eventually to speak, functional phrases.
Evidence has shown that by using the stepwise process of MIT, the brain is
able to bypass damaged left-hemisphere networks and engage right-
hemisphere language resources via the rerouting of speech pathways,
thereby aiding in the restoration of propositional speech (Breier, Randle,
Maher, & Papanicolaou, 2010; Schlaug, Marchina, & Norton, 2009). While
many studies have focused on the use of MIT with chronic aphasia, Van der
Meulen and colleagues (Van der Meulen, van de Sandt-Koenderman,
Heijenbrok-Kal, Visch-Brink, & Ribbers, 2012) saw significant improvements in verbal output in patients with subacute severe non-fluent aphasia between two and three months post-stroke.
While the ultimate goal of MIT is to train propositional language, so that people can communicate and express non-formulaic verbal output independently in their everyday lives, it is also used to teach a specific set of formulaic or overlearned phrases relevant to the patient’s life. Because the long-term goal is to improve propositional speech, speech and language assessments sensitive to both propositional and non-propositional speech should be used (Lim et al., 2013; Zumbansen, Peretz, & Hébert, 2014). In the long term, there is some evidence suggesting reactivation of left-hemisphere speech circuitry (Belin et al., 1996; Naeser & Helm-Estabrooks, 1985; Schlaug, Marchina, & Norton, 2008; Schlaug et al., 2009).
Although the body of research validating the use of MIT with aphasia is
very large, there is still more to be understood. Many studies have
emphasized melody as the key element driving the responses seen in
patients when using MIT (Akanuma, Meguro, Satoh, Tashiro, & Itoh, 2016;
Seger et al., 2013). While melody clearly plays an important role, more recent research has focused on rhythmic priming and pacing, which have been shown to engage auditory, prefrontal, and parietal regions in the right hemisphere (Boucher, Garcia, Fleurant, & Paradis, 2001; Stephan et al.,
2002).
Fluency

Fluency refers to the aspects of speech output related to continuity, smoothness, rate, and effort. Most people have experienced brief speech disfluencies at some point in their lives. For instance, normal
disfluency can happen when children are first learning to combine words
and speak in short sentences, or when they are learning to read. However,
when disfluencies become numerous to a point where they impede the
ability to communicate, they may meet the diagnostic criteria for a fluency
disorder such as stuttering or cluttering.
Stuttering most commonly presents in childhood, but adult onset can also result from a range of neurologic and neuropsychological conditions. While the exact cause of stuttering is not completely understood, many theories suggest a range of genetic, neurological, psychological, and social-linguistic links to the disorder. The Diagnostic and Statistical
Manual of Mental Disorders, 5th edition (DSM-5; American Psychiatric
Association, 2013), identifies primary symptoms for childhood-onset
fluency disorder as repetition of sounds, syllables, or monosyllabic whole
words; prolongation of single sounds; blocked silence or voicing during
speech; and excessive physical tension in word production. Secondary
behaviors may also include hesitation, interjection of sounds, loss of eye
contact, and extraneous motor movements such as eye blinking.
Jokel and colleagues (Jokel, De Nil, & Sharpe, 2007) systematically
assessed the speech characteristics of adults with neurogenic stuttering due
to acquired brain injury. Their research identified six principal characteristics of stuttering, which are often referred to in the neurogenic stuttering literature:

1. Disfluencies occur equally on grammatical and substantive words.
2. Repetition, prolongation, and blocks occur in all positions of words.
3. There is a consistency of stuttering behavior across all speech tasks.
4. The speaker does not appear overly anxious about the stuttering
behavior.
5. Secondary features are rarely observed.
6. An adaptation effect is not observed.
Cluttering, which also presents as a disruption in fluency and rate, is characterized by rapid bursts of speech at an irregular rate. Typical disfluencies may include excessive whole word repetition, unfinished words, omitted syllables, and interjections. People who clutter often have limited awareness of their irregular speech and may also exhibit sloppy handwriting, poor attention, difficulty organizing thoughts, auditory processing disorders, and learning disabilities.
Several studies have suggested that stuttering is a disorder of motor
timing, and may be related to the basal ganglia (Lebrun, 1998; Rosenberger,
1980; Victor & Ropper 2001; Wu et al., 1995). The SMA and basal ganglia
play a significant role in providing internal timing cues to facilitate the
initiation of even well-learned speech (Cunnington, Bradshaw, & Iansek,
1996). Rhythm has been used as an effective external timing cue to
compensate for deficient internal cues from the basal ganglia and the SMA.
This may explain why speaking to a metronome is one of the most effective
ways to instantly create fluency for persons who stutter (Alm, 2004). This
effect has been reported to be independent of speech rate, with significant
reduction in stuttering seen even at very fast tempos (Van Riper, 1982).
Research has also looked at singing to increase fluency in vocal output.
Alm (2004) suggested that melody cannot exist without rhythm; therefore,
when singing, the brain has an internal representation of the intended timing
for the initiation of each syllable, similar to how the metronome can provide
external timing cues. A study by Healey and colleagues (Healey, Mallard,
& Adams, 1976) compared singing familiar and unfamiliar lyrics to a
familiar melody. While both conditions were associated with significant
reductions in stuttering, the greatest reductions were seen when singing
familiar lyrics, possibly indicating that singing alone does not account for
all of the decreases in stuttering that occur during singing. In 1979, Colcord
and Adams compared reading versus singing altered lyrics to a familiar
melody in order to increase fluency, voicing durations, and vocal sound
pressure levels (SPL) in moderate to severe stutterers. Results revealed both
a decrease in disfluency and an increase in voicing duration when singing to
a familiar melody compared with reading. In addition, Glover and colleagues (Glover,
Kalinowski, Rastatter, & Stuart, 1996) compared reading vs. self-generated
singing at normal and fast tempos. Singing at both fast and normal rates
was found to generate substantial reduction in stuttering over reading.
Although the quality of singing varied significantly, this study suggested
that stutterers also have the ability to internally create fluent speech output
by imposing self-generated melodic structures when asked to sing.

Sensory Disorders

Although several attempts have been made to use music as a habilitative means for hearing restoration, there is only scant NMT research in the
hearing loss and cochlear implant (CI) domain (Gfeller, 2016; Limb &
Rubinstein, 2012). This is primarily due to the inherent difficulty of music
perception in CI users—their ability to process spectrally complex musical
sound is limited (Limb & Roy, 2014). Such impoverished music signals can
be disturbing to some CI users, while others find them pleasant (Abdi,
Khalessi, Khorsandi, & Gholami, 2001; Gfeller, Driscoll, Smith, &
Scheperle, 2012).
Only a few studies have examined the effect of music training on hearing improvement in CI users with rigorous experimental designs and formal tests. This is due to logistical and practical challenges, including heterogeneity in age, onset of hearing loss, and duration of CI use among the participants. For example, past studies often relied upon
teachers’ or parents’ evaluations as indicative of improved music skills and
aptitude after music listening with no statistical analyses (Abdi et al., 2001;
Rocca, 2012). Nevertheless, emerging evidence indicates that music training can improve listening skills in CI users. Chen et al. (2010) reported that pitch perception was positively correlated with the duration of music training in twenty-seven CI children. Fu and colleagues (Fu,
Galvin, Wang, & Wu, 2015) demonstrated that melodic contour
identification ability was significantly improved after four weeks of a
computerized music training program in fourteen congenitally deaf CI
children. Importantly, music training not only improves listening skills within the music domain but also transfers to speech and cognitive domains. For example, Rochette and colleagues (Rochette, Moussard, & Bigand, 2014)
showed that fourteen profoundly deaf children who received 1.5–4 years of
music lessons outperformed the control CI group (i.e., no musical training)
in phonetic discrimination tests, auditory scene analysis, and working
memory tests.
At present, there is a dearth of CI studies examining the effect of music
training at the neural level. In general, it is difficult to study CI users with fMRI because of the ferromagnetic characteristics of the CI device. Instead, EEG has been used to study the neural activity associated with listening skills in CI users. For example, Petersen et al. (2015) recorded
the brain activity profiles of eleven adolescent CI users using EEG before
and after a two-week music training period. They found a significant
change in mismatch negativity (MMN) in response to deviations of timbre,
intensity, and rhythm. In addition to EEG, functional near-infrared
spectroscopy (fNIRS) is another viable option to study CI populations
(Saliba, Bortfeld, Levitin, & Oghalai, 2016). In fact, fNIRS has several
advantages over fMRI including quietness, portability, and a naturalistic
and participant-friendly environment. Although the application of fNIRS in the CI domain is still in its infancy, it allows researchers to study functional brain changes following music intervention training in CI users, and therefore remains attractive.

Voice Disorders

Voice disorders occur when there is a deficiency in vocal functioning affecting speech production. Symptoms may include: hoarse or breathy
vocal quality; loss of voice; pitch breaks, inability to maintain typical pitch,
or reduced pitch range; lack of vocal carrying power; reduced loudness
range; a need to use greater vocal effort; running out of breath while
talking; an unsteady voice; tension in the neck and shoulders; throat or neck pain, throat fatigue or tightness, or pain upon swallowing; an increased need to cough or clear the throat; and any form of discomfort in the chest, ears, or back of
the neck (Kostyk & Putnam Rochet, 1998). Voice disorders can be
manifested in a multitude of different ways and have multi-factorial
etiologies. Stemple and colleagues (Stemple, Glaze, & Klaben, 2009) classify disorders into four main causal areas: medically-related disorders; primary disorders (structural and neurogenic); personality-related disorders, sometimes referred to as psychogenic; and vocal misuse disorders, alternatively labeled functional.
Medically-related disorders refer to “medical or surgical interventions
that directly cause voice disorders and medical or health conditions and
treatments that may indirectly contribute to the development of voice
disorders” (e.g., trauma, chronic illness, chronic disorder) (Stemple et al.,
2009, p. 60). A sampling of singing-task voice therapy research outcomes is highlighted hereafter. Onofre and colleagues (Onofre et al., 2013) reported on a singing training program prescribed to total laryngectomees wearing tracheoesophageal voice prostheses. The program included both respiratory muscle strengthening and scalar vocalization tasks. Outcomes revealed improvement in the grade of dysphonia, roughness, and breathiness, as well as minor improvements in vocal extension during tracheoesophageal phonation. Using a randomized
controlled trial, Hilton et al. (2013) prescribed singing exercises to a
population of ninety-three patients in an effort to reduce symptoms of
snoring and sleep apnea, and found that by improving the tone and strength
of the pharyngeal muscles, the experimental group displayed a significantly
reduced frequency of snoring. Lortie and colleagues (Lortie, Rivard, Thibeault, & Tremblay, 2017) described the augmentative effects of singing on the aging voice in a population of seventy-two people aged 20 to 93 years. Findings indicated that frequent singing moderates age-related effects on most acoustic parameters of the voice, especially those related to pitch accuracy and amplitude levels. This is in keeping with similar findings by Sauder, Roy, Tanner, Houtz, and Smith (2010) and Ziegler, Verdolini Abbott, Johns, Klein, and Hapner (2014) related to presbylaryngis, adding to the growing body of evidence affirming the benefits of singing practice for the aging voice.
Primary disorders include “embryologic, physiologic, neurologic and
anatomic disorders that have vocal changes as secondary symptoms of the
primary disorder” (e.g., cleft palate, velopharyngeal insufficiency, hearing
impairment, cerebral palsy) (Stemple et al., 2009, p. 65). Research here has been concentrated in a few select areas, one being spasmodic
dysphonia (SD). SD causes symptoms of strained or effortful voice qualities
due to adductor or abductor laryngospasm. While primary treatment typically involves botox injection or resection of the recurrent laryngeal nerve to paralyze one of the folds, voice therapy targets include work on soft and sustained phonatory onsets and on pitch and loudness modifications (The American Speech-Language-Hearing Association, 2017). Recent literature suggests that SD is a form of focal dystonia, with dysfunction often reflected during speech tasks alone, leaving non-linguistic vegetative functions such as coughing, laughing, and singing unaffected by the disorder. Reduction in spasticity thus results from deviating from the normal mode of phonation, and singing has therefore been promoted as an effective strategy to explore in therapy (Bloch, Hirano, & Gould, 1985). Therapeutic outcomes in populations with unilateral vocal fold
paralysis have also been positive, with reported improvement related to
reduced hoarseness and improved perception of voice impairment (Busto-
Crespo et al., 2016). That said, these results should be interpreted with caution, as idiopathic vocal fold immobility has a history of spontaneous resolution. Finally, since reports have shown that
professionally trained classical singers carefully adjust their velopharyngeal
port to fine-tune their voice timbre (Austin, 1997; Birch et al., 2002; Fowler
& Morris, 2007; Sundberg et al., 2007; Tanner, Roy, Merrill, & Power,
2005; Yanagisawa, Estill, Mambrino, & Talkin, 1991), research
investigating the effect of altered auditory feedback on the control of oral–
nasal balance in song was completed by Santoni, de Boer, Thaut, and
Bressmann (2018). Results indicated that all participants showed lower
nasalance scores in response to both increased and decreased nasal signal
level feedback, with no differences reported between trained singers and
untrained non-singers. In a similar study by Jennings and Kuehn (2008),
looking at the singing of sustained vowels, without the altered feedback
condition, trained singers were shown to display lower nasalance scores
than untrained singers. While the results of the two studies may not be directly comparable, they underscore the need for more research in this area before singing-infused therapeutic protocols can be implemented experimentally in populations with hypernasal resonance disorders.
Personality-related or psychogenic voice disorders come about due to
psychological factors reflected via a disturbance of voice (e.g.,
puberphonia, conversion aphonia). In a study treating patients with puberphonia, Desai and Mishra (2012) infused singing modalities (humming and glottal phonatory onsets) into their voice therapy protocol and found that all patients (N = 30) were able to achieve an appropriate pitch range post-therapy. More research is needed in this area.
Vocal misuse disorders (e.g., muscle tension dysphonia, voice fatigue, ventricular phonation, phonotrauma) typically result from vocal abuse, often caused by poor muscle functioning or poor voicing behaviors. A substantial literature supports several standard voice therapy treatment protocols addressing vocal misuse with tasks somewhat analogous to those involved in singing—specifically the use of nasal consonants and humming, sustained phonation, pitch glides, and rhythmic vocal play: the
Smith Accent Method (Smith & Thyme, 1976), Vocal Function Exercises
(Stemple, Lee, D’Amico, & Pickup, 1994), Lessac-Madsen Resonant Voice
Therapy (Verdolini-Marston, Burke, Lessac, Glaze, & Caldwell, 1995),
Semi-Occluded Vocal Tract (Titze, 2006), and Phonatory Resistance
Training Exercises (Ziegler & Hapner, 2013). Perceptual outcomes of
Resonant Voice Therapy, for example, have included improvements in
speech-level fundamental frequency and range of speaking intensity, as well
as reductions in vocal roughness, strain, monotone, hard glottal attack,
vocal fry, and overall vocal fatigue (Chen, Hsiao, Hsiao, Chung, & Chiang,
2007; Roy et al., 2003; Yiu & Ho, 2002; Verdolini-Marston et al., 1995).
Alleviation of supraglottic activity (false vocal fold and anterior-posterior
compression) has also been reported (Ogawa et al., 2013).
Active singing as a treatment option in voice therapy is in fact not a new concept. Boone, McFarlane, Von Berg, and Zraick (2010)
explain a technique called Redirected Phonation, which is prescribed to
patients having difficulty “finding” their voice due to functional dysphonia.
The procedural mechanism of the technique is that the speech language
pathologist “searches with the patient to find some kind of vegetative
phonation (coughing, gargling, laughing, throat clearing) or some kind of
intentional voicing (‘playing’ the comb or kazoo, humming, singing, trilling
[Colton & Casper, 1996], or saying ‘um-hmm’ [Cooper, 1973])” (Boone et
al., 2010, p. 230). Relative to singing, the protocol focuses on singing practice sentences (similar to chant-talking) with the goal of phasing out the singing in favor of the newly redirected voicing for speech, shaped by the improved respiration, phonation, and overall voice quality present in the singing condition. Outcomes have included increased ease and clarity of voice production, with less perturbation.
A vast amount of clinical evidence also points to the benefit of using
singing tasks as a means of respiratory therapy. Within the domain of
chronic obstructive pulmonary disease (COPD), clusters of research have
shown that singing voice therapy tasks (VIT, TS, OMREX) have resulted in
improvements in single breath counting, breath support modes (clavicular
vs. diaphragmatic), maximum intensity ratings, lung function tests
(maximum expiratory pressure, forced expiratory volume, and forced vital
capacity), as well as self-reported improvement in dyspnea ratings (Canga,
Azoulay, Raskin, & Loewy, 2015; Engen, 2003; Jamaly et al., 2017; Lord et
al., 2012; Skingley et al., 2014). In a study completed by Eley and Gorman
(2010), thirty-three asthmatic participants were treated with either OMREX
(males playing a didgeridoo) or VIT and TS (females taking singing
lessons). Results for the males indicated significant improvements in lung
function tests (peak expiratory flow, forced expiratory volume, and forced
vital capacity). Results for the females revealed promising but nonsignificant gains in peak expiratory flow. There is also some preliminary evidence of
the clinical benefit of a singing program in the cystic fibrosis population
with one study pointing to amelioration reflected in lung function scores
(maximum inspiratory pressure, maximum expiratory pressure) of eight
hospitalized children post-treatment (Irons, Kenny, McElrea, & Chang,
2012). Finally, Tamplin et al. (2013) conducted a randomized controlled trial comparing the effectiveness of singing lessons (OMREX and TS) with music appreciation and relaxation classes for twenty-four participants with quadriplegia. Results indicated significant improvements in speech intensity as well as maximum phonation time for the singing group alone.
The breadth of this research reflects an exciting trend towards the use of singing tasks as viable, contemporary tools in voice therapy practice.

Dyslexia

Rhythm-based intervention has been applied to developmental dyslexia, a prevalent reading disorder that occurs despite normal cognitive abilities and IQ. For example, Thomson and colleagues (Thomson, Leong, & Goswami, 2012) devised a novel rhythm training program for six weeks of
Goswami,2012) devised a novel rhythm training program for six weeks of
intervention with eleven dyslexic children. The rhythm intervention
program consisted of three different training regimens aimed at improving
auditory temporal processing in a fun and engaging manner. They compared
the efficacy of the rhythm intervention to a conventional phonetic training
program that eleven other dyslexic children participated in. Both
intervention programs yielded a comparable amount of improvement in
phonological awareness compared to a third control group of eleven
dyslexic children. Similarly, Bhide and colleagues (Bhide, Power, &
Goswami, 2013) compared a rhythm intervention program consisting of
nine different rhythm training sections (e.g., same/different rhythm
discrimination, rise time discrimination, etc.) to a conventional intervention
program that required children to match sound to spelling. Their findings
indicated that the rhythm-based intervention was as effective as the
conventional intervention method. Bonacina and colleagues (Bonacina,
Cancer, Lanzi, Lorusso, & Antonietti, 2015) conducted an intervention
study with fourteen dyslexic children who underwent a computerized
rhythmic-reading training (RRT) every other week (a total of nine sessions).
Compared to a control group (no training), children who received the RRT improved in reading ability, as evidenced by increased reading speed and accuracy. Flaugnacco et al. (2015) performed a randomized
controlled trial wherein dyslexic children participated in either a music training
program or a painting training program (in tandem with conventional daily
treatment) for a period of seven months. The music training program was
based on Kodaly and Orff pedagogy with significant focus given to the
rhythmic and temporal aspects of the music. They found that the music group outperformed the control (i.e., painting) group in phonological awareness and reading skills. More recently, Habib et al. (2016) conducted
a music-based intervention program with dyslexic and normal school-age
children for three days (six hours per day). After the intervention, they
found a significant improvement in phonological and syllabic encoding
abilities in the dyslexic children. Most notably, performance after the
intervention was comparable to that of normal children. In summary, both
short- and long-term music-based intervention programs appear to be
effective ways of treating dyslexia.

Conclusion

There are many parallels in both the structure and production of speech, language, and music. All can be considered sensorimotor behaviors that require a high level of control and a dynamic interplay between several brain processes in order to select, organize, articulate, and implement output in a time-sensitive manner. Because of the inherent timing, rhythm, pattern, and
melodic structures in both music and speech, music has the potential to
simulate normal speech patterns, and therefore act as a training and
retraining tool for people with speech and language disorders.

References
Abdi, S., Khalessi, M. H., Khorsandi, M., & Gholami, B. (2001). Introducing music as a means of
habilitation for children with cochlear implants. International Journal of Pediatric
Otorhinolaryngology 59(2), 105–113.
Aitken Dunham, D. J. (2010). Efficacy of using music therapy combined with traditional aphasia and
apraxia of speech treatments (Master’s dissertation). Western Carolina University, North Carolina.
Akanuma, K., Meguro, K., Satoh, M., Tashiro, M., & Itoh, M. (2016). Singing can improve speech
function in aphasics associated with intact right basal ganglia and preserve right temporal glucose
metabolism: Implications for singing therapy indication. International Journal of Neuroscience
126(1), 39–45.
Alm, P. A. (2004). Stuttering and the basal ganglia circuits: A critical review of possible relations.
Journal of Communication Disorders 37(4), 325–369.
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th
ed.). Washington, DC: APA.
Austin, S. F. (1997). Movement of the velum during speech and singing in classically trained singers.
Journal of Voice—Official Journal of the Voice Foundation 11(2), 212–221.
Baker, F., Wigram, T., & Gold, C. (2005). The effects of a song-singing programme on the affective speaking intonation of people with traumatic brain injury. Brain Injury 19(7), 519–528.
Ballard, K. J., Wambaugh, J. L., Duffy, J. R., Layfield, C., Maas, E., Mauszycki, S., & McNeil, M. R.
(2015). Treatment for acquired apraxia of speech: A systematic review of intervention research
between 2004 and 2012. American Journal of Speech Language Pathology 24(2), 316–337.
Bedoin, N., Brisseau, L., Molinier, P., Roch, D., & Tillmann, B. (2016). Temporally regular musical
primes facilitate subsequent syntax processing in children with specific language impairment.
Frontiers in Neuroscience 10. Retrieved from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913515/
Belin, P., Van Eeckhout, P., Zilbovicius, M., Remy, P., François, C., Guillaume, S., Chain, F., …
Samson, Y. (1996). Recovery from nonfluent aphasia after melodic intonation therapy: A PET
study. Neurology 47(6), 1504–1511.
Bellaire, K., Yorkston, K. M., & Beukelman, D. R. (1986). Modification of breath patterning to
increase naturalness of a mildly dysarthric speaker. Journal of Communication Disorders 19(4),
271–280.
Bhide, A., Power, A., & Goswami, U. (2013). A rhythmic musical intervention for poor readers: A
comparison of efficacy with a letter-based intervention. Mind, Brain, and Education 7(2), 113–
123.
Birch, P., Gümoes, B., Stavad, H., Prytz, S., Björkner, E., & Sundberg, J. (2002). Velum behavior in
professional classic operatic singing. Journal of Voice 16(1), 61–71.
Bloch, C. S., Hirano, M., & Gould, W. J. (1985). Symptom improvement of spastic dysphonia in
response to phonatory tasks. Annals of Otology, Rhinology, & Laryngology 94, 51–54.
Bonacina, S., Cancer, A., Lanzi, P. L., Lorusso, M. L., & Antonietti, A. (2015). Improving reading
skills in students with dyslexia: The efficacy of a sublexical training with rhythmic background.
Frontiers in Psychology 6(1510), 1–8.
Bonakdarpour, B., Eftekharzadeh, A., & Ashayeri, H. (2003). Melodic intonation therapy in Persian
aphasic patients. Aphasiology 17(1), 75–95.
Boone, D. R. (1983). The voice and voice therapy (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Boone, D. R., McFarlane, S. C., Von Berg, A. L., & Zraick, R. I. (2010). The voice and voice therapy
(8th ed.). Boston, MA: Pearson.
Boucher, V., Garcia, L. J., Fleurant, J., & Paradis, J. (2001). Variable efficacy of rhythm and tone in
melody based interventions: Implications for the assumption of a right-hemisphere facilitation in
non-fluent aphasia. Aphasiology 15(2), 131–149.
Breier, J. I., Randle, S., Maher, L. M., & Papanicolaou, A. C. (2010). Changes in maps of language
activity activation following melodic intonation therapy using magnetoencephalography: Two case
studies. Journal of Clinical and Experimental Neuropsychology 32(3), 309–314
Brendel, B., & Ziegler, W. (2008). Effectiveness of metrical pacing in the treatment of apraxia of
speech. Aphasiology 22(1), 1–26.
Busto-Crespo, O., Uzcanga-Lacabe, M., Abad-Marco, A., Berasategui, I., García, L., Maraví, E., …
Fernández-González, S. (2016). Longitudinal voice outcomes after voice therapy in unilateral
vocal fold paralysis. Journal of Voice 30(6), 767.e9–767.e15.
Caligiuri, M. P. (1989). The influence of speaking rate on articulatory hypokinesia in Parkinsonian
dysarthria. Brain & Language 36(3), 493–502.
Canavan, M., Evans, C., Foy, C., Langford, R., & Proctor, R. (2012). Can group singing provide
effective speech therapy for people with Parkinson’s disease? Arts and Health 4(1), 83–95.
Canga, B., Azoulay, R., Raskin, J., & Loewy, J. (2015). AIR: Advances in respiration—music
therapy in the treatment of chronic pulmonary disease. Respiratory Medicine 109(12), 1532–1539.
Casper, J. (2000). Confidential voice. In J. C. Stemple (Ed.), Voice therapy: Clinical studies (2nd ed.,
pp. 128–139). San Diego, CA: Singular Publishing Group.
Chen, J. K.-C., Chuang, A. Y. C., McMahon, C., Hsieh, J. C., Tung, T. H., & Li, L. P. (2010). Music
training improves pitch perception in prelingually deafened children with cochlear implants.
Pediatrics 125(4), e793–e800.
Chen, S. H., Hsiao, T. Y., Hsiao, L. C., Chung, Y. M., & Chiang, S. C. (2007). Outcome of resonant
voice therapy for female teachers with voice disorders: Perceptual, physiological, acoustic,
aerodynamic, and functional measurements. Journal of Voice 21(4), 415–425.
Cohen, N. S. (1988). The use of superimposed rhythm to decrease the rate of speech in a brain-
damaged adolescent. Journal of Music Therapy 25(2), 85–93.
Cohen, N. S. (1992). The effect of singing instruction on the speech production of neurologically
impaired persons. Journal of Music Therapy 29(2), 87–102.
Cohen, N. S. (1994). Speech and song: Implications for therapy. Music Therapy Perspectives 12(1),
8–14.
Cohen, N. S., & Masse, R. (1993). The application of singing and rhythmic instruction as a
therapeutic intervention for persons with neurogenic communication disorders. Journal of Music
Therapy 30(2), 81–99.
Colcord, R. D., & Adams, M. R. (1979). Voicing duration and vocal SPL changes associated with
stuttering reduction during singing. Journal of Speech and Hearing Research 22(3), 468–479.
Colton, R. H., & Casper, J. K. (1996). Understanding voice problems: A physiological perspective
for diagnosis and treatment (3rd ed.). Baltimore, MD: Lippincott.
Cooper, M. (1973). Modern techniques of vocal rehabilitation. Springfield, IL: Charles C. Thomas.
Cortese, M. D., Riganello, F., Arcuri, F., Pignataro, L. M., & Buglione, I. (2015). Rehabilitation of
aphasia: Application of melodic-rhythmic therapy to Italian language. Frontiers in Human
Neuroscience 9, 1–8. Retrieved from https://doi.org/10.3389/fnhum.2015.00520
Cunnington, R., Bradshaw, J. L., & Iansek, R. (1996). The role of the supplementary motor area in
the control of voluntary movement. Human Movement Science 15(5), 627–647.
Darley, F. L., Aronson, A. E., & Brown, J. R. (1969). Differential diagnostic patterns of dysarthria.
Journal of Speech and Hearing Research 12(2), 246–269.
Darwin, C. (1872/1988). The expression of the emotions in man and animals. Ed. P. Ekman. Oxford:
Oxford University Press.
Desai, V., & Mishra, P. (2012). Voice therapy outcome in puberphonia. Journal of Laryngology and
Voice 2(1), 26–29.
de Swart, B. J., Willemse, S. C., Maassen, B. A. M., & Horstink, M. W. I. M. (2003). Improvement
of voicing in patients with Parkinson’s disease by speech therapy. Neurology 60(3), 498–500.
Di Benedetto, P., Cavazzon, M., Mondolo, F., Rugiu, G., Peratoner, A., & Biasutti, E. (2009). Voice
and choral singing treatment: A new approach for speech and voice disorders in Parkinson’s
disease. European Journal of Physical Rehabilitation Medicine 45(1), 13–19.
Duffy, J. R. (2005). Motor speech disorders: Substrates, differential diagnosis, and management. St.
Louis, MO: Elsevier Mosby.
Dworkin, J. P., Abkarian, G. G., & Johns, D. F. (1988). Apraxia of speech: The effectiveness of a
treatment regimen. Journal of Speech and Hearing Disorders 53(3), 280–294.
Elefant, C., Baker, F. A., Lotan, M., Lagesen, S. K., & Skeie, G. O. (2012). The effect of group music
therapy on mood, speech, and singing in individuals with Parkinson’s disease: A feasibility study.
Journal of Music Therapy 49(3), 278–302.
Eley, R., & Gorman, D. (2010). Didgeridoo playing and singing to support asthma management in
aboriginal Australians. Journal of Rural Health 26(1), 100–104.
Engen, R. L. (2003). The singer’s breath: Implications for treatment of persons with emphysema
(Dissertation). University of Iowa.
Farrugia, N., Benoit, C. E., Schwartze, M., Pell, M., Obrig, H., Dalla Bella, S., & Kotz, S. (2014).
Auditory cueing in Parkinson’s disease: Effects on temporal processing and spontaneous theta
oscillations. Procedia—Social and Behavioral Sciences 126, 104–105. Special Issue for
International Conference on Timing and Time Perception, March 31–April 3, Corfu, Greece.
Fernald, A. (1989). Intonation and communicative intent in mothers’ speech to infants. Is the melody
the message? Child Development 60(6), 1497–1510.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training
increases phonological awareness and reading skills in developmental dyslexia: A randomized
control trial. PLoS ONE 10(9), e0138715.
Fowler, L. P., & Morris, R. J. (2007). Comparison of fundamental frequency nasalance between
trained singers and nonsingers for sung vowels. Annals of Otology, Rhinology, & Laryngology
116(10), 739–746.
Friederici, A. D., Kotz, S. A., Werheid, K., Hein, G., & Von Cramon, D. (2003). Syntactic
comprehension in Parkinson’s disease: Investigating early automatic and late integrational
processes using event-related brain potentials. Neuropsychology 17(1), 133–142.
Fu, Q.-J., Galvin, J. J., Wang, X., & Wu, J. L. (2015). Benefits of music training in Mandarin-
speaking pediatric cochlear implant users. Journal of Speech, Language, and Hearing Research
58(1), 163–169.
Fujii, S., & Wan, C. Y. (2014). The role of rhythm in speech and language rehabilitation: The SEP
hypothesis. Frontiers in Human Neuroscience 8, 1–15. doi:10.3389/fnhum.2014.00777
Gfeller, K. (2016). Music-based training for pediatric CI recipients: A systematic analysis of
published studies. European Annals of Otorhinolaryngology, Head and Neck Diseases, 12th
European Symposium on Pediatric Cochlear Implant (ESPCI 2015) 133(Suppl. 1), S50–S56.
Gfeller, K., Driscoll, V., Smith, R. S., & Scheperle, C. (2012). The music experiences and attitudes of
a first cohort of prelingually-deaf adolescents and young adults CI recipients. Seminars in Hearing
33(4), 346–360.
Glover, H., Kalinowski, J., Rastatter, M., & Stuart, A. (1996). Effect of instruction to sing on
stuttering frequency at normal and fast rates. Perceptual and Motor Skills 83(2), 511–522.
Gordon, R. L., Shivers, C. M., Wieland, E. A., Kotz, S. A., Yoder, P. J., & McAuley, J. D. (2015).
Musical rhythm discrimination explains individual differences in grammar skills in children.
Developmental Science 18(4), 635–644.
Grahn, J. A., & Brett, M. (2009). Impairment of beat-based rhythm discrimination in Parkinson’s
disease. Cortex 45(1), 54–61.
Guenther, F. H. (2006). Cortical interactions underlying the production of speech sounds. Journal of
Communication Disorders 39(5), 350–365.
Guenther, F. H., & Vladusich, T. (2012). A neural theory of speech acquisition and production.
Journal of Neurolinguistics 25(5), 408–422.
Habib, M., Lardy, C., Desiles, T., Commeiras, C., Chobert, J., & Besson, M. (2016). Music and
dyslexia: A new musical training method to improve reading and related disorders. Frontiers in
Psychology 7. Retrieved from
http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00026/abstract
Haneishi, E. (2001). Effects of a music therapy voice protocol on speech intelligibility, vocal acoustic
measures, and mood of individuals with Parkinson’s disease. Journal of Music Therapy 38(4),
273–290.
Haro-Martínez, A. M., García-Concejero, V. E., López-Ramos, A., Maté-Arribas, E., López-Táppero,
J., Lubrini, G., … & Fuentes, B. (2017). Adaptation of melodic intonation therapy to Spanish: A
feasibility pilot study. Aphasiology 31(11), 1333–1343.
Healey, E. C., Mallard, A. R., & Adams, M. R. (1976). Factors contributing to the reduction of
stuttering during singing. Journal of Speech and Hearing Research 19, 475–480.
Helfrich-Miller, K. R. (1984). Melodic intonation therapy with developmentally apraxic children.
Seminars in Speech and Language 5, 119–126.
Hilton, M. P., Savage, J., Hunter, B., McDonald, S., Repanos, C., & Powell, R. (2013). Singing
exercises improve sleepiness and frequency of snoring among snorers: A randomised controlled
trial. International Journal of Otolaryngology and Head & Neck Surgery 2(3), 97–102.
Irons, J. Y., Kenny, D. T., McElrea, M., & Chang, A. B. (2012). Singing therapy for young people
with cystic fibrosis: A randomized controlled pilot study. Music and Medicine 4(3), 136–145.
Jamaly, S., Leidag, M., Schneider, H. W., Domanksi, U., Rasche, K., Schröder, M., & Nilius, G.
(2017). The effect of singing therapy compared to standard physiotherapeutic lung sport in COPD.
Pneumologie 71(S01), S1–S125.
Jennings, J. J., & Kuehn, D. P. (2008). The effects of frequency range, vowel, dynamic loudness
level, and gender on nasalance in amateur and classically trained singers. Journal of Voice 22(1),
75–89.
Jokel, R., De Nil, L. F., & Sharpe, A. K. (2007). Speech disfluencies in adults with neurogenic
stuttering associated with stroke and traumatic brain injury. Journal of Medical Speech-Language
Pathology 15(3), 243–261.
Keith, R., & Aronson, A. (1975). Singing as therapy for apraxia of speech and aphasia: Report of a
case. Brain & Language 2, 483–488.
Kim, M., & Tomaino, C. (2008). Protocol evaluation for effective music therapy for persons with
nonfluent aphasia. Topics in Stroke Rehabilitation 15(6), 555–569.
Kim, S. J., & Jo, U. (2013). Study of accent-based music speech protocol development for improving
voice problems in stroke patients with mixed dysarthria. Neurorehabilitation 32(1), 185–190.
Kostyk, B. E., & Putnam Rochet, A. (1998). Laryngeal airway resistance in teachers with vocal
fatigue: A preliminary study. Journal of Voice 12(3), 287–299.
Kotz, S. A., Frisch, S., Von Cramon, D. Y., & Friederici, A. D. (2003). Syntactic language
processing: ERP lesion data on the role of the basal ganglia. Journal of the International
Neuropsychological Society 9(7), 1053–1060.
Kotz, S. A., & Gunter, T. C. (2015). Can rhythmic auditory cuing remediate language-related deficits
in Parkinson’s disease? Annals of the New York Academy of Sciences 1337, 62–68.
Kotz, S. A., & Schmidt-Kassow, M. (2015). Basal ganglia contribution to rule expectancy and
temporal predictability in speech. Cortex 68, 48–60.
Kotz, S. A., & Schwartze, M. (2010). Cortical speech processing unplugged: A timely subcortico-
cortical framework. Trends in Cognitive Sciences 14(9), 392–399.
Kotz, S. A., & Schwartze, M. (2016). Motor timing and sequencing in speech production: A general-
purpose framework. In G. Hickok & S. L. Small (Eds.), Neurobiology of Language (pp. 717–724).
New York: Academic Press.
Kotz, S. A., Schwartze, M., & Schmidt-Kassow, M. (2009). Non-motor basal ganglia functions: A
review and proposal for a model of sensory predictability in auditory language perception. Cortex
45(8), 982–990.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills.
Nature Reviews Neuroscience 11(8), 599–605.
Krauss, T., & Galloway, H. (1982). Melodic intonation therapy with language delayed apraxic
children. Journal of Music Therapy 19(2), 102–113.
Large, E. W., Herrera, J. A., & Velasco, M. J. (2015). Neural networks for beat perception in musical
rhythm. Frontiers in Systems Neuroscience 9. Retrieved from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658578/
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying
events. Psychological Review 106(1), 119–159.
Lebrun, Y. (1998). Clinical observations and experimental research on the study of stuttering.
Journal of Fluency Disorders 23(2), 119–122.
Lim, K. B., Kim, Y. K., Lee, H. J., Yoo, J., Hwang, J. Y., Kim, J. A., & Kim, S. K. (2013). The
therapeutic effect of neurologic music therapy and speech language therapy in post-stroke aphasic
patients. Annals of Rehabilitation Medicine 37(4), 556–562.
Limb, C. J., & Roy, A. T. (2014). Technological, biological, and acoustical constraints to music
perception in cochlear implant users. Hearing Research 308(Suppl. C), 13–26.
Limb, C. J., & Rubinstein, J. T. (2012). Current research on music perception in cochlear implant
users. Otolaryngologic Clinics of North America, Cochlear Implants: Adult and Pediatric 45(1),
129–140.
Lord, V. M., Hume, V. J., Kelly, J. L., Cave, P., Silver, J., Waldman, M., & Hopkinson, N. S. (2012).
Singing classes for chronic obstructive pulmonary disease: A randomized controlled trial. BMC
Pulmonary Medicine 12(1), 1–7.
Lortie, C. L., Rivard, J., Thibeault, M., & Tremblay, P. (2017). The moderating effect of frequent
singing on voice aging. Journal of Voice 31(1), 112.e1–112.e12.
Mainka, S., & Mallien, G. (2014). Rhythmic speech cueing (RSC). In M. H. Thaut & V. Hoemberg
(Eds.), Handbook of neurologic music therapy (pp. 150–160). Oxford: Oxford University Press.
Martikainen, A., & Korpilahti, P. (2011). Intervention for childhood apraxia of speech: A single-case
study. Child Language Teaching and Therapy 27(1), 9–20.
Mauszycki, S. C., & Wambaugh, J. L. (2008). The effects of rate control treatment on consonant
production accuracy in mild apraxia of speech. Aphasiology 22(7–8), 906–920.
Naeser, M. A., & Helm-Estabrooks, N. A. (1985). CT scan lesion localization and response to
melodic intonation therapy with nonfluent aphasia cases. Cortex 21(2), 203–223.
Natke, U., Donath, T. M., & Kalveram, K. T. (2003). Control of voice fundamental frequency in
speaking versus singing. Journal of the Acoustical Society of America 113(3), 1587–1593.
Ogawa, M., Hosokawa, K., Yoshida, M., Yoshii, T., Shiromoto, O., & Inohara, H. (2013). Immediate
effectiveness of humming on the supraglottic compression in subjects with muscle tension
dysphonia. Folia Phoniatrica et Logopaedica 65(3), 123–128.
Onofre, F., Ricz, H. M. A., Takeshita-Monaretti, T., Prado, M. Y. D. A., & Aguiar-Ricz, L. (2013).
Effect of singing training on total laryngectomees wearing a tracheoesophageal voice prosthesis.
Acta cirúrgica brasileira/Sociedade Brasileira para Desenvolvimento Pesquisa em Cirurgia 28,
119–125.
Petersen, B., Weed, E., Sandmann, P., Brattico, E., Hansen, M., Sørensen, S. D., & Vuust, P. (2015).
Brain responses to musical feature changes in adolescent cochlear implant users. Frontiers in
Human Neuroscience 9. Retrieved from
https://www.frontiersin.org/articles/10.3389/fnhum.2015.00007/full
Popovici, M. (1995). Melodic intonation therapy in the verbal decoding of aphasics. Revue Roumaine
de Neurologie et Psychiatrie 33, 57–97.
Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., … Tillmann, B.
(2013). Rhythmic auditory stimulation influences syntactic processing in children with
developmental language disorders. Neuropsychology 27(1), 121–131.
Rocca, C. (2012). A different musical perspective: Improving outcomes in music through
habilitation, education, and training for children with cochlear implants. Seminars in Hearing
33(4), 425–433.
Rochette, F., Moussard, A., & Bigand, E. (2014). Music lessons improve auditory perceptual and
cognitive performance in deaf children. Frontiers in Human Neuroscience 8, 488. Retrieved from
https://doi.org/10.3389/fnhum.2014.00488
Rosenberger, P. B. (1980). Dopaminergic systems and speech fluency. Journal of Fluency Disorders
5, 255–267.
Rowe, D. C., Van den Oord, E. J., Stever, C., Giedinhagen, L. N., Gard, J. M., Cleveland, H. H., …
Waldman, I. D. (1999). The DRD2 TaqI polymorphism and symptoms of attention deficit
hyperactivity disorder. Molecular Psychiatry 4(6), 580–586.
Roy, N., Weinrich, B., Grey, S. D., Tanner, K., Stemple, J. C., & Sapienza, C. M. (2003). Three
treatments for teachers with voice disorders: A randomised clinical trial. Journal of Speech,
Language, and Hearing Research 46, 670–688.
Saliba, J., Bortfeld, H., Levitin, D. J., & Oghalai, J. S. (2016). Functional near-infrared spectroscopy
for neuroimaging in cochlear implant recipients. Hearing Research 338(Suppl. C), 64–75.
Santoni, C., de Boer, G., Thaut, M., & Bressmann, T. (2018). Influence of altered auditory feedback
on oral-nasal balance in song. Journal of Voice. Manuscript in press.
Sauder, C., Roy, N., Tanner, K., Houtz, D. R., & Smith, M. E. (2010). Vocal function exercises for
presbylaryngis: A multidimensional assessment of treatment outcomes. Annals of Otology,
Rhinology, & Laryngology 119(7), 460–467.
Schlaug, G., Marchina, S., & Norton, A. (2008). From singing to speaking: Why singing may lead to
recovery of expressive language function in patients with Broca’s aphasia. Music Perception 25(4),
315–323.
Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white matter tracts of
chronic aphasic patients undergoing intense intonation-based speech therapy. New York Academy
of Sciences 1169, 385–394.
Seger, C. A., Spiering, B. J., Sares, A. G., Quraini, S. I., Alpeter, C., David, J., & Thaut, M. H.
(2013). Corticostriatal contributions to musical expectancy perception. Journal of Cognitive
Neuroscience 25(7), 1062–1077.
Skingley, A., Page, S., Clift, S., Morrison, I., Coulton, S., Treadwell, P., … Shipton, M. (2014).
Singing for breathing: Participants’ perceptions of a group singing programme for people with
COPD. Arts & Health 6(1), 59–74.
Skodda, S., & Schlegel, U. (2008). Speech rate and rhythm in Parkinson’s disease. Movement
Disorders: Official Journal of the Movement Disorder Society 23(7), 985–992.
Smith, S., & Thyme, K. (1976). Statistic research on changes in speech due to pedagogic treatment
(the accent method). Folia Phoniatrica 28, 98–103.
Sparks, R. W., Helm, N., & Albert, M. (1974). Aphasia rehabilitation resulting from melodic
intonation therapy. Cortex 10(4), 303–316.
Sparks, R. W., & Holland, A. L. (1976). Method: Melodic intonation therapy for aphasia. Journal of
Speech and Hearing Disorders 41, 298–300.
Stahl, B., Kotz, S. A., Henseler, I., Turner, R., & Geyer, F. (2011). Rhythm in disguise: Why singing
may not hold the key to recovery from aphasia. Brain 134(10), 3083–3093.
Stegemöller, E. L., Radig, H., Hibbing, P., Wingate, J., & Sapienza, C. (2017). Effects of singing on
voice, respiratory control and quality of life in persons with Parkinson’s disease. Disability and
Rehabilitation 39(6), 594–600.
Stemple, J. C., Glaze, L. E., & Klaben, B. G. (2009). Clinical voice pathology: Theory and
management (4th ed.). San Diego, CA: Singular Publishing Group.
Stemple, J. C., Lee, L., D’Amico, B., & Pickup, B. (1994). Efficacy of vocal function exercises as a
method of improving voice production. Journal of Voice 8(3), 271–278.
Stephan, K. M., Thaut, M. H., Wunderlich, G., Schicks, W., Tian, B., Tellmann, L., … Hömberg, V.
(2002). Conscious and subconscious sensorimotor synchronization: Cortex and the influence of
awareness. NeuroImage 15(2), 345–352.
Sundberg, J., Birch, P., Gümoes, B., Stavad, H., Prytz, S., & Karle, A. (2007). Experimental findings
on the nasal tract resonator in singing. Journal of Voice 21(2), 127–137.
Tamplin, J. (2008). A pilot study into the effect of vocal exercises and singing on dysarthric speech.
Neurorehabilitation 23(3), 207–216.
Tamplin, J., Baker, F. A., Grocke, D., Brazzale, D. J., Pretto, J. J., Ruehland, W. R., … Berlowitz, D.
J. (2013). Effect of singing on respiratory function, voice, and mood after quadriplegia: a
randomized controlled trial. Archives of Physical Medicine and Rehabilitation 94(3), 426–434.
Tanner, K., Roy, N., Merrill, R. M., & Power, D. (2005). Velopharyngeal port status during classical
singing. Journal of Speech, Language, and Hearing Research 48(6), 1311–1324.
Tanner, M. A. (2012). Voice improvement in Parkinson’s disease: Vocal pedagogy and voice therapy
combined (Doctoral dissertation). University of Alberta.
Tanner, M. A., Rammage, L., & Liu, L. (2016). Does singing and vocal strengthening improve vocal
ability in people with Parkinson’s disease? Arts & Health 8(3), 199–212.
Tautscher-Basnett, A., Tomantschger, V., Keglevic, S., & Freimuller, M. (2006). Group therapy for
individuals with Parkinson disease (PD) focusing on voice strengthening. Presentation to the 4th
World Congress for Neurorehabilitation.
Thaut, M. H., Thaut, C. P., & McIntosh, K. (2014). Melodic intonation therapy (MIT). In M. H.
Thaut & V. Hoemberg (Eds.), Handbook of neurologic music therapy (pp. 140–145). Oxford:
Oxford University Press.
Thaut, M. H., & Hoemberg, V. (Eds.). (2014). Handbook of neurologic music therapy. Oxford:
Oxford University Press.
The American Speech-Language-Hearing Association (2017). Spasmodic dysphonia (Website).
https://www.asha.org/public/speech/disorders/Spasmodic-Dysphonia
Thomson, J. M., Leong, V., & Goswami, U. (2012). Auditory processing interventions and
developmental dyslexia: A comparison of phonemic and rhythmic approaches. Reading and
Writing 26(2), 139–161.
Titze, I. R. (2006). Voice training and therapy with a semi-occluded vocal tract: Rationale and
scientific underpinnings. Journal of Speech, Language, and Hearing Research 49(2), 448–459.
Tonkinson, S. (1994). The Lombard effect in choral singing. Journal of Voice 8(1), 24–29.
Tourville, J. A., & Guenther, F. H. (2011). The DIVA model: A neural theory of speech acquisition
and production. Language and Cognitive Processes 26(7), 952–981.
Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. (1997). The acoustic basis of infant preferences
for infant-directed singing. Infant Behavior and Development 20(3), 383–396.
Van der Meulen, I., Van de Sandt-Koenderman, M. E., & Ribbers, G. M. (2012). Melodic intonation
therapy: Present controversies and future opportunities. Archives of Physical Medicine and Rehabilitation 93(1), S46–S52.
Van Riper, C. (1982). The nature of stuttering (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.
Verdolini, K., Druker D. G., Palmer, P. M., & Samawi, H. (1998). Laryngeal adduction in resonant
voice. Journal of Voice 12(3), 315–327.
Verdolini-Marston, K., Burke, M. K., Lessac, A., Glaze, L., & Caldwell, E. (1995). Preliminary study
of two methods of treatment for laryngeal nodules. Journal of Voice 9(1), 74–85.
Victor, M., & Ropper, A. H. (2001). Adams and Victor’s principles of neurology (7th ed.). New York:
McGraw-Hill.
Wambaugh, J. L., & Martinez, A. L. (2000). Effects of rate and rhythm control treatment on
consonant production accuracy in apraxia of speech. Aphasiology 14(8), 851–871.
Wambaugh, J. L., Nessler, C., Cameron, R., & Mauszycki, S. C. (2012). Acquired apraxia of speech:
The effects of repeated practice and rate/rhythm control treatments on sound production accuracy.
American Journal of Speech-Language Pathology 21(2), S5–S27.
Wan, C. Y., Zheng, X., Marchina, S., Norton, A., & Schlaug, G. (2014). Intensive therapy induces
contralateral white matter changes in chronic stroke patients with Broca’s aphasia. Brain &
Language 136(Suppl. C), 1–7.
Wiener, M., Lohoff, F. W., & Coslett, H. B. (2011). Double dissociation of dopamine genes and
timing in humans. Journal of Cognitive Neuroscience 23(10), 2811–2821.
Wiener, M., Lee, Y.-S., Lohoff, F. W., & Coslett, H. B. (2014). Individual differences in the
morphometry and activation of time perception networks are influenced by dopamine genotype.
NeuroImage 89, 10–22.
Wong, P. C. M., Ettlinger, M., & Zheng, J. (2013). Linguistic grammar learning and DRD2-TAQ-IA
polymorphism. PLoS ONE 8(5), e64983.
Wu, J. C., Maguire, G., Riley, G., Fallon, J., LaCasse, L., Chin, S., … Lottenberg, S. (1995). A
positron emission tomography [18F] deoxyglucose study of developmental stuttering. Neuroreport
6(3), 501–505.
Yanagisawa, E., Estill, J., Mambrino, L., & Talkin, D. (1991). Supraglottic contributions to pitch
raising: Videoendoscopic study with spectroanalysis. Annals of Otology, Rhinology, &
Laryngology 100(1), 19–30.
Yiu, E. M. L., & Ho, E. Y. Y. (2002). Short-term effect of humming on vocal quality. Asia Pacific
Journal of Speech, Language and Hearing 7(3), 123–137.
Yorkston, K. M., Beukelman, D. R., Strand, E. A., & Hakel, M. (2010). Management of motor speech
disorders in children and adults. Austin, TX: Pro-Ed Inc.
Ziegler, A., & Hapner, E. R. (2013). Phonation resistance training exercise (PhoRTE) therapy. In A.
Behrman & J. Haskell (Eds.), Exercises for voice therapy (pp. 147–148). San Diego, CA: Plural
Publishing.
Ziegler, A., Verdolini Abbott, K., Johns, M., Klein, A., & Hapner, E. R. (2014). Preliminary data on
two voice therapy interventions in the treatment of presbyphonia. The Laryngoscope 124(8),
1869–1876.
Zumbansen, A., Peretz, I., & Hébert, S. (2014). Melodic intonation therapy: Back to basics for future
research. Frontiers in Neurology 5, 1–11. Retrieved from https://doi.org/10.3389/fneur.2014.00007
CHAPTER 30

NEUROLOGIC MUSIC THERAPY TARGETING COGNITIVE AND AFFECTIVE FUNCTIONS

SHANTALA HEGDE

Cognitive and Affective Functions: Central for Functional Recovery

Cognition, emotion, and social cognition play a central role in functional
recovery and determine overall quality of life in neurological conditions
such as traumatic brain injury (TBI), stroke/cerebrovascular accident,
dementia, other degenerative conditions like Parkinson’s disease, and in
major psychiatric conditions such as schizophrenia and bipolar affective
disorders, as well as common psychiatric conditions such as anxiety and
depression (Diamond, Felsenthal, Macciocchi, Butler, & Lally-Cassady,
1996; Elvevag & Goldberg, 2000; Lam, Kennedy, McIntyre, & Khullar,
2014; Sun, Tan, & Yu, 2014). Pharmacological treatment has shown limited
effects in alleviating deficits in cognition, emotion, and social cognition
in neurological and psychiatric conditions (Harvey, Green, Keefe, & Velligan,
2004; Marder, 2006; Müller, 2002, 2012; Rund & Borg, 1999). Over the
last three decades, cognitive remediation has emerged as the best available
non-pharmacological treatment method to target cognitive deficits
(Cicerone et al., 2000, 2011; Gordon et al., 2006; Keshavan, Vinogradov,
Rumsey, Sherrill, & Wagner, 2014; Rohling, Faust, Beverly, & Demakis,
2009; Volpe & McDowell, 1990). The terms remediation, rehabilitation,
and retraining have been interchangeably used in scientific literature.
Technically, the term rehabilitation encompasses not only cognitive
remediation but also holistic and multimodal methods
(Hegde, 2014). The holistic approach in rehabilitation addresses cognitive,
emotional, and other non-cognitive domains of functioning.

Cognitive Remediation: The Method to Improve Cognitive Functions

Cognitive remediation (CR) provides patients with the cognitive and
perceptual skills necessary to perform tasks or solve problems which are
currently difficult, but which were within their capabilities before injury
(Cicerone et al., 2000; Diller & Gordon, 1981; Prigatano, 1997; Sohlberg &
Mateer, 2001). CR is described as an intervention that aims to improve
cognitive processes (attention, memory, executive function, social
cognition, or meta-cognition) with the goal of durability and generalization.
The final goal is to improve the patient’s ability to adapt and to
regain normal or near-normal functioning in daily living (Benedict et al.,
1994; Spring & Ravdin, 1992).
There are two approaches to CR:

1. Compensatory approach: In this approach the goal is not to target
the specific cognitive function, but rather to focus on altering the
individual’s behavior or the environment to help compensate in the
area of deficits. This approach is meant to help the individual cope
by providing them with alternative techniques such as reminders to
compensate for memory deficits (Raskin, 2010). This approach is
beneficial if the individual has intact basic cognitive functions to be
able to utilize the aids or reminders.
2. Restorative approach: This focuses directly on the areas of cognitive
deficits and aims to improve cognitive functions by using techniques
such as drill-based exercises, either paper and pencil based or
computer based tasks. The aim is to enable the individual to
accurately perform the task and gain proficiency by repetitively
engaging in the chosen task and to retain this skill over a longer
duration (Raskin, 2010). Consistent repetition on tasks which are
carefully designed and hierarchically placed in levels of difficulty is
crucial in CR. The assumption is that repetitive activation not only
improves clinical and behavioral domains of functioning but also drives
cortical reorganization. Repeated practice facilitates
neural recovery and restoration of functions mediated by the
underlying neural circuits.

It has long been acknowledged that the interaction between biological
systems—the human brain in this case—and the environment is bi-
directional and certain types of environmental experiences can have a
positive impact on biological processes such as cognitive functions. CR is
based on the same principle, which in today’s neuroscientific field is termed
neural plasticity (Raskin, 2010). Neural plasticity is considered the veritable
essence of the brain. It is the capacity of the brain to change based on the
experiences in the environment. This means that the brain is a malleable
organ and it is constantly undergoing reorganization and change (Bruel-
Jungerman, Davis, & Laroche, 2007). Studies on animals demonstrating
changes in cortical organization and the interaction between the different
brain systems following learning tasks provided the initial evidence for
neural plasticity (Kleim, Barbay, & Nudo, 1998). Research studies that
followed on similar lines in the field of neural plasticity led researchers to
explore methods to improve brain functions, leading to the emergence of
CR as a treatment method. Initial studies were on patients with traumatic
brain injury and stroke. Some studies reported that those who had lost
motor skills after brain injury regained motor functioning through
motor-based exercises and, in addition, showed improved function in the
motor cortex. With studies emerging in this direction, researchers developed
different methods and practice principles that could be used to alleviate
deficits in specific cognitive functions and improve overall brain function in various
disorders (Ben-Yishay, Piasetsky, & Rattok, 1985; Ben-Yishay & Prigatano,
1990; Cicerone, 2012; Faralli, Bigoni, Mauro, Rossi, & Carulli, 2013; Podd,
2012).
Today, CR is considered a crucial treatment method for improving
cognitive functions in a range of clinical conditions. Systematic research in
this direction is ongoing. A careful examination of literature on CR in
various clinical conditions indicates variability in terms of the CR method
and tasks. There is a need for high-quality evidence-based research
studies. Meta-analyses on CR as an intervention, especially the restorative
approaches in various neurological and psychiatric conditions, have shown
small (Cohen’s d = 0.30) to large effect sizes (Cohen’s d = 0.71) in
TBI (Cernich, Kurtz, Mordecai, & Ryan, 2010;
Cicerone et al., 2011; Rohling et al., 2009). There are relatively fewer
systematic studies on CR in degenerative conditions such as Alzheimer’s or
Parkinson’s disease compared to TBI. The efficacy of CR in
degenerative conditions is still unclear and there are no meta-analytic
studies. In major psychiatric conditions such as schizophrenia, CR methods
have shown small to moderate effect sizes (Wykes, Huddy, Cellard,
McGurk, & Czobor, 2011) and studies from developing countries such as
India number only a handful (Hegde, 2017). CR provided along with
other forms of rehabilitation, such as occupational training or social skills
training, seems to show larger effects. Demonstrating wider
generalization of the cognitive gains targeted via CR, and sustaining
this improvement over longer durations, has been the biggest
challenge to overcome (Cappa et al., 2005; Chung, Pollock, Campbell,
Durward, & Hagen, 2013; Schutz & Trainor, 2007; Wykes et al., 2011).
Newer effective methods of CR are still very much needed in the field of
neuropsychological rehabilitation.
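For readers unfamiliar with the effect size statistic quoted above, Cohen’s d expresses the difference between two group means in units of their pooled standard deviation; by convention, values around 0.2 are read as small, 0.5 as medium, and 0.8 as large. A minimal statement of the usual formula:

\[
d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
\]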

Music and Brain Plasticity

The underlying principle of CR is “neural plasticity”—the capacity of the
brain to reorganize itself and relearn functions that were available prior to
the acquired injury. Neural plasticity is the ability of neural circuits to
change their structure, function, and connectivity in response to experience.
Neural plasticity is known to underlie functional recovery in a broad array
of neurological and psychiatric conditions. Cognitive deficits which play a
central role in recovery could be alleviated by CR, the treatment method
designed to improve brain structure as well as functions.
In the last three decades neuroscientific research on music has taken a
significant leap owing to advancement in scientific techniques and methods
such as EEG-ERP, fMRI, TMS, rTMS, MEG, etc. Scientific understanding
of neural correlates of music perception and cognition has offered a strong
edifice for evidence-based music therapy techniques and processes (Thaut,
2005a). Research on music perception and production has shown that
music engages almost all cognitive processes such as acoustic analysis,
information processing, attention, sensorimotor integration, executive
functions, language processing, long-term memory, emotion, and creativity.
Engaging passively and more so actively in music leads to activation of the
neural networks underlying cognitive processes. The activation is not
restricted to musical processes alone; it is known to extend to non-musical
domains such as motor, language, cognitive, and affective domains of
functions. In other words, music is produced by involving a host of
sensorimotor, cognitive, and language functions and the music in turn
stimulates complex cognitive, affective, sensorimotor, and language
processes that have the capacity to generalize to non-musical domains of
functioning (Levitin & Tirovolas, 2009; Patel 2010; Peretz & Zatorre, 2005;
Schlaug, 2009; Thaut, 2010). In addition to sensorimotor, language, and
overall cognitive processes, music is also a powerful method used in
emotion regulation. Music can elicit strong emotions and, like real-life
emotions, engages the same frontal and limbic systems, including the
mesolimbic system and the reward region, the nucleus accumbens, in
eliciting and processing emotions. Music alters psychophysiological
functions such as perception of pain, regulation of autonomic arousability,
blood pressure, respiration, and heart rate, and also causes neurochemical
changes. The neurochemical changes that music brings about may be
grouped into four different domains: dopamine and opioids which mediate
reward, motivation, and pleasure; cortisol, corticotrophin-releasing
hormone (CRH) and adrenocorticotrophic hormone (ACTH) which mediate
stress and arousal; serotonin and the peptide derivatives of
proopiomelanocortin (POMC), alpha-melanocyte-stimulating hormone and
beta-endorphin which mediate immunity; and oxytocin which mediates
social affiliation (Blood & Zatorre, 2001; Chanda & Levitin, 2013;
Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015; Sutoo & Akiyama,
2004).
Neuroscientific investigations have strongly implicated music in
enhanced structural as well as functional brain plasticity (Jäncke, 2009).
Music is a highly complex and structured stimulus with several dimensions.
The nature, structure, intensity, and complexity of the experience or
stimulus determine the neural changes that occur. Music, as a temporal
auditory language, is today considered an effective method in
neurorehabilitation, and music with its temporal structure and pattern is
known to enhance cognitive functions (Koelsch, 2009; Thaut, 2005b; Thaut
& Hoemberg, 2014). The temporal and sequential aspect of music is known
to serve as a “scaffold” to bootstrap the temporal and sequential pattern in
cognitive functions such as information processing, sustained attention, and
memory (Conway, Pisoni, & Kronenberger, 2009). Engaging in music both
actively and passively is considered one of the best forms of cognitive
exercise and, as a biological phenomenon, a signal of
cognitive and emotional flexibility and cognitive fitness (Herholz &
Zatorre, 2012; Levitin & Tirovolas, 2009; Peretz, 2006). Musicians have
been studied as a special group to examine neural plasticity and establish the
effects of musical training and underlying neural changes (Münte,
Altenmüller, & Jäncke, 2002). Music is now being systematically studied
and used to alter and regulate cognitive processes and emotions in ways
that generalize to non-musical domains of functioning, with the principle
of neural plasticity as the basic assumption, in a range
of neurological and psychiatric conditions where cognitive deficits play a
central role (Särkämö, Altenmüller, Rodríguez-Fornells, & Peretz, 2016;
Särkämö, Tervaniemi, & Huotilainen, 2013; Sihvonen et al., 2017; Wan &
Schlaug, 2010). Music is therefore considered a powerful and integrative
method to address multiple domains of functioning—cognitive, emotional,
and social—and as a valuable tool in neuropsychological rehabilitation.

Music Therapy: From Social Science Model to Neuroscience Model
Addressing cognitive dysfunction via music is a recent frontier in music
therapy and has emerged as one of the most promising and innovative new
methods. The use of music as a form of therapy has a long history; it has
been practiced across cultures for centuries, dating back even to
prehistoric times (Thaut, 2015). Music has always been considered to
facilitate “overall well-being” and the “feel-good” factor by enhancing
overall emotional health (de l’Etoile, 2010). Music therapy has been
considered effective in the reduction of anxiety, depression, and agitation.
However, often the process underlying the mechanisms of change was
unclear. The changes were explained relying upon the popular and
prevailing psychological theories such as behavioral, psychoanalytic, and
humanistic schools of thought (de l’Etoile, 2010, 2016). Such explanations
contributed little to understanding the underlying processes of
music in therapy, an understanding that is crucial for standardizing
techniques to target specific functions and for evaluating specific outcomes. With
advancement in neuroscientific investigations on music perception and
cognition and better understanding of the underlying neural correlates, there
has been a major change in music therapy principles and objectives. Music
therapy has shifted from its perspective as a social science model to a
neuroscience-based model (Thaut, 2005a; Thaut, McIntosh, & Hoemberg,
2014). An increasing number of controlled trials have examined the effects
of music-based interventions such as listening, singing, and actively
engaging in music by playing an instrument in neurorehabilitation
(Sihvonen et al., 2017).

An Overview of Neurologic Music Therapy Techniques and Explanatory Models

The recent advancement in the field of music therapy has been the
development of the neuroscience-based approach to music therapy practice
and research. “Neurologic Music Therapy” (NMT) is based on
neuroscientific modes of music perception, cognition, and production. NMT
consists of twenty standardized techniques targeting three main domains of
functioning, namely, sensorimotor, language, and cognitive-affective
dysfunctions that are the result of neurologic disease of the human nervous
system (Clair, Pasiali, & Lagasse, 2008; de l’Etoile, 2010; Thaut, 2005a).
Each of the techniques focuses on the influence of music on non-musical
domains of functioning with non-musical therapeutic goals.
NMT is based on two interrelated models to explain how music-based
interventions and training can access and modulate cognitive functions in a
neuropsychological rehabilitation context. The two models of the NMT are
the “Rational Scientific Mediating Model” (RSMM) and the
“Transformational Design Model” (TDM) (de l’Etoile, 2016; Hegde, 2014;
Thaut, McIntosh, & Hoemberg, 2014). The TDM is the clinical component
or practical extension of the scientific theory model of the RSMM.

Rational Scientific Mediating Model (RSMM)
The basic premise of the RSMM is that the scientific basis of music as
therapy should be anchored in the empirical studies of the neurological,
psychological, and physiological foundations of music perception,
cognition, and production (Thaut, 2005a; Thaut, McIntosh, & Hoemberg,
2014). The RSMM was conceptualized to provide a systematic
epistemology for translational research. This model helps in establishing the
link between the scientific findings of music perception, cognition, and
production and the rehabilitation of non-musical functions in the bio-
psycho-social domains. The RSMM links the knowledge of musical and
non-musical behavior, thereby explaining the mechanisms of change to
support effective therapy and rehabilitation. It is considered a dynamic
model, open to incorporating newer research findings, thereby
contributing to a deeper understanding of the underlying process of music
as therapy. The RSMM plays a crucial role in the development of music-
based therapeutic techniques as well as the selection of appropriate
therapeutic methods. This model helps in validation of the techniques of
NMT. It consists of four steps:

1. Musical response model: The focus is on how we perceive, produce,
or respond to music—its neurological, physiological, and
psychological components. This includes understanding the
underlying relevant bodily systems, the neural correlates, and
various perceptual, motor, and cognitive processes.
2. Non-musical parallel model: The focus is on similar perceptual,
motor, and cognitive processes in non-musical brain and behavior
functions and linking this with step 1 by investigating shared or
parallel underlying mechanisms between musical and non-musical
behavior. This is an important step before suggesting that music will
have a positive effect on the non-musical domain of functioning.
3. Mediating model: The focus is on combining the previous two steps
to study whether music can influence parallel non-musical behaviors
in normal and clinical populations.
4. Clinical research model: The focus is on the long-term, therapeutic
effects of music in non-musical domain functioning and studying the
carry-over effect after the treatment program.

The Transformational Design Model (TDM)
The TDM provides systematic step-by-step guidance in designing,
implementing, and evaluating the clinical intervention (Thaut, 2014). The
validity of the intervention is explained through the RSMM. The TDM
centers on establishing focused and clearly delineated therapeutic goals.
The TDM is important in bringing together the traditional cognitive
rehabilitative techniques with the NMT system. There are six steps in the
TDM model:

1. Diagnostic and functional/clinical assessment of the patient: This is
the diagnostic and etiological assessment of the patient, applying
clinical assessments for optimal treatment selection and evaluation
of progress across the therapy sessions.
2. Development of appropriate and measurable therapeutic goals and
objectives.
3. Design of functional, non-musical therapeutic exercise structures and
stimuli to accomplish the clinical goals and objectives.
4. Translation of step 3 into functional therapeutic music exercises.
This is a crucial step in the NMT process. The therapist has to
translate functional goals into therapeutic exercises incorporating
musical elements and stimuli. This means that all functional
exercise elements are translated into musical elements. This
translational or transformational process is guided by three principles:
(a) Scientific validity—the translation process should be congruent
with the scientific information developed in the RSMM.
(b) Musical logic—the musical experience in therapy has to
conform to the aesthetic and artistic principles of good musical
forms even at its most basic level.
(c) Structural equivalence—the therapeutic music exercise should
be similar in its structure and function to the non-musical
functional design; all non-musical exercise elements and stimuli
have to be translated musically.
5. Outcome assessment/post-intervention assessment—repeat of the
baseline assessment carried out in step 1. Assessments may be
carried out after each session, over a set of sessions, at the end of the
treatment period, and at follow-ups.
6. Transfer of therapeutic learning to functional applications for
“activities of daily living” (ADL) (de l’Etoile, 2016; Thaut, 2005a,
2014; Thaut, McIntosh, & Hoemberg, 2014).

There are three major areas which the various techniques of NMT
address:

1. Sensorimotor functions: targeting motor functions, mobility, strength,
endurance, cadence and coordination of gross and fine motor
movements in lower and upper extremities (de l’Etoile, 2010; Thaut,
2005c).
2. Speech and language functions: targeting vocal control, speech
production, and meaningful usage of verbal and non-verbal
communication (de l’Etoile, 2010; Thaut, 2005d).
3. Cognitive and affective functions: targeting basic and higher-order
cognitive functions such as attention, memory, executive functions,
emotion, and psychosocial skills (Clair et al., 2008; de l’Etoile,
2010; Thaut, 2005b).

The first two areas are covered in other chapters of this book. This
chapter will focus on various techniques targeting cognitive and affective
functions.
Enhanced neuroscientific understanding of music perception and
cognition along with advancement in non-invasive research tools to study
human brain functions over the last two to three decades has contributed
to linking music and cognitive functions as well as examining the shared and
unique neural correlates underlying music and non-musical cognitive
processes. Until recently, lack of standardized methods was a major
drawback in the field of music therapy.
CR via NMT-based techniques is a more recent development in the
area of NMT compared to clinical research in the domain of
sensorimotor and speech and language functions (Gardiner & Horwitz,
2015; Thaut, 2010). Careful analysis of the literature indicates only a
handful of studies with a strong theoretical background examining the
effects of traditional music therapy on cognitive functions. NMT
techniques, developed on the basis of RSMM and TDM, are evidence-
based, theoretically grounded, and standardized in terminology and
methods. The six steps of the TDM, which is the clinical application model,
run parallel to the fundamental principles of CR. This comparison is
presented in tabular format in Table 1.
Table 1. A comparison of basic principles of cognitive remediation and the clinical application model (TDM) of NMT

Key steps and principles of cognitive remediation, paired with the corresponding steps of the TDM of NMT:

CR: A detailed assessment at baseline, post-intervention, and at follow-up (neuropsychological functioning, psychosocial functioning, evaluation of mood, etc.). Standardized neuropsychological/neurocognitive tests are used to obtain the baseline level of cognitive functioning and are re-administered after the completion of the intervention to quantify the changes in the specific domains of cognitive functions.
TDM steps 1 and 5: Diagnostic and functional/clinical assessment, post-intervention, and follow-up assessment.

CR: Underlying core principle—neural plasticity: “neuronal sparing” (prevention of further deterioration or “neuronal death”) and neuronal reorganization (growth of new neuronal connections to perform and accomplish the task).
TDM step 2: The core principle is that music and music-based exercises will facilitate “neural plasticity.”

CR: Marking goals and objectives of the intervention based on needs and the detailed evaluation, which facilitates identification of the areas of cognitive function that need specific focus.
TDM step 3: Development of appropriate and measurable therapeutic goals and objectives.

CR: Development of different methods of exercises, stimulus modalities, levels of complexity, and response demands (paper-and-pencil based/computer based) with scientific validity. The exercises should target the cognitive functions they are expected to improve and must be organized in levels of difficulty, from basic skills such as attention and concentration, progressing to more complex skills such as learning, memory, executive functions, affect, and social behavior. In a good cognitive remediation method, the intervention tasks and exercises should be process-specific, targeting specific areas of cognitive functioning such as attention, memory, or executive functions.
TDM step 4: Development of functional non-musical therapy exercises and stimuli.

CR: The final aim is to observe the transfer of improved cognitive functions to activities of daily living. A holistic approach includes education of the patient and family members; family members are included to enhance the support system.
TDM steps 5 and 6: Translation of functional non-musical therapy exercises into music-based exercises and stimuli with scientific validity, musical logic, and structural equivalence; transfer of therapeutic learning to functional applications (NMT emphasizes generalization of therapeutic gains to activities of daily living; techniques include homework exercises and involvement of family members) to help transfer therapeutic learning to real-life situations.

Key references for principles of CR: Ben-Yishay & Prigatano, 1990; Eack, 2012; Prigatano, 1997; Raskin, 2010.
Techniques of NMT Targeting Cognition and Emotion

The various techniques of NMT which address cognitive-affective functions
are presented along with a brief description of the technique as well as chief
target populations in Table 2.
Scientific Basis of NMT Techniques Targeting Cognitive, Emotional, and Psychosocial Functions
The techniques of NMT, as stated earlier, are based on the RSMM and
TDM. The two models are dynamic in nature, able to integrate
recent research findings and translate them into standardized therapeutic
techniques. A strong scientific link for music as a method of cognitive
remediation comes from a growing body of research that links music and a
host of cognitive functions such as attention, temporal order learning,
spatial-temporal reasoning, and auditory verbal memory (Drake, Jones, &
Baruch, 2000; Hitch, 1996; Kilgour, Jakobson, & Cuddy, 2000; Sarnthein et
al., 1997; Shaw & Bodner, 1999). The temporal structure of music remains
a central element in therapy and rehabilitation. Rhythm, for instance, is
known to play an important role in tuning and modulating attention, at least
musical attention (Drake et al., 2000). Rhythmic patterns entrain attentional
focus by interacting with attention oscillators via coupling mechanisms
(Thaut, 2010). There exists strong evidence in the field that sensory
rhythms entrain or synchronize attentional processes. This means
that sensory rhythms drive a periodic series of attentional peaks and troughs
that occur at roughly equal temporal intervals (Jones, 1976; Jones & Boltz,
1989; Large & Jones, 1999). A summary of behavioral findings of studies
examining rhythmic entrainment and attention is that rhythmically expected
events are better detected or discriminated than events occurring
arrhythmically (Jones, Boltz, & Kidd, 1982; Jones, Moynihan, Mackenzie,
& Puente, 2002; McAuley & Jones, 2003). Studies have also shown
evidence of cross-modal effects, i.e., effects of auditory entrainment on the
temporal allocation of visual attention. A rhythmic auditory stimulus alters the
temporal distribution of visual attention (Miller, Carlson, & McAuley,
2013). Recently, temporal attention has been examined carefully
for its role in aiding language development (de Diego-Balaguer, Martinez-
Alvarez, & Pons, 2016; Fujii & Wan, 2014; Jung, Sontag, Park, & Loui,
2015).
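To make the entrainment idea concrete, the following is a minimal illustrative sketch, in Python, of a stimulus-coupled phase oscillator in the spirit of dynamic attending theory. It is not a model from the cited studies, and the parameter values (the intrinsic period and the coupling strength) are hypothetical:

import numpy as np

# Illustrative sketch of rhythmic entrainment: an internal attentional
# oscillator free-runs at its own period and is phase-corrected toward
# each stimulus onset, so attentional "expectancy" peaks come to align
# with the external rhythm.
def entrain(onsets, period=0.6, coupling=0.3, dt=0.01, t_max=12.0):
    times = np.arange(0.0, t_max, dt)
    expectancy = np.zeros_like(times)
    onsets = list(onsets)
    phase = 0.0                              # phase in cycles; 0 = expected moment
    for i, t in enumerate(times):
        phase = (phase + dt / period) % 1.0  # free-running advance
        while onsets and t >= onsets[0]:     # onset arrives: nudge phase toward 0
            phase -= coupling * np.sin(2 * np.pi * phase) / (2 * np.pi)
            onsets.pop(0)
        # the attentional "pulse" peaks when phase is near 0
        expectancy[i] = np.exp(np.cos(2 * np.pi * phase) - 1.0)
    return times, expectancy

# Isochronous onsets every 0.5 s: expectancy peaks drift into alignment with
# the stimulus despite a mismatched intrinsic period of 0.6 s.
times, expectancy = entrain(onsets=np.arange(0.5, 10.0, 0.5))

Under this toy model, rhythmically expected events fall at expectancy peaks, which is one way to read the behavioral finding that such events are better detected than arrhythmic ones.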
Recently, studies on clinical populations such as those with Parkinson’s
disease have shown high correlations between non-musical cognitive
functions and performance on various temporal components of rhythm such
as tempo, beat discrimination, and beat perception in a musical context. The
performance on the rhythm perception task predicted performance on the
non-musical domains of cognitive functions such as focused attention and
working memory (Biswas, Hegde, Jhunjhunwala, & Pal, 2016). In patients
with disorders of consciousness, cerebral responses were more clearly
observed when the patient’s first name was called out after presentation of
the patient’s preferred music than after a continuous sound condition. The cerebral
responses were recorded using bedside EEG and the event-related
potential (ERP) method, examining the P300 and N200 waveforms, which are
considered an index of discriminative processing of a very salient and
emotional word—such as one’s first name (Castro et al.,
2015). Presence or absence of this discriminative cerebral response is
strongly associated with the outcome of patients with disorders of
consciousness (Fischer, Luaute, Adeleine, & Morlet, 2004).
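As an illustration of the ERP logic just described, here is a minimal sketch, in Python, of trial averaging, baseline correction, and peak read-out in a P300 window; it is not the pipeline of Castro et al. (2015), and the array shape, sampling rate, and window boundaries are assumptions:

import numpy as np

# Sketch: average epoched EEG into an ERP and read out a P300-like peak.
# `epochs` is assumed to be (n_trials, n_samples) for one channel,
# time-locked to the stimulus with `baseline_s` seconds of pre-stimulus data.
def p300_amplitude(epochs, fs, baseline_s=0.2, window=(0.25, 0.50)):
    erp = epochs.mean(axis=0)                # trial-averaged waveform
    n_base = int(baseline_s * fs)
    erp = erp - erp[:n_base].mean()          # baseline correction
    start = n_base + int(window[0] * fs)     # search window, in s post-stimulus
    stop = n_base + int(window[1] * fs)
    return erp[start:stop].max()             # peak positivity in the window

# Toy usage with simulated data: 40 one-second trials at 250 Hz with an
# injected positive deflection around 0.35 s post-stimulus.
fs = 250
rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 2.0, size=(40, fs))
t = np.arange(fs) / fs - 0.2                 # time axis in seconds
epochs += 5.0 * np.exp(-((t - 0.35) ** 2) / (2 * 0.05 ** 2))
print(p300_amplitude(epochs, fs))

Comparing such a peak measure, or its presence versus absence, across stimulus conditions is the kind of contrast the cited studies report.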
Studies have shown parallels between the temporal chunking principles of non-
musical memory processes and musical memory formation based on the
structural principles of phrasing, grouping, and hierarchical abstraction in
musical patterns (Deutsch, 1982). A significant number of studies
have shown effects of music in enhancing memory for non-musical
material (Ho, Cheung, & Chan, 2003; Jakobson, Cuddy, & Kilgour, 2003;
Thaut, Peterson, McIntosh, & Hoemberg, 2014; Wallace, 1994). The
musical mnemonics method of rehearsal has shown superior benefits over
verbal rehearsal in children with learning disabilities and developmental
disabilities (Gfeller, 1983; Kern, Wolery, & Aldridge, 2007; Wolfe &
Hom, 1993). Music has been shown to serve as an effective method to improve
mood, orientation, and remote episodic memory, as well as attention and
executive functions, in patients with early dementia (Särkämö, Tervaniemi,
et al., 2014). The musical mnemonics method has been shown to improve
verbal memory in patients with multiple sclerosis (MS). In a series of
experiments using EEG, music-based learning led to increased coherence
(phase-locked synchronization) within and between oscillatory brain
networks in the alpha and gamma bands, with higher oscillatory activity in
lower alpha band rhythms reported in bilateral prefrontal neural networks
underlying memory compared to the spoken condition. Patients with MS were
presented with the sung or spoken version of the Auditory Verbal Learning
Test, and systems-level brain activity with oscillatory network
synchronization during music-assisted learning was measured. “Learning-
related synchronization” (LRS) was calculated as the percentage change in
EEG spectral power from the first presentation of a word to the average of
the subsequent word encoding trials. LRS differed significantly between
the music and spoken conditions in the low alpha and upper beta bands.
Patients who were presented the words using the musical template showed
overall better word memory and better word order memory, as well as
stronger bilateral frontal alpha LRS, than patients who received the
spoken word template. The authors of this work suggest that the temporal
structure implicit in musical stimuli enhances “deep coding” during verbal
learning and sharpens the timing of neural dynamics in brain networks
degraded by demyelination in patients with MS (Thaut, Peterson, &
McIntosh, 2005; Thaut, Peterson, et al., 2014; Peterson & Thaut, 2007).
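The LRS measure just described lends itself to a compact computation: the percentage change in band-limited EEG spectral power from the first word presentation to the average of the subsequent encoding trials. The following Python sketch is an illustration under assumed shapes, band edges, and sampling rate, not the authors’ exact pipeline:

import numpy as np

# Mean spectral power of one trial within a frequency band, via the FFT.
def band_power(trial, fs, band):
    freqs = np.fft.rfftfreq(trial.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(trial)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return power[mask].mean()

# LRS as a percentage change: first encoding trial vs. mean of later trials.
def lrs_percent(trials, fs, band=(8.0, 10.0)):   # e.g., a lower-alpha band
    first = band_power(trials[0], fs, band)
    later = np.mean([band_power(tr, fs, band) for tr in trials[1:]])
    return 100.0 * (later - first) / first

# Toy usage: six 2-second encoding trials of one EEG channel at 256 Hz.
fs = 256
rng = np.random.default_rng(1)
trials = rng.normal(size=(6, 2 * fs))
print(lrs_percent(trials, fs))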
Musical processing in general is often observed to be spared in patients
with neurodegenerative conditions such as
Alzheimer’s disease (AD) and even brief exposure to music has been shown
to have a positive impact on their performance on certain cognitive tasks such
as autobiographical memory and verbal fluency. The brain areas associated
with music cognition seem to be preferentially spared in AD. Regions
identified to encode musical memory have been shown to correspond with
areas that showed minimal cortical atrophy (as measured with
magnetic resonance imaging), and minimal disruption of glucose
metabolism (as measured with (18)F-fluorodeoxyglucose positron emission
tomography), when compared to the rest of the brain (Jacobsen et al., 2015;
Limb, 2006; Thompson, Moulin, Hayre, & Jones, 2005). In one study, AD
patients were exposed to lyrics of unfamiliar children’s songs bimodally at
encoding and visual stimuli and were presented with either sung or spoken
lyrics. The findings showed that AD patients demonstrated better
recognition accuracy for the lyrics that were presented musically than for
those that were spoken. Healthy controls, however, did not show any
significant difference between the two conditions (Simmons-Stern, Budson,
& Ally, 2010).
The musical neglect training (MNT) method has shown significant
impact in targeting hemi-neglect conditions often seen in patients with right
hemisphere cerebrovascular disease at the temporoparietal junction of the
infero-posterior parietal cortex. Most of these patients also suffer from
anosognosia, wherein they deny having any unilateral neglect condition.
Across various studies, musical stimuli have been considered superior to
other sensory cues such as visual or tactile cues (Hommel et al., 1990). The
commonly used techniques to target hemi-neglect conditions include neck
vibration, limb activation, or optokinetic stimulation. The effects of such
treatments are transitory, lasting less than thirty minutes. The MNT
method, such as playing a scale on an instrument like a keyboard,
has been shown to sustain the observed gains for over a week (Bernardi et
al., 2017; Bodak, Malhotra, Bernardi, Cocchini, & Stewart, 2014).
Variations in providing tonal cues such as lower pitch and higher pitch have
been shown to modulate performance on line-bisection tasks, with low pitch
producing leftward or downward biases and high pitch producing rightward
or upward biases, suggesting that visuomotor processing can be spatially modulated by
auditory cues (Ishihara et al., 2013). Patients with hemi-neglect seem to
sustain the gradual improvement for one week to four months,
and this translates to changes in day-to-day activities (Bodak et al., 2014;
Guilbert, Clement, & Moroni, 2017; Ishihara et al., 2013).
A successful neuropsychological rehabilitation process will address not
only cognitive functions but also issues that facilitate the
translation of cognitive improvement to psychosocial functioning and
real-life situations. Anxiety and depression are a major concern in several
neurological conditions such as traumatic brain injury, stroke, and
degenerative conditions (Raglio et al., 2015). A single-blind randomized
controlled study has shown that everyday music listening during the first
two months after stroke improves cognitive functions (verbal memory,
Cohen’s d = 0.88, and focused attention, Cohen’s d = 0.92) as well as mood
(lowered depression, Cohen’s d = 0.77) in stroke patients compared to a
group who received audio book listening as a control intervention (Särkämö
et al., 2008). There were also neuroanatomical changes in the recovering
brain. Gray matter reorganization observed in the frontal areas correlated
with improved verbal memory, focused attention, and language skills, and
gray matter reorganization in the left ventral/subgenual anterior cingulate
cortex correlated with reduced negative mood (Särkämö, Ripollés, et al.,
2014). NMT has been successfully used in targeting psychosocial issues
(Kleinstauber & Gurr, 2006; Nayak et al., 2000). Adherence to music-based
interventions seems to be better and drop-out rates are low, suggesting that
music has an innate ability to sustain patients’ interest. Sustaining
interest and reducing drop-out rates have been a great challenge in research
studies on CR (Maratos, Gold, Wang, & Crawford, 2008; Mossler, Chen,
Heldal, & Gold, 2011).
Research on music in rehabilitation has utilized a variety of music-based
intervention methods, ranging from guided listening to singing and playing
instruments. NMT includes standardized techniques which require the
clinician or researcher to have undergone specific training in NMT to
maintain the standards of treatment methods. There has been a significant
surge in systematic research in the field of music and CR, and NMT has
provided the scientific basis and framework for carrying out systematic
work. A preliminary study using a quasi-experimental design examined the
immediate effect of NMT in a group setting on patients with brain injury.
The treatment group received brief sessions of NMT lasting for thirty
minutes with each session targeting one of the following functions:
attention, memory, executive functions, and emotional adjustment. A
control group received rest periods for the same duration in a quiet room.
The findings showed that NMT positively affected executive functions and
mental flexibility (Cohen’s d = 1.21); there was a significant decrease
in depression with a medium effect size (Cohen’s d = 0.52) and in anxiety to a
small extent (Cohen’s d = 0.28). Therapy did not have a positive impact on
attention or memory. This study did not examine sustenance of
improvement over time (Thaut et al., 2009). There is indeed a need for
systematic research examining the optimal intervention duration and
follow-up on the sustenance of treatment gains.

Summary and Future Directions

Cognitive and emotional domains of functioning play a crucial role in the
functional recovery of patients suffering from neurological and psychiatric
conditions. There is undoubtedly a need for newer methods in
neurorehabilitation which can bring about not only significant changes in
functioning but also long-lasting improvement. Newer methods of CR
are warranted to either replace or complement the existing methods of CR.
Traditional CR methods and research studies are also being challenged in
terms of their ability to target multiple domains of functioning—cognition,
affect, and psychosocial functioning. Music as therapy is gaining a new
perspective with advancement in neuroscientific research on music
perception, cognition, and production. Music and music-based interventions
are today considered ideal methods for CR, as they have the capacity to engage
auditory, motor, language, cognitive, and emotional functions across
cortical and subcortical brain regions. The chapter has presented a brief
overview of the various techniques of NMT targeting cognition, affect, and
psychosocial functions. At present, the scientific literature is often
characterized by small sample sizes and highlights the need for
standardized methods of intervention across studies. This has been the case
even with research in restorative methods of CR in the field of
neuropsychological rehabilitation. Future research on music should
consider this limitation before intervention methods are planned and
standardized. There is a need for stronger research methodology and
definition of the musical medium or parameter related to specific outcomes of
rehabilitation. Adding evaluation of neurochemical markers of neural
plasticity such as brain-derived neurotrophic factor (BDNF), and methods
such as EEG/ERP and fMRI, would contribute further to the scientific
strength of future research on music therapy. NMT techniques have the
potential to overcome this issue as the methods are well standardized and
can be used in large-scale or multi-center studies. Compared to systematic
research using the techniques of NMT for sensorimotor and language
functions, research on NMT techniques for cognitive and affective
functions is far less abundant. NMT with its strong theoretical and scientific
background has positively influenced the practice of music therapy across
countries. Future studies may also aim to examine the effects of NMT
techniques for sensorimotor and language functions on cognitive functions
and vice versa, given the neural networks shared between the former
functions and general cognitive functions. Also, all the techniques of NMT
aim at enhancing neural plasticity to bring about the desired changes in
neural function and behavior.

References
Ben-Yishay, Y., Piasetsky, E., & Rattok, J. (1985). A systematic method for ameliorating disorders in
basic attention. In M. J. Meir, A. L. Benton, & L. Diller (Eds.), Neuropsychological rehabilitation
(pp. 165–182). New York: Guilford Press.
Ben-Yishay, Y., & Prigatano, G. P. (1990). Cognitive remediation. In M. Rosenthal, M. R. E. R.
Griffith, M. R. Bond, & J. D. Miller (Eds.), Rehabilitation of the adult and child with traumatic
brain injury (2nd ed., pp. 393–409). Philadelphia, PA: Davis.
Benedict, R. H., Harris, A. E., Markow, T., McCormick, J. A., Nuechterlein, K. H., & Asarnow, R. F.
(1994). Effects of attention training on information processing in schizophrenia. Schizophrenia
Bulletin 20, 537–546.
Bernardi, N. F., Cioffi, M. C., Ronchi, R., Maravita, A., Bricolo, E., Zigiotto, L., … Vallar, G. (2017).
Improving left spatial neglect through music scale playing. Journal of Neuropsychology 11(1),
135–158.
Biswas, A., Hegde, S., Jhunjhunwala, K., & Pal, P. K. (2016). Two sides of the same coin:
Impairment in perception of temporal components of rhythm and cognitive functions in
Parkinson’s disease. Basal Ganglia 6(1), 63–70.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity
in brain regions implicated in reward and emotion. Proceedings of the National Academy of
Sciences 98(20), 11818–11823.
Bodak, R., Malhotra, P., Bernardi, N. F., Cocchini, G., & Stewart, L. (2014). Reducing chronic visuo-
spatial neglect following right hemisphere stroke through instrument playing. Frontiers in Human
Neuroscience 8, 413. Retrieved from https://doi.org/10.3389/fnhum.2014.00413
Bruel-Jungerman, E., Davis, S., & Laroche, S. (2007). Brain plasticity mechanisms and memory: A
party of four. Neuroscientist 13(5), 492–505.
Cappa, S. F., Benke, T., Clarke, S., Rossi, B., Stemmer, B., & Van Heugten, C. M. (2005). EFNS
guidelines on cognitive rehabilitation: Report of an EFNS task force. European Journal of
Neurology 12(9), 665–680.
Castro, M., Tillmann, B., Luaute, J., Corneyllie, A., Dailler, F., Andre-Obadia, N., & Perrin, F.
(2015). Boosting cognition with music in patients with disorders of consciousness.
Neurorehabilitation and Neural Repair 29(8), 734–742.
Cernich, A. N., Kurtz, S. M., Mordecai, K. L., & Ryan, P. B. (2010). Cognitive rehabilitation in
traumatic brain injury. Current Treatment Options in Neurology 12(5), 412–423.
Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences
17(4), 179–193.
Chung, C. S., Pollock, A., Campbell, T., Durward, B. R., & Hagen, S. (2013). Cognitive
rehabilitation for executive dysfunction in adults with stroke or other adult non-progressive
acquired brain damage. Cochrane Database of Systematic Reviews 4, CD008391.
Cicerone, K. D. (2012). Facts, theories, values: Shaping the course of neurorehabilitation. The 60th
John Stanley Coulter memorial lecture. Archives of Physical Medicine and Rehabilitation 93(2),
188–191.
Cicerone, K. D., Dahlberg, C., Kalmar, K., Langenbahn, D. M., Malec, J. F., Bergquist, T. F., …
Morse, P. A. (2000). Evidence-based cognitive rehabilitation: Recommendations for clinical
practice. Archives of Physical Medicine and Rehabilitation 81(12), 1596–1615.
Cicerone, K. D., Langenbahn, D. M., Braden, C., Malec, J. F., Kalmar, K., Fraas, M., … Ashman, T.
(2011). Evidence-based cognitive rehabilitation: Updated review of the literature from 2003
through 2008. Archives of Physical Medicine and Rehabilitation 92(4), 519–530.
Clair, A. A., Pasiali, V., & Lagasse, B. (2008). Neurologic music therapy. In A. A. Darrow (Ed.),
Introduction to approaches in music therapy (2nd ed., pp. 153–171). Silver Spring, MD: American
Music Therapy Association.
Conway, C. M., Pisoni, D. B., & Kronenberger, W. G. (2009). The importance of sound for cognitive
sequencing abilities: The auditory scaffolding hypothesis. Current Directions in Psychological
Science 18(5), 275–279.
de Diego-Balaguer, R., Martinez-Alvarez, A., & Pons, F. (2016). Temporal attention as a scaffold for
language development. Frontiers in Psychology 7. Retrieved from
https://doi.org/10.3389/fpsyg.2016.00044
de l’Etoile, S. K. (2010). Neurologic music therapy: A scientific paradigm for clinical practice. Music
and Medicine 2(2), 78–84.
de l’Etoile, S. K. (2016). Processes of music therapy: Clinical and scientific rationales and models. In
S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (2nd ed., pp.
805–818). Oxford: Oxford University Press.
Deutsch, D. (1982). Organizational processes in music. In M. Clynes (Ed.), Music, mind and brain
(pp. 119–131). New York: Plenum Press.
Diamond, P. T., Felsenthal, G., Macciocchi, S. N., Butler, D. H., & Lally-Cassady, D. (1996). Effect
of cognitive impairment on rehabilitation outcome. American Journal of Physical Medicine and
Rehabilitation 75(1), 40–43.
Diller, L., & Gordon, W. A. (1981). Interventions for cognitive deficits in brain-injured adults.
Journal of Consulting and Clinical Psychology 49(6), 822–834.
Drake, C., Jones, M. R., & Baruch, C. (2000). The development of rhythmic attending in auditory
sequences: Attunement, referent period, focal attending. Cognition 77(3), 251–288.
Eack, S. M. (2012). Cognitive remediation: A new generation of psychosocial interventions for
people with schizophrenia. Social Work 57(3), 235–246.
Elvevag, B., & Goldberg, T. E. (2000). Cognitive impairment in schizophrenia is the core of the
disorder. Critical Reviews in Neurobiology 14(1), 1–21.
Faralli, A., Bigoni, M., Mauro, A., Rossi, F., & Carulli, D. (2013). Noninvasive strategies to promote
functional recovery after stroke. Neural Plasticity 2013, 854597.
Fischer, C., Luaute, J., Adeleine, P., & Morlet, D. (2004). Predictive value of sensory and cognitive
evoked potentials for awakening from coma. Neurology 63(4), 669–673.
Fujii, S., & Wan, C. Y. (2014). The role of rhythm in speech and language rehabilitation: The SEP
hypothesis. Frontiers in Human Neuroscience 8. Retrieved from
https://doi.org/10.3389/fnhum.2014.00777
Gardiner, J. C., & Horwitz, J. L. (2015). Neurologic music therapy and group psychotherapy for
treatment of traumatic brain injury: Evaluation of a cognitive rehabilitation group. Music Therapy
Perspectives 33(2), 193–201.
Gfeller, K. E. (1983). Musical mnemonics as an aid to retention with normal and learning disabled
students. Journal of Music Therapy 20(4), 179–189.
Gordon, W. A., Zafonte, R., Cicerone, K., Cantor, J., Brown, M., Lombard, L., … Chandna, T.
(2006). Traumatic brain injury rehabilitation: State of the science. American Journal of Physical
Medicine and Rehabilitation 85(4), 343–382.
Guilbert, A., Clement, S., & Moroni, C. (2017). A rehabilitation program based on music practice for
patients with unilateral spatial neglect: A single-case study. Neurocase 23(1), 12–21.
Harvey, P. D., Green, M. F., Keefe, R. S., & Velligan, D. I. (2004). Cognitive functioning in
schizophrenia: A consensus statement on its role in the definition and evaluation of effective
treatments for the illness. Journal of Clinical Psychiatry 65(3), 361–372.
Hegde, S. (2014). Music based cognitive remediation therapy for patients with traumatic brain injury.
Frontiers in Neurology 5. Retrieved from https://doi.org/10.3389/fneur.2014.00034
Hegde, S. (2017). A review of Indian research on cognitive remediation for schizophrenia. Asian
Journal of Psychiatry 25, 54–59.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity:
Behavior, function, and structure. Neuron 76(3), 486–502.
Hitch, G. J. (1996). Temporal grouping effects in immediate recall: A working memory analysis.
Quarterly Journal of Experimental Psychology Section A 49(1), 116–139.
Ho, Y. C., Cheung, M. C., & Chan, A. S. (2003). Music training improves verbal but not visual
memory: Cross-sectional and longitudinal explorations in children. Neuropsychology 17(3), 439–
450.
Hommel, M., Peres, B., Pollak, P., Memin, B., Besson, G., Gaio, J. M., & Perret, J. (1990). Effects of
passive tactile and auditory stimuli on left visual neglect. Archives of Neurology 47, 573–576.
Ishihara, M., Revol, P., Jacquin-Courtois, S., Mayet, R., Rode, G., Boisson, D., … Rossetti, Y.
(2013). Tonal cues modulate line bisection performance: Preliminary evidence for a new
rehabilitation prospect? Frontiers in Psychology 4, 704. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00704
Jacobsen, J. H., Stelzer, J., Fritz, T. H., Chetelat, G., La Joie, R., & Turner, R. (2015). Why musical
memory can be preserved in advanced Alzheimer’s disease. Brain 138(8), 2438–2450.
Jakobson, L. S., Cuddy, L. L., & Kilgour, A. R. (2003). Time tagging: A key to musicians’ superior
memory. Music Perception: An Interdisciplinary Journal 20(3), 307–313.
Jäncke, L. (2009). Music drives brain plasticity. F1000 Biology Reports 1, 78. Retrieved from
http://www.F1000.com/Reports/Biology/content/1/78
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and
memory. Psychological Review 83(5), 323–355.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review
96(3), 459–491.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and
temporal context. Perception & Psychophysics 32(3), 211–218.
Jones, M. R., Moynihan, H., Mackenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-
driven attending in dynamic arrays. Psychological Science 13(4), 313–319.
Jung, H., Sontag, S., Park, Y. S., & Loui, P. (2015). Rhythmic effects of syntax processing in music
and language. Frontiers in Psychology 6, 1762. Retrieved from
https://doi.org/10.3389/fpsyg.2015.01762
Kern, P., Wolery, M., & Aldridge, D. (2007). Use of songs to promote independence in morning
greeting routines for young children with autism. Journal of Autism and Developmental Disorders
37(7), 1264–1271.
Keshavan, M. S., Vinogradov, S., Rumsey, J., Sherrill, J., & Wagner, A. (2014). Cognitive training in
mental disorders: Update and future directions. American Journal of Psychiatry 171(5), 510–522.
Kilgour, A. R., Jakobson, L. S., & Cuddy, L. L. (2000). Music training and rate of presentation as
mediators of text and song recall. Memory & Cognition 28(5), 700–710.
Kleim, J. A., Barbay, S., & Nudo, R. J. (1998). Functional reorganization of the rat motor cortex
following motor skill learning. Journal of Neurophysiology 80(6), 3321–3325.
Kleinstauber, M., & Gurr, B. (2006). Music in brain injury rehabilitation. Journal of Cognitive
Rehabilitation 24, 4–14.
Koelsch, S. (2009). A neuroscientific perspective on music therapy. Annals of the New York Academy
of Sciences 1169, 374–384.
Lam, R. W., Kennedy, S. H., McIntyre, R. S., & Khullar, A. (2014). Cognitive dysfunction in major
depressive disorder: Effects on psychosocial functioning and implications for treatment. Canadian
Journal of Psychiatry/Revue Canadienne de Psychiatrie 59(12), 649–654.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying
events. Psychological Review 106(1), 119–159.
Levitin, D. J., & Tirovolas, A. K. (2009). Current advances in the cognitive neuroscience of music.
Annals of the New York Academy of Sciences 1156, 211–231.
Limb, C. J. (2006). Structural and functional neural correlates of music perception. The Anatomical
Record Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology 288, 435–446.
McAuley, J. D., & Jones, M. R. (2003). Modeling effects of rhythmic context on perceived duration:
A comparison of interval and entrainment approaches to short-interval timing. Journal of
Experimental Psychology: Human Perception and Performance 29(6), 1102–1125.
Maratos, A. S., Gold, C., Wang, X., & Crawford, M. J. (2008). Music therapy for depression.
Cochrane Database of Systematic Reviews 1, CD004517.
Marder, S. R. (2006). Initiatives to promote the discovery of drugs to improve cognitive function in
severe mental illness. Journal of Clinical Psychiatry 67(7), e03.
Miller, J. E., Carlson, L. A., & McAuley, J. D. (2013). When what you hear influences when you see:
Listening to an auditory rhythm influences the temporal allocation of visual attention.
Psychological Science 24(1), 11–18.
Mossler, K., Chen, X., Heldal, T. O., & Gold, C. (2011). Music therapy for people with schizophrenia
and schizophrenia-like disorders. Cochrane Database of Systematic Reviews 4, CD004025.
Müller, T. (2002). Drug treatment of non-motor symptoms in Parkinson’s disease. Expert Opinion in
Pharmacotherapy 3(4), 381–388.
Müller, T. (2012). Drug therapy in patients with Parkinson’s disease. Translational
Neurodegeneration 1, 10. Retrieved from https://doi.org/10.1186/2047-9158-1-10
Münte, T. F., Altenmüller, E., & Jäncke, L. (2002). The musician’s brain as a model of
neuroplasticity. Nature Reviews Neuroscience 3, 473–478.
Nayak, S., Wheeler, B. L., Shiflett, S. C., & Agostinelli, S. (2000). Effect of music therapy on mood
and social interaction among individuals with acute traumatic brain injury and stroke.
Rehabilitation Psychology 45(3), 274–283.
Patel, A. D. (2010). Music, biological evolution, and the brain. In M. Bailar (Ed.), Emerging
disciplines (pp. 91–144). Houston, TX: Rice University Press.
Peretz, I. (2006). The nature of music from a biological perspective. Cognition 100(1), 1–32.
Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of
Psychology 56, 89–114.
Peterson, D. A., & Thaut, M. H. (2007). Music increases frontal EEG coherence during verbal
learning. Neuroscience Letters 412(3), 217–221.
Podd, M. H. (2012). History of cognitive remediation. In M. H. Podd, Cognitive remediation for
brain injury and neurological illness (pp. 1–4). New York: Springer.
Prigatano, G. P. (1997). Learning from our successes and failures: Reflections and comments on
“Cognitive rehabilitation: How it is and how it might be.” Journal of the International
Neuropsychological Society 3(5), 497–499.
Raglio, A., Attardo, L., Gontero, G., Rollino, S., Groppo, E., & Granieri, E. (2015). Effects of music
and music therapy on mood in neurological patients. World Journal of Psychiatry 5(1), 68–78.
Raskin, S. A. (2010). Current approaches to cognitive rehabilitation. In C. Armstrong & L. Morrow
(Eds.), Handbook of medical neuropsychology (pp. 505–518). New York: Springer.
Rohling, M. L., Faust, M. E., Beverly, B., & Demakis, G. (2009). Effectiveness of cognitive
rehabilitation following acquired brain injury: A meta-analytic re-examination of Cicerone et al.’s
(2000, 2005) systematic reviews. Neuropsychology 23(1), 20–39.
Rund, B. R., & Borg, N. E. (1999). Cognitive deficits and cognitive training in schizophrenic
patients: A review. Acta Psychiatrica Scandinavica 100(2), 85–95.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and
the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Särkämö, T., Altenmüller, E., Rodríguez-Fornells, A., & Peretz, I. (2016). Editorial: Music, brain,
and rehabilitation: Emerging therapeutic applications and potential neural mechanisms. Frontiers
in Human Neuroscience 10, 103. Retrieved from https://doi.org/10.3389/fnhum.2016.00103
Särkämö, T., Ripollés, P., Vepsäläinen, H., Autti, T., Silvennoinen, H. M., Salli, E., … Rodríguez-
Fornells, A. (2014). Structural changes induced by daily music listening in the recovering brain
after middle cerebral artery stroke: A voxel-based morphometry study. Frontiers in Human
Neuroscience 8, 245. Retrieved from https://doi.org/10.3389/fnhum.2014.00245
Särkämö, T., Tervaniemi, M., & Huotilainen, M. (2013). Music perception and cognition:
Development, neural basis, and rehabilitative use of music. Wiley Interdisciplinary Reviews:
Cognitive Science 4(4), 441–451.
Särkämö, T., Tervaniemi, M., Laitinen, S., Forsblom, A., Soinila, S., Mikkonen, M., … Hietanen, M.
(2008). Music listening enhances cognitive recovery and mood after middle cerebral artery stroke.
Brain 131(3), 866–876.
Särkämö, T., Tervaniemi, M., Laitinen, S., Numminen, A., Kurki, M., Johnson, J. K., & Rantanen, P.
(2014). Cognitive, emotional, and social benefits of regular musical activities in early dementia:
Randomized controlled study. Gerontologist 54(4), 634–650.
Sarnthein, J., Vonstein, A., Rappelsberger, P., Petsche, H., Rauscher, F. H., & Shaw, G. L. (1997).
Persistent patterns of brain activity: An EEG coherence study of the positive effect of music on
spatial-temporal reasoning. Neurological Research 19(2), 107–116.
Schlaug, G. (2009). Part VI introduction: Listening to and making music facilitates brain recovery
processes. Annals of the New York Academy of Sciences 1169, 372–373.
Schutz, L. E., & Trainor, K. (2007). Evaluation of cognitive rehabilitation as a treatment paradigm.
Brain Injury 21(6), 545–557.
Shaw, G. L., & Bodner, M. (1999). Music enhances spatial-temporal reasoning: Towards a
neurophysiological basis using EEG. Clinical Electroencephalography 30(4), 151–155.
Sihvonen, A. J., Särkämö, T., Leo, V., Tervaniemi, M., Altenmüller, E., & Soinila, S. (2017). Music-
based interventions in neurological rehabilitation. Lancet Neurology 16(8), 648–660.
Simmons-Stern, N. R., Budson, A. E., & Ally, B. A. (2010). Music as a memory enhancer in patients
with Alzheimer’s disease. Neuropsychologia 48(10), 3164–3167.
Sohlberg, M. M., & Mateer, C. A. (2001). Cognitive rehabilitation: An integrative
neuropsychological approach. New York: Guilford Press.
Spring, B. J., & Ravdin, L. (1992). Cognitive remediation in schizophrenia: Should we attempt it?
Schizophrenia Bulletin 18(1), 15–20.
Sun, J.-H., Tan, L., & Yu, J.-T. (2014). Post-stroke cognitive impairment: Epidemiology, mechanisms
and management. Annals of Translational Medicine 2(8), 80.
Sutoo, D., & Akiyama, K. (2004). Music improves dopaminergic neurotransmission: Demonstration
based on the effect of music on blood pressure regulation. Brain Research 1016(2), 255–262.
Thaut, M. H. (2005a). Rhythm, music and the brain: Scientific foundations and clinical applications.
New York: Routledge.
Thaut, M. H. (2005b). Neurologic music therapy in cognitive rehabilitation. In M. Thaut, Rhythm,
music and the brain: Scientific foundations and clinical applications (pp. 179–202). New York:
Routledge.
Thaut, M. H. (2005c). Neurologic music therapy in sensorimotor rehabilitation. In M. Thaut, Rhythm,
music and the brain: Scientific foundations and clinical applications (pp. 137–164). New York:
Routledge.
Thaut, M. H. (2005d). Neurologic music therapy in speech and language rehabilitation. In M. Thaut,
Rhythm, music and the brain: Scientific foundations and clinical applications (pp. 165–178). New
York: Routledge.
Thaut, M. H. (2010). Neurologic music therapy in cognitive rehabilitation. Music Perception 27(4),
281–285.
Thaut, M. H. (2014). Assessment and the transformational design model (TDM). In M. H. Thaut &
V. Hoemberg (Eds.), Handbook of neurologic music therapy (pp. 60–68). Oxford: Oxford
University Press.
Thaut, M. H. (2015). Music as therapy in early history. Progress in Brain Research 217, 143–158.
Thaut, M. H., Gardiner, J. C., Holmberg, D., Horwitz, J., Kent, L., Andrews, G., Donelan, B., &
McIntosh, G. C. (2009). Neurologic music therapy improves executive function and emotional
adjustment in traumatic brain injury rehabilitation. Annals of the New York Academy of Sciences
1169, 406–416.
Thaut, M. H., & Hoemberg, V. (Eds.). (2014). Handbook of neurologic music therapy. Oxford:
Oxford University Press.
Thaut, M. H., McIntosh, G. C., & Hoemberg, V. (2014). Neurologic music therapy: From social
science to neuroscience. In M. H. Thaut & V. Hoemberg (Eds.), Handbook of neurologic music
therapy (pp. 1–6). Oxford: Oxford University Press.
Thaut, M. H., Peterson, D. A., & McIntosh, G. C. (2005). Temporal entrainment of cognitive
functions: Musical mnemonics induce brain plasticity and oscillatory synchrony in neural
networks underlying memory. Annals of the New York Academy of Sciences 1060, 243–254.
Thaut, M. H., Peterson, D. A., McIntosh, G. C., & Hoemberg, V. (2014). Music mnemonics aid
verbal memory and induce learning: Related brain plasticity in multiple sclerosis. Frontiers in
Human Neuroscience 8, 395. Retrieved from https://doi.org/10.3389/fnhum.2014.00395
Thompson, R. G., Moulin, C. J., Hayre, S., & Jones, R. W. (2005). Music enhances category fluency
in healthy older adults and Alzheimer’s disease patients. Experimental Aging Research 31(1), 91–
99.
Volpe, B. T., & McDowell, F. H. (1990). The efficacy of cognitive rehabilitation in patients with
traumatic brain injury. Archives of Neurology 47, 220–222.
Wallace, W. T. (1994). Memory for music: Effect of melody on recall of text. Journal of
Experimental Psychology: Learning, Memory, & Cognition 20(6), 1471–1485.
Wan, C. Y., & Schlaug, G. (2010). Music making as a tool for promoting brain plasticity across the
life span. Neuroscientist 16(5), 566–577.
Wolfe, D. E., & Hom, C. (1993). Use of melodies as structural prompts for learning and retention of
sequential verbal information by preschool students. Journal of Music Therapy 30(2), 100–118.
Wykes, T., Huddy, V., Cellard, C., McGurk, S. R., & Czobor, P. (2011). A meta-analysis of cognitive
remediation for schizophrenia: Methodology and effect sizes. American Journal of Psychiatry
168(5), 472–485.
CHAPTER 31

MUSICAL DISORDERS

ISABELLE ROYAL, SÉBASTIEN PAQUETTE, AND PAULINE TRANCHANT

The vast majority of people choose to experience music on a daily basis. Part of the reason is that music is known to regulate our moods and can
produce positive emotions through the sensory and cognitive experience it
provides (Lonsdale & North, 2011; Salimpoor et al., 2013). Music is also an
experience that can be shared socially through dancing or singing in
synchrony with others (Wiltermuth & Heath, 2009). Although music
listening habits have evolved throughout the years with the rise of various
audio devices, the appeal of music is not a new phenomenon. Music has
been documented in nearly every culture over time, which makes musical
engagement a fundamental and universal human trait (Merriam, 1964;
Peretz, 2006). Indeed, the human brain is able to track musical changes
such as pitch and time variations just a few hours after birth, indicating that
it is equipped with the necessary neural architecture to naturally acquire
musical abilities during early development (Peretz, 2002; Peretz &
Coltheart, 2003; Trehub, 2001; Zatorre & Peretz, 2001).
Despite the universality of music, a minority of individuals present with
very specific musical perception deficits that cannot be attributed to general
auditory dysfunction, intellectual disability, or a lack of musical exposure
(Ayotte, Peretz, & Hyde, 2002). Cases where these deficits are present from
birth are often referred to as “congenital amusia” (Peretz, 2001; Peretz et
al., 2002; Peretz, Cummings, & Dubé, 2007). Congenital amusia acts as an
umbrella term to designate both a “pitch-based” amusia, an inability to
process subtle pitch changes, and a “beat finding” disorder, an inability to
synchronize to music (Phillips-Silver et al., 2011; Sowiński & Dalla Bella,
2013; Tranchant, Vuvan, & Peretz, 2016). In contrast, acquired amusia
refers to the development of similar symptoms following a neurological
event (e.g., stroke or accident).
The goal of this chapter is to provide an overview of these intriguing
musical disorders and to demonstrate how they represent a unique
opportunity to study not only normal brain function, but also to isolate the
specific brain areas that play a role in musical processing. The first sections
of this chapter will focus on pitch-based amusia, beat finding disorder,
acquired amusia as well as musical anhedonia. The last section will be
dedicated to a discussion highlighting what the study of these musical
disorders has taught us about normal brain function.

PITCH-BASED AMUSIA

Prevalence and Behavioral Markers


The most prevalent musical disorder is congenital and specifically affects
the perception of pitch. Congenital amusia affects approximately 1.5
percent of the population, with no discernible difference in prevalence
between women and men (Peretz & Vuvan, 2017). Behaviorally, individuals
with congenital amusia are distinct from unaffected individuals because
they have difficulty singing in-tune, detecting singing that is out-of-tune
(including their own), and identifying a familiar song without lyrics (Ayotte
et al., 2002; Peretz, Champod, & Hyde, 2003). They also struggle with
maintaining short melodies in working memory (Ayotte et al., 2002). These
behavioral markers are thought to be the result of an impairment that affects
the processing of fine pitch variations that is central to the emergence of the
musical deficits observed in congenital amusia (Hyde & Peretz, 2004).
Indeed, congenital amusics cannot reliably detect pitch deviations that are smaller than two semitones, whereas non-amusics can reliably detect far finer differences (Hyde & Peretz, 2004). One semitone (100 cents on the logarithmic scale of musical intervals) is
equivalent to the distance between two consecutive piano keys. Because
Western music often uses pitch variations that are below the detection
threshold of congenital amusics (i.e., one semitone), essential parts of the
musical structure are often missed. This also implies that amusics
occasionally fail to identify notes that violate tonal regularity, explaining
why they struggle so much with detecting out-of-tune singing. Given that
these deficits are hallmark behavioral manifestations of congenital amusia,
they are often the focus of diagnostic tools used to identify individuals
affected by this disorder.
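To make these interval sizes concrete, the relation between a frequency ratio and its size in cents (a standard psychoacoustic convention, not a formula given in the studies cited above) is

\[ \text{cents} = 1200 \times \log_{2}\!\left(\frac{f_{2}}{f_{1}}\right), \]

so one semitone (100 cents) corresponds to a frequency ratio of \(2^{100/1200} \approx 1.059\). On this scale, the two-semitone detection threshold reported for congenital amusics amounts to roughly a 12 percent change in frequency (for example, from A4 at 440 Hz to approximately 494 Hz), whereas an eighth of a tone (25 cents) corresponds to a change of only about 1.5 percent.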

Identification of Congenital Amusia


The most widely used quantitative tool to identify individuals with
congenital amusia is the Montreal Battery of Evaluation of Amusia
(MBEA) (Peretz et al., 2003). It assesses several aspects of auditory
musical perception, such as tonal knowledge, temporal (rhythm) processing,
musical working memory, and musical recognition abilities. It also allows
for the comparison of an individual’s profile to that of an amusic
population. Given that the core deficit of congenital amusia relates to an
impaired perception of pitch structure, an individual must perform below
the cut-off scores for tasks requiring the detection of melodic key violations
to be considered amusic. In addition, other confounding factors must first
be excluded for the diagnosis to stand, such as a low cognitive potential,
abnormal hearing abilities, and a history of traumatic brain injury. More
recently, the Montreal Protocol for Identification of Amusia (MPIA) was
published to introduce a full evaluation protocol through which congenital
amusics can be effectively identified (Vuvan et al., 2017). It includes the
MBEA, questionnaires, and a description of the testing of relevant
exclusion criteria.
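Purely as an illustration of the decision logic just described, the screening rule (below-cutoff performance on the key-violation task, with confounds excluded) can be sketched in a few lines of Python; the cutoff value and field names here are hypothetical placeholders, not the published MBEA norms:

```python
from dataclasses import dataclass

@dataclass
class MBEAProfile:
    """Hypothetical container for an individual's screening data."""
    scale_score: float      # score on the key-violation (Scale) task
    scale_cutoff: float     # normative cut-off for that task (placeholder)
    normal_hearing: bool    # audiometry within normal limits
    normal_cognition: bool  # no low cognitive potential
    no_brain_injury: bool   # no history of traumatic brain injury

def consistent_with_amusia(p: MBEAProfile) -> bool:
    """Apply the rule described in the text: below-cutoff performance on
    the key-violation task, with the usual confounds excluded."""
    confounds_excluded = (p.normal_hearing and p.normal_cognition
                          and p.no_brain_injury)
    return p.scale_score < p.scale_cutoff and confounds_excluded

# Hypothetical participant: 21/30 against an illustrative cutoff of 22
participant = MBEAProfile(scale_score=21, scale_cutoff=22,
                          normal_hearing=True, normal_cognition=True,
                          no_brain_injury=True)
print(consistent_with_amusia(participant))  # True
```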
Neurological Markers
Although identifying congenital amusics at the behavioral level is
unquestionably useful to characterize how they perceive music, it is only
the first step towards understanding the etiology of the disorder. What
characterizes congenital amusia at the functional level is a dissociation
between what is perceived and what is consciously detected, a phenomenon
known as perception without awareness. Indeed, although congenital
amusics fail to reliably detect pitch changes that are smaller than two
semitones, their brain is able to process changes that are as small as an
eighth of a tone (Hyde, Zatorre, & Peretz, 2011; Moreau, Jolicœur, &
Peretz, 2009; Peretz, Brattico, Järvenpää, & Tervaniemi, 2009). What
makes congenital amusics different from non-amusics is that, although their
auditory system clearly detects small pitch differences, this information
does not appear to reach higher-order brain areas that are responsible for the
conscious perception of these differences.
Early neuroimaging studies using electroencephalography (EEG)
demonstrated that the brain of both congenital amusics and healthy controls
can detect small pitch variations presented within a string of repeated
sounds (Moreau et al., 2009; Peretz, Brattico, & Tervaniemi, 2005). This
detection is revealed by the presence of a specific EEG component that is
associated with the automatic detection of pitch variations, the mismatch
negativity (MMN). Both the MMN and the N1 (another EEG component
that is elicited by all auditory stimuli) of amusics are comparable to those of
control subjects (although see Albouy et al., 2013). Furthermore, when
presented with tones in a melodic context, the brain of congenital amusics
also exhibits a normal early right anterior negativity (ERAN; Koelsch,
2011; Koelsch & Siebel, 2005) in response to the violation of more
complex auditory sequences based on the Western tonal system. In marked
contrast, however, the P3b and the P600 components, associated with the
conscious detection of a deviant tone, are significantly altered in congenital amusics. Because the P3b is generally absent in response to pitch variations that are smaller than one semitone in non-amusics (Peretz et al., 2005), it was suggested that its absence in amusics might reflect a dysfunction of the
mechanisms that would normally underlie the conscious perception of the
deviation.
These EEG findings supporting the idea of perception without
awareness are also consistent with results obtained using functional
magnetic resonance imaging (fMRI). For instance, a positive and linear
increase in the blood-oxygen-level-dependent (BOLD) response was found
when congenital amusics listened to melodies composed of pure tones
varying from zero to two semitones, indicating that their brain can indeed
track very small variations despite the absence of any conscious perception
(Hyde et al., 2011).

Contribution of the Right Frontotemporal Network
Compared to the normal brain, several areas of the congenital amusic’s
brain present both structural and functional anomalies that are thought to
underlie the manifestation of congenital amusia. These anomalies are
primarily found in the right frontotemporal network, encompassing the right
inferior frontal gyrus (IFG; Brodmann area (BA) 44, 45, 47), the superior
temporal gyrus (STG; BA 22), and the right arcuate fasciculus. From a
structural standpoint, the congenital amusic brain displays decreased white
matter concentration and increased gray matter concentration in the right
IFG (Albouy et al., 2013; Hyde, Zatorre, Griffiths, Lerch, & Peretz, 2006;
Hyde et al., 2007), gray matter morphological alterations in the right STG
(Albouy et al., 2013; Hyde et al., 2007), and a reduced number of white
matter fibers in the right arcuate fasciculus (Loui, Alsop, & Schlaug, 2009;
Wilbiks, Vuvan, Girard, Peretz, & Russo, 2016). Furthermore, there is
evidence supporting a reduction in connectivity between the right IFG and
right STG (Albouy et al., 2013; Albouy, Mattout, Sanchez, Tillmann, &
Caclin, 2015; Hyde et al., 2011; Leveque et al., 2016), in addition to
evidence supporting an increase in connectivity between the right and left
STG (Albouy et al., 2015; Hyde et al., 2011). Taken together, these results
suggest a disturbance in the recurrent processing between the right IFG and
the right STG, which is believed to underlie the manifestation of amusia
(Fig. 1) (Peretz, 2016). According to this hypothesis, the IFG—whose role
is to amplify and refine the auditory signal from the STG (Opitz, Rinne,
Mecklinger, von Cramon, & Schröger, 2002)—is unable to provide
adequate top-down modulation of the signal processed in the right STG,
which significantly hampers the amusic’s ability to consciously detect
subtle auditory changes in the environment.

FIGURE 1. Anomalous recurrent processing in the right frontotemporal network.


Reprinted from Trends in Cognitive Sciences, 20(11), Isabelle Peretz, Neurobiology of
congenital amusia, pp. 857–67, doi.org/10.1016/j.tics.2016.09.002, Copyright © 2016
Elsevier Ltd. All rights reserved, with permission from Elsevier.

This first section described congenital amusia in terms of its distinguishing neuropsychological features and presented the specific
behavioral and neurological anomalies that characterize it. At the end of the
chapter, we will further discuss how our understanding of brain function has
evolved in light of what we have learned from studying congenital amusia.
The following section will focus on a different musical disorder that affects
an individual’s ability to properly synchronize with music.

BEAT FINDING DISORDER

Dance is a universal phenomenon that has been documented throughout history and found across different cultures (e.g., Nettl, 2000). Dancing is
likely to have resulted from evolutionary forces, as it may play a role in
sexual attraction (Darwin, 1871; Neave et al., 2011) and in the development
of pro-social skills (e.g., Cirelli, Einarson, & Trainor, 2014). Yet, anecdotes
of people having poor rhythmic skills, or “two left feet,” are frequent. The
study of a new form of congenital amusia, a beat finding disorder (also
known as beat-deafness) that affects time perception, rather than pitch
perception, has emerged in recent years. Phillips-Silver et al. (2011) first
reported the case of Mathieu, a university student experiencing difficulty to
match the tempo of a dance-like bouncing movement (bending the knees up
and down) with music. This rhythmic difficulty could not be explained by
other cognitive, pitch-processing, or motor deficits. Perhaps cases like
Mathieu’s, however, should not be so surprising: moving in synchrony with a
beat, despite its apparent simplicity, is indeed a sophisticated behavior. To
properly perform this behavior, one must be able to predict when the next
beat will occur; indeed, humans generally attempt to anticipate tones—
rather than react to them—when synchronizing to a metronome (for a
review, see Repp, 2005). In addition, complex musical forms are usually
characterized by the absence of a one-to-one correspondence between tones
and beats. For example, in syncopated rhythms, which are typical of jazzier
musical styles, beats can occur on silences. Therefore, beat perception in
the context of music requires the listener to infer the timing of the beats
(Honing, 2013). Several other individuals with a profile comparable to
Mathieu have been identified since 2011 (four in Sowiński & Dalla Bella,
2013; one in addition to Mathieu in Palmer, Lidji, & Peretz, 2014; fourteen
in Tranchant et al., 2016; and subject LV in Bégel et al., 2017). These
individuals offer a unique opportunity to study the neural mechanisms that
are essential to beat perception and synchronization (e.g., Paquette et al.,
2017), and to provide insight into the neurobiological origins of dance.
In a follow-up study with Mathieu, his beat prediction abilities were
assessed with a task that required him to tap in synchrony with a
metronome (Palmer et al., 2014). Mathieu’s taps preceded the tone onset by
approximately 30 milliseconds, which was comparable with what was
observed in non-amusics. However, when experimenters introduced
unpredictable temporal changes in the metronome sequences, Mathieu
required a larger number of taps to return to baseline compared with non-
amusics. This finding led the authors to conclude that a deficient
perception–action coupling likely underlay his beat finding deficit. Such a
hypothesis raises the question of whether poor synchronization with music
is always accompanied by poor beat perception, or rather if it could be due
to poor coupling between (normal) perception and movement. Phillips-
Silver et al. (2011) had previously observed that Mathieu had difficulty in
judging whether the underlying pattern of strong and weak beats in short
piano melodies corresponded to a march (One-two-One-two) or to a waltz
(One-two-three-One-two-three; Meter Test of the MBEA, see Peretz et al.,
2003). It was therefore concluded that Mathieu’s inability to detect an
underlying beat in a musical context likely stemmed from poor beat
perception abilities. In contrast, Sowiński and Dalla Bella (2013) identified
two individuals with significant beat finding difficulties in a musical
context in the absence of any deficits on two rhythm perception tasks,
suggesting that poor synchronization can be dissociated from poor
perception. However, Tranchant and Vuvan (2015) noted that beat
perception is not necessary to perform the perceptual tasks used in Sowiński
and Dalla Bella (2013). It thus remains unclear whether beat perception was
truly unimpaired in these cases. Therefore, future studies should attempt to
dissociate beat perception abilities from synchronization abilities.
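To illustrate the kind of measure at stake in these tapping studies, the mean signed asynchrony between taps and metronome onsets (the quantity behind the 30-millisecond anticipation figure mentioned above) can be computed as follows; the tap and onset values below are invented for illustration and are not data from Palmer et al. (2014):

```python
import numpy as np

def mean_asynchrony(tap_times, onset_times):
    """Mean signed asynchrony (s): negative values mean taps precede onsets.

    Pairs each metronome onset with the nearest tap, a common convention
    in sensorimotor synchronization analyses (cf. Repp, 2005).
    """
    taps = np.asarray(tap_times)
    asynchronies = []
    for onset in onset_times:
        nearest_tap = taps[np.argmin(np.abs(taps - onset))]
        asynchronies.append(nearest_tap - onset)
    return float(np.mean(asynchronies))

# Hypothetical 600-ms metronome with taps arriving roughly 30 ms early
onsets = np.arange(0.0, 6.0, 0.6)
taps = onsets - 0.03 + np.random.normal(0.0, 0.01, size=onsets.size)
print(f"mean asynchrony: {mean_asynchrony(taps, onsets) * 1000:.1f} ms")
```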
The study of beat finding disorder is still in its infancy. In particular, the
brain correlates of the disorder have not yet been thoroughly investigated.
Much of what is known so far was provided by Mathias and colleagues
(Mathias, Lidji, Honing, Palmer, & Peretz, 2016), who showed that the
MMN following beat omissions in rhythmical patterns was normal in
Mathieu, suggesting that the pre-attentive processing of beat irregularities is
preserved. However, Mathieu’s event-related potentials lacked a P3b
component in response to beat omissions, which, as mentioned earlier, is a
key component associated with the conscious detection of deviant auditory
stimuli. Future research should harness recently developed EEG techniques
(e.g., Nozaradan, Peretz, & Mouraux, 2012) to investigate neuronal
entrainment to a beat in the beat finding disorder. New cases like Mathieu’s have been identified over the last few years, opening the door to neuroimaging investigations of the brain region and/or network anomalies associated with the beat finding disorder.
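The frequency-tagging logic behind such EEG techniques is, in essence, to look for a spectral peak in the EEG at the beat frequency that stands out from neighboring frequency bins. A minimal sketch of that idea, using a synthetic signal rather than real EEG and invented parameter values, might look like this:

```python
import numpy as np

fs = 250.0                    # sampling rate (Hz), illustrative
beat_hz = 2.0                 # hypothetical beat frequency (120 bpm)
t = np.arange(0, 60, 1 / fs)  # 60 s of "recording"

# Synthetic "EEG": noise plus a small oscillation at the beat frequency
signal = np.random.normal(0, 1, t.size) + 0.3 * np.sin(2 * np.pi * beat_hz * t)

# Amplitude spectrum via FFT
spectrum = np.abs(np.fft.rfft(signal)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# Compare amplitude at the beat frequency against the mean of nearby bins,
# the usual way entrainment is quantified in frequency tagging
beat_bin = np.argmin(np.abs(freqs - beat_hz))
neighbors = np.r_[spectrum[beat_bin - 12:beat_bin - 2],
                  spectrum[beat_bin + 3:beat_bin + 13]]
print(f"beat amplitude: {spectrum[beat_bin]:.4f}, "
      f"noise floor: {neighbors.mean():.4f}")
```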

ACQUIRED AMUSIA
The term “amusia” is often used to refer to its developmental form (i.e.,
congenital amusia), as previously described in this chapter. However,
amusia can also be acquired, often following a significant neurological
event (e.g., brain trauma or stroke). Because the severity and the location of
brain insults are typically unique to each individual, individuals with
acquired amusia present with highly individual manifestations, often not limited
to a single aspect of music processing. As a result, the scientific literature
on acquired amusia has documented a wide array of music deficits (see
Clark, Golden, & Warren, 2015 for a review). Although studying both
congenital and acquired amusia allows for a better understanding of how
music is processed by the brain, the study of acquired amusia is uniquely
positioned to inform us on deficient music processing networks resulting
from a neurological insult. A review of several acquired amusia case studies
suggests that lesions producing deficits in musical perception are
predominantly located in the right hemisphere (Stewart, von Kriegstein,
Warren, & Griffiths, 2006). However, some exceptions do exist. For
instance, there are reports describing acquired music perception
impairments that occurred following lesions to both hemispheres (Ayotte,
Peretz, Rousseau, Bard, & Bojanowski, 2000; Liégeois-Chauvel, Peretz,
Babaï, Laguitton, & Chauvel, 1998; Peretz, 1990; Schuppert, Münte,
Wieringa, & Altenmüller, 2000).

Acquired Pitch Perception Deficits


As previously discussed, the detection of subtle pitch variations is essential
for the perception of music, and a reduced awareness of those variations is
at the core of what is generally described as “amusia.” Individual cases of
acquired amusia usually have in common lesions that encompass the
anterior and middle portions of the right superior temporal gyrus (STG) and
the insula (e.g., Ayotte et al., 2000; Hochman & Abrams, 2014; Peretz et
al., 1994). Importantly, Ayotte and collaborators (2000) demonstrated that
in some cases, the lesions can selectively impair pitch processing while
preserving temporal (rhythm and meter) perception, suggesting that distinct
neural substrates may underlie both deficits.
Acquired amusia has also been documented in patients who underwent
surgery for intractable epilepsy. For example, Liégeois-Chauvel and
collaborators (1998) demonstrated that right hemisphere resections
including the STG impaired the processing of both contour and pitch intervals, whereas left hemisphere resections primarily affected the processing of pitch intervals. Furthermore, Peretz and collaborators (1994) showed that when bilateral auditory lesions spared the primary auditory cortex, only melody perception impairments were observed, while rhythm perception abilities appeared to be preserved.

Acquired Time Perception Deficits


The study of individual cases of acquired amusia has indicated that time
perception can also be specifically impaired following brain injury. For
example, patients with lesions to their right anterior temporal lobe, planum
temporale, or insula, display impaired rhythm (or time interval) perception
even though their pitch interval perception is preserved (Confavreux,
Croisile, Garassus, Aimard, & Trillet, 1992; Fujii et al., 1990; Mendez &
Geehan, 1988).
Impaired rhythm perception has also been documented following an
anterior temporal resection that included the right STG (Liégeois-Chauvel
et al., 1998), whereas impaired meter processing has been reported
following resections of the anterior temporal lobe in both the right and left
hemispheres (Liégeois-Chauvel et al., 1998; Schuppert et al., 2000).
Given the different phenotypes of acquired amusia, the following section
will present acquired amusia as a broader musical deficit (as measured by
the global score of the MBEA).

Acquired Amusia: Larger Cohort Studies


Although case studies are quite useful for the investigation of the different
phenotypes of acquired amusia, larger cohort studies allow for the
investigation of what these different acquired amusia phenotypes have in
common. For instance, reports indicate that musical disorders are
commonly observed following a middle cerebral artery (MCA) stroke
(Ayotte et al., 2000; Schuppert et al., 2000) and symptoms often persist
beyond the acute phase of the stroke (Sihvonen, Ripollés, Rodríguez-
Fornells, Soinila, & Särkämö, 2017). In these patients, the severity of the
ensuing music processing deficit, as assessed by the MBEA, is often greater
when damage is localized in the right hemisphere (particularly in frontal
and temporal areas), compared with the effects of damage in the left
hemisphere (Särkämö et al., 2009).
Moreover, using voxel-based lesion-symptom mapping and voxel-based morphometry (VBM) of MRI data in a sample of 77 stroke patients (49 amusics),
a recent study was able to identify which brain structures were typically
damaged and which were typically preserved in acquired amusia (Sihvonen
et al., 2016). Based on their findings, the authors concluded that damage to
the right STG, middle temporal gyrus (MTG), insula, and putamen formed
the necessary neural substrate for acquired amusia. Furthermore, VBM
analyses revealed that patients with persistent amusia had gray matter
volume decreases in the right STG and MTG at the six-month follow-up
assessment, in addition to a white matter volume decrease in the right
MTG, when compared to control patients.
In a follow-up study with a larger sample size (N = 90), acquired amusia
was associated with an acute stage lesion pattern in the right temporal,
insular, and striatal areas (Sihvonen, Ripollés, Rodríguez-Fornells, et al.,
2017). Persistent amusia was associated with gray matter volume decreases
in right temporal areas. The decrease in volume was localized more
posteriorly in individuals with a pitch-based amusia and more anteriorly in
those with a beat finding disorder. One of the particularly novel findings of
the study was that more severe and persistent manifestations of amusia were
associated with a more widespread pattern of acute stage lesions and gray
matter volume changes in the right hemisphere, which encompassed not
only temporal, insular, and striatal areas, but also frontal, parietal, and
limbic areas.
Similar to congenital amusia, which is characterized by neural anomalies
that affect both functional and structural connectivity (Peretz, 2016),
acquired amusia has been associated with structural damage in multiple
white matter tracts, including the arcuate fasciculus, the inferior
longitudinal fasciculus, the uncinate fasciculus, the frontal aslant tract, the
corpus callosum, and the right inferior fronto-occipital fasciculus
(Sihvonen, Ripollés, Särkämö, et al., 2017). Furthermore, structural damage
to the right inferior fronto-occipital fasciculus was also found to be the best
predictor of global MBEA score. Taken together, these studies illustrate the
widespread brain areas that, when lesioned, can produce the symptoms
associated with acquired amusia. These areas not only include auditory
regions but also several regions involved in higher-order cognitive
processes, such as those found in the frontal and parietal cortex.

Acquired Amusia and its Comorbidities


Because acquired amusia typically results from an insult to one of several
cerebral structures, it is often comorbid with other conditions that affect
higher cognitive functions. For example, a review of case studies showed
that 55 percent of patients with acquired amusia also have at least some
form of language impairment (Stewart et al., 2006). Similarly, it was later
reported that individuals with acquired amusia consistently performed worse
than non-amusics on tests of verbal expression and comprehension
(Särkämö et al., 2009). These findings suggest a link between music and
language processing that could potentially be mediated by some other
cognitive processes. Furthermore, the latter study also showed an
association between acquired amusia and visuospatial processing (Särkämö
et al., 2009). Indeed, patients who showed contralesional spatial neglect in a
visuospatial attention task were also considered amusic based on their
global MBEA score. Interestingly, previous reports have also shown that
visuospatial deficits and mild neglect tend to co-occur with musical deficits
in right hemisphere-lesioned patients (Liégeois-Chauvel et al., 1998; Peretz,
1990), suggesting that music processing and spatial processing may be
subserved by common cognitive processes.
Finally, it is worth noting that acquired amusia has also been associated
with a wide range of other cognitive dysfunctions that include working
memory, verbal learning, executive functioning, and attention deficits
(Särkämö et al., 2009). The severity of the amusia (MBEA score),
therefore, is potentially exacerbated by such impairments, and consequently
not solely due to a dysfunction in brain areas that subserve music
processing (Särkämö et al., 2009).

MUSICAL ANHEDONIA

Although most people listen to music to help regulate their mood, individuals with a musical perception disorder may not experience this
emotional appeal to music (Lonsdale & North, 2011) despite the fact that
their ability to extract basic emotional information from music is relatively
preserved (Gosselin, Paquette, & Peretz, 2015). The majority of amusics do
not seek musical exposure and are generally found to be indifferent to it
(Omigie, Müllensiefen, & Stewart, 2012; Gosselin et al., 2015).
Two acquired amusia case studies suggest a possible dissociation
between the ability to perceive music and the pleasure derived from
listening to music. The first case, IR, is a patient who acquired amusia
following brain damage following cerebral aneurysms in both the left and
right middle cerebral arteries. Although IR could no longer discriminate
fine pitch variations, she still enjoyed music and noted often dancing to
music while doing chores (Peretz, Gagnon, & Bouchard, 1998). The second
case is a patient who acquired amusia following an ischemic stroke. He
subsequently reported having lost interest in music afterwards, as he no
longer felt emotionally engaged when listening to music (Hirel et al., 2014).
Although he could readily identify emotions in different sensory modalities
(music, faces), his emotional intensity judgments for music were attenuated
compared to those of his matched control. His lesions in the right hemisphere primarily encompassed the STG and the primary auditory cortex,
but also included portions of the ventral segment of the middle temporal
gyrus, amygdala, and insula.
In the general population, some individuals do not experience pleasure
when listening to music, while still being able to successfully extract its
emotional content (Mas-Herrero, Zatorre, Rodríguez-Fornells, & Marco-
Pallarés, 2014). This condition, known as musical anhedonia, is believed to
result from a disconnection between the perception of an emotion and
experiencing it.
The results of a recent study revealed that participants with musical
anhedonia show a selective reduction of brain activity in the nucleus
accumbens that was specific to music listening, whereas normal
activation patterns were observed during a monetary gambling task
(Martínez-Molina, Mas-Herrero, Rodríguez-Fornells, Zatorre, & Marco-
Pallarés, 2016). Furthermore, anhedonic participants were also found to
have decreased levels of functional connectivity between the right auditory
cortex and the ventral striatum compared to control participants.
Participants who displayed above average responses to music showed
enhanced connectivity between these same structures. Taken together, these
results suggest that musical anhedonia might result from dysfunctional
connectivity between auditory areas and the subcortical structures
implicated in the reward network.

WHAT HAVE WE LEARNED FROM STUDYING AMUSIA?

In this chapter, we examined the three most prevalent musical disorders: congenital amusia, beat finding disorder, and acquired amusia. The study of
these disorders has revealed not only the importance of several right hemisphere brain structures for the processing of pitch, but also that a dysfunction of any of these areas can produce pitch processing disorders.
Compared to congenital amusia, acquired amusia has a more heterogeneous
presentation and is often comorbid with other cognitive impairments. These
additional cognitive impairments need to be taken into consideration when
attempting to isolate brain structures that are critical for certain types of
musical processing. Although current evidence suggests a dissociation
between pitch processing disorders and beat finding disorders, additional
experiments are needed to paint a clearer picture of the neural structures
that underlie the latter type.

What Does Congenital Amusia Have in Common with Other Developmental Disorders?
The current understanding that amusia is a disorder stemming from
anomalous recurrent processing between a sensory core (i.e., auditory
cortex) and higher-order frontal areas (i.e., IFG) shares important
similarities with other neurodevelopmental disorders, such as
developmental dyslexia and congenital prosopagnosia (Paquette et al.,
2018). Developmental dyslexia is a disorder that affects the accuracy and
fluidity of reading (Lyon, Shaywitz, & Shaywitz, 2003), whereas congenital
prosopagnosia consists of an inability to recognize faces (Behrmann &
Avidan, 2005).
In developmental dyslexia, although phonetic representations appear to
be properly processed in the left STG, functional and structural connectivity
anomalies between the left STG and left IFG prevent dyslexic individuals
from consciously accessing those representations (Boets et al., 2013;
Ramus, 2014). Similarly, the right fusiform face area (FFA) shows normal
face selectivity in the majority of individuals suffering from prosopagnosia
(Avidan & Behrmann, 2009). However, a reduction in the concentration of
white matter fibers connecting the FFA to anterior frontal and temporal
regions has been observed in prosopagnosic individuals (Thomas et al.,
2008), suggesting that a transmission issue may also be at the root of the
disorder. The parallels that can be drawn between amusia and these
developmental disorders suggest that a key common neural feature
underlies their expression: a breakdown in the communication pathways
between a core sensory area and a higher-order brain region that are
required for the conscious detection of subtle perceptual differences.

Different Phenotypes of Amusia


Over the past decade, research on amusia has provided evidence that pitch-
based amusia is a musical disorder whose origin is acoustic in nature.
Indeed, a recent meta-analysis indicated that the processing deficit observed
in congenital amusia is both acoustic (i.e., sounds that are not presented in a
musical context such as a melody) and musical, thus suggesting that most
congenital amusic cases are likely the result of an acoustic deficit that
generates a musical deficit (Vuvan, Nunes-Silva, & Peretz, 2015). In fact, because the conscious detection threshold of amusics is greater than the
intervals used in Western music, amusics do not have access to all the
necessary information to perceive and produce music normally. However,
amusia case studies have indicated that a double dissociation exists between
acoustic and musical processing, suggesting that the presentation of the
disorder may be more heterogeneous than previously thought. This
dissociation is intriguing because it raises the question of whether both types
of processing stem from similar or distinct neural structures, and also
generates new testable hypotheses for future neuroimaging experiments.

Improving Perceptual Outcomes in Amusia


To this day, few studies have examined the potential benefit of
rehabilitation strategies to attenuate the perceptual deficits of amusia, with
only modest outcomes (though see Whiteford & Oxenham, 2018, for a positive
outcome). One promising avenue, however, is the use of neuromodulation
techniques such as transcranial magnetic stimulation (TMS) or transcranial
direct-current stimulation (tDCS). For instance, the application of tDCS has
been shown to improve certain aspects of reading in children and
adolescents suffering from dyslexia (Costanzo et al., 2016). Furthermore, it
was recently shown that transcranial alternating current stimulation (tACS)
can improve pitch memory in amusics (Schaal, Pfeifer, Krause, & Pollok,
2015). Because neuromodulation techniques can non-invasively modulate
the activity of a given region, they could be used to better characterize the
role of distinct cerebral networks in the expression of amusia, and to
determine if the stimulation of different brain regions can improve the
perceptual deficits observed in the different amusia phenotypes.

REFERENCES
Albouy, P., Mattout, J., Bouet, R., Maby, E., Sanchez, G., Aguera, P. E., … Tillmann, B. (2013).
Impaired pitch perception and memory in congenital amusia: The deficit starts in the auditory
cortex. Brain 136(5), 1639–1661.
Albouy, P., Mattout, J., Sanchez, G., Tillmann, B., & Caclin, A. (2015). Altered retrieval of melodic information in congenital amusia: Insights from dynamic causal modeling of MEG data. Frontiers in Human Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00020
Avidan, G., & Behrmann, M. (2009). Functional MRI reveals compromised neural integrity of the
face processing network in congenital prosopagnosia. Current Biology 19(13), 1146–1150.
Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia: A group study of adults afflicted with a
music-specific disorder. Brain 125(2), 238–251.
Ayotte, J., Peretz, I., Rousseau, I., Bard, C., & Bojanowski, M. (2000). Patterns of music agnosia
associated with middle cerebral artery infarcts. Brain 123(9), 1926–1938.
Bégel, V., Benoit, C.-E., Correa, A., Cutanda, D., Kotz, S. A., & Dalla Bella, S. (2017). “Lost in
time” but still moving to the beat. Neuropsychologia 94, 129–138.
Behrmann, M., & Avidan, G. (2005). Congenital prosopagnosia: Face-blind from birth. Trends in
Cognitive Sciences 9(4), 180–187.
Boets, B., Op de Beeck, H. P., Vandermosten, M., Scott, S. K., Gillebert, C. R., Mantini, D., …
Ghesquière, P. (2013). Intact but less accessible phonetic representations in adults with dyslexia.
Science 342(6163), 1251–1254.
Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial
behavior in infants. Developmental Science 17(6), 1003–1011.
Clark, C. N., Golden, H. L., & Warren, J. D. (2015). Acquired amusia. Handbook of Clinical
Neurology 129, 607–631.
Confavreux, C., Croisile, B., Garassus, P., Aimard, G., & Trillet, M. (1992). Progressive amusia and
aprosody. Archives of Neurology 49(9), 971–976.
Costanzo, F., Varuzza, C., Rossi, S., Sdoia, S., Varvara, P., Oliveri, M., … Menghini, D. (2016).
Reading changes in children and adolescents with dyslexia after transcranial direct current
stimulation. NeuroReport 27(5), 295–300.
Darwin, C. (1871). The descent of man, and selection in relation to sex. London: Murray.
Fujii, T., Fukatsu, R., Watabe, S. I., Ohnuma, A., Teramura, K., Kimura, I., … Kogure, K. (1990).
Auditory sound agnosia without aphasia following a right temporal lobe lesion. Cortex 26(2), 263–
268.
Gosselin, N., Paquette, S., & Peretz, I. (2015). Sensitivity to musical emotions in congenital amusia.
Cortex 71, 171–182.
Hirel, C., Lévêque, Y., Deiana, G., Richard, N., Cho, T.-H., Mechtouff, L., … Nighoghossian, N.
(2014). Amusie acquise et anhédonie musicale. Revue Neurologique 170(8–9), 536–540.
Hochman, M., & Abrams, K. (2014). Amusia for pitch caused by right middle cerebral artery infarct.
Journal of Stroke and Cerebrovascular Diseases 23(1), 164–165.
Honing, H. (2013). Structure and interpretation of rhythm in music. In D. Deutsch (Ed.), The
psychology of music (pp. 369–404). London: Academic Press.
Hyde, K. L., Lerch, J. P., Zatorre, R. J., Griffiths, T. D., Evans, A. C., & Peretz, I. (2007). Cortical
thickness in congenital amusia: When less is better than more. Journal of Neuroscience 27(47),
13028–13032.
Hyde, K. L., & Peretz, I. (2004). Brains that are out of tune but in time. Psychological Science 15(5),
356–360.
Hyde, K. L., Zatorre, R. J., Griffiths, T. D., Lerch, J. P., & Peretz, I. (2006). Morphometry of the
amusic brain: A two-site study. Brain 129(10), 2562–2570.
Hyde, K. L., Zatorre, R. J., & Peretz, I. (2011). Functional MRI evidence of an abnormal neural
network for pitch processing in congenital amusia. Cerebral Cortex 21(2), 292–299.
Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model.
Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00110
Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive
Sciences 9(12), 578–584.
Leveque, Y., Fauvel, B., Groussard, M., Caclin, A., Albouy, P., Platel, H., & Tillmann, B. (2016).
Altered intrinsic connectivity of the auditory cortex in congenital amusia. Journal of
Neurophysiology 116(1), 88–97.
Liégeois-Chauvel, C., Peretz, I., Babaï, M., Laguitton, V., & Chauvel, P. (1998). Contribution of
different cortical areas in the temporal lobes to music processing. Brain 121(10), 1853–1867.
Lonsdale, A. J., & North, A. C. (2011). Why do we listen to music? A uses and gratifications
analysis. British Journal of Psychology 102(1), 108–134.
Loui, P., Alsop, D., & Schlaug, G. (2009). Tone deafness: A new disconnection syndrome? Journal
of Neuroscience 29(33), 10215–10220.
Lyon, G. R., Shaywitz, S. E., & Shaywitz, B. A. (2003). A definition of dyslexia. Annals of Dyslexia
53(1), 1–14.
Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J.
(2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of
Sciences 113(46), E7337–E7345.
Mas-Herrero, E., Zatorre, R. J., Rodríguez-Fornells, A., & Marco-Pallarés, J. (2014). Dissociation
between musical and monetary reward responses in specific musical anhedonia. Current Biology
24(6), 699–704.
Mathias, B., Lidji, P., Honing, H., Palmer, C., & Peretz, I. (2016). Electrical brain responses to beat
irregularities in two cases of beat deafness. Frontiers in Neuroscience 10. Retrieved from
https://doi.org/10.3389/fnins.2016.00040
Mendez, M. F., & Geehan, G. R. (1988). Cortical auditory disorders: Clinical and psychoacoustic
features. Journal of Neurology, Neurosurgery, and Psychiatry 51(1), 1–9.
Merriam, A. P. (1964). The anthropology of music. Evanston, IL: Northwestern University Press.
Moreau, P., Jolicœur, P., & Peretz, I. (2009). Automatic brain responses to pitch changes in
congenital amusia. Annals of the New York Academy of Sciences 1169, 191–194.
Neave, N., McCarty, K., Freynik, J., Caplan, N., Honekopp, J., & Fink, B. (2011). Male dance moves
that catch a woman’s eye. Biology Letters 7(2), 221–224.
Nettl, B. (2000). An ethnomusicologist contemplates universals in musical sound and musical
culture. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 463–472).
Cambridge, MA: MIT Press.
Nozaradan, S., Peretz, I., & Mouraux, A. (2012). Selective neuronal entrainment to the beat and
meter embedded in a musical rhythm. Journal of Neuroscience 32(49), 17572–17581.
Omigie, D., Müllensiefen, D., & Stewart, L. (2012). The experience of music in congenital amusia.
Music Perception: An Interdisciplinary Journal 30(1), 1–18.
Opitz, B., Rinne, T., Mecklinger, A., von Cramon, D. Y., & Schröger, E. (2002). Differential
contribution of frontal and temporal cortices to auditory change detection: fMRI and ERP results.
NeuroImage 15(1), 167–174.
Palmer, C., Lidji, P., & Peretz, I. (2014). Losing the beat: Deficits in temporal coordination.
Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130405.
Paquette, S., Fujii, S., Li, H. C., & Schlaug, G. (2017). The cerebellum’s contribution to beat interval
discrimination. NeuroImage 163, 177–182.
Paquette, S., Li, H. C., Corrow, S. L., Buss, S. S., Barton, J., & Schlaug, G. (2018). Developmental
perceptual impairments: Cases when tone-deafness and prosopagnosia co-occur. Frontiers in Human Neuroscience 12, 438.
Peretz, I. (1990). Processing of local and global musical information by unilateral brain-damaged
patients. Brain 113(4), 1185–1205.
Peretz, I. (2001). Brain specialization for music. Annals of the New York Academy of Sciences 930,
153–165.
Peretz, I. (2002). Brain specialization for music. The Neuroscientist 8(4), 372–380.
Peretz, I. (2006). The nature of music from a biological perspective. Cognition 100(1), 1–32.
Peretz, I. (2016). Neurobiology of congenital amusia. Trends in Cognitive Sciences 20(11), 857–867.
Peretz, I., Ayotte, J., Zatorre, R. J., Mehler, J., Ahad, P., Penhune, V. B., & Jutras, B. (2002).
Congenital amusia: A disorder of fine-grained pitch discrimination. Neuron 33(2), 185–191.
Peretz, I., Brattico, E., Järvenpää, M., & Tervaniemi, M. (2009). The amusic brain: In tune, out of
key, and unaware. Brain 132(5), 1277–1286.
Peretz, I., Brattico, E., & Tervaniemi, M. (2005). Abnormal electrical brain responses to pitch in
congenital amusia. Annals of Neurology 58(3), 478–482.
Peretz, I., Champod, A. S., & Hyde, K. (2003). Varieties of musical disorders: The Montreal battery
of evaluation of amusia. Annals of the New York Academy of Sciences 999, 58–75.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6(7), 688–
691.
Peretz, I., Cummings, S., & Dubé, M. P. (2007). The genetics of congenital amusia (tone deafness):
A family-aggregation study. American Journal of Human Genetics 81(3), 582–588.
Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants,
immediacy, and isolation after brain damage. Cognition 68(2), 111–141.
Peretz, I., Kolinsky, R., Tramo, M., Labrecque, R., Hublet, C., Demeurisse, G., & Belleville, S.
(1994). Functional dissociations following bilateral lesions of auditory cortex. Brain 117(6), 1283–
1301.
Peretz, I., & Vuvan, D. (2017). Prevalence of congenital amusia. European Journal of Human
Genetics 25, 625–630.
Phillips-Silver, J., Toiviainen, P., Gosselin, N., Piché, O., Nozaradan, S., Palmer, C., & Peretz, I.
(2011). Born to dance but beat deaf: A new form of congenital amusia. Neuropsychologia 49(5),
961–969.
Ramus, F. (2014). Neuroimaging sheds new light on the phonological deficit in dyslexia. Trends in
Cognitive Sciences 18(6), 274–275.
Repp, B. H. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic
Bulletin & Review 12(6), 969–992.
Salimpoor, V. N., van den Bosch., I., Kovacevik, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J.
(2013). Interactions between the nucleus accumbens and auditory cortices predict music reward
value. Science 340(6129), 216–219.
Särkämö, T., Tervaniemi, M., Soinila, S., Autti, T., Silvennoinen, H. M., Laine, M., & Hietanen, M.
(2009). Cognitive deficits associated with acquired amusia after stroke: A neuropsychological
follow-up study. Neuropsychologia 47(12), 2642–2651.
Schaal, N. K., Pfeifer, J., Krause, V., & Pollok, B. (2015). From amusic to musical? Improving pitch
memory in congenital amusia with transcranial alternating current stimulation. Behavioural Brain
Research 294, 141–148.
Schuppert, M., Münte, T. F., Wieringa, B. M., & Altenmüller, E. (2000). Receptive amusia:
Evidence for cross-hemispheric neural networks underlying music processing strategies. Brain
123(3), 546–559.
Sihvonen, A., Ripollés, P., Leo, V., Rodríguez-Fornells, A., Soinila, S., & Särkämö, T. (2016). Neural
basis of acquired amusia and its recovery after stroke. Journal of Neuroscience 36(34), 8872–
8881.
Sihvonen, A., Ripollés, P., Rodríguez-Fornells, A., Soinila, S., & Särkämö, T. (2017). Revisiting the
neural basis of acquired amusia: Lesion patterns and structural changes underlying amusia
recovery. Frontiers in Neuroscience 11. Retrieved from https://doi.org/10.3389/fnins.2017.00426
Sihvonen, A., Ripollés, P., Särkämö, T., Leo, V., Rodríguez-Fornells, A., Saunavaara, J., … Soinila,
S. (2017). Tracting the neural basis of music: Deficient structural connectivity underlying acquired
amusia. Cortex 97, 255–273.
Sowiński, J., & Dalla Bella, S. (2013). Poor synchronization to the beat may result from deficient
auditory-motor mapping. Neuropsychologia 51(10), 1952–1963.
Stewart, L., von Kriegstein, K., Warren, J. D., & Griffiths, T. D. (2006). Music and the brain:
Disorders of musical listening. Brain 129(10), 2533–2553.
Thomas, C., Avidan, G., Humphreys, K., Jung, K. J., Gao, F., & Behrmann, M. (2008). Reduced
structural connectivity in ventral visual cortex in congenital prosopagnosia. Nature Neuroscience
12(1), 29–31.
Tranchant, P., & Vuvan, D. T. (2015). Current conceptual challenges in the study of rhythm
processing deficits. Frontiers in Neuroscience 9. Retrieved from
https://doi.org/10.3389/fnins.2015.00197
Tranchant, P., Vuvan, D. T., & Peretz, I. (2016). Keeping the beat: A large sample study of bouncing
and clapping to music. PloS ONE 11(7), e0160178.
Trehub, S. E. (2001). Musical predispositions in infancy. Annals of the New York Academy of
Sciences 930, 1–16.
Vuvan, D., Nunes-Silva, M., & Peretz, I. (2015). Meta-analytic evidence for the non-modularity of
pitch processing in congenital amusia. Cortex 69, 186–200.
Vuvan, D., Paquette, S., Mignault Goulet, G., Royal, I., Felezeu, M., & Peretz, I. (2017). The Montreal protocol for identification of amusia. Behavior Research Methods 50(2), 662–672.
Whiteford, K. L., & Oxenham, A. J. (2018). Learning for pitch and melody discrimination in
congenital amusia. Cortex 103, 164–178.
Wilbiks, J. M. P., Vuvan, D. T., Girard, P.-Y., Peretz, I., & Russo, F. A. (2016). Effects of vocal
training in a musicophile with congenital amusia. Neurocase 22(6), 526–537.
Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science 20(1), 1–
5.
Zatorre, R. J., & Peretz, I. (2001). The biological foundations of music. Annals of the New York
Academy of Sciences 930, 281–299.
CHAPTER 32

WHEN BLUE TURNS TO GRAY: THE ENIGMA OF MUSICIAN’S DYSTONIA

DAVID PETERSON AND ECKART ALTENMÜLLER

Dystonia
Musician’s dystonia is a particular form of dystonia. The concept of
dystonia has evolved over the past several decades. A recent update to its
definition reflects a multiyear effort to achieve some consensus on how it is
described:
Dystonia is a movement disorder characterized by sustained or intermittent muscle
contractions causing abnormal, often repetitive, movements, postures, or both. Dystonic
movements are typically patterned, twisting, and may be tremulous. Dystonia is often
initiated or worsened by voluntary action and associated with overflow muscle activation. 
(Albanese et al., 2013)

Dystonia can refer to characteristic symptoms that are secondary to a long list of other, mostly neurologic, disorders. Dystonia can also be the primary
disorder, referred to as “isolated dystonia.” References to dystonia in the
remainder of this chapter will refer to the primary, isolated form of
dystonia.
There are many forms of dystonia, most commonly differentially
characterized by their age of onset and the distribution of body regions that
are symptomatic. Dystonias with onset in childhood or adolescence tend to
be more general, involving several parts of the body and are more likely to
begin in the lower extremities. Adult onset dystonias are much more
common and typically involve only a single body region (“focal dystonia”)
or a small number of contiguous body regions (“segmental dystonia”). The
various focal dystonias share several characteristics, including a lack of
selectivity in attempts to perform specific movements, such as individual
finger movements in the case of focal hand dystonia (FHD). In some cases,
focal dystonias also exhibit undesirable co-contraction of antagonistic
muscles (Cohen & Hallett, 1988).

Musician’s Dystonia
Musician’s dystonia (MD) is a specific type of focal dystonia. Onset can
range from approximately 18 years old to the seventh decade, but most
commonly occurs in the mid-30s (Altenmüller, 2003; Brandfonbrener,
1995; Brandfonbrener & Robson, 2004; Chang & Frucht, 2013; Conti,
Pullman, & Frucht, 2008; Jankovic & Shale, 1989; Lederman, 1991;
Schuele & Lederman, 2004a). MD is one of the most perplexing forms of
dystonia. Documenting a cohort of over 590 MD patients diagnosed
between 1994 and 2007 at the Institute of Music Physiology and Musicians’
Medicine of the Hanover University of Music, Drama, and Media
(Altenmüller, Baur, Hofmann, Lim, & Jabusch, 2012; Jabusch &
Altenmüller, 2006a) it has been associated with almost every instrument
and in several body regions. In every case MD involves impaired voluntary
motor control while a musician is playing the instrument. The symptoms
generally appear during movements that are extensively trained. It can
affect control of facial, lip, and tongue muscles (“embouchure dystonia,”
Frucht et al., 2001), lower limbs, or, in the majority of patients, the muscles
controlling the arm or hand. MD involving the hand is a form of FHD.
MD is sometimes referred to as “musician’s cramp” because it is often
described in conjunction with a form of FHD called “writer’s cramp.”
However, the term “cramp” can be misleading; MD rarely involves the
maximum intensity contractions associated with cramps (Pesenti, Barbieri,
& Priori, 2004; Tubiana, 2000). In the hand, MD is usually associated with
loss of fine control and coordination most commonly in heterogeneous
subsets of digits 2–5 (Charness, Ross, & Shefner, 1996; Frucht, 2009b;
Furuya, Tominaga, Miyazaki, & Altenmüller, 2015; Jankovic & Ashoori,
2008; Jankovic & Shale, 1989). The relative amount of excessive finger
flexion or extension depends on the type of instrument (Frucht, 2009b;
Conti et al., 2008). Flexion is more common than extension. If multiple
fingers are involved, they are usually adjacent fingers. Patients report a
feeling of loss of automaticity in previously automatic music performance
(Frucht, 2009b). Several examples of abnormal postural configurations are
depicted in Fig. 1. MD is painless in most, but not all, patients (Jabusch &
Altenmüller, 2006b). Indeed, pain may suggest a diagnosis of repetitive strain
injury or occupational fatigue syndrome rather than MD.
FIGURE 1. Representative posture patterns in musician’s dystonia.
Reproduced from Hans-Christian Jabusch and Eckart Altenmüller, Epidemiology,
phenomenology, and therapy of musician’s cramp. In Eckart Altenmüller, Mario
Wiesendanger, and Jürg Kesselring (Eds.), Music, motor control, and the brain, p. 267,
Figure 17.2, doi:10.1093/acprof:oso/9780199298723.001.0001, Copyright © Oxford
University Press 2006, reproduced by permission of Oxford University Press.

MD is the form of focal hand dystonia with the highest prevalence (Altenmüller & Jabusch, 2009). Curiously, MD is about ten
times more prevalent than corresponding focal dystonias in the general
public. An estimated 1–2 percent of all musicians develop MD
(Altenmüller, 2003). MD is the performance-related medical problem that is
most likely to lead to long-term disability in musicians (Schuele &
Lederman, 2004b). Because treatments are frequently suboptimal and
usually incomplete, and because musicians’ identities are strongly
intertwined with their profession, news of the MD diagnosis can be
devastating. However, it should be mentioned that prognosis has improved
the last twenty years, and around 70 percent of musicians suffering from
focal dystonia remain in their professions (Lee, Eich, Ioannou, &
Altenmüller, 2015a).

Essential Characteristics of Musician’s Dystonia


Two characteristics of MD stand out: how symptoms are localized to a
specific body part and how symptoms exhibit task specificity. Fig. 2
illustrates how the left versus the right hand, and in some cases the facial
muscles, are differentially involved depending on the type of musical
instrument (Frucht, 2009a; Jabusch & Altenmüller, 2006b).
Instruments such as keyboards (piano, organ, harpsichord) and plucked
instruments (guitar, electric bass) are associated with MD predominantly in
the right hand. Bowed string instruments are associated with MD
predominantly in the left hand (Altenmüller et al., 2012; Jabusch &
Altenmüller, 2006b). It remains unclear to what extent the demands of
music performance on the particular type of instrument factor into this
lateral asymmetry. Many classical repertoires place great demands on both
hands at the keyboard, for example. Fine motor control in other activities of
daily living also contributes to susceptibility in the individual’s dominant
hand (Baur, Jabusch, & Altenmüller, 2011). Naturally, brass players are
most likely to exhibit embouchure dystonia. Also, brass players exhibit a
higher ratio of embouchure to hand dystonia than woodwind players. This
makes sense in light of the motor control demands of the instruments. Brass
players precisely control frequency and amplitude of lip vibrations by
modulating embouchure muscle tension. The demands for woodwind
instruments are different: embouchure adjustments do not require lip
vibration but finger movement patterns are more complex, explaining why
hand dystonia is common in woodwind players but very rare among brass
players (Altenmüller et al., 2012).
FIGURE 2. Symptom localization among the hands and embouchure in musician’s dystonia.
Reproduced from Hans-Christian Jabusch and Eckart Altenmüller, Epidemiology,
phenomenology, and therapy of musician’s cramp. In Eckart Altenmüller, Mario
Wiesendanger, and Jürg Kesselring (Eds.), Music, motor control, and the brain, p. 267,
Figure 17.3, doi:10.1093/acprof:oso/9780199298723.001.0001, Copyright © Oxford
University Press 2006, reproduced by permission of Oxford University Press.

Task specificity refers to the phenomenon whereby symptoms appear only
during certain tasks. It is a hallmark of many forms of dystonia,
including not only those forms explicitly labeled as “task-specific
dystonias,” but also other forms such as cranial and cervical dystonia where
symptoms are often sensitive to the task context, such as whether or not a
patient is talking. This characteristic is one of the reasons why many of the
focal dystonias were historically considered a psychiatric disorder long
before they were considered neurologic (Marsden & Sheehy, 1990). Among
the focal dystonias, MD exhibits some of the most exquisite task specificity.
For many patients, the symptoms are present only while playing the
instrument and, in some cases, only in specific passages of specific pieces
(Jabusch & Altenmüller, 2006b; Lee, Tominaga, Furuya, Miyazaki, &
Altenmüller, 2015b; Tubiana, 2003). Broadly defined, a “task” context can
include other elements of the patient’s moment-by-moment sensorimotor
state. Thus so-called "sensory tricks," or the "geste antagoniste," that
patients can use to transiently alleviate symptoms, can be viewed as a
particular manifestation of task specificity. As with other focal dystonias,
some MD patients can benefit from sensory tricks (Paulig, Jabusch,
Grossbach, Boullet, & Altenmüller, 2014). For example, some MD patients
benefit from playing with a latex glove, or when holding an object such as a
piece of rubber between the fingers (Jabusch & Altenmüller, 2006a).
Collectively the localization and task specificity characteristics of MD are
important features to keep in mind when assessing the patient’s severity and
response to treatment.

The Importance of Rating Scales


As with most disorders, it is a common convention to carefully assess the
patient before initiating treatment. Compared to characterizing severity,
diagnosis is relatively straightforward and is not addressed further here. But
evaluating the efficacy of treatments inherently requires some comparison
of severity before and at various time points after treatment. The class of
tools used to measure severity is often referred to as "rating scales"
because they have historically most commonly involved a human—the
clinician or patient or both—making some ratings of severity on a
previously agreed upon scale.
Rating scales provide important outcome measures for clinical trials.
They also commonly serve stratification purposes in genetic studies or act
as regressors for research into pathophysiology. Thus they have become a
critical path tool for the whole pipeline of research into new treatments.
Tailoring the Dystonia Study Group (2004) guidelines to
musician's dystonia, a maximally useful rating scale for MD should be: (1)
reliable and valid, (2) sensitive to change, (3) specifically designed to
measure MD, and (4) practical in a clinical setting (Spector &
Brandfonbrener, 2005, 2007).

Rating Scales for Musician’s Dystonia


Motivated by Spector and Brandfonbrener’s initial effort (Spector &
Brandfonbrener, 2007), Peterson and colleagues (Peterson, Berque,
Jabusch, Altenmüller, & Frucht, 2013) conducted a critical and
comprehensive review of MD rating scales in 2013. Using considerably less
restrictive inclusion criteria than the earlier effort, they reviewed the
use of rating scales in 135 articles on MD. They provided complete
descriptions of the scales, variations in their use, and their properties
relative to the Dystonia Study Group’s guidelines for clinical utility. They
also systematically evaluated the distribution of each scale’s use in the
literature, including studies involving various treatment approaches and
pathophysiological assays. As shown in Fig. 3, the various scales can be
divided into subjective and objective measures, with subjective being
further subdivided into patient- or clinician-rated.
FIGURE 3. Rating scale use in the musician's dystonia literature. (A): histogram of the number of
scales used in each study; (B): number of studies using each type of scale (subjective by patient,
subjective by clinician, objective, or combinations thereof); (C): number of studies using each scale,
grouped by type.
Reproduced from David A. Peterson, Patrice Berque, Hans-Christian Jabusch, Eckart
Altenmüller, and Steven J. Frucht, Rating scales for musician’s dystonia, Neurology 81(6),
589–598, https://doi.org/10.1212/WNL.0b013e31829e6f72, Copyright © 2013 American
Academy of Neurology.

A noteworthy objective scale is MIDI-based Scale Analysis (MSA). For
MSA, keyboardists play 10–15 iterations of two octaves of the C major
scale in ulnar and radial directions, mezzo forte and legato, at a tempo of
eight notes per second. Key press timing is recorded through a standard
MIDI interface. Key press and release timing provide measures of tone
durations, overlaps, and interonset intervals (IOIs). The standard deviation
of IOIs (sdIOI) is used to quantify the temporal evenness with which the
scales are performed. The sdIOI has provided excellent sensitivity and
become the primary outcome measure in subsequent studies using MSA
(Jabusch, Vauth, & Altenmüller, 2004c).
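
To make the MSA computation concrete, a minimal sketch in Python follows. It assumes lists of key press (onset) and release (offset) times in seconds parsed from a MIDI recording; the function name, variable names, and example values are hypothetical illustrations, not part of the published MSA protocol.

import statistics

def msa_metrics(onsets, offsets):
    """Compute simple MSA-style timing measures for one scale run.

    onsets, offsets: key press / release times in seconds, one pair
    per note, in played order (e.g., parsed from a MIDI recording).
    """
    iois = [b - a for a, b in zip(onsets, onsets[1:])]  # inter-onset intervals
    durations = [off - on for on, off in zip(onsets, offsets)]
    # Overlap: how long a key is still held after the next onset (legato > 0)
    overlaps = [off - nxt for off, nxt in zip(offsets, onsets[1:])]
    return {
        "mean_ioi": statistics.mean(iois),  # target is 0.125 s at 8 notes/s
        "sdIOI": statistics.stdev(iois),    # primary outcome: temporal evenness
        "mean_duration": statistics.mean(durations),
        "mean_overlap": statistics.mean(overlaps),
    }

# Invented example: a slightly uneven fragment played at about 8 notes/s
onsets = [0.000, 0.128, 0.251, 0.374, 0.502]
offsets = [0.140, 0.265, 0.390, 0.515, 0.640]
print(msa_metrics(onsets, offsets))

A larger sdIOI indicates less even playing; affected runs in MD patients show elevated sdIOI relative to healthy pianists.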

Rating Scale Deficiencies


Rating scales have provided a way to strengthen an assessment exercise that
is otherwise largely qualitative by supplementing it with measures that are
inherently quantitative. Curiously, only about half of all experimental
studies in MD used quantitative assessments (Peterson et al., 2013).
Unfortunately, none of the scales have been rigorously evaluated against the
Dystonia Study Group’s criteria for a maximally useful rating scale
(Spector & Brandfonbrener, 2007): reliable and valid, sensitive to change,
practical in a clinical setting, and specifically tailored to MD. The
subjective scales lack the sensitivity needed to compare treatments with
similar efficacy because they have high inter-rater variability. They also
lack digit-level specificity, which is central for many patients with FHD.
Some of the rating scales used in MD—such as the Fahn-Marsden (FM) scale,
Unified Dystonia Rating Scale (UDRS), and Global Dystonia Scale (GDS)
—were originally designed for generalized dystonia or focal forms other
than MD. Although they convey “global impressions” based on clinical
observation, they are not tailored to task-specific motor impairments
(Peterson et al., 2013). A few scales—such as the TCS, FAM, and TRE—
incorporate a symptom-evoking performance element. This is key because MD
is inherently task-specific for most patients. However, only a small
minority of past MD research has used these scales.
Objective scales, e.g., those based on kinematics or MIDI for hand dystonia or
on acoustic analysis of the fundamental frequency in embouchure dystonia (Lee,
Voget, Furuya, Morise, & Altenmüller, 2016), offer the benefit of mitigating the
intra- and inter-rater variability intrinsic to subjective scales. However,
objective scales require additional infrastructure, limiting their efficient use
in the clinic. Regardless of how MD severity is measured, the relative
merits of those techniques should be considered in the context of choosing,
evaluating, and updating strategies to treat MD.
Treatment

Treating Musician’s Dystonia


Options for treating MD overlap substantially with those for other focal
dystonias. Yet MD is one of the most difficult forms of dystonia to treat
(Jankovic & Ashoori, 2008; Tubiana & Chamagne, 1993). In general, the
treatments are aimed at the symptoms, not the causes or underlying
pathophysiology. This is unsurprising given what little is known about the
pathogenesis and pathophysiology of focal dystonias. A summary list of the
treatments and rough stratification of their efficacy across a wide variety of
MD patients is given in Fig. 4.

FIGURE 4. Treatment efficacy based on patient ratings. Black bars: deterioration. Gray bars: No
change. White bars: alleviation. Hatched bars: no reply. Trhx, trihexyphenidyl; BT, botulinum toxin.
Reproduced from Hans-Christian Jabusch, Dorothea Zschucke, Alexander Schmidt, Stephan
Schuele, and Eckart Altenmüller, Focal dystonia in musicians: Treatment strategies and long-
term outcome in 144 patients, Movement Disorders 20(12), 1623–1626,
https://doi.org/10.1002/mds.20631, Copyright © 2005 Movement Disorder Society.

The primary oral medication that has been tried is trihexyphenidyl, an
anticholinergic. However, at doses sufficient to demonstrate efficacy in
adults it is usually accompanied by severe adverse side effects. Botulinum
toxin injections into affected muscles have become the mainstay for treating
focal dystonias. Their net effect is to block neurotransmission at the
neuromuscular junction. In the case of MD, injection efficacy can be greatly
enhanced by carefully selecting the muscle to inject, including by
distinguishing primary from compensatory movements (Frucht, 2015) and
appropriate incorporation of EMG and ultrasound guidance. The injections
can offer some symptomatic relief for many. There are reports of
particularly successful cases (e.g., Vecchio et al., 2012). However,
botulinum toxins also have several limiting adverse side effects (Frucht,
2009b; Jankovic & Ashoori, 2008; Zeuner & Molloy, 2008), particularly
when lateral finger movements are an important part of the motor
repertoire. If not carefully planned and administered, they often lead to
weakness that limits hand performance. Even with optimal doses and
injection locations, efficacy tends to wear off, typically 2–4 months
post-injection. Thus injections need to be repeated periodically in perpetuity,
and there is natural variation in efficacy associated with where one is in the
treatment "cycle."
For the overwhelming majority of MD patients, currently available
treatments are suboptimal. Because many MD patients are professional
musicians who play at very high levels, the diagnosis can spell the end of
their performance career (Conti et al., 2008; Frucht, 2009a). Thus MD
remains one of the primary challenges in musician's medicine (Jabusch
& Altenmüller, 2006a; Rosset-Llobet, Candia, Molas, Cubells, & Pascual-
Leone, 2009; Tubiana & Chamagne, 1993).

Physical Medicine and Rehabilitation


Pharmacologic treatment strategies suffer from a lack of the specificity that is
particularly important in MD. Systemic oral medications have an obvious
lack of spatial selectivity. It is highly unlikely, for example, that there is a
simple mapping between cholinergic receptors in various brain regions and
alterations in brain circuitry that coincide with very specific motor
repertoires selectively affected in music performance. There is also a
mismatch between the slow pharmacokinetics of oral medications and the
rapid temporal dynamics of MD symptoms. Botulinum toxins, in contrast,
have inherently good spatial selectivity in the periphery, but again
suffer from a mismatch in temporal dynamics. Given these deficiencies of
oral medications and botulinum toxin approaches, and the characteristics of
symptom localization and exquisite task specificity in MD, it is
unsurprising that a diverse set of non-pharmacologic treatment strategies
has been attempted.
Prominent among the non-pharmacologic approaches are a variety of
techniques under the umbrella of what can be called physical medicine and
rehabilitation (PMR). These go by many names, including rehabilitation
strategies, behavioral therapies, behavioral interventions, retraining
programs, pedagogical retraining, technical exercises, non-specific
exercises on the instrument, etc. In each case, they involve some subset of
principles from PMR. They have been reviewed for their use in dystonia,
including with foci of the hand and wrist, but more often for the writer’s
cramp form of FHD than for MD per se (Bernstein et al., 2016; Valdes,
Naughton, & Algar, 2014). In some cases, PMR approaches have involved
temporarily limiting movement. Specifically, they involve immobilizing or
modifying the range of motion of the affected motor system with some form
of splint for a period of weeks to years (Priori, Pesenti, Cappellari, Scarlato,
& Barbieri, 2001; Satoh, Narita, & Tomimoto, 2011), and can produce
improvements that are sustained well beyond the end of the intervention
(Priori et al., 2001).
The PMR methods have a long history, going back far enough to
predate the recognition of FHD in musicians as a dystonia (Hays, 1987).
Collectively they are considered useful (Lederman, 2001), but often exhibit
benefits that are mixed and/or transient, require months or even years of
therapy (Sakai, 2006; Tubiana, 2000), and suffer from varied levels of
compliance. The reasons are generally unclear, but for some approaches it
may be because the methods are not tailored to each musician’s exquisite
and very personalized motor repertoire. In the case of most PMR
approaches, replicating the results has either not been attempted or not been
successful. Also, the studies have suffered from several design deficiencies,
mostly because it is inherently difficult to incorporate careful controls and
blinding.
Some research into focal dystonia pathophysiology has been repurposed
for therapeutic potential. Prominent examples include non-invasive brain
stimulation methods such as transcranial magnetic stimulation (TMS) and
direct current stimulation (DCS). They have been tried in isolation and in
conjunction with PMR techniques, with the hypothesis that they might
amplify, accelerate, or make more persistent the beneficial effects of the
PMR approaches in isolation (Furuya, Nitsche, Paulus, & Altenmüller,
2014; Rosset-Llobet et al., 2015). These brain stimulation methods offer the
advantage that they can be administered in a sham (placebo) fashion,
thereby enabling blinded, controlled trials (Rosset-Llobet, Fabregas-Molas,
& Pascual-Leone, 2015). TMS and DCS are discussed further in the next
section on pathophysiology.

Pathophysiology versus Pathogenesis


Even in scientific forums, distinctions between association and causation
are sometimes unclear. Characterizations of research in focal dystonia are
no exception. One way we have attempted to make this distinction clearer
in this chapter is by discussing matters of pathophysiology and
pathogenesis in separate sections. Most if not all findings of pathological
physiology in FHD, including MD, should be viewed as simple
associations. Whether or not they are potential causes or consequences of
the disorder is particularly difficult to determine. For example, even
evidence that clinical recovery in MD coincides with normalization of
receptive field topography (Candia, Wienbruch, Elbert, Rockstroh, & Ray,
2003) does not necessarily suggest that transitions from normal to abnormal
receptive field topographies play a causal role.

Classic Concepts on Focal Dystonia


Pathophysiology
Most of the information about the pathophysiology of MD is actually
inferred from research on cohorts in which some (or all) patients have
other forms of focal task-specific dystonia, especially writer's cramp.
General consensus is that most, but probably not all, physiological features
are common across the focal task-specific dystonias. In this section, unless
denoted otherwise, the pathophysiological features were found in studies
with FHD patients, sometimes but not always including MD patients.
Over the past few decades, a small handful of recurrent themes have
characterized the pathophysiology of FHD (for a good review, see Hallett,
2006). These include abnormalities in inhibition, sensorimotor integration,
and plasticity. Aspects of all of these have been documented at various
nodes in the brain circuits mediating motor (and sensorimotor) function, as
illustrated by fMRI (Haslinger et al., 2017; Oga et al., 2002), diffusion
tensor imaging (DTI, Delmaire et al., 2009), and even gamma knife
thalamotomies (Horisawa et al., 2017). Roughly speaking, the brain regions
implicated include several somatosensory and somatomotor cortical areas,
some association cortical areas, several basal ganglia nuclei, portions of the
cerebellum, and motor and intralaminar nuclei in the thalamus. Brainstem
and spinal circuits may also play a role, and have been indirectly implicated
in some studies focused on alterations to reflex systems in paired pulse
paradigms.
Investigators have found a decrease in net inhibition at each level of the
motor control circuitry. Perhaps more importantly, the decreased inhibition
has more specifically been associated with a loss of spatial selectivity, and
this has been put forth as a potential endophenotype for focal dystonia
(Altenmüller et al., 2012). It has been documented in the spatial domain, as
a blurred differentiation of individual digits at the behavioral and the neural
levels in FHD (Bara-Jimenez, Catalan, Hallett, & Gerloff, 1998; Delmaire
et al., 2005; Sohn & Hallett, 2004), including in MD (Elbert et al., 1998). It
has also been documented in the time domain, as in reduced TMS-evoked
silent periods (Chen et al., 1997; Ridding, Sheean, Rothwell, Inzelberg, &
Kujirai, 1995) and defective compliance with the no-go signal in a stop-
signal task (Ruiz et al., 2009). Interestingly, the exact nature of the selective
inhibition in the somatosensorimotor system may be different between
writer's cramp and MD forms of FHD. Karin Rosenkranz and colleagues
(Rosenkranz et al., 2005) evaluated a TMS-based measure of fast, local
inhibition in the cortex—"short-latency intracortical inhibition" (SICI)—in
the context of simultaneous vibratory input to individual hand muscles.
They found that although vibration left SICI unchanged in neighboring
muscles of writer's cramp patients, it suppressed SICI selectively in
neighboring muscles functionally connected with the vibrated muscle in
healthy musicians, but non-selectively in MD patients. They hypothesized
that musicians' extensive practice produces an altered pattern of surround
inhibition that later progresses into the non-focal pattern seen in MD.
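
As a rough illustration of how SICI is conventionally quantified in such paired-pulse TMS paradigms (a generic sketch; the values below are invented and are not Rosenkranz's data), the conditioned motor evoked potential (MEP) is expressed relative to the unconditioned test response, so smaller ratios mean stronger inhibition:

def sici_ratio(conditioned_meps_mv, test_meps_mv):
    """Short-latency intracortical inhibition as the conditioned/test MEP ratio.

    Values well below 1.0 indicate intact inhibition; loss of inhibition
    pushes the ratio toward (or above) 1.0.
    """
    conditioned = sum(conditioned_meps_mv) / len(conditioned_meps_mv)
    test = sum(test_meps_mv) / len(test_meps_mv)
    return conditioned / test

# Invented amplitudes (mV): intact inhibition vs. reduced inhibition
print(sici_ratio([0.30, 0.40, 0.35], [1.0, 1.1, 0.9]))  # ~0.35: strong SICI
print(sici_ratio([0.90, 1.00, 0.95], [1.0, 1.1, 0.9]))  # ~0.95: little SICI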
There is also evidence for altered sensorimotor integration in FHD. Part
of this characterization stems from the ambiguity regarding when "sensory"
ends and “motor” begins in the nervous system—both functionally and
anatomically. The lines of demarcation are acutely blurred in music,
wherein tight, temporally nested sensorimotor loops are central and critical
to music comprehension and production. Notwithstanding semantics,
pathological sensorimotor integration has evolved as a consistent theme in
focal dystonia pathophysiology. Both the functional manifestations and
circuit bases have been recently reviewed (Avanzino, Tinazzi, Ionta, &
Fiorio, 2015) and will not be covered in detail here. Notably, although the
Rosenkranz (Rosenkranz et al., 2005) study focused on measures of spatial
inhibition with the SICI measure of cortical physiology, it also intrinsically
investigated sensorimotor integration by combining somatosensory input
and motor evoked potentials. The focus of this and other previous
sensorimotor integration research has been primarily on spatial selectivity,
with generally fixed temporal parameters. However, an emergent theme in
focal dystonia research is that time and timing may play particularly critical
roles.

Dystonia and Disordered Timing


Although intuition would suggest exquisite temporal processing in
musicians, and some measures indicate that timing abilities are normal in
MD (van der Steen, van Vugt, Keller, & Altenmüller, 2014), there is also
evidence for disordered temporal processing in MD. Among the many
potential MIDI output variables in Jabusch’s MSA rating scale, one of the
strongest findings in MD was increased variability of the inter-onset key press
interval (sdIOI; Jabusch et al., 2004c). Also, the temporal dynamics of pre-
movement brain activity are smeared in dystonia relative to controls (Gilio
et al., 2003). This is not surprising given that SICI phenomena show
exquisite timing sensitivity (Rosenkranz, 2010) and in light of a recent
review triangulating between time processing, motor circuits, and
movement disorders including dystonia (Avanzino et al., 2016).
At the intersection of sensorimotor integration and timing is the
psychophysical measure known as the temporal discrimination threshold, or
TDT. In the case of the auditory TDT, for example, subjects are given two
brief auditory tones with rapid onsets and offsets separated by a brief and
variable interval. The interval is adjusted in a bi-directional staircase
fashion to determine the interval below which subjects cannot distinguish
the two separate stimuli and instead perceive them as one; that interval is the TDT.
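
The staircase logic can be sketched as follows. This is a minimal, hypothetical implementation of a bi-directional (up-down) procedure, not a published TDT protocol; the starting interval, step size, and number of reversals are illustrative assumptions.

def run_staircase(perceives_two, start_ms=120.0, step_ms=10.0, n_reversals=8):
    """Estimate a temporal discrimination threshold (TDT) in milliseconds.

    perceives_two(interval_ms) -> bool: True if the subject reports two
    separate stimuli at this inter-stimulus interval. The interval shrinks
    while the stimuli are resolved and grows once they fuse; the TDT is
    estimated as the mean interval at the response reversals.
    """
    interval, direction, reversals = start_ms, -1, []
    while len(reversals) < n_reversals:
        new_direction = -1 if perceives_two(interval) else +1
        if new_direction != direction:  # response flipped: record a reversal
            reversals.append(interval)
            direction = new_direction
        interval = max(step_ms, interval + new_direction * step_ms)
    return sum(reversals) / len(reversals)

# Simulated observer with a true threshold of about 45 ms
print(run_staircase(lambda interval_ms: interval_ms > 45.0))  # converges near 45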
The visual TDT has been shown to be abnormal (high) in many forms of
dystonia (Hutchinson et al., 2013), including MD when musicians rather
than non-musicians are used as study controls (Killian et al., 2017). The
TDT has also been put forth as a candidate endophenotype (Hutchinson et
al., 2013), because it has been shown to be abnormal in non-manifesting
carriers of the DYT1-form of familial dystonia (Fiorio et al., 2007) and it is
present in the unaffected hand, suggesting it is not secondary to symptoms
(Bara-Jimenez, Shelton, Sanger, & Hallett, 2000; Fiorio, Tinazzi, Bertolasi,
& Aglioti, 2003).

Plasticity
Plasticity is an overused and sometimes misinterpreted term in the
neurosciences. In the simplest sense, it refers to the ability of the system to
change. It has manifestations at behavioral and several physiological levels,
and likely has bases at the circuit and molecular levels, historically couched
in terms of changes in the network of synapses among neurons, and the
molecular signaling pathways that modulate the strength of those synapses
at different timescales.
In the context of dystonia, plasticity can refer to the near-real time
adaptations of sensorimotor systems, as for example a greater response than
normal to paired associative stimulation protocols, whereby the relative
timing of sensory stimuli and exogenous stimulation of motor cortical areas
can strengthen that sensory system’s subsequent ability to evoke the motor
response within the same experimental session. Plasticity can also refer to
the systems-level homologues of the synaptic plastic processes of long-term
potentiation and depression (LTP and LTD), in the form of high- and low-
frequency repetitive stimulation with TMS to potentiate or depress
subsequent cortical excitability. At a drastically slower timescale, plasticity
can refer to the slow, often insidious changes in brain circuitry that likely
underlie the pathogenesis of dystonia. There are also the notions of
"metaplasticity" and "homeostatic plasticity" which, among other things, are
thought to regulate the primary plastic processes already discussed.
In the case of MD, one theory is that musical practice in healthy
musicians is associated with (indeed, likely relies upon) beneficial plastic
adaptation in the motor cortex, including for example a reduction in motor
thresholds and increase in motor excitability, and that in MD patients these
processes have progressed too far and begin to compromise, rather than
enhance, movement patterns (Altenmüller & Jabusch, 2010b). Thus
musicians may have (or take advantage of) greater primary plastic
capabilities and MD patients may have dysfunctional metaplastic processes
that regulate the primary plastic processes in an abnormal way. A great
mystery in MD research is how such dysfunctional plastic processes are
initiated and used in generating the disorder.

Pathogenic Theory

Biological Predisposition
With relatively rare exceptions, most central nervous system disorders,
including most forms of dystonia, likely arise from some complex
combination of genetic and environmental factors. In the adult onset
dystonias, this is usually conceptualized as a genetic predisposition
followed by some environmental “trigger.” In this chapter, we formulate a
theoretical framework for the pathogenesis of MD that is organized in a
similar fashion: biological predisposition and “use patterns” (see Fig. 5).
Biological predispositions include, of course, genetics but also gender. "Use
patterns” is a broad umbrella term that refers to how the sensorimotor
systems are used over protracted periods of time. As such, it can be
considered a category of “environmental” factors.

FIGURE 5. A theoretical framework for pathogenic factors. The probability of developing MD
increases with increasing abnormality in use patterns and/or biological predisposition. Several
factors influence use patterns, including the spatiotemporal demands of the instrument, overuse, and
situations that modify peripheral constraints, which in turn are variously influenced by the
performance demands on the patient. Biological predisposition consists of innate characteristics of
the patient, from the personality profile down to the level of properties of inhibition in brain circuits
and the intricate molecular underpinnings of synaptic plasticity, all in turn influenced by the patient’s
gender and genetics.

Early studies described a positive family history of dystonia as a risk
factor for the development of MD (Jankovic & Shale, 1989; Lim &
Altenmüller, 2003; Schmidt et al., 2006), and this connection has been
strengthened in multiple subsequent studies, including findings compatible
with a pattern of autosomal-dominant inheritance (Baur et al., 2011;
Jabusch & Altenmüller, 2006b; Schmidt et al., 2009). But specific
genes associated with MD have remained elusive. Genetics are
also probably implicated in personality traits, and MD patients tend to
exhibit exaggerated perfectionism and social phobias not seen in healthy
musicians (Jabusch & Altenmüller, 2004; Jabusch, Müller, & Altenmüller,
2004a).
Another aspect of biological predisposition is gender. For practically all
other forms of dystonia, the M:F ratio ranges from 1:2 to 1:4. Curiously,
however, the overwhelming majority of MD patients are male (Lederman,
1991), and this has been confirmed in large cohorts (Jabusch &
Altenmüller, 2006b; Lim & Altenmüller, 2003). Indeed, the 5:1 M:F ratio is
corrected to 6:1 when taking into account the slight predominance of female
musicians in the musician population in Germany (see Fig. 6) (Lim &
Altenmüller, 2003).

FIGURE 6. Gender distribution of 591 patients and 2651 healthy musicians, in relative ratios.
Reproduced from Eckart Altenmüller, Volker Baur, Aurélie Hofmann, Vanessa K. Lim, and
Hans-Christian Jabusch, Musician’s cramp as manifestation of maladaptive brain plasticity:
Arguments from instrumental differences, Annals of the New York Academy of Sciences
1252, 259–265, doi:10.1111/j.1749-6632.2012.06456.x Copyright © 2012, John Wiley and
Sons.

The mechanisms by which genetics and gender contribute to MD
pathogenesis are an almost complete mystery, but they likely involve
differential hormonal contributions to synaptic plasticity and neuronal
inhibition, as well as macroscopic personality traits like stress, anxiety, and
perfectionism (Altenmüller et al., 2012), which appear to be present at
higher levels in MD (Altenmüller & Jabusch, 2010a). Since the first
description of musician's dystonia, in the case of Robert Schumann
(Altenmüller, Kesselring, & Wiesendanger, 2006), psychological triggering
factors have been discussed. Indeed, when tested carefully, musicians
suffering from dystonia can be clustered into two groups: those with pre-
existing anxiety disorders and dysfunctional psychological coping
strategies leading to stressful personalities, and those with no such
signs (Ioannou & Altenmüller, 2014; Ioannou, Furuya,
& Altenmüller, 2016). Interestingly, in musicians with anxiety disorders
dystonia manifests itself about eight years earlier (Ioannou et al., 2016).
Thus it seems that MD is not a uniform nosological entity, but can instead
be classified into two forms: a predominantly “motor” manifestation and a
manifestation with accompanying non-motor symptoms, such as
constraints, anxiety, etc. Intriguingly, these two types of manifestations are
not a dichotomy but may overlap (Ioannou et al., 2016). Whether this
dimension depends on gender remains to be determined. Curiously, gender
appears to influence the TDT (Williams et al., 2015), and this sexual
dimorphism is also age-related (Butler et al., 2015). Endophenotypes like
altered spatiotemporal inhibition and the TDT may play a critical
intermediary role in discovering MD pathogenesis, helping to provide a link
between biological predisposition, the contribution of use patterns, and the
phenotypic characteristics of the disorder.

Use Patterns
Several features of MD suggest that how the sensorimotor systems are used
in music performance may be a factor contributing to the development of
MD. Motor workload and movement complexity appear to be risk factors.
MD is more likely to appear in the hand with higher demands of
spatiotemporal precision, such as the right hand on keyboards and plucked
instruments, and the left hand in bowed string players. Among the string
instruments, MD appears to be more prevalent on string instruments with
shorter string lengths, such as the violin versus double-bass (Altenmüller et
al., 2012; Jabusch & Altenmüller, 2006b). In fact, the relative absence of
MD documented in double bassists may be due to a lack of simultaneous
finger action (Conti et al., 2008). Another apparent risk factor is the type of
musical performance. Classical musicians seem to be more at risk of
developing MD than pop or jazz musicians. A hypothesized reason is that
classical music involves higher expectations of temporally precise
reproduction and less opportunity for improvisation than pop and jazz
(Jabusch & Altenmüller, 2006b). Professional classical musicians have also
typically undergone many tens of thousands of hours of practice, involving
movements that are extensively trained, representing an “oversampling” of
a disproportionately narrow part of the space of possible sensorimotor
mappings. Finally, in some cases, peripheral issues such as prolonged pain
syndromes, nerve entrapment, and subclinical range of motion limitations
(Charness et al., 1996; Leijnse, Hallett, & Sonneveld, 2015) that can
precede MD onset may induce compensatory changes in use patterns that
are pathogenic and would not otherwise occur in a healthy musician’s motor
repertoire. Furuya and Hanakawa (2016) suggest that such
dysfunctional adaptations of body representations in somatosensory and
motor systems may be an intermediate point on the path toward full MD
development. It should be noted that these are all retrospective
observational results and there have been no controlled, prospective studies
regarding these factors.
As with genetics and gender, the mechanisms by which use patterns
contribute to MD pathogenesis are unclear. But having a theoretical
framework can help provide a rational basis for future experimental
research.

Future Directions

Better Assessment
Among the numerous unmet needs in dystonia (Albanese, 2017), one of the
designated research priorities in focal task-specific dystonias is more
precise methods for characterizing and assessing the phenotype (Richardson
et al., 2017). This is acutely evident for MD. For over a decade dystonia
researchers have suggested that a new rating scale that is reliable,
valid, sensitive to change, and specific to MD is sorely needed (Jankovic & Ashoori,
2008; Spector & Brandfonbrener, 2007). Peterson and colleagues (Peterson
et al., 2013) comprehensively summarized the state of affairs in rating scale
use for MD. As alluded to in the section “Rating Scale Deficiencies,” none
of the existing scales has been completely and rigorously evaluated for
reliability and validity, sensitivity to change, practicality in a clinical
setting, and specificity to MD. Exacerbating the concerns about
reliability and sensitivity to change, most of the existing rating scales are
based on human judgments, making them inherently subjective. Further
development of objective rating instruments so that they are more
readily applicable in the clinical setting would help mitigate these issues.
As evident in the MD rating scale review (Peterson et al., 2013), there
appears to be no standard choice for rating scale(s) in MD research; most
studies use only one or two rating scales but the scales used vary widely
from one study to the next. When trying to compare and reconcile multiple
treatment studies, this makes it difficult to distinguish between treatment
effects and measurement effects. Likewise, non-standard selection of rating scales
diminishes the collective research value of pathophysiology studies.
In summary, as depicted in Fig. 7, we suggest that the migration to a
more ideal mode for assessing MD severity would include two categories of
improvements: (1) standardizing the choice of rating scale(s) used across
studies, and (2) increasing the efficiency with which a small number of
rating scales can completely cover the conceptual space of measurements
one wants to make in the disorder. This mode would accelerate research
into both the pathophysiology of MD and improved treatments.
FIGURE 7. Toward ideal rating scale use. Upper left: current rating scale use is neither consistent
across studies nor efficient. Upper right: efficient rating scale use would minimize the number of
scales used in each study (in this example depiction, to two scales per study). Lower left: consistent
rating scale use would ensure that the same rating scales are used across all studies. Lower right: the
ideal scenario would be both consistent and efficient by pushing the envelope on both: a consistent
application of the minimal number of scales across all studies.

New Treatments
Another designated research priority in focal task-specific dystonias is
innovative clinical trial design that takes into account the tremendous
heterogeneity in the presentation of these dystonias from one patient to the
next (Richardson et al., 2017). Despite concerted efforts to evaluate an array
of new treatment approaches, most have involved small, unblinded,
retrospective studies. Clearly we need new trials that are controlled,
blinded, prospective, and randomized. Unfortunately, these designs are very
difficult to implement in the space of PMR treatments. This poses a creative
challenge for the field of MD, given the likely prominent role of PMR in
MD treatment. Relatedly, given the heterogeneity in MD manifestation,
treatments should logically be tailored to the individual patient. This is also
challenging in the context of the common research goal of reproducibility.
Yet there is a persistent need for new therapies for MD. Although many
patients manage to stay in the field with currently available therapies, most
have to make substantial compromises in their level of music performance.
And outcomes are particularly limiting for patients with embouchure
dystonia (Frucht et al., 2001; Jabusch & Altenmüller, 2006b).
Given the central theme of disordered temporal processing in MD, we
hypothesize that the highly refined timing characteristics of the
sensorimotor systems in professional musicians are critical to not only
understanding the disorder but also optimizing treatment. Simply put,
patients get stuck in patterns of inappropriate motor sequences, and greater
attention to the critical role of time in related auditory perception and motor
performance could help them overcome these dysfunctional patterns. Future
treatment research should evaluate novel non-pharmacological
interventions for MD that are focused on timing and determine whether and
how their efficacy is related to psychophysical measures of temporal
processing, such as the TDT.
With respect to the motor performance aspect of timing, one form of
PMR-style intervention explicitly focused on timing is slow-down exercises
(SDE; Sakai, 2006). SDE is an MD rehabilitation strategy that has patients
slow down the tempo of their symptom-evoking performance pieces until
symptoms resolve, and then gradually increase the tempo back to the
original speed as long as symptoms do not return. To our knowledge,
groups independent of the original developers have not evaluated SDE. And
the original application of SDE required many weeks of retraining to
establish effects. Nevertheless, approaches like SDE merit further
systematic, controlled investigation by independent groups.
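
Purely to illustrate the shape of such a schedule (SDE is a clinician-guided retraining strategy; the starting fraction and step size below are hypothetical choices, not Sakai's parameters), the tempo ladder can be sketched as:

def sde_tempo_ladder(target_bpm, start_fraction=0.5, step_bpm=4):
    """Generate a slow-down-exercise tempo ladder back to the target tempo.

    Practice begins at a reduced tempo at which symptoms have resolved and
    climbs in small steps; in practice the patient advances a step only
    when playing remains symptom-free at the current tempo.
    """
    bpm = int(target_bpm * start_fraction)
    while bpm < target_bpm:
        yield bpm
        bpm = min(target_bpm, bpm + step_bpm)
    yield target_bpm

print(list(sde_tempo_ladder(target_bpm=120)))  # 60, 64, ..., 116, 120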
TMS offers a non-invasive method to not only study MD but also,
typically in the form of repetitive TMS (rTMS), modulate it. rTMS (and its
surface-voltage counterpart, transcranial direct current stimulation, tDCS)
have been evaluated as potential treatments for FHD, mostly with writer's
cramp (Cho & Hallett, 2016; Obeso, Cerasa, & Quattrone, 2016) but also
with MD (Furuya et al., 2014; Kieslinger, Holler, Bergmann, Golaszewski,
& Staffen, 2013). Because it inherently involves stimulation with temporal
precision at millisecond timescales, TMS provides a potentially
complementary physiologic counterpart to the PMR methods that operate at
behavioral timescales. Although research on their combined utility as FHD
therapy has thus far shown only mixed results (Kimberley, Schmidt, Chen,
Dykstra, & Buetefisch, 2015), we expect further trials in the near future.
Although stereotactic surgeries, especially deep brain stimulation (DBS)
in the globus pallidus internum (GPi), have demonstrated efficacy in
generalized dystonia, there are relatively few reported series in focal
dystonias, and to our knowledge none in MD. Future advances in DBS
technology, including closed loop designs, may be able to incorporate task
context (either behaviorally or physiologically) and facilitate symptom
reduction that is appropriately context-dependent for focal task-specific
dystonias. However, as with botulinum toxin injections, the exquisite
spatiotemporal demands of music performance will probably make MD one
of the last indications for this treatment. Although lesion methods have
fallen out of favor with the advent of DBS, there is one successful reported
case of a thalamotomy for an MD patient who was refractory to oral
medications and botulinum toxin (Horisawa et al., 2017).
Endogenous cannabinoid receptors play an important role in, among
other things, synaptic plasticity processes in the basal ganglia. Thus, they
are a rational target for dystonia. Jabusch and colleagues reported positive
though transient benefits from a single dose of THC in a pianist with MD (Jabusch,
Schneider, & Altenmüller, 2004b), but subsequent controlled studies in
more broadly defined movement disorders populations have since produced
mixed results (Kluger, Triolo, Jones, & Jankovic, 2015; Koppel et al.,
2014).
Since successful treatment is still a challenge, preventing musician’s
dystonia is important. Although prospective studies are lacking, avoidance
of triggering factors such as chronic pain, overuse, anxiety, and mechanical
repetition is important and may prevent the manifestation of MD, especially
in those musicians with genetic susceptibility.

Research on Pathophysiology
The sensorimotor systems employed during music performance operate at
high rates and with great temporal precision. Thus future basic and clinical
research in MD should specifically measure and modulate the motor control
system with a focus on timing. The TDT is one obvious paradigm for
pursuing this. However, the TDT is usually measured in the visual or
somatosensory domains, whereas the auditory modality is particularly important
for musicians. Future work should therefore include the auditory TDT to measure
the temporal precision of auditory processing in MD patients.
Research into temporal processing in MD should also be carefully
integrated with previous themes in pathophysiology, such as altered
surround inhibition. We expect that the TDT deficits in dystonia that have
been interpreted as a time-domain version of reduced surround inhibition
(Tamura et al., 2009) are not independent of but actually interact with
alterations in the precision of “spatial” surround inhibition processes.
Recent evidence that surround inhibition can be modulated by attention
(Kuhn, Keller, Lauber, & Taube, 2018) provides a timely segue to our other
recommendation for future MD pathophysiology research: we should
allocate some resources to attention. Although difficult to assay in non-
human primates, attention may have been a factor in a monkey model of
FHD (Byl, Merzenich, & Jenkins, 1996). And differential use of attention
seems to be a factor in PMR approaches to MD (Brian Hays, personal
communication).
And attentional modulation has likely been a key element in what is
sometimes labeled psychogenic dystonia, because that diagnosis is often based on
tests of distractibility. If a symptom is modulated when attention is directed
to or away from a motor function, is it still organic dystonia? Regardless of
what we call it, attention seems to play a role. Indeed allocation of attention
has itself been considered a form of action selection, and therefore
attentional focus can be considered part of the task-dependent aspect
inherent in most focal dystonias. Unfortunately, attention can be difficult to
measure and is often not considered in evaluating overt motor function. But
simple gaze monitoring may provide a reasonable first step.
The brain circuits mediating attention are widespread but likely rely
heavily upon thalamic systems that have thus far been relatively
understudied in neuroscience. Yet the thalamus is a central node mediating
communication among many motor systems including the cerebellum,
cortex, striatum, and of course brainstem. Indeed Hutchinson and
colleagues (Hutchinson et al., 2013) have postulated that the projection
from the superior colliculus to the striatum via the intralaminar nuclei of the
thalamus mediates the TDT. So future pathophysiology research on
dystonias, including MD, should redirect attention to timing, attentional
focus, and the thalamus.

Research on Pathogenic Mechanisms


The framework of pathogenic theory we discussed earlier motivates a few
directions for future research into pathogenic mechanisms. An implied but
not explicit element in the framework is reinforcement learning (RL).
Similar to that laid out for the cranial dystonias (Peterson & Sejnowski,
2017), we hypothesize that the pathogenesis of MD results from a
pathological RL process influenced by both the biological predisposition
and use pattern categories of pathogenic factors. In the simple
computational theory of RL, there is a mapping from states to actions that is
learned through trial and error and biased by reinforcement signals. The
concept is particularly appropriate for task-specific dystonias such as MD,
because action selection (i.e., the next motor output) is explicitly
influenced by the current "state," encompassing not only the sensory state but
also the context of the instrument and the current "task" (e.g., performing a certain
piece) (Altenmüller & Müller, 2013). In their neural instantiations, RL
systems are composed of a network of neurons whose matrices of synaptic
connections represent that state-to-action mapping, and whose weight
changes correspond to the learning process. Reinforcement signals are
thought to come from rewards and “punishments,” which can be exogenous
and/or endogenous. Some consider music a language of emotions,
covering a spectrum from negative to positive valence. These emotions are
experienced not only by the music consumer but also by the producer.
Professional classical musicians experience both the fear of failure in a
system that emphasizes precision and reproducibility and the joy of
performing (Altenmüller & Jabusch, 2009). These factors may influence
and amplify endogenous reward signals used in the brain’s RL systems.
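
A minimal tabular sketch of this idea follows; it is illustrative only, with toy states, actions, and rewards that stand in for the far richer sensorimotor context of music performance rather than modeling MD itself.

import random

def q_learning(states, actions, reward, episodes=1000, alpha=0.1, epsilon=0.1):
    """Tiny tabular reinforcement-learning loop: a state-to-action mapping
    (Q) whose values are nudged by reward prediction errors, loosely
    analogous to dopamine-biased learning in corticostriatal circuits."""
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        if random.random() < epsilon:           # occasional exploration
            a = random.choice(actions)
        else:                                   # otherwise exploit the mapping
            a = max(actions, key=lambda act: q[(s, act)])
        rpe = reward(s, a) - q[(s, a)]          # reward prediction error
        q[(s, a)] += alpha * rpe                # weight change = learning
    return q

# Toy example: in one "state" a particular movement pattern is rewarded
states, actions = ["slow passage", "fast passage"], ["pattern A", "pattern B"]
reward = lambda s, a: 1.0 if (s == "fast passage" and a == "pattern A") else 0.2
learned = q_learning(states, actions, reward)
print(max(actions, key=lambda act: learned[("fast passage", act)]))  # pattern A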
Much, but likely not all, of the brain’s implementation of RL involves
dopamine-mediated signaling in the primary input nucleus of the basal
ganglia, the striatum. One of the most classic interpretations of phasic
dopamine signaling has been the encoding of unpredicted levels of reward,
i.e., “reward prediction errors” (Schultz & Dickinson, 2000). And there is a
large, diverse body of literature suggesting that dopamine dynamics in the
striatum may play some role in a wide variety of dystonias (Peterson,
Sejnowski, & Poizner, 2010), and this has subsequently been supported by
genetic evidence in some focal dystonias (Fuchs et al., 2013). Structural and
functional abnormalities have been found in the basal ganglia in FHD
(Peller et al., 2006; Zeuner et al., 2015), and the projections from cortex to
striatum likely play a critical role in representing temporal information at
behavioral timescales (Meck, Penney, & Pouthas, 2008). Unsurprisingly
then, the striatum has also been implicated as a key structure related to the
TDT (Bradley et al., 2009; Pastor, Macaluso, Day, & Frackowiak, 2008).
We suggest that future research into MD (and, for that matter, more broadly
defined focal dystonias) take into account this theoretical framework for
designing future experiments as well as computational model simulations of
the system, as has already been done by coding malleable sensory
representations with neighborhood-preserving self-organizing maps that
simulate the task-dependence of MD (Altenmüller & Müller, 2013).
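
For readers unfamiliar with that modeling approach, the sketch below implements a generic one-dimensional neighborhood-preserving self-organizing map (not the specific Altenmüller and Müller model); when the inputs oversample a narrow region, as extensively trained movements do, neighboring units collapse onto overlapping, de-differentiated representations.

import math
import random

def train_som(inputs, n_units=20, epochs=200, lr=0.2, sigma=2.0):
    """Train a 1-D self-organizing map: each input pulls its best-matching
    unit and its (Gaussian-weighted) neighbors toward it, so nearby units
    come to represent nearby inputs (topology preservation)."""
    dim = len(inputs[0])
    weights = [[random.random() for _ in range(dim)] for _ in range(n_units)]
    for _ in range(epochs):
        x = random.choice(inputs)
        # Best-matching unit: smallest squared distance to the input
        bmu = min(range(n_units),
                  key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))
        for u in range(n_units):
            h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))  # neighborhood
            weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
    return weights

# Toy "use pattern": heavily oversampled inputs from one narrow region
inputs = [[random.gauss(0.5, 0.02), random.gauss(0.5, 0.02)] for _ in range(100)]
som = train_som(inputs)
# Count units crowded onto the oversampled region (a de-differentiated map)
print(sum(1 for w in som if abs(w[0] - 0.5) < 0.1 and abs(w[1] - 0.5) < 0.1))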
Progress in genomics has enabled whole-exome sequencing at ever
decreasing costs. However, the relatively low number of MD patients limits
what might be inferred from an unbounded search across whole genomes. If,
however, appropriate priors are taken into account, the search for
statistically meaningful genetic associations with MD may be tractable.
Such priors could be informed by, for example, (1) genetic findings from
other focal dystonias, which likely share many aspects of biological
predisposition with MD, (2) genes associated with molecular pathways that
are sexually dimorphic, and (3) genes associated with molecular pathways that
underlie and influence cellular and circuit-level physiology such as synaptic
plasticity and altered neuronal inhibition.
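
One hypothetical way to operationalize such priors (a sketch of the reasoning only, not an endorsed analysis pipeline; the gene names and numbers are invented) is to rank candidate genes by combining the cohort's association evidence with a prior weight drawn from the three sources just listed:

import math

def prior_weighted_ranking(assoc_p, prior):
    """Rank genes by log prior plus log association evidence.

    assoc_p: gene -> association p-value from an MD cohort.
    prior:   gene -> prior probability of relevance, e.g., elevated for
             genes implicated in other focal dystonias, in sexually
             dimorphic pathways, or in synaptic plasticity and inhibition.
    """
    score = {g: math.log(prior[g]) - math.log(assoc_p[g]) for g in assoc_p}
    return sorted(score, key=score.get, reverse=True)

# Invented illustration: GENE1 has weaker raw evidence but a stronger prior
assoc_p = {"GENE1": 1e-3, "GENE2": 5e-4, "GENE3": 2e-2}
prior = {"GENE1": 0.10, "GENE2": 0.01, "GENE3": 0.01}
print(prior_weighted_ranking(assoc_p, prior))  # GENE1 ranks first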
Given the high dimensionality and complexity of not only the genomics
but also the rich motor repertoires inherent in a musician’s “use patterns,”
theoretical frameworks and computational models may help provide a
tractable path forward for simulating and studying what gives rise to MD.
Ultimately, this knowledge should in turn provide guidance on how to
reverse and prevent the disorder.

Acknowledgments
Peterson wishes to thank Dominique Sy for assistance generating Figs. 5
and 7. Peterson acknowledges partial support from the Dystonia Coalition
(NS065701 and TR001456), from the Office of Rare Diseases Research at
the National Center for Advancing Translational Sciences and the National
Institute of Neurological Disorders and Stroke, the Bachmann-Strauss
Dystonia & Parkinson Foundation, the Benign Essential Blepharospasm
Research Foundation, UCSD’s Kavli Institute for Brain and Mind, the
National Institute of Mental Health (NIMH 5T32-MH020002), the National
Science Foundation (the Temporal Dynamics of Learning Center, a Science
of Learning Center [SMA-1041755] and the program in Mind, Machines,
Motor Control [EFRI-1137279]), the Howard Hughes Medical Institute, and
the Congressionally Directed Medical Research Program (W81XWH-17-1-
0393). Altenmüller acknowledges support from BMBF Project Dystract:
100255620.

References
Albanese, A. (2017). Editorial: Unmet needs in dystonia. Frontiers in Neurology 8. Retrieved from
https://doi.org/10.3389/fneur.2017.00197
Albanese, A., Bhatia, K., Bressman, S. B., Delong, M. R., Fahn, S., Fung, V. S. C., … Teller, J. K.
(2013). Phenomenology and classification of dystonia: A consensus update. Movement Disorders
28(7), 863–873.
Altenmüller, E. (2003). Focal dystonia: Advances in brain imaging and understanding of fine motor
control in musicians. Hand Clinics 19(3), 523–538.
Altenmüller, E., Baur, V., Hofmann, A., Lim, V. K., & Jabusch, H. C. (2012). Musician’s cramp as
manifestation of maladaptive brain plasticity: Arguments from instrumental differences. Annals of
the New York Academy of Sciences 1252, 259–265.
Altenmüller, E., & Jabusch, H. C. (2009). Focal hand dystonia in musicians: Phenomenology,
etiology, and psychological trigger factors. Journal of Hand Therapy 22(2), 144–154; quiz 155.
Altenmüller, E., & Jabusch, H. C. (2010a). Focal dystonia in musicians: Phenomenology,
pathophysiology, triggering factors, and treatment. Medical Problems of Performing Artists 25(1),
3–9.
Altenmüller, E., & Jabusch, H. C. (2010b). Focal dystonia in musicians: Phenomenology,
pathophysiology and triggering factors. European Journal of Neurology 17(1), 31–36.
Altenmüller, E., Kesselring, J., & Wiesendanger, M. (2006). Music, motor control and the brain.
Oxford: Oxford University Press.
Altenmüller, E., & Müller, D. (2013). A model of task-specific focal dystonia. Neural Networks 48,
25–31.
Avanzino, L., Pelosin, E., Vicario, C. M., Lagravinese, G., Abbruzzese, G., & Martino, D. (2016).
Time processing and motor control in movement disorders. Frontiers in Human Neuroscience 10.
Retrieved from https://doi.org/10.3389/fnhum.2016.00631
Avanzino, L., Tinazzi, M., Ionta, S., & Fiorio, M. (2015). Sensory-motor integration in focal
dystonia. Neuropsychologia 79(Part B), 288–300.
Bara-Jimenez, W., Catalan, M. J., Hallett, M., & Gerloff, C. (1998). Abnormal somatosensory
homunculus in dystonia of the hand. Annals of Neurology 44(5), 828–831.
Bara-Jimenez, W., Shelton, P., Sanger, T. D., & Hallett, M. (2000). Sensory discrimination
capabilities in patients with focal hand dystonia. Annals of Neurology 47(3), 377–380.
Baur, V., Jabusch, H. C., & Altenmüller, E. (2011). Behavioral factors influence the phenotype of
musician’s dystonia. Movement Disorders 26(9), 1780–1781.
Bernstein, C. J., Ellard, D. R., Davies, G., Hertenstein, E., Tang, N. K. Y., Underwood, M., &
Sandhu, H. (2016). Behavioural interventions for people living with adult-onset primary dystonia:
A systematic review. BMC Neurology 16. doi:10.1186/s12883-016-0562-y
Bradley, D., Whelan, R., Walsh, R., Reilly, R. B., Hutchinson, S., Molloy, F., & Hutchinson, M.
(2009). Temporal discrimination threshold: VBM evidence for an endophenotype in adult onset
primary torsion dystonia. Brain 132(9), 2327–2335.
Brandfonbrener, A. G. (1995). Musicians with focal dystonia: A report of 58 cases seen during a 10-
year period at a performing-arts medical-center. Medical Problems of Performing Artists 10, 121–
127.
Brandfonbrener, A. G., & Robson, C. (2004). Review of 113 musicians with focal dystonia seen
between 1985 and 2002 at a clinic for performing artists. Advances in Neurology 94, 255–256.
Butler, J. S., Beiser, I. M., Williams, L., McGovern, E., Molloy, F., Lynch, T., … Hutchinson, M.
(2015). Age-related sexual dimorphism in temporal discrimination and in adult-onset dystonia
suggests GABAergic mechanisms. Frontiers in Neurology 6. Retrieved from
https://doi.org/10.3389/fneur.2015.00258
Byl, N. N., Merzenich, M. M., & Jenkins, W. M. (1996). A primate genesis model of focal dystonia
and repetitive strain injury. 1. Learning-induced dedifferentiation of the representation of the hand
in the primary somatosensory cortex in adult monkeys. Neurology 47(2), 508–520.
Candia, V., Wienbruch, C., Elbert, T., Rockstroh, B., & Ray, W. (2003). Effective behavioral
treatment of focal hand dystonia in musicians alters somatosensory cortical organization.
Proceedings of the National Academy of Sciences 100, 7942–7946.
Chang, F. C. F., & Frucht, S. J. (2013). Motor and sensory dysfunction in musician’s dystonia.
Current Neuropharmacology 11(1), 41–47.
Charness, M. E., Ross, M. H., & Shefner, J. M. (1996). Ulnar neuropathy and dystonic flexion of the
fourth and fifth digits: Clinical correlation in musicians. Muscle and Nerve 19(4), 431–437.
Chen, R., Classen, J., Gerloff, C., Celnik, P., Wassermann, E. M., Hallett, M., & Cohen, L. G. (1997).
Depression of motor cortex excitability by low-frequency transcranial magnetic stimulation.
Neurology 48(5), 1398–1403.
Cho, H. J., & Hallett, M. (2016). Non-invasive brain stimulation for treatment of focal hand dystonia:
Update and future direction. Journal of Movement Disorders 9(2), 55–62.
Cohen, L. G., & Hallett, M. (1988). Hand cramps: Clinical features and electromyographic patterns
in a focal dystonia. Neurology 38(7), 1005–1012.
Conti, A. M., Pullman, S., & Frucht, S. J. (2008). The hand that has forgotten its cunning: Lessons
from musicians’ hand dystonia. Movement Disorders 23(10), 1398–1406.
Delmaire, C., Krainik, A., Du Montcel, S. T., Gerardin, E., Meunier, S., Mangin, J. F., … Lehericy, S.
(2005). Disorganized somatotopy in the putamen of patients with focal hand dystonia. Neurology
64(8), 1391–1396.
Delmaire, C., Vidailhet, M., Wassermann, D., Descoteaux, M., Valabregue, R., Bourdain, F., …
Lehericy, S. (2009). Diffusion abnormalities in the primary sensorimotor pathways in writer’s
cramp. Archives of Neurology 66(4), 502–508.
Elbert, T., Candia, V., Altenmüller, E., Rau, H., Sterr, A., Rockstroh, B., … Taub, E. (1998).
Alteration of digital representations in somatosensory cortex in focal hand dystonia. Neuroreport
9(16), 3571–3575.
Fiorio, M., Gambarin, M., Valente, E. M., Liberini, P., Loi, M., Cossu, G., … Tinazzi, M. (2007).
Defective temporal processing of sensory stimuli in DYT1 mutation carriers: A new
endophenotype of dystonia? Brain 130(1), 134–142.
Fiorio, M., Tinazzi, M., Bertolasi, L., & Aglioti, S. M. (2003). Temporal processing of visuotactile
and tactile stimuli in writer’s cramp. Annals of Neurology 53(5), 630–635.
Frucht, S. J. (2009a). Embouchure dystonia: Portrait of a task-specific cranial dystonia. Movement
Disorders 24(12), 1752–1762.
Frucht, S. J. (2009b). Focal task-specific dystonia of the musicians’ hand: A practical approach for
the clinician. Journal of Hand Therapy 22(2), 136–142.
Frucht, S. J. (2015). Evaluating the musician with dystonia of the upper limb: A practical approach
with video demonstration. Journal of Clinical Movement Disorders 2. doi:10.1186/s40734-015-
0026-3
Frucht, S. J., Fahn, S., Greene, P. E., O’Brien, C., Gelb, M., Truong, D. D., … Ford, B. (2001). The
natural history of embouchure dystonia. Movement Disorders 16(5), 899–906.
Fuchs, T., Saunders-Pullman, R., Masuho, I., Luciano, M. S., Raymond, D., Factor, S., … Ozelius, L.
J. (2013). Mutations in GNAL cause primary torsion dystonia. Nature Genetics 45, 88–92.
Furuya, S., & Hanakawa, T. (2016). The curse of motor expertise: Use-dependent focal dystonia as a
manifestation of maladaptive changes in body representation. Neuroscience Research 104, 112–
119.
Furuya, S., Nitsche, M. A., Paulus, W., & Altenmüller, E. (2014). Surmounting retraining limits in
musicians’ dystonia by transcranial stimulation. Annals of Neurology 75(5), 700–707.
Furuya, S., Tominaga, K., Miyazaki, F., & Altenmüller, E. (2015). Losing dexterity: Patterns of
impaired coordination of finger movements in musician’s dystonia. Scientific Reports 5. doi:
10.1038/srep13360
Gilio, F., Curra, A., Inghilleri, M., Lorenzano, C., Suppa, A., Manfredi, M., & Berardelli, A. (2003).
Abnormalities of motor cortex excitability preceding movement in patients with dystonia. Brain
126, 1745–1754.
Dystonia Study Group. (2004). Rating scales for dystonia: Assessment of reliability of three scales.
Advances in Neurology 94, 329–336.
Hallett, M. (2006). Pathophysiology of writer’s cramp. Human Movement Science 25(4–5), 454–463.
Haslinger, B., Noe, J., Altenmüller, E., Riedl, V., Zimmer, C., Mantel, T., & Dresel, C. (2017).
Changes in resting-state connectivity in musicians with embouchure dystonia. Movement
Disorders 32(3), 450–458.
Hays, B. (1987). Painless hand problems of string-pluckers. Medical Problems of Performing Artists
2, 39–40.
Horisawa, S., Tamura, N., Hayashi, M., Matsuoka, A., Hanada, T., Kawamata, T., & Taira, T. (2017).
Gamma knife ventro-oral thalamotomy for musician’s dystonia. Movement Disorders 32(1), 89–
90.
Hutchinson, M., Kimmich, O., Molloy, A., Whelan, R., Molloy, F., Lynch, T., … O’Riordan, S.
(2013). The endophenotype and the phenotype: Temporal discrimination and adult-onset dystonia.
Movement Disorders 28(13), 1766–1774.
Ioannou, C. I., & Altenmüller, E. (2014). Psychological characteristics in musician’s dystonia: A new
diagnostic classification. Neuropsychologia 61, 80–88.
Ioannou, C. I., Furuya, S., & Altenmüller, E. (2016). The impact of stress on motor performance in
skilled musicians suffering from focal dystonia: Physiological and psychological characteristics.
Neuropsychologia 85, 226–236.
Jabusch, H. C., & Altenmüller, E. (2004). Anxiety as an aggravating factor during onset of focal
dystonia in musicians. Medical Problems of Performing Artists 19, 75–81.
Jabusch, H. C., & Altenmüller, E. (2006a). Epidemiology, phenomenology, and therapy of musician’s
cramp. In E. Altenmüller, J. Kesselring, & M. Wiesendanger (Eds.), Music, motor control and the
brain (pp. 265–282). Oxford: Oxford University Press.
Jabusch, H. C., & Altenmüller, E. (2006b). Focal dystonia in musicians: From phenomenology to
therapy. Advances in Cognitive Psychology 2(2–3), 207–220.
Jabusch, H. C., Müller, S. V., & Altenmüller, E. (2004a). Anxiety in musicians with focal dystonia
and those with chronic pain. Movement Disorders 19(10), 1169–1175.
Jabusch, H. C., Schneider, U., & Altenmüller, E. (2004b). Delta 9-tetrahydrocannabinol improves
motor control in a patient with musician’s dystonia. Movement Disorders 19, 990–991.
Jabusch, H. C., Vauth, H., & Altenmüller, E. (2004c). Quantification of focal dystonia in pianists
using scale analysis. Movement Disorders 19(2), 171–180.
Jankovic, J., & Ashoori, A. (2008). Movement disorders in musicians. Movement Disorders 23(14),
1957–1965.
Jankovic, J., & Shale, H. (1989). Dystonia in musicians. Seminars in Neurology 9, 131–135.
Kieslinger, K., Holler, Y., Bergmann, J., Golaszewski, S., & Staffen, W. (2013). Successful treatment
of musician’s dystonia using repetitive transcranial magnetic stimulation. Clinical Neurology and
Neurosurgery 115(9), 1871–1872.
Killian, O., McGovern, E. M., Beck, R., Beiser, I., Narasimham, S., Quinlivan, B., … Reilly, R. B.
(2017). Practice does not make perfect: Temporal discrimination in musicians with and without
dystonia. Movement Disorders 32(2), 1791–1792.
Kimberley, T. J., Schmidt, R. L. S., Chen, M., Dykstra, D. D., & Buetefisch, C. M. (2015). Mixed
effectiveness of rTMS and retraining in the treatment of focal hand dystonia. Frontiers in Human
Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00385
Kluger, B., Triolo, P., Jones, W., & Jankovic, J. (2015). The therapeutic potential of cannabinoids for
movement disorders. Movement Disorders 30(3), 313–327.
Koppel, B. S., Brust, J. C. M., Fife, T., Bronstein, J., Youssof, S., Gronseth, G., & Gloss, D. (2014).
Systematic review: Efficacy and safety of medical marijuana in selected neurologic disorders.
Report of the Guideline Development Subcommittee of the American Academy of Neurology.
Neurology 82, 1556–1563.
Kuhn, Y. A., Keller, M., Lauber, B., & Taube, W. (2018). Surround inhibition can instantly be
modulated by changing the attentional focus. Scientific Reports 8(1). doi:10.1038/s41598-017-
19077-0
Lederman, R. J. (1991). Focal dystonia in instrumentalists: Clinical features. Medical Problems of
Performing Artists 6, 132–136.
Lederman, R. J. (2001). Embouchure problems in brass instrumentalists. Medical Problems of
Performing Artists 16, 53–57.
Lee, A., Eich, C., Ioannou, C. I., & Altenmüller, E. (2015a). Life satisfaction of musicians with focal
dystonia. Occupational Medicine 65(5), 380–385.
Lee, A., Tominaga, K., Furuya, S., Miyazaki, F., & Altenmüller, E. (2015b). Electrophysiological
characteristics of task-specific tremor in 22 instrumentalists. Journal of Neural Transmission
122(3), 393–401.
Lee, A., Voget, J., Furuya, S., Morise, M., & Altenmüller, E. (2016). Quantification of sound
instability in embouchure tremor based on the time-varying fundamental frequency. Journal of
Neural Transmission 123(5), 515–521.
Leijnse, J. N. A. L., Hallett, M., & Sonneveld, G. J. (2015). A multifactorial conceptual model of
peripheral neuromusculoskeletal predisposing factors in task-specific focal hand dystonia in
musicians: Etiologic and therapeutic implications. Biological Cybernetics 109(1), 109–123.
Lim, V. K., & Altenmüller, E. (2003). Musicians’ cramp: Instrumental and gender differences.
Medical Problems of Performing Artists 18, 21–26.
Marsden, C. D., & Sheehy, M. P. (1990). Writer’s cramp. Trends in Neurosciences 13(4), 148–153.
Meck, W. H., Penney, T. B., & Pouthas, V. (2008). Cortico-striatal representation of time in animals
and humans. Current Opinion in Neurobiology 18(2), 145–152.
Obeso, I., Cerasa, A., & Quattrone, A. (2016). The effectiveness of transcranial brain stimulation in
improving clinical signs of hyperkinetic movement disorders. Frontiers in Neuroscience 9.
Retrieved from https://doi.org/10.3389/fnins.2015.00486
Oga, T., Honda, M., Toma, K., Murase, N., Okada, T., Hanakawa, T., … Shibasaki, H. (2002).
Abnormal cortical mechanisms of voluntary muscle relaxation in patients with writer’s cramp: An
fMRI study. Brain 125(4), 895–903.
Pastor, M. A., Macaluso, E., Day, B. L., & Frackowiak, R. S. J. (2008). Putaminal activity is related
to perceptual certainty. NeuroImage 41(1), 123–129.
Paulig, J., Jabusch, H. C., Grossbach, M., Boullet, L., & Altenmüller, E. (2014). Sensory trick
phenomenon improves motor control in pianists with dystonia: Prognostic value of glove-effect.
Frontiers in Psychology 5. Retrieved from https://doi.org/10.3389/fpsyg.2014.01012
Peller, M., Zeuner, K. E., Munchau, A., Quartarone, A., Weiss, M., Knutzen, A., … Siebner, H. R.
(2006). The basal ganglia are hyperactive during the discrimination of tactile stimuli in writer’s
cramp. Brain 129(10), 2697–2708.
Pesenti, A., Barbieri, S., & Priori, A. (2004). Limb immobilization for occupational dystonia: A
possible alternative treatment for selected patients. Advances in Neurology 94, 247–254.
Peterson, D. A., Berque, P., Jabusch, H. C., Altenmüller, E., & Frucht, S. J. (2013). Rating scales for
musician’s dystonia: The state of the art. Neurology 81(6), 589–598.
Peterson, D. A., & Sejnowski, T. J. (2017). A dynamic circuit hypothesis for the pathogenesis of
blepharospasm. Frontiers in Computational Neuroscience 11. Retrieved from
https://doi.org/10.3389/fncom.2017.00011
Peterson, D. A., Sejnowski, T. J., & Poizner, H. (2010). Convergent evidence for abnormal striatal
synaptic plasticity in dystonia. Neurobiology of Disease 37, 558–573.
Priori, A., Pesenti, A., Cappellari, A., Scarlato, G., & Barbieri, S. (2001). Limb immobilization for
the treatment of focal occupational dystonia. Neurology 57(3), 405–409.
Richardson, S. P., Altenmüller, E., Alter, K., Alterman, R. L., Chen, R., Frucht, S., … Hallett, M.
(2017). Research priorities in limb and task-specific dystonias. Frontiers in Neurology 8. Retrieved
from https://doi.org/10.3389/fneur.2017.00170
Ridding, M. C., Sheean, G., Rothwell, J. C., Inzelberg, R., & Kujirai, T. (1995). Changes in the
balance between motor cortical excitation and inhibition in focal, task specific dystonia. Journal of
Neurology, Neurosurgery & Psychiatry 59(5), 493–498.
Rosenkranz, K. (2010). Plasticity and intracortical inhibition in dystonia: Methodological
reconsiderations. Brain 133(6), e146.
Rosenkranz, K., Williamon, A., Butler, K., Cordivari, C., Lees, A. J., & Rothwell, J. C. (2005).
Pathophysiological differences between musician’s dystonia and writer’s cramp. Brain 128(4),
918–931.
Rosset-Llobet, J., Candia, V., Molas, S. F. I., Cubells, D., & Pascual-Leone, A. (2009). The challenge
of diagnosing focal hand dystonia in musicians. European Journal of Neurology 16(7), 864–869.
Rosset-Llobet, J., Fabregas-Molas, S., & Pascual-Leone, A. (2015). Effect of transcranial direct
current stimulation on neurorehabilitation of task-specific dystonia: A double-blind, randomized
clinical trial. Medical Problems of Performing Artists 30, 178–184.
Ruiz, M. H., Senghaas, P., Grossbach, M., Jabusch, H. C., Bangert, M., Hummel, F., … Altenmüller,
E. (2009). Defective inhibition and inter-regional phase synchronization in pianists with
musician’s dystonia: An EEG study. Human Brain Mapping 30(8), 2689–2700.
Sakai, N. (2006). Slow down exercise for the treatment of focal hand dystonia in pianists. Medical
Problems of Performing Artists 21, 25–28.
Satoh, M., Narita, M., & Tomimoto, H. (2011). Three cases of focal embouchure dystonia:
Classifications and successful therapy using a dental splint. European Neurology 66(2), 85–90.
Schmidt, A., Jabusch, H. C., Altenmüller, E., Hagenah, J., Bruggemann, N., Hedrich, K., … Klein, C.
(2006). Dominantly transmitted focal dystonia in families of patients with musician’s cramp.
Neurology 67(4), 691–693.
Schmidt, A., Jabusch, H. C., Altenmüller, E., Hagenah, J., Bruggemann, N., Lohmann, K., … Klein,
C. (2009). Etiology of musician’s dystonia: Familial or environmental? Neurology 72(14), 1248–
1254.
Schuele, S., & Lederman, R. J. (2004a). Long-term outcome of focal dystonia in string
instrumentalists. Movement Disorders 19(1), 43–48.
Schuele, S. U., & Lederman, R. J. (2004b). Occupational disorders in instrumental musicians.
Medical Problems of Performing Artists 19, 123–128.
Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of
Neuroscience 23, 473–500.
Sohn, Y. H., & Hallett, M. (2004). Disturbed surround inhibition in focal hand dystonia. Annals of
Neurology 56(4), 595–599.
Spector, J. T., & Brandfonbrener, A. G. (2005). A new method for quantification of musician’s
dystonia: The frequency of abnormal movements scale. Medical Problems of Performing Artists
20, 157–162.
Spector, J. T., & Brandfonbrener, A. G. (2007). Methods of evaluation of musician’s dystonia:
Critique of measurement tools. Movement Disorders 22(3), 309–312.
Tamura, Y., Ueki, Y., Lin, P., Vorbach, S., Mima, T., Kakigi, R., & Hallett, M. (2009). Disordered
plasticity in the primary somatosensory cortex in focal hand dystonia. Brain 132(3), 749–755.
Tubiana, R. (2000). Musician’s focal dystonia. In R. Tubiana & P. C. Amadio (Eds.), Medical
problems of the instrumentalist musician. London and Malden, MA: Martin Dunitz.
Tubiana, R. (2003). Musician’s focal dystonia. Hand Clinics 19, 303–308.
Tubiana, R., & Chamagne, P. (1993). Medical professional problems of the upper limb in musicians.
Bulletin de l’Académie Nationale de Médecine 177, 203–216.
Valdes, K., Naughton, N., & Algar, L. (2014). Sensorimotor interventions and assessments for the
hand and wrist: A scoping review. Journal of Hand Therapy 27(4), 272–286.
van der Steen, M. C., van Vugt, F. T., Keller, P. E., & Altenmüller, E. (2014). Basic timing abilities
stay intact in patients with musician’s dystonia. PloS ONE 9(3), e92906.
Vecchio, M., Malaguarnera, G., Giordano, M., Malaguarnera, M., Volti, G. L., Galvano, F., …
Malaguarnera, M. (2012). A musician’s dystonia. Lancet 379, 2116.
Williams, L. J., Butler, J. S., Molloy, A., McGovern, E., Beiser, I., Kimmich, O., … Reilly, R. B.
(2015). Young women do it better: Sexual dimorphism in temporal discrimination. Frontiers in
Neurology 6. Retrieved from https://doi.org/10.3389/fneur.2015.00160
Zeuner, K. E., Knutzen, A., Granert, O., Gotz, J., Wolff, S., Jansen, O., … Witt, K. (2015). Increased
volume and impaired function: The role of the basal ganglia in writer’s cramp. Brain and Behavior
5(2), e00301.
Zeuner, K. E., & Molloy, F. M. (2008). Abnormal reorganization in focal hand dystonia: Sensory and
motor training programs to retrain cortical function. Neurorehabilitation 23(1), 43–53.
SECTION VIII

THE FUTURE

CHAPTER 33

NEW HORIZONS FOR BRAIN RESEARCH IN MUSIC

MICHAEL H. THAUT AND DONALD A. HODGES

INTRODUCTION

Predicting future developments in research is as hazardous and flawed as prediction in any other
area of human life. When trying to predict, we generally look back first and try to find trends and
patterns of development in the past. From there we then try to generate probabilities for future
events. In this way, interestingly (and somewhat irrationally), we tend to build predictions in a
linear fashion by extending the past into the future, although the initial look backwards should
teach us unequivocally the opposite: that history decidedly moves forward in an entirely nonlinear
way. That is as true for science as for any other area of human life. Nonlinear trajectories of
development, however, cannot be predicted; they simply happen. But because the human brain likes
to create a sense of security and to minimize uncertainty, it likes to predict the future and happily
accepts acting irrationally.
Therefore, this last chapter, concluding a book that is intended to provide a systematic
state-of-the-art compendium of brain research in music (a charge to which the authors have
responded in a most remarkable and brilliant fashion), has to be read with a grain of salt and a
healthy dose of skepticism. It should give us pause that, when new brain imaging technology broke
into human neuroscience and completely changed our understanding of the human brain, few would
have predicted that this development would also usher in a most exciting time for brain research in
music. Brain imaging research in music, from relatively simple and straightforward beginnings and
questions (e.g., is there a music-biased hemisphere in the brain?), has developed into a complex
field of inquiry: an almost Boethian renaissance of considering music a science and an important
subject of scientific inquiry. This would not have been a surprise in medieval musicology, because
for most of its known and recorded history, starting in the Western world in Greek antiquity, music
was considered primarily a science (musica mundana) and only secondarily a performance art
(musica instrumentalis et vocalis).

TEN NEW HORIZONS FOR FUTURE RESEARCH

This last and short chapter, therefore, will try to point to some significant past developments in
music and brain research that may hold potential for future developments. We will also try to point
to areas of future research effort that, in our understanding, are underserved and in need of more
and newer research strategies. And, with due caution, the chapter will venture some ideas about
nonlinear leaps, motivated by the many innovative accounts provided in the chapters of this book.

1. One of the most important scientific developments of the past thirty years may be that brain
research in music has slowly moved from a mostly exploratory approach to a hypothesis- and
theory-driven field of inquiry. The emergence of a scientifically respected field of music
neuroscience reflects this successful paradigm shift, which is also manifested in increasing
numbers of publications in high-impact neuroscience journals and in a series of high-profile
international meetings, for example (but not limited to) initiatives of the New York Academy of
Sciences, the Mariani Foundation in Italy, the Kennedy Center and NIH, the Royal Society of
Medicine, the Royal Institution of Great Britain, and the Society for Clinical Neuromusicology.
It is also no longer uncommon for universities to have chairs in music psychology and music
cognition that have merged into cognitive neuroscience fields. This trend may continue into the
future, and as data and knowledge accumulate, more complex theories as well as new
explanatory models of brain processing in music may emerge.
2. As cogently pointed out in several chapters of the book, highly sophisticated evidence for the
existence of music (and visual arts) in very early human history, virtually with the first
appearance of records of Homo sapiens, may drive further research into the role of music in
human brain development. One may pose the question of whether the arts and music were the
critical first laboratory for abstract and symbolic thought and expression in the human brain,
which subsequently formed the cognitive basis for later developments in language, culture, and
technology. Looking at the five almost equidistant bore holes of a 45,000-year-old bone flute in
pentatonic tuning may provoke the question of whether the first feat of acoustical engineering
actually happened in music, tens of thousands of years before other records of technology
appear (Conard, Malina, & Muenzel, 2009). Current persuasive trends, which may determine
future research directions, seem to go beyond earlier theories that ascribed to music a more
secondary role in human brain development, that is, theories focusing on emotional expression,
social support, mating functions, or spirituality, or even considering music a curious but
pleasing auditory derivative of verbal language.
3. The current rapid developments in brain imaging that emphasize dynamic network modeling
and connectivity analyses (in other words, measuring functional information flow rather than
imaging topographical regions) have been a boon for music research, because music, from a
spectral and structural point of view, is the most complex auditory language the human brain
has developed. This approach may lead to a much better understanding of music processing in
the human brain, and it will also allow for investigating music in a more ecologically salient
and valid framework. That is, we can now study the structural complexities of the full
compositional architecture of music in a holistic rather than a lab-type, single-element fashion
(a minimal computational sketch of such a connectivity analysis appears at the end of this list).
4. Beyond imaging research on network connectivity and information flow, neurotransmitter
imaging has seen much less output, possibly because it is technologically more challenging, but
it may hold the key to a deeper understanding of the nature of the actual information flow
driving brain connectivity. Dopamine or MAOB protein imaging, for example, may offer
important keys to understanding the neurochemical machinery involved in music processing
beyond structural and functional imaging. However, the required PET technology, including the
production of radioactive imaging ligands, is a much more involved technical process than
MRI-based imaging. Furthermore, PET is expensive, the use of radioactive tracers limits the
number of scans a participant can undergo in an experiment, and some candidates are unwilling
to participate because of the radioactive materials involved.
5. Neurotransmitter imaging also has the potential to continue to provide new insights into clinical
translations of brain processes in music perception, cognition, and production. One of the most
rapidly (and almost nonlinearly) evolving areas of brain research in music has been clinical.
Clinical neuroscience research in music, which began in earnest but with small steps twenty-five
years ago, has now led to unprecedented medical recognition of music and rhythm as significant
modalities in brain rehabilitation. The World Federation of Neurorehabilitation, for example, has
endorsed Neurologic Music Therapy as a standardized and evidence-based treatment system and
maintains a special study section to advance research and clinical practice. Similarly, the British
Medical Association gave the Oxford handbook of neurologic music therapy a runner-up award
at the annual Medical Book Award in the category “Best New Book in Neurology 2015.” These
are truly remarkable advancements, and this may continue to be an applied research area with a
strong trajectory of future progress, especially via studies that combine investigations of neural
mechanisms with clinical outcomes. For example, recent research has begun to examine the
currently unknown neural mechanisms underlying the well-documented preservation of
long-term musical memories in Alzheimer’s disease, which is often much more pronounced
than the maintenance of non-musical memories (Thaut et al., 2018). Such insights may lead to
more focused clinical applications, for example, determining whether music-based exercises
can produce cognitive boosts, even if temporary, for persons with cognitive dysfunctions and
dementia. Another example of newly emerging research uses PET imaging technology to study
dopamine release as a neural mechanism of music’s effects on mobility improvements in
Parkinson’s disease (Koshimori et al., 2018). This may also lead to new investigations, mostly
still in conceptual stages, of neurotransmitter functions in clinical applications of music to
dementia and mental health, for example, clinical depression.
6. As therapy and rehabilitation move strongly towards a learning and training paradigm, and as
therapists become more and more regarded as “clinical” coaches who help retrain brain function
or deploy effective learning and training strategies in neurodevelopment, the strict lines between
music learning and music as therapy, in both research and practice, may become increasingly
blurred as an understanding of similar underlying brain mechanisms emerges. Research into the
effect of music on general intelligence measures and cognitive development has followed a
checkered path, not without significant controversy. This book provides a thorough and
critically important appraisal of music as one of several biological languages of the human
brain, one which contributes to integrative cognitive functions and mental fluidity. This
appraisal provides a well-grounded platform for future research.
7. Future research directions merging genomics, neurotransmitter functions, evolutionary biology,
and comparative cross-cultural investigations of music perception, cognition, and performance
may provide truly cutting-edge insight into the role and function of music in human evolution,
which in many ways remains a mystery, as Darwin himself so prominently pointed out as early
as 1871 (Darwin, 1871). Comparative music research across cultures has been surprisingly slow
to advance, and, hopefully, this book will again provide accelerating “starting blocks” for new
research trajectories. Comparative research combining neuroscience with musicology, music
theory, and music cognition/perception could provide many answers to age-old questions about
the contributions of innate vs. cultural factors shaping the musical brain, as well as about the
existence of deep-structure universals vs. surface-structure modifications and adaptations,
viewing music in a Chomskyan sense as a syntactical auditory language system. Thinking this
concept through further, one may posit, in parallel to Chomsky’s innate “Language Acquisition
Device” (LAD), an innate “Musical Language Acquisition Device” (MLAD). This would
require more collaborative cross-disciplinary research undertakings between neuroscience and
music, including history, theory, composition, performance, and music psychology. Such
“fusion” research would greatly benefit almost all future directions in music and brain research,
as the professional music side is still often underrepresented.
8. This book also tries to place an important exclamation mark behind the urgent need for more
research in musician health. Because the public view of music has at times been inflated in
recent years into a simple and romantic notion of music as a cure-all for brain and soul, the fact
that professional musicianship is a very challenging and injury-prone occupation has seemed to
the external observer a contradiction in terms: by such a definition, and by brain research
metrics, a musician must be the happiest, most fulfilled, and most brain-developed person in the
world. The data show a very different picture. The incidence of musculoskeletal injuries at some
time in a musician’s performance career is staggeringly high, at over 50 percent, and such
injuries can lead to career-ending conditions because some, for example musician’s dystonia,
are not reversible at this time (Guptill, 2011). Other areas of musician health, including
performance stress, anxiety, and other negative psychological factors, can lead to severely
debilitating disease conditions whose incidence rates are again quite high. Musician health is an
area that needs a very significant push towards new and improved research (Henechowicz,
Chen, Cohen, & Thaut, 2018). Sports science, including optimized motor learning, injury
prevention, and performance psychology, is highly developed and may serve as a partner and
model for musician training and musician health. The teacher/student model in music has
always been dominated by the image of the artistic master, whereas sports training has always
worked from a coaching model. Admittedly, there are significant differences between music and
sports training, for example the necessary emphasis in music on motor skills as physical tools
for aesthetic and artistic expression (although some very difficult sports disciplines, such as
figure skating or gymnastics, also have significant aesthetic components). However, there are
also considerable motoric and psychological overlaps between the two fields in performance
learning and practice. Fortunately, awareness that change is needed is growing, and more and
more important research initiatives are starting, but much remains to be done.
9. In contrast to the advancing work in music–brain research and music therapy, progress in music
education has been much slower. Significant strides are being made in basic research on music
learning (see, e.g., Chapters 19 and 22, this volume); however, very little has been done in the
way of applied research in music education settings comparable to the clinical work discussed
in previous points. In the Preface to Neurosciences in music pedagogy (Gruhn & Rauscher,
2007), the editors wrote, “there is in fact nothing really new that brain researchers can tell
educators about teaching that they did not know” (p. vii). That statement may be somewhat
extreme in the light of newer findings, but the situation remains that recommendations given to
an elementary general music teacher, a middle school choir director, or a high school band
director are often either grossly simplified, to the point of exaggerating or distorting research
findings, or watered down into generalities that amount to little more than platitudes. While it
may be too much to expect highly detailed pedagogical instructions supported by neuroscience,
this is a field that is ripe for exploration.
10. Curiously, it may be that neuroscientific research is beginning to
encroach on territory once reserved for philosophers. Take the case
of aesthetics, for example. More than fifty years ago, Wittgenstein
(1967, pp. 19–20) wrote:
People still have the idea that psychology is one day going to explain all our
aesthetic judgments, and they mean experimental psychology. This is very funny
—very funny indeed. … Supposing it was found that all our judgments proceeded
from our brain. We discovered particular kinds of mechanism in the brain,
formulated general laws, etc. One could show that this sequence of notes produces
this particular kind of reaction; makes a man smile and say: “Oh, how wonderful.”

In contrast, Huron (2016) remarked that biology and geology now answer questions once falling
under the purview of “natural philosophy,” physics and astronomy have superseded cosmology,
social and behavioral sciences handle questions of human behavior, and so:
If evolutionary psychologists are correct, then questions concerning the
experience of beauty and ugliness may soon slip from the domineering grasp of
philosophy. … Only time will tell whether we are witnessing the passing of the
aesthetics baton from philosophy to empirical psychology.
(Huron, 2016, p. 242)

More particularly, “From the perspective of cognitive neuroscience, the disembodied,
nonutilitarian notion of aesthetic pleasure posited by Kant cannot easily be reconciled with
biology” (Huron, 2016, p. 242). Rather than pitting one discipline against another, perhaps we
should consider that “we have much to gain by coordinating insights from music listeners,
music philosophers, and music researchers” (Hodges, 2013, p. 276). The word aesthetics comes
from the Greek aisthetikos, which broadly means “pertaining to sense perception.” In this
context, even the Kantian notion of required “A Priori Knowledge in Aesthetic Judgment”
(Kant, 1790), in which the perceived art object seems created to fit the processing of one’s
perceptual apparatus (in other words, as if the object were made to be heard and seen), may
become reconciled with modern neurobiology through such coordinated exploration (Thaut,
2005). Such coordinated efforts would be helpful in a number of areas covered in this volume.
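To make the idea of a connectivity analysis (point 3 above) concrete for readers less familiar with
network neuroscience, the following is a minimal sketch, written in Python with the NumPy library.
The data are simulated stand-ins for preprocessed regional fMRI signals, and all names and the
threshold value are illustrative assumptions rather than any particular study’s pipeline. It shows the
common core of such analyses: pairwise correlations between regional time series form a functional
connectivity matrix, which can then be summarized with simple graph measures such as node degree.

import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for fMRI data: 6 brain regions (ROIs) x 200 time points.
# In a real study these would be preprocessed BOLD signals, e.g., recorded
# while participants listen to a complete musical work.
n_rois, n_timepoints = 6, 200
timeseries = rng.standard_normal((n_rois, n_timepoints))
timeseries[1] += 0.8 * timeseries[0]  # inject correlated activity between ROIs 0 and 1

# Functional connectivity: the matrix of pairwise Pearson correlations
# between regional time series (rows are treated as variables).
fc = np.corrcoef(timeseries)
np.fill_diagonal(fc, 0.0)  # discard trivial self-correlations

# Threshold into a binary graph and compute each node's degree, a basic
# graph-theoretic index of how widely a region is connected in the network.
adjacency = (np.abs(fc) > 0.3).astype(int)  # 0.3 is an arbitrary illustrative cutoff
degree = adjacency.sum(axis=1)

print("Functional connectivity matrix:\n", fc.round(2))
print("Node degrees:", degree)

Dynamic network modeling typically extends this idea by recomputing such matrices over sliding
time windows, so that changes in connectivity, rather than a single static map, can be followed as a
piece of music unfolds.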

CONCLUSION
The ten horizons laid out in this chapter for potential future research should not be read as
predictions (see the introduction to this chapter) but rather as areas of potential growth for the
advancement of knowledge about music as a language the human brain has created. The chapters in
this book aim to present the most current state of knowledge about the functions and operations in
the brain associated with music. As we think and create music subjectively, it becomes an object of
discovery and scientific research. Therefore, the final mission of the book is to serve as a
springboard for future brain research in music, providing knowledge from very different angles of
scientific discovery to help shape new trajectories of pursuit in understanding music and our brains;
that is, as the brain creates and engages in music, it is in turn changed by that engagement.

REFERENCES
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton-Century-Crofts.
Conard, N. J., Malina, M., & Muenzel, S. C. (2009). New flutes document the earliest musical
tradition in southwestern Germany. Nature 460, 737–740.
Darwin, C. (1871). The descent of man and selection in relation to sex. London: John Murray.
Gruhn, W., & Rauscher, F. (Eds.). (2007). Neurosciences in music pedagogy. New York: Nova
Biomedical Books.
Guptill, C. A. (2011). The lived experience of professional musicians with playing-related injuries: A
phenomenological inquiry. Medical Problems of Performing Artists 26(2), 84–95.
Henechowicz, T., Chen, J., Cohen, L. G., & Thaut, M. H. (2018). Prevalence of BDNF
polymorphism in musicians: Evidence for compensatory motor learning strategies in music?
Proceedings of the Society for Neuroscience. In press.
Hodges, D. (2013). Music listeners, philosophers, and researchers. Physics of Life Reviews 10(3),
275–276.
Huron, D. (2016). Aesthetics. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of
music psychology (2nd ed., pp. 233–245). Oxford: Oxford University Press.
Kant, I. (1790). Kritik der Urteilskraft. Berlin: Lagarde.
Koshimori, Y., Strafella, A., Valli, M., Sharma, V., Cho, S. S., Houle, S., & Thaut, M. H. (2018).
Motor synchronization to rhythmic auditory stimulation (RAS) attenuates dopamine response in
the ventral striatum in young healthy adults. Proceedings of the Society for Neuroscience 670, 15.
Thaut, M. H. (2005). Rhythm, music, and the brain. New York: Taylor & Francis.
Thaut, M. H., Schweizer, T., Leggieri, M., Churchill, N., Fornazzari, L., & Fischer, C. (2018). Neural
basis for potential preservation of musical memory and effects on functional intra network
connectivity in early Alzheimer’s disease and mild cognitive dysfunction. Proceedings of the
Society for Neuroscience 741, 12.
Wittgenstein, L. (1967). Lectures and conversations on aesthetics, psychology, and religious belief.
Ed. C. Barret. Berkeley, CA: University of California Press.
INDEX

Note: Italic page numbers represent figures on those pages

β-endorphins 336–8, 342, 345 see also beta-endorphin

A
Abrams, D. A. et al. 397
absolute pitch (AP) 240, 443, 461, 471
and autism 681–90, 683
abstract-feature mismatch negativity (afMMN) 196
academic achievement, effect of musical training 654–6
acetylcholine (Ach) receptors 350
acquired amusia 765–70
acquired brain injury (ABI) 700–1
Action Research Arm Test (ARAT) 701
Action Simulation for Auditory Perception (ASAP) model 225
active vs. passive exposure 32
activities of daily living (ADL) 745
adaptive convergent sequence evolution 443
Adhikari, B. M. et al. 509–10
adjacency matrices (Aij) 133
adrenocorticotropic hormone (ACTH) 342, 741
aesthetic experiences 366, 367–74, 371, 373
brain structures 375–6
emotions 73–4, 75–8, 80
future challenges 381–2
pleasure 105 see also pleasure in music
studies 377–81
aesthetic judgment 286, 290, 374, 376
affect, definitions 286–7
affective functions, neurologic music therapy 745
aging 623–4, 636–7
brain mechanisms 629–31
cognition in musicians 624–6
emotions and well-being 628–9, 635–6
and language 633–4
and listening to music 627–8
and memory 632–3
and motor functions 634–5, 705
and musical training 626–7, 656–7
pathologies 631–3, 635–6
singing therapy 725
Aitken Dunham, D. J. 719
Akan language 570
Albanese, A. et al. 776
allergic skin responses 348
Alluri, V. and Toiviainen, P. 377
Alm, P. A. 722–3
alpha-melanocyte-stimulating hormone 741
Altenmüller, E., Finger, S., and Boller, F. 3–4
Alzheimer’s disease (AD) 254, 625, 631–3, 634, 635–6
cognitive remediation 740
NMT 751
Ammirante, P. et al. 225
amplitude envelopes 151, 157, 158, 190–1
amplitude variation 153–5, 154
Amunts, K. et al. 202
amusia 5, 25, 400, 443, 461, 556
acquired 765–70
congenital 760–5, 769–70
and imagery 523–4
and language disorders 581
phenotypes 770–1
pitch-based 761–3
amygdala 102–4, 103, 299, 300, 301, 446
Anderson, B., Southern, B. D., and Powers, R. E. 217
angular gyrus 518
anhedonia 768–9
animals
brain plasticity 431
dystonia 477
rhythm perception 175–6
anterior cingulate cortex (ACC) 106, 300, 301, 490, 513–14
anterior pituitary 341–5
anticipatory musical imagery 526
anxiety 752
aphasia 393, 400, 523, 720–1
apraxia of speech (AOS) 718–20
aptitude 427–8
evolution 443–5
exceptional musicianship 671–4, 681–90
genetic influences 555–6
genetic markers 445–6
as genetic trait 440–3
innateness 808
and musical training 658–60, 659
without training 25
archaeological findings 29–30
arcuate fasciculus 428, 471, 634
arginine vasopressin (AV) 339, 340–1, 342
Arm Paresis Score 701
Arnal, L. et al. 407–8
arousal 286, 601
enhancing musical memory 246–7
art
archaeological findings 29–30
neuroaesthetics of 367–9
associative coding of emotions 288
Associative Mood and Memory Training (AMMT) 749
assortativity 129
asthma 727
asymmetric sampling in time hypothesis 406
atlas/region-of-interest (ROI) based networks 130, 131
attention 263–4
to pitch and harmonicity 266–7
selection and filtering 266
temporal 267–9
theories 264–6
training 747–8, 749–50
attention deficit (hyperactivity) disorder (ADD/ADHD) 607
audio-visual bounce effect (ABE) 157
audio-visual integration research 156–8
auditory areas, primary (A1) and secondary (A2) 464
auditory association areas (AAs) 464
auditory belt 221
auditory brainstem response (ABR) 550–1, 553
auditory core 221
auditory cortex 394, 428
auditory domain 470
auditory feedback 464
auditory-frontal networks 93–5
auditory gestalts 464
auditory-limbic networks 102–5
auditory–motor integration circuit 469
auditory-motor networks 95–102, 96
auditory parabelt 221
auditory pathways 90–3, 90, 217
auditory perception 153–9
autistic children 678–81, 678, 680
development 675–7, 677, 678–81, 678, 680
Auditory Perception Training (APT) 747
auditory processing 216–17
auditory sensory memory (ASM) 192–4
auditory sensory processing 517
auditory stream segregation 266
auditory system 188–94
Auditory Verbal Learning Test 751
Auerbach, S. 470
autism spectrum disorder (ASD) 581, 607, 705–6
and absolute pitch (AP) 681–90, 683
and development of auditory perception 678–81, 678, 680
and exceptional musical abilities 671–4, 681–90
autobiographical memory 245, 248, 302, 633
axons 27
Ayotte, J. et al. 766

B
Baba, A. et al. 527
Babbitt, M. 569
Bach, J. S. 197
background music 252–4, 627–8, 633
backward playing of notes 151
Baddeley, A. D. 531
Baddeley, A. D. and Logie, R. H. 531
Baer, L. et al. 426
Bailey, J. A. et al. 426
Baird, A. et al. 632–3
Balbag, M. A. et al. 626
Balkwill, L. L. and Thompson, W. F. 45
Bangert, M. and Altenmüller, E. 469
Bantu languages 570
Barabási, A.-L. 125
Baram, Y. and Miller, A. 704
Bard, P. 189
basal ganglia (BG) 223, 301, 426, 467, 473–4, 696, 716
rhythm perception 168–70
basal ganglia-thalamo-cortical network 100–2
bats 443
beat 165–6, 166, 187
continuation 101
extraction 592
perception 166–72, 177, 178–9
somatosensory perception 225
visuomotor perception 224
see also rhythm
Beat Alignment Task (BAT) 596
beat finding disorder 763–5
bebop 569
Beethoven, L. van 145, 159
behavioral changes in musical training 272–3
behavioral studies, imagery and perception 522–3
Beisteiner, R. 5
Bellaire, K. et al. 718
Benedek, M. et al. 278
Bengtsson, S. L. et al. 27, 471, 488–9, 493–4, 498
Berger, H. 5, 268
Berkowitz, A. L. and Ansari, D. 489–90, 492, 518
Berlyne, D. E. 71
Berns, G. S. et al. 377
beta-endorphin 741 see also β-endorphins
Bhide, A. et al. 728
biological approaches vs. cultural approaches 19–22, 461–2
biological restrictions 21
bipolar disorder 451
birds, rhythm perception 175–6
birdsong 67–73, 81, 393–4, 441, 443, 444–5, 447, 449
Blondel, V. D. et al. 129
Blood, A. J. and Zatorre, R. J. 292, 377
blood-flow studies 292, 293, 296, 304
blue notes 213
Bogert, B. et al. 375
Bonacina, S. et al. 728
bonding 66
Boone, D. R. et al. 726–7
bootstrapping problem 573, 742
Bottiroli, S. et al. 628
Box and Block Test (BBT) 701
brain, co-evolution with hands 21
brain damage
effect on imagery and perception 523–4
pathological aging 631–2, 635–6
pitch perception 191
and rhythm processing 166, 169–70
brain-derived neurotrophic factor (BDNF) 753
brain development see development; plasticity
brain imaging studies of imagery and perception 525–6
brain injuries 700–1
cognitive remediation 740
singing therapy 718
brain scanning, technological advances 365
brainstem 301
brainstem reflex 289, 302
Brattico, E. 367, 377
Brattico, E. and Pearce, M. T. 377
Brattico, E. et al. 372, 373, 374, 375–6, 377
Brazilian music 603
BRECVEMAC framework 289–90, 294, 299, 301 see also ICINAS-BRECVEMAC
Bregman, A. S. 266
Brendal, B. and Ziegler, W. 719
Broadbent, D. E. 266
Broca’s aphasia 720–1
Broca’s area 202, 243, 277, 397, 400
Brotons, M. and Koger, S. M. 634
Brown, S. and Jordania, J. 30
Brown, S. et al. 377, 506–7, 513
Brust, J. 3
Bugos, J. A. et al. 627

C
Caetano, G. and Jousmäki, V. 220
Cage, J. 21
Cameron, D. J. et al. 49
Cappiani, L. 4
Carnātic music, tonality 48
Cason, N. et al. 404, 606
caudal subdivision 216
cave paintings 29–30
CBB (culture–behavior–brain) loop model 20, 20
CDK5 pathway 450
cello
harmonics 151, 152
multisensory perception 221
central activation 72
central nervous system (CNS) 717
central sulcus 465, 472
centrality analysis 128
cerebellum 100, 106, 243, 300, 301, 427, 467, 490, 696
rhythm perception 168
cerebral palsy (CP) 698, 699, 700, 706
Chan, M. F. et al. 629
Chapin, H. et al. 377
Chatterjee, A. and Vartanian, O. 381
Chen, J. K.-C. et al. 723–4
Cheung, V. et al. 201
Chiao, J. 20
Chikazoe, J. et al. 293
childhood apraxia of speech (CAS) 719–20
children see infants
chills, response to music 335, 380
Chinese IDyOM model 53–8
Chinese music, phrase boundary perception 49
Chobert, J. et al. 553
cholinergic systems 350
Chomsky, N. 392, 808
Chomsky hierarchy 196
Chong, H. J. et al. 706
Chopin 601
chord functions 200
chord transitions 194, 197–9, 198, 201–3
chords 147–8
CHRNA9 gene 446
chromosomes 445–6, 448
chronic obstructive pulmonary disease (COPD) 727
cingulate motor area (CMA) 465, 466–7, 515–16
Cirelli, L. K. et al. 604, 605
clapping, synchronous 605 see also synchronous movement
clarinet, harmonics 148–9, 149, 150
classical music 337, 549
clinical research model 744
Closure Positive Shift 49
clusters 127–8
cluttering 722
cochlea 188–9, 192
cochlear 446
cochlear implant (CI) users 147, 723–4
cognition
of creative musicians 278
and listening to music 627–8
and musical expertise 624–6
neural mechanisms 521
neurologic music therapy (NMT) 747–53
and neurological disorders 738
and short-term training 626–7
and training 647–50, 654–7, 659
cognitive control 271
cognitive decline in aging 623–4
cognitive dysfunction, music therapy 742–3
cognitive functions, neurologic music therapy 745
cognitive goal appraisal 290
cognitive integration 213–16
cognitive neuroscience of music 365, 369–74, 371, 373
brain structures 375–6
future challenges 381–2
studies 377–81
cognitive remediation/rehabilitation (CR) 738–40, 753
vs. Transformational Design Model of NMT 746
cognitive reserve, in aging 629–31
Cohen, J. 250
Cohen, N. S. and Masse, R. 718
Colcord, R. D. and Adams, M. R. 723
Coleman, O. 75–6
Common Patterns 30–1
community detection procedures 130
community structure analysis 129, 130, 131
comparative approach 393
compensatory approach to cognitive remediation 739
complexity 70, 248
composers of music 450–1
instructions 146
conductors 468
congenital amusia 760–5, 769–70
connectivity 122–3, 127
and improvisation 499–500
see also network metrics
connectivity analysis 133
consciousness disorders 750
Conserved Universals 30–1
contagion, emotional 290, 303
convergent analysis 449–50
Cook, N. 58
Cope, T. E. et al. 704
copy number variation (CNV) analysis 446
corpus callosum (CC) 421, 422, 425, 426, 471, 473
Corrigall, K. A. et al. 461
cortical sound processing 547–50
cortico-cerebellar network 97–100, 98, 102
corticospinal tract 471
corticotrophin-releasing hormone (CRH) 741
corticotropin releasing factor (CRF) 342
cortisol 341–5, 741
Costa-Giomi, E. 652
Cox, A. 530
cramp 475, 477, 777, 785–6 see also dystonia
Crasta, J. E. et al. 696
creativity 263, 274–5, 278–9
biological basis 450–1
definition 277–8
generative see improvisation
in musical improvisation 275–6
neural correlates 497–9
neuroimaging studies 276–7
personality and cognitive profiles 278
Critchley, M. and Henson, R. 3
cross-cultural research 32–3, 43–4, 58–60
memory 50–1
preferences 46–7
recognition of emotions 45–6
structural features 47–50
Cross, I. 571
Cross, I. and Morley, I. 19
crying, response to music 76–7
Cue Redundancy Model (CRM) 45, 48, 58
cultural approaches 22–32
vs. biological approaches 19–22, 461–2
cultural distance 44, 51–60
cultural distance hypothesis 32, 33, 52
culturally contextualized behaviors (CC-Behavior) 20
culturally voluntary behaviors (CV-Behavior) 20
Cupchik, G. C. et al. 367
cyclic form 568
cystic fibrosis 727
cytokines 348

D
Dai, L. et al. 340
Dalla Bella, S. et al. 601
dance 533–4
beat finding disorders 763–5
universality 30
see also synchronous movement
dance movement therapy (DMT) 336
Darwin, C. 718, 808
data-driven approaches to creativity 277–8
DC-EEG (direct current EEG) 5
de-expertise 475–6
de Manzano, O. and Ullén, F. 492–5, 513
Deacon, T. W. 571–2
deafness 189, 581, 723–4
deep brain stimulation (DBS), for musician’s dystonia (MD) 793
Default Mode Network 136–7
Default Network 277
Degé, F. and Schwarzer, G. 653
degenerative brain disorders, cognitive remediation 740
degenerative movement disorders 701–5
degree distribution 125–7
Dehaene-Lambertz, G. 579
deliberate practice 460–1
dementia 448, 631–3, 635–6
Demorest, S. M. and Morrison, S. J. 52
Demorest, S. M. et al. 51
dendrites 463
Dennett, D. C. 279
DePape, A.-M. R. et al. 681
depression 339, 342, 752
Desai, V. and Mishra, P. 726
Deutsch, D. 265
Deutsch, D. et al. 571, 582–3
development
of auditory perception 675–7, 677
autistic children 678–81, 678, 680
of the brain, effect of training 422–7, 426, 551–3, 557
co-development of language and music 576–8, 577
disorders
and amusia 770
in music and language 580–1
rehabilitation for 705–6
and rhythm 605–7
of language perception 574–6, 575
developmental coordination disorder (DCD) 607
Developmental Speech and Language Training Through Music (DSLM) 717
developmental stress hypothesis 68, 81
Diagnostic and Statistical Manual of Mental Disorders–5 (DSM–5) 672–3, 722
Diamond, A. 263
Different Trains (Reich) 570–1
diffusion tensor imaging (DTI) 5, 380, 421, 425, 470, 471
Ding, N. et al. 406
direct current stimulation (DCS), for dystonia 784
Directions into Velocities of Articulators (DIVA) 717
discrimination of sounds 394–6
disordered timing 786–7
disorders, language and music 580–1 see also development; neurological disorders
Distorted Tunes Test (DTT) 442
Divergent Thinking Test 278
Dobos, R. 29
Dobzhansky, T. 19
‘dock-in’ model of emotional recognition 45–6, 58
Doidge, N. 27
dolphins 443
Donnay, G. F. et al. 502–4, 517
dopamine (DA) 102, 104–5, 246, 334–6, 337, 431, 446, 448–9, 450–1, 451, 631, 741
and rhythm perception 170
dopamine imaging 807
dopamine signaling 795
dorsal cochlear nucleus (DCN) 338
dorsal pathways 217, 218
dorsal premotor cortex (PMd) 276, 488, 489, 490, 494–5, 515
dorsolateral prefrontal cortex (DLPFC) 276, 488, 514–15
Dowling, W. J. and Harwood, D. L. 148, 288
Down syndrome 699, 700
Drake, C. and Bertrand, D. 30
DRD2 polymorphism 716
DRD4 gene 450
drumming, synchronized 603–4 see also synchronous movement
DSM-5 672–3, 722
dual-pathway model 94
DUSP1 gene 447, 449–50
Dynamic Attending Theory 268
dysarthria 717–18
dyslexia 580, 606–7, 653
and amusia 770
neurologic music therapy (NMT) for 727–8
rhythm processing 173
dystonia
embouchure 477, 777, 779, 779
focal 475, 476–8, 776–7, 778, 784–6
focal hand (FHD) 777, 778, 779, 784, 785–6
musician’s (MD) 475–8, 479, 700, 777–82, 778, 779
future directions 791–6
pathogenic theory 787–90, 788, 794–6
pathophysiology 784–7, 794
and plasticity 787
treatments 782–4, 783, 791–3
types 776–7
Dystonia Study Group 780, 781–2

E
early right anterior negativity (ERAN) 199–200, 266, 277, 375, 548
earworms 528
East African music, rhythm perception 49
echoic memory 192–3
echolalia 679–80
echolocation 443
Edgren, J. 5
education in music 46, 442, 809
EEG (encephalography) 5, 376
improvisation 509–12
Einarson, K. M. and Trainor, L. J. 596
Eitan, Z. and Granot, R. Y. 530
El Haj, M. et al. 633
El Sistema music training 552, 554–5
electrophysical methods 468
electrophysiology, studies of imagery and perception 524
Eley, R. and Gorman, D. 727
Ellis, R. J. et al. 273
Elmer, S. et al. 402
embodied musical imagery 529–34
embouchure dystonia 477, 777, 779, 779
emotions 79, 285–6, 465
aesthetic 73–4, 75–8, 80
and aging 628–9, 635–6
auditory processing 190
cross-cultural studies on recognition 45–6
cultural specificity 32
definition 286
discreteness 293
empirical studies 291–304, 306–23
enhancing musical memory 246–7
induction 289–90, 297–300, 301–2
vs. perception 287, 292, 296
layers of expression 288–9, 288
neural responses 104
neurologic music therapy (NMT) 747–53
perception 287–9, 300
vs. induction 287, 292, 296
psychological mechanisms 287–90
regulation using cognitive remediation 741
and rhythm 600–1
sensitivity to 402
specific brain regions 293–6
universality 30
visual displays 213
empirical aesthetics 366, 367–74, 371, 373
brain structures 375–6
future challenges 381–2
studies 377–81
endocrine responses 339–45
endogenous cannabinoid receptors 793
endogenous opioid systems (EOSs) 336–8
enjoyment of music see pleasure in music
entrainment 407
entropy 277–8, 599–600
environment, influence on genetic expression 22–3, 23
environmental effects, vs. genetic effects 440
epilepsy 5, 766
episodic buffer 241
episodic memory 238, 245, 248, 290, 303
ERAN (early right anterior negativity) 199–200, 266, 277, 375, 548
Ericsson, K. A. et al. 460
error-driven learning 430
Escoffier, N. et al. 300
esthetic experiences see aesthetic experiences
esthetic judgment see aesthetic judgment
evaluative conditioning (EC) 290, 303
event-related desynchronizations (ERDs) 249–50
event-related potentials (ERPs) 5, 192, 375, 594
in infants 28
in monkeys 176
phrase boundary perception 49
tonality perception 48
visual rhythm perception 178
evolution
of brain and hands 21
learned song 74–5
of musical aptitude 443–5
Ewe language 570
Executive Control Network 277
executive functions (EFs) 255, 263, 269–70
in aging 657
and musical training 554–5, 649–50
training 748
transfer 269–74
expectations 267, 290, 303–4, 398
experience-dependent processes 25–6, 26
experience-expectant processes 25–6, 26, 26–7
expert performance 22–3, 23
expertise see musicians
explicit memory system 238, 244, 245–6, 255
exposure see familiarity
extinction 265
extracurricular activities 654–5

F
faculties of the mind 4, 4
Fahn-Marsden (FM) scale 782
falling, risk of 702, 705
familiarity
and cross-cultural research 32, 33
infant responses 28–9
and music memory 50
network connections 106
and preference 46–7, 58, 240
scale perception 47
far transfer 271–74, 646, 649
Farrugia, N. et al. 528
Fava, E. et al. 579
Fawcett, C. and Tunçgenç, B. 604–5
feature integration theory 264–5
feedback 430, 464, 465, 699, 717
feedforward and feedback connections 219
Fernández-Miranda, J. C. et al. 95
Ferreri, L. et al. 252, 628
ferrets 431
fetuses, response to musical stimulation 27–8
Fifth Symphony (Beethoven) 145, 159
filtering, attentional 266
flat tones 153–5, 154, 155
Flaugnacco, E. et al. 606–7, 728
fluency disorders 721–3
fMRI (functional magnetic resonance imaging) 5, 365, 469
of improvisation 488–506
network-based approaches 132–3, 134, 135
phrase boundary perception 49
rhythm perception 168–9
focal dystonias 475, 476–8, 776–7, 778, 784–6
focal hand dystonia (FHD) 777, 778, 779, 784, 785–6
Fodor, J. 392
foot tapping, neural basis 97
forgetfulness of self 77
formal institutional training in improvisation (FITI) 511–12
FOS gene 447, 449–50
Foster, N. A. and Valentine, E. R. 633
Foster, N. E. and Zatorre, R. J. 422
Fox, N. A. et al. 220
FOXP1 gene 445
FOXP2 gene 443–5, 450
fractional anisotropy (FA) 426, 470, 473
Fractionating Emotional Systems (FES) 45
François, C. and Schön, D. 403
free form jazz 75–6
free response generation 492–4
freestyle rap 276, 495–7
freezing episodes 703
French horn, harmonics 155
frequency following response (FFR) 190, 550–1
frequency range, biological restrictions 21
frequency tagging approach 171
Friederici, A. D. 201
Fritz, T. 45–6, 58
frontal cortex 93–5, 301
frontal gyrus 300
Früholz, S. et al. 377
Fu, Q.-J. et al. 724
Fujioka, T. et al. 172
functional near-infrared spectroscopy (fNIRS) 724
functions of music 31
fusiform face area (FFA) 770
fusiform gyrus 488–9
future of brain research in music 805–11

G
Gaab, N. et al. 239
Gagaku music 568
Gall, F. J. 4
Galvan, A. 25
gamelan music 568
garden path sentences 398, 582
Gaser, C. and Schlaug, G. 472
Gaston, E. 19
GATA2 gene 448, 448
Gaver, W. 671–3
gender ratios for musician’s dystonia (MD) 788–9, 789
gene–maturation–environment interactions 424
Generative Syntax Model (GSM) 197
Generative Theory of Tonal Music (GTTM) 197
genetic effects vs. environmental effects 440
genetic influences 20
on musical behavior 22–3, 23
genomic approaches 5, 439, 452
to aptitude 440–3, 445–6
convergent analysis 449–50
creativity 450–1
evolution 443–5
effect of music on transcriptome 447–9
genre 137, 568
German music 603
Gerry, D. W. et al. 595–6
Gervain, J. 580
Gestalt formation 192–4
gestures 223
Ghitza, O. 407
Gillespie, L. D. et al. 705
Gilmore, S. 225
Giraud, A. L. and Poeppel, D. 406
Glennie, Dame E. 189
Global Dystonia Scale (GDS) 782
globus pallidus internum (GPi) 793
Glover, H. et al. 723
Gooding, L. et al. 625
Goswami, U. 405
GPR98 gene 444
grahabēdham modulation 48
Grahn, J. A. and Brett, M. 169
Granert, O. et al. 473, 474
graph theory 123, 125, 131, 137
Grau-Sánchez, J. et al. 701
gray matter (GM) 420–2, 426, 429
density 472–3
pianists 474
GRIN2B gene 450
groove 169
group drumming 340, 344, 347, 348
Guenther, F. H. 717
Guerrieri, M. 145
guided imagery and music (GIM) therapy 344
H
Habib, M. et al. 606, 728
Habibi, A. et al. 273
hallucinations, musical 526–7
Halpern, A. R. 523
Halpern, A. R. and Müllensiefen, D. 245
Halpern, A. R. and O’Connor, M. G. 244
Halpern, A. R. et al. 531
Hambrick, D. Z. et al. 444
Han, S. and Ma, Y. 20
hand dystonia 477
handicap principle 67–9, 81
hands, co-evolution with brain 21
Hanna-Pladdy, B. and MacKay, A. 625
Hannon, E. E. and Trainor, L. J. 28
Hanslick, E. 79, 80, 370
harmonic dependencies 197–9, 198
harmonic expectancy violations 93
harmonicity, attention to 266–7
harmonics 146, 148–52, 149, 155
Harmony project 553
Hawaiian language 570
head movements, and rhythm perception 225–6
Healey, E. C. et al. 723
hearing mechanism 22
hedonic reversal 73–4, 80
Helfrich-Miller, K. R. 720
hemi-neglect 751–2
hemispheres 293
asymmetry 93, 106
hemispheric specialization 217
hemodynamic responses 248–9
Henson, R. 3
heritability 428, 442, 555–6
Heschl’s gyrus (HG) 420, 423, 464, 470
damage to 523
Hidalgo, C. et al. 405
hierarchical syntactic structures 196–203
Hilton, M. P. et al. 725
Hinton, G. et al. 71
hippocampus 104, 301, 465
Hmong language 570
Hofstadter, D. R. 197
homunculus 466
hormones 463, 741
Hubbard, T. L. 521–2, 530
hubs 127–9, 128
Hugo, V. 568
human characteristics 19
humanistic approach 364, 365
Huntington’s disease 703
Huron, D. 372, 810
Hurt-Thaut, C. P. 705
Husserl, E. 675
Hutchinson, M. et al. 794
Hyde, K. L. et al. 217, 471
hypothalamic-pituitary-adrenal (HPA) axis 341

I
ICINAS-BRECVEMAC 287–8, 288, 289–90, 294, 299, 301, 304, 305
iconic coding of emotions 288
identity decision 158
IDyOM (Information Dynamics of Music) 52–8
Ilari, B. 602–3
imagery 94–5, 534–6
embodied 529–34
involuntary 526–9
and perception
behavioral and psychophysical studies 522–3
effect of brain damage 523–4
physiological measures 524–6
Imagination, Tension, Prediction, Reaction, and Appraisal (ITPRA) 372
imitation 675–6
immediate early response genes (IEGs) 448–9
immune cells 347
immune system 347–9
immunoglobin A (IgA) 348–9
implicit memory system 238, 244, 245–6, 255
improvisation 487, 512–14
EEG studies 509–12
fMRI studies 488–506
jazz 276, 277, 278, 490–1, 502–6, 508–9
language areas 516–17
limbic processing 516
as model of creativity 275–6
motor regions 515–16
neuroimaging studies 276–7
parietal lobes 518–19
PET studies 506–7
sensory processing 517–18
similarity to speech 571
tDCS studies 508–9
individual variation
in enjoyment of music 380
in musical memory 247–8
infant-directed singing 596–8
infant-directed speech 573
infants
cortisol levels 345
development of auditory perception 675–7, 677
early right anterior negativity (ERAN) 200, 203
emotion perception 600–1
enculturation 32
language and music
co-development 576–8, 577
perception 571–6, 575
mimicry 675–6
response to musical stimulation 28–9
rhythm development 174–5
rhythm perception 48–9, 593–4, 595–6
scale perception 47
synchronous movement 603, 604–5
vestibular system 226
inferior colliculus (IC) 189–90, 338, 446
inferior frontal gyrus (IFG) 221, 375, 490, 516–17, 763, 764
inferior parietal lobule 490
information retrieval techniques 277
inhibition, abnormalities 476
innateness 808
inner ear and inner voice 531–2
instrument tones vs. pure tones 22
instruments
harmonics 148–52, 149, 155
universality 30
insula 301, 516
insular cortex 243
Intartaglia, B. et al. 401
integration 213–16, 214
integration windows 217
intelligence quotient (IQ), effect of musical training 272–3, 647–50, 655
internal rehearsal 243
interregional interactions 123–4
intervals
judging size by sight 218
memory 240–1
intrinsic coding of emotions 288
intrinsic features in musical memory 245–6
invariants 29
involuntary musical imagery 94, 526–9
Ishizu, T. and Zeki, S. 377

J
Jabusch, H. C. 793
Jackendoff, R. 567–8, 569
Jackson, J. 4
Jacobsen, T. and Beudt, S. 367
James, W. 263
Janata, P. 302
Janus, M. et al. 555
Japanese language 570
Jaschke, A. C. et al. 658
Javanese scales vs. Western scales 574
jazz 549, 571
blue notes 213
free form 75–6
improvisation 276, 277, 278, 490–1, 502–6, 508–9
jazz musicians, personalities 278
Jennings, J. J. and Kuehn, D. P. 726
Jentschke, S. and Koelsch, S. 403
Johnson, M. K. et al. 244
Jokel, R. et al. 722
Juslin, P. N. 286, 288, 290, 374, 375
Juslin, P. N. and Västfjäll, D. 372, 374

K
Kalveram, K. T. and Seyfarth, A. 255
Kämpfe, J. et al. 252
Kant, I. 810
Karma, K. (Karma music test (KMT)) 441, 442, 444, 445–6
KCTD8 gene 446
Keith, R. and Aronson, A. 719–20
Keller, P. E. 532
keyboard players 472
Kim, M. and Tomaino, C. 719
Kim, S. J. et al. 706
Kimata, H. 348
kinaesthetic feedback 465
kinaesthetic rhythm 177
Kindermusik classes 595–6
King, B. B. 213
Kleinmintz, O. M. et al. 278
Klimesch, W. 249, 252
Klimesch, W. et al. 250
Knoblauch, A. 4–5
knowledge-free structuring 194, 196
Kodály and Orff 607, 728
Koelsch, S. 299, 377
Koelsch, S. et al. 105, 197, 202, 245, 301
Kojovic, M. et al. 700
Kolb, B. and Gibb, R. 27
Kornysheva, K. et al. 378
Korsakoff’s syndrome 244
Kotz, S. A. and Gunter, T. C. 716
Kotz, S. A. and Schwartze, M. 723
Koyama, Y. et al. 348
Kragness, H. E. and Trainor, L. J. 599
Kraus, N. 401
Kraus, N. et al. 553
Krauss, T. and Galloway, H. 720
Kreiner, H. and Eviatar, Z. 582
Kuhl, P. K. 44
Kühn, S. and Gallinat, J. 378
Kunert, R. et al. 399
Küssner, M. B. et al. 252

L
Lai, G. et al. 581
Lai, H. L. and Good, M. 629
Lang Lang 462
language
and aging 633–4
co-development with music 576–8, 577
development 676–7
discrimination of sounds 394–6
disorders 580–1, 715–28
entanglement with music 582
learning 195
meaning in 79
modular approach 392–4
vs. music 25, 187–8, 567–9
aesthetic experience 370
music of speech 569–71
and music training 270–1, 400–4, 652–4
perception
development 574–6, 575
innate abilities 571–4, 578–80
PET studies 506–7
phonemes 190–1, 550
processing 391, 396–400
rehabilitation 633–4
rhythm 173–4
similarities to music 396–400
stress-timed vs. syllable-timed 570
temporal focus 406–8
tonal languages 203, 240, 402, 570
Mandarin Chinese 551
and tonality perception 48
training vs. musical training 553, 555
use of music in language training 404–6
language acquisition 567, 583
Language Acquisition Device (LAD) 808
language areas, improvisation 516–17
language functions, neurologic music therapy 745
Larson, S. 530
laryngospasms 725
late positive component (LPC) 547–8
lateral prefrontal cortex 375
lateral regions 277
lateral sulcus 216
laterodorsal tegmental nucleus (LDT) 350
learning 194–5
enhanced by music 269–74
influence of background music 252–4
transfer between language and music 400–4
see also music training
learning-related changes in coherence (LRCC) 253–4
learning-related synchronization (LRS) 751
LeBlanc, A. 46
Leder, H. et al. 381
left anterior negativity (LAN) 203
left auditory cortex 106
left posterior planum temporale 243
Lehne, M. and Koelsch, S. 378
Lerdahl, F. and Jackendoff, R. 197
Levitin, D. J. 528
Levitin, D. J. and Menon, V. 202
Liégeois-Chauvel, C. et al. 766
Limb, C. J. and Braun, A. R. 276, 490–1, 498, 500
limbic system 102–5, 516
Lindquist, K. A. et al. 132, 135, 294
Linked (Barabási) 125
Liu, C. et al. 378, 381, 495–7, 498, 500, 513
locus coeruleus (LC) 345–7
Lomber, S. G., Meredith, M. A., and Kral, A. 219
long-term depression (LTD) 787
long-term memory 238
long-term potentiation (LTP) 787
looped speech experiment 582–3
Lopata, J. A. et al. 511–12
Lortie, C. L. et al. 725
loudness judgments, visuomotor influences 223
lullabies 345, 596–7
vs. playsongs 597
universality 30
lyrical improvisation 495–7
lyrics 106, 569–70

M
Mafa people, Cameroon 46
Magic Flute, The (Mozart) 21
magnetic resonance imaging (MRI) 470
Mahmoudzadeh, M. et al. 579
major chords vs. minor chords 147
Makam music, Turkish 53–8
Mandarin Chinese language 551, 570
Mang, E. 676
Manning, F. C. and Schutz, M. 226
Manuck, S. and McCaffery, J. 22
MAOB protein imaging 807
Marie, C. et al. 550
marimba 156–7
Marr, D. 391, 406
Martinez-Molina, N. et al. 378
Mas-Herrero, E. et al. 378
Mathias, B. et al. 765
Mauszycki, S. C. and Wambaugh, J. L. 719
maximum-likelihood estimation (MLE) 215–16
McDermott, O. et al. 636
McGurk effect 221
McIntosh, G. C. et al. 702
McPherson, M. J. et al. 504–6
meaning 78, 79, 245
medial PFC 513
medial temporal area 221
mediating model 744
MEG (magnetoencephalography) 5, 365, 468
Mehr, S. A. et al. 28, 274, 651–2
melodic expectancy violations 48
melodic intonation therapy (MIT) 633–4, 715, 717, 719–20, 721
memory 237–8, 465
and aging 632–3
auditory sensory (ASM) 192–4
autobiographical 245, 248, 302, 633
cross-cultural research 50–1
during music listening 238–9, 239
effect of complexity 248
enhanced by music 250–5
episodic 238, 245, 248, 290, 303
and expertise 248
intervals 240–1
neural activity 248–50
neural networks 243–4
recognition of music 244–50, 255–6, 256
tonal working memory 241–4
tone 239–40
training 554, 649–50, 651, 652–3, 748, 750
Mendelian rules 444
Menninghaus, W. et al. 381
Menon, V. and Levitin, D. J. 378
mental practice and performance 532–3
mental training 469
Merchant, H. et al. 696
mere exposure effect 244
Merker, B. 80n, 91
mesial regions 276–7
mesolimbic reward pathway 246, 299n
metaphors, spatial and force 529–30
metaplasticity 428–9
meta-systems 502
meter 48–9, 165
perception in infants 593–4, 595–6
see also rhythm
methodological approach 393
metrical hierarchy 592–3, 595
Meyer, L. B. 601
middle cerebral artery (MCA) 767
middle temporal gyrus (MTG) 767
MIDI-based Scale Analysis (MSA) 780
mild cognitive impairment (MCI) 625
Milovanov, R. et al. 403
mimicry 530, 675–6
Minagawa-Kawai, Y. et al. 579–80
Mingus, C. 75–6
minor chords vs. major chords 147
mirror neurons 179, 303, 469, 530
mismatch negativity (MMN) 176, 193, 241, 375, 524, 548–9, 552–3, 724
abstract-feature (afMMN) 196
physical (phMMN) 196
statistical (sMMN) 195–6, 199
modularity (Q) 129–30, 131
modulation identification 48
Molnar-Szakacs, I. and Overy, K. 378
monkeys
auditory system 191
dystonia 477
rhythm perception 176
vibration receptors 220
monocyte chemoattractant protein (MCP) 348
Montag, C. et al. 378
Montreal Battery of Evaluation of Amusia (MBEA) 556, 761–2
Montreal Protocol for Identification of Amusia (MPIA) 762
mood 286
Moran, J. 571
Morcom, A. M. and Fletcher, P. C. 136–7
Moreno, S. et al. 273, 555, 653
Mote, J. 600–1
motherese 573
motor brain function 468–9
motor co-representations 469
motor cortex 95–102, 96, 221
motor cortico-basal-ganglia-thalamo-cortical (mCBGT) circuit 176
motor evoked potentials (MEPs) 169
motor functions, and aging 634–5
motor regions 301
musicians 421
primary and secondary 465, 466–7
role in improvisation 515–16
motor signals 408
motor speech disorders (MSDs) 717–23
motor system
rehabilitation for 695–707
in rhythm perception 166–70, 179–80
Moussard, A. et al. 625, 627, 633
movement, synchronous 602–5 see also dance; drumming
movement-based influences on rhythm perception 225–6
movement disorders 701–5
Mozart, W. A. 21
Mozart Effect 269, 274 see also learning
Müller, K. et al. 225
Müller, V. et al. 378
multidimensional scaling (MDS) 150
multifactorial gene–environment interaction model (MGIM) 23
multifactorial traits 440
multiple sclerosis (MS) patients 254, 698, 704–5, 751
multiple system atrophy (MSA) 704
multisensory nature of music training 430
multisensory perception 221, 227
pitch 218–21
rhythm 223–6
timbre 221–2
multisensory processing 212–16
rhythm 177–8
music, definition 187
music centers 5
music education 46, 442, 809
music processing, neural basis 90–106
Music Psychosocial Training and Counseling (MPC) 749
music-syntactic processing 200, 201–3
music therapy 344, 348, 634, 635–6
for cognitive dysfunction 742–3
neurologic (NMT) 743–6, 807
cognition and emotion 747–53
for language disorders 715–28
for motor system disorders 697–700, 707
music training 419
and academic achievement 654–6
and aptitude 658–60, 659
effect on brain development 424–7, 426, 551–3, 557
effect on brain function 467–70
and brain structure 420–2, 420, 470–4, 474
and cognition 647–50, 654–7, 659
in aging 626–7
demonstration and observation 469
effect on executive functions 554–5
and healthy aging 656–7
and language skills 400–4, 652–4
motor functions 468–9
nature vs. nurture debate 461–2
and plasticity 422–4, 429–32, 462–4, 467–8, 546–7
cortical sound processing 547–50
short-term 427–8
studies of 645–7
types 650
and visuospatial skills 651–2
see also practice
musical affect 534
musical anhedonia 768–9
Musical Attention Control Training (MACT) 748
musical behavior, genetic influences 22–3, 23
Musical Echoic Memory Training (MEMT) 748
Musical Executive Function Training (MEFT) 748
musical expectancy formation 194–6, 197, 202–3, 290, 303–4
Musical Language Acquisition Device (MLAD) 808
Musical Mnemonic Training (MMT) 748
Musical Neglect Training (MNT) 747, 751–2
musical response model 744
Musical Sensory Orientation Training (MSOT) 747
Musical Speech Stimulation (MUSTIM) 717, 720
musical structure building 197
musical systems 42–3
musicality 34
musicians
auditory brainstem response (ABR) 550–1, 553
brain structure 420–2, 420, 428–9
cognition and aging 624–6
cortical sound processing 547–50
with dyslexia 580
exceptional and autistic 671–4, 681–90
language skills 400–4
and memory 248, 250–2
motor co-representations 469
network connections 106
neural responses to piano tones 467–8
plasticity 478–80, 479
practice 429–30
somatosensory perception 468
musician’s dystonia (MD) 475–8, 479, 777–82, 778, 779
future directions 791–6
pathogenic theory 787–90, 788, 794–6
pathophysiology 784–7, 794
and plasticity 787
treatments 782–4, 783, 791–3
musicogenic epilepsy 5
myelination 27

N
N100 responses 5
Nair, D. G. et al. 300
Nakata, T. and Mitani, C. 601
Narme, P. et al. 636
nature vs. nurture 19–22
in music training 461–2
Neanderthals 369, 718
Neapolitan chord 266
near-infrared spectroscopy (NIRS), in infants 28
Nettl, B. 31
network-based approaches 123–5
neuroimaging analysis 132–8, 134
network disorders, dystonia as 478
network generation 132–5
network metrics 125–31 see also connectivity
network science 5, 125
networks 89, 95–105
interactions 105–6
Neuhoff, J. 159
neural auditory pathways 90–3, 90
neural oscillations 598–9
neural plasticity see plasticity
neural pruning 26–7
neural resonance theory 170–2
neuroaesthetics 366, 367–74, 371, 373
brain structures 375–6
future challenges 381–2
studies 377–81
neurochemical responses to music 333–4, 350–2
cholinergic systems 350
dopamine see dopamine
endogenous opioid systems (EOSs) 336–8
neuroendocrine systems 339–45
norepinephrine (NE) systems 345–7
peripheral immune system 347–9
serotonin systems 338–9
neuroendocrine systems 339–45
neuroimaging analysis, network-based 132–8, 134
neurologic music therapy (NMT) 743–6, 807
cognition and emotion 747–53
for language disorders 715–28
for motor system disorders 697–700, 707
neurological disorders
cognitive functions 738
cognitive remediation (CR) for 739–40
rehabilitation for 695–707
speech 717–23
neurological markers, of congenital amusia 762–3
neuropsychiatric disorders 451
neurotransmitter imaging 807
Newman, M. E. J. 125
Nieminen, S. et al. 378
node parcellation 130, 131, 133
nodes 125–7, 125, 126, 131, 134
non-musical parallel model 744
norepinephrine (NE) systems 345–7
Norman-Haignere, S. et al. 569
nostalgia 245
notes, duration 156–7
novelty spectrum 72–4, 75–6
NR3C1 gene 447
NRGN gene 447

O
oboe, harmonics 155
occipital gyri 489
olivocerebellar network 99
Onofre, F. et al. 725
onset of notes 151
Openness-to-Experience trait 649
OPERA (Overlap, Precision, Emotion, Repetition, Attention) hypothesis 264, 270, 403
opioid receptors 352, 741
Oral Motor and Respiratory Exercises (OMREX) 717, 718, 719
orbitofrontal cortex 106
Organ2/ASLSP (Cage) 21
oscillation-based models of speech perception 407
oscillatory functions 751
in rhythm perception 170–2
out-of-culture scale violations 48
overtones 148–9, 149
oxytocin (OT) 339–40, 341, 534, 741

P
Pacinian corpuscles 188–9
pain modulation 338
Pallesen, K. J. et al. 554
Pantev, C. et al. 22, 467
Papoušek, M. 676
parabelt 216, 221
parahippocampal gyrus 300
parahippocampus 301
parental music education 442
parietal areas 243
parietal lobe 465
role in improvisation 518–19
Parkinson’s disease (PD) 448, 634–5, 695, 698–700, 702–3
cognitive remediation 740
response to dopamine 335
rhythm perception 169–70, 179
speech deficits 716, 718
Parkinsonism 704
passive musical exposure 32, 627–8
Patel, A. 393, 400, 403
Patel, A. D. 264, 270, 582
pathogenic theory of musician’s dystonia (MD) 787–90, 788, 794–6
pathophysiology, for musician’s dystonia (MD) 784–7, 794
pathways
auditory 217
visual 218
Patterned Sensory Enhancement (PSE) 698–9, 701
PCC (posterior cingulate cortex) 518–19
PCDHA gene cluster 446
PCDH7 gene 446
PDGFRA gene 446
Pearce, M. T. 54, 378
Pearce, M. T. et al. 381
pedaling 472
pedunculopontine tegmental nucleus (PPT) 350
Pelowski, M. et al. 367
Perani, D. 567
Perani, D. et al. 579
perception 464–5
and imagery
behavioral and psychophysical studies 522–3
effect of brain damage 523–4
physiological measures 524–6
deficits see amusia
development 574–6, 575, 675–7, 677
autistic children 678–81, 678, 680
innate abilities 571–4, 578–80
training 747–8
without awareness 762–3
perceptual integration 213–16
perceptual magnet effect 44
perceptual narrowing 595–6
percussion, expressive gestures 223
percussion instruments, note duration 156–7
percussive tones 153, 154
Pereira, C. S. et al. 378
Peretz, I. 296, 300, 304, 571
Peretz, I. and Coltheart, M. 393
Peretz, I. et al. 397–8, 766
perfect pitch see absolute pitch
performance 459–60
biological restrictions 21
brain regions 464, 465–7, 465
effect on human transcriptome 447
expert 22–3, 23
mental 532–3
and plasticity 460, 462–4, 478–80, 479
as therapy for neurological disorders 699–700
peripheral immune system 347–9
personality traits 649, 655
of creative musicians 278
PET (positron emission tomography) scans 5, 333–4, 352, 807
improvisation 506–7
Petersen, B. et al. 724
Peterson, D. A. et al. 780, 791
Petkov, C. I. et al. 191
Phillips, D. P., Hall, S. E., and Boehnke, S. E. 153
Phillips-Silver, J. and Trainor, L. J. 226, 594
Phillips-Silver, J. et al. 764–5
phonemes 569–71
similarity to timbre 190–1
phonological loop 241, 243
phonological store 243
PHOX2B gene 446
phrase boundary perception 49
phrenology 4, 4
physical medicine and rehabilitation (PMR) for dystonia 784, 792
physical mismatch negativity (phMMN) 196
physical responses to music, neural basis 97
physiological studies of imagery and perception 524–6
Piaf, E. 462
pianists 489–90, 492–5, 499–502
gray matter 472–4, 474
mental performance 532–3
pedaling 472
piano
improvisation 276
temporal structure of notes 151
vibration detection 222
Picelli, A. et al. 703
Pinho, A. L. et al. 499–502, 518
Piper, A. 571
pitch
absolute vs. relative 681–3, 683
attention to 266–7
changes, detection 147
expectations 52–3
imagery and perception 522
interval representation 56
memory 239–40
metaphors 530
multisensory perception 218–21
perception 189–90, 191
acquired deficits 766
heritability 442–3
in infants 573–4
processing 194
in tonal languages 551
pitch-based amusia 761–3
pitch perception accuracy (PPA) test 446
planum temporale 194
plasticity 24–6, 26, 740
and aging 624, 627, 631
cognitive remediation 741–2
and musical performance 460, 462–4
and musician’s dystonia (MD) 475–8, 479, 787
and performance 478–80, 479
and training 422–4, 429–32, 462–4, 467–8, 546–7
cortical sound processing 547–50
playsongs vs. lullabies 597
pleasure in music 376, 760
anhedonia 768–9
individual variation 380
training 430–1
Poeppel, D. 217, 406
pontomesencephalic tegmentum (PMT) 350
posterior pituitary 339–41
power spectra 150–1, 152
PPP2R3A gene 448–9
practice
and brain structure changes 25, 421, 423
deliberate 460–1
effect on brain structure 470–4, 474
and expertise 429–30
and genetic influences 22–3, 23
mental 532–3
myelination 27
through observation 469
see also training
pre-supplementary motor area (pre-SMA) 276, 488–9, 494–5, 515
precentral gyrus 465
precise auditory timing hypothesis (PATH) 406
predictability 372
prediction 430
in rhythm 598–600
predictive coding model 221–2
Predominant Patterns 30–1
preference 286
cross-cultural research 46–7
network connections 106
prefrontal regions 277
prehistoric evidence of music 369
premotor area (PMA) 375, 465, 466
premotor cortex (PMC), rhythm perception 168
presbylaryngis 725
primary auditory cortex (A1/PAC) 191–2, 464, 470
primary motor cortex (M1) 169, 465, 466, 472
primary somatosensory area (S1) 465
priming of motor activity 696–7
private music tuition 650
proficiency, and memory tasks 250–2
progressive supranuclear palsy (PSP) 704
proopiomelanocortin (POMC) 741
prosody 570–1, 582
perception in infants 572–3
prosopagnosia 770
protocadherin 15 (PCDH15) 449
Przybylski, L. et al. 606
Przysinda, E. et al. 199, 278
psychiatric disorders 451
cognitive remediation 740
psychological impact of music 78–81
psychological voice disorders 726
psychophysical studies, imagery and perception 522–3
psychosocial function, NMT 749
puberphonia 726
publications 6
pulse 568
pure tones 153
vs. instrument tones 22
putamen see basal ganglia
Putkinen, V. et al. 552–3
Pygmalion effect 251

R
Raaijmakers, J. G. and Shiffrin, R. M. 255
radioligands 333–4, 352
rāgamālikā modulation 48
Ramus, F. and Mehler, J. 570
random networks 126–8, 127
Range Universals 30–1
rap, freestyle 276, 495–7
rating scales for dystonia 780–2, 781, 791, 792
Rational Scientific Mediating Model (RSMM) 743–4, 749
real-world associations 156
recognition of music 244–50, 255–6, 256
recursion 197
Redies, C. 368
Redirected Phonation 726–7
Reelin pathway 445
refined auditory processing 467–8
refined somatosensory perception 468
region-of-interest (ROI)/atlas-based networks 130, 131, 301, 494–5
regions 123
regular networks 126–7, 127
regularities 194–6, 202–3
Reich, S. 570–1
reinforcement learning (RL) 795
relative pitch (RP) processing 240–1, 681–3, 683
relaxation 629
repetition 568, 699
representations of music 568
Resonant Voice Therapy 726
respiratory disorders, therapy for 727
restorative approach to cognitive remediation 739
reward mechanisms 337
reward pathway 299n
reward system (mesolimbic) 246, 374, 380, 431, 631
RGS2 gene 447
RGS9 gene 445
rhythm 592–3
for attention training 749–50
beat-based vs. non-beat based 165–6, 166
development 174–5
and developmental disorders 605–7
disorders 763–5
and emotion 600–1
in infant-directed singing 596–8
of language 173–4, 570–1
mirroring and joint action 179–80
multisensory perception 223–6
perception 48–9, 166–72
cross-modal investigations 177–8
evolution 175–6
in infants 572, 574, 575–6, 593–4, 595–6
processing abilities, individual differences 178–9
regularity 598–600
selective attention 268–9
and stuttering 722
synchronous movement 602–5
Rhythmic Auditory Stimulation (RAS) 696–8, 702–3, 706
rhythmic entrainment 289–90, 302–3, 374, 695–6, 698
rhythmic improvisation 276, 497–9
rhythmic intervention for dyslexia 727–8
rhythmic priming 404
rhythmic-reading training (RRT) 728
Rhythmic Speech Cueing (RSC) 717, 719
Richie, L. 570
right frontotemporal network 763, 764
Rochette, F. et al. 724
Rock, A. M. et al. 597
rock music 549
Rohrmeier, M. 197
Rohrmeier, M. and Cross, I. 194
Roland, P. E., Skinhøj, E., and Lassen, N. A. 5
roles of music 31
Rosen, D. S. et al. 508–9
Rosenblum, L. D. and Fowler, C. A. 223
Rosenkranz, K. 785–6
Rubinov, M. and Sporns, O. 131
Russian language 570
Russo, F. A., Ammirante, P., and Fels, D. I. 222

S
Sachs, M. E. et al. 378
Saint-Georges, C. et al. 573
Sakamoto, M. et al. 635
Saldaña, H. M. and Rosenblum, L. D. 221
Salimpoor, V. N. and Zatorre, R. J. 379
Salimpoor, V. N. et al. 337, 378
same–different tests 5
Sammler, D. et al. 398–9, 400, 630
Samson, S. and Peretz, I. 244
Santoni, C. et al. 726
Särkämö, T. et al. 254
SAT scores 654
Sauder, C. et al. 725
scaffolding 462, 717, 742
scale-free networks 127
scale perception 47
Scaled Inclusivity 130–1
scales 187
Javanese vs. Western 574
scat singing 569
Schellenberg, E. G. 658
schemas 247–8
Schenker, H. 197
schizophrenia 451, 524, 527–8, 531
cognitive remediation 740
Schlaug, G. 471
Schlaug, G. et al. 425, 471, 634
Schneider, N. et al. 343, 470
Schön, D. et al. 398
Schopenhauer, A. 77, 80
Schulze, K. and Koelsch, S. 242
Schumann, R. 21
Schutz, M. et al. 223
sea lions, rhythm perception 176
Search of Associative Memory (SAM) model 247, 253, 255
Seashore, C. (Seashore tests) 5, 441, 442, 444, 446
secondary auditory area (A2) 464
secondary motor areas 465
secretory IgA (S-IgA) 349
Seesjärvi, E. et al. 556
segmental dystonia 777
Seinfeld, S. et al. 627
selection, and filtering 266
selection theories of attention, early vs. late 264–6
Semal, C. et al. 242
semantic associative network model of memory formation 247
semantic memory 238, 244–5
sensorimotor domain 471
sensorimotor functions, neurologic music therapy 745
sensorimotor integration 477, 786
sensorimotor pathways 221
sensory deficits, language disorders 723–4
sensory memory 237
sensory perception 477
sensory processing, improvisation 517–18
serial-to-parallel conversion 238–9, 239
serotonin (5-HT) 246, 338–9, 741
sex differences, in endocrine levels 344
sexual selection theory 67–9
Shahin, A., Roberts, L., and Trainor, L. 28
Shannon, C. E. 277
shared-resources hypothesis 300
shared syntactic integration resource hypothesis (SSIRH) 270–1
short-interval intracortical inhibition (SICI) 785–6
short-term memory 238
sight, visual rhythm 177–8
sight and sound association 156–8
Silbo Gomero, whistled speech 582
sine-wave speech 582
singing
birds see songbirds
development in children 578
endocrine responses 343–4
infant-directed 596–8
and memory tasks 253–5
oxytocin levels 340
universality 30
singing therapy 717–18, 721, 722–3
for respiratory disorders 727
for voice disorders 725–7
Six Degrees (Watts) 125
skin, music detected in 189
Skinner, B. F. 392
sleep disorders 725
Slevc, L. R. and Okada, B. M. 271
Slevc, L. R. et al. 398
slow-down exercises (SDEs) 793
SMA 515, 554
small-world networks 126–7, 127, 129
Smith, H. 21
Smith, J. D. et al. 531
sMMN (statistical mismatch negativity) 195–6, 199
SNCA gene 447, 448, 448
social functions of music 602–5
socio-economic status (SES) 655, 659
Soley, G. and Hannon, E. E. 596
somatosensory cortex 221, 465
somatosensory influences
on pitch perception 220–1
on rhythm perception 224–5
on timbre perception 222
somatosensory perception 468
somatosensory processing 517–18
songbirds 67–73, 81, 393–4, 441, 443, 444–5, 447, 449
songs, combination of language and music 398–9
Sowiński, J. and Dalla Bella, S. 765
spasmodic dysphonia (SD) 725
specific language impairment (SLI) 580–1
Spector, J. T. and Brandfonbrener, A. G. 780
spectral properties 148–9, 149, 150–1, 152
spectrum envelope 190–1
speech
music of 569–71
transcribed into music 570–1
speech disorders 4
fluency 721–3
neurologic music therapy (NMT) for 715–28, 745
speech sounds 550
speech therapy 633–4
spinal cord injuries 698
sports science 809
Staal, F. 78
Stahl, B. et al. 405
Stanford-Binet IQ test 648
statistical learning 25
statistical mismatch negativity (sMMN) 195–6, 199
statistical structures 194–6, 202–3
Steele, C. J. et al. 426, 473
Steinbeis, N. and Koelsch, S. 379
Sternberg, R. J. et al. 263
Strait, D. L. et al. 190
stress hormones 341–5
stress-timed languages vs. syllable-timed languages 570
string musicians 468, 472–3
mental performance 532–3
stroke patients 254, 633–4, 635, 698, 699, 701
acquired amusia 767
cognitive remediation 740
singing therapy 718
Stroop tests 554–5
structural features, cross-cultural research 47–50
structure
effects of training on 470–4, 474
individual variation in understanding 247–8
stuttering 581, 722–3
substantia nigra pars compacta (SNpc) 334
Sun, L. et al. 199–200
superadditivity 215
superior colliculus 221
superior parietal lobule (SPL) 490, 517
superior temporal gyrus (STG) 216, 470, 488, 489, 763, 764
superior temporal sulcus 219, 221
supplementary motor area (SMA) 465, 466
rhythm perception 168
surgery
cortisol levels 343
immunoglobulin A levels 349
surprise 372
Suzuki, M. et al. 379
Swift, J. 623
syllable-timed languages vs. stress-timed languages 570
syllables, rhythm 606–7
Symbolic Communication Training Through Music (SYCOM) 717
sympathetic nervous system (SNS) 345–7
synapses 463
development 26
synchronous movement 602–5 see also dance; drumming
synchrony with rhythm 174, 176 see also rhythm
synesthesia 528–9
syntax 393, 399–400
infant development 576–7
synthesized notes 148–9, 152, 157, 201
systematic approach 364–5

T
Tamplin, J. et al. 727
techno music 337
tempo
and emotion 600–1
memory for 245–6
temporal approach, to language and music 406–8
temporal attention 267–9
temporal brain areas, and memory 248
temporal discrimination threshold (TDT) 786–7, 790, 793, 794
temporal dynamics 146, 148–55, 149, 154, 157–9
temporal gyri 470–1
temporal lobes 464
temporal processing
disorders 786–7
universals 30
temporoparietal junction (TPJ) 488, 518
deactivation 492
tensor-based morphometry (TBM) 5, 470
Tervaniemi, M. et al. 550
testosterone 344
thalamus 301
Thaut, M. H. et al. 100, 253, 696, 701, 702, 703
Therapeutic Instrumental Music Performance (TIMP) 699–700, 701, 706
Therapeutic Singing (TS) 717, 718, 719
Thompson, W. F. et al. 213, 218
Thomson, J. M. et al. 727
thresholding procedures 133
throat singing 571
Tierney, A. and Kraus, N. 405–6
timbre 149–51, 192
memory for 245–6
multisensory perception 221–2
perception in infants 572, 573–4
of phonemes 569–70
similarity to phonemes 190–1
time perception, acquired deficits 766–7
timing 427
mechanisms, absolute vs. relative 166
neural networks 97–102
Tinbergen, N. 391
Toccata in C Major (Schumann) 21
tonal languages 203, 240, 402, 551, 570
tonality perception 47–8
tone, memory 239–44
tone deafness see amusia
tone intervals, memory 240–1
tone onset 151
tone patterns 193
tonotopic maps 216
tonotopic organization 22, 90–1, 191
Torres, E. B. et al. 706
Toscanini, A. 21
Touch-Cue Method (TCM) 720
trading fours 502–4
training 419
and academic achievement 654–6
and aptitude 658–60, 659
effect on brain development 424–7, 426, 551–3, 557
effect on brain function 467–70
and brain structure 420–2, 420, 470–4, 474
and cognition 647–50, 654–7, 659
in aging 626–7
demonstration and observation 469
effect on executive functions 554–5
and healthy aging 656–7
and language skills 400–4, 652–4
motor functions 468–9
nature vs. nurture debate 461–2
and plasticity 422–4, 429–32, 462–4, 467–8, 546–7
cortical sound processing 547–50
short-term 427–8
studies of 645–7
types 650
and visuospatial skills 651–2
see also practice
Trainor, L. J. 597
Trainor, L. J. and Adams, B. 594
Trainor, L. J. et al. 226, 240, 572, 601
Tramo, M. J. et al. 191
Tranchant, P. and Vuvan, D. T. 765
Tranchant, P. et al. 224
transcranial alternating current stimulation (tACS) 771
transcranial direct current stimulation (tDCS) 507–9, 771
transcranial magnetic stimulation (TMS) 5, 219, 250, 430
for amusia 771
for musician’s dystonia (MD) 784, 793
rhythm perception 169
transfer 269–74, 630, 646, 649
music and language 400–4
Transformational Design Model (TDM) 744–6, 749
transverse temporal gyrus 216
traumatic brain injury (TBI) 698, 700
cognitive remediation 740
singing therapy 718
trombone, harmonics 151, 152
Trost, W. et al. 379
trumpet, harmonics 148–9, 149, 150
Turkish IDyOM model 53–8
Turkish music 50–1
twin studies 23, 24–5, 556
aging 626
aptitude 442
music training 423
willingness to practice 444, 461

U
Ullén, F., Hambrick, D., and Mosing, M. 22–3
Ulrich 188
Unified Dystonia Rating Scale (UDRS) 782
unity assumption 158
universality of music 369–70
universals 29–31
unpleasant sounds 155–6, 339
use patterns 790
usefulness of music 275

V
valence 286, 601
Van den Heuvel, M. P. et al. 133
Van Wijk, B. C., Stam, C. J., and Daffertshofer, A. 133
Vaquero, L. et al. 426, 463
Vatakis, A. et al. 158
ventral pathways 217
ventral premotor cortex (vPMC) 201, 426, 490
ventral striatum 104–5, 301
ventral tegmental area (VTA) 334
ventriloquist effect 158
ventrolateral prefrontal cortex (PFC) 554
rhythm perception 168
Verghese, J. et al. 626
vertical structure 147
vestibular apparatus 188–9
vestibular cortex 221
vestibular system 225–6
vibration receptors 188–9, 220, 222, 224–5
Vienna Integrated Model of Art Perception (VIMAP) 369
Villarreal, M. F. et al. 497–9
viola, harmonics 155
visual cortex 221
visual imagery 290, 303
visual pathways 218
visual perception 213
visual processing 518
visual rhythm 177–8
visuomotor influences
on pitch perception 218–20
on rhythm perception 223–4
on timbre perception 221–2
visuospatial skills, and musical training 651–2, 657
VLDLR gene 445
Vocal Intonation Therapy (VIT) 717, 718
vocal misuse disorders 726
voice disorders 724–7
voluntary musical imagery 94–5
voxel-based morphometry (VBM) 5, 470, 767
voxel-based networks 130, 131
Vuust, P. and Kringelbach, M. L. 379
Vuust, P. and Witek, M. A. G. 408
Vuvan, D. T. et al. 199

W
Wager, T. D. et al. 292
Wallaschek, R. 5
Wambaugh, J. L. et al. 719
Wan, C. Y. et al. 715
Warren, J. D. et al. 192
Watts, D. J. 125
Wechsler Intelligence Scale for Children–III (WISC–III) 648
Welch, G. 676
well-being in aging 628–9, 635–6
Wernicke’s area 464
Wechsler Preschool and Primary Scale of Intelligence–III (WPPSI–III) 653
Wechsler Preschool and Primary Scale of Intelligence–Revised (WPPSI–R) 651
Western IDyOM model 53–8
Western music
modulation identification 48
phrase boundary perception 49
rhythm perception 49
Western scales vs. Javanese scales 574
Westernization of music 32
whistled speech 582
white matter (WM) 27, 420–1, 426, 428
density 463
imaging techniques 470
and music practice 471
Whitfield, I. 192
‘Who put the Bomp?’ song 569
Wiener, M. et al. 716
Wilkins, R. W. et al. 379
Williams syndrome (WS) 340
Wilson, F. 21
Wilson, R. S. et al. 625
Witek, M. A. et al. 380
Wittgenstein, L. 810
Wong, P. C. M. et al. 190, 551, 716
Wood, B. H. et al. 702
working memory (WM) 237–8
effect of musical training 554
neural networks 243–4
tonal 241–4
training 649–50, 652–3

X
Xhosa language 570

Y
Yoruba language 570
Yoshida, K. A. et al. 594

Z
Zahavi, A. 67, 81
Zajonc, R. B. 244
Zatorre, R. J. 192
Zatorre, R. J. and Belin, P. 216
zebra finches 393–4, 441, 443, 444–5
Zeki, S. 367
Zentner, M. and Eerola, T. 602–3
Zentner, M. et al. 80
Zentner, M. R. 286
Zhang, J. et al. 199
Ziegler, A. et al. 725
ZNF223 gene 448
zygonic theory of musical-structural understanding 674–5, 679
