Music As Learning Reward


Musical reward prediction errors engage the nucleus accumbens and motivate learning


Benjamin P. Gold (a,b,c,1), Ernest Mas-Herrero (a,b), Yashar Zeighami (a), Mitchel Benovoy (d,e), Alain Dagher (a), and Robert J. Zatorre (a,b,c)

(a) Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada; (b) International Laboratory for Brain, Music and Sound Research, Montreal, QC H2V 2J2, Canada; (c) Centre for Interdisciplinary Research in Music Media and Technology, Montreal, QC H3A 1E3, Canada; (d) Institute of Biomedical Engineering, École Polytechnique, Montreal, QC H3T 1J4, Canada; and (e) Corstem, Montreal, QC H3C 4J9, Canada

Edited by Dale Purves, Duke University, Durham, NC, and approved January 3, 2019 (received for review June 8, 2018)

Enjoying music reliably ranks among life’s greatest pleasures. Like many hedonic experiences, it engages several reward-related brain areas, with activity in the nucleus accumbens (NAc) most consistently reflecting the listener’s subjective response. Converging evidence suggests that this activity arises from musical “reward prediction errors” (RPEs) that signal the difference between expected and perceived musical events, but this hypothesis has not been directly tested. In the present fMRI experiment, we assessed whether music could elicit formally modeled RPEs in the NAc by applying a well-established decision-making protocol designed and validated for studying RPEs. In the scanner, participants chose between arbitrary cues that probabilistically led to dissonant or consonant music, and learned to make choices associated with the consonance, which they preferred. We modeled regressors of trial-by-trial RPEs, finding that NAc activity tracked musically elicited RPEs, to an extent that explained variance in the individual learning rates. These results demonstrate that music can act as a reward, driving learning and eliciting RPEs in the NAc, a hub of reward- and music enjoyment-related activity.

music | reward prediction errors | nucleus accumbens | abstract reward | fMRI

Significance

Prediction errors are crucial for perception, learning, and adaptability. Can they also explain the abstract pleasures we derive from seemingly nonadaptive behaviors? We present evidence of musically elicited reward prediction errors (RPEs), illustrating that an abstract stimulus without apparent biological value can engage the reward system simply by manipulating expectations. Our results demonstrate that musical events can elicit formally modeled RPEs like those observed for concrete rewards, such as food or money, and that these signals support learning. This extension of the RPE model to music implies that predictive processing might play a much wider role in reward and pleasure than previously realized, and inspires new perspectives on aesthetics as well as potential therapeutic and educational applications.

Author contributions: B.P.G., M.B., A.D., and R.J.Z. designed research; B.P.G. performed research; B.P.G., E.M.-H., Y.Z., and M.B. analyzed data; and B.P.G. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Published under the PNAS license.
Data deposition: The data reported in this paper have been deposited in NeuroVault, https://neurovault.org/collections/4778/.
1 To whom correspondence should be addressed. Email: benjamin.gold@mail.mcgill.ca.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1809855116/-/DCSupplemental.
Published online February 6, 2019.
Downloaded at Philippines: PNAS Sponsored on July 17, 2021

Music is one of life’s greatest pleasures (1). Like many other pleasures (2), enjoying music reliably engages critical components of the reward system, including the nucleus accumbens (NAc), caudate, orbitofrontal cortex, anterior cingulate, insula, and amygdala (3–5). Of these, the activity of the NAc most strongly correlates with ratings of music liking and wanting, as does its functional connectivity to the auditory cortices, orbitofrontal cortex, anterior cingulate, and amygdala (3, 4, 6).

A growing body of research suggests that abstract pleasures may become rewarding by manipulating expectations (7–9). As music unfurls over time, often across many highly regular structures, it is particularly well suited to manipulate expectations, as proposed by musicological and psychological models (10–13). Although other mechanisms likely also contribute to musical pleasure, such as social interaction (14), associations (15), or beauty (16), the few empirical studies of musical expectancy and affective responses all suggest that expectations account for some of music’s strongest emotional and pleasurable effects (17–19).

Prediction is foundational to perception, cognition, and behavior, and is crucial for adaptive fitness in a dynamic world. Correct predictions allow us to anticipate our needs and environments, while incorrect ones help us adapt as these change. Predictions also facilitate information processing: Fully predicted events validate preexisting models, while surprises indicate the extent of error in the prediction (20–22), prompting refinements in synaptic weights and neural circuits to update predictions and/or behaviors (23). Considering that neurons express prediction errors across multiple levels of processing, from primary perception to complex decision making (23, 24), the “predictive coding” theory proposes that minimizing these errors might be the brain’s central organizing principle (20).

When a prediction concerns some expected value, its update depends on both the magnitude and direction of the error (i.e., precisely how good or bad the outcome was relative to the expectation) (22). Bidirectional reward prediction errors (RPEs) are thus a special case of prediction errors that motivate reinforcement learning, or the maximization of rewards and the minimization of punishments (22). RPE signals in human and nonhuman animals arrive in the NAc from dopaminergic midbrain neurons (25–27), coinciding in at least some cases with emotional and psychophysiological arousal (28). The NAc also receives inputs from the ventral prefrontal cortex, amygdala, hippocampus, and thalamus, and is therefore well positioned to integrate cognitive and emotional information for action selection (29). Although RPEs can also be observed in other brain regions, human neuroimaging studies consistently find their greatest functional correlate in the NAc (30, 31).

Dopamine transmission also seems to play a role in music enjoyment, increasing in the NAc during moments of intense musical pleasure and in the caudate during the anticipation of these events (3). Integrating this expectation-related activity of NAc dopamine with the pleasurable significance of manipulating expectations and the NAc’s central role in pleasure, current models posit that this network might encode dopaminergic RPEs during pleasurable music listening (12, 13), and that these signals could help to explain the strong emotional, psychophysiological, and pleasurable impact of musical surprises (17–19).

RPE computation is thus a promising and widely invoked explanatory mechanism for abstract pleasures like music listening, but there is currently no direct evidence that it occurs during this or any other abstract hedonic experience. Here, we tested

3310–3315 | PNAS | February 19, 2019 | vol. 116 | no. 8 www.pnas.org/cgi/doi/10.1073/pnas.1809855116


whether music could elicit formally modeled RPEs in the NAc using fMRI and a reinforcement-learning protocol validated for studying RPEs (32, 33). We adapted this protocol to a musical context: In each of 60 trials, participants chose between two sets of arbitrary cues (first colors and then directions) that were associated with an ongoing musical piece ending either consonantly [which they preferred: two-tailed paired samples t(19) = 5.72, P < 0.01; SI Appendix, Fig. S1] or dissonantly (Fig. 1; sample stimuli are shown in SI Appendix). Although music is typically consonant, the probabilistic nature of this paradigm meant that our participants could never fully predict how a trial would end, ensuring measurable prediction errors for each trial based on the interaction of expectancy and pleasantness. Prediction errors were therefore positive for consonant outcomes, negative for dissonant ones, and greater the more the outcome was unexpected based on the listener’s task experience. The basis of our RPE modeling was thus the salient feature of consonance/dissonance as an exemplar of general musical prediction errors.

We used a temporal difference [TD(λ)] “Q-learning” algorithm to model these RPEs according to each participant’s choices, outcomes, and outcome preferences (22, 32–34). Our first hypothesis was that the RPE model would substantially explain participants’ choices, suggesting that they used these signals to learn during the experiment. We then tested whether the RPEs manifested in the blood oxygen level-dependent (BOLD) activity of the NAc, as predicted from the neuroimaging literature, and whether this activity would represent RPEs rather than some other correlated signal (30). Finally, since computing RPEs should facilitate the learning of rewarding behavior (22, 23), we investigated the relationship of the RPE-related neural signals to individual learning outcomes. These analyses revealed strong behavioral and neural evidence of musical RPEs.

PSYCHOLOGICAL AND COGNITIVE SCIENCES

Results

Reinforcement-Learning Behavior. Participants successfully navigated the musical reinforcement-learning task. They were more likely than chance to make choices leading to consonant endings, as measured by one-sample t tests of the color and direction choices separately [overall accuracy mean ± SD = 57.93 ± 14.44%, t(18) = 2.40, P = 0.03; Fig. 2A] or together [two-step accuracy mean ± SD = 37.01 ± 19.63%, t(18) = 2.67, P = 0.02; Fig. 2B]. We evaluated learning throughout the experiment as the change in accuracy across sliding 10-trial windows: Permutation tests of these learning slopes showed that overall accuracy improved for the group as a whole (β̂ = 0.02, P = 0.04; Fig. 2C) and for 12 of the 18 participants evaluated (all β̂s > 0.02, P < 0.05), while two-step accuracy did not significantly improve for the whole group (β̂ = 0.01, P = 0.23; Fig. 2D) but did for eight of the 18 participants evaluated (all β̂s > 0.02, P < 0.05). The variability in performance reflects the difficulty of the task, and allows us to measure the relationship between learning and brain activity.

To determine how well the reinforcement-learning models described the group’s behavior, we compared the negative log likelihood of each model, given the participant’s choices, with that of a null model that had no information about the choices. The reinforcement-learning models described the group’s choices significantly better than the null models [mean ± SD pseudo-R² = 12.85 ± 14.64%, t(19) = 3.93, P < 0.01; Fig. 2E], enabling us to explore the neural correlates of these modeled RPEs. The best-fitting parameters for the models are shown in SI Appendix, Fig. S2.

Neural Evidence of RPEs. To assess the specific prediction that the NAc reflected RPE signaling in this musical reinforcement-learning context, we convolved the computationally modeled RPEs with a standard hemodynamic response function and evaluated their correspondence to the BOLD activity within the bilateral NAc region of interest. We defined this area a priori with the Harvard-Oxford anatomical subcortical atlas (35). The modeled RPEs significantly correlated with the BOLD activity of the right NAc after family-wise error (FWE) correction (peak voxel: 12, 8, −8; z = 3.53; P_FWE = 0.01; Fig. 3A), and this was the strongest correlation throughout the whole brain, illustrating that this area exhibits RPEs even from an abstract stimulus like music, when the values of predicted and obtained outcomes are less clear than with a concrete stimulus like money. To validate that this activity reflected canonical RPEs rather than reward value or surprise, we performed a conjunction analysis of distinct RPE components in the same a priori-defined region. RPEs are characterized as greater for rewards (consonant endings) than punishments (dissonant endings), larger for greater or less expected rewards, and smaller for greater or less expected punishments (30). We found a conjunction of these features in the same area of the right NAc as indicated by the modeled RPEs (peak voxel = 12, 8, −10; z = 2.38; P < 0.01; Fig. 3B). A post hoc examination of this conjunction throughout the rest of the brain verified that the strongest conjunction was in the NAc. This result corroborates our original NAc finding by exhibiting axiomatic RPE properties in the same previously identified structure.

Learning Correlates. Finally, we explored whether the RPE-related activity of the right NAc facilitated reinforcement learning by testing whether it could significantly explain task performance or model fitness. All behavioral learning measures (overall accuracy, overall learning slope, two-step accuracy, and two-step learning slope) were highly correlated with each other and with the measure of model fitness (pseudo-R²), so we used stepwise linear regression to identify the one(s) with the strongest explanatory power. This approach settled on the overall learning slope, which accounted for ∼31% of the variance in the RPE parameter estimates [F(1,16) = 7.18, P = 0.02, R² = 0.31, adjusted (Adj.) R² = 0.27]. We validated this effect with a robust regression to reduce the influence of outliers [F(1,16) = 6.03, P = 0.03, R² = 0.27, Adj. R² = 0.23; Fig. 4A]. This relationship was positive, such that stronger RPE signaling in the right NAc corresponded to more

Fig. 1. Probabilistic musical decision-making task. Participants first chose between two arbitrary colors to initiate the playing of a Bach chorale: The color choice determined its timbre. As the chorale reached halfway, a cue prompted participants to choose between two arbitrary directions: This choice determined whether the chorale ended consonantly or dissonantly. Each choice was associated with a specific outcome, but these outcomes were probabilistic to elicit prediction errors. Each color led to its associated timbre 70% of the time. Within one timbre context, each direction choice led to its ending 85% of the time; in the other timbre, it was 70%. The associations were randomized across participants, who were not made explicitly aware of the contingencies but were simply told to try to make optimal choices to hear the music they wanted. The optimal choices for the case shown in the figure were thus yellow to select its associated harp timbre and then left if the chorale played in that timbre or right if not. The task had 60 trials in two runs of two equal blocks each.
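
The contingencies in this caption can be illustrated with a short simulation. The following is a minimal Python sketch, not the authors' task code: the timbre labels "A" and "B", the function names, and the seed are hypothetical, and it assumes that the optimal color is the one associated with the 85% timbre context and that the non-associated direction leads to a consonant ending with the complementary probability.

```python
import random

# Contingencies stated in the Fig. 1 caption.
P_COLOR_TO_TIMBRE = 0.70        # each color leads to its associated timbre
P_DIRECTION_TO_ENDING = {       # each direction leads to its associated ending
    "A": 0.85,                  # hypothetical label for the 85% timbre context
    "B": 0.70,                  # hypothetical label for the 70% timbre context
}


def p_consonant_optimal():
    """Exact probability of a consonant ending under the assumed optimal policy."""
    p_a = P_COLOR_TO_TIMBRE     # assumption: the optimal color targets timbre A
    return p_a * P_DIRECTION_TO_ENDING["A"] + (1 - p_a) * P_DIRECTION_TO_ENDING["B"]


def run_trial(rng, choose_optimal=True):
    """Simulate one two-step trial; return True if the chorale ends consonantly."""
    # Step 1: the color choice probabilistically sets the timbre.
    target = "A" if choose_optimal else rng.choice(["A", "B"])
    other = "B" if target == "A" else "A"
    timbre = target if rng.random() < P_COLOR_TO_TIMBRE else other
    # Step 2: the direction choice probabilistically sets the ending.
    picks_consonant_direction = True if choose_optimal else rng.random() < 0.5
    p = P_DIRECTION_TO_ENDING[timbre]
    p_consonant = p if picks_consonant_direction else 1 - p
    return rng.random() < p_consonant


def simulate(n_trials=60, choose_optimal=True, seed=0):
    """Fraction of consonant endings over n_trials simulated trials."""
    rng = random.Random(seed)
    hits = sum(run_trial(rng, choose_optimal) for _ in range(n_trials))
    return hits / n_trials
```

Under these assumptions, `p_consonant_optimal()` evaluates to 0.7 × 0.85 + 0.3 × 0.70, or about 0.805, so even a perfectly informed listener would hear a dissonant ending on roughly one trial in five, while random choices yield consonant endings about half the time. Calling `simulate(n_trials=60)` gives a sense of how noisy a single 60-trial session can be.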

overall learning in the reinforcement-learning task. Noting that the participant with the weakest RPE signaling in the right NAc and the worst overall learning also scored the lowest on the Barcelona Music Reward Questionnaire (BMRQ) (14), we further identified a significant positive relationship between BMRQ scores and RPE signaling in the right NAc [F(1,14) = 5.04, P = 0.04, R² = 0.27, Adj. R² = 0.21; Fig. 4B]. The participants who derived less reward from musical stimuli generally reflected musical RPEs less reliably (a model combining BMRQ scores, RPE signals, and learning is shown in SI Appendix, Fig. S3). Together, these analyses implicate the RPE-related neural activity we observed in the right NAc in the behavioral effect of learning.

Fig. 2. Task behavior and model fit. (A) Overall accuracy. Considering color and direction choices independently, participants made optimal choices significantly more often than chance [t(18) = 2.40, P = 0.03]. (B) Two-step accuracy. Participants also made both optimal decisions within a trial significantly more often than chance, suggesting learning of the task’s two-step pathway [t(18) = 2.67, P = 0.02]. (C) Overall accuracy improved throughout the task for the group as a whole (β̂ = 0.02, P = 0.04) and for 12 of the 18 participants tested (all β̂s > 0.02, P < 0.05). (D) Two-step accuracy did not improve throughout the task for the group as a whole (β̂ = 0.01, P = 0.23), but it did for eight of the 18 participants tested (all β̂s > 0.02, P < 0.05). (E) Reinforcement-learning models fit the group’s choices significantly better than corresponding null models [t(19) = 3.93, P < 0.01]. Red lines indicate means, boxes indicate 1 SD from the mean, and blue lines show the chance level references of the statistical tests.

Discussion

Although many authors have proposed that the intense emotions and pleasures of music result from expectancies, predictions, and their outcomes (conceptualized here as RPEs) (10–13), direct evidence for this proposition has been lacking. Here, using a music-based probabilistic task adapted from reinforcement-learning studies, computational modeling, and formal validation of RPEs, we report direct evidence of musically elicited RPEs. With only musical feedback, participants learned to find their preferred musical endings more often as the task progressed and generated RPEs as they did. As is typical in other domains (22, 23), RPEs supported learning: While most participants learned during the task, those whose NAc activity reflected RPEs more reliably (perhaps better encoding these signals) tended to improve the most. In addition to supporting the widely hypothesized ability of music to elicit RPEs, these findings establish music as a neurobiological reward capable of motivating learning in a complex environment, illustrating how an abstract stimulus can engage the brain’s reward system to potentially pleasurable effect and implying that RPEs could play a broader role in pleasure than previously known.

Fig. 3. Musically elicited RPEs reflected in NAc activity. (A) Within the bilateral NAc region of interest (purple), computationally modeled RPEs significantly correlated with activity in a right-hemisphere cluster after family-wise error correction (peak voxel: 12, 8, −8; z = 3.53; P_FWE = 0.01; orange scale). (B) Conjunction of three analyses supports the original finding. Blue indicates consonant endings > dissonant endings (outcome contrast), red indicates RPEs for consonant endings (parametric effect of rewards), green indicates RPEs for dissonant endings (parametric effect of punishments), and yellow indicates conjunction. All images are thresholded at z ≥ 2. In coronal slices, y = 8. In sagittal slices, x = 12. L, left; R, right.

Music as Reward. The adaptive value of music is not readily apparent. One possibility is that music might be rewarding by facilitating emotions (36), which can be pleasurable even when the emotions are negative (5). However, musical emotions, like pleasure, depend at least partly on the manipulation of expectations (10–12), just as RPEs sometimes coincide with strong emotional arousal and pleasure (28). The value of music might therefore be directly related to the predictions it engenders and the quick feedback it provides across multiple evolving structures, facilitating our “fundamental imperative” to predict and learn from our environment (ref. 37, p. 43). The human brain exhibits predictive processing of musical sequences even from birth (38). This processing, which likely occurs within frontotemporal cortical circuits (13), could represent refinements of the listener’s model, which may rely on dopaminergic prediction errors (21, reviewed in ref. 39). Unlike the danger of incorrectly predicting some other auditory signals, such as those of predators or weather, being wrong about an abstract stimulus like music has minimal biological consequences, and might even be pleasurable after cognitive appraisal (11). This interpretation could apply to several other abstract pleasures, which might engage the NAc via RPEs related to success (i.e., a better-than-expected outcome or world model) or knowledge acquisition/uncertainty reduction (i.e., better-than-expected information for world-model refinement) (7–9).

By this explanation, the most pleasurable music would be that which helps to hone our predictions, with extremely predictable or familiar music being too simplistic to contribute and extremely unpredictable or novel music being too complex to integrate (12).

In the present study, the difference between consonance and dissonance was the principal motivator of behavior. Listeners of Western tonal music typically exhibit a strong and reliable preference for consonant music (reviewed in ref. 40), as we observed in our sample (SI Appendix, Fig. S1). Likewise, our participants’ RPEs represent the deviations from their expected



choice outcomes, positive if consonant and negative if dissonant, since the participants could never fully predict either ending. Although the sources of prediction errors likely differ in typical music listening experiences, during which dissonance may be more surprising than consonance and predictions also pertain to other musical features not tested here (cf. ref. 18), the present results illustrate that musical RPEs are sufficient to guide decisions and learning (Fig. 2 A and B). Learning occurred despite the absence of any concrete outcome or reward, and without any explicit instruction or understanding of the probabilistic task structure. Learning was evident in the participants’ choices, as they made significantly more choices associated with consonant endings throughout the experiment (Fig. 2C). Likewise, about half of the sample became more likely to choose both the color and the direction most associated with consonant endings within a trial (Fig. 2D). That these effects manifested within 60 trials, whereas most reinforcement-learning studies use many more trials and more concrete rewards, such as money (e.g., refs. 28, 32, 33), highlights music’s potency as a reward.

Although the participants exhibited significant learning as a group, our behavioral measures also illustrate considerable variability, with some individuals performing no better than chance (Fig. 2 A–D). We designed the task to be difficult to encourage exploration, and the only instructions were to “learn about which choices are most likely to lead to the music you want to hear, and then make these choices.” Some participants reported after the experiment that they had made decisions based on musical characteristics other than consonance/dissonance, such as major/minor or ascending/descending; however, these features would not have aided their decisions, because the stimuli were identical except for their consonant/dissonant endings. These strategies, and individual differences in decision-making strategies and biases, musical backgrounds, memory capacities, and reward sensitivities, could explain some of the learning differences.

Modeling and Validating Musical RPEs. We used a well-validated computational model of reinforcement learning (22, 34) to contextualize our findings within the reinforcement-learning literature. This model described 18 of the 20 participants’ choices better than the corresponding null models, with an average improvement of 14.28% for those 18 participants (Fig. 2E). Other studies using similar models with more concrete feedback typically report values between about 25% and 40% (32, 33), but our modeling nonetheless allowed us to explore the neural correlates of the RPE parameter. The best-fitting free parameters were akin to those in other studies (32, 33) (SI Appendix, Fig. S2).

As we hypothesized, the BOLD activity of the NAc significantly reflected formally modeled RPEs (Fig. 3A). Such correlations might arise from related processes, such as valence tracking (30), so we supported this finding by conjoining distinct RPE axioms (cf. ref. 30). The observed NAc activity was indeed greater for rewards (consonance) than punishments (dissonance), for more preferred or less expected rewards (i.e., larger positive RPEs), and for more preferred or more expected punishments (i.e., smaller negative RPEs), expressing the bidirectional computation that rigorously defines RPEs (30).

Incorporating information from cortical, limbic, and midbrain sources, dopamine neurons in the NAc have been identified as the main purveyor of RPEs and goal-oriented action selection in other domains (30, 31, reviewed in ref. 29). The NAc is also the brain region most strongly associated with the enjoyment of music (3, 4, 6), as well as with several other pleasures (reviewed in ref. 2). Although pleasures, including music, often engage the NAc bilaterally (2–4, 6), our right-hemisphere finding is consistent with others showing a similar asymmetry (3, 4), and with a meta-analysis identifying the right, but not left, NAc as a reliable indicator of musically evoked emotions (41). This lateralization could be related to the cortical circuitry in the right temporal and frontal cortices that has been implicated in tonal perceptual processing (42), including responses to unexpected events (17, 19), and to the NAc’s predominantly ipsilateral connections (e.g., ref. 43). Correspondingly, disruption of right auditory cortex–NAc functional interactions is associated with a loss of musical pleasure (6).

A large body of research cautions that NAc dopamine transmission, including RPEs, is more strongly associated with motivational arousal and approach than with hedonic pleasure per se (reviewed in refs. 2, 29). The RPE-related activity that we observe here might thus reflect motivation to hear consonant over dissonant endings rather than pleasure. However, musically elicited RPEs could be both motivational and pleasurable, as predictive success often is (44). Musical motivation and pleasure, although distinguishable, are highly correlated and colocated in frontostriatal circuits (4, 45), and they likely coincide, especially if, as discussed above, positive RPEs arise from musical outcomes that reduce uncertainty (i.e., are more predictable than expected) and negative RPEs arise from those that increase it. Future experiments should consider the relationship between RPEs and pleasure in more naturalistic music listening (i.e., beyond a reinforcement-learning paradigm) to explore the musical structures that give rise to musical RPEs without the influence of action–outcome associations.

Linking Musical RPEs to Behavioral Learning. In computing the deviation between reward predictions and outcomes, RPEs are essentially teaching signals that update future predictions and behaviors (22, 23). Here, we find that the strength of the RPE signaling in the right NAc explained ∼31% of the variance in how much the participants learned (Fig. 4A). Participants who were more sensitive to music reward in general (14) better reflected RPE-like activity, but it was ultimately the extent of RPE signaling in the NAc that explained learning during the experiment. Those who represented RPEs more faithfully presumably benefited from the updates these computations generated (e.g., through changes in synaptic and/or effective connectivity) (23). This effect validates our discovery of musically elicited RPEs in the NAc via their functional significance.

Fig. 4. Correlates of RPE-related activity in the right NAc. (A) Parameter estimates (param. estim.) of RPE signaling in the right NAc (shown in pink at x = 12) significantly correlate with the standardized learning slopes of overall accuracy [F(1,16) = 6.03, P = 0.03, R² = 0.27, Adj. R² = 0.23]. (B) BMRQ scores also correlate with RPE-like activity in the right NAc. *P < 0.05.

Conclusion

These results support the widely hypothesized role of RPEs in musical pleasure, illustrating that an abstract stimulus can engage the same mechanism we use to learn about concrete, evolutionarily advantageous rewards like food, money, or sex. By exploiting our “fundamental imperative” to predict and learn (ref. 37, p. 43) and our diffuse mesocorticolimbic connections (reviewed in ref. 29), musical RPEs seem to integrate cognitive and emotional processing in the NAc; these mechanisms might help explain why so many people find music engaging and pleasurable (1). Although music is especially well suited to manipulating expectations (10–13), our findings suggest that RPEs could also be important for other abstract rewards and/or pleasures, such as learning (cf. ref. 7), poetry (cf. ref. 8), or humor

(cf. ref. 9). Understanding these abstract pleasures would have important implications for therapeutic interventions and for happiness itself, just as understanding subjective processes of prediction can elucidate much of how we perceive, think, and behave.

Materials and Methods

Participants. Twenty-three healthy volunteers with normal hearing participated in this experiment after providing informed consent. Two did not explore the task choices, rendering the computational model unable to accurately devise their decision values and RPEs; they were thus excluded from all analyses. Technical issues caused the loss of all of the data for one participant and the second half of the data for another, so we excluded the former from all analyses and the latter from analyses of learning over time. The final sample (n = 20, 12 females) was aged between 18 and 27 y old (mean age ± SD = 21.60 ± 2.89 y), with an average of 5.24 ± 5.20 y of formal or informal musical training and an average BMRQ score (14) of 78.83 ± 9.37 out of 100. This study was approved by the Research Ethics Board of the Montreal Neurological Institute.

Stimuli. The stimuli were based on 12 four-part Bach chorales, chosen because they follow widespread rules of Western tonal music and have consistent structures. Each chorale contained four musical phrases of eight beats each and was 25.60 s long at 75 beats per minute. Six were in major keys, six were in minor keys, and all were in duple meter. We generated a consonant (original) ending and a dissonant ending for each chorale in two timbres using MuseScore software (2016 MuseScore BVBA). The dissonant endings had each note alternatingly transposed half a step higher or lower, with the soprano and tenor parts moving up first and the alto and bass parts moving down first; the consonant and dissonant versions of the stimuli were otherwise identical due to the use of Musical Instrument Digital Interface (MIDI) instruments. Thus, the “consonant” endings were altogether much less dissonant than the “dissonant” ones, even though they could also contain some elements of dissonance. We chose the timbres according to an Internet survey with a separate group of 22 volunteers, settling on those rated most consistently and similarly pleasant

Computational Reinforcement-Learning Model. We modeled RPEs with a TD(λ) Q-learning algorithm (22, 34). In each state s (i.e., when faced with the color choice or the direction choice in either timbre context), this algorithm computes the value of each action a as $Q[s(t), a(t)] \leftarrow Q[s(t), a(t)] + \lambda\alpha\delta(t)$, where t is the time step (1 in the first state, 2 in the second state, and 3 for feedback), $Q[s(t), a(t)]$ is the expected value of all future outcomes for choice a in state s, λ is the memory trace attributing value to the first step in the two-step sequence [set to 1 when $s(t) = 1$], α is the model’s learning rate, and $\delta(t)$ is the RPE for that time step. The RPE updates by $\delta(t) = r(t) + \max_{a \in A} Q[s(t+1), a(t+1)] - Q[s(t), a(t)]$, where $r(t)$ is the feedback at step t (the participant’s pretask rating of the same chorale in the piano timbre, rescaled from −3 to 3, or 0 in the case of no feedback) and A is the set of all possible actions in the state. Starting with all Q values at 0, this model thus approaches the true expected future value of each choice by computing the RPE as the difference between each outcome and the (discounted) best expected outcome, at a rate depending on the learning rate α. We fit these parameters to participants’ choices with a “softmax” function to model the probability of choosing each action (e.g., X or Y) in each state: $P[a(t) = X \mid s(t) = s] = \frac{e^{\beta Q[s, X]}}{e^{\beta Q[s, X]} + e^{\beta Q[s, Y]}}$, where β is a temperature parameter that indicates a learner’s propensity to explore the environment. We fit the model’s free parameters (α, λ, and β) by exhaustive search, choosing the set that minimized the negative log likelihood of the model making the same choices as the participants (31–33). This process yielded RPE values for each outcome of each participant, which became regressors of interest in the fMRI design matrices.

fMRI Data Acquisition and Preprocessing. We acquired MRI data with a Siemens TIM Trio 3-T scanner and a 32-channel head coil at the McConnell Brain Imaging Centre of the Montreal Neurological Institute. We used a T2*-weighted multiband echo planar imaging sequence to collect whole-brain functional BOLD images at high temporal resolution [52 slices, echo time (TE) = 30 ms, repetition time (TR) = 885 ms, multiband acceleration factor = 4, flip angle = 90°, matrix size = 195 × 195 × 130, voxel size = 2.5 mm isotropic]. When whole-brain coverage was not possible with these parameters, we prioritized ventral temporal and frontal regions at the cost of dorsal parietal ones. The ex-
on a seven-point Likert scale: mandolin (mean pleasantness rating ± SD =
perimental task occurred over two functional runs of 30 trials, which lasted about
5.63 ± 2.13, Cronbach’s α = 0.89) and harp (mean pleasantness rating ± SD =
18 min each, with a short break in between for rest. Participants experienced and
5.56 ± 1.92, Cronbach’s α = 0.89). Participants listened to and rated their en-
responded to the task via Presentation software (Neurobehavioral Systems, Inc.)
joyment of each chorale in a MIDI piano timbre before the experimental task.
using an angled mirror on top of the head coil, MR-compatible S14 Insert
Although one person preferred the chorales that ended dissonantly, the overall Earphones (Sensimetrics Corporation), and an MR-compatible two-button box
group significantly preferred the consonant stimuli [two-tailed paired samples in the right hand. We collected a high-resolution T1-weighted image for each
t(19) = 5.72, P < 0.01; SI Appendix, Fig. S1]. Participants also rated two short participant [magnetization prepared rapid gradient echo (MPRAGE): TE =
chord sequences in both timbres, ensuring no a priori preference for either 2.98 ms, TR = 2,300 ms, matrix size = 256 × 256 × 192, voxel size = 1 mm
timbre [two-tailed paired-samples t(19) = −1.76, P = 0.09; SI Appendix, Fig. S1]. isotropic] for anatomical registration. We preprocessed the functional images
with FSL FEAT Version 6 (FMRIB), including motion correction, brain extrac-
Experimental Task. The decision-making task was based on “two-step tasks” tion, spatial smoothing with a Gaussian kernel of 5-mm FWHM, grand-mean
used in other investigations of human RPEs (32, 33). The chief difference was intensity normalization, high-pass temporal filtering with Gaussian-weighted
that, instead of money or points, the feedback in this task was only musical least-squares straight-line fitting sigma = 50.0 s, and linear registration to the
(Fig. 1). Participants began each trial with a button press, after which two T1-weighted images and then to the MNI152 2-mm standard brain.
colors appeared on the screen (blue on the left and yellow on the right).
These colors probabilistically determined the timbre of the trial, so choosing Behavioral Analysis. Behavioral analyses were performed with custom
a color with the left or right button of the response box initiated the (un- MATLAB (MathWorks) scripts. Given preferences for consonance, we mea-
altered) beginning of a randomly selected chorale in that timbre. The color sured “overall” accuracy as the proportion of choices most likely to lead to
choice was displayed on the screen for the first half of the chorale, after the more favorable timbre context or the consonant ending and “two-step”
which a cue prompted a similar choice between two directions (left and accuracy as the proportion of trials on which a participant made both the
right) that probabilistically determined whether the chorale ended conso- color and direction choice most likely to lead to a consonant ending. We
nantly or dissonantly. If the participants failed to respond before the final compared each with chance performance (50% for overall accuracy and 25%
phrase of the chorale (i.e., within 2.25 s), the trial aborted and the next one for two-step accuracy) with two-tailed, one-sample t tests, excluding the
began. If not, the direction choice was displayed on the screen during the participant who reported a preference for dissonance. To measure learning
ending. The color choice probability was always 70%, but the direction throughout the task, we divided these accuracy measures into sliding 10-trial
choice probability was 85% in one timbre context and 70% in the other. To windows (i.e., window 1 was trials 1–10, window 2 was trials 2–11, etc.) and
maximize the likelihood of hearing consonant endings, a participant’s best obtained learning slopes with linear regressions of these values against the
course of action was therefore to choose the color that would most likely window number. We evaluated the group learning slopes with one-tailed
lead to the timbre in which the direction choice was more influential and permutation tests against regression slopes from 20,000 series of accuracy bins
then the direction most associated with the consonant ending. The only in- randomly sampled without replacement, excluding the one participant who
structions were to “learn about which choices are most likely to lead to the reported a preference for dissonance and the participant with incomplete
music you want to hear, and then make these choices”; the participants were data. We characterized the fit of the reinforcement-learning model for each
blinded to the task probabilities. The task had 60 trials, evenly divided into participant as the pseudo-R2 value (i.e., the percent improvement of the
four blocks with a break between each, lasting about 40 min. We adapted the model’s negative log likelihood compared with that of a null model treating
Downloaded at Philippines: PNAS Sponsored on July 17, 2021

task timings and probabilities through pilot testing with a separate group of each decision as random) (32, 33), and compared the model fitness for the
volunteers, selecting parameters that allowed participants to perform above group to 0% with a two-tailed, one-sample t test of the pseudo-R2 values.
chance and improve during the task without learning so well that they
stopped sampling task choices. The choice–outcome associations and the fMRI Analysis. We conducted two fMRI analyses with three general linear
more influential timbre were randomly assigned to each participant at the models (GLMs). The first GLM, to identify whether computationally modeled
beginning of the experiment to control for any associative biases. RPEs correlated with BOLD activity in the bilateral NAc, included a parametric

3314 | www.pnas.org/cgi/doi/10.1073/pnas.1809855116 Gold et al.


regressor of the modeled and standardized RPEs and three nuisance regressors for the times between the color cues and choices (“state 1”), the direction cues and choices (“state 2”), and the direction choices and their outcomes (“anticipation”). We designed the second and third GLMs to analyze the axioms of RPE signals and corroborate the result of the first GLM. These included the same nuisance regressors as the first GLM, but the second model had separate regressors for the positive and negative RPEs and the third had binary regressors for the consonant and dissonant outcomes instead. We convolved these regressors with a canonical hemodynamic response function and their resulting temporal derivatives to create the regressors of interest. Each first-level model also included 24 motion regressors to account for movement-related variance: one for each directional axis, one for the derivative of each axis, and the squares of these 12 values. Finally, we scrubbed volumes with framewise displacement above 0.9 with additional nuisance regressors (46). We subjected each model to a second-level fixed-effects analysis to average the contrast estimates of the two runs for each participant (the contrasts were unchanged for the participant with one run of data) and to a third-level mixed-effects analysis to find group effects. Reported coordinates are in Montreal Neurological Institute space.

We tested the hypothesis of NAc activity reflecting RPEs with a bilateral anatomical mask from the Harvard-Oxford atlas in FSL (35), thresholding at P < 0.05 after one-tailed FWE correction. To test whether this region reflected RPEs or some correlated signal like reward magnitude or valence, we analyzed a conjunction of axiomatic RPE properties in the same mask: that they should be greater for rewards than punishments and increase for both larger/less expected rewards and smaller/more expected punishments (30). To do so, we conjoined parametric regressors of positive and negative RPEs from the second GLM with a contrast of consonant > dissonant endings from the third GLM. We analyzed the conjunction as the minimum z statistic of the three contrasts and thresholded the conjunction map using clusters determined by voxel z > 2.3 and cluster P < 0.01.

We investigated the behavioral relevance of the NAc RPE-like activity with stepwise linear regression of the averaged beta estimates in the right NAc (as defined independently by the Harvard-Oxford atlas) against the pseudo-R² values of the reinforcement-learning model fit and the measures of decision-making task performance (overall accuracy, two-step accuracy, overall learning slope, and two-step learning slope). This analysis, using MATLAB’s stepwiselm function with default settings, adds or removes variables based on F tests of the change in the sum of squared error. We validated this approach with a robust-fit regression on the same variables, reducing the influence of outliers with weightings, using the “RobustOpts” option with default settings in MATLAB’s fitlm function. We followed this analysis with a simple linear regression of right NAc RPE-like activity and BMRQ scores, and by adding BMRQ scores as a regressor in the significant model of overall learning slopes and right NAc RPE-like activity (for the 15 participants who completed the BMRQ). Data are available at https://neurovault.org/collections/4778/ (47).

ACKNOWLEDGMENTS. We thank Vincent K. M. Cheung for helping with stimulus design and pilot testing and Karl A. Neumann for assistance with recruiting participants, pilot testing, and data collection. This work was supported by a Fulbright Canada Science, Technology, Engineering, and Mathematics (STEM) Graduate Award (to B.P.G.), a Natural Sciences and Engineering Research Council of Canada Collaborative Research and Training Experience Graduate Award (to B.P.G.), and by a Foundation Grant (to R.J.Z.) from the Canadian Institutes of Health Research. R.J.Z. is also a senior fellow of the Canadian Institute for Advanced Research.
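
The reinforcement-learning model described in the methods (the RPE update, the softmax choice rule, the exhaustive parameter search, and the pseudo-R² fit measure) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code. The trial encoding, the function names (`softmax_prob`, `run_model`, `fit`, `pseudo_r2`), the parameter grids, and the demo data are all hypothetical. In particular, the way the memory trace λ passes the outcome RPE back to the first-step choice is an assumed rule, because the full update equation is not restated in this section.

```python
import itertools
import math


def softmax_prob(q_chosen, q_other, beta):
    """Two-option softmax, exp(b*Qc) / (exp(b*Qc) + exp(b*Qo)), in logistic form."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))


def run_model(trials, alpha, lam, beta):
    """Return (negative log likelihood, outcome RPEs) for one participant.

    trials: list of (a1, s2, a2, r), with choices a1, a2 in {0, 1},
    second-step state s2 in {1, 2}, and feedback r rescaled to [-3, 3].
    State 0 is the first (color) step.
    """
    q = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}  # all Q values start at 0
    nll = 0.0
    outcome_rpes = []
    for a1, s2, a2, r in trials:
        # Likelihood of both observed choices under the current values.
        nll -= math.log(softmax_prob(q[0][a1], q[0][1 - a1], beta))
        nll -= math.log(softmax_prob(q[s2][a2], q[s2][1 - a2], beta))
        # Step 1: no feedback yet, so r(t) = 0 and the RPE looks ahead.
        delta1 = 0.0 + max(q[s2]) - q[0][a1]
        q[0][a1] += alpha * delta1
        # Step 2: the trial ends here, so there is no future-value term.
        delta2 = r - q[s2][a2]
        q[s2][a2] += alpha * delta2
        # ASSUMPTION: the memory trace passes the outcome RPE to the first choice.
        q[0][a1] += lam * alpha * delta2
        outcome_rpes.append(delta2)
    return nll, outcome_rpes


def fit(trials, alphas, lams, betas):
    """Exhaustive search: return (nll, alpha, lam, beta) with the lowest nll."""
    best = None
    for alpha, lam, beta in itertools.product(alphas, lams, betas):
        nll, _ = run_model(trials, alpha, lam, beta)
        if best is None or nll < best[0]:
            best = (nll, alpha, lam, beta)
    return best


def pseudo_r2(nll_model, n_choices):
    """Improvement over a null model that treats each binary choice as random."""
    nll_null = n_choices * math.log(2.0)
    return 1.0 - nll_model / nll_null


# Hypothetical choices for illustration only (not data from the study).
demo_trials = [(0, 1, 1, 3.0), (0, 1, 1, 2.0), (1, 2, 0, -1.0), (0, 1, 1, 3.0)]
grid = [i / 10 for i in range(1, 10)]
nll, alpha, lam, beta = fit(demo_trials, grid, [0.0, 0.5, 1.0], [0.5, 1.0, 2.0, 4.0])
print(alpha, lam, beta, round(pseudo_r2(nll, 2 * len(demo_trials)), 3))
```

With β = 0 every choice has probability 0.5, so the model's negative log likelihood equals that of the null model and the pseudo-R² is 0. A coarse grid like this one only shows the search procedure; the grids used in the study are not reported in this section.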

1. Dubé L, Le Bel J (2003) The content and structure of laypeople’s concept of pleasure. Cogn Emotion 17:263–295.
2. Berridge KC, Kringelbach ML (2015) Pleasure systems in the brain. Neuron 86:646–664.
3. Salimpoor VN, Benovoy M, Larcher K, Dagher A, Zatorre RJ (2011) Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nat Neurosci 14:257–262.
4. Salimpoor VN, et al. (2013) Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340:216–219.
5. Brattico E, et al. (2016) It’s sad but I like it: The neural dissociation between musical emotions and liking in experts and laypersons. Front Hum Neurosci 9:676.
6. Martínez-Molina N, Mas-Herrero E, Rodríguez-Fornells A, Zatorre RJ, Marco-Pallarés J (2016) Neural correlates of specific musical anhedonia. Proc Natl Acad Sci USA 113:E7337–E7345.
7. Jepma M, Verdonschot RG, van Steenbergen H, Rombouts SA, Nieuwenhuis S (2012) Neural mechanisms underlying the induction and relief of perceptual curiosity. Front Behav Neurosci 6:5.
8. Wassiliwizky E, Koelsch S, Wagner V, Jacobsen T, Menninghaus W (2017) The emotional power of poetry: Neural circuitry, psychophysiology and compositional principles. Soc Cogn Affect Neurosci 12:1229–1240.
9. Franklin RG, Jr, Adams RB, Jr (2011) The reward of a good joke: Neural correlates of viewing dynamic displays of stand-up comedy. Cogn Affect Behav Neurosci 11:508–515.
10. Meyer LB (1956) Emotion and Meaning in Music (Chicago Univ Press, Chicago).
11. Huron D (2006) Sweet Anticipation: Music and the Psychology of Expectation (MIT Press, Cambridge, MA).
12. Gebauer L, Kringelbach ML, Vuust P (2012) Ever-changing cycles of musical pleasure: The role of dopamine and anticipation. Psychomusicology 22:152–167.
13. Salimpoor VN, Zald DH, Zatorre RJ, Dagher A, McIntosh AR (2015) Predictions and the brain: How musical sounds become rewarding. Trends Cogn Sci 19:86–91.
14. Mas-Herrero E, Marco-Pallarés J, Lorenzo-Seva U, Zatorre RJ, Rodriguez-Fornells A (2013) Individual differences in music reward experiences. Music Percept 31:118–138.
15. Juslin PN, Västfjäll D (2008) Emotional responses to music: The need to consider underlying mechanisms. Behav Brain Sci 31:559–575, discussion 575–621.
16. Brattico E, Pearce M (2013) The neuroaesthetics of music. Psychol Aesthet Creat Arts 7:48–61.
17. Steinbeis N, Koelsch S, Sloboda JA (2006) The role of harmonic expectancy violations in musical emotions: Evidence from subjective, physiological, and neural responses. J Cogn Neurosci 18:1380–1393.
18. Egermann H, Pearce MT, Wiggins GA, McAdams S (2013) Probabilistic models of expectation violation predict psychophysiological emotional responses to live concert music. Cogn Affect Behav Neurosci 13:533–553.
19. Koelsch S, Kilches S, Steinbeis N, Schelinski S (2008) Effects of unexpected chords and of performer’s expression on brain responses and electrodermal activity. PLoS One 3:e2631.
20. Friston K (2010) The free-energy principle: A unified brain theory? Nat Rev Neurosci 11:127–138.
21. Sharpe MJ, et al. (2017) Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat Neurosci 20:735–742.
22. Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
23. den Ouden HE, Daunizeau J, Roiser J, Friston KJ, Stephan KE (2010) Striatal prediction error modulates cortical coupling. J Neurosci 30:3210–3219.
24. Ylinen S, et al. (2016) Predictive coding of phonological rules in auditory cortex: A mismatch negativity study. Brain Lang 162:72–80.
25. Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599.
26. Hart AS, Rutledge RB, Glimcher PW, Phillips PE (2014) Phasic dopamine release in the rat nucleus accumbens symmetrically encodes a reward prediction error term. J Neurosci 34:698–704.
27. Zhang Y, Larcher KM, Misic B, Dagher A (2017) Anatomical and functional organization of the human substantia nigra and its connections. eLife 6:e26653.
28. Seymour B, et al. (2005) Opponent appetitive-aversive neural processes underlie predictive learning of pain relief. Nat Neurosci 8:1234–1240.
29. Floresco SB (2015) The nucleus accumbens: An interface between cognition, emotion, and action. Annu Rev Psychol 66:25–52.
30. Rutledge RB, Dean M, Caplin A, Glimcher PW (2010) Testing the reward prediction error hypothesis with an axiomatic model. J Neurosci 30:13525–13536.
31. Chase HW, Kumar P, Eickhoff SB, Dombrovski AY (2015) Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis. Cogn Affect Behav Neurosci 15:435–459.
32. Gläscher J, Daw N, Dayan P, O’Doherty JP (2010) States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66:585–595.
33. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ (2011) Model-based influences on humans’ choices and striatal prediction errors. Neuron 69:1204–1215.
34. Watkins CJCH (1989) Learning from delayed rewards. PhD dissertation (King’s College, Cambridge, UK).
35. Frazier JA, et al. (2005) Structural brain magnetic resonance imaging of limbic and thalamic volumes in pediatric bipolar disorder. Am J Psychiatry 162:1256–1265.
36. Salimpoor VN, Benovoy M, Longo G, Cooperstock JR, Zatorre RJ (2009) The rewarding aspects of music listening are related to degree of emotional arousal. PLoS One 4:e7487.
37. Friston KJ, Friston DA (2013) A free energy formulation of music generation and perception: Helmholtz revisited. Sound–Perception–Performance, Current Research in Systematic Musicology, ed Bader R (Springer, Heidelberg), pp 43–69.
38. Virtala P, Huotilainen M, Partanen E, Fellman V, Tervaniemi M (2013) Newborn infants’ auditory system is sensitive to Western music chord categories. Front Psychol 4:492.
39. Keiflin R, Janak PH (2017) Error-driven learning: Dopamine signals more than value-based errors. Curr Biol 27:R1321–R1324.
40. Virtala P, Tervaniemi M (2017) Neurocognition of major-minor and consonance-dissonance. Music Percept 34:387–404.
41. Koelsch S (2014) Brain correlates of music-evoked emotions. Nat Rev Neurosci 15:170–180.
42. Zatorre RJ, Belin P, Penhune VB (2002) Structure and function of auditory cortex: Music and speech. Trends Cogn Sci 6:37–46.
43. Groenewegen HJ, Room P, Witter MP, Lohman AHM (1982) Cortical afferents of the nucleus accumbens in the cat, studied with anterograde and retrograde transport techniques. Neuroscience 7:977–996.
44. Mandler G (1975) Mind and Emotion (Wiley, New York).
45. Mas-Herrero E, Dagher A, Zatorre RJ (2017) Modulating musical reward sensitivity up and down with transcranial magnetic stimulation. Nat Hum Behav 2:27–32.
46. Siegel JS, et al. (2014) Statistical improvements in functional magnetic resonance imaging analyses produced by censoring high-motion data points. Hum Brain Mapp 35:1981–1996.
47. Gold BP, et al. (2019) Data from “Musical reward prediction errors engage the nucleus accumbens and motivate learning.” NeuroVault. Available at https://neurovault.org/collections/4778/. Deposited January 28, 2019.

Gold et al. PNAS | February 19, 2019 | vol. 116 | no. 8 | 3315
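
The sliding-window learning measure from the behavioral analysis (10-trial windows, a regression slope against window number, and a one-tailed permutation test) can be sketched in Python as follows. This is a hedged illustration rather than the authors' MATLAB scripts. The function names and the example data are hypothetical, and the sketch permutes a single participant's windows, whereas the methods describe a group-level test whose exact construction is not given here.

```python
import random


def sliding_accuracy(correct, width=10):
    """Mean accuracy in each sliding window (window 1 = trials 1-10, and so on)."""
    return [sum(correct[i:i + width]) / width
            for i in range(len(correct) - width + 1)]


def learning_slope(values):
    """Least-squares slope of the values regressed on window number 1..n."""
    n = len(values)
    x_mean = (n + 1) / 2
    y_mean = sum(values) / n
    sxy = sum((i + 1 - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i + 1 - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def permutation_p(values, n_perm=20000, seed=0):
    """One-tailed test: how often does a shuffled series reach the observed slope?"""
    rng = random.Random(seed)
    observed = learning_slope(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reorder the windows without replacement
        if learning_slope(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed ordering as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical participant: 1 = chose the option most likely to end consonantly.
example = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
slope, p_value = permutation_p(sliding_accuracy(example), n_perm=2000)
print(slope, p_value)
```

Because adjacent windows share nine of their ten trials, the window values are strongly autocorrelated. That is one reason to compare the slope with a permutation distribution rather than rely on the regression's own parametric P value.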
