Reinforcement Learning
and Tourette Syndrome
Stefano Palminteri*,1, Mathias Pessiglione
*Laboratoire des Neurosciences Cognitives (LNC), École Normale Supérieure (ENS), Paris, France
Motivation Brain and Behaviour Team (MBB), Institut du Cerveau et de la Moelle (ICM), Paris, France
1Corresponding author: e-mail address: stefano.palminteri@gmail.com
Contents
1. Reinforcement Learning: Concepts and Paradigms
2. Neural Correlates of Reinforcement Learning
   2.1 Electrophysiological correlates in monkeys
   2.2 Functional magnetic resonance imaging correlates in humans
   2.3 Parkinson's disease and reinforcement learning
3. Tourette Syndrome and Reinforcement Learning
   3.1 Experimental study 1: Tourette syndrome and subliminal instrumental learning (Palminteri, Lebreton, et al., 2009)
   3.2 Experimental study 2: Tourette syndrome and reinforcement of motor skill learning (Palminteri et al., 2011)
   3.3 Experimental study 3: Tourette syndrome and probabilistic reinforcement learning (Worbe et al., 2011)
4. Conclusions and Perspectives
Acknowledgments
References
Abstract
In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, carried out by our team in the last few years. This report is preceded by an introduction aimed at providing the reader with the state of the art of the knowledge concerning the neural bases of reinforcement learning at the time of these studies, and the scientific rationale behind them.
In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex–basal ganglia circuits. A large body of evidence suggests that dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that the Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients.
Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics despite their evident maladaptive (negative) value. This idea, together with the implications of these results for Tourette therapy and future perspectives, is discussed in Section 4 of this chapter.
ABBREVIATIONS
ADHD attention-deficit hyperactivity disorder
DA dopamine
DBS deep brain stimulation
fMRI functional magnetic resonance imaging
OCD obsessive–compulsive disorder
PD Parkinson's disease
RL reinforcement learning
TD temporal difference (learning)
TS Tourette syndrome
VPFC ventral prefrontal cortex
VS ventral striatum
the heritage of the second one is to be found in the mathematical formalization of these concepts and paradigms.
The computational and the psychological views share the basic idea that the learner (the animal or the automaton) wants something (goal-directedness). This feature distinguishes RL from other learning processes, such as procedural or observational learning (Dayan & Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, MIT Press, 2005, ISBN 0262541858). From this standpoint, two features emerge: RL is selectional (the agent must try and select among several alternative choices) and associative (these choices must be associated with a particular state).
In the animal learning literature, RL was originally referred to as conditioning. The experimental paradigms of conditioning fall into two main classes: classical conditioning and instrumental conditioning.
The minimal conditioning processes imply the building up of associations between a reinforcer and a stimulus or an action. In classical conditioning, the reinforcer is delivered irrespective of the learner's behavior, and the observed response consists of innate preparatory responses. The typical example is Pavlov's dog learning to salivate (innate response) in response to a bell (stimulus), which announced the delivery of food (reinforcer) (Pavlov, 1927). In instrumental conditioning, the reinforcer's delivery is contingent on a behavioral response. This feature was already apparent in the early experimental observations of this process provided by Thorndike and Skinner: an animal enclosed in a box had to learn to perform specific actions (string pulling, lever pressing) in order to escape captivity or get food (Skinner, 1938; Thorndike, 1911).
Looking at the causal forces of conditioning, several conditions have been shown to be necessary: temporal contiguity (an action or a stimulus must be temporally close to the outcome for an association to be established), contingency (the probability of the outcome should be higher after the action or the stimulus, i.e., the action or stimulus should be a predictor of the outcome), and prediction error (an action or a stimulus is associated with an outcome only if that outcome was not already fully predicted by the learner) (Rescorla, 1967).
Rescorla and Wagner first introduced the latter idea (Rescorla & Wagner, 1972). They were interested in understanding a particular conditioning effect called the blocking effect (Kamin, 1967). In Kamin's blocking paradigm, an animal is exposed to a first conditioned stimulus (e.g., a bell ring), which predicts the occurrence of a reinforcer (e.g., food). After learning the association between the bell and the food, another stimulus (e.g., a light) is presented together with the food. Hence, both the bell and the light are stimuli that predict the food. However, when tested, the animal does not learn the association between the light and the food, as if it were blocked by the first association. Rescorla and Wagner proposed that conditioning occurs not only because two events co-occur, but because that co-occurrence is unanticipated on the basis of current knowledge. In the example above, the occurrence of food is already fully predicted by the bell, so no novel association with the light is learned. The primitive learning signal of their model is a prediction error, defined as the difference between the predicted and the obtained reinforcer. The reinforcer (reward or punishment) prediction error is a measure of the prediction's accuracy, and the Rescorla and Wagner model is an error minimization model.
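To make the error-minimization idea concrete, the following is a minimal sketch (our own illustration, not taken from Rescorla and Wagner's original formulation) of the Rescorla–Wagner update rule applied to Kamin's blocking paradigm; the stimulus names, the learning rate, and the number of trials are arbitrary, illustrative choices.

```python
# A minimal Rescorla-Wagner simulation of Kamin's blocking paradigm.
# V stores the associative strength of each stimulus; the prediction on a
# trial is the sum of the strengths of the stimuli that are present.

alpha = 0.3   # learning rate (illustrative value)
lam = 1.0     # asymptotic strength supported by the reinforcer (food)
V = {"bell": 0.0, "light": 0.0}

def rw_trial(stimuli, reinforced=True):
    prediction = sum(V[s] for s in stimuli)
    outcome = lam if reinforced else 0.0
    delta = outcome - prediction       # the prediction error
    for s in stimuli:
        V[s] += alpha * delta          # error-driven update of each stimulus

# Phase 1: the bell alone predicts food, so the bell acquires strength.
for _ in range(50):
    rw_trial(["bell"])

# Phase 2: bell + light predict food. The bell already predicts the outcome,
# the prediction error is close to zero, and the light learns almost nothing.
for _ in range(50):
    rw_trial(["bell", "light"])

print(V)  # approximately {'bell': 1.0, 'light': 0.0}: learning about the light is blocked
```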
RL in the artificial intelligence perspective is a field of machine learning aimed at finding computational solutions to a class of problems closely related to the psychological paradigms described in the case of instrumental conditioning (Sutton & Barto, 1998). The agent is conceived as navigating through states of the environment, selecting actions and collecting a quantitative reward,1 which should be maximized. From this learning perspective, two main functions arise as necessary for an RL agent: predicting the expected reward in a given state (reward prediction) and optimally selecting actions for reward maximization (choice).
Most influential modern RL models incorporate temporal difference (TD) learning. The TD learning algorithm builds accurate reward predictions from delayed rewards; the learning rule of this model is not dissimilar to that used in the Rescorla and Wagner model, and it is based on a reward prediction error term. Q-learning is an extension of TD learning that learns separately the reward to be expected following each available action. The optimal choice then simply consists in selecting the action with the highest reward expectation (Watkins & Dayan, 1992). Q-learning is also based on a TD error.
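The following is a minimal sketch of a Q-learning agent with softmax action selection, of the kind commonly used to model instrumental choices; the two-action task, the learning rate (alpha), and the inverse temperature (beta) are illustrative assumptions, not values taken from the studies discussed here.

```python
import math
import random

alpha = 0.2   # learning rate (illustrative value)
beta = 3.0    # inverse temperature of the softmax (choice stochasticity)
Q = {"left": 0.0, "right": 0.0}   # expected reward of each available action

def choose():
    # Softmax: sample an action with probability proportional to exp(beta * Q).
    weights = {a: math.exp(beta * q) for a, q in Q.items()}
    threshold = random.random() * sum(weights.values())
    for action, weight in weights.items():
        threshold -= weight
        if threshold <= 0:
            return action
    return action

def update(action, reward):
    delta = reward - Q[action]    # reward prediction error
    Q[action] += alpha * delta    # incremental update of the action value

# Illustrative task: "left" pays off with probability 0.8, "right" with 0.2.
for _ in range(200):
    a = choose()
    p_reward = 0.8 if a == "left" else 0.2
    update(a, 1.0 if random.random() < p_reward else 0.0)

print(Q)  # Q["left"] should end up near 0.8 and Q["right"] near 0.2
```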
Thus, thanks to RL algorithms, the experimenter can extrapolate key computational variables of these models and make quantitative predictions about how neural and behavioral data should evolve under the assumptions of the model. These computational constructs are referred to as hidden variables, as opposed to the experimental observables (choices, reaction
times) from which they are derived. In the next section, we shall see where these hidden variables are represented in the brain.
1 The reward in computational modeling is a quantitative term that can take negative values and therefore represent punishments as well.
A vast and rich literature exists concerning the neural bases of reinforcement learning in rodents. We opted for restricting this chapter to primate studies because they were the first to test and adopt computational concepts and models. Note that recent studies strongly suggest that the same neurobiological and computational models are valid for both orders (Steinberg et al., 2013).
Figure 5.1 (A) A schematic representing dopaminergic signals (gray) following positive and negative outcomes, compared to baseline level, in healthy subjects. The green and the red lines, respectively, represent the level to reach (either above or below the baseline) to express a signal strong enough to induce either reward or punishment learning. This schematic is based on the results originally reported in Schultz, Dayan, and Montague (1997). (B) The same processes are represented for unmedicated and medicated PD and TS patients, where the DA baselines are supposed to be modified by the clinical and the pharmacological condition. When the dopaminergic signal does not reach the green line, reward learning does not occur (this is the case for unmedicated PD and medicated TS). When the dopaminergic signal does not reach the red line, punishment learning does not occur (this is the case in unmedicated TS and medicated PD). This schematic represents a possible interpretation of the results obtained in experimental study 1 (Palminteri, Lebreton, et al., 2009).
In a first fMRI study, Berns, McClure, Pagnoni, and Montague (2001) delivered to subjects squirts of juice and water either in a predictable or in an unpredictable manner, and they found that unpredictable reward sequences selectively induced activation in the ventral striatum (VS) and in the ventral prefrontal cortex (VPFC) (both target structures of the midbrain dopaminergic neurons) compared to predictable reward sequences, indicating that positive prediction errors, rather than reward itself, induced increased activity in these areas. These results were later replicated (O'Doherty, Deichmann, Critchley, & Dolan, 2002).
These first studies used the so-called categorical approach to fMRI data analysis. Though this approach has the advantage of being easy to implement and explain, it has the great disadvantage of preventing one from capturing the online temporal evolution of RL signals (Friston et al., 1996). This is crucial for RL variables, such as reward predictions and reward prediction errors, which are supposed to change radically over time. In fact, as learning occurs, reward prediction signals increase and prediction errors decrease: a feature completely missed with cognitive subtraction. A second wave of studies used a different approach, called model-based fMRI, which allows one to follow learning-related changes in reward prediction and prediction error encoding (O'Doherty, Hampton, & Kim, 2007). This approach begins with computing the model estimates of the hidden computational variables according to the RL algorithm (most often a simple TD learning model for classical conditioning tasks and a Q-learning model for instrumental conditioning tasks), from the subjects' behavioral data. The fMRI data analysis then consists in searching for the brain areas whose neural activity covaries with the model's estimates of these computational variables.
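As an illustration of this pipeline, here is a schematic sketch (our own, with hypothetical function and variable names) of how trial-by-trial prediction errors can be generated from a subject's choice and outcome sequence with a simple Q-learning rule; in a real analysis, the free parameters would be fitted to behavior and the resulting regressor convolved with a hemodynamic response function within standard fMRI software.

```python
# Illustrative sketch only: in practice the learning rate would be fitted to
# the subject's choices by maximum likelihood, and the resulting regressor
# entered as a parametric modulator in the fMRI design matrix.

def prediction_error_regressor(choices, rewards, alpha=0.2):
    """Replay a subject's trial sequence through a simple Q-learning rule and
    return the trial-by-trial reward prediction errors, i.e., the hidden
    variable to be regressed against the BOLD signal at outcome onset."""
    Q = {}
    deltas = []
    for action, reward in zip(choices, rewards):
        q = Q.get(action, 0.0)
        delta = reward - q            # model estimate of the prediction error
        Q[action] = q + alpha * delta
        deltas.append(delta)
    return deltas

# Hypothetical behavioral data from one subject.
choices = ["A", "A", "B", "A", "B", "A"]
rewards = [1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
print(prediction_error_regressor(choices, rewards))
```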
Following this model-based approach, a study from O'Doherty and colleagues using a classical conditioning procedure revealed that responses in the VS and in the VPFC were significantly correlated with this error signal (O'Doherty, Dayan, Friston, Critchley, & Dolan, 2003). Similar results were obtained by the same group in a subsequent experiment in which they contrasted a classical conditioning and an instrumental conditioning procedure (O'Doherty et al., 2004). These first results have since been replicated consistently with different kinds of rewards (primary as well as secondary), different paradigms (classical as well as instrumental conditioning), and by different groups (see e.g., Abler, Walter, Erk, Kammerer, & Spitzer, 2006; Kim, Shimojo, & O'Doherty, 2006; Palminteri, Boraud, Lafargue, Dubois, & Pessiglione, 2009; Rutledge, Dean, Caplin, & Glimcher, 2010).
Thus, reward prediction errors have been reported consistently in the basal ganglia (VS) and in the VPFC, which are the main projection sites of
the dopaminergic neurons (Draganski et al., 2008). The consensual interpretation of these results, built by analogy with electrophysiological studies in nonhuman primates, has been that these signals reflect the midbrain dopaminergic input to these areas. This idea has been further supported by another experiment in which the authors used a special MRI sequence to enhance sensitivity in the midbrain. They reported that the responses of the dopaminergic nuclei were compatible with the reward prediction error hypothesis (D'Ardenne, McClure, Nystrom, & Cohen, 2008).
However, functional imaging only provides us with functional correlates, which, in principle, could be merely epiphenomenal. This limitation is not specific to fMRI; it is also shared by electrophysiological techniques. To assess causal relations between a neural system and a behavior, neuroscientists must observe the behavioral output of the system's perturbation (Siebner et al., 2009). The perturbation can be the administration of a given molecule or an accidental brain injury.
The causal implication of dopaminergic transmission in fMRI prediction error signals has been demonstrated by a pharmacological perturbation fMRI study in which subjects performed an instrumental learning task with probabilistic monetary rewards while receiving a dopaminergic treatment. The treatment was either a DA enhancer (levodopa), a DA antagonist (haloperidol), or a placebo (Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006). fMRI results showed again that reward prediction errors were represented in the VS; furthermore, they showed that DA treatments modified the amplitude of these signals, such that l-dopa amplified prediction errors and haloperidol blunted them, establishing a direct link between dopaminergic transmission and fMRI prediction error signals. Moreover, these medications affected learning performance in accordance with their neural effects (enhancement under l-dopa, impairment under haloperidol), suggesting a causal role of DA modulation in reward learning.
In summary, the study of the neural bases of RL in humans has consistently shown that (1) reward prediction errors are represented in the striatum and in the prefrontal cortex (mainly in their ventral parts, VS and VPFC) and that (2) dopaminergic pharmacological manipulations significantly affect these signals and, consequently, behavioral performance.
In a seminal study, Frank and colleagues administered an instrumental learning task to a cohort of PD patients either medicated (on) or unmedicated (off) with l-dopa (Frank, Seeberger, & O'Reilly, 2004). Their results showed that off patients were impaired in learning from positive outcomes, whereas on patients were impaired in learning from negative outcomes. This result is consistent with the idea that reward and punishment learning are driven by positive and negative dopaminergic prediction errors, respectively. According to this interpretation, the level of dopaminergic transmission cannot increase enough to produce positive prediction errors in off patients, because of the neural loss in their midbrain DA nuclei, so that positive outcomes are not able to induce learning. On the contrary, in on patients, where the level of DA has been artificially increased by the treatment with l-dopa, negative prediction errors (pauses in DA transmission) are no longer possible, leading to an impairment in learning from negative outcomes (Fig. 5.1B). These results have been further replicated (partially or totally) by our group and others (Bodi et al., 2009; Frank, Samanta, Moustafa, & Sherman, 2007; Palminteri, Lebreton, et al., 2009; Rutledge et al., 2009; Voon et al., 2010).
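To make the interpretation schematized in Fig. 5.1 concrete, here is a toy simulation (our own construction, not a model used in Frank, Seeberger, & O'Reilly, 2004) in which the effective teaching signal is the prediction error clipped by the dynamic range of the dopaminergic response: limiting the attainable burst impairs reward learning (as hypothesized for unmedicated PD), whereas limiting the attainable pause impairs punishment learning (as hypothesized for medicated PD, or for a hyperdopaminergic state).

```python
# Toy illustration (our own construction): value learning in which the
# teaching signal is the prediction error clipped by the dynamic range of
# the dopaminergic response (largest possible burst and largest possible pause).

def learn(outcomes, max_burst, max_dip, alpha=0.3):
    V = 0.0
    for outcome in outcomes:
        delta = outcome - V
        delta = max(-max_dip, min(max_burst, delta))  # clip by the DA dynamic range
        V += alpha * delta
    return V

rewards = [1.0] * 10       # repeated positive outcomes
punishments = [-1.0] * 10  # repeated negative outcomes

# Intact dynamic range: both reward and punishment values are learned (~ +0.97 / -0.97).
print(learn(rewards, 1.0, 1.0), learn(punishments, 1.0, 1.0))
# Limited bursts (low DA baseline, as hypothesized for unmedicated PD): reward learning suffers (~ +0.30).
print(learn(rewards, 0.1, 1.0), learn(punishments, 0.1, 1.0))
# Limited pauses (l-dopa, or a hyperdopaminergic state): punishment learning suffers (~ -0.30).
print(learn(rewards, 1.0, 0.1), learn(punishments, 1.0, 0.1))
```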
In summary, these results indicate that a dopaminergic motor disease such as PD can display nonmotor symptoms in a fashion that is fully compatible with the hypothesis of dopaminergic encoding of prediction errors during RL. On this basis, a natural extension of these studies has been to investigate RL in Tourette syndrome (TS).
Figure 5.2 (A) A schematic summarizing the behavioral results of experimental study 1 (Palminteri, Lebreton, et al., 2009). The graphs show the interaction between reinforcement valence (positive or negative) and medication status. The same pattern can be observed in PD and TS (ON, medicated; OFF, unmedicated). (B) This schematic summarizes the main results of experimental study 2 (Palminteri et al., 2011). Motor skill learning is impaired in TS, compared to controls, irrespective of medication status, whereas the reinforcement learning effect on motor learning follows a completely different pattern: it is exacerbated in unmedicated TS patients (TS OFF) compared to healthy controls, and it is absent in medicated TS patients (TS ON). (C) This schematic summarizes the main results of experimental study 3 (Worbe et al., 2011). Reward prediction error encoding was found in the VS (among other areas, such as the VPFC). Learning performance was blunted in patients medicated with DA antagonists (TS AA) compared to unmedicated patients (TS OFF) and patients medicated with a partial agonist (TS PA). Note that all the graphs here represent ideal values meant to illustrate the pattern of the experimental results, not the experimental results themselves (except for the ventral striatal activation).
In experimental study 3, we also found an instrumental learning deficit in TS patients with OCD comorbidity, which correlated with blunted activity in the VPFC. This finding is consistent with evidence describing RL deficits in OCD patients (Cavanagh, Grundler, Frank, & Allen, 2010; Chamberlain et al., 2008; Figee et al., 2011; Nielen, den Boer, & Smid, 2009; Palminteri, Clair, Mallet, & Pessiglione, 2012; Remijnse et al., 2006). Thus, since it has been shown with a variety of behavioral tasks and clinical models, an RL deficit may represent a neuropsychological feature of OCD. These findings of a neural and behavioral reward processing impairment are consistent with the alleged dysfunction of ventral frontal cortex–basal ganglia loops that has been reported in OCD and in comorbid TS-OCD (Aouizerate et al., 2004; Rotge et al., 2010; Worbe, Gerardin, et al., 2010). Although the connection between RL impairment and obsessive–compulsive symptoms remains to be articulated, we speculate here that repetitive behaviors or thoughts might arise from aberrant reinforcement processes in a way similar to that described for tics in TS. Future research should also focus on studying reinforcement in other pathologies of the TS spectrum, such as ADHD, which is characterized by monoaminergic dysfunction and treated with dopaminergic medications (Biederman & Faraone, 2005).
In summary, RL is a process whose dysfunction could be in part responsible for the behavioral manifestations of TS at different levels (from lower motor symptoms to higher cognitive and psychiatric symptoms). Thus, the formal framework of RL studies can provide fundamental insights for the comprehension of neuropsychiatric disorders. From this perspective, the experimental studies presented here can be considered part of a nascent and promising discipline, computational psychiatry, which aims to explain neuropsychiatric diseases with formal and quantitative behavioral models (Maia & Frank, 2011; Montague, Dolan, Friston, & Dayan, 2012).
Beyond their interest for the pathophysiology of TS, these data also have implications for the implementation of current treatments and the development of new ones, at the pharmacological, surgical, and behavioral therapy levels (Hartmann & Worbe, 2013; McNaught & Mink, 2011). We have already shown that different kinds of pharmacological treatment differentially affect RL, possibly explaining the different expression of side effects. On the other hand, behavioral therapy, which is largely based on conditioning procedures, should take into account the medication status of the patient. For instance, on the basis of our results, negative reinforcement is not likely to be effective in unmedicated TS patients, whereas the opposite can be true for medicated ones. Concerning surgical
ACKNOWLEDGMENTS
Maël Lebreton took a very important and active part in experimental studies 1 and 2. Yulia Worbe designed and conducted experimental study 3. Yulia Worbe and Andreas Hartmann took care of the TS patients and provided clinical data. David Grabli had a similar role, but for PD and dystonic patients. S. P. received a PhD fellowship from the Neuropôle de Recherche Francilien (NeRF). The studies were funded by the Fyssen Foundation (FF), the École des Neurosciences de Paris (ENP), the Agence Nationale de la Recherche (ANR), and the Association Française du Syndrome de Gilles de la Tourette (AFSGT).
REFERENCES
Abler, B., Walter, H., Erk, S., Kammerer, H., & Spitzer, M. (2006). Prediction error as a linear function of reward probability is coded in human nucleus accumbens. NeuroImage, 31(2), 790–795. http://dx.doi.org/10.1016/j.neuroimage.2006.01.001.
Agid, Y., Arnulf, I., Bejjani, P., Bloch, F., Bonnet, A. M., Damier, P., et al. (2003). Parkinson's disease is a neuropsychiatric disorder. Advances in Neurology, 91, 365–370. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12442695.
Aouizerate, B., Guehl, D., Cuny, E., Rougier, A., Bioulac, B., Tignol, J., et al. (2004). Pathophysiology of obsessive-compulsive disorder: A necessary link between phenomenology, neuropsychology, imagery and physiology. Progress in Neurobiology, 72(3), 195–221. http://dx.doi.org/10.1016/j.pneurobio.2004.02.004.
Bar-Gad, I., & Bergman, H. (2001). Stepping out of the box: Information processing in the neural networks of the basal ganglia. Current Opinion in Neurobiology, 11(6), 689–695. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11741019.
Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47(1), 129–141. http://dx.doi.org/10.1016/j.neuron.2005.05.020.
Berns, G. S., McClure, S. M., Pagnoni, G., & Montague, P. R. (2001). Predictability modulates human brain response to reward. The Journal of Neuroscience, 21(8), 2793–2798. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11306631.
Biederman, J., & Faraone, S. V. (2005). Attention-deficit hyperactivity disorder. Lancet, 366(9481), 237–248. http://dx.doi.org/10.1016/S0140-6736(05)66915-2.
Bodi, N., Keri, S., Nagy, H., Moustafa, A., Myers, C. E., Daw, N., et al. (2009). Reward-learning and the novelty-seeking personality: A between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients. Brain, 132(Pt. 9), 2385–2395. http://dx.doi.org/10.1093/brain/awp094.
Brembs, B. (2003). Operant conditioning in invertebrates. Current Opinion in Neurobiology, 13(6), 710–717. http://dx.doi.org/10.1016/j.conb.2003.10.002.
Cavanagh, J. F., Grundler, T. O. J., Frank, M. J., & Allen, J. J. B. (2010). Altered cingulate sub-region activation accounts for task-related dissociation in ERN amplitude as a function of obsessive-compulsive symptoms. Neuropsychologia, 48(7), 2098–2109. http://dx.doi.org/10.1016/j.neuropsychologia.2010.03.031.
Chamberlain, S. R., Menzies, L., Hampshire, A., Suckling, J., Fineberg, N. A., del Campo, N., et al. (2008). Orbitofrontal dysfunction in patients with obsessive-compulsive disorder and their unaffected relatives. Science, 321(5887), 421–422. http://dx.doi.org/10.1126/science.1154433.
Cohen, M. X., Axmacher, N., Lenartz, D., Elger, C. E., Sturm, V., & Schlaepfer, T. E. (2009). Neuroelectric signatures of reward learning and decision-making in the human nucleus accumbens. Neuropsychopharmacology, 34(7), 1649–1658. http://dx.doi.org/10.1038/npp.2008.222.
D'Ardenne, K., McClure, S. M., Nystrom, L. E., & Cohen, J. D. (2008). BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science, 319(5867), 1264–1267. http://dx.doi.org/10.1126/science.1150605.
Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704–1711. http://dx.doi.org/10.1038/nn1560.
Dehaene, S., Changeux, J.-P., Naccache, L., Sackur, J., & Sergent, C. (2006). Conscious, preconscious, and subliminal processing: A testable taxonomy. Trends in Cognitive Sciences, 10(5), 204–211. http://dx.doi.org/10.1016/j.tics.2006.03.007.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge University Press. Retrieved from http://books.google.com/books?hl=it&lr=&id=2y84AAAAIAAJ&pgis=1.
Draganski, B., Kherif, F., Kloppel, S., Cook, P. A., Alexander, D. C., Parker, G. J. M., et al. (2008). Evidence for segregated and integrative connectivity patterns in the human basal ganglia. The Journal of Neuroscience, 28(28), 7143–7152. http://dx.doi.org/10.1523/JNEUROSCI.1486-08.2008.
Figee, M., Vink, M., de Geus, F., Vulink, N., Veltman, D. J., Westenberg, H., et al. (2011). Dysfunctional reward circuitry in obsessive-compulsive disorder. Biological Psychiatry, 69(9), 867–874. http://dx.doi.org/10.1016/j.biopsych.2010.12.003.
Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299(5614), 1898–1902. http://dx.doi.org/10.1126/science.1077349.
Frank, M. C., Piedad, J., Rickards, H., & Cavanna, A. E. (2011). The role of impulse control disorders in Tourette syndrome: An exploratory study. Journal of the Neurological Sciences, 310(1–2), 276–278. http://dx.doi.org/10.1016/j.jns.2011.06.032.
Frank, M. J., Samanta, J., Moustafa, A. A., & Sherman, S. J. (2007). Hold your horses: Impulsivity, deep brain stimulation, and medication in parkinsonism. Science, 318(5854), 1309–1312. http://dx.doi.org/10.1126/science.1146157.
Frank, M. J., Seeberger, L. C., & O'Reilly, R. C. (2004). By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 306(5703), 1940–1943. http://dx.doi.org/10.1126/science.1102941.
Friston, K. J., Price, C. J., Fletcher, P., Moore, C., Frackowiak, R. S., & Dolan, R. J. (1996). The trouble with cognitive subtraction. NeuroImage, 4(2), 97–104. http://dx.doi.org/10.1006/nimg.1996.0033.
Giedd, J. N., Blumenthal, J., Jeffries, N. O., Castellanos, F. X., Liu, H., Zijdenbos, A., et al. (1999). Brain development during childhood and adolescence: A longitudinal MRI study. Nature Neuroscience, 2(10), 861–863. http://dx.doi.org/10.1038/13158.
Gilbert, D. L., Christian, B. T., Gelfand, M. J., Shi, B., Mantil, J., & Sallee, F. R. (2006). Altered mesolimbocortical and thalamic dopamine in Tourette syndrome. Neurology, 67(9), 1695–1697. http://dx.doi.org/10.1212/01.wnl.0000242733.18534.2c.
Glascher, J., Daw, N., Dayan, P., & O'Doherty, J. P. (2010). States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66(4), 585–595. http://dx.doi.org/10.1016/j.neuron.2010.04.016.
Gogtay, N., Giedd, J. N., Lusk, L., Hayashi, K. M., Greenstein, D., Vaituzis, A. C., et al. (2004). Dynamic mapping of human cortical development during childhood through early adulthood. Proceedings of the National Academy of Sciences of the United States of America, 101(21), 8174–8179. http://dx.doi.org/10.1073/pnas.0402680101.
Hartmann, A., & Worbe, Y. (2013). Pharmacological treatment of Gilles de la Tourette syndrome. Neuroscience and Biobehavioral Reviews, 37(6), 1157–1161. http://dx.doi.org/10.1016/j.neubiorev.2012.10.014.
Kamin, L. J. (1967). Attention-like processes in classical conditioning. Hamilton, Ontario: Department of Psychology, McMaster University.
Kawohl, W., Schneider, F., Vernaleken, I., & Neuner, I. (2009). Aripiprazole in the pharmacotherapy of Gilles de la Tourette syndrome in adult patients. The World Journal of Biological Psychiatry, 10(4 Pt. 3), 827–831. http://dx.doi.org/10.1080/15622970701762544.
Kienast, T., & Heinz, A. (2006). Dopamine and the diseased brain. CNS & Neurological Disorders-Drug Targets, 5(1), 109–131. http://dx.doi.org/10.2174/187152706784111560.
Kim, H., Shimojo, S., & O'Doherty, J. P. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biology, 4(8), e233. http://dx.doi.org/10.1371/journal.pbio.0040233.
Koechlin, E., & Summerfield, C. (2007). An information theoretical approach to prefrontal executive function. Trends in Cognitive Sciences, 11(6), 229–235. http://dx.doi.org/10.1016/j.tics.2007.04.005.
Kouider, S., & Dehaene, S. (2007). Levels of processing during non-conscious perception: A critical review of visual masking. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 362(1481), 857–875. http://dx.doi.org/10.1098/rstb.2007.2093.
Lawrence, A. D., Evans, A. H., & Lees, A. J. (2003). Compulsive use of dopamine replacement therapy in Parkinson's disease: Reward systems gone awry? Lancet Neurology, 2(10), 595–604. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/14505581.
Leckman, J. F. (2002). Tourette's syndrome. Lancet, 360(9345), 1577–1586. http://dx.doi.org/10.1016/S0140-6736(02)11526-1.
Maia, T. V., & Frank, M. J. (2011). From reinforcement learning models to psychiatric and neurological disorders. Nature Neuroscience, 14(2), 154–162. http://dx.doi.org/10.1038/nn.2723.
Malison, R. T., McDougle, C. J., van Dyck, C. H., Scahill, L., Baldwin, R. M., Seibyl, J. P., et al. (1995). [123I]beta-CIT SPECT imaging of striatal dopamine transporter binding in Tourette's disorder. The American Journal of Psychiatry, 152(9), 1359–1361. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7653693.
McNaught, K. S. P., & Mink, J. W. (2011). Advances in understanding and treatment of Tourette syndrome. Nature Reviews Neurology, 7(12), 667–676. http://dx.doi.org/10.1038/nrneurol.2011.167.
Mink, J. W. (2003). The basal ganglia and involuntary movements. Archives of Neurology, 60, 1365–1368.
Mirenowicz, J., & Schultz, W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379(6564), 449–451. http://dx.doi.org/10.1038/379449a0.
Montague, P. R., Dolan, R. J., Friston, K. J., & Dayan, P. (2012). Computational psychiatry. Trends in Cognitive Sciences, 16(1), 72–80. http://dx.doi.org/10.1016/j.tics.2011.11.018.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., & Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience, 9(8), 1057–1063. http://dx.doi.org/10.1038/nn1743.
Murphey, R. M. (1967). Instrumental conditioning of the fruit fly, Drosophila melanogaster. Animal Behaviour, 15(1), 153–161. http://dx.doi.org/10.1016/S0003-3472(67)80027-7.
Murray, G. K., Corlett, P. R., Clark, L., Pessiglione, M., Blackwell, A. D., Honey, G., et al. (2008). Substantia nigra/ventral tegmental reward prediction error disruption in psychosis. Molecular Psychiatry, 13(3), 267–276, 239. http://dx.doi.org/10.1038/sj.mp.4002058.
Nielen, M. M., den Boer, J. A., & Smid, H. G. O. M. (2009). Patients with obsessive-compulsive disorder are impaired in associative learning based on external feedback. Psychological Medicine, 39(9), 1519–1526. http://dx.doi.org/10.1017/S0033291709005297.
Niv, Y., Daw, N. D., Joel, D., & Dayan, P. (2007). Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology, 191(3), 507–520. http://dx.doi.org/10.1007/s00213-006-0502-4.
O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal difference models and reward-related learning in the human brain. Neuron, 38(2), 329–337. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12718865.
O'Doherty, J. P., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304(5669), 452–454. http://dx.doi.org/10.1126/science.1094285.
O'Doherty, J. P., Deichmann, R., Critchley, H. D., & Dolan, R. J. (2002). Neural responses during anticipation of a primary taste reward. Neuron, 33(5), 815–826. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11879657.
O'Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104, 35–53. http://dx.doi.org/10.1196/annals.1390.022.
Palminteri, S., Boraud, T., Lafargue, G., Dubois, B., & Pessiglione, M. (2009). Brain hemispheres selectively track the expected value of contralateral options. The Journal of Neuroscience, 29(43), 13465–13472. http://dx.doi.org/10.1523/JNEUROSCI.1500-09.2009.
Palminteri, S., Clair, A.-H., Mallet, L., & Pessiglione, M. (2012). Similar improvement of reward and punishment learning by serotonin reuptake inhibitors in obsessive-compulsive disorder. Biological Psychiatry, 72(3), 244–250. http://dx.doi.org/10.1016/j.biopsych.2011.12.028.
Palminteri, S., Justo, D., Jauffret, C., Pavlicek, B., Dauta, A., Delmaire, C., et al. (2012). Critical roles for anterior insula and dorsal striatum in punishment-based avoidance learning. Neuron, 76(5), 998–1009. http://dx.doi.org/10.1016/j.neuron.2012.10.017.
Palminteri, S., Lebreton, M., Worbe, Y., Grabli, D., Hartmann, A., & Pessiglione, M. (2009). Pharmacological modulation of subliminal learning in Parkinson's and Tourette's syndromes. Proceedings of the National Academy of Sciences of the United States of America, 106(45), 19179–19184. http://dx.doi.org/10.1073/pnas.0904035106.
Palminteri, S., Lebreton, M., Worbe, Y., Hartmann, A., Lehericy, S., Vidailhet, M., et al. (2011). Dopamine-dependent reinforcement of motor skill learning: Evidence from Gilles de la Tourette syndrome. Brain, 134(8), 2287–2301. http://dx.doi.org/10.1093/brain/awr147.
Palminteri, S., Serra, G., Buot, A., Schmidt, L., Welter, M.-L., & Pessiglione, M. (2013). Hemispheric dissociation of reward processing in humans: Insights from deep brain stimulation. Cortex, pii: S0010-9452(13)00072-5. http://dx.doi.org/10.1016/j.cortex.2013.02.014. [Epub ahead of print].
Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. London: Oxford University Press. Retrieved from http://psychclassics.yorku.ca/Pavlov/.
Pessiglione, M., Petrovic, P., Daunizeau, J., Palminteri, S., Dolan, R. J., & Frith, C. D. (2008). Subliminal instrumental conditioning demonstrated in the human brain. Neuron, 59(4), 561–567. http://dx.doi.org/10.1016/j.neuron.2008.07.005.
Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442(7106), 1042–1045. http://dx.doi.org/10.1038/nature05051.
Priori, A., Giannicola, G., Rosa, M., Marceglia, S., Servello, D., Sassi, M., et al. (2013). Deep brain electrophysiological recordings provide clues to the pathophysiology of Tourette syndrome.