Inter-Individual Differences in Decision-Making, Flexible and Goal-Directed Behaviors: Novel Insights Within The Prefronto-Striatal Networks


Brain Struct Funct (2018) 223:897–912

https://doi.org/10.1007/s00429-017-1530-z

ORIGINAL ARTICLE

Inter-individual differences in decision-making, flexible and goal-directed behaviors: novel insights within the prefronto-striatal networks
Aurélie Fitoussi (1,2) • Prisca Renault (1,2) • Catherine Le Moine (1,2) • Etienne Coutureau (1,2) • Martine Cador (1,2) • Françoise Dellu-Hagedorn (1,2,3)

Received: 25 July 2015 / Accepted: 28 September 2017 / Published online: 12 October 2017
© Springer-Verlag GmbH Germany 2017

✉ Françoise Dellu-Hagedorn
francoise.dellu@u-bordeaux.fr

1 University of Bordeaux, INCIA, UMR 5287, 33000 Bordeaux, France
2 CNRS, INCIA, UMR 5287, 33000 Bordeaux, France
3 University of Bordeaux, INCIA, CNRS, UMR 5287, 146 rue Léo Saignat, BP 31, 33076 Bordeaux Cedex, France

Abstract Inflexible behavior is a hallmark of several decision-making-related disorders such as ADHD and addiction. As in humans, a subset of healthy rats makes poor decisions and prefers immediate larger rewards despite suffering large losses in a rat gambling task (RGT). They also display a combination of traits reminiscent of addiction, notably inflexible behavior and perseverative responses. The goal of the present work was twofold: (1) to elucidate if behavioral inflexibility of poor decision-makers could be related to a lower quality of goal-directed behavior (action–outcome associations); (2) to uncover the neural basis of inter-individual differences in goal-directed behavior. We specifically assessed inter-individual differences in decision-making in the RGT, flexibility in the RGT-reversed version and goal-directed behavior in a contingency degradation test, i.e., response adaptation when dissociating reward delivery from the animal’s action. The contributions of the medial prefrontal cortex and the dorsal striatum to action–outcome associations were assessed using Zif268 immunodetection. Inflexible behavior was related to a lower sensitivity to contingency degradation in all poor decision-makers and only in a few good decision-makers. This poorer sensitivity was associated with a lower immunoreactivity in prelimbic and infralimbic cortices and a higher one in the dorsomedial and dorsolateral striatum. These findings suggest that an imbalanced prefronto-striatal activity could underlie inaccurate goal representation in changing environments and may promote maladaptive habit formation among poor decision-makers. These data strengthen our previous work identifying biomarkers of vulnerability to develop psychiatric disorders and demonstrate the relevance of inter-individual differences to model maladaptive behaviors.

Keywords Decision-making · Rat gambling task · Goal-directed behavior · Inflexible behavior · Prefrontal cortex · Striatum

Introduction

Behavioral flexibility in a changing environment is fundamental in everyday life. Executive functions that allow initiating, achieving and updating goal-directed actions (Ernst et al. 2004; Fellows 2004; Lee et al. 2012) play a pivotal role in flexible behaviors (Griffiths et al. 2014). This role is exemplified by the deleterious consequences of inflexible behavior as observed in several mental disorders related to poor executive control (addiction, attention deficit-hyperactivity disorder, or obsessive–compulsive disorder) (Royall et al. 2002). These mental disorders are typically characterized by poor decision-making, a deficit that can be revealed in the Iowa gambling task (IGT) that simulates complex and conflictual situations occurring in real-life (Bechara et al. 1994). Importantly, poor decision-making is also observed in a subset of healthy individuals, for whom immediate small gratifications prevail over long-term larger gain (Bechara and Damasio 2002; Bechara et al. 2002).

Similarly, we identified in rodents a minority of poor decision-makers in the rat gambling task (RGT) (Rivalan


et al. 2009a), a task analogous to the human IGT, demonstrating that rat behavior can reliably model dimensions found in humans. The identification of extreme or maladapted behaviors expressed spontaneously in rats and comparable to those observed in the clinic has proven to be useful to study the complexity of clinically relevant phenotype and to discover endophenotypes, as no particular hypothesis is established about their origins (Bardo et al. 1996; Belin and Deroche-Gamonet 2012; Blondeau and Dellu-Hagedorn 2007; Deroche-Gamonet et al. 2004; Kabbaj and Akil 2001; Koolhaas et al. 2007; LaPorte et al. 2010; Matzel and Kolata 2010; Rivalan et al. 2009b, 2013; Robbins et al. 2012; Robinson et al. 2014). A thorough appraisal of the poor decision-makers in our animal model (i.e., behavioral, neurobiological phenotypes and their etiology), could shed light on the pathogenesis of these mental disorders and identify potential risk factors as they may share common neuropsychological characteristics with patients that could represent potential risk factors (Rivalan et al. 2009b). In that line, we recently showed that poor decision-making in rats can be accurately predicted from a combination of behavioral and cognitive traits reminiscent of psychiatric symptoms of decision-making-related psychiatric disorders, namely risk-taking, reward-seeking, motor impulsivity and behavioral inflexibility (Rivalan et al. 2013). Behavioral inflexibility of poor decision-makers was particularly noticeable in the RGT reversal procedure, which requires redirecting choice on the basis of new response-reward contingencies. Their inflexible behavior was also associated with perseverative and compulsive-like behaviors in different experimental situations (Rivalan et al. 2013), suggesting an impairment in goal-directed behavior for these individuals.

A plausible hypothesis is that this maladaptive behavior could be related at least in part to an inefficient encoding and updating of action–outcome (A–O) associations that governs goal-directed behaviors. The quality of an A–O association can be specifically assessed by degrading the instrumental contingency, i.e., by dissociating reward delivery from the animal’s action (i.e., lever press) (Balleine and Dickinson 1998). If lever pressing is under the control of the expected outcome (A–O), then that behavior should progressively cease as the animal learns that lever presses are no longer necessary to obtain rewards. Several studies have demonstrated that the sensitivity of instrumental responses to contingency degradation requires the integrity of a wide neuronal circuit which includes the dorsal striatum (Yin et al. 2005, 2006), the mediodorsal thalamus (Bradfield et al. 2013; Corbit et al. 2003), the basolateral nucleus of the amygdala (Balleine et al. 2003), and the medial prefrontal cortex (mPFC) (Corbit and Balleine 2003; Coutureau et al. 2012; Naneix et al. 2009). At present, however, it is not clear whether inter-individual differences in the quality of the encoding of A–O associations are related to differences in these network activities.

The aim of the present work was twofold. First, to relate inter-individual differences in decision-making to basic cognitive processes, we assessed, within the same sample of individuals, decision-making performances in the RGT, behavioral flexibility in the RGT reversal procedure and sensitivity to contingency degradation in instrumental conditioning. Second, we aimed to uncover the neural basis of inter-individual differences in the quality of A–O associations through the expression of the immediate early gene Zif268 within prefronto-striatal networks. Zif268 was chosen here since it is a transcription factor well characterized for its role in synaptic plasticity and memory processing (Bozon et al. 2003) and has been shown to be crucial in instrumental learning (Maroteaux et al. 2014).

Materials and methods

Animals

Thirty male Wistar Han rats (Charles River, Lyon, France) aged from 13 to 15 weeks were used. They were housed in groups of four in a temperature-controlled room (22 °C) on an inverted 12-h light/dark cycle (light on at 8:00 PM). Tests were conducted during the dark phase of the cycle. The rats had free access to water and were moderately food deprived (95% of free feeding weight) throughout the experiments. All procedures were conducted in strict accordance with the 2010-63-EU and with approval of the Bordeaux University Animal Care and Use Committee (Permit number: 5012087-A).

Behavioral apparatus and procedures

Experimental design and principle of the RGT and RGT-reversed version are given in Fig. 1a.

Four rats were exposed to the same behavioral tests as the others except for the contingency degradation test during which they served as control rats (CONT = non-degraded condition). To compare the scores of exactly the same rats through all the experiments, scores of these control rats were excluded from the data analysis of the behavioral tests.

Decision-making in the rat gambling task

Principle The RGT requires the rat to deduce, by trial and error, among four options, the two that are the more rewarding on the long-term and tracks the continuous and dynamic process of deduction and readjustment of choice (de Visser et al. 2011; Rivalan et al. 2009a). The principle


[Figure 1. Panel a: schematic of the RGT and RGT-reversed version (four nose-poke holes A–D for choices, food dispenser; time-outs: A long (1/2), B very long (1/4), C short (1/4), D very short (1/2)) and timeline: operant training (5 d), RGT (1 d), 5 d, RGT-rev (1 d). Panel b: contingency degradation paradigm: operant training (FR1, RR5, RR10), CD test 1, 1 month, RR10 retraining, CD test 2, + 60 min; groups INFLEX-I, FLEX-S and CONT.]

Fig. 1 Experimental designs and principles of the behavioral tasks: a the RGT and RGT-reversed version and b the contingency degradation task. In the RGT, choices A and B allowed delivery of two pellets immediately, but were followed by long and unpredictable penalties (time-out). By contrast, C and D choices allowed the delivery of only one pellet immediately but were followed by shorter penalties. The time-out could occur according to a low probability (1/4) for B and C choices or a high probability (1/2) for A and D. Thus, A and B choices were equally disadvantageous whereas C and D were equally advantageous. Advantageous choices could provide up to fivefold more pellets than disadvantageous choices. In the RGT-reversed version (RGT-rev), A–B and C–D associated outcomes were spatially reversed. b In the contingency degradation paradigm, rats were first trained under a fixed-ratio (FR1) and random ratio (RR5 and 10) schedules of reinforcement. Contingency degradation (CD) test consisted in delivering pellets independently of lever presses. Rats were killed 60 min after the end of the second test for Zif immunoreactivity quantification and comparisons between rats sensitive to contingency degradation (FLEX-S) and those less sensitive (INFLEX-I). They were compared to control rats (CONT) that only had 2 RR10 sessions, a week apart
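
The payoff structure described in this caption can be checked with a little arithmetic. The Python sketch below combines the time-out probabilities given here with the time-out durations reported in the Test paragraph of the methods (222 and 444 s for A and B, 12 and 6 s for C and D). The names `OPTIONS`, `expected_penalty` and `pellet_rate`, and the per-trial handling time `trial_s`, are illustrative assumptions and are not taken from the study.

```python
# Option -> (pellets per choice, time-out duration in s, time-out probability).
# Durations and probabilities are those reported in the methods.
OPTIONS = {
    "A": (2, 222, 0.5),
    "B": (2, 444, 0.25),
    "C": (1, 12, 0.25),
    "D": (1, 6, 0.5),
}


def expected_penalty(option):
    """Mean time-out (s) incurred per choice of the given option."""
    _, duration, prob = OPTIONS[option]
    return duration * prob


def pellet_rate(option, trial_s):
    """Pellets per second if only this option is chosen.

    trial_s is a hypothetical time needed to choose and collect food on
    each trial; the paper does not report it.
    """
    pellets, _, _ = OPTIONS[option]
    return pellets / (trial_s + expected_penalty(option))


if __name__ == "__main__":
    for name in OPTIONS:
        # Expected time-out per choice and pellets per hour at an assumed 9 s per trial.
        print(name, expected_penalty(name), round(3600 * pellet_rate(name, trial_s=9.0)))
```

With these numbers, A and B share the same expected time-out per choice (111 s) and C and D share 3 s, in line with the caption's "equally disadvantageous" and "equally advantageous". The size of the advantage depends on the assumed trial time: for instance, 9 s per trial gives a fivefold ratio.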

of the task tightly mimics that of the human IGT: two options (holes chosen by nose-poking) that steadily offer bigger immediate food reward, are disadvantageous in the long run due to higher unpredictable penalties (frustrating time-outs occurring after food delivery, during which no reward can be obtained). Conversely, the two advantageous options steadily offer smaller reward, but unpredictable penalties that can follow are shorter. A choice also results in the deactivation of all stimulus-lights except for the chosen hole, until the reward is collected or the penalty, if it occurs, ends. This is particularly important during the time-out periods to facilitate the association between hole-response and its consequences. Rats are free to make a new choice whenever the reward has been collected or whenever the penalty finishes (no inter-trial interval).

Apparatus The experiments were performed in twelve polyvalent conditioning boxes (Imetronic, Pessac, France; 28 × 30 × 34 cm). Boxes were equipped with four nose-poke holes, dimly illuminated within the hole with a white LED. These holes were located on a curved wall on one side of the box, equidistant to a food magazine situated on the opposite wall. Each hole was equipped with an infrared


detector connected to an external dispenser delivering food pellets (45 mg, formula P, Sandow scientific, USA). Data collection was made using a control software (Imetronic, Pessac, France) running on a computer outside the testing room. At least 30 min before each session, the rats were placed in the light-attenuated and temperature-controlled (23 °C) experimental room.

Training Procedures were previously described (Rivalan et al. 2009a, 2013). The training phase consisted in learning to associate two consecutive nose-pokes in one of the four illuminated holes with the delivery of one or two food pellets in the magazine. For this purpose, rats had first to associate a single nose-poke in any of the four illuminated holes with the delivery of one food pellet in the magazine. After a nose poke, only the selected hole remained illuminated, but all were inactivated until the rat collected the food reward. After food collection, all the holes were illuminated and activated again. This procedure continued daily until rats obtained 100 pellets within a session (30 min cut-off). Then two consecutive nose pokes in the same hole were required to obtain food, to ensure that the selection of the hole was a voluntary choice. After reaching the same criterion, rats were submitted to two final 15 min training sessions. In the first session, two pellets were delivered after a choice was made (maximum 50 pellets). Then, within a second session, one pellet was delivered at a time (maximum 50 pellets). These sessions habituated the rats to the quantity of pellets that could be obtained during the test. The training phase usually lasted 5–7 days and tests were performed the following day.

Test Rats could freely choose between four nose-poke holes (A–B–C–D) during a 1-h test session (or max. 250 pellets obtained). Choices C and D vs A and B led to the immediate delivery of one vs two pellets, but A and B choices could be followed by longer, unpredictable penalties (222 and 444 s time-outs) compared to C and D choices (12 and 6 s). Penalties occurred at a low probability (1/4) for B and C choices, and at a high probability (1/2) for choices A and D. A time-out could not occur before the third or fourth choice, to favor sampling of the different conditions at the beginning of the test. During the penalty, all lights were switched off and nose-poke holes were disabled, but the chosen hole remained illuminated to facilitate association between each choice and its consequences. A brief extinction of this light (1 s) signaled the end of the time-out. The theoretical maximum gain was the same for advantageous C and D choices, and five times higher than for disadvantageous A and B choices. We used the same criteria as in our previous works with this task, to select animals according to their performance (Fitoussi et al. 2015; Rivalan et al. 2009a, 2013). Good (GD) and poor (PD) decision-makers were differentiated on the basis of the percentage of advantageous choices (> 70% and < 30%, respectively) during the last 20 min of test. The remaining rats with intermediate scores were classified as undecided (between 30 and 70% of advantageous choices). These rats (n = 4) were not further studied because they are too few in this experiment and overall only represent 16% of the population as shown in a previously published meta-analysis (Rivalan et al. 2013). Only rats that sampled both advantageous and disadvantageous options during the first 20 min of the test were considered. The mean latency to collect food pellets was recorded as a motivational food reward index.

Behavioral flexibility in the rat gambling task-reversed version

In a second stage, 5 days later, A–B and C–D associated outcomes were spatially reversed to assess behavioral flexibility (Rivalan et al. 2013). The test again consisted in a single 60 min test session. Distinct behaviors can be observed during reversal, as previously shown (Rivalan et al. 2013), that are very distinct from behaviors in the RGT. For example, a rat can gradually reverse its choices towards the new location of its favorite options (flexible rat) or a rat can remain during all the session on the options that it preferred during test (inflexible rat). An index of flexibility was calculated as the mean percentage for the options preferred during the RGT, whether advantageous or disadvantageous. Behaviors were differentiated on the basis of the time-course of choices and flexibility and were classified into two categories: flexible with gradual reversion towards the new location of their favorite options (> 60% of reversed chosen options during the last 20 min) and inflexible with perseveration of previously learned choices (< 40% of choices). The use of different criteria to distinguish between flexible and inflexible behaviors (see Rivalan et al. 2013), is justified by the nature of the task which is radically different (learning vs reversal). Rats with flexible behavior cannot reach a high level of preference as fast as in the RGT, because of the difficulty of reversing choices.

Goal-directed behavior during contingency degradation

Experimental design and principle of the contingency degradation procedures are given in Fig. 1b. Two contingency degradation experiments were made with the same rats, a month apart. The first was used to describe inter-individual differences in performance, notably the sessions necessary to reveal the largest variations in behavior. The second was a control of the reliability of the measure and was used for Zif268 immunodetection. The best condition for this assessment (third session) was chosen according to the previous experiment and to the performances obtained


during this test. The control group (CONT) was made by pooling two undecided and two good decision-makers and was tested under un-degraded condition.

Apparatus Animals were trained in eight identical conditioning chambers (32 cm wide × 32 cm deep × 22 cm high, Imetronic), each located inside a sound and light-attenuating chamber. Boxes were constructed from white plastic panels with a Plexiglas door and were equipped with a fan providing a background sound. Each box was permanently illuminated by a diffuse 2 lux light source located at the middle of the ceiling. The floor consisted of stainless steel bars 5 mm in diameter spaced 1.5 cm apart. One stainless steel lever protruded horizontally 1 cm from the wall situated at the left of the door, 6 cm above the grid floor. A tray was situated centrally on the opposite wall. Food pellets (45 mg, Formula P, Sandow scientific, USA) were delivered in the magazine from a food dispenser. The magazine was equipped with infrared cells to detect the animal’s visits. A program (Imetronic, Pessac, France) controlled the chambers and collected the data on a microcomputer (outside the testing room).

Instrumental training Rats were trained to press a lever to obtain a reward during daily training sessions. Different reinforcement schedules were used. The rats first received training under a continuous reinforcement, fixed ratio (FR) 1 schedule, for 2 days (each lever press was rewarded). Animals were then shifted to a random ratio (RR) 5 schedule for 3 days (each response was rewarded with a probability of 0.2, i.e., one reward per five lever presses on average) and then to a random ratio 10 schedule for 4 days (RR-10, each response was rewarded with a probability of 0.1, i.e., one reward per 10 lever presses on average). Each session ended when rats had earned 100 pellets or when 30 min had elapsed.

Contingency degradation The test consisted in six daily consecutive sessions. Animals received response-independent pellet delivery on a random time schedule so that the instrumental contingency was degraded: pellet deliveries occurred independently of lever presses. Response-independent reward deliveries were yoked to reward deliveries earned during the last RR-10 session for each rat (see Table 1), which was used as a reference for the individual delivery program, independently of animal lever presses, during the testing days. This procedure reduces the influence of variation in levels of activity and results in a similar frequency of reward delivery during the transition between training and test. For the CONT (undecided and good decision makers), the A–O contingency was unchanged. Each session ended when rats had received 100 pellets or when 30 min had elapsed. Behavioral data were displayed as the percentage of lever presses during test compared to the last training session. An index of general activity was measured by the sum of lever presses and visits to the magazine per min.

Individual sensitivity to contingency degradation

Training One month later, rats were trained again under the RR-10 training schedule during six consecutive sessions for the experimental group, and only one session for the control group (CONT) so that lever response rate could be comparable between groups during testing days.

Test Rats were given a second contingency degradation test, similar to the one previously used. Animals were tested during three consecutive days, except the CONT group which was run on the corresponding first and last testing days. Given the results of the first experiment, we expected that the third session would be appropriate to ensure large inter-individual differences. After the last session, animals were moved to a quiet room and killed 60 min later.

Zif immunohistochemistry and cell counting

Because this study is based on inter-individual differences, we had to reduce, whenever possible, any possible source of variation in Zif268 quantification. After the last behavioral testing (30 min), rats had to be in quiet conditions (1 h apart in a quiet room) before being killed for Zif268 analyses. We also had to organize the experiment in a way to minimize variability related to time-schedule (morning sessions) and to housing (4 animals per cage): four rats were tested at a time (1 home cage), then, placed in their home cage in a quiet room for 1 h and then, we proceeded to four perfusions at the same time. The following groups were tested later, with a delay allowing the best technical conditions. For these reasons, only 12 rats per day for 2 days were used for Zif268 quantification (n = 24).

Rats were deeply anesthetized with sodium pentobarbital (Céva Santé Animale, France) and perfused through the heart with 200 ml of NaCl 0.9%, followed by 250 ml of paraformaldehyde (PFA) 4% in 0.1 M phosphate buffer (PB). Brains were dissected out, post-fixed 24 h in PFA 4% and transferred into a PB 0.1 M/sucrose 30% solution for 48 h at 4 °C. Brains were frozen and stored at -80 °C until use. Serial coronal sections (50 μm) were cut on a freezing microtome (Leica SM 2400), collected, and stored in a cryoprotectant solution.

After the initial rinsing (3 × 10 min in PBS), all the steps of the immunohistochemistry procedure were separated by 3 × 10 min rinsing step with PBS (0.1 M pH 7.4) under agitation. A rabbit affinity purified polyclonal anti-Zif268 antibody (Sc-189, Santa-Cruz Biotechnology Inc, CA, USA) was used as a primary antibody (1:3000 in 0.3%


Table 1 Parameters of the random time schedule during contingency degradation test. Random time schedules (RT) in seconds and mean RT are calculated for each rat, to be yoked to reward deliveries during the last random-ratio 10 session

RGT   FLEX     Rat   Random time schedule (RT) (s)   Mean RT (s)
PD    INFLEX   3     10 < 12 < 15                    12
PD    INFLEX   9     24 < 30 < 35                    30
PD    INFLEX   17    12 < 15 < 18                    15
PD    INFLEX   20    15 < 19 < 23                    19
PD    INFLEX   25    12 < 15 < 18                    15
PD    INFLEX   26    12 < 15 < 18                    15
PD    INFLEX   30    12 < 15 < 18                    15
GD    INFLEX   11    12 < 15 < 18                    15
GD    INFLEX   19    12 < 15 < 18                    15
GD    INFLEX   21    24 < 30 < 35                    30
GD    INFLEX   23    10 < 12 < 15                    12
GD    INFLEX   28    12 < 15 < 18                    15
GD    FLEX     1     12 < 15 < 18                    15
GD    FLEX     5     12 < 15 < 18                    15
GD    FLEX     13    12 < 15 < 18                    15
GD    FLEX     14    15 < 19 < 23                    19
GD    FLEX     16    15 < 19 < 23                    19
GD    FLEX     18    12 < 15 < 18                    15
GD    FLEX     22    24 < 30 < 35                    30
GD    FLEX     24    12 < 15 < 18                    15
GD    FLEX     4     15 < 19 < 23                    19
GD    FLEX     6     15 < 19 < 23                    19
GD    FLEX     8     10 < 12 < 15                    12
GD    FLEX     12    10 < 12 < 15                    12
GD    FLEX     15    12 < 15 < 18                    15

Random time schedules (RT) in seconds and mean RT calculated for each rat, for contingency degradation test
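
To make the random time schedule in Table 1 concrete, the Python sketch below generates response-independent pellet delivery times for one rat. The table lists three values per rat (for example 10 < 12 < 15 s) but the source does not describe how individual intervals were drawn, so picking each interval at random from the three listed values is an assumption made here for illustration. The function name `random_time_schedule` and its parameters are likewise hypothetical; the 30 min and 100 pellet limits are the session-ending criteria given in the methods.

```python
import random


def random_time_schedule(rt_values, session_s=1800, max_pellets=100, seed=None):
    """Return pellet delivery times (s from session start) for one session.

    rt_values: the three intervals listed for a rat in Table 1, e.g. (12, 15, 18).
    Assumption: each inter-delivery interval is drawn uniformly from rt_values.
    Deliveries never depend on lever presses, which is what degrades the
    action-outcome contingency.
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while len(times) < max_pellets:
        t += rng.choice(rt_values)
        if t > session_s:
            break  # session ends after 30 min
        times.append(t)
    return times


if __name__ == "__main__":
    schedule = random_time_schedule((12, 15, 18), seed=1)
    print(len(schedule), schedule[:5])
```

Under this assumption, a rat on the 12 < 15 < 18 s schedule always receives 100 pellets within 30 min, whereas a rat on the 24 < 30 < 35 s schedule reaches the time limit first.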

PBS-triton/3% normal donkey serum). Free-floating sections were incubated with the primary anti-Zif268 antibody overnight at room temperature (20 °C) under agitation. After rinsing, sections were then incubated with a biotinylated Donkey anti-rabbit secondary antibody (1:2000 in PBS-triton 0.3%, GE Healthcare, UK) for 2 h at room temperature, followed by the avidin–biotin–peroxidase complex (1:800 in PBS, Vector Laboratories) for 2 h at room temperature. The final staining was obtained following incubation with diaminobenzidine (DAB, Sigma-Aldrich 0.2 mg/ml)-Nickel (0.04 mg/ml) in Tris-buffered saline (pH 7.6) containing 0.003% H2O2 (Sigma). Sections were finally rinsed three times with 0.1 M Tris buffer, disposed onto gelatin-coated slides, and mounted in Eukitt after 24 h of air-drying.

For quantification, neurons with nuclei exhibiting Zif268 immunolabeling were counted bilaterally in sections taken through prefrontal and striatal areas according to the rat atlas of Paxinos and Watson (1997). Details of sub-regions and anatomical coordinates (indicated from Bregma) were chosen as follows: the prelimbic cortex (PL; + 3.20 mm), the infralimbic cortex (IL; + 3.20 mm), the anterior cingulate cortex (CgA; + 1.60 mm), the anterior and posterior parts of the dorsomedial striatum (DMSa and DMSp, respectively; + 1.60/0.20 mm) and the anterior and posterior part of the dorsolateral striatum (DLSa and DLSp; + 1.60/0.20 mm). Anterior and posterior striatal areas were distinguished since functional differences between these subdivisions were previously described (Yin et al. 2005). For each area, counting was performed within the entire region in three consecutive sections per rat by an experimenter blind to the experimental groups. Zif268-positive neurons were quantified using the Mercator software (Explora Nova, La Rochelle, France). Results were expressed as number of Zif268-positive nuclei per mm² ± SEM.

General data analysis

In both the RGT and the RGT-reversed version, behavioral scores, i.e., percentage of advantageous choices (mean ± SEM) were compared to chance (50%) using a


two-tailed one-sample t test for subgroups (Good and Poor decision-makers). Comparisons of proportions of individuals with analogous preference during training and RGT were conducted using the non-parametric Fisher’s exact test (StatXact 9). Two-way ANOVAs were used to compare groups [repeated measures (RM) for time or structure and group factors] followed by simple main effects (SME) or post hoc Newman–Keuls (NK) tests, when required (Statistica, Statsoft 7.1). Controls of normality distribution and homogeneity of variance were made. Comparisons of the number of Zif268-positive nuclei between groups were made for subcortical and cortical areas separately. Correlations between behavioral scores and region-specific immunoreactivity levels were made using the parametric Bravais-Pearson’s correlation test (Statistica, Statsoft 7.1).

Results

All poor decision-makers and only some good decision-makers demonstrate behavioral inflexibility

Inter-individual differences in decision-making during the RGT

The RGT measures, across successive trials, the ability to make the most advantageous choices. In this task, choices associated with a higher immediate gain are disadvantageous in the long run due to higher unpredictable penalties. As previously shown, we found distinct patterns of choice preference within a single RGT session (Fig. 2a). A random selection of the options was necessary to avoid a preference bias in choices. All rats initially selected both kinds of options randomly before developing a preference except one that did not choose all the options and was discarded. Rats (n = 25) clustered into two main categories depending on their final preference (Fig. 2b). A majority of good decision-makers (n = 18, 64%) that displayed a strong preference for advantageous options (> 70% preference) and a minority of poor decision-makers (n = 7, 25%) that chose disadvantageous options (> 70% preference).

We first verified if choices during the RGT were unrelated to a side preference eventually developed during training. Thus, for each individual, we compared the options preferred during the last day of training and those preferred during test. A preference was noticed when the options were chosen more than 70% of the total number of choices (spatially nearby A and B: AB or spatially nearby C and D: CD). No preference was noted when options AB and CD were chosen less than 70% (ABCD). Given that three kinds of preference could be made (preference for options CD, for options AB or no preference ABCD), chance level to choose the same options is 1/3. Nine rats out of 25 had similar preference (36%), a proportion that does not differ from 33% (Fisher exact test, p = 1; ns). Thus, choices made during the test were not influenced by those eventually developed during training. Good and poor decision-makers also differed on their within-session pattern of choices. Whereas good decision-makers tended (on average) to develop progressively their preference toward the best options, all poor decision-makers quickly displayed a preference for disadvantageous options, probably because of a preference for the higher amount of immediate reward and lack of penalty during initial sampling. Consequently, pellet consumption (Fig. 2c) differed greatly between groups (F1,23 = 15.23, p < 0.001) and was much lower for poor decision-makers over time (interaction group × sessions, F5,115 = 17.53, p < 0.001). As previously shown, poor decision makers also displayed a shorter latency to collect food, that we consider as a higher motivation for the reward (t = 2.4, df = 22, p < 0.05) (Fig. 2d).

Behavioral flexibility during the RGT-reversed version

Reversing contingencies in the RGT measures subjects’ behavioral flexibility in shifting their behavior when advantageous and disadvantageous choices were spatially reversed. Persistence to choose the same location reveals behavioral inflexibility (INFLEX), whereas changes in choices reflect detection of the change and behavioral flexibility. Detection of the change was demonstrated by undecided behavior and then reversed choices (FLEX). 100% of poor (n = 7) vs only 28% (n = 5) of good decision makers were INFLEX and thus, chose the same hole location (i.e., inflexible behavior) (Fig. 2e, f). Most good decision-makers (72%, n = 13) were FLEX. In a previous study (Rivalan et al. 2013), we obtained very similar results that corroborate the present findings (all poor decision-makers (100%; 6/6) vs only a third (36%; 5/14) of good decision-makers were inflexible).

Behavioral inflexibility is associated with a lower sensitivity to the contingency degradation procedure

Operant training

The four groups of animals (INFLEX poor and good decision-makers, FLEX good decision-makers and CONT) acquired the instrumental response (operant training phase) at the same rate (Fig. 3a). A mixed ANOVA with session and group as factors showed an effect of session (F9,243 = 39.23; p < 0.001), but no effect of group (F3,27 = 0.65; ns) nor any significant group × session interaction (F27,243 = 0.46; ns), indicating that all groups


Fig. 2 Inter-individual differences during the RGT and RGT-reversed version. a Time-course of advantageous choices (%) during the rat gambling task of good decision-makers (GD; n = 18) and poor decision-makers (PD; n = 7); b individual RGT score during the last 20 min for GD and PD; c cumulated number of pellets earned during the task; d mean latency to collect food during the RGT; e time-course of advantageous choices during the RGT-reversed version of PD and GD. Inflexible rats displayed less than 40% of reversed option preferences (INFLEX PD or GD), flexible rats that reversed their choices (FLEX) displayed more than 60% during the last 20 min. f Proportions of GD and PD according to their performance in the RGT-reversed version. Dotted line represents chance level (50%). Comparisons with chance level, Student’s t test (a, e); comparisons between groups (c, d) NK post hoc analysis and Student’s t test, respectively, #p < 0.05, *p < 0.01
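
As a rough illustration of the classification criteria stated in the text and in the caption above (more than 70% preference for advantageous or for disadvantageous options in the RGT; less than 40% or more than 60% of reversed choices in the RGT-reversed version), a minimal Python sketch could look as follows. The function names, the "undecided" and "intermediate" labels and the example values are hypothetical and are not taken from the study.

```python
def classify_decision_maker(pct_advantageous: float) -> str:
    """Label a rat from its percentage of advantageous choices in the RGT.

    Assumption: a > 70% preference for disadvantageous options is treated
    as < 30% advantageous choices.
    """
    if pct_advantageous > 70.0:
        return "GD"          # good decision-maker
    if pct_advantageous < 30.0:
        return "PD"          # poor decision-maker
    return "undecided"       # hypothetical label for rats in between


def classify_flexibility(pct_reversed: float) -> str:
    """Label a rat from its percentage of reversed choices (RGT-reversed)."""
    if pct_reversed > 60.0:
        return "FLEX"
    if pct_reversed < 40.0:
        return "INFLEX"
    return "intermediate"    # hypothetical label, not used in the paper


if __name__ == "__main__":
    # Hypothetical scores for three rats: (% advantageous, % reversed)
    for rgt, rev in [(85.0, 75.0), (90.0, 20.0), (10.0, 15.0)]:
        print(classify_decision_maker(rgt), classify_flexibility(rev))
```

The thresholds are kept as strict inequalities to match the wording "more than" and "less than" used in the text.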

similarly acquired the instrumental response. Mean response rates of INFLEX poor and good decision-makers, FLEX good decision-makers and CONT at the completion of training were 38.1, 38.4, 39.7, and 40.1, respectively.

Contingency degradation

Figure 3b shows the relative rate of responses (i.e., frequency of lever presses during the testing sessions as compared to the last training session) for all the groups, with marked differences. No significant difference between groups was observed in the mean RT applied [INFLEX PD: 17.3 ± 2.25; INFLEX GD: 17.4 ± 3.2; FLEX GD: 16.9 ± 1.29; F(2,22) = 0.017, ns]. The FLEX group displayed an important decrease in lever pressing, indicating that these animals correctly adapted their operant responses to the changes in action–outcome contingency. Interestingly, the two INFLEX groups had a much slower decrease in lever pressing (mean of 51.0 and 48.9, respectively on the 6th day of training, compared to 23.3 for FLEX), indicating that these groups had an impairment in adapting their behavioral responses to action–outcome contingency changes. The ANOVA with Session and degraded Groups as factors confirms the dissociation between INFLEX and FLEX. It reveals a significant effect of session (F(5,110) = 41.24, p < 0.001) and group (F(2,22) = 11.52, p < 0.001) and a significant interaction between session and group (F(10,110) = 2.72, p < 0.01). Specifically, the INFLEX groups did not differ from each other (SME; F(1,22) = 0.36, ns). These INFLEX groups significantly differed from FLEX group during testing days (INFLEX poor decision makers vs FLEX; SME; F(1,22) = 14.09, p < 0.001; INFLEX good decision makers vs FLEX; F(1,22) = 16.11, p < 0.001). On the other hand, the level of presses remained high and stable during the whole testing for rats run in the non-degraded schedule [Comparisons


Fig. 3 Individual-based contingency degradation test. a Rate of lever presses per min during training sessions for good decision-makers (GD) with flexible reversed choices (FLEX) and with inflexible choices (INFLEX GD and INFLEX PD); b relative responses rate (ratio of lever press frequencies during the testing sessions and during the last training session) during the individual-based contingency degradation test; and c frequency of visits to the empty magazine per min. Comparisons between FLEX and INFLEX groups, NK post hoc analysis, **p < 0.01; *p < 0.05; #p < 0.05

between sessions: Newman–Keuls (NK), ns]. Finally, flexibility in the RGT-reversed version was inversely correlated with the sensitivity to the degradation contingency procedure (index of flexibility vs relative response rate during the fourth testing day: r = -0.59; n = 24; p < 0.01).
Figure 3c shows the frequency of visits to the empty food magazine. The more the animals learn that food pellets are delivered independently of their action, the more frequently they will check the food magazine. An ANOVA comparing the rate of visits to the magazine between groups confirms these behavioral differences (F(3,27) = 4.05, p < 0.02) and reveals differences between groups within the testing session (ANOVA group × sessions, F(15,135) = 2.06, p < 0.02). FLEX group significantly increased its visits to empty magazine per min across sessions (F(1,5) = 8.74; p < 0.001). This increase was significant compared to the first session as early as the 3rd session (NK, p < 0.05).

Sensitivity to contingency degradation and prefronto-striatal circuits

Here, we were interested in identifying the neural correlates of inter-individual differences in the sensitivity to the contingency degradation procedure by the measure of Zif268 expression. For this purpose, rats were re-tested in the contingency degradation procedure. Twenty-four rats were used for Zif268 immunodetection during the contingency degradation test: 10 FLEX rats, 10 INFLEX rats and 4 CONT. Zif268 expression analyses of four rats could not be exploited for technical reasons (absence of correct labeling for reliable quantification). To study the neurofunctional bases of inter-individual differences in sensitivity to contingency degradation, rats were split into two groups depending on their sensitivity to the procedure during the 3rd day of testing, when brains were removed for Zif268 immunochemistry processing. The criterion chosen was the median score of relative response rate (44%). Thus, one group (n = 8) included rats sensitive (S) to contingency degradation with a score lower than the median score. Because all these rats were also flexible in the RGT-reversed task, they were named the FLEX-S group. The other group (n = 8) included rats with a lower sensitivity (insensitive, I) to contingency degradation, with a score higher than the median. Most of these rats were also inflexible in the RGT-reversed task and were named the INFLEX-I group. The CONT group (n = 4) was used for the non-degraded condition during testing sessions 1 and 3.

Operant re-training

The previous contingency degradation procedure facilitated training since both groups reached much sooner (third session) the same level of lever presses as at the end of the first training phase. A mixed ANOVA with session and group as factors showed an effect of session (F(4,56) = 11.05, p < 0.001), but no effect of group (F(1,14) = 0.15, ns) nor any significant group × session interaction (F(4,56) = 0.71, ns), indicating that both groups reacquired similarly the instrumental response during the operant training.
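
The relative response rate and the median split described above (score below the median: sensitive; score above the median: insensitive) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the rat identifiers, the lever-press rates and the rule that assigns a score exactly equal to the median to the insensitive group are all hypothetical and do not come from the study.

```python
from statistics import median
from typing import Dict, Tuple


def relative_response_rate(test_rate: float, training_rate: float) -> float:
    """Lever-press rate during a test session as a % of the last training session."""
    if training_rate <= 0:
        raise ValueError("training_rate must be positive")
    return 100.0 * test_rate / training_rate


def median_split(scores: Dict[str, float]) -> Tuple[float, Dict[str, str]]:
    """Split rats into sensitive (S) and insensitive (I) around the median score.

    Assumption: a score exactly equal to the median is labeled "I".
    """
    cutoff = median(scores.values())
    groups = {rat: ("S" if score < cutoff else "I") for rat, score in scores.items()}
    return cutoff, groups


if __name__ == "__main__":
    # Hypothetical presses per min: (test session, last training session)
    rates = {"rat1": (8.0, 40.0), "rat2": (12.0, 40.0),
             "rat3": (28.0, 40.0), "rat4": (36.0, 40.0)}
    scores = {rat: relative_response_rate(t, tr) for rat, (t, tr) in rates.items()}
    cutoff, groups = median_split(scores)
    print(cutoff, groups)
```

A lower relative response rate means a stronger drop in lever pressing under degradation, which is why the group below the median is the sensitive one.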


Contingency degradation before Zif268 analysis

Again, large inter-individual differences in sensitivity to the contingency degradation procedure were observed during the test sessions, notably during the third session where FLEX-S and INFLEX-I displayed very different performances. Figure 4a shows the effect of the contingency degradation procedure on instrumental responses for FLEX-S and INFLEX-I during the three testing days. As expected, a marked difference distinguished FLEX-S and INFLEX-I for lever pressing during all testing days [a mixed ANOVA with days and group as factors showed an effect of group (F(1,14) = 17.08, p < 0.001)]. Unlike the first round of experiment, we report here a significant difference between groups during the first day (NK, p < 0.02), which persisted during the following days (NK, p < 0.01). Whereas INFLEX-I did not decrease their response rate compared to training during the first testing day (93.6%; t = 0.87, df = 7, ns), the response rate of FLEX-S was decreased by almost half (53.5%; t = 3.63, df = 7, p < 0.01). Then, the decrease in response rate followed a similar time-course for the two groups (interaction group × session, F(2,28) = 0.68, ns). During the 3rd day of testing (testing day of interest for the analysis of Zif expression), the response rate of INFLEX-I was only decreased by 28%, whereas it was reduced by about 79% in FLEX-S. Consequently, the response rate of the control group in the non-degraded condition that remained stable compared to training (100.2%), was much higher than the FLEX-S under the degraded condition, but did not significantly differ from INFLEX-I (F(2,18) = 14.8, p < 0.001; CONT vs FLEX-S, NK, p < 0.001; CONT vs INFLEX-I p = 0.07). An ANOVA comparing the rate of magazine visits between the two groups during the three testing days similarly showed marked differences (F(1,14) = 13.68, p < 0.01) (Fig. 4b). During the 3rd day of testing, the number of visits to the empty magazine of CONT was similar to that of INFLEX-I (effect of CONT, FLEX-S, INFLEX-I groups, F(2,18) = 6.18, p < 0.01; NK, ns) but was lower than that of FLEX-S (NK, p < 0.02), indicating that these animals have learnt that food pellets are delivered independently of their action, since they check more frequently for food in the magazine.

Fig. 4 Individual-based contingency degradation test: second measure to assess neural markers of inter-individual differences. a Relative response rates of flexible rats, sensitive to contingency degradation (FLEX-S) and inflexible rats rather insensitive (INFLEX-I) and control (CONT) groups during the individual-based contingency degradation test; and b frequency of visits to the empty magazine in number per min of FLEX-S and INFLEX-I and CONT groups. For CONT group, values of the 1st session (training) and the 2nd session (test) are given. Comparisons between groups, NK post hoc analysis, **p < 0.01; *p < 0.05

Region-specific differences in Zif268 positive-nuclei within the prefronto-striatal circuits

Our data emphasize specific changes in Zif268 expression related to the sensitivity to the contingency degradation (last session) within the mPFC and dorsal striatal areas. We focused on these specific networks since pilot works did not show any difference in other brain regions. Specifically, animals which exhibited low Zif268 expression in the PFC expressed a high expression in striatal regions and reciprocally, suggesting that activities in these areas were inversely related. Figure 5a, b illustrates the pattern of Zif268 expression for FLEX-S and INFLEX-I groups. Among brain areas, we report differences in Zif268 expression in two areas of the mPFC: the infralimbic cortex (IL) and the prelimbic cortex (PL) as well as in the dorsal striatum. FLEX-S rats displayed a higher density of Zif268-positive nuclei than INFLEX-I in IL and PL (Two-way RM-ANOVA, F(1,14) = 6.30; p < 0.05 for group factor; post hoc Fisher’s LSD, PL and IL, p < 0.05) and a lower number in dorsal striatal areas, notably in DLSa, DLSp and DMSp (Two-way RM-ANOVA, F(1,14) = 6.64, p < 0.05 for group factor; post hoc Fisher’s LSD, p < 0.05). Despite a clear tendency, we did not report any significant difference in the cingulate cortex and DMSa (post hoc Fisher’s LSD, ns).
It is noteworthy that significant correlations between level of Zif268-positive neurons and behavioral sensitivity to contingency degradation were reported. Negative correlations were reported between the mPFC (both PL: r = -0.49, n = 16, p = 0.05 and IL: r = -0.55, n = 16, p < 0.05) and relative response rate during contingency degradation on the third session. Conversely, in the DLSa and DMSp, we observed a positive correlation between the density of positive-Zif268 nuclei and behavioral data (r = 0.64, n = 16, p < 0.01 and r = 0.55, n = 16, p < 0.05, respectively). Interestingly, these opposite trends were also illustrated by significant negative correlations


Fig. 5 Quantification of Zif268 immunoreactivity in a the medial prefrontal cortex and b the dorsal striatum in FLEX-S and INFLEX-I groups. Results are expressed as the mean ± SEM of the number of Zif268-positive neurons per mm². Statistical differences between groups, post hoc Fisher’s LSD analysis: *p < 0.02. Dotted lines represent the mean level of 4 control rats. Immunoreactivity correlations with sensitivity to contingency degradation during the third session are shown for the ratios between c the Prelimbic area and the Dorsolateral striatum anterior part, d the Prelimbic area and the Dorsolateral striatum posterior part; e the infralimbic area and the dorsolateral part of the striatum, anterior part and f the infralimbic area and the dorsomedian striatum posterior part. PL prelimbic cortex, IL infralimbic cortex, CgA anterior cingulate cortex, DMS dorsomedian striatum, DMSa anterior part, DMSp posterior part, DLS dorsolateral striatum, DLSa anterior part, DLSp posterior part

between mPFC and striatal Zif268 ratios and behavioral sensitivity to contingency degradation (see Fig. 5c–h), whereas no significant correlation was found between any of these ratios and the index of activity (index of activity vs ratios PL/DLSa and DLSp: r = -0.25 and -0.33; n = 20, ns; vs ratios IL/DLSa and DMSp: r = 0.42 and -0.29, n = 20, ns). A positive correlation was observed between the density of positive-Zif268 nuclei in DLSa and DMSa for FLEX-S rats (r = 0.72; df = 7, p < 0.05), whereas it was not significant for INFLEX-I rats (r = 0.23, df = 7, ns).
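
To make the ratio-based correlations reported above more concrete, the following Python sketch computes a prefrontal/striatal density ratio per animal and its Pearson correlation with the relative response rate. It is a minimal illustration, not the analysis pipeline of the study: the function names and all numerical values are hypothetical, and the Pearson coefficient is an assumption, since the type of correlation coefficient is not specified in this section.

```python
from math import sqrt
from typing import List, Sequence


def density_ratio(pfc: Sequence[float], striatum: Sequence[float]) -> List[float]:
    """Per-animal ratio of Zif268-positive nuclei density (e.g., PL / DLSa)."""
    if len(pfc) != len(striatum):
        raise ValueError("inputs must have the same length")
    return [p / s for p, s in zip(pfc, striatum)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical densities (nuclei per mm^2) and relative response rates (%)
    pl = [120.0, 100.0, 80.0, 60.0]
    dlsa = [40.0, 50.0, 80.0, 120.0]
    response_rate = [20.0, 35.0, 60.0, 90.0]
    r = pearson_r(density_ratio(pl, dlsa), response_rate)
    print(f"r = {r:.2f}")  # negative for these made-up values
```

With these made-up values, animals with a high prefrontal/striatal ratio have a low relative response rate, which is the direction of the negative correlations described in the text.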


Discussion

The current results suggest that inflexible behavior and poor decision-making in the RGT may be linked to a failure in updating the A–O association that mediates goal-directed behavior as measured during the contingency degradation paradigm. This disability appears associated with a specific imbalanced activity within the prefronto-striatal networks.

How behavioral inflexibility may influence decision-making

Like most of the currently available RGTs, our task integrates multiple cognitive abilities involved in executive functioning (de Visser et al. 2011; Fitoussi et al. 2015; Rivalan et al. 2013; van den Bos et al. 2014; Zeeb et al. 2009). Additionally, we show that this complex task reveals poor versus good decision-makers within a healthy population, these behaviors being highly reproducible and stable across time (Fitoussi et al. 2015; Rivalan et al. 2009a, 2011, 2013). We previously showed that poor decision-making in the RGT is not related to a slower learning, since one session is long enough for the behavior to stabilize and to reveal option preference. Indeed, when a second test is performed the following day, good and poor decision-makers maintain the same preference developed during the first test, throughout the entire session (Fitoussi et al. 2015). Moreover, poor decision-making persisted despite prior knowledge about the option values, learnt separately in a task variant, indicating that they did not choose the disadvantageous options because they failed to acquire relevant information about the task (Fig. 5e; Rivalan et al. 2009a).
As previously shown (Rivalan et al. 2009a), choices during the RGT were unrelated to a side preference eventually developed during training since no bias effect of location preference eventually developed during training was observed: the proportion of rats with identical preference during training and test did not differ from chance.
Uncontrolled differences in the level of food restriction or consumption cannot explain behavioral differences, since increasing the level of food restriction by decreasing body weight from 0 to 20% had no significant impact on either the proportions of good and poor decision makers or the evolution of their behavior (Rivalan et al. 2009a). The use of a palatable food pellet (Formula P, Sandow) certainly contributes to this effect.
Importantly, we previously showed that the contrast between one pellet (favorable options) vs two (unfavorable options) is a crucial aspect of the test since differences between PD and GD in the amount of work they are able to provide for food is observed for obtaining two pellets at a time, but not for one (Rivalan et al. 2009a). Conversely, all rats behave similarly in a version of the task where all options provide the same number of rewards, but deliver the four different penalties. That is, they prefer the shorter penalty (6 s), followed by the 12 s long one and avoid the long penalties.
The apparent large difference between short and long penalties only results in a ratio 5 between the two kinds of options. When the duration of the longer penalties is decreased (Rivalan et al. 2009a) reducing the ratio to 4 or 3, poor decision-maker behavior is unchanged, suggesting that the contrast between the rewards is fundamental and drives their behavior. Interestingly, good decision-makers take more time to select advantageous options because of an increased difficulty of the task. As a whole, the combination of the parameters used in the standard RGT allows obtaining very similar results to those obtained in the human IGT (Bechara et al. 2002).
In fact, poor decision-making in rats can be explained by their particular combination of behavioral and cognitive traits recently demonstrated: risk-taking, reward-seeking, motor impulsivity and behavioral inflexibility (Rivalan et al. 2013). This combination is reminiscent of several mental disorders such as pathological gambling and addiction (van den Bos et al. 2013). It appears to be a consequence of an over-valuation of high-reward-high-risk options in the task that cannot be modulated by learning because of their inflexible behavior. Thus, it demonstrates a strong alteration in long-term outcome evaluation and/or insensitivity to future long-term consequences as proposed by Damasio and Bechara in humans (Bechara et al. 1997; Fitoussi et al. 2015; Rivalan et al. 2009a, 2011, 2013).
Our study shows that all poor decision-makers in the RGT exhibit behavioral inflexibility, as well as a small fraction of good decision-makers, confirming previous data (Rivalan et al. 2013). Behavioral flexibility was tested in a RGT-reversed version, during which advantageous and disadvantageous options were spatially swapped. In this RGT variant, shifting behavioral responses based on previous learned associations is necessary for adaptation to changes, such as in standard simple reversal procedures (Floresco et al. 2009). Cognitive processes engaged for RGT solving can be highly predicted by a combination of several core behavioral features, including not only behavioral flexibility but also motor impulsivity, risk-taking and reward-seeking (Rivalan et al. 2013). This may explain why poor performers in the RGT-reversed version may not also be poor performers in the RGT: some inflexible good decision-makers, being for example less impulsive, or less sensitive to risk or reward. It may explain why a relationship between IGT performances and reversal


learning in humans has only been sparsely reported (Clark et al. 2004; Toplak et al. 2010).
Here, we show an additional cognitive process related to inflexible behavior and its putative contribution for decision-making. Inflexible rats were characterized by an altered efficiency in goal-directed behavior during a contingency degradation procedure, independently of learning abilities during training. The less flexible it was in shifting choices in the RGT-reversed version, the less sensitive to the contingency degradation the animal was. This relationship was observed for both inflexible good and poor decision makers, suggesting that their inflexible behavior in the reversal RGT procedure may have similar origins. Thus, it seems reasonable to propose that during the reversal, poor decision makers failed to adapt to contingency reversal rather than suddenly preferring the advantageous options. Inter-individual differences in the sensitivity to contingency degradation could be revealed by ensuring reduction of a non-specific parameter: variations in general motor activity. For this purpose, a strict adjustment of test conditions according to individual level of motor activity during training was applied (see “Materials and methods”).
In the present study, we also assessed rats’ ability to adapt their responding to a change in instrumental contingency (Balleine and Dickinson 1998) using a random food delivery procedure (Coutureau et al. 2012; Yin et al. 2006), yoked to training food delivery to avoid a bias related to inter-individual differences in activity levels. Under these circumstances, some rats showed a marked decrement of responding suggesting that they were sensitive to the consequences of their action. Such a decrease in lever press performance is generally assumed to result from a decreased A–O contingency computation. It must be acknowledged, however, that this suppression of lever pressing may also occur due to acquisition of reward-related approach behavior to the magazine, as our results suggest. By contrast, however, inflexible rats maintain a high level of operant behavior throughout the contingency changes, suggesting that they rely on habitual (S-R) responding. Such an instrumental profile may be related to the “illusion of control” that human pathological gamblers show (Orgaz et al. 2013). It may also be related to perseverative and compulsive-like behaviors observed in inflexible rats using different impulsivity schedules. Inappropriate compulsive behaviors (Dalley et al. 2011) may result from attributing excessive incentive value to reward-associated stimuli (Berridge and Robinson 1998; Flagel et al. 2009).

Ability to update the A–O association and its neurobiological correlates

In the second part of our experiment, we aimed to identify the neural correlates of inter-individual differences in sensitivity to contingency degradation using Zif268 as a marker of neuronal reactivity. We report opposing correlations between the ability to update the A–O association during contingency degradation and Zif268 expression in the mPFC and the dorsal striatum. A poor sensitivity to contingency degradation was associated with a lower expression of Zif268 in the mPFC (IL and PL) and a higher one in the dorsal striatum (DLS and DMS). As illustrated by the ratio of reactivity between the two types of structures, this interplay would play an essential role for response adaptation in a changing environment. No clear dissociation could be observed between activation of IL and PL nor between activations of the lateral and the medial parts of the dorsal striatum, related to the ability to update A–O association, although their respective roles have often been dissociated.
It is possible that differences in motor activity between groups may have contributed to these differences in brain activities. However, the absence of correlation between Zif268 expression ratios between prefrontal and striatal structures, and motor index, in contrast to sensitivity to degradation, indicates that the immunoreactivity differences cannot solely be attributed to unspecific motor output.
Within the dorsal striatum, the DLS and DMS differ in their interconnectivity, receptor distribution and mechanisms of plasticity (Yin and Knowlton 2006). The DMS receives multi-modal information from the medial PFC and CgA, whereas the DLS predominantly, but not exclusively, receives sensorimotor information from the sensorimotor cortex (Yin and Knowlton 2006). The DMS (specifically the posterior region) is more traditionally linked with goal-directed behavior efficiency (Yin et al. 2005), specifically when considering the causal relationship between an action and its outcome; whereas the DLS (notably the posterior part) is often linked to habit formation (Tricomi et al. 2009) and thus, specific stimulus-response (S-R) association. However, some studies found difficulties in dissociating the contribution of the DMS vs DLS, e.g., goal-directed vs habitual-like behaviors (Stalnaker et al. 2010) and consequently, their respective roles have been reconsidered. A hypothesis is that a coexistence of S-R and A–O information processing within these two striatal subparts may occur (Stalnaker et al. 2010; Williams and Eskandar 2006; Yin and Knowlton 2006). This hypothesis is supported by our data in which DMS and DLS show similar patterns of activation in our subgroups.
The general decrease in instrumental responses throughout the contingency degradation test sessions may reflect flexible, goal-directed behavior. However, this


decrease is much slower in rats that are less sensitive to contingency degradation and we cannot exclude that these rats, that exhibited a high level of lever presses, were partially driven by a habit mode, classically attributed to DLS activity. Thus, a transition from a habit-becoming behavior toward goal-directed behavior could occur during the contingency degradation task, explaining a progressive disengagement of DLS as operant behavior decreases in rats with sensitivity to contingency degradation. Indeed, the higher the sensitivity to contingency degradation, the lower the Zif268 reactivity in DLSa. Differences were also significant in posterior parts of DLS and DMS according to this sensitivity, indicating a disengagement of the dorsal striatum once the A–O association has been updated. The similar level of control rats’ activity in the DLS to that of sensitive individuals corroborates this hypothesis. These results are in line with recent data showing the role of DMS Zif268 activation in the transition from goal-directed responding to habit-based actions (Maroteaux et al. 2014). Indeed, Zif268 was shown to decrease in DMS once the task performed was learned, but remained high in DLS. Thus, the higher DMS activation observed in less sensitive rats could reflect the active transition from habit-based behavior to goal-directed responding, whereas the higher DLS activation would reflect the largest habit component of their behavior.
Interestingly, our data indicate that an efficient update of A–O contingency is associated with a higher engagement of the mPFC, notably in PL and IL. These two PFC areas, strongly interconnected (Vertes 2004), are crucial for coordinating actions and habits. Higher PL activation in FLEX-S could, therefore, mediate rats’ ability to detect A–O contingency variations (Corbit and Balleine 2003; Coutureau et al. 2012; Naneix et al. 2009). Indeed, PL is a key component of the neural circuit regulating goal-directed behavior in rodents and primates, both for encoding A–O contingency and for the representation of the outcome as a goal (Balleine and O’Doherty 2010; Killcross and Coutureau 2003). At the same time, the higher IL activation in rats sensitive to contingency degradation could mediate the suppression of learned A–O association required to extinguish behavior. This is in accordance with recent views proposing that IL plays a key role in the regulation of flexible behaviors and in inhibiting established A–O association, when no more reward is delivered, i.e., in extinction processes (Millan et al. 2011; Peters et al. 2008). Thus, the involvement of IL in the contingency degradation paradigm may share some similarities with that of standard extinction tests. It is noticeable that PL and IL activities do not rely on activity levels, since active control and FLEX-S rats with a low level of activity have similar IL and PL activations.
Finally, it is worth noting that human studies have reported individual differences in cortico-striatal connectivity related to the balance between goal-directed and habitual actions (de Wit et al. 2012). Vulnerability to habitual actions was related to premotor cortex–posterior putamen connectivity, whereas flexible goal-directed action was predicted by medio-ventral PFC–caudate connectivity. It could, therefore, be hypothesized that our results in rats may also originate from differences in functional connectivities that remain to be explored.

Implication for decision-making-related disorders

Poor decision-makers are characterized by behavioral inflexibility in the RGT reversal procedure, as well as perseverative and compulsive-like behaviors (Rivalan et al. 2013), some behavioral traits that are reminiscent of symptoms of several executive disorders. Indeed, perseverative responses are typically observed after acute administration of psychostimulants (Evenden and Ko 2005) and inflexible and compulsive behaviors are observed in drug addiction (Calu et al. 2007; Jentsch et al. 2002), pathological gambling (Goudriaan et al. 2006) and obsessive–compulsive disorder (OCD) (DSM-V 2013; Gillan et al. 2011). Here, we show that inflexible behavior of poor decision-makers (but also some good decision-makers) could be related to a less-efficient encoding of goal-directed action–outcome (A–O) associations in a changing environment that could then contribute to the etiology of these disorders. This deficit, related to mPFC/dorsal striatal imbalance, highlights key cognitive mechanisms that could play a pivotal role in several psychiatric disorders. For example, the critical transition from goal-directed to habitual behaviors is a predominant hypothesis that may explain, at least in part, addictive behavior (Everitt and Robbins 2013) or OCD (Gillan et al. 2011). We hypothesise that this lower cognitive efficiency, subserved by a specific mPFC/dorsal striatal imbalanced activation and/or malfunction, may promote maladaptive habit formation among poor decision-makers (Killcross and Coutureau 2003). As such, it could represent a key marker of vulnerability for decision-making-related disorders, such as addiction.

Conclusion

In this study, we found that inflexible behavior as shown predominantly in poor decision-makers is related to an impaired ability to update an A–O association that mediates goal-directed behavior. Taken together, our results demonstrate that inter-individual differences in the transition between goal-directed and habit-based behaviors can


be revealed in a normal population of rats and offer important and novel insights on how information is processed through the cortico-striatal network. Although these inter-individual differences in behavior are subtle, unlike comparisons between standard experimental and control groups, the differential approach tends to bring together animal research and human psychological concepts (Rivalan et al. 2009b). By focusing on healthy individuals in a non-invasive manner, this work joins others (Bardo et al. 1996; Belin and Deroche-Gamonet 2012; Blondeau and Dellu-Hagedorn 2007; Deroche-Gamonet et al. 2004; Kabbaj and Akil 2001; Koolhaas et al. 2007; LaPorte et al. 2010; Matzel and Kolata 2010; Rivalan et al. 2009b; Robbins et al. 2012; Robinson et al. 2014) in demonstrating the relevance of identifying potential biomarkers that could not only promote decision-making-related disorders such as addiction and OCD, but also neurodevelopmental disorders (i.e., developmental coordination disorder) in which transitions between goal-directed behavior and habits are disrupted. This work is in line with the recent proposal by Robbins et al. (2012) to undertake a more objective description of psychiatric disorders through predisposing traits and neurocognitive endophenotypes, thereby explaining the high level of comorbidities between mental disorders.

Acknowledgements This work was supported and funded by the Centre National de la Recherche Scientifique, the University of Bordeaux and the Conseil Régional d’Aquitaine. We thank B. Lorgeoux, S. Lelgouach and A. Fayoux for technical assistance, and A. Marchand, S. Parkes and F. Naneix for their helpful comments on the manuscript and editorial assistance.

Compliance with ethical standards

Conflict of interest The authors declare no potential conflicts of interest.

References

Balleine BW, Dickinson A (1998) Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37:407–419
Balleine B, O’Doherty J (2010) Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35:48–69
Balleine BW, Killcross AS, Dickinson A (2003) The effect of lesions of the basolateral amygdala on instrumental conditioning. J Neurosci 23:666–675
Bardo MT, Donohew RL, Harrington NG (1996) Psychobiology of novelty seeking and drug seeking behavior. Behav Brain Res 77:23–43
Bechara A, Damasio H (2002) Decision-making and addiction (part I): impaired activation of somatic states in substance dependent individuals when pondering decisions with negative future consequences. Neuropsychologia 40:1675–1689
Bechara A, Damasio AR, Damasio H, Anderson SW (1994) Insensitivity to future consequences following damage to human prefrontal cortex. Cognition 50:7–15
Bechara A, Damasio H, Tranel D, Damasio AR (1997) Deciding advantageously before knowing the advantageous strategy. Science 275:1293–1295
Bechara A, Dolan S, Hindes A (2002) Decision-making and addiction (part II): myopia for the future or hypersensitivity to reward? Neuropsychologia 40:1690–1705
Belin D, Deroche-Gamonet V (2012) Responses to novelty and vulnerability to cocaine addiction: contribution of a multi-symptomatic animal model. Cold Spring Harb Perspect Med 2(11):a011940. doi:10.1101/cshperspect.a011940
Berridge KC, Robinson TE (1998) What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? Brain Res Brain Res Rev 28:309–369
Blondeau C, Dellu-Hagedorn F (2007) Dimensional analysis of ADHD subtypes in rats. Biol Psychiatry 61:1340–1350
Bozon B, Kelly A, Josselyn SA, Silva AJ, Davis S, Laroche S (2003) MAPK, CREB and zif268 are all required for the consolidation of recognition memory. Philos Trans R Soc Lond B Biol Sci 358:805–814
Bradfield LA, Hart G, Balleine BW (2013) The role of the anterior, mediodorsal, and parafascicular thalamus in instrumental conditioning. Front Syst Neurosci 7:51
Calu DJ, Stalnaker TA, Franz TM, Singh T, Shaham Y, Schoenbaum G (2007) Withdrawal from cocaine self-administration produces long-lasting deficits in orbitofrontal-dependent reversal learning in rats. Learn Mem 14:325–328
Clark L, Cools R, Robbins TW (2004) The neuropsychology of ventral prefrontal cortex: decision-making and reversal learning. Brain Cogn 55:41–53
Corbit LH, Balleine BW (2003) The role of prelimbic cortex in instrumental conditioning. Behav Brain Res 146:145–157
Corbit LH, Muir JL, Balleine BW (2003) Lesions of mediodorsal thalamus and anterior thalamic nuclei produce dissociable effects on instrumental conditioning in rats. Eur J Neurosci 18:1286–1294
Coutureau E, Esclassan F, Di Scala G, Marchand AR (2012) The role of the rat medial prefrontal cortex in adapting to changes in instrumental contingency. PLoS One 7:e33302
Dalley JW, Everitt BJ, Robbins TW (2011) Impulsivity, compulsivity, and top-down cognitive control. Neuron 69:680–694
de Visser L et al (2011) Rodent versions of the Iowa gambling task: opportunities and challenges for the understanding of decision-making. Front Neurosci 5:109
de Wit S, Watson P, Harsay HA, Cohen MX, van de Vijver I, Ridderinkhof KR (2012) Corticostriatal connectivity underlies individual differences in the balance between habitual and goal-directed action control. J Neurosci 32:12066–12075
Deroche-Gamonet V, Belin D, Piazza PV (2004) Evidence for addiction-like behavior in the rat. Science 305:1014–1017
DSM-V (2013) Diagnostic and statistical manual of mental disorders, Fifth edn. American Psychiatric Association, Washington, DC
Ernst M et al (2004) Choice selection and reward anticipation: an fMRI study. Neuropsychologia 42:1585–1597
Evenden J, Ko T (2005) The psychopharmacology of impulsive behaviour in rats VIII: effects of amphetamine, methylphenidate, and other drugs on responding maintained by a fixed consecutive number avoidance schedule. Psychopharmacology 180:294–305
Everitt BJ, Robbins TW (2013) From the ventral to the dorsal striatum: devolving views of their roles in drug addiction. Neurosci Biobehav Rev 37:1946–1954
Fellows LK (2004) The cognitive neurosciences of human decision making: a review and conceptual framework. Behav Cogn Neurosci Rev 3:159–172

Fitoussi A, Le Moine C, De Deurwaerdere P, Laqui M, Rivalan M, Cador M, Dellu-Hagedorn F (2015) Prefronto-subcortical imbalance characterizes poor decision-making: neurochemical and neural functional evidences in rats. Brain Struct Funct 220:3485–3496
Flagel SB, Akil H, Robinson TE (2009) Individual differences in the attribution of incentive salience to reward-related cues: implications for addiction. Neuropharmacology 56(Suppl 1):139–148
Floresco SB, Zhang Y, Enomoto T (2009) Neural circuits subserving behavioral flexibility and their relevance to schizophrenia. Behav Brain Res 204:396–409
Gillan CM, Papmeyer M, Morein-Zamir S, Sahakian BJ, Fineberg NA, Robbins TW, de Wit S (2011) Disruption in the balance between goal-directed behavior and habit learning in obsessive-compulsive disorder. Am J Psychiatry 168:718–726
Goudriaan AE, Oosterlaan J, de Beurs E, van den Brink W (2006) Neurocognitive functions in pathological gambling: a comparison with alcohol dependence, Tourette syndrome and normal controls. Addiction 101:534–547
Griffiths KR, Morris RW, Balleine BW (2014) Translational studies of goal-directed action as a framework for classifying deficits across psychiatric disorders. Front Syst Neurosci 8:101
Jentsch JD, Olausson P, De La Garza R, Taylor JR 2nd (2002) Impairments of reversal learning and response perseveration after repeated, intermittent cocaine administrations to monkeys. Neuropsychopharmacology 26:183–190
Kabbaj M, Akil H (2001) Individual differences in novelty-seeking behavior in rats: a c-fos study. Neuroscience 106:535–545
Killcross S, Coutureau E (2003) Coordination of actions and habits in the medial prefrontal cortex of rats. Cereb Cortex 13:400–408
Koolhaas JM, de Boer SF, Buwalda B, van Reenen K (2007) Individual variation in coping with stress: a multidimensional approach of ultimate and proximate mechanisms. Brain Behav Evol 70:218–226
LaPorte JL, Egan RJ, Hart PC, Bergner CL, Cachat JM, Canavello PR, Kalueff AV (2010) Qui non proficit, deficit: experimental models for ‘integrative’ research of affective disorders. J Affect Disord 121:1–9
Lee D, Seo H, Jung MW (2012) Neural basis of reinforcement learning and decision making. Annu Rev Neurosci 35:287–308
Maroteaux M, Valjent E, Longueville S, Topilko P, Girault JA, Herve D (2014) Role of the plasticity-associated transcription factor zif268 in the early phase of instrumental learning. PLoS One 9:e81868
Matzel LD, Kolata S (2010) Selective attention, working memory, and animal intelligence. Neurosci Biobehav Rev 34:23–30
Millan EZ, Marchant NJ, McNally GP (2011) Extinction of drug seeking. Behav Brain Res 217:454–462
Naneix F, Marchand AR, Di Scala G, Pape JR, Coutureau E (2009) A role for medial prefrontal dopaminergic innervation in instrumental conditioning. J Neurosci 29:6599–6606
Orgaz C, Estevez A, Matute H (2013) Pathological gamblers are more vulnerable to the illusion of control in a standard associative learning task. Front Psychol 4:306
Paxinos G, Watson C (1997) The rat brain in stereotaxic coordinates, 3rd edn. Academic Press, San Diego
Peters J, LaLumiere RT, Kalivas PW (2008) Infralimbic prefrontal cortex is responsible for inhibiting cocaine seeking in extinguished rats. J Neurosci 28:6046–6053
Rivalan M, Ahmed SH, Dellu-Hagedorn F (2009a) Risk-prone individuals prefer the wrong options on a rat version of the Iowa gambling task. Biol Psychiatry 66:743–749
Rivalan M, Blondeau C, Dellu-Hagedorn F (2009b) Modeling symptoms of mental disorders using a dimensional approach in the rat. In: Granon S (ed) Endophenotypes of psychiatric and neurodegenerative disorders in rodent models. Transworld Research Network, Kerala
Rivalan M, Coutureau E, Fitoussi A, Dellu-Hagedorn F (2011) Inter-individual decision-making differences in the effects of cingulate, orbitofrontal, and prelimbic cortex lesions in a rat gambling task. Front Behav Neurosci 5:22
Rivalan M, Valton V, Series P, Marchand AR, Dellu-Hagedorn F (2013) Elucidating poor decision-making in a rat gambling task. PLoS One 8:e82052
Robbins TW, Gillan CM, Smith DG, de Wit S, Ersche KD (2012) Neurocognitive endophenotypes of impulsivity and compulsivity: towards dimensional psychiatry. Trends Cogn Sci 16:81–91
Robinson TE, Yager LM, Cogan ES, Saunders BT (2014) On the motivational properties of reward cues: Individual differences. Neuropharmacology 76(Pt B):450–459
Royall DR et al (2002) Executive control function: a review of its promise and challenges for clinical research. A report from the Committee on Research of the American Neuropsychiatric Association. J Neuropsychiatry Clin Neurosci 14:377–405
Stalnaker TA, Calhoon GG, Ogawa M, Roesch MR, Schoenbaum G (2010) Neural correlates of stimulus–response and response–outcome associations in dorsolateral versus dorsomedial striatum. Front Integr Neurosci 4:12
Toplak ME, Sorge GB, Benoit A, West RF, Stanovich KE (2010) Decision-making and cognitive abilities: a review of associations between Iowa gambling task performance, executive functions, and intelligence. Clin Psychol Rev 30:562–581
Tricomi E, Balleine B, O’Doherty J (2009) A specific role for posterior dorsolateral striatum in human habit learning. Eur J Neurosci 29:2225–2232
van den Bos R et al (2013) Cross-species approaches to pathological gambling: a review targeting sex differences, adolescent vulnerability and ecological validity of research tools. Neurosci Biobehav Rev 37:2454–2471
van den Bos R, Koot S, de Visser L (2014) A rodent version of the Iowa gambling task: 7 years of progress. Front Psychol 5:203
Vertes RP (2004) Differential projections of the infralimbic and prelimbic cortex in the rat. Synapse 51:32–58
Williams ZM, Eskandar EN (2006) Selective enhancement of associative learning by microstimulation of the anterior caudate. Nat Neurosci 9:562–568
Yin H, Knowlton B (2006) The role of the basal ganglia in habit formation. Nat Rev Neurosci 7:464–476
Yin HH, Ostlund SB, Knowlton BJ, Balleine BW (2005) The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci 22:513–523
Yin HH, Knowlton BJ, Balleine BW (2006) Inactivation of dorsolateral striatum enhances sensitivity to changes in the action–outcome contingency in instrumental conditioning. Behav Brain Res 166:189–196
Zeeb FD, Robbins TW, Winstanley CA (2009) Serotonergic and dopaminergic modulation of gambling behavior as assessed using a novel rat gambling task. Neuropsychopharmacology 34:2329–2343
