
Neuron

Review

The Meaning of Behavior: Discriminating Reflex and Volition in the Brain
Bernard W. Balleine1,*
1Decision Neuroscience Lab, UNSW Sydney, Randwick, NSW 2052, Australia

*Correspondence: bernard.balleine@unsw.edu.au
https://doi.org/10.1016/j.neuron.2019.09.024

The ability to establish behaviorally what psychological capacity an animal is deploying—to discern accurately what an animal is doing—is key to functional analyses of the brain. Our current understanding of these capacities suggests, however, that this task is complex; there is evidence that multiple capacities are engaged simultaneously and contribute independently to the control of behavior. As such, establishing the contribution of a cell, circuit, or neural system to any one function requires careful dissection of that role from its influence on other functions and, therefore, the careful selection and design of behavioral tasks fit for that purpose. Here I describe recent research that has sought to utilize behavioral tools to investigate the neural bases of instrumental conditioning, particularly the circuits and systems supporting the capacity for goal-directed action, as opposed to conditioned reflexes and habits, and how these sources of action control interact to generate adaptive behavior.

Introduction
For much of the last century, it has been regarded as axiomatic that the brain organizes the psychological functions of animals, and therefore, by studying the brain, we can understand how such functions arise, are maintained, and when and why they fail with damage or disease. Recently, a number of researchers have suggested a radical shift in this view: that the brain doesn't organize function; instead, function organizes the brain (Cooper and Peebles, 2015; Gomez-Marin et al., 2014; Krakauer et al., 2017). From this perspective, understanding the organization of the brain can only emerge from a better understanding of the behavioral and psychological functions that the brain implements.

There is no doubting that the evolution and development of brain structure and organization is sensitive to animal behavior; only phenotypic variations sufficiently adaptive for an animal to survive and reproduce have shaped this process. Pragmatically speaking, however, the more important point is that progress in understanding the relationship between the brain and our most important adaptive functions is likely to be more rapid if we first seek to understand the meaning of behavior—i.e., how to measure, manipulate, and dissociate specific functions in any species—and only then begin the process of establishing how the brain supports their implementation. This requires a radical adjustment in perspective; it contends that asking general questions about what the brain does, whether framed in terms of its cells, circuits, systems, and so on, is simply not an appropriate question; our first task is to understand what an animal is doing and why it is doing it and only then seek to discover in what way the brain contributes to that specific capacity.

Here, I will explore this issue by focusing on the neural analysis of a core adaptive function: the capacity for goal-directed action, limiting this analysis to research investigating action control in rodents. Other recent reviews have focused on human and primate research (O'Doherty et al., 2017) from a computational perspective (Dolan and Dayan, 2013), and although the issues in these literatures have much in common (Balleine et al., 2007; Balleine and O'Doherty, 2010), they are not directly addressed here. In a similar vein, although the work described here has clear relevance for disorders of the brain associated with various neurodegenerative conditions, psychiatric disorders, and addictions, these too have been reviewed elsewhere and are not discussed in what follows (cf. Brown and Pluck, 2000; Everitt and Robbins, 2016).

Process Dissociation: Discriminating Goal-Directed Action from Other Forms of Behavior
The distinction between learning and performance is, arguably, the most fundamental to emerge from behavioral studies addressing the psychological capacities of animals (Tolman, 1932). It is an important distinction for at least two reasons. First, it implies that functional capacities don't necessarily generate behavior and that learning can occur without any obvious behavioral sign or signal. Second, behavior doesn't necessarily reveal function: one can't just take a superficial glance at what an animal is doing and decide what its behavior means. The latter issue is generated by the tendency to equate task with function; to treat tasks as if they are, what Jacoby (1991) called, factor pure (i.e., that they measure only one process or function). In neuroscience, there has been a tendency to use tasks "off the shelf" to address function without much critical thought regarding what these tasks measure. In fact, it is pretty much impossible to demonstrate convincingly that a task is factor pure.

Historically, psychologists often relied on the proverbial rat in a maze to study function. However, performance in a maze can involve multiple influences, e.g., spatial learning, based on extra-maze cues; place learning, based on intra-maze cues; or response learning, based on proprioceptive cues. Indeed, in any maze, all of these forms of learning are likely contributing
Neuron 104, October 9, 2019 © 2019 Elsevier Inc.



to performance to some degree, and this points to the essential problem: recording neural activity or altering that activity during a behavioral task through some intervention or other, no matter how elegantly or specifically, could reflect and/or affect any one of these processes. And, of course, identifying a task with a process (i.e., calling it "spatial" or "hippocampal" and so on) doesn't guarantee that only the process with which it is identified has changed. To understand the psychological determinants of performance on a task and, therefore, what a change in neural activity or the effect of some neural manipulation actually means requires a different approach, one of process dissociation.

Behavioral Analyses Reveal and Differentiate Multiple Influences on Performance
Although it continues to be debated how precisely volitional or goal-directed actions should best be defined, the encoding of some form of relationship between an action and its consequences is commonly emphasized, whether that relationship is determined by an intentional process, observed covariation, or the direct perception of a causal relationship between these events (Searle, 1983; Ahmed, 2014; Maze, 1983). Nevertheless, because, superficially speaking, all behavior can be described in goal-directed terms (i.e., as being performed to achieve some particular goal or other), process dissociation has proven particularly important in discriminating goal-directed actions from those that are not.

Take another paradigm case: that of a rat pressing a lever for food. Although this response can look decidedly goal directed, it can clearly involve multiple control processes. For example, when a lever is simply presented paired with food delivery, rats will often approach and press it even though it is only a signal for the food and a press response isn't required (Davey and Cleland, 1984; Flagel et al., 2011). The usual interpretation of this phenomenon is that the lever press is incidental to a conditioned approach or "sign-tracking" response driven by the Pavlovian stimulus (lever)-outcome (food) association rather than the lever press-food association and so is not goal directed (Morrison et al., 2015). But can we be so sure? Perhaps the rats actually believe a lever press is required to get the food. One way to find out is to reverse the relationship between lever pressing and food delivery (i.e., to omit the food if the rat presses the lever). If lever pressing is goal directed, the rats should stop pressing; however, in sign tracking studies, it is usual for rats and other species to continue to respond and so lose a considerable quantity of the food available to them (Chang and Smith, 2016; Davey et al., 1981); indeed, they will often acquire Pavlovian approach responses when they have never been associated with food (e.g., Holland, 1979). In contrast, when rats are trained in situations in which lever pressing is required to get food, omitting the food after the press rapidly reduces pressing (Davis and Bitterman, 1971).

Unfortunately, dissociating Pavlovian conditioned responses from instrumental actions is still not sufficient to assert an action is goal directed. Historically, instrumental conditioning was thought of as an acquired reflex, viz. the classic stimulus-response (S-R) association of the neobehaviorists, with specific S-R associations selected and strengthened by a process of reinforcement. On this view, the reason rats press the lever is not because they believe such responses result in a food reward but because the food strengthens (reinforces) an association between environmental stimuli (S) and the response (R), allowing those stimuli to control subsequent performance.

Distinguishing stimulus- from goal-directed control of instrumental actions has again relied on process dissociation, setting predictions from the two views in opposition (see Balleine and Dickinson, 1998). Two kinds of experiment have been conducted, one assessing the sensitivity of instrumental performance to changes in the causal relationship between actions and their consequences and another evaluating sensitivity of the action to shifts in the incentive value of those consequences (Figure 1A). The latter experiment was earlier chronologically; for example, Adams and Dickinson (1981) trained rats to lever press for sugar (or pellets) while delivering pellets (or sugar) non-contingently. They then altered the value of either the contingent or the non-contingent outcome using a taste aversion manipulation and examined the rats' tendency to press the lever in extinction. They found a reduction in lever press performance, showing that the rats were sensitive to the value of the outcome and, importantly, that this effect was greater for the contingent than the non-contingent outcome. This established that the effect was mediated by the action-outcome contingency rather than changes in stimulus control. A similar effect was reported subsequently by Colwill and Rescorla (1985) in choice studies conducted using two distinct actions, lever pressing and chain pulling, to earn the pellet and sucrose outcomes. Again, devaluation selectively reduced the performance of the action that previously earned the now devalued outcome.

The second kind of experiment assessed the reliance of action-outcome learning on the rats' sensitivity to the action-outcome contingency. The effects of omission schedules are consistent with such sensitivity but do not differentiate action-outcome control from a competing S-R process (because the free outcomes could differentially reinforce a competing response). Direct evidence that action-outcome encoding reflects sensitivity to the action-outcome contingency came from studies in which the contingency was selectively degraded (i.e., one action-outcome contingency was degraded while another was kept intact). Using the "constant probability" approach developed by Hammond (1980; see also Dickinson and Charnock, 1985), Colwill and Rescorla (1986; see also Dickinson and Mulatero, 1989) trained rats to perform two actions for distinct outcomes. After this training, each subsequent session was divided into 1 s intervals, and whereas the actions continued to earn their appropriate outcomes at a fixed probability in each second in which they were performed, one or other of those outcomes was also delivered non-contingently at the same probability, thereby selectively degrading one, but not the other, action-outcome contingency (see Figure 1A). This treatment produced a reduction in the degraded relative to the non-degraded action, confirming that instrumental performance was mediated by the action-outcome contingency, an effect replicated many times in recent years.

Evidence for Dual Control Processes in Instrumental Conditioning
The above analysis suggests that instrumental actions can be goal directed, defined as an action that depends for its performance on encoding (1) the relationship between the action and


Figure 1. The Corticostriatal Circuit Mediating Goal-Directed Action
(A) Behavioral tests used to establish the capacity for goal-directed actions in rodents. Acquisition on two different lever press actions for different outcomes (delivered at a progressively decreasing probability given a response) is followed by an assessment of sensitivity to changes in the action-outcome contingency (contingency degradation) and in outcome value (outcome devaluation). A choice test is then conducted in extinction (without feedback).
(B) The progressive changes in goal-directed and habitual control processes over the course of training from the perspective of the dual control theory of instrumental conditioning.
(C) The function of changes in the input and output layers of the prelimbic cortex in shaping bilateral IT output to the pDMS in generating the learning-related plasticity necessary to encode the specific action-outcome associations mediating goal-directed action. PT neurons project ipsilaterally, do not mediate plasticity, but appear to be essential for the preparatory responses necessary to maintain performance.
(D) The goal-directed circuit and its reliance on the bilateral prelimbic cortex to posterior dorsomedial striatal circuit in the brain.
PL, prelimbic cortex; pDMS, posterior dorsomedial striatum; SNr, substantia nigra pars reticulata; mdT, mediodorsal thalamus; VTA, ventral tegmental area.
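The contingency degradation test summarized in (A) follows the "constant probability" schedule described in the main text: time is divided into 1 s bins, and the outcome is delivered at the same probability in bins with and without a response. The short simulation below makes the logic concrete; the response rate, outcome probability, and bin count are illustrative assumptions, not values from any cited experiment.

```python
import random

def experienced_contingency(p_earn, p_free, n_bins=18000, p_respond=0.3, seed=0):
    """Hammond-style 'constant probability' schedule: a 1 s bin containing
    a response earns the outcome with probability p_earn; a bin without a
    response delivers the same outcome freely with probability p_free.
    Returns the experienced contingency dP = p(O|A) - p(O|no A).
    (n_bins pools many sessions to keep the estimate stable.)
    """
    rng = random.Random(seed)
    o_a = o_no_a = n_a = n_no_a = 0
    for _ in range(n_bins):
        if rng.random() < p_respond:   # the animal responds in this bin
            n_a += 1
            o_a += rng.random() < p_earn
        else:                          # no response in this bin
            n_no_a += 1
            o_no_a += rng.random() < p_free
    return o_a / n_a - o_no_a / n_no_a

# Intact contingency: the outcome only ever follows the action, so dP is
# close to p_earn.
intact = experienced_contingency(p_earn=0.05, p_free=0.0)
# Degraded contingency: the same outcome also arrives, at the same
# probability, in bins without a response, so dP collapses toward zero.
degraded = experienced_contingency(p_earn=0.05, p_free=0.05)
```

On the account developed in the text, only responding on the action whose dP has been degraded should decline, while an action with an intact contingency is maintained.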

its specific outcome and (2) the current value of that outcome. Such actions are, therefore, demonstrably sensitive to changes in both the action-outcome contingency and outcome value. Nevertheless, in our attempts to establish, behaviorally, what animals have learned, there have always been indications that more than one process is engaged. Within outcome devaluation studies, for example, using taste aversion learning to devalue the outcome can result in the complete suppression of consumption but incomplete suppression of instrumental performance (Colwill and Rescorla, 1990), suggesting that some other devaluation-insensitive process is involved. In fact, evidence suggests that lever pressing can simultaneously have more than one determinant; it can be controlled by its specific consequences and by antecedent stimuli, the latter commonly referred to as habitual control (see Dickinson, 1994; Graybiel, 2008; Wood and Rünger, 2016 for reviews). It is this simultaneous control, the claim that the overall rate of instrumental performance at any point in training reflects the summed strengths of the action-outcome and S-R associations, that is the focus of the dual control account (Figure 1B).

According to this account, habits strengthen with training until, at some point, instrumental performance becomes insensitive to both outcome devaluation and contingency degradation (Adams, 1982; Dickinson et al., 1998; Figure 1B). Such findings are consistent with an S-R reinforcement mechanism that incrementally strengthens the association between the S and R with each outcome delivery, meaning that, as habits develop, more and more residual performance will remain after devaluation. Recent studies suggest that the experimental context provides a major source of habitual control; thus, a context shift removes residual performance after devaluation without affecting the size of the devaluation effect (Thrailkill and Bouton, 2015). Various experiments have confirmed other predictions of the dual control account (reviewed in Dickinson and Balleine, 2002). Nevertheless, although there is good behavioral evidence supporting dual control, by far the strongest evidence for this account comes from studies assessing the neural bases of instrumental conditioning.

The Neural Bases of Instrumental Conditioning
The brain could have implemented the behavioral and psychological distinction between reflex and volition in a variety of ways, only one of which involved sequestering them to partially independent circuits. An essentially habitual motor process could have been supervenient on a cognitive overseer, something that has been advanced, at least conceptually, in functional localization accounts (e.g., the cortico-centric accounts dividing motor from non-motor cortical areas; Koechlin et al., 2003). We initially took a non-committal approach by assessing what, if anything, various brain structures and circuits contributed to goal-directed action using sensitivity to outcome devaluation and contingency degradation as measures of action-outcome learning. There were many failures (reviewed in Balleine et al., 2009). Nevertheless, it has become clear that specific cortical, striatal, and limbic structures are critical and form the hubs of a complicated set of circuits that together form the larger network implementing goal-directed action in the brain (for reviews of earlier work, see Balleine, 2005; Balleine and O'Doherty, 2010).
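The summed-strengths claim of the dual control account (Figure 1B) can be expressed as a toy calculation: total responding is the sum of a goal-directed term and an incrementally reinforced habit term, and devaluation removes only the former. The learning rate and goal-directed strength below are illustrative assumptions, not values fitted to any experiment.

```python
def dual_control(n_sessions, alpha_habit=0.1, v_goal=1.0):
    """Toy dual-control model: the overall response rate is the sum of a
    goal-directed (action-outcome) term and an S-R habit term that is
    incrementally strengthened by each reinforced training session.
    alpha_habit and v_goal are illustrative parameters.
    """
    v_habit = 0.0
    for _ in range(n_sessions):
        v_habit += alpha_habit * (1.0 - v_habit)  # incremental S-R reinforcement
    total_rate = v_goal + v_habit
    # Devaluation abolishes the goal-directed term but spares the habit,
    # so residual responding after devaluation grows with training.
    residual = v_habit
    return total_rate, residual

early_rate, early_residual = dual_control(n_sessions=2)
late_rate, late_residual = dual_control(n_sessions=40)
```

Early in training most responding is devaluation sensitive (small residual); after extended training the habit term dominates and performance appears insensitive to devaluation, as the dual control account predicts.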




Goal-Directed Learning Requires a Cortical Working Memory
The first direct evidence of goal-directed learning in the brain came from the finding that pretraining cytotoxic lesions of the prelimbic prefrontal cortex (PL; i.e., area 32) in the rat rendered instrumental performance insensitive to contingency degradation and outcome devaluation (Balleine and Dickinson, 1998). Later, we found this involvement to be short lived; beyond a short consolidation window, the same manipulations, made after only a few sessions of training, had no influence on instrumental performance (Hart and Balleine, 2016; Ostlund and Balleine, 2005). Clearly, the PL is necessary to encode the specific action-outcome associations that mediate goal-directed action but does not store those associations. Rather, the brevity of its involvement suggests, consistent with its recurrent activity (Stewart and Plenz, 2006), that a PL-dependent, short-term working memory maintains specific action-outcome-related activity for encoding elsewhere (Tsutsui et al., 2016). Supporting this, an increase in markers of cellular activity (e.g., phosphorylation of extracellular signal-regulated kinase [p-ERK]) has been observed during a consolidation window up to 1 h after an initial training session in input layers 2/3 and in the output layer 5 of the PL, accompanied by an increase in nuclear p-ERK expression, especially in the posterior segment of the PL (Hart and Balleine, 2016; see also Nakayama et al., 2018; Whitaker et al., 2017). Blocking phosphorylation within this consolidation window after a single session of training by infusing the p-ERK inhibitor PD98059 into the PL was sufficient to abolish goal-directed learning (Hart and Balleine, 2016).

The intracortical circuits that enable the PL's involvement in this learning process likely control plasticity between the superficial input and deep output layers, the former receiving inputs from basolateral amygdala, ventral tegmental area, mediodorsal thalamus, and other cortical areas, notably medial agranular, anterior cingulate, and agranular insular cortices (Ährlund-Richter et al., 2019; Shepherd, 2013). These corticocortical inputs, together with inputs from the thalamus to L2/3 (Collins et al., 2018), appear to play a direct role in the cortical recurrent activity (Jezzini et al., 2013) that allows the PL to coordinate the necessary sensory, affective, and motor components of goal-directed action (Figure 1C). Structurally, both kinds of input link superficial layers to cortical output layers, shaped by inhibitory parvalbumin interneurons in layers 2 and 3 (Gabbott et al., 2005; Nakayama et al., 2018) and, potentially, by a dopamine-mediated error signal from the VTA sensitive to the action-outcome contingency (Naneix et al., 2009, but see Ellwood et al., 2017). This kind of layer-related reverberative activity has been a key feature of recent models of working memory (Miller et al., 2018).

The Necessity of the Prefronto-striatal Circuit for Goal-Directed Learning
Although much remains to be explored in these circuits, the fact that they are only temporarily involved in the plasticity underlying the encoding of specific action-outcome associations suggests that this learning is encoded in an efferent structure. The nucleus accumbens core (NACco) and dorsomedial striatum (DMS) were considered as likely contenders, and whereas the NACco was found to contribute to instrumental performance but not goal-directed learning (Corbit et al., 2001; Hart et al., 2018a), systematic exploration found evidence of action-outcome encoding in the posterior dorsomedial striatum (pDMS; Yin et al., 2005a), a region subsequently delineated as a separate anatomical division of the striatum based on its heterogeneous, multimodal cortical inputs (Hunnicutt et al., 2016). In this early work, we found that pre- or post-training lesions of the pDMS, inactivation using the GABA-A agonist muscimol, the NMDA antagonist AP-5, or the p-ERK inhibitor U0126 blocked both goal-directed learning and the expression of that learning in performance (Shiflett and Balleine, 2011; Yin et al., 2005a, 2005b).

Hart et al. (2018a) recently confirmed the involvement of the direct PL to pDMS circuit in goal-directed learning, ruling out any involvement of NACco-projecting PL neurons while establishing the critical role of bilaterally pDMS-projecting PL output neurons in that effect. Recent evidence suggests that there are two populations of corticostriatal output neurons (Nakayama et al., 2018; Shepherd, 2013): pyramidal tract (PT) neurons projecting ipsilaterally to the pyramidal tract, the thalamus, and ipsilateral striatum and intratelencephalic (IT) neurons projecting bilaterally to the striatum via the corpus callosum (Baker et al., 2018; Gerfen et al., 2013; Figures 1C and 1D). Using an approach that expressed CRE in the IT, but not the PT, neurons projecting to either the NACco or the pDMS, Hart et al. (2018a) infused a CRE-dependent hM4-DREADD into the PL and found that inactivating the IT neurons projecting bilaterally to the pDMS, but not the NACco, blocked goal-directed learning, whereas the PT projection to the pDMS failed to support this learning (Hart et al., 2018a). Conversely, sparing the crossed IT projection while blocking the ipsilaterally projecting PT neurons had no effect on goal-directed learning (Hart et al., 2018b).

Action-Outcome Associations Are Encoded in Posterior Dorsomedial Striatum
These results suggest a number of things. First, they confirm the importance of the PL-pDMS projection, specifically layer 5 IT neurons, in goal-directed learning and so point to the specific circuit and cell type that mediates the encoding of the action-outcome associations in the pDMS for goal-directed action. Second, the finding that this learning requires the bilaterally projecting IT neurons makes clear that such learning is broadcast from the PL to the pDMS bilaterally, whether that learning is relevant to an action ultimately directed at contra- or ipsilateral action space. This makes sense, as an important factor differentiating goal-directed from reflexive actions is their goals: whereas the goal of the former is to change the environment, that of the latter is simply to perform a motor movement. Clearly, broadcasting action-outcome information bilaterally is an effective way of ensuring that the animal can make whatever movement is required to cause the desired change in the world. It is interesting, too, to reflect on the range of the cortical inputs that impinge on the pDMS in this context. Given the multimodal nature of the goals of goal-directed action, the inputs to the pDMS are clearly fit for this purpose including, besides the PL, inputs from visual, auditory, retrosplenial, parietal, orbital, and cingulate cortices and amygdala (Hunnicutt et al., 2016; Nagy et al., 2018).

Within the pDMS, the PL provides a timed glutamatergic input to multiple cell types but primarily to the spiny projection neurons (SPNs). The specific signaling properties of these neurons are



Figure 2. The Role of the Basal Ganglia in Action Control
(A) Schematic showing the hypothetical circuit influencing plasticity in intracellular signaling at direct and indirect spiny projection neurons (dSPNs and iSPNs, respectively) in the pDMS necessary for encoding specific action-outcome associations. The parafascicular thalamic (Pf) input provides state information altering activity at cholinergic interneurons (CINs) that, via low-threshold spiking interneurons (LTSIs), is biased in a manner to target plasticity at specific ensembles of SPNs. The Pf input generates burst-pause activity in CINs that ultimately encourages learning-related plasticity at dSPNs.
(B) The bilateral cortical basal ganglia circuit involving bilateral IT and unilateral PT input from the cortex to dSPNs and iSPNs in the striatum, with lateralized feedback to the mdT and cortex directly via the SNr or indirectly via the globus pallidus externa (GPe) and subthalamic nucleus (STN). Dopamine neurons in the SN pars compacta (SNc) exert a modulatory influence on activity in the striatum.
(C) The feedforward cortical-basal ganglia circuit (modified from Shepherd, 2013) showing corticocortical (CorCC), thalamocortical (CT), corticostriatal (CStr), and bilateral IT projections (for learning) and the ipsilateral PT projections to the striatum, basal ganglia, and brainstem/spinal cord (for performance).
(D) The potential influence of a biased dopamine (DA) input to pDMS on output for performance. Lateralized DA activity is hypothesized to encourage lateralized output via dSPNs and, therefore, action initiation targeted at changes in contralateral action space via a feedback circuit to the cortical PT neurons. The arkypallidal input from the GPe to the pDMS could provide an alternative or additional source of lateralized control by inhibiting output from contralateral pDMS.
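The opponent readout sketched in (B) and (D) can be caricatured in a few lines. The additive form and the dopamine modulation factor below follow the standard D1/D2 textbook account of the direct and indirect pathways; they are illustrative assumptions, not a model fitted in any of the studies reviewed here.

```python
def contralateral_drive(dspn, ispn, dopamine=0.0, k=0.5):
    """Toy opponent-pathway readout: net drive toward contralateral
    action space is the difference between direct-pathway (dSPN) and
    indirect-pathway (iSPN) output. Dopamine is modeled, per the
    standard D1/D2 account, as boosting dSPN and damping iSPN output;
    k is an illustrative modulation strength.
    """
    return dspn * (1.0 + k * dopamine) - ispn * (1.0 - k * dopamine)

balanced  = contralateral_drive(1.0, 1.0)                # no net bias
dspn_stim = contralateral_drive(1.5, 1.0)                # contralateral bias
ispn_stim = contralateral_drive(1.0, 1.5)                # ipsilateral bias
da_biased = contralateral_drive(1.0, 1.0, dopamine=1.0)  # lateralized DA input
```

On this caricature, raising dSPN output or lateralizing the dopamine input both push the drive toward contralateral action space, while raising iSPN output pushes it the other way, matching the pattern the figure describes.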

well known and described elsewhere (Gerfen and Surmeier, 2011); however, briefly, two populations of SPNs have been distinguished based on their outputs to the midbrain: the striatonigral or direct pathway SPNs (dSPNs), the activity of which is largely excitatory on behavior, and the striatopallidal or indirect pathway SPNs (iSPNs) that largely inhibit behavior. A variety of changes within the pDMS have been described as a consequence of goal-directed learning consistent with the broad functional influence of these neurons. Thus, for example, Shan et al. (2014) found that, in mice, the acquisition of goal-directed lever pressing induced plasticity in both dSPNs and iSPNs in the pDMS but that, importantly, these changes were in opposing directions; after learning, ex vivo assessment of AMPA/NMDA ratios found that these were increased at D1-SPNs and reduced at D2-SPNs.

The PL also targets distinct classes of interneuron in the pDMS that modulate SPN activity to affect plasticity. Cholinergic interneurons (CINs), for example, which are giant, aspiny, and regularly distributed throughout the striatum, where they make up about 2%–3% of the neurons, exert inhibitory control over plasticity when tonically active and provide windows for spike-timing-related changes in plasticity when they burst and then pause under the control of inputs from the parafascicular nucleus of the thalamus (Pf; Ding et al., 2010). The Pf input is, therefore, in a position to regulate plasticity at specific ensembles of dSPNs and iSPNs by altering the pattern of CIN activity from tonic to burst-pause (Tanimura et al., 2019; Figure 2A).

Other classes of interneuron likely play a role in this process through larger network influences; e.g., the GABAergic, somatostatin-positive, low-threshold spiking interneurons (LTSIs) appear to act within the DMS to dampen activity in the prefronto-striatal pathway in a manner specific to internal state or context (Fino et al., 2018). Consistent with this claim, inhibition of LTSIs has been argued to be necessary for goal-directed learning (Holly et al., 2019), although in this study no direct assessment of goal-directed control was used. Nevertheless, while affecting initial acquisition, dampening LTSI activity did not affect performance after a change in contingency, suggesting that the LTSI-related circuits controlling initial and subsequent performance differ. This would be the case if, as has been argued (Bradfield et al., 2013), such changes alter the SPN ensembles engaged in performance after a change in state. In accord with this suggestion, LTSIs can influence CIN activity (Elghaba et al., 2016) and appear to do so in a manner that biases CIN-related networks (Sullivan et al., 2008; Figure 2A).

These local circuits are positioned to reduce interference between existing plasticity and that induced by encoding new action-outcome contingencies. Bradfield et al. (2013) used an outcome reversal paradigm to examine the effect of disconnecting the Pf-pDMS circuit and found that, whereas this disconnection had no detectable effect on the initial encoding of specific action-outcome associations in naive rats, it powerfully affected encoding when outcome identity was reversed. This was not due simply to a loss of plasticity; if rats were given sufficient training




on these new contingencies, then they could acquire them, and, indeed, consistent with retroactive interference, if a 3-week delay was provided between reversal and test, rats could eventually recover the initial contingencies too (Bradfield and Balleine, 2017). These effects depend on CINs in the pDMS; not only does Pf-pDMS disconnection alter CIN activity, but similar effects were found when CINs were ablated either chemogenetically or by normal aging (Matamales et al., 2016).

These data suggest that local circuits within the pDMS integrate the bilateral prefrontal and midline thalamic inputs to encode new action-outcome associations in specific ensembles of SPNs in a state- or context-dependent manner. Whether the Pf input to pDMS is also required to reconstruct those states for retrieval is unknown; nevertheless, inputs to the pDMS from the ventral and lateral orbitofrontal cortices (vlOFC) have recently been argued to be involved (Stalnaker et al., 2016; Farovik et al., 2015). Supporting this view, although vlOFC damage affects neither the learning nor the retrieval of goal-directed actions per se (Balleine et al., 2011; Lichtenberg et al., 2017; Ostlund and Balleine, 2007; Panayi and Killcross, 2018), failures of state-dependent retrieval have been reported after vlOFC inactivation (Gremel and Costa, 2013; Parkes et al., 2018).

The Output of Dorsomedial Striatum and Its Contribution to Performance
Given the above findings, it is clear that, in addition to its role in learning, the pDMS also plays an important role in retrieving action-outcome information for performance, likely through its output to the larger basal ganglia circuit (Peak et al., 2019; Yin et al., 2008; Figure 2B). Importantly, although the plasticity in the pDMS necessary to encode these associations is broadcast bilaterally via the bilateral prefronto-striatal circuit, what evidence there is suggests that the striatal output for performance is, nevertheless, hemispherically lateralized. This literature is

In line with prior findings, stimulating dSPNs biased choice toward the port contralateral to stimulation, whereas stimulating iSPNs produced an ipsilateral bias. More importantly, this response bias was found to interact with reward history; mice showed a greater degree of bias after unrewarded trials (i.e., in the face of greater response variability) but a reduced bias after rewarded trials when, presumably, the value of the response was sufficient to overcome the bias produced by stimulation. The authors argued, therefore, that increasing dSPN output inflates, whereas increasing iSPN output deflates, the anticipated value of the contralateral action. Others have argued similarly, although they have claimed a more specific role for dSPN activity in "chosen value" based on the relatively larger changes observed in post-choice activity (Kim et al., 2009; Nonomura et al., 2018).

The suggestion that pDMS output is necessary for performance rather than learning is also consistent with recordings taken from the terminals of VTA/SNc dopamine neurons in the DMS during a dynamic choice discrimination task (Lee et al., 2019; Parker et al., 2016). In this kind of task, the distribution of performance across two levers matches the reward distribution on those levers, and matching reflects underlying learning, at least with respect to how different levers pay off and how those payoffs change. However, reflecting recognition of a division between learning and performance, the dopamine response associated with the chosen action did not map onto reward prediction errors for learning but rather was related to the bias in performance toward contralateral action space. Although the value of the chosen action was reflected in the GCaMP signal—being somewhat larger when the chosen value of the previous choice was larger—it was much more clearly modulated by the side of the choice, showing up to 900% greater activity when the chosen action was contralateral to the recorded dopamine terminals than when ipsilateral to those recordings.
complex; unfortunately, many studies examining output activity Together, these data suggest, therefore, that performance-
have focused unilaterally or have avoided looking at functional related activity in the DMS reflects the lateralization of per-
behavior and opted instead to examine the effects of dSPN formance.
and iSPN manipulations on spontaneous movement and mostly There is much still to learn about the role of the DMS in the per-
in the anterior DMS. Among those few studies examining the in- formance of goal-directed action. Whether the influence of dSPN
fluence of learning on performance, Kravitz et al. (2012) reported and iSPN activity on chosen value holds generally to be true, and
that mice biased responding toward a lever delivering optoge- it is quite possible that it may not (see Elber-Dorozko and Loe-
netic stimulation of dSPNs in the DMS, sensitive to variations wenstein, 2018 for a critical reinterpretation), it appears reason-
in the lever-stimulation contingency, and learned to avoid DMS able to conclude that the output of the DMS exerts its most
iSPN stimulation. Thus, action initiation and inhibition were consistent influence on learned actions and that, unlike its in-
driven by stimulation of direct and indirect pathway neurons, puts, these outputs are lateralized, influencing performance
respectively, consistent with traditional models of the basal contralateral to the hemisphere of that output. Finally, it is inter-
ganglia. Nevertheless, other studies suggest that the manipula- esting to consider how the output of one hemisphere is favored
tion of both DMS output pathways can sometimes bias instru- over the other to control performance (Figures 2C and 2D). One
mental performance in similar ways. Carvalho Poyraz et al. possibility lies in internal control of striatal output through iSPN
(2016) found that using a hM4 DREADD to inhibit iSPNs in the regulation of dSPN activity (Taverna et al., 2008), something
DMS increased the initiation of a lever hold down response for likely to (1) influence feedback via the PT input to pDMS as
reward, albeit with reduced efficiency. well as brain stem and spinal cord (Figure 2C) and (2) be affected
Over and above these findings, several studies have reported by dopamine release in the DMS, consistent with the effects re-
that prior learning alters the ability of dSPN and iSPN stimulation ported by Lee et al. (2019) and with the bidirectional influence of
to drive an instrumental response. For example, Tai et al. (2012) dopamine on dSPN and iSPN signaling (Gerfen and Surmeier,
gave unilateral stimulation of dSPNs and iSPNs in the DMS dur- 2011; Figure 2D). Additionally, but more speculatively, there is
ing a probabilistic switching task when mice were required to recent evidence that feedback inhibition to the striatum from
make a nose poke for reward. Generally, the mice developed a the external globus pallidus (GPe) shapes striatal output. A spe-
win-stay, lose-shift strategy, the latter with greater variability. cific population of enkephalin-positive, GABAergic projection

52 Neuron 104, October 9, 2019



Figure 3. The Feedback Processes Mediating Goal-Directed and Habitual Action
(A) Summary of the circuits and circuit functions underlying goal-directed and habitual action.
(B) The essential features of the circuits controlling the incentive processes mediating the experienced and predicted values of the instrumental outcome.
(C) The distinct neural circuits mediating the primary feedback processes underlying goal-directed and habitual actions (i.e., reward and reinforcement, respectively), adapted from Balleine (2018).
(D) The cellular changes in the nucleus accumbens shell that mediate the influence of predicted value on choice. The accumulation of delta opioid receptors (DORs) on the membrane of cholinergic interneurons (CINs) allows variations in enkephalin (ENK) release from dopamine D2-receptor-expressing SPNs to influence acetylcholine (ACh) release and the output activity of D1-receptor-expressing SPNs through a non-traditional basal ganglia output circuit involving the medial ventral pallidum (VPm) and mdT (see Zahm, 1999). Modified from Laurent et al. (2014).
GPi, globus pallidus interna; VA-VL, ventral anterior and ventrolateral thalamus; mOFC, medial orbitofrontal cortex; vlOFC, ventral and lateral OFC; IL, infralimbic cortex; NACsh, nucleus accumbens shell; NACco, nucleus accumbens core; VP-l, lateral ventral pallidum; BNST, bed nucleus of the stria terminalis; ARAS, ascending reticular activating system.

neurons in the GPe, the arkypallidal neurons, provides a source of inhibition to the striatum, and particularly to the DMS, sufficient to affect action selection (Mallet et al., 2016). And, as this input is lateralized, it could exert differential hemispheric control, although the psychological basis for this influence on choice and how it is achieved within the broader circuit remains to be established.

Differentiating Goal-Directed and Habitual Action Control in the Brain

As intimated above, perhaps the most important factor differentiating goal-directed from reflexive actions is their goal: whereas the goal of the former is to change the environment, that of the latter is a specific motor movement. These latter forms of movement are commensurate with habits (Graybiel, 2008) (i.e., actions elicited by antecedent stimuli rather than their consequences). As with sensorimotor learning generally, these movements have dissociable components involving a flexible, largely trajectory-related topography that can be rapidly refined and a slower learned component that involves incremental improvement (Figure 1B; Huberdeau et al., 2015; Wolpert and Flanagan, 2016).

At a neural level, habitual actions have long been ascribed to dorsal striatum and, more recently, to the cortical-basal-ganglia circuit centered on the dorsolateral striatum (DLS; see Balleine and O'Doherty, 2010 for a review). In the context of the findings described above, it is now generally agreed that goal-directed and habitual actions involve distinct forms of action control instantiated in anatomically distinct but interacting circuits. This distinction also helps explain how instrumental actions can be acquired without a goal-directed system and why, when they are acquired, they are insensitive to changes in the action-outcome contingency and outcome value (Figure 3A). The obvious prediction from this account—that treatments affecting DLS should render otherwise habitual actions goal directed—was tested early on, and lesion or inactivation of the DLS was found to increase the rats' sensitivity to outcome devaluation


(Yin et al., 2004) and contingency degradation (Yin et al., 2006). Interestingly, manipulations that silence the DLS have also been found to improve the performance of complex conditional discriminations where competing action-outcome and S-R solutions can sometimes conflict (Bergstrom et al., 2018; Bradfield and Balleine, 2013).

Examining plasticity in the DLS associated with habit learning has been made difficult by the precision of the timing of the various cortical inputs, as well as subtle distinctions in the role of neuromodulation in both the reinforcing and movement-related activity of midbrain dopamine inputs (Klaus et al., 2019). It is clear that dopamine activity in DLS is necessary for habit learning; 6-OHDA lesions of the dopaminergic projection from the lateral substantia nigra pars compacta to the DLS attenuate habit learning and increase goal-directed control (Faure et al., 2005), consistent with an earlier claim that dopamine activity in the DLS mediates the reinforcement signal (Reynolds et al., 2001). Studies of spike-timing-dependent plasticity in the DLS suggest that the conjunction of presynaptic, high-frequency cortical stimulation, postsynaptic depolarization, plus dopamine can produce D1-receptor-dependent long-term potentiation (Wickens et al., 1996). However, for self-paced actions like lever pressing, the reinforcement signal comes after the action delivering the reinforcing event, and, as a consequence, any related dopamine release must come well after the sensory-motor events that generate habitual action. It has been hypothesized, therefore, that a form of internal eligibility trace bridges this gap (Izhikevich, 2007), allowing subsequent phasic dopamine release to reinforce synapses tagged by prior activity.

An important recent study by Shindou et al. (2019) has provided the necessary details of this account, finding evidence for the formation of a rapid and silent eligibility trace driven by pairing presynaptic stimulation and postsynaptic depolarization and dependent on the transient synaptic potentiation of calcium-permeable AMPA receptors. Importantly, Shindou et al. found that this potentiation can be converted into a longer-lasting increase in AMPA receptor function by dopamine, acting via D1 receptors, when it is applied 2 s after eligibility trace-related activity. This effect of precisely timed phasic dopamine release at DLS targets provides the first clear evidence for a functionally appropriate reinforcement signal in the DLS. The timing of this reinforcement signal is interesting and suggests that it is subject to considerable pre-processing. In line with this, activity of both infralimbic cortex (Killcross and Coutureau, 2003; Smith and Graybiel, 2013) and amygdala central nucleus (Lingawi and Balleine, 2012) has been implicated in the acquisition and performance of habits, perhaps via interactions between these structures (Gourley and Taylor, 2016) and between amygdala and the DLS (Lingawi and Balleine, 2012).

In addition to this dopamine input, the DLS receives massive bilateral, divergent, and convergent projections from sensorimotor and motor cortices (McGeorge and Faull, 1989) that are broadly but imperfectly somatotopically organized (Brown et al., 1998; Klaus et al., 2017). Sensory information is also conveyed to DLS from the thalamus (Alloway et al., 2017), particularly the Pf and centromedial nuclei, which appear to be related through a larger modulatory circuit to the DLS cortical inputs (Mandelbaum et al., 2019). Many of these inputs are involved in generating what are essentially unconditioned responses—orienting, grooming, whisking, and so on—some of which can be modified by conditioning (Holland, 1977). Other inputs are clearly involved in learned actions through the encoding of various skills or the rendering of previously acquired goal-directed actions habitual by overtraining or other training conditions. Although elements of skill learning have been argued to dissociate from habit (e.g., changes in response speed continue after behavior has become habitual) (Garr and Delamater, 2019; Hardwick et al., 2018), they clearly have common features, not least of which is their dependency on motor corticostriatal circuits and the tendency of both skills and habits to involve the concatenation of diverse responses into chunked chains of actions (Graybiel, 2008). Indeed, an important basis for changes in action control is the tendency of actions incidentally to associate with other actions to form open-loop sequences (Dezfouli and Balleine, 2012; Halbout et al., 2019).

It is interesting to note that, in similar fashion to the limited involvement of the prelimbic cortex in encoding action-outcome associations, the development of the learning processes in DLS associated with skills and habits appears also to require (albeit for longer but nevertheless) only temporary input from the motor cortex and other neocortical motor areas (Kupferschmidt et al., 2017). Lesion studies suggest that motor behavior can recover after massive lesions of the motor regions of neocortex with near-perfect kinematics. For example, Kawai et al. (2015) examined the effects of bilateral motor cortex damage on a task in which thirsty rats had to press a lever twice in a defined temporal sequence to earn water. They were given extensive training before motor cortex lesions and, remarkably, showed no effect of the lesion; performance returned to the same kinematics acquired pre-lesion. In contrast, a group of rats given lesions of motor cortex prior to training acquired a similar temporal interval between lever presses and average reward rates but could not generate anywhere near the same degree of temporal precision or invariance in motor kinematics.

There may be quite close similarities too in the local circuits in the striatum mediating plasticity and in partitioning new and existing learning in a manner that avoids interference with stimulus control. Recent research suggests that the Pf inputs to both iSPNs and CINs in the DLS are organized similarly to those in the DMS and may play a similar kind of role, although for distinct functions (Mandelbaum et al., 2019). Nevertheless, it appears the broader circuit differs with regard to the role of specific interneurons; whereas LSTIs appear to be the more critical in the DMS, parvalbumin-positive fast-spiking interneurons (FSIs) appear to play that role in the DLS (Fino et al., 2018; Monteiro et al., 2018). Furthermore, although the cortical input to DLS is bilateral, unlike the DMS, learning-induced plasticity is lateralized and contralateral to subsequent performance. For example, Xiong et al. (2015) trained rats on a simple auditory discrimination task in which auditory frequency signaled which of two ports to the left and the right of a center port delivered a water reward. They found that the plasticity in the auditory cortical inputs to striatum necessary to encode, say, low frequency/left port and high frequency/right port was topographically organized in the contralateral hemisphere (i.e., modifying a left response required plasticity at the inputs to right DLS from right auditory cortex, whereas


modifying a right response required plasticity at left auditory cortical inputs to left DLS). Similar results have been reported in a response sequence task where the NR2A/B subunit ratio in the striatum contralateral to the trained limb decreased during skill acquisition, optimizing the threshold for inducing subsequent synaptic plasticity (Kent et al., 2013).

In interpreting their results, Xiong et al. (2015) proposed that their effects might be mediated by plasticity predominantly at dSPNs in the DLS based on the findings of Tai et al. (2012), which, as discussed above, assessed the effects of dSPN stimulation in the DMS, not DLS, and on performance, not learning. Nevertheless, in addition to the findings of Shindou et al. (2019) described above, there are other reports of dSPN plasticity in the DLS for skill learning (Yin et al., 2009) and habits (O'Hare et al., 2016). Other studies have reported learning-related reductions in iSPN involvement. Sommer et al. (2014) reported a progressive reduction in dopamine D2 receptor binding in DLS after extended training on the accelerating rotarod, whereas Shan et al. (2015) found reduced spontaneous excitatory postsynaptic current (sEPSC) amplitude at DLS iSPNs after overtraining mice to press a lever for a sucrose reinforcer. This latter study also found evidence of postsynaptic changes at capsaicin-sensitive TRPV1 channels on iSPNs that, together with the D2 agonist quinpirole, occluded the iSPN depression induced by overtraining, and, indeed, TRPV1 knockout mice showed reduced habit learning (see also Hilário et al., 2007).

Over and above learning-related plasticity in DLS, considerable work has established the involvement of DLS in performance and particularly the role of the dSPN and iSPN output pathways in repetitive, movement-related activity. The results of these studies have been the subject of substantial recent reviews and will not be detailed here (see Klaus et al., 2019). However, it is worth noting that although, like the DMS, the DLS is biased in its output to the basal ganglia, unlike the DMS, activity appears to increase in both the dSPN and iSPN output pathways simultaneously (Cui et al., 2013) or near simultaneously (O'Hare et al., 2016) during performance, consistent with the goal of developing the invariant motor movement necessary for habitual motor control by maintaining effector stability.

Process Dissociation in the Ventral Striatum: Experienced versus Predicted Values

The circuits described above are necessary for goal-directed action but are not sufficient, and it is easy to see why. Although the encoding of action-outcome information is an essential factor in goal-directed control, any information of the form "Action A leads to Outcome O" can be used both to perform A and to avoid performing A. What determines whether we perform A, or choose A rather than B, is the relative value of the consequences of these actions.

Importantly, it is now clear that the influence of value on the performance of goal-directed actions depends on two forms of evaluative (or incentive) processes that have dissociable behavioral, psychological, and neural determinants and that influence distinct types of decision making: (1) experienced reward values, which reflect the motivational and emotional experiences induced during direct exposure to a goal or outcome (e.g., when consuming a particular food or fluid) for value-based decisions (Noonan et al., 2012; Rangel et al., 2008), and (2) predicted reward values, which reflect the anticipated affective or sensory-specific features of an outcome and inform stimulus-based decisions (Cartoni et al., 2016; Corbit and Balleine, 2016). The behavioral and neural determinants of these forms of reward value have been reviewed elsewhere (Balleine, 2001; Dickinson and Balleine, 2002; Balleine, 2005; Balleine and Killcross, 2006; Balleine and O'Doherty, 2010; O'Doherty et al., 2017; Parkinson et al., 2000); however, there are points worth making regarding the way these values are acquired and function subsequently to influence performance, particularly with regard to the circuits on which they are based.

The influence of these incentive learning processes is largely the function of cortical-basal ganglia circuits centered on the ventral striatum, where inputs from cortex combine with ascending inputs from brainstem and hypothalamic targets directly and via heavily processed inputs from the amygdala, ventral hippocampus, thalamus, and midbrain dopaminergic afferents (Sesack and Grace, 2010). Based on experiments investigating the functions of these various inputs, it has become clear that distinct neural networks mediate experienced and predicted values within this circuit, distinguished, at least in part, by their differential control of modulatory processes in the NACco and nucleus accumbens shell (NACsh) (Figure 3B).

Evidence suggests that the experienced value of the instrumental outcome is determined by exposure to the relationship between the outcome and an emotional response (whether that response is hedonically positive or negative). This experience alters responses to the outcome itself but, more importantly, also alters its value as a goal and so the tendency to perform actions associated with that goal, a phenomenon also called instrumental incentive learning (reviewed in Balleine, 2001; Dickinson and Balleine, 2002). The basolateral amygdala (BLA) is critical to the development of this value signal. Here, the motivational and affective processing in brainstem and hypothalamic nuclei is associated with the specific sensory aspects of the outcome to generate the evaluative associations necessary to encode outcome value (reviewed in Balleine and Killcross, 2006; Figure 3C). Manipulations of the BLA that alter the processing of this sensory or affective information block the changes in value induced by outcome revaluation treatments (Wassum et al., 2009). Nevertheless, in line with theories regarding the ascending representation of emotional information (LeDoux and Brown, 2017), the BLA doesn't appear to store changes in outcome value, which are broadcast to a larger incentive memory circuit involving the anterior insular cortex (aIC) (Livneh et al., 2017; Parkes and Balleine, 2013). The aIC controls the performance of action via direct and indirect inputs to the NACco (Parkes et al., 2015), the latter via ventromedial prefrontal cortical inputs centered on the medial OFC (Bradfield et al., 2015). Such values are often gated by internal states, particularly deprivation-induced states, likely generated via inputs to the accumbens from the ventral hippocampus and subiculum and modulated by BLA (Fanselow and Dong, 2010; Sesack and Grace, 2010).

In contrast, the predicted reward values necessary for the influence of Pavlovian conditioning on instrumental performance are determined by a parallel circuit involving, on the one hand, the NACco—for the general affective influence of such cues on


the rate of performance—and, on the other, the NACsh for outcome-specific predictions (Corbit and Balleine, 2016; Parkinson et al., 2000). Again, the amygdala is pivotal, now for associating stimuli with the sensory and affective features of instrumental outcomes to generate the predictive information necessary to alter action selection in, for example, studies of "Pavlovian-instrumental transfer" (Cartoni et al., 2016). These studies have established that the influence of the general affective (in this case, appetitive) significance of cues on instrumental performance is mediated by the central nucleus of the amygdala (CeA) and its indirect influence on the NACco. In contrast, the formation of outcome-specific Pavlovian predictions and their influence on choice depends on the BLA (Ostlund and Balleine, 2008) and its inputs to the NACsh (Shiflett and Balleine, 2010). In this latter case, there is now good evidence that these predictions involve BLA modulation of cholinergic processes in the shell, again mediated by CINs (Heath et al., 2018), although, importantly, both the input and output processes and the intrinsic circuitry of the NACsh differ markedly from the dorsal striatum (see Gonzales and Smith, 2015 for a thoroughgoing review of these distinctions). These stimulus-outcome relationships appear to be physically encoded in a G-protein-coupled-receptor-mediated cellular "memory" in the NACsh; delta opioid receptors on CINs in the shell accumulate on the membrane in a manner reflecting specific Pavlovian relationships and regulate the excitatory output from D1-expressing SPNs to medial ventral pallidum (VPm) and thence to the mdT when retrieved (Bertran-Gonzalez et al., 2013; Laurent et al., 2014; Leung and Balleine, 2015; Zahm, 1999; Figures 3B and 3D). The retrieval of these associations to guide choice involves cortical inputs to this circuit from infralimbic cortex (Keistler et al., 2015) and vlOFC (Lichtenberg et al., 2017), although how this output is integrated with dorsal circuits to retrieve specific actions is still a matter for conjecture (see Hart et al., 2014 for discussion; Figure 3B).

The role of the dopaminergic inputs to the accumbens from the VTA is important to the formation of Pavlovian predictions, although the involvement of these inputs is complex (Saunders et al., 2018; Sesack and Grace, 2010). For example, whereas dopaminergic activity in the NACsh is critical for encoding the specific stimulus-outcome associations necessary for outcome-specific transfer effects (Laurent et al., 2014), this appears related to local dopamine release and its influence on local SPN-related signaling processes rather than the activity of dopamine neurons that project to the NACsh (Saunders et al., 2018). In contrast, dopamine does not appear to be involved in establishing experienced values. Although it has become common to read of dopamine's "reward-related" effects, to date, dopaminergic manipulations have not been found to influence the encoding of the experienced reward value of the instrumental outcome (Wassum et al., 2011; reviewed in Yin et al., 2008).

Dopamine, specifically the phasic activity of dopamine neurons in VTA, does, of course, play a significant role in Pavlovian conditioning; dopamine neurons respond robustly to both unconditioned and conditioned stimuli, and their projections, particularly to the NACco, appear to be particularly relevant in this latter regard (Saunders et al., 2018; Mohebi et al., 2019). Furthermore, cue-elicited dopamine activity in the NACco can influence the vigor of goal-directed actions, although, despite a long history of studies examining midbrain dopamine activity in this regard, recent evidence suggests that this effect is, again, largely dependent on the ability of such cues to modulate dopamine release locally (Mohebi et al., 2019). Such conditioned sources of incentive motivation have been dissociated from the influence of outcome-specific predictions both neurally, at the level of the amygdala and the nucleus accumbens (Clark et al., 2012), particularly in conditioned reinforcement (Parkinson et al., 2000), and psychologically, within the distinction between general and specific transfer effects (Clark et al., 2012; Corbit and Balleine, 2016) and, in fact, have much in common with older models of incentive motivation (Bindra, 1974), recently reframed in terms of incentive salience (Berridge, 2007).

Generally, therefore, these examples of process dissociation provide the basis for claims that distinct incentive processes influence instrumental performance via independent ventral cortical-basal ganglia loops centered on distinct regions of the ventral striatum. How these loops are related to those involving dorsal striatum is an important open question. Although the role of experienced reward is clearly limited to goal-directed control, the ability of Pavlovian predictions to motivate actions regardless of the current value of their consequences points to their potential role in the motivation of both actions and habits (Balleine and Ostlund, 2007). This has implications for circuit integration and so for the structure of the larger network mediating action control.

The Larger Network: Implementing an Associative Architecture for Action Control in the Brain

An appreciation for the meaning of behavior has had a decisive role in establishing the circuits that mediate goal-directed action in the brain and in discriminating them from those supporting other capacities, notably conditioned reflexes, skills, and habits. Nevertheless, the complexity of the neural circuits described here has, of course, been massively understated; the dynamics of activity within the circuits that support this fundamental capacity are likely integrated into a broader network in ways not yet discovered and that we are only starting to appreciate. One approach to this complexity has advocated the collection of very large, multidimensional datasets in the hope that, on the basis of empirical and computational advances, structure will emerge (Bassett and Sporns, 2017; Genon et al., 2018; Gomez-Marin et al., 2014). This will only meaningfully be the case, however, if these "big data" approaches are constrained by conceptual architectures that actually map onto the functional capacities implemented by the brain. It is, therefore, important to consider how the various circuits described here are integrated.

Some time ago, Dickinson and Balleine (1993) advanced a broad conceptual architecture that modeled the implementation of the action selection, evaluation, and execution processes reflected in the reflexive and volitional forms of action control described here (see also Dickinson, 1994; Balleine and Ostlund, 2007; Figure 4A). More recently, Dickinson has developed this model as a cognitive representational system, making the case that constraining the basic associative learning and performance processes that underlie goal-directed action within architectures of this kind can generate a mechanism capable of implementing psychological rationality (Dickinson, 2012; Figure 4B).
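The division of labor that an architecture of this kind formalizes, with cached, stimulus-driven habit strengths operating alongside values computed online from action-outcome beliefs and current outcome values, can be illustrated with a toy simulation. This is a sketch of the general dual-controller idea only, not an implementation of the associative-cybernetic model itself; all contingencies, numbers, and names below are hypothetical:

```python
# Toy contrast between goal-directed and habitual control, illustrating why
# only the goal-directed controller is immediately sensitive to outcome
# devaluation. Purely illustrative; all values here are hypothetical.

def goal_directed_value(action, contingency, outcome_value):
    """Value computed online from action-outcome beliefs and current desires."""
    return sum(p * outcome_value[out] for out, p in contingency[action].items())

# Action-outcome "beliefs": P(outcome | action)
contingency = {
    "press_left":  {"pellet": 0.9},
    "press_right": {"sucrose": 0.9},
}

# Cached habit strengths, stamped in by past reinforcement; they carry no
# information about outcome identity, only about past response-reward pairings
habit_strength = {"press_left": 0.8, "press_right": 0.8}

# Devalue the pellet (e.g., by specific satiety) without further training
values_before = {"pellet": 1.0, "sucrose": 1.0}
values_after  = {"pellet": 0.0, "sucrose": 1.0}

for label, values in (("before", values_before), ("after", values_after)):
    gd = {a: round(goal_directed_value(a, contingency, values), 2)
          for a in contingency}
    print(label, "devaluation | goal-directed:", gd, "| habit:", habit_strength)
```

On this sketch, the computed value of the action earning the devalued pellet falls to zero at test, whereas the cached habit strengths remain unchanged until new feedback arrives, mirroring the insensitivity of habits to outcome devaluation described above.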

Figure 4. The Larger Circuit Mediating
Goal-Directed Action
(A) Schematic showing the relationship between the selection, evaluation, and execution processes in an integrated dual control model of goal-directed action.
(B) The associative-cybernetic model, adapted from Dickinson (2012), that implements the selection-evaluation-execution relationship within an architecture that embodies the practical inference underlying goal-directed action within a representational system.
(C) A broad plan showing the implementation of the associative-cybernetic architecture in the broader cortical-basal ganglia network, progressively linking cingulate and neocortical motor cortices for action control via feedback through successive cognitive/selection, emotional/evaluative, and motor/executive feedback loops.
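The practical inference that panel (B) refers to (combining the belief that an action leads to an outcome with a desire for that outcome to yield an intention) can be stated in a few lines. This is an illustrative sketch only; the associative-cybernetic model is not a symbolic inference engine, and the action and outcome names below are hypothetical:

```python
# Minimal rendering of the practical inference underlying goal-directed
# action: "I intend A because I believe A leads to O and I desire O."
# Illustrative only; the names are hypothetical.

beliefs = {"lever_press": "food_pellet", "chain_pull": "water"}  # A -> O
desires = {"food_pellet"}  # outcomes currently valued

# An intention follows for any action believed to produce a desired outcome
intentions = [action for action, outcome in beliefs.items() if outcome in desires]
print(intentions)  # ['lever_press']
```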

Perhaps the simplest kind of cognitive explanation of goal-directed control is that framed in terms of a practical inference (i.e., ‘‘I do action A because I desire O and believe that A will get me O’’), an explanation that identifies a belief (A will get me O), a desire (for O), and, as a consequence, an intention (implement A). It is important to appreciate that this inference process is not explicit in the model but emerges from the way in which the architecture of the model constrains the interaction between its associative components, as illustrated in Figure 4B. And, by analogy, the same should be true of the current description of the complex circuitry that mediates goal-directed control. If the implementation of associative processes in cognitive architectures can provide a way for these kinds of process to be implemented in our psychology, then establishing how such architectures are implemented in the brain will provide the means to advance our understanding of how the psychological determinants of goal-directed action are supported in the neural circuitry described here.

A key feature of this architecture is collaboration between habitual and goal-directed action controllers, something that may appear to conflict with the general view that these action control systems exert an independent and competing influence. It is, in many ways, a very brain-centric view to consider parts of the brain as operating separately and independently, and, in fact, when considered functionally, actions and habits can often emerge as integrative rather than necessarily competing processes (Dezfouli and Balleine, 2012). Nevertheless, these arguments often miss the point; in terms of initial selection, it is often stimuli that provoke urges to act that are then checked or favored through reflection on their consequences and the value of those consequences (Haggard, 2008; Jackson et al., 2011). Likewise, as discussed above, stimuli predictive of specific outcomes can guide choice in a highly selective fashion via feedback through a ventral circuit (Figure 3B), initially engaged through a form of S-R process—in this case, better characterized as an ideomotor or outcome-response (O-R) association (Shin et al., 2010; Balleine and Ostlund, 2007)—resulting in action selection and execution through the performance elements of the goal-directed system.

Within the model, stimulus-based urges emerge from within the ‘‘habit system’’ (broadly defined) but are, at least in most situations and particularly for relatively undertrained actions, insufficient for performance. It is only with feedback via retrieval of previously acquired action-outcome associations and of their experienced incentive value (i.e., with the translation of these urges into what Dickinson (2012) calls an intention) that they are released in performance. And, as described above, there is evidence, for example, of state-related retrieval in vlOFC of the specific action-outcome associations necessary for performance and, in the vmPFC and particularly the mOFC (Bradfield et al., 2015, 2018), for the related retrieval of the experienced values that, through connections with the striatum, are in a position to drive value-based performance. The role of dopamine in movement execution, as distinct from learning, is likely critical here, both in shaping the specific output from dorsomedial striatum and in releasing specific movements via interaction between the associative and motor loops through the basal ganglia (Figure 4C).

Nevertheless, although it is starting to become clear how such an architecture could be implemented in the circuitry described above, the finer details of a selection-evaluation-execution network for performance remain elusive and constitute perhaps the most difficult open questions. Generally, these ideas raise the possibility that a fully integrated architecture mediating the distinct forms of action control and their feedback processes described here will ultimately emerge from a better understanding of how these various circuits interact through the cortico-cortical and thalamocortical circuits that modulate and shape the output of the neocortical motor systems. There is considerable independent research in this latter area (Arber, 2012; Douglas and Martin, 2004; Harris and Shepherd, 2015), and, although it has yet to be directly related to the functional circuitry
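The division of labor sketched above—a habit memory that proposes stimulus-driven urges and a goal-directed process that releases them only after retrieving the action-outcome association and the outcome's current incentive value—can be illustrated computationally. The sketch below is illustrative only: the class and variable names, the multiplicative gating rule, and the example stimuli and values are assumptions introduced here, not components specified by the associative-cybernetic model itself.

```python
from dataclasses import dataclass, field

@dataclass
class AssociativeCybernetic:
    # Habit memory: S-R strengths mapping (stimulus, action) -> urge to act.
    sr_strength: dict = field(default_factory=dict)
    # Goal-directed memory: action -> predicted outcome (the "belief").
    action_outcome: dict = field(default_factory=dict)
    # Incentive memory: outcome -> experienced value (the "desire").
    incentive_value: dict = field(default_factory=dict)

    def urge(self, stimulus: str, action: str) -> float:
        # A stimulus-based urge from the habit system; on its own it is
        # insufficient for performance.
        return self.sr_strength.get((stimulus, action), 0.0)

    def intention(self, action: str) -> float:
        # Feedback loop: retrieve the action-outcome association, then the
        # outcome's current incentive value.
        outcome = self.action_outcome.get(action)
        return self.incentive_value.get(outcome, 0.0)

    def choose(self, stimulus: str, actions: list) -> str:
        # An urge is released in performance only to the degree that the
        # retrieved outcome value endorses it ("I desire O and believe
        # that A will get me O").
        scores = {a: self.urge(stimulus, a) * self.intention(a) for a in actions}
        return max(scores, key=scores.get)

model = AssociativeCybernetic(
    sr_strength={("light", "press"): 0.9, ("light", "pull"): 0.6},
    action_outcome={"press": "pellet", "pull": "sucrose"},
    incentive_value={"pellet": 0.1, "sucrose": 1.0},  # pellet devalued
)
print(model.choose("light", ["press", "pull"]))  # prints "pull"
```

Note how the sketch reproduces the signature of goal-directed control: although the habit system's urge to press is stronger, devaluing the pellet shifts choice to the pull response, because performance is gated by the retrieved outcome value rather than by S-R strength alone.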


described here, from the current perspective, what will be required for progress in the near term is a concerted attempt to integrate these areas of endeavor.

ACKNOWLEDGMENTS

Support for this work was provided by a Senior Principal Research Fellowship from the NHMRC of Australia (GNT#1079561) and a grant from the Australian Research Council (DP150104878). The author would like to thank Tony Dickinson, J. Bertran-Gonzalez, Genevra Hart, and James Peak for their comments on a draft of the manuscript and the members of the Decision Neuroscience Lab at UNSW for the many discussions and debates that have contributed to this work.

REFERENCES

Adams, C.D. (1982). Variations in the Sensitivity of Instrumental Responding to Reinforcer Devaluation. Q. J. Exp. Psychol. B 34, 77–98.

Adams, C.D., and Dickinson, A. (1981). Instrumental Responding following Reinforcer Devaluation. Q. J. Exp. Psychol. B 33, 109–121.

Ahmed, A. (2014). Evidence, Decision and Causality (Cambridge University Press).

Ährlund-Richter, S., Xuan, Y., van Lunteren, J.A., Kim, H., Ortiz, C., Pollak Dorocic, I., Meletis, K., and Carlén, M. (2019). A whole-brain atlas of monosynaptic input targeting four different cell types in the medial prefrontal cortex of the mouse. Nat. Neurosci. 22, 657–668.

Alloway, K.D., Smith, J.B., Mowery, T.M., and Watson, G.D.R. (2017). Sensory Processing in the Dorsolateral Striatum: The Contribution of Thalamostriatal Pathways. Front. Syst. Neurosci. 11, 53.

Arber, S. (2012). Motor circuits in action: specification, connectivity, and function. Neuron 74, 975–989.

Baker, A., Kalmbach, B., Morishima, M., Kim, J., Juavinett, A., Li, N., and Dembrow, N. (2018). Specialized Subpopulations of Deep-Layer Pyramidal Neurons in the Neocortex: Bridging Cellular Properties to Functional Consequences. J. Neurosci. 38, 5441–5455.

Balleine, B.W. (2001). Incentive processes in instrumental conditioning. In Handbook of Contemporary Learning Theories, R. Klein and S. Mowrer, eds. (LEA), pp. 307–366.

Balleine, B.W. (2005). Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol. Behav. 86, 717–730.

Balleine, B.W. (2018). The motivation of action and the origin of reward. In Goal-Directed Decision Making: Computations and Neural Circuits, R. Morris, A. Borstein, and A. Shenhav, eds. (Elsevier), pp. 429–455.

Balleine, B.W., and Dickinson, A. (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407–419.

Balleine, B.W., and Killcross, S. (2006). Parallel incentive processing: an integrated view of amygdala function. Trends Neurosci. 29, 272–279.

Balleine, B.W., and O’Doherty, J.P. (2010). Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35, 48–69.

Balleine, B.W., and Ostlund, S.B. (2007). Still at the choice-point: action selection and initiation in instrumental conditioning. Ann. N.Y. Acad. Sci. 1104, 147–171.

Balleine, B.W., Delgado, M.R., and Hikosaka, O. (2007). The role of the dorsal striatum in reward and decision-making. J. Neurosci. 27, 8161–8165.

Balleine, B.W., Liljeholm, M., and Ostlund, S.B. (2009). The integrative function of the basal ganglia in instrumental conditioning. Behav. Brain Res. 199, 43–52.

Balleine, B.W., Leung, B.K., and Ostlund, S.B. (2011). The orbitofrontal cortex, predicted value, and choice. Ann. N.Y. Acad. Sci. 1239, 43–50.

Bassett, D.S., and Sporns, O. (2017). Network neuroscience. Nat. Neurosci. 20, 353–364.

Bergstrom, H.C., Lipkin, A.M., Lieberman, A.G., Pinard, C.R., Gunduz-Cinar, O., Brockway, E.T., Taylor, W.W., Nonaka, M., Bukalo, O., Wills, T.A., et al. (2018). Dorsolateral Striatum Engagement Interferes with Early Discrimination Learning. Cell Rep. 23, 2264–2272.

Berridge, K.C. (2007). The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology (Berl.) 191, 391–431.

Bertran-Gonzalez, J., Laurent, V., Chieng, B.C., Christie, M.J., and Balleine, B.W. (2013). Learning-related translocation of δ-opioid receptors on ventral striatal cholinergic interneurons mediates choice between goal-directed actions. J. Neurosci. 33, 16060–16071.

Bindra, D. (1974). A motivational view of learning, performance, and behavior modification. Psychol. Rev. 81, 199–213.

Bradfield, L.A., and Balleine, B.W. (2013). Hierarchical and binary associations compete for behavioral control during instrumental biconditional discrimination. J. Exp. Psychol. Anim. Behav. Process. 39, 2–13.

Bradfield, L.A., and Balleine, B.W. (2017). Thalamic Control of Dorsomedial Striatum Regulates Internal State to Guide Goal-Directed Action Selection. J. Neurosci. 37, 3721–3733.

Bradfield, L.A., Bertran-Gonzalez, J., Chieng, B., and Balleine, B.W. (2013). The thalamostriatal pathway and cholinergic control of goal-directed action: interlacing new with existing learning in the striatum. Neuron 79, 153–166.

Bradfield, L.A., Dezfouli, A., van Holstein, M., Chieng, B., and Balleine, B.W. (2015). Medial Orbitofrontal Cortex Mediates Outcome Retrieval in Partially Observable Task Situations. Neuron 88, 1268–1280.

Bradfield, L.A., Hart, G., and Balleine, B.W. (2018). Inferring action-dependent outcome representations depends on anterior but not posterior medial orbitofrontal cortex. Neurobiol. Learn. Mem. 155, 463–473.

Brown, R.G., and Pluck, G. (2000). Negative symptoms: the ‘pathology’ of motivation and goal-directed behaviour. Trends Neurosci. 23, 412–417.

Brown, L.L., Smith, D.M., and Goldbloom, L.M. (1998). Organizing principles of cortical integration in the rat neostriatum: corticostriate map of the body surface is an ordered lattice of curved laminae and radial points. J. Comp. Neurol. 392, 468–488.

Cartoni, E., Balleine, B., and Baldassarre, G. (2016). Appetitive Pavlovian-instrumental Transfer: A review. Neurosci. Biobehav. Rev. 71, 829–848.

Carvalho Poyraz, F., Holzner, E., Bailey, M.R., Meszaros, J., Kenney, L., Kheirbek, M.A., Balsam, P.D., and Kellendonk, C. (2016). Decreasing Striatopallidal Pathway Function Enhances Motivation by Energizing the Initiation of Goal-Directed Action. J. Neurosci. 36, 5988–6001.

Chang, S.E., and Smith, K.S. (2016). An omission procedure reorganizes the microstructure of sign-tracking while preserving incentive salience. Learn. Mem. 23, 151–155.

Clark, J.J., Hollon, N.G., and Phillips, P.E. (2012). Pavlovian valuation systems in learning and decision making. Curr. Opin. Neurobiol. 22, 1054–1061.

Collins, D.P., Anastasiades, P.G., Marlin, J.J., and Carter, A.G. (2018). Reciprocal Circuits Linking the Prefrontal Cortex with Dorsal and Ventral Thalamic Nuclei. Neuron 98, 366–379.e4.

Colwill, R.M., and Rescorla, R.A. (1985). Postconditioning devaluation of a reinforcer affects instrumental responding. J. Exp. Psychol. Anim. Behav. Process. 11, 120–132.

Colwill, R.M., and Rescorla, R.A. (1986). Associative structures in instrumental learning. Psychol. Learn. Motiv. 20, 55–104.

Colwill, R.M., and Rescorla, R.A. (1990). Effect of reinforcer devaluation on discriminative control of instrumental behavior. J. Exp. Psychol. Anim. Behav. Process. 16, 40–47.

Cooper, R.P., and Peebles, D. (2015). Beyond single-level accounts: the role of cognitive architectures in cognitive scientific explanation. Top. Cogn. Sci. 7, 243–258.

Corbit, L.H., and Balleine, B.W. (2016). Learning and Motivational Processes Contributing to Pavlovian-Instrumental Transfer and Their Neural Bases: Dopamine and Beyond. Curr. Top. Behav. Neurosci. 27, 259–289.

Corbit, L.H., Muir, J.L., and Balleine, B.W. (2001). The role of the nucleus accumbens in instrumental conditioning: Evidence of a functional dissociation between accumbens core and shell. J. Neurosci. 21, 3251–3260.

Cui, G., Jun, S.B., Jin, X., Pham, M.D., Vogel, S.S., Lovinger, D.M., and Costa, R.M. (2013). Concurrent activation of striatal direct and indirect pathways during action initiation. Nature 494, 238–242.

Davey, G.C.L., and Cleland, G.G. (1984). Food anticipation and lever-directed activities in rats. Learn. Motiv. 15, 12–36.

Davey, G.C.L., Oakley, D., and Cleland, G.G. (1981). Autoshaping in the rat: Effects of omission on the form of the response. J. Exp. Anal. Behav. 36, 75–91.

Davis, J., and Bitterman, M.E. (1971). Differential reinforcement of other behavior (DRO): a yoked-control comparison. J. Exp. Anal. Behav. 15, 237–241.

Dezfouli, A., and Balleine, B.W. (2012). Habits, action sequences and reinforcement learning. Eur. J. Neurosci. 35, 1036–1051.

Dickinson, A. (1994). Instrumental conditioning. In Animal Cognition and Learning, N.J. Mackintosh, ed. (Academic Press), pp. 4–79.

Dickinson, A. (2012). Associative learning and animal cognition. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 2733–2742.

Dickinson, A., and Balleine, B.W. (1993). Actions and responses: the dual psychology of behaviour. In Spatial Representation: Problems in Philosophy and Psychology, N. Eilan, R.A. McCarthy, and B. Brewer, eds. (Blackwell Publishing), pp. 277–293.

Dickinson, A., and Balleine, B.W. (2002). The role of learning in motivation. In Steven’s Handbook of Experimental Psychology, Learning, Motivation, & Emotion, Third Edition, Volume 3, C.R. Gallistel, ed. (John Wiley & Sons), pp. 497–533.

Dickinson, A., and Charnock, D.J. (1985). Contingency Effects with Maintained Instrumental Reinforcement. Q. J. Exp. Psychol. B 37, 397–416.

Dickinson, A., and Mulatero, C.W. (1989). Reinforcer specificity of the suppression of instrumental performance on a non-contingent schedule. Behav. Processes 19, 167–180.

Dickinson, A., Squire, S., Varga, Z., and Smith, J.W. (1998). Omission Learning after Instrumental Pretraining. Q. J. Exp. Psychol. B 51, 271–286.

Ding, J.B., Guzman, J.N., Peterson, J.D., Goldberg, J.A., and Surmeier, D.J. (2010). Thalamic gating of corticostriatal signaling by cholinergic interneurons. Neuron 67, 294–307.

Dolan, R.J., and Dayan, P. (2013). Goals and habits in the brain. Neuron 80, 312–325.

Douglas, R.J., and Martin, K.A.C. (2004). Neuronal circuits of the neocortex. Annu. Rev. Neurosci. 27, 419–451.

Elber-Dorozko, L., and Loewenstein, Y. (2018). Striatal action-value neurons reconsidered. eLife 7, 7.

Elghaba, R., Vautrelle, N., and Bracci, E. (2016). Mutual Control of Cholinergic and Low-Threshold Spike Interneurons in the Striatum. Front. Cell. Neurosci. 10, 111.

Ellwood, I.T., Patel, T., Wadia, V., Lee, A.T., Liptak, A.T., Bender, K.J., and Sohal, V.S. (2017). Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies. J. Neurosci. 37, 8315–8329.

Everitt, B.J., and Robbins, T.W. (2016). Drug Addiction: Updating Actions to Habits to Compulsions Ten Years On. Annu. Rev. Psychol. 67, 23–50.

Fanselow, M.S., and Dong, H.-W. (2010). Are the dorsal and ventral hippocampus functionally distinct structures? Neuron 65, 7–19.

Faure, A., Haberland, U., Condé, F., and El Massioui, N. (2005). Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J. Neurosci. 25, 2771–2780.

Farovik, A., Place, R.J., McKenzie, S., Porter, B., Munro, C.E., and Eichenbaum, H. (2015). Orbitofrontal cortex encodes memories within value-based schemas and represents contexts that guide memory retrieval. J. Neurosci. 35, 8333–8344.

Fino, E., Vandecasteele, M., Perez, S., Saudou, F., and Venance, L. (2018). Region-specific and state-dependent action of striatal GABAergic interneurons. Nat. Commun. 9, 3339.

Flagel, S.B., Clark, J.J., Robinson, T.E., Mayo, L., Czuj, A., Willuhn, I., Akers, C.A., Clinton, S.M., Phillips, P.E.M., and Akil, H. (2011). A selective role for dopamine in stimulus-reward learning. Nature 469, 53–57.

Gabbott, P.L.A., Warner, T.A., Jays, P.R.L., Salway, P., and Busby, S.J. (2005). Prefrontal cortex in the rat: projections to subcortical autonomic, motor, and limbic centers. J. Comp. Neurol. 492, 145–177.

Garr, E., and Delamater, A.R. (2019). Exploring the relationship between actions, habits, and automaticity in an action sequence task. Learn. Mem. 26, 128–132.

Genon, S., Reid, A., Langner, R., Amunts, K., and Eickhoff, S.B. (2018). How to Characterize the Function of a Brain Region. Trends Cogn. Sci. 22, 350–364.

Gerfen, C.R., and Surmeier, D.J. (2011). Modulation of striatal projection systems by dopamine. Annu. Rev. Neurosci. 34, 441–466.

Gerfen, C.R., Paletzki, R., and Heintz, N. (2013). GENSAT BAC cre-recombinase driver lines to study the functional organization of cerebral cortical and basal ganglia circuits. Neuron 80, 1368–1383.

Gomez-Marin, A., Paton, J.J., Kampff, A.R., Costa, R.M., and Mainen, Z.F. (2014). Big behavioral data: psychology, ethology and the foundations of neuroscience. Nat. Neurosci. 17, 1455–1462.

Gonzales, K.K., and Smith, Y. (2015). Cholinergic interneurons in the dorsal and ventral striatum: anatomical and functional considerations in normal and diseased conditions. Ann. N.Y. Acad. Sci. 1349, 1–45.

Gourley, S.L., and Taylor, J.R. (2016). Going and stopping: Dichotomies in behavioral control by the prefrontal cortex. Nat. Neurosci. 19, 656–664.

Graybiel, A.M. (2008). Habits, rituals, and the evaluative brain. Annu. Rev. Neurosci. 31, 359–387.

Gremel, C.M., and Costa, R.M. (2013). Orbitofrontal and striatal circuits dynamically encode the shift between goal-directed and habitual actions. Nat. Commun. 4, 2264.

Haggard, P. (2008). Human volition: towards a neuroscience of will. Nat. Rev. Neurosci. 9, 934–946.

Halbout, B., Marshall, A.T., Azimi, A., Liljeholm, M., Mahler, S.V., Wassum, K.M., and Ostlund, S.B. (2019). Mesolimbic dopamine projections mediate cue-motivated reward seeking but not reward retrieval in rats. eLife 8, 8.

Hammond, L.J. (1980). The effect of contingency upon the appetitive conditioning of free-operant behavior. J. Exp. Anal. Behav. 34, 297–304.

Hardwick, R.M., Forrence, A.D., Krakauer, J.W., and Haith, A.M. (2018). Time-dependent competition between habitual and goal-directed response preparation. bioRxiv. https://doi.org/10.1101/201095.

Harris, K.D., and Shepherd, G.M.G. (2015). The neocortical circuit: themes and variations. Nat. Neurosci. 18, 170–181.

Hart, G., and Balleine, B.W. (2016). Consolidation of Goal-Directed Action Depends on MAPK/ERK Signaling in Rodent Prelimbic Cortex. J. Neurosci. 36, 11974–11986.

Hart, G., Leung, B.K., and Balleine, B.W. (2014). Dorsal and ventral streams: the distinct role of striatal subregions in the acquisition and performance of goal-directed actions. Neurobiol. Learn. Mem. 108, 104–118.

Hart, G., Bradfield, L.A., Fok, S.Y., Chieng, B., and Balleine, B.W. (2018a). The Bilateral Prefronto-striatal Pathway Is Necessary for Learning New Goal-Directed Actions. Curr. Biol. 28, 2218–2229.e7.

Hart, G., Bradfield, L.A., and Balleine, B.W. (2018b). Prefrontal Corticostriatal Disconnection Blocks the Acquisition of Goal-Directed Action. J. Neurosci. 38, 1311–1322.

Heath, E.M., Chieng, B., Christie, M.J., and Balleine, B.W. (2018). Substance P and dopamine interact to modulate the distribution of delta-opioid receptors on cholinergic interneurons in the striatum. Eur. J. Neurosci. 47, 1159–1173.

Hilário, M.R.F., Clouse, E., Yin, H.H., and Costa, R.M. (2007). Endocannabinoid signaling is critical for habit formation. Front. Integr. Neurosci. 1, 6.

Holland, P.C. (1977). Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. J. Exp. Psychol. Anim. Behav. Process. 3, 77–104.

Holland, P.C. (1979). Differential effects of omission contingencies on various components of Pavlovian appetitive conditioned responding in rats. J. Exp. Psychol. Anim. Behav. Process. 5, 178–193.

Holly, E.N., Davatolhagh, M.F., Choi, K., Alabi, O.O., Vargas Cifuentes, L., and Fuccillo, M.V. (2019). Striatal Low-Threshold Spiking Interneurons Regulate Goal-Directed Learning. Neuron 103, 92–101.e6.

Huberdeau, D.M., Krakauer, J.W., and Haith, A.M. (2015). Dual-process decomposition in human sensorimotor adaptation. Curr. Opin. Neurobiol. 33, 71–77.

Hunnicutt, B.J., Jongbloets, B.C., Birdsong, W.T., Gertz, K.J., Zhong, H., and Mao, T. (2016). A comprehensive excitatory input map of the striatum reveals novel functional organization. eLife 5, 5.

Izhikevich, E.M. (2007). Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb. Cortex 17, 2443–2452.

Jackson, S.R., Parkinson, A., Kim, S.Y., Schüermann, M., and Eickhoff, S.B. (2011). On the functional anatomy of the urge-for-action. Cogn. Neurosci. 2, 227–243.

Jacoby, L.L. (1991). A process dissociation framework: Separating automatic from intentional uses of memory. J. Mem. Lang. 30, 513–541.

Jezzini, A., Mazzucato, L., La Camera, G., and Fontanini, A. (2013). Processing of hedonic and chemosensory features of taste in medial prefrontal and insular networks. J. Neurosci. 33, 18966–18978.

Kawai, R., Markman, T., Poddar, R., Ko, R., Fantana, A.L., Dhawale, A.K., Kampff, A.R., and Ölveczky, B.P. (2015). Motor cortex is required for learning but not for executing a motor skill. Neuron 86, 800–812.

Keistler, C., Barker, J.M., and Taylor, J.R. (2015). Infralimbic prefrontal cortex interacts with nucleus accumbens shell to unmask expression of outcome-selective Pavlovian-to-instrumental transfer. Learn. Mem. 22, 509–513.

Kent, K., Deng, Q., and McNeill, T.H. (2013). Unilateral skill acquisition induces bilateral NMDA receptor subunit composition shifts in the rat sensorimotor striatum. Brain Res. 1517, 77–86.

Killcross, S., and Coutureau, E. (2003). Coordination of actions and habits in the medial prefrontal cortex of rats. Cereb. Cortex 13, 400–408.

Kim, H., Sul, J.H., Huh, N., Lee, D., and Jung, M.W. (2009). Role of striatum in updating values of chosen actions. J. Neurosci. 29, 14701–14712.

Klaus, A., Martins, G.J., Paixao, V.B., Zhou, P., Paninski, L., and Costa, R.M. (2017). The Spatiotemporal Organization of the Striatum Encodes Action Space. Neuron 95, 1171–1180.e7.

Klaus, A., Alves da Silva, J., and Costa, R.M. (2019). What, If, and When to Move: Basal Ganglia Circuits and Self-Paced Action Initiation. Annu. Rev. Neurosci. 42, 459–483.

Koechlin, E., Ody, C., and Kouneiher, F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science 302, 1181–1185.

Krakauer, J.W., Ghazanfar, A.A., Gomez-Marin, A., MacIver, M.A., and Poeppel, D. (2017). Neuroscience Needs Behavior: Correcting a Reductionist Bias. Neuron 93, 480–490.

Kravitz, A.V., Tye, L.D., and Kreitzer, A.C. (2012). Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nat. Neurosci. 15, 816–818.

Kupferschmidt, D.A., Juczewski, K., Cui, G., Johnson, K.A., and Lovinger, D.M. (2017). Parallel, but Dissociable, Processing in Discrete Corticostriatal Inputs Encodes Skill Learning. Neuron 96, 476–489.e5.

Laurent, V., Bertran-Gonzalez, J., Chieng, B.C., and Balleine, B.W. (2014). δ-opioid and dopaminergic processes in accumbens shell modulate the cholinergic control of predictive learning and choice. J. Neurosci. 34, 1358–1369.

LeDoux, J.E., and Brown, R. (2017). A higher-order theory of emotional consciousness. Proc. Natl. Acad. Sci. USA 114, E2016–E2025.

Lee, R.S., Mattar, M.G., Parker, N.F., Witten, I.B., and Daw, N.D. (2019). Reward prediction error does not explain movement selectivity in DMS-projecting dopamine neurons. eLife 8, 8.

Leung, B.K., and Balleine, B.W. (2015). Ventral pallidal projections to mediodorsal thalamus and ventral tegmental area play distinct roles in outcome-specific Pavlovian-instrumental transfer. J. Neurosci. 35, 4953–4964.

Lichtenberg, N.T., Pennington, Z.T., Holley, S.M., Greenfield, V.Y., Cepeda, C., Levine, M.S., and Wassum, K.M. (2017). Basolateral Amygdala to Orbitofrontal Cortex Projections Enable Cue-Triggered Reward Expectations. J. Neurosci. 37, 8374–8384.

Lingawi, N.W., and Balleine, B.W. (2012). Amygdala central nucleus interacts with dorsolateral striatum to regulate the acquisition of habits. J. Neurosci. 32, 1073–1081.

Livneh, Y., Ramesh, R.N., Burgess, C.R., Levandowski, K.M., Madara, J.C., Fenselau, H., Goldey, G.J., Diaz, V.E., Jikomes, N., Resch, J.M., et al. (2017). Homeostatic circuits selectively gate food cue responses in insular cortex. Nature 546, 611–616.

Mallet, N., Schmidt, R., Leventhal, D., Chen, F., Amer, N., Boraud, T., and Berke, J.D. (2016). Arkypallidal Cells Send a Stop Signal to Striatum. Neuron 89, 308–316.

Mandelbaum, G., Taranda, J., Haynes, T.M., Hochbaum, D.R., Huang, K.W., Hyun, M., Umadevi Venkataraju, K., Straub, C., Wang, W., Robertson, K., et al. (2019). Distinct Cortical-Thalamic-Striatal Circuits through the Parafascicular Nucleus. Neuron 102, 636–652.e7.

Matamales, M., Skrbis, Z., Hatch, R.J., Balleine, B.W., Götz, J., and Bertran-Gonzalez, J. (2016). Aging-Related Dysfunction of Striatal Cholinergic Interneurons Produces Conflict in Action Selection. Neuron 90, 362–373.

Maze, J.R. (1983). The Meaning of Behaviour (Allen & Unwin).

McGeorge, A.J., and Faull, R.L. (1989). The organization of the projection from the cerebral cortex to the striatum in the rat. Neuroscience 29, 503–537.

Miller, E.K., Lundqvist, M., and Bastos, A.M. (2018). Working Memory 2.0. Neuron 100, 463–475.

Mohebi, A., Pettibone, J.R., Hamid, A.A., Wong, J.T., Vinson, L.T., Patriarchi, T., Tian, L., Kennedy, R.T., and Berke, J.D. (2019). Dissociable dopamine dynamics for learning and motivation. Nature 570, 65–70.

Monteiro, P., Barak, B., Zhou, Y., McRae, R., Rodrigues, D., Wickersham, I.R., and Feng, G. (2018). Dichotomous parvalbumin interneuron populations in dorsolateral and dorsomedial striatum. J. Physiol. 596, 3695–3707.

Morrison, S.E., Bamkole, M.A., and Nicola, S.M. (2015). Sign Tracking, but Not Goal Tracking, is Resistant to Outcome Devaluation. Front. Neurosci. 9, 468.

Nagy, A.J., Takeuchi, Y., and Berényi, A. (2018). Coding of self-motion-induced and self-independent visual motion in the rat dorsomedial striatum. PLoS Biol. 16, e2004712.

Nakayama, H., Ibañez-Tallon, I., and Heintz, N. (2018). Cell-Type-Specific Contributions of Medial Prefrontal Neurons to Flexible Behaviors. J. Neurosci. 38, 4490–4504.

Naneix, F., Marchand, A.R., Di Scala, G., Pape, J.-R., and Coutureau, E. (2009). A role for medial prefrontal dopaminergic innervation in instrumental conditioning. J. Neurosci. 29, 6599–6606.

Nonomura, S., Nishizawa, K., Sakai, Y., Kawaguchi, Y., Kato, S., Uchigashima, M., Watanabe, M., Yamanaka, K., Enomoto, K., Chiken, S., et al. (2018). Monitoring and Updating of Action Selection for Goal-Directed Behavior through the Striatal Direct and Indirect Pathways. Neuron 99, 1302–1314.e5.

Noonan, M.P., Kolling, N., Walton, M.E., and Rushworth, M.F.S. (2012). Re-evaluating the role of the orbitofrontal cortex in reward and reinforcement. Eur. J. Neurosci. 35, 997–1010.

O’Doherty, J.P., Cockburn, J., and Pauli, W.M. (2017). Learning, Reward, and Decision Making. Annu. Rev. Psychol. 68, 73–100.

O’Hare, J.K., Ade, K.K., Sukharnikova, T., Van Hooser, S.D., Palmeri, M.L., Yin, H.H., and Calakos, N. (2016). Pathway-Specific Striatal Substrates for Habitual Behavior. Neuron 89, 472–479.

Ostlund, S.B., and Balleine, B.W. (2005). Lesions of medial prefrontal cortex disrupt the acquisition but not the expression of goal-directed learning. J. Neurosci. 25, 7763–7770.

Ostlund, S.B., and Balleine, B.W. (2007). Orbitofrontal cortex mediates outcome encoding in Pavlovian but not instrumental conditioning. J. Neurosci. 27, 4819–4825.

Ostlund, S.B., and Balleine, B.W. (2008). Differential involvement of the basolateral amygdala and mediodorsal thalamus in instrumental action selection. J. Neurosci. 28, 4398–4405.

Panayi, M.C., and Killcross, S. (2018). Functional heterogeneity within the rodent lateral orbitofrontal cortex dissociates outcome devaluation and reversal learning deficits. eLife 7, 7.

Parker, N.F., Cameron, C.M., Taliaferro, J.P., Lee, J., Choi, J.Y., Davidson, T.J., Daw, N.D., and Witten, I.B. (2016). Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854.

Parkes, S.L., and Balleine, B.W. (2013). Incentive memory: evidence the basolateral amygdala encodes and the insular cortex retrieves outcome values to guide choice between goal-directed actions. J. Neurosci. 33, 8753–8763.

Parkes, S.L., Bradfield, L.A., and Balleine, B.W. (2015). Interaction of insular cortex and ventral striatum mediates the effect of incentive memory on choice between goal-directed actions. J. Neurosci. 35, 6464–6471.

Parkes, S.L., Ravassard, P.M., Cerpa, J.-C., Wolff, M., Ferreira, G., and Coutureau, E. (2018). Insular and Ventrolateral Orbitofrontal Cortices Differentially Contribute to Goal-Directed Behavior in Rodents. Cereb. Cortex 28, 2313–2325.

Parkinson, J.A., Cardinal, R.N., and Everitt, B.J. (2000). Limbic cortical-ventral striatal systems underlying appetitive conditioning. Prog. Brain Res. 126, 263–285.

Peak, J., Hart, G., and Balleine, B.W. (2019). From learning to action: the integration of dorsal striatal input and output pathways in instrumental conditioning. Eur. J. Neurosci. 49, 658–671.

Rangel, A., Camerer, C., and Montague, P.R. (2008). A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556.

Reynolds, J.N., Hyland, B.I., and Wickens, J.R. (2001). A cellular mechanism of reward-related learning. Nature 413, 67–70.

Saunders, B.T., Richard, J.M., Margolis, E.B., and Janak, P.H. (2018). Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083.

Searle, J.R. (1983). Intentionality: An Essay in the Philosophy of Mind (Cambridge University Press).

Sesack, S.R., and Grace, A.A. (2010). Cortico-Basal Ganglia reward network: microcircuitry. Neuropsychopharmacology 35, 27–47.

Shan, Q., Ge, M., Christie, M.J., and Balleine, B.W. (2014). The acquisition of goal-directed actions generates opposing plasticity in direct and indirect pathways in dorsomedial striatum. J. Neurosci. 34, 9196–9201.

Shan, Q., Christie, M.J., and Balleine, B.W. (2015). Plasticity in striatopallidal projection neurons mediates the acquisition of habitual actions. Eur. J. Neurosci. 42, 2097–2104.

Shepherd, G.M.G. (2013). Corticostriatal connectivity and its role in disease. Nat. Rev. Neurosci. 14, 278–291.

Shiflett, M.W., and Balleine, B.W. (2011). Contributions of ERK signaling in the striatum to instrumental learning and performance. Behav. Brain Res. 218, 240–247.

Shin, Y.K., Proctor, R.W., and Capaldi, E.J. (2010). A review of contemporary ideomotor theory. Psychol. Bull. 136, 943–974.

Shindou, T., Shindou, M., Watanabe, S., and Wickens, J. (2019). A silent eligibility trace enables dopamine-dependent synaptic plasticity for reinforcement learning in the mouse striatum. Eur. J. Neurosci. 49, 726–736.

Smith, K.S., and Graybiel, A.M. (2013). A dual operator view of habitual behavior reflecting cortical and striatal dynamics. Neuron 79, 361–374.

Sommer, W.H., Costa, R.M., and Hansson, A.C. (2014). Dopamine systems adaptation during acquisition and consolidation of a skill. Front. Integr. Neurosci. 8, 87.

Stalnaker, T.A., Berg, B., Aujla, N., and Schoenbaum, G. (2016). Cholinergic Interneurons Use Orbitofrontal Input to Track Beliefs about Current State. J. Neurosci. 36, 6242–6257.

Stewart, C.V., and Plenz, D. (2006). Inverted-U profile of dopamine-NMDA-mediated spontaneous avalanche recurrence in superficial layers of rat prefrontal cortex. J. Neurosci. 26, 8148–8159.

Sullivan, M.A., Chen, H., and Morikawa, H. (2008). Recurrent inhibitory network among striatal cholinergic interneurons. J. Neurosci. 28, 8682–8690.

Tai, L.-H., Lee, A.M., Benavidez, N., Bonci, A., and Wilbrecht, L. (2012). Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value. Nat. Neurosci. 15, 1281–1289.

Tanimura, A., Du, Y., Kondapalli, J., Wokosin, D.L., and Surmeier, D.J. (2019). Cholinergic Interneurons Amplify Thalamostriatal Excitation of Striatal Indirect Pathway Neurons in Parkinson’s Disease Models. Neuron 101, 444–458.e6.

Taverna, S., Ilijic, E., and Surmeier, D.J. (2008). Recurrent collateral connections of striatal medium spiny neurons are disrupted in models of Parkinson’s disease. J. Neurosci. 28, 5504–5512.

Thrailkill, E.A., and Bouton, M.E. (2015). Contextual control of instrumental actions and habits. J. Exp. Psychol. Anim. Learn. Cogn. 41, 69–80.

Tolman, E.C. (1932). Purposive Behavior in Animals and Men (Century Books).

Tsutsui, K.-I., Oyama, K., Nakamura, S., and Iijima, T. (2016). Comparative Overview of Visuospatial Working Memory in Monkeys and Rats. Front. Syst. Neurosci. 10, 99.

Wassum, K.M., Cely, I.C., Maidment, N.T., and Balleine, B.W. (2009). Disruption of endogenous opioid activity during instrumental learning enhances habit acquisition. Neuroscience 163, 770–780.

Wassum, K.M., Ostlund, S.B., Balleine, B.W., and Maidment, N.T. (2011). Differential dependence of Pavlovian incentive motivation and instrumental incentive learning processes on dopamine signaling. Learn. Mem. 18, 475–483.

Whitaker, L.R., Warren, B.L., Venniro, M., Harte, T.C., McPherson, K.B., Beidel, J., Bossert, J.M., Shaham, Y., Bonci, A., and Hope, B.T. (2017). Bidirectional Modulation of Intrinsic Excitability in Rat Prelimbic Cortex Neuronal Ensembles and Non-Ensembles after Operant Learning. J. Neurosci. 37, 8845–8856.

Wickens, J.R., Begg, A.J., and Arbuthnott, G.W. (1996). Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro. Neuroscience 70, 1–5.

Wolpert, D.M., and Flanagan, J.R. (2016). Computations underlying sensorimotor learning. Curr. Opin. Neurobiol. 37, 7–11.

Wood, W., and Rünger, D. (2016). Psychology of Habit. Annu. Rev. Psychol. 67, 289–314.

Xiong, Q., Znamenskiy, P., and Zador, A.M. (2015). Selective corticostriatal plasticity during acquisition of an auditory discrimination task. Nature 521,
348–351.
Shiflett, M.W., and Balleine, B.W. (2010). At the limbic-motor interface: discon-
nection of basolateral amygdala from nucleus accumbens core and shell re- Yin, H.H., Knowlton, B.J., and Balleine, B.W. (2004). Lesions of dorsolateral
veals dissociable components of incentive motivation. Eur. J. Neurosci. 32, striatum preserve outcome expectancy but disrupt habit formation in instru-
1735–1743. mental learning. Eur. J. Neurosci. 19, 181–189.

Neuron 104, October 9, 2019 61


Yin, H.H., Ostlund, S.B., Knowlton, B.J., and Balleine, B.W. (2005a). The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523.

Yin, H.H., Knowlton, B.J., and Balleine, B.W. (2005b). Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur. J. Neurosci. 22, 505–512.

Yin, H.H., Knowlton, B.J., and Balleine, B.W. (2006). Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav. Brain Res. 166, 189–196.

Yin, H.H., Ostlund, S.B., and Balleine, B.W. (2008). Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur. J. Neurosci. 28, 1437–1448.

Yin, H.H., Mulcare, S.P., Hilário, M.R.F., Clouse, E., Holloway, T., Davis, M.I., Hansson, A.C., Lovinger, D.M., and Costa, R.M. (2009). Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12, 333–341.

Zahm, D.S. (1999). Functional-anatomical implications of the nucleus accumbens core and shell subterritories. Ann. N Y Acad. Sci. 877, 113–128.

