
A neural theory of visual attention and short-term memory (NTVA)

Article  in  Neuropsychologia · December 2010


DOI: 10.1016/j.neuropsychologia.2010.12.006 · Source: PubMed


Author's personal copy

Neuropsychologia 49 (2011) 1446–1457

Contents lists available at ScienceDirect

Neuropsychologia
journal homepage: www.elsevier.com/locate/neuropsychologia

A neural theory of visual attention and short-term memory (NTVA)


Claus Bundesen ∗ , Thomas Habekost, Søren Kyllingsbæk
Department of Psychology, University of Copenhagen, Øster Farimagsgade 2A, DK-1353 Copenhagen K, Denmark

Article info

Article history:
Received 27 July 2010
Received in revised form 10 November 2010
Accepted 2 December 2010
Available online 10 December 2010

Keywords:
Vision
Attention
VSTM
TVA

Abstract

The neural theory of visual attention and short-term memory (NTVA) proposed by Bundesen, Habekost, and Kyllingsbæk (2005) is reviewed. In NTVA, filtering (selection of objects) changes the number of cortical neurons in which an object is represented so that this number increases with the behavioural importance of the object. Another mechanism of selection, pigeonholing (selection of features), scales the level of activation in neurons coding for a particular feature. By these mechanisms, behaviourally important objects and features are likely to win the competition to become encoded into visual short-term memory (VSTM). The VSTM system is conceived as a feedback mechanism that sustains activity in the neurons that have won the attentional competition. NTVA accounts both for a wide range of attentional effects in human performance (reaction times and error rates) and a wide range of effects observed in firing rates of single cells in the primate visual system.

© 2010 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +45 3532 4858/3253 1752.
E-mail address: bundesen@psy.ku.dk (C. Bundesen).

0028-3932/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.neuropsychologia.2010.12.006

1. Introduction

This article is a review of the neural theory of visual attention and short-term memory (NTVA) proposed by Bundesen, Habekost, and Kyllingsbæk (2005) (see also Bundesen & Habekost, 2008; Kyllingsbæk, 2006). By use of the same basic equations, NTVA accounts for a wide range of attentional effects in human performance (reaction times and error rates) and a wide range of effects observed in firing rates of single cells in the visual system of primates. NTVA is a further development of the theory of visual attention (TVA) proposed by Bundesen (1990), which is a formal, computational theory that applies to attentional effects in mind and behaviour reported in the psychological literature.

NTVA provides a neural interpretation of the central equations of TVA: the rate and weight equations. The two equations jointly describe two mechanisms of attentional selection: one for selection of objects and one for selection of features or categories. Paying tribute to Broadbent (1971), we use the term filtering for selection of objects and the term pigeonholing for selection of features or categories. Selection of a feature is the same as selection of a category formed by all those objects that share the given feature.

In NTVA, filtering changes the number of cortical neurons in which an object is represented, whereas pigeonholing changes the rate of firing of cortical neurons coding for particular features (see Fig. 1). The total activation representing a visual categorization of the form “object x has feature i” is directly proportional to both the number of neurons representing the categorization, which is controlled by filtering, and the level of activation of the individual neurons representing the categorization, which is controlled by pigeonholing. The rate equation of TVA essentially expresses this fact.

Filtering makes the number of cells in which an object is represented increase with the behavioural importance of the object. Thus, processing is parallel with differential allocation of resources so that important objects are represented in many cells. More specifically, the probability that a cortical neuron represents a particular object in its classical receptive field equals the attention weight of the object divided by the sum of the attention weights across all objects in the receptive field.

The weight equation of TVA describes the computation of attention weights. Logically the weights must be computed before processing resources (cells) can be distributed in accordance with the weights. Therefore, in NTVA, a normal perceptual cycle consists of two waves: a wave of unselective processing, in which attention weights are computed, and a wave of selective processing, when processing resources have been allocated in accordance with the weights. During the first wave, cortical processing resources are distributed at random (unselectively) across the visual field. At the end of the first wave, an attention weight has been computed for each object in the visual field and the weight has been stored in a priority map. The weights are used for reallocation of attention (visual processing capacity) by dynamic remapping of receptive fields of cortical neurons so that the number of neurons allocated to an object increases with the attention weight of the object. Thus, during the second wave, cortical processing is selective in the sense that the amount of processing resources allocated to an object (the
number of neurons in which the object is represented) varies with the attention weight of the object. As more processing resources are devoted to behaviourally important objects than less important ones, the important objects are more likely to become encoded into visual short-term memory (VSTM). The VSTM system is conceived as a feedback mechanism that sustains activity in the neurons that have won the attentional competition.

In the rest of this article, we first describe TVA as a formal theory of visual attention and short-term memory and summarize the applications of TVA to human performance. We then present the neural interpretation of TVA, NTVA, and review some examples of applications of NTVA to single-cell studies in primates.

Fig. 1. Attentional selection in NTVA: combined effects of filtering (selection of objects) and pigeonholing (selection of features) on the set of cortical spike trains representing a particular visual categorization of the form “object x has feature i.” The 4 conditions (quadrants) correspond to the factorial combinations of 2 levels of filtering (weak vs. strong support to object x) by 2 levels of pigeonholing (weak vs. strong support to feature i). Filtering changes the number of cortical neurons in which an object is represented. Pigeonholing changes the rate of firing of cortical neurons coding for a particular feature. [Figure axes: Filtering (number of cells); Pigeonholing (rate of firing).]
From “A Neural Theory of Visual Attention: Bridging Cognition and Neurophysiology,” by Bundesen et al. (2005), Psychological Review, 112, p. 292. Copyright 2005 by the American Psychological Association. Adapted with permission.

2. A formal theory of visual attention and short-term memory (TVA)

2.1. Basic assumptions

In TVA, both visual recognition and selection of objects in the visual field consist in making visual categorizations. A visual categorization has the form “object x has feature i” or, equivalently, “object x belongs to category i.” The categorization is made (or, equivalently, selected) when the categorization is encoded into visual short-term memory (VSTM). When the visual categorization that x belongs to i is made, object x is said both to be selected and also to be recognized as possessing feature i or, equivalently, as a member of category i.

When a visual categorization of an object completes processing, the categorization enters VSTM if memory space for the categorization is available in VSTM. Usually the capacity of VSTM is assumed to be limited to K different objects, where K is about 4 (cf. Luck & Vogel, 1997).

Clearing VSTM starts a race among objects in the visual field to become encoded into VSTM. An object is encoded in VSTM if, and only if, some categorization of the object is encoded in VSTM. Each object x may be represented in the encoding race by all possible categorizations of the object.

2.1.1. Rate equation

By the rate equation of TVA, the rate, v(x, i), at which a particular visual categorization, “x belongs to i,” is encoded into VSTM is given by a product of three terms:

$$v(x, i) = \eta(x, i)\,\beta_i\,\frac{w_x}{\sum_{z \in S} w_z} \qquad (1)$$

The first term, $\eta(x, i)$, is the strength of the sensory evidence that x belongs to category i. The second term, $\beta_i$, is a perceptual decision bias associated with category i ($0 \le \beta_i \le 1$). The third term is the relative attention weight of object x, that is, the weight of object x, $w_x$, divided by the sum of weights across all objects in the visual field, S.

2.1.2. Weight equation

The attention weights in the rate equation of TVA are derived from pertinence values. Every visual category j is supposed to have a pertinence, $\pi_j$, which is a nonnegative real number. The pertinence of category j is a measure of the momentary importance of attending to objects that belong to category j. The attention weight of an object x in the visual field is given by the weight equation of TVA,

$$w_x = \sum_{j \in R} \eta(x, j)\,\pi_j, \qquad (2)$$

where R is the set of all visual categories, $\eta(x, j)$ is the strength of the sensory evidence that object x belongs to category j, and $\pi_j$ is the pertinence of category j. By Eq. (2), the attention weight of an object is a weighted sum of pertinence values. The pertinence of a given category enters the sum with a weight equal to the strength of the sensory evidence that the object belongs to the category.

2.2. Mechanisms of selection

Taken together, the rate and weight equations of TVA describe two mechanisms of selection: a mechanism for selection of objects (filtering) and a mechanism for selection of categories (pigeonholing). The filtering mechanism is represented by pertinence values and attention weights. If selection of objects with feature j is desired, the pertinence of feature j should be high. The weight equation implies that when feature j has a high pertinence, objects possessing feature j get high attention weights. Accordingly, by the rate equation, processing of objects with feature j is fast, so objects with this feature are likely to win the processing race and be encoded into VSTM.

The pigeonholing mechanism is represented by perceptual decision bias parameters. Pertinence values determine which objects are selected (filtering), but perceptual biases determine how the objects are categorized (pigeonholing). If particular types of categorizations must be reported or otherwise responded to, the bias values of the corresponding categories should be high so that the desired types of categorizations are likely to be made (cf. Eq. (1)).

When the selection system is coupled to a sensory system that supplies appropriate $\eta$ values, and when pertinence and bias parameters have been set, both filtering and pigeonholing are accomplished by an encoding race between visual categorizations whose rate parameters are determined through the simple algebraic operations of the rate and weight equations. Thus the theory yields a truly computational account of selective attention in vision.

2.2.1. Note on search templates

Many investigators have emphasized the role of search templates in visual search (see, e.g., Duncan & Humphreys, 1989). In the biased competition theory of Desimone and Duncan (1995), Desimone (1999), and Duncan (1996), visual search is initiated by uploading a
search template, which is a representation of the target, into VSTM, which in turn yields a competition bias that facilitates subsequent processing for sensory input that matches the template. In TVA, visual selection is determined by $\eta$, $\beta$ and $\pi$ values. Other factors, such as search templates, may influence the selection process, but they do so via changes in $\eta$, $\beta$ and $\pi$ values. $\eta$ values are computed by comparing stimuli against memory representations. For example, the strength of the sensory evidence that object x belongs to category i, $\eta(x, i)$, is computed by comparing object x against a memory representation of members of category i (the template associated with category i). A representation in VSTM can be used as a template for category i, that is, used for computing $\eta(x, i)$ by being compared against x, but an observer does not necessarily search the visual input for any item represented in VSTM. A short-term visual search template can be defined as a representation in VSTM that is used for computing $\eta$ values for a category with a high $\pi$ value. Long-term templates can be given a similar definition. A long-term visual search template can be defined as a representation in long-term memory that is used for computing $\eta$ values for a category with a high $\pi$ value. In TVA, a search template for category i need not be retained in VSTM in order to search the visual input for members of category i.

3. Applications of TVA to human performance

TVA has been applied to findings from a broad range of paradigms concerned with single-stimulus identification and selection from multielement displays in normal subjects. In addition, TVA has been applied in studies of attention deficits, and the scope of the theory has been extended to other cognitive domains such as memory and executive control.

3.1. Single-stimulus identification

For single-stimulus identification, TVA provides a mathematical derivation of the classical biased-choice model of Luce (1963) (see Bundesen, 1990, 1993). The biased-choice model has been successful in explaining many experimental findings on effects of visual discriminability and bias (see, e.g., Townsend & Ashby, 1982; Townsend & Landon, 1982).

3.2. Selection from multielement displays

For selection from multielement displays, TVA provides a mathematical derivation of the fixed-capacity independent race model (FIRM) of Shibuya and Bundesen (1988). Fig. 2 shows a fit to the data of Shibuya and Bundesen (1988): probability distributions of partial and whole report scores with varying numbers of targets and distractors and varying exposure durations. Other well-established results closely fitted by TVA include effects of object integrality (Duncan, 1984) and findings of stochastic independence (Bundesen, Kyllingsbæk, & Larsen, 2003; Kyllingsbæk & Bundesen, 2007) in processing of multiple features, effects of the spatial position of targets in studies of divided attention (e.g., Posner, Nissen, & Ogden, 1978; Sperling, 1967), effects of selection criterion and effects of the number of distractors in studies of focused attention (e.g., Bundesen & Pedersen, 1983; Estes & Taylor, 1964; Treisman & Gelade, 1980; Treisman & Gormican, 1988), and effects of consistent practice in search (Schneider & Fisk, 1982; also see Kyllingsbæk, Schneider, & Bundesen, 2001).

Fig. 2. Relative frequency of scores of j or more (correctly reported targets) as a function of exposure duration with j, number of targets T, and number of distractors D as parameters in partial report experiment of Shibuya and Bundesen (1988). Data are shown for subject MP. Parameter j varies within panels; j is 1 (open circles), 2 (open squares), 3 (solid squares), 4 (solid circles), or 5 (triangle). T and D vary among panels. Smooth curves represent a theoretical fit to the data by the FIRM model. For clarity, observed frequencies <.02 were omitted from the figure.
Adapted from “Visual Selection From Multielement Displays: Measuring and Modeling Effects of Exposure Duration,” by Shibuya and Bundesen (1988), Journal of Experimental Psychology: Human Perception and Performance, 14, p. 595. Copyright 1988 by the American Psychological Association.

3.3. Attention deficits

The principles of TVA have been extensively applied in studies of attention deficits. In the pioneering study, Duncan et al. (1999) showed how analysis in terms of parameters defined by TVA enables a very specific measurement of attention deficits in visual neglect patients (see also Bublak et al., 2005; Finke et al., 2005). TVA-based assessment has now been used in studies of simultanagnosia (Duncan et al., 2003), integrative agnosia (Gerlach, Marstrand, Habekost, & Gade, 2005), alexia (Habekost & Starrfelt, 2006; Starrfelt, Habekost, & Gerlach, 2010; Starrfelt, Habekost, & Leff, 2009), developmental dyslexia (Dubois et al., 2010), Huntington’s disease (Finke, Bublak, Dose, Müller, & Schneider, 2006; Finke et al., 2007), Alzheimer’s disease (Bublak, Redel, & Finke, 2006; Bublak et al., 2009; Redel et al., 2010), and effects of stroke in particular parts of the brain (Habekost & Bundesen, 2003; Habekost & Rostrup, 2006, 2007; Peers et al., 2005; see Habekost & Starrfelt, 2009, for a review). Recently the effect of pharmacological substances on the TVA parameters has also been investigated (Finke et al., 2010).

3.4. Other cognitive domains

Logan (1996) proposed a combination of TVA with the contour detector (CODE) theory of perceptual grouping by proximity (van Oeffelen & Vos, 1982, 1983): the CODE theory of visual attention (CTVA). CTVA explains a wide range of spatial effects in visual attention (see Logan, 1996; Logan & Bundesen, 1996; see Bundesen, 1998, for a formalization of CTVA).

Logan and Gordon (2001) extended CTVA into a theory of executive control in dual-task situations. The theory, ECTVA, accounts for crosstalk, set-switching cost, and concurrence costs. ECTVA assumes that executive processes control subordinate processes by manipulating their parameters. TVA is used as the theory of subordinate processes, and a task set is defined as a set of TVA parameters that is sufficient to configure TVA to perform a task. Set switching is viewed as a change in one or more of these parameters, and the time taken to change a task set is assumed to depend on the number of parameters to be changed.

Logan (2002) proposed a wide-ranging instance theory of attention and memory (ITAM) combining ECTVA with the exemplar-based random walk model of categorization proposed by Nosofsky and Palmeri (1997). The exemplar-based random walk model is itself a combination of Nosofsky’s (1986) generalized context model of categorization and Logan’s (1988) instance theory of automaticity. By integrating theories of attention, categorization, and memory, the development of ITAM seems a major step toward a unified account of visual cognition.

4. A neural interpretation of TVA (NTVA)

NTVA gives the equations of TVA an interpretation at the level of individual neurons in the brain. The theory assumes that a typical neuron in the visual system is specialized to represent a single feature. The feature for which the neuron is specialized can be a more or less simple “physical feature” or a “microfeature” in a distributed representation. Further, the neuron is assumed to respond to the properties of only one object at any given time. The object selection of the neuron occurs by dynamic remapping of the cell’s classical receptive field so that the functional receptive field contracts around the selected object. The probability that the neuron comes to represent a particular object in this way equals the attention weight of the object divided by the total sum of the attention weights of all objects in the classical receptive field. Dynamic remapping of receptive fields corresponds to the filtering mechanism in TVA.

NTVA defines the activation of a neuron by the appearance of an object in its receptive field as the increase in firing rate above a baseline rate representing the undriven activity of the neuron. If the baseline rate is zero, the activation is just the firing rate.
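
The filtering assumption of Section 4 (each neuron comes to represent an object with probability equal to that object's attention weight divided by the summed weights of all objects in its classical receptive field) can be illustrated with a small simulation. The following Python sketch is not part of the original article: the function name `allocate_neurons`, the object labels, the weight values, and the number of neurons are all hypothetical, and the sketch further assumes that neurons choose their object independently of one another.

```python
import random
from collections import Counter


def allocate_neurons(weights, n_neurons, seed=0):
    """Assign each neuron to one object in its receptive field.

    Each neuron independently picks object x with probability
    w_x / sum_z w_z, the relative attention weight of x.
    (Independence across neurons is an assumption of this sketch.)
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one attention weight must be positive")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    objects = list(weights)
    picks = rng.choices(objects, weights=[weights[o] for o in objects], k=n_neurons)
    return Counter(picks)


# Hypothetical display: one object with three times the weight of the other.
weights = {"target": 3.0, "distractor": 1.0}
n_neurons = 10_000
counts = allocate_neurons(weights, n_neurons)
target_share = counts["target"] / n_neurons  # expected to be near 3 / (3 + 1)
print(f"share of neurons representing the target: {target_share:.3f}")
```

With these illustrative weights, roughly three quarters of the simulated neurons end up representing the higher-weighted object, which is the sense in which the number of cells allocated to an object increases with its attention weight.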

Independently of the distribution of neurons among objects (i.e., independently of filtering), the activation in neurons representing particular features (say, the activation in the set of neurons representing feature i) can be scaled up or down. This scaling of firing rates corresponds to varying $\beta_i$ in TVA (i.e., the pigeonholing mechanism). In this neural interpretation, the rate equation of TVA:

$$v(x, i) = \eta(x, i)\,\beta_i\,\frac{w_x}{\sum_{z \in S} w_z}$$

describes the combined effects of filtering and pigeonholing on the total activation of the population of neurons representing the categorization “object x has feature i.” The v value on the left-hand side of the rate equation is the total activation of the neurons that represent this particular categorization at the level of processing in the brain where objects compete for entrance into VSTM. At this level in the visual system, the classical receptive fields of neurons are assumed to be so large that each one covers the entire visual field. On the right-hand side of the equation, both the bias for categorizing objects as having feature i, $\beta_i$, and the relative attention weight of object x:

$$\frac{w_x}{\sum_{z \in S} w_z}$$

range between 0 and 1. This implies that the strength of the sensory evidence that x is an i, $\eta(x, i)$, equals the highest possible value of v(x, i). Thus, $\eta(x, i)$ equals the total activation of the set of all neurons coding feature i when all of these cells represent object x (say, when x is the only object in the visual field) and when the featural bias in favor of i is maximal (i.e., $\beta_i = 1$). When the proportion of feature-i coding neurons representing object x:

$$\frac{w_x}{\sum_{z \in S} w_z}$$

is <1, then the total activation representing the categorization “object x has feature i” is scaled down by multiplication with this factor on the right-hand side of the equation. The total activation representing the categorization also depends on the level of activation of the individual neurons representing the categorization. The bias parameter $\beta_i$ is a scale factor that multiplies activations of all feature-i coding neurons, so the total activation representing the categorization “object x has feature i” is also directly proportional to $\beta_i$.

Thus, in the neural interpretation we propose for the rate equation of TVA, the total activation representing the categorization “object x has feature i” is directly proportional to both the number of neurons representing the categorization as well as the level of activation of the individual neurons representing the categorization. The number of neurons is controlled by the relative attention weight of x (filtering) and the activation of the individual neurons is controlled by $\beta_i$ (pigeonholing).
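
As a rough illustration of how the weight equation, the rate equation, and the encoding race fit together, the following Python sketch computes attention weights and processing rates for a toy display and then simulates one race with a limited VSTM capacity. It is a minimal sketch and not the authors' implementation: the function names, the two-object display, and all numeric values are hypothetical. It also assumes that rates are expressed per second, that processing times are independent and exponentially distributed, and that the exposure duration passed in is the effective time available for processing.

```python
import random


def attention_weights(eta, pertinence):
    """Weight equation: w_x = sum over categories j of eta(x, j) * pi_j."""
    return {
        x: sum(e * pertinence.get(j, 0.0) for j, e in evidence.items())
        for x, evidence in eta.items()
    }


def processing_rates(eta, beta, weights):
    """Rate equation: v(x, i) = eta(x, i) * beta_i * w_x / sum_z w_z."""
    total = sum(weights.values())
    return {
        (x, i): e * beta.get(i, 0.0) * weights[x] / total
        for x, evidence in eta.items()
        for i, e in evidence.items()
    }


def encoding_race(rates, exposure, capacity=4, seed=0):
    """Simulate one race; return the categorizations that reach VSTM.

    A categorization enters VSTM if it finishes within the exposure and
    either its object is already stored or fewer than `capacity` objects are.
    """
    rng = random.Random(seed)
    # Draw an exponential finishing time for every categorization with v > 0.
    finish = sorted(
        (rng.expovariate(v), x, i) for (x, i), v in rates.items() if v > 0
    )
    objects, encoded = set(), []
    for t, x, i in finish:
        if t > exposure:
            break
        if x in objects or len(objects) < capacity:
            objects.add(x)
            encoded.append((x, i))
    return encoded


# Hypothetical display: two letters, of which obj1 is clearly red.
eta = {
    "obj1": {"red": 0.8, "letter_T": 10.0},
    "obj2": {"red": 0.2, "letter_T": 10.0},
}
pertinence = {"red": 1.0}   # filtering: attend to red objects
beta = {"letter_T": 0.5}    # pigeonholing: report letter identity
weights = attention_weights(eta, pertinence)
rates = processing_rates(eta, beta, weights)
encoded = encoding_race(rates, exposure=0.2, capacity=1)
print(weights, rates, encoded)
```

In this toy setting the pertinence of "red" gives obj1 four times the attention weight of obj2, so its letter categorization is processed at four times the rate, while categories with zero bias never enter the race.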

Goldman-Rakic, 1995) but thalamo-cortical interaction is also a


main candidate (see Fig. 3).
In NTVA the short-term memory mechanism is based on a topo-
graphically organized map of the objects in the visual field. The map
does not in itself represent the features of the selected objects,
but rather functions as a pointer to their locations. The neurons
representing features of objects at the pointed-to locations are
kept active beyond the immediate sensory stimulation by recip-
rocal connections to the corresponding parts of the VSTM map (see
Section 4.2). Thus, NTVA assumes that visual short-term memory
is constituted by a feedback interaction between sensory neurons
and the VSTM map of locations, which makes it possible for visual
representations to outlast the original sensory stimulation.
The process of encoding information into VSTM starts each
time the map is initialized (i.e., cleared of previous activity). This
is assumed to happen immediately after attention weights have
been computed for a new sensory impression, when the visual sys-
tem is optimally tuned to select the most relevant information.
The selection process functions in a winner-take-all manner, in
which the first activated object representations in the VSTM map
Fig. 3. Possible distribution of visual processing across the human brain. Visual block later ones from entering. The suggested type of winner-take-
information from the eye enters the lateral geniculate nucleus (LGN) of the tha- all networks is described in detail in Section 4.2, but briefly, all
lamus (1) and is transmitted to striate and extrastriate cortical areas where  values nodes in such a network excite themselves (when externally trig-
(strengths of evidence that objects at particular scales and positions have particular
gered) while inhibiting the other nodes. Thus, once a node has been
features) are computed (2). The  values are multiplied by  (pertinence) values, and
the products are transmitted from the cortex to a priority map in the pulvinar (Pul)
triggered by external stimulation, its activity will tend to sustain
nucleus of the thalamus, where the products are summed up as attention weights itself while inhibiting the other nodes to the point where they can
of the stimulus objects (3). After the first (unselective) wave of processing, cortical no longer be activated by external signals. In the VSTM map the
processing capacity is redistributed by means of attention weight signals (w) from inhibitory connections are configured such that when fewer than
the pulvinar to the cortex, so that during the second (selective) wave of processing,
K objects are active, an external signal can still overcome the inhi-
objects with high attention weights are processed by many neurons (4). The result-
ing  values are multiplied by ˇ (bias) values, and the products are transmitted from bition from the objects that are already encoded. However, when K
the cortex to a multiscale VSTM map of locations, which is tentatively localized in units are active, external stimulation representing a new object (an
the thalamic reticular nucleus (TRN) (5). When the VSTM map is initialized, objects object at a new location) can no longer activate a new object node
in the visual field effectively start a race to become encoded into VSTM. In this race, (“K-winners-take-all network”). This mechanism explains the lim-
each object is represented by all possible categorizations of the object, and each
possible categorization participates with an activation (v value) proportional to the
ited storage capacity of VSTM, where only K objects can be active at
corresponding  value multiplied by the corresponding ˇ value. For the winners the same time. Note that the network is only limited in terms of the
of the race, the TRN gates activation representing a categorization back to some of number of objects (or corresponding locations) in the VSTM map,
those cells in LGN whose activation supported the categorization (6). Thus activity but not in the number of features related to each of the selected
in neurons representing winners of the race is sustained by positive feedback.
objects. This means that once an object has established itself in
From “A Neural Theory of Visual Attention: Bridging Cognition and Neurophysiol-
ogy,” by Bundesen et al. (2005), Psychological Review, 112, p. 295. Copyright 2005 by VSTM, many different categorizations can be attached to it.
the American Psychological Association. Adapted with permission. Which type of neural signal is needed for capturing a slot in
the VSTM map? It is not currently clear what the effective signal
is. The simplest possibility is the very first spike from one of the
NTVA is a general neurocomputational model that does not many sensory neurons projecting to the map. If neural processing is
depend critically on a particular anatomical localization of the pro- viewed as very noisy, this may seem like a highly unreliable mech-
posed computations. However we have suggested some plausible anism, but there is evidence to suggest that single spikes can in fact
ways in which visual processing may be distributed across the have profound effects on cognitive processing (Parker & Newsome,
human brain (see also Section 4.2). One possibility is illustrated in 1998). Alternatively, a stronger signal from multiple neurons may
Fig. 3. For further discussions on the functional anatomy of the processes described in TVA, see Bundesen et al. (2005) and Habekost and Starrfelt (2009).

4.1. The VSTM system

A main purpose of the VSTM system is to keep visual information available for other cognitive processes – further perceptual analysis, thinking, or permanent storage – after the immediate sensory stimulation has disappeared. In modelling visual short-term memory, NTVA follows a general idea dating back to Hebb (1949). Hebb suggested that short-term memory implies that the activity of neurons representing the selected information is sustained in a positive feedback loop. This way the representation is kept active in a reverberating circuit beyond the duration of the original sensory impression. Hebb’s notion of short-term memory has now gained wide acceptance in cognitive neuroscience. The feedback mechanism is often assumed to depend on interactions between prefrontal and posterior cortical areas (Fuster, 1997;

be needed, perhaps a volley of at least n synchronized spikes, n being a number substantially greater than one. However the effective signal is defined, NTVA assumes that the VSTM system is configured so that the first such impulse to the VSTM map activates the winner-take-all network. Assuming that signals for different categorizations as well as repeated signals for a given categorization occur independently of each other, this leads to an exponential race model. That is, it implies a model where selection is determined by a parallel processing race among categorizations, a race in which processing times for individual categorizations are mutually independent, exponentially distributed random variables. Such a model is completely in line with the original TVA model.

The description so far applies to the simple case in which the first visual categorization arriving in VSTM is used directly by the cognitive system. This mode of processing can be termed immediate perception. However, sometimes mutually contradicting categorizations of the same object are encoded into VSTM. This may for example be the case under poor lighting conditions, where the colour of an object is not obvious. In TVA terms, if the $\eta$ value for “x is

red” is not substantially different from the $\eta$ value for “x is yellow”, both categorizations may end up in VSTM. In situations like this decisional procedures are required: a processing mode of mediate perception. Two possible mechanisms for mediate perception have been suggested within the TVA framework, both based on gradual accumulation of evidence: counter procedures and random-walk models. In a counter procedure, a decision is made when one of the categorizations has been repeated (confirmed) a certain number of times (see Logan, 1996). In a random-walk procedure, the decision depends on the difference between the numbers of alternative categorizations: the “winning” categorization must occur a certain number of times more than its closest competitor (see Logan, 2002). Both of these models can be given a mathematically exact form (see Appendix B of Bundesen et al., 2005), but still await conclusive empirical testing.

Representations in VSTM can be relatively concrete and “pictorial” or relatively abstract, “schematic,” and “conceptual.” The more concrete sensation-like representations are often named mental images. In NTVA simple mental images are regarded as faint copies of sense impressions. A suggested implication of this sensory nature is that a mental image includes a specification of the position of the represented object in the visual field, although localization may be less precise than with real sense impressions. Neurophysiologically, mental images should activate only a subset of the neurons that are implicated in the corresponding sensory impressions, and the activity in these neurons should be weaker (hence the “faintness” of the representation). An observable effect in single cell activity should be a moderate increase in the firing rate of some neurons when a mental image is positioned in their receptive fields and no external stimuli are present. Such a baseline shift has been shown in several studies (e.g., Chelazzi, Duncan, Miller, & Desimone, 1998; Miller, Erickson, & Desimone, 1996; Miller, Li, & Desimone, 1993).

Fig. 4. Winner-take-all network with three units. Excitatory connections are shown by arrows. Inhibitory connections are shown by lines ending in solid circles.
From “A neural theory of visual attention: Bridging cognition and neurophysiology” by Bundesen et al. (2005), Psychological Review, 112, 291–328. Copyright 2005 by the American Psychological Association. Adapted with permission.

4.2. Computational networks

In this section we present some simple networks that can perform the attentional operations described in NTVA. We show how the computations of these networks correspond to the rate and weight equations of TVA, and we hint at where such networks may

[Figure 5 diagram: Stimuli 1–6 with attention weight signals $w_1$ to $w_6$ feeding the gates; processing levels, left to right: Stimuli, V2/V3, V4, IT.]
Fig. 5. Hierarchic network with processing units at three levels: V2/V3, V4, and IT. Each V2/V3 unit has one stimulus in its classical receptive field, each V4 unit has two stimuli in its classical receptive field, and the IT unit has all of the six stimuli in its classical receptive field. By attentional gating, the effective receptive fields of V4 and IT units are contracted so that each of the effective receptive fields contains only one object. Two gate symbols are used in the diagram: one stands for a gate which is open on only the upper or the lower line, and the other stands for a gate which is open on only one of the three lines. The gates are set by competing attentional weight signals. For k = 1, . . ., 6, $w_k$ is the attention weight of Stimulus k.
From “A neural theory of visual attention: Bridging cognition and neurophysiology” by Bundesen et al. (2005), Psychological Review, 112, 291–328. Copyright 2005 by the American Psychological Association. Adapted with permission.

be located in the primate brain. However, NTVA is a fairly general neurophysiological interpretation of TVA, and it does not depend critically on a specific anatomical localization of the proposed operations. The particular networks we present in the following may be regarded as merely proofs of the existence of simple and biologically plausible neural networks that can implement the attentional operations of NTVA.

4.2.1. Winner-take-all networks

A basic functional element for two central mechanisms in NTVA, dynamic remapping of receptive fields and encoding into VSTM, is the winner-take-all network. NTVA uses a simple network proposed by Grossberg (1976, 1980), which is illustrated in Fig. 4. As shown in the figure, it consists of a set of units (populations of cells) where each unit excites itself and inhibits all other units in the cluster. When the cluster is initialized (i.e., its level of activity is reset to the baseline level), an external signal of a certain above-threshold strength is sufficient to trigger one of the units. Once the unit has been triggered it keeps firing because of its self-exciting connections. At the same time it inhibits the other units. If the other units are inhibited so strongly that they can no longer be activated by external signals, one can simply read from the state of the cluster which of the units received the first above-threshold trigger signal after the cluster was initialized. This way, the cluster can serve as a device for recording the winner of a processing race. If the strengths of the inhibitory signals are adjusted so that at least K units must be active in order to block external activation of the remaining units, the cluster works as a K-winners-take-all network.

4.2.2. Dynamic remapping of receptive fields

Using the Grossberg winner-take-all network as a building block, dynamic remapping of receptive fields can be carried out in a simple hierarchical network. In this network, processing capacity (i.e., neurons) can be variously allocated to different stimuli by opening and closing gates that control communication between lower and higher levels in the visual system. The gates can be controlled by attentional weight signals in such a way that the expected number of cells that represent a particular object at the highest level of the visual system (which projects to VSTM) becomes proportional to the relative attention weight of the object. The hierarchical network is shown in Fig. 5. For simplicity, only three processing levels are shown (named V2/V3, V4, and IT), with gates between each level. Fig. 6 shows one of these gates in close-up. It consists of a winner-take-all cluster of units (one unit for each line through the gate), logical AND units (one on each line through the gate) and a logical OR unit (which collects information from the AND units). The winner-take-all cluster records the first occurring signal to one of the two units in the cluster. Thus, if the upper unit receives a signal before the lower unit, the upper unit will be excited, the lower unit will be inhibited, and only the upper line through the gate will be open. External signals to units in the winner-take-all clusters come from the so-called priority map, which represents attention weights at particular locations. As indicated in Fig. 5, a winner-take-all cluster that gates responses from cells at a lower level to a cell at a higher level receives one attentional weight signal for every object within the classical receptive field of the higher level cell. Attentional weight signals for objects within the receptive field of a given lower level cell go to that unit in the winner-take-all cluster that supports communication from the lower level cell to the higher level cell. Attentional weight signals for objects outside the receptive field of the lower level cell (but within the receptive field of the higher level cell) go to other, competing units in the winner-take-all cluster. As shown below, this simple network can do object selection according to the TVA equations.

Fig. 6. Close-up of the uppermost gate in Fig. 5. The gate consists of a winner-take-all network with two units (each controlling one of the two lines through the gate), a logical AND (&) unit on each line, and a logical OR unit transmitting information from either AND unit. For k = 1, 2, $w_k$ is the attention weight of Stimulus k. The signal representing the weight of a given stimulus comes from the representation of the stimulus in a topographic priority map.
From “A neural theory of visual attention: Bridging cognition and neurophysiology” by Bundesen, Habekost, and Kyllingsbæk, 2005, Psychological Review, 112, 291–328. Copyright 2005 by the American Psychological Association. Adapted with permission.

Consider the probability that the receptive field of the IT unit in the network described contracts around Object 1 rather than around one of the other objects in the visual field. This equals the probability that both the upper one of the gates from V2/V3 to V4 and the gate from V4 to IT open on their upper lines. To estimate the probability, we make some simple assumptions on the nature of parallel processing in the visual system. Let the arrival times of attentional weight signals for an object x be exponentially distributed with a rate parameter equal to the attention weight of x, $w_x$. Let attentional weight signals for different objects be stochastically independent. Finally, let attentional weight signals that represent the same object x but go to different gates be stochastically independent. On these assumptions, the probability that the upper of the gates from V2/V3 to V4 opens on its upper line is $w_1/(w_1 + w_2)$. The probability that the gate from V4 to IT opens on its upper line is $(w_1 + w_2)/\sum_k w_k$, where the summation extends over Objects 1–6. The probability of both events equals the product of the two probabilities (since they are stochastically independent), which is $w_1/\sum_k w_k$. Next, consider a population of IT units, each of which is similar to the one we have considered so far. Each unit has Objects 1–6 and no other objects within its classical receptive field. If there are N units in the population, the expected number of IT units representing Object 1 equals $N w_1/\sum_k w_k$, the expected number of IT units representing Object 2 equals $N w_2/\sum_k w_k$, and so on. In general, the expected number of IT units representing a particular object should be proportional to the relative attention weight of the object. This way, the network converts activity representing attention weights in the priority map directly into the number of IT cells working on the corresponding objects.

Dynamic remapping of the receptive fields of high-level neurons also has the interesting property of dramatically increasing their spatial resolution. In the network model shown in Figs. 5 and 6, the effective receptive field of an IT neuron is coded by the way the gates in the network are set. The code formed by the gate settings can be used for directing signals from an IT cell, whose effective receptive field is contracted around an object at a particular location, to another unit (a neuron or a population of neurons) representing the object at the selected location. In particular, the code formed by the gate settings can be used for directing signals to the location in a topographic map (such as the priority or VSTM maps) that corresponds to the location of the stimulus object. A simple network for this operation is shown in Fig. 7.
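The gating probabilities derived in Section 4.2.2 can be checked numerically. The following is a minimal Monte Carlo sketch, not the authors' implementation: it assumes the six-stimulus layout of Fig. 5 (three V4 gates with two lines each, one IT gate with three lines), draws independent exponential arrival times with rate equal to the attention weight, and lets the first-arriving signal set each gate. All function names (`first_arrival`, `contract_receptive_field`, `simulate_population`, `expected_shares`) and the example weights are hypothetical and chosen only for illustration.

```python
import random


def first_arrival(rates, rng):
    """Return the index of the signal that arrives first.

    Each arrival time is drawn from an exponential distribution whose
    rate parameter is the corresponding attention weight (assumption
    taken from the text; rates must be positive).
    """
    times = [rng.expovariate(r) for r in rates]
    return times.index(min(times))


def contract_receptive_field(weights, rng):
    """Simulate one IT unit; return the index (0-5) of the selected object.

    Layout assumed from Fig. 5: objects (0, 1), (2, 3), (4, 5) feed three
    V4 units, and the three V4 units feed one IT unit.
    """
    # Gate from V4 to IT: one weight signal per object; the line whose
    # V4 unit contains the first-arriving object opens.
    open_line = first_arrival(weights, rng) // 2
    # Gate from V2/V3 to V4 on the open line: an independent race
    # between the two objects in that V4 unit's receptive field.
    pair = weights[2 * open_line: 2 * open_line + 2]
    return 2 * open_line + first_arrival(pair, rng)


def simulate_population(weights, n_units, rng):
    """Fraction of n_units IT units that end up representing each object."""
    counts = [0] * len(weights)
    for _ in range(n_units):
        counts[contract_receptive_field(weights, rng)] += 1
    return [c / n_units for c in counts]


def expected_shares(weights):
    """Relative attention weights, w_x divided by the sum over all objects."""
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    weights = [3.0, 1.0, 2.0, 2.0, 1.0, 1.0]  # illustrative values only
    simulated = simulate_population(weights, 100_000, rng)
    for k, (s, e) in enumerate(zip(simulated, expected_shares(weights)), 1):
        print(f"Object {k}: simulated {s:.3f}, relative weight {e:.3f}")
```

With a large population the simulated fractions should lie close to the relative attention weights, which is the proportionality the text describes. The sketch draws separate arrival times for the two gate levels, mirroring the assumption that signals for the same object going to different gates are stochastically independent.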

[Figure 7 diagram; levels, left to right: V2/3, V4, IT, interneurons, topographic map.]

Fig. 7. Network for directing signals from an IT cell, whose effective receptive field is contracted around an object at a particular location, to a unit representing the object at the given location in a topographic map. The network on the left side of the figure is similar to the network in Fig. 5, but each cell in a winner-take-all cluster in one of the gates on the left side of the figure controls not only an input line on the route to the IT unit but also a corresponding output line from the IT unit to the topographic map. In the depicted state of the network, the effective receptive field of the IT unit equals the receptive field of the upper V2/V3 unit, and output from the IT unit is directed to a place in the topographic map of objects that corresponds to the receptive field of the upper V2/V3 unit.
From “A neural theory of visual attention: Bridging cognition and neurophysiology” by Bundesen, Habekost, and Kyllingsbæk, 2005, Psychological Review, 112, 291–328. Copyright 2005 by the American Psychological Association. Adapted with permission.

Consider a cell in a winner-take-all cluster in one of the gates on the left side of Fig. 7. The cell has an excitatory connection to an AND unit on a particular input line on the route to the IT unit. The cell also has an excitatory connection to an AND unit on a corresponding output line from the IT unit to a topographic map of objects. By this arrangement, the pattern of open and closed lines on the left side of the figure is duplicated on the right side of the figure. As a result, signals from the IT unit are directed to the appropriate location in the topographic map of objects. In the depicted state of the network, the effective receptive field of the IT unit equals the receptive field of the upper V2/V3 unit, and output from the IT unit is directed to a place in the topographic map of objects that corresponds to this location.

The topographic map of objects depicted in Fig. 7 may be located at a very high level in the visual system such as the prefrontal cortex. In this particular case, the dashed lines in Fig. 7 should symbolize long-ranging connections from cells in gates controlling the flow of information from the retina to the IT (gates on the left side of the figure) to cells controlling the flow of information from IT to the prefrontal cortex (AND units on the right side of the figure). However, as illustrated in Fig. 3, another possibility is that the topographic map is found at a low level of the visual system such as the thalamus. By this hypothesis, the right side of Fig. 7 depicts a flow of information from the IT unit back towards the retina: a gate and an AND unit connected by a dashed line are located at the same level of the visual system, and the connections symbolized by the dashed lines are quite local. This alternative hypothesis is illustrated in Fig. 8, which shows a gate in close-up. The figure is an extension of Fig. 6, and the dashed lines correspond to dashed lines in Fig. 7. The figure illustrates how the same gates routing information from a particular location on the retina to a certain cell at a higher level of the visual system can be used for routing information back from the high-level cell to a corresponding place in a map of objects found at a low level of the visual system.

Fig. 8. Close-up of the uppermost gate in Fig. 7. The figure is an extension of Fig. 6, and the dashed lines correspond to dashed lines in Fig. 7. The figure illustrates how the same gates routing signals along a particular pathway from lower-level to higher-level cells in the visual system can be used for routing signals along a parallel pathway in the opposite direction. The gate is bistable. In one state, both upper lines through the gate are open and both lower lines are closed; in the other state, both lower lines are open and both upper lines are closed. When the upper lines are open, signals are directed from the retinal location corresponding to the pair of cells in the upper left corner through the ascending pathway originating in the lower member of the rightmost pair of cells. In this state, signals arriving through the descending pathway to the upper member of the rightmost pair of cells are directed back towards the retinal location corresponding to the pair of cells in the upper left corner rather than the retinal location corresponding to the pair of cells in the lower left corner.
From “A neural theory of visual attention: Bridging cognition and neurophysiology” by Bundesen, Habekost, and Kyllingsbæk, 2005, Psychological Review, 112, 291–328. Copyright 2005 by the American Psychological Association. Adapted with permission.

Visual pathways are generally reciprocal so that a pathway from area A to area B is accompanied by a pathway from B to A (e.g., Zeki, 1993). From an engineering point of view, it seems simple to design the visual system so that the two pathways in a pair open and close at the same time (bidirectional opening and closing). Thus, it seems plausible that the system is designed so that information about an object at a particular location in the visual field is processed at high levels of the visual system, but routed back to a topographic map of objects at a low level of the system (cf. Schneider, 1995). Evolution may have chosen this simple solution.

4.2.3. Priority map

Fig. 9 shows a possible implementation of the priority map, which represents the attention weights of objects in the visual field. Computation of attention weights occurs at many different levels of the visual system, perhaps all the way from primary visual cortex (V1) to IT cortex. The network in Fig. 9 computes attention weights by summing up input (activations measured in spikes per second) to a unit (a neuron or a population of neurons) in a topographically organized priority map. The priority map is tentatively located in the pulvinar nucleus of the thalamus. The input is generated by cortical units specialized to represent different features, based on visual signals from the lateral geniculate nucleus (LGN). Each input to the unit for object x in the priority map is a product of two factors: a level of activation of a cortical neuron representing the strength of the sensory evidence that x has some feature, j, and a pertinence factor, which is proportional to the pertinence of j, $\pi_j$, and represents the current importance of attending to feature j. The products are summed up for each object location in the priority map to produce a topographic distribution of attention weights, following the principle laid out in TVA’s weight equation (for mathematical details, see Appendix A of Bundesen et al., 2005).

4.2.4. VSTM loops

NTVA assumes that a visual categorization is encoded in VSTM when the categorization is embedded in a positive feedback loop gated by a unit in the VSTM map of objects. Two feedback loops of

this type are illustrated in Fig. 10. As shown in the figure, impulses routed to a unit that represents an object at a certain location in the VSTM map of objects are fed back to the feature units from which they originated, provided that the VSTM unit is activated. If the VSTM unit is inactive, impulses to the unit are not fed back. Thus, for each feature-i neuron representing object x, activation of the neuron is sustained by feedback when the unit representing object x in the topographic VSTM map of objects is activated.

Fig. 9. Network for computing attention weights. Signals from the lateral geniculate nucleus (LGN) are transmitted to striate and extrastriate cortical areas where $\eta$ values (strengths of evidence that objects at particular scales and positions have particular features) are computed. The $\eta$ values are multiplied by $\pi$ (pertinence) values, and the products are transmitted from the cortex to the pulvinar nucleus of the thalamus, where the products are summed up as attention weights of the stimulus objects.
From “A neural theory of visual attention: Bridging cognition and neurophysiology” by Bundesen, Habekost, and Kyllingsbæk, 2005, Psychological Review, 112, 291–328. Copyright 2005 by the American Psychological Association. Adapted with permission.

Fig. 10. Two feedback loops gated by the same unit in the VSTM map of objects. Activation of either feature unit is sustained by positive feedback if, and only if, the object unit is activated.
From “A neural theory of visual attention: Bridging cognition and neurophysiology” by Bundesen, Habekost, and Kyllingsbæk, 2005, Psychological Review, 112, 291–328. Copyright 2005 by the American Psychological Association. Adapted with permission.

Fig. 11. Possible implementation of the circuits illustrated in Fig. 10 as thalamo-cortical feedback loops. The feature units are located in different cortical areas, where $\eta$ values (strengths of evidence that objects at particular scales and positions have particular features) are computed. The $\eta$ values are multiplied by $\beta$ (bias) values and the products are projected back to the thalamus as rates of activation. The VSTM map of objects is located in the thalamic reticular nucleus (TRN), which lies like a thin shield between the thalamus and the cortex. When the VSTM map is initialized, objects in the visual field effectively start a race to become encoded into VSTM. In this race, each object is represented by all possible categorizations of the object, and each possible categorization participates with a firing rate (v value) proportional to the product of the corresponding $\eta$ and $\beta$ (bias) values. For the winners of the race, the TRN gates activation representing a particular categorization back to some of those cells in LGN whose activation supported the categorization. Thus activity in neurons representing winners of the race is sustained by positive feedback.
From “A neural theory of visual attention: Bridging cognition and neurophysiology” by Bundesen, Habekost, and Kyllingsbæk, 2005, Psychological Review, 112, 291–328. Copyright 2005 by the American Psychological Association. Adapted with permission.

Fig. 11 shows how the circuits illustrated in Fig. 10 might be implemented as thalamo-cortical feedback loops. In Fig. 11, feature i is a shape feature of an object x, and feature j is a motion feature of the same object x. Shape-selective feature-i neurons are found in a high-level cortical area in the ventral stream of visual processing (see Milner & Goodale, 1995; Ungerleider & Mishkin, 1982), where their rates of activation are modulated by multiplication with the perceptual decision bias $\beta_i$. Motion-selective feature-j neurons are found in a high-level cortical area in the dorsal stream of visual processing (cf. Milner & Goodale, 1995; Ungerleider & Mishkin, 1982), where their rates of activation are modulated by multiplication with the perceptual decision bias $\beta_j$. The rates of activation of both the feature-i and the feature-j neurons allocated to object x are projected back to the thalamus. The top-down impulses representing the categorization that “object x has (shape) feature i” are routed to some of those cells in LGN whose bottom-up activation supported this categorization, and the top-down impulses representing the categorization that “object x has (motion) feature j” are routed to some of those cells in LGN whose bottom-up activation supported that categorization. In either case, the feedback occurs through an AND unit, which is tentatively located in the LGN close to the cells that are targeted by the feedback. To make the feedback effective, the AND gates must be open. This means that the AND units must receive input from a unit representing object x in the VSTM map of objects, tentatively located in the thalamic reticular nucleus (TRN). Effectively, the feedback loops with information about features of object x are complete if the TRN unit representing object x in the VSTM map of objects is active. An obvious way of activating the TRN unit is illustrated in Fig. 11: both impulses representing “object x has feature i” and impulses representing “object x has feature j” are projected back, not only whence they came in the LGN, but also

to the unit representing object x in the TRN. If, for any reason, the TRN unit representing object x is and remains inactive, impulses from the cortex representing object x are not transmitted beyond the AND units in the LGN.

5. Applications of NTVA to single-cell studies

NTVA has been applied to explain a wide range of attentional effects observed in single-cell studies: effects on the rate of firing with multiple stimuli in the receptive field of a cell, effects on the rate of firing with a single stimulus in the receptive field, effects on baseline firing, and effects on neural synchronization (see Bundesen et al., 2005; Bundesen & Habekost, 2008). A few examples of simple, qualitative applications to classical studies of filtering and pigeonholing are sketched below.

5.1. Neural filtering

NTVA’s notion of attentional filtering by dynamic remapping of receptive fields was originally inspired by a study of Moran and Desimone (1985). Moran and Desimone presented macaque monkeys with stimuli at two locations in the visual field. The monkeys were trained to attend only to stimuli presented at one of the locations and ignore any stimuli shown at the other location. The monkeys performed a match-to-sample task, in which they were to encode a sample stimulus that was shown at the attended location, retain it in memory for a brief delay period, and then compare it to a test stimulus shown at the same location. During the presentation of both the sample and test displays, Moran and Desimone recorded the responses of single neurons to the stimuli in cortical areas V1, V4, and IT. They found that, when the target and distractor stimuli were both present within the receptive field of the recorded cell in visual areas V4 or IT, the cell’s rate of firing depended on the properties of the target, but not the distractor. For example, they recorded the responses of individual cells to a pair of stimuli consisting of (a) a stimulus that elicited a high rate of firing when the stimulus was presented alone (an effective sensory stimulus) and (b) a stimulus that had little or no effect on the rate of firing when the stimulus was presented alone (an ineffective sensory stimulus). On trials in which the effective stimulus was the target and the ineffective stimulus was the distractor, the cell showed a high rate of firing. However, on trials in which the ineffective stimulus was the target and the effective stimulus was the distractor, the cell’s firing rate was low. Moran and Desimone remarked that the typical cell responded “as if the receptive field had contracted around the attended stimulus.”

The effect depended on both the target and the distractor being located within the recorded neuron’s receptive field. However, in cortical area IT the receptive fields were very large, covering at least the central 12 degrees of both the contralateral and ipsilateral visual field. For these neurons, the distractors were always located inside the receptive field. By contrast, in V1 only one stimulus could be fitted into each neuron’s receptive field. In this case, no significant effects of attention were found. Also in V4, no effect of attention was found when only a single stimulus was placed in the receptive field. These negative findings fit well with the predictions of NTVA. Overall, the results of Moran and Desimone clearly support NTVA’s notion of filtering, and in fact suggested the interpretation in the first place.

Reynolds, Chelazzi, and Desimone (1999) corroborated and extended the findings of Moran and Desimone (1985). Recording from neurons in cortical areas V2 and V4, they found that the mean firing rate of a cell to a pair of stimuli in its receptive field approximated a weighted average of the firing rates of the cell to each of the stimuli in the pair when they were presented alone. When the monkey’s attention was directed to one of the stimuli in the pair, this increased the weight on the target stimulus so that the mean response of the neuron was driven (up or down) toward the response elicited when the target stimulus was presented alone. The results fit in with the dual conjectures that (a) at any time, a cell was driven by only one of the stimuli in its receptive field and (b) the probability that the cell was driven by a given stimulus was proportional to the attention weight of the stimulus. This is exactly what is predicted by NTVA.

5.2. Neural pigeonholing

Whereas NTVA’s filtering mechanism affects the number of neurons in which an object x is represented, the pigeonholing mechanism affects the way in which the object is represented in each of those neurons that are allocated to it. The activation in a neuron representing the categorization that object x has feature i should be proportional to the multiplicative perceptual bias, $\beta_i$, which is applied to neurons that are specialized for signaling feature i. Recordings from single cells in the visual system of monkeys have provided evidence of such a mechanism of attentional selection: a feature-based mechanism of attention that selects groups of neurons with similar stimulus preferences for a multiplicative enhancement in response strength.

Treue and Martinez-Trujillo (1999) (also see Martinez-Trujillo & Treue, 2004) recorded from cortical area MT, which contains cells that are selective to the direction of motion. In the basic task, Treue and Martinez-Trujillo presented monkeys with two coherently moving random dot patterns, one placed inside the receptive field of the neuron being recorded and the other one in the opposite visual hemifield (see Fig. 12). At the start of each trial, the monkey was shown a cue at one of the two locations. Following this, the two random dot patterns appeared and the monkey was required to detect small changes in the speed or direction of movement of the pattern at the cued location. These motion changes occurred after a random delay ranging between a few hundred milliseconds and several seconds.

Fig. 12. Stimuli used by Treue and Martinez-Trujillo (1999). The random dot pattern inside the receptive field (dashed circle) always moved in the cell’s preferred direction (upward pointing arrow (1)); the stimulus outside moved in either the same (2) or the opposite direction (3).
From “Feature-Based Attention Influences Motion Processing Gain in Macaque Visual Cortex,” by S. Treue and J. C. M. Martinez-Trujillo, 1999, Nature, 399, p. 577. Copyright 1999 by Nature Publishing Group. Adapted by permission from Macmillan Publishers Ltd.

In one of their experiments, Treue and Martinez-Trujillo demonstrated an effect of pigeonholing with respect to a given direction of movement (“feature-based attention”); probably the first clear demonstration of pigeonholing in the single-cell literature. In this experiment, the recorded neuron’s receptive field was stimulated by a pattern moving in the direction preferred by this particular neuron. Outside the receptive field, a pattern was presented

that moved either in the same direction or in the opposite
(non-preferred) direction. When the monkey attended to the pattern
outside the receptive field, the response of the recorded neuron
varied with the direction of movement being attended. The firing
rate went up when the attended pattern (outside the receptive field
of the recorded neuron) moved in the preferred direction of the
recorded neuron. The firing rate went down when the attended
pattern moved in the direction opposite to the preferred direction
of the recorded neuron. The spatial location of the attended
pattern was exactly the same in the two conditions. Thus a nonspatial,
feature-based mechanism of attention seemed to be at work:
pigeonholing.
The result can be explained by assuming that when the monkey
attended to movement in a particular direction, the β value
for movement in that direction was high while the β value for
movement in the opposite direction was low. As the monkey was
monitoring the pattern for hundreds to thousands of milliseconds,
there should be plenty of time to adjust the β values in accordance
with the display. Consider the situation in which the preferred
movement of the recorded neuron was upwards but the monkey
attended to movement in the opposite direction (downwards), trying
to detect a small change in the movement of the pattern to
be attended. Perceptual bias should generally reflect expectations,
and since the monkey had learned that only small changes in the
direction of movement of the target would occur, and the target
pattern was moving downwards, β_upwards should be low. Because the
activation of the recorded cell should be proportional to β_upwards,
the recorded activation should also be low. By contrast, when the
monkey attended to movement in the preferred direction of the
recorded cell, β values should be high for upwards and nearby
directions. In this case, β_upwards being high, the activation of the
recorded neuron also should be high.

6. Concluding remarks

As exemplified in the previous section, NTVA has been applied
to explain a wide range of attentional effects observed in single-cell
studies on filtering and pigeonholing (for applications of NTVA
to brain imaging studies of visual attention and short-term memory,
see Bundesen & Habekost, 2008, chapter 9). The applications
to single-cell studies have yielded substantial support to the rate
equation of TVA and the neurophysiological interpretation of this
equation made in NTVA. As emphasized in the sections above
entitled The VSTM System and Computational Networks, NTVA also
contains specific proposals for neural networks implementing not
only the attentional functions but also the visual short-term memory
functions assumed in NTVA (see, in particular, Figs. 10 and 11).
The particular networks we have presented may be regarded as
merely proofs of the existence of simple and biologically plausible
neural networks that can implement the operations of NTVA.
However, given the simplicity and the biological plausibility of the
networks, it would seem worthwhile to search for the suggested
types of networks in the human brain.

References

Broadbent, D. E. (1971). Decision and stress. London: Academic Press.
Bublak, P., Finke, K., Krummenacher, J., Preger, R., Kyllingsbæk, S., Müller, H. J., et al. (2005). Usability of a theory of visual attention (TVA) for parameter-based measurement of attention II: Evidence from two patients with frontal or parietal damage. Journal of the International Neuropsychological Society, 11, 843–854.
Bublak, P., Redel, P., & Finke, K. (2006). Spatial and non-spatial attention deficits in neurodegenerative diseases: Assessment based on Bundesen’s theory of visual attention (TVA). Restorative Neurology and Neuroscience, 24, 287–301.
Bublak, P., Redel, P., Sorg, C., Kurz, A., Förstl, H., Müller, H. J., et al. (2009). Staged decline of visual processing capacity in mild cognitive impairment and Alzheimer’s disease. Neurobiology of Aging (August 25 Epub ahead of print).
Bundesen, C. (1990). A theory of visual attention. Psychological Review, 97, 523–547.
Bundesen, C. (1993). The relationship between independent race models and Luce’s choice axiom. Journal of Mathematical Psychology, 37, 446–471.
Bundesen, C. (1998). A computational theory of visual attention. Philosophical Transactions of the Royal Society of London, Series B, 353, 1271–1281.
Bundesen, C., & Habekost, T. (2008). Principles of visual attention: Linking mind and brain. Oxford University Press.
Bundesen, C., Habekost, T., & Kyllingsbæk, S. (2005). A neural theory of visual attention: Bridging cognition and neurophysiology. Psychological Review, 112, 291–328.
Bundesen, C., Kyllingsbæk, S., & Larsen, A. (2003). Independent encoding of colors and shapes from two stimuli. Psychonomic Bulletin and Review, 10, 474–479.
Bundesen, C., & Pedersen, L. F. (1983). Color segregation and visual search. Perception and Psychophysics, 33, 487–493.
Chelazzi, L., Duncan, J., Miller, E. K., & Desimone, R. (1998). Responses of neurons in inferior temporal cortex during memory-guided visual search. Journal of Neurophysiology, 80, 2918–2940.
Desimone, R. (1999). Visual attention mediated by biased competition in extrastriate visual cortex. In G. W. Humphreys, J. Duncan, & A. M. Treisman (Eds.), Attention, space, and action (pp. 13–30). Oxford, UK: Oxford University Press.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222.
Dubois, M., Kyllingsbæk, S., Prado, C., Musca, S. C., Peiffer, E., Lassus-Sangosse, D., et al. (2010). Fractionating the multi-character processing deficit in developmental dyslexia: Evidence from two case studies. Cortex, 46, 717–738.
Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology: General, 113, 501–517.
Duncan, J. (1996). Cooperating brain systems in selective perception and action. In T. Inui, & J. L. McClelland (Eds.), Attention and performance XVI: Information integration in perception and communication (pp. 549–578). Cambridge, MA: MIT Press.
Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Chavda, S., & Shibuya, H. (1999). Systematic analysis of deficits in visual attention. Journal of Experimental Psychology: General, 128, 450–478.
Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Ward, R., Kyllingsbæk, S., et al. (2003). Dorsal and ventral simultanagnosia. Cognitive Neuropsychology, 20, 675–701.
Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96, 433–458.
Estes, W. K., & Taylor, H. A. (1964). A detection method and probabilistic models for assessing information processing from brief visual displays. Proceedings of the National Academy of Sciences of the United States of America, 52(2), 446–454.
Finke, K., Bublak, P., Dose, M., Müller, H. J., & Schneider, W. X. (2006). Parameter-based assessment of spatial and non-spatial attentional deficits in Huntington’s disease. Brain, 129, 1137–1151.
Finke, K., Bublak, P., Krummenacher, J., Kyllingsbæk, S., Müller, H. J., & Schneider, W. X. (2005). Usability of a theory of visual attention (TVA) for parameter-based measurement of attention I: Evidence from normal subjects. Journal of the International Neuropsychological Society, 11, 832–842.
Finke, K., Dodds, C. M., Bublak, P., Regenthal, R., Baumann, F., Manly, T., et al. (2010). Effects of modafinil and methylphenidate on visual attention capacity: A TVA-based study. Psychopharmacology, 210, 317–329.
Finke, K., Schneider, W. X., Redel, P., Dose, M., Kerkhoff, G., Müller, H. J., et al. (2007). The capacity of attention and simultaneous perception of objects: A group study of Huntington’s disease patients. Neuropsychologia, 45, 3272–3284.
Fuster, J. M. (1997). The prefrontal cortex: Anatomy, physiology, and neuropsychology of the frontal lobe. New York and Philadelphia: Lippincott-Raven.
Gerlach, C., Marstrand, L., Habekost, T., & Gade, A. (2005). A case of impaired shape integration: Implications for models of visual object processing. Visual Cognition, 12, 1409–1443.
Goldman-Rakic, P. S. (1995). Cellular basis of working memory. Neuron, 14, 477–485.
Grossberg, S. (1976). Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121–134.
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, 1–51.
Habekost, T., & Bundesen, C. (2003). Patient assessment based on a theory of visual attention (TVA): Subtle deficits after a right frontal-subcortical lesion. Neuropsychologia, 41, 1171–1188.
Habekost, T., & Rostrup, E. (2006). Persisting asymmetries of vision after right side lesions. Neuropsychologia, 44, 876–895.
Habekost, T., & Rostrup, E. (2007). Visual attention capacity after right hemisphere lesions. Neuropsychologia, 45, 1474–1488.
Habekost, T., & Starrfelt, R. (2006). Alexia and quadrant-amblyopia: Reading disability after a minor visual field deficit. Neuropsychologia, 44, 2465–2476.
Habekost, T., & Starrfelt, R. (2009). Visual attention capacity: A review of TVA-based patient studies. Scandinavian Journal of Psychology, 50, 23–32.
Hebb, D. O. (1949). Organization of behavior. New York: John Wiley and Sons.
Kyllingsbæk, S. (2006). Modeling visual attention. Behavior Research Methods, 38, 123–133.
Kyllingsbæk, S., & Bundesen, C. (2007). Parallel processing in a multi-feature whole-report paradigm. Journal of Experimental Psychology: Human Perception and Performance, 33, 64–82.
Kyllingsbæk, S., Schneider, W. X., & Bundesen, C. (2001). Automatic attraction of attention to former targets in visual displays of letters. Perception and Psychophysics, 63, 85–98.
Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492–527.
Logan, G. D. (1996). The CODE theory of visual attention: An integration of space-based and object-based attention. Psychological Review, 103, 603–649.
Logan, G. D. (2002). An instance theory of attention and memory. Psychological Review, 109, 376–400.
Logan, G. D., & Bundesen, C. (1996). Spatial effects in the partial report paradigm: A challenge for theories of visual spatial attention. In D. L. Medin (Ed.), The psychology of learning and motivation (pp. 243–282). San Diego, CA: Academic Press.
Logan, G. D., & Gordon, R. D. (2001). Executive control of visual attention in dual-task situations. Psychological Review, 108, 393–434.
Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (pp. 103–189). New York: Wiley.
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281.
Martinez-Trujillo, J. C., & Treue, S. (2004). Feature-based attention increases the selectivity of population responses in primate visual cortex. Current Biology, 14, 744–751.
Miller, E. K., Erickson, C. A., & Desimone, R. (1996). Neural mechanisms of visual working memory in prefrontal cortex of the macaque. Journal of Neuroscience, 16, 5154–5167.
Miller, E. K., Li, L., & Desimone, R. (1993). Activity of neurons in anterior inferior temporal cortex during a short-term memory task. Journal of Neuroscience, 13, 1460–1478.
Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford, UK: Oxford University Press.
Moran, J., & Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science, 229, 782–784.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104, 266–300.
Parker, A. J., & Newsome, W. T. (1998). Sense and the single neuron: Probing the physiology of perception. Annual Review of Neuroscience, 21, 227–277.
Peers, P. V., Ludwig, C. J., Rorden, C., Cusack, R., Bonfiglioli, C., Bundesen, C., et al. (2005). Attentional functions of parietal and frontal cortex. Cerebral Cortex, 15, 1469–1484.
Posner, M. I., Nissen, M. J., & Ogden, W. C. (1978). Attended and unattended processing modes: The role of set for spatial location. In H. L. Pick, & I. J. Saltzman (Eds.), Modes of perceiving and processing information (pp. 137–157). Hillsdale, NJ: Erlbaum.
Redel, P., Bublak, P., Sorg, C., Kurz, A., Förstl, H., Müller, H. J., et al. (2010). Deficits of spatial and task-related attentional selection in mild cognitive impairment and Alzheimer’s disease. Neurobiology of Aging (June 17 Epub ahead of print).
Reynolds, J. H., Chelazzi, L., & Desimone, R. (1999). Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience, 19, 1736–1753.
Schneider, W., & Fisk, A. D. (1982). Degree of consistent training: Improvements in search performance and automatic process development. Perception and Psychophysics, 31, 160–168.
Schneider, W. X. (1995). VAM: A neuro-cognitive model for visual attention, control of segmentation, object recognition, and space-based motor action. Visual Cognition, 2, 331–375.
Shibuya, H., & Bundesen, C. (1988). Visual selection from multielement displays: Measuring and modeling effects of exposure duration. Journal of Experimental Psychology: Human Perception and Performance, 14, 591–600.
Sperling, G. (1967). Successive approximations to a model for short-term memory. Acta Psychologica, 27, 285–292.
Starrfelt, R., Habekost, T., & Gerlach, C. (2010). Visual processing in pure alexia: A case study. Cortex, 46, 242–255.
Starrfelt, R., Habekost, T., & Leff, A. P. (2009). Too little, too late: Reduced visual span and speed characterize pure alexia. Cerebral Cortex, 19, 2880–2890.
Townsend, J. T., & Ashby, F. G. (1982). Experimental test of contemporary mathematical models of visual letter recognition. Journal of Experimental Psychology: Human Perception and Performance, 8, 834–864.
Townsend, J. T., & Landon, D. E. (1982). An experimental and theoretical investigation of the constant-ratio rule and other models of visual letter confusion. Journal of Mathematical Psychology, 25, 119–162.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Treisman, A. M., & Gormican, S. (1988). Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95, 15–48.
Treue, S., & Martinez-Trujillo, J. C. M. (1999). Feature-based attention influences motion processing gain in macaque visual cortex. Nature, 399, 575–579.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.
van Oeffelen, M. P., & Vos, P. G. (1982). Configurational effects on the enumeration of dots: Counting by groups. Memory and Cognition, 10, 396–404.
van Oeffelen, M. P., & Vos, P. G. (1983). An algorithm for pattern description on the level of relative proximity. Pattern Recognition, 16, 341–348.
Zeki, S. (1993). A vision of the brain. Oxford, UK: Blackwell.
