Attention in Learning (5 Dec)
Abstract

Learners exhibit many apparently irrational behaviors in their use of cues, sometimes learning to ignore relevant cues or to attend to irrelevant ones. A learning phenomenon called highlighting seems especially to demand explanation in terms of learned attention. Highlighting complements the classic phenomenon of conditioned blocking, which has been shown to involve learned inattention. Highlighting and blocking, along with a wide spectrum of other perplexing learning phenomena, can be accounted for by recent connectionist models in which both attentional shifting and associative learning are driven by the rational goal of rapid error reduction.

Keywords

attention; blocking; highlighting; learning; connectionist model

Will tomorrow bring rain or sunshine? The weather might be predicted by myriad cues: the color of the sunset, the shapes of the clouds, the direction the cows are facing, the number of neighbors washing their cars. Which cues should an observer attend to? People can shift their attention to the cues that predict the weather, and people can learn to associate those cues with specific outcomes such as rain or shine.

Researchers have long studied how humans and animals learn to allocate attention across potentially informative cues (e.g., Trabasso & Bower, 1968), and the past decade has produced notable empirical and theoretical advances. In this article, I focus on two such advances: an effect called highlighting that offers new opportunities for studying attentional learning, and the development of connectionist learning models that explain both attentional shifting and associative learning by using the unifying mechanism of error reduction.

HIGHLIGHTING: A NEW ATTENTIONAL EFFECT

Three phenomena have often been interpreted as reflecting attentional learning (for a review, see Oades & Sartory, 1997). One is latent inhibition, which occurs when a cue is initially presented with no apparent outcome but later is made to be a perfect predictor of a novel outcome. People are slow to learn about the cue’s predictiveness, apparently because they inhibit attention to the cue (Lubow, 1989). A second attentional effect is difficulty with extradimensional relevance shifts.² In an extradimensional relevance shift, the predictiveness of two cue dimensions changes. Initially cues from one dimension (e.g., color) are relevant and cues from a second dimension (e.g., shape) are irrelevant, but then the relevance of the dimensions is exchanged. This type of shift is difficult to learn, apparently because people have learned to ignore the initially irrelevant dimension. A third phenomenon of attentional learning is conditioned blocking. I review blocking in detail to provide context for a complementary phenomenon.

Conditioned blocking refers to a situation in which a new cue accompanies an old cue that was already learned to perfectly predict an outcome. People tend not to associate the new cue with the outcome; that is, learning about the new cue has apparently been blocked. The blocking effect has been produced in a wide variety of paradigms and species since its discovery by Kamin (1968). In one type of blocking procedure, human participants are asked to view lists of symptoms of hypothetical patients, one at a time, and to guess for each patient the appropriate diagnosis from a menu of (fictitious) diseases. After each guess, the correct diagnosis is provided, and the learner is allowed to study the symptoms and correct diagnosis before moving on to the next case. The structure of the pairings between cues (symptoms) and outcomes (diseases), and the phases of training, are shown in Table 1. In the early phase of training, the learner studies cases in which cue A produces outcome X, denoted as A → X. In a later phase of training, the cue is accompanied by a redundant relevant cue, B, but still leads to the same outcome, X (denoted by A.B → X). The transition from early to late phases of training is seamless.

The later training phase also includes intermixed trials of cases, denoted as C.D → Y, in which two different symptoms lead to a second outcome. These control cases occur with the same frequency as A.B → X trials. If all that matters for the learning of associations is the number of co-occurrences, then the strength of association between B and X should be the same as the
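
The co-occurrence argument in the blocking design can be illustrated with a minimal sketch that builds the trial list (A → X early; A.B → X and C.D → Y intermixed later) and tallies how often each cue appears with each outcome. The function names and the trial counts (`n_early`, `n_late`) are hypothetical choices for illustration, not values from the article.

```python
from collections import Counter


def blocking_trials(n_early=10, n_late=10):
    """Return (cues, outcome) trials for the blocking design.

    n_early and n_late are hypothetical trial counts (an assumption).
    """
    # Early phase: cue A alone predicts outcome X.
    early = [(("A",), "X")] * n_early
    # Late phase: A.B -> X and the control case C.D -> Y occur equally often.
    late = [(("A", "B"), "X"), (("C", "D"), "Y")] * n_late
    return early + late


def cooccurrences(trials):
    """Count how many times each cue is paired with each outcome."""
    counts = Counter()
    for cues, outcome in trials:
        for cue in cues:
            counts[(cue, outcome)] += 1
    return counts


counts = cooccurrences(blocking_trials())
print(counts[("A", "X")], counts[("B", "X")], counts[("D", "Y")])  # 20 10 10
```

On a pure co-occurrence tally, cue B is paired with X exactly as often as the control cues C and D are paired with Y, so any difference in how strongly people associate these cues with their outcomes has to come from something other than frequency.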
plicitous association of A with Y. This rapid attentional shift greatly reduces interference between the previously learned case of A.B → X and the to-be-learned case of A.D → Y. If the shift of attention were not rapid, then the initial association of A with X would be extinguished and the association of D with Y would not differ much in strength from the association of B with X.

The attentional shift must itself be learned, however, so that when case A.D appears, attention is shifted away from A to D. This learned attention can be assessed by examining learning subsequent to highlighting. It is conventionally assumed that learning about an ignored cue is retarded. Thus, after highlighting, subsequent learning about cue A should be slower than learning about cue D, specifically when cues A and D are presented together. Unpublished experiments in my lab have confirmed this prediction.

ATTENTIONAL SHIFTING BY ERROR REDUCTION

Blocking and highlighting are just two examples of seemingly irrational learning. In both blocking and highlighting procedures, cues B and D are equally perfect predictors of their respective outcomes (i.e., p(X|B) = p(Y|D) = 100%). Nevertheless, in blocking, B is apparently underweighted, and in highlighting, D is apparently overweighted. In both procedures, there is evidence that people have shifted attention to specific cues and learned those attentional reallocations.

It turns out that these attentional shifts, and their concomitant irrational behaviors, can be accurately modeled by a simple rational process: rapid error reduction. Consider first the case of highlighting. When learning A.B → X in the early phase of training, people attend to both cues equally (on average) by default. In the later training phase, when people are learning A.D → Y, any attention to A generates an erroneous production of response X. They reduce this error by shifting attention away from A to D.

Consider next the case of blocking. When the first cases of A.B → X appear in the late-training phase, B garners some attention by default. Because attention has limited capacity, this distraction by B causes A to get less than full attention, and, therefore, the previously learned X response is not generated as strongly as it should be. This error can be quickly corrected if attention is shifted away from B to A.

This general scheme of attentional shifting by error reduction has been rigorously implemented in a series of specific connectionist models developed in recent years (e.g., Kruschke, 2001b; Kruschke & Johansen, 1999). In connectionist models generally, cues and outcomes are represented by activations of nodes in a network, roughly analogous to interconnected neurons in the brain. Activation flows from node to node via connections with different weights, roughly analogous to neural synapses with different conductances. The connectionist models of attentional shifting generalize and unify historically influential models, such as the Rescorla-Wagner (1972) model, the attentional model of Mackintosh (1975), and the generalized context model of Nosofsky (1986).

Figure 1 is a schematic of the architecture common to all these connectionist networks. When cues are presented, they activate nodes at the lowest layer of the network. Attention is represented by multiplicative gates (denoted in the figure by triangles) on the cue activations. The input activations are transmitted (along the thin arrows in the figure) to these gates, where they are made available for attentional modulation; attended-to cues are amplified by large multipliers (i.e., gates with high activations) and ignored cues are attenuated by small multipliers (i.e., gates with low activations). Any presented cue garners some attention by default, and the attentional gates compete (denoted in Fig. 1 by the horizontal line between triangles) for a limited-capacity pool of activation. The attentionally gated cue activation then propagates to outcome nodes at the top of the network. If a cue has a small attentional multiplier, its activation is not propagated strongly to the outcome nodes. The activations at the outcome nodes are transformed into choice probabilities to match human choice data.

Learning in this architecture proceeds as follows: When cues are presented at the beginning of a trial, the corresponding cue nodes are given activation values of 1, whereas nodes for absent cues have zero activation. These activations are transmitted to the attentional gates via the fixed connections represented by the thin arrows in Figure 1, and at the same time learned connections from the cues to the attentional gates (the lower thick arrow in Fig. 1) differentially activate the gates, thereby allocating learned attention across the cues. These learned connections thus generate the distribution of attention. Activation then propagates from the attentionally gated cues to the outcome nodes (via the upper thick arrow in Fig. 1). The activation of the outcome nodes is transformed into a choice, such that more highly activated outcomes are more likely to be chosen than less activated outcomes. This choice corresponds to a learner’s response in an environment. Then the correct response is presented to