Attention in Learning

John K. Kruschke¹
Department of Psychology, Indiana University, Bloomington, Indiana

CURRENT DIRECTIONS IN PSYCHOLOGICAL SCIENCE, Volume 12, Number 5, October 2003
Copyright © 2003 American Psychological Society. Published by Blackwell Publishing Inc.

Abstract
Learners exhibit many apparently irrational behaviors in their use of cues, sometimes learning to ignore relevant cues or to attend to irrelevant ones. A learning phenomenon called highlighting seems especially to demand explanation in terms of learned attention. Highlighting complements the classic phenomenon of conditioned blocking, which has been shown to involve learned inattention. Highlighting and blocking, along with a wide spectrum of other perplexing learning phenomena, can be accounted for by recent connectionist models in which both attentional shifting and associative learning are driven by the rational goal of rapid error reduction.

Keywords
attention; blocking; highlighting; learning; connectionist model

Will tomorrow bring rain or sunshine? The weather might be predicted by myriad cues: the color of the sunset, the shapes of the clouds, the direction the cows are facing, the number of neighbors washing their cars. Which cues should an observer attend to? People can shift their attention to the cues that predict the weather, and people can learn to associate those cues with specific outcomes such as rain or shine.

Researchers have long studied how humans and animals learn to allocate attention across potentially informative cues (e.g., Trabasso & Bower, 1968), and the past decade has produced notable empirical and theoretical advances. In this article, I focus on two such advances: an effect called highlighting that offers new opportunities for studying attentional learning, and the development of connectionist learning models that explain both attentional shifting and associative learning by using the unifying mechanism of error reduction.

HIGHLIGHTING: A NEW ATTENTIONAL EFFECT

Three phenomena have often been interpreted as reflecting attentional learning (for a review, see Oades & Sartory, 1997). One is latent inhibition, which occurs when a cue is initially presented with no apparent outcome but later is made to be a perfect predictor of a novel outcome. People are slow to learn about the cue’s predictiveness, apparently because they inhibit attention to the cue (Lubow, 1989). A second attentional effect is difficulty with extradimensional relevance shifts.² In an extradimensional relevance shift, the predictiveness of two cue dimensions changes. Initially cues from one dimension (e.g., color) are relevant and cues from a second dimension (e.g., shape) are irrelevant, but then the relevance of the dimensions is exchanged. This type of shift is difficult to learn, apparently because people have learned to ignore the initially irrelevant dimension. A third phenomenon of attentional learning is conditioned blocking. I review blocking in detail to provide context for a complementary phenomenon, called highlighting, that I propose should be added to the list of attentional effects.

Blocking

Conditioned blocking refers to a situation in which a new cue accompanies an old cue that was already learned to perfectly predict an outcome. People tend not to associate the new cue with the outcome; that is, learning about the new cue has apparently been blocked. The blocking effect has been produced in a wide variety of paradigms and species since its discovery by Kamin (1968). In one type of blocking procedure, human participants are asked to view lists of symptoms of hypothetical patients, one at a time, and to guess for each patient the appropriate diagnosis from a menu of (fictitious) diseases. After each guess, the correct diagnosis is provided, and the learner is allowed to study the symptoms and correct diagnosis before moving on to the next case. The structure of the pairings between cues (symptoms) and outcomes (diseases), and the phases of training, are shown in Table 1. In the early phase of training, the learner studies cases in which cue A produces outcome X, denoted as A → X. In a later phase of training, the cue is accompanied by a redundant relevant cue, B, but still leads to the same outcome, X (denoted by A.B → X). The transition from early to late phases of training is seamless.

Table 1. Designs for blocking and highlighting procedures

                  Procedure
Phase             Blocking                     Highlighting
Early training    A → X                        A.B → X
Late training     A.B → X, C.D → Y             A.B → X, A.D → Y
Testing           B.D → ?                      B.D → ?
                  (Result: Y is preferred;     (Result: Y is preferred;
                  i.e., B is blocked)          i.e., D is highlighted)

Note. A, B, C, and D are stimulus cues. X and Y are outcomes. Arrows indicate the correspondence of cues to outcomes; question marks indicate trials on which the correct response is not provided.

The later training phase also includes intermixed trials of cases, denoted as C.D → Y, in which two different symptoms lead to a second outcome. These control cases occur with the same frequency as A.B → X trials. If all that matters for the learning of associations is the number of co-occurrences, then the strength of association between B and X should be the same as the strength of association between D and Y.

This prediction of equal associative strength is assessed in the final testing phase, in which cues B and D are presented together and the learners are asked to make their best diagnosis based on what they learned previously. The result is a strong tendency to choose outcome Y rather than outcome X. Apparently, learning about cue B has been prevented, or “blocked.”

The discovery of blocking, and its ubiquity, had an enormous impact on theories of learning. It disconfirmed simple contiguity theories, according to which learning is determined by the frequency of co-occurrences of cue and outcome, and it posed a major challenge for new theories.

The dominant interpretation of blocking posits that the outcome X has a diminished influence on learning when the participants study cases of A.B → X in the later phase of training. This theory, which was formalized in the enormously influential mathematical model developed by Rescorla and Wagner (1972), suggests that because the learner has already learned that cue A fully predicts outcome X, the learner experiences little surprise when cases of A.B → X appear, and hence learns little about cue B.

Blair and I (Kruschke & Blair, 2000) provided evidence that this explanation is incomplete because people do, in fact, learn something about the redundant relevant cue: They learn to ignore it. We showed that subsequent learning of a new association with the blocked cue is retarded compared with learning of a new association with an unblocked control cue. The retardation of learning is predicted by the conventional assumption that a learner will take a longer time to learn about an attentionally suppressed cue than about a cue that is attended. Whereas blocking per se can be explained without appeal to attention, the subsequent retardation of learning about a blocked cue cannot be accounted for by the Rescorla-Wagner model and is most naturally explained as learned inattention.

This new empirical result is the first such demonstration in humans and is generally consistent with the attentional theories of Mackintosh (1975), whose particular formalism has been generalized and unified within a new connectionist framework described later in this article.

Highlighting

Whereas in blocking people learn to ignore a newly added cue, in highlighting people learn to attend to it. The right side of Table 1 shows the basic structure of training that produces the highlighting effect (adapted from the “inverse base-rate effect” reported by Medin & Edelson, 1988). In the early phase of training, cues A and B are presented together and indicate outcome X. In later training, cases of A.B → X continue, but are intermixed with cases of A.D → Y. The structure of the two cases is symmetrical in that each outcome has a single perfect predictor (B for X and D for Y) and the outcomes share an imperfect predictor (A).

If people learn the structural symmetry, then cue A should not be differentially associated with the outcomes, and cues B and D should be equally associated with their respective outcomes, X and Y. In particular, the Rescorla-Wagner (1972) model predicts that learning should be symmetric after adequate learning in the later phase.

These predictions are disconfirmed by results from the final testing phase. When cue A is presented by itself, people strongly prefer outcome X. When cues B and D are presented together, people reliably prefer outcome Y. Apparently, when learning case A.D → Y, people shift attention away from A, which they have already learned indicates the other outcome X, and attentionally highlight cue D. The phenomenon of highlighting occurs across a number of variations of the training procedures and stimuli (see citations in Kruschke, 2001a).

The phenomenon of highlighting implicates rapid shifts of attention during encoding; that is, on a given learning trial, the attentional shift is made largely before the associations are learned. On a later-trained trial of A.D → Y, attention rapidly shifts away from A toward D to achieve the preservation of the previously learned association of A with X, and the prevention of a duplicitous association of A with Y. This rapid attentional shift greatly reduces interference between the previously learned case of A.B → X and the to-be-learned case of A.D → Y. If the shift of attention were not rapid, then the initial association of A with X would be extinguished and the association of D with Y would not differ much in strength from the association of B with X.

The attentional shift must itself be learned, however, so that when case A.D appears, attention is shifted away from A to D. This learned attention can be assessed by examining learning subsequent to highlighting. It is conventionally assumed that learning about an ignored cue is retarded. Thus, after highlighting, subsequent learning about cue A should be slower than learning about cue D, specifically when cues A and D are presented together. Unpublished experiments in my lab have confirmed this prediction.

ATTENTIONAL SHIFTING BY ERROR REDUCTION

Blocking and highlighting are just two examples of seemingly irrational learning. In both blocking and highlighting procedures, cues B and D are equally perfect predictors of their respective outcomes (i.e., p(X|B) = p(Y|D) = 100%). Nevertheless, in blocking, B is apparently underweighted, and in highlighting, D is apparently overweighted. In both procedures, there is evidence that people have shifted attention to specific cues and learned those attentional reallocations.

It turns out that these attentional shifts, and their concomitant irrational behaviors, can be accurately modeled by a simple rational process: rapid error reduction. Consider first the case of highlighting. When learning A.B → X in the early phase of training, people attend to both cues equally (on average) by default. In the later training phase, when people are learning A.D → Y, any attention to A generates an erroneous production of response X. They reduce this error by shifting attention away from A to D.

Consider next the case of blocking. When the first cases of A.B → X appear in the late-training phase, B garners some attention by default. Because attention has limited capacity, this distraction by B causes A to get less than full attention, and, therefore, the previously learned X response is not generated as strongly as it should be. This error can be quickly corrected if attention is shifted away from B to A.

This general scheme of attentional shifting by error reduction has been rigorously implemented in a series of specific connectionist models developed in recent years (e.g., Kruschke, 2001b; Kruschke & Johansen, 1999). In connectionist models generally, cues and outcomes are represented by activations of nodes in a network, roughly analogous to interconnected neurons in the brain. Activation flows from node to node via connections with different weights, roughly analogous to neural synapses with different conductances. The connectionist models of attentional shifting generalize and unify historically influential models, such as the Rescorla-Wagner (1972) model, the attentional model of Mackintosh (1975), and the generalized context model of Nosofsky (1986).

Figure 1 is a schematic of the architecture common to all these connectionist networks. When cues are presented, they activate nodes at the lowest layer of the network. Attention is represented by multiplicative gates (denoted in the figure by triangles) on the cue activations. The input activations are transmitted (along the thin arrows in the figure) to these gates, where they are made available for attentional modulation; attended-to cues are amplified by large multipliers (i.e., gates with high activations) and ignored cues are attenuated by small multipliers (i.e., gates with low activations). Any presented cue garners some attention by default, and the attentional gates compete (denoted in Fig. 1 by the horizontal line between triangles) for a limited-capacity pool of activation. The attentionally gated cue activation then propagates to outcome nodes at the top of the network. If a cue has a small attentional multiplier, its activation is not propagated strongly to the outcome nodes. The activations at the outcome nodes are transformed into choice probabilities to match human choice data.

Fig. 1. Connectionist architecture for learning attention and outcomes. Error at the outcome nodes initially drives rapid shifting of attention (at the triangle-shaped nodes). After attention is shifted, the remaining error drives learning of associations (thick arrows) between the cues and the shifted attentional distribution and between the attentionally gated cues and the outcomes. The associative links can be direct or mediated by intervening layers of nodes.

Learning in this architecture proceeds as follows: When cues are presented at the beginning of a trial, the corresponding cue nodes are given activation values of 1, whereas nodes for absent cues have zero activation. These activations are transmitted to the attentional gates via the fixed connections represented by the thin arrows in Figure 1, and at the same time learned connections from the cues to the attentional gates (the lower thick arrow in Fig. 1) differentially activate the gates, thereby allocating learned attention across the cues. These learned connections thus generate the distribution of attention. Activation then propagates from the attentionally gated cues to the outcome nodes (via the upper thick arrow in Fig. 1). The activation of the outcome nodes is transformed into a choice, such that more highly activated outcomes are more likely to be chosen than less activated outcomes. This choice corresponds to a learner’s response in an environment. Then the correct response is presented to the network, just as corrective feedback is presented to learners in the experiments. The correct response is represented in the network by desired activation values of the outcome nodes; the correct outcome has a desired activation value of 1 and the other, incorrect outcomes have a desired activation value of 0. The model computes the discrepancy between its guessed outcome activations and the correct activations. The singular goal of the model is to reduce this error.

The first step in reducing the error is a rapid shift of attention away from cues that cause error and toward cues that reduce error. This attentional shift corresponds to a change in the activations of the attentional multipliers so that the desired distribution of attention across the current cues will be achieved. The model should learn to generate this desired attentional distribution, instead of the distribution it erroneously generated, whenever these cues are presented. The model computes the discrepancy between the desired attentional distribution and the current attentional distribution, and then adjusts the associative weights from the cues to the attention nodes to approximate the desired distribution (the lower thick arrow in Fig. 1). Finally, the associative weights to the outcome nodes (upper thick arrow in Fig. 1) are adjusted to reduce the remaining error. Thus, both attention shifting and association learning are driven by the goal of error reduction.

Different instantiations of this general architecture have been used when addressing different subsets of learning phenomena. The RASHNL model (Kruschke & Johansen, 1999) placed a layer of nodes between the attentional gates and the outcome nodes to mediate complex mappings from cues to outcomes. The EXIT model (Kruschke, 2001b) employed a layer of nodes between the input cues and the attentional gates. A goal for future research is to combine these architectural features and address all the phenomena simultaneously.

The RASHNL and EXIT models and related models have been shown to accurately fit data from a wide variety of experiments, including not only the blocking and highlighting effects, but also many other seemingly irrational phenomena in learning, such as under- or overutilization of partially predictive cues, relevance and reversal shifts, and differential difficulty of learning different category structures. The models have also made novel predictions (e.g., they predicted the deleterious impact of undiagnostic but salient cues; Kruschke & Johansen, 1999, Experiments 3 and 4) and have unified a number of otherwise distinct effects (e.g., latent inhibition and blocking; Kruschke, 2001b). Thus, a spectrum of apparently irrational learning phenomena can be accounted for, in rigorous detail, by the rational goal of rapid error reduction.

RAMIFICATIONS

Phenomena and theories of attentional shifting promise to have an impact on many fields. Highlighting may play a role in stereotype formation, which is studied by social psychologists. Highlighting and retarded learning after blocking may be used to assess dysfunctional attention, a concern of clinical psychologists. When the models are fit to learning data from people with disorders such as schizophrenia, Huntington’s disease, or Parkinson’s disease, the models’ attentional-shifting rate and associative-learning rates may help identify the aspects of learning that differ between these and normal populations. Similarly, when the models are fit to learning data from different age groups, the models’ attentional-shifting rate and associative-learning rates may help identify which aspects of learning change through childhood, adolescence, and maturity. Consumers and marketers will be interested in how attentional shifting might influence assessments of products and consumer choice. Educators and trainers will want to know how best to arrange topics to maximize efficiency in learning while minimizing biases such as blocking and highlighting. In summary, the study of attention in associative learning holds promise for unifying disparate, perplexing behavioral effects, explaining those effects in detail through formal models, and applying the findings to many other fields.

Recommended Reading

Kruschke, J.K. (2001b). (See References)
Mackintosh, N.J. (1975). (See References)
Oades, R.D., & Sartory, G. (1997). (See References)

Acknowledgments: The author’s research has been supported by National Institute of Mental Health FIRST Award R29-MH51572 and National Science Foundation Grant BCS-9910720.

Notes

1. Address correspondence to John K. Kruschke, Department of Psychology, Indiana University, 1101 E. 10th St., Bloomington, IN 47405-7007; e-mail: kruschke@indiana.edu.
2. Recent evidence from behavioral effects, modeling, and neuroscience has shown that relevance shifts use different mechanisms than reversal shifts, in which the same cues remain relevant but the correct outcomes are reversed (see Kruschke, 1996; Rogers, Andrews, Grasby, Brooks, & Robbins, 2000). Other work has begun to separate effects of learned relevance from learned irrelevance (see Owen et al., 1993).

References

Kamin, L.J. (1968). ‘Attention-like’ processes in classical conditioning. In M.R. Jones (Ed.), Miami symposium on the prediction of behavior: Aversive stimulation (pp. 9–33). Coral Gables, FL: University of Miami Press.
Kruschke, J.K. (1996). Dimensional relevance shifts in category learning. Connection Science, 8, 201–223.
Kruschke, J.K. (2001a). The inverse base rate effect is not explained by eliminative inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1385–1400.
Kruschke, J.K. (2001b). Toward a unified model of attention in associative learning. Journal of Mathematical Psychology, 45, 812–863.
Kruschke, J.K., & Blair, N.J. (2000). Blocking and backward blocking involve learned inattention. Psychonomic Bulletin & Review, 7, 636–645.
Kruschke, J.K., & Johansen, M.K. (1999). A model of probabilistic category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1083–1119.
Lubow, R.E. (1989). Latent inhibition and conditioned attention theory. Cambridge, England: Cambridge University Press.
Mackintosh, N.J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298.
Medin, D.L., & Edelson, S.M. (1988). Problem structure and the use of base-rate information from experience. Journal of Experimental Psychology: General, 117, 68–85.
Nosofsky, R.M. (1986). Attention, similarity and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Oades, R.D., & Sartory, G. (1997). The problems of inattention: Methods and interpretations. Behavioural Brain Research, 88, 3–10.
Owen, A.M., Roberts, A.C., Hodges, J.R., Summers, B.A., Polkey, C.E., & Robbins, T.W. (1993). Contrasting mechanisms of impaired attentional set-shifting in patients with frontal lobe damage or Parkinson’s disease. Brain, 116, 1159–1175.
Rescorla, R.A., & Wagner, A.R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A.H. Black & W.F. Prokasy (Eds.), Classical conditioning: II. Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rogers, R.D., Andrews, T.C., Grasby, P.M., Brooks, D.J., & Robbins, T.W. (2000). Contrasting cortical and subcortical activations produced by attentional-set shifting and reversal learning in humans. Journal of Cognitive Neuroscience, 12, 142–162.
Trabasso, T., & Bower, G.H. (1968). Attention in learning: Theory and research. New York: Wiley.
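
Illustrative sketch: blocking under a Rescorla-Wagner style update

The account of blocking attributed in the text to Rescorla and Wagner (1972), in which little surprise on A.B → X trials leaves little to learn about cue B, can be illustrated with a minimal simulation of the blocking design in Table 1. This is a sketch under stated assumptions, not the published model's parameterization: the function names, the learning rate of 0.2, the 20 trials per phase, and the strict alternation of late-phase trials are all hypothetical choices made for illustration.

```python
CUES = ("A", "B", "C", "D")
OUTCOMES = ("X", "Y")


def new_weights():
    """Associative strength from every cue to every outcome, initially zero."""
    return {cue: {out: 0.0 for out in OUTCOMES} for cue in CUES}


def predict(weights, present_cues):
    """Summed associative strength of the present cues for each outcome."""
    return {out: sum(weights[cue][out] for cue in present_cues)
            for out in OUTCOMES}


def train(weights, trials, learning_rate=0.2):
    """Apply one error-driven (delta-rule) update per trial, in order.

    The learning rate is an assumed value chosen only for illustration.
    """
    for present_cues, correct in trials:
        prediction = predict(weights, present_cues)
        for out in OUTCOMES:
            target = 1.0 if out == correct else 0.0
            error = target - prediction[out]  # small error = little surprise
            for cue in present_cues:
                weights[cue][out] += learning_rate * error
    return weights


def blocking_design(n_early=20, n_late=20):
    """Trial list for the blocking column of Table 1 (trial counts assumed).

    Each cue compound is written as a string of single-letter cue names.
    """
    early = [("A", "X")] * n_early
    late = [("AB", "X"), ("CD", "Y")] * n_late  # intermixed, equal frequency
    return early + late


weights = train(new_weights(), blocking_design())
test_activation = predict(weights, "BD")
print(f"B->X {weights['B']['X']:.3f}   D->Y {weights['D']['Y']:.3f}")
print("B.D test prefers", max(test_activation, key=test_activation.get))
```

With these assumed settings the B → X strength stays close to zero whereas D → Y rises to roughly one half, so the combined B.D test favors Y. As the text notes, an update of this kind reproduces blocking itself but has no mechanism for the retarded later learning about the blocked cue.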

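Illustrative sketch: attentional shifting by error reduction on one trial

The three learning steps described in the section on attentional shifting by error reduction (a rapid shift of attention, learning to reproduce that shift, then learning cue-outcome associations from the remaining error) can be sketched for a single highlighting trial. This is a deliberately simplified illustration and is not the RASHNL or EXIT model: it omits the intervening layers of nodes, and the exponential gain function, the normalization of attention to sum to 1, the gradient-step shift, the floor on gains, and all rates and step counts are assumptions introduced here. The names `shift_attention`, `learn_trial`, and `demo` are hypothetical.

```python
import math

CUES = ("A", "B", "D")
OUTCOMES = ("X", "Y")


def attention(gains):
    """Share a limited capacity of 1.0 among cues in proportion to their gains."""
    total = sum(gains.values())
    return {cue: gain / total for cue, gain in gains.items()}


def activation(gains, weights):
    """Outcome activation produced by the attentionally gated cues."""
    alpha = attention(gains)
    return {out: sum(alpha[c] * weights[c][out] for c in gains)
            for out in OUTCOMES}


def errors(gains, weights, correct):
    """Desired activation (1 if correct, else 0) minus actual activation."""
    act = activation(gains, weights)
    return {out: (1.0 if out == correct else 0.0) - act[out]
            for out in OUTCOMES}


def shift_attention(gains, weights, correct, rate=1.0, steps=10, floor=0.01):
    """Step 1: move the gains down the gradient of the squared outcome error."""
    gains = dict(gains)
    for _ in range(steps):
        act = activation(gains, weights)
        err = errors(gains, weights, correct)
        total = sum(gains.values())
        step = {c: sum(err[o] * (weights[c][o] - act[o]) for o in OUTCOMES) / total
                for c in gains}
        gains = {c: max(gains[c] + rate * step[c], floor) for c in gains}
    return gains


def learn_trial(present, correct, weights, gate_weights,
                weight_rate=0.5, gate_rate=0.1):
    """One trial: shift attention, learn the shift, then learn associations."""
    # Learned cue-to-gate weights set the initial gain of each present cue.
    initial = {c: math.exp(sum(gate_weights[s][c] for s in present))
               for c in present}
    shifted = shift_attention(initial, weights, correct)
    for c in present:  # Step 2: learn to reproduce the shifted attention
        change = math.log(shifted[c] / initial[c])
        for s in present:
            gate_weights[s][c] += gate_rate * change
    alpha = attention(shifted)
    err = errors(shifted, weights, correct)
    for c in present:  # Step 3: reduce the remaining error with the weights
        for o in OUTCOMES:
            weights[c][o] += weight_rate * alpha[c] * err[o]
    return attention(initial), alpha


def demo():
    """Early A.B -> X training, then the first A.D -> Y trial."""
    weights = {c: {o: 0.0 for o in OUTCOMES} for c in CUES}
    gate_weights = {s: {c: 0.0 for c in CUES} for s in CUES}
    for _ in range(20):  # assumed number of early-phase trials
        learn_trial("AB", "X", weights, gate_weights)
    before, after = learn_trial("AD", "Y", weights, gate_weights)
    return before, after, weights, gate_weights
```

Calling `demo()` returns the attention distribution before and after the shift on the first A.D → Y trial. Under these assumptions attention starts out split equally between A and D and ends mostly on D, so most of the new association with Y is placed on D while the previously learned A → X weight is left nearly intact. A simulation of the full highlighting design, including the B.D test, would require the details of the published models that this sketch leaves out.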