
ch05_8166_Sears/Jacko_LEA 7/13/07 9:22 PM Page 93


5 ◆

COGNITIVE ARCHITECTURE

Michael D. Byrne
Rice University

What Are Cognitive Architectures? . . . . . . . . . . . . . . . . . . . 94
Relevance to Human-Computer Interaction . . . . . . . . . . . . 95
Brief Look at Past Systems in HCI . . . . . . . . . . . . . . . . . . . 96
   The Model Human Processor (MHP) and GOMS . . . . . . 97
   Cognitive Complexity Theory (CCT) . . . . . . . . . . . . . . . 98
   CAPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
   Soar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Contemporary Architectures . . . . . . . . . . . . . . . . . . . . . . 102
   LICAI/CoLiDeS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
   EPIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
   ACT-R 5.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
   Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
The Future of Cognitive Architectures in HCI . . . . . . . . . 110
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111


94 • BYRNE

Designing interactive computer systems to be efficient and easy to use is important so that people in our society may realize the potential benefits of computer-based tools. . . . Although modern cognitive psychology contains a wealth of knowledge of human behavior, it is not a simple matter to bring this knowledge to bear on the practical problems of design—to build an applied psychology that includes theory, data, and knowledge. (Card, Moran, & Newell, 1983, p. vii)

Integrating theory, data, and knowledge about cognitive psychology and human performance in a way that is useful for guiding design in HCI is still not a simple matter. However, there have been significant advances since Card, Moran, and Newell (1983) wrote the previous passage. One of the key advances is the development of cognitive architectures, the subject of this chapter. The chapter will first consider what it is to be a cognitive architecture and why cognitive architectures are relevant for HCI. In order to detail the present state of cognitive architectures in HCI, it is important to consider some of the past uses of cognitive architectures in HCI research. Then, three architectures actively in use in the research community (LICAI/CoLiDeS, EPIC, and ACT-R) and their application to HCI will be examined. The chapter will conclude with a discussion of the future of cognitive architectures in HCI.

WHAT ARE COGNITIVE ARCHITECTURES?

Most dictionaries will list several different definitions for the word architecture. For example, dictionary.com lists among them "a style and method of design and construction," e.g., Byzantine architecture; "orderly arrangement of parts; structure," e.g., the architecture of a novel; and one from computer science: "the overall design or structure of a computer system." What, then, would something have to be to qualify as a "cognitive architecture"? It is something much more in the latter senses of the word architecture, an attempt to describe the overall structure and arrangement of a very particular thing, the human cognitive system. A cognitive architecture is a broad theory of human cognition based on a wide selection of human experimental data, and implemented as a running computer simulation program. Young (Gray, Young, & Kirschenbaum, 1997; Ritter & Young, 2001) defined a cognitive architecture as an embodiment of "a scientific hypothesis about those aspects of human cognition that are relatively constant over time and relatively independent of task."

This idea has been a part of cognitive science since the early days of cognitive psychology and artificial intelligence, as manifested in the General Problem Solver or GPS (Newell & Simon, 1963), one of the first successful computational cognitive models. These theories have progressed a great deal since GPS and are gradually becoming broader. One of the best descriptions of the vision for this area is presented in Newell's (1990) book Unified Theories of Cognition. In it, Newell argued that the time has come for cognitive psychology to stop collecting disconnected empirical phenomena and begin seriously considering theoretical unification in the form of computer simulation models. Cognitive architectures are attempts to do just this.

Cognitive architectures are distinct from engineering approaches to artificial intelligence, which strive to construct intelligent computer systems by whatever technologies best serve that purpose. Cognitive architectures are designed to simulate human intelligence in a humanlike way (Newell, 1990). For example, the chess program that defeated Kasparov, Deep Blue, would not qualify as a cognitive architecture because it does not solve the problem (chess) in a humanlike way. Deep Blue uses massive search of the game space, while human experts generally look only a few moves ahead, but concentrate effectively on quality moves.

Cognitive architectures differ from traditional research in psychology in that work on cognitive architecture is integrative—that is, architectures include mechanisms to support attention, memory, problem solving, decision making, learning, and so forth. Most theorizing in psychology follows a divide-and-conquer strategy that tends to generate highly specific theories of a very limited range of phenomena; this has changed little since the 1970s (Newell, 1973). This limits the usefulness of such theories for an applied domain such as HCI, where users employ a wide range of cognitive capabilities in even simple tasks. Instead of asking, "How can we describe this isolated phenomenon?" people working with cognitive architectures can ask, "How does this phenomenon fit in with what we already know about other aspects of cognition?"

Another important feature of cognitive architectures is that they specify only the human "virtual machine," the fixed architecture. A cognitive architecture alone cannot do anything. Generally, the architecture has to be supplied with the knowledge needed to perform a particular task. The combination of an architecture and a particular set of knowledge is generally referred to as a "computational cognitive model," or just a "model." In general, it is possible to construct more than one model for any particular task. The specific knowledge incorporated into a particular model is determined by the modeler. Because the relevant knowledge must be supplied to the architecture, the knowledge engineering task facing modelers attempting to model performance on complex tasks can be formidable.

A third centrally important feature of cognitive architectures is that they are software artifacts constructed by human programmers. This has a number of relevant ramifications. First, a model of a task constructed in a cognitive architecture is runnable and produces a sequence of behaviors. These behavior sequences can be compared with the sequences produced by human users to help assess the quality of a particular model. They may also provide insight into alternate ways to perform a task—that is, they may show possible strategies that are not actually utilized by the people performing the task. This can be useful in guiding interface design as well. Another feature of many architectures is that they enable the creation of quantitative models. For instance, the model may say more than just "click on button A and then menu B," but may include the time between the two clicks as well. Models based on cognitive architectures can produce execution times, error rates, and even learning curves. This is a major strength of cognitive architectures as an approach to certain kinds of HCI problems and will be discussed in more detail in the next section.

On the other hand, cognitive architectures are large software systems, which are often considered difficult to construct and maintain. Individual models are also essentially programs, written in the "language" of the cognitive architecture. Thus, individual modelers need to have solid programming skills.

Finally, cognitive architectures are not in wide use among HCI practitioners. Right now, they exist primarily in academic research laboratories. One of the barriers for practitioners is that learning and using most cognitive architectures is itself generally a difficult task; however, this is gradually changing, and some of the issues being addressed in this regard will be discussed in section 4. Furthermore, even if cognitive architectures are not in wide use by practitioners, this does not mean that they are irrelevant to practitioners. The next section highlights why cognitive architectures are relevant to a wide HCI audience.

RELEVANCE TO HUMAN-COMPUTER INTERACTION

For some readers, the relevance of models that produce quantitative predictions about human performance will be obvious. For others, this may be less immediately clear. Cognitive architectures (a) are relevant to usability as an engineering discipline, (b) have several HCI-relevant applications in computing systems, and (c) serve an important role in HCI as a theoretical science.

At nearly all HCI-oriented conferences, and many online resources, there are areas where corporations recruit HCI professionals. A common job title in these forums is "usability engineer." Implicit in this title is the view that usability is, at least in part, an engineering enterprise. In addition, while people with this job title are certainly involved in product design, there is a sense in which most usability engineering would not be recognized as engineering by people trained in more traditional engineering disciplines, such as electrical or aerospace engineering. In traditional engineering disciplines, design is generally guided at least in part by quantitative theory. Engineers have at their disposal hard theories of the domain in which they work, and these theories allow them to derive quantitative predictions. Consider an aerospace engineer designing a wing. Like a usability engineer, the aerospace engineer will not start with nothing; a preexisting design often provides a starting point; however, when the aerospace engineer decides to make a change in that design, there is usually quantitative guidance about how the performance of the wing will change because of the change in design. This guidance, while quantitative, is not infallible, hence the need for evaluation tools such as wind tunnels. This is similar to the usability engineer's usability test. Unlike usability testing, however, the aerospace engineer has some quantitative idea about what the outcome of the test will be, and this is not guided simply by intuition and experience, but by a mathematical theory of aerodynamics. In fact, this theory is now so advanced that few wind tunnels are built anymore. Instead, they are being replaced by computer simulations based on "computational fluid dynamics," an outcome of the application of computational techniques to complex problems in aerodynamics. Simulations have not entirely replaced wind tunnels, but the demand for wind tunnel time has clearly been affected by this development.

For the most part, the usability engineer lacks the quantitative tools available to the aerospace engineer. Every design must be subjected to its own wind tunnel (usability) test, and the engineer has little guidance about what to expect other than from intuition and experience with similar tests. While intuition and experience can certainly be valuable guides, they often fall short of more "hard" quantitative methods. Perhaps the engineer can intuit that interface "X" will allow users to complete tasks faster than with interface "Y," but how much faster? Ten percent? Twenty percent? Even small savings in execution times can add up to large financial savings for organizations when one considers the scale of the activity. The paradigm example is the telephone operators studied by Gray, John, and Atwood (1993), where even a second saved on an average call would save the telephone company millions of dollars.

Computational models based on cognitive architectures have the potential to provide detailed quantitative answers, and for more than just execution times. Error rates, transfer of knowledge, learning rates, and other kinds of performance measures are all metrics that can often be provided by architecture-based models. Even if such models are not always precisely accurate in an absolute sense, they may still be useful in a comparative sense. For example, if a usability engineer is comparing interface A with interface B and the model at his or her disposal does not accurately predict the absolute times to complete some set of benchmark tasks, it may still accurately capture the ordinal difference between the two interfaces, which may be enough.

Additionally, there are certain circumstances when usability tests are either impractical, prohibitively costly, or both. For example, access to certain populations such as physicians or astronauts may be difficult or expensive, so bringing them in for repeated usability tests may not be feasible. While developing a model of a pilot or an air traffic controller performing an expert task with specialized systems may be difficult at first, rerunning that model to assess a change made to the user interface should be much more straightforward than performing a new usability test for each iteration of the system. This is possible only with a quantitatively realistic model of the human in the loop, one that can produce things such as execution times and error rates. Computational models can, in principle, act as surrogate users in usability testing, even for special populations.

Of course, some of these measures can be obtained through other methods such as GOMS (Goals, Operators, Methods, and Selection rules) analysis (Card et al., 1983; John & Kieras, 1996) or cognitive walkthrough (Polson, Lewis, Rieman, & Wharton, 1992); however, these techniques were originally grounded in the same ideas as some prominent cognitive architectures and are essentially abstractions of the relevant architectures for particular HCI purposes. In addition, architecture-based computational models provide things that GOMS models and cognitive walkthroughs do not. First, models are executable and generative. A GOMS analysis, on the other hand, is a description of the procedural knowledge the user must have and the sequence of actions that must be performed to accomplish a specific task instance, while the equivalent computational model actually generates the behaviors, often in real time or faster. Equally importantly, computational models have the capacity to be reactive in real time. So while it may be possible to construct a GOMS model that describes the knowledge necessary and the time it will take an operator to classify a new object on an air traffic controller's screen, a paper-and-pencil GOMS model can-
not actually execute the procedure in response to the appearance of such an object. A running computational model, on the other hand, can.

Because of this property, architecture-based computational models have some other important uses beyond acting as virtual users in usability tests. One such use is in intelligent tutoring systems (ITSs). Consider the Lisp tutor (Anderson, Conrad, & Corbett, 1989). This tutoring system contained an architecture-based running computational model of the knowledge necessary to implement the relevant Lisp functions, as well as a module for assessing which pieces of this knowledge were mastered by the student. Because the model was executable, it could predict what action the student would take if the student had correct knowledge of how to solve the problem. When the student took a different action, this told the ITS that the student was missing one or more relevant pieces of knowledge. The student could then be given feedback about what knowledge was missing or incomplete, and problems that exercise this knowledge could be selected by the ITS. It is possible to generate more effective educational experiences by identifying students' knowledge, as well as the gaps in that knowledge. Problems that contain knowledge students have already mastered can be avoided, so as not to bore students with things they already know. This frees up students to concentrate on the material they have not yet mastered, resulting in improved learning (Anderson et al., 1989). While the Lisp tutor is an old research system, ITSs based on the same underlying cognitive architecture with the same essential methodology have been developed for more pressing educational needs such as algebra and geometry and are now sold commercially (see http://www.carnegielearning.com).

Another HCI-relevant application for high-fidelity cognitive models is populating simulated worlds or situations. For example, training an F-16 fighter pilot is expensive, even in a simulator, because that trainee needs to face realistic opposition. Realistic opposition consists of other trained pilots, so training one person requires taking several trained pilots away from their normal duties (e.g., flying airplanes on real missions). This is difficult and expensive. If, however, the other pilots could be realistically simulated, the trainee could face opposition that would have useful training value without having to remove already trained pilots from their duties. Many training situations such as this exist, where the only way to train someone is to involve multiple human experts who must all be taken away from their regular jobs. The need for expensive experts can potentially be eliminated (or at least reduced), however, by using architecturally based cognitive models in place of the human experts. The U.S. military has already started to experiment with just such a scenario (Jones, Laird, Nielsen, Coulter, Kenny, & Koss, 1999). Having realistic opponents is desirable in domains other than training as well, such as video games. Besides things like texture-mapped 3-D graphics, one of the features often used to sell games is network play. This enables players to engage opponents whose capabilities are more comparable to their own than typical computer-generated opponents; however, even with network play, it is not always possible for a game to find an appropriate opponent. If the computer-generated opponent were a more high-fidelity simulation of a human in terms of cognitive and perceptual-motor capabilities, then video game players would have no difficulty finding appropriate opponents without relying on network play. While this might not be the most scientifically interesting use of cognitive architectures, it seems inevitable that cognitive architectures will be used in this way.

Cognitive architectures are also theoretically important to HCI as an interdisciplinary field. Many people (including some cognitive psychologists) find terms from cognitive psychology such as "working memory" or "mental model" vague and ill defined. A harsh evaluation of explanations relying on such terms is found in Salthouse (1988, p. 3): "It is quite possible that interpretations relying on such nebulous constructs are only masquerading ignorance in what is essentially vacuous terminology." Computational cognitive architectures, on the other hand, require explicit specifications of the semantics of theoretical terms. Even if the architectures are imperfect descriptions of the human cognitive system, they are at a minimum well specified and therefore clearer in what they predict than strictly verbal theories.

A second theoretical advantage of computational theories such as cognitive architectures is that they provide a window into how the theory actually works. As theories grow in size and number of mechanisms, the interactions of those mechanisms become increasingly difficult to predict analytically. Computer simulations permit relatively rapid evaluations of complex mechanisms and their interactions. (For an excellent discussion of this topic, see Simon, 1996.) Another problem with theories based solely on verbal descriptions is that their internal coherence can be very difficult to assess, while such assessment is much more straightforward with computational models. Verbal theories can easily hide subtle (and not so subtle) inconsistencies that make them poor scientific theories. Computational models, on the other hand, force explanations to have a high level of internal coherence; theories that are not internally consistent are typically impossible to implement on real machines.

Finally, HCI is an interdisciplinary field, and thus theories that are fundamentally interdisciplinary in nature are appropriate. Cognitive architectures are such theories, combining computational methods and knowledge from the artificial intelligence end of computer science with data and theories from cognitive psychology. While cognitive psychology and computer science are certainly not the only disciplines that participate in HCI, they are two highly visible forces in the field. Psychological theories that are manifested as executable programs should be less alien to people with a computer science background than more traditional psychological theories.

Thus, cognitive architectures are clearly relevant to HCI at a number of levels. This fact has not gone unnoticed by the HCI research community. In fact, cognitive architectures have a long history in HCI, dating back to the original work of Card, Moran, and Newell (1983).

BRIEF LOOK AT PAST SYSTEMS IN HCI

The total history of cognitive architectures and HCI would be far too long to document in a single chapter; however, it is possible to touch on some highlights. While not all of the systems described in this section qualify as complete cognitive architec-
tures, they all share intellectual history with more current architectures and influenced their development and use in HCI. Finally, many of the concepts developed in these efforts are still central parts of the ongoing research on cognitive architecture. In addition, there is a natural starting point:

The Model Human Processor (MHP) and GOMS

The Psychology of Human-Computer Interaction (Card, Moran, & Newell, 1983) is clearly a seminal work in HCI, one of the defining academic works in the early days of the field. While that work did not produce a running cognitive architecture, it was clearly in the spirit of cognitive architectures and was quite influential in the development of current cognitive architectures. Two particular pieces of that work are relevant here, the Model Human Processor (MHP) and GOMS.

The MHP represents a synthesis of the literature on cognitive psychology and human performance up to that time, and sketches the framework around which a cognitive architecture could be implemented. The MHP is a system with multiple memories and multiple processors, and many of the properties of those processors and memories are described in some detail. (See Fig. 5.1.) Card, Moran, and Newell (1983) also specified the interconnections of the processors and a number of general operating principles. In this system, there are three processors: (a) one cognitive, (b) one perceptual, and (c) one motor. In some cases, the system essentially behaves serially. For instance, in order for the system to press a key in response to the appearance of a light, the perceptual processor must detect the appearance of the light and transmit this information to the cognitive processor. The cognitive processor's job is to decide what the appropriate response should be, and then transmit that to the motor processor, which is responsible for actually executing the appropriate motor command. In this situation, the processors act serially, one after another; however, in more complex tasks such as transcription typing, all three processors often work in parallel.

Besides the specification of the timing for each processor and the connectivity of the processors, Card, Moran, and Newell named some general operating principles ranging from very general and qualitative to detailed and quantitative. For example, Principle P9, the Problem Space Principle, stated:

The rational activity in which people engage to solve a problem can be described in terms of (1) a set of states of knowledge, (2) operators for changing one state into another, (3) constraints on applying operators, and (4) control knowledge for deciding which operator to apply next. (Card et al., 1983, p. 27)

This is a particularly general and somewhat vague principle. In contrast, consider Principle P5, Fitts' Law:

The time Tpos to move the hand to a target of size S which lies a distance D away is given by:

Tpos = IM log2(D/S + .5)

where IM = 100 [70 ~ 120] ms/bit. (Card et al., 1983, p. 27)
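As a quick illustration of how directly a principle like this can be applied, the sketch below computes predicted pointing times from the formula above. The distance and target-size values are made up for illustration; they do not come from the chapter.

```python
import math


def fitts_time_ms(distance, size, im_ms_per_bit=100.0):
    """Predicted movement time in ms: Tpos = IM * log2(D/S + 0.5).

    IM defaults to the nominal 100 ms/bit; Card et al. report a
    range of roughly 70-120 ms/bit across individuals.
    """
    return im_ms_per_bit * math.log2(distance / size + 0.5)


# A hypothetical 2 cm wide button whose center is 16 cm from the
# cursor: log2(16/2 + 0.5) = log2(8.5), about 3.09 bits of index
# of difficulty, so roughly 309 ms of movement time.
print(round(fitts_time_ms(16, 2)))  # → 309
```

Doubling the distance or halving the target size adds one bit inside the logarithm's argument (approximately), which is why small, distant targets are disproportionately slow to acquire.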


This is a very specific principle that quantitatively describes hand movement behavior, which is highly relevant to, say, pointing with a mouse. Overall, the specification of the MHP is quite thorough, and it lays out a basis for a cognitive architecture able to do a wide variety of HCI-relevant tasks; however, Card et al. (1983) did not implement the MHP as a running cognitive architecture. This is likely for pedagogical reasons; it is not necessary to have a complete, running cognitive architecture for the general properties of that architecture to be useful for guiding HCI researchers and practitioners. At the time, computational modeling was the domain of a very specialized few in cognitive psychology.

FIGURE 5.1. Model Human Processor. Based on Card, Moran, and Newell (1983).

Card et al. (1983) laid out another concept that has been highly influential throughout HCI and particularly in the community of computational modelers. This is GOMS, which stands for Goals, Operators, Methods, and Selection rules. GOMS is a framework for task analysis that describes routine cognitive skills in terms of the four listed components. Routine cognitive skills are those where the user knows what the task is and how to do the task without doing any problem solving. Text editing with a familiar editor is the prototypical case of this, but clearly, a great many tasks of interest in HCI could be classified as routine cognitive skills. Thus, the potential applicability of GOMS is quite broad. Indeed, GOMS has been applied to a variety of tasks; the website http://www.gomsmodel.org lists 143 GOMS-related papers in its bibliography.
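The quantitative flavor of this kind of analysis can be sketched with the simplest member of the GOMS family, the Keystroke-Level Model from Card et al. (1983), in which execution time is estimated by summing standardized operator durations. The operator values below are the commonly cited ones; the task breakdown itself is a hypothetical example, not one analyzed in this chapter.

```python
# Keystroke-Level Model sketch: predicted execution time is the sum
# of standardized operator durations (commonly cited values from
# Card, Moran, & Newell, 1983). The example sequence is hypothetical.
OPERATOR_SECONDS = {
    "K": 0.28,  # press a key or button (average typist)
    "P": 1.10,  # point with the mouse to a target on screen
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental act of preparation
}


def klm_time(operators):
    """Total predicted execution time (s) for a KLM operator sequence."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Hypothetical method for deleting a word in a mouse-based editor:
# home to the mouse, prepare, point at the word, double-click (2 K's).
print(round(klm_time(["H", "M", "P", "K", "K"]), 2))  # → 3.41
```

Comparing the operator sequences that two candidate interfaces require gives exactly the kind of "how much faster?" estimate discussed earlier, without running a usability test.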

What does a GOMS analysis provide? Essentially, a GOMS analysis of a task describes the hierarchical procedural knowledge a person must have to successfully complete that task. Based on that, and the sequence of operators that must be executed, it is possible to make quantitative predictions about the execution time for a particular task. Other analyses, such as predictions of error, functionality coverage, and learning time, are also sometimes possible. Since the original formulation presented in Card et al. (1983), a number of different forms of GOMS analysis have been developed, each with slightly different strengths and weaknesses (John & Kieras, 1996).

The core point as it relates to cognitive architectures is that GOMS analysis is originally based on a production rule analysis (Card, personal communication, 1999). Because this will come up several times, a brief introduction to production systems is warranted. Production rules are IF-THEN condition-action pairs, and a set of production rules (or simply "productions," or just "rules") and a computational engine that interprets those productions is called a "production system." In addition to productions, production systems contain some representation of the current state. This representation typically consists of a set of loosely structured data elements such as propositions or attribute-value pairs. This set is called the "working memory" or "declarative memory." Because "working memory" is also a psychological term with somewhat different meaning in that literature, "declarative memory" will be used in all further discussions.

The operation of a production system is cyclic. On each cycle, the system first goes through a pattern-matching process. The IF side of each production tests for the presence of a particular pattern in declarative memory. When the IF conditions of a production are met, the production is said to fire, and the actions specified on the THEN side are executed. The actions can be things such as pressing a button, or even some higher level abstraction of action (e.g., "turn left"). Actions also include modifying the contents of declarative memory, which usually means that a different production or productions will match on the next cycle. At this abstract and purely symbolic level, production systems are Turing complete and thus can compute anything that is computable (Newell, 1990); thus, they should be flexible enough to model the wide array of computations performed by the human cognitive system.

This is relevant to cognitive architectures because most cognitive architectures are (or contain) production systems. GOMS was actually abstracted from production rule analysis. Card et al. (1983) discovered that, for routine cognitive skills, the structure of the productions was quite similar across tasks and a more abstract representation was possible. This representation is the original GOMS formulation. Thus, translating a GOMS analysis into production rules, the language of most cognitive architectures, is generally straightforward. Similarly, for routine cognitive skills, it is often relatively simple to derive a GOMS analysis from the set of productions used to model the task. Models based on cognitive architectures can go well beyond routine cognitive skills, but this connection has certainly influenced the evolution of research on cognitive architecture and HCI. This connection has also fed back into research and development of GOMS techniques themselves, such as Natural GOMS Language (NGOMSL; Kieras, 1988). NGOMSL allows the prediction of learning time for the knowledge described in a GOMS model based on a theory of transfer of training referred to as cognitive complexity theory (CCT), which will be described in more detail in the next section.

Cognitive Complexity Theory (CCT)

When someone has learned to perform a task with a particular interface and must switch, doing the same task with a new interface, how much better off will they be than someone just learning to do the task with the new interface?—that is, how much is the knowledge gained from using the old interface "transferred" to using the new interface? This kind of question has intrigued psychologists for at least a century, and having some answers to this question has implications for training programs, user interface design, and many other areas. Cognitive complexity theory (Bovair, Kieras, & Polson, 1990; Kieras & Polson, 1985) is a psychological theory of transfer of training applied to HCI. Most relevant to the current discussion, this theory is based on production rules. The major points of CCT are as follows:

• Knowledge of the procedures that people need to execute to perform routine tasks can be represented with production rules. The relevant production rules can be generated based on a GOMS analysis of the task to be modeled.

• The complexity of a task will be reflected in the number and content of the production rules. When certain conventions are adopted about the style of those production rules, complexity is reflected almost entirely in the number of rules.

• The time it takes to execute a procedure can be predicted with a production system that interprets those rules along with a set of typical operator times, for example, the time it takes to type a three-letter command. The production interpreter used in this work was not intended to be a general cognitive architecture, but the production system framework is certainly consistent with current architectures.

• The time it takes to learn a task is a function of the number of new rules that the user must learn. "New" is clearly defined in this context. If the user already has a production, and a new task requires a rule that is similar (again, similarity is well defined based on the production rule syntax), then the rule for the new task need not be learned.

• Some predictions about errors and speedup with practice can also be gleaned from the contents of the production rules.

Obviously, this was an ambitious agenda, and there are many subtleties. For example, the notion of a "task" as the term was used in the description of CCT actually includes more than just the task at an abstract level. Consider a simple instance of a text-editing task, deleting the word "redux" from the middle of a sentence. The actual commands needed to accomplish this task could be very different in different text editors; thus, modeling the "delete word" task would require two different sets of productions, one for each editor—that is, the necessary knowledge, and thus the production rules for representing it, is actually a function both of the task from the user point of view
(e.g., “delete word”) and the interface provided to accomplish the task. Transfer from one text editor to another therefore depends a great deal on the particulars of each interface. CCT thus predicts asymmetrical transfer: learning editor A after editor B should not be the same as learning editor B after editor A.

CCT models, such as a GOMS analysis, omit modeling many details of user behavior. In general, anything that falls outside the domain of procedural knowledge (how-to-do-it knowledge) is not modeled. This means that the model does not model motor actions such as keypresses, and instead has a “DoKeystroke” primitive operator. Nor do CCT models model things such as natural language comprehension, clearly a requirement in text editing. CCT models also do not include any model of the perceptual processes required by users—the model was simply given information about the state of the display and did not have to, for example, look to see if the cursor was over a particular character. This is the same scope as a typical GOMS model, though a CCT model is more formalized and quantitative than the GOMS models described by Card et al. (1983).

In spite of these limitations (or perhaps in part because these limitations allowed the researchers to concentrate on the most central aspects of the phenomena), CCT fared very well. Numerous laboratory experiments provide empirical support for many of the claims of CCT (see especially Bovair et al., 1990). The CCT framework was developed and validated in greatest detail for pre-GUI text editing, but it has also been applied to menu-based systems (Polson, Muncher, & Engelbeck, 1986) and a control panel device (Kieras & Bovair, 1986). Singley and Anderson (1989) provided a strikingly similar analysis of transfer of training, as well as supporting empirical results, lending credence to the CCT analysis. CCT was certainly one of the most prominent early successes of computational modeling in HCI.

CAPS

CAPS (Collaborative Activation-based Production System; Just & Carpenter, 1992) is a cognitive architecture designed to model individual differences in working memory (WM) capacity and the effects of working memory load. This specialty is applicable to a number of HCI situations. Certainly, some kinds of user interfaces can create excessive working memory demands, for example, phone-based interfaces. In phone-based interaction (PBI), options do not remain on a screen or in any kind of available storage; rather, users are forced to remember the options presented. This seems like a prime candidate for modeling with a system designed to capture the effects of working memory demand, and this is exactly what Huguenard, Lerch, Junker, Patz, and Kass (1997) did. Their data showed that, contrary to guideline advice and most people’s intuition, restricting phone menus to only a few (three) items each does not reduce error rates. The CAPS-based model provided a clear theoretical account of this phenomenon. The model showed that short menus are not necessarily better in PBI because of two side effects of designing menu hierarchies with few options at each level. First, for the same number of total items, this increases menu depth, which creates working memory demand. Second, with fewer items at each level, each individual item has to be more general and therefore more vague, especially at the top levels of the hierarchy. This forces users to spend WM resources on disambiguating menu items when they are in a situation where WM demands outstrip supply.

Another application of CAPS that is HCI relevant is the account of postcompletion error provided by Byrne and Bovair (1997). What is a postcompletion error? Anecdotal evidence and intuition suggest that, when interacting with man-made artifacts, certain kinds of errors occur with greater frequency than others. In particular, one entire family of errors seems anecdotally common: errors that are made when some part of a task occurs after the main goal of the task has been accomplished (hence, “postcompletion”). Nearly everyone reports having made an error of this type at one time or another. Here are two prototypical examples:

1. Leaving the original on the glass of a photocopier. The main goal one generally has when using a photocopier is “get copies,” and this goal is satisfied before one removes the original document. This error is less common now that many photocopiers include document feeders; the more current equivalent is leaving a document on the glass in a flatbed scanner.

2. Leaving one’s bank card in an automated teller machine (ATM). Again, the main goal is something on the order of “get cash,” and in many ATMs, card removal occurs after the cash is dispensed. This error was common enough in the first generation of ATMs that many ATMs are now designed in such a way that this error is impossible to make.

Other postcompletion errors include leaving the gas cap off after filling up the car’s gas tank, leaving change in vending machines, and more—most readers can probably think of several others. While numerous HCI researchers were aware of this class of error (e.g., Young, Barnard, Simon, & Whittington, 1989; Polson et al., 1992), no previous account explained why this type of error is persistent, yet not so frequent that it occurs every time. The CAPS model provides just such an account, and can serve as a useful example of the application of a cognitive architecture to an HCI problem.

Like most other production systems, CAPS contains two kinds of knowledge: (a) declarative memory and (b) productions. Declarative memory elements in CAPS also have associated with them an activation value, and elements below a threshold level of activation cannot be matched by productions’ IF sides. Additionally, unlike most other production systems, the THEN side of a CAPS production may request that the activation of an element be incremented. For this to be truly useful in modeling working memory, there is a limit to the total amount of activation available across all elements. If the total activation exceeds this limit, then all elements lose some activation to bring the total back within the limit. This provides a mechanism for simulating human working memory limitations.

In Byrne and Bovair’s (1997) postcompletion error model, there is a production that increments the activation of subgoals when the parent goal is active and unsatisfied. Therefore, to use the photocopier example, the “get copies” goal supplies activation to all the unfulfilled subgoals throughout the task; however, when the “get copies” goal is satisfied, the activation sup-
ply to the subgoals stops. Because the goal to remove the original is a subgoal of that goal, it loses its activation supply. Thus, what the model predicts is that the postcompletion subgoals are especially vulnerable to working memory load, and lower-capacity individuals are more “at risk” than higher-capacity individuals. Byrne and Bovair conducted an experiment to test this prediction, and the data supported the model.

This is a good demonstration of the power of cognitive architectures. Byrne and Bovair neither designed nor implemented the CAPS architecture, but were able to use the theory to construct a model that made empirically testable predictions, and those predictions were borne out. Though it seems unlikely that CAPS will guide future HCI work (since its designers have gone in a different direction), it provides an excellent example case for a variety of reasons.

Soar

The development of Soar is generally credited to Allen Newell (especially Newell, 1990), and Soar has been used to model a wide variety of human cognitive activity from syllogistic reasoning (Polk & Newell, 1995) to flying combat aircraft in simulated war games (Jones et al., 1999). Soar was Newell’s candidate “unified theory of cognition” and was the first computational theory to be offered as such.

While Soar is a production system like CAPS, it is possible to think of Soar at a more abstract level. The guiding principle behind the design of Soar is Principle P9 from the Model Human Processor, the Problem Space Principle. Soar casts all cognitive activity as occurring in a problem space, which consists of a number of states. States are transformed through the application of operators. Consider Soar playing a simple game like tic-tac-toe as player “X.” The problem space is the set of all states of the tic-tac-toe board—not a very large space. The operators available at any given state of that space are placing an X at any of the available open spaces on the board. Obviously, this is a simplified example; the problem space and the available operators for flying an F-16 in a simulated war game are radically more complex.

Soar’s operation is also cyclic, but the central cycle in Soar’s operation is called a decision cycle. Essentially, on each decision cycle, Soar answers the question, “What do I do next?” Soar does this in two phases. First, all productions that match the current contents of declarative memory fire. This usually causes changes in declarative memory, so other productions may now match. Those productions are allowed to fire, and this continues until no new productions fire. At this time, the decision procedure is executed, in which Soar examines a special kind of declarative memory element, the preference. Preferences are statements about possible actions, for example, “operator o3 is better than o5 for the current operator,” or “s10 rejected for supergoal state s7.” Soar examines the available preferences and selects an action. Thus, each decision cycle may contain many production cycles. When modeling human performance, the convention is that each decision cycle lasts 50 ms, so productions in Soar are very low-level, encapsulating knowledge at a small grain size.

Other than the ubiquitous application of the problem space principle, Soar’s most defining characteristics come from two mechanisms developed specifically in Soar: (a) universal subgoaling and (b) a general-purpose learning mechanism. Because the latter depends on the former, universal subgoaling will be described first. One of the features of Soar’s decision process is that it is not guaranteed to have an unambiguous set of preferences to work with. Alternately, there may be no preferences listing an acceptable action. Perhaps the system does not know any acceptable operators for the current state, or perhaps the system lacks the knowledge of how to apply the best operator. Whatever the reason, if the decision procedure is unable to select an action, an impasse is said to occur. Rather than halting or entering some kind of failure state, Soar sets up a new state in a new problem space with the goal of resolving the impasse. For example, if multiple operators were proposed, the goal of the new problem space is to choose between the proposed operators. In the course of resolving one impasse, Soar may encounter another impasse and create another new problem space, and so on. As long as the system is provided with some fairly generic knowledge about resolving degenerate cases (e.g., if all else fails, choose randomly between the two good operators), this universal subgoaling allows Soar to continue even in cases where there is little knowledge present in the system.

Learning in Soar is a by-product of universal subgoaling. Whenever an impasse is resolved, Soar creates a new production rule. This rule summarizes the processing that went on in the substate. The resolution of an impasse makes a change to the superstate (the state in which the impasse originally occurred); this change is called a result. This result becomes the action, or THEN, side of the new production. The condition, or IF, side of the production is generated through a dependency analysis by looking at any declarative memory item matched in the course of determining this result. When Soar learns, it learns only new production rules, and it only learns as the result of resolving impasses. It is important to realize that an impasse is not equated with failure or an inability to proceed in the problem solving, but may arise simply because, for example, there are multiple good actions to take and Soar has to choose one of them. Soar impasses regularly when problem solving, and thus learning is pervasive in Soar.

Not surprisingly, Soar has been applied to a number of learning-oriented HCI situations. One of the best examples is the recent work by Altmann (Altmann & John, 1999; Altmann, 2001). Altmann collected data from an experienced programmer while she worked at understanding and updating a large computer program by examining a trace. These data included verbal protocols (e.g., thinking aloud) as well as a log of the actions taken (keystrokes and scrolling). The programmer occasionally scrolled back to find a piece of information that had previously been displayed. Altmann constructed a Soar model of her activity. This model simply attempts to gather information about its environment; it is not a complete model of the complex knowledge of an experienced programmer. The model attends to various pieces of the display, attempts to comprehend what it sees, and then issues commands. “Comprehension” in this model is manifested as an attempt to retrieve information about the object being comprehended.

When an item is attended, this creates a production that notes that the object was seen at a particular time. Because learning is pervasive, Soar creates many new rules like this, but
because of the dependency-based learning mechanism, these new productions are quite specific to the context in which the impasse originally occurred. Thus, the “index” into the model’s rather extensive episodic memory consists of very specific cues, usually found on the display. Seeing a particular variable name is likely to trigger a memory for having previously seen that variable name. Importantly, this memory is generated automatically, without need for the model to deliberately set goals to remember particular items.

Altmann (2001) discussed some of the HCI ramifications of this kind of always-on episodic memory trace in terms of display clutter. While avoiding display clutter is hardly new advice, it is generally argued that it should be avoided for visual reasons (e.g., Tullis, 1983). What Altmann’s model shows, however, is that clutter can also have serious implications for effective use of episodic memory. Clutter can create enormous demands for retrieval. Since more objects will generally be attended on a cluttered display, the episodic trace will be large, lowering the predictive validity of any single cue. It is unlikely that this kind of analysis would have been generated if it had not been guided by a cognitive architecture that provided the omnipresent learning of Soar.

Soar has also been used to implement models of exploratory learning, somewhat in the spirit of LICAI (described later). There are two prominent models here, (a) one called IDXL (Rieman, Young, & Howes, 1996) and (b) a related model called Task-Action Learner (Howes & Young, 1996). These models both attempt to learn unfamiliar GUI interfaces. IDXL operates in the domain of using software to construct data graphs, while the Task-Action Learner starts with even less knowledge and learns basic GUI operations such as how to open a file. For brevity, only IDXL will be described in detail.

IDXL goes through many of the same scanning processes as LICAI, but must rely on very different mechanisms for evaluation since Soar is fundamentally different from LICAI. IDXL models evaluation of various display elements as search through multiple problem spaces, one an internal search through Soar’s internal knowledge and the other a search through the display. As items are evaluated in the search, Soar learns productions that summarize the products of each evaluation. At first, search is broad and shallow, with each item receiving a minimum of elaboration; however, that prior elaboration guides the next round of elaboration, gradually allowing IDXL to focus in on the “best” items. This model suggests a number of ways in which interface designers could help learners acquire the knowledge needed to utilize a new interface. Like the LICAI work, the IDXL work highlights the need for good labels to guide exploration. A more radical suggestion is based on one of the more subtle behaviors of users and IDXL. When exploring and evaluating alternatives, long pauses often occur on particular menu items. During these long pauses, IDXL is attempting to determine the outcome of selecting the menu item being considered. Thus, one suggestion for speeding up learning of a new menu-driven GUI is to detect such pauses, and show (in some easily undoable way) what the results of selecting that item would be. For instance, if choosing that item brings up a dialog box for specifying certain options, that dialog box could be shown in some grayed-out form, and would simply vanish if the user moved off of that menu item. This would make the evaluation of the item much more certain and would be an excellent guide for novice users. This is not unlike ToolTips for toolbar icons, but on a much larger scale.

A model that does an excellent job of highlighting the power of cognitive architectures is NTD-Soar (Nelson, Lehman, & John, 1994). NTD stands for “NASA Test Director,” who

. . . is responsible for coordinating many facets of the testing and preparation of the Space Shuttle before it is launched. He must complete a checklist of launch procedures that, in its current form, consists of 3000 pages of looseleaf manuals . . . as well as graphical timetables describing the critical timing of particular launch events. To accomplish this, the NTD talks extensively with other members of launch team over a two-way radio. . . . In addition to maintaining a good understanding of the status of the entire launch, the NTD is responsible for coordinating troubleshooting attempts by managing the communication between members of the launch team who have the necessary expertise. (p. 658)

Constructing a model that is even able to perform this task at all is a significant accomplishment. Nelson, Lehman, and John (1994) were able to not only build such a model, but this model was able to produce a timeline of behavior that closely matched the timeline produced by the actual NTD being modeled—that is, the ultimate result was a quantitative model of human performance, and an accurate one at that.

It is unlikely that such an effort could have been accomplished without the use of an integrated cognitive architecture. The NTD model made extensive use of other Soar models. Nelson, Lehman, and John (1994) did not have to generate and implement a theory of natural language understanding to model the communication between the NTD and others, or the NTD reading the pages in the checklist, because one had already been constructed in Soar (Lewis, 1993). They did not have to construct a model of visual attention to manage the scanning and visual search of those 3000 pages, because such a model already existed in Soar (Wiesmeyer, 1992). A great deal of knowledge engineering still took place in order to understand and model this complex task, but using an integrated architecture greatly eased the task of the modelers.

While this modeling effort was not aimed at a particular HCI problem, it is not difficult to see how it would be applicable to one. If one wanted to replace the 3000-page checklist with something like a personal digital assistant (PDA), how could the PDA be evaluated? There are very few NTDs in the world, and it is unlikely that they would be able to devote much time to participatory design or usability testing. Because an appropriate quantitative model of the NTD exists, however, it should be possible to give the model a simulated PDA and assess the impact of that change on the model’s performance. Even if the model does not perfectly capture the effects of the change, it is likely that the model would identify problem areas and at least guide the developers in using any time they have with an actual NTD.

Soar has also been used as the basis for simulated agents in war games, as mentioned previously (Jones et al., 1999). This model (TacAir-Soar) participates in a virtual battle space in which humans also participate. TacAir-Soar models take on a variety of roles in this environment, from fighter pilots to helicopter crews to refueling planes. Their interactions are more complex than simple scripted agents, and they can interact with humans in the environment with English natural language. One
of the major goals of the project is to make sure that TacAir-Soar produces humanlike behavior, because this is critical to their role, which is to serve as part of training scenarios for human soldiers. In large-scale simulations with many entities, it is much cheaper to use computer agents than to have humans fill every role in the simulation. While agents (other than the ubiquitous and generally disliked paper clip) have not widely penetrated the common desktop interface, this remains an active HCI research area, and future agents in other roles could also be based on cognitive architecture rather than more engineering-oriented AI models.

While the Soar architecture is still being used for applied AI work such as TacAir-Soar, there have been essentially no new HCI-oriented Soar efforts in the last several years. Nothing, in principle, has precluded such efforts, but for the time being, Soar is primarily an example of previous achievements using cognitive architectures in HCI, as opposed to a source of new insight.

CONTEMPORARY ARCHITECTURES

Current cognitive architectures are being actively developed, updated, and applied to HCI-oriented tasks. Two of the three most prominent systems (EPIC and ACT-R) are production systems, or rather are centrally built around production systems. While all contain production rules, the level of granularity of an individual production rule varies considerably from architecture to architecture. Each one has a different history and unique focus, but they share a certain amount of intellectual history; in particular, they have all been influenced one way or another by both the MHP and each other. At some level, they may have more similarities than differences; whether this is because they borrow from one another or because the science is converging is still an open question. The third system is somewhat different from these two production system models and will be considered first.

LICAI/CoLiDeS

LICAI (Kitajima & Polson, 1997) is a primary example of an HCI-oriented cognitive architecture not based on the production system framework. With the exception of the Soar work on exploratory learning, all of the work discussed up to this point more or less assumes that the modeled users are relatively skilled with the specific interface being used; these approaches do a poor job of modeling relatively raw novices. One of the main goals of LICAI is to address this concern. The paradigm question addressed by LICAI is, “How do users explore a new interface?”

Unlike the other architectures discussed, LICAI’s central control mechanisms are not based on a production system. Instead, LICAI is built around an architecture originally designed to model human discourse comprehension: construction-integration (C-I; Kintsch, 1998). Like production systems, C-I’s operation is cyclic. What happens on those cycles, however, is somewhat different from what happens in a production system. Each cycle is divided into two phases: (a) construction and (b) integration (hence, the name). In the construction phase, an initial input (e.g., the contents of the current display) is fed into a weakly constrained, rule-based process that generates a network of propositions. Items in the network are linked based on their argument overlap. For example, the goal of “graph data” might be represented with the proposition (PERFORM GRAPH DATA). Any proposition containing GRAPH or DATA would thus be linked to that goal.

Once the construction phase is completed, the system is left with a linked network of propositions. What follows is the integration phase, in which activation propagates through the network in a neural network-like fashion. Essentially, this phase is a constraint-satisfaction phase, which is used to select one of the propositions in the network as the “preferred” one. For example, the system may need to select the next action to perform while using an interface. Action representations will be added to the network during the construction phase, and an action will be selected during the integration phase. The action will be performed, and the next cycle initiated. Various C-I models have used this basic process to select things other than actions. The original C-I system used these cycles to select between different interpretations of sentences.

LICAI has three main types of cycles: (a) one to select actions, (b) one to generate goals, and (c) one to select goals. This contrasts with how most HCI tasks have been modeled in production system architectures; in such systems, the goals are usually included in the knowledge given to the system. This is not true in LICAI; in fact, the knowledge given to LICAI by the modelers is minimal. For the particular application of LICAI, which was modeling users who knew how to use a Macintosh for other tasks (e.g., word processing) and were now being asked to plot some data using a Macintosh program called CricketGraph (one group of users actually worked with Microsoft Excel), it included some very basic knowledge about the Macintosh GUI and some knowledge about graphing. Rather than supply the model with the goal hierarchy, Kitajima and Polson (1997) gave the model the same minimal instructions that were given to the subjects. One of the major jobs of the LICAI model, then, was to generate the appropriate goals as they arose while attempting to carry out the instructions.

Again, this illustrates one of the benefits of using a cognitive architecture to model HCI tasks. Kitajima and Polson (1997) did not have to develop a theory of text comprehension for LICAI to be able to comprehend the instructions given to subjects; since LICAI is based on an architecture originally designed for text comprehension, they essentially got that functionality gratis. Additionally, they did not include just any text comprehension engine, but one that makes empirically validated predictions about how people represent the text they read. Thus, the claim that the model started out with roughly the same knowledge as the users is highly credible.

The actual behavior of the model is also revealing, as it exhibits many of the same exploratory behaviors seen in the users. First, the model pursues a general strategy that can be classified as label-following (Polson et al., 1992). The model, like the users, had a strong tendency to examine anything on the screen that had a label matching or nearly matching (e.g., a near synonym) a key word in the task instructions. When the particular subtask being pursued by the model contained well-labeled
steps, the users were rapid, which was predicted by the model. While this prediction is not counterintuitive, it is important to note that LICAI is not programmed with this strategy. This strategy naturally emerges through the normal operation of construction-integration through the linkages created by shared arguments. The perhaps less-intuitive result—modeled successfully by LICAI—is the effect of the number of screen objects. During exploration in this task, users were slower to make choices if there were more objects on the screen, but only if those items all had what were classified as “poor” labels. In the presence of “good” labels (literal match or near synonym), the number of objects on the screen did not affect decision times of either the users or LICAI.

The programmers who implemented the programs operated by the users put in several clever direct manipulation tricks. For example, to change the properties of a graph axis, one double-clicks on the axis and a dialog box specifying the properties of that axis appears. Microsoft Excel has some functionality that is most easily accessed by drag-and-drop. Franzke (1994) found that in a majority of first encounters with these types of direct manipulations, users required hints from the experimenter to be able to continue, even after two minutes of exploration. LICAI also fails at these interactions because no appropriate links are formed between any kind of task goal and these unlabeled, apparently static screen objects during the construction phase. Thus, these screen objects tend to receive little activation during the integration phase, and actions involving other objects are always selected.

Overall, LICAI does an excellent job of capturing many of the other empirical regularities in exploratory learning of a GUI interface. This is an important issue for many interfaces—particularly, any interface that is aimed at a walk-up-and-use audience. While currently common walk-up-and-use interfaces, such as ATMs, provide simple enough functionality that this is not always enormously difficult, this is not the case for more sophisticated systems, such as many information kiosks.

While LSA is certainly surrounded by a host of mathematical and theoretical issues (it should be noted that many similar techniques that differ in a variety of subtle ways have also been developed), the key result from an HCI point of view is that the technique works. Similarity measures generated by LSA generally agree very well with human judges, even for complex language understanding tasks such as grading essays. (In fact, an LSA-based system for grading essays and similar tasks has been turned into a successful commercial product; see http://www.pearsonkt.com.)

Based on the LSA-augmented CoLiDeS, Blackmon and colleagues (2002, 2003, 2005) have developed a web usability assessment and repair technique they call “Cognitive Walkthrough for the Web” (CWW). CWW is based on the idea that when users are exploring a large website, they go through a two-stage process of parsing and understanding the page, and ultimately generate descriptions of the available actions. Using those descriptions, they judge which action is most similar to their goal, and then select the link or widget that corresponds with that action. This is somewhat more complex than simply applying LSA to all of the labels on the page and comparing each one to the goal; the page is parsed into “regions,” each of which is assessed, and smaller bits of text (say, link labels) are elaborated with a C-I-based process. It is those elaborations, rather than the raw link labels, that are compared via LSA to the description of the user’s goal.

Based on this, CWW identifies where users with a particular goal are likely to have navigation problems and even generates problem severity estimates. These predictions have been validated in a series of experiments (Blackmon, Kitajima, & Polson, 2003; Blackmon, Kitajima, & Polson, 2005), and tools which partially automate the process are available on the web (see http://autocww.colorado.edu/). This is a significant advance in HCI, as the problem of predicting how people use semantic similarity to guide search on the web is not a trivial one and, prior to techniques like this, has generally required extensive iteration of
More recently, LICAI has been updated (and renamed to user tests.
CoLiDeS, for Comprehension-based Linked model of Deliberate
Search; Kitajima, Blackmon, & Polson, 2000) to handle interac-
tion with web pages. This involves goals that are considerably EPIC
less well elucidated and interfaces with a much wider range of
semantic content. In order to help deal with these complexities, With the possible exception of the NTD model, the models dis-
LICAI has been updated with a more robust attentional mecha- cussed up to this point have been primarily “pure” cognitive
nism and a much more sophisticated notion of semantic simi- models—that is, the perception and action parts of the mod-
larity based on Latent Semantic Analysis (LSA; Landauer & Du- els have been handled in an indirect, abstract way. These mod-
mais, 1997). LSA is a technique for mathematically representing els focus on the cognition involved, which is not surprising
the meaning of words. While the details of the mathematics are given the generally cognitive background of these systems;
complex, the underlying idea is straightforward. To perform however, even the original formulation of the MHP included
LSA, one needs a large corpus of documents. An enormous ma- processors for perception and motor control. Additionally, in
trix is created, showing which words appear in which contexts fact, user interfaces have also moved from having almost ex-
(documents). This matrix is then reduced by a mathematical clusively cognitive demands (e.g., one had to remember or
technique called “singular value decomposition” (a cousin of problem-solve to generate command names) to relying much
factor analysis) to a smaller dimensional space of, say, 300 by 300 more heavily on perceptual-motor capabilities. This is one of
dimensions. Each word can then be represented by a vector in the hallmarks of the GUI, the shift to visual processing and di-
this space. The “meaning” of a passage is simply the sum of the rect manipulation rather than reliance on complex composi-
vectors of the individual words in the passage. The similarity of tion of commands.
two words (or two passages) can be quantified as the cosine Providing accurate quantitative models for this kind of activ-
between the vectors representing the words (or passages), with ity, however, requires a system with detailed models of human
larger numbers signifying greater semantic similarity. perceptual and motor capabilities. This is one of the major foci
and contributions of EPIC (for executive process interactive control). EPIC is the brainchild of Kieras and Meyer (see especially 1996, 1997; Kieras, Wood, & Meyer, 1997). The overall structure of the processors and memories in EPIC is shown in Fig. 5.2. This certainly bears some surface similarity to the MHP, but EPIC is substantially more detailed. EPIC was explicitly designed to pair high-fidelity models of perception and motor mechanisms with a production system. The perceptual-motor processors represent a new synthesis of the human performance literature, while the production system is the same one used in the CCT work discussed earlier.

All EPIC processors run in parallel with one another. Therefore, while the Visual Processor is recognizing an object on the screen, the Cognitive Processor can decide what word should be spoken in response to some other input, while at the same time the Manual Motor processor is pressing a key. The information flow is typical of traditional psychological models, with information coming in through the eyes and ears, and outputs coming from the mouth and hands. More specifically, what is modeled in each of the processors is primarily time course. EPIC’s Visual Processor does not take raw pixels as input and compute that those pixels represent a letter “A.” Instead, it determines whether the object on the screen can be seen and at what level of detail, as well as how long it will take for a representation of that object to be delivered to EPIC’s declarative memory once the letter becomes available to the Visual Processor. The appearance of the letter can actually cause a number of different elements to be deposited into EPIC’s declarative memory at different times. For example, information about the letter’s color will be delivered before information about the letter’s identity.

Similarly, on the motor side, EPIC does not simulate the computation of the torques or forces needed to produce a particular hand movement. Instead, EPIC computes the time it will take for a particular motor output to be produced after the Cognitive Processor has requested it. This is complicated by the fact that movements happen in phases. Most importantly, each movement includes a preparation phase and an execution phase. The time to prepare a movement is dependent on the number of movement features that must be prepared for each movement and the features of the last movement prepared. Features that have been prepared for the previous movement can sometimes be reused to save time. EPIC can make repeated identical movements rapidly because no feature preparation time is necessary if the movements are identical. If they are not identical, the amount of savings is a function of how different the current movement is from the previous one. After being prepared, a movement is executed. The execution time for a movement corresponds roughly to the time it takes to physically execute the movement; execution times for aimed movements of the hands or fingers are governed by Fitts’ Law. EPIC’s motor processors can only prepare one movement at a time, and can

FIGURE 5.2. Overall structure of the EPIC architecture. Based on Kieras and Meyer (1996).
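The Fitts' Law timing used for aimed movements can be made concrete with a short sketch. This uses the common Shannon formulation with made-up coefficients; EPIC's own calibrated motor-processor parameters may differ, so treat the numbers as purely illustrative.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.1):
    """Aimed-movement execution time via the Shannon form of
    Fitts' Law: T = a + b * log2(distance / width + 1).

    The coefficients a and b are illustrative placeholders, not
    EPIC's calibrated parameters.
    """
    index_of_difficulty = math.log2(distance / width + 1.0)  # in bits
    return a + b * index_of_difficulty

# Farther or smaller targets take longer to acquire:
t_close_big = fitts_movement_time(distance=100, width=40)
t_far_small = fitts_movement_time(distance=400, width=10)
```

The key property for modeling is that execution time grows with the index of difficulty, so pointing at small, distant targets dominates predicted task time.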
only execute one movement at a time, but may be preparing one movement while executing another. Thus, in some tasks, it may be possible to effectively pipeline movements in order to generate very rapid sequences of movements.

EPIC’s Cognitive Processor is a production system, the same one that was used for the earlier CCT work. One highly salient feature of this system is that multiple rules can fire on a production cycle. In fact, there is no upper bound on the number of productions that can fire on a cycle. Productions in this system are at a much higher grain size than productions in Soar, which gives EPIC a parallel quality at all levels—that is, all the processors work in parallel, and EPIC’s cognitive processor is itself capable of parallel processing.

This allows EPIC particular leverage in multiple-task situations. When more than one task is being performed, the tasks can execute in parallel; however, many of the perceptual-motor processors are effectively serial. People only have one set of eyes that can only be aimed at one place at a time, so if multiple tasks are ongoing and they both require the eyes, there must be something that arbitrates. In EPIC, this additional knowledge about how to manage multiple tasks is termed “executive” knowledge, and the productions that implement this knowledge execute in parallel with the productions implementing the task knowledge.

Why is all this machinery and extra knowledge necessary? Because the world of HCI is changing. The GUI forced designers and analysts to consider more seriously the perceptual-motor constraints, and the propagation of computers with user interfaces away from the desktop and into mobile phones, kiosks, automobiles, and many, many other places creates a huge demand on people’s ability to multitask. Multiple-task issues have largely gone unmodeled and have been outside the theoretical scope of most psychological accounts in HCI, at least before EPIC.

While LICAI and Soar have not been adequately equipped to deal with high-performance perception and action components of many tasks, EPIC is not equipped to handle some of the issues covered by other architectures. In particular, EPIC does not include any learning mechanisms, so it would be difficult to generate EPIC models for many of the domains Soar has approached successfully. This is not a fatal shortcoming, however, as there are a wide variety of domains in which learning is not a key component and where high-fidelity modeling of perception and action, along with multiple tasking, are central.

Naturally, these are the kinds of domains to which EPIC has been applied. One of the first major applications of EPIC was to a deceptively simple dual-task paradigm known as the “psychological refractory period” (or PRP; see Meyer & Kieras, 1997). In this task, laboratory subjects are typically confronted with two choice reaction time tasks, something on the order of, “Either a red light or a green light will appear. If it’s red, hit the ‘L’ key. If it’s green, hit the ‘J’ key.” This sounds simple, but the empirical literature is rich and shows a variety of subtle effects, for which EPIC provides the first unified account. Critically, what the EPIC models of these experiments show is that people’s low-level strategies for scheduling the tasks play a large role in determining performance in this simple paradigm.

EPIC has been used to model several tasks with a more directly HCI-oriented flavor. One of those tasks is menu selection (Kieras & Meyer, 1997; Hornof & Kieras, 1997, 1999), but for brevity, a detailed description of these models will be omitted. Another application of EPIC that definitely merits mention is the model of telephone assistance operators (TAOs), based on data originally presented in Gray et al. (1993). When a telephone customer dials “0,” a TAO is the person who answers. The TAOs modeled here sat at a “dumb terminal” style workstation and assisted customers in completing telephone calls. In particular, TAOs determine how calls should be billed, and this is done by speaking to the customer. The detailed EPIC models (Kieras et al., 1997) covered a subset of the possible billing types.

This provided a good test of EPIC for several reasons. First, the task was performed under time pressure, and seconds—actually, milliseconds—counted in task performance. Second, this task is multimodal. The TAO must speak, type, listen, and look at a display. Third, very fine-grained performance data were available to help guide model construction. By now, it should come as no surprise to the reader that it was possible to construct an EPIC model that did an excellent job of modeling the time course of the TAO’s interaction with the customer; however, this modeling effort went beyond just that and provided some insight into the knowledge engineering problem facing modelers utilizing cognitive architectures as well.

Like other production system models, EPIC provides a certain amount of freedom to the modeler in model construction. While the architecture used provides certain kinds of constraints, and these constraints are critical in doing good science and affecting the final form of the model (Howes & Young, 1997), the modeler does have some leeway in writing the production rules in the model. This is true even when the production rule model is derived from another structured representation such as a GOMS model, which was the case in the TAO model. In EPIC, it is possible to write a set of “aggressive” productions that maximize the system’s ability to process things in parallel, while it is also possible to write any number of less aggressive sets representing more conservative strategies. EPIC will produce a quantitative performance prediction regardless of the strategy, but which kind of strategy should the modeler choose? There is generally no a priori basis for such a decision, and it is not clear that people can accurately self-report on such low-level decisions.

Kieras et al. (1997) generated an elegant approach to this problem, later termed “bracketing” (Kieras & Meyer, 2000). The idea is this: construct two models, one of which is the maximally aggressive version. At this end of the strategy spectrum, the models contain very little in the way of cognition. The Cognitive Processor does virtually no deliberation and spends most of its cycles simply reading off perceptual inputs and immediately generating the appropriate motor output. This represents the “super-expert” whose performance is limited almost entirely by the rate of information flow through the perceptual-motor systems. At the other end of the spectrum, a model incorporating the slowest reasonable strategy is produced. The slowest reasonable strategy is one where the basic task requirements are met, but with no strategic effort made to optimize scheduling to produce rapid performance. The idea is that observed performance should fall somewhere in between these two extremes; hence, the data should be bracketed by the two versions of the model. Different users will tend to perform at different ends of this range for different tasks, so this is an excellent way to ac-
commodate some of the individual differences that are always observed in real users.

What was discovered by employing this bracketing procedure to the TAO models was surprising. Despite the fact that the TAOs were under considerable time pressure and were extremely well practiced experts, their performance rarely approached the fastest possible model. In fact, their performance most closely matched a version of the model termed the “hierarchical motor-parallel model.” In this version of the model, eye, hand, and vocal movements are executed in parallel with one another when possible; furthermore, the motor processor is used somewhat aggressively, preparing the next movement while the current movement was in progress. The primary place where EPIC could be faster but the data indicated the TAOs were not was in the description of the task knowledge. It is possible to represent the knowledge for this task as one single, flat GOMS method with no use of subgoals. On the other hand, the EPIC productions could represent the full subgoal structure or a more traditional GOMS model. Retaining the hierarchical representation—thus incurring time costs for goal management—provided the best fit to the TAOs’ performance. This provides solid evidence for the psychological reality of the hierarchical control structure inherent in GOMS analysis, since even well-practiced experts in fairly regular domains do not abandon it for some kind of faster knowledge structure.

The final EPIC model that will be considered is the model of the task first presented in Ballas, Heitmeyer, and Perez (1992). Again, this model first appeared in Kieras and Meyer (1997), but a richer version of the model is described in more detail later, in Kieras, Meyer, and Ballas (2001). The display used is a split screen. On the right half of the display, the user was confronted with a manual tracking task, which was performed using a joystick. The left half of the display was a tactical task in which the user had to classify objects as hostile or neutral based on their behavior. There were two versions of the interface to the tactical task, one a command-line style interface using a keypad and one a direct-manipulation-style interface using a touch screen. The performance measure of interest in this task is the time taken to respond to events (such as the appearance of a new object or a change in state of an object) on the tactical display.

This is again a task well suited to EPIC because the perceptual-motor demands are extensive. This is not, however, what makes this task so interesting. What is most interesting is the human performance data: in some cases, the keypad interface was faster than the touch screen interface, and in many cases, the two yielded almost identical performance, while in some other cases, the touch screen was faster. Thus, general claims about the superiority of GUIs do not apply to this case and a more precise and detailed account is necessary.

EPIC provides just the tools necessary to do this. Two models were constructed for each interface, again using the bracketing approach. The results were revealing. In fact, the fastest possible models showed no performance advantage for either interface. The apparent direct-manipulation advantage of the touch screen for initial target selection was almost perfectly offset by some type-ahead advantages for the keypad. The reason for the inconsistent results is that the users generally did not operate at the speed of the fastest possible model; they tended to work somewhere in between the brackets for both interfaces. However, they tended to work more toward the upper (slowest-reasonable) bracket for the touch screen interface. This suggests an advantage for the keypad interface, but the caveat is that the slowest reasonable performance bound for the touch screen was faster than the slowest possible for the keypad. Thus, any strategy changes made by users in the course of doing the task, perhaps as a dynamic response to changes in workload, could affect which interface would be superior at any particular point in the task. Thus, results about which interface is faster are likely to be inconsistent—which is exactly what was found.

This kind of analysis would be impossible to conduct without a clear, quantitative human performance model. Constructing and applying such a model also suggested an alternative interface that would almost certainly be faster than either, which is one using a speech-driven interface. One of the major performance bottlenecks in the task was the hands, and so in this case, voice-based interaction should be faster. Again, this could only be clearly identified with the kind of precise quantitative modeling enabled by something like the EPIC architecture.

ACT-R 5.0

ACT-R 5.0 (Anderson et al., 2004) represents another approach to a fully unified cognitive architecture, combining a very broad model of cognition with rich perceptual-motor capabilities. ACT-R 5.0 is the most recent iteration of the ACT-R cognitive architecture (introduced in Anderson, 1993). ACT-R has a long history within cognitive psychology, as various versions of the theory have been developed over the years. In general, the ACT family of theories has been concerned with modeling the results of psychology experiments, and this is certainly true of the current incarnation, ACT-R. Anderson and Lebiere (1998) show some of this range, covering areas as diverse as list memory (chapter 7), choice (chapter 8), and scientific reasoning (chapter 11).

ACT-R is, like EPIC and Soar, a production system with activity centered on the production cycle, which is also set at 50 ms in duration; however, there are many differences between ACT-R and the other architectures. First, ACT-R can only fire one production rule per cycle. When multiple production rules match on a cycle, an arbitration procedure called conflict resolution comes into play. Second, ACT-R has a well-developed theory of declarative memory. Unlike EPIC and Soar, declarative memory elements in ACT-R are not simply symbols. Each declarative element in ACT-R also has an activation value associated with it, which determines whether and how rapidly it may be accessed. Third, ACT-R contains learning mechanisms, but is not a pervasive learning system in the same sense as Soar. These mechanisms are based on a “rational analysis” (Anderson, 1990) of the information needs of an adaptive cognitive system.

As an example of rational analysis, consider conflict resolution. Each production in ACT-R has associated with it several numeric parameters, including numbers which represent the probability that, if the production fires, the goal will be reached, and the cost, in time, that will be incurred if the production fires. These values are combined according to a formula that trades off probability of success vs. cost and produces an “expected gain” for each production. The matching production with the highest expected gain is the one that gets to fire when conflict resolu-
tion is invoked. The expected gain values are noisy, so the sys-
tem’s behavior is somewhat stochastic, and the probability and
cost values are learned over time so that ACT-R can adapt to
changing environments.
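The trade-off described above is conventionally written E = PG − C, where P is the estimated probability of success, G the value of the goal, and C the estimated cost; conflict resolution selects the matching production with the highest (noisy) expected gain. A minimal sketch, with invented production names and parameter values:

```python
import random

def expected_gain(p, g, c, noise_sd=0.0):
    """ACT-R-style expected gain E = P*G - C, plus transient noise.

    p: estimated probability that firing this production reaches the goal
    g: value of the goal (conventionally in seconds)
    c: estimated cost of firing (seconds)
    """
    return p * g - c + random.gauss(0.0, noise_sd)

def conflict_resolution(matching, g=20.0, noise_sd=0.0):
    """Return the matching production with the highest expected gain."""
    return max(matching,
               key=lambda prod: expected_gain(prod["p"], g, prod["c"], noise_sd))

# Hypothetical competing productions (names and values are made up):
productions = [
    {"name": "careful-check", "p": 0.95, "c": 3.0},  # E = 0.95*20 - 3.0 = 16.0
    {"name": "fast-guess", "p": 0.50, "c": 0.5},     # E = 0.50*20 - 0.5 = 9.5
]
winner = conflict_resolution(productions)  # noise_sd=0 makes this deterministic
```

With noise_sd greater than zero the same competition becomes stochastic, which is one way ACT-R's run-to-run variability arises; the P and C estimates themselves are updated from experience.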
Similarly, the activation of elements in declarative memory
is based on a Bayesian analysis of the probability that a declara-
tive memory element will be needed at a particular time. This
is a function of the general utility of that element, reflected in
what is termed its “base-level” activation, and that element’s as-
sociation with the current context. The more frequently and re-
cently an element has been accessed, the higher its base-level
activation will be, and thus the easier it is to retrieve. This value
changes over time according to the frequency and recency of
use; thus, this value is learned. These kinds of mechanisms have
helped enable ACT-R to successfully model a wide range of cog-
nitive phenomena.
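The frequency-and-recency effect on base-level activation follows ACT-R's base-level learning equation, B = ln(Σⱼ tⱼ⁻ᵈ), where each tⱼ is the time since a past use of the element and d is a decay parameter (conventionally 0.5). A small illustration with invented use histories:

```python
import math

def base_level_activation(use_times, now, decay=0.5):
    """ACT-R base-level learning: B = ln(sum of (now - t)^-decay
    over all past uses t). More frequent and more recent use yields
    higher activation, and hence easier, faster retrieval.
    """
    return math.log(sum((now - t) ** -decay for t in use_times))

# Same number of uses, but one element was used recently...
recent = base_level_activation(use_times=[90.0, 95.0, 99.0], now=100.0)
# ...and the other long ago: its activation has decayed.
stale = base_level_activation(use_times=[5.0, 10.0, 15.0], now=100.0)
```

An element whose activation has decayed below the retrieval threshold is effectively forgotten, which is how ACT-R captures graded memory effects that purely symbolic systems cannot.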
The current version of the theory, 5.0, incorporates several
important changes over previous versions of the theory. First,
the architecture is now a fully modular architecture, with fully
separate and independent modules for different kinds of infor-
mation processing (such as declarative memory and vision).
Second, ACT-R 5.0 no longer contains a “goal stack” for auto-
mated goal management; goals are now simply items in declar-
ative memory and subject to processes like learning and decay.
Third, the new modular architecture allows processing in ACT-R
to be mapped onto the human brain, with different modules identified with different brain regions. Finally, a production rule learning mechanism has been fully incorporated, so learning is now more pervasive in ACT-R. The 5.0 version of the theory subsumes all previous versions of the theory including ACT-R/PM, which is now considered obsolete.

In ACT-R 5.0, the basic production system is augmented with four EPIC-like peripheral modules, as well as a goal module and a declarative memory module, as depicted in Fig. 5.3. Like EPIC, all of these modules run in parallel with one another, giving ACT-R the ability to overlap processing. The peripheral modules come from a variety of sources. ACT-R’s Vision Module is based on the ACT-R Visual Interface described in Anderson, Matessa, and Lebiere (1997). This is a feature-based attentional visual system, but does not explicitly model eye movements. Recently, the Vision Module has been extended to include an eye-movement model (Salvucci, 2001a) as well. The Motor Module is nearly identical to the Manual Motor Processor in EPIC and is based directly on the specification found in Kieras and Meyer (1996), and the Speech Module is similarly derived from EPIC. The Audition Module is a hybrid of the auditory system found in EPIC and the attentional system in ACT-R’s Vision Module.

One other important property of ACT-R is that it is possible to have models interact with the same software as the human users being modeled. The software development conditions are fairly restrictive, but if these conditions are met, then both the user and the model are forced to use the same software. This reduces the number of degrees of freedom available to the modeler in that it becomes impossible to force any unpleasant modeling details into the model of the user interface, because there is no model of the user interface. More will be said about this issue in the next section.

ACT-R has been applied to numerous HCI domains. The first example comes from the dissertation work of Ehret (1999).

FIGURE 5.3. Overall structure of the ACT-R 5.0 architecture.

Among other things, Ehret developed an ACT-R/PM model of a fairly simple, but subtle, experiment. In that experiment, subjects were shown a target color and asked to click on a button that would yield that color. The buttons themselves had four types: (a) blank, (b) arbitrary icon, (c) text label, and (d) color. In the color condition, the task was simple: users simply found the color that matched the target and then clicked the button. In the text label condition, the task was only slightly more difficult: users could read the labels on the buttons and select the correct one because the description matched the color. In the arbitrary icon condition, more or less random pictures appeared on each icon (e.g., a mailbox). Users had to either memorize the picture-to-color mapping, which they had to discover by trial and error, or memorize the location of each color, since the buttons did not change their functions over time. In the hardest condition, the blank condition, users simply had to memorize the mapping between button location and color, which they had to discover through trial and error.

Clearly, the conditions will produce different average response times, and what Ehret (1999) found is that they also produced somewhat different learning curves over time. Ehret added an additional manipulation as well: after users performed the task for some time, all labeling was removed. Not surprisingly, the amount of disruption was different in the different conditions, reflecting the amount of incidental location learning that went on as subjects performed the task. The ACT-R model that Ehret constructed did an excellent job of explaining the results. This model represented the screen with the built-in visual mechanisms from ACT-R and learned the mappings between color and location via ACT-R’s standard associative learning mechanisms. The initial difference between the various conditions was repro-
duced, as were the four learning curves. The model also suffered disruptions similar to those suffered by the human users when the labels were removed. This model is an excellent demonstration of the power of ACT-R, exercising both the perceptual-motor capabilities of the system as well as the graded learning in ACT-R’s rational-analysis driven mechanisms.

Salvucci (2001b) described an ACT-R model that tests ACT-R’s ability to handle multimodal, high-performance situations in a very compelling task: this model drives an automobile driving simulator. This is not a robotics project; the ACT-R model does not actually turn the steering wheel or manipulate the pedals; rather, it communicates with the automobile simulation software. The model’s primary job is to maintain lane position as the car drives down the road. Salvucci (2001b) added an additional task that makes it particularly interesting: the model dials telephone numbers on a variety of mobile phone interfaces. Two factors were crossed: (a) whether the telephone was dialed manually via keypad vs. dialed by voice, and (b) whether the full telephone number needed to be dialed vs. a shortened “speed dial” system. The model was also validated by comparison with data from human users.

Both the model and the human users showed that dialing while not driving is faster than dialing while driving, and that steering performance can be disrupted by telephone dialing. Not surprisingly, the most disruptive interface was the “full-manual” interface, where the full phone numbers were dialed on a keypad. This is due largely to the fact that dialing with the keypad requires visual guidance, causing the model (and the users) to take their eyes off the road. There was very little disruption associated with the voice interfaces, regardless of whether full numbers or speed-dial was used.

This is a powerful illustration of the value of cognitive architectures for a number of reasons. First, the basic driving model could simply be reused for this task; it did not have to be reimplemented. Second, the model provides an excellent quantitative fit to the human data. Third, this is an excellent example of a situation where testing human users can be difficult. Testing human drivers with interfaces that degrade driving performance is dangerous, so simulators are generally used for this kind of evaluation; however, maintaining a driving simulator requires a great deal of space and is quite expensive. If someone wanted to test another variant of the telephone interface, it would be much faster and cheaper to give that interface to Salvucci’s (2001b) model than it would be to recruit and test human drivers.

One of the long-standing criticisms of cognitive architectures in HCI is that they are theoretical tools designed to support research and are too difficult for the average practitioner to use in a more applied context. This criticism certainly holds some truth (though how much may be open to debate); however,

interfaces can be mocked up, and predicted performance compared, giving the designer important information about how changes in the interface may impact user performance. This is not limited strictly to prediction of time of execution, either, as the full model output—a complete trace of the predicted user behavior—is also available for inspection.

This is an important advance, as it makes the power of complete and detailed computational cognitive modeling available to people with no experience with cognitive architectures. It also has important ramifications for the methodology of model building, since models generated by CogTool are not hand-tweaked in an attempt to optimize fit to data; in fact, the intent is to make predictions about nonexistent data. Some validation, however, has been performed, and early explorations with CogTool have found the time predictions to be quite accurate. As exciting a development as this is, it is important to note that there are clearly limitations. The language of available actions is based on the keystroke-level model (KLM) of Card, Moran, and Newell (1983) and is thus somewhat limited. For example, it does not provide a model for any kind of cognition more complex than very simple decisions about where to point. For example, if one wanted to model use of an ATM, CogTool would not model the process of the user trying to retrieve a security code from memory. (However, if the user was highly practiced such that this retrieval is very rapid and always successful, then CogTool could be appropriate.) Efforts are underway to make automatic translation of slightly more complex GOMS-style models into ACT-R models possible (St. Amant, Freed, & Ritter, 2005), but that still would not address the more complex cognition required by many interfaces. Nonetheless, CogTool represents a significant advance in making architecture-based modeling accessible to those who are not experts in the field.

The last application of ACT-R in HCI to be covered in detail is based on Pirolli and Card’s (1999) theory of information foraging. This theory is based on the idea that people forage for information in much the same way that animals forage for food. The literature on the latter type of foraging is quite rich, including equations that describe when foragers will give up on a particular patch of information and move on to the next one. Foraging for food is often guided by proximal cues (e.g., scent) that indicate the presence of distal content (e.g., food); analogously, information foragers may use proximal cues (e.g., visible link labels) to guide their search for information that they cannot presently access. In the spirit of the analogy, Pirolli and Card term these proximal cues “information scent.” That such a theory should be applicable to the web should be fairly obvious.

This theory has spawned a number of models based on modified versions of ACT-R. The original model from Pirolli and Card (1999) was called ACT-IF (for “information foraging”).
some researchers are actively trying to address it. One of the Later versions were SNIF-ACT 1.0 (Card et al., 2001) and SNIF-
most interesting developments in the cognitive architecture ACT 2.0 (Pirolli & Fu, 2003), where SNIF stands for “scent-based
arena in the past few years has been the work of John and col- navigation and information foraging.” These models use a mod-
leagues (2004) on something they term “CogTool” (see http:// ified form of ACT-R’s spreading activation to compute informa-
www.cogtool.org). CogTool allows interface designers to mock tion scent, which drives the model’s choices as it browses
up an interface in HTML, demonstrate the tasks to be per- through web pages. SNIF-ACT 1.0 and later also uses an altered
formed in a GUI environment, and have an ACT-R based cogni- version of ACT-R’s conflict resolution mechanism to determine
tive model of the tasks automatically synthesized. Alternative when to abandon the current website (or information “patch”)
and move on to a new site. SNIF-ACT 2.0 adds a further improvement, which is the use of a new measure to compute the strengths of association (and therefore spreading activation) between words. This new measure, termed "pointwise mutual information" (PMI), is much easier to compute than LSA, but generally produces similar results. SNIF-ACT 2.0 was used to predict link selection frequency in a large (N = 244) Web behavior study with impressive results, accounting for 72% of the selection variance on one site and 90% of the variance on a second site.

Obviously, this work is closely related to CWW. In fact, the foraging work has also led to a tool to help automate the process of web usability assessment, this one named "Bloodhound" (Chi et al., 2003). Bloodhound takes a site and a goal description and uses scent metrics combined with Monte Carlo methods to compute usage patterns and identify navigation problems. The two projects clearly have similar goals and similar outputs. What is less clear is the extent to which the differences in the underlying cognitive architecture and differences in other aspects of the techniques (such as LSA vs. PMI) actually lead to different predictions about user behavior. As of this writing, the two systems have never both been applied to the same website. However appealing such a "bake-off" might be, it is important not to lose sight of the advance even the possibility of such a comparison represents: it is only recently that the field has had psychologically informed and quantitative predictions about something as complex as Web behavior available for potential comparison. Thus, this is a concrete example of cognitive architectures paying off in HCI.

Other published and ongoing work applying ACT-R to HCI-oriented problems is available (e.g., Taatgen & Lee, 2003; Peebles & Cheng, 2003; Byrne, 2001; Schoelles & Gray, 2000), but space considerations prohibit a more exhaustive review. The vitality of the research effort suggests that ACT-R's combination of perceptual-motor modules and a strong theory of cognition will continue to pay dividends as an HCI research tool.

In fact, this is not limited to ACT-R; overall, cognitive architectures are an exciting and active area of HCI research. The systems described here all take slightly different approaches and focus on slightly different aspects of various HCI problems, but there is clearly a great deal of cross-pollination. Lessons learned and advancements made in one architecture often affect other systems; for example, the development of EPIC's peripheral systems clearly impacted ACT-R.

Comparisons

An exhaustive and detailed comparison of the major cognitive architectures is beyond the scope of this chapter; however, an excellent comparison that includes a number of other architectures can be found in Pew and Mavor (1998). Certainly, the three production systems are related: (a) Soar, (b) EPIC, and (c) ACT-R. A major difference between them is their original focus; they were originally developed to model slightly different aspects of human cognition. As they develop, however, there appears to be more convergence than divergence. This is generally taken as a good sign that the science is cumulating. Still, there are differences, certainly between the production systems and LICAI/CoLiDeS. Many of the relevant comparisons are summarized in Table 5.1.

Most of the entries on this table have been previously discussed, with the exception of the last two. Support for learning will be discussed in the next section. The presence and size of the user community has not been discussed, as it is not clear what role (if any) such a community plays in the veridicality of the predictions made by the system; however, it may be relevant to researchers for other reasons, particularly those trying to learn the system.

In addition, many of the values on this table are likely to change in the future. For example, the 6.0 version of ACT-R will be available by the time this chapter appears (though there are few major theoretical changes planned), and planning for a major revision for version 7.0 is already underway.

It is difficult to classify the value an architecture has on a particular attribute as an advantage or a disadvantage, because what constitutes an advantage for modeling one phenomenon may be a disadvantage for modeling others. For example, consider learning in Soar. Certainly, when attempting to model the improvement of users over time with a particular interface, Soar's learning mechanism is critical; however, there are many applications for which modeling learning is not critical, and Soar's pervasive learning feature occasionally causes undesired side effects that can make model construction more difficult.
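Returning briefly to SNIF-ACT 2.0: the pointwise mutual information measure it uses is simple enough to sketch. The following toy illustration uses the standard definition, PMI(x, y) = log2[p(x, y) / (p(x)p(y))]; the counts are invented for the example, and the actual SNIF-ACT work estimated these probabilities from much larger web corpora.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw co-occurrence counts:
    log2 of how much more often x and y co-occur than chance predicts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Invented counts: "credit" and "card" co-occur 400 times across
# 1,000,000 observed word windows. A positive PMI marks the pair as
# associated, which a SNIF-ACT-style model would treat as scent
# between a goal word and a visible link word.
score = pmi(400, 1000, 800, 1_000_000)  # log2(500), about 8.97
```

In the model, strengths of association computed this way (rather than derived from LSA) drive the spreading activation that yields an information-scent score for each link.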
TABLE 5.1. Architecture Feature Comparison

                                    LICAI/CoLiDeS                              EPIC                          ACT-R 5.0
Original focus                      Text comprehension                         Multiple-task performance     Memory and problem-solving
Basic cycle                         Construction-integration                   Production cycle (parallel)   Production cycle (serial)
Symbolic or activation-based?       Both                                       Symbolic                      Both
Architectural goal management       Special cycle types, supports vague goals  None                          None
Detailed perceptual-motor systems   No                                         Yes                           Yes
Learning mechanisms                 No                                         No                            Yes
Large, integrated models            No                                         No                            Soon
Extensive natural language          Yes                                        No                            No
Support for learning system         None                                       None                          Extensive tutorial materials, summer school
User community*                     None                                       None                          Growing, primarily psychology

*Outside of the researchers who have developed the system.
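The "basic cycle" row of Table 5.1 marks one of the clearest contrasts among the production systems: whether every matched production fires on a cycle, or only one. The toy sketch below illustrates the distinction only; the rule representation and utility values are invented and bear no resemblance to real EPIC or ACT-R code.

```python
def match(rules, memory):
    """Return every production whose condition holds in working memory."""
    return [rule for rule in rules if rule["condition"](memory)]

def epic_style_cycle(rules, memory):
    """EPIC-style cycle: all matched productions fire in parallel."""
    for rule in match(rules, memory):
        rule["action"](memory)

def actr_style_cycle(rules, memory, utility):
    """ACT-R-style cycle: conflict resolution selects the single
    highest-utility matched production, a serial bottleneck."""
    matched = match(rules, memory)
    if matched:
        best = max(matched, key=lambda rule: utility[rule["name"]])
        best["action"](memory)

# Two toy productions that both match any state of working memory.
def bump_a(m): m["a"] += 1
def bump_b(m): m["b"] += 1
rules = [
    {"name": "bump-a", "condition": lambda m: True, "action": bump_a},
    {"name": "bump-b", "condition": lambda m: True, "action": bump_b},
]

wm_epic = {"a": 0, "b": 0}
epic_style_cycle(rules, wm_epic)  # both rules fire

wm_actr = {"a": 0, "b": 0}
actr_style_cycle(rules, wm_actr, {"bump-a": 2.0, "bump-b": 1.0})  # only bump-a fires
```

On one cycle the EPIC-style loop updates both slots while the ACT-R-style loop updates only one; that single-selection step is what makes central cognition serial in ACT-R even though its perceptual-motor modules run in parallel.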
THE FUTURE OF COGNITIVE ARCHITECTURES IN HCI

Beyond the incremental development and application of architectures such as EPIC and ACT-R, what will the future hold for cognitive architectures in HCI? What challenges are faced, and what is the ultimate promise? Indeed, a number of important limitations for cognitive architectures currently exist. There are questions they cannot yet address, and it is hard to see how they even would address some questions. Other limitations are more pragmatic than in principle, but these are relevant as well.

First, there is a wide array of HCI problems that are simply outside the scope of current cognitive architectures. Right now, these architectures focus on cognition and performance, but not on other aspects of HCI, such as user preference, boredom, aesthetics, fun, and so on. Another important challenge, though one that might be overcome, is that these architectures generally have not been applied to social situations, such as those encountered in groupware or online communities (Olson & Olson, this volume; Preece, this volume). In principle, it is possible to implement a model of social interaction in a cognitive architecture; however, the knowledge engineering problem here would certainly be a difficult one. How does one characterize and implement knowledge about social situations with enough precision to be implemented in a production system? It may ultimately be possible to do so, but it is unlikely that this will happen anytime soon.

One problem that will never entirely be resolved, no matter how diligent the modelers are, is the knowledge engineering problem. Every model constructed using a cognitive architecture still needs knowledge about how to use the interface and what the tasks are. By integrating across models, the knowledge engineering demands when entering a new domain may be reduced (the NTD is a good example), but they will never be eliminated. This requirement will persist even if an architecture were to contain a perfect theory of human learning, and there is still considerable work to be done to meet that goal.

Another barrier to the more widespread use of cognitive architectures in HCI is that the architectures themselves are large and complex pieces of software, and (ironically) little work has been done to make them usable or approachable for novices. For example, "EPIC is not packaged in a 'user-friendly' manner; full-fledged Lisp programming expertise is required to use the simulation package, and there is no introductory tutorial or user's manual" (Kieras & Meyer, 1997, p. 399). The situation is slightly better for ACT-R: tutorial materials, documentation, and examples for ACT-R are available, and most years there is a two-week "summer school" for those interested in learning ACT-R (see http://act.psy.cmu.edu).

Another limiting factor is implementation. In order for a cognitive architecture to accurately model interaction with an interface, it must be able to communicate with that interface. Because most user interfaces are "closed" pieces of software with no built-in support for supplying a cognitive model with the information it needs for perception (e.g., what is on the screen where) or accepting input from a model, this creates a technical problem. Somehow, the interface and the model must be connected. An excellent summary of this problem can be found in Ritter, Baxter, Jones, and Young (2000). A number of different approaches have been taken. In general, the EPIC solution to this problem has been to reimplement the interface to be modeled in Lisp (more recent versions of EPIC require C), so the model and the interface can communicate via direct function calls. The ACT-R solution is not entirely dissimilar. In general, ACT-R has only been applied to relatively new experiments or interfaces that were initially implemented in Lisp, and thus ACT-R and the interface can communicate via function calls. In order to facilitate the construction of models and reduce the modeler's degrees of freedom in implementing a custom interface strictly for use by a model, ACT-R does provide some abilities for automatically managing this communication when the interface is built with the native GUI builder for Macintosh Common Lisp under MacOS and Allegro Common Lisp under Windows. If the interface is implemented this way, both human users and the models can interact with the same interface.

The most intriguing development along this line, however, is recent work by St. Amant and colleagues (St. Amant & Riedl, 2001; St. Amant, Horton, & Ritter, in press). They have implemented a system called SegMan, which directly parses the screen bitmap on Windows systems; that is, given a Windows display (any Windows display), SegMan can parse it and identify things such as text, scroll bars, GUI widgets, and the like. It also has facilities for simulating mouse and keyboard events. This is an intriguing development, because in principle, it should be possible to connect this to an architecture such as EPIC or ACT-R, which would enable the architecture to potentially work with any Windows application in its native form. While a number of technical details would have to be worked out, this has the potential of fulfilling one of the visions held by many in the cognitive architecture community: a high-fidelity "virtual user" that could potentially use any application or even combination of applications. Besides providing a wide array of new domains to researchers, this could be of real interest to practitioners as well, because it opens the door for at least some degree of automated usability testing. This idea is not a new one (e.g., Byrne, Wood, Sukaviriya, Foley, & Kieras, 1994; St. Amant, 2000), but technical and scientific issues have precluded its adoption on even a limited scale. This would not eliminate the need for human usability testing (see Ritter & Young, 2001, for a clear discussion of this point) for some of the reasons listed above, but it could significantly change usability engineering practice in the long run.

The architectures themselves will continue to be updated and applied to more tasks and interfaces. As mentioned, a new version of ACT-R (version 6.0) is currently under development, and this new version has definitely been influenced by issues raised by HCI concerns. New applications of EPIC result in new mechanisms (e.g., similarity-based decay in verbal working memory storage; Kieras, Meyer, Mueller, & Seymour, 1999) and new movement styles (e.g., click-and-point; Hornof & Kieras, 1999). Applications such as the World Wide Web are likely to drive these models into more semantically rich domain areas, and tasks that involve greater amounts of problem solving are also likely candidates for future modeling.

Despite their limitations, this is a particularly exciting time to be involved in research on cognitive architectures in HCI. There is a good synergy between the two areas: cognitive architectures are certainly useful to HCI, and HCI is also useful for cognitive architectures. HCI is a complex and rich yet still tractable domain, which makes it an ideal candidate for testing cognitive architectures. HCI tasks are more realistic and require more integrated capabilities than typical cognitive psychology laboratory experiments, and thus cognitive architectures are the best theoretical tools available from psychology. Theories like EPIC-Soar (Chong & Laird, 1997) and ACT-R are the first complete psychological models that go from perception to cognition to action in detail. This is a significant advance and holds a great deal of promise.

The need for truly quantitative engineering models will only grow as user interfaces propagate into more and more places and more and more tasks. Cognitive architectures, which already have a firmly established niche in HCI, seem the most promising road toward such models. Thus, as the architectures expand their range of application and their fidelity to the humans they are attempting to model, this niche is likely to expand. HCI is an excellent domain for testing cognitive architectures as well, so this has been, and will continue to be, a fruitful two-way street.
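The simplest example of such a quantitative engineering model is the keystroke-level model mentioned earlier (Card, Moran, & Newell, 1983), on which CogTool's action language is based. The sketch below uses commonly cited operator durations; the exact values vary with user skill and device, so treat them as placeholders rather than authoritative parameters.

```python
# KLM operator durations in seconds, as commonly cited from Card,
# Moran, & Newell (1983); actual values depend on typist skill
# and pointing device.
KLM_OPERATORS = {
    "K": 0.28,  # keystroke or button press (average skilled typist)
    "P": 1.10,  # point with a mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_predict(sequence):
    """Predict execution time for a string of KLM operators, e.g. 'MHPKK'."""
    return sum(KLM_OPERATORS[op] for op in sequence)

# Dialing a 7-digit number on an on-screen keypad: one mental step,
# then a point-and-click per digit.
t = klm_predict("M" + "PK" * 7)  # about 11 seconds
```

Even this crude calculation yields an a priori time prediction of the kind that usability testing would otherwise have to measure, which is precisely the engineering role the architectures aim to fill with far greater fidelity.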
References
Altmann, E. M. (2001). Near-term memory in programming: A simulation-based analysis. International Journal of Human-Computer Studies, 54(2), 189–210.
Altmann, E. M., & John, B. E. (1999). Episodic indexing: A model of memory for attention events. Cognitive Science, 23(2), 117–156.
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Erlbaum.
Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Erlbaum.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111, 1036–1060.
Anderson, J. R., Conrad, F. G., & Corbett, A. T. (1989). Skill acquisition and the LISP tutor. Cognitive Science, 13(4), 467–505.
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Erlbaum.
Anderson, J. R., Matessa, M., & Lebiere, C. (1997). ACT-R: A theory of higher level cognition and its relation to visual attention. Human-Computer Interaction, 12(4), 439–462.
Ballas, J. A., Heitmeyer, C. L., & Perez, M. A. (1992). Evaluating two aspects of direct manipulation in advanced cockpits. Proceedings of ACM CHI 92 Conference on Human Factors in Computing Systems (pp. 127–134). New York: ACM.
Blackmon, M. H., Kitajima, M., & Polson, P. G. (2003). Repairing usability problems identified by the Cognitive Walkthrough for the Web. In Human factors in computing systems: Proceedings of CHI 2003 (pp. 497–504). New York: ACM.
Blackmon, M. H., Kitajima, M., & Polson, P. G. (2005). Tool for accurately predicting website navigation problems, non-problems, problem severity, and effectiveness of repairs. In Human factors in computing systems: Proceedings of CHI 2005 (pp. 31–40). New York: ACM.
Blackmon, M. H., Polson, P. G., Kitajima, M., & Lewis, C. (2002). Cognitive walkthrough for the web. In Human factors in computing systems: Proceedings of CHI 2002 (pp. 463–470). New York: ACM.
Bovair, S., Kieras, D. E., & Polson, P. G. (1990). The acquisition and performance of text-editing skill: A cognitive complexity analysis. Human-Computer Interaction, 5(1), 1–48.
Byrne, M. D. (2001). ACT-R/PM and menu selection: Applying a cognitive architecture to HCI. International Journal of Human-Computer Studies, 55, 41–84.
Byrne, M. D., & Bovair, S. (1997). A working memory model of a common procedural error. Cognitive Science, 21(1), 31–61.
Byrne, M. D., Wood, S. D., Sukaviriya, P. N., Foley, J. D., & Kieras, D. E. (1994). Automating interface evaluation. ACM CHI'94 Conference on Human Factors in Computing Systems (pp. 232–237). New York: ACM Press.
Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Erlbaum.
Card, S. K., Pirolli, P., Van Der Wege, M., Morrison, J. B., Reeder, R. W., Schraedley, P. K., et al. (2001). Information scent as a driver of Web behavior graphs: Results of a protocol analysis method for Web usability. In Human factors in computing systems: Proceedings of CHI 2001 (pp. 498–505). New York: ACM.
Chi, E. H., Rosien, A., Supattanasiri, G., Williams, A., Royer, C., Chow, C., et al. (2003). The Bloodhound project: Automating discovery of web usability issues using the Infoscent simulator. In Human factors in computing systems: Proceedings of CHI 2003 (pp. 505–512). New York: ACM.
Chong, R. S., & Laird, J. E. (1997). Identifying dual-task executive process knowledge using EPIC-Soar. In M. Shafto & P. Langley (Eds.), Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (pp. 107–112). Hillsdale, NJ: Erlbaum.
Ehret, B. D. (1999). Learning where to look: The acquisition of location knowledge in display-based interaction. Unpublished doctoral dissertation, George Mason University, Fairfax, VA.
Franzke, M. (1994). Exploration, acquisition, and retention of skill with display-based systems. Unpublished doctoral dissertation, University of Colorado, Boulder.
Gray, W. D., John, B. E., & Atwood, M. E. (1993). Project Ernestine: Validating a GOMS analysis for predicting and explaining real-world task performance. Human-Computer Interaction, 8(3), 237–309.
Gray, W. D., Young, R. M., & Kirschenbaum, S. S. (1997). Introduction to this special issue on cognitive architectures and human-computer interaction. Human-Computer Interaction, 12, 301–309.
Hornof, A., & Kieras, D. E. (1997). Cognitive modeling reveals menu search is both random and systematic. Proceedings of ACM CHI 97 Conference on Human Factors in Computing Systems (pp. 107–114). New York: ACM.
Hornof, A., & Kieras, D. (1999). Cognitive modeling demonstrates how people use anticipated location knowledge of menu items. Proceedings of ACM CHI 99 Conference on Human Factors in Computing Systems (pp. 410–417). New York: ACM.
Howes, A., & Young, R. M. (1996). Learning consistent, interactive, and meaningful task-action mappings: A computational model. Cognitive Science, 20(3), 301–356.
Howes, A., & Young, R. M. (1997). The role of cognitive architecture in modeling the user: Soar's learning mechanism. Human-Computer Interaction, 12(4), 311–343.
Huguenard, B. R., Lerch, F. J., Junker, B. W., Patz, R. J., & Kass, R. E. (1997). Working memory failure in phone-based interaction. ACM Transactions on Computer-Human Interaction, 4, 67–102.
John, B. E., & Kieras, D. E. (1996). The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Transactions on Computer-Human Interaction, 3, 320–351.
John, B. E., Prevas, K., Salvucci, D. D., & Koedinger, K. (2004). Predictive human performance modeling made easy. In Human factors in
computing systems: Proceedings of CHI 2004 (pp. 455–462). New York: ACM.
Jones, R. M., Laird, J. E., Nielsen, P. E., Coulter, K. J., Kenny, P., & Koss, F. V. (1999). Automated intelligent pilots for combat flight simulation. AI Magazine, 20(1), 27–41.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1), 122–149.
Kieras, D. E., & Bovair, S. (1986). The acquisition of procedures from text: A production-system analysis of transfer of training. Journal of Memory & Language, 25(5), 507–524.
Kieras, D. E., & Meyer, D. E. (1996). The EPIC architecture: Principles of operation. Retrieved July 4, 1996, from ftp://ftp.eecs.umich.edu/people/kieras/EPICarch.ps
Kieras, D. E., & Meyer, D. E. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12(4), 391–438.
Kieras, D. E., & Meyer, D. E. (2000). The role of cognitive task analysis in the application of predictive models of human performance. In J. M. Schraagen & S. F. Chipman (Eds.), Cognitive task analysis (pp. 237–260). Mahwah, NJ: Erlbaum.
Kieras, D. E., Meyer, D. E., Mueller, S., & Seymour, T. (1999). Insights into working memory from the perspective of the EPIC architecture for modeling skilled perceptual-motor and cognitive human performance. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 183–223). New York: Cambridge University Press.
Kieras, D. E., Meyer, D. E., & Ballas, J. A. (2001). Towards demystification of direct manipulation: Cognitive modeling charts the gulf of execution. Proceedings of ACM CHI 01 Conference on Human Factors in Computing Systems (pp. 128–135). New York: ACM.
Kieras, D., & Polson, P. G. (1985). An approach to the formal analysis of user complexity. International Journal of Man-Machine Studies, 22(4), 365–394.
Kieras, D. E., Wood, S. D., & Meyer, D. E. (1997). Predictive engineering models based on the EPIC architecture for multimodal high-performance human-computer interaction task. Transactions on Computer-Human Interaction, 4(3), 230–275.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. New York: Cambridge University Press.
Kitajima, M., & Polson, P. G. (1997). A comprehension-based model of exploration. Human-Computer Interaction, 12(4), 345–389.
Kitajima, M., Blackmon, M. H., & Polson, P. G. (2000). A comprehension-based model of web navigation and its application to web usability analysis. In S. McDonald, Y. Waern, & G. Cockton (Eds.), People and Computers XIV-Usability or Else! (Proceedings of HCI 2000) (pp. 357–373). New York: Springer.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Lewis, R. L. (1993). An architecturally-based theory of human sentence comprehension. Unpublished doctoral dissertation, University of Michigan, Ann Arbor.
Meyer, D. E., & Kieras, D. E. (1997). A computational theory of executive cognitive processes and multiple-task performance: I. Basic mechanisms. Psychological Review, 104(1), 3–65.
Nelson, G., Lehman, J. F., & John, B. E. (1994). Integrating cognitive capabilities in a real-time task. In A. Ram & K. Eiselt (Eds.), Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society (pp. 353–358). Hillsdale, NJ: Erlbaum.
Newell, A. (1973). You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. In W. G. Chase (Ed.), Visual information processing (pp. 283–308). New York: Academic Press.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Newell, A., & Simon, H. A. (1963). GPS, a program that simulates human thought. In E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought (pp. 279–293). Cambridge, MA: MIT Press.
Peebles, D., & Cheng, P. C. H. (2003). Modeling the effect of task and graphical representation on response latency in a graph reading task. Human Factors, 45, 28–45.
Pew, R. W., & Mavor, A. S. (Eds.). (1998). Modeling human and organizational behavior: Application to military simulations. Washington, DC: National Academy Press.
Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106(4), 643–675.
Pirolli, P., & Fu, W. (2003). SNIF-ACT: A model of information foraging on the world wide web. In P. Brusilovsky, A. Corbett, & F. de Rosis (Eds.), Proceedings of the ninth international conference on user modeling (pp. 45–54). Johnstown, PA: Springer-Verlag.
Polk, T. A., & Newell, A. (1995). Deduction as verbal reasoning. Psychological Review, 102(3), 533–566.
Polson, P. G., Lewis, C., Rieman, J., & Wharton, C. (1992). Cognitive walkthroughs: A method for theory-based evaluation of user interfaces. International Journal of Man-Machine Studies, 36(5), 741–773.
Polson, P. G., Muncher, E., & Engelbeck, G. (1986). A test of a common elements theory of transfer. Proceedings of ACM CHI'86 Conference on Human Factors in Computing Systems (pp. 78–83). New York: ACM.
Rieman, J., Young, R. M., & Howes, A. (1996). A dual-space model of iteratively deepening exploratory learning. International Journal of Human-Computer Studies, 44(6), 743–775.
Ritter, F. E., Baxter, G. D., Jones, G., & Young, R. M. (2000). Cognitive models as users. ACM Transactions on Computer-Human Interaction, 7, 141–173.
Ritter, F. E., & Young, R. M. (2001). Embodied models as simulated users: Introduction to this special issue on using cognitive models to improve interface design. International Journal of Human-Computer Studies, 55, 1–14.
Salthouse, T. A. (1988). Initiating the formalization of theories of cognitive aging. Psychology & Aging, 3(1), 3–16.
Salvucci, D. D. (2001a). An integrated model of eye movements and visual encoding. Cognitive Systems Research, 1(4), 201–220.
Salvucci, D. D. (2001b). Predicting the effects of in-car interface use on driver performance: An integrated model approach. International Journal of Human-Computer Studies, 55, 85–107.
Schoelles, M. J., & Gray, W. D. (2000). Argus Prime: Modeling emergent microstrategies in a complex simulated task environment. In N. Taatgen & J. Aasman (Eds.), Proceedings of the Third International Conference on Cognitive Modeling (pp. 260–270). Veenendal, NL: Universal Press.
Simon, H. A. (1996). The sciences of the artificial (3rd ed.). Cambridge, MA: MIT Press.
Singley, M. K., & Anderson, J. R. (1989). The transfer of cognitive skill. Cambridge, MA: Harvard University Press.
St. Amant, R. (2000, Summer). Interface agents as surrogate users. Intelligence magazine, 11(2), 29–38.
St. Amant, R., Horton, T. E., & Ritter, F. E. (in press). Model-based evaluation of expert cell phone menu interaction. ACM Transactions on Computer-Human Interaction.
St. Amant, R., & Riedl, M. O. (2001). A perception/action substrate for cognitive modeling in HCI. International Journal of Human-Computer Studies, 55, 15–39.
St. Amant, R., Freed, A. R., & Ritter, F. E. (2005). Specifying ACT-R models of user interaction with a GOMS language. Cognitive Systems Research, 6, 71–88.
Taatgen, N. A., & Lee, F. J. (2003). Production compilation: A simple mechanism to model skill acquisition. Human Factors, 45, 61–76.
Tullis, T. S. (1983). The formatting of alphanumeric displays: A review and analysis. Human Factors, 25(6), 657–682.
Wiesmeyer, M. (1992). An operator-based model of covert visual attention. Unpublished doctoral dissertation, University of Michigan, Ann Arbor.
Young, R. M., Barnard, P., Simon, T., & Whittington, J. (1989). How would your favorite user model cope with these scenarios? ACM SIGCHI Bulletin, 20, 51–55.