
A Computational Model of Human Affective Memory

and Its Application to Mindreading


Hugo Liu
MIT Media Laboratory
20 Ames Street #320D
Cambridge, MA 02139, USA
+1 (617) 253-5334
hugo@media.mit.edu



ABSTRACT
The cognitive science and artificial intelligence communities are
both interested in the problem of how humans infer the mental
states of others, known as mindreading. Whereas cognitive sci-
ence is interested in a deeper understanding of how humans min-
dread, artificial intelligence is interested in imparting mindreading
capabilities to social computers. Current AI approaches to min-
dreading are weak, however. Techniques such as user profiling
and collaborative filtering try to predict user preferences and ac-
tions, but do so very weakly. In this paper, we propose a deeper
model of a person in terms of their system of attitudes, and im-
plement the system called PERSONA. Grounded in the episodic
and reflexive memories of a person, PERSONA uses saliency-
mediated associative learning to automatically acquire a human
affective memory model from a corpus of personal text, such as a
weblog. Applying this model, PERSONA performs affective
mindreading to predict a person's likely affective response given a
new situation or event. Beyond memory-based prediction
alone, the system also analyzes the attitudes of a person's
Minskian imprimers and performs conceptual analogy to make
predictions more robust. An evaluation of PERSONA indicates that
it is a promising approach, comfortably outperforming baselines;
however, because affective communication is fairly fail-hard, more
refinement would be needed before this system can be applied to
socialize a computer.
1. WHY IS MINDREADING
INTERESTING?
Recently there has been much ado in the cognitive science
community about the human faculty for "Theory of Mind" (ToM),
otherwise known as "mindreading." And no, it does not refer to
psychic powers as one might guess. ToM and mindreading refer to
an animal's capability for reflecting on its own mental states
(attitudes, beliefs, and desires) and modeling the mental states of
others. It is believed that humans evolved specialized mindreading
abilities absent in other primates (Povinelli and Preuss, 1995),
and that the human mindreading faculty makes human social
learning uniquely powerful, enabling, inter alia, the rapid learning
of words (Bloom, 2002) and the learning of goals and values
(Minsky, forthcoming). Cognitive scientists have gone about the
study of mindreading in many ways, including: by evolutionary
comparison, e.g. (Call and Tomasello, 1996); by examining linked
phenomena like imitation (Meltzoff and Gopnik, 1993); by studying
deficits of ToM in autistic children; by speculating on potential
neural substrates for ToM such as "mirror neurons" (Gallese and
Goldman, 1998); and by debating how it works, i.e. Simulation
Theory of ToM versus Theory Theory of ToM.
Across the divide, artificial intelligence researchers are also
thinking about mindreading. However, being on the whole more
pragmatic, this community is more interested in imparting
mindreading capabilities to computers and robots to create more
sociable human-computer interaction (Nass et al., 1994). While
some results from the cognitive science literature are interesting
to the AI community, such as the recent discovery of special action
recognition neurons called "mirror neurons" in macaque monkeys
(Gallese et al., 1996), we think it is fair to say that behavioral and
bottom-up approaches to mindreading are still far away from
producing a compelling and predictive cognitive model that could
empower social computers.
Despite lacking a complete cognitive model of mindreading, the
AI community has been working on weaker forms of mindreading
for many years. User modeling, for example, attempts to model a
human user's preferences and mental context in hopes of creating
more natural and personal interactions between human and
computer. One common approach in user modeling is user profiling,
whereby users are modeled by their demographic information,
usually obtained via explicit questionnaires. Applying a small set
of rules, these user demographics can be mapped into predicted
user preferences. Another common approach in user modeling is
collaborative filtering, in which patterns of user actions are
modeled against those of a whole user community. While these forms
of user modeling have enjoyed some success, particularly in
product recommendation (Resnick and Varian, 1997), these
approaches are too weak to be useful for socializing computers. User
profiling oversimplifies people as obeying demographic lines, while
collaborative filtering is a purely statistical approach offering little
insight into a user's beliefs or preferences; thus, user profiling and
collaborative filtering are weak mindreaders.
In this work, we explore how mindreading can be deepened in a
novel way: by considering knowledge of a person's life experiences
over a long period of time, and applying this knowledge to predict
how a person might respond in new situations.
In order to create a more complete and more intimate model of a
person, we would necessarily need a corpus of knowledge about
that person's beliefs, desires, goals, and experiences. While it
may be possible to acquire this directly through interactions with
the user, building a sufficiently rich model of a person might
require cumbersome interactions; thus the approach taken by this
work is to try to infer such a model automatically from personal
texts such as a journal, or a transcript of a person's beliefs and
ideas, as might be manifested in an interview. Through our initial
experience, we realized that specific beliefs and goals would be
too hard to accurately infer from unconstrained natural language
text, but we did not want to sacrifice breadth of knowledge for
specificity, so instead, we decided to try to infer just the emotions,
attitudes and dispositions associated with these beliefs and goals.
Using this body of knowledge, we construct a mechanism to
predict the affective context of a person in reaction to a topic,
situation or event. We dub this task affective mindreading, with
"affect" referring to emotions, dispositions, and attitudes. If
successful, we believe that this type of mechanism can have great
implications for sociable computers.
Our approach can be summarized as follows. From text, we wish
to infer a person's emotions, attitudes, and dispositions toward
particular people, topics, events, and situations, at different times
in their lives, and to record these into a model of a person's
affective memory. By interpolating and extrapolating from this
affective memory, a computer can perform affective mindreading;
that is to say, given a new topic, event, or situation, the system will
try to predict a person's affective response. To implement this
approach, we built PERSONA, a system that creates a model of a
person's affective memory from personal texts, and exploits this
model for affective mindreading.
The rest of this paper is organized as follows. First, we present a
computational model of human affective memory, a model of
saliency-mediated associative learning from personal texts, and
discuss the implementation of the PERSONA model learner.
Second, we explore how the affective memory model is used in
conjunction with conceptual analogy to perform affective min-
dreading. Third, we describe an experiment to evaluate
PERSONA in an affective mindreading task. Fourth, we recon-
nect with the literature and address how affective mindreading
aids social learning tasks in humans and computers.
2. A MODEL OF AFFECTIVE MEMORY
In the previous section, we motivated the development of a com-
putational model of human affective memory by suggesting that
such a model would allow for more advanced mindreading by
computers than can be achieved through typical knowledge-
impoverished user modeling techniques such as profiling. We
begin this section with the caveat that the computational model
described here is not claimed or intended to be cognitively
motivated. We attempt to model human affective memory only insofar
as it is feasible to infer from personal texts, and only insofar as it
is useful to the task of affective mindreading, that is, predicting a
person's attitudes and dispositions toward a particular subject. In
this section, we first propose the two-part episode-reflex model of
human affective memory and connect it to the literature. Second,
we introduce saliency-mediated associative learning as a strategy
for automatic model acquisition from personal texts. Third, we
discuss how such a model has been implemented in PERSONA.
2.1 The Episode-Reflex Model
Of Human Affective Memory
Of the different types of human memory that have been studied,
two are of great interest to us as tools for modeling affective
memory: long-term episodic memory, and reflexive memory. In
PERSONA, we combine the strengths of these two memories to form
the episode-reflex model.
2.1.1 Affective long-term episodic memory
Long-term episodic memory (LTEM) is a relatively stable
memory based on experiences and events in context. An episode
can be thought of as a coherent packet of events with a time-
sequence. Episodes are generally content-addressable, meaning
that they can be retrieved through a variety of cues based on the
sensory, affective, or semantic content of the episode, such as a
sight, sound, emotion, or location. LTEM can be very powerful
because even events which happen only once can become salient
memories and serve to recurrently influence a person's future
thinking. If we hope to accurately predict a person's affective
response to a future situation, we must account for the influence
of these one-time salient episodes. Even though our aim is to
model only the affective aspect of human memory, we cannot, in
the case of LTEM, completely disregard the non-affective aspects
because they may serve as cues for retrieval. Consequently, our
affective LTEM model represents episodes with some semantic
structure and several types of context. In PERSONA, an affective
LTEM episode has the following components:
- A collection of the subevents of an episode that are sali-
ent to the evocation of the overall affect of the episode,
sequentially ordered.
- The perceived root cause of the affective response in
that episode, extracted when possible
- Possibly salient contexts: the date, the location, the top-
ic
- An affect valence score associated with the episode
- Salience score of episode, measuring the perceived im-
portance of the memory
Extracting only the salient subevents and the perceived root cause
of the episode makes learning more precise; the motivation is
discussed further in the next subsection. In addition to describing
the thematic structure of the episode, we also
encode several other types of contextual cues with the episode.
As suggested by Tulving's encoding specificity hypothesis
(1983), retrieval of an episode is more likely when current
conditions match the encoding conditions; thus it is important to
remember the salient contexts surrounding an episode as completely
as possible. Finally, because our main focus is on being able to
recall the attitudes experienced during an episode, we associate an
affect valence score with each episode, to be described in a later
subsection.
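The episode components listed above could be held in a small record type. The following is a minimal sketch rather than PERSONA's actual data structure: all field names, the use of a three-float tuple for the valence score, and the helper `matching_contexts` are assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EpisodeFrame:
    """One affective LTEM episode (field names are illustrative)."""
    subevents: list[str]                 # salient subevents, sequentially ordered
    valence: tuple[float, float, float]  # affect valence score (assumed 3 floats)
    salience: float                      # perceived importance of the memory
    root_cause: Optional[str] = None     # perceived cause, when one was extracted
    contexts: dict[str, str] = field(default_factory=dict)  # date, location, topic


def matching_contexts(frame: EpisodeFrame, cues: dict[str, str]) -> int:
    """Count how many current cues match contexts stored with the episode."""
    return sum(1 for kind, value in cues.items()
               if frame.contexts.get(kind) == value)


frame = EpisodeFrame(
    subevents=["missed the bus", "arrived late to the exam"],
    valence=(-0.6, 0.5, -0.3),
    salience=0.8,
    root_cause="missed the bus",
    contexts={"date": "2003-05-12", "location": "campus", "topic": "exams"},
)
print(matching_contexts(frame, {"location": "campus", "topic": "travel"}))
```

Counting matched contexts is one simple way to act on the encoding specificity idea: the more stored contexts a current situation shares with an episode, the stronger a candidate that episode is for retrieval.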
2.1.2 Affective reflexive memory
While long-term episodic memory deals in salient, one-time
events and must generally be consciously recalled, reflexive
memory is full of automatic, instant, almost instinctive associa-
tions. Whereas LTEM is content-addressable and requires pat-
tern-matching the current situation with that of the episode, re-
flexive memory is like a simple hash-table that directly associates
a cue with a reaction, thereby abstracting away the content.
Tulving equates LTEM with "remembering" and reflexive
memory with "knowing" (Tulving, 1983).
In humans, reflexive memories are generally formed through re-
peated exposures rather than one-time events, though subsequent
exposures may simply be recalls of a particularly strong primary
exposure (Locke, 1689). In addition to frequency of exposures,
the strength of an experience is also considered. Complementing
the event-specific Affective LTEM with an event-independent
affective reflexive memory makes sense because there may not
always be an appropriate distinct episode which shapes our
appraisal of a situation; often, we react reflexively, our present
attitudes deriving from an amalgamation of our past experiences
now collapsed into something instinctive.
Because humans undergo forgetting, belief revision, and theory
change, update policies for human reflexive memory may actually
be quite complex. In PERSONA, we adopt a more simplistic
representation and update policy that is not cognitively motivated,
but instead, exploits the ability of a computer system to compute
an affect valence at runtime. An entry in the memory is as follows:
- The key to the entry is one of two types:
o 1) A simple conceptual cue whose semantic
type belongs to the following ontology: a per-
son, an action, an object, an activity, or a
named event; or
o 2) A simple conceptual cue Bayesian condi-
tioned on the presence of a discourse topic.
- The value of the entry is a list of exposures.
- An exposure X is the following triple:
o date of exposure;
o affect valence score of exposure, V;
o saliency of exposure, S.
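The entry layout above maps naturally onto a dictionary of exposure lists. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, valence is reduced to a single float for brevity, the topic-conditioned key is modeled as a plain (cue, topic) pair rather than any probabilistic conditioning, and recording an exposure under both key types is a design choice made here, not something the text specifies.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple, Optional


class Exposure(NamedTuple):
    when: date       # date of exposure
    valence: float   # affect valence score V (one dimension, for brevity)
    saliency: float  # saliency of exposure S


# A key is either a bare cue (topic is None) or a cue paired with a topic.
Key = tuple[str, Optional[str]]


class ReflexiveMemory:
    def __init__(self) -> None:
        self._entries: dict[Key, list[Exposure]] = defaultdict(list)

    def record(self, cue: str, exposure: Exposure,
               topic: Optional[str] = None) -> None:
        """Store the exposure under the bare cue and, if given, (cue, topic)."""
        self._entries[(cue, None)].append(exposure)
        if topic is not None:
            self._entries[(cue, topic)].append(exposure)

    def exposures(self, cue: str, topic: Optional[str] = None) -> list[Exposure]:
        """Return the exposure list for a key; empty if never seen."""
        return list(self._entries.get((cue, topic), []))


memory = ReflexiveMemory()
memory.record("dentist", Exposure(date(2003, 1, 5), -0.7, 0.9), topic="health")
memory.record("dentist", Exposure(date(2003, 2, 1), -0.2, 0.3))
print(len(memory.exposures("dentist")), len(memory.exposures("dentist", "health")))
```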
To read off the current valence associated with a conceptual cue,
the formula given in Eq. (1) is applied.
$$\frac{1}{n}\,\Big[\log_b\big(\max(n, b)\big)\Big] \sum_{t=\text{startdate}}^{\text{enddate}} V(E_t)\,X(E_t) \qquad (1)$$
where n = the number of exposures of the concept

This formula returns the valence of a conceptual cue averaged
over a particular time period. The term $\big[\log_b(\max(n, b))\big]$
rewards frequency of exposures, while the term $X(E_t)$ rewards
the saliency of an exposure. In this simple model of an affective
reflexive memory, we do not consider phenomena such as belief
revision, reflexes conditioned over contexts, or forgetting.
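As a worked illustration, Eq. (1) can be read as a saliency-weighted average of exposure valences over the chosen period, scaled by a logarithmic frequency term. The sketch below follows that reading and rests on several assumptions: n is taken to be the number of exposures inside the time window, valence is a single float, an empty window yields a neutral 0.0, and the log base b defaults to 2.0 only as a placeholder. The function name is hypothetical.

```python
import math
from datetime import date


def current_valence(exposures, start, end, base=2.0):
    """Evaluate one reading of Eq. (1) over exposures dated within [start, end].

    exposures: iterable of (date, valence, saliency) triples.
    base: the log base b; 2.0 is a placeholder, not a value from the paper.
    """
    window = [(v, s) for (d, v, s) in exposures if start <= d <= end]
    n = len(window)
    if n == 0:
        return 0.0  # assumption: no exposures means a neutral valence
    frequency_term = math.log(max(n, base), base)  # stays at 1 until n exceeds b
    weighted_sum = sum(s * v for (v, s) in window)
    return frequency_term * weighted_sum / n


history = [
    (date(2003, 1, 1), -0.8, 1.0),
    (date(2003, 2, 1), -0.4, 0.5),
    (date(2004, 1, 1), 0.9, 1.0),   # outside the window queried below
]
print(current_valence(history, date(2003, 1, 1), date(2003, 12, 31)))
```

Because of the max(n, b) guard, the frequency term never drops below 1, so a cue seen only once or twice is not penalized; it is only boosted once it has been seen more than b times.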
In summary, we have motivated and characterized two compo-
nents to our computational model of human affective memory: an
episodic component emphasizing the affect of one-time salient
memories, and a reflexive component, emphasizing instinctive
reactions to conceptual cues that are conditioned over time. In the
following subsection, we propose how this two-part model of
human affective memory can be acquired from personal texts via
saliency-mediated associative learning.
2.2 Saliency-Mediated Associative Learning
With origins in Aristotle, classical associative learning was
popularized as an explanation of many brain processes beginning in
the 17th century by several British philosophers, including John
Locke and James Mill. However, after the rise and fall of popularity of
Pavlovian classical conditioning, many in the cognitive science
community now dismiss associative learning as inadequate. In the
study of word learning in children, Paul Bloom reported that,
contrary to Locke's assertion that repetition is necessary to associate
words with sights and sounds, children actually learn word
meanings error-free, and without repetition, in a process dubbed
"fast-mapping" (Bloom, 2002). While it may seem that associative
learning is being debunked as a plausible theory of cognitive
learning, we suggest that associative learning can, in many cases,
be salvaged, given that it is appropriately structured. In Bloom's
research on word learning in children, for example, error-free fast
mapping is possible because the child uses the teacher's mental
and intentional context to disambiguate reference, and once
reference is disambiguated with sufficient confidence, the association
between word and meaning can then be made with greater confidence.
With a similar sentiment against weak associationism, Marvin
Minsky argues that simply remembering everything is not
equivalent to learning. The defining criterion for learning is knowing
precisely what is learned (Minsky, forthcoming). Or, formulated
another way, learning involves credit assignment (Sutton, 1984).
The lesson to be learned from this (pun intended) is that
associative learning is not useful unless it is precise. In other words,
our mechanism of learning should not rely solely on semantically
weak statistical methods, but should instead incorporate some
external knowledge and heuristics to gain additional precision. In
particular, we see the identification of saliency and salient events
as a mechanism to focus associations. We dub this
saliency-mediated associative learning (SMAL). SMAL is similar to credit
assignment, except that salience is a heuristically generated score
rather than an assertion, thus making it amenable to statistical
learning methods.
The learning mechanism of each of the two parts of the proposed
affective memory model incorporates saliency to focus learning.
In the affective long-term episodic memory model, affect is asso-
ciated with particularly salient subevents rather than the whole of
the episode. In addition, the perceived root cause of the affective
response in the episode is extracted or inferred when possible.
Finally, a saliency score is given to the whole of the episode, to
rate its importance and impact to the person being modeled. These
three features together focus the associative learning mechanism,
and help to answer the question, "what should be learned?" Of
course, identifying saliency, being a flavor of the credit
assignment problem, is not an easy task, especially over
domain-unconstrained texts. In the next subsection, we explain the role
that a large common sense knowledge base plays in this important
subtask.
In the affective reflexive memory model, associations are not
made at the word-level, which would tend to conflate the affect of
too many different senses of a word into the same entry. Rather,
conceptual cues are those first-order or second-order phrases
which follow the ontology: a person, an action, an object, an
activity, or a named event. The choice of ontology reflects the types
of salient concepts that we believe people typically form stable
attitudes about. In addition, to embrace the possibility that
concepts may have different affect valences under different contexts,
an entry in the affective reflexive model may be keyed on a
concept Bayesian conditioned on a particular discourse context.
The difficulty in identifying the contexts which dictate a
conceptual cue's interpretation is discussed further in the evaluation of
PERSONA. Finally, each exposure is associated with a saliency
score, and conceptual cues with more entries are assigned more
salient valence scores. By putting constraints on the types of
concepts that can learn affective associations, by considering contexts
in learning affect associations, and by valuating the saliency of the
strength and frequency of exposures, the reflexive memory model
seeks to incorporate as much precision as possible in its
associative learning.
Having proposed the episode-reflex model of human affective
memory and saliency-mediated associative learning as a
mechanism for model acquisition, we next discuss how the
model and learning mechanisms were implemented in
PERSONA.
2.3 Model Implementation in PERSONA
In proposing the model and learning mechanism, several subtasks
were implied but not addressed explicitly, such as 1) having a
source of personal texts meeting certain suitability criteria, 2) a
model for measuring affect valence, 3) a mechanism for judging
the affect of episodes and text in general, and 4) methods for
determining saliency. These implementation issues are discussed in
the ensuing subsections, followed by a start-to-finish architectural
walkthrough of PERSONA's model learner.
2.3.1 Suitability Criteria for Personal Texts
The suitability of texts for model generation is subject to the fol-
lowing criteria. First, texts should be first-person, subjective,
autobiographical narratives. Personal emotions and attitudes are
either not easily accessible, or not sufficiently detailed in third-
person texts or objective writing. Second, texts should explore a
breadth of topics because an insufficiently broad model gives a
poor and disproportional sampling of a person and would be diffi-
cult to use to perform affective mindreading. Third, texts should
cover everyday events, situations, and topics, because that is the
optimal discourse domain of recognition of the mechanism with
which we will judge the affect of text. Fourth, texts should be
organized into episodes, occurring over a substantial period of
time relative to the length of a person's life.
With these criteria in mind, an ideal source of personal text is a
personal journal. While private journals would be preferred for
their candor, publicly viewable journals in the form of recently
popular weblogs are still a good source of personal texts, though
the generated model may exhibit a bias toward the "public
personality" of the person being modeled. A less ideal but workable
source of personal texts is interview transcripts. Interviews
satisfy most of the criteria for personal text selection, with the
exception that interviews are not reliably organized around episodes
and may represent a disproportionally narrow set of topics. Still,
they are suitable substrates for model generation, provided that
the limitations of the resulting model are realized.
2.3.2 Representing Affect using the PAD Model
Affect valence pervading the proposed models can take one of
two potential representations. It can take the form of an
ontology of basic canonical emotions, represented prominently by
Paul Ekman's six basic emotions (happy, sad, angry, scared,
disgusted, and surprised) (1993). Or, it can take the form of a
dimensional model, represented prominently by Albert
Mehrabian's Pleasure-Arousal-Dominance (PAD) model (1995). In this
model, the three nearly independent dimensions are
Pleasure-Displeasure, Arousal-Nonarousal, and
Dominance-Submissiveness. Each dimension can assume values from -100%
to +100%, and a PAD valence score is a 3-tuple of these values
(e.g. [-.51, .59, .25] might represent anger).
We chose the dimensional PAD model over the discrete canonical
emotion model because PAD represents a sub-symbolic, continu-
ous account of affect, where different symbolic affects can be
unified along one of the three dimensions. This model has ro-
bustness implications for the affective classification of text. For
example, in the affective reflexive memory, a conceptual cue may
be variously associated with anger, fear, and surprise, which can
be unified along the Arousal dimension of the PAD model, thus
enabling the affect association to be coherent and focused.
2.3.3 Affective Appraisal of Personal Text
Judging the affect of a personal text has three chief considera-
tions. First, the mechanism for judging the affect should be ro-
bust and comprehensive enough to correctly appraise the affect of
a breadth of concepts. Second, to aid in the determination of
saliency, the mechanism must be able to appraise the affect of
very little text, such as at the sentence level. Third, the
mechanism should recognize specific emotions rather than collapsing
affect onto any single dimension.
Several common approaches fail to meet the criteria. The naive
keyword spotting approach looks for surface language features
like mood keywords. However, this approach is not acceptably robust
on its own because affect is often conveyed without mood
keywords. Statistical affect classification using statistical learning
models such as latent semantic analysis (LSA) generally requires
large inputs for acceptable accuracy because it is a semantically
weak method. Hand-crafted models and rules are not broad
enough to analyze the desired breadth of phenomena.
To analyze personal text with the desired robustness, granularity,
and specificity, we employ a model of textual affect sensing using
real-world knowledge, proposed by Hugo Liu et al. (2003). In
this model, defeasible knowledge of everyday people, things,
places, events, and situations is leveraged to sense the affect of a
text by evaluating the affective implications of each event or
situation. For example, to evaluate the affect of "I got fired today,"
this model evaluates the consequences of this situation and
characterizes it using negative emotions such as fear, sadness, and
anger. This model, coupled with a naive keyword spotting
approach, provides rather comprehensive and robust affective
classification. Since the model uses knowledge rather than word
statistics, it is semantically strong enough to evaluate text on the
sentence level, classifying each sentence into a six-tuple of valences
(ranging from a value of 0.0 to 1.0) for each of the six basic
Ekman emotions. These emotions are then mapped to the PAD
model.
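The text does not say how the six Ekman scores are mapped onto PAD, so the following is only one plausible sketch: each emotion is given an anchor point in PAD space, and a sentence's PAD triple is the average of those anchors weighted by the Ekman scores. The weighted-average scheme, the function name, and every anchor value except the anger triple quoted earlier are assumptions made for illustration.

```python
EKMAN = ("happy", "sad", "angry", "scared", "disgusted", "surprised")

# Illustrative PAD anchors. Only the "angry" row comes from the text above;
# every other row is a made-up placeholder, not a published value.
PAD_ANCHORS = {
    "happy":     (0.8, 0.5, 0.4),
    "sad":       (-0.6, -0.3, -0.3),
    "angry":     (-0.51, 0.59, 0.25),
    "scared":    (-0.6, 0.6, -0.4),
    "disgusted": (-0.6, 0.3, 0.1),
    "surprised": (0.2, 0.7, -0.1),
}


def ekman_to_pad(scores):
    """Blend PAD anchors, weighting each by its Ekman score in [0, 1]."""
    total = sum(scores.get(e, 0.0) for e in EKMAN)
    if total == 0:
        return (0.0, 0.0, 0.0)  # no detected emotion maps to neutral
    return tuple(
        sum(scores.get(e, 0.0) * PAD_ANCHORS[e][i] for e in EKMAN) / total
        for i in range(3)
    )


# "I got fired today": mostly fear and sadness, with some anger.
print(ekman_to_pad({"scared": 0.6, "sad": 0.5, "angry": 0.3}))
```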
One point of potential paradox should be addressed. The
real-world knowledge-based model of affect sensing is based on
defeasible commonsense knowledge from the Open Mind
Commonsense corpus (Singh et al., 2002), which is, in turn, gathered
from a web community of some 11,000 teachers. Therefore, the
affective assessment of text made by such a model represents the
judgment of a typical person. However, sometimes a personal
judgment of affect is contradicted by the typical judgment. Thus,
it would seem paradoxical to attempt to learn that a situation has a
personally negative affect when the typical person judges the
situation as positive. To overcome this difficulty, we implement, in
parallel, a mood keyword-spotting affect sensing mechanism to
confirm or contradict the assessment of the primary model. In
addition, we make the assumption that although a personal affect
judgment may sometimes deviate from that of a typical person, it
will agree most of the time. The implication of this is that at a
slightly larger granularity than a sentence, the affective appraisal
is likely to be accurate. To assess the affect of a sentence, we
factor in the affective assessment of not only the sentence itself,
but also of the paragraph, section, and whole journal entry or
episode.
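One simple way to factor in the coarser granularities is a weighted blend of the PAD triples computed at each level. The sketch below assumes a fixed linear weighting; the function name and the default weights are hypothetical, since the text does not state how the four assessments are combined.

```python
def smoothed_valence(sentence, paragraph, section, episode,
                     weights=(0.4, 0.3, 0.2, 0.1)):
    """Blend PAD triples from four granularities into one sentence-level triple.

    The weights are hypothetical; they only need to be non-negative and
    not all zero.
    """
    levels = (sentence, paragraph, section, episode)
    total = sum(weights)
    return tuple(
        sum(w * level[i] for w, level in zip(weights, levels)) / total
        for i in range(3)
    )


# A sentence that reads as mildly positive inside a clearly negative entry
# is pulled toward the surrounding context.
print(smoothed_valence((0.2, 0.1, 0.0), (-0.5, 0.4, -0.2),
                       (-0.6, 0.5, -0.3), (-0.6, 0.5, -0.3)))
```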
Another way to view this is that the learning of personal attitudes
and dispositions can be bootstrapped by commonsense attitudes
and dispositions.
2.3.4 Determining Saliency
The key to saliency-mediated associative learning is, of course,
being able to judge the saliency or importance of an episode,
cause of episode, subevent, context, or exposure.
Salient subevents. Within the analysis of an episode, saliency of
subevents is determined by two components: relative
contribution of valence to the overall valence, and alignment with key
events of everyday story scripts. First, the main verbs and
arguments are extracted from the sentences, constituting a candidate
list of subevents. The affective valence of each of these subevents
is compared against the overall valence of the episode, and those
that contribute most to, and align best with, the overall valence are
given higher saliency. Second, using a small collection of pithy
everyday stories from the Open Mind Commonsense (OMCS)
story corpus (Singh et al., 2002), an alignment procedure tries to
map the current episode to the corpus of stories. If a match exists,
then the episode's key events can be identified and their saliencies
boosted.
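The first component, valence contribution and alignment, can be approximated by projecting each subevent's PAD triple onto the episode's overall PAD triple: the projection is large only when the subevent is both strongly affective and points the same way as the episode. This is a sketch of that idea under assumptions; the projection measure, the normalization to [0, 1], and the function name are not from the text, and the story-script alignment step is omitted.

```python
import math


def subevent_saliencies(subevent_valences, episode_valence):
    """Score each subevent by projecting its PAD triple onto the episode's.

    Returns scores in [0, 1]; subevents pointing away from the episode's
    overall affect score 0.
    """
    norm = math.sqrt(sum(x * x for x in episode_valence))
    if norm == 0:
        return [0.0] * len(subevent_valences)
    projections = [
        max(0.0, sum(a * b for a, b in zip(v, episode_valence)) / norm)
        for v in subevent_valences
    ]
    top = max(projections, default=0.0)
    return [p / top if top > 0 else 0.0 for p in projections]


episode = (-1.0, 0.0, 0.0)
subevents = [(-0.8, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.4, 0.0, 0.0)]
print(subevent_saliencies(subevents, episode))
```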
Salient contexts. In identifying salient within-episode contexts,
the semantic recognition of possible types of contexts such as
date, time, location, and social circles first takes place. Contexts
which occur with the greatest number of repetitions and anaphoric
references are judged salient.
Salient cause of episode. This is perhaps the most important step
in learning. Episodes can unfold in multiple steps, but a person
will ultimately attribute an affective response to a single root
cause. There are three heuristic processes for attempting to
identify the perceived cause of a salient episode. First, a heuristic
information extractor tries to use regular expression patterns and
syntactic cues to identify the explicitly stated perceived cause of
the affective response in the episode. Second, if the cause is not
found in the text, the alignment procedure between stories in the
OMCS story corpus and the episode may also produce a cause,
because there may be a cause or moral explicitly associated with the
story. Third, we use OMCSNet (Liu and Singh, 2003), a semantic
network representation of 80,000 nodes and 200,000 edges generated
from OMCS, to reason abductively about the cause. In this
representation, nodes are first- and second-order concepts like people,
places, things, events, activities and actions, while edges are labeled
with one of 25 relational predicates. Salient events of the episode
are mapped onto nodes in OMCSNet; then, edges with the causal
predicates "effectOf" and "consequenceOf" are followed in
reverse. If the search for causes converges on a common node, then
that node is chosen as a cause.
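The third heuristic can be pictured as a backward search over causal edges from each salient event, keeping the nodes that every search reaches. The sketch below runs on a toy edge list invented for illustration; it does not use OMCSNet's real data or API, and the (cause, predicate, effect) edge orientation and the function names are assumptions.

```python
from collections import deque

# Toy edges as (cause, predicate, effect); the content is invented.
EDGES = [
    ("oversleep", "effectOf", "miss bus"),
    ("miss bus", "effectOf", "arrive late"),
    ("oversleep", "consequenceOf", "skip breakfast"),
    ("skip breakfast", "effectOf", "feel hungry"),
    ("arrive late", "locationOf", "office"),  # non-causal, ignored
]
CAUSAL = {"effectOf", "consequenceOf"}


def ancestors(node, edges):
    """All nodes reachable from `node` by following causal edges in reverse."""
    parents = {}
    for source, predicate, target in edges:
        if predicate in CAUSAL:
            parents.setdefault(target, set()).add(source)
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def common_causes(events, edges):
    """Candidate causes on which the reverse search from every event converges."""
    candidate_sets = [ancestors(event, edges) for event in events]
    return set.intersection(*candidate_sets) if candidate_sets else set()


print(common_causes(["arrive late", "feel hungry"], EDGES))
```

When the intersection holds several nodes, some tie-breaking rule would still be needed to settle on a single root cause; the text does not describe one, so none is sketched here.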
Saliency of episode. Episodic saliency is judged by the frequency
with which salient subevents of that episode are recalled in later
episodes, and by the detection of affect valence of episodes whose
fulcrum is the episode in question.
Saliency of exposure. Exposure is judged as the membership of a
conceptual cue within an episode whose affect valence is strong.
Naturally, this is a weak association, and can be strengthened if
particular conceptual cues are given salient exposure. This in-
cludes frequency counting and anaphoric reference counting with-
in the episode. If a conceptual cue occurs with high frequency or
reference, then it is likely a topic of the episode.
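The frequency and anaphoric reference counting described above can be sketched as a share-of-mentions score, scaled by how strong the episode's affect is. The scoring formula, parameter names, and function name here are assumptions; the text states only that both kinds of counts are used.

```python
from collections import Counter


def exposure_saliency(cue_mentions, anaphoric_refs, episode_valence_strength):
    """Score each cue by its share of mentions, scaled by the episode's affect.

    cue_mentions: cue strings, one per explicit mention in the episode.
    anaphoric_refs: cue strings, one per resolved pronoun or back-reference.
    episode_valence_strength: magnitude of the episode's valence, in [0, 1].
    """
    counts = Counter(cue_mentions) + Counter(anaphoric_refs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cue: episode_valence_strength * count / total
            for cue, count in counts.items()}


scores = exposure_saliency(
    cue_mentions=["boss", "boss", "office"],
    anaphoric_refs=["boss"],          # e.g. a resolved "he"
    episode_valence_strength=0.8,
)
print(max(scores, key=scores.get))    # the most topical cue in the episode
```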
2.3.5 The PERSONA Model Learner Architecture
The architecture of the PERSONA Model Learner is given in Fig.
1. The text inputter scrapes a weblog URL or other personal text
corpus for date-annotated episodes. In the linguistic processing
suite, raw episodes are syntactically and semantically processed,
meeting the needs of the two associative learners. The affective
text analyzer combines a real-world knowledge-based analyzer
(Emotus Ponens (Liu et al., 2003)) with a back-off mood keyword
spotter. The sentences within each episode are annotated with a
PAD valence triplet.

[Figure 1: block diagram of the PERSONA Model Learner. Components and
their listed functions: Text Inputter (weblog extraction, episode
builder), taking a weblog URL and emitting raw episodes; Linguistic
Processing Suite (semantic type recognition, temporal phrase
recognition, tagging and chunking, SVO identification, anaphora
resolution), emitting semantically processed episodes; Affective Text
Analyzer (mood keyword spotting, real-world knowledge based sensing,
Ekman to PAD model mapping), emitting affectively annotated episodes;
Affective Long-term Episodic Memory Learner (salient subevent assessor,
salient contexts assessor, salient cause assessor, episode salience
calculation); Affective Reflexive Memory Learner (conceptual cue
gatherer, salient exposure assessor). Knowledge resources shown:
Emotus Ponens, OMCSNet, OMCS Story Corpus. The two learners update the
Affective Long-term Episodic Memory and the Affective Reflexive Memory.]

Fig 1. PERSONA's Model Learner Architecture
In the affective reflexive memory learner, conceptual cues previ-
ously identified in the linguistic processing suite are judged for
salient exposure by assessing their topicalization in the episode.
Then the affective reflexive memory is updated.
In the affective long-term memory learner, salient subevents, con-
texts, and causes are gathered together into an EpisodeFrame.
The salience of the episode is assessed by analyzing all later epi-
sodes for reference. Note that the salience of an episode may be
updated in the memory, as new episodes may refer to past epi-
sodes, thereby increasing their saliency. In personal theory
change, episodes may be more than recalled; they may actually be
relived, with different affective assessments. In such a case, the
original experience should be forgotten. In our current model of
PERSONA, we do not account for this. The EpisodeFrame is
associated with an episode affect valence, and stored in affective
long-term episodic memory.
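As a rough illustration of the structure just described, the following Python sketch shows one way an EpisodeFrame might be represented and how a later reference could raise its salience. The field names, the PAD tuple layout, the starting salience, and the additive `boost` are all assumptions made for illustration; the paper does not specify them, and the example episode is invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A PAD triplet: (pleasure, arousal, dominance), each assumed in [-1.0, 1.0].
PAD = Tuple[float, float, float]


@dataclass
class EpisodeFrame:
    """Hypothetical container for one episode in long-term episodic memory."""
    subevents: List[str]
    contexts: List[str]
    root_cause: str
    affect: PAD
    salience: float = 1.0  # assumed starting salience


@dataclass
class EpisodicMemory:
    frames: List[EpisodeFrame] = field(default_factory=list)

    def store(self, frame: EpisodeFrame) -> None:
        self.frames.append(frame)

    def note_reference(self, frame: EpisodeFrame, boost: float = 0.5) -> None:
        # A later episode that refers back to `frame` increases its salience.
        # The additive boost is an assumption, not the paper's formula.
        frame.salience += boost


memory = EpisodicMemory()
breakup = EpisodeFrame(
    subevents=["argue with partner", "move out"],
    contexts=["apartment", "winter"],
    root_cause="lost trust",
    affect=(-0.8, 0.7, -0.4),
)
memory.store(breakup)
memory.note_reference(breakup)  # a later entry mentions the breakup again
```

Keeping salience as a mutable field on the stored frame reflects the observation that salience may be revised after the episode is first recorded.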
To summarize, in this section, we proposed the episode-reflex
computational model of human affective memory, one based
around episodes, and one based on reflex. We presented saliency-
mediated associative learning as a focused heuristical/statistical
learning mechanism to acquire the proposed model from personal
texts such as weblogs. In the next section, we focus on how these
models can be used for affective mindreading: predicting a
person's attitude toward a familiar or new subject by reasoning with
their affective memory model.
3. AFFECTIVE MINDREADING
In this section, we discuss how we have applied the PERSONA
affective memory model to predict the affective context of a per-
son in reaction to a person, thing, topic, situation, or event. We
have dubbed this task "affective mindreading" because it is similar
to the harder parent problem of mindreading: trying to predict a
person's actions given knowledge of their beliefs and desires.
Dan Dennett's "intentional stance" describes one mindreading
strategy commonly used by people to understand other people (1987).
Given knowledge of a person's beliefs and desires, you can expect
that person to act rationally, in such a way as to further their
goals.
In our problem domain, the PERSONA mindreader is given
knowledge of a person's affective memories and present attitudes
toward people, things, topics, situations, and events. The
PERSONA mindreader is given some new text embodying people,
things, topics, situations, and events. Rather than predicting the
person's actions, in affective mindreading, the system's task is to
predict the person's affective response to this text. To do this, we
leverage both the episodic and reflexive memories to perform
interpolative prediction and extrapolative prediction. In
interpolative prediction, the current episode activates known
elements of the affective memory. In extrapolative prediction, the
current episode does not contain known elements of the affective
memory; to make a prediction, we must reason by analogy to
connect it to known elements. So in essence our mindreader's
strategy is to assume that a person's affective response to a new
situation will be consistent with attitudes to past occurrences of
that situation or an analogous situation. Later in this section, we
augment this strategy by predicting that a person's affective
response to a new situation will also be influenced by that
person's internal imprimers (Minsky, forthcoming).
The next two subsections describe how the episodic memory and
reflexive memory are applied in affective mindreading.
3.1 Exploiting Episodes in Mindreading
Recall that episodes are kept separate from reflexive memories
because they are affectively powerful one-time occurrences rather
than frequency conditioned. Because they are such powerful sin-
gular examples which must be consciously recalled, their trigger-
ing typically involves several activated conceptual and contextual
cues. Episodic recollection can be triggered by a subevent or root
cause, with contextual cues as supporting triggers.
When analyzing a new episode, we apply the same heuristic
mechanisms to extract its salient subevents, contexts, and root
cause. If multiple elements of the subevents, contexts, and root
cause align with an episode in memory, then that episode is
triggered, causing the triggered episode's affect to be projected
onto the affect of the current episode. The saliency of the episode
is a multiplier coefficient to this affect valence. It is sometimes
the case that the root cause of the current episode aligns with the
root cause of the triggered episode, even though there were no
matching subevents. This can be thought of as a case of analogy
because two different sequences of events taught the person the
same lesson.
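The triggering rule above can be sketched in a few lines of Python. This is a simplified illustration, not the system's implementation: elements are aligned by exact string match, episodes are plain dictionaries, and the default of three matching cues is taken from the criterion mentioned in the evaluation section. The function names and the example episodes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

PAD = Tuple[float, float, float]


def count_matches(new: Dict, stored: Dict) -> int:
    """Count aligned elements between a new episode and a stored frame."""
    matches = len(set(new["subevents"]) & set(stored["subevents"]))
    matches += len(set(new["contexts"]) & set(stored["contexts"]))
    if new["root_cause"] == stored["root_cause"]:
        matches += 1
    return matches


def project_episode_affect(new: Dict, memory: List[Dict],
                           min_matches: int = 3) -> Optional[PAD]:
    """Return the salience-weighted affect of the best triggered frame."""
    best = None
    best_matches = 0
    for frame in memory:
        m = count_matches(new, frame)
        if m >= min_matches and m > best_matches:
            best, best_matches = frame, m
    if best is None:
        return None  # no stored episode was triggered
    p, a, d = best["affect"]
    s = best["salience"]  # salience acts as a multiplier on the affect
    return (s * p, s * a, s * d)


memory = [{
    "subevents": ["miss flight", "sleep in airport"],
    "contexts": ["airport", "holiday"],
    "root_cause": "poor planning",
    "affect": (-0.8, 0.6, -0.4),
    "salience": 0.5,
}]
new_episode = {
    "subevents": ["miss flight"],
    "contexts": ["holiday", "train station"],
    "root_cause": "poor planning",
}
projected = project_episode_affect(new_episode, memory)
```

Here one subevent, one context, and the root cause align, so the stored episode is triggered and its affect is scaled by its salience before being projected.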
3.2 Reflex Memories in Mindreading
Unlike episodes, reflex memories do not require multiple
conceptual cues to be triggered. Each conceptual cue, or conceptual
cue Bayesian-conditioned on a topic, will be directly triggered by
the same conceptual cue found in text. In applying reflex memories,
we separate the cases of interpolation versus extrapolation. Inter-
polation occurs when a conceptual cue in the current episode is
found in reflexive memory. In this case, the affect valence of the
memory is projected onto the current episode. However, if the
conceptual cue is not found in memory, then we can try to extrap-
olatively predict its affect by trying to map it to known concepts
via conceptual analogy, and then projecting the affect valence of
the analogous concept in memory onto the current episode.
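The two cases can be sketched as a lookup with a fallback. In the Python sketch below, the reflexive memory is assumed to be a dictionary from cue to PAD triplet, and `find_analog` is a hypothetical stand-in for whatever analogy service is available; neither the function name nor the toy data comes from the paper.

```python
from typing import Callable, Dict, Optional, Tuple

PAD = Tuple[float, float, float]


def predict_cue_affect(
    cue: str,
    reflexive_memory: Dict[str, PAD],
    find_analog: Callable[[str], Optional[str]],
) -> Optional[PAD]:
    # Interpolation: the cue itself is already in reflexive memory.
    if cue in reflexive_memory:
        return reflexive_memory[cue]
    # Extrapolation: map the unknown cue to an analogous known concept.
    analog = find_analog(cue)
    if analog is not None and analog in reflexive_memory:
        return reflexive_memory[analog]
    return None  # nothing known and no usable analogy


# Invented values, for illustration only.
reflexive_memory = {"car": (0.6, 0.4, 0.5)}
toy_analogies = {"bicycle": "car"}  # stand-in for a real analogy lookup

known = predict_cue_affect("car", reflexive_memory, toy_analogies.get)
novel = predict_cue_affect("bicycle", reflexive_memory, toy_analogies.get)
unknown = predict_cue_affect("tax form", reflexive_memory, toy_analogies.get)
```

Passing the analogy lookup in as a function keeps the memory model independent of how analogies are computed.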
Conceptual analogy analyzes two concepts for structural similari-
ties, and if they are similar enough, they are deemed analogous.
To perform conceptual analogy, we use OMCSNet and a structural
mapping algorithm (Gentner, 1983). For example, the following
analogous concept can be computed:

bicycle is like car (90.0%) because both:
==[isA]==> vehicle
==[isA]==> machine
==[isA3]==> means of transportation
==[isA3]==> faster than walking
==[isA3]==> used for transport
==[isA3]==> mode of transportation
==[hasLocation3]==> street
==[hasLocation3]==> garage
==[hasCollocate2]==> wheel
==[hasCollocate3]==> wheel
==[hasUse2]==> transportation
==[hasAbility2]==> travel on road
==[hasLocation5]==> garage
==[hasPart]==> wheel
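To make the idea of shared structure concrete, here is a deliberately simplified Python sketch that scores two concepts by the overlap of their (relation, target) pairs. It is not Gentner's structure-mapping algorithm and not OMCSNet's scoring: the Jaccard measure, the 0.5 threshold, and the tiny `TOY_NET` network are all assumptions chosen for illustration.

```python
from typing import Dict, Set, Tuple

Relation = Tuple[str, str]  # (relation name, target concept)

# Toy fragment of a semantic network; entries are illustrative only.
TOY_NET: Dict[str, Set[Relation]] = {
    "bicycle": {("isA", "vehicle"), ("hasPart", "wheel"),
                ("hasUse", "transportation"), ("hasPart", "pedal")},
    "car": {("isA", "vehicle"), ("hasPart", "wheel"),
            ("hasUse", "transportation"), ("hasPart", "engine")},
    "poem": {("isA", "text"), ("hasUse", "expression")},
}


def shared_relations(a: str, b: str) -> Set[Relation]:
    """Relations that both concepts have with the same target."""
    return TOY_NET[a] & TOY_NET[b]


def overlap_score(a: str, b: str) -> float:
    """Jaccard overlap of relation sets, a stand-in for the real score."""
    union = TOY_NET[a] | TOY_NET[b]
    return len(shared_relations(a, b)) / len(union) if union else 0.0


def is_analogous(a: str, b: str, threshold: float = 0.5) -> bool:
    return overlap_score(a, b) >= threshold


bike_car = overlap_score("bicycle", "car")   # 3 shared out of 5 distinct
poem_car = is_analogous("poem", "car")       # no shared relations
```

A set-overlap score ignores the weights and multi-hop relations that the output above suggests, so it should be read only as the intuition behind "similar enough".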
The intuition behind extrapolation using conceptual analogy is
that affective attitudes often transfer over to analogous concepts.
In indicative trials, however, we discovered that there were
certain classes of concepts in which this was not the case. For
example, "dog" and "cat" are, all things considered, fairly close
analogs; however, people who love dogs are not likely to like cats.
In the short term, we stop-listed conceptual analogies among
certain "hot topics" like pets. However, we thought that this was
an interesting finding and explored it a bit further.
We suggest several explanations for this. As kids, we were often
asked for our favorite pet, and perhaps there is a common percep-
tion that not having a definitive preference is an indication of
weak personality. Such a self-reflective critic may reinforce an
XOR preference relationship among possible pets. Another ex-
planation, explored further in the next subsection, is that liking a
dog has the implication that the person is in fact a dog-person. If
the person identifies with the group, dog-people, then he/she can
be said to inherit some of the common attitudes of that group.
One of these common attitudes may be a distaste for cats, thus,
blocking the affective transfer of dog over to cat. In the next sub-
section, we see that dog-person is a public imprimer (Minsky,
forthcoming), and imprimers can be viewed computationally as
the inheritance of attitudes of the imprimer.
3.3 Inherited Attitudes from Internal Imprimers
So far the mindreading strategy we have discussed employs only a
person's own memory-recorded attitudes in making predictions of
that person's affective response to a new episode. Now, we
would like to augment this strategy and discuss how the
knowledge of a person's imprimers can aid in prediction.
Marvin Minsky describes an imprimer as someone to whom one
becomes attached. He introduces the concept in the context of
attachment-learning of goals, and suggests that imprimers help to
shape a child's values. Imprimers can be a parent, mentor,
cartoon character, a cult, or a person-type. The two most important
criteria for an imprimer are that 1) the imprimer embodies some
image, filled with goals, ideas, or intentions, and that 2) one feels
attachment to the imprimer. Minsky theorizes that the images of
imprimers can be internalized and their effects still realized.
We extend this idea into the affective realm and make the further
claim that internal imprimers can do more than critique our goals;
our attachment to them leads us to willfully emulate a portion of
their values and attitudes. The collection of internal imprimers
that we keep helps to support our identity. From the supposition
that we conform to many of the attitudes of our internal
imprimers, we hypothesize that affective memory models of these
imprimers, if known, can complement the person's own affective
memory model in affective mindreading. (Of course, a person's
personality will affect the degree to which their attitudes are
influenced by others.) This hypothesis is supported by much of the
work in psychoanalysis. Sigmund Freud (1991) wrote of a process
he called introjection, in which children unconsciously emulate
aspects of their parents, such as by assuming their parents'
personalities and values. Other psychologists have referred to
introjection by terms like identification, internalization, and
incorporation.
We propose the following model of internal imprimers to support
the affective mindreading mechanism. First, it is necessary to
identify people, groups, and images that may possibly be a
person's imprimer. We can do so by analyzing the affective
memory. From a list of all conceptual cues from both the episodic
and reflexive memories, we use semantic recognizers to identify
all people, groups (e.g. "my company"), and images (e.g. "dog" =>
"dog-person") that, on average, elicit high Arousal and high
Submissiveness, show high frequency of exposure in the reflexive
memory, and collocate in past episodes with self-conscious
emotion keywords like "proud", "embarrassed", and "ashamed".
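The selection criteria above amount to a conjunctive filter over per-cue statistics. The Python sketch below is one hypothetical way to express it: the `CueStats` record, every threshold value, and the treatment of submissiveness as low Dominance on a [-1, 1] scale are assumptions, since the paper gives no numeric cutoffs. The example cues are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CueStats:
    """Hypothetical per-cue summary drawn from the affective memory."""
    name: str
    is_person_group_or_image: bool    # output of a semantic recognizer
    mean_arousal: float               # average Arousal, in [-1, 1]
    mean_dominance: float             # low Dominance read as submissiveness
    exposures: int                    # frequency in reflexive memory
    self_conscious_collocations: int  # co-occurrences with e.g. "proud"


def candidate_imprimers(cues: List[CueStats],
                        min_arousal: float = 0.3,
                        max_dominance: float = -0.3,
                        min_exposures: int = 20,
                        min_collocations: int = 1) -> List[str]:
    # All thresholds are illustrative assumptions, not values from the paper.
    return [
        c.name for c in cues
        if c.is_person_group_or_image
        and c.mean_arousal >= min_arousal
        and c.mean_dominance <= max_dominance
        and c.exposures >= min_exposures
        and c.self_conscious_collocations >= min_collocations
    ]


cues = [
    CueStats("mom", True, 0.5, -0.4, 60, 5),
    CueStats("coffee", False, 0.6, -0.5, 90, 2),   # not a person or group
    CueStats("coworker", True, 0.1, 0.2, 30, 0),   # low arousal, dominant
]
candidates = candidate_imprimers(cues)
```

In practice the output would be a ranked list for a human or a later stage to prune, as the evaluation section describes an examiner hand-picking from the top candidates.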
[Figure 2: diagram of the SELF and its internal imprimers. Personas
shown: business persona, social persona, domestic persona.
Imprimers shown: DOG-PERSON, WARREN BUFFETT, MOM, MARTHA STEWART,
OLDER BROTHER.]

Fig 2. Affective models of internal imprimers, organized into
personas, complement one's own affective model

Once imprimers are identified, we also wish to identify the
context under which an imprimer's attitudes show influence. As
shown in Fig. 2, we propose organizing the internal imprimer space
into personas representing different contextual realms. There is
good reason to believe that humans organize imprimers by persona
because we are different people for different reasons. One might
like Warren Buffett's ideas about business but probably not about
cooking. Personas can also prevent internal conflicts by allowing
a person to maintain separate systems of attitudes in different
contexts. To identify an imprimer's context, we must first agree
on an ontology of personas, which can be person-general (as the
personas in Fig. 2 are) or person-specific. Given this ontology,
we use features of each context, such as keywords taken from the
GetContext() function of OMCSNet, to classify episodes.
Once imprimers are associated with personas, we gather as much
personal text from each imprimer as desired and acquire only
the reflexive memory model, thus relaxing the constraint that texts
have episodic organization. In the augmented mindreading strate-
gy (depicted in Fig. 3), when conceptual cues are unfamiliar to the
self, we identify internal imprimers whose persona matches the
genre of the new episode, and give them an opportunity to react to
the cue. These affective reactions are multiplied by a coefficient
representing the ability of this self to be influenced, and the va-
lence score is added on to the episode. Rather than maintaining
all attitudes in the self, internal imprimers enable judgments about
certain things to be mentally outsourced to the persona-
appropriate imprimers.
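A minimal Python sketch of this augmented strategy follows. It assumes the self's reflexive memory and each imprimer's reflexive memory are dictionaries from cue to PAD triplet, that imprimers are grouped by persona, and that the susceptibility coefficient is a single number (`influence`); these data layouts, the names, and the example values are hypothetical.

```python
from typing import Dict, List, Tuple

PAD = Tuple[float, float, float]


def add_pad(x: PAD, y: PAD) -> PAD:
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2])


def scale_pad(x: PAD, k: float) -> PAD:
    return (k * x[0], k * x[1], k * x[2])


def score_episode(cues: List[str], genre: str,
                  self_memory: Dict[str, PAD],
                  imprimers: Dict[str, Dict[str, Dict[str, PAD]]],
                  influence: float = 0.5) -> PAD:
    """Sum cue valences; refer cues unknown to the self to imprimers.

    `imprimers` maps persona -> imprimer name -> reflexive memory.
    `influence` is the assumed susceptibility coefficient of the self.
    """
    total: PAD = (0.0, 0.0, 0.0)
    for cue in cues:
        if cue in self_memory:
            total = add_pad(total, self_memory[cue])
            continue
        # Only imprimers whose persona matches the episode's genre react.
        for memory in imprimers.get(genre, {}).values():
            if cue in memory:
                total = add_pad(total, scale_pad(memory[cue], influence))
    return total


# Invented memories, for illustration only.
self_memory = {"meeting": (-0.2, 0.1, 0.0)}
imprimers = {"business": {"mentor": {"index fund": (0.8, 0.4, 0.6)}}}

business_score = score_episode(["meeting", "index fund"], "business",
                               self_memory, imprimers)
domestic_score = score_episode(["meeting", "index fund"], "domestic",
                               self_memory, imprimers)
```

Under the domestic genre the business imprimer is never consulted, so the unfamiliar cue contributes nothing; this is the persona gating described above.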

[Figure 3: diagram. Nodes shown: New Episode; SELF, containing
Affective Reflexive Memory, Affective LTEM, and Conceptual
Analogy. Edge labels: conceptual cue; unfamiliar cues; episode
frame; root cause. Annotation: cues unfamiliar to self are referred
to imprimers.]

Fig 3. The imprimer-augmented affective mindreading strategy.
Edges represent memory triggers.
4. EVALUATION
PERSONA was evaluated in affective mindreading experiments
performed with four subjects. Subjects were between the ages of
18 and 28, and had kept diary-style weblogs for at least 2 years,
with an average entry interval of three to four days. Subjects
submitted their weblog URLs for the generation of affective
memory models. An imprimer identification routine was run, and
the examiner hand-picked the top one imprimer for each of the
three personas implemented. A personal text corpus was built,
and imprimer reflexive memory models were generated. The
subjects were asked to engage in an interview-style experiment
with the examiner.
In the interview, subjects and their corresponding PERSONA
models were asked to evaluate 12 short paragraph texts
representative of three genres: social, business, and domestic
(corresponding to the ontology of personas in the tested
implementation). The same set of texts was presented to each
participant, and the examiner chose texts that were generally
evocative. They were asked to summarize their reaction by rating
three factors on Likert-5 scales.
- Feel negative about it (1) ... Feel positive about it (5)
- Feel indifferent about it (1) ... Feel intensely about it (5)
- Don't feel control over it (1) ... Feel control over it (5)
These factors are mapped onto the PAD valence format, assuming
the following correspondence: 1 -> -1.0, 2 -> -0.5, 3 -> 0.0,
4 -> +0.5, and 5 -> +1.0. Subjects' responses were not normalized.
To assess the performance of PERSONA, we record the spread
between the human-assessed valence and the computer-assessed
valence,

$V_{spread} = |V_{human} - V_{PERSONA}|$    (2)

We computed the mean spread and standard deviation across all
episodes along each PAD dimension. On the -1.0 to +1.0 valence
scale, the maximum spread is 2.0. Table 1 summarizes the results.

Table 1. Performance of PERSONA affective mindreader, measured as
the spread between human and computer judged values.

             Pleasure           Arousal            Dominance
             mean     std.      mean     std.      mean     std.
             spread   dev.      spread   dev.      spread   dev.
SUBJECT 1    0.39     0.38      0.27     0.24      0.44     0.35
SUBJECT 2    0.42     0.47      0.21     0.23      0.48     0.31
SUBJECT 3    0.22     0.21      0.16     0.14      0.38     0.38
SUBJECT 4    0.38     0.33      0.22     0.20      0.41     0.32
BASELINE1    0.50
BASELINE2    0.67
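The scoring procedure behind Table 1 can be sketched as follows. The sketch reads the spread as the absolute difference between the human and predicted valences, and uses the population standard deviation; the paper does not say which form of standard deviation was used, so that choice is an assumption. The ratings and predictions are invented for illustration.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# Likert-5 rating -> valence on the [-1.0, +1.0] PAD scale.
LIKERT_TO_VALENCE = {1: -1.0, 2: -0.5, 3: 0.0, 4: 0.5, 5: 1.0}


def spread(v_human: float, v_persona: float) -> float:
    """Absolute difference between human and predicted valence."""
    return abs(v_human - v_persona)


def summarize(likert_ratings: List[int],
              predictions: List[float]) -> Tuple[float, float]:
    """Mean spread and (population) standard deviation over episodes."""
    spreads = [spread(LIKERT_TO_VALENCE[r], p)
               for r, p in zip(likert_ratings, predictions)]
    return mean(spreads), pstdev(spreads)


# Invented ratings and predictions for one PAD dimension.
ratings = [5, 2, 3, 4]
predicted = [0.5, -0.5, 0.5, 1.0]
mean_spread, std_dev = summarize(ratings, predicted)
```

The same function would be applied separately to the pleasure, arousal, and dominance ratings to fill one row of the table.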

Assuming that human reactions obeyed a uniform distribution
over the Likert-5 scale, we give two baselines, which were
simulated over 100,000 trials. In BASELINE 1, $V_{PERSONA}$ is
fixed at 0.0. In BASELINE 2, $V_{PERSONA}$ is given a random value
over the interval [-1.0, 1.0] with a uniform distribution. It should
be pointed out, however, that in the context of an interactive
sociable computer, BASELINE 1 is not a fair comparison, because it
would never produce any behavior.
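A small Monte Carlo sketch of the two baselines is given below. It makes one notable assumption: the simulated human reaction is drawn continuously from [-1.0, 1.0] rather than from the five discrete Likert values. Under that assumption the expected spreads are 1/2 for the fixed baseline and 2/3 for the random baseline, which is consistent with the 0.50 and 0.67 reported in Table 1. The function name and the seed are illustrative.

```python
import random
from typing import Tuple


def simulate_baselines(trials: int = 100_000,
                       seed: int = 0) -> Tuple[float, float]:
    """Estimate the mean spread of BASELINE 1 and BASELINE 2."""
    rng = random.Random(seed)
    total_fixed = 0.0
    total_random = 0.0
    for _ in range(trials):
        v_human = rng.uniform(-1.0, 1.0)   # assumed continuous reaction
        # BASELINE 1: the prediction is always 0.0.
        total_fixed += abs(v_human - 0.0)
        # BASELINE 2: the prediction is a uniformly random valence.
        v_guess = rng.uniform(-1.0, 1.0)
        total_random += abs(v_human - v_guess)
    return total_fixed / trials, total_random / trials


baseline1, baseline2 = simulate_baselines()
```

Drawing the human reaction from the five discrete Likert values instead would yield somewhat larger expected spreads, so the choice of distribution matters when comparing against these baselines.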
On average, PERSONA performed noticeably better than both
baselines, excelling particularly in predicting arousal, and having
the most difficulty predicting dominance. The standard deviations
were very high, reflecting the observation that PERSONA's
predictions were often either very close to the actual valence, or
very far. This can be attributed to one of several causes. First, multiple
episodes described in the same journal entries may have led to
improper associative learning. Second, the reflexive memory
model does not account for conflicting word senses. Third, per-
sonal texts inputted for the imprimers often generated models
skewed to positive or negative because text did not always have
an episodic organization. While results along the pleasure and
dominance dimensions are weaker, the arousal dimension record-
ed a mean spread of 0.22, suggesting the possibility that it alone
may have immediate applicability.
In the experiment, we also analyzed how often the episodic
memory, reflexive memory, and imprimers were triggered. Episodes
were, on average, 4 sentences long. For each episode, reflexive
memory was triggered an average of 21.5 times, episodic
memory 0.8 times, and imprimer reflexive memory 4.2 times. To
measure the effect of imprimers and episodic memories, we re-ran
the experiment turning off imprimers only, episodic memory only,
and both. Table 2 summarizes the results.
Table 2. Performance of PERSONA that can be attributed to
imprimers and episodic memory

                               Pleasure      Arousal       Dominance
                               mean spread   mean spread   mean spread
Imp ON, Epi ON                 0.35          0.22          0.43
(Table 1 results summarized)
Imp ON, Epi OFF                0.34          0.21          0.43
Imp OFF, Epi ON                0.40          0.28          0.44
Imp OFF, Epi OFF               0.41          0.29          0.45

These results suggest that episodic memory had a negligible
effect on performance. This certainly has to do with its low rate
of triggering, and with the fact that episodic memories were
weighted only slightly more than reflexive memories. The low
trigger rate of episodic memory can in turn be attributed to the
strict criterion that three conceptual cues in an episode frame
must trigger in order for the whole episode to trigger. These
results also suggest that imprimers played a measurable role in
improving performance, which is a very promising result.
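The arithmetic behind these observations can be checked directly from the two tables. The Python sketch below copies the reported mean spreads, confirms that averaging the four subjects in Table 1 approximately reproduces the first row of Table 2, and computes how much the spread changes when each component is switched off. The variable names are illustrative.

```python
from statistics import mean

# Mean spreads per subject from Table 1: (pleasure, arousal, dominance).
TABLE1 = {
    "SUBJECT 1": (0.39, 0.27, 0.44),
    "SUBJECT 2": (0.42, 0.21, 0.48),
    "SUBJECT 3": (0.22, 0.16, 0.38),
    "SUBJECT 4": (0.38, 0.22, 0.41),
}

# Mean spreads from Table 2, keyed by (imprimers on, episodic memory on).
TABLE2 = {
    (True, True): (0.35, 0.22, 0.43),
    (True, False): (0.34, 0.21, 0.43),
    (False, True): (0.40, 0.28, 0.44),
    (False, False): (0.41, 0.29, 0.45),
}

# Column-wise average of Table 1 across the four subjects.
averaged = tuple(mean(col) for col in zip(*TABLE1.values()))
agrees_with_table2 = all(
    abs(a - b) < 0.01 for a, b in zip(averaged, TABLE2[(True, True)])
)

# Change in spread when one component is switched off (positive = worse).
imprimer_effect = tuple(
    round(off - on, 2)
    for off, on in zip(TABLE2[(False, True)], TABLE2[(True, True)])
)
episodic_effect = tuple(
    round(off - on, 2)
    for off, on in zip(TABLE2[(True, False)], TABLE2[(True, True)])
)
```

Switching imprimers off raises the spread by 0.05, 0.06, and 0.01 on the three dimensions, while switching episodic memory off changes it by at most 0.01 in either direction.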
Overall, the evaluation demonstrates that the proposed approach
is promising, but needs further refinement. The randomized
BASELINE 2 is a good comparison when considering possible
entertainment applications, whose interaction is more fail-soft.
PERSONA does quite well against the active BASELINE 2, and
is within the performance range of these applications. However,
the results also suggest that PERSONA may not be ready for
deployment to a sociable computer just yet, because fallout (bad
predictions) can be very costly in the realm of affective
communication (Nass et al., 1994). Affective communication obeys
certain social contracts, which makes its applications fail-hard.
5. CONCLUSION
Mindreading is a problem of interest to the cognitive science
community, as well as the artificial intelligence community.
While cognitive science is generally interested in attaining a
deeper understanding of the problem, we argue that their primarily
behavioral and neurological means of study will be insufficient to
uncover the larger mechanistic picture of theory of mind and min-
dreading in humans. On the other hand, the artificial intelligence
community is interested in applying mental modeling and min-
dreading to build sociable computers. However, their models of
profiling and collaborative filtering are too weak. In this work,
we have tried to address the needs of both communities and walk a
middle ground. In our study of affective mindreading, we
proposed and built a system that attempts to model a person much
more deeply than present approaches in AI.
The proposed episode-reflex model of human affective memory
has interesting implications for psychology and the cognitive
sciences, providing a concrete way to test classical theories such as
associative learning, memory organizations, and formation of
identity of the self, among others. The fact that attitudes can be
considered independently of beliefs and goals suggests that in
computationalizing mindreading, it may be possible to decompose
mindreading into several processes, of which affective mindread-
ing is one. And indeed, it is possible that humans may have a
toolkit of processes which combine to be called mindreading. It is
not out of the realm of possibilities that a special process exists
for modeling and assessing just the attitudes of another person.
Such a process may serve the role, we speculate, of contextually
disambiguating processes for belief determination.
6. ACKNOWLEDGMENTS
I would like to thank Barbara Barry, Push Singh, and Andrea
Lockerd for their ideas, suggestions, and inspiration in the course
of this work.
7. REFERENCES
[1] Bloom, P., (2002). Mindreading, communication, and the
learning of the names for things. Mind and Language, 17,
37-54.
[2] Call, J. & Tomasello, M. (1996). "The effect of humans on
the cognitive development of apes". In Reaching into
Thought (eds. A. E. Russon, K. A. Bard and S. T. Parker).
Cambridge University Press, pp 371-403.
[3] Dennett, D. (1987). The Intentional Stance. Cambridge, MA:
Bradford Books/MIT Press.
[4] Ekman, P. (1993). Facial expression of emotion. American
Psychologist, 48, 384-392.
[5] Freud, S. (1991). The essentials of psycho-analysis: the de-
finitive collection of Sigmund Freud's writing selected, with
an introduction and commentaries, by Anna Freud. London:
Penguin.
[6] Gallese, V. et al. (1996). Premotor cortex and the recognition
of motor actions. Cognit. Brain Res., 3, 131-141.
[7] Gallese, V. and A. Goldman. (1998). Mirror neurons and the
simulation theory of mind-reading. Trends in Cognitive Sci-
ences, 2(12).
[8] Gentner, D. (1983). Structure-mapping: A theoretical frame-
work for analogy. Cognitive Science, 7, pp 155-170.
[9] Liu, H., Lieberman, H., and Selker, T. (2003) A model of
textual affect sensing using real-world knowledge. In Pro-
ceedings of the Seventh International Conference on Intelli-
gent User Interfaces, pages 125--132.
[10] Liu, H. and Singh, P. (2003). OMCSNet: A commonsense
inference toolkit. In Submission. Available at:
http://web.media.mit.edu/~hugo/publications
[11] Locke, J. (1689). Essay Concerning Human Understanding
Hypertext by ITL at Columbia University, 1995. Print ver-
sion ed. P.H. Nidditch. Oxford, 1975.
[12] Mehrabian, A. (1995). for a comprehensive system of
measures of emotional states: The PAD Model. (Available
from Albert Mehrabian, 1130 Alta Mesa Road, Monterey,
CA, USA 93940).
[13] Meltzoff, A. and Gopnik, A. (1993). "The role of imitation in
understanding persons and developing a theory of mind". In
Understanding other minds: perspectives from autism (eds.
S. Baron-Cohen, H. Tager-Flusberg, D. Cohen). Oxford
University Press, Chapter 16, pp 335-366.
[14] Minsky, M., (forthcoming). The Emotion Machine, Panthe-
on, New York. Several chapters are available at:
http://web.media.mit.edu/~minsky
[15] Nass, C.I., Steuer, J.S., and Tauber, E. (1994). Computers
are social actors. In Proceedings of CHI '94, (Boston, MA),
pp. 72-78, April 1994.
[16] Povinelli, D.J. and Preuss, T.M. (1995) Theory of mind:
evolutionary history of a cognitive specialization. Trends in
Cognitive Neurosciences, 18:418-424.
[17] Resnick, P. and Varian, H. R. (1997). Recommender Systems,
guest editors, special section: recommendation systems in
CACM Vol. 40, No. 3, pp 56-58.
[18] Singh, P. et al. (2002). Open Mind Common Sense:
Knowledge acquisition from the general public. In Proceed-
ings of the First International Conference on Ontologies, Da-
tabases, and Applications of Semantics for Large Scale In-
formation Systems. Lecture Notes in Computer Science.
Heidelberg: Springer-Verlag.
[19] Sutton, R.S. (1984). Temporal credit assignment in
reinforcement learning. University of Massachusetts. Department of
Computer and Information Science. Technical Report 84-2.
Amherst, MA.
[20] Tulving, E. (1983). Elements of episodic memory. Oxford: New
York.
