
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 9, NO. 4, AUGUST 2001 483

Affect Analysis of Text Using Fuzzy Semantic Typing
Pero Subasic, Member, IEEE, and Alison Huettner

Abstract: We propose a novel, convenient fusion of natural language processing and fuzzy logic techniques for analyzing the affect content in free text. Our main goals are fast analysis and visualization of affect content for decision making. The main linguistic resource for fuzzy semantic typing is the fuzzy-affect lexicon, from which other important resources (the fuzzy thesaurus and affect category groups) are generated. Free text is tagged with affect categories from the lexicon and the affect categories’ centralities and intensities are combined using techniques from fuzzy logic to produce affect sets: fuzzy sets representing the affect quality of a document. We show different aspects of affect analysis using news content and movie reviews. Our experiments show a good correspondence between affect sets and human judgments of affect content. We ascribe this to the representation of ambiguity in our fuzzy affect lexicon and the ability of fuzzy logic to deal successfully with the ambiguity of words in a natural language. Planned extensions of the system include personalized profiles for Web-based content dissemination, fuzzy retrieval, clustering, and classification.

Index Terms: Computing with words, fuzzy logic, knowledge engineering, text mining, World Wide Web.

Manuscript received September 11, 2000; revised May 8, 2001.
The authors are with the Clairvoyance Corporation, Pittsburgh, PA 15232 USA (e-mail: p.subasic@clairvoyancecorp.com).
Publisher Item Identifier S 1063-6706(01)06537-7.
1063–6706/01$10.00 © 2001 IEEE

I. INTRODUCTION

THE huge amount of text stored on computer systems is getting larger every day. Moving beyond the basic assumption that a given piece of text should be easily located, the next generation of systems aims toward integrated, personalized services and decision support. In these areas, a quick analysis of particular qualities in the text and an intuitive presentation to the user become increasingly important. To match an individual user’s profile on the World Wide Web, for example, it is necessary to introduce a human dimension into text understanding and representation. Expectations for modeling the purely subjective, human dimensions of text and data understanding are high, on both the users’ and providers’ sides. Here we experiment with qualitative analysis of affect-related information in free text. Affect-related information includes words describing emotions like fear, anger, love, joy, and sorrow; feelings like warmth and excitement; attitudes like helpful, friendly, hostile; and other related categories such as temperament, humor, frame of mind, mood, spirits, morale, and disposition (including words like sweet, farce, wary, sanguine, depressed, eagerness, selfish).

Our reasons for selecting this particular domain are threefold. First, affect-related information is pervasive in electronic documents: in news stories (on politics, competitive sports, etc.), economic reports (corporate acquisitions, investor reactions), customer-oriented Internet sites (eBay.com, amazon.com), newsgroups (misc.consumer), corporate customer service, email (customer complaints, questions, opinions), artistic and cultural material (movie and art reviews), etc. Second, affect information is critical to human communication: recent work shows the importance of emotions in decision-making, perception and learning [8]. And finally, we believe that conclusions with respect to affect are extensible to other types of subjective information, such as flavors, styles, motivations, and perceptions, in general. Potential applications for qualitative text mining technology are completely open ended. This paper is an extended version of an earlier paper [9]. Here, we present our work in more detail and present more application examples.

Analyzing affect in a text presents us with two obvious sources of ambiguity and imprecision: the first being emotions themselves and the second, words in a natural language [10]. Rather than attempting to constrain and limit this ambiguity, we have taken the opposite approach. We explicitly represent and process ambiguity by introducing fuzzy logic into the picture. Specifically, we integrate basic techniques from fuzzy logic and from computing with words [13] with techniques from natural language processing (NLP). This work is also closely related to the recent work on representing and manipulating perceptions [14]. Since the central technique we use from NLP is semantic typing [7], we refer to this approach as fuzzy semantic typing.

The fuzzy semantic typing approach is general in scope and can be applied to many different kinds of analysis. We illustrate its use in analyzing affect. At the most basic level, it involves:
1) isolating a vocabulary of words belonging to a meta-linguistic domain (here, affect or emotion);
2) using multiple categorizations and scalar metrics to represent the meaning of each word in that domain;
3) computing profiles for texts based on the categorizations and scores of their component domain words;
4) manipulating the profiles to visualize the texts.

We take a multi-faceted approach to representing and manipulating qualitative information. We begin with an affect lexicon, which characterizes a large vocabulary of affect words in terms of a small set of basic categories, such as love, hate, happiness, and anger, each to some numerical degree. The categories from the affect lexicon constitute semantic tags, which are associated with words within a broad semantic domain.

In the past, semantic tagging has generally been used for thematic role assignment [3] or for word sense disambiguation (WSD) [4]. Our approach is similar to the standard WSD approach in that the lexicon entry for an ambiguous word represents all of its possible meanings. However, where WSD requires selecting a single meaning for the word in context, our
system simply assigns them all, exploiting rather than reducing the word’s ambiguity. For any relevant word that appears in a text, we include all of its possible meanings and connotations in our analysis of that text, and we depend on the associated numerical weightings and the cumulative effect of related vocabulary to create a realistic picture of the text’s affective content.

Semantic treatments of lexical ambiguity are more typically components of machine translation than of information retrieval (IR) or filtering systems, although there is some evidence that ambiguity resolution can improve performance in IR [6]. Our approach is unusual in integrating lexico-semantic tags into a general-purpose text management system, capable of IR, filtering, categorization, and potentially other text management functions.

We have dealt with ambiguity by allowing a single lexicon entry (domain word) to belong to multiple semantic categories. Imprecision is handled, not only via multiple category assignments, but also by allowing degrees of relatedness (centrality) between lexicon entries and their various categories. Gleeful, for example, is assigned to both happiness and excitement, but is given a higher centrality score in the happiness category. In addition to centralities, lexicon entries are also assigned numerical intensities, which represent the strength of the affect level described by that word. Thus, for example, abhorrent and distasteful have roughly the same centrality on the repulsion scale, but abhorrent receives a higher intensity.

After the affect words in a document are tagged, the fuzzy logic part of the system handles them by using fuzzy combination operators, set extension operators, and a fuzzy thesaurus to analyze fuzzy sets representing affects. Fuzzy techniques provide an excellent framework for computational management of the ambiguity and imprecision that are pervasive in the words of a natural language. There are additional reasons why fuzzy logic is a good choice for text management. First, eliminating ambiguity and imprecision from texts is unnatural and leads to misconceptions about the underlying meaning. When properly understood and managed, ambiguity and imprecision lead to enhanced, more concrete and precise representations than other (e.g., statistical) methods used for text analysis. This is especially true in the case of qualitative analysis, when we are interested in what essential features are present in some content. And second, since their emphasis is qualitative, fuzzy techniques are more appealing to humans, who tend to think qualitatively; this makes them good candidates for any human-friendly application. The appeal of fuzzy analysis will become apparent when we show visualizations of affect sets later in this document.

Besides the fuzzy affect lexicon, we generate additional resources for enhanced functionality. A fuzzy thesaurus is generated from the affect lexicon and used to expand affect sets; affect category groups are generated by clustering the fuzzy thesaurus to enable easier visualization, navigation, and browsing for the user. We provide a detailed explanation of these resources and the ways in which we generate them, with several practical examples, in Section II.

A primary representation vehicle in our system is a set of fuzzy semantic categories (affect categories) followed by their centralities and/or intensities, called the affect set. An affect set with attached centralities is always treated as a pure fuzzy set, and all fuzzy techniques applicable to fuzzy sets are applied to such affect sets. The handling of affect sets with intensities is different and more statistical, since intensities represent less ambiguous, more quantitative features of the text. Affect sets and the fuzzy semantic typing technique are presented in Section III.

Finally, visualization is a very important issue in our system. It demonstrates the real power of fuzzy semantic typing by presenting a concise, to-the-point, qualitative representation of affect in texts. Such visualizations constitute an excellent tool for decision making. We show some interesting visualization samples of affect sets for movies and news articles in Section IV. In Section V, we illustrate computational applications of the approach: retrieval of phrases whose affect content is similar to that of a given phrase and filtering of phrases based on a pre-set intensity threshold. In Section VI, we discuss ideas for further development of the system and in Section VII we summarize the paper and its conclusions.

II. LINGUISTIC RESOURCES

This section describes the linguistic resources of the fuzzy typing system: the affect lexicon, the fuzzy thesaurus and the affect category groups.

A. Affect Lexicon

The affect lexicon is a compendium of lexical entries for affect words, with their corresponding parts of speech, affect categories, centralities, and intensities. Affect words were gathered from several sources. We began with an existing affect wordlist collected from newspaper articles by Mark Kantrowitz of Justsystem Pittsburgh Research Center. We converted the list to a new format and supplemented it rather haphazardly from an on-line thesaurus. We are currently experimenting with a more systematic, semi-automatic collection method based on WordNet [2], which we hope will prove both scalable and transferable.

An affect word is any word having an affect-related meaning or connotation: e.g., abhor, abusive, amity, apprehend, arrogance, etc. Any given affect word may have multiple entries in the affect lexicon, differing by part of speech value and/or category. The expressions of interest are a superset of simple "emotion" words: they include emotions (happiness), feelings (desire), attitudes (resentful), temperament (good-natured), humor (hilarious), frame of mind (cheerful), mood (sulk), spirits (morale), and disposition (sunny). We represent an affect word’s meaning by associating the word with one or more affect categories, from our initial inventory of 83 "atomic" affects. A relatively straightforward affect word, like terror, will be associated with a single affect category. An affect word with a more complex meaning will be assigned to multiple categories; for example, infatuation is assigned to both love and insanity. An ambiguous word is simply assigned to all the categories necessary to capture its various meanings: e.g., mad has three entries, associating it with insanity in its first sense and with irritation and anger in its second. All lexicon entries are root forms; forms in the text are part-of-speech tagged and stemmed before lookup.
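The "assign all meanings" policy described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `ENTRIES`, `build_index`, and `categories_for` are hypothetical, the category assignments for infatuation and mad follow the text, and the category given to terror is an assumption.

```python
from collections import defaultdict

# Hypothetical mini-lexicon of (root form, affect category) pairs.
# "infatuation" and "mad" follow the text; "terror" -> "fear" is an assumption.
ENTRIES = [
    ("terror", "fear"),
    ("infatuation", "love"),
    ("infatuation", "insanity"),
    ("mad", "insanity"),
    ("mad", "irritation"),
    ("mad", "anger"),
]


def build_index(entries):
    """Group category assignments by root form, preserving entry order."""
    index = defaultdict(list)
    for root, category in entries:
        index[root].append(category)
    return dict(index)


def categories_for(root, index):
    """Return every category of a root form; no single sense is selected."""
    return list(index.get(root, []))


INDEX = build_index(ENTRIES)
print(categories_for("mad", INDEX))
```

Unlike a word sense disambiguation step, the lookup returns the whole list, so downstream combination sees every possible reading of an ambiguous word.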
SUBASIC AND HUETTNER: AFFECT ANALYSIS OF TEXT USING FUZZY SEMANTIC TYPING 485

Entries in the affect lexicon are of this form

    <lexical entry> <part of speech tag> <affect category> <centrality> <intensity>

as in [example entry not preserved].

Lexical entry is a single entry for a word that has an affectual connotation or denotes an affect directly. At present, our fuzzy affect lexicon contains 3876 lexical entries, about half of what we plan.

Part of speech tag. Since ambiguity sometimes depends on a word’s part of speech (and since NLP allows us to differentiate parts of speech in documents), we have included POS information for lexicon entries. For example, the word alert has different category assignments associated with different POS values

    [entries for alert: adjective, category intelligence; verb, category warning; degrees not preserved]

That is, the adjective alert means quick to perceive and act (a kind of intelligence), while the verb alert means to call to a state of readiness (a kind of warning).

A word’s POS can affect its centrality or intensity values as well as its category assignment. For example, lexicon entries with POS, categories, and centrality degrees for the word craze include

    [entries for craze: verb, category insanity, centrality 0.8; singular noun, category insanity, centrality 0.5]

That is, the verb craze belongs to affect category insanity with a degree of 0.8; the singular noun craze belongs to the same category with a degree of 0.5. This reflects the fact that the verb craze means to make insane or as if insane (very central to the insanity category!), while the noun craze means an exaggerated and often transient enthusiasm; i.e., it belongs to insanity only in a less central, more metaphorical sense.

Affect category. Many of our categories have strayed somewhat from the strictly affect domain: for example, deprivation, health, and intelligence are only marginally affects, and death, destruction and justice are not affects at all. Such categories have been created in cases where (a) some significant portion of an affect word’s meaning cannot be captured using pure affect categories; and (b) the same meaning component recurred again and again in the vocabulary we were trying to handle. For example, a word like corpse certainly entails some affect, and can plausibly be assigned to categories sadness and horror; at the same time, a part of its meaning is obviously being missed by those categorizations. Moreover, words like assassination, cyanide, execute, funeral, genocide, and homicidal share this missing meaning component. On this first pass, we have gone ahead and created extra, not-strictly-affect categories to handle such words; in the future, when we review and revise the category inventory, we may rethink this decision.

At present, there are 83 affect categories. Each affect category has an explicit opposite, with three exceptions: death, irritation and crime. Affect words are spread unevenly across affect categories, with the least frequent categories being health, sickness and facilitation (only 0.35% of entries), and most frequent being conflict and violence (3.7% of entries). The complete list of affect categories with their opposites is given in Appendix A.

Centrality. Centrality degrees range from 0 to 1 by increments of 0.1. A word that belongs to several affect categories will generally have different centralities from category to category, as in this example

    [entries for emasculate: weakness, centrality 0.7; lack, centrality 0.4; violence, centrality 0.3]

That is, the element of weakness is fairly central to the word emasculate (a rating of 0.7); the notion of a specific lack is also present but less central (rating of 0.4); and an additional element of violence is possible but not really necessary (rating of 0.3).

In assigning centrality, typical questions the lexicon developer should answer for each entry and affect category include: To what extent is affect word X related to category C? To what extent does affect word X co-occur with category C? To what extent can affect word X be replaced with category C in the text, without changing the meaning?

Since centralities indicate the presence of certain qualities (represented by appropriate affect categories) in a given affect word, centrality computations are handled as computations of fuzzy membership degrees.

Intensity. In addition to centralities, lexicon entries are assigned numerical intensities, which represent the strength of the affect level described by that entry. Intensity degrees, like centrality degrees, range from 0 to 1 by increments of 0.1. Here are some examples (the second number represents the intensity)

    [repulsion entries for abhor, contempt, aversion, displeasure, and fat; values not preserved except fat: centrality 0.2, intensity 0.1]

All of these words have some element or connotation of repulsion. A word like abhor expresses very intense repulsion (as well as being very central to the concept of repulsion); contempt, aversion, and displeasure are progressively less intense on the repulsion scale. A word like fat (which is not at all central to the repulsion concept, as expressed by its low centrality of 0.2, but which has some slight overtones of repulsion to many Americans) is an objective description, hence, hardly an affect word at all. This is reflected in its low intensity score of 0.1. In general, scores below 0.4 on both scales tend to be the most subjective and notional, since it is easier to rate prominent qualities than backgrounded ones.

A word that belongs to several affect categories will generally have different intensities from category to category, as in this example

    [entries for avenge in categories conflict, violence, and justice; values not preserved]

That is, avenge is a high-intensity conflict word, but only a moderate-intensity word with respect to violence; its intensity rating for justice is somewhere in between.
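The five entry fields and the 0.1 grid on which centralities and intensities are rated suggest a simple record type. The following is a sketch under stated assumptions rather than the lexicon's actual format: `LexiconEntry`, `on_grid`, and `lookup` are hypothetical names, the POS labels are illustrative, and the intensities given for craze are placeholders, since only its centralities appear in the text. The values for fat are the ones quoted above.

```python
from dataclasses import dataclass


def on_grid(value: float) -> bool:
    """True if value lies in [0, 1] and is a multiple of 0.1 (up to rounding)."""
    scaled = value * 10
    return 0.0 <= value <= 1.0 and abs(scaled - round(scaled)) < 1e-9


@dataclass(frozen=True)
class LexiconEntry:
    word: str          # root form
    pos: str           # part-of-speech label (labels here are illustrative)
    category: str      # affect category
    centrality: float  # fuzzy membership degree of the word in the category
    intensity: float   # strength of the affect expressed by the word

    def __post_init__(self) -> None:
        # Reject degrees that fall outside the 0..1 scale or off the 0.1 grid.
        for field_name in ("centrality", "intensity"):
            value = getattr(self, field_name)
            if not on_grid(value):
                raise ValueError(f"{field_name}={value} is not on the 0.1 grid in [0, 1]")


ENTRIES = [
    LexiconEntry("craze", "verb", "insanity", 0.8, 0.5),  # intensity is a placeholder
    LexiconEntry("craze", "noun", "insanity", 0.5, 0.5),  # intensity is a placeholder
    LexiconEntry("fat", "adj", "repulsion", 0.2, 0.1),    # values quoted in the text
]


def lookup(word, pos, entries=ENTRIES):
    """Return all entries for a root form with the given part of speech."""
    return [e for e in entries if e.word == word and e.pos == pos]


print(lookup("craze", "verb"))
```

Keying the lookup on both word and POS reproduces the craze example, where the verb and the noun share a category but differ in centrality.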

Assigning category labels and membership degrees to lexicon entries is a very subjective process. During the present proof-of-concept phase, the assignments have been made by a single linguist. They are obviously influenced by the linguist’s own experience, reading background, and (since affects are in question) personal/emotional background and prejudices. Though subjective, the process is not completely arbitrary: the assignments are general enough in the main to yield useful results. Ideally, however, we would like to involve additional linguists, to review and refine the inventory of atomic categories and to ensure some consensus on the representation of difficult items. In a finished system, repeated iterations and use of additional profiles or personal lexicons will allow the individual user to fine-tune membership degrees and accommodate his or her own subjective criteria.

B. Fuzzy Thesaurus

The fuzzy thesaurus establishes relationships between pairs of affect categories, based on the centralities of items assigned to both categories in the lexicon. It contains entries of the form

    <affect category 1>, <affect category 2>, <relationship degree>

as in [example entry not preserved], arranged in a matrix. When the relationship degree is equal to 0, no entry is recorded in the fuzzy thesaurus. When the relationship degree is equal to 1.0 we say that we have discovered affectual synonyms, as in [example entry not preserved]. Non-synonymous pairs having entries in the matrix are related to some specified degree.

The fuzzy thesaurus is generated by the system from the affect lexicon. It is generated using max-min composition [12]

$$R(AC_i, AC_j) = \max_{A} \min\bigl(C_A(AC_i),\, C_A(AC_j)\bigr) \tag{1}$$

where $AC_i$, $AC_j$ are affect categories whose relationship degree $R(AC_i, AC_j)$ we want to compute and $C_A(AC_i)$, $C_A(AC_j)$ represent the centralities of affect categories $AC_i$, $AC_j$ with respect to affect $A$. $C_A(AC_i)$, $C_A(AC_j)$ are taken directly from the affect lexicon.

The fuzzy thesaurus is primarily used for expansion of affect sets. For example, a single affect category humor with a centrality of 0.7 is expanded using the similarity class for humor from the fuzzy thesaurus. Using max-min composition, we expand this affect intensity profile as follows

$$\{\text{humor}/0.7\} \circ \{\text{humor}/r_h,\ \text{excitement}/r_e,\ \text{intelligence}/r_i\} = \{\text{humor}/\min(0.7, r_h),\ \text{excitement}/\min(0.7, r_e),\ \text{intelligence}/\min(0.7, r_i)\} \tag{2}$$

where $\circ$ represents the composition operator and $r_h$, $r_e$, $r_i$ stand for the relationship degrees in the similarity class for humor, whose numeric values are not preserved here.

Since it is difficult to modify the affect intensity set consistently to reflect the changes in the affect centrality set, we leave it to the user to accommodate the intensities of the added categories for his/her particular purposes.

Expansion increases the number of categories and the level of detail, as shown in Section IV.

Note that, since the fuzzy thesaurus is generated from the affect lexicon, it must be recomputed and re-generated whenever the affect lexicon changes. For efficiency, only those entries directly affected by a change are recomputed.

C. Affect Category Groups

Affect category groups are generated automatically by clustering the fuzzy thesaurus. In this process, affect categories with high degrees of similarity (as defined in the fuzzy thesaurus) are grouped together. For example, we find that love, attraction, happiness, desire and pleasure form one affect category group, while repulsion, horror, inferiority and pain form another. If the automatically-created groups are not as intuitively natural as these examples, the user can edit them.

Affect category group ACG is a set of affect categories $AC_1, \ldots, AC_n$ such that

$$R(AC_i, AC_j) \ge T \tag{3}$$

where $T$ is a user-set threshold. It is worth noting that ACG is not a similarity relation in the sense of Zadeh [12], since it lacks the transitivity property. The lack of transitivity is a direct consequence of the fact that the ACG is generated from the fuzzy thesaurus, which is in turn generated from the affect lexicon. Transitivity can be enforced by computing the missing relationship degrees from the existing ones. However, we prefer to keep the original relationship degrees intact, in order to ensure the proper interpretation of the original affect category assignments reflected in the affect lexicon.

Affect category groups can be used for more efficient groupings of affect categories in visualization charts. An example is shown in Section IV.

Since the affect category groups are computed from the fuzzy thesaurus, each time the fuzzy thesaurus is changed, the affect category groups must be recomputed. This is a computationally inexpensive operation, since the number of affect categories involved is typically small: in our prototype, there are only 83 categories.

III. FUZZY SEMANTIC TYPING

Fuzzy semantic typing is the process in which domain words from a document are identified. The words are assigned meta-information from the typing lexicon in the form of semantic categories and associated degrees; and the categories and degrees are combined to yield the overall representation of the document’s content. In this section, we describe in detail the steps in this process for affect analysis.

A. Affect Sets

A central construct in our affect analysis is the affect set. It comprises the set of unique affect categories from a given text,
with attached centralities and intensities. The following sections discuss the generation of an affect set for a general document.

B. Tagging of Free Text

The process for tagging a document with an affect set is shown in Fig. 1. It includes the following steps.

1) Normalization and Tagging:

1) The document is parsed and tokens (individual words) are generated one at a time.
2) Each token is normalized using normalization rules for the English language, shown as Grammar in Fig. 1.
3) The normalized tokens are looked up in the affect lexicon. If a token has one or several lexicon entries, we retrieve all affect categories with their associated centrality and intensity scores.

Using this algorithm, we generate the initial affect set for each document.

Fig. 1. Generation of the document affect set, a fuzzy set representing affective content of a document.

As an example, consider a simple document consisting of this sentence:

    His first film, Un Chien Andalou (1928), co-directed by Salvador Dali, caused an uproar (he filled his pockets with stones, he wrote in his autobiography, so he would have something to throw if the audience attacked him).

This document is tagged with [tag set not preserved] on the basis of its two affect words, uproar and attacked. Note that, since the word attacked belongs to both the affect categories violence and conflict, both categories are included as document tags.

2) Combination of Centralities and Intensities and Document Affect Set: The following algorithm describes how to reduce the initial affect set by combining the centralities and intensities of recurring categories.

1) For each affect category that appears in the tagging set:
   a) Compute the maximal centrality (fuzzy union) of all centralities attached to that affect category in the tagged document. The result is the centrality of that category for the document as a whole.
   b) Compute the average intensity of all intensities attached to that affect category in the tagged document. The result is the intensity of that category for the document as a whole.
2) Counts of affect categories are combined with intensities using simple averaging to yield the overall intensity score for the document.

As an example, consider the following document:

    Luis Bunuel’s The Exterminating Angel (1962) is a macabre comedy, a mordant view of human nature that suggests we harbor savage instincts and unspeakable secrets. Take a group of prosperous dinner guests and pen them up long enough, he suggests, and they’ll turn on one another like rats in an overpopulation study. Bunuel begins with small, alarming portents. The cook and the servants suddenly put on their coats and escape, just as the dinner guests are arriving. The hostess is furious; she planned an after-dinner entertainment involving a bear and two sheep. Now it will have to be canceled. It is typical of Bunuel that such surrealistic touches are dropped in without comment. The dinner party is a success. The guests whisper slanders about each other, their eyes playing across the faces of their fellow guests with greed, lust and envy. After dinner, they stroll into the drawing room, where we glimpse a woman’s purse, filled with chicken feathers and rooster claws.

The output produced after fuzzy semantic tagging is given in Table I.

We combine recurring affect categories into a set of unique tags, with centralities and intensities that accurately reflect the
overall document content. For this purpose, we discard the original affect words and the POS information, and combine the intensities and centralities of the remaining affect categories.

TABLE I
ENTRIES AND ASSOCIATED AFFECT CATEGORIES WITH CENTRALITIES AND INTENSITIES FOR THE MOVIE REVIEW OF EXTERMINATING ANGEL

Intensities and centralities are handled differently, since they represent different types of information. Centrality indicates the purity of a quality represented by an affect category. Intensity indicates the strength of that quality. Thus, the number of occurrences of a particular affect category in the document does not affect its centrality, but does affect its intensity. Centrality, as the purity of a quality, depends on the maximal centrality over all instances of that affect category in a particular document. That is to say, the maximal purity of the quality in the document already implies vaguer or more diluted degrees of that quality and, therefore, is appropriate as the combined centrality/purity for that category. The appropriate operation here is thus fuzzy union. On the other hand, the more times an affect category is present in the document and the higher the intensities of its instances, the higher will be the combined intensity/strength attached to it. We, therefore, compute the intensity attached to an affect category as the average of all intensities attached to instances of that category. We believe this model is closer to how humans perceive intensities of words as they read. For example, if we encounter only one instance of a “strong” word (for example, greed in the desire category), together with many “weak” words (for example, wish, want, prefer), the “weak” words will influence the intensity proportionally and reduce the overall effect of the “strong” word. This is why we compute the average rather than some other function, such as the maximum: a maximum would imply that the intensity of the “strongest” word is the intensity for the whole article, which is not what we experience while reading. Another plausible approach would be to weight intensities depending on their position in the text, giving heavier weight to words near the beginning of the document, or belonging to highly exposed parts of the document, such as the title or abstract.

After computing centralities using fuzzy union and arranging elements so that the elements with higher membership degrees (centralities) are at the front of the fuzzy set, we have

$$\begin{aligned}
&\{\text{violence, humor, warning, anger, success, slander, greed}\},\ \{\text{horror, aversion}\},\ \{\text{absurdity, excitement, desire}\},\\
&\{\text{pleasure, promise, surfeit}\},\ \{\text{repulsion, fear}\},\ \{\text{lack, death, slyness, intelligence, deception, insanity}\},\\
&\{\text{clarity, innocence, inferiority}\},\ \{\text{pain, disloyalty, failure, creation, surprise}\}
\end{aligned} \tag{4}$$

This representation of the fuzzy set of affect categories lets us easily spot predominant qualities of affect categories in the document. The meaning of this affect category set is that the document has a high degree of violence, humor, warning, anger, success, slander, greed, horror, aversion, absurdity, excitement, desire, pleasure, promise and surfeit; a medium degree of repulsion, fear, lack, death, slyness, intelligence, deception, insanity, clarity, innocence and inferiority; and a low degree of pain, disloyalty, failure, creation and surprise.

To compute the overall intensity we use a simple average over all affect category instances and their respective intensities

$$I(D) = \frac{1}{N} \sum_{j=1}^{N} I(AC_j) \tag{5}$$

where
$I(D)$ is the overall intensity of a document $D$;
$N$ is the total number of affect category instances in the document $D$;
$I(AC_j)$ is the intensity of an affect category instance $AC_j$.

For the example document, overall intensity is 0.6.
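The combination step can be sketched as follows: maximal centrality (fuzzy union) and mean intensity per category, plus the overall intensity as a plain average over all category instances. This is a minimal sketch, not the system's code; `combine` and `overall_intensity` are hypothetical names, and the tagged instances are invented values, not rows of Table I.

```python
from collections import defaultdict
from statistics import fmean


def combine(instances):
    """Reduce tagged instances to one (centrality, intensity) pair per category.

    instances: iterable of (category, centrality, intensity) tuples,
    one per lexicon entry retrieved while tagging the document.
    """
    by_category = defaultdict(list)
    for category, centrality, intensity in instances:
        by_category[category].append((centrality, intensity))

    affect_set = {}
    for category, pairs in by_category.items():
        max_centrality = max(c for c, _ in pairs)        # fuzzy union
        mean_intensity = fmean([i for _, i in pairs])    # simple average
        affect_set[category] = (max_centrality, mean_intensity)
    return affect_set


def overall_intensity(instances):
    """Average intensity over all affect category instances in the document."""
    return fmean([intensity for _, _, intensity in instances])


# Invented tagging output for illustration only.
TAGGED = [
    ("violence", 0.9, 0.8),
    ("violence", 0.6, 0.4),
    ("humor", 1.0, 0.6),
]

print(combine(TAGGED))
print(overall_intensity(TAGGED))
```

Because centrality uses a maximum, repeating a category never changes its centrality, whereas every additional instance shifts the averaged intensity, which mirrors the distinction drawn in the text.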

Fig. 2. Centralities with positive connotations up are shown separately from those whose connotations are negative. Affect categories are arranged into similar
groups around the circle.
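The grouping of similar affect categories mentioned in this caption comes from thresholding the fuzzy thesaurus (Section II-C). The sketch below shows one possible reading, in which a category joins a group only if its relationship degree to every current member reaches the user-set threshold. The greedy strategy, the function names, and all relationship degrees are assumptions made for illustration; the source does not specify the clustering procedure.

```python
def relation(thesaurus, a, b):
    """Symmetric lookup of a relationship degree; unrecorded pairs count as 0."""
    if a == b:
        return 1.0  # assumption: a category is fully related to itself
    return thesaurus.get((a, b), thesaurus.get((b, a), 0.0))


def group_categories(categories, thesaurus, threshold):
    """Greedily place each category into the first group it fits entirely."""
    groups = []
    for category in categories:
        for group in groups:
            if all(relation(thesaurus, category, member) >= threshold
                   for member in group):
                group.append(category)
                break
        else:
            groups.append([category])  # no compatible group: start a new one
    return groups


# Invented relationship degrees for illustration only.
THESAURUS = {
    ("love", "attraction"): 0.8,
    ("love", "happiness"): 0.7,
    ("attraction", "happiness"): 0.6,
    ("repulsion", "horror"): 0.7,
    ("love", "repulsion"): 0.1,
}

GROUPS = group_categories(
    ["love", "attraction", "happiness", "repulsion", "horror"],
    THESAURUS,
    threshold=0.6,
)
print(GROUPS)
```

Requiring every pair inside a group to pass the threshold sidesteps the missing transitivity of the thesaurus relation, at the cost of making the result depend on the order in which categories are visited.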

Fig. 3. When all centralities from Fig. 2 are expanded using the fuzzy thesaurus, we get a greater level of detail. Note that additional affect categories exist in the
new chart. Centrality values are omitted for greater legibility (scale 0–1).
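The expansion referred to in this caption relies on the fuzzy thesaurus of Section II-B. A minimal sketch of both steps, assuming a lexicon stored as a mapping from word to category centralities: the thesaurus degree of two categories is the maximum, over words assigned to both, of the smaller of the two centralities, and an affect set is expanded by taking the minimum of a category's centrality and its relationship degree to each related category. The function names and all numeric values are hypothetical, and keeping the original centrality of categories already in the set is an assumption.

```python
from itertools import combinations


def build_thesaurus(lexicon):
    """Max-min composition over a lexicon of {word: {category: centrality}}."""
    thesaurus = {}
    for centralities in lexicon.values():
        for a, b in combinations(sorted(centralities), 2):
            degree = min(centralities[a], centralities[b])
            # Keep the maximum over all words; zero degrees are never recorded.
            if degree > thesaurus.get((a, b), 0.0):
                thesaurus[(a, b)] = degree
    return thesaurus


def expand(affect_set, thesaurus):
    """Add related categories with degree min(source centrality, relation)."""
    expanded = dict(affect_set)  # assumption: original centralities are kept
    for (a, b), degree in thesaurus.items():
        for source, target in ((a, b), (b, a)):
            if source in affect_set:
                candidate = min(affect_set[source], degree)
                if candidate > expanded.get(target, 0.0):
                    expanded[target] = candidate
    return expanded


# Invented lexicon fragment for illustration only.
LEXICON = {
    "witty": {"humor": 0.9, "intelligence": 0.6},
    "hilarious": {"humor": 1.0, "excitement": 0.5},
    "clever": {"intelligence": 0.9, "humor": 0.3},
}

THESAURUS = build_thesaurus(LEXICON)
print(expand({"humor": 0.7}, THESAURUS))
```

With these invented values, a document tagged only with humor at 0.7 also acquires intelligence and excitement, which is the kind of added detail the expanded chart displays.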

Overall intensity is used to detect documents with offensive IV. AFFECT SET VISUALIZATION
content. For example, high overall intensity (0.7) in com-
bination with the specific centrality profile distaste An interesting and important area related to the fuzzy typing
violence pain may indicate offensive and work is visualization of the results. We have developed a simple
undesirable content. affect tagging and affect profile browsing application called the
490 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 9, NO. 4, AUGUST 2001

Fig. 4. News report on train crash in London, October 1999. Centralities describe quality, and intensities, quantity (count and strength) of the affects in a document.
As might be expected, the affects fear, harm, pain and surprise are most central. The most intense affects are conflict, confusion, disadvantage and pain.

Fig. 5. Profile of a recent cult movie Matrix generated from a movie review. Opposite affect categories are placed on opposite sides of the circle (the left side
shows negative affects, the right side, positive). With a well-developed negative side, this movie is deservedly rated “R” in the US.

Affect Inspector. We show a basic browsing setup from that application in Appendix B.
Each affect category’s centralities and intensities can be represented as a point on the perimeter of a unit circle. Centralities and intensities can then be visualized, as shown in Figs. 2–8. In order to demonstrate various ways these charts can be organized, we show different charts for different information objects.
In Fig. 2 we show centralities with positive affects separated from centralities with mainly negative affects, for the movie review of Exterminating Angel. In this way, the positive versus the negative side of the document can be easily analyzed. Moreover, groups of similar affect categories are shown close to each other. This facilitates a quick overview of aspects denoted by those groups. For example, in Fig. 2, clarity, creation and intelligence are not as well developed as success, desire, pleasure, humor and excitement. Groups of similar affect categories are generated using the technique discussed in Section II-C.
When all affect categories from Fig. 2 are expanded using the fuzzy thesaurus, we obtain the chart in Fig. 3. The chart contains both positive and negative affects, with a higher level of detail, since new affect categories have been added to the chart through expansion.
In Fig. 4 we show the affect structure of a news report concerning a train crash in London. We show both centralities and intensities for qualities typical of news on accidents.
Opposing affect categories can be placed on opposite sides of the circle with respect to the center point. This is illustrated in Fig. 5, for centralities in a Matrix review. With this arrangement, we can easily spot which part of the circle is best developed and understand its affective impact.
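One possible way to compute such an opposite-placement layout is sketched below in Python. It is an assumption-laden illustration rather than the layout used by the Affect Inspector: the function name, the choice to spread the first member of each pair over the right half of the circle, and the sample category pairs and values are all hypothetical. Each category is drawn at a radius equal to its centrality, and its opposite is placed at the same angle plus 180 degrees.

```python
import math


def place_on_circle(opposite_pairs, centralities):
    """Return {category: (x, y)} with each category opposite its antonym.

    opposite_pairs: list of (category, opposite_category) tuples.
    centralities:   {category: centrality in [0, 1]}, used as the radius.
    The first member of each pair lands on the right half of the circle,
    the second member diametrically across on the left half.
    """
    if not opposite_pairs:
        return {}
    points = {}
    step = math.pi / len(opposite_pairs)
    for k, (category, opposite) in enumerate(opposite_pairs):
        # Angles for the first members are spread over (-pi/2, pi/2).
        angle = -math.pi / 2 + (k + 0.5) * step
        for name, theta in ((category, angle), (opposite, angle + math.pi)):
            radius = centralities.get(name, 0.0)
            points[name] = (radius * math.cos(theta), radius * math.sin(theta))
    return points


# Hypothetical pairs and centralities, for illustration only.
pairs = [("love", "hate"), ("pleasure", "pain")]
values = {"love": 1.0, "hate": 0.5, "pleasure": 0.8, "pain": 0.4}
points = place_on_circle(pairs, values)
```

With this arrangement, a profile dominated by one side of the circle shows up as points clustered in one half-plane, which is the visual effect the text describes for Fig. 5.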

Fig. 6. Centralities (up) and intensities in T. S. Eliot’s famous “Rhapsody on a Windy Night.” Even the surrealistic content of this poem can be successfully
analyzed. The poem contains mixed affects of weakness, disadvantage, insanity and strength, clarity and openness. For intensities, clarity, weakness, and
inferiority seem to prevail.

Affect categories can be generated for sets of movies. We analyzed sets in the romance, action, science fiction, comedy, and family genres. Each set contains about 15 movies and some movies belong to multiple sets (for example, 12 Monkeys belongs both to the action and science fiction genres). The results are shown in Table II, along with profiles for news about accidents. The generated profiles confirm our expectations about affect categories for different movie genres: romance movies showed high levels of happiness, innocence and justice; action movies scored high on surprise, strength and slyness; family movies emphasized sensitivity, morality, responsibility and nurturance; and so on. The composite profiles also reveal some less obvious features: high levels of immorality and inferiority in the romance movie set, and high levels of inferiority and destruction in the comedies, for example. Although the results in many cases represent typical characteristics of movie genres, we believe that some aspects of the results may reflect only the selected movies within the genre.

TABLE II
CENTRALITY RESULTS BY TYPE

Next, we show that poetry can be analyzed using this approach. Fig. 6 shows centralities and intensities for a famous poem with a very accurate affect profile.
Another application of this technique is personalization. We generated affect profiles for different users based on their movie preferences and the results clearly reflect different personal tastes. The illustration in Fig. 7 shows the centrality affect profiles obtained after merging profiles for each person’s favorite movies. In merging centrality affect profiles, we use the same approach as when merging affect profiles of individual words or sentences, that is, the union (maximum) operator.
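The merging step can be illustrated with a short Python sketch. It assumes a centrality profile is stored as a dictionary from affect category to a centrality in [0, 1], with absent categories treated as 0; the function name and the two sample movie profiles are hypothetical. The merge itself is the fuzzy union named in the text, that is, the per-category maximum.

```python
def merge_profiles(profiles):
    """Fuzzy union of centrality profiles: per-category maximum.

    profiles: iterable of {category: centrality} dictionaries.
    Categories missing from a profile are treated as centrality 0.
    """
    merged = {}
    for profile in profiles:
        for category, centrality in profile.items():
            merged[category] = max(merged.get(category, 0.0), centrality)
    return merged


# Hypothetical centrality profiles of two favorite movies.
movie_a = {"humor": 0.7, "excitement": 0.4}
movie_b = {"excitement": 0.9, "violence": 0.5}

personal_profile = merge_profiles([movie_a, movie_b])
```

Because the maximum is idempotent and order-independent, adding the same movie twice or changing the order of the list does not alter the personal profile; only a movie with a higher centrality for some category can raise that category's value.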

V. COMPUTING WITH AFFECT PROFILES


To illustrate the computational potential of the fuzzy semantic
typing framework, we present here an example of retrieval based
on similarities drawn from affect categories. We illustrate the
technique on retrieval of similar phrases, but it extends equally

TABLE III
PHRASE RETRIEVAL

well to retrieval of larger portions of texts, such as sentences or paragraphs. Here is the outline of the experiment from which we produced Table III.
1) We found the affect profile of the sentence fear, anger, grief and pain filled the room.
2) We retrieved, from a precompiled list of 34 phrases drawn from four different documents, the phrases whose affect profiles were most similar to the seed phrase. As a similarity measure between the affect profile $A$ of the seed phrase and the affect profile $B$ of a candidate phrase, we use [15]

$$S(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (6)$$

where $|A \cap B|$ represents the sum of the intersections (minimum values) of the affect sets’ centralities for the respective affect categories and $|A \cup B|$ represents the sum of the unions (maximum values) of the affect sets’ centralities for the respective affect categories.
We found phrases as presented in Table III, ordered by decreasing scores for similarity to the seed phrase. Each phrase is shown with its respective affect centrality and intensity profiles, similarity degrees, and average intensities.
This retrieval-by-similarity technique can be complemented with the filtering-on-intensity technique. For example, we can set the intensity threshold to a certain category’s intensity, and filter out all expressions whose intensity for that category is greater than the threshold intensity. Alternatively, we can set the threshold on overall intensity and filter out all documents with intensities higher than the given threshold.
Such similarity computations can be conveniently combined with statistical methods. For short phrases like those shown here, the similarity measure we used is very convenient since it reflects the overlapping of qualitative features (affect categories) without taking into account statistical features like number of words or categories. In certain cases, it may be reasonable to use a vector space model with cosine similarity to find similar phrases. One must be careful, however, as that approach will not always return the desired qualitative profile. To illustrate this point, let us imagine that we have submitted the query love, pain to a text corpus and that we are searching for sentences that have both qualities, i.e., sentences that match both love and pain. However, vector space retrieval would return sentences containing many instances of love with a high score, even if there is no mention of pain in them. Our similarity computation, on the other hand, gives preference to documents having higher-level maximal centralities for both categories, regardless of their total count in the sentence. This example clearly illustrates the main differences in the quantitative vs. the qualitative approach to love and pain. In the quantitative approach, we are concerned with finding information that is statistically similar to our query. In the qualitative approach, we are more concerned with a particular qualitative profile of the target information.

Fig. 7. Affect profiles generated from personal movie preferences. Each person submitted from 10 to 81 favorite movies. Individual movie profiles are merged using the maximum operator on centralities to yield the personal profiles in this figure.

Fig. 8. Basic browsing setup in TreeViewer. The tree view at upper right shows a browsable text hierarchy; the text pane at lower right displays analyzed text. Affect centrality and affect intensity sets appear on the left. Circles represent average values.

VI. FURTHER DEVELOPMENT

The fuzzy semantic typing approach deals well with ambiguity and imprecision in free text. It can be effectively combined with a set of visualization tools for easy, accurate analysis of the affect content in a document. These results are promising, but we feel that we have just begun exploration in uncharted territory. Our plans in the immediate future include:
1) Enrichment of the existing affect typing mechanism. To the extent that expressions in free text and their constituent words have syntactically and semantically different roles, we are making an approximation by using fuzzy union to compute centralities. Although individual affect words are treated equally, their roles are frequently uneven. For example, modifiers (very, more or less, not), comparative and superlative adjectives, adverbial phrases, noun phrases, and complex phrases (with and/or connectives, etc.) all represent different classes of expressions with possibly different roles in a phrase or sentence. A finer-grained analysis would, therefore, lead to different centrality combination operators for each of these classes. The same holds true for the hierarchical centrality computation, but to a lesser extent, since text hierarchies are more regular and do not have as richly varying a structure. Still, for many documents, sentences and paragraphs can be assigned weights that reflect their relative importance in centrality computations. For example, since report-style prose is typically “front-loaded,” a possible approach would be to increase the centrality of categories appearing in the title, summary, or lead paragraph of such texts.
2) Support for management of linguistic resources. We would like to begin experimenting with personal (user-initiated) updating of general-purpose affect lexicons. This would include the modification of centralities and intensities attached to affect categories, the addition or removal of affect words, the definition of complex affect categories in terms of basic affect categories, and the tuning of the fuzzy thesaurus to reflect changes in the affect lexicon. It would be especially interesting to experiment with different affect lexicons on the sending and receiving sides of a communication. In such an experiment, a message composed with the help of the sender’s personal affect lexicon would be interpreted using the receiver’s personal lexicon.
3) Generalization of the fuzzy typing framework. Fuzzy typing may be adapted to many different application areas by developing appropriate resources: business, food/cooking, fashion, architectural styles, cultural events and artistic material, psychology-related material, wine and beer, perfumes.
4) Analysis of different points of view. Using an affect lexicon, we can analyze affects in a text. Using a lexicon with expressions and types that describe intentions (e.g., would like to, will, is considering, is thinking about) would give us an intention fingerprint for a document. Using both lexicons at the same time, to find a text’s affectual and intentional fingerprints, could reveal interesting juxtapositions in the data.
5) Integration with existing text management techniques. Although not currently included in our system, quasistatistical information can be generated to complement qualitative analysis. This would involve a simple extension of common statistical approaches like statistical indexing, term-weighting, IDF-TF scoring and cosine distance [5], on a full feature space (all terms and phrases), or on

TABLE IV
THE COMPLETE LIST OF AFFECT CATEGORIES AND RESPECTIVE OPPOSITE AFFECT CATEGORIES

a significantly reduced feature space consisting of a limited number of affect categories. Fuzzy typing would then be complemented with retrieval techniques applied to affect profiles: e.g., querying and retrieval, clustering and classification. We believe the best way to approach this issue is to start with basic similarity assumptions as given in [11], and further investigate techniques from [12] and [15] for computing similarities between fuzzy objects. Finally, we hope to devise an algorithm that effectively combines statistical and quasistatistical similarity scores with fuzzy similarity scores to compute the overall similarity of two texts.
6) Extension to additional languages. While the lexicon for our prototype system was created manually, we are currently experimenting with a more systematic, semi-automatic collection method based on WordNet [2]. If this technique proves successful, it could be applied to foreign language semantic nets such as EuroWordNet [1], to accommodate a variety of other languages.

VII. CONCLUSION

We describe a novel approach to text analysis that combines semantic typing techniques from natural language processing

with fuzzy techniques, under the common framework of fuzzy semantic typing. Fuzzy semantic typing is an innovative way to capture metalinguistic facts about a text while allowing for linguistic ambiguity and vagueness. From our analysis of generated profiles for news reports and movie reviews, we believe that the metalinguistic representation can be usefully applied in retrieval, clustering, and classification. The approach is applicable to an indefinite number of domains and lends itself to customization for a particular user or task. We look forward to continuing our research in these directions.

APPENDIX A

See Table IV.

APPENDIX B
AFFECT INSPECTOR APPLICATION

In Fig. 8 we show an example screen from our Affect Inspector application, illustrating the display of a profile’s centralities and intensities at different levels of the text hierarchy. The essential part of the browsing system is the TreeViewer. It is a Java/XML application that contains a text hierarchy in the form of a tree. Each node in the tree is clickable, and clicking on it displays the affect profile associated with that node in two panes on the left: the centrality profile in the upper pane, the intensity profile in the lower. The circles represent the average values for centralities (upper left) and intensities (lower left). The text associated with the node is displayed in the lower right pane. Browsing is possible at any level of the hierarchy from top to bottom: corpus, document, paragraph, sentence, affect category.
Affect profiles to the left are click-sensitive maps: by clicking on certain points, one can search on individual affects, or change the form of the current display to see opposite category placements, linear placements, or placements ordered by decreasing scores. Such arrangements facilitate easy browsing, verification of affect profiles, and comparison of affect profiles for different text elements, both vertically (affect-sentence-paragraph-document-corpus) and laterally (two or more documents, paragraphs or sentences).

REFERENCES

[1] University of Amsterdam, Dept. Computational Linguistics. EuroWordNet. (2000, May). [Online]. Available: http://www.hum.uva.nl/~ewn/
[2] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database. MIT Press, 1998.
[3] C. Fillmore and B. T. S. Atkins, “FrameNet and lexicographic relevance,” in Proc. First Int. Conf. on Language Resources and Evaluation, 1998, pp. 417–420.
[4] T. Fontanelle, “Semantic tagging: A survey,” in Papers in Computational Lexicography, COMPLEX 99, 1999, pp. 39–56.
[5] D. A. Grossman and O. Frieder, Information Retrieval Algorithms and Heuristics. Norwell, MA: Kluwer, 1998.
[6] R. Krovetz and W. B. Croft, “Lexical ambiguity and information retrieval,” ACM Trans. Inform. Syst., vol. 10, no. 2, pp. 115–141, 1992.
[7] G. Miller and C. Walter, “Contextual correlates of semantic similarity,” Language and Cognitive Processes, vol. 6, pp. 1–28, 1991.
[8] R. W. Picard, Affective Computing. Cambridge, MA: MIT Press, 1997.
[9] P. Subasic and A. Huettner, “Affect analysis of text using fuzzy semantic typing,” presented at FUZZ-IEEE 2000, The 9th International Conference on Fuzzy Systems, San Antonio, Texas, 2000.
[10] M. Sugeno, “On organization of imprecision based on word classification,” in 2nd Fuzzy System Symposium, 1986, pp. 148–153 (in Japanese).
[11] A. Tversky, “Features of similarity,” Psychological Rev., vol. 84, pp. 327–352, 1977.
[12] L. A. Zadeh, “Similarity relations and fuzzy orderings,” Inform. Sci., vol. 3, pp. 177–200, 1977.
[13] L. A. Zadeh, “Fuzzy logic = computing with words,” IEEE Trans. Fuzzy Syst., pp. 103–111, 1996.
[14] L. A. Zadeh, “A new AI: Toward computational theory of perceptions,” AAAI Magazine, vol. 22, no. 1, pp. 73–84, Spring 2001.
[15] R. Zwick, E. Carlstein, and D. V. Budescu, “Measures of similarity among fuzzy concepts: A comparative analysis,” International Journal of Approximate Reasoning, vol. 1, pp. 221–242, 1987.

Pero Subasic (M’97) obtained the Dipl. Eng. and M.S. degrees from the School of Electrical Engineering, Belgrade University, Yugoslavia, and the Dr. Eng. from Yamagata University, Japan, in 1989, 1993, and 1996, respectively.
He has been with the Institute Mihajlo Pupin, Belgrade, Yugoslavia; Belgrade University, Yugoslavia; Tohoku University, Japan; Yamagata University, Japan; and the Tokyo Institute of Technology, Japan, as a Faculty Member or Researcher. He is currently with Clairvoyance Corporation, Pittsburgh, PA, where he has worked in text mining, navigation and visualization, data analysis, preprocessing, including semantic typing and resource management. He is the Principal Inventor of the Fuzzy Semantic Typing framework. He has published over 40 research papers and reports in international journals, monographs and conferences. He is author of a book on fuzzy systems and neural networks.
Dr. Subasic is a member of ACM SIGCHI, the Japan Society for Fuzzy Systems (SOFT), and the Yugoslav Society for Soft Computing and Intelligent Systems (SOCOIS).

Alison Huettner received the Ph.D. in linguistics from the University of Massachusetts, Amherst, MA, in 1989.
In 1998, she joined Clairvoyance Corporation, Pittsburgh, PA as a Project Manager working on natural language processing (NLP). She refined the CLARIT NLP resources and experimented with extensions, including expanded lexical equivalences, semantic typing, specialized affect handling, and a prototype question-answering system; currently, she is working with e-commerce applications. Prior to joining Clairvoyance, she was a knowledge engineer with Carnegie Group, Inc., and worked on a fact-extraction system and a commercial machine translation system. She is also one of the inventors of the associated patent for an integrated authoring and translation system.
