4 Evaluation

The goal of our evaluation is somewhat unusual. DEviaNT explores a particular approach to solving the TWSS problem: recognizing euphemistic and structural relationships between the source domain and an erotic domain. As such, DEviaNT is at a disadvantage to many potential solutions because DEviaNT does not aggressively explore features specific to TWSSs (e.g., DEviaNT does not use a lexical n-gram model of the TWSS training data). Thus, the goal of our evaluation is not to outperform the baselines in all aspects, but rather to show that by using only euphemism-based and structure-based features, DEviaNT can compete with the baselines, particularly where it matters most: delivering high precision and few false positives.

4.1 Datasets

Our goals for DEviaNT's training data were to (1) include a wide range of negative samples to distinguish TWSSs from arbitrary sentences while (2) keeping negative and positive samples similar enough in language to tackle difficult cases. DE-

4.2 Baselines

Our experiments compare DEviaNT to seven other classifiers: (1) a Naïve Bayes classifier on unigram features, (2) an SVM model trained on unigram features, (3) an SVM model trained on unigram and bigram features, (4–6) MetaCost (Domingos, 1999) (see Section 3.4) versions of (1–3), and (7) a version of DEviaNT that uses just the BASIC STRUCTURE features (as a feature ablation study). The SVM models use the same parameters and kernel function as DEviaNT.

The state-of-the-practice approach to TWSS identification is a Naïve Bayes model trained on a unigram model of Twitter tweets, some tagged with #twss (VandenBos, 2011). While this was the only existing classifier we were able to find, it was not a rigorous solution to the problem: its training data were noisy, partially untaggable, and multilingual. Thus, we reimplemented this approach more rigorously as one of our baselines.

For completeness, we tested whether adding unigram features to DEviaNT improved its performance but found that it did not.
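As an illustration of the unigram Naïve Bayes baseline described above, the sketch below trains a Laplace-smoothed unigram model in plain Python. The toy corpus, labels, and smoothing constant are invented stand-ins for illustration only; they are not the paper's training data or settings.

```python
# Minimal sketch of a unigram Naive Bayes TWSS classifier.
# Toy data and smoothing constant are illustrative, not the paper's.
import math
from collections import Counter

class UnigramNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha           # Laplace smoothing constant
        self.word_counts = {}        # label -> Counter of word frequencies
        self.class_counts = Counter()
        self.vocab = set()

    def fit(self, sentences, labels):
        for sent, label in zip(sentences, labels):
            self.class_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in sent.lower().split():
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, sentence):
        words = sentence.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label), Laplace-smoothed
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            for word in words:
                score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy stand-ins for tagged training data (1 = TWSS, 0 = not).
clf = UnigramNaiveBayes().fit(
    ["it is too big to fit in there", "wow that is really hard",
     "the quarterly report is due friday", "the meeting ran long today"],
    [1, 1, 0, 0])
print(clf.predict("that is too hard"))   # -> 1
print(clf.predict("the report ran long"))  # -> 0
```

The bag-of-unigrams representation is what makes this baseline sensitive to training-data noise: every mistagged or multilingual tweet directly skews the per-word counts.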
[Figure 1: The precision-recall curves for DEviaNT and baseline classifiers on TS, TFLN, FML, and WQ. Series plotted: DEviaNT; Basic Structure; and the Unigram SVM, Bigram SVM, and Naive Bayes baselines, each with and without MetaCost.]

4.3 Results

Figure 1 shows the precision-recall curves for DEviaNT and the other seven classifiers. DEviaNT and Basic Structure achieve the highest precisions. The best competitor, Unigram SVM w/o MetaCost, has a maximum precision of 59.2%. In contrast, DEviaNT's precision is over 71.4%. Note that the addition of bigram features yields no improvement in (and can hurt) both precision and recall.

To qualitatively evaluate DEviaNT, we compared the sentences that DEviaNT, Basic Structure, and Unigram SVM w/o MetaCost are most sure are TWSSs. DEviaNT returned 28 such sentences (all tied for most likely to be a TWSS), 20 of which are true positives. However, 2 of the 8 false positives are in fact TWSSs (despite coming from the negative testing data): "Yes give me all the cream and he's gone." and "Yeah but his hole really smells sometimes." Basic Structure was most sure about 16 sentences, 11 of which are true positives; of these, 7 were also in DEviaNT's most-sure set. However, DEviaNT was also able to identify TWSSs that deal with noun euphemisms (e.g., "Don't you think these buns are a little too big for this meat?"), whereas Basic Structure could not. In contrast, Unigram SVM w/o MetaCost is most sure about 130 sentences, 77 of which are true positives. Note that while DEviaNT has a much lower recall than Unigram SVM w/o MetaCost, it accomplishes our goal of delivering high precision while tolerating low recall.

Note that DEviaNT's precision appears low in large part because the testing data is predominantly negative. If DEviaNT classified a randomly selected,

5 Contributions

We formally defined the TWSS problem, a subproblem of the double entendre problem. We then identified two characteristics of the TWSS problem: (1) TWSSs are likely to contain nouns that are euphemisms for sexually explicit nouns, and (2) TWSSs share common structure with sentences in the erotic domain. We used these characteristics to construct DEviaNT, an approach for TWSS classification. DEviaNT identifies euphemism and erotic-domain structure without relying heavily on structural features specific to TWSSs. DEviaNT delivers significantly higher precision than classifiers that use n-gram TWSS models. Our experiments indicate that euphemism and erotic-domain-structure features contribute to improving the precision of TWSS identification.

While significant future work in improving DEviaNT remains, we have identified two characteristics important to the TWSS problem and demonstrated that an approach based on these characteristics has promise. The technique of metaphorical mapping may be generalized to identify other types of double entendres and other forms of humor.

Acknowledgments

The authors wish to thank Tony Fader and Mark Yatskar for their insights and help with data, Brandon Lucia for his part in coming up with the name DEviaNT, and Luke Zettlemoyer for helpful comments. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant #DGE-0718124 and under Grant #0937060 to the Computing Research Association for the CIFellows Project.
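The point above, that precision looks low when the testing data is predominantly negative, can be made concrete with a small worked example. The numbers below are invented for illustration, not the paper's: holding a classifier's recall and false-positive rate fixed, precision falls sharply as negatives come to dominate the test set.

```python
# Worked example (illustrative numbers): precision under class imbalance.
def precision(pos, neg, recall, fp_rate):
    tp = recall * pos    # true positives found
    fp = fp_rate * neg   # negatives misclassified as positive
    return tp / (tp + fp)

# A classifier that finds 20% of positives and flags 0.5% of negatives:
balanced = precision(pos=1000, neg=1000, recall=0.20, fp_rate=0.005)
skewed = precision(pos=1000, neg=99000, recall=0.20, fp_rate=0.005)
print(round(balanced, 3))  # 0.976 on balanced test data
print(round(skewed, 3))    # 0.288 when negatives dominate 99 to 1
```

The classifier's behavior is identical in both cases; only the composition of the test set changes, which is why raw precision on a heavily negative test set understates classifier quality.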
References

Greg Daniels, Ricky Gervais, and Stephen Merchant. 2005. The Office. Television series, the National Broadcasting Company (NBC).

Pedro Domingos. 1999. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 155–164. San Diego, CA, USA.

W. Nelson Francis and Henry Kucera. 1979. A Standard Corpus of Present-Day Edited American English. Department of Linguistics, Brown University.

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explorations, 11(1).

Zachary J. Mason. 2004. CorMet: A computational, corpus-based conventional metaphor extraction system. Computational Linguistics, 30(1):23–44.

Rada Mihalcea and Stephen Pulman. 2007. Characterizing humour: An exploration of features in humorous texts. In Proceedings of the 8th Conference on Intelligent Text Processing and Computational Linguistics (CICLing07). Mexico City, Mexico.

Rada Mihalcea and Carlo Strapparava. 2005. Making computers laugh: Investigations in automatic humor recognition. In Human Language Technology Conference / Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP05). Vancouver, BC, Canada.

Bradley M. Pasanek and D. Sculley. 2008. Mining millions of metaphors. Literary and Linguistic Computing, 23(3).

Ekaterina Shutova. 2010. Automatic metaphor interpretation as a paraphrasing task. In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT10), pages 1029–1037. Los Angeles, CA, USA.

Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT03), pages 252–259. Edmonton, AB, Canada.

Kristina Toutanova and Christopher Manning. 2000. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora (EMNLP/VLC00), pages 63–71. Hong Kong, China.

Ben VandenBos. 2011. Pre-trained "that's what she said" bayes classifier. http://rubygems.org/gems/twss.