
A Survey on Hate Speech Detection using Natural Language Processing

Anna Schmidt and Michael Wiegand
Spoken Language Systems, Saarland University
D-66123 Saarbrücken, Germany
anna.schmidt@lsv.uni-saarland.de, michael.wiegand@lsv.uni-saarland.de

Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1–10, Valencia, Spain, April 3–7, 2017. © 2017 Association for Computational Linguistics

Abstract

This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.
1 Introduction

Hate speech is commonly defined as any communication that disparages a person or a group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic (Nockleby, 2000). Examples are (1)-(3).[1]

(1) Go fucking kill yourself and die already useless ugly pile of shit scumbag.
(2) The Jew Faggot Behind The Financial Collapse
(3) Hope one of those bitches falls over and breaks her leg

Due to the massive rise of user-generated web content, in particular on social media networks, the amount of hate speech is also steadily increasing. Over the past years, interest in online hate speech detection and particularly the automatization of this task has continuously grown, along with the societal impact of the phenomenon. Natural language processing focusing specifically on this phenomenon is required since basic word filters do not provide a sufficient remedy: What is considered a hate speech message might be influenced by aspects such as the domain of an utterance, its discourse context, as well as context consisting of co-occurring media objects (e.g. images, videos, audio), the exact time of posting and world events at this moment, identity of author and targeted recipient.

This paper provides a short, comprehensive and structured overview of automatic hate speech detection, and outlines the existing approaches in a systematic manner, focusing on feature extraction in particular. It is mainly aimed at NLP researchers who are new to the field of hate speech detection and want to inform themselves about the state of the art.

[1] The examples in this work are included to illustrate the severity of the hate speech problem. They are taken from actual web data and in no way reflect the opinion of the authors.

2 Terminology

In this paper we use the term hate speech. We decided in favour of using this term since it can be considered a broad umbrella term for numerous kinds of insulting user-created content addressed in the individual works we summarize in this paper. Hate speech is also the most frequently used expression for this phenomenon, and is even a legal term in several countries. Below we list other terms that are used in the NLP community. This should also help readers with finding further literature on that task.

In the earliest work on hate speech, Spertus (1997) refers to abusive messages, hostile messages or flames. More recently, many authors have shifted to employing the term cyberbullying (Xu et al., 2012; Hosseinmardi et al., 2015; Zhong et al., 2016; Van Hee et al., 2015; Dadvar et al., 2013; Dinakar et al., 2012). The actual term hate speech is used by Warner and Hirschberg (2012), Burnap and Williams (2015), Silva et al. (2016), Djuric et al. (2015), Gitari et al. (2015), Williams and Burnap (2015) and Kwok and Wang (2013). Further, Sood et al. (2012a) work on detecting (personal) insults, profanity and user posts that are characterized by malicious intent, while Razavi et al. (2010) refer to offensive language. Xiang et al. (2012) focus on vulgar language and profanity-related offensive content. Xu et al. (2012)[2] further look into jokingly formulated teasing in messages that represent (possibly less severe) bullying episodes. Finally, Burnap and Williams (2014) specifically look into othering language, characterized by an us-them dichotomy in racist communication.

[2] The data from this work are available under http://research.cs.wisc.edu/bullying
3 Features for Hate Speech Detection

As is often the case with classification-related tasks, one of the most interesting aspects distinguishing different approaches is which features are used. Hate speech detection is certainly no exception since what differentiates a hateful speech utterance from a harmless one is probably not attributable to a single class of influencing aspects. While the set of features examined in the different works greatly varies, the classification methods mainly focus on supervised learning (§6).
3.1 Simple Surface Features

For any text classification task, the most obvious information to utilize are surface-level features, such as bag of words. Indeed, unigrams and larger n-grams are included in the feature sets by a majority of authors (Chen et al., 2012; Xu et al., 2012; Warner and Hirschberg, 2012; Sood et al., 2012b; Burnap and Williams, 2015; Van Hee et al., 2015; Waseem and Hovy, 2016; Burnap and Williams, 2016; Hosseinmardi et al., 2015; Nobata et al., 2016). These features are often reported to be highly predictive. Still, in many works n-gram features are combined with a large selection of other features. For example, in their recent work, Nobata et al. (2016) report that while token and character n-gram features are the most predictive single features in their experiments, combining them with all additional features further improves performance.
Character-level n-gram features might provide a way to attenuate the spelling variation problem often faced when working with user-generated comment text. For instance, the phrase ki11 yrslef a$$hole, which is regarded as an example of hate speech, will most likely pose problems to token-based approaches since the unusual spelling variations will result in very rare or even unknown tokens in the training data. Character-level approaches, on the other hand, are more likely to capture the similarity to the canonical spelling of these tokens. Mehdad and Tetreault (2016) systematically compare character n-gram features with token n-grams for hate speech detection, and find that character n-grams prove to be more predictive than token n-grams.
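To make the difference between the two feature types concrete, the following minimal sketch (not taken from any of the cited works; the toy corpus, labels and hyperparameters are placeholders) builds one classifier on token n-grams and one on character n-grams with scikit-learn:

# Sketch: token vs. character n-gram features for hate speech classification.
# Toy data only; the cited works train on thousands of labelled comments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

comments = ["have a wonderful day", "ki11 yrslef a$$hole"]
labels = [0, 1]  # 0 = benign, 1 = hateful

# Token unigrams/bigrams: brittle under creative spelling variation.
token_clf = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)), LinearSVC())

# Character n-grams within word boundaries: partial overlap with the
# canonical spelling (e.g. "ki11" still shares "ki" with "kill").
char_clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)), LinearSVC())

for clf in (token_clf, char_clf):
    clf.fit(comments, labels)
    print(clf.predict(["kill yourself"]))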
Apart from word- and character-based features, hate speech detection can also benefit from other surface features (Chen et al., 2012; Nobata et al., 2016), such as information on the frequency of URL mentions and punctuation, comment and token lengths, capitalization, words that cannot be found in English dictionaries, and the number of non-alphanumeric characters present in tokens.
3.2 Word Generalization

While bag-of-words features usually yield a good classification performance in hate speech detection, in order to work effectively these features require predictive words to appear in both training and test data. However, since hate speech detection is usually applied on small pieces of text (e.g. passages or even individual sentences), one may face a data sparsity problem. This is why several works address this issue by applying some form of word generalization. This can be achieved by carrying out word clustering and then using induced cluster IDs representing sets of words as additional (generalized) features. A standard algorithm for this is Brown clustering (Brown et al., 1992), which has been used as a feature in Warner and Hirschberg (2012). While Brown clustering produces hard clusters – that is, it assigns each individual word to one particular cluster – Latent Dirichlet Allocation (LDA) (Blei et al., 2003) produces for each word a topic distribution indicating to which degree a word belongs to each topic. Such information has similarly been used for hate speech detection (Xiang et al., 2012; Zhong et al., 2016).

More recently, distributed word representations (based on neural networks), also referred to as word embeddings, have been proposed for a similar purpose. For each word a vector representation is induced (Mikolov et al., 2013) from a large (unlabelled) text corpus. Such vector representations have the advantage that different, semantically similar words may also end up having similar vectors. Such vectors may eventually be used as classification features, replacing binary features indicating the presence or frequency of particular words. Since in hate speech detection sentences or passages are classified rather than individual words, a vector representation of the set of word vectors representing the words of the text to be classified is sought. A simple way to accomplish this is by averaging the vectors of all words occurring in one passage or sentence. For detecting hate speech, this method is only reported to have limited effectiveness (Nobata et al., 2016), no matter whether general pretrained embeddings are used or the embeddings are induced from a domain-specific corpus. Alternatively, Djuric et al. (2015) propose to use embeddings that directly represent the text passages to be classified. These paragraph embeddings (Le and Mikolov, 2014), which are internally based on word embeddings, have been shown to be much more effective than the averaging of word embeddings (Nobata et al., 2016).
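As an illustration of the averaging strategy discussed above, here is a minimal sketch; the embedding table and its dimensionality are invented placeholders, whereas the cited works induce word2vec-style vectors from large corpora:

# Sketch: represent a comment as the average of its word vectors.
import numpy as np

EMBEDDINGS = {  # placeholder 3-dimensional vectors for illustration only
    "useless": np.array([0.4, -0.5, 0.1]),
    "ugly":    np.array([0.3, -0.4, 0.2]),
    "day":     np.array([0.0,  0.2, 0.1]),
}

def average_embedding(comment, dim=3):
    """Mean of the known word vectors; zero vector if no word is known."""
    vectors = [EMBEDDINGS[t] for t in comment.lower().split() if t in EMBEDDINGS]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

print(average_embedding("useless ugly pile of shit"))

Paragraph embeddings in the sense of Le and Mikolov (2014) would replace this averaging step with a representation trained directly for the whole passage.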
3.3 Sentiment Analysis

Hate speech and sentiment analysis are closely related, and it is safe to assume that usually negative sentiment pertains to a hate speech message. Because of this, several approaches acknowledge the relatedness of hate speech and sentiment analysis by incorporating the latter as an auxiliary classification. Dinakar et al. (2012), Sood et al. (2012b) and Gitari et al. (2015) follow a multi-step approach, in which a classifier dedicated to detecting negative polarity is applied prior to the classifier specifically checking for evidence of hate speech. Further, Gitari et al. (2015) run an additional classifier that weeds out non-subjective sentences prior to the aforementioned polarity classification.

Apart from multi-step approaches, there are also single-step approaches that include some form of sentiment information as a feature. For example, in their supervised classifier, Van Hee et al. (2015) use as features the number of positive, negative, and neutral words (according to a sentiment lexicon) occurring in a given comment text.

Further attempts to isolate the subset of hate speech from the set of negative polar utterances rest on the observation that hate speech also displays a high degree of negative polarity (Sood et al., 2012b; Burnap et al., 2013). To that end, polarity classifiers are employed which in addition to specifying the type of polarity (i.e. positive and negative) also predict the polar intensity of an utterance. A publicly available polarity classifier which produces such an output is SentiStrength (Thelwall et al., 2010). It is used for hate speech detection by Burnap et al. (2013).
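The single-step variant that uses sentiment information as a feature can be as simple as counting lexicon hits per polarity class; a minimal sketch (the word lists are toy stand-ins for a real sentiment lexicon):

# Sketch: polarity counts as features (cf. the setup of Van Hee et al., 2015).
POSITIVE = {"great", "love", "nice"}
NEGATIVE = {"hate", "ugly", "useless"}

def polarity_counts(comment):
    tokens = comment.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return [pos, neg, len(tokens) - pos - neg]  # positive, negative, neutral

print(polarity_counts("useless ugly pile of shit"))  # -> [0, 2, 3]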
3.4 Lexical Resources

Trying to make use of the general assumption that hateful messages contain specific negative words (such as slurs, insults, etc.), many authors utilize the presence of such words as a feature. To obtain this type of information, lexical resources are required that contain such predictive expressions.

A popular source for such word lists is the web. There are several publicly available lists that consist of general hate-related terms.[3] Apart from works that employ such lists (Xiang et al., 2012; Burnap and Williams, 2015; Nobata et al., 2016), there are also approaches, such as Burnap and Williams (2016), which focus on lists that are specialized towards a particular subtype of hate speech, such as ethnic slurs[4], LGBT slang terms[5], or words with a negative connotation towards handicapped people.[6]

[3] www.noswearing.com/dictionary, www.rsdb.org, www.hatebase.org
[4] https://en.wikipedia.org/wiki/List_of_ethnic_slurs
[5] https://en.wikipedia.org/wiki/List_of_LGBT_slang_terms
[6] https://en.wikipedia.org/wiki/List_of_disability-related_terms_with_negative_connotations

Apart from publicly-available word lists from the web, other approaches incorporate lexicons that have been specially compiled for the task at hand. Spertus (1997) employs a lexicon comprising so-called good verbs and good adjectives. Razavi et al. (2010) manually compiled an Insulting and Abusing Language Dictionary containing both words and phrases with different degrees of manifestation of flame varieties. This dictionary also assigns weights to each lexical entry which represent the degree of the potential impact level for hate speech detection. The weights are obtained by adaptive learning using the training partition of the data set used in that work. Gitari et al. (2015) build a resource comprising hate verbs, which are verbs that condone or encourage acts of violence. Despite their general effectiveness, relatively little is known about the creation process and the theoretical concepts that underlie the lexical resources that have been specially compiled for hate speech detection.

Most approaches employ lexical features either as some baseline or in addition to other features. In contrast to other features, particularly bag of words (§3.1) or embeddings (§3.2), they are usually insufficient as a stand-alone feature (Nobata et al., 2016). Contextual factors play an important role. For example, Hosseinmardi et al. (2015) find that 48% of media sessions in their data collection were not deemed hate speech by a majority of annotators, even though they reportedly contained a high percentage of profanity words.
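A rough sketch of such a weighted lexicon feature (entries and weights are invented placeholders; the dictionary of Razavi et al. (2010) also learns its weights from training data, which we do not model here):

# Sketch: score a comment against a weighted abusive-language lexicon.
LEXICON = {"scumbag": 3.0, "useless": 1.5, "pile of shit": 2.5}  # placeholders

def lexicon_score(comment):
    text = comment.lower()
    # Substring matching lets multi-word entries ("pile of shit") fire as well.
    return sum(weight for phrase, weight in LEXICON.items() if phrase in text)

print(lexicon_score("useless ugly pile of shit scumbag"))  # -> 7.0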
3.5 Linguistic Features

Linguistic aspects also play an important role for hate speech detection. Linguistic features are either employed in a more generic fashion or are specifically tailored to the task.

Xu et al. (2012) explore the combination of n-gram features with POS-information-enriched tokens. However, adding POS information does not significantly improve classifier performance.

Taking into account deeper syntactic information as a feature, Chen et al. (2012) employ typed dependency relationships. Such relationships have the potential benefit that non-consecutive words bearing a (potentially long-distance) relationship can be captured in one feature. For instance, in (4) a dependency tuple nsubj(pigs, Jews) will denote the relation between the offensive term pigs and the hate-target Jews.

(4) Jews are lower class pigs.

Obviously, knowing that those two words are syntactically related makes the underlying statement more likely to convey hate speech than those keywords occurring in a sentence without any syntactic relation. Dependency relationships are also employed in the feature sets of Gitari et al. (2015), Burnap and Williams (2015), Burnap and Williams (2016) and Nobata et al. (2016). Burnap and Williams (2015) and Burnap and Williams (2016) report significant performance improvements based on this feature; the other papers do not conduct ablation studies from which one could conclude the effectiveness of this particular feature. There is also a difference in the sets of dependency relationships representing a sentence which are used. Burnap and Williams (2015) apply some statistical feature selection (Bayesian Logistic Regression), Chen et al. (2012) and Gitari et al. (2015) manually select the relations (e.g. by enforcing that one argument of the relation is an offensive term), while Nobata et al. (2016) do not carry out any further selection. Unfortunately, there does not exist any evaluation comparing these feature variations. Zhong et al. (2016) do not use the presence of explicit dependency relations occurring in a sentence as a feature but employ an offensiveness level score. This score is based on the frequency of co-occurrences of offensive terms and user identifiers in the same dependency relation.
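For illustration, the following sketch extracts such dependency tuples with spaCy and keeps only those involving an offensive term; the offensive-term list is a placeholder, and spaCy's dependency labels differ from the typed-dependency schemes used in the cited works:

# Sketch: typed dependency tuples as features, filtered so that one argument
# must be an offensive term (cf. the nsubj(pigs, Jews) example above).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
OFFENSIVE = {"pigs"}  # placeholder for a slur/offensive-term lexicon

def offensive_dependency_tuples(sentence):
    doc = nlp(sentence)
    tuples = [(tok.dep_, tok.head.text.lower(), tok.text.lower()) for tok in doc]
    return [t for t in tuples if t[1] in OFFENSIVE or t[2] in OFFENSIVE]

# Output depends on the parser model, e.g. relations linking "pigs" to "are".
print(offensive_dependency_tuples("Jews are lower class pigs."))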
In her work on the Smokey system, Spertus (1997) devises a set of linguistic features tailored to the task of hate speech detection. The syntactic features include the detection of imperative statements (e.g. Get lost!, Get a life!) and the co-occurrence of the pronoun you modified by noun phrases (as in you bozos). The Smokey system also incorporates some semantic features to prevent false positives. On the one hand, so-called praise rules are employed, which use regular expressions involving pre-defined good words. Since that work categorizes webpages, the praise rules try to detect co-occurrences of good words and expressions referring to the website to be classified. On the other hand, Spertus (1997) also employs politeness rules represented by certain polite words or phrases (e.g. no thanks, would you or please). Nobata et al. (2016) use a similar feature.

3.6 Knowledge-Based Features

Hate speech detection is a task that cannot be solved by simply looking at keywords. Even if one tries to model larger textual units, as researchers attempt to do by means of linguistic features (§3.5), it remains difficult to decide whether some utterance represents hate speech or not. For instance, (5) may not be regarded as some form of hate speech when only read in isolation.

(5) Put on a wig and lipstick and be who you really are.

However, when the context information is given that this utterance has been directed towards a boy on a social media site for adolescents[7], one could infer that this is a remark to malign the sexuality or gender identity of the boy being addressed (Dinakar et al., 2012). (5) displays stereotypes most commonly attributed to females (i.e. putting on a wig and lipstick). If these characteristics are attributed to a male in a heteronormative context, the intention may have been to insult the addressee.

[7] The example utterance from above is from Formspring.

The above example shows that whether a message is hateful or benign can be highly dependent on world knowledge, and it is therefore intuitive that the detection of a phenomenon as complex as hate speech might benefit from including information on aspects not directly related to language. Dinakar et al. (2012) present an approach employing automatic reasoning over world knowledge focusing on anti-LGBT hate speech. The basis of their model is the general-purpose ontology ConceptNet (Liu and Singh, 2004), which encodes concepts that are connected by relations to form assertions, such as “a skirt is a form of female attire”. ConceptNet is augmented by a set of stereotypes (manually) extracted from the social media network Formspring.[8] An example for such a stereotype assertion is “lipstick is used by girls”. The augmented knowledge base is referred to as BullySpace.[9] This knowledge base allows computing the similarity of concepts of common knowledge with concepts expressed in user comments.[10] After extracting concepts present in a given user comment, the similarity between the extracted concepts and a set of four canonical concepts is computed. Canonical concepts are the four reference concepts positive and negative valence and the two genders, male and female. The resulting similarity scores between extracted and canonical concepts indicate whether a message might constitute a hate speech instance. A hate speech instance has a high similarity to the canonical concept negative valence and the canonical concept representing the gender opposed to the actual gender of the user being addressed in the message post. For example, for the sentence given above, a high similarity to negative valence and female would correctly indicate that the utterance is meant as hate speech.

Obviously, the approach proposed by Dinakar et al. (2012) only works for a very confined subtype of hate speech (i.e. anti-LGBT bullying). Even though the framework would also allow for other types of hate speech, it would require domain-specific assertions to be included first. This would require a lot of manual coding. It is presumably this shortcoming that explains why, to our knowledge, this is the only work that tries to detect hate speech with the help of a knowledge base.

[8] The augmentation is achieved by applying the joint inference technique blending after both ConceptNet and the assertions have been transformed into a so-called AnalogySpace.
[9] BullySpace contains 200 LGBT-specific assertions.
[10] Concepts are represented as vectors, so the similarity can be easily computed by measures such as cosine similarity.
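A minimal sketch of the similarity computation described above; the concept vectors are random stand-ins, whereas BullySpace derives them from an AnalogySpace projection that we do not reproduce here:

# Sketch: score extracted concepts against canonical concepts by cosine
# similarity (cf. Dinakar et al., 2012). Vectors are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
CONCEPTS = {c: rng.normal(size=50) for c in
            ["lipstick", "wig", "positive_valence", "negative_valence",
             "male", "female"]}
CANONICAL = ["positive_valence", "negative_valence", "male", "female"]

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def canonical_scores(extracted):
    scores = {}
    for c in CANONICAL:
        sims = [cosine(CONCEPTS[e], CONCEPTS[c])
                for e in extracted if e in CONCEPTS]
        scores[c] = max(sims) if sims else 0.0
    return scores

print(canonical_scores(["wig", "lipstick"]))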

3.7 Meta-Information

World knowledge gained from knowledge bases is not the only information available to refine inconclusive classification. Meta-information (i.e. information about an utterance) is also a valuable source for hate speech detection. Since the text commonly used as data for this task almost exclusively comes from social media platforms, a variety of such meta-information is usually offered and can be easily accessed via the APIs those platforms provide.

Having some background information about the user of a post may be very predictive. A user who is known to write hate speech messages may do so again, whereas a user who is not known to write such messages is unlikely to do so in future. Xiang et al. (2012) effectively employ this heuristic in inferring further hate speech messages. Dadvar et al. (2013) use as a feature the number of profane words in the message history of a user. Knowing the gender of the user may also help (Dadvar et al., 2012; Waseem and Hovy, 2016). Men are much more likely to post hate speech messages than women.
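A sketch of such a user-history feature (the message histories and the profanity list are illustrative placeholders):

# Sketch: number of profane words in a user's message history as a feature
# (cf. Dadvar et al., 2013).
PROFANITY = {"scumbag", "bitches"}  # placeholder list

def history_profanity_count(user_id, message_history):
    """message_history maps user ids to lists of that user's previous posts."""
    return sum(token in PROFANITY
               for post in message_history.get(user_id, [])
               for token in post.lower().split())

history = {"user42": ["hope one of those bitches falls over", "nice weather today"]}
print(history_profanity_count("user42", history))  # -> 1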
Beyond these, several other kinds of meta-information are common, such as the number of posts by a user, the number of replies to a post, the average of the total number of replies per follower or the geographical origin, but most of these have not been found effective for classification (Zhong et al., 2016; Waseem and Hovy, 2016). Moreover, there are certain kinds of meta-information for which conflicting results have been reported. For instance, Hosseinmardi et al. (2015) report a correlation between the number of comments associated with a post and hate speech, while Zhong et al. (2016) report the opposite. (Both papers use Instagram as a source.) Many reasons may be responsible for that. Zhong et al. (2016) speculate that the general lack in effectiveness of the meta-information they examined may be due to the fact that they consider celebrity accounts. Accounts from regular users, on the other hand, may display quite a different behaviour. From that we conclude that meta-information may be helpful, but it depends on the exact type of information one employs and also the source from which the data originate.

3.8 Multimodal Information

Modern social media do not only consist of text but also include images, video and audio content. Such non-textual content is also regularly commented on, and therefore becomes part of the discourse of a hate speech utterance. This context outside a written user comment can be used as a predictive feature.

As for knowledge-based features, not too many contributions exist that exploit this type of information. This is slightly surprising, since among hateful user posts illustrated by websites documenting representative cases of severe cyber hate[11], visual context plays a major role.

Hosseinmardi et al. (2015) employ features based on image labels, shared media content, and labelled image categories. Zhong et al. (2016) make use of pixel-level image features and report that a combination of those visual features and features derived from captions gives the best performance. They also employ these features for predicting which images are bully-prone. These are images that are likely to attract hate speech comments, and are referred to as bullying triggers.

[11] One example documenting disturbing cases of gender-based hate on facebook is www.womenactionmedia.org/examples-of-gender-based-hate-speech-on-facebook/
4 Persons Involved in Bullying Episodes and Their Roles

Apart from detecting hateful messages, a group of works focuses on persons involved in hate speech episodes and their roles. Xu et al. (2012) look at the entire bullying event (or bullying trace), automatically assigning roles to actors involved in the event as well as the message author. They differentiate between the roles bully, victim, assistant, defender, bystander, reinforcer, reporter and accuser for tweet authors and for person mentions within the tweet. Aside from classifying insulting messages, Sood et al. (2012b) also automatically predict whether such messages are directed at an author of a previous comment or at a third party. Silva et al. (2016) provide an analysis of the main hate target groups on the two social media platforms Twitter and Whisper. The authors conclude that both platforms exhibit the same top 6 hate target groups: People are mostly bullied for their ethnicity, behaviour, physical characteristics, sexual orientation, class or gender. Chau and Xu (2007) present a study of a selected set of 28 anti-Black hate groups in blogs on the Xanga site. Using a semi-automated approach, they find demographical and topological characteristics of these groups. Using web-link and -content analysis, Zhou et al. (2005) examine the structure of US domestic extremist groups.

5 Anticipating Alarming Societal Changes

Apart from detecting individual, isolated hateful comments and classifying the types of users involved, the overall proportion of extreme negative posts over a certain time-span also allows for interesting avenues of research. Insights into changes in public or personal mood can be gained. Information on notable increases in the number of hateful posts within a short time span might indicate suspicious developments in a community. Such information could be utilized to circumvent incidents such as racial violence, terrorist attacks, or other crimes before they happen, thus providing steps in the direction of anticipatory governance.

One work concerned with crime prediction is Wang et al. (2012). This work focuses on forecasting hit-and-run crimes from Twitter data by effectively employing semantic role labelling and event-based topic extraction (with LDA). Burnap et al. (2013) examine the automatic detection of tension in social media. They establish that it can be reliably detected and visualized over time using sentiment analysis and lexical resources encoding topic-specific actors, accusations and abusive terms. Williams and Burnap (2015) temporally relate online hate speech with offline terrorist events. They find that the first hours following a terrorist event are the critical time span in which online hate speech may likely occur.

6 Classification Methods

The methods utilized for hate speech detection in terms of classifiers are predominantly supervised learning approaches. As classifiers, mostly Support Vector Machines are used. Among the more recent methods, deep learning with Recurrent Neural Network Language Models has been employed by Mehdad and Tetreault (2016). There exist no comparative studies which would allow making a judgement on the most effective learning method.

The different works also differ in the choice of classification procedure: Standard one-step classification approaches exist along with multi-step classification approaches. The latter approaches employ individual classifiers that solve subproblems, such as establishing negative polarity (§3.3).

Furthermore, some works employ semi-supervised approaches, particularly bootstrapping, which can be utilized for different purposes in the context of hate speech detection. On the one hand, it can be used to obtain additional training data, as is for example done in Xiang et al. (2012). In this work, first a set of Twitter users is divided into good and bad users, based on the number of offensive terms present in their posts. Then all existing tweets of those bad users are selected and added to the training set as hate speech instances.
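A sketch of this bootstrapping heuristic (the threshold, the offensive-term list and the tweet collections are placeholders rather than the criteria used in the cited work):

# Sketch: bootstrap additional hate speech training instances from the tweets
# of users who frequently use offensive terms (cf. Xiang et al., 2012).
OFFENSIVE = {"scumbag", "faggot"}   # placeholder lexicon
BAD_USER_THRESHOLD = 2              # placeholder criterion

def offensive_count(tweets):
    return sum(tok in OFFENSIVE for tweet in tweets for tok in tweet.lower().split())

def bootstrap_positive_instances(user_tweets):
    """user_tweets: dict mapping a user id to a list of that user's tweets."""
    positives = []
    for user, tweets in user_tweets.items():
        if offensive_count(tweets) >= BAD_USER_THRESHOLD:  # a "bad" user
            positives.extend(tweets)  # treat all their tweets as hate speech instances
    return positives

users = {"u1": ["you scumbag", "die already you scumbag"], "u2": ["lovely day"]}
print(bootstrap_positive_instances(users))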
In addition, bootstrapping can also be utilized to build lexical resources used as part of the detection process. Gitari et al. (2015) apply this method to populate their hate verb lexicon, starting with a small seed verb list, and iteratively expanding it based on WordNet relations, adding all synonyms and hypernyms of those seed verbs.
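This kind of lexicon expansion can be sketched with NLTK's WordNet interface (the seed verbs are placeholders, and the cited work applies additional filtering that we omit here):

# Sketch: expand a seed list of hate verbs with WordNet synonyms and hypernyms
# (cf. Gitari et al., 2015). Requires the WordNet data: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def expand_seed_verbs(seeds):
    expanded = set(seeds)
    for seed in seeds:
        for synset in wn.synsets(seed, pos=wn.VERB):
            expanded.update(lemma.name() for lemma in synset.lemmas())        # synonyms
            for hypernym in synset.hypernyms():
                expanded.update(lemma.name() for lemma in hypernym.lemmas())  # hypernyms
    return expanded

print(sorted(expand_seed_verbs({"attack", "destroy"})))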
7 Data and Annotation

To be able to perform experiments on hate speech detection, access to labelled corpora is essential. Since there is no commonly accepted benchmark corpus for the task, authors usually collect and label their own data. The data sources that are used include: Twitter (Xiang et al., 2012; Xu et al., 2012; Burnap et al., 2013; Burnap et al., 2014; Burnap and Williams, 2015; Silva et al., 2016), Instagram (Hosseinmardi et al., 2015; Zhong et al., 2016), Yahoo! (Nobata et al., 2016; Djuric et al., 2015; Warner and Hirschberg, 2012), YouTube (Dinakar et al., 2012), ask.fm (Van Hee et al., 2015), Formspring (Dinakar et al., 2012), Usenet (Razavi et al., 2010), Whisper[12] (Silva et al., 2016), and Xanga[13] (Chau and Xu, 2007). Since these sites have been created for different purposes, they may have special characteristics, and may therefore display different subtypes of hate speech. For instance, on a platform specially created for adolescents, one should expect quite different types of hate speech than on a service that is used by a cross-section of the general public, since the resulting different demographics will have an impact on the topics discussed and the language used. These implications should be considered when interpreting the results of research conducted on a particular social media platform.

[12] http://whisper.sh
[13] http://xanga.com

In general, the size of collected corpora varies considerably in works on hate speech detection, ranging from around 100 labelled comments used in the knowledge-based work by Dinakar et al. (2012) to several thousand comments used in other works, such as Van Hee et al. (2015) or Djuric et al. (2015). Apart from the classification approach taken, another reason for these size differences lies in the simple fact that annotating hate speech is an extremely time-consuming endeavour: There are much fewer hateful than benign comments present in randomly sampled data, and therefore a large number of comments have to be annotated to find a considerable number of hate speech instances. This skewed distribution makes it generally difficult and costly to build a corpus that is balanced with respect to hateful and harmless comments. The size of a data set should always be taken into consideration when assessing the effectiveness of certain features or (learning) methods applied on it. Their effectiveness – or lack thereof – may be the result of a particular data size. For instance, features that tackle word generalization (§3.2) are extremely important when dealing with small data sets, while on very large data sets they become less important since data sparsity is less of an issue. We are not aware of any study examining the relation between the size of labelled training data and features/classifiers for hate speech detection.

In order to increase the share of hate speech messages while keeping the size of data instances to be annotated at a reasonable level, Waseem and Hovy (2016)[14] propose to pre-select the text instances to be annotated by querying a site for topics which are likely to contain a higher degree of hate speech (e.g. Islam terror). While this increases the proportion of hate speech posts in the resulting data sets, it also focuses the resulting data set on specific topics and certain subtypes of hate speech (e.g. hate speech targeting Muslims).

[14] The data from this work are available under http://github.com/zeerakw/hatespeech
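Approximating this pre-selection as a simple keyword filter (the query terms are placeholders; Waseem and Hovy (2016) query Twitter directly for such topics rather than filtering an existing collection):

# Sketch: pre-select candidate posts for annotation via topic keywords.
QUERY_TERMS = {"islam", "terror"}  # placeholder query topics

def preselect_for_annotation(posts):
    return [p for p in posts if QUERY_TERMS & set(p.lower().split())]

posts = ["something about terror attacks", "look at my cat"]
print(preselect_for_annotation(posts))  # keeps only the first post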

In order to annotate a data set manually, either expert annotators are used or crowdsourcing services, such as Amazon Mechanical Turk (AMT), are employed. Crowdsourcing has obvious economical and organizational advantages, especially for a task as time-consuming as the one at hand, but annotation quality might suffer from employing non-expert annotators. Nobata et al. (2016) compare crowdsourced annotations performed using AMT with annotations created by expert annotators and find large differences in agreement.

In addition to the issues mentioned above that, to some extent, challenge the comparability of the research conducted on various data sets, the fact that no commonly accepted definition of hate speech exists further exacerbates this situation. Previous works remain fairly vague when it comes to the annotation guidelines their annotators were given for their work. Ross et al. (2016) point out that this is particularly a problem for hate speech detection. Despite providing annotators with a definition of hate speech, in their work the annotators still fail to produce an annotation at an acceptable level of reliability.
8 Challenges

As the previous section suggests, the community would considerably benefit from a benchmark data set for the hate speech detection task underlying a commonly accepted definition of the task.

With the exception of Dutch (Van Hee et al., 2015) and German (Ross et al., 2016), we are not aware of any significant research being done on hate speech detection other than on English language data. We think that particularly a multilingual perspective on hate speech may be worthwhile. Unlike other tasks in NLP, hate speech may have strong cultural implications, that is, depending on one's particular cultural background, an utterance may be perceived as offensive or not. It remains to be seen in how far established approaches to hate speech detection examined on English are equally effective on other languages.

Although in the previous sections we also described approaches that try to incorporate the context of hate speech by employing some specific knowledge-based features (§3.6), meta-information (§3.7) or multi-modal information (§3.8), we still feel that there has been comparatively little work looking into these types of features. In the following, we illustrate the necessity of incorporating such context knowledge with the help of three difficult instances of hate speech. For all these cases, it is unclear whether the methods we described in this survey would correctly recognize these remarks as hate speech.

(6) Kermit the frog called and he wants his voice back.
(7) Your goat is calling.
(8) Who was responsible for convincing these girls they were so pretty?

In (6) a woman is ridiculed for her voice. There is no explicit evaluation of her voice, but it is an obvious inference from being compared with Kermit the frog. In (7), a Muslim is accused of bestiality. Again, there is no explicit accusation. The speaker of that utterance relies on his addressee to be aware of stereotyped prejudices against Islam. Finally, in (8), the speaker of that utterance wants to offend some girls by suggesting they are unattractive. Again, there is no explicit mention of being unattractive, but challenging someone else's opposite view can be interpreted in this way.

These examples are admittedly difficult cases and we are not aware of one individual method which would cope with all of these examples. It remains to be seen whether in the future new computational approaches can actually solve these problems or whether hate speech is a research problem similar to sarcasm, where only certain subtypes have been shown to be automatically detected with the help of NLP (Riloff et al., 2013).

9 Conclusion

In this paper, we presented a survey on the automatic detection of hate speech. This task is usually framed as a supervised learning problem. Fairly generic features, such as bag of words or embeddings, systematically yield reasonable classification performance. Character-level approaches work better than token-level approaches. Lexical resources, such as lists of slurs, may help classification, but usually only in combination with other types of features. Various complex features using more linguistic knowledge, such as dependency-parse information, or features modelling specific linguistic constructs, such as imperatives or politeness, have also been shown to be effective. Information derived from text may not be the only cue suggesting the presence of hate speech. It may be complemented by meta-information or information from other modalities (e.g. images attached to messages). Making judgements about the general effectiveness of many of the complex features is difficult since, in most cases, they are only evaluated on individual data sets, most of which are not publicly available and often only address a subtype of hate speech, such as bullying of particular ethnic minorities. For better comparability of different features and methods, we argue for a benchmark data set for hate speech detection.

Acknowledgements

We would like to thank David M. Howcroft for proofreading this paper. The authors were partially supported by the German Research Foundation (DFG) under grant WI 4204/2-1 and the Cluster of Excellence Multimodal Computing and Interaction of the German Excellence Initiative.
References

David M. Blei, Andrew Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022.
Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479.
Pete Burnap and Matthew L. Williams. 2014. Hate speech, machine classification and statistical modelling of information flows on twitter: Interpretation and communication for policy decision making. In Internet, Policy and Politics Conference, Oxford, United Kingdom.
Pete Burnap and Matthew L. Williams. 2015. Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making. Policy & Internet, 7(2):223–242.
Pete Burnap and Matthew L. Williams. 2016. Us and them: identifying cyber hate on twitter across multiple protected characteristics. EPJ Data Science, 5(1):1–15.
Pete Burnap, Omer F. Rana, Nick Avis, Matthew Williams, William Housley, Adam Edwards, Jeffrey Morgan, and Luke Sloan. 2013. Detecting tension in online communities with computational twitter analysis. Technological Forecasting and Social Change, pages 96–108, May.
Pete Burnap, Matthew L. Williams, Luke Sloan, Omer Rana, William Housley, Adam Edwards, Vincent Knight, Rob Procter, and Alex Voss. 2014. Tweeting the terror: modelling the social media reaction to the woolwich terrorist attack. Social Network Analysis and Mining, 4(1):1–14.
Michael Chau and Jennifer Xu. 2007. Mining communities and their relationships in blogs: A study of online hate groups. International Journal of Human-Computer Studies, 65(1):57–70.
Ying Chen, Yilu Zhou, Sencun Zhu, and Heng Xu. 2012. Detecting offensive language in social media to protect adolescent online safety. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), pages 71–80, Amsterdam, Netherlands, September. IEEE.
Maral Dadvar, Franciska M.G. de Jong, R.J.F. Ordelman, and R.B. Trieschnigg. 2012. Improved cyberbullying detection using gender information. DIR 2012, pages 22–25.
Maral Dadvar, Dolf Trieschnigg, Roeland Ordelman, and Franciska de Jong. 2013. Improving Cyberbullying Detection with User Context. In Proceedings of the European Conference in Information Retrieval (ECIR), pages 693–696, Moscow, Russia.
Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. 2012. Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems, 2(3):18:1–18:30, September.
Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web, pages 29–30, New York, NY, USA. ACM.
Njagi Dennis Gitari, Zhang Zuping, Hanyurwimfura Damien, and Jun Long. 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering, 10(4):215–230.
Homa Hosseinmardi, Sabrina Arredondo Mattson, Rahat Ibn Rafiq, Richard Han, Qin Lv, and Shivakant Mishra. 2015. Detection of cyberbullying incidents on the instagram social network. CoRR, abs/1503.03909.
Irene Kwok and Yuzhou Wang. 2013. Locate the hate: Detecting tweets against blacks. In Marie desJardins and Michael L. Littman, editors, AAAI, pages 1621–1622, Bellevue, Washington, USA. AAAI Press.
Quoc Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the International Conference on Machine Learning (JMLR), pages 1188–1196, Beijing, China.
Hugo Liu and Push Singh. 2004. ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, 22:211–226.
Yashar Mehdad and Joel Tetreault. 2016. Do characters abuse more than words? In 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 299–303, Los Angeles, CA, USA.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at the International Conference on Learning Representations (ICLR), Scottsdale, AZ, USA.
Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web, pages 145–153, Geneva, Switzerland.
John T. Nockleby. 2000. Hate Speech. In Leonard W. Levy, Kenneth L. Karst, and Dennis J. Mahoney, editors, Encyclopedia of the American Constitution, pages 1277–1279. Macmillan, 2nd edition.
Amir H. Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin. 2010. Offensive language detection using multi-level classification. In Proceedings of the 23rd Canadian Conference on Advances in Artificial Intelligence, AI'10, pages 16–27, Berlin, Heidelberg.
Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as Contrast between a Positive Sentiment and Negative Situation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 704–714, Seattle, WA, USA.
Björn Ross, Michael Rist, Guillermo Carbonell, Benjamin Cabrera, Nils Kurowsky, and Michael Wojatzki. 2016. Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis. In Proceedings of the Workshop on Natural Language Processing for Computer-Mediated Communication (NLP4CMC), pages 6–9, Bochum, Germany.
Leandro Araújo Silva, Mainack Mondal, Denzil Correa, Fabrício Benevenuto, and Ingmar Weber. 2016. Analyzing the targets of hate in online social media. In Proceedings of the Tenth International Conference on Web and Social Media, pages 687–690, Cologne, Germany.
Sara Sood, Judd Antin, and Elizabeth Churchill. 2012a. Profanity use in online communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1481–1490, Austin, TX, USA. ACM.
Sara Owsley Sood, Elizabeth F. Churchill, and Judd Antin. 2012b. Automatic identification of personal insults on social news sites. Journal of the American Society for Information Science and Technology, 63(2):270–285, February.
Ellen Spertus. 1997. Smokey: Automatic recognition of hostile messages. In Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence, AAAI'97/IAAI'97, pages 1058–1065, Providence, RI, USA. AAAI Press.
Mike Thelwall, Kevan Buckley, Georgios Paltoglou, and Di Cai. 2010. Sentiment Strength Detection in Short Informal Text. Journal of the American Society for Information Science and Technology, 61(12):2544–2558.
Cynthia Van Hee, Els Lefever, Ben Verhoeven, Julie Mennes, Bart Desmet, Guy De Pauw, Walter Daelemans, and Véronique Hoste. 2015. Detection and fine-grained classification of cyberbullying events. In Proceedings of Recent Advances in Natural Language Processing, pages 672–680, Hissar, Bulgaria.
Xiaofeng Wang, Matthew S. Gerber, and Donald E. Brown. 2012. Automatic crime prediction using events extracted from twitter posts. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, pages 231–238.
William Warner and Julia Hirschberg. 2012. Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media, LSM '12, pages 19–26, Stroudsburg, PA, USA. Association for Computational Linguistics.
Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93, San Diego, California, USA, June. Association for Computational Linguistics.
Matthew Leighton Williams and Pete Burnap. 2015. Cyberhate on social media in the aftermath of woolwich: A case study in computational criminology and big data. British Journal of Criminology, pages 211–238.
Guang Xiang, Bin Fan, Ling Wang, Jason Hong, and Carolyn Rose. 2012. Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 1980–1984, Maui, HI, USA. ACM.
Jun-Ming Xu, Kwang-Sung Jun, Xiaojin Zhu, and Amy Bellmore. 2012. Learning from bullying traces in social media. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 656–666, Montréal, Canada. Association for Computational Linguistics.
Haoti Zhong, Hao Li, Anna Cinzia Squicciarini, Sarah Michele Rajtmajer, Christopher Griffin, David J. Miller, and Cornelia Caragea. 2016. Content-driven detection of cyberbullying on the instagram social network. In IJCAI, pages 3952–3958, New York City, NY, USA. IJCAI/AAAI Press.
Yilu Zhou, Edna Reid, Jialun Qin, Hsinchun Chen, and Guanpi Lai. 2005. US Domestic Extremist Groups on the Web: Link and Content Analysis. IEEE Intelligent Systems, 20(5):44–51.
