Automated Hate Speech Detection and the Problem of Offensive Language


A Summary
Muneeb Nawaz 21100061

The study deals with differentiating hate speech from language that is seen as merely
offensive. Lexicon-based detection does not work well because it ignores context and
therefore misclassifies offensive language as hate speech. Hence a multi-class
classification is used which divides the data into three labels: hate speech, offensive
language, and neither. This provides clarity, but sexist tweets tend to be labelled as
merely offensive while racist and homophobic statements are labelled as hate speech.
Tweets that don't contain specific flag words are harder to classify as well.
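
As a minimal illustration (not the authors' code), the sketch below shows why pure lexicon matching fails: any tweet containing a listed term is flagged, regardless of who uses it or in what context. The lexicon contents and function name are hypothetical placeholders.

```python
# Hypothetical, tiny stand-in for a Hatebase-style lexicon (placeholder tokens, not real terms).
HATE_LEXICON = {"slur_a", "slur_b"}

def lexicon_flag(tweet: str) -> bool:
    """Flag a tweet if it contains any lexicon term, ignoring context entirely."""
    tokens = tweet.lower().split()
    return any(term in tokens for term in HATE_LEXICON)

# A quoted lyric and a targeted attack are flagged identically, which is exactly
# the false positive the three-label scheme tries to avoid.
print(lexicon_flag("quoting a lyric with slur_a"))   # True, but not hate speech
print(lexicon_flag("targeted abuse using slur_a"))   # True
```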

Before classifying, we need to understand what hate speech is. In the article it is defined
as “language that is used to express hatred towards a targeted group or is intended to
be derogatory, to humiliate, or to insult the members of the group”. Use of hate speech
in the UK, Canada, France and elsewhere is punished with fines and even imprisonment,
while the United States protects it as free speech under the First Amendment to its
constitution. Social media sites are constantly under scrutiny for not doing enough to
curb the spread of hate speech, which spreads hatred among groups and can sometimes
incite violence. The definition here does not cover slurs such as ‘n*gga’ as used by
African Americans, ‘h*e’ and ‘b*tch’ as quoted in rap lyrics, or ‘f*g’ as used by video
game players. Hence drawing this boundary and fine-graining the labels are needed.

The difference between hate speech and offensive language is very subtle, since many of
these red-flag words are used as curse words rather than to express hate. Unlike other
supervised approaches, syntactic features are used to identify where certain verbs and
nouns occur and hence to conclude whether something is hate speech or not.

In the data collection process, tweets taken from Twitter were manually labelled by
CrowdFlower (CF) workers, who were provided the definition of hate speech and asked
to classify each tweet into the three categories mentioned in the first paragraph. At least
three people labelled each tweet, with the majority label winning. The intercoder
agreement score was 92%, and only 5% of the tweets were labelled hate speech, much
lower than Twitter's own 11.6%.
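
The majority-vote aggregation could look roughly like the sketch below (an assumption, not the authors' pipeline); the label names and tie handling are illustrative.

```python
from collections import Counter

def majority_label(judgments):
    """Return the majority label among a tweet's CF judgments, or None on a tie.

    Labels are assumed to be "hate_speech", "offensive" or "neither".
    """
    counts = Counter(judgments)
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:  # tie for first place: no majority decision
        return None
    return top

print(majority_label(["offensive", "offensive", "hate_speech"]))  # -> offensive
```
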
Unigrams, bigrams and trigrams were created after the tweets were lowercased and
stemmed using the Porter stemmer. To capture quality, Flesch-Kincaid Grade Level and
Flesch Reading Ease scores were computed, with the number of sentences fixed at one,
and the syntactic structure was captured using NLTK. The counts of retweets, hashtags,
URLs and mentions were also kept.
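
A rough sketch of this feature extraction using NLTK and scikit-learn is given below; it is an approximation under assumptions, and the textstat package is used here only as a convenient stand-in for the Flesch scores (it does not fix the sentence count at one as the paper does).

```python
import nltk
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
import textstat  # assumed stand-in for readability scores, not the authors' tooling

stemmer = PorterStemmer()

def tokenize(tweet: str):
    # Lowercase and Porter-stem tokens before building the n-gram features.
    return [stemmer.stem(tok) for tok in tweet.lower().split()]

# Unigram, bigram and trigram features over the stemmed tokens, TF-IDF weighted.
vectorizer = TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 3))

def extra_features(tweet: str):
    # Readability scores plus simple counts of hashtags, mentions and URLs.
    return [
        textstat.flesch_kincaid_grade(tweet),
        textstat.flesch_reading_ease(tweet),
        tweet.count("#"),
        tweet.count("@"),
        tweet.count("http"),
    ]

# Penn part-of-speech tags via NLTK (needs the averaged_perceptron_tagger data downloaded).
pos_tags = [tag for _, tag in nltk.pos_tag("this is an example tweet".split())]
```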

Models were tested with 10% of the sample held out as validation data and with 5-fold
cross-validation to prevent overfitting. Logistic regression with L2 regularisation was
chosen in the end, as it was helpful in analysing the predicted probabilities of class
membership. This final model was used on the entire data set in a one-vs-rest framework,
where a separate classifier is trained for each class using the same model.
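
In scikit-learn terms the final setup might look like the sketch below (an assumed equivalent, not the authors' released code); the regularisation grid is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# L2-regularised logistic regression wrapped in one-vs-rest: one binary
# classifier per class, tuned with 5-fold cross-validation.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("clf", OneVsRestClassifier(LogisticRegression(penalty="l2", max_iter=1000))),
])

param_grid = {"clf__estimator__C": [0.01, 0.1, 1.0, 10.0]}  # hypothetical values
search = GridSearchCV(pipeline, param_grid, cv=StratifiedKFold(n_splits=5))
# search.fit(tweets, labels)  # tweets: raw tweet strings; labels: CF majority labels
```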

The final results were an overall precision of 0.91, recall of 0.90, and F1 score of 0.90.
Even so, almost 40% of the hate speech was misclassified. This means the model is more
biased than the human coders, tending to see tweets as less hateful.
Tweets with the highest probability of being hate speech contain a lot of red-flag words,
but the model does not differentiate between a homophobic tweet and a rebuttal to that
homophobic tweet, classifying both as hate speech. Sometimes non-racist tweets are
wrongly classified because they use words that are racist in another context. Some
hateful tweets that are classified as neutral tend not to contain these red-flag words, so
they escape the classifier. A key flaw in much previous work is that offensive language is
mislabelled as hate speech due to an overly broad definition. Finally, going back to the
neither class, we see that the tweets with the highest predicted probability of belonging
to this class seem innocuous.
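
Where these numbers come from can be checked with a per-class breakdown; the sketch below (hypothetical labels and toy arrays, using scikit-learn) shows how to read off the fraction of true hate speech that the classifier pushes into the other classes.

```python
from sklearn.metrics import classification_report, confusion_matrix

labels = ["hate_speech", "offensive", "neither"]
# Toy gold and predicted labels, just to illustrate the inspection step.
y_true = ["hate_speech", "hate_speech", "offensive", "neither", "offensive"]
y_pred = ["offensive",   "hate_speech", "offensive", "neither", "offensive"]

print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
cm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
print(cm[0])  # row 0: fraction of true hate speech predicted as each class
```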

In conclusion, hate speech is tough to define. Hate speech classifications tend to reflect
subjective biases. People identify racist and homophobic slurs as hateful, but tend to
view sexist language as simply offensive. The results show that people do well in
identifying severe cases of hate speech, particularly racism and homophobia against
black people, so it is important that we are aware of such social issues. The biases that
affect algorithms aiming to classify hate speech need to be fixed for successful
classification.
