B1PROCA4EXAMPLEPAPER

Are State-of-the-Art Textual Hate Speech Detectors Resistant to
Adversarial Manipulations?
Sajal Aggarwal
Department of Information Technology, Delhi Technological University, New Delhi, India
Dinesh Kumar Vishwakarma
Department of Information Technology, Delhi Technological University, New Delhi, India
ABSTRACT: Adversarial attacks have been thoroughly investigated for vision-based datasets in
the past decade. Comparatively, the number of works dedicated towards the study of adversarial
attacks in text-based applications is much less. This can primarily be attributed to the difficulties
encountered in crafting suitable perturbations for text-based samples that do not impact their
meaning and grammaticality. Moreover, the existing studies on textual adversarial attacks
primarily focus on text classification applications such as sentiment analysis or news type clas -
sification as opposed to tasks with stronger security implications like hate speech and offensive
language detection or fake news detection. This study serves to fine-tune three state-of-the-art
transformer models (BERT, DistilBERT and RoBERTa) on the ETHOS dataset and examine
their performance under the influence of adversarial attacks. The results obtained clearly signify
the adverse impact that adversarial attacks can have on the performance of transformer models,
thereby raising serious concerns about their robustness and reliability.
1 INTRODUCTION
1.1 Type area
Machine learning (ML) and deep learning (DL) models are being used widely to develop
automated systems capable of performing a vast range of applications [1], [2]. However, the ro-
bustness and dependability of these models is a serious concern. Adversarial attacks when ap-
plied to such models can drastically alter their predictions, thereby undermining their accuracy
by a large extent. These attack algorithms introduce perturbations or modifications into the data-
set samples such that their ground truth labels with respect to human observers are retained but
the victim model is unable to classify them correctly.
Adversarial attacks have been examined in detail with respect to computer vision models and
datasets [3], [4], [5], [6]. However, their study in the text domain is still far more limited. This is
because of the inherent distinction in the nature of visual and textual content. While introducing
seemingly unobservable perturbations into images is comparatively easier, doing the same can
be tricky for text samples. Even a minor perturbation in a text sample has the potential to des-
troy semantic and grammatical accuracy. Nevertheless, attack frameworks have been developed
for textual classification datasets in recent years. Another major deficit of investigations direc -
ted towards text-based attacks is the dearth of studies dealing with security-critical applications
such as hate speech recognition and fake news detection. Consequently, this study is aimed at
fine-tuning three cutting-edge transformer models, namely BERT, DistilBERT, and RoBERTa
on the ETHOS dataset and then evaluating the influence of five attack approaches on them.
Through this research endeavour, we demonstrate the ability of adversarial attack frameworks to
deceive state-of-the-art textual hate speech detectors. By incorporating seemingly irrelevant per-
turbations that have no impact on human judgement, adversarial attacks entirely alter model
predictions, thereby raising serious doubts about the robustness of such models. Attackers can
make use of such attack approaches to evade the automated hate speech detection procedures
enforced by social media platforms, thereby posing a serious threat to the integrity of the con-
tent published on such forums. The risks of employing adversarial attacks are more pronounced
when dealing with security-sensitive applications.
Organization: The rest of the sections will be laid out in the following manner:
In Section II, we lay out the taxonomy of adversarial attacks in the text domain and delineate
the components of recently proposed adversarial attack frameworks. Section III outlines the at-
tack procedure while Section IV serves to provide a brief description of the dataset, victim mod-
els, and evaluation parameters. Section V presents and compares the results achieved for the
three transformer models. Finally, Section VI concludes the paper while making recommenda-
tions for future research efforts.
Table 1. Margin settings for A4 size paper and letter size paper.
_______________________________________________________________
Setting A4 size paper Letter size paper
________________ ________________
cm inches cm inches
_______________________________________________________________
Top 2.5 0.98" 1.75 0.69"
Bottom 3.0 1.18" 2.01 0.79"
Left 3.0 1.18" 3.28 1.29"
Right 3.0 1.18" 3.28 1.29"
All other 0.0 0.0" 0.0 0.0"
_______________________________________________________________
2 GETTING STARTED
2.1 The template file
Copy the template file B1ProcA4.dot to the template directory. This directory can be found by
selecting the Tools menu, Options and then by tabbing the File Locations. When the Word pro -
gramme has been started, open the File menu and choose New. Now select the template B1P-
ROCA4.dot and start by renaming the document after clicking Save As in the Files menu.
Adversarial attacks in text domain
Specificity Adversary’s knowledge Perturbation granularity
Targeted Untargeted Black-box White-box Character Sentence
Semi-white (Gray) box Word
Figure 1 Taxonomy of adversarial attacks in text domain

This section aims at classifying adversarial attacks in the text domain and highlighting the
fundamental components of some recently proposed frameworks. Fig. 1. classifies the attacks
based on three parameters, i.e., specificity, adversary’s knowledge, and perturbation granularity.
Specificity describes whether the attack is aimed at altering model predictions to a particular
label (targeted attacks) or not (untargeted attacks). In untargeted attacks, the objective is to alter
the model prediction, but not to a specifically chosen label. Another basis for classification is
the adversary’s (or attacker’s) knowledge. In a black-box configuration, no model parameters,
training data, or gradients are available to the attacker while the white-box configuration
provides information about model parameters and architecture. The gray-box setting serves as
an intermediate configuration. With respect to the level of perturbation or modification, attacks
can be grouped as character-level (swapping, insertion, deletion), word-level (synonym substitu-
tion), or sentence-level (paraphrasing). According to [7], an adversarial attack framework can be
split into four constituents (Table I), namely goal function, constraints, transformations and
search techniques.
Adver- Perturbation
Text At- Speci- Transforma-
sary’s knowl- Constraints Search technique
tack ficity tions
edge level
POS consistency, Substitution with

DistilBERT similarity, counter-fitted word Gradient-based word
A2T [9] UT White-box Word-level
maximum modifica- embedding gener- importance ranking
tion rate ated synonyms
Maximum number
Alzantot of words perturbed,
[10], Faster maximum Euclidean
Word embed-
Alzantot [11] UT Black-box Word-level distance between em- Genetic algorithms
ding swap
Genetic Algo- beddings, repeat modi-
rithm fication, stopword
modification
POS consistency,
Token replace- Greedy word impor-
BAE [12] UT Black-box Word-level USE sentence encod-
ment and insertion tance ranking
ing cosine similarity
Levenshtein edit Character inser-

Character- distance, repeat modi- tion, deletion, swap-
DWB [13] UT Black-box Greedy word swap
level fication, stopword ping and substitu-
modification tion
Word embedding
Character- cosine similarity,
HotFlip Gradient-based
UT White-box level and word- Maximum number of Beam search
[14] word swap
level words perturbed, POS
matching
Maximum number
of words perturbed,
thought vector word Counter-fitted
Kuleshov
UT Black-box Word-level embedding cosine sim- word embedding Greedy search
[15]
ilarity, repeat modifi- swap
cation, stopword mod-
ification
Input column mod-
ification, repeat modi- HowNet word Particle swarm optimi-
PSO [8] UT Black-box Word-level
fication, stopword swap sation
modification
Repeat modifica-
PWWS WordNet syn- Greedy word swap
UT Black-box Word-level tion, stopword modifi-
[16] onym swap (Weighted saliency)
cation
Swapping with
POS consistency,
TextFoole word’s 50 nearest Greedy word impor-
UT Black-box Word-level USE sentence encod-
r [17] embedding neigh- tance ranking
ing cosine similarity
bours
Language-Tool Word insertion,

CLARE grammar checker, merging and swap-
UT Black-box Word-level Greedy search
[18] USE sentence encod- ping in a context-
ing cosine similarity aware manner
Character inser-
Maximum words
tion, deletion, swap-
perturbed, minimum
Pruthi Character- ping (neighbouring
UT Black-box word length, stopword Greedy search
[19] level characters and QW-
modification, repeat
ERTY keyboard-
modification
based)
2.2 Title, author and affiliation frame

Place the cursor immediately before the T of Title at the top of your newly named file and type
the title of the paper in lower case (no caps except for proper names). The title should not be
longer than 75 characters). Delete the word Title (do not delete the paragraph end).
Place the cursor immediately before the A of A.B. Author(s) and type the name of the first
author (first the initials and then the last name). If any of the co-authors have the same affili-
ation as the first author, add his name after an & (or a comma if more names follow). Delete
the words A.B. Author etc. and place the cursor immediately before the A of Affiliation. Type
the correct affiliation; Name of the institute, City, State/Province, Country, do not add street
names, P.O. Box numbers or zip codes to the affiliations. Now delete the word Affiliation. If
there are authors linked to other institutes, place the cursor at the end of the affiliation line just
typed and give a return. Now type the name(s) of the author(s) and after a return the affiliation.
Repeat this procedure until all affiliations have been typed.
All the above texts should fit in the frame which should not be changed (Width: Exactly 15.0
cm or 5,91"; Height: Exactly 7.2 cm or 2.79"; Lock anchor).
2.3 Abstract frame

If there are no further authors, place the cursor one space after the word ABSTRACT: and type
your abstract of not more than 150 words. The first line of the abstract will be 7.2 cm (2.83")
from the top. The complete abstract will fall in the abstract frame, the settings of which should
also not be changed (width: Exactly 15.0 cm or 5.91"; height: Automatic; vertical: 7.2 cm or
2.83" from margin; Lock anchor).
2.4 First line of text or heading
If your text starts with a heading, place the cursor immediately before the I of INTRODUC-
TION and type the correct text for the heading. Now delete the word INTRODUCTION and
start with the text after a return. This text should have the tag First paragraph.
If your text starts without a heading, you should place the cursor immediately before the I of
INTRODUCTION, change the tag to First paragraph and type your text after deleting the word
INTRODUCTION, but not the return at the end.
3 LAYOUT OF TEXT
3.1 Text and indenting
All text, figures, tables, etc. should fit exactly in the type area of 15 × 24 cm (5.91" × 9.52"). All
text should be typed in Times New Roman. All text is 11 pt on 12 pt line spacing except for the
paper title (16 pt on 18 pt), author(s) (12 pt on 13 pt), affiliation(s) (10 pt on 11 pt) and the small
text in tables, captions and references (10 pt on 11 pt). All line spacing is exact. Never add a
line space between lines or paragraphs.
First lines of paragraphs are indented 4 mm (0.16") except for paragraphs after a heading or a
blank line (First paragraph tag). Equations are indented 12 mm (0.47") (Formula tag).
3.2 Headings
Type primary headings in capital letters roman (Heading 1 tag) and secondary and tertiary head-
ings in lower case italics (Headings 2 and 3 tags). Headings are set flush against the left margin.
The tag will give two blank lines (24 pt) above and one (12 pt) beneath the primary
headings, 1½ blank lines (18 pt) above and a ½ blank line (6 pt) beneath the secondary headings
and one blank line (12 pt) above the tertiary headings. Headings are not indented and neither are
the first lines of text following the heading indented. If a primary heading is directly followed
by a secondary heading, only a ½ blank line should be set between the two headings.
In the Word programme this has to be done manually as follows: Place the cursor on the pri-
mary heading, select Paragraph in the Format menu, and change the setting for spacing after,
from 12 pt to 0 pt. In the same way the setting in the secondary heading for spacing before
should be changed from 18 pt to 6 pt.
3.3 Listing and numbering

For listing facts, use either the style tag List summary signs or the style tag List number signs.
3.4 Equations
Use the equation editor of the selected word processing programme. Equations are indented 12
mm (0.47") from the left margin (Formula tag). Number equations consecutively and place the
number with the tab key at the end of the line, between parantheses. Refer to equations by these
numbers. See for example Equation 1 below:
From the above we note that sin  = (x + y)z or:
( )
4
R2t
Kt= 1− k
c a +ν tan d 1
(1)
where ca = interface adhesion;  = friction angle at interface; and k1 = shear stiffness number.
3.5 Tables
Locate tables close to the first reference to them in the text and number them consecutively.
Avoid abbreviations in column headings. Indicate units in the line immediately below the head-
ing. Explanations should be given at the foot of the table, not within the table itself. Use only
horizontal rules: One above and one below the column headings and one at the foot of the table
(Table rule tag: Use the Shift-minus key to actually type the rule exactly where you want it). For
simple tables use the tab key and not the table option. Type all text in tables in small type (Table
text tag). Align all headings to the left of their column and start these headings with an initial
capital. Type the caption above the table to the same width as the table (Table caption tag). See
for example Table 2.
3.6 Figure captions

Always use the Figure caption style tag (10 points size on 11 points line space). Place the
caption underneath the figure (see example in Section 4). Type as follows: ‘ Figure 1. Caption.’
Leave about two lines of space between the figure caption and the text of the paper.
Table 2. The number of officially reported plague cases in the world.

_____________________________________________________________________________
Region* 1968 1970 1972 1974 1976 1978
_____________________________________________________________________________
Africa 172 27 85 128 183 77
America 392 326 301 297 321 142
Asia 4395 4111 2312 1408 2230 518
Total 4959 4464 2698 1833 2734 737

_____________________________________________________________________________
*For Europe only one reported case in 1970.
3.7 References
In the text, place the authors’ last names (without initials) and the date of publication in paren -
theses (see examples in Section 3.7.1). At the end of the paper, list all references in alphabetical
order underneath the heading REFERENCES (Heading without number tag). The references
should be typed in small text (10 pt on 11 pt) and second and further lines should be indented
4.0 mm (Reference text tag). If several works by the same author are cited, entries should be
chronological:
Larch, A.A. 1996a. Development ...
Larch, A.A. 1996b. Facilities ...
Larch, A.A. 1997. Computer ...
Larch, A.A. & Jensen, M.C. 1996. Effects of ...
Larch, A.A. & Smith, B.P. 1993. Alpine ...
In bibliographies the order for books and journals are respectively:

Last name, First name or Initials (ed.) year. Book title. City: Publisher.
Last name, First name or Initials year. Title of article. Title of Journal (series number if necessary) vol-
ume number (issue number if necessary): page numbers.
3.7.1 Examples:
Grove, A.T. 1980. Geomorphic evolution of the Sahara and the Nile. In M.A.J. Williams & H. Faure
(eds), The Sahara and the Nile: 21-35. Rotterdam: Balkema.
Jappelli, R. & Marconi, N. 1997. Recommendations and prejudices in the realm of foundation engineer -
ing in Italy: A historical review. In Carlo Viggiani (ed.), Geotechnical engineering for the preserva-
tion of monuments and historical sites; Proc. intern. symp., Napoli, 3-4 October 1996. Rotterdam:
Balkema.
Johnson, H.L. 1965. Artistic development in autistic children. Child Development 65(1): 13-16.
Polhill, R.M. 1982. Crotalaria in Africa and Madagascar. Rotterdam: Balkema.
3.7.2 Endnote
We would appreciate it if you make use of the enclosed Endnotes stylefile (Harvard.ens).
3.8 Notes
These should be avoided. Insert the information in the text. In tables, the following reference
marks should be used: *, **, etc. and the actual footnotes are then set directly underneath the ta -
ble.
3.9 Conclusions
Conclusions should state concisely the most important propositions of the paper as well as the
author’s views of the practical implications of the results.
4 PHOTOGRAPHS AND FIGURES
Number figures consecutively in the order in which reference is made to them in the text, mak -
ing no distinction between diagrams and photographs. Figures should fit within the column
width of 90 mm (3.54") or within the type area width of 187 mm (7.36").
Paste copies of figures, photographs etc. at the required size onto the typescript where you
want them to appear in the text. Do not place them sideways on a page. Figures, etc. should not
be centered, but placed against the left margin. Leave about two lines of space between the ac -
tual text and figure (including caption). Never place any text next to a figure. Leave this space
blank. The most convenient place for placing figures is at the top or bottom of the page. Avoid
placing text between figures as readers might not notice the text. Line drawings (as well as pho-
tographic reproductions of these) should be in black (not grey) on white. Keep in mind that ev-
erything will be reduced to 75%. Therefore, 9 point should be the minimum size of the lettering.
Lines should preferably be 0.2 mm (0.1") thick. Keep figures as simple as possible. Avoid ex-
cessive notes and designations.
anelastic strain

crack closure cracking
stiffness decrease
under compression

 f stiffness decrease
in tension
crack closure
compressive
yielding point
Figure 1. Caption of a typical figure.
5 PREFERENCES, SYMBOLS AND UNITS
Consistency of style is very important. Note the spacing, punctuation and caps in all the exam -
ples below.
 References in the text: Figure 1, Figures 2-4, 6, 8a, b (not abbreviated)
 References between parentheses: (Fig. 1), (Figs 2-4, 6, 8a, b) (abbreviated)
 USA / UK / The Netherlands instead of U.S.A. / U.K. / Netherlands / the Netherlands
 Author & Author (1989) instead of Author and Author (1989)
 (Author 1989a, b, Author & Author 1987) instead of (Author, 1989a,b; Author and Author,
1987)
 (Author et al. 1989) instead of (Author, Author & Author 1989)
 Use the following style: (Author, in press); (Author, in prep.); (Author, unpubl.); (Author,
pers. comm.)
Always use the official SI notations:
 kg / m / kJ / cm instead of kg. (Kg) / m. / kJ. (KJ) / cm.;
 20°1632SW instead of 20° 16 32 SW
 0.50 instead of 0,50 (used in French text); 9000 instead of 9,000 but if more than 10,000:
10,000 instead of 10000
 14
C instead of C14 / C-14 and BP / BC / AD instead of B.P. / B.C. / A.D.
 20 instead of 20 / X20 / x 20; 4 + 5 > 7 instead of 4+5>7 but –8 / +8 instead of – 8 / + 8
 e.g. / i.e. instead of e.g., / i.e.,

6 SUBMISSION OF MATERIAL TO THE EDITOR
The camera-ready copy PDF file of the complete paper should be created with the enclosed
CPI_AR_PDF1.7.joboptions file and sent to the editor together with the MS Word file. All fig -
ures should be included as high resolution images in the PDF file (see artwork document for
resolution requirements). Check whether the paper looks the same as this sample: Title at top of
first page in 18 points, authors in 14 points and all other text in 11 points on 12 points line
space, except for the small text (10 point on 11 point line space) used in tables, captions and ref-
erences. Also check if the type width is 187 mm (7.36"), the column width 90 mm (3.54"), the
page length is 272 mm (10.71") and that the space above the ABSTRACT is exactly as in this
sample.

B1PROCA4EXAMPLEPAPER

Uploaded by

Copyright:

Available Formats

You might also like

B1PROCA4EXAMPLEPAPER

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

B1PROCA4EXAMPLEPAPER

Uploaded by

Copyright:

Available Formats

Are State-of-the-Art Textual Hate Speech Detectors Resistant to

Adversarial attacks in text domain

Specificity Adversary’s knowledge Perturbation granularity

Targeted Untargeted Black-box White-box Character Sentence

Semi-white (Gray) box Word

Figure 1 Taxonomy of adversarial attacks in text domain

POS consistency, Substitution with

Levenshtein edit Character inser-

Language-Tool Word insertion,

2.2 Title, author and affiliation frame

2.3 Abstract frame

3.3 Listing and numbering

From the above we note that sin  = (x + y)z or:

3.6 Figure captions

Table 2. The number of officially reported plague cases in the world.

America 392 326 301 297 321 142

Asia 4395 4111 2312 1408 2230 518

Total 4959 4464 2698 1833 2734 737

Larch, A.A. 1996b. Facilities ...

Larch, A.A. 1997. Computer ...

Larch, A.A. & Jensen, M.C. 1996. Effects of ...

Larch, A.A. & Smith, B.P. 1993. Alpine ...

In bibliographies the order for books and journals are respectively:

Polhill, R.M. 1982. Crotalaria in Africa and Madagascar. Rotterdam: Balkema.

4 PHOTOGRAPHS AND FIGURES

Figure 1. Caption of a typical figure.

5 PREFERENCES, SYMBOLS AND UNITS

 References in the text: Figure 1, Figures 2-4, 6, 8a, b (not abbreviated)

 References between parentheses: (Fig. 1), (Figs 2-4, 6, 8a, b) (abbreviated)

 USA / UK / The Netherlands instead of U.S.A. / U.K. / Netherlands / the Netherlands

 Author & Author (1989) instead of Author and Author (1989)

 (Author et al. 1989) instead of (Author, Author & Author 1989)

Always use the official SI notations:

 kg / m / kJ / cm instead of kg. (Kg) / m. / kJ. (KJ) / cm.;

 20°1632SW instead of 20° 16 32 SW

 20 instead of 20 / X20 / x 20; 4 + 5 > 7 instead of 4+5>7 but –8 / +8 instead of – 8 / + 8

 e.g. / i.e. instead of e.g., / i.e.,

You might also like