Discord Icwsm-2
Arthur Buzelin¹*, Yan Aquino¹*, Victoria Estanislau¹*, Pedro Bento¹*, Lucas Dayrell¹*,
Caio Santana¹*, Pedro Robles¹*, Ana Paula Couto¹, Virgilio Almeida¹,
Fabricio Benevenuto¹, Wagner Meira Jr¹
¹ Universidade Federal de Minas Gerais, Brazil
{arthurbuzelin, yanaquino, victoria.estanislau, pedro.bento, lucasdayrell, caiosantana, ana.coutosilva, virgilio, fabricio, meira}@dcc.ufmg.br, pedroroblesduten@ufmg.br
Research Gap
While platforms such as WhatsApp and Telegram have been widely studied recently, Discord remains relatively underexplored. This oversight is especially significant in the Brazilian context, where recent incidents have caused national uproar and highlighted the platform’s role in facilitating harmful behaviors. Motivated by these developments, our study seeks to address this gap by providing a comprehensive analysis of user interactions and the effectiveness of moderation within Brazilian Discord communities. By focusing on this specific geographical and cultural setting, we aim to uncover nuanced insights that previous studies may have overlooked, thus contributing significantly to the broader understanding of how digital spaces can influence user behavior.

Dataset Collection

We collected messages from public Brazilian Discord groups, following the platform policies². These groups fall into two distinct types: (i) those featured in Discovery³, an official in-app feature for browsing new groups to join, which must adhere to stringent guidelines, including maintaining a safe, moderated environment and avoiding sensitive or controversial topics; and (ii) those with publicly available invite links on external websites and online forums.

We then developed a web scraper to collect invitation links. These links were extracted from Discord’s Discovery feature and the four most prominent Discord link-sharing websites. The data collection occurred in October 2023; Table 1 summarizes the number of groups identified from each publicly available source. In this effort, we ensured that our dataset included all publicly accessible groups in Brazil.

The collection was done using a custom crawler based on the official Discord API⁴. Once configured, our crawler entered public Discord groups, namely those whose invitation links were published on the cited public sources, and saved all text messages shared on their channels from the creation of Discord, May 13th, 2015, up to October 1st, 2023. For each message, we collected the corresponding timestamp, the author’s username, the author’s unique ID, and the content. Due to the deactivation of 77 groups, our crawler returned approximately 3 billion text messages from 2,020 unique groups.

Although our crawling process covers a time window of more than 8 years, we restricted our analysis to the period in which user activity, in terms of exchanged messages, reached its highest volume: between October 1st, 2022 and September 30th, 2023. Moreover, we filtered out messages from groups with fewer than 100 exchanged messages, since those consisted of abandoned groups. As a result, approximately 2.3 billion messages were filtered out of the analyses.

We then proceeded with our filtering process, turning our attention to the presence of bots and deleted users in our dataset. Bots on the platform are automated accounts that perform functions like greeting new members, moderating content, playing multimedia files, or even hosting games among the users. Deleted users, in contrast, are those whose account was deleted either by the user or by Discord due to a violation of the terms of service. We identified bots and “Deleted Users” using the Discord API. We found 30,331,773 and 55,878,395 messages sent by bots and “Deleted Users”, respectively. The messages sent by bots were removed, reducing our dataset to 677,494,339 messages sent by 2,239,337 unique users, while the “Deleted Users” messages were marked for future analysis. The entire process of data filtering, including these criteria and the subsequent steps taken, is detailed in Figure 1.

² https://discord.com/safety/360043709612-our-policies
³ https://discord.com/guild-discovery
⁴ https://discord.com/developers/applications

Age retrieval

In light of reports from Brazil, where a significant number of incidents involved teenagers both as perpetrators and victims, understanding the age demographics within our dataset becomes crucial. This demographic insight is essential not only for evaluating the effectiveness of Discord’s moderation strategies but also for gaining a deeper understanding of user interactions on the platform.

Given that Discord’s API does not provide users’ ages, we explored a distinctive feature of Brazilian Discord groups: many have an “introduce yourself” text channel in which users typically share personal details, including their age. This study, therefore, concentrates on leveraging this self-reported age information, retrieved only from public channels in public groups.

Our method involved deploying a script that uses regular expressions to extract age data from these channels. To
Figure 1: A diagram displaying the three stages of the dataset collection and filtering process, along with the total number of groups, messages, and users after filtering.
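The filtering stages described in the Dataset Collection section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Message` record and its field names are hypothetical, timestamps are assumed to be in UTC, and the 100-message threshold is assumed to be counted inside the analysis window (the text does not say which count was used).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Message:
    # Hypothetical record layout; the paper does not specify one.
    group_id: str
    author_id: str
    timestamp: datetime
    is_bot: bool = False
    is_deleted_user: bool = False


# Analysis window from the text: October 1st, 2022 to September 30th, 2023.
WINDOW_START = datetime(2022, 10, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2023, 10, 1, tzinfo=timezone.utc)  # exclusive bound
MIN_GROUP_MESSAGES = 100


def filter_messages(messages):
    """Return (kept, deleted_user_messages) after the three filtering steps."""
    # Step 1: keep only messages inside the analysis window.
    in_window = [m for m in messages if WINDOW_START <= m.timestamp < WINDOW_END]

    # Step 2: drop groups with fewer than 100 messages (assumed: in-window count).
    per_group = Counter(m.group_id for m in in_window)
    active = [m for m in in_window if per_group[m.group_id] >= MIN_GROUP_MESSAGES]

    # Step 3: remove bot messages; deleted-user messages stay but are flagged.
    kept = [m for m in active if not m.is_bot]
    deleted_user_messages = [m for m in kept if m.is_deleted_user]
    return kept, deleted_user_messages
```

Counting group sizes before removing bots mirrors the order in which the text lists the steps; reversing that order could change which groups pass the threshold.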
fact that many of the platform’s most significant issues manifest in less visible or non-gaming contexts. Therefore, an analysis that focuses on Discord as merely a gaming network might fail to capture these broader, more pervasive problems.

We could also correlate users’ ages with the types of hate speech they engage with most, either by reproducing it or by being exposed to it. As shown in Figure 6 and Figure 7, ‘Pornography’, the category with the most pronounced hate speech occurrences, has a concerningly high incidence of younger audiences. This analysis underscores the critical importance of addressing safety concerns on platforms where children and teenagers are engaging with content and communities meant for adults. It also highlights a potential gap in parental oversight and the platform’s responsibility to protect younger users from exposure to harmful content.

User Path

Users may join various groups and interact with many other users. To effectively monitor user behavior, it is essential to first comprehend the different environments that they could encounter. While it is challenging to pinpoint the exact triggers of toxic messages, analyzing patterns can provide valuable insights.

We conducted a detailed examination of user behavior in discrete intervals. Messages from each user were divided into five equal segments, with each segment representing one-fifth of their total messages. This segmentation allows us to track the evolution of discourse across different periods for users who have posted at least one toxic message. After evaluating all segments for a user, the patterns of behavior were analyzed to determine the trend: (i) if the toxicity rate of messages increased in each segment, the user was categorized under ‘Increasing Toxicity’; (ii) conversely, if the trend showed a decrease in toxicity over time, the user fell into the ‘Decreasing Toxicity’ category; (iii) if no clear trend was observed and the user’s behavior varied between the segments, they were placed in the ‘Inconsistent Toxicity’ category.

Based on the results shown in Figure 8, the majority of users fall under the ‘Inconsistent Toxicity’ category, suggesting a sporadic presence of toxic behavior that does not follow a clear increasing or decreasing trend. This irregularity could indicate that users’ behavior may be context-dependent, influenced by external factors, or that Discord experiences a highly dynamic discourse environment where users oscillate between toxic and non-toxic interactions over time. Despite that, a relatively high number of users exhibit an increase in toxic behavior. The disparity between this group and those who exhibit decreasing toxicity highlights a potential escalation in negative discourse within the platform.

Figure 8: Trends in users’ toxicity.

By analyzing Figures 7 and 9, we observe a discernible pattern of user migration among the “pornography,” “anime,” and “online dating” clusters. This trend is particularly troubling as it suggests a trajectory within these communities that may influence user behavior. The interconnectedness shown in the network graph, along with the heatmap data, highlights significant movements between these clusters, indicating that user behavior is not confined to isolated groups but spans multiple groups, potentially influencing and reinforcing harmful ideologies and behaviors.

To explore the impact of toxic messages on user behavior within Brazilian Discord communities, we conducted an analysis using a novel graphical representation. This graph, illustrated in Figure 10, tracks the time interval between an initial toxic message and subsequent toxic responses. The graph demonstrates a striking pattern: about 60% of follow-up toxic messages occur within one minute of the initial message. This visual insight highlights the rapid propagation of toxicity within the digital communication channels, suggesting a potent, immediate influence exerted by the initial toxic message on subsequent interactions.

Studies of similar interactions on platforms like Twitter and Facebook have shown that toxic messages tend to provoke immediate follow-up responses, indicating a reactive and contagious nature of toxicity [Saveski, Roy, and Roy 2021]. This phenomenon, also evident in the Discord groups, suggests that once a toxic message is introduced into a conversation, it significantly increases the likelihood of an immediate toxic response.
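As a rough illustration of the interval measurement discussed above, the sketch below computes the gaps between toxic messages and the share of gaps that fall within one minute. The text does not specify how the intervals were paired, so this assumes consecutive toxic messages within a single channel; the function names and the `(timestamp, is_toxic)` input format are hypothetical.

```python
from datetime import datetime, timedelta


def toxic_followup_intervals(messages):
    """Gaps between consecutive toxic messages in one channel.

    `messages` is an iterable of (timestamp, is_toxic) pairs; this input
    shape is an assumption made for the example.
    """
    toxic_times = sorted(ts for ts, is_toxic in messages if is_toxic)
    return [later - earlier for earlier, later in zip(toxic_times, toxic_times[1:])]


def share_within(intervals, limit=timedelta(minutes=1)):
    """Fraction of intervals no longer than `limit` (0.0 when there are none)."""
    if not intervals:
        return 0.0
    return sum(1 for gap in intervals if gap <= limit) / len(intervals)
```

Applied per channel and pooled over the dataset, `share_within` would yield a figure comparable in spirit to the one-minute share reported in the text, although a different pairing rule (for example, measuring from the first toxic message of a conversation) would give different numbers.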
Figure 10: Time interval until the next hate speech message after one occurs.
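The five-segment trend classification described in the User Path section can be sketched as follows. This is a minimal sketch under assumptions the text leaves open: a trend counts as increasing or decreasing only if the toxicity rate changes strictly in the same direction across all five segments, uneven message counts are split by integer index, and users with fewer than five messages are skipped. The function name and the boolean-flag input are hypothetical.

```python
def toxicity_trend(flags, segments=5):
    """Classify one user's toxicity trend.

    `flags` is the user's chronologically ordered list of booleans, True
    where a message was labeled toxic. Returns None for users outside the
    analysis (no toxic message, or too few messages to segment).
    """
    n = len(flags)
    if n < segments or not any(flags):
        return None

    # Split into `segments` nearly equal, contiguous chunks.
    bounds = [i * n // segments for i in range(segments + 1)]
    rates = [sum(flags[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

    # Compare each segment's toxicity rate with the next one.
    pairs = list(zip(rates, rates[1:]))
    if all(x < y for x, y in pairs):
        return "Increasing Toxicity"
    if all(x > y for x, y in pairs):
        return "Decreasing Toxicity"
    return "Inconsistent Toxicity"
```

Under the strict rule assumed here, a single flat or reversed step places a user in the inconsistent category, which would be in line with that category being the largest in Figure 8; a looser rule (for example, a fitted slope) would shift users toward the other two categories.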