Qanon NLP White Paper
Zach Quinn
Introduction
Although disinformation is a rather abstract concept, it often finds its way into content and
costs businesses 78 billion dollars per year. Filtering and ultimately removing disinformation
remains a problem for platforms, businesses and content creators, who must delicately balance
protected speech against its damaging impact on industry and society. Among those most commonly
associated with spreading misinformation are the Q Anons, followers of the mythical and
anonymous military intelligence expert Q, who ‘dig’ to discover and ultimately disseminate
‘facts’ that contribute to the epidemic of misinformation. While other textual analyses
have focused on the words of ‘Q’ themselves in the 8Chan-hosted Q drops, this project employs
natural language processing techniques to examine the thoughts and motivations of believers in
the Q phenomenon and their connections to trending topics from the period of January 6th, the
Methodology
The data mined for this project constitutes a novel, dynamic data set, since it was drawn
from Twitter via the platform’s developer API. In order to ensure that the information was relevant,
the request was constrained to the period of January 6th, 2021, providing nearly four months of
tweets for the selected hash tags. The key words and hash tags were chosen based upon existing
journalistic domain knowledge as well as existing analytic reports. The phrases selected included
‘QAnon’, ‘Save The Children’ and ‘Deep State.’ After querying the API, raw text was stripped from each
returned tweet and converted to a corpus, in which each tweet was stored as an individual text
document. Next, the data was tokenized, or split into individual words, and vectorized using a
Term Document Matrix. Since this project was primarily interested in association, the key words
themselves were filtered out of the three queries, along with English stop words (commonly
occurring words like ‘the’, ‘can’ and ‘like’) and terms that were irrelevant to the search, e.g.
‘Unicef’ for a query concerning ‘Save The Children.’ Each term’s frequency was plotted on
histograms and word clouds, graphic tools ideal for displaying qualitative data. The subsequent
phase of the project involved obtaining sentiment scores derived from the NRC Emotion Lexicon.
Finally, several of the most frequently occurring and highly correlated words were compared using
a native association function.
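The tokenizing, filtering and counting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the sample tweets, the short stop-word list and the helper names (tokenize, build_tdm) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the study's real data came from the Twitter API.
tweets = [
    "QAnon followers trust the president",
    "The president can expect patriots, QAnon says",
    "Patriots like the president",
]

# Small illustrative stop-word list, plus the query term itself,
# which is removed because the interest is in associated words.
STOP_WORDS = {"the", "can", "like", "says"}
QUERY_TERMS = {"qanon"}

def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_tdm(docs, excluded):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(t for t in tokenize(d) if t not in excluded) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

terms, tdm = build_tdm(tweets, STOP_WORDS | QUERY_TERMS)

# Summing each row gives the corpus-wide frequency of a term,
# which is what a frequency plot or word cloud would display.
frequencies = {t: sum(row) for t, row in zip(terms, tdm)}
top_terms = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
print(top_terms[:3])
```

Each row of the matrix corresponds to one term and each column to one tweet, so row sums feed the frequency plots while the rows themselves can later be compared for association.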
Results
The ‘QAnon’ query returned results, visualized with a word cloud and histogram, that
were consistent with existing Q belief systems as well as ongoing political and cultural
narratives. Note in particular the correlation between ‘republican’, ‘evangelicals’ and ‘president.’
Several of these terms, including ‘patriottakes’, suggest that followers are both self-actualized
and ready to eliminate so-called threats to democracy, yet also resigned to wait for ‘gestures’ or
take cues from ‘president’, which aligns with Q’s original directions.
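One common way to quantify this kind of term association is the Pearson correlation between two terms' counts across documents. The sketch below assumes that approach, which may differ from the association function actually used in this project. The counts are invented for illustration and are not the study's data, and the names pearson and associations are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)

# Made-up term-document counts (one row per term, one column per tweet).
# These numbers are NOT taken from the study.
term_rows = {
    "republican":   [1, 0, 1, 0],
    "evangelicals": [1, 0, 1, 0],
    "president":    [1, 1, 0, 0],
    "gestures":     [0, 1, 0, 1],
}

def associations(target, rows, min_corr=0.0):
    """Correlate one term with every other term, keeping scores >= min_corr."""
    base = rows[target]
    scores = {t: pearson(base, v) for t, v in rows.items() if t != target}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return {t: round(s, 2) for t, s in ranked if s >= min_corr}

assocs = associations("republican", term_rows)
print(assocs)
```

Terms that tend to appear in the same tweets score close to 1, terms that appear independently score near 0, and terms that rarely co-occur score below 0 and fall under the threshold.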
This inner conflict is conveyed in the sentiment score for these tweets. Although one might
predict that QAnon tweets would be ‘negative’, the third most significant sentiment is ‘trust.’
This stands in stark contrast to tweets mentioning the phrase ‘Save the Children’, which were
While the spikes in both the positive and trust categories are noteworthy, the fact
that ‘fear’ is so low is inconsistent with the prevailing Q narrative that societal elites are preying
upon children, which is an age-old fear-mongering tactic used to mobilize a susceptible population.
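Category totals such as ‘positive’, ‘trust’ and ‘fear’ come from matching each word in a tweet against a lexicon and counting the categories it belongs to. The sketch below illustrates that tallying step only. The ten category names follow the NRC Emotion Lexicon (eight emotions plus negative and positive), but the word-to-category entries, the sample tweets and the emotion_counts helper are invented for illustration and are not taken from the lexicon or from this study.

```python
import re
from collections import Counter

# The eight emotions and two sentiments used by the NRC Emotion Lexicon.
CATEGORIES = [
    "anger", "anticipation", "disgust", "fear", "joy",
    "sadness", "surprise", "trust", "negative", "positive",
]

# Toy stand-in lexicon: these mappings are invented for illustration
# and are NOT real NRC Emotion Lexicon entries.
TOY_LEXICON = {
    "hoax": {"negative", "anger"},
    "expect": {"anticipation"},
    "uncovered": {"surprise"},
    "protect": {"trust", "positive"},
    "threat": {"fear", "negative"},
}

def emotion_counts(docs, lexicon):
    """Count how many word matches fall into each category across all docs."""
    totals = Counter({c: 0 for c in CATEGORIES})
    for doc in docs:
        for token in re.findall(r"[a-z]+", doc.lower()):
            for category in lexicon.get(token, ()):
                totals[category] += 1
    return totals

# Hypothetical tweets, not drawn from the collected data.
sample = ["Expect the hoax to be uncovered", "Protect them from every threat"]
scores = emotion_counts(sample, TOY_LEXICON)
print(dict(scores))
```

Because a single word can belong to several categories, the totals describe how often each emotion is evoked across the sample rather than assigning one label per tweet.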
For the ‘Deep State’ hash tag, the greatest correlations are between the words ‘hoax’ and
‘Russia’; both key words have been featured prominently across social platforms since the 2016
The inclusion of conspiratorial terms like ‘hoax’, ‘expect’ and ‘uncovered’ reflects the
significant fear and anticipation sentiments present in this data compared to the earlier samples.
Conclusion
essential for content creators, platform hosts and savvy Internet users to understand the
connotations and significance of terms associated with false claims. For developers, data
scientists and executives, it is necessary to understand how certain hash tags that may appear
innocuous, such as ‘Save the Children’, have been co-opted to spread ideologies that might run
counter to the brand of individuals or businesses who may naively use such a hash tag or key
word in digital communication. Users bear a similar responsibility to slow (and perhaps halt
entirely) the spread of misinformation when choosing the hash tags they use to help
platforms identify, aggregate and promote their posts. This project has demonstrated the critical
insights that can be gleaned from even a moderate sample of tweets and the power of natural
language processing to synthesize, interpret and, perhaps, preempt the spread of baseless content