
User Preference-aware Fake News Detection


Yingtong Dou1, Kai Shu2, Congying Xia1, Philip S. Yu1, Lichao Sun3
1 Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA
2 Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
3 Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA
{ydou5,cxia8,psyu}@uic.edu, kshu@iit.edu, james.lichao.sun@gmail.com

ABSTRACT

Disinformation and fake news have posed detrimental effects on individuals and society in recent years, attracting broad attention to fake news detection. The majority of existing fake news detection algorithms focus on mining news content and/or the surrounding exogenous context for deceptive signals, while the endogenous preference of a user when he/she decides whether to spread a piece of fake news is ignored. Confirmation bias theory indicates that a user is more likely to spread a piece of fake news when it confirms his/her existing beliefs/preferences. Users' historical, social engagements such as posts provide rich information about users' preferences toward news and have great potential to advance fake news detection. However, work on exploring user preference for fake news detection is somewhat limited. Therefore, in this paper, we study the novel problem of exploiting user preference for fake news detection. We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework. We release our code and data as a benchmark for GNN-based fake news detection: https://github.com/safe-graph/GNN-FakeNews.

CCS CONCEPTS

• Computing methodologies → Machine learning; • Information systems → Social networks.

KEYWORDS

Data Mining; Social Media Analysis; Fake News Detection

ACM Reference Format:
Yingtong Dou, Kai Shu, Congying Xia, Philip S. Yu, and Lichao Sun. 2021. User Preference-aware Fake News Detection. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), July 11–15, 2021, Virtual Event, Canada. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3404835.3462990

1 INTRODUCTION

In recent years, social media has enabled the wide dissemination of disinformation and fake news, i.e., false or misleading information disguised in news articles to mislead consumers [27, 37]. Disinformation has resulted in deleterious effects and raised serious concerns, demanding novel approaches for fake news detection.

Among various fake news detection techniques, fact-checking is the most straightforward approach; however, it is usually labor-intensive to acquire evidence from domain experts [9]. In addition, computational approaches using feature engineering or deep learning have shown many promising results [3, 12, 23, 31]. For example, SAFE [36] and FakeBERT [11] used TextCNN [35] and BERT [5, 28, 29], respectively, to encode the news textual information; GCNFN [17] and GNN-CL [8] leveraged the GCN [14] to encode news propagation patterns on social media (e.g., the news sharing cascade among social media accounts). However, these methods focus on modeling news content and its user exogenous context and ignore the user endogenous preferences.

Sociological and psychological studies on journalism have theorized the correlation between user preferences and online news consumption behaviors [24]. For example, Naïve Realism [22] indicates that consumers tend to believe that their perceptions of reality are the only accurate views, while others who disagree are regarded as uninformed, irrational, or biased; and Confirmation Bias theory [18] reveals that consumers prefer to receive information that confirms their existing views. For instance, a user who believes in election fraud would probably share similar news with a supportive stance, and news asserting that the election was stolen would attract users with similar beliefs [1]. To model user endogenous preferences, existing works have attempted to utilize historical posts as a proxy and have shown promising performance in detecting sarcasm [13], hate speech [20], and fake news spreaders [21] on social media.

In this paper, we consider the historical posts of social media users as their endogenous preference in news consumption. We propose an end-to-end fake news detection framework named User Preference-aware Fake News Detection (UPFD) to model endogenous preference and exogenous context jointly (as shown in Figure 1). Specifically, UPFD consists of the following major components: (1) To model the user endogenous preference, we encode news content and user historical posts using various text representation learning approaches. (2) To obtain the user exogenous context, we build a tree-structured propagation graph for each news piece based on its sharing cascade on social media. The news post is regarded as the root node, and the other nodes represent the users who shared the same news post. (3) To integrate the endogenous and exogenous information, we take the vector representations of news and users as their node features and employ Graph Neural Networks (GNNs) [7, 17] to learn a joint user engagement embedding. The user engagement embedding and news textual embedding are used to train a neural classifier to detect fake news. Our major contributions can be summarized as follows:

[Figure 1 diagram. Endogenous Preference Encoder: news content extraction and user historical posts extraction feed text representation learning models, producing the news textual embedding and per-user preference embeddings. Exogenous Context Encoder: social context extraction builds the news propagation graph, with the news piece v1 as the root and the engaged users v2, ..., vn as descendants; GNN layers and a readout produce the user engagement embedding. The user preference and news textual embeddings serve as node features, and the user engagement embedding concatenated with the news textual embedding feeds a neural classifier that outputs the news label.]
Figure 1: The proposed UPFD framework for user preference-aware fake news detection. Given the news piece and its engaged users on social
media, we extract the exogenous context as a news propagation graph and encode the endogenous information based on user historical posts
and news texts. The endogenous and exogenous information are fused using a GNN encoder. The final news embedding, composed of user
engagement embedding and news textual embedding, is fed into the neural classifier to predict the news’ credibility.

• We study a novel problem of user preference-aware fake news detection on social media;
• We propose a principled way to exploit both endogenous preference and exogenous context jointly to detect fake news; and
• We conduct extensive experiments on real-world datasets to demonstrate the effectiveness of UPFD for detecting fake news.

2 OUR APPROACH

In this section, we present the details of the proposed framework for fake news detection, named UPFD (User Preference-aware Fake News Detection). As shown in Figure 1, our framework has three major components. First, given a news piece, we crawl the historical posts of the users engaged with the news to learn user endogenous preference. We implicitly extract the preferences of engaged users by encoding their historical posts using text representation learning techniques (e.g., word2vec [16], BERT [5]); the news textual data is encoded using the same approach. Second, to leverage user exogenous context, we build the news propagation graph according to the news piece's engagement information on social media platforms (e.g., retweets on Twitter). Third, we devise a hierarchical information fusion process to fuse the user endogenous preference and exogenous context. Specifically, we obtain the user engagement embedding using a GNN as the graph encoder, where the news and user embeddings produced by the text encoder are used as the corresponding node features in the news propagation graph. The final news embedding is composed by concatenating the user engagement embedding and the news textual embedding.

Next, we introduce how we encode the endogenous preference, extract the exogenous context, and fuse both kinds of information.

2.1 Endogenous Preference Encoding

It is non-trivial to explicitly model the endogenous preference of a user using only his/her social network information. Similar to [2, 13, 20], which model users' personality, sentiment, and stance using their historical posts, we leverage the historical posts of a user to encode his/her preference implicitly. However, none of the previous fake news datasets contain such information. In this paper, we select the FakeNewsNet dataset [25], which contains news content and its social engagement information on Twitter. We then use the Twitter Developer API [4] to crawl the historical tweets of all accounts that retweeted the news in FakeNewsNet.

To obtain rich historical information for user preference modeling, we crawl the most recent two hundred tweets for each account, resulting in nearly 20 million tweets crawled in total. For inaccessible users whose accounts are suspended or deleted, we use randomly sampled tweets from accessible users engaging with the same news as their corresponding historical posts, because deleting the inaccessible users would break the intact news propagation cascade and result in a less effective exogenous context encoder. We also remove special characters, e.g., "@" mentions and URLs, before applying text representation learning methods.

To encode the news textual information and user preferences, we employ two types of text representation learning approaches based on language pretraining. Instead of being trained on the local corpus, word embeddings pretrained on a large corpus are expected to encode more semantic similarity between different words and sentences. For pretrained word2vec vectors, we choose the 680k 300-dimensional vectors pretrained by spaCy [10]. We also employ pretrained BERT embeddings to encode the historical tweets and news content as sequences [5] using bert-as-a-service [32].

Next, we elaborate the details of applying the above text representation learning models. spaCy includes pretrained vectors for 680k words, and we average the vectors of the words appearing in a user's combined recent 200 tweets to get the user preference representation; the news textual embedding is obtained similarly. For the BERT model, we use the cased BERT-Large model to encode the news and user information. The news content is encoded using BERT with the maximum input sequence length (i.e., 512 tokens). Due to BERT's input sequence length limitation, we cannot encode 200 tweets as one sequence, so we encode each tweet separately and average the resulting embeddings to obtain a user's preference representation. Since tweet text is generally much shorter than news text, we empirically set the max input sequence length of BERT to 16 tokens to accelerate tweet encoding. A minimal sketch of both encoders follows below.
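As a concrete illustration of the two encoders described above, the sketch below averages spaCy vectors over a user's combined recent tweets and encodes each tweet separately with BERT before averaging. It is a minimal sketch under stated assumptions, not the authors' code: the paper used bert-as-a-service [32], for which we substitute the Hugging Face transformers API, and the spaCy pipeline name is our assumption for a 300-dimensional pretrained-vector model.

import numpy as np
import spacy
import torch
from transformers import BertModel, BertTokenizer

nlp = spacy.load("en_core_web_lg")  # ships 300-d pretrained word vectors
tokenizer = BertTokenizer.from_pretrained("bert-large-cased")
bert = BertModel.from_pretrained("bert-large-cased").eval()

def word2vec_user_embedding(tweets: list[str]) -> np.ndarray:
    """Average spaCy word vectors over a user's combined recent tweets."""
    doc = nlp(" ".join(tweets))
    return doc.vector  # spaCy's doc.vector is already the mean of word vectors

@torch.no_grad()
def bert_user_embedding(tweets: list[str]) -> torch.Tensor:
    """Encode each tweet separately (max 16 tokens) and average the results.
    News content would be encoded the same way, with max_length=512."""
    per_tweet = []
    for t in tweets:
        enc = tokenizer(t, truncation=True, max_length=16, return_tensors="pt")
        out = bert(**enc).last_hidden_state          # (1, seq_len, 1024)
        per_tweet.append(out.mean(dim=1).squeeze(0)) # mean-pool one tweet
    return torch.stack(per_tweet).mean(dim=0)        # 1024-d user preference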


2.2 Exogenous Context Extraction

Given a news piece on social media, the user exogenous context is composed of all users that engaged with the news. We utilize the retweet information of news pieces to build a news propagation graph. As the toy example in Figure 1 shows, it is a tree-structured graph in which the root node represents the news piece and the other nodes represent the users who shared the root news. In this paper, we investigate fake news propagation on Twitter as a proof-of-concept use case. To build propagation networks on Twitter, we follow a strategy similar to those used in [8, 17, 26]. Specifically, we define a news piece as v1, and {v2, ..., vn} as the list of users that retweeted v1, ordered by time. We define the following two rules to determine the news propagation path (a sketch of both rules follows below):
• For any account vi, if vi retweets the news later than at least one account it follows in {v1, ..., vn}, we estimate that the news spreads to vi from the followed account with the latest timestamp, since the latest tweets are presented first in the Twitter timeline and thus have a higher probability of being retweeted.
• If account vi does not follow any account in the retweet sequence, including the source account, we conservatively estimate that the news spreads from the account with the largest number of followers, because tweets from accounts with more followers have a higher chance of being viewed/retweeted by other users according to Twitter's content distribution rules.
Based on the above rules, we can build news propagation graphs on Twitter. Note that this approach can be applied to other social media platforms like Facebook as well.
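A minimal sketch of the two path-inference rules, assuming hypothetical stand-ins for the crawled Twitter data (a time-ordered retweeter list, a who-follows-whom map, and follower counts):

def build_propagation_edges(retweeters, follows, follower_count):
    """
    retweeters: account ids ordered by retweet time; index 0 is the news
                post v1 (root), the rest are v2..vn.
    follows[u]: set of account ids that u follows.
    follower_count[u]: number of followers of account u.
    Returns a list of (parent, child) edges forming a tree rooted at v1.
    """
    edges = []
    for i, v in enumerate(retweeters[1:], start=1):
        earlier = retweeters[:i]  # v1..v_{i-1}, still ordered by time
        followed = [u for u in earlier if u in follows.get(v, set())]
        if followed:
            # Rule 1: attach to the followed account with the latest
            # retweet, since newer tweets surface first in the timeline.
            parent = followed[-1]
        else:
            # Rule 2: no followed account in the sequence; conservatively
            # attach to the earlier account with the most followers.
            parent = max(earlier, key=lambda u: follower_count[u])
        edges.append((parent, v))
    return edges

# Toy usage: "b" follows "a", so Rule 1 applies to "b"; "a" follows nobody
# in the sequence, so Rule 2 attaches it to the root.
# build_propagation_edges(["news", "a", "b"], {"b": {"a"}},
#                         {"news": 10_000, "a": 50, "b": 7})
# -> [("news", "a"), ("a", "b")]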
2.3 Information Fusion

Previous works [8, 15, 17] have demonstrated that fusing user features with a news propagation graph can boost fake news detection performance. Since a GNN can encode both node features and graph structure in an end-to-end manner, it is a good fit for our task. Specifically, we propose a hierarchical information fusion approach. We first fuse the endogenous and exogenous information using a GNN, taking the news textual embedding and the user preference embeddings as node features. Given the news propagation graph, most GNNs learn the embedding of a node by aggregating the features of its adjacent nodes. Like previous GNN-based graph classification models [33, 34], we apply a readout function over all node embeddings to obtain the embedding of a news propagation graph; the readout applies mean pooling over all node embeddings to produce the graph embedding (i.e., the user engagement embedding). Second, since the news content usually contains more explicit signals regarding the news' credibility [3], we concatenate the news textual embedding with the user engagement embedding to form the ultimate news embedding and enrich its information.

The fused news embedding is finally fed into a two-layer Multi-Layer Perceptron (MLP) with two output neurons representing the predicted probabilities for fake and real news. The model is trained using a binary cross-entropy loss function and is updated with SGD.
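The fusion step admits a compact PyTorch Geometric sketch. This is illustrative rather than the released implementation: a single SAGEConv layer stands in for the GNN encoder (the paper evaluates both GraphSAGE- and GCN-based encoders), the hidden size is illustrative, and UPFDSketch and root_index are names we introduce here.

import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv, global_mean_pool

class UPFDSketch(torch.nn.Module):
    """GNN over the propagation graph + mean-pool readout, concatenated
    with the root (news) feature, then a two-layer MLP classifier."""
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.conv = SAGEConv(feat_dim, hidden)            # graph encoder
        self.fc1 = torch.nn.Linear(hidden + feat_dim, hidden)
        self.fc2 = torch.nn.Linear(hidden, 2)             # fake vs. real

    def forward(self, x, edge_index, batch, root_index):
        # x: node features (text embeddings); root_index: position of each
        # graph's root/news node within the batched node tensor.
        h = F.relu(self.conv(x, edge_index))
        engagement = global_mean_pool(h, batch)           # readout (mean pool)
        news_text = x[root_index]                         # endogenous signal
        fused = torch.cat([engagement, news_text], dim=-1)
        return self.fc2(F.relu(self.fc1(fused)))          # class logits

With two output neurons, the binary cross-entropy objective described above corresponds to a standard cross-entropy loss over these logits.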
3 EXPERIMENTS

In the experiments, we want to address the following two questions. RQ1: How does the proposed UPFD framework perform compared to previous works? RQ2: What are the contributions of endogenous/exogenous information and of other variants of the proposed framework?

3.1 Experimental Setup

3.1.1 Dataset. Previous works have proposed several fake news datasets with news pieces from different websites and their fact-checking information [19, 30]. To investigate both the user preference and the propagation pattern of fake news, we choose the FakeNewsNet dataset [25]. It contains fake and real news from two fact-checking websites and the related social engagements from Twitter. The dataset statistics are shown in Table 1.

Table 1: Dataset and graph statistics.

Dataset            #Graphs (#Fake)   #Total Nodes   #Total Edges   #Avg. Nodes per Graph
Politifact (POL)   314 (157)         41,054         40,740         131
Gossipcop (GOS)    5,464 (2,732)     314,262        308,798        58

3.1.2 Baselines. We compare UPFD with fake news detection models that utilize different information. Many baseline methods leverage extra information, such as images, that is not included in FakeNewsNet [25]. To ensure a fair comparison, we implement the baselines only with the parts that encode the news content, user comments, and news propagation graph. CSI [23] employs an LSTM to encode the news content. SAFE [36] uses TextCNN [35] to encode the news textual information. GCNFN [17] is the first fake news detection framework to encode the news propagation graph using GCN [14]; it takes profile information and comment textual embeddings as user features. GNN-CL [8] encodes the news propagation graph using DiffPool [34], a GNN designed for graph classification; its node features are extracted from user profile attributes on Twitter, and the list of ten profile feature names can be found in [8, 15]. We also add two baselines that apply an MLP directly to news textual embeddings encoded by word2vec and BERT.

3.1.3 Experimental Settings. We implement all models using PyTorch, and all GNN models are implemented with the PyTorch Geometric package [6]. We use a unified graph embedding size (128), batch size (128), optimizer (Adam), L2 regularization weight (0.001), and train-val-test split (20%-10%-70%) for all models. The experimental results are averaged over five runs. Other hyperparameters for each model are reported with the code. A sketch of this setup follows below.
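The released benchmark can be loaded directly in recent PyTorch Geometric versions, which ship a UPFD dataset class built from the data described above. The snippet below is a sketch of the stated settings (embedding size 128, batch size 128, Adam with 0.001 weight decay) under that assumption; the learning rate is illustrative and not taken from the paper.

import torch
import torch.nn.functional as F
from torch_geometric.datasets import UPFD
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GraphSAGE, global_mean_pool

# name: 'politifact' or 'gossipcop'; feature: e.g. 'profile', 'spacy', 'bert'
train_ds = UPFD(root="data", name="politifact", feature="bert", split="train")
loader = DataLoader(train_ds, batch_size=128, shuffle=True)

encoder = GraphSAGE(train_ds.num_features, hidden_channels=128, num_layers=2)
clf = torch.nn.Linear(128, 2)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(clf.parameters()),
    lr=0.01, weight_decay=0.001)  # lr illustrative; weight decay per Sec. 3.1.3

for data in loader:
    optimizer.zero_grad()
    h = encoder(data.x, data.edge_index)           # node embeddings
    logits = clf(global_mean_pool(h, data.batch))  # graph-level readout
    loss = F.cross_entropy(logits, data.y)
    loss.backward()
    optimizer.step()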
3.2 RQ1: Performance Evaluation

Table 2 shows the fake news detection performance of UPFD and six baselines. First, we observe that UPFD performs best among all methods, outperforming the strongest baseline, GCNFN, by around 1% on both datasets with statistical significance. The results of UPFD and GCNFN demonstrate that user comments (used by GCNFN) are also beneficial to fake news detection, and that the user endogenous preference can provide additional information when user comment information is limited.


Table 2: The fake news detection performance of baselines and our model. Stars denote statistical significance under the t-test (∗ p ≤ 0.05, ∗∗ p ≤ 0.01, ∗∗∗ p ≤ 0.001).

                              POL               GOS
              Model           ACC      F1       ACC      F1
News Only     SAFE [36]       73.30    72.87    77.37    77.19
              CSI [23]        76.02    75.99    75.20    75.01
              BERT+MLP        71.04    71.03    85.76    85.75
              word2vec+MLP    76.47    76.36    84.61    84.59
News + User   GNN-CL [8]      62.90    62.25    95.11    95.09
              GCNFN [17]      83.16    83.56    96.38    96.36
              UPFD (ours)     84.62∗   84.65∗   97.23∗∗  97.22∗∗∗

[Figure 2 plots: two bar charts, Politifact and Gossipcop, showing ACC and F1 for the variants -END & -EXO, -END, -EXO, and UPFD; the y-axis spans roughly 0.72 to 0.86 on Politifact and 0.84 to 0.96 on Gossipcop.]

Figure 2: The fake news detection performance of different variants of the UPFD framework. -END/-EXO represents the UPFD variant without endogenous/exogenous information.
Second, since all baselines encode either the news content or the user comments without considering historical posts, we can conclude that leveraging historical posts as user endogenous preferences improves fake news detection performance. Note that the UPFD configuration with the best performance on both datasets uses BERT as the text encoder and GraphSAGE as the graph encoder.

Table 3: Fake news detection performance on two datasets with different node feature types and models. The bold (underlined) text indicates the best (second best) performance on each dataset.

            POL                                 GOS
Feature     GraphSAGE        GCNFN              GraphSAGE        GCNFN
            ACC      F1      ACC      F1        ACC      F1      ACC      F1
Profile     77.38    77.12   76.94    76.72     92.19    92.16   89.00    88.96
word2vec    80.54    80.41   80.54    80.41     96.81    96.80   94.97    94.95
BERT        84.62    84.53   83.26    83.14     97.23    97.22   96.18    96.17

3.3 RQ2: Ablation Study

3.3.1 Encoder Variants. As mentioned in Section 2.3, we employ different text encoders and GNNs to encode the endogenous and exogenous information. In Table 3, we show the fake news detection performance of two GNN variants using three different node features. Note that "word2vec" and "BERT" represent features encoding the user endogenous preferences, while the "Profile" feature is regarded as a baseline. GraphSAGE [7] is a GNN that learns node embeddings by aggregating neighbor node information, and GCNFN [17] is a GNN-based fake news detection model that leverages two GCN layers to encode the news propagation graph. Table 3 shows that the endogenous features (word2vec and BERT) are consistently better than the Profile feature, which only encodes user profile information. We also observe that GraphSAGE and BERT attain the best average performance among the model and feature variants, suggesting that BERT is better than word2vec for encoding textual features, as has been verified on other NLP tasks [5]. Note that the BERT performance could be further improved via fine-tuning, which we leave as future work.

3.3.2 Framework Variants. To verify the effectiveness of endogenous preference and exogenous context, we fix the text and graph encoders and design three UPFD variants that remove the endogenous information, the exogenous information, or both. Specifically, we employ GCNFN (word2vec) as the graph (text) encoder for both datasets and remove the news concatenation to ensure a fair comparison. The UPFD variant without exogenous information (-EXO) is implemented by removing all edges in the news propagation graph; thus, -EXO encodes the news embedding solely from node features, without exchanging information between nodes (see the sketch below). The UPFD variant without endogenous information (-END) takes the user profile as node features and therefore contains no user endogenous preference information. The UPFD variant without both kinds of information (-END & -EXO) replaces the node features of -EXO with user profile features.
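As one concrete way to realize the -EXO variant, a propagation graph's edge set can simply be emptied before it reaches the GNN. This sketch assumes one PyTorch Geometric Data object per news graph; drop_exogenous is a name we introduce here.

import torch
from torch_geometric.data import Data

def drop_exogenous(graph: Data) -> Data:
    """Remove all edges so the GNN encodes node features only (-EXO)."""
    ablated = graph.clone()
    ablated.edge_index = torch.empty((2, 0), dtype=torch.long)  # no edges
    return ablated

# -END would instead swap the node feature matrix (graph.x) for the
# ten-dimensional user profile features, keeping the edges intact.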
Figure 2 shows the fake news detection performance of the different UPFD variants on the two datasets. We find that removing either component from UPFD reduces its performance, while jointly encoding the endogenous and exogenous information attains the best performance. On Politifact, the accuracy of UPFD/-EXO is 85.61%/81.63%, and the F1 score of UPFD/-EXO is 85.97%/81.15%. On Gossipcop, the accuracy of UPFD/-EXO is 95.47%/93.92%, and the F1 score of UPFD/-EXO is 95.46%/93.81%. All the experimental results are statistically significant under the t-test (p ≤ 0.01). This indicates that exogenous information (i.e., the news propagation graph) is more informative on Politifact, since removing it results in a larger performance drop there. Overall, endogenous information contributes more to the performance gain than exogenous information, which further verifies the necessity of modeling user endogenous preferences.

4 CONCLUSION

In this paper, we argue that users' endogenous news consumption preferences play a vital role in the fake news detection problem. To verify this argument, we collect user historical posts to implicitly model the user endogenous preference and leverage the news propagation graph on social media as the exogenous social context of users. An end-to-end fake news detection framework named UPFD is proposed to fuse the endogenous and exogenous information and predict the news' credibility on social media. Experimental results demonstrate the advantage of modeling the user endogenous preference.

ACKNOWLEDGMENTS

This work is supported in part by NSF under grants III-1763325, III-1909323, and SaTC-1930941. Kai Shu is supported by the John S. and James L. Knight Foundation through a grant to the Institute for Data, Democracy & Politics at The George Washington University.


REFERENCES
[1] Anton Abilov, Yiqing Hua, Hana Matatov, Ofra Amir, and Mor Naaman. 2021. VoterFraud2020: a Multi-modal Dataset of Election Fraud Claims on Twitter. arXiv preprint arXiv:2101.08210 (2021).
[2] Nadeem Ahmad and Jawaid Siddique. 2017. Personality assessment using Twitter tweets. Procedia Computer Science 112 (2017), 1964–1973.
[3] Shantanu Chandra, Pushkar Mishra, Helen Yannakoudakis, and Ekaterina Shutova. 2020. Graph-based Modeling of Online Communities for Fake News Detection. arXiv preprint arXiv:2008.06274 (2020).
[4] Twitter Developer. 2021. Twitter API. https://developer.twitter.com/.
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[6] Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
[7] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NeurIPS.
[8] Yi Han, Shanika Karunasekera, and Christopher Leckie. 2020. Graph Neural Networks with Continual Learning for Fake News Detection from Social Media. arXiv preprint arXiv:2007.03316 (2020).
[9] Naeemul Hassan, Gensheng Zhang, Fatma Arslan, Josue Caraballo, Damian Jimenez, Siddhant Gawsane, Shohedul Hasan, Minumol Joseph, Aaditya Kulkarni, Anil Kumar Nayak, et al. 2017. ClaimBuster: the first-ever end-to-end fact-checking system. VLDB (2017).
[10] Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear 7, 1 (2017).
[11] Rohit Kumar Kaliyar, Anurag Goswami, and Pratik Narang. [n.d.]. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimedia Tools and Applications ([n.d.]), 1–24.
[12] Hamid Karimi, Proteek Roy, Sari Saba-Sadiya, and Jiliang Tang. 2018. Multi-Source Multi-Class Fake News Detection. In COLING.
[13] Anupam Khattri, Aditya Joshi, Pushpak Bhattacharyya, and Mark Carman. 2015. Your sentiment precedes you: Using an author's historical tweets to predict sarcasm. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 25–30.
[14] Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
[15] Yi-Ju Lu and Cheng-Te Li. 2020. GCAN: Graph-aware Co-Attention Networks for Explainable Fake News Detection on Social Media. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
[16] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[17] Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M Bronstein. 2019. Fake news detection on social media using geometric deep learning. ICLR Workshop (2019).
[18] Raymond S Nickerson. 1998. Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology 2, 2 (1998), 175.
[19] Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2018. BuzzFeed-Webis Fake News Corpus 2016. https://doi.org/10.5281/zenodo.1239675
[20] Jing Qian, Mai ElSherief, Elizabeth M Belding, and William Yang Wang. 2018. Leveraging intra-user and inter-user representation learning for automated hate speech detection. arXiv preprint arXiv:1804.03124 (2018).
[21] Francisco Rangel, Anastasia Giachanou, Bilal Ghanem, and Paolo Rosso. 2020. Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In CLEF.
[22] Lee Ross and Andrew Ward. 1995. Naive realism in everyday life: Implications for social conflict and misunderstanding. (1995).
[23] Natali Ruchansky, Sungyong Seo, and Yan Liu. 2017. CSI: A hybrid deep model for fake news detection. In CIKM.
[24] Kai Shu, H Russell Bernard, and Huan Liu. 2019. Studying fake news via network analysis: detection and mitigation. In Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining. Springer, 43–65.
[25] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2020. FakeNewsNet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big Data 8, 3 (2020), 171–188.
[26] Kai Shu, Deepak Mahudeswaran, Suhang Wang, and Huan Liu. 2020. Hierarchical propagation networks for fake news detection: Investigation and exploitation. In Proceedings of the International AAAI Conference on Web and Social Media.
[27] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake News Detection on Social Media: A Data Mining Perspective. KDD Exploration Newsletter (2017).
[28] Lichao Sun, Kazuma Hashimoto, Wenpeng Yin, Akari Asai, Jia Li, Philip Yu, and Caiming Xiong. 2020. Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT. arXiv preprint arXiv:2003.04985 (2020).
[29] Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip S Yu, and Lifang He. 2020. Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks. arXiv preprint arXiv:2010.02394 (2020).
[30] William Yang Wang. 2017. "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection. arXiv preprint arXiv:1705.00648 (2017).
[31] Yaqing Wang, Fenglong Ma, Zhiwei Jin, Ye Yuan, Guangxu Xun, Kishlay Jha, Lu Su, and Jing Gao. 2018. EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection. In KDD.
[32] Han Xiao. 2018. bert-as-service. https://github.com/hanxiao/bert-as-service.
[33] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 (2018).
[34] Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. 2018. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems.
[35] Ye Zhang and Byron Wallace. 2015. A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820 (2015).
[36] Xinyi Zhou, Jindi Wu, and Reza Zafarani. 2020. SAFE: Similarity-Aware Multi-modal Fake News Detection. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 354–367.
[37] Xinyi Zhou and Reza Zafarani. 2020. A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Computing Surveys (CSUR) 53, 5 (2020), 1–40.
