How would Stance Detection Techniques Evolve after the Launch of ChatGPT?

Bowen Zhang¹, Daijun Ding¹, Liwen Jing²∗

¹ College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China
² Faculty of Information and Intelligence, Shenzhen X-Institute, Shenzhen, China

∗ Corresponding author: ljing@x-institute.edu.cn

Abstract

Stance detection refers to the task of extracting the standpoint (Favor, Against or Neither) towards a target in given texts. Such research gains increasing attention with the proliferation of social media content. The conventional framework for handling stance detection is to convert it into a text classification task. Deep learning models have already replaced rule-based models and traditional machine learning models in solving such problems. Current deep neural networks face two main challenges: insufficient labeled data and information in social media posts, and the unexplainable nature of deep learning models. A new pre-trained language model, ChatGPT, was launched on November 30, 2022. For stance detection tasks, our experiments show that ChatGPT can achieve SOTA or similar performance on commonly used datasets including SemEval-2016 and P-Stance. At the same time, ChatGPT can provide explanations for its own predictions, which is beyond the capability of any existing model. The explanations for the cases where it cannot provide classification results are especially useful. ChatGPT has the potential to be the best AI model for stance detection tasks in NLP, or at least to change the research paradigm of this field. ChatGPT also opens up the possibility of building explanatory AI for stance detection.

1 Introduction

Personal stance towards an issue affects the decision making of an individual, while the stance held by the public towards thousands of potential topics explains even more. Stance detection is an important topic in the research communities of both natural language processing (NLP) and social computing (Küçük and Can, 2020; AlDayel and Magdy, 2021). Similar to all NLP tasks, early works on stance detection focused on rule-based approaches and later made a transition to traditional machine learning based algorithms. Since 2014, deep learning models have quickly become the mainstream techniques for stance detection. Later on, with the great success of Google's bidirectional encoder representations from transformers (BERT) model, a new NLP research paradigm emerged which utilizes large pre-trained language models (PLMs) together with a fine-tuning process. This pre-train and fine-tune paradigm provides exceptional performance for most NLP downstream tasks including stance detection, because the abundance of training data enables PLMs to learn enough general-purpose features and knowledge for modeling different languages. Following BERT, more and more PLMs have been proposed with different specialties and characteristics, including the ELMo series, the GPT series, the Turing series, varieties of BERT and many more.

ChatGPT is the most recent PLM optimized for dialogue and attracted over 1 million users within 5 days of its launch. Programmers use it to interpret code, artists use it to generate prompts for AIGC models, clerks use it to write and translate documents, and writers challenge ChatGPT to write poems and film scripts. To what extent will ChatGPT transform society and people's ways of doing and thinking? For NLP experts, is ChatGPT just another pre-trained language model?

In this work we conduct experiments on ChatGPT for stance detection tasks by directly asking ChatGPT for the result. This approach can be considered a zero-shot prompting strategy. Experimental results show that ChatGPT can achieve SOTA or similar performance on commonly used datasets, including SemEval-2016 and P-Stance, with a simple prompt. Since ChatGPT is trained for dialogue, it is surprisingly easy to learn the reason behind the model's decision by directly asking why. Furthermore, interacting with ChatGPT through a chain of inputs can potentially further improve the performance. This paper is structured as follows: after a brief overview of related work in Section 2, our proposed prompting method and results are detailed in Section 3. Section 4 contains discussions and future work.

2 Related Work

Before getting into more detail, we first give a formal definition of stance detection. For an input in the form of a text and target pair, stance detection is a classification problem in which the stance of the author of the text is sought in the form of a category label from the set {Favor, Against, Neither}. Occasionally, the category label Neutral is also added to the set of stance categories, and the target may or may not be explicitly mentioned in the text. Researchers approach this task by converting it into a text classification task. Stance detection studies originally focused on parliamentary debates and gradually shifted to social media content including Twitter, Facebook, Instagram and online blogs. The techniques used to approach these problems have also evolved with time.

Early research works on stance detection from the 1950s mainly adopted rule-based techniques (Anand et al., 2011; Walker et al., 2012). Since the 1990s, machine learning based models have gradually replaced small-scale rule-based methods. Traditional machine learning models build text classifiers for stance detection based on selected features. Effective algorithms for such classifiers include support vector machines (SVM) (Addawood et al., 2017; Mohammad et al., 2017), logistic regression (Ferreira and Vlachos, 2016; Tsakalidis et al., 2018; Skeppstedt et al., 2017), naive Bayes (HaCohen-Kerner et al., 2017; Simaki et al., 2017) and decision trees (Wojatzki and Zesch, 2016). With the fast advancement of deep learning in the 2010s, models based on deep neural networks (DNN) became mainstream in this field. These methods design neural networks with different structures and connections to obtain the desired stance classifier, and can be categorized into conventional DNN models, attention-based DNN models and graph convolutional network (GCN) models. Convolutional neural network (CNN) and long short-term memory (LSTM) models are the most commonly used conventional DNN models (Augenstein et al., 2016; Du et al., 2017); the attention-based methods mainly utilize target-specific information as the attention query and deploy an attention mechanism for inferring the stance polarity (Dey et al., 2018; Sun et al., 2018); and the GCN methods propose graph convolutional networks to model the relation between target and text (Li et al., 2022; Zhang et al., 2020a; Conforti et al., 2021).

Inspired by the recent success of PLMs, fine-tuning methods have led to improvements in stance detection tasks (Liu et al., 2021b). Fine-tuning models adapt PLMs by building a stance classification head on top of the "<cls>" token and fine-tuning the whole model. PLMs are getting larger and larger because performance and sample efficiency on downstream tasks are normally proportional to the scale of the model, and some abilities, such as the prompting strategies popularized by GPT-3, are considered to be effective only when the model reaches a certain scale (Wei et al., 2022). The main idea of prompt-based methods is to mimic the pre-training objective of PLMs by designing a template suitable for the classification task and then building a mapping (called a verbalizer) from the predicted token to the classification labels, which bridges the vocabulary and the label space. Such prompting strategies provide further improvements in stance detection performance (Shin et al., 2020). Models like LaMDA, GPT-3 and others also achieve success with few-shot prompting (Wei et al., 2022), which alleviates the demand for large amounts of training data and the tedious training process.

Model HC FM LA
Bicond (Augenstein et al., 2016) 32.7 40.6 34.4
CrossNet (Xu et al., 2018) 38.3 41.7 38.5
SEKT (Zhang et al., 2020b) 50.1 44.2 44.6
TPDG (Liang et al., 2021) 50.9 53.6 46.5
Bert_Spc (Devlin et al., 2019) 49.6 41.9 44.8
Bert-GCN (Lin et al., 2021) 50.0 44.3 44.2
PT-HCL (Liang et al., 2022) 54.5 54.6 50.9
ChatGPT 68.4 58.2 79.5

Table 1: Performance comparison (F1-m) on SemEval-2016 dataset with zero-shot setup.

Generally speaking, stance detection techniques, and NLP algorithms in general, have gone through four main paradigms: (1) rule-based models; (2) traditional machine learning based models; (3) deep neural network models; and (4) the PLM pre-train and fine-tune paradigm. Quite recently, a fifth paradigm, "pre-train, prompt and predict", has started to draw wide attention (Liu et al., 2021a).
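To make the prompt-and-verbalizer idea concrete, the sketch below shows one minimal way a cloze-style template and a verbalizer can be wired around a masked PLM. It is an illustrative example only: the checkpoint, the template wording and the single-token label words are assumptions of ours rather than the configuration of any cited system.

```python
# Minimal sketch of prompt-based stance classification with a verbalizer.
# The template, label words and checkpoint are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Verbalizer: maps single-token label words to stance labels.
VERBALIZER = {"favor": "Favor", "against": "Against", "neutral": "Neither"}

def prompt_predict(text: str, target: str) -> str:
    # Cloze-style template: the model fills the [MASK] slot.
    template = f"{text} The attitude towards {target} is {tokenizer.mask_token}."
    inputs = tokenizer(template, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_index = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
    # Compare the scores of the label words only and map the winner to a label.
    scores = {
        word: logits[0, mask_index, tokenizer.convert_tokens_to_ids(word)].item()
        for word in VERBALIZER
    }
    return VERBALIZER[max(scores, key=scores.get)]

print(prompt_predict("I want a woman president.", "Hillary Clinton"))
```

A model following the pre-train and fine-tune paradigm would instead attach a classification head to the "<cls>" representation and update all parameters on labeled stance data.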
Figure 1: Example of Question to ChatGPT

Methods FM LA HC
F1-avg F1-m F1-avg F1-m F1-avg F1-m
BiLSTM (Augenstein et al., 2016) 48.04 52.16 51.59 54.04 47.47 57.38
BiCond (Augenstein et al., 2016) 57.39 61.37 52.32 54.48 51.89 59.75
TextCNN (Kim, 2014) 55.65 61.37 58.8 63.2 52.35 58.45
MemNet (Tang et al., 2016) 51.07 57.75 58.86 61.03 52.3 58.45
AOA (Huang et al., 2018) 55.37 59.96 58.32 62.42 51.55 58.17
TAN (Du et al., 2017) 55.77 58.26 63.72 65.74 65.38 67.71
ASGCN (Zhang et al., 2019) 56.21 58.52 59.51 62.87 62.16 64.27
Bert_Spc (Devlin et al., 2019) 57.3 60.6 63.97 66.31 65.78 69.1
TPDG (Liang et al., 2021) 67.3 / 74.7 / 73.4 /
ChatGPT 68.44 72.63 58.17 59.29 79.48 77.97

Table 2: Performance comparison on SemEval-2016 dataset with in-domain setup.

Methods Trump Biden Bernie
F1-avg F1-m F1-avg F1-m F1-avg F1-m
BiLSTM (Augenstein et al., 2016) 72.01 69.74 69.53 68.67 63.91 63.81
BiCond (Augenstein et al., 2016) 72.98 70.56 69.37 68.41 64.58 64.06
TextCNN (Kim, 2014) 77.23 76.92 78.21 77.95 70.23 69.75
MemNet (Tang et al., 2016) 77.67 76.80 77.56 77.22 72.78 71.40
AOA (Huang et al., 2018) 77.74 77.15 77.80 77.69 71.70 71.24
TAN (Du et al., 2017) 77.52 77.10 77.92 77.64 71.99 71.60
ASGCN (Zhang et al., 2019) 77.01 76.82 78.35 78.17 70.76 70.56
Bert_Spc (Devlin et al., 2019) 81.58 81.43 81.67 81.46 78.43 78.28
ChatGPT 83.19 82.78 82.30 82.03 79.43 79.43

Table 3: Performance comparison on P-Stance dataset with in-domain setup.

3 Methods and Results

Task definition: We use $X = \{(x_i, p_i)\}_{i=1}^{N}$ to denote the collection of data, where each $x_i$ denotes the input text and $p_i$ denotes the corresponding target. $N$ represents the number of instances. Stance detection aims to predict a stance label for the input sentence $x$ towards the given target $p$ using a stance predictor.

In this section, we report the performance of ChatGPT on stance detection. We construct the stance predictor with a special kind of prompt: a template in the form of a direct question. Specifically, we directly ask the ChatGPT model for the stance polarity of a given tweet towards a specific target. Figure 1 shows an example. Given the input "RT GunnJessica: Because i want young American women to be able to be proud of the 1st woman president #SemST", the question posed to ChatGPT is: "What is the attitude of the sentence 'RT GunnJessica: Because i want young American women to be able to be proud of the 1st woman president #SemST' to the target 'Hillary Clinton'? Select from 'favor', 'against' or 'neutral'." For this particular example, ChatGPT returns the correct result.
Figure 2: ChatGPT’s Explanation when the Stance is Explicitly Expressed in the Text

Figure 3: ChatGPT’s Explanation when the Stance is Implicitly Expressed in the Text

Results: To compare the effectiveness of ChatGPT, we carried out experimental validations on the SemEval-2016 stance dataset (Mohammad et al., 2016) and the P-Stance dataset (Li et al., 2021). SemEval-2016 is a dataset of 4,870 English tweets with manual stance annotations towards six selected targets, of which 'Hillary Clinton (HC)', 'Feminist Movement (FM)' and 'Legalization of Abortion (LA)' are three commonly used ones. Similarly, the P-Stance dataset contains 21,574 English tweets with political content and stance annotations towards three targets: "Donald Trump," "Joe Biden," and "Bernie Sanders." Since OpenAI has not yet provided an API for ChatGPT, experiments have only been conducted on these two benchmark datasets of social media texts. Following (Zhang et al., 2020b; Liang et al., 2022), we use the micro-average F1-score (denoted as F1-avg) and the macro-F1 score (denoted as F1-m) for performance evaluation.
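For reference, the snippet below shows how these two scores can be computed with scikit-learn on a toy example. Whether the 'Neither'/'Neutral' class is included in the averaging follows the protocol of the cited benchmarks and may differ across papers, so this is an illustrative sketch rather than the exact evaluation script.

```python
# Illustrative computation of F1-avg (micro) and F1-m (macro) with scikit-learn.
# The toy labels are made up; real evaluation follows the benchmark's protocol.
from sklearn.metrics import f1_score

y_true = ["favor", "against", "against", "neutral", "favor"]
y_pred = ["favor", "against", "neutral", "neutral", "against"]

f1_avg = f1_score(y_true, y_pred, average="micro")  # F1-avg in the tables
f1_m = f1_score(y_true, y_pred, average="macro")    # F1-m in the tables
print(f"F1-avg = {f1_avg:.4f}, F1-m = {f1_m:.4f}")
```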
We constructed both zero-shot and in-domain stance detection setups for comparison. In the zero-shot setup, the model is tested directly without any adjustment on training data, which makes for a fair comparison with our proposed prompting method using ChatGPT. We also compared the zero-shot results of ChatGPT with other mainstream stance detection models in an in-domain setup, in which these models are optimized with 80% of the tweets as training data. The results are summarized in Tables 1 to 3.

The results show that ChatGPT achieves SOTA results in the zero-shot setup. For example, ChatGPT achieves a 15.3% improvement on average compared with the best competitor, PT-HCL, in the zero-shot setup. Compared with the in-domain setup, where the baseline methods first learn from 80% of the corpus as training data, ChatGPT still yields better performance than all the baselines on most tasks. For example, ChatGPT achieves a 1.08% improvement in average F1-avg compared with the fine-tuned BERT model.

4 Discussions and Future Work

Results in Section 3 demonstrate the emergent ability of ChatGPT on zero-shot prompting for stance detection tasks.
Figure 4: ChatGPT’s Explanation when it cannot Provide a Stance Detection Result (Case 1)

Figure 5: ChatGPT’s Explanation when it cannot Provide a Stance Detection Result (Case 2)

By using a simple prompt that directly asks the dialogue model for the stance, with no training at all, ChatGPT returns SOTA results in both the zero-shot and in-domain setups. The launch of ChatGPT could potentially transform the whole research area. We would like to discuss three research directions which might further improve the performance of ChatGPT on stance detection tasks.

(1) Are there better prompt templates?

In this work, only one prompt template for stance detection has been tested with ChatGPT. Engineering the prompt template may further improve the zero-shot performance of ChatGPT or unlock its use for other NLP tasks. Further studies can take the intuitive approach of manually selecting prompt templates or design an automated process for template selection.

(2) How well can ChatGPT explain itself?

ChatGPT is a language model trained for dialogue, so it is a natural next step to ask the model why it gives a certain answer. As shown in Figures 2 and 3, ChatGPT provides perfect explanations for why the given tweet is in favor of the target Hillary Clinton, whether the stance is explicitly or implicitly expressed in the text. Such results indicate that ChatGPT carries out stance classification based on logical reasoning instead of pure probability calculation. These explanations open up the possibility of building explanatory AI for stance detection.

(3) Can multi-round conversation help to improve the results?

ChatGPT has already shown exceptional results with zero-shot prompting; however, it is more powerful than a mere stance classifier.
For some instances where ChatGPT cannot provide a prediction, it can still explain why it cannot produce one, e.g. "the sentence does not mention or directly reference the target", as shown in Figure 4, or even "instruct the speaker to express opinion with respect and empathy", as shown in Figure 5. These explanations help us identify innately flawed data in the dataset, for which no model, and even no human, can accurately decide the stance from the given information alone. For those flawed tweets, it is still possible to determine the stance by fixing the issue in the following conversation. In a multi-round conversation with ChatGPT, we can feed a variety of information to the model, including background knowledge, missing parts of the sentence, stance classification examples and so on. Future investigation into how to design such multi-round conversations may further improve the performance of ChatGPT on more NLP tasks, including stance detection.
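One way to picture such a multi-round interaction is as a growing list of conversation turns in which each follow-up turn supplies the information the model reported missing. The exchange below is purely illustrative: the tweet, the model reply and the message format are hypothetical, not data from the experiments above.

```python
# Sketch of a multi-round stance-detection conversation. The first turn asks
# the direct question; follow-up turns add the context the model reported
# missing. Tweet, reply and message format are hypothetical illustrations.
conversation = [
    {"role": "user", "content": (
        'What is the attitude of the sentence: "We finally got what we have '
        'been waiting for! #SemST" to the target "Feminist Movement"? '
        'Select from "favor", "against" or "neutral".')},
    {"role": "assistant", "content": (
        "The sentence does not mention or directly reference the target, "
        "so the stance cannot be determined from the text alone.")},
    # Follow-up round: supply the missing background knowledge and ask again.
    {"role": "user", "content": (
        "Additional context: the tweet was posted in reply to news that a "
        "proposal supported by the Feminist Movement had just passed. "
        "Given this context, what is the stance?")},
]

for turn in conversation:
    print(f'{turn["role"]}: {turn["content"]}')
```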
References

Aseel Addawood, Jodi Schneider, and Masooda Bashir. 2017. Stance classification of twitter debates: The encryption debate as a use case. In Proceedings of the 8th International Conference on Social Media & Society, pages 1–10.

Abeer AlDayel and Walid Magdy. 2021. Stance detection on social media: State of the art and trends. Information Processing & Management, 58(4):102597.

Pranav Anand, Marilyn Walker, Rob Abbott, Jean E. Fox Tree, Robeson Bowmani, and Michael Minor. 2011. Cats rule and dogs drool!: Classifying stance in online debate. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2011), pages 1–9.

Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, and Kalina Bontcheva. 2016. Stance detection with bidirectional conditional encoding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.

Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, and Nigel Collier. 2021. Synthetic examples improve cross-target generalization: A study on stance detection on a twitter corpus. In Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EACL 2021, Online, April 19, 2021, pages 181–187. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pages 4171–4186. Association for Computational Linguistics.

Kuntal Dey, Ritvik Shrivastava, and Saroj Kaushik. 2018. Topical stance detection for twitter: A two-phase LSTM model using attention. In European Conference on Information Retrieval, pages 529–536. Springer.

Jiachen Du, Ruifeng Xu, Yulan He, and Lin Gui. 2017. Stance classification with target-specific neural attention networks. International Joint Conferences on Artificial Intelligence.

William Ferreira and Andreas Vlachos. 2016. Emergent: a novel data-set for stance classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. ACL.

Yaakov HaCohen-Kerner, Ziv Ido, and Ronen Ya'akobov. 2017. Stance classification of tweets using skip char ngrams. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 266–278. Springer.

Binxuan Huang, Yanglan Ou, and Kathleen M. Carley. 2018. Aspect level sentiment classification with attention-over-attention neural networks. In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, pages 197–206. Springer.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, Doha, Qatar. Association for Computational Linguistics.

Dilek Küçük and Fazli Can. 2020. Stance detection: A survey. ACM Computing Surveys (CSUR), 53(1):1–37.

Chen Li, Hao Peng, Jianxin Li, Lichao Sun, Lingjuan Lyu, Lihong Wang, Philip S. Yu, and Lifang He. 2022. Joint stance and rumor detection in hierarchical heterogeneous graph. IEEE Transactions on Neural Networks and Learning Systems, 33(6):2530–2542.

Yingjie Li, Tiberiu Sosea, Aditya Sawant, Ajith Jayaraman Nair, Diana Inkpen, and Cornelia Caragea. 2021. P-Stance: A large dataset for stance detection in political domain. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2355–2365.

Bin Liang, Zixiao Chen, Lin Gui, Yulan He, Min Yang, and Ruifeng Xu. 2022. Zero-shot stance detection via contrastive learning. In Proceedings of the ACM Web Conference 2022, pages 2738–2747.

Bin Liang, Yonghao Fu, Lin Gui, Min Yang, Jiachen Du, Yulan He, and Ruifeng Xu. 2021. Target-adaptive graph for cross-target stance detection. In WWW '21: The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19-23, pages 3453–3464.

Yuxiao Lin, Yuxian Meng, Xiaofei Sun, Qinghong Han, Kun Kuang, Jiwei Li, and Fei Wu. 2021. BertGCN: Transductive text classification by combining GNN and BERT. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1456–1462, Online. Association for Computational Linguistics.

Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021a. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586.

Rui Liu, Zheng Lin, Yutong Tan, and Weiping Wang. 2021b. Enhancing zero-shot and few-shot stance detection with commonsense knowledge graph. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3152–3157.

Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. 2016. SemEval-2016 task 6: Detecting stance in tweets. In Proceedings of the International Workshop on Semantic Evaluation, SemEval '16, San Diego, California.

Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. 2017. Stance and sentiment in tweets. ACM Transactions on Internet Technology (TOIT), 17(3):1–23.

Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. 2020. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint arXiv:2010.15980.

Vasiliki Simaki, Carita Paradis, and Andreas Kerren. 2017. Stance classification in texts from blogs on the 2016 British referendum. In International Conference on Speech and Computer, pages 700–709. Springer.

Maria Skeppstedt, Vasiliki Simaki, Carita Paradis, and Andreas Kerren. 2017. Detection of stance and sentiment modifiers in political blogs. In International Conference on Speech and Computer, pages 302–311. Springer.

Qingying Sun, Zhongqing Wang, Qiaoming Zhu, and Guodong Zhou. 2018. Stance detection with hierarchical attention network. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2399–2409.

Duyu Tang, Bing Qin, and Ting Liu. 2016. Aspect level sentiment classification with deep memory network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4.

Adam Tsakalidis, Nikolaos Aletras, Alexandra I. Cristea, and Maria Liakata. 2018. Nowcasting the stance of social media users in a sudden vote: The case of the Greek referendum. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 367–376.

Marilyn Walker, Pranav Anand, Rob Abbott, and Ricky Grant. 2012. Stance classification using dialogic properties of persuasion. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 592–596.

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. 2022. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682.

Michael Wojatzki and Torsten Zesch. 2016. Stance-based argument mining – modeling implicit argumentation using stance. In Proceedings of the KONVENS, pages 313–322.

Chang Xu, Cecile Paris, Surya Nepal, and Ross Sparks. 2018. Cross-target stance classification with self-attention networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 778–783.

Bowen Zhang, Min Yang, Xutao Li, Yunming Ye, Xiaofei Xu, and Kuai Dai. 2020a. Enhancing cross-target stance detection with transferable semantic-emotion knowledge. Association for Computational Linguistics.

Bowen Zhang, Min Yang, Xutao Li, Yunming Ye, Xiaofei Xu, and Kuai Dai. 2020b. Enhancing cross-target stance detection with transferable semantic-emotion knowledge. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3188–3197.

Chen Zhang, Qiuchi Li, and Dawei Song. 2019. Aspect-based sentiment classification with aspect-specific graph convolutional networks. In EMNLP/IJCNLP (1).
