
Twitter sentiment analysis using support vector machine and deep learning
model in e-learning implementation during the Covid-19 outbreak

Conference Paper in AIP Conference Proceedings · January 2023


DOI: 10.1063/5.0128685


5 authors, including:
Dinar Ajeng Kristiyanti, Bogor Agricultural University
Elly Indrayuni, Bina Sarana Informatika
Acmad Nurhadi, Bina Sarana Informatika
Akhmad Hairul Umam, Atma Jaya Catholic University of Indonesia

All content following this page was uploaded by Dinar Ajeng Kristiyanti on 13 May 2023.



RESEARCH ARTICLE | MAY 09 2023

Dinar Ajeng Kristiyanti; Dwi Andini Putri; Elly Indrayuni; ... et al.

AIP Conference Proceedings 2714, 020033 (2023)


https://doi.org/10.1063/5.0128685


Downloaded from http://pubs.aip.org/aip/acp/article-pdf/doi/10.1063/5.0128685/17430908/020033_1_5.0128685.pdf


Citation:
Kristiyanti, Dinar & Putri, Dwi & Indrayuni, Elly & Nurhadi, Acmad & Hairul Umam, Akhmad. (2023). Twitter
sentiment analysis using support vector machine and deep learning model in e-learning implementation
during the Covid-19 outbreak. AIP Conference Proceedings. 020033. 10.1063/5.0128685.

Twitter Sentiment Analysis using Support Vector Machine and Deep Learning Model in E-Learning Implementation during the Covid-19 Outbreak
Dinar Ajeng Kristiyanti (1, a), Dwi Andini Putri (2, b), Elly Indrayuni (3, c), Acmad Nurhadi (4, d), and Akhmad Hairul Umam (5, e)
1) Information System, Universitas Multimedia Nusantara, Tangerang, Indonesia
2) Information Technology, Universitas Bina Sarana Informatika, Jakarta, Indonesia
3) Accounting Information System, Universitas Bina Sarana Informatika, Jakarta, Indonesia
4) Computer Technology, Universitas Bina Sarana Informatika, Jakarta, Indonesia
5) Communication, Tanri Abeng University, Jakarta, Indonesia



a) Corresponding author: dinar.kristiyanti@umn.ac.id
b) Electronic mail: dwi.dwd@bsi.ac.id
c) Electronic mail: elly.eiy@bsi.ac.id
d) Electronic mail: achmad.ahh@bsi.ac.id
e) Electronic mail: ahmad.umam@tau.ac.id

Abstract. Investigating the effectiveness of e-learning during the Covid-19 outbreak around the world is of considerable interest, and one way to do so is to mine public opinion data on the application of e-learning during the outbreak. Twitter sentiment analysis is one technique that can be used: tweets expressing public opinion are classified into positive and negative sentiments to show how the public views the application of e-learning, which can help the government in making policy. The research stages in this study are Data Collection; Data Pre-processing (Tokenizing, Transform Case, Stopword Filter, Generate N-Gram, and Stemming); use of the Support Vector Machine (SVM) algorithm and a Deep Learning model; Experiment and Model Assessment using Rapid Miner version 9.9; and Evaluation and Validation of Results using the Confusion Matrix and ROC curves. Based on 444 English-language tweets collected with the keywords #elearningcovid19, #elearning, and #covid19, the SVM algorithm outperformed the Deep Learning model in both accuracy (90.53% versus 87.16%) and AUC (0.953 versus 0.928). The results indicate that more people agree with the application of e-learning during the Covid-19 outbreak than disagree with it.

INTRODUCTION

Coronavirus Disease 2019 (Covid-19) has forced many schools around the world to close [1]. Even one year after the World Health Organization (WHO) declared it a global pandemic on March 11, 2020 [2], almost half of the world's students are still affected by partial or complete school closures [3]. According to data from the United Nations Educational, Scientific and Cultural Organization (UNESCO), shown in Figure 1, more than 1.3 billion children in 186 countries are out of the classroom as a result of pandemic school closures [1][3]. The numbers in Figure 1 cover students enrolled at the pre-primary, primary, junior secondary, and senior secondary levels, as well as at the tertiary level. As a result, education has changed dramatically toward e-learning, in which teaching is delivered remotely on digital platforms [4][5][6][7][8].
Research shows that online learning increases information retention and takes less time, which suggests that the changes caused by Covid-19 are likely to persist [1]. Research [5] found that e-learning is very popular among students and reported its effects on students' interest in using e-learning resources and on their performance. In research [6], e-learning proved effective in overcoming the challenges caused by the Covid-19 outbreak and in keeping the university operating. However, research [8] identified several challenges arising from the application of e-learning, namely social problems, lecturer problems, accessibility problems, learning motivation, academic problems, generic problems, learning intentions, and demographics, and proposed blended learning, which combines e-learning with face-to-face teaching, as a solution. Research [9] likewise found that students preferred face-to-face learning over e-learning. Whether such a solution can resolve these problems remains open, especially since Covid-19 infection rates are still very high. Unsurprisingly, many opinions about the application of e-learning during the Covid-19 outbreak have emerged, both directly in the community and on social media.
2nd International Conference on Advanced Information Scientific Development (ICAISD) 2021
AIP Conf. Proc. 2714, 020033-1–020033-9; https://doi.org/10.1063/5.0128685
Published by AIP Publishing. 978-0-7354-4520-8/$30.00

FIGURE 1. Number of Students Affected by National School Closures around the World due to the Covid-19 Outbreak

Several previous studies have examined public opinion on the application of e-learning on social media, including [10][11][12]. Research [10], for example, analyzed public opinion on online learning during the Covid-19 outbreak by web scraping 154 online news articles and blog posts retrieved from Google and DuckDuckGo over 45 days starting March 11, 2020, and applying sentiment analysis. Using the dictionary-based variant of the lexicon-based method, it showed that more than 90% of the articles and blogs related to e-learning during the Covid-19 period were positive in tone. Research [11] investigated opinions on the learning process during the outbreak using word2vec and machine learning techniques: its sentiment analysis model first processes student sentiment and selects features through word embeddings, then applies three machine learning classifiers, namely Naïve Bayes, SVM, and Decision Tree. Research [12] showed that during the most severe period of the outbreak, in March 2020, the issue of studying from home grew nearly 15-fold within a year, and its sentiment analysis confirmed that more than 60% of users agree with studying from home.
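The dictionary-based approach used in [10] scores a text by looking its words up in a sentiment lexicon. As a minimal Python sketch of that idea, assuming a tiny hypothetical lexicon (`LEXICON`) and the simplest aggregation rule (summing word scores), the code below labels a text positive, negative, or neutral. It is not the lexicon or scoring procedure of [10].

```python
import re

# Hypothetical toy lexicon: real dictionary-based studies use curated
# sentiment lexicons with thousands of entries and tuned scores.
LEXICON = {
    "boon": 1.0, "great": 1.0, "helpful": 1.0, "easily": 0.5,
    "fake": -1.0, "inequalities": -1.0, "problem": -1.0, "tedious": -1.0,
}


def lexicon_score(text: str) -> float:
    """Sum the lexicon scores of the lowercase words in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


def lexicon_label(text: str) -> str:
    """Map the summed score to a polarity; a score of zero is neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(lexicon_label("#elearning has been a boon to all of us"))  # positive
    print(lexicon_label("Teaching About Fake News #elearning"))      # negative
```

Words missing from the lexicon contribute nothing, which is why the coverage of the dictionary largely determines how many texts end up neutral.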
All of the studies above employ sentiment analysis, as have several earlier studies in recent years [13][14][15][16][17][18][19][20][21]. Twitter sentiment analysis is a popular method for exploring opinion on social media [22][23]. Because opinions and related concepts such as sentiments, attitudes, and emotions are central to human activity [24], opinionated posts on social media let researchers measure the emotional tone behind the data, which can be classified into positive and negative sentiment classes. Various machine learning techniques have been widely applied in sentiment analysis, and SVM has been reported as among the most accurate and efficient text classifiers [25][26][27]. SVM's advantages include coping with a high-dimensional input space, tolerating irrelevant features, and suiting the sparse vectors that represent documents [28]. Its weakness is that it requires a large amount of training data, which is tedious to collect [28]. In recent years, Deep Learning models have also been widely applied in sentiment analysis [29][30]. In research [31], Deep Learning models such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) outperformed the SVM algorithm, whereas in research [32], a Deep Learning model with LSTM did not outperform SVM.
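The properties listed above (a high-dimensional input space and sparse document vectors) can be illustrated with a small linear SVM written in plain Python. This is a sketch under stated assumptions, not the solver behind Rapid Miner's SVM operator: tweets become binary bag-of-words vectors stored as dictionaries, so each dot product only touches the words a tweet actually contains, and training uses the Pegasos stochastic sub-gradient method on the L2-regularized hinge loss, with labels +1/-1 and no intercept term. The function names, `lam`, `epochs`, and the toy tweets are hypothetical.

```python
import random
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # token -> value; absent tokens are implicitly 0


def featurize(text: str) -> SparseVec:
    """Binary bag-of-words: only tokens present in the tweet are stored."""
    return {tok: 1.0 for tok in text.lower().split()}


def dot(w: SparseVec, x: SparseVec) -> float:
    # Cost grows with the number of non-zero entries in x, not vocabulary size.
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data: List[Tuple[str, int]], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> SparseVec:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge
    loss. Labels must be +1 or -1. No intercept term, for brevity."""
    rng = random.Random(seed)
    examples = [(featurize(text), y) for text, y in data]
    w: SparseVec = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = examples[i]
            eta = 1.0 / (lam * t)           # Pegasos step size 1/(lambda*t)
            margin = y * dot(w, x)          # margin under the current weights
            for k in w:                     # regularization: shrink all weights
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:                # hinge loss active: step toward y*x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w: SparseVec, text: str) -> int:
    return 1 if dot(w, featurize(text)) >= 0.0 else -1


if __name__ == "__main__":
    toy = [("great elearning", 1), ("great boon", 1),
           ("fake problem", -1), ("tedious problem", -1)]
    weights = train_linear_svm(toy)
    print(predict(weights, "elearning is great"), predict(weights, "another problem"))
```

On toy data like this, words seen only in positive tweets never receive negative weight and words seen only in negative tweets never receive positive weight, so the sign of the score follows the sentiment-bearing words; words never seen in training simply contribute zero.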
This study uses two methods, the SVM algorithm and a Deep Learning model, for sentiment analysis of public opinion on social media about the application of e-learning during the Covid-19 outbreak around the world. The researchers compare the two methods by measuring their accuracy, to determine which of the two better classifies public opinion on Twitter about e-learning implementation during the Covid-19 era.

RESEARCH METHODS

The stages of this research are illustrated in Figure 2, and each stage of this experimental study is described below.

FIGURE 2. Diagram of Research Stages in the Twitter Sentiment Analysis Process

a. Collecting the Data


The first step is data collection: tweets about the application of e-learning during the Covid-19 outbreak are crawled from Twitter using the hashtags #elearningcovid19, #elearning, and #covid19. The data crawling process in Rapid Miner is depicted in Figure 3.

FIGURE 3. The Data Crawling Process Using Rapid Miner

Data is crawled with the Rapid Miner Search Twitter operator, using an access token obtained from the Twitter API. Once connected to Twitter, the retrieved tweets are filtered with the Select Attributes operator so that only the required attributes, such as username and text, are kept. The data is then cleaned: the Remove Duplicates operator removes duplicate tweets returned by the crawl, the Replace Missing Values operator ensures that no missing values remain, and a Subprocess operator removes unnecessary words. The final stage of the crawling process applies the Sentiment Analysis operator, which labels each tweet to distinguish positive tweets from negative ones. Of the 1,000 English-language tweets retrieved, 743 matched keywords related to the implementation of e-learning during the Covid-19 outbreak. After filtering and cleaning, 470 tweets remained: 382 labelled positive, 62 labelled negative, and 26 labelled neutral. Because only two classes, positive and negative, are used in the sentiment classification, the 26 neutral tweets are excluded, leaving 444 tweets. The output of the crawling process is saved in .xls format for pre-processing. Table 1 shows a sample of the labelled dataset.
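The attribute selection, cleaning, and neutral-tweet exclusion described above can be sketched in plain Python as follows. This is a hypothetical re-implementation for illustration, not Rapid Miner's operators: the attribute names in `KEEP`, deduplication by tweet text, and replacing missing values with an empty string are all assumptions, the word-removal Subprocess is omitted, and the sentiment label is assumed to be attached already by the labelling step. Applied to the paper's counts, excluding neutral tweets is what turns 470 labelled tweets (382 + 62 + 26) into the 444 used for classification.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical attribute names. The "label" value is assumed to be attached
# already by the labelling step (the Sentiment Analysis operator in the paper).
KEEP = ("username", "text", "label")


def select_attributes(rows: List[Record]) -> List[Record]:
    """Keep only the attributes needed downstream."""
    return [{key: row.get(key) for key in KEEP} for row in rows]


def remove_duplicates(rows: List[Record]) -> List[Record]:
    """Drop tweets whose text was already seen, keeping the first copy."""
    seen = set()
    unique = []
    for row in rows:
        if row["text"] not in seen:
            seen.add(row["text"])
            unique.append(row)
    return unique


def replace_missing(rows: List[Record], default: str = "") -> List[Record]:
    """Replace missing (None) values with a default value."""
    return [{key: (default if value is None else value) for key, value in row.items()}
            for row in rows]


def keep_binary(rows: List[Record]) -> List[Record]:
    """Exclude neutral tweets: only positive and negative are classified."""
    return [row for row in rows if row["label"] in ("positive", "negative")]


def clean(rows: List[Record]) -> List[Record]:
    # Same order as the text: select, deduplicate, fill missing values, then
    # drop the neutral class before classification.
    return keep_binary(replace_missing(remove_duplicates(select_attributes(rows))))


if __name__ == "__main__":
    raw = [
        {"username": "a", "text": "e-learning is a boon", "label": "positive", "lang": "en"},
        {"username": "b", "text": "e-learning is a boon", "label": "positive", "lang": "en"},
        {"username": "c", "text": "fake news", "label": "negative", "lang": "en"},
        {"username": "d", "text": None, "label": "neutral", "lang": "en"},
    ]
    for row in clean(raw):
        print(row)
```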
TABLE I. Sample of Dataset that Already Has Labels
Text | Label
With the emerging #technology & development in the education industry, #elearning has been a boon to all of us today. It is certainly changing ways to learn, understand & educate oneself easily. | Positive
Keep #learning, stay #educated and have a great #weekend! | Positive
Digital and new media #tech allow educational reformers to introduce progressive pedagogical approaches now also to communities filled with social inequalities. | Negative
Teaching About Fake News #elearning | Negative

b. Pre-Processing Data
After the data is extracted from Twitter, the next step is to clean it through the following stages:
1) Tokenization
Tokenization separates a text into the units to be analyzed, known as tokens: words, symbols, phrases, and other meaningful entities. Broadly, it breaks the sequence of characters in a sentence into words. At this stage the researchers also remove all punctuation marks, digits, and any other characters that are not letters, so that only letters remain.
2) Transform Case
All letters in every word of the sentence are converted from uppercase to lowercase.
3) Filter Stopword (English)
Meaningful English words (the dataset is in English) are kept from the token results. This can be done in two ways: with a stoplist, which removes less important words, or with a wordlist, which keeps only important words. The researchers use a stoplist, discarding less important words such as conjunctions and prepositions like "that", "and", "at", and "from".
4) Generate N-grams
An n-gram is a sequence of n adjacent tokens; n-grams capture word combinations, often involving adjectives, that express a sentiment. A unigram is a Twitter text token consisting of a single word, a bigram consists of two consecutive words, and a trigram consists of three consecutive words. This research uses the trigram token type.
5) Stemming
Stemming reduces the number of distinct index terms in a document and groups words that share the same root word and meaning but take different forms because of different affixes. For example, "learns", "learning", and "learned" are all stemmed to the root word "learn". However, like stopword filtering, stemming performance varies and often depends on the language domain.
The pre-processing stage was implemented in Rapid Miner version 9.0, as shown in Figure 4.

FIGURE 4. Pre-Processing Data Stage
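The five pre-processing steps listed above can be approximated in plain Python as follows. This is a sketch, not Rapid Miner's operators: the stoplist is a small illustrative subset, `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, and n-grams of length 1 to 3 are generated on the assumption that "trigram" means n-grams of up to three words. Unlike the order listed above, the sketch stems before generating n-grams so that every word inside an n-gram is normalized.

```python
import re
from typing import List

# Small illustrative stoplist; Rapid Miner's English stopword list is longer.
STOPWORDS = {"a", "an", "and", "at", "from", "in", "is", "it", "of",
             "or", "that", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Split into runs of letters; punctuation, digits, '#' and '@' are dropped."""
    return re.findall(r"[A-Za-z]+", text)


def transform_case(tokens: List[str]) -> List[str]:
    return [tok.lower() for tok in tokens]


def filter_stopwords(tokens: List[str]) -> List[str]:
    return [tok for tok in tokens if tok not in STOPWORDS]


def crude_stem(token: str) -> str:
    """Hypothetical suffix stripper, NOT the Porter stemmer: removes one
    common suffix if at least three letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def generate_ngrams(tokens: List[str], max_n: int = 3) -> List[str]:
    """All n-grams of length 1..max_n, with words joined by '_'."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append("_".join(tokens[i:i + n]))
    return grams


def preprocess(text: str, max_n: int = 3) -> List[str]:
    tokens = filter_stopwords(transform_case(tokenize(text)))
    # Stem before n-gram generation so every word in an n-gram is stemmed.
    stems = [crude_stem(tok) for tok in tokens]
    return generate_ngrams(stems, max_n)


if __name__ == "__main__":
    print(preprocess("Keep #learning, stay #educated!"))
```

Note that `tokenize` splits hyphenated words such as "e-learning" into separate tokens, one reason the text suggests refining pre-processing for non-standard language.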

c. The Method Used
The sentiment of public opinion on Twitter about the application of e-learning during the Covid-19 outbreak is then classified using the SVM algorithm and a Deep Learning model.
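The paper does not report the architecture or hyperparameters of its Deep Learning model. To our understanding, Rapid Miner's Deep Learning operator wraps H2O's multi-layer feed-forward neural network trained with stochastic gradient descent and back-propagation, so the sketch below shows that family of model at toy scale: one tanh hidden layer and a sigmoid output trained on binary cross-entropy. The class name `TinyMLP`, the layer size, learning rate, epochs, vocabulary, and tweets are hypothetical, and labels are 1 for positive and 0 for negative.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]  # (bag-of-words vector, label 0 or 1)


def vectorize(text: str, vocab: List[str]) -> List[float]:
    """Binary bag-of-words over a fixed, hypothetical vocabulary."""
    words = set(text.lower().split())
    return [1.0 if term in words else 0.0 for term in vocab]


class TinyMLP:
    """One tanh hidden layer and a sigmoid output unit, trained by SGD with
    back-propagation on binary cross-entropy."""

    def __init__(self, n_in: int, n_hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, 1.0 / (1.0 + math.exp(-z))  # P(label = 1)

    def loss(self, data: List[Example]) -> float:
        """Mean binary cross-entropy over `data`."""
        eps = 1e-12
        total = 0.0
        for x, y in data:
            _, p = self.forward(x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(data)

    def train(self, data: List[Example], epochs: int = 500, lr: float = 0.1) -> None:
        for _ in range(epochs):
            for x, y in data:
                hidden, p = self.forward(x)
                d_out = p - y  # gradient of the loss w.r.t. the output pre-activation
                for j, h in enumerate(hidden):
                    # Back-propagate through tanh using w2[j] before it is updated.
                    d_hidden = d_out * self.w2[j] * (1.0 - h * h)
                    self.w2[j] -= lr * d_out * h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hidden * xi
                    self.b1[j] -= lr * d_hidden
                self.b2 -= lr * d_out


if __name__ == "__main__":
    vocab = ["great", "boon", "elearning", "fake", "problem", "tedious"]
    samples = [("great elearning", 1), ("great boon", 1),
               ("fake problem", 0), ("tedious problem", 0)]
    data = [(vectorize(text, vocab), label) for text, label in samples]
    net = TinyMLP(n_in=len(vocab))
    print(f"loss before: {net.loss(data):.3f}")
    net.train(data)
    print(f"loss after:  {net.loss(data):.3f}")
```

A production network would add several wider hidden layers, mini-batches, and regularization; the toy version only shows how the forward pass and the back-propagated gradients fit together.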
d. Experimentation and Assessing Method
The researchers used Rapid Miner Studio version 9.9 to run the experiments with the methods used in this study.
e. Evaluation and Result Validation
The models are evaluated with N-Fold Cross Validation, in which the dataset is randomly divided into N partitions and the experiment is repeated N times: each run uses one partition as testing data and the remaining N - 1 partitions as training data. This study uses 10-Fold Cross Validation. Performance is then assessed with the Confusion Matrix and with the ROC (Receiver Operating Characteristic) curve, from which the AUC (Area Under the Curve) value is determined.
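Given the class imbalance in this dataset (382 positive versus 62 negative tweets), a fold-assignment sketch is useful. The function below is a hypothetical plain-Python illustration of 10-fold cross validation with stratification: each class is shuffled and dealt round-robin into the folds, so every fold keeps roughly the overall class ratio. Whether the study's Rapid Miner setup stratified its folds is not stated; plain random partitioning, as described above, corresponds to treating all tweets as one class.

```python
import random
from typing import List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[str], k: int = 10,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs. Each class is shuffled and
    dealt round-robin into the folds, so every fold keeps roughly the overall
    class ratio and every example is tested exactly once."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    slot = 0  # continues across classes so fold sizes stay balanced
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:
            folds[slot % k].append(i)
            slot += 1
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    labels = ["positive"] * 382 + ["negative"] * 62
    for fold, (train, test) in enumerate(stratified_k_fold(labels)):
        negatives = sum(labels[i] == "negative" for i in test)
        print(f"fold {fold}: train={len(train)} test={len(test)} "
              f"negative_in_test={negatives}")
```

With 382 positive and 62 negative labels, each of the ten test folds receives 6 or 7 negative tweets and 44 or 45 tweets in total, so no fold is left without negative examples to evaluate.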



RESULT AND DISCUSSION

This section presents the results of the experiments, analyzing the performance and accuracy of both methods, in two parts:
a. Result
The dataset consists of 444 English-language tweets, 382 labelled positive and 62 labelled negative. After all the stages of the research method above were carried out, the main process of the SVM algorithm is shown in Figure 5 and the main process of the Deep Learning model, both implemented in Rapid Miner version 9.9, is shown in Figure 6.

FIGURE 5. The Main Process of SVM Algorithm

FIGURE 6. The Main Process of Deep Learning Model

After testing, the results of the two algorithms are compared. The confusion matrices from the sentiment analysis of e-learning application during the Covid-19 outbreak, based on public opinion on Twitter, are given in Table 2 for the SVM algorithm and in Table 3 for the Deep Learning model.

TABLE II. Confusion Matrix Result of Sentiment Analysis in E-Learning Implementation during the Covid-19 outbreak based on Public Opinion on Twitter with SVM Algorithm
SVM Accuracy: 90.53%
Prediction | Actual Positive | Actual Negative | Class Precision
Positive Prediction | 382 | 42 | 90.09%
Negative Prediction | 0 | 20 | 100.00%
Class Recall | 100.00% | 32.26%

TABLE III. Confusion Matrix Result of Sentiment Analysis in E-Learning Implementation during Covid-19 outbreak based on Public Opinion on Twitter with Deep Learning Model
Deep Learning Model Accuracy: 87.16%
Prediction | Actual Positive | Actual Negative | Class Precision
Positive Prediction | 382 | 57 | 87.02%
Negative Prediction | 0 | 5 | 100.00%
Class Recall | 100.00% | 8.06%

Based on Tables 2 and 3, Table 4 compares the accuracy, precision, and recall of each method. The AUC values from the ROC curves of the tested algorithms are shown in Figure 7 for the SVM algorithm and in Figure 8 for the Deep Learning model.
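The percentages in Tables 2 to 4 can be recomputed from the confusion-matrix counts. The sketch below treats "positive" as the target class and reads the pooled counts from Tables II and III; the function name `confusion_metrics` is hypothetical. Recomputing reproduces the tables' precision and recall values and the 87.16% accuracy of the Deep Learning model, but gives 90.54% rather than 90.53% for the SVM; the small gap may come from Rapid Miner averaging accuracy over the ten folds, although the paper does not say so. The sketch also reports negative-class metrics and the accuracy of a majority-class baseline, which are not in the tables.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Metrics from a 2x2 confusion matrix with "positive" as the target class.
    tp: predicted positive, actually positive   fp: predicted positive, actually negative
    fn: predicted negative, actually positive   tn: predicted negative, actually negative"""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision_pos": tp / (tp + fp),
        "recall_pos": tp / (tp + fn),
        "precision_neg": tn / (tn + fn),
        "recall_neg": tn / (tn + fp),
    }


# Pooled counts read from Tables II and III.
svm = confusion_metrics(tp=382, fp=42, fn=0, tn=20)
deep = confusion_metrics(tp=382, fp=57, fn=0, tn=5)
majority_baseline = 382 / 444  # accuracy of always predicting "positive"

for name, m in (("SVM", svm), ("Deep Learning", deep)):
    print(name, {key: f"{100 * value:.2f}%" for key, value in m.items()})
print(f"Majority-class baseline accuracy: {100 * majority_baseline:.2f}%")
```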
b. Discussion
TABLE IV. Comparison Results of Accuracy, Precision, Recall and AUC Value of SVM Algorithm and Deep Learning Model
Method | Accuracy | Class Precision (Positive) | Class Recall (Positive) | AUC Value
SVM Algorithm | 90.53% | 90.09% | 100.00% | 0.953
Deep Learning Model | 87.16% | 87.02% | 100.00% | 0.928

From the results in Table 4, the accuracy of the SVM algorithm is 90.53%, while the Deep Learning model trails by only 3.37 percentage points at 87.16%. The positive-class precision is 90.09% for the SVM algorithm and 87.02% for the Deep Learning model, and the positive-class recall is 100.00% for both. The AUC value of the SVM algorithm is 0.953 and that of the Deep Learning model is 0.928, a difference of only 0.025. Overall, based on their AUC values, both the SVM algorithm and the Deep Learning model show excellent classification of public opinion from Twitter data about the application of e-learning during the Covid-19 outbreak.

Notably, both the SVM algorithm and the Deep Learning model predict every positive tweet correctly (100% positive-class recall), even though the data is imbalanced between positive and negative labels. This imbalance is expected because the data was collected by crawling Twitter, which returns the data as it is, according to the keywords used and the reality of the discussion: more public opinion agrees with the application of e-learning during the Covid-19 outbreak than disagrees with it. Because of this imbalance, however, 382 of the 444 tweets (86.04%) are positive, so a classifier that labels every tweet positive would already reach 86.04% accuracy; the negative-class recall (32.26% for SVM versus 8.06% for Deep Learning) therefore separates the two models more clearly than accuracy does.

FIGURE 7. ROC Curve for SVM Algorithm with AUC 0.953

FIGURE 8. ROC Curve for Deep Learning Model with AUC 0.928
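The AUC values in Figures 7 and 8 summarize each ROC curve, which plots the true positive rate against the false positive rate as the decision threshold varies. The area under that curve equals the probability that a randomly chosen positive tweet receives a higher score than a randomly chosen negative one (the normalized Mann-Whitney U statistic), which the plain-Python sketch below computes directly, counting ties as one half. The function name `roc_auc` and the scores are hypothetical, since the paper does not publish per-tweet confidences. An AUC between 0.90 and 1.00 is commonly labelled excellent classification, the sense in which both models are described above.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive (label 1) is scored above
    a random negative (label 0); ties count one half. O(P*N) pairs, which is
    fine for a few hundred tweets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical confidences for four tweets (1 = positive, 0 = negative).
    print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

Because AUC depends only on how scores rank positives against negatives, it is unaffected by the decision threshold that produces the confusion matrices in Tables II and III.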

CONCLUSION AND SUGGESTION

Classifying tweets about the application of e-learning during the Covid-19 outbreak into positive and negative sentiments with sentiment analysis reveals public opinion on the topic and can provide input on indicators of success for government policy making. The results show that more people agree with the application of e-learning than disagree with it. The accuracy of the SVM algorithm is 90.53% and that of the Deep Learning model is 87.16%, a difference of only 3.37 percentage points. The AUC value of the SVM algorithm is 0.953 and that of the Deep Learning model is 0.928, a difference of only 0.025. It can therefore be concluded that the SVM algorithm performs best, since it has the higher accuracy, precision, and AUC, although both results are classified as excellent classification.
Further research should use larger and more complex datasets and, where needed, apply feature selection techniques and refine pre-processing for non-standard language, so that the accuracy and AUC values obtained can improve on previous studies.

ACKNOWLEDGMENTS

We would like to acknowledge the support and facilitation given by Universitas Multimedia Nusantara towards the
completion of the research.

REFERENCES
1. World Economic Forum, Covid Action Platf. 1 (2020).
2. WHO, WHO.Int (2020).
3. UNESCO, Unesco 1 (2020).
4. M.A. Almaiah, A. Al-Khasawneh, and A. Althunibat, Educ. Inf. Technol. 25, 5261 (2020).
5. V. Sathishkumar, R. Radha, K. Mahalakshmi, V.S. Kumar, and A.R. Saravanakumar, Int. J. Control Autom. 13, 1088 (2020).
6. T. Favale, F. Soro, M. Trevisan, I. Drago, and M. Mellia, Comput. Networks 176, (2020).
7. S. Abbasi, T. Ayoob, A. Malik, and S.I. Memon, J. Med. Sci. 36, Covid19 (2020).
8. E. Aboagye, J.A. Yawson, and K.N. Appiah, Soc. Educ. Res. 2, 109 (2020).
9. D. Dwidienawati, D. Tjahjana, and S.B. Abdinagoro, J. Soc. Sci. 48, 1190 (2020).
10. K.K. Bhagat, S. Mishra, A. Dixit, and C.Y. Chang, Sustain. 13, 1 (2021).
11. L. Mostafa, in Adv. Intell. Syst. Comput. (Springer Science and Business Media Deutschland GmbH, 2021), pp. 195–203.

12. S. Wrycza and J. Maślankowski, Inf. Syst. Manag. 37, 288 (2020).
13. M. Wahyudi and D.A. Kristiyanti, J. Theor. Appl. Inf. Technol. 91, (2016).
14. D.A. Kristiyanti and N. Normah, SinkrOn 4, 32 (2019).
15. D.A. Kristiyanti and M. Wahyudi, 2017 5th Int. Conf. Cyber IT Serv. Manag. CITSM 2017 (2017).
16. D.A. Kristiyanti, Normah, and A.H. Umam, Proc. 2019 5th Int. Conf. New Media Stud. CONMEDIA 2019 36 (2019).
17. D.A. Kristiyanti, Konf. Nas. Ilmu Pengetah. Dan Teknol. 2, 74 (2015).
18. D.A. Kristiyanti, A.H. Umam, M. Wahyudi, R. Amin, and L. Marlinda, in 2018 6th Int. Conf. Cyber IT Serv. Manag. CITSM 2018 (2019).
19. D.A. Kristiyanti, D.A. Putri, E. Indrayuni, A. Nurhadi, and A.H. Umam, J. Phys. Conf. Ser. 1641, (2020).
20. D.A. Putri, D.A. Kristiyanti, E. Indrayuni, A. Nurhadi, and D.R. Hadinata, J. Phys. Conf. Ser. 1641, (2020).
21. D.A. Kristiyanti, Semin. Nas. Inov. Tren 2015 “Peluang Dan Tantangan Indones. Dalam Menyikapi Afta 2015” 134 (2015).
22. J. Prusa, T.M. Khoshgoftaar, and A. Napolitano, Proc. - 2015 IEEE 14th Int. Conf. Mach. Learn. Appl. ICMLA 2015 535 (2016).
23. R. Dehkharghani, H. Mercan, A. Javeed, and Y. Saygin, Expert Syst. Appl. 41, 4950 (2014).
24. B. Liu, Sentiment Analysis and Opinion Mining, Lecture 1 (Morgan Claypool Publishers, Canada, 2012).
25. L. Kaurpabla, P. Jain, and P. Patel, Int. J. Adv. Trends Comput. Sci. Eng. 9, 4558 (2020).
26. S.D.P. Devisetty, Y.M. Sai, A.V. Yadav, and P. Vidyullatha, Int. J. Innov. Technol. Explor. Eng. 8, 1410 (2019).
27. V.K. Jain, S. Kumar, and S.L. Fernandes, J. Comput. Sci. 21, 316 (2017).
28. M.D. Devika, C. Sunitha, and A. Ganesh, Procedia Comput. Sci. 87, 44 (2016).



29. S. Hansun, A. Suryadibrata, R. Nurhasanah, and J. Fitra, Indian J. Comput. Sci. Eng. 13, 51 (2022).
30. K. Kelly Isyanta, Int. J. Adv. Trends Comput. Sci. Eng. 9, 5364 (2020).
31. V.K. Jain, S. Kumar, and P. Mahanti, Int. J. Enterp. Inf. Syst. 14, 77 (2018).
32. S.M. Samiul Salehin, R. Miah, and M. Saiful Islam, in ACM Int. Conf. Proceeding Ser. (2020).
