Daffa Jatmiko 1806196522 English Summary Manuscript (Naskah Ringkas Inggris) 2022

Aspect-Based Sentiment Analysis of National Capital Relocation Plan Using
Naïve Bayes Classifier and Support Vector Machine
Daffa Jatmiko, Isti Surjandari
Department of Industrial Engineering, Faculty of Engineering, University of Indonesia, Depok, 16424, Indonesia
E-mail: daffa.jatmiko@ui.ac.id, isti@ie.ui.ac.id
Abstract
The capital city plays a vital role, and the government has once again decided to relocate it because Jakarta is no longer considered suitable as the capital of the Republic of Indonesia. The relocation of Indonesia's capital has drawn many opinions for and against it, and this public response is worth studying because it shows how the public views the policy and also reflects the level of public trust in the government. Sentiment analysis with an accurate machine-learning classifier is therefore needed, together with a comparison to determine the best algorithm. Tweets were collected by web scraping and pre-processed, which produced polarity labels and identified aspect categories. Machine learning models based on the Naïve Bayes and Support Vector Machine algorithms were then used to classify binary polarity, using n-gram features (word sequences) and heuristic optimization in the form of hyperparameter tuning. Comparing the Matthews Correlation Coefficient (MCC) as the evaluation metric across the combinations of features and optimization treatments showed that Naïve Bayes outperformed the Support Vector Machine in classifying public opinion on Twitter about the relocation of the capital city.
Keywords: Sentiment Analysis, Opinion Mining, Social Media, Naïve Bayes Classifier, Support Vector Machine
Introduction
On April 29, 2019, in a limited cabinet meeting, President Joko Widodo decided to move the country's capital city outside Java. The relocation is included in the 2020–2024 National Medium-Term Development Plan. On August 26, 2019, President Joko Widodo further announced that the new capital would be built in the administrative areas of North Penajam Paser Regency and Kutai Kartanegara Regency, East Kalimantan Province. The national capital plays a highly strategic, fundamental, and vital role because it is multifunctional, serving as the center of politics and government as well as of business and economic activity. Relocating the capital is nothing new: historically, several cities have served as Indonesia's capital, including Yogyakarta and Bukittinggi in West Sumatra.
The government is now again discussing moving the capital because Jakarta is no longer considered suitable as the capital of the Republic of Indonesia. Its location in the western part of the country is thought to contribute to the high level of inequality between regions. The capital is therefore to be moved from Jakarta to another area considered to have greater potential and better regional carrying capacity. The government has several reasons for moving the capital outside Java. One is Jakarta's population, which does not decline but rises significantly every year (Putri et al., 2018), because the centers of government, the economy, business, education, and other activities are all located in Jakarta, making the city ever more densely populated. This has also worsened the availability of clean water in Jakarta (Luo et al., 2019). Another reason cited in research is Jakarta's geography: the city lies on the Ring of Fire, a disaster-prone zone.
Regarding the government's policy of relocating the capital to East Kalimantan, many people have expressed their agreement. Many factors underlie this majority support, especially equity: development in Indonesia is not yet evenly distributed, particularly economic and infrastructure development in the underdeveloped, frontier, and outermost regions of the country. Undeniably, there are also people who oppose moving the capital to East Kalimantan. Some think the preparations are not yet mature, because the government was seen as not sufficiently open to public opinion, which gave the impression that it was not ready to move the capital. In addition, many problems in the economy, education, socio-cultural life, and other areas remain unresolved. These factors can lead to the public opinion that the government is reckless in pursuing the relocation policy. Given the various opinions for and against it, the planned relocation of the capital is an interesting case for examining not only the public's opinion trends but also its views on government policy. The hope is that government policy can be aligned with the aspirations of the people and that the government can prepare the relocation carefully in its various aspects.
Theoretical Review
A. Naive Bayes
Using Bayes' theorem, we can write
$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$
In plain English, the equation above can be written as
$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
In practice, only the numerator of the fraction matters, because the denominator does not depend on $C$ and the feature values $F_i$ are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model
$$p(C, F_1, \dots, F_n)$$
Now the "naive" conditional independence assumption comes into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $j \neq i$, given the class $C$. This means that
$$p(F_i \mid C, F_j) = p(F_i \mid C)$$
for $i \neq j$, so the joint model can be expressed as
$$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C)\, p(F_3 \mid C) \cdots = p(C) \prod_{i=1}^{n} p(F_i \mid C)$$
This means that, under the independence assumption above, the conditional distribution of the class variable $C$ can be expressed as
$$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$
where $Z = p(F_1, \dots, F_n)$ (the evidence) is a scaling factor that depends only on $F_1, \dots, F_n$, i.e., a constant if the values of the feature variables are known.
The discussion so far has derived an independent feature model, namely the naive Bayes probability model. The naive Bayes classifier combines this model with a decision rule. One common rule is to choose the most probable hypothesis; this is known as the maximum a posteriori (MAP) decision rule. The corresponding classifier is the function classify, defined as follows:
$$\operatorname{classify}(f_1, \dots, f_n) = \underset{c}{\operatorname{argmax}}\; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c)$$
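To make the MAP rule concrete, the Python sketch below applies it to bag-of-words features. It assumes a multinomial event model with add-one (Laplace) smoothing and works with log-probabilities, so the product over features becomes a sum and the constant $1/Z$ can be dropped. This is a common textbook variant and may differ from the Naïve Bayes variant whose kernel and bandwidth parameters are tuned in Table 4; the names `train_nb` and `classify` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Estimate log p(C=c) and log p(F=w | C=c) with add-alpha smoothing."""
    classes = sorted(set(labels))
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)

    log_prior = {c: math.log(class_counts[c] / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        # Denominator: total word count in class c plus alpha for every vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return classes, log_prior, log_lik


def classify(doc, model):
    """MAP rule: argmax_c [log p(c) + sum_i log p(f_i | c)]; unseen words are skipped."""
    classes, log_prior, log_lik = model

    def log_posterior(c):
        # Unnormalized: the evidence term 1/Z is identical for every class, so it is dropped.
        return log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])

    return max(classes, key=log_posterior)


docs = [["good", "plan"], ["great", "plan"], ["bad", "plan"], ["bad", "idea"]]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
print(classify(["good", "plan"], model))  # positive
print(classify(["bad", "idea"], model))   # negative
```

Summing logs instead of multiplying probabilities avoids numerical underflow when a tweet contains many n-gram features.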
B. Support Vector Machine
The SVM technique is a classifier that finds a hyperplane, or function $g(x) = w^T x + b$, that correctly separates two classes with the maximum margin.
Figure 1. Hard maximum-margin separating hyperplane
Mathematically, given a set of points $x_i$ belonging to two linearly separable classes $\omega_1$ and $\omega_2$, the distance of each instance from the hyperplane is $|g(x)| / \lVert w \rVert$. SVM aims to find $w$ and $b$ such that $g(x)$ equals 1 for the closest data point belonging to class $\omega_1$ and $-1$ for the closest data point belonging to $\omega_2$. This can be seen as having a margin of
$$\frac{1}{\lVert w \rVert} + \frac{1}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}$$
with $w^T x + b = 1$ for the closest $x \in \omega_1$ and $w^T x + b = -1$ for the closest $x \in \omega_2$.
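This leads to an optimization problem that minimizes the objective function
$$J(w) = \frac{1}{2} \lVert w \rVert^2$$
subject to the constraints
$$y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \dots, N$$
where $y_i = 1$ for $x_i \in \omega_1$ and $y_i = -1$ for $x_i \in \omega_2$. Minimizing $\lVert w \rVert$ is equivalent to maximizing the margin $2 / \lVert w \rVert$.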
When an optimization problem, whether minimization or maximization, has constraints on the optimized variables, each constraint is multiplied by a Lagrange multiplier and incorporated into the cost or error function. The Lagrangian function for the SVM is formed by subtracting the weighted sum of the constraints from the objective function:
$$\mathcal{L}(w, b, \lambda) = \frac{1}{2} w^T w - \sum_{i=1}^{N} \lambda_i \left[ y_i (w^T x_i + b) - 1 \right]$$
where $w$ and $b$ are called the primal variables and $\lambda_i$ are the Lagrange multipliers.
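Because the constraints are inequalities, the Karush-Kuhn-Tucker (KKT) conditions generalize the method of Lagrange multipliers. The KKT conditions are:
1. Primal feasibility
$$-\left[ y_i (w^T x_i + b) - 1 \right] \leq 0 \quad \forall i = 1, \dots, N$$
2. Dual feasibility
$$\lambda_i \geq 0 \quad \forall i = 1, \dots, N$$
3. Complementary slackness
$$\lambda_i \left[ y_i (w^T x_i + b) - 1 \right] = 0 \quad \forall i = 1, \dots, N$$
4. Stationarity (the gradient of the Lagrangian with respect to the primal variables is zero)
$$\nabla \mathcal{L}(w, b, \lambda) = \begin{bmatrix} w - \sum_{i=1}^{N} \lambda_i y_i x_i \\ -\sum_{i=1}^{N} \lambda_i y_i \end{bmatrix} = 0$$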
Based on the KKT conditions,
$$w = \sum_{i=1}^{N} \lambda_i y_i x_i$$
$$\sum_{i=1}^{N} \lambda_i y_i = 0$$
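The dual problem of the SVM optimization is to find the multipliers $\lambda$ that solve
$$\max_{\lambda} \left( \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j y_i y_j x_i^T x_j \right)$$
subject to
$$\sum_{i=1}^{N} \lambda_i y_i = 0$$
$$\lambda_i \geq 0 \quad \forall i$$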
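The dual problem above is a small quadratic program, so for a handful of points it can be solved with a general-purpose constrained optimizer. The Python sketch below is an illustration only, not the solver used in the study: it assumes NumPy and SciPy are available, solves the dual with SciPy's SLSQP method, recovers $w$ from the stationarity condition, and recovers $b$ from the support vectors, where the margin constraint is active. The function name `fit_hard_margin_svm` and the four toy points are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_hard_margin_svm(X, y):
    """Solve the hard-margin SVM dual numerically and recover (w, b, lambdas)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Yx = y[:, None] * X
    H = Yx @ Yx.T  # H_ij = y_i y_j x_i^T x_j

    # SciPy minimizes, so minimize the negated dual: 1/2 lam^T H lam - sum(lam).
    def neg_dual(lam):
        return 0.5 * lam @ H @ lam - lam.sum()

    def neg_dual_grad(lam):
        return H @ lam - np.ones(n)

    result = minimize(
        neg_dual,
        x0=np.zeros(n),
        jac=neg_dual_grad,
        method="SLSQP",
        bounds=[(0.0, None)] * n,  # dual feasibility: lambda_i >= 0
        # Equality constraint from stationarity in b: sum_i lambda_i y_i = 0.
        constraints=[{"type": "eq", "fun": lambda lam: lam @ y, "jac": lambda lam: y}],
    )
    lam = result.x
    w = (lam * y) @ X  # stationarity in w: w = sum_i lambda_i y_i x_i
    sv = lam > 1e-3 * lam.max()  # support vectors: lambda_i > 0 up to solver tolerance
    b = np.mean(y[sv] - X[sv] @ w)  # active constraints: y_i (w^T x_i + b) = 1
    return w, b, lam


X = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]]
y = [1, 1, -1, -1]
w, b, lam = fit_hard_margin_svm(X, y)
print(np.round(w, 3), round(float(b), 3), np.round(lam, 3))
print("margin 2/||w|| =", round(2 / np.linalg.norm(w), 3))
```

For these points the exact optimum is $w = (1, 0)$, $b = 0$, $\lambda = (0.5, 0, 0.5, 0)$, and a margin of 2. The two outer points get $\lambda_i = 0$, which is complementary slackness at work: only the points lying on the margin become support vectors. Production SVM libraries use specialized solvers such as SMO rather than a generic optimizer.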
Research Methods
The research consists of seven stages. Research Initiation runs from problem identification to a literature study of opinion mining and the relocation of the national capital (Ibu Kota Negara, IKN). The Data Processing stage runs from data collection to filtering. Next comes the Aspect Identification stage, followed by Feature Generation, namely the application of n-grams. The last two stages are Classifier Building with Naïve Bayes and Support Vector Machine, using metaheuristic optimization in the form of hyperparameter tuning (hyperparameter optimization), and Evaluation with a confusion matrix, which measures accuracy, precision, recall, and specificity, as well as the Matthews Correlation Coefficient (MCC).
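All of the evaluation criteria named above come from the four cells of a binary confusion matrix. The Python sketch below computes them with the standard formulas, treating the positive class as the class of interest. Returning 0 when a denominator is zero is a common convention and an assumption here, as is the function name `binary_metrics`.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics for a binary (positive/negative) classifier."""
    def ratio(num, den):
        return num / den if den else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # also called sensitivity
        "specificity": ratio(tn, tn + fp),
        # MCC ranges from -1 to 1; 0 means no better than random guessing.
        "mcc": ratio(tp * tn - fp * fn, denom),
    }


print(binary_metrics(tp=50, tn=30, fp=10, fn=10))
# A model that labels everything positive: sensitivity 1.0, specificity 0.0, MCC 0.0
print(binary_metrics(tp=60, tn=0, fp=40, fn=0))
```

The second call mirrors the pattern of the default SVM with unigram features in Table 5 (100% sensitivity, 0% specificity, 0% MCC). Predicting only the majority class still yields a respectable accuracy, but MCC drops to zero, which is why MCC is a safer ranking criterion on imbalanced data.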
A. Data Pre-processing
The data pre-processing stage begins with collecting data from Twitter. The data are then cleaned by removing duplicates and unnecessary expressions so that each tweet can be labeled with positive or negative polarity.
Table 1. Pre-Processing Stage
No. Phase
1 Data collection: web scraping with the Python programming language through the Twitter Developer Platform (Twitter API).
2 Data translation: translating the tweets into English.
3 Duplicate removal: removing duplicate instances so that only one copy of each is kept.
4 Regular expression removal: removing unwanted expressions by replacing the text matched by regular expressions.
5 Polarity labeling: analyzing each text to determine whether it expresses positive, negative, or neutral sentiment. Because the SVM algorithm is used, binary polarity is required, so the data are filtered again to keep only positive and negative polarity.
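Steps 3 to 5 of Table 1 can be sketched in a few lines of Python. Scraping and translation are omitted because they depend on external services. The regular expressions, the tiny word lists, and the lexicon-based `toy_label` function below are hypothetical stand-ins: the study does not specify which expressions were removed or which tool assigned polarity.

```python
import re

# Hypothetical patterns; the study does not list the exact expressions it removed.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")

# Hypothetical mini-lexicon standing in for the real polarity labeler.
POSITIVE = {"good", "support", "agree"}
NEGATIVE = {"bad", "reject", "reckless"}


def clean(text):
    """Step 4: strip URLs, mentions and non-letter characters, then collapse spaces."""
    for pattern in (URL_RE, MENTION_RE, NON_LETTER_RE):  # URLs first, before their letters remain
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def toy_label(text):
    """Step 5 stand-in: count lexicon hits to get positive/negative/neutral."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def preprocess(tweets, label_fn=toy_label):
    seen = set()
    rows = []
    for tweet in tweets:
        if tweet in seen:  # Step 3: keep only the first copy of a duplicate
            continue
        seen.add(tweet)
        text = clean(tweet)
        label = label_fn(text)
        if label in ("positive", "negative"):  # Step 5 filter: binary polarity only
            rows.append((text, label))
    return rows


tweets = [
    "Good plan! https://t.co/x",
    "Good plan! https://t.co/x",
    "@user this is reckless",
    "Moving to Kalimantan",
]
print(preprocess(tweets))  # [('Good plan', 'positive'), ('this is reckless', 'negative')]
```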
B. Aspect Identification
Aspect categories (e.g., food, price) identify features that are coarser than aspect terms, and they do not always appear as terms in a given sentence. In the IKN relocation topic, terms such as president, minister, and government refer to the "Implementation of IKN" aspect category because, in the context of the tweets, these entities are involved in carrying out the IKN relocation, even though this is stated implicitly rather than explicitly. Many tweets also mention places directly, such as Kalimantan or Samarinda, which refer to the choice of location for the new capital and are therefore assigned to the "Location" aspect. Several terms, such as people, residents, and citizens, share the same meaning: although their definitions differ slightly, in context they refer to the same entity and therefore fall into the "Social" aspect.
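The study does not spell out its matching procedure, but one simple way to operationalize keyword-defined aspect categories is a lookup against the keyword list in Table 2. The Python sketch below assumes that approach. It uses a hypothetical subset of the Table 2 keywords, lets a tweet receive more than one aspect, and labels tweets with no match as `Unassigned` (also hypothetical).

```python
import re

# Hypothetical subset of the Table 2 keywords (lower-cased).
ASPECT_KEYWORDS = {
    "Implementation of IKN": {"jokowi", "government", "minister", "ikn law", "relocation"},
    "Location": {"east kalimantan", "kalimantan", "samarinda", "bekasi"},
    "Economy": {"money", "budget", "debt", "investment"},
    "Social": {"people", "locals", "citizen", "students"},
    "Environment": {"habitat", "orangutan", "forest", "komodo dragon"},
}


def assign_aspects(tweet):
    """Return every aspect whose keyword (single or multi-word) occurs in the tweet."""
    # Normalize to lower-case words separated by single spaces and pad both ends,
    # so that " kalimantan " matches whole words only.
    text = " " + " ".join(re.sub(r"[^a-z0-9]+", " ", tweet.lower()).split()) + " "
    hits = [aspect for aspect, keywords in ASPECT_KEYWORDS.items()
            if any(f" {kw} " in text for kw in keywords)]
    return hits or ["Unassigned"]


print(assign_aspects("The government will move the capital to East Kalimantan, near orangutan habitat."))
print(assign_aspects("Traffic was heavy today."))
```

Keyword lookup cannot resolve implicit references on its own, so in practice the keyword lists are what encode the contextual judgments described above (for example, mapping "minister" to Implementation of IKN).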
Table 2. List of 7 Aspects of IKN Transfer and their key words
Aspect | Keywords
Implementation of IKN | Jokowi, government, Minister, bill, ikn, ikn law, progress, relocation
Location | DKI Jakarta, east kalimantan, Kalimantan, Bekasi, Bandung, Samarinda
IKN Manager | Ahok, candidates, leader of new capital, chief, authority
Other infrastructure development | train, project, airport, construction, infrastructure
Economy | money, budget, debt, benefits, pay, cost, funds, investment, pertamina, bumn, state-owned enterprises
Social | people, locals, citizen, demonstration, students
Environment | habitat, komodo dragon, komodo island, orangutan, Forest, eagle
[Figure: pie chart "Aspects Percentage" showing the share of each of the 7 aspects (Implementation of IKN, IKN Manager, Location, Other Infrastructure Development, Economy, Social, Environment) in the tweet data]
Figure 2. Percentage of the 7 Aspects in the Tweet Data
C. N-gram Feature Generation

An n-gram is a contiguous sequence of n words or sounds in a given text or speech sample, and an n-gram model uses such sequences to help predict the next item in a sequence. In sentiment analysis, n-grams help analyze the sentiment of a text or document by capturing short word sequences rather than isolated words. The Generate n-grams operator is used, which creates all possible n-grams from consecutive tokens in each document.
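A word n-gram generator reduces to a sliding window over the token list. The Python sketch below shows the idea. `ngrams_up_to` reflects the assumption that "all possible n-grams" includes every length from 1 up to n, which is how some text-mining tools behave. Both function names are hypothetical and are not the API of the operator mentioned above.

```python
def ngrams(tokens, n):
    """Contiguous word n-grams of exactly length n, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngrams_up_to(tokens, max_n):
    """All n-grams of length 1..max_n (an assumed reading of 'all possible n-grams')."""
    return [gram for n in range(1, max_n + 1) for gram in ngrams(tokens, n)]


tokens = ["capital", "city", "relocation", "plan"]
print(ngrams(tokens, 2))              # ['capital city', 'city relocation', 'relocation plan']
print(ngrams(tokens, 3))              # ['capital city relocation', 'city relocation plan']
print(len(ngrams_up_to(tokens, 3)))   # 4 + 3 + 2 = 9
```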
D. Classification Model Development
After the tweets are assigned to aspects and split into n-gram sequences, they go through several processing stages before the parameters are optimized (hyperparameter tuning). The tweets are then fed into the machine learning (ML) classification models. Table 3 describes the processing stages that precede the ML classification model, and Table 4 describes the parameters used for each algorithm.
Table 3. Tweet processing stages before the ML classification model
No. Phase
1 Case folding: converts all characters in the document to lowercase letters.
2 Tokenization: splits the text into tokens before it is converted into a vector.
3 Stopword filtering: removes common words that carry no meaning or needed information (Ling et al., 2014).
4 Stemming: removes word endings to reduce each word to its root form.
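The four steps in Table 3 can be chained as in the Python sketch below. The stopword list and the suffix-stripping stemmer are deliberately tiny, hypothetical stand-ins. A real pipeline would use a full stopword list and an established stemmer such as Porter's.

```python
import re

# Tiny illustrative stopword list (hypothetical, not the list used in the study).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "this"}
# Toy suffix stripper, checked in order; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def case_fold(text):
    return text.lower()


def tokenize(text):
    return re.findall(r"[a-z]+", text)


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    for suffix in SUFFIXES:
        # Only strip when at least three characters of the stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def process(text):
    tokens = tokenize(case_fold(text))                  # steps 1 and 2
    return [stem(t) for t in remove_stopwords(tokens)]  # steps 3 and 4


print(process("Citizens rejected the capital plans"))  # ['citizen', 'reject', 'capital', 'plan']
```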
Table 4. Parameters of the Optimized Naïve Bayes and Support Vector Machine
Algorithm | Parameter | Grid/range
Naïve Bayes | Number of kernels | 1–10
Naïve Bayes | Bandwidth | 0.1–0.5
Support Vector Machine | C | 0.1–100
Support Vector Machine | Gamma | 0.0001–10
E. K-fold Cross-Validation Evaluation

Four models are evaluated: default Naïve Bayes, optimized Naïve Bayes, default Support Vector Machine, and optimized Support Vector Machine. The confusion matrix and model performance criteria are explained further in the analysis of the research results. All models are evaluated with 10-fold cross-validation: the data are divided into ten folds, and each fold serves once as the test set while the other nine are used for training, so no separate fixed train/test split is needed.
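No separate train/test split is needed because every tweet serves as test data exactly once across the ten folds. The plain-Python sketch below (function name hypothetical) generates contiguous, unshuffled folds so that this property is easy to check. In practice the data would normally be shuffled, and often stratified by class, before splitting.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_idx, test_idx); fold sizes differ by at most one sample."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


splits = list(k_fold_splits(23, k=10))
print([len(test) for _, test in splits])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```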
Research Results and Discussion
The final analysis in this study compares all the classification models that were developed. Table 5 contains all the performance evaluation values of the models. Interestingly, both the optimized NB and the optimized SVM outperform their default counterparts in terms of MCC. The default NB does not handle the classification task well, with accuracy and MCC of 61.76% and 10% for the unigram model, 59.76% and 11% for the bigram model, and 67.20% and 26% for the trigram model, respectively. Meanwhile, the optimized NB performs only slightly better than the default NB (the improvement is not significant): the accuracy difference is 2.33 percentage points for the unigram model, 5.88 for the bigram model, and 1.21 for the trigram model.
Table 5. Performance of the classification models
Evaluation criteria | Default NB | Optimized NB | Default SVM | Optimized SVM
Unigram
Accuracy | 61.76% | 64.09% | 62.69% | 62.85%
Precision | 64.91% | 66.80% | 62.69% | 62.91%
Sensitivity | 84.94% | 84.94% | 100.00% | 99.26%
Specificity | 22.82% | 29.05% | 0.00% | 1.66%
MCC | 10% | 17% | 0% | 4%
Bigram
Accuracy | 59.76% | 65.64% | 61.94% | 58.21%
Precision | 66.08% | 66.91% | 66.13% | 69.12%
Sensitivity | 73.58% | 89.38% | 80.49% | 60.25%
Specificity | 36.51% | 25.73% | 30.71% | 54.77%
MCC | 11% | 20% | 13% | 15%
Trigram
Accuracy | 67.20% | 68.41% | 64.56% | 66.10%
Precision | 69.57% | 69.82% | 65.44% | 73.02%
Sensitivity | 84.69% | 87.41% | 92.10% | 72.84%
Specificity | 37.76% | 36.51% | 18.26% | 54.77%
MCC | 26% | 28% | 16% | 28%
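The accuracy differences quoted in the surrounding discussion can be re-derived from Table 5. The short Python check below subtracts each default model's accuracy from its optimized counterpart; the figures are copied from the table, and the dictionary layout is only an illustration. MCC is left out because the table reports it rounded to whole percentages, while the discussion quotes finer-grained gains (such as 4.29%) that cannot be recovered from the rounded values.

```python
# Accuracy (%) from Table 5: (default, optimized) for each n-gram setting.
ACCURACY = {
    "NB": {"unigram": (61.76, 64.09), "bigram": (59.76, 65.64), "trigram": (67.20, 68.41)},
    "SVM": {"unigram": (62.69, 62.85), "bigram": (61.94, 58.21), "trigram": (64.56, 66.10)},
}

# Optimized minus default, in percentage points, rounded to two decimals.
gains = {
    model: {ngram: round(opt - default, 2) for ngram, (default, opt) in rows.items()}
    for model, rows in ACCURACY.items()
}
print(gains)
# expected gains: NB 2.33, 5.88, 1.21; SVM 0.16, -3.73, 1.54
```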
As with the slight improvement from default to optimized NB, hyperparameter tuning (parameter optimization) makes the SVM only slightly better (not significantly). For the SVM, accuracy and MCC improved by 0.16% and 4.29% respectively in the unigram model; in the bigram model, accuracy decreased by 3.73% while MCC increased by 1.85%; and in the trigram model, accuracy and MCC increased by 1.54% and 12%. Without optimization, the SVM can still perform its classification task, although with lower precision and specificity. With an MCC of 16%, this model does not qualify as a good classifier, and even after optimization its MCC reaches only 28%. In this case, …
[Figure: bar chart of MCC for unigram, bigram, and trigram features for the Optimized NB, Optimized SVM, Default NB, and Default SVM models]
Figure 3. Model Performance Rating Based on MCC Value
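MCC is used as the main criterion for ranking the classifiers in Figure 3. The best classifier in this study is the optimized NB with trigram features, followed by the optimized SVM with trigrams, then the default NB with trigrams, and finally the default SVM with trigrams. The optimized NB and optimized SVM trigram models share the same rounded MCC of 28%, but the optimized NB has the higher accuracy (68.41% versus 66.10%).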
Conclusion
This research also serves as a comparative study of the two methods and of the parameter optimization applied to each. In contrast to Kristiyanti et al. (2020) and Hakim et al. (2021), this study found that NB outperformed SVM when the parameters were optimized before the learning process. Without optimization, the results are still in line with many similar studies in which Naïve Bayes is the best choice of classification model because it learns quickly from high-dimensional features with limited training data.
Parameter optimization was shown to improve the performance of the classification models with a fairly short computation time. The improvement for NB is not significant, but it shows that the NB algorithm can give more satisfactory results when its parameters are set appropriately. This study produced four classifiers, although none performs well enough: optimized NB, optimized SVM, default NB, and default SVM, with the first three working better than the default SVM, which is almost unable to perform the sentiment classification task.
Suggestion
The study can be improved in future work by adding more classifiers, such as random forest, decision tree, and k-nearest neighbors, as well as more diverse and precise optimization treatments. Adding these classifiers is expected to provide new insights into the results and characteristics of the resulting models, since classifier algorithms other than NB and SVM, or other optimization treatments, may well produce better performance; this is therefore recommended for future research. In addition, with the increasing number of users of social media such as Twitter, …
Reference list
Arslan, M. (2014). The significance of shifting capital of Kazakhstan from Almaty to Astana: An evaluation on the basis of geopolitical and demographic developments. Procedia - Social and Behavioral Sciences, 120, 98–109. https://doi.org/10.1016/j.sbspro.2014.02.086
Amrani, Y. A., Lazaar, M., & Kadiri, K. E. (2018). A novel hybrid classification approach for sentiment analysis of text documents. International Journal of Electrical and Computer Engineering (IJECE), 8(6), 4554–4567. https://doi.org/10.11591/ijece.v8i6.pp4554-4567
Carley, K. M., Malik, M., Kowalchuck, M., Pfeffer, J., & Landwehr, P. (2015). Twitter usage in Indonesia. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2720332
Hamdan, H., Bellot, P., & Bechet, F. (2015). LSISLIF: CRF and logistic regression for opinion target extraction and sentiment polarity analysis. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), 753–758. https://doi.org/10.18653/v1/s15-2128
Kelly, J. (2020). The city sprouted: The rise of Brasília. http://www.jstor.org/stable/10.2307/26924964?refreqid=search-gateway
Logan, D. (2013, October 20). Myanmar's phantom capital. The Globalist. https://www.theglobalist.com/myanmars-phantom-capital/
Luo, P., Kang, S., Apip, Zhou, M., Lyu, J., Aisyah, S., Binaya, M., Regmi, R. K., & Nover, D. (2019). Water quality trend assessment in Jakarta: A rapidly growing Asian megacity. PLOS ONE, 14(7). https://doi.org/10.1371/journal.pone.0219009
Maitra, S., Madan, S., Kandwal, R., & Mahajan, P. (2018). Mining authentic student feedback for faculty using naïve Bayes classifier. Procedia Computer Science, 132, 1171–1183. https://doi.org/10.1016/j.procs.2018.05.032
Narayanan, V., Arora, I., & Bhatia, A. (2013). Fast and accurate sentiment classification using an enhanced naive Bayes model. Intelligent Data Engineering and Automated Learning – IDEAL 2013, 194–201. https://doi.org/10.1007/978-3-642-41278-3_24
Ni, P., Kamiya, M., & Ding, R. (2018). Cities network along the Silk Road: The global urban competitiveness report 2017. Springer.
Pak, A., & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10).
Park, C. W., & Seo, D. R. (2018). Sentiment analysis of Twitter corpus related to artificial intelligence assistants. 2018 5th International Conference on Industrial Engineering and Applications (ICIEA). https://doi.org/10.1109/iea.2018.8387151
Putri, R. F., Wibirama, S., Sukamdi, & Giyarsih, S. R. (2018). Population condition analysis of Jakarta land deformation area. IOP Conference Series: Earth and Environmental Science, 148, 012007. https://doi.org/10.1088/1755-1315/148/1/012007
Reva, D. (2017, May 9). Capital city relocation and national security: The cases of Nigeria and Kazakhstan. UPSpace. http://hdl.handle.net/2263/60413
Siegel, F. R. (2020). Coastal city flooding. In Adaptations of coastal cities to global warming, sea level rise, climate change and endemic hazards (pp. 27–34). Springer.
Sutoyo, E., & Almaarif, A. (2020). Educational data mining for predicting student graduation using the Naïve Bayes classifier algorithm. Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), 4(1), 95–101. https://doi.org/10.29207/resti.v4i1.1502
Bayhaqy, A., Sfenrianto, S., Nainggolan, K., & Kaburuan, E. R. (2018). Sentiment analysis about e-commerce from tweets using decision tree, K-nearest neighbor, and Naïve Bayes. 2018 International Conference on Orange Technologies (ICOT). https://doi.org/10.1109/icot.2018.8705796
Fitri, V. A., Andreswari, R., & Hasibuan, M. A. (2019). Sentiment analysis of social media Twitter with case of anti-LGBT campaign in Indonesia using naïve Bayes, decision tree, and random forest algorithm. Procedia Computer Science, 161, 765–772. https://doi.org/10.1016/j.procs.2019.11.181
Guia, M., Silva, R., & Bernardino, J. (2019). Comparison of naïve Bayes, support vector machine, decision trees and random forest on sentiment analysis. Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management. https://doi.org/10.5220/0008364105250531
Hakim, S. N., Putra, A. J., & Khasanah, A. U. (2021). Sentiment analysis on myIndiHome user reviews using support vector machine and naïve Bayes classifier method. International Journal of Industrial Optimization, 2(2), 151. https://doi.org/10.12928/ijio.v2i2.4437
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Machine Learning: ECML-98, 137–142. https://doi.org/10.1007/bfb0026683
Kristiyanti, D. A., Putri, D. A., Indrayuni, E., Nurhadi, A., & Umam, A. H. (2020). E-wallet sentiment analysis using naïve Bayes and support vector machine algorithm. Journal of Physics: Conference Series, 1641, 012079. https://doi.org/10.1088/1742-6596/1641/1/012079
Neogi, A. S., Garg, K. A., Mishra, R. K., & Dwivedi, Y. K. (2021). Sentiment analysis and classification of Indian farmers' protest using Twitter data. International Journal of Information Management Data Insights, 1(2), 100019. https://doi.org/10.1016/j.jjimei.2021.100019
