Procedia Computer Science 216 (2023) 682–690

7th International Conference on Computer Science and Computational Intelligence 2022

Sentiment analysis for customer review: Case study of Traveloka
Keywords: Sentiment analysis, Twitter, Traveloka

1. Introduction
Machine learning addresses the question of how to build computers that improve automatically through
experience [1]. Machine learning is growing quite rapidly lately, and it also develops every time we gather data
worldwide. There are many examples and types of machine learning, one of which is sentiment analysis.
Sentiment analysis is a growing field at the intersection of linguistics and computer science that attempts to
determine the sentiment that is contained inside the sentence automatically [2]. The goal of sentiment analysis is to

Ziedhan Alifio Diekson et al. / Procedia Computer Science 216 (2023) 682–690
determine what kind of sentiment we have just acquired from the dataset. Those sentiments could be either negative
or positive.
The analysis of these sentiments and opinions has spread across many fields, such as consumer information,
marketing, books, applications, websites, and social media [3]. In this study, we take our example from a social
media platform. Sentiment analysis, as described above, extracts the sentiment expressed in a sentence and
determines whether that sentiment is negative or positive.
A sentence is classified as positive when its writer expresses happiness or satisfaction with the app’s
performance. The criteria for negative sentiment are the exact opposite: an expression of dissatisfaction with the
performance of Traveloka is classified as negative sentiment.
A positive sentence typically contains positive words such as “Bagus”, “Enak”, and “Nyaman”, while a negative
sentence contains negative words such as “Jelek”, “Sedih”, “Kurang”, and many others.
As mentioned, sentiment analysis can be applied on many platforms, one of which is Twitter. Nowadays, people
tend to share what is on their minds through social media such as Twitter. A tweet can contain anything from plain
text to an attached video, but the platform mainly serves as a place where people say what they think.
Traveloka is a mobile-based app focusing on travel services such as ticket and hotel booking. It was launched in
2012, mainly to make traveling more accessible by letting people book tickets from their smartphones. With
Traveloka, travelers and backpackers can choose suitable accommodation and the best transportation available.
The company also launched its own food delivery service, Traveloka Eats, in 2018 to compete against other mobile
application companies that had launched food delivery services earlier. However, it could not keep up with these
far stronger competitors. Although the service was launched in 2018, advertisements for Traveloka Eats have
recently become widespread, for example as YouTube advertisements and online banner advertisements.
This study aims to identify how satisfied Twitter users are with the services provided by Traveloka by using
sentiment analysis. Satisfied, or positive, sentiment covers people who are happy and satisfied with the app’s
performance. In contrast, unhappy, or negative, sentiment is the exact opposite, covering dissatisfied people and
bad experiences.

2. Literature Review

Sentiment analysis is a text classification task that aims to determine subjective information from a sentence,
namely whether the sentence carries positive, negative, or neutral sentiment. It extracts contextual information
from a text and then determines the sentiment of the text using algorithms such as machine learning or deep
learning. Both the methods that can be used and the platforms that can be analyzed vary widely, which shows that
this area of research is large and leaves much to explore. Applications range from analyzing customer reviews to
predicting a presidential election from Twitter [4]. Sentiment analysis is also helpful for understanding how people
felt during the COVID-19 outbreak. For example, the research conducted by A. D. Dubey examines the emotions
expressed in 12 countries [5]. Those emotions are anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.
The results show which country expressed each emotion most strongly; for instance, people from France expressed
the most anger toward the COVID-19 outbreak. The results are mixed across emotions.
Besides people’s reactions to the COVID-19 outbreak, sentiment analysis can also be used to learn what people
think about the vaccines or the lockdowns of the past few years. The research that S. Almotiri conducted aims to
find out what New Zealanders think about the lockdown in their country [6].

Most of the people there were surprised, took it positively, and supported the government’s actions for the
betterment of others. While lockdowns were received positively, the question arises of how people view vaccines.
The research that C. Villavicencio et al. conducted shows that people in the Philippines were happy to be
vaccinated [7]; although some might perceive the vaccine as dangerous, this did not appear to be the prevailing view
in the Philippines. Sentiment analysis can also be used to learn what people think about certain companies or
products. For example, the research that D. D. Das et al. conducted shows how many people view an airline
company positively or negatively [8]. Beyond airlines, the research conducted by A. R. Prananda et al. shows that
the approach applies to other companies and products [9]: they used sentiment analysis to study customers’ views
of the performance of the Go-Jek app, which was new at the time of their research.
As mentioned before, sentiment analysis can be done using various methods. The research that R. Patel et al.
conducted used a lexicon-based method to find out what people thought about the World Cup held in Brazil in 2014
[10]. They gathered all the data they needed by themselves. Similarly, the research that B. Thapa conducted uses
two platforms as data sources [11]: Twitter and Reddit posts were gathered to study how people feel about
cybersecurity. They used Python text processing as the tool for data gathering and the VADER (Valence Aware
Dictionary for Sentiment Reasoning) algorithm as the classifier.
In sentiment analysis, classification is divided into three levels: document level, sentence level, and aspect
level [12]. This research works at the sentence level. Preprocessing the data involves several steps, including
cleaning, removing stopwords, tokenization, and stemming [13]. After preprocessing, we can determine which words
carry sentiment [14] and, as a result, find the sentiment of the data we are looking at [15].
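The preprocessing steps named above can be sketched in Python as follows. This is only a minimal illustration, not the pipeline used in this study: the function name `preprocess`, the regular expressions, the choice to drop mentions and hashtags, and the tiny stopword set (taken from the stopword examples given in Section 4) are all assumptions. Stemming is omitted because it needs a language-specific stemmer for Indonesian.

```python
import re

# Illustrative stopword set (assumption); a real run would use a full Indonesian list.
STOPWORDS = {"ada", "di", "ini", "aku", "yg", "yang", "ga", "saya"}

def preprocess(text):
    """Clean a tweet, tokenize it on whitespace, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # cleaning: remove URLs
    text = re.sub(r"[@#]\w+", " ", text)       # cleaning: remove mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # cleaning: keep letters and spaces only
    tokens = text.split()                      # tokenization
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal

tokens = preprocess("Jangan pesan tiket di @traveloka bikin emosi #travelokakecewa")
print(tokens)
```

Whether mentions and hashtags should be removed or kept as features is a design choice; here they are removed only to keep the sketch short.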
According to the Tetra Pak Index (2017), there are around 132 million internet users in Indonesia, and 40 percent
of the population are social media users [16]. Sentiment analysis analyzes opinions, sentiments, evaluations,
judgments, attitudes, and emotions towards entities such as products, services, organizations, individuals,
problems, events, topics, and their attributes [17]. It is the process of reading text to obtain sentiment information
from it [18]. In this study, similar to existing research, we collect data from tweets on Twitter [19]. Although our
methods differ from those of earlier studies, we also use Twitter’s API and scikit-learn.
In summary, sentiment analysis can be carried out with many different methods and on many different platforms,
which shows that the field is broad and that its potential can be explored in various ways. The results of our
literature review are shown in Table 1.

3. Methodology

Our stages of work are shown in Figure 1. In the data retrieval stage, we first need authorization from Twitter to
gain permission to gather data through the Twitter API. After gathering the data, we perform the data transformation
step, which converts the dataset from text into numerical data using a TF-IDF vectorizer, and we split the data into
train and test datasets. Next comes the step of training the classifiers and predicting on the test data, which yields
the accuracy score and the f1-score. The f1-score formula is given in Equation 1. The last step is Results Analysis:
the f1-score obtained from each model is presented in a table, broken down by sentiment, and the classification
result of the model with the highest f1-score is then explored further through its confusion matrix.

3.1. Data Retrieval

In this step, we create our dataset, which consists of 1200 tweets related to Traveloka. The dataset is collected
according to the following rules:

1. Twitter posts are filtered to show only posts from Indonesia
2. All Twitter posts that contain the “traveloka” keyword
3. All Twitter posts that contain the “traveloka eats” keyword
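
The three collection rules can be expressed as a simple filter. The sketch below is only illustrative and does not call the Twitter API: it assumes the tweets have already been downloaded as dictionaries with hypothetical `text` and `country` fields, and that Indonesia is encoded as the country code "ID". The second and third sample records are made up for the illustration.

```python
KEYWORDS = ("traveloka eats", "traveloka")  # rules 2 and 3

def matches_rules(tweet):
    """Return True if a tweet record satisfies the collection rules."""
    from_indonesia = tweet.get("country") == "ID"   # rule 1 (assumed field and code)
    text = tweet.get("text", "").lower()
    has_keyword = any(k in text for k in KEYWORDS)  # case-insensitive keyword match
    return from_indonesia and has_keyword

sample = [
    {"text": "Diskon traveloka eats manteepppp", "country": "ID"},
    {"text": "Booking a hotel on Traveloka", "country": "SG"},  # hypothetical record
    {"text": "Cuaca hari ini cerah", "country": "ID"},          # hypothetical record
]
selected = [t for t in sample if matches_rules(t)]
print(len(selected))
```

Note that any text containing “traveloka eats” also contains “traveloka”, so in this sketch rule 3 is a special case of rule 2.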

The dataset collected under the rules stated above is then processed through a series of analysis methods. Below
are examples of negative tweets within our dataset:
First example:

traveloka eat promonya makin ga menarik, auto uninstall deh. (traveloka eat’s promos these days are even more
uninteresting, automatically uninstalled it.)

Table 1. Literature Review Table

Paper | Objective | Method
[4] | To analyze prediction of Indonesia presidential election from Twitter. | They used the same platform as ours, but they use R programming language instead of Python.
[5] | To identify the sentiments of the citizens from 12 different countries regarding COVID-19 and identify what emotions have been shared by people from different parts of the world. | They used the same platform as ours, but they use R programming language instead of Python, just like the first paper did.
[8] | To know people’s thoughts about various airline services. | They used Rcurl as the IDE, sentR as the classifier, and several base methods of natural language processing.
[10] | To know how satisfied the spectators of the 2014 World Cup are through finding the hashtags “#brazil2014” and “#worldcup2014” on Twitter. | They used the lexicon method.
[6] | To identify and analyze how people in New Zealand feel about the lockdown during the COVID-19 pandemic. | They used RapidAPI for the data collector and AFINN lexicon method as the analyzer.
[7] | To know how many Filipinos are enthusiastic about the COVID-19 vaccine. | They used NLP and sentiment classification using Naïve Bayes classifier algorithm.
[20] | This study focuses on VAA study, which is a hospital to recommend candidate and parties to the people during the | They used Dynamic Virtual Advice method.
 | Sentiment analysis of two international apparel brands to determine which brand is most popular. | They used the same method as us, but they used streaming API instead of regular Twitter API.
[11] | To analyze sentiments related to cybersecurity posted by people on Twitter and Reddit. | They used VADER algorithm as the classifier and Python text processing as the data collector and analyzer.
[22] | To analyze sentiments from Twitter posts related to electricity. | They used Natural Language Toolkit (NLTK) combined with scikit-learn in Python.
 | To analyze the sentiment about reviews that are posted in the application page in Google Play Store about Shopee. | Gather data from Shopee review page on Google Play Store and use Naive Bayes to perform sentiment analysis.
[14] | To analyze the sentiment about reviews that are posted in the application page in Google Play Store about Go-Jek. | They used Naive Bayes method to identify.
 | To know whether Trump supporters or Hillary supporters are positive, neutral, or negative. | The methods that they are using are the same as ours, but they use streaming API.
 | To see public satisfaction with digital payment services available in Indonesia. | They used Naïve Bayes and K-Nearest Neighbour methods, which is different from ours.
[9] | To identify the business intelligence analysis in GO-JEK. | They used Microsoft Analytic Text Analytics instead of

Second example:

Jangan pesan tiket di @traveloka bikin emosi dan tidak punya tanggung jawab sama sekali #travelokakecewa
(Don’t book your tickets at @traveloka it makes you frustrated and they also don’t have any responsibility at all)

Both examples can be identified as negative sentiment. The first example contains several negative words, such
as uninteresting and uninstall, from which we can learn that the writer is dissatisfied with the services of
Traveloka Eats and wanted to uninstall the app. The second example is similar: it contains the word frustrated,
from which we can learn that the writer was clearly not happy with the booking system at Traveloka, and the writer
also encouraged other people not to use the platform.
Next are some examples of positive tweets within our dataset:
First example:

barusan beli tix jakarta bali, totally 5jt, tapi karna reedem point, dapet potongan 300rb, trus ada promo
traveloka discount 100rb, mayan bgt diskonan 400rb. (I just bought a ticket from jakarta to bali, totally 5
million, but because I redeemed some points, I got 300 thousand off. Moreover, there’s a discount from
traveloka 100 thousand, it’s really nice that I got 400 thousand off.)

Second example:

Diskon traveloka eats manteepppp (The discounts in traveloka eats are greatttt)

Both examples can be identified as positive sentiment. The second example contains the word “mantep” (great),
and the first contains the phrase “mayan bgt” (rendered above as “really nice”), from which we can learn that both
writers are satisfied with the services. The two examples nevertheless differ: the first tells us that the writer is
generally satisfied with Traveloka, while the second tells us that the writer is happy with the discounts that
Traveloka has to offer.
Because machine learning algorithms generally operate on numerical data, we need to convert the text into a set of
numerical vectors, a process commonly known as vectorization. The vectorizer converts the input data by
calculating the TF-IDF score of each word in our dataset and putting the resulting information into a vector.
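
To make the vectorization step concrete, the sketch below computes TF-IDF weights by hand for two short, made-up documents. It assumes the textbook definition (term frequency times the logarithm of the inverse document frequency); the names `docs` and `tfidf_vector` are hypothetical. Library implementations differ in detail: scikit-learn's `TfidfVectorizer`, for example, applies a smoothed IDF and L2 normalization by default, so its numbers would not match this sketch exactly.

```python
import math
from collections import Counter

# Two toy documents (assumption), already lowercased and cleaned.
docs = [
    "diskon traveloka eats mantep",
    "promo traveloka eats ga menarik",
]
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term.
df = Counter(t for doc in tokenized for t in set(doc))

def tfidf_vector(doc):
    """Return one TF-IDF weight per vocabulary term for a tokenized document."""
    counts = Counter(doc)
    return [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]

vectors = [tfidf_vector(doc) for doc in tokenized]
for term, weight in zip(vocab, vectors[0]):
    print(f"{term}: {weight:.3f}")
```

Terms that occur in every document, such as “traveloka” here, receive a weight of zero, which is why TF-IDF emphasizes the words that distinguish one tweet from another.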

3.2. Classification Training and Predicting on the Test Data

In the next step, we split the data into training and test subsets: 80% of the dataset forms the train set and 20%
forms the test set. From there, we train the classifiers and predict on the test data. We used Support Vector
Machine (SVM), Naive Bayes, and Logistic Regression models to see which classifier has the best accuracy.
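The split-and-compare procedure could look roughly like the following scikit-learn sketch. It is not the code used in this study: the toy texts and labels stand in for the labeled tweets, and the linear SVM kernel, the multinomial Naive Bayes variant, the stratified split, and the fixed `random_state` are all assumptions made for the illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data standing in for the labeled tweets (1 = positive, 0 = negative).
texts = ["diskon mantep", "promo bagus", "aplikasi nyaman", "harga enak", "layanan bagus",
         "promo jelek", "layanan kurang", "bikin emosi", "aplikasi jelek", "harga kurang"] * 4
labels = ([1] * 5 + [0] * 5) * 4

# 80% train, 20% test; stratification is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

models = {
    "SVM": SVC(kernel="linear"),
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, clf in models.items():
    # Fit TF-IDF on the training texts only, then train the classifier.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(X_train, y_train)
    predicted = pipeline.predict(X_test)
    results[name] = (accuracy_score(y_test, predicted), f1_score(y_test, predicted))
    print(name, results[name])
```

Wrapping the vectorizer and the classifier in one pipeline keeps the test tweets out of the TF-IDF fitting step, so the reported scores are computed on data the model has not seen.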
The SVM model can also be used for data reduction: Shenglong Zhou [26] managed to reduce memory and storage
use with a kernel-based SVM model. We used the SVM model because it removes the need for a separate feature
selection step, which makes text classification considerably easier.
According to a paper by Ying Guan et al., logistic regression is frequently used in the medical field to develop
predictive models based on binary data, for example to predict the likelihood of a patient’s health status, such as
healthy or diseased [27].
We used the Naive Bayes classifier because it has been used in a wide range of research. For example, a paper by
Guoliang Ou et al. states that the Naive Bayesian classifier (NBC) has been used in numerous domains and that its
main advantages are its simple model structure, which makes it easy to implement, and its good theoretical
interpretability [28]. This simplicity and ease of implementation are also why we chose this classifier for our
research.
After training the models and predicting on the test data, we evaluate the classification results. In this step, we
obtain the accuracy and the f1-score for our dataset and print a report showing both values. The f1-score is
calculated using the equation:

\[
F1\text{-}Score = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{1}
\]
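
As a small worked check of the f1-score formula, the sketch below implements it directly and applies it to the precision and recall reported in Table 3 for the negative class of Logistic Regression. The function name and the zero-division guard are assumptions of this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision 0.93 and recall 0.64: negative class, Logistic Regression (Table 3).
value = f1_score(0.93, 0.64)
print(round(value, 2))  # 0.76
```

Because the inputs are already rounded to two decimals, recomputing other rows of the table this way can differ from the reported f1-score in the last digit.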

3.3. Results Analysis

After obtaining the accuracy and f1-score of each method, we compare and analyze the results. We then pick the
model with the highest accuracy for further analysis, such as the confusion matrix. Moreover, we summarize the
performance results in a table that shows the precision, recall, f1-score, and support value for each sentiment.
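
The per-sentiment quantities mentioned above can all be derived from the confusion counts. The following sketch shows one way to do this in plain Python; the labels `neg` and `pos`, the six example predictions, and the function names are hypothetical and are not taken from our experiment.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

def per_class_report(y_true, y_pred, label):
    """Precision, recall, f1-score, and support for one class label."""
    counts = confusion_counts(y_true, y_pred)
    tp = counts[(label, label)]
    fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
    fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}

# Hypothetical true and predicted labels for six test tweets.
y_true = ["neg", "neg", "neg", "pos", "pos", "pos"]
y_pred = ["neg", "neg", "pos", "pos", "pos", "neg"]
report = per_class_report(y_true, y_pred, "pos")
print(report)
```

Here support is simply the number of test tweets whose true label is the given class, which is why it does not depend on the model's predictions.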

Fig. 1. The methodology of this research

4. Result and Discussion

The dataset contains a total of 133,227 words. Fig. 2a shows how many tweets we collected for each sentiment:
about 690 tweets categorized as positive and 510 tweets categorized as negative. Next, we performed the
classification step with three different methods.
The results of each method are shown in Table 2. The SVM model achieved the highest accuracy of the three
models. Logistic regression is the lowest, with an accuracy of 82.50%, followed by the Naive Bayes method with
82.91%. We therefore used the results of the SVM model for further analysis, such as the confusion matrix.
The per-sentiment results of the experiment using Logistic Regression, SVM, and Naïve Bayes are shown in Table 3.
The performance of SVM is relatively high for both sentiments. For the SVM model, we also provide a confusion
matrix, shown in Fig. 3, to expose the errors made by the model. Based on the values shown in Fig. 3, we consider
the result of the experiment satisfactory.
Table 2. Table of Three Different Methods Results

Feature | Logistic Regression | SVM | Naïve Bayes
TF-IDF | 82.50% | 84.5% | 82.91%




(a) Number of tweets based on its sentiment (b) Word cloud representation from the dataset

Fig. 2. Data Exploratory

The word cloud in Fig. 2b shows which words were used most often in the dataset. Several stopwords were excluded
so that unwanted words do not appear in the word cloud. Examples of these stopwords are “traveloka”,
“traveloka eat”, “travelokaeat”, “traveloka health”, “ada”, “di”, “ini”, “aku”, “yg”, “yang”, “ga”, “saya”, and many
other words that we acquired from the NLTK library. We removed these words because we judged that they would
not be helpful for the analysis process.
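A word cloud is driven by word frequencies computed after the stopwords have been excluded. The sketch below illustrates that counting step on the two Traveloka Eats tweets quoted in Section 3.1; the single-word stopword set is a shortened, assumed version of the list above, and the function name `word_frequencies` is hypothetical. Drawing the cloud itself is left out.

```python
from collections import Counter

# Shortened stopword set (assumption), based on the examples listed above.
STOPWORDS = {"traveloka", "ada", "di", "ini", "aku", "yg", "yang", "ga", "saya"}

tweets = [
    "Diskon traveloka eats manteepppp",
    "traveloka eat promonya makin ga menarik, auto uninstall deh.",
]

def word_frequencies(texts, stopwords):
    """Count lowercase tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            token = token.strip(".,!?")  # trim surrounding punctuation
            if token and token not in stopwords:
                counts[token] += 1
    return counts

freqs = word_frequencies(tweets, STOPWORDS)
print(freqs.most_common(3))
```

Multi-word stopwords such as “traveloka eat” would need phrase matching before tokenization, which this sketch does not attempt.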
We have learned that Traveloka is not a topic often used for sentiment analysis: despite its rising popularity,
only a few researchers have conducted a sentiment analysis about Traveloka. The dataset we created and used in
this experiment can be accessed through one of the authors’ GitHub links, given in footnote 1. Given the accuracy
scores obtained in our experiment, we conclude that the results are good enough to meet our expectations.

Table 3. Table of Results

Method | Sentiment | Precision | Recall | F1 Score | Support
Logistic Regression | Negative | 0.93 | 0.64 | 0.76 | 102
Logistic Regression | Positive | 0.78 | 0.96 | 0.86 | 138
SVM | Negative | 0.86 | 0.78 | 0.82 | 98
SVM | Positive | 0.86 | 0.92 | 0.88 | 142
Naïve Bayes | Negative | 0.81 | 0.77 | 0.79 | 102
Naïve Bayes | Positive | 0.84 | 0.87 | 0.85 | 138


Fig. 3. Confusion matrix based on the SVM model

5. Conclusion and Future Works

This study examines public opinion on the Traveloka application based on data collected from Twitter. Of the
1,200 tweets collected, 690 are positive and 510 are negative; our classification method achieves relatively high
scores on both classes, with higher scores on positive tweets than on negative tweets. We also use a word cloud to
find which vocabulary or keywords are frequently used in the dataset to describe Traveloka’s performance and user
satisfaction. The dataset shows that Traveloka gets positive feedback on the promotions, campaigns, and discounts
it provides to users. For further research, we would apply different topics and methods, such as other algorithms,
to obtain more accurate results in assessing public sentiment. Furthermore, this research gives us a basis for
conducting similar research in the future with better results and methodology.


