
Proceedings of the Fifth International Conference on Inventive Computation Technologies (ICICT-2020)

IEEE Xplore Part Number:CFP20F70-ART; ISBN:978-1-7281-4685-0

Comparison Study of Sentiment Analysis of Tweets using Various Machine Learning Algorithms

Suvarna G Kanakaraddi
School of Computer Science and Engineering
KLE Technological University
Hubli, Karnataka, India
Suvarna_gk@kletech.ac.in

Ashok K Chikaraddi
Dept. of MCA
KLE Technological University
Hubli, Karnataka, India
chikaraddi@kletech.ac.in

Karuna C. Gull
Computer Science and Engineering
KLE Institute of Technology
Hubli, Karnataka, India
karunagull74@gmail.com

P S Hiremath
Dept. of MCA
KLE Technological University
Hubli, Karnataka, India
pshiremath@kletech.ac.in

Abstract: Advances in web technology have made enormous amounts of data available to internet users, and a great deal of new data is created continuously. Social networking sites such as Twitter, Facebook and Google+ have become platforms for sharing and exchanging knowledge and for discussing and expressing views on different themes with diverse groups. The proposed system focuses on sentiment analysis of Twitter data. The opinions expressed in tweets are highly unstructured and varied, and may be positive or negative; emotions are also taken into account in the sentiment analysis. The proposed system provides a performance analysis of machine learning algorithms such as Support Vector Machine, Naive Bayes, Max Entropy, LSTM, CNN and Random Forest. Among all these techniques, SVM provides an accuracy of 79.90%.

Keywords: Support Vector Machine (SVM), Long Short-Term Memory (LSTM), Convolution Neural Networks (CNN), Application Program Interface (API), Natural Language Processing (NLP)

I. INTRODUCTION
Sentiment is an attitude, thought, or judgment prompted by feeling. Sentiment analysis studies people's sentiments towards particular entities. The web is a rich source of such sentiment data. Each individual can post views and hold discussions through different internet-based channels such as forums, micro-blogs, or online social networking sites. From a researcher's perspective, many social media sites release their application programming interfaces (APIs), encouraging data collection and analysis by researchers and developers. For example, Twitter currently offers three different versions of its APIs [1], namely the Streaming API, the REST API and the Search API. The Search API allows developers to query specific Twitter content. With the REST API, developers can gather status data and user information, and the Streaming API can gather Twitter data continuously. Developers can also combine these APIs to build their own applications. Hence, sentiment analysis has a solid foundation, supported by massive amounts of online data [2].

It has also been noted [3] that social networking sites serve as the backbone for connecting with associates and sharing messages. Messages may take the form of content in posts, shares and comments. Extracting and analysing this content is challenging because social media data is dynamic and its format varies from site to site. That work gives the sequence of steps for extracting content from a Twitter account, which helps in developing better applications that use this dynamic data.

This research discusses the development of various machine learning algorithms to classify the tweets of users of different cell phone companies. Data sets for Google, Samsung, Mi, Apple and other cell phone companies are considered, and the Twitter API is used to collect the tweets of the users of these cell phones.

Recent years have seen rapid growth in the use of social networking resources and microblogging sites such as Twitter, Facebook and YouTube. These rich resources have become a source of marketing information for many organizations and associations. Most organizations used to conduct meetings and surveys to gather responses and to assess the quality of their products. Such conventional techniques were time-intensive and costly, and did not return the outcomes the organizations were looking for because of environmental factors and poorly structured studies. Hence, valuable feedback on products and services is today obtained through marketing strategies that rely on sentiment analysis and natural language processing. A huge amount of customer sentiment data is uploaded to the internet each day; this data is largely unstructured and it is troublesome

978-1-7281-4685-0/20/$31.00 ©2020 IEEE 287

Authorized licensed use limited to: UNIVERSITY OF BIRMINGHAM. Downloaded on June 13,2020 at 13:28:05 UTC from IEEE Xplore. Restrictions apply.

for computers to extract its meaning. In the past it was impractical to process large volumes of ill-structured data; now, with computational power following the prediction of Moore's law (Moore, 1965) and distributed systems of computers using frameworks such as Hadoop, enormous data files can be processed with greater ease. Considerable effort is going into this field of research, for example Google's ongoing acquisition of deep learning technology and IBM's rigorous research into its natural language processing supercomputer, Watson. Hence, with growing interest and research in this field, machines will gain a better understanding of text content, which will greatly enhance data analytics and search engines.

II. LITERATURE SURVEY
The authors of [4] developed a framework for understanding the sentiments of bikers from Twitter data. Hashtag data covering eight years was collected. The research provides techniques for text extraction and opinion analysis on unstructured text data. The observations show that the biking data is related to seasonal and weather changes. Sentiments towards biking were generally positive, while negative opinions were related to crime, bad weather and other circumstances. In recent years the dichotomy score shows more positivity than in previous years. This framework helps planners and decision-makers concerned with biking.

The authors of [5] carried out work on the k-means algorithm. The algorithm works well if the initial seed is chosen well. The assignment of objects to clusters is checked against the requirements and the maximum E/I value. If a better initial seed is chosen at the start, a better result with a higher E/I ratio and appropriate clusters is obtained.

The research in [6] focuses on developing a model that analyzes the contents of a microblog and the customer feedback in it. This application helps an organization understand customers' perceptions of a product. The authors used fuzzy logic to convert linguistic variables into membership functions that describe the variables in a fuzzy system. Every membership function represents a linguistic variable and a fuzzy set related to a specific parameter. If-then rules are designed to describe the interaction of the fuzzy sets. The outputs of all the rules are combined into a final fuzzy set, which produces a fuzzy value that is converted into a crisp value using a suitable method.

The authors of [7] built a technique to detect and summarize overall sentiment. Their methodology works with sentiments in Twitter data contextually. They concentrated on sentiment analysis as the main feature and carried it out with Natural Language Processing (NLP). They performed fine-grained pre-processing on data extracted from Twitter using the Twitter API, and used a Part-Of-Speech (POS) tagger, SentiWordNet, WordNet and NLP to assign weights to the words in the tweets. The modified data was analysed using various algorithms available in the WEKA tool, and the results with and without the lexical/NLP method were compared.

The authors of [8] proposed a process to determine the sentiment of a tweet and categorize tweets as positive or negative. An ensemble classifier is proposed in which a single classifier is formed by combining base learning classifiers. The objective of that research is to improve the performance and accuracy of sentiment classification. Data pre-processing and feature representation in sentiment classification are also examined. The model is developed in Python. Scikit-learn is used for classification, similarity measures and evaluation. The Natural Language Toolkit (NLTK) is used to pre-process the data by stemming and removing stop words. Pandas is used to handle the data set, and NumPy to handle multidimensional arrays.

According to the authors of [9], opinions can be collected from movie review sites, e-commerce sites and social media sites. These opinions are useful in suggesting how to launch a particular product or how to improve a service. Some sites statistically analyse the huge amount of data available to them to produce statistical reports, and some try to give users a visual insight into what people feel about a certain product or other items. The authors detail the analysis of the classification algorithm and discuss various validation methods. They worked with real samples and with different thresholds that limit the feature sets. The results, shown in graphical form, reveal the performance of the classifier used. They note that future opinion mining systems need better insight into natural language opinions in order to fill the gap that has arisen.

The authors of [10] proposed a novel approach using lexicons, based on the semantic similarity between text terms and the lexicon vocabulary. The proposal considers a sentiment analysis model that uses lexicon-based semantic similarity as a feature together with an embedding-based representation. The experiment is conducted on seven public data sets and four sentiment lexicons. The feature extraction is evaluated through several statistical methods. The main goal of that research is to integrate semantic feature extraction with embedding-based representation.

The research in [1] uses a nine-year data set of microblog messages. The relationship between the extracted sentiments and the global financial market is analysed. User subgroups are considered for the sentiment metrics. The sentiment measures captured financial growth in the main economic areas.

The authors of [11] used the Twitter accounts of XL Axiata (@XL123), Telkomsel (@telkomsel) and Indosat (@indosatmania), extracted opinions and processed the sentiments with three classifier algorithms: SVM, Naive Bayes and Decision Tree. They designed a real-time dashboard that summarizes the keywords people discuss most.

The research in [12] proposes an algorithm based on emotion score learning for sentiment classification. The algorithm is scalable as it does not require manual scoring. The experiment is conducted on 1000 tweets. The proposed algorithm classified the sentences as positive or negative.


In [13], the interests of target users are analyzed for a particular brand, and the categories of interest in customers' activities on social networking sites such as Facebook are considered. A fuzzy clustering approach is proposed and evaluated on real-time samples.

The authors of [14] [15] proposed methods to predict the stock market of different companies using sentiment analysis. SVM and linear regression models were also developed for the prediction.

III. PROPOSED SYSTEM
The proposed system uses the Twitter API to download tweets about Google, Samsung, Mi, Apple and other brands from Twitter. Real-time Twitter data is collected and pre-processed. One of the pre-processing steps replaces special elements (@ mentions, hashtags, :-) and :-() with USER MENTION, HASHTAG, EMO POS and EMO NEG respectively. The required features are then extracted and sent to the various machine learning algorithms (the classifiers) for training. The data set, now arranged in the proper format, is tested with the help of test data. The final step is predicting the sentiment as the output.

Fig. 1. Proposed System

To accomplish the objective of the proposed work, the system follows the architecture model shown in Fig. 2. The architecture represents the principal components of the system. Each component contributes to the working of the system, and together they provide the functionality of the framework. System design can be viewed as the application of systems theory to product development. It describes how the system functions and its overall architecture.

Fig. 2. Physical view of System

The architecture diagram shows the flow of methods followed from the raw Twitter data set to the predicted value. The Twitter data set is obtained with the Twitter API and separated into training and testing data sets. The training data set is pre-processed, and the required features are extracted and used to train the various machine learning classifiers. If the output is 0, the tweet is negative; if the output is 1, the tweet is positive. The data set in the proper format is then tested with the test data, which is the final step in predicting the sentiment as the output.

The flow diagram in Fig. 3 maps out the flow of information through the system.

Fig. 3. Flow diagram

The pre-processed data, collected from the real-time Twitter data set using the Twitter API, is sent to the following algorithms: Naive Bayes, CNN, SVM, LSTM, Decision Tree, XGBoost and Random Forest.

The output of these algorithms predicts which cell phone has more positive tweets and which has more negative tweets. Model inspection is then done to verify the results, and the majority vote for each cell phone is observed. The final output shows which cell phone has the highest positive and the highest negative sentiment among the other 5 cell phone brands.

IV. DEVELOPMENT
This section discusses the development of the software. The methodology examines each procedure shown in Fig. 3.

Data Set: In this work, the data set is in the form of comma-separated values (CSV) files of tweets. The training data set is a CSV file containing the tweet id, the tweet and the tweet hashtag. The tweet id is a unique integer that identifies the tweet. The tweets are a mixture of words, symbols, emoticons and URLs. Words contribute to predicting the sentiment. The training data set consists of 6000 tweets and the testing data set of 2000 tweets.

Pre-Processing: Raw tweets posted on social media are noisy, as is the nature of people's


usage. These tweets have varied characteristics such as retweets, emoticons etc., which have to be extracted. To create usable data, the Twitter data has to be normalized.

In this work, various processing methods are used to standardize the data set. Initially, general pre-processing is applied to the tweets as follows:

i. Convert the tweet to lowercase
ii. Replace two or more dots (.) with a space
iii. Strip spaces and quotes (" and ') from the ends of the tweet
iv. Replace two or more spaces with a single space

The special Twitter features are handled as follows:

Emoticon – Users use numerous emoticons in different ways, and it is not possible to match all of them because their number on social media keeps growing. However, some of the most frequently used emoticons are matched. Matched emoticons are replaced with either EMO POS or EMO NEG depending on whether they convey a positive or negative emotion.

Hashtag – Hashtags are unspaced phrases prefixed by the hash symbol (#), frequently used to mention a trending topic on Twitter. Each hashtag is replaced with the word without the hash symbol; for example, #hello is replaced by hello. The regular expression used to match hashtags is #(\S+).

Classifiers: This subsection discusses the classifiers used in this work: Naive Bayes, SVM, CNN, Decision Tree and Random Forest.

A. Naive Bayes
It is a simple model that can be used for text classification. Here, the class ĉ is assigned to a tweet t. The algorithm learns the probability that an object with certain features belongs to a particular group or class. It is called "naive" because it assumes that the occurrence of a feature is independent of the occurrence of the other features. It provides a method to compute conditional probability, i.e., the probability of an event based on prior knowledge about related events.

Bayes' theorem is stated as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

The components of this equation are:
• $P(A \mid B)$ – conditional probability of occurrence of event A given that event B is true.
• $P(A)$ and $P(B)$ – the probabilities of occurrence of events A and B respectively.
• $P(B \mid A)$ – probability of occurrence of event B given that event A is true.

B. SVM
SVM stands for Support Vector Machine. It is a non-probabilistic binary linear classifier. For a training set of points $(x_i, y_i)$, where $x_i$ is a feature vector and $y_i$ is the class, the aim is to determine the maximum-margin hyperplane that divides the points with $y_i = 1$ from those with $y_i = -1$. The equation of the hyperplane is

$$w \cdot x - b = 0$$

For a data set consisting of a feature set and a label set, an SVM classifier builds a model to predict the classes of new examples. It assigns each new example or data point to one of the classes.

Algorithm:
i. Define an optimal hyperplane
ii. Extend step i to nonlinearly separable problems
iii. Map the data to a high-dimensional space where it is easy to classify with linear decision surfaces.

C. Convolution Neural Networks
Convolution Neural Networks (CNNs) are neural networks that contain layers known as convolution layers, which can interpret spatial data. They work particularly well for large data sets and require little pre-processing since they learn the features. A convolution layer comprises many filters, or kernels, that help it learn to extract particular kinds of features from the data. A kernel is a 2D window that slides over the input data performing the convolution operation. This research uses temporal convolution, which suits the analysis of sequential data like tweets.

For a random input and two hidden layers, the forward pass with a sigmoid activation function is computed as follows:

    import numpy as np
    # Layer sizes are assumed for illustration (3 inputs, 4 units per hidden layer, 1 output).
    W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
    W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
    W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)
    activation = lambda x: 1.0/(1.0 + np.exp(-x))   # sigmoid function
    input = np.random.randn(3, 1)                   # random input vector (3x1)
    hidden1 = activation(np.dot(W1, input) + b1)    # first hidden layer
    hidden2 = activation(np.dot(W2, hidden1) + b2)  # second hidden layer
    output = np.dot(W3, hidden2) + b3               # output layer

Here W1, W2, W3, b1, b2, b3 are the learnable parameters of the model.

D. Decision Tree
A decision tree is a classifier model in which every node represents a test on an attribute of the data set and its children represent the outcomes. Decision trees belong to the family of supervised learning algorithms. Unlike some other supervised learning algorithms, the decision tree algorithm can be used for solving both classification and regression problems. The final classes of the data points are represented by the leaf nodes. This supervised method uses data with known labels to create the decision tree, and the model is then applied to the test data. The best test condition or decision has to be chosen for every node in the tree.

Decision Tree Algorithm
i. Place the best attribute of the data set at the root of the tree.
ii. Divide the training set into subsets such that every subset comprises data with the same value for an attribute.
iii. Repeat steps i and ii on every subset until leaf nodes are found in all the branches of the tree.
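The decision tree steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how the "best" attribute is chosen, so information gain is assumed here, and the function names (`entropy`, `best_attribute`, `build_tree`, `predict`) and the toy features `has_good` and `has_bad` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute with the largest information gain (assumed criterion)."""
    base = entropy(labels)

    def gain(attr):
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Choose a root attribute, split into subsets, and recurse until leaves."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    node = {"attribute": attr, "children": {}}
    for value in {row[attr] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node

def predict(tree, row, default=0):
    """Walk from the root to a leaf; unseen values fall back to `default`."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attribute"]], default)
    return tree

# Toy data: presence (1) or absence (0) of two hypothetical tokens per tweet.
ROWS = [
    {"has_good": 1, "has_bad": 0},
    {"has_good": 1, "has_bad": 1},
    {"has_good": 0, "has_bad": 1},
    {"has_good": 0, "has_bad": 0},
]
LABELS = [1, 0, 0, 1]  # 1 = positive tweet, 0 = negative tweet
TREE = build_tree(ROWS, LABELS, ["has_good", "has_bad"])
```

On this toy data the attribute `has_bad` separates the two classes completely, so it ends up at the root and each branch is already a leaf.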

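Returning to the pre-processing rules given at the start of this section, the following is a minimal sketch using Python's standard `re` module. Several details are assumptions rather than the authors' implementation: the function name `preprocess_tweet` is hypothetical, the placeholder tokens are written with underscores (`USER_MENTION`, `EMO_POS`, `EMO_NEG`), only the `:)`, `:-)`, `:(` and `:-(` emoticons are matched, and hashtags are reduced to the bare word as described in the Hashtag rule.

```python
import re

def preprocess_tweet(tweet: str) -> str:
    """Normalise one raw tweet following the rules listed in the text."""
    tweet = tweet.lower()                              # i. lowercase
    tweet = re.sub(r"@\S+", " USER_MENTION ", tweet)   # user mentions
    tweet = re.sub(r"#(\S+)", r" \1 ", tweet)          # hashtag -> bare word
    tweet = re.sub(r":-?\)", " EMO_POS ", tweet)       # :) and :-)
    tweet = re.sub(r":-?\(", " EMO_NEG ", tweet)       # :( and :-(
    tweet = re.sub(r"\.{2,}", " ", tweet)              # ii. two or more dots
    tweet = tweet.strip(" \"'")                        # iii. trim spaces/quotes
    tweet = re.sub(r"\s+", " ", tweet)                 # iv. collapse spaces
    return tweet.strip()
```

Lowercasing is done first so that the upper-case placeholder tokens inserted afterwards stay distinguishable from ordinary words.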

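The Bayes' theorem formulation in the Naive Bayes subsection can be turned into a small text classifier. The sketch below is an assumption-laden illustration, not the authors' code: it uses word counts per class, works in log-probabilities, and applies add-one (Laplace) smoothing, which the paper does not mention. The names `train_naive_bayes` and `classify` and the toy tweets are hypothetical.

```python
from collections import Counter
from math import log

def train_naive_bayes(tweets, labels):
    """Estimate log-priors and per-class word counts from whitespace-tokenised tweets."""
    classes = sorted(set(labels))
    priors = {c: log(labels.count(c) / len(labels)) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for tweet, label in zip(tweets, labels):
        word_counts[label].update(tweet.split())
    vocab = {w for c in classes for w in word_counts[c]}
    return priors, word_counts, vocab

def classify(tweet, priors, word_counts, vocab):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = prior
        for word in tweet.split():
            if word in vocab:  # ignore words never seen in training
                # Add-one smoothing: an assumption, not taken from the paper.
                score += log((word_counts[c][word] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

TWEETS = ["great phone", "love this phone", "terrible battery", "bad phone"]
LABELS = [1, 1, 0, 0]  # 1 = positive, 0 = negative
MODEL = train_naive_bayes(TWEETS, LABELS)
```

Each word contributes independently to the score, which is exactly the "naive" independence assumption described in the text.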
E. Random Forest
Random forest is a supervised classification algorithm. As the name suggests, it creates a forest from a number of trees. Just as a forest looks more robust the more trees it has, a random forest classifier generally gives more accurate results with a larger number of trees. We implemented the random forest algorithm using RandomForestClassifier from sklearn.ensemble, provided by scikit-learn. The algorithm was run with 10 estimators (trees) using both presence and frequency features. Presence features performed better than frequency features, though the improvement was not substantial. The decision tree method can be related to a rule-based system: given a training data file with targets and features, the decision tree algorithm produces a set of rules, and the same set of rules can be used to make predictions on the test data set.

V. RESULT ANALYSIS
This section discusses the results obtained by the various classifiers. Table I shows the accuracy obtained by the different classifiers.

TABLE I. CLASSIFIER ACCURACY

Algorithm        Accuracy
SVM              79.90%
Naive Bayes      70.58%
Decision Tree    75.98%
Random Forest    72.05%

Fig. 4. Bigram word count

Fig. 4 depicts the bigram word count of the tweets. It shows the maximum and minimum word counts from the bigram feature extraction.

Fig. 5. Accuracy of Decision tree is 75.98%

The accuracy of the decision tree is shown in Fig. 5. The accuracy obtained by this algorithm is 75.98%.

Fig. 6. Accuracy of Naive Bayes is 70.58%

Fig. 6 shows the accuracy of Naive Bayes. The accuracy obtained for this classifier is 70.58%.

Fig. 7. Accuracy of Random Forest is 72.05%

The accuracy of the random forest is shown in Fig. 7. The accuracy of this algorithm is 72.05%.

Fig. 8. Accuracy of SVM 79.90%

Fig. 8 shows the accuracy of the SVM classifier; the accuracy obtained by this algorithm is 79.90%. Fig. 9 shows the word count result and Fig. 10 shows the comparison of all the classifiers. Among all the classifiers, the SVM classifier provides the best accuracy.
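The unigram and bigram word counts, and the presence versus frequency features mentioned for the classifiers, can be illustrated with a short standard-library sketch. The helper names (`ngrams`, `build_vocabulary`, `featurize`) and the `top_k` cut-off are hypothetical; the paper does not describe how its vocabulary was selected.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocabulary(tweets, n, top_k):
    """Keep the top_k most frequent n-grams across all tweets."""
    counts = Counter(g for tweet in tweets for g in ngrams(tweet.split(), n))
    return [g for g, _ in counts.most_common(top_k)]

def featurize(tweet, vocabulary, n, mode="presence"):
    """Encode a tweet as a vector over the vocabulary.

    mode="presence"  -> 1 if the n-gram occurs at all, else 0
    mode="frequency" -> number of times the n-gram occurs
    """
    counts = Counter(ngrams(tweet.split(), n))
    if mode == "presence":
        return [1 if counts[g] else 0 for g in vocabulary]
    return [counts[g] for g in vocabulary]

VOCAB = [("good",), ("phone",), ("bad",)]
PRESENCE = featurize("good phone good camera", VOCAB, 1, "presence")
FREQUENCY = featurize("good phone good camera", VOCAB, 1, "frequency")
```

Here the two encodings differ only for the repeated word: presence records that "good" occurs, while frequency records that it occurs twice.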


Fig. 9. Unigram Word count

Fig. 10. Comparison of Classifiers

VI. CONCLUSION
This work considers tweets containing a mixture of words, emoticons, URLs, hashtags, user mentions and symbols. In this research, machine learning algorithms such as Naive Bayes, Maximum Entropy, Decision Tree, Random Forest, XGBoost, SVM, Multi-Layer Perceptron, Recurrent Neural Networks and Convolutional Neural Networks were developed to classify the polarity of tweets. Two types of features, unigrams and bigrams, are used for classification, and the observations show that augmenting the feature vector with bigrams improved the accuracy. Once the features have been extracted, they are represented as either a sparse vector or a dense vector. It has been observed that presence in the sparse vector representation gave better performance than frequency. Neural network methods performed better than the other classifiers in general. The LSTM model produced better classification results. The Naive Bayes algorithm obtained an accuracy of 70.58%. An ensemble method achieved an accuracy of 79.90% compared to the other classifiers.

Handling emotion ranges: The models can be improved and trained to handle a range of sentiments. Tweets do not always have a positive or negative sentiment; at times they may have no sentiment, i.e. they are neutral. Sentiment can also have gradations: the sentence "This is good" is positive, but the sentence "This is extraordinary" is somewhat more positive than the first. So sentiment can be classified in ranges, say from -2 to +2. Further, this research can be extended to other real-time data sets, and the performance of the classification can be improved by applying deep learning algorithms.

References
[1] Axel Groß-Klußmann, Stephan König, Markus Ebner, "Buzzwords build momentum: Global financial Twitter sentiment and the aggregate stock market", Expert Systems With Applications 136 (2019) 171–186.
[2] Marcus Fontoura, Alexander Shraer, Maxim Gurevich and Vanja Josifovski, "Top-k Publish-Subscribe for Social Annotation of News", Analysis of Twitter Data, 6(6):385–396, 26th August 2013.
[3] Narashima S. Purohit, Meghana Bhat, Akshata B. Angadi, Karuna C. Gull, "Crawling through Web to Extract the Data from Social Networking Site - Twitter", in National Conference on Parallel Computing Technologies (PARCOMPUTECH-2015), CDAC in association with National Knowledge Network, IEEE and CSI Bangalore chapter, Bangalore, 19-20 February, 2015.
[4] Subasish Das, Anandi Dutta, Gabriella Medina, Lisa Minjares-Kyle, Zachary Elgart, "Extracting patterns from Twitter to promote biking", IATSS Research 43 (2019) 51–59.
[5] Karuna Gull, Akshata Angadi (2018), "A Methodical Study about Behaviour of Different seeds on varying Distance Measures using an Iterative Technique with Evaluation of Cluster validity", in the proceedings of CSI-2015, 50th Golden Jubilee Annual Convention On Digital Life, Springer Nature Singapore Pte Ltd. 2018, ICT Based Innovations, Advances in Intelligent Systems and Computing (AISC), https://doi.org/10.1007/978-981-10-6602-3_7, pp. 63-74.
[6] Karen Howells, Ahmet Ertugan, "Applying fuzzy logic for sentiment analysis of social media network data in marketing", Procedia Computer Science 120 (2017) 664–670.
[7] Karuna Gull, Sudip Padhye, Dr. Sandeep Sharma, Dr. Subodh Jain (2017), "A Comparative Analysis of Lexical/NLP Method with WEKA's Bayes Classifier", International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), Volume: 5 Issue: 2, February 2017, pp. 221–227, ISSN: 2321-8169. Available: https://ijritcc.org/index.php/ijritcc/article/view/203/203
[8] Ankit, Nabizath Saleena, "An Ensemble Classification System for Twitter Sentiment Analysis", Procedia Computer Science 132 (2018) 937–946.
[9] Karuna Gull, Akshata B. Angadi (2016), "Text Mining Predictive Modeling Algorithm for classifying Attitudes of customers with Accuracy Estimation", in the proceedings of Second International Conference on Information and Communication Technology for Competitive Strategies (ICTCS-2016), Conference Proceedings by ACM – ICPS, ISBN No: 978-1-4503-3962-9. DOI:10.1145/2905055.2905268. Available: http://dx.doi.org/10.1145/2905055.2905268
[10] Oscar Araque, Ganggao Zhu, Carlos A. Iglesias, "A semantic similarity-based perspective of affect lexicons for sentiment analysis", Knowledge-Based Systems 165 (2019) 346–359.
[11] Nur Azizah Vidya, Mohamad Ivan Fanany, Indra Budi, "Twitter Sentiment to Analyze Net Brand Reputation of Mobile Phone Providers", Procedia Computer Science 72 (2015) 519–526.
[12] Shivani Bahria, Pranav Bahria, Sangeeta Lal, "A Novel approach of Sentiment Classification using Emoticons", (International Conference on IDS 2018), Procedia Computer Science 132 (2018) 669–678.
[13] Karuna C. Gull, Seema C. G, Akshata B. Angadi, Suvarna G. Kanakaraddi, "A Clustering Technique To Rise Up The Marketing Tactics By Looking Out The Key Users", 978-1-4799-2572-8/14/$31.00 ©2014 IEEE.
[14] B. L Poja, Suvarna Kanakaraddi, Meenaxi M Raikar, "Sentiment based stock market prediction", International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), 2018.
[15] Sudip Padhye, Karuna Gull (2016), "Regression analysis for Stock Market Prediction using Weka Tool without Sentiment Analysis", in the proceedings of Sixth International Conference on Computational Intelligence and Information Technology, CIIT-2016, Emerging Technologies in Engineering conference proceedings by McGraw-Hill Education (India) Private Limited, pp. 78-87. ISBN No: 978-93-5260-435-7.
