10 1109@icict48043 2020 9112546
Abstract: Today's advances in web technology have made an enormous amount of data available to internet users, and large volumes of new data are created continuously. Social networking sites such as Twitter, Facebook and Google+ have become platforms for sharing and exchanging knowledge, and for discussing and expressing views on different themes with diverse groups. The proposed system focuses on opinion analysis of Twitter data, where the ideas expressed in tweets are highly unstructured, varied, and in some instances positive or negative; emotions are also taken into account for the sentiment analysis. The proposed system provides a performance analysis of machine learning algorithms such as Support Vector Machine, Naive Bayes, Max Entropy, LSTM, CNN and Random Forest. Among these techniques, SVM provides an accuracy of 79.90%.

Keywords: SVM (Support Vector Machine), Long Short-Term Memory (LSTM), Convolution Neural Networks (CNN), Application Program Interface (API), Natural Language Processing (NLP)

I. INTRODUCTION
Sentiment is an attitude, thought, or judgment prompted by feeling. Sentiment analysis studies individuals' sentiments towards particular entities. The web is a rich source of such sentiment data. Any user can post views and join discussions through internet-based channels such as forums, microblogs, or online social networking sites. From a researcher's viewpoint, many social media sites release their application programming interfaces (APIs), prompting data collection and analysis by researchers as well as developers. For example, Twitter currently offers three different APIs [1], namely the Streaming API, the REST API and the Search API. The Search API allows developers to query specific Twitter data. With the REST API, developers can collect status data and user information, and the Streaming API can gather Twitter data continuously. Developers can also combine these APIs to build their own applications. Hence, sentiment analysis appears to have a solid foundation, backed by massive online data [2].

It was also noted in [3] that social networking sites serve as the backbone for providing access to contacts and sharing messages. Messages may take the form of text or other content in posts, shares and comments. Extracting and analysing this content is challenging in social media because the data is dynamic and its format varies from site to site. That work gives the sequence of steps for extracting content from a Twitter account and helps in coming up with new ideas for developing better applications that use dynamic data.

This research discusses the development of various machine learning algorithms to classify the tweets of users of different cell phone companies. Data sets for Google, Samsung, Mi, Apple and other cell phone companies are considered, and the Twitter API is used to collect the tweets of the users of these cell phones.

Recent years have seen rapid growth in the use of online networking resources and microblogging sites such as Twitter, Facebook and YouTube. These rich resources have become a source of marketing information for many organizations and associations. Most organizations used to conduct meetings and surveys to gather responses and to learn about the quality of their products. But such conventional techniques were time-intensive and costly, and they did not return the outcomes the organizations were looking for because of environmental issues and ill-structured studies. Hence, valuable feedback on products and services is today obtained through marketing strategies that rely on sentiment analysis and natural language processing. A huge amount of customer sentiment data is uploaded to the internet each day; this data is mostly unstructured content, and it is troublesome
Authorized licensed use limited to: UNIVERSITY OF BIRMINGHAM. Downloaded on June 13,2020 at 13:28:05 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the Fifth International Conference on Inventive Computation Technologies (ICICT-2020)
IEEE Xplore Part Number:CFP20F70-ART; ISBN:978-1-7281-4685-0
for computers to extract its meaning. In the past it was impractical to make sense of large amounts of ill-formed data; however, with the computational power predicted by Moore's law (Moore, 1965) and distributed systems of computers using frameworks such as Hadoop, enormous data sets can now be processed with much greater ease. There is considerable activity in this field of research, for example Google's ongoing acquisition of deep learning technology and IBM's intensive research into its natural language processing supercomputer, Watson. With growing interest and research in this field, machines will gain a better understanding of content, which will greatly enhance data analytics and search engines.

II. LITERATURE SURVEY
The authors of [4] developed a framework for understanding the sentiments of bikers from Twitter data. Hashtags covering eight years of data were collected. The work provides techniques for performing text extraction and opinion analysis on unstructured text data. The observations show that biking activity is related to seasonal and weather changes. Sentiments on biking were generally positive, while negative opinions were related to crime, bad weather and other circumstances. In recent years the dichotomy score shows more positivity than in previous years. The framework helps planners and decision-makers concerned with biking.

The authors of [5] carried out work on the k-means algorithm. The algorithm works well if the initial seeds are chosen well. The assignment of objects to clusters is checked against the requirements and the maximum E/I value. If better initial seeds are chosen at the start, excellent results with a higher E/I ratio and appropriate cluster structures should be obtained.

The research in [6] focuses on developing a model that analyses the contents of a microblog and the customer feedback in it. The application helps an organization learn customers' perceptions of a product. The authors used fuzzy logic to convert linguistic variables into membership functions that describe the variables in a fuzzy system. Each membership function represents a linguistic variable and a fuzzy set related to a specific parameter. If-then rules are designed to describe the interaction of the fuzzy sets. The outputs of all the rules are combined into a final fuzzy set, which produces a fuzzy value that is converted into a crisp value using a suitable approach.

According to the authors of [7], they built a technique to detect and summarize overall sentiment. The suggested methodology works with sentiments in Twitter data contextually. They concentrated on sentiment analysis as the major feature and carried it out with Natural Language Processing (NLP). They performed fine-grained pre-processing on data extracted through the Twitter API, and used a Part-Of-Speech (POS) tagger, SentiWordNet, WordNet and NLP to assign weights to the words in the tweets. The processed data was analysed using various algorithms available in the WEKA tool, and the results with and without the lexical/NLP method were compared.

The authors of [8] proposed a process to determine the sentiment of a tweet and categorize tweets as positive or negative. An ensemble classifier is proposed in which a single classifier is formed by combining base learning classifiers. The objective is to improve the performance and accuracy of sentiment classification. Data pre-processing and feature representation in sentiment classification are also examined. The model is developed in Python: Scikit-learn is used for classification, similarity measures and evaluation; the Natural Language Toolkit (NLTK) is used in pre-processing to stem words and remove stop words; Pandas is used to handle the data set; and NumPy handles multidimensional arrays.

According to the authors of [9], opinions can be collected from movie review sites, e-commerce sites and social media sites. These opinions usually help by suggesting how to launch a particular product or how to improve a service. Some sites can statistically analyse the large amount of data available to them to produce statistical reports, and some also try to give users a visual insight into what people feel about a product or other items. The authors detail the analysis of the classification algorithm and discuss various validation methods. They worked with real samples and with different thresholds that limit the feature sets. Results shown in graphical form reveal the performance of the classifier used. They note that future opinion mining systems need better insight into natural language opinions in order to fill the gap that has arisen.

The authors of [10] proposed a novel approach that uses lexicons based on the semantic similarity between text terms and the lexicon vocabulary. The proposal considers a sentiment analysis model that uses lexicon-based semantic similarity as a feature together with an embedding-based representation. The experiments are conducted on seven public data sets and four sentiment lexicons. The feature extraction is evaluated with several statistical methods. The main goal of the research is to integrate semantic feature extraction with embedding-based representation.

The research in [1] uses a nine-year data set of microblog messages. The relationship between the extracted sentiments and the global financial market is analysed. User subgroups are considered for the sentiment metrics. The sentiment measures captured financial growth in the main economic areas.

The authors of [11] used the Twitter accounts of XL Axiata (@XL123), Telkomsel (@telkomsel) and Indosat (@indosatmania), extracted opinions and processed the sentiments with three classifier algorithms: SVM, Naive Bayes and Decision Tree. They designed a real-time dashboard on which the keywords that people discuss most are summarized.

The research in [12] proposes an algorithm based on emotion score learning for sentiment classification. The algorithm is scalable because it does not require manual scoring. The experiment is conducted on 1000 tweets, and the proposed algorithm classified the sentences as positive or negative.
III. PROPOSED SYSTEM
The proposed system uses the Twitter API to download tweets related to Google, Samsung, Mi, Apple and many more from Twitter. Real-time Twitter data is collected and pre-processed. One of the pre-processing steps is the replacement of special elements (@ mentions, hashtags, :-) and :-( ) with USER MENTION, HASHTAG, EMO POS and EMO NEG, respectively. The required features are then extracted and sent to various machine learning classifiers for training. The data set is now arranged in the proper format, and the trained classifiers are tested with the help of test data. The final step is predicting the sentiment as the output.

The pre-processed data, collected from the real-time Twitter data set using the Twitter API, is sent to the following algorithms: Naive Bayes, CNN, SVM, LSTM, Decision Tree, XGBoost and Random Forest.
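The special-element replacement described in this section might be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `preprocess_tweet`, the emoticon lists, the regular expressions, and the underscore spelling of the tokens (`USER_MENTION`, `EMO_POS`, `EMO_NEG`) are all assumptions made for the example.

```python
import re

# Hypothetical emoticon lists; the paper does not enumerate the ones it matches.
POSITIVE_EMOTICONS = (":-)", ":)")
NEGATIVE_EMOTICONS = (":-(", ":(")


def preprocess_tweet(tweet: str) -> str:
    """Replace special Twitter elements with placeholder tokens and tidy the text."""
    text = tweet.lower()
    # @mentions and hashtags become fixed tokens (assumed patterns).
    text = re.sub(r"@\w+", "USER_MENTION", text)
    text = re.sub(r"#\w+", "HASHTAG", text)
    # Matched emoticons become polarity tokens, padded so they stay separate words.
    for emoticon in POSITIVE_EMOTICONS:
        text = text.replace(emoticon, " EMO_POS ")
    for emoticon in NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, " EMO_NEG ")
    # Runs of two or more dots become a space.
    text = re.sub(r"\.{2,}", " ", text)
    # Strip spaces and quotes from both ends, then collapse repeated whitespace.
    text = text.strip(" \"'")
    return re.sub(r"\s+", " ", text)


cleaned = preprocess_tweet('  "@Alice loving my new phone... #Pixel :-)"  ')
```

The cleaned string can then be passed to whatever feature extractor is used before the classifiers are trained.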
usage. These tweets have differing characteristics such as retweets, emoticons, etc., which have to be extracted. To create user-friendly data, the Twitter data has to be normalized.

In this work, various processing methods are used to standardize the data set. Initially, the following general pre-processing steps are applied to the tweets:

i. Convert the tweet to lowercase
ii. Replace two or more dots (.) with a space
iii. Strip spaces and quotes (" and ') from the ends of the tweet
iv. Replace two or more spaces with a single space

The special Twitter features are handled as follows:

Emoticons: Users use a wide variety of emoticons in different ways, and because their number keeps growing on social media it is not possible to match all of them. However, some of the most frequently used emoticons are matched. The matched emoticons are replaced with either EMO POS or EMO NEG, depending on whether they convey a positive or a negative emotion.

Hashtags: Hashtags are unspaced phrases prefixed by the hash symbol (#), which users frequently use to mention a trending topic on Twitter. Each hashtag is replaced with the same word without the hash symbol. For example, hellooo is replaced by hello. The regular expression used to match hashtags is #(\S+).

For a data set consisting of a feature set and a label set, an SVM classifier builds a model to predict the classes of new examples. It assigns each new example or data point to one of the classes.

Algorithm:
i. Define an optimal hyperplane
ii. Extend step i to nonlinearly separable problems
iii. Map the data to a high-dimensional space where it is easy to classify with linear decision surfaces

C. Convolution Neural Networks
Convolution Neural Networks, or CNNs, are a particular kind of neural network that contains layers known as convolution layers, which can interpret spatial data. They work particularly well for large data sets and require only a small amount of pre-processing, since they learn the features themselves. A convolution layer comprises many filters, or kernels, that help it learn to extract particular kinds of features from the data. A kernel is a 2D window that slides over the input data, performing the convolution operation. This research uses temporal convolution, which suits the analysis of one-dimensional data such as tweets.

For random inputs and two hidden layers, the sigmoid activation function is computed as follows (np denotes NumPy):

activation = lambda x: 1.0/(1.0 + np.exp(-x))
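To make the role of the sigmoid activation concrete, the sketch below pushes random inputs through two hidden layers and an output unit. It is only an illustration under stated assumptions: the layer sizes (4 inputs, hidden layers of 5 and 3 units, 1 output), the random weight initialisation, and the helper names `dense`, `random_layer` and `forward` are hypothetical, and the standard library is used in place of NumPy so the example is self-contained.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid, the same formula as the activation given in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(rng, n_in, n_out):
    """Random weights in [-1, 1] and zero biases (an assumed initialisation)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.random() for _ in range(4)]
layers = [
    random_layer(rng, 4, 5),  # first hidden layer
    random_layer(rng, 5, 3),  # second hidden layer
    random_layer(rng, 3, 1),  # output unit
]
output = forward(x, layers)
```

Because every layer ends in a sigmoid, the single output value lies strictly between 0 and 1 and could be read as a score for the positive class.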
Algorithm        Accuracy
SVM              79.90%
Naive Bayes      70.58%
Decision Tree    75.98%
Random Forest    72.05%
Fig. 9. Unigram Word count

Fig. 10. Comparison of Classifiers

VI. CONCLUSION
This work considers tweets containing a mixture of words, emoticons, URLs, hashtags, user mentions and symbols. Machine learning algorithms such as Naive Bayes, Maximum Entropy, Decision Tree, Random Forest, XGBoost, SVM, Multi-Layer Perceptron, Recurrent Neural Networks and Convolutional Neural Networks were developed to classify the polarity of tweets. Two types of features, unigrams and bigrams, are used for classification, and the observations show that augmenting the feature vector with bigrams improved the accuracy. Once extracted, the features are represented as either a sparse vector or a dense vector. It was observed that presence in the sparse vector representation performed better than frequency. Neural network methods generally performed better than the other classifiers, and the LSTM model produced better classification results. The Naive Bayes algorithm obtained an accuracy of 70.58%. An ensemble method achieved an accuracy of 79.90% compared to other classifiers.

Handling emotion ranges: the models can be improved and trained to handle a range of sentiments. Tweets do not always have a positive or negative sentiment; at times they have no sentiment, i.e. they are neutral. Sentiment can also have gradations: the sentence "This is good" is positive, but the sentence "This is extraordinary" is somewhat more positive than the first. Sentiment can therefore be classified in ranges, say from -2 to +2. Further, this research can be extended to other real-time data sets, and the classification performance can be improved by applying deep learning algorithms.

References
[1] Axel Groß-Klußmann, Stephan König, Markus Ebner, “Buzzwords build momentum: Global financial Twitter sentiment and the aggregate stock market”, Expert Systems With Applications 136 (2019) 171–186
[2] Marcus Fontoura, Alexander Shraer, Maxim Gurevich and Vanja Josifovski, “Top-k Publish-Subscribe for Social Annotation of News”, Analysis of Twitter Data, pages 6(6):385–396, 26th August 2013.
[3] Narashima S. Purohit, Meghana Bhat, Akshata B. Angadi, Karuna C. Gull, “Crawling through Web to Extract the Data from Social Networking Site - Twitter”, in National Conference on Parallel Computing Technologies (PARCOMPUTECH-2015), CDAC in association with National Knowledge Network, IEEE and CSI Bangalore chapter, at Bangalore, 19-20 February, 2015
[4] Subasish Das, Anandi Dutta, Gabriella Medina, Lisa Minjares-Kyle, Zachary Elgart, “Extracting patterns from Twitter to promote biking”, IATSS Research 43 (2019) 51–59
[5] Karuna Gull, Akshata Angadi, (2018) “A Methodical Study about Behaviour of Different seeds on varying Distance Measures using an Iterative Technique with Evaluation of Cluster validity”, in the proceedings of CSI-2015, 50th Golden Jubilee Annual Convention On Digital Life, Springer Nature Singapore Pte Ltd. 2018, ICT Based Innovations, Advances in Intelligent Systems and Computing (AISC), https://doi.org/10.1007/978-981-10-6602-3_7, pp. 63-74.
[6] Karen Howells, Ahmet Ertugan, “Applying fuzzy logic for sentiment analysis of social media network data in marketing”, Procedia Computer Science 120 (2017) 664–670
[7] Karuna Gull, Sudip Padhye, Dr. Sandeep Sharma, Dr. Subodh Jain, (2017), “A Comparative Analysis of Lexical/NLP Method with WEKA’s Bayes Classifier”, International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), Volume: 5 Issue: 2, February 2017, pp. 221–227, ISSN: 2321-8169. Available: https://ijritcc.org/index.php/ijritcc/article/view/203/203
[8] Ankit, Nabizath Saleena, “An Ensemble Classification System for Twitter Sentiment Analysis”, Procedia Computer Science 132 (2018) 937–946
[9] Karuna Gull, Akshata B. Angadi (2016), “Text Mining Predictive Modeling Algorithm for classifying Attitudes of customers with Accuracy Estimation”, in the proceedings of Second International Conference on Information and Communication Technology for Competitive Strategies (ICTCS-2016), Conference Proceedings by ACM – ICPS, ISBN No: 978-1-4503-3962-9. DOI:10.1145/2905055.2905268. Available: http://dx.doi.org/10.1145/2905055.2905268
[10] Oscar Araque, Ganggao Zhu, Carlos A. Iglesias, “A semantic similarity-based perspective of affect lexicons for sentiment analysis”, Knowledge-Based Systems 165 (2019) 346–359
[11] Nur Azizah Vidya, Mohamad Ivan Fanany, Indra Budi, “Twitter Sentiment to Analyze Net Brand Reputation of Mobile Phone Providers”, Procedia Computer Science 72 (2015) 519–526.
[12] Shivani Bahria, Pranav Bahria, Sangeeta Lal, “A Novel approach of Sentiment Classification using Emoticons”, International Conference on IDS 2018, Procedia Computer Science 132 (2018) 669–678
[13] Karuna C. Gull, Seema C. G, Akshata B. Angadi, Suvarna G. Kanakaraddi, “A Clustering Technique To Rise Up The Marketing Tactics By Looking Out The Key Users”, 978-1-4799-2572-8/14/$31.00 © 2014 IEEE
[14] B.L Poja, Suvarna Kanakaraddi, Meenaxi M Raikar, “Sentiment based stock market prediction”, International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), 2018
[15] Sudip Padhye, Karuna Gull, (2016) “Regression analysis for Stock Market Prediction using Weka Tool without Sentiment Analysis”, in the proceedings of Sixth International Conference on Computational Intelligence and Information Technology, CIIT-2016, Emerging Technologies in Engineering conference proceedings by the McGraw-Hill Education (India) Private Limited, pp. 78-87. ISBN No: 978-93-5260-435-7.