The Thesis Report

Department of Electrical and Computer Engineering
North South University
Directed Research
Opinion Mining Based on Crowdsourced Data
Md. Azwad Hasan Chowdhury
ID # 1430125042
Supervisor
Md. Shahriar Karim

Assistant Professor
ECE Department
Spring, 2019
DECLARATION
This is to certify that this Thesis is my original work. No part of this work has been submitted
elsewhere partially or fully for the award of any other degree or diploma. Any material
reproduced in this project has been properly acknowledged.
Student’s Name & Signature
Md. Azwad Hasan Chowdhury
APPROVAL
The directed research entitled “Opinion Mining Based on Crowdsourced Data” by Md.
Azwad Hasan Chowdhury (ID # 1430125042) is approved in partial fulfillment of the
requirement of the Degree of Bachelor of Science in Computer Science and Engineering on May
and has been accepted as satisfactory.
Supervisor’s Signature
Md. Shahriar Karim

Assistant Professor
Dhaka, Bangladesh.
Department Chair’s Signature
Dr. K. M. A. Salam
Professor and Chairman
Dhaka, Bangladesh.
ACKNOWLEDGMENT
First of all, we wish to express our gratitude to the Almighty for giving us the strength to
perform our responsibilities and complete the report.
The capstone project program helps bridge the gap between theoretical knowledge and real-life
experience as part of the Bachelor of Science (BSc) program. This report was designed to
provide practical experience grounded in theoretical understanding.
We also express our profound gratitude to all the teachers who were instrumental in providing
us with the technical knowledge and moral support needed to complete the project with full
understanding.
We would like to thank our honorable faculty member Md. Shahriar Karim for his undivided
attention and help in achieving this milestone. We are also grateful to the ECE department of
North South University for giving us the opportunity to conduct research under CSE 498R.
We thank our friends and family for their constant moral support throughout this research.
ABSTRACT
This report presents directed research on opinion mining, also known as sentiment analysis,
applied to several types of datasets. The analysis techniques include machine learning, deep
neural networks, a lexicon-based approach and naïve approaches that do not rely on machine
learning. Rather than committing to one particular approach, the aim of this research was to
explore the range of techniques available for analyzing the sentiment of user posts on online
platforms such as social media, e-commerce review sites and movie review sites.
Table of Contents
CHAPTER 1
OVERVIEW
1.1 Introduction
1.2 Research Details
1.3 Research Goals
1.4 Summary
CHAPTER 2
MOTIVATION
2.1 Introduction
2.2 Motivation towards our research topic
2.3 Application Domains
2.4 Summary
CHAPTER 3
RELATED WORK
3.1 Introduction
3.2 Sentiment analysis of Twitter data
3.3 Sentiment in a cross-media analysis framework
3.4 Applying word vectors for sentiment analysis of APP reviews
3.5 Aspect level SA on E-commerce data
3.6 Thai Sentiment Analysis for Consumer’s Review in Multiple Dimensions Using Sentiment Compensation Technique
3.7 Sentiment analysis using Latent Dirichlet Allocation and topic polarity word cloud visualization
3.8 Deep learning for sentiment analysis of movie reviews
3.9 Deep learning for sentiment analysis
3.10 Opinion-Tree based Flexible Opinion Mining Model
CHAPTER 4
MACHINE LEARNING BASED APPROACHES
4.1 Introduction
4.2 Data set and preprocessing
4.3 Pad sequence
4.4 Algorithm and architecture
4.5 Tools and Technologies
4.6 Implemented model
CHAPTER 5
NAÏVE APPROACH
5.1 Introduction
5.2 Data source
5.3 Tools
5.4 Modules
5.5 Getting Tweets in JSON format
5.6 Pagination and Cursor
5.7 Analyzing twitter data
5.8 TextBlob
CHAPTER 6
LEXICON BASED APPROACH
6.1 Introduction
6.2 Dataset
6.3 System Architecture
6.4 Score calculation
6.5 Summary
CHAPTER 7
RESULT ANALYSIS
7.1 Introduction
7.2 Machine Learning based approach
7.3 Lexicon based Approach
7.3 Naive Approach
CHAPTER 8
CONCLUSION
BIBLIOGRAPHY
APPENDIX
CODE SAMPLES
List of Figures
Figure No.   Figure caption
1            LSTM architecture
2            Recurrent Neural Network
3            Implemented RNN model
4            Twitter API access approval email
5            Token information snapshot
6            Lexicon based system design
7            Output result of Naive approach
8            Code sample of RNN implementation
9            Code sample of Naïve approach
List of Tables
Table No.    Table caption
1            Result of ML based approach
2            Result of Lexicon based approach
CHAPTER 1
OVERVIEW
1.1 Introduction
Research and industry are increasingly interested in automatically determining the polarity of
public opinion on a specific subject. The advent of social networks has made massive numbers
of blogs, recommendations and reviews accessible. The challenge is to extract the polarity from
these data, which is the task of opinion mining or sentiment analysis [8]. It is now an important
field of research with broad implications for automation and Artificial Intelligence based
applications. Every minute, people post opinions of many kinds on various online platforms,
and analyzing those opinions manually is very difficult. To analyze them automatically,
researchers need to develop some form of automated system. Several kinds of approaches are
currently used for analyzing such data. Machine learning based approaches are the most
popular, but other approaches, such as lexicon based ones, are also used in this area. This
research also presents naïve approaches to the problem. The research is conducted on social
media data, e-commerce site review data and data from movie review platforms.
1.2 Research Details

This thesis implements several data analysis techniques on different types of user data that
express users’ opinions on particular topics.
1. Machine Learning based approach
This part of the research is conducted on IMDB’s movie review dataset. To extract information,
a deep learning algorithm called the recurrent neural network (RNN), a variation of the neural
network, is used. The architecture implemented here is called Long Short Term Memory (LSTM).
2. Naïve Approach using Python
This part of the research is conducted on real-time tweet data from the social media platform
Twitter. The project needs to access users’ tweets in real time to analyze their polarity. To
access Twitter data in real time, the project uses Tweepy, an open-source library for the
Twitter API written in the high-level programming language Python.
3. Lexicon Based Approach
This technique uses dictionaries of words in which each word is annotated with its emotional
polarity and opinion strength. The dictionary is then matched against the document to determine
the document’s overall polarity score. These techniques usually give high precision but low
recall. Lexical algorithms can achieve near-perfect results, but they require a lexicon, which is
not available for every language.
1.3 Research Goals

There are three major goals that this research focuses on. They are as follows:
i. Analyzing user data in real time: The research implements a model that scores tweets posted
by different users on Twitter in real time. It classifies the posted content into positive,
negative and neutral categories.
ii. Analyzing product reviews: Reviews posted by e-commerce site users for a particular product
call for some kind of automation. If the business has a very large user base and receives more
reviews than can be counted by hand, an automated system is needed to determine how many
reviews of a particular product are positive and how many are negative, so that the business
can take appropriate action regarding that product.
iii. Social biases: By analyzing user data, it is possible to identify community biases or
hostile attitudes and activities directed against another community or class.
1.4 Summary
This chapter gives an overview of the modules used in this research. It outlines how the
proposed methods and technologies work for sentiment analysis with the help of machine
learning, lexical and naïve approaches.
CHAPTER 2
MOTIVATION
2.1 Introduction
In this chapter, the paper discusses the motivation that led us to implement the proposed
models and technologies. It also explains why we chose opinion mining as our research topic.
2.2 Motivation towards our research topic

Opinion mining is remarkably helpful in social media monitoring, as it gives an overview of the
broader public opinion on particular issues. Simply reading a post lets a person recognize
whether the writer had a positive or a negative attitude towards the topic, but only if that
person knows the language well. A computer, however, has no understanding of natural language,
so engineers and researchers need to recast the problem in mathematical terms, that is, in the
language of computers. Without any context for what the words mean, a computer cannot deduce
by itself whether a text expresses joy, frustration, anger or something else.
Opinion mining addresses this problem by using Natural Language Processing. It identifies the
essential keywords and phrases within a document, which then help the algorithm classify the
emotional state of the document. This allows computers to automate many tasks and reduces
human effort. The field is relatively new and there is much left to contribute, which is why we
selected this topic as our research area.
2.3 Application Domains

Opinion mining has been applied to a variety of domains, from commercial to political and social
applications. Commercial applications involve mining reviews, predicting movie success,
analyzing social influence in product selling, extracting product reviews and predicting sales.
There are also political/social applications such as assessing citizens’ satisfaction of public
policies and predicting election results from social media data.
2.4 Summary
This chapter described the motivation behind our thesis topic, which aims to systematically
automate the analysis of natural language text through different types of methods.
CHAPTER 3
RELATED WORK
3.1 Introduction
In this chapter, the paper discusses different types of sentiment analysis (SA) methods that
currently exist. It reviews earlier work by other researchers that has had a significant impact
on this field.
3.2 Sentiment analysis of Twitter data

Apoorv A., Boyi X., Ilia V., and Owen R. [1] acquired 11,875 manually annotated tweets from a
commercial source. The collection includes tweets in foreign languages, which were converted
into English using Google Translate before the annotation process. Each tweet is labelled by a
human annotator as positive, negative, neutral or junk. For pre-processing they introduced two
resources: 1) an emoticon dictionary and 2) an acronym dictionary. For feature extraction they
designed a tree kernel, which is an instance of a general class of convolution kernels. They
used a unigram model as the baseline for comparing their models and report that their models
outperform this state-of-the-art baseline.
3.3 Sentiment in a cross-media analysis framework

Yonas W. [2] describes the implementation and integration of a sentiment analysis pipeline into
an ongoing open source cross-media analysis framework. The designed pipeline includes the
following components: a chat room cleaner, NLP and a sentiment analyser. Before the
integration, the work also compares two broad categories of sentiment analysis methods, namely
lexicon-based and machine learning approaches. It uses the Apache Hadoop framework with its
lexicon-based sentiment prediction algorithm and the Stanford CoreNLP library with the
Recursive Neural Tensor Network (RNTN) model.
3.4 Applying word vectors for sentiment analysis of APP reviews

Xian F., Xiaoge L., Feihong D., Xin L. and Mian W. [3] investigated the effectiveness of word
vector representations for sentiment analysis. They divided their work into three parts:
sentiment word extraction, polarity detection for sentiment words, and text sentiment
prediction. They built a sentiment lexicon and a text sentiment prediction framework to carry
out the task. The approach did not differ much from conventional methods. They measured the
polarity of different sentences and then built a classical naïve classifier to differentiate
the words through clustering.
3.5 Aspect level SA on E-commerce data

Satuluri V. and Meena B. [4] worked on user review data from e-commerce sites. They built their
classifier using the Naïve Bayes (NB) and Support Vector Machine (SVM) classification
algorithms. The Naïve Bayes algorithm calculates the probability of each aspect of a text. In
an SVM, a hyperplane separates the data into classes, and the support vectors are the data
points nearest to the hyperplane. To classify new data correctly, the hyperplane with the
greatest possible margin is used.
3.6 Thai Sentiment Analysis for Consumer’s Review in Multiple Dimensions Using Sentiment Compensation Technique
Paitoon P. and Chayapol M. [5] used a sentiment compensation technique to automatically assign
a sentiment to a dimension when a consumer’s review mentions the sentiment without a
dimension. The results show that their proposed method outperforms the sentiment to dimension
(S2D) and dimension to sentiment (D2S) methods, with an overall accuracy of 93.60%.
3.7 Sentiment analysis using Latent Dirichlet Allocation and topic polarity word cloud visualization
Mohammad F. A. B. and Retno K. [6] performed sentiment analysis using Latent Dirichlet
Allocation (LDA), which extracts the topics of documents, where each topic is represented by
words that occur with different topic probabilities. To make the LDA output easier to
interpret, it needs to be represented visually, and for that reason they produced a topic
polarity word cloud visualization.
3.8 Deep learning for sentiment analysis of movie reviews
Hadi P., and Saman G. [7] explored natural language methods to perform sentiment analysis.
They worked both on binary and multi-class classification. For the binary classification they
applied the bag of words, and skip-gram word2vec models followed by various classifiers,
including random forest, SVM, and logistic regression. For the multi-class case,
they implemented the recursive neural tensor networks (RNTN).
3.9 Deep learning for sentiment analysis

Lina M. and Rojas B. [8] discuss many possible methods of solving sentiment analysis problems.
It is a review article in which the authors survey the different types of methods used in
opinion mining. They cover deep learning based techniques such as feed-forward neural networks,
convolutional neural networks, recurrent neural networks, long short-term memory,
non-recursive neural networks and recursive neural networks.
3.10 Opinion-Tree based Flexible Opinion Mining Model

Ding, J., Le, Z., Zhou, P., Wang, G., & Shu, W. [16] proposed a flexible opinion tree that mines
a user’s opinion at up to three layers, giving a more specific picture of the user’s opinion on
a current hot topic on the internet. The first-layer nodes are coarse-grained and indicate
whether the attitude of the opinion is positive, neutral or negative. The second-layer nodes
contain the attitude force (the strength of the attitude, e.g. too strong or less strong) of
one coarse attitude. After the second layer is mined, the leaf nodes are constructed; each
contains a two-tuple <object n, opinion n>. The leaf nodes are fine-grained opinions, referred
to as concrete reviews.
CHAPTER 4
MACHINE LEARNING BASED APPROACHES
4.1 Introduction
In this chapter, the paper discusses the machine learning based approaches the research has
considered. It explains the dataset, the preprocessing and the algorithms that have been
considered for sentiment analysis.
4.2 Data set and preprocessing

For the initial experiment, the research considered the IMDB movie review dataset. The dataset
contains 50,000 movie reviews, split evenly into a training set of 25,000 and a test set of
25,000. The overall distribution of labels is balanced (25k positive and 25k negative). The
dataset also includes an additional 50,000 unlabeled documents for unsupervised learning. No
movie has more than 30 reviews. Once the model is trained on the dataset, it is able to
classify a review as either positive or negative. The dataset was pre-labelled by other machine
learning researchers, which made our work easier. Each word has been pre-assigned an integer
word ID, and each label is an integer (0 for negative, 1 for positive). In the dataset, the
maximum review length is 2697 words and the minimum length is 14 words.
4.3 Pad sequence

In order to feed the dataset into our own RNN model, the input documents must have the same
length. The proposed research limits the maximum review length to max_words by truncating
longer reviews and padding shorter reviews with a null value (0). This can be done using the
pad_sequences() function in Keras.
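The truncation and padding step described above can be illustrated with a small pure-Python sketch. It does not call Keras; it only mimics what we understand to be the default behaviour of pad_sequences() (padding and truncating at the front of the sequence). The function name pad_or_truncate and the toy word IDs are hypothetical and are not taken from the project code.

```python
def pad_or_truncate(sequences, max_words, pad_value=0):
    """Return copies of `sequences` that all contain exactly `max_words` IDs."""
    fixed = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > max_words:
            # Too long: keep only the last `max_words` word IDs.
            seq = seq[-max_words:]
        else:
            # Too short: add the null value (0) at the front.
            seq = [pad_value] * (max_words - len(seq)) + seq
        fixed.append(seq)
    return fixed


# Two toy "reviews" encoded as word IDs (illustrative values only).
reviews = [[11, 4, 25], [7, 9, 3, 18, 2, 6]]
padded = pad_or_truncate(reviews, max_words=4)
print(padded)
```

After this step every review has the same length, so the reviews can be stacked into a single input matrix for the network.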
4.4 Algorithm and architecture
Recurrent Neural Networks (RNNs) are widely used for extracting features from text. The RNN is
a deep learning algorithm, and its Long Short Term Memory (LSTM) architecture is widely used
for retaining information across a sequence. An RNN is a type of neural network in which the
output from the previous step is fed as input to the current step. In traditional neural
networks, all inputs and outputs are independent of each other, but in tasks such as predicting
the next word of a sentence, the previous words are required, so the network needs to remember
them. The RNN solves this issue with the help of a hidden layer [9].
Figure 1: LSTM architecture
Formula for calculating the current state:
\[ h_t = f(C_{t-1}, X_t) \]
where $h_t$ is the current state, $C_{t-1}$ is the previous state and $X_t$ is the input state.
Formula for applying the activation function:
\[ h_t = \tanh(W_{hh} C_{t-1} + W_{xh} X_t) \]
where $W_{hh}$ is the weight at the recurrent neuron and $W_{xh}$ is the weight at the input neuron.
Formula for calculating the output:
\[ O_t = W_{hy} h_t \]
where $O_t$ is the output and $W_{hy}$ is the weight at the output layer.
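As an illustration of the three formulas above, the following sketch performs the recurrent update for a tiny network in plain Python. The weight values, the layer sizes and the helper names matvec and rnn_step are assumptions chosen only for this example; they are not the trained weights or the code of the implemented model.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(C_prev, X_t, W_hh, W_xh, W_hy):
    """One recurrent step: h_t = tanh(W_hh C_{t-1} + W_xh X_t), O_t = W_hy h_t."""
    recurrent_part = matvec(W_hh, C_prev)
    input_part = matvec(W_xh, X_t)
    h_t = [math.tanh(r + i) for r, i in zip(recurrent_part, input_part)]
    O_t = matvec(W_hy, h_t)
    return h_t, O_t


# Hypothetical toy weights: 2 hidden units, 1 input feature, 1 output.
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_xh = [[1.0], [-1.0]]
W_hy = [[1.0, 1.0]]

state = [0.0, 0.0]              # initial previous state
for x in ([1.0], [0.5]):        # a sequence of two inputs
    state, output = rnn_step(state, x, W_hh, W_xh, W_hy)
print(state, output)
```

The loop shows the key property of the recurrence: the state produced at one step is passed back in as the previous state of the next step.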
The following diagram demonstrates how it works:
Figure 2: Recurrent neural network
4.5 Tools and Technologies

The project implements the above architecture and algorithms using Keras, a deep learning
library, with a TensorFlow backend, along with a few other libraries. Python was used as the
programming language. Keras has a built-in dataset for IMDB movie reviews. The other libraries
used in the project are:
• NumPy
• pandas (DataFrame)
• Matplotlib
4.6 Implemented model

The initial model is an RNN with one embedding layer, one LSTM layer and one dense layer. In
total, the model has 213,301 trainable parameters. The implemented model summary is shown
below:
Figure 3: implemented RNN model
In this model, binary cross entropy has been used as loss function and Adam has been used as an
optimizer.
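The parameter count can be checked with simple arithmetic. The layer sizes are not stated in the text, so the values below (a vocabulary of 5,000 words, 32-dimensional embeddings, 100 LSTM units and a single output unit) are assumptions; they are one configuration that reproduces the reported total of 213,301, using the standard per-layer parameter formulas.

```python
def embedding_params(vocab_size, embed_dim):
    # One embedding vector per word ID.
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weights plus one bias per output unit.
    return input_dim * units + units


# Assumed sizes (not given in the text).
vocab_size, embed_dim, lstm_units = 5000, 32, 100

total = (
    embedding_params(vocab_size, embed_dim)
    + lstm_params(embed_dim, lstm_units)
    + dense_params(lstm_units, 1)
)
print(total)
```

Under these assumptions the embedding layer contributes 160,000 parameters, the LSTM layer 53,200 and the dense layer 101.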
CHAPTER 5
NAÏVE APPROACH
5.1 Introduction
In this chapter, the paper discusses naïve approaches to sentiment analysis. Unlike machine
learning based approaches, naïve approaches involve no learned intelligence; explicit logic has
to be written to accomplish each task.
5.2 Data source
This research used tweets posted by different Twitter users, collected in real time. The
application gives a real-time polarity score for recently posted tweets. To access tweets in
real time, the project source code needs to call the Twitter API, which requires a developer
account. During the application process, the Twitter developer team asks 4 questions about the
intended use of the API. After the form is submitted, it takes around 5-6 hours for the
application to be confirmed. If the application is approved, Twitter's developer team sends an
email like the one below:
Figure 4: Twitter API access approval email
To access the data from source code, the API provides 4 secret keys: ACCESS_TOKEN,
ACCESS_TOKEN_SECRET, CONSUMER_KEY and CONSUMER_SECRET. These key values are essential and
must be referenced in the project code to access Twitter’s data. The app can be created at the
following URL:
https://developer.twitter.com/en/apps
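One way to make the four key values available to the project code is sketched below. Reading them from environment variables is an assumption made for this example, as is the helper name load_twitter_credentials; the project itself keeps the values in a separate Python file.

```python
import os

# The four key names provided by the Twitter API, as listed above.
REQUIRED_KEYS = (
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
)


def load_twitter_credentials(environ=None):
    """Collect the four keys from a mapping (defaults to os.environ)."""
    environ = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        # Fail early with a clear message instead of a failed API call later.
        raise KeyError("Missing Twitter API credentials: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_KEYS}


# Usage with placeholder values (real keys must never be committed to code).
example = {key: "placeholder" for key in REQUIRED_KEYS}
credentials = load_twitter_credentials(example)
print(sorted(credentials))
```

Keeping the secrets outside the logic file means the analysis code can be shared without exposing the keys.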
5.3 Tools
Python is used as the primary programming language. To use the Twitter API, a module called
Tweepy is needed. First, an app has to be created on the Twitter developer page, where the
access tokens can be seen. A snapshot from the app we created is given below:
Figure 5: Token information snapshot
5.4 Modules
1. StreamListener: A class from the Tweepy module that listens for (reads) tweets matching
certain keywords or hashtags.
2. OAuthHandler: A class used for authenticating users based on the credentials stored in the
API keys.
3. Stream: Delivers real-time data from Twitter.
5.5 Getting Tweets in JSON format
This section contains two Python files, one holding the API credential values and another
containing the logic for getting real-time Twitter data for a few defined tags. The logic file
contains two classes, TwitterStreamer and StdOutListener. The first class provides Twitter data
in real time and filters it by the defined keywords. The second class only prints the values in
JSON format.
5.6 Pagination and Cursor

This section adds a new class called TwitterClient. This class includes four functions. The
first is the constructor, which sets up the authentication of the user. The second accesses the
user timeline. The third accesses that user’s friend list, and the last one retrieves the
retweets posted on the user’s own timeline. The class also contains an extra function for
printing error messages.
5.7 Analyzing twitter data

In this section, the project records the following attributes for each tweet:
• Id
• Length
• Date
• Source
• Likes
• Retweets
This section also contains a function that cleans the tweet text using Python regular
expressions (RegEx).
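A cleaning function of this kind might look like the sketch below. The exact patterns used in the project are not given in the text, so the three rules here (remove links, remove @mentions, remove remaining punctuation) and the name clean_tweet are assumptions for illustration.

```python
import re


def clean_tweet(text):
    """Strip links, @mentions and punctuation, then collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)      # remove URLs
    text = re.sub(r"@\w+", " ", text)              # remove @mentions
    text = re.sub(r"[^A-Za-z0-9 \t]", " ", text)   # remove other symbols
    return " ".join(text.split())


raw = "RT @user: Loving the new phone!! https://t.co/abc #happy"
cleaned = clean_tweet(raw)
print(cleaned)
```

Note that the hash sign is removed but the hashtag word itself is kept, since it often carries sentiment.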
5.8 TextBlob

TextBlob is a Python library for processing natural language data. It provides an API for
common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis,
classification and translation [15]. It works directly on Python strings. To get a polarity
score, the first step is tokenization, which is one of the basic tasks of NLP. After that, the
project performs noun phrase extraction. The sentiment analysis returns two properties,
polarity and subjectivity. Polarity is a float in the range [-1, 1], where 1 means a positive
statement and -1 means a negative statement. Subjective sentences generally refer to personal
opinion, emotion or judgment, whereas objective sentences refer to factual information.
Subjectivity is also a float, in the range [0, 1] [16].
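To turn a polarity score into one of the three categories used in this research, a simple thresholding rule can be applied. The sketch below is an assumption about how such a mapping could be written; the function name label_polarity and the zero threshold are hypothetical. In the project, the polarity value would come from TextBlob (for example TextBlob(text).sentiment.polarity), which is not called here so that the example stays dependency-free.

```python
def label_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to 'positive', 'negative' or 'neutral'."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in the range [-1, 1]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Illustrative polarity values (not measured results).
labels = [label_polarity(p) for p in (0.8, -0.35, 0.0)]
print(labels)
```

Raising the threshold slightly above zero would treat weakly polarized tweets as neutral, which is a design choice left to the application.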
CHAPTER 6
LEXICON BASED APPROACH
6.1 Introduction
In this chapter, the paper explores the steps associated with the lexicon based approach. In
this approach, words are divided into two categories: positive words and negative words.
6.2 Dataset
In this experiment, the research is conducted on a state-of-the-art lexical resource called
SentiWordNet [10], which is designed to support sentiment analysis applications. It annotates
each entry with three sentiment labels (positive, negative and neutral). It is built on WordNet
and serves as a linguistic resource for sentiment analysis.
6.3 System Architecture
Figure 6:Lexicon based system design
A lexicon-based approach for opinion mining is based on the insight of a dataset that highly
depends on the polarity. However, due to the complexity of natural language processing, a
simple model may fail to extract the information out of one sentence properly. Because of this,
this paper proposed a fine-grained model which splits the sentences into a dictionary of small
words which is called micro phase. Let’s say this micro phase is called mi. The sentiment score
labelled with each micro phase is t j. The paper approaches two different ways of this
representation called Basic, Normalized, [11]
In the basic formulation, the sentiment of the content is achieved by first summing the polarity of
each micro-phrase. Then, the score is normalized by the range of the whole content. In this case,
the micro-phrases are just utilized to invert the polarity when a negation is found in the text.
$$S_{basic}(T) = \sum_{i=1}^{n} \frac{POL_{basic}(m_i)}{|T|}$$

$$POL_{basic}(m_i) = \sum_{j=1}^{k} score(t_j)$$
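The basic formulation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used in the experiments: the toy `LEXICON`, the `NEGATIONS` set, the choice to flip the sign of a whole micro-phrase when it contains a negation, and measuring |T| as the total number of tokens are all assumptions.

```python
NEGATIONS = {"not", "no", "never"}                     # assumed negation cues
LEXICON = {"good": 0.5, "bad": -0.5, "great": 0.75}    # toy scores, not SentiWordNet values


def score(term: str) -> float:
    """score(t_j): prior polarity of a term, 0 if it is not in the lexicon."""
    return LEXICON.get(term, 0.0)


def pol_basic(micro_phrase: list) -> float:
    """POL_basic(m_i): sum of term scores, inverted if the phrase contains a negation."""
    total = sum(score(t) for t in micro_phrase)
    if any(t in NEGATIONS for t in micro_phrase):
        total = -total
    return total


def s_basic(micro_phrases: list) -> float:
    """S_basic(T): sum of micro-phrase polarities divided by |T| (token count, assumed)."""
    length = sum(len(m) for m in micro_phrases)
    if length == 0:
        return 0.0
    return sum(pol_basic(m) for m in micro_phrases) / length


text = [["the", "food", "was", "great"], ["service", "was", "not", "good"]]
print(s_basic(text))  # (0.75 - 0.5) / 8 = 0.03125
```

Dividing by the length of the whole text keeps scores of long and short posts comparable, at the cost of diluting a single strong micro-phrase in a long post.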
In the normalized formulation, the micro-phrase-level scores are normalized by the length
of the single micro-phrase, so that micro-phrases are weighted differently according to their
length.
$$S_{norm}(T) = \sum_{i=1}^{n} POL_{norm}(m_i)$$

$$POL_{norm}(m_i) = \sum_{j=1}^{k} \frac{score(t_j)}{|m_i|}$$
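A corresponding sketch of the normalized formulation is given below. As before, the toy `LEXICON` and the use of the token count as |m_i| are assumptions made for illustration; negation handling is left out to keep the focus on the normalization step.

```python
LEXICON = {"good": 0.5, "bad": -0.5, "great": 0.75}  # toy scores, not SentiWordNet values


def score(term: str) -> float:
    """score(t_j): prior polarity of a term, 0 if it is not in the lexicon."""
    return LEXICON.get(term, 0.0)


def pol_norm(micro_phrase: list) -> float:
    """POL_norm(m_i): sum of term scores divided by the micro-phrase length |m_i|."""
    if not micro_phrase:
        return 0.0
    return sum(score(t) for t in micro_phrase) / len(micro_phrase)


def s_norm(micro_phrases: list) -> float:
    """S_norm(T): sum of the length-normalized micro-phrase polarities."""
    return sum(pol_norm(m) for m in micro_phrases)


text = [["great"], ["the", "food", "was", "bad"]]
print(s_norm(text))  # 0.75 / 1 + (-0.5) / 4 = 0.625
```

In this example the one-word micro-phrase dominates the result, because the negative term in the longer micro-phrase is divided by four. This is the length-dependent weighting described above.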
6.4 Score calculation

The effectiveness of the approach depends strictly on $score(t_j)$. For each lexical
resource, a separate way of determining the sentiment score is chosen. In most cases,
NLP pipelines are used to get the positive tags of a term. Then, all the synsets are
mapped to the positive values extracted earlier. Finally, the score is calculated as the
weighted average of all the sentiment scores of the synsets. Scores for the negative values are
calculated in a similar way.
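The weighted average described above can be sketched as follows. The text does not state which weights are used, so the 1/rank weighting (more common senses count more) and the final step of subtracting the negative average from the positive one are assumptions made only for this illustration; the function names are hypothetical.

```python
def weighted_average_score(synset_scores: list) -> float:
    """Weighted average of per-synset scores, ordered by sense rank (most common first).

    The 1/rank weighting is an assumption made for this sketch.
    """
    if not synset_scores:
        return 0.0
    weights = [1.0 / rank for rank in range(1, len(synset_scores) + 1)]
    weighted_sum = sum(w * s for w, s in zip(weights, synset_scores))
    return weighted_sum / sum(weights)


def term_score(pos_scores: list, neg_scores: list) -> float:
    """score(t_j) as weighted positive average minus weighted negative average (assumed)."""
    return weighted_average_score(pos_scores) - weighted_average_score(neg_scores)


# Toy values for a term with two synsets, for illustration only.
print(term_score([0.5, 0.25], [0.0, 0.25]))
```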
6.5 Summary
In this chapter, the paper has discussed the lexicon-based approach for calculating sentiment
scores of content posted on an online platform. The paper shows two different mathematical
models for calculating the sentiment score.
CHAPTER 7
RESULT ANALYSIS
7.1 Introduction
In this chapter, the paper discusses the results obtained from the three methods described above
for opinion mining or sentiment analysis. The first is a Machine Learning based
approach, where a Recurrent Neural Network was used with its popular Long
Short Term Memory architecture. The second is the Naïve approach, where the Twitter API was used to
calculate sentiment scores, and the last is the Lexicon based approach.
7.2 Machine Learning based approach

As the evaluation metric, the research used accuracy. The batch size used in this paper is 64.
After running 3 epochs, the maximum accuracy obtained on the test set was 92.19%.

Epoch   Training Loss   Training Accuracy   Validation Loss   Validation Accuracy
1       0.4844          0.7592              0.2803            0.8594
2       0.3004          0.8792              0.2671            0.8906
3       0.2426          0.9074              0.2507            0.9219
Table 1: Result of ML based approach
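For reference, accuracy for a binary classifier with a sigmoid output can be computed as in the sketch below. The helper names, the 0.5 decision threshold and the sample values are assumptions for illustration and are not taken from the experiment.

```python
def to_labels(probabilities: list, threshold: float = 0.5) -> list:
    """Turn sigmoid outputs into 0/1 labels (the 0.5 threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative values only.
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.4, 0.7]
print(accuracy(y_true, to_labels(y_prob)))  # 3 of 4 correct = 0.75
```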
7.3 Lexicon based Approach

Model        Test Accuracy (%)
Basic        57.25
Normalized   58.98
Table 2: Result of lexicon based approach
The experiment was conducted on the SemEval-2013 [12] and Stanford Twitter Sentiment (STS) [13]
datasets, consisting of 14435 tweets. The dataset is split into training (8180) and test (3255)
data.
7.4 Naive Approach
For this research, polarity scores were generated from the Twitter timeline of U.S. President
Donald Trump. For simplicity, the project limits the number of posts retrieved. A sample of the
program output is given below:
Figure 7: Output result of Naive approach
CHAPTER 8
CONCLUSION
To sum up, opinion mining or sentiment analysis is now a large field of research, and many
researchers are trying to improve sentiment analysis accuracy. As the content on internet-based
platforms is increasing rapidly, automated processing of this data is urgently needed. It is
practically impossible for humans to read all of this content and take appropriate steps for each
item separately. This is where automatic opinion mining or sentiment analysis is needed.
In this research paper, three approaches to sentiment analysis are demonstrated.
The first is a Machine Learning based approach, in which this paper implemented a
Recurrent Neural Network with the LSTM architecture. The second is a naïve way of doing
opinion mining using two existing libraries, Tweepy and TextBlob. The last
is the lexicon-based approach, for which this paper has shown two mathematical models:
the basic and the normalized model.
BIBLIOGRAPHY
[1] Apoorv A., Boyi X. Ilia V., and Owen R., "Sentiment Analysis of Twitter Data", No date
found.
[2] Woldemariam, Y. (2016). Sentiment analysis in a cross-media analysis framework. 2016
IEEE International Conference On Big Data Analysis (ICBDA). doi:
10.1109/icbda.2016.7509790.
[3] Fan, X., Li, X., Du, F., Li, X., & Wei, M. (2016). Apply word vectors for sentiment
analysis of APP reviews. 2016 3rd International Conference On Systems And
Informatics (ICSAI). doi: 10.1109/icsai.2016.7811108
[4] Vanaja, S., & Belwal, M. (2018). Aspect-Level Sentiment Analysis on E-Commerce
Data. 2018 International Conference On Inventive Research In Computing Applications
(ICIRCA). doi: 10.1109/icirca.2018.8597286.
[5] Porntrakoon, P., & Moemeng, C. (2018). Thai Sentiment Analysis for Consumer’s
Review in Multiple Dimensions Using Sentiment Compensation Technique
(SenseComp). 2018 15th International Conference On Electrical
Engineering/Electronics, Computer, Telecommunications And Information Technology
(ECTI-CON). doi: 10.1109/ecticon.2018.8619892
[6] Bashri, M., & Kusumaningrum, R. (2017). Sentiment analysis using Latent Dirichlet
Allocation and topic polarity wordcloud visualization. 2017 5th International
Conference On Information And Communication Technology (ICoIC7). doi:
10.1109/icoict.2017.8074651
[7] (2019). Retrieved from https://cs224d.stanford.edu/reports/PouransariHadi.pdf.
[8] Rojas-Barahona, L. (2016). Deep learning for sentiment analysis. Language And
Linguistics Compass, 10(12), 701-719. doi: 10.1111/lnc3.12228
[9] Introduction to Recurrent Neural Network - GeeksforGeeks. (2019). Retrieved from
https://www.geeksforgeeks.org/introduction-to-recurrent-neural-network/
[10] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SentiWordNet 3.0:
An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of
LREC, volume 10, pages 2200–2204, 2010.
[11] Cataldo G., Giovanni S., Marco P., "A comparison of lexicon-based approaches for
sentiment analysis of microblog", Proceedings of the 8th International Workshop on
Information Filtering and Retrieval.
[12] Preslav Nakov, Zornitsa Kozareva, Alan Ritter, Sara Rosenthal, Veselin Stoyanov, and
Theresa Wilson. SemEval-2013 task 2: Sentiment analysis in Twitter. 2013.
[13] Alec Go, Richa Bhayani, and Lei Huang. Twitter sentiment classification using distant
supervision. CS224N Project Report, Stanford, pages 1–12, 2009.
[14] TextBlob: Simplified Text Processing — TextBlob 0.15.2 documentation. (2019).
Retrieved from https://textblob.readthedocs.io/en/dev/
[15] TextBlob, N. (2019). Natural Language Processing for Beginners: Using TextBlob.
Retrieved from https://www.analyticsvidhya.com/blog/2018/02/natural-language-
processing-for-beginners-using-textblob/
[16] Ding, J., Le, Z., Zhou, P., Wang, G., & Shu, W. (2009). An Opinion-Tree Based Flexible
Opinion Mining Model. 2009 International Conference On Web Information Systems
And Mining. doi: 10.1109/wism.2009.38
APPENDIX
CODE SAMPLES
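from keras import Sequential
from keras.layers import Embedding, LSTM, Dense

# Placeholder values (assumptions); use the vocabulary size and padded review length of the dataset.
vocabulary_size = 5000
max_words = 500
embedding_size = 32
model = Sequential()
# Map each word index to a dense 32-dimensional vector.
model.add(Embedding(vocabulary_size, embedding_size, input_length=max_words))
# A single LSTM layer with 100 units reads the embedded sequence.
model.add(LSTM(100))
# The sigmoid output gives the probability of the positive class.
model.add(Dense(1, activation='sigmoid'))
model.summary()
Figure 8: Code sample of RNN implementation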
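import re
from textblob import TextBlob

class TweetAnalyzer():
    def clean_tweet(self, tweet):
        # Replace @mentions, URLs and non-alphanumeric characters with spaces, then collapse whitespace.
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def analyze_sentiment(self, tweet):
        # Map the TextBlob polarity of the cleaned tweet to a class label: 1 positive, 0 neutral.
        analysis = TextBlob(self.clean_tweet(tweet))
        if analysis.sentiment.polarity > 0:
            return 1
        elif analysis.sentiment.polarity == 0:
            return 0
        else:
            # Assumed completion: the listing is cut off here in the source.
            return -1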
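The effect of the cleaning step can be checked without network access or TextBlob. The sketch below applies the same regular expression as `clean_tweet` to a made-up tweet; the standalone function and the sample text are illustrative assumptions.

```python
import re

# Same pattern as in clean_tweet: @mentions, non-alphanumeric characters, URLs.
TWEET_NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_tweet(tweet: str) -> str:
    """Replace mentions, URLs and punctuation with spaces, then collapse whitespace."""
    return " ".join(TWEET_NOISE.sub(" ", tweet).split())


sample = "@user Great day! http://t.co/abc #win"  # made-up tweet, for illustration
print(clean_tweet(sample))  # Great day win
```

Note that the pattern removes the `#` of a hashtag but keeps its word, so hashtag text still contributes to the polarity score.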