Literature Survey
Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology
In
By
Meghna Peethambaran
November 2017
FEDERAL INSTITUTE OF SCIENCE AND TECHNOLOGY (FISAT)
Mookkannor(P.O), Angamaly-683577
CERTIFICATE
This is to certify that the literature survey titled "Sentiment Analysis for predicting movie review"
is a bona fide work carried out by Meghna Peethambaran (14004078) in partial fulfilment of the requirements for the
award of Bachelor of Technology in Computer Science and Engineering from Mahatma Gandhi University,
Kottayam, Kerala during the academic year 2017-2018.
Place:
Date:
Movie reviews are assessments of the aesthetic, entertainment, social and cultural merits and significance
of a current film or video. Reviews tend to be short to medium-length articles, often written by a single
staff writer for a particular publication. For the film industry, online reviews from critical audiences play an
important role. On the one hand, good comments about a movie can attract more viewers. On
the other hand, good comments do not necessarily mean high box-office revenue, and vice versa. Although
reviews are usually fairly "quick takes" on a movie, they can, in some instances, be lengthy, substantive,
and very insightful. Here we developed a model to perform sentiment analysis on movie reviews
and predict whether a review is positive or negative.
ACKNOWLEDGEMENT
Apart from the efforts put in by us, the success of this project depends largely on the encouragement
and guidelines of many others. We take this opportunity to express our gratitude to the people who have
been instrumental in the successful completion of this project:
Mr. Paul Mundadan, Chairman, FISAT, who provided us with the vital facilities required by the
project right from inception to completion.
Dr. George Issac, Principal, FISAT, for the amenities he provided, which helped us in the fulfillment
of our project.
Dr. Prasad J.C, HOD (CSE Dept), FISAT, who always guided us and rendered his help in all phases
of our project.
Mr. Pankaj Kumar G, for his constant encouragement and enthusiastic supervision and for guiding
us with patience in all the stages. Without his help and inspiration, this would not have materialized.
Ms. Divya John, Ms. Reshmi R, Ms. Preethi N P and Mr. Paul P Mathai, for their guidance
and constant supervision, for providing necessary information regarding the project, and
for their support in completing the project.
The faculty of the CSE Dept., FISAT and the Lab Instructors, for providing us with the necessary lab
facilities and helping us throughout this project.
Our families, who inspired, encouraged and fully supported us in every trial that came our way. Also,
we thank them for giving us not just financial, but moral and spiritual support.
Meghna Peethambaran
Contents
List of Figures
1 Introduction
1.1 Overview
2 Related Works
2.1 Genre Specific Aspect Based Sentiment Analysis of Movie Reviews
2.1.1 Implementation
2.1.2 Performance calculation
2.1.3 Summary
2.2 Reduced Feature Based Sentiment Analysis on Movie Reviews Using Key Terms
2.2.1 Implementation
2.2.2 Learning Phase
2.2.3 Detection Phase
2.2.4 Summary
2.3 Feature Selection & Classification Algorithms
2.3.1 Implementation
2.3.2 Summary
2.4 Design Approach for Accuracy in Movies Reviews Using Sentiment Analysis
2.4.1 Implementation
2.4.2 Summary
2.5 Aspect Based Sentiment Analysis of Movie Reviews
2.6 Sentiment Analysis of Movie Reviews
2.7 Improvement of Sentiment Analysis based on Clustering of Word2Vec Features
2.7.1 Implementation
2.7.2 Summary
4 Conclusion
List of Figures
Chapter 1
Introduction
With the increasing popularity of social media sites such as Twitter and review sites like Yelp and Rotten
Tomatoes, it is important to be able to automatically make sense of these large amounts of subjective,
opinionated data. Sentiment analysis, which uses natural language processing and machine learning techniques
to characterize subjective human opinions or sentiments, has been rapidly gaining popularity as a method
of analyzing these large corpora for applications as diverse as predicting trends in the stock market
and characterizing diurnal and seasonal moods such as seasonal affective disorder. Most of the work to
date has focused on identifying how the presence of individual words in an excerpt, such as a tweet or movie
review, contributes to the sentiment of the entire excerpt (a so-called bag-of-words model).
1.1 Overview
Semantic analysis describes the process of understanding natural language, the way that humans communicate,
based on meaning and context. The semantic analysis of natural language content starts by
reading all of the words in the content to capture the real meaning of the text. It identifies the text elements
and assigns them to their logical and grammatical roles. It analyzes the context in the surrounding text and
the structure of the text to accurately disambiguate the proper meaning of words that have more
than one definition. Semantic technology processes the logical structure of sentences to identify the most
relevant elements in the text and understand the topic discussed.
Sentiment analysis [1] is a methodology for finding the sentimental orientation of a piece of
text. Using it, we can infer whether a particular person has conveyed a positive or negative sentiment
in the text under consideration. We tackled the issue of aspect based sentiment analysis of movie
reviews in our previous publication [2]. In it the concept of "driving factors" is used, which enhanced the
overall classification accuracy by amplifying the effect of certain movie aspects with respect to others. In
the current work, the same concept is used, but for reviews of different genres. Many researchers have
worked on aspect based analysis of reviews, be they movie or customer reviews, and many algorithms have
been developed for the purpose. However, not much work has been done on genre specific aspect based analysis.
Genre specific reviews demand special techniques during analysis, as such reviews contain sentences or
words whose meaning depends on the context, i.e. the genre, in which they are used.
Researchers have drawn on approaches from different fields such as information retrieval, natural
language processing, statistics, summarization, probability and machine learning, and different concepts
from these fields are applied for better opinion mining. The major steps involved in sentiment analysis include
data gathering, preprocessing, aspect identification, feature extraction and sentiment classification. An
opinion mining process can be carried out at different levels, such as document level, sentence level, phrase level,
tweet level or aspect level.
Chapter 2
Related Works
2.1 Genre Specific Aspect Based Sentiment Analysis of Movie Reviews
2.1.1 Implementation
The method aims at developing a lexicon based, aspect oriented analysis approach for genre specific reviews.
Fig 2.1 describes the flow of the proposed method. The dataset used here is the one that Mahesh Joshi,
Dipanjan Das, Kevin Gimpel, and Noah A. Smith used in their experiments. The dataset was in XML
format, and each file contained movie details such as the name of the movie, its genre and its date of release,
and also cited web links for obtaining the full reviews. Pre-processing was required because the dataset was
not in the required form. The links and the genre of each movie were extracted from the
dataset. Since the dataset didn't contain ratings, the MovieLens dataset was used to obtain them.
The next step was separating the review text into aspect specific text. The Aspect Based Text
Separator (ABTS) was used for this purpose. ABTS separates the review text into different groups based
on the movie aspects, using an aspect lexicon. The step after that was the classification
of these separated aspect texts. To account for context sensitive words, a genre specific lexicon
was developed. This lexicon contains words whose orientation depends on the genre in
which they are used. A list of the top 500 frequently used adjectives in everyday life was used to form the
lexicon. To assign an orientation to these words based on the movie genre, the methodology
of semantic orientation (SO) is used.
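The aspect separation step can be pictured with a minimal Python sketch. It is not the ABTS implementation from the surveyed paper: the aspect lexicon, the sentence splitter and the fallback "movie" bucket are all illustrative assumptions.

```python
import re

# Hypothetical aspect lexicon (aspect name -> cue words); illustrative only.
ASPECT_LEXICON = {
    "acting": {"acting", "actor", "actress", "performance", "cast"},
    "plot": {"plot", "story", "storyline", "script"},
    "music": {"music", "soundtrack", "score", "songs"},
    "direction": {"direction", "director", "directed"},
}


def separate_by_aspect(review, lexicon=ASPECT_LEXICON):
    """Group the sentences of a review under every aspect whose cue words they mention."""
    groups = {aspect: [] for aspect in lexicon}
    groups["movie"] = []  # assumed fallback bucket for sentences that name no aspect
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", review.strip()):
        if not sentence:
            continue
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        matched = [aspect for aspect, cues in lexicon.items() if tokens & cues]
        for aspect in matched or ["movie"]:
            groups[aspect].append(sentence)
    return groups
```

A sentence that mentions two aspects is placed in both groups here; whether the original separator does the same is not stated in the source.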
Before defining semantic orientation, Pointwise Mutual Information (PMI) is defined. The PMI between two
words is the amount of information that we acquire about the presence of one word when we observe the
other [14].
The formula for PMI is:
\[ \mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)} \]
The semantic orientation of a word is then obtained as
\[ \mathrm{SO}(word) = \mathrm{PMI}(word, X) - \mathrm{PMI}(word, Y) \]
Here X denotes a positively oriented word or string of words and Y denotes a negatively oriented
word or string of words. Thus we find the co-occurrence of the word with a positive word and with a
negative word, and subtract the PMI obtained with the negative word from the PMI obtained with the positive
word to get the overall orientation of the word. If a negative value is obtained, the overall SO is negative, which means
that the word under consideration occurs more closely with the negative word string; similarly, if the
result is positive, the word is more closely associated with the positive word string.
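As a rough illustration, the two quantities can be computed from hit counts as in the sketch below. The function names, the argument layout and the small smoothing constant (which avoids division by zero) are assumptions made for this example, not details taken from the surveyed paper.

```python
import math


def pmi(hits_xy, hits_x, hits_y, total):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with probabilities estimated from counts."""
    p_xy = hits_xy / total
    p_x = hits_x / total
    p_y = hits_y / total
    return math.log2(p_xy / (p_x * p_y))


def semantic_orientation(hits_word_pos, hits_word_neg, hits_pos, hits_neg, smoothing=0.01):
    """SO(word) = PMI(word, pos) - PMI(word, neg).

    The count of the word itself and the corpus size cancel in the difference, leaving
    log2( hits(word, pos) * hits(neg) / (hits(word, neg) * hits(pos)) ).
    The smoothing term is an assumption that guards against zero co-occurrence counts.
    """
    numerator = (hits_word_pos + smoothing) * hits_neg
    denominator = (hits_word_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)
```

A positive result indicates that the word co-occurs more with the positive string, a negative result that it co-occurs more with the negative string.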
The NEAR operator functionality was recreated programmatically using Boolean operators to work
on our dataset. To find the co-occurrence of two words, word1 and word2, we issued a Boolean query over our
dataset as: "word1 word2" OR "word2 word1" OR "word1 * word2" OR "word2 * word1". This
query covers all the cases for the relative positioning of the words. Here "*" represents a wildcard,
which means there can be one or more words between the two words.
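One way to approximate such a query over a list of review texts is a regular expression, as sketched below. The function name and the `max_gap` limit on the number of intervening words are assumptions; the source does not say whether the wildcard was bounded.

```python
import re


def near_hits(documents, word1, word2, max_gap=10):
    """Count documents in which word1 and word2 occur, in either order,
    with at most max_gap other words between them."""
    w1, w2 = re.escape(word1), re.escape(word2)
    # Zero to max_gap intervening words, then the separator before the second word.
    gap = r"(?:\W+\w+){0,%d}?\W+" % max_gap
    pattern = re.compile(
        r"\b(?:%s%s%s|%s%s%s)\b" % (w1, gap, w2, w2, gap, w1),
        re.IGNORECASE,
    )
    return sum(1 for doc in documents if pattern.search(doc))
```

The returned count plays the role of the hit count for the word pair in the PMI calculation.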
After the hit counts were collected, the SO of the adjective was calculated according to the given formula. The
SO values were normalised to lie between -1 and 1 before being stored in the
lexicon. This process was carried out for all 500 adjectives and, for each adjective, for every genre. The SO values for each
genre corresponding to each adjective were stored in a file, forming the genre specific lexicon. After
the review is passed through the ABTS, the separated aspect texts are forwarded to be scored by the
Genre Specific Lexicon Scorer (GSLS). Here the adjectives are extracted from the text and scored using
the lexicon. If an adjective is present in the lexicon, it is given the score corresponding to the genre of the review
under consideration. If the adjective is not present in the lexicon, it
is scored using the SentiWordNet dictionary. SentiWordNet contains sentiment scores, which are not
context specific, for a huge collection of adjectives, adverbs and nouns.
The effect of negations and intensifiers on the adjective being scored is also considered. Negations
are words that change the polarity of the adjective. Intensifiers are words that enhance the score
of the adjective. After all the adjectives have been scored, the average of the scores is
computed and assigned as the score for the aspect text. The next step is the application
of driving factors to these scores (DFM). Driving factors are used for amplifying the importance of
certain aspects in the overall classification process and also in the fine grained analysis of the review under
consideration. After the driving factors are applied, a summation step follows in which a
suitable threshold is set, the review score is compared with this threshold, and the review is classified
based on this comparison.
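The scoring pipeline described above can be sketched as follows. Every lexicon entry, intensifier weight, driving factor and the threshold value is a placeholder chosen for illustration, the small generic lexicon merely stands in for SentiWordNet, and lexicon membership replaces proper adjective extraction.

```python
# All values below are illustrative placeholders, not figures from the surveyed paper.
GENRE_LEXICON = {
    "horror": {"scary": 0.8, "predictable": -0.6},
    "comedy": {"scary": -0.4, "predictable": -0.3},
}
GENERIC_LEXICON = {"good": 0.6, "bad": -0.6, "dull": -0.5}  # stand-in for SentiWordNet
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}


def score_aspect_text(tokens, genre):
    """Average the scores of known adjectives, adjusting for a preceding intensifier or negation."""
    scores = []
    for i, token in enumerate(tokens):
        # Genre specific score first, generic score as the fallback.
        score = GENRE_LEXICON.get(genre, {}).get(token, GENERIC_LEXICON.get(token))
        if score is None:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        prev2 = tokens[i - 2] if i > 1 else ""
        if prev in INTENSIFIERS:
            score *= INTENSIFIERS[prev]
            prev = prev2  # lets "not very good" be negated as well
        if prev in NEGATIONS:
            score = -score
        scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def classify_review(aspect_tokens, genre, driving_factors, threshold=0.0):
    """Weight each aspect score by its driving factor, sum, and compare with a threshold."""
    total = sum(
        driving_factors.get(aspect, 1.0) * score_aspect_text(tokens, genre)
        for aspect, tokens in aspect_tokens.items()
    )
    return "positive" if total > threshold else "negative"
```

With these placeholder values, raising the driving factor of an aspect lets its score dominate the final label, which is the amplifying effect the text attributes to driving factors.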
2.1.3 Summary
Using the driving factors in the above methodology gave the following results. For the action genre,
plot, movie and direction were the most important factors, as these aspects had the highest values; for
comedy, acting, plot and movie; for crime, screenplay, music and plot; for drama, music,
movie and direction; and for horror, music, direction and movie. These results hold only for
the particular dataset under consideration.
2.2 Reduced Feature Based Sentiment Analysis on Movie Reviews Using Key Terms
2.2.1 Implementation
Movie reviews can be analyzed in many ways to extract useful information from them. It helps in opinion
summarization, extracting sentiment orientation of an author towards the movie, performance comparison
of multiple movies, identification of characters and storyline and so on. The focus is given on determining
whether the overall opinion of an author towards a movie is positive or negative in nature. Hence it can
be considered as a binary classification task. The proposed system for implementing this task is designed
as a supervised machine learning algorithm that uses some predefined lexicons and a set of key terms in
the document corpus for feature extraction. Hence it could be considered as a hybrid of machine learning
based and lexicon based approaches. Mainly there are two phases for this algorithm: Learning Phase and
Detection Phase.
In the Learning Phase, initially a set of positive and negative key terms are extracted from the
training data. Then a set of features are generated based on these terms. Additional features included
into the feature vector are SentiWordNet (SWN) based score and features based on a lexicon of positive
and negative words. This feature vector and the sentiment label of documents are used to train classifiers.
Detection phase uses the trained classifiers to classify the test data as positive or negative.
The input to the system is a set of training data labeled either positive or negative and a set of
unlabeled data for testing. The textual data extracted from a movie review website is unstructured and
must be converted into a form suitable for further processing.
Preprocessing and plot elimination are applied to both the labeled training documents for learning and the
unlabeled test documents for sentiment detection, as shown in figure 2.2. As part of the preprocessing phase,
some erroneous patterns that frequently occur in the data are removed through regular expression matching.
Some terms occur frequently in the data but do not contribute to the text
mining. Terms such as "the", "and", "does" and "of" are called stop words and are removed as part
of preprocessing. However, stop word removal is done only when required. Unlike the existing approaches
for sentiment analysis, a plot elimination phase is also incorporated.
It can be considered either a part of preprocessing or a separate phase.
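A minimal sketch of this kind of preprocessing is given below. The source does not list the erroneous patterns it removes, so stripping leftover markup is only an assumed example, and the stop list is deliberately tiny.

```python
import re

# A tiny illustrative stop list; a real system would use a fuller one.
STOP_WORDS = {"the", "and", "does", "of", "a", "an", "is", "was", "to", "in"}


def clean_review(text):
    """Remove assumed 'erroneous patterns': leftover markup and runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)  # markup remnants such as <br />
    text = re.sub(r"\s+", " ", text)      # collapse whitespace
    return text.strip()


def tokenize(text, remove_stop_words=False):
    """Lowercase and split into word tokens; stop words are dropped only on request."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens
```

Keeping stop word removal behind a flag mirrors the remark that it is applied only when required.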
By analyzing the movie reviews, it is found that the performance of a sentiment classification system
may improve by eliminating the portions of reviews that describe the story, thus retaining sentences
bearing actual opinion of the author. It is done in two phases:
a. Plot Elimination Phase I
C. Feature Extraction
A review cannot be given directly, as it is, to a machine learning algorithm. Hence each document
has to be transformed into a set of features. These features are selected in a logical manner such that
they hold the information relevant for classification and prediction. The experiments are done using three
types of features:
- Key term based features
- SentiWordNet based features
- Lexicon based features
2.2.4 Summary
Sentiment analysis on movie reviews is a challenging task because of the presence of plot descriptions
within the reviews. There was a considerable improvement in performance by eliminating those portions
of the review that describe the story line, thus enabling the sentiment classification algorithm to focus
on the relevant opinionated sentences. The study also introduced a novel set of features based on the
frequent N-grams used by authors to express their feelings in addition to a set of lexicon based features.
Elimination of plot and the reduced feature set made the proposed sentiment analysis system efficient in
terms of time and cost.
2.3 Feature Selection & Classification Algorithms
2.3.1 Implementation
A. Preprocessing of data
Preprocessing is the preparation of the dataset before any algorithm is applied to it. The data is stemmed
to remove the more common morphological endings from English words. Then stop-word removal is performed to
remove the most common words according to a stop list, reducing the size of the document. Part-of-speech
(POS) tagging is a processing technique in which each word is marked with its part of speech,
such as noun.
After the preprocessing, the next step was analyzing the data to find common observable patterns
that may affect the polarity of the document. To calculate the document polarity, it is necessary
to understand that the sentiment score of a word may be enhanced or diminished by its usage and by its
relationship with the nearby words. Once features have been identified from observation, the impact of each
feature on the polarity of the document needs to be found in order to set the scaling factor for that feature.
To find this impact, the Information Gain of each feature was computed and a feature ranking algorithm was
used to rank all the features.
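Ranking features by information gain can be sketched as follows, assuming each feature takes discrete values (one per document). The function names and the feature names in the usage note are hypothetical; the surveyed paper's own ranking algorithm is not described in the source.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting the documents on a feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for val, lab in zip(feature_values, labels) if val == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_features(features, labels):
    """features maps a feature name to its list of values; returns (name, gain) pairs, best first."""
    gains = {name: information_gain(values, labels) for name, values in features.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

For example, with labels `["pos", "pos", "neg", "neg"]`, a feature with values `[0, 0, 1, 1]` separates the classes perfectly and ranks above one with values `[0, 1, 0, 1]`, which carries no information about the label.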
Well known classifiers, namely Bagging, Random Forest, Decision Tree, Naive Bayes, K-Nearest Neighbor
and Classification via Regression, are used. The aim of classification in this methodology is
to train a model that predicts the class label of a movie review whenever one arrives.
2.3.2 Summary
In this work, new features that have a strong impact on determining the polarity of movie
reviews were extracted, and computational linguistic methods were applied for the preprocessing of the data.
Feature impact analysis was performed by computing the information gain of each feature in the feature
set, and the result was used to derive a reduced feature set. Among the six classification techniques,
the highest accuracy, 88.95%, was given by Random Forest.
2.4 Design Approach for Accuracy in Movies Reviews Using Sentiment Analysis
2.4.1 Implementation
This paper focuses on two areas: first, feature selection and ranking, and second, the use of machine learning
techniques. The polarity labels are assigned as follows: Strong Negative (-2), Weak Negative (-1),
Neutral (0), Weak Positive (1), Strong Positive (2). The work flow is shown in fig 2.8.
A. Input Data
The input data is in the form of reviews from the "Times of India" movie review dataset. A particular
movie is selected from the dataset and the reviews of that movie are displayed on a web page. After
the release of any new movie, its reviews are added to the dataset.
B. Pre-Processing
The text pre-processing techniques are divided into three subcategories (fig 2.9):
- Tokenization: The text of a document consists of blocks of characters called tokens.
The documents are split into tokens, which are used for further processing.
- Removal of Stop Words: Words that appear very often but carry no information for the
task are removed.
- Part of Speech Tagging: A POS tagger parses a sentence or document and tags each term with its part
of speech.
C. Text Transformation
In text transformation, the score of each sentence in the source document is calculated
as the sum of the weights of the terms in that sentence. The weight of each term is obtained
by multiplication, based on the adjectives extracted through part-of-speech tagging. The output of
the pre-processing step is given as input to the text transformation step.
D. Feature Extraction
In feature extraction, movie features are extracted from every sentence. To find the
polarity of a text document, it is necessary to understand how the sentiment score of a word depends on its
usage and on its relationship with the nearby words.
E. Feature Reduction Approach
One of the biggest problems in sentiment analysis is dealing with text data of very
high dimension, which may affect the performance of the classifier. So there is a need for a technique
that eliminates the features that are not relevant and keeps only the relevant ones. Information
Gain and Gain Ratio are the most popular among the feature reduction techniques.
Information Gain: Information Gain measures the importance of a feature in
decreasing the overall entropy. It is based on the entropy measure, which
indicates the impurity of the collected samples.
Gain Ratio: In Gain Ratio, the contribution of each feature is normalized before the classification of
the document.
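For reference, the standard textbook definitions of these two measures are given below; the surveyed paper's exact notation is not reproduced in the source, so the symbols are assumptions. Here S is the set of documents, p_c is the proportion of S belonging to class c, and S_v is the subset of S in which feature A takes the value v.

```latex
\begin{align*}
H(S) &= -\sum_{c} p_c \log_2 p_c \\
\mathrm{IG}(S, A) &= H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v) \\
\mathrm{SplitInfo}(S, A) &= -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \\
\mathrm{GainRatio}(S, A) &= \frac{\mathrm{IG}(S, A)}{\mathrm{SplitInfo}(S, A)}
\end{align*}
```

Dividing by the split information is the normalization step: it penalizes features that take many distinct values, which would otherwise receive a high information gain simply by fragmenting the data.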
F. Classification
A lexicon based approach, using SentiWordNet, is used for finding the overall polarity of movie reviews. The
classification is done with the Random Forest classifier, which learns the sentiment labels
and predicts the class of a movie review, positive or negative, whenever one arrives.
The reduced features are provided as input to the classification process, and the classification is based on the
number of positive and negative sentiments. The sentiments in each sentence are classified according to
polarity.
2.4.2 Summary
Many researchers have worked on the domain of movie reviews. Pang et al. used three classification
techniques, NB, ME and SVM, and achieved an accuracy of 82.90%. Prabowo
et al. used a hybrid SVM method for classification and achieved an accuracy of 87.3%. Rui Yao et al.
worked with the NB classification technique and achieved an accuracy of 78.75%. Mullen and Collier performed
sentiment classification using SVM and achieved an accuracy of 86%. Compared with these
previous models, the approach in the paper achieves a higher accuracy for the
classification of movie reviews.
albeit a sentence, belonging to positive or a negative class of reviews. The traditional method of training
and testing the classifier is applied. The output of the classifier is either 1 or -1, denoting that the input
text was of positive or negative orientation respectively. Instead of NB, any classifier, such as SVM, that
is able to clearly separate the text into two classes can be used. The aspect based output is multiplied by the
respective driving factor, according to the weightage of the driving factors of the movie.
C. Collecting Datasets
Ten reviews each for 100 movies were collected from the popular movie review database website www.imdb.com [15].
All the reviews were labelled manually to evaluate the performance of our algorithmic formulations. Out of the
1000 movie reviews collected, 760 are labeled positive and 240 are labeled negative.
D. Performance Evaluation
In order to evaluate the accuracy and performance of our algorithmic formulations, the standard performance
metrics of Accuracy, Precision, Recall and F-measure were computed.
The measure of Accuracy A used by us is:
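The formula used in the surveyed paper is not reproduced in the source. As a hedged illustration only, the conventional definitions of the four metrics can be computed as in the sketch below; the function name and the "positive"/"negative" label strings are assumptions, and the paper's own measure of A may differ.

```python
def evaluation_metrics(true_labels, predicted_labels, positive="positive"):
    """Conventional accuracy, precision, recall and F-measure for a binary task."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

On a dataset as imbalanced as the one described (760 positive against 240 negative reviews), accuracy alone can be misleading, which is one reason to report precision, recall and F-measure alongside it.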
2.7 Improvement of Sentiment Analysis based on Clustering of Word2Vec Features
2.7.1 Implementation
In this paper, a method for constructing a feature set is proposed to reduce the dimension of the Word2Vec
features for sentiment analysis. In particular, the terms in a vocabulary are clustered around
opinion words in order to distribute them based on polarity. It is hypothesized that such a method will
The vector representation of the corpus is learned using the Skip-gram technique of Word2Vec
[31], which models the probability distribution of terms. The terms in the vocabulary are clustered based on
their polarity in the distribution space. For each word in the dictionary that also appears in
the vocabulary, the associated Word2Vec vector from the earlier step is extracted.
To construct the clusters of terms in the vocabulary, the similarity between each term in the
vocabulary and all the words in the sentiment lexical dictionary, which are selected as the centroids of the
clusters, is calculated. Specifically, cosine similarity is used to measure the similarity between two vectors.
Each term is assigned to the cluster whose centroid is most similar to it. As a result, the terms in the
vocabulary are clustered around the opinion words in the dictionary. In sentiment analysis, reviews
or comments need to be classified by polarity, as positive or negative. Typically, the high
dimensional Word2Vec vectors are used as the features for the classification techniques.
The algorithm for the proposed method is given as:
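The algorithm itself is not reproduced in the source. The sketch below only illustrates the clustering step described in the text: each term is assigned to the opinion word whose vector has the highest cosine similarity. The function names are hypothetical, the vectors are plain Python lists rather than trained Word2Vec embeddings, and building a document vector by counting terms per cluster is an assumption about how the smaller document matrix could be formed.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_terms(term_vectors, opinion_words):
    """Assign every vocabulary term to the opinion word (centroid) most similar to it.

    Assumes at least one opinion word is present in term_vectors.
    """
    centroids = {w: term_vectors[w] for w in opinion_words if w in term_vectors}
    return {
        term: max(centroids, key=lambda w: cosine_similarity(vector, centroids[w]))
        for term, vector in term_vectors.items()
    }


def document_vector(tokens, assignment, centroid_order):
    """Represent a document by how many of its tokens fall into each cluster (an assumed scheme)."""
    counts = dict.fromkeys(centroid_order, 0)
    for token in tokens:
        cluster = assignment.get(token)
        if cluster in counts:
            counts[cluster] += 1
    return [counts[c] for c in centroid_order]
```

The resulting document vector has one dimension per opinion word rather than one per embedding dimension or vocabulary term, which is where the reduction in feature size comes from.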
2.7.2 Summary
A method is proposed to reduce the size of the Word2Vec feature set for sentiment analysis. The method
constructs clusters of terms centered on a set of opinion words from a sentiment lexical dictionary. A
simple transformation is applied to the negative term vectors to redistribute the terms in the space based
on their polarity. A much smaller matrix of document vectors is produced based on the set of clusters.
Two classifiers, namely Logistic Regression and Support Vector Machine (SVM), are used to compare the
performance of different feature sets for sentiment analysis. The performance of
the proposed method is reported to be encouraging, showing that it can be more effective and efficient than the baseline.
In the future, more investigation will be performed on Word2Vec in terms of perplexity.
Chapter 3
Sentiment analysis is tough because the same topic can be expressed in different ways. Also, words used
to express a positive sentiment in one statement can be negative in another. The movie reviews posted on the
internet do not follow a structured grammar, and ways of expressing opinions on a topic are never standardized;
one person's appreciation may differ from another's.
The problems in sentiment analysis of movie reviews can be described as follows:
It is the heart of sentiment analysis; every review statement contains sentiment words, which make
a major contribution to determining the polarity of the review. For example, in "The movie was good and
interesting", the sentiment words "good" and "interesting" tell us that the polarity of the review
is positive.
• Sarcasm
It is really difficult to know the tone of the author from textual sentences; we can't definitely say whether "bad"
means bad or good. For example, "The movie was supposed to be hilarious?"
• Parsing
What do the verb and/or adjective of a subjective or objective textual sentence really refer to?
• Scaling
What is the quantity of data input as a proportion of the total universe of users? 10% of the IMDB
corpus gives you a rough idea of what's going on, but the results are nowhere close to the resolution
you get with 50% of the reviews.
Chapter 4
Conclusion
Using the driving factors we were able to extract the most important aspects for the particular dataset
under consideration. Thus, by using this methodology, we can identify important aspects across various
datasets and across various genres. Using this knowledge, we might try to develop a fine grained
recommendation system that recommends movies to the user based not only on ratings, but also on the aspects
of the movies he likes. Also, instead of using it on movie reviews, we can use it on customer reviews
for business analytics and marketing of a product. Customers give their opinions about various
product aspects such as performance, usability, cost and build quality, thus creating a domain for
driving factor usage. The research could also have considerable application to reviews in Indian languages,
although further study is needed for a feasible application of this model to them. We have only used
the concepts of negation, intensifiers and a genre specific lexicon to induce some knowledge about inter-word
dependencies in the algorithm. Various other techniques, such as dependency trees and clause based scoring,
can be used for more detailed analysis.
Sentiment analysis on movie reviews is a challenging task because of the presence of plot descriptions
within the reviews. There was a considerable improvement in performance by eliminating those portions
of the review that describe the story line, thus enabling the sentiment classification algorithm to focus
on the relevant opinionated sentences. The study also introduced a novel set of features based on the
frequent N-grams used by authors to express their feelings in addition to a set of lexicon based features.
Elimination of plot and the reduced feature set made the proposed sentiment analysis system efficient in
terms of time and cost. More features could be incorporated into feature set because of its small size.
Features based on deep learning techniques could be adopted in future to further improve the sentiment
classification results. Similarly new context based features can also be added to the feature list. Context
is extracted in this method using the concept of N-grams.
[1] Vijay Parkhe and Bhaskar Bhiswas, "Genre Specific Aspect Based Sentiment Analysis of Movie Reviews"
[2] Qing Cao, Wenjing Duan and Qiwei Gan, "Exploring determinants of voting for the "helpfulness"
of online user reviews: A text mining approach"
[3] Rasika Wankhede and Prof. A. N. Thakare, "Design Approach for Accuracy in Movies Reviews Using
Sentiment Analysis"
[4] Sruthi S, Reshma Sheik and Ansamma John, "Reduced Feature Based Sentiment Analysis on Movie
Reviews Using Key Terms"
[5] V. K. Singh, R. Piryani, A. Uddin and P. Waila, "Sentiment analysis of movie reviews: A new feature-based
heuristic for aspect-level sentiment classification"
[6] Tirath Prasad Sahu and Sanjeev Ahuja, "Sentiment Analysis of Movie Reviews: A study on Feature
Selection & Classification Algorithms"
[7] Pang, B., Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization
with respect to rating scales. In Annual meeting-association for computational linguistics (Vol. 43, p.
115).
[8] https://machinelearningmastery.com/develop-word-embedding-model-predicting-movie-review-sentiment/
[9] https://nlp.stanford.edu/courses/cs224n/2012/reports/WuJean_PaoYuanyuan_224nReport.pdf
[10] http://www.expertsystem.com/natural-language-process-semantic-analysis-definition/
[11] https://machinelearningmastery.com/deep-learning-bag-of-words-model-sentiment-analysis/
[12] https://www.scribd.com/document/252659877/Sentiment-Analysis-of-Rotten-Tomatoes-for-Box-Office-Revenue-Prediction
[13] https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews
[14] Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., Lin, C.-J. (2008, June). Liblinear: A library
for large linear classification. J. Mach. Learn. Res., 9, 1871-1874.
[15] Peter D. Turney, Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised
Classification of Reviews, Proceedings of the 40th Annual Meeting of the Association for Computational
Linguistics (ACL), Philadelphia, July 2002, pp. 417-424.
[16] E. Nikolaidis, C. Sabo, J. A. R. Marshal, and A. Reina, Characterisation and upgrade of the communication
between overhead controllers and Kilobots, White Rose Research Online, Tech. Rep., April
2017.
[17] Eissa M. Alshari, Azreen Azman, Shyamala Doraisamy, Norwati Mustapha and Mustafa
Alkeshr, "Improvement of Sentiment Analysis based on Clustering of Word2Vec Features"