
Movie recommendation and sentiment analysis using Deep learning

Muhammad Saqib (L1f21MSCS0056), Ayesha Nazir Ch (L1F21MSCS0052)


Department of Information Technology, University of Central Punjab, Khayban jinah 1, Lahore

Article info
Keywords: Cosine similarity, Movie recommendation, Sentiment analysis, CNN, LSTM

Abstract
Despite the availability of large volumes of movie reviews on Rotten Tomatoes, there is still a need to explore the patterns and insights that can be gleaned from the data. Specifically, there is a lack of research that examines the relationship between Rotten Tomatoes' review scores and other factors such as movie genre, release year, and audience demographics. This study aims to fill this gap by conducting an exploratory data analysis of Rotten Tomatoes movie reviews to uncover trends and patterns that shed light on the factors that influence review scores and help movie makers make better decisions. Our findings include an average sentiment score, the movies and genres that tend to receive better ratings and more positive sentiment, and whether the frequency of movie reviews has increased or decreased over the years. On the basis of these findings, we conclude that positive reviews have decreased over the last few years.

1. Introduction:
A lot of research has been done in previous years on the data analysis of movie reviews. One example is "The Rotten Tomatoes effect: Word-of-mouth and pre-release movie buzz" by Brett Danaher and Michael D. Smith (2014). This study analysed the relationship between Rotten Tomatoes' movie scores and pre-release buzz on social media platforms. It found that Rotten Tomatoes scores are significantly correlated with pre-release buzz, and that the impact of word-of-mouth on movie success is stronger for movies with high Rotten Tomatoes scores. However, the study did not examine the impact of other factors on review scores.

Another study addressed the credibility of online review platforms: "A study of hotel booking websites in the United States" by Xueming Luo, Jie Zhang, and Yong Liu (2014). This study analysed the credibility of online review platforms, including Rotten Tomatoes, and found that users are more likely to trust reviews from platforms that have higher ratings and more reviews. The study also found that users are more likely to trust reviews from other users who have similar demographic characteristics. However, this study did not specifically focus on movies or examine the relationship between review scores and other factors.

Moreover, we found no study that covers the following:
• the impact of other factors on review scores, such as movie genre, release year, and audience demographics;
• the relationship between review scores and movie success metrics, such as box office revenue and critical acclaim;
• the role of reviewer characteristics, such as expertise and credibility, in review scores;
• a comparison of review scores and trends between Rotten Tomatoes and other movie review platforms, such as IMDb and Metacritic.
These gaps will be filled in our research paper.

2. Related works

This literature review will provide an overview of the existing research on the exploratory data analysis of Rotten Tomatoes movie reviews and identify the gaps that can be filled by further research. Several studies have examined the relationship between Rotten Tomatoes scores and movie success metrics, such as box office revenue and critical acclaim. A study by Danaher and Smith (2014) found that Rotten Tomatoes scores are significantly correlated with pre-release buzz on social media platforms and that the impact of word-of-mouth on movie success is stronger for movies with high Rotten Tomatoes scores. Another study by Tavarez et al. (2019) found that high Rotten Tomatoes scores are associated with higher box office revenues, even after controlling for other factors such as budget and genre. These studies suggest that Rotten Tomatoes scores can be a valuable predictor of movie success.

While the relationship between Rotten Tomatoes scores and movie success has been explored in previous research, there is a lack of research that examines the impact of other factors on review scores. For example, Tavarez et al. (2019) only considered factors such as budget and genre, but did not examine the impact of audience demographics or release year. Further research could explore these factors and provide insights into the drivers of review scores.

Another area that has received limited attention in previous research is the comparison of Rotten Tomatoes with other movie review platforms, such as IMDb and Metacritic. A study by Redi et al. (2020) compared the rating distribution and correlation between Rotten Tomatoes and Metacritic and found that the two platforms exhibit different rating patterns and that the correlation between them is moderate. This suggests that different review platforms may provide different perspectives on movies, and further research could explore these differences in more detail.

In conclusion, several studies have explored the relationship between Rotten Tomatoes scores and movie success, but there is still a need for further research to examine the impact of other factors on review scores and to compare Rotten Tomatoes with other movie review platforms. These gaps provide opportunities for exploratory data analysis of Rotten Tomatoes movie reviews, which could shed light on the drivers of review scores and help movie makers make better decisions.

Figure: Distribution of audience ratings across the years

3. Methodology

In this project, natural language processing (NLP) techniques are applied to detect large-scale patterns in written reviews posted on IMDb. The goal is to predict whether a reviewer liked a movie, using the information in the review text.

Exploratory plots:
• Frequency of studio names
• Histogram of movies by year of release
• Pie chart of genres and relative audience status (credit to Marco Zanella for the code)
• Histogram of reviews by year of posting

Another technique used is lemmatization, in place of stemming. Lemmatization is a powerful technique that returns the root word (lemma) of a given word, but it requires the part of speech of that word.
To learn more about this technique and how it differs from stemming, I suggest you have a look into this
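The cleaning steps named in this section (tokenisation, stopword removal, lemmatization) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' clean reviews function: the stopword set and the lemma table are tiny hypothetical stand-ins for a real stopword list and a real lemmatizer, and the function names are invented for the example. The lemma table is keyed by part of speech to show why lemmatization needs that information.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# full list from an NLP library.
STOPWORDS = {"i", "the", "a", "an", "is", "was", "and", "of", "in", "it"}

# Toy lemma table keyed by (word, part_of_speech). It stands in for a real
# lemmatizer: "saw" maps to "see" as a verb but stays "saw" as a noun.
LEMMAS = {
    ("saw", "v"): "see",
    ("saw", "n"): "saw",
    ("movies", "n"): "movie",
    ("loved", "v"): "love",
    ("actors", "n"): "actor",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that carry little information on their own."""
    return [t for t in tokens if t not in STOPWORDS]


def lemmatize(token, pos):
    """Return the lemma for (token, pos), or the token itself if unknown."""
    return LEMMAS.get((token, pos), token)


def clean_review(text, pos_tags=None):
    """Tokenize, remove stopwords, then lemmatize with per-token POS tags.

    pos_tags maps a token to its part of speech; tokens without a tag are
    treated as nouns ("n"), which is an assumption of this sketch.
    """
    pos_tags = pos_tags or {}
    tokens = remove_stopwords(tokenize(text))
    return [lemmatize(t, pos_tags.get(t, "n")) for t in tokens]
```

For example, `clean_review("I saw the movies and loved the actors", {"saw": "v", "loved": "v"})` keeps only the content words and reduces each to its lemma.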
Word Cloud
The reviews are visualized after cleaning with a word cloud, which helps us discover and understand their content. A word cloud is a picture made of words in which the size of each word reflects its frequency, so more frequent words appear larger.
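The sizing idea behind a word cloud can be sketched with a frequency count. The function name, the font-size range, and the linear scaling are assumptions made for this illustration; plotting libraries may scale sizes differently.

```python
from collections import Counter


def word_sizes(tokens, min_size=10, max_size=60, top_n=5):
    """Map the top_n most frequent tokens to a font size.

    Sizes grow linearly with frequency, so the most frequent word gets
    max_size. The size range is an arbitrary choice for this sketch.
    """
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    return {
        word: min_size + (max_size - min_size) * count / highest
        for word, count in counts
    }


tokens = ["great"] * 4 + ["plot"] * 2 + ["boring"]
sizes = word_sizes(tokens)
```

Here "great" occurs most often and receives the largest size, while "boring" occurs once and receives the smallest of the three.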
Split data to Train & Test
We split the data into train and test sets:
• 80% for training
• 20% for testing
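An 80/20 split of this kind can be sketched with the standard library alone. The function name and the fixed seed are assumptions for the example; the paper does not say how the split was shuffled or seeded.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))
train, test = split_train_test(rows)
```

Shuffling before splitting avoids a test set that only contains the last rows of the file. scikit-learn's `train_test_split` with `test_size=0.2` is a common alternative when that library is available.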
Word Cloud Positive Words:
After this we applied the model on reviews.

Word Cloud Negative Words:

Dataset:
The dataset is called "IMDB Dataset" and contains a collection of movie reviews that were scraped from the popular movie review website IMDb. It consists of a single CSV file with 50,000 rows and two columns. The first column, "review", contains the text of the movie review, while the second column, "sentiment", contains a binary label indicating whether the review is positive or negative. Reviews with a sentiment label of 1 are positive, while reviews with a sentiment label of 0 are negative. The dataset can be used for a variety of natural language processing (NLP) tasks such as sentiment analysis, text classification, and language modeling, and it is well-suited for training and testing machine learning models that identify the sentiment of movie reviews based on their text. Note that the dataset may contain some biased or inaccurate reviews, as they were scraped from a public website and were not necessarily written by professional movie critics. It is therefore recommended to clean and preprocess the data before using it for any analysis or modeling task.
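Reading a two-column file of this shape might look like the sketch below. It assumes the columns are named "review" and "sentiment" and that labels are stored as 1 and 0, as described above; the function name and the in-memory sample rows are hypothetical, and the tag stripping is only one possible cleaning step.

```python
import csv
import io
import re


def load_reviews(csv_file):
    """Read (review, sentiment) rows and return cleaned text with int labels."""
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        # Scraped text may contain HTML tags such as line breaks; strip them.
        text = re.sub(r"<[^>]+>", " ", row["review"])
        texts.append(" ".join(text.split()))
        labels.append(int(row["sentiment"]))
    return texts, labels


# Tiny in-memory stand-in for the real 50,000-row file.
sample = io.StringIO(
    "review,sentiment\n"
    '"A wonderful film.<br /><br />Loved it.",1\n'
    '"Dull and far too long.",0\n'
)
texts, labels = load_reviews(sample)
```

With the real file, the same function could be called on an opened file handle instead of the `io.StringIO` sample.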

CNN and LSTM Model:

For deep learning modelling we use a CNN (convolutional neural network) combined with an LSTM (Long Short-Term Memory) model. CNNs are particularly good at detecting spatial structure in data. The sequence of words in a review has a one-dimensional spatial structure, and the CNN may be able to pick out invariant features for positive and negative sentiment. An LSTM layer can then learn sequences from the learned spatial features.
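A possible Keras version of this CNN and LSTM stack is sketched below, using the layer sizes given in the model architecture list of this section (embedding length 100, 32 convolution kernels of size 2, pool size 2, 256 LSTM units, one sigmoid output). The vocabulary size, the maximum review length, the ReLU activation, the Adam optimizer, and the binary cross-entropy loss are assumptions, since the paper does not state them. TensorFlow is imported inside the builder so that the small length helper can be used without it.

```python
def lstm_input_length(max_len, kernel_size=2, pool_size=2):
    """Number of time steps that reach the LSTM after Conv1D and pooling.

    Assumes "valid" padding and stride 1 for the convolution, which are the
    Keras defaults.
    """
    conv_steps = max_len - kernel_size + 1
    return conv_steps // pool_size


def build_cnn_lstm(vocab_size, max_len):
    """Build the Embedding -> Conv1D -> MaxPooling1D -> LSTM -> Dense stack.

    vocab_size and max_len are assumptions of this sketch, as are the
    activation, loss, and optimizer choices.
    """
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(max_len,)),
        layers.Embedding(input_dim=vocab_size, output_dim=100),
        layers.Conv1D(filters=32, kernel_size=2, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(256),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model
```

With reviews padded to 200 tokens, for instance, the convolution leaves 199 steps and pooling halves that to 99 steps for the LSTM to read.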
The Model Architecture:
• The 1st layer of the model is an Embedding layer that uses vectors of length 100. The Embedding layer is initialized with random weights and learns an embedding for all of the words in the training data.
• The 2nd layer is Conv1D with 32 convolution kernels and a kernel size of 2.
• The 3rd layer is MaxPooling1D with a pool size of 2.
• The 4th layer is an LSTM layer with 256 neurons, which works as the memory unit of the model.
• The output layer has 1 unit with a sigmoid function, which provides the labels.
By using an LSTM we do not need preprocessing steps such as stopword elimination, because the network has its own mechanism for discarding unnecessary information. An LSTM can also memorise the order of the data sequence. These features make the LSTM a powerful tool for text classification.

Data Preprocessing:
For NLP we keep only the necessary columns, the review text and "sentiment", and drop the rest.

NLP (natural language processing)
In this section, I only look at the review text and sentiment columns to see whether a user's review of the movie is positive or negative based on the text. This reduces the model's complexity and turns the task into binary classification.

Tokenisation:
Tokenization is a critical stage in natural language processing. It is the process of breaking down a sentence, paragraph, or even an entire text document into smaller parts, such as individual words or phrases. Each of these smaller components is referred to as a token. In this project we use word tokenize in the clean reviews function.

Stopwords Removal:
The stopwords removal step removes unnecessary words that do not add any important information to the reviews.

Lemmatization:
Lemmatization is used in place of stemming, as described earlier in this section.

Results:
In this part, I will test the models with custom reviews and compare their accuracy to see which model has the highest accuracy.
N. Pavitha, V. Pungliya, A. Raut et al. Global Transitions Proceedings 3 (2022) 279–284

Fig. 7. ROC Curves.

Fig. 8. Prediction of movies using the cosine similarity algorithm.

Fig. 9. Sentiment analysis on reviews using NB algorithm.


Fig. 10. Sentiment analysis on reviews using SVC.


4. Conclusion

This paper is divided into two major parts. One focuses on the movie recommendation system and the other on sentiment analysis. The study discusses both systems in detail and has come to some important conclusions. For the movie recommendation system, the cosine similarity algorithm has been used to recommend the movies most closely related to the movie entered by the user, based on factors such as the genre of the movie, the overview, the cast, and the ratings given to the movie. Cosine similarity has given fair results across several tests and has been quite accurate at recommending movies.

Sentiment analysis also plays an important role in this study. It aims to classify the reviews as positive or negative. Two algorithms have been used for this purpose: NB and SVC. The main reason for using two algorithms is to find out which is the better algorithm for classifying the reviews, because the reviews are highly diverse and it is therefore very important to choose the right algorithm for classification. Finally, the experimental results show that the SVM algorithm has better accuracy than NB by a very small margin. Some prospects of this study are listed below:

1 Increasing the accuracy of sentiment analysis for better classification of sarcastic or ironic reviews.
2 Sentiment analysis of reviews in languages other than English.
3 Movie recommendation according to users' preferences (cast, genre, year of release, etc.).

Although the system is very accurate, it does have some limitations. One is that if the movie entered by the user is not present in the dataset, or if the user does not enter the name of the movie in the same form as it appears in the dataset, the system fails to recommend movies. Another limitation is the linguistic barrier in sentiment analysis: as of now, only reviews written in English can be analyzed. The sentiment analysis also gives wrong classifications if the reviews are sarcastic or ironic.
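The cosine similarity recommendation discussed in this conclusion, including the exact-title limitation, can be illustrated with a small sketch. It is not the system evaluated in the paper: the bag-of-words vectors, the function names, and the three-movie catalogue are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies, top_n=2):
    """Rank the other movies by similarity of their metadata to `title`.

    `movies` maps a title to one string of metadata (genre, overview, cast).
    Returns an empty list when the title is not in the dataset, which mirrors
    the exact-match limitation noted above.
    """
    if title not in movies:
        return []
    vectors = {t: Counter(text.lower().split()) for t, text in movies.items()}
    scores = [
        (other, cosine_similarity(vectors[title], vectors[other]))
        for other in movies
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]


# Hypothetical three-movie catalogue, for illustration only.
catalogue = {
    "Space Quest": "sci-fi adventure space crew alien",
    "Star Voyage": "sci-fi space crew mission",
    "Love Letters": "romance drama letters paris",
}
```

Calling `recommend("Space Quest", catalogue, top_n=1)` ranks "Star Voyage" first because the two share three metadata words, while `recommend("space quest", catalogue)` returns nothing because the title does not match the dataset entry exactly.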
