
A Project-Based Seminar Report

On

“Movie Rating Prediction and Recommendation System”


Submitted to
Savitribai Phule Pune University, Pune
In partial fulfillment for the award of the Degree of
Bachelor of Engineering
in
Information Technology
by

Kunal Bhatt (T150368571 / T20242 & Div-II)

Under the guidance of


Prof. S. M. Kamble

Department of Information Technology


STES’s, Smt. Kashibai Navale College of Engineering,
Vadgaon (BK), Pune, 411 041.
2020-2021 (SEM-II)
CERTIFICATE
This is to certify that the project-based seminar report entitled “Movie Rating Prediction
and Recommendation System” being submitted by Kunal Bhatt (T150368591 /
T20242 (Div-II)) is a record of bonafide work carried out by him under the supervision
and guidance of Prof. S. M. Kamble in partial fulfillment of the requirement for TE
(Information Technology) 2015 course of Savitribai Phule Pune University, Pune in the
academic year 2020-2021

Date: / / 2021

Place: Pune

Prof. S. M. Kamble, Guide
Prof. R. H. Borhade, Head of the Department, IT
Dr. A. V. Deshpande, Principal, SKNCOE, Pune

This Project Based Seminar report has been examined by us as per the
Savitribai Phule Pune University, Pune requirements at Smt. Kashibai Navale
College of Engineering, Pune-41 on ________________

Internal Examiner External Examiner


ACKNOWLEDGEMENT

I am very thankful to all the teachers who provided valuable guidance towards the
completion of this seminar work on the Movie Rating Prediction and Recommendation
System. I express my sincere gratitude to the department for its cooperation and for
providing the assistance and resources required for the seminar work.
I am very grateful to Prof. S. M. Kamble for guiding me in the right manner, clearing my
doubts by giving her time whenever I required it, and sharing her knowledge and
experience during this seminar work. I am also thankful to the HOD of our Information
Technology department, Prof. R. H. Borhade, for his moral support and motivation, which
encouraged me in this seminar work.
The acknowledgment would be incomplete if I did not thank our Principal, Prof. Dr. A. V.
Deshpande, whose constant support and motivation have been highly instrumental in this
seminar work.

(Students Name & Signature)


TABLE OF CONTENTS

Abstract
List of Figures
Abbreviations

1. INTRODUCTION
1.1 Introduction to Project topic
1.2 Motivation behind Project topic
1.3 Aim and Objective(s) of the Project work
1.4 Project Title

2. BACKGROUND STUDY OF Movie Rating Prediction
2.1 Introduction to Seminar Topic
2.2 Motivation behind Seminar topic
2.3 Aim and Objective(s) of the Seminar work
2.4 Literature Survey

3. SEMINAR RELATED OTHER CHAPTERS
3.1 Proposed Methodology
3.2 Algorithms
3.2.1 Random Forest
3.2.2 Gradient Boost
3.2.3 K-Nearest Neighbors
3.3 Future Work

4. Conclusion

5. References
ABSTRACT

A movie's rating is an important indicator of its quality: it summarises how well all the
elements of the movie work together. People often use ratings as a reference when
deciding whether to watch a movie. Predicting a movie's rating before its release is
important for keeping the rating objective. Much existing research failed to address this
problem because it used post-release elements, such as social media comments, to predict
movie ratings. Another problem is that the predicted rating is often not intended for a
general audience: several studies used collaborative filtering, which produces ratings
tailored to specific people. To address these limitations, this study used the historical
values of a movie as features. Historical values are generated from pre-release elements
of the movie: they are derived from the relations between movies that share attributes
such as actors, director, genres, content rating, and production companies. Using
historical values, objective predictions can be made even before a movie is released.
The proposed method is intended to make more accurate and more general predictions of
movie ratings. In this study, the use of historical features with a convolutional neural
network (CNN) as the model showed promising results.

Keywords: Movie rating, rating prediction, historical values, convolutional neural
network, CNN
LIST OF FIGURES
ACRONYMS

RF Random Forest
IMDB Internet Movie Database
DS Data Set
KNN K-Nearest Neighbours
Chapter 1
INTRODUCTION

1.1 Introduction to Project

Movies are among people's favorite forms of entertainment and have become an essential
part of our lives as a source of leisure and amusement. Beyond entertainment, movies have
evolved into a tool for learning about different cultures throughout the world. A movie
may even be regarded as a work of art that people become passionate about. Cinema has
become the most important source of entertainment for people all over the world,
regardless of their backgrounds. Every year, a large number of films with a diverse range
of genres, storylines, and performers are released. Over the last five years, an average
of 765 films per year were released in the United States and Canada. In 2019 alone, the
number of releases was 70 higher than in 2018. Given the vast number of films released,
people need a guideline or metric to determine whether a movie is good, so that they
don't waste their money on bad films. People are frequently unsure of which film to watch
in their spare time. Furthermore, watching a bad movie can spoil the audience's mood.

The study can be utilized as a proof of concept for future applications, and it should
highlight some of the obstacles that must be solved in order to construct a successful
prediction model. This concept might theoretically be applied to credit ratings, the stock
market, or the housing market.

A recommendation system addresses the problem of finding suitable items for a consumer
without having to search through a large number of options. Although preferences differ
from one person to the next, they do follow patterns. Recommendation systems are software
tools and techniques that provide recommendations based on a person's preferences in
order to find new, relevant content for them. By assessing past browsing history,
comments assigned to products, and other user behavior, a recommendation system generates
suggestions for customers. There are two main types of recommendation algorithms:
user-based and item-based.
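
As an illustration of the user-based variant, the following minimal sketch predicts a
missing rating as a similarity-weighted average of other users' ratings. The toy
`ratings` table, the function names, and the choice of cosine similarity are assumptions
made for this example and are not taken from the report.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "B": 4},  # has not rated movie C yet
}

def cosine(a, b):
    """Cosine similarity over the movies that both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict_user_based(user, movie):
    """Similarity-weighted average of other users' ratings of `movie`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[movie]
        den += abs(sim)
    return num / den if den else None  # None: nobody has rated the movie

print(predict_user_based("u4", "C"))
```

An item-based variant would apply the same idea to movies instead of users: it would
compare the rating columns of two movies and weight the target user's own past ratings
by movie-to-movie similarity.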
1.2 Motivation Behind Project Topic

It is true that a good movie can lift your mood, but it is also true that a bad one can
leave you feeling gloomy. Movies have become one of the most popular forms of
entertainment and an integral part of our lives as a medium of relaxation. Many people
simply search Google for a movie's reviews, read the first one, and start watching, only
to realize after watching more than half of the movie, or all of it, that it is not what
they wanted to watch. Finding a good movie can be difficult, because not all movies are
like Marvel's Avengers series or the films directed by Zack Snyder. This is where our
Movie Recommendation System comes in. It studies your previous choices and then suggests
movies similar to the ones you found enjoyable and rewatchable. The system recommends
movies based on the types of roles the actors and actresses have played, the genre of the
movie, and various other factors that reflect the user's preferences.

1.3 Aim and Objectives of the Work

Our aim is to create a system that saves the time users spend searching for a good movie.
Users visit various websites and YouTube channels to find reviews of movies they might
like, and in doing so they use up the time they had set aside for relaxation and for
recharging their energy. Many people like to watch a movie at the end of a hectic day,
but they sometimes end up watching movies far outside the genres they like, or something
they merely heard about from someone else. We are working on a system that uses Machine
Learning to understand what type of movies you like to watch. Specifically, it will
recommend movies on the basis of your previously watched movies, the genres you like the
most, your favorite actors, actresses, and directors, and many other factors. It will not
just save your time but will also refresh you. As Tom Hanks (an American actor and
filmmaker) once said, “At the end of the day it’s got to be a movie which makes people
think, ‘Hey, I couldn’t have spent my time any better.’ ”
The objectives of this work are to build a system that performs the following tasks:

1. To recommend movies using previous data sets.
2. To predict whether or not you may like a movie.
3. To save the user's time and money.
4. To improve the accuracy of the recommendations each time the user asks for a movie
recommendation.
5. To recommend movies using data from different users with similar movie choices.

Figure 1.3: Movie Recommendation System.

1.4 Project Title

Title: Movie Rating Prediction and Recommendation System


Chapter 2
Background Work of Movie Rating Prediction System

2.1 Introduction to Seminar Topic

Every year, hundreds of movies are produced and distributed. There are both excellent and
poor films among them. How, then, do we judge their merits if we haven't seen them? And
how do we pick a good movie to watch on the weekend to unwind? Most of the time, we base
our decision on the film's score or on a review. The IMDb website is a good place to
start. Because of its popularity, IMDb offers a wealth of information about movies as
well as audience feedback. IMDb's ratings are well known to the public, and they reflect
both the quality of the content and, to some extent, the audience's approval. In this
study, we therefore attempt to uncover the key factors that affect the IMDb score and to
suggest an effective method for predicting it. The data in this report comes from
Kaggle's IMDb 5000 Movie Dataset. It comprises 28 variables for 5042 films and 4906
posters from 66 countries, spanning 100 years. There are 2399 different director names
and thousands of actors and actresses.

In this study, we aim to predict movie ratings using the IMDb dataset. Cinema has a major
impact on our culture. It is one of the most effective forms of mass communication on the
planet and has the power to influence society on a local and global scale. Every year, a
wide range of films are produced. Some depict historical events, some shape culture, some
provide fantasy, and still others do a variety of other things.

Historical values are one of the study's distinguishing features. These features are
derived from the link between a film and previously released films. The rating predicted
from these historical values is expected to be far more objective than the audience's
reaction when the film is first released. Ning et al. refer to this method as cohort
rating prediction. The cohort rating approach looks for movies that are comparable in
terms of historical attributes and values, and then predicts a rating based on those
similarities. For instance, a film directed by David Leitch and starring Vin Diesel would
be expected to receive a rating comparable to those of the earlier films they were
involved in.
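
One way such historical values could be computed is sketched below: for a new film, each
feature is the mean rating of earlier films that share the same director or actors. The
record layout, the field names, and the fallback to the overall mean for unseen people
are assumptions made for this illustration, not details given in the report.

```python
from statistics import mean

# Hypothetical records of previously released films.
released = [
    {"title": "F1", "director": "D1", "actors": ["A1", "A2"], "rating": 7.0},
    {"title": "F2", "director": "D1", "actors": ["A2"], "rating": 6.0},
    {"title": "F3", "director": "D2", "actors": ["A1"], "rating": 8.0},
]

def past_ratings(films, field, value):
    """Ratings of earlier films whose `field` equals or contains `value`."""
    found = []
    for film in films:
        entry = film[field]
        if entry == value or (isinstance(entry, list) and value in entry):
            found.append(film["rating"])
    return found

def historical_features(new_film, films):
    """Mean past rating per attribute; unseen people fall back to the overall mean."""
    overall = mean(f["rating"] for f in films)
    by_director = past_ratings(films, "director", new_film["director"])
    by_actors = [r for actor in new_film["actors"]
                 for r in past_ratings(films, "actors", actor)]
    return {
        "director_hist": mean(by_director) if by_director else overall,
        "actor_hist": mean(by_actors) if by_actors else overall,
    }

print(historical_features({"director": "D1", "actors": ["A1"]}, released))
```

The resulting feature dictionary can then be passed to any regression model. Because it
depends only on films released earlier, it is available before the new film has any
audience ratings of its own.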

We carry out an exploratory analysis of the data and discover some interesting patterns
that help us improve our prediction technique as well as learn about the factors that
influence the IMDb score of a movie.

Finally, our findings show that we attain a high level of IMDb score prediction accuracy
on this dataset.

2.2 Motivation Behind Seminar Topic

It is true that a good movie can lift your mood, but it is also true that a bad one can
leave you feeling gloomy. Movies have become one of the most popular forms of
entertainment and an integral part of our lives as a medium of relaxation. Many people
don't want to wait for a movie to be released and reviewed by those who have watched it,
and not all movies are like Marvel's Avengers series or Zack Snyder's films. What we need
is a system that predicts the rating of a movie on the basis of the director's previous
movies, the types of roles the actors and actresses have played, the genres they are best
suited to, and various other factors. That is what we propose here: a way of estimating a
particular movie's chances of success, which will save many people both money and time.

2.3 Aim and Objectives of the Work

The aim of this work is to create a Movie Rating Prediction System that predicts the
rating of a movie, helping the user decide whether to spend time on that movie or choose
another one. Instead of relying on a review from one particular person, the system bases
its prediction on the experience of a number of people, using various algorithms and data
sets of their previous ratings.

The objectives of this work are to build a system that performs the following tasks:
1. To predict the rating of movies using previous data sets.
2. To predict the success rate of a movie from its reviews.
3. To save the user's time and money.
4. To increase the efficiency of the.

2.4 Literature Survey

In this section, I discuss various methodologies that researchers have proposed
for movie rating prediction.

A neural network has been employed to forecast the financial performance of a film
at the box office prior to its release in theatres. The prediction task was framed
as a nine-class classification problem. The model was described in only limited
detail. A. Sivasantoshreddy, P. Kasat, and A. Jain used hype analysis to try to
predict a movie's box office opening. Another attempt to improve movie gross
prediction relied on news analysis, using Lydia's quantitative news data. It used
two different models (regression and k-nearest neighbor). However, only films with
a large budget were considered. The model failed when a movie's name was a common
term, and it was unable to make a forecast when there was no news about a movie.

M. H. Latif and H. Afzal used the IMDb database as their sole source of
information, and their data was not accurate. As they themselves stated, their
data was inconsistent and exceedingly noisy. As a result, they used measures of
central tendency as a standard for filling in missing values of attributes.
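
Filling missing values with a measure of central tendency can be sketched as follows.
This is only an illustration of the general idea, using the median for a numeric
attribute and the mode for a categorical one. The column names and values are
hypothetical, and the exact procedure used by Latif and Afzal is not described here.

```python
from statistics import median, mode

# Hypothetical rows with missing values marked as None.
rows = [
    {"budget": 10.0, "genre": "Action"},
    {"budget": None, "genre": "Drama"},
    {"budget": 30.0, "genre": None},
    {"budget": 20.0, "genre": "Drama"},
]

def impute(rows):
    """Fill numeric gaps with the median and categorical gaps with the mode."""
    budgets = [r["budget"] for r in rows if r["budget"] is not None]
    genres = [r["genre"] for r in rows if r["genre"] is not None]
    fill_budget, fill_genre = median(budgets), mode(genres)
    return [
        {
            "budget": r["budget"] if r["budget"] is not None else fill_budget,
            "genre": r["genre"] if r["genre"] is not None else fill_genre,
        }
        for r in rows
    ]

for row in impute(rows):
    print(row)
```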
K. Jonas, N. Stefan, S. Daniel, and F. Kai used sentiment and social network
analysis to test their hypothesis, which was based on an analysis of the
intensity and positivity of IMDb's Oscar Buzz subforum. They took into account
movie critics, as influencers, and their predictions. They employed a
bag-of-words approach that yielded erroneous results.
Chapter 3
SEMINAR RELATED OTHER TOPICS

3.1 Proposed Methodology

The first step is to find a suitable sample dataset of movie data for analysis. The data
must include general pre-production information on film projects, such as genre,
language, and the actors and directors involved. It must also include some measure of
success, such as user-generated movie ratings. Second, the dataset must be prepared and
arranged so that it is both representative of the overall movie landscape and suitable
for analysis with machine learning techniques and algorithms. Finally, the prediction
performance of the chosen machine learning algorithms must be assessed on this dataset.
This requires acquiring and configuring a set of tools that can evaluate the algorithms
against one another on the same data while keeping the measurements comparable.
Appropriate performance metrics must also be identified in order to compare the
algorithms on their prediction performance.
Figure 3.1: Workflow of our System
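
The evaluation step described above can be sketched as a small harness that scores every
algorithm on the same cross-validation folds with the same metric. The choice of RMSE and
k-fold cross-validation, the two placeholder models, and the toy data are assumptions
made for this illustration. The report does not specify which metric or tooling is used.

```python
from math import sqrt
from statistics import mean

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def k_fold_indices(n, k):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

def cross_val_rmse(fit, X, y, k=3):
    """Average RMSE of `fit` over k folds; the folds are identical for every model."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [model(X[i]) for i in test]
        scores.append(rmse([y[i] for i in test], preds))
    return mean(scores)

# Two placeholder "algorithms" with a common interface: fit(X, y) -> predict(x).
def fit_mean(X, y):
    overall = mean(y)
    return lambda x: overall

def fit_genre_mean(X, y):
    overall = mean(y)
    by_genre = {}
    for x, t in zip(X, y):
        by_genre.setdefault(x["genre"], []).append(t)
    table = {g: mean(v) for g, v in by_genre.items()}
    return lambda x: table.get(x["genre"], overall)

# Hypothetical toy data: one feature (genre) and a rating per film.
X = [{"genre": g} for g in ["Action", "Drama"] * 3]
y = [6.0, 8.0, 6.2, 7.8, 5.8, 8.2]

for name, fit in [("overall mean", fit_mean), ("genre mean", fit_genre_mean)]:
    print(name, round(cross_val_rmse(fit, X, y), 3))
```

Any model that follows the same `fit(X, y)` interface can be passed to `cross_val_rmse`,
which keeps the comparison between algorithms on an equal footing.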

3.2 Algorithms

To achieve the required accuracy, we will use algorithms that are known to give accurate
results on data sets of this kind.

The following algorithms will be used for prediction:

1. Random Forest
2. Gradient Boost
3. K-Nearest Neighbours

3.2.1 Random Forest

Ensemble algorithms combine several methods, of the same or of different types, to
classify objects. An example is running predictions with Naive Bayes, SVM, and a Decision
Tree and then voting on the final class for the test object. A random forest classifier
builds a set of decision trees, each from a randomly selected subset of the training set.
It then combines the votes of the individual decision trees to determine the test
object's final class. Random Forest (RF), or random decision forest, is an ensemble
learning method for classification, regression, and other tasks. It works by constructing
a large number of decision trees at training time and outputting the mode of the classes
(in classification) or the mean prediction (in regression) of the individual trees. RF is
a step forward from a single decision tree: it corrects the tendency of decision trees to
overfit their training set.
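
The idea can be illustrated with a deliberately simplified forest for regression. Each
"tree" below is only a one-split stump trained on a bootstrap sample and a randomly
chosen feature, and the forest averages the stump outputs. Real random forests grow much
deeper trees. The feature layout (a director and an actor historical rating) and all
numbers are hypothetical.

```python
import random
from statistics import mean

def fit_stump(X, y, feature):
    """Best single-threshold split on one feature, minimising squared error."""
    best = None
    values = sorted({x[feature] for x in X})
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for x, t in zip(X, y) if x[feature] <= thr]
        right = [t for x, t in zip(X, y) if x[feature] > thr]
        ml, mr = mean(left), mean(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # the feature is constant in this bootstrap sample
        constant = mean(y)
        return lambda x: constant
    _, thr, ml, mr = best
    return lambda x: ml if x[feature] <= thr else mr

def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples with a random feature; average their outputs."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random feature for this tree
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return lambda x: mean(tree(x) for tree in trees)  # regression: mean prediction

# Hypothetical features per film: [director historical rating, actor historical rating].
X = [[5.0, 5.5], [5.5, 5.0], [6.0, 6.5], [7.5, 7.0], [8.0, 8.5], [8.5, 8.0]]
y = [5.2, 5.4, 6.1, 7.4, 8.3, 8.1]

forest = fit_forest(X, y)
print(forest([5.0, 5.0]), forest([8.5, 8.5]))
```

For classification, the final line of `fit_forest` would return the most common class
among the trees instead of the mean.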

3.2.2 Gradient Boost

Extreme Gradient Boosting (XGBoost) is one implementation of Gradient Boosting. It
differs from plain Gradient Boosting in that it controls overfitting by using a more
regularised model formulation, which helps it predict more accurately. The name XGBoost,
however, refers to the engineering goal of pushing the limit of computational resources
for boosted tree algorithms, which is one of the reasons why so many people use it. It
might be more appropriate to refer to the model as regularised gradient boosting.
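
The following sketch shows the core loop of gradient boosting for squared error on a
single feature: each round fits a small stump to the current residuals and adds a
shrunken copy of it to the model. The `lam` parameter shrinks every leaf value, which is
a simplified stand-in for the kind of regularisation described above. It is not the
XGBoost library or its full algorithm, and the data and parameter values are
hypothetical.

```python
from statistics import mean

def fit_residual_stump(x, residuals, lam):
    """One-split stump on a single feature with L2-shrunk leaf values."""
    def leaf(values):
        # For squared error, an L2 penalty lam shrinks the leaf to sum / (n + lam).
        return sum(values) / (len(values) + lam)

    best = None
    points = sorted(set(x))  # assumes at least two distinct feature values
    for lo, hi in zip(points, points[1:]):
        thr = (lo + hi) / 2
        left = [r for v, r in zip(x, residuals) if v <= thr]
        right = [r for v, r in zip(x, residuals) if v > thr]
        wl, wr = leaf(left), leaf(right)
        sse = sum((r - wl) ** 2 for r in left) + sum((r - wr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, wl, wr)
    _, thr, wl, wr = best
    return lambda v: wl if v <= thr else wr

def fit_boosting(x, y, n_rounds=50, learning_rate=0.3, lam=1.0):
    """Start from the mean rating, then repeatedly fit stumps to the residuals."""
    base = mean(y)
    stumps = []

    def predict(v):
        return base + learning_rate * sum(s(v) for s in stumps)

    for _ in range(n_rounds):
        residuals = [t - predict(v) for v, t in zip(x, y)]
        stumps.append(fit_residual_stump(x, residuals, lam))
    return predict

# Hypothetical data: a director's historical rating vs. the film's actual rating.
x = [5.0, 5.5, 6.0, 7.5, 8.0, 8.5]
y = [5.2, 5.4, 6.1, 7.4, 8.3, 8.1]

model = fit_boosting(x, y)
print(model(5.0), model(8.5))
```

A smaller `learning_rate` or a larger `lam` makes each round more conservative, trading
a slower fit on the training data for less risk of overfitting.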

3.2.3 K-Nearest Neighbours

K-Nearest Neighbours is one of the most basic yet essential classification algorithms in
Machine Learning. It belongs to the supervised learning domain and finds applications in
pattern recognition, data mining, and intrusion detection, among others. It is widely
used in real-world settings because it is non-parametric, meaning that it makes no
underlying assumptions about the distribution of the data.
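
For rating prediction, the same idea can be used for regression: the predicted rating is
the mean rating of the k most similar films. The sketch below assumes Euclidean distance
and k = 3, and the two-feature layout and numbers are hypothetical.

```python
from math import dist
from statistics import mean

def knn_predict(train_X, train_y, query, k=3):
    """Mean rating of the k training films closest to `query` (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return mean(rating for _, rating in nearest)

# Hypothetical features per film: [director historical rating, actor historical rating].
train_X = [[5.0, 5.5], [5.5, 5.0], [6.0, 6.5], [7.5, 7.0], [8.0, 8.5], [8.5, 8.0]]
train_y = [5.2, 5.4, 6.1, 7.4, 8.3, 8.1]

print(knn_predict(train_X, train_y, [8.2, 8.2]))
```

Because the model stores the training data and defers all work to prediction time, no
distributional assumption is needed. The features should, however, be on comparable
scales so that no single one dominates the distance.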
3.3 Future Work

In the future, we would like to expand both the number of movies and the number of
features in the dataset. We are also considering other social media sources of movie
data, such as Twitter and YouTube. MLP and Bagging are two other supervised learning
models that we want to apply to the movie data. We would like to compare the results of
these models with the ones presented here.
Chapter 4
CONCLUSION

The IMDb dataset is a fascinating one to study. After constructing the five models,
we found that Random Forest represents the movies' features accurately. In
comparison to prior studies, the success rate of all models is higher. The results
are also better than those obtained with certain conventional libraries and in
related studies. The success of a film is not determined by film-related factors
alone. The number of people who watch a film is critical to its success. Because
the whole objective is to attract an audience, the entire industry would be
meaningless if there were no one to watch its films.

Limitations:

Prediction becomes difficult when the data given to the system is new, that is, when
it cannot be related to any previously provided data. For example, if the cast or the
director is new, it is sometimes difficult to predict the result accurately.
REFERENCES

Journal papers (IEEE standard):

1. P. Chaovalit and L. Zhou, “Movie review mining: a comparison between supervised
and unsupervised classification approaches,” in Proceedings of the Hawaii
International Conference on System Sciences (HICSS), 2005.
https://ieeexplore.ieee.org/document/1385466

2. R. Sharda and D. Delen, “Predicting box-office success of motion pictures with
neural networks,” Expert Systems with Applications, vol. 30, no. 2, pp. 243–254, 2006.
https://www.researchgate.net/publication/222530390

Names of Websites referred

https://developer.android.com/guide/topics/media/mediaplayer

https://builtin.com/data-science/random-forest-algorithm

https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01

https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/

https://www.imdb.com/
