A Project-Based Seminar Report On Movie Rating Prediction System
Date: / / 2021
Place: Pune
This Project Based Seminar report has been examined by us as per the
Savitribai Phule Pune University, Pune requirements at Smt. Kashibai Navale
College of Engineering, Pune-41 on ________________
I am very thankful to all the teachers who provided valuable guidance
towards the completion of this seminar work on the Movie Rating Prediction and
Recommendation System. I express my sincere gratitude to the cooperative
department, which provided valuable assistance and resources for the
seminar work.
I am very grateful to Prof. S. M. Kamble for guiding me in the right manner,
resolving my doubts by giving her time whenever I required it, and sharing her
knowledge and experience in the making of this seminar work. I am also thankful
to the HOD of our Information Technology department, Prof. R. H. Borhade, for his
moral support and motivation, which encouraged me in making this seminar work.
The acknowledgment would be incomplete if I did not thank our Principal, Prof. Dr. A. V.
Deshpande, whose constant support and motivation have been highly
instrumental in making this seminar work.
List of Figures
Abbreviations
1. INTRODUCTION
Prediction
3.2 Algorithms
4. Conclusion
5. References
ABSTRACT
RF Random Forest
IMDb Internet Movie Database
DS Data Set
KNN K-Nearest Neighbours
Chapter 1
INTRODUCTION
Movies are people's favorite form of entertainment and have become an essential part of
our lives as a source of leisure and amusement. Beyond amusement, movies have evolved
into a tool for learning about different cultures throughout the world. A movie may even
be regarded as a work of art that people become passionate about. Film has become the
most important source of entertainment for people all over the world, regardless of their
backgrounds. Every year, a large number of films with a diverse range of genres,
storylines, and performers are released. Over the last five years, the United States and
Canada have released an average of 765 films per year. In 2019 alone, the number climbed
by 70 films compared to 2018. Given the vast number of films released, people need a
guideline or metric to determine whether a movie is good, so they don't waste their money
on bad films. People are frequently unsure which film to watch in their spare time.
Furthermore, watching a bad movie can sour the audience's mood.
The study can be utilized as a proof of concept for future applications, and it should
highlight some of the obstacles that must be solved in order to construct a successful
prediction model. This concept might theoretically be applied to credit ratings, the stock
market, or the housing market.
It is true that a good movie can change your mood, but it is also true that a bad one can
leave you feeling gloomy. Movies have become one of the most popular forms of
entertainment and an integral part of our lives as a medium of relaxation. Many people
simply search Google for a movie's reviews, read the first one, and start watching, only
to realize after half the movie or more that it is not what they wanted to watch. Finding
a good movie can be difficult, because not all movies are like Marvel's Avengers series or
the films directed by Zack Snyder. That is where our Movie Recommendation System comes
in. It studies your previous choices thoroughly and then suggests movies similar to the
ones you found amusing and rewatchable. Recommendations are based on the types of roles
the actors and actresses have played, the genre of the movie, and various other factors
that reflect the user's preferences.
Our aim is to create a system that saves users the time they spend searching for a movie
they may like. Users browse various websites and YouTube channels for reviews, spending
the time they had set aside for relaxation and for recharging their energy. Many people
like to watch a movie at the end of a hectic day, but sometimes end up watching movies far
outside the genres they like, or something they only heard about from someone else. We are
working on a system that uses Machine Learning to understand what type of movie you like
to watch. Specifically, it will recommend movies on the basis of your previously watched
movies, your preferred genres, your favorite actors, actresses, and directors, and many
other factors. It will not just save your time but also leave you refreshed. As Tom Hanks
(an American actor and filmmaker) once said, “At the end of the day it’s got to be a movie
which makes people think, ‘Hey, I couldn’t have spent my time any better.’”
Every year, hundreds of movies are produced and distributed. There are both excellent and
poor films among them. How, then, do we know their merits if we haven't seen the film?
And how do we pick a nice movie to watch on weekends to unwind and enjoy? Most of the
time, we base our decision on the film's score or a review. The IMDb website is a good
place to start. Because of its popularity, the IMDb website offers a wealth of information
about movies as well as fan feedback. IMDb's ratings are well known to the public, and they
reflect both the quality of the content and, to some extent, the audience's approval. In
this study, we therefore attempt to uncover the key factors affecting the IMDb score and
suggest an effective method for predicting it. The data in our study comes from Kaggle's
IMDb 5000 Movie Dataset. It comprises 28 variables for 5042 films and 4906 posters from 66
countries, spanning 100 years. There are 2399 different director names and thousands of
actors and actresses.
In this study, we aim to predict the IMDb score using the IMDb dataset. Cinema has a major
impact on our culture and is one of the most effective forms of mass communication on the
planet. It has the power to influence society on a local and global scale. Every year, a
wide range of films is produced. Some films depict historical events, some portray
culture, some offer fantasy, and others do a variety of other things. Finally, our
findings show that we attain a high level of IMDb score prediction accuracy on this
dataset.
It is true that a good movie can change your mood, but it is also true that a bad one can
leave you feeling gloomy. Movies have become one of the most popular forms of
entertainment and an integral part of our lives as a medium of relaxation. Many people do
not want to wait for a movie to be released and reviewed by those who have watched it, and
not all movies are like Marvel's Avengers series or Zack Snyder's films. What we need,
therefore, is a system that predicts the rating of a movie on the basis of the director's
previous movies, the types of roles the actors and actresses have played, the genres they
are best suited to, and various other factors. That is what we propose here: a way to
estimate the chances of success of a particular movie, which will save many people both
money and time.
The aim of this work is to create a Movie Rating Prediction System that predicts the
rating of a movie, helping the user decide whether to spend time on that movie or choose
another one. Instead of relying on a review from a single person, the system draws on the
previous experience of a number of people, using various algorithms and data sets.
The objectives of this work are to build a system that performs several tasks, such
as:
1. To predict the rating of movies from previous data sets.
2. To predict the success rate of a movie from its reviews.
3. To save the user's time and money.
4. To increase the efficiency of the.
In this section, I discuss various methodologies that researchers have proposed
for Movie Rating Prediction.
In earlier work, a neural network was employed to forecast the financial
performance of a film at the box office prior to its theatrical release. The
prediction task was framed as a nine-class classification problem, but the model
was described in only a few details. A. Sivasantoshreddy, P. Kasat, and A. Jain
used hype analysis to try to predict a movie's box office opening. Another study
attempted to improve movie gross forecasts through news analysis, using Lydia's
quantitative news data. It used two different models (regression and k-nearest
neighbor). However, only films with a large budget were considered. The model
failed when a common term was used as a movie's name, and it was unable to make a
forecast when there was no news about a movie.
M. H. Latif and H. Afzal used the IMDb database as their sole source of information,
and their data was not accurate. As they stated, it was inconsistent and
exceedingly noisy. As a result, they used measures of central tendency as a
standard for filling in missing values of the other attributes.
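The idea of filling in blanks with a measure of central tendency can be sketched as follows. This is a minimal illustration only: the exact procedure used by Latif and Afzal is not described here, and the helper `impute_missing`, the field names, and the sample records are all hypothetical.

```python
from statistics import mean, median, mode

def impute_missing(rows, column, strategy="median"):
    """Return copies of rows with None in `column` replaced by a central-tendency value."""
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        return [dict(r) for r in rows]  # nothing to estimate the fill value from
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Hypothetical records; titles, budgets, and genres are illustrative only.
movies = [
    {"title": "A", "budget": 10.0, "genre": "Drama"},
    {"title": "B", "budget": None, "genre": "Comedy"},
    {"title": "C", "budget": 30.0, "genre": None},
    {"title": "D", "budget": 50.0, "genre": "Drama"},
]

# Numeric attribute: fill with the median; categorical attribute: fill with the mode.
filled = impute_missing(movies, "budget", strategy="median")
filled = impute_missing(filled, "genre", strategy="mode")
```

The median is usually preferred over the mean for skewed attributes such as budgets, since a few very large values pull the mean upward.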
K. Jonas, N. Stefan, S. Daniel, and F. Kai used sentiment and social network
analysis to test their hypothesis, which was based on an analysis of the
intensity and positivity of IMDb's Oscar Buzz subforum. They took movie critics
into account as influencers, along with their predictions. They employed a
bag-of-words approach, which yielded erroneous results.
Chapter 3
SEMINAR RELATED OTHER TOPICS
The first step is to find a suitable sample dataset of movie data for analysis. Such data
must include general pre-production information on film projects, such as genre, language,
and the actors and directors involved. It must also include some measure of success, such
as user-generated movie ratings. Second, the dataset must be prepared and arranged so that
the data used is both representative of the overall movie landscape and suitable for
analysis with machine learning techniques and algorithms. Finally, the prediction
performance of the chosen machine learning algorithms must be assessed on this dataset.
This requires acquiring and configuring a set of tools capable of evaluating the
algorithms against one another on the same data, so that the measurements are comparable.
Appropriate metrics of prediction performance must also be identified in order to compare
the algorithms.
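The evaluation step can be sketched as a small harness that scores every model on the same held-out split with the same metrics. This is an illustrative sketch under stated assumptions, not the tooling used in this work: the helper names, the two baseline models, the choice of RMSE and MAE, and the synthetic data are all hypothetical.

```python
import math
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle once with a fixed seed so every model sees the same split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate(models, rows, target="rating"):
    """Fit each model on the training part and score it on the shared test part."""
    train, test = train_test_split(rows)
    y_test = [r[target] for r in test]
    report = {}
    for name, fit in models.items():
        predict = fit(train, target)
        y_pred = [predict(r) for r in test]
        report[name] = {"rmse": rmse(y_test, y_pred), "mae": mae(y_test, y_pred)}
    return report

# Two deliberately simple baselines (hypothetical stand-ins for real models).
def fit_global_mean(train, target):
    avg = sum(r[target] for r in train) / len(train)
    return lambda row: avg

def fit_genre_mean(train, target):
    overall = sum(r[target] for r in train) / len(train)
    by_genre = {}
    for r in train:
        by_genre.setdefault(r["genre"], []).append(r[target])
    means = {g: sum(v) / len(v) for g, v in by_genre.items()}
    return lambda row: means.get(row["genre"], overall)

# Synthetic data: the rating depends on the genre plus a little noise.
rng = random.Random(1)
base_rating = {"Drama": 7.0, "Comedy": 6.0, "Action": 5.5}
rows = []
for _ in range(200):
    genre = rng.choice(list(base_rating))
    rows.append({"genre": genre, "rating": base_rating[genre] + rng.uniform(-0.5, 0.5)})

report = evaluate({"global_mean": fit_global_mean, "genre_mean": fit_genre_mean}, rows)
```

Because the split and the metrics are fixed inside `evaluate`, differences between entries of `report` come from the models themselves and not from the measurement setup.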
Figure 3.1: Workflow of our System
3.2 Algorithms
To achieve the required accuracy, we will use some well-established algorithms that can
give accurate results on suitable data sets:
1. Random Forest
2. Gradient Boosting
3. K-Nearest Neighbours
Ensemble algorithms combine several methods, either of the same or of different types, to
classify objects. An example is running predictions using Naive Bayes, SVM, and Decision
Tree, and then voting on the final class for the test object. The random forest classifier
builds a set of decision trees, each from a randomly selected portion of the training set.
It then combines the votes from the individual decision trees to determine the test
object's final class. Random forests (RF), or random decision forests, are an ensemble
learning method for classification, regression, and other tasks. They work by constructing
a large number of decision trees at training time and outputting the class that is the
mode of the classes (in classification) or the mean prediction (in regression) of the
individual trees. RF is a step forward from a single decision tree: it corrects the
tendency of decision trees to overfit their training set.
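The bootstrap-and-aggregate idea behind random forests can be sketched in a few lines. This is a heavily simplified illustration, not a full random forest and not the implementation used in this work: each "tree" is a depth-1 regression stump on one randomly chosen feature, and the function names, feature meanings, and synthetic data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def majority_vote(labels):
    """Classification: the final class is the mode of the individual votes."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature):
    """Depth-1 regression tree: best single threshold on one feature (squared error)."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        if not left or not right:
            continue
        ml, mr = mean(left), mean(right)
        err = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    if best is None:  # feature is constant in this sample: predict the mean
        m = mean(y)
        return lambda row: m
    _, thr, ml, mr = best
    return lambda row: ml if row[feature] <= thr else mr

def fit_forest(X, y, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample and a random feature; average the outputs."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)          # random feature for this tree
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return lambda row: mean(tree(row) for tree in trees)  # regression: mean prediction

# Synthetic data: two hypothetical features (director score, lead actor score).
rng = random.Random(42)
X = [[rng.randint(0, 10), rng.randint(0, 10)] for _ in range(100)]
y = [4.0 + 2.0 * (d > 5) + 1.0 * (a > 5) + rng.uniform(-0.3, 0.3) for d, a in X]

predict = fit_forest(X, y)
high, low = predict([10, 10]), predict([0, 0])
```

Averaging many trees trained on different bootstrap samples is what reduces the variance of a single overfitted tree. A real random forest grows deeper trees and samples a feature subset at every split.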
K-Nearest Neighbors is one of the most basic yet essential classification algorithms in
Machine Learning. It belongs to the supervised learning domain and finds applications in
pattern recognition, data mining, and intrusion detection, among others. It is widely used
in real-world settings because it is non-parametric, meaning it makes no underlying
assumptions about the distribution of the data.
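A minimal sketch of K-Nearest Neighbours applied to rating prediction is given below. It uses the regression form of the algorithm (averaging the ratings of the k closest movies) with Euclidean distance; the function name `knn_predict`, the two features, and the sample values are hypothetical.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a rating as the mean rating of the k nearest training points."""
    distances = sorted(
        (math.dist(x, query), rating) for x, rating in zip(train_X, train_y)
    )
    nearest = [rating for _, rating in distances[:k]]
    return sum(nearest) / len(nearest)

# Hypothetical features per movie: [director score, lead actor score].
train_X = [[1, 1], [2, 1], [8, 9], [9, 8], [9, 9]]
train_y = [4.0, 5.0, 8.0, 8.5, 9.0]

strong_team = knn_predict(train_X, train_y, [8.5, 8.5], k=3)
weak_team = knn_predict(train_X, train_y, [1.5, 1.0], k=2)
```

No model is fitted in advance: all the work happens at query time, and no assumption is made about how the ratings are distributed. Because the method relies on distances, features should be brought to comparable scales before use.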
3.3 Future Work
In the future, we would like to expand both the number of movies and the number of features
in the dataset. Other social media sources for collecting movie data, such as Twitter and
YouTube, are also being considered. MLP and Bagging are two other supervised learning
models that we want to apply to the movie data. We would like to compare the findings of
these models to the ones presented here.
Chapter 4
CONCLUSION
The IMDb dataset is a fascinating one to study. After constructing the five models,
we found that the Random Forest model accurately captures the movie's features. In
comparison to prior studies, the success rate for all models is higher. The results are
superior to those produced by certain conventional libraries and related
investigations. The success of a film is not determined by film-related factors alone.
The number of people who watch a film is critical to its success. Because the entire
objective is to attract an audience, the industry would be meaningless if there were
no one to view its films.
Limitations:
Prediction becomes difficult when the data given to the system is new, that is, when it
cannot be related to any previously provided data. For example, if the cast or the
director is new, it is sometimes difficult to predict the results accurately.
REFERENCES
IEEE standard journal papers:
https://ieeexplore.ieee.org/document/1385466
https://www.researchgate.net/publication/222530390
https://developer.android.com/guide/topics/media/mediaplayer
https://builtin.com/data-science/random-forest-algorithm
https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01
https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/
https://www.imdb.com/