Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:07/July-2021 Impact Factor- 5.354 www.irjmets.com

MOVIE RECOMMENDATION SYSTEM APPROACH USING CLASSIFICATION


TECHNIQUES
Yeole Madhavi B.*1, Rokade Monika D.*2, Khatal Sunil S.*3
*1PG Student, Department Of Computer Engineering, SPCOE, Dumbarwadi (Otur),
Pune, Maharashtra, India.
*2,3Assistant Professor, Department Of Computer Engineering, SPCOE, Dumbarwadi (Otur),
Pune, Maharashtra, India.
ABSTRACT
Movie recommendation is one of the hottest topics in the domain of Recommender Systems. In order to
improve user experience and reduce movie searching / exploring time, a better movie recommendation is
preferred. Websites like Netflix, Amazon Prime, etc. use movie recommendation to improve their revenue or
profits. Some recommender systems use collaborative filtering technique whereas some recommender systems
use content filtering technique. Major factor for decision making is the ratings given by the users to the movies
which they have watched. The project proposes to get the movie recommendations based on binary
classification techniques. Users can get user personalized movie recommendations by using this technique.
Multiple Machine Learning and/or Deep Learning algorithms have been used in order to get recommendations.
A lot of analysis is done in order to see if the recommendations are appropriate or not.
Keywords: Movie Recommendation, Classification Techniques, Cold Start Problem.
I. INTRODUCTION
Due to abundance of information collected till 21st century and the increasing rate of information flowing over
the internet, there is a lot of confusion related to what to consume and what not to consume. Even on YouTube,
when you want to watch a video of a particular concept, generally, there are a lot of videos available out there for
you. Now, since the results are ranked appropriately, there may not be much issue but what if the results were
not ranked appropriately? Well, in that case, we would probably spend a lot of time to find the best possible
video which suits us and satisfies our need. This recommendation results are when you search something on a
website. Next time, when you visit a particular website, without even searching, sometimes the system is able to
show you recommendations which you might like. Isn’t this an interesting feature? So, basically, the job of a
recommender system is to suggest the most relevant items to the user. Recommendation systems are used in
YouTube for video recommendation, Amazon and Flipkart for product recommendation, Netflix and Amazon
Prime for movie recommendation, and so on. Whatever you do on such websites, there is a system which see
your behavior and then ultimately suggest things / items with which you are highly likely to engage. This
research paper deals with movie recommendations and logic behind movie recommendation system, traditional
movie recommendation systems, issues related to traditional movie recommendation systems, and a proposed
solution for Artificial Intelligence based personalized movie recommendation system. A lot of famous movie
recommendation related datasets are already available on Kaggle and other websites. Some of the famous
datasets include Movielens dataset, TMDB Movie Dataset, and the dataset by Netflix itself. Websites like Netflix,
Amazon Prime, etc. use movie recommendation to increase their revenue or profits by ultimately improving the
user experience. In fact, there was a competition conducted by Netflix in the year 2009 with a prize money of
nearly 1 million dollars ($1M) for making at least 10% improvement in the existing system.
As dealt earlier, we have a lot of data available at our exposure and we need to filter the data in order to consume
it because generally we are not interested in each and everything available to us. In order to filter the data, we
need some filtering techniques. There are different types of filtering techniques or movie recommendation
algorithms over which a recommendation system can be based upon.
Major filtering techniques or movie recommendation algorithms are as follows:
1. Content Based Filtering
2. Collaborative Filtering
www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science
[1767]
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:07/July-2021 Impact Factor- 5.354 www.irjmets.com
3. Hybrid Filtering
Some of these techniques can be further broken into subparts.
II. LITERATURE SURVEY
Sang-Min Choi, et. al. [1] mentioned about the shortcomings of collaborative filtering approach like sparsity
problem or the cold-start problem. In order to avoid this issue, the authors have proposed a solution to use
category information. The authors have proposed a movie recommendation system which is based on genre
correlations. The authors stated that the category information is present for the newly created content. Thus,
even if the new content does not have enough ratings or enough views, still it can pop up in the
recommendations list with the help of category or genre information. The proposed solution is unbiased over the
highly rated most watched content and new content which is not watched a lot. Hence, even a new movie can be
recommended by the recommendation system.
George Lekakos, et. al. [2] proposed a solution of movie recommendation using hybrid approach. The authors
stated that Content based filtering and Collablorative filtering have their own shortcomings are can be used in a
specific situation. Hence, the authors have come up with a hybrid approach which takes into consideration both
content-based filtering as well as collaborative filtering. The solution is implemented in 'MoRe' which is a movie
recommendation system. For the sake of pure collaborative filtering, Pearson correlation coefficient has not been
used. Instead, a new formula has been used. But this formula has an issue of 'divide by zero' error. This error
occurs when the users have have given same rating to the movies. Hence, the authors have ignored such users. In
case of pure content-based recommendation system, the authors have used cosine similarity by taking into
consideration movie writers, cast, directors, producers and the movie genre. The authors have implemented a
hybrid recommendation method by using 2 variations - 'substitute' and 'switching'. Both of these approaches
show results based on collaborative filtering and show recommendations based on content-based filtering when
a certain criterion is met. Hence, the authors use collaborative filtering technique as their main approach.
Debashis Das, et. al. [3] wrote about the different types of recommendation systems and their general
information. This was a survey paper on recommendation systems. The authors mentioned about Personalized
recommendation systems as well as non-personalized systems. User based collaborative filtering and item based
collaborative filtering was explained with a very good example. The authors have also mentioned about the
merits and demerits of different recommendation systems.
Jiang Zhang, et. al. [4] proposed a collaborative filtering approach for movie recommendation and they named
their approach as 'Weighted KM-Slope-VU'. The authors divided the users into clusters of similar users with the
help of K-means clustering. Later, they selected a virtual opinion leader from each cluster which represents the
all the users in that particular cluster. Now, instead of processing complete user-item rating matrix, the authors
processed virtual opinion leader-item matrix which is of small size. Later, this smaller matrix is processed by the
unique algorithm proposed by the authors. This way, the time taken to get recommendations is reduced.
S. Rajarajeswari, et. al. [5] discussed about Simple Recommender System, Content-based Recommender System,
Collaborative Filtering based Recommender System and finally proposed a solution consisting of Hybrid
Recommendation System. The authors have taken into consideration cosine similarity and SVD. Their system
gets 30 movie recommendations using cosing similarity. Later, they filter these movies based on SVD and user
ratings. The system takes into consideration only the recent movie which the user has watched because the
authors have proposed a solution which takes as input only one movie.
Muyeed Ahmed, et. al. [6] proposed a solution using K-means clustering algorithm. Authors have separated
similar users by using clusters. Later, the authors have created a neural network for each cluster for
recommendation purpose. The proposed system consistes of steps like Data Preprocessing, Principal Component
Analysis, Clustering, Data Preprocessing for Neural Network, and Building Neural Network. User rating, user
preference, and user consumption ratio have been taken into consideration. After clustering phase, for the
purpose of predicting the ratings which the user might give to the unwatched movies, the authors have used
neural network. Finally, recommendations are made with the help of predicted high ratings.
Gaurav Arora, et. al. [7] have proposed a solution of movie recommendation which is based on users' similarity.
The research paper is very general in the sense that the authors have not mentioned the internal working details.
www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science
[1768]
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:07/July-2021 Impact Factor- 5.354 www.irjmets.com
In the Methodology section, the authors have mentioned about City Block Distance and Euclidean Distance but
have not mentioned anything about cosine similarity or other techniques. The authors stated that the
recommendation system is based on hybrid approach using context based filtering and collaborative filtering but
neither they have stated about the parameters used, not they have stated about the internal working details.
V. Subramaniyaswamy, et. al. [8] have proposed a solution of personalized movie recommendation which uses
collaborative filtering technique. Euclidean distance metric has been used in order to find out the most similar
user. The user with least value of Euclidean distance is found. Finally, movie recommendation is based on what
that particular user has best rated. The authors have even claimed that the recommendations are varied as per
the time so that the system performs better with the changing taste of the user with time.
Harper, et. al. [9] mentioned the details about the MovieLens Dataset in their research paper. This dataset is
widely used especially for movie recommendation purpose. There are different versions of dataset available like
MovieLens 100K / 1M / 10M / 20M / 25M /1B Dataset. The dataset consists of features like user id, item id /
movie id, rating, timestamp, movie title, IMDb URL, release date, etc. along with the movie genre information.
III. PROPOSED METHODOLOGY
The proposed recommendation system takes into consideration binary classification techniques in order to get
recommendations. Binary Classification can be done using multiple algorithms like Logistic Regression, Support
Vector Machines (SVM) or Support Vector Classifier (SVC), Multilayer Perceptron, etc.
The system can work in the following 2 ways:
1. User needs to enter the movies which he has watched and rated 5 stars, 4 stars, 3 stars, 2 stars, and 1
star. Now, the corresponding movie ID (item ID) and corresponding ratings will be added to the dataframe.
Along with the movie ID and ratings, we need to add the user ID as well. Now, selection of the user ID is done
carefully such that it does not match with any of the existing user IDs. Based on the information feeded by the
user to the system, the recommendations will be shown. These recommendations will not consist of the movies
which are already watched by the user.
User needs to enter the user ID only. Now, this user ID should be a valid user ID. Valid user ID means the user
ID which is present in the dataset. In case, if the user ID is present in the dataset, then it is obvious that the user
ID is present in the dataframe which is created by using ‘pandas’ library. Every user ID has some history of the
movies watched by that particular user. Based on the user history associated with the particular user ID, the
recommendations will be shown. These recommendations will not consist of the movies which are already
watched by the user.
A) Architecture

Fig. 1. System Architecture

www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science


[1769]
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:07/July-2021 Impact Factor- 5.354 www.irjmets.com
B) Dataset & Preprocessing
Movielens 100K Dataset has been used for this research work. It consistes of a lot of attributes which are useful
for movie recommendation purpose. Some of the attriutes are as follows:
‘user_id’: It consists of User ID. Each user has a unique user ID.
‘item_id’: It consists of Item ID or Movie ID. There are nearly 1682 item IDs in the Movielens 100K dataset. Each
Item ID is indicating a unique movie. In other words, each movie has a unique Item ID.
‘rating’: It indicated the rating which the user has given to the movie which he/she has watched. Rating can be 1
(indicating 1 star) or 2 (indicating 2 star) or 3 (indicating 3 star) or 4 (indicating 4 star) or 5 (indicating 5 star).
‘timestamp’: It indicates the timestamp at which a particular rating has been given to a particular movie by a
particular user.
Minimum number of features available to proceed with this technique is 3 which are as follows:
1. User ID
2. Item ID / Movie ID
3. Rating

Fig. 2. Glimpse of User ID, Item ID, Rating, and Timestamp


MovieLens Dataset provides with a lot of additional information. We can use it for getting better results.
Ratings can be encoded in the following manner:
[1] Type 1 Binary Ratings:
4 star and 5 star ratings will be indicated by '1'.
1 star, 2 star, and 3 star ratings will be indicated by '0'.
[2] Type 2 Binary Ratings:
5 star ratings will be indicated by '1'.
1 star, 2 star, 3 star, and 4 star ratings will be indicated by '0'.

www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science


[1770]
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:07/July-2021 Impact Factor- 5.354 www.irjmets.com

Fig 3. Glimpse of User ID, Item ID, Rating, Timestamp and Type 1 Binary Ratings
In the above figure, you can see that the ratings are manipulated and converted into binary ratings such that 1
star, 2 star, and 3 star ratings are indicated by 0. Moreover, the 4 star and 5 star ratings are indicated by 1.

Fig. 4. Glimpse of User ID, Item ID, Rating, Timestamp and Type 2 Binary Ratings
In the above figure, you can see that the ratings are manipulated and converted into binary ratings such that 1
star, 2 star, 3 star, and 4 star ratings are indicated by 0, and the 5 star ratings are indicated by 1.
www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science
[1771]
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:07/July-2021 Impact Factor- 5.354 www.irjmets.com
C) ALGORITHMS
Since the recommendations are done on the basis of binary classification, we need Binary Classification
algorithms. Following are some of the binary classification algorithms which can be used in order to get the
recommendations:
Logistic Regression:

Fig. 5. Linear Regression


Logistic Regression uses the logistic function or the sigmoid function. In the above figure, threshold value is
y=0.5. Hence, if the probability of a data point being of class 1 is greater than or equal to 0.5, then that particular
data point will be classified as with a label ‘1’. Moreover, if the probability of a data point being of class 1 is less
than 0.5, then that particular data point will be classified with a label ‘0’.
Support Vector Machine:

Fig. 6. Support Vector Machine


In case of Support Vector Machine (SVM), we use Support Vector Classifier (SVC) for the sake of binary
classification. In case of SVM, we try to find the margin maximizing hyperplane. The hyperplane passing
through the support vector is either a positive hyperplane or a negative hyperplane as seen from the above
figure. Margin maximizing hyperplane is in between the positive hyperplane and the negative hyperplane. The
distance between margin maximizing hyperplane and the positive hyperplane is the same as the distance
between margin maximizing hyperplane and the negative hyperplane. There are different types of kernel in
SVM like ‘linear’, ‘sigmoid’, ‘rbf’, ‘polynomial’, etc. We can train by using the default kernel which is ‘rbf’ kernel.
Naïve Bayes:
Naïve Bayes can be classified into 3 types – ‘Gaussian’, ‘Multinomial’ and ‘Bernoulli’. Naïve Bayes algorithm
work on Bayes Theorem. Bayes Theorem is sometimes also referred as Bayes Rule or Bayes Law. Naïve Bayes is
a probabilistic classifier.
K-Nearest Neighbours

Fig. 7. K-Nearest Neighbors

www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science


[1772]
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:07/July-2021 Impact Factor- 5.354 www.irjmets.com
In case of K-Nearest Neighbor, we classify the data point to be of a particular class, say Class A or Class B on the
basis of the labels of the nearest ‘k’ neighbors. In the above figure, the datapoint indicated by ‘New example to
classify’ will be classified as Class B when K=3 because among the nearest 3 neighbors, majority (2 neighbors in
this case) of them are of Class B. The same data point will be classified as Class A when K=7 because among the
nearest 7 neighbors, majority (4 neighbors in this case) of them are of Class A.
In case if the user wants a personalized recommendation and if the user enters the movies which he/she has
watched and rated, then he will get recommendations according to the data feeded to the system by that
particular user. If the user has watched only 1 movie, then we can display content-based recommendations
using content-based filtering. If the user has watched more than one movie, then we can recommend the
movies using binary classification techniques. If we use binary classification techniques, then we need to sort
the results according to some parameter.
We can apply any of the above algorithms and check the results. The correctness of the recommendations can be
checked manually or by comparing with the state of the art algorithms. The algorithm showing appropriate
results can be incorporated in order to show movie recommendations. Alternatively, we can get the
recommendations using multiple binary classification techniques and just take the common portion of the
individual results to get the final result.
D) Sorting the Results
Once we get the recommendations, we need to sort the recommendations according to some parameter. In this
research work, we have sorted according to the popularity. Later, we can display the recommendations.
Popularity of a movie can be calculated by the number of user IDs who have watched that particular movie.
Movie is indicated by ‘item_id’ in the dataframe. Ultimately, we need to compute the number of tuples
containing that particular item ID or movie ID.
IV. RESULTS

Fig. 8. Screenshot of Recommendation List in case if the user has watched ‘Copycat (1995)’ and
rated it as 5 star
The above figure shows the movie recommendations if the user has watched just ‘Copycat (1995)’ movie and
rated it as 5 stars.
V. CONCLUSION
Finding movie recommendations using binary classification technique is a unique way of getting
recommendations as compared to the traditional methods like Content-based Filtering and Collaborative
Filtering. In order to get the relevant results, proper binary classification algorithm should be applied. Logistic
Regression is the basic classification technique. There are a lot of other techniques which work better than
Logistic Regression like SVM. Type 2 binary ratings give better results but sometimes the results might be less
in number if we consider Type 2 binary ratings. Hence, in the cases when Type 2 binary ratings give a smaller
number of results, we can use Type 1 binary ratings. Even after getting the results, the results need to be sorted
on the basis of popularity. This will ensure that the recommendations are appropriate to the users.
VI. REFERENCES
[1] Choi, Sang-Min, Sang-Ki Ko, and Yo-Sub Han. "A movie recommendation algorithm based on genre
correlations." Expert Systems with Applications 39.9 (2012): 8079-8085.

www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science


[1773]
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:07/July-2021 Impact Factor- 5.354 www.irjmets.com
[2] Lekakos, George, and Petros Caravelas. "A hybrid approach for movie recommendation." Multimedia
tools and applications 36.1 (2008): 55-70.
[3] Das, Debashis, Laxman Sahoo, and Sujoy Datta. "A survey on recommendation system." International
Journal of Computer Applications 160.7 (2017).
[4] Zhang, Jiang, et al. "Personalized real-time movie recommendation system: Practical prototype and
evaluation." Tsinghua Science and Technology 25.2 (2019): 180-191.
[5] Rajarajeswari, S., et al. "Movie Recommendation System." Emerging Research in Computing,
Information, Communication and Applications. Springer, Singapore, 2019. 329-340.
[6] Ahmed, Muyeed, Mir Tahsin Imtiaz, and Raiyan Khan. "Movie recommendation system using clustering
and pattern recognition network." 2018 IEEE 8th Annual Computing and Communication Workshop
and Conference (CCWC). IEEE, 2018.
[7] Arora, Gaurav, et al. "Movie recommendation system based on users’ similarity." International Journal
of Computer Science and Mobile Computing 3.4 (2014): 765-770.
[8] Subramaniyaswamy, V., et al. "A personalised movie recommendation system based on collaborative
filtering." International Journal of High Performance Computing and Networking 10.1-2 (2017): 54-63.
[9] Harper, F. Maxwell, and Joseph A. Konstan. "The movielens datasets: History and context." Acm
transactions on interactive intelligent systems (tiis) 5.4 (2015): 1-19.
[10] Monika D.Rokade ,Dr.Yogesh kumar Sharma,”Deep and machine learning approaches for anomaly-
based intrusion detection of imbalanced network traffic.” IOSR Journal of Engineering (IOSR JEN), ISSN
(e): 2250-3021, ISSN (p): 2278-8719
[11] Monika D.Rokade ,Dr.Yogesh kumar Sharma”MLIDS: A Machine Learning Approach for Intrusion
Detection for Real Time Network Dataset”, 2021 International Conference on Emerging Smart
Computing and Informatics (ESCI), IEEE
[12] Monika D.Rokade, Dr. Yogesh Kumar Sharma. (2020). Identification of Malicious Activity for Network
Packet using Deep Learning. International Journal of Advanced Science and Technology, 29(9s), 2324 -
2331.
[13] Sunil S.Khatal, Dr.Yogesh kumar Sharma, “Health Care Patient Monitoring using IoT and Machine
Learning.”, IOSR Journal of Engineering (IOSR JEN), ISSN (e): 2250-3021, ISSN (p): 2278-8719
[14] Sunil S.Khatal, Dr.Yogesh kumar Sharma, “Data Hiding In Audio- Video Using Anti Forensics Technique
ForAuthentication ", IJSRDV4I50349, Volume : 4, Issue : 5
[15] Sunil S.Khatal Dr. Yogesh Kumar Sharma. (2020). Analyzing the role of Heart Disease Prediction System
using IoT and Machine Learning. International Journal of Advanced Science and
Technology, 29(9s),2340-2346.

www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science


[1774]

You might also like