
T.C.

ANKARA UNIVERSITY
Faculty of Engineering
Department of Electrical and Electronics Engineering

Movie Recommendation System

Hüseyin Tayyip ALTAY


Nurullah MERTEL

Kaan Can YILMAZ

January 2024
ANKARA

Contents
Movie Recommendation System
Contents
ABSTRACT
I. INTRODUCTION
II. DATASET
Example Dataset
III. METHODS
1. Inspection of The Data
2. Feature Extraction
3. Data Selection
4. Data Filtering
5. Clustering Movies
6. KNN Model Training
IV. PERFORMANCE CRITERIA
1. Accuracy of Recommendations
2. Diversity of Recommendations
3. Robustness to Noisy Data
4. Explainability of Recommendations
5. Correlation Matrix
6. Ethical Considerations
V. EXPERIMENTAL RESULTS
1. Barnyard
2. Man of Steel
3. Dawn of the Dead
VI. CONCLUSION
VII. ACKNOWLEDGEMENTS
REFERENCES
RELATED LINKS

ABSTRACT
Our team, which consists of people with a lot of expertise in the film industry, set out to
create an innovative movie recommendation system in response to the growing demand in
today's media landscape for tailored movie suggestions. Under the joint direction of Hüseyin
Tayyip ALTAY and Nurullah MERTEL, we aim to improve the user experience by navigating
the intricacies of varied cinema content. This paper describes our Movie Recommendation
System's full design, development, and implementation processes using cutting-edge
machine learning methods. The system's goals go beyond traditional one-size-fits-all
methods by attempting to identify complex user preferences, adjust to shifting interests, and
provide personalised recommendations. The study includes a thorough analysis of the
dataset, with insights into keywords, movie credits, actor information, and genres.
Exploratory data analysis guides the subsequent preprocessing and provides a
strong base for the recommendation system. The dataset, which is a crucial component of
our research, offers a chance to investigate the variables affecting both critical
and box office success. The experimental findings show how the system performs
against several criteria. The conclusion summarises the project, highlighting its
accomplishments, resilience, user-centric design, ethical considerations, and future
directions. We thank everyone who contributed and emphasise the significance of this
work for content distribution and personalisation. As we approach the
next stage, our project serves as both a prologue and a finale to a continuous story of
advancement and improvement.

I. INTRODUCTION
The abundance of available cinematic content in today's media-consuming society demands a
thoughtful and customised approach to movie recommendation. Our team, which consists of
people with extensive experience in the film industry, set out to create an advanced Movie
Recommendation System in answer to this need. This joint endeavour by Hüseyin Tayyip
ALTAY and Nurullah MERTEL aims to present a solution that not only manages the
intricacies of diverse film content but also improves the overall user experience.

The thorough design, development, and implementation procedures that make up our Movie
Recommendation System are described in this study. Using cutting-edge machine learning
techniques, our system aims to identify complex user preferences, adjust to changing
preferences, and deliver customised recommendations that go beyond standard one-size-fits-
all methods.

Our main goal is to further the conversation around recommendation algorithms and how
they affect user interaction. Our team works to clarify the technical and user-focused aspects
of our Movie Recommendation System through careful study and deliberate design.

II. DATASET
Predicting movie success is a tricky business, even for costly mega-productions. This dataset,
despite a switch from IMDB to TMDB, offers a valuable chance to explore factors
influencing both critical acclaim and box office performance. The database contains 4803
separate movies. After removing the rows with empty keywords, genres, or cast columns,
4124 unique movies remain.

Example Dataset:

Predicting a movie's success before its release is as much of an art as filmmaking itself. Even
mega-budget productions with A-list casts can flop, leaving studios scratching their heads.
This dataset, which uses the JSON format, offers a wealth of insights to guide our
exploration of what makes a movie successful.

This JSON-based dataset lets us explore the data in depth, look for the factors behind
success, and contribute our own insights to the study of film.

III. METHODS
The libraries used in the project are imported first.

1. Inspection of The Data


An essential first stage in the development of our movie recommendation engine was data
inspection. We looked through the dataset in detail, looking at cast details, genres, movie
credits, and keywords. Insights regarding cast diversity, genre distribution, and keyword
frequency were obtained through exploratory data analysis. This step guided the subsequent
preprocessing, in which missing values were addressed and the data was refined for the best
possible system performance. The thorough examination ensured that a solid and
well-understood dataset served as the foundation for our recommendation system.
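
As a rough illustration of this inspection step, the sketch below counts empty fields and genre frequencies on a few hypothetical rows. The row contents, the field names (`genres`, `keywords`, `cast`) and the helper names `missing_counts` and `value_frequencies` are assumptions made for the example, not the project's actual code.

```python
from collections import Counter

# Hypothetical, already-parsed sample rows (not taken from the real dataset).
rows = [
    {"title": "Movie A", "genres": ["Action"], "keywords": ["hero"], "cast": ["Actor X"]},
    {"title": "Movie B", "genres": ["Drama"], "keywords": [], "cast": ["Actor Y"]},
    {"title": "Movie C", "genres": ["Action", "Drama"], "keywords": ["war"], "cast": []},
]


def missing_counts(rows, columns):
    """Count, for each column, the rows in which that column is empty."""
    return {col: sum(1 for row in rows if not row[col]) for col in columns}


def value_frequencies(rows, column):
    """Count how often each value appears in a list-valued column."""
    return Counter(value for row in rows for value in row[column])


missing = missing_counts(rows, ["genres", "keywords", "cast"])
genre_freq = value_frequencies(rows, "genres")

print("rows:", len(rows))
print("empty fields per column:", missing)
print("genre frequencies:", genre_freq.most_common())
```

The same two helpers could be pointed at the keyword or cast fields to obtain the keyword frequency and cast diversity mentioned above.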

These steps were followed to gain sufficient information about the dataset used in the
code. The same steps were applied to the credits dataset, so that information about it
could be obtained just as easily.

2. Feature Extraction
This process delves into movies, unearthing their defining characteristics one by one. Imagine
sifting through mountains of data, carefully extracting only the essentials: genres like action
or drama, keywords, the original language weaving its cultural tapestry, the unique title
encapsulating its essence, and even the production company. Analysing these key movie
features allows us to explore what makes each film tick, what resonates with audiences, and
ultimately unlock the magic of cinema.
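
A minimal sketch of such an extraction is given below, assuming that list-valued fields are stored as JSON strings of objects with a `name` key. The column names (`original_language`, `production_companies`), the sample row and the helpers `extract_names` and `extract_features` are illustrative assumptions rather than the project's actual code.

```python
import json


def extract_names(json_text):
    """Parse a JSON list of objects and return the value of each "name" key."""
    return [item["name"] for item in json.loads(json_text)]


def extract_features(raw):
    """Build a flat feature record from one raw movie row."""
    return {
        "title": raw["title"],
        "original_language": raw["original_language"],
        "genres": extract_names(raw["genres"]),
        "keywords": extract_names(raw["keywords"]),
        "production_companies": extract_names(raw["production_companies"]),
    }


# Hypothetical raw row; the JSON layout is an assumption for this example.
raw_row = {
    "title": "Movie A",
    "original_language": "en",
    "genres": '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
    "keywords": '[{"id": 1, "name": "hero"}]',
    "production_companies": '[{"id": 7, "name": "Studio X"}]',
}

features = extract_features(raw_row)
print(features)
```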

3. Data Selection
Data selection picks out the key fields: genres, keywords, language, title, and
production company. These chosen movie elements become our map, guiding us to what
makes each film distinctive. By analysing these elements individually and together, we
uncover trends, audience preferences, and even the ingredients for cinematic success.
Ultimately, this approach leads to a deeper appreciation of the diversity of filmmaking.

4. Data Filtering
Data filtering involves the process of narrowing down data based on specific criteria.
Retrieving only those fields that are pertinent to a particular inquiry or analysis can be viewed
as a form of filtering for relevant information. This selective approach enables users to focus
on the specific aspects of the data that are essential for their purposes, enhancing efficiency
and facilitating a more targeted examination of the information at hand. Whether applied in
database queries, data analysis, or information retrieval systems, data filtering plays a crucial
role in refining and extracting the most pertinent elements from a larger dataset, thereby
aiding in informed decision-making and meaningful insights.
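
One possible filter of this kind, sketched under the assumption that a movie is kept only when its genres, keywords and cast fields are all non-empty, is shown below. The sample rows and the function names `has_required_fields` and `filter_movies` are hypothetical.

```python
REQUIRED_FIELDS = ("genres", "keywords", "cast")


def has_required_fields(row, required=REQUIRED_FIELDS):
    """Return True when every required field is present and non-empty."""
    return all(row.get(field) for field in required)


def filter_movies(rows, required=REQUIRED_FIELDS):
    """Keep only the rows that pass the required-field check."""
    return [row for row in rows if has_required_fields(row, required)]


# Hypothetical sample rows.
rows = [
    {"title": "Movie A", "genres": ["Action"], "keywords": ["hero"], "cast": ["Actor X"]},
    {"title": "Movie B", "genres": [], "keywords": ["war"], "cast": ["Actor Y"]},
    {"title": "Movie C", "genres": ["Drama"], "keywords": [], "cast": ["Actor Z"]},
    {"title": "Movie D", "genres": ["Drama"], "keywords": ["family"], "cast": ["Actor X"]},
]

kept = filter_movies(rows)
print(len(rows), "rows before filtering,", len(kept), "after")
```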

5. Clustering Movies
We use clustering techniques to group movies based on common features such as genre or keywords.
For instance, we divide movies according to their genre or keywords so that each cluster contains
movies with similar characteristics. This step helps in organising the dataset and establishing
relationships between movies with common traits.
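
The simplest reading of this step is a plain grouping by feature value, which the sketch below illustrates. It is not a statistical clustering algorithm such as k-means: a movie with several genres simply appears in several groups. The sample rows and the helper name `cluster_by_feature` are assumptions for the example.

```python
from collections import defaultdict


def cluster_by_feature(rows, feature):
    """Group movie titles by each value of a list-valued feature."""
    clusters = defaultdict(list)
    for row in rows:
        for value in row[feature]:
            clusters[value].append(row["title"])
    return dict(clusters)


# Hypothetical sample rows.
rows = [
    {"title": "Movie A", "genres": ["Action", "Adventure"]},
    {"title": "Movie B", "genres": ["Action"]},
    {"title": "Movie C", "genres": ["Drama"]},
]

clusters = cluster_by_feature(rows, "genres")
for genre, titles in clusters.items():
    print(genre, "->", titles)
```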

Analyzing the ratings of movies featuring the top 10 cast members is a comprehensive
endeavor that delves into the collective impact of these renowned actors and actresses on the
cinematic landscape. Beginning with the identification of the top 10 cast based on various
criteria such as popularity or critical acclaim, the process involves compiling data on the
movies in which they have acted, including essential details like release years, genres, and,
crucially, ratings. By averaging the ratings for each movie, an overall rating is computed,
offering a nuanced perspective on the perceived quality of each film. The aggregation of
these individual movie ratings then provides insight into the consistent performance of each
top cast member.
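
The aggregation described above can be sketched as follows. The sketch assumes that the "top" cast members are those with the most appearances, which is only one of the possible criteria, and that each movie carries a single rating in a field called `vote_average`. Both choices, the sample rows and the function names are assumptions.

```python
from collections import defaultdict


def cast_statistics(rows):
    """Map each cast member to (number of movies, mean rating of those movies)."""
    ratings = defaultdict(list)
    for row in rows:
        for actor in row["cast"]:
            ratings[actor].append(row["vote_average"])
    return {actor: (len(values), sum(values) / len(values))
            for actor, values in ratings.items()}


def top_cast(rows, n=10):
    """Return the n cast members with the most movies as (name, count, mean rating)."""
    stats = cast_statistics(rows)
    # Sort by movie count (descending), then by name so that ties are stable.
    ranked = sorted(stats.items(), key=lambda item: (-item[1][0], item[0]))
    return [(actor, count, mean) for actor, (count, mean) in ranked[:n]]


# Hypothetical sample rows.
rows = [
    {"title": "Movie A", "cast": ["Actor X", "Actor Y"], "vote_average": 6.0},
    {"title": "Movie B", "cast": ["Actor X"], "vote_average": 8.0},
    {"title": "Movie C", "cast": ["Actor Y", "Actor Z"], "vote_average": 5.0},
]

top_two = top_cast(rows, n=2)
print(top_two)
```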

Exploring directors with the highest number of movies is a journey into the prolific contributions of
certain filmmakers to the cinematic realm. This analysis typically involves identifying and compiling
data on directors who have an extensive body of work, spanning multiple films. By examining the
number of movies attributed to each director, insights emerge into their prolificacy and influence
within the industry. This information can be pivotal for understanding the scope of a director's impact
on the film landscape, providing a lens through which to appreciate their versatility, thematic
preferences, and ability to sustain creative output over time.
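
Counting movies per director is a short frequency count. The sketch below assumes that each row has a single `director` field, which would first have to be derived from the credits data; the sample rows and the name `most_prolific_directors` are hypothetical.

```python
from collections import Counter


def most_prolific_directors(rows, n=10):
    """Return the n directors with the most movies as (name, count) pairs."""
    return Counter(row["director"] for row in rows).most_common(n)


# Hypothetical sample rows.
rows = [
    {"title": "Movie A", "director": "Director 1"},
    {"title": "Movie B", "director": "Director 2"},
    {"title": "Movie C", "director": "Director 1"},
    {"title": "Movie D", "director": "Director 3"},
    {"title": "Movie E", "director": "Director 1"},
    {"title": "Movie F", "director": "Director 2"},
]

top_directors = most_prolific_directors(rows, n=2)
print(top_directors)
```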

6. KNN Model Training
We implement the k-nearest neighbours (KNN) algorithm for recommendation and train the model on the
preprocessed and clustered dataset. The KNN algorithm identifies movies that are similar to a given
movie by calculating the distance between them in the feature space. In this context, the features can
include genre, keywords, or other relevant attributes.
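
To make the idea concrete, here is a small brute-force nearest-neighbour sketch. It assumes that each movie is represented by a set of genre and keyword tokens and that distance is measured with the Jaccard distance. The text does not name the distance metric, so that choice, the sample data and the function names are all assumptions.

```python
def jaccard_distance(a, b):
    """Return 1 - |a ∩ b| / |a ∪ b| for two sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_neighbours(target_title, features, k=3):
    """Return the k titles whose feature sets are closest to the target movie."""
    target = features[target_title]
    distances = [
        (jaccard_distance(target, feats), title)
        for title, feats in features.items()
        if title != target_title
    ]
    distances.sort()  # ascending distance; ties are broken by title
    return [title for _, title in distances[:k]]


# Hypothetical feature sets (genres and keywords mixed together).
features = {
    "Movie A": {"Action", "hero", "alien"},
    "Movie B": {"Action", "hero", "city"},
    "Movie C": {"Drama", "family"},
    "Movie D": {"Action", "alien", "space"},
}

recommendations = nearest_neighbours("Movie A", features, k=2)
print(recommendations)
```

With a lazy learner such as KNN, "training" amounts to storing the feature vectors; the actual work happens when neighbours are looked up at query time.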

IV. PERFORMANCE CRITERIA

1. Accuracy of Recommendations:
We evaluate the correctness of our movie recommendation system using
metrics such as precision, recall, and F1 score. Recall assesses the system's
capacity to identify every relevant item, whereas precision measures the proportion of
relevant movies among those that are suggested. The F1 score provides a balanced
assessment by taking both recall and precision into account. This assessment helps ensure
that users receive relevant and high-quality movie recommendations from the system.
Precision: The ratio of true positive predictions to all predicted positives. It answers the
question, "Out of all the instances predicted as positive, how many were actually positive?"
Precision is a measure of positive prediction accuracy.
Recall (sensitivity): The ratio of true positive predictions to all actual positives. It
answers the question, "How many of the actual positive instances were correctly
predicted?" Recall reflects how well a model captures all of the positive instances.
F1 Score: The harmonic mean of recall and precision. It offers a compromise
between recall and precision, particularly in cases when the distribution of classes is not
uniform. The F1 score ranges between 0 and 1, with 1 being the best possible score.

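Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 × Precision × Recall / (Precision + Recall)
(TP = True Positive, FP = False Positive, FN = False Negative)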
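
The definitions above can be computed directly from a list of recommended movies and a list of movies known to be relevant. The sketch below uses hypothetical titles, and the function name `precision_recall_f1` is an assumption; how "relevant" movies are determined is left open, as in the text.

```python
def precision_recall_f1(recommended, relevant):
    """Compute precision, recall and F1 for a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and relevant
    fp = len(recommended - relevant)   # recommended but not relevant
    fn = len(relevant - recommended)   # relevant but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical example: 2 of the 4 recommendations are relevant,
# and 1 relevant movie was missed.
recommended = ["Movie A", "Movie B", "Movie C", "Movie D"]
relevant = ["Movie A", "Movie B", "Movie E"]

precision, recall, f1 = precision_recall_f1(recommended, relevant)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```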

2. Diversity of Recommendations:
Our recommendation engine gives diversity in movie recommendations top priority in order
to improve user experience. Using an intra-list diversity metric, we make sure that the
recommended films are not unduly confined to particular performers, genres, or keywords.
This method meets a wide range of preferences by providing users with more inclusive and
diversified choices.
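
One common way to define an intra-list diversity score is the mean pairwise distance between the items of a recommendation list. The sketch below assumes that definition with Jaccard distance over feature sets. The text does not specify the exact formula, so this is an illustrative choice, and the sample data and function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a ∩ b| / |a ∪ b| for two sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(feature_sets):
    """Mean pairwise distance within one recommendation list (0 = identical items)."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature sets of three recommended movies.
recommended = [
    {"Action", "hero"},
    {"Action", "space"},
    {"Drama", "family"},
]

diversity = intra_list_diversity(recommended)
print(f"intra-list diversity: {diversity:.3f}")
```

A score near 0 means the list is concentrated on very similar movies, while a score near 1 means the recommended movies share almost no features.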

3. Robustness to Noisy Data:


An important consideration in assessing the system's performance is how robust it is to
incomplete or noisy data. Strong recommendation systems should be able to manage
abnormalities and inconsistent data in an efficient manner, guaranteeing accurate and
dependable recommendations. In practical situations, this enhances the system's overall
dependability.

4. Explainability of Recommendations:
An explainability score is used in our recommendation system to give transparency and user
trust top priority. This measure evaluates the system's ability to give clear explanations for
the movies it recommends. Users gain confidence from an explainable system since it
provides them with information about the underlying reasons that influence each
recommendation.

5. Correlation Matrix:
The Correlation Matrix serves as a fundamental tool for uncovering relationships between
variables within the dataset. This matrix provides a comprehensive overview of the strength
and direction of correlations among different features, allowing data scientists and analysts to
discern patterns and dependencies. The correlation coefficients within the matrix range from
-1 to 1, revealing the nature of the relationships: positive, negative, or no correlation. This
exploration is particularly valuable during the initial phases of data analysis, aiding in the
identification of potential multicollinearity issues and informing decisions related to feature
selection.
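
For illustration, a Pearson correlation matrix can be computed as in the sketch below. The column names (`budget`, `revenue`, `rank`) and their values are made up for the example, and the helpers `pearson` and `correlation_matrix` are not taken from the project.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical numeric columns.
columns = {
    "budget": [1.0, 2.0, 3.0, 4.0],
    "revenue": [2.0, 4.0, 6.0, 8.0],
    "rank": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

In this toy data, `budget` and `revenue` are perfectly positively correlated and `rank` is perfectly negatively correlated with both, which is the kind of redundancy a correlation matrix is meant to expose before feature selection.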

6. Ethical Considerations:
We value impartiality and justice while acknowledging the ethical ramifications of
recommendation systems. Examining biases is a crucial part of ethical concerns to make sure
that recommendations don't unduly benefit particular genres or populations. Our dedication to
moral behaviour emphasises how crucial it is to provide fair and inclusive movie
recommendations.

V. EXPERIMENTAL RESULTS

1. Barnyard

The comparison between the predicted and actual ratings for 'Barnyard' unveils insights into
the performance of the predictive model. According to the model, the predicted
rating for 'Barnyard' is 6.51. In contrast, the actual rating assigned to the movie by
viewers or a rating source stands at 5.30. This discrepancy of 1.21 between the predicted
and actual ratings highlights an area where the predictive model may need refinement. A
difference of this magnitude suggests that the model might not fully capture certain factors
influencing the movie's rating, emphasizing the importance of continuous evaluation and
improvement in predictive modeling. Such analyses play a pivotal role in fine-tuning models
to provide more accurate and reliable predictions for audience preferences and movie ratings.

2. Man of Steel

The evaluation of 'Man of Steel' involves a comparison between the predicted and actual
ratings, shedding light on the effectiveness of the predictive model. According to
the model, the predicted rating for 'Man of Steel' is 6.52. In contrast, the actual rating
assigned to the movie by viewers or a rating source is 6.50. This minimal difference of
0.02 between the predicted and actual ratings suggests that the predictive model performed
quite accurately in estimating the movie's rating. The close alignment between the predicted
and actual values indicates a high level of accuracy in the model's ability to capture the
factors influencing the movie's reception. Such meticulous assessments contribute to the
refinement of predictive models, ensuring their reliability in forecasting audience preferences
and movie ratings.

3. Dawn of the Dead

The predicted rating for 'Dawn of the Dead' is 6.21, as generated by the predictive model,
which takes into account various features or characteristics of the movie. In
contrast, the actual rating assigned to 'Dawn of the Dead' by viewers or a rating source is
6.80. This discrepancy of 0.59 between the predicted and actual ratings serves as an
indicator of the model's performance. The smaller the difference, the more accurate the
predictive model is considered. In this instance, the model slightly underestimated the
movie's actual rating. Such evaluations play a crucial role in refining and enhancing
predictive models, as they provide insights into areas where the model may need
improvement to better reflect the true preferences and sentiments of the audience.
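
The three comparisons in this section can be reproduced with a few lines of arithmetic. The predicted and actual ratings below are the ones reported above; the mean absolute error across the three movies is an additional summary that is not reported in the text, and the variable names are illustrative.

```python
# (predicted rating, actual rating) as reported in this section.
results = {
    "Barnyard": (6.51, 5.30),
    "Man of Steel": (6.52, 6.50),
    "Dawn of the Dead": (6.21, 6.80),
}

# Absolute error per movie, rounded to two decimals for display.
errors = {
    title: round(abs(predicted - actual), 2)
    for title, (predicted, actual) in results.items()
}

# Mean absolute error over the three movies (added summary, not in the text).
mae = sum(abs(p - a) for p, a in results.values()) / len(results)

for title, error in errors.items():
    print(f"{title}: absolute error {error:.2f}")
print(f"mean absolute error: {mae:.2f}")
```

Three movies are far too few to characterise the model, so the mean is only a compact way of reading the individual gaps side by side.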

VI. CONCLUSION
The result of our intensive efforts to design, develop, and deploy the movie recommendation
system is a revolutionary journey marked by creativity, teamwork, and a steadfast dedication
to user-centric design. This extensive project offers a complex tapestry of accomplishments
and contributions that demonstrate not only technical skill but also a significant influence on
the tailored content delivery landscape.

Our movie recommendation system has a significant influence on the personalised content
delivery environment that extends beyond its immediate technological consequences.
Through the combination of cutting-edge analytics and a user-centred design approach, we
have helped to transform how people find and interact with movies. This influence has
consequences that go beyond the boundaries of our project and further advance the general
development of recommendation systems.

Finally, as we prepare to move on to the next stage, we feel a sense of
success and excitement for what is to come. This movie recommendation system project is a
prologue to a continuing story of improvement and progress rather than just a final chapter.

VII. ACKNOWLEDGEMENTS
We would like to express our sincere gratitude to our teacher, Mr. Kaan Can YILMAZ, for
providing us with the invaluable opportunity to explore and learn in the field of movie
recommendation. Your guidance, support, and encouragement have been instrumental in
shaping our understanding of and passion for machine learning. We appreciate the time and
effort you have invested in fostering a positive and enriching learning environment. Thank
you for being an exceptional teacher and mentor. Your commitment to our academic
growth has left a lasting impact on our educational journey.

Sincerely.

REFERENCES
1. Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press, Cambridge
(2001)
2. Wang, H.: Nearest Neighbours without k: A Classification Formalism based on
Probability, technical report, Faculty of Informatics, University of Ulster, N.Ireland,
UK (2002)
3. Derrac, J., et al. "Keel data-mining software tool: Data set repository, integration of
algorithms and experimental analysis framework." J. Mult. Valued Log. Soft Comput
17 (2015): 255-287.

RELATED LINKS
https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata
