1 - Assignment Details Practical Data Science With Python

Practical Data Science with Python Assignment

Assessment: Recommender Systems


Introduction
This assignment focuses on recommender systems, a data science application widely used in
the real world. You will need to develop and implement appropriate solutions to complete
the corresponding tasks, and present the results (virtually). These tasks must be completed
individually.

Academic Integrity
All assignments will be checked with plagiarism-detection software; any student found to have
plagiarised will be subject to disciplinary action. Plagiarism includes, e.g., submitting code or text
that is not your own. Allowing others to copy your work is also considered plagiarism. All cases of
plagiarism will be penalised; there are no exceptions and no excuses.

The Dataset
This assignment deals with movie recommendation. The dataset to be used throughout the assignment
is the MovieLens 1M Dataset.
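
As a starting point, the ratings file of MovieLens 1M can be read into a user-by-movie matrix. The sketch below is only one possible approach: it assumes the `UserID::MovieID::Rating::Timestamp` layout of `ratings.dat`, and the column names, the `load_ratings` helper, and the three sample rows are illustrative assumptions rather than part of the assignment.

```python
import io

import pandas as pd

# Illustrative rows in the "UserID::MovieID::Rating::Timestamp" layout
# (assumed layout of ratings.dat; swap in the real file path when available).
SAMPLE = "1::1193::5::978300760\n1::661::3::978302109\n2::1193::4::978298413\n"


def load_ratings(source):
    """Read MovieLens-1M style ratings from a path or a file-like object."""
    return pd.read_csv(
        source,
        sep="::",
        engine="python",  # multi-character separators need the Python parser
        header=None,
        names=["user_id", "movie_id", "rating", "timestamp"],  # hypothetical names
    )


ratings = load_ratings(io.StringIO(SAMPLE))

# Pivot to a user x movie matrix; movies a user has not rated become NaN.
matrix = ratings.pivot(index="user_id", columns="movie_id", values="rating")
print(matrix)
```

Keeping unrated cells as NaN (rather than 0) makes it harder to confuse "not rated" with a low rating in later similarity calculations.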

Task 1: User-based Collaborative Filtering (6 marks)

In this task you need to develop, implement, and evaluate user-based (i.e., user-user)
collaborative filtering that uses KNN (k-nearest neighbour), i.e., KNN-based Collaborative
Filtering. Randomly choose one user (as the active user) and predict this user’s ratings
on movies. Note that in the given dataset, the user might have only rated some of the
movies.
Specific requirements include:

• Choose an appropriate similarity metric (and other parameters).
• Implement the approach in Python. Add detailed comments for the code (to explain
  your implementation).
• Study the impact of the parameter K (of KNN), with at least 5 different values.
• Use RMSE (root-mean-square error) as the metric for evaluation.
• Summarise the results/findings concisely in slides (and presentation).

Tips on creating comments in Python: https://www.w3schools.com/python/python_comments.asp.
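
To make the Task 1 pipeline concrete, here is a minimal sketch of user-based KNN collaborative filtering evaluated with RMSE for five values of K. It is not a model answer. Everything in it is an assumption chosen for illustration: the synthetic rating matrix stands in for MovieLens 1M, cosine similarity over co-rated movies is just one possible metric, predictions use a mean-centred weighted average, and the evaluation hides one known rating of the active user at a time.

```python
import numpy as np


def cosine_on_overlap(a, b):
    """Cosine similarity over the items both users rated (0.0 if undefined)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    x, y = a[both], b[both]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def predict_user_knn(R, user, item, k):
    """Predict R[user, item] from the k most similar users who rated `item`."""
    user_mean = np.nanmean(R[user])
    candidates = [u for u in range(R.shape[0])
                  if u != user and not np.isnan(R[u, item])]
    sims = np.array([cosine_on_overlap(R[user], R[u]) for u in candidates])
    top = np.argsort(-sims)[:k]  # indices of the k largest similarities
    weights = sims[top]
    if len(top) == 0 or np.abs(weights).sum() == 0:
        return float(user_mean)  # fall back to the active user's mean rating
    # Each neighbour contributes its rating relative to its own mean.
    offsets = np.array([R[candidates[i], item] - np.nanmean(R[candidates[i]])
                        for i in top])
    pred = user_mean + weights @ offsets / np.abs(weights).sum()
    return float(np.clip(pred, 1, 5))


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def evaluate_user(R, user, k):
    """Hide each known rating of `user` in turn, predict it, return the RMSE."""
    truths, preds = [], []
    for item in np.flatnonzero(~np.isnan(R[user])):
        hidden = R.copy()
        hidden[user, item] = np.nan
        truths.append(R[user, item])
        preds.append(predict_user_knn(hidden, user, item, k))
    return rmse(truths, preds)


# Synthetic stand-in for the user x movie matrix (NaN = not rated).
rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(30, 20)).astype(float)
R[rng.random(R.shape) < 0.4] = np.nan

active_user = int(rng.integers(R.shape[0]))  # randomly chosen active user
results = {k: evaluate_user(R, active_user, k) for k in (1, 3, 5, 10, 20)}
for k, score in results.items():
    print(f"K={k:2d}  RMSE={score:.3f}")
```

On random data like this the RMSE values carry no meaning; the point is the structure (similarity, neighbour selection, prediction, evaluation loop over K), which can be re-used with the real rating matrix and a proper train/test split.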

Task 2: Item-based Filtering (6 marks)

In this task you need to develop, implement, and evaluate item-based (i.e., item-item)
(collaborative) filtering that uses KNN. Randomly choose one item (i.e., movie) and
predict users’ ratings on this movie. Note that in the given dataset, it is possible that
some users didn’t rate this movie.
Specific requirements include:

• Choose an appropriate value for the parameter K (of KNN) (and other parameters).
• Implement the approach in Python. Add detailed comments for the code (to explain
  your implementation).
• Compare the performance of at least 2 similarity metrics.
• Use RMSE (root-mean-square error) as the metric for evaluation.
• Summarise the results/findings concisely in slides (and presentation).
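
The comparison of similarity metrics asked for in Task 2 could be organised along the following lines. This is a hedged sketch, not a reference solution: the synthetic matrix, the choice of cosine and Pearson as the two metrics, the rule of keeping only positively similar neighbours, and all function names are assumptions made for illustration.

```python
import numpy as np


def cosine_sim(x, y):
    """Cosine similarity of two equally long vectors (0.0 if undefined)."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def pearson_sim(x, y):
    """Pearson correlation, computed as the cosine of mean-centred vectors."""
    return cosine_sim(x - x.mean(), y - y.mean())


def item_similarity(R, i, j, metric):
    """Similarity of item columns i and j over the users who rated both."""
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    if both.sum() < 2:
        return 0.0
    return metric(R[both, i], R[both, j])


def predict_item_knn(R, user, item, k, metric):
    """Predict R[user, item] from the k most similar items this user rated."""
    rated = [j for j in np.flatnonzero(~np.isnan(R[user])) if j != item]
    sims = np.array([item_similarity(R, item, j, metric) for j in rated])
    # Take the k strongest neighbours, then drop non-positive similarities.
    order = [n for n in np.argsort(-sims)[:k] if sims[n] > 0]
    if not order:
        return float(np.nanmean(R[:, item]))  # fall back to the item's mean
    w = sims[order]
    return float(w @ R[user, [rated[n] for n in order]] / w.sum())


def evaluate_item(R, item, k, metric):
    """Hide each known rating of `item` in turn, predict it, return the RMSE."""
    errors = []
    for user in np.flatnonzero(~np.isnan(R[:, item])):
        hidden = R.copy()
        hidden[user, item] = np.nan
        errors.append(R[user, item] - predict_item_knn(hidden, user, item, k, metric))
    return float(np.sqrt(np.mean(np.square(errors))))


# Synthetic stand-in for the user x movie matrix (NaN = not rated).
rng = np.random.default_rng(1)
R = rng.integers(1, 6, size=(30, 20)).astype(float)
R[rng.random(R.shape) < 0.4] = np.nan

target_item = int(rng.integers(R.shape[1]))  # randomly chosen movie
scores = {name: evaluate_item(R, target_item, k=5, metric=fn)
          for name, fn in [("cosine", cosine_sim), ("pearson", pearson_sim)]}
for name, score in scores.items():
    print(f"{name:8s} RMSE={score:.3f}")
```

Passing the metric in as a function keeps the prediction and evaluation code identical across metrics, so any difference in RMSE comes from the similarity measure alone.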

Task 3: A Better Recommender System (12 marks)

In this task you are required to identify/propose a better solution to improve the recommendation
quality and evaluate its performance in comparison with two baseline methods.

Task 3.1. Develop the "new" solution for movie recommendation. This can be done
in either of the following ways:

Option 1: Search and read related publications on recommender systems. Identify/choose one
solution that can be used for movie recommendation and will potentially deliver better
recommendation quality. The solution must have been published in a peer-reviewed journal or
conference. It can be based on KNN or not.

Option 2: Based on an extensive literature review (conducted by yourself), propose
a new algorithm of your own. In this case, the idea is yours. You need to justify
its originality and novelty. A strong list of references must be provided. Note that
parameter tuning alone cannot be considered a (new) approach.

• Describe the idea (in your own words) clearly and precisely in slides (and presentation). Cite
  references wherever necessary.
• In slides, explicitly give the source of the solution (if you choose Option 1) or provide
  a strong list of references (Option 2).
• Name the solution Option1RecSys if you choose Option 1 or Option2RecSys for
  Option 2.
• Implement the approach in Python. Add detailed comments for the code (to explain
  your implementation as well as details of the algorithm).

Task 3.2. Randomly choose 5 users (from those who have rated more than 100 movies
each) and recommend Top-30 movies (to each user). Use AP (Average Precision) and
NDCG (Normalized Discounted Cumulative Gain) as evaluation metrics. Compare the
performance of three solutions:

• Movie Average: recommends items with the highest average ratings (see also Week
  9 lecture slides). Name it MovieAvg.
• KNN-based Collaborative Filtering: the approach developed in Task 1 above.
  Choose the optimal parameters (based on the results of Task 1). Name it KNNCF.
• The solution developed in Task 3.1 above, i.e., Option1RecSys or Option2RecSys.

Use one appropriate graph/chart to visualise the results of each metric, respectively
(i.e., one graph for AP and another for NDCG). Summarise the results/findings concisely
in slides (and presentation).
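
The two ranking metrics for Task 3.2 can be computed per user roughly as follows. This is a sketch under stated assumptions, since several variants of both metrics are in use: AP is normalised here by min(number of relevant movies, n), NDCG uses linear gains with a log2 rank discount, and what counts as "relevant" (e.g. a held-out rating above some threshold) is left to the caller. The function names and the toy inputs are hypothetical.

```python
import math


def average_precision(recommended, relevant, n=30):
    """AP@n: precision@k summed over ranks k holding a relevant movie,
    normalised by min(len(relevant), n) (one common convention)."""
    hits, total = 0, 0.0
    for rank, movie in enumerate(recommended[:n], start=1):
        if movie in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), n)
    return total / denom if denom else 0.0


def ndcg(recommended, gains, n=30):
    """NDCG@n with linear gains (e.g. held-out ratings) and a log2 discount."""
    dcg = sum(gains.get(movie, 0.0) / math.log2(rank + 1)
              for rank, movie in enumerate(recommended[:n], start=1))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


# Toy example: four recommended movies, two of which the user actually liked.
recommended = ["a", "b", "c", "d"]
held_out_gains = {"a": 5.0, "c": 4.0}  # hypothetical held-out ratings

ap = average_precision(recommended, set(held_out_gains))
score = ndcg(recommended, held_out_gains)
print(f"AP@30={ap:.4f}  NDCG@30={score:.4f}")
```

Applying the same two functions to the Top-30 lists of MovieAvg, KNNCF, and the Task 3.1 solution for each of the 5 users gives the numbers needed for the two charts.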

**IMPORTANT**

The solution developed in Task 3.1 (Option1RecSys or Option2RecSys) must achieve the
best (overall) performance; if this is not the case, your whole Task 3 will be
considered a "FAIL" (as per the rubric).

Task 4: Presentation (6 marks)


In this assignment, you need to design slides and record a presentation (for the above tasks
and results). Your slides and presentation could include, but are not limited to:

• a cover page/slide containing, e.g., your name and student ID, in addition to Assignment info,
• a concise outline, key results, and findings of Task 1,
• a concise outline, key results, and findings of Task 2,
• a clear and complete description of the "new" solution developed in Task 3.1 (with
  proper citations),
• literature review (with proper citations), if applicable,
• necessary/key details of the algorithm of Option1RecSys or Option2RecSys,
• key results, visualisation, and findings of Task 3.2,
• a list of references.

The following requirements must be strictly met; otherwise, your submission
will be considered invalid.

• The slides should be no more than 10 pages in total (that is, no more than 10 slides).
• Save your clean slides as a PDF file and name it A3Slides.pdf. This version should
  not contain any recordings.
• The presentation (recording) should be no more than 5 minutes.
• The recording video file must be in MP4 format. Name it A3Presentation.mp4.
• The recording video file must be less than 30MB in size.

Assignment Rubrics
Assignment 3 (Sem 2, 2023)

Criterion: Task 1, User-based Collaborative Filtering

HIGH DISTINCTION (6 to >4.8 Pts): All parameters are properly set. The implementation is completely free from errors. Details of the implementation are clear. The code is well commented to explain the implementation. All comments are necessary and precise. The impact of the key parameter is studied adequately with strong support of well-organised numeric results. The correct metric is used. All calculations are free from errors. The results are well displayed and presented, and the findings are summarised correctly and concisely.

DISTINCTION (4.8 to >4.2 Pts): All parameters are properly set. The implementation is free from (substantial) errors. Details of the implementation are clear. The code is well commented to explain the implementation. All comments are appropriate. The impact of the key parameter is studied with strong support of numeric results. The correct metric is used. All calculations are free from errors. The results are displayed and presented appropriately, and the findings are summarised correctly.

CREDIT (4.2 to >3.6 Pts): All parameters are correctly set. The implementation is free from substantial errors. Most details of the implementation are clear. The code is appropriately commented to explain the implementation. All comments are correct. The impact of the key parameter is studied with support of numeric results. The correct metric is used. All calculations are free from errors. The results are displayed and presented appropriately, and the findings are summarised correctly.

PASS (3.6 to >3.0 Pts): All parameters are correctly set. The implementation is free from substantial errors. The code is commented to explain the implementation. All comments are somewhat correct. The impact of the key parameter is studied with support of numeric results. The correct metric is used. All calculations are free from substantial errors. The results are displayed and presented in some way, and the findings are summarised correctly.

FAIL (3 to >0 Pts): There … with pa… setting… the imp… or com… are no… comme… study o… of the… parame… (somew… insuffi… inappr… quantit… are mis… incorre… Calcul… errors. … are dis… presen… inappr… the fin… summa… correct…

Criterion: Task 2, Item-based Filtering

HIGH DISTINCTION (6 to >4.8 Pts): All parameters are properly set. The implementation is completely free from errors. Details of the implementation are clear. The code is well commented to explain the implementation. All comments are necessary and precise. The performance resulting from different similarity metrics is compared adequately with strong support of well-organised numeric results. The correct metric is used. All calculations are free from errors. The results are well displayed and presented, and the findings are summarised correctly and concisely.

DISTINCTION (4.8 to >4.2 Pts): All parameters are properly set. The implementation is free from (substantial) errors. Details of the implementation are clear. The code is well commented to explain the implementation. All comments are appropriate. The performance resulting from different similarity metrics is compared with strong support of numeric results. The correct metric is used. All calculations are free from errors. The results are displayed and presented appropriately, and the findings are summarised correctly.

CREDIT (4.2 to >3.6 Pts): All parameters are correctly set. The implementation is free from substantial errors. Most details of the implementation are clear. The code is appropriately commented to explain the implementation. All comments are correct. The performance resulting from different similarity metrics is compared with support of numeric results. The correct metric is used. All calculations are free from errors. The results are displayed and presented appropriately, and the findings are summarised correctly.

PASS (3.6 to >3.0 Pts): All parameters are correctly set. The implementation is free from substantial errors. The code is commented to explain the implementation. All comments are somewhat correct. The performance resulting from different similarity metrics is compared with support of numeric results. The correct metric is used. All calculations are free from substantial errors. The results are displayed and presented in some way, and the findings are summarised correctly.

FAIL (3 to >0 Pts): There … with p… setting… the im… or com… are no… comme… compa… similar… (somew… insuffi… inappr… quantit… are mi… incorre… Calcul… errors. … are dis… presen… inappr… the fin… summa… correct…

Criterion: Task 3, A Better Recommender System

HIGH DISTINCTION (12 to >9.6 Pts): The idea (of the "new" solution) is clearly and precisely described. All references are properly cited. The source of the solution is clearly specified, if applicable. Key details of the algorithm are provided. All methods are named as required. All parameters are properly set. The implementation of all methods is completely free from errors. Details of the implementation are clear. The code is well commented. All comments are necessary and precise. All settings for performance evaluation are reasonable and properly justified or explained. The performance of different solutions is compared adequately with strong support of well-organised numeric results. All calculations are free from errors. The …

DISTINCTION (9.6 to >8.4 Pts): The idea (of the "new" solution) is clearly and precisely described. All references are properly cited. The source of the solution is clearly specified, if applicable. Key details of the algorithm are provided. All methods are named as required. All parameters are properly set. The implementation of all methods is free from (substantial) errors. Details of the implementation are clear. The code is well commented. All comments are appropriate. All settings for performance evaluation are reasonable. The performance of different solutions is compared with strong support of numeric results. All calculations are free from errors. The results are displayed, visualised, and presented appropriately. The …

CREDIT (8.4 to >7.2 Pts): The idea (of the "new" solution) is clearly described. All references are properly cited. The source of the solution is clearly specified, if applicable. Key details of the algorithm are provided. All methods are named as required. All parameters are correctly set. The implementation of all methods is free from substantial errors. Most details of the implementation are clear. The code is commented. All comments are correct. All settings for performance evaluation are reasonable. The performance of different solutions is compared with support of numeric results. All calculations are free from errors. The results are displayed, visualised, and presented …

PASS (7.2 to >6.0 Pts): The idea (of the "new" solution) is described in some way. All references are (properly) cited. The source of the solution is specified, if applicable. Key details of the algorithm are provided. All methods are named as required. All parameters are correctly set. The implementation of all methods is free from substantial errors. The code is commented in some way. All comments are somewhat correct. All settings for performance evaluation are somewhat reasonable. The performance of different solutions is compared with support of numeric results. All calculations are free from substantial errors. The results are displayed, visualised, and presented in some way. The findings are summarised …

FAIL (6 to >0 Pts): The ide… "new" … (somew… Refere… citation… missing… of the s… unclear… applica… details… algorith… missing… are not… require… issues w… parame… or erro… implem… comme… are no… comme… Perform… compa… (somew… insuffic… inappro… quantit… are mis… incorre… Calcula… errors. … are disp… visuali… present… inappro… finding… summa… correct… "new" … doesn't…

Criterion: Task 4, Presentation

HIGH DISTINCTION (6 to >4.8 Pts): All requirements for slides and presentation (specified in the Assignment document) are met. The contents are well designed. Presentation is well structured. Organisation of information and presentation are clear, logical, appropriately sequenced and consistent. No obvious grammar or spelling mistakes.

DISTINCTION (4.8 to >4.2 Pts): All requirements for slides and presentation (specified in the Assignment document) are met. The contents are adequate and properly formatted, and all necessary elements are included. Presentation is largely well structured. Organisation of information and presentation are clear, consistent and appropriately sequenced. No obvious grammar or spelling mistakes.

CREDIT (4.2 to >3.6 Pts): All requirements for slides and presentation (specified in the Assignment document) are met. All necessary elements are included and properly formatted. Presentation is largely well structured. Organisation of information and presentation are clear, and mostly appropriately sequenced. Some grammar or spelling mistakes.

PASS (3.6 to >3.0 Pts): All requirements for slides and presentation (specified in the Assignment document) are met. Most necessary elements are included and properly formatted. Presentation is structured in some way. Organisation of information and presentation are somewhat appropriate. Some grammar or spelling mistakes.

FAIL (3 to >0 Pts): Not all re… for slides… presentat… in the As… documen… Key cont… are missi… format is… Presentat… follow. O… of inform… presentat… inapprop… many gra… spelling…

Total points: 30
