DSE BIG DATA SYSTEMS ASSIGNMENT 3

Submission Date: 20 May 2020 11.55 PM

Weightage: 10%

You have probably visited GoodReads to check the ratings of books you are interested in, or to look for
an interesting book. If you are deciding what to read next, you are in the right place: you tell GoodReads
which titles or genres you have enjoyed in the past, and it gives you surprisingly insightful
recommendations. Now it is your turn to develop such a recommendation system!

You have been given a GoodReads book rating dataset (link provided in the References section). Using
Spark's MLlib module and other related libraries or modules (additional references are provided at the
end of this document), you are to build a recommendation engine.

Collaborative filtering (CF) is a technique used by recommender systems. The two questions it most
commonly answers are:

• For a given user, what are the top recommended products?
• For a given product, what are the recommended users?
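
As a rough illustration of the two questions above, the following sketch ranks a small table of predicted scores both ways. The scores, the user and book names, and the helper `top_n` are invented for illustration; in the assignment the scores would come from the trained model rather than a hand-written dictionary.

```python
from collections import defaultdict

# Hypothetical predicted scores: (user, book) -> predicted rating.
# These numbers are made up for illustration only.
predicted = {
    ("u1", "bookA"): 4.5,
    ("u1", "bookB"): 3.0,
    ("u1", "bookC"): 4.9,
    ("u2", "bookA"): 2.0,
    ("u2", "bookB"): 4.2,
    ("u2", "bookC"): 3.8,
}


def top_n(scores, key_index, n):
    """Group scores by user (key_index=0) or by book (key_index=1)
    and return the n highest-scoring counterparts for each group."""
    groups = defaultdict(list)
    for pair, score in scores.items():
        key = pair[key_index]
        other = pair[1 - key_index]
        groups[key].append((other, score))
    return {
        key: [name for name, _ in
              sorted(items, key=lambda item: item[1], reverse=True)[:n]]
        for key, items in groups.items()
    }


# "For a given user, what are the top recommended products?"
books_for_user = top_n(predicted, key_index=0, n=2)

# "For a given product, what are the recommended users?"
users_for_book = top_n(predicted, key_index=1, n=1)
```

With Spark's DataFrame-based ALS, the `recommendForAllUsers` and `recommendForAllItems` methods of the fitted model play these same two roles.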

With the help of the given dataset and the recommendation model you have built, answer the following
questions:

Q1. How many unique users and unique books are there?

Q2. What percentage of books have received a rating of 3 or less?
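
The wording of Q2 admits more than one reading, so it helps to state which one the notebook uses. The sketch below, on a few made-up rating rows, computes two candidate readings: the share of rating records that are 3 or lower, and the share of distinct books that received at least one such rating. The row layout `(user_id, book_id, rating)` is an assumption about the dataset, not something this document specifies.

```python
# Made-up rows in an assumed (user_id, book_id, rating) layout.
ratings = [
    (1, 10, 5),
    (1, 11, 3),
    (2, 10, 2),
    (2, 12, 4),
    (3, 11, 1),
    (3, 12, 5),
]

# Q1-style counts: distinct users and distinct books.
unique_users = {user for user, _, _ in ratings}
unique_books = {book for _, book, _ in ratings}

# Reading 1: percentage of rating records with a value of 3 or less.
low_rows = [row for row in ratings if row[2] <= 3]
pct_low_ratings = 100.0 * len(low_rows) / len(ratings)

# Reading 2: percentage of distinct books with at least one rating of 3 or less.
low_books = {book for _, book, rating in ratings if rating <= 3}
pct_books_with_low = 100.0 * len(low_books) / len(unique_books)
```

Whichever reading is chosen, the notebook should say so explicitly, in line with the note on justifying how the data is processed.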

Q3. After tuning parameters such as rank, maxIter and regParam, what is the best RMSE that you
obtained?
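
For reference, RMSE over a set of held-out ratings is the square root of the mean squared difference between predicted and actual ratings. The sketch below computes it in plain Python and walks a small parameter grid. `evaluate` is a hypothetical callback standing in for "train a model with these settings and return its validation RMSE", and `grid_search` is an illustrative helper, not a library function.

```python
import itertools
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def grid_search(evaluate, ranks, max_iters, reg_params):
    """Try every (rank, maxIter, regParam) combination and keep the lowest RMSE.

    `evaluate` is a hypothetical callback: it should train a model with the
    given settings and return the RMSE measured on held-out ratings.
    """
    best_params, best_score = None, float("inf")
    for rank, max_iter, reg in itertools.product(ranks, max_iters, reg_params):
        score = evaluate(rank, max_iter, reg)
        if score < best_score:
            best_params, best_score = (rank, max_iter, reg), score
    return best_params, best_score


# Tiny check of rmse on made-up numbers: the errors are 1, 0 and -1.
example_rmse = rmse([4, 3, 5], [3, 3, 6])
```

In Spark, the same grid can be driven by `ParamGridBuilder` together with `CrossValidator` or `TrainValidationSplit` from `pyspark.ml.tuning`, using `RegressionEvaluator(metricName="rmse")` as the metric. Setting `coldStartStrategy="drop"` on the ALS estimator avoids a NaN RMSE when the evaluation split contains users or books unseen during training.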

Q4. Using the recommendation engine based on the best RMSE obtained,

a) What are the top 5 book title recommendations for each user?
b) What are the top 5 user recommendations for each book title?

Q5. For user 1, which of the book titles recommended by your model actually appear in the user's
“to read” list? What do you conclude from this?
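
Q5 amounts to comparing two collections of book IDs for one user. A minimal sketch, assuming the model's output for user 1 and the user's to-read entries are both available as lists of book IDs (the IDs below are made up, and `hit_rate` is an illustrative helper, not part of any library):

```python
def hit_rate(recommended, to_read):
    """Return the recommended books that are also on the to-read list,
    in recommendation order, plus the fraction of recommendations that hit."""
    wanted = set(to_read)
    hits = [book for book in recommended if book in wanted]
    fraction = len(hits) / len(recommended) if recommended else 0.0
    return hits, fraction


# Made-up book IDs for illustration only.
recommended_for_user_1 = [101, 205, 333, 42, 77]
to_read_for_user_1 = [42, 900, 101, 12]

hits, fraction = hit_rate(recommended_for_user_1, to_read_for_user_1)
```

Since the question asks for titles, the matching IDs would still need to be joined against the book metadata, assuming the dataset stores titles separately from the ratings.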

Notes:
• This is a take-home assignment to be carried out by each learner group independently.
• This is a programming exercise that requires the given dataset to be used in a Jupyter notebook or
Apache Zeppelin notebook environment.

DSE BDS Assignment 3


• You may consult or discuss peripheral aspects, such as the environment, with other learners, but not
the design or implementation of solutions to the specific problems.
• You have to write appropriate Python code in a Jupyter / Zeppelin notebook to support your
answers, and submit it with the following nomenclature:
Final document - BDS_Assignment3_<Group_ID>.ipynb / Zeppelin notebook
• Provide appropriate justification when processing the data or arriving at conclusions.
• For generic queries, learners are encouraged to use the discussion forums; otherwise, they can
reach out to me at ppawar@wilp.bits-pilani.ac.in.
• Manage your efforts properly, as there is no scope to shift the deadline announced above.

References:
1) Collaborative Filtering
2) Apache Spark Collaborative Filtering documentation
3) ALS algorithm
4) Large-scale Parallel Collaborative Filtering for the Netflix Prize
5) GoodReads Dataset
6) Apache Zeppelin
