2020 Dse Bds Assign3

Uploaded by

surajpb1989

This document provides instructions for an assignment to build a book recommendation engine using collaborative filtering on a GoodReads book rating dataset. Students are asked to analyze the dataset to determine the number of unique users and books as well as the percentage of books rated 3 or less. They then need to tune parameters of the recommendation model to minimize the RMSE and use the model to provide top 5 book recommendations for each user and top 5 user recommendations for each book. The model recommendations for user 1 should also be compared to that user's actual "to read" list to evaluate the model.

DSE BIG DATA SYSTEMS ASSIGNMENT 3

Submission Date: 20 May 2020 11.55 PM

Weightage: 10%

You have probably visited GoodReads to check the ratings of books you are interested in, or to look for
an interesting book. If you are deciding what to read next, it is the right place to be: you tell GoodReads
which titles or genres you have enjoyed in the past, and it gives you surprisingly insightful
recommendations. Now it is your turn to develop such a recommendation system!

You have been given a GoodReads book rating dataset (link provided in the references section). Using
Spark's MLlib module and other related libraries or modules (additional references are provided at the
end of this document), you are to build a recommendation engine.

Collaborative filtering (CF) is a technique used by recommender systems. The two questions it most
commonly answers are:

• For a given user, what are the top recommended products?

• For a given product, who are the top recommended users?
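
The two questions above can be illustrated with a small plain-Python sketch. The score table, the user and book identifiers, and the helper names top_products_for_user and top_users_for_product are all hypothetical; in a real CF system the scores would come from a trained model rather than being hard-coded.

```python
# Hypothetical predicted scores: scores[user][product] -> predicted rating.
# In practice these values would be produced by a trained CF model.
scores = {
    "u1": {"book_a": 4.5, "book_b": 2.0, "book_c": 3.8},
    "u2": {"book_a": 3.1, "book_b": 4.9, "book_c": 1.2},
    "u3": {"book_a": 4.0, "book_b": 3.3, "book_c": 4.7},
}


def top_products_for_user(scores, user, n):
    """Return the n products with the highest predicted score for one user."""
    ranked = sorted(scores[user].items(), key=lambda kv: kv[1], reverse=True)
    return [product for product, _ in ranked[:n]]


def top_users_for_product(scores, product, n):
    """Return the n users with the highest predicted score for one product."""
    candidates = [
        (user, prods[product]) for user, prods in scores.items() if product in prods
    ]
    ranked = sorted(candidates, key=lambda kv: kv[1], reverse=True)
    return [user for user, _ in ranked[:n]]


print(top_products_for_user(scores, "u1", 2))      # ['book_a', 'book_c']
print(top_users_for_product(scores, "book_a", 2))  # ['u1', 'u3']
```

Both helpers do the same thing along different axes of the score table: rank by predicted score, then keep the first n entries.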

With the help of the given dataset and the recommendation model you have built, answer the following
questions:

Q1. How many unique users and unique books are there?

Q2. What percentage of books have received a rating of 3 or less?

Q3. After tuning parameters such as rank, maxIter and regParam, what is the best RMSE that you
obtained?

Q4. Using the recommendation engine based on the best RMSE obtained,

a) What are the top 5 book title recommendations made for each user?
b) What are the top 5 user recommendations made for each book title?

Q5. For user 1, which of the book titles recommended by your model actually appear in the
user's "to read" list? What do you conclude from this?
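
The two measurements behind Q3 and Q5 can be sketched on toy data in plain Python. Everything below is an illustrative assumption: the ratings, predictions and book titles are made up, and the helper names rmse and overlap are hypothetical. The assignment itself expects these to be computed on the GoodReads data with Spark.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def overlap(recommended, to_read):
    """Recommended titles that also appear in the to-read list, in ranked order."""
    wanted = set(to_read)
    return [title for title in recommended if title in wanted]


# Hypothetical held-out ratings and the model's predictions for them.
actual = [5.0, 3.0, 4.0, 1.0]
predicted = [4.0, 3.0, 2.0, 1.0]
print(rmse(actual, predicted))

# Hypothetical top-5 recommendations and to-read list for one user.
recommended = ["Book A", "Book B", "Book C", "Book D", "Book E"]
to_read = ["Book C", "Book X", "Book A"]
print(overlap(recommended, to_read))  # ['Book A', 'Book C']
```

A lower RMSE means the predicted ratings sit closer to the held-out ratings. The overlap list gives a simple, direct check of whether the recommendations match what the user already intends to read.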

Notes:
• This is a take-home assignment to be carried out by each learner group independently.
• This is a programming exercise that must use the given dataset, in a Jupyter notebook or
Apache Zeppelin notebook environment.
• You may consult or discuss peripheral aspects, such as the environment, with other learners, but
not the design or implementation of solutions to the specific problems.
• Write appropriate Python code in a Jupyter / Zeppelin notebook to support your answers, and
submit it using the following naming convention:
Final document - BDS_Assignment3_<Group_ID>.ipynb / Zeppelin notebook
• Provide appropriate justification when processing the data or arriving at conclusions.
• For generic queries, learners are encouraged to use the discussion forums; otherwise they can
reach me at ppawar@wilp.bits-pilani.ac.in.
• Manage your efforts properly, as there is no scope to shift the deadline announced above.

References:
1) Collaborative Filtering
2) Apache Spark Collaborative Filtering documentation
3) ALS algorithm
4) Large-scale Parallel Collaborative Filtering for the Netflix Prize
5) GoodReads Dataset
6) Apache Zeppelin
