
Exploiting Review Helpfulness Rating in Scalable Recommendation Systems Using Apache Spark

Presenter: Farrah Munir, Prospective Lecturer, FAST NU Lahore
Supervisor: Dr. Zareen Alamgir, FAST NU Lahore
Agenda

• Introduction to Recommendation Systems and Techniques
• Introduction to Spark
• Problem Statement
• Literature Review
• Proposed Model: Distributed DualRec
• Experiments & Results
• Future Work
Introduction to Recommendation Systems
• Why do we need recommendation systems?
Examples
• Amazon
• Yelp
• Netflix
• TripAdvisor
• eBay
Recommendation Techniques

Content-based filtering
• Matches user and item profiles/content
- Domain specific
- Static interests assumption

Collaborative filtering
• Uses known item ratings
+ Captures dynamic interests
- Sparsity & cold-start problem
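Content-based filtering can be illustrated with TF-IDF term profiles, the representation used by the 2011 content-based work in the literature review. The sketch below is a minimal example under stated assumptions: items are described by short texts, the user profile is taken to be the sum of the TF-IDF vectors of the items the user liked, and unseen items are ranked by cosine similarity to that profile. The function names and the toy item texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_content_based(item_texts, liked, top_n=2):
    """Rank items the user has not liked by similarity to the user's TF-IDF profile."""
    vectors = tfidf_vectors(item_texts)
    # Hypothetical user profile: sum of the TF-IDF vectors of the liked items.
    profile = {}
    for idx in liked:
        for term, weight in vectors[idx].items():
            profile[term] = profile.get(term, 0.0) + weight
    candidates = [i for i in range(len(item_texts)) if i not in liked]
    candidates.sort(key=lambda i: cosine(profile, vectors[i]), reverse=True)
    return candidates[:top_n]
```

For example, with the items `["thai noodle soup", "thai noodle curry", "italian pizza", "thai street food"]` and a user who liked item 0, the two Thai dishes rank first, which also shows the "domain specific" limitation: the model can only recommend items whose content resembles what the user already liked.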

Memory-based
• Uses the nearest-neighborhood approach
• Can find item-item or user-user similarity
• Uses similarity measures such as Pearson correlation and cosine similarity

Matrix factorization model
• Uses a latent-factor-based, low-dimensional representation
• Gradient descent
- non-convex, slowly convergent
• Alternating least squares (ALS)
+ fast convergent
+ easily scalable and parallelizable
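Memory-based CF can be sketched as a user-based nearest-neighbour predictor. This minimal example assumes ratings are stored as a dict of dicts (user -> item -> rating), computes Pearson correlation (or cosine similarity) over co-rated items, and predicts a missing rating as the user's mean plus the similarity-weighted, mean-centred ratings of the most similar users who rated the item. Keeping only positively correlated neighbours is a simplifying choice made here, not something the slides specify; all names and data are hypothetical.

```python
import math


def mean(ratings):
    """Mean of one user's ratings (dict: item -> rating)."""
    return sum(ratings.values()) / len(ratings)


def pearson(a, b):
    """Pearson correlation of two users over the items both rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * math.sqrt(
        sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def cosine(a, b):
    """Cosine similarity of two users over their co-rated items."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    den = math.sqrt(sum(a[i] ** 2 for i in common)) * math.sqrt(
        sum(b[i] ** 2 for i in common))
    return dot / den if den else 0.0


def predict(ratings, user, item, sim=pearson, k=2):
    """User-based kNN: user's mean plus weighted mean-centred neighbour ratings."""
    scored = sorted(
        ((sim(ratings[user], ratings[v]), v) for v in ratings
         if v != user and item in ratings[v]),
        reverse=True,
    )[:k]
    # Keep only positively correlated neighbours (a common simplification).
    neighbours = [(s, v) for s, v in scored if s > 0]
    den = sum(s for s, _ in neighbours)
    if not den:
        return mean(ratings[user])
    num = sum(s * (ratings[v][item] - mean(ratings[v])) for s, v in neighbours)
    return mean(ratings[user]) + num / den
```

With toy data in which `alice` rates like `bob` and unlike `carol`, `predict(ratings, "alice", "i4")` follows bob's above-average rating of `i4`. Passing `sim=cosine` swaps in the other similarity measure named on the slide.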
Recommendation System Representation

Matrix Factorization model

https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0
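The matrix factorization model approximates the rating matrix as R ≈ U Vᵀ, with low-dimensional user and item factors. Below is a minimal, single-machine ALS sketch in plain Python under assumed settings (rank k = 2, regularization λ = 0.1, random initialization). It is meant to show why ALS parallelizes well, not to reproduce Spark's implementation: each half-step fixes one factor matrix and solves an independent ridge-regression problem per user (or per item).

```python
import random


def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_update(fixed, observed, k, lam):
    """One ALS sub-problem: solve (F^T F + lam*I) x = F^T r over the observed entries."""
    a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for idx, r in observed:
        f = fixed[idx]
        for i in range(k):
            b[i] += f[i] * r
            for j in range(k):
                a[i][j] += f[i] * f[j]
    return solve(a, b)


def als(ratings, n_users, n_items, k=2, lam=0.1, iters=50, seed=0):
    """Factorize observed (user, item, rating) triples as R ~ U V^T."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        # With V fixed, every user row is an independent least-squares problem,
        # so these solves could run in parallel; the same holds for items.
        U = [ridge_update(V, by_user[u], k, lam) for u in range(n_users)]
        V = [ridge_update(U, by_item[i], k, lam) for i in range(n_items)]
    return U, V


def predict(U, V, u, i):
    """Predicted rating: dot product of the user and item latent factors."""
    return sum(x * y for x, y in zip(U[u], V[i]))
```

For example, `U, V = als([(0, 0, 5), (0, 2, 1), (1, 0, 4)], n_users=2, n_items=3)` followed by `predict(U, V, 1, 2)` estimates user 1's unknown rating of item 2. Because every `ridge_update` call in a half-step touches only one row, the calls can be spread across workers, which is what makes ALS attractive on Spark.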
Motivation
Use reviews, helpfulness votes, and users' dual role
yelp.com
amazon.com

https://www.amazon.com

https://www.yelp.com/biz/peoples-bistro-san-francisco-4?osq=people+bistro
Problem statement

• Users' dual role can also be exploited in scalable recommendation systems to deal with large-scale rating sparsity and the cold-start problem, by integrating review helpfulness with ratings to infer implicit ratings.
• Achieving a performance improvement in a parallel, distributed environment will demonstrate the system's practicality, efficiency, scalability, and accuracy in real terms.
Background
Parallel Processing Frameworks
Hadoop vs. Spark

Hadoop
• MapReduce programming paradigm
• A job is divided into map and reduce tasks
• Input and output data are stored in the distributed file system HDFS, with replication
• Data partitioning, synchronization, failure recovery, and inter-machine communication are auto-managed
• A master coordinates and monitors cluster nodes

Spark
• Higher-level development
• Integrates other computing tools
• Faster than Hadoop: in-memory computations using immutable resilient distributed datasets (RDDs)
• Lazy computations (transformations vs. actions)
• Optimized task scheduling
• Failure recovery using a dependency/lineage graph
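Spark's lazy computation and lineage-based recovery can be illustrated with a toy, single-process class. This is not Spark's API and nothing is distributed; it only mimics the idea that transformations such as `map` and `filter` record steps in a lineage, while an action such as `collect` triggers evaluation. In PySpark the corresponding calls are `rdd.map`, `rdd.filter`, and `rdd.collect()`; the class name `LazyDataset` is hypothetical.

```python
class LazyDataset:
    """Toy single-machine stand-in for a Spark RDD (not Spark's API).

    Transformations (map, filter) only record a step in the lineage; the action
    (collect) replays the lineage over the source data to produce a result.
    """

    def __init__(self, source, lineage=()):
        self._source = source
        self.lineage = tuple(lineage)  # immutable, ordered (kind, function) steps

    def map(self, fn):
        # Transformation: returns a new dataset; nothing is computed yet.
        return LazyDataset(self._source, self.lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy.
        return LazyDataset(self._source, self.lineage + (("filter", pred),))

    def collect(self):
        # Action: evaluate the recorded lineage from the source. Replaying it is
        # also how lost data can be recomputed instead of restored from a replica.
        data = list(self._source)
        for kind, fn in self.lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data
```

Building `LazyDataset(range(5)).map(square).filter(is_even)` runs no user code; only `collect()` does. Because the lineage is kept, a lost result can be recomputed by replaying the steps from the source, which is the recovery idea on the slide.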
Hadoop & Spark Frameworks
Iterative operations on Hadoop vs. Spark

https://www.tutorialspoint.
Literature Review 1

2009. Yehuda Koren, Robert Bell, and Chris Volinsky. “Matrix factorization techniques for recommender systems”. In: Computer 8 (2009), pp. 30–37.
  Approach: latent factor MF; gradient descent; ALS; bias factors
  Pros: captures dynamic interests; CF models are more accurate than CB models
  Cons: cold start; sparsity
2011. Sandra Garcia Esparza, Michael P. O’Mahony, and Barry Smyth. “Effective product recommendation using the real-time web”. In: Research and Development in Intelligent Systems XXVII. Springer, 2011, pp. 5–18.
  Approach: CB model using TF-IDF scores; term-based user index/profile
  Pros: handles novelty better than CF
  Cons: less accurate than CF models
2010. Damien Poirier, Françoise Fessant, and Isabelle Tellier. “Reducing the cold-start problem in content recommendation through opinion classification”. In: Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on. Vol. 1. IEEE, 2010, pp. 204–207.
  Approach: review sentiment analysis; learned classifiers (Naïve Bayes, SVM); positive/negative overall review score used to augment the rating
  Pros: more accurate predictions
  Cons: captures only the overall opinion
Literature Review 2
2013. Weishi Zhang et al. “Generating virtual ratings from Chinese reviews to augment online recommendations”. In: ACM Transactions on Intelligent Systems and Technology (TIST) 4.1 (2013), p. 9.
  Approach: sentiment analysis of review elements; accumulation of emoticons and opinion words into a sentiment score
  Pros: fine-grained opinion mining; improved accuracy
  Cons: interests are not feature-specific
2013. Julian McAuley and Jure Leskovec. “Hidden factors and hidden topics: understanding rating dimensions with review text”. In: Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 2013, pp. 165–172.
  Approach: review topic modeling with LDA
  Pros: automatic topic modeling; topic-specific sentiment orientation
2013. Li Chen and Feng Wang. “Preference-based clustering reviews for augmenting e-commerce recommendation”. In:
  Approach: mapping topics onto aspects using LDA
  Pros: aspect-aware recommendation
Literature Review 3
2018. Zhiyong Cheng et al. “Aspect-Aware Latent Factor Model: Rating Prediction with Ratings and Reviews”. In: arXiv preprint arXiv:1802.07938 (2018).
  Approach: LDA with local aspect preference weights
  Pros: uses specific local aspect ratings
2012. Sindhu Raghavan, Suriya Gunasekar, and Joydeep Ghosh. “Review quality aware collaborative filtering”. In: Proceedings of the sixth ACM conference on Recommender systems. ACM, 2012, pp. 123–130.
  Approach: helpfulness votes used to measure review quality
  Pros: spam review detection
2018. Pei-Ju Lee, Ya-Han Hu, and Kuan-Ting Lu. “Assessing the helpfulness of online hotel reviews: A classification-based approach”. In: Telematics and Informatics 35.2 (2018), pp. 436–445.
  Approach: helpfulness prediction with classification models (SVM, regression, RF); assessed review quality parameters
  Pros: reviewer characteristics are a more powerful factor than review elements
Literature Review 4

2015. Suhang Wang, Jiliang Tang, and Huan Liu. “Toward dual roles of users in recommender systems”. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 2015, pp. 1651–1660.
  Approach: user dual role as reviewer and helpfulness voter; ALS CF approach
  Pros: alleviated the sparsity and cold-start problems
  Cons: local influence of emotions not considered
2018. Xuying Meng et al. “Exploiting emotion on reviews for recommender systems”. In: Proc. AAAI Conf. Artif. Intell. 2018.
  Approach: positive votes have a higher influence than negative votes
  Pros: incorporated the local and global impact of positive and negative votes
DualRec System
• Reviewer: rates the item and writes a review
• Rater: votes a review as helpful, cool, funny, inappropriate, etc.
• The vote is mapped onto a 1 to 5 scale as the helpfulness rating
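The slides do not say how votes are mapped onto the 1 to 5 scale. One simple possibility, shown here purely as a hypothetical sketch, is a linear rescaling of a vote score (for example a 0/1 helpful vote, or the fraction of helpful votes) from its range [lo, hi] onto [1, 5]; the function name and the mapping itself are assumptions.

```python
def to_helpfulness_rating(score, lo, hi):
    """Linearly rescale a vote score from [lo, hi] onto the 1-5 helpfulness scale.

    Hypothetical mapping: the slides only state that votes are scaled to 1-5.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if not lo <= score <= hi:
        raise ValueError("score must lie in [lo, hi]")
    return 1 + 4 * (score - lo) / (hi - lo)
```

Under this mapping a single "helpful" vote (score 1 on a 0/1 range) becomes 5, and a review that 3 of 4 voters found helpful (score 0.75 on [0, 1]) becomes 4.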

DualRec Framework
The yellow line shows that the reviewer has written a review; hence a rating exists for that item, as the purple line indicates. The solid green line shows a Type-I helpfulness rating, where the rater's rating of the item is known. The dotted green line shows a Type-II helpfulness rating, i.e., the rater's actual rating of that item does not exist.
Review and Helpfulness Vote in the DualRec System
• Type-I helpfulness: the item ratings of both the reviewer and the rater are known
• Type-II helpfulness: only the reviewer's item rating is known

H21 is a Type-I helpfulness rating.

Figure: Type-II helpfulness rating (labels: Reviewer, Review, Rate)
Yellow line shows a review written by a user
Black line shows a rating (R) or helpfulness rating (H) given by a user
Grey line shows that a review is associated with a product
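The Type-I/Type-II distinction amounts to a lookup: a helpfulness rating is Type-I when the rater has also rated the reviewed item, and Type-II otherwise. A minimal sketch, assuming hypothetical in-memory structures: helpfulness ratings as (rater, review_id, h) triples, reviews as a dict from review_id to (reviewer, item), and known ratings as a dict keyed by (user, item).

```python
def classify_helpfulness(helpfulness, reviews, ratings):
    """Split helpfulness ratings into Type-I and Type-II lists.

    helpfulness: list of (rater, review_id, h) triples
    reviews:     dict review_id -> (reviewer, item)
    ratings:     dict (user, item) -> rating
    """
    type_1, type_2 = [], []
    for rater, review_id, h in helpfulness:
        # The reviewer's own rating of the item is assumed to exist, since
        # writing the review comes with a rating; only the rater's is checked.
        _reviewer, item = reviews[review_id]
        if (rater, item) in ratings:
            type_1.append((rater, review_id, h))  # rater's item rating known
        else:
            type_2.append((rater, review_id, h))  # rater's item rating missing
    return type_1, type_2
```

Type-II entries are the ones for which an implicit rating has to be inferred, since no actual rating from the rater exists for that item.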
Preliminaries
• Matrices are written as bold-face capital letters, and vectors are denoted as bold-face lower-case letters.

|P| denotes the number of elements in set P

Set of n users
Set of m products
Set of N reviews

R_ik = ? denotes an unknown rating of user i for item k


Basic matrix-factorization-based collaborative filtering technique
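For reference, the standard regularized matrix factorization objective (Koren et al., 2009) can be written in the notation of the preliminaries. The symbols here are assumed rather than taken from the slide: **u**_i and **v**_k are the latent factor vectors of user i and item k, Ω is the set of observed (i, k) pairs, Ω_i is the set of items rated by user i, and λ is a regularization weight. Setting the gradient with respect to **u**_i to zero gives the closed-form ALS user update (the item update is symmetric).

```latex
\min_{\mathbf{U},\mathbf{V}} \sum_{(i,k)\in\Omega} \left(R_{ik} - \mathbf{u}_i^{\top}\mathbf{v}_k\right)^2
  + \lambda\left(\|\mathbf{U}\|_F^2 + \|\mathbf{V}\|_F^2\right)

\mathbf{u}_i = \Big(\sum_{k\in\Omega_i} \mathbf{v}_k\mathbf{v}_k^{\top} + \lambda\mathbf{I}\Big)^{-1}
  \sum_{k\in\Omega_i} R_{ik}\,\mathbf{v}_k
```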
Implicit rating using rater role

Empirically, these simple features produced sufficiently good results with a linear regression model.

Yellow line shows a review written by a user, black line shows a rating (R) or helpfulness rating (H) given by a user, and grey line shows that a review is associated with a product.
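The slides report that a linear regression on simple features worked well for inferring implicit ratings, but the features themselves are not listed. The sketch below assumes, hypothetically, two features (the reviewer's rating of the item and the rater's helpfulness rating of the review) and uses Type-I pairs, where the rater's own rating is known, as training targets; the fitted model then infers an implicit rating for Type-II pairs. It fits ordinary least squares with an intercept via the normal equations, and the numbers are toy values, not data from this work.

```python
def _solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(X, y):
    """OLS with intercept via the normal equations (A^T A) w = A^T y; returns [w0, w1, ...]."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    d = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return _solve(ata, aty)


def infer_implicit_rating(w, features, lo=1.0, hi=5.0):
    """Apply the fitted model to a Type-II pair and clip to the rating scale."""
    raw = w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))
    return min(hi, max(lo, raw))


# Toy Type-I training pairs: (reviewer's item rating, helpfulness rating) -> rater's rating.
X_train = [(5, 5), (4, 2), (2, 4), (1, 1), (3, 5)]
y_train = [4.75, 3.5, 3.0, 1.75, 3.75]
```

With this toy data, `w = fit_linear_regression(X_train, y_train)` recovers weights close to `[1.0, 0.5, 0.25]`, and `infer_implicit_rating(w, (4, 4))` gives an implicit rating of about 4.0 for a Type-II pair.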
Implicit rating using rater role
Objective Function
Proposed methodology
Datasets
Results of sequential DualRec
DualRec vs. MF
Experiments & Results
Tuning α and β
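The slides do not state which error metric the results use. RMSE and MAE are common choices for rating prediction, so the hypothetical sketch below shows both, together with a plain grid search over candidate α and β values. `train_and_eval` is a placeholder for whatever trains DualRec with a given α and β and returns a validation error; it is not defined in this work.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def grid_search(train_and_eval, alphas, betas):
    """Return (best_alpha, best_beta, best_error) minimizing the validation error."""
    best = None
    for alpha in alphas:
        for beta in betas:
            err = train_and_eval(alpha, beta)  # hypothetical training callback
            if best is None or err < best[2]:
                best = (alpha, beta, err)
    return best
```

An exhaustive grid is the simplest tuning strategy; its cost grows with the product of the candidate list sizes, which matters when each evaluation trains a model on a large dataset.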
Thesis II work

• Converting the naïve Spark ALS algorithm into a distributed DualRec model.
• Preparing a large dataset (at least 1 GB) of helpfulness ratings and trusted users to evaluate distributed DualRec performance.
• Comparing the results of sequential vs. parallel implementations.
• Analyzing the effect of the α and β parameters on a large dataset.
• Observing the system's increased accuracy on big, sparse data and for cold-start users.
Future research directions

• As aspect-aware latent factor models are gaining popularity, the distributed DualRec framework could in future be combined with these models to give a combined rating based on reviews and helpfulness.
• Another future direction could be to incorporate contemporary approaches that use social networks for recommendations, with the addition of helpfulness.
• The DualRec system currently uses a linear regression model to infer implicit ratings from helpfulness. A different feature-based model could be adopted to achieve higher accuracy.
References

• Suhang Wang, Jiliang Tang, and Huan Liu. “Toward dual roles of users in recommender systems”. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 2015, pp. 1651–1660.
• Yehuda Koren, Robert Bell, and Chris Volinsky. “Matrix factorization techniques for recommender systems”. In: Computer 8 (2009), pp. 30–37.
• Xuying Meng et al. “Exploiting emotion on reviews for recommender systems”. In: Proc. AAAI Conf. Artif. Intell. 2018.
• Sindhu Raghavan, Suriya Gunasekar, and Joydeep Ghosh. “Review quality aware collaborative filtering”. In: Proceedings of the sixth ACM conference on Recommender systems. ACM, 2012, pp. 123–130.
• Restaurants, Dentists, Bars, Beauty Salons, Doctors - Yelp. https://www.yelp.com/biz/peoples-bistro-san-francisco-4?osq=people+bistro. Accessed: 2019-01-15.
• Parallax Scrolling, Java Cryptography, YAML, Python Data Science, Java i18n, GitLab, TestRail, VersionOne, DBUtils, Common CLI, Seaborn, Ansible, LOLCODE, Current Affairs 2018, Apache Commons Collections. https://www.tutorialspoint.com. Accessed: 2019-01-15.
