Session 1 2
Survey Papers
Compulsory Readings:
1. J. Ben Schafer, Joseph A. Konstan, John Riedl (1999): Recommender systems in
e‐commerce. ACM Conference on Electronic Commerce
2. G. Adomavicius, A. Tuzhilin (2005): Toward the Next Generation of
Recommender Systems: A Survey of the State‐of‐the‐Art and Possible
Extensions. IEEE Transactions on Knowledge and Data Engineering 17(6): 734‐
749
Additional Readings:
3. S. S. Anand, B. Mobasher (2005): Intelligent techniques for web
personalization. In Intelligent Techniques for Web Personalization, pages 1–36.
Springer
Outline
• Introduction to Recommender Systems
– Data model, properties and types of Recommender Systems
– Issues of Recommender Systems
• Conclusions
Outline
• Intelligent Techniques for Web Personalization
• Classifications of Approaches to Personalization
– Individual Vs Collaborative
– Reactive Vs Proactive
– User Vs Item Information
– Memory Based Vs Model Based
– Client Side Vs Server Side
• Personalization Techniques
– Content-Based Filtering
– Traditional Collaborative Filtering
– Model Based Techniques
– Item-Based Collaborative Filtering
– Clustering Based Approaches
– Graph Theoretic Approaches
• Issues
– The Cold Start and Latency Problem
– Data Sparseness
– Scalability
– Privacy
– Recommendation List Diversity
– Adapting to User Context
– Using Domain Knowledge
– Managing the Dynamics in User Interests
– Robustness
– Trust
• Evaluation of Personalization Systems
Introduction to Recommender Systems
[Figure: a consumer facing Web pages, Usenet articles, e-mails, products, and e-commerce items]
Recommender systems are a technological
proxy for a social process
[Figure: recommendations from friends alongside recommendations from online systems]
Basic interaction paradigm of recommender
systems
Output (Recommendations):
“Books you might enjoy are…”
Recommender Systems (RS)
• Solution to information overload.
• Content-based
– RSs find items similar to the ones you liked in the
past
• Collaborative Filtering
– Users give ratings to items
– RS finds users similar to you (User similarity)
– RS suggests items that those users liked
CF is simple and effective, but ...
Everyday Examples of Recommendations
• Commonly one has hybrid systems which use all three kinds of
links in the previous picture
What do RSs achieve?
• Help people make decisions
– Examples:
• Where to spend attention
• Where to spend money
• Help maintain awareness
– Examples:
• New products
• New information
Recommender Systems
• RS – problem of information filtering
• RS – problem of machine learning
• Enhance user experience
– Assist users in finding information
– Reduce search and navigation time
• Increase productivity
• Increase credibility
• Mutually beneficial proposition
Recommendation types
• User attributes-based Recommendation
– Male, 18-35: Recommend The Matrix
• Content Similarity
– You liked The Matrix: recommend The Matrix
Reloaded
• Collaborative Filtering
– People with interests like yours also liked Forrest
Gump
This title is a textbook-style exposition on the topic, with its
information organized very clearly into topics such as compression,
indexing, and so forth. In addition to diagrams and example text
transformations, the authors use "pseudo-code" to present algorithms
in a language-independent manner wherever possible. They also
supplement the reading with mg--their own implementation of the
techniques. The mg C language source code is freely available on the
Web.
Personalized
Recommendation
Inputs to Recommender Systems
• Past transactions from users:
– which docs viewed
– content/attributes of documents
– which products purchased
– pages bookmarked
– explicit ratings (movies, books … )
• Current context:
– browsing history
– search(es) issued
• Explicit role/domain info:
– Role in an enterprise
– Document taxonomies
– Interest profiles
Sample Applications
• Ecommerce
– Product recommendations, e.g. Amazon
• Corporate Intranets
– Recommendation, finding domain experts, …
• Digital Libraries
– Finding pages/books people will like
• Medical Applications
– Matching patients to doctors, clinical trials, …
• Customer Relationship Management
– Matching customer problems to internal experts
Recommender Data Model (I)
• Set C={c1, …, cn} of users
• Set S={s1, …, sm} of items (e.g. products)
• Elements of C and S can each be described by a vector
– (a1, …, as) attributes of user profile
– (b1, …, bt) description of items (meta data, features, …)
• Goal of recommendation process: recommend new items for
an active user u
• Overview of process
1. User modelling (explicit or implicit, e.g. user rates items)
2. Personalization, generate list of recommended items
Recommender Data Model (II)
• Let u be a utility function that measures the usefulness of
item s to user c, i.e., u : C × S → R, where R is a totally
ordered set
• Then, for each user c ∈ C, we want to choose the item s′ ∈ S
that maximizes the user’s utility. More formally:
$$\forall c \in C, \quad s'_c = \arg\max_{s \in S} u(c, s)$$
[Ado./Tuz. (2005)]
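The selection rule above can be sketched in a few lines of Python. The utility table and the names `utility` and `best_item` are illustrative assumptions and do not come from the cited survey. In practice most utilities are unknown and have to be estimated before this step.

```python
# Hypothetical utility table u(c, s): usefulness of item s to user c.
utility = {
    ("alice", "book_a"): 4.0,
    ("alice", "book_b"): 2.5,
    ("bob", "book_a"): 1.0,
    ("bob", "book_b"): 3.5,
}


def best_item(user, items, u):
    """Return the item that maximizes u(user, item) over the candidates."""
    return max(items, key=lambda s: u[(user, s)])


items = ["book_a", "book_b"]
for user in ("alice", "bob"):
    print(user, "->", best_item(user, items, utility))
```

Because R only needs to be totally ordered, any comparable values (not just floats) would work as utilities in this sketch.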
RS as an Intelligent Technique for Web Personalization
[Figure: qualities of the recommendations and of the process: New to me, Fast, Engaging, Easy, Good]
• Possible solutions
– Use content-based approaches to integrate new items more easily
into the recommendation process
– Use collaborative filtering to allow “cross-domain”
recommendations
Scalability
• Algorithms are based on matching users and items
– The more items and users, the higher the computational effort to analyze the data
• Storage/memory and runtime complexity
• Alternatively, the quality of recommendations suffers
– Scalability of recommender systems is an issue in practice
1. Content based RS
2. Collaborative RS
3. Hybrid RS
Types of RS – Content based RS
Content based RS highlights
– Recommend items similar to those users
preferred in the past
– User profiling is the key
– Items/content usually denoted by keywords
– Matching “user preferences” with “item
characteristics” … works for textual information
– Vector Space Model widely used
Content-based Recommender Systems
• Basic idea
– Match user profile (interests, ratings, click history, …) with item set
– Often: systems recommend items that are similar to items that
the active user preferred in the past
• Important formalisms
– Representation of items (item model)
• Often as a vector of features
• E.g. Vector Space Model (VSM) for documents
– Representation of users (user model)
• E.g. Rating vector, learned user preferences
– Metrics to match items and users
• Calculate similarity between vectors
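A minimal sketch of this matching step, assuming items are described by keyword strings, weighted with tf*idf, and compared with cosine similarity. The keyword descriptions and the helper names `tfidf_vectors` and `cosine` are invented for illustration; the user profile is simply the vector of the one item the user liked.

```python
import math
from collections import Counter

# Hypothetical keyword descriptions of three items.
items = {
    "matrix": "sci-fi action hacker virtual reality",
    "matrix_reloaded": "sci-fi action hacker sequel",
    "forrest_gump": "drama romance history",
}


def tfidf_vectors(docs):
    """Build a sparse tf*idf vector (term -> weight) for every document."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for terms in tokenized.values() for t in set(terms))
    vectors = {}
    for name, terms in tokenized.items():
        counts = Counter(terms)
        vectors[name] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = tfidf_vectors(items)
profile = vectors["matrix"]  # user model: the item the user liked
scores = {
    name: cosine(profile, vec)
    for name, vec in vectors.items()
    if name != "matrix"
}
recommended = max(scores, key=scores.get)
print(recommended, scores)
```

Items that share no keywords with the profile score zero, which also illustrates why purely content-based lists tend to lack unexpected items.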
Content-based Recommender Systems-Methods
and Variants
• Demographic filtering, stereotypes
– Grundy example
• Bayesian networks
– Spam example
• Document modeling
– E.g. tf*idf
– Data mining/machine learning methods
– Classification, clustering of items, short example
– Decision trees
• User Customization
– User explicitly specifies interesting categories
• Rule-based systems
– Modeling expert knowledge or learning rules from user behaviour
• Knowledge based systems
Advantages Content-based Filtering
• No (or less pronounced) “new item” problem
• Usually good scalability
– Because most approaches are model-based
• Often no explicit profile acquisition needed
– Ratings not needed, transaction history sufficient
• Often no domain knowledge needed
– Item description sufficient
• Often quality of recommendation improves over time
– Better user model
Disadvantages Content-based Filtering
• Item model limited to analyzed features
– E.g. keywords in document or points-of-interests relevant for mobile
applications
– Features have to be available explicitly
• Overfitting, portfolio effect
– Recommendation based on similarity only
– No real new, unexpected items (diversity often poor)
• Cold start: often still a “new user” problem
– But usually less pronounced than with collaborative recommenders
– Rule-system example
• No new user problem: the user model is the implicitly observed user location
• But rule descriptions have to be provided (trigger)
Disadvantages Content-based Filtering
Content based RS - Limitations
– Not all content is well represented by keywords,
e.g. images
– Items represented by same set of features are
indistinguishable
– Overspecialization: unrated items not shown
– Users with thousands of purchases are a problem
– New user: No history available
– Shouldn’t show items that are too different, or
too similar
Case-based Recommenders
• A form of content-based recommendation
• Structured information with a well defined set of
features and feature values
– Travel information presented in its price, duration,
accommodation, location, mode of transport, etc.
– Job information presented in the job kinds, salary, business
category of each company, educational level, experience,
location etc.
• Information is represented as cases and the system
recommends the cases that are most similar to the user’s
preferences
Wolfgang Wörndl 47
Case-Based Reasoning
• Case-based recommendation originates in Case-Based
Reasoning (CBR)
– It is to solve new problems by reusing the solutions to
problems that have been previously solved and stored as
cases in a case-base
– Each case consists of a specification part, which describes
the problem and a solution part, which describes the
solution of the problem
• Solutions to similar prior problems are a useful starting point for
new problem solving
• “Users will like items similar to the ones they liked
before.”
Simple Example of Case-based
Recommendation
User query: “I want a laptop with 250GB HD, 1GB memory and 14 inch screen for $400”
Product #1
HD: 250 GB
Memory: 2 GB
Screen Size: 15 inch
Price: $550
Product #2
HD: 150 GB
Memory: 1 GB
Screen Size: 15 inch
Price: $450
Product #3
HD: 250 GB
Memory: 1 GB
Screen Size: 14.2 inch
Price: $500
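One way to rank the three laptops against the query is a normalized per-feature similarity, sketched below. The similarity measure, the equal feature weights, and the names `feature_ranges` and `similarity` are assumptions made for illustration; the slides do not state which measure is used.

```python
# Query and cases taken from the example; field names are hypothetical.
query = {"hd_gb": 250, "memory_gb": 1, "screen_inch": 14.0, "price_usd": 400}
cases = {
    "Product #1": {"hd_gb": 250, "memory_gb": 2, "screen_inch": 15.0, "price_usd": 550},
    "Product #2": {"hd_gb": 150, "memory_gb": 1, "screen_inch": 15.0, "price_usd": 450},
    "Product #3": {"hd_gb": 250, "memory_gb": 1, "screen_inch": 14.2, "price_usd": 500},
}


def feature_ranges(query, cases):
    """Value spread per feature, used to scale distances into [0, 1]."""
    ranges = {}
    for feature in query:
        values = [query[feature]] + [case[feature] for case in cases.values()]
        ranges[feature] = max(values) - min(values)
    return ranges


def similarity(query, case, ranges):
    """Mean of per-feature similarities, each 1 - normalized distance."""
    sims = []
    for feature, wanted in query.items():
        spread = ranges[feature]
        distance = abs(wanted - case[feature]) / spread if spread else 0.0
        sims.append(1.0 - distance)
    return sum(sims) / len(sims)


ranges = feature_ranges(query, cases)
ranked = sorted(
    cases,
    key=lambda name: similarity(query, cases[name], ranges),
    reverse=True,
)
print(ranked)
```

The distance here is symmetric, so a cheaper laptop is penalized as much as a more expensive one; a real system would likely use asymmetric measures for features such as price.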
Case-based Recommendation
Types of RS – Collaborative RS
Collaborative RS highlights
– Use other users’ recommendations (ratings)
to judge an item’s utility
– Key is to find users/user groups whose
interests match with the current user
– Vector Space model widely used (directions
of vectors are user specified ratings)
– More users, more ratings: better results
– Can account for items dissimilar to the ones
seen in the past too
– Example: Movielens.org
Types of RS – Collaborative RS
Collaborative RS - Limitations
– Different users might use different scales. Possible
solution: weighted ratings, i.e. deviations from
average rating
– Finding similar users/user groups isn’t very easy
– New user: No preferences available
– New item: No ratings available
– Demographic filtering is required
– Multi-criteria ratings are required
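The first limitation (different rating scales) and its suggested remedy can be illustrated with a short sketch. The two users and the helper `mean_centered` are hypothetical: each rating is replaced by its deviation from that user's own average.

```python
# Hypothetical ratings: same preferences, expressed on different scales.
ratings = {
    "generous_user": {"item_a": 5, "item_b": 4, "item_c": 5},
    "strict_user": {"item_a": 3, "item_b": 2, "item_c": 3},
}


def mean_centered(user_ratings):
    """Replace each rating by its deviation from the user's average rating."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    return {item: rating - mean for item, rating in user_ratings.items()}


centered = {user: mean_centered(r) for user, r in ratings.items()}
for user, deviations in centered.items():
    print(user, deviations)
```

After centering, both users show the same pattern of deviations, so a similarity metric no longer treats the strict user as disagreeing with the generous one.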
Collaborative Filtering (CF)
• Basic idea: System recommends items which
were preferred by similar users in the past
– Based on ratings
• Express preferences of the active user
• And also those of other users, hence the collaborative approach
– Works on user-item matrix
• Memory- or model-based
• No item meta data etc.!
• Assumption: Similar taste in the past implies similar taste
in future
• CF is a formalization of “word of mouth” among
buddies
General Process
1. Users rate items
2. Find the set S of users who have rated similarly to
the active user u in the past (the neighborhood)
– Similarity calculation
– Select the k nearest users to the active user
[Figure: the profile of the current user is compared with the profiles of users 1 to 5; documents from like-minded users’ profiles become the recommended documents]
Example (I)
Source: http://www.dfki.de/~jameson/ijcai03-tutorial/
Example (II)
Example (III)
Required Metrics
• Metric for user-user similarity
– Mean-squared difference
– Cosine
– Pearson/Spearman correlation
• Cosine similarity:
(Henze, 2006/7)
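As a rough illustration of two of the listed metrics, the sketch below computes cosine similarity (dot product of the two rating vectors divided by the product of their lengths) and the mean-squared difference. Restricting both to co-rated items is an assumption of this sketch, as are the example users and function names.

```python
import math


def co_rated(a, b):
    """Items rated by both users."""
    return sorted(set(a) & set(b))


def cosine_similarity(a, b):
    """Cosine of the angle between the two users' co-rated rating vectors."""
    common = co_rated(a, b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_squared_difference(a, b):
    """Average squared rating gap on co-rated items (lower = more similar)."""
    common = co_rated(a, b)
    if not common:
        return None
    return sum((a[i] - b[i]) ** 2 for i in common) / len(common)


# Hypothetical rating profiles on a 1-5 scale.
alice = {"item_a": 5, "item_b": 3, "item_c": 4}
bob = {"item_a": 4, "item_b": 2, "item_c": 5, "item_d": 1}

print(cosine_similarity(alice, bob))
print(mean_squared_difference(alice, bob))
```

Note that mean-squared difference is a distance rather than a similarity, so it has to be inverted or thresholded before it can be used to rank neighbors.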
Example Calculation
Pearson/Spearman Correlation
• Average rating is taken into account
– $\bar{r}$ is the vector of average ratings
• Not suitable for unary ratings
– Unary: Item is marked (or not)
• e.g. “Product was purchased”
– Binary: “good/bad“, “+/-“ etc.
– Scalar: Numerical rating (e.g. 1-5) etc.
– Consider only items which were rated by both users
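A minimal sketch of the Pearson correlation between two users, computed only over items both have rated. Taking each user's average over the co-rated items (rather than over all of that user's ratings) is one common variant and an assumption here; the example profiles are invented.

```python
import math


def pearson(a, b):
    """Pearson correlation of two users' ratings on their co-rated items."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        # No variation (e.g. unary data): the correlation is undefined.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical rating profiles on a 1-5 scale.
alice = {"item_a": 5, "item_b": 3, "item_c": 4}
bob = {"item_a": 4, "item_b": 2, "item_c": 3, "item_d": 1}
carol = {"item_a": 1, "item_b": 5, "item_c": 3}

print(pearson(alice, bob))    # same pattern, shifted scale
print(pearson(alice, carol))  # opposite pattern
```

With unary data every co-rated value is identical, the variance is zero, and the sketch falls back to 0.0, which is why the metric is not suitable for that kind of rating.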
Neighborhood of Similar Users
• Goal: Determine set S of users which are most similar
to the active user u
• Center-based
– S contains k most similar users
• Problem: if k is chosen too large, some of the users may not really be that similar
(outliers possible)
• Similarity threshold
– S contains all users with a similarity bigger than a threshold t
• Problem: maybe too few users in S
• Aggregate neighborhood
– Follow similarity threshold method first
– If S is too small (fewer than k users)
• Determine the “centroid” of set S and add the users most similar to the centroid (fewer
outliers than with the center-based method)
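The three neighborhood strategies can be sketched as follows, assuming dense rating vectors and cosine similarity. Recomputing the centroid after each added user and falling back to the center-based method when the threshold leaves S empty are assumptions of this sketch; the data and function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def center_based(active, others, k):
    """S = the k users most similar to the active user."""
    ranked = sorted(others, key=lambda u: cosine(active, others[u]), reverse=True)
    return ranked[:k]


def threshold_based(active, others, t):
    """S = all users whose similarity to the active user exceeds t."""
    return [u for u in others if cosine(active, others[u]) > t]


def aggregate(active, others, t, k):
    """Threshold first; then grow S toward size k around its centroid."""
    neighbors = threshold_based(active, others, t)
    if not neighbors:
        return center_based(active, others, k)  # assumed fallback
    while len(neighbors) < min(k, len(others)):
        vectors = [others[u] for u in neighbors]
        centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
        rest = [u for u in others if u not in neighbors]
        neighbors.append(max(rest, key=lambda u: cosine(centroid, others[u])))
    return neighbors


# Hypothetical ratings of four items by the active user and four others.
active = [5, 4, 1, 1]
others = {
    "u1": [5, 5, 1, 1],
    "u2": [4, 4, 2, 1],
    "u3": [1, 1, 5, 5],
    "u4": [3, 3, 3, 3],
}

print(center_based(active, others, k=2))
print(threshold_based(active, others, t=0.99))
print(aggregate(active, others, t=0.99, k=3))
```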
Required Metrics
• Metric for user-user similarity
– Mean-squared difference
– Cosine
– Pearson/Spearman correlation
• Problems
– The similarity of u to the individual members of S is not taken into account when their ratings are combined
• Solution: weight each neighbor’s rating by its similarity to u
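One common way to weight by similarity is to add a similarity-weighted average of the neighbors' deviations from their own mean ratings to the active user's mean rating. The sketch below assumes that variant; the slides do not specify the exact formula, and the neighbor data and the name `predict` are hypothetical.

```python
def predict(active_mean, neighbors, item):
    """Predict a rating from (similarity, ratings) pairs of the neighbors.

    Each neighbor contributes its deviation from its own mean rating,
    weighted by its similarity to the active user.
    """
    numerator = 0.0
    denominator = 0.0
    for sim, ratings in neighbors:
        if item not in ratings:
            continue
        neighbor_mean = sum(ratings.values()) / len(ratings)
        numerator += sim * (ratings[item] - neighbor_mean)
        denominator += abs(sim)
    if denominator == 0:
        return active_mean  # no neighbor rated the item
    return active_mean + numerator / denominator


# Hypothetical neighborhood: a close neighbor who liked item "x"
# and a distant neighbor who disliked it.
neighbors = [
    (0.9, {"x": 5, "y": 3}),
    (0.3, {"x": 2, "y": 4}),
]
prediction = predict(3.5, neighbors, "x")
print(prediction)
```

With equal weights the two deviations would cancel out and the prediction would stay at the active user's mean of 3.5; the weighting pulls it toward the opinion of the closer neighbor.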
CF Recommender (II)
• Note
– Many variations of algorithms in research literature
• For various application domains, with different properties
Collaborative Filtering
• Amazon and other commercial services use
some form of collaborative filtering
– Exact method usually not published
1. User level
– Highlighting interests, hobbies, and keywords
people have in common
2. Item level
– link the keywords to eCommerce (by RS algorithms)
Possible Improvement in RS
System transparency
– Help users understand how the RS works
– Example:
http://www.pandora.com/
Amazon.com
Result:
– Generate trust
– Convince users
Possible Improvement in RS
Multidimensionality of Recommendations
– Take into consideration the contextual
information
Examples:
Movie- Different context, Different rating
Travel
Possible Improvement in RS
Randomness and Nonintrusiveness
• Many recommender systems are intrusive in the sense
that they require explicit feedback from the user and
often at a significant level of user involvement
• Some recommender systems use nonintrusive rating
determination methods where certain proxies are used
to estimate real ratings
• However, nonintrusive ratings (such as time spent
reading an article) are often inaccurate and cannot fully
replace explicit ratings provided by the user
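A nonintrusive proxy of this kind might look like the sketch below, which maps reading time to an estimated rating on a 1 to 5 scale. The linear mapping, the 120-second cap, and the name `implicit_rating` are purely illustrative assumptions.

```python
def implicit_rating(seconds_read, full_read_seconds=120.0):
    """Estimate a 1-5 rating from reading time (hypothetical linear proxy).

    Reading time is clipped to [0, full_read_seconds] and scaled linearly.
    """
    fraction = min(max(seconds_read / full_read_seconds, 0.0), 1.0)
    return 1.0 + 4.0 * fraction


for seconds in (0, 60, 600):
    print(seconds, "->", implicit_rating(seconds))
```

The sketch also shows why such estimates are unreliable: a reader who left the article open while doing something else receives the top rating.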
Possible Improvement in RS
Other
– Privacy (CF methods)
One-way hash: easily computed in one direction,
computationally infeasible to invert
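As a hedged illustration of the one-way hash idea, user identifiers could be replaced by salted SHA-256 digests before ratings are shared. The function name `pseudonymize` and the salt handling are assumptions of this sketch, and hashing alone does not make rating data anonymous.

```python
import hashlib


def pseudonymize(user_id: str, salt: bytes) -> str:
    """Return a hex digest that stands in for the real user identifier."""
    return hashlib.sha256(salt + user_id.encode("utf-8")).hexdigest()


salt = b"example-salt"  # hypothetical; a real salt must be kept secret
token_alice = pseudonymize("alice", salt)
token_bob = pseudonymize("bob", salt)
print(token_alice)
print(token_bob)
```

The same identifier always maps to the same token, so ratings can still be grouped per user, while recovering the identifier from the token requires guessing candidate inputs.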
– Malicious use (recommendation spam)
Probabilistic techniques to determine the
honesty of a score (unusual pattern)
Possible Improvement in RS
Common business models adapted:
– Charge recipient of recommendations
– Provide incentives for giving ratings
– Targeted advertisements
– Charge owners of the items
Possible Improvement in RS
Complicated Problems
– People might change their minds afterwards