DMBAR Chapter 14 Association Rules and Collaborative Filtering

DATA MINING FOR BUSINESS ANALYTICS IN R

Galit Shmueli, Peter C. Bruce, Inbal Yahav,
Nitin R. Patel, Kenneth C. Lichtendahl, Jr.

Indian Adaptation by
O.P. Wali, Professor, Indian Institute of Foreign Trade

Copyright © 2022 by John Wiley & Sons, Inc. All rights reserved.
CHAPTER 14

Association Rules and Collaborative Filtering


14.1 ASSOCIATION RULES

Put simply, association rules, or affinity analysis, constitute a study of “what goes with what.” This method is also
called market basket analysis. Association rules are heavily used in retail for learning about items that are purchased
together, but they are also useful in other fields.

Discovering Association Rules in Transaction Databases


Association rules are commonly encountered in online recommendation systems (or recommender systems), where
customers examining an item or items for possible purchase are shown other items that are often purchased in
conjunction with the first item(s). The display from Amazon.com’s online shopping system illustrates the application
of rules like this under “Frequently bought together.”

Generating Candidate Rules


The idea behind association rules is to examine all possible rules between items in an if–then format, and select only
those that are most likely to be indicators of true dependence. We use the term antecedent to describe the IF part,
and consequent to describe the THEN part. In association analysis, the antecedent and consequent are sets of items
(called itemsets) that are disjoint (do not have any items in common).
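
As a rough illustration of the if-then format, the sketch below enumerates every way to split one itemset into a non-empty antecedent and a disjoint, non-empty consequent. The function name `candidate_rules` and the three grocery items are hypothetical, chosen only for this example.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Return all (antecedent, consequent) splits of `itemset`.

    Both parts are non-empty and disjoint, matching the IF/THEN format.
    """
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = frozenset(items) - frozenset(antecedent)
            rules.append((frozenset(antecedent), consequent))
    return rules

# Hypothetical three-item itemset
rules = candidate_rules({"bread", "milk", "eggs"})
for antecedent, consequent in rules:
    print(f"IF {sorted(antecedent)} THEN {sorted(consequent)}")
```

An itemset with k items yields 2^k - 2 such splits, which is why the number of candidate rules grows quickly and a selection step is needed.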

The Apriori Algorithm


Several algorithms have been proposed for generating frequent itemsets, but the classic algorithm is the Apriori
algorithm of Agrawal et al. (1993). The key idea of the algorithm is to begin by generating frequent itemsets with just
one item (one-itemsets) and to recursively generate frequent itemsets with two items, then with three items, and so
on, until we have generated frequent itemsets of all sizes.
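
The level-wise idea can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than a reference implementation: the function name `frequent_itemsets`, the minimum support count of 3, and the five transactions are all assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support_count):
    """Level-wise (Apriori-style) search for frequent itemsets.

    Returns a dict mapping each frequent itemset to its support count.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Number of transactions that contain each candidate itemset
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent one-itemsets
    singletons = {frozenset([item]) for t in transactions for item in t}
    current = {c: n for c, n in count(singletons).items()
               if n >= min_support_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items()
                   if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = frequent_itemsets(transactions, min_support_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is what keeps the search manageable: a larger itemset is only counted if all of its smaller subsets were already found to be frequent.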

Selecting Strong Rules


From the abundance of rules generated, the goal is to find only the rules that indicate a strong dependence between
the antecedent and consequent itemsets. To measure the strength of association implied by a rule, we use the
measures of confidence and lift ratio.
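
A small sketch of the two measures, using their standard definitions: confidence is the share of transactions containing the antecedent that also contain the consequent, and the lift ratio divides that confidence by the consequent's overall share of transactions (the benchmark confidence). The transactions and helper names are hypothetical.

```python
# Hypothetical transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def confidence(antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)

def lift(antecedent, consequent):
    """Confidence divided by the benchmark confidence of the consequent."""
    benchmark = support_count(consequent) / len(transactions)
    return confidence(antecedent, consequent) / benchmark

conf = confidence({"diapers"}, {"beer"})   # 3 of the 4 diaper transactions
lift_ratio = lift({"diapers"}, {"beer"})   # 0.75 against a benchmark of 3/5
print(conf, lift_ratio)
```

A lift ratio above 1 suggests the antecedent and consequent occur together more often than they would if they were independent.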

Data Format
Transaction data are usually displayed in one of two formats: a transactions database (with each row representing a
list of items purchased in a single transaction), or a binary incidence matrix in which columns are items, rows again
represent transactions, and each cell has either a 1 or a 0, indicating the presence or absence of an item in the
transaction.
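
The two formats carry the same information, as the following sketch suggests. It converts a small, hypothetical list of transactions into a binary incidence matrix and back; the variable names are illustrative only.

```python
# Transactions database: one list of purchased items per transaction (hypothetical)
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers"],
]

# Column order of the incidence matrix: all distinct items, sorted
items = sorted({item for t in transactions for item in t})

# Binary incidence matrix: rows are transactions, columns are items
matrix = [[1 if item in t else 0 for item in items] for t in transactions]

# Converting back recovers each transaction's items (in column order)
recovered = [[item for item, flag in zip(items, row) if flag] for row in matrix]

print(items)
for row in matrix:
    print(row)
```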

The Process of Rule Selection


The process of selecting strong rules is based on generating all association rules that meet stipulated support and
confidence requirements. This is done in two stages. The first stage, described earlier, consists of finding all “frequent”
itemsets, those itemsets that have a requisite support. In the second stage, we generate, from the frequent itemsets,
association rules that meet a confidence requirement.
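
The second stage can be sketched as follows, assuming the first stage has already produced support counts for the frequent itemsets. The counts below are hypothetical (they correspond to five made-up transactions with a minimum support count of 3), and the function name `rules_from_frequent` and the 0.8 confidence cutoff are illustrative choices.

```python
from itertools import combinations

# Assumed output of stage 1: support counts of the frequent itemsets
support_count = {
    frozenset(["bread"]): 4,
    frozenset(["milk"]): 4,
    frozenset(["diapers"]): 4,
    frozenset(["beer"]): 3,
    frozenset(["bread", "milk"]): 3,
    frozenset(["bread", "diapers"]): 3,
    frozenset(["milk", "diapers"]): 3,
    frozenset(["diapers", "beer"]): 3,
}

def rules_from_frequent(support_count, min_confidence):
    """Stage 2: keep rules whose confidence meets the requirement."""
    rules = []
    for itemset, count in support_count.items():
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                # Subsets of a frequent itemset are frequent, so the
                # antecedent's count is available from stage 1.
                conf = count / support_count[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

strong = rules_from_frequent(support_count, min_confidence=0.8)
for antecedent, consequent, conf in strong:
    print(sorted(antecedent), "->", sorted(consequent), conf)
```

With these counts only one rule clears the cutoff: every transaction that contains beer also contains diapers, so the rule from beer to diapers has confidence 1.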

Interpreting the Results


In interpreting results, it is useful to look at the various measures. The support for the rule indicates its impact in
terms of overall size: How many transactions are affected? If only a small number of transactions are affected, the rule
may be of little use (unless the consequent is very valuable and/or the rule is very efficient in finding it).
Rules and Chance
Two principles can guide us in assessing rules for possible spuriousness due to chance effects:
1) The more records the rule is based on, the more solid the conclusion.
2) The more distinct rules we consider seriously (perhaps consolidating multiple rules that deal with the same
items), the more likely it is that at least some will be based on chance sampling results.

• Leverage computes the difference between the observed frequency of A and C appearing together and
the frequency that would be expected if A and C were independent. A leverage value of 0 indicates
independence.
• The implication is that lift may find very strong associations for less frequent items, while leverage
tends to prioritize items with higher frequencies/support in the dataset.
• Conviction compares the probability that X would appear without Y if they were independent with the actual
frequency of the appearance of X without Y. Unlike confidence, conviction factors in both P(X) and
P(Y) and always has a value of 1 when the relevant items are completely unrelated.
• Conviction is a directed measure. Hence, while lift is the same for both (Eggs→Bacon) and
(Bacon→Eggs), conviction is different between the two, with Conv(Eggs→Bacon) being much higher.
Thus, you can use conviction to evaluate the directional relationship between your items.
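
A brief sketch of both measures, using the common formulas leverage = P(A and C) - P(A)P(C) and conviction = (1 - P(Y)) / (1 - confidence(X→Y)). The supports are hypothetical (diapers in 4 of 5 transactions, beer in 3, both in 3), and the function names are illustrative.

```python
import math

def leverage(p_a, p_c, p_ac):
    """Observed joint frequency minus the frequency expected under independence."""
    return p_ac - p_a * p_c

def conviction(p_x, p_y, p_xy):
    """Expected frequency of X without Y under independence,
    divided by the observed frequency of X without Y."""
    confidence = p_xy / p_x
    if confidence >= 1.0:
        return math.inf  # X never appears without Y
    return (1 - p_y) / (1 - confidence)

# Hypothetical supports from 5 transactions
n = 5
p_diapers, p_beer, p_both = 4 / n, 3 / n, 3 / n

lev = leverage(p_diapers, p_beer, p_both)
conv_diapers_beer = conviction(p_diapers, p_beer, p_both)
conv_beer_diapers = conviction(p_beer, p_diapers, p_both)
print(lev, conv_diapers_beer, conv_beer_diapers)
```

Leverage is symmetric in its two itemsets, whereas the two conviction values differ: in this made-up data beer never appears without diapers, so conviction for that direction is unbounded.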
14.2 COLLABORATIVE FILTERING

Collaborative filtering is a popular technique used by recommendation systems. The term collaborative filtering
is based on the notions of identifying relevant items for a specific user from the very large set of items (“filtering”) by
considering preferences of many users (“collaboration”).

• This method makes automatic predictions (filtering) about the interests of a user by collecting preferences or taste
information from many users (collaborating). The underlying assumption of the collaborative filtering approach is
that if a person A has the same opinion as a person B on a set of items, A is more likely to have B's opinion for a
given item than that of a randomly chosen person.

Data Type and Format:


Collaborative filtering requires availability of all item–user information. Specifically, for each item–user combination,
we should have some measure of the user’s preference for that item. Preference can be a numerical rating or a
binary behavior such as a purchase, a ‘like’, or a click.

User-Based Collaborative Filtering: “People Like You”:


One approach to generating personalized recommendations for a user using collaborative filtering is based on finding
users with similar preferences, and recommending items that they liked but the user hasn’t purchased. The algorithm
has two steps:
1) Find users who are most similar to the user of interest (neighbors). This is done by comparing the preference of
our user to the preferences of other users.
2) Considering only the items that the user has not yet purchased, recommend the ones that are most preferred by
the user’s neighbors.
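
The two steps can be sketched as below. This is one possible reading, not a prescribed method: it assumes cosine similarity computed over co-rated items as the similarity measure, two neighbors, and the neighbors' average rating as the ranking score. The ratings, user names, and function names are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "B": 2, "C": 5, "E": 5},
    "dan": {"A": 4, "B": 5, "D": 5, "E": 2},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(ratings, user, k=2):
    # Step 1: the k users most similar to the user of interest
    others = [o for o in ratings if o != user]
    neighbors = sorted(others,
                       key=lambda o: cosine(ratings[user], ratings[o]),
                       reverse=True)[:k]
    # Step 2: among items the user has not rated, rank by neighbors' average rating
    seen = {}
    for neighbor in neighbors:
        for item, rating in ratings[neighbor].items():
            if item not in ratings[user]:
                seen.setdefault(item, []).append(rating)
    average = {item: sum(rs) / len(rs) for item, rs in seen.items()}
    return sorted(average, key=average.get, reverse=True)

print(recommend(ratings, "ann"))
```

Here the two nearest neighbors of "ann" are "bob" and "dan", and item "D" ranks first because both of them rated it highly.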

Item-Based Collaborative Filtering:


When the number of users is much larger than the number of items, it is computationally cheaper (and faster) to find
similar items rather than similar users. Specifically, when a user expresses interest in a particular item, the item-based
collaborative filtering algorithm has two steps:
1) Find the items that were co-rated or co-purchased (by any user) with the item of interest.
2) Recommend the most popular or correlated item(s) among the similar items.
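
A minimal sketch of these two steps for binary purchase data. It assumes cosine similarity between item purchase vectors (users who bought both items, divided by the square root of the product of each item's buyer count) as the notion of "correlated". The purchases and the function name `similar_items` are hypothetical.

```python
from math import sqrt
from collections import defaultdict

# Hypothetical purchase data: user -> set of items bought
purchases = {
    "u1": {"laptop", "mouse", "bag"},
    "u2": {"laptop", "mouse"},
    "u3": {"laptop", "bag"},
    "u4": {"mouse", "keyboard"},
    "u5": {"laptop", "mouse", "keyboard"},
}

def similar_items(purchases, target):
    """Rank items co-purchased with `target` by cosine similarity."""
    # Invert the data: item -> set of users who bought it
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)

    target_buyers = buyers.get(target, set())
    scores = {}
    for item, users in buyers.items():
        if item == target:
            continue
        both = len(users & target_buyers)
        if both:  # Step 1: keep only items co-purchased with the target
            scores[item] = both / sqrt(len(users) * len(target_buyers))
    # Step 2: most similar items first
    return sorted(scores, key=scores.get, reverse=True)

print(similar_items(purchases, "laptop"))
```

Because the item-to-item similarities depend only on the purchase data and not on the current user, they can be computed ahead of time, which is part of why this variant is cheaper when users greatly outnumber items.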

Advantages and Weaknesses of Collaborative Filtering:


Collaborative filtering relies on the availability of subjective information regarding users’ preferences.
User-based collaborative filtering looks for similarity in terms of highly-rated or preferred items.
User-based collaborative filtering helps leverage similarities between people’s tastes for providing personalized
recommendations. Although the term “prediction” is often used to describe the output of collaborative filtering, this method is
unsupervised by nature. It can be used to generate predicted ratings or purchase indication for a user, but usually we
do not have the true outcome value in practice.
One important way to improve recommendations generated by collaborative filtering is by getting user feedback.

Collaborative Filtering vs. Association Rules


While collaborative filtering and association rules are both unsupervised methods used for generating
recommendations, they differ in several ways:
▪ Frequent itemsets vs. personalized recommendations
▪ Transactional data vs. user data
▪ Binary data and ratings data
▪ Two or more items

These distinctions are sharper for purchases and recommendations of non-popular items, especially when comparing
association rules to user-based collaborative filtering. When considering what to recommend to a user who
purchased a popular item, then association rules and item-based collaborative filtering might yield the same
recommendation for a single item.
14.3 SUMMARY

Association rules (also called market basket analysis) and collaborative filtering are unsupervised methods for
deducing associations between purchased items from databases of transactions. Association rules search for generic
rules about items that are purchased together. The main advantage of this method is that it generates clear, simple
rules of the form “IF X is purchased, THEN Y is also likely to be purchased.” The method is very transparent and easy
to understand.