DMBAR Chapter 14 Association Rules and Collaborative Filtering

DATA MINING FOR BUSINESS
ANALYTICS IN R
Galit Shmueli, Peter C. Bruce, Inbal Yahav,
Nitin R. Patel, Kenneth C. Lichtendahl, Jr.
Indian Adaptation by
O.P. Wali, Professor, Indian Institute of Foreign Trade
Copyright © 2022 by John Wiley & Sons, Inc. All rights reserved.
CHAPTER 14
Association Rules and Collaborative Filtering

14.1 ASSOCIATION RULES
Put simply, association rules, or affinity analysis, constitute a study of “what goes with what.” This method is also
called market basket analysis. Association rules are heavily used in retail for learning about items that are purchased
together, but they are also useful in other fields.
Discovering Association Rules in Transaction Databases

Association rules are commonly encountered in online recommendation systems (or recommender systems), where
customers examining an item or items for possible purchase are shown other items that are often purchased in
conjunction with the first item(s). The display from Amazon.com’s online shopping system illustrates the application
of rules like this under “Frequently bought together.”
Generating Candidate Rules

The idea behind association rules is to examine all possible rules between items in an if–then format, and select only
those that are most likely to be indicators of true dependence. We use the term antecedent to describe the IF part,
and consequent to describe the THEN part. In association analysis, the antecedent and consequent are sets of items
(called itemsets) that are disjoint (do not have any items in common).
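As an illustration of the if–then format, the sketch below lists every candidate rule that can be formed from a single itemset by splitting it into a non-empty antecedent and a disjoint, non-empty consequent. The function name `candidate_rules` and the three item names are hypothetical, chosen only for this example.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of an itemset.

    Both parts are non-empty and disjoint, matching the IF/THEN format.
    """
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield frozenset(antecedent), frozenset(consequent)

# Hypothetical three-item itemset
rules = list(candidate_rules({"red", "white", "green"}))
for antecedent, consequent in rules:
    print("IF", sorted(antecedent), "THEN", sorted(consequent))
```

An itemset with k items yields 2^k − 2 candidate rules of this kind, which is why the number of possible rules grows quickly with the number of items.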
The Apriori Algorithm

Several algorithms have been proposed for generating frequent itemsets, but the classic algorithm is the Apriori
algorithm of Agrawal et al. (1993). The key idea of the algorithm is to begin by generating frequent itemsets with just
one item (one-itemsets) and to recursively generate frequent itemsets with two items, then with three items, and so
on, until we have generated frequent itemsets of all sizes.
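The level-by-level idea can be sketched as follows. This is a minimal, unoptimized illustration under assumed names (`apriori_frequent_itemsets`, `min_support_count`) and a made-up set of five transactions; it is not the implementation used by any particular package.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support_count):
    """Return {itemset: count} for all itemsets meeting the minimum count."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Start with frequent one-itemsets
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if count(frozenset([i])) >= min_support_count}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = count(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if count(c) >= min_support_count}
    return frequent

# Made-up transactions for illustration
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori_frequent_itemsets(transactions, min_support_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is what keeps the search manageable: a larger itemset is only counted if all of its smaller subsets were already found to be frequent.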
Selecting Strong Rules

From the abundance of rules generated, the goal is to find only the rules that indicate a strong dependence between
the antecedent and consequent itemsets. To measure the strength of association implied by a rule, we use the
measures of confidence and lift ratio.
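A small sketch of how these measures can be computed from raw transactions is given below. It assumes the usual definitions: confidence is the share of transactions containing the antecedent that also contain the consequent, and the lift ratio is that confidence divided by the benchmark confidence (the consequent's overall share of transactions). The function name `rule_metrics` and the data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, and lift ratio for antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= set(t))
    count_c = sum(1 for t in transactions if consequent <= set(t))
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    confidence = count_both / count_a   # P(consequent | antecedent)
    benchmark = count_c / n             # P(consequent) with no antecedent information
    return {
        "support": count_both / n,
        "confidence": confidence,
        "lift": confidence / benchmark,
    }

# Made-up transactions for illustration
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
metrics = rule_metrics(transactions, {"diapers"}, {"beer"})
print({name: round(value, 3) for name, value in metrics.items()})
```

A lift ratio above 1 suggests that the antecedent and consequent appear together more often than would be expected if they were independent.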
Data Format
Transaction data are usually displayed in one of two formats: a transactions database (with each row representing a
list of items purchased in a single transaction), or a binary incidence matrix in which columns are items, rows again
represent transactions, and each cell has either a 1 or a 0, indicating the presence or absence of an item in the
transaction.
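The two formats carry the same information, and converting from one to the other is straightforward. The sketch below turns a list of transactions into a binary incidence matrix; the function name `to_incidence_matrix` and the item names are assumptions made for the example.

```python
def to_incidence_matrix(transactions):
    """Convert a list of item lists into (column names, rows of 0/1)."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix

# Made-up transactions: each inner list is one purchase
transactions = [
    ["red", "white", "green"],
    ["white", "orange"],
    ["white", "blue"],
]
items, matrix = to_incidence_matrix(transactions)
print(items)
for row in matrix:
    print(row)
```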
The Process of Rule Selection

The process of selecting strong rules is based on generating all association rules that meet stipulated support and
confidence requirements. This is done in two stages. The first stage, described earlier, consists of finding all “frequent”
itemsets, those itemsets that have a requisite support. In the second stage, we generate, from the frequent itemsets,
association rules that meet a confidence requirement.
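The two stages can be sketched end to end as below. To keep the example short, stage one uses plain enumeration of item combinations rather than Apriori, so it is only suitable for tiny datasets. The names `select_rules`, `min_support`, and `min_confidence`, and the data, are hypothetical.

```python
from itertools import combinations

def select_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for qualifying rules."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Stage 1: find frequent itemsets (brute force, smallest sizes first)
    items = sorted({i for t in transactions for i in t})
    frequent = []
    for size in range(1, len(items) + 1):
        found = [frozenset(c) for c in combinations(items, size)
                 if count(frozenset(c)) / n >= min_support]
        if not found:
            break  # no frequent itemsets of this size, so none of larger size
        frequent.extend(found)

    # Stage 2: split each frequent itemset into rules and keep the confident ones
    rules = []
    for itemset in frequent:
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = count(itemset) / count(antecedent)
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent,
                                  count(itemset) / n, confidence))
    return rules

# Made-up transactions for illustration
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
strong_rules = select_rules(transactions, min_support=0.6, min_confidence=0.8)
for antecedent, consequent, support, confidence in strong_rules:
    print(sorted(antecedent), "->", sorted(consequent),
          round(support, 2), round(confidence, 2))
```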
Interpreting the Results

In interpreting results, it is useful to look at the various measures. The support for the rule indicates its impact in
terms of overall size: How many transactions are affected? If only a small number of transactions are affected, the rule
may be of little use (unless the consequent is very valuable and/or the rule is very efficient in finding it).
Rules and Chance
Two principles can guide us in assessing rules for possible spuriousness due to chance effects:
1) The more records the rule is based on, the more solid the conclusion.
2) The more distinct rules we consider seriously (perhaps consolidating multiple rules that deal with the same
items), the more likely it is that at least some will be based on chance sampling results.
▪ Leverage computes the difference between the observed frequency of A and C appearing together and
the frequency that would be expected if A and C were independent. A leverage value of 0 indicates
independence.
▪ The implications are that lift may find very strong associations for less frequent items, while leverage
tends to prioritize items with higher frequencies/support in the dataset.
▪ Conviction compares the probability that X appears without Y if they were independent with the actual
frequency of X appearing without Y. Unlike confidence, conviction factors in both P(X) and
P(Y), and it always has a value of 1 when the relevant items are completely unrelated.
▪ Conviction is a directed measure. Hence, while lift is the same for both (Eggs→Bacon) and
(Bacon→Eggs), conviction is different between the two, with Conv(Eggs→Bacon) being much higher.
Thus, you can use conviction to evaluate the directional relationship between your items.
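The sketch below computes lift, leverage, and conviction from three support values, using the commonly given definitions: leverage = P(A and C) − P(A)·P(C), and conviction = (1 − P(C)) / (1 − confidence). The function name `interest_measures` and the probabilities are made up for illustration and are not taken from the eggs-and-bacon example.

```python
import math

def interest_measures(support_a, support_c, support_ac):
    """Compute confidence, lift, leverage, and conviction for the rule A -> C."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is unbounded when the rule never fails (confidence of 1)
    if confidence >= 1:
        conviction = math.inf
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}

# Hypothetical supports: P(A) = 0.4, P(C) = 0.5, P(A and C) = 0.3
forward = interest_measures(0.4, 0.5, 0.3)    # A -> C
backward = interest_measures(0.5, 0.4, 0.3)   # C -> A

for name, result in (("A -> C", forward), ("C -> A", backward)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

Swapping the antecedent and consequent leaves lift and leverage unchanged but changes conviction, which is what makes conviction a directed measure.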
14.2 COLLABORATIVE FILTERING
Collaborative filtering is a popular technique used by online recommendation systems. The term collaborative filtering
is based on the notions of identifying relevant items for a specific user from the very large set of items (“filtering”) by
considering preferences of many users (“collaboration”).
• This method makes automatic predictions (filtering) about the interests of a user by collecting preferences or taste
information from many users (collaborating). The underlying assumption of the collaborative filtering approach is
that if a person A has the same opinion as a person B on a set of items, A is more likely to have B's opinion for a
given item than that of a randomly chosen person.
Data Type and Format:

Collaborative filtering requires availability of all item–user information. Specifically, for each item–user combination,
we should have some measure of the user’s preference for that item. Preference can be a numerical rating or a
binary behavior such as a purchase, a ‘like’, or a click.
User-Based Collaborative Filtering: “People Like You”:

One approach to generating personalized recommendations for a user using collaborative filtering is based on finding
users with similar preferences, and recommending items that they liked but the user hasn’t purchased. The algorithm
has two steps:
1) Find users who are most similar to the user of interest (neighbors). This is done by comparing the preference of
our user to the preferences of other users.
2) Considering only the items that the user has not yet purchased, recommend the ones that are most preferred by
the user’s neighbors.
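The two steps can be sketched as below. This is a minimal illustration that assumes numerical ratings stored in a dictionary and uses cosine similarity over co-rated items as the similarity measure, which is one common choice among several. The names `cosine_similarity` and `recommend_user_based`, and the ratings, are hypothetical.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two users, using only items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend_user_based(ratings, user, k=1, top_n=2):
    """Recommend items the user has not rated, based on the k nearest neighbors."""
    # Step 1: find the k users most similar to the user of interest
    others = [u for u in ratings if u != user]
    neighbors = sorted(
        others,
        key=lambda u: cosine_similarity(ratings[user], ratings[u]),
        reverse=True,
    )[:k]

    # Step 2: average the neighbors' ratings on items the user has not rated
    collected = {}
    for neighbor in neighbors:
        for item, rating in ratings[neighbor].items():
            if item not in ratings[user]:
                collected.setdefault(item, []).append(rating)
    averaged = {item: sum(r) / len(r) for item, r in collected.items()}
    return sorted(averaged, key=lambda item: (-averaged[item], item))[:top_n]

# Made-up ratings on a 1-5 scale
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "C": 1, "D": 5, "E": 2},
    "cat": {"A": 1, "B": 2, "C": 5, "E": 5},
}
recommendations = recommend_user_based(ratings, "ann", k=1, top_n=1)
print(recommendations)
```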
Item-Based Collaborative Filtering:

When the number of users is much larger than the number of items, it is computationally cheaper (and faster) to find
similar items rather than similar users. Specifically, when a user expresses interest in a particular item, the item-based
collaborative filtering algorithm has two steps:
1) Find the items that were co-rated or co-purchased (by any user) with the item of interest.
2) Recommend the most popular or correlated item(s) among the similar items.
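A minimal sketch of these two steps for binary purchase data is shown below. It assumes cosine similarity between items, computed from the sets of users who bought each item, as the measure of how related two items are. The names `item_similarity` and `recommend_item_based`, and the purchase data, are hypothetical.

```python
from math import sqrt

def item_similarity(purchases, item_a, item_b):
    """Cosine similarity between two items based on who purchased them."""
    buyers_a = {u for u, items in purchases.items() if item_a in items}
    buyers_b = {u for u, items in purchases.items() if item_b in items}
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / sqrt(len(buyers_a) * len(buyers_b))

def recommend_item_based(purchases, item, top_n=1):
    """Recommend the items most similar to the item of interest."""
    all_items = {i for items in purchases.values() for i in items}
    # Step 1: score every other item by its similarity to the item of interest
    scores = {other: item_similarity(purchases, item, other)
              for other in all_items if other != item}
    # Step 2: keep only co-purchased items and return the most similar ones
    ranked = sorted((o for o in scores if scores[o] > 0),
                    key=lambda o: (-scores[o], o))
    return ranked[:top_n]

# Made-up purchase histories
purchases = {
    "u1": {"tent", "stove", "lantern"},
    "u2": {"tent", "stove"},
    "u3": {"tent", "lantern"},
    "u4": {"stove", "kettle"},
    "u5": {"tent", "stove"},
}
similar_items = recommend_item_based(purchases, "tent", top_n=1)
print(similar_items)
```

Because item-to-item similarities depend only on the purchase data and not on the current user, they can be computed ahead of time, which is part of why this approach scales well when users greatly outnumber items.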
Advantages and Weaknesses of Collaborative Filtering:

Collaborative filtering relies on the availability of subjective information regarding users’ preferences.
User-based collaborative filtering looks for similarity in terms of highly-rated or preferred items.
User-based collaborative filtering helps leverage similarities between people’s tastes for providing personalized
recommendations. Although the term “prediction” is often used to describe the output of collaborative filtering, this
method is unsupervised by nature. It can be used to generate predicted ratings or purchase indications for a user, but
usually we do not have the true outcome value in practice.
One important way to improve recommendations generated by collaborative filtering is by getting user feedback.
Collaborative Filtering vs. Association Rules

While collaborative filtering and association rules are both unsupervised methods used for generating
recommendations, they differ in several ways:
▪ Frequent itemsets vs. personalized recommendations
▪ Transactional data vs. user data
▪ Binary data and ratings data
▪ Two or more items
These distinctions are sharper for purchases and recommendations of non-popular items, especially when comparing
association rules to user-based collaborative filtering. When considering what to recommend to a user who
purchased a popular item, then association rules and item-based collaborative filtering might yield the same
recommendation for a single item.
14.3 SUMMARY
Association rules (also called market basket analysis) and collaborative filtering are unsupervised methods for
deducing associations between purchased items from databases of transactions. Association rules search for generic
rules about items that are purchased together. The main advantage of this method is that it generates clear, simple
rules of the form “IF X is purchased, THEN Y is also likely to be purchased.” The method is very transparent and easy
to understand.