ML Apriori Lect07a
Lecture 05
Learning Outcomes
• The concepts of basket data, items, itemsets, and association rules
• Frequent pattern mining and the Apriori algorithm
Association Rules
• Mining Association Rules between Sets of Items
in Large Databases
(R. Agrawal, T. Imielinski & A. Swami, 1993)
Simple patterns:
1. OJ and soda are more likely purchased together than
any other two items
2. Detergent is never purchased with milk or window cleaner
3. Milk is never purchased with soda or detergent
How Good is an Association Rule?
POS Transactions

Customer  Items Purchased
1         OJ, soda
2         Milk, OJ, window cleaner
3         OJ, detergent
4         OJ, detergent, soda
5         Window cleaner, soda
Market-Basket Transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Oil, Eggs
3    Milk, Diaper, Oil, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example Association Rules:
{Diaper} → {Oil}
{Milk, Bread} → {Eggs, Coke}
{Oil, Bread} → {Milk}

Implication means co-occurrence, not causality!
Definition: Frequent Itemset

• Itemset
  – A collection of one or more items
  – Example: {Milk, Bread, Diaper}
  – k-itemset: an itemset that contains k items
• Support count (σ)
  – Frequency of occurrence of an itemset
  – E.g. σ({Milk, Bread, Diaper}) = 2
• Support (s)
  – Fraction of transactions that contain an itemset
  – E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
  – An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Oil, Eggs
3    Milk, Diaper, Oil, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
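As a minimal sketch, the support count σ and support s defined above can be computed directly from the slide's toy transaction table (the data below mirrors that table; function names are illustrative):

```python
# Toy data mirroring the slide's transaction table.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Oil", "Eggs"},
    {"Milk", "Diaper", "Oil", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions containing X."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Bread", "Diaper"}, transactions))  # 2
print(support({"Milk", "Bread", "Diaper"}, transactions))        # 0.4
```

The subset test `set(itemset) <= t` is Python's built-in "is subset of" operator, which matches the definition of a transaction containing an itemset.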
Definition: Association Rule

• Association Rule
  – An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Oil}
• Two-step approach:
  1. Frequent Itemset Generation
     – Generate all itemsets whose support ≥ minsup
  2. Rule Generation
     – Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
Brute Force Approach to Frequent Itemset Generation

• For an itemset with 3 elements, we have 8 subsets
  – Each subset is a candidate frequent itemset which needs to be matched against each transaction

1-itemsets:
Itemset   Count
{Milk}    4
{Diaper}  4
{Oil}     3

2-itemsets:
Itemset         Count
{Milk, Diaper}  3
{Diaper, Oil}   3
{Oil, Milk}     2

3-itemsets:
Itemset              Count
{Milk, Diaper, Oil}  2
Important Observation: Counts of subsets can’t be smaller than the count of an itemset!
Reducing Number of Candidates
• Apriori principle:
  – If an itemset is frequent, then all of its subsets must also be frequent
• The Apriori principle holds due to the following property of the support measure:

  ∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

  – Support of an itemset never exceeds the support of its subsets
  – This is known as the anti-monotone property of support
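The anti-monotone property can be sanity-checked on the slide's toy transactions (data below mirrors the earlier table; the chosen pair X ⊆ Y is an arbitrary illustration):

```python
# Toy transactions mirroring the slide's table.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Oil", "Eggs"},
    {"Milk", "Diaper", "Oil", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """s(X): fraction of transactions containing every item of X."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

x = {"Milk"}                    # subset X
y = {"Milk", "Diaper", "Coke"}  # superset Y of X
print(support(x), support(y))   # 0.8 0.4
assert x <= y and support(x) >= support(y)  # X ⊆ Y implies s(X) ≥ s(Y)
```

Adding items to an itemset can only keep or shrink the set of matching transactions, which is exactly why the implication holds.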
Illustrating Apriori Principle
Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Oil     3
Diaper  4
Eggs    1

Pairs (2-itemsets) — no need to generate candidates involving Coke or Eggs:
Itemset          Count
{Bread, Milk}    3
{Bread, Oil}     2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Oil, Diaper}    3

Triplets (3-itemsets):
Itemset                Count
{Bread, Milk, Diaper}  3
Apriori Algorithm (Mining Frequent Patterns)

– Let k = 1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified:
  • Generate length (k+1) candidate itemsets from length k frequent itemsets
  • Prune candidate itemsets containing subsets of length k that are infrequent
  • Count the support of each candidate by scanning the DB
  • Eliminate candidates that are infrequent, leaving only those that are frequent

Note: This algorithm makes several passes over the transaction list.
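The level-wise loop above can be sketched as follows. This is a teaching-oriented Python sketch over the slide's toy transactions, not an optimized implementation; the minimum support count of 2 is chosen here for illustration:

```python
from itertools import combinations

# Toy transactions mirroring the slide's table.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Oil", "Eggs"},
    {"Milk", "Diaper", "Oil", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def apriori(transactions, minsup_count=2):
    """Return {itemset: support_count} for all frequent itemsets."""
    def count(candidates):
        # One pass over the DB per level, as the slide notes.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in transactions for i in t}
    level = {c: n for c, n in count(singles).items() if n >= minsup_count}
    frequent = dict(level)

    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        prev = list(level)
        cands = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count supports and eliminate infrequent candidates.
        level = {c: n for c, n in count(cands).items() if n >= minsup_count}
        frequent.update(level)
        k += 1
    return frequent

freq = apriori(transactions, minsup_count=2)
print(freq[frozenset({"Bread", "Milk", "Diaper"})])  # 2, matching the earlier sigma example
```

Note how the loop terminates naturally once a level produces no frequent itemsets, so no candidates of the next size are ever generated.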
Rule Generation

• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L − f satisfies the minimum confidence requirement
  – If {A,B,C,D} is a frequent itemset, the candidate rules are:
    ABC → D, ABD → C, ACD → B, BCD → A,
    A → BCD, B → ACD, C → ABD, D → ABC,
    AB → CD, AC → BD, AD → BC, BC → AD,
    BD → AC, CD → AB
• Supports computed during frequent itemset generation can be reused
• The table below shows support values for all the (non-null) subsets of the itemset {Bread, Milk, Diaper}:

Itemset                Support
{Bread}                4/5 = 0.8
{Milk}                 4/5 = 0.8
{Diaper}               4/5 = 0.8
{Bread, Milk}          3/5 = 0.6
{Diaper, Bread}        3/5 = 0.6
{Milk, Diaper}         3/5 = 0.6
{Bread, Milk, Diaper}  3/5 = 0.6

• Confidence values are shown for the (2³ − 2) = 6 rules generated from this frequent itemset

Example Rules:
{Bread, Milk} → {Diaper} (s=0.6, c=1.0)
{Milk, Diaper} → {Bread} (s=0.6, c=1.0)
{Diaper, Bread} → {Milk} (s=0.6, c=1.0)
{Bread} → {Milk, Diaper} (s=0.6, c=0.75)
{Diaper} → {Milk, Bread} (s=0.6, c=0.75)
{Milk} → {Diaper, Bread} (s=0.6, c=0.75)
Rule Generation in the Apriori Algorithm

• For each frequent k-itemset where k > 2:
  – Generate high-confidence rules with one item in the consequent
  – Using these rules, iteratively generate high-confidence rules with more than one item in the consequent
  – If any rule has low confidence, then all rules whose consequents contain that rule's consequent can be pruned (not generated)

Example for {Bread, Milk, Diaper}:

1-item rules:
{Bread, Milk} → {Diaper}
{Milk, Diaper} → {Bread}
{Diaper, Bread} → {Milk}

2-item rules:
{Bread} → {Milk, Diaper}
{Diaper} → {Milk, Bread}
{Milk} → {Diaper, Bread}
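A brute-force sketch of rule generation from one frequent itemset is shown below. For brevity it simply enumerates all binary partitions and filters by confidence; the consequent-based pruning described above is omitted (the data and threshold are illustrative):

```python
from itertools import combinations

# Toy transactions mirroring the slide's table.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Oil", "Eggs"},
    {"Milk", "Diaper", "Oil", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """s(X): fraction of transactions containing X."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rules_from(itemset, minconf=0.7):
    """All rules f -> (itemset - f) whose confidence meets minconf."""
    itemset = frozenset(itemset)
    out = []
    for r in range(1, len(itemset)):          # every non-empty proper subset f
        for lhs in combinations(itemset, r):
            lhs = frozenset(lhs)
            # confidence(f -> L - f) = s(L) / s(f)
            conf = support(itemset) / support(lhs)
            if conf >= minconf:
                out.append((set(lhs), set(itemset - lhs), round(conf, 2)))
    return out

for lhs, rhs, c in rules_from({"Milk", "Diaper"}, minconf=0.7):
    print(lhs, "->", rhs, "conf =", c)
```

Both rules {Milk} → {Diaper} and {Diaper} → {Milk} survive here with confidence 0.75, since s({Milk, Diaper}) = 0.6 and each single item has support 0.8.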
Transactional Data For an Electronics Branch
(Example)
Generation of candidate itemsets and frequent
itemsets, where the minimum support count is 2.
Review Questions
1. What are support and confidence?
2. How many rules can be generated from an itemset containing 2 items?
3. How many itemsets can be generated from 4 distinct items?
4. Why is the Apriori algorithm slow?
Thank you