
Database Systems Group • Prof. Dr. Thomas Seidl

Exercise 3: Frequent Itemset Mining
Knowledge Discovery in Databases I
SS 2016
Recap: Frequent Itemset Mining

Basic terms and definitions:

• Items I = {i_1, …, i_m}
• Itemset X ⊆ I
• Database D
• Transactions T ⊆ I
• Support: support(X) = |{T ∈ D | X ⊆ T}|
• Frequent itemset: X is frequent iff support(X) ≥ minSup

Example database:
TID   items
100   {butter, bread, milk, sugar}
200   {butter, flour, milk, sugar}
300   {butter, eggs, milk, salt}
400   {eggs}
500   {butter, flour, milk, salt, sugar}

Goal: Find all frequent itemsets in D!
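The support definition above translates directly into code. A minimal Python sketch over the example database (the variable names are assumptions, not from the slides):

```python
# Toy database from the slide: one set of items per transaction.
D = [
    {"butter", "bread", "milk", "sugar"},          # TID 100
    {"butter", "flour", "milk", "sugar"},          # TID 200
    {"butter", "eggs", "milk", "salt"},            # TID 300
    {"eggs"},                                      # TID 400
    {"butter", "flour", "milk", "salt", "sugar"},  # TID 500
]

def support(X, D):
    """support(X) = |{T in D | X is a subset of T}| (absolute count)."""
    return sum(1 for T in D if X <= T)

print(support({"butter", "milk"}, D))  # → 4
```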


Knowledge Discovery in Databases I: Exercise 3 13.05.2016 2
Recap: Frequent Itemset Mining

Naive algorithm: Just count the frequencies of all possible subsets of I in the database.

• Problem: For |I| = m, there are 2^m such itemsets!
• Clearly, this becomes infeasible rather quickly…

Main idea of the Apriori algorithm: Prune the exponential search space using anti-monotonicity.

(Figure: itemset lattice from Ø over A, B, C, D, …, up to ABCD; the top node ABCD is marked "not frequent".)


Exercise 3-1: Frequent Itemsets

The Apriori algorithm makes use of prior knowledge of subset support properties. Prove the following subset properties:
a) All non-empty subsets of a frequent itemset must also be frequent.
b) The support of any non-empty subset S′ of an itemset S must be at least as great as the support of S.



Exercise 3-1 (a): Frequent Itemsets

a) All non-empty subsets of a frequent itemset must also be frequent:

Proof:
• Let S ⊆ I be a frequent itemset, i.e. support(S) ≥ minSup.
• Let ∅ ≠ S′ ⊆ S.
• Then
      support(S′) ≥ support(S)   (by b))
                  ≥ minSup       (S is frequent),
i.e. S′ is a frequent itemset. ∎



Exercise 3-1 (b): Frequent Itemsets

b) The support of any non-empty subset S′ of an itemset S must be at least as great as the support of S.

Proof:
• Let ∅ ≠ S′ ⊆ S ⊆ I.
• For any transaction T ⊆ I in database D, we have:
      S ⊆ T ⇒ S′ ⊆ T
• Thus, it holds that
      {T ∈ D | S ⊆ T} ⊆ {T ∈ D | S′ ⊆ T}
  and consequently
      support(S) = |{T ∈ D | S ⊆ T}| ≤ |{T ∈ D | S′ ⊆ T}| = support(S′). ∎
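The inequality can be spot-checked on the example database from the first recap slide (a quick sketch; the itemset choices are arbitrary):

```python
# Toy database from the recap slide.
D = [
    {"butter", "bread", "milk", "sugar"},
    {"butter", "flour", "milk", "sugar"},
    {"butter", "eggs", "milk", "salt"},
    {"eggs"},
    {"butter", "flour", "milk", "salt", "sugar"},
]
support = lambda X: sum(1 for T in D if X <= T)

S = {"butter", "milk", "sugar"}  # support 3 (TIDs 100, 200, 500)
for S_prime in ({"butter"}, {"milk", "sugar"}, {"butter", "sugar"}):
    # every non-empty subset supports at least as many transactions
    assert support(S_prime) >= support(S)
```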



Exercise 3-2: Frequent Itemset Mining

Let D be a database that contains the following four transactions:

TID   items_bought
T1    {K, A, D, B}
T2    {D, A, C, E, B}
T3    {C, A, B, E}
T4    {B, A, D}

In addition, let minSup = 60%.


a) Find all frequent itemsets using the Apriori algorithm.
b) Find all frequent itemsets using the FP-growth
algorithm.
c) Determine all closed and maximal frequent itemsets.



Exercise 3-2 (a): Apriori Algorithm

minSup = 0.6 (i.e. an itemset must occur in at least 3 of the 4 transactions)

Database D:
TID   items
1     {K, A, D, B}
2     {D, A, C, E, B}
3     {C, A, B, E}
4     {B, A, D}

Scan D → C1:              L1:
itemset   sup             itemset   sup
{A}       100%            {A}       100%
{B}       100%            {B}       100%
{C}       50%             {D}       75%
{D}       75%
{E}       50%
{K}       25%

L1 ⋈ L1 → C2: {A,B}, {A,D}, {B,D}; pruning C2 removes nothing. Scan D:
itemset   sup             L2:
{A,B}     100%            {A,B}     100%
{A,D}     75%             {A,D}     75%
{B,D}     75%             {B,D}     75%

L2 ⋈ L2 → C3: {A,B,D}; pruning removes nothing. Scan D: support({A,B,D}) = 75%,
so L3 = {{A,B,D}}.

L3 ⋈ L3 → C4 is empty; the algorithm terminates.
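The trace above can be reproduced with a compact Apriori sketch (function and variable names are assumptions; the join and prune steps follow the slides):

```python
from itertools import combinations

D = [{"K", "A", "D", "B"}, {"D", "A", "C", "E", "B"},
     {"C", "A", "B", "E"}, {"B", "A", "D"}]

def apriori(D, min_sup=0.6):
    n = len(D)
    support = lambda X: sum(1 for T in D if X <= T) / n
    # L1: frequent 1-itemsets
    Lk = [frozenset([i]) for i in sorted({i for T in D for i in T})
          if support(frozenset([i])) >= min_sup]
    frequent = {X: support(X) for X in Lk}
    k = 2
    while Lk:
        # join step (L_{k-1} ⋈ L_{k-1}) ...
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # ... then prune candidates with an infrequent (k-1)-subset
        Ck = {c for c in Ck
              if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        Lk = [c for c in Ck if support(c) >= min_sup]  # one scan of D per level
        frequent.update((c, support(c)) for c in Lk)
        k += 1
    return frequent

freq = apriori(D)
print(sorted("".join(sorted(X)) for X in freq))
# → ['A', 'AB', 'ABD', 'AD', 'B', 'BD', 'D']
```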
Recap: FP-Growth Algorithm

Bottleneck of Apriori: candidate generation
• Huge candidate sets
• Multiple scans of the database

FP-growth: frequent-pattern mining without candidate generation
• Compress the database, retaining only the information relevant to frequent-pattern mining: the FP-tree
• Use an efficient divide & conquer approach and grow frequent patterns without generating candidate sets



Exercise 3-2 (b): FP-Growth Algorithm

minSup = 0.6. For each transaction, keep only its frequent items, sorted in descending order of their frequencies:

TID   items bought        (ordered) frequent items
1     {K, A, D, B}        {A, B, D}
2     {D, A, C, E, B}     {A, B, D}
3     {C, A, B, E}        {A, B}
4     {B, A, D}           {A, B, D}

Header table (items sorted by descending support):
item   frequency
A      4
B      4
D      3
C      2
E      2
K      1

Initial FP-tree: a single path {} – A:4 – B:4 – D:3.
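The preprocessing step, keeping only the frequent items of each transaction in descending frequency order, can be sketched as follows (names are assumptions):

```python
from collections import Counter

D = [["K", "A", "D", "B"], ["D", "A", "C", "E", "B"],
     ["C", "A", "B", "E"], ["B", "A", "D"]]
min_count = 3  # minSup = 0.6 over 4 transactions → at least 3 occurrences

counts = Counter(i for T in D for i in T)
frequent = {i: c for i, c in counts.items() if c >= min_count}
# sort by descending frequency; break ties alphabetically (as on the slide)
ordered = [sorted((i for i in T if i in frequent),
                  key=lambda i: (-frequent[i], i)) for T in D]
print(ordered)
# → [['A', 'B', 'D'], ['A', 'B', 'D'], ['A', 'B'], ['A', 'B', 'D']]
```

Inserting these ordered lists one by one yields the single-path FP-tree shown above.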



Exercise 3-2 (b): FP-Growth Algorithm

Initial FP-tree (single path {} – A:4 – B:4 – D:3) → conditional pattern bases:

item   cond. pattern base
A      {}
B      A:4
D      AB:3

Mining the conditional FP-trees:
• D-conditional FP-tree {}|D: path A:3 – B:3 (item frequencies A: 3, B: 3)
  → frequent itemsets {{D}, {A,D}, {B,D}, {A,B,D}}
• B-conditional FP-tree {}|B: path A:4
  → frequent itemsets {{B}, {A,B}}
• A-conditional FP-tree {}|A = {}
  → frequent itemset {{A}}
Exercise 3-2 (c): Closed and Maximal
Frequent Itemsets

• Closed frequent itemsets:
  • X closed ⇔ ∄Y: X ⊂ Y ∧ support(Y) = support(X)
  • The set of closed itemsets contains complete support information

• Maximal frequent itemsets:
  • X maximal ⇔ ∄Y: X ⊂ Y ∧ support(Y) ≥ minSup
  • Not complete, but more compact

Database (from 3-2):
TID   items_bought
T1    {K, A, D, B}
T2    {D, A, C, E, B}
T3    {C, A, B, E}
T4    {B, A, D}

frequent itemsets   support
{A}                 1
{B}                 1
{D}                 0.75
{A,B}               1      ← closed, but not maximal
{A,D}               0.75
{B,D}               0.75
{A,B,D}             0.75   ← closed & maximal
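One way to check this classification in code (a sketch; since any equally-supported superset of a frequent itemset is itself frequent, it suffices to compare against the frequent supersets):

```python
# Frequent itemsets of 3-2 with their (relative) supports.
freq = {frozenset("A"): 1.0, frozenset("B"): 1.0, frozenset("D"): 0.75,
        frozenset("AB"): 1.0, frozenset("AD"): 0.75, frozenset("BD"): 0.75,
        frozenset("ABD"): 0.75}

def closed(X):
    # no frequent proper superset with the same support
    return not any(X < Y and freq[Y] == freq[X] for Y in freq)

def maximal(X):
    # no frequent proper superset at all
    return not any(X < Y for Y in freq)

print(sorted("".join(sorted(X)) for X in freq if closed(X)))   # → ['AB', 'ABD']
print(sorted("".join(sorted(X)) for X in freq if maximal(X)))  # → ['ABD']
```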



Recap: Association Rule Mining

Association rule:
    X ⇒ Y
where X, Y ⊆ I are two itemsets with X ∩ Y = ∅.

• support(X ⇒ Y) = support(X ∪ Y)
• confidence(X ⇒ Y) = support(X ∪ Y) / support(X)
• Strong association rules have support ≥ minSup and confidence ≥ minConf

Goal: Find all strong association rules in D!



Exercise 3-3: Association Rule Mining

After frequent itemset mining, association rules can be extracted as follows: For each frequent itemset X and every non-empty subset Y ⊂ X, generate a rule Y ⇒ X ∖ Y if it fulfills the minimum confidence property.

a) Prove the following anti-monotonicity lemma for strong association rules:

Let X be a frequent itemset and Y ⊂ X. If Y ⇒ X ∖ Y is a strong association rule, then Y′ ⇒ X ∖ Y′ is also a strong association rule for every Y′ with Y ⊆ Y′ ⊂ X.



Exercise 3-3 (a): Association Rule Mining

Let 𝑋𝑋 be a frequent itemset and 𝑌𝑌 ⊂ 𝑋𝑋. If 𝑌𝑌 ⇒ 𝑋𝑋 ∖ 𝑌𝑌 is a


strong association rule, then 𝑌𝑌 ′ ⇒ 𝑋𝑋 ∖ 𝑌𝑌𝑌 is also a
strong association rule for every 𝑌𝑌 ⊆ 𝑌𝑌𝑌.

Proof:
• 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑌𝑌 ′ ⇒ 𝑋𝑋 ∖ 𝑌𝑌 ′ = 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑋𝑋
≥ 𝑋𝑋 𝑖𝑖𝑖𝑖 𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓. 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚

𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑋𝑋
• 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 𝑌𝑌 ′ ⇒ 𝑋𝑋 ∖ 𝑌𝑌 ′ =
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑌𝑌 ′

𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑋𝑋
≥3−1 𝑏𝑏
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑌𝑌

= 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 𝑌𝑌 ⇒ 𝑋𝑋 ∖ 𝑌𝑌
≥𝑌𝑌⇒𝑋𝑋∖𝑌𝑌 𝑖𝑖𝑖𝑖 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚



Exercise 3-3 (b): Association Rule Mining

b) Extract all strong association rules from the database D provided in the previous exercise with a minimum confidence of minConf = 80%. Which candidate rules can be pruned based on anti-monotonicity?

frequent itemsets   support
{A}                 1
{B}                 1
{D}                 0.75
{A,B}               1
{A,D}               0.75
{B,D}               0.75
{A,B,D}             0.75

candidate rule   confidence
A ⇒ B            1      ✔
B ⇒ A            1      ✔
A ⇒ D            0.75   ✗
D ⇒ A            1      ✔
B ⇒ D            0.75   ✗
D ⇒ B            1      ✔
A,B ⇒ D          0.75   ✗
A,D ⇒ B          1      ✔
B,D ⇒ A          1      ✔
D ⇒ A,B          1      ✔

A ⇒ B,D and B ⇒ A,D can be pruned: their antecedents are subsets of {A,B}, and A,B ⇒ D already fails minConf, so by the contrapositive of the lemma from (a) they cannot be strong.
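The table can be reproduced by checking every candidate rule against minConf (a sketch; the supports are taken from exercise 3-2):

```python
from itertools import combinations

# Frequent itemsets of 3-2 with their (relative) supports.
freq = {frozenset("A"): 1.0, frozenset("B"): 1.0, frozenset("D"): 0.75,
        frozenset("AB"): 1.0, frozenset("AD"): 0.75, frozenset("BD"): 0.75,
        frozenset("ABD"): 0.75}
min_conf = 0.8

strong = []
for X in freq:
    for r in range(1, len(X)):
        for Y in map(frozenset, combinations(X, r)):
            conf = freq[X] / freq[Y]  # confidence(Y ⇒ X∖Y)
            if conf >= min_conf:
                strong.append(("".join(sorted(Y)), "".join(sorted(X - Y))))

print(len(strong))  # → 7 strong rules, matching the ✔ marks in the table
```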
