Professional Documents
Culture Documents
Blatt03 Sol
Blatt03 Sol
Thomas Seidl
Exercise 3:
Frequent Itemset Mining
Knowledge Discovery in Databases I
SS 2016
Recap: Frequent Itemset Mining
• Support: 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑋𝑋 = 𝑇𝑇 ∈ 𝐷𝐷 | 𝑋𝑋 ⊆ 𝑇𝑇
• Frequent Itemset: 𝑋𝑋 freq. iff 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑋𝑋 ≥ 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚
✗
AB AC AD BC BD CD
Prune the exponential search space
A B C D
using anti-monotonicity Ø
Proof:
• Let 𝑆𝑆 ⊆ 𝐼𝐼 be a frequent itemset, i.e. 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑆𝑆 ≥ 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚
• Let ∅ ≠ 𝑆𝑆 ′ ⊆ 𝑆𝑆
• Then
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑆𝑆 ′ ≥𝑏𝑏) 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑆𝑆
≥𝑆𝑆 𝑖𝑖𝑖𝑖 𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓. 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚
i.e. 𝑆𝑆𝑆 is a frequent itemset.
Proof:
• Let ∅ ≠ 𝑆𝑆 ′ ⊆ 𝑆𝑆 ⊆ 𝐼𝐼
• For any transaction 𝑇𝑇 ⊆ 𝐼𝐼 in database 𝐷𝐷, we have:
𝑆𝑆 ⊆ 𝑇𝑇 ⇒ 𝑆𝑆𝑆 ⊆ 𝑇𝑇
• Thus, it holds that
𝑇𝑇 ∈ 𝐷𝐷 | 𝑆𝑆 ⊆ 𝑇𝑇 ⊆ 𝑇𝑇 ∈ 𝐷𝐷 | 𝑆𝑆𝑆 ⊆ 𝑇𝑇
and consequently
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑆𝑆 = 𝑇𝑇 ∈ 𝐷𝐷 | 𝑆𝑆 ⊆ 𝑇𝑇 ≤ 𝑇𝑇 ∈ 𝐷𝐷 | 𝑆𝑆𝑆 ⊆ 𝑇𝑇 = 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠(𝑆𝑆 ′ )
minSup=0.6
database D C1 itemset sup L1 itemset sup
{A} 100% {A} 100%
TID items scan D {B} 100% {B} 100%
1 {K, A, D, B} {C} 50% {D} 75%
2 {D, A, C, E, B} {D} 75% 𝐿𝐿1 ⋈ 𝐿𝐿1
3 {C, A, B, E} {E} 50%
4 {B, A, D} {K} 25%
𝐿𝐿2 ⋈ 𝐿𝐿2
C3 itemset prune C3 C3 itemset scan D C3 itemset sup L3 itemset sup
{A B D} {A B D} {A B D} 75% {A B D} 75%
𝐿𝐿3 ⋈ 𝐿𝐿3
C4 is empty
Knowledge Discovery in Databases I: Exercise 3 13.05.2016 8
Recap: FP-Growth Algorithm
Association rule:
𝑋𝑋 ⇒ 𝑌𝑌
where 𝑋𝑋, 𝑌𝑌 ⊆ 𝐼𝐼 are two itemsets with 𝑋𝑋 ∩ 𝑌𝑌 = ∅.
Proof:
• 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑌𝑌 ′ ⇒ 𝑋𝑋 ∖ 𝑌𝑌 ′ = 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑋𝑋
≥ 𝑋𝑋 𝑖𝑖𝑖𝑖 𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓. 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑋𝑋
• 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 𝑌𝑌 ′ ⇒ 𝑋𝑋 ∖ 𝑌𝑌 ′ =
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑌𝑌 ′
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑋𝑋
≥3−1 𝑏𝑏
𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑌𝑌
= 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 𝑌𝑌 ⇒ 𝑋𝑋 ∖ 𝑌𝑌
≥𝑌𝑌⇒𝑋𝑋∖𝑌𝑌 𝑖𝑖𝑖𝑖 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚