TI Intro
Introduction to Pattern (Set) Mining
1. What is Pattern Mining
2. Frequent Itemsets
2.1. Downwards closedness property
2.2. The Apriori Algorithm
3. The Flood of Itemsets
3.1. Closed, Maximal & Non-Derivable Itemsets
4. Global and Local Data Mining
1   ✔  ✔  ·  ·  ·
2   ✔  ·  ✔  ✔  ✔
3   ·  ✔  ✔  ✔  ·
4   ✔  ✔  ✔  ✔  ·
5   ✔  ✔  ✔  ·  ·
{bread, milk}
{bread, beer, milk, diapers}
{bread, beer, diapers, eggs}
a: bread
b: beer
c: milk
d: diapers
e: eggs
$2^n$ subsets of $n$ items. Layer $k$ has $\binom{n}{k}$ subsets.
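The lattice sizes can be checked by enumeration. The following is a minimal sketch, assuming the five item labels a to e from the legend; the variable names (`items`, `layers`, `total`) are illustrative and not part of the lecture.

```python
from itertools import combinations
from math import comb

# Item labels from the legend (a: bread, b: beer, c: milk, d: diapers, e: eggs).
items = ["a", "b", "c", "d", "e"]
n = len(items)

# Layer k of the subset lattice holds every subset with exactly k items.
layers = [list(combinations(items, k)) for k in range(n + 1)]

for k, layer in enumerate(layers):
    # Each layer should contain C(n, k) subsets.
    assert len(layer) == comb(n, k)
    print(f"layer {k}: {len(layer)} subsets")

# Summing over all layers gives 2^n subsets (including the empty set).
total = sum(len(layer) for layer in layers)
print(total == 2 ** n)
```

Enumerating every subset like this is only feasible for very small n, which is the point of the slide: the lattice grows exponentially in the number of items.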
DTDM, WS 12/13 23 October 2012 T I.Intro- 8
Transaction data as binary matrix
TID | item columns
1   | 1 1 0 0 0
2   | 1 0 1 1 1
3   | 0 1 1 1 0
4   | 1 1 1 1 0
5   | 1 1 1 0 0
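With the data in binary-matrix form, the support of an itemset is the number of rows that have a 1 in every column of the itemset. Below is a minimal sketch, assuming the five rows read 1 1 0 0 0, 1 0 1 1 1, 0 1 1 1 0, 1 1 1 1 0 and 1 1 1 0 0. The column-to-item mapping is not recoverable here, so items are addressed by column index, and the helper names `support` and `frequency` are hypothetical.

```python
# Rows are transactions 1..5; columns are the five items, addressed by index
# because the column labels are not recoverable (assumption of this sketch).
D = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
]

def support(itemset, data):
    """Number of rows with a 1 in every column listed in `itemset`."""
    return sum(all(row[j] for j in itemset) for row in data)

def frequency(itemset, data):
    """Support divided by the number of transactions."""
    return support(itemset, data) / len(data)

print(support({0}, D))       # rows containing column 0
print(support({0, 1}, D))    # rows containing columns 0 and 1
print(frequency({0, 1}, D))  # same count as a fraction of all rows
```

Adding an item to an itemset can only keep or shrink the set of matching rows, so `support({0, 1}, D)` can never exceed `support({0}, D)`; this is the downwards closedness property from the outline.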
[Table: example transaction database; only rows 4 to 7 survive, with 6, 5, 5 and 8 check marks respectively]
• 255!
• Still 31 frequent itemsets with 50% minfreq
[Table: example transaction database with 7 transactions; rows 1 to 7 hold 5, 6, 6, 6, 5, 5 and 8 check marks respectively]
• 31 frequent itemsets with 50% minfreq
• 16 frequent closed itemsets with 50% minfreq
• 9 frequent maximal itemsets with 50% minfreq
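The relationship between frequent, closed and maximal itemsets can be illustrated by brute force. The sketch below does not use the 7-transaction table above (its columns are lost); instead it assumes the small 5-transaction binary matrix from earlier in the deck, with items named by column index 0 to 4, so its counts differ from the 31, 16 and 9 quoted here. All function names are hypothetical.

```python
from itertools import combinations

# The 5-transaction example, one set of column indices per transaction
# (assumed reading of the binary matrix; column labels are unknown).
D = [{0, 1}, {0, 2, 3, 4}, {1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2}]
ITEMS = sorted(set().union(*D))
MINFREQ = 0.5

def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in D)

def is_frequent(itemset):
    return support(itemset) / len(D) >= MINFREQ

def extensions(itemset):
    """All supersets obtained by adding exactly one item."""
    return [itemset | {i} for i in ITEMS if i not in itemset]

# Brute force over all non-empty subsets (fine for 5 items only).
frequent = [
    frozenset(c)
    for k in range(1, len(ITEMS) + 1)
    for c in combinations(ITEMS, k)
    if is_frequent(frozenset(c))
]

# Closed: every one-item extension has strictly smaller support.
closed = [s for s in frequent if all(support(e) < support(s) for e in extensions(s))]

# Maximal: no one-item extension is frequent.
maximal = [s for s in frequent if not any(is_frequent(e) for e in extensions(s))]

print(len(frequent), len(closed), len(maximal))  # 8 7 4
```

Checking only one-item extensions is enough because support never increases when items are added: if any proper superset had the same support (or were frequent), some one-item extension would too.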
[Figure: nested sets, with Maximal itemsets inside Closed itemsets inside Frequent itemsets]
– If $|I \setminus X|$ is even: $\mathrm{supp}(I) \geq \sum_{X \subseteq J \subsetneq I} (-1)^{|I \setminus J|+1}\, \mathrm{supp}(J)$
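The even-case bound can be evaluated directly from the supports of the proper subsets of I. The following is a minimal sketch, assuming the 5-transaction example with items named by column index, the convention that the empty itemset is supported by every transaction, and a sum that runs over J with X ⊆ J and J ≠ I. The names `supp`, `bound` and `lower_bounds` are hypothetical.

```python
from itertools import combinations

# Assumed toy data: the 5-transaction example, items as column indices.
D = [{0, 1}, {0, 2, 3, 4}, {1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2}]

def supp(itemset):
    """Number of transactions containing `itemset` (empty set: all of them)."""
    return sum(set(itemset) <= t for t in D)

def subsets(s):
    """All subsets of `s`, including the empty set and `s` itself."""
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]

def bound(I, X):
    """Sum of (-1)^(|I - J| + 1) * supp(J) over all J with X <= J < I."""
    I, X = frozenset(I), frozenset(X)
    total = 0
    for extra in subsets(I - X):
        J = X | extra
        if J == I:  # the sum runs over proper subsets of I only
            continue
        total += (-1) ** (len(I - J) + 1) * supp(J)
    return total

def lower_bounds(I):
    """Bounds for every X with |I - X| even; each is a lower bound on supp(I)."""
    I = frozenset(I)
    return {X: bound(I, X) for X in subsets(I) if len(I - X) % 2 == 0}

I = {0, 1, 2}
lb = lower_bounds(I)
print(supp(I), max(lb.values()))
```

For I = {0, 1, 2} and X = {0} the sum is -supp({0}) + supp({0, 1}) + supp({0, 2}) = -4 + 3 + 3 = 2, which already equals supp(I) on this data, so the bound is tight here.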