Data Mining: Concepts and Techniques: - Slides For Textbook - Chapter 6
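support, $s$: probability that a transaction contains $X \cup Y \cup Z$
confidence, $c$: conditional probability that a transaction having $X \cup Y$ also contains $Z$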
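Below is a minimal sketch of the two measures for a rule X & Y ⇒ Z. It assumes each transaction is a Python set of item names. The names `support` and `confidence`, and the four toy transactions, are hypothetical.

```python
from typing import FrozenSet, List, Set


def support(transactions: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], lhs: FrozenSet[str],
               rhs: FrozenSet[str]) -> float:
    """Conditional probability that a transaction with `lhs` also contains `rhs`."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)


# Hypothetical toy transactions for the rule X & Y => Z.
db = [{"X", "Y", "Z"}, {"X", "Y"}, {"X", "Z"}, {"X", "Y", "Z", "W"}]
s = support(db, frozenset({"X", "Y", "Z"}))
c = confidence(db, frozenset({"X", "Y"}), frozenset({"Z"}))
print(f"support = {s:.2f}, confidence = {c:.2f}")  # support = 0.50, confidence = 0.67
```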
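Completeness:
never breaks a long pattern of any transaction
preserves complete information for frequent pattern mining
Compactness:
reduces irrelevant information: infrequent items are gone
e.g., on some databases the compression ratio can be over 100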
July 26, 2021 Data Mining: Concepts and Techniques 19
Mining Frequent Patterns Using FP-tree
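Recursively grow frequent patterns using the FP-tree
Method
For each item, construct its conditional pattern base,
and then its conditional FP-tree
Repeat the process on each newly created conditional FP-tree
until the resulting FP-tree is empty, or it contains only
one path (a single path generates all the combinations of its
sub-paths, each of which is a frequent pattern)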
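The method above can be sketched as a short recursion. This is a simplification, not the slides' FP-growth implementation: each (conditional) database is kept as a plain list of (item path, count) pairs rather than as a compressed FP-tree. It therefore shows the divide-and-conquer pattern growth but not the tree's node sharing. `fp_growth`, `PatternBase` and `min_count` are hypothetical names. The five transactions in the usage lines are an assumption, chosen so that they reproduce the header-table counts of this chapter's example (f 4, c 4, a 3, b 3, m 3, p 3).

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# A (conditional) database: (item path, count) pairs.
PatternBase = List[Tuple[Tuple[str, ...], int]]


def fp_growth(base: PatternBase, min_count: int,
              suffix: FrozenSet[str] = frozenset()) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset that extends `suffix`, with its count."""
    # Accumulate each item's count in this (conditional) database.
    counts: Counter = Counter()
    for path, n in base:
        for item in path:
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Fix an order for this level: descending count, ties broken by name.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    result: Dict[FrozenSet[str], int] = {}
    for item in order:
        pattern = suffix | {item}
        result[pattern] = frequent[item]
        # Conditional pattern base of `item`: from each path containing it,
        # keep the frequent items ranked before it, with the path's count.
        cond: PatternBase = []
        for path, n in base:
            if item in path:
                prefix = tuple(i for i in path if rank.get(i, len(order)) < rank[item])
                if prefix:
                    cond.append((prefix, n))
        # Grow the pattern recursively; an empty base ends the recursion.
        result.update(fp_growth(cond, min_count, pattern))
    return result


# Assumed example transactions (one item per character).
db = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
patterns = fp_growth([(tuple(t), 1) for t in db], min_count=3)
for p, n in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(p)), n)
```

The sketch recurses until a conditional base is empty. It omits the single-path shortcut, which costs time but does not affect which patterns are found.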
[Figure: FP-tree with its header table]
Node-link property
For any frequent item $a_i$, all the possible frequent
patterns that contain $a_i$ can be obtained by following
$a_i$'s node-links, starting from $a_i$'s head in the FP-tree
header table
Prefix path property
To calculate the frequent patterns for a node $a_i$ in a
path $P$, only the prefix sub-path of $a_i$ in $P$ needs to be
accumulated, and its frequency count should carry the
same count as node $a_i$.
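Both properties can be seen in a small explicit FP-tree. This sketch uses hypothetical names (`Node`, `build_fp_tree`, `conditional_pattern_base`). It assumes the header-table order f, c, a, b, m, p and five assumed transactions that reproduce the header-table counts. Following m's node-links and collecting each node's prefix path, carrying that node's count, yields the m-conditional pattern base fca:2, fcab:1.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """FP-tree node: item, count, parent link and node-link to the next same-item node."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.node_link: Optional["Node"] = None


def build_fp_tree(transactions: List[str], order: List[str]) -> Dict[str, Node]:
    """Insert each transaction's frequent items in `order`; return the header table."""
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None, None)
    header: Dict[str, Node] = {}  # item -> head of its node-link chain
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                child.node_link = header.get(item)  # prepend to the chain
                header[item] = child
            child.count += 1
            node = child
    return header


def conditional_pattern_base(header: Dict[str, Node], item: str) -> List[Tuple[str, int]]:
    """Follow `item`'s node-links; each node yields its prefix path with the node's count."""
    base: List[Tuple[str, int]] = []
    node = header.get(item)
    while node is not None:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:  # stop at the root
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append(("".join(reversed(path)), node.count))
        node = node.node_link
    return base


# Assumed transactions and the header-table order f, c, a, b, m, p.
db = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
header = build_fp_tree(db, ["f", "c", "a", "b", "m", "p"])
print(conditional_pattern_base(header, "m"))  # [('fcab', 1), ('fca', 2)]
```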
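Step 2: Construct Conditional FP-tree
For each pattern base:
accumulate the count for each item in the base
construct the FP-tree for the frequent items of the pattern base
Header table (item, frequency): f 4, c 4, a 3, b 3, m 3, p 3
[Figure: global FP-tree with paths {} → f:4 → c:3 → a:3 → m:2 → p:2, a:3 → b:1 → m:1, f:4 → b:1, and {} → c:1 → b:1 → p:1]
m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: {} → f:3 → c:3 → a:3 (b is dropped because its count in the base, 1, is below the minimum support)
All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam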
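Accumulating counts over a conditional pattern base takes a single pass. In this sketch (hypothetical `conditional_tree_items`), the minimum support count of 3 is an assumption consistent with the header table. Applied to the m-conditional pattern base, it keeps f, c and a, which form the single path of the m-conditional FP-tree.

```python
from collections import Counter
from typing import Dict, List, Tuple


def conditional_tree_items(base: List[Tuple[str, int]], min_count: int) -> Dict[str, int]:
    """Accumulate each item's count over a conditional pattern base; keep frequent items."""
    counts: Counter = Counter()
    for path, n in base:
        for item in path:
            counts[item] += n  # every item on the path inherits the path's count
    return {item: c for item, c in counts.items() if c >= min_count}


m_base = [("fca", 2), ("fcab", 1)]  # the m-conditional pattern base
print(conditional_tree_items(m_base, min_count=3))  # {'f': 3, 'c': 3, 'a': 3}
```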
Mining Frequent Patterns by Creating Conditional Pattern-Bases
Recursively mining the m-conditional FP-tree {} → f:3 → c:3 → a:3:
am-conditional FP-tree: {} → f:3 → c:3
Conditional pattern base of “cm”: (f:3)
cm-conditional FP-tree: {} → f:3
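When a conditional FP-tree is a single path, no further recursion is needed. Every combination of its nodes, joined with the suffix, is frequent, and its count is the smallest node count in the combination. The sketch below uses a hypothetical `single_path_patterns` helper and applies it to the m-conditional FP-tree f:3 → c:3 → a:3, where m itself has count 3 in the header table.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def single_path_patterns(path: List[Tuple[str, int]], suffix: str,
                         suffix_count: int) -> Dict[str, int]:
    """All patterns from a single-path conditional FP-tree, plus the suffix itself."""
    patterns = {suffix: suffix_count}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):  # combinations keep path order
            items = "".join(item for item, _ in combo)
            # The pattern's count is the smallest node count in the combination.
            patterns[items + suffix] = min(count for _, count in combo)
    return patterns


m_tree = [("f", 3), ("c", 3), ("a", 3)]  # m-conditional FP-tree, a single path
pats = single_path_patterns(m_tree, "m", 3)
print(sorted(pats))  # ['am', 'cam', 'cm', 'fam', 'fcam', 'fcm', 'fm', 'm']
```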
Principles of Frequent Pattern Growth
Pattern growth property
Let $\alpha$ be a frequent itemset in DB, $B$ be $\alpha$'s
conditional pattern base, and $\beta$ be an itemset in $B$.
Then $\alpha \cup \beta$ is a frequent itemset in DB iff $\beta$ is
frequent in $B$.
“abcdef” is a frequent pattern if and only if
“abcde” is a frequent pattern, and
“f” is frequent in the set of transactions containing
“abcde”
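The property can be checked by brute force. This sketch uses $\alpha$'s projected database (the transactions containing $\alpha$, with $\alpha$'s items removed) as the stand-in for $B$, which matches the “abcde”/“f” example. The transactions and the helper names `count` and `projected` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List


def count(db: List[FrozenSet[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions containing `itemset`."""
    return sum(1 for t in db if itemset <= t)


def projected(db: List[FrozenSet[str]], alpha: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Transactions that contain `alpha`, with alpha's own items removed."""
    return [t - alpha for t in db if alpha <= t]


# Hypothetical transactions, one item per character.
db = [frozenset(t) for t in ["abcdef", "abcde", "abcdf", "abcdef", "bcf"]]
alpha = frozenset("abcde")
others = sorted(frozenset().union(*db) - alpha)

# For every beta drawn from the remaining items, the count of alpha ∪ beta in DB
# equals the count of beta in alpha's projected database.
for k in range(1, len(others) + 1):
    for beta in map(frozenset, combinations(others, k)):
        assert count(db, alpha | beta) == count(projected(db, alpha), beta)
print(count(db, frozenset("abcdef")), count(projected(db, alpha), frozenset("f")))  # 2 2
```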
[Figure: two plots of runtime (sec.) versus support threshold (%)]
Presentation of Association Rules (Table Form)
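Items often form hierarchies:
e.g., 2% milk and Wonder bread sit at a lower level than milk and bread
Uniform support: one minimum support threshold for all levels
if the threshold is too high, low-level associations are missed
Search strategies for mining with reduced support:
Level-by-level independent
Level-cross filtering by k-itemset
Level-cross filtering by single item
Controlled level-cross filtering by single item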
Level 1 (min_sup = 5%): Milk [support = 10%]
Reduced Support
Multi-level mining with reduced support
Level 1 (min_sup = 5%): Milk [support = 10%]
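The following minimal sketch contrasts uniform and reduced support under level-cross filtering by single item: a child item is examined only if its parent was frequent at the level above. The level-1 figures (min_sup = 5%, milk at 10%) come from the example. The level-2 items, their supports and the reduced 3% threshold are hypothetical, as are the names `children`, `support` and `mine_levels`.

```python
from typing import Dict, List

# Hypothetical concept hierarchy and item supports (fraction of transactions).
children: Dict[str, List[str]] = {"milk": ["2% milk", "skim milk"]}
support: Dict[str, float] = {"milk": 0.10, "2% milk": 0.06, "skim milk": 0.04}


def mine_levels(roots: List[str], min_sup_by_level: List[float]) -> List[str]:
    """Level-cross filtering by single item: a child is examined only if its
    parent was frequent one level up; each level has its own min_sup."""
    frequent: List[str] = []
    current = roots
    for min_sup in min_sup_by_level:
        kept = [i for i in current if support[i] >= min_sup]
        frequent.extend(kept)
        current = [c for i in kept for c in children.get(i, [])]
    return frequent


print(mine_levels(["milk"], [0.05, 0.05]))  # uniform: ['milk', '2% milk']
print(mine_levels(["milk"], [0.05, 0.03]))  # reduced: ['milk', '2% milk', 'skim milk']
```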
Multi-level Association: Redundancy Filtering
Inter-dimension association rules (no repeated predicates)
age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
Hybrid-dimension association rules (repeated predicates)
age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
Categorical Attributes
finite number of possible values, no ordering among
values
Quantitative Attributes
numeric, implicit ordering among values
age(X, "30-34") ∧ income(X, "24K - 48K") ⇒ buys(X, "high resolution TV")
ARCS (Association Rule Clustering System)
1. Binning
2. Find frequent predicate sets
3. Clustering
4. Optimize
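Below is a much-simplified illustration of the first three ARCS steps for rules of the form age ∧ income ⇒ buys(high resolution TV). It bins both attributes with equal width, keeps the grid cells that meet minimum support and confidence, and merges adjacent cells. It is not the ARCS implementation: the optimize step is omitted, and a plain flood fill stands in for ARCS's clustering. The rows, bin widths, thresholds and names (`arcs_sketch`, `AGE_W`, `INC_W`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (age bin, income bin)


def arcs_sketch(rows: List[Tuple[int, int, bool]], age_width: int, income_width: int,
                min_count: int, min_conf: float) -> List[List[Cell]]:
    # 1. Binning: equal-width bins on age and on income (in thousands).
    total: Dict[Cell, int] = defaultdict(int)
    hits: Dict[Cell, int] = defaultdict(int)
    for age, income, buys in rows:
        cell = (age // age_width, income // income_width)
        total[cell] += 1
        hits[cell] += int(buys)
    # 2. Frequent predicate sets: cells whose rule "cell => buys" meets both thresholds.
    good: Set[Cell] = {c for c in total
                       if hits[c] >= min_count and hits[c] / total[c] >= min_conf}
    # 3. Clustering: merge grid-adjacent good cells (4-neighbour flood fill).
    clusters: List[List[Cell]] = []
    seen: Set[Cell] = set()
    for start in sorted(good):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            a, i = stack.pop()
            members.append((a, i))
            for nb in ((a + 1, i), (a - 1, i), (a, i + 1), (a, i - 1)):
                if nb in good and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(members))
    return clusters


AGE_W, INC_W = 5, 12  # hypothetical bin widths (years, thousands)
# Hypothetical tuples: (age, income in K, buys a high resolution TV).
rows = [(31, 30, True), (32, 28, True), (33, 40, True), (34, 44, True),
        (36, 30, True), (37, 29, False), (52, 90, False), (55, 95, True)]
for cluster in arcs_sketch(rows, AGE_W, INC_W, min_count=2, min_conf=0.6):
    ages = [a for a, _ in cluster]
    incomes = [i for _, i in cluster]
    # Report the cluster's bounding box (exact here because the cluster is rectangular).
    print(f'age(X, "{min(ages) * AGE_W}-{(max(ages) + 1) * AGE_W - 1}") ^ '
          f'income(X, "{min(incomes) * INC_W}K - {(max(incomes) + 1) * INC_W}K") '
          f'=> buys(X, "high resolution TV")')
```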
Limitations of ARCS
confidence
etc.
Data constraints: SQL-like queries
e.g., find product pairs sold together in Vancouver in Dec. '98.
Dimension/level constraints:
in relevance to region, price, brand, customer category.
Rule constraints:
small sales (price < $10) trigger big sales (sum > $200).
Interestingness constraints:
strong rules (min_support ≥ 3%, min_confidence ≥ 60%).
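This sketch shows one way rule and interestingness constraints could filter candidate rules after mining. It assumes "small sales" means every LHS item costs under $10 and "big sales" means the RHS prices sum to more than $200. The `Rule` type, `satisfies`, the prices and the candidate rules are all hypothetical. Data and dimension constraints would instead restrict the input before mining and are not shown.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical item prices in dollars.
price: Dict[str, float] = {"batteries": 5.0, "cable": 8.0, "charger": 25.0,
                           "camera": 250.0, "tv": 400.0}


@dataclass
class Rule:
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    support: float
    confidence: float


def satisfies(rule: Rule, min_sup: float = 0.03, min_conf: float = 0.60) -> bool:
    # Rule constraint (assumed reading): cheap LHS items, expensive RHS total.
    small_sales = all(price[i] < 10 for i in rule.lhs)
    big_sales = sum(price[i] for i in rule.rhs) > 200
    # Interestingness constraint: a strong rule.
    strong = rule.support >= min_sup and rule.confidence >= min_conf
    return small_sales and big_sales and strong


candidates: List[Rule] = [
    Rule(frozenset({"batteries"}), frozenset({"camera"}), 0.04, 0.70),
    Rule(frozenset({"batteries"}), frozenset({"charger"}), 0.05, 0.80),
    Rule(frozenset({"cable"}), frozenset({"tv"}), 0.02, 0.90),
]
print([satisfies(r) for r in candidates])  # [True, False, False]
```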
Rule Constraints in Association Mining
Two kinds of rule constraints:
Rule form constraints: meta-rule guided mining.
Rule content constraints.
A classification of (single-variable) constraints:
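Class constraint: $S \subset A$, e.g. $S \subset \text{Item}$
Domain constraint:
$S\ \theta\ v$, $\theta \in \{=, \neq, <, \le, >, \ge\}$, e.g. $S.\text{Price} < 100$
$v\ \theta\ S$, $\theta$ is $\in$ or $\notin$, e.g. $\text{snacks} \notin S.\text{Type}$
$V\ \theta\ S$ or $S\ \theta\ V$, $\theta \in \{\subseteq, \subset, \not\subset, =, \neq\}$
e.g. $\{\text{snacks}, \text{sodas}\} \subseteq S.\text{Type}$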
Succinctness
Anti-monotonicity
Monotonicity
Convertible constraints
Inconvertible constraints
Property of Constraints:
Anti-Monotone
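Constraint | Anti-monotone
$S\ \theta\ v$, $\theta \in \{=, \le, \ge\}$ | yes
$v \in S$ | no
$S \supseteq V$ | no
$S \subseteq V$ | yes
$S = V$ | partly
$\min(S) \le v$ | no
$\min(S) \ge v$ | yes
$\min(S) = v$ | partly
$\max(S) \le v$ | yes
$\max(S) \ge v$ | no
$\max(S) = v$ | partly
$\text{count}(S) \le v$ | yes
$\text{count}(S) \ge v$ | no
$\text{count}(S) = v$ | partly
$\text{sum}(S) \le v$ | yes
$\text{sum}(S) \ge v$ | no
$\text{sum}(S) = v$ | partly
$\text{avg}(S)\ \theta\ v$, $\theta \in \{=, \le, \ge\}$ | convertible
(frequent constraint) | (yes)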
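Anti-monotone constraints can be pushed into a level-wise (Apriori-style) search: an itemset that violates one is dropped and never extended. This sketch pushes $\text{sum}(S.\text{Price}) \le v$ alongside minimum support. It assumes non-negative prices, since the $\text{sum}(S) \le v$ row is anti-monotone only then. The name `constrained_levelwise` and the data are hypothetical.

```python
from typing import Dict, FrozenSet, List


def constrained_levelwise(db: List[FrozenSet[str]], price: Dict[str, float],
                          min_count: int, max_sum: float) -> List[FrozenSet[str]]:
    """Apriori-style search with sum(S.price) <= max_sum pushed in.

    Both minimum support and this constraint are anti-monotone (for non-negative
    prices), so an itemset that fails either is dropped and never extended."""
    level = sorted({frozenset([i]) for t in db for i in t}, key=sorted)
    result: List[FrozenSet[str]] = []
    while level:
        kept = [s for s in level
                if sum(price[i] for i in s) <= max_sum
                and sum(1 for t in db if s <= t) >= min_count]
        result.extend(kept)
        kept_set = set(kept)
        # Join surviving k-itemsets into (k+1)-candidates, then prune any
        # candidate with a k-subset that did not survive (anti-monotonicity).
        candidates = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
        level = sorted((c for c in candidates
                        if all(c - {i} in kept_set for i in c)), key=sorted)
    return result


# Hypothetical transactions and prices.
db = [frozenset(t) for t in ["abc", "abc", "abd", "bc"]]
price = {"a": 10.0, "b": 20.0, "c": 50.0, "d": 5.0}
print(["".join(sorted(s)) for s in constrained_levelwise(db, price, 2, 40.0)])  # ['a', 'b', 'ab']
```

Without the price constraint, c, ac, bc and abc would also be frequent. Here c fails $\text{sum} \le 40$ at level 1, so none of its supersets is ever generated.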
Example of Convertible Constraints:
$\text{avg}(S)\ \theta\ V$
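$\text{avg}(S) \ge v$ is not anti-monotone in general. It becomes anti-monotone once items are listed in value-descending order R, because appending a later item can only lower the average. Below is a brute-force check of that claim. The item values and the names `values`, `R`, `avg` and `anti_monotone_in_order` are hypothetical.

```python
from itertools import combinations
from typing import Sequence

# Hypothetical item values; R lists items in value-descending order.
values = {"p": 9, "q": 8, "r": 6, "s": 4, "t": 3, "u": 1}
R = sorted(values, key=values.get, reverse=True)


def avg(items: Sequence[str]) -> float:
    return sum(values[i] for i in items) / len(items)


def anti_monotone_in_order(v: float) -> bool:
    """True if every itemset written in R-order that fails avg >= v still fails
    after appending any item that comes later in R."""
    for k in range(1, len(R) + 1):
        for s in combinations(R, k):  # tuples keep R's order
            if avg(s) < v:
                last = R.index(s[-1])
                if any(avg(s + (x,)) >= v for x in R[last + 1:]):
                    return False
    return True


print(anti_monotone_in_order(5))  # True
# Without the order the constraint is not anti-monotone: {t} fails, {t, p} passes.
print(avg(["t"]) >= 5, avg(["t", "p"]) >= 5)  # False True
```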
$\min(S.\text{Price}) \le v$ is succinct
Optimization:
If $C$ is succinct, then $C$ is pre-counting prunable.
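Constraint | Succinct
$S\ \theta\ v$, $\theta \in \{=, \le, \ge\}$ | yes
$v \in S$ | yes
$S \supseteq V$ | yes
$S \subseteq V$ | yes
$S = V$ | yes
$\min(S) \le v$ | yes
$\min(S) \ge v$ | yes
$\min(S) = v$ | yes
$\max(S) \le v$ | yes
$\max(S) \ge v$ | yes
$\max(S) = v$ | yes
$\text{count}(S) \le v$ | weakly
$\text{count}(S) \ge v$ | weakly
$\text{count}(S) = v$ | weakly
$\text{sum}(S) \le v$ | no
$\text{sum}(S) \ge v$ | no
$\text{sum}(S) = v$ | no
$\text{avg}(S)\ \theta\ v$, $\theta \in \{=, \le, \ge\}$ | no
(frequent constraint) | (no)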
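Succinctness means the satisfying itemsets can be generated directly from the items, before any support counting. For $\min(S.\text{Price}) \le v$, a set satisfies the constraint iff it contains at least one item priced at most $v$. The sketch below builds exactly those sets and compares them with a brute-force filter. The prices and the helper names `powerset`, `generate_min_le` and `brute_force` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

# Hypothetical item prices.
price = {"a": 3, "b": 7, "c": 12, "d": 20}


def powerset(items: Iterable[str]) -> List[FrozenSet[str]]:
    ordered = sorted(items)
    return [frozenset(c) for k in range(len(ordered) + 1) for c in combinations(ordered, k)]


def generate_min_le(v: int) -> Set[FrozenSet[str]]:
    """Member generation for min(S.price) <= v: a non-empty set of items priced
    <= v, united with any subset of the other items. No set is tested one by one."""
    cheap = [i for i in price if price[i] <= v]
    rest = [i for i in price if price[i] > v]
    return {a | b for a in powerset(cheap) if a for b in powerset(rest)}


def brute_force(v: int) -> Set[FrozenSet[str]]:
    """Reference: test every non-empty itemset against the constraint."""
    return {s for s in powerset(price) if s and min(price[i] for i in s) <= v}


print(len(generate_min_le(7)), generate_min_le(7) == brute_force(7))  # 12 True
```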
Chapter 6: Mining Association Rules in Large Databases
Association rule mining
Mining single-dimensional Boolean association rules
from transactional databases
Mining multilevel association rules from transactional
databases
Mining multidimensional association rules from
transactional databases and data warehouses
From association mining to correlation analysis
Constraint-based association mining
Summary