Mining Multiple-Level Association Rules in Large Databases: IEEE Transactions On Knowledge and Data Engineering, 1999

Mining Multiple-level Association
Rules in Large Databases
IEEE Transactions on Knowledge and Data Engineering, 1999
Authors :
Jiawei Han
Simon Fraser University, British Columbia.
Yongjian Fu
University of Missouri-Rolla, Missouri.
Presented by Michael Johnson

1
Outline
1. What is MLAR?
● Concepts
● Motivation
2. A Method For Mining M-L Association Rules

● Problems/Solutions
● Definitions
● Algorithm Example
● Interestingness
● Optimizations
3. Conclusions/Future Work
4. Exam Questions
2
What is MLDM?
What is the difference between the following rules?
● Rule A → 70% of customers who bought diapers also bought beer
● Rule B → 45% of customers who bought cloth diapers also bought dark beer
● Rule C → 35% of customers who bought Pampers also bought Samuel Adams beer
4
What is MLDM?
● Rule A applies at a generic, higher level of abstraction (product)
● Rule B applies at a more specific level of abstraction (category)
● Rule C applies at the lowest level of abstraction (brand)
Moving from Rule A down to Rule C is called drilling down.

5
What Do We Gain?
● Concrete rules allow for:
  ● More targeted marketing
  ● New marketing strategies
  ● More concrete relationships
6
Hierarchy Types
● Generalization/Specialization (is-a relationships)
● Is-a with multiple inheritance
● Whole-part hierarchies (is-part-of; has-part)
7
Is-A Relationship
Generalization to Specialization
Vehicle
  4-Wheels: Sedan, SUV
  2-Wheels: Bicycle, Motorcycle
8
Is A With Multiple Inheritance
Vehicle
Commuting Recreational
Car Bicycle Snowmobile
9
Whole-Part Hierarchies
Computer
  Motherboard: RAM, CPU
  Hard Drive: RW Head, Platter
10
MLAR: Main Goal
● As usual, we are trying to develop a method to extract non-trivial, interesting, and strong rules from our transactional database.
● A method which:
  ● Avoids trivial rules (Milk→Bread), which are common sense
  ● Avoids coincidental rules (Toy→Milk), which have low support
12
What Do We Need?
1. Data Representation at Different Levels of Abstraction
● Explicitly stored in databases
● Provided by experts or users
● Generated via clustering (OLAP)
2. Efficient Methods for ML Rule Mining (the focus of this paper)
13
Possible Methods
● Apply the single-level Apriori algorithm to each of the multiple levels under the same minconf and minsup.
● Potential problems?
  ● Higher levels of abstraction naturally have higher support; support decreases as we drill down
  ● What is the optimal minsup for all levels?
  ● Too high a minsup → too few itemsets at lower levels
  ● Too low a minsup → too many uninteresting rules
14
Possible Solutions
● Adapt minsup for each level
● Adapt minconf for each level
● Do both
● This paper studies a progressive deepening method, developed as an extension of the Apriori algorithm and focused on minsup
15
Assumption
● If an item is non-frequent at one level, its descendants no longer figure in further analysis.
● Explore only descendants of frequent items as we drill down
● Are there problems that arise from making these assumptions?
16
Problems
● May eliminate possibly interesting rules for itemsets at one level whose ancestors are not frequent at higher levels.
● If so, this can be addressed by 2 workarounds:
  ● Use 2 minsup values at higher levels: one for filtering infrequent items, the other for passing down frequent items to lower levels; the latter is called the level passage threshold (lph)
  ● Let the user adjust the lph to allow descendants of sub-frequent items
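A minimal sketch of the two-threshold idea, assuming the level-1 supports counted from the encoded table T[1] in the worked example of this deck. The lph value of 3 and all variable names are hypothetical, chosen only for illustration.

```python
# Level-1 supports counted from table T[1] of the worked example.
level1_support = {"1**": 5, "2**": 5, "3**": 3, "4**": 3, "5**": 2, "7**": 1}

MINSUP = 4  # level-1 minsup used in the example
LPH = 3     # hypothetical level passage threshold (lph <= minsup)

# Items reported as frequent at level 1.
frequent = sorted(i for i, n in level1_support.items() if n >= MINSUP)

# Items whose descendants are still examined at level 2.
passed_down = sorted(i for i, n in level1_support.items() if n >= LPH)

print(frequent)     # ['1**', '2**']
print(passed_down)  # ['1**', '2**', '3**', '4**']
```

With the lph set equal to minsup, this falls back to the plain assumption: only descendants of frequent items are explored.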
17
Differences From Previous Research
● Other studies use the same minsup across different levels of the hierarchy
● This study:
  ● Uses different minsup values at different levels of the hierarchy
  ● Analyzes different optimization techniques
  ● Studies the use of interestingness measures
18
Requirements
The transactional database must contain:
1. An item dataset containing item descriptions: {<Ai>, <description>}
2. A transaction dataset T containing a set of transactions: {Tid*, {Ap…Aq}}
*Tid is a transaction identifier (key)
19
Algorithm Flow
● At Level 1:
  ● Generate frequent itemsets
  ● Filter T[1] by the frequent items to get table T[2]
● At subsequent levels:
  ● Generate candidate itemsets using Apriori
  ● Calculate support for the generated candidates
  ● Union the 'passing' itemsets with the existing rule set
  ● Repeat until no additional rules are generated or the desired level is reached
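The candidate generation step can be sketched as the usual Apriori join-and-prune. This is an illustrative sketch rather than the authors' code: `apriori_gen` and the variable names are assumptions, and the input is the set L[2,2] from the worked example in this deck.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Build (k+1)-item candidates from frequent k-itemsets (join, then prune)."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == len(a) + 1:
                # Prune: every k-item subset must itself be frequent.
                if all(frozenset(s) in frequent_k
                       for s in combinations(union, len(a))):
                    candidates.add(union)
    return candidates

# Level-2 frequent 2-itemsets from the worked example (L[2,2]).
L22 = {frozenset(p) for p in [("11*", "12*"), ("11*", "21*"), ("11*", "22*"),
                              ("12*", "22*"), ("21*", "22*")]}
C23 = apriori_gen(L22)
for candidate in sorted(sorted(c) for c in C23):
    print(candidate)
```

Only two 3-item candidates survive the prune, because {12*, 21*} is not frequent; their supports still have to be counted against the transaction table.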
20
Definitions
● A pattern or itemset A is one item Ai or a set of conjunctive items Ai ∧ … ∧ Aj
● The support of a pattern A in a set S, σ(A|S), is the number of transactions in S that contain A divided by the total number of transactions in S
● The confidence of a rule A → B in S is given by: φ(A→B) = σ(A∪B)/σ(A) (i.e., a conditional probability)
● Specify 2 thresholds, minsup (σ’) and minconf (φ’), with different values at different levels
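The two measures can be sketched in a few lines. The transactions below are made up for illustration, and the function names are assumptions, not part of the paper.

```python
def count(pattern, transactions):
    """Number of transactions that contain every item in `pattern`."""
    pattern = set(pattern)
    return sum(1 for items in transactions if pattern <= items)

def support(pattern, transactions):
    """sigma(A|S): share of transactions in S that contain A."""
    return count(pattern, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """phi(A -> B) = sigma(A u B) / sigma(A), computed from raw counts."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

# Toy transactions, made up for illustration only.
S = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"milk", "bread"},
]

print(support({"milk", "bread"}, S))       # 0.6
print(confidence({"milk"}, {"bread"}, S))  # 0.75
```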
22
Definitions
● A pattern A is frequent in set S at level l if:
  ● the support of A is no less than the corresponding minimum support threshold σ’ for that level
● A rule A → B is strong for a set S if:
  a. each ancestor of every item in A and B is frequent at its corresponding level,
  b. A ∧ B is frequent at the current level, and
  c. φ(A→B) ≥ φ’ (minconf criterion)
This ensures that the patterns examined at the lower levels arise from itemsets that have high support at higher levels
23
Example: Taxonomy
food
Level 1: milk bread
Level 2: 2% chocolate white wheat
Level 3: Dairyland Foremost ... ... Old Mills Wonder ... ...
Generalized ID (GID) System:
2% Foremost Milk is coded as GID 112
(1st item at level 1, 1st item at level 2, 2nd item at level 3)
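A small sketch of how such codes can be generalized to a higher level, assuming one digit per level as in GID 112. The helper names are hypothetical.

```python
def generalize(gid: str, level: int) -> str:
    """Keep the first `level` digits of a GID and mask the rest with '*'."""
    return gid[:level] + "*" * (len(gid) - level)

def is_descendant(gid: str, ancestor: str) -> bool:
    """True if `gid` falls under the (possibly masked) `ancestor` code."""
    return gid.startswith(ancestor.rstrip("*"))

print(generalize("112", 1))         # 1**  (milk)
print(generalize("112", 2))         # 11*  (2% milk)
print(is_descendant("112", "11*"))  # True
print(is_descendant("112", "12*"))  # False
```

Hierarchies with more than nine children per node would need a wider code per level; that case is not covered here.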
25
Example: Dataset
Table 1: A sales_transaction table
Trans-id  Bar_code_set
351428    {17325, 92108, …}
653234    {23423, 56432, …}

Table 2: A sales_item description relation
Bar_code  GID  Category  Brand     Content  Size   Price
17325     112  milk      Foremost  2%       1 Gal  $3.31
…         …    …         …         …        …      …
26
Example: Preprocessing
Join the sales_transaction table to the sales_item table to produce the encoded transaction table T[1]:
TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222, 323}
T3   {112, 122, 221, 411}
T4   {111, 121}
T5   {111, 122, 211, 221, 413}
T6   {211, 323, 524}
T7   {323, 411, 524, 713}
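The join can be sketched with plain dictionaries. Only the pair 17325 → 112 comes from the slides; every other bar code, transaction id, and name below is made up for illustration.

```python
# Bar code -> GID lookup, standing in for the sales_item relation.
# Only 17325 -> 112 comes from the slides; the other rows are made up.
sales_item = {"17325": "112", "00001": "211", "00002": "221"}

# Transaction id -> bar codes, standing in for sales_transaction (made up).
sales_transaction = {
    "t1": ["17325", "00001"],
    "t2": ["00002", "17325", "99999"],  # 99999 has no item description
}

def encode(transactions, items):
    """Replace each bar code by its GID, skipping unknown bar codes."""
    return {
        tid: sorted({items[code] for code in codes if code in items})
        for tid, codes in transactions.items()
    }

T1 = encode(sales_transaction, sales_item)
print(T1)  # {'t1': ['112', '211'], 't2': ['112', '221']}
```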
27
Example: Step 1
Find Level-1 frequent itemsets
Minsup = 4

T[1]
TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222, 323}
T3   {112, 122, 221, 411}
T4   {111, 121}
T5   {111, 122, 211, 221, 413}
T6   {211, 323, 524}
T7   {323, 411, 524, 713}

L[1,1]: Level-1 Frequent 1-Itemsets
Itemset  Support
{1**}    5
{2**}    5

L[1,2]: Level-1 Frequent 2-Itemsets
Itemset     Support
{1**, 2**}  4
28
Example: Step 2
Create T[2] by filtering T[1] with L[1,1]

L[1,1]
Itemset  Support
{1**}    5
{2**}    5

T[1]
TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222, 323}
T3   {112, 122, 221, 411}
T4   {111, 121}
T5   {111, 122, 211, 221, 413}
T6   {211, 323, 524}
T7   {323, 411, 524, 713}

Filtered T[2]
TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222}
T3   {112, 122, 221}
T4   {111, 121}
T5   {111, 122, 211, 221}
T6   {211}
29
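The filtering step can be sketched as follows, using the T[1] table and the L[1,1] result from the example. The variable names are assumptions.

```python
T1 = {
    "T1": ["111", "121", "211", "221"],
    "T2": ["111", "211", "222", "323"],
    "T3": ["112", "122", "221", "411"],
    "T4": ["111", "121"],
    "T5": ["111", "122", "211", "221", "413"],
    "T6": ["211", "323", "524"],
    "T7": ["323", "411", "524", "713"],
}
frequent_level1 = {"1", "2"}  # leading digits of {1**} and {2**}

T2 = {}
for tid, items in T1.items():
    kept = [gid for gid in items if gid[0] in frequent_level1]
    if kept:  # transactions left with no items are dropped (T7 here)
        T2[tid] = kept

for tid, items in T2.items():
    print(tid, items)
```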
Example: Step 3
Find Level-2 Frequent Itemsets
Minsup = 3

Filtered T[2]
TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222}
T3   {112, 122, 221}
T4   {111, 121}
T5   {111, 122, 211, 221}
T6   {211}

L[2,1]
Itemset  Support
{11*}    5
{12*}    4
{21*}    4
{22*}    4

L[2,2]
Itemset     Support
{11*, 12*}  4
{11*, 21*}  3
{11*, 22*}  4
{12*, 22*}  3
{21*, 22*}  3

L[2,3]
Itemset          Support
{11*, 12*, 22*}  3
{11*, 21*, 22*}  3
30
Example: Step 4
Find Level-3 Frequent Itemsets
Minsup = 3

Filtered T[2]
TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222}
T3   {112, 122, 221}
T4   {111, 121}
T5   {111, 122, 211, 221}
T6   {211}

L[3,1]
Itemset  Support
{111}    4
{211}    4
{221}    3

L[3,2]
Itemset     Support
{111, 211}  3

Stop: Lowest Level Reached
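The whole walk-through (Steps 1 to 4) can be reproduced with a short sketch. It is not the authors' implementation: it counts every k-item combination by brute force instead of generating Apriori candidates, which is only reasonable for a toy table, and it filters items by their parents on the fly instead of materializing T[2]. All function names are assumptions.

```python
from collections import Counter
from itertools import combinations

T1 = [
    {"111", "121", "211", "221"},
    {"111", "211", "222", "323"},
    {"112", "122", "221", "411"},
    {"111", "121"},
    {"111", "122", "211", "221", "413"},
    {"211", "323", "524"},
    {"323", "411", "524", "713"},
]
MINSUP = {1: 4, 2: 3, 3: 3}  # per-level thresholds from Steps 1, 3 and 4

def generalize(gid, level):
    """Mask all digits below `level`, e.g. generalize('112', 2) -> '11*'."""
    return gid[:level] + "*" * (len(gid) - level)

def mine_level(transactions, level, minsup, parents):
    """Frequent itemsets at one level as {k: {itemset: support}}.

    Items whose parent is not in `parents` are skipped (no filter at level 1).
    """
    rows = [
        {generalize(g, level) for g in items
         if parents is None or generalize(g, level - 1) in parents}
        for items in transactions
    ]
    found, k = {}, 1
    while True:
        counts = Counter()
        for row in rows:
            counts.update(combinations(sorted(row), k))
        frequent = {s: n for s, n in counts.items() if n >= minsup}
        if not frequent:
            return found
        found[k] = frequent
        k += 1

def mine_all(transactions, minsups):
    """Drill down level by level, passing frequent 1-items to the next level."""
    parents, result = None, {}
    for level in sorted(minsups):
        result[level] = mine_level(transactions, level, minsups[level], parents)
        if not result[level]:
            break  # nothing frequent here, so nothing to drill into
        parents = {itemset[0] for itemset in result[level][1]}
    return result

result = mine_all(T1, MINSUP)
for level, by_size in result.items():
    for k, itemsets in by_size.items():
        print(f"L[{level},{k}]", itemsets)
```

The printed tables should match L[1,1] through L[3,2] from the example slides.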

31
Are All Of The Strong Rules
Interesting?
MLDM creates unique challenges for rule pruning
The paper defines two filters for interesting rules:

1. Removal of redundant rules
2. Removal of unnecessary rules
33
Redundant Rules
Consider a strong rule at Level 1: Milk→Bread
food
milk bread
2% chocolate white wheat
Dairyland Foremost ... ... Old Mills Wonder ... ...
● This rule is likely to have descendant rules, which may or may not contain additional information even if they meet our minconf and minsup criteria at their level:
  ● 2% Milk→Wheat Bread, 2% Milk→White Bread, Chocolate Milk→Wheat Bread
● We need a way to distinguish between rules that add information and those that are redundant
34
Redundant Rules
A rule is redundant if its items are descendants of the items in another rule and its confidence falls within a certain range of the value expected from that rule.
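A hedged sketch of such a check. The slides do not give the formula for the expected confidence, so the estimate below (the ancestor rule's confidence scaled by the share of the consequent's support kept by the specialized item) is an assumption, as are the tolerance, the support values, and the function names.

```python
def expected_confidence(ancestor_conf, consequent_support,
                        ancestor_consequent_support):
    """Assumed estimate: scale the ancestor rule's confidence by the share of
    the ancestor consequent's support that the specialized consequent keeps."""
    return ancestor_conf * consequent_support / ancestor_consequent_support

def is_redundant(conf, expected, tolerance):
    """Redundant if the observed confidence is within `tolerance` of expected."""
    return abs(conf - expected) <= tolerance

# Made-up numbers: Milk -> Bread has confidence 0.8, bread has support 0.8,
# and wheat bread has support 0.2.
expected = expected_confidence(0.8, 0.2, 0.8)

print(is_redundant(0.22, expected, tolerance=0.05))  # True: close to expected
print(is_redundant(0.50, expected, tolerance=0.05))  # False: adds information
```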
35
Redundant Rules
Applying redundant rule reduction eliminates 40-70% of the discovered strong rules
36
Unnecessary Rules
Consider the following rules:
R: Milk→Bread (confidence = 80%)
R': Milk, Butter → Bread (confidence = 80%)
How much additional information do we gain from R'?
MLDM can produce very complex rules that meet our minsup and minconf criteria but do not contain much unique/useful information.
We need a way to distinguish between rules that add information and those that are unnecessary
37
Unnecessary Rule
A rule R is unnecessary if there is a simpler rule R' and φ(R) is within a given range of φ(R')
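A minimal sketch of this filter. It assumes that "simpler" means the same consequent with a strict subset of the antecedent items; the `Rule` type, the tolerance, and the Milk, Jam rule are hypothetical and only serve the illustration.

```python
from typing import FrozenSet, List, NamedTuple

class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float

def is_simpler(simple: Rule, other: Rule) -> bool:
    """Assumed meaning of 'simpler': same consequent, fewer antecedent items."""
    return (simple.consequent == other.consequent
            and simple.antecedent < other.antecedent)

def unnecessary_rules(rules: List[Rule], tolerance: float) -> List[Rule]:
    """Rules whose confidence is within `tolerance` of some simpler rule."""
    return [
        r for r in rules
        if any(is_simpler(s, r) and abs(r.confidence - s.confidence) <= tolerance
               for s in rules)
    ]

bread = frozenset({"bread"})
rules = [
    Rule(frozenset({"milk"}), bread, 0.80),
    Rule(frozenset({"milk", "butter"}), bread, 0.80),
    Rule(frozenset({"milk", "jam"}), bread, 0.95),  # made-up rule
]
flagged = unnecessary_rules(rules, tolerance=0.05)
print([sorted(r.antecedent) for r in flagged])  # [['butter', 'milk']]
```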
38
Hardware Setup
Hardware: Sun Microsystems SPARCstation 20
1. 32 MB RAM
2. 100 MHz clock
3. CLI
40
Algorithm Optimizations
● The authors proposed 3 variations of the original algorithm:
1) ML_T1LA
   ● Uses only one encoded table, T[1]
2) ML_TML1
   ● Generates T[1], T[2], …, T[n+1]
3) ML_T2LA
   ● Uses T[2], but calculates support for the lower levels with a single scan
41
ML_T1LA
● Instead of generating T[2] from T[1], the ML_T1LA algorithm computes support for all levels of the hierarchy in a single scan of T[1]
Pros:
● Avoids generating a new transaction table
● Limits the number of scans to the size of the largest transaction
Cons:
● Scanning T[1] requires scanning all items, even infrequent ones
  ● Performance may suffer for databases with many infrequent itemsets
● Requires a large amount of memory (with 32 MB RAM, this means page swapping)
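The shared-scan idea can be sketched for 1-itemsets as follows: one pass over T[1] counts every item at every level, and the per-level thresholds are applied top-down afterwards. This is an illustrative sketch under the example's thresholds, not the authors' code; the names are assumptions.

```python
from collections import Counter

T1 = [
    {"111", "121", "211", "221"},
    {"111", "211", "222", "323"},
    {"112", "122", "221", "411"},
    {"111", "121"},
    {"111", "122", "211", "221", "413"},
    {"211", "323", "524"},
    {"323", "411", "524", "713"},
]
MINSUP = {1: 4, 2: 3, 3: 3}

def generalize(gid, level):
    """Mask all digits below `level`, e.g. generalize('112', 2) -> '11*'."""
    return gid[:level] + "*" * (len(gid) - level)

# One pass over T[1]: count every item at every level of abstraction.
counts = {level: Counter() for level in MINSUP}
for items in T1:
    for level in MINSUP:
        counts[level].update({generalize(g, level) for g in items})

# Top-down filtering after the scan: keep an item only if it meets its
# level's minsup and its parent was kept one level up.
frequent = {}
for level in sorted(MINSUP):
    frequent[level] = {
        item for item, n in counts[level].items()
        if n >= MINSUP[level]
        and (level == 1 or generalize(item, level - 1) in frequent[level - 1])
    }

for level in sorted(frequent):
    print(level, sorted(frequent[level]))
```

Item 323 reaches a count of 3 but is still dropped because 3** is not frequent at level 1. The scan nevertheless had to count it, which is the cost listed under Cons.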
42
ML_TML1
● Instead of using only T[2] for rule mining, the ML_TML1 algorithm generates a table for each level, using L[i,1] to filter T[i] and create T[i+1]
Pros:
● Saves significant processing time if only a small portion of the data is frequent at each level
● Allows T[i] and L[i,1] to be created in parallel
Cons:
● May not be efficient if only a small number of items is filtered out at each level
43
ML_T2LA
● Like the base algorithm, ML_T2LA creates the T[2] table from the frequent itemsets in T[1]. However, it allows L[i,k] to be computed in parallel.
Pros:
● Saves time by limiting the number of scans
Cons:
● May not be efficient if only a small number of items is filtered out at each level
44
Experimental Results
● While the figures show that ML_T2LA performs best most of the time, the authors preferred ML_T1LA
45
Conclusions
● This paper demonstrated:
  ● Extending association rules from single-level to multiple-level
  ● A top-down progressive deepening technique for mining multiple-level association rules
  ● Filtering of uninteresting association rules
  ● Performance optimization techniques (not covered)
47
Future Work
● Develop efficient algorithms for mining multiple-level sequential patterns
● Cross-level associations
● Improve interestingness of rules
48
Exam Question 1
Q. What is a major drawback to multiple-level data mining using the same minsup at all levels of a concept hierarchy?
A. Support is large at higher levels of the hierarchy and smaller at lower levels. To ensure that sufficiently strong association rules are generated at the lower levels, we must reduce the minsup, which in turn could result in the generation of many uninteresting rules at higher levels. Thus we are faced with the problem of determining the optimal minsup for all levels.
50
Exam Question 2
Q. Give an example of a multiple-level association rule.
A.
High level: 80% of people who buy cereal also buy milk
Low level: 25% of people who buy Cheerios cereal also buy Hood 2% milk
51
Exam Question 3
Q. There were 3 examples of hierarchy types in multiple-level rule mining. Pick one and draw an example.
A.
Is-A:
Vehicle
  4-Wheels: Sedan, SUV
  2-Wheels: Bike, Motorcycle
Is-A with Multiple Inheritance:
Vehicle
Commuting Recreational
Car Bicycle Snowmobile
Whole-Part:
Computer
  Motherboard: RAM, CPU
  Hard Drive: RW Head, Platter
52
