
ASSOCIATION RULE MINING

As noted earlier, huge amounts of data are stored
electronically in many retail outlets due to the
barcoding of goods sold. It is natural to try to find
useful information in these mountains of data.

A conceptually simple yet interesting technique
is to find association rules from these large
databases.

The problem was introduced by Rakesh Agrawal
at IBM.
2
Introduction
Association rule mining (or market basket
analysis) searches for interesting customer
habits by looking at associations.
Applications in marketing, store layout,
customer segmentation, medicine, finance,
and many more.

Question: Suppose you are able to find that two products x and y (say bread and
milk or crackers and cheese) are frequently bought together. How can you use that
information?
3
Introduction
Applications
Market Basket Analysis: given a database of customer
transactions, where each transaction is a set of items, the goal
is to find groups of items which are frequently purchased
together.
Telecommunication (each customer is a transaction
containing the set of phone calls)
Credit Cards / Banking Services (each card/account is a
transaction containing the set of the customer's payments)
Medical Treatments (each patient is represented as a
transaction containing the ordered set of diseases)




Transaction ID   Items
10               Bread, Cheese, Newspaper
20               Bread, Cheese, Juice
30               Bread, Milk
40               Cheese, Juice, Milk, Coffee
50               Sugar, Tea, Coffee, Biscuits, Newspaper
60               Sugar, Tea, Coffee, Biscuits, Milk, Juice, Newspaper
70               Bread, Cheese
80               Bread, Cheese, Juice, Coffee
90               Bread, Milk
100              Sugar, Tea, Coffee, Bread, Milk, Juice, Newspaper

5
A Simple Example

Consider the following ten transactions of a shop selling only nine items. We
will return to it after explaining some terminology.
Let the number of different items sold be n
Let the number of transactions be N
Let the set of items be I = {i1, i2, …, in}. The
number of items may be large, perhaps several
thousand.
Let the set of transactions be T = {t1, t2, …, tN}.
Each transaction ti contains a subset of items
from the itemset {i1, i2, …, in}. These are the
things a customer buys when they visit the
supermarket. N is assumed to be large, perhaps
in the millions.
We do not consider the quantities of items bought.
6
Terminology
We want to find groups of items that tend to occur
together frequently.
Association rules are often written as
X → Y, meaning that whenever X appears, Y
also tends to appear. X and Y may be single
items or sets of items, but the same item does
not appear in both.
7
Terminology
Suppose X and Y appear together in only 1%
of the transactions, but whenever X appears
there is an 80% chance that Y also appears.
The 1% presence of X and Y together is called
the support (or prevalence) of the rule, and
80% is called the confidence (or predictability)
of the rule.
These are measures of interestingness of the
rule.


8
Terminology
Terminology
9
Confidence denotes the strength of the association
between X and Y. Support indicates the frequency of the
pattern. A minimum support is necessary if an
association is going to be of some business value.

If the chance of finding an item X in the N
transactions is x%, then we can say the probability of X is
P(X) = x/100.





Question: Why is minimum support necessary? Why is minimum confidence necessary?
10
The model: data
I = {i1, i2, …, im}: a set of items.
Transaction t:
t is a set of items, and t ⊆ I.
Transaction Database T: a set of transactions T = {t1, t2, …, tn}.

11
Transaction data: supermarket
data
Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}

tn: {biscuit, eggs, milk}
Concepts:
An item: an item/article in a basket
I: the set of all items sold in the store
A transaction: items purchased in a basket; it may
have TID (transaction ID)
A transactional dataset: A set of transactions
12
Transaction data: a set of
documents
A text document data set. Each document is
treated as a bag of keywords
doc1: Student, Teach, School
doc2: Student, School
doc3: Teach, School, City, Game
doc4: Baseball, Basketball
doc5: Basketball, Player, Spectator
doc6: Baseball, Coach, Game, Team
doc7: Basketball, Team, City, Game
13
The model: rules
A transaction t contains X, a set of items
(itemset) in I, if X ⊆ t.
An association rule is an implication of the form:
X → Y, where X, Y ⊂ I, and X ∩ Y = ∅

An itemset is a set of items.
E.g., X = {milk, bread, cereal} is an itemset.
A k-itemset is an itemset with k items.
E.g., {milk, bread, cereal} is a 3-itemset
14
Rule strength measures
Support: The rule holds with support sup in T
(the transaction data set) if sup% of
transactions contain X ∪ Y.
Confidence: The rule holds in T with
confidence conf if conf% of transactions that
contain X also contain Y.
An association rule is a pattern that states that
when X occurs, Y occurs with a certain
probability.
15
Support and Confidence
Support count: The support count of an
itemset X, denoted by X.count, in a data set T
is the number of transactions in T that contain
X. Assume T has n transactions.
Then,
support = (X ∪ Y).count / n
confidence = (X ∪ Y).count / X.count
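To make the two formulas above concrete, here is a minimal Python sketch (not part of the original slides; names and data are illustrative) that computes them over the small four-transaction example used later:

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
n = len(transactions)

def count(itemset):
    # support count: number of transactions containing every item in itemset
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Juice"}, {"Cheese"}
support = count(X | Y) / n              # (X ∪ Y).count / n = 2/4 = 0.5
confidence = count(X | Y) / count(X)    # (X ∪ Y).count / X.count = 2/2 = 1.0
print(support, confidence)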
Terminology
16
Suppose we know P(X ∪ Y); then what is the
probability of Y appearing if we know that X already
exists? It is written as P(Y|X).

The support for X → Y is the probability of both X and
Y appearing together, that is P(X ∪ Y).

The confidence of X → Y is the conditional probability
of Y appearing given that X exists. It is written as
P(Y|X) and read as "P of Y given X".




17
Association Rule Problem
Given:
a set I of all the items;
a database T of transactions;
minimum support p;
minimum confidence q;
Find:
all association rules X → Y with a minimum
support p and confidence q.
18
Problem Decomposition
1. Find all sets of items that have minimum
support (frequent itemsets)
2. Use the frequent itemsets to generate the
desired rules
Want to find all associations which have at least
p% support with at least q% confidence such
that
all rules satisfying any user constraints are found
the rules are found efficiently from large
databases
the rules found are actionable
19
The task
20
Many mining algorithms
There are a large number of them!!
They use different strategies and data structures.
Their resulting sets of rules are all the same.
Given a transaction data set T, a minimum support and
a minimum confidence, the set of association rules existing in
T is uniquely determined.
Any algorithm should find the same set of rules
although their computational efficiencies and
memory requirements may be different.
We study only one: the Apriori Algorithm

Association Rule Mining Task
Given a set of transactions T, the goal of
association rule mining is to find all rules
having
support ≥ minsup threshold
confidence ≥ minconf threshold

Brute-force approach:
List all possible association rules
Compute the support and confidence for each
rule
Prune rules that fail the minsup and minconf
thresholds
Computationally prohibitive!
Transaction ID   Items
10               Bread, Cheese, Newspaper
20               Bread, Cheese, Juice
30               Bread, Milk
40               Cheese, Juice, Milk, Coffee
50               Sugar, Tea, Coffee, Biscuits, Newspaper
60               Sugar, Tea, Coffee, Biscuits, Milk, Juice, Newspaper
70               Bread, Cheese
80               Bread, Cheese, Juice, Coffee
90               Bread, Milk
100              Sugar, Tea, Coffee, Bread, Milk, Juice, Newspaper

22
Example

We have 9 items and 10 transactions. Find support and confidence for each pair.
How many pairs are there?
There are 9 × 8 = 72 ordered pairs, half of them
duplicates, so 36 distinct pairs. That is too many to
analyse by hand, so we use an even simpler example of
n = 4 (Bread, Cheese, Juice, Milk) and N = 4. We want
to find rules with minimum support of 50% and minimum
confidence of 75%.




23
Example
Transaction ID   Items
100              Bread, Cheese
200              Bread, Cheese, Juice
300              Bread, Milk
400              Cheese, Juice, Milk

N = 4 results in the following frequencies. All items have
the 50% support we
require. Only two pairs
have the 50% support
and no 3-itemset has
the support.

We will look at the
two pairs that have
the minimum support
to find if they have
the required confidence.


24
Example
Itemsets Frequency
Bread 3
Cheese 3
Juice 2
Milk 2
(Bread, Cheese) 2
(Bread, Juice) 1
(Bread, Milk) 1
(Cheese, Juice) 2
(Cheese, Milk) 1
(Juice, Milk) 1
(Bread, Cheese, Juice) 1
(Bread, Cheese, Milk) 0
(Bread, Juice, Milk) 0
(Cheese, Juice, Milk) 1
(Bread, Cheese, Juice, Milk) 0

Items or itemsets that have the minimum
support are called frequent. In our example, all
the four items and two pairs are frequent.
We will now determine if the two pairs {Bread,
Cheese} and {Cheese, Juice} lead to
association rules with 75% confidence.
Every pair {A, B} can lead to two rules A → B
and B → A if both satisfy the minimum
confidence. The confidence of A → B is given by
the support for A and B together divided by the
support of A.

25
Terminology
We have four possible rules and their
confidence is given as follows:
Bread → Cheese with confidence of 2/3 = 67%
Cheese → Bread with confidence of 2/3 = 67%
Cheese → Juice with confidence of 2/3 = 67%
Juice → Cheese with confidence of 2/2 = 100%
Therefore only the last rule, Juice → Cheese,
has confidence above the minimum 75% and
qualifies. Rules that have more than user-
specified minimum confidence are called
confident.

26
Rules
This simple algorithm works well with four
items since there were only a total of 16
combinations that we needed to look at, but if
the number of items is, say, 100, the number of
combinations is astronomically large.
The number of combinations becomes about a
million with 20 items, since the number of
combinations is 2^n with n items (why?). The
naïve algorithm can be improved to deal more
effectively with larger data sets.
27
Problems with Brute Force
Frequent Itemset Generation
[Itemset lattice over five items A–E, from the null set up to ABCDE.]
Given d items, there are 2^d possible candidate itemsets.
Question: Can you think of an improvement over
the brute force method in which we looked at
every itemset combination possible?

Naïve algorithms, even with improvements, don't
work efficiently enough to deal with a large
number of items and transactions. We now
define a better algorithm.
29
Problems with Brute Force
Improved Brute Force
Rather than counting all possible item
combinations, we can look at each transaction
and count only the combinations that actually
occur, as shown below (that is, we don't count
itemsets with zero frequency).
Transaction ID Items Combinations
100 Bread, Cheese {Bread, Cheese}
200 Bread, Cheese,
Juice
{Bread, Cheese}, {Bread, Juice},
{Cheese, Juice}, {Bread, Cheese, Juice}
300 Bread, Milk {Bread, Milk}
400 Cheese, Juice, Milk {Cheese, Juice}, {Cheese, Milk}, {Juice,
Milk}, {Cheese, Juice, Milk}

30
Improved Brute Force
Itemsets Frequency
Bread 3
Cheese 3
Juice 2
Milk 2
(Bread, Cheese) 2
(Bread, Juice) 1
(Bread, Milk) 1
(Cheese, Milk) 1
(Cheese, Juice) 2
(Juice, Milk) 1
(Bread, Cheese, Juice) 1
(Cheese, Juice, Milk) 1

31
Removing zero frequencies leads to the smaller table above.
We then proceed as before, but the improvement is
not large and we need better techniques.
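A small Python sketch of this per-transaction counting idea (illustrative; not from the slides):

from itertools import combinations
from collections import Counter

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]

counts = Counter()
for t in transactions:
    for k in range(1, len(t) + 1):
        for combo in combinations(sorted(t), k):
            counts[combo] += 1          # only itemsets that actually occur are stored

min_support_count = 2                   # 50% of 4 transactions
frequent = {itemset: c for itemset, c in counts.items() if c >= min_support_count}
print(frequent)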
Frequent Itemset Generation
Strategies
Reduce the number of candidates (M)
Complete search: M = 2^d
Use pruning techniques to reduce M

Reduce the number of transactions (N)
Reduce size of N as the size of itemset increases

Reduce the number of comparisons (NM)
Use efficient data structures to store the
candidates or transactions
No need to match every candidate against every
transaction
33
The Apriori algorithm
Probably the best known algorithm
Two steps:
Find all itemsets that have minimum support
(frequent itemsets, also called large itemsets).
Use frequent itemsets to generate rules.

E.g., a frequent itemset
{Chicken, Clothes, Milk} [sup = 3/7]
and one rule from the frequent itemset
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
34
Step 1: Mining all frequent
itemsets
A frequent itemset is an itemset whose support
is ≥ minsup.
Key idea: the apriori property (downward
closure property): any subset of a frequent
itemset is also a frequent itemset.
[Lattice of itemsets over A, B, C, D illustrating downward closure.]
The Apriori Algorithm
Frequent Itemset Property:
Any subset of a frequent itemset is frequent.
Contrapositive:
If an itemset is not frequent, none of its
supersets are frequent.
Reducing Number of
Candidates
Apriori principle:
If an itemset is frequent, then all of its subsets must also
be frequent

The Apriori principle holds due to the following property of
the support measure:

∀X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y)

The support of an itemset never exceeds the support of its
subsets.
This is known as the anti-monotone property of support.
(A monotonic function, or monotone function, is a function
between ordered sets that preserves the given order.)
Illustrating Apriori Principle
[Itemset lattice over A–E: once an itemset such as {A, B} is found to be
infrequent, all of its supersets are pruned from the search.]
Illustrating Apriori Principle
Item Count
Bread 4
Coke 2
Milk 4
Beer 3
Diaper 4
Eggs 1
Itemset Count
{Bread,Milk} 3
{Bread,Beer} 2
{Bread,Diaper} 3
{Milk,Beer} 2
{Milk,Diaper} 3
{Beer,Diaper} 3
Itemset Count
{Bread,Milk,Diaper} 3

Items (1-itemsets)
Pairs (2-itemsets)

(No need to generate
candidates involving Coke
or Eggs)
Triplets (3-itemsets)
Minimum Support = 3
If every subset is considered,
6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41
With support-based pruning,
6 + 6 + 1 = 13
39
The Apriori Algorithm: Basics
The Apriori Algorithm is an influential algorithm for
mining frequent itemsets for boolean association rules.

Key Concepts:
Frequent Itemsets: The sets of items that have minimum
support (denoted by Li for the i-th itemset).
Apriori Property: Any subset of a frequent itemset must
be frequent.
Join Operation: To find Lk, a set of candidate k-itemsets
is generated by joining Lk-1 with itself.

The Apriori Algorithm
40
To find associations, this classical Apriori algorithm
may be simply described by a two-step approach:

Step 1: discover all frequent itemsets that have
support above the minimum support required

Step 2: use the frequent itemsets to generate the
association rules that have a high enough confidence
level
Apriori Algorithm Terminology
41
A k-itemset is a set of k items.
The set Ck is a set of candidate k-itemsets that are
potentially frequent.
The set Lk is a subset of Ck and is the set of k-itemsets
that are frequent.
Step 1 Computing L1
Scan all transactions. Find all frequent items
that have support above the required p%. Let
these frequent items be labeled L1.
Step 2 Apriori-gen Function
Use the frequent items L1 to build all possible
item pairs like {Bread, Cheese} if Bread and
Cheese are in L1. The set of these item pairs
is called C2, the candidate set.



42
Step-by-step Apriori Algorithm
Step 3 Pruning
Scan all transactions and find all pairs in the
candidate pair set C2 that are frequent. Let
these frequent pairs be L2.
Step 4 General rule
A generalization of Step 2. Build the candidate set
of k-itemsets Ck by combining frequent itemsets
in the set Lk-1.

43
The Apriori Algorithm
Step 5 Pruning
Step 5, generalization of Step 3. Scan all
transactions and find all item sets in Ck that
are frequent. Let these frequent itemsets be
Lk.
Step 6 Continue
Continue with Step 4 unless Lk is empty.
Step 7 Stop
Stop when Lk is empty.
44
The Apriori Algorithm
45
The Apriori Algorithm in a
Nutshell
Find the frequent itemsets: the sets of items that have
minimum support
A subset of a frequent itemset must also be a frequent
itemset
i.e., if {A, B} is a frequent itemset, both {A} and {B}
must also be frequent itemsets
Iteratively find frequent itemsets with cardinality from 1
to k (k-itemset)
Use the frequent itemsets to generate association rules.

46
The Apriori Algorithm : Pseudo
code
Join Step: Ck is generated by joining Lk-1 with itself
Prune Step: Any (k-1)-itemset that is not frequent cannot be a
subset of a frequent k-itemset
Pseudo-code:
Ck: Candidate itemset of size k
Lk: frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
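The loop above can be sketched directly in Python. The following compact, illustrative implementation (function and variable names are not from the slides) reproduces the level-wise generate, count and prune cycle:

from itertools import combinations

def apriori(transactions, min_support_count):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets
    counts = {frozenset([i]): sum(1 for t in transactions if i in t) for i in items}
    L = {s for s, c in counts.items() if c >= min_support_count}
    frequent = {s: counts[s] for s in L}
    k = 2
    while L:
        # candidate generation (join), then subset-based pruning
        C = {a | b for a in L for b in L if len(a | b) == k}
        C = {c for c in C if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # one scan of the database counts every surviving candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in C}
        L = {c for c, n in counts.items() if n >= min_support_count}
        frequent.update({c: counts[c] for c in L})
        k += 1
    return frequent

# the four-transaction database from the later 'Another Example' slide, min support count 2
print(apriori([{"A", "B", "D"}, {"A", "C", "E"}, {"A", "B", "C", "E"}, {"C", "E"}], 2))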

47
The Apriori Algorithm: Example
Consider a database, D , consisting
of 9 transactions.
Suppose min. support count
required is 2 (i.e. min_sup = 2/9 =
22 % )
Let the minimum confidence required be 70%.
We have to first find out the
frequent itemset using Apriori
algorithm.
Then, Association rules will be
generated using min. support &
min. confidence.
TID    List of Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
State
University of
New York,
Stony Brook
48
Step 1: Generating 1-itemset Frequent Pattern
Itemset   Sup. Count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

In the first iteration of the algorithm, each item is a member of the set
of candidate 1-itemsets, C1. Scan D for the count of each candidate, then
compare each candidate's support count with the minimum support count.
The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets
satisfying minimum support; here every candidate qualifies, so L1 = C1.
49
Step 2: Generating 2-itemset Frequent Pattern
C2 (candidates generated from L1):
{I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}, {I3, I4}, {I3, I5}, {I4, I5}

Scan D for the count of each candidate in C2:
Itemset    Sup. Count
{I1, I2}   4
{I1, I3}   4
{I1, I4}   1
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
{I3, I4}   0
{I3, I5}   1
{I4, I5}   0

Compare each candidate's support count with the minimum support count to get L2:
Itemset    Sup. Count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
50
Step 2: Generating 2-itemset Frequent Pattern [Cont.]
To discover the set of frequent 2-itemsets, L2, the
algorithm uses L1 Join L1 to generate a candidate set of 2-itemsets, C2.
Next, the transactions in D are scanned and the support
count for each candidate itemset in C2 is accumulated
(as shown in the middle table).
The set of frequent 2-itemsets, L2, is then determined,
consisting of those candidate 2-itemsets in C2 having
minimum support.
Note: We haven't used the Apriori Property yet.

51
Step 3: Generating 3-itemset Frequent Pattern
C3 (after the join and prune steps), with support counts from a scan of D, gives L3:
Itemset        Sup. Count
{I1, I2, I3}   2
{I1, I2, I5}   2

The generation of the set of candidate 3-itemsets, C3, involves use of
the Apriori Property.
In order to find C3, we compute L2 Join L2.
C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5},
{I2, I4, I5}}.
Now, the Join step is complete and the Prune step will be used to reduce the
size of C3. The Prune step helps to avoid heavy computation due to a large Ck.
52
Step 3: Generating 3-itemset Frequent Pattern [Cont.]
Based on the Apriori property that all subsets of a frequent itemset must
also be frequent, we can determine that the four latter candidates cannot
possibly be frequent. How?
For example, let's take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3}
and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we
keep {I1, I2, I3} in C3.
Let's take another example, {I2, I3, I5}, which shows how the pruning is
performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}.
BUT {I3, I5} is not a member of L2 and hence is not frequent, violating the
Apriori Property. Thus we have to remove {I2, I3, I5} from C3.
Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all members of the
result of the Join operation for pruning.
Now, the transactions in D are scanned in order to determine L3, consisting
of those candidate 3-itemsets in C3 having minimum support.
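The join and prune steps used here to build C3 from L2 can be sketched in a few lines of Python (illustrative; the simplified join below unions any two 2-itemsets rather than using the prefix-based join, but the prune step yields the same C3):

from itertools import combinations

L2 = {frozenset(s) for s in [{"I1", "I2"}, {"I1", "I3"}, {"I1", "I5"},
                             {"I2", "I3"}, {"I2", "I4"}, {"I2", "I5"}]}

def apriori_gen(L_prev, k):
    # join: union pairs of (k-1)-itemsets whose union has exactly k items
    candidates = {a | b for a in L_prev for b in L_prev if len(a | b) == k}
    # prune: keep a candidate only if every (k-1)-subset is frequent
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

print(apriori_gen(L2, 3))   # {I1, I2, I3} and {I1, I2, I5}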
53
Step 4: Generating 4-itemset Frequent Pattern
The algorithm uses L3 Join L3 to generate a candidate set
of 4-itemsets, C4. Although the join results in {{I1, I2, I3,
I5}}, this itemset is pruned since its subset {I2, I3, I5} is
not frequent.
Thus, C4 = ∅, and the algorithm terminates, having found
all of the frequent itemsets. This completes our Apriori
Algorithm.
What's Next?
These frequent itemsets will be used to generate strong
association rules (where strong association rules satisfy
both minimum support and minimum confidence).

54
Step 5: Generating Association Rules from Frequent
Itemsets
Procedure:
For each frequent itemset l, generate all nonempty subsets of l.
For every nonempty subset s of l, output the rule s → (l - s) if
support_count(l) / support_count(s) >= min_conf, where min_conf
is the minimum confidence threshold.

Back To Example:
We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4},
{I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
Let's take l = {I1, I2, I5}.
Its nonempty subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.


55
Step 5: Generating Association Rules from Frequent
Itemsets [Cont.]
Let the minimum confidence threshold be, say, 70%.
The resulting association rules are shown below,
each listed with its confidence.
R1: I1 ∧ I2 → I5
Confidence = sc{I1,I2,I5}/sc{I1,I2} = 2/4 = 50%
R1 is rejected.
R2: I1 ∧ I5 → I2
Confidence = sc{I1,I2,I5}/sc{I1,I5} = 2/2 = 100%
R2 is selected.
R3: I2 ∧ I5 → I1
Confidence = sc{I1,I2,I5}/sc{I2,I5} = 2/2 = 100%
R3 is selected.
56
Step 5: Generating Association Rules from
Frequent Itemsets [Cont.]
R4: I1 → I2 ∧ I5
Confidence = sc{I1,I2,I5}/sc{I1} = 2/6 = 33%
R4 is rejected.
R5: I2 → I1 ∧ I5
Confidence = sc{I1,I2,I5}/sc{I2} = 2/7 = 29%
R5 is rejected.
R6: I5 → I1 ∧ I2
Confidence = sc{I1,I2,I5}/sc{I5} = 2/2 = 100%
R6 is selected.
In this way, we have found three strong association
rules.
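These three rules can be reproduced with a short Python sketch (illustrative names; the support counts are the ones found above):

from itertools import combinations

support_count = {frozenset(s): c for s, c in [
    ({"I1"}, 6), ({"I2"}, 7), ({"I5"}, 2),
    ({"I1", "I2"}, 4), ({"I1", "I5"}, 2), ({"I2", "I5"}, 2),
    ({"I1", "I2", "I5"}, 2),
]}

def generate_rules(l, min_conf):
    l = frozenset(l)
    rules = []
    for k in range(1, len(l)):                      # every nonempty proper subset s
        for s in map(frozenset, combinations(l, k)):
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

for lhs, rhs, conf in generate_rules({"I1", "I2", "I5"}, 0.7):
    print(lhs, "->", rhs, f"confidence = {conf:.0%}")   # the three selected rules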
57
Another Example
Database TDB (Supmin = 2):
Tid   Items
10    A, B, D
20    A, C, E
30    A, B, C, E
40    C, E

1st scan, C1: {A}: 3, {B}: 2, {C}: 3, {D}: 1, {E}: 3
L1: {A}: 3, {B}: 2, {C}: 3, {E}: 3

2nd scan, C2 candidates: {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
C2 counts: {A, B}: 2, {A, C}: 2, {A, E}: 2, {B, C}: 1, {B, E}: 1, {C, E}: 3
L2: {A, B}: 2, {A, C}: 2, {A, E}: 2, {C, E}: 3

3rd scan, C3: {A, C, E}
L3: {A, C, E}: 2
An Example
58
This example is similar to the last example but we have
added two more items and another transaction making
five transactions and six items. We want to find
association rules with 50% support and 75% confidence.
Transaction ID Items
100 Bread, Cheese, Eggs, Juice
200 Bread, Cheese, Juice
300 Bread, Milk, Yogurt
400 Bread, Juice, Milk
500 Cheese, Juice, Milk

First find L1. 50% support requires that each
frequent item appear in at least three
transactions. Therefore L1 is given by:




59
Example
Item Frequency
Bread 4
Cheese 3
Juice 4
Milk 3

The candidate 2-itemsets or C2 therefore has
six pairs (why?). These pairs and their
frequencies are:




60
Example
Item Pairs Frequency
(Bread, Cheese) 2
(Bread, Juice) 3
(Bread, Milk) 2
(Cheese, Juice) 3
(Cheese, Milk) 1
(Juice, Milk) 2

Question: Why did we not select (Egg, Milk)?
L2 has only two frequent item pairs {Bread, Juice}
and {Cheese, Juice}. After these two frequent
pairs, there are no candidate 3-itemsets (why?)
since we do not have two 2-itemsets that have the
same first item (why?).
The two frequent pairs lead to the following
possible rules:
Bread → Juice
Juice → Bread
Cheese → Juice
Juice → Cheese

61
Deriving Rules
The confidence of these rules is obtained by
dividing the support for both items in the rule by
the support of the item on the left hand side of the
rule.
The confidences of the four rules are therefore:
3/4 = 75% (Bread → Juice)
3/4 = 75% (Juice → Bread)
3/3 = 100% (Cheese → Juice)
3/4 = 75% (Juice → Cheese)
Since all of them have the minimum 75%
confidence, they all qualify. We are done since
there are no 3-itemsets.

62
Deriving Rules
Rule Generation
Given a frequent itemset L, find all non-empty subsets f
⊂ L such that f → L - f satisfies the minimum confidence
requirement
If {A,B,C,D} is a frequent itemset, candidate rules:
ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD,
BD → AC, CD → AB

If |L| = k, then there are 2^k - 2 candidate association
rules (ignoring L → ∅ and ∅ → L)
63
Rule Generation
How to efficiently generate rules from frequent itemsets?
In general, confidence does not have an anti-monotone property
c(ABC → D) can be larger or smaller than c(AB → D)

But confidence of rules generated from the same itemset has an
anti-monotone property
e.g., L = {A,B,C,D}:

c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)

Confidence is anti-monotone w.r.t. the RHS of the rule
64
Rule Generation for Apriori Algorithm
[Lattice of rules generated from the frequent itemset {A,B,C,D}, from
ABCD ⇒ { } at the top down to A ⇒ BCD, B ⇒ ACD, C ⇒ ABD, D ⇒ ABC at the
bottom. If a rule such as BCD ⇒ A has low confidence, then all rules below it
in the lattice (those whose antecedent is a subset of {B,C,D}) are pruned.]
65
Rule Generation for Apriori Algorithm
A candidate rule is generated by merging two rules that
share the same prefix in the rule consequent.

join(CD ⇒ AB, BD ⇒ AC)
would produce the candidate
rule D ⇒ ABC

Prune rule D ⇒ ABC if its
subset AD ⇒ BC does not have
high confidence
[Diagram: CD ⇒ AB and BD ⇒ AC merge into D ⇒ ABC.]
66
Rule generation algorithm
Key fact: moving items from the antecedent to the consequent
never changes support, and never increases confidence.
Algorithm:
For each itemset I with minsup:
- Find all minconf rules with a single consequent, of the form (I - L1 → L1)
- Repeat:
  Guess candidate consequents Ck by appending items from I - Lk-1 to Lk-1
  Verify the confidence of each rule I - Ck → Ck using known itemset support values
68
Association Rules
Consider:
AB → C (90% confidence)
and A → C (92% confidence)

Clearly the first rule is of no use. We should look for
more complex rules only if they are better than simple
rules.
Pattern Evaluation
Association rule algorithms tend to produce too
many rules
many of them are uninteresting or redundant
Redundant if {A,B,C} → {D} and {A,B} → {D}
have the same support & confidence

Interestingness measures can be used to
prune/rank the derived patterns

In the original formulation of association rules,
support & confidence are the only measures used
70
Association & Correlation
As we can see support-confidence framework can
be misleading; it can identify a rule (A=>B) as
interesting (strong) when, in fact the occurrence of
A might not imply the occurrence of B.

Correlation Analysis provides an alternative
framework for finding interesting relationships, or
to improve understanding of meaning of some
association rules (a lift of an association rule).
Computing Interestingness
Measure
Given a rule X → Y, the information needed to compute rule
interestingness can be obtained from a contingency table.

Contingency table for X → Y:
          Y     ¬Y
X         f11   f10   f1+
¬X        f01   f00   f0+
          f+1   f+0   |T|

f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y

These counts are used to define various measures:
support, confidence, lift, Gini, J-measure, etc.
Drawback of Confidence

           Coffee   ¬Coffee
Tea        15       5         20
¬Tea       75       5         80
           90       10        100

Association Rule: Tea → Coffee

Confidence = P(Coffee|Tea) = 15/20 = 0.75
but P(Coffee) = 0.9
Although confidence is high, the rule is misleading:
P(Coffee|¬Tea) = 75/80 = 0.9375
73
Correlation Concepts
Two itemsets A and B are independent (the
occurrence of A is independent of the occurrence of
itemset B) iff
P(A ∪ B) = P(A) P(B)
Otherwise A and B are dependent and correlated.

The measure of correlation between
A and B is given by the formula:
Corr(A,B) = P(A ∪ B) / (P(A) P(B))
74
Correlation Concepts [Cont.]
corr(A,B) >1 means that A and B are positively
correlated i.e. the occurrence of one implies the
occurrence of the other.

corr(A,B) < 1 means that the occurrence of A is
negatively correlated with ( or discourages) the
occurrence of B.

corr(A,B) =1 means that A and B are independent
and there is no correlation between them.
Statistical Independence
Population of 1000 students
600 students know how to swim (S)
700 students know how to bike (B)
420 students know how to swim and bike (S,B)

P(S,B) = 420/1000 = 0.42
P(S) × P(B) = 0.6 × 0.7 = 0.42

P(S,B) = P(S) × P(B) => Statistical independence
P(S,B) > P(S) × P(B) => Positively correlated
P(S,B) < P(S) × P(B) => Negatively correlated
76
Association & Correlation
The correlation formula can be re-written as
Corr(A,B) = P(B|A) / P(B)

We already know that
Support(A → B) = P(A ∪ B)
Confidence(A → B) = P(B|A)
That means that Confidence(A → B) = Corr(A,B) × P(B)

So correlation, support and confidence are all different, but the
correlation provides extra information about the association rule (A → B).

We say that the correlation Corr(A,B) provides the LIFT of the
association rule (A => B),
i.e. A is said to increase (or LIFT) the likelihood of B by the factor
returned by the formula for Corr(A,B).
Lift measures how much more likely the RHS is, given the LHS, than the
RHS alone.

Statistical-based Measures
Measures that take into account statistical
dependence
Lift = P(Y|X) / P(Y)

Interest = P(X,Y) / (P(X) P(Y))

PS = P(X,Y) - P(X) P(Y)

φ-coefficient = (P(X,Y) - P(X) P(Y)) / sqrt( P(X)[1 - P(X)] P(Y)[1 - P(Y)] )
Example: Lift/Interest

           Coffee   ¬Coffee
Tea        15       5         20
¬Tea       75       5         80
           90       10        100

Association Rule: Tea → Coffee

Confidence = P(Coffee|Tea) = 0.75
but P(Coffee) = 0.9
Lift = 0.75/0.9 = 0.8333 (< 1, therefore Tea and Coffee are negatively associated)
Rule Evaluation - Lift
Transaction No. Item 1 Item 2 Item 3 Item 4
100 Beer Diaper Chocolate
101 Milk Chocolate Shampoo
102 Beer Milk Vodka Chocolate
103 Beer Milk Diaper Chocolate
104 Milk Diaper Beer
What's the support and confidence for the rule {Chocolate} → {Milk}?
Support = 3/5; Confidence = 3/4
Very high support and confidence.
Does Chocolate really lead to Milk purchase?
No! Because Milk occurs in 4 out of 5 transactions. Chocolate is even
decreasing the chance of a Milk purchase (3/4 < 4/5).
Lift = (3/4)/(4/5) = 0.9375 < 1
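A small Python sketch (illustrative names) that reproduces these numbers from the five transactions above:

transactions = [
    {"Beer", "Diaper", "Chocolate"},
    {"Milk", "Chocolate", "Shampoo"},
    {"Beer", "Milk", "Vodka", "Chocolate"},
    {"Beer", "Milk", "Diaper", "Chocolate"},
    {"Milk", "Diaper", "Beer"},
]
n = len(transactions)

def p(itemset):
    # probability that a random transaction contains every item in itemset
    return sum(1 for t in transactions if set(itemset) <= t) / n

support = p({"Chocolate", "Milk"})          # 3/5
confidence = support / p({"Chocolate"})     # (3/5)/(4/5) = 3/4
lift = confidence / p({"Milk"})             # (3/4)/(4/5) = 0.9375
print(support, confidence, lift)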
There are lots of
measures proposed
in the literature

Some measures are
good for certain
applications, but not
for others

What criteria should
we use to determine
whether a measure is
good or bad?

Efficiency
81
Consider an example of a supermarket database which
might have several thousand items including 1000
frequent items and several million transactions.

Which part of the apriori algorithm will be the most
expensive to compute? Why?
Efficiency
82
The algorithm to construct the candidate set Ci to find
the frequent set Li is crucial to the performance of the
Apriori algorithm. The larger the candidate set, the higher
the processing cost of discovering the frequent itemsets,
since the transactions must be scanned for each candidate.
Given that the numbers of early candidate itemsets are very
large, the initial iterations dominate the cost.

In a supermarket database with about 1000 frequent
items, there will be almost half a million candidate
pairs C2 that need to be searched to find L2.
Efficiency
83
Generally the number of frequent pairs out of half a
million pairs will be small and therefore (why?) the
number of 3-itemsets should be small.

Therefore, it is the generation of the frequent set L2
that is the key to improving the performance of the
Apriori algorithm.

Comment
In class examples we usually require high
support, for example, 25%, 30% or even
50%. These support values are very high
if the number of items and number of
transactions is large. For example, 25%
support in a supermarket transaction
database means searching for items that
have been purchased by one in four
customers! Not many items would
probably qualify.
Practical applications therefore deal with
much smaller support, sometimes even
down to 1% or lower.
84
Transactions Storage
Transaction ID Items
10 A, B, D
20 D, E, F
30 A, F
40 B, C, D
50 E, F
60 D, E, F
70 C, D, F
80 A, C, D, F

85
Consider only eight transactions with transaction IDs {10, 20, 30, 40, 50,
60, 70, 80}. This set of eight transactions with six items can be represented
in at least three different ways as follows
Horizontal Representation
TID A B C D E F
10 1 1 0 1 0 0
20 0 0 0 1 1 1
30 1 0 0 0 0 1
40 0 1 1 1 0 0
50 0 0 0 0 1 1
60 0 0 0 1 1 1
70 0 0 1 1 0 1
80 1 0 1 1 0 1

86
Vertical Representation
Items 10 20 30 40 50 60 70 80
A 1 0 1 0 0 0 0 1
B 1 0 0 1 0 0 0 0
C 0 0 0 1 0 0 1 1
D 1 1 0 1 0 1 1 1
E 0 1 0 0 1 1 0 0
F 0 1 1 0 1 1 1 1

87
88
Bigger Example
TID Items
1 Biscuits, Bread, Cheese, Coffee, Yogurt
2 Bread, Cereal, Cheese, Coffee
3 Cheese, Chocolate, Donuts, Juice, Milk
4 Bread, Cheese, Coffee, Cereal, Juice
5 Bread, Cereal, Chocolate, Donuts, Juice
6 Milk, Tea
7 Biscuits, Bread, Cheese, Coffee, Milk
8 Eggs, Milk, Tea
9 Bread, Cereal, Cheese, Chocolate, Coffee
10 Bread, Cereal, Chocolate, Donuts, Juice
11 Bread, Cheese, Juice
12 Bread, Cheese, Coffee, Donuts, Juice
13 Biscuits, Bread, Cereal
14 Cereal, Cheese, Chocolate, Donuts, Juice
15 Chocolate, Coffee
16 Donuts
17 Donuts, Eggs, Juice
18 Biscuits, Bread, Cheese, Coffee
19 Bread, Cereal, Chocolate, Donuts, Juice
20 Cheese, Chocolate, Donuts, Juice
21 Milk, Tea, Yogurt
22 Bread, Cereal, Cheese, Coffee
23 Chocolate, Donuts, Juice, Milk, Newspaper
24 Newspaper, Pastry, Rolls
25 Rolls, Sugar, Tea

89
Frequency of Items
Item No Item name Frequency
1 Biscuits 4
2 Bread 13
3 Cereal 10
4 Cheese 11
5 Chocolate 9
6 Coffee 9
7 Donuts 10
8 Eggs 2
9 Juice 11
10 Milk 6
11 Newspaper 2
12 Pastry 1
13 Rolls 2
14 Sugar 1
15 Tea 4
16 Yogurt 2

90
Frequent Items
Assume 25% support. In 25 transactions, a frequent item must
occur in at least 7 transactions. The frequent 1-itemset, or L1, is
given below. How many candidates are in C2? List them.
Item Frequency
Bread 13
Cereal 10
Cheese 11
Chocolate 9
Coffee 9
Donuts 10
Juice 11

91
L2

The following pairs are frequent. Now find C3 and then L3 and the rules.
Frequent 2-itemset Frequency
{Bread, Cereal} 9
{Bread, Cheese} 8
{Bread, Coffee} 8
{Cheese, Coffee} 9
{Chocolate, Donuts} 7
{Chocolate, Juice} 7
{Donuts, Juice} 9

92
Rules
The full set of rules are given below. Could some rules be removed?
Cheese → Bread
Cheese → Coffee
Coffee → Bread
Coffee → Cheese
Cheese, Coffee → Bread
Bread, Coffee → Cheese
Bread, Cheese → Coffee
Chocolate → Donuts
Chocolate → Juice
Donuts → Chocolate
Donuts → Juice
Donuts, Juice → Chocolate
Chocolate, Juice → Donuts
Chocolate, Donuts → Juice
Bread → Cereal
Cereal → Bread

Study the above rules carefully.
93
Improving the Apriori Algorithm
Many techniques for improving the efficiency have
been proposed:
Pruning (already mentioned)
Hashing based technique
Transaction reduction
Partitioning
Sampling
Dynamic itemset counting
94
Pruning
Pruning can reduce the size of the candidate set
Ck. We want to transform Ck into the set of frequent
itemsets Lk. To reduce the work of checking, we may
use the rule that all subsets of each candidate in Ck
must also be frequent.
95
Example
Suppose the items are A, B, C, D, E, F, …, X, Y, Z
Suppose L1 is A, C, E, P, Q, S, T, V, W, X
Suppose L2 is {A, C}, {A, F}, {A, P}, {C, P}, {E, P},
{E, G}, {E, V}, {H, J}, {K, M}, {Q, S}, {Q, X}
Are you able to identify errors in the L2 list?
What is C3?
How do we prune C3?
C3 is {A, C, P}, {E, P, V}, {Q, S, X}
96
Hashing
The direct hashing and pruning (DHP) algorithm
attempts to generate large itemsets efficiently and
reduces the transaction database size.

When generating L1, the algorithm also generates all
the 2-itemsets for each transaction, hashes them to
a hash table and keeps a count.
Hash tables are a common approach to the
storing/searching problem.
Hash Tables
What is a Hash Table?
The simplest kind of hash table is an array of records; this example
has 701 records, indexed [0] to [700].
Each record has a special field, called its key. In this example, the
key is a long integer field called Number.
The number might be a person's identification number, and the rest of
the record has information about the person.
When a hash table is in use, some spots contain valid records, and
other spots are "empty".
Inserting a New Record
In order to insert a new record, the key must somehow be converted to
an array index. The index is called the hash value of the key.
A typical way to create a hash value: (Number mod 701).
What is (580625685 mod 701)? It is 3, so the hash value is used as the
location of the new record: the record with key 580625685 is stored
at index [3].
Collisions
Here is another new record to insert, with a hash value of [2]. This is
called a collision, because there is already another valid record at [2].
When a collision occurs, move forward until you find an empty spot; the
new record goes in that empty spot.
Searching for a Key
The data that's attached to a key can be found fairly quickly. Calculate
the hash value and check that location of the array for the key.
If it is not there, keep moving forward until you find the key, or you
reach an empty spot.
When the item is found, the information can be copied to the necessary
location.
Deleting a Record
Records may also be deleted from a hash table. But the location must not
be left as an ordinary "empty spot", since that could interfere with
searches.
The location must be marked in some special way so that a search can tell
that the spot used to have something in it.
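A minimal Python sketch of this open-addressing (linear probing) scheme, using 701 slots and (key mod 701) as the hash function; the class name and the DELETED marker are illustrative:

TABLE_SIZE = 701
EMPTY, DELETED = object(), object()

class HashTable:
    def __init__(self):
        self.slots = [EMPTY] * TABLE_SIZE

    def insert(self, key, value):
        i = key % TABLE_SIZE
        while self.slots[i] not in (EMPTY, DELETED):   # collision: move forward
            i = (i + 1) % TABLE_SIZE
        self.slots[i] = (key, value)

    def search(self, key):
        i = key % TABLE_SIZE
        while self.slots[i] is not EMPTY:              # stop only at a truly empty spot
            if self.slots[i] is not DELETED and self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % TABLE_SIZE
        return None

    def delete(self, key):
        i = key % TABLE_SIZE
        while self.slots[i] is not EMPTY:
            if self.slots[i] is not DELETED and self.slots[i][0] == key:
                self.slots[i] = DELETED                # mark specially, do not leave it empty
                return
            i = (i + 1) % TABLE_SIZE

table = HashTable()
table.insert(580625685, "record A")    # 580625685 mod 701 = 3
print(table.search(580625685))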
121
Example
Transaction ID Items
100 Bread, Cheese, Eggs, Juice
200 Bread, Cheese, Juice
300 Bread, Milk, Yogurt
400 Bread, Juice, Milk
500 Cheese, Juice, Milk

100 (B, C) (B, E) (B, J) (C, E) (C, J) (E, J)
200 (B, C) (B, J) (C, J)
300 (B, M) (B, Y) (M, Y)
400 (B, J) (B, M) (J, M)
500 (C, J) (C, M) (J, M)

Consider the transaction database in the first table above, used in an
earlier example. The second table shows all possible 2-itemsets
for each transaction.
122
Hashing Example
The possible 2-itemsets in the last table are now hashed to a hash
table below. The last column shown in the table below is not
required in the hash table but we have included it for explaining the
technique.

Bit vector   Bucket number   Count   Pairs hashed to this bucket
1            0               5       (C, J) (C, J) (C, J) (B, Y) (M, Y)
0            1               1       (C, M)
0            2               1       (E, J)
0            3               0
0            4               2       (B, C) (B, C)
1            5               3       (B, E) (J, M) (J, M)
1            6               3       (B, J) (B, J) (B, J)
1            7               3       (C, E) (B, M) (B, M)

123
Hash Function Used
- For each pair, a numeric value is obtained by first representing
B by 1, C by 2, E by 3, J by 4, M by 5 and Y by 6. Now each pair can be
represented by a two-digit number, for example (B, E) by 13
and (C, M) by 26.
- The two-digit number is then taken modulo 8 (divide by 8 and use
the remainder). This is the bucket address.
- A count of the number of pairs hashed to each bucket is kept. Those
addresses that have a count above the support value have the
bit vector set to 1, otherwise 0.
- All pairs in rows that have a zero bit are removed.
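This bucket hashing can be sketched in Python as follows (illustrative; item codes and data follow the example above):

from itertools import combinations
from collections import Counter

code = {"Bread": 1, "Cheese": 2, "Eggs": 3, "Juice": 4, "Milk": 5, "Yogurt": 6}
transactions = [
    {"Bread", "Cheese", "Eggs", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]

bucket_count = Counter()
for t in transactions:
    for a, b in combinations(sorted(t, key=code.get), 2):
        bucket = (10 * code[a] + code[b]) % 8      # e.g. (B, E) -> 13 -> bucket 5
        bucket_count[bucket] += 1

min_support_count = 3
bit_vector = [1 if bucket_count[b] >= min_support_count else 0 for b in range(8)]
print(dict(bucket_count), bit_vector)              # bit vector is 1 for buckets 0, 5, 6, 7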

124
Find C2

The major aim of the algorithm is to reduce the size of C2. It is
therefore essential that the hash table is large enough so that
collisions are low. Collisions result in loss of effectiveness of
the hash table. This is what happened in the example, in which
we had collisions in three of the eight rows of the hash table,
which required us to find which of the colliding pairs was frequent.
125
Transaction Reduction
As discussed earlier, any transaction that does not
contain any frequent k-itemsets cannot contain any
frequent (k+1)-itemsets and such a transaction may
be marked or removed.
126
Example
Frequent items (L1) are
A, B, D, M, T. We are
not able to use these to
eliminate any
transactions since all
transactions have at
least one of the items in
L1. The frequent pairs
(C2) are {A, B} and
{B, M}. How can we
reduce transactions
using these?

TID Items bought
001 B, M, T, Y
002 B, M
003 T, S, P
004 A, B, C, D
005 A, B
006 T, Y, E
007 A, B, M
008 B, C, D, T, P
009 D, T, S
010 A, B, M
127
Partitioning
The set of transactions may be divided into a number
of disjoint subsets. Then each partition is searched
for frequent itemsets. These frequent itemsets are
called local frequent itemsets.
128
Partitioning
Phase 1
Divide n transactions into m partitions
Find the frequent itemsets in each partition
Combine all local frequent itemsets to form
candidate itemsets

Phase 2
Find global frequent itemsets (a sketch of both phases follows)
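Here is an illustrative Python sketch of the two phases, using a tiny made-up dataset and a simple local miner (names and data are not from the slides):

from itertools import combinations
from collections import Counter

def local_frequent(transactions, min_fraction):
    counts = Counter()
    for t in transactions:
        for k in range(1, len(t) + 1):
            for s in combinations(sorted(t), k):
                counts[frozenset(s)] += 1
    return {s for s, c in counts.items() if c >= min_fraction * len(transactions)}

transactions = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B"}, {"A", "B", "C"}, {"C"}]
min_fraction = 0.5
partitions = [transactions[:3], transactions[3:]]

# Phase 1: local frequent itemsets from each partition become global candidates
candidates = set().union(*(local_frequent(p, min_fraction) for p in partitions))

# Phase 2: one scan of the full database keeps only the globally frequent ones
global_frequent = {c for c in candidates
                   if sum(1 for t in transactions if c <= t) >= min_fraction * len(transactions)}
print(global_frequent)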

129
Sampling
A random sample (usually large enough to fit in the
main memory) may be obtained from the overall set of
transactions and the sample is searched for frequent
itemsets. These frequent itemsets are called sample
frequent itemsets.

130
Sampling
Not guaranteed to be accurate, but we sacrifice
accuracy for efficiency. A lower support threshold may
be used for the sample to reduce the chance of missing any
frequent itemsets.

The actual frequencies of the sample frequent itemsets
are then obtained.

More than one sample could be used to improve
accuracy.
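An illustrative sketch of the sampling approach (synthetic data; names are not from the slides): mine a random sample at a lowered threshold, then verify the candidates against the full database.

import random
from itertools import combinations
from collections import Counter

random.seed(1)
items = ["Bread", "Cheese", "Juice", "Milk", "Coffee", "Tea"]
transactions = [set(random.sample(items, random.randint(1, 4))) for _ in range(10000)]
sample = random.sample(transactions, 500)            # small enough to fit in main memory

def frequent_pairs(db, min_fraction):
    counts = Counter()
    for t in db:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {p for p, c in counts.items() if c >= min_fraction * len(db)}

min_sup = 0.10
candidates = frequent_pairs(sample, 0.8 * min_sup)   # lowered threshold on the sample
verified = {p for p in candidates                    # actual frequencies on the full data
            if sum(1 for t in transactions if set(p) <= t) >= min_sup * len(transactions)}
print(len(candidates), len(verified))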
Exercise
Given the following list of transactions, do the following:
1) Find all the frequent itemsets (minimum support 40%)
2) Find all the association rules (minimum confidence 70%)
3) For the discovered association rules, calculate the lift
Transaction No. Item 1 Item 2 Item 3 Item 4
100 Beer Diaper Chocolate
101 Milk Chocolate Shampoo
102 Beer Soap Vodka
103 Beer Cheese Wine
104 Milk Diaper Beer Chocolate
