
Association Rules Mining – Apriori Algorithm

The Apriori algorithm is a frequent-itemset algorithm used to mine association rules. It first finds all
frequent itemsets, that is, itemsets whose support (relative frequency) meets a minimum support threshold, and then
generates association rules from them, keeping the rules whose confidence (rule strength) meets a minimum confidence threshold.

https://www.youtube.com/watch?v=43CMKRHdH30

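Use frq for the frequency (the number of transactions containing the given items) and N for the total number of transactions.

Rule: X --> Y

\[
\text{Support} = \frac{frq(X, Y)}{N}
\qquad
\text{Confidence} = \frac{frq(X, Y)}{frq(X)}
\]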
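
The two formulas can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `frq`, `support` and `confidence` are chosen here to mirror the notation, and the data is the transaction list from Example 1 of this handout.

```python
from fractions import Fraction

# Transactions copied from Example 1 of this handout, one per line.
RAW = """Milk Egg Bread Butter
Milk Butter Egg Ketchup
Bread Butter Ketchup
Milk Bread Butter
Bread Butter Cookies
Milk Bread Butter Cookies
Milk Cookies
Milk Bread Butter
Bread Butter Egg Cookies
Milk Butter Bread
Milk Bread Butter
Milk Bread Cookies Ketchup"""
TRANSACTIONS = [set(line.split()) for line in RAW.splitlines()]


def frq(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Support of the rule X --> Y: frq(X, Y) / N."""
    return Fraction(frq(set(x) | set(y), transactions), len(transactions))


def confidence(x, y, transactions):
    """Confidence of the rule X --> Y: frq(X, Y) / frq(X)."""
    return Fraction(frq(set(x) | set(y), transactions), frq(x, transactions))


s = support({"Milk"}, {"Bread"}, TRANSACTIONS)
c = confidence({"Milk"}, {"Bread"}, TRANSACTIONS)
print(f"support = {float(s):.2%}, confidence = {float(c):.2%}")
# prints: support = 58.33%, confidence = 77.78%
```

Exact fractions are used so that a confidence of exactly 60% compares cleanly against a 60% threshold.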

Association Rules Example 1:


Find affinities between products, that is, which products sell together often.
Assume minimum support level is set at 30%; the minimum confidence level is set at 60%.
Transaction List
1 Milk Egg Bread Butter
2 Milk Butter Egg Ketchup
3 Bread Butter Ketchup
4 Milk Bread Butter
5 Bread Butter Cookies
6 Milk Bread Butter Cookies
7 Milk Cookies
8 Milk Bread Butter
9 Bread Butter Egg Cookies
10 Milk Butter Bread
11 Milk Bread Butter
12 Milk Bread Cookies Ketchup

Reference: https://www.youtube.com/watch?v=43CMKRHdH30

Association Rules Example 1 Solution:


Number of transactions: 12 (100%)
Minimum support count: 30% of 12 = 3.6, rounded up to 4 transactions
Minimum confidence: 60% (a ratio of frequencies, not a transaction count)
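
The conversion from a percentage threshold to a whole number of transactions can be sketched as below. The helper name `min_support_count` is hypothetical, and rounding up is an assumption that matches the value 4 used in this example.

```python
def min_support_count(n_transactions: int, min_support_percent: int) -> int:
    """Smallest whole number of transactions that meets the percentage threshold."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-n_transactions * min_support_percent // 100)


print(min_support_count(12, 30))  # 30% of 12 = 3.6, rounded up -> 4
```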

Step 1                Step 2                    Step 3
1-Item Sets   Freq.   2-Item Sets       Freq.   3-Item Sets              Freq.
Milk          9       Milk, Bread       7       Milk, Bread, Butter      6
Bread         10      Milk, Butter      7       Milk, Bread, Cookies     2
Butter        10      Milk, Cookies     3       Bread, Butter, Cookies   3
Egg           3       Bread, Butter     9       Milk, Butter, Cookies    1
Ketchup       3       Bread, Cookies    4
Cookies       5       Butter, Cookies   3

Keeping only the itemsets with frequency >= 4 (the minimum support count) gives the frequent itemsets:

1-Item Sets   Freq.   2-Item Sets       Freq.   3-Item Sets              Freq.
Milk          9       Milk, Bread       7       Milk, Bread, Butter      6
Bread         10      Milk, Butter      7
Butter        10      Bread, Butter     9
Cookies       5       Bread, Cookies    4
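
The three steps above can be reproduced with a short level-wise search. The sketch below is one possible way to write the join and prune steps, not the only one; the function names `count` and `apriori` are illustrative, and the data is the Example 1 transaction list.

```python
from itertools import combinations

# Transactions copied from Example 1 of this handout, one per line.
RAW = """Milk Egg Bread Butter
Milk Butter Egg Ketchup
Bread Butter Ketchup
Milk Bread Butter
Bread Butter Cookies
Milk Bread Butter Cookies
Milk Cookies
Milk Bread Butter
Bread Butter Egg Cookies
Milk Butter Bread
Milk Bread Butter
Milk Bread Cookies Ketchup"""
TRANSACTIONS = [frozenset(line.split()) for line in RAW.splitlines()]


def count(candidates, transactions):
    """Map each candidate itemset to the number of transactions containing it."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def apriori(transactions, min_count):
    """Return a list of dicts; entry k-1 maps each frequent k-itemset to its count."""
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items, transactions).items() if n >= min_count}
    levels, k = [], 1
    while frequent:
        levels.append(frequent)
        k += 1
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(candidates, transactions).items()
                    if n >= min_count}
    return levels


for size, level in enumerate(apriori(TRANSACTIONS, min_count=4), start=1):
    for itemset, n in sorted(level.items(), key=lambda kv: sorted(kv[0])):
        print(size, sorted(itemset), n)
```

With the prune step, {Milk, Bread, Butter} is the only 3-item candidate that gets counted, because each of the other 3-item sets listed in the table contains an infrequent pair ({Milk, Cookies} or {Butter, Cookies}). The final frequent itemsets are the same either way.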
Defining association rules
Frequent itemset I = {Milk, Bread, Butter}
Non-empty proper subsets of I = {{Milk}, {Bread}, {Butter}, {Milk, Bread}, {Milk, Butter}, {Bread, Butter}}

For every non-empty proper subset S of I, the association rule is defined as follows:
S --> (I-S), if Support(I)/Support(S) >= min_confidence

Rule 1: {Milk} --> {Bread, Butter}
Support = 6/12 = 50%
Confidence = Support(Milk, Bread, Butter)/Support(Milk) = (6/12)/(9/12) = 6/9 = 66.67% >= 60%
{S=50%, C=66.67%} => Valid Rule

Rule 2: {Bread} --> {Milk, Butter}
Support = 6/12 = 50%
Confidence = Support(Milk, Bread, Butter)/Support(Bread) = 6/10 = 60% >= 60%
{S=50%, C=60%} => Valid Rule

Rule 3: {Butter} --> {Milk, Bread}
Support = 6/12 = 50%
Confidence = Support(Milk, Bread, Butter)/Support(Butter) = 6/10 = 60% >= 60%
{S=50%, C=60%} => Valid Rule

Rule 4: {Milk, Bread} --> {Butter}
Support = 6/12 = 50%
Confidence = Support(Milk, Bread, Butter)/Support(Milk, Bread) = 6/7 = 85.71% >= 60%
{S=50%, C=85.71%} => Valid Rule

Rule 5: {Milk, Butter} --> {Bread}
Support = 6/12 = 50%
Confidence = Support(Milk, Bread, Butter)/Support(Milk, Butter) = 6/7 = 85.71% >= 60%
{S=50%, C=85.71%} => Valid Rule

Rule 6: {Bread, Butter} --> {Milk}
Support = 6/12 = 50%
Confidence = Support(Milk, Bread, Butter)/Support(Bread, Butter) = 6/9 = 66.67% >= 60%
{S=50%, C=66.67%} => Valid Rule
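
The six rules above can be generated mechanically from the support counts. The following is a sketch under the assumption that the counts from the frequent itemset table are already available; `COUNTS` and `rules_from` are names introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

N = 12  # number of transactions in Example 1
# Support counts taken from the frequent itemset table of Example 1.
COUNTS = {
    frozenset({"Milk"}): 9,
    frozenset({"Bread"}): 10,
    frozenset({"Butter"}): 10,
    frozenset({"Milk", "Bread"}): 7,
    frozenset({"Milk", "Butter"}): 7,
    frozenset({"Bread", "Butter"}): 9,
    frozenset({"Milk", "Bread", "Butter"}): 6,
}


def rules_from(itemset, counts, n, min_conf):
    """List (S, I-S, support, confidence, is_valid) for every non-empty proper subset S."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for subset in combinations(sorted(itemset), size):
            s = frozenset(subset)
            sup = Fraction(counts[itemset], n)
            conf = Fraction(counts[itemset], counts[s])
            rules.append((sorted(s), sorted(itemset - s), sup, conf, conf >= min_conf))
    return rules


for lhs, rhs, sup, conf, ok in rules_from({"Milk", "Bread", "Butter"}, COUNTS, N,
                                          Fraction(60, 100)):
    verdict = "Valid" if ok else "Invalid"
    print(f"{lhs} --> {rhs}: S={float(sup):.2%}, C={float(conf):.2%} => {verdict}")
```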

Association Rules (Numerical) Example 2:
Consider the following transactions. Apply association rule mining to find the association rules
with a minimum support count of 2 and a minimum confidence of 50%.

TID    List of Item IDs


T100 I1 I2 I5
T200 I2 I4
T300 I2 I3
T400 I1 I2 I4
T500 I1 I3
T600 I2 I3
T700 I1 I3
T800 I1 I2 I3 I5
T900 I1 I2 I3

Reference: https://www.youtube.com/watch?v=NT6beZBYbmU

Association Rules (Numerical) Example 2 Solution:
In addition to the transaction list given above, we can say:
Number of transactions: 9 (100%)
Minimum support count: 2 transactions (2/9 = 22.22%)
Minimum confidence: 50% (a ratio of frequencies, not a transaction count)

Step 1                Step 2                Step 3                Step 4
1-Item Sets   Freq.   2-Item Sets   Freq.   3-Item Sets   Freq.   4-Item Sets      Freq.
I1            6       I1, I2        4       I1, I2, I3    2       I1, I2, I3, I5   1
I2            7       I1, I3        4       I1, I2, I4    1
I3            6       I1, I4        1       I1, I2, I5    2
I4            2       I1, I5        2       I1, I3, I4    0
I5            2       I2, I3        4       I1, I3, I5    1
                      I2, I4        2       I1, I4, I5    0
                      I2, I5        2       I2, I3, I4    0
                      I3, I4        0       I2, I3, I5    1
                      I3, I5        1       I3, I4, I5    0
                      I4, I5        0

Keeping only the itemsets with frequency >= 2 (the minimum support count) gives the frequent itemsets:

1-Item Sets   Freq.   2-Item Sets   Freq.   3-Item Sets   Freq.   4-Item Sets      Freq.
I1            6       I1, I2        4       I1, I2, I3    2       Not possible
I2            7       I1, I3        4       I1, I2, I5    2
I3            6       I1, I5        2
I4            2       I2, I3        4
I5            2       I2, I4        2
                      I2, I5        2
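
The candidate tables above list the item combinations of each size without pruning. A brute-force sketch of that enumeration is shown below; it is deliberately not the pruned Apriori search, and the helper names `itemset_counts` and `frequent` are illustrative.

```python
from itertools import combinations

# Transactions copied from Example 2 of this handout.
TRANSACTIONS = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}


def itemset_counts(transactions, k):
    """Count every k-item combination of the item universe (no pruning)."""
    items = sorted(set().union(*transactions.values()))
    return {combo: sum(1 for t in transactions.values() if set(combo) <= t)
            for combo in combinations(items, k)}


def frequent(transactions, k, min_count):
    """Keep only the k-item combinations that reach the minimum support count."""
    return {c: n for c, n in itemset_counts(transactions, k).items() if n >= min_count}


for k in range(1, 5):
    print(k, frequent(TRANSACTIONS, k, min_count=2))
```

Exhaustive enumeration also counts {I2, I4, I5}, which does not appear in the 3-item column above; its frequency is 0, so the frequent itemsets are unaffected.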

We first define association rules for Frequent Item set I = {I1, I2, I3}, then we define association
rules for Frequent Item set I = {I1, I2, I5}

Non-empty proper subsets of the first frequent itemset I = {I1, I2, I3}: {{I1}, {I2}, {I3}, {I1, I2}, {I1, I3}, {I2, I3}}
For every non-empty proper subset S of I, the association rule is defined as follows:
S --> (I-S), if Support(I)/Support(S) >= min_confidence
Rule 1: {I1} --> {I2, I3}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I1) = (2/9)/(6/9) = 2/6 = 33.33% < 50%
{S=22.22%, C=33.33%} => Invalid Rule
Rule 2: {I2} --> {I1, I3}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I2) = 2/7 = 28.57% < 50%
{S=22.22%, C=28.57%} => Invalid Rule
Rule 3: {I3} --> {I1, I2}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I3) = 2/6 = 33.33% < 50%
{S=22.22%, C=33.33%} => Invalid Rule
Rule 4: {I1, I2} --> {I3}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I1, I2) = 2/4 = 50% >= 50%
{S=22.22%, C=50%} => Valid Rule
Rule 5: {I1, I3} --> {I2}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I1, I3) = 2/4 = 50% >= 50%
{S=22.22%, C=50%} => Valid Rule
Rule 6: {I2, I3} --> {I1}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I2, I3) = 2/4 = 50% >= 50%
{S=22.22%, C=50%} => Valid Rule

Non-empty proper subsets of the second frequent itemset I = {I1, I2, I5}: {{I1}, {I2}, {I5}, {I1, I2}, {I1, I5}, {I2, I5}}

Rule 1: {I1} --> {I2, I5}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I1) = 2/6 = 33.33% < 50%
{S=22.22%, C=33.33%} => Invalid Rule
Rule 2: {I2} --> {I1, I5}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I2) = 2/7 = 28.57% < 50%
{S=22.22%, C=28.57%} => Invalid Rule
Rule 3: {I5} --> {I1, I2}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I5) = 2/2 = 100% >= 50%
{S=22.22%, C=100%} => Valid Rule
Rule 4: {I1, I2} --> {I5}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I1, I2) = 2/4 = 50% >= 50%
{S=22.22%, C=50%} => Valid Rule
Rule 5: {I1, I5} --> {I2}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I1, I5) = 2/2 = 100% >= 50%
{S=22.22%, C=100%} => Valid Rule
Rule 6: {I2, I5} --> {I1}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I2, I5) = 2/2 = 100% >= 50%
{S=22.22%, C=100%} => Valid Rule
To summarize, the 7 valid rules from the two frequent itemsets are:

From the first frequent itemset I = {I1, I2, I3}:
Rule 4: {I1, I2} --> {I3}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I1, I2) = 2/4 = 50% >= 50%
{S=22.22%, C=50%} => Valid Rule
Rule 5: {I1, I3} --> {I2}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I1, I3) = 2/4 = 50% >= 50%
{S=22.22%, C=50%} => Valid Rule
Rule 6: {I2, I3} --> {I1}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I3)/Support(I2, I3) = 2/4 = 50% >= 50%
{S=22.22%, C=50%} => Valid Rule

From the second frequent itemset I = {I1, I2, I5}:
Rule 3: {I5} --> {I1, I2}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I5) = 2/2 = 100% >= 50%
{S=22.22%, C=100%} => Valid Rule
Rule 4: {I1, I2} --> {I5}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I1, I2) = 2/4 = 50% >= 50%
{S=22.22%, C=50%} => Valid Rule
Rule 5: {I1, I5} --> {I2}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I1, I5) = 2/2 = 100% >= 50%
{S=22.22%, C=100%} => Valid Rule
Rule 6: {I2, I5} --> {I1}
Support = 2/9 = 22.22%, Confidence = Support(I1, I2, I5)/Support(I2, I5) = 2/2 = 100% >= 50%
{S=22.22%, C=100%} => Valid Rule
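
As a cross-check of this summary, the valid rules can be recomputed directly from the Example 2 transactions. The sketch below counts frequencies from the raw data instead of reading them from the tables; `frq` and `valid_rules` are illustrative names, and exact fractions are an implementation choice made here so that a confidence of exactly 50% passes the threshold.

```python
from fractions import Fraction
from itertools import combinations

# Transactions T100..T900 copied from Example 2 of this handout, one per line.
RAW = """I1 I2 I5
I2 I4
I2 I3
I1 I2 I4
I1 I3
I2 I3
I1 I3
I1 I2 I3 I5
I1 I2 I3"""
TRANSACTIONS = [frozenset(line.split()) for line in RAW.splitlines()]


def frq(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in TRANSACTIONS if frozenset(itemset) <= t)


def valid_rules(itemset, min_conf):
    """Return (S, I-S, confidence) for each rule S --> (I-S) meeting min_conf."""
    itemset = frozenset(itemset)
    out = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            conf = Fraction(frq(itemset), frq(lhs))
            if conf >= min_conf:
                out.append((lhs, tuple(sorted(itemset - set(lhs))), conf))
    return out


MIN_CONF = Fraction(1, 2)
rules = valid_rules({"I1", "I2", "I3"}, MIN_CONF) + valid_rules({"I1", "I2", "I5"}, MIN_CONF)
for lhs, rhs, conf in rules:
    print(f"{set(lhs)} --> {set(rhs)}: C={float(conf):.0%}")
print(len(rules), "valid rules")
```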
