
Association Mining

Apriori Algorithm

Introduction:

The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for association rule mining. The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties. It is an iterative, level-wise search in which the frequent k-itemsets are used to find the (k+1)-itemsets.

Input Transactions → Apriori Algorithm → Association Rules

Applications:

1. Market Basket Analysis


2. Recommendations (suggestions)
3. Trend Analysis

Steps of the Apriori Algorithm

The Apriori algorithm is a sequence of steps followed to find the most frequent itemsets in a given database. This data mining technique applies the join and prune (trimming) steps iteratively until no larger frequent itemset can be produced. A minimum support threshold is either given in the problem or assumed by the user.

#1) In the first iteration (K=1) of the algorithm, each individual item is taken as a candidate 1-itemset. The algorithm counts the occurrences of each item.

#2) Let there be some minimum support, min_sup (e.g. 50%). The set of 1-itemsets whose occurrence satisfies min_sup is determined: only the candidates whose support is greater than or equal to min_sup are carried forward to the next iteration, and the others are pruned.

#3) Next, the frequent 2-itemsets satisfying min_sup are discovered. In the join step, the candidate 2-itemsets are generated by joining the frequent 1-itemsets with themselves to form groups of two.

#4) The candidate 2-itemsets are pruned using the min_sup threshold, so the table now contains only the 2-itemsets that satisfy min_sup.

#5) The next iteration forms 3-itemsets using the join and prune steps. This iteration uses the antimonotone property: every 2-itemset subset of a candidate 3-itemset must itself satisfy min_sup. If all 2-itemset subsets are frequent, the superset may be frequent; otherwise the candidate is pruned.

#6) The next step forms 4-itemsets by joining the 3-itemsets with themselves and pruning any candidate whose subsets do not meet the min_sup criterion. The algorithm stops when no further frequent itemsets can be generated (the frequent itemset becomes empty).
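Below is a minimal Python sketch of the join and antimonotone prune steps described above. Itemsets are represented as frozensets; the function names join_step and prune_step are illustrative only, not part of any standard library.

from itertools import combinations

def join_step(prev_frequent, k):
    """Join L(k-1) with itself: unions of (k-1)-itemsets that give k-itemsets."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k:
                candidates.add(union)
    return candidates

def prune_step(candidates, prev_frequent, k):
    """Antimonotone property: keep a candidate only if all its (k-1)-subsets are frequent."""
    pruned = set()
    for c in candidates:
        if all(frozenset(sub) in prev_frequent for sub in combinations(c, k - 1)):
            pruned.add(c)
    return pruned

# Example: frequent 2-itemsets -> candidate 3-itemsets
L2 = {frozenset({"Bread", "Juice"}), frozenset({"Cheese", "Juice"})}
C3 = prune_step(join_step(L2, 3), L2, 3)
print(C3)   # empty here, because the subset {Bread, Cheese} is not frequent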
Important Terms and Formulas

Frequency(x) = Σ (i = 1 to N) occurrences of x in transaction i

Support Count(x) = Frequency(x) over the N transactions

Support(x) = Frequency(x) / N

Confidence(A => B) = Support(A, B) / Support(A)

Lift(A, B) = Support(A, B) / ( Support(A) · Support(B) )

Candidate Set (C) – the candidate itemsets given as input to the next iteration

Frequent Item Set (L) – itemsets whose support is greater than or equal to the minimum support
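These measures translate directly into code. The following Python sketch (the helper names support, confidence and lift are illustrative) computes the three measures over a list of transactions; the tiny dataset at the end is hypothetical and only shows the calling convention.

def support(itemset, transactions):
    """Support(X) = number of transactions containing X divided by N."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(A, B, transactions):
    """Confidence(A => B) = Support(A, B) / Support(A)."""
    return support(set(A) | set(B), transactions) / support(A, transactions)

def lift(A, B, transactions):
    """Lift(A, B) = Support(A, B) / (Support(A) * Support(B))."""
    return support(set(A) | set(B), transactions) / (
        support(A, transactions) * support(B, transactions))

# Hypothetical mini-dataset, purely to show usage
transactions = [{"Bread", "Juice"}, {"Bread", "Milk"}, {"Bread", "Juice", "Milk"}]
print(support({"Bread"}, transactions))               # 1.0
print(confidence({"Bread"}, {"Juice"}, transactions)) # 0.666...
print(lift({"Bread"}, {"Juice"}, transactions))       # 1.0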
Example

For the following transaction dataset, generate the association rules using the Apriori algorithm, given that the minimum support is 0.5 (50%) and the acceptable confidence is 0.75 (75%).

Transaction ID    Items
1.                Bread, Cheese, Egg, Juice
2.                Bread, Cheese, Juice
3.                Bread, Milk, Yogurt
4.                Bread, Juice, Milk
5.                Cheese, Juice, Milk

Solution

Step 1: K=1

Create a table containing the support count of each item present in the dataset – called C1 (the candidate set).

Item Support Count Support %


Bread 4 4/5 =80%
Cheese 3 3/5=60%
Egg 1 1/5=20%
Juice 4 4/5=80%
Milk 3 3/5=60%
Yogurt 1 1/5=20%

Pruning

Compare each candidate item's support with the minimum support (here min_support = 50%); if an item's support is less than min_support, remove that item. This gives us the frequent itemset L1.

Item Support Count Support %


Bread 4 4/5 =80%
Cheese 3 3/5=60%
Juice 4 4/5=80%
Milk 3 3/5=60%

This frequent itemset now becomes the candidate set for the next iteration.
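As an illustration, this first pass can be sketched in Python (assuming the five transactions above are stored as sets; collections.Counter tallies the support counts that form C1, and a comprehension keeps only the items meeting min_support, giving L1):

from collections import Counter

transactions = [
    {"Bread", "Cheese", "Egg", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
min_support = 0.5
N = len(transactions)

# C1: support count of every individual item
C1 = Counter(item for t in transactions for item in t)
# L1: keep items whose support >= min_support
L1 = {item: count for item, count in C1.items() if count / N >= min_support}
print(L1)   # e.g. {'Bread': 4, 'Cheese': 3, 'Juice': 4, 'Milk': 3}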
Step 2: K=2

Grouping and Pruning

Generate candidate 2-itemsets (C2) by grouping the items of L1 in pairs, then compare each candidate's support with the minimum support (here min_support = 50%); if a candidate's support is less than min_support, remove it. This gives us the frequent itemset L2.

Grouping

Item Support Count Support %


Bread, Cheese 2 2/5 =40%
Bread, Juice 3 3/5=60%
Bread, Milk 2 2/5=40%
Cheese, Juice 3 3/5=60%
Cheese, Milk 1 1/5=20%
Juice, Milk 2 2/5=40%
Pruning

Item Support Count Support %


Bread, Juice 3 3/5=60%
Cheese, Juice 3 3/5=60%

Step 3: K=3

Grouping and Pruning

Generate candidate 3-itemsets (C3) by grouping the frequent 2-itemsets, then compare each candidate's support with the minimum support (here min_support = 50%); candidates whose support is less than min_support are removed. This gives us the frequent itemset L3.

Grouping: Making the 3-item candidates

Item Support Count Support %


Bread, Juice, Cheese 2 2/5=40%

The frequent 3-itemset is a null set (empty set), so the algorithm stops here and the frequent 2-itemsets are used to generate the rules.
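A sketch of the K = 2 and K = 3 passes on the same data follows; for brevity the candidates are formed from combinations of the L1 items rather than a formal join of L(k-1) with itself, and the transaction list is repeated so the fragment runs on its own.

from itertools import combinations

transactions = [
    {"Bread", "Cheese", "Egg", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
N, min_support = len(transactions), 0.5
L1 = {"Bread", "Cheese", "Juice", "Milk"}          # from the K = 1 step

def frequent_k_itemsets(items, k):
    """Group the items into k-itemsets and keep those meeting min_support."""
    result = {}
    for candidate in combinations(sorted(items), k):
        count = sum(1 for t in transactions if set(candidate) <= t)
        if count / N >= min_support:
            result[candidate] = count
    return result

L2 = frequent_k_itemsets(L1, 2)
print(L2)   # {('Bread', 'Juice'): 3, ('Cheese', 'Juice'): 3}

L3 = frequent_k_itemsets(L1, 3)
print(L3)   # {} -- the frequent 3-itemset is empty, so the algorithm stops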


Computing Confidence and Lift

Rule 1: Bread => Juice

Confidence = Support(Bread, Juice) / Support(Bread) = (3/5) / (4/5) = 3/4 = 75%

Lift(Bread, Juice) = Support(Bread, Juice) / ( Support(Bread) · Support(Juice) )
                   = (3/5) / ((4/5) · (4/5))
                   = 0.6 / 0.64
                   = 0.9375

Rule 2: Juice => Bread

Confidence = Support(Juice, Bread) / Support(Juice) = (3/5) / (4/5) = 3/4 = 75%

Lift(Juice, Bread) = Support(Juice, Bread) / ( Support(Bread) · Support(Juice) )
                   = (3/5) / ((4/5) · (4/5))
                   = 0.6 / 0.64
                   = 0.9375

Rule 3: Cheese => Juice

Confidence = Support(Cheese, Juice) / Support(Cheese) = (3/5) / (3/5) = 1 = 100%

Lift(Cheese, Juice) = Support(Cheese, Juice) / ( Support(Cheese) · Support(Juice) )
                    = (3/5) / ((3/5) · (4/5))
                    = 0.6 / 0.48
                    = 1.25

The lift is a value between 0 and infinity. A lift value greater than 1 indicates that the rule body and the rule head appear together more often than expected, meaning that the occurrence of the rule body has a positive effect on the occurrence of the rule head.

Rule 4: Juice => Cheese

Confidence = Support(Juice, Cheese) / Support(Juice) = (3/5) / (4/5) = 75%

Lift(Juice, Cheese) = Support(Juice, Cheese) / ( Support(Cheese) · Support(Juice) )
                    = (3/5) / ((3/5) · (4/5))
                    = 0.6 / 0.48
                    = 1.25

All four rules have confidence greater than or equal to the acceptable confidence of 75%, so all of them are accepted.
Algorithm

1: Find all large 1-itemsets
2: For (k = 2; while Lk-1 is non-empty; k++)
3: {  Ck = apriori-gen(Lk-1)
4:    For each c in Ck, initialise c.count to zero
5:    For all records r in the DB
6:       { Cr = subset(Ck, r); For each c in Cr, c.count++ }
7:    Set Lk := all c in Ck whose count >= minsup
8: }  /* end -- return all of the Lk sets */

MATLAB

For the MATLAB implementation, the items are encoded with numeric IDs:

Item      ID Number
Bread     1
Cheese    2
Egg       3
Juice     4
Milk      5
Yogurt    6
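The MATLAB listing that accompanied this pseudocode is not reproduced here; as an illustrative stand-in, the following Python sketch implements the same level-wise loop, using the numeric item IDs from the table above (the names apriori_gen and minsup follow the pseudocode; everything else is assumed for the example).

from itertools import combinations

# Transactions encoded with the item IDs above: 1=Bread, 2=Cheese, 3=Egg, 4=Juice, 5=Milk, 6=Yogurt
DB = [{1, 2, 3, 4}, {1, 2, 4}, {1, 5, 6}, {1, 4, 5}, {2, 4, 5}]
minsup = 3                      # absolute count equivalent of 50% support over 5 transactions

def apriori_gen(prev_L, k):
    """Join L(k-1) with itself, then prune candidates having an infrequent (k-1)-subset."""
    joined = {a | b for a in prev_L for b in prev_L if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev_L for s in combinations(c, k - 1))}

# Line 1: find all large 1-itemsets
counts = {}
for r in DB:
    for item in r:
        key = frozenset({item})
        counts[key] = counts.get(key, 0) + 1
L = {1: {c for c, n in counts.items() if n >= minsup}}

# Lines 2-8: level-wise loop
k = 2
while L[k - 1]:
    Ck = apriori_gen(L[k - 1], k)                          # line 3
    count = {c: 0 for c in Ck}                             # line 4
    for r in DB:                                           # lines 5-6: count candidates contained in r
        for c in Ck:
            if c <= r:
                count[c] += 1
    L[k] = {c for c, n in count.items() if n >= minsup}    # line 7
    k += 1

print({level: sets for level, sets in L.items() if sets})
# e.g. {1: {frozenset({1}), frozenset({2}), frozenset({4}), frozenset({5})},
#       2: {frozenset({1, 4}), frozenset({2, 4})}}   i.e. {Bread, Juice} and {Cheese, Juice}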


K-Means Clustering

C1 = {1, 4, …
C2 = {2, 3, …
Updated centroid

MATLAB
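The worked example and MATLAB listing for this section are truncated in these notes; as an illustrative stand-in, here is a minimal Python sketch of the standard k-means loop (assign each point to the nearest centroid, then move each centroid to the mean of its cluster), using assumed 1-D data.

import random

def kmeans(points, k, iterations=100):
    """Plain k-means on 1-D data: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    centroids = random.sample(points, k)          # initial centroids picked from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                          # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        updated = [sum(c) / len(c) if c else centroids[i]   # update step
                   for i, c in enumerate(clusters)]
        if updated == centroids:                  # centroids stopped moving -> converged
            break
        centroids = updated
    return centroids, clusters

# Assumed illustrative data (not from the notes)
data = [1, 2, 3, 4, 10, 11, 12, 25]
centroids, clusters = kmeans(data, k=2)
print("Updated centroids:", centroids)
print("C1 =", clusters[0], " C2 =", clusters[1])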
Naïve Bayes Algorithm (Classification)

Example:
Consider the given dataset and apply the naïve Bayes algorithm to predict the fruit if it has the following properties:
Fruit X = {Yellow, Sweet, Long}

Fruit Yellow Sweet Long Total


Mango 350 450 0 800
Banana 400 300 350 1050
Others 50 100 50 200
Total 800 850 400 2050

Solution:
First compare with Mango:
P(X|Mango) = P(Y|M) · P(S|M) · P(L|M)
According to Bayes' theorem,

P(Y|M) = P(M|Y) · P(Y) / P(M)
       = (350/800)(800/2050) / (800/2050)
       = 0.4375

P(S|M) = P(M|S) · P(S) / P(M)
       = (450/850)(850/2050) / (800/2050)
       = 0.5625

P(L|M) = P(M|L) · P(L) / P(M)
       = (0/400)(400/2050) / (800/2050)
       = 0

P(X|Mango) = P(Y|M) · P(S|M) · P(L|M)
           = (0.4375) · (0.5625) · (0)

P(X|Mango) = 0
Now compare with Banana:
P(X|Banana) = P(Y|B) · P(S|B) · P(L|B)
According to Bayes' theorem,

P(Y|B) = P(B|Y) · P(Y) / P(B)
       = (400/800)(800/2050) / (1050/2050)
       = 0.381

P(S|B) = P(B|S) · P(S) / P(B)
       = (300/850)(850/2050) / (1050/2050)
       = 0.2857

P(L|B) = P(B|L) · P(L) / P(B)
       = (350/400)(400/2050) / (1050/2050)
       = 0.333

P(X|Banana) = P(Y|B) · P(S|B) · P(L|B)
            = (0.381) · (0.2857) · (0.333)
            = 0.036

Now compare with Others:
P(X|Others) = P(Y|O) · P(S|O) · P(L|O)
According to Bayes' theorem,

P(Y|O) = P(O|Y) · P(Y) / P(O)
       = (50/800)(800/2050) / (200/2050)
       = 0.25

P(S|O) = P(O|S) · P(S) / P(O)
       = (100/850)(850/2050) / (200/2050)
       = 0.5

P(L|O) = P(O|L) · P(L) / P(O)
       = (50/400)(400/2050) / (200/2050)
       = 0.25

P(X|Others) = P(Y|O) · P(S|O) · P(L|O)
            = (0.25) · (0.5) · (0.25)
            = 0.03125

Since P(X|Banana) = 0.036 is the largest of the three likelihoods, the fruit X = {Yellow, Sweet, Long} is predicted to be a Banana.
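The three likelihood calculations can be reproduced with a short Python sketch, assuming, as the working above does, that each conditional probability P(feature | class) is estimated as count(class, feature) / total(class).

# Count table from the notes: rows are classes (fruits), columns are features
counts = {
    "Mango":  {"Yellow": 350, "Sweet": 450, "Long": 0,   "Total": 800},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 1050},
    "Others": {"Yellow": 50,  "Sweet": 100, "Long": 50,  "Total": 200},
}
X = ["Yellow", "Sweet", "Long"]          # properties of the unknown fruit

def likelihood(fruit):
    """P(X | fruit) = product over the features of P(feature | fruit)."""
    p = 1.0
    for feature in X:
        p *= counts[fruit][feature] / counts[fruit]["Total"]
    return p

for fruit in counts:
    print(fruit, round(likelihood(fruit), 5))
# Mango 0.0, Banana 0.03628, Others 0.03125 -> X is classified as a Banana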
KNN
