
Association Mining

Apriori Algorithm

Introduction:

The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for association rule mining. The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties. It is an iterative, level-wise search in which the frequent k-itemsets are used to find the (k+1)-itemsets.

Input Transactions → Apriori Algorithm → Association Rules

Applications:

1. Market Basket Analysis


2. Recommendations (suggestions)
3. Trend Analysis

Steps of the Apriori Algorithm

The Apriori algorithm is a sequence of steps followed to find the most frequent itemsets in a given database. This data mining technique applies the join and prune (trimming) steps iteratively until no larger frequent itemset can be produced. A minimum support threshold is either given in the problem or assumed by the user.

#1) In the first iteration (K=1) of the algorithm, each individual item is taken as a candidate 1-itemset. The algorithm counts the occurrences of each item.

#2) Let there be some minimum support, min_sup (e.g. 50%). The set of 1-itemsets whose occurrence satisfies min_sup is determined: only the candidates whose support is greater than or equal to min_sup are carried forward to the next iteration, and the others are pruned.

#3) Next, the frequent 2-itemsets satisfying min_sup are discovered. In the join step, the candidate 2-itemsets are generated by joining the frequent 1-itemsets with themselves to form groups of two.

#4) The candidate 2-itemsets are pruned using the min_sup threshold, so the table now contains only the 2-itemsets that satisfy min_sup.

#5) The next iteration forms 3-itemsets using the join and prune steps. This iteration uses the antimonotone property: every 2-itemset subset of a candidate 3-itemset must itself satisfy min_sup. If all 2-itemset subsets are frequent, the superset may be frequent; otherwise the candidate is pruned.

#6) The next step forms 4-itemsets by joining the 3-itemsets with themselves and pruning any candidate whose subsets do not meet the min_sup criterion. The algorithm stops when no further frequent itemsets can be generated (the frequent itemset becomes empty).
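Below is a minimal Python sketch of the join and antimonotone prune steps described above. Itemsets are represented as frozensets; the function names join_step and prune_step are illustrative only, not part of any standard library.

from itertools import combinations

def join_step(prev_frequent, k):
    """Join L(k-1) with itself: unions of (k-1)-itemsets that give k-itemsets."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k:
                candidates.add(union)
    return candidates

def prune_step(candidates, prev_frequent, k):
    """Antimonotone property: keep a candidate only if all its (k-1)-subsets are frequent."""
    pruned = set()
    for c in candidates:
        if all(frozenset(sub) in prev_frequent for sub in combinations(c, k - 1)):
            pruned.add(c)
    return pruned

# Example: frequent 2-itemsets -> candidate 3-itemsets
L2 = {frozenset({"Bread", "Juice"}), frozenset({"Cheese", "Juice"})}
C3 = prune_step(join_step(L2, 3), L2, 3)
print(C3)   # empty here, because the subset {Bread, Cheese} is not frequent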
Important Terms and Formulas

Frequency(x) = Σ (i = 1 to N) occurrences of x in transaction i

Support Count(x) = Frequency(x) over the N transactions

Support(x) = Frequency(x) / N

Confidence(A => B) = Support(A, B) / Support(A)

Lift(A, B) = Support(A, B) / ( Support(A) · Support(B) )

Candidate Set (C) – the candidate itemsets given as input to the next iteration

Frequent Item Set (L) – itemsets whose support is greater than or equal to the minimum support
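These measures translate directly into code. The following Python sketch (the helper names support, confidence and lift are illustrative) computes the three measures over a list of transactions; the tiny dataset at the end is hypothetical and only shows the calling convention.

def support(itemset, transactions):
    """Support(X) = number of transactions containing X divided by N."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(A, B, transactions):
    """Confidence(A => B) = Support(A, B) / Support(A)."""
    return support(set(A) | set(B), transactions) / support(A, transactions)

def lift(A, B, transactions):
    """Lift(A, B) = Support(A, B) / (Support(A) * Support(B))."""
    return support(set(A) | set(B), transactions) / (
        support(A, transactions) * support(B, transactions))

# Hypothetical mini-dataset, purely to show usage
transactions = [{"Bread", "Juice"}, {"Bread", "Milk"}, {"Bread", "Juice", "Milk"}]
print(support({"Bread"}, transactions))               # 1.0
print(confidence({"Bread"}, {"Juice"}, transactions)) # 0.666...
print(lift({"Bread"}, {"Juice"}, transactions))       # 1.0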
Example

For the following transaction dataset, generate the association rules using the Apriori algorithm, given that the minimum support is 0.5 (50%) and the acceptable confidence is 0.75 (75%).

Transaction ID    Items
1.                Bread, Cheese, Egg, Juice
2.                Bread, Cheese, Juice
3.                Bread, Milk, Yogurt
4.                Bread, Juice, Milk
5.                Cheese, Juice, Milk

Solution

Step 1: K=1

Create a table containing the support count of each item present in the dataset – called C1 (the candidate set).

Item Support Count Support %


Bread 4 4/5 =80%
Cheese 3 3/5=60%
Egg 1 1/5=20%
Juice 4 4/5=80%
Milk 3 3/5=60%
Yogurt 1 1/5=20%

Pruning

Compare each candidate item's support with the minimum support (here min_support = 50%); if an item's support is less than min_support, remove that item. This gives us the frequent itemset L1.

Item Support Count Support %


Bread 4 4/5 =80%
Cheese 3 3/5=60%
Juice 4 4/5=80%
Milk 3 3/5=60%

This frequent itemset now becomes the candidate set for the next iteration.
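As an illustration, this first pass can be sketched in Python (assuming the five transactions above are stored as sets; collections.Counter tallies the support counts that form C1, and a comprehension keeps only the items meeting min_support, giving L1):

from collections import Counter

transactions = [
    {"Bread", "Cheese", "Egg", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
min_support = 0.5
N = len(transactions)

# C1: support count of every individual item
C1 = Counter(item for t in transactions for item in t)
# L1: keep items whose support >= min_support
L1 = {item: count for item, count in C1.items() if count / N >= min_support}
print(L1)   # e.g. {'Bread': 4, 'Cheese': 3, 'Juice': 4, 'Milk': 3}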
Step 2: K=2

Grouping and Pruning

Generate candidate 2-itemsets (C2) by grouping the items of L1 in pairs, then compare each candidate's support with the minimum support (here min_support = 50%); if a candidate's support is less than min_support, remove it. This gives us the frequent itemset L2.

Grouping

Item Support Count Support %


Bread, Cheese 2 2/5 =40%
Bread, Juice 3 3/5=60%
Bread, Milk 2 2/5=40%
Cheese, Juice 3 3/5=60%
Cheese, Milk 1 1/5=20%
Juice, Milk 2 2/5=40%
Pruning

Item Support Count Support %


Bread, Juice 3 3/5=60%
Cheese, Juice 3 3/5=60%

Step 3: K=3

Grouping and Pruning

Generate candidate 3-itemsets (C3) by grouping the frequent 2-itemsets, then compare each candidate's support with the minimum support (here min_support = 50%); candidates whose support is less than min_support are removed. This gives us the frequent itemset L3.

Grouping: Making the 3-item candidates

Item Support Count Support %


Bread, Juice, Cheese 2 2/5=40%

The frequent 3-itemset is a null set (empty set), so the algorithm stops here and the frequent 2-itemsets are used to generate the rules.
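A sketch of the K = 2 and K = 3 passes on the same data follows; for brevity the candidates are formed from combinations of the L1 items rather than a formal join of L(k-1) with itself, and the transaction list is repeated so the fragment runs on its own.

from itertools import combinations

transactions = [
    {"Bread", "Cheese", "Egg", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
N, min_support = len(transactions), 0.5
L1 = {"Bread", "Cheese", "Juice", "Milk"}          # from the K = 1 step

def frequent_k_itemsets(items, k):
    """Group the items into k-itemsets and keep those meeting min_support."""
    result = {}
    for candidate in combinations(sorted(items), k):
        count = sum(1 for t in transactions if set(candidate) <= t)
        if count / N >= min_support:
            result[candidate] = count
    return result

L2 = frequent_k_itemsets(L1, 2)
print(L2)   # {('Bread', 'Juice'): 3, ('Cheese', 'Juice'): 3}

L3 = frequent_k_itemsets(L1, 3)
print(L3)   # {} -- the frequent 3-itemset is empty, so the algorithm stops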


Computing Confidence and Lift

Rule 1: Bread => Juice

Confidence = Support(Bread, Juice) / Support(Bread) = (3/5) / (4/5) = 3/4 = 75%

Lift(Bread, Juice) = Support(Bread, Juice) / ( Support(Bread) · Support(Juice) )
                   = (3/5) / ((4/5) · (4/5))
                   = 0.6 / 0.64
                   = 0.9375

Rule 2: Juice => Bread

Confidence = Support(Juice, Bread) / Support(Juice) = (3/5) / (4/5) = 3/4 = 75%

Lift(Juice, Bread) = Support(Juice, Bread) / ( Support(Bread) · Support(Juice) )
                   = (3/5) / ((4/5) · (4/5))
                   = 0.6 / 0.64
                   = 0.9375

Rule 3: Cheese => Juice

Confidence = Support(Cheese, Juice) / Support(Cheese) = (3/5) / (3/5) = 1 = 100%

Lift(Cheese, Juice) = Support(Cheese, Juice) / ( Support(Cheese) · Support(Juice) )
                    = (3/5) / ((3/5) · (4/5))
                    = 0.6 / 0.48
                    = 1.25

The lift is a value between 0 and infinity. A lift value greater than 1 indicates that the rule body and the rule head appear together more often than expected, meaning that the occurrence of the rule body has a positive effect on the occurrence of the rule head.

Rule 4: Juice => Cheese

Confidence = Support(Juice, Cheese) / Support(Juice) = (3/5) / (4/5) = 75%

Lift(Juice, Cheese) = Support(Juice, Cheese) / ( Support(Cheese) · Support(Juice) )
                    = (3/5) / ((3/5) · (4/5))
                    = 0.6 / 0.48
                    = 1.25

All four rules have confidence greater than or equal to the acceptable confidence of 75%, so all of them are accepted.
Algorithm

1: Find all large 1-itemsets
2: For (k = 2; while Lk-1 is non-empty; k++)
3: {  Ck = apriori-gen(Lk-1)
4:    For each c in Ck, initialise c.count to zero
5:    For all records r in the DB
6:       { Cr = subset(Ck, r); For each c in Cr, c.count++ }
7:    Set Lk := all c in Ck whose count >= minsup
8: }  /* end -- return all of the Lk sets */

MATLAB

For the MATLAB implementation, the items are encoded with numeric IDs:

Item      ID Number
Bread     1
Cheese    2
Egg       3
Juice     4
Milk      5
Yogurt    6
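The MATLAB listing that accompanied this pseudocode is not reproduced here; as an illustrative stand-in, the following Python sketch implements the same level-wise loop, using the numeric item IDs from the table above (the names apriori_gen and minsup follow the pseudocode; everything else is assumed for the example).

from itertools import combinations

# Transactions encoded with the item IDs above: 1=Bread, 2=Cheese, 3=Egg, 4=Juice, 5=Milk, 6=Yogurt
DB = [{1, 2, 3, 4}, {1, 2, 4}, {1, 5, 6}, {1, 4, 5}, {2, 4, 5}]
minsup = 3                      # absolute count equivalent of 50% support over 5 transactions

def apriori_gen(prev_L, k):
    """Join L(k-1) with itself, then prune candidates having an infrequent (k-1)-subset."""
    joined = {a | b for a in prev_L for b in prev_L if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev_L for s in combinations(c, k - 1))}

# Line 1: find all large 1-itemsets
counts = {}
for r in DB:
    for item in r:
        key = frozenset({item})
        counts[key] = counts.get(key, 0) + 1
L = {1: {c for c, n in counts.items() if n >= minsup}}

# Lines 2-8: level-wise loop
k = 2
while L[k - 1]:
    Ck = apriori_gen(L[k - 1], k)                          # line 3
    count = {c: 0 for c in Ck}                             # line 4
    for r in DB:                                           # lines 5-6: count candidates contained in r
        for c in Ck:
            if c <= r:
                count[c] += 1
    L[k] = {c for c, n in count.items() if n >= minsup}    # line 7
    k += 1

print({level: sets for level, sets in L.items() if sets})
# e.g. {1: {frozenset({1}), frozenset({2}), frozenset({4}), frozenset({5})},
#       2: {frozenset({1, 4}), frozenset({2, 4})}}   i.e. {Bread, Juice} and {Cheese, Juice}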


K-Means Clustering

C1 = {1, 4, …
C2 = {2, 3, …
Updated centroid

MATLAB
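The worked example and MATLAB listing for this section are truncated in these notes; as an illustrative stand-in, here is a minimal Python sketch of the standard k-means loop (assign each point to the nearest centroid, then move each centroid to the mean of its cluster), using assumed 1-D data.

import random

def kmeans(points, k, iterations=100):
    """Plain k-means on 1-D data: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    centroids = random.sample(points, k)          # initial centroids picked from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                          # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        updated = [sum(c) / len(c) if c else centroids[i]   # update step
                   for i, c in enumerate(clusters)]
        if updated == centroids:                  # centroids stopped moving -> converged
            break
        centroids = updated
    return centroids, clusters

# Assumed illustrative data (not from the notes)
data = [1, 2, 3, 4, 10, 11, 12, 25]
centroids, clusters = kmeans(data, k=2)
print("Updated centroids:", centroids)
print("C1 =", clusters[0], " C2 =", clusters[1])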
Naïve Bayes Algorithm (Classification)

Example:
Consider the given dataset and apply the naïve Bayes algorithm to predict the fruit if it has the following properties:
Fruit X = {Yellow, Sweet, Long}

Fruit Yellow Sweet Long Total


Mango 350 450 0 800
Banana 400 300 350 1050
Others 50 100 50 200
Total 800 850 400 2050

Solution:
First compare with Mango:
P(X|Mango) = P(Y|M) · P(S|M) · P(L|M)
According to Bayes' theorem,

P(Y|M) = P(M|Y) · P(Y) / P(M)
       = (350/800)(800/2050) / (800/2050)
       = 0.4375

P(S|M) = P(M|S) · P(S) / P(M)
       = (450/850)(850/2050) / (800/2050)
       = 0.5625

P(L|M) = P(M|L) · P(L) / P(M)
       = (0/400)(400/2050) / (800/2050)
       = 0

P(X|Mango) = P(Y|M) · P(S|M) · P(L|M)
           = (0.4375) · (0.5625) · (0)

P(X|Mango) = 0
Now compare with Banana:
P(X|Banana) = P(Y|B) · P(S|B) · P(L|B)
According to Bayes' theorem,

P(Y|B) = P(B|Y) · P(Y) / P(B)
       = (400/800)(800/2050) / (1050/2050)
       = 0.381

P(S|B) = P(B|S) · P(S) / P(B)
       = (300/850)(850/2050) / (1050/2050)
       = 0.2857

P(L|B) = P(B|L) · P(L) / P(B)
       = (350/400)(400/2050) / (1050/2050)
       = 0.333

P(X|Banana) = P(Y|B) · P(S|B) · P(L|B)
            = (0.381) · (0.2857) · (0.333)
            = 0.036

Now compare with Others:
P(X|Others) = P(Y|O) · P(S|O) · P(L|O)
According to Bayes' theorem,

P(Y|O) = P(O|Y) · P(Y) / P(O)
       = (50/800)(800/2050) / (200/2050)
       = 0.25

P(S|O) = P(O|S) · P(S) / P(O)
       = (100/850)(850/2050) / (200/2050)
       = 0.5

P(L|O) = P(O|L) · P(L) / P(O)
       = (50/400)(400/2050) / (200/2050)
       = 0.25

P(X|Others) = P(Y|O) · P(S|O) · P(L|O)
            = (0.25) · (0.5) · (0.25)
            = 0.03125

Since P(X|Banana) = 0.036 is the largest of the three likelihoods, the fruit X = {Yellow, Sweet, Long} is predicted to be a Banana.
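The three likelihood calculations can be reproduced with a short Python sketch, assuming, as the working above does, that each conditional probability P(feature | class) is estimated as count(class, feature) / total(class).

# Count table from the notes: rows are classes (fruits), columns are features
counts = {
    "Mango":  {"Yellow": 350, "Sweet": 450, "Long": 0,   "Total": 800},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 1050},
    "Others": {"Yellow": 50,  "Sweet": 100, "Long": 50,  "Total": 200},
}
X = ["Yellow", "Sweet", "Long"]          # properties of the unknown fruit

def likelihood(fruit):
    """P(X | fruit) = product over the features of P(feature | fruit)."""
    p = 1.0
    for feature in X:
        p *= counts[fruit][feature] / counts[fruit]["Total"]
    return p

for fruit in counts:
    print(fruit, round(likelihood(fruit), 5))
# Mango 0.0, Banana 0.03628, Others 0.03125 -> X is classified as a Banana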
KNN
