DWM Assignment 2
Step 1: From the given distance matrix, clusters E and A have the minimum distance, so we merge them to form the cluster (E, A).
Updated distance matrix:
dist((E A), C) = MIN(dist(E,C), dist(A,C)) = MIN(2, 2) = 2
dist((E A), B) = MIN(dist(E,B), dist(A,B)) = MIN(2, 5) = 2
dist((E A), D) = MIN(dist(E,D), dist(A,D)) = MIN(3, 3) = 3
Step 2: Consider the distance matrix obtained in Step 1. Since the (B, C) distance is minimum, we merge B and C.
dist((B C), (E A)) = MIN(dist(B,E), dist(B,A), dist(C,E), dist(C,A)) = MIN(2, 5, 2, 2) = 2
dist((B C), D) = MIN(dist(B,D), dist(C,D)) = MIN(3, 6) = 3
Step 3: Consider the distance matrix obtained in Step 2. Since the distance between (E, A) and (B, C) is minimum, we merge them.
dist((E A), (B C)) = MIN(dist(E,B), dist(E,C), dist(A,B), dist(A,C)) = MIN(2, 2, 5, 2) = 2
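These single-linkage updates can be reproduced with a minimal Python sketch. Only the pairwise distances that appear in the MIN(...) expressions above are included, and `single_link` is a hypothetical helper name:

```python
# Pairwise distances between individual points, as used in the steps above
d = {
    frozenset("EB"): 2, frozenset("EC"): 2, frozenset("ED"): 3,
    frozenset("AB"): 5, frozenset("AC"): 2, frozenset("AD"): 3,
    frozenset("BD"): 3, frozenset("CD"): 6,
}

def single_link(c1, c2):
    # Single linkage: cluster distance = minimum over all member pairs
    return min(d[frozenset({p, q})] for p in c1 for q in c2)

print(single_link("EA", "C"))   # Step 1: MIN(2, 2) = 2
print(single_link("EA", "B"))   # Step 1: MIN(2, 5) = 2
print(single_link("EA", "D"))   # Step 1: MIN(3, 3) = 3
print(single_link("BC", "EA"))  # Step 2: MIN(2, 5, 2, 2) = 2
print(single_link("BC", "D"))   # Step 2: MIN(3, 6) = 3
```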
Ans.
Object X Y
A 1 1
B 2 1
C 4 3
D 5 4
K=2
1] Obtain the distance matrix (Euclidean distance between each pair of objects):
      A     B     C     D
A     0     -     -     -
B     1     0     -     -
C     √13   √8    0     -
D     5     √18   √2    0
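The matrix entries can be verified with a short Python snippet using the standard-library `math.dist` helper; for instance, dist(B, D) = √((5-2)² + (4-1)²) = √18 ≈ 4.243:

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Object coordinates from the table above
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

names = list(points)
for i, p in enumerate(names):
    for q in names[:i]:
        print(f"dist({p},{q}) = {dist(points[p], points[q]):.3f}")
```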
2] c1 = {A}
c2 = {B, C, D}
∴ centroid of c1 = (1, 1)
∴ centroid of c2 = ((2+4+5)/3, (1+3+4)/3) = (3.67, 2.67)
3] c1 = {A, B}
c2 = {C, D}
∴ centroid of c1 = (1.5, 1)
∴ centroid of c2 = (4.5, 3.5)
∴ c1 = {A, B}
c2 = {C, D}
∵ there is no change, these are the final clusters.
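The k-means iterations above can be sketched in Python (a minimal illustration starting from the initial assignment c1 = {A}, c2 = {B, C, D} used in the working):

```python
from math import dist

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

def centroid(cluster):
    xs, ys = zip(*(points[p] for p in cluster))
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Initial assignment from the working above: c1 = {A}, c2 = {B, C, D}
clusters = [["A"], ["B", "C", "D"]]
while True:
    means = [centroid(c) for c in clusters]
    new = [[], []]
    for name, xy in points.items():
        # assign each object to the cluster with the nearer centroid
        new[min((0, 1), key=lambda i: dist(xy, means[i]))].append(name)
    if new == clusters:  # no change => final clusters
        break
    clusters = new

print(clusters)  # [['A', 'B'], ['C', 'D']]
```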
Ans.
Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset,
called C1 (the candidate set).
(II) Compare each candidate item's support count with the minimum support count
(here min_support = 2). If the support_count of a candidate item is less than
min_support, remove that item. This gives us the itemset L1.
Step-2: K=2
Generate candidate set C2 using L1 (this is called the join step). The condition for
joining Lk-1 with Lk-1 is that the itemsets should have (K-2) elements in common.
Check whether all subsets of each itemset are frequent; if not, remove that itemset.
(For example, the subsets of {I1, I2} are {I1} and {I2}, and both are frequent.
Check this for each itemset.)
Now find the support count of these itemsets by searching the dataset.
(II) Compare the candidate set (C2) support counts with the minimum support count
(here min_support = 2). If the support_count of a candidate itemset is less than
min_support, remove that itemset. This gives us the itemset L2.
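The join and prune steps described above can be sketched as a small generic Python function (`apriori_gen` is a hypothetical name; the example L2 below is an illustration consistent with the itemsets used in the rule generation later):

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join L(k-1) with itself, then prune candidates that have an
    infrequent (k-1)-subset."""
    prev = set(map(frozenset, prev_frequent))
    # Join step: unions of size k (so each joined pair shares k-2 elements)
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune step: every (k-1)-subset of a candidate must be frequent
    return [c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))]

L2 = [{"I1", "I2"}, {"I1", "I3"}, {"I2", "I3"}]
print(apriori_gen(L2, 3))  # the single candidate {I1, I2, I3}
```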
Step-3:
Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with
Lk-1 is that the itemsets should have (K-2) elements in common. So here, for L2,
pairs of 2-itemsets sharing one common element are joined.
(II) Compare the candidate set (C3) support counts with the minimum support count
(here min_support = 2). If the support_count of a candidate itemset is less than
min_support, remove that itemset. This gives us the itemset L3.
Step-4:
Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with
Lk-1 (K = 4) is that the itemsets should have (K-2) = 2 elements in common. Here the
prune step leaves C4 empty, so candidate generation stops.
Thus, we have discovered all the frequent item-sets. Now generation of strong
association rule comes into picture. For that we need to calculate confidence of each
rule.
Confidence –
Confidence(A->B) = Support_count(A∪B) / Support_count(A)
For example, a confidence of 60% means that 60% of the customers who purchased milk
and bread also bought butter.
So here, by taking an example of any frequent itemset, we will show the rule
generation.
Itemset {I1, I2, I3} //from L3
So the rules can be:
[I1^I2]=>[I3] //confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100=50%
[I1^I3]=>[I2] //confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100=50%
[I2^I3]=>[I1] //confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100=50%
[I1]=>[I2^I3] //confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 = 33.33%
[I2]=>[I1^I3] //confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 = 28.57%
[I3]=>[I1^I2] //confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 = 33.33%
So if the minimum confidence is 50%, the first three rules can be considered strong
association rules.
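These confidence calculations can be checked with a short Python snippet (`confidence` is a hypothetical helper; the support counts are the ones appearing in the rules above):

```python
# Support counts as they appear in the confidence calculations above
support = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I3"}): 6,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4,
    frozenset({"I2", "I3"}): 4, frozenset({"I1", "I2", "I3"}): 2,
}

def confidence(antecedent, consequent):
    # confidence(A -> B) = support_count(A ∪ B) / support_count(A)
    a = frozenset(antecedent)
    return support[a | frozenset(consequent)] / support[a]

min_conf = 0.5
rules = [({"I1", "I2"}, {"I3"}), ({"I1", "I3"}, {"I2"}), ({"I2", "I3"}, {"I1"}),
         ({"I1"}, {"I2", "I3"}), ({"I2"}, {"I1", "I3"}), ({"I3"}, {"I1", "I2"})]
for a, b in rules:
    c = confidence(a, b)
    mark = "strong" if c >= min_conf else "weak"
    print(sorted(a), "=>", sorted(b), f"{c:.2%}", mark)
```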
Ans.
Data classification is broadly defined as the process of organizing data by relevant
categories so that it may be used and protected more efficiently. On a basic level, the
classification process makes data easier to locate and retrieve. Data classification is of
particular importance when it comes to risk management, compliance, and data
security.
Data classification involves tagging data to make it easily searchable and trackable. It
also eliminates multiple duplications of data, which can reduce storage and backup
costs while speeding up the search process. Though the classification process may
sound highly technical, it is a topic that should be understood by your organization’s
leadership.
There are three main types of data classification that are considered industry
standards: content-based, context-based, and user-based classification.
Step 1: Take the initial means m1 = 3, m2 = 4
Step 2: Calculate the distance of each object from the means and assign each object
to the cluster with the minimum distance.
k1={2,3} k2={4,10,12,20,30,11,25}
Step 3: Reassign means
m1 = (2+3)/2, m2 = (4+10+12+20+30+11+25)/7
m1 = 2.5, m2 = 16
Step 4: Calculate distances and assign clusters
k1={2,3,4} k2={10,12,20,30,11,25}
Step 5: Reassign means
m1 = 3, m2 = 18
Step 6: Calculate distances and assign clusters
k1={2,3,4,10} k2={12,20,30,11,25}
Step 7: Reassign means
m1 = 4.75, m2 = 19.6
Step 8: Calculate distances and assign clusters
k1={2,3,4,10,11,12} k2={20,30,25}
Step 9: Reassign means
m1 = 7, m2 = 25
Step 10: Calculate distances and assign clusters
k1={2,3,4,10,11,12} k2={20,30,25}
Repeat the steps until the clusters stop changing. As the clusters in Step 8 and
Step 10 are the same, these are the final clusters.
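The iterations above can be reproduced with a minimal Python sketch (`kmeans_1d` is a hypothetical helper; the dataset is the union of the clusters in Step 2):

```python
def kmeans_1d(data, m1, m2):
    # Iterate assignment and mean updates until the means stop changing
    while True:
        k1 = [x for x in data if abs(x - m1) <= abs(x - m2)]
        k2 = [x for x in data if abs(x - m1) > abs(x - m2)]
        new_m1, new_m2 = sum(k1) / len(k1), sum(k2) / len(k2)
        if (new_m1, new_m2) == (m1, m2):  # stable means => final clusters
            return sorted(k1), sorted(k2)
        m1, m2 = new_m1, new_m2

# Dataset implied by the clusters in Step 2; initial means m1 = 3, m2 = 4
data = [2, 3, 4, 10, 12, 20, 30, 11, 25]
print(kmeans_1d(data, 3, 4))  # ([2, 3, 4, 10, 11, 12], [20, 25, 30])
```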