06 - Unsupervised Learning - 18 Dec 2023
Unsupervised Learning
Agenda
• Unsupervised Learning
– K-Means Clustering
– Agglomerative Clustering
Clustering
Clustering is the partitioning of a data set into subsets
(clusters), so that the data in each subset (ideally) share
some common trait - often according to some defined
distance measure.
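As an illustration of a "defined distance measure", the sketch below uses Euclidean distance, which is also the measure used in the later examples. The function name `euclidean` is chosen here for illustration only.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two points that differ by 3 in one coordinate and 4 in the other.
print(euclidean((1, 1), (4, 5)))
```

Other measures (for example Manhattan distance) can be swapped in without changing the idea of clustering by distance.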
Clustering
The notion of a cluster can be ambiguous
Clustering Applications
Types of Clustering
• A clustering is a set of clusters
Important distinction between hierarchical and partitional sets of
clusters
• Partitional Clustering
– A division of data objects into non-overlapping subsets (clusters)
• Hierarchical clustering
– A set of nested clusters organized as a hierarchical tree
Hierarchical Clustering
Hierarchical algorithms find successive clusters using previously established clusters.
1. Agglomerative ("bottom-up"):
Agglomerative algorithms begin with each element as a separate cluster and merge
them into successively larger clusters.
2. Divisive ("top-down"):
Divisive algorithms begin with the whole set and divide it into
successively smaller clusters.
Partitional Clustering
– Construct a partition of a data set that produces several clusters
at once
– Examples
▪ K-means clustering
▪ Fuzzy c-means clustering
K means Clustering
K-Means Example
K-Means : In Class Practice
K-Means– Example 2
– Suppose we have 4 medicines and each has two attributes (pH
and weight index).
– Our goal is to group these objects into K=2 clusters of medicine
Initial centroids: $c_1 = A$, $c_2 = B$
$d(D, c_1) = \sqrt{(5-1)^2 + (4-1)^2} = 5$
$d(D, c_2) = \sqrt{(5-2)^2 + (4-1)^2} = 4.24$
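The assignment step can be sketched in Python as follows. The table of medicines is not preserved in this text, so the coordinates A = (1, 1), B = (2, 1), C = (4, 3), D = (5, 4) are an assumption inferred from the worked equations in this example; the variable names are illustrative.

```python
import math

# Coordinates inferred from the worked equations (assumption; the
# original attribute table is not preserved in this text).
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = {"c1": points["A"], "c2": points["B"]}  # initial centroids

nearest = {}
for name, p in points.items():
    # Distance from this point to every centroid.
    dists = {c: math.dist(p, xy) for c, xy in centroids.items()}
    nearest[name] = min(dists, key=dists.get)
    print(name, {c: round(d, 2) for c, d in dists.items()}, "->", nearest[name])
```

Under these assumed coordinates only A stays with c1, while B, C and D are closest to c2.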
K-Means– Example 2
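– Assign each sample to its closest cluster, then recompute each centroid as the mean of its members
$c_1 = (1, 1)$
$c_2 = \left(\frac{2+4+5}{3}, \frac{1+3+4}{3}\right) = \left(\frac{11}{3}, \frac{8}{3}\right) \approx (3.67, 2.67)$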
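The centroid update is a coordinate-wise mean of the cluster members. A minimal sketch, with the helper name `centroid` chosen for illustration and the member coordinates taken from the sums shown in this step:

```python
def centroid(members):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

c1 = centroid([(1, 1)])                   # cluster 1 has a single member
c2 = centroid([(2, 1), (4, 3), (5, 4)])   # cluster 2 has three members
print(c1, tuple(round(v, 2) for v in c2))
```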
K-Means– Example 2
• Repeat the above steps
$c_1 = \left(\frac{1+2}{2}, \frac{1+1}{2}\right) = (1.5, 1)$
$c_2 = \left(\frac{4+5}{2}, \frac{3+4}{2}\right) = (4.5, 3.5)$
K-Means– Example 2
• We obtain the result G2 = G1. Comparing the grouping of the last
iteration with this iteration shows that no object changes group
any more, so the algorithm has converged.
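The whole procedure (assign, update, repeat until the grouping stops changing) can be sketched as a short loop. This is a minimal illustration, not a general implementation: the point coordinates are an assumption inferred from the worked equations of this example, the function names are hypothetical, and empty clusters are not handled.

```python
import math

# Coordinates inferred from the worked equations (assumption).
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

def assign(points, centroids):
    """Map each point name to the index of its nearest centroid."""
    return {
        name: min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for name, p in points.items()
    }

def update(points, groups, k):
    """Recompute each centroid as the mean of its assigned points.
    Assumes every cluster has at least one member."""
    new = []
    for i in range(k):
        members = [points[n] for n, g in groups.items() if g == i]
        new.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new

centroids = [points["A"], points["B"]]  # initial centroids c1 = A, c2 = B
groups = None
iterations = 0
while True:
    new_groups = assign(points, centroids)
    iterations += 1
    if new_groups == groups:  # grouping unchanged: converged
        break
    groups = new_groups
    centroids = update(points, groups, len(centroids))

print(groups, centroids, iterations)
```

The stopping test compares groupings rather than centroid positions, which mirrors the G2 = G1 check in the example.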
Hierarchical Clustering
• Let us consider a gene measured in a set of 5 experiments:
A, B, C, D and E.
• The values measured in the 5 experiments are:
Hierarchical Clustering
SOLUTION:
1. The closest two values are 100 and 200.
▪ => the centroid of these two values is 150.
2. Now we are clustering the values: 150, 500, 900, 1100.
3. The closest two values are 900 and 1100.
▪ => the centroid of these two values is 1000.
4. The remaining values to be joined are: 150, 500, 1000.
5. The closest two values are 150 and 500.
▪ => the centroid of these two values is 325.
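The solution steps can be reproduced with a few lines of Python. This sketch mirrors the arithmetic shown in the solution, where a merged cluster is represented by the midpoint of the two values being joined (not a size-weighted mean). The starting values are read off the solution steps; the variable names are illustrative.

```python
values = [100, 200, 500, 900, 1100]  # values appearing in the solution steps
steps = []

# Stop at two clusters, which is where the listed steps end;
# the remaining pair would be joined last.
while len(values) > 2:
    values.sort()
    # In one dimension the closest pair is always adjacent after sorting.
    i = min(range(len(values) - 1), key=lambda j: values[j + 1] - values[j])
    a, b = values[i], values[i + 1]
    mid = (a + b) / 2
    steps.append((a, b, mid))
    values[i:i + 2] = [mid]  # replace the pair by its midpoint

for a, b, mid in steps:
    print(f"join {a} and {b} -> {mid}")
```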
[Figure: objects a, b, c, d, e grouped into the nested clusters ab, de, cde and abcde. The divisive axis is labelled Step 4, Step 3, Step 2, Step 1, Step 0.]
Agglomerative clustering
[Figure: agglomerative clustering of points d1 to d5, showing the clusters (d1, d2), (d4, d5) and (d3, d4, d5).]
Agglomerative Clustering - Example
Data matrix
      X1    X2
A     1     1
B     1.5   1.5
C     5     5
D     3     4
E     4     4
F     3     3.5

Euclidean distance, e.g. $d_{AB} = \left((1-1.5)^2 + (1-1.5)^2\right)^{1/2} = 0.707$

Dist  A     B     C     D     E     F
A     0.00  0.71  5.66  3.61  4.24  3.20
B     0.71  0.00  4.95  2.92  3.54  2.50
C     5.66  4.95  0.00  2.24  1.41  2.50
D     3.61  2.92  2.24  0.00  1.00  0.50
E     4.24  3.54  1.41  1.00  0.00  1.12
F     3.20  2.50  2.50  0.50  1.12  0.00
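The distance matrix can be recomputed from the data matrix with a short script. A minimal sketch using only the standard library; the names `data`, `dist` and `closest` are illustrative.

```python
import math

data = {"A": (1, 1), "B": (1.5, 1.5), "C": (5, 5),
        "D": (3, 4), "E": (4, 4), "F": (3, 3.5)}
names = list(data)

# Full symmetric matrix of pairwise Euclidean distances.
dist = {(p, q): math.dist(data[p], data[q]) for p in names for q in names}

print("Dist " + " ".join(f"{n:>5}" for n in names))
for p in names:
    print(f"{p:<4} " + " ".join(f"{dist[p, q]:5.2f}" for q in names))

# Closest pair of distinct points (each unordered pair considered once).
closest = min(((p, q) for p in names for q in names if p < q),
              key=lambda pq: dist[pq])
print("closest pair:", closest, round(dist[closest], 2))
```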
Merge two closest clusters
Find the two closest clusters in the distance matrix above: D and F, at distance 0.50, are merged into the cluster (D, F).
Update Distance Matrix
Dist  A     B     C     D,F   E
A     0.00  0.71  5.66  ?     4.24
B     0.71  0.00  4.95  ?     3.54
C     5.66  4.95  0.00  ?     1.41
D,F   ?     ?     ?     0.00  ?
E     4.24  3.54  1.41  ?     0.00
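The "?" entries depend on the linkage rule, which is not preserved in this text. The sketch below assumes single linkage (the minimum distance between any member of one cluster and any member of the other) purely for illustration; passing `max` as `agg` would give complete linkage instead. The function name `link` is hypothetical.

```python
import math

data = {"A": (1, 1), "B": (1.5, 1.5), "C": (5, 5),
        "D": (3, 4), "E": (4, 4), "F": (3, 3.5)}

def link(c1, c2, agg=min):
    """Cluster-to-cluster distance. agg=min is single linkage
    (an assumption here); agg=max would be complete linkage."""
    return agg(math.dist(data[p], data[q]) for p in c1 for q in c2)

merged = ("D", "F")
# Distance from the merged cluster to each remaining point.
row = {x: round(link(merged, (x,)), 2) for x in "ABCE"}
print(row)
```

Only the row and column of the merged cluster need recomputing; all other entries carry over unchanged.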
Update Distance Matrix
Merge two closest clusters
Update Distance Matrix
Merge two closest clusters/Update Distance Matrix
Merge two closest clusters/Update Distance Matrix
Final Result
Data matrix
      X1    X2
A     1     1
B     1.5   1.5
C     5     5
D     3     4
E     4     4
F     3     3.5
Dendrogram Representation
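A dendrogram plots each merge at the distance at which it happens. The sketch below repeats "merge the two closest clusters, then update the distances" until one cluster remains and records those merge distances. It assumes single linkage, since the linkage rule is not preserved in this text, and all names are illustrative.

```python
import math
from itertools import combinations

data = {"A": (1, 1), "B": (1.5, 1.5), "C": (5, 5),
        "D": (3, 4), "E": (4, 4), "F": (3, 3.5)}

def single_link(c1, c2):
    """Minimum pairwise distance between two clusters (assumed linkage)."""
    return min(math.dist(data[p], data[q]) for p in c1 for q in c2)

clusters = [(name,) for name in data]  # start: every point is its own cluster
merges = []
while len(clusters) > 1:
    # Pick the closest pair of clusters under the assumed linkage.
    a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
    merges.append((a, b, round(single_link(a, b), 2)))
    # Replace the pair by their union.
    clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]

for a, b, height in merges:
    print(f"merge {a} + {b} at distance {height}")
```

Each recorded distance corresponds to the height of one join in the dendrogram under this linkage assumption.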
Thank You