04 - KMeans Clustering
05/25/2024
Outline
Introduction:
– Supervised vs Unsupervised
– Clustering
Unsupervised
– You have unlabeled data
– Discover the underlying structure of the data
– Try to understand the data
– Not predicting anything specific
Cluster analysis
– Find similarities between data objects according to their underlying
characteristics, and group similar objects into clusters
Partitional (Centroid-based)
– Each cluster represented by a centroid
– Determined by a proximity measurement
Other types:
– Distribution-based
– Density-based
– Etc.
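To make the centroid-based idea concrete, here is a minimal sketch of assigning points to the nearest centroid. It assumes Euclidean distance as the proximity measure; the points, the centroids, and the helper name `nearest_centroid` are hypothetical and chosen only for illustration.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point`."""
    # Proximity measure (assumed here): Euclidean distance.
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical 2-D points and centroids, not taken from the slides.
points = [(1.0, 5.0), (2.0, 5.0), (9.0, 2.0), (10.0, 2.0)]
centroids = [(1.5, 5.0), (9.5, 2.0)]

labels = [nearest_centroid(p, centroids) for p in points]
print(labels)  # [0, 0, 1, 1]
```

Any other proximity measure (for example Manhattan distance) could be swapped in for `math.dist` without changing the structure.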
Overlap criteria
– Hard clustering
– Soft clustering
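The difference between the two overlap criteria can be sketched as follows. Hard clustering gives each point exactly one cluster, while soft clustering gives it a degree of membership in every cluster. The softmax weighting and the `beta` parameter below are assumptions made for illustration, not a method taken from the slides.

```python
import math

def hard_assignment(point, centroids):
    """Hard clustering: the point belongs to exactly one cluster."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def soft_assignment(point, centroids, beta=1.0):
    """Soft clustering: one membership weight per cluster, summing to 1.

    ASSUMPTION: weights are a softmax over negative squared distances;
    `beta` is a hypothetical sharpness parameter.
    """
    scores = [-beta * math.dist(point, c) ** 2 for c in centroids]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

centroids = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical centroids
point = (1.0, 0.0)                    # hypothetical data point

print(hard_assignment(point, centroids))  # 0
print(soft_assignment(point, centroids))  # weight of cluster 0 is close to 1
```

A point halfway between the two centroids would still get a single label under hard assignment, but equal weights of 0.5 under this soft scheme.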
Dimensionality reduction
Graph coloring
Market segmentation
Etc.
[Figure: scatter plot of eight data points, o1 to o8; horizontal axis x1 from 0 to 12]
Set K = 2 and initialize 2 centroids randomly
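The initialization step above could be sketched as below. This version assumes one common choice, picking K of the data points at random as starting centroids; the slides may instead place the centroids at arbitrary random positions. The coordinates and the helper name `init_centroids` are hypothetical.

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct entries of `points` as the initial centroids."""
    rng = random.Random(seed)  # a fixed seed makes the choice reproducible
    return rng.sample(points, k)

# Hypothetical coordinates for eight points (not read off the slides).
points = [(1.0, 5.0), (2.0, 5.0), (1.0, 2.0), (2.0, 2.0),
          (8.0, 5.0), (9.0, 5.0), (8.0, 2.0), (9.0, 2.0)]

centroids = init_centroids(points, k=2, seed=0)
print(centroids)
```

Changing `seed` changes the starting centroids, which is why different runs can end in different clusterings.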
[Figure sequence: the data points o1 to o8 with the two centroids c1 and c2 (marked +) at successive positions; horizontal axis x1 from 0 to 12]
Iteration 1
[Figure: data points o1 to o8 with centroids c1 and c2 (marked +); axes x1 and x2]
Iteration 2
[Figure: data points o1 to o8 with centroids c1 and c2 (marked +); horizontal axis x1 from 0 to 12]
Iteration 3
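Each iteration shown above can be read as two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below runs three such iterations. The data, the starting centroids, and the helper names `sq_dist` and `kmeans_step` are hypothetical, and squared Euclidean distance is assumed.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_step(points, centroids):
    """One iteration: assignment step followed by update step."""
    # Assignment: index of the nearest centroid for every point.
    labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
              for p in points]
    # Update: each centroid becomes the mean of its members.
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centroids.append(tuple(sum(axis) / len(members)
                                       for axis in zip(*members)))
        else:
            new_centroids.append(old)  # empty cluster: keep the old centroid
    return labels, new_centroids

# Hypothetical data and starting centroids (not read off the slides).
points = [(1.0, 5.0), (2.0, 5.0), (1.0, 2.0), (2.0, 2.0),
          (8.0, 5.0), (9.0, 5.0), (8.0, 2.0), (9.0, 2.0)]
centroids = [(1.0, 5.0), (2.0, 5.0)]

history = []
for iteration in range(1, 4):
    labels, centroids = kmeans_step(points, centroids)
    history.append((labels, centroids))
    print(f"Iteration {iteration}: labels={labels} centroids={centroids}")
```

With these values the labels change between iterations 1 and 2 and stay the same in iteration 3, which is the usual signal to stop.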
Try 1
– Iteration 1
– Iteration 2
– Iteration 3
– Iteration 4, iteration stopped
Try 2
– Iteration 1
– Iteration 2
– Iteration 3, iteration stopped
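The two tries above suggest that different starting centroids can stop after a different number of iterations, and possibly at a different clustering. The sketch below illustrates this on hypothetical data with two hand-picked initializations. It assumes the loop stops when the assignments no longer change; the data, the initial centroids, and the function names are not taken from the slides.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, init_centroids, max_iter=100):
    """Run k-means from the given centroids until assignments stop changing."""
    centroids = list(init_centroids)
    labels = None
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        new_labels = [min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no assignment changed: iteration stopped
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids, n_iter

# Hypothetical data: two groups, left and right.
points = [(1.0, 5.0), (2.0, 5.0), (1.0, 2.0), (2.0, 2.0),
          (8.0, 5.0), (9.0, 5.0), (8.0, 2.0), (9.0, 2.0)]

# Two hypothetical initializations standing in for two tries.
run_a = kmeans(points, [(1.0, 5.0), (1.0, 2.0)])
run_b = kmeans(points, [(1.0, 5.0), (2.0, 5.0)])

print("run A:", run_a)  # splits the data into top and bottom rows
print("run B:", run_b)  # splits the data into left and right groups
```

Both runs stop at a stable assignment, but only run B recovers the left/right grouping. Running several tries and keeping the best one is a common way to reduce this sensitivity.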
Cons
– Necessity of specifying k
– Sensitive to initial assignment of centroids
– Sensitive to noise and outliers
– Not suitable for discovering clusters with non-convex shapes
– Non-deterministic: results can be inconsistent from one run to another
– Need to define a measure to evaluate performance
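Within-cluster sum of squared errors, where $c_j$ is the centroid of cluster $C_j$:
$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2$$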
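Assuming the measure intended here is the within-cluster sum of squared errors (SSE), it can be computed as in the sketch below: for every cluster, add up the squared distances between its points and its centroid. The data and the two candidate clusterings are hypothetical.

```python
def sse(points, labels, centroids):
    """Sum over clusters j of the squared distances from each x_i in C_j to c_j."""
    total = 0.0
    for p, lab in zip(points, labels):
        c = centroids[lab]
        total += sum((a - b) ** 2 for a, b in zip(p, c))
    return total

# Hypothetical data with two candidate clusterings for K = 2.
points = [(1.0, 5.0), (2.0, 5.0), (1.0, 2.0), (2.0, 2.0),
          (8.0, 5.0), (9.0, 5.0), (8.0, 2.0), (9.0, 2.0)]

left_right = sse(points, [0, 0, 0, 0, 1, 1, 1, 1], [(1.5, 3.5), (8.5, 3.5)])
top_bottom = sse(points, [0, 0, 1, 1, 0, 0, 1, 1], [(5.0, 5.0), (5.0, 2.0)])

print(left_right)  # 20.0
print(top_bottom)  # 100.0
```

A lower SSE means tighter clusters for the same K, so the left/right clustering is preferred here. SSE always shrinks as K grows, so it is not suited to comparing clusterings with different K on its own.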
Silhouette coefficient
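A minimal sketch of the silhouette coefficient follows, using its usual definition. For each point, a is the mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster; the point's score is (b - a) / max(a, b), and the overall coefficient is the mean score. Euclidean distance and a score of 0 for single-point clusters are assumed; the data and the function name `silhouette` are hypothetical.

```python
import math

def silhouette(points, labels):
    """Mean silhouette score over all points (ranges from -1 to 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for single-point clusters
            continue
        # a: mean distance to the other members (distance to itself is 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of another cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data with two candidate clusterings for K = 2.
points = [(1.0, 5.0), (2.0, 5.0), (1.0, 2.0), (2.0, 2.0),
          (8.0, 5.0), (9.0, 5.0), (8.0, 2.0), (9.0, 2.0)]

good = silhouette(points, [0, 0, 0, 0, 1, 1, 1, 1])  # left / right
poor = silhouette(points, [0, 0, 1, 1, 0, 0, 1, 1])  # top / bottom
print(good, poor)
```

Values near 1 indicate compact, well-separated clusters; values near 0 indicate overlapping clusters. Unlike a raw sum of squared errors, the coefficient needs no centroids, only the points and their labels.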