Clustering: Unsupervised Learning Methods 15-381
General Assumptions
Each data item is a tuple (vector)
Values of tuples are nominal, ordinal, or numerical
Similarity = (Distance)^{-1}
Euclidean distance
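The two assumptions above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function names are hypothetical, and returning infinity for identical points (distance zero) is an assumption the slides do not specify.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """Similarity as the reciprocal of distance.

    Assumption: identical points (distance 0) get infinite similarity.
    """
    d = euclidean_distance(a, b)
    return math.inf if d == 0 else 1.0 / d
```

Nominal or ordinal values would need a different distance function; this sketch covers only the numerical case.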
Observations
Single pass over the data: easy to cluster new data incrementally
Requires an arbitrary S_min threshold
O(n|C|) time, O(n) space
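A single-pass scheme consistent with these observations might look like the sketch below. It assumes each cluster is represented by a running-mean centroid, that similarity is the reciprocal of Euclidean distance, and that a point joins the most similar cluster only when that similarity reaches `s_min`; the function name and these representation choices are illustrative assumptions, not taken from the slides.

```python
import math

def single_pass_cluster(points, s_min):
    """Assign each point to its most similar cluster, or start a new one."""
    centroids = []  # one running-mean centroid per cluster (assumed representation)
    counts = []     # number of points absorbed by each cluster
    labels = []     # cluster index per input point: O(n) space
    for p in points:
        best, best_sim = None, -math.inf
        for idx, c in enumerate(centroids):  # O(|C|) comparisons per point
            d = math.dist(p, c)
            sim = math.inf if d == 0 else 1.0 / d
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= s_min:
            # Similar enough: join the cluster and update its mean incrementally.
            counts[best] += 1
            n = counts[best]
            centroids[best] = tuple(
                c + (x - c) / n for c, x in zip(centroids[best], p)
            )
            labels.append(best)
        else:
            # Nothing is similar enough: this point seeds a new cluster.
            centroids.append(tuple(float(x) for x in p))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids
```

Because each point is compared only against the current centroids, a new data item can be clustered later by running the same loop body on it, which is what makes the method incremental. The result depends on the order of the data and on the choice of `s_min`.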
K-Means Clustering
Step 4: Recompute Centroids
Cluster(p_i) = argmin_{c_j ∈ {c_1, …, c_k}} d(p_i, c_j)
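The assignment rule and the centroid recomputation can be written directly from the formula. The sketch below assumes Euclidean distance for d and uses hypothetical function names; returning `None` for a cluster with no members is an assumption, since the slides do not say how empty clusters are handled.

```python
import math

def assign_cluster(p, centroids):
    """Index j of the centroid c_j that minimizes d(p, c_j)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def recompute_centroids(points, labels, k):
    """Mean of the points assigned to each of the k clusters.

    Assumption: a cluster with no members yields None.
    """
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dims):
            sums[j][d] += p[d]
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else None
        for j in range(k)
    ]
```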
K-Means Clustering: Iterate Until Stability
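One way to express "iterate until stability" is to repeat assignment and centroid recomputation until no point changes cluster. The NumPy sketch below makes several assumptions that are not in the slides: initial centroids are k distinct data points chosen at random, distance is Euclidean, an empty cluster keeps its previous centroid, and `max_iter` caps the loop as a safeguard.

```python
import numpy as np

def k_means(points, k, max_iter=100, seed=0):
    """Alternate assignment and centroid updates until assignments stop changing."""
    rng = np.random.default_rng(seed)
    X = np.asarray(points, dtype=float)
    # Assumption: seed the centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Pairwise distances between points and centroids, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # stable: no point changed cluster
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

The result can depend on the initial centroids, so in practice the procedure is often run with several seeds.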
Complete-link:
[Figure: example data points labeled by number]
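In complete-link clustering, the distance between two clusters is the distance between their farthest pair of members, and the two clusters with the smallest such distance are merged next. A minimal sketch, assuming Euclidean distance and clusters stored as lists of numeric tuples (the function names are illustrative):

```python
import math
from itertools import product

def complete_link_distance(cluster_a, cluster_b):
    """Cluster distance = distance between the farthest pair of points."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

def closest_pair(clusters):
    """Indices (i, j) of the two clusters with the smallest complete-link distance."""
    pairs = (
        (i, j)
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min(
        pairs,
        key=lambda ij: complete_link_distance(clusters[ij[0]], clusters[ij[1]]),
    )
```

An agglomerative loop would call `closest_pair`, merge the two clusters it returns, and repeat until one cluster remains.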
Hierarchical Agglomerative Clustering Methods (cont.)
Postprocess Taxonomies
Eliminate "no-op" levels
Agglomerate "skinny" levels
Label meaningful levels manually or with a centroid summary
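As a rough illustration of the first postprocessing step, the sketch below collapses levels that add no branching. It rests on two assumptions that the slides do not state: a taxonomy node is a dict with a "label" and an optional "children" list, and a "no-op" level means an internal node with exactly one child.

```python
def eliminate_noop_levels(node):
    """Collapse every internal node that has exactly one child.

    Assumption: a "no-op" level is a node whose only effect is to wrap
    a single child, so the child can take its place.
    """
    children = [eliminate_noop_levels(c) for c in node.get("children", [])]
    if len(children) == 1:
        return children[0]
    # Leaves and branching nodes are kept; leaves get an empty children list.
    return {**node, "children": children}
```

Agglomerating "skinny" levels and labeling levels would need criteria (for example a minimum branching factor) that are not given here, so they are left out of the sketch.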