
CLUSTER VALIDATION

Presented by: Rohit Paul


CLUSTERING
• The process of partitioning a set of data objects into subsets (called clusters)
• Objects in a cluster are similar to one another and dissimilar to objects in other clusters
CLUSTER VALIDITY INDICES
• Used to evaluate the "goodness" of the resulting clusters
• Different aspects of cluster validation:
  • Comparing clustering algorithms
  • Comparing two different sets of clusters
  • Comparing the results of a cluster analysis to externally known results
  • Determining the 'correct' number of clusters
• Scikit-learn (sklearn): a library for machine learning in Python
  • from sklearn.metrics import ..
Types of Validity Indices
• Internal Quality Indices
  • Used to measure the goodness of a clustering structure without reference to external information
  • Capture how well separated and how compact the clusters are
• External Quality Indices
  • Measure the extent to which cluster labels match externally supplied class labels
Internal Quality Indices
• Based on the following two criteria:
  • Compactness/Cohesion: how closely related the objects in a cluster are
  • Separation: how distinct or well separated a cluster is from other clusters
• Applications
  • Comparing clustering algorithms
  • Determining the 'correct' number of clusters
Disadvantages of k-means
Choosing the number of clusters k
• In most exploratory applications, the number of clusters k is unknown
• The correct choice of k is often ambiguous
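One common way to narrow down k, sketched here under stated assumptions: run k-means for several candidate values and keep the k with the best internal index. The synthetic data from make_blobs, the candidate range 2 to 6, and the use of silhouette_score (introduced later in these slides) are illustrative choices, not part of the original deck.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumption: three well-separated synthetic blobs stand in for real data.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette is better, so pick the k with the largest score.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real data the curve is rarely this clear-cut, so the scores are best read as a guide rather than a verdict.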
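Davies-Bouldin Index
• For each cluster, take the maximum over all other clusters of the ratio of intra-cluster distance to inter-cluster distance; the index is the average of these maxima:
  DB = (1/k) Σ_i max_{j≠i} (s_i + s_j) / d_ij
  where s_i is the average distance of the points in cluster i to its centroid and d_ij is the distance between the centroids of clusters i and j
• The lower the DB index value, the better the clustering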

>>> from sklearn.metrics import davies_bouldin_score
………....
>>> davies_bouldin_score(X, labels)
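To make the ratio concrete, here is a minimal pure-Python sketch of the index, assuming Euclidean distance and centroid-based scatter. The helper names (centroid, davies_bouldin) and the four toy points are hypothetical, chosen only for illustration.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def davies_bouldin(X, labels):
    """Davies-Bouldin index with Euclidean distances (lower is better).

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for x, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(x)
    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter s_i: mean distance from each point to its cluster centroid.
    scatter = {
        lab: sum(dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Largest (s_i + s_j) / d_ij over every other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two tight pairs of points, ten units apart.
X = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = [0, 0, 1, 1]  # groups the nearby points together
bad = [0, 1, 0, 1]   # groups the distant points together

print(davies_bouldin(X, good))
print(davies_bouldin(X, bad))
```

The sensible grouping gets a much lower value than the grouping that pairs distant points, matching the "lower is better" reading.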
Dunn Index

• Defined as the ratio of the minimum inter-cluster separation to the maximum cluster diameter
• The higher the Dunn index value, the better the clustering
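The slides show no library call for the Dunn index, so the following is a small pure-Python sketch. It assumes one common variant: separation is the smallest distance between points in different clusters, and diameter is the largest distance between two points in the same cluster. The function name dunn_index and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(X, labels):
    """Dunn index with Euclidean distances (higher is better).

    Assumes at least two clusters and at least one cluster with
    two or more distinct points (otherwise the diameter is undefined).
    """
    clusters = {}
    for x, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(x)
    groups = list(clusters.values())
    # Minimum separation: closest pair of points from different clusters.
    min_separation = min(
        dist(p, q)
        for g1, g2 in combinations(groups, 2)
        for p in g1
        for q in g2
    )
    # Maximum diameter: farthest pair of points inside any one cluster.
    max_diameter = max(
        dist(p, q) for g in groups for p, q in combinations(g, 2)
    )
    return min_separation / max_diameter


X = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = [0, 0, 1, 1]  # compact, well-separated clusters
bad = [0, 1, 0, 1]   # stretched, overlapping clusters

print(dunn_index(X, good))
print(dunn_index(X, bad))
```

Other definitions of separation and diameter (for example, centroid-based ones) exist and give different values, so scores are comparable only within one variant.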
Silhouette Index
• The Silhouette Coefficient combines the ideas of cohesion and separation, but for individual points:
  S(i) = (b(i) - a(i)) / max{a(i), b(i)}
  where
  • a(i) is the average dissimilarity of the i-th object to all other objects in the same cluster
  • b(i) is the average dissimilarity of the i-th object to all objects in the closest other cluster
>>> from sklearn.metrics import silhouette_score
………....
>>> silhouette_score(X, labels)
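As a worked version of the S(i) formula, the sketch below computes per-point values in pure Python, assuming Euclidean distance as the dissimilarity. The function name silhouette_values and the toy data are hypothetical. In scikit-learn, silhouette_score returns the mean over all points, and silhouette_samples returns the per-point values.

```python
from math import dist


def silhouette_values(X, labels):
    """Per-point silhouette S(i); assumes at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Common convention: a singleton cluster scores 0.
            values.append(0.0)
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(dist(X[i], X[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(X[i], X[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        values.append((b - a) / max(a, b))
    return values


X = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = [0, 0, 1, 1]
bad = [0, 1, 0, 1]

good_mean = sum(silhouette_values(X, good)) / len(X)
bad_mean = sum(silhouette_values(X, bad)) / len(X)
print(good_mean, bad_mean)
```

Values near 1 indicate a point sits well inside its cluster, while negative values suggest it is closer on average to another cluster.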
Other Internal Cluster Validity Indices
• Root-mean-square standard deviation (RMSSTD)
• R-squared
• Modified Hubert Γ statistic
• Calinski-Harabasz index
• I index
• SD validity index
• S_Dbw validity index, and so on
External Quality Indices
Comparing the results of a cluster analysis to an externally known result, such as externally provided class labels
• Validate against ground truth
• Compare two clusterings
Jaccard Score
Rand Index
• Measure the number of pairs of objects that are in:
  • a = same class in both P and G
  • b = same class in P but different in G
  • c = different class in P but same in G
  • d = different class in both P and G
• Agreement: a, d
• Disagreement: b, c
• Rand Index: RI = (a + d) / (a + b + c + d)

>>> from sklearn.metrics import adjusted_rand_score
………....
>>> adjusted_rand_score(labels_true, labels_pred)
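The pair counting can be sketched directly in pure Python. The function names pair_counts and rand_index and the two label lists are hypothetical. Note that adjusted_rand_score returns a chance-corrected version of the index, so its value generally differs from the raw Rand Index computed here; recent scikit-learn versions also provide rand_score for the unadjusted value.

```python
from itertools import combinations


def pair_counts(P, G):
    """Count object pairs by whether P and G put them in the same class."""
    a = b = c = d = 0
    for i, j in combinations(range(len(P)), 2):
        same_p = P[i] == P[j]
        same_g = G[i] == G[j]
        if same_p and same_g:
            a += 1  # same in both
        elif same_p:
            b += 1  # same in P, different in G
        elif same_g:
            c += 1  # different in P, same in G
        else:
            d += 1  # different in both
    return a, b, c, d


def rand_index(P, G):
    """Fraction of pairs on which P and G agree (a and d)."""
    a, b, c, d = pair_counts(P, G)
    return (a + d) / (a + b + c + d)


# Two labelings of the same five objects.
P = [0, 0, 1, 1, 1]
G = [0, 0, 0, 1, 1]

print(pair_counts(P, G))  # (2, 2, 2, 4)
print(rand_index(P, G))   # 0.6
```

Five objects give ten pairs; the two labelings agree on six of them.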
F-measure
• Precision: the percentage of tuples the classifier labeled positive that are actually positive
• Recall: the percentage of positive tuples that the classifier labeled as positive
• F-measure: the harmonic mean of precision and recall, F = 2 × precision × recall / (precision + recall)
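A short sketch of the arithmetic, starting from counts of true positives, false positives, and false negatives. The function name precision_recall_f and the example counts are hypothetical. For clustering, one common convention (an assumption here, not stated in the slides) is to count pairs of objects, treating pairs grouped together in both the result and the ground truth as true positives.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw counts.

    Assumes tp + fp > 0, tp + fn > 0, and tp > 0.
    """
    precision = tp / (tp + fp)  # share of predicted positives that are right
    recall = tp / (tp + fn)     # share of actual positives that were found
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts: 6 true positives, 2 false positives, 4 false negatives.
p, r, f = precision_recall_f(tp=6, fp=2, fn=4)
print(p, r, f)
```

Because the harmonic mean is pulled toward the smaller of the two values, a high F-measure requires both precision and recall to be high.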
Other External Cluster Validity Indices
• Normalized Mutual Information (NMI)
• Purity
• Sørensen-Dice
• Braun-Blanquet
• Normalized Van Dongen
• Pair-Set Index
• Centroid Index, and many more

Thank You !!
