
Hierarchical Clustering

• Produces a set of nested clusters organized as a hierarchical tree from a set of unlabeled examples
  [Figure: example hierarchy in which "animal" splits into vertebrate (fish, reptile, amphibian, mammal) and invertebrate (worm, insect, crustacean)]
• Can be visualized as a Dendrogram
  • A diagram that shows the hierarchical relationship between objects.
  • A tree-like diagram that records the sequences of merges or splits
  [Figure: nested clusters of six points and the corresponding dendrogram, with merge heights on the vertical axis]
Typical Alternatives to Calculate the Distance between Clusters
• Single link: smallest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = min { dis(tip, tjq) : tip ∈ Ki, tjq ∈ Kj }

• Complete link: largest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = max { dis(tip, tjq) : tip ∈ Ki, tjq ∈ Kj }

• Average: average distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = avg { dis(tip, tjq) : tip ∈ Ki, tjq ∈ Kj }

• Centroid: distance between the centroids of two clusters, i.e., dis(Ki, Kj) = dis(Ci, Cj)
• Medoid: distance between the medoids of two clusters, i.e., dis(Ki, Kj) = dis(Mi, Mj)
  • A medoid is one chosen, centrally located object in the cluster.
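To make these alternatives concrete, here is a minimal sketch assuming NumPy and Euclidean distance; the function names are illustrative, not from any particular library.

```python
# Minimal sketch of the inter-cluster distance alternatives above,
# assuming NumPy and Euclidean distance between 2-D points.
import numpy as np

def pairwise_dists(Ki, Kj):
    # All Euclidean distances between elements of cluster Ki and cluster Kj
    return np.linalg.norm(Ki[:, None, :] - Kj[None, :, :], axis=-1)

def single_link(Ki, Kj):    # smallest pairwise distance
    return pairwise_dists(Ki, Kj).min()

def complete_link(Ki, Kj):  # largest pairwise distance
    return pairwise_dists(Ki, Kj).max()

def average_link(Ki, Kj):   # average pairwise distance
    return pairwise_dists(Ki, Kj).mean()

def centroid_link(Ki, Kj):  # distance between cluster centroids
    return np.linalg.norm(Ki.mean(axis=0) - Kj.mean(axis=0))

Ki = np.array([[0.0, 0.0], [0.0, 1.0]])
Kj = np.array([[3.0, 0.0], [4.0, 1.0]])
print(single_link(Ki, Kj), complete_link(Ki, Kj),
      average_link(Ki, Kj), centroid_link(Ki, Kj))
```

The single, complete, and average variants differ only in how the matrix of pairwise distances is reduced to a single number; the medoid variant would instead pick one representative object per cluster and measure the distance between those two objects.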
Hierarchical Algorithms
• Single-link
  • The distance between two clusters is set equal to the minimum of the distances between instances in the two clusters.
  • Single link (nearest neighbour): the distance between two clusters is determined by the distance between the two closest objects (nearest neighbours) in the different clusters.
• Complete-link
  • The distance between two clusters is set equal to the maximum of all distances between instances in the two clusters.
  • Complete link (furthest neighbour): the distances between clusters are determined by the greatest distance between any two objects in the different clusters (i.e., by the "furthest neighbours").
  • Tends to produce tightly bound, compact clusters.
Hierarchical Algorithms (cont.)
Pair-group average. The distance between two clusters is calculated as the average distance between all pairs of objects in the two different clusters. This method is also very efficient when the objects form natural distinct "clumps"; however, it performs equally well with elongated, "chain"-type clusters.

Pair-group centroid. The distance between two clusters is determined as the distance between their centroids.
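In practice these linkage choices are usually selected through a library argument rather than computed by hand; a short sketch, assuming SciPy is available and using random example data:

```python
# Sketch: the linkage alternatives map onto the `method` argument of
# scipy.cluster.hierarchy.linkage (SciPy assumed, random example data).
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.rand(10, 2)            # 10 unlabeled 2-D points
for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)    # Z records the sequence of merges
    print(method, "-> first merge at height", round(float(Z[0, 2]), 3))
```

Each row of Z records one merge: the indices of the two clusters joined, the distance at which they were joined, and the number of original points in the resulting cluster.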
Hierarchical Clustering
• Use the distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it does need a termination condition.

[Figure: five objects a-e merged step by step (a, b -> ab; d, e -> de; c, de -> cde; ab, cde -> abcde). AGNES reads the steps left to right (Step 0 to Step 4); DIANA reads them right to left (Step 4 to Step 0).]

Agglomerative (bottom-up): AGNES
• Agglomerative clustering is a common type of hierarchical clustering, also called Agglomerative Nesting (AGNES).
• Start with each document being a single cluster. Eventually all documents belong to the same cluster.

Divisive (top-down): DIANA
• A top-down clustering approach. It works like agglomerative clustering but in the opposite direction; it is also known as DIANA (Divisive Analysis).
• Start with all documents belonging to the same cluster. Eventually each node forms a cluster of its own.
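As a rough illustration of the agglomerative (AGNES-style) direction, here is a sketch assuming scikit-learn; note that scikit-learn does not provide a DIANA-style divisive routine, so a divisive sketch appears later in this section.

```python
# Sketch of bottom-up (agglomerative) clustering, assuming scikit-learn:
# every point starts as its own cluster and pairs are merged until the
# requested number of clusters remains.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.rand(20, 2)
agnes = AgglomerativeClustering(n_clusters=3, linkage="single")
labels = agnes.fit_predict(X)        # one cluster label per point
print(labels)
```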
Dendrogram: Shows How the Clusters are Merged
• Decompose data objects into several levels of nested partitionings (a tree of clusters), called a dendrogram.
• A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
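A minimal sketch, assuming SciPy, of cutting the dendrogram at a chosen height to obtain a flat clustering (the cut height 0.25 is an arbitrary example):

```python
# Cutting the dendrogram at a desired level (SciPy assumed):
# fcluster returns the flat clusters obtained by cutting the tree at height t.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(15, 2)
Z = linkage(X, method="single")
labels = fcluster(Z, t=0.25, criterion="distance")  # cut at height 0.25
print(labels)  # each connected component below the cut is one cluster
```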
What is a Dendrogram?
• A dendrogram is a type of tree diagram showing hierarchical relationships between different sets of data.
• The distance between data points represents their dissimilarity.
• The height of the blocks represents the distance between clusters.
Parts of a Dendrogram
• The clades are the branches, arranged according to how similar (or dissimilar) they are. Clades that are close to the same height are similar to each other; clades with different heights are dissimilar, and the greater the difference in height, the greater the dissimilarity.
• Each clade has one or more leaves.
  [Figure: example dendrogram with leaves A-F]
• Leaves A, B, and C are more similar to each other than they are to leaves D, E, or F.
• Leaves D and E are more similar to each other than they are to leaves A, B, C, or F.
• Leaf F is substantially different from all of the other leaves.
One question that might have intrigued you by now is: how do you decide when to stop merging the clusters?
• Cut the dendrogram with a horizontal line at a height where the line can traverse the maximum distance up and down without intersecting a merging point.
• For example, in the accompanying figure the line L3 can traverse the maximum distance up and down without intersecting the merging points. So we draw the horizontal line there, and the number of vertical lines it intersects is the optimal number of clusters.
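A rough sketch of this rule, assuming SciPy and random example data: find the largest vertical gap between successive merge heights and cut inside it.

```python
# Choose the cut by the largest vertical gap between merge heights
# (SciPy assumed, random example data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(30, 2)
Z = linkage(X, method="single")
heights = Z[:, 2]                          # merge distances
gaps = np.diff(heights)                    # vertical distance between merges
i = int(np.argmax(gaps))                   # widest gap
cut = (heights[i] + heights[i + 1]) / 2.0  # horizontal line inside that gap
labels = fcluster(Z, t=cut, criterion="distance")
print("optimal number of clusters:", labels.max())
```

For single linkage the merge heights are non-decreasing, so these gaps correspond directly to the vertical stretches of the dendrogram.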
AGNES (Agglomerative Nesting)
• Introduced in Kaufmann and Rousseeuw (1990)
• Implemented in statistical analysis packages, e.g., Splus
• Use the Single-Link method and the dissimilarity matrix.
• Merge nodes that have the least dissimilarity
• Go on in a non-descending fashion
• Eventually all nodes belong to the same cluster
Agglomerative Hierarchical Clustering Algorithm
• Start by taking each data point as a singleton cluster.
• Compute the distance between all pairs of points. The result is a square distance matrix.
• On the basis of these distances, select the pair with the minimum distance and merge them into one cluster. Represent the merge in the dendrogram (drawing dendrograms is discussed in later sections).
• Update the distance matrix by removing the rows and columns for the points that have been merged and adding an entry for the new cluster.
• Recompute the distances between all pairs of clusters after the merge.
• Again merge the pair with the minimum distance.
• Keep merging until only one big cluster remains, recording each merge in the dendrogram as you go (a from-scratch sketch follows this list).
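A from-scratch sketch of these steps, assuming NumPy, Euclidean distance, and single linkage; the six example points used later in this section are reused, and the function name is illustrative.

```python
# Naive agglomerative clustering with single linkage (NumPy assumed).
import numpy as np

def agglomerative_single_link(X):
    clusters = [[i] for i in range(len(X))]                # singleton clusters
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # distance matrix
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the minimum single-link distance
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((clusters[a], clusters[b], d))        # record for dendrogram
        clusters[a] = clusters[a] + clusters[b]             # merge the pair
        del clusters[b]
    return merges

X = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
              [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])
for left, right, d in agglomerative_single_link(X):
    print(left, "+", right, "at distance", round(d, 2))
```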
How Does the Agglomerative Hierarchical Clustering Algorithm Work?
Agglomerative Hierarchical Clustering (AHC) is an iterative classification method whose principle is simple.

1. Start by assigning each observation to a single-point cluster, so that if we have N observations, we have N clusters, each containing just one observation.
2. Find the closest (most similar) pair of clusters and merge them into one cluster; we now have N-1 clusters. Closeness can be measured in various ways, using the linkage techniques described above.
3. Find the next two closest clusters and merge them into one cluster; we now have N-2 clusters.
4. Repeat steps 2 and 3 until all observations are clustered into one single cluster of size N.

This process continues until all the objects have been clustered. These successive clustering operations produce a binary clustering tree (dendrogram), whose root is the cluster that contains all the observations. This dendrogram represents a hierarchy of partitions (a plotting sketch follows).
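A short sketch, assuming SciPy and matplotlib, of drawing the binary clustering tree (dendrogram) produced by the full merge history:

```python
# Draw the dendrogram for the complete merge history (SciPy + matplotlib assumed).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.rand(12, 2)
Z = linkage(X, method="average")   # N-1 merges; the root contains all points
dendrogram(Z)                      # hierarchy of partitions drawn as a tree
plt.xlabel("observation")
plt.ylabel("merge distance")
plt.show()
```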
Single Link Agglomerative Clustering
• Use maximum similarity of pairs:

      sim(ci, cj) = max { sim(x, y) : x ∈ ci, y ∈ cj }

• Can result in "straggly" (long and thin) clusters due to the chaining effect.
• Appropriate in some domains, such as clustering islands: "Hawai'i clusters"
• After merging ci and cj, the similarity of the resulting cluster to another cluster, ck, is:

      sim((ci ∪ cj), ck) = max(sim(ci, ck), sim(cj, ck))


Single Link Example
Complete Link Agglomerative Clustering
• Use minimum similarity of pairs:

      sim(ci, cj) = min { sim(x, y) : x ∈ ci, y ∈ cj }

• Makes "tighter," spherical clusters that are typically preferable.
• After merging ci and cj, the similarity of the resulting cluster to another cluster, ck, is:

      sim((ci ∪ cj), ck) = min(sim(ci, ck), sim(cj, ck))
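The two update rules differ only in whether the more similar or the less similar of the two merged clusters determines the new similarity to ck; a tiny sketch in plain Python (illustrative names):

```python
# Similarity of the merged cluster (ci ∪ cj) to another cluster ck.
def merged_similarity(sim_ci_ck, sim_cj_ck, linkage="single"):
    # single link: as similar to ck as the MORE similar of ci, cj
    # complete link: as similar to ck as the LESS similar of ci, cj
    if linkage == "single":
        return max(sim_ci_ck, sim_cj_ck)
    return min(sim_ci_ck, sim_cj_ck)

print(merged_similarity(0.8, 0.3, "single"))    # 0.8
print(merged_similarity(0.8, 0.3, "complete"))  # 0.3
```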


Complete Link Example
Hierarchical clustering using single and complete linkages
Example
• Use the single-link technique to find clusters in the given database.

      Point    X      Y
      1        0.40   0.53
      2        0.22   0.38
      3        0.35   0.32
      4        0.26   0.19
      5        0.08   0.41
      6        0.45   0.30
Plot the given data.
[Figure: scatter plot of the six (X, Y) points listed above.]
Identify the two nearest clusters.
[Figure: the two nearest points, 3 and 6 (distance 0.11 in the matrix below), are identified and merged first.]
Repeat the process until all objects are in the same cluster.
[Figure: successive single-link merges shown on the plot of the six points.]
Average link
• The same procedure can be applied with an average distance matrix: at each step, merge the pair of clusters with the smallest average distance.
[Table: the same six (X, Y) points as above.]
Construct a distance matrix:

            1      2      3      4      5      6
      1     0
      2     0.24   0
      3     0.22   0.15   0
      4     0.37   0.20   0.15   0
      5     0.34   0.14   0.28   0.29   0
      6     0.23   0.25   0.11   0.22   0.39   0
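A sketch, assuming SciPy, that reproduces the distance matrix above from the six (X, Y) points and then runs single-link clustering on it:

```python
# Rebuild the distance matrix and run single-link clustering (SciPy assumed).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

pts = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
                [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])
D = squareform(pdist(pts))              # 6 x 6 Euclidean distance matrix
print(np.round(D, 2))                   # matches the table above (up to rounding)
Z = linkage(pdist(pts), method="single")
print(np.round(Z, 2))                   # merge history used to draw the dendrogram
```

The first row of Z shows that points 3 and 6 (0-based indices 2 and 5) are merged first, which matches the smallest entry in the matrix.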


DIANA (Divisive Analysis)
• Introduced in Kaufmann and Rousseeuw (1990)
• Implemented in statistical analysis packages, e.g., Splus
• Inverse order of AGNES
• Eventually each node forms a cluster on its own
Divisive Clustering (Algorithm)
• Start by assuming all the data is in one cluster.
• Compute the distance of each point from every other point. For example, if we have 5 points, compute the distances of point 1 from points 2, 3, 4 and 5 and sum them, and so on.
• The point with the maximum total distance is segregated from all the other points, i.e., the cluster is now divided into two.
• Again compute the distance of every point from the other points (including the distances within its own cluster).
• Take the difference of the distances computed with respect to each cluster and segregate the point with the maximum difference.
• Continue this algorithm until the difference in distances is no longer positive (a sketch of one such split follows this list).
• Distance measures
  • Average linkage: the average of the distances between points in the two clusters is used when merging.
  • Complete linkage: the maximum (farthest) distance between points in the two clusters is used when merging.
  • Single linkage: the minimum (closest) distance between points in the two clusters is used when merging.
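A hedged sketch, assuming NumPy, of one DIANA-style split following the steps above (the function name and stopping details are illustrative):

```python
# One divisive split: the most "distant" point seeds a splinter group, and
# points keep moving over while they are closer to the splinter group than
# to the remaining cluster (NumPy assumed).
import numpy as np

def diana_split(X):
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # pairwise distances
    remaining = list(range(len(X)))
    avg = [D[i, remaining].mean() for i in remaining]      # average distance to the rest
    splinter = [remaining.pop(int(np.argmax(avg)))]        # segregate the farthest point
    moved = True
    while moved and len(remaining) > 1:
        moved = False
        # difference: avg distance to own cluster minus avg distance to splinter
        diffs = [D[i, remaining].sum() / (len(remaining) - 1)
                 - D[i, splinter].mean() for i in remaining]
        j = int(np.argmax(diffs))
        if diffs[j] > 0:                 # still positive: move the point over
            splinter.append(remaining.pop(j))
            moved = True
    return splinter, remaining

X = np.random.rand(8, 2)
print(diana_split(X))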
Difficulties in Hierarchical Clustering
• Difficulty in selecting merge or split points
  • This decision is critical because subsequent merge or split decisions are based on the newly formed clusters.
• The method does not scale well
  • For this reason, hierarchical methods are often integrated with other clustering techniques to form multiple-phase clustering.
