
https://people.revoledu.com/kardi/tutorial/Clustering/Numerical%20Example.htm
Example of Complete Linkage Clustering


What is hierarchical clustering (agglomerative)?

Clustering is a data mining technique that groups a set of objects in
such a way that objects in the same cluster are more similar to each
other than to those in other clusters.

In agglomerative hierarchical clustering, we first assign each object
(data point) to a separate cluster. We then compute the distance
(similarity) between each pair of clusters and join the two most
similar clusters. Let's understand further by solving an example.
Dendrogram

Objective: For the one-dimensional data set {7, 10, 20, 28, 35},
perform hierarchical clustering and plot the dendrogram to visualize
it.

Solution: First, let's visualize the data.


Plotting the points on a number line, we can intuitively conclude that:

1. The first two points (7 and 10) are close to each other and
should be in the same cluster.

2. Also, the last two points (28 and 35) are close to each other
and should be in the same cluster.

3. Which cluster the center point (20) belongs to is not obvious.

Let's solve the problem by hand using both types of
agglomerative hierarchical clustering:

1. Single Linkage: In single-link hierarchical clustering, at each
step we merge the two clusters whose two closest members have the
smallest distance.
Using single linkage, two clusters are formed:

Cluster 1: (7, 10)

Cluster 2: (20, 28, 35)
2. Complete Linkage: In complete-link hierarchical clustering, at each
step we merge the two clusters whose merger yields the smallest
maximum pairwise distance.

Using complete linkage, two clusters are formed:


Cluster 1: (7, 10, 20)

Cluster 2: (28, 35)
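In both cases the first merges are 7 with 10 (gap 3) and 28 with 35
(gap 7). The two criteria then disagree about the middle point: under
single linkage, 20 is only 8 away from cluster (28, 35) but 10 away
from cluster (7, 10), so it joins the right-hand cluster; under
complete linkage, the maximum distance from 20 to (7, 10) is 13 versus
15 to (28, 35), so it joins the left-hand cluster. A minimal sketch
verifying both results with SciPy (assuming scipy and numpy are
installed):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# The one-dimensional data set, reshaped to a (5, 1) feature matrix.
X = np.array([7, 10, 20, 28, 35], dtype=float).reshape(-1, 1)

for method in ("single", "complete"):
    Z = linkage(X, method=method)                    # merge history
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
    print(method, labels)

# Expected grouping (cluster ids may differ):
#   single   -> {7, 10} and {20, 28, 35}
#   complete -> {7, 10, 20} and {28, 35}
```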

Conclusion: Hierarchical clustering is mostly used when the
application requires a hierarchy, e.g., the creation of a taxonomy.
However, it is expensive in terms of its computational and
storage requirements.

Hierarchical Clustering
Hierarchical clustering involves creating clusters that have a predetermined ordering from top to bottom. For example, all files
and folders on the hard disk are organized in a hierarchy. There are two types of hierarchical
clustering, Divisive and Agglomerative.

Divisive method

In the divisive or top-down clustering method we assign all of the observations to a single cluster and then partition the cluster into
the two least similar clusters using a flat clustering method (e.g., K-Means). Finally, we proceed recursively on each cluster until
there is one cluster for each observation. There is evidence that divisive algorithms produce more accurate hierarchies than
agglomerative algorithms in some circumstances, but they are conceptually more complex.

Agglomerative method
In the agglomerative or bottom-up clustering method we assign each observation to its own cluster. Then, we compute the similarity
(e.g., distance) between each of the clusters and join the two most similar clusters. Finally, we repeat steps 2 and 3 until there is
only a single cluster left. A sketch of the related algorithm is shown below.

Before any clustering is performed, it is required to determine the proximity matrix containing the distance between each pair of points
using a distance function. Then, the matrix is updated to display the distance between each cluster. The following three
methods differ in how the distance between each cluster is measured.

Single Linkage

In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two
points in each cluster. For example, the distance between clusters “r” and “s” is equal to the length of the arrow
between their two closest points.

Complete Linkage

In complete linkage hierarchical clustering, the distance between two clusters is defined as the longest distance between two
points in each cluster. For example, the distance between clusters “r” and “s” is equal to the length of the arrow
between their two furthest points.

Average Linkage

In average linkage hierarchical clustering, the distance between two clusters is defined as the average distance between every
point in one cluster and every point in the other cluster. For example, the distance between clusters “r” and “s” is equal to
the average length of the arrows connecting the points of one cluster to the points of the other.
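The three criteria are easy to express directly. Below is a minimal
NumPy sketch (the variable names r and s mirror the clusters in the
text; the 1-D clusters from the earlier example are reused as test
data):

```python
import numpy as np

def pairwise(r, s):
    # matrix of Euclidean distances between every point of r and every point of s
    return np.linalg.norm(r[:, None, :] - s[None, :, :], axis=-1)

def single_linkage(r, s):   return pairwise(r, s).min()   # closest points
def complete_linkage(r, s): return pairwise(r, s).max()   # furthest points
def average_linkage(r, s):  return pairwise(r, s).mean()  # all-pairs average

r = np.array([[7.0], [10.0]])           # cluster {7, 10}
s = np.array([[20.0], [28.0], [35.0]])  # cluster {20, 28, 35}
print(single_linkage(r, s), complete_linkage(r, s), average_linkage(r, s))
# -> 10.0 28.0 19.166... (min, max, and mean of the six pairwise gaps)
```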

Example: Clustering the following 7 data points.

     X1   X2
A    10    5
B     1    4
C     5    8
D     9    2
E    12   10
F    15    8
G     7    7

Step 1: Calculate the distances between all data points using the Euclidean distance function. The shortest distance is between
data points C and G.

        A      B      C      D      E      F
B    9.06
C    5.83   5.66
D    3.16   8.25   7.21
E    5.39  12.53   7.28   8.54
F    5.83  14.56  10.00   8.49   3.61
G    3.61   6.71   2.24   5.39   5.83   8.06
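This proximity matrix can be recomputed in a few lines. A minimal
sketch using SciPy's pdist and squareform helpers (assuming scipy and
numpy are installed):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

labels = list("ABCDEFG")
X = np.array([[10, 5], [1, 4], [5, 8], [9, 2],
              [12, 10], [15, 8], [7, 7]], dtype=float)

D = squareform(pdist(X, metric="euclidean"))  # 7x7 symmetric distance matrix
for lab, row in zip(labels, np.round(D, 2)):
    print(lab, row)
# The smallest off-diagonal entry is d(C, G) = 2.24, so C and G merge first.
```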

Step 2: We use "Average Linkage" to measure the distance between the "C,G" cluster and the remaining data points.

        A      B    C,G      D      E
B    9.06
C,G  4.72   6.10
D    3.16   8.25   6.26
E    5.39  12.53   6.50   8.54
F    5.83  14.56   9.01   8.49   3.61
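The updated "C,G" entries can be reproduced by hand. Note that
although the step is labelled "Average Linkage", the tabulated values
appear to be distances to the centroid of the merged cluster (often
called centroid linkage): the centroid of {C, G} is
((5+7)/2, (8+7)/2) = (6, 7.5), so the distance from A is
sqrt((10-6)^2 + (5-7.5)^2) = sqrt(22.25) ≈ 4.72 and from D is
sqrt(39.25) ≈ 6.26, matching the table, whereas a true all-pairs
average for B would give (5.66 + 6.71)/2 ≈ 6.19 rather than 6.10.
A quick check:

```python
import numpy as np

# Remaining points, and the centroid of the merged cluster {C, G}.
points = {"A": (10, 5), "B": (1, 4), "D": (9, 2), "E": (12, 10), "F": (15, 8)}
cg_centroid = np.mean([(5, 8), (7, 7)], axis=0)   # -> (6.0, 7.5)

for name, p in points.items():
    print(name, round(float(np.linalg.norm(np.array(p) - cg_centroid)), 2))
# -> A 4.72, B 6.1, D 6.26, E 6.5, F 9.01 (the "C,G" entries above)
```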

Step 3: The smallest remaining distance is d(A, D) = 3.16, so A and D are merged.

      A,D      B    C,G      E
B    8.51
C,G  5.32   6.10
E    6.96  12.53   6.50
F    7.11  14.56   9.01   3.61

Step 4: The smallest remaining distance is d(E, F) = 3.61, so E and F are merged.

      A,D      B    C,G
B    8.51
C,G  5.32   6.10
E,F  6.80  13.46   7.65
Step 5: "A,D" and "C,G" are now the closest pair (5.32) and are merged.

    A,D,C,G      B
B      6.91
E,F    6.73  13.46

Step 6: "A,D,C,G" and "E,F" merge at distance 6.73; B joins last at 9.07.

    A,D,C,G,E,F
B          9.07

Final dendrogram:
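The source's dendrogram figure is not reproduced here, but it can be
regenerated. A minimal sketch using SciPy and matplotlib (the
"centroid" method is assumed, since it matches the distances computed
in Steps 2 through 6):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

labels = list("ABCDEFG")
X = np.array([[10, 5], [1, 4], [5, 8], [9, 2],
              [12, 10], [15, 8], [7, 7]], dtype=float)

Z = linkage(X, method="centroid")  # merge history matching the worked steps
dendrogram(Z, labels=labels)
plt.ylabel("merge distance")
plt.show()
```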
What is Hierarchical Clustering?
Hierarchical clustering is a popular method for grouping objects. It
creates groups so that objects within a group are similar to each other
and different from objects in other groups. Clusters are visually
represented in a hierarchical tree called a dendrogram.
Hierarchical clustering has a couple of key benefits:

1. There is no need to pre-specify the number of clusters. Instead, the dendrogram can be cut
at the appropriate level to obtain the desired number of clusters.

2. Data is easily summarized/organized into a hierarchy using dendrograms. Dendrograms
make it easy to examine and interpret clusters.

Applications
There are many real-life applications of Hierarchical clustering. They
include:

- Bioinformatics: grouping animals according to their biological features to reconstruct
phylogeny trees.

- Business: dividing customers into segments or forming a hierarchy of employees based on
salary.

- Image processing: grouping handwritten characters in text recognition based on the
similarity of the character shapes.

- Information Retrieval: categorizing search results based on the query.

Hierarchical clustering types

There are two main types of hierarchical clustering:

1. Agglomerative: Initially, each object is considered to be its own cluster. According to a
particular procedure, the clusters are then merged step by step until a single cluster
remains. At the end of the cluster merging process, a cluster containing all the elements
is formed.

2. Divisive: The Divisive method is the opposite of the Agglomerative method. Initially, all
objects are considered to be in a single cluster. Then the division process is performed step
by step until each object forms a separate cluster. The cluster division or splitting is
carried out according to some principle, such as maximizing the distance between neighboring
objects in the cluster.
Between Agglomerative and Divisive clustering, Agglomerative
clustering is generally the preferred method. The example below
focuses on Agglomerative clustering algorithms because they are the
most popular and the easiest to implement.

Python Code:
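The linked tutorials below walk through full implementations; as a
minimal stand-in, here is a short scikit-learn sketch (assuming
scikit-learn is installed), reusing the 7-point data set from the
worked example:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[10, 5], [1, 4], [5, 8], [9, 2], [12, 10], [15, 8], [7, 7]])

# linkage can be "ward", "complete", "average", or "single"
model = AgglomerativeClustering(n_clusters=3, linkage="average")
labels = model.fit_predict(X)
print(labels)  # cluster id assigned to each of the points A..G
```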

https://www.learndatasci.com/glossary/hierarchical-clustering/

https://towardsdatascience.com/machine-learning-algorithms-part-12-hierarchical-agglomerative-clustering-example-in-python-1e18e0075019

https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/

https://www.w3schools.com/python/python_ml_hierarchial_clustering.asp

Decision tree

https://youtu.be/zNYdkpAcP-g

https://www.w3schools.com/python/python_ml_decision_tree.asp

Video link:

https://youtu.be/v7oLMvcxgFY

https://vitalflux.com/hierarchical-clustering-explained-with-python-example/
