Example of Complete Linkage Clustering

[Figure: panels "Complete linkage" and "Single linkage"]

What is hierarchical (agglomerative) clustering?

Clustering is a data mining technique for grouping a set of objects in such a way that objects in the same cluster are more similar to each other than to those in other clusters.

In agglomerative hierarchical clustering, we start by assigning each object (data point) to its own cluster. We then compute the distance (dissimilarity) between every pair of clusters and merge the two closest clusters, repeating this step until only one cluster remains. Let's understand this further by working through an example.
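The loop just described can be sketched in a few lines of Python. This is a minimal, naive sketch assuming one-dimensional numeric data; the names `agglomerate`, `single` and `complete` are illustrative, not a library API. The `linkage` argument decides how the distance between two clusters is measured.

```python
from itertools import combinations

def agglomerate(points, linkage):
    """Naive agglomerative clustering on 1-D data (illustrative sketch).

    linkage(a, b) returns the distance between clusters a and b (lists of numbers).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[p] for p in points]  # every point starts in its own cluster
    history = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest linkage distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        history.append((a, b, linkage(a, b)))
        # replace the two merged clusters by their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a + b]
    return history

def single(a, b):
    # single linkage: distance between the closest members
    return min(abs(x - y) for x in a for y in b)

def complete(a, b):
    # complete linkage: distance between the farthest members
    return max(abs(x - y) for x in a for y in b)

for a, b, d in agglomerate([7, 10, 20, 28, 35], single):
    print(a, b, d)
# [7] [10] 3
# [28] [35] 7
# [20] [28, 35] 8
# [7, 10] [20, 28, 35] 10
```

With `complete` instead of `single`, the merge distances on the same data are 3, 7, 13 and 28. Recomputing every pairwise linkage on each pass keeps the sketch short but is much slower than library implementations.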

Objective: For the one-dimensional data set {7, 10, 20, 28, 35}, perform hierarchical clustering and plot the dendrogram to visualize the result.

Solution: First, let's visualize the data.

Observing the plot above, we can intuitively conclude that:

1. The first two points (7 and 10) are close to each other and should be in the same cluster.

2. The last two points (28 and 35) are also close to each other and should be in the same cluster.

3. It is not easy to tell which cluster the center point (20) belongs to.

Let's solve the problem by hand using two common linkage criteria for agglomerative hierarchical clustering:

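1. Single linkage: In single-link hierarchical clustering, at each step we merge the two clusters whose closest members are nearest to each other, i.e., the pair with the smallest minimum pairwise distance.

The merges are 7 with 10 (distance 3), then 28 with 35 (distance 7), then 20 with (28, 35) (distance 8, versus 10 to (7, 10)). Stopping when two clusters remain, single linkage gives:

Cluster 1: (7, 10)

Cluster 2: (20, 28, 35)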
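2. Complete linkage: In complete-link hierarchical clustering, at each step we merge the two clusters whose farthest members are nearest to each other, i.e., the pair with the smallest maximum pairwise distance.

The first two merges are the same as for single linkage, but 20 then joins (7, 10) (distance 13) rather than (28, 35) (distance 15). Stopping when two clusters remain, complete linkage gives:

Cluster 1: (7, 10, 20)

Cluster 2: (28, 35)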
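The single-link and complete-link results differ only in where the point 20 ends up. The short Python check below (helper names are illustrative) compares both linkage distances at the decisive step, when (7, 10) and (28, 35) have already been merged and 20 is still on its own; both linkages agree on those first two merges.

```python
# Clusters after the first two merges, which both linkages agree on
left, middle, right = [7, 10], [20], [28, 35]

def single_link(a, b):
    # distance between the two closest members
    return min(abs(x - y) for x in a for y in b)

def complete_link(a, b):
    # distance between the two farthest members
    return max(abs(x - y) for x in a for y in b)

for name, link in [("single", single_link), ("complete", complete_link)]:
    d_left, d_right = link(middle, left), link(middle, right)
    partner = "(7,10)" if d_left < d_right else "(28,35)"
    print(f"{name}: d(20, (7,10))={d_left}, d(20, (28,35))={d_right} -> 20 joins {partner}")
# single: d(20, (7,10))=10, d(20, (28,35))=8 -> 20 joins (28,35)
# complete: d(20, (7,10))=13, d(20, (28,35))=15 -> 20 joins (7,10)
```

Single linkage looks only at the nearest neighbor (28 is 8 away), while complete linkage is penalized by the far point 35 (15 away), so the two criteria attach 20 to opposite sides.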

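Conclusion: Hierarchical clustering is mostly used when the application requires a hierarchy, e.g., the creation of a taxonomy. However, hierarchical methods are expensive in computation and storage: the full pairwise distance matrix needs O(n^2) memory for n objects, and a naive agglomerative implementation takes O(n^3) time.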

Example: Clustering the following 7 data points.

Point   X1   X2
A       10    5
B        1    4
C        5    8
D        9    2
E       12   10
F       15    8
G        7    7

Step 1: Calculate the distances between all pairs of data points using the Euclidean distance. The shortest distance is between points C and G (2.24), so they are merged first.

           A      B      C      D      E      F
B       9.06
C       5.83   5.66
D       3.16   8.25   7.21
E       5.39  12.53   7.28   8.54
F       5.83  14.56  10.00   8.49   3.61
G       3.61   6.71   2.24   5.39   5.83   8.06
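The Step 1 matrix can be recomputed directly from the coordinates. This is a minimal sketch using only Python's standard library (`math.dist`, available since Python 3.8); keying the dictionary by name pairs is just one convenient layout.

```python
import math
from itertools import combinations

points = {
    "A": (10, 5), "B": (1, 4), "C": (5, 8), "D": (9, 2),
    "E": (12, 10), "F": (15, 8), "G": (7, 7),
}

# Euclidean distance for every unordered pair, keyed (earlier name, later name)
dist = {(p, q): math.dist(points[p], points[q]) for p, q in combinations(points, 2)}

# Print the lower triangle in the same layout as the Step 1 table
names = list(points)
for i, q in enumerate(names[1:], start=1):
    print(q, *(f"{dist[(p, q)]:6.2f}" for p in names[:i]))

closest = min(dist, key=dist.get)
print(closest, round(dist[closest], 2))  # ('C', 'G') 2.24
```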

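Step 2: Merge C and G, then measure the distance between the "C,G" cluster and the other data points using centroid linkage, i.e., the Euclidean distance from the cluster centroid (6, 7.5) to each point. This differs from average linkage (the mean of all pairwise distances), which would give 6.18 rather than 6.10 for B.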
           A      B    C,G      D      E
B       9.06
C,G     4.72   6.10
D       3.16   8.25   6.26
E       5.39  12.53   6.50   8.54
F       5.83  14.56   9.01   8.49   3.61
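The "C,G" row and column of the Step 2 table can be reproduced by measuring from the centroid (mean point) of C and G, i.e., centroid linkage. The sketch below also computes strict average linkage for B, the mean of d(B, C) and d(B, G), to show how the two criteria differ; the helper names are illustrative.

```python
import math

points = {
    "A": (10, 5), "B": (1, 4), "C": (5, 8), "D": (9, 2),
    "E": (12, 10), "F": (15, 8), "G": (7, 7),
}

def centroid(names):
    """Mean of the coordinates of the named points."""
    xs, ys = zip(*(points[n] for n in names))
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def average_link(a, b):
    """Average linkage: mean of all pairwise distances between members of a and b."""
    return sum(math.dist(points[p], points[q]) for p in a for q in b) / (len(a) * len(b))

cg = centroid(["C", "G"])  # (6.0, 7.5)
for other in ["A", "B", "D", "E", "F"]:
    # centroid-linkage distance from the C,G cluster to each remaining point
    print(other, round(math.dist(cg, points[other]), 2))

# average linkage for comparison
print(round(average_link(["C", "G"], ["B"]), 2))  # 6.18
```

The centroid distances printed by the loop (4.72, 6.1, 6.26, 6.5, 9.01) match the table; the average-linkage value for B does not.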

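Step 3: The smallest distance in the Step 2 table is between A and D (3.16), so they are merged into "A,D":

         A,D      B    C,G      E
B       8.51
C,G     5.32   6.10
E       6.96  12.53   6.50
F       7.11  14.56   9.01   3.61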

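Step 4: The smallest remaining distance is between E and F (3.61), so they are merged into "E,F":

         A,D      B    C,G
B       8.51
C,G     5.32   6.10
E,F     6.80  13.46   7.65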
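
Step 5: The smallest distance is now between A,D and C,G (5.32), so they are merged into "A,D,C,G":

       A,D,C,G        B
B         6.91
E,F       6.73    13.46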

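Step 6: The smallest distance is between A,D,C,G and E,F (6.73), so they are merged into "A,D,C,G,E,F":

       A,D,C,G,E,F
B              9.07

Finally, B joins the remaining cluster at a distance of 9.07.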
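As a check on Steps 1 to 6, here is a minimal centroid-linkage sketch in Python, assuming the cluster distance is the Euclidean distance between the mean points of the two clusters (the criterion the table values are consistent with). The function names are illustrative. Because each centroid is recomputed from all member points, merging a 4-point and a 2-point cluster weights them by size.

```python
import math
from itertools import combinations

points = {
    "A": (10, 5), "B": (1, 4), "C": (5, 8), "D": (9, 2),
    "E": (12, 10), "F": (15, 8), "G": (7, 7),
}

def centroid(cluster):
    """Mean point of a cluster given as a tuple of point names."""
    xs, ys = zip(*(points[n] for n in cluster))
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def cluster_distance(a, b):
    """Centroid linkage: Euclidean distance between the two cluster means."""
    return math.dist(centroid(a), centroid(b))

def centroid_linkage():
    clusters = [(name,) for name in points]  # each point starts alone
    merges = []
    while len(clusters) > 1:
        # closest pair of clusters under centroid linkage
        a, b = min(combinations(clusters, 2), key=lambda ab: cluster_distance(*ab))
        merges.append((a, b, round(cluster_distance(a, b), 2)))
        # drop the two merged clusters and append their union
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

for a, b, height in centroid_linkage():
    print(",".join(a), "+", ",".join(b), "at", height)
# C + G at 2.24
# A + D at 3.16
# E + F at 3.61
# C,G + A,D at 5.32
# E,F + C,G,A,D at 6.73
# B + E,F,C,G,A,D at 9.07
```

These merge heights are the values a dendrogram of this example plots on its vertical axis. If SciPy is available, `scipy.cluster.hierarchy.linkage` with `method='centroid'` on the raw coordinates, followed by `scipy.cluster.hierarchy.dendrogram`, is one way to draw it; check the SciPy documentation for the exact output format.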

Final dendrogram:
