
EXPERIMENT NO.

AIM:-

Implementation of a hierarchical clustering method.

THEORY:-

Introduction to Hierarchical Clustering:-

Hierarchical clustering is another unsupervised learning algorithm that is used to group together unlabeled data points having similar characteristics. Hierarchical clustering algorithms fall into the following two categories.
Agglomerative hierarchical algorithms − In agglomerative hierarchical algorithms, each data point is initially treated as a single cluster, and pairs of clusters are then successively merged (a bottom-up approach). The hierarchy of the clusters is represented as a dendrogram or tree structure.
Divisive hierarchical algorithms − On the other hand, in divisive hierarchical algorithms, all the data points are treated as one big cluster, and clustering involves dividing (a top-down approach) that one big cluster into various smaller clusters.

Steps to Perform Agglomerative Hierarchical Clustering

We are going to explain the most widely used and important variant of hierarchical clustering, i.e. agglomerative. The steps to perform it are as follows (a minimal code sketch is given after the list):
• Step 1 − Treat each data point as a single cluster. Hence, we start with, say, K clusters; the number of data points is also K at the start.
• Step 2 − Now form a bigger cluster by joining the two closest data points. This results in a total of K-1 clusters.
• Step 3 − To form more clusters, join the two closest clusters. This results in a total of K-2 clusters.
• Step 4 − To form one big cluster, repeat the above steps until only one cluster remains, i.e. there are no more clusters left to join.
• Step 5 − At last, after making one single big cluster, the dendrogram is cut to divide it into multiple clusters, depending upon the problem.
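As a minimal sketch of the loop described in these steps (assuming, for illustration only, that the distance between two clusters is the Euclidean distance between their centroids; other linkage rules are possible):

import math

def centroid(cluster):
    # Component-wise mean of the points in a cluster.
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]

def agglomerative(points):
    # Step 1: every point starts as its own cluster, so we begin with K clusters.
    clusters = [[p] for p in points]
    merges = []  # record of (cluster A, cluster B, distance) merges, in order
    # Steps 2-4: repeatedly merge the two closest clusters until one remains.
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # The merge record is what a dendrogram (Step 5) visualizes.
    return merges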
Let’s visualize how hierarchical clustering works with an example.

Suppose we have data on the marks scored by four students in Math and Science, and we need to create clusters of students to draw insights.

Now that we have the data, the first step is to see how far each data point is from every other.
For this, we construct a distance matrix. The distance between each pair of points can be found using various metrics, e.g. Euclidean distance, Manhattan distance, etc. We’ll use Euclidean distance for this example:
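The original marks table is not reproduced in this copy, so the values below are assumed sample marks for S1–S4; the sketch only illustrates how such a distance matrix can be computed:

import math

# Assumed (Math, Science) marks for four students; not the original data.
marks = {"S1": (85, 90), "S2": (83, 89), "S3": (60, 55), "S4": (30, 40)}

# Pairwise Euclidean distance matrix, printed one row per student.
for a in marks:
    print(a, [round(math.dist(marks[a], marks[b]), 2) for b in marks])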
We now form a cluster of S1 and S2 because they are the closest to each other. Now a question arises: what does our data look like now?

We take the average of the marks obtained by S1 and S2, and the resulting values represent the marks for this cluster. Instead of the average, we can consider the maximum or minimum values of the data points in the cluster.

Again, find the closest points and create another cluster, as the short sketch below illustrates.
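A one-step sketch of the merge, continuing with the assumed marks above:

# S1 and S2 (assumed marks) are the closest pair, so they are merged and
# represented by the average of their marks; min or max could be used instead.
s1, s2 = (85, 90), (83, 89)
merged = tuple((a + b) / 2 for a, b in zip(s1, s2))
print(merged)  # (84.0, 89.5) now stands in for the cluster {S1, S2}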


If we repeat the steps above and keep on clustering until we are left with just one cluster containing all the points, we get a result which looks something like this:

The figure we get is what we call a dendrogram. A dendrogram is a tree-like diagram that illustrates the arrangement of the clusters produced by the corresponding analysis. The samples on the x-axis are arranged so that points in close proximity appear next to each other.
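As a sketch, the dendrogram can be drawn with SciPy, again using the assumed marks from above; linkage() builds the merge tree and dendrogram() plots it:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

data = [(85, 90), (83, 89), (60, 55), (30, 40)]  # assumed S1..S4 marks
Z = linkage(data, method="average")  # merge tree, average linkage
dendrogram(Z, labels=["S1", "S2", "S3", "S4"])
plt.xlabel("Students")
plt.ylabel("Euclidean distance")
plt.show()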

PROGRAM:-
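The original program is not reproduced in this copy; below is a minimal sketch using scikit-learn's AgglomerativeClustering, run on the assumed student marks from the theory section.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Assumed (Math, Science) marks for students S1..S4; not the original data.
X = np.array([[85, 90], [83, 89], [60, 55], [30, 40]])

# Cut the hierarchy at two clusters, using average linkage on Euclidean distances.
model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)

for student, label in zip(["S1", "S2", "S3", "S4"], labels):
    print(student, "-> cluster", label)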
OUTPUT:-
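With the assumed marks in the sketch above, the two-cluster cut groups S1 with S2 and S3 with S4 (the numeric cluster labels themselves may differ between runs or library versions).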
CONCLUSION:-

Thus, we learnt about and implemented the hierarchical clustering algorithm.
