Hierarchical Clustering Unit 4 ML
In data mining and statistics, hierarchical cluster analysis is a method of cluster analysis
that seeks to build a hierarchy of clusters, i.e., a tree-like structure representing the hierarchy.
In machine learning, clustering is the unsupervised learning technique that groups the data based
on similarity between the set of data. There are different-different types of clustering
algorithms in machine learning. Connectivity-based clustering: This type of clustering
algorithm builds the cluster based on the connectivity between the data points. Example:
Hierarchical clustering
Centroid-based clustering: This type of clustering algorithm forms around the
centroids of the data points. Example: K-Means clustering, K-Mode clustering
Distribution-based clustering: This type of clustering algorithm is modeled using
statistical distributions. It assumes that the data points in a cluster are generated from a
particular probability distribution, and the algorithm aims to estimate the parameters of
the distribution to group similar data points into clusters Example: Gaussian Mixture
Models (GMM)
Density-based clustering: This type of clustering algorithm groups together data
points that are in high-density concentrations and separates points in low-
concentrations regions. The basic idea is that it identifies regions in the data space that
have a high density of data points and groups those points together into clusters.
Example: DBSCAN(Density-Based Spatial Clustering of Applications with Noise)
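As a quick illustration of the four families, the sketch below runs one scikit-learn algorithm from each on the same toy data. The dataset values and the parameter choices (n_clusters, n_components, eps, min_samples) are arbitrary assumptions for demonstration, not taken from the original text.
Python3
# One clustering algorithm per family, run on the same toy dataset
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

X = np.array([[1, 2], [1, 4], [1, 0], [8, 8], [8, 9], [9, 8]])

print(AgglomerativeClustering(n_clusters=2).fit_predict(X))  # connectivity-based
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))        # centroid-based
print(GaussianMixture(n_components=2).fit_predict(X))        # distribution-based
print(DBSCAN(eps=2, min_samples=2).fit_predict(X))           # density-based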
In this article, we will discuss connectivity-based clustering, i.e., hierarchical clustering.
Hierarchical clustering
Hierarchical clustering is a connectivity-based clustering model that groups together data
points that are close to one another, based on a measure of similarity or distance. The
assumption is that data points that are close to each other are more similar or related than data
points that are farther apart.
A dendrogram, a tree-like figure produced by hierarchical clustering, depicts the hierarchical
relationships between groups. Individual data points are located at the bottom of the dendrogram,
while the largest clusters, which include all the data points, are located at the top. In order to
generate different numbers of clusters, the dendrogram can be sliced at various heights.
The dendrogram is created by iteratively merging or splitting clusters based on a measure of
similarity or distance between data points. Clusters are divided or merged repeatedly until all
data points are contained within a single cluster, or until the predetermined number of clusters is
attained.
To estimate a suitable number of clusters, we can inspect the dendrogram and find the height at
which its branches separate into distinct clusters. Slicing the dendrogram at this height yields
that number of clusters, as the sketch below shows.
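For instance, with SciPy one can build the linkage tree and then slice it at a chosen height to obtain flat cluster labels; the dataset and the distance threshold t below are illustrative assumptions.
Python3
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

# build the hierarchy with Ward linkage
Z = linkage(X, 'ward')

# slice the dendrogram at height t = 5 to get flat cluster labels
labels = fcluster(Z, t=5, criterion='distance')
print(labels)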
Types of Hierarchical Clustering
There are two types of hierarchical clustering: agglomerative (bottom-up) and divisive (top-down).
Hierarchical Agglomerative Clustering
It is also known as the bottom-up approach: each data point starts in its own cluster, and the
closest pairs of clusters are merged as we move up the hierarchy.
Steps:
Consider each letter (A–F) as a single cluster and calculate the distance of each cluster
from all the other clusters.
In the second step, comparable clusters are merged together to form a single cluster.
Let's say clusters (B) and (C) are very similar to each other, so we merge them in this
step, and similarly clusters (D) and (E); at last, we get the clusters
[(A), (BC), (DE), (F)].
We recalculate the proximities according to the algorithm and merge the two nearest
clusters ((DE) and (F)) together to form the new clusters [(A), (BC), (DEF)].
Repeating the same process, the clusters (BC) and (DEF) are now the closest and are merged
together to form a new cluster. We're left with the clusters [(A), (BCDEF)].
At last, the two remaining clusters are merged together to form a single cluster
[(ABCDEF)].
Python implementation of the above algorithm using the scikit-learn library:
Python3
from sklearn.cluster import AgglomerativeClustering
import numpy as np

# randomly chosen dataset
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

clustering = AgglomerativeClustering(n_clusters=2).fit(X)
print(clustering.labels_)
Output:
[1 1 1 0 0 0]
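AgglomerativeClustering uses Ward linkage by default; other merge criteria can be selected through the linkage parameter. A small sketch on the same toy data (the comparison itself is illustrative, not from the original text):
Python3
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

# compare the standard linkage criteria on the same data
for link in ['ward', 'complete', 'average', 'single']:
    model = AgglomerativeClustering(n_clusters=2, linkage=link).fit(X)
    print(link, model.labels_)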
Hierarchical Divisive clustering
It is also known as the top-down approach. This algorithm also does not require the number of
clusters to be prespecified. Top-down clustering requires a method for splitting a cluster: it
starts with the whole dataset in one cluster and proceeds by splitting clusters recursively
until each individual data point is in its own singleton cluster.
Algorithm:
given a dataset (d1, d2, d3, ..., dN) of size N
at the top we have all the data in one cluster
the cluster is split using a flat clustering method, e.g., k-means
repeat
    choose the best cluster among all the clusters to split
    split that cluster with the flat clustering algorithm
until each data point is in its own singleton cluster
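scikit-learn has no dedicated divisive-clustering estimator, so the sketch below approximates the pseudocode above by repeatedly bisecting a cluster with k-means. Choosing the largest cluster as the "best cluster to split" is a simplifying assumption, as is the toy dataset.
Python3
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

# start with all points in one cluster (stored as indices into X)
clusters = [list(range(len(X)))]

while any(len(c) > 1 for c in clusters):
    # choose the best cluster to split (here: simply the largest)
    c = max((c for c in clusters if len(c) > 1), key=len)
    clusters.remove(c)
    # split it with a flat clustering method (k-means with k = 2)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[c])
    clusters.append([idx for idx, lab in zip(c, labels) if lab == 0])
    clusters.append([idx for idx, lab in zip(c, labels) if lab == 1])

print(clusters)  # every point ends in its own singleton cluster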
Implementation code to plot the dendrogram:
Python3
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# toy dataset (values chosen for illustration)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# build the hierarchy with Ward linkage
Z = linkage(X, 'ward')

# Plot dendrogram
dendrogram(Z)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data point')
plt.ylabel('Distance')
plt.show()
Output:
[Figure: Hierarchical clustering dendrogram]
Kohonen Neural Networks
- Kohonen neural networks are used because they are a relatively simple
network to construct that can be trained very rapidly.
- The Kohonen neural network contains only an input and an output layer of
neurons. There is no hidden layer in a Kohonen neural network.
- It differs from other networks both in how it is trained and in how it recalls a
pattern. The Kohonen neural network does not use any sort of activation function
or bias weight.
- Output from the Kohonen neural network does not consist of the output of
several neurons.
- When a pattern is presented to a Kohonen network, one of the output
neurons is selected as the "winner". This "winning" neuron is the output from
the Kohonen network.
- Often these "winning" neurons represent groups in the data that is
presented to the Kohonen network.
- The most significant difference between the Kohonen neural
network and the feed-forward backpropagation neural
network is that the Kohonen network is trained in an
unsupervised mode.
Difference between the Kohonen neural network and the feed-forward backpropagation
neural network:
Neural networks with only two layers can be applied only to linearly separable
problems, and this is the case with the Kohonen neural network. Kohonen neural
networks are used despite this because they are relatively simple networks to
construct and can be trained very rapidly.
Algorithm
Training:
Step 1: Initialize the weights wij with small random values. Initialize the learning
rate α.
Step 2: For each output unit j, calculate the squared Euclidean distance:
D(j) = Σ (wij – xi)^2, where i = 1 to n and j = 1 to m
Step 3: Find the index J for which D(j) is minimum; J is the winning unit.
Step 4: For each unit j within a specific neighborhood of J, and for all i, calculate the new
weights:
wij(new) = wij(old) + α[xi – wij(old)]
Step 5: Update the learning rate using:
α(t+1) = 0.5 * α(t)
Step 6: Test the stopping condition.
Below is the implementation of the above approach:
Python3
import math

class SOM:

    # Compute the winning output unit for a sample
    # by squared Euclidean distance
    def winner(self, weights, sample):
        D0 = 0
        D1 = 0
        for i in range(len(sample)):
            D0 = D0 + math.pow((sample[i] - weights[0][i]), 2)
            D1 = D1 + math.pow((sample[i] - weights[1][i]), 2)
        # the unit with the smaller distance is the winner
        if D0 < D1:
            return 0
        else:
            return 1

    # Update the weights of the winning unit J
    def update(self, weights, sample, J, alpha):
        for i in range(len(weights[0])):
            weights[J][i] = weights[J][i] + alpha * (sample[i] - weights[J][i])
        return weights

# Driver code
def main():
    # Training Examples ( m, n ); values assumed for illustration
    T = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
    m, n = len(T), len(T[0])

    # weight initialization ( n, C ); values assumed for illustration
    weights = [[0.2, 0.6, 0.5, 0.9], [0.8, 0.4, 0.7, 0.3]]

    # training
    ob = SOM()
    epochs = 3
    alpha = 0.5

    for i in range(epochs):
        for j in range(m):
            # training sample
            sample = T[j]
            # compute the winning unit
            J = ob.winner(weights, sample)
            # update the winning unit's weights
            weights = ob.update(weights, sample, J, alpha)

    # classify a test sample
    s = [0, 0, 0, 1]
    J = ob.winner(weights, s)

    print("Test Sample s belongs to Cluster :", J)
    print("Trained weights :", weights)

if __name__ == "__main__":
    main()
Output:
Test Sample s belongs to Cluster : 0
Trained weights : [[0.6000000000000001, 0.8, 0.5, 0.9], [0.3333984375,
0.0666015625, 0.7, 0.3]]
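Note that the driver code above keeps alpha fixed at 0.5 for all epochs, while Step 5 of the algorithm halves the learning rate over time. Below is a minimal sketch of a training loop with the decay applied, which could replace the loop in main(); applying the decay once per epoch is an assumption, since Step 5 only states α(t+1) = 0.5 * α(t).
Python3
alpha = 0.5
for i in range(epochs):
    for j in range(m):
        sample = T[j]
        J = ob.winner(weights, sample)
        weights = ob.update(weights, sample, J, alpha)
    # Step 5: halve the learning rate after each epoch (placement assumed)
    alpha = 0.5 * alpha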