Clustering Techniques


Two motivating examples:

The periodic table powered by machine learning: clustering groups elements with similar properties (e.g., the poor metals).
https://www.chemistryworld.com/opinion/machine-learning-mendeleevs-have-rediscovered-the-periodic-table/3010720.article

The Coronaviridae family: clustering viruses such as Covid-19 by similarity is helpful in vaccine development.
What is Clustering?

Clustering is the process of grouping a set of data objects into multiple groups, or clusters, so that objects within a cluster have high similarity but are very dissimilar to objects in other clusters.

Cluster analysis has been widely used in many applications such as business intelligence, image pattern recognition, web search, biology, and security.
Clustering Techniques

Example application: spam filtering for e-mail.

Clustering methods: Partitioning, Hierarchical, Density based, Grid based.

Given a set of n objects, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another.
Partitioning

Centroid-based clustering is the simplest of the clustering types in data mining. It works on the closeness of the data points to a chosen central value: the dataset is divided into a given number of clusters, each referenced by a vector of values (its centroid). Each input data point is compared against the centroids and joins the cluster with the minimal difference.

Pre-defining the number of clusters at the initial stage is the most crucial, yet most complicated, step of this approach. Despite that drawback, it is a widely used clustering approach for exploring and summarizing large datasets. The K-Means algorithm falls into this category.
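The assign-then-recompute loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production K-Means (no convergence check, fixed iteration count, random initial centroids):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means sketch: assign each point to the nearest
    centroid, then recompute centroids, for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # pick k distinct points to start
    for _ in range(iters):
        # Assignment step: each point joins the cluster whose
        # centroid is closest (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        # Relocation step: move each centroid to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = tuple(sum(xs) / len(c) for xs in zip(*c))
    return centroids, clusters

# Two well-separated groups of 2-D points.
pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(pts, k=2)
```

With well-separated groups like these, the iterative relocation settles on one centroid per group regardless of which two points are drawn initially.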
Clustering Techniques

Example application: document analysis.

Clustering methods: Partitioning, Hierarchical, Density based, Grid based.

A hierarchical method creates a hierarchical decomposition of the given set of data objects. It can be classified as agglomerative or divisive, based on how the hierarchical decomposition is formed.
Hierarchical Clustering

Hierarchical clustering, also known as connectivity-based clustering, is based on the principle that every object is connected to its neighbours depending on their proximity distance (degree of relationship). The clusters are represented in extensive hierarchical structures separated by the maximum distance required to connect the cluster parts.

The clusters are represented as dendrograms, where the X-axis represents the objects that do not merge while the Y-axis is the distance at which clusters merge. Similar data objects have minimal distance and fall in the same cluster, while dissimilar data objects are placed farther apart in the hierarchy.
Hierarchical Clustering - Types

Agglomerative (Bottom Up) and Divisive (Top Down).

Example hierarchy over objects 1-7:

1,2,3,4,5,6,7
1,2,3 | 4,5 | 6,7
1 | 2 | 3 | 4 | 5 | 6 | 7

Agglomerative clustering builds this tree from the bottom up by merging clusters; divisive clustering builds it from the top down by splitting them.
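The bottom-up (agglomerative) direction can be sketched as repeated merging of the two closest clusters. This sketch assumes single linkage (cluster distance = closest pair of points) and stops at k clusters rather than building the full tree:

```python
def agglomerative(points, k):
    """Minimal agglomerative sketch with single linkage: start with each
    point as its own cluster, merge the closest pair until k remain."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    clusters = [[p] for p in points]          # every object starts alone
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage
        # distance (closest pair of points, one from each cluster).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist2(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # merge the closest pair
    return clusters

# 1-D objects: two tight groups and one outlier.
pts = [(1,), (2,), (3,), (10,), (11,), (20,)]
clusters = agglomerative(pts, k=3)
```

Recording the distance at which each merge happens would give exactly the Y-axis of the dendrogram described on the next slide.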
Hierarchical Clustering - Dendrogram

A dendrogram is a diagram that shows the hierarchical relationship between objects. It is most commonly created as an output from hierarchical clustering. The main use of a dendrogram is to work out the best way to allocate objects to clusters.
Hierarchical Clustering - Dendrograms

[Example dendrogram clustering six countries: Canada, United States, Germany, France, United Kingdom, Australia.]

[Example dendrogram over 100 observations.]
Hierarchical Clustering - Similarity Measures

Euclidean distance (or Euclidean metric) is the "ordinary" straight-line distance between two points in Euclidean space. Manhattan distance between two points is the sum of the absolute differences of their Cartesian coordinates.

Example with points X2 = (1, 3) and X1 = (5, 6):

ED = √((x2 − x1)² + (y2 − y1)²)
ED = √((5 − 1)² + (6 − 3)²) = √(16 + 9) = 5

MD = |x2 − x1| + |y2 − y1|
MD = |5 − 1| + |6 − 3| = 4 + 3 = 7
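Both measures translate directly into code; the sketch below reproduces the slide's worked example with the points (1, 3) and (5, 6):

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# The slide's example points X2 = (1, 3) and X1 = (5, 6).
ed = euclidean((1, 3), (5, 6))   # sqrt(4**2 + 3**2) = 5.0
md = manhattan((1, 3), (5, 6))   # 4 + 3 = 7
```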
Clustering Techniques

Example application: traffic problems.

Clustering methods: Partitioning, Hierarchical, Density based, Grid based.

Most partitioning methods cluster objects based on the distance between objects. Such methods can find only spherical-shaped clusters and encounter difficulty in discovering clusters of arbitrary shapes.

Density based

Density-based clustering, e.g. DBSCAN (Density-Based Spatial Clustering of Applications with Noise), considers density ahead of distance. Data is clustered into regions of high concentration of data objects bounded by areas of low concentration. Each cluster formed is a maximal set of density-connected data points.
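The density idea can be sketched as follows: a point is a "core" point if at least min_pts points lie within radius eps of it, and clusters grow outward from core points; anything unreachable is noise. This is a simplified DBSCAN, not an optimized implementation (it scans all points for every neighbourhood query):

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: grow clusters from core points (points with
    at least min_pts neighbours within eps); leftovers are noise (-1)."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)   # None = unvisited, -1 = noise
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # not dense enough: provisionally noise
            continue
        labels[i] = cluster
        # Expand the cluster through density-connected neighbours.
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:         # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

# Two dense blobs plus one isolated noise point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

Note that no number of clusters is given up front, and the isolated point (5, 5) is labelled noise rather than forced into a cluster, which is exactly how density-based methods differ from partitioning.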
Clustering Techniques

Example application: retail customer segmentation.

Clustering methods: Partitioning, Hierarchical, Density based, Grid based.

Grid based

Grid-based methods quantize the object space into a finite number of cells that form a grid structure. Using grids is often an efficient approach to many spatial data mining problems, including clustering. Grid-based methods can also be integrated with other clustering methods such as density-based methods and hierarchical methods.
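A toy illustration of quantizing the space into cells, combined with a density criterion as the slide suggests: points are binned into square cells, cells with enough points are kept, and adjacent dense cells are merged into clusters. The cell size and density threshold here are arbitrary illustrative parameters:

```python
from collections import defaultdict

def grid_cluster(points, cell_size, min_density):
    """Minimal grid-based sketch: quantize 2-D points into square cells,
    keep cells holding at least min_density points, then merge
    neighbouring dense cells into clusters via flood fill."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    dense = {c for c, members in cells.items() if len(members) >= min_density}

    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        # Flood fill over the 8-connected neighbourhood of dense cells.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            cx, cy = stack.pop()
            group.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(group)
    return clusters

pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (5.1, 5.1), (5.2, 5.3), (5.3, 5.2)]
clusters = grid_cluster(pts, cell_size=1.0, min_density=2)
```

The efficiency win is that after the single binning pass, all work happens on the (small, fixed) set of cells rather than on the raw points.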
Other Clustering Techniques - Constraint-Based

The clustering process, in general, is based on the approach that the data can be divided into an optimal number of "unknown" groups. The underlying stages of all clustering algorithms are to find those hidden patterns and similarities without intervention or predefined conditions. However, in certain business scenarios we might be required to partition the data based on certain constraints. Here is where a supervised version of clustering machine learning techniques comes into play.

A constraint is defined as the desired properties of the clustering results, or a user's expectation of the clusters so formed. This can be in terms of a fixed number of clusters, the cluster size, or important dimensions (variables) that are required for the clustering process.
Other Clustering Techniques - Distribution-Based

Distribution-based clustering uses statistical distributions to model the data objects. A cluster includes the data objects that have a higher probability of belonging to it. Each cluster has a central point; the greater the distance of a data point from the central point, the lower its probability of being included in the cluster.
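The probability idea can be illustrated with the most common choice of distribution, the normal (Gaussian). This sketch assumes two hypothetical 1-D clusters with known means and standard deviations (a full method such as Gaussian mixture modelling would also fit these parameters from the data):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution: the statistical model each
    cluster assumes in this sketch; highest at the central point mu."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def assign(x, components):
    """Assign x to the component (cluster) under which it is most probable."""
    return max(range(len(components)), key=lambda i: gaussian_pdf(x, *components[i]))

# Two hypothetical clusters modelled as normals: N(0, 1) and N(10, 1).
comps = [(0.0, 1.0), (10.0, 1.0)]
labels = [assign(x, comps) for x in (-0.5, 0.3, 9.5, 10.2)]
```

As the slide states, the further a point lies from a cluster's central point (here its mean), the lower its density under that cluster's distribution, and so the less likely it is to be assigned there.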

A constraint is defined as the desired properties of the


clustering results or a user’s expectation of the clusters so
formed – this can be in terms of a fixed number of clusters, the
cluster size, or important dimensions (variables) that are
required for the clustering process.
