
FOUNDATION TO DATA SCIENCE

Data Analytics Techniques

ADVANCED STATISTICAL METHODS

Day 12-13: Clustering Methods


K-Means-Hierarchical Method of Clustering

Prof. Dr. George Mathew


B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD
K-Means vs. K-NN
K-NN is a supervised machine learning algorithm, while
K-means is an unsupervised one.
K-NN is used for classification or regression, while
K-means is used for clustering.
K-NN is a lazy learner: it builds no model and defers all computation to prediction time.
K-means is an eager learner: it computes its cluster centers up front.
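
To make the "supervised" and "lazy" points concrete, here is a minimal K-NN sketch in plain Python. The function name `knn_predict`, the toy points, and the labels "low" and "high" are all made up for illustration. Note that the labels must be supplied, and that nothing is computed until a query arrives.

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    # No model is fitted beforehand: all the work happens at prediction time.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: K-NN cannot work without the labels.
knn_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
knn_labels = ["low", "low", "low", "high", "high", "high"]
prediction = knn_predict(knn_points, knn_labels, (1.1, 1.0))
```

K-means, by contrast, would receive only the points, with no labels at all.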
1. Partitioning Method
If we know how many clusters are required, we
use a partitioning method such as k-means.
2. Hierarchical Method
If we do not know the number of clusters in
advance, we use a hierarchical method.
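
As a rough illustration of the second case, the sketch below merges the two closest clusters repeatedly, so the number of clusters emerges from the data instead of being fixed beforehand. It is a simplified agglomerative procedure under assumptions of my own: single linkage, a hypothetical stopping threshold `max_gap`, and made-up points. It is not the code used in the course.

```python
def single_linkage(points, max_gap):
    """Merge the two closest clusters repeatedly until the closest pair
    is farther apart than max_gap (an illustrative stopping rule)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = [[p] for p in points]  # start with every point on its own
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance = closest pair of members.
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_gap:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 2-D observations; no cluster count is given.
hc_points = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.3, 7.9), (15.0, 1.0)]
groups = single_linkage(hc_points, max_gap=2.0)
```

With these points the procedure stops at three groups. A different `max_gap` would give a different number.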
The k-means clustering
K-means clustering is an unsupervised learning technique that partitions
n observations into K buckets of similar observations.

The algorithm is so called because it operates by computing the mean of
the features, that is, the variables on which we cluster things.
An example is segmenting customers by their average transaction amount and the
average number of products purchased per quarter. This mean value
then becomes the center of a cluster. The number K refers to the number of clusters:
the technique computes K means, and the data are
clustered around these K means.
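
The computation described above can be sketched in plain Python as follows. This is a minimal version of the standard assign-then-average loop, not the course's own code. The function name `kmeans`, the random initialization, and the customer figures (average transaction amount, average products per quarter) are assumptions made for illustration.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Start from k observations chosen at random as initial centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_labels == labels and new_centers == centers:
            break  # nothing changed, so the algorithm has converged
        labels, centers = new_labels, new_centers
    return centers, labels

# Hypothetical customers: (avg transaction amount, avg products per quarter).
customers = [(20.0, 2.0), (22.0, 3.0), (25.0, 2.0),
             (200.0, 10.0), (210.0, 12.0), (190.0, 11.0)]
centers, labels = kmeans(customers, k=2)
```

On this toy data the first three customers end up in one cluster and the last three in the other. Each center is the mean of its cluster's features.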
How do we choose this K? If we have some idea of what we are looking for, or of how
many clusters we expect or want, then we set K to that number before running
the algorithm.
Python Code:
11 ASAP GM DAP_Week11_K_MeansClustering
Hierarchical Method of Clustering
