K Means Clustering Algorithm: Explained: Dni Institute
DnI Institute
Build Data and Decision Science Experience
One of the most frequently used unsupervised algorithms is K Means. K Means Clustering is an
exploratory data analysis technique. It is a non-hierarchical method of grouping objects together.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the
same group (called a cluster) are more similar (in some sense or another) to each other than to those
in other groups (clusters).
In this blog, we aim to explain the algorithm in simple steps, with an example.
Business Scenario: We have height and weight information for a set of objects. Using these two
variables, we need to group the objects into clusters.
dni-institute.in/blogs/k-means-clustering-algorithm-explained/ 1/17
9/26/2019 K Means Clustering Algorithm: Explained – DnI Institute
If you look at the above chart, two clusters/segments are visible, and we want the K Means
algorithm to identify them.
Data Sample
Height Weight
185 72
170 56
168 60
179 68
182 72
188 77
180 71
180 70
183 84
180 88
180 67
177 76
Step 1: Input
Dataset, Clustering Variables and Maximum Number of Clusters (K in K Means Clustering)
In this dataset, only two variables, height and weight, are considered for clustering.
Height Weight
185 72
170 56
168 60
179 68
182 72
188 77
180 71
180 70
183 84
180 88
180 67
177 76
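The Step 1 inputs can be written down directly. This is a minimal sketch in Python, assuming K = 2 (suggested by the two visible clusters; the article does not state the value explicitly). The variable names are illustrative only.

```python
# Step 1 inputs: the 12 (height, weight) observations from the data sample.
observations = [
    (185, 72), (170, 56), (168, 60), (179, 68),
    (182, 72), (188, 77), (180, 71), (180, 70),
    (183, 84), (180, 88), (180, 67), (177, 76),
]

# Clustering variables: both columns are used.
variables = ("Height", "Weight")

# Assumption: K = 2, matching the two clusters visible in the chart.
K = 2

print(len(observations), "observations,", len(variables), "variables, K =", K)
```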
Cluster  Height  Weight
K1       185     72
K2       170     56
Height Weight
185 72
170 56
K1 185 72
K2 170 56
These two observations are assigned first because their assignment is already known: they were
chosen as the initial centroids. For the same reason, the centroids do not change at this point.
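A small sketch of this starting point, assuming (as the centroid table suggests) that the first two observations serve as the initial centroids. Each one is at distance zero from its own centroid, which is why its assignment is already known. `math.dist` needs Python 3.8 or later, and the names used here are illustrative.

```python
import math

# Assumption: the first two observations are used as the initial centroids.
centroids = {"K1": (185, 72), "K2": (170, 56)}

assignments = {}
for obs in [(185, 72), (170, 56)]:
    # Euclidean distance from this observation to every centroid.
    distances = {name: math.dist(obs, c) for name, c in centroids.items()}
    # The nearest centroid wins; here that is the observation itself.
    assignments[obs] = min(distances, key=distances.get)
    print(obs, distances, "->", assignments[obs])
```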
Step 4: Move on to the next observation and calculate Euclidean Distance
Height Weight
168 60
The distance from cluster 2 is the smallest, so the observation is assigned to cluster 2. Now revise
the cluster centroid: the mean Height and Weight of the cluster's members become the new centroid.
Only cluster 2 gained an observation, so only the centroid of cluster 2 is updated.
Updated cluster centroids
Cluster  Height             Weight
K=1      185                72
K=2      (170+168)/2 = 169  (56+60)/2 = 58
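The Step 4 arithmetic can be checked with a short sketch. It assumes the starting centroids (185, 72) and (170, 56) and recomputes the centroid as the plain mean of the cluster's members; the dictionary layout is an illustrative choice.

```python
import math

# State before Step 4: each cluster holds only its initial observation.
centroids = {"K1": (185, 72), "K2": (170, 56)}
members = {"K1": [(185, 72)], "K2": [(170, 56)]}

obs = (168, 60)

# Euclidean distance from the new observation to each current centroid.
distances = {name: math.dist(obs, c) for name, c in centroids.items()}
nearest = min(distances, key=distances.get)

# Assign the observation, then recompute that cluster's centroid
# as the column-wise mean of its members.
members[nearest].append(obs)
centroids[nearest] = tuple(sum(col) / len(col) for col in zip(*members[nearest]))

print({name: round(d, 2) for name, d in distances.items()})
print(nearest, centroids[nearest])
```

The distances come out at roughly 20.81 to cluster 1 and 4.47 to cluster 2, so the observation joins cluster 2 and its centroid becomes (169, 58).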
Step 5: Calculate the Euclidean Distance for the next observation, assign it to the cluster at
minimum Euclidean distance, and update the cluster centroids.
Next Observation
Height Weight
179 68
Updated cluster centroids
Cluster  Height             Weight
K=1      (185+179)/2 = 182  (72+68)/2 = 70
K=2      169                58
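Step 5 can be sketched the same way. This version uses a running-mean update, new = old + (x - old) / n, which gives the same result as averaging all members again. It assumes the state after Step 4: cluster 1 holds one observation at (185, 72) and cluster 2 holds two with centroid (169, 58). Under the mean-of-members rule from Step 4, the cluster 1 centroid works out to (182, 70).

```python
import math

# State after Step 4 (assumed): centroids and member counts per cluster.
centroids = {"K1": (185.0, 72.0), "K2": (169.0, 58.0)}
counts = {"K1": 1, "K2": 2}

obs = (179, 68)

# Euclidean distance from the new observation to each current centroid.
distances = {name: math.dist(obs, c) for name, c in centroids.items()}
nearest = min(distances, key=distances.get)

# Running-mean update of the chosen centroid: new = old + (x - old) / n.
counts[nearest] += 1
n = counts[nearest]
centroids[nearest] = tuple(c + (x - c) / n for c, x in zip(centroids[nearest], obs))

print({name: round(d, 2) for name, d in distances.items()})
print(nearest, centroids[nearest])
```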
In the next blog, we focus on creating clusters using R: K Means Clustering using R
Vishal Nigam
September 25, 2016 at 1:57 pm | Reply
DnI Institute
September 25, 2016 at 2:21 pm | Reply
Thanks Vishal
Nitesh
October 8, 2016 at 5:55 am | Reply
Very good example..
but there is a text mistake in Step 4, in the Euclidean distance from cluster 2
DnI Institute
October 8, 2016 at 6:13 am | Reply
Kumar P
October 14, 2016 at 9:22 pm | Reply
K-Means finds the best centroids by alternating between (1) assigning data points to clusters
based on the current centroids and (2) choosing centroids (points which are the center of a cluster)
based on the current assignment of data points to clusters.
One iteration:
1. Assign labels (clusters) to all observations
2. Calculate the new Centroid values using mean
Ref: http://stanford.edu/~cpiech/cs221/handouts/kmeans.html
In your example you are updating the centroid values even before assigning all the
observations to clusters.
Please clarify.
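For comparison, one iteration of the batch scheme described in this comment might look like the sketch below: every observation is labelled first, and only then are the centroids recomputed. It assumes the article's data and the same two starting centroids; none of the names come from the article or the linked handout.

```python
import math

points = [
    (185, 72), (170, 56), (168, 60), (179, 68),
    (182, 72), (188, 77), (180, 71), (180, 70),
    (183, 84), (180, 88), (180, 67), (177, 76),
]
# Assumption: the same starting centroids as in the article.
centroids = [(185, 72), (170, 56)]

# 1. Assign a label (index of the nearest centroid) to every observation.
labels = [
    min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
    for p in points
]

# 2. Recompute each centroid as the mean of the observations with that label.
#    (No cluster is empty for this data, so no empty-cluster handling is shown.)
new_centroids = []
for j in range(len(centroids)):
    cluster = [p for p, label in zip(points, labels) if label == j]
    new_centroids.append(tuple(sum(col) / len(cluster) for col in zip(*cluster)))

print(labels)
print(new_centroids)
```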
DnI Institute
October 15, 2016 at 3:04 pm | Reply
Thanks Kumar for your comment. I do not think there is any overall approach-wise
disconnect between the steps we explained and those mentioned in the link. If you read Step 3,
it calculates the Euclidean Distance, assigns the observation to a cluster, and updates the
cluster centroids. Hope it helps.
Pranay A
March 7, 2018 at 4:19 pm | Reply
@DnI Institute,
from Step 3, the assignment is only done for the new data points, and the centroids are
updated. But what if the data points assigned in previous iterations have to change
from one cluster to the other due to the change in the centroids? I mean, the Euclidean
distance changes, right? So there is a possibility that a data point in one cluster is
closer to a data point in the other cluster than to a data point in its own cluster. I
hope my explanation is good.
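The concern raised here can be tested on the article's data. The sketch below runs the one-observation-at-a-time pass described in the article, then checks whether any observation ends up closer to the other cluster's final centroid. It measures distance to centroids (as k-means does) rather than between individual points, and all names are illustrative.

```python
import math

points = [
    (185, 72), (170, 56), (168, 60), (179, 68),
    (182, 72), (188, 77), (180, 71), (180, 70),
    (183, 84), (180, 88), (180, 67), (177, 76),
]

# Sequential pass: the first two observations seed the clusters, and each
# later observation updates its cluster's centroid immediately.
members = [[points[0]], [points[1]]]
centroids = [points[0], points[1]]
for p in points[2:]:
    j = min((0, 1), key=lambda k: math.dist(p, centroids[k]))
    members[j].append(p)
    centroids[j] = tuple(sum(col) / len(members[j]) for col in zip(*members[j]))

# Re-check: is any observation now closer to the other cluster's centroid?
moved = [
    p
    for j, cluster in enumerate(members)
    for p in cluster
    if math.dist(p, centroids[1 - j]) < math.dist(p, centroids[j])
]

print("final centroids:", centroids)
print("observations that would switch cluster:", moved)
```

For this particular dataset no observation would switch, but that is a property of the data rather than of the procedure, so a re-check of this kind is still needed in general.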
Van Tuyen
November 25, 2016 at 10:18 am | Reply
Great, Professional.
Your demo is perfect!
Thanks Bro.
Hazim
December 1, 2016 at 6:00 pm | Reply
Perfect.....
Mistake in calculation:
Step 5 - the updated centroid weight value is incorrect, I think.
DnI Institute
December 5, 2016 at 4:57 pm | Reply
Eliazar
January 8, 2017 at 11:51 pm | Reply
Noor
May 10, 2017 at 10:14 am | Reply
Anant Lalchandani
August 15, 2017 at 8:08 pm | Reply
Tirtha Chakraborty
October 6, 2017 at 3:11 pm | Reply
There's probably a calculation mistake in the updated centroid values in step 5. But, apart
from that, wonderfully explained. Such a complex thing made so easy!
Rakesh Mondal
November 10, 2017 at 6:46 pm | Reply
Devendra Shukla
November 20, 2017 at 10:31 am | Reply
Hello,
I have one doubt on K means clustering. How can a team work on the K means clustering
algorithm (team meaning in a real-time project)? If there are multiple values of K, multiple
clusterings will also be created, so can only one person work on K means, or how do we use
this in a real-time project?
DnI Institute
November 20, 2017 at 2:58 pm | Reply
When a k means clustering project is being done, multiple values of k are considered.
There are a few considerations in selecting the final clustering.
Devendra Shukla
November 21, 2017 at 6:46 am | Reply
Thanks, DnI Institute, for your reply. Do you have any link or site where K means is
used like this?
DnI Institute
November 25, 2017 at 12:52 am | Reply
These are practical steps and considerations, so you may not see a lot of info on the internet.
Tas
December 29, 2017 at 6:37 pm | Reply
DnI Institute
February 12, 2018 at 8:50 am | Reply
Thanks for the comments.. We have advised that it is directional and not a pure research
blog. Also, we have mentioned that all objects are reconsidered for the reassignment..
Vijeta
January 29, 2018 at 3:49 am | Reply
Tim
February 24, 2018 at 11:50 pm | Reply
I am afraid this is not the right version of k-means clustering. First, the centroid is revised every
time a new data point is assigned. That may be OK. The most serious problem is that after all
data points are assigned, the clustering ends. Assigning all data points is only one step in
k-means clustering; the next step is to update the centroids, and these two steps are repeated
until no data point changes cluster.
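The repeat-until-stable loop described in this comment can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the article's procedure: the function name `kmeans`, the `max_iter` cap, and the choice of the first two observations as starting centroids are all assumptions made for the example.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until no label changes."""
    centroids = list(initial_centroids)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            cluster = [p for p, label in zip(points, labels) if label == j]
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(cluster) for col in zip(*cluster))
    return labels, centroids, iteration

points = [
    (185, 72), (170, 56), (168, 60), (179, 68),
    (182, 72), (188, 77), (180, 71), (180, 70),
    (183, 84), (180, 88), (180, 67), (177, 76),
]
labels, centroids, iterations = kmeans(points, [(185, 72), (170, 56)])
print(labels, centroids, iterations)
```

On the article's twelve observations the labels stop changing on the second pass, with ten observations in the first cluster and two in the second.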
Aarti
May 3, 2018 at 4:38 am | Reply
I think there is a mistake in step 5, in the updated centroid: how do 182.8 and 72 come about?
There is a calculation mistake. But the rest of the steps are well explained.
Kleber
May 24, 2018 at 11:11 am | Reply
Good job !
Please fix the table in Step 4: the values used in the calculation of the distance to Cluster 2
are not correct (you used the Cluster 1 values again by mistake).
Abhijit Choudhary
August 20, 2018 at 3:24 am | Reply
This is great!!!
Manju Gupta
September 21, 2018 at 8:11 am | Reply
I appreciate your work on Data Science. It's such a wonderful read on Data Science course.
Keep sharing stuffs like this. I am also educating people on similar Data Science training so if
you are interested to know more you can watch this Data Science tutorial:-
https://www.youtube.com/watch?v=h_GnVUIISk0&
RAnu
September 25, 2018 at 8:44 am | Reply
Mahmoud Shaban
January 24, 2019 at 8:07 pm | Reply