
Clustering Techniques

(K-Means, Hierarchical)



K Means Clustering

K Means Clustering is an unsupervised learning algorithm that attempts to group similar data points together into clusters.

So what does a typical clustering problem look like?
● Cluster Similar Documents
● Cluster Customers based on Features
● Market Segmentation
● Identify similar physical groups



K Means Clustering

● The overall goal is to divide data into distinct groups such that observations within each group are similar to each other
K Means Clustering

The K Means Algorithm


● Choose a number of clusters, "K"
● Randomly assign each point to a cluster
● Until clusters stop changing, repeat the following:
○ For each cluster, compute the cluster centroid by taking the mean vector of the points in the cluster
○ Assign each data point to the cluster whose centroid is closest
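To make these steps concrete, here is a minimal NumPy sketch of the loop (Lloyd's algorithm). The toy data, the choice of k, and the iteration cap are illustrative assumptions:

```python
import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Minimal K Means sketch: X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Start from k random observations of the data as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the cluster with the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean vector of its cluster's points
        # (assumes no cluster goes empty, which is fine for a sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop once the clusters (centroids) stop changing
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on toy 2-D data
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = k_means(X, k=2)
```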
Choosing a K Value

● There is no easy answer for choosing a "best" K value
● One way is the elbow method:
First, compute the sum of squared errors (SSE) for several values of k (for example 2, 4, 6, 8, etc.).
The SSE is defined as the sum of the squared distances between each member of a cluster and its centroid.
Choosing a K Value
If you plot k against the SSE, you will see that the error decreases as k gets larger; this is because when the number of clusters increases, the clusters are smaller, so the distortion is also smaller.
The idea of the elbow method is to choose the k at which the SSE stops decreasing abruptly.
This produces an "elbow" effect in the graph.
[Figure: SSE plotted against k, showing the elbow point]
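A hedged sketch of the elbow method with scikit-learn; the synthetic data and the candidate k values are assumptions for the demo:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 true clusters (an assumption for illustration)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

ks = range(2, 10)
sse = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is scikit-learn's name for the SSE defined above
    sse.append(km.inertia_)

plt.plot(list(ks), sse, marker="o")
plt.xlabel("k")
plt.ylabel("SSE (inertia)")
plt.title("Elbow method")
plt.show()
```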
Clustering Techniques
[Segmentation]
Why and Where?
What we have done so far

● We had a definite response
● We had data points which possibly led to that response
● We extracted information through these modelling techniques about how the data points contributed to that response
No Response?

● Now what if we don't really have a response? We just have data. Is that any good? What kind of information would you be interested in extracting?
● Unsupervised Learning?
In absence of a target

● We can try to find out if there is some pattern in the data.
● What do we mean by pattern?
● If all the observations or data points are similar, all the information we can extract is by summarizing the data.
Contd..

● There would be more information if some of the data points are different from others, or, in other words, if there exist some groups within the population, different from each other.
● Examples?
Class Case : Fine Wine
Wine Tasters

● Wine tasters are used to rate wines on various parameters.
● Not only is it becoming tough to find good wine tasters; it is almost impossible to get consistent feedback from multiple wine tasters.
● For mass-production companies, this process needs to be automated.
Segmentation to Rescue

● Instead of relying on the subjective opinions of wine tasters, we can measure the chemical properties of wines.
● We can then group similar wines together based on their chemical properties (a small sketch follows).
● The final labeling of these groups, though, will have to be a manual process.
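As a hedged sketch of this idea, using scikit-learn's bundled wine chemical-properties dataset as a stand-in for the class case:

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# 13 chemical measurements (alcohol, malic acid, ...) for 178 wines
X = load_wine().data

# Chemical properties live on very different scales, so standardize first
X_std = StandardScaler().fit_transform(X)

# Group similar wines; deciding what each group *means* stays manual
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)
print(groups[:20])
```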
Quantifying Difference
What do you mean by different?

● How do you "quantify" the difference? How do you make the groups distinguishable based on data, and more importantly on "numbers"?
A small example
● Let's plot this small data set and try to figure out if there is a way to quantify our intuition.
So distance it is then. Scale matters?
● As it turns out, scale matters, especially if the variables in consideration are measured on different scales.
● The way out? Standardization is the saviour again!
● Or is it?
● Although standardization is recommended, do we always standardize?
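To make this concrete, here is a small hedged sketch; the two variables (income in rupees, age in years) are invented for illustration. It shows how Euclidean distance is dominated by the larger-scale variable until we standardize:

```python
import numpy as np

# Two hypothetical customers: (income in rupees, age in years)
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])

# Raw Euclidean distance: the income column dominates completely
raw_dist = np.linalg.norm(a - b)  # ~1000.6; the 35-year age gap barely registers

# Standardize each variable (z-score) using column means and std devs
X = np.vstack([a, b])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std_dist = np.linalg.norm(Z[0] - Z[1])  # now both variables contribute equally

print(raw_dist, std_dist)
```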
Combining Groups
Aggregation of Groups: Methods
● Linkage Methods (see the sketch after this list)
○ Single
○ Complete
○ Average
○ Centroid
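A hedged SciPy illustration of these linkage options; the random data is a stand-in, and any of the four method names can be swapped in:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

# method can be "single", "complete", "average", or "centroid"
Z = linkage(X, method="average")

# The dendrogram (tree diagram) shows how groups are merged step by step
dendrogram(Z)
plt.show()

# Cut the tree into, say, 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
```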
Methods contd..
Hierarchical Clustering
Problems with Hierarchical Clustering

● With increasing data, the process becomes too slow and resource intensive
● Tree diagrams (dendrograms) become too cluttered to make any sense out of them
● K-means clustering comes to the rescue
K-Means Clustering

● You tell the algorithm in advance how many clusters there are going to be in the data; that is the number K.
● You start with K random observations from the data, as K clusters.
Contd..

● The next observation is added to one of these clusters according to the criterion you have chosen.
● The centroid for that cluster is then recalculated; this becomes the new point from which distance to the cluster is calculated.
● This procedure is repeated until all the data points are assigned to clusters (a scikit-learn sketch follows).
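In practice you rarely code this loop yourself; here is a hedged scikit-learn sketch of the same procedure, on assumed toy data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=7)

# n_clusters is K; n_init reruns the algorithm with different random
# starting points and keeps the best result, mitigating seed sensitivity
km = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X)

print(km.labels_[:10])       # cluster assignment of each point
print(km.cluster_centers_)   # final centroids
print(km.inertia_)           # WSS / SSE of the fitted clustering
```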
K Means Clustering
Problems?

● We'll take care of these problems in the coming sections. Let's briefly touch upon them for now.
● Sensitive to the initial seeds [the first K points]
K?

● What value of K is appropriate?
● R² will keep increasing, and WSS will keep decreasing, with every increase in the value of K, until K equals the number of data points [we'll learn how WSS plays a role in clustering in the next section]
● Where do we stop increasing K?

Variable Selection and Results of Clustering
● Which variables should be selected? All of them? What can be the problem with including all the variables?
● Fine, my clustering is done; now what?
Sum of Squares & sons!

● For any given data set, the total sum of squares, or SST, is constant.
● SST = SSW + SSB when we break our data into multiple groups. If well-formed groups exist, SSB is much higher relative to SSW.
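A minimal sketch, on assumed toy data, of this decomposition in code; SSW is what scikit-learn exposes as inertia_, and the R² = SSB/SST ratio discussed on the next slides is computed as well:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=1)

km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# SST: squared distances of all points from the overall mean (constant)
sst = ((X - X.mean(axis=0)) ** 2).sum()

# SSW: squared distances of points from their own cluster centroid
ssw = km.inertia_

# SSB: the remainder, i.e. the between-group sum of squares
ssb = sst - ssw

r2 = ssb / sst  # approaches 1 as K approaches the number of points
print(f"SST={sst:.1f}  SSW={ssw:.1f}  SSB={ssb:.1f}  R2={r2:.3f}")
```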
Contd..

● As we increase the number of groups, SSW goes down.
● With an increase in K, if the fall in SSW is not rapid/steep, it implies that the higher number of groups is not resulting in better-formed groups.
Effect on R²
● R² = SSB/SST. As opposed to SSW, SSB goes up, and it equals SST if we have as many groups as data points. So, with increasing K, R² approaches one.
Effect on WSS – Deciding on No. of Clusters
Other Considerations
What to do with categorical variables?
● Make them ordinal if possible
● Or use dummy variables (see the sketch below)
● Keep in mind that those ordinal variables should not be circular. OK. Fine. Wait!... Circular?
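A hedged pandas sketch of both options; the size and day-of-week columns are invented examples, and the day column also shows why "circular" matters (encoding Sunday=7 next to Monday=1 puts two adjacent days far apart):

```python
import pandas as pd

df = pd.DataFrame({"size": ["small", "medium", "large"],
                   "day": ["Mon", "Sat", "Sun"]})

# Ordinal encoding works for 'size' because it has a natural order
size_order = {"small": 1, "medium": 2, "large": 3}
df["size_ord"] = df["size"].map(size_order)

# 'day' is circular (Sun sits next to Mon), so an ordinal 1..7 misleads
# distance-based clustering; dummy variables avoid imposing that order
df = pd.get_dummies(df, columns=["day"])
print(df)
```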
Is multicollinearity an issue?

● There is no hard and fast statistical theory at play here
● No hypothesis testing
● No beta {the symbol!}, no DV, no hassle; so why exactly can multicollinearity be an issue here?
Applications
Further applications of Cluster analysis

● Marketing & Media
● Banking & Insurance
● Medical and Pharmaceuticals
● Socio-economic
Types of Clustering

Broadly speaking, clustering can be divided into two subgroups:

Hard Clustering: In hard clustering, each data point either belongs to a cluster completely or not. For example, if a retail store segments its customers into 10 groups, each customer is put into exactly one of the 10 groups.

Soft Clustering: In soft clustering, instead of putting each data point into a separate cluster, a probability or likelihood of that data point being in each of those clusters is assigned. In the same scenario, each customer is assigned a probability of belonging to each of the retail store's 10 clusters.
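A hedged sketch of the contrast using scikit-learn, with KMeans as a hard clusterer and a Gaussian mixture model as one common soft clusterer; the toy data is assumed:

```python
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Hard clustering: each point gets exactly one cluster label
hard_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Soft clustering: each point gets a probability for every cluster
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # shape (200, 3); each row sums to 1

print(hard_labels[:5])
print(probs[:5].round(3))
```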
Difference between K Means and Hierarchical clustering

• Hierarchical clustering can't handle big data well, but K Means clustering can. This is because the time complexity of K Means is linear, i.e. O(n), while that of hierarchical clustering is quadratic, i.e. O(n²).
• In K Means clustering, since we start with a random choice of clusters, the results produced by running the algorithm multiple times may differ, whereas results are reproducible in hierarchical clustering.
• K Means is found to work well when the shape of the clusters is hyper-spherical (like a circle in 2D or a sphere in 3D).
• K Means clustering requires prior knowledge of K, i.e. the number of clusters you want to divide your data into. In hierarchical clustering, by contrast, you can stop at whatever number of clusters you find appropriate by interpreting the dendrogram.
Other Applications of Clustering
• Recommendation engines
• Market segmentation
• Social network analysis
• Search result grouping
• Medical imaging
• Image segmentation
• Anomaly detection
Let’s Implement
