
Clustering Techniques

(K-Means, Hierarchical)



K Means Clustering

K Means Clustering is an unsupervised learning algorithm that attempts to group similar data points together into clusters.

So what does a typical clustering problem look like?
● Cluster Similar Documents
● Cluster Customers based on Features
● Market Segmentation
● Identify similar physical groups



K Means Clustering

● The overall goal is to divide data into distinct groups such that observations within each group are similar to each other
K Means Clustering

The K Means Algorithm


● Choose a number of clusters, "K"
● Randomly assign each point to a cluster
● Until clusters stop changing, repeat the following:
○ For each cluster, compute the cluster centroid by taking the mean vector of the points in the cluster
○ Assign each data point to the cluster whose centroid is closest
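To make these steps concrete, here is a minimal NumPy sketch of the loop (Lloyd's algorithm). The toy data, the choice of k, and the iteration cap are illustrative assumptions:

```python
import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Minimal K Means sketch: X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Start from k random observations of the data as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the cluster with the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean vector of its cluster's points
        # (assumes no cluster goes empty, which is fine for a sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop once the clusters (centroids) stop changing
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on toy 2-D data
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = k_means(X, k=2)
```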
Choosing a K Value

● There is no easy answer for choosing a "best" K value
● One way is the elbow method:
First, compute the sum of squared errors (SSE) for several values of k (for example 2, 4, 6, 8, etc.).
The SSE is defined as the sum of the squared distances between each member of a cluster and its centroid.
Choosing a K Value
If you plot k against the SSE, you will see that the error decreases as k gets larger; this is because when the number of clusters increases, the clusters are smaller, so the distortion is also smaller.
The idea of the elbow method is to choose the k at which the SSE stops decreasing abruptly.
This produces an "elbow" effect in the graph.
[Figure: SSE plotted against k, showing the elbow point]
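A hedged sketch of the elbow method with scikit-learn; the synthetic data and the candidate k values are assumptions for the demo:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 true clusters (an assumption for illustration)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

ks = range(2, 10)
sse = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is scikit-learn's name for the SSE defined above
    sse.append(km.inertia_)

plt.plot(list(ks), sse, marker="o")
plt.xlabel("k")
plt.ylabel("SSE (inertia)")
plt.title("Elbow method")
plt.show()
```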
Clustering Techniques
[Segmentation]
Why and Where?
What we have done so far

● We had a definite response
● We had data points which possibly led to that response
● We extracted information through these modelling techniques about how the data points contributed to that response
No Response?

● Now what if we don't really have a response? We just have data. Is that any good? What kind of information would you be interested in extracting?
● Unsupervised Learning?
In absence of a target

● We can try to find out if there is some pattern in the data.
● What do we mean by pattern?
● If all the observations or data points are similar, all the information we can extract is by summarizing the data.
Contd..

● There would be more information if some of the data points are different from others, or, in other words, if there exist some groups within the population, different from each other.
● Examples?
Class Case : Fine Wine
Wine Tasters

● Wine tasters are used to rate wines on various parameters.
● Not only is it becoming tough to find good wine tasters; it is almost impossible to get consistent feedback from multiple wine tasters.
● For mass-production companies, this process needs to be automated.
Segmentation to Rescue

● Instead of relying on the subjective opinions of wine tasters, we can measure the chemical properties of wines.
● We can then group similar wines together based on their chemical properties (a small sketch follows).
● The final labeling of these groups, though, will have to be a manual process.
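As a hedged sketch of this idea, using scikit-learn's bundled wine chemical-properties dataset as a stand-in for the class case:

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# 13 chemical measurements (alcohol, malic acid, ...) for 178 wines
X = load_wine().data

# Chemical properties live on very different scales, so standardize first
X_std = StandardScaler().fit_transform(X)

# Group similar wines; deciding what each group *means* stays manual
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)
print(groups[:20])
```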
Quantifying Difference
What do you mean by different?

● How do you "quantify" the difference? How do you make the groups distinguishable based on data, and more importantly on "numbers"?
A small example
● Let's plot this small data set and try to figure out if there is a way to quantify our intuition.
So distance it is then. Scale matters?
● As it turns out, scale matters, especially if the variables in consideration are measured on different scales.
● The way out? Standardization is the saviour again!
● Or is it?
● Although standardization is recommended, do we always standardize?
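To make this concrete, here is a small hedged sketch; the two variables (income in rupees, age in years) are invented for illustration. It shows how Euclidean distance is dominated by the larger-scale variable until we standardize:

```python
import numpy as np

# Two hypothetical customers: (income in rupees, age in years)
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])

# Raw Euclidean distance: the income column dominates completely
raw_dist = np.linalg.norm(a - b)  # ~1000.6; the 35-year age gap barely registers

# Standardize each variable (z-score) using column means and std devs
X = np.vstack([a, b])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std_dist = np.linalg.norm(Z[0] - Z[1])  # now both variables contribute equally

print(raw_dist, std_dist)
```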
Combining Groups
Aggregation of Groups: Methods
● Linkage Methods (see the sketch after this list)
○ Single
○ Complete
○ Average
○ Centroid
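A hedged SciPy illustration of these linkage options; the random data is a stand-in, and any of the four method names can be swapped in:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

# method can be "single", "complete", "average", or "centroid"
Z = linkage(X, method="average")

# The dendrogram (tree diagram) shows how groups are merged step by step
dendrogram(Z)
plt.show()

# Cut the tree into, say, 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
```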
Methods contd..
Hierarchical Clustering
Problems with Hierarchical Clustering

● With increasing data, the process becomes too slow and resource intensive
● Tree diagrams (dendrograms) become too cluttered to make any sense out of them
● K-means clustering comes to the rescue
K-Means Clustering

● You tell the algorithm in advance how many clusters there are going to be in the data; that is the number K.
● You start with K random observations from the data, as K clusters.
Contd..

● The next observation is added to one of these clusters according to the criterion you have chosen.
● The centroid for that cluster is then recalculated; this becomes the new point from which distance to the cluster is calculated.
● This procedure is repeated until all the data points are assigned to clusters (a scikit-learn sketch follows).
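In practice you rarely code this loop yourself; here is a hedged scikit-learn sketch of the same procedure, on assumed toy data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=7)

# n_clusters is K; n_init reruns the algorithm with different random
# starting points and keeps the best result, mitigating seed sensitivity
km = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X)

print(km.labels_[:10])       # cluster assignment of each point
print(km.cluster_centers_)   # final centroids
print(km.inertia_)           # WSS / SSE of the fitted clustering
```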
K Means Clustering
Problems?

● We'll take care of these problems in the coming sections. Let's briefly touch upon them for now.
● Sensitive to the initial seeds [the first K points]
K?

● What value of K is appropriate?
● R² will keep increasing, and WSS will keep decreasing, with every increase in the value of K, until K equals the number of data points [we'll learn how WSS plays a role in clustering in the next section]
● Where do we stop increasing K?

Variable Selection and Results of Clustering
● Which variables should be selected? All of them? What can be the problem with including all the variables?
● Fine, my clustering is done; now what?
Sum of Squares & sons!

● For any given data set, the total sum of squares, or SST, is constant.
● SST = SSW + SSB when we break our data into multiple groups. If well-formed groups exist, SSB is much higher relative to SSW.
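A minimal sketch, on assumed toy data, of this decomposition in code; SSW is what scikit-learn exposes as inertia_, and the R² = SSB/SST ratio discussed on the next slides is computed as well:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=1)

km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# SST: squared distances of all points from the overall mean (constant)
sst = ((X - X.mean(axis=0)) ** 2).sum()

# SSW: squared distances of points from their own cluster centroid
ssw = km.inertia_

# SSB: the remainder, i.e. the between-group sum of squares
ssb = sst - ssw

r2 = ssb / sst  # approaches 1 as K approaches the number of points
print(f"SST={sst:.1f}  SSW={ssw:.1f}  SSB={ssb:.1f}  R2={r2:.3f}")
```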
Contd..

● As we increase the number of groups, SSW goes down.
● With an increase in K, if the fall in SSW is not rapid/steep, it implies that the higher number of groups is not resulting in better-formed groups.
Effect on R²
● R² = SSB/SST. As opposed to SSW, SSB goes up, and it equals SST if we have as many groups as data points. So, with increasing K, R² approaches one.
Effect on WSS – Deciding on No. of Clusters
Other Considerations
What to do with categorical variables?
● Make them ordinal if possible
● Or use dummy variables (see the sketch below)
● Keep in mind that those ordinal variables should not be circular. OK. Fine. Wait!... Circular?
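A hedged pandas sketch of both options; the size and day-of-week columns are invented examples, and the day column also shows why "circular" matters (encoding Sunday=7 next to Monday=1 puts two adjacent days far apart):

```python
import pandas as pd

df = pd.DataFrame({"size": ["small", "medium", "large"],
                   "day": ["Mon", "Sat", "Sun"]})

# Ordinal encoding works for 'size' because it has a natural order
size_order = {"small": 1, "medium": 2, "large": 3}
df["size_ord"] = df["size"].map(size_order)

# 'day' is circular (Sun sits next to Mon), so an ordinal 1..7 misleads
# distance-based clustering; dummy variables avoid imposing that order
df = pd.get_dummies(df, columns=["day"])
print(df)
```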
Is multicollinearity an issue?

● There is no hard and fast statistical theory at play here
● No hypothesis testing
● No beta {the symbol!}, no DV, no hassle; so why exactly can multicollinearity be an issue here?
Applications
Further applications of Cluster analysis

● Marketing & Media
● Banking & Insurance
● Medical and Pharmaceuticals
● Socio-economic
Types of Clustering

Broadly speaking, clustering can be divided into two subgroups:

Hard Clustering: In hard clustering, each data point either belongs to a cluster completely or not. For example, if a retail store segments its customers into 10 groups, each customer is put into exactly one of the 10 groups.

Soft Clustering: In soft clustering, instead of putting each data point into a separate cluster, a probability or likelihood of that data point being in each of those clusters is assigned. In the same scenario, each customer is assigned a probability of belonging to each of the retail store's 10 clusters.
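A hedged sketch of the contrast using scikit-learn, with KMeans as a hard clusterer and a Gaussian mixture model as one common soft clusterer; the toy data is assumed:

```python
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Hard clustering: each point gets exactly one cluster label
hard_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Soft clustering: each point gets a probability for every cluster
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # shape (200, 3); each row sums to 1

print(hard_labels[:5])
print(probs[:5].round(3))
```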
Difference between K Means and Hierarchical clustering

• Hierarchical clustering can't handle big data well, but K Means clustering can. This is because the time complexity of K Means is linear, i.e. O(n), while that of hierarchical clustering is quadratic, i.e. O(n²).
• In K Means clustering, since we start with a random choice of clusters, the results produced by running the algorithm multiple times may differ, whereas results are reproducible in hierarchical clustering.
• K Means is found to work well when the shape of the clusters is hyper-spherical (like a circle in 2D or a sphere in 3D).
• K Means clustering requires prior knowledge of K, i.e. the number of clusters you want to divide your data into. In hierarchical clustering, by contrast, you can stop at whatever number of clusters you find appropriate by interpreting the dendrogram.
Other Applications of Clustering
• Recommendation engines
• Market segmentation
• Social network analysis
• Search result grouping
• Medical imaging
• Image segmentation
• Anomaly detection
Let’s Implement
