
Ch 5 - Advanced Analytical Theory and Methods

Clustering Analysis – Tutorial


Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data.

Copyright © 2014 EMC Corporation. All Rights Reserved. Module 4: Analytics Theory/Methods 1
Equations (two dimensions)

● In two dimensions, the distance, d, between any two points, (x1, y1) and (x2, y2), in the Cartesian plane is expressed by using the Euclidean distance measure:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

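● In two dimensions, the centroid (xc, yc) of the m points in a k-means cluster is calculated as follows:

$$(x_c, y_c) = \left( \frac{\sum_{i=1}^{m} x_i}{m}, \frac{\sum_{i=1}^{m} y_i}{m} \right)$$

○ m is the number of points in the cluster, and (xi, yi) is the ith point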
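The two-dimensional distance and centroid calculations can be sketched in a few lines of Python. This is a minimal illustration only: the function names euclidean_2d and centroid_2d and the sample points are assumptions made for this example and do not come from the slides.

```python
import math

def euclidean_2d(p, q):
    """Euclidean distance between two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

def centroid_2d(points):
    """Centroid (xc, yc): the mean x and mean y of the m points."""
    m = len(points)
    xc = sum(x for x, _ in points) / m
    yc = sum(y for _, y in points) / m
    return (xc, yc)

# Made-up sample data, not taken from the slides.
cluster = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
print(euclidean_2d((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(centroid_2d(cluster))                   # (2.0, 2.0)
```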

Clustering Equations (n dimensions)

• In n dimensions, if there are M objects, where each object is described by n attributes, then object i is described by (pi1, pi2, …, pin) for i = 1, 2, ..., M.
• To expand the earlier process to find the k clusters from two dimensions to n dimensions, the following equations provide the formulas for calculating the distances and the locations of the centroids for n >= 1.
• For a given point, pi, at (pi1, pi2, …, pin) and a centroid, q, located at (q1, q2, …, qn), the distance, d, between pi and q is expressed as:

$$d(p_i, q) = \sqrt{\sum_{j=1}^{n} (p_{ij} - q_j)^2}$$
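The n-dimensional distance between a point and a centroid can be sketched as follows. The function name distance and the 3-attribute sample values are hypothetical, chosen only to illustrate the formula.

```python
import math

def distance(p, q):
    """Euclidean distance between point p and centroid q, each with n attributes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of attributes")
    return math.sqrt(sum((pj - qj) ** 2 for pj, qj in zip(p, q)))

# Hypothetical point and centroid with n = 3 attributes.
p_i = (1.0, 2.0, 2.0)
q = (0.0, 0.0, 0.0)
print(distance(p_i, q))  # 3.0
```

On Python 3.8 or later, the standard library function math.dist(p, q) computes the same quantity.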



Clustering Equations (n dimensions)

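● The centroid, q, of a cluster of m points, (pi1, pi2, …, pin) for i = 1, 2, ..., m, is calculated as shown:

$$(q_1, q_2, \ldots, q_n) = \left( \frac{\sum_{i=1}^{m} p_{i1}}{m}, \frac{\sum_{i=1}^{m} p_{i2}}{m}, \ldots, \frac{\sum_{i=1}^{m} p_{in}}{m} \right)$$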

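● The Within Sum of Squares (WSS):

$$WSS = \sum_{i=1}^{M} d(p_i, q^{(i)})^2 = \sum_{i=1}^{M} \sum_{j=1}^{n} \left( p_{ij} - q_j^{(i)} \right)^2$$

■ WSS is the sum of the squares of the distances between each data point and the closest centroid. The term q(i) indicates the closest centroid that is associated with the ith point.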
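As a rough sketch, the n-dimensional centroid and WSS calculations might look like this in Python. The helper names (centroid, squared_distance, wss) and the four sample points are assumptions made for this illustration, not values from the slides.

```python
def centroid(points):
    """Attribute-wise mean of the m points in one cluster."""
    m = len(points)
    return tuple(sum(column) / m for column in zip(*points))

def squared_distance(p, q):
    """Squared Euclidean distance between point p and centroid q."""
    return sum((pj - qj) ** 2 for pj, qj in zip(p, q))

def wss(points, centroids):
    """Sum over all points of the squared distance to the closest centroid."""
    return sum(min(squared_distance(p, q) for q in centroids) for p in points)

# Made-up data: the first two points form one cluster, the last two another.
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
centroids = [centroid(points[:2]), centroid(points[2:])]
print(centroids)               # [(1.0, 2.0), (6.0, 5.0)]
print(wss(points, centroids))  # 4.0
```

Each point here lies exactly one unit from its closest centroid, so WSS is 1 + 1 + 1 + 1 = 4.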



Clustering Algorithm



Centroids

- The distances for all the other points are calculated in the same way

- Allocation of each point to its cluster
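The allocation step can be sketched as follows: each point is assigned to the centroid with the smallest distance. The function name assign, the sample points, and the two starting centroids are hypothetical and stand in for the tutorial's own data, which is not reproduced in the text. Ties are broken in favor of the lower cluster index, which is a choice made for this sketch.

```python
def assign(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    labels = []
    for p in points:
        # Squared distances preserve the ordering of the true distances.
        d2 = [sum((pj - qj) ** 2 for pj, qj in zip(p, q)) for q in centroids]
        labels.append(d2.index(min(d2)))  # first minimum wins on ties
    return labels

# Made-up data for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (3.5, 5.0)]
centroids = [(1.0, 1.0), (5.0, 7.0)]
print(assign(points, centroids))  # [0, 0, 1, 1]
```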

Iteration 2
Centroids

Iteration 3
Centroids

Iteration 4
Centroids

- Same allocation of clusters as in Iteration 3, so the algorithm has converged
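The iterate-until-the-allocation-stops-changing procedure can be sketched as a small loop. This is one possible arrangement under stated assumptions, not the exact procedure on the slides: the names closest, mean, and kmeans, the max_iter safeguard, the rule of keeping an old centroid when a cluster becomes empty, and the four sample points are all choices made for this example.

```python
def closest(p, centroids):
    """Index of the centroid nearest to point p."""
    d2 = [sum((pj - qj) ** 2 for pj, qj in zip(p, q)) for q in centroids]
    return d2.index(min(d2))

def mean(points):
    """Attribute-wise mean of a non-empty list of points."""
    m = len(points)
    return tuple(sum(column) / m for column in zip(*points))

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate allocation and centroid updates until the allocation repeats."""
    centroids = list(initial_centroids)
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [closest(p, centroids) for p in points]
        if new_labels == labels:  # same allocation as before: converged
            return centroids, labels, iteration
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean(members)
    return centroids, labels, max_iter

# Made-up data and starting centroids, not the values used on the slides.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 4.0)]
centroids, labels, iterations = kmeans(points, [(1.0, 1.0), (2.0, 1.0)])
print(centroids)   # [(1.5, 1.0), (4.5, 3.5)]
print(labels)      # [0, 0, 1, 1]
print(iterations)  # 3
```

With this sample data the allocation found in the second pass is repeated in the third, so the loop stops there.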


Example
Use the following plot to draw the final clusters


Extra Examples

• https://youtu.be/_S5tvagaQRU?t=174 (Video)
• https://www.youtube.com/watch?v=wt-X61BnUCA (Video)
• http://mnemstudio.org/clustering-k-means-example-1.htm (2d)
• https://www.saedsayad.com/clustering_kmeans.htm (2d)
• https://www.datascience.com/blog/k-means-clustering (Python)
• https://pythonprogramminglanguage.com/kmeans-elbow-method/ (WSS in Python)
