
Clustering

Prof. Ankur Sinha


Indian Institute of Management Ahmedabad
Gujarat India
Clustering
• Grouping a set of data objects into groups based on similarity
• An example of unsupervised learning
• Data objects can be vectors of attributes describing an entity,
for example, a customer, location, or product
Examples
• Used in a variety of areas
– Marketing
– Urban planning
– Customer segmentation
– Product segmentation
– Seismology
Similarity Measure
• If two objects i and j are represented by
vectors x_i and x_j
– How do you measure similarity between the two
objects? A distance is a common choice (smaller distance means more similar):
• Euclidean distance
• Manhattan distance
• Mahalanobis distance
– The similarity measure can be chosen based on the application
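The three distances named above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names, the sample vectors, and the identity matrix passed as the inverse covariance are all assumptions chosen for the example. In practice the inverse covariance would be estimated from the data.

```python
import math

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def mahalanobis(x, y, cov_inv):
    """Distance weighted by an inverse covariance matrix (given as a list of rows)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    # Quadratic form d^T * cov_inv * d
    q = sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

# Hypothetical vectors, chosen only to give round numbers
xi, xj = (3, 4), (6, 8)
identity = [[1, 0], [0, 1]]  # assumed inverse covariance (uncorrelated, unit variance)

print(euclidean(xi, xj))              # 5.0
print(manhattan(xi, xj))              # 7
print(mahalanobis(xi, xj, identity))  # 5.0
```

With the identity matrix the Mahalanobis distance reduces to the Euclidean distance. A different inverse covariance rescales and decorrelates the attributes before the distance is taken.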
Similarity Measure
• Consider 10 customers with two attributes
– Attribute 1: Recent usage of services
– Attribute 2: Customer age
• Objective: Cluster the data into two clusters and design two marketing
campaigns for the two customer segments
[Figure: scatter plot of the 10 customers. x-axis: Usage of Service (× 10 minutes), 0 to 10; y-axis: Customer Age (× 10 years), 0 to 10]
Similarity Measure
• Consider 10 customers with two attributes
– Attribute 1: Usage of services
– Attribute 2: Customer age

[Figure: scatter plot of the 10 customers on 0 to 10 axes, split into two clusters]
– Cluster 1: (3,4), (2,6), (4,5), (4,7), (3,8)
– Cluster 2: (6,2), (7,2), (7,4), (8,4), (8,5)
Clustering approaches
• Hierarchical clustering
– Agglomerative
– Divisive
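[Figure: dendrogram over objects a, b, c, d, e. Agglomerative clustering (AGNES) runs from Step 0 to Step 4, merging a and b into ab, d and e into de, c and de into cde, and finally ab and cde into abcde. Divisive clustering (DIANA) runs the same steps in the opposite direction, from Step 0 to Step 4, splitting abcde back into single objects.]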
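The agglomerative (bottom-up) direction can be sketched as follows. This is a minimal illustration under stated assumptions: the one-dimensional positions assigned to a, b, c, d, e are hypothetical, and single linkage (distance between the closest pair of members) is one possible choice that the slides do not specify. The function name `agnes` is also only illustrative.

```python
def agnes(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    # Step 0: every object starts in its own cluster
    clusters = [[name] for name in points]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(points[p] - points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the two closest clusters and record the result
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append("".join(sorted(merged)))
    return history

# Hypothetical 1-D positions for the five objects
points = {"a": 1.0, "b": 1.5, "c": 5.5, "d": 7.0, "e": 7.8}
print(agnes(points))  # ['ab', 'de', 'cde', 'abcde']
```

Reading the merge history from right to left gives the order in which a divisive method would split the data, although a real divisive algorithm chooses its splits top-down rather than by replaying merges.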
Clustering approaches
• K-means Clustering
– Select K initial centroids randomly
– Assign each object to its nearest centroid based on the similarity
measure
– Compute each new centroid as the mean of the objects in its cluster
– Repeat the assignment and update steps until the assignments no longer
change
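The steps above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the function name `kmeans`, its parameters, and the use of squared Euclidean distance are assumptions. The ten points are the customer coordinates from the earlier example, and the starting centroids are fixed here only so that the run is reproducible; leaving `init` unset picks them at random from the data.

```python
import random

def kmeans(points, k, init=None, seed=None, max_iter=100):
    """Return (labels, centroids) after iterating assignment and update steps."""
    rng = random.Random(seed)
    # Select initial centroids: k random data points unless some are supplied
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Stop when the assignments no longer change
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

customers = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8),
             (6, 2), (7, 2), (7, 4), (8, 4), (8, 5)]

# Fixed starting centroids (an assumption, for a reproducible run)
labels, centroids = kmeans(customers, 2, init=[(3, 4), (6, 2)])
print(labels)     # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(centroids)  # [(3.2, 6.0), (7.2, 3.4)]
```

With random starting centroids the cluster numbering, and on less well-separated data the clusters themselves, can differ between runs, which is why K-means is often run several times.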
K-Means Clustering

[Figure: six panels showing successive K-means iterations from random centroids: (1) start with centroids randomly placed, (2) assign points to the centroids, (3) update centroids, (4) assign points to the new centroids, (5) update centroids, (6) assign points to the new centroids]

Continue until there is no change in the structure of the clusters
