Module 4-1
Contents
• Types of data in Cluster analysis
• Partitioning Methods (k-Means, k-Medoids)
• Hierarchical Methods (Agglomerative, Divisive)
MUMBAI UNIVERSITY EXAM QUESTIONS
Clustering
(figure: customer records grouped into four clusters)
• Cluster 1: Income: High, Children: 1, Car: Luxury
• Cluster 2: Income: Low, Children: 0, Car: Compact
• Cluster 3: Income: Medium, Children: 3, Car: Sedan
• Cluster 4: Income: Medium, Children: 2, Car: Truck
Clustering is ambiguous
⚫ There is no single correct or incorrect solution for a clustering problem; different methods can produce different, equally valid groupings.
Clustering Methods
1. Partitioning Method
2. Hierarchical Method
3. Density-based Method
4. Grid-based Method
5. Model-based Method
6. Constraint-based Method
Clustering
(figure: the original points shown with a hierarchical and a partitioned clustering)
A Partitional Clustering
• Finds all clusters at once
• A division of the data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
Partitioning Algorithms: Basic Concept
May 2017
⚫ D = {1, 2, 6, 7, 8, 10, 15, 17, 20}, k = 2
⚪ K1 = {2, 7, 10, 17}
⚪ K2 = {1, 6, 8, 15, 20}
⚫ Iteration 1: m1 = 9, m2 = 10
⚪ K1(9) = {1, 2, 6, 7, 8}
⚪ K2(10) = {10, 15, 17, 20}
⚫ Iteration 2: m1 = 4.8, m2 = 15.5
⚪ K1(4.8) = {1, 2, 6, 7, 8, 10}
⚪ K2(15.5) = {15, 17, 20}
⚫ Iteration 3: m1 = 5.6, m2 = 17.3
⚪ K1(5.6) = {1, 2, 6, 7, 8, 10}
⚪ K2(17.3) = {15, 17, 20}
⚫ Iteration 4: m1 = 5.6, m2 = 17.3 (no change in cluster membership, so the algorithm has converged)
⚪ K1(5.6) = {1, 2, 6, 7, 8, 10}
⚪ K2(17.3) = {15, 17, 20}
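The iteration trace above can be sketched in Python. The function name `kmeans_1d`, the convergence test, and the tie-breaking rule (an equidistant point stays with K1) are assumptions for illustration, not part of the slide:

```python
# 1-D k-means sketch reproducing the worked example above.
# Initial clusters K1 = {2, 7, 10, 17} and K2 = {1, 6, 8, 15, 20}
# follow the slide.

def kmeans_1d(k1, k2):
    """Iterate 1-D k-means from an initial two-cluster partition."""
    while True:
        m1 = sum(k1) / len(k1)  # mean of cluster K1
        m2 = sum(k2) / len(k2)  # mean of cluster K2
        # Reassign every point to the nearer of the two means.
        new_k1 = [x for x in k1 + k2 if abs(x - m1) <= abs(x - m2)]
        new_k2 = [x for x in k1 + k2 if abs(x - m1) > abs(x - m2)]
        if sorted(new_k1) == sorted(k1) and sorted(new_k2) == sorted(k2):
            # Membership unchanged: converged.
            return sorted(k1), sorted(k2), round(m1, 1), round(m2, 1)
        k1, k2 = new_k1, new_k2

clusters = kmeans_1d([2, 7, 10, 17], [1, 6, 8, 15, 20])
# The slide truncates the final mean 34/6 ≈ 5.67 to 5.6; round() gives 5.7.
print(clusters)  # ([1, 2, 6, 7, 8, 10], [15, 17, 20], 5.7, 17.3)
```

The final partition matches Iterations 3 and 4 of the trace above.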
K-Means (Graph)
Medicine  Weight  pH
A         1       1
B         2       1
C         4       3
D         5       4
Solution
• Plot the values on a graph.
• Mark any k centroids.
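The two steps above (plot the points, pick k centroids, then iterate) can be sketched for the medicine data. Choosing the first two points A and B as the initial centroids is an assumption for illustration; the slide says "any k centroids":

```python
from math import dist  # Euclidean distance (Python 3.8+)

# k-means on the medicine data above (Weight, pH per medicine).
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

def kmeans(points, centroids):
    while True:
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            i = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[i].append(name)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(points[n][d] for n in c) / len(c) for d in range(2))
            for c in clusters
        ]
        if new_centroids == centroids:  # no movement: converged
            return clusters, centroids
        centroids = new_centroids

clusters, centroids = kmeans(points, [(1, 1), (2, 1)])
print(clusters)    # [['A', 'B'], ['C', 'D']]
print(centroids)   # [(1.5, 1.0), (4.5, 3.5)]
```

With these initial centroids the algorithm converges to the clusters {A, B} and {C, D}.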
Questions on Partitioning Method (k-Means)
May 2018
May 2019
What is the problem of the k-Means method?
⚫ k-Means is sensitive to outliers, because the mean is easily distorted by extreme values. k-Medoids addresses this by choosing an actual object oᵢ to represent each cluster and minimizing the absolute-error criterion
E = Σᵢ₌₁ᵏ Σ_{p∈Cᵢ} |p − oᵢ|
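As a small check, the absolute-error criterion can be evaluated on the 1-D example from the k-means slides. Using the final means 5.6 and 17.3 (as rounded on the slide) as the cluster representatives is an assumption here; k-medoids proper would use an actual data point:

```python
# Absolute-error criterion: E = sum over clusters of sum of |p - rep|.
# Clusters taken from the 1-D worked example; using the rounded means
# as representatives is an assumption for illustration.
clusters_1d = {5.6: [1, 2, 6, 7, 8, 10], 17.3: [15, 17, 20]}

E = sum(abs(p - rep) for rep, pts in clusters_1d.items() for p in pts)
print(round(E, 1))  # 22.1
```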
Divisive clustering iteration:
• Select a cluster and split it into two sub-clusters.
• Repeat until each leaf cluster contains only one object.
(figure: {a, b, c, d, e} is split into {a, b} and {c, d, e}; {a, b} into {a} and {b}; {c, d, e} into {c} and {d, e}; {d, e} into {d} and {e})
Dendrogram
⚫ Dendrogram: a tree data structure which illustrates hierarchical clustering techniques.
⚫ Each level shows the clusters for that level.
⚪ Leaf – individual clusters
⚪ Root – one cluster
⚫ A cluster at level i is the union of its children clusters at level i+1.
Example
Solution
Example 2:
Calculation Steps:
Step 1: Draw the graph.
Step 2: Compute the distance matrix. The same (Euclidean distance) formula can be used for (p1,p3), (p1,p4), (p1,p5), (p1,p6).
The distance matrix is:
Step 3: Find the minimum-value element in the distance matrix.
The minimum-value element is (p3, p6), with value 0.11, i.e. our 1st cluster is (p3, p6).
May 2010 university question
1. What is a clustering technique? Discuss the agglomerative algorithm using the following data and plot a dendrogram using the link approach. The following figure contains sample data items indicating the distance between the elements.
MAY 2019
Agglomerative – Complete Link Algorithm
1. Discuss the agglomerative algorithm using the following data and plot a dendrogram using the complete link approach. The following figure contains sample data items indicating the distance between the elements.

       1     2     3     4     5
  1  1.00  0.90  0.10  0.65  0.20
  2  0.90  1.00  0.70  0.60  0.50
  3  0.10  0.70  1.00  0.40  0.30
  4  0.65  0.60  0.40  1.00  0.80
  5  0.20  0.50  0.30  0.80  1.00
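The complete link merges for this matrix can be traced with a short sketch. It reads the off-diagonal entries as distances and treats the 1.00 diagonal entries as self-distance placeholders to be ignored; the function and variable names are assumptions for illustration:

```python
# Complete-link agglomerative clustering on the 5x5 matrix above.
D = [
    [1.00, 0.90, 0.10, 0.65, 0.20],
    [0.90, 1.00, 0.70, 0.60, 0.50],
    [0.10, 0.70, 1.00, 0.40, 0.30],
    [0.65, 0.60, 0.40, 1.00, 0.80],
    [0.20, 0.50, 0.30, 0.80, 1.00],
]

def complete_link(D):
    """Repeatedly merge the two closest clusters until one remains.
    Complete link defines the distance between two clusters as the
    MAXIMUM pairwise distance between their elements."""
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []  # (members of merged cluster, merge distance)
    while len(clusters) > 1:
        # Pair of clusters with the smallest complete-link distance.
        best = min(
            ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=lambda ab: max(D[i][j] for i in ab[0] for j in ab[1]),
        )
        d = max(D[i][j] for i in best[0] for j in best[1])
        merged = best[0] | best[1]
        clusters = [c for c in clusters if c not in best] + [merged]
        merges.append((sorted(x + 1 for x in merged), round(d, 2)))
    return merges

for step in complete_link(D):
    print(step)
# ([1, 3], 0.1)
# ([1, 3, 5], 0.3)
# ([2, 4], 0.6)
# ([1, 2, 3, 4, 5], 0.9)
```

The merge distances 0.1, 0.3, 0.6, 0.9 are the heights at which the dendrogram branches join.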
2. Complete Link
Advantage: Less susceptible to noise and outliers
Disadvantage:
⯍ Tends to break large clusters
⯍ Biased towards globular clusters
3. Average Link
Advantage:
⯍ Less susceptible to noise and outliers
Splitting Process of DIANA
(figure: objects moving from cluster C1 into the splinter group C2)
Iteration (cont'd):
6. Otherwise, stop the splitting process.
Discussion on Hierarchical Approaches
⚫ Strengths
⚪ Do not need k, the number of clusters, as input
⚫ Weaknesses
⚪ Do not scale well; time complexity is at least O(n²), where n is the total number of objects
⚪ Can never undo what was done previously
Hierarchical clustering comparison
• Agglomerative (bottom up)
• Divisive (top down)