lec-3
Clustering is the art of grouping
together pattern vectors that in some
sense belong together because they
have similar characteristics and are
different from other pattern vectors.
Question:
How do we start the process of finding
clusters and identifying similarities?
Answer:
First, realize that clustering is an art:
there is no single correct answer, only feasible
alternatives.
Second, explore the structure of the data,
similarity measures, and the limitations of
various clustering procedures.
Formalization of the Problem of Clustering
1. The members in each subset are in
some sense similar and not similar to
members in the other subsets.
3. $Cl_i \cap Cl_j = \Phi$ for $i \neq j$
4. $\bigcup_{k=1}^{K} Cl_k = S$ (Exhaustive)
$\Phi$ is the Null Set
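As a rough illustration of the no-overlap and exhaustiveness conditions, the following Python sketch checks whether a list of candidate clusters covers the whole set S and whether any two clusters share a member. The function names and the sample labels are assumptions made for this example only.

```python
from itertools import combinations


def is_exhaustive(S, clusters):
    """True if the union of all clusters equals the full set S."""
    return set().union(*clusters) == set(S)


def is_pairwise_disjoint(clusters):
    """True if no two clusters share a member (every intersection is empty)."""
    return all(set(a).isdisjoint(b) for a, b in combinations(clusters, 2))


# Hypothetical set of four labeled pattern vectors.
S = {"a", "b", "c", "d"}
good = [{"a", "c"}, {"b", "d"}]   # covers S, no overlap
bad = [{"a", "c"}, {"c", "d"}]    # misses "b" and shares "c"

print(is_exhaustive(S, good), is_pairwise_disjoint(good))
print(is_exhaustive(S, bad), is_pairwise_disjoint(bad))
```

These checks only test the set-theoretic conditions; they say nothing about whether the members of a cluster are actually similar.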
Illustration of Clusters and Cluster centers
We will now look at two examples that
illustrate problems in performing
meaningful clustering:
Example 1:
Given the data below, obtained by
measuring the weight and diameter of 4
large foam balls labeled a, b, c, and d.
Solution:
The plot of the points in the 2-dimensional
pattern space is given below
By closeness in
pattern space select
Cl1 = { a,c }
Cl2 = { b,d }
The plot of the same points in the 2-dimensional
pattern space with Diameter shown in inches
rather than feet (different scale) is given below
By closeness in
pattern space select
Cl1 = { a,b }
Cl2 = { c,d }
Which set of clusters is the
correct answer?
#1: Cl1 = { a,c }, Cl2 = { b,d } (diameter measured in feet)
#2: Cl1 = { a,b }, Cl2 = { c,d } (diameter measured in inches)
One approach to solving the scaling problem is to
normalize each dimension separately when the dimensions
represent different properties, such as weight and
diameter.
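The effect of scale, and of per-dimension normalization, can be sketched in Python. The measurements below are hypothetical (they are not the data of Example 1), chosen so that the grouping changes when diameter is converted from feet to inches. The helper names `to_inches`, `best_pairing`, and `normalize` are likewise assumptions for this sketch, and dividing by the standard deviation is just one possible normalization.

```python
import math
from statistics import fmean, pstdev

# Hypothetical (weight in lb, diameter in ft) for four balls; illustrative only.
balls_ft = {"a": (1.0, 1.0), "b": (4.0, 1.05), "c": (1.2, 1.5), "d": (4.2, 1.55)}


def to_inches(points):
    """Convert the diameter coordinate from feet to inches."""
    return {name: (w, d * 12.0) for name, (w, d) in points.items()}


def best_pairing(points):
    """Split four labeled points into two pairs with the smallest
    total within-pair Euclidean distance."""
    names = sorted(points)
    first, rest = names[0], names[1:]
    best = None
    for partner in rest:
        other = [n for n in rest if n != partner]
        cost = (math.dist(points[first], points[partner])
                + math.dist(points[other[0]], points[other[1]]))
        pairing = {frozenset((first, partner)), frozenset(other)}
        if best is None or cost < best[0]:
            best = (cost, pairing)
    return best[1]


def normalize(points):
    """Rescale each dimension separately to zero mean and unit standard deviation."""
    names = list(points)
    columns = list(zip(*(points[n] for n in names)))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return {n: tuple((v - m) / s for v, m, s in zip(points[n], means, stds))
            for n in names}


print(best_pairing(balls_ft))             # grouping with diameter in feet
print(best_pairing(to_inches(balls_ft)))  # grouping with diameter in inches
print(best_pairing(normalize(balls_ft)) ==
      best_pairing(normalize(to_inches(balls_ft))))
```

With this data the raw groupings differ between feet and inches, while the normalized coordinates do not depend on the unit chosen, so the grouping no longer changes with the unit.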
We now concentrate on quantitative data and
examine measures of similarity between
pattern samples and clusters.
Euclidean distance between
two pattern vectors x and y with n components:
$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
1. Minimum distance
2. Average distance
3. Distance between means
4. Distance between medians
5. Maximum distance
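One plausible reading of these five cluster-to-cluster measures is sketched below in Python, using Euclidean distance between pattern vectors. The function names are assumptions for this sketch, and the median is taken component by component, which is also an assumption since other definitions of a multivariate median exist.

```python
import math
from itertools import product
from statistics import fmean, median


def pairwise(A, B):
    """All Euclidean distances between a vector in cluster A and one in cluster B."""
    return [math.dist(x, y) for x, y in product(A, B)]


def d_min(A, B):
    """1. Minimum distance: closest pair across the two clusters."""
    return min(pairwise(A, B))


def d_avg(A, B):
    """2. Average distance: mean over all cross-cluster pairs."""
    return fmean(pairwise(A, B))


def d_means(A, B):
    """3. Distance between the cluster means (centroids)."""
    mean_a = tuple(fmean(c) for c in zip(*A))
    mean_b = tuple(fmean(c) for c in zip(*B))
    return math.dist(mean_a, mean_b)


def d_medians(A, B):
    """4. Distance between component-wise medians (assumed definition)."""
    med_a = tuple(median(c) for c in zip(*A))
    med_b = tuple(median(c) for c in zip(*B))
    return math.dist(med_a, med_b)


def d_max(A, B):
    """5. Maximum distance: farthest pair across the two clusters."""
    return max(pairwise(A, B))


# Hypothetical clusters in a 2-dimensional pattern space.
A = [(0, 0), (0, 2)]
B = [(3, 0), (3, 2)]
for measure in (d_min, d_avg, d_means, d_medians, d_max):
    print(measure.__name__, measure(A, B))
```

For any two clusters the minimum distance is never larger than the average distance, which in turn is never larger than the maximum distance.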
K-Means Clustering Algorithm: Basic Procedure
Flow Diagram for K-Means Algorithm
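A minimal Python sketch of the commonly described K-means iteration follows; it is not necessarily the exact procedure of this lecture. It assumes Euclidean distance, initial cluster centers supplied by the caller, and a stopping rule of "centers no longer change". The names `assign`, `k_means`, and `max_iter` are assumptions of this sketch.

```python
import math
from statistics import fmean


def assign(points, centers):
    """Assignment step: put each pattern vector in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, centers, max_iter=100):
    """Alternate assignment and center update until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Update step: each center becomes the mean of its cluster;
        # an empty cluster keeps its previous center (a choice made for this sketch).
        new_centers = [
            tuple(fmean(col) for col in zip(*cluster)) if cluster else centers[k]
            for k, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical data: two well-separated groups in a 2-dimensional pattern space.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
print(centers)
print(clusters)
```

The result depends on the initial centers and on K, so different starting points can give different, equally feasible clusterings.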