Fuzzy C-Means Clustering Algorithm
Presented by
Asya Nikitina
Fuzzy Sets and Membership Functions
You are approaching a red light and must advise a driving
student when to apply the brakes. What would you say:
“Begin braking 74 feet from the crosswalk”?
“Apply the brakes pretty soon”?
Fuzzy Sets and Membership Functions
Conventional (or crisp) sets contain objects that satisfy
precise properties required for membership. For example,
the set H of real numbers from 6 to 8 is crisp:
$$H = \{\, r \in \mathbb{R} \mid 6 \le r \le 8 \,\}$$
$$m_H(r) = \begin{cases} 1, & 6 \le r \le 8 \\ 0, & \text{otherwise} \end{cases}$$
where m_H is the membership function of H.
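A crisp membership function translates directly into code. In the minimal sketch below (the function name `crisp_membership_H` is hypothetical), the function returns only 0 or 1, exactly like the membership function of H:

```python
def crisp_membership_H(r: float) -> float:
    """Membership function m_H of the crisp set H = {r in R | 6 <= r <= 8}."""
    # A crisp set allows only two membership degrees: 1 (inside) or 0 (outside).
    return 1.0 if 6.0 <= r <= 8.0 else 0.0


for r in (5.9, 6.0, 7.0, 8.0, 8.1):
    print(r, crisp_membership_H(r))  # 5.9 -> 0.0, 6.0/7.0/8.0 -> 1.0, 8.1 -> 0.0
```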
Fuzzy Sets and Membership Functions
Because the property “close to 7” is fuzzy, there is no
unique membership function for F, the set of real numbers
close to 7. Rather, it is left to the modeler to decide,
based on the potential application and the properties
desired for F, what m_F(r) should look like.
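To illustrate that m_F(r) is a modeling choice, here are two hypothetical membership functions for “close to 7”: a triangular one and a Gaussian one. The widths `width=1.0` and `sigma=0.5` are arbitrary assumptions, not values taken from these slides:

```python
import math


def close_to_7_triangular(r: float, width: float = 1.0) -> float:
    # Linear fall-off: full membership at 7, zero membership once |r - 7| >= width.
    return max(0.0, 1.0 - abs(r - 7.0) / width)


def close_to_7_gaussian(r: float, sigma: float = 0.5) -> float:
    # Smooth fall-off: membership approaches zero but never reaches it.
    return math.exp(-((r - 7.0) ** 2) / (2.0 * sigma ** 2))


for r in (6.0, 6.5, 7.0, 7.5, 9.0):
    print(f"r={r}: triangular={close_to_7_triangular(r):.3f}, "
          f"gaussian={close_to_7_gaussian(r):.3f}")
```

Both functions give r = 7 full membership, but they disagree elsewhere: at r = 9 the triangular function gives exactly 0, while the Gaussian still gives a small positive value.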
[Figure: two bottles, A and B, with candidate contents labeled “swamp water?”, “beer?”, “H2O?”, “HCl?”]
Hard clustering assigns each feature
vector to one and only one of the
clusters, with a degree of membership
equal to one and well-defined
boundaries between clusters.
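As a minimal sketch (assuming NumPy, squared Euclidean distance, and prototypes that are already given; `hard_partition` is a hypothetical name), hard clustering can be expressed as a K × N membership matrix that holds a single 1 in each column:

```python
import numpy as np


def hard_partition(X: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Return a K x N 0/1 membership matrix with exactly one 1 per column."""
    # Squared Euclidean distance from every prototype V[i] to every vector X[j].
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # shape (K, N)
    nearest = d2.argmin(axis=0)                              # shape (N,)
    U = np.zeros_like(d2)
    U[nearest, np.arange(X.shape[0])] = 1.0
    return U


X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [2.4, 2.5]])
V = np.array([[0.0, 0.0], [5.0, 5.0]])
print(hard_partition(X, V))
```

The point (2.4, 2.5) lies almost halfway between the two prototypes, yet it still receives membership 1 in the first cluster and 0 in the second.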
Fuzzy clustering allows each feature
vector to belong to more than one
cluster with different membership
degrees (between 0 and 1) and
vague or fuzzy boundaries between
clusters.
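For comparison, the sketch below computes fuzzy memberships with the membership formula of standard fuzzy c-means (Bezdek), u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m−1)), using Euclidean distance and a weighting exponent m = 2. This is an assumed illustration, not necessarily the distance or exponent used in these slides; `fuzzy_memberships` is a hypothetical name:

```python
import numpy as np


def fuzzy_memberships(X: np.ndarray, V: np.ndarray, m: float = 2.0) -> np.ndarray:
    """Return a K x N matrix U with entries in [0, 1] and columns summing to 1."""
    # Euclidean distances d[i, j] between prototype V[i] and vector X[j].
    d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero when X[j] coincides with V[i]
    # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)); ratio has shape (K, K, N).
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)


X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [2.4, 2.5]])
V = np.array([[0.0, 0.0], [5.0, 5.0]])
U = fuzzy_memberships(X, V)
print(np.round(U, 3))
```

The near-boundary point (2.4, 2.5) now gets memberships of roughly 0.52 and 0.48 instead of a hard 1 and 0.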
Difficulties with Fuzzy Clustering
The optimal number of clusters K has
to be determined: the number of
clusters cannot always be defined
a priori, so a good cluster validity
criterion has to be found.
Objectives and Challenges
Create an algorithm for fuzzy clustering that
partitions the data set into an optimal number
of clusters.
Description of Fuzzy Partitioning
3) Compute new cluster prototypes V_i.
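The slide's own prototype formula is not shown here. Assuming the usual fuzzy c-means form, in which each prototype is a fuzzy-weighted mean V_i = Σ_j u_ij^m X_j / Σ_j u_ij^m with weighting exponent m, step 3 can be sketched as follows (`update_prototypes` is a hypothetical name):

```python
import numpy as np


def update_prototypes(X: np.ndarray, U: np.ndarray, m: float = 2.0) -> np.ndarray:
    """Return K x d prototypes V[i] = sum_j u_ij^m X[j] / sum_j u_ij^m."""
    W = U ** m                                   # fuzzified weights, shape (K, N)
    return (W @ X) / W.sum(axis=1, keepdims=True)


X = np.array([[0.0], [1.0], [10.0], [11.0]])
U_hard = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
U_fuzzy = np.array([[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]])
print(update_prototypes(X, U_hard))   # plain cluster means 0.5 and 10.5
print(update_prototypes(X, U_fuzzy))  # each pulled slightly toward the other cluster
```

With 0/1 memberships the update reduces to ordinary cluster means; with fuzzy memberships every point contributes to every prototype, weighted by u_ij^m.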
The Fuzzy k-Means Algorithm
For hyperellipsoidal clusters, an “exponential”
distance measure, d_e²(X_j − V_i), based on maximum
likelihood (ML) estimation was defined:
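The slide's formula is not reproduced here. As a hedged sketch, the function below implements the exponential distance in the form commonly cited for Gath and Geva's FMLE algorithm, d_e²(X_j, V_i) = det(F_i)^(1/2) / P_i · exp[(X_j − V_i)ᵀ F_i⁻¹ (X_j − V_i) / 2], where F_i is the fuzzy covariance matrix of cluster i and P_i is its prior probability. Treat this form as an assumption to check against the original paper; F_i and P_i are passed in rather than estimated, and `exponential_distance_sq` is a hypothetical name:

```python
import numpy as np


def exponential_distance_sq(X: np.ndarray, v: np.ndarray, F: np.ndarray, P: float) -> np.ndarray:
    """Assumed FMLE-style exponential distance from every row of X to prototype v.

    X: (N, d) feature vectors, v: (d,) prototype, F: (d, d) fuzzy covariance,
    P: prior probability of the cluster.
    """
    diff = X - v                                          # shape (N, d)
    # Squared Mahalanobis distance (X_j - v)^T F^{-1} (X_j - v) for each row.
    maha = np.sum(diff * np.linalg.solve(F, diff.T).T, axis=1)
    return np.sqrt(np.linalg.det(F)) / P * np.exp(maha / 2.0)


X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(exponential_distance_sq(X, np.zeros(2), np.eye(2), P=0.5))
```

Because the Mahalanobis term sits inside an exponential, this distance grows much faster than a Euclidean one, and the factor det(F_i)^(1/2) / P_i scales it up for clusters with a large fuzzy volume or a small prior.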
The Major Advantage of FMLE
Obtaining good partition results starting from
“good” classification prototypes.
Unsupervised Tracking of Cluster Prototypes
Different choices of classification prototypes
may lead to different partitions.
Unsupervised Tracking of Cluster Prototypes
1) Compute average and standard deviation of the
whole data set.
2) Choose the first initial cluster prototype at the
average location of all feature vectors.
3) Choose an additional classification prototype
equally distant from all data points.
4) Calculate a new partition of the data set
according to steps 1) and 2) of the fuzzy
k-means algorithm.
5) If k, the number of clusters, is less than a given
maximum, go to step 3; otherwise, stop.
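The loop below is a rough sketch of these steps under several assumptions. Steps 1) and 2) of the fuzzy k-means algorithm are approximated by standard Euclidean fuzzy c-means updates (the hypothetical helper `fcm_step`), and the “equally distant” prototype of step 3 is approximated by placing it `far` standard deviations (a hypothetical parameter) from the data mean, where its distances to all points are large and roughly similar. It illustrates the control flow only and is not the FMLE procedure itself:

```python
import numpy as np


def fcm_step(X, V, m=2.0):
    """One fuzzy c-means iteration: memberships U (K x N), then new prototypes V."""
    d = np.fmax(np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2), 1e-12)
    U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
    W = U ** m
    return U, (W @ X) / W.sum(axis=1, keepdims=True)


def track_prototypes(X, k_max, far=10.0, n_iter=100, m=2.0):
    """Return {k: prototypes} for k = 1..k_max (assumes k_max >= 1)."""
    mean, std = X.mean(axis=0), X.std(axis=0)   # step 1: data statistics
    V = mean[None, :]                            # step 2: first prototype at the mean
    history = {1: V}
    while V.shape[0] < k_max:                    # step 5: stop once k reaches k_max
        # Step 3 (assumption): a point far from the data is roughly
        # equidistant from all feature vectors.
        V = np.vstack([V, mean + far * std])
        for _ in range(n_iter):                  # step 4: recompute the partition
            _, V = fcm_step(X, V, m)
        history[V.shape[0]] = V
    return history


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)), rng.normal(6.0, 0.3, size=(50, 2))])
V2 = track_prototypes(X, k_max=2)[2]
print(V2[np.argsort(V2[:, 0])])  # close to (0, 0) and (6, 6)
```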
Common Fuzzy Cluster Validity
Each data point has K memberships, so it is
desirable to summarize this information by a
single number that indicates how well the
data point X_k is classified by the clustering.
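Two common ways to condense a point's K memberships into a single number are the sum of squared memberships and the membership entropy; averaging them over all points gives Bezdek's partition coefficient and partition entropy. They are shown here as familiar examples, not necessarily the measures these slides propose, and `per_point_scores` is a hypothetical name:

```python
import numpy as np


def per_point_scores(U: np.ndarray):
    """U is K x N with columns summing to 1; returns two length-N arrays."""
    # Sum of squared memberships: 1 for a crisp assignment, 1/K for total ambiguity.
    coeff = (U ** 2).sum(axis=0)
    # Membership entropy: 0 for a crisp assignment, log(K) for total ambiguity.
    # Zero memberships are mapped to log(1) = 0 so that 0 * log(0) counts as 0.
    safe = np.where(U > 0, U, 1.0)
    entropy = -(U * np.log(safe)).sum(axis=0)
    return coeff, entropy


U = np.array([[1.0, 0.5, 0.9], [0.0, 0.5, 0.1]])
coeff, entropy = per_point_scores(U)
print(coeff, entropy)
```

The crisp column scores 1 and 0, the evenly split column scores 1/K = 0.5 and log 2, and the mostly decided column falls in between.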
Proposed Performance Measures
Average partition density, DPA, is calculated from:
Proposed Performance Measures
The partition density, PD, is calculated from:
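The formulas for DPA and PD are not reproduced on these slides. The sketch below follows the definitions commonly attributed to Gath and Geva, assumed here to match: a point is a “central member” of cluster i when (X_j − V_i)ᵀ F_i⁻¹ (X_j − V_i) < 1, S_i sums the memberships u_ij of those central members, DPA = (1/K) Σ_i S_i / det(F_i)^(1/2), and PD = Σ_i S_i / Σ_i det(F_i)^(1/2). Weighting each point by u_ij (exponent 1) in the fuzzy covariance is also an assumption, and both function names are hypothetical:

```python
import numpy as np


def fuzzy_covariance(X, V, U):
    # Assumption: F_i = sum_j u_ij (X_j - V_i)(X_j - V_i)^T / sum_j u_ij
    F = []
    for i in range(V.shape[0]):
        diff = X - V[i]
        F.append((U[i][:, None] * diff).T @ diff / U[i].sum())
    return np.array(F)


def partition_densities(X, V, U, F):
    """Return (PD, DPA) under the assumed definitions above."""
    K = V.shape[0]
    S = np.zeros(K)
    vol = np.zeros(K)
    for i in range(K):
        diff = X - V[i]
        maha = np.sum(diff * np.linalg.solve(F[i], diff.T).T, axis=1)
        S[i] = U[i, maha < 1.0].sum()           # memberships of central members
        vol[i] = np.sqrt(np.linalg.det(F[i]))   # fuzzy hypervolume of cluster i
    return S.sum() / vol.sum(), np.mean(S / vol)


X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.0, 0.0]])
V = X.mean(axis=0, keepdims=True)
U = np.ones((1, X.shape[0]))
pd, dpa = partition_densities(X, V, U, fuzzy_covariance(X, V, U))
print(pd, dpa)
```

In this single-cluster example the covariance is 0.4·I, only the origin is a central member, and both densities equal 1 / 0.4 = 2.5.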
Sample Runs
To test the performance of the algorithm,
N artificial m-dimensional feature vectors
were generated from multivariate normal
distributions with different parameters
and densities.
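A sketch of how such test data might be generated with NumPy follows. The means, covariances, and cluster sizes are hypothetical and are not the parameters used in the sample runs; different covariances and sample sizes give the clusters different shapes and densities. `make_sample_data` is a hypothetical name:

```python
import numpy as np


def make_sample_data(means, covs, sizes, seed=0):
    """Stack N = sum(sizes) m-dimensional vectors drawn from several Gaussians."""
    rng = np.random.default_rng(seed)
    parts = [rng.multivariate_normal(mu, cov, size=n)
             for mu, cov, n in zip(means, covs, sizes)]
    # Ground-truth component index of each vector, useful for checking results.
    labels = np.concatenate([np.full(n, k) for k, n in enumerate(sizes)])
    return np.vstack(parts), labels


means = [[0.0, 0.0], [5.0, 5.0], [0.0, 6.0]]
covs = [0.5 * np.eye(2), [[2.0, 0.8], [0.8, 1.0]], 0.1 * np.eye(2)]
sizes = [200, 100, 50]
X, labels = make_sample_data(means, covs, sizes)
print(X.shape, np.bincount(labels))
```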