
Fuzzy Clustering

Presented By: Omid Sayadi* Supervisor: Dr. Bagheri


* PhD student, Biomedical Image and Signal Processing Lab (BiSIPL), Department of Electrical Engineering, Sharif University of Technology, osayadi@ee.sharif.edu

Fuzzy Clustering

Outline:
- Introduction (this section)
- Problem Statement
- Fuzzy Clustering Algorithms
- Fuzzy Clustering Applications
- Discussion and Conclusions

Introduction
Cluster
A cluster is:
- a number of similar individuals that occur together, spanning a specific subspace of a concept; or
- a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.

[Figure: example clusters in the top speed (km/h) vs. weight (kg) plane: lorries, sport cars, medium market cars]

Introduction (cont.)
Clustering
The process of grouping a set of physical or abstract objects into classes of similar objects. "Clustering is the art of finding groups in data" (Kaufman & Rousseeuw).

Cluster analysis is an important human activity: distinguishing objects in early childhood, learning a new object or understanding a new phenomenon (feature extraction and comparison).

Introduction (cont.)
Motivation
- Discovering hidden patterns and structures,
- Organizing large sets of data into a small number of meaningful groups (clusters),
- Dealing with a manageable number of homogeneous groups instead of a vast number of single data objects,
- Data reduction and information compaction.

Introduction (cont.)
Clustering vs. Classification
- Clustering: unsupervised learning; no class labels are defined.
- Classification: supervised learning; predefined (a priori known) class labels, with a labeled training set and a test set.
Clustering is unsupervised classification: no classes are predefined (labeled).

Introduction (cont.)
Similarity measures
Clustering seeks maximum intra-cluster similarity and minimum inter-cluster similarity.

Introduction (cont.)
Similarity measure functions
Common distance functions for feature vectors $x, y \in \mathbb{R}^p$:

- Chebyshev: $d(x, y) = \max_i |x_i - y_i|$
- Euclidean: $d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$
- Hamming (city-block): $d(x, y) = \sum_{i=1}^{p} |x_i - y_i|$
- Minkowski: $d(x, y) = \left( \sum_{i=1}^{p} |x_i - y_i|^q \right)^{1/q}$

A small numeric illustration follows below.
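All four distances are instances of the Minkowski distance; the following snippet (an illustration added here, not in the original slides) makes this concrete for q = 1, 2, and ∞:

```python
import numpy as np

def minkowski(x, y, q):
    """Minkowski distance of order q between two feature vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if np.isinf(q):                       # q -> infinity yields the Chebyshev distance
        return np.max(np.abs(x - y))
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

x, y = [1.0, 2.0, 3.0], [4.0, 0.0, 3.5]
print(minkowski(x, y, 1))        # Hamming / city-block distance: 5.5
print(minkowski(x, y, 2))        # Euclidean distance: ~3.64
print(minkowski(x, y, np.inf))   # Chebyshev distance: 3.0
```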

Introduction (cont.)
Clustering Approaches
- Hierarchical algorithms: find successive clusters using previously established clusters.
- Partitioning algorithms: construct various partitions and then evaluate them; all clusters are determined at once.
- Model-based algorithms
- Grid-based algorithms
- Density-based algorithms

Introduction (cont.)
Hierarchical Clustering

Create a hierarchical decomposition of the data set using some criterion and a termination condition:
- Divisive (top-down)
- Agglomerative (bottom-up)

Introduction (cont.)
Divisive vs Agglomerative

[Figure: divisive (top-down) vs. agglomerative (bottom-up) hierarchical clustering]

Introduction (cont.)
Partitional Clustering
Given a database of N objects, partition the objects into a pre-specified number of K clusters.
The number of distinct ways to cluster $N$ objects into $K$ clusters (Liu, 1968):
$$M(N, K) = \frac{1}{K!} \sum_{i=0}^{K} (-1)^i \binom{K}{i} (K - i)^N$$
A quick numeric check follows below.
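This formula can be evaluated directly (an illustrative snippet, not from the slides); the count grows explosively with N, which is why exhaustive search over all partitions is hopeless:

```python
from math import comb, factorial

def num_partitions(n, k):
    """Number of ways to partition n objects into k nonempty clusters (Liu, 1968)."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1)) // factorial(k)

print(num_partitions(4, 2))    # 7
print(num_partitions(25, 5))   # about 2.4e15
```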

The clusters are formed to optimize a similarity criterion (maximum intra-cluster similarity, minimum inter-cluster similarity). Popular partitioning algorithms: K-means and EM (Expectation Maximization).

Introduction (cont.)
Challenges
- Hierarchical algorithms: the tree of clusters (dendrogram) requires a termination criterion (dendrogram cutting); the agglomerative or divisive splits and merges are irreversible.
- Partitioning algorithms: the number of clusters (K) must be pre-selected.

Introduction (cont.)
K-means algorithm
Given the number of clusters (K), partition the objects (randomly) into K nonempty subsets. While new assignments occur, do:
1. Compute seed points as the centroids (virtual mean points) of the clusters of the current partition.
2. Assign each object to the cluster with the nearest seed point.
A runnable sketch follows below.
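A minimal NumPy sketch of this loop (illustrative; function and variable names are my own, not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means. X: (n, p) data matrix. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # assign each object to the cluster with the nearest seed point
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # no new assignments: converged
        labels = new_labels
        # recompute seed points as the centroids of the current partition
        for i in range(k):
            if np.any(labels == i):
                centroids[i] = X[labels == i].mean(axis=0)
    return centroids, labels
```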

Introduction (cont.)
K-means example
[Figure: three snapshots of K-means iterating on 2-D data in the unit square. Problem: a point at equal distance to two centroids cannot be assigned unambiguously!]

Introduction (cont.)
Taxonomy of Clustering Approaches

[Figure: taxonomy of clustering approaches]

Fuzzy Clustering

Outline:
- Introduction
- Problem Statement (this section)
- Fuzzy Clustering Algorithms
- Fuzzy Clustering Applications
- Discussion and Conclusions

Problem Statement
HCM (K-means) Formulation
$X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$ — the set of data in the feature space
$C_i$ — the $i$th cluster

All clusters $C_i$ together fill the whole universe $U$:
$$\bigcup_{i=1}^{c} C_i = U$$

Clusters do not overlap:
$$C_i \cap C_j = \emptyset \quad \text{for all } i \neq j$$

A cluster $C_i$ is never empty and is smaller than the whole universe $U$:
$$\emptyset \subset C_i \subset U \quad \text{for all } i$$

There must be at least 2 clusters in a c-partition and at most as many as the number of data points $n$:
$$2 \leq c \leq n$$

Problem Statement (cont.)


K-means Failures
The objective function in classical clustering:
$$J = \sum_{i=1}^{K} J_i = \sum_{i=1}^{K} \sum_{k,\ \vec{x}_k \in C_i} \left\| \vec{x}_k - \vec{c}_i \right\|^2$$
i.e., minimize the total sum of all distances.

Each datum must be assigned to exactly one cluster, which leaves no sensible assignment for data points that are equally distant from two centroids.

Problem Statement (cont.)


Equi-distant data points: Ruspini's butterfly data set (1969).

[Figure: Ruspini's butterfly — a middle point equally distant from both clusters]

Problem Statement (cont.)


Towards Fuzzy Clustering
We need to support uncertainty: each datum can belong to multiple clusters with a varying degree of membership, so the space is partitioned into overlapping groups.

[Figure: the same data partitioned into crisp clusters vs. fuzzy clusters]

Problem Statement (cont.)


Fuzzy C-Partition Formulation
$X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$ — the set of data in the feature space
$C_i$ — the $i$th cluster

All clusters $C_i$ together fill the whole universe $U$:
$$\bigcup_{i=1}^{c} C_i = U$$

Clusters may overlap:
$$C_i \cap C_j = \emptyset \ \text{or} \ \neq \emptyset \quad \text{for all } i \neq j$$

A cluster $C_i$ is never empty and is smaller than the whole universe $U$:
$$\emptyset \subset C_i \subset U \quad \text{for all } i$$

There must be at least 2 clusters in a c-partition and at most as many as the number of data points $n$:
$$2 \leq c \leq n$$

Problem Statement (cont.)


Fuzzy Clustering Types
- Hard clustering
- Fuzzy clustering:
  - Probabilistic fuzzy clustering (Bezdek, 1981), obtained by omitting the non-overlapping condition,
  - Possibilistic fuzzy clustering (Krishnapuram & Keller, 1993).

Problem Statement (cont.)


Probabilistic Fuzzy Clustering
A constrained optimization problem:

$u_{ij} = \mu_{C_i}(\vec{x}_j) \in [0, 1]$ — the membership degree of datum $\vec{x}_j$ to the $i$th cluster
$\vec{u}_j = (u_{1j}, \ldots, u_{cj})^T$ — the fuzzy label vector of data point $\vec{x}_j$
$U = [u_{ij}]_{c \times n} = (\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_n)$ — the fuzzy partition matrix

$$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2$$

subject to:
$$\sum_{j=1}^{n} u_{ij} > 0 \quad \text{for all } i \in \{1, \ldots, c\} \quad \text{(no empty cluster)}$$
$$\sum_{i=1}^{c} u_{ij} = 1 \quad \text{for all } j \in \{1, \ldots, n\} \quad \text{(normalization constraint)}$$

Problem Statement (cont.)


Probabilistic Fuzzy Clustering (cont.)
$$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2$$

$d_{ij}$ — the distance between datum $\vec{x}_j$ and cluster $i$
$m$ — the fuzzifier exponent ($m > 1$); usually $m = 2$

$m$ determines the fuzziness of the clustering:
- $m \to 1$: more crisp clustering,
- $m \to \infty$: more fuzzy clustering.

[Figure: membership degrees for m = 1.1 vs. m = 2]

Problem Statement (cont.)


Probabilistic Fuzzy Clustering (cont.)
The cost function Jf cannot be minimized directly, hence an alternative optimization scheme (AO) must be used. The iterative algorithm:
First, the membership degrees are optimized for fixed cluster parameters: U t = jU (Ct 1 ), t > 0 Then, the cluster parameters are optimized for fixed membership degrees: Ct = jc (U t )
Sharif University of Technology 26

Spring 2008

Problem Statement (cont.)


Probabilistic Fuzzy Clustering (cont.)
The minimization yields the update formula for the membership degrees:
$$u_{ij}^{(t+1)} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}}$$
i.e., the gravitation to cluster $i$ relative to the total gravitation. It depends not only on the distance of datum $\vec{x}_j$ to cluster $i$, but also on the distances between this data point and the other clusters.

Problem Statement (cont.)


Probabilistic Fuzzy Clustering (cont.)
What about the cluster prototypes ($C$)? They are algorithm dependent, i.e. they depend on:
- the parameters describing the cluster (location, shape, size),
- the distance measure $d$.

Problem: lack of typicality. The normalization constraint causes the clusters to be drawn toward outliers, and there is no difference between $x_1$ and $x_2$ (membership 0.5 for both).

Problem Statement (cont.)


Possibilistic Fuzzy Clustering
Idea: drop the normalization condition
$$\sum_{i=1}^{c} u_{ij} = 1 \quad \text{for all } j \in \{1, \ldots, n\}$$
of probabilistic fuzzy clustering; the no-empty-cluster condition remains. A penalty term then forces the membership degrees away from zero.

The cost function to be minimized:
$$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - u_{ij})^m$$

Problem Statement (cont.)


Possibilistic Fuzzy Clustering (cont.)
$$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - u_{ij})^m$$

$\eta_i > 0$ balances the two contrary objectives expressed in the terms above. The minimization yields:
$$u_{ij} = \frac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}}$$

Result: the membership degree of a datum $\vec{x}_j$ to cluster $i$ depends only on its distance to this cluster.

Problem Statement (cont.)


More about $\eta_i$

Let $m = 2$ in the update equation above. If $\eta_i$ equals $d_{ij}^2$, then $u_{ij} = 0.5$. Hence, $\eta_i$ determines the distance to cluster $i$ at which the membership degree equals 0.5, so the permitted extension of the cluster can be controlled by this parameter. $\eta_i$ can be estimated by the fuzzy intra-class distance of the probabilistic fuzzy clustering model:
$$\eta_i = \frac{\sum_{j=1}^{n} u_{ij}^m d_{ij}^2}{\sum_{j=1}^{n} u_{ij}^m}$$

Fuzzy Clustering

Outline:
- Introduction
- Problem Statement
- Fuzzy Clustering Algorithms (this section)
- Fuzzy Clustering Applications
- Discussion and Conclusions

FC algorithms
Major algorithms
- Fuzzy c-means (FCM)
- Possibilistic c-means (PCM)
- Gustafson-Kessel (GK)

Assumptions:
- Input: the data matrix ($X_{p \times n}$) and the number of clusters ($c$).
- Output: the cluster centers ($C$) and the fuzzy partition matrix ($U$).
- Cluster centers are initialized randomly in all algorithms.

FC algorithms (cont.)
FCM
A probabilistic fuzzy clustering approach:
- finds $c$ spherical clusters; the cluster prototype is the cluster center $C_i$,
- the found clusters are of approximately the same size,
- distance measure: Euclidean distance.

According to the objective function $J_f$, the cluster prototypes are updated as:
$$C_i^{(t+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m \vec{x}_j}{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m}$$

FC algorithms (cont.)
FCM algorithm
Repeat while $\|U^{(t+1)} - U^{(t)}\| \geq \varepsilon$ (or $\|C^{(t+1)} - C^{(t)}\| \geq \varepsilon$):
1. Compute the distances $d_{ij}$.
2. Compute the membership values (partition matrix):
$$u_{ij}^{(t+1)} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}} \quad \text{for } i = 1, \ldots, c \text{ and } j = 1, \ldots, N$$
3. Compute the cluster centers:
$$C_i^{(t+1)} = \frac{\sum_{j=1}^{N} \left( u_{ij}^{(t+1)} \right)^m \vec{x}_j}{\sum_{j=1}^{N} \left( u_{ij}^{(t+1)} \right)^m} \quad \text{for } i = 1, \ldots, c$$
A compact implementation sketch follows below.
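The loop above translates almost line-for-line into NumPy. A minimal sketch (the function name and the stopping test on U are my choices, not the slides'):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means. X: (n, p) data matrix. Returns (centers, U), U of shape (c, n)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    U = np.zeros((c, len(X)))
    for _ in range(max_iter):
        # squared Euclidean distances d_ij^2, shape (c, n)
        d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                       # guard against zero distances
        # membership update: u_ij proportional to d_ij^(-2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)  # normalize over clusters
        # center update: mean weighted by u_ij^m
        w = U_new ** m
        centers = (w @ X) / w.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < eps     # ||U(t+1) - U(t)|| < eps
        U = U_new
        if converged:
            break
    return centers, U
```

Run on the butterfly data, the middle point receives a membership of about 0.5 to each cluster, which is exactly the behavior the slides motivate.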

FC algorithms (cont.)
FCM (cont.)
The probabilistic FCM is widely used as an initializer for other clustering methods. It is a fast, reliable, and stable method, and in practice it is unlikely to get stuck in local minima. But it has problems:
- lack of typicality,
- sensitivity to outliers.
Solution: PCM.

FC algorithms (cont.)
PCM algorithm
Repeat while $\|U^{(t+1)} - U^{(t)}\| \geq \varepsilon$ (or $\|C^{(t+1)} - C^{(t)}\| \geq \varepsilon$):
1. Compute the distances $d_{ij}$.
2. Compute the membership values (partition matrix), differently from FCM:
$$u_{ij} = \frac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}} \quad \text{for } i = 1, \ldots, c \text{ and } j = 1, \ldots, N$$
3. Compute the cluster centers, the same way as in FCM:
$$C_i^{(t+1)} = \frac{\sum_{j=1}^{N} \left( u_{ij}^{(t+1)} \right)^m \vec{x}_j}{\sum_{j=1}^{N} \left( u_{ij}^{(t+1)} \right)^m} \quad \text{for } i = 1, \ldots, c$$
A sketch of the changed membership step follows below.
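Relative to the FCM sketch above, only the membership step changes; η_i can be estimated from a preceding probabilistic (FCM) run via the fuzzy intra-class distance, as suggested earlier (again an illustrative sketch, not the slides' code):

```python
import numpy as np

def pcm_memberships(d2, eta, m=2.0):
    """Possibilistic update: d2 is the (c, n) matrix of squared distances, eta is (c,)."""
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))

def estimate_eta(d2, U, m=2.0):
    """Fuzzy intra-class distance of each cluster, from a probabilistic partition U (c, n)."""
    w = U ** m
    return (w * d2).sum(axis=1) / w.sum(axis=1)
```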

FC algorithms (cont.)
FCM vs. PCM
PCM solves the problems of FCM, but we face a new problem: cluster coincidence and cluster repulsion.

[Figure: FCM vs. PCM partitions of the same data set]

FC algorithms (cont.)
GK
Problem of FCM and PCM: they find only spherical clusters. In GK, each cluster is characterized by its center and its covariance matrix:
$$C_i = \{\vec{c}_i, \Sigma_i\}, \quad i = 1, \ldots, c$$
GK finds ellipsoidal clusters of approximately the same size. Because of the covariance matrix, the clusters adapt themselves to the shape and location of the data. The cluster size can be controlled by $\det(\Sigma_i)$; usually $\det(\Sigma_i) = 1$.

FC algorithms (cont.)
GK (cont.)
The Mahalanobis distance is used in GK:
$$d^2(\vec{x}_j, C_i) = \sqrt[p]{\det(\Sigma_i)} \; (\vec{x}_j - \vec{c}_i)^T \, \Sigma_i^{-1} \, (\vec{x}_j - \vec{c}_i)$$

Each cluster has its own size and shape, and the algorithm is locally adaptive. To minimize the objective function (either probabilistic or possibilistic), an update equation for the covariance matrix is also needed:
$$\Sigma_i^{(t+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m \left( \vec{x}_j - \vec{c}_i^{\,(t+1)} \right) \left( \vec{x}_j - \vec{c}_i^{\,(t+1)} \right)^T}{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m}$$

FC algorithms (cont.)
GK algorithm
Repeat while $\|U^{(t+1)} - U^{(t)}\| \geq \varepsilon$ (or $\|C^{(t+1)} - C^{(t)}\| \geq \varepsilon$):
1. Compute the (Mahalanobis) distances.
2. Compute the membership values (partition matrix), either probabilistically
$$u_{ij}^{(t+1)} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}}$$
or possibilistically
$$u_{ij}^{(t+1)} = \frac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}}$$
3. Compute the cluster centers and cluster covariance matrices for $i = 1, \ldots, c$:
$$C_i^{(t+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m \vec{x}_j}{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m}, \qquad \Sigma_i^{(t+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m \left( \vec{x}_j - \vec{c}_i^{\,(t+1)} \right) \left( \vec{x}_j - \vec{c}_i^{\,(t+1)} \right)^T}{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m}$$
A sketch of the distance and covariance steps follows below.
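Relative to FCM, the new pieces are the covariance update and the volume-normalized Mahalanobis distance; a minimal sketch (helper names are mine, not from the slides):

```python
import numpy as np

def gk_distances(X, centers, covs):
    """Squared GK distances (c, n): det(S_i)^(1/p) * (x - c_i)^T S_i^{-1} (x - c_i)."""
    c, p = centers.shape
    d2 = np.empty((c, len(X)))
    for i in range(c):
        diff = X - centers[i]                                   # (n, p)
        A = np.linalg.det(covs[i]) ** (1.0 / p) * np.linalg.inv(covs[i])
        d2[i] = np.einsum('nj,jk,nk->n', diff, A, diff)
    return np.fmax(d2, 1e-12)

def gk_covariances(X, centers, U, m=2.0):
    """Fuzzy covariance matrix of each cluster, weighted by u_ij^m. Returns (c, p, p)."""
    w = U ** m                                                  # (c, n)
    covs = []
    for i in range(len(centers)):
        diff = X - centers[i]                                   # (n, p)
        covs.append((w[i, :, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0)
                    / w[i].sum())
    return np.array(covs)
```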

FC algorithms (cont.)
FCM vs. GK
[Figure: FCM vs. GK partitions of the same data set]

FC algorithms (cont.)
Other non-point-prototype clustering models

Shell clustering algorithms are used for segmentation and for the detection of special geometric contours.

[Figure: shell clusters fitted to geometric contours]

Fuzzy Clustering

Outline:
- Introduction
- Problem Statement
- Fuzzy Clustering Algorithms
- Fuzzy Clustering Applications (this section)
- Discussion and Conclusions

FC applications
Typical Applications
- Fuzzy inference systems (FIS),
- Image processing,
- Pattern recognition,
- Machine learning,
- Data mining,
- Social network analysis,
- and more.

FC applications (cont.)
FIS
The fuzzy inference mechanism is summarized as:

[Figure: block diagram of a fuzzy inference system]

Q1: Where do the membership functions come from?
Q2: How are the if-then rules extracted from data?

FC applications (cont.)
FIS from Fuzzy Clustering
Cluster the data in:
- the input-output feature space,
- the input and output spaces separately, or
- the output space (inducing clusters in the inputs).
Obtain membership functions by:
- projection onto the variables, or
- parametrization of the membership function.
Extract one rule per cluster. Usually, FCM + a Mamdani FIS is used; a projection sketch follows below.
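One way to realize the projection step (a sketch under my own assumptions; the slides give no code): sort the data along one input variable and read off each cluster's memberships as a pointwise membership function, which can then be approximated parametrically.

```python
import numpy as np

def project_memberships(X, U, dim):
    """Project fuzzy memberships onto input variable `dim`.
    Returns the sorted variable values and the (c, n) memberships along them."""
    order = np.argsort(X[:, dim])
    return X[order, dim], U[:, order]

# Usage sketch (with the fcm() function defined earlier):
#   centers, U = fcm(X, c=3)
#   xs, mfs = project_memberships(X, U, dim=0)
#   # each row mfs[i] is a pointwise membership function over variable 0
```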

FC applications (cont.)
FIS from Fuzzy Clustering (cont.)

[Figure: membership functions and rules obtained from fuzzy clusters]


FC applications (cont.)
The same idea applied to Ruspini's butterfly:

[Figure: butterfly membership functions for m = 1.25 vs. m = 2]

FC applications (cont.)
Biomedical applications
- Tumor detection and extraction (cancer, cephalic mammography, ...),
- Image segmentation (MRI images, radiography, ...).

FC applications (cont.)
Tumor detection
[Figure: tumor detection results — crisp, adaptive, and fuzzy methods compared]

Fuzzy Clustering

Outline:
- Introduction
- Problem Statement
- Fuzzy Clustering Algorithms
- Fuzzy Clustering Applications
- Discussion and Conclusions (this section)

Conclusion
In summary
- The ability to cluster data (concepts, perceptions, etc.) is an essential feature of human intelligence.
- The main idea of FC is to partition the data into overlapping groups based on the similarity among patterns.
- The result of clustering is a set of clusters, the cluster centers, and a matrix containing the membership degrees.
- FCM yields spherical clusters, but is confused by equally distant data objects.
- PCM does not have the normalization constraint, but suffers from cluster coincidence and cluster repulsion.

Conclusion (cont.)
Summary (cont.)
- GK uses the covariance matrix; hence, it yields ellipsoidal clusters.
- The algorithms incorporate a fuzziness exponent ($m$), which determines how fuzzy the resulting partition is.

Conclusion (cont.)
Summary (cont.)
- FCM is widely used as an initializer for other clustering methods.
- FC methods are widely used to generate fuzzy membership functions for modeling fuzzy rule bases and inference systems.
- Compared to classical (crisp) clustering, FC methods are more effective in many applications.

Discussion
Related issues
Number of clusters (c):
- Yang, Shanlin & Malay proved that $c \leq n^{0.5}$.
- Elbow criterion: define a validity measure and evaluate it for different numbers of clusters to find an optimum point (elbow) where adding another cluster no longer adds sufficient information.
A sketch of the elbow loop follows below.
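A minimal sketch of the elbow loop, using Bezdek's partition coefficient $PC(U) = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^2$ as the validity measure (the measure is my choice; the slides only ask for *a* validity measure):

```python
import numpy as np

def partition_coefficient(U):
    """Bezdek's partition coefficient of a (c, n) fuzzy partition; ranges in [1/c, 1]."""
    return (U ** 2).sum() / U.shape[1]

# Elbow loop (usage sketch, with the fcm() function defined earlier):
#   for c in range(2, int(len(X) ** 0.5) + 1):   # c <= n^0.5, per the rule above
#       centers, U = fcm(X, c)
#       print(c, partition_coefficient(U))
#   # pick the c where the measure stops improving noticeably (the elbow)
```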

Discussion (cont.)
Related issues (cont.)

[Figure: validity measure vs. number of clusters, showing the elbow point]

Discussion (cont.)
Shape of membership function
Semantically, fuzzy sets are required to be convex and monotonous, with limited support.
- Does FCM support the above conditions?
- Does PCM lead to convex membership functions?
We should choose another cluster estimation method to obtain proper clusters, with the flexibility to choose the membership functions' support.

Discussion (cont.)
Shape of membership function (cont.)
A typical approach: triangular fuzzy membership functions.

[Figure: triangular membership functions fitted to FCM projections]

A construction sketch follows below.
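A common construction (my sketch, not prescribed by the slides): each triangular membership function peaks at one projected cluster center and falls to zero at the neighboring centers or the domain bounds.

```python
import numpy as np

def triangular_mfs(centers_1d, lo, hi):
    """One triangular MF per sorted 1-D cluster center; each peaks at its center
    and reaches zero at the neighboring centers (or the domain bounds lo, hi)."""
    c = np.sort(np.asarray(centers_1d, dtype=float))
    knots = np.concatenate(([lo], c, [hi]))
    def mf(i, x):
        a, b, d = knots[i], knots[i + 1], knots[i + 2]   # left foot, peak, right foot
        x = np.asarray(x, dtype=float)
        return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - b)), 0.0, 1.0)
    return [lambda x, i=i: mf(i, x) for i in range(len(c))]

mfs = triangular_mfs([0.2, 0.5, 0.8], lo=0.0, hi=1.0)
print(mfs[1](0.5))   # 1.0 at its own center
print(mfs[1](0.2))   # 0.0 at the neighboring center
```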

References
Journal papers:
- C. Döring, M.-J. Lesot, and R. Kruse, "Data Analysis with Fuzzy Clustering Methods," Computational Statistics & Data Analysis, 2006.
- A. Baraldi and P. Blonda, "A Survey of Fuzzy Clustering Algorithms for Pattern Recognition, Parts I and II," IEEE Trans. Systems, Man, and Cybernetics, vol. 29, no. 6, 1999.
- J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
- A.K. Jain, M.N. Murty, and P.J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, 1999.

Thesis:
- A.I. Shihab, Fuzzy Clustering Algorithms and Their Application to Medical Image Analysis, PhD thesis, University of London, 2000.
