
Fuzzy Clustering

Presented By: Omid Sayadi* Supervisor: Dr. Bagheri


* PhD student, Biomedical Image and Signal Processing Lab (BiSIPL), Department of Electrical Engineering, Sharif University of Technology, osayadi@ee.sharif.edu

Fuzzy Clustering

Outline:
- Introduction (this section)
- Problem Statement
- Fuzzy Clustering Algorithms
- Fuzzy Clustering Applications
- Discussion and Conclusions

Introduction
Cluster
A cluster is:
- a number of similar individuals that occur together, spanning a specific subspace of a concept; or
- a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.

[Figure: example clusters in the top speed (km/h) vs. weight (kg) plane: lorries, sport cars, medium market cars]

Introduction (cont.)
Clustering
The process of grouping a set of physical or abstract objects into classes of similar objects. "Clustering is the art of finding groups in data" (Kaufman & Rousseeuw).

Cluster analysis is an important human activity: distinguishing objects in early childhood, learning a new object or understanding a new phenomenon (feature extraction and comparison).

Introduction (cont.)
Motivation
- Discovering hidden patterns and structures,
- Organizing large sets of data into a small number of meaningful groups (clusters),
- Dealing with a manageable number of homogeneous groups instead of a vast number of single data objects,
- Data reduction and information compaction.

Introduction (cont.)
Clustering vs. Classification
- Clustering: unsupervised learning; no class labels are defined.
- Classification: supervised learning; predefined (a priori known) class labels, with a labeled training set and a test set.
Clustering is unsupervised classification: no classes are predefined (labeled).

Introduction (cont.)
Similarity measures
Clustering seeks maximum intra-cluster similarity and minimum inter-cluster similarity.

Introduction (cont.)
Similarity measure functions
Common distance functions for feature vectors $x, y \in \mathbb{R}^p$:

- Chebyshev: $d(x, y) = \max_i |x_i - y_i|$
- Euclidean: $d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$
- Hamming (city-block): $d(x, y) = \sum_{i=1}^{p} |x_i - y_i|$
- Minkowski: $d(x, y) = \left( \sum_{i=1}^{p} |x_i - y_i|^q \right)^{1/q}$

A small numeric illustration follows below.
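All four distances are instances of the Minkowski distance; the following snippet (an illustration added here, not in the original slides) makes this concrete for q = 1, 2, and ∞:

```python
import numpy as np

def minkowski(x, y, q):
    """Minkowski distance of order q between two feature vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if np.isinf(q):                       # q -> infinity yields the Chebyshev distance
        return np.max(np.abs(x - y))
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

x, y = [1.0, 2.0, 3.0], [4.0, 0.0, 3.5]
print(minkowski(x, y, 1))        # Hamming / city-block distance: 5.5
print(minkowski(x, y, 2))        # Euclidean distance: ~3.64
print(minkowski(x, y, np.inf))   # Chebyshev distance: 3.0
```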

Introduction (cont.)
Clustering Approaches
- Hierarchical algorithms: find successive clusters using previously established clusters.
- Partitioning algorithms: construct various partitions and then evaluate them; all clusters are determined at once.
- Model-based algorithms
- Grid-based algorithms
- Density-based algorithms

Introduction (cont.)
Hierarchical Clustering

Create a hierarchical decomposition of the data set using some criterion and a termination condition:
- Divisive (top-down)
- Agglomerative (bottom-up)

Introduction (cont.)
Divisive vs Agglomerative

[Figure: divisive (top-down) vs. agglomerative (bottom-up) hierarchical clustering]

Introduction (cont.)
Partitional Clustering
Given a database of N objects, partition the objects into a pre-specified number of K clusters.
The number of distinct ways to cluster $N$ objects into $K$ clusters (Liu, 1968):
$$M(N, K) = \frac{1}{K!} \sum_{i=0}^{K} (-1)^i \binom{K}{i} (K - i)^N$$
A quick numeric check follows below.
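This formula can be evaluated directly (an illustrative snippet, not from the slides); the count grows explosively with N, which is why exhaustive search over all partitions is hopeless:

```python
from math import comb, factorial

def num_partitions(n, k):
    """Number of ways to partition n objects into k nonempty clusters (Liu, 1968)."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1)) // factorial(k)

print(num_partitions(4, 2))    # 7
print(num_partitions(25, 5))   # about 2.4e15
```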

The clusters are formed to optimize a similarity criterion (maximum intra-cluster similarity, minimum inter-cluster similarity). Popular partitioning algorithms: K-means and EM (Expectation Maximization).

Introduction (cont.)
Challenges
- Hierarchical algorithms: the tree of clusters (dendrogram) requires a termination criterion (dendrogram cutting); the agglomerative or divisive splits and merges are irreversible.
- Partitioning algorithms: the number of clusters (K) must be pre-selected.

Introduction (cont.)
K-means algorithm
Given the number of clusters (K), partition the objects (randomly) into K nonempty subsets. While new assignments occur, do:
1. Compute seed points as the centroids (virtual mean points) of the clusters of the current partition.
2. Assign each object to the cluster with the nearest seed point.
A runnable sketch follows below.
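A minimal NumPy sketch of this loop (illustrative; function and variable names are my own, not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means. X: (n, p) data matrix. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # assign each object to the cluster with the nearest seed point
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # no new assignments: converged
        labels = new_labels
        # recompute seed points as the centroids of the current partition
        for i in range(k):
            if np.any(labels == i):
                centroids[i] = X[labels == i].mean(axis=0)
    return centroids, labels
```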

Introduction (cont.)
K-means example
[Figure: three snapshots of K-means iterating on 2-D data in the unit square. Problem: a point at equal distance to two centroids cannot be assigned unambiguously!]

Introduction (cont.)
Taxonomy of Clustering Approaches

[Figure: taxonomy of clustering approaches]

Fuzzy Clustering

Outline:
- Introduction
- Problem Statement (this section)
- Fuzzy Clustering Algorithms
- Fuzzy Clustering Applications
- Discussion and Conclusions

Problem Statement
HCM (K-means) Formulation
$X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$ — the set of data in the feature space
$C_i$ — the $i$th cluster

All clusters $C_i$ together fill the whole universe $U$:
$$\bigcup_{i=1}^{c} C_i = U$$

Clusters do not overlap:
$$C_i \cap C_j = \emptyset \quad \text{for all } i \neq j$$

A cluster $C_i$ is never empty and is smaller than the whole universe $U$:
$$\emptyset \subset C_i \subset U \quad \text{for all } i$$

There must be at least 2 clusters in a c-partition and at most as many as the number of data points $n$:
$$2 \leq c \leq n$$

Problem Statement (cont.)


K-means Failures
The objective function in classical clustering:
$$J = \sum_{i=1}^{K} J_i = \sum_{i=1}^{K} \sum_{k,\ \vec{x}_k \in C_i} \left\| \vec{x}_k - \vec{c}_i \right\|^2$$
i.e., minimize the total sum of all distances.

Each datum must be assigned to exactly one cluster, which leaves no sensible assignment for data points that are equally distant from two centroids.

Problem Statement (cont.)


Equi-distant data points: Ruspini's butterfly data set (1969).

[Figure: Ruspini's butterfly — a middle point equally distant from both clusters]

Problem Statement (cont.)


Towards Fuzzy Clustering
We need to support uncertainty: each datum can belong to multiple clusters with a varying degree of membership, so the space is partitioned into overlapping groups.

[Figure: the same data partitioned into crisp clusters vs. fuzzy clusters]

Problem Statement (cont.)


Fuzzy C-Partition Formulation
$X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$ — the set of data in the feature space
$C_i$ — the $i$th cluster

All clusters $C_i$ together fill the whole universe $U$:
$$\bigcup_{i=1}^{c} C_i = U$$

Clusters may overlap:
$$C_i \cap C_j = \emptyset \ \text{or} \ \neq \emptyset \quad \text{for all } i \neq j$$

A cluster $C_i$ is never empty and is smaller than the whole universe $U$:
$$\emptyset \subset C_i \subset U \quad \text{for all } i$$

There must be at least 2 clusters in a c-partition and at most as many as the number of data points $n$:
$$2 \leq c \leq n$$

Problem Statement (cont.)


Fuzzy Clustering Types
- Hard clustering
- Fuzzy clustering:
  - Probabilistic fuzzy clustering (Bezdek, 1981), obtained by omitting the non-overlapping condition,
  - Possibilistic fuzzy clustering (Krishnapuram & Keller, 1993).

Problem Statement (cont.)


Probabilistic Fuzzy Clustering
A constrained optimization problem:

$u_{ij} = \mu_{C_i}(\vec{x}_j) \in [0, 1]$ — the membership degree of datum $\vec{x}_j$ to the $i$th cluster
$\vec{u}_j = (u_{1j}, \ldots, u_{cj})^T$ — the fuzzy label vector of data point $\vec{x}_j$
$U = [u_{ij}]_{c \times n} = (\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_n)$ — the fuzzy partition matrix

$$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2$$

subject to:
$$\sum_{j=1}^{n} u_{ij} > 0 \quad \text{for all } i \in \{1, \ldots, c\} \quad \text{(no empty cluster)}$$
$$\sum_{i=1}^{c} u_{ij} = 1 \quad \text{for all } j \in \{1, \ldots, n\} \quad \text{(normalization constraint)}$$

Problem Statement (cont.)


Probabilistic Fuzzy Clustering (cont.)
$$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2$$

$d_{ij}$ — the distance between datum $\vec{x}_j$ and cluster $i$
$m$ — the fuzzifier exponent ($m > 1$); usually $m = 2$

$m$ determines the fuzziness of the clustering:
- $m \to 1$: more crisp clustering,
- $m \to \infty$: more fuzzy clustering.

[Figure: membership degrees for m = 1.1 vs. m = 2]

Problem Statement (cont.)


Probabilistic Fuzzy Clustering (cont.)
The cost function Jf cannot be minimized directly, hence an alternative optimization scheme (AO) must be used. The iterative algorithm:
First, the membership degrees are optimized for fixed cluster parameters: U t = jU (Ct 1 ), t > 0 Then, the cluster parameters are optimized for fixed membership degrees: Ct = jc (U t )
Sharif University of Technology 26

Spring 2008

Problem Statement (cont.)


Probabilistic Fuzzy Clustering (cont.)
The minimization yields the update formula for the membership degrees:
$$u_{ij}^{(t+1)} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}}$$
i.e., the gravitation to cluster $i$ relative to the total gravitation. It depends not only on the distance of datum $\vec{x}_j$ to cluster $i$, but also on the distances between this data point and the other clusters.

Problem Statement (cont.)


Probabilistic Fuzzy Clustering (cont.)
What about the cluster prototypes ($C$)? They are algorithm dependent, i.e. they depend on:
- the parameters describing the cluster (location, shape, size),
- the distance measure $d$.

Problem: lack of typicality. The normalization constraint causes the clusters to be drawn toward outliers, and there is no difference between $x_1$ and $x_2$ (membership 0.5 for both).

Problem Statement (cont.)


Possibilistic Fuzzy Clustering
Idea: drop the normalization condition
$$\sum_{i=1}^{c} u_{ij} = 1 \quad \text{for all } j \in \{1, \ldots, n\}$$
of probabilistic fuzzy clustering; the no-empty-cluster condition remains. A penalty term then forces the membership degrees away from zero.

The cost function to be minimized:
$$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - u_{ij})^m$$

Problem Statement (cont.)


Possibilistic Fuzzy Clustering (cont.)
$$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - u_{ij})^m$$

$\eta_i > 0$ balances the two contrary objectives expressed in the terms above. The minimization yields:
$$u_{ij} = \frac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}}$$

Result: the membership degree of a datum $\vec{x}_j$ to cluster $i$ depends only on its distance to this cluster.

Problem Statement (cont.)


More about $\eta_i$

Let $m = 2$ in the update equation above. If $\eta_i$ equals $d_{ij}^2$, then $u_{ij} = 0.5$. Hence, $\eta_i$ determines the distance to cluster $i$ at which the membership degree equals 0.5, so the permitted extension of the cluster can be controlled by this parameter. $\eta_i$ can be estimated by the fuzzy intra-class distance of the probabilistic fuzzy clustering model:
$$\eta_i = \frac{\sum_{j=1}^{n} u_{ij}^m d_{ij}^2}{\sum_{j=1}^{n} u_{ij}^m}$$

Fuzzy Clustering

Outline:
- Introduction
- Problem Statement
- Fuzzy Clustering Algorithms (this section)
- Fuzzy Clustering Applications
- Discussion and Conclusions

FC algorithms
Major algorithms
- Fuzzy c-means (FCM)
- Possibilistic c-means (PCM)
- Gustafson-Kessel (GK)

Assumptions:
- Input: the data matrix ($X_{p \times n}$) and the number of clusters ($c$).
- Output: the cluster centers ($C$) and the fuzzy partition matrix ($U$).
- Cluster centers are initialized randomly in all algorithms.

FC algorithms (cont.)
FCM
A probabilistic fuzzy clustering approach:
- finds $c$ spherical clusters; the cluster prototype is the cluster center $C_i$,
- the found clusters are of approximately the same size,
- distance measure: Euclidean distance.

According to the objective function $J_f$, the cluster prototypes are updated as:
$$C_i^{(t+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m \vec{x}_j}{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m}$$

FC algorithms (cont.)
FCM algorithm
Repeat while $\|U^{(t+1)} - U^{(t)}\| \geq \varepsilon$ (or $\|C^{(t+1)} - C^{(t)}\| \geq \varepsilon$):
1. Compute the distances $d_{ij}$.
2. Compute the membership values (partition matrix):
$$u_{ij}^{(t+1)} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}} \quad \text{for } i = 1, \ldots, c \text{ and } j = 1, \ldots, N$$
3. Compute the cluster centers:
$$C_i^{(t+1)} = \frac{\sum_{j=1}^{N} \left( u_{ij}^{(t+1)} \right)^m \vec{x}_j}{\sum_{j=1}^{N} \left( u_{ij}^{(t+1)} \right)^m} \quad \text{for } i = 1, \ldots, c$$
A compact implementation sketch follows below.
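The loop above translates almost line-for-line into NumPy. A minimal sketch (the function name and the stopping test on U are my choices, not the slides'):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means. X: (n, p) data matrix. Returns (centers, U), U of shape (c, n)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    U = np.zeros((c, len(X)))
    for _ in range(max_iter):
        # squared Euclidean distances d_ij^2, shape (c, n)
        d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                       # guard against zero distances
        # membership update: u_ij proportional to d_ij^(-2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)  # normalize over clusters
        # center update: mean weighted by u_ij^m
        w = U_new ** m
        centers = (w @ X) / w.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < eps     # ||U(t+1) - U(t)|| < eps
        U = U_new
        if converged:
            break
    return centers, U
```

Run on the butterfly data, the middle point receives a membership of about 0.5 to each cluster, which is exactly the behavior the slides motivate.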

FC algorithms (cont.)
FCM (cont.)
The probabilistic FCM is widely used as an initializer for other clustering methods. It is a fast, reliable, and stable method, and in practice it is unlikely to get stuck in local minima. But it has problems:
- lack of typicality,
- sensitivity to outliers.
Solution: PCM.

FC algorithms (cont.)
PCM algorithm
Repeat while $\|U^{(t+1)} - U^{(t)}\| \geq \varepsilon$ (or $\|C^{(t+1)} - C^{(t)}\| \geq \varepsilon$):
1. Compute the distances $d_{ij}$.
2. Compute the membership values (partition matrix), differently from FCM:
$$u_{ij} = \frac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}} \quad \text{for } i = 1, \ldots, c \text{ and } j = 1, \ldots, N$$
3. Compute the cluster centers, the same way as in FCM:
$$C_i^{(t+1)} = \frac{\sum_{j=1}^{N} \left( u_{ij}^{(t+1)} \right)^m \vec{x}_j}{\sum_{j=1}^{N} \left( u_{ij}^{(t+1)} \right)^m} \quad \text{for } i = 1, \ldots, c$$
A sketch of the changed membership step follows below.
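Relative to the FCM sketch above, only the membership step changes; η_i can be estimated from a preceding probabilistic (FCM) run via the fuzzy intra-class distance, as suggested earlier (again an illustrative sketch, not the slides' code):

```python
import numpy as np

def pcm_memberships(d2, eta, m=2.0):
    """Possibilistic update: d2 is the (c, n) matrix of squared distances, eta is (c,)."""
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))

def estimate_eta(d2, U, m=2.0):
    """Fuzzy intra-class distance of each cluster, from a probabilistic partition U (c, n)."""
    w = U ** m
    return (w * d2).sum(axis=1) / w.sum(axis=1)
```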

FC algorithms (cont.)
FCM vs. PCM
PCM solves the problems of FCM, but we face a new problem: cluster coincidence and cluster repulsion.

[Figure: FCM vs. PCM partitions of the same data set]

FC algorithms (cont.)
GK
Problem of FCM and PCM: they find only spherical clusters. In GK, each cluster is characterized by its center and its covariance matrix:
$$C_i = \{\vec{c}_i, \Sigma_i\}, \quad i = 1, \ldots, c$$
GK finds ellipsoidal clusters of approximately the same size. Because of the covariance matrix, the clusters adapt themselves to the shape and location of the data. The cluster size can be controlled by $\det(\Sigma_i)$; usually $\det(\Sigma_i) = 1$.

FC algorithms (cont.)
GK (cont.)
The Mahalanobis distance is used in GK:
$$d^2(\vec{x}_j, C_i) = \sqrt[p]{\det(\Sigma_i)} \; (\vec{x}_j - \vec{c}_i)^T \, \Sigma_i^{-1} \, (\vec{x}_j - \vec{c}_i)$$

Each cluster has its own size and shape, and the algorithm is locally adaptive. To minimize the objective function (either probabilistic or possibilistic), an update equation for the covariance matrix is also needed:
$$\Sigma_i^{(t+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m \left( \vec{x}_j - \vec{c}_i^{\,(t+1)} \right) \left( \vec{x}_j - \vec{c}_i^{\,(t+1)} \right)^T}{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m}$$

FC algorithms (cont.)
GK algorithm
Repeat while $\|U^{(t+1)} - U^{(t)}\| \geq \varepsilon$ (or $\|C^{(t+1)} - C^{(t)}\| \geq \varepsilon$):
1. Compute the (Mahalanobis) distances.
2. Compute the membership values (partition matrix), either probabilistically
$$u_{ij}^{(t+1)} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}}$$
or possibilistically
$$u_{ij}^{(t+1)} = \frac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}}$$
3. Compute the cluster centers and cluster covariance matrices for $i = 1, \ldots, c$:
$$C_i^{(t+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m \vec{x}_j}{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m}, \qquad \Sigma_i^{(t+1)} = \frac{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m \left( \vec{x}_j - \vec{c}_i^{\,(t+1)} \right) \left( \vec{x}_j - \vec{c}_i^{\,(t+1)} \right)^T}{\sum_{j=1}^{n} \left( u_{ij}^{(t+1)} \right)^m}$$
A sketch of the distance and covariance steps follows below.
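Relative to FCM, the new pieces are the covariance update and the volume-normalized Mahalanobis distance; a minimal sketch (helper names are mine, not from the slides):

```python
import numpy as np

def gk_distances(X, centers, covs):
    """Squared GK distances (c, n): det(S_i)^(1/p) * (x - c_i)^T S_i^{-1} (x - c_i)."""
    c, p = centers.shape
    d2 = np.empty((c, len(X)))
    for i in range(c):
        diff = X - centers[i]                                   # (n, p)
        A = np.linalg.det(covs[i]) ** (1.0 / p) * np.linalg.inv(covs[i])
        d2[i] = np.einsum('nj,jk,nk->n', diff, A, diff)
    return np.fmax(d2, 1e-12)

def gk_covariances(X, centers, U, m=2.0):
    """Fuzzy covariance matrix of each cluster, weighted by u_ij^m. Returns (c, p, p)."""
    w = U ** m                                                  # (c, n)
    covs = []
    for i in range(len(centers)):
        diff = X - centers[i]                                   # (n, p)
        covs.append((w[i, :, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0)
                    / w[i].sum())
    return np.array(covs)
```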

FC algorithms (cont.)
FCM vs. GK
[Figure: FCM vs. GK partitions of the same data set]

FC algorithms (cont.)
Other non-point-prototype clustering models

Shell clustering algorithms are used for segmentation and for the detection of special geometric contours.

[Figure: shell clusters fitted to geometric contours]

Fuzzy Clustering

Outline:
- Introduction
- Problem Statement
- Fuzzy Clustering Algorithms
- Fuzzy Clustering Applications (this section)
- Discussion and Conclusions

FC applications
Typical Applications
- Fuzzy inference systems (FIS),
- Image processing,
- Pattern recognition,
- Machine learning,
- Data mining,
- Social network analysis,
- and more.

FC applications (cont.)
FIS
The fuzzy inference mechanism is summarized as:

[Figure: block diagram of a fuzzy inference system]

Q1: Where do the membership functions come from?
Q2: How are the if-then rules extracted from data?

FC applications (cont.)
FIS from Fuzzy Clustering
Cluster the data in:
- the input-output feature space,
- the input and output spaces separately, or
- the output space (inducing clusters in the inputs).
Obtain membership functions by:
- projection onto the variables, or
- parametrization of the membership function.
Extract one rule per cluster. Usually, FCM + a Mamdani FIS is used; a projection sketch follows below.
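One way to realize the projection step (a sketch under my own assumptions; the slides give no code): sort the data along one input variable and read off each cluster's memberships as a pointwise membership function, which can then be approximated parametrically.

```python
import numpy as np

def project_memberships(X, U, dim):
    """Project fuzzy memberships onto input variable `dim`.
    Returns the sorted variable values and the (c, n) memberships along them."""
    order = np.argsort(X[:, dim])
    return X[order, dim], U[:, order]

# Usage sketch (with the fcm() function defined earlier):
#   centers, U = fcm(X, c=3)
#   xs, mfs = project_memberships(X, U, dim=0)
#   # each row mfs[i] is a pointwise membership function over variable 0
```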

FC applications (cont.)
FIS from Fuzzy Clustering (cont.)

[Figure: membership functions and rules obtained from fuzzy clusters]


FC applications (cont.)
The same idea applied to Ruspini's butterfly:

[Figure: butterfly membership functions for m = 1.25 vs. m = 2]

FC applications (cont.)
Biomedical applications
- Tumor detection and extraction (cancer, cephalic mammography, ...),
- Image segmentation (MRI images, radiography, ...).

FC applications (cont.)
Tumor detection
[Figure: tumor detection results — crisp, adaptive, and fuzzy methods compared]

Fuzzy Clustering

Outline:
- Introduction
- Problem Statement
- Fuzzy Clustering Algorithms
- Fuzzy Clustering Applications
- Discussion and Conclusions (this section)

Conclusion
In summary
- The ability to cluster data (concepts, perceptions, etc.) is an essential feature of human intelligence.
- The main idea of FC is to partition the data into overlapping groups based on the similarity among patterns.
- The result of clustering is a set of clusters, the cluster centers, and a matrix containing the membership degrees.
- FCM yields spherical clusters, but is confused by equally distant data objects.
- PCM does not have the normalization constraint, but suffers from cluster coincidence and cluster repulsion.

Conclusion (cont.)
Summary (cont.)
- GK uses the covariance matrix; hence, it yields ellipsoidal clusters.
- The algorithms incorporate a fuzziness exponent ($m$), which determines how fuzzy the resulting partition is.

Conclusion (cont.)
Summary (cont.)
- FCM is widely used as an initializer for other clustering methods.
- FC methods are widely used to generate fuzzy membership functions for modeling fuzzy rule bases and inference systems.
- Compared to classical (crisp) clustering, FC methods are more effective in many applications.

Discussion
Related issues
Number of clusters (c):
- Yang, Shanlin & Malay proved that $c \leq n^{0.5}$.
- Elbow criterion: define a validity measure and evaluate it for different numbers of clusters to find an optimum point (elbow) where adding another cluster no longer adds sufficient information.
A sketch of the elbow loop follows below.
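A minimal sketch of the elbow loop, using Bezdek's partition coefficient $PC(U) = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^2$ as the validity measure (the measure is my choice; the slides only ask for *a* validity measure):

```python
import numpy as np

def partition_coefficient(U):
    """Bezdek's partition coefficient of a (c, n) fuzzy partition; ranges in [1/c, 1]."""
    return (U ** 2).sum() / U.shape[1]

# Elbow loop (usage sketch, with the fcm() function defined earlier):
#   for c in range(2, int(len(X) ** 0.5) + 1):   # c <= n^0.5, per the rule above
#       centers, U = fcm(X, c)
#       print(c, partition_coefficient(U))
#   # pick the c where the measure stops improving noticeably (the elbow)
```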

Discussion (cont.)
Related issues (cont.)

[Figure: validity measure vs. number of clusters, showing the elbow point]

Discussion (cont.)
Shape of membership function
Semantically, fuzzy sets are required to be convex and monotonous, with limited support.
- Does FCM support the above conditions?
- Does PCM lead to convex membership functions?
We should choose another cluster estimation method to obtain proper clusters, with the flexibility to choose the membership functions' support.

Discussion (cont.)
Shape of membership function (cont.)
A typical approach: triangular fuzzy membership functions.

[Figure: triangular membership functions fitted to FCM projections]

A construction sketch follows below.
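A common construction (my sketch, not prescribed by the slides): each triangular membership function peaks at one projected cluster center and falls to zero at the neighboring centers or the domain bounds.

```python
import numpy as np

def triangular_mfs(centers_1d, lo, hi):
    """One triangular MF per sorted 1-D cluster center; each peaks at its center
    and reaches zero at the neighboring centers (or the domain bounds lo, hi)."""
    c = np.sort(np.asarray(centers_1d, dtype=float))
    knots = np.concatenate(([lo], c, [hi]))
    def mf(i, x):
        a, b, d = knots[i], knots[i + 1], knots[i + 2]   # left foot, peak, right foot
        x = np.asarray(x, dtype=float)
        return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - b)), 0.0, 1.0)
    return [lambda x, i=i: mf(i, x) for i in range(len(c))]

mfs = triangular_mfs([0.2, 0.5, 0.8], lo=0.0, hi=1.0)
print(mfs[1](0.5))   # 1.0 at its own center
print(mfs[1](0.2))   # 0.0 at the neighboring center
```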

References
Journal papers:
- C. Döring, M.-J. Lesot, and R. Kruse, "Data Analysis with Fuzzy Clustering Methods," Computational Statistics & Data Analysis, 2006.
- A. Baraldi and P. Blonda, "A Survey of Fuzzy Clustering Algorithms for Pattern Recognition, Parts I and II," IEEE Trans. Systems, Man, and Cybernetics, vol. 29, no. 6, 1999.
- J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
- A.K. Jain, M.N. Murty, and P.J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, 1999.

Thesis:
- A.I. Shihab, Fuzzy Clustering Algorithms and Their Application to Medical Image Analysis, PhD thesis, University of London, 2000.
