
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 4, NO. 3, AUGUST 1996

clusters. However, if we in fact knew that the center point had a label that was the same as that of the cluster on the left, then a membership of 0.5 does not make sense from a classification point of view. The partition generated by a fuzzy clustering algorithm has more to do with the distance measure chosen than the class labels associated with the data points. Thus, although in specific cases the partition may roughly correspond with the correct classification, this is simply fortuitous unless we, in fact, know the distributions of the clusters in feature space and choose the distance measures and the clustering algorithm accordingly.

VII. CONCLUSION

Since the objective function of the PCM is a modification of that of the FCM, the PCM may appear to be a close cousin of the FCM. However, as shown in this correspondence, there are fundamental differences between the two algorithms. The FCM is primarily a partitioning algorithm, whereas the PCM is primarily a mode-seeking algorithm. The power of the PCM does not lie in creating partitions, but rather in finding meaningful clusters as defined by dense regions. Its strengths are that it overcomes the need to specify the number of clusters and that it is highly robust in the presence of noise and outliers. Its weakness is that it requires a good initialization and a reliable scale estimate to function effectively. When the data is not severely contaminated, the FCM can provide a reasonable initialization and a scale estimate. Thus, with the proper choice of the scale and fuzzifier parameters, the PCM can be used to improve the results of the FCM.

The situation is quite different when the data set is highly noisy. The least squares (LS) algorithm is a general regression algorithm in statistics that minimizes the sum of the squares of the residues, and it has been used extensively in engineering applications. It has been shown [6], [7] that the FCM is a generalization of the least squares technique that uses harmonic means of distances to prototypes as residues. It is well known that LS analysis is severely compromised by a single outlier in the data set. Thus, the FCM can completely break down in the presence of a single outlier. In contrast, it has been shown [6]-[8] that the PCM is a robust parameter estimation technique that is related to the M-estimator [16], which has been widely used in robust statistics with good results. Robust techniques can tolerate up to 50% noise [4]. As shown in Fig. 3, the FCM is quite inadequate as an initialization and scale estimation tool when noise is present. Fortunately, there are many techniques in robust statistics that can help us in this regard.
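To make the harmonic-mean connection above concrete, here is a sketch of the standard reformulation of the FCM objective, written for the common choice m = 2 and Euclidean distances; the symbol R(v) is our notation, not taken from the correspondence.

```latex
% Eliminating U from the FCM objective via the membership update
% u_{ik} = d_{ik}^{-2} / \sum_{j} d_{jk}^{-2} (the m = 2 case) gives
R(v) \;=\; \sum_{k=1}^{n} \Biggl( \sum_{i=1}^{c} \frac{1}{d_{ik}^{2}} \Biggr)^{-1}
     \;=\; \frac{1}{c} \sum_{k=1}^{n} \mathrm{HM}\bigl( d_{1k}^{2}, \ldots, d_{ck}^{2} \bigr)
% so each data point contributes (up to the factor 1/c) the harmonic mean
% of its squared distances to the c prototypes; these terms play the role
% of the squared residues in ordinary least squares.
```

Because every such term grows without bound as a point moves away from all the prototypes, a single gross outlier can still dominate the sum, which is the breakdown behavior referred to above.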
REFERENCES

[1] R. Krishnapuram and J. Keller, "A possibilistic approach to clustering," IEEE Trans. Fuzzy Syst., vol. 1, pp. 98-110, May 1993.
[2] J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles. Reading, MA: Addison-Wesley, 1974.
[3] G. Beni and X. Liu, "A least biased fuzzy clustering method," IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp. 954-960, Sept. 1994.
[4] P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection. New York: Wiley, 1987.
[5] R. Davé and R. Krishnapuram, "Robust algorithms for clustering," in Proc. Int. Fuzzy Syst. Assoc. Congress, São Paulo, Brazil, July 1995, vol. I, pp. 561-564.
[6] J. Kim, R. Krishnapuram, and R. Davé, "On robustifying the c-means algorithms," in Proc. North Amer. Fuzzy Informat. Processing Soc. Conf., College Park, MD, Sept. 1995, pp. 630-635.
[7] O. Nasraoui and R. Krishnapuram, "Crisp interpretations of fuzzy and possibilistic clustering algorithms," in Proc. Eur. Congress Fuzzy Intell. Technol., Aachen, Germany, Sept. 1995, pp. 1312-1318.
[8] R. Davé and R. Krishnapuram, "Robust clustering methods: a unified view," IEEE Trans. Fuzzy Syst., to be published.
[9] D. E. Gustafson and W. C. Kessel, "Fuzzy clustering with a fuzzy covariance matrix," in Proc. IEEE CDC, San Diego, CA, 1979, pp. 761-766.
[10] R. Krishnapuram and C.-P. Freg, "Fitting an unknown number of lines and planes to image data through compatible cluster merging," Pattern Recogn., vol. 25, pp. 385-400, Apr. 1992.
[11] R. Krishnapuram, "Generation of membership functions via possibilistic clustering," in Proc. 3rd IEEE Conf. Fuzzy Syst., Orlando, FL, July 1994, pp. 902-908.
[12] M. Barni, V. Cappellini, and A. Mecocci, "Comments on 'A Possibilistic Approach to Clustering,'" IEEE Trans. Fuzzy Syst., vol. 4, pp. 393-396, Aug. 1996.
[13] W. Dzwinel, "In search of the global minimum in problems of feature extraction and selection," in Proc. Eur. Congress Intell. Tech. Soft Computing, Aachen, Germany, Sept. 1995, pp. 1326-1330.
[14] S. Medasani, J. Kim, and R. Krishnapuram, "Estimation of membership functions for pattern recognition and computer vision," in Fuzzy Logic and its Applications to Engineering, Information Sciences, and Intelligent Systems, K. C. Min and Z. Bien, Eds. Norwell, MA: Kluwer, 1995, pp. 45-54.
[15] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[16] P. J. Huber, Robust Statistics. New York: Wiley, 1981.

Comments on "A Possibilistic Approach to Clustering"

M. Barni, V. Cappellini, and A. Mecocci

Manuscript received March 30, 1995; revised August 1, 1995.
M. Barni and V. Cappellini are with the Department of Electronic Engineering, University of Florence, Via S. Marta 3, 50139 Firenze, Italy.
A. Mecocci is with the Department of Electronic Engineering, University of Pavia, Via Abbiategrasso 209, 27100 Pavia, Italy.
Publisher Item Identifier S 1063-6706(96)05624-X.
¹R. Krishnapuram and J. M. Keller, IEEE Trans. Fuzzy Syst., vol. 1, pp. 98-110, May 1993.

Abstract: In this comment, we report a difficulty with the application of the possibilistic approach to fuzzy clustering (PCM) proposed by Keller and Krishnapuram. In applying this algorithm we found that it has the undesirable tendency to produce coincident clusters. Results illustrating this tendency are reported and a possible explanation for the PCM behavior is suggested.

I. INTRODUCTION

In their paper,¹ Krishnapuram and Keller presented a new approach to fuzzy clustering [possibilistic c-means (PCM)]. By relaxing the constraint that the memberships of a data point across classes sum to one, each cluster is disentangled from the others and the membership values are interpreted as the compatibilities of the point to the class prototypes. Besides, the possibilistic approach leads to higher noise immunity with respect to classical algorithms derived from Bezdek's fuzzy c-means (FCM) [1]. Indeed, the novel approach is very interesting since, by recasting fuzzy clustering into the framework of possibility theory, membership functions are directly related to the typicality of data points with respect to the given classes. In this way, classification tasks are made easier and the impact of spurious points on the final partition is reduced.

The purpose here is to describe our experience in applying the PCM algorithm, whose performance, we found, is severely compromised by the tendency it has to produce coincident clusters. First, experimental results are reported; then a rationale for the difficulties we encountered is suggested.




Fig. 1. One band (out of seven) of a multichannel satellite image.

Fig. 2. FCM partition of the multichannel image reported in Fig. 1. A random partition has been adopted as initialization. For each cluster an image is shown whose pixel values represent the membership to the cluster. Clusters are found to group pixels belonging to different land cover classes: (a) water, (b) wooded areas, (c) agricultural land, and (d) urban areas.

Fig. 3. PCM partition of the multichannel image reported in Fig. 1. The FCM partition of Fig. 2 has been used to initialize PCM. Except for cluster (a), which groups water pixels, three almost coincident clusters have been produced (b)-(d).

Fig. 4. Synthetic data set consisting of three visual clusters.

II. TEST RESULTS

According to the classical FCM approach, given n points to be clustered, the membership functions of each cluster are determined by minimizing the quantity

J_m(U, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} d_{ik}^{2}    (1)

subject to the constraints

\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1, \ldots, n    (2)

where X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^p is the data set to be clustered, c is the number of clusters, U is the partition we look for, u_{ik} is the grade of membership of x_k to the ith cluster, v is the set of cluster centers, and d_{ik} is the distance between x_k and the center of the ith cluster, i.e., d_{ik} = \|x_k - v_i\|. The minimization of J_m(U, v) is obtained by means of an iterative process which, by starting from a given set of initial cluster centers, is assumed to terminate at a local minimum [2]. In their paper, Krishnapuram and Keller relax the constraints expressed by (2) so that columns need not sum to one. To avoid the trivial null solution, they modify the objective function according to the following definition

J_m(U, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} d_{ik}^{2} + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{n} (1 - u_{ik})^{m}    (3)

where \eta_i are suitable positive numbers to be chosen by the user.


Fig. 5. FCM partition of the data set shown in Fig. 4. Hard sets depicted in the figure have been obtained by thresholding the fuzzy clusters produced by FCM. For each set the corresponding cluster center is reported. A threshold equal to 0.5 was used.

Fig. 6. PCM partition of the data set reported in Fig. 4. Hard sets depicted in the figure have been obtained by thresholding PCM possibilistic clusters. For each set the corresponding cluster center is reported. A threshold equal to 0.5 has been adopted. The three clusters coincide.

We applied the PCM algorithm to several data sets, stemming from different application fields, and we realized that the possibilistic approach has an undesirable tendency to produce coincident clusters. Figs. 1-3 illustrate such a tendency. In Fig. 1, one band out of seven of a multichannel satellite image is shown. First, the classical FCM clustering has been applied to the multichannel image. The number of clusters c and the exponential parameter m have been set to four and two, respectively, while initial cluster centers have been randomly chosen. FCM output is shown in Fig. 2: for each cluster, an image is reported whose pixel values represent the degree of membership to the cluster. A comparison between FCM output and a land cover map proves the effectiveness of the FCM in distinguishing among different ground cover classes: the first cluster groups water pixels [Fig. 2(a)], the second cluster is representative of wooded areas [Fig. 2(b)], the third one consists of pixels belonging to agricultural land [Fig. 2(c)], whereas the fourth cluster identifies urban areas [Fig. 2(d)].

A problem with PCM is that it needs a good initialization to provide accurate clustering. In their paper, Krishnapuram and Keller suggest using the output of FCM as an initial partition. They also suggest applying the PCM algorithm twice: the first time by setting

\eta_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} d_{ik}^{2}}{\sum_{k=1}^{n} u_{ik}^{m}}    (4)

while the second time it should be

\eta_i = \frac{\sum_{x_k \in (\Lambda_i)_\alpha} d_{ik}^{2}}{|(\Lambda_i)_\alpha|}    (5)

where (\Lambda_i)_\alpha is an appropriate \alpha-cut of the ith cluster.
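A possible reading of (4) and (5) in code is sketched below; U and D are the membership and distance arrays produced by an FCM run (shapes c x n), and the function names and the handling of an empty alpha-cut are our own choices, not prescriptions from the papers.

```python
import numpy as np

def eta_fuzzy_mean(U, D, m=2.0):
    """Eq. (4): eta_i as the fuzzy (u^m-weighted) mean of d_ik^2 within cluster i."""
    W = U ** m
    return (W * D ** 2).sum(axis=1) / W.sum(axis=1)

def eta_alpha_cut(U, D, alpha=0.5):
    """Eq. (5): eta_i as the mean of d_ik^2 over the alpha-cut {x_k : u_ik > alpha}."""
    eta = np.empty(U.shape[0])
    for i in range(U.shape[0]):
        cut = U[i] > alpha
        # fall back to all points if the alpha-cut happens to be empty
        eta[i] = (D[i, cut] ** 2).mean() if cut.any() else (D[i] ** 2).mean()
    return eta

# toy usage with arbitrary memberships/distances standing in for FCM output
rng = np.random.default_rng(1)
U = rng.uniform(size=(3, 10)); U /= U.sum(axis=0)      # columns sum to one, as in (2)
D = rng.uniform(0.1, 2.0, size=(3, 10))
print(eta_fuzzy_mean(U, D), eta_alpha_cut(U, D, alpha=0.3))
```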
In Fig. 3, results obtained by applying the PCM algorithm are shown. As suggested by Krishnapuram and Keller, FCM output (Fig. 2) has been used as the initial partition and the \eta_i have been chosen as in (4) and (5); c was set to four and m to two. Several values of \alpha, ranging from 0.1 to 0.7, have been tried. The images reported in the figure illustrate the PCM output after the refining step; similar results hold in case such a step is not applied. As can be seen, though the initial partition was rather good, the final results are unsatisfactory. In fact, in addition to the water cluster [Fig. 3(a)], three almost coincident clusters have been obtained [Fig. 3(b)-(d)].

To get more insight into the PCM behavior, synthetic data sets with easily distinguishable clusters have been built. These data have been used to assess the effectiveness of PCM in extracting structure from data. As a result, the tendency to produce coincident clusters has been confirmed.

In Fig. 4 a test data set is shown where (visually) a good partition should consist of three clusters centered at (1, 0), (3, 0), and (5, 0). First, classical FCM with random initialization was applied. Then, memberships to each cluster have been thresholded, thus producing hard sets as depicted in Fig. 5. The number of clusters has been set to three and the exponent m to two. A threshold equal to 0.5 was used to produce hard sets from the fuzzy clusters. Upon inspection of the results, the effectiveness of the FCM in extracting the structure of the data comes out.

When the PCM algorithm is used the results are quite different. As in the case of FCM, hard sets can be obtained from possibilistic clusters by simply thresholding the membership values: the hard sets obtained in this way are reported in Fig. 6. PCM was applied by using the output of FCM as the initial partition and by choosing the \eta_i according to (4) and (5). Moreover, we set c = 3, m = 2, and values of \alpha from 0.1 to 0.7 were tried. Fig. 6 shows that, once again, PCM produces three almost coincident clusters, thus failing to recognize the structure underlying the data set.
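The synthetic experiment just described can be reproduced, at least in outline, with the short script below. The blob spread, the iteration counts, and the choice of (4) with no extra scale factor are our assumptions, and whether the PCM centers stay separate or collapse will depend on those choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# three well-separated 2-D blobs around (1, 0), (3, 0), and (5, 0), as in Fig. 4
X = np.vstack([rng.normal(loc=(mu, 0.0), scale=0.2, size=(50, 2)) for mu in (1.0, 3.0, 5.0)])
c, m = 3, 2.0

V = X[rng.choice(len(X), c, replace=False)]              # random FCM initialization
for _ in range(100):                                     # FCM, eqs. (1)-(2), m = 2
    D = np.linalg.norm(V[:, None] - X[None], axis=2).clip(1e-12)
    U = D ** -2 / (D ** -2).sum(axis=0)
    V = (U ** m @ X) / (U ** m).sum(axis=1, keepdims=True)

eta = ((U ** m) * D ** 2).sum(axis=1) / (U ** m).sum(axis=1)   # eq. (4)
for _ in range(100):                                     # PCM, eq. (3), FCM output as init
    D = np.linalg.norm(V[:, None] - X[None], axis=2).clip(1e-12)
    T = 1.0 / (1.0 + (D ** 2 / eta[:, None]) ** (1.0 / (m - 1.0)))
    V = (T ** m @ X) / (T ** m).sum(axis=1, keepdims=True)

print("PCM centers:\n", V)        # hard sets as in Figs. 5 and 6: memberships >= 0.5
```

Thresholding T at 0.5 row by row gives hard sets in the manner of Fig. 6; the point of the comment, as we read it, is that the printed centers can end up nearly identical even though the data have three obvious modes.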

III. DISCUSSION

The bad behavior PCM sometimes exhibits can be explained by the well-known fact that, since no link exists among clusters, the minimization of J_m(U, v) can be obtained by operating independently on each cluster. For each of them the following quantity must be minimized

\hat{J}(u^{(i)}, v^{(i)}) = \sum_{k=1}^{n} u_{ik}^{m} d_{ik}^{2} + \eta_i \sum_{k=1}^{n} (1 - u_{ik})^{m}    (6)

where u^{(i)} is the ith row of the partition matrix U and v^{(i)} is the center of cluster i. If \eta_i is the same for all clusters, the same choice of (u^{(i)}, v^{(i)}) minimizes \hat{J}(u^{(i)}, v^{(i)}) regardless of i, i.e., the (U, v) which PCM should produce is such that its rows are all equal and the cluster centers v^{(i)} are coincident. As a matter of fact, PCM does not always produce trivial solutions, since the \eta_i are not constant over i; moreover, it usually fails to minimize \hat{J}(u^{(i)}, v^{(i)}) globally and only reaches a local minimum. Nevertheless, the number of possible local minima increases, thus allowing for a lot of bad minimizers which are likely to trap the PCM iterations in a nonsatisfactory partition. So, if the iterations start from a bad initialization, it is very likely that a good solution will not be reached. In fairness to PCM, it should be noticed that all fuzzy clustering techniques depend on the initial partition, but the experimental results discussed above show that, with regard to initialization dependency, PCM is not just like the other algorithms. In fact, even by starting from good initial partitions such as those provided by the FCM algorithm, PCM can fail to identify the structure underlying the data.
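The decoupling argument can be checked directly: the possibilistic objective (3) is, term by term, the sum over i of the per-cluster quantity (6), and when all \eta_i are equal every summand is the same function of its own row and center. A small numerical sanity check (Python, names ours):

```python
import numpy as np

def pcm_objective(U, D, eta, m=2.0):
    """Objective (3): sum_i sum_k u_ik^m d_ik^2 + sum_i eta_i sum_k (1 - u_ik)^m."""
    return ((U ** m) * D ** 2).sum() + (eta * ((1 - U) ** m).sum(axis=1)).sum()

def per_cluster(u_i, d_i, eta_i, m=2.0):
    """The per-cluster quantity (6) for a single row u_i and its distances d_i."""
    return ((u_i ** m) * d_i ** 2).sum() + eta_i * ((1 - u_i) ** m).sum()

rng = np.random.default_rng(2)
U = rng.uniform(size=(3, 20))
D = rng.uniform(0.1, 2.0, size=(3, 20))
eta = np.full(3, 0.5)                       # identical eta_i for every cluster

total = pcm_objective(U, D, eta)
parts = sum(per_cluster(U[i], D[i], eta[i]) for i in range(3))
assert np.isclose(total, parts)             # (3) decouples into the sum of (6)
# With identical eta_i each summand is literally the same function of
# (u^(i), v^(i)), so nothing in (3) penalizes all rows and centers
# collapsing onto one per-cluster minimizer, i.e., coincident clusters.
```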
IV. CONCLUSION

In this note we described a difficulty which arises in applying the possibilistic c-means clustering proposed by Keller and Krishnapuram. Experimental results carried out on real images as well as on test data illustrate that difficulties can arise because of the tendency of PCM to produce coincident clusters and of the exaggerated dependence on the initial partition. A rationale for this behavior has been suggested by noting that disentangling the clusters leads to an increase in the number of local minima, which can prevent the iterative process from reaching a satisfactory partition.

REFERENCES

[1] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[2] J. C. Bezdek, R. J. Hathaway, M. J. Sabin, and W. T. Tucker, "Convergence theory for fuzzy c-means: Counterexamples and repairs," IEEE Trans. Syst., Man, Cybern., vol. SMC-17, pp. 873-877, Sept./Oct. 1987.
