VII. CONCLUSION
[...] of the FCM, the PCM may appear to be a close cousin of the FCM. However, as shown in this correspondence, there are fundamental differences between the two algorithms. The FCM is primarily a partitioning algorithm, whereas the PCM is primarily a mode-seeking [...] clusters and it is highly robust in the presence of noise and outliers. Its weakness is that it requires a good initialization and a reliable scale estimate to function effectively. When the data are not severely contaminated, the FCM can provide a reasonable initialization and a scale estimate. Thus, with the proper choice of the scale and fuzzifier parameters, the PCM can be used to improve the results of the FCM.
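The recommendation above is to let the FCM output supply both the initialization and the scale of the PCM. Its scale part can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the FCM-weighted scale estimate η_i = K Σ_k u_ik^m d_ik² / Σ_k u_ik^m that Krishnapuram and Keller use for the first PCM pass, with K = 1 by default. The function name `scale_from_fcm` and the list-of-rows layout of the membership matrix are hypothetical choices.

```python
from typing import List, Sequence


def sq_dist(x: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance ||x - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def scale_from_fcm(points: List[Sequence[float]], centers: List[Sequence[float]],
                   U: List[List[float]], m: float = 2.0, K: float = 1.0) -> List[float]:
    """eta_i = K * sum_k u_ik^m d_ik^2 / sum_k u_ik^m, one value per cluster.

    U[i][k] is the FCM membership of point k in cluster i (hypothetical layout).
    K = 1.0 is an assumed default.
    """
    etas = []
    for i, v in enumerate(centers):
        # Fuzzified weights u_ik^m for cluster i.
        weights = [U[i][k] ** m for k in range(len(points))]
        # Membership-weighted sum of squared distances to center i.
        num = sum(w * sq_dist(x, v) for w, x in zip(weights, points))
        etas.append(K * num / sum(weights))
    return etas
```

The resulting values would be passed, together with the FCM centers, to a PCM run as its η_i. The estimate is only as good as the FCM partition it comes from, which is the limitation discussed next.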
The situation is quite different when the data set is highly noisy. The least squares (LS) algorithm is a general regression algorithm in statistics that minimizes the sum of the squares of the residues, and it has been used extensively in engineering applications. It [...] severely compromised by a single outlier in the data set. Thus, the FCM can completely break down in the presence of a single outlier. In contrast, it has been shown that the PCM is a robust parameter estimation technique related to the M-estimator, which has been widely used in robust statistics with good results. Robust techniques can tolerate up to 50% noise. As shown in Fig. 3, the FCM is quite inadequate as an initialization and scale estimation tool when noise is present. Fortunately, there are many techniques in [...]

M. Barni, V. Cappellini, and A. Mecocci

Abstract: In this comment, we report a difficulty with the application of the possibilistic approach to fuzzy clustering (PCM) proposed by Keller and Krishnapuram. In applying this algorithm we found that it has the undesirable tendency to produce coincident clusters. Results illustrating this tendency are reported, and a possible explanation for the PCM behavior is suggested.

I. INTRODUCTION

In their paper, Krishnapuram and Keller presented a new approach to fuzzy clustering [possibilistic c-means (PCM)]. By relaxing the constraint that the memberships of a data point across classes sum to one, each cluster is disentangled from the others, and the membership values are interpreted as the compatibilities of the point with the class prototypes. Moreover, the possibilistic approach leads to higher noise immunity with respect to classical algorithms derived from [...]
Manuscript received March 30, 1995; revised August 1, 1995.
M. Barni and V. Cappellini are with the Department of Electronic Engineering, University of Florence, Via S. Marta 3, 50139 Firenze, Italy.
A. Mecocci is with the Department of Electronic Engineering, University of Pavia, Via Abbiategrasso 209, 27100 Pavia, Italy.
Publisher Item Identifier S 1063-6706(96)05624-X.
Fig. 1. One band (out of seven) of a multichannel satellite image.

Fig. 2. FCM partition of the multichannel image reported in Fig. 1. A random partition has been adopted as initialization. For each cluster an image is shown whose pixel values represent the membership to the cluster. Clusters are found to group pixels belonging to different land cover classes: (a) water, (b) wooded areas, (c) agricultural land, and (d) urban areas.

Fig. 3. PCM partition of the multichannel image reported in Fig. 1. The FCM partition of Fig. 2 has been used to initialize PCM. Except for cluster (a), which groups water pixels, three almost coincident clusters have been produced (b)-(d).

Fig. 4. Synthetic data set consisting of three visual clusters.

[...] compromised by the tendency it has to produce coincident clusters. First, experimental results are reported; then a rationale for the difficulties we encountered is suggested.

II. TEST RESULTS

According to the classical FCM approach, given n points to be clustered, the membership functions of each cluster are determined by minimizing the quantity

$$J_m(U, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^m d_{ik}^2 \qquad (1)$$

subject to the constraints

$$u_{ik} \in [0, 1], \qquad \sum_{i=1}^{c} u_{ik} = 1 \;\; \forall k, \qquad 0 < \sum_{k=1}^{n} u_{ik} < n \;\; \forall i \qquad (2)$$

where $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^p$ is the data set to be clustered, $c$ is the number of clusters, $U$ is the partition we look for, $u_{ik}$ is the grade of membership of $x_k$ to the $i$th cluster, $v$ is the set of cluster centers, and $d_{ik}$ is the distance between $x_k$ and the center of the $i$th cluster, i.e., $d_{ik} = \|x_k - v_i\|$. The minimization of $J_m(U, v)$ is obtained by means of an iterative process which, starting from a given set of initial cluster centers, is assumed to terminate at a local minimum [2]. In their paper, Krishnapuram and Keller relax the constraints expressed by (2) so that columns need not sum to one. To avoid the trivial null solution, they modify the objective function according to the following definition:

$$J_m(U, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^m d_{ik}^2 + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{n} (1 - u_{ik})^m \qquad (3)$$

where the $\eta_i$ are suitable positive numbers to be chosen by the user.
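The iterative minimization of (1) under (2) is usually carried out by alternating optimization. Memberships are updated with the centers held fixed, then the centers are updated with the memberships held fixed. The pure-Python sketch below assumes the standard FCM update formulas and Euclidean distances. The function name `fcm`, its parameters (`init`, `seed`, `eps`), and the list-of-rows layout of U are hypothetical choices. A fixed iteration count stands in for a proper convergence test.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance d_ik^2 = ||x_k - v_i||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm(points: List[Point], c: int, m: float = 2.0, iters: int = 100,
        init: Optional[List[Point]] = None, seed: int = 0, eps: float = 1e-12):
    """Alternating minimization of J_m(U, v) in (1) subject to (2).

    Returns (U, centers) with U[i][k] the membership of point k in cluster i.
    """
    # Initial centers: given explicitly, or drawn at random from the data.
    centers = list(init) if init is not None else random.Random(seed).sample(points, c)
    n, dim = len(points), len(points[0])
    U = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership step: u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1)).
        # Each column of U then sums to one, as (2) requires.
        for k, x in enumerate(points):
            d2 = [max(sq_dist(x, v), eps) for v in centers]  # eps avoids 0/0
            for i in range(c):
                U[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
        # Center step: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            centers.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / total
                                 for d in range(dim)))
    return U, centers
```

Consider three tight groups of points around (1, 0), (3, 0), and (5, 0), loosely mimicking Fig. 4, with initial centers at (0, 0), (2.5, 0), and (6, 0). The centers settle close to the three group means, and every column of U sums to one. The sum-to-one coupling in the membership step is what pushes different clusters toward different groups.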


Fig. 5. FCM partition of the data set shown in Fig. 4. Hard sets depicted in the figure have been obtained by thresholding the fuzzy clusters produced by FCM. For each set the corresponding cluster center is reported. A threshold equal to 0.5 was used.

Fig. 6. PCM partition of the data set reported in Fig. 4. Hard sets depicted in the figure have been obtained by thresholding PCM possibilistic clusters. For each set the corresponding cluster center is reported. A threshold equal to 0.5 has been adopted. The three clusters coincide.
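The thresholding used for Figs. 5 and 6 selects the points whose membership reaches a given level. The α-cut that enters the second-pass scale (5) is the same operation. The small sketch below assumes a non-strict comparison (u_ik ≥ α) and Euclidean distances. The helper names `alpha_cuts` and `scale_from_alpha_cut` are hypothetical.

```python
from typing import List, Sequence


def sq_dist(x: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance d_ik^2 = ||x_k - v_i||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def alpha_cuts(U: List[List[float]], alpha: float = 0.5) -> List[List[int]]:
    """For each cluster i, the indices k with u_ik >= alpha (assumed non-strict)."""
    return [[k for k, u in enumerate(row) if u >= alpha] for row in U]


def scale_from_alpha_cut(points, centers, U, alpha):
    """eta_i = mean of d_ik^2 over the alpha-cut of cluster i."""
    etas = []
    for i, members in enumerate(alpha_cuts(U, alpha)):
        if not members:
            raise ValueError(f"alpha-cut of cluster {i} is empty")
        etas.append(sum(sq_dist(points[k], centers[i]) for k in members) / len(members))
    return etas
```

With α = 0.5, `alpha_cuts` reproduces the hard-set construction described in the captions. With another α, `scale_from_alpha_cut` gives a scale in the spirit of (5). An empty α-cut is reported as an error instead of causing a division by zero.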

We applied the PCM algorithm to several data sets stemming from different application fields, and we realized that the possibilistic approach has an undesirable tendency to produce coincident clusters. Figs. 1-3 illustrate this tendency. In Fig. 1, one band out of seven of a multichannel satellite image is shown. First, classical FCM clustering has been applied to the multichannel image. The number of clusters c and the exponent m have been set to four and two, respectively, while the initial cluster centers have been chosen randomly. The FCM output is shown in Fig. 2: for each cluster, an image is reported whose pixel values represent the degree of membership to the cluster. A comparison between the FCM output and a land cover map proves the effectiveness of the FCM in distinguishing among different ground cover classes. The first cluster groups water pixels [Fig. 2(a)], the second cluster is representative of wooded areas [Fig. 2(b)], the third one consists of pixels belonging to agricultural land [Fig. 2(c)], and the fourth cluster identifies urban areas [Fig. 2(d)].

A problem with PCM is that it needs a good initialization to provide accurate clustering. In their paper, Krishnapuram and Keller suggest using the output of FCM as an initial partition. They also suggest applying the PCM algorithm twice: the first time by setting

$$\eta_i = \frac{\sum_{k=1}^{n} (u_{ik})^m d_{ik}^2}{\sum_{k=1}^{n} (u_{ik})^m} \qquad (4)$$

while the second time it should be

$$\eta_i = \frac{\sum_{x_k \in (\Pi_i)_\alpha} d_{ik}^2}{|(\Pi_i)_\alpha|} \qquad (5)$$

where $(\Pi_i)_\alpha$ is an appropriate α-cut of the $i$th cluster.

In Fig. 3, results obtained by applying the PCM algorithm are shown. As suggested by Krishnapuram and Keller, the FCM output (Fig. 2) has been used as the initial partition, and the $\eta_i$ have been chosen as in (4) and (5); c was set to four and m to two. Several values of α, ranging from 0.1 to 0.7, have been tried. The images reported in the figure illustrate the PCM output after the refining step; similar results hold if this step is not applied. As can be seen, although the initial partition was rather good, the final results are unsatisfactory. In fact, in addition to the water cluster [Fig. 3(a)], three almost coincident clusters have been obtained [Figs. 3(b)-(d)].

To gain more insight into the PCM behavior, synthetic data sets with easily distinguishable clusters have been built. These data have been used to assess the effectiveness of PCM in extracting structure from data. As a result, the tendency to produce coincident clusters has been confirmed.

In Fig. 4, a test data set is shown where (visually) a good partition should consist of three clusters centered at (1, 0), (3, 0), and (5, 0). First, classical FCM with random initialization was applied. Then, the memberships to each cluster have been thresholded, producing the hard sets depicted in Fig. 5. The number of clusters has been set to three and the exponent m to two. A threshold equal to 0.5 was used to produce hard sets from the fuzzy clusters. Inspection of the results makes the effectiveness of the FCM in extracting the structure of the data evident.

When the PCM algorithm is used, the results are quite different. As in the case of FCM, hard sets can be obtained from possibilistic clusters by simply thresholding the membership values; the hard sets obtained in this way are reported in Fig. 6. PCM was applied by using the output of FCM as the initial partition and by choosing the $\eta_i$ according to (4) and (5). Moreover, we set c = 3 and m = 2, and values of α from 0.1 to 0.7 were tried. Fig. 6 shows that, once again, PCM produces three almost coincident clusters, thus failing to recognize the structure underlying the data set.

III. DISCUSSION

The bad behavior PCM sometimes exhibits can be explained by the well-known fact that, since no link exists among clusters, the minimization of $J_m(U, v)$ can be obtained by operating independently on each cluster. For each of them the following quantity must be
minimized:

$$J(u^{(i)}, v^{(i)}) = \sum_{k=1}^{n} (u_{ik})^m d_{ik}^2 + \eta_i \sum_{k=1}^{n} (1 - u_{ik})^m \qquad (6)$$

where $u^{(i)}$ is the $i$th row of the partition matrix $U$ and $v^{(i)}$ is the center of cluster $i$. If $\eta_i$ is the same for all clusters, the same choice of $(u^{(i)}, v^{(i)})$ minimizes $J(u^{(i)}, v^{(i)})$ regardless of $i$. That is, the $(U, v)$ that PCM should produce has all rows equal, and its cluster centers $v^{(i)}$ are coincident. In practice, PCM does not always produce trivial solutions, since the $\eta_i$ are not constant over $i$. Moreover, it usually fails to minimize $J(u^{(i)}, v^{(i)})$ globally and only reaches a local minimum. Nevertheless, the number of possible local minima increases, thus allowing for many bad minimizers which, if iterations start from a bad initialization, it will be very likely that [...] those provided by the FCM algorithm, PCM can miss the objective of identifying the structure underlying the data.

In this note we described a difficulty which arises in applying the possibilistic c-means clustering proposed by Keller and Krishnapuram. Experimental results obtained on real images as well as on test data illustrate that difficulties can arise because of the tendency of PCM to produce coincident clusters and because of its exaggerated dependence on the initial partition. A rationale for this behavior has been suggested by noting that disentangling the clusters leads to an increase in the number of local minima, which can prevent the iterative process from reaching a satisfactory partition.
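The decoupling argument of Section III can be illustrated numerically. The sketch below is a minimal, hypothetical PCM loop: the function `pcm` and the toy data are illustrative assumptions, not the authors' experiment. It uses the possibilistic membership update u_ik = 1/(1 + (d_ik²/η_i)^{1/(m-1)}), obtained by setting the derivative of (6) with respect to u_ik to zero. It pairs this with the weighted-mean center update v^{(i)} = Σ_k u_ik^m x_k / Σ_k u_ik^m. Cluster i's update reads only v^{(i)} and η_i. So two clusters given the same η_i and started in the same basin follow the same fixed-point map and end up coincident.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance d_ik^2 = ||x_k - v_i||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def pcm(points: List[Point], centers: List[Point], etas: List[float],
        m: float = 2.0, iters: int = 100) -> List[Point]:
    """Possibilistic c-means iterations from the given initial centers.

    Each cluster is updated using only its own center and eta_i, so
    clusters never interact (no sum-to-one coupling as in FCM).
    """
    dim = len(points[0])
    centers = [tuple(v) for v in centers]
    for _ in range(iters):
        new_centers = []
        for v, eta in zip(centers, etas):
            # Membership minimizing (6) for fixed v: 1 / (1 + (d^2/eta)^(1/(m-1))).
            u = [1.0 / (1.0 + (sq_dist(x, v) / eta) ** (1.0 / (m - 1.0))) for x in points]
            w = [ui ** m for ui in u]
            total = sum(w)
            # Center minimizing (6) for fixed memberships: weighted mean.
            new_centers.append(tuple(sum(wk * x[d] for wk, x in zip(w, points)) / total
                                     for d in range(dim)))
        centers = new_centers
    return centers
```

Take three groups of points around x = 1, 3, and 5 on the line y = 0, set η_i = 0.5 for every cluster, and start the centers at x = 0.8, 1.3, and 4.9. The first two centers converge to the same point near x = 1, and the third settles near x = 5. Nothing pulls a cluster toward the group at x = 3. This mirrors, on a toy scale, the coincident clusters of Fig. 6. It does not reproduce the reported experiments, which used the η_i of (4) and (5).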
