
5.

Evaluation of Fuzzy Clustering

What is natural classification? The search for an answer to this question
is the basis of fuzzy clustering. The essence of fuzzy clustering is
to consider not only whether an object belongs to the assumed
clusters, but also how much each of the objects belongs to
the clusters. Fuzzy clustering is a method to obtain “natural groups”
in the given observations by using an assumption of a fuzzy subset on
clusters. By this method, we can get the degree of belongingness of an
object to a cluster. That is, each object can belong to several clusters
with several degrees, and the boundaries of the clusters become un-
certain. Fuzzy clustering is a natural classification when considering
real data. The usefulness of this clustering is evident from its
wide application in many fields. However, these methods
all suffer in that it is difficult to interpret the clusters obtained. Such
clustering sometimes causes confusion when trying to understand clus-
tering behavior because each cluster is not exclusive. Each cluster has
its own degree of mixing with other clusters. Sometimes, a very mixed
cluster is created which makes it difficult to interpret the result. In this
case, we need to evaluate the cluster homogeneity in the sense of the
degree of belongingness. In this chapter, we describe evaluation tech-
niques [40], [42] for the result of fuzzy clustering using homogeneity
analysis [16].
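
As a small illustration of degrees of belongingness, the sketch below uses a made-up membership matrix. The values, the names `U`, `hard_labels`, and `clarity`, and the convention that each object's degrees sum to one are assumptions for illustration, not taken from the text. It contrasts clearly assigned objects with mixed ones, which is the situation that makes fuzzy clusters hard to interpret.

```python
import numpy as np

# Hypothetical degrees of belongingness: 4 objects (rows), 2 clusters (columns).
# Assumption: the degrees of each object sum to 1 across clusters.
U = np.array([
    [0.95, 0.05],   # almost entirely in the first cluster
    [0.10, 0.90],   # almost entirely in the second cluster
    [0.55, 0.45],   # mixed between both clusters
    [0.50, 0.50],   # completely mixed
])

# Check the sum-to-one assumption for every object.
row_sums = U.sum(axis=1)

# A hard clustering would keep only the cluster with the largest degree,
# discarding the information about how mixed each object is.
hard_labels = U.argmax(axis=1)

# Gap between the largest and second-largest degree of each object:
# a small gap marks an object that is hard to assign to one cluster.
sorted_U = np.sort(U, axis=1)
clarity = sorted_U[:, -1] - sorted_U[:, -2]

print("row sums:   ", row_sums)
print("hard labels:", hard_labels)
print("clarity:    ", clarity)
```

The last two objects receive a hard label like the others, but their small `clarity` values show that the label hides a nearly even split between the clusters.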
In the case of hard clustering, the evaluation is simply based
on the observations which belong to each of the clusters. However, in
the case of fuzzy clustering, we have to consider two concepts: the
observations and the degree of belongingness which each object
possesses. We describe a method to obtain the exact differences of
the clusters using the idea of homogeneity analysis. The result will
show the similarities of the clusters, and will give an indication for the
evaluation of a variable number of clusters in fuzzy clustering.

Mika Sato-Ilic and Lakhmi C. Jain: Innovations in Fuzzy Clustering, StudFuzz 205, 105–123
(2006)
www.springerlink.com © Springer-Verlag Berlin Heidelberg 2006

The evaluation of the clustering result has two aspects. The first
of them is the validity of the clustering result which discusses whether
the clustering result is satisfactory, good, or bad. The other is the
interpretation of the clusters obtained and a discussion of how it is
possible to interpret them. This chapter focuses on the first point. For
the interpretation of the fuzzy clustering result, we discuss fuzzy clus-
ter loadings in chapters 3 and 4. The validity of the fuzzy clustering
result has also been discussed. Among the conventional measures of the
validity of fuzzy clustering, the partition coefficient and the entropy
coefficient are well known [5]. These measures are essentially based on the idea
that a clear classification is a better result. Using the idea of within-
class dispersion and between-class dispersion, separation coefficients
have been introduced [20]. Based on the fuzzy hypervolume, the partition
density has been discussed [15]. In this chapter, we describe methods
for evaluating a fuzzy clustering result which use the idea of
homogeneity from homogeneity analysis.
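
To make the idea that "a clear classification is a better result" concrete, the following sketch computes the partition coefficient and the partition entropy in the form commonly given in the literature, (1/n) Σ_i Σ_k u_ik^2 and −(1/n) Σ_i Σ_k u_ik log u_ik. The text does not state these formulas, so the definitions, the function names, and the example matrices here are assumptions for illustration.

```python
import numpy as np

def partition_coefficient(U):
    """Mean over objects of the summed squared degrees: (1/n) * sum_i sum_k u_ik^2."""
    U = np.asarray(U, dtype=float)
    return float(np.sum(U ** 2) / U.shape[0])

def partition_entropy(U):
    """-(1/n) * sum_i sum_k u_ik * log(u_ik), with 0 * log(0) taken as 0."""
    U = np.asarray(U, dtype=float)
    # Replace zero degrees by 1 inside the log: log(1) = 0, so they contribute nothing.
    safe = np.where(U > 0, U, 1.0)
    return float(-np.sum(U * np.log(safe)) / U.shape[0])

# Hypothetical results for 3 objects and 2 clusters.
crisp = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # every object in one cluster
mixed = np.full((3, 2), 0.5)                             # every object split evenly

for name, U in [("crisp", crisp), ("mixed", mixed)]:
    print(name, partition_coefficient(U), partition_entropy(U))
```

Under these definitions a crisp result attains the largest partition coefficient (1) and the smallest entropy (0), while the evenly mixed result attains 1/K and log K for K clusters, so both measures reward clear classifications.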

5.1 Homogeneity Analysis

Homogeneity analysis is a well-known technique for optimizing the
homogeneity of variables by manipulation and simplification. Historically,
the idea of homogeneity is closely related to the idea that different
variables may measure ‘the same thing’. We can reduce the number
of such variables, or give them a lower weight, so that the comparison
with other variables is fair.
Suppose X and Y are multivariate data matrices of size n × p and
n × q, respectively, where n is the number of objects and p and q are
the numbers of variables for X and Y, respectively. We assume
weight vectors for X and Y and denote them as a and b, which are
p × 1 and q × 1 vectors.
The purpose of the homogeneity analysis is to find a and b, and
the n × 1 vector f to minimize the following:

$$S(f, a, b) = \|f - Xa\|^2 + \|f - Yb\|^2. \qquad (5.1)$$

For fixed f, the estimates of a and b which minimize equation (5.1)
are obtained as follows:

$$a = (X'X)^{-1}X'f, \qquad b = (Y'Y)^{-1}Y'f.$$
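
The estimates above are the ordinary least-squares solutions of regressing f on X and on Y separately, since the two terms of (5.1) do not share parameters once f is fixed. The sketch below checks this numerically on random data. The data, the dimensions, and the names `loss`, `a_hat`, and `b_hat` are assumptions for illustration, and X and Y are assumed to have full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 3, 2

# Hypothetical data matrices and a fixed target vector f.
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
f = rng.normal(size=n)

def loss(f, a, b, X, Y):
    """S(f, a, b) = ||f - X a||^2 + ||f - Y b||^2, as in equation (5.1)."""
    return float(np.sum((f - X @ a) ** 2) + np.sum((f - Y @ b) ** 2))

# Closed-form estimates from the normal equations. Solving the linear
# system avoids forming the explicit inverse of X'X and Y'Y.
a_hat = np.linalg.solve(X.T @ X, X.T @ f)
b_hat = np.linalg.solve(Y.T @ Y, Y.T @ f)

# Cross-check against NumPy's generic least-squares solver.
a_ls, *_ = np.linalg.lstsq(X, f, rcond=None)
b_ls, *_ = np.linalg.lstsq(Y, f, rcond=None)

# Moving away from the estimates should increase the criterion.
base = loss(f, a_hat, b_hat, X, Y)
perturbed = loss(f, a_hat + 0.01, b_hat - 0.01, X, Y)

print("matches lstsq:", np.allclose(a_hat, a_ls), np.allclose(b_hat, b_ls))
print("S at estimates:", base, " S after perturbation:", perturbed)
```

At the estimates the residual f − Xa is orthogonal to the columns of X (and likewise for Y), which is exactly the condition obtained by setting the derivative of (5.1) with respect to a and b to zero.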
