Professional Documents
Culture Documents
23574
23574
23574
9th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT)
the medoid and points in the cluster. J(U, V, W ) = (uki )m D(xk , vi , wi , α), m > 1,
j=1 k=1
A basic K-medoid clustering can be described as
the alternate optimization [2, 9]:
where V = (v1 , . . . , vc ) and W = (w1 , . . . , wc ).
c ∑
∑ N Since J(U, V, W ) has three variables, the follow-
J1 (U, V ) = uki d(xk , vi ) ing alternate optimization is used:
j=1 k=1
Asymmetric fuzzy c-medoid algorithm.
with the constraint on U :
AFCMED1: Give an initial value for V̄ and W̄ .
∑
c
MU = {U = (uki ) : ukj = 1, ∀k; ukj ≥ 0, ∀k, j} AFCMED2: Fix V̄ , W̄ and find
j=1
Ū = arg min J(U, V̄ , W̄ ).
and another constraint on V = (v1 , . . . , vc ): U ∈MU
(xk ̸= vi or xk ̸= wi )
ūki = 1 ⇐⇒ i = arg min d(xk , v̄j ),
1≤j≤c uki = 1, (xk = vi = wi ). (3)
where v̄j is given by (1). The optimal solutions ūki while the solutions for V and W are respectively
and v̄i are also written as uki and vi for simplicity given as follows:
without confusion.
∑
N
2.2. Fuzzy c-medoids for asymmetric vi = arg min (uki )m D(xk , zj , wi ) (4)
zj ∈X
k=1
measures
∑
N
Let us assume that d(x, y) is asymmetric. We in- wi = arg min (uki )m D(xk , vi , yj ) (5)
yj ∈X
troduce a new measure having three variables and k=1
a parameter α ∈ [0, 1]:
2.3. Theoretical properties
D(x, v, w; α) = αd(x, v) + (1 − α)d(w, x).
We can prove the following theoretical properties of
We moreover assume that either α = 0, α = 1, or the solutions of fuzzy c-medoids. First property is
α = 21 for simplicity. almost trivial and the proof is omitted.
436
Proposition 1 Cluster centers vi and wi given re- 3. Examples
spectively by (4) and (5) satisfy the following:
3.1. A small example of travelers among
∑
N
countries
vi = arg min (uki )m d(xk , zj ),
zj ∈X
k=1 We show a small example for illustrating how the al-
∑
N gorithm works. In this example, The data of foreign
wi = arg min (uki )m d(yj , xk ). traveler in major 19 countries in 2001 from World
yj ∈X
k=1 Tourism Organization [16] are used. The countries
Second property is on the convergence of the algo- are South Africa, America, Canada, China, Taiwan,
rithm. The convergence criterion in AFCMED5 is Hong Kong, Korea, Japan, India, Indonesia, Sin-
roughly written in terms of (Ū , V̄ , W̄ ). We can also gapore, Australia, New Zealand, England, France,
use the value of objective function, i.e., we stop the Switzerland, Italy, Thailand, and Malaysia. The
algorithm when the objective function value is not number of travelers nij from country xi to country
decreased. Note that the objective function value is xj is given and normalized to similarity
monotonically nonincreasing.
nij
Proposition 2 Suppose we stop the algorithm s(xi , xj ) = ∑ .
nil
when the objective function value is not decreased.
l
Then algorithm AFCMED necessarily stops, and
( )2
the upper bound of the number of iterations is Nc . Then s(xi , xj ) is transformed to dissimilarity
The proof is easy when we observe that choice for d(xi , xj ) = 1 − s(xi , xj ),
all combinations of (vi , wi ) ∈ X × X is finite, and
the objective function is monotone nonincreasing. except that d(xi , xi ) is set to zero for all xi .
( )2
However, Nc is generally huge and hence it gives Figures 1, 2, and 3 respectively show graphs with
an unrealistic upper bound. two, three, and four clusters. Clusters are distin-
Third property is on the membership value uki . guished by different colors and we used the hard
We suppose an object xℓ is ‘movable’ to the infinity c-medoids with m = 1 and α = 12 for two medoids
in the sense that D(xℓ , vi , wi ; α) → ∞ for all 1 ≤ in a cluster. In these figures, some edges are thicker
i ≤ c. and others are thinner. Where an edge has higher
similarity, the edge is shown by a thicker arrow.
Proposition 3 Suppose xk = vi = wi . When α =
The two centers vi and wi in each cluster are ex-
1, xk = vi and when α = 0, xk = wi . We then have
pressed by a pentagram and a hexagram. Thus
uki = max uli = 1. a hexagram means outer-direction medoid and a
1≤l≤N
pentagram means inner-direction medoid. We used
Suppose xℓ moves to the infinity in the sense that Gephi [17], an interactive visualization software for
D(xℓ , vi , wi ) → ∞ for all 1 ≤ i ≤ c. Then we have graph data.
1 More details about clusters are as follows.
uℓi → , ∀1 ≤ i ≤ c.
c
Two clusters:
The proof is not difficult when we observe the form
{ China, Taiwan, Hong Kong, Korea },
of uki in (2). If xk = vi = wi , then D(xℓ , vi , wi ; α) =
with medoids: v = China, w = Korea ;
0 and uki takes it maximum value of uki = 1. If
{ South Africa, America, Canada, Japan,
D(xℓ , vi , wi ; α) → ∞, then it is easy to see uℓi → 1c .
India, Indonesia, Singapore, Australia, New
Zealand, England, France, Switzerland, Italy,
2.4. Hard c-medoids
Thailand, Malaysia },
The function J(U, V, W ) can be used for m = 1 to with medoids: v = America, w = Australia.
derive solutions for K-medoids alias hard c-medoids
for asymmetric dissimilarity measures. The solu- Three clusters:
tions are reduced to the following. { China, Taiwan, Hong Kong, Korea },
{ with medoids: v = China, w = Korea ;
1, i = arg min D(xk , vj , wj ) {France, Switzerland, Italy},
uki = 1≤j≤c , with medoids: v = France, w = Switzerland.
0, otherwise
∑ { South Africa, America, Canada, Japan,
vi = arg min d(xk , zj ), India, Indonesia, Singapore, Australia, New
zj ∈X
xk ∈Gi Zealand, England, Thailand, Malaysia },
∑ with medoids: v = America, w = Australia.
wi = arg min d(yj , xk ).
yj ∈X
xk ∈Gi Four clusters:
Proposition 2 holds also for hard c-medoids, since { China, Taiwan, Hong Kong, Korea, Japan },
the argument is the same as that for fuzzy c- with medoids: v = China, w = Korea ;
medoids. {France, Switzerland, Italy},
437
with medoids: v = France, w = Switzerland.
{America, Canada},
with medoids: v = w = Canada.
{ South Africa, India, Indonesia, Singapore,
Australia, New Zealand, England, Thailand,
Malaysia },
with medoids: v = Singapore, w = Aus-
tralia.
A real Twitter user network of Japanese political The formulation and algorithms of hard and fuzzy c-
parties was used of which the data have been taken medoids for asymmetric dissimilarity measures have
438
Figure 3: Classification result of nineteen countries Figure 4: Classification result of nineteen countries
to four clusters. Clusters are distinguished by dif- to four clusters using vi alone. Clusters are distin-
ferent colors. Pentagram implies vi and hexagram guished by different colors. Pentagram implies vi .
means wi . A ‘blue’ cluster is with vi = wi at
Canada.
Acknowledgment
Table 1: The maximum and the average of the
This study has partly been supported by the
Rand indexes (RI) for three methods: the proposed
Grant-in-Aid for Scientific Research, JSPS, Japan,
method using vi and wi , the method using vi alone,
no.26330270.
and the method using wi alone.
Method Max RI Ave. RI
vi and wi 0.95 0.9 References
vi alone 0.89 0.8
wi alone 0.94 0.88 [1] D. Arthur, S. Vassilvitskii, k-means++: the
advantages of careful seeding, Proc. of SODA
2007, pp.1027-1035.
[2] J. C. Bezdek, Pattern Recognition with Fuzzy
been proposed and tested using numerical examples. Objective Function Algorithms, Plenum, New
The method includes three options of α = 0, 1, and York, 1981.
1
2 . The values of α = 0, or 1 mean that a cluster has [3] B. S. Everitt, S. Landau, M. Leese, D. Stahl,
only one medoid of vi or wi , while α = 12 implies Cluster Analysis, 5th ed., Wiley, 2011.
that a cluster should have two medoids of vi and [4] M. A. Russell, Mining the Social Web, O’Reilly,
wi . 2011.
A fundamental problem is that which of α should [5] L. Kaufman, P. J. Rousseeuw, Finding Groups
be adopted. This problem has no general solution in Data, Wiley, 1990.
and dependent on an application domain. It hence [6] R. Krishnapuram, A. Joshi, Liyu Yi, A Fuzzy
needs further investigation in a specific application Relative of the k-Medoids Algorithm with Ap-
such as SNS networks. plication to Web Document and Snippet Clus-
Another problem is that larger computation is tering, Proc. of FUZZ-IEEE1999, 1999.
needed than K-means. For such problems, we need [7] J. B. MacQueen, Some methods of classification
to consider multistage clustering (e.g., [13]) whereby and analysis of multivariate observations, Proc.
we can effectively reduce computation. Moreover of 5th Berkeley Symposium on Math. Stat. and
an algorithm of k-medoid++ should be considered Prob., pp.281-297, 1967.
which is a variation of k-means++ [1]. [8] S. Miyamoto, Introduction to Cluster Analysis,
As other future studies, the present method Morikita-Shuppan, 1999 (in Japanese).
should be compared with other existing meth- [9] S. Miyamoto, H. Ichihashi, K. Honda, Algo-
ods (e.g.,[10, 15]) that can handle asymmetric mea- rithms for Fuzzy Clustering, Springer, 2008.
sures of dissimilarity using large-scale real examples. [10] M. E. J. Newman, Fast algorithm for detect-
Moreover cluster validity criteria (e.g.,[2]) should be ing community structure in networks, Physical
developed for medoid clustering. Review E, 69(6), 066133, 2004.
439
Figure 5: Classification result of nineteen countries
to four clusters using wi alone. Clusters are distin-
guished by different colors. Hexagram implies wi .
440