
Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi'an, 2-5 November 2003

A FUZZY CLASSIFICATION METHOD BASED ON SUPPORT VECTOR MACHINE

QIANG HE, XI-ZHAO WANG, HONGJIE XING

Machine Learning Center, Faculty of Mathematics and Computer Science, Hebei University, Baoding, Hebei, China
E-MAIL: heq@mail.hbu.edu.cn

Abstract:
Support vector machine (SVM) is a novel type of learning machine based on statistical learning theory. Owing to their good generalization capability, SVMs have been widely used in classification, regression and pattern recognition. In this paper, for data with numerical condition attributes and decision attributes, a new fuzzy classification method (FCM) based on SVM is proposed. This method first fuzzifies the decision attributes into classes (linguistic terms) and then trains the decision function (classifier). For a new sample, the decision function does not forecast the value of its decision attribute, but gives the corresponding class and its membership degree as a fuzzy decision. This fuzzy decision result is more objective and easier to understand than a crisp decision in the common sense. The design principle is given and the classification algorithm is implemented in this paper. The experimental results show that the new method proposed in this paper is effective. The method optimizes the classification result of common SVMs, and therefore enhances the intelligent level of SVMs.

Key words:
Support vector machine; Binary classification; Multiclass classification; Fuzzy ID3
1. Introduction

The support vector machine is a new classification technique based on statistical learning theory [1] and has drawn much attention in recent years. Owing to its extraordinary generalization, the SVM has become a powerful tool for solving classification problems. An SVM first maps the original input space into a high-dimensional feature space through some nonlinear mapping, chosen a priori, and then constructs an optimal separating hyperplane maximizing the margin between the two classes in this space. Maximizing the margin is a quadratic programming problem and can be solved in its dual space by introducing Lagrangian multipliers. Without any explicit knowledge of the mapping, the SVM finds the optimal hyperplane by using kernel functions describing an inner product in the feature space. The vector defining the optimal hyperplane is expanded with nonzero weights on a few input points that are called support vectors.

There are more and more applications using SVM techniques. For data with numerical condition attributes and decision attributes, we can forecast the decision attribute value of new samples by regression learning; however, in some cases we cannot precisely know a sample through its decision attribute value. In this paper, a new fuzzy classification method (FCM) based on SVM is proposed. The characteristic of the new method is that, for data with numerical decision attributes, it fuzzifies the decision attributes into classes (linguistic terms) first and then learns a classifier; for a new sample, the decision function does not forecast the value of its decision attribute, but gives the corresponding class and the membership degree. This fuzzy decision result is more objective and easier to understand than a precise decision result.

The rest of this paper is organized as follows. In section 2, we review several SVMs for multiclass classification, namely the one-against-all, one-against-one and DAGSVM methods. In section 3, the fuzzy classification method based on SVM is introduced in detail. Two experiments are presented in section 4. We give some remarks in section 5.

2. Several SVMs for multiclass classification

A detailed description of the following three methods can be found in [2]; we only give a brief introduction here.

2.1 One-against-all

The one-against-all method [1] is the earliest implementation used for SVM multiclass classification. It constructs $K$ binary SVMs, where $K$ is the number of classes, and the $i$th SVM separates the training samples of class $i$ from all other training samples. For instance, given $l$ training data $(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$, where $x_j \in R^n$ and $y_j \in \{1, 2, \ldots, K\}$, the $i$th SVM ($i = 1, 2, \ldots, K$)


solves the following problem:

$$\min_{\omega^i,\, b^i,\, \xi^i} \ \frac{1}{2}\|\omega^i\|^2 + C\sum_{j=1}^{l}\xi_j^i$$

subject to

$$(\omega^i \cdot \varphi(x_j)) + b^i \ge 1 - \xi_j^i, \quad \text{if } y_j = i$$
$$(\omega^i \cdot \varphi(x_j)) + b^i \le -1 + \xi_j^i, \quad \text{if } y_j \ne i$$
$$\xi_j^i \ge 0, \quad j = 1, 2, \ldots, l$$

The above optimization problem can be solved through its dual problem by introducing Lagrangian multipliers. The dual problem is the following:

$$\max_{\alpha} \ \sum_{j=1}^{l}\alpha_j - \frac{1}{2}\sum_{s=1}^{l}\sum_{j=1}^{l}\alpha_s\alpha_j y_s y_j K(x_s, x_j)$$

subject to

$$0 \le \alpha_j \le C, \quad j = 1, 2, \ldots, l$$
$$\sum_{j=1}^{l}\alpha_j y_j = 0$$

We obtain $K$ two-class classification rules by solving $K$ quadratic programming problems, where the rules are the following:

$$f_i(x) = [\omega^i \cdot \varphi(x)] + b^i$$

or

$$f_i(x) = \sum_{j=1}^{l} y_j \alpha_j^i K(x_j, x) + b^i, \quad i = 1, 2, \ldots, K$$

For a new sample $x$, we say $x$ is in the class which has the largest value of the decision function, i.e.

$$\text{class}(x) = \arg\max\{f_1(x), f_2(x), \ldots, f_K(x)\}$$
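As an illustration of this scheme, the following Python sketch trains the $K$ one-against-all classifiers and applies the argmax rule above. It is not the authors' code: it assumes scikit-learn's SVC as the binary soft-margin SVM solver with an RBF kernel, and all function and variable names are ours.

import numpy as np
from sklearn.svm import SVC  # binary soft-margin SVM with kernel support

def train_one_against_all(X, y, C=1.0, kernel="rbf"):
    """Train one binary SVM per class: class k against the rest."""
    models = {}
    for k in np.unique(y):
        binary_labels = np.where(y == k, 1, -1)   # +1 for class k, -1 otherwise
        models[k] = SVC(C=C, kernel=kernel).fit(X, binary_labels)
    return models

def predict_one_against_all(models, X):
    """Assign each sample to the class whose decision value f_k(x) is largest."""
    classes = sorted(models)
    # decision_function returns the signed value of f_k(x) for each sample
    scores = np.column_stack([models[k].decision_function(X) for k in classes])
    return np.asarray(classes)[np.argmax(scores, axis=1)]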
2.2 One-against-one

Another major method is the one-against-one method [3-4]. This method constructs $K(K-1)/2$ binary SVMs, where each one separates the training samples of class $i$ from the training samples of class $j$, $i, j = 1, 2, \ldots, K$, $i \ne j$. We construct each binary SVM by solving the following quadratic programming problem over the training samples of classes $i$ and $j$:

$$\min_{\omega^{ij},\, b^{ij},\, \xi^{ij}} \ \frac{1}{2}\|\omega^{ij}\|^2 + C\sum_{t}\xi_t^{ij}$$

subject to

$$(\omega^{ij} \cdot \varphi(x_t)) + b^{ij} \ge 1 - \xi_t^{ij}, \quad \text{if } y_t = i$$
$$(\omega^{ij} \cdot \varphi(x_t)) + b^{ij} \le -1 + \xi_t^{ij}, \quad \text{if } y_t = j$$
$$\xi_t^{ij} \ge 0, \quad t = 1, 2, \ldots, l$$

where $i, j = 1, 2, \ldots, K$, $i \ne j$; the dual problem of the above problem has the same form as in the one-against-all method. There are $K(K-1)/2$ decision functions after solving these optimization problems, where the rules are the following:

$$f_{ij}(x) = [\omega^{ij} \cdot \varphi(x)] + b^{ij}$$

or

$$f_{ij}(x) = \sum_{t} y_t \alpha_t^{ij} K(x_t, x) + b^{ij}, \quad i, j = 1, 2, \ldots, K, \ i \ne j$$

We predict that $x$ is in the class with the largest vote, which is called "MaxWins" [5]: if $f_{ij}(x)$ says $x$ is in the $i$th class, then the vote $p_i$ for the $i$th class is increased by one; otherwise the vote $p_j$ for the $j$th class is increased by one. Then we predict $x$ is in the class with the largest vote, i.e.

$$\text{class}(x) = \arg\max\{p_1, p_2, \ldots, p_K\}$$
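A sketch of the pairwise training and the "MaxWins" vote is given below. Again this is only an illustration under the same assumptions as before (scikit-learn's SVC as the binary solver; names are ours): each pairwise classifier outputs a positive decision value for the first class of its pair.

from itertools import combinations

import numpy as np
from sklearn.svm import SVC

def train_one_against_one(X, y, C=1.0, kernel="rbf"):
    """Train K(K-1)/2 binary SVMs, one for each pair of classes (i, j) with i < j."""
    models = {}
    for i, j in combinations(np.unique(y), 2):
        mask = (y == i) | (y == j)
        labels = np.where(y[mask] == i, 1, -1)      # +1 -> class i, -1 -> class j
        models[(i, j)] = SVC(C=C, kernel=kernel).fit(X[mask], labels)
    return models

def predict_max_wins(models, x):
    """MaxWins: every pairwise classifier votes for one class; return the most voted class."""
    votes = {}
    for (i, j), clf in models.items():
        winner = i if clf.decision_function(x.reshape(1, -1))[0] > 0 else j
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)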

2.3 DAGSVM

The DAGSVM's training phase is the same as in the one-against-one method, solving $K(K-1)/2$ binary SVMs [6]. However, in the testing phase, it uses a rooted binary directed acyclic graph which has $K(K-1)/2$ internal nodes and $K$ leaves. Given a test sample $x$, starting at the root node, the binary decision function is evaluated; the procedure then moves to either the left or the right child depending on the output value. Figure 1 shows the decision DAG for four classes.
Figure 1. The decision DAG for four classes
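The testing phase can be pictured as keeping a list of candidate classes and letting each pairwise classifier eliminate one of the two classes at the ends of the list, so that only $K-1$ evaluations are needed. The sketch below is our illustration of this, reusing the pairwise models trained above and their sign convention (positive decision value means the smaller class label of the pair); it is not the authors' implementation.

def predict_dag(models, x):
    """DAGSVM testing: K-1 pairwise decisions, each ruling out one candidate class."""
    candidates = sorted({c for pair in models for c in pair})  # all class labels
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]                   # first vs. last candidate
        score = models[(i, j)].decision_function(x.reshape(1, -1))[0]
        if score > 0:
            candidates.pop()       # classifier prefers class i, so class j is eliminated
        else:
            candidates.pop(0)      # classifier prefers class j, so class i is eliminated
    return candidates[0]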


3. Fuzzy classification method based on SVM

In this section, we give a detailed description of the idea and formulation of this fuzzy classification method (FCM).

3.1 Fuzzifying the training data

Suppose that we wish to perform classification on $N$ i.i.d. training samples:

$$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N) \tag{1}$$

where $x_i \in R^n$, $y_i \in R$, $i = 1, 2, \ldots, N$. Since the decision attributes are numerical, they need to be fuzzified into linguistic terms (categorical terms). The membership functions can be approximately determined based on experts' opinion or people's common perception. In this paper, we fuzzify the decision attributes into $K$ classes through fuzzy clustering based on self-organized learning [7,8]. So we get a set of training points with categorical decision attributes and corresponding membership degrees:

$$(x_1, \tilde{y}_1, \mu_1), (x_2, \tilde{y}_2, \mu_2), \ldots, (x_N, \tilde{y}_N, \mu_N) \tag{2}$$

where $x_i \in R^n$, $\tilde{y}_i \in \{1, 2, \ldots, K\}$, $i = 1, 2, \ldots, N$; $\mu_{ij}$ ($\mu_{ij} \in [0,1]$) is the membership degree indicating with what degree $y_i$ belongs to class $j$, $j = 1, 2, \ldots, K$; $\mu_i$ is the maximal value of $\mu_{i1}, \mu_{i2}, \ldots, \mu_{iK}$; and $\tilde{y}_i$ is the class corresponding to this maximal membership degree, i.e.:

$$\tilde{y}_i = \arg\max\{\mu_{i1}, \mu_{i2}, \ldots, \mu_{iK}\}$$
$$\mu_i = \max\{\mu_{i1}, \mu_{i2}, \ldots, \mu_{iK}\}$$
$$i = 1, 2, \ldots, N$$
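The paper obtains the classes and membership degrees through fuzzy clustering based on self-organized learning [7,8]. As a simple stand-in for that step, the following sketch clusters the numerical decision attribute with k-means and derives membership degrees from normalized inverse distances to the cluster centres; the clustering choice, the membership formula and all names are our assumptions for illustration only.

import numpy as np
from sklearn.cluster import KMeans

def fuzzify_decision_attribute(y, K, eps=1e-12):
    """Fuzzify a numerical decision attribute y into K classes.

    Returns (y_tilde, mu): the class of the maximal membership degree and
    that maximal degree for each sample, as in equation (2).
    """
    centres = np.sort(
        KMeans(n_clusters=K, n_init=10).fit(y.reshape(-1, 1)).cluster_centers_.ravel()
    )
    dist = np.abs(y[:, None] - centres[None, :]) + eps            # |y_i - c_j|
    memberships = (1.0 / dist) / (1.0 / dist).sum(axis=1, keepdims=True)  # rows sum to 1
    y_tilde = memberships.argmax(axis=1) + 1                      # class labels 1..K
    mu = memberships.max(axis=1)
    return y_tilde, mu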
3.2 Constructing the optimal separating hyperplanes

We then use one-against-all as the method of classification and the data in equation (2) as training samples to construct $K$ optimal separating hyperplanes. So we get $K$ decision functions:

$$f_j(x) = (\omega^j \cdot \varphi(x)) + b^j$$

or

$$f_j(x) = \sum_{i} y_i \alpha_i^j K(x_i, x) + b^j, \quad j = 1, 2, \ldots, K \tag{3}$$

Here $\mathrm{sign}[f_j(x)] = 1$ if sample $x$ belongs to class $j$, and $\mathrm{sign}[f_j(x)] = -1$ otherwise. Sample $x$ is classified into the class:

$$\text{class}(x) = \arg\max\{f_1(x), f_2(x), \ldots, f_K(x)\} \tag{4}$$

3.3 Learning the membership degree

For any sample $x$, its distance to the hyperplane $f_j(x) = 0$ in equation (3) is

$$d(x) = \frac{|f_j(x)|}{\|\omega^j\|} \tag{5}$$

In some sense, $d(x)$ reflects the membership degree of $x$ with respect to class $j$: if $f_j(x) > 0$, the larger $d(x)$ is, the higher the membership degree with which sample $x$ belongs to class $j$. So there is a functional relation between the membership degree and the distance. Since $\|\omega^j\|$ is a constant for the same hyperplane $f_j(x) = 0$, there is also a functional relation between the membership degree and $f_j(x)$.

For the training samples in equation (1), we know the following:

$$(d_1, \mu_1), (d_2, \mu_2), \ldots, (d_N, \mu_N) \tag{6}$$

where

$$d_i = \max_{j=1,2,\ldots,K}\{f_j(x_i)\}, \quad \mu_i = \max\{\mu_{i1}, \mu_{i2}, \ldots, \mu_{iK}\}, \quad i = 1, 2, \ldots, N$$

Given a testing sample $x$, it is first classified into class $\tilde{y}$ according to equation (4); at the same time, we get $d = \max_{j=1,\ldots,K}\{f_j(x)\}$. Then we use piecewise interpolation to estimate the membership degree of $x$ with respect to class $\tilde{y}$ from the data in equation (6) which have the same class label as sample $x$. So for a new sample $x$, this method not only classifies it into a certain class, but also estimates its membership degree.
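A minimal sketch of steps 3.2 and 3.3 is given below, reusing the one-against-all models of section 2.1 and piecewise-linear interpolation (numpy's interp) as one concrete choice of "piecewise interpolation"; for samples labelled class $j$ the decision value $f_j(x_i)$ is used in place of the distance, as the two differ only by the constant factor $\|\omega^j\|$. All names and these simplifications are ours, not the paper's.

import numpy as np

def fit_membership_curves(models, X, y_tilde, mu):
    """For each class j, collect the (d_i, mu_i) pairs of equation (6) from its samples."""
    curves = {}
    for j, clf in models.items():
        idx = np.where(y_tilde == j)[0]               # training samples labelled class j
        d = clf.decision_function(X[idx])             # f_j(x_i), proportional to the distance
        order = np.argsort(d)
        curves[j] = (d[order], mu[idx][order])        # sorted so interpolation is well defined
    return curves

def fuzzy_classify(models, curves, x):
    """Classify x by equation (4) and estimate its membership degree by interpolation."""
    x = x.reshape(1, -1)
    scores = {j: clf.decision_function(x)[0] for j, clf in models.items()}
    j_star = max(scores, key=scores.get)              # class(x) = argmax_j f_j(x)
    d_train, mu_train = curves[j_star]
    membership = np.interp(scores[j_star], d_train, mu_train)  # piecewise-linear estimate
    return j_star, membership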
4. Examples
There are many applications that can be fitted by this method. In this section, we will introduce two examples to see the benefits of this method.


We evaluated this method using the bodyfat data (from http://silver.sdsmt.edu/~rwjohnso) and the rice data (from the UCI repository) listed in Table 1.

Table 1. Features of the experiment data

The classification results of FCM and Fuzzy ID3 (FID3) are the following:

                                         FCM     FID3
Test accuracy rate                       100%    79%
Average error of membership degree       0.15    0.17

The experimental results show that this new fuzzy classification method not only has high classification accuracy, but also has strong forecasting capability for the membership degree.
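The two quantities in the table can be computed as in the sketch below. The paper does not define them explicitly; here the average error of membership degree is assumed to be the mean absolute difference between the predicted membership and the membership obtained by fuzzifying the known decision value of each test sample, and all names are illustrative.

import numpy as np

def evaluate(predicted_classes, predicted_memberships, true_classes, true_memberships):
    """Test accuracy rate and average absolute error of the membership degree."""
    accuracy = np.mean(np.asarray(predicted_classes) == np.asarray(true_classes))
    membership_error = np.mean(
        np.abs(np.asarray(predicted_memberships) - np.asarray(true_memberships))
    )
    return accuracy, membership_error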
5. Conclusions

In conclusion, we have described a new fuzzy classification method. The design fundamentals and the methods of computation and realization are given. The experimental results show that the new method in this paper is more effective than Fuzzy ID3. The characteristic of this method is that, for data with numerical decision attributes, it fuzzifies the samples into classes (linguistic terms) first and then learns a classifier; for a new sample, this method does not forecast the corresponding decision value, but gives the corresponding class and the membership degree with respect to this class. This fuzzy decision result is more objective and easier to understand than a precise decision result. The experimental results on the two databases show that this new fuzzy classification method not only has high classification accuracy, but also has strong forecasting capability for the membership degree. This new method optimizes the classification result of support vector machines and enhances the intelligent level of support vector machines.

However, this method still needs to be improved, for some problems remain. For one thing, the method is sensitive to noise because of the overfitting problem. For another, the forecasting capability of the membership prediction needs to be enhanced further.

Acknowledgements

This paper is supported by the Natural Science Fund of Hebei Province, the fund of the Educational Committee of Hebei Province, and a key project of the Ministry of Education.

References

[1] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
[2] Chih-Wei Hsu and Chih-Jen Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, March 2002.
[3] S. Knerr, L. Personnaz, and G. Dreyfus, "Single-layer learning revisited: A stepwise procedure for building and training a neural network," in Neurocomputing: Algorithms, Architectures and Applications, J. Fogelman, Ed. New York: Springer-Verlag, 1990.
[4] U. Kreßel, "Pairwise classification and support vector machines," in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1999, pp. 255-268.
[5] J. Friedman, "Another approach to polychotomous classification," Dept. of Statistics, Stanford University, Stanford, CA, 1996. [Online]. Available: http://www-stat.stanford.edu/reports/friedman/poly.ps.Z
[6] J. C. Platt, N. Cristianini, and J. Shawe-Taylor, "Large margin DAGs for multiclass classification," in Advances in Neural Information Processing Systems, vol. 12. Cambridge, MA: MIT Press, 2000, pp. 547-553.
[7] T. Kohonen, Self-Organization and Associative Memory. Berlin: Springer, 1988.
[8] Yufei Yuan and Michael J. Shaw, "Induction of fuzzy decision trees," Fuzzy Sets and Systems, vol. 69, pp. 125-139, 1995.

