
Expert Systems with Applications 32 (2007) 409–414
www.elsevier.com/locate/eswa

A new medical decision making system: Least square support vector machine (LSSVM) with Fuzzy Weighting Pre-processing

Emre Çomak a,*, Kemal Polat b, Salih Güneş b, Ahmet Arslan a

a Department of Computer Engineering, Engineering and Architecture Faculty, Selcuk University, Konya 42075, Turkey
b Department of Electrical and Electronics Engineering, Engineering and Architecture Faculty, Selcuk University, Konya 42075, Turkey

Abstract

The use of machine learning tools in medical diagnosis is increasing gradually. This is mainly because the effectiveness of classification and recognition systems has improved a great deal, helping medical experts to diagnose diseases. This study aims at diagnosing Liver Disorder with a new hybrid machine learning method. By hybridizing LSSVM with Fuzzy Weighting Pre-processing, a method was obtained that solves this diagnosis problem by classifying Liver Disorder data. The Fuzzy Weighting Pre-processing stage was first developed in our study. The Liver Disorder dataset is very commonly used in the literature on classification systems for Liver Disorder diagnosis, and it was used in this study to compare the classification performance of our proposed method with that of other studies. We obtained a classification accuracy of 94.29%, which is the highest reached so far. This result is for Liver Disorder, but it suggests that the method can also be used confidently for other medical diagnosis problems.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: SVM; LSSVM; Fuzzy Weighting Pre-processing; Liver Disorder disease diagnosis; ROC curves

1. Introduction

With improvements in medical knowledge systems in medical institutes and hospitals, extracting useful knowledge is becoming more difficult. In particular, because conventional manual data analysis techniques are not effective in diagnosis, computer-based analysis is becoming inevitable in disease diagnosis. It is therefore time to develop modern, effective and efficient computer-based systems for decision support. There are a number of data analysis techniques: statistical, machine learning and data abstraction. Medical analysis using machine learning techniques has been conducted for the last twenty years. The use of machine learning schemes in medical analysis has reduced the human support and costs required and has increased diagnosis accuracy (Cheung, 2001).
A growing number of studies investigate new technologies that help doctors diagnose disorders. Various automated techniques that do not rely on a doctor have also been developed for the diagnosis of disorders. For some disorders, Artificial Neural Networks (ANNs) have been successful (Yalçın & Yıldırım, 2003).
The function of the liver is to store toxic material and remove it from the body. If the amount of toxic material exceeds the storage capacity of the liver, the affected areas of the liver deteriorate, and some substances and enzymes enter the blood. When the disorder is diagnosed, the levels of these enzymes are examined. Many errors can be made in diagnosing liver disorders, both because there are many enzymes and because different amounts of alcohol can cause different disorders in different patients (Yalçın & Yıldırım, 2003).

* Corresponding author. Tel.: +90 332 2232043, +90 332 2232098, +90 332 2232082, +90 332 2232000; fax: +90 332 2410635.
E-mail addresses: ecomak@selcuk.edu.tr (E. Çomak), kpolat@selcuk.edu.tr (K. Polat), sgunes@selcuk.edu.tr (S. Güneş), ahmetarslan@selcuk.edu.tr (A. Arslan).

0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2005.12.001

In this paper, a new medical diagnosis system is proposed, based on LSSVM and on Fuzzy Weighting Pre-processing, which we developed. The proposed system is applied to the BUPA Liver Disorders dataset taken from the UCI Machine Learning Repository (BUPA Liver Disorders Dataset), and a classification accuracy of 94.29% is obtained. This is the highest rate reported in the literature so far. According to this result, our system can be used for medical diagnosis.
The rest of the paper is organized as follows. Section 2 explains the method, with subsections on SVM, LSSVM and Fuzzy Weighting Pre-processing, each of which gives detailed information. Section 3 describes the Liver Disorders dataset and its source. Section 4 gives the results obtained on the Liver Disorders dataset. Finally, Section 5 concludes the paper by summarizing the results, emphasizing the importance of this study and mentioning some future work.

Fig. 1. The structure of a simple SVM.

2. Material and method

In this work, LSSVM and Fuzzy Weighting Pre-processing are used as the material and method. They are explained below.

2.1. LSSVM

In this section we first describe the SVM classifier and then the LSSVM, which is derived from it.

2.1.1. Support vector machines (SVMs)

SVM is a reliable classification technique based on statistical learning theory. It was first proposed for classification and regression tasks by Vapnik (1995).
As shown in Fig. 1, a linear SVM classifies a data set that contains two separable classes, labelled {+1, −1}. Let the training data consist of n samples $(x_1, y_1), \ldots, (x_n, y_n)$, with $x \in R^n$ and $y \in \{+1, -1\}$. To separate these classes, the SVM has to find the optimal separating hyperplane, that is, the one with maximum margin, so that it has good generalization ability. All separating hyperplanes have the form

$$D(x) = (w \cdot x) + w_0 \tag{1}$$

and satisfy the following inequality for both y = +1 and y = −1:

$$y_i[(w \cdot x_i) + w_0] \ge 1, \quad i = 1, \ldots, n \tag{2}$$

The data points that satisfy formula (2) with equality are called the support vectors. Classification in SVMs is carried out using these support vectors.
The margins of the hyperplanes obey the following inequality:

$$\frac{y_k \times D(x_k)}{\|w\|} \ge \Gamma, \quad k = 1, \ldots, n \tag{3}$$

To maximize this margin ($\Gamma$), the norm of w is minimized. To reduce the number of solutions for the norm of w, the following equation is imposed:

$$\Gamma \times \|w\| = 1 \tag{4}$$

Then formula (5) is minimized subject to constraint (2):

$$\frac{1}{2}\|w\|^2 \tag{5}$$

For non-separable data, slack variables $\xi_i$ are added to formulas (2) and (5), which are replaced by the following formulas:

$$y_i[(w \cdot x_i) + w_0] \ge 1 - \xi_i \tag{6}$$

$$C \sum_{i=1}^{n} \xi_i + \frac{1}{2}\|w\|^2 \tag{7}$$

Since SVMs originally classify data in the linear case, they cannot by themselves perform classification tasks in the nonlinear case. To overcome this limitation, kernel approaches were developed. The nonlinear input data set is mapped via kernels into a high-dimensional feature space in which it can be separated linearly. The following kernels are most commonly used in SVMs:

• Dot product kernels: $K(x, x') = x \cdot x'$.
• Polynomial kernels: $K(x, x') = (x \cdot x' + 1)^d$, where the degree d of the kernel is a positive integer.
• RBF kernels: $K(x, x') = \exp(-\|x - x'\|^2/\sigma^2)$, where $\sigma$ is a positive real number.

In our experiments $\sigma = 1.7$ is selected.

2.1.2. Least squares support vector machines (LSSVM)

LSSVMs were proposed by Suykens and Vandewalle (1999). The most important difference between SVMs and LSSVMs is that LSSVMs use a set of linear equations for training while SVMs use a quadratic optimization problem

(Tsujinishi & Abe, 2003). While formula (7) is minimized subject to formula (6) in Vapnik's standard SVMs, in LSSVMs formula (9) is minimized subject to formula (8):

$$y_i[(w \cdot x_i) + w_0] = 1 - \xi_i, \quad i = 1, \ldots, n \tag{8}$$

$$\frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2 \tag{9}$$

According to these formulas, the dual problem is built from the following Lagrangian:

$$Q(w, w_0, \alpha, \xi) = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2 - \sum_{i=1}^{n}\alpha_i\{y_i[(w \cdot x_i) + w_0] - 1 + \xi_i\} \tag{10}$$

Another difference between SVMs and LSSVMs is that the Lagrange multipliers $\alpha_i$ can be positive or negative in LSSVMs, whereas they must be non-negative in SVMs. Detailed information can be found in Suykens and Vandewalle (1999) and Tsujinishi and Abe (2003).

2.2. Fuzzy Weighting Pre-processing

In Fuzzy Weighting Pre-processing, each feature value is replaced by a new value computed from its old value. Two membership functions, known as the input and output membership functions, are defined in this pre-processing. They are chosen as triangular membership functions, as shown in Figs. 2 and 3, respectively.
These membership functions are formed as follows. As a first step, the mean value of each feature is calculated from the corresponding feature values of all samples:

$$m_i = \frac{1}{N}\sum_{k=1}^{N} x_{k,i} \tag{11}$$

Here, $x_{k,i}$ represents the ith feature value of sample $x_k$, k = 1, 2, ..., N. After the sample mean of each feature has been calculated, the input membership function is formed from triangles as in Fig. 2. The supports of these triangles are determined by Avg/8, Avg/4, Avg/2, Avg, 2 × Avg, 4 × Avg and 8 × Avg, where Avg denotes the mean of the feature, as shown in Fig. 2. The lines of the input membership functions are named mf1, mf2, ..., mf8.
The output membership function is again formed from eight parts (Fig. 3). The interval [0, 1] is divided into eight equal parts, and the corresponding lines are named mf1′, mf2′, ..., mf8′. Note that these input and output membership functions are formed separately for each feature, so each feature has a different input–output membership function configuration, because the sample means of the features differ.

Fig. 2. Input membership function.

Fig. 3. Output membership function.
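
The construction of the input membership functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the exact triangle shapes are given only in Fig. 2, so the sketch assumes that mf1, ..., mf8 are triangles peaking at 0, Avg/8, Avg/4, Avg/2, Avg, 2 × Avg, 4 × Avg and 8 × Avg, that the feature mean is positive, and that values outside [0, 8 × Avg] are clamped. The names `breakpoints` and `input_memberships` are hypothetical. The last returned value is the minimum of the two memberships, which is the quantity Eq. (12) uses as the input membership value.

```python
from statistics import fmean

# Multipliers of the feature mean that place the triangle peaks (after the peak at 0).
FACTORS = (1 / 8, 1 / 4, 1 / 2, 1, 2, 4, 8)

def breakpoints(values):
    """Peaks of mf1..mf8 for one feature, from its sample mean (Eq. (11))."""
    avg = fmean(values)  # assumed to be positive
    return [0.0] + [avg * f for f in FACTORS]

def input_memberships(x, peaks):
    """Return (rule, mu_a, mu_b, min(mu_a, mu_b)) for a feature value x.

    rule = k means that x lies between the peaks of mf_k and mf_(k+1),
    so it cuts the falling edge of mf_k and the rising edge of mf_(k+1).
    """
    x = min(max(x, peaks[0]), peaks[-1])  # assumption: clamp out-of-range values
    for j in range(len(peaks) - 1):
        lo, hi = peaks[j], peaks[j + 1]
        if lo <= x <= hi:
            mu_a = (hi - x) / (hi - lo)  # falling edge of mf_(j+1)
            mu_b = (x - lo) / (hi - lo)  # rising edge of mf_(j+2)
            return j + 1, mu_a, mu_b, min(mu_a, mu_b)

# Toy feature with mean 4: peaks are 0, 0.5, 1, 2, 4, 8, 16, 32.
print(input_memberships(3, breakpoints([2, 4, 6])))  # (4, 0.5, 0.5, 0.5)
```

Because the breakpoints depend on the feature mean, `breakpoints` would be called once per feature, which matches the remark that every feature has its own membership function configuration.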



After the input and output membership functions have been determined for each feature, the weighting pre-processing is applied. A feature value $x_{k,i}$, that is, the ith feature value of sample $x_k$, is taken as a point on the x-axis of the input membership function, and the y-values of the points at which this value cuts the input membership functions are determined. For example, if the feature value is between 0 and Avg/8, it cuts both line mf1 and line mf2. The y-values at these intersection points, say $y_1$ and $y_2$, are known as membership values ($\mu$) and are then used in a fuzzy rule base in the following manner. First, the input membership value $\mu(i)$ is determined from the above intersection points:

$$\mu(i) = \mu_{A \cap B}(x_{k,i}) = \mathrm{MIN}(\mu_A(x_{k,i}), \mu_B(x_{k,i})), \quad x \in X \tag{12}$$

Here, the membership values $\mu_A(x_{k,i})$ and $\mu_B(x_{k,i})$ correspond to the intersection points mentioned above. The rule base of our system is presented in Table 1. After $\mu(i)$ has been determined with Eq. (12) for the feature value $x_{k,i}$, the output weight value is determined using the output membership functions and the rules in Table 1. In this last step, the input membership value $\mu(i)$ is presented to the output membership function to determine the weighting value corresponding to the original feature value. The membership value is now taken as a point on the y-axis of the output membership functions and, as for the input membership functions, the points at which this value cuts the output membership functions are determined. It is apparent from the output membership functions that there will be more than one intersection point. Which of them are used is decided by the rules in Table 1. For example, if the input feature value cuts lines mf1 and mf2 of the input membership functions, then the output value for this feature is the mean of the two points at which $\mu(i)$ cuts mf1′ and mf2′ of the output membership functions.

Table 1
Fuzzy rule base for our system
1. if Input_value cuts mf1 and mf2 then Output_value = (mf1′(y) + mf2′(y))/2
2. if Input_value cuts mf2 and mf3 then Output_value = (mf2′(y) + mf3′(y))/2
3. if Input_value cuts mf3 and mf4 then Output_value = (mf3′(y) + mf4′(y))/2
4. if Input_value cuts mf4 and mf5 then Output_value = (mf4′(y) + mf5′(y))/2
5. if Input_value cuts mf5 and mf6 then Output_value = (mf5′(y) + mf6′(y))/2
6. if Input_value cuts mf6 and mf7 then Output_value = (mf6′(y) + mf7′(y))/2
7. if Input_value cuts mf7 and mf8 then Output_value = (mf7′(y) + mf8′(y))/2

3. The Liver Disorder dataset used

The liver is an organ that neutralizes toxins and removes them from the body. If the amount of toxins exceeds the working capacity of the organ, the cells in the affected parts of the organ are destroyed. Some substances and enzymes are then released and enter the blood. During diagnosis of the disease, the levels of these enzymes are analysed. Because the effects of different alcohol dosages vary from one person to another, and because there are many enzymes, errors in diagnosis can be frequent (Yalçın & Yıldırım, 2003).
The BUPA Liver Disorders data set, which was prepared by the BUPA medical research company, includes 345 samples with 6 features and 2 class labels. Each sample is the record of a single male individual.
Two hundred samples of the data set belong to the first class and the remaining 145 samples belong to the second class. The first five features of each sample are obtained from blood tests. The last feature is daily alcohol consumption. Detailed information about this data set can be found in Yalçın and Yıldırım (2003):

• mean corpuscular volume (Mcv);
• alkaline phosphatase (Alkphos): a protein in the cell membrane involved in bile (gall) secretion;
• alanine aminotransferase (Sgpt): an aminotransferase whose blood level rises when hepatocellular necrosis sets in;
• aspartate aminotransferase (Sgot): an aminotransferase whose blood level rises when hepatocellular necrosis sets in;
• γ-glutamyl transpeptidase (Gammagt): a test that determines the amount of GGT enzyme in the blood;
• drinks: daily alcohol consumption (in half-pints).

4. The experimental results

In our experimental study, the Liver Disorders dataset is first pre-processed by Fuzzy Weighting Pre-processing and then classified by the LSSVM classifier. As mentioned in Section 3, this data set includes 345 samples with six features and two output labels; the first class includes 200 samples and the second class 145. The training set includes 310 samples (180 from the first class and 130 from the second class). The test set includes 35 samples (20 from the first class and 15 from the second class).
Standard LSSVM obtains a classification accuracy of 60.0%, while LSSVM with Fuzzy Weighting Pre-processing obtains 94.29%. Sensitivity and specificity values are also presented in Table 2. In addition, the accuracy rates obtained in the literature so far are listed in Table 3.
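
For readers who want to see what training the LSSVM classifier of Section 2.1.2 involves, the sketch below solves the linear system that Suykens and Vandewalle (1999) derive from formulas (8) and (9), using the RBF kernel in the form given in Section 2.1.1. It is a minimal illustration, not the code used in the experiments: the function names and the small Gaussian-elimination solver are assumptions made to keep the example self-contained, and the default values of `sigma` and `C` are simply those reported in Table 2.

```python
import math

def rbf(a, b, sigma):
    # RBF kernel in the form of Section 2.1.1: exp(-||a - b||^2 / sigma^2)
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / sigma ** 2)

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def lssvm_train(X, y, sigma=1.7, C=0.1):
    """Solve [[0, y^T], [y, Omega + I/C]] [w0; alpha] = [0; 1] for labels y in {+1, -1}."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            omega = y[i] * y[j] * rbf(X[i], X[j], sigma)
            row.append(omega + (1.0 / C if i == j else 0.0))
        A.append(row)
    z = solve(A, [0.0] + [1.0] * n)
    return z[0], z[1:]  # bias w0 and Lagrange multipliers alpha

def lssvm_predict(x, X, y, w0, alpha, sigma=1.7):
    s = w0 + sum(a * t * rbf(x, xi, sigma) for a, t, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1

# Toy data (hypothetical, not the BUPA data): two well-separated clusters.
X = [[0, 0], [0, 1], [3, 3], [3, 4]]
y = [-1, -1, 1, 1]
w0, alpha = lssvm_train(X, y)
print([lssvm_predict(x, X, y, w0, alpha) for x in X])
```

Training reduces to a single linear solve, which is the practical difference from the quadratic programme of a standard SVM; the multipliers in `alpha` are unconstrained in sign, in line with the remark in Section 2.1.2.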
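
The reported testing accuracies can be cross-checked against the sensitivity and specificity values of Table 2 and the composition of the 35-sample test set. The short sketch below does this arithmetic. It assumes, since the paper does not say so explicitly, that the first class (20 test samples) is treated as the positive class and the second class (15 test samples) as the negative class; the function name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Recover correct counts and overall accuracy (%) from per-class rates."""
    tp = round(sensitivity * n_pos)  # correctly classified positive samples
    tn = round(specificity * n_neg)  # correctly classified negative samples
    return tp, tn, round(100 * (tp + tn) / (n_pos + n_neg), 2)

# LSSVM with Fuzzy Weighting Pre-processing: sensitivity 95%, specificity 93.33%.
print(accuracy_from_rates(0.95, 0.9333, 20, 15))  # (19, 14, 94.29)
# Standard LSSVM: sensitivity 100%, specificity 6.66%.
print(accuracy_from_rates(1.00, 0.0666, 20, 15))  # (20, 1, 60.0)
```

Under this assumption the figures are mutually consistent: 33 of 35 test samples correct gives 94.29%, and 21 of 35 gives 60.0%, with the standard LSSVM assigning almost every test sample to the first class.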

Table 2
Comparison between standard LSSVM and LSSVM with Fuzzy Weighting Pre-processing
Method | Parameters | Sensitivity (%) | Specificity (%)
Standard LSSVM (60.0% testing accuracy) | σ = 1.7, C = 0.1 | 100 | 6.66
LSSVM with Fuzzy Weighting Pre-processing (94.29% testing accuracy) | σ = 1.7, C = 0.1 | 95 | 93.33

Table 3
Classification accuracy of LSSVM with Fuzzy Weighting Pre-processing for the BUPA Liver Disorders problem, compared with classification accuracies obtained by other methods in the literature
Author (Year) | Method | Classification accuracy (%)
Pham et al. (2000) | RULES-4 (40%–60%) | 55.90
Cheung (2001) | C4.5 (5 × CV) | 65.59
Cheung (2001) | Naïve Bayes (5 × CV) | 63.39
Cheung (2001) | BNND (5 × CV) | 61.83
Cheung (2001) | BNNF (5 × CV) | 61.42
Van Gestel et al. (2002) | SVM with GP (10 × CV) | 69.70
Lee and Mangasarian (2001a, 2001b) | SSVM (10 × CV) | 70.33
Lee and Mangasarian (2001a, 2001b) | RSVM (10 × CV) | 74.86
Yalçın and Yıldırım (2003) | MLP (3 × CV) | 73.05
Yalçın and Yıldırım (2003) | PNN (3 × CV) | 42.03
Yalçın and Yıldırım (2003) | GRNN (3 × CV) | 65.55
Yalçın and Yıldırım (2003) | RBF (3 × CV) | 58.55
Polat et al. (2005) | AIRS (10 × CV) | 81.00
Our method (2005) | LSSVM with Fuzzy Weighting Pre-processing | 94.29

To compare the classification performances of the standard LSSVM and LSSVM with Fuzzy Weighting Pre-processing classifiers, the receiver operating characteristic (ROC) curve method is preferred. According to this method, ROC curves and the areas under these curves are computed for both classifiers, as shown in Fig. 4.
ROC analysis is a statistical comparison method that uses the true positive and false positive rates. The area under a ROC curve is denoted by Az. This value is related to the accuracy of the classifier: higher values represent higher classification accuracies, while lower values represent lower classification accuracies (Osareh, Mirmehdi, Thomas, & Markham, 2002; Centor, 1991). The ROC curves show that there is a large difference between the areas computed for the two classifiers (Az = 0.95 for LSSVM with fuzzy weighting but Az = 0.336 for LSSVM without fuzzy weighting).

5. Discussion and conclusions

In this paper, a new weighting method called Fuzzy Weighting Pre-processing has been developed using fuzzy logic, and a new medical diagnosis system has been built by combining this weighting method with the LSSVM classifier.
In the application phase of this study, the LSSVM with Fuzzy Weighting Pre-processing method is applied to the BUPA Liver Disorders dataset and a classification rate of 94.29% is obtained. This is the highest classification rate reported in the literature so far (Table 3). In addition, standard LSSVM (without Fuzzy Weighting Pre-processing) obtains a classification rate of 60%. This result shows that Fuzzy Weighting Pre-processing greatly increases the classification rate of LSSVM for the current data set.
According to the application results, LSSVM with Fuzzy Weighting Pre-processing showed considerably high performance with regard to classification accuracy on the BUPA Liver Disorders dataset.
In this study, ROC curves are also used to test the accuracy of the proposed system statistically. As the ROC curves show, the area under the ROC curve is 0.336 for standard LSSVM and 0.95 for LSSVM with Fuzzy Weighting Pre-processing. According to these results, the proposed system is very effective and reliable.
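
As an illustration of how an area such as Az can be obtained, the sketch below builds a ROC curve from classifier scores by sweeping a decision threshold and integrates it with the trapezoidal rule. It is a generic example and not the procedure used to produce Fig. 4, which the paper does not detail; the function names and the toy scores are hypothetical, and labels are assumed to be +1 (positive) and −1 (negative).

```python
def roc_curve(scores, labels):
    """Return ROC points (false positive rate, true positive rate)."""
    pos = sum(1 for t in labels if t == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy scores: one negative sample is ranked above one positive sample.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, -1, 1, -1, -1]
print(round(area_under_curve(roc_curve(scores, labels)), 4))  # 0.8889
```

An area of 1 means every positive sample is ranked above every negative one, and 0.5 corresponds to random ranking, so an area below 0.5 indicates scores that rank the two classes worse than chance.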
Although the developed method is built as an offline diagnosis system, it can be rebuilt as an online diagnosis system in the future.

Fig. 4. ROC curves for LSSVM with fuzzy and LSSVM without fuzzy, with Az values.

Acknowledgement

This study is supported by the Scientific Research Projects of Selcuk University.

References

BUPA Liver Disorders Dataset. UCI Repository of Machine Learning Databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/liver-disorders/bupa.data.
Centor, R. M. (1991). Signal detectability: The use of ROC curves and their analysis. Medical Decision Making, 11, 102–106.
Cheung, N. (2001). Machine learning techniques for medical analysis. School of Information Technology and Electrical Engineering, B.Sc. thesis, University of Queensland.
Lee, Y. J., & Mangasarian, O. L. (2001a). SSVM: A smooth support vector machine for classification. Computational Optimization and Applications, 20(1), 5–22.
Lee, Y. J., & Mangasarian, O. L. (2001b). RSVM: Reduced support vector machines. In Proceedings of the first SIAM international conference on data mining.
Osareh, A., Mirmehdi, M., Thomas, B., & Markham, R. (2002). Comparative exudate classification using support vector machines and neural networks. In T. Dohi & R. Kikinis (Eds.), Fifth international conference on medical image computing and computer-assisted intervention. LNCS 2489 (pp. 413–420). Berlin: Springer.
Pham, D. T., Dimov, S. S., & Salem, Z. (2000). Technique for selecting examples in inductive learning. In European symposium on intelligent techniques (ESIT 2000), Aachen, Germany (pp. 119–127).
Polat, K., Şahan, S., Kodaz, H., & Güneş, S. (2005). Karaciğer Rahatsızlığı Teşhisinde Yeni Bir Sınıflama Yöntemi: Danışmalı Yapay Bağışıklık Sistemi (AIRS). IEEE 13. Sinyal İşleme Kurultayı (SIU-2005), Kayseri.
Suykens, J. A. K., & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9(3), 293–300.
Tsujinishi, D., & Abe, S. (2003). Fuzzy least squares support vector machines for multi-class problems. Neural Networks Field, 16, 785–792.
Van Gestel, T., Suykens, J. A. K., Lanckriet, G., Lambrechts, A., De Moor, B., & Vandewalle, J. (2002). Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel fisher discriminant analysis. Neural Computation, 14(5), 1115–1147.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Yalçın, M., & Yıldırım, T. (2003). Karaciğer bozukluklarının yapay sinir ağları ile teşhisi. In Biyomedikal Mühendisliği Ulusal Toplantısı (BIYOMUT 2003), Istanbul, Türkiye (pp. 293–297).
