Expert Systems With Applications: M.A.H. Farquad, V. Ravi, S. Bapi Raju

Expert Systems with Applications 37 (2010) 55775589
Contents lists available at ScienceDirect
Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa
Support vector regression based hybrid rule extraction methods for forecasting
M.A.H. Farquad a,b, V. Ravi a,*, S. Bapi Raju b
a
Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, Hyderabad 500 057 (AP), India
b
Department of Computer and Information Sciences, University of Hyderabad, Hyderabad 500 046 (AP), India
a r t i c l e i n f o a b s t r a c t
Keywords: Support Vector Regression (SVR) solves regression problems based on the concept of Support Vector
Rule extraction Machine (SVM) introduced by Vapnik (1995). The main drawback of these newer techniques is their lack
Support vector machine of interpretability. In other words, it is difcult for the human analyst to understand the knowledge learnt
Support vector regression by these models during training. The most popular way to overcome this difculty is to extract ifthen
Classication and regression tree
rules from SVM and SVR. Rules provide explanation capability to these models and improve the compre-
Adaptive network based fuzzy inference
system
hensibility of the system. Over the last decade, different algorithms for extracting rules from SVM have
Dynamic evolving fuzzy inference system been developed. However rule extraction from SVR is not widely available yet. In this paper a novel
Root mean squared error hybrid approach for extracting rules from SVR is presented. The proposed hybrid rule extraction proce-
dure has two phases: (1) Obtain the reduced training set in the form of support vectors using SVR (2)
Train the machine leaning techniques (with explanation capability) using the reduced training set.
Machine learning techniques viz., Classication And Regression Tree (CART), Adaptive Network based
Fuzzy Inference System (ANFIS) and Dynamic Evolving Fuzzy Inference System (DENFIS) are used in
the phase 2. The proposed hybrid rule extraction procedure is compared to stand-alone CART, ANFIS
and DENFIS. Extensive experiments are conducted on ve benchmark data sets viz. Auto MPG, Body
Fat, Boston Housing, Forest Fires and Pollution, to demonstrate the effectiveness of the proposed
approach in generating accurate regression rules. The efciency of these techniques is measured using
Root Mean Squared Error (RMSE). From the results obtained, it is concluded that when the support vec-
tors with the corresponding predicted target values are used, the SVR based hybrids outperform the
stand-alone intelligent techniques and also the case when the support vectors with the corresponding
actual target values are used.
2010 Elsevier Ltd. All rights reserved.
1. Introduction the hyperplane with small norm while simultaneously minimizing

the sum of the distances from data points to the hyperplane. De-
During the last decade a number of researchers and practitio- spite their superior performance in various application areas, the
ners are using SVM to solve pattern classication and function- models created by SVM and SVR are opaque, which must be con-
approximation problems. Function approximation or regression sidered as a serious drawback as these models do not yield human
problems, unlike classication problems, have continuous output comprehensible knowledge. Many researchers have tried to treat
variable. In many applications such as medicine, nance it is desir- this accuracy vs. comprehensibility trade-off by converting the
able to extract knowledge from SVM for the users to gain a better opaque, high accurate model to transparent model via rule
understanding of how the model solves the problems. SVM algo- extraction.
rithm developed by Vapnik (1995) is based on statistical learning Rule extraction algorithms were rst developed in the context
theory. For classication problems (Burges, 1998; Osuna, Freund, of neural networks. They provide transparency for the opaque
& Girosi, 1997; Pontil & Verri, 1997) it tries to nd a maximal mar- models (Gallant, 1988). Researchers argued that even limited
gin hyperplane that separates two classes. In the case of regression explanation can positively inuence the system acceptance by
problem the goal is to construct a hyperplane that lies near to or the user (Davis, Buchanan, & Shortliffe, 1977). Using rule extraction
close to as many instances as possible (Ancona, 1999; Joachims, a learning system might discover salient features in the input data
1998; Smola & Scholkopf, 1998). Therefore the objective is to nd whose importance was not previously recognized (Craven & Shav-
lik, 1994).
Multiple rule extraction techniques have been proposed by the
* Corresponding author. Tel.: +91 40 23534981x2042.
E-mail addresses: farquadonline@gmail.com (M.A.H. Farquad), rav_padma@ researchers to extract rules from SVM. Nunez, Angulo, and Catata
yahoo.com (V. Ravi), bapics@uohyd.ernet.in (S.B. Raju). (2002) proposed SVM + Prototype for extracting rules from SVM.
0957-4174/$ - see front matter 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2010.02.055
5578 M.A.H. Farquad et al. / Expert Systems with Applications 37 (2010) 55775589
They used K-means clustering algorithm for determining prototype In this paper, we present a hybrid rule extraction procedure for
vectors for each input class. An ellipsoid is dened in the input solving regression problems. The proposed approach has two
space combining these prototypes with support vectors and phases, (i) SVR is used to extract the support vectors from the
mapped it to ifthen rules. Fung, Sandilya, and Bharat Rao (2005) training set. Two different training sets are made, where, one set
proposed a rule extraction technique similar to SVM + Prototype. has support vectors along with their corresponding actual output
But it does not require the computationally expensive clustering. values given in the dataset and the second set has the support vec-
Instead, the algorithm transforms the problem to a simpler, equiv- tors and their corresponding output values predicted by SVR. (ii)
alent variant and constructs the hyper cubes by solving linear pro- These two sets are then used separately to generate rules using
grams. Each hypercube is then transformed to a rule. CART, ANFIS and DENFIS. By using the rst training set, we are
Fu, Ong, Keerthi, Hung, and Goh (2004) proposed RulExtSVM for reducing the number of patterns in the input space (because we
extracting ifthen rules using intervals dened by hyper rectangu- used support vectors only) and the rules are generated in phase
lar forms. A hyper rectangle is generated by using the intersection 2 are not extracted from SVR. However by using the 2nd training
of the support vectors with the decision boundary and the rules are set, we are ensuring that the rules generated in phase 2 are indeed
generated based on the hyper rectangles. Initial rule set1 is then extracted from SVR.
tuned in order to improve rule accuracy and the redundant rules The rest of the paper is organized as follows. Section 2 presents
are removed to obtain more concise rule set. The disadvantage of overview of SVR, CART, ANFIS and DENFIS. Section 3 explains the
this algorithm is the construction of hyper rectangles based on architecture of proposed hybrid approach. Section 4 presents the
the number of support vectors. A hybrid rule extraction technique experimental setup. The datasets used in the study are described
is proposed by Barakat and Diederich (2004, 2005), where they in Section 5. Section 6 presents results and discussion. Finally Sec-
rst developed an SVM model using the training set and removed tion 7 concludes the paper.
the output class labels. They used the developed model to predict
the output class labels. Later, they used these support vectors for 2. Overview of the intelligent and machine learning techniques
training the decision tree and generated rules. Hyper rectangle
Rules Extraction (HRE) (Zhang, Su, Jia, & Chu, 2005) rst constructs 2.1. Support vector regression
hyper rectangles according to the prototypes and the support vec-
tors (SVs) using SVM model. By projecting these hyper rectangles The support vector machine (SVM) is a constructive learning
onto coordinate axes, ifthen rules are obtained. Barakat and Brad- procedure based on the statistical learning theory (Vapnik, 1995).
ley (2006) describe the use of the area under the receiver operating SVMs are an inductive machine learning technique based on the
characteristic (ROC) curve (AUC) to assess the quality of rules ex- structural risk minimization principle that aims at minimizing
tracted from SVM. the true error. An SVM performs classication by constructing an
Fuzzy Rule Extraction (FREx) (Chaves, Vellasco, & Tanscheit, N-dimensional hyper plane that optimally separates the data into
2005) determines the projection of the support vectors in the coor- two categories. The main objective of SVM is to nd an optimal
dinate axes during rst step. Using triangular fuzzy membership separating hyperplane that correctly classies data points as much
function each support vector is then transformed into fuzzy ifthen as possible and separates the points of two classes as far as possi-
rule. A Multiple Kernel-Support Vector Machine (MK-SVM) scheme ble, by minimizing the risk of misclassifying the training samples
with feature selection, rule extraction and prediction modeling is and unseen test samples.
proposed to improve the explanation capacity of SVM (Chen, Li, In a typical regression problem, we are given a training set
& Wei, 2007). This approach makes use of the information provided n
fxi ; yi gi1 Rd R, where xi and yi are the input variable vector
by the separating hyperplane and support vectors. Rules obtained and output variable, respectively, of the ith pair. Support vector
from the present approach have good generalization capacity and regression (Scholkopf & Smola, 2002) is a kernel method that per-
comprehensibility and are applied to gene expression data of can- forms nonlinear regression based on the kernel trick. Essentially,
cer tissue. Barakat and Bradley (2007) proposed a method to ex- each input xi 2 Rd is mapped implicitly via a nonlinear feature
tract rules directly from the support vectors (SVs) of a trained map / to some kernel-induced feature space F where linear
SVM using a modied sequential covering algorithm termed regression is performed.
SQRexSVM. Rules are generated based on an ordered search of In SVR (Smola & Schlkopf, 2004; Vapnik, 1995), the goal is to
the most discriminative features, as measured by interclass separa- nd a function f x that has at most e deviation from the actually
tion. Rule set performance is then evaluated using the measured obtained targets yi for all the training data. Deviation larger than
rates of true positives (TPs) and false positives (FPs), and the e is not accepted.
AUC. Martens, Baesens, and Gestel (2009) proposed a new Active In the case of linear functions f taking the form
Learning-Based Approach (ALBA) to extract rules from SVM mod-
els. ALBA makes use of the key concept of SVM that the support X
n
f x wi xi b with w 2 R; b 2 R: 1
vectors are typically close to decision boundary and extracts rules
i1
from the trained SVM. All these methods except for Barakat and
Diederich (2004, 2005), do not employ hybrid approach for rule One way to ensure minimum e deviation is to minimize the norm,
extraction. Barakat and Diederich (2004, 2005) applied decision
X
n
trees to extract rules from a trained SVM. However, they did not i:e: kwk2 wTi wi :
test their approach sufciently. i1
Then Farquad, Ravi, and Raju (2008a, 2008b) proposed a hybrid
The problem can be written as a convex optimization problem:
rule extraction approach using SVM for bankruptcy prediction in
banks. They used SVM as a preprocessor to extract support vectors. 1
These support vectors along with their corresponding actual out- minimize kwk2
82
put values of the target variable are used to train a classier with > Pn
>
explanation capability such as Fuzzy Rule Based System (FRBS), < yi wi xi b 6 e;
>
i1
Decision Tree and RBF in the second phase. It is observed from subject to 2
>
> P
n
the empirical results that the hybrid SVM + FRBS outperformed >
: wi xi b yi 6 e:
the stand-alone classiers. i1
M.A.H. Farquad et al. / Expert Systems with Applications 37 (2010) 55775589 5579
The assumption in (2) is that such a function f actually exists

that approximates all pairs xi ; yi with e precision. In the case
where constraints are infeasible, slack variables ni ; ni were intro-
duced. This case is called soft margin formulation (Bennett & +
X X
Mangasarian, 1992) and is described by the following problem; 0
X
X +
1 X l X X X
Minimize kwk2 C ni ni X
2 i1
X
X X
8 X
> Pn
X
>
> yi wi xi b 6 e ni ; X
X
>
>
< i1 X
subject to Pn 3
>
> w x b yi 6 e ni ;
> i1 i i
>
>
: Fig. 1. The soft margin loss setting for a linear SVR.
ni ; ni P 0:
The constant C > 0 determines the amount up to which devia-
tions larger than e are tolerated. This is called e-insensitive loss To make the SV algorithm nonlinear all training patterns xi are
function jnje and is described by mapped U : X ! F into some feature space F as described in (Aiz-

0 if jnj 6 e; erman, Braverman, & Rozonoer, 1964; Nilsson, 1965) and then
jnje : 4 using the standard SV regression a linear hyperplane is constructed
jnj e otherwise:
in the feature space. Using the trick of kernel functions (Cortes &
Fig. 1 depicts this e-insensitive loss function graphically. Only Vapnik, 1995) following QP problem is formulated.
the points inside the shaded region contribute to the prediction
8
accuracy, whereas the points outside the shaded region contribute > P
l
> 1
ai ai aj aj Kxi ; xj ;
to the error, as the deviations are penalized in a linear fashion. It is <2
>
i;j1
observed that in most cases the optimization problem (3) can be Maximize
>
> P
l P
l
solved more easily in its dual formulation. Hence, Lagrange multi- >
: e ai ai yi ai ai ;
pliers are used to get the dual formulation as described in Fletcher i1 i1
(1989), which is as follows: X

l
subject to ai ai 0 and ai ; ai 2 0; C:
1 X l X l
i1
L kwk2 C ni ni gi ni gi ni
2 i1 i1
The optimal solution obtained
X
l X
n P P
w li1 ai ai Uxi and f x li1 ai ai Kxi ; x b,
ai e ni yi wi xi b
where K(,) is a kernel function.
i1 i1
Usually more than one kernel used in the literature to map the
X
l X
n X
l
ai e ni yi wi xi b gi ni gi ni : 5 input space into feature space (Cristianini & Shawe-Taylor, 2000).
i1 i1 i1 The question is which kernel functions provide good generalization
for a particular problem. One has to try more than one kernel func-
Here L is the Lagrangian and gi ; gi ; ai ; ai are Lagrange multipliers.
tion for a particular problem in order to resolve this issue. Because
Hence the dual variables in (5) have to satisfy positivity constraints
of the approximate mapping of input space to higher dimensional
i.e.
feature space using different kernel functions, support vectors ex-
ai ; ai ; gi ; gi P 0: 6 tracted are different and the number of support vectors varies as
well for each kernel.
According to the saddle point condition the partial derivatives of L
with respect to the primal variables w; b; ni ; ni have to vanish for
optimality. Therefore, we get 2.2. Classication and regression tree (CART)
@L X l
CART implements binary recursive portioning procedure (Brei-
a ai 0; 7
@b i0 i man, Friedman, Olshen, & Stone, 1984). It always splits attributes
into exactly two child nodes and the process can be repeated by
@L X l
w ai ai xi 0; 8 treating each child node as a parent. Key features of CART are split-
@w i1 ting of attributes (nodes) and deciding when a tree is complete and
@L assigning each terminal node to a class outcome (or predicted va-
C ai gi 0; 9 lue for regression). To split a node into two child nodes, CART al-
@ni
@L ways asks questions that have a yes or no answer. For
C ai gi 0: 10 example, the questions might be: is salary 655,000? Or is loan
@ni
660,000?
Substituting (7)(10) into (5) yields the dual optimization problem; CARTs method is to look at all possible splits for all attributes
(variables/nodes) included in the analysis. It checks for the best
1X l X n X l
split for the node using GINI index and once a best split is found,
maximize ai ai aj aj xi xj e ai ai
2 i;j1 i;j1 i1 CART repeats the search process for each child node, continuing
X
l recursively until further splitting is impossible or stopped. At this
yi ai ai 11 point a maximal tree has been produced, which probably greatly
i1 overts the information contained within the learning dataset. The
X
l next step is tree pruning, which results in the creation of a se-
subject to ai ai 0 and ai ; ai 2 0; C: quence of simpler trees, through removing the highly specialized
i1
rules, which have low coverage. The nal step is optimal tree
selection, during which the tree that does not overt the data, is DENFISs off-line learning process is implemented in the follow-
selected from among the sequence of pruned trees (Breiman ing way:
et al., 1984).
1. Cluster (partition) the input space to nd a cluster center by
using off-line evolving clustering method with constraint opti-
2.3. Adaptive network based fuzzy inference system (ANFIS)
mization i.e. Evolving Clustering Method (ECM), which is a fast,
one pass clustering algorithm for dynamic clustering of input
Fuzzy inference system (FIS) also called as fuzzy rule based sys-
stream data.
tem, is actually composed of ve functional blocks, (i) rule base,
2. Create the antecedent part of each fuzzy rule and also the cur-
containing a number of fuzzy ifthen rules. (ii) Database, dening
rent position of the cluster center.
the membership function of the fuzzy sets used in the rules. (iii)
3. Find n datasets, each of them including one cluster center and p
Decision Making Unit, used for inference operations on rules. (iv)
learning data pairs that are closest to the center in the input
Fuzzication interface, which transforms the crisp inputs into de-
space.
gree of match with linguistic values. (v) Defuzzication Interface,
4. Estimate the function f to create the consequent part of each
which transforms the fuzzy results of the inference into crisp out-
fuzzy rule with n datasets.
put (Jang, 1993). An adaptive network is a network structure that
consists of nodes and directional links through which the nodes
are connected. Part or all the nodes in this network are adaptive 3. Proposed hybrid approach
in nature, i.e. the outputs of these adaptive nodes are dependent
on the parameter(s) of that node. Gradient descent and the chain In this research work we propose a novel hybrid rule extraction
rule are basic learning approaches of adaptive networks (Werbos, procedure for solving regression problems. For regression prob-
1974). Learning species how these parameters should be tuned lems, the error in prediction varies depending on the distance of
in order to minimize the error measure. the predicted output value from the actual output value. The pro-
ANFIS is an FIS (Takagi & Sugeno, 1985) implemented in the posed hybrid has two components (i) extraction of support vectors
framework of adaptive networks. Using a given input / output data from SVR and (ii) rule generation using intelligent techniques viz.
set, the ANFIS constructs an FIS (Jang, 1993). Membership function CART, ANFIS and DENFIS.
parameters for FIS are tuned using either a backpropagation algo-
rithm alone or in combination with a least squares method. Fuzzy
ifthen rules are of the form IF A THEN B, where A and B are labels 3.1. Phase 1: extraction of support vectors
of fuzzy sets (Zadeh, 1965) characterized by appropriate member-
ship functions. Using SVR, support vectors are extracted from the given training
set. The predictive accuracy measured in terms of Root Mean
Squared Error (RMSE) obtained by SVR is computed and the set
2.4. Dynamic evolving neuro-fuzzy inference system (DENFIS) of support vectors corresponds to the experiments that yields the
lowest RMSE are considered for the next phase of the hybrid. The
DENFIS is new type of FIS for adaptive online learning (Kasabov dataow is depicted in Fig. 2.
& Song, 2002). DENFIS evolves through incremental, hybrid (super- Two different training sets are constructed using the extracted
vised/unsupervised) learning and accommodates new input data. support vectors. Case (1): training set consisting of the support
Using local element tuning, new fuzzy rules are generated and up- vectors and the corresponding actual values of the target variable
dated during the operations of the system. given in the dataset. Case (2): training set consisting of the support
DENFIS is used for online and off-line learning. In online model vectors and the corresponding predicted values of the target
of DENFIS, the linear function in the consequent parts are created variable predicted by SVR. Using Case (2), we are ensuring that
and updated through learning from data using least square the rules generated during phase 2 are indeed extracted from SVR.
estimator.
3.2. Phase 2: rule generation

Training SVR
Set During rule generation phase we analyzed both Case (1) and
Case (2) with one machine learning and two soft computing tech-
niques with explanation capability viz., CART, ANFIS and DENFIS.
Support Vectors Figs. 3 and 4 depict the rule generation step of the proposed
hybrid rule extraction method. Rules are generated for each fold
separately under 10-fold cross validation method. Generated rules
Case (1): Support vectors and Case (2): Support Vectors and are rst tested against the test set of each of the 10-folds and
corresponding Actual Target values Target values predicted by SVR the same set of rules is tested against the validation set as well.
Prediction accuracy of the rules is determined in terms of RMSE.
Fig. 2. First phase of the proposed hybrid (support vectors extraction). Lower the RMSE value higher the prediction rate is.
Case (1) CART / ANFIS / DENFIS
Test / Validation Rules / Tree Prediction Error (RMSE)
Fig. 3. Second Phase of the proposed hybrid for Case (1) (Rule Generation).
Case(2) CART / ANFIS / DENFIS
Test / Validation Rules / Tree Prediction Error (RMSE)
Fig. 4. Second Phase of the proposed hybrid for Case (2) (Rule Generation).
4. Experiment setup sets formed out of Case (1) and Case (2) are constructed using
the extracted support vectors as explained in phase1 of the Sec-
Dataset is rst divided into two parts of 80%:20% ratio. Then 80% tion 3. The new training sets formed out of Case (1) and Case (2)
part of the data is used for 10-fold cross validation testing and 20% are then used in the phase 2 of the hybrids SVR + CART, SVR + AN-
of the data is named as validation set and stored for validation pur- FIS and SVR + DENFIS for generating rules.
pose. Fig. 5 depicts the segmentation of the data and the 10-fold We chose publicly available benchmark datasets for regression
Cross Validation structure maintained for all the experiments in analysis from UCI machine learning repository and StatLib (Data,
this paper. Table 1 presents the number of samples in each dataset Software and News from the Statistics Community) repository.
after dividing the datasets into 80% and 20%. Support vectors are The datasets viz., Auto MPG, Body Fat, Boston Housing, Forest Fires
extracted for every fold using SVR and two different new training and Pollution are used to evaluate the proposed hybrid rule extrac-
tion procedure. Table 2 presents the details of the datasets, includ-
ing number of instances, number of attributes and the target
Data Set 100% variable information. The accuracy of the rules on a test and valida-
tion dataset is measured in terms of the RMSE which is computed
as follows:
80% Data for 10-fold Cross Validation Validation Set 20%
s
Pn
01 02 03 10
i1 yi yi
^ 2
RMSE ;
n
Rules extracted using training set in 10-fold
Cross validation are tested against Validation where y is the actual output and y ^ is the predicted output and n is
Set the number of patterns. Linear, Polynomial, RBF and Sigmoid Ker-
nels are used to extract support vectors. The support vectors ex-
Fig. 5. Segmentation of data and 10-fold Cross Validation. tracted using that kernel, which yield the best accuracy, are then
supplied to the rule generation phase. Table 3 presents the average
RMSE and the average number of support vectors extracted using
Table 1 four different kernels for all the datasets.
Eighty percent and 20% division of the datasets.
Data set 100% Total instances 20% Validation 80% Tenfold CV

5. Dataset description
Auto MPG 398 78 320
Body fat 252 52 200
Boston housing 506 106 500 5.1. Auto MPG dataset
Forest res 517 107 410
Pollution 60 10 50
This dataset concerns city-cycle fuel consumption in miles per
gallon. This dataset contains 398 instances with eight attributes.
This dataset is available in UCI machine learning repository (Asun-
cion & Newman, 2007). The attributes are described in Table 4.
Table 2
Data set information.
Data set Total instances Attributes Target variable 5.2. Body fat dataset
Auto MPG 398 8 Miles per gallon
Body fat 252 15 Body mass index This dataset is obtained from StatLib repository http://lib.stat.c-
Boston housing 506 14 MEDV mu.edu. It estimates the percentage of body fat determined by
Forest res 517 13 Area effected underwater weighing and various body circumference measure-
Pollution 60 16 Air pollution
ments for 252 men (Penrose, Nelson, & Fisher, 1985). Its attributes
are described in Table 5.
Table 3
Average RMSE values by SVR.
Data set Linear Polynomial RBF Sigmoid

RMSE SVs RMSE SVs RMSE SVs RMSE SVs
Auto MPG 0.0814 144 0.0808 150 0.098 141 0.0815 144
Body fat 0.0189 106 0.0441 106 0.1124 90 0.019 105
Boston housing 0.1115 188 0.0857 199 0.1468 183 0.1115 188
Forest res 0.0515 227 0.0518 242 0.0515 228 0.0515 228
Pollution 0.1187 28 0.1352 33 0.1416 25 0.1185 28
Table 4 Table 7
Attribute description of Auto MPG dataset. Attribute description of Forest res dataset.
# Attribute name Attribute type # Attribute name Attribute type

1 Cylinders Multivalued discrete 1 X x-axis spatial coordinate within the Multivalued discrete
2 Displacement Continuous Montesinho park map: 19
3 Horsepower Continuous 2 Y y-axis spatial coordinate within the Multivalued discrete
4 Weight Continuous Montesinho park map: 29
5 Acceleration Continuous 3 month Multivalued discrete
6 Model year Multivalued discrete 4 day Multivalued discrete
7 Origin Multivalued discrete 5 FFMC ne fuel moisture code Continuous
8 Miles Per Gallon Continuous (TARGET) 6 DMC duff moisture code Continuous
7 DC drought code Continuous
8 ISI initial spread index Continuous
9 temperature Continuous
Table 5
10 RH relative humidity Continuous
Attribute description of Body fat dataset.
11 Wind wind speed in km/h Continuous
# Attribute name Attribute type 12 Rain outside rain in mm/m2 Continuous
13 Area the burned area of the forest Continuous (TARGET)
1 X1, Density determined from underwater weighing Continuous (in ha): 0.001090.84
2 X2, Age (years) Multivalued discrete
3 X3, Weight (lbs) Continuous
4 X4, Height (inches) Continuous
5 X5, Neck circumference (cm) Continuous Table 8
6 X6, Chest circumference (cm) Continuous Attribute description of Pollution dataset.
7 X7, Abdomen 2 circumference (cm) Continuous
8 X8, Hip circumference (cm) Continuous # Attribute name Attribute type
9 X9, Thigh circumference (cm) Continuous
1 X1, PREC average annual precipitation in inches Continuous
10 X10, Knee circumference (cm) Continuous
2 X2, JANT average January temperature in degrees F Continuous
11 X11, Ankle circumference (cm) Continuous
3 X3, JULT Average July temperature in degrees F Continuous
12 X12, Biceps (extended) circumference (cm) Continuous
4 X4, OVR65% of 1960 SMSA population aged 65 or Continuous
13 X13, Forearm circumference (cm) Continuous
older
14 X14, Wrist circumference (cm) Continuous
5 X5, POPN average household size Continuous
15 Percent body fat from Siris (1956) equation Continuous
6 X6, EDUC median school years completed by those Continuous
(TARGET)
over 22
7 X7, HOUS% of housing units which are sound & with Continuous
all facilities
8 X8, DENS population per sq. mile in urbanized Continuous
Table 6
areas, 1960
Attribute description of Boston Housing dataset.
9 X9, NONW% non-white population in urbanized Continuous
# Attribute name Attribute type areas, 1960
10 X10, WWDRK% employed in white collar Continuous
1 CRIM: per capita crime rate by town Continuous
occupations
2 ZN: proportion of residential land zoned for lots over Continuous
11 X11, POOR% of families with income < $3000 Continuous
25,000 sq.ft.
12 X12, HC relative hydrocarbon pollution potential Continuous
3 INDUS: proportion of non-retail business acres per Continuous
13 X13, NOX same for nitric oxides Continuous
town
14 X14, SO@ same for sulphur dioxide Continuous
4 CHAS: Charles River dummy variable (=1 if tract Binary
15 X15, HUMID annual average % relative humidity at Continuous
bounds river; 0 otherwise)
1pm
5 NOX: nitric oxides concentration (parts per 10 Continuous
16 MORT Total age-adjusted mortality rate per Continuous
million)
100,000 (TARGET)
6 RM: average number of rooms per dwelling Continuous
7 AGE: proportion of owner-occupied units built prior Continuous
to 1940
8 DIS: weighted distances to ve Boston employment Continuous
Table 9
centres
Average RMSE values using Auto MPG.
9 RAD: index of accessibility to radial highways Continuous
10 TAX: full-value propertytax rate per $10,000 Continuous Auto MPG Case (1) Case (2)
11 PTRATIO: pupilteacher ratio by town Continuous
12 B: 1000(Bk 0.63)2 where Bk is the proportion of Continuous Test Validation Test Validation
blacks by town CART 0.0591 0.0484
13 LSTAT: % lower status of the population Continuous SVR + CART 0.0802 0.0642 0.0422 0.0274
14 MEDV: Median value of owner-occupied homes in Continuous DENFIS 0.295 0.3134
$1000s (TARGET) SVR + DENFIS 0.1399 0.1781 0.1647 0.1604
ANFIS 0.1036 0.1607
SVR + ANFIS 0.5489 0.937 0.0939 0.1185
5.3. Boston housing dataset
This dataset is obtained from UCI machine learning repository

attributes. This is a difcult regression task, where the aim is to
(Asuncion & Newman, 2007). It concerns housing values in suburbs
predict the burned area of forest res, in the northeast region of
of Boston and contains 506 instances with 14 attributes. Table 6
Portugal, by using meteorological and other data (Cortez & Morais,
presents the attribute description.
2007). The attributes are described in Table 7.
5.4. Forest res dataset 5.5. Pollution dataset
This dataset is obtained from UCI machine learning repository This dataset is obtained from StatLib repository http://lib.stat.c-
(Asuncion & Newman, 2007). It contains 517 instances with 13 mu.edu. This dataset contains 60 instances with 16 attributes
Table 10
Rules set using SVR + CART for Auto MPG dataset.
# Antecedents Prediction
01 if WEIGHT 6 0.415226 and HORSEPOWER 6 0.146739 and ORIGIN 6 0.75 and MODEL_YEAR 6 0.541667 0.473147
02 if WEIGHT 6 0.415226 and ORIGIN 6 0.75 and MODEL_YEAR > 0.541667 and MODEL_YEAR 6 0.875 and HORSEPOWER 6 0.0380435 0.636408
03 if WEIGHT 6 0.415226 and ORIGIN 6 0.75 and MODEL_YEAR > 0.541667 and MODEL_YEAR 6 0.875 and HORSEPOWER > 0.0380435 and 0.529263
HORSEPOWER 6 0.14673
04 if WEIGHT 6 0.415226 and HORSEPOWER 6 0.146739 and ORIGIN 6 0.75 and MODEL_YEAR > 0.875 0.619481
05 if WEIGHT 6 0.415226 and HORSEPOWER 6 0.146739 and ORIGIN > 0.75 and MODEL_YEAR 6 0.875 0.618746
06 if WEIGHT 6 0.415226 and HORSEPOWER 6 0.146739 and ORIGIN > 0.75 and MODEL_YEAR > 0.875 0.760563
07 if HORSEPOWER > 0.146739 and WEIGHT 6 0.274171 and MODEL_YEAR 6 0.541667 and DISPLACEMENT 6 0.130491 0.442948
08 if HORSEPOWER > 0.146739 and WEIGHT 6 0.274171 and MODEL_YEAR 6 0.541667 and DISPLACEMENT > 0.130491 0.406843
09 if HORSEPOWER > 0.146739 and WEIGHT 6 0.274171 and MODEL_YEAR > 0.541667 and MODEL_YEAR 6 0.791667 and 0.472736
ACCELERATION 6 0.538691
10 if HORSEPOWER > 0.146739 and WEIGHT 6 0.274171 and MODEL_YEAR > 0.541667 and MODEL_YEAR 6 0.791667 and ACCELERATION > 0.538691 0.547625
11 if HORSEPOWER > 0.146739 and WEIGHT > 0.274171 and WEIGHT 6 0.415226 and MODEL_YEAR 6 0.541667 0.35487
12 if HORSEPOWER > 0.146739 and WEIGHT > 0.274171 and WEIGHT 6 0.415226 and MODEL_YEAR > 0.541667 and MODEL_YEAR 6 0.791667 0.409262
13 if HORSEPOWER > 0.146739 and MODEL_YEAR > 0.791667 and WEIGHT 6 0.200312 0.632958
14 if HORSEPOWER > 0.146739 and MODEL_YEAR > 0.791667 and WEIGHT > 0.200312 and WEIGHT 6 0.296569 0.536736
15 if HORSEPOWER > 0.146739 and MODEL_YEAR > 0.791667 and WEIGHT > 0.296569 and WEIGHT 6 0.415226 0.488202
16 if MODEL_YEAR 6 0.875 and WEIGHT > 0.415226 and WEIGHT 6 0.505387 and HORSEPOWER 6 0.138587 0.419966
17 if MODEL_YEAR 6 0.875 and WEIGHT > 0.415226 and WEIGHT 6 0.505387 and HORSEPOWER > 0.138587 and DISPLACEMENT 6 0.447028 0.331893
18 if MODEL_YEAR 6 0.875 and WEIGHT > 0.415226 and WEIGHT 6 0.505387 and HORSEPOWER > 0.138587 and DISPLACEMENT > 0.447028 0.298214
19 if MODEL_YEAR 6 0.875 and WEIGHT > 0.505387 and WEIGHT 6 0.604479 and ACCELERATION 6 0.747024 and HORSEPOWER 6 0.453804 0.304063
20 if MODEL_YEAR 6 0.875 and WEIGHT > 0.505387 and WEIGHT 6 0.604479 and ACCELERATION 6 0.747024 and HORSEPOWER > 0.453804 0.236424
21 if MODEL_YEAR 6 0.875 and WEIGHT > 0.505387 and WEIGHT 6 0.604479 and ACCELERATION > 0.747024 0.196165
22 if WEIGHT > 0.415226 and WEIGHT 6 0.604479 and MODEL_YEAR > 0.875 0.477826
23 if WEIGHT > 0.604479 and WEIGHT 6 0.693082 and ACCELERATION 6 0.494047 0.183192
24 if WEIGHT > 0.604479 and WEIGHT 6 0.693082 and ACCELERATION > 0.494047 0.104375
25 if WEIGHT > 0.693082 and WEIGHT 6 0.870286 0.117609
26 if WEIGHT > 0.870286 0.0874944
Table 11 (McDonald & Schwing, 1973). Table 8 presents the attributes

Average RMSE values using Body fat. description.
Body fat Case (1) Case (2)
Test Validation Test Validation
6. Results and discussion
CART 0.0202 0.0134
SVR + CART 0.025 0.0314 0.0242 0.0195
DENFIS 0.2539 0.0164
For extracting support vectors, LibSVM algorithm implemented
SVR + DENFIS 0.0381 0.0201 0.019 0.0048 in RapidMiner (Ingo, Michael, Ralf, Martin, & Timm, 2006) is used.
ANFIS 0.0971 0.057 RapidMiner is an open source data mining software package avail-
SVR + ANFIS 0.0595 0.052 0.0609 0.047 able at http://rapid-i.com/. Different kernels viz., linear, polyno-
mial, RBF and sigmoid are used separately to extract the support
Table 12
Rules set using SVR + DENFIS for Body fat dataset.
01 if X1 is GMF(0.50, 0.34) and X2 is GMF(0.50, 0.71) and X3 is GMF(0.50, 0.49) and X4 is GMF(0.50, 0.73) and X5 is Y = 1.99 0.99 * X1 + 0.01 * X3 0.01 *
GMF(0.50, 0.68) and X6 is GMF(0.50, 0.55) and X7 is GMF(0.50, 0.54) and X8 is GMF(0.50, 0.39) and X9 is GMF(0.50, 0.45) X4 + 0.01 * X7 + 0.01 * X8 0.01 * X10
and X10 is GMF(0.50, 0.44) and X11 is GMF(0.50, 0.30) and X12 is GMF(0.50, 0.73) and X13 is GMF(0.50, 0.67) and X14 is
GMF(0.50, 0.53)
02 if X1 is GMF(0.50, 0.35) and X2 is GMF(0.50, 0.23) and X3 is GMF(0.50, 0.53) and X4 is GMF(0.50, 0.83) and X5 is
GMF(0.50, 0.50) and X6 is GMF(0.50, 0.46) and X7 is GMF(0.50, 0.48) and X8 is GMF(0.50, 0.50) and X9 is GMF(0.50, 0.63)
GMF(0.50, 0.19)
GMF(0.50, 0.14)
GMF(0.50, 0.05)
GMF(0.50, 0.32)
GMF(0.50, 0.19)
(continued on next page)

Table 12 (continued)
GMF(0.50, 0.66)
GMF(0.50, 0.45)
GMF(0.50, 0.77)
GMF(0.50, 0.36)
GMF(0.50, 0.45)
GMF(0.50, 0.12)
GMF(0.50, 0.42)
GMF(0.50, 0.40)
GMF(0.50, 0.44)
GMF(0.50, 0.55)
GMF(0.50, 0.14)
GMF(0.50, 0.53)
GMF(0.50, 0.27)
GMF(0.50, 0.53)
GMF(0.50, 0.47)
GMF(0.50, 0.34)
GMF(0.50, 0.71)
GMF(0.50, 0.27)
GMF(0.50, 0.25)
GMF(0.50, 0.05)
GMF(0.50, 0.36)
GMF(0.50, 0.42)
GMF(0.50, 0.38)
GMF(0.50, 0.62)
GMF(0.50, 0.12)
GMF(0.50, 0.21)
GMF(0.50, 0.40)
GMF(0.50, 0.21)
GMF(0.50, 0.64)
GMF(0.50, 0.29)
GMF(0.50, 0.95)
GMF(0.50, 0.36)
GMF(0.50, 0.55)
GMF(0.50, 0.27)
GMF(0.50, 0.58)
GMF(0.50, 0.25)
GMF(0.50, 0.42)

GMF(0.50, 0.42)
GMF(0.50, 0.40)
GMF(0.50, 0.40)
GMF(a, b) indicates Gaussian membership function with mean a and variance b.
Table 13 nel yielded the lowest RMSE 0.0808 and hence, the support vectors
Average RMSE values using Boston housing. extracted using polynomial kernel are used for rule generation
Boston housing Case (1) Case (2) purpose. Our proposed hybrid SVR + CART using Case (2) yielded
highest prediction accuracy against validation set with RMSE of
Test Validation Test Validation
0.0274. The hybrids SVR + ANFIS and SVR + DENFIS with Case (2)
CART 0.0444 0.0657
obtained RMSE of 0.1185 and 0.1604 respectively. The hybrids
SVR + CART 0.0616 0.0784 0.0101 0.0568
DENFIS 0.6136 1.1477 SVR + CART, SVR + ANFIS and SVR + DENFIS with Case (1) obtained
SVR + DENFIS 0.643 0.4263 0.2695 0.3101 the RMSE of 0.0642, 0.937 and 0.1781 respectively. Stand-alone
ANFIS 0.1553 0.5635 CART yielded RMSE of 0.0484, ANFIS yielded RMSE of 0.1607 and
SVR + ANFIS 0.512 0.7204 0.1216 0.1509 DENFIS yielded RMSE of 0.3134. Table 10 presents 26 rules of the
fold, which yielded the best prediction accuracy on validation set
using the hybrid SVR+CART with Case (2). Results of Case (2) are
vectors from the training set of the given problem. CART is avail- much better than that of Case (1).
able at http://www.salfordsystems.com/cart.php and is a popular Table 11 presents the average results of 10-fold cross validation
decision tree used for solving both classication and regression and the results on validation set for Body Fat dataset. Linear kernel
problems. ANFIS is available as the function ans in the Fuzzy Logic obtained the best prediction accuracy with RMSE of 0.0189 for this
Toolbox of MATLAB. DENFIS is implemented using NeuCom tool dataset and the set of support vectors extracted using linear kernel
and NeuCom student version is freely accessible at http://www.au- are used in the rule generation phase. Our proposed hybrid
t.ac.nz/. All the data sets are normalized before invoking SVR. SVR + DENFIS with Case (2) yielded highest prediction accuracy
Table 9 presents the average RMSE values over 10-folds and the on validation data with RMSE of 0.0048. The rules of the fold,
RMSE on validation data set for Auto MPG dataset. Polynomial ker- which yielded the highest accuracy on validation set using the hy-
Table 14
Rules Set using SVR + CART for Boston Housing dataset.
01 if CRIM 6 0.059321 and LSTAT 6 0.104305 and RM 6 0.620617 0.522726
02 if CRIM 6 0.059321 and LSTAT 6 0.104305 and RM > 0.620617 and RM 6 0.661142 0.580579
03 if CRIM 6 0.059321 and LSTAT > 0.104305 and LSTAT 6 0.164045 and RM 6 0.573864 0.45106
04 if CRIM 6 0.059321 and LSTAT > 0.104305 and LSTAT 6 0.164045 and RM > 0.573864 and RM 6 0.661142 0.505173
05 if LSTAT 6 0.164045 and RM 6 0.661142 and CRIM > 0.059321 0.826973
06 if LSTAT 6 0.164045 and CRIM 6 0.0449 and RM > 0.661142 and RM 6 0.704158 0.61574
07 if LSTAT 6 0.164045 and CRIM 6 0.0449 and RM > 0.704158 and RM 6 0.820272 0.71813
08 if LSTAT 6 0.164045 and RM > 0.661142 and RM 6 0.820272 and CRIM > 0.0449 0.900976
09 if LSTAT 6 0.164045 and RM > 0.820272 and RAD 6 0.217392 0.892593
10 if LSTAT 6 0.164045 and RM > 0.820272 and RAD > 0.217392 0.787818
11 if LSTAT > 0.164045 and LSTAT 6 0.357754 and PTRATIO 6 0.132978 0.639806
12 if PTRATIO > 0.132978 and LSTAT > 0.164045 and LSTAT 6 0.274835 and CHAS 6 0.5 and RM 6 0.48927 0.361998
13 if PTRATIO > 0.132978 and LSTAT > 0.164045 and LSTAT 6 0.274835 and CHAS 6 0.5 and RM > 0.48927 and TAX 6 0.271946 0.442761
14 if PTRATIO > 0.132978 and LSTAT > 0.164045 and LSTAT 6 0.274835 and CHAS 6 0.5 and RM > 0.48927 and TAX > 0.271946 0.398301
15 if PTRATIO > 0.132978 and LSTAT > 0.164045 and LSTAT 6 0.274835 and CHAS > 0.5 0.591983
16 if PTRATIO > 0.132978 and LSTAT > 0.274835 and LSTAT 6 0.357754 and CRIM 6 0.321786 and NOX 6 0.101749 0.216231
17 if LSTAT > 0.274835 and LSTAT 6 0.357754 and CRIM 6 0.321786 and NOX > 0.101749 and PTRATIO > 0.132978 and PTRATIO 6 0.888298 0.353425
18 if LSTAT > 0.274835 and LSTAT 6 0.357754 and CRIM 6 0.321786 and NOX > 0.101749 and PTRATIO > 0.888298 0.268955
19 if PTRATIO > 0.132978 and LSTAT > 0.274835 and LSTAT 6 0.357754 and CRIM > 0.321786 0.161555
20 if LSTAT > 0.357754 and LSTAT 6 0.450745 and CRIM 6 0.0012965 0.414246
21 if LSTAT > 0.357754 and LSTAT 6 0.450745 and CRIM > 0.0012965 and CRIM 6 0.117411 0.272778
22 if CRIM 6 0.117411 and LSTAT > 0.450745 and NOX 6 0.436214 0.240371
23 if LSTAT > 0.450745 and NOX > 0.436214 and CRIM 6 0.0221815 0.125006
24 if LSTAT > 0.450745 and NOX > 0.436214 and CRIM > 0.0221815 and CRIM 6 0.117411 0.196233
25 if LSTAT > 0.357754 and CRIM > 0.117411 and NOX 6 0.50926 0.224651
26 if CRIM > 0.117411 and NOX > 0.50926 and LSTAT > 0.357754 and LSTAT 6 0.415287 0.227456
27 if NOX > 0.50926 and LSTAT > 0.415287 and CRIM > 0.117411 and CRIM 6 0.157842 0.0778943
28 if NOX > 0.50926 and LSTAT > 0.415287 and CRIM > 0.157842 and CRIM 6 0.249894 0.15585
29 if NOX > 0.50926 and LSTAT > 0.415287 and CRIM > 0.249894 0.0781522
Table 15 brid SVR + DENFIS with Case (2) are 46 and presented in Table 12.
Average RMSE values using Forest res. The hybrids SVR + CART and SVR + ANFIS with Case (2) yielded an
Forest res Case (1) Case (2) RMSE of 0.0195 and 0.047 respectively. CART in stand-alone mode
Test Validation Test Validation obtained the RMSE of 0.0134, which is better than the hybrids
SVR + CART, SVR + ANFIS and SVR + DENFIS with Case (1) which
CART 0.0371 0.042
SVR + CART 0.0436 0.0442 0.00018 0.00031
obtained RMSE of 0.0314, 0.0201 and 0.052 respectively. Stand-
DENFIS 0.2238 0.2516 alone ANFIS yielded RMSE of 0.057 where as the stand-alone DEN-
SVR + DENFIS 0.2442 0.252 0.0594 0.0099 FIS yielded RMSE of 0.0164.
ANFIS 0.2325 0.3462 Table 13 presents the average results obtained using Boston
SVR + ANFIS 0.1193 0.099 0.0462 0.0098
Housing dataset. Here polynomial kernels yielded lowest RMSE
of 0.0857 and hence, the support vectors extracted using polyno-
mial kernel are used for rule generation phase. The proposed hy-
Table 16 brid SVR + CART with Case (2) yielded highest prediction
Rules set using SVR + CART for forest res dataset. accuracy against validation set with RMSE of 0.0568. The hybrids
# Antecedents Prediction SVR + ANFIS and SVR + DENFIS with Case (2) yielded RMSE of
1 if MONTH 6 0.863636 and RAIN 6 0.578125 0.00346333
0.1509 and 0.3101 respectively. The hybrids SVR + CART, SVR + AN-
2 if MONTH 6 0.863636 and RAIN > 0.578125 0.00667844 FIS and SVR + DENFIS with Case (1) obtained RMSE of 0.0784,
3 if MONTH > 0.863636 0.00532111 0.7204 and 0.4263 respectively. CART in stand-alone fashion ob-
tained RMSE of 0.0657, whereas ANFIS in stand-alone mode
yielded RMSE of 0.5635 and stand-alone DENFIS yielded RMSE of
Table 17 1.1477. Table 14 presents 29 rules of the fold yielding the best
Average RMSE values using Pollution.
accuracy on validation set using the hybrid SVR + CART with Case
Pollution Case (1) Case (2) (2).
Test Validation Test Validation Table 15 presents the average RMSE values obtained using For-
CART 0.1285 0.1127
est Fires dataset. Using this data RBF kernel yielded the lowest
SVR + CART 0.1219 0.1512 0.0769 0.0854 RMSE 0.0515 and the support vectors extracted using RBF kernel
DENFIS 0.1982 0.1395 is selected for rule generation phase in hybrid. The hybrid
SVR + DENFIS 0.3046 0.2499 0.1184 0.0765 SVR + CART with Case (2) obtained best prediction accuracy with
ANFIS 0.1505 0.1034
RMSE of 0.00031. Hybrids SVR+ANFIS and SVR + DENFIS with Case
SVR + ANFIS 0.157 0.1145 0.137 0.0956
(2) yielded RMSE of 0.0098 and 0.0099, respectively. The hybrids
Table 18
Rules set using SVR + DENFIS for pollution dataset.
01 if X1 is GMF(0.50, 0.82) and X2 is GMF(0.50, 0.59) and X3 is GMF(0.50, 0.69) and X4 is GMF(0.50, 0.35) and X5 Y = 1.30 + 0.38 * X1 0.20 * X2 0.11 * X3 0.10 *
is GMF(0.50, 0.88) and X6 is GMF(0.50, 0.39) and X7 is GMF(0.50, 0.05) and X8 is GMF(0.50, 0.25) and X9 is X4 0.12 * X6 0.09 * X7 + 0.29 * X8 + 0.51 *
GMF(0.50, 0.95) and X10 is GMF(0.50, 0.34) and X11 is GMF(0.50, 0.91) and X12 is GMF(0.50, 0.09) and X13 is X9 0.05 * X10 0.03 * X11 0.03 * X12 + 0.02 *
GMF(0.50, 0.13) and X14 is GMF(0.50, 0.55) and X15 is GMF(0.50, 0.31) X13 + 0.15 * X14 + 0.01 * X15
02 if X1 is GMF(0.50, 0.64) and X2 is GMF(0.50, 0.34) and X3 is GMF(0.50, 0.37) and X4 is GMF(0.50, 0.82) and X5
is GMF(0.50, 0.49) and X6 is GMF(0.50, 0.93) and X7 is GMF(0.50, 0.70) and X8 is GMF(0.50, 0.40) and X9 is
GMF(0.50, 0.11) and X10 is GMF(0.50, 0.77) and X11 is GMF(0.50, 0.13) and X12 is GMF(0.50, 0.07) and X13 is
GMF(0.50, 0.13) and X14 is GMF(0.50, 0.48) and X15 is GMF(0.50, 0.39)
03 if X1 is GMF(0.50, 0.68)and X2 is GMF(0.50, 0.24) and X3 is GMF(0.50, 0.15) and X4 is GMF(0.50, 0.95) and X5

GMF(0.50, 0.30) and X10 is GMF(0.50, 0.73) and X11 is MF(0.50, 0.19) and X12 is GMF(0.50, 0.10) and X13 is
SVR + CART, SVR + ANFIS and SVR + DENFIS using Case (1) yielded Table 17 presents the results obtained using pollution data. As
RMSE of 0.0442, 0.099 and 0.252, respectively. The techniques the sigmoid kernel yielded the lowest prediction error using pollu-
CART, ANFIS and DENFIS in stand-alone mode yielded RMSE of tion data, the support vectors extracted using the sigmoid kernel
0.042, 0.3462 and 0.2516 respectively. The rules extracted using are used for rule generation. The hybrid SVR + DENFIS with Case
the hybrid SVR + CART with Case (2) for the fold that yielded best (2) outperforms other approaches with RMSE of 0.0765. Hybrids
prediction accuracy on validation set are presented in Table 16. SVR + CART and SVR + ANFIS with Case (2) yielded RMSE of
0.0854 and 0.0956 respectively. SVR + CART, SVR + ANFIS and on hybrid intelligent systems, November 0609, 2005. Rio de Janeiro,
Brazil.
SVR + DENFIS with Case (1) yielded RMSE of 0.1512, 0.1145 and
Chen, Z., Li, J., & Wei, L. (2007). A multiple kernel support vector machine scheme for
0.2499 respectively, whereas CART, ANFIS and DENFIS in stand- feature selection and rule extraction from gene expression data of cancer tissue.
alone mode obtained the RMSE of 0.1127, 0.1034 and 0.1395, Articial Intelligence in Medicine, 41, 161175.
respectively. Table 18 presents 26 rules which are extracted for Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20,
273297.
the fold which obtained the best prediction accuracy against vali- Cortez, P., & Morais, A. (2007). A data mining approach to predict forest res using
dation set using the proposed hybrid SVR + DENFIS with Case (2). meteorological data. In J. Neves, M. F. Santos, & J. Machado (Eds.), New trends in
It is observed from the experiments that the proposed hybrids articial intelligence, proceedings of the 13th EPIA2007 Portuguese conference on
articial intelligence, Guimares, Portugal (pp. 512523).
SVR + CART, SVR + ANFIS and SVR + DENFIS with Case (2) obtained Craven, M. W., & Shavlik, J. W. (1994). Using sampling and queries to extract rules
better prediction accuracies, compared to other hybrids from trained neural networks. In Proceedings of the eleventh international
SVR + CART, SVR + ANFIS and SVR + DENFIS with Case (1) and conference on machine learning, San Francisco, CA, USA.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines
stand-alone CART, ANFIS and DENFIS. It is also observed from the and other kernel-based learning methods. New York, NY, USA: Cambridge
experiments that the hybrid SVR + CART with Case (2) obtained University Press.
the best prediction accuracy in case of Auto MPG, Boston Housing Davis, R., Buchanan, B. G., & Shortliffe, E. (1977). Production rules as a representation
for a knowledge-based consultation program. Articial Intelligence, 8(1), 1545.
and Forest Fires datasets and SVR + DENFIS with Case (2) obtained Farquad, M. A. H., Ravi, V., & Raju, S. B. (2008). Support vector machine based hybrid
best results using Body Fat and Pollution datasets. Stand-alone classiers and rule extraction thereof: Application to bankruptcy prediction in
CART, ANFIS and DENFIS most of the time performed better com- banks. In E. Soria, J. D. Martn, R. Magdalena, M. Martnez, & A. J. Serrano (Eds.),
Handbook of research on machine learning applications and trends: Algorithms,
pared to the hybrid SVR + CART, SVR + ANFIS and SVR + DENFIS
methods and techniques, IGI Global, USA.
with Case (1). We observed that rule extraction using proposed hy- Farquad, M. A. H., Ravi, V., & Raju, S. B. (2008). Rule extraction using support vector
brids is a viable option for solving regression problems. machine based hybrid classier. Presented in TENCON-2008, IEEE region 10
conference, 1921 November, Hyderabad, India.
Fletcher, R. (1989). Practical methods of optimization. New York: John Wiley and Sons.
7. Conclusion Fu, X., Ong, C. J., Keerthi, S., Hung, G. G., & Goh, L. (2004). Extracting the knowledge
embedded in support vector machines. In International joint conference on neural
networks (IJCNN04), Budapest, Hungary.
In this paper, we propose a new hybrid rule extraction approach Fung, G., Sandilya, S., & Bharat Rao, R. (2005). Rule extraction from linear support
for solving regression problems from SVR. First, using SVR, support vector machines. In KDD05: proceeding of the eleventh ACM SIGKDD international
vectors are extracted in phase 1 and the extracted support vectors conference on knowledge discovery in data mining (pp. 3240). New York, NY,
USA: ACM Press.
are fed to one of CART, ANFIS and DENFIS in phase 2. The data set of Gallant, S. (1988). Connectionist expert systems. Communications of the ACM, 31(2),
extracted support vectors with actual corresponding actual target 152169.
values available is referred to as Case (1), whereas in Case (2) the Ingo, M., Michael, W., Ralf, K., Martin, S., & Timm, E. (2006). YALE: Rapid prototyping
for complex data mining tasks. In Proceedings of the 12th ACM SIGKDD
corresponding target values are replaced by the predictions given international conference on knowledge discovery and data mining (KDD-06),
by SVR. Using Case (2) we are ensuring that the rules generated Philadelphia, PA, USA.
during phase 2 are indeed extracted from SVR. From the analysis Jang, J. S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE
Transactions on Systems, Man, Cybernetics, 23(5/6), 665685.
it is concluded that the proposed hybrid SVR + CART, SVR + ANFIS
Joachims, T. (1998). Making large-scale SVM learning practical. Technical report, LS-8-
and SVR + DENFIS with Case (2) achieved higher accuracy than 24, Computer Science Department, University of Dortmund.
the accuracies obtained using stand-alone CART, ANFIS and DENFIS Kasabov, N., & Song, Q. (2002). DENFIS: Dynamic, evolving neural-fuzzy inference
systems and its application for time-series prediction. IEEE Transactions on Fuzzy
and the hybrids SVR + CART, SVR + ANIFS and SVR + DENFIS with
Systems, 10, 144154.
Case (1). Stand-alone CART, ANFIS and DENFIS achieved better Martens, D., Baesens, B., & Gestel, T. V. (2009). Decompositional rule extraction from
accuracy compared to the hybrids SVR + CART, SVR + ANFIS and support vector machines by active learning. IEEE Transactions on Knowledge and
SVR + DENFIS with Case (1). We conclude that the proposed hybrid Data Engineering, 21(2), 178191.
McDonald, G. C., & Schwing, R. C. (1973). Instabilities of regression estimates
is a viable alternative to generating rules from SVR for solving relating air pollution to mortality. Technometrics, 15, 463482.
regression problems. Nilsson, N. J. (1965). Learning machines: Foundations of trainable pattern classifying
systems. McGraw-Hill.
Nunez, H., Angulo, C., & Catata, A. (2002). Rule extraction from support vector
References machines. In European symposium on articial neural networks proceedings (pp.
107112).
Aizerman, M. A., Braverman, E. M., & Rozonoer, L. I. (1964). Theoretical foundations Osuna, E. E., Freund, R., & Girosi, F. (1997). Support vector machines: Training and
of the potential function method in pattern recognition learning. Automation applications. Technical report, Massachusetts Institute of Technology, Articial
and Remote Control, 25, 821837. Intelligence Laboratory, AI Memo No. 1602.
Ancona, N. (1999). Classication properties of support vector machines for regression. Penrose, K. W., Nelson, A. G., & Fisher, A. G. (1985). FACSM, Human Performance
Technical report, R.I.-IESI/CNR-Nr.,02/99. Research Center, Brigham Young University, Provo, Utah 84602 as listed in
Asuncion, A., & Newman, D. J. (2007). UCI machine learning repository. Irvine, CA: Medicine and Science in Sports and Exercise, 17 (2), 189.
University of California, School of Information and Computer Science. Pontil, M., & Verri, A. (1997). Properties of support vector machines. Technical Report,
Barakat, N. H., & Bradley, A. P. (2006). Rule extraction from support vector Massachusetts Institute of Technology, Boston MA, USA.
machines: Measuring the explanation capability using the area under the ROC Scholkopf, B., & Smola, A. J. (2002). Learning with kernals. Massachusetts, USA: MIT
curve. In The 18th international conference on pattern recognition (ICPR06), Press.
Hong Kong. Siri, W. E. (1956). Gross composition of the body. In J. H. Lawrence & C. A. Tobias
Barakat, N. H., & Diederich, J. (2004). Learning-based rule-extraction from support (Eds.), Advances in biological and medical physics IV. New York, USA: Academic
vector machines. In Proceedings of the 14th international conference on computer Press Inc.
theory and applications ICCTA2004, Alexandria, Egypt. Smola, A. J., & Scholkopf, B. (1998). A tutorial on support vector regression.
Barakat, N. H., & Bradley, A. P. (2007). Rule extraction from support vector NEUROCOLT2 Technical report series, NC2-TR-1998-030.
machines: A sequential covering approach. IEEE Transactions on Knowledge and Smola, A. J., & Schlkopf, B. (2004). A tutorial on support vector regression. Statistics
Data Engineering, 19(6), 729741. and Computing, 14(3), 199222.
Barakat, N. H., & Diederich, J. (2005). Eclectic rule-extraction from support vector Takagi, T., & Sugeno, M. (1985). Fuzzy identication of systems and its applications
machines. International Journal of Computational Intelligence, 2(1), 5962. to modeling and control. IEEE Transactions on Systems Man and Cybernet, SMC-
Bennett, K. P., & Mangasarian, O. L. (1992). Robust linear programming 15(1), 116132.
discrimination of two linearly inseparable sets. Optimization Methods and Vapnik, V. N. (1995). The nature of statistical learning theory. New York, NY, USA:
Software, 1, 2334. Springer-Verlag, New York Inc..
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classication and Werbos, P. (1974). Beyond regression: New tools for prediction and analysis in the
regression trees. Belmont, California: Wadsworth International Group. behavioral sciences. Ph.D. thesis, Harvard University.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338353.
Data Mining and Knowledge Discovery, 2, 143. Zhang, Y., Su, H., Jia, T., & Chu, J. (2005). Rule extraction from trained support vector
Chaves, Ad. C. F., Vellasco, M. M. B. R. & Tanscheit, R. (2005). Fuzzy rule machines. Lecture notes in computer science (3518). Berlin, Heidelberg: Springer.
extraction from support vector machines. In Fifth international conference pp. 6170.

Expert Systems With Applications: M.A.H. Farquad, V. Ravi, S. Bapi Raju

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Expert Systems With Applications: M.A.H. Farquad, V. Ravi, S. Bapi Raju

Uploaded by

Copyright:

Available Formats

Expert Systems with Applications 37 (2010) 55775589

Contents lists available at ScienceDirect

Expert Systems with Applications

1. Introduction the hyperplane with small norm while simultaneously minimizing

The assumption in (2) is that such a function f actually exists

(1989), which is as follows: X

3.2. Phase 2: rule generation

Case (1) CART / ANFIS / DENFIS

Test / Validation Rules / Tree Prediction Error (RMSE)

Case(2) CART / ANFIS / DENFIS

Test / Validation Rules / Tree Prediction Error (RMSE)

Data set 100% Total instances 20% Validation 80% Tenfold CV

Data set Linear Polynomial RBF Sigmoid

# Attribute name Attribute type # Attribute name Attribute type

This dataset is obtained from UCI machine learning repository

5.4. Forest res dataset 5.5. Pollution dataset

Table 11 (McDonald & Schwing, 1973). Table 8 presents the attributes

(continued on next page)

(continued on next page)

GMF(a, b) indicates Gaussian membership function with mean a and variance b.

(continued on next page)

You might also like