
IMPROVING CLASSIFICATION WITH COST-SENSITIVE APPROACH FOR DISTRIBUTED DATABASES

Maria Muntean, Honoriu Vălean, Ioan Ileană, Corina Rotar


1 Decembrie 1918 University of Alba Iulia, Romania
Technical University of Cluj Napoca, Romania
mmuntean@uab.ro, Honoriu.Valean@aut.utcluj.ro

ABSTRACT
A problem arises in data mining when classifying unbalanced datasets using Support Vector Machines. Because of the uneven distribution and the soft margin of the classifier, the algorithm tries to improve the general accuracy of classifying a dataset, and in this process it might misclassify many instances of weakly represented classes, mistaking them for outliers in the dataset and thus ignoring them. This paper introduces the Enhancer, a new algorithm that improves Cost-sensitive classification for Support Vector Machines by multiplying, in the training step, the instances of the underrepresented classes. We have discovered that by oversampling the instances of the class of interest, we help the Support Vector Machine algorithm to overcome the soft margin. As an effect, it classifies future instances of this class of interest better. Experimentally we have found that our algorithm performs well on distributed databases.

Keywords: classification; distributed databases; Cost-Sensitive Classifier; Support Vector Machine.

1 INTRODUCTION

Most real-world data are imbalanced in terms of the proportion of samples available for each class, which can cause problems such as overfitting or little relevance. The Support Vector Machine (SVM), a classification technique based on statistical learning theory, has been applied with great success to many challenging non-linear classification problems and has also been applied successfully to imbalanced databases.

2 SUPPORT VECTOR MACHINES

The Support Vector Machine (SVM), proposed by Vapnik and his colleagues in the 1990s [1], is a machine learning method based on Statistical Learning Theory. It is widely used in regression, pattern recognition and probability density estimation due to its simple structure and excellent learning performance. Joachims validated its outstanding performance in the area of text categorization in 1998 [2]. SVM can also overcome the overfitting and underfitting problems [3], [4], and it has been used for imbalanced data classification [5], [6].

The SVM technique is based on two-class classification; there are methods that extend it to more than two classes. Looking at the two-dimensional problem, we actually want to find a line that “best” separates the points in the positive class from the points in the negative class. The hyperplane is characterized by the decision function f(x) = sgn(⟨w, Φ(x)⟩ + b), where “w” is the weight vector, orthogonal to the hyperplane, “b” is a scalar that represents the offset (bias) of the hyperplane, “x” is the current sample tested, “Φ(x)” is a function that transforms the input data into a higher-dimensional feature space, “⟨ , ⟩” denotes the dot product and sgn is the signum function. If “w” has unit length, then ⟨w, Φ(x)⟩ is the length of “Φ(x)” along the direction of “w”.

To construct the SVM classifier one has to minimize the norm of the weight vector “w” (where ||w|| represents the Euclidean norm) under the constraint that the training patterns of each class reside on opposite sides of the separating surface. The training part of the algorithm needs to find the normal vector “w” that leads to the largest margin of the hyperplane. Since the input vectors enter the dual only in the form of dot products, the algorithm can be generalized to non-linear classification by mapping the input data into a higher-dimensional feature space via an a priori chosen non-linear mapping function “Φ” and constructing a separating hyperplane with the maximum margin.

In solving the quadratic optimization problem of the linear SVM (i.e. when searching for a linear SVM in the new higher-dimensional space), the training tuples appear only in the form of dot
products, ⟨Φ(xi), Φ(xj)⟩, where “Φ(x)” is simply the nonlinear mapping function applied to transform the training tuples. The expensive calculation of the dot products ⟨Φ(xi), Φ(xj)⟩ in a high-dimensional space can be avoided by introducing a kernel function “K”:

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \qquad (1)$$

The kernel trick can be applied since all feature vectors only occur in dot products. The weight vector then becomes an expression in the feature space, and therefore “Φ” is the function through which we represent the input vector in the new space. Thus the decision function takes the following form:

$$f(x) = \mathrm{sgn}\Big( \sum_{i \in \Re} y_i \alpha_i K(x, x_i) + b \Big) \qquad (2)$$

where yi are the class labels of the training samples, αi represent the Lagrange multipliers and the samples xi for which αi > 0 are called Support Vectors [7].

The idea of the kernel is to compute the norm of the difference between two vectors in a higher-dimensional space without representing those vectors in the new space. Regarding the final SVM formulation, the free parameters of SVMs to be determined within model selection are given by the regularization parameter c and the kernel, together with additional parameters of the respective kernel function.

We chose to perform the experiments with the Polynomial Kernel function, which is given by:

$$K(x_i, x_j) = (m \cdot x_i \cdot x_j + n)^{e} \qquad (3)$$

where e represents the exponent of the Polynomial Kernel function, and m, n are the coefficients of this expression.

Training an SVM requires the solution of a very large quadratic programming (QP) optimization problem. The Sequential Minimal Optimization (SMO) algorithm, proposed in the literature [7] for training SVMs, breaks this large QP problem into a series of the smallest possible QP problems. These small QP problems are solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop. The amount of memory required for SMO is linear in the training set size, which allows SMO to handle very large training sets. Because matrix computation is avoided, SMO scales somewhere between linear and quadratic in the training set size for various test problems, while the standard chunking SVM algorithm scales somewhere between linear and cubic in the training set size. We used the SMO method in our experiments in order to train the SVM faster.

SVMs have a great disadvantage in imbalanced classification: because of the uneven distribution and the soft margin of the SVM, the algorithm tries to improve the general accuracy of classifying a dataset, and in this process it might misclassify many instances of weakly represented classes. Various methods enrich the SVM algorithm to obtain efficient techniques for solving classification problems. An example of such a method is the Cost-sensitive metaclassifier [9], [10], [11], which targets the imbalanced learning problem by using different cost matrices that describe the costs for misclassifying any particular instance.

3 COST-SENSITIVE APPROACH

Fundamental to the Cost-sensitive learning methodology is the concept of the cost matrix. This approach takes the misclassification cost into account and aims to minimize it. Instead of creating balanced data distributions through different sampling strategies, Cost-sensitive learning targets the imbalanced learning problem by using different cost matrices that describe the costs for misclassifying any particular instance.

A very useful tool, the Confusion Matrix for two classes, is shown in Table 1.

Table 1: confusion matrix for a two class problem

                              Predicted Class
                              Class = 1    Class = 0
Actual Class    Class = 1     TP           FN
                Class = 0     FP           TN

The true positives (TP) and true negatives (TN) are correct classifications. A false positive (FP) occurs when the outcome is incorrectly predicted as 1 (or positive) when it is actually 0 (negative). A false negative (FN) occurs when the outcome is incorrectly predicted as negative when it is actually positive.

In addition, the accuracy measure may be defined. It represents the ratio between the correctly classified instances and the sum of all instances classified, both correct and incorrect ones:

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

More precisely, this measure gives equal importance to all the misclassified data (false negatives and false positives are equally significant). Cost-sensitive classification strives to minimize the total cost of the errors made by misclassification, rather than the total amount of misclassified data.

This paper introduces an algorithm named Enhancer, aimed at increasing the TP of underrepresented classes of datasets, using Cost-sensitive classification and SVM.

4 DESCRIPTION OF THE ENHANCER

Experimentally we have found out that the features that help in raising the TP of a class are the cost matrix and the number of instances that the class has. The last one can be modified by multiplying the
number of instances of that class that the dataset initially has. The algorithm can be applied to classes that are not well represented in imbalanced data sets. Imbalanced data sets occur in two-class domains when one class contains a large number of examples, while the other class contains only a few examples.

The Enhancer algorithm is detailed in the following pseudo code:
1. Read and validate input;
2. For all the classes that are not well represented:
   BEGIN
      Evaluate class with no instances added
      Evaluate class at Max multiplication rate
      Evaluate the class at Half multiplication
      REPEAT
         Flag = False
         Evaluate the intervals (beginning, middle), (middle, end)
         If the end condition is met
            Flag = True
         If the first interval has better results we should use this, otherwise the other
         Find the class evaluation after multiplying class instances middle times
      UNTIL Flag = True
   END
3. Multiply all the classes with the best factor obtained;
4. Evaluate dataset.

While reading and validating the input we collected from the command line the parameters that were used by this classifier, together with the classifier parameters that were usually transmitted to the program. The input parameters needed were the number of the class that needs to have its TP improved and the ε that is the maximum allowed difference between the evaluation of the two intervals (beginning, middle) and (middle, end). Our classifier also had as input parameters the multiplicands that the optimization algorithm had used. Two kinds of evaluations that also accept class multiplication are available:
• Evaluating a dataset with only the instances of one class being multiplied, and keeping the others at their initial value. This kind of operation was especially useful when we tried to find out what the best multiplicand for a certain class was.
• Evaluation of a dataset where the instances of all classes could undergo a multiplication process. The multiplication of the classes could be any real number greater than or equal to 1. If the multiplicand was 1, then the class remained with the initial number of instances.

It is important to avoid performing the evaluation on data that the algorithm used to train the model, because otherwise the algorithm is going to overfit on this particular dataset, and when new data is introduced to be tested, the results are going to be disastrous. The evaluation method used is 10 fold cross validation. The dataset is randomized and stratified using an integer seed that takes values in the range 1-10. The algorithm performs the evaluation of the data set 10 times, each time with a different test set (Fig. 1).

Figure 1: 10 fold Cross Validation

So, after performing the stratification, each time the data set was split into the training and test set, the Enhancer took the training set and applied classMultiply() on it. In this way the instances that were going to be multiplied were not among the data that was going to test the result of the SMO model.

The performance of the algorithm is only due to the multiplied data, and there is no overfitting to this specific data set. The model was trained in order to be evaluated as accurately as possible by a general test set, and not only by the one used for testing.

The instances were multiplied using the properties of the Instances object in which they were stored, following this pseudo code:
1 aux ← all instances of class x from dataset
2 for i = 0 to max do
3    add (instance from aux to dataset)
4 randomize dataset

By performing this series of operations the number of instances of the desired class was multiplied by the desired amount, and at the same time we had a good distribution of instances inside the dataset, in order not to harm or benefit any of the classes in the new dataset.

In order to see what the best improvement is, we need to calculate an ending property of the algorithm. After some experiments the conclusion was that we must optimize the TP and at the same time keep the accuracy as high as possible. This can be translated into the following equation:

$$\varphi = \Delta TP_i + \Delta ACC = \max \qquad (5)$$

This means that we are trying all the time to maximize the TP of the classes and also the Accuracy. The only flaw in this equation appears when the Accuracy is medium (50%) and the TP of that certain class is really close to 0. If we manage to get the TP of the class as high as 80-90%, the loss in the accuracy, which is going to appear inevitably, is going to pass unnoticed by this function. That is why we needed to introduce the following constraint: ∆ACC > θ, where “θ” is the minimum allowed drop in the accuracy.
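As an illustration only, the multiplication step, the criterion of equation (5) with its constraint, and the interval search of the pseudo code could be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the names `multiply_class`, `phi` and `search_multiplier` are hypothetical, the search range is assumed to be [1, max_factor], each half-interval is assumed to be judged by the score at its midpoint, and θ is assumed to be negative when a small accuracy drop is tolerated.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Instance = Tuple[Sequence[float], int]  # (feature vector, class label)


def multiply_class(train: List[Instance], label: int, factor: float,
                   seed: int = 1) -> List[Instance]:
    """Return a shuffled copy of `train` in which class `label` has about
    `factor` times its original number of instances (factor >= 1)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rng = random.Random(seed)
    members = [inst for inst in train if inst[1] == label]
    extra = round((factor - 1) * len(members))  # number of duplicates to add
    # Whole extra copies first, then a random subset for the fractional part.
    full, part = divmod(extra, len(members)) if members else (0, 0)
    out = list(train) + members * full + rng.sample(members, part)
    rng.shuffle(out)  # corresponds to "randomize dataset" in the pseudo code
    return out


def phi(tp_base: float, acc_base: float, tp_new: float, acc_new: float,
        theta: float) -> Optional[float]:
    """Equation (5) under the constraint dACC > theta.

    Returns dTP + dACC, or None when the accuracy change violates the
    constraint (ASSUMPTION: theta is negative, e.g. -0.05)."""
    d_tp, d_acc = tp_new - tp_base, acc_new - acc_base
    return d_tp + d_acc if d_acc > theta else None


def search_multiplier(score: Callable[[float], float], max_factor: float,
                      eps: float = 1e-2, mu: float = 0.0) -> Tuple[float, float]:
    """Interval-halving search for the factor in [1, max_factor] with the
    best score. ASSUMPTION: a half-interval is scored at its midpoint."""
    lo, hi = 1.0, float(max_factor)
    best = (lo, score(lo))
    while hi - lo > eps:                       # stop when the interval is too narrow
        mid = (lo + hi) / 2
        left_f, right_f = (lo + mid) / 2, (mid + hi) / 2
        left, right = score(left_f), score(right_f)
        best = max(best, (left_f, left), (right_f, right), key=lambda fs: fs[1])
        if abs(left - right) <= mu:            # stop when the halves are indistinguishable
            break
        lo, hi = (lo, mid) if left >= right else (mid, hi)
    return best
```

In a cross-validation setting, `multiply_class` would be applied to each training fold only, and `score` would wrap a train-and-evaluate run that returns `phi(...)` for a given factor (mapping a `None` result to a very low score).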
The Enhancer algorithm described in the pseudo code used a Divide et Impera technique that searched the space from 0 multiplication to max multiplication for the optimal multiplier for the class. The algorithm stops its search under two circumstances:
• The granulation is getting too thin, i.e., the difference between the beginning and the end of an interval is very small (under a set epsilon). This constraint is set in order not to let the algorithm wander around searching for solutions that vary from one another by a very small number (< 10^-2).
• The modulus of the difference between the ∆TPi + ∆ACC values of the first and the second interval is no longer bigger than a known value (it should be bigger than this value for the search to continue). This value is considered to be the deviation of the Accuracy added to the deviation of the TP of that class:

$$\mu = \sigma_{ACC} + \sigma_{TP} \qquad (6)$$

After finding the best multiplicand for the class that we were trying to optimize, we constructed a training set that contained the instances of each class multiplied by the optimal multiplicand found at the previous step. The experiments were then performed on the multiplicands of each of the other weakly represented classes, in order to raise the accuracy and the TP of the other classes while keeping the TP of the class of interest at least at the same value that the algorithm retrieved.

5 EXPERIMENTAL RESULTS

5.1 Database description

For the evaluation, we used the Dermatology database, obtained from the online UCI Machine Learning Repository [12]. A brief description of the dataset is presented in Table 2 and Fig. 2. This database has accuracy levels which are not very high, so improvement is possible.

Table 2: the database used in the experiments

Database       No of attributes    No of instances    Attributes types
Dermatology    34+1                358                Num, Nom

Figure 2: the structure of the database

The class distribution for the database is illustrated below (Fig. 3):

Figure 3: the database class distribution

5.2 Database replication

Our distributed database system included three servers situated on three geographically remote sites. Each server had a client (Fig. 4). Two clients had insert, update and delete rights, and the third one could only select information. The algorithms proposed for the experiments ran on the last client and used the data from the aggregated database.
[Diagram labels: CLIENT 1 and CLIENT 2 (Insert, Update, Delete); SERVER 1 (Dermatology1); SERVER 2 (Dermatology2); AGGREGATION SERVER (Dermatology); CLIENT 3 (Knowledge discovery); Replication links between the servers]
Figure 4: database replication

5.3 Complexity parameter of SVM influence

First, we analyzed how the parameters that the Polynomial Kernel function receives influence the accuracy of classification, in order to choose the values for the parameters that give the highest rates for the accuracy. The two parameters we focused on were: c (the complexity parameter of the SMO classifier) and e (the exponent of the Polynomial Kernel function). In the evaluations performed, the value of the complexity and then the value of the exponent were changed (Fig. 5 and Fig. 6).

[Plot: Accuracy variation with respect to c; x-axis c from 1 to 6.6, y-axis Acc from 0.953 to 0.959]
Figure 5: the variation of accuracy with respect to c

[Plot: Time variation with respect to c; x-axis c from 1 to 6.6, y-axis Time from 0.6 to 0.85]
Figure 6: the time needed in order to evaluate the database with respect to c

5.4 Polynomial kernel degree’s influence

[Plot: Accuracy variation with respect to e; x-axis e from 1 to 5, y-axis Acc from 0.95 to 0.975]
Figure 7: the variation of accuracy with respect to e

[Plot: Time variation with respect to e; x-axis e from 1 to 5, y-axis Time from 0 to 1.2]
Figure 8: the time needed in order to evaluate the database with respect to e

After performing the above experiments, we decided to use for evaluation the following values for the input SMO parameters: c=1.0 and e=4.0, facilitating an increase in accuracy. Another conclusion that could be drawn was that the time needed for training the classifier did not depend on the complexity parameter and varied between 0.8 and 1.0 seconds.

In order to improve the classification of the weakly represented classes in the dataset, two approaches were tested:
• Cost-sensitive classification
• Multiplication of the instances of weakly represented classes.

5.5 Cost-Sensitive classification

In the case of the Cost-sensitive classification, the main aim was to find a good cost matrix, to increase the cost of wrongly classified instances that belong to the weakly represented classes. In order to perform this, we used the Cost-Sensitive Classifier [13] on the Dermatology database as follows:
• We set as cost matrix the default cost matrix (0 on the main diagonal and 1 elsewhere);
• We evaluated the datasets;
• We “fixed” the cost matrix, to increase the cost of the wrongly classified instances;
• We reevaluated the datasets and redid the
previous step.

After performing the experiments we observed that the accuracy and the TP of the classes 0, 2, 4 and 5 remained constant with respect to Cost Matrix change (Fig. 9).

Figure 9: Class 0, Class 1, Class 4 and Class 5 TP variation of Dermatology dataset with respect to Cost Matrix change

The TP values of Classes 1 and 3 evolved almost complementary to one another (when one TP rises, the other falls and the other way around) (Fig. 10 and Fig. 11). We mention that Class 1 has 60 instances, which represent 17% of the full dataset, and Class 3 has 48 instances (13% of the total amount).

Figure 10: Class 1 TP variation of Dermatology dataset with respect to Cost Matrix change

Figure 11: Class 3 TP variation of Dermatology dataset with respect to Cost Matrix change

After the experiments, we found that most of the time, when we were dealing with the classification of unbalanced data sets and trying to improve the accuracy with which the classes were predicted, two main things happened:
• The correct classifications of certain classes increased, and at the same time the general accuracy rose. In this case, the growth of the TP of that class did not influence the accuracy of other classes, so more instances were correctly classified, hence the increase in the accuracy.
• The correct classifications of certain classes increased and the general accuracy remained the same. In this case, the rise of the TP of a class affected the evaluation of other classes, which became sloppier.

Cost-sensitive classification proved to be a good method of improving the TP of the unbalanced classes in the dataset that were weakly connected with one another.

When the instances of certain classes were not correctly identified, this could be because of the soft margin of the SVMs, which interpreted the instances of the weakly represented classes as just a few errors in the classification of the larger classes.

5.6 Multiplying underrepresented classes

In order to improve the classification of one class of interest from the training dataset using SVM, we needed to improve its chances of being recognized. For the SVM to recognize classes, it needed support vectors from those classes, and that is why we increased the number of instances of the weakly represented classes while keeping the number of instances of the other classes constant. First, we tried to find out what the best multiplier was for the several classes and how much it affected the rest of the evaluation.

This algorithm helps Classes number 1 and number 3. Although it started from a quite high value, the TP of the class stabilized at around 0.9 (Fig. 12 and Fig. 13). In the end, the accuracy of these classes reached 0.98 and 0.96 respectively, a better value than the one reached using only the Cost-sensitive classification.

Figure 12: The evolution of the TP of Class 1 and the general accuracy with respect to the number of instances of Class 1
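The per-class TP values and the general accuracy discussed in this section can be computed directly from the actual and predicted labels of a test set. The sketch below is only an illustration under the assumption that the reported "TP of a class" is the true positive rate of that class (the values between 0 and 1 in Table 4 suggest this); the function name `tp_rates_and_accuracy` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def tp_rates_and_accuracy(actual: Sequence[Hashable],
                          predicted: Sequence[Hashable]
                          ) -> Tuple[Dict[Hashable, float], float]:
    """Return ({class: TP rate}, accuracy) for a multi-class evaluation.

    The TP rate of a class is the fraction of its instances that were
    predicted correctly; accuracy is the fraction of all instances that
    were predicted correctly, as in equation (4)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    totals = Counter(actual)                                   # instances per class
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    rates = {cls: hits[cls] / n for cls, n in totals.items()}
    accuracy = sum(hits.values()) / len(actual)
    return rates, accuracy


# Small worked example with three classes.
rates, acc = tp_rates_and_accuracy([0, 0, 1, 1, 1, 3], [0, 0, 1, 0, 1, 1])
```

In this example class 3 is never recognized, so its TP rate is 0 even though two thirds of all instances are classified correctly, which is the situation the accuracy measure alone hides.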
Figure 13: The evolution of the TP of Class 3 and the general accuracy with respect to the number of instances of Class 3

So, the Enhancer multiplied the information accordingly, so as to maximize ∆TPi + ∆ACC while the accuracy does not drop by more than a set ε. We set ε to 0.05 (5%) and we concluded that with the new algorithm, the TP of certain classes of interest was increased significantly while keeping the general accuracy in the desired range (Fig. 14).

We observed that the last classifier performs the best. The Enhancer algorithm could have pointed out even more accurately the instances that belong to the class of interest, but with the downside of pulling the general accuracy below the preset threshold ε.

Figure 14: Comparison between the TP of the classes resulting from the Cost Sensitive SMO Evaluation and from the Enhancer Evaluation

Table 3 shows the Cost Matrix that gave the best results in the Cost-sensitive experiments. The rows are read as “classified as”, and columns as “actual class” (Table 3).

Table 3: Dermatology Cost Matrix

                Predicted class
                0.0    1.0      1.0    1.0    1.0    1.0
                1.0    0.0      1.0    1.0    1.0    1.0
Actual class    1.0    1.0      0.0    1.0    1.0    1.0
                1.0    315.0    1.0    0.0    1.0    1.0
                1.0    1.0      1.0    1.0    0.0    1.0
                1.0    1.0      1.0    1.0    1.0    0.0

We also noted that the Enhancer had the highest values for TP instances among all other metaclassifiers while the accuracy was kept at an acceptable level (Table 4).

Table 4: the results obtained by compared metaclassifiers on Dermatology database

Metaclassifier with SMO     Acc (%)    TP Cls0   TP Cls1   TP Cls2   TP Cls3   TP Cls4   TP Cls5
AdaBoost                    95.81      1.00      0.85      1.00      0.88      1.00      1.00
AttributeSelected           97.49      1.00      0.95      0.97      0.90      1.00      1.00
CVParameterSelection        95.81      1.00      0.85      1.00      0.88      1.00      1.00
Dagging                     94.69      1.00      0.92      1.00      0.89      1.00      0.55
Decorate                    96.37      1.00      0.88      1.00      0.88      1.00      1.00
EnsembleSelection           94.41      0.97      0.93      1.00      0.85      0.92      0.90
FilteredClassifier          96.09      1.00      0.87      1.00      0.88      1.00      1.00
MetaCost                    68.44      0.99      0.00      1.00      0.00      0.96      0.90
MultiClassClassifier        96.37      1.00      0.90      1.00      0.88      1.00      0.95
MultiBoostAB                96.09      1.00      0.87      1.00      0.88      1.00      1.00
CostSensitiveClassifier     95.53      1.00      0.85      1.00      0.85      1.00      1.00
Enhancer                    95.50      1.00      0.90      1.00      0.90      1.00      1.00

6 CONCLUSIONS

This paper is focused on providing the Enhancer, a viable algorithm for improving the SVM classification of unbalanced datasets.
Most of the time, in unbalanced datasets, classifiers tend to classify very accurately the instances belonging to the best represented classes and do a sloppy job with the rest. By doing this, most classifiers achieve an above-average classification rate simply because in practice we are going to encounter more instances of the well represented class. On the other hand, most of the time failing to classify one of the other classes correctly produces more damage than the other way around.

This solution is especially important when it is far more important to classify the instances of a class correctly, and when classifying some of the other instances as belonging to this class in the process does not produce any harm. For instance, it is better to send people suspected of different diseases to further investigations than to send ill people home and tell them they have nothing to worry about.

In order to overcome this problem we have developed the new classifying algorithm, which can classify the instances of a class of interest better than the combination of the Cost Sensitive and SVM algorithms. All of this happens while keeping the accuracy at an acceptable level. The algorithm improves the classification of the weakly represented class in the dataset and it can be used for solving real medical diagnosis problems. We have discovered that by oversampling the instances of the class of interest, we help the SVM algorithm to overcome the soft margin. As an effect, it classifies future instances of this class of interest better.

7 REFERENCES

[1] V. N. Vapnik: The nature of statistical learning theory, New York: Springer-Verlag (2000).
[2] T. Joachims: Text categorization with Support Vector Machines: Learning with many relevant features, Proceedings of the European Conference on Machine Learning, Berlin: Springer (1998).
[3] M. Hong, G. Yanchun, W. Yujie, and L. Xiaoying: Study on classification method based on Support Vector Machine, 2009 First International Workshop on Education Technology and Computer Science, Wuhan, China, pp. 369-373, March 7-8 (2009).
[4] X. Duan, G. Shan, and Q. Zhang: Design of a two layers Support Vector Machine for classification, 2009 Second International Conference on Information and Computing Science, Manchester, UK, pp. 247-250, May 21-22 (2009).
[5] Y. Li, X. Danrui, and D. Zhe: A new method of Support Vector Machine for class imbalance problem, 2009 International Joint Conference on Computational Sciences and Optimization, Hainan Island, China, pp. 904-907, April 24-26 (2009).
[6] Z. Xinfeng, X. Xiaozhao, C. Yiheng, and L. Yaowei: A weighted hyper-sphere SVM, 2009 Fifth International Conference on Natural Computation, Tianjin, China, pp. 574-577, August 14-16 (2009).
[7] J. Platt: Fast Training of Support Vector Machines using Sequential Minimal Optimization, Advances in Kernel Methods – Support Vector Learning, B. Scholkopf, C. Burges, A. Smola, eds., MIT Press (1998).
[8] D. I. Morariu, L. N. Vintan: A Better Correlation of the SVM Kernel’s Parameters, 5th RoEduNet IEEE International Conference, Sibiu, Romania (2006).
[9] H. He and E. A. Garcia: Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, September (2009).
[10] Y. Dai, H. Chen, and T. Peng: Cost-sensitive Support Vector Machine based on weighted attribute, 2009 International Forum on Information Technology and Applications, Chengdu, China, pp. 690-692, May 15-17 (2009).
[11] R. Santos-Rodriguez, D. Garcia-Garcia, and J. Cid-Sueiro: Cost-sensitive classification based on Bregman divergences for medical diagnosis, 2009 International Conference on Machine Learning and Applications, Florida, USA, pp. 551-556, December 13-15 (2009).
[12] University of California Irvine: UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/.
[13] E. Frank, L. Trigg, G. Holmes, I. H. Witten: Technical Note: Naive Bayes for Regression, Machine Learning, vol. 41, no. 1, pp. 5-25 (2000).
