
JAISCR, 2016, Vol. 6, No. 3, pp. 173 – 188
DOI: 10.1515/jaiscr-2016-0013

SELF-CONFIGURING HYBRID EVOLUTIONARY ALGORITHM FOR FUZZY IMBALANCED CLASSIFICATION WITH ADAPTIVE INSTANCE SELECTION

Vladimir Stanovov, Eugene Semenkin, Olga Semenkina

Institute of Computer Science and Telecommunications, Siberian State Aerospace University, Krasnoyarskii rabochii ave. 31, 660014, Krasnoyarsk, Russian Federation

Abstract

A novel approach for instance selection in classification problems is presented. This adaptive instance selection is designed to simultaneously decrease the amount of computational resources required and increase the classification quality achieved. The approach generates new training samples during the evolutionary process and changes the training set for the algorithm. The instance selection is guided by means of changing probabilities, so that the algorithm concentrates on problematic examples which are difficult to classify. The hybrid fuzzy classification algorithm with a self-configuration procedure is used as a problem solver. The classification quality is tested upon 9 problem data sets from the KEEL repository. A special balancing strategy is used in the instance selection approach to improve the classification quality on imbalanced datasets. The results prove the usefulness of the proposed approach as compared with other classification methods.
Keywords: Fuzzy classification, instance selection, genetic fuzzy system, self-configuration

1 Introduction

Today the area of machine learning (ML) techniques is quickly developing due to recent advances in computer and internet technologies. These advances led to the need to process, analyse and understand massive amounts of data. Modern machine learning methods often use evolutionary algorithms (EAs) as design techniques to adjust the weights or define the structure of the solution. The classical evolutionary algorithm, the genetic algorithm (GA), is a powerful method, although there is a set of specialized approaches for various problems, such as neural network structure design and weight adjustment, neuro-fuzzy inference system design, fuzzy rule base design, etc. The specialized evolutionary algorithms applied to solve machine learning problems are called genetics-based machine learning (GBML) methods. A typical application of such methods is solving classification problems [1-3].

In this paper we focus on fuzzy classification methods, i.e. fuzzy rule bases. There is a set of GBML approaches used to solve the problem of fuzzy rule base design for classification. These methods differ in several ways: some of them optimize the positions and shapes of fuzzy terms, while others optimize the rules and their combinations. The so-called Pittsburgh approach, in which an individual in the EA is a rule base, is used more often than the Michigan approach, in which an individual is a single rule. However, the methods which combine both the Michigan and Pittsburgh approaches seem to be the most promising.


In the case of solving real-world classification problems, researchers may face different issues related to the data available. These issues are: too large or too small an amount of data available, large numbers of classes, irrelevant features and instances, errors in data measurements and missing data, imbalances in the amount of data per class and so on. These issues may cause serious difficulties in learning the actual background of the real-world process and representing it in a classification model. In this paper we focus on two main problems: a large amount of data and an imbalance in the number of instances.

The problem of a large amount of data can be solved with data reduction (DR) methods. There are several groups of methods, including training set selection (TSS), active learning, instance selection (IS) and feature selection. In our work we concentrate on instance selection methods, which are used for supervised learning problems, such as classification. Instance selection and feature selection methods are divided into two groups: filter and wrapper approaches.

Instance selection is strongly connected to the problem of irrelevant instances in the training sample. Removing instances from the training set does not necessarily lead to the loss of information, especially in the case of a large number of training examples. So, the instance selection method can be not only a way to decrease the computational complexity of an algorithm, but also a method to increase the overall accuracy of the resulting model. The idea behind this study is the development of a method for selecting the instances in such a way as to increase the learning capabilities of a GBML algorithm.

As a GBML method for our experiments, we used our modification of the hybrid fuzzy evolutionary algorithm, originally proposed by the H. Ishibuchi group. Our modifications include self-configuration, parameter tuning and some adjustments for imbalanced datasets. Unlike our previous work, in this paper we perform instance selection testing for different parameters and also consider the influence of instance selection on various classification measures. Special attention is paid to the class imbalance problem.

The rest of the paper is organized as follows: Section 2 describes the classification method, Section 3 contains the description of the instance selection approach, Section 4 contains the experimental results, and Section 5 concludes the paper.

2 Hybrid fuzzy GBML algorithm

The original hybrid fuzzy evolutionary algorithm was introduced by H. Ishibuchi et al. in [4]. However, as we developed our method from scratch, we provide a short description of our implementation.

The main loop of the evolutionary algorithm implements the Pittsburgh approach, i.e. each individual is a rule base. The number of rules in the rule base is not fixed and may change during the evolutionary process for every individual.

There were four different fuzzy partitions used, namely partitions into 2, 3, 4 and 5 fuzzy terms, as well as the "don't care" condition, resulting in 15 fuzzy sets A0 - A14. The fuzzy sets are presented in Figure 1.

Figure 1. Fuzzy partitions

Each rule was represented as a string of integer numbers from 0 to 14. The Pittsburgh-type algorithm consists of the following steps:

1 Initialization using instances from the training set

2 Fitness calculation

3 Selection, Crossover, Mutation

4 Apply the Michigan-style part to each individual

5 If stopping criteria are not satisfied, go to step 2.


The Michigan part contains the following steps:

1 Define each rule in a rule base as an individual and calculate its fitness

2 Remove or add new rules to the rule base with a genetic or heuristic approach

3 Return the modified rule base to the population.

Let us describe each step in detail. The initialization procedure uses instances from the training set to generate new rules. This step is important as in the case of a large number of features it is difficult to randomly generate a rule which would describe at least one instance correctly. To generate a rule, a random instance is taken from the sample, and the membership values µ are calculated for each variable for all fuzzy sets. After this, the probability of a fuzzy set being selected is determined as

P(A_j) = \frac{\mu_{A_j}(x_{pi})}{\sum_{k=1}^{14} \mu_{A_k}(x_{pi})}.

The same procedure is repeated for all variables. Next, for every variable the fuzzy set number is changed to the "Don't care" condition with 0.9 probability. After generating a rule, the confidence value is calculated using the sample available:

Conf(A_q \rightarrow \text{Class } k) = \frac{\sum_{x_p \in \text{Class } k} \mu_{A_q}(x_p)}{\sum_{p=1}^{m} \mu_{A_q}(x_p)}.

If the confidence value is larger than 0.5, then the fuzzy rule is added into the rule base.
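A minimal sketch of this rule generation step is given below. It assumes that the inputs are normalized to [0, 1], that index 0 denotes the "Don't care" condition, and that indices 1-14 enumerate the triangular terms of the 2-, 3-, 4- and 5-term partitions of Figure 1; these numbering and membership-shape conventions, as well as all identifiers, are illustrative assumptions rather than the authors' actual code.

```cpp
// Illustrative sketch of rule antecedent generation (not the authors' code).
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Triangular membership of fuzzy set a (1..14) at a normalized value x in [0, 1];
// index 0 is treated as "Don't care" and always returns 1.
double membership(int a, double x) {
    if (a == 0) return 1.0;
    static const int terms[] = {2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}; // partition size
    static const int pos[]   = {0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4}; // term inside it
    int L = terms[a - 1], t = pos[a - 1];
    double center = double(t) / (L - 1), width = 1.0 / (L - 1);
    return std::max(0.0, 1.0 - std::fabs(x - center) / width);
}

// Build one antecedent from a randomly chosen training instance xp:
// pick fuzzy set j for each variable with probability mu_j(xp) / sum_k mu_k(xp),
// then replace it by "Don't care" with probability 0.9.
std::vector<int> generateAntecedent(const std::vector<double>& xp, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<int> rule(xp.size(), 0);
    for (std::size_t v = 0; v < xp.size(); ++v) {
        std::vector<double> mu(15, 0.0);
        for (int a = 1; a <= 14; ++a) mu[a] = membership(a, xp[v]);
        std::discrete_distribution<int> pick(mu.begin(), mu.end());
        rule[v] = pick(rng);
        if (u(rng) < 0.9) rule[v] = 0;   // "Don't care" with probability 0.9
    }
    return rule;
}
```

The generated rule would then be kept only if its largest confidence value over the classes exceeds 0.5, as described above.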
The class number associated with a rule is not coded in the chromosome, and is determined heuristically using the confidence values. For this purpose, the confidence values are calculated for all classes for every rule, and the class number corresponding to the largest confidence value is set.

The rule weight [5] is also calculated using the confidence values:

CF_q = Conf(A_q \rightarrow \text{Class } C_q) - \sum_{k=1, k \neq C_q}^{M} Conf(A_q \rightarrow \text{Class } k).

The classification is performed by determining the winner-rule, i.e. the rule that has the largest \mu_{A_q}(x_p) \cdot CF_q value. The instance is classified into the class corresponding to the winner-rule.

We generated rules 20 times for each individual, and the maximum number of rules in the algorithm was limited to 40. If all the rules received confidence values lower than 0.5, i.e. the rule base was empty, then the generation procedure was repeated.

The fitness value for all individuals was calculated as a combination of three main criteria: the training sample error f1(i), the number of rules f2(i), and the total length of all rules f3(i). The training sample error was the error percentage, with weight coefficient w1 = 100; the two other criteria were used with weights w2 = 1 and w3 = 1.

For the selection step we applied three classical selection schemes, i.e. fitness proportional, rank and tournament selection with a tournament size of 2.

There was only one specialized crossover operator used, which combines two rule bases to produce one offspring. For this purpose, the number of rules for the offspring is determined as a random number from 1 to |S1|+|S2|, where |Si| is the number of rules of individual i. If |S1|+|S2| exceeds the maximum number of rules (i.e. 40 in our computational experiments), then the number of rules is set equal to this maximum number. After this, the rule base is filled with new rules in a random way from a general rule pool created from the parents' rules.

The mutation step is similar to the one used in GAs. The probability of a gene (fuzzy set) being changed is defined as 1/(n·|S1|) for average mutation, 3/(n·|S1|) for strong mutation and 1/(3·n·|S1|) for weak mutation, where n is the number of variables. Each of the fuzzy sets in a rule base could mutate, including "Don't care" conditions. If the confidence value of a fuzzy rule after the use of the mutation operator becomes lower than 0.5, the rule is excluded from the rule base. If there are no rules left after mutation, then several rules are generated using the same procedure as at the initialization step.
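Putting together the confidence, class-assignment, rule-weight and winner-rule definitions given above, the inference step can be sketched as follows. The sketch works on precomputed compatibility grades of a rule antecedent with the instances (how µ_Aq(x_p) is combined from the per-variable memberships, e.g. by a product or a minimum, is not stated explicitly here and is left outside the sketch); all identifiers are illustrative.

```cpp
// Illustrative sketch of confidence, rule weight CF_q and winner-rule inference.
#include <cstddef>
#include <vector>

// Confidence of "A_q -> Class k": mu[i] is the compatibility grade of the rule
// antecedent with training instance i, labels[i] is the class of instance i.
double confidence(const std::vector<double>& mu, const std::vector<int>& labels, int k) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < mu.size(); ++i) {
        den += mu[i];
        if (labels[i] == k) num += mu[i];
    }
    return den > 0.0 ? num / den : 0.0;
}

// Heuristic consequent class (largest confidence) and rule weight
// CF_q = Conf(winner class) - sum of the confidences of the other classes.
void assignClassAndWeight(const std::vector<double>& mu, const std::vector<int>& labels,
                          int numClasses, int& consequent, double& cf) {
    std::vector<double> conf(numClasses);
    consequent = 0;
    for (int k = 0; k < numClasses; ++k) {
        conf[k] = confidence(mu, labels, k);
        if (conf[k] > conf[consequent]) consequent = k;
    }
    double others = 0.0;
    for (int k = 0; k < numClasses; ++k)
        if (k != consequent) others += conf[k];
    cf = conf[consequent] - others;
}

// Winner-rule classification: ruleMu[q] is the compatibility of rule q with the
// instance being classified; the rule with the largest ruleMu[q] * cf[q] wins.
// Returning -1 means no rule fired (an "Unknown" outcome).
int classify(const std::vector<double>& ruleMu, const std::vector<double>& cf,
             const std::vector<int>& consequent) {
    int winner = -1;
    double best = 0.0;
    for (std::size_t q = 0; q < ruleMu.size(); ++q) {
        double score = ruleMu[q] * cf[q];
        if (score > best) { best = score; winner = consequent[q]; }
    }
    return winner;
}
```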


The Michigan-style part is applied to every rule base after the mutation operator. At the first step, the fitness values are calculated for every rule. The rule fitness is equal to the number of instances correctly classified with this rule. If there are two identical rules, only one of them gets the non-zero fitness value. There were three types of the Michigan part: adding new rules, deleting rules and replacing rules, i.e. first deleting, then adding. In the case of deleting rules, the number of rules k to be deleted is defined as 5 · (k − 1) < |S| < 5 · k. The rules with the lowest fitness values are deleted first. In the case of adding new rules, the number of rules to be added is defined in the same way as for deleting, but if the number of rules exceeds the maximum number of rules then no rule is added.

New rules are added with the use of two different methods, heuristic and genetic ones. The heuristic method uses incorrectly classified instances to generate new rules using the same procedure as at the initialization step. The genetic method uses rules from the rule base to produce new rules with the tournament selection, uniform crossover and average mutation as in the genetic algorithm.
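For the deleting step, the inequality 5 · (k − 1) < |S| < 5 · k removes roughly one rule per five rules in the base; the sketch below uses k = ceil(|S| / 5), which is one possible reading of the boundary cases the inequality leaves open, and all names are illustrative.

```cpp
// Illustrative sketch of the deleting step of the Michigan part.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct ScoredRule {
    double fitness;   // number of instances correctly classified by this rule
    // ... antecedent, consequent class and rule weight would live here ...
};

void deleteWorstRules(std::vector<ScoredRule>& ruleBase) {
    if (ruleBase.empty()) return;
    // Number of rules to remove, assuming k = ceil(|S| / 5).
    std::size_t k = static_cast<std::size_t>(std::ceil(ruleBase.size() / 5.0));
    k = std::min(k, ruleBase.size());
    // Keep the best rules at the front, then drop the k lowest-fitness rules.
    std::sort(ruleBase.begin(), ruleBase.end(),
              [](const ScoredRule& a, const ScoredRule& b) { return a.fitness > b.fitness; });
    ruleBase.erase(ruleBase.end() - static_cast<std::ptrdiff_t>(k), ruleBase.end());
}
```

The adding step would generate the same number of new rules, heuristically or genetically as described above, skipping the addition if the limit of 40 rules would be exceeded.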
One of the modifications of the original algorithm was the self-configuration procedure, which was implemented for selection, mutation, the Michigan part and adding rules in the Michigan part. The self-configuration method was first introduced in [6, 7] and was successfully applied to a similar problem in [8]. The main idea of the method is assigning to genetic operators the probabilities of being used in the future based on their success in the past. The method uses the averaged fitness values of offspring generated by a certain operator to select a winner-operator at each generation. The winner's probability increases, while all other operators get their probabilities decreased.

We will describe this method in detail. Let z be the number of different operators of the i-th type. The starting probability values are set to p_i = 1/z. The success estimation for every type of operator is performed based on the averaged fitness values:

AvgFit_i = \frac{1}{n_i} \sum_{j=1}^{n_i} f_{ij}, \quad i = 1, 2, \ldots, z,

where n_i is the number of offspring formed with the i-th operator, f_{ij} is the fitness value of the j-th offspring produced with the i-th operator, and AvgFit_i is the average fitness of the solutions produced with the i-th operator. Then the probability of applying the operator whose AvgFit_i value is the highest among all the operators of this type is increased by (z · K − K)/(z · N), and the probabilities of applying other operators are decreased by K/(z · N), where N is the number of evolutionary algorithm generations and K is a constant value, usually equal to 0.5.
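A compact sketch of this probability update for one operator type is given below (illustrative names; the description above does not mention any clamping of the probabilities, so none is applied here).

```cpp
// Illustrative sketch of the self-configuration update for one operator type.
#include <algorithm>
#include <cstddef>
#include <vector>

// p[i]      : current probability of applying operator i (initially 1/z)
// avgFit[i] : AvgFit_i, the average fitness of offspring produced by operator i
// N         : total number of evolutionary algorithm generations
// K         : constant, usually 0.5
void updateOperatorProbabilities(std::vector<double>& p, const std::vector<double>& avgFit,
                                 int N, double K = 0.5) {
    if (p.empty() || avgFit.size() != p.size()) return;
    const std::size_t z = p.size();
    const std::size_t winner =
        std::max_element(avgFit.begin(), avgFit.end()) - avgFit.begin();
    for (std::size_t i = 0; i < z; ++i) {
        if (i == winner) p[i] += (z * K - K) / (z * N);   // winner gains
        else             p[i] -= K / (z * N);             // all others lose
    }
}
```

Since the winner gains (z − 1) · K/(z · N) and each of the remaining z − 1 operators loses K/(z · N), the probabilities still sum to one after every update.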
3 Adaptive instance selection algorithm for imbalanced classification problems

As we mentioned in the Introduction, data reduction does not necessarily lead to lower classification quality. In some cases excluding instances from the training set may lead to classification quality improvement, because the deleted instances were noisy, repeated many times and so on.

So, most of the instance selection methods are focused on both data reduction and classification quality improvement [9, 10]. However, these methods are mainly designed to create a subset only once, and not to train an accurate classifier.

The proposed instance selection algorithm is designed for learning algorithms which use a lot of iterations during the learning process. This method does not require any data preprocessing, it is not based on the k-NN method and does not require the distance to be calculated between instances. Instead, instances are selected based on classification quality, i.e. it implements the wrapper approach.

Let us describe the idea in detail. At the first stage, a subsample of the training sample is created, having a fixed size (set by the user). At this step, the instances are selected with equal probabilities. After this, the learning process starts for a number of generations (iterations), called the adaptation period. Only the training subset is used.

Here every instance receives a counter value U_i, which means the number of successful uses of the i-th instance. At the beginning, all U_i = 1, i = 1 ... n, and then they are changed for every instance. At the end of the adaptation period, the best current solution, i.e. the best classifier for the subsample, is used to update the counter values. Only the counters of the instances which are in the subsample are updated. If an instance j was classified correctly, then U_j = U_j + 1, otherwise U_j = 1.


So, the sample instances which were classified correctly get their counters updated, while the counters for incorrectly classified measurements are reset. After the update of counters, a new subsample is created, using the new counters. The probability for an instance i to be selected is calculated using the equation

p_i = \frac{1/U_i}{\sum_{j=1}^{n} 1/U_j}.

According to this equation, increasing the counter leads to a decrease in the probability of an instance being included into the subsample. The denominator in this case is needed for normalization.

Thus, the adaptive instance selection algorithm assigns lower probabilities for instances which are easier to classify. At the same time, instances which are difficult to classify or those which have not been used before get larger probabilities to be selected in the new training sample. This procedure implements two main principles: the exploration of areas of the feature space unknown before, and using information about classification quality to build a better separation between classes.
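The counter update and the resulting selection probabilities can be sketched as follows (illustrative names; correct[s] is the outcome of the best classifier of the finished adaptation period on the s-th element of the subsample).

```cpp
// Illustrative sketch of the counter update and the selection probabilities.
#include <cstddef>
#include <vector>

// U[i] is the success counter of training instance i (all start at 1);
// subsample[s] is the index of the s-th instance of the current subsample.
void updateCounters(std::vector<int>& U, const std::vector<std::size_t>& subsample,
                    const std::vector<bool>& correct) {
    for (std::size_t s = 0; s < subsample.size(); ++s) {
        if (correct[s]) U[subsample[s]] += 1;   // one more successful use
        else            U[subsample[s]] = 1;    // reset on misclassification
    }
}

// p_i = (1 / U_i) / sum_j (1 / U_j): hard instances keep high probabilities.
std::vector<double> selectionProbabilities(const std::vector<int>& U) {
    double denom = 0.0;
    for (int u : U) denom += 1.0 / u;
    std::vector<double> p(U.size());
    for (std::size_t i = 0; i < U.size(); ++i) p[i] = (1.0 / U[i]) / denom;
    return p;
}
```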
During the learning process, the best solution for every subsample is changed after every adaptation period, as it depends on the instances selected. Because of this, the probabilities are also changed, as the best solution for the subsample may present different results for the whole training set. As the sample constantly changes, different solutions can be received, improving the search process. In the case of an evolutionary algorithm being used as a learning method, during every new adaptation period a population of solutions from the previous step is saved. Individuals in this population are able to classify instances of the new training set, but on most occasions with lower accuracy.

At each generation of the evolutionary algorithm, the best solution found for the subsample is checked on the whole sample. This step is required to exclude losing the best found solution. Moreover, the best solution for the whole training set is included into the population together with the best solution for the subsample. At the end of the adaptation period, all individuals of the current population are checked on the whole training sample. This step is required as the population may contain other solutions which could have even better generalization than the best solution for the current subsample.

One more important issue during creating a subsample is setting the amount of instances to be selected for every class. If the described procedure is used without taking the original class distribution into account, the distribution in the subsample may significantly differ. Because of this, the number of instances available for every class has to influence the subsample creation procedure. The problem of imbalanced classification with fuzzy rule bases has been previously studied in [11, 12].

One of the methods for taking this distribution into consideration is a stratified approach, which is commonly used in cross-validation. This approach is important for balanced problems, as it preserves the class distribution.

However, in the case of solving imbalanced classification problems, the stratified approach may not be the best way. As the described adaptive instance selection approach may choose different groups of instances, for imbalanced problems it is possible to sample subsets that are more balanced than the original set. This idea is implemented in the balancing approach, which creates the subset so that the number of instances per class is as balanced as possible. In this case, the minority classes may be entirely included into the training set.
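One way to read the balancing strategy is as a per-class quota computation: every class receives an equal share of the subsample, capped by the number of instances the class actually contains, so minority classes can be included entirely. The treatment of any unused share, and all identifiers, are assumptions of this sketch; the text above only states that the subsample is made as balanced as possible.

```cpp
// Illustrative sketch of per-class quotas for the balancing strategy.
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<std::size_t> classQuotas(const std::vector<std::size_t>& classSizes,
                                     std::size_t subsampleSize) {
    const std::size_t perClass = subsampleSize / classSizes.size();   // equal share
    std::vector<std::size_t> quota(classSizes.size());
    for (std::size_t c = 0; c < classSizes.size(); ++c)
        quota[c] = std::min(perClass, classSizes[c]);  // minority classes taken entirely
    // Any capacity left unused by small classes could be redistributed to the
    // larger classes (assumption); instances inside each class are then drawn
    // using the 1/U_i probabilities described above.
    return quota;
}
```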


Figure 2. Example of instance selection

In Figure 2, we show a graphical example of which instances are chosen by the algorithm and how the adaptive instance selection procedure leads to an improvement in the classification quality. The example is a two-class problem with 31 measurements, with instances of different classes shown with crosses and squares. On the left side, the start of the algorithm is shown, where all instances have the same probability of being selected, shown by the size of the points. The measurements in circles are the instances that have been selected into the current subsample. The separating surface obtained by the end of the adaptation period does not classify all patterns correctly: instances 16 and 18, which are in the subsample, are misclassified. On the right side, the next adaptation period is shown. Instances 16 and 18 did not change their size (i.e. U16 = U18 = 1) compared to unused instances, while all the rest (1, 3, 7, 12, 14, 19, 20, 28, 29), used and correctly classified, received a lower probability and thus a smaller size.

The lower graph in Figure 2 demonstrates the situation after several adaptation periods, where those instances that are close to the separating plane between classes have higher probabilities of being chosen (and they are actually chosen at this iteration), while the remaining instances that lie further away have lower probabilities.

4 Experimental setup and results

We performed a set of computational experiments with the presented adaptive instance selection algorithm and the hybrid fuzzy GBML algorithm to evaluate their effectiveness. The experiments were performed on a 4-core Intel Core i7-2600K@4400MHz processor; the program system was implemented in C++ with the GCC 4.8.1 compiler, with only standard C++ libraries used.

The parameters of the hybrid evolutionary fuzzy classification algorithm were set in the same way for all experiments: the population size was equal to 100, the number of generations to 10000 and the maximum number of rules to 40. The parameters of the instance selection, i.e. the size of the subsample and the length of the adaptation period, varied. The subsample size was set to 5%, 10%, 15%, 20%, 25% and 30% of the original training set, and the length of the adaptation period was set to 50, 100, 200 and 400 generations. Also, two sampling strategies were tested – a stratified and a balancing strategy for the subsample. For each combination of parameters a 10-fold cross-validation procedure was performed twice, so that the classification quality measures were averaged over 20 runs. The standard algorithm without instance selection has also been tested.

The classification problems for testing were taken from the UCI [13] and KEEL [14] repositories. The selected problems have a large number of instances, variables and classes, some of which are highly imbalanced. The parameters of the datasets are presented in Table 1.

Table 1. Datasets used

Dataset  Number of instances  Number of features  Number of classes
Magic  19020  10  2
Page-blocks  5472  10  5
Penbased  10992  16  10
Phoneme  5404  5  2
Ring  7400  20  2
Satimage  6435  36  6
Segment  2310  19  7
Texture  5500  40  11
Twonorm  7400  20  2

The error values in Table 2 are the error rates in percentages. The next table, Table 3, contains the best results with instance selection and the stratified strategy.

The best instance selection configurations for the stratified strategy were the following – for Magic: 30% of the training sample and an adaptation period length of 400 generations; for Page-blocks: 30%, 200; for Penbased: 15%, 50; Phoneme: 25%, 200; Ring: 25%, 50; Satimage: 30%, 50; Segment: 30%, 50; Texture: 25%, 50; Twonorm: 30%, 100. So, for most of the problems the best results were obtained with the maximum subsample size.

The difference between the standard method and instance selection with the stratified strategy is presented in Table 4. The time ratio was calculated as the ratio of the modified algorithm's time to the standard algorithm's time in percentages. The maximum difference in accuracy was 3.59% (Penbased problem). Only for the Phoneme problem was the effect of instance selection negative (i.e. 1.26% lower for the training sample). The time spent by the modified algorithm was from 18% to 32% of the original, depending mainly on the size of the subsample. Results for the balancing strategy are presented in Table 5. Table 6 contains the comparison of Tables 2 and 5.


Table 2. Results for the standard algorithm


Dataset Training error Test error Number of rules Rule length Time(minutes)
Magic 15.06 15.73 12.6 3.82 370.62
Page-blocks 3.52 3.96 10.1 3.49 94.48
Penbased 7.06 7.46 30.5 6.23 385.2
Phoneme 15.03 16.48 18.3 3.01 84.95
Ring 4.64 5.82 26.6 3.83 226.70
Satimage 12.22 14.22 20.4 11.06 345.82
Segment 4.55 6.45 22.2 6.69 146.18
Texture 6.50 7.75 25.8 14.90 352.26
Twonorm 4.42 6.06 17.4 7.40 254.88

Table 3. Results for instance selection with the stratified strategy


Dataset Training error Test error Number of rules Rule length Time(minutes)
Magic 14.98 15.23 10.7 4.77 121.19
Page-blocks 3.59 3.83 7.8 4.94 20.73
Penbased 3.18 3.87 31.3 6.42 65.36
Phoneme 16.29 16.77 12.5 3.15 15.45
Ring 3.38 4.86 30.0 4.12 54.08
Satimage 11.28 13.05 27.9 6.72 96.31
Segment 3.01 5.13 25.1 6.32 39.21
Texture 3.34 4.31 28.1 11.76 97.47
Twonorm 3.07 4.66 18.7 7.45 58.92

Table 4. Difference between Table 2 and Table 3, positive is better


Dataset Training error Test error Number of rules Rule length Time ratio, %
Magic 0.08 0.5 1.9 -0.95 32.70
Page-blocks -0.07 0.13 2.3 -1.45 21.94
Penbased 3.88 3.59 -0.8 -0.19 16.97
Phoneme -1.26 -0.29 5.8 -0.14 18.19
Ring 1.26 0.96 -3.4 -0.29 23.86
Satimage 0.94 1.17 -7.45 4.34 27.85
Segment 1.54 1.32 -2.9 0.37 26.82
Texture 3.16 3.44 -2.3 3.14 27.67
Twonorm 1.35 1.4 -1.3 -0.05 23.12


Table 5. Results for instance selection with the balancing strategy


Dataset Training error Test error Number of rules Rule length Time(minutes)
Magic 14.62 15.08 17.3 3.63 129.65
Page-blocks 2.71 3.25 18.9 4.82 18.53
Penbased 3.27 3.81 30.8 6.11 91.42
Phoneme 15.63 16.88 24.0 2.84 19.62
Ring 3.23 5.08 30.2 3.85 68.23
Satimage 10.57 12.93 27.2 5.84 85.12
Segment 3.55 5.19 25.1 6.24 32.40
Texture 3.37 4.45 27.0 12.81 114.79
Twonorm 4.03 4.81 15.0 7.74 38.11

Table 6. Difference between Table 2 and Table 5, positive is better


Dataset Training error Test error Number of rules Rule length Time ratio, %
Magic 0.44 0.65 -4.7 0.19 34.98
Page-blocks 0.81 0.71 -8.8 -1.33 19.61
Penbased 3.79 3.65 -0.3 0.12 23.73
Phoneme -0.60 -0.40 -5.75 0.17 23.10
Ring 1.41 0.74 -3.6 -0.02 30.10
Satimage 1.65 1.29 -6.8 5.22 24.61
Segment 1.00 1.26 -2.9 0.45 22.16
Texture 3.13 3.30 -1.2 2.09 32.59
Twonorm 0.39 1.25 2.4 -0.34 14.95

Applying the balancing strategy changed the behaviour of the algorithm for most of the datasets. For the Phoneme dataset the difference changed from -1.29% to -0.4%. The best improvement was for the Penbased problem, 3.65%. Although the average number of rules increased for 8 datasets out of 9, the average rule length decreased for 6 datasets out of 9, which means that the algorithm has designed more rules, but they are less complex.

The best configurations for the balancing strategy were: Magic: 30%, 200; Page-blocks: 20%, 50; Penbased: 25%, 100; Phoneme: 30%, 50; Ring: 30%, 50; Satimage: 30%, 50; Segment: 25%, 50; Texture: 30%, 50; Twonorm: 20%, 200.

However, the effect of the balanced strategy was mainly demonstrated not in terms of overall accuracy, but in terms of a more balanced classification. To compare how successfully the algorithm recognizes both minority and majority classes, we used the RecallM measure from [15]. Table 7 contains the comparison of (1 - RecallM)*100 values for the original algorithm, the stratified strategy and the balanced strategy.

The presented results prove that using the balanced strategy not only increases the overall classification accuracy, but also allows different classes to be recognized more precisely. However, because for most cases the best results were obtained when using 30% of the sample, applying the balancing strategy to the Page-blocks problem, for example, still resulted in a highly imbalanced subsample. Nevertheless, the improvement for this problem is about 13% compared with the original algorithm.

In Tables 8 and 9 we provide the (1 - RecallM)*100 values and accuracy values for the Page-blocks problem for all subsample sizes and all adaptation period lengths.


Table 7. Comparison of (1- RecallM )*100 values


Dataset  Train. Origin.  Test Origin.  Train. Stratif.  Test Stratif.  Train. Balanc.  Test Balanc.
Magic 18.74 19.47 18.83 19.12 17.45 17.99
Page-blocks 34.85 36.80 36.43 38.52 19.64 23.05
Penbased 7.04 7.44 3.16 3.84 3.25 3.80
Phoneme 18.98 20.80 21.48 22.38 16.60 18.10
Ring 4.64 5.82 3.38 4.87 3.23 5.09
Satimage 15.78 17.98 15.85 17.73 13.63 15.28
Segment 4.55 6.45 3.01 5.13 3.55 5.19
Texture 6.50 7.75 3.34 4.31 3.37 4.45
Twonorm 4.42 6.06 3.07 4.66 4.03 4.81

The lowest (1 - RecallM)*100 value was obtained when the size of the subsample was minimal, i.e. when the subsample is as balanced as possible. We should mention that even for 5% for the Page-blocks dataset it was impossible to create a completely balanced subsample, as the minority class contains only 28 measurements, while all the other classes in this case contained 54 measurements. The bias towards the majority classes has been almost eliminated thanks to the balancing strategy, as the accuracy value was equal to 7.95. For comparison, we provide the accuracy values for the Page-blocks problem.

Table 8. (1 - RecallM)*100 values for Page-blocks

Subsample size \ Adaptation period length  50  100  200  400
5%  10.18  12.92  15.18  12.06
10%  12.92  13.71  14.59  17.51
15%  20.51  19.63  18.35  19.26
20%  23.05  21.23  22.72  22.36
25%  28.35  26.14  23.05  22.24
30%  28.04  27.73  28.42  29.51

Table 9. Accuracy values for Page-blocks

Subsample size \ Adaptation period length  50  100  200  400
5%  7.95  8.44  9.25  8.48
10%  5.81  6.03  6.67  7.27
15%  4.00  4.50  4.90  5.77
20%  3.25  3.40  3.93  4.40
25%  3.78  3.78  3.69  4.15
30%  3.38  3.69  3.78  4.06

For 5% of the training set in the subset the accuracy decreases by a factor of 2 compared to the values obtained when using 30% of the sample. However, the classifiers obtained with 5% of the sample can be more suitable, as they are capable of successfully classifying all classes with the same accuracy.

In Tables 10 and 11 we also provide the confusion matrices for the Page-blocks problem to show the effect of the balancing strategy. In every row of Tables 10 and 11 the predicted class having the largest number of instances classified was marked in bold. The original algorithm correctly classifies the first, majority class, but the other classes are also classified into the first class, except the second one. This means that actually the algorithm correctly classifies only the first and second classes. When using the balancing strategy, the situation changes, and most of the instances are correctly classified into their class; however, this leads to a lower general classification quality.

For the remaining problems similar results were obtained. The main trend is that more balanced classifiers can be trained with a smaller size of the sample. If the sample becomes balanced at some particular percentage, there is no sense in decreasing its size any more.

In Figures 3-11 we show surface plots as a graphical representation of the dependence of accuracy on the size of the subsample and the length of the adaptation period.

Figure 3. Segment


Table 10. Confusion matrix for the Page-blocks problem, original algorithm
Class Pred. 1 Pred. 2 Pred. 3 Pred. 4 Pred. 5 Unknown
True 1 487.1 2.90 0 0.36 0.90 0
True 2 3.09 29.0 0.18 0.18 0.45 0
True 3 1.72 0 1.09 0 0 0
True 4 2.45 0.09 0 6 0.18 0
True 5 8.73 0 0.27 0 2.54 0

Table 11. Confusion matrix for the Page-blocks problem, balanced strategy
Class Pred. 1 Pred. 2 Pred. 3 Pred. 4 Pred. 5 Unknown
True 1 452.0 19.36 3.72 6.54 9.64 0.09
True 2 1.72 30.27 0.09 0.36 0.36 0.09
True 3 0.18 0 2.54 0.09 0 0
True 4 0 0.27 0 8.27 0.18 0
True 5 1.27 0.36 0.18 0.36 9.36 0

Figure 4. Phoneme

Figure 5. Page-blocks

Figure 6. Satimage

Figure 7. Twonorm


Figure 8. Ring

Figure 9. Penbased

Figure 10. Magic

Figure 11. Texture

In most cases, increasing the subsample size leads to better classification accuracy. The dependence on the adaptation period length is not so straightforward; however, for most of the problems shorter adaptation periods are preferable. This happens because for most of the problems the population is capable of adjusting to the new subsample within 50 generations. Moreover, a shorter adaptation period gives the possibility to update the Ui values more often and better distinguish the problematic areas of the feature space.

To compare the time required for computation by the original algorithm and the instance selection method, the computation time was averaged over all adaptation period lengths for every subsample size. The resulting time values were compared to the ones for the original algorithm. Next, the acceleration rate equal to Torig/TIS was calculated. Table 12 contains the acceleration values.

The maximal acceleration value achieved is equal to 21.5, and the minimal to 3.09. For most of the problems the algorithm with adaptive instance selection had results not worse than the original method already at the point of 10-15% of the training sample being used, which means that at the same level of classification quality the instance selection allows the algorithm to work 6-12 times faster.

For better understanding of the training process with instance selection, we provide an error graph for the best individual for the overall sample and the subsample. In Figure 12 the graph for the Penbased problem with an adaptation period of 50 generations and a subsample size of 15% is presented. In Figure 13 the graph for the Penbased problem with an adaptation period of 400 generations and a subsample size of 15% is presented.


Table 12. Time acceleration values


Dataset 5% 10% 15% 20% 25% 30% Orig
Magic 13.51 9.11 6.15 4.83 3.47 3.09 1
Page-blocks 13.51 8.90 6.70 5.41 4.41 3.60 1
Penbased 14.04 9.47 6.39 5.02 3.61 3.20 1
Phoneme 21.50 11.39 8.72 6.05 4.96 4.16 1
Ring 16.13 10.47 6.83 5.57 4.22 3.46 1
Satimage 17.25 11.62 8.10 5.73 4.34 3.50 1
Segment 19.65 10.65 7.37 5.56 4.42 3.72 1
Texture 17.70 11.09 7.15 4.82 3.76 3.15 1
Twonorm 18.54 12.41 8.86 6.72 5.13 4.39 1

One of the characteristics of the proposed adaptive instance selection algorithm is that the subsample error rate appears to be larger than the overall training error. Nevertheless, this does not prevent a successful training process. The reason for such behaviour is that the instance selection focuses on instances which are difficult to classify (most of the time they are instances on the border between classes) and includes them into the training subset. In the first adaptation periods, the subsample error and the overall error are very similar, which means that the measurements are selected almost uniformly. But later the situation changes, because instances which are easy to classify get larger counter values and lower probabilities to be chosen. As these instances are still present in the training sample and classified correctly, the overall training error becomes lower than the subsample error.

When increasing the adaptation period length, the training process becomes slower. In Figure 13 it is seen that at the first adaptation period the training subsample error becomes lower than the overall training set error. This happens because the population overfits to the subsample, i.e. the best rule base for the subsample describes the trends which are presented only in the subsample. During the several next adaptation periods the error on the subsample and training set is almost identical, but then the probabilities are changed so that the algorithm focuses on problematic areas of the search space, which leads to a larger subsample error. In the case of a longer adaptation period, the difference between the training set and subset errors is smaller than for the shorter adaptation period.

Figure 12. Accuracy change during the training process, Penbased problem, 50 generations

Figure 13. Accuracy change during the training process, Penbased problem, 400 generations

Let us consider the training process in the sense of different classification quality measures. The Page-blocks problem will be considered, as it is the most imbalanced problem. To calculate the quality measures such as Accuracy (overall classification accuracy), Precisionµ, Recall, Fscore, PrecisionM, RecallM and FscoreM taken from [15], the confusion matrix was calculated for every generation. In Figure 14 we provide the graphs for all measures averaged over all algorithm runs.
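As a reference for how the macro-averaged measures are obtained from a confusion matrix, a sketch of RecallM (following its usual definition in [15]) is shown below; cm[t][p] counts the instances of true class t predicted as class p, and the values reported in Tables 7 and 8 correspond to (1 − RecallM)·100. The identifiers are illustrative.

```cpp
// Illustrative sketch of the macro-averaged recall (RecallM) from a confusion matrix.
#include <cstddef>
#include <vector>

double macroRecall(const std::vector<std::vector<double>>& cm) {
    double sum = 0.0;
    for (std::size_t k = 0; k < cm.size(); ++k) {
        double rowTotal = 0.0;                  // all instances whose true class is k
        for (double v : cm[k]) rowTotal += v;   // extra columns such as "Unknown" count too
        if (rowTotal > 0.0) sum += cm[k][k] / rowTotal;   // recall of class k
    }
    return cm.empty() ? 0.0 : sum / cm.size();
}
```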


Figure 14. The change in classification quality, Page-blocks problem, original algorithm

For this problem the Precisionµ, Recall and Fscore measures are almost identical, but PrecisionM, RecallM and FscoreM are different. This happens due to the fact that the latter measures are macro-measures and are calculated for each class separately and then averaged. In the case of imbalanced classification, the PrecisionM becomes larger than Accuracy, but RecallM becomes smaller. The errors for each of the five classes are presented in Figure 15.

Figure 15. Classification errors for five classes, Page-blocks problem, original algorithm

Only the first, most represented class can be recognized correctly by the original algorithm – the error values for the remaining classes are high or very high. The next two figures, 16 and 17, show the change in classification quality measures and classification errors for the instance selection algorithm with the balancing strategy, with the following parameters: 5% of the training sample in the subsample and an adaptation period of 50 generations.

Figure 16. The change in classification quality, Page-blocks problem, balanced instance selection

Figure 17. Classification errors for five classes, Page-blocks problem, balanced instance selection

Compared to the previous case, the solutions are now more balanced, which is especially noticeable in Figure 17. Here all classes have error rates of about 0.1 except for the fifth class, which has 0.2. Also, the FscoreM values are larger than for the original algorithm. The presented results prove that the instance selection method allows the classification quality to be increased when using the balanced strategy for imbalanced datasets.

For comparison with other methods, an additional computation experiment was performed on the same set of problems. The algorithm was tested without instance selection, but with larger resources, i.e. the population size was 210, the number of generations was 50000, and the number of rules did not change and was equal to 40. Several classification algorithms were chosen for comparison from [16, 17].


Table 13. Comparison with other approaches


Dataset  HEFCA IS  HEFCA Orig.  HEFCA Res.  Fuzzy GBML [4]  Paral. Fuzzy GBML  FARC-HD [16]  BioHEL [17]
Magic 15.08 15.73 15.87 15.42 14.89 15.49 -
Page-blocks 3.25 3.96 3.78 3.81 3.62 4.99 -
Penbased 3.81 7.46 5.02 3.07 3.30 3.96 6.00
Phoneme 16.88 16.48 15.36 15.43 15.96 17.86 -
Ring 5.08 5.82 5.52 6.73 5.25 5.92 -
Satimage 12.93 14.22 12.86 15.54 12.96 12.68 11.60
Segment 5.19 6.45 5.62 5.99 5.90 - 2.90
Texture 4.45 7.75 4.90 4.64 4.77 7.11 -
Twonorm 4.81 6.06 5.57 7.36 3.39 4.72 -

Among the compared methods there was also the prototype algorithm developed by the H. Ishibuchi group, and also their parallel implementation with the training subsample rotation.

The presented approach with the balancing strategy, denoted as HEFCA IS (Hybrid Evolutionary Fuzzy Classification Algorithm with Instance Selection), appeared to be the best in the sense of accuracy for 3 problems out of 9. Moreover, this algorithm outperformed not only the original method (HEFCA Orig.) on 8 problems out of 9, but also the method with 10.5 times larger resources (HEFCA Res.) on 7 problems. The prototype algorithm and its parallel implementation were outperformed on 4 problems out of 9. On the Satimage problem the results are almost identical. The parallel implementation with subsample rotation used 10.5 times more computational resources and 7 threads.

Table 14 contains the comparison of the time required for training by the algorithm with instance selection and the Parallel fuzzy GBML [4].

Table 14. Training time comparison

Dataset  HEFCA IS, min.  Parallel fuzzy GBML, min.  Ratio
Magic  129.65  22.58  5.74
Page-blocks  18.53  4.74  3.91
Penbased  91.42  35.56  2.57
Phoneme  19.62  13.19  1.49
Ring  68.23  22.52  3.03
Satimage  85.12  15.38  5.53
Segment  32.40  4.69  6.91
Texture  114.79  15.72  7.30
Twonorm  38.11  7.84  4.86

On average, the presented approach is 4.6 times slower, but the parallel algorithm used 7 threads for calculation, allowing it to train up to 7 times faster. One should mention that the presented time values for instance selection are for the best configurations, and for most of the problems the best results were obtained when using 30% of the training sample. This means that the time required for training can be decreased even more without a significant loss in classification accuracy.


5 Conclusion

The proposed adaptive instance selection algorithm for imbalanced datasets creates samples of a fixed size out of the original training sample to guide the training process by taking the information about classification results into consideration. The goal of this method is to decrease the time required for computation while increasing the quality of classification. The testing results prove that the proposed method allows a significant improvement in classification quality, especially for imbalanced datasets because of the use of the balancing strategy.

The resulting classifiers are capable of recognizing all classes with similar accuracy values, which is important for many real-world problems. The time required for training decreases significantly, and can be adjusted by changing the size of the subsample. The resulting accuracy is at the same level as the best algorithms for the presented set of problems, or even better.

The proposed adaptive instance selection method is universal and can be applied to any other iteration-based machine learning technique used for classification, including non-evolutionary algorithms.

References

[1] A. Fernandez, S. Garcia, J. Luengo, E. Bernado-Mansilla, F. Herrera, Genetics-Based Machine Learning for Rule Induction: State of the Art, Taxonomy, and Comparative Study, IEEE Transactions on Evolutionary Computation, vol. 14, no. 6, 2010, pp. 913–941.

[2] L. B. Booker, D. E. Goldberg, and J. H. Holland, Classifier systems and genetic algorithms, Artificial Intelligence, vol. 40, no. 1–3, Sep. 1989, pp. 235–282.

[3] U. Bodenhofer, F. Herrera, Ten Lectures on Genetic Fuzzy Systems, Preprints of the International Summer School: Advanced Control – Fuzzy, Neural, Genetic, Slovak Technical University, Bratislava, 1997, pp. 1–69.

[4] H. Ishibuchi, S. Mihara, Y. Nojima, Parallel Distributed Hybrid Fuzzy GBML Models With Rule Set Migration and Training Data Rotation, IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, April 2013.

[5] H. Ishibuchi, T. Yamamoto, Rule weight specification in fuzzy rule-based classification systems, IEEE Transactions on Fuzzy Systems, vol. 13, 2005, pp. 428–435.

[6] E. Semenkin, M. Semenkina, Self-configuring genetic algorithm with modified uniform crossover operator, in Y. Tan, Y. Shi, Z. Ji (Eds.), Advances in Swarm Intelligence, PT1, LNCS 7331, 2012, pp. 414–421.

[7] E. Semenkin, M. Semenkina, Self-Configuring Genetic Programming Algorithm with Modified Uniform Crossover, in Proc. of the IEEE Congress on Evolutionary Computation (CEC 2012), Brisbane (Australia), 2012, pp. 1–6.

[8] M. Semenkina, E. Semenkin, Hybrid self-configuring evolutionary algorithm for automated design of fuzzy classifier, in Y. Tan, Y. Shi, C.A.C. Coello (Eds.), Advances in Swarm Intelligence, PT1, LNCS 8794, 2014, pp. 310–317.

[9] J. R. Cano, F. Herrera, M. Lozano, Stratification for scaling up evolutionary prototype selection, Pattern Recognition Letters, vol. 26, no. 7, 15 May 2005, pp. 953–963.

[10] J. R. Cano, F. Herrera, M. Lozano, A Study on the Combination of Evolutionary Algorithms and Stratified Strategies for Training Set Selection in Data Mining, Advances in Soft Computing, vol. 32, 2005, pp. 271–284.

[11] A. Fernández, M. J. del Jesus, F. Herrera, Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets, International Journal of Approximate Reasoning, 2009, pp. 561–577.

[12] A. Fernández, S. García, M. J. del Jesus, F. Herrera, A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets, Fuzzy Sets and Systems, vol. 159, 2008, pp. 2378–2398.

[13] A. Asuncion, D. Newman, UCI machine learning repository, University of California, Irvine, School of Information and Computer Sciences, 2007.

[14] J. Alcalá-Fdez, L. Sánchez, S. García, M. J. del Jesus, S. Ventura, J. M. Garrell, J. Otero, C. Romero, J. Bacardit, V. M. Rivas, J. C. Fernández, and F. Herrera, KEEL: A software tool to assess evolutionary algorithms for data mining problems, Soft Computing, vol. 13, no. 3, Feb. 2009, pp. 307–318.

[15] M. Sokolova, G. Lapalme, A systematic analysis of performance measures for classification tasks, Information Processing and Management, vol. 45, 2009, pp. 427–437.


[16] J. Alcalá-Fdez, R. Alcalá, F. Herrera, A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning, IEEE Transactions on Fuzzy Systems, vol. 19, no. 5, Oct. 2011, pp. 857–872.

[17] J. Bacardit, E. K. Burke, N. Krasnogor, Improving the scalability of rule-based evolutionary learning, Memetic Computing Journal, vol. 1, no. 1, Mar. 2009, pp. 55–67.

Vladimir Stanovov received the Bachelor of Science and Master of Science degrees in systems analysis and control from Reshetnev Siberian State Aerospace University, Krasnoyarsk, Russia, in 2012 and 2014 respectively. His research interests include genetic fuzzy systems, self-configured evolutionary algorithms and machine learning. He is currently a PhD student at the Siberian State Aerospace University. Mr. Stanovov received the Best Student Paper Award from the 4th International Congress on Advanced Applied Informatics in 2015.

Olga Semenkina received her Master in Mathematics degree from Kemerovo State University (Kemerovo, USSR) in 1982, her PhD in Computer Science from Siberian Aerospace Academy (Krasnoyarsk, Russia) in 1995 and her DSc in Engineering from the Siberian State Aerospace University (Krasnoyarsk, Russia) in 2002. Since 2003, she has been a professor of Higher Mathematics of the Siberian State Aerospace University. Her areas of research include the modelling of complex systems and discrete optimization. She has been awarded the Badge of Honoured Worker of Higher Education of the Russian Federation.
Eugene Semenkin received his Mas-
ter in Applied Mathematics degree
from Kemerovo State University (Ke-
merovo, USSR) in 1982, his PhD in
Computer Science from Leningrad
State University (Leningrad, USSR) in
1989 and his DSc in Engineering and
Habilitation from the Siberian State
Aerospace University (Krasnoyarsk,
Russia) in 1997. Since 1997, he has been a professor of sys-
tems analysis at the Institute of Computer Science and Tel-
ecommunications of the Siberian State Aerospace University.
His areas of research include the modelling and optimization
of complex systems, computational intelligence and data
mining. He has been awarded the Tsiolkovsky Badge of Hon-
our by the Russian Federal Space Agency and the Reshetnev
medal by the Russian Federation of Cosmonautics.
