For any sample t ∈ Z_3 ∪ Z_2, we consider all arcs connecting t with samples s ∈ Z_1, as though t were part of the training graph. Considering all possible paths from S* to t, we find the optimum path P*(t) from S* and label t with the class λ(R(t)) of its most strongly connected prototype R(t) ∈ S*. This path can be identified incrementally, by evaluating the optimum cost C(t) as:

C(t) = min{ max{C(s), d(s, t)} },  ∀ s ∈ Z_1.   (10)
Let the node s* ∈ Z_1 be the one that satisfies Eq. (10) (i.e., the predecessor P(t) in the optimum path P*(t)). Given that L(s*) = λ(R(t)), the classification simply assigns L(s*) as the class of t. An error occurs when L(s*) ≠ λ(t).
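As a concrete illustration, Eq. (10) admits a direct Python sketch for classifying a single sample; the Euclidean distance is an assumption here, matching the metric adopted later in the experiments, and the array-based layout is ours rather than the paper's:

import numpy as np

def opf_classify(t, X_train, costs, labels):
    # d(s, t) for every training sample s in Z_1 (Euclidean metric assumed)
    d = np.linalg.norm(X_train - t, axis=1)
    # max{C(s), d(s, t)} for each s, then the minimum over all s (Eq. (10))
    path_costs = np.maximum(costs, d)
    s_star = np.argmin(path_costs)
    # t inherits the label L(s*) of the node offering the optimum path
    return labels[s_star], path_costs[s_star]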
4. Swarm-based Optimization
In this section, we briefly describe the swarm-based optimization techniques employed in this work.
4.1. Firefly Algorithm
The Firefly Algorithm was proposed by Yang (2010) and is derived from the flash attractiveness that fireflies use to attract mating partners (communication) and potential prey. The brightness of a firefly is determined by some objective function, and the perceived light intensity I depends on the distance d from its source, as follows:
I = I_0 e^(−γd),   (11)

where I_0 is the original light intensity and γ stands for the light absorption coefficient.
As a firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies, we can now define the attractiveness β of a firefly by

β = β_0 e^(−γd²),   (12)

where β_0 is the attractiveness at d = 0.
A firefly i is attracted to another firefly k with a better fitness value, and moves according to:

x_i^j(t+1) = x_i^j(t) + β_0 e^(−γ d²_{i,k}) (x_k^j − x_i^j) + α (r_i − 1/2),   (13)

where the second term states the attraction between both fireflies, d²_{i,k} stands for the distance between fireflies i and k, α is a randomization factor, and r_i ∼ U(0,1).
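To make Eq. (13) concrete, a minimal Python sketch of one firefly iteration follows; the synchronous update of the whole population and the maximization convention are our assumptions, while the parameter defaults mirror Table 1:

import numpy as np

def firefly_step(X, fitness, beta0=0.1, gamma=0.8, alpha=0.1, rng=None):
    # One synchronous pass of the movement rule in Eq. (13); a higher
    # fitness value means a brighter firefly (maximization assumed).
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    X_new = X.copy()
    for i in range(m):
        for k in range(m):
            if fitness[k] > fitness[i]:              # i moves toward brighter k
                d2 = np.sum((X[i] - X[k]) ** 2)      # squared distance d^2_{i,k}
                beta = beta0 * np.exp(-gamma * d2)   # attractiveness, Eq. (12)
                X_new[i] += beta * (X[k] - X[i]) + alpha * (rng.random(n) - 0.5)
    return X_new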
4.2. Gravitational Search Algorithm
Rashedi et al. (2009) proposed an optimization algorithm based on gravity, which is one of the fundamental interactions of nature. Their approach, called Gravitational Search Algorithm, models each possible solution as a particle in the universe, which interacts with the others according to Newton's law of universal gravitation (Halliday, Resnick, & Walker, 2000).
Let p_i be a particle in a universe, and x_i ∈ R^n and v_i ∈ R^n its position and velocity, respectively. One can define, at a specific time t, the force acting on particle i from particle k in the jth dimension as follows:

F_{ik}^j(t) = G(t) · [ M_i(t) M_k(t) / (R_{ik}(t) + ε) ] · (x_k^j(t) − x_i^j(t)),   (14)
where R_{ik}(t) is the Euclidean distance between particles i and k, M_i stands for the mass of particle i, and ε is a small constant to avoid division by zero. G is a gravitational potential, which is given by

G(t) = G(t_0) (t_0 / t)^f,  f < 1,   (15)

in which f is a control parameter (Mansouri, Nasseri, & Khorrami, 1999), G(t) is the value of the gravitational potential at time t, and G(t_0) is the value of the gravitational potential at the time of creation of the universe being considered (Mansouri et al., 1999).
To give a stochastic behaviour to the Gravitational Search Algorithm, Rashedi et al. (2009) take the total force that acts on particle i in dimension j to be a randomly weighted sum of the forces exerted by the other agents:

F_i^j(t) = Σ_{k=1, k≠i}^{m} r_k F_{ik}^j(t),   (16)

in which r_k ∼ U(0,1) and m denotes the number of particles (size of the universe).
The acceleration of particle i at time t in dimension j is given by

a_i^j(t) = F_i^j(t) / M_i(t),   (17)

in which the mass M_i is calculated as follows:

M_i(t) = q_i(t) / Σ_{k=1}^{m} q_k(t),   (18)

with

q_i(t) = (f_i(t) − w(t)) / (b(t) − w(t)).   (19)
The terms w(t) and b(t) denote, respectively, the worst and the best fitness values among all particles at time t. The term f_i(t) stands for the fitness value of particle i.
Finally, to avoid locally optimal solutions, only the best b masses, i.e., the ones with the highest fitness values, will attract the others. Let B be the set of these masses. The value of b is set to b_0 at the beginning of the algorithm and decreases with time. Hence, Eq. (16) is rewritten as:

F_i^j(t) = Σ_{b ∈ B, b≠i} r_b F_{ib}^j(t).   (20)
The velocity and position updating equations are given by

v_i^j(t+1) = r_i v_i^j(t) + a_i^j(t)   (21)

and

x_i^j(t+1) = x_i^j(t) + v_i^j(t+1),   (22)

in which r_i ∼ U(0,1).
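Putting Eqs. (14)-(22) together, one GSA iteration could be sketched as below; the maximization convention, the selection of the b best masses by sorting, and the ε guards are our assumptions, consistent with but not verbatim from Rashedi et al. (2009):

import numpy as np

def gsa_step(X, V, fitness, G, b, rng=None, eps=1e-12):
    # One GSA iteration over positions X and velocities V, both (m, n);
    # maximization is assumed, so b(t) is the largest fitness value.
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    w, bst = fitness.min(), fitness.max()
    q = (fitness - w) / (bst - w + eps)              # Eq. (19)
    M = q / (q.sum() + eps)                          # Eq. (18)
    B = np.argsort(fitness)[-b:]                     # the b best masses (set B)
    F = np.zeros_like(X)
    for i in range(m):
        for k in B:
            if k != i:
                R = np.linalg.norm(X[i] - X[k])      # Euclidean distance R_{ik}
                # Eqs. (14) and (20): randomly weighted gravitational pull
                F[i] += rng.random() * G * M[i] * M[k] / (R + eps) * (X[k] - X[i])
    a = F / (M[:, None] + eps)                       # Eq. (17)
    V = rng.random((m, 1)) * V + a                   # Eq. (21)
    return X + V, V                                  # Eq. (22)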
4.3. Harmony Search
Harmony Search (HS) is a meta-heuristic algorithm inspired by the improvisation process of music players. Musicians often improvise the pitches of their instruments searching for a perfect state of harmony (Geem, 2009). The main idea is to use the same process musicians adopt when creating new songs to obtain a near-optimal solution according to some fitness function. Each possible solution is modelled as a harmony, and each musical note corresponds to one decision variable.
The algorithm, which has a theoretical stochastic derivative background, generates after each iteration a new harmony vector x_new = (x_new^1, x_new^2, …, x_new^n) based on memory considerations, pitch adjustments, and randomization (music improvisation). The variable n stands for the number of decision variables, as stated for the aforementioned nature-inspired optimization techniques.
With regard to the memory consideration step, the idea is to model the process of creating songs, in which the musician can use her memories of good musical notes to create a new song. This process is modelled by the Harmony Memory Considering Rate (HMCR) parameter. Suppose that HMCR = 0.75: in this case, 75% of the new harmony will be composed of musical notes that come from the harmony memory, and the remaining 25% are chosen at random, which simulates the process of music improvisation. Mathematically speaking:
x_new^j ← x_new^j ∈ {x_1^j, x_2^j, …, x_m^j}   with probability HMCR,
x_new^j ← x_new^j ∈ A^j                        with probability (1 − HMCR),   (23)

where m and A^j stand for the number of harmonies and the set of feasible ranges for each decision variable j, respectively. Therefore, HMCR ∈ [0,1] is the probability of choosing one value from the historic values stored in the harmony memory, and (1 − HMCR) is the probability of randomly choosing one feasible value.
Further, every component j of the new harmony vector x_new is examined to determine whether it should be pitch-adjusted, which is controlled by the Pitch Adjusting Rate (PAR) variable:

Pitch adjusting decision for x_new^j ← Yes   with probability PAR,
                                       No    with probability (1 − PAR).   (24)
The pitch adjustment for each instrument is often used to improve solutions and to escape from local optima. This mechanism shifts the value of some decision variable in the harmony to a neighbouring value.
In such a way, if the pitch adjustment decision for the decision variable x_new^j is Yes, x_new^j is replaced as follows:

x_new^j ← x_new^j ± r_j · h,   (25)

where h is an arbitrary distance bandwidth for the continuous design variable, and r_j ∼ U(0,1).
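A minimal sketch of the improvisation step defined by Eqs. (23)-(25) follows; the PAR value and the bandwidth h are illustrative choices (only HMCR is fixed by Table 1), and the bounds lower/upper stand for the feasible range A^j of each variable:

import numpy as np

def improvise(HM, lower, upper, hmcr=0.7, par=0.3, h=0.1, rng=None):
    # Build one new harmony from the harmony memory HM of shape (m, n),
    # following Eqs. (23)-(25); lower/upper bound the feasible set A^j.
    rng = np.random.default_rng() if rng is None else rng
    m, n = HM.shape
    new = np.empty(n)
    for j in range(n):
        if rng.random() < hmcr:                  # memory consideration, Eq. (23)
            new[j] = HM[rng.integers(m), j]
            if rng.random() < par:               # pitch adjustment, Eqs. (24)-(25)
                new[j] += rng.choice([-1.0, 1.0]) * rng.random() * h
        else:                                    # random feasible value
            new[j] = rng.uniform(lower[j], upper[j])
    return np.clip(new, lower, upper)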
4.4. Particle Swarm Optimization
Particle Swarm Optimization (PSO) is an algorithm modelled on swarm intelligence that finds a solution in a search space based on social behaviour dynamics (Kennedy & Eberhart, 2001). Each possible solution of the problem is modelled as a particle in the swarm that imitates its neighbourhood based on an objective function.
Some definitions consider Particle Swarm Optimization a stochastic and population-based search algorithm, in which social behaviour learning allows each possible solution to move through the search space by combining aspects of the history of its own current and best locations with those of one or more members of the swarm, with some random perturbations. This process simulates the social interaction between humans looking for the same objective, or a flock of birds looking for food, for instance.
The entire swarm is modelled in a multidimensional space R^n, in which each particle p_i = (x_i, v_i) ∈ R^n has two main features: (i) its position x_i and (ii) its velocity v_i. The best local position x̂_i and the global solution g are also known for each particle. After defining the swarm size m, i.e., the number of particles, each one is initialized with random values for both velocity and position. Each individual is then evaluated with respect to some fitness function and its local maximum is updated. At the end, the global maximum is updated with the particle that achieved the best position in the swarm. This process is repeated until some convergence criterion is reached. The velocity and position update equations of particle p_i, in the simplest form that governs Particle Swarm Optimization at time step t, are respectively given by
v_i^j(t+1) = w v_i^j(t) + c_1 r_1 (x̂_i^j(t) − x_i^j(t)) + c_2 r_2 (g^j − x_i^j(t))   (26)

and

x_i^j(t+1) = x_i^j(t) + v_i^j(t+1),   (27)
where w is the inertia weight that controls the interaction between particles, and r_1, r_2 ∈ [0,1] are random variables that give Particle Swarm Optimization its stochastic nature. The variables c_1 and c_2 are used to guide the particles towards good directions.
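Eqs. (26) and (27) translate directly into a vectorized update; in the sketch below the random draws r_1 and r_2 are taken per particle and per dimension, which is an assumption (some PSO variants draw one scalar per particle), and the parameter defaults follow Table 1:

import numpy as np

def pso_step(X, V, local_best, global_best, w=0.9, c1=2.0, c2=2.0, rng=None):
    # Vectorized form of Eqs. (26) and (27): X, V and local_best are (m, n)
    # arrays, global_best is (n,).
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (local_best - X) + c2 * r2 * (global_best - X)
    return X + V, V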
5. Methodology
A data instance is typically described as a pair (x, y), in which x ∈ R^n and y stand for the feature vector and its label, respectively. Let Z = (A, Y) be the dataset of our classification problem, in which A represents the set of feature vectors and Y the set of outputs related to each instance. A classifier is then defined as a function f : A → Y, which predicts y for a given x based on a model learned from a set of labelled data (supervised learning). In order to provide a better understanding of the problem, feature selection techniques aim to discover a minimal subspace which better describes the distribution of A. More precisely, our goal is to select a value m ≤ n and to project each data instance x ∈ R^n to a new one x′ ∈ R^m. Furthermore, classification algorithms may suffer from the Hughes phenomenon (Hughes, 1968) in high-dimensional spaces, and thus require much more computational load, as in numerical solutions of dynamic programming problems (Bellman, 2010).
We now describe the methodology employed to evaluate the performance of the feature selection techniques discussed in the previous sections (Fig. 1 depicts a pipeline to clarify this procedure). Firstly, we randomly partition the dataset into N folds, i.e., Z = T_1 ∪ … ∪ T_i ∪ … ∪ T_N. Note that each fold should be large enough to contain representative samples of the problem. Further, for each fold, we train a given instance of the OPF classifier over a subset of that fold, Z_1^i ⊂ T_i, and an evaluation set Z_2^i = T_i \ Z_1^i is then classified in order to compute a fitness function which guides a stochastic optimization algorithm to select the most representative set of features. Each member of the population in the meta-heuristic algorithm is associated with a string of bits denoting the presence or absence of a feature. Thus, for each member, we construct a classifier from the training set with only the selected features and compute a fitness function by means of classifying Z_2^i. Once the procedure converges, i.e., all generations of a population have been computed, the agent (bat, firefly, mass, harmony, particle) with the highest fitness value encodes the solution with the best compacted set of features. Further, we build a classification model using the training set and the selected features, and we also evaluate the quality of the solution by computing its effectiveness over the remaining folds T_j ∈ Z, T_j ≠ T_i. Algorithm 2 details the methodology for comparing feature selection techniques.
Algorithm 2. Feature Selection Evaluation
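As a complement to Algorithm 2, a minimal Python sketch of the per-fold wrapper loop is given below; the generic fit/score classifier interface and the optimize routine are hypothetical stand-ins for OPF training and for any of the swarm techniques above, not the authors' exact implementation:

import numpy as np

def fitness(mask, X_tr, y_tr, X_ev, y_ev, make_classifier):
    # Recognition rate over the evaluation set Z_2^i using only the
    # features switched on in the binary mask.
    if not mask.any():
        return 0.0
    clf = make_classifier().fit(X_tr[:, mask], y_tr)
    return clf.score(X_ev[:, mask], y_ev)

def evaluate_fold(X, y, train_idx, eval_idx, test_idx, optimize, make_classifier):
    # Wrapper loop for one fold: select features on (Z_1^i, Z_2^i), then
    # report accuracy over the remaining folds; `optimize` is a hypothetical
    # swarm routine returning the best binary mask found.
    f = lambda mask: fitness(mask, X[train_idx], y[train_idx],
                             X[eval_idx], y[eval_idx], make_classifier)
    best_mask = optimize(f, n_features=X.shape[1])
    clf = make_classifier().fit(X[train_idx][:, best_mask], y[train_idx])
    return clf.score(X[test_idx][:, best_mask], y[test_idx])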
Fig. 1 displays the above procedure. As aforementioned, the feature selection is carried out over fold i, which is partitioned into a training set Z_1^i and an evaluation set Z_2^i. The idea is to represent a possible subset of features as a string of bits, which encodes each agent's position in the search space. Thus, for each agent, we model the dataset using its string of bits, and an OPF classifier is trained over the new Z_1^i; its effectiveness using this subset of features is assessed over Z_2^i. This recognition rate is then used as the fitness function to guide each agent to new positions, until the convergence criterion is reached. The agent with the best fitness value is then employed to build Z_1^i, which is used for OPF training. The final accuracy using the selected subset of features is computed over the remaining folds (red rectangle in Fig. 1). This procedure is repeated over all folds for the mean accuracy computation.
In regard to datasets, we have employed the following:

• Wisconsin Breast Cancer: 683 samples, 2 classes, and 10 features (Mangasarian, Wolberg, & Setiono, 1989).
• DNA: 2,000 samples, 3 classes, and 180 features (King, Feng, & Sutherland, 1995).
• USPS: 7,291 samples, 10 classes, and 256 features (Hull, 1994).
• Splice: 1,000 samples, 2 classes, and 60 features (Frank & Asuncion, 2010).
• Ionosphere: 351 samples, 2 classes, and 34 features (Frank & Asuncion, 2010).
• SVM Guide 1: 3,089 samples, 2 classes, and 4 features (Hsu, Chang, & Lin, 2003).
6. Experimental results
In this section, we evaluate the effectiveness of BBA in finding compact sets of features with predictive performance as high as possible, and compare it against other swarm-based feature selection methods. We applied the methodology presented in Section 5 to obtain a better quality estimate of each solution. More precisely, we defined k = 5 for the cross-validation scheme, which implied five rounds of feature selection for each method, with the quality of each solution evaluated over the remaining four folds. All features in each dataset discussed in this section were normalized to the range [0,1] to avoid attributes with greater numeric ranges dominating those with smaller numeric ranges. It is worth noting that the Euclidean metric was employed for OPF distance computation. In addition, regarding the fitness function and the final classification performance, we used an accuracy measure proposed by Papa et al. (2009), which considers the fact that classes may have different concentrations in the dataset. This avoids a strong estimation bias towards the majority class in highly imbalanced datasets. Additionally, we employed Principal Component Analysis (PCA) for comparison purposes, with two distinct configurations: using the 70% of the dimensions with the largest eigenvalues (called PCA), and using the same number of features as the best technique on that dataset (called PCA2).

Fig. 1. Pipeline of the proposed methodology.

Fig. 2. Experimental results using different transfer functions for each swarm-based optimization technique: (a) Wisconsin Breast Cancer, (b) Ionosphere, and (c) DNA datasets.
Figs. 2 and 3 display the results obtained over the datasets. Recall that, in the subtitles of each figure, Baseline means OPF with the entire set of features, to give us a reference point; Binary stands for the methods as proposed in the literature for feature selection purposes (Falcón, Almeida, & Nayak, 2011; Firpi & Goodman, 2004; Nakamura et al., 2012; Ramos et al., 2011; Rashedi et al., 2010), i.e., with the transfer function displayed in Eq. (8): in such methods, after each particle's position is changed to binary values using Eq. (8), its coordinates keep moving through the search space as binary coordinates. In the following two methods, the particles assume binary coordinates only for OPF computation purposes, i.e., their original values over the search space are continuous-valued. Sigmoid (Eq. (28)) denotes the continuous versions of the techniques with a sigmoid function as the continuous-to-binary mapping:

f(x) = 1 / (1 + exp(−x)),   (28)

and Hyperbolic Tangent (Eq. (29)) means the continuous approaches with the hyperbolic tangent as transfer function:

f(x) = |tanh(x)|.   (29)
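For clarity, a small sketch of how Eqs. (28) and (29) can map continuous positions to binary feature masks is given below; the stochastic thresholding against U(0,1) follows the usual binary-PSO convention (Kennedy & Eberhart, 1997) and is our assumption, since the corresponding rule of Eq. (8) is not reproduced here:

import numpy as np

def to_binary(x, transfer="sigmoid", rng=None):
    # Map a continuous position x to a binary feature mask; the stochastic
    # threshold against U(0,1) is an assumed convention, not Eq. (8) itself.
    rng = np.random.default_rng() if rng is None else rng
    if transfer == "sigmoid":
        p = 1.0 / (1.0 + np.exp(-x))   # Eq. (28)
    else:
        p = np.abs(np.tanh(x))         # Eq. (29)
    return rng.random(x.shape) < p     # True = feature selected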
Fig. 3. Experimental results using different transfer functions for each swarm-based optimization technique: (a) Splice, (b) USPS, and (c) SVM Guide 1 datasets.
Table 1
Parameters used for each optimization technique. The parameter values were empirically chosen based on results reported in previous studies in the literature.

Technique                        Parameters
Bat Algorithm                    α = γ = 0.9
Firefly Algorithm                β_0 = 0.1, γ = 0.8, α = 0.1
Gravitational Search Algorithm   G_0 = 100, f = 0.8
Harmony Search                   HMCR = 0.7
Particle Swarm Optimization      c_1 = c_2 = 2, w = 0.9
Table 1 presents the parameters used for each evolutionary-based technique. It is important to clarify that, for all techniques, we assumed a model with a population size of 30 agents and 100 generations to reach a solution.
Loosely speaking, meta-heuristic algorithms differ mainly with respect to the balance between generating diverse solutions, so as to explore the search space on a global scale, and searching a local region by exploiting the neighbourhood of a good solution. Figs. 2 and 3 compare each evaluated algorithm against BBA on the aforementioned six public datasets. As we can see, with respect to the Wisconsin Breast Cancer dataset, all feature selection approaches considerably reduced the set of features and, indeed, improved the predictive performance of the OPF classifier. In regard to the remaining datasets, the overall classification performance remained quite similar between the original and the reduced datasets. It is also worth highlighting the number of features selected by BBA with the Sigmoid transfer function on the Ionosphere dataset, which was considerably smaller than for the others.
The reader may observe that the Bat Algorithm performed at least as well as the traditional algorithms, outperforming the others on the Splice dataset. Indeed, BA works similarly to traditional Particle Swarm Optimization, as the frequency essentially controls the pace and range of the movement of the bats. However, this difference is crucial for intensification, which here means a local diversification, via randomization, that prevents the solutions from being trapped at local optima. In addition, BA and Harmony Search do not make an explicit distinction between global and local search, which may be an advantage for the user when defining parameters.
Additionally, in regard to the experiment with different transfer functions, for almost all datasets and swarm-based optimization techniques the Hyperbolic Tangent selected more features than the Binary and Sigmoid functions. This last study is useful for pointing out reasonable transfer functions to be employed in binary-valued optimization problems. Finally, we defined the classification accuracy over a test set and the final size of the reduced set as the measures for comparing the presented feature selection approaches. Although these measures are important for feature selection purposes, since we are investigating the sensitivity to over-fitting and the curse of dimensionality, they may not provide a proper evaluation of each meta-heuristic algorithm's performance: a solution that is optimal over the evaluation set may not be a good solution with respect to the test set. Despite these considerations, we provided an analysis of five meta-heuristic algorithms with a fairer performance estimation through a cross-validation scheme. In general, the results showed that swarm-inspired algorithms are suitable choices for optimization problems in which there is some discontinuity or complexity in the objective functions.
7. Conclusion
In this paper, we have presented a wrapper feature selection approach based on the Bat Algorithm and the Optimum-Path Forest classifier, which combines an exploration of the search space with an intense local analysis, exploiting the neighbourhood of a good solution, in order to reduce the dimensionality of the feature space. We have also proposed a methodology to evaluate feature selection methods by performing a k-fold cross-validation.

The proposed approach was compared against traditional meta-heuristic algorithms on six public datasets. As we have a binary-valued feature selection process, we also evaluated two different transfer functions, a Hyperbolic Tangent and a Sigmoid function, which map continuous-valued positions into binary ones. The idea is to analyze continuous optimizations with different transfer functions, in addition to the binary optimization approaches from the literature.

The results showed that BA is as effective as some state-of-the-art swarm-based optimization techniques: it can drastically compact the feature set on all evaluated datasets, and it can indeed improve the predictive performance in some cases. Additionally, the Hyperbolic Tangent transfer function appears to select more features than the Sigmoid function for almost all datasets and swarm-based optimization techniques.
References

Allène, C., Audibert, J. Y., Couprie, M., Cousty, J., & Keriven, R. (2007). Some links between min-cuts, optimal spanning forests and watersheds. In Proceedings of the international symposium on mathematical morphology, MCT/INPE (pp. 253–264).
Alonso-Atienza, F., Rojo-Álvarez, J. L., Rosado-Muñoz, A., Vinagre, J. J., García-Alberola, A., & Camps-Valls, G. (2012). Feature selection using support vector machines and bootstrap methods for ventricular fibrillation detection. Expert Systems with Applications, 39, 1956–1967.
Bellman, R. (2010). Dynamic programming. Princeton, NJ, USA: Princeton University Press.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2001). Introduction to algorithms (2nd ed.). The MIT Press.
Falcón, R., Almeida, M., & Nayak, A. (2011). Fault identification with binary adaptive fireflies in parallel and distributed systems. In Proceedings of the IEEE congress on evolutionary computation (pp. 1359–1366). IEEE.
Firpi, H. A., & Goodman, E. (2004). Swarmed feature selection. In Proceedings of the 33rd applied imagery pattern recognition workshop (pp. 112–118). Washington, DC, USA: IEEE Computer Society.
Frank, A., & Asuncion, A. (2010). UCI machine learning repository.
Geem, Z. W. (2009). Music-inspired harmony search algorithm: Theory and applications (1st ed.). Springer Publishing Company, Incorporated.
Griffin, D. R., Webster, F. A., & Michael, C. R. (1960). The echolocation of flying insects by bats. Animal Behaviour, 8, 141–154.
Guyon, I., Gunn, S., Nikravesh, M., & Zadeh, L. A. (2006). Feature extraction: Foundations and applications (Studies in fuzziness and soft computing). Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Guyon, I., Li, J., Mader, T., Pletscher, P. A., Schneider, G., & Uhr, M. (2007). Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recognition Letters, 28, 1438–1444.
Halliday, D., Resnick, R., & Walker, J. (2000). Extended fundamentals of physics. Wiley.
Hsu, C., Chang, C., & Lin, C. (2003). A practical guide to support vector classification. Technical Report, National Taiwan University.
Huang, C.-L., & Wang, C.-J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31, 231–240.
Hughes, G. (1968). On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 14, 55–63.
Hull, J. J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 550–554.
Jain, A., & Zongker, D. (1997). Feature selection: Evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 153–158.
Kennedy, J., & Eberhart, R. C. (1997). A discrete binary version of the particle swarm algorithm. In IEEE international conference on systems, man and cybernetics (Vol. 5, pp. 4104–4108).
Kennedy, J., & Eberhart, R. (2001). Swarm intelligence. M. Kaufman.
King, R. D., Feng, C., & Sutherland, A. (1995). Statlog: Comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence, 9, 289–333.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220, 671–680.
Koza, J. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: The MIT Press.
Kudo, M., & Sklansky, J. (2000). Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33, 25–41.
Kuncheva, L. I., & Jain, L. C. (1999). Nearest neighbor classifier: Simultaneous editing and feature selection. Pattern Recognition Letters, 20, 1149–1156.
Mangasarian, O. L., Wolberg, W., & Setiono, R. (1989). Pattern recognition via linear programming: Theory and application to medical diagnosis. Technical Report TR 0878, University of Wisconsin, Madison, WI, USA.
Mansouri, R., Nasseri, F., & Khorrami, M. (1999). Effective time variation of G in a model universe with variable space dimension. Physics Letters, 259, 194–200.
Metzner, W. (1991). Echolocation behaviour in bats. Science Progress Edinburgh, 75, 453–465.
Nakamura, R. Y. M., Pereira, L. A. M., Costa, K. A., Rodrigues, D., Papa, J. P., & Yang, X.-S. (2012). BBA: A binary bat algorithm for feature selection. In Proceedings of the XXV SIBGRAPI conference on graphics, patterns and images (pp. 291–297).
Oh, I.-S., Lee, J.-S., & Moon, B.-R. (2004). Hybrid genetic algorithms for feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1424–1437.
Papa, J. P., Falcão, A. X., Albuquerque, V. H. C., & Tavares, J. M. R. S. (2012). Efficient supervised optimum-path forest classification for large datasets. Pattern Recognition, 45, 512–520.
Papa, J. P., Falcão, A. X., & Suzuki, C. T. N. (2009). Supervised pattern classification based on optimum-path forest. International Journal of Imaging Systems and Technology, 19, 120–131.
Ramos, C., Souza, A., Chiachia, G., Falcão, A., & Papa, J. (2011). A novel algorithm for feature selection using harmony search and its application for non-technical losses detection. Computers & Electrical Engineering, 37, 886–894.
Rashedi, E., Nezamabadi-pour, H., & Saryazdi, S. (2009). GSA: A gravitational search algorithm. Information Sciences, 179, 2232–2248.
Rashedi, E., Nezamabadi-pour, H., & Saryazdi, S. (2010). BGSA: Binary gravitational search algorithm. Natural Computing, 9, 727–745.
Schnitzler, H.-U., & Kalko, E. K. V. (2001). Echolocation by insect-eating bats. BioScience, 51, 557–569.
Wang, X., Yang, J., Teng, X., Xia, W., & Jensen, R. (2007). Feature selection based on rough sets and particle swarm optimization. Pattern Recognition Letters, 28, 459–471.
Yang, X.-S. (2010). Firefly algorithm, stochastic test functions and design optimisation. International Journal of Bio-Inspired Computation, 2, 78–84.
Yang, X.-S. (2011). Bat algorithm for multi-objective optimisation. International Journal of Bio-Inspired Computation, 3, 267–274.