Improved Group Search Optimization Based On Opposite Populations For Feedforward Networks Training With Weight Decay


2012 IEEE International Conference on Systems, Man, and Cybernetics

October 14-17, 2012, COEX, Seoul, Korea


Improved Group Search Optimization Based on
Opposite Populations for Feedforward Networks
Training with Weight Decay
L. D. S. Pacifico and T. B. Ludermir
Centro de Informatica
Universidade Federal de Pernambuco (UFPE)
Av. Jornalista Anibal Fernandes, sin, Recife, PE, 50.740-560, Brazil
{Idsp, tbl}@cin.ufpe.br
Abstract— Training artificial neural networks (ANNs) is a
complex task of great importance in problems of supervised
learning. Evolutionary algorithms (EAs) are widely used as
global searching techniques for optimization in scientific and
engineering problems, and these approaches have been
introduced to ANNs to perform various tasks, such as connection
weight training and architecture design. Recently, a novel
optimization algorithm called Group Search Optimizer (GSO)
was introduced, which is inspired by animal searching behaviour
and group living theory. In this paper, we present two new
hybrid GSO approaches, one based on opposite populations and
the other based on opposite populations and a modified
Differential Evolution (DE) strategy. We also applied the Weight
Decay (WD) heuristic to enhance the generalization power of the
networks. Experimental results show that the proposed GSO
approaches are able to achieve better generalization performance
than Levenberg-Marquardt (LM), Opposite Differential
Evolution (ODE) and traditional GSO on real benchmark
datasets.
Keywords— Artificial Neural Networks; Evolutionary
computing; Group search optimization; Differential evolution.
I. INTRODUCTION
Artificial neural networks (ANNs) are known as universal
approximators and computational models with particular
characteristics, such as adaptability, capacity of learning by
examples and the ability to organize or to generalize data.
When applied to pattern classification problems, ANNs
through supervised learning techniques are considered a
general method for constructing mappings between a group of
example vectors and the corresponding classes, allowing the
classification of unseen data as one of the classes learned in
the training process.
The main tasks executed in the training process of
Multilayer Perceptrons (MLPs) for pattern classification are
the selection of an appropriate architecture for the problem
and the adjustment of the connection weights of the network.
Traditional gradient-based learning algorithms such as
Backpropagation (BP) and its variant Levenberg-Marquardt
(LM) have been extensively used in the training of MLPs, but
these approaches are usually slower than required in learning,
and may also get stuck in local minima.
Manuscript received May 2, 2012. This work was supported by FACEPE,
CNPq and CAPES (Brazilian Research Agencies).
978-1-4673-1714-6/12/$31.00 ©2012 IEEE
Global search techniques, such as evolutionary algorithms
(EAs), like Genetic Algorithm (GA) [23] and Differential
Evolution (DE) [13, 14], Simulated Annealing (SA) [24],
Tabu Search (TS) [25], Ant Colony Optimization (ACO) [26]
and Particle Swarm Optimization (PSO) [16]-[18], with the
ability to broaden the search space in the attempt to avoid
local minima, have been widely combined with neural networks
to perform various tasks: connection weight training,
connection weight initialization, rule extraction from artificial
neural networks, architecture optimization, and so on.
The GSO algorithm is a novel Swarm Intelligence (SI) algorithm
for continuous optimization problems. GSO is inspired by
animal social searching behaviour and group living theory,
following a Producer-Scrounger (PS) model [1], which
assumes that group members search either for "finding"
(producing) or for "joining" (scrounging) opportunities. Some
applications have been done using the GSO algorithm. In He
et al. [4], a group search optimizer strategy for training an
ANN was used for diagnosis of breast cancer. Wu et al. [5]
presented a multi-objective optimization method, where a
group search optimizer with multiple producers (GSOMP) was
applied to the optimal placement of multi-type Flexible AC
Transmission System (FACTS) devices in a power system. He
and Li [6] applied an ANN trained with GSO to machine
condition monitoring. In Silva et al. [7] two GSO approaches
based on cooperative behaviour among multiple GSO groups
were introduced to benchmark pattern classification problems.
In this paper, we introduce two novel hybrid GSO
approaches based on opposite populations [14, 15] combined
with weight decay (WD) heuristic [19]-[22] to improve the
process of weight optimization of MLPs: OGSO-WD and
OGSO-MDE-WD. OGSO-WD applies the concepts of
Opposition-Based Learning (OBL) [14] to decrease the
distance from an unknown solution by comparing the
candidate solution with its opposite and continuing with the
better one [15]. OGSO-MDE-WD differs from OGSO-WD by
means of a hybrid evolution scheme, which combines a
modified DE strategy and traditional GSO operators to
improve the exploration of the search space. Experiments have
been done with ODE [15], traditional GSO, GSO-WD [7],
Levenberg-Marquardt algorithms and the proposed approaches
using real benchmark classification problems (Cancer,
Diabetes, Heart, Iris and Wine) obtained from the UCI
Machine Learning Repository [27].
This paper is organized as follows. Section II presents
briefly the standard Group Search Optimizer (GSO) algorithm
[2, 3], the Differential Evolution (DE) algorithm [12, 13], the
Opposition-Based Learning (OBL) [14] concept and the Weight
Decay heuristic [7, 19, 21]. Next, the hybrid Group Search
Optimizer approaches are presented (Section III) and the
experimental results are shown (Section IV). Conclusions are
given in Section V.
II. PRELIMINARIES
A. Group Search Optimizer (GSO)
The GSO [2, 3] is a swarm intelligence algorithm inspired
by animal social searching behaviour and group living theory.
GSO employs the Producer-Scrounger (PS) model as a
framework. The PS model was firstly proposed by Barnard
and Sibly [1] to analyze social foraging strategies of group
living animals. In this model, it is assumed that there are two
foraging strategies within groups: (1) producing, e.g.,
searching for food; and (2) joining (scrounging), e.g., joining
resources uncovered by others. Foragers are assumed to use
producing or joining strategies exclusively. Under this
framework, concepts of resource searching from the animal
scanning mechanism are used to design the optimum searching
strategies of the GSO algorithm.
Figure 1 shows the pseudocode for the standard GSO
algorithm. The population PN with size N of the GSO
algorithm is called a group and each individual in the
population is called a member. A GSO group consists of three
types of members: producers, scroungers and dispersed
members (rangers) [2, 3]. During each GSO search bout, a
group member which found the best fitness value so far (most
promising area) is chosen as the producer [9], and the
remaining members are scroungers or dispersed members. All
scroungers will join the resource found by the producer,
performing scrounging strategy. Rangers adopt random walks
as the ranging strategy [10].
In GSO, when a member is outside the search space, it will
turn back to its previous position inside the search space
(bounded search space), restricting the search to a profitable
patch [11]. The standard GSO algorithm is presented in Fig. 1
[3].
B. Differential Evolution (DE)
The Differential Evolution (DE) was proposed by Price
and Storn in 1995 [12, 13]. DE is an effective, robust, and
simple global optimization algorithm, which only has a few
control parameters [15]. Like other evolutionary algorithms,
DE starts with a randomly initialized population, when no
preliminary knowledge about the solution space is available.
Assuming that X_i^k (i = 1, 2, ..., N) are solution vectors in
generation k, successive populations are generated by adding
the weighted difference of two randomly selected vectors to a
third randomly selected one. For classical DE (i.e.,
DE/rand/1/bin), the mutation, crossover, and selection
operators are defined by eq. (1), eq. (2) and eq. (3),
respectively:

V_i^k = X_n1^k + F (X_n2^k - X_n3^k)    (1)

U_j,i^k = V_j,i^k, if rand_j(0, 1) ≤ Cr or j = l
          X_j,i^k, otherwise    (2)

X_i^(k+1) = U_i^k, if f(U_i^k) is better than f(X_i^k)
            X_i^k, otherwise    (3)

where V_i^k is a mutant vector; i = {1, 2, ..., N}; n1, n2, and n3
are mutually different random integer indices selected from
{1, 2, ..., N} (N ≥ 4 is required); j = {1, 2, ..., D} is the
individual dimension; F ∈ [0, 2] is a real constant which
determines the amplification of the added differential variation
(X_n2^k - X_n3^k); rand_j(0, 1) is the jth evaluation of a uniform
random number generator; Cr is the crossover rate; and l ∈ {1,
2, ..., D} is a random parameter index, chosen once for each i
to make sure that at least one parameter is always selected
from the mutated vector V_i^k. Most popular values for Cr are in
the range of (0.4, 1) [15].
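The DE/rand/1/bin generation described by eqs. (1)-(3) can be sketched in Python as follows; this is a minimal illustration for minimization, and the function name and the default F and Cr values are ours, not those used in the experiments of this paper:

```python
import numpy as np

def de_step(pop, f, F=0.5, Cr=0.9, rng=None):
    """One generation of classic DE/rand/1/bin (minimization), following the
    mutation (1), binomial crossover (2) and greedy selection (3) operators."""
    rng = np.random.default_rng() if rng is None else rng
    N, D = pop.shape
    new_pop = pop.copy()
    for i in range(N):
        # n1, n2, n3: mutually different indices, also different from i
        n1, n2, n3 = rng.choice([j for j in range(N) if j != i], 3, replace=False)
        v = pop[n1] + F * (pop[n2] - pop[n3])        # mutation, eq. (1)
        mask = rng.random(D) <= Cr                   # crossover mask, eq. (2)
        mask[rng.integers(D)] = True                 # forced gene from the mutant
        u = np.where(mask, v, pop[i])
        if f(u) < f(pop[i]):                         # greedy selection, eq. (3)
            new_pop[i] = u
    return new_pop
```

Because selection is greedy, the best fitness in the population can never worsen from one generation to the next.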
1. k = 0;
2. Randomly initialize positions and head angles of all
   members;
3. Calculate the fitness values of initial members;
4. WHILE (the termination conditions are not met)
5.   FOR (each member i in the group)
6.     Choose producer: Find the producer Xp of the group;
7.     Perform producing: perform producer behaviour;
8.     Perform scrounging: Randomly select a percentage of
       the remaining members to perform scrounging;
9.     Perform dispersion: The remaining members (rangers)
       will perform ranging;
10.    Fitness evaluation: Calculate the fitness value of the
       current member;
11.  END FOR
12.  k = k + 1;
13. END WHILE
Figure 1 - Pseudocode for the standard GSO algorithm
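One iteration of the loop in Figure 1 can be sketched as below. This is a deliberately simplified illustration: the head-angle and pursuit-angle geometry of the full GSO [2, 3] is omitted, and the step sizes, the candidate-scan count and the function names are our own illustrative choices:

```python
import numpy as np

def gso_step(pop, f, bounds, scrounger_pct=0.8, rng=None):
    """One simplified GSO iteration (minimization). The producer scans a few
    random points near its position; a fixed percentage of the remaining
    members scrounge toward the producer; the rest range (random walks)."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = bounds
    N, D = pop.shape
    fit = np.array([f(x) for x in pop])
    p = int(np.argmin(fit))                          # producer: current best member
    producer = pop[p].copy()
    new_pop = pop.copy()
    # producing: evaluate a few nearby candidates, move only if one is better
    cands = np.clip(producer + rng.normal(0.0, 0.1 * (hi - lo), (3, D)), lo, hi)
    best_cand = min(cands, key=f)
    if f(best_cand) < fit[p]:
        new_pop[p] = best_cand
    rest = rng.permutation([i for i in range(N) if i != p])
    n_scr = int(scrounger_pct * len(rest))
    for i in rest[:n_scr]:                           # scrounging: drift to producer
        new_pop[i] = pop[i] + rng.random(D) * (producer - pop[i])
    for i in rest[n_scr:]:                           # ranging: bounded random walk
        new_pop[i] = np.clip(pop[i] + rng.normal(0.0, 0.1 * (hi - lo), D), lo, hi)
    return new_pop
```

Since the producer only moves to a strictly better candidate, the best fitness found so far never deteriorates, while scroungers and rangers keep the search inside the bounded box, matching the patch-restriction behaviour described above.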
C. Opposition-Based Learning (OBL)
Opposition-Based Learning (OBL) was introduced by
Tizhoosh in 2005 [14]. The main concept behind OBL is the
simultaneous consideration of an estimate and its
corresponding opposite estimate, in order to increase the
chances to find a solution better than the current one.
Evolutionary optimization methods start with some initial
solutions (initial population), usually initialized randomly
(random guesses), and try to improve them toward some
optimal solution(s). The process of searching terminates when
some predefined criteria are satisfied. The quality of the
solutions and the computation time are directly related to the
distance of these initial random values from the optimal
solution. One can improve the chance of starting with a closer
(fitter) solution by simultaneously checking the opposite
solution [15]. By doing this, the fitter one (guess or opposite
guess) can be chosen as an initial solution. The same approach
can be applied not only to initial solutions but also
continuously to each solution in the current population [14, 15].
The opposite number x̄ for a real value x ∈ [l, u] is defined by
eq. (4) [14, 15]:

x̄ = l + u - x    (4)
Since all individuals Xi = {x_1i, x_2i, ..., x_Di} from a population
PN are represented as real vectors, we can straightforwardly
define the concept of an opposite population OPN as a
population of individuals X̄i = {x̄_1i, x̄_2i, ..., x̄_Di} (i = 1, 2, ...,
N), where (eq. (5)):

x̄_ji = l_j + u_j - x_ji, j = 1, 2, ..., D    (5)
D. Weight Decay (WD)
Weight decay was initially suggested as a heuristic to
improve the backpropagation algorithm for the preference bias
of a robust neural network that is insensitive to noise [7, 19,
21]. In this work, the weight decay heuristic was combined with
the GSO algorithm as an attempt to improve the generalization
performance of the trained MLP network. This implementation
used an adaptive scheme for determining the regularization
coefficient (λ(t)), as described in [21]. Every member has its
own λi(t) coefficient, adjusted adaptively according to its mean
error. After determining the position of the individual through
the standard GSO, the decay term is applied to the current
position of the member, according to eq. (6), penalizing trends
for large weights.

Xi(t + 1) = Xi(t + 1) - λi(t + 1) Xi(t + 1)    (6)
The new cost function is given by:

E_i^WD(t) = Ei(t) + (λi(t) / 2) ||Xi||²    (7)

where Ei(t) denotes the neural network error at time t related to
individual i and ||Xi||² is the squared norm of Xi. In each
iteration, each λi(t) is updated as follows:

λi(t + 1) = λi(t) + INC, if Ei(t) < Ēi(t)
            λi(t) - INC, if Ei(t) > Ēi(t)
            λi(t), otherwise    (8)

where INC is the increment value to the regularization
coefficient and Ēi(t) denotes the mean error obtained by the
ith individual so far.
The weight decay mechanism drives the weights of a network
architecture differentially toward zero: large weights are
relatively preserved, while small-weight connections are
weakened. As small weights can be used by the network to code
noise patterns, this weight decay mechanism is considered
especially important on noisy data.
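The decay step of eq. (6) and the adaptive coefficient rule of eq. (8) can be sketched as follows; the default INC value here is purely illustrative, not the paper's setting:

```python
import numpy as np

def apply_weight_decay(x_new, lam):
    """Eq. (6): shrink the freshly updated position, penalizing large weights."""
    return x_new - lam * x_new

def update_lambda(lam, err, mean_err, inc=1e-4):
    """Adaptive coefficient update in the spirit of eq. (8): raise lambda while
    the member's error is below its running mean, lower it when above,
    and leave it unchanged otherwise. The inc default is illustrative."""
    if err < mean_err:
        return lam + inc
    if err > mean_err:
        return lam - inc
    return lam
```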
III. OPPOSITION-BASED GROUP SEARCH OPTIMIZATION APPROACHES
This section presents two new hybrid GSO approaches
based on opposite populations: OGSO-WD and OGSO-MDE-WD.
The first one applies the concept of opposition to
decrease the distance from an unknown solution by comparing
one member of the population (i.e., a candidate solution) with
its opposite and continuing with the better one. This approach
is a direct extension of the works presented in [14, 15] to the
GSO context. The second one is a hybrid model that combines
GSO and a modified version of the DE operators to improve
the population in such a way that members are randomly
chosen to execute a GSO or modified DE update. We also
applied the weight decay heuristic to enhance the
generalization power of the proposed methodologies. Figures 2
and 3 show the pseudocodes for the OGSO-WD and
OGSO-MDE-WD algorithms.
In OGSO-WD, each member Xi of the population PN is
initialized randomly and its opposite member is computed,
generating an opposite population OPN. The initial population
is composed by the fittest individuals from {PN U OPN}
[15]. After the initialization process, all members of the population
are updated according to the basic GSO operators (i.e., producing,
scrounging or ranging) [2, 3]. Based on a jumping probability
Jr, after generating new populations by GSO operations, the
opposite population is calculated and the fittest individuals are
selected from the union of the current population and the
opposite population [15]. We adopted the dynamic opposite
population scheme presented in [15], so that generation
jumping calculates the opposite of each variable based on
minimum and maximum values of that variable in the current
population, shortening the search space progressively.
In OGSO-MDE-WD, the population is divided in two
groups: the first group will update using basic GSO operators
and the second one will follow a modified differential
evolution scheme. The modifications are performed on classic
DE mutation operator (i.e., DE/rand/l/bin), generating two
new schemes, from now on called s-mutation and r-mutation,
respectively. The main difference between s-mutation and
r-mutation lies in the fact that the producer is always selected as
one of the members generating the new mutated vector V_i^k in
s-mutation. Eq. (9) describes the s-mutation operator, and eq.
(10) describes the r-mutation operator.
V_i^k = X_p^k + r3 ∘ (X_n1^k - X_n2^k)    (9)

V_i^k = X_n1^k + r3 ∘ (X_n2^k - X_n3^k)    (10)

where r3 is a uniform random sequence in the range (0, 1), ∘
denotes element-wise multiplication, and n1, n2, and n3 are
mutually different random integer indices selected from
{1, 2, ..., N}.
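The two modified mutation operators of eqs. (9)-(10) can be sketched as follows (function names are ours; index selection and the crossover of eq. (2) are assumed to happen outside these helpers):

```python
import numpy as np

def s_mutation(pop, producer, n1, n2, rng=None):
    """s-mutation, eq. (9): mutant anchored at the producer Xp."""
    rng = np.random.default_rng() if rng is None else rng
    r3 = rng.random(pop.shape[1])          # uniform random sequence in (0, 1)
    return producer + r3 * (pop[n1] - pop[n2])

def r_mutation(pop, n1, n2, n3, rng=None):
    """r-mutation, eq. (10): rand-based mutant with per-dimension random factor."""
    rng = np.random.default_rng() if rng is None else rng
    r3 = rng.random(pop.shape[1])
    return pop[n1] + r3 * (pop[n2] - pop[n3])
```

Anchoring the mutant at the producer (s-mutation) biases exploration toward the most promising area found so far, while r-mutation keeps the exploratory character of classic DE/rand/1.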
In these approaches, each member Xi in the group is
composed of a set of weights between the input layer and the
hidden nodes (W1), the hidden biases (B1), a set of weights
between the hidden layer and the output layer (W2), and the
output biases (B2) [4]:

Xi = [W1, B1, W2, B2]

For each member, all variables are randomly initialized
within the range of [-1, 1]. The fitness function adopted is the
mean squared error (MSE) on the validation set:

E = (1/N) Σ_{n=1}^{N} Σ_{k=1}^{C} (t_k^n - o_k^n)²    (11)

where t_k^n is the target output related to the nth individual
concerning the kth class, o_k^n is the output obtained by the ANN
for the nth individual concerning the kth class and C is the
number of output nodes.
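The member encoding Xi = [W1, B1, W2, B2] and the MSE fitness of eq. (11) can be sketched as follows; the tanh hidden activation and linear output layer are our assumptions, since the paper does not state the activation functions:

```python
import numpy as np

def decode_member(x, n_in, n_hid, n_out):
    """Split a flat member Xi = [W1, B1, W2, B2] into MLP parameter arrays."""
    i = 0
    W1 = x[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    B1 = x[i:i + n_hid]; i += n_hid
    W2 = x[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    B2 = x[i:i + n_out]
    return W1, B1, W2, B2

def mse_fitness(x, X_val, T_val, n_in, n_hid, n_out):
    """Eq. (11): mean squared error of the decoded MLP on the validation set.
    tanh hidden units and linear outputs are assumptions, not the paper's."""
    W1, B1, W2, B2 = decode_member(x, n_in, n_hid, n_out)
    H = np.tanh(X_val @ W1 + B1)            # hidden layer (assumed activation)
    O = H @ W2 + B2                         # output layer (assumed linear)
    return float(np.mean(np.sum((T_val - O) ** 2, axis=1)))
```

The flat-vector encoding is what lets a population-based optimizer such as GSO or DE treat the whole network as a single D-dimensional point.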
1. k = 0;
2. Randomly initialize positions and head angles of all
   members;
3. Randomly initialize the initial population PN;
4. Find the corresponding opposite population OPN to PN;
5. Select the fittest N members from {PN ∪ OPN} to
   form the initial population;
6. Calculate the fitness values of initial members;
7. WHILE (the termination conditions are not met)
8.   FOR (each member i in the group)
9.     Choose producer: Find the producer Xp of the group;
10.    Perform producing: perform producer behaviour;
11.    Perform scrounging: Randomly select a percentage of
       the remaining members to perform scrounging;
12.    Perform dispersion: The remaining members (rangers)
       will perform ranging;
13.    Fitness evaluation: Calculate the fitness value of the
       current member;
14.    Regularization Coefficient Update: using (8);
15.  END FOR
16.  IF rand(0, 1) < Jr
17.    Find the corresponding opposite population OPN to
       the current population PN;
18.    Select the fittest N members from {PN ∪ OPN} to
       form the new population;
19.  END IF
20.  k = k + 1;
21. END WHILE
Figure 2 - Pseudocode for the OGSO-WD algorithm
IV. RESULTS AND DISCUSSION
In this section, we compare the two new GSO approaches
with the LM, ODE [15], traditional GSO [2, 3] and the GSO-WD
presented in our early work [7]. All programs were run in a
MATLAB 6.0 environment. A validation set is used in all
evaluated methodologies to prevent overfitting.
For evaluating all of these algorithms, five benchmark
classification datasets, obtained from the UCI Machine Learning
Repository [27], have been used. These datasets present
different degrees of difficulty and different numbers of
classes. The evaluation metrics used are an empirical analysis
and a paired hypothesis test of type t-test (α = 5%) [28] over
the test accuracies obtained by each method in each dataset. In
the experiments, some parameters have been fixed for all
datasets (Table 1), according to [2, 3, 7, 15, 19]. The
maximum pursuit distance lmax was given as follows:

lmax = ||U - L|| = √(Σ_{i=1}^{D} (u_i - l_i)²)    (12)

where l_i and u_i are the lower and upper bounds for the ith
dimension.
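The maximum pursuit distance above is simply the Euclidean length of the search-box diagonal, which a short helper makes concrete (the function name is ours):

```python
import numpy as np

def max_pursuit_distance(lower, upper):
    """l_max = ||U - L||: Euclidean length of the search hyper-box diagonal."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return float(np.linalg.norm(upper - lower))
```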
1. k = 0;
2. Randomly initialize positions and head angles of all
   members;
3. Randomly initialize the initial population PN;
4. Find the corresponding opposite population OPN to PN;
5. Select the fittest N members from {PN ∪ OPN} to
   form the initial population;
6. Calculate the fitness values of initial members;
7. WHILE (the termination conditions are not met)
8.   FOR (each member i in the group)
9.     Choose producer: Find the producer Xp of the group;
10.    Perform producing: perform producer behaviour;
11.    IF GSO group:
12.      Perform scrounging: Randomly select a percentage
         of the remaining members to perform scrounging;
13.      Perform dispersion: The remaining members
         (rangers) will perform ranging;
14.    ELSE (Modified DE group)
15.      S-Mutation: Randomly select a percentage of the
         remaining members to mutate following eq. (9);
16.      R-Mutation: The remaining members will mutate
         following eq. (10);
17.      Crossover: each member from the modified DE group
         will execute crossover according to eq. (2);
18.    Fitness evaluation: Calculate the fitness value of the
       current member;
19.    Regularization Coefficient Update: using (8);
20.  END FOR
21.  IF rand(0, 1) < Jr
22.    Find the corresponding opposite population OPN to
       the current population PN;
23.    Select the fittest N members from {PN ∪ OPN} to
       form the new population;
24.  END IF
25.  k = k + 1;
26. END WHILE
Figure 3 - Pseudocode for the OGSO-MDE-WD algorithm
Table 1 - Fixed parameters for all algorithms

Algorithm       Parameter                      Value
GSO             PN                             50
                Scroungers percentage          80%
                θmax, αmax                     π/a², θmax/2
                Maximum number of iterations   50
OGSO-MDE-WD     S-Mutation Percentage          80%
                Cr                             0.8
ODE             F                              1.0
                Jr                             0.3
Levenberg-      Number of hidden nodes         6
Marquardt       Maximum number of epochs       200
Weight Decay    λ                              5 × 10
                INC                            1 × 10
We tested, for each dataset, different maximum numbers of
GSO iterations, using the best value found (Table 1).
Each dataset was divided into training, validation and testing
sets, as specified in Table 2. For all algorithms, 50
independent executions were done with each dataset. The
training, validation and testing sets were randomly generated
at each trial of the simulations. For each dataset, the best
results among LM, GSO, GSO-WD and the new GSO approaches
are highlighted in bold.
Table 2 - Dataset specifications
Dataset Cancer Diabetes Heart Iris Wine
Training 350 384 130 70 78
Validation 175 192 70 40 50
Testing 174 192 70 40 50
From Table 3 through Table 7, the results obtained for each
method are shown. Table 3 shows the results obtained for the
cancer dataset. The best result has been achieved by the hybrid
OGSO-MDE-WD (96.41% of test accuracy) in an empirical
evaluation, followed by OGSO-WD and GSO-WD [7],
respectively, and these approaches outperformed the ODE,
traditional GSO and LM approaches. The t-test showed that
OGSO-MDE-WD achieved better results than the ODE,
traditional GSO and LM algorithms.
Table 3 - Results for Cancer

Classification Algorithm   Training Time (s)   Test Accuracy (%)
LM                         0.28376             87.32 ± 15.22
ODE                        2021.67             95.84 ± 1.30
GSO                        1041.38             95.68 ± 1.44
GSO-WD                     1043.84             96.08 ± 1.41
OGSO-WD                    2326.39             96.23 ± 1.43
OGSO-MDE-WD                1272.48             96.41 ± 1.45
For the diabetes case (Table 4), in an empirical analysis
OGSO-WD (76.61% of test accuracy) outperformed all other
algorithms, followed by OGSO-MDE-WD. The t-test showed
that OGSO-WD outperformed ODE, GSO and LM algorithms,
and the OGSO-MDE-WD outperformed ODE and LM
algorithms.
Table 4 - Results for Diabetes

Classification Algorithm   Training Time (s)   Test Accuracy (%)
LM                         0.31474             70.61 ± 7.99
ODE                        3921.26             75.30 ± 2.48
GSO                        1205.83             75.59 ± 2.39
GSO-WD                     1246.65             75.61 ± 2.80
OGSO-WD                    2306.60             76.61 ± 2.66
OGSO-MDE-WD                1733.45             76.34 ± 2.51
Table 5 - Results for Heart

Classification Algorithm   Training Time (s)   Test Accuracy (%)
LM                         0.22700             71.57 ± 12.21
ODE                        2373.30             79.46 ± 5.73
GSO                        2067.71             79.62 ± 5.25
GSO-WD                     2148.87             80.43 ± 5.47
OGSO-WD                    1340.83             80.51 ± 5.92
OGSO-MDE-WD                1510.72             80.34 ± 5.19
Table 5 shows that OGSO-WD (80.51% of test
accuracy) achieved the highest test accuracy empirically for the
heart dataset. All GSO based algorithms outperformed ODE,
traditional GSO and LM techniques empirically, but according
to t-test their results are similar among each other, except for
LM, which was worse than all other approaches, presenting a
high degree of instability.
In relation to the iris plant dataset (Table 6), the best empirical
results have been achieved by OGSO-MDE-WD (96.05% of test
accuracy). All GSO-based algorithms outperformed the ODE,
traditional GSO and LM techniques empirically, but according
to the t-test their results are similar among each other, except,
again, for the LM algorithm (the worst approach).
Table 6 - Results for Iris

Classification Algorithm   Training Time (s)   Test Accuracy (%)
LM                         0.2033              76.60 ± 24.53
ODE                        3083.65             95.20 ± 3.60
GSO                        797.23              95.35 ± 3.67
GSO-WD                     781.72              95.70 ± 3.35
OGSO-WD                    2692.64             95.85 ± 3.30
OGSO-MDE-WD                1337.63             96.05 ± 3.43
For the wine dataset (Table 7), again the hybrid
OGSO-MDE-WD outperformed all approaches in an empirical
analysis, followed by the new OGSO-WD and GSO-WD,
respectively. Once more the t-test showed that all methods but
LM achieved similar performances.
Table 7 - Results for Wine

Classification Algorithm   Training Time (s)   Test Accuracy (%)
LM                         0.322               79.72 ± 22.28
ODE                        6939.87             96.04 ± 2.84
GSO                        2983.68             96.00 ± 2.71
GSO-WD                     4409.86             96.32 ± 3.46
OGSO-WD                    1788.65             96.88 ± 2.56
OGSO-MDE-WD                1311.14             97.00 ± 2.72

V. CONCLUSION
In this paper, we introduced two new hybrid learning
approaches based on opposite populations and GSO to
improve on the LM algorithm for training ANNs, namely
OGSO-WD and OGSO-MDE-WD. Both approaches use the
concept of opposition-based learning (OBL) [14] to accelerate
the convergence of the GSO method. The performance of the
tested algorithms was evaluated with well-known benchmark
classification datasets [27]. Experimental results show that the
hybrid approaches achieved better generalization performances
than ODE, original GSO and LM for all tested datasets. For
two cases (cancer and diabetes), the new GSO-based
techniques outperformed the ODE, GSO and LM based on the
t-test evaluation (confidence coefficient of 95%).
ACKNOWLEDGMENT
The authors would like to thank FACEPE, CNPq and
CAPES (Brazilian Research Agencies) for their financial
support.
REFERENCES
[1] C. J. Barnard and R. M. Sibly, "Producers and Scroungers: A General
Model and Its Application to Captive Flocks of House Sparrows", in:
Animal Behaviour, vol. 29, pp. 543-550, 1981.
[2] S. He, Q. H. Wu, and J. R. Saunders, "A Novel Group Search Optimizer
Inspired by Animal Behavioural Ecology", in: 2006 IEEE Congress on
Evolutionary Computation (CEC 2006), pp. 1272-1278, Vancouver, 2006.
[3] S. He, Q. H. Wu and J. R. Saunders, "Group Search Optimizer: An
Optimization Algorithm Inspired by Animal Searching Behaviour", in:
IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp.
973-990, 2009.
[4] S. He, Q. H. Wu and J. R. Saunders, "Breast Cancer Diagnosis Using an
Artificial Neural Network Trained by Group Search Optimizer", in:
Transactions of the Institute of Measurement and Control, vol. 31, no. 6,
pp. 517-531, 2009.
[5] Q. H. Wu, Z. Lu, M. S. Li and T. Y. Ji, "Optimal Placement of FACTS
Devices by a Group Search Optimizer with Multiple Producers", in:
2008 IEEE Congress on Evolutionary Computation (CEC 2008), Hong Kong,
pp. 1033-1039, 2008.
[6] S. He and X. Li, "Application of a Group Search Optimization based
Artificial Neural Network to Machine Condition Monitoring", in: 13th
IEEE International Conference on Emerging Technologies and Factory
Automation (ETFA 2008), pp. 1260-1266, 2008.
[7] D. N. G. Silva, L. D. S. Pacifico and T. B. Ludermir, "Improved Group
Search Optimizer Based on Cooperation Among Groups for
Feedforward Networks Training with Weight Decay", in: 2011 IEEE Int.
Conf. on Systems, Man, and Cybernetics, Anchorage, IEEE
SMC 2011 Conference Proceedings, pp. 2133-2138, 2011.
[8] W. J. O'Brien, B. I. Evans and G. L. Howick, "A New View of the
Predation Cycle of a Planktivorous Fish, White Crappie (Pomoxis
Annularis)", in: Canadian J. Fisheries Aquatic Sci., vol. 43, pp.
1894-1899, 1986.
[9] I. D. Couzin, J. Krause, N. R. Franks and S. A. Levin, "Effective
Leadership and Decision-Making in Animal Groups on the Move",
Nature, vol. 434, pp. 513-516, 2005.
[10] C. L. Higgins and R. E. Strauss, "Discrimination and classification of
foraging paths produced by search-tactic models", Behavioral Ecology,
vol. 15, pp. 248-254, 2003.
[11] A. F. Dixon, "An Experimental Study of the Searching Behaviour of the
Predatory Coccinellid Beetle Adalia Decempunctata", in: J. Animal
Ecology, vol. 28, pp. 259-281, 1959.
[12] R. Storn and K. Price, "Differential Evolution - A Simple and Efficient
Adaptive Scheme for Global Optimization Over Continuous Spaces",
Berkeley, CA, Tech. Rep. TR-95-012, 1995.
[13] R. Storn and K. Price, "Differential Evolution - A Simple and Efficient
Heuristic for Global Optimization Over Continuous Spaces", in: Journal
of Global Optimization, vol. 11, pp. 341-359, 1997.
[14] H. R. Tizhoosh, "Opposition-Based Learning: A New Scheme for
Machine Learning", in: Proc. Int. Conf. Comput. Intell. Modeling
Control and Autom., Vienna, vol. 1, pp. 695-701, 2005.
[15] S. Rahnamayan, H. R. Tizhoosh and M. M. A. Salama, "Opposition-
Based Differential Evolution", in: IEEE Transactions on Evolutionary
Computation, vol. 12, no. 1, pp. 64-79, 2008.
[16] J. Kennedy and R. Eberhart, "Particle Swarm Optimization", in: Proc.
IEEE Intl. Conf. on Neural Networks, pp. 1942-1948, 1995.
[17] J. Kennedy and R. Eberhart, "Swarm Intelligence", Morgan Kaufmann
Publishers, Inc., San Francisco, CA, 2001.
[18] F. van den Bergh, "An Analysis of Particle Swarm Optimizers", PhD
dissertation, Faculty of Natural and Agricultural Sciences, Univ.
Pretoria, Pretoria, South Africa, 2002.
[19] M. Carvalho and T. B. Ludermir, "Particle Swarm Optimization of Feed-
Forward Neural Networks with Weight Decay", in: Proceedings of the
Sixth International Conference on Hybrid Intelligent Systems (HIS'06),
pp. 5-10, 2006.
[20] S. Haykin, "Neural Networks: A Comprehensive Foundation", 2nd
Edition, Prentice Hall, 1998.
[21] A. S. Weigend, D. E. Rumelhart and B. A. Huberman, "Generalization
by Weight Elimination with Application to Forecasting", in:
Proceedings of the 1990 Conference on Advances in Neural Information
Processing Systems (NIPS), pp. 875-882, 1990.
[22] C. Zanchettin and T. B. Ludermir, "Global Optimization Methods for
Designing and Training Feedforward Artificial Neural Networks", in:
Dynamics of Continuous, Discrete & Impulsive Systems (DCDIS) A
Supplement, Advances in Neural Networks, vol. 14 (S1), pp. 328-337,
2007.
[23] A. E. Eiben and J. E. Smith, "Introduction to Evolutionary Computing",
Natural Computing Series, Springer, Berlin, 2003.
[24] S. Kirkpatrick, C. D. Gelatt Jr. and M. P. Vecchi, "Optimization by
Simulated Annealing", Science, vol. 220, pp. 671-680, 1983.
[25] F. Glover, "Future Paths for Integer Programming and Links to Artificial
Intelligence", Computers and Operations Research, vol. 13, pp. 533-549,
1986.
[26] M. Dorigo, V. Maniezzo and A. Colorni, "Ant System: Optimization by
a Colony of Cooperating Agents", IEEE Transactions on Systems, Man
and Cybernetics - Part B, vol. 26, no. 1, pp. 29-41, 1996.
[27] A. Frank and A. Asuncion, UCI Machine Learning Repository, Univ.
California, Sch. Inform. Comput. Sci., Irvine, CA, 2011 [Online].
Available: http://archive.ics.uci.edu/ml.
[28] M. H. DeGroot, "Probability and Statistics", 2nd edition, 1989.
