PPR - Espinal - Comparison of PSO and DE For Training Neural Networks
Abstract—The computational resources required for the Feed-Forward Artificial Neural Network (FFANN) training phase by means of classical techniques, such as the backpropagation learning rule, can be prohibitive in some applications. A good training phase is needed for a neural network to perform well. In the search for alternative FFANN training methods, several metaheuristic techniques have been used for this task. This paper compares the performance of Particle Swarm Optimization (PSO) and Differential Evolution (DE) as training methods for FFANNs on several well-known pattern recognition instances.

Keywords—Neural Networks, Particle Swarm Optimization, Differential Evolution

A. Feed-Forward Artificial Neural Network

In an FFANN, the term feedforward describes how the network processes and recalls patterns: neurons are only connected forward. Each layer of the neural network contains connections to the next layer, but there are no backward connections [2].

The feed-forward process can be explained as follows. Let expression 1 denote a feedforward neural network [3]:

x^0 \xrightarrow{W^1, b^1} x^1 \xrightarrow{W^2, b^2} \cdots \xrightarrow{W^L, b^L} x^L    (1)
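The layer-to-layer mapping of expression 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the random example weights are ours, and each layer is assumed to compute x^l = f(W^l x^{l-1} + b^l) with a sigmoid activation.

```python
import numpy as np

def sigmoid(u):
    # Logistic activation, as used later for hidden and output neurons (eq. 10).
    return 1.0 / (1.0 + np.exp(-u))

def forward(x0, weights, biases):
    """Propagate an input pattern through the chain of expression 1:
    x^0 -> x^1 -> ... -> x^L, with x^l = sigmoid(W^l x^{l-1} + b^l)."""
    x = x0
    for W, b in zip(weights, biases):
        x = sigmoid(W @ x + b)
    return x

# Tiny 2-3-1 network with arbitrary fixed weights, purely for illustration.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal(3), rng.standard_normal(1)]
y = forward(np.array([0.5, -0.2]), weights, biases)
```

Since every layer ends in a sigmoid, each component of the final output lies in (0, 1).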
The training phase was performed with the batch training approach [4]: for all patterns of the training dataset, the quadratic error [3] between the target output and the neural network output (eq. 9) is calculated before adjusting the weights of the neural network.

\text{Quadratic Error} = \sum_{p=1}^{M} \sum_{k=1}^{m} \left( t_k^{(p)} - y_k^{(p)} \right)^2    (9)

where M is the number of patterns, m is the number of neural outputs, t is the target output and y is the neural network output.

Since we work in a continuous space, the quadratic error can be used as a fitness function. This function can easily be plugged into a metaheuristic technique such as the PSO and DE algorithms presented in the previous section. The dimension of the problem is the number of weights of the links between nodes in our FFANN model.

The final goal of these two training techniques is the same as that of the backpropagation learning rule: to minimize the quadratic error of the neural network outputs.

Fig. 1. Codification of a Particle/Genome for representing a FFANN configuration.

IV. EXPERIMENTS AND RESULTS

In this section we discuss the experimental design, execution and results of the comparison between PSO and DE for the training phase.

A. Dataset description

We chose five well-known dataset instances from the UCI Machine Learning Repository:

Glass: The Glass [12] dataset is a study of the classification of types of glass, motivated by criminological investigation. It is formed by 7 classes and has 214 instances with 10 attributes each.

Ionosphere: The Ionosphere [13] dataset was collected by a system in Goose Bay, Labrador. This system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. The instances in this database are described by 2 attributes per pulse number, corresponding to the complex values returned by the function resulting from the complex electromagnetic signal. The dataset has 351 instances with 35 attributes each. The first 34 attributes are continuous and the 35th is the class, good or bad. Good radar returns are those showing evidence of some type of structure in the ionosphere; bad returns are those that do not, their signals passing through the ionosphere.

Iris Plant: The Iris Plant [14] dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are not linearly separable from each other. The dataset has 150 instances, 50 in each of the three classes. Each instance has 4 attributes: sepal length in cm, sepal width in cm, petal length in cm and petal width in cm. The classes are Iris Setosa, Iris Versicolour and Iris Virginica.

Teaching Assistant Evaluation (TAE): The Teaching Assistant Evaluation [15] dataset consists of evaluations of teaching performance over three regular semesters and two summer semesters of 151 teaching assistant assignments at the Statistics Department of the University of Wisconsin-Madison. The scores were divided into 3 roughly equal-sized categories to form the class variable. Each instance is formed by 5 attributes.

Wine: The Wine [16] dataset is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine. The dataset has 178 instances with 13 attributes each.

B. Experimental design

The configuration of the experiments performed for this paper is described in the following paragraphs.

1) FFANN Configuration: As each dataset has a very particular structure (number of classes and number of features), the neural network architecture changed for each dataset. Since the main goal of this paper is to determine which metaheuristic is better at minimizing the quadratic error, the authors decided to use FFANNs with only one hidden layer for simplicity. The number of neurons in the input layer is given by the number of features of the dataset, the number of neurons in the hidden layer is estimated as double the number of input neurons minus one, and the number of neurons in the output layer is equal to the number of classes of the dataset. All hidden and output neurons have the sigmoid function, shown in eq. 10, as their activation function:

f(u_i^l) = \frac{1}{1 + e^{-u_i^l}}    (10)

where u_i^l is the input of the neuron x_i^l, as explained in section II-A.
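The fitness function that both metaheuristics minimize is the quadratic error of eq. 9, evaluated on the FFANN encoded by a flat particle/genome vector (Fig. 1). The sketch below is our own minimal reading of that setup: the layout of the flat vector (weights followed by biases, layer by layer) is an assumption, not something the paper specifies.

```python
import numpy as np

def sigmoid(u):
    # Activation of eq. 10.
    return 1.0 / (1.0 + np.exp(-u))

def decode(vector, sizes):
    """Unpack a flat particle/genome into per-layer (W, b) pairs.
    `sizes` lists neurons per layer, e.g. [2, 3, 1]."""
    params, i = [], 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        W = vector[i:i + n_in * n_out].reshape(n_out, n_in)
        i += n_in * n_out
        b = vector[i:i + n_out]
        i += n_out
        params.append((W, b))
    return params

def quadratic_error(vector, sizes, X, T):
    """Eq. 9: sum over patterns p and outputs k of (t_k^(p) - y_k^(p))^2."""
    error = 0.0
    for x, t in zip(X, T):
        for W, b in decode(vector, sizes):
            x = sigmoid(W @ x + b)
        error += np.sum((t - x) ** 2)
    return error

# Example: 2 inputs, 3 hidden neurons (2*2 - 1, per the paper's sizing
# rule), 1 output. An all-zero weight vector makes every output 0.5.
sizes = [2, 3, 1]
dim = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
vector = np.zeros(dim)
X = np.array([[0.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0]])
err = quadratic_error(vector, sizes, X, T)  # (0 - 0.5)^2 + (1 - 0.5)^2 = 0.5
```

The problem dimension is then the total number of weights and biases, here 2*3 + 3 + 3*1 + 1 = 13, which is exactly the length of each PSO particle or DE individual.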
2) Metaheuristics Configuration: In order to gather information about the performance of the two proposed metaheuristics (PSO and DE) in the training phase of the FFANN, we used the quadratic error between the neural network output and the desired output. Our main objective is to minimize the difference between the current solution and the desired solution.

Equation 9 was used as the objective function in the proposed metaheuristic techniques PSO and DE. This approach transforms the original classification problem into an optimization problem, where we need to find the weights that minimize the fitness function.

For each dataset we applied 10,000 iterations of the proposed metaheuristics. Finally, we compared the last fitness value reported by each heuristic to analyze its performance.
PSO setup: PSO used 20 particles (each particle represents a complete FFANN array of weights), with a self-adapting constriction parameter χ. The neighbourhood memory coefficient is ϕ1 = 2.05 and the personal memory coefficient is ϕ2 = 2.05.

DE setup: DE used 20 individuals (as in PSO, each individual represents the complete set of weights of a FFANN), with F = 0.15. The mutation scheme used in this work was the DE/current-to-best/1 scheme (eq. 7).

Fig. 2. Training/Optimization Process for Glass dataset
We applied this design for each dataset in order to achieve
a fair comparison between PSO and DE.
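The two setups above can be sketched side by side. This is an illustrative reconstruction under stated assumptions, not the paper's code: χ is derived from ϕ1 + ϕ2 with Clerc's constriction formula, the DE sketch applies only the current-to-best/1 mutation (crossover is omitted), and a simple sphere function stands in for the FFANN quadratic error of eq. 9.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):
    # Stand-in objective; in the paper this would be the quadratic error
    # (eq. 9) of the FFANN encoded by the vector x.
    return np.sum(x ** 2)

NP, DIM, ITER = 20, 5, 300  # 20 particles/individuals, as in both setups

def run_pso():
    # Constriction-factor PSO: chi computed from phi = phi1 + phi2 = 4.1.
    phi1 = phi2 = 2.05
    phi = phi1 + phi2
    chi = 2.0 / abs(2.0 - phi - np.sqrt(phi ** 2 - 4.0 * phi))  # ~0.7298
    x = rng.uniform(-1, 1, (NP, DIM))
    v = np.zeros((NP, DIM))
    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(ITER):
        r1, r2 = rng.random((NP, DIM)), rng.random((NP, DIM))
        # Velocity update combines personal and global (neighbourhood) memory.
        v = chi * (v + phi1 * r1 * (pbest - x) + phi2 * r2 * (gbest - x))
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return pbest_f.min()

def run_de():
    # DE/current-to-best/1 mutation with greedy one-to-one selection.
    F = 0.15
    pop = rng.uniform(-1, 1, (NP, DIM))
    scores = np.array([fitness(p) for p in pop])
    for _ in range(ITER):
        best = pop[np.argmin(scores)]
        for i in range(NP):
            # Perturb x_i toward the best, plus a scaled difference of
            # two distinct random individuals.
            r1, r2 = rng.choice([j for j in range(NP) if j != i], 2,
                                replace=False)
            trial = pop[i] + F * (best - pop[i]) + F * (pop[r1] - pop[r2])
            fs = fitness(trial)
            if fs < scores[i]:
                pop[i], scores[i] = trial, fs
    return scores.min()

pso_err, de_err = run_pso(), run_de()
```

The sketch also makes the operation-count difference discussed later visible: per individual, DE performs one vector combination and one fitness evaluation, while PSO additionally maintains velocities and personal/global bests.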
C. Results

Once the test was executed for each dataset with the proposed metaheuristics, we obtained the results shown in Table I. Figures 2 to 6 show the training/optimization process on each dataset for both metaheuristics, PSO and DE.

Fig. 3. Training/Optimization Process for Ionosphere dataset

TABLE I. Final quadratic error for each dataset

    Dataset    PSO        DE
    Glass      57.881     133.51
    Iono       0.362      12.574
    Iris       1.49e-3    2.00
    TAE        48.67      78.30
    Wine       0.003      58.054
D. Discussion

The PSO algorithm is clearly the winner against the DE algorithm, but we need to consider the number of operations of both algorithms.

Fig. 4. Training/Optimization Process for Iris dataset

The number of operations of each algorithm directly affects the computational resources used, i.e. a higher number of operations produces a higher resource consumption. This situation is relevant for the results. The DE algorithm is simpler than the PSO algorithm for the following reasons:
• DE makes a mutation by means of a vector subtraction, and then evaluates the fitness function in order to decide whether the new solution replaces the old one.
• PSO needs to update each particle's velocity.
• PSO requires more complex operations to generate a new solution, i.e. it needs to evaluate local and global knowledge to derive a new solution.
• PSO checks, for each particle, whether another particle has found a better solution than the current one reported as BGlobal and GLocal.

If we consider each difference, we can assume that the PSO algorithm uses at least twice the computational resources of the DE algorithm. This excess of operations can be the reason for its better performance.

Fig. 5. Training/Optimization Process for TAE dataset

Fig. 6. Training/Optimization Process for Wine dataset

V. CONCLUSIONS

This paper has compared two metaheuristics for the FFANN training phase, using PSO and DE as training algorithms. DE has shown a simpler computing behaviour, since the number of operations made by this algorithm is at least half the number of operations made by PSO. The tests show that PSO performs better than DE in the minimization of the quadratic error used as fitness function.

This paper only works on the training phase, so as further work we propose to complete the classification process in order to validate that a minimum error in the training phase translates into a better classification rate.

ACKNOWLEDGEMENT

The authors thank the support received from CONACYT and DGEST (Grant 3528.10-P).

REFERENCES

[1] E. P. P. A. Derks, M. S. S. Pastor, and L. M. C. Buydens, "Robustness analysis of radial base function and multi-layered feed-forward neural network models," Chemometrics and Intelligent Laboratory Systems, vol. 28, no. 1, pp. 49-60, 1995. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016974399580039C
[2] J. Heaton, Introduction to Neural Networks for Java, Second Edition. Heaton Research, Inc., 2008.
[3] M. Friedman and A. Kandel, Introduction to Pattern Recognition: Statistical, Structural, Neural and Fuzzy Logic Approaches. World Scientific, 2000.
[4] V. G. Gudise and G. K. Venayagamoorthy, "Comparison of particle swarm optimization and backpropagation as training algorithms for neural networks," in Proceedings of the IEEE Swarm Intelligence Symposium 2003 (SIS 2003), 2003, pp. 110-117.
[5] J. Ilonen, J.-K. Kamarainen, and J. Lampinen, "Differential evolution training algorithm for feed-forward neural networks," Neural Processing Letters, vol. 17, no. 1, pp. 93-105, 2003.
[6] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proc. IEEE Int. Conf. Neural Networks, vol. 4, 1995, pp. 1942-1948.
[7] R. Storn and K. Price, "Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, pp. 341-359, December 1997. [Online]. Available: http://portal.acm.org/citation.cfm?id=596061.596146
[8] M. Clerc, Particle Swarm Optimization. USA: Wiley-ISTE, 2006.
[9] R. Poli, J. Kennedy, and T. Blackwell, "Particle swarm optimization," Swarm Intelligence, vol. 1, no. 1, pp. 33-57, Jun. 2007. [Online]. Available: http://dx.doi.org/10.1007/s11721-007-0002-0
[10] X. S. Yang, Nature-Inspired Metaheuristic Algorithms, 2nd ed. Luniver Press, 2008.
[11] J. Holland, Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975. [Online]. Available: http://mitpress.mit.edu/
[12] P. Zhong and M. Fukushima, "A regularized nonsmooth Newton method for multi-class support vector machines," in Systems Analysis, Optimization and Data Mining in Biomedicine. Taylor and Francis, Mar 2007, vol. 22, pp. 225-236.
[13] V. G. Sigillito, S. P. Wing, L. V. Hutton, and K. B. Baker, "Classification of radar returns from the ionosphere using neural networks," Johns Hopkins APL Technical Digest, vol. 10, pp. 262-266, 1989.
[14] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Human Genetics, vol. 7, no. 2, pp. 179-188, 1936. [Online]. Available: http://dx.doi.org/10.1111/j.1469-1809.1936.tb02137.x
[15] W. Loh and Y. Shih, "Split selection methods for classification trees," Statistica Sinica, 1997.
[16] K. Ali and M. Pazzani, "Error reduction through learning multiple descriptions," Machine Learning, vol. 24, 1996.