IECON'01: The 27th Annual Conference of the IEEE Industrial Electronics Society

Tuning of the Structure and Parameters of Neural Network using an Improved Genetic Algorithm

H. K. Lam, S. H. Ling, F. H. F. Leung and P. K. S. Tam

H. K. Lam, S. H. Ling and F. H. F. Leung: Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. P. K. S. Tam: Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.
hkl@eie.polyu.edu.hk, ensteve@eie.polyu.edu.hk, enfrank@polyu.edu.hk, enptam@hkpucc.polyu.edu.hk

The work described in this paper was substantially supported by a Research Grant of The Hong Kong Polytechnic University (project numbers B-Q394 and G-V954).

Abstract - This paper presents the tuning of the structure and parameters of a neural network using an improved genetic algorithm (GA). The improved GA is implemented by floating-point numbers. The processing time of the improved GA is faster than that of a GA implemented by binary numbers, as coding and decoding are not necessary. By introducing new genetic operators to the improved GA, it will also be shown that the improved GA performs better than the traditional GA on some benchmark test functions. A neural network with switches introduced to its links is proposed. By doing this, the proposed neural network can learn both the input-output relationship of an application and the network structure. Using the improved GA, the structure and the parameters of the neural network can be tuned. An application example on sunspot forecasting is given to show the merits of the improved GA and the proposed neural network.

I. INTRODUCTION

GA is a directed random search technique invented by Holland [1] in 1975, which is widely applied in optimization problems [1-2, 5]. It is especially useful for complex optimization problems where the number of parameters is large and analytical solutions are difficult to obtain. GA can help to find the optimal solution globally over a domain [1-2, 5]. It has been applied in different areas such as fuzzy control [9-11, 15], path planning [12], greenhouse climate control [13], modeling and classification [14], etc.

Neural networks have been proved to be universal approximators [16]. A 3-layer feed-forward neural network can approximate any nonlinear continuous function to an arbitrary accuracy. Neural networks are widely applied in areas such as prediction [7] and system modeling and control [16]. Owing to its particular structure, a neural network is very good at learning [2] using learning algorithms such as GA [1] and the back-propagation algorithm [2]. In general, the learning steps of a neural network are as follows. First, a network structure is defined with fixed numbers of inputs, hidden nodes and outputs. Second, a learning algorithm is chosen to realize the learning process. The fixed structure, however, may not provide the optimal performance within a defined training period. Sometimes the chosen neural network is larger than necessary for the required performance. When this happens, the training period may have to be longer (as the performance target is not met) and the implementation cost will also be increased.

The contributions of this paper are six-fold. First, new genetic operators are introduced to the improved GA. It will be shown that the improved GA performs more efficiently than the traditional GA [1-2, 5] on some benchmark De Jong's test functions [3-4, 6, 17]. Second, an improved GA is proposed. This improved GA is implemented by floating-point numbers; hence, the processing time is much faster than that of the traditional GA [1-2, 5]. Third, the improved GA has only one parameter (the population size), instead of three, to be chosen by the user. This makes the improved GA simple and easy to use, especially for users who do not have much knowledge of tuning. Fourth, a three-layer neural network with switches introduced in some links is proposed to facilitate the tuning of the network structure. As a result, a given fully connected neural network may no longer be fully connected after learning. This implies that the cost of implementing the proposed neural network, in terms of hardware, processing time and simulation time, can be reduced. Fifth, the improved GA is able to help tune the structure as well as the parameters of the proposed neural network. Sixth, as an application example, the proposed neural network tuned by the improved GA is used to estimate the number of sunspots [7-8]. It will be shown that a better performance can be obtained as compared with that from a traditional feed-forward neural network [2] tuned by the traditional GA [1-2, 5].

II. IMPROVED GENETIC ALGORITHM

Genetic algorithms (GAs) are powerful searching algorithms. The traditional GA process [1-2, 5] is shown in Fig. 1. First, a population of chromosomes is created. Second, the chromosomes are evaluated by a defined fitness function. Third, some of the chromosomes are selected for performing genetic operations. Fourth, genetic operations of crossover and mutation are performed. The produced offspring replace their parents in the initial population. This GA process repeats until a user-defined criterion is reached. However, a superior offspring is not guaranteed to be produced in each reproduction process. In this paper, the traditional GA is modified and new genetic operators are introduced to improve its performance. Our improved GA is implemented by floating-point numbers, and the processing time is shorter than that of a GA implemented by binary numbers as the coding and decoding processes are not needed [1-2, 5]. Two parameters of the traditional GA, the probabilities of crossover and mutation, are no longer needed; only the population size has to be defined. The improved GA process is shown in Fig. 2. Its details are given as follows.

A. Initial Population

The initial population is a potential solution set P. The first set of population is usually generated randomly:

$$P = \{p_1, p_2, \ldots, p_{pop\_size}\} \tag{1}$$

$$p_i = [p_{i1} \; p_{i2} \; \cdots \; p_{i\,no\_vars}], \quad i = 1, 2, \ldots, pop\_size \tag{2}$$

$$para^{min}_j \le p_{ij} \le para^{max}_j, \quad i = 1, 2, \ldots, pop\_size; \; j = 1, 2, \ldots, no\_vars \tag{3}$$

where pop_size denotes the population size; no_vars denotes the number of variables to be tuned; p_ij, i = 1, 2, ..., pop_size, j = 1, 2, ..., no_vars, are the parameters to be tuned; and para^min_j and para^max_j are the minimum and maximum values of the parameter p_ij. It can be seen from (1) to (3) that the potential solution set P contains some candidate solutions p_i (chromosomes), and each chromosome p_i contains some variables p_ij (genes).
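As a minimal sketch of (1)-(3) in Python (the names init_population, para_min and para_max are ours, mirroring the notation above, and are not from the paper), the floating-point population can be generated as:

```python
import numpy as np

def init_population(pop_size, para_min, para_max, rng=None):
    """Generate pop_size chromosomes of floating-point genes, per (1)-(3)."""
    rng = rng or np.random.default_rng()
    para_min = np.asarray(para_min, dtype=float)  # one bound per gene p_ij
    para_max = np.asarray(para_max, dtype=float)
    # Each gene p_ij is drawn uniformly within [para_min_j, para_max_j].
    return para_min + rng.random((pop_size, para_min.size)) * (para_max - para_min)
```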

B. Evaluation

Each chromosome in the population will be evaluated by a defined fitness function. The better chromosomes will return higher values in this process. The fitness function to evaluate a chromosome in the population can be written as

$$\text{fitness} = f(p_i) \tag{4}$$

The form of the fitness function depends on the application.

C. Selection

Two chromosomes in the population will be selected to undergo genetic operations for reproduction. It is believed that high-potential parents will produce better offspring (survival of the best ones). A chromosome having a higher fitness value should therefore have a higher chance to be selected. The selection can be done by assigning a probability $q_i$ to the chromosome $p_i$:

$$q_i = \frac{f(p_i)}{\sum_{j=1}^{pop\_size} f(p_j)}, \quad i = 1, 2, \ldots, pop\_size \tag{5}$$

The cumulative probability $\hat{q}_i$ for the chromosome $p_i$ is defined as

$$\hat{q}_i = \sum_{j=1}^{i} q_j, \quad i = 1, 2, \ldots, pop\_size \tag{6}$$

The selection process starts by randomly generating a nonzero floating-point number $d \in (0, 1]$. Then, the chromosome $p_i$ is chosen if $\hat{q}_{i-1} < d \le \hat{q}_i$, i = 1, 2, ..., pop_size, with $\hat{q}_0 = 0$. It can be observed from this selection process that a chromosome having a larger $f(p_i)$ will have a higher chance to be selected. Consequently, the best chromosomes will get more copies, the average will stay, and the worst will die off. In the selection process, only two chromosomes will be selected to undergo the genetic operations.
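A minimal Python sketch of this roulette-wheel selection, following (5) and (6) (the helper name select_parents is ours, not the paper's):

```python
import numpy as np

def select_parents(population, fitness, rng=None):
    """Select two chromosomes with probability proportional to fitness."""
    rng = rng or np.random.default_rng()
    q = np.asarray(fitness, dtype=float)
    q = q / q.sum()               # selection probabilities q_i, eq. (5)
    q_hat = np.cumsum(q)          # cumulative probabilities q_hat_i, eq. (6)
    parents = []
    for _ in range(2):            # only two chromosomes are selected
        d = rng.uniform(0.0, 1.0)
        i = int(np.searchsorted(q_hat, d))  # first i with q_hat_i >= d
        parents.append(population[min(i, len(population) - 1)])
    return parents
```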
D. Genetic Operations

The genetic operations generate new chromosomes (offspring) from their parents after the selection process. They include the averaging and the mutation operations. The averaging operation is mainly for exchanging information between the two parents obtained in the selection process. The operation is realized by taking the average of the parents. For instance, if the two selected chromosomes are $p_1$ and $p_2$, the offspring generated by the averaging process is given by

$$os = \frac{p_1 + p_2}{2} \tag{7}$$

This offspring (7) will then undergo the mutation operation. The mutation operation changes the genes of the chromosomes. Consequently, the features of the chromosomes inherited from their parents can be changed. Three new offspring will be generated by the mutation operation, defined as

$$nos_j = [os_1 \; os_2 \; \cdots \; os_{no\_vars}] + [b_1 \Delta nos_1 \; b_2 \Delta nos_2 \; \cdots \; b_{no\_vars} \Delta nos_{no\_vars}], \quad j = 1, 2, 3 \tag{8}$$

where $b_i$, i = 1, 2, ..., no_vars, can only take the value of 0 or 1, and $\Delta nos_i$, i = 1, 2, ..., no_vars, are randomly generated numbers such that $para^{min}_i \le os_i + \Delta nos_i \le para^{max}_i$. The first new offspring (j = 1) is obtained according to (8) with only one $b_i$ (i being randomly generated within the range) allowed to be 1 and all the others set to zero. The second new offspring is obtained according to (8) with some randomly chosen $b_i$ set to 1 and the others set to zero. The third new offspring is obtained according to (8) with all $b_i = 1$. These three new offspring will then be evaluated using the fitness function of (4). The one with the largest fitness value $f_b$ will replace the chromosome with the smallest fitness value $f_s$ in the population if $f_b > f_s$.

After the operations of selection, averaging and mutation, a new population is generated. This new population will repeat the same process. Such an iterative process can be terminated when the result reaches a defined condition, e.g., the change of the fitness values between the current and the previous iteration is less than 0.001. For the traditional GA process depicted in Fig. 1, the offspring generated may not be better than their parents, which implies that the searched target is not necessarily approached monotonically after each iteration. Under the proposed improved GA process, however, if $f_b < f_s$, the previous population is used again in the next genetic cycle.
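The averaging and mutation operators of (7) and (8) can be sketched as follows (a hypothetical Python rendering; the way each $\Delta nos_i$ is sampled within the bounds is an assumption consistent with the constraint stated above):

```python
import numpy as np

def reproduce(p1, p2, para_min, para_max, fitness_fn, rng=None):
    """Averaging, eq. (7), then the three mutated offspring of eq. (8);
    returns the offspring with the largest fitness value."""
    rng = rng or np.random.default_rng()
    no_vars = p1.size
    os = (p1 + p2) / 2.0                     # averaging operation, eq. (7)
    offspring = []
    for j in range(3):
        b = np.zeros(no_vars)
        if j == 0:
            b[rng.integers(no_vars)] = 1.0   # exactly one b_i = 1
        elif j == 1:
            b = (rng.random(no_vars) < 0.5).astype(float)  # some b_i = 1
        else:
            b[:] = 1.0                       # all b_i = 1
        # Delta_nos_i chosen so that para_min_i <= os_i + Delta_nos_i <= para_max_i.
        delta_nos = rng.uniform(para_min, para_max) - os
        offspring.append(os + b * delta_nos)
    return max(offspring, key=fitness_fn)
```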

III. BENCHMARK TEST FUNCTIONS

De Jong's test functions [3-4, 6, 17] are used as the benchmark test functions to examine the applicability and efficiency of the improved GA. Five test functions, $f_i(x)$, i = 1, 2, 3, 4, 5, will be used, where $x = [x_1 \; x_2 \; \cdots \; x_n]$ and n is an integer denoting the dimension of the vector x. The five test functions are defined as follows:

$$f_1(x) = \sum_{i=1}^{n} x_i^2, \quad -5.12 \le x_i \le 5.12 \tag{9}$$

where n = 3 and the minimum point is at $f_1(0, 0, 0) = 0$.

$$f_2(x) = \sum_{i=1}^{n-1} \left(100\,(x_{i+1} - x_i^2)^2 + (x_i - 1)^2\right), \quad -2.048 \le x_i \le 2.048 \tag{10}$$

where n = 2 and the minimum point is at $f_2(1, 1) = 0$.

$$f_3(x) = 6n + \sum_{i=1}^{n} \mathrm{floor}(x_i), \quad -5.12 \le x_i \le 5.12 \tag{11}$$

where n = 5 and the minimum value 0 is reached for $x_i \in [-5.12, -5)$. The floor function, floor(·), rounds its argument down to an integer.

$$f_4(x) = \sum_{i=1}^{n} i\, x_i^4 + \mathrm{Gauss}(0, 1), \quad -1.28 \le x_i \le 1.28 \tag{12}$$

where n = 30 and the minimum point is at $f_4(0, \ldots, 0) = 0$. Gauss(0, 1) is a function that generates uniformly a floating-point number between 0 and 1 inclusively.

$$f_5(x) = \left[\frac{1}{K} + \sum_{j=1}^{25} \frac{1}{j + \sum_{i=1}^{2} (x_i - a_{ij})^6}\right]^{-1}, \quad -65.536 \le x_i \le 65.536 \tag{13}$$

where

$$[a_{ij}] = \begin{bmatrix} -32 & -16 & 0 & 16 & 32 & -32 & -16 & 0 & 16 & 32 & \cdots & -32 & -16 & 0 & 16 & 32 \\ -32 & -32 & -32 & -32 & -32 & -16 & -16 & -16 & -16 & -16 & \cdots & 32 & 32 & 32 & 32 & 32 \end{bmatrix}$$

K = 500 and the minimum point is at $f_5(-32, -32) \approx 1$.

It should be noted that the minimum values of all functions in the defined domain are zero except for $f_5(x)$. The fitness function for $f_1$ to $f_4$ is defined as

$$\text{fitness} = \frac{1}{1 + f_i(x)}, \quad i = 1, 2, 3, 4 \tag{14}$$

and the fitness function for $f_5$ is defined as

$$\text{fitness} = \frac{1}{f_5(x)} \tag{15}$$
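For concreteness, the five test functions can be written in Python as below (a sketch assuming the standard De Jong definitions where the scanned original is hard to read; Gauss(0, 1) is implemented as a uniform draw on [0, 1], as described above):

```python
import numpy as np

rng = np.random.default_rng()

def f1(x):                          # eq. (9), n = 3
    return np.sum(x ** 2)

def f2(x):                          # eq. (10), n = 2
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)

def f3(x):                          # eq. (11), n = 5
    return 6 * x.size + np.sum(np.floor(x))

def f4(x):                          # eq. (12), n = 30
    return np.sum(np.arange(1, x.size + 1) * x ** 4) + rng.random()

# a_ij matrix of eq. (13): the first row cycles through -32..32 and the
# second row repeats each of those values five times.
A = np.array([[-32, -16, 0, 16, 32] * 5,
              [a for a in (-32, -16, 0, 16, 32) for _ in range(5)]], dtype=float)

def f5(x):                          # eq. (13), K = 500
    j = np.arange(1, 26)
    return 1.0 / (1.0 / 500.0 + np.sum(1.0 / (j + np.sum((x[:, None] - A) ** 6, axis=0))))
```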
The improved GA is applied to these five test functions, and the results are compared with those obtained by the traditional GA [5]. For each test function, the simulation takes 500 iterations and the population size is 20. Each parameter of the traditional GA is encoded into a 40-bit chromosome, and the probabilities of crossover and mutation are 0.25 and 0.03 respectively. The initial values of x in the population for a test function are set to be the same; for tests 1 to 5, the initial values are [1 1 1], ..., respectively. The average fitness values over 30 simulation runs of the improved and traditional GAs are tabulated in Table I. It can be seen from Table I that the performance of the improved GA is better than that of the traditional GA, and the processing time of the improved GA is much shorter than that of the traditional GA.
IV. NEURAL NETWORK WITH LINK SWITCHES AND TUNING USING THE IMPROVED GA

In this section, a neural network with link switches is presented. By introducing a switch to a link, not only the parameters but also the structure of the neural network can be tuned using the improved GA.

A. Neural Network with Link Switches

Neural networks [5] for tuning usually have a fixed structure. The number of connections must be large enough to fit a given application, which may cause the neural network structure to be unnecessarily complex and increase the implementation cost. In this section, a multiple-input-multiple-output three-layer neural network is proposed as shown in Fig. 3. The main difference is that a unit step function is introduced to each link:

$$\delta(\alpha) = \begin{cases} 1, & \alpha \ge 0 \\ 0, & \alpha < 0 \end{cases} \tag{16}$$

This is equivalent to adding a switch to each link of the neural network. Referring to Fig. 3, the input-output relationship of the proposed multiple-input-multiple-output three-layer neural network is as follows:

$$y_k(t) = \sum_{j=1}^{n_h} \delta(s^H_{jk})\, w_{jk}\, \mathrm{logsig}\!\left(\sum_{i=1}^{n_{in}} \delta(s^I_{ij})\, v_{ij}\, z_i(t) - \delta(s^{b_1}_j)\, b^1_j\right) - \delta(s^{b_2}_k)\, b^2_k, \quad k = 1, 2, \ldots, n_{out} \tag{17}$$

$z_i(t)$, i = 1, 2, ..., n_in, are the inputs, which are functions of a variable t; n_in denotes the number of inputs; $v_{ij}$, i = 1, 2, ..., n_in, j = 1, 2, ..., n_h, denotes the weight of the link between the i-th input and the j-th hidden node; n_h denotes the number of hidden nodes; $s^I_{ij}$, i = 1, 2, ..., n_in, j = 1, 2, ..., n_h, denotes the parameter of the link switch from the i-th input to the j-th hidden node; $s^H_{jk}$, j = 1, 2, ..., n_h, k = 1, 2, ..., n_out, denotes the parameter of the link switch from the j-th hidden node to the k-th output; n_out denotes the number of outputs of the proposed neural network; $b^1_j$ and $b^2_k$ denote the biases for the hidden nodes and output nodes respectively; $s^{b_1}_j$ and $s^{b_2}_k$ denote the parameters of the link switches of the biases to the hidden and output layers respectively; and logsig(·) denotes the logarithmic sigmoid function:

$$\mathrm{logsig}(a) = \frac{1}{1 + e^{-a}}, \quad a \in \mathbb{R} \tag{18}$$

$y_k(t)$, k = 1, 2, ..., n_out, is the k-th output of the proposed neural network. By introducing the switches, both the weights $v_{ij}$ and $w_{jk}$ and the switch states can be tuned. It can be seen that the weights of the links govern the input-output relationship of the neural network, while the switches of the links govern its structure.
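The forward computation of (16)-(18) can be sketched in Python as follows (function and argument names are ours; a link is taken to be connected when its switch parameter is non-negative, matching the unit step above):

```python
import numpy as np

def step(s):
    """Unit step function delta(.) of eq. (16)."""
    return (np.asarray(s) >= 0).astype(float)

def logsig(a):
    """Logarithmic sigmoid of eq. (18)."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(z, v, w, b1, b2, s_in, s_hid, s_b1, s_b2):
    """Network output of eq. (17).

    z: (n_in,) inputs; v: (n_in, n_h) and w: (n_h, n_out) link weights;
    b1: (n_h,) and b2: (n_out,) biases; s_*: link-switch parameters of
    matching shapes whose signs decide which links are connected.
    """
    hidden = logsig(z @ (step(s_in) * v) - step(s_b1) * b1)
    return hidden @ (step(s_hid) * w) - step(s_b2) * b2
```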
B. Tuning of the Parameters and Structure

In this section, the proposed neural network is employed to learn the input-output relationship of an application using the improved GA. The input-output relationship is described by

$$y^d(t) = g(z^d(t)), \quad t = 1, 2, \ldots, n_d \tag{19}$$

where $y^d(t) = [y^d_1(t) \; y^d_2(t) \; \cdots \; y^d_{n_{out}}(t)]$ and $z^d(t) = [z^d_1(t) \; z^d_2(t) \; \cdots \; z^d_{n_{in}}(t)]$ are the desired outputs and the given inputs, respectively, of an unknown nonlinear function g(·), and $n_d$ denotes the number of input-output data pairs. The fitness function is defined as

$$\text{fitness} = \frac{1}{1 + err} \tag{20}$$

$$err = \sum_{k=1}^{n_{out}} \frac{\sum_{t=1}^{n_d} \left| y^d_k(t) - y_k(t) \right|}{n_k\, n_d} \tag{21}$$

The objective is to maximize the fitness value of (20) using the improved GA by setting the chromosome to be $[s^H_{jk} \; w_{jk} \; s^I_{ij} \; v_{ij} \; s^{b_1}_j \; b^1_j \; s^{b_2}_k \; b^2_k]$ for all i, j, k. Referring to (21), the maximum fitness value can sometimes be dominated by the error values of certain outputs that may not be of great interest. To avoid this, the weights $n_k$, k = 1, 2, ..., n_out, are chosen to reduce the effect of the errors from these dominating outputs. It can be seen from (20) and (21) that a larger fitness value implies a smaller error value.
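A Python sketch of (20) and (21) follows; the per-output placement of the weights $n_k$ is an assumption, since (21) is only partly legible in the scanned original:

```python
import numpy as np

def err(y_desired, y_model, n_k):
    """Weighted error of eq. (21); arrays have shape (n_d, n_out)."""
    n_d = y_desired.shape[0]
    per_output = np.sum(np.abs(y_desired - y_model), axis=0)  # sum over t
    return np.sum(per_output / (np.asarray(n_k, dtype=float) * n_d))

def fitness(y_desired, y_model, n_k):
    """Fitness of eq. (20): a larger fitness means a smaller error."""
    return 1.0 / (1.0 + err(y_desired, y_model, n_k))
```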

V. APPLICATION EXAMPLE

An application example on forecasting the sunspot number [7-8] is given in this section. The sunspot cycles from 1700 to 1980 are shown in Fig. 4. The cycles are non-linear, non-stationary and non-Gaussian, which makes them difficult to model and predict. We use the proposed 3-layer neural network (3-input-single-output) with link switches for the sunspot number forecasting. The inputs, $z_i$, of the proposed neural network are defined as $z_1(t) = y^d(t-1)$, $z_2(t) = y^d(t-2)$ and $z_3(t) = y^d(t-3)$, where t denotes the year and $y^d(t)$ is the sunspot number at the year t. The sunspot numbers of the first 180 years (i.e., 1705 ≤ t ≤ 1884) are used to train the proposed neural network. Referring to (17), the proposed neural network used for the sunspot forecasting is governed by

$$y_1(t) = \sum_{j=1}^{n_h} \delta(s^H_{j1})\, w_{j1}\, \mathrm{logsig}\!\left(\sum_{i=1}^{3} \delta(s^I_{ij})\, v_{ij}\, z_i(t) - \delta(s^{b_1}_j)\, b^1_j\right) - \delta(s^{b_2}_1)\, b^2_1 \tag{22}$$
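Preparing the lagged training pairs for (22) is straightforward; a minimal sketch (the sunspots mapping from year to observed number is an assumed data source):

```python
import numpy as np

def make_dataset(sunspots, t_start, t_end):
    """Return inputs Z of shape (n_d, 3) and targets Y of shape (n_d,)."""
    Z, Y = [], []
    for t in range(t_start, t_end + 1):
        # z1(t) = y^d(t-1), z2(t) = y^d(t-2), z3(t) = y^d(t-3)
        Z.append([sunspots[t - 1], sunspots[t - 2], sunspots[t - 3]])
        Y.append(sunspots[t])
    return np.array(Z), np.array(Y)

# Training pairs cover the first 180 years, 1705 <= t <= 1884:
# Z_train, Y_train = make_dataset(sunspots, 1705, 1884)
```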
The value of $n_h$ is changed from 3 to 7 to test the learning performance. The fitness function is defined as

$$\text{fitness} = \frac{1}{1 + err} \tag{23}$$

$$err = \frac{1}{180} \sum_{t=1705}^{1884} \left| y^d(t) - y_1(t) \right| \tag{24}$$

The improved GA is employed to tune the parameters and structure of the neural network of (22). The objective is to maximize the fitness function of (23); a larger value of the fitness function indicates a smaller value of err of (24). The best fitness value is 1 and the worst one is 0. The population size used for the improved GA is 20. The lower and the upper bounds of the link weights are defined as ..., i = 1, 2, 3; j = 1, 2, ..., $n_h$; k = 1. The chromosomes used for the improved GA are $[s^H_{j1} \; w_{j1} \; s^I_{ij} \; v_{ij} \; s^{b_1}_j \; b^1_j \; s^{b_2}_1 \; b^2_1]$.
The initial values of the link weights are randomly generated. For comparison purposes, a fully connected 3-layer feed-forward neural network (3-input-single-output) [2] trained by the traditional GA [1-2, 5] is used for the sunspot number forecasting. The working conditions are exactly the same as those mentioned above. Additionally, a bit length of 10 is used for the parameter coding, and the probabilities of crossover and mutation are 0.25 and 0.03 respectively. In both approaches, the learning processes are carried out on a personal computer with a PIII 500 MHz CPU, and the number of iterations to train the neural network is 2000.
The tuned neural networks are used to forecast the sunspot number during the years 1885-1980. Fig. 6 shows the simulation results of this forecast using the proposed neural network trained with the improved GA (dashed lines) and the traditional feed-forward network trained with the traditional GA (dotted lines); the actual sunspot numbers are represented by solid lines. The number of hidden nodes $n_h$ changes from 3 to 7. The simulation results are tabulated in Table II and Table III. From Table II, it is observed that the proposed neural network trained with the improved GA provides better results than the traditional feed-forward neural network trained with the traditional GA in terms of accuracy (fitness values), number of links and learning time. The training error (governed by (24)) and the forecasting error (governed by $\frac{1}{96}\sum_{t=1885}^{1980} |y^d(t) - y_1(t)|$), both represented in mean absolute error (MAE), are tabulated in Table III. It can be observed from Table III that our approach performs better than the traditional approach. Referring to Table III, the best result is obtained when the number of hidden nodes is 4 and the number of iterations for the learning process is 2000. The number of connected links is 16 after learning (the number of fully connected links is 21, which includes the bias links), i.e., about a 24% reduction of links after learning. The corresponding training error and forecasting error in terms of MAE are 11.3869 and 13.0378 respectively.
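Both entries of Table III are mean absolute errors; as a one-line sketch:

```python
import numpy as np

def mae(y_desired, y_model):
    """MAE: eq. (24) over 1705-1884 for training, 1885-1980 for forecasting."""
    return float(np.mean(np.abs(np.asarray(y_desired) - np.asarray(y_model))))
```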

VI. CONCLUSION

An improved GA has been proposed in this paper. The improved GA is implemented by floating-point numbers. As no coding and decoding of the chromosomes are necessary, the processing time for learning using the improved GA is shorter. New genetic operators have been introduced to the improved GA. Using the benchmark De Jong's test functions, it has been shown that the improved GA performs more efficiently than the traditional GA. Besides, by introducing a switch to each link, a neural network that facilitates the tuning of its structure has been proposed. Using the improved GA, the proposed neural network is able to learn both the input-output relationship of an application and the network structure. As a result, a given fully connected neural network can be reduced to a partly connected network after learning, which implies that the implementation cost of the neural network can be reduced. An application example on forecasting the sunspot numbers using the proposed neural network trained with the improved GA has been given. The simulation results have been compared with those obtained by a traditional feed-forward network trained by the traditional GA. It has been shown that our proposed network trained by the improved GA performs better.

REFERENCES

[1] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, 1975.
[2] D. T. Pham and D. Karaboga, Intelligent Optimization Techniques: Genetic Algorithms, Tabu Search, Simulated Annealing and Neural Networks, Springer, 2000.
[3] Y. Hanaki, T. Hashiyama and S. Okuma, "Accelerated evolutionary computation using fitness estimation," Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC '99), vol. 1, 1999, pp. 643-648.
[4] K. A. De Jong, "An analysis of the behavior of a class of genetic adaptive systems," Ph.D. thesis, University of Michigan, Ann Arbor, MI, 1975.
[5] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, second, extended edition, Springer-Verlag, 1994.
[6] X. Yao and Y. Liu, "Evolutionary programming made faster," IEEE Trans. Evolutionary Computation, vol. 3, no. 2, July 1999, pp. 82-102.
[7] M. Li, K. Mehrotra, C. Mohan and S. Ranka, "Sunspot numbers forecasting using neural networks," Proceedings of the 5th IEEE International Symposium on Intelligent Control, 1990, pp. 524-528.
[8] T. J. Cholewo and J. M. Zurada, "Sequential network construction for time series prediction," International Conference on Neural Networks, vol. 4, 1997, pp. 2034-2038.
[9] B. D. Liu, C. Y. Chen and J. Y. Tsao, "Design of adaptive fuzzy logic controller based on linguistic-hedge concepts and genetic algorithms," IEEE Trans. Systems, Man and Cybernetics, Part B, vol. 31, no. 1, Feb. 2001, pp. 32-53.
[10] Y. S. Zhou and L. Y. Lai, "Optimal design for fuzzy controllers by genetic algorithms," IEEE Trans. Industry Applications, vol. 36, no. 1, Jan.-Feb. 2000, pp. 93-97.
[11] C. F. Juang, J. Y. Lin and C. T. Lin, "Genetic reinforcement learning through symbiotic evolution for fuzzy controller design," IEEE Trans. Systems, Man and Cybernetics, Part B, vol. 30, no. 2, April 2000, pp. 290-302.
[12] H. Juidette and H. Youlal, "Fuzzy dynamic path planning using genetic algorithms," Electronics Letters, vol. 36, no. 4, Feb. 2000, pp. 374-376.
[13] R. Caponetto, L. Fortuna, G. Nunnari, L. Occhipinti and M. G. Xibilia, "Soft computing for greenhouse climate control," IEEE Trans. Fuzzy Systems, vol. 8, no. 6, Dec. 2000, pp. 753-760.
[14] M. Setnes and H. Roubos, "GA-fuzzy modeling and classification: complexity and performance," IEEE Trans. Fuzzy Systems, vol. 8, no. 5, Oct. 2000, pp. 509-522.
[15] K. Belarbi and F. Titel, "Genetic algorithm for the design of a class of fuzzy controllers: an alternative approach," IEEE Trans. Fuzzy Systems, vol. 8, no. 4, Aug. 2000, pp. 398-405.
[16] M. Brown and C. Harris, Neurofuzzy Adaptive Modelling and Control, Prentice Hall, 1994.
[17] S. Amin and J. L. Fernandez-Villacanas, "Dynamic local search," Second International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, 1997, pp. 129-132.

Fig. 1. Traditional GA process (initial population of chromosomes, evaluation, selection, genetic operations, subpopulation of offspring).

Fig. 2. Improved GA process (initial population of chromosomes, evaluation, selection, new genetic operations, subpopulation of offspring).

Fig. 3. The proposed 3-layer neural network.

Fig. 4. Sunspot cycles from year 1700 to 1980.

Fig. 6. Simulation results of a 96-year prediction using the proposed neural network with the improved GA (dashed line) and the traditional network with the traditional GA (dotted line), compared with the actual sunspot numbers (solid line) for the years 1885-1980. Panels (a)-(e) correspond to the number of hidden nodes ($n_h$) = 3, 4, 5, 6 and 7 respectively.

Table I. Simulation results of the improved and the traditional GAs.

Table II. Simulation results of forecasting the sunspot number after 2000 iterations of learning.

Table III. Training error and forecasting error represented in mean absolute error (MAE).
