(2013) A Coevolutionary Approach To Learn Animal Behaviour Through Controlled Interaction
This paper is organized as follows. Sec. 2 presents related work. Sec. 3 presents the methodology used, including the simulated animal behavior and the proposed coevolutionary algorithm. A proof-of-concept study, conducted in computer simulations, is presented in Sec. 4. Sec. 5 concludes the paper.

[Figure 1: The animal's speed as a function of the light intensity (axes: light intensity, with marks at 0.1, 0.9, and 1.0; speed, with marks at 0 and 0.5).]

2. RELATED WORK
ous space. The simulation advances in discrete time steps t ∈ {0, 1, 2, . . .}. The (ambient) light intensity in the environment, I, can be varied continuously between 0 and 1. The animal distinguishes between three levels of light intensity: low (0 ≤ I < I_L), medium (I_L ≤ I ≤ I_H), and high (I_H < I ≤ 1), with each level corresponding to a state of the animal. Hereafter, these states will be referred to as L, M, and H.

If the animal is in state M at time t, its speed, s(t) ∈ R, varies linearly with I(t) as:

s(t) = k (I(t) − 0.5),   (1)

where k is a constant.

If the animal is in state L, its behavior depends on the number of H to L transitions that have occurred since it was last in state M (or since t = 0, if the animal has never been in state M). If no transitions have occurred, then the speed of the animal is k (I_L − 0.5), and remains so for as long as the animal remains in state L. If one H to L transition has occurred, then the speed of the animal decays exponentially with a rate α1 for every time step that the animal remains in state L. Hence, in the first time step that the animal is in state L, the speed is k (I_L − 0.5); it then changes to α1 k (I_L − 0.5), α1² k (I_L − 0.5), and so on. If two H to L transitions have occurred since the animal was last in state M (or since t = 0, if the animal has never been in state M), then the rate is α2; if three or more transitions have occurred, then the rate is α3.

The behavior of the animal in state H is analogous to that in state L. The possible exponential change rates are also α1, α2, and α3; however, which one of these applies now depends on the number of L to H transitions that have occurred since the last M state.

Table 1 shows an example state sequence for the animal, along with the factor by which the animal's 'base' speed (i.e., the speed shown in Fig. 1) is multiplied in each time step.

Here, I_L and I_H are set to 0.1 and 0.9, respectively, and k is set to 1.25; hence, the lower and upper saturation values of the speed are k (I_L − 0.5) = −0.5 and k (I_H − 0.5) = 0.5. The exponential rates of change are set to α1 = 0.8, α2 = 0.4, and α3 = 0.2. Thus, in each case, the animal's speed decays exponentially towards zero.

3.2 Coevolutionary Approach and Implementation

The coevolutionary algorithm is comprised of two populations, one of models and one of classifiers, that co-evolve with each other competitively. The fitness of the classifiers depends solely on their ability to distinguish the behavior of the models from the behavior of the 'real' animal, described in Sec. 3.1. The fitness of the models depends solely on their ability to mislead the classifiers into making the wrong judgement, that is, classifying them as the 'real' animal.

3.2.1 Model Structure

In this paper, both animal and model share the same structure. The task is thus to estimate the parameters k, α1, α2, and α3 described in Sec. 3.1. As a consequence of this choice, it is simple to gauge the quality of the models obtained (as discussed in the results section). It is worth mentioning that the fitness of the models depends solely on the performance of the classifiers, and that the latter do not have any knowledge about the model structure or parameters.

Figure 2: The structure of the classifiers is a recurrent Elman neural network with two inputs, three hidden neurons, and two output neurons. See the text for details.

3.2.2 Classifier Structure

The structure of the classifiers is a recurrent Elman neural network [6] with two inputs, three hidden neurons, and two output neurons (see Fig. 2). The network has a total of 26 parameters, all of which can assume values in R. The activation function used in the hidden and output neurons is the logistic sigmoid, which has the range (0, 1) and is defined as sig(·) = 1 / (1 + exp(−(·))).

One of the inputs to the network is the light intensity in the environment at time step t, I(t) ∈ [0, 1], while the other input is the speed, s(t).

In order to make a judgement between a model and the 'real' animal, the network observes the behavior (speed) of the animal over a period of time; here, 10 s at 0.1 s intervals, for a total of T = 100 time steps. In addition, the network is also in control of the light intensity in the animal's environment. At time t = 0, the value of the light intensity is chosen randomly with a uniform distribution in the range [0, 1]. The neural network is then updated, using I(0) and s(0). The value of the light intensity for the next time step is obtained from the neural network's output neuron O1, and the process repeats. After having iterated through all the time steps, the final value of output neuron O2 is used to make a judgement: the network decides on a model if O2 < 0.5, and on the real animal if O2 ≥ 0.5.

3.2.3 Optimization Algorithm

The algorithm used here is based on a (µ + λ) evolution strategy with self-adaptive mutation strengths [1], and can be thought of as consisting of two sub-algorithms, one for the models and another for the classifiers, which are identical and do not interact with each other except in the fitness calculation step (described in Sec. 3.2.4).

In this algorithm, an individual is a 2-tuple, a = (x, σ), where x ∈ R^n represents the objective parameters, and σ ∈ (0, ∞)^n represents the mutation strengths. The i-th mutation strength in σ corresponds to the i-th element of x. For the model sub-algorithm, n = 4; for the classifier sub-algorithm, n = 26.

Each generation g comprises a population of µ = 50 individuals:

P(g) = {a_1(g), a_2(g), . . . , a_µ(g)}.
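To make this representation concrete, the 2-tuple individuals a = (x, σ) and the first-generation initialization stated in the text (objective parameters 0.0, mutation strengths 1.0) can be sketched in Python. This is a minimal sketch; the class and variable names are ours, not the paper's.

```python
import numpy as np

class Individual:
    """An ES individual a = (x, sigma): objective parameters plus
    per-element mutation strengths (a sketch; naming is ours)."""
    def __init__(self, n):
        self.x = np.zeros(n)      # objective parameters, initialized to 0.0
        self.sigma = np.ones(n)   # mutation strengths, initialized to 1.0

MU = 50  # mu: individuals per generation

def initial_population(n):
    """P(0) for one sub-algorithm."""
    return [Individual(n) for _ in range(MU)]

models = initial_population(4)        # model sub-algorithm: k, alpha1..alpha3
classifiers = initial_population(26)  # classifier sub-algorithm: 26 weights
```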
In the population of the first generation, P(0), all the objective parameters are initialized to 0.0 and all the mutation strengths are initialized to 1.0. Thereafter, in every generation g, the µ parent individuals are first used to create λ = 50 offspring individuals by recombination. For the generation of each recombined individual a′_k(g), k ∈ {1, 2, . . . , λ}, two individuals are chosen randomly, with replacement, from the parent population: a_χ(g) and a_ψ(g), where χ, ψ ∈ {1, 2, . . . , µ}. Discrete and intermediary recombination are then used to generate the objective parameters and the mutation strengths of the recombined individual, respectively:

x′_k,i(g) = x_χ,i(g) OR x_ψ,i(g),   (2)

σ′_k,i(g) = (σ_χ,i(g) + σ_ψ,i(g)) / 2,   (3)

where i ∈ {1, 2, . . . , n} indexes the elements within the vectors and, in Eq. 2, the selection is performed randomly and with equal probability.

Each of the λ recombined individuals is then mutated in order to obtain the final offspring population, P″(g). This is done according to:

σ″_k,i(g) = σ′_k,i(g) exp(τ′ N_k(0, 1) + τ N_k,i(0, 1)),   (4)

x″_k,i(g) = x′_k,i(g) + σ″_k,i(g) N_k,i(0, 1),   (5)

for all {k, i}, where k ∈ {1, 2, . . . , λ} indexes the individuals within the population and i ∈ {1, 2, . . . , n} indexes the elements within the vectors. Eq. 4 generates the perturbed mutation strength from the original one according to a log-normal distribution. Eq. 5 mutates the objective parameter according to a normal distribution having the perturbed mutation strength as its standard deviation. In Eq. 4, N_k(0, 1) and N_k,i(0, 1) are both random numbers generated from a standard normal distribution; however, the former is generated once for each individual (i.e., for each value of k), while the latter is generated separately for each element within each individual (i.e., for each combination of k and i). The parameters τ′ and τ determine the learning rates of the mutation strengths, and are set as τ′ = 1/√(2n) and τ = 1/√(2√n) (similar to [19]).

Once the offspring population has been generated, the µ individuals with the highest fitness (see Sec. 3.2.4) from the combined population, P(g) ∪ P″(g) (which contains µ + λ individuals), are selected as the parents that form the population of the next generation, P(g+1). Individuals with equal fitness have an equal chance of being selected.

3.2.4 Fitness Calculation

The fitness of each model is obtained by evaluating it with each of the classifiers in the competing population (100 in total). For every classifier that wrongly judges the model as being the real animal, the model's fitness increases by one. Therefore, in the end, the model obtains a fitness in the set {0, 1, 2, . . . , 100}.

The fitness of each classifier is obtained by using it to evaluate (i) each model in the competing population (100 in total) once, and (ii) the real animal 100 times. For each correct judgement, the classifier's fitness increases by one, for a final fitness in the set {0, 1, 2, . . . , 200}.

4. RESULTS

4.1 Coevolutionary Runs without and with Noise

We performed 100 coevolutionary runs using the setup described in Sec. 3. This setup, in which the classifier is in control of the light intensity in the animal's environment, is hereafter referred to as the "interactive" setup. In order to validate the advantages of the interactive approach, we compared it against the situation where the classifier only observes the animal passively. We considered two such setups: in the first setup (hereafter, "passive 1"), the light intensity changes randomly, following a uniform distribution in [0, 1], in every time step. In the second setup (hereafter, "passive 2"), the intensity changes randomly every ten time steps. All other aspects of these two setups are identical to the "interactive" setup. Another 100 coevolutions were performed for each of the two passive setups.

In addition, we considered the situation where both the animal and the learning system are affected by noise. Each of the three setups was tested by performing another 100 coevolutionary runs. The light intensity perceived by the animal is obtained by multiplying the real intensity by a random number generated uniformly in [0.95, 1.05], and capping the perceived intensity to 1 if it exceeds this value. Noise is also applied to the speed of the animal, by multiplying the original speed by a random number generated uniformly in [0.95, 1.05]. To make this setup more feasible to implement, it is assumed that the system cannot directly measure the speed of the animal, but only its position. Noise is applied to the position by adding a random number generated from the normal distribution N(0, 0.005). The speed of the animal used as the classifier's input is then calculated by subtracting the previous estimated position from the current estimated position.

4.2 Analysis of the Evolved Models and Classifiers

Fig. 3 shows a box plot¹ with the distributions of the best evolved parameters in the 1000th generation of the three coevolutionary setups, for the cases (a) without and (b) with noise.

The passive coevolutions are able to evolve the parameters k and α1; however, they are not able to evolve α2 and α3. If the light intensity changes randomly, it is highly unlikely that the L to H and/or H to L transitions occur enough times, without an M state in between, for the classifiers to observe the effects of α2 and α3. Therefore, the classifiers are not capable of distinguishing the behavior of models from the behavior of the real animal with respect to these two parameters and, in turn, these parameters do not converge to their true values in the model population.

¹The box plots presented here are all as follows. The line inside the box represents the median of the data. The edges of the box represent the lower and upper quartiles (25th and 75th percentiles) of the data, while the whiskers represent the lowest and highest data points that are within 1.5 times the inter-quartile range from the lower and upper quartiles, respectively. Circles represent outliers.
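The noise model of Sec. 4.1 can be sketched as follows. This is a minimal sketch under two assumptions of ours: the function names are invented, and N(0, 0.005) is read as a normal distribution with standard deviation 0.005.

```python
import numpy as np

rng = np.random.default_rng(0)

def perceived_intensity(real_intensity):
    # Multiplicative sensing noise, uniform in [0.95, 1.05], capped at 1.
    return min(real_intensity * rng.uniform(0.95, 1.05), 1.0)

def noisy_speed(true_speed):
    # Multiplicative noise on the speed the animal actually produces.
    return true_speed * rng.uniform(0.95, 1.05)

def estimated_speed(prev_position, curr_position):
    # The system measures noisy positions rather than speed; the classifier's
    # speed input is the difference of successive position estimates.
    prev_est = prev_position + rng.normal(0.0, 0.005)
    curr_est = curr_position + rng.normal(0.0, 0.005)
    return curr_est - prev_est
```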
[Figure 3: Box plots of the best evolved model parameters (parameter indices 1–4) for the interactive, passive 1, and passive 2 setups.]

[Figure (caption not recovered): light intensity and speed plotted against the time step (0–100).]
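For concreteness, the classifier's closed-loop trial (Sec. 3.2.2) can be sketched as below: output O1 sets the light intensity for the next time step, and the final value of O2 gives the verdict. The ordering of the 26 weights and all names are our assumptions (the paper does not specify a layout); animal_speed stands in for either a model or the 'real' animal.

```python
import numpy as np

def sig(z):
    # Logistic sigmoid, range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta):
    """Split 26 parameters into Elman-network weights: 2 inputs, 3 hidden
    neurons with recurrent context, 2 outputs (layout is our assumption)."""
    W_in  = theta[0:6].reshape(3, 2)    # input -> hidden
    W_ctx = theta[6:15].reshape(3, 3)   # previous hidden (context) -> hidden
    b_h   = theta[15:18]                # hidden biases
    W_out = theta[18:24].reshape(2, 3)  # hidden -> output
    b_o   = theta[24:26]                # output biases
    return W_in, W_ctx, b_h, W_out, b_o

def judge(theta, animal_speed, T=100, rng=np.random.default_rng()):
    """Run one closed-loop trial; True means 'real animal' (O2 >= 0.5)."""
    W_in, W_ctx, b_h, W_out, b_o = unpack(theta)
    h = np.zeros(3)                      # context units start at zero
    I = rng.uniform(0.0, 1.0)            # random initial light intensity
    for t in range(T):
        s = animal_speed(t, I)           # observe the speed under intensity I
        h = sig(W_in @ np.array([I, s]) + W_ctx @ h + b_h)
        o = sig(W_out @ h + b_o)
        I = float(o[0])                  # O1 becomes the next intensity
    return bool(o[1] >= 0.5)             # O2: final judgement
```

Because the sigmoid outputs lie in (0, 1), O1 can be fed back directly as the next light intensity without rescaling.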
Figure 5: This plot shows the convergence of the coevolutionary algorithms with and without noise. The horizontal axis represents the generation, while the vertical axis represents the total square error in the parameters of the best model in each generation. Each box corresponds to 100 coevolutionary runs, and the solid lines correspond to the median error.

Figure 6: This plot shows the normalized fitness of the classifiers and the models in (a) the interactive coevolution, and (b) the passive 1 coevolution. The curves show the average fitness across 100 coevolutionary runs.
by α1 , while parameters α2 and α3 take a longer time to
converge to the true values. This means that the classifiers
quickly improve in fitness, which in turn causes the fitness first learn to distinguish models from the ‘real’ animal on
of the models to decrease. This increases the selective pres- the basis of k and α1 . This ability of the classifiers drives
sure on the models. In the case of the passive 1 coevolution, the model population to rid itself of models that are differ-
the average fitness of the classifiers also starts off at around ent to the ‘real’ animal in terms of k and/or α1 . Eventually,
0.5. In the first few generations, this increases slightly, be- the classifiers also learn to exploit the effects of α2 and α3 in
cause the classifiers learn how to distinguish models from the order to make the right decision; thereby driving the model
‘real’ animal on the basis of parameters k and α1 . However, population to evolve these two parameters correctly. After
the models quickly adapt to this new ability of the classi- about the 500th generation, the learning of the four param-
fiers. Now, as the classifiers are very unlikely to have the eters proceeds with virtually identical rates.
opportunity to observe the effects of α3 and α4 , their av-
erage fitness returns to 0.5. This leads to a disengagement
phenomenon, in which there is no more meaningful selection 4.4 Using a Simple Evolutionary Algorithm
in the model population, leading their error to drift. In order to compare the coevolutionary method against a
We also wish to analyze how the four individual parame- more traditional approach, we used a simple evolution where
ters evolve during the course of the only successful coevolu- a single population of models evolves. As there are now no
tionary setup, i.e. the interactive coevolution. For simplic- classifiers, an interactive approach is not possible, and thus
ity, we consider only the noiseless case, which is shown in we conducted 100 evolutionary runs for the passive 1 and
Fig. 7. This plot reveals how learning proceeds in the coevo- passive 2 methods of changing the light intensity in the ani-
lution. Parameter k is the first to be learnt, followed closely mal’s environment. The structure of the evolution is identi-
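The (µ + λ) self-adaptive update of Sec. 3.2.3, which this single-population evolution reuses, can be sketched as one generation step (Eqs. 2–5 plus truncation selection). This is a minimal sketch; the function names and the fitness callable are ours.

```python
import numpy as np

MU, LAM = 50, 50

def generation_step(pop, fitness, rng=np.random.default_rng()):
    """pop: list of (x, sigma) pairs; fitness maps x to a score (higher is
    better). Returns the mu parents of the next generation."""
    n = pop[0][0].size
    tau_p = 1.0 / np.sqrt(2.0 * n)           # tau' (learning rates as in
    tau   = 1.0 / np.sqrt(2.0 * np.sqrt(n))  # the text, similar to [19])
    offspring = []
    for _ in range(LAM):
        # Two parents chosen randomly, with replacement.
        (x1, s1), (x2, s2) = (pop[i] for i in rng.integers(MU, size=2))
        mask = rng.integers(0, 2, size=n).astype(bool)
        x = np.where(mask, x1, x2)                # Eq. 2: discrete recomb.
        sigma = 0.5 * (s1 + s2)                   # Eq. 3: intermediary recomb.
        sigma = sigma * np.exp(tau_p * rng.normal()
                               + tau * rng.normal(size=n))  # Eq. 4
        x = x + sigma * rng.normal(size=n)        # Eq. 5
        offspring.append((x, sigma))
    # (mu + lambda) selection: keep the mu fittest of parents plus offspring.
    combined = pop + offspring
    combined.sort(key=lambda a: fitness(a[0]), reverse=True)
    return combined[:MU]
```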
[Figure: box plots of the total square error (log scale) in the model parameters (parameter indices 1–4) for the passive 1 and passive 2 setups.]

5. CONCLUSIONS

This paper has presented a new platform that allows for the automatic learning of animal behavior without prior in- […]

6. ACKNOWLEDGMENTS

The research work disclosed in this publication is funded by the Marie Curie European Reintegration Grant within the 7th European Community Framework Programme (grant no. PERG07-GA-2010-267354).

M. Gauci acknowledges support by the Strategic Educational Pathways Scholarship (Malta). The scholarship is part-financed by the European Union—European Social Fund (ESF) under Operational Programme II—Cohesion Policy 2007–2013, "Empowering People for More Jobs and a Better Quality of Life".
[8] I. D. Gauld, M. A. O'Neill, and K. J. Gaston. Driving Miss Daisy: the performance of an automated insect identification system. In Hymenoptera: evolution, biodiversity and biological control. The Fourth International Hymenoptera Conference, pages 303–311. CSIRO Publishing, 1999.
[9] L. Grossman. Computer literacy tests: Are you human? Time Magazine, June 2008.
[10] R. D. King, J. Rowland, S. G. Oliver, M. Young, et al. The automation of science. Science, 324(5923):85–89, 2009.
[11] B. Kouchmeshky, W. Aquino, J. C. Bongard, and H. Lipson. Co-evolutionary algorithm for structural damage identification using minimal physical testing. International Journal for Numerical Methods in Engineering, 69(5):1085–1107, 2007.
[12] N. MacLeod, M. Benfield, and P. Culverhouse. Time to automate identification. Nature, 467(7312):154–155, 2010.
[13] M. Mirmomeni and W. Punch. Co-evolving data driven models and test data sets with the application to forecast chaotic time series. In 2011 IEEE Congress on Evolutionary Computation, pages 14–20, New Orleans, LA, 2011.
[14] A. Persidis. High-throughput screening. Nature Biotechnology, 16(5):488–493, 1998.
[15] J. Schmidhuber. Der ideale Wissenschaftler (The Ideal Scientist; in German). http://www.cio.de/karriere/personalfuehrung/803246, June 2004.
[16] M. Schmidt and H. Lipson. Distilling free-form natural laws from experimental data. Science, 324(5923):81–85, 2009.
[17] A. Turing. Computing machinery and intelligence. Mind, 59(236):433–460, 1950.
[18] K. E. Whelan and R. D. King. Intelligent software for laboratory automation. Trends in Biotechnology, 22(9):440–445, 2004.
[19] X. Yao, Y. Liu, and G. Lin. Evolutionary programming made faster. IEEE Transactions on Evolutionary Computation, 3(2):82–102, 1999.