
ARTIFICIAL NEURAL NETWORKS

Active Vision and Receptive Field Development


in Evolutionary Robots

Abstract:
In this paper, we describe the artificial evolution of adaptive neural controllers for an outdoor mobile robot equipped with a mobile camera. The robot can dynamically select the gazing direction by moving the body and/or the camera. The neural control system, which maps visual information to motor commands, is evolved online by means of a genetic algorithm, but the synaptic connections (receptive fields) from visual photoreceptors to internal neurons can also be modified by Hebbian plasticity while the robot moves in the environment. We show that robots evolved in physics-based simulations with Hebbian visual plasticity display more robust adaptive behavior when transferred to real outdoor environments than robots evolved without visual plasticity. We also show that the formation of visual receptive fields is significantly and consistently affected by active vision, as compared to the formation of receptive fields with grid sample images collected in the environment of the robot. Finally, we show that the interplay between active vision and receptive field formation amounts to the selection and exploitation of a small and constant subset of the visual features available to the robot.

Keywords
Active vision, visual receptive fields, artificial evolution, learning, neural networks, mobile robots

1. INTRODUCTION

Biological vision systems filter, compress, and organize the large amount of optical stimulation as electrical signals proceed from the retina towards deeper structures of the brain. This data reduction is achieved by a layered, distributed, and topologically organized set of neurons that individually respond to specific aspects of the optical stimulus. In mammals, for example, neurons in the early stages of the visual cortex selectively respond to particular features of the environment, such as oriented edges [9], which are linear combinations of the pattern of retinal activations. Neurons in later stages of the visual cortex respond to more complex patterns that also take into account the direction of movement of the stimulus and cannot easily be reduced to a linear combination of lower-level features [1].

The features that trigger the response of a neuron represent the receptive field of that neuron. The receptive fields of cortical visual neurons are not entirely genetically determined, but develop during the first weeks of the newborn's life, and there is
evidence that this process may already start before birth. Studies of newborn kittens raised in boxes with only vertical texture show that these animals do not develop as many receptive fields for horizontal features as kittens raised in normal environments [2] and therefore see the world in a different way. The development of visual receptive fields occurs through Hebbian synaptic plasticity, an adaptive process based on the degree of correlated activity of pre- and post-synaptic neurons. This amounts to a bottom-up, data-driven, and self-organizing process that captures the statistics of the environment where the animal lives. Simple computational models, in the form of feed-forward neural networks with Hebbian learning, develop receptive fields that resemble those found in the early stages of the mammalian visual cortex when exposed to input signals taken from uniform contrast distributions or from large sets of natural images [6].

The ability to become sensitive only to a subset of all the possible features that could be derived from the huge array of optical stimulation allows brains with limited computational and storage resources to make the best use of their information-processing capacity. Since the receptive fields are not entirely genetically determined, but depend on the visual stimulation to which the animal is exposed during its life, the developmental process plays an adaptive role for the animal. Although the details of the features accounted for by the receptive fields, and their role in perception, are still a topic of debate [4], it is generally accepted that animals and models of early-vision stages extract the dominant statistical features of their visual environment.

2 METHOD

We use a Koala (K-Team S.A.) wheeled robot equipped with a pan/tilt camera (Sony EVI-D31) and infrared proximity sensors distributed around the body of the robot (Figure 1). Infrared sensors are used only by the operating system to detect collisions and reposition the robot between trials; their activation values are not given to the neural controller. The robot has three wheels on each side, but only the central wheel (which is slightly lower) is motorized; the remaining two wheels rotate passively using a system of gears. The wheels have rotation encoders that are used for reconstructing the trajectories of the robot during behavioral analysis. The pan and tilt angles of the camera are independently controlled by two motors. The robot is equipped with an onboard computer (PC-104), hard disk, and operating system (Linux). The cables are used only for powering the robot through rotating contacts and to download/upload data through Ethernet at the beginning and end of an experiment.

Figure 1: The Koala mobile robot with pan/tilt camera.

The neural network has a feed-forward architecture with evolvable thresholds and discrete-time, fully recurrent connections at the output layer (Figure 2). A set of visual neurons, arranged on a five-by-five grid with non-overlapping receptive fields, receives information about the grey level of the corresponding pixels in the image provided by a camera on the robot. The receptive field of each neuron covers a
square area of 48 by 48 pixels in the image. We can think of the total area spanned by all receptive fields (240 by 240 pixels) as the surface of an artificial retina. The activation of a visual neuron, scaled between 0 and 1, is given either by the average grey level of all pixels spanned by its own receptive field or by the grey level of a single pixel located within the receptive field. The choice between these two activation methods can be dynamically changed by one output neuron at each time step. Two proprioceptive neurons provide information about the measured horizontal (pan) and vertical (tilt) angles of the camera. These values are in the intervals [−100, 100] and [−25, 25] degrees for pan and tilt, respectively. Each value is scaled to the interval [0, 1] so that an activation of 0.5 corresponds to 0 degrees (camera pointing forward, parallel to the floor). Memory input units store the values of the output neurons at the previous sensory-motor cycle and send them back to the output units through a set of connections, which effectively act as recurrent connections among the output units [3]. The bias unit has a constant value of −1 and its outgoing connections represent the adaptive thresholds of the output neurons [7]. Hidden and output neurons use the sigmoid activation function f(x) = 1/(1 + exp(−x)), where x is the weighted sum of the inputs to the neuron. Output neurons encode the behavior of the active vision system and of the robot at each sensory-motor cycle. One neuron determines the filtering method used to activate the visual neurons and two neurons control the movement of the camera, encoded as angles relative to the current position. The remaining two neurons encode the direction and rotational speeds of the left and right motorized wheels of the robot. Activation values above and below 0.5 stand for forward and backward rotational speeds, respectively.

Figure 2: The architecture is composed of a grid of visual neurons with non-overlapping receptive fields.

The connection strengths between visual neurons and hidden neurons are either genetically determined ("No learning" condition) or can be modified by means of a Hebbian learning rule ("Learning" condition), which has been shown to produce connection strengths that approximate the eigenvectors corresponding to the principal eigenvalues of the correlation matrix of the input patterns. In other words, this learning rule implements an approximate Principal Component Analysis of the input images [10]. The modification of the connection strength Δwij depends solely on the postsynaptic and presynaptic neuron activations yi and xj:

    Δwij = η yi (xj − Σ(k=1..i) yk wkj)    (1)

where k is a counter that points to postsynaptic neurons up to the neuron whose weights are being considered. The new connection strengths are given by

    wij(t+1) = wij(t) + Δwij

where 0 < η ≤ 1 is the learning rate, which in these experiments starts at 1.0 and is halved every 80 sensory-motor cycles. This learning rule has been used in previous computational models of receptive field development (Hancock et al., 1992) and is intended to capture a system-level property of visual plasticity, not the precise way in which biological synaptic strengths are modified in the visual cortex. Among the several available models of synaptic plasticity (see [8] for a review), we opted for this one because it can be applied online while the robot moves in the environment and because it implements an approximate Principal Component Analysis, which is a widely used technique for image compression.

The robotic system and neural network are updated at discrete time intervals of 300 ms. At each time interval, the following steps are performed:
1. The activations of the visual and proprioceptive neurons are computed from the values provided by the robot, and the values of the memory units are set to the values of the output units at the previous time step (or to zero if the individual starts its "life");
2. The activations of the hidden units are computed and normalized (in experimental conditions where learning is enabled);
3. The activations of the output units are computed;
4. The wheels of the robot are set at the corresponding rotational speed for 300 ms while the camera is set to its new position;
5. In the experimental conditions where learning is enabled, the connection weights from visual neurons to hidden neurons are modified using the current neuron activation values.

In step 2 the activation of each hidden unit is normalized to have the same magnitude in order to equalize the contributions of the hidden units to the activations of the output units. The normalized output value ōk of the kth hidden neuron is computed by

    ōk = ok / sk

where ok and sk denote the current output value of the kth hidden neuron and the standard deviation of its stored output values up to the current time step, respectively.

The strengths of the synaptic connections are encoded in a binary string that represents the genotype of the individual. In experimental conditions where learning is enabled, the connections between visual neurons and hidden neurons are not genetically encoded. In that case, they are randomly initialized in the range [−3/25, 3/25] at the beginning of the life of each individual. Genetically encoded connection strengths can take values in the range [−4.0, 4.0] and are encoded on 5 bits. The genotype totals 325 bits (65 connections × 5 bits) in the learning condition and 950 bits ((65 + 125) connections × 5 bits) in the no-learning condition. A population of 100 genomes is randomly initialized by the computer and evolved for 20 generations. Each genome is decoded into the corresponding neural network and tested for four trials during which its fitness is computed. The best 20% of individuals (those with the highest fitness values) are reproduced, while the remaining 80% are discarded; an equal number of copies of each selected genome is made so as to create a new population of the same size. These new genomes are randomly paired, crossed over with probability 0.1 per pair, and mutated with probability 0.01 per bit. Crossover consists in swapping the genetic material between two strings around a randomly chosen point. Mutation consists in toggling the value of a bit. Finally, a copy of the best genome of the previous generation is inserted in the new population in place of a randomly chosen genome (elitism).
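The learning rule above is Sanger's generalized Hebbian rule. As a minimal sketch in Python, assuming linear hidden activations for the update step (in the paper the hidden units pass through a sigmoid for the behavioral forward pass) and with illustrative names of our own:

```python
import numpy as np

def sanger_update(W, x, eta):
    """One application of equation (1): dw_ij = eta * y_i * (x_j - sum_{k<=i} y_k * w_kj).

    W is the (hidden x input) weight matrix, x the input vector (e.g. the
    25 visual-neuron activations), eta the learning rate.
    """
    W_old = W.copy()                           # use the pre-update weights throughout
    y = W_old @ x                              # postsynaptic activations
    for i in range(W.shape[0]):
        # reconstruction of x from components 1..i (the Sanger residual)
        recon = y[: i + 1] @ W_old[: i + 1]
        W[i] += eta * y[i] * (x - recon)
    return W

def train(W, inputs, eta0=1.0, halve_every=80):
    """Apply the rule online; the paper starts eta at 1.0 and halves it
    every 80 sensory-motor cycles (a smaller eta0 is numerically safer
    for arbitrary synthetic data)."""
    eta = eta0
    for t, x in enumerate(inputs):
        if t > 0 and t % halve_every == 0:
            eta *= 0.5
        sanger_update(W, x, eta)
    return W
```

Trained this way on zero-mean inputs, the rows of W converge towards the leading eigenvectors of the input correlation matrix, i.e. the approximate Principal Component Analysis described above.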
3 Experiments

We have carried out two sets of evolutionary experiments (Figure 3) to investigate whether the ontogenetic development of receptive fields provides an adaptive advantage in new environmental conditions, namely when transferred from a simulated to a real outdoor environment, and to investigate the interactions between evolution and learning. The evolutionary experiments are carried out in simulations and the best evolved individuals are tested in the real environment. In the first condition ("No learning"), which serves as a control condition, all synaptic connections are genetically encoded and evolved without learning. In the second condition ("Learning"), learning is enabled for the connections from visual neurons to hidden neurons, which are not genetically encoded but initialized to small random values. Connection strengths developed during learning are not transmitted to offspring.

The fitness function selected robots for their ability to move straight forward for as long as possible during the life of the individual. This is quantified by measuring the amount of forward rotation of the two motorized wheels of the robot. Each individual is decoded and tested for four trials, each trial lasting 400 sensory-motor cycles of 300 ms. A trial can be truncated earlier if the operating system detects an imminent collision with the infrared distance sensors. The fitness function F(Sleft, Sright) is a function of the measured speeds of the left (Sleft) and right (Sright) wheels:

    F = (1 / (E · T)) Σ(e=1..E) Σ(t=1..T̄) f(Sleft, Sright, t)

where Sleft and Sright are in the range [−8, 8] cm/sec and f(Sleft, Sright, t) = 0 if Sleft or Sright is smaller than 0 (backward rotation); E is the number of trials (four in this paper), T is the maximum number of sensory-motor cycles per trial (400 in these experiments), and T̄ is the observed number of sensory-motor cycles (for example, 34 for a robot whose trial is truncated after 34 steps to prevent collision with a wall). At the beginning of each trial the robot is relocated in the real outdoor environment at a random position and orientation by means of a motor procedure during which the robot moves forward and turns in a random direction for 10 seconds. In simulation, position and orientation are instantly randomized. In the learning condition, the synaptic weight values are re-initialized to random values at the beginning of each trial.

The results of the two experimental conditions, without learning and with learning, do not show significant differences and reach comparable performance levels (Figure 3). In both cases, evolved individuals perform large, collision-free paths around the environment for the entire duration of a trial.
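The fitness computation and the generational scheme of Section 2 can be sketched together. The inner term of f below is an assumption (average forward wheel speed, zero credit for any step in which a wheel rotates backwards), since the equation image did not survive in this copy, and all names are illustrative:

```python
import random

def decode(bits, n_weights, lo=-4.0, hi=4.0):
    """Decode 5-bit genes into connection strengths in [-4, 4]."""
    weights = []
    for i in range(n_weights):
        value = int("".join(str(b) for b in bits[i * 5:(i + 1) * 5]), 2)  # 0..31
        weights.append(lo + (hi - lo) * value / 31.0)
    return weights

def trial_fitness(speed_pairs, T=400):
    """Assumed form of f: mean forward speed, 0 when either wheel goes backward,
    normalized by the maximum trial length T so truncated trials score less."""
    total = 0.0
    for s_left, s_right in speed_pairs:        # measured speeds in [-8, 8] cm/s
        if s_left > 0 and s_right > 0:
            total += (s_left + s_right) / 2.0
    return total / T

def next_generation(population, scores, p_cross=0.1, p_mut=0.01):
    """Best 20% each copied five times; crossover, mutation, elitism."""
    ranked = [g for _, g in sorted(zip(scores, population),
                                   key=lambda pair: -pair[0])]
    elite = ranked[0][:]                                   # best genome of the generation
    offspring = [g[:] for g in ranked[: len(population) // 5] for _ in range(5)]
    random.shuffle(offspring)                              # random pairing
    for a, b in zip(offspring[::2], offspring[1::2]):
        if random.random() < p_cross:
            cut = random.randrange(1, len(a))              # one-point crossover
            a[cut:], b[cut:] = b[cut:], a[cut:]
    for g in offspring:                                    # bit-flip mutation
        for i in range(len(g)):
            if random.random() < p_mut:
                g[i] = 1 - g[i]
    offspring[random.randrange(len(offspring))] = elite    # elitism
    return offspring
```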
Figure 3: Population average (thin line) and best fitness (thick line) during evolution in physics-based simulations for the "No learning" and "Learning" conditions. Each data point is the average of three evolutionary runs with different initializations of the population.

One way to observe the effects of receptive field development consists of freezing the receptive fields at different periods of life and measuring the corresponding performance of the individuals in the environment. We have carried out a series of such tests in the simulated environment for all best evolved individuals with learning and compared their performance to that of the best evolved individuals without learning. The initial random synaptic strengths (RandRF) of the visual neurons do not provide sufficient information to let the robot move around properly, but the final connection strengths (FinRF) do provide the structure necessary for the rest of the network to achieve a performance comparable to that of the best evolved individuals without learning. The absence of a difference between the performance measured with well-formed receptive fields and the performance measured during the entire learning phase (400 sensory-motor cycles) indicates that learning does not impose a significant cost on the individuals; see [12] for a discussion of the cost of learning in evolutionary individuals. The development of sufficiently good receptive fields is very fast, probably because the relative uniformity and predictability of the simulated environment can be quickly captured by the Hebbian learning mechanism.

The advantage of receptive field development becomes evident when the best evolved robots from simulation are tested in the real outdoor environment. In that case, the performance of the best individuals evolved without learning drops significantly (horizontal line in Figure 4) whereas the performance of "Learning" individuals with receptive fields that developed in the new environment is almost the same as that measured in simulation tests (Figure 4, FinRF). The large difference between performance with initial random receptive fields (RandRF) and final receptive fields (FinRF), as well as the significant performance cost incurred during learning (Learning), is an indication of the discrepancies and novelties in the real environment that the learning mechanism had to integrate.

Figure 4: Performance of the best individuals evolved in the "Learning" condition (three for each evolutionary condition), each tested five times in the simulated environment from different initial positions and orientations. The horizontal line represents the test performance of the best individuals evolved in the "No learning" condition. Individuals are tested with random and fixed receptive fields (RandRF), with final and fixed receptive fields (FinRF), and with learning enabled. The duration of the test is the same as that used during evolution.

5. Receptive Field Development

In order to understand the effect of behavior on receptive field formation, we compared the final receptive fields of evolved robots tested in the outdoor environment with the receptive fields obtained by training the visual neurons on a grid sample of all the images that the robot could possibly gather in the outdoor environment. Grid sample images were collected by placing the robot in the center of the outdoor environment and taking snapshots with the camera pointed at different directions, tilts, and times of the day. We chose 36 directions by letting the robot perform a full rotation in place, three tilt values (−25, 0, and 25 degrees), and three times of the day to account for the different lighting conditions and shadows, which also varied during the behavioral tests with evolved robots. The complete set was composed of 324 images. The images were filtered using pixel averaging (the strategy chosen by all best evolved individuals) and fed as input to the visual part of the neural architecture in random order. The synaptic weight values were initialized to small random values within the same range used during evolution and modified after each image presentation using the learning rule described in equation 1. The synaptic weight values stabilized after approximately 200 applications of the learning rule, which is less than the trial time of the evolutionary robots. The final receptive fields of the network trained on grid sample images were compared with those of the network of the best individual evolved in the "Learning" condition and, for the sake of completeness, with those of the network of the best individual evolved in the "No learning" condition.

As we explained in section 2 above, the synaptic weight values developed with the Hebbian learning rule used here approximate the principal components of the correlation matrix of the images. The leftmost receptive field in the figure corresponds to the first principal component, that is, the component that explains most of the image variance, which represents the most important feature in the visual environment. The second-from-left receptive field corresponds to the second principal component, that is, the component that explains most of the residual image variance, which represents the second most important feature in the visual environment, and so forth. The most significant difference emerging from this comparison is that the dominant receptive field for evolved learning individuals is a horizontal edge on the lower portion of the visual field, whereas the dominant receptive field for the network trained on a grid sample resembles a light central blob against a dark background combined with a weak edge in the upper portion of the visual field. The horizontal edge appears only as the second most important receptive field for the network trained on grid sample images. This difference is supported by the fact that the subspace of the distribution of snapshots collected by evolved learning robots with well-formed receptive fields is significantly different from that of the distribution of grid sample snapshots. The comparisons become more difficult for the remaining three receptive fields, but this is not surprising because these receptive fields capture any small residual variance of the image distribution that may have been caused by the different starting positions and/or trajectory differences of the evolved individuals. The receptive fields of the "No learning" individual cannot be easily interpreted as in the other two cases because our genetic representation and evolutionary method are not forced to represent visual information in any predefined order. We can, however, speculate that these receptive fields may contain a linearly scrambled version of the first principal component observed in the "Learning" condition, because both evolved robots perform similar trajectories and point the camera towards the edge between the ground and the wall.

The behavioral and receptive field analyses indicate that evolved learning robots pay attention to specific visual features of the environment, notably the edge between the ground and the walls, in order to navigate in the environment. This phenomenon can be further investigated by measuring the diversity of the snapshots captured by evolved robots as compared to grid sample snapshots. For the sake of clarity, we project the 25-dimensional space of the input images (given by the five-by-five visual neurons) onto the 3-dimensional subspace of the first three principal components of the image distribution. Each snapshot is represented by a dot in the three-dimensional space. These data indicate that such robots self-select through their behavior (body movements and camera movements) a significantly smaller and consistent subset of visual features that are useful for performing their survival task of collision-free straight navigation. This holds also for individuals of the "No learning" condition evolved and tested in simulated environments (data not shown), but not when such individuals are tested in the real environment.
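The projection used in this analysis is a standard PCA projection of the snapshot distribution; a minimal sketch, assuming snapshots are given as 25-element activation vectors (names are ours):

```python
import numpy as np

def project_to_pcs(snapshots, n_components=3):
    """Project 25-dimensional snapshot vectors (5x5 visual-neuron activations)
    onto the first principal components of their distribution."""
    X = np.asarray(snapshots, dtype=float)
    Xc = X - X.mean(axis=0)                      # center the distribution
    # eigen-decomposition of the covariance matrix; columns of eigvecs are PCs
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    pcs = eigvecs[:, order[:n_components]]
    return Xc @ pcs                              # coordinates in the PC subspace
```

Plotting the three returned coordinates of each snapshot as a dot reproduces the kind of point cloud compared in the text.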
6. Conclusion

The experimental results described in this article indicate that the interaction between learning and behavior within an evolutionary context brings a number of synergetic phenomena:
a) Behavior affects learning by selecting a subset of learning experiences that are functional to the survival task;
b) Learning affects behavior by generating selection pressure for actions that actively search for situations that are learned;
c) Learning contributes to the adaptive power of evolution (as long as the parameters subject to learning are not also genetically encoded) by coping with change that occurs faster than evolutionary time, as is the case for the transfer from simulation to reality.
These results are promising for the scalability and potential applications of evolutionary robotics in real-world situations where robots cannot possibly be evolved in real time in the real world, but may have to evolve at least partly in simulations. They are also an indication that complex behavior can be generated by relatively simple control architectures with active behavior and local unsupervised learning, which can be implemented in low-power, low-cost micro-controllers. The significant role of behavior in receptive field formation during learning is being increasingly recognized in the neuroscience community [13]. Within this context, Evolutionary Robotics is very well positioned to address specific questions raised by theories and models of learning in a behavioral context, because it does not require hardwiring of behavior, but only the set-up of the environment and of the performance criterion on which evolution will operate.

References

[1] Aloimonos, Y. (Ed.), 1993. Active Perception. Lawrence Erlbaum, Hillsdale, NJ.

[2] Blakemore, C., Cooper, G. F., 1970. Development of the brain depends on the visual environment. Nature 228, 477–478.

[3] Elman, J. L., 1990. Finding Structure in Time. Cognitive Science 14, 179–211.

[4] Field, D. J., 1994. What is the goal of sensory coding? Neural Computation 4, 559–601.

[5] Floreano, D., Kato, T., Marocco, D., Sauser, E., 2004. Coevolution of active vision and feature selection. Biological Cybernetics 90(3), 218–228.

[6] Hancock, P. J. B., Baddeley, R. J., Smith, L. S., 1992. The principal components of natural images. Network 3, 61–70.

[7] Hertz, J., Krogh, A., Palmer, R. G., 1991. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA.

[8] Hinton, G. E., Sejnowski, T. J. (Eds.), 1999. Unsupervised Learning. MIT Press, Cambridge, MA.

[9] Hubel, D. H., Wiesel, T. N., 1968. Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology 195, 215–243.

[10] Jolliffe, I. T., 1986. Principal Components Analysis. Springer Verlag, New York.

[11] Linsker, R., 1988. Self-Organization in a Perceptual Network. Computer 21(3), 105–117.

[12] Mayley, G., 1996. Landscapes, learning costs and genetic assimilation. Evolutionary Computation 4(3), 213–234.

[13] Polley, D. B., Chen-Bee, C. H., Frostig, R. D., 1999. Two directions of plasticity in the sensory-deprived adult cortex. Neuron 24, 623–637.
