Attention-Based Smart-Camera For Spatial Cognition
Resource          Single-scale      Multi-scale         Ratio
LUT               -                 54781 (25.06%)      3.5
Registers         5684 (1.3%)       59765 (13.67%)      10.5
BRAM              41 (7.6%)         401 (73.66%)        9.69
Embedded memory   1.4 Mb (7.6%)     14.4 Mb (73.66%)    9.69
DSP               5 (4.8%)          246 (27.33%)        5.7

... which are also based on the neural model taking into account the six visual scales (see A in fig. 2). First, we apply a Deriche gradient filter h(x) on the grey-scale image: h(x) = c.x.e^(-α|x|), with c = (1 - e^(-α))^2; α sets the filter smoothing. The gradient image is then filtered by a Difference of Gaussian (DoG) filter consisting in two Gaussians of standard deviations σ_DoG1 and σ_DoG2. Then N_PoI points of interest (PoI) are extracted as local maxima on this DoG image by a local competition mechanism. Around each of these PoI, local views (images) are extracted between two disks of radius r_small and r_big. To avoid redundancies, two PoI cannot be closer than r_big/2. Then, these local views are encoded in a population of N_d neurons using a log-polar transformation providing a descriptor d (log-polar descriptor in fig. 2). This transformation has relatively little computational cost, is invariant to small rotations and scale variations, and gives good place recognition results [9].

A neuronal population s codes visual signatures (the "what" information provided by d). The activity of each neuron s_i [...], where m_j is the jth element of the tensor SWM and w^SWM_(p_i,j) is the weight of the synaptic link between p_i and m_j. The learning rule is the same as equation (2), replacing d_j by m_j.

In the following, visual information (visual signatures and their retinotopic positions) is provided by a software implementation of the multi-scale vision architecture (presented in fig. 1 in section 2) running on a classical CPU (see 3.2.1).

³ The orientation can be provided by either a magnetic compass or a multimodal compass modelling the Head Direction Cells found in mammals [2].
⁴ We use in this work a two-dimensional SWM, but this model could be extended to a third-order tensor by taking into account another input like the signature's elevation [9].
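The detection chain above (Deriche gradient, DoG filtering, local-maxima competition with a minimum spacing between PoI) can be sketched in NumPy/SciPy as follows. This is a minimal software illustration, not the FPGA implementation: the function names, the kernel truncation radius, and the 3×3 local-maximum window are our own assumptions.

```python
import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter, maximum_filter

def deriche_gradient(img, alpha=1.0, radius=8):
    """Gradient magnitude from the 1-D Deriche derivative kernel
    h(x) = c * x * exp(-alpha*|x|), truncated at +/- radius pixels."""
    x = np.arange(-radius, radius + 1, dtype=float)
    c = (1.0 - np.exp(-alpha)) ** 2          # normalisation constant from the text
    h = c * x * np.exp(-alpha * np.abs(x))
    gx = convolve1d(img.astype(float), h, axis=1)
    gy = convolve1d(img.astype(float), h, axis=0)
    return np.hypot(gx, gy)

def extract_poi(img, n_poi=20, sigma_dog1=1.0, sigma_dog2=2.0, r_big=10):
    """Return up to n_poi (row, col) points of interest."""
    grad = deriche_gradient(img)
    # band-pass DoG filtering of the gradient image
    dog = gaussian_filter(grad, sigma_dog1) - gaussian_filter(grad, sigma_dog2)
    # local competition: keep positive local maxima of the DoG image
    peaks = np.argwhere((dog == maximum_filter(dog, size=3)) & (dog > 0))
    order = np.argsort(-dog[peaks[:, 0], peaks[:, 1]])       # strongest first
    # greedily enforce a minimum spacing of r_big / 2 between PoI
    selected = []
    for y, x in peaks[order]:
        if all((y - sy) ** 2 + (x - sx) ** 2 >= (r_big / 2) ** 2
               for sy, sx in selected):
            selected.append((int(y), int(x)))
        if len(selected) == n_poi:
            break
    return selected
```

Each selected PoI would then be described by a log-polar encoding of the local view between the r_small and r_big disks.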
Figure 2: Neural architectures for place recognition. A) The model of Gaussier et al., where place cells
code the pattern of activity of a spatial working memory (SWM) [7]. For each image of a panorama, the
product of the visual signature ("what" information) and the azimuth ("where" information) of each Point of
Interest (PoI) is sequentially extracted and added to the SWM by an attentional mechanism. The azimuth
is computed from the head orientation and the X coordinate of the PoI in the image. The vision processes of
this model correspond to a single scale of the vision chain as described in section 2.1. The neural models B)-D)
differ by the way they process the sorted list of PoI descriptors provided by the multi-scale architecture (see
section 2.2). B) Multi-scale PoI stacking model (MPIS): a single PC population learns a SWM activity pattern
formed by taking into account the PoI found on the six scales. C) Multi-scale PC stacking model (MPCS): each
of the 6 visual scales is processed as in A), resulting in six unlinked PC populations. A multi-scale PC then
learns the stacking of these PC responses. D) Multi-scale coarse-to-fine model (MCFPC): as in C), PoI of
each visual scale (n) are coded by a population of PC, but in addition, PC of successive scales are also linked
such that the winning PC of a coarse scale (n+1) can bias PC recognition at a finer scale (n).
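The "what × where" accumulation of panel A can be sketched as follows. This is a minimal illustration under stated assumptions: the one-hot azimuth code, the max-based accumulation rule, and the population sizes N_TH and N_D are our own choices, not the exact model of [7].

```python
import numpy as np

N_TH = 36   # azimuth bins ("where" information), assumed size
N_D = 32    # signature neurons ("what" information), assumed size

def build_swm(signatures, azimuths):
    """Sequentially accumulate the what x where product of each PoI
    into a 2-D spatial working memory (SWM) tensor, as in panel A."""
    swm = np.zeros((N_TH, N_D))
    for d, theta in zip(signatures, azimuths):
        where = np.zeros(N_TH)                      # one-hot azimuth code (assumed)
        where[int((theta % (2 * np.pi)) / (2 * np.pi) * N_TH) % N_TH] = 1.0
        swm = np.maximum(swm, np.outer(where, d))   # accumulation rule (assumed)
    return swm

def place_cell_activity(w, swm):
    """A PC responds to the similarity between its learned pattern w and
    the current SWM (cf. the matching term of eq. (5) below)."""
    return 1.0 - np.abs(w - swm).sum() / swm.size
```

A place cell that has learned the SWM pattern of a location responds maximally (activity 1) when the robot perceives that pattern again, and less as the pattern diverges.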
3.1.2 Multi-scale PoI stacking (MPIS) model
In this model (see B in fig. 2), the PoI descriptors extracted at every scale of the vision system are fed to the same neural networks as in the single-scale model. It corresponds to a direct adaptation of the single-scale system to a multi-scale one by simply stacking the PoI descriptors.

3.1.3 Multi-scale PC stacking (MPCS) model
Patterns of activity in each SWM code for the current location of the robot perceived at different scales. At each scale of the model described in fig. 2-C, place cells are computed as in the single-scale model. Since all scales will [...]

In the coarse-to-fine model (MCFPC, see D in fig. 2), the winning PC at a coarser scale (n+1) can, through a neuromodulation link, bias the recognition competition between PCs of the "finer" scale n. For a given scale n, a PC neuron p^n_i has the following activity at time t:

p^n_i(t) = H(w^n_iq(t) . p^(n+1)_q(t)) × (1 - (1/N_SWM) Σ_{j=1..N_SWM} |w^n_ij(t) - m^n_j(t)|)   (5)

where w^n_iq is the weight between the PC p^n_i at scale n and p^(n+1)_q, the winning PC at scale n+1; m^n_j is the jth element of the tensor SWM at scale n; w^n_ij is the weight between p^n_i and m^n_j.
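Equation (5) translates directly into code. A minimal sketch, assuming H is a simple rectification (the excerpt does not specify its form):

```python
import numpy as np

def H(x):
    """Transfer function of eq. (5); a plain rectification is assumed here."""
    return max(x, 0.0)

def mcfpc_activity(w_iq, p_coarse, w_ij, m):
    """Eq. (5): activity p^n_i(t) of place cell i at scale n.
    w_iq:     neuromodulation weight from the winning coarse PC (scale n+1)
    p_coarse: activity p^{n+1}_q of that winning PC
    w_ij:     learned SWM pattern of PC i at scale n (length N_SWM)
    m:        current SWM activity m^n at scale n (length N_SWM)"""
    match = 1.0 - np.abs(w_ij - m).sum() / m.size   # SWM pattern matching term
    return H(w_iq * p_coarse) * match               # coarse-scale bias term
```

With this form, a PC whose learned pattern exactly matches the current SWM, and whose coarse-scale parent fully supports it, fires at 1; a null coarse-scale bias silences it entirely.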
[Figure 3: bar plots of the 3WD, WNR and MAP scores obtained by the Single-scale, MPIS, MPCS and MCFPC models (plus a Random baseline for MAP).]
Figure 3: Performance scores of all tested models: the coarse-to-fine model MCFPC gets the best results in
almost all the tests, or is second to one of its variants, MPIS (Multi-scale PoI stacking model). The Single-scale
(reference) model performs better than the MPCS model. 3WD: the lower the better; WNR and
MAP: the greater the better. MAP: Random corresponds to a random selection of the winning PC.
The MPCS model gives the worst results in all tests. In this model, PCs are computed independently on each scale and then simply merged in a flat representation.
Processing the scales from coarse to fine based on a hierarchical PC learning greatly enhances the robustness of the PC response. Indeed, in the WNR test, we observe a real impact of the modulation link between PCs of successive scales, the MCFPC model obtaining the highest score.
4. CONCLUSION
We introduced in this paper a vision-based robotic architecture composed of a smart camera, implemented on dedicated hardware (FPGA), providing PoI descriptors, extracted at multiple visual scales, to a neural architecture simulated via real-time neural network software for the localisation of a mobile robot. We first presented the implementation details and the resource usage of the vision model taking into account 6 spatial scales in the hardware system. Then we showed that the global performance of a robot localisation task highly depends on the way the extracted PoI are combined and learned across the visual scales. Results highlighted the advantage of exploring the visual scene following a coarse-to-fine approach where coarse recognition levels modulate finer recognition levels.
5. REFERENCES
[1] A. Borji, D. N. Sihite, and L. Itti. What/where to
look next? modeling top-down visual attention in
complex interactive environments. IEEE
Transactions on Systems, Man, and Cybernetics,
pages 1–16, 2012.
[2] P. Delarboulas, P. Gaussier, R. Caussy, and M. Quoy. Robustness study of a multimodal compass inspired from HD-cells and dynamic neural fields. In A. P. del Pobil, E. Chinellato, E. Martinez-Martin, J. Hallam, E. Cervera, and A. Morales, editors, 13th Int. Conference on Simulation of Adaptive Behavior, pages 132-143. Springer, 2014.
[3] F. Dias, F. Berry, J. Serot, and F. Marmoiton. Hardware, design and implementation issues on a FPGA-based smart camera. In First Int. Conference on Distributed Smart Cameras, pages 20-26, 2007.
[4] L. Fiack, N. Cuperlier, and B. Miramond.
Embedded and real-time architecture for
bio-inspired vision-based robot navigation.
Journal of Real-Time Image
Processing, 10(4):699–722, 2015.
[5] L. Fiack, B. Miramond, and N. Cuperlier.
FPGA-based vision perception architecture
for robotic missions. In Smart cameras for
robotic applications, page 4, Portugal,
Oct. 2012.
[6] L. Fiack, L. Rodriguez, and B. Miramond.
Hardware design of a neural processing unit
for bio-inspired computing. In 13th Int.
Conference on New Circuits and
Systems Conference, pages 1–4, 2015.
[7] P. Gaussier, A. Revel, J.-P. Banquet, and
V. Babeau. From view cells and place
cells to cognitive map learning:
processing stages of the hippocampal
system. Biological Cybernetics, 86:15–
28, 2002.
[8] C. D. Gilbert and W. Li. Top-down influences on visual processing. Nature Reviews Neuroscience, 14(5):350-363, 2013.
[9] C. Giovannangeli, P. Gaussier, and J. P.
Banquet. Robustness of visual place cells in
dynamic indoor and outdoor environment.
Int. Journal of Advanced Robotic
Systems, 3(2):115–124, 2006.
[10] A. Jauffret, N. Cuperlier, P. Gaussier, and
P. Tarroux. From self-assessment to
frustration, a small step towards
autonomy in robotic navigation.
Frontiers in Neurorobotics, 7(16),
2013.
[11] M. Lagarde, P. Andry, and P. Gaussier.
Distributed real time neural networks in
interactive complex systems. In Int.
conference on Soft computing as
transdisciplinary science and
technology, 2008.
[12] A. Rahman, D. Houzet, D. Pellerin, S. Marat, and
N. Guyader. Parallel implementation of a
spatio-temporal visual saliency model. J.
Real-Time Image Process., 6(1):3–14,
Mar. 2011.
5.1 Acknowledgments
The authors would like to thank the French
CNRS, the ENSEA and the University of
Cergy-Pontoise for funding the RobotSoC
project.