
Real Time Artificial Vision System in an Omnidirectional

Mobile Platform for HRI


Carolina M. Salcedo∗, César A. Peña∗, Jés de Jesus Cerqueira∗, Antonio M. Lima∗∗
{carolina.moreno, cesar.pena, jes}@ufba.br, amnlima@dee.ufcg.edu.br

Abstract— In this paper we present an artificial vision system for HRI (Human-Robot Interaction) based on methods for facial feature recognition, object detection and tracking. The purpose is to implement a vision system that processes perceptual information resulting from human-robot social interaction within a hybrid architecture for an omnidirectional test platform currently being developed in the Robotics Laboratory of the UFBA Polytechnic School.

I. Introduction

In human-robot interaction (HRI), robots (specifically social robots) need a perception capability oriented toward humans and optimized for interaction with the environment. They should be able to track human features (faces, hands, body) and to interpret speech, natural language and discrete commands. Additionally, they must incorporate mechanisms for recognizing facial expressions, gestures and human activities [14], [2], [20].

In this article a vision system was implemented as part of the perception system for HiBot, an omnidirectional platform developed in the UFBA Robotics Laboratory (see Fig. 1), within a hybrid approach for social interaction, aiming to improve system performance in task synchronization and response time in comparison with centralized architectures [10], [13], [1], [3]. The system input consists of frames acquired from a USB camera. The system has a module for the detection and recognition of multiple faces, based on Haar features [18] and on the Eigenfaces method [16]; an object detection module that uses color segmentation; and an object tracking module. The vision system was implemented with computer vision and artificial intelligence libraries for the C language, OpenCV by Intel, and with AForge.NET for the .NET Framework, which are distributed under the General Public License.

The organization of the paper is as follows: Section II presents the physical structure of the HiBot; Section III describes the HRI architecture for the HiBot and the inclusion of the perception subsystem; Section IV presents the proposed vision system; the experimental results are shown in Section V and, finally, Section VI presents the concluding remarks.

∗ Programa de Pós-Graduação em Engenharia Elétrica - Escola Politécnica da Universidade Federal da Bahia. Rua Aristides Novis, 02, Federação, Salvador, BA, Brazil.
∗∗ Centro de Engenharia Elétrica e Informática - Departamento de Engenharia Elétrica da Universidade Federal de Campina Grande. Rua Aprígio Veloso, 882, Bodocongó, Campina Grande, PB, Brazil.

Fig. 1: CAD model for the HiBot.

II. Mobile Robot HiBot

The HiBot is part of the research project in HRI being developed in the Robotics Laboratory of the Polytechnic School of the Federal University of Bahia. It is being designed on a rectangular structure acquired from AndyMark Inc. Aluminum parts are used in the physical construction of the robot in order to ensure lightness and strength.

The structure is based on a configuration with four Mecanum wheels to support mobility and load transport, as shown in Fig. 2. The Mecanum wheels allow the robot to move in any direction without the need for reorientation, since each wheel has a structure based on rollers whose motion is not controlled, allowing omnidirectional movement. Each omnidirectional wheel is coupled to a gearbox that mediates with a DC motor. Each gearbox shaft has a coupled encoder (see Fig. 2), permitting the angular velocity of each wheel to be obtained for the subsequent control process.

The HiBot consists of the following subsystems: (i) sensing subsystem; (ii) actuators subsystem; (iii) perception subsystem; and (iv) control subsystem. The perception and control subsystems are executed in a computational architecture.
Fig. 2: CAD model for the mobile base of HiBot.
In the sensing subsystem there are 10 SHARP GP2D12 linear distance sensors; an LCD monitor with touchscreen technology; two µCAM-232 JPEG serial cameras for the stereoscopic vision implementation, driven by a servomotor that controls elevation and pan; and four digital encoders placed on the axes of the DC motors. Audio-visual interaction through the LCD screen will be provided using computer graphics synthesis combined with voice capture and synthesis for a face avatar. The actuator subsystem consists of four DC motors with 5.95:1 planetary gearboxes. Four H-bridges are used to connect the DC motors to their respective velocity controllers.

The computational architecture, which is responsible for the implementation of the perception and control subsystems, will be divided into two layers: a low level and a high level. The low-level layer will be based on the PIC32 from Microchip and will be used for the implementation of the dynamic and kinematic control strategies of the actuators. The real-time operating system kernel will be implemented from a structure based on Petri nets using the FreeRTOS package. The high-level layer will be used for trajectory control and for advanced control laws for navigation and perception. The high-level processing system is able to designate actions in the low layer using simultaneous movement control of the four DC motors. This layer is implemented on a PC-104 with an embedded Linux operating system and an Intel Core 2 Duo processor (2.26 GHz) with 4 GB of RAM, with support for SPX modules that allow increasing the number of analog and digital inputs.

Fig. 3: Three levels of the architecture for the concurrent execution of tasks in real time. The arrows indicate the flow of information in the three modules of the architecture. The blocks indicate the components and processes running on each module.

Fig. 4: Perception subsystem based on the approach of concurrent execution of tasks for the architecture of HiBot.

III. Perception System and the HiBot Architecture

Since the central topic of this paper is the vision system, we deal principally with the perception within the HiBot architecture. The perception system aims at the following activities:
i. Recognize humans and/or objects in the environment;
ii. Assess the state of the environment in order to help the navigation (movement detection).

The implementation of the perceptual system is designed within the approach of an architecture for concurrent execution of tasks with a hybrid structure, which improves system performance in task synchronization and response time, contrary to what happens with centralized architectures [10], [13], [2], [3]. Fig. 3 shows the architecture based on three basic processes. The perception system consists of acquisition modules that are responsible for extracting information from the cameras, distance sensors and microphones [10], [13], as shown in Fig. 4.
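As a rough illustration of this concurrent-acquisition idea, the sketch below runs each acquisition module as an independent thread that pushes time-stamped readings into a shared queue consumed by the perception process. It is only a minimal Python sketch under assumed interfaces: the read_camera, read_distance_sensor and read_microphone functions are hypothetical placeholders, not part of the HiBot code.

```python
import queue
import threading
import time

def acquisition_task(name, read_fn, out_queue, period_s):
    """Periodically sample one source and publish (timestamp, name, value)."""
    while True:
        out_queue.put((time.time(), name, read_fn()))
        time.sleep(period_s)

def read_camera():          # hypothetical placeholder for a frame grab
    return "frame"

def read_distance_sensor(): # hypothetical placeholder for a SHARP GP2D12 reading
    return 0.42

def read_microphone():      # hypothetical placeholder for an audio buffer
    return b"\x00" * 1024

if __name__ == "__main__":
    perception_queue = queue.Queue()
    modules = [("camera", read_camera, 0.033),
               ("distance", read_distance_sensor, 0.050),
               ("microphone", read_microphone, 0.100)]
    for name, fn, period in modules:
        threading.Thread(target=acquisition_task,
                         args=(name, fn, perception_queue, period),
                         daemon=True).start()
    # The perception process consumes the readings as they arrive.
    for _ in range(10):
        print(perception_queue.get())
```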
The artificial vision system consists of an acquisition system for stereoscopic images, controlled by two servomotors as shown in Fig. 1. With the information from the camera subsystem, an image frame is formed and sent to a sorter that aims to identify whether the environmental element near the robot is a human, an animal, or an object, with or without movement. The idea is to implement this subsystem using the SVM (Support Vector Machine) classification technique and sensor fusion, both approaches for object recognition and classification [17].

A. HiBot Architecture

To implement the architecture of the HiBot, a hybrid approach of three levels is used: a Cognitive level, an Associative level and a Reactive level, all implemented with a concurrent programming approach, as shown in Fig. 3. The use of concurrent processes is due to the fact that the robot is responsible for locating itself autonomously in the environment without receiving any information about its location or about the position of environmental factors, as in dynamic environments [11]. Instead, the perception of the environment, localization, object recognition and mapping of the environment are the responsibility of the robot [10], [21], [22], [23].

According to Fig. 3, the RTM (Real Time Manager) process is responsible for the perception and action activities. The LCTS (Linker and Coordinated Tasks Synchronizer) process is responsible for the communication between the internal processes of the robot and their synchronization. The MMB (Manager of Motivation and Behavior) process is responsible for choosing the behavior of the human interaction through emotions.

1) Cognitive level: The cognitive level is responsible for choosing the learning tasks for the appropriate behavior. The tasks of this level will be implemented by the MMB process, which uses a symbolic knowledge base to encapsulate messages containing the perception information received by the robot, and a motivation process in order to organize and keep a sequence of behaviors/abilities in an emotional model that incorporates basic emotions (i.e., sadness, anger, joy, fear, neutral) [13]. When a task is being performed and the cognitive level receives information about the current state of the environment, it checks whether the current state of the environment allows the current task to continue [10].

2) Associative level: The associative level is implemented through the LCTS process and its main responsibility is to carry out the local tasks provided by the cognitive level. For each task, the LCTS process chooses the sequence of adequate parameters for the behaviors of the reactive level, so that the task can be performed successfully. With the perception information received, the associative level applies an inference engine with fuzzy rules to the frame received from the perception system, in order to classify the current state of the environment and feed the knowledge base of the cognitive level, which then decides which plans are applicable to the various environmental situations. That classification generates a symbolic description of the current state of the environment, which is sent by messages to the cognitive level.

3) Reactive level: The reactive level will be implemented in the RTM process and its main function will be the implementation of the basic controllers for the robot locomotion. These basic controllers will be encapsulated and organized as behaviors in a real-time kernel. This level has the responsibility, through the perception module, of collecting information about the environment as shown in Fig. 4 (using encoders and distance sensors), about human faces, and about the localization of elements in the environment from the information acquired by the camera and microphone [4].

IV. Implemented Vision System

With the purpose of achieving items i) and ii) in Section III, the vision mechanism considered as part of the perception subsystem project for the HiBot was divided into: i) face detection and recognition, ii) object detection and iii) tracking of moving objects. For the vision system the Viola and Jones method [19], which is based on Haar features [9], [5], was used.

For face recognition the Eigenfaces technique, based on PCA (Principal Component Analysis), was used. PCA is a feature extraction technique well suited to data with a Gaussian distribution, and it is not possible to ensure that the images in this study follow that distribution. Even with that limitation, methods that use PCA achieve high success rates [7], as is the case of Eigenfaces [16]. An example of good results obtained using PCA is shown in [1], where the Eigenfaces method is used, obtaining recognition rates higher than 90%.

The object tracking module is based on size and color segmentation: the color of the object to be tracked and a base width and height are chosen. It uses the HSV (Hue, Saturation and Value) model, which is a non-linear transformation of the RGB color system.

A. Face Detection in Motion

The algorithmic scheme implemented to develop the detector proposed in this work is based on the proposal made by Viola and Jones in [19] and the subsequent modifications made by Lienhart and Maydt in [6]. This detector is based on a cascade of classifiers which is swept over the image at multiple scales and locations. Each cascade step is based on the use of simple Haar-type features, which are computed efficiently, and on a Boosting-based classifier called AdaBoost, which is able to select the most important features. From a database containing positive and negative face images (faces and non-faces), AdaBoost is applied to find the features that best differentiate positive and negative images and to distribute these features among the classifiers [19]. The efficiency of this scheme lies in the fact that the negatives (the majority of the windows to be explored) are gradually eliminated (see Fig. 5), so that the first steps remove large numbers of them (the easiest) with little processing. Thus, the final steps can spend the time necessary to classify the most difficult cases correctly.

Fig. 5: Different types and orientations of the Haar features proposed by Viola and Jones (2001) and Lienhart and Maydt (2002). The white regions are computed with positive sign and the black regions with negative sign.

The original detector [18] uses three kinds of features (numbers 1, 2 and 5 in Fig. 5) with two possible orientations (vertical and horizontal). The Lienhart and Maydt detector proposed in [6] adds two new kinds of features (types 3 and 4), removes type 5, and introduces two new diagonal orientations (45° and 135°).
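To make the detection step concrete, the following is a minimal sketch of a cascade-based face detector using the OpenCV Python binding, which wraps the same Haar-cascade machinery referenced above. The cascade file and the detection parameters are assumptions (OpenCV ships several stock frontal-face cascades); this is not the detector trained for this work.

```python
import cv2

# Load a pre-trained Haar cascade (stock file shipped with opencv-python).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)          # USB camera, as in the HiBot setup
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # The detector sweeps the image at multiple scales and locations;
    # a window is accepted only if it passes every stage of the cascade.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                     minNeighbors=5, minSize=(30, 30))
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```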
B. Eigenfaces Method for Face Recognition

The Eigenfaces method was first presented in [16]. The authors' proposal was to use PCA to extract the attributes of a face image. Each image can be represented as a linear combination of eigenfaces, and an approximate version of each image can be obtained using the best set of eigenfaces, which are the eigenvectors with the largest eigenvalues, i.e., the directions of highest variance in the face set. The best set of eigenfaces spans a subspace of dimension M, called the face space [5]. The steps of the training and face recognition algorithms are shown in Algorithms 1 and 2.

Algorithm 1 Training of the Eigenfaces Method
Require: Image set $S = \{\Gamma_1, \Gamma_2, \Gamma_3, \ldots, \Gamma_M\}$, $\Gamma_i \in \mathbb{R}^M$.
Ensure: $M$ orthonormal vectors.
1: Obtain the mean image $\Psi = \frac{1}{M}\sum_{n=1}^{M}\Gamma_n$.
2: for each $\Gamma_i$ do
3:   Find the difference $\Phi_i = \Gamma_i - \Psi$.
4: end for
5: Calculate the covariance matrix $C = \frac{1}{M}\sum_{n=1}^{M}\Phi_n\Phi_n^{T} = AA^{T}$, where $A = [\Phi_1\,\Phi_2\,\ldots\,\Phi_M]$.
6: for $k = 1$ to $M$ do
7:   Calculate $\lambda_k = \frac{1}{M}\sum_{n=1}^{M}(\mu_k^{T}\Phi_n)^2$.
8: end for
9: Solve $\max\{\lambda_1, \ldots, \lambda_M\}$ subject to $\mu_j^{T}\mu_k = \delta_{jk} = \begin{cases} 1, & j = k,\\ 0, & j \neq k. \end{cases}$
10: return the set $\{\mu_k\}$, for $k = 1, \ldots, M$.

Algorithm 2 Face Recognition
Require: Image $\Gamma_x \in \mathbb{R}^M$.
Ensure: Recognized face $\Gamma_x \in S$.
1: while $\varepsilon_k \ge$ Tolerance do
2:   for $k = 1$ to $M$ do
3:     Calculate $w_k = \mu_k^{T}\Phi_i$.
4:   end for
5:   Define $\Omega_k = [w_1, w_2, \ldots, w_M]^{T}$.
6:   $\varepsilon_k = \lVert \Omega - \Omega_k \rVert^{2}$.
7: end while
8: return the recognized face $\Gamma_x$.

The input face is considered to belong to a class if $\varepsilon_k$ is below a threshold; in that case the face image is considered a known face. If the difference is above this threshold but below a second threshold, the image is classified as an unknown face. If the input image is above both limits, the image is considered a non-face.
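As a compact illustration of Algorithms 1 and 2, the sketch below trains an eigenface basis with NumPy and classifies a probe image by its distance in face space. It is only a sketch under assumptions: the training images are flattened gray-scale vectors of equal length, the number of retained components and the two thresholds are illustrative values, the explicit covariance matrix of Algorithm 1 is replaced by an SVD of the difference matrix, and the per-class loop of Algorithm 2 is folded into one vectorized distance computation.

```python
import numpy as np

def train_eigenfaces(images, num_components=20):
    """images: (M, N) array, one flattened gray-scale face per row."""
    mean_face = images.mean(axis=0)                     # Psi
    diffs = images - mean_face                          # Phi_i = Gamma_i - Psi
    # Orthonormal eigenfaces via SVD of the difference matrix
    # (equivalent to, and cheaper than, forming C = A A^T explicitly).
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    eigenfaces = vt[:num_components]                    # mu_k, one per row
    weights = diffs @ eigenfaces.T                      # Omega_k for each training face
    return mean_face, eigenfaces, weights

def recognize(probe, mean_face, eigenfaces, weights,
              known_thresh=2500.0, face_thresh=5000.0):  # illustrative thresholds
    omega = (probe - mean_face) @ eigenfaces.T           # project into face space
    dists = np.linalg.norm(weights - omega, axis=1)      # epsilon_k per class
    k = int(np.argmin(dists))
    if dists[k] < known_thresh:
        return ("known face", k)
    if dists[k] < face_thresh:
        return ("unknown face", None)
    return ("non-face", None)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.random((30, 100 * 100))                  # stand-in for 100x100 face images
    mean_face, eigenfaces, weights = train_eigenfaces(train)
    print(recognize(train[3], mean_face, eigenfaces, weights))
```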
C. Detection and Object Tracking

The proposed method for detection and object tracking is based on color and size segmentation, using the HSV model. The HSV model uses the concepts of chrominance (hue), saturation and luminance (brightness). The hue represents the actual color, the saturation represents the degree of purity, and the luminance carries the brightness information of the color [7]. The HSV model keeps color and luminance information less correlated than the RGB model; therefore, the HSV model is more appropriate for color-based segmentation [8].

To transform from the RGB space to the HSV space, initially all the components (R, G and B) are normalized to the range [0, 1]. Then the maximum and minimum values $M_{RGB} = \max(R, G, B)$ and $m_{RGB} = \min(R, G, B)$ are calculated, followed by the H, S and V values:

$$V = M_{RGB}, \qquad S = \frac{M_{RGB} - m_{RGB}}{M_{RGB}},$$

$$H = \begin{cases}
60\,\dfrac{G-B}{M_{RGB}\,S}, & M_{RGB} = R,\; G \ge B,\\[6pt]
60\,\dfrac{G-B}{M_{RGB}\,S} + 360, & M_{RGB} = R,\; G < B,\\[6pt]
60\,\dfrac{B-R}{M_{RGB}\,S} + 120, & M_{RGB} = G,\\[6pt]
60\,\dfrac{R-G}{M_{RGB}\,S} + 240, & M_{RGB} = B.
\end{cases}$$

The result gives the hue ranging from 0 to 360, indicating the angle on the color circle where the hue (H) is defined. The saturation and brightness are set between 0.0 and 1.0, representing the lowest and highest possible values.
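A direct transcription of this conversion into Python might look like the sketch below. It is only a minimal illustration: it assumes normalized R, G, B in [0, 1] and returns H = 0 by convention for the gray case, where the saturation (and hence the hue) is undefined.

```python
def rgb_to_hsv(r, g, b):
    """Convert normalized RGB in [0, 1] to (H in degrees, S, V) as in the text."""
    m_max = max(r, g, b)
    m_min = min(r, g, b)
    v = m_max
    if m_max == 0 or m_max == m_min:
        return 0.0, 0.0, v          # gray/black: hue undefined, returned as 0
    s = (m_max - m_min) / m_max
    denom = m_max * s               # equals m_max - m_min
    if m_max == r:
        h = 60.0 * (g - b) / denom
        if g < b:
            h += 360.0
    elif m_max == g:
        h = 60.0 * (b - r) / denom + 120.0
    else:                           # m_max == b
        h = 60.0 * (r - g) / denom + 240.0
    return h, s, v

if __name__ == "__main__":
    print(rgb_to_hsv(1.0, 0.0, 0.0))   # pure red -> (0.0, 1.0, 1.0)
    print(rgb_to_hsv(0.0, 0.5, 0.5))   # cyan     -> (180.0, 1.0, 0.5)
```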
The algorithm for detecting and tracking objects is given below:

Algorithm 3 Object Tracking
Require: Initialization of the camera.
Ensure: Object tracked.
1: Set the object color to be detected: $S_{RGB} = (R, G, B)$.
2: Set the object size to be detected: $Z_{RGB} = (\mathrm{minWidth}, \mathrm{minHeight})$.
3: Class EuclideanColorFiltering(): the filter keeps only the color $S_{RGB}$, eliminating the rest.
4: Class GrayscaleBT709(): transforms the image into grayscale, using $C_{RGB} = (0.2125, 0.7154, 0.0721)$ as the RGB weights [15].
5: Class BlobCounter(): receives the grayscale image and extracts the number and features of the objects found.
6: Specify the object location by applying the centroid calculation to the largest object found.
7: return Object tracked.

In Algorithm 3 the classes EuclideanColorFiltering(), GrayscaleBT709() and BlobCounter() are specific image-processing tools from AForge.NET [15].
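The implementation in this work relies on the AForge.NET classes listed above; as a rough functional equivalent, the sketch below performs the same color-and-size segmentation with OpenCV in Python: threshold the frame around the target color in HSV, keep the largest blob above a minimum size, and report its centroid. The color bounds and the minimum area are illustrative values, not the parameters used in this paper.

```python
import cv2
import numpy as np

# Illustrative HSV bounds for a green target and a minimum blob size.
LOWER = np.array([40, 80, 80])
UPPER = np.array([80, 255, 255])
MIN_AREA = 400          # plays the role of minWidth x minHeight

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)             # keep only the target color
    # OpenCV 4 return signature: (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        blob = max(contours, key=cv2.contourArea)     # largest object found
        if cv2.contourArea(blob) >= MIN_AREA:
            m = cv2.moments(blob)
            cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
            cv2.circle(frame, (cx, cy), 5, (0, 0, 255), -1)  # tracked centroid
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```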
V. Results

In order to train and validate the detection method, a training/test set was created with images from a database. The images were acquired in a frontal position with respect to the camera, with a tolerance of ±30° in head rotation, in gray scale, with a resolution of 100 × 100 pixels, using a VGA camera.

Fig. 6 presents the results of face detection and recognition, using Haar features for the detection and the Eigenfaces method for the recognition. It can be seen in Fig. 6 that the proposed methods show good results, both in the detection and in the recognition of faces. The developed program shows the images and the number and names of the recognized persons; these names had been previously tagged in the training phase. The program can detect and recognize faces even in the presence of occlusions and pose variations. False Acceptance Rates (FAR) and False Rejection Rates (FRR) were calculated (as presented in [12]) and are shown in Table I.

TABLE I: Results of detection and face recognition.

Object static in frontal position
  Faces                               Detected    Recognized
  Time to Process (s)                 1.24        1.52
  Acceptance Rate for Detection (%)   1.2         -
  FRR (%)                             -           1.9
  FAR (%)                             -           2.2

Object in motion
  Faces                               Detected    Recognized
  Time to Process (s)                 1.492       1.76
  Acceptance Rate for Detection (%)   1.7         -
  FRR (%)                             -           2.8
  FAR (%)                             -           3.1

Fig. 6: Results of (a) face detection and (b) face recognition using the Eigenfaces method (panels labeled "Carolina", "Carolina and James" and "Carolina and Akson").

Fig. 7 shows the result of detection and object tracking using the method based on color segmentation, with an estimated time of 1.22 s for the complete process. For the example shown in this figure, the green, black and red colors were chosen so that the program detects objects of those colors; the second image of each pair shows the object being tracked, in this case a fan which was moving. It can be seen that the proposed method presents positive results and is as suitable as the detection and face recognition modules for a perception system for the robot under development. Failure rates associated with the tracking module are presented in Table II. Moreover, the tracking system showed the capability to recognize different colors at sizes acceptable for the resolution defined in the images.

TABLE II: Failure rates for object tracking.

  Object color    Failure rate (%)
  White           2.8
  Blue            2.8
  Yellow          4.8
  Red             3.4
  Purple          3.4
  Black           3.0
  Green           4.4

The mean execution time of the main loop is 32 ms; this time should be considered in view of the computer used in the tests, which runs a Linux operating system with an Intel Core 2 Duo processor at 2.26 GHz and 4 GB of RAM. For the algorithm implementations we used computer vision libraries from OpenCV and AForge.NET, and the GNU C Compiler (GCC) from the Free Software Foundation.
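The FAR and FRR values in Table I follow the usual definitions for a distance-based matcher; as a reminder of how such rates are computed, a minimal sketch is given below. The distances and the threshold are illustrative numbers, not the data of this experiment, and the evaluation in the paper follows the procedure of [12].

```python
def far_frr(genuine_distances, impostor_distances, threshold):
    """Standard FAR/FRR definitions for a distance-based matcher (illustrative)."""
    false_rejects = sum(d >= threshold for d in genuine_distances)
    false_accepts = sum(d < threshold for d in impostor_distances)
    frr = 100.0 * false_rejects / len(genuine_distances)
    far = 100.0 * false_accepts / len(impostor_distances)
    return far, frr

if __name__ == "__main__":
    genuine = [1200.0, 1900.0, 2600.0, 1500.0]    # distances for same-person probes
    impostor = [2400.0, 3100.0, 5200.0, 4100.0]   # distances for different-person probes
    print(far_frr(genuine, impostor, threshold=2500.0))  # -> (25.0, 25.0)
```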
Fig. 7: Results of the detection and tracking of objects using color segmentation (panels: detection and tracking of the green, black and red objects).

VI. CONCLUSIONS

This paper presented the preliminary results of a real-time vision system that will be part of a perceptual system for an omnidirectional mobile platform intended to interact with humans. It includes multiple face detection using Haar features, multiple face recognition using the Eigenfaces technique, and object detection and tracking using color segmentation and size information. The results presented during the evaluation were satisfactory, according to the data presented in Tables I and II. Analyzing these data, we can conclude that the vision system is well suited for use in interaction with robots and in other applications of control systems based on visual feedback. We are currently investigating in more detail the issues of robustness to changes in head size and orientation, and sensitivity to light changes, in order to improve the vision system.

ACKNOWLEDGMENT

The authors of this paper would like to thank the following institutions for the support given to this research: FAPESB - Fundação de Amparo à Pesquisa do Estado da Bahia; CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico; and CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior.

References

[1] A. F. Abate, M. Nappi, D. Riccio, and G. Sabatino. 2D and 3D face recognition: A survey. Pattern Recognition Letters, vol. 28, pp. 1885-1906, 2007.
[2] R. C. Arkin, M. Fujita, T. Takagi and R. Hasegawa. Ethological modeling and architecture for an entertainment robot. IEEE International Conference on Robotics and Automation, Seoul, Korea, pp. 1-6, 2001.
[3] R. C. Arkin, M. Fujita, T. Takagi and R. Hasegawa. An ethological and emotional basis for human-robot interaction. Robotics and Autonomous Systems, 2003.
[4] C. L. Breazeal. Designing Sociable Robots. MIT Press, Cambridge, MA, 2003.
[5] K. Delac, M. Grgic and P. Liatsis. Appearance-based statistical methods for face recognition. 47th International Symposium ELMAR, pp. 151-158, 2005.
[6] R. Lienhart and J. Maydt. An extended set of Haar-like features for rapid object detection. International Conference on Image Processing, vol. 1, pp. 900-903, 2002.
[7] G. Matos, M. Mendonça, E. Freire, J. Montalvão and L. Matos. Sistema de visão artificial baseado em detecção de cores (para sistemas de controle de robôs celulares com realimentação visual). VIII Simpósio Brasileiro de Automação Inteligente, Florianópolis, SC, 2007.
[8] Y. Ohta, T. Kanade and T. Sakai. Color information for region segmentation. Computer Graphics and Image Processing, vol. 13, no. 3, pp. 222-241, 1980.
[9] C. P. Papageorgiou, M. Oren and T. Poggio. A general framework for object detection. VI International Conference on Computer Vision, pp. 555-562, 1998.
[10] C. Policastro, R. Romero and G. Zuliani. Robotic architecture inspired on behavior analysis. International Joint Conference on Neural Networks, pp. 1482-1487, 2007.
[11] J. Reif and M. Sharir. Motion planning in the presence of moving obstacles. 26th Annual Symposium on Foundations of Computer Science, pp. 144-154, 1985.
[12] I. Ribeiro, G. Chiachia and A. N. Marana. Reconhecimento de faces utilizando análise de componentes principais e a transformada census. VI Workshop de Visão Computacional, pp. 25-30, 2010.
[13] M. A. Salichs, R. Barber, A. M. Khamis, M. Malfaz, J. F. Gorostiza, R. Pacheco, R. Rivas, A. Corrales, E. Delgado and D. Garcia. Maggie: A robotic platform for human-robot social interaction. IEEE Conference on Robotics, Automation and Mechatronics, pp. 1-7, 2006.
[14] B. Scassellati. Investigating models of social development using a humanoid robot. International Joint Conference on Neural Networks, vol. 4, pp. 2704-2709, 2003.
[15] L. M. Surhone, M. T. Tennoe and S. F. Henssonow. AForge.NET: Artificial Intelligence, Computer Vision, .NET Framework. Beta Script Publishing, 2010.
[16] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[17] V. N. Vapnik. The Nature of Statistical Learning Theory. Second ed., Springer, 1999.
[18] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 511-518, 2001.
[19] P. Viola and M. Jones. Robust real-time face detection. International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[20] T. Sogo, H. Ishiguro and T. Ishida. Mobile robot navigation by a distributed vision system. Journal of the Robotics Society of Japan, vol. 17, pp. 1009-1016, 1999.
[21] W. Ponweiser, M. Ayromlou, M. Vincze, C. Beltran, O. Madsen and A. Gasteratos. RobVision: vision based navigation for mobile robots. International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 109-114, 2001.
[22] J. M. Motta and S. R. Gonsalves. Sistema de rastreamento por visão em robôs móveis com otimização por projeto fatorial. Revista Iberoamericana de Ingeniería Mecánica, vol. 12, no. 1, pp. 25-34, 2008.
[23] G. N. DeSouza and A. C. Kak. Vision for mobile robot navigation: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 237-267, 2002.
