
2018 International Conference on Intelligent Systems (IS)

Voice Controlled Robotic Manipulator with Decision Tree and Contour Identification Techniques

Daniel Jeswin Nallathambi, Department of Computer Science and Engineering, SSN College of Engineering, Chennai, India, danieljeswin@gmail.com
Arul Thileeban S, Department of Computer Science and Engineering, SSN College of Engineering, Chennai, India, arulthileeban023@gmail.com
Sailesh Kumar Gajesh, Department of Computer Science and Engineering, SSN College of Engineering, Chennai, India, skg10408@gmail.com
D Thenmozhi, Department of Computer Science and Engineering, SSN College of Engineering, Chennai, India, theni_d@ssn.edu.in

Abstract— Robotic manipulators are widely employed for a number of industrial and commercial purposes today. Most of these manipulators are preprogrammed with specific instructions that they follow precisely. With such forms of control, the adaptability of the manipulators to dynamic environments as well as to new tasks is reduced. In order to increase the flexibility and adaptability of robotic manipulators, alternative control mechanisms need to be pursued. In this paper we present one such mechanism: a speech-based control system combined with machine perception. The manipulator is also equipped with a vision system that allows it to recognize the required objects in the environment. Voice commands issued as plain English statements by the user are converted into precise instructions that incorporate the environment sensed by the robot and are then executed by the manipulator. This combination of speech and vision systems allows the manipulator to adapt itself to changing tasks as well as environments without any reprogramming.

Keywords—Robotic Manipulator; Decision Tree; Natural Language Processing; Object Detection; Speech to Text.

I. INTRODUCTION

Robotics is a rapidly developing field in the modern world. Robots have had an effect in almost every field and have brought about revolutionizing changes in certain fields like manufacturing, healthcare and search and rescue, to name a few. One of the most popular types of robots currently in use is the robotic manipulator. Manipulators have revolutionized manufacturing processes by almost removing the necessity for human involvement. Such robotic manipulators are pre-programmed with the tasks that they are expected to carry out, and these tasks are carried out with a high degree of accuracy. Now, with the advent of artificial intelligence in robotics, robots are more intelligent and can be trained to take their own decisions depending on the environment. Such robots have a much wider range of applications.

In this paper a voice controlled robotic arm is discussed. This arm was designed and built using the OWI Robotic Arm Edge as the hardware platform. While voice-controlled robots exist, this technology is yet to be applied to robotic manipulators. With the integration of voice control into manipulators, they become more flexible and can adapt themselves to different environments, thus enlarging their potential areas of application. For example, in the case of manufacturing robots, a minor change in the manufacturing process would not require reprogramming of the robot. Other applications include areas like healthcare, by serving as assistants to surgeons during surgery, the handling of hazardous material in waste disposal, and so on.

II. RELATED WORK

Robotic manipulators have been in existence for a while and have found many useful applications in fields like industrial manufacturing, automation, etc. In order for manipulators to be deployed in other fields like those mentioned in the introduction, the manipulator would have to be capable of working in a dynamic environment, constantly reconfiguring itself to accomplish different tasks. Manipulators used in manufacturing and automation processes, such as those in [1] and [2], are designed to perform a set of repetitive tasks accurately, and this is not sufficient for environments with dynamically changing tasks. Some manipulators that allow for such operation in the targeted environments are proposed in [3]-[6]. These systems are gesture-controlled. The problem with such systems is that they require the presence of a human whose actions they mimic. For applications like surgery, where high accuracy is required, gesture-based control systems are not a viable option.

Voice controlled robotic manipulators overcome this inconvenience. The user only supplies the object of interest or the target of the operation. The manipulator intelligently figures out the best path to execute the instruction. A camera allows the manipulator to track the target and thus execute instructions with a high degree of accuracy. Such systems require filtering of the noise emanating from the environment, as the voice commands issued are critical to operation and must be captured as accurately as possible. In [7] and [8], some techniques for this are discussed using Hidden Markov Models and fuzzy neural networks.



The rest of the paper is organized as follows. In the next section the hardware and software components, along with the techniques used, are discussed. In Section IV the architecture of the system and its working are elaborated in detail. This is followed by a discussion of the experiments carried out, the results obtained and their interpretation. Finally, we conclude and discuss the scope for future improvement.

III. COMPONENTS

In this section the various hardware and software components used in developing the arm are discussed. The algorithms used are also detailed in this section. The methodology and working of the system as a whole are dealt with in the next section.

A. OWI Robotic Arm Edge
The OWI Robotic Arm Edge is a 5 DOF manipulator. It has 2 DOF for the shoulder, i.e. the base, and 1 DOF each for the elbow, wrist and gripper. It offers a USB interface, making it easy to program and use. Low level commands to the servo motors controlling the arm can be given through this USB interface. In this system, the Raspberry Pi is responsible for interfacing with the arm, sensors, camera and microphone, and for giving the instructions to be executed by the arm.

B. Raspberry Pi 3
The Raspberry Pi 3 is a low-cost single-board computer with considerable processing capability. It comes with a 1.4 GHz 64-bit processor and support for communication interfaces like USB, WiFi, Bluetooth, etc. Its compactness and its processing and communication capabilities have made the Pi a popular choice for embedded and simple robotic systems. For this system, the Raspberry Pi is the core hardware component. It interfaces with the microphone, camera and ultrasonic sensor, processes the data coming from them and outputs the appropriate machine level instructions to the arm through the USB interface. The default operating system for the Raspberry Pi, Raspbian, is used.
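For illustration, the following minimal Python sketch shows how such low-level commands could be sent to the arm's USB controller from the Raspberry Pi using the pyusb library. The vendor/product IDs and the three-byte command format are values commonly reported for the OWI USB interface kit; they are assumptions here, as the paper does not list its driver code.

# Minimal sketch: driving the OWI Robotic Arm Edge USB controller with pyusb.
# The vendor/product IDs and 3-byte command layout are assumed, not taken from
# the paper, and may differ for other controller revisions.
import time
import usb.core   # pyusb

ARM_VENDOR_ID = 0x1267    # assumed vendor id of the OWI USB interface
ARM_PRODUCT_ID = 0x0000   # assumed product id

def send_command(dev, byte1, byte2, byte3, duration):
    """Send one raw 3-byte motor command, hold it for `duration` seconds, then stop."""
    dev.ctrl_transfer(0x40, 6, 0x100, 0, [byte1, byte2, byte3])
    time.sleep(duration)
    dev.ctrl_transfer(0x40, 6, 0x100, 0, [0x00, 0x00, 0x00])   # all motors off

arm = usb.core.find(idVendor=ARM_VENDOR_ID, idProduct=ARM_PRODUCT_ID)
if arm is None:
    raise RuntimeError("OWI arm not found on the USB bus")

send_command(arm, 0x00, 0x01, 0x00, 1.0)   # e.g. rotate the base for one second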

C. Ultrasonic Sensor
The ultrasonic sensor is a low-cost distance measurement sensor. It transmits ultrasonic pulses at a definite frequency. These pulses bounce off targets, and the sensor measures the time taken for the echoes to return and uses it to estimate the distance to the object. A fusion of the data from the sensor and the camera allows for effective object tracking. While RGBD cameras like the Kinect can be used for the same purpose, they are much more expensive and the data captured is too extensive to be analyzed in real time on low power devices like the Raspberry Pi.
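The paper does not name a specific sensor model; the sketch below shows a typical time-of-flight measurement for an HC-SR04 style sensor wired to the Raspberry Pi GPIO header. The sensor model and pin assignments are assumptions made for illustration.

# Sketch of a time-of-flight distance measurement with an HC-SR04 style sensor
# on Raspberry Pi GPIO. The pin assignments below are arbitrary assumptions.
import time
import RPi.GPIO as GPIO

TRIG_PIN = 23   # assumed wiring
ECHO_PIN = 24

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG_PIN, GPIO.OUT)
GPIO.setup(ECHO_PIN, GPIO.IN)

def measure_distance_cm():
    # A 10 microsecond pulse on TRIG starts one measurement
    GPIO.output(TRIG_PIN, True)
    time.sleep(0.00001)
    GPIO.output(TRIG_PIN, False)

    pulse_start = pulse_end = time.time()
    # ECHO stays high for the time the ultrasonic burst spends in flight
    while GPIO.input(ECHO_PIN) == 0:
        pulse_start = time.time()
    while GPIO.input(ECHO_PIN) == 1:
        pulse_end = time.time()

    # distance = speed of sound (~34300 cm/s) * time of flight / 2
    return (pulse_end - pulse_start) * 34300 / 2

print("Distance: %.1f cm" % measure_distance_cm())
GPIO.cleanup()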
D. Microphone
The microphone converts audio signals into electrical signals that can be processed by a computing system. In this system, the voice commands issued by the user are captured by a microphone and converted to text using the Google Speech to Text API. This forms the Speech to Text module of the system.

E. NLTK and NLP module
NLTK is a natural language processing library written in Python. It is popular among researchers for NLP research. It is one of the core libraries that this system interfaces with to provide the functionality of understanding and executing the voice commands issued by the user.

The NLP module of the system receives the text from the API call to the Google Speech to Text API. The main task of this module is to identify the specific instruction issued by the user within the text it receives. The task could be to pick or drop objects, move objects or simply move the arm into a specific configuration. Therefore, the system needs to obtain the objects specified by the user, if any, and the operation to be performed on them. It also needs to be able to distinguish between commands involving objects and those involving only the arm. First, the sequence of words received is parsed into tokens and passed to the decision tree, whose nodes are composed of patterns and for which a pattern matching mechanism is implemented. The decision tree used here is constructed from scratch based on the required keywords, assisted by WordNet. Along with the basic matching, to increase the efficiency and to reduce errors, the WordNet interface of Python's NLTK module is used. It contains words and a set of functions to identify the basic structure and grammar of sentences. Tokenization can be carried out easily and more accurately through this module. Actions are associated with each pattern matched, thus mapping the input text to a specific action to be performed. The set of actions supported by the system is shown in Fig. 1. Composite actions are also defined; they are broken down into simple prompts and passed to the object detection and tracking module or the controller module, depending on the instruction. The attributes for pick up are the object, its color and its shape. The attributes for move arm are the direction, i.e. up, down, left or right, along with the distance. Instructions with coordinates specified are sent to the tracking submodule of the object detection and tracking module. The PICK UP {ATTR} instruction is sent to the object detection submodule. The remaining instructions are sent directly to the controller module for execution.

INSTRUCTION            ACTION
MOVE TO {x,y,z}        Move to the corresponding point in relative 3D space
MOVE {DIRECTION}       Move in that direction
PICK UP FROM {x,y,z}   Pick up the object from the point defined
PUT DOWN TO {x,y,z}    Drop the object on the point defined
PICK UP {ATTR}         Pick up the defined object based on the attribute
DROP                   Drop the held object
GRAB                   Do a grab action

Fig. 1. Mapping of instructions to actions
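As an illustration of the pattern matching described above, the following sketch tokenizes a transcribed command with NLTK and uses WordNet synonyms to map it to one of the actions in Fig. 1. The pattern table and helper functions are illustrative assumptions, not the authors' hand-built decision tree.

# Sketch: mapping a transcribed command to an action from Fig. 1 using NLTK
# tokenization and WordNet synonyms. The pattern table is an assumption; the
# paper's actual decision tree is hand-built and not listed.
import nltk                      # requires the 'punkt' and 'wordnet' corpora
from nltk.corpus import wordnet

# Illustrative keyword patterns for a few of the supported actions
PATTERNS = {
    "PICK UP": ["pick", "grab", "take"],
    "DROP": ["drop", "release"],
    "MOVE": ["move", "go", "shift"],
}
COLORS = {"red", "green", "blue", "yellow"}
SHAPES = {"cube", "ball", "sphere", "chalk"}

def expand_with_synonyms(keywords):
    """Grow a keyword list with WordNet lemma names so near-synonyms also match."""
    expanded = set(keywords)
    for word in keywords:
        for synset in wordnet.synsets(word, pos=wordnet.VERB):
            expanded.update(lemma.replace("_", " ") for lemma in synset.lemma_names())
    return expanded

def parse_command(text):
    """Map a transcribed sentence to (action, attributes); (None, {}) if no match."""
    tokens = [t.lower() for t in nltk.word_tokenize(text)]
    for action, keywords in PATTERNS.items():
        if expand_with_synonyms(keywords) & set(tokens):
            attrs = {"color": next((t for t in tokens if t in COLORS), None),
                     "shape": next((t for t in tokens if t in SHAPES), None)}
            return action, attrs
    return None, {}

print(parse_command("please pick up the blue chalk"))
# ('PICK UP', {'color': 'blue', 'shape': 'chalk'})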
F. OpenCV and object detection and tracking module
OpenCV is a cross-platform, open source library for image processing and computer vision, written in C++. It contains optimized implementations of various standard image processing and machine learning algorithms. In this system, the object recognition and tracking algorithm is implemented using this library.

The object detection and tracking module is made up of two submodules, i.e. the detection submodule and the tracking submodule. The object detection submodule receives the PICK UP instruction along with its attributes. It also receives input from the camera and the ultrasonic sensor. These sensory inputs are required to locate and track the required object. The color received as input from the NLP module is used to filter the image, removing unnecessary portions which do not satisfy this constraint; matching regions are detected by comparing colors in the HSV space. After this, shapes are detected to home in on the right object. Shapes are detected using contour detection. The detected contours are compared on roughly how many lines are needed to encompass the object. This, however, does not work when the command is to search for spherical objects, as spheres are seen as circles in the 2D images captured by the camera. So, instead of contour detection, Hough circle detection is used for this special case. If there are multiple targets at the end of this detection, the target with the largest area, constrained by the capacity of the gripper, is chosen.
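The detection steps described above can be sketched as follows with OpenCV: an HSV color mask, polygon approximation of contours to estimate the number of bounding edges, and a Hough circle fallback for spherical targets. The HSV ranges and thresholds are assumed values, not those used by the authors.

# Sketch of the detection steps: HSV color filtering, contour/shape matching,
# and a Hough-circle fallback for spheres. Threshold values are assumptions.
import cv2
import numpy as np

# Assumed HSV range for "blue"; real values depend on the camera and lighting.
HSV_RANGES = {"blue": (np.array([100, 120, 70]), np.array([130, 255, 255]))}

def detect_object(frame, color, shape):
    """Return the (x, y) image position of the requested object, or None."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower, upper = HSV_RANGES[color]
    mask = cv2.inRange(hsv, lower, upper)              # keep only the requested color

    if shape in ("ball", "sphere"):                    # spheres appear as circles in 2D
        circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, dp=1.2,
                                   minDist=40, param1=80, param2=30)
        if circles is None:
            return None
        x, y, _ = max(circles[0], key=lambda c: c[2])  # largest detected circle
        return int(x), int(y)

    # OpenCV 4.x return signature; earlier versions also return the image
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    expected_sides = {"cube": 4, "block": 4}.get(shape)  # rough edge count per shape
    candidates = []
    for cnt in contours:
        approx = cv2.approxPolyDP(cnt, 0.04 * cv2.arcLength(cnt, True), True)
        if expected_sides is None or len(approx) == expected_sides:
            candidates.append(cnt)
    if not candidates:
        return None
    target = max(candidates, key=cv2.contourArea)      # largest target wins
    m = cv2.moments(target)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])  # centroid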
In order to improve the accuracy when dealing with images from low quality webcams, the images are first denoised and sharpened. Image denoising is especially essential when the object is close to the camera, which occurs as the arm moves closer to its target. Image sharpening is carried out to overcome blurring in low resolution cameras, which might otherwise hinder object detection; it also allows attention to be focused on certain parts of the image. Once the object has been detected, its centroid is found. The distance to the object is measured by the ultrasonic sensor. The ultrasonic sensor is mounted on top of the camera and angled so that, within a certain range, the objects seen by the camera and the sensor are the same. Therefore, using a fixed transformation, the actual distance to the object can be found. This module operates in two modes. In search mode, it has received the instruction from the NLP module but has not located the object yet. In this mode, it issues instructions that make the arm sweep out the search area in circles, shortening the radius each time. After a predetermined number of such sweeps, if the object is still not located, failure is returned. In track mode, it has found the required object and it outputs the data required by the tracking submodule to track the object.

The other submodule of this module is the tracking submodule. It receives coordinates directly from the NLP module, or input from the object detection submodule, to track the object. This is the submodule that actually sends instructions to the controller module to control the robotic arm. When it receives instructions from the NLP module, or from the detection submodule in search mode, it uses PID control to move the arm to the required location. When it receives data from the detection submodule in track mode, it uses a form of PID control in which the centroid of the object is kept close to the center of the frame while the arm simultaneously moves towards the object. This determines whether the arm should move to the left, right, up or down, and the distance measured by the ultrasonic sensor determines whether the arm should move to the front or back. Thus, the direction and the distance to be moved are output by the tracking submodule.

G. Controller module
The controller is the software component that issues the low-level instructions that control the arm. The controller can receive input from the NLP module or from the object detection and tracking module. For instructions that involve picking up objects, the object detection and tracking module needs to specify the direction in which the arm needs to move, along with the distance. For move instructions, the direction and distance are obtained from the NLP module. Drop instructions are also received directly from the NLP module. This module then issues the appropriate instructions that control the servo motors of the arm.
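To make the track-mode behaviour concrete, the following is a simplified, proportional-only sketch of the centering logic: the offset of the object centroid from the frame center selects left/right and up/down corrections, while the ultrasonic distance decides between moving forward and grabbing. Gains and thresholds are assumptions, and the full PID terms are omitted.

# Simplified sketch of the track-mode centering logic: keep the object centroid
# near the frame center and use the ultrasonic distance for forward motion.
# Gains and thresholds are assumed values; only the proportional term is shown.
FRAME_WIDTH, FRAME_HEIGHT = 640, 480
CENTER_X, CENTER_Y = FRAME_WIDTH // 2, FRAME_HEIGHT // 2
PIXEL_DEADBAND = 20      # ignore small centroid offsets (pixels)
GRIP_DISTANCE_CM = 6.0   # close enough to grab
KP = 0.002               # seconds of motor-on time per pixel of error

def track_step(centroid, distance_cm):
    """One proportional control step: returns (direction, motor-on seconds) pairs."""
    cx, cy = centroid
    error_x, error_y = cx - CENTER_X, cy - CENTER_Y
    moves = []
    if abs(error_x) > PIXEL_DEADBAND:
        moves.append(("RIGHT" if error_x > 0 else "LEFT", KP * abs(error_x)))
    if abs(error_y) > PIXEL_DEADBAND:
        moves.append(("DOWN" if error_y > 0 else "UP", KP * abs(error_y)))
    if distance_cm > GRIP_DISTANCE_CM:
        moves.append(("FORWARD", 0.2))
    else:
        moves.append(("GRAB", 0.5))
    return moves

print(track_step((420, 250), 15.0))   # [('RIGHT', 0.2), ('FORWARD', 0.2)]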
IV. SYSTEM ARCHITECTURE AND WORKING
The system architecture, showing how the hardware components and software modules discussed in the previous section interact with each other, is shown in Fig. 2.

The basic workflow of the system is as follows. When the arm is idle it is in a predefined initial configuration. The microphone continuously receives audio signals from the environment and, when the user issues a voice command, it is converted to electrical signals by the microphone as mentioned earlier. The text equivalent of these signals is then produced by the Speech to Text module using the Google API. The NLP module then uses the decision tree, supplemented by WordNet, to extract the basic actions that can be executed by the arm. As mentioned, depending on the instruction, further processing is carried out by the appropriate modules. Finally, the controller receives the required commands from either the tracking submodule of the object detection and tracking module or the NLP module. The controller module then issues the required low-level instructions to the arm through the USB interface. The task is completed when the instructions are executed successfully by the arm. All the software modules execute on the Raspberry Pi, as shown in the architecture diagram in Fig. 2.
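A minimal sketch of this top-level loop is given below. It assumes the SpeechRecognition package is used as the client for the Google Speech to Text API; the paper names the API but not a specific client library.

# Sketch of the top-level loop: capture audio, transcribe it with the Google
# Web Speech API via the SpeechRecognition package (an assumed client library),
# and hand the text to the downstream modules.
import speech_recognition as sr   # assumed client library for the Google API

recognizer = sr.Recognizer()

def listen_for_command():
    """Capture one utterance from the microphone and return its transcription."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)   # basic ambient noise filtering
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)     # Google Speech to Text
    except sr.UnknownValueError:                      # speech was unintelligible
        return None

while True:
    text = listen_for_command()
    if not text:
        continue
    print("Transcribed command:", text)
    # The NLP module (see the earlier parsing sketch) would map `text` to an
    # action here; the object detection/tracking and controller modules would
    # then carry out that action on the arm.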
Fig. 2. Architecture of the system

V. RESULTS
Testing was done by instructing the arm to detect and pick up common objects such as pieces of chalk. Fig. 3 shows the final build of the robot after the successful execution of a few commands, which led to it picking up a blue chalk from one position and rotating the arm around to another position. Intermediate results from the object detection and tracking module are shown in the figures: Fig. 4 shows the raw input image, Fig. 5 shows the color filter applied, and Fig. 6 shows the threshold image obtained after applying this filter. Finally, Fig. 7 shows the final output image after applying contour detection. As mentioned, since low quality webcams are used to capture the images, the input images are denoised, sharpened and then filtered by color, followed by contour detection. The performance of the arm and the results obtained were satisfactory, as the manipulator was able to recognize objects with reasonable accuracy. When working in other environments, the parameters of object detection and recognition may have to be tweaked to obtain improved results.

Fig. 4. Raw input image

Fig. 5. Color filtered image

Fig. 6. Threshold image

Fig. 7. Final output image after contour detection

Fig. 3. Final build after successful execution
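The denoise-and-sharpen preprocessing referred to above is not specified in detail in the paper; one plausible OpenCV realization is non-local-means denoising followed by a simple sharpening kernel, as sketched below. The filter choices and parameters are assumptions.

# Sketch of the webcam preprocessing step: denoise, then sharpen, before the
# color filtering and contour detection. Filter choices and parameters assumed.
import cv2
import numpy as np

def preprocess(frame):
    """Denoise then sharpen a webcam frame prior to color filtering."""
    denoised = cv2.fastNlMeansDenoisingColored(frame, None, 10, 10, 7, 21)
    sharpen_kernel = np.array([[0, -1, 0],
                               [-1, 5, -1],
                               [0, -1, 0]])
    return cv2.filter2D(denoised, -1, sharpen_kernel)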
VI. CONCLUSION
In this paper a voice controlled robotic manipulator is described. The OWI Robotic Arm Edge is used as the hardware base for research and development. The various hardware and software components that compose the system are described first. This is followed by a description of the architecture of the system and of how the various components are integrated and work together. Finally, the results of testing are shown.

Future developments would proceed in the direction of integrating this technology with commercially produced robotic manipulators and testing them in actual work environments. In order for this to be possible, the vision system of the manipulator needs to be capable of decoding the environments that it could face in the real world. With improvements in the area of object detection and recognition, the complex environments that such a commercial manipulator would face in the real world could be understood and interpreted by it. With the successful testing of this system, producing such a commercial robotic manipulator is the next step.

REFERENCES
[1] Saunak Saha, Amitangshu Mukherjee, Sanchoy Das and Babak Hoseini, "Intelligent task repeatability of an industrial robotic arm," in 2016 IEEE 7th Power India International Conference (PIICON), pp. 1-5.
[2] Pedro Neto, J. Norberto Pires and A. Paulo Moreira, "Accelerometer-based control of an industrial robotic arm," in RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication, pp. 1192-1197.
[3] Sagar Shirwalkar, Alok Singh, Kamal Sharma and Namita Singh, "Telemanipulation of an industrial robotic arm using gesture recognition with Kinect," in 2013 International Conference on Control, Automation, Robotics and Embedded Systems (CARE), pp. 1-6.
[4] Karlos Ishac and Kenji Suzuki, "Gesture Based Robotic Arm Control for Meal Time Care using a Wearable Sensory Jacket," in IEEE 4th International Symposium on Robotics and Intelligent Sensors, pp. 122-127.
[5] Rajesh Kannan Megalingam, Sricharan Boddupalli and K. G. S. Apuroop, "Robotic Arm Control through Mimicking of Miniature Robotic Arm," in 2017 4th International Conference on Advanced Computing and Communication Systems, pp. 1-7.
[6] Rupam Das and K. B. Shiva Kumar, "GeroSim: A Simulation Framework for Gesture Driven Robotic Arm Control using Intel RealSense," in 2016 1st IEEE International Conference on Power Electronics, Intelligent Control and Energy Systems, pp. 1-5.
[7] Rong Phoophuangpairoj, "Using Multiple HMM Recognizers and the Maximum Accuracy Method to Improve Voice-Controlled Robots," in 2011 19th IEEE International Symposium on Intelligent Signal Processing and Communication Systems, pp. 1-6.
[8] Amitava Chatterjee, Koliya Pulasinghe, Keigo Watanabe and Kiyotaka Izumi, "A particle-swarm-optimized fuzzy-neural network for voice-controlled robot systems," IEEE Transactions on Industrial Electronics, vol. 52, no. 6, pp. 1478-1489, 2005.
