
Developing a Pakistan Sign Language Translator Application Using OpenCV and CNN

Sign language is a medium of communication for the deaf community. In Pakistan, this form of communication is known as Pakistan Sign Language (PSL). The purpose of this research is to bridge the communication gap between deaf and hearing people. This study focuses on a Pakistan Sign Language translator application built with OpenCV and neural networks: a mobile application that translates sign language into English text and speech, operated simply by placing a smartphone in front of the signer. We use the OpenCV framework to capture real-time video or picture frames, preprocess them (for example, cropping and MOG background subtraction), and extract features for recognizing hand gestures; a neural network model is then trained on these features. The work spans hand recognition, gesture recognition, pattern recognition, and PSL recognition. The application applies neural networks and computer vision to interpret video of the sign language speaker, and the recognized output is later converted to speech. The biggest challenge is the number of languages spoken in the country and the many regional variations of Pakistan Sign Language: much depends on how each person signs, and every region and individual may have a slightly different version. The training corpus therefore needs to be large enough to cover this variation. The application will play a major role in building inclusion by removing the communication barriers faced by deaf people.
Keywords: Sign Language, Communication, Translation, Convolutional Neural Network (CNN)

This process can be divided into three major stages: gesture detection, gesture interpretation, and text-to-speech conversion.

Gesture Detection

Because PSL relies heavily on hand gestures, the first step in recognizing our target gestures is to detect the hands. Our goal in this stage is to discriminate and isolate the hands from the rest of the background. This is a necessary step because it simplifies the data that will be used for interpreting the gesture. To isolate the hands from the image, we will use image processing techniques to derive our target visual features, most likely skin color, silhouettes, and contours (Zabulis et al.).
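As an illustration, a minimal sketch of this segmentation step is given below, assuming OpenCV 4, a skin-color range in HSV space, and the largest contour as the hand; the threshold values and the choice between color thresholding and MOG background subtraction are assumptions to be tuned experimentally, not final design decisions.

```python
import cv2
import numpy as np

# Hypothetical HSV skin-color range; real thresholds must be tuned per dataset and lighting.
SKIN_LOWER = np.array([0, 30, 60], dtype=np.uint8)
SKIN_UPPER = np.array([20, 150, 255], dtype=np.uint8)

def segment_hand(frame_bgr):
    """Return a binary skin mask and the largest skin-colored contour (assumed to be the hand)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOWER, SKIN_UPPER)
    # Morphological opening/closing to remove speckle noise.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # OpenCV 4 returns (contours, hierarchy); OpenCV 3 returns three values.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return mask, None
    hand = max(contours, key=cv2.contourArea)
    return mask, hand

cap = cv2.VideoCapture(0)              # real-time capture from the default camera
ok, frame = cap.read()
if ok:
    mask, hand = segment_hand(frame)
    if hand is not None:
        x, y, w, h = cv2.boundingRect(hand)
        roi = frame[y:y + h, x:x + w]  # cropped hand region passed on to interpretation
cap.release()
```

An alternative, mentioned in the abstract, is to replace the color threshold with cv2.createBackgroundSubtractorMOG2() so that the moving hand is separated from a static background instead of by skin tone.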

To perform these image processing techniques, we will use OpenCV. Its Python bindings represent images as NumPy arrays, and NumPy is an optimized library for numerical operations in Python, so numerical operations on large sets of pixel values remain fast. We will start with the Python version of OpenCV for its simplicity and code readability; however, we can switch to the C++ version later if the Python version is not fast enough for real-time recognition of hand gestures. Python can also be extended with C++ so that performance-critical parts run at native speed. OpenCV additionally contains statistical machine learning functions such as the k-nearest neighbors algorithm, the naive Bayes classifier, artificial neural networks, and Support Vector Machines. On top of that, OpenCV can be combined with other NumPy-based scientific libraries such as SciPy and Matplotlib.
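To illustrate how OpenCV's built-in machine learning module could serve as a first posture classifier, here is a minimal, hedged sketch using k-nearest neighbors; the feature vectors (flattened, resized hand crops) and the random placeholder data are assumptions for illustration only.

```python
import cv2
import numpy as np

# Assumed toy setup: each training sample is a 32x32 grayscale hand crop flattened into a
# 1024-dimensional float32 vector; labels are gesture IDs for 5 hypothetical gestures.
train_samples = np.random.rand(100, 32 * 32).astype(np.float32)   # placeholder features
train_labels = np.random.randint(0, 5, (100, 1)).astype(np.float32)  # placeholder labels

knn = cv2.ml.KNearest_create()
knn.train(train_samples, cv2.ml.ROW_SAMPLE, train_labels)

# Classify a new (placeholder) hand crop by its 3 nearest neighbors.
query = np.random.rand(1, 32 * 32).astype(np.float32)
_ret, results, _neighbours, _dist = knn.findNearest(query, k=3)
print("predicted gesture id:", int(results[0][0]))
```
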
Gesture Interpretation

Interpreting gestures can be broken down into two levels of complexity: posture interpretation and gesture interpretation. Posture interpretation is the simpler case because it deals only with non-moving figures. Gesture interpretation, on the other hand, adds another level of complexity by analyzing the movement of figures. In ASL, most of the hand gestures for numbers and letters do not involve any motion once the sign is formed, which allows us to use posture interpretation approaches. Because our target vocabulary is relatively small, we can extend this approach to gesture interpretation by treating a gesture as a series of hand postures (Zabulis et al.).
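A minimal sketch of this "gesture as a series of postures" idea is given below, assuming per-frame posture labels already produced by some classifier and a simple majority vote over a sliding window of frames; the window length and voting rule are illustrative assumptions, not the final design.

```python
from collections import Counter, deque

WINDOW = 15  # assumed number of recent frames to vote over (~0.5 s at 30 fps)

def interpret_gesture(frame_labels, window=WINDOW):
    """Collapse a stream of per-frame posture labels into stable gesture labels.

    frame_labels: iterable of posture labels, one per video frame.
    Yields a gesture label whenever one posture clearly dominates the sliding window.
    """
    recent = deque(maxlen=window)
    last_emitted = None
    for label in frame_labels:
        recent.append(label)
        if len(recent) < window:
            continue
        top, count = Counter(recent).most_common(1)[0]
        # Emit only when one posture dominates the window and differs from the last output.
        if count >= 0.8 * window and top != last_emitted:
            last_emitted = top
            yield top

# Example with placeholder labels standing in for real classifier output:
stream = ["A"] * 20 + ["B"] * 25
print(list(interpret_gesture(stream)))   # expected: ['A', 'B']
```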

In essence, machine learning techniques use mathematical models of observable phenomena to provide predictions for unobserved data (Belgioioso et al.). In our case, the phenomena being modeled are the hand gestures and their corresponding meanings, and the unobserved data is the real-time video of a person signing PSL. This is a classification problem, so we can consider standard machine learning classification techniques, for example k-Nearest Neighbors, Discriminant Analysis, Support Vector Machines, and Relevance Vector Machines. Scikit-learn is a library that implements several of these techniques, which is why we will use it for our project.
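As an illustration of this classification step, here is a minimal sketch using scikit-learn's SVM on placeholder feature vectors; the feature representation (flattened hand crops), the class count, and the train/test split are assumptions for illustration, not the project's final feature design.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Placeholder dataset: 200 samples of 1024-dimensional features (e.g. flattened 32x32 crops)
# with 5 hypothetical posture classes. Real features would come from the detection stage.
X = np.random.rand(200, 1024)
y = np.random.randint(0, 5, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # RBF-kernel SVM as one candidate classifier
clf.fit(X_train, y_train)

print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```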

Text to Speech Conversion (TTS)


Text-to-speech conversion is commonly abbreviated as TTS. Festival (the Festival Speech Synthesis System) is a framework for TTS: it provides building blocks for constructing speech synthesis systems and also offers a complete text-to-speech system out of the box. Since it is written in C++, we will access it through the Python wrapper Pyfestival. Using this existing library keeps our focus on gesture recognition and reduces development time. In our project, Pyfestival will be used to convert text into English speech, where the text input comes from the gesture interpretation stage.
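A minimal sketch of this final stage is shown below. To avoid assuming details of the Pyfestival API, the sketch calls the Festival command-line tool (festival --tts) through a subprocess; in the actual application the same step would go through Pyfestival, and the recognized-text variable is a placeholder.

```python
import subprocess
import tempfile

def speak(text: str) -> None:
    """Send recognized text to the Festival TTS engine via its command-line interface."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(text)
        path = f.name
    # `festival --tts <file>` reads the file and speaks its contents aloud.
    subprocess.run(["festival", "--tts", path], check=True)

# Placeholder output of the gesture interpretation stage:
recognized_text = "hello how are you"
speak(recognized_text)
```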

Our mobile application will rely on AI and neural networks. All the translation will happen in the cloud: the application will use neural networks and computer vision (OpenCV) to recognize the video of the sign language speaker, and then translation algorithms will convert it into speech.
It will only require a camera on the device facing the signing person and a connection to the internet.
Because all the translation is done by algorithms, we can also differentiate on price and offer an inexpensive, convenient translation service that benefits deaf people as well as businesses and service providers.
It will translate as quickly as the person signs, will translate Pakistan Sign Language, and can be plugged into many products, such as video chat applications and AI assistants.
The translation software currently available on the market is either slow or expensive, or relies on older technology that does not allow scaling to markets outside the country of origin.

It will be a free app with a limited number of phrases initially, just enough to satisfy the basic communication needs of deaf people; eventually we will expand it to cater to the needs of a wider audience.

It can contribute to higher employment of deaf people and to social equality. This pocket application will also find use in a B2B setting, where businesses that want to employ deaf and mute staff can use it to convey employee messages to the end consumer.

Objectives of the Study

To develop an application that translates Pakistan Sign Language into text and speech, in order to help unimpaired teachers understand the communication of deaf learners.

Modern Approaches
Modern-day technologies are increasingly centered on mobile computing, gesture-based environments, and cloud computing, and the world is moving toward gestural interaction. IT companies such as Microsoft, Google, and Leap Motion are introducing devices like the Kinect, Google Glass, and the Leap Motion controller (Potter, 2013), so these technological developments may be used to benefit deaf people.
For deaf children, communication is a colossal struggle. It is very difficult for them to blend into society because they cannot communicate in the usual way. In the academic setting, the learning environment for deaf learners is rarely equivalent to that of hearing learners. One of the most effective methods of communication for the deaf is sign language, yet communication through signing cannot be comprehended by others without a specialist or interpreter. This creates difficulties in communication between deaf and hearing people (Mindess, 2014).
In existing frameworks and research on sign language translation and recognition devices, both image-based and sensor-based approaches are used. Most recent studies have concentrated on hand gesture identification because of its applications in HCI, robotics, game-based learning, and sign language recognition systems. Various processes and algorithms from the computer vision community have been employed (Itkarkar & Nandy, 2014).

Sign language is a communication tool that uses gestures as the means of interaction. These gestures involve the shape and movement of the hands, arms, and fingers, as well as the eyes, face, and body. No international standard of sign language has been developed, owing to variation in culture and ways of communicating around the world, but some countries, including Britain, Spain, and America, have developed national standards for their deaf populations.
Every field seeks to maximize the benefits of technological development, and communication, particularly for impaired people, is no exception. Fast and efficient interaction with and among deaf people is in demand, and nowadays nearly every country is focusing on these developments while leaving behind outdated means of communication with the deaf such as pad and pencil.

Sign language (SL) is the foundational medium of communication among individuals who cannot hear well; it is also described as a visual-gestural language. People with hearing impairments use sign language as their main channel of communication. Every country has its own sign language: for example, China, America, India, and Pakistan have Chinese Sign Language, American Sign Language, Indian Sign Language, and Pakistani Sign Language respectively. Many developed countries pay attention to this issue and organize projects, including information technology initiatives, to close the gap between deaf and hearing people. Several surveys have been conducted on this issue in Central and South Asia. In Pakistan, however, the language is still under investigation because there is no structured or organized information about its grammar, contents, and instruments of transmission (Khan et al., 2015). The main point of that research is to discuss, in light of the literature, the problems of bridging the hearing and deaf communities, and it suggests several measures to build such a bridge.
The rules of sign language differ from the rules of spoken and written languages: sign language is based on shapes, whereas written language is based on word formation and basic rules of grammar (Debevc et al., 2015). Information technology has a major impact on our lives, and many of the things we use every day are products of it.

Natural language processing and machine translation are both research areas that overlap with computer technology. The vital objective is to facilitate deaf people in interacting with the unimpaired. In this regard, we have explored the translation of speech into Pakistan Sign Language. Natural language processing can improve the way computers are used, while machine translation can break the language barrier. Communication is a basic human need. People without hearing impairment can listen and speak, but deaf people cannot hear, and those who are deaf from birth often cannot speak either. Deaf people therefore use signs to communicate, and these signs form a sign language. In Pakistan, this signed communication is known as Pakistan Sign Language (PSL) (Ahmad et al., 2006).
Sign language is accepted as a minority language that co-exists in parallel with other, majority languages. Most countries have developed a sign language for their hearing-impaired community. It is difficult to develop one standardized sign language for the whole world because of differences in culture and norms across regions; a word that exists in one language may have no counterpart in another. Hence it is impractical to form a single sign language, and countries instead prefer their native sign languages. Neider, Kegel, MacLaughlin, Bahan, and Lee (2000) developed a prototype system called Sign to Voice (S2V) that recognizes gestures and converts digitized images into voice output using a neural network approach. The research was driven by the need for a more efficient tool in modern society.

A gesture is described as a combination of the movement, position, shape, and palm orientation of the hand. All of these cues are used to segment and recognize the sequence of sign images. Image processing is the most vital step of the translation process, because if recognition is not accurate the translation will certainly be incorrect. The vocabulary is limited to a certain extent (Chiu, Wu, Su, & Cheng, 2007).

There is room for continuous improvement in sign language systems, since there is no single standardized system and each national-level system needs to be improved according to the range of signs used. The objective is to provide communicational ease for hearing-impaired people throughout the world (Halawani, 2008). Every sign language consists of three components: fingerspelling, a word-level sign vocabulary, and non-manual features. Fingerspelling is used to spell out a word letter by letter. The word-level sign vocabulary is used for most communication. Non-manual features are the expression and position of body parts, for example facial expression and body posture.

After spelling, the sequence of words is converted into a series of signs via a translation module. Such a translation system was developed to convert speech into signs: first, the spoken words are converted into text, and then the relevant signs are displayed. This system can be developed as an application on a mobile device, but image processing is limited by the capacity of mobile hardware (Jose, Priyadharshni, Anand, Kumaresan, & MohanKumar, 2013).
For image recognition, the deep convolutional neural network (CNN) has been the most successful approach. It is based on a feedforward neural network architecture with a fixed receptive field size and network depth. This paradigm has certain limitations, including the loss of image detail when inputs are resized for the network (Krizhevsky, Sutskever, & Hinton, 2012).
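For illustration, a minimal sketch of the kind of small CNN that could be trained on cropped sign images is given below, using Keras; the input size (64x64 grayscale) and the number of classes are assumptions for illustration, not the final architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 36   # assumed: e.g. letters plus digits; the real PSL vocabulary may differ

# Small feedforward CNN with a fixed input size, as described above.
model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),          # 64x64 grayscale hand crops (assumption)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# Training would then be: model.fit(train_images, train_labels, epochs=..., validation_data=...)
```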

Transfer Learning

Transfer learning is a machine learning methodology in which a model trained on one dataset is adapted to a specific task. The adapted model is obtained by reusing the weights of a pre-trained model and re-initialising or fine-tuning the weights of the classification layers. The main advantage of this technique is that it requires less time and less data. However, a challenge in transfer learning arises from the difference between the data used for pre-training and the new task's data: if the difference is significant, the deeper layers may need to be re-initialised or trained with a higher learning rate (Garcia & Viesca, 2016).
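A minimal sketch of this transfer learning setup in Keras is shown below, assuming a MobileNetV2 base pre-trained on ImageNet in place of GoogLeNet (which is not bundled with Keras) and a small new classification head; the input size and class count are again illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 36  # assumed sign vocabulary size for illustration

# Pre-trained convolutional base (ImageNet weights), used here as a fixed feature extractor.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         weights="imagenet")
base.trainable = False   # freeze pre-trained weights; only the new head is trained

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # new classification layers
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# If the new data differs strongly from ImageNet, deeper layers of `base` could later be
# unfrozen and fine-tuned with a lower learning rate, as noted above.
```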

Garcia and Viesca (2016) produced a robust model for the letters a-e and a modest model for the letters a-k (excluding j). Due to the lack of variation in the database, the validation accuracy observed during training was not reproduced at test time. The model was expected to generalize much better once additional data collected under different conditions was included.

Garcia and Viesca (2016) applied a pre-trained GoogLeNet architecture, a convolutional neural network trained on the ILSVRC dataset, and used the ASL datasets of Massey University and Surrey University to apply transfer learning to this task. They produced a robust model that classifies the letters a-e correctly for first-time users, and they argue that a fully generalizable translator covering all ASL letters can be produced.

Depth-sensing technology is quickly growing in popularity, and other tools have been incorporated into the process with success; developments such as custom-designed color gloves have been used to facilitate recognition and make feature extraction more efficient by making specific gestural units easier to classify and identify (Dong, Leu & Yin, 2015).

Until recently, most methods of automatic sign language recognition could not make use of the depth-sensing technology that is widely available today. Previous work used simple camera technology to generate datasets of plain images, without depth or contour information, just the raw pixels. Classifying images of ASL letter gestures using CNNs has nevertheless had some success, for example with a pre-trained GoogLeNet architecture (Garcia & Viesca, 2016). Cui, Liu, and Zhang (2017) have proposed a deep architecture with a recurrent convolutional neural network for continuous sign language recognition, together with a staged optimization procedure for training the deep network.

Deep convolutional neural networks (CNNs) have achieved breakthroughs in gesture identification (Gupta et al., 2015) and sign recognition (Natlia et al., 2014), and recurrent neural networks (RNNs) have also shown significant results in learning the dynamic temporal dependencies involved in sign recognition (Pigou et al., 2018).

Foong (2018) presents a system prototype capable of automatically identifying sign language to help hearing people communicate more effectively with the hearing- or speech-impaired. The Sign to Voice prototype, S2V, was developed using a feedforward neural network for detecting two-sign sequences. Different sets of common hand gestures were captured from a video camera and used to train the neural network for classification. The experimental results show that the neural network achieved excellent results for sign-to-voice translation.
METHODOLOGY:
The application consists of two modules:
-Extracting PSL gestures/signs from real-time video.
-Mapping them to human-understandable text/speech.

We will use AI and neural networks to evaluate different algorithms and arrive at an efficient solution to this problem, as sketched below.
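A minimal end-to-end sketch of how these two modules could fit together is given below; the trained model file, the label list, and the helper functions are hypothetical placeholders that stand in for the components described in the preceding sections.

```python
import subprocess
import cv2
import numpy as np
import tensorflow as tf

# Hypothetical artifacts produced by the earlier stages (names are placeholders).
model = tf.keras.models.load_model("psl_cnn.h5")        # trained sign classifier
LABELS = ["hello", "thanks", "yes", "no", "help"]        # example PSL vocabulary subset

def classify_crop(crop_gray):
    """Module 1: map a cropped hand image to a PSL label via the CNN."""
    x = cv2.resize(crop_gray, (64, 64)).astype("float32") / 255.0
    probs = model.predict(x.reshape(1, 64, 64, 1), verbose=0)[0]
    return LABELS[int(np.argmax(probs))]

def speak(text):
    """Module 2: map recognized text to speech via the Festival CLI (reads stdin)."""
    subprocess.run(["festival", "--tts"], input=text.encode(), check=True)

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    label = classify_crop(gray)       # in practice, hand segmentation would crop first
    speak(label)
cap.release()
```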

CONCLUSION:
The application will convert Pakistani Sign Language into text form, making it easier for deaf people to convey their messages and for others to understand them. The proposed system enables two-way communication between deaf and mute people and hearing people, and aims to bridge the communication gap between these two strata of society. As a mobile application, it will serve this need in all respects.
