ACE Assisted Communication for Education:
Architecture to support Blind & Deaf communication
João Ulisses¹, Tiago Oliveira¹, Paula Maria Escudeiro¹, Nuno Escudeiro¹,², Fernando Maciel Barbosa³
¹ ISEP - School of Engineering, Polytechnic Institute of Porto, Portugal
² LIAAD - Laboratory of Artificial Intelligence and Decision Support, INESC Tec, Portugal
³ FEUP - Faculty of Engineering, University of Porto, Portugal
{jppsu, taqol, pmo, nfe}@isep.ipp.pt, fmb@fe.up.pt

Abstract—Deaf and blind students face communication barriers that are constantly present in their daily lives. These barriers arise naturally since the deaf community, the blind community and the rest of the students and teachers use different languages and different channels to communicate. These barriers have a significant impact on the academic, personal and professional development of deaf and blind students. Using automatic tools to assist fluid communication between people using different languages and different channels of communication might significantly promote the social inclusion of disabled students. In this paper we describe the ACE architecture, which incorporates VirtualSign, a translator for sign language, and other components to allow for real-time translation between sign and oral languages. This architecture supports any sign language. We expect ACE to provide ways for fluid communication among deaf people, blind people and those not constrained by these disabilities.

Keywords—inclusive school; sign language translator; 3D avatar; glove sensors; Blind & Deaf communication

I. INTRODUCTION

Deaf and blind students face serious challenges in today's education settings. Communication barriers are constantly present and arise naturally due to the use of different languages. The different forms and channels of communication used by the deaf community, the blind community and the rest of the students and teachers often lead to loss of information. Although some deaf people can read text fluently, they are a minority. Deaf students' mother language is sign language, which has its own grammar, quite distinct from that of oral languages. Blind people rely on voice to communicate while deaf people rely on visual representations; these channels are independent and prevent any type of communication between these two communities.

For deaf children, reading is troublesome due to difficulties in understanding the meaning of the vocabulary and the sentences. In sign language, sentences are built word by word while filtering out some elements of speech, e.g. articles. Word order also follows its own structure, for example subject, then object, then verb, often with the time and location referenced at the beginning of a sentence. It is hard for the deaf to understand an oral language whose grammar is different and more complex [1][2] than sign language grammar. This fact, together with the lack of communication via sign language in schools, severely compromises the development of linguistic, emotional and social skills in deaf students [3].

The Assisted Communication for Education (ACE) project proposes an architecture to support real-time translation between sign and oral languages in both directions. This translation can then be used by the blind, so they can perceive sign language as speech. Similarly, blind people can communicate with the deaf by using speech or written text, which is converted to sign language. This bidirectional translation is available to assist the communication between the deaf and non-deaf communities using the VirtualSign tools [4][5].

Promoting equal opportunities and social inclusion of people with disabilities is one of the main concerns of modern society and also a key topic in the agenda of European Higher Education [3].

Fluid communication is of the utmost importance, as its absence limits access to education and hinders creativity. ACE seeks to promote inclusive education by offering tools to assist communication with and between the deaf and blind communities.

ACE makes use of gaming technology, such as a 3D animated avatar, glove sensors and the Unity game engine, used in entertainment games and serious games. This facilitates a link with educational content, especially through serious games.

II. RELATED WORK

In the last two decades, a significant number of works focusing on techniques to automate the translation of sign languages has been proposed; however, many of them target American Sign Language [6]. The use of serious games as accessible educational content for people with speech or hearing disabilities has been reported in [7]. Several of the methods proposed to represent and recognize sign language gestures apply the main state-of-the-art techniques, involving segmentation, tracking and feature extraction, as well as the use of specific hardware such as depth sensors and data gloves [3].

ProDeaf is an application that translates Portuguese text or voice to Brazilian sign language [3]. The objective of ProDeaf is to make communication between mute and deaf people easier by making digital content accessible in Brazilian sign language. The translation is done using a 3D avatar which performs the gestures. ProDeaf already has over 130 000 users [3].
Showleap is a recent Spanish Sign Language translator; it claims to translate sign language to voice and voice into sign language [3]. At the moment Showleap has no precise results on the translation and the creators claim the product is 90% complete [3].

Motionsavvy Uni is another sign language translator, which makes use of the Leap Motion [3]. This translator converts gestures into text and voice into text. Text and voice are not converted into sign language with Motionsavvy Uni. The translator has been designed to be built into a tablet. Motionsavvy Uni claims to have 2000 signs at launch and allows users to create their own signs [3].

Spreadthesign is a project with the participation of nine European teams composed of deaf people and listeners from various institutions dedicated to working with deaf people from the following countries: Portugal, Spain, Lithuania, Sweden, the Czech Republic, Turkey, France, Germany, the United Kingdom and even Russia, Japan, Finland and the United States of America [8]. The main objective of the project is to collect gestures at the national level of each participating country to build a multilingual, digital, online dictionary in different thematic areas, combining the national and gestural languages of the different countries involved [8]. This is a way of promoting the access of the deaf to vocational training programs in transnational mobility schemes [8].

In terms of sign language to text translation, an early example comes from Sagawa et al. [9] in 1995, in the form of a Japanese sign language to text translator and vice versa. They studied two different approaches for sign to text, image processing and hand-based input, but ended up concluding that the technology at the time was not yet accurate enough to capture the needed information from the picture, which made them continue the work using data gloves. They were able to reach a 97% success rate; even though their dataset was small, it was still an impressive result for that time. In recent years there are many more examples of successful translation.

Akmeliawati et al. [10] had great success with their implementation of a sign language translator which makes use of color segmentation and neural networks for a word-by-word translation of Malaysian sign language to English text, achieving over a 90% success rate. The shortcoming is that their application only detects hand movement and thus cannot disambiguate between contexts in sentences, hence the word-for-word approach chosen. They use a simple webcam with simple gloves adapted with color strips on key spots, then analyze the video input via image processing techniques and use neural networks to classify the gesture. It is a cost-effective approach, but its capabilities for a full translation are far from our plans for VirtualSign. Chai et al. [11] opted to use Kinect for their implementation of a Chinese sign language translator that converts gestures into text with an accuracy of over 80%. They make use of 3D trajectory matching techniques, comparing the motion of each hand and matching it against known gestures. Kinect is also one of our chosen technologies for translation, but not with this implementation, seeing as it does not allow for hand configuration or facial expression recognition, which can prove crucial in a full translation.

Mohandes et al. [12] published a detailed compilation of the most successful image-based and sensor-based approaches to Arabic sign language translation. While it is possible to achieve positive results with each separately, the authors concluded that combining both approaches should yield the best results in future real-time translation applications. This conclusion is in agreement with the VirtualSign approach to translation and sign recognition.

Considering related past research projects, ACE has the scientific information and results to infer a stable architecture to formalize the complex data of multiple sign languages and other communication types.

III. SIGN LANGUAGE

In general, despite most gestures being different among different sign languages, they mostly use combinations of a common set of hand configurations and general syntactic rules, such as not using articles.

In order to learn most Portuguese Sign Language rules, we engaged in meetings with professional sign language interpreters and deaf people. Currently we are arranging and taking part in more meetings for other sign languages in multiple countries.

Some of the rules include:

The general movements can be linear, oblique, in arc or circular. On the other hand, the internal movements consist of the change of hand configuration and orientation during a gesture, like fingering, hook, twist or friction, among others.

A gesture as a whole can be analyzed in sub-units, "queremas" (analogous to phonemes), which are defined by the location, configuration and movement of the Dominating Hand (M1) or Non-Dominating Hand (M2). The intensity and velocity of the motion can be an indicator of pronunciation, while the facial expression can express emotions related to the gesture.

There are three types of gestures: Dominating Hand gestures, executed solely by M1; Base Hand gestures, executed with both hands, with M2 serving as base or support for M1; and Bimanual gestures, using both hands with the same configuration but incorporating motion.

Facial expression and body movement can also accentuate the gestures in three different ways: interrogative, in which the speaker raises his head and the eyebrows remain frowned or raised, with his body moving forward a little and the arms moving up or down; exclamative, where the speaker moves his body forwards or backwards to express joy, surprise or reluctance; and declarative, where the speaker remains with a neutral or null facial expression [2].

Sign language involves a set of components that make it a rich but hard to decode communication vehicle [3]. Although it is not as formal and not as structured as written text, it contains a far more complex way of expression [3]. When performing sign language, we must take into account a series of elements that define the manual and non-manual components [3]. By changing one of these elements, the gesture changes or loses its meaning. At the level of the manual component, there is the definition of dominant and non-dominant or supporting hand [3]. Usually, for each person, the dominant hand coincides with the hand with greater dexterity. In the execution of gestures, the acting of the dominant hand may possibly differ from the support hand [3].
Manual elements of sign languages include: Hand configuration: the hand shape while executing the gesture; in Portuguese sign language there is a total of 57 identified hand configurations, shared between the dominant and supporting hand [3]. Orientation of the palm of the hand: some pairs of configurations differ only in the palm's orientation [3]. Location of articulation: the place of articulation comprises the different areas of the body where the gestures are performed in the gestural space; some gestures are intrinsically connected to a contact point, e.g. eyes, neck, chest, trunk, while others are held in front of the person, without any contact point, as in the case of the alphabet [3]. Hand movement: characterized by the use of one or two hands and by the motion of the hands in the gestural space [3].

The non-manual component comprises: Body movement: responsible for introducing a temporal context; the torso leaning back, upright or forward indicates communication in the past, present or future, respectively [3]. Facial expressions: these add a sense of emotion to the speech, a subjective experience associated with temperament, personality and motivation. Two distinct emotions cannot occur simultaneously, since the emotions are mutually exclusive. This is one of the aspects most emphasized by the deaf community [3].
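Taken together, these manual and non-manual elements suggest a straightforward structured representation of a sign: a sequence of time moments, each holding the state of M1 and M2 (configuration, palm orientation, location, movement) plus the non-manual cues. The sketch below is only an illustration of such a record under our own assumptions; the field names are hypothetical and do not reproduce the actual VirtualSign data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HandState:
    configuration: int              # index into the set of hand configurations (57 in LGP)
    palm_orientation: str           # e.g. "up", "down", "inward", "outward"
    location: str                   # articulation point, e.g. "neutral_space", "chest", "chin"
    movement: Optional[str] = None  # "linear", "oblique", "arc", "circular" or an internal movement

@dataclass
class SignMoment:
    """One time moment of a gesture (a keyframe-like unit)."""
    m1: HandState                   # dominating hand
    m2: Optional[HandState]         # non-dominating / base hand, absent in one-handed gestures
    body: str = "upright"           # back / upright / forward -> past / present / future
    facial_expression: str = "declarative"  # interrogative, exclamative, declarative, ...

@dataclass
class Sign:
    gloss: str                      # written word the gesture maps to
    language: str                   # e.g. "LGP", "LIBRAS", "BSL"
    moments: List[SignMoment] = field(default_factory=list)
```

A one-handed sign would leave m2 empty, and changing any single field, for instance the palm orientation, yields a different or invalid sign, mirroring the description above.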
Sign language, like any other living language, is constantly evolving and is effectively becoming a contact language with listeners, increasingly being seen as a language of instruction and learning in different areas, a playful language in times of leisure, and a professional language in several areas of work [12].

Portuguese Sign Language is a system constantly evolving and renewing itself, thus offering a world full of possibilities [2]. It remains to be understood, accepted and respected [2]. For this we need an architecture able to withstand the complexity of sign languages, their interactions and how they evolve.

We are setting up the translation models for several sign languages, including the sign languages of Portugal, Brazil, Slovenia, England, Cyprus, Greece and Germany. We are gathering the structural rules of these sign languages through interviews and structured surveys to be filled in by our partners' experts.

IV. ARCHITECTURE OF ACE

The architecture of ACE can be divided into two main areas: the Blind Architecture and the Deaf Architecture (VirtualSign). In the future it will have more architectures for each type of disability. The VirtualSign project has been more focused on the deaf architecture due to the issues society has in solving them. ACE is currently expanding on VirtualSign as a continuation and extending to other disabilities and interactions. Although VirtualSign did not have a meta-analysis and a large-scale architecture oversight, its contents will be put into ACE and the architecture will be built with ACE components.

Our blind architecture will make use of generic APIs, such as the Web Speech API, and integrate them with other architectures, such as our deaf architecture, allowing communication between all types of users and allowing collaborations. These collaborations are important to enrich creativity, which is a highlight of this architecture.

Blindness is part of creativity [14]; we can perceive this in our imagination when reading a book. This kind of creativity is indeed noticeable in blind people's daily lives and in the way they perceive things, which can lead to interesting creations when they are allowed to create. ACE seeks to enable and exploit this creativity by using digital games and digital tools.

As different types of users collaborate with each other, interactions are made. There are multiple types of interactions, resulting from the inputs and outputs of the blind and deaf architectures. These interactions have architectures of their own, such as Collaborative Art, where we want to explore deaf and blind creativity since they have different capabilities, and serious games, where we can explore game mechanic concepts to teach using Bloom's taxonomy [15] and other similar architectures [16][17][18]. In education, a possibility linked to serious games is virtual classrooms, where the students can be more participative and communicate with ease by using the ACE and VirtualSign tools. The serious games created can also be used for entertainment and to increase the interaction of online communities. For this to be easier, game mechanics can be used so players have fun despite their incapacities. Their enhanced senses can be used to solve tasks within the game, giving them a greater sense of achievement; such abilities can be echolocation for the blind and peripheral vision for the deaf.

A well-designed game entices players into the "reality" of the game world and keeps them there until the goals of the game have been met [19].

In the GILT (Games Interaction & Learning Technologies) laboratory there are serious games being made by researchers using various hardware such as smartphones, Microsoft's Kinect and, more recently, the Emotiv Epoc+. These serious games integrate into our Educational architecture, which is linked with the other architectures. Some previous work relating to serious games is in [20][21], which includes our serious game for the deaf that makes use of the VirtualSign tools [5][22].

One of these games created for the deaf is a multiplayer game, as described in [3]. The game consists of a first-person puzzle game and requires two players to cooperate in order to go through the game [3]. This game has a chat where users can type, but it also makes use of the VirtualSign translator to allow deaf users to chat within the game using sign language [3]. Cooperation is not only necessary but also rewarded, as the players' performance is represented through a score [3]. The players can compare their performance as a team with others through the highscore list [3].

Two games were created for the blind, Virtual Adventure and Game2Senses [23]. These games used Microsoft's Kinect and smartphone hardware, respectively.

The Game2Senses project intends to reduce the social barriers between the blind community and those not suffering from visual disabilities [23]. It also intends to increase the quantity, variety and quality of computer games for the visually impaired. It achieved the specified goals through a two-fold approach: it demonstrated a way to present interactive and immersive entertainment for the blind using spatial audio experiences and presented interactions between players in the game [23]. This game has a graphical interface, to encourage the blind and the non-visually impaired to play Game2Senses and interact with each other [23].
Some applications were also created outside the scope of digital games. Figure Out is a mobile application for deaf tourists, although any tourist can find it useful, as it translates any language from text captured via the smartphone camera. This text is translated in a simplified way so the deaf tourist can understand it. It was mostly used for restaurant menus, public transport guidelines and schedules and other important tourist routines.

Alarm Alert is another mobile application for the deaf, which warns the user about nearby danger. This danger is detected by the smartphone running the application, which receives sound waves and filters them to detect danger scenarios. Danger scenarios are varied; an ambulance, thunder and others were considered important for the deaf to identify in their daily life. When an occurrence is detected, the smartphone vibrates and a picture of the event is shown for the user to quickly identify what is happening nearby. This application gives important sound awareness to the deaf but, more importantly, the sounds are properly studied and filtered so it does not give false positives, which could cause major problems.

As a continuation of past work, serious games have proven multiple times to be a successful source of information and a method to study and validate hypotheses. With this in mind, a new set of serious games is being developed, using the Emotiv Epoc+ to increase the scope of inclusiveness and reach out to more people. The game mechanics are also quite different, starting with a card game, a turn-based game, as we hypothesize it will suit the time needed for answering and communication, as well as creativity in strategy and in creating rules or cards for the game itself.

The nature of card games, being turn-based, will allow the players more time to cogitate and give an adequate answer in their turn, without needing to worry about reaction. It is known that games requiring reaction are more appealing to young players, as older players generally have slower reflexes. With this in mind, it is more prudent to create a game which relies less on reaction and more on strategy for mobility impaired players using the Emotiv Epoc+. This also facilitates the multiple types of communication in an online card game match, as players can wait for the translation to be done. Another aspect to consider is exploiting the different types of more developed senses. In individuals with certain difficulties, certain other capabilities develop. These capabilities, such as peripheral vision, echolocation and others, could be used as means to enhance creativity, as seeing things differently can have an impact on creativity. Imagination and creativity to solve strategic problems using these senses can be compared with deep learning algorithms, from which new algorithms could be formulated to search for the best strategy. Creativity itself could be studied in this scope to enrich the computer perception of utility, quality and originality, thus again creating more interesting algorithms to be used by artificial intelligence agents.

Collaborative creativity can be used in education as a means to give true inclusion to the students by allowing them to work in groups to solve problems, explore and learn by themselves. It would also be interesting to see how collaborative creativity, or creativity when playing against each other to win a game, works, and to try to extract whether a certain type of individual has an advantage in certain situations. This leads to a more appealing way to create lectures and educative content dedicated to specific types of individuals.

For the ACE architecture, creativity modules are contained in each branch of the architecture, and we are currently giving more emphasis to the deaf architecture due to the complexity of sign language.

The deaf architecture (VirtualSign) is at the moment the most complex; we plan to expand on VirtualSign and use it in ACE, linking it with the other architecture types. It is divided into two main areas, text to sign and sign to text. The online tools for the deaf are the translator itself and the configurator, which are the input and output of the deaf architecture, respectively. The deaf architecture supports text-to-sign and sign-to-text translation inherently; when linked to the blind architecture it supports translation through voice.
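As a rough illustration of how the blind and deaf architectures can be chained, the sketch below wires a speech recognition front end into the text-to-sign path and a text-to-speech back end onto the sign-to-text path. The interfaces and names (SpeechToText, TextToSign and so on) are hypothetical placeholders rather than the actual ACE or VSSO APIs.

```python
from typing import Callable, List

# Hypothetical component interfaces; in ACE these roles are played by the
# blind architecture (speech) and the deaf architecture (VirtualSign).
SpeechToText = Callable[[bytes], str]    # e.g. a voice recognition wrapper
TextToSpeech = Callable[[str], bytes]    # speech synthesis
TextToSign = Callable[[str], List[str]]  # text -> avatar animation identifiers
SignToText = Callable[[List[str]], str]  # recognized gesture keyframes -> text

class CommunicationBridge:
    """Chains the blind and deaf architectures in both directions."""

    def __init__(self, stt: SpeechToText, tts: TextToSpeech,
                 t2s: TextToSign, s2t: SignToText):
        self.stt, self.tts, self.t2s, self.s2t = stt, tts, t2s, s2t

    def speech_to_sign(self, audio: bytes) -> List[str]:
        # Blind user speaks -> text -> avatar animations for the deaf user.
        return self.t2s(self.stt(audio))

    def sign_to_speech(self, gesture_keyframes: List[str]) -> bytes:
        # Deaf user signs -> text -> synthesized voice for the blind user.
        return self.tts(self.s2t(gesture_keyframes))
```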
V. TEXT TO SIGN

Two main VirtualSign tools assist in the text-to-sign translation module: the Translator and the Configurator, both components of VSSO (VirtualSign Studio Online). These tools are developed in Unity and built into WebGL, so they can be used online through the most popular internet browsers. Other builds are also used, depending on the type of user; they are made specifically for Windows, Android and other platforms, reducing possible technology barriers.

The VSSO Translator works like an online application and is meant to be used daily at the user's convenience; VSSO can also be an Android application. Users can write text in a text box, which is then translated to any of the I-ACE (international version of the ACE project) partners' languages, including, so far: Portuguese, Slovenian, German, British, Cypriot and Greek sign languages, as well as LIBRAS, the Brazilian sign language. Our avatar performs the required animations as gestures, carrying out the translation of text to sign language. The translation is performed by our 3D avatar, which checks the input words against our validated signs database; the database returns the gesture information according to the specific country's sign language grammar rules. Another utility of this tool is that it can serve as a connection from text to animation for other tools, digital games, plugins and other software by calling the VSSO translator methods.
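A minimal sketch of this lookup step is given below, assuming a dictionary of validated signs keyed by (language, word) and a per-language reordering rule. Both the data layout and the reorder_for_grammar placeholder are simplified stand-ins for the real VSSO database and grammar handling, and the fingerspelling fallback is only one possible way to deal with unknown words.

```python
from typing import Dict, List, Tuple

# (language, written word) -> identifier of a validated gesture animation.
ValidatedSigns = Dict[Tuple[str, str], str]

def reorder_for_grammar(words: List[str], language: str) -> List[str]:
    # Placeholder: real sign language grammar rules (dropping articles, moving
    # time and location to the front, ...) are language specific and far richer.
    articles = {"the", "a", "an"}
    return [w for w in words if w.lower() not in articles]

def text_to_sign(text: str, language: str, signs: ValidatedSigns) -> List[str]:
    """Returns the sequence of gesture animation ids the avatar should play."""
    animations: List[str] = []
    for word in reorder_for_grammar(text.split(), language):
        gesture = signs.get((language, word.lower()))
        if gesture is None:
            # Unknown word: fall back to spelling it with the alphabet gestures.
            letters = [signs.get((language, letter)) for letter in word.lower()]
            animations.extend(g for g in letters if g)
        else:
            animations.append(gesture)
    return animations
```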
The VSSO Configurator is based on 3D animation configuration programs, such as Maya, 3ds Max and Blender. The VSSO Configurator is where the deaf and the experts add and validate the gestures; this can be done online or offline. This tool uses our 3D avatar and can be manipulated like a 3D animation configurator, but with sign-language-specific work tools. Arms, hand configuration, body movement, head and facial expression are all customizable with several degrees of freedom. This is customized for each moment of a sign; each sign has several time moments in which the hand has a certain set of states. When creating a sign, the user can reuse and import other similar gestures and edit them according to the new gesture being configured. This allows the user to save time when creating multiple signs, as information such as hand, body, head and arms is already configured and can be quickly edited for all the moments of a sign. It is possible for an editor using the tool to copy and mimic movements from websites which contain videos of signs, thus adding those words to the VirtualSign database. This work is normally done by the editor but can also be done by the validator. Each user can be a validator or an editor, each having a unique login, with information about the user stored in our VirtualSign database. This information is important to know which sign languages the user knows, their expertise, which languages they are adding content to and other generic information. The validator is responsible for validating the signs created by editors; the signs have three states: validated, in review and not validated. The signs are to be validated by an expert, who can also correct the movement or send a request to the editor to change the animation. Editors can join voluntarily by signing up in our application, which is made so they can work on their own. However, there is control over whom we allow to create the signs and, especially, who validates them; this control is exercised by us, by the admins of each team and by sign language experts, accepting or refusing new accounts according to their expertise. Editors are not required to know sign language, as they can watch websites like Spreadthesign, which contain videos of gestures that can be copied into the VSSO Configurator. Some languages not included in Spreadthesign have their own similar website with sign language content online, as is the case of Slovenia.
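The editor and validator workflow described above can be pictured as a small state machine over the three sign states. The sketch below, with hypothetical names, only illustrates the transitions and role checks; it is not the actual VSSO implementation.

```python
from enum import Enum

class SignState(Enum):
    NOT_VALIDATED = "not validated"
    IN_REVIEW = "in review"
    VALIDATED = "validated"

class Role(Enum):
    EDITOR = "editor"
    VALIDATOR = "validator"

def submit_for_review(state: SignState, role: Role) -> SignState:
    # An editor (or a validator acting as editor) finishes a gesture animation.
    if state is SignState.NOT_VALIDATED and role in (Role.EDITOR, Role.VALIDATOR):
        return SignState.IN_REVIEW
    raise ValueError("only non-validated signs can be submitted for review")

def review(state: SignState, role: Role, approved: bool) -> SignState:
    # Only validators (sign language experts) may validate or reject a sign;
    # a rejection sends it back to the editor as a change request.
    if role is not Role.VALIDATOR:
        raise PermissionError("only validators can review signs")
    if state is not SignState.IN_REVIEW:
        raise ValueError("sign is not awaiting review")
    return SignState.VALIDATED if approved else SignState.NOT_VALIDATED
```

Once a sign reaches the validated state it can be published to the Translator, in line with the real-time update behaviour described below.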
Each word can also have a context input, which will help disambiguate semantics and also help with grammar rules. This feature is currently being worked on, as it needs an automatic structure that can be validated by the experts. This automatic structure is being tested and used by the editors and validators online to gather context information in a valid form. This structure is of the utmost importance, as otherwise experts would fill out this information with no structure, causing erratic information matching, lack of information and other issues later on. A future publication will further explain our context structure and the online data collection within ACE.

Once the words are validated they become automatically available for the translator tool, which is updated in real time for all users.

Interpreters' and experts' opinions revealed that the speed of some animations is not perfect and that they could carry more information, such as elbow position. Animation speed, quality and file size can be improved by using glove sensor data instead of animation files. An overhaul of the gesture animation towards a more parametric animation approach is being made, but the data of the gestures can still be used, as this is meant to improve their visual performance. The same is being done for the hand animation configurations to increase their performance, due to the file size, as well as a new hand configuration configurator in which new hand configurations can be created and later used or recognized. This is especially useful for adding new sign languages to the system.

A plugin for Microsoft's PowerPoint is being developed using the text-to-sign VSSO tools. PowerPoint text is translated into animations with the sign gestures. These gestures can be placed on each PowerPoint slide, allowing deaf students to better understand the content or to study at home. Professors now have the opportunity to teach deaf students without mastering sign language. Better yet, a PowerPoint presentation can be translated to different sign languages when needed. Students can also use the plugin on their own to convert any presentation, helping them to better understand the content of the PowerPoint.

VSSO will also implement voice recognition and text to speech online. It is important for these to be online, not only to be more practical, as they can then be used on any platform, especially smartphones, but also to be always ready to use at any time, without the issues that come with installing software. These are to be integrated with the PowerPoint plugin as well, so the presentation also works for hearing blind students. Text, or words, is our base form, which is why our next focus will be receiving more information, such as context. The way this module works is simple: voice recognition converts voice to text and the text is converted to animation, allowing the blind to speak with the deaf. The opposite direction is also possible by using sign to text; the gesture is detected and converted to text, and the text is converted to speech by using text-to-speech algorithms.
VI. SIGN TO TEXT

The sign-to-text portion of VirtualSign consists mainly of three applications: the hand capture application, the sign language to text translator and another approach to our configurator.

Translation from sign language to text is a complex process due to the great number of variables to capture and analyze: hand configuration, hand and arm position, motion speed, body movement, and facial expression, among others. Although in theory it may seem an exaggerated number of factors for a language, every detail is important to a correct interpretation and translation from sign language. Every piece of information the body provides while performing the gestures can help not only to identify the word or sentence, but also the specific context the speaker is referring to [24].

During our talks with sign language specialists, speakers and translators we reached the conclusion that it is possible to divide sign language into three different pieces: hand configuration, body motion and facial expression. As such, we always try to use a technology, or a combination of multiple systems, that allows us to capture as much of these three factors as possible for a more accurate translation.

In terms of technology, we tried different approaches using image processing techniques on normal webcams, thinking about reducing cost and improving the accessibility of the applications. Unfortunately, we found the results were not satisfying enough to use in the context we needed, and the techniques which were more accurate were also heavier on the hardware, which is a huge problem when we consider that the translation must be done in real time and without any control over the hardware it is installed on, so we had to choose other routes.

Our chosen technology at the moment combines input from a pair of data gloves with body movement analysis using a Kinect sensor. Our selected data gloves are the 5DT Data Glove 14 Ultra, manufactured by Fifth Dimension Technologies [25]. These gloves measure not only finger flexure but also the abduction between the fingers, with 14 sensors on each hand, two on each finger and the other four in between them. This amount of sensors is perfect for an accurate capture of hand configurations, and the gloves allow for an individual calibration per user in real time, which reduces the documented problem [12] of different hand sizes or finger lengths interfering with the accuracy of the captured data. Our second piece of technology is the well-known Kinect developed by Microsoft, specifically the improved version 2 [26]. Although some research shows that using its depth sensor and applying image processing techniques to it can yield positive results on small amounts of data and words [27], we find it is not scalable to the amount of words we are planning, nor is it feasible for real-time translation, as image processing is heavy on the hardware. Instead, we use the Kinect sensor to detect the body object (previously known as the skeleton when Kinect first launched on the market) of the user in front of it and capture the different joint coordinates and general body movement. With hand configuration processing via the data gloves and body movement analysis via Kinect, the only key element missing was processing the facial expression. Kinect is also useful for that part, as it already recognizes some basic facial expressions out of the box and there are several libraries which push this functionality even further [28]. At this point in the project this last component is not implemented yet and only some research on the topic has been done.

The sign-to-text component of VirtualSign is composed of three applications. The first one is the hand capture application. This program has two main functionalities: hand configuration dataset creation and hand configuration classification. For the first part the user wears a data glove and, after selecting a sign language from the currently available languages (Portuguese, Brazilian, Greek, Cypriot, English, German and Slovenian), is guided through a simple calibration process to ensure the gloves are adjusted to the user. The calibration process consists in having the user perform a set of different hand configurations that move the flexion sensors enough to get the values adjusted to the user's hand. After this, the user can go through all the configurations of the chosen language, perform the shown gesture and press a button to capture the data from the gloves. In the end a dataset is created that can be used for direct classification or to create a prediction model. The second functionality is hand classification, in which the user can select between classifier algorithms, currently supporting K-Nearest Neighbors [29] and Convolutional Neural Networks [30], and let the application predict which hand configuration is being performed with the glove. This functionality allows for instant testing of the captured dataset and serves as a first implementation of the final classifier version which will be present in the sign-to-text translator.
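A minimal sketch of this calibration and capture step is shown below: the per-user minimum and maximum of each of the 14 flexure and abduction sensors are recorded during calibration and then used to normalize every sample to [0, 1]. It assumes raw readings arrive as a 14-value vector; the actual 5DT SDK calls and the VirtualSign file format are not reproduced here.

```python
import numpy as np

NUM_SENSORS = 14  # 5DT Data Glove 14 Ultra: flexure + abduction sensors

class GloveCalibration:
    """Per-user min/max calibration so different hand sizes map to [0, 1]."""

    def __init__(self):
        self.low = np.full(NUM_SENSORS, np.inf)
        self.high = np.full(NUM_SENSORS, -np.inf)

    def observe(self, raw_sample: np.ndarray) -> None:
        # Called while the user performs the calibration configurations.
        self.low = np.minimum(self.low, raw_sample)
        self.high = np.maximum(self.high, raw_sample)

    def normalize(self, raw_sample: np.ndarray) -> np.ndarray:
        span = np.where(self.high > self.low, self.high - self.low, 1.0)
        return np.clip((raw_sample - self.low) / span, 0.0, 1.0)

def capture_dataset(samples_by_label, calibration: GloveCalibration):
    """Builds a (features, labels) dataset from captured, calibrated samples."""
    features, labels = [], []
    for label, raw_samples in samples_by_label.items():
        for raw in raw_samples:
            features.append(calibration.normalize(np.asarray(raw, dtype=float)))
            labels.append(label)
    return np.vstack(features), np.asarray(labels)
```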
The K-Nearest Neighbors classifier is a basic and non-tuned version, mainly for instant testing of the created datasets. It does not need a model to work, so the user can easily select any dataset from a text file and test it in real time. Current results are satisfying, with over 80% accuracy when the user who built the dataset is the same one testing it, but when tried with different people the result drops to 50% or lower, and the algorithm only returns the most likely configuration, which is not feasible for the real-time translation we plan for the future.

A newer approach was recently implemented, which uses Convolutional Neural Networks, a class of deep neural networks that has been pushing forward the state of the art in classification and has the major advantage of requiring little preprocessing. We used several datasets to create a classification model and the results were positive. The algorithm returns the three most likely results along with the probability associated with each one, which is perfect for real-time translation: when the movements are fast, detecting the hand configuration can sometimes lead to small errors, and by having access to the other two configurations we can use the context of the sentence to correct the prediction and understand its meaning. The current results are 55% when only the top result is considered, 81% when we consider the first two results, and 88% when all three results are considered, which means the correct prediction is almost always in the top two and we can use context to disambiguate between them in a translation context in the future.
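To make the two options concrete, the sketch below fits a plain K-Nearest Neighbors model on calibrated 14-value glove vectors for instant testing, and shows how any probabilistic classifier can be reduced to the three most likely candidates with their probabilities, in the way the CNN output described above is used. scikit-learn is only a stand-in here; the actual VirtualSign classifiers are not reproduced.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def quick_knn_test(train_x, train_y, sample, k: int = 3):
    """Instant dataset test: returns the single most likely hand configuration."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(train_x, train_y)
    return knn.predict(sample.reshape(1, -1))[0]

def top_three(model, sample):
    """Top-3 hand configurations with their probabilities.

    `model` is any classifier exposing predict_proba and classes_; here a
    scikit-learn estimator stands in for the CNN described above."""
    probabilities = model.predict_proba(sample.reshape(1, -1))[0]
    best = np.argsort(probabilities)[::-1][:3]
    return [(model.classes_[i], float(probabilities[i])) for i in best]
```

In a translation setting, the second and third candidates would only be consulted when the sentence context rejects the top prediction.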
One of the problems we have when classifying the hand configuration is figuring out where the palm of the hand is facing, since our current gloves do not possess any type of gyroscope. There are several instances of similar hand configurations in Portuguese sign language which only change the palm direction, and we believe implementing something similar on the data glove would significantly improve the already good results we have with this last classifying technique, as it would give us three more variables, pitch, roll and yaw, to work with.

Our second application is another approach to the gesture configurator. It implements everything the original configurator has on Unity, but it adds an all-new data glove and Kinect component. Our motivation to begin this new configurator was the heavy nature of the translation process and the time it will take to develop, so we decided to prioritize improving the way the words are configured and stored in the database, meaning the process becomes easier and the database across all languages fills up faster, allowing for better translation in the future. It is a different approach to our first configurator and is being remodeled constantly as we try to create an architecture that enables both gesture speakers and regular users to easily fill up the database with more words and expressions. In this new approach the user can simply put on the gloves and perform the motion in front of the camera, which will then record all the data and present it in the Unity framework, making for an easier and faster recording procedure. It is still under testing and, in the future, it is likely both approaches for the configurator will merge into a more robust product with ease of use in mind.

Our third and final application, composing the sign-to-text part of the project, is the actual sign language to text translator. It is still in the first steps of development, but the initial structure is already defined. It will make use of the Kinect sensor to capture body movement and facial expression data, and the data gloves to capture hand data for real-time hand configuration classification. We capture the movement in real time and separate it into "keyframes". Those keyframes are defined by whether the hand configuration changed or by when the arm movement stops being linear. For this we need to have two processes running at the same time, the hand classifier and the body motion analyzer. The hand classifier will always be running in the background, recording every time the user changes hand configuration (we consider a change when the user's hand stays still for half a second), then instantly using the classifier to determine which hand configuration it is and recording a "keyframe" with the change. On the other hand, the Kinect sensor will be analyzing all body movement and facial expression and capturing a "keyframe" every time the user's arm movement stops being linear. Every time the user's hand moves we use a simple function to determine if the new coordinates are in line with the previous coordinates and, if not, record it as a new movement. This way the final result is a list of "keyframes" similar to the ones we register in the database using the configurator application mentioned in the text-to-sign section. By keeping the structure homogeneous amongst all applications, it is easier to match the keyframes recorded by the translator with the ones in the database, and the result is then presented to the user.
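The segmentation logic described above can be sketched as follows. The collinearity test and the data layout are illustrative simplifications under our own assumptions, not the translator's actual code; the half-second stillness rule is assumed to have been applied upstream by the hand classifier that produces the configuration stream.

```python
import numpy as np

LINEARITY_TOLERANCE = 0.05  # how much the direction may bend before a new keyframe

def is_in_line(prev: np.ndarray, curr: np.ndarray, new: np.ndarray) -> bool:
    """True if `new` continues the straight arm movement defined by prev -> curr."""
    direction = curr - prev
    step = new - curr
    if np.linalg.norm(direction) == 0 or np.linalg.norm(step) == 0:
        return True
    cos_angle = np.dot(direction, step) / (
        np.linalg.norm(direction) * np.linalg.norm(step))
    return cos_angle > 1.0 - LINEARITY_TOLERANCE

def segment_keyframes(hand_positions, timestamps, hand_configs):
    """Builds the keyframe list: a new entry whenever the arm movement stops
    being linear or the classified hand configuration changes.

    hand_positions: sequence of 3-D numpy vectors (e.g. the Kinect hand joint);
    hand_configs: the hand configuration label reported for each frame."""
    keyframes = [{"time": timestamps[0], "config": hand_configs[0],
                  "position": hand_positions[0]}]
    for i in range(2, len(hand_positions)):
        config_changed = hand_configs[i] != keyframes[-1]["config"]
        broke_line = not is_in_line(hand_positions[i - 2],
                                    hand_positions[i - 1],
                                    hand_positions[i])
        if config_changed or broke_line:
            keyframes.append({"time": timestamps[i], "config": hand_configs[i],
                              "position": hand_positions[i]})
    return keyframes
```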
One of the main problems for this approach in the future is to determine, in a real-time context with a constant stream of gestures, where a word or expression ends and another one begins; this process can be very demanding and requires a lot of thought. Still, although it is a very premature implementation, we find it is pointing towards the direction we envision for VirtualSign as a final product. Summarizing the process leading from sign language to text translation, we are going to briefly go over the main steps needed. The first step is capturing the hand configuration data from experts using the hand capture application, which means having them wear the glove and perform each of the positions multiple times to create a dataset which we can then use to generate a prediction model. Next, we implement a model in the translation application, along with all the words recorded with the configurator in the database. Any user, not only experts, can then use the translator application and stand in front of the camera with the gloves, performing the gestures. The application will use the classifier to analyze the hand configurations in real time, at the same time as the Kinect sensor searches for linear arm movements. The translator will keep presenting the translation results to the user, combining this information with facial expression data to disambiguate between contexts, if needed. These last steps could be described in more detail in a future paper once the implementation process is further along.

In terms of future improvements, we consider we can do better on the hand classification and the body motion analysis, and we are always searching for new hardware or technologies which can help us improve even further. We are soon beginning to work with a new version of the 5DT data gloves which implements a gyroscope alongside all the other already existing sensors; this will improve our accuracy considerably, as previously described. We are also studying the feasibility of using an accelerometer on the glove to detect motion direction and speed, in order to discard some of the processing on the Kinect sensor. Leap Motion [31] is also under our analysis, due to being advertised as an extremely fast and hardware-friendly technology requiring low processing power, and it would not force the user to wear any gloves, which can prove to be more cost effective in the long run.

VII. EXPERTS VALIDATING THE DATA

The ACE architecture aims at solving deaf and blind communication problems in society by removing technological communication barriers and creating a data structure able to withstand the complexity of multiple sign languages, granting accessibility and independence. This is important because our experts validating the data are deaf and/or interpreters, and they must be comfortable using our tools. Without the experts we would not be able to acquire data, so our tools must be tailored to their needs.

The VirtualSign tools are available online through VSSO and can work from any platform; our databases are dynamic and we have internal communication mechanisms which also allow collaborative work within the tool. This allows teams of editors and validators to work asynchronously or synchronously, to better suit their preferences.

The scalability of the ACE structure is reflected in the ease with which new sign languages can be included and ACE's data content expanded.

This is important because we need context data to be validated by experts, which is useful for many tasks in translation. Context can be validated by the experts for each sign, creating a link between any language and that sign; since this information is inputted by experts, it is bound to be more correct than the automatic methods which are common nowadays. This adds extreme value to this information. Also, information from automatic context finders can serve as input for the experts, to ease their work and to format their input, so that their task is to validate this information for a sign.

However, there are still some issues to be addressed, such as: grammatical models, grammar and quality of the translation, lack of knowledge of the tool leading to misinterpretations, overvaluing of hand configuration over facial expression, and reluctance of the deaf community to accept technological solutions based on avatars. Through the I-ACE meetings we can explore these questions and come up with solutions, as well as find more information for the grammatical models of each sign language.

VIII. RESULTS & DISCUSSION

The ACE architecture is prepared to work with several distinct sign languages, making it possible for deaf people from different countries to understand each other. This is important as it helps preserve their culture and makes users friendlier. In this way, the blind architecture can be linked by using voice recognition and text to speech, allowing more inclusion even with different communities. This allows us to have bidirectional communication between several different types of communication channels. In conclusion, each user uses what they are most comfortable with. Other communication types are being added to have a truly inclusive communication system for everyone, putting an end to exclusion due to communication difficulties and promoting synergies and inclusive creative work.

Accessibility to information will also increase, because a user of this system can better explain what he wants in his own way and receive the information from anyone in a format he can understand. This is especially important for education but can be used in any software. We plan to achieve this with plugins for PowerPoint which use the VSSO translator, allowing a seamless translation of educative content.

As we expand, we found some interpreters are somewhat against the use of the avatar, as they fear it could replace them; however, this is positive for the deaf, as it allows them to become more independent. Other interpreters understand this as the future and help us by giving us important information.

Currently we have 2,127 validated Portuguese sign language words animated and three hundred more in the process of validation. Our recent team in Slovenia has configured over 400 words since October 2017. In England, British Sign Language has also started expanding. The following training sessions will be in Germany, Greece and Cyprus, where experts will learn how to use the software and give us sign language data as animation, context and validation.

We are also saving contextual information from a semantic point of view and having experts validate it, so this data can be used to support the disambiguation of terms. This data is being collected by all the countries we have partnerships with. Currently we are working on the formal specification of the structure of syntax and grammar, the workflow of data collection in VSSO and how it fits the architecture of ACE.

REFERENCES
[1] M. Goldfeld, A criança surda: linguagem e cognição numa perspectiva sociointeracionista. Plexus Editora, 2002.
[2] T. M. M. de M. Martins, "A letra e o gesto: estruturas linguísticas em Língua Gestual Portuguesa e Língua Portuguesa," 2011.
[3] P. Escudeiro, N. Escudeiro, M. Norberto, J. Lopes, and F. Soares, "Digital Assisted Communication," in Proceedings of the 13th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, 2017, pp. 395–402.
[4] P. Escudeiro et al., "Real Time Bidirectional Translator of Portuguese Sign Language," Procedia Comput. Sci., vol. 67, pp. 252–262, 2015.
[5] P. Escudeiro et al., "Virtual sign translator," Int. Conf. Comput. Networks Commun. Eng. (ICCNCE 2013), pp. 290–292, 2013.
[6] S. Morrissey and A. Way, "An example-based approach to translating sign language," Structure, pp. 109–116, 2005.
[7] J. Gameiro, T. Cardoso, and Y. Rybarczyk, "Kinect-Sign, Teaching Sign Language to 'Listeners' through a Game," Procedia Technol., vol. 17, pp. 384–391, 2014.
[8] O. Coelho, "Direitos linguísticos, acessibilidade e cidadania, Spread the Sign e Profacity," Rev. Divers., pp. 22–25, 2009.
[9] H. Sagawa, M. Ohki, T. Sakiyama, E. Oohira, H. Ikeda, and H. Fujisawa, "Pattern recognition and synthesis for a sign language translation system," J. Vis. Lang. Comput., vol. 7, no. 1, pp. 109–127, 1996.
[10] R. Akmeliawati, M. P.-L. Ooi, and Y. C. Kuang, "Real-Time Malaysian Sign Language Translation using Colour Segmentation and Neural Network," 2007 IEEE Instrum. Meas. Technol. Conf. IMTC 2007, pp. 1–6, 2007.
[11] X. Chai, G. Li, Y. Lin, Z. Xu, Y. Tang, and X. Chen, "Sign Language Recognition and Translation with Kinect," 10th IEEE Int. Conf. Autom. Face Gesture Recognit., pp. 22–26, 2013.
[12] M. Mohandes, M. Deriche, and J. Liu, "Image-based and sensor-based approaches to arabic sign language recognition," IEEE Trans. Human-Machine Syst., vol. 44, no. 4, pp. 551–557, 2014.
[13] M. Morgado and M. Martins, "Língua Gestual Portuguesa e Bilinguismo," Rev. Divers., pp. 7–9, 2009.
[14] D. K. Simonton, "Creative thought as blind variation and selective retention: Why creativity is inversely related to sightedness," J. Theor. Philos. Psychol., vol. 33, no. 4, pp. 253–266, 2013.
[15] B. S. Bloom, M. D. Engelhart, E. J. Furst, W. H. Hill, and D. R. Krathwohl, Taxonomy of educational objectives, handbook I: The cognitive domain, vol. 19. New York: David McKay Co Inc, 1956.
[16] N. Suttie et al., "In persuit of a 'serious games mechanics': A theoretical framework to analyse relationships between 'game' and 'pedagogical aspects' of serious games," in Procedia Computer Science, 2012, vol. 15, pp. 314–315.
[17] S. Arnab et al., "Mapping learning and game mechanics for serious games analysis," Br. J. Educ. Technol., vol. 46, no. 2, pp. 391–411, 2015.
[18] M. Callaghan, M. Savin-Baden, N. McShane, and A. Gomez Eguiluz, "Mapping Learning and Game Mechanics for Serious Games Analysis in Engineering Education," IEEE Trans. Emerg. Top. Comput., vol. 5, no. 1, pp. 77–83, 2017.
[19] K. Salen and E. Zimmerman, Rules of Play: Game Design Fundamentals, 2004.
[20] Y. Bouzid, M. A. Khenissi, and M. Jemni, "Designing a Game Generator as an Educational Technology for the Deaf Learners," Contemp. Educ. Technol., vol. 3, no. 1, pp. 60–75, 2012.
[21] P. Escudeiro, N. Escudeiro, M. Norberto, and J. Lopes, "VirtualSign in serious games," in Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, LNICST, 2016, vol. 161, pp. 42–49.
[22] P. Escudeiro, N. Escudeiro, M. Norberto, and J. Lopes, "Virtualsign game evaluation," in Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, LNICST, 2017, vol. 176, pp. 117–124.
[23] P. Escudeiro, N. Escudeiro, and P. Oliveira, "Blind's Inclusion in Computer Games," Proc. 27th EAEEIE Annu. Conf., 2017.
[24] W. C. Stokoe and M. Marschark, "Sign language structure: An outline of the visual communication systems of the american deaf," J. Deaf Stud. Deaf Educ., vol. 10, no. 1, pp. 3–37, 2005.
[25] E. A. Arkenbout, J. C. F. de Winter, and P. Breedveld, "Robust hand motion tracking through data fusion of 5DT data glove and Nimble VR Kinect camera measurements," Sensors (Switzerland), vol. 15, no. 12, pp. 31644–31671, 2015.
[26] T. Hachaj, M. R. Ogiela, and K. Koptyra, "Effectiveness comparison of Kinect and Kinect 2 for recognition of Oyama karate techniques," Proc. 2015 18th Int. Conf. Network-Based Inf. Syst. (NBiS 2015), pp. 332–337, 2015.
[27] H. V. Verma, E. Aggarwal, and S. Chandra, "Gesture recognition using kinect for sign language translation," 2013 IEEE Second Int. Conf. Image Inf. Process., pp. 96–100, 2013.
[28] B. Y. L. Li, A. S. Mian, W. Liu, and A. Krishna, "Using Kinect for Face Recognition Under Varying Poses, Expressions, Illumination and Disguise."
[29] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. 2005.
[30] B. Hu, Z. Lu, H. Li, and Q. Chen, "Convolutional Neural Network Architectures for Matching Natural Language Sentences," Adv. Neural Inf. Process. Syst. 27, pp. 2042–2050, 2014.
[31] F. Weichert, D. Bachmann, B. Rudak, and D. Fisseler, "Analysis of the accuracy and robustness of the Leap Motion Controller," Sensors (Switzerland), vol. 13, no. 5, pp. 6380–6393, 2013.
