Project Final Draft Last Last
Project Documentation on
By:
Advisor:
Tigabu Dagne
June, 2016
Addis Ababa Institute of Technology
This project documentation is submitted in partial fulfillment of the requirements for the
Degree of Bachelor of Science in Software Engineering.
Declaration of Originality
We declare that this project is our original work and has not been presented for a
degree in any other university.
Name Signature
1. __________________________ __________________
2. __________________________ __________________
3. __________________________ __________________
4. __________________________ __________________
This project documentation has been submitted for examination with my approval
as University Advisor.
Acknowledgement
Before all, the project team would like to give special thanks to God for his guidance throughout
our journey so far.
Then, we would like to thank our advisor, Tigabu Dagne, without whose persistent supervision and
guidance this project documentation could not have been completed.
Our special thanks also go to our families, who have supported us in every way from our childhood
to the present.
Next, we would like to thank all the people who gave their valuable time, feedback, support
and encouragement while we worked on this project documentation.
Finally, we would like to take this opportunity to thank all the people who have contributed to this
project documentation with their invaluable opinions and suggestions, and without whose
collaboration the project documentation could not have been completed.
Abstract
Communication is the way people interact with each other in their day to day life. As one means of
communication, sign language is the dominant language of the deaf community. As members of
society, deaf people need to communicate with anyone in the wider community without a
communication barrier. The project team tries to tackle this problem to some extent. The shortage
of corpus data and the scope of the project might make the project difficult to develop in the way
it was envisioned. The positive impact the project would bring if successful motivates us to keep
trying.
Internationally, many sign language to text and sound systems have been developed. For instance,
ASL to text and sound for the English language has been implemented in the USA, and Arabic Sign
Language to text and audio for the Arabic language has been implemented in Arab countries.
Locally, no such system has been developed for Ethiopian Sign Language. As a result, we came up
with a project to convert ESL to text and audio. The main purpose of this project documentation is
to design a system which converts Ethiopian Sign Language to text and speech.
Our system is proposed to work on the desktop and to be implemented using C#, Matlab
or EmguCV. The data gathering tools used are questionnaire, discussion, use case scenario and
extensive literature review. Moreover, the methods proposed for the implementation are an image
processing/video processing approach, dictionary file development, use of existing audio files for
Amharic letters, and collection of a sound corpus by recording audio.
The design phase is done using an object oriented approach with UML modeling. The team has
tried to identify the functional requirements and modeled them using the major modeling methods.
Each functional and user requirement is analyzed using activity diagrams, sequence diagrams, a
DFD diagram and a class diagram. Finally, a test design is included to test the system at the end
of its implementation.
The implementation is done using C# and the EmguCV library, which is a cross platform .NET
wrapper for the OpenCV image processing library. The algorithms applied are Convex and Convexity
hull for image detection, and SURF and Absolute difference for recognition and matching. The
testing is done according to the test design of the first semester. The audio processing uses a
concatenative approach and is implemented in Matlab.
Table of Contents
Acknowledgement
Abstract
Table of Contents
List of Acronyms
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Background of the Study
1.2 Statement of the Problem
1.3 Objective of the Study
1.3.1 General Objective
1.3.2 Specific Objective
1.4 Proposed System
1.5 Feasibility Study
1.5.1 Technical Feasibility
1.5.2 Economic Feasibility
1.5.3 Legal Feasibility
1.5.4 Operational Feasibility
1.5.5 Schedule Feasibility
1.6 Methodology & Data Gathering Tools
1.6.1 Data Gathering Tools
1.6.2 Implementation Tools and Methods Employed
1.7 Scope and Limitation of the Study
1.8 Organization of the Project
Chapter 2. Review of Related Literature
2.1. Related works on the Area
2.2. The History of Sign Language Representation
2.3. Phonological Structure of Sign Language
2.4. Morphological Structure of Sign Language
2.5. Text-to-Speech (TTS) Systems at a Glance
2.6. Phonological Structure of Amharic
2.7. Morphological Structure of Amharic Language
Chapter 3. System Features
3.1. Functional Requirements
3.1.1. User Requirements
3.1.2. Detail Functional Requirements
3.1.3 Use Case Model
3.2. Non Functional Requirements
3.3 Analysis Models
3.3.1. Data Flow Diagrams
3.3.2 Sequence Diagram
Chapter 4. System Design
4.1 Deployment Diagram
4.2 Architectural Design
4.2.1 Class Diagram
4.2.2 Activity Diagram
4.2.3 Proposed Software Structure
4.3 User Interface Design
4.4 Test Design
Chapter 5. Implementation of the System
5.1 Implementation
5.2 Detection and Recognition
5.2.1 Hand Skin Color Detection
5.2.2 Hand Recognition
5.2.3 Training phase of implementation
5.3 Implementation of Text to Audio
Chapter 6. Test Results
6.1. Purpose
6.2. Unit Functional Testing
6.3 System Function Testing
Chapter 7. Conclusion and Recommendation
7.1 Conclusion
7.2 Recommendation
Chapter 8. User Manual
8.1. General Information
8.1.1 System Overview
8.1.2 Authorization
8.1.3 Contact
8.2 Access level
8.3 System Activities
8.3.1 Load Video
8.3.2 Capture Video
8.3.3 Video Description
8.3.4 Text Display
8.3.5 Exit the system
8.3.6 Training the system
8.4. How the System works
References
Appendix A. Questionnaire
Appendix B. Project Schedule
List of Acronyms
List of Figures
Figure 1. The proposed system
Figure 2. The bar graph depicting the user requirement fetching
Figure 3. System wide use case system
Figure 4. DFD Diagram
Figure 5. Sequence diagram for Load and Show Video Description
Figure 6. Sequence Diagram for Capture Video and Show Description
Figure 7. Sequence diagram for View Text
Figure 8. Sequence diagram for play audio
Figure 9. Sequence diagram for Sending feedback and Receiving feedback
Figure 10. Deployment Diagram
Figure 11. Class Diagram
Figure 12. Activity diagram load video
Figure 13. Activity diagram capture video
Figure 14. Activity diagram save video
Figure 15. Activity diagram video Description
Figure 16. Activity diagram process video
Figure 17. Activity diagram search pattern
Figure 18. Activity diagram save pattern
Figure 19. Activity diagram text display
Figure 20. Activity diagram copy to clipboard
Figure 21. Activity diagram text process
Figure 22. Activity diagram save text
Figure 23. Activity diagram play audio
Figure 24. Activity diagram save audio
Figure 25. Activity diagram send feedback
Figure 26. Activity diagram send activity
Figure 27. Activity diagram Receive Feedback
Figure 28. The proposed software structure
Figure 29. The proposed system Flowchart Structure
Figure 30. The overall user interface
Figure 31. When video is loaded/captured/ sound play activity
Figure 32. When Mute is activated / sound is mute
Figure 33. When process video is pressed
Figure 34. When video processing is finished
Figure 35. The matched area of the hand image captured from video
Figure 36. Better matched image for captured image from video file
Figure 37. Unmatched image with an image capture from video file
Figure 38. Loading the Video
Figure 39. Text Display for a corresponding sign of Zero
Figure 40. Audio output
List of Tables
Table 1 response to the questionnaire distributed
Table 2 list of functional requirement
Table 3 list of use case description
Table 4 Test Design Suit
Table 5 Unit Function Testing
Table 6 System Function Testing
Chapter 1. Introduction
1.1 Background of the Study
Ethiopian Sign Language (ESL) is a natural language used by a large community of over one million
Deaf Ethiopians [1]. It is the native language of the Deaf community. It is
composed of two kinds of features: manual features and non-manual features. The manual
components are hand shape, hand orientation, hand position and hand movement; the non-manual
ones are mouth pattern, head and shoulder movement, facial expression and eye gaze [1].
Like many African sign languages, it has a historical connection with American Sign Language
(ASL). However, it has an independent lexical and grammatical structure [2]. In addition, as [4]
discusses, ESL originates from ASL with some influence from Nordic countries. For instance, [4]
used a pilot survey of 249 signs to examine how close the two languages (ESL and ASL) are, and
found that 25% of the signs brought from ASL were only modified to suit Ethiopian culture rather
than completely changed. As a result, a sign language user from the USA can communicate to some
extent with a sign language user from Ethiopia, more easily than non-sign language users of the
two countries can. Knowing the origin of ESL is essential to understand how it works and how it
relates to other languages.
Internationally, researchers have tried to find solutions to the problem of communication between
deaf and hearing people. As explained in [7], MotionSavvy Uni (described as the first sign language
to voice system), a two way communication tool, is being developed and is expected to arrive by
summer 2016. The MotionSavvy Uni package comprises a tablet, a case with integrated motion
tracking capabilities and a mobile app. As users perform sign language, they can see themselves
through a mirror image that provides real-time feedback on their signing. For this reason,
MotionSavvy is expected to be a breakthrough in effective communication between signers and
hearing people [7]. The technology has not yet spread widely, but it is expected to become
available during 2016.
Locally, only a few studies have been done. For instance, machine translation of Amharic text to
Ethiopian Sign Language was done by [1] in partial fulfillment of a Master of Science in Computer
Engineering. The author succeeded in translating Amharic text to sign language for the first time,
and the work is an input for other researchers. However, the project is not usable in day to day
life, at least for communication between signers and non-signers. In addition, two students,
Libnedengel and Befakadu, developed a project on Amharic text and sound to Ethiopian Sign
Language (ESL) in partial fulfillment of a Bachelor of Science in Computer Engineering in 2015 [8].
They succeeded in translating Amharic text and audio to the respective ESL. However, the system is
not well trained, and for some input words it shows nothing.
This project mainly targets an association of sign language users in Ethiopia. Even though the
system is mainly for sign language users, it can also be used by non-signers. The association is
ENAD (Ethiopian National Association for the Deaf). ENAD is a non-profit organization established
by deaf people themselves. It was founded by six deaf persons and was legally registered on
6 July 2006 at the Ministry of Justice [13]. Its mission is to empower deaf people and their
families for an independent life, and to facilitate and ensure their full inclusion and
participation in society, by raising public awareness of their unique disability, promoting the use
of alternative techniques of communication, and promoting the improvement of services and
facilities for people who are deaf.
Government and public awareness of sign language and deaf individuals is somewhat more
encouraging nowadays. As an indication, Addis Ababa University launched the Department
of Ethiopian Sign Language and Deaf Culture as a degree program in 2008 GC. The program
trains deaf and hearing students in the areas of Ethiopian Sign Language, Sign Linguistics,
Interpretation and Deaf Culture [3]. Another promising development is that ESL education is being
given as a non-credit course, which aims to train users of ESL to be fluent and to help the Deaf
community [4]. But this has still not solved the communication gaps faced by the deaf community.
It has been about 7-8 years since the Department was launched, but professionals graduated from
this department have not significantly narrowed the gap or enabled the deaf community to be active
members of the country’s economy.
Though research has been done to improve the active role of the deaf community, it has not been
significant in tackling the problem. In addition, as [1] discusses, there are video based text to
ESL systems, but the approach is infeasible in terms of time and cost because making a video
needs time and professionals. The project done by [1] is no more than a translation of Amharic
text to ESL, and it has not been seen in use by the deaf community to narrow the gap in
communication between sign language users and non-sign language users.
The project done by Libnedengel and Befakadu is valuable because it converts Amharic text to
sound and then to Ethiopian Sign Language [8]. Their project achieved translation of Amharic
text to sound and then to the respective ESL. However, the system they developed works in one
direction only. That means that with the existing system a signer and a hearing person, if both
are literate, can communicate only by writing on paper or by text based chatting.
The system gives priority to the non-signer over the signer.
In the existing system, deaf individuals and hearing people communicate through interpreters,
which is manual and needs an additional person to interpret for them, or through text based
communication in which each party writes text. Some videos have also been made to aid them,
but they are not sufficient. In general, the existing system is not suitable for addressing the
communication problem between signers and hearing people.
1.3.2 Specific Objective
Specific objectives of this project are:
[Figure 1. The proposed system. Block labels recoverable from the diagram: Input as Video, Video
Segmentation, Key Frames Processing, Video Processing, Sign Language.]
1.5.2 Economic Feasibility
Economic feasibility determines the positive economic benefits that the proposed system will
provide to the organization. It includes identification and quantification of all the expected
benefits [6]. The proposed system will have a number of economic benefits to the organization and
beyond it. These are:
1.6 Methodology & Data Gathering Tools
This section describes the methods the project team intends to use to collect data.
Questionnaire
Questionnaires are an informal requirement gathering tool used to reach stakeholders in remote or
distant locations, that is, stakeholders who are not easily accessible at the time the project team
needs them [10]. As a result, the project team has planned to distribute questionnaires to these
stakeholders to gather the data that is essential for the system to be developed.
Use cases
Use cases are basically stories that describe how a discrete process works, and they are easier
for users to articulate [10]. [12] describes a use case as a scenario which captures a contract
between the stakeholders of a system about its behavior. In addition, use cases are a means of
describing the software system’s behavior under various conditions as it responds to requests
from users [12]. As a result, the project team has planned to use this essential tool in the
overall design process of the system.
The project team has decided to employ the following methods and approaches in line with the
implementation tools.
Image processing/Video processing approach: the input will be video, and the video will be
split into the respective video frames of different ESL signs.
Object Oriented Analysis and Design approach: an approach where the primary focus
is on the behavior of the system. The Unified Modeling Language (UML) is an object oriented
language for specifying, visualizing, constructing and documenting software systems
(UML Document Set, 2001). It is the successor to the modelling languages found in the
Booch (Booch 1994), OOSE/Jacobson, OMT and other methods. The UML is important
because it can help software developers communicate. It has to be used in a way that helps
communication and does not hinder it [4]. UML will be employed to specify, visualize,
construct and document the system during analysis and design.
Dictionary file development: the file used for language recognition. The project team
will use the ESL dictionary.
Using available audio for Amharic ‘Fidel’ sounds: to develop a dictionary that maps each
sign to its sound (audio) in .wav format.
Recording audio: to collect different sample sounds for testing the system in each phase.
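The dictionary file and the sign-to-audio mapping listed above can be illustrated with a minimal
sketch. Python is used here only for illustration (the project itself proposes C#, Matlab or
EmguCV), and the sign labels, dictionary entries, .wav file names and the `translate` function are
all hypothetical placeholders, not taken from the project. The sketch assumes that recognition has
already produced a sequence of sign labels, and it only shows how a lookup table could turn that
sequence into display text and an ordered list of audio files for concatenative playback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    text: str        # Amharic text shown to the user
    audio_file: str  # name of the .wav file holding the matching sound


# Hypothetical entries: labels and file names are placeholders for illustration.
SIGN_DICTIONARY = {
    "sign_ha": DictionaryEntry("ሀ", "ha.wav"),
    "sign_le": DictionaryEntry("ለ", "le.wav"),
    "sign_me": DictionaryEntry("መ", "me.wav"),
}


def translate(sign_labels):
    """Map recognized sign labels to display text and an ordered audio playlist.

    Labels that are missing from the dictionary are collected separately so the
    caller can decide how to report them.
    """
    text_parts, playlist, unknown = [], [], []
    for label in sign_labels:
        entry = SIGN_DICTIONARY.get(label)
        if entry is None:
            unknown.append(label)
            continue
        text_parts.append(entry.text)
        playlist.append(entry.audio_file)
    return "".join(text_parts), playlist, unknown


# Example usage with one label that is not in the dictionary.
text, playlist, unknown = translate(["sign_ha", "sign_xx", "sign_me"])
```

Keeping unknown labels separate, instead of failing on the first one, is one possible design
choice; it lets a partially recognized sequence still produce text and audio.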
1.8 Organization of the Project
The documentation contains seven phases. The first phase is the introduction, which contains the
background of the study, statement of the problem, objective of the study, proposed system,
feasibility study, methodology and data gathering tools, and scope and limitation of the project.
The second phase covers the literature review, which includes related works on the area, the
history of sign language representation, the phonological structure of sign language, the
morphological structure of sign language, Text to Speech at a glance, the phonological structure of
the Amharic language and the morphological structure of the Amharic language. The third phase
covers the system features, which include functional requirements, non-functional requirements,
the use case diagram, use case descriptions, the DFD diagram and sequence diagrams. The fourth
phase covers the system design, which includes the deployment diagram, class diagram, activity
diagrams, proposed software structure and test design. The fifth phase covers the implementation
of the project, which contains detection and recognition as well as the implementation of text to
audio. The sixth phase contains the test results, which include the unit test results and system
function test results. Finally, the last phase covers the conclusion and recommendation. In
addition, the user manual is included at the end.
Chapter 2. Review of Related Literature
2.1. Related works on the Area
Approximately one to two out of every 1,000 to 2,000 babies are born with a hearing or speech
impairment [12]. This shows that their number is not small and that they need recognition in day
to day life. Even when they are recognized, communication between hearing and deaf persons is not
easy, as it requires the involvement of a third party. As a result, hearing impaired people are
pressured to adopt written language as their second language whether they want to or not. For
this reason, this is an active area of research, and many projects and studies have been done, and
are being done, to bridge the gap between the hearing and hearing impaired communities. Some of
them are discussed below.
The iCommunicator is a technology made by PPR Direct, a member of the Microsoft
Assistive Technology Vendor program. It converts, in real time, speech to text, speech/text to
video sign language and speech/text to computer generated voice. It provides end-users with many
opportunities in areas such as education, business and everyday activities. It is equipped with
speech recognition (Dragon NaturallySpeaking v9.0) and is also compatible with Microsoft
accessibility options [15]. In line with this, Microsoft claims that 110 million devices now run
Windows 10, not counting previous operating systems such as Windows XP, 7, 8 and 8.1. So
iCommunicator’s compatibility with Microsoft contributes to its popularity [15].
As [16] explains, two major kinds of systems have been developed for Sign Language
Recognition: glove based and vision based. According to [16], glove based recognition systems
recognize signs based on different illumination based changes; it is an approach where
microcontrollers are embedded into gloves. Lee et al., as cited in [16], proposed a glove sensor
mechanism to learn Korean finger spelling using k-means. Halawani, as cited in [16], developed an
Arabic Sign Language translation system for mobile devices. Regarding the vision based approach,
Hamad et al., as cited in [16], introduced a hand shape estimation approach using two cameras. In
addition, the Hidden Markov Model (HMM) and Dynamic Programming have been used to recognize ASL
words. In this respect, [17] proposed converting sign language to voice based on feature
extraction and HMM recognition from grey scale images. Moreover, Mohandes, as cited in [17],
proposed a prototype system to recognize Arabic Sign Language based on a Support Vector Machine
(SVM), and also to convert Arabic text to Arabic Sign Language.
ASL to Text-Sound based on image processing and machine learning is another approach used by
researchers. It is one part of vision based recognition. Paper [19] proposed a system which
converts ASL to Text-Sound using image processing, machine learning, and Morse code.
The machine translation system for Amharic text to ESL developed by [1] served as a springboard
for Ethiopian research on sign languages. That system is the reverse of the system proposed by
this project team.
An Arabic Sign Language to Text-Sound system was proposed by [9] using an image/video processing
approach, where video is converted into individual frames that are matched with the augments of
frames already in the database. [9] built two relational databases (a gestures and description
database, and a conjugate sound database); video processing extracts the key frames and then the
key words, and finally the sentence is displayed while the audio plays.
Sign Writing is another system, developed by Valerie Sutton for the purpose of annotating
movement. It was developed for communication rather than for linguistic purposes. It is a
pictorial notation system that can describe non-manual features [19]. It is designed to be
appropriate for any sign language.
Gloss: another notation system, which represents a sign using a word in the target language; it
requires the translation system to take a dictionary-based approach.
Most of the notation systems try to represent a specific sign language uniquely in terms of a
unique phonetic transcription. These systems are mostly unable to convey emotions and some
facial movements. The approach we are using is an image processing approach where video is split
into frames that are compared with images of signs saved in a database. Each of those saved
images is mapped to a description, that is, text.
eSpeak TTS: a text to speech system for Android supporting 75 languages and accents. It is based
on the eyes-free version and adds support for accents and special characters and improved
handling of speech rate and pitch. It is also based on MIPS service and has improved support for
Speech Synthesis Markup Language (SSML) [14]. Its support for many languages makes it usable for
projects undertaken on specific languages.
other [22]. Amharic also has its own phonological structure. As explained in [1], Amharic has its
own phonetic, phonological, and morphological properties. It has its own inventory of speech
sounds, including a set of speech sounds which are not found in other languages. For instance,
{ጵ፡ሽ፡ቅ} are some of the speech sounds which are missing in other languages. Amharic has
consonants and vowels which differ from those of other languages. Accordingly, [1] explains that
it has 30 consonants, of which 27 are simple and 3 complex, and seven vowels. Viewed from the
phonological aspect, Amharic is not as ambiguous as other languages such as English. For instance,
in English, looking at the letters is not enough to pronounce the word formed by those letters.
But in Amharic, anyone with knowledge of the language can read a word simply by looking at the
sequence of written characters.
Chapter 3. System Features
3.1. Functional Requirements
Functions are software capabilities which satisfy the user’s explicit or implicit requirements
when specified conditions are met. In other words, they are statements of the services that the
system should provide, how the system should react to particular inputs, and how the system should
behave in particular situations. They are requirements which capture the intended behavior of the
system. Those behaviors can be expressed as services, tasks, or functions the system is required
to perform.
19. Play the sound for the respective text.
Q.No | Assumed Questions that can cover the whole user requirements | Yes | No | Don’t know
1 | Do you agree if the system has live video feed? | 9 | 0 | 1
2 | Do you agree if the system has synchronous text and speech output? | 6 | 2 | 2
3 | Is there a purpose to save the video recorded? | 5 | 2 | 3
4 | Is there a purpose to check the live feed video? | 7 | 0 | 3
5 | The system should recover even error occurs | 8 | 0 | 2
6 | Do you think that performance matters? | 9 | 0 | 1
7 | The system should be secured. Do you agree? | 7 | 0 | 3
8 | Do you think that the system should be scalable? | 4 | 3 | 3
[Figure: Graph illustrating the responses for eight questions. Horizontal axis: Quest 1 to Quest 8; vertical axis: 0 to 10; series: Yes, No, Unsure.]
As shown in the graph, the respondents more or less agreed with the proposed system. Based on
their replies, the project team has proposed the following requirements. Although the requirements
list is based on the replies, there are also additional requirements which were proposed and
selected by the project team. These are:
1. Load Video: a saved video, which can be in any storage location. The user searches for the
video and opens it in the dialog box. The video’s validity will be checked.
2. Capture live video: feeds live or recorded video to the system. It is an input mechanism using
the camera. It has the unit functions capture, pause, and cancel. When the record button is
pressed, the system starts the camera module, starts capturing video, and saves it to a specific
directory. The video is shown in the window.
3. Show video info: shows the description of the loaded or recorded video: the video type, frame
rate in frames per second (fps), duration, and size.
4. Process Video: after the video information is shown, the next step is video segmentation. This
functionality starts after the start process button is pressed.
Segment video: the loaded video is segmented into key frames, and the key frames are saved in a
directory.
Extract region of interest: from each key frame the region of interest is extracted and then
cropped.
Resize frame: after cropping, the frame is resized to the appropriate pixel dimensions.
Feature extraction: features of the resized frames are extracted to obtain patterns.
Pattern matching: the obtained pattern is matched against the patterns saved in the database.
Handle unmatched pattern: if an unmatched pattern is found, it is saved to the database.
5. Display the Text: checks that the gesture is matched properly, then displays the text clearly.
6. Process the Text: the successfully displayed text is the input to the TTS engine. The text goes
through natural language processing. The steps that take place in processing the text are as
follows.
Text input: the displayed text is given as input to the process text stage.
Text Analyze: the input text goes to the analysis stage.
Text matching: the text is matched with text inside the database, which is saved with its
respective sound.
Handle unmatched text: if unmatched text is found, it is saved to the database.
7. Play the sound: checks that the matched text is correct, then outputs the sound of the text.
The sound can be played or muted.
8. Send Feedback: sends the unrecognized texts and gestures to an admin email.
FR-S-VL-1. The others are given in a similar way to the example. Each of the functional
requirements in the use cases is explained in detail, in order, by requirement ID, title,
priority, and dependency.
Priority High
3. Show Video Info:
Requirement ID: FR-S-7
Title: Show
Description: The system shows the user the details of the video.
Priority: Low
Dependency: Capture Video, Load Video

4. Process Video:
Requirement ID: FR-S-8
Title: Segment Video
Description: The system takes the loaded or captured video and segments it into key frames.
Priority: High
Dependency: Show Video Info.

Requirement ID: FR-S-9
Title: ROI Extract
Description: The system takes the key frames, extracts the region of interest from each key frame, and then crops it.
Priority: High
Dependency: FR-S-8

Requirement ID: FR-S-10
Title: Cropper
Description: The system takes the extracted region of interest and crops it.
Priority: High
Dependency: FR-S-9

Requirement ID: FR-S-11
Title: Frame resize
Description: The system takes the cropped frame and resizes it to some optimum size.
Priority: High
Dependency: FR-S-10
Requirement ID: FR-S-12
Title: Feature Extract
Description: The system takes the resized frame and extracts its features to generate patterns.
Priority: High
Dependency: FR-S-11

Requirement ID: FR-S-13
Title: Search
Description: The system takes the patterns generated by the feature extractor and queries the database.
Priority: High
Dependency: FR-S-12

Requirement ID: FR-S-14
Title: Match
Description: The system matches the generated pattern against the queried patterns.
Priority: High
Dependency: FR-S-12

Requirement ID: FR-S-15
Title: Save pattern
Description: The system saves the unmatched pattern to the database.
Priority: High
Dependency: FR-S-13

5. Text Display:
Requirement ID: FR-S-16
Title: Display text
Description: The system shows the text matched to the respective pattern.
Priority: High
Dependency: FR-S-12, FR-S-13, FR-S-14

Requirement ID: FR-S-17
Title: Copy text
Description: The system allows the user to save the text to local storage.
Priority: Low
Dependency: FR-S-16
11. Text Processor:
Requirement ID: FR-S-18
Title: Text feed
Description: The system takes text from the display or from the matched query and makes it ready for text processing.
Priority: High
Dependency: FR-S-16

Requirement ID: FR-S-19
Title: Text Analyze
Description: The system analyzes the ready text and prepares it for the matcher.
Priority: High
Dependency: FR-S-18

Requirement ID: FR-S-20
Title: Text Search
Description: The system takes the processed text and queries the database.
Priority: High
Dependency: FR-S-19

Requirement ID: FR-S-21
Title: Text Match
Description: The system matches the processed text with the queried text.
Priority: High
Dependency: FR-S-20

Requirement ID: FR-S-22
Title: Save text
Description: The system saves the unrecognized text to the database.
Priority: High
Dependency: FR-S-20
12. Audio Player:
Requirement ID: FR-S-23
Title: Play Audio
Description: The system allows the user to play the sound for the matched text.
Priority: Medium
Dependency: FR-S-21

Requirement ID: FR-S-24
Title: Mute Audio
Description: The system allows the user to mute the sound so that it is not played.
Priority: Medium

Requirement ID: FR-S-25
Title: Save Audio
Description: The system allows the user to save audio to local storage.
Priority: Low
Dependency: FR-S-23, FR-S-21
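The dependency entries in the requirement tables above form a directed graph, so one way to check that the requirements can be implemented in a consistent order is a topological sort. The following is a minimal sketch in Python, given for illustration only and not part of the project's C# code. The dictionary is transcribed from the tables; reading the dependency "Show Video Info" of FR-S-8 as FR-S-7 is an assumption, and dependencies that are not given as requirement IDs are left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Requirement -> set of requirements it depends on, transcribed from the tables.
# Assumption: the dependency "Show Video Info" of FR-S-8 refers to FR-S-7.
DEPENDS_ON = {
    "FR-S-8": {"FR-S-7"},
    "FR-S-9": {"FR-S-8"},
    "FR-S-10": {"FR-S-9"},
    "FR-S-11": {"FR-S-10"},
    "FR-S-12": {"FR-S-11"},
    "FR-S-13": {"FR-S-12"},
    "FR-S-14": {"FR-S-12"},
    "FR-S-15": {"FR-S-13"},
    "FR-S-16": {"FR-S-12", "FR-S-13", "FR-S-14"},
    "FR-S-17": {"FR-S-16"},
    "FR-S-18": {"FR-S-16"},
    "FR-S-19": {"FR-S-18"},
    "FR-S-20": {"FR-S-19"},
    "FR-S-21": {"FR-S-20"},
    "FR-S-22": {"FR-S-20"},
    "FR-S-23": {"FR-S-21"},
    "FR-S-25": {"FR-S-23", "FR-S-21"},
}


def implementation_order(depends_on):
    """Return the requirements so that every dependency precedes its dependents."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(depends_on).static_order())


order = implementation_order(DEPENDS_ON)
```

Because every other requirement in this graph depends, directly or indirectly, on FR-S-7, it comes first in any valid order; a circular dependency would be reported as an error instead of an order.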
3.1.3 Use Case Model
The diagram below represents the activities of the actor(s) when interacting with each function
of the system.
The following diagram depicts the complete use case diagram of the system. The use cases
connected to an actor are those which the actor accesses directly.
Use Case Description
Priority: Medium
Description: The property allows the user to send the file to the mail server admin.
Actor: Any User, Mail server admin
Precondition: Internet connection, network connection

Use Case Number: UC_06
Use Case Title: Receive Feed Back
Priority: High
Description: The property allows the web admin to receive the file sent.
Actor: Web Server
3.3 Analysis Models
The analysis models we use are Data Flow Diagrams and sequence diagrams. The requirements listed
in the functional requirement tables are modeled.
The following sequence diagram shows the process when the system captures video.
View Text
This sequence diagram shows the text output after processing the video. It depicts how the
processes interact in showing the text. The process starts when the user loads the video and the
description is shown. The user then presses process; the process video step processes the video
and matches it against the database, and if the match is successful it shows the text. Otherwise
the user is prompted to save the unmatched pattern, and the user saves the pattern to the
specified location.
Play Audio
This process is started by the user loading the video and seeing its description. The user then
requests processing, and the system processes the video and returns a successful match. The text
is displayed to the user, and at the same time the text process is invoked to process the text;
the system queries the database, finds the match, and waits for the user to invoke play. The audio
is then played, and at the same time the user is prompted to save the audio. If the user chooses
to save, the system saves the audio; if the user chooses cancel, nothing is saved. If the text
match is unsuccessful, the system prompts the user to save the unmatched text to the directory
where unmatched patterns are saved.
Figure 8. Sequence diagram for play audio
The user invokes the send feedback activity, which locates the files and then invokes the
compressor class. Once the files are successfully compressed, the send feedback activity shows a
message that the file is ready to send. The user issues the send request and the file is sent to
the server; the receiver activity shows a message to the admin and sends back a message to the
user confirming that the file has been successfully delivered.
Chapter 4. System Design
System design is the process of defining the architecture, components, modules, interfaces,
and data for the system to satisfy the specified requirements.
The class file shown below is based on the DFD and use case diagram already shown above in
Figure
4.2.2 Activity Diagram
Load Video Activity Diagram
This diagram shows the activities performed while loading a video. To load a video, the user
opens the dialog box and passes either an open request or a cancel request. If the cancel request
is passed, the activity finishes without opening the video. If the open request is passed and the
video opens successfully, the activity ends; otherwise it returns to the initial state to try again.
This diagram shows the activity of capturing or recording video. The activity is initialized and
first checks for a camera. If the camera is working, recording starts. If a stop request is passed
while recording, it stops recording and automatically saves the video to the previously specified
directory. If no record-again request is passed, the saved video is loaded for the next process
and the activity ends.
The diagram shows how the video is saved. After the save video activity is initialized, it takes
the file name, its size, and the destination, and then saves the video. If all of this is
successful the activity ends; otherwise it returns to the initial state.
Figure 14. Activity Diagram Save Video
This diagram shows how the video information is shown. As soon as the activity starts, it takes
the loaded video, reads the parameters location, name, size, format, duration, number of frames,
and frame rate, and shows this information for each loaded video.
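The listed parameters are not independent: the duration follows from the number of frames and the frame rate. The sketch below illustrates this relationship in Python; it is an illustration only, not the project's implementation, and the class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VideoInfo:
    """Hypothetical container for the parameters shown to the user."""
    name: str
    fmt: str
    size_bytes: int
    frame_count: int
    fps: float

    @property
    def duration_seconds(self) -> float:
        # Duration is derived from the frame count and the frame rate.
        if self.fps <= 0:
            raise ValueError("fps must be positive")
        return self.frame_count / self.fps


def describe(info: VideoInfo) -> str:
    """Format the video description as a single line of text."""
    return (f"{info.name} ({info.fmt}): {info.frame_count} frames at "
            f"{info.fps:g} fps, {info.duration_seconds:.2f} s, "
            f"{info.size_bytes / 1024:.1f} KiB")
```

For example, a 75-frame clip recorded at 25 fps lasts 3 seconds.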
Process Video Activity
The diagram shows the activity in which a video is converted to frames whose features are
extracted for matching against the patterns saved in the database. It first checks that the loaded
video is valid. If it is valid, it waits for processing to start; otherwise it returns to the
initial state and waits for the process button to be pressed again. After the features have been
extracted, the search activity searches for patterns resembling those generated in feature
extraction. If the search is successful, matching is done. If matching is successful, it checks
whether there are unprocessed frames and, if so, returns to the segment action; otherwise it ends.
If some patterns are unmatched, it goes to the save pattern activity and then ends.
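The control flow of this activity can be sketched as a loop over frames. The following Python sketch is an illustration under simplifying assumptions, not the project's code: patterns are plain tuples, the hypothetical `similarity` function counts agreeing positions instead of comparing SURF descriptors, and the threshold of 0.8 is an arbitrary example value.

```python
def similarity(a, b):
    """Fraction of positions where two equal-length patterns agree."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def process_frames(frames, extract_features, database, threshold=0.8):
    """Match every frame against the database.

    Returns the recognised texts in frame order and the patterns that
    could not be matched (to be saved by the save pattern activity).
    """
    texts, unmatched = [], []
    for frame in frames:
        pattern = extract_features(frame)
        best_text, best_score = None, 0.0
        for text, stored in database.items():
            score = similarity(pattern, stored)
            if score > best_score:
                best_text, best_score = text, score
        if best_text is not None and best_score >= threshold:
            texts.append(best_text)
        else:
            unmatched.append(pattern)
    return texts, unmatched
```

Keeping the unmatched patterns in a separate list mirrors the diagram, where unmatched patterns are handed to the save pattern activity rather than discarded.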
The diagram shows how a pattern is searched for in the process video activity.
Figure 17. Activity Diagram Search Pattern
The diagram shows how the save pattern process proceeds. It takes the pattern name, file size,
and saving location, and saves the pattern to the specified location.
Text Display Activity
The diagram shows how the text is displayed after the matching step in process video. This
activity starts every time process video ends. The display checks whether the text is loaded
successfully. If so, it displays the text; otherwise it returns to the initial state and starts
again. Once the text is displayed successfully, the copy to clipboard activity can be activated;
copying is done asynchronously. This activity continues while there are texts to be loaded and
ends when they are finished.
The following activity diagram shows the activity of copying to the clipboard. It checks whether
text is displayed; if so, the copy is successful. Otherwise it is not successful and returns to
the start or inactive stage.
Figure 20. Activity Diagram Copy to clipboard.
The following activity diagram shows the actions in the text process activity. The displayed text
goes to the analysis stage. During analysis the text is examined closely: whether it is a word or
a letter, what follows what, and so on. After analysis, the text’s equivalent is searched for in
the database and matched with the displayed text. If a match is found, the respective sound is
loaded and the next activity is ready to start. If the text is unmatched, it is saved to the
location where unmatched patterns are saved. Saving unmatched text can be done asynchronously so
that it does not affect the main activity. The activity remains active until the text loading is
finished, and then it comes to an end.
Figure 21. Activity Diagram Text Process
Play Audio Activity
This activity becomes active when audio is loaded after a successful text match in text
processing. If the audio is not loaded successfully, the activity becomes inactive and returns to
the initial state. If the audio loads successfully, it waits for a play or mute request. On a play
request, it asynchronously asks the user to save the mp3 version of the audio and at the same time
plays the audio. On a mute request, the audio cannot be heard; the activity ends if no more audio
is loading, but goes back to the audio load stage if audio loading is unfinished.
The activity diagram below shows the actions performed while saving an audio file. This occurs
when the user requests play audio: the file is played and the user is prompted to save or download
the audio file.
The send feedback activity is invoked when the user requests send feedback. It locates the files,
finds the patterns saved with their respective text, and compresses them. It then makes the file
ready for the send activity and ends.
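The compression step of the send feedback activity can be sketched as bundling the unmatched patterns and texts into a single archive. The sketch below uses Python's standard `zipfile` module for illustration; the project's own compressor class is not shown in this document, and the function name and the example file names are hypothetical.

```python
import io
import zipfile


def build_feedback_archive(unmatched_files):
    """Bundle unmatched patterns and texts into one zip archive.

    unmatched_files maps an archive member name to its content as bytes.
    Returns the archive as bytes, ready to be attached to an email.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in sorted(unmatched_files.items()):
            archive.writestr(name, payload)
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch self-contained; a desktop application could equally write it to the directory where the unmatched patterns are stored.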
Figure 25. Activity Diagram Send Feed Back.
Send Activity
This activity is invoked by the send feedback activity and sends the file to the specified admin
email with the proper address, subject, and description already specified.
Figure 27. Activity Diagram Receive Feed Back.
Figure 29. The proposed system Flowchart Structure (block labels recovered from the figure: Input Video, Sign Language, Video Processing)
Figure 30. The overall User interface
The following user interface depicts the screen when the user loads or captures a video. Sound
play is active by default. The loaded video’s description is then shown in the video description
area. If the user is capturing video, the video is loaded automatically after it is recorded, and
some description is shown in the video description section. The user has not yet pressed the
process video button.
Figure 31. When Video is loaded/ captured /Sound play Active
When the user doesn’t want the sound, he/she may mute it, and mute audio will be activated. The
user still hasn’t pressed the process button.
Figure 32. When Mute is activated/ Sound is mute
This user interface depicts the state when a user presses process video and the message “video
processing….” is shown; when the message is no longer shown, the processing is finished.
User interface when text is viewed
This user interface depicts the state when the user is shown the text output. While the text is
displayed, it is processed into its audio version and made ready to be played.
Table 4: Test design table
Chapter 5. Implementation of the System
The skin detector we have implemented works on the YCC (YCrCb) color space, which can be used to
find general skin color. As explained in [28], there are two types of skin detection mechanism:
Pixel based skin detection: a detection mechanism where each pixel is classified as either skin
or non-skin, independently of its neighbors.
Region based skin detection: a detection mechanism which looks at the spatial distribution or
arrangement of pixels based on intensity and texture.
The detection mechanism used in the implementation is color based, so it is categorized as pixel
based skin detection.
According to [28], YCC has advantages over RGB and HSV in that:
It can be applied to complex color images even under uneven illumination, while HSV (hue,
saturation, and value) is not ideal because its processing is time consuming and it works best
only for simple images with a uniform background.
It is applicable to and widely used for color based analysis, while RGB is not ideal for color
based analysis due to its non-uniform and device-dependent nature.
Skin and non-skin colors are detected based on the maximum and minimum values of YCC. The detected
skin region is eroded and dilated before its hull points and contour points are determined. The
morphological operations applied are dilation and erosion.
Dilation: a maximizing morphological operation which makes the bright regions of the image grow
by computing the local maximum over the area of the kernel.
Erosion: a minimizing morphological operation which computes the local minimum over the area of
the kernel.
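To make the two definitions concrete, the following Python sketch applies a 3x3 kernel to a binary mask stored as a list of rows. It is an illustration only: the project itself calls the Emgu CV erode and dilate functions, and the choice of a 3x3 kernel and of ignoring positions outside the image at the border are assumptions of this sketch.

```python
def _apply_3x3(mask, pick):
    """Replace each pixel by pick() over its 3x3 neighbourhood.

    Neighbours that fall outside the image are ignored (assumption).
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = pick(window)
    return out


def erode(mask):
    """Local minimum: bright regions shrink, isolated bright pixels vanish."""
    return _apply_3x3(mask, min)


def dilate(mask):
    """Local maximum: bright regions grow by one pixel in every direction."""
    return _apply_3x3(mask, max)
```

Eroding and then dilating removes small speckles of falsely detected skin while restoring the size of larger regions, which is why the two operations are applied together.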
The detected skin is passed to a function which extracts the contours and the convexity hull.
Based on the biggest contour, the skin part is identified and segmented.
When the YCrCb color space is used in Emgu CV, it is represented by the Ycc type: Image<Ycc,
Byte> frameImage = new Image<Ycc, Byte>(bitmap). In this sample code, Ycc represents the YCrCb
color space. This color space is used to detect skin between a minimum and a maximum Ycc value.
The detected skin is eroded and dilated with cvErode and cvDilate to smooth it and make the
detected skin visible. On the main image grabber a rectangular region is drawn whenever skin
color is detected. That region is then used to train the system and, later, for matching.
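The min/max test on YCrCb values can be sketched per pixel as follows. This Python sketch is an illustration, not the project's Emgu CV code. The conversion uses the widely used full-range RGB to YCrCb formula, and the bounds `SKIN_MIN` and `SKIN_MAX` are commonly quoted example values assumed here; the thresholds actually used by the project are not stated in this document.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb), rounded and clamped to 0..255."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return tuple(max(0, min(255, round(v))) for v in (y, cr, cb))


# Assumed example bounds (Y, Cr, Cb); not taken from the project.
SKIN_MIN = (0, 133, 77)
SKIN_MAX = (255, 173, 127)


def is_skin(r, g, b):
    """Pixel based classification: every channel must lie within the bounds."""
    ycrcb = rgb_to_ycrcb(r, g, b)
    return all(lo <= v <= hi for v, lo, hi in zip(ycrcb, SKIN_MIN, SKIN_MAX))


def skin_mask(image):
    """Return a binary mask (1 = skin) for an image given as rows of RGB tuples."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]
```

Because the luma bound spans the full range, only the two chroma channels decide the outcome, which is what makes the test tolerant of uneven illumination.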
SURF stands for Speeded Up Robust Features. It is a local feature detector and descriptor which
is used for matching images based on the descriptors found in the image saved in the database and
in the grabbed image.
It extracts features from the query image and from all the images in the database or collection
of images. The algorithm is very efficient, but it is used for a small number of images because
the search becomes exhaustive for more images. The SURF implementation used here is combined with
a fast search called KNN (K Nearest Neighbor), which makes the matching efficient.
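The idea of nearest neighbor matching between descriptors can be sketched as below. This is a brute-force illustration in Python, not the indexed search used through Emgu CV, and the descriptors are short tuples rather than real SURF vectors. The ratio test used to discard ambiguous matches is a common filtering technique assumed here; the document does not state which filter the project applies.

```python
import math


def knn(query, descriptors, k=2):
    """Return the k nearest (distance, index) pairs by Euclidean distance."""
    scored = sorted((math.dist(query, d), i) for i, d in enumerate(descriptors))
    return scored[:k]


def good_matches(query_descriptors, db_descriptors, ratio=0.75):
    """Keep a match only if its nearest neighbour is clearly closer than the second."""
    matches = []
    for qi, q in enumerate(query_descriptors):
        nearest = knn(q, db_descriptors, k=2)
        if len(nearest) < 2:
            continue  # the ratio test needs two neighbours
        (d1, i1), (d2, _) = nearest
        if d1 < ratio * d2:
            matches.append((qi, i1))
    return matches
```

A descriptor that lies almost equally far from its two nearest neighbours is ambiguous and is dropped, which corresponds to filtering out inappropriate matches before the match percentage is computed.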
matrix and distance matrix. The first one tracks the index of each image’s start and end. The
distance matrix holds the distance between each of the images and the queried image.
6. Inappropriate matches will be filtered out.
7. The matched results will be ready for the next step.
The following figures show how much of the image is matched. If the match percentage is high, the
red rectangular region is fully drawn; if there is little or no matching, nothing is drawn.
Figure 35. The matched area of the hand image captured from video.
Figure 36. Better matched image for captured image from video file.
Figure 37. Unmatched image with an image capture from video file.
From the filtered indices and the distance matrix, the match percentage is calculated to find how
much of the images is similar. Using the percentage as well as the absolute difference between
each database image and the queried image, matching is done and the corresponding text of the
saved image is output on the display.
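The two measures mentioned here can be sketched as follows. This Python sketch is an illustration under stated assumptions, not the project's code: images are small grey-scale grids given as lists of rows, the match percentage is taken as the share of keypoints that survived filtering, and the acceptance limit `max_diff` is a hypothetical example value.

```python
def match_percentage(good_match_count, keypoint_count):
    """Share of keypoints that survived filtering, in percent."""
    if keypoint_count == 0:
        return 0.0
    return 100.0 * good_match_count / keypoint_count


def mean_abs_diff(a, b):
    """Mean absolute grey-level difference between two equally sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    count = sum(len(row) for row in a)
    if count == 0:
        raise ValueError("images must not be empty")
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count


def best_match(query, database, max_diff=20.0):
    """Return the text of the closest stored image, or None if none is close enough."""
    best_text, best_diff = None, None
    for text, image in database.items():
        diff = mean_abs_diff(query, image)
        if best_diff is None or diff < best_diff:
            best_text, best_diff = text, diff
    if best_diff is not None and best_diff <= max_diff:
        return best_text
    return None
```

Returning None for a poor best candidate corresponds to the unmatched case, where the pattern is saved instead of a text being displayed.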
5.2.3 Training phase of implementation
Using the contour, convex, and convexity hull algorithms, the detected skin region is extracted
and saved to the database file. While being saved it is cropped to 100x100 pixels, which is the
value chosen as optimal for comparing images, and the name of the image is saved to the database
file.
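Bringing every stored region to the same 100x100 size is what makes a pixel-by-pixel comparison possible. The Python sketch below shows one simple way to do this, nearest-neighbor resampling; it is an illustration only, and the interpolation method actually used by the project is not stated in this document.

```python
def resize_nearest(image, width=100, height=100):
    """Resize an image (list of rows) by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // height][x * src_w // width]
             for x in range(width)]
            for y in range(height)]
```

Each output pixel copies the source pixel at the proportionally scaled position, so a 2x2 image becomes four uniform quadrants.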
The approach used here is concatenative synthesis, in which the sounds of the individual
characters of the alphabet are concatenated to create a word. When a word is given, it is read
out based on those already saved sounds. The special Amharic characters beyond the 231 are not
included, as they are beyond our scope; the English alphabet is also included. The sampling rate
of each audio is the default one, which is 1200.
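Concatenative synthesis as described here amounts to looking up the stored sound of each character and joining the results. The Python sketch below illustrates the idea with tiny made-up sample lists; it is not the project's Matlab code, and the function name and the dictionary contents are hypothetical.

```python
def synthesize(word, unit_sounds):
    """Concatenate the stored samples of each character in the word.

    Returns the joined samples and the characters that have no stored sound.
    """
    samples, missing = [], []
    for character in word:
        if character in unit_sounds:
            samples.extend(unit_sounds[character])
        else:
            missing.append(character)
    return samples, missing
```

Reporting the characters without a stored sound separately makes it possible to handle unmatched text, for example by saving it for later feedback.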
1. The text output is given as an input to the voice synthesis class. The class is implemented
in C# with a library that calls Matlab.
2. The input text is matched with its corresponding number.
3. The Matlab function is called from C# using the following code.
// Start the Matlab automation server and run the command held in path.
var matlab = new MLApp.MLApp();
matlab.Execute(path);
The function call is in the following code snippet.
4. The Matlab function numberToVoice takes numberArray and converts it to the corresponding
voices.
5. Silence is removed from each sound with a threshold of 0.1.
6. After the silence is removed, the sound is enhanced using a speech enhancement function, which
reduces high-pitched voiced signals using a low pass filter.
7. The sound is returned to the C# environment and played.
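Steps 5 and 6 can be sketched as two small signal-processing functions. The Python sketch below is a simplified illustration, not the project's Matlab code: silence is removed sample by sample using the 0.1 amplitude threshold, and a moving average stands in for the low pass filter. Both simplifications, and the window length, are assumptions.

```python
def remove_silence(samples, threshold=0.1):
    """Drop samples whose absolute amplitude is below the threshold."""
    return [s for s in samples if abs(s) >= threshold]


def low_pass(samples, window=3):
    """Moving-average low pass filter over the current and previous samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Averaging neighbouring samples attenuates rapid changes, which is how the filter reduces the high-frequency content of the concatenated sounds.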
Chapter 6. Test Results
6.1. Purpose
The purpose of this chapter is to provide a summary of the results of the tests performed as
outlined within this documentation.
This section presents the test results of the unit functions of the system; each is tested
according to the test design suite designed in the previous semester.
Table 5. Unit Function Testing
9 | Show Text | Team | Pass | less | As the video frame is moving, the respective matched text is shown.
10 | Show Text with sound | Team | Pass | Some defects | As the video frame is moving, the respective matched text is shown and the respective audio is played.
We also added a training environment to train the system with more images and videos to get
better results. Since our application works in a controlled environment, and since we couldn’t
find enough videos and images to train it ourselves, we added a training button to train the
system.
Using some videos, we tested whether our system converts the video to sound and whether it
actually trains on new sets of images and video, as shown in the table below.
Chapter 7. Conclusion and Recommendation
7.1 Conclusion
The documentation the team developed is for ESL to text and audio translation using an image
processing approach. The translation is one way: it goes from gesture to text and audio. To
develop the documentation, the team gathered requirements through a questionnaire, use case
scenarios, and group discussion. The gathered requirements are modeled using analysis models such
as DFDs, sequence diagrams, activity diagrams, and class diagrams. The documentation also includes
a test design which is used to test the implementation. In the implementation we used the Emgu CV
library, a cross-platform .NET wrapper for OpenCV, for image processing. With it we trained the
system on different sample videos and used it to recognize signs from files and from a live
camera. The audio processing uses a concatenative approach which covers all 231 characters of the
alphabet, without the special characters beyond those. The algorithms used for image recognition
and matching are SURF and the absolute difference of images. The videos used are short, because
longer videos increase processing time and slow performance. The image database is not large,
since the SURF algorithm is efficient only up to a moderate number of images. There is some delay
in calling Matlab for the audio processing of the respective text. We have developed the running
prototype as a desktop version only, and the communication is one way.
7.2 Recommendation
Finally, the project team recommends the following:
The system should be extended to work in both directions so that two-way communication is
possible.
The system should also be available on smartphones and tablets, since these devices are widely
available in the world and in our country as well.
The system should work with web based application technologies, and a web API needs to be
developed in the future.
The system should support all the gestures of ESL.
TTS needs to be developed for the voices of the Amharic dialects of Ethiopia’s regions.
A large trained database of gestures needs to be available for all the gestures of ESL.
Better machine learning algorithms should be applied for training and pattern recognition on a
large scale.
Chapter 8. User Manual
8.1. General Information
8.1.1 System Overview
The system is mainly intended to solve the problem of the communication gap between deaf and
hearing people; it translates ESL to Amharic text and sound. The system consists of three phases:
8.1.2 Authorization
Using the system in an illegal way which violates the university’s rules and regulations, or
making unauthorized copies of data, reports, and documents, is not permitted.
8.1.3 Contact
If you face any difficulty while using the system, or have questions or recommendations, please
contact us:
E-mail: eslteam2016project@gmail.com
Figure 38. Loading the video
Figure 39. Text Display for the corresponding sign of Zero
8.3.6 Training the system
After clicking the train button on the main GUI, another screen appears where the user can train
the system using a loaded video, the camera, or static images, although the resulting performance
of the three differs. After successful training, the user can check whether the system is trained
properly.
References
[1] Dagnachew F. Wolde. Machine Translation System for Amharic Text to Ethiopian Sign
Language: Thesis report, Addis Ababa Institute of Technology, 2011.
[3] Information Bulletin of the Faculty of Humanities in Addis Ababa University Vol. 1 Issue 2
January 2012, Addis Ababa.
[4] Eyasu Haile. Sign Language News at Addis Ababa University: case study, Addis Ababa
University, 2008.
20, 2015.
[9] A.E.El-Alfi, A.F. El-Gamal & R.A. El-Adly. Real Time Arabic Sign Language to Arabic
Text and Sound Translation System: Journal Report, Mansoura, Egypt. Retrieved October 30,
2016 from http://www.ijert.org/view-pdf/9906/.
[10] http://www.techrepublic.com/blog/10-things/10-techniques-for-gathering-requirements/
[12] Chaelynne M. Wolak. Gathering Requirements the Use Case Approach. Retrieved
24, 2015.
[14] http://freecode.com/projects/espeak-for-android Accessed date November 28, 2015.
[15] http://www.computerworld.com/article/2988167/microsoft-windows/microsoft-claims110m-devices-now-run-windows-10.html Accessed date November 28, 2015.
[17] Abdelmoty M. Ahmed, Reda Abo Alez, Muhammad Taha and Gamal Tharwat. Propose a
New Method for Extracting Hand using in the Arabic Sign Language Recognition (Arslr)
System. Retrieved from http://www.ijert.org/download/14297/propose-a-new-method-for-extracting-hand-using-in-the-arabic-sign-language-recognition-arslr-system
[18] Sarad Dhungel. American Sign Language (ASL) to Text/Voice. Retrieved December 1,
2015 from http://www.mwftr.com/SD1415/ASL2Text.pdf
[19] Jessica Hutchinson. Literature Review: Analysis of Sign Language Notations for Parsing in
Machine Translation of SASL. Thesis Report. Rhodes University South Africa, 2012. Retrieved
December 1, 2015 from http://www.cs.ru.ac.za/research/g09h2318/LiteratureReview.pdf
[22] http://www-01.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsPhonology.htm.
Accessed date December 2, 2015.
[28] Chelsia Amy Doukim et al. Comparison of Three Colour Spaces in Skin Detection.
University Malaysia Sabah, 88999 Kota Kinabalu, Sabah, Malaysia.
Appendix A. Questionnaire
Dear respondents, you are requested to fill in this questionnaire for the purpose of requirement
gathering for our project. There are a total of 8 questions, and you are expected to tick (X) in
the space provided.
Q.No | Assumed Questions that can cover the whole user requirements | Yes | No | Don’t know
Appendix B. Project Schedule
The project ends at the end of the semester. The project schedule is based on the end-of-semester
deadline for the documentation part, the Software Requirement Specification Document (SRSD). The
project schedule for the system to be developed is illustrated as follows.
The project schedule in terms of a Gantt chart is shown as follows. The end dates in both the
first and second semesters might change because the University schedule might change.