Addis Ababa Institute of Technology

School of Electrical and Computer Engineering

Center of Information Technology and Scientific Computing

Department of Software Engineering

Project Documentation on

Ethiopian Sign Language (ESL) Conversion System to Text and Audio Communication: Image Processing Approach

By:

Eyoel Fekadu: ENR/1739/04

Michael Gobena: ENR/2611/04

Tegbabu Solomon: ENR/2002/04

Wondifraw Teffera: ENR/2179/04

Advisor:

Tigabu Dagne

June, 2016
Addis Ababa Institute of Technology

School of Electrical and Computer Engineering

Center of Information Technology and Scientific Computing

Department of Software Engineering

Ethiopian Sign Language (ESL) Conversion System to Text and Audio Communication: Image Processing Approach

This project documentation is submitted in partial fulfillment of the requirements for the
Degree of Bachelor of Science in Software Engineering.

Project Advised by:

Tigabu Dagne

Name Title Signature Date

1. __________________ Advisor _____________ _____________


2. __________________ Chairperson ___________ _____________
3. __________________ Examiner _____________ _____________
4. __________________ Examiner _____________ _____________
5. __________________ Examiner _____________ _____________

June, 2016

Declaration of Originality

We declare that this project is our original work and has not been presented for a
degree in any other university.

Name Signature

1. __________________________ __________________

2. __________________________ __________________

3. __________________________ __________________

4. __________________________ __________________

This project documentation has been submitted for examination with my approval
as university advisor.

Advisor Name: Signature

Tigabu Dagne __________________

Acknowledgement
Before all, the project team would like to give special thanks to God for His guidance throughout
our journey so far.

We would then like to thank our advisor, Tigabu Dagne, without whose persistent supervision and
guidance this project documentation might not have been completed.

Our special thanks also go to our families, who have supported us in every way from our childhood
to the present.

Next, we would like to thank all the people who gave their valuable time, feedback, support and
encouragement while we worked on this project documentation.

Finally, we would like to take this opportunity to thank everyone who contributed to this project
documentation with their invaluable opinions and suggestions; without their collaboration it might
not have been completed.

Abstract
Communication is the way people interact with each other in their day-to-day lives. Sign language
is the dominant means of communication for the deaf community. As members of society, this
community needs to communicate with everyone else, breaking its communication barrier, and the
project team tries to tackle this problem to some extent. A shortage of corpus data and the scope
of the project may make it difficult to develop the project in the way it was conceived, but the
positive impact it would bring if successful motivates us to keep trying.

Internationally, many sign language to text and sound systems have been built. For instance, ASL
to text and sound for English is implemented in the USA, and Arabic Sign Language to text and audio
for Arabic is implemented in Arab countries. Locally, there is no such system for Ethiopian Sign
Language, so we came up with a project to convert ESL to text and audio.

The main purpose of this project is to design a system which converts Ethiopian Sign Language to
text and speech.

Our system is proposed to run on desktop and to be implemented using C#, Matlab, or EmguCV. The
data gathering methods used are questionnaires, discussions, use case scenarios, and an extensive
literature review. The methods proposed for the implementation are an image/video processing
approach, dictionary file development, the use of existing audio files for Amharic letters, and
the collection of a sound corpus by recording audio.

The design phase follows an object-oriented approach using UML modeling. The team identified the
functional requirements and modeled them using the major modeling methods: each functional and
user requirement is analyzed using activity diagrams, sequence diagrams, DFD diagrams, and class
diagrams. Finally, a test design is included to test the system at the end of its implementation.

The implementation is done using C# and the EmguCV library, a cross-platform .NET wrapper for the
OpenCV image processing library. The algorithms applied are convex hull and convexity defects for
hand detection, and SURF and absolute difference for recognition and matching. The tests were
conducted according to the test design of the first semester. The audio processing uses a
concatenative approach and is implemented in Matlab.

Table of Contents
Contents pages
Acknowledgement ........................................................................................................................ IV
Abstract .......................................................................................................................................... V
Table of Contents .......................................................................................................................... VI
List of Acronyms .......................................................................................................................... IX
List of Figures ................................................................................................................................ X
List of Tables ............................................................................................................................... XII
Chapter 1. Introduction ................................................................................................................... 1
1.1 Background of the Study .................................................................................................. 1
1.2 Statement of the Problem ................................................................................................. 2
1.3 Objective of the Study ...................................................................................................... 3
1.3.1 General Objective ..................................................................................................... 3
1.3.2 Specific Objective ..................................................................................................... 4
1.4 Proposed System .............................................................................................................. 4
1.5 Feasibility Study............................................................................................................... 5
1.5.1 Technical Feasibility ................................................................................................. 5
1.5.2 Economic Feasibility ................................................................................................ 6
1.5.3 Legal Feasibility........................................................................................................ 6
1.5.4 Operational Feasibility .............................................................................................. 6
1.5.5 Schedule Feasibility .................................................................................................. 6
1.6 Methodology & Data Gathering Tools............................................................................. 7
1.6.1 Data Gathering tools ................................................................................................. 7
1.6.2 Implementation Tools and methods employed ......................................................... 7
1.7 Scope and Limitation of the Study ................................................................................... 8
1.8 Organization of the Project ................................................................................................... 9
Chapter 2. Review of Related Literature ..................................................................................... 10
2.1. Related works on the Area ................................................................................................. 10
2.2. The History of Sign Language Representation .................................................................. 11
2.3. Phonological Structure of Sign Language .......................................................................... 12
2.4. Morphological Structure of Sign Language ....................................................................... 12
2.5. Text-to-Speech (TTS) Systems at Glance .......................................................................... 13

2.6. Phonological Structure of Amharic .................................................................................... 13
2.7. Morphological Structure of Amharic Language ................................................................ 14
Chapter 3. System Features........................................................................................................... 15
3.1. Functional Requirements.................................................................................................... 15
3.1.1. User Requirements ...................................................................................................... 16
3.1.2. Detail Functional Requirements .................................................................................. 18
3.1.3 Use Case Model ............................................................................................................ 24
3.2. Non Functional Requirements ............................................................................................ 26
3.3 Analysis Models .................................................................................................................. 27
3.3.1. Data Flow Diagrams .................................................................................................... 27
3.3.2 Sequence Diagram ........................................................................................................ 27
Chapter 4. System Design ............................................................................................................. 32
4.1 Deployment Diagram .......................................................................................................... 32
4.2 Architectural Design ........................................................................................................... 32
4.2.1 Class Diagram............................................................................................................... 32
4.2.2 Activity Diagram .......................................................................................................... 34
4.2.3 Proposed Software Structure ........................................................................................ 45
4.3 User Interface Design .......................................................................................................... 47
4.4 Test Design .......................................................................................................................... 51
Chapter 5. Implementation of the System..................................................................................... 53
5. Implementation of the System .................................................................................................. 53
5.1 Implementation.................................................................................................................... 53
5.2 Detection and Recognition .................................................................................................. 53
5.2.1 Hand Skin Color Detection........................................................................................... 53
5.2.2 Hand Recognition ......................................................................................... 55
5.2.3 Training phase of implementation ................................................................................ 58
5.3 Implementation of Text to Audio ........................................................................................ 58
Chapter 6. Test Results ................................................................................................................. 60
6.1. Purpose ............................................................................................................................... 60
6.2. Unit Functional Testing...................................................................................................... 60
6.3 System Function Testing ..................................................................................................... 61
Chapter 7. Conclusion and Recommendation ............................................................................... 63

7.1 Conclusion........................................................................................................................... 63
7.2 Recommendation ................................................................................................................. 63
Chapter 8. User Manual ................................................................................................................ 65
8.1. General Information ........................................................................................................... 65
8.1.1 System Overview .................................................................................................... 65
8.1.2 Authorization ............................................................................................................... 65
8.1.3 Contact ......................................................................................................................... 65
8.2 Access level ........................................................................................................................ 65
8.3 System Activities............................................................................................................... 65
8.3.1 Load Video ................................................................................................................... 65
8.3.2 Capture Video ............................................................................................................... 66
8.3.3 Video Description ........................................................................................................ 66
8.3.4 Text Display ................................................................................................ 66
8.3.5 Exit the system.............................................................................................................. 67
8.3.6 Training the system ...................................................................................................... 68
8.4. How the System works....................................................................................................... 68
References ..................................................................................................................................... 69
Appendix A. Questionnaire .......................................................................................................... 72
Appendix B. Project Schedule ...................................................................................................... 73

List of Acronyms

1. ArSL: Arabic Sign Language


2. ASL: American Sign Language.
3. BSC: Bachelor of Science.
4. ENAD: Ethiopian National Association for the Deaf
5. ESL: Ethiopian Sign Language.
6. FLANN: Fast Library for Approximate Nearest Neighbor
7. KNN: K Nearest Neighbor
8. SRSD: Software Requirement Specification Document.
9. SSML: Speech Synthesis Markup Language.
10. SURF: Speeded Up Robust Features
11. TTS: Text to Speech
12. UML: Unified Modeling Language

List of Figures
Figure 1. The proposed system -------------------------------------------------------------------------- 5
Figure 2. The bar graph depicting the user requirement fetching -----------------------------------17
Figure 3. System wide use case system ---------------------------------------------------------------- 24
Figure 4. DFD Diagram -----------------------------------------------------------------------------------27
Figure 5. Sequence diagram for Load and Show Video Description --------------------------------28
Figure 6. Sequence Diagram for Capture Video and Show Description ----------------------------28
Figure 7. Sequence diagram for View Text -------------------------------------------------------------29
Figure 8. Sequence diagram for play audio -------------------------------------------------------------30
Figure 9. Sequence diagram for Sending feedback and Receiving feedback ----------------------31
Figure 10. Deployment Diagram -------------------------------------------------------------------------32
Figure 11. Class Diagram ---------------------------------------------------------------------------------33
Figure 12. Activity diagram load video -----------------------------------------------------------------34
Figure 13. Activity diagram capture video ------------------------------------------------------------- 35
Figure 14. Activity diagram save video -----------------------------------------------------------------36
Figure 15. Activity diagram video Description --------------------------------------------------------36
Figure 16. Activity diagram process video -------------------------------------------------------------37
Figure 17. Activity diagram search pattern. ------------------------------------------------------------38
Figure 18. Activity diagram save pattern ---------------------------------------------------------------38
Figure 19. Activity diagram text display ---------------------------------------------------------------39
Figure 20. Activity diagram copy to clipboard ------------------------------------------------------- 40
Figure 21. Activity diagram text process -------------------------------------------------------------- 41
Figure 22. Activity diagram save text ------------------------------------------------------------------ 41
Figure 23. Activity diagram play audio ---------------------------------------------------------------- 42
Figure 24. Activity diagram save audio ---------------------------------------------------------------- 43
Figure 25. Activity diagram send feedback ------------------------------------------------------------ 44
Figure 26. Activity diagram send activity -------------------------------------------------------------- 44
Figure 27. Activity diagram Receive Feedback-------------------------------------------------------- 45
Figure 28. The proposed software structure ------------------------------------------------------------ 46
Figure 29. The proposed system Flowchart Structure ------------------------------------------------ 47
Figure 30. The overall user interface -------------------------------------------------------------------- 48

Figure 31. When video is loaded/captured/ sound play activity ------------------------------------- 49
Figure 32. When Mute is activated / sound is mute --------------------------------------------------- 50
Figure 33. When process video is pressed -------------------------------------------------------------- 50
Figure 34. When video processing is finished ---------------------------------------------------------- 51
Figure 35. The matched area of the hand image captured from video------------------------------- 56
Figure 36. Better matched image for captured image from video file------------------------------- 56
Figure 37. Unmatched image with an image capture from video file ------------------------------- 57
Figure 38. Loading the Video----------------------------------------------------------------------------- 66
Figure 39. Text Display for a corresponding sign of Zero ------------------------------------------- 67
Figure 40. Audio output------------------------------------------------------------------------------------ 67

List of Tables
Table 1 response to the questionnaire distributed ----------------------------------------------- 16
Table 2 list of functional requirement ------------------------------------------------------------ 19
Table 3 list of use case description ---------------------------------------------------------------- 25
Table 4 Test Design Suit ---------------------------------------------------------------------------- 51
Table 5 Unit Function Testing---------------------------------------------------------------------- 61
Table 6 System Function Testing------------------------------------------------------------------- 62

Chapter 1. Introduction
1.1 Background of the Study
Ethiopian Sign Language (ESL) is a natural language used by a community of over one million deaf
Ethiopians [1]. It is the native language of the Deaf community. It is composed of two kinds of
features, manual and non-manual: the manual components are hand shape, hand orientation, hand
position, and hand movement, while the non-manual ones are mouth pattern, head and shoulder
movement, facial expression, and eye gaze [1]. Like many African sign languages, it has a
historical connection with American Sign Language (ASL); however, it has an independent lexical
and grammatical structure [2]. In addition, as discussed in [4], ESL originates from ASL with some
influence from Nordic countries. For instance, [4] ran a pilot survey of 249 signs on how close
the two languages are and found that 25% of the signs borrowed from ASL were only modified to suit
Ethiopian culture rather than completely changed. As a result, a sign language user from the USA
can communicate to some degree with a sign language user from Ethiopia, more easily than
non-signing speakers of the two countries can. Knowing the language's origin is essential to
understanding how it works and how it relates to other languages.

Internationally, researchers have tried to solve the problem of communication between deaf and
hearing people. As explained in [7], MotionSavvy Uni, the first sign language to voice system and
a two-way communication tool, is under development and expected to arrive by summer 2016. The
MotionSavvy Uni package comprises a tablet, a case with integrated motion tracking capabilities,
and a mobile app. As users perform sign language, they can see themselves in a mirror image that
provides real-time feedback on their signing. MotionSavvy is therefore expected to be a
breakthrough for effective communication between signers and hearing people [7]. Although this
technology has not yet spread, it is expected to become widely available during 2016.

Locally, only a few studies have been done. For instance, a machine translation of Amharic text to
Ethiopian Sign Language was done by [1] in partial fulfillment of a Master of Science in Computer
Engineering. It achieved the first translation of Amharic text to sign language and is an input
for other researchers, but the project is not usable for day-to-day communication between signers
and non-signers. In addition, two students, Libnedengel and Befakadu, built a project on Amharic
text and sound to Ethiopian Sign Language (ESL) in partial fulfillment of a Bachelor of Science in
Computer Engineering in 2015 [8]. They achieved translation of Amharic text and audio to the
respective ESL, but the system is not well trained and shows nothing when some words are fed to it.

The project team is undertaking this project mainly for an association of sign language users in
Ethiopia, the Ethiopian National Association for the Deaf (ENAD). Although the system is mainly
for sign users, it can also be used by non-signers.

ENAD is a non-profit organization established by deaf people themselves. It was founded by six
deaf persons and became legally registered on 6 July 2006 at the Ministry of Justice [13].

Its mission is to empower deaf people and their families toward an independent life; to facilitate
and ensure their full inclusion and participation in society; to raise public awareness of their
unique disability; to promote the use of alternative communication techniques; and to promote the
improvement of services and facilities for people who are deaf.

1.2 Statement of the Problem


As discussed by [1], there are a few long-established and modern schools that support new members
of the deaf community in Ethiopia. Though these schools gave the deaf community the opportunity to
develop a native language of their own, they still left the community marginalized, with its scope
of communication limited to specific geographic areas. As a result, deaf people are obliged to
rely on human interpreters in their day-to-day interaction with hearing people.

Government and public awareness of sign language and deaf individuals is somewhat encouraging
nowadays. As an indication, Addis Ababa University has run the Department of Ethiopian Sign
Language and Deaf Culture as a degree program since 2008 G.C. The program trains deaf and hearing
students in the areas of Ethiopian Sign Language, sign linguistics, interpretation, and Deaf
culture [3]. Another promising development is that ESL education is being given as a non-credit
course aimed at training users of ESL to be fluent and to help the Deaf community [4]. But this
still has not solved the communication problems imposed on the deaf community. It is about 7-8
years since the department was launched, yet its graduates have not significantly narrowed the
gap, nor become correspondingly active members of the country's economy.

Though research has been done to improve the active role of the deaf community, it has not been
significant in tackling the problem. In addition, as [1] discussed, there is video-based text to
ESL, but the approach is infeasible in terms of time and cost, because making a video requires
time and professionals. The project done by [1] is little more than a translation of Amharic text
to ESL and has not been seen in use by the deaf community to alleviate the widened communication
gap between sign language users and non-signers.

The project done by Libnedengel and Befakadu is a step forward, translating Amharic text to sound
and then to Ethiopian Sign Language [8]. However, the system works in one direction only. In the
existing situation, a signer and a hearing person who are both literate can communicate by writing
on paper or by text-based chatting; the system gives priority to the non-signer over the signer.

The existing way deaf individuals and hearing people communicate is through interpreters, which is
manual and requires an additional person, or through text-based communication in which each party
writes. Some videos have also been made to aid them, but they are not sufficient. In general, the
existing situation is not suitable for addressing the communication problem faced by signers with
hearing people.

This project is intended to answer the following questions:

• What are the ways to represent ESL in text and sound?
• How can the recognition percentage be increased in a noisy environment?
• How can ESL signs be mapped to Amharic text and audio?
• What training mechanisms are used to train the system?

1.3 Objective of the Study


1.3.1 General Objective
The general objective of this project is to develop a prototype which converts ESL to the
corresponding text and speech using image processing.

1.3.2 Specific Objective
Specific objectives of this project are:

1. To devise a way to represent ESL in text and sound.
2. To select a better algorithm for recognition and matching.
3. To increase the recognition rate even in a noisy environment.
4. To apply an appropriate training algorithm to train the system.

1.4 Proposed System


The proposed system tries to alleviate the communication gap by converting ESL to the equivalent
Amharic text and audio. The system takes video input, processes the sign language speaker's
continuous frames, and then converts them to the respective text and audio. In general, the
proposed system aims to produce a prototype which translates ESL to Amharic text and sound. It has
three main phases: video/image processing; pattern construction and discrimination; and text and
sound transformation. The system uses ESL gestures from the Ethiopian Sign Language Dictionary.
The proposed system works in three main steps, as explained by [9] and shown in the following
figure.

[Figure: three-stage pipeline. Video Processing: input video → video segmentation → key frame
processing. Pattern Construction and Discrimination: feature extraction → matching against stored
sign language patterns. Output: display of the text and playing of the sound.]
Figure 1. The proposed system
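
To make the first stage concrete, the following is a minimal sketch of key frame segmentation by
absolute frame difference, written in C# with EmguCV (the wrapper proposed for the
implementation). It assumes EmguCV 3.x; the motion threshold and the mean-difference decision rule
are illustrative assumptions, not the project's tuned values.

    using System.Collections.Generic;
    using Emgu.CV;
    using Emgu.CV.CvEnum;

    class KeyFrameExtractor
    {
        // Keep frames that differ strongly from their predecessor,
        // approximating the key frame segmentation described above.
        static List<Mat> ExtractKeyFrames(string videoPath, double threshold = 25.0)
        {
            var keyFrames = new List<Mat>();
            using (var capture = new VideoCapture(videoPath))
            {
                Mat previousGray = null;
                while (true)
                {
                    Mat frame = capture.QueryFrame(); // null once the video ends
                    if (frame == null) break;

                    var gray = new Mat();
                    CvInvoke.CvtColor(frame, gray, ColorConversion.Bgr2Gray);

                    if (previousGray == null)
                        keyFrames.Add(frame.Clone()); // always keep the first frame
                    else
                    {
                        var diff = new Mat();
                        CvInvoke.AbsDiff(gray, previousGray, diff); // absolute difference
                        if (CvInvoke.Mean(diff).V0 > threshold)     // mean pixel change
                            keyFrames.Add(frame.Clone());
                    }
                    previousGray = gray;
                }
            }
            return keyFrames;
        }
    }

The key frames kept by such a routine would then feed the pattern construction and discrimination
stage.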

1.5 Feasibility Study


A feasibility study aims to objectively and rationally uncover the strengths and weaknesses of an
existing business or proposed venture, the opportunities and threats present in the environment,
the resources required to carry it through, and ultimately the prospects for success [6]. In other
words, it applies criteria to weigh the cost required against the value attained. As discussed in
[6], there are five major feasibility areas, namely Technical, Economic, Legal, Operational, and
Schedule (TELOS).

1.5.1 Technical Feasibility


Technical feasibility focuses on gaining an understanding of the present technical resources of
the organization and their applicability to the expected needs of the proposed system [6]. It
covers materials, labor, and technology. For the proposed system, the technical resources, such as
materials (computers, appropriate software, etc.), are largely available, labor in the form of
programmers is available, and together these are expected to fulfill the needs of the system.

1.5.2 Economic Feasibility
Economic feasibility determines the positive economic benefits the proposed system will provide to
the organization, including quantifying and identifying all the benefits expected [6]. The
proposed system will have a number of economic benefits to the organization and beyond:

• Easier communication between deaf individuals and hearing people.
• Enabling deaf individuals to participate actively in any activity that hearing people do.
• Raising awareness and educating the deaf community, starting from early childhood.
• Enabling the deaf community to participate actively in the political matters of the country.

1.5.3 Legal Feasibility


Legal feasibility determines whether the proposed system conflicts with legal requirements [6].
The proposed system complies with the rules, regulations, and proclamations of the country; no
part of the system or its functions contradicts the law.

1.5.4 Operational Feasibility


Operational feasibility is a measure of how well a proposed system solves the problems and takes
advantage of the opportunities identified during scope definition, and how well it satisfies the
requirements identified in the requirements analysis phase of system development [6]. The proposed
system is expected to solve the identified problem to a satisfactory level and to satisfy the
requirements identified in the analysis phase of the project.

1.5.5 Schedule Feasibility


Schedule feasibility is a measure of how reasonable the project timetable is [6]. The system is
developed in partial fulfillment of the Bachelor of Science in Software Engineering, and the
schedule is that all documentation, i.e., the system requirement specification, is to be completed
within this semester, while the implementation will be conducted in the coming semester. As a
result, the project timetable is bounded by the semester length.

1.6 Methodology & Data Gathering Tools
This section describes the methods the project team intends to use to collect data.

1.6.1 Data Gathering tools


Discussion
Group discussion is one way to reach a common understanding of the system, so we use it as one of
our data gathering tools.

Questionnaire
Questionnaires are an informal requirements gathering tool used to reach stakeholders in remote or
distant locations, i.e., stakeholders who are not easily accessible at the time the project team
needs them [10]. The project team has therefore planned to distribute questionnaires to these
stakeholders to gather the data essential for the system to be developed.

Extensive literature review survey on the area


A literature review is a text of scholarly writing which includes the current substantive findings
as well as theoretical and methodological contributions to a particular topic [11]. An extensive
literature review survey will be conducted to draw on any available research or tools that will
help in developing the system.

Use cases
Use cases are basically stories that describe how a discrete process works and are easy for users
to articulate [10]. [12] explains a use case as a scenario which captures the contract between the
stakeholders of a system about its behavior; use cases describe the system's behavior under
various conditions as it responds to requests from users [12]. The project team has therefore
planned to use this essential tool in the overall design process of the system.

1.6.2 Implementation Tools and methods employed


The project will be conducted using image and video processing tools such as Matlab and EmguCV (a
cross-platform .NET wrapper for the OpenCV image processing library), together with Audacity, the
Visual Studio IDE, and Visual Paradigm. The concept of training will also be used to train the
system.

The project team has decided to employ the following methods and approaches in line with the
implementation tools.

• Image/video processing approach: the input will be video, which will be further extracted into
video frames of the different ESL signs.
• Object-oriented analysis and design approach: an approach whose primary focus is on the behavior
of the system. The Unified Modeling Language (UML) is an object-oriented language for specifying,
visualizing, constructing, and documenting software systems (UML Document Set, 2001). It is the
successor to the modeling languages found in Booch (Booch 1994), OOSE/Jacobson, OMT, and other
methods. UML matters because it helps software developers communicate, and it has to be used in a
way that helps communication rather than hindering it [4]. UML will be employed to specify,
visualize, construct, and document the system during analysis and design.
• Dictionary file development: a file used for language recognition, based on the ESL dictionary
(a minimal sketch of such a mapping follows this list).
• Using available audio for Amharic 'Fidel' sounds: to develop a dictionary mapping each sign to
its respective sound (audio) in .wav format.
• Recording audio: to obtain sample sounds for testing the system in each phase.
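
As a rough illustration of the dictionary file idea, the sketch below loads a hypothetical
tab-separated file in which each line maps a sign label to its Amharic text and the path of its
.wav recording; the file name and column layout are assumptions for illustration, not the
project's actual dictionary format.

    using System.Collections.Generic;
    using System.IO;

    class SignDictionary
    {
        // Maps a recognized sign label to its Amharic text and audio file path.
        public readonly Dictionary<string, (string Text, string WavPath)> Entries =
            new Dictionary<string, (string, string)>();

        // Assumed line format: signLabel<TAB>amharicText<TAB>path/to/sound.wav
        public SignDictionary(string dictionaryFile)
        {
            foreach (var line in File.ReadAllLines(dictionaryFile))
            {
                var parts = line.Split('\t');
                if (parts.Length == 3)
                    Entries[parts[0]] = (parts[1], parts[2]);
            }
        }
    }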

1.7 Scope and Limitation of the Study


The project has its own scope and limitations. It is implemented as one-way communication, i.e.,
ESL to text and speech output. In addition, the system supports only selected ESL signs: the
numbers from zero up to five, some selected Amharic words, the English alphabet from A to D, and
some characters of the Amharic alphabet. The sound of the whole alphabet, however, is included.
The project supports only static ESL gestures, i.e., two-dimensional signs without motion; the
dynamic aspect of ESL is beyond the scope of the project. Due to the time and cost of the project,
it is confined to a specific region, namely Addis Ababa. The videos used for training the system
were recorded in a controlled environment. The audio synthesis is trained using a concatenative
approach; only the 231 Amharic alphabet characters and the A-Z English alphabet are considered,
and word-level concatenation is beyond our scope.

1.8 Organization of the Project
The documentation is organized into chapters. The first chapter is the introduction, which
contains the background of the study, statement of the problem, objectives of the study, proposed
system, feasibility study, methodology and data gathering tools, and scope and limitations of the
project. The second chapter covers the literature review, which includes related works in the
area, the history of sign language representation, the phonological and morphological structure of
sign language, Text to Speech at a glance, and the phonological and morphological structure of
Amharic. The third chapter covers the system features, which include the functional requirements,
non-functional requirements, use case diagram, use case descriptions, DFD diagram, and sequence
diagrams. The fourth chapter covers the system design, which includes the deployment diagram,
class diagram, activity diagrams, proposed software structure, and test design. The fifth chapter
covers the implementation of the project, including detection and recognition as well as the
implementation of text to audio. The sixth chapter contains the test results, including the unit
test and system function test results. The final chapter covers the conclusion and
recommendations, and the user manual is included at the end.

Chapter 2. Review of Related Literature
2.1. Related works on the Area
Approximately one or two out of every 1,000 to 2,000 babies are born hearing or speech impaired
[12]. This shows that their number is not small and that they need recognition in day-to-day life.
Even where they are recognized, communication between a hearing and a deaf person is not easy, as
it requires the involvement of a third party. As a result, hearing-impaired people are pressured
to adopt written language as a second language, whether they wish to or not. The area is therefore
an active area of research, and many projects have been done, and are being done, to bridge the
gap between the hearing and the hearing-impaired community. Some of them are discussed below.

The iCommunicator is a technology made by PPR Direct, a member of the Microsoft Assistive
Technology Vendor program. It converts speech to text, speech/text to video sign language, and
speech/text to a computer-generated voice, all in real time. It provides end users with many
opportunities in areas such as education, business, and everyday activities. It is equipped with
strong speech recognition (Dragon NaturallySpeaking v9.0) and is also compatible with Microsoft
accessibility options [15]. Microsoft claims 110 million devices now run Windows 10, excluding the
previous operating systems (Windows XP, 7, 8, and 8.1), so iCommunicator's compatibility with
Microsoft platforms contributes to its popularity [15].

As [16] explains, two major kinds of systems have been developed for sign language recognition:
glove based and vision based. Glove-based recognition systems embed microcontrollers and sensors
into gloves, so recognition does not depend on illumination changes. Lee et al., as cited in [16],
proposed a glove sensor mechanism to learn Korean finger spelling using k-means. Halawani, as
cited in [16], developed an Arabic Sign Language translation system for mobile devices. Regarding
the vision-based approach, Hamad et al., as cited in [16], introduced a hand-shape estimation
approach using two cameras; in addition, a Hidden Markov Model (HMM) and dynamic programming were
used to recognize ASL words. Along the same lines, [17] proposed converting sign language to voice
based on feature extraction and HMMs applied to grayscale images. Moreover, Mohandes, as cited in
[17], proposed a prototype system to recognize Arabic Sign Language based on Support Vector
Machines (SVM), as well as Arabic text to Arabic Sign Language translation.

ASL to text and sound based on image processing and machine learning is another approach used by
researchers; it is one branch of vision-based recognition. Paper [19] proposed a system which
converts ASL to text and sound using image processing, machine learning, and Morse code.

The machine translation of Amharic text to ESL developed by [1] provided a springboard for
Ethiopian research on sign languages. That system is the reverse of the system this project team
proposes to build.

An Arabic Sign Language to text and sound system was proposed by [9] using an image/video
processing approach, in which video is converted into individual frames that are matched against
frames already in the database. [9] used an approach where two relational databases (a gesture and
description database, and a conjugate sound database) are built; video processing extracts the key
frames, the key words are extracted, and finally the sentence is displayed while the audio plays.

2.2. The History of Sign Language Representation


Over time, researchers have tried to represent sign languages in different ways so that a sign
language can be written down and used like any other oral language. Some of the notation systems
are discussed below.

Hamburg Notation System (HamNoSys): a phonetic transcription system for sign languages, analogous
to the International Phonetic Alphabet (IPA) for oral languages [20]. Paper [20] explains in
detail that HamNoSys is an alphabetic system describing signs at a mostly phonetic level. It
introduced an alphabetic system to describe the sub-lexical parameters of location, hand
configuration, and movement, giving a phonological description of sign language. The system is
designed to be suitable for any sign language.

Sign Writing is another system, developed by Valerie Sutton for the purpose of annotating
movement. It was developed for communication purposes rather than linguistic purposes and is a
pictorial notation system that can describe non-manual features [19]. It too is designed to be
appropriate for any sign language.

Gloss is another notation system that represents a sign using a word in the target language; it
requires a translation system to take a dictionary-based approach.

Most of the notation systems try to represent a specific sign language uniquely in terms of a
phonetic transcription, and most are unable to capture emotions and some facial movements. The
approach we use instead is an image processing approach, in which video is fragmented into frames
that are compared against images of signs saved in a database, and each of those saved sign images
is mapped to a description, i.e., text.

2.3. Phonological Structure of Sign Language


Phonology refers to the study of the physical sounds present in speech. The concept applies not
only to spoken language but also to sign language, for which a phonology can likewise be defined.
The following parameters are taken into account when discussing the phonology of any sign
language, and specifically of ESL in our case:

• Configuration: the hand shape when making the sign.
• Orientation of the hand: where the palm is pointing.
• Position: where the sign is performed.
• Motion: the movement of the hand when making the sign; it can be straight, swaying, or circular.
• Contact point: which part of the hand touches the body.
• Plane: the distance of the sign from the body.
• Non-manual components: information provided by the body.

2.4. Morphological Structure of Sign Language


The morphology of spoken language concerns both inflectional and derivational morphology:
inflectional morphology refers to the modification of words, whereas derivational morphology
refers to forming new words from existing ones [26]. Sign languages such as ESL have only
derivational morphology, because there is no inflection for tense, number, or person. The most
important parameters when dealing with the morphology of sign languages, and specifically ESL in
our case, are as follows:

• Degree: mouthing is used.
• Reduplication: repeating the same sign several times.
• Compounds: the fusion of two words.
• Verbal aspect: this also involves reduplication.
• Verbal number: to express plural and singular; again, reduplication is used.

2.5. Text-to-Speech (TTS) Systems at Glance


Text-to-Speech (TTS) systems are speech synthesis systems that convert text to spoken voice
output. According to IBM [23], TTS is the generation of speech from text. TTS systems were first
built to aid the visually impaired by offering a computer-generated spoken voice that would read
text to the user [24]. Many TTS systems have been developed around the world; the most widely
implemented ones include English, German, Spanish, and Chinese TTS. Some of them are reviewed
below.

MaryTTS: an open-source, multilingual text-to-speech synthesis platform developed in Java. It
supports a number of languages, such as German, British and American English, French, Italian,
Swedish, Russian, Turkish, and Telugu, and it is designed so that support for new languages can be
added [25].

DSpeech: a text-to-speech system that also offers Automatic Speech Recognition (ASR) functionality
[27]. According to [27], the system allows saving the output as a WAV or MP3 file and also
supports local recognition. One of its drawbacks is that it works mainly with installed voices,
and there is no easy way to add support for another language.

eSpeak TTS: a text-to-speech system for Android supporting 75 languages and accents. It is based
on the eyes-free version, supports accents and special characters, and offers improved handling of
speech rate and pitch. It is also based on the MIPS service and has improved support for the
Speech Synthesis Markup Language (SSML) [14]. Its support for many languages makes it usable for
projects undertaken on those specific languages.

2.6. Phonological Structure of Amharic


To understand the text-to-speech process, one needs to understand how the phonology of the
specific language works. Phonology is the aspect of language that deals with the rules for the
structure and sequencing of speech sounds, and every language has a wide variety of speech sounds
(phonemes) [21]. It is also the study of how sounds are organized and used in natural languages,
including an inventory of sounds, their features, and the rules which specify how sounds interact
with each other [22]. Amharic has its own phonological structure. As explained in [1], Amharic has
its own phonetic, phonological, and morphological properties, with its own inventory of speech
sounds, including speech sounds not found in other languages; for instance, {ጵ፡ ሽ፡ ቅ} are among
the speech sounds missing in other languages. Amharic also has a consonant and vowel inventory
that differs from other languages: [1] explains that it has 30 consonants, of which 27 are simple
and 3 complex, and seven vowels. Viewed phonologically, Amharic is not as ambiguous as languages
like English: in English, looking at the letters is not enough to pronounce the word they form,
whereas in Amharic anyone who knows the language can read a word simply by looking at its sequence
of characters.

2.7. Morphological Structure of Amharic Language


Morphology is the identification, analysis, and description of the structure of a given language's
morphemes and other linguistic units such as root words, parts of speech, intonation and stress,
and implied context [26]. Understanding the morphology helps us understand Amharic text, because
the proposed system has both a text processing part and a speech processing part. Baye, as cited
in [1], explains that Amharic is a morphologically rich language, but that despite its complex
morphological structure it is well understood. The unit from which the language is built is the
morpheme. According to Baye, as cited in [1], there are two types of morphemes: free morphemes,
which carry independent meaning, and bound morphemes, which depend on free morphemes for their
meaning. These morphological aspects are taken into consideration when processing the text.

Chapter 3. System Features
3.1. Functional Requirements
Functions are software capabilities which satisfy users' explicit or implicit requirements when
specified conditions are met. In other words, they are statements of the services that the system
should provide, of how the system should react to particular inputs, and of how it should behave
in particular situations. They are requirements which capture the intended behavior of the system,
and that behavior can be expressed as services, tasks, or functions the system is required to
perform.

The main functionalities of the system are as follows:

1. Load the video into the system from the path where it is stored.
2. Record video to be used by the system.
3. Detect whether the video is valid.
4. Show the loaded or recorded video's description, such as its size, type, duration, frame rate
per second (fps), and path.
5. Segment the video into general frames and key frames: general frames are all the frames
segmented from the video, while key frames are the representative frames selected from the input
video.
6. Save the general and key frames to their respective directories.
7. Show the segmented video's description, such as the size, the number of general frames, and the
number of key frames.
8. Region of interest extraction: the needed region is extracted.
9. Feature extraction for pattern construction.
10. Match the obtained patterns against the gesture patterns available in the database (a matching
sketch follows this list).
11. Detect any unrecognized gesture, i.e., one for which matching was not successful.
12. Prompt for a description of the unrecognized gesture: the text description along with the
sound.
13. Save or add the unrecognized gesture to the database.
14. Display the text for the matched patterns.
15. Feed the output text as input to the TTS engine.
16. The TTS engine processes the text.
17. Detect whether the text has no corresponding sound.
18. Prompt for the sound of the unrecognized text and save it.
19. Play the sound for the respective text.
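
As referenced in item 10, the following is a hedged sketch of feature extraction and matching
using SURF (the algorithm named in the abstract) with k-nearest-neighbour matching and a
Lowe-style ratio test. It assumes EmguCV 3.x with the XFeatures2D (contrib) module; the Hessian
threshold of 400 and the 0.75 ratio are common defaults, not values from the project.

    using Emgu.CV;
    using Emgu.CV.Features2D;
    using Emgu.CV.Util;
    using Emgu.CV.XFeatures2D;

    class GestureMatcher
    {
        // Counts good SURF matches between a query frame and a stored gesture.
        static int CountGoodMatches(Mat queryFrame, Mat storedGesture)
        {
            var surf = new SURF(400); // Hessian threshold (assumed value)
            var queryKeys = new VectorOfKeyPoint();
            var storedKeys = new VectorOfKeyPoint();
            var queryDesc = new Mat();
            var storedDesc = new Mat();
            surf.DetectAndCompute(queryFrame, null, queryKeys, queryDesc, false);
            surf.DetectAndCompute(storedGesture, null, storedKeys, storedDesc, false);

            // k-nearest-neighbour matching (k = 2) followed by a ratio test.
            var matcher = new BFMatcher(DistanceType.L2);
            matcher.Add(storedDesc);
            var matches = new VectorOfVectorOfDMatch();
            matcher.KnnMatch(queryDesc, matches, 2, null);

            int good = 0;
            for (int i = 0; i < matches.Size; i++)
            {
                var pair = matches[i];
                if (pair.Size == 2 && pair[0].Distance < 0.75 * pair[1].Distance)
                    good++;
            }
            return good;
        }
    }

A query frame would be compared against each stored gesture, and the gesture with the most good
matches above some minimum would be taken as the recognized sign.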

3.1.1. User Requirements


The project team distributed a questionnaire to 10 candidates to collect the user requirements,
and from the replies the team identified a number of them. The team also tried to arrange
interviews, but was unable to because no translator was available. The replies to the distributed
questionnaire are tabulated below and are also drawn in a bar graph.

Table 1. Responses to the questionnaire distributed.

Q.No | Assumed questions that can cover the whole user requirements | Yes | No | Don't know
1 | Do you agree if the system has a live video feed? | 9 | 0 | 1
2 | Do you agree if the system has synchronous text and speech output? | 6 | 2 | 2
3 | Is there a purpose to save the recorded video? | 5 | 2 | 3
4 | Is there a purpose to check the live feed video? | 7 | 0 | 3
5 | The system should recover even if an error occurs. | 8 | 0 | 2
6 | Do you think that performance matters? | 9 | 0 | 1
7 | The system should be secured. Do you agree? | 7 | 0 | 3
8 | Do you think that the system should be scalable? | 4 | 3 | 3

[Bar graph: responses to the eight questions, grouped by Yes / No / Unsure.]

Figure 2. The bar graph depicting the user requirement responses.

As shown in the graph, the candidates more or less agreed with the proposed system. Based on their
replies, the project team proposed the following requirements. Although the list is based on the
replies, it also includes additional requirements proposed and selected by the project team. These
are:

1. Load Video: load a saved video from any storage location; the user searches for the video and
opens it via a dialog box. The video's validity will be checked.
2. Capture live video: feed live or recorded video to the system; it is an input mechanism using
the camera, with unit functions capture, pause, and cancel. When the record button is pressed, the
system starts the camera module, begins capturing video, saves it to a specific directory, and
shows it in the window.
3. Show video info: show the description of the loaded or recorded video, i.e., the video type,
frame rate per second (fps), duration, and size.
4. Process Video: after the video's information is shown, processing continues with video
segmentation. This functionality runs once the Start Process button is pressed.
The steps in processing video are explained as follows:

• Segment video: the loaded video is segmented into frames, and the key frames are saved to a
directory.
• Extract region of interest: the region of interest is extracted from each key frame and then
cropped.
• Resize frame: after cropping, the frame is resized to an appropriate pixel size.
• Feature extraction: features of the resized frames are extracted to obtain patterns.
• Pattern matching: the obtained feature pattern is matched against the patterns saved in the
database.
• Handle unmatched pattern: if an unmatched pattern is found, it is saved to the database.
5. Display the Text: checks that the gesture is matched properly, then displays the text clearly.
6. Process the Text: the displayed text becomes the input to the TTS engine and goes through a
natural language processing stage. The steps in processing text are as follows:
• Text input: the displayed text is fed into the text processing stage.
• Text analysis: the input text goes to the analysis stage.
• Text matching: the text is matched with text in the database that is saved with its respective
sound.
• Handle unmatched text: if an unmatched text is found, it is saved to the database.
7. Play the sound: checks that the matched text is correct, then outputs the sound of the text;
playback can be muted (a playback sketch follows this list).
8. Send Feedback: sends the unrecognized texts and gestures to an admin email.
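
A minimal sketch of requirement 7 follows. The project's audio module itself uses a concatenative
approach in Matlab; this C# analogue conveys the same idea by playing one pre-recorded .wav per
matched text unit in sequence, reusing the hypothetical dictionary sketched in Section 1.6.2.

    using System.Collections.Generic;
    using System.Media;

    class AudioPlayback
    {
        public bool Muted { get; set; }

        // Play the stored .wav for each matched text unit in order;
        // back-to-back playback of recorded units is the essence of
        // concatenative synthesis.
        public void Speak(IEnumerable<string> wavPaths)
        {
            if (Muted) return;
            foreach (var path in wavPaths)
            {
                using (var player = new SoundPlayer(path))
                    player.PlaySync(); // block until this unit finishes
            }
        }
    }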

3.1.2. Detail Functional Requirements


Each of the use case functions is described in the tables below. Each requirement is identified by
an ID of the form FR-S-N, where FR-S stands for Functional Requirement of the System and N is a
sequence number; for instance, the first load video requirement has the ID FR-S-1. Each functional
requirement in the use cases is described by its requirement ID, title, description, priority,
and, where applicable, dependency.

Table 2. List of Functional Requirements


1. Load Video:
Requirement ID FR-S-1
Title Load
Description The system allows the user to
load video from some local
directory
Priority High
Requirement ID FR-S-2
Title Pause
Description The system allows the user to
pause the video at any time
Priority Medium
Requirement ID FR-S-3
Title Stop
Description The system allows the user to clear the currently loaded video
Priority Medium

2. Capture Live Video


Requirement ID FR-S-4
Title Record
Description The system allows the user to capture the
video using the camera
Priority High
Requirement ID FR-S-5
Title Stop
Description The system allows the user to stop the
recording
Priority Medium
Requirement ID FR-S-6
Title Save
Description The system allows the user to save the captured video to a local directory

Priority High
3. Show Video Info:
Requirement ID FR-S-7
Title Show
Description The system shows the user the details of the video
Priority Low
Dependency Capture Video, Load Video

4. Process Video:
Requirement ID FR-S-8
Title Segment Video
Description The system takes loaded or captured video
and segment it to key frames.
Priority High
Dependency Show Video Info.
Requirement ID FR-S-9
Title ROI Extract
Description The system takes the key frames and
extracts the region of interest from each
key frames and then crops it.
Priority High
Dependency FR-S-8
Requirement ID FR-S-10
Title Cropper
Description The system takes the region of interest
extracted and crops it.
Priority High
Dependency FR-S-9
Requirement ID FR-S-11
Title Frame resize
Description The system takes the cropped frame and
resize it to some optimum size.
Priority High
Dependency FR-S-10

Requirement ID FR-S-12
Title Feature Extract
Description The system will take resized frame and
extract the features on it to generate
patterns
Priority High
Dependency FR-S-11
Requirement ID FR-S-13
Title Search
Description The system will take the patterns generated
by feature extractor and query the database.
Priority High
Dependency FR-S-12
Requirement ID FR-S-14
Title Match
Description The system will match the pattern
generated and the patterns queried.
Priority High
Dependency FR-S-12
Requirement ID FR-S-15
Title Save pattern
Description The system saves the unmatched pattern to
database.
Priority High
Dependency FR-S-13

5. Text Display:
Requirement ID FR-S-16
Title Display text
Description The system shows the text matched to its
respective pattern.
Priority High
Dependency FR-S-12,FR-S-13,FR-S-14
Requirement ID FR-S-17
Title Copy text
Description The system allows the user to save the text
to local storage.
Priority Low
Dependency FR-S-16

6. Text Processor:
Requirement ID FR-S-18
Title Text feed
Description The system takes text from the display or
from the matched query and readies it
for text processing.
Priority High
Dependency FR-S-16
Requirement ID FR-S-19
Title Text Analyze
Description The system analyzes the prepared text
and readies it for the matcher.
Priority High
Dependency FR-S-18
Requirement ID FR-S-20
Title Text Search
Description The system takes the processed text and
queries the database.
Priority High
Dependency FR-S-19
Requirement ID FR-S-21
Title Text Match
Description The system matches the processed text
with the queried text.
Priority High
Dependency FR-S-20
Requirement ID FR-S-22
Title Save text
Description The system saves the unrecognized text to
database.
Priority High
Dependency FR-S-20

7. Audio Player:
Requirement ID FR-S-23
Title Play Audio
Description The system allows the user to play the
sound for the matched text.
Priority Medium
Dependency FR-S-21
Requirement ID FR-S-24
Title Mute Audio
Description The system allows the user to mute the
sound so it does not play.
Priority Medium
Requirement ID FR-S-25
Title Save Audio
Description The system allows the user to save audio to
local storage.
Priority Low
Dependency FR-S-23, FR-S-21

8. Feedback Sender:


Requirement ID FR-S-26
Title Check Connection
Description The system checks whether a network connection exists
Priority Medium
Dependency Hardware network Connection
Requirement ID FR-S-27
Title Search
Description The system will search for the file in
which the unrecognized patterns are
saved.
Priority High
Dependency FR-S-26
Requirement ID FR-S-28
Title Send
Description The system will send the queried results to
the respective email.
Priority High
Dependency FR-S-26, FR-S-27

3.1.3 Use Case Model
The diagram below represents the activities of actor(s) when interacting with each function of the
system.

Use Case Diagram

The following diagram depicts the complete use case diagram of the system. The use cases
connected to an actor are those the actor accesses directly.

Figure 3. System wide use case of the system.

24
Use Case Description

Table 3. List of Use case Description

Use Case Number UC_01


Use case Title Load property
Priority High
Actor Any user
Description This property allows the user to load video.
Use Case Number UC_02
Use Case Title Capture Property
Priority High
Description This property allows the user to capture or
record video
Actor Any user
Precondition The camera is checked to be working normally.
Use Case Number UC_03
Use Case Title Start Process Property
Priority High
Description This property allows the user to start the system to
process the video.
Actor Any User
Precondition The video information is displayed correctly.
Use Case Number UC_04
Use Case Title Play Audio Property
Priority High
Description This property allows the user to play the audio or
mute
Actor Any User
Use Case Number UC_05
Use Case Title Send Feed Back

Priority Medium
Description The property allows the user to send feedback
to the mail server admin.
Actor Any User, Mail server admin
Precondition Internet connection, network connection
Use Case Number UC_06
Use Case Title Receive Feed Back
Priority High
Description The property allows the web admin to receive the file
sent
Actor Web Server

3.2. Non Functional Requirements


Non-functional requirements depend on the system architecture, implementation strategy,
and operational scenario selected by the authority responsible for the system. Based
on the replies from the respondents, most emphasis falls on performance and fault
tolerance, followed by robustness. Although scalability received only medium coverage, the
project team included it because it is essential. Accordingly, the following non-functional
requirements were identified from the replies given.

1. Performance: the system should have a satisfactory performance. It should respond as


quickly as possible to the user.
2. Fault tolerance: the system should keep working even when errors or faults occur. It needs
to recover as quickly as possible from the error or fault condition.
3. Scalability: the system should be scalable so it can be reused by other researchers working
in this area.
4. Safety requirements: the system shall not harm any of its users.
5. Security requirements: the database is secured so that the user can only save the
unrecognized patterns and text, not alter the patterns already stored for reuse.

3.3 Analysis Models
The analysis models we use are data flow diagrams and sequence diagrams. The
requirements listed in the functional requirement tables are modeled.

3.3.1. Data Flow Diagrams


The following diagram depicts the flow of data through each unit of the system. The system is
decomposed into seven units: the video processing unit, video to text unit, text processing unit,
text to audio unit, send feedback unit, receive feedback unit, and database unit.

Figure 4. DFD Diagram

3.3.2 Sequence Diagram


This model shows the actions in sequence from the user to the system. A sequence diagram is a
kind of interaction diagram that shows how processes operate with one another and in what order.
It depicts the objects and classes involved in the scenario and the sequence of messages exchanged
between the objects needed to carry out the functionality of the scenario.
Show Video Description
This sequence diagram shows how the process of showing the video description proceeds.

Figure 5. Sequence diagram for Load and Show Video Description.

Capture Video and show Description

The following sequence diagram shows the process while the system tries to capture video.

Figure 6. Sequence Diagram for Capture Video and Show Description

View Text

This sequence diagram shows the text output after processing the video and depicts how the
processes interact to show the text. The process starts when the user loads the video and its
description is shown; the user then presses process, and the process video component processes
the video and matches it against the database. If the match is successful, the text is shown.
Otherwise the user is prompted to save the unmatched pattern, and the user saves the pattern to
a specified location.

Figure 7. Sequence diagram for View Text

Play Audio

The user starts this process by loading the video and viewing its description. The user then
requests processing; the system processes the video and returns a successful match. The text is
displayed to the user and, at the same time, the text processor is invoked; the system queries the
database, finds the match, and waits for the user to invoke play. The audio is then played, and the
user is prompted to save it. If the user chooses save, the system saves the audio; if the user
chooses cancel, nothing is saved. If the text match is unsuccessful, the system prompts the user to
save the unmatched text to the directory where unmatched patterns are saved.

Figure 8. Sequence diagram for play audio

Send feedback and Receive feedback

The user invokes the send feedback activity, which locates the files and then invokes the
compressor class. Once the file is successfully compressed, the send feedback activity shows a
message that it is ready to send the file. The user then issues the send request, the file is sent to
the server, and the receiver activity shows a message to the admin as well as sending back a
message confirming that the file has been successfully delivered to the user.

Figure 9. Sequence diagram for Sending feedback and Receiving feedback.

Chapter 4. System Design
System design is the process of defining the architecture, components, modules, interfaces,
and data for the system to satisfy the specified requirements.

4.1 Deployment Diagram


The deployment diagram models the physical deployment of the artifacts. The deployment
diagram for the system is depicted as follows.

Figure 10. Deployment Diagram

4.2 Architectural Design


4.2.1 Class Diagram
It is a type of static structure diagram that describes the structure of the system by showing the
system's classes, their attributes, operations or methods, and the relationships among the classes.

The class diagram shown below is based on the DFD and the use case diagram already shown
above.

Figure 11. Class Diagram

4.2.2 Activity Diagram
Load Video Activity Diagram
This diagram shows the activities performed while loading a video. During load video, the user
opens a dialog box and issues either an open request or a cancel request. If the cancel request is
issued, the activity finishes without opening the video. If the open request is issued and the video
opens successfully, the activity ends; otherwise it returns to the initial state to try again.

Figure 12. Activity Diagram Load Video

Capture Video Activity

This diagram shows the activity of capturing or recording video. The activity initializes and
first checks for a camera. If the camera is working, recording starts. While recording, if a stop
request is issued, recording stops and the video is automatically saved to the previously specified
directory. If no record-again request is issued, the saved video is loaded for the next process
and the activity ends.

Figure 13. Activity Diagram Capture Video

Save Video Activity

The diagram shows how the video is saved. After the save video activity is initialized, it takes
the file name, size, and destination, and then saves the video. If all of this succeeds, the activity
ends; otherwise it returns to the initial state.

Figure 14. Activity Diagram Save Video

Show Video Description Activity

This diagram shows how the video information is displayed. As soon as the activity starts, it
takes the loaded video, reads the parameters location, name, size, format, duration, number of
frames, and frame rate, and shows this information for each loaded video.

Figure 15. Activity Diagram Show video Description.

Process Video Activity

The diagram shows the activity in which video is converted to frames and features are extracted
for matching against patterns saved in the database. It first checks that the loaded video is valid;
if valid, it waits for processing to start, otherwise it returns to the initial state and waits for the
process button to be pressed again. After the features have been extracted, the search activity
looks for patterns resembling those generated during feature extraction. If the search results are
successful, matching is done. If matching succeeds, the activity checks whether any frames remain
unprocessed and returns to the segment action; otherwise it ends. Any patterns that remain
unmatched go to the save pattern activity before the activity ends.

Figure 16. Activity Diagram Process Video

Search Pattern Activity

The diagram shows how a pattern is searched for during the process video activity.

Figure 17. Activity Diagram Search Pattern

Save Pattern Activity

The diagram shows how the save pattern process proceeds. The save step takes the pattern name,
file size, and saving location, and saves the pattern to the specified location.

Figure 18. Activity Diagram Save pattern

Text Display Activity

The diagram shows how the text is displayed after the matching step in process video. This
activity starts every time process video ends. The display checks whether the text loaded
successfully; if so, it displays the wanted text, otherwise it returns to the start and begins again.
Once the text is displayed, the copy to clipboard activity can be activated; copying is done
asynchronously. The activity continues as long as there are texts to be loaded and ends when
they are finished.

Figure 19. Activity Diagram Text Display

Copy to Clipboard Activity

The following activity diagram shows the copy to clipboard activity. It checks whether text is
displayed; if so, the copy succeeds, otherwise it returns to the start or inactive stage.

Figure 20. Activity Diagram Copy to clipboard.

Text Process Activity

The following activity diagram shows the actions in the text process activity. The displayed text
goes to the analysis stage, where its characteristics are studied in depth: whether it is a word or a
letter, what follows what, and so on. After this analysis, the text's equivalent is searched for in
the database and matched against the displayed text; if a match is found, the respective sound is
loaded and the next activity is ready to run. If the text is unmatched, it is saved to the same
location as unmatched patterns. Saving unmatched text can be done asynchronously so it does
not affect the main activity. The activity remains active until text loading is finished, at which
point it comes to an end.

Figure 21. Activity Diagram Text Process

Save Text Activity

The diagram below shows the actions taken when unmatched text is saved.

Figure 22. Activity Diagram save Text.

Play Audio Activity

This activity becomes active when audio is loaded after a successful text match in text
processing. If the audio does not load successfully, the activity becomes inactive and returns to
the initial state. If the audio loads successfully, the activity waits for a play or mute request. On
a play request, it asynchronously asks the user to save the mp3 version of the audio and at the
same time plays the audio. On a mute request, the audio cannot be heard. The activity ends when
no more audio is loading, or returns to the audio load stage if loading is unfinished.

Figure 23. Activity Diagram Play Audio.

Save Audio Activity

The activity diagram below shows the actions taken while saving an audio file. This occurs when
the user requests play audio: the file is played and the user is prompted to save or download the
audio file.

Figure 24. Activity Diagram Save Audio.

Send Feed Back Activity

This activity is invoked when the user requests send feedback. It locates the files, finds the
patterns saved with their respective text, and compresses them. It then makes the file ready for
the send activity and ends.

Figure 25. Activity Diagram Send Feed Back.

Send Activity

This activity is invoked by the send feedback activity and sends the file to the specified admin
email, with the proper address, subject, and description already filled in.

Figure 26. Activity Diagram Send Activity.

Figure 27. Activity Diagram Receive Feed Back.

4.2.3 Proposed Software Structure


The proposed software structure of the system is depicted below, explaining each process and
whether it is acted on by the actors. The interactions themselves are already explained by the use
case diagram, so the architecture focuses on a general overview of what the software looks like
and what it does.

[Figure: block diagram of the proposed architecture. A sign language video is input (by load
video or capture video) and goes through video segmentation; the key frames are processed and
features extracted; pattern construction and discrimination feed matching; the output stage
displays the text and plays the sound.]
Figure 28. The proposed software architecture

Proposed Software Flow chart structure.


The proposed software structure, depicted as a flow chart, is shown below. It gives the overall
view of what the software is proposed to do when implemented. It has six phases: the video
segmentation phase, ROI extraction phase, resizing phase, feature extraction phase, matching
phase, and the output phase (where text and audio are produced).

Figure 29. The proposed system Flowchart Structure

4.3 User Interface Design


The user interface is the means by which people interact with the system. The user interface the
team designed has controls that allow the user to interact with the system.
Some of the controls included in the user interface are as follows.
 Load Video: It allows the user to load video from some location in the storage.
 Capture Video: allows the user to capture video which invokes the camera to record video.
 Pause/play: allows the user to play or pause the video loaded
 Process video: allows the user to start the video to be processed. It is the springboard for
the video processing.
 Play: allows the user to play audio as well as prompting the user to save the audio or not.
 Mute: allows the user to mute the sound for playing audio.
The diagram below shows what the user interface looks like when a user first opens the application.

Figure 30. The overall User interface

User interface for when a user loads or capture video

The following user interface depicts the moment when the user loads or captures video; sound
play is active by default. The loaded video's description is then shown in the video description
space. If the user is capturing video, the video is loaded automatically after recording and its
description is shown in the video description section. At this point the user has not pressed the
process video button.

Figure 31. When Video is loaded/ captured /Sound play Active

User interface when user mutes sound

When the user does not want the sound, he/she may mute it, and mute audio is activated. The
user still has not pressed the process button.

Figure 32. When Mute is activated/ Sound is mute

User interface when video is processed

This user interface depicts the moment when a user presses process video: the message “video
processing….” is shown, and when the message disappears the process is finished.

Figure 33. When Process Video is pressed

User interface when text is viewed

This user interface depicts the text output being shown to the user; as the text is displayed, it is
processed into its audio version and becomes ready to be played.

Figure 34. When Video Processing is finished/ Text Output is shown

4.4 Test Design


Test design is the process whereby the system is tested against a set of tests describing what it is
expected to do. The system we have designed can be tested with the following suite of controls
and actions. The test design contains the test item, the action, the criteria for the test item, and
finally the assessment. The tests can be run after the system is implemented, and the measure is
based on the assessment made, that is, success or fail. If the assessments are all successes, the
implemented system is successful and can be operational; otherwise the system may be impractical.

Table 4: Test design table

Sequence Test item Action Test criteria Assessment
1 Load video Open a video file The video file will be displayed in the Success/fail
video display frame and the metadata
of the video will be displayed in the
video description frame.
2 Capture Capturing live video A live video feed will be displayed on Success/fail
video feed to the video display frame
3 Save Storing captured The video captured must be saved Success/fail
videos onto disk
4 Pause/play Pausing or playing If the video is in a pause state the Success/fail
the video video will be played but if the video
is in a play state the video will be
paused
5 Stop Closing the video The video opened or the live video Success/fail
opened or the live feed will be closed and the display
video feed frame will be empty
6 Process Processing the video A text will be display on to the text Success/fail
video frame by frame display window/frame and the Play
and Mute feature will be activated
7 Play Playing the audio The audio that is recognized during Success/fail
the processing stage will be played
8 Mute Audio silencing The audio that was playing will be Success/fail
silenced
9 Copy text to Copying the text The copied text must be available in Success/fail
clip board the clipboard
10 Send Sending feedback to The feedback will be sent to the Success/fail
feedback the server server and confirmation will pop up
to the user
11 Exit Quitting the whole User must be prompted to save Success/fail
process unsaved videos and the system will
exit

Chapter 5. Implementation of the System



5.1 Implementation
The system is implemented using Visual Studio 2012 (C#), EMGU CV 2.4.10, and Matlab 2014a.
EMGU CV is a .NET wrapper for OpenCV and provides the image processing framework, while
the C#.NET framework is used for the user interface. Matlab, interfaced from C#, is used for the
audio synthesis. The database is a flat file, that is, a text file. Image and video segmentation is
done using thresholding; a small sketch of such thresholding follows.
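As an illustration of the thresholding mentioned above, here is a minimal EMGU CV 2.4 sketch. The Gaussian kernel size (5) and the threshold value (120) are illustrative assumptions, not the project's tuned values.

using Emgu.CV;
using Emgu.CV.Structure;

// Minimal sketch of threshold-based segmentation with EMGU CV 2.4.
public static class Segmenter
{
    public static Image<Gray, byte> SegmentFrame(Image<Bgr, byte> frame)
    {
        Image<Gray, byte> gray = frame.Convert<Gray, byte>(); // drop color
        gray = gray.SmoothGaussian(5);                        // suppress noise
        // Pixels above the threshold become 255 (foreground), the rest 0.
        return gray.ThresholdBinary(new Gray(120), new Gray(255));
    }
}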

5.2 Detection and Recognition


5.2.1 Hand Skin Color Detection
It is used to find skin color in recorded video, camera frames, or still images, using YCrCb skin
detection, the best-known and most widely used color space for skin detection. As explained in
[14], YCrCb is an orthogonal color space that reduces the redundancy found in RGB color
channels; in addition, it represents color in statistically independent components. The luminance
information is stored as a single component (Y), and chrominance information is stored as two
color-difference components: Cb represents the difference between the blue component and a
reference value, whereas Cr represents the difference between the red component and a reference
value. As explained in [14], this color space is a favorable choice for skin detection for two
reasons: transformation simplicity and unambiguous separation of luminance and chrominance.
YCrCb is also called YCC, representing the digital component video standard. We can define
YCrCb as follows:

Y the brightness (Luminance).

Cr Red minus Luminance (R-Y)

Cb Blue minus Luminance (B-Y)

So the skin detector we have implemented works on the principle of the YCC or YCrCb color
space, which can be used to find general skin color. As explained in [28], there are two types of
skin detection mechanisms. These are:

1. Pixel based skin detection


2. Region Based skin detection

Pixel based skin detection: a mechanism in which each pixel is classified as either skin or
non-skin individually, independent of its neighbors.

Region based skin detection: a mechanism that looks at the spatial distribution or arrangement
of pixels based on intensity and texture.

The detection mechanism used in this implementation is color based, so it falls under pixel based
skin detection.

Why is YCC used in place of RGB or HSV?

According to [28], YCC is advantageous over RGB and HSV in that:

 It can be applied to complex color images even under uneven illumination, while HSV
(hue, saturation, and value) is not ideal because its processing is time-consuming and it
works best only on simple images with a uniform background.
 It is applicable and widely used for color based analysis, while RGB is not ideal for color
based analysis due to its non-uniform and device-dependent nature.

Based on the max and min values of YCC, skin and non-skin colors are detected. The detected
skin region is eroded and dilated before its hull points and contour points are determined. The
morphological operations applied are dilation and erosion.

Dilation: a maximizing morphological operation that makes the bright regions of an image grow
by computing the maximal value over the area of the kernel.

Erosion: a minimizing morphological operation that computes the local minimum over the area
of the kernel.

The detected skin is passed to a function that extracts contours and the convex hull. The biggest
contour identifies the skin part, which is then segmented.

When implemented using Emgu CV, the YCrCb color space is used as Ycc: Image<Ycc,
Byte> frameImage = new Image<Ycc, byte>(bitmap). In this sample code, Ycc represents the
YCrCb color space. This color space is used to detect skin between a max value and a min value
of Ycc. The detected skin is eroded and dilated with cvErode and cvDilate to smooth it and make
it clearly visible. On the main image grabber, a rectangular region is drawn whenever skin color
is detected. That region is then used to train the system and, later, for matching. A sketch of this
step follows.
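A minimal sketch of this detection step in EMGU CV 2.4 might look as follows. The min and max Ycc bounds are commonly cited skin-chrominance limits, not the exact values tuned in the project.

using System.Drawing;
using Emgu.CV;
using Emgu.CV.Structure;

public static class SkinDetector
{
    // Returns the bounding rectangle of the biggest skin-colored contour.
    public static Rectangle DetectSkin(Image<Bgr, byte> frame)
    {
        Image<Ycc, byte> ycc = frame.Convert<Ycc, byte>();
        // Keep pixels whose Cr/Cb values fall inside the skin range.
        Image<Gray, byte> skin = ycc.InRange(new Ycc(0, 133, 77),
                                             new Ycc(255, 173, 127));
        skin = skin.Erode(2).Dilate(2); // smooth the mask
        Rectangle best = Rectangle.Empty;
        for (Contour<Point> c = skin.FindContours(); c != null; c = c.HNext)
            if (c.BoundingRectangle.Width * c.BoundingRectangle.Height >
                best.Width * best.Height)
                best = c.BoundingRectangle; // biggest contour wins
        return best; // drawn on the main grabber when non-empty
    }
}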

5.2.2 Hand Recognition


Recognition is the step that measures how well the trained labels are recognized when loaded
from videos or images, or captured from a live camera. For the recognition step, the team used
the SURF algorithm.

How does the SURF algorithm work?

SURF stands for Speeded Up Robust Features. It is a local feature detector and descriptor used
for matching images, based on the descriptors computed from the saved images in the database
and the grabbed image.

It extracts features from the query image and from all the images in the database or collection.

The algorithm is efficient, but naive matching only works for a few images because the search
becomes exhaustive as the number of images grows. The SURF matching used here is therefore
equipped with a fast search, KNN (k-nearest neighbor), which keeps the algorithm efficient.

The steps of the algorithm are as follows (see the sketch after this list):

1. For each image in the database descriptors are computed


2. All the descriptors are combined into one big matrix by concatenating the rows of all the
matrices, recording the row numbers where each image's descriptors start and end.
3. From the concatenated matrix FLANN index is built. FLANN means Fast Library for
Approximate Nearest Neighbor.
4. Compute the descriptors from an input image or queried image.
5. A KNN search is run over the built FLANN index. This search returns the k pairs with
the lowest distances. The results of the KNN search are two matrices: an indices matrix
and a distance matrix. The first tracks where each image's descriptors start and end; the
distance matrix holds the distance of each database image's descriptors from the queried
image's.
6. Inappropriate matches are filtered out.
7. The matched results are then ready for the next step.
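A sketch of these steps with EMGU CV 2.4, following the classic SURF-plus-FLANN pattern. The hessian threshold (500), the number of KD-trees (4), the checks parameter (24), and the ratio constant (0.6) are typical illustrative values, not the project's tuned ones.

using Emgu.CV;
using Emgu.CV.Features2D;
using Emgu.CV.Structure;

public static class SurfMatcher
{
    public static int CountGoodMatches(Image<Gray, byte> modelImage,
                                       Image<Gray, byte> observedImage)
    {
        var surf = new SURFDetector(500, false);

        // Steps 1 and 4: descriptors of the database image and the query image.
        Matrix<float> modelDescriptors = surf.ComputeDescriptorsRaw(
            modelImage, null, surf.DetectKeyPointsRaw(modelImage, null));
        Matrix<float> observedDescriptors = surf.ComputeDescriptorsRaw(
            observedImage, null, surf.DetectKeyPointsRaw(observedImage, null));

        // Steps 2, 3 and 5: build a FLANN index and run a 2-NN search.
        var flann = new Emgu.CV.Flann.Index(modelDescriptors, 4);
        var indices = new Matrix<int>(observedDescriptors.Rows, 2);
        var dists = new Matrix<float>(observedDescriptors.Rows, 2);
        flann.KnnSearch(observedDescriptors, indices, dists, 2, 24);

        // Step 6: ratio test, keep matches clearly closer than the runner-up.
        int good = 0;
        for (int i = 0; i < observedDescriptors.Rows; i++)
            if (dists[i, 0] < 0.6 * dists[i, 1])
                good++;
        return good;
    }
}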

The following figures show how much of the image is matched: if the match percentage is high,
the red rectangular region is fully drawn; if there is little or no matching, nothing is drawn.

Figure 35. The matched area of the hand image captured from video.

Figure 36. A better-matched image for a capture from a video file.

Figure 37. Unmatched image with an image capture from video file.

Matching Inside Recognition

From the filtered indices and distance matrices, the match percentage is calculated to find how
much of the image region is similar. Using this percentage as well as the absolute difference
between each database image and the queried image, matching is done and the corresponding
text of the saved image is output on the display. A sketch of the percentage rule follows.
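A minimal sketch of such a percentage rule, built on the KNN results above; the formula is an assumption based on the description, not the project's exact computation.

public static class MatchScore
{
    // Share of query descriptors that survived the ratio test.
    public static double MatchPercent(int goodMatches, int queryDescriptorCount)
    {
        if (queryDescriptorCount == 0) return 0.0;
        return 100.0 * goodMatches / queryDescriptorCount;
    }
    // The absolute-difference measure mentioned above could use EMGU's AbsDiff,
    // e.g. double diff = a.AbsDiff(b).GetSum().Intensity; (sum of pixel differences)
}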

5.2.3 Training phase of implementation
Using the contour and convex hull algorithms, the detected skin region is extracted and saved to
the database file. While being saved, it is cropped to 100x100 pixels, an optimal size for
comparing images, and the name of the image is recorded in the database file, as sketched below.
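A sketch of this training step, assuming a "label;path" record format for the flat-file database; the file and directory names are illustrative.

using System.IO;
using Emgu.CV;
using Emgu.CV.Structure;

public static class Trainer
{
    public static void TrainSample(Image<Bgr, byte> skinRegion, string label)
    {
        // Crop/resize the detected region to the 100x100 comparison size.
        Image<Bgr, byte> sample = skinRegion.Resize(100, 100,
            Emgu.CV.CvEnum.INTER.CV_INTER_LINEAR);
        string path = Path.Combine("TrainedImages", label + ".jpg");
        sample.Save(path);                                              // the image
        File.AppendAllText("database.txt", label + ";" + path + "\n");  // the record
    }
}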

5.3 Implementation of Text to Audio


This system takes almost all Ethiopian alphabet characters (Fidel) as input and, using
concatenative synthesis, returns a sound for the input words. We designed the system using
Matlab audio manipulation techniques, interfacing the Matlab code from C# (Visual Studio).
The main objective of this part is not a standalone text-to-speech application; rather, it is
integrated with the first part of the project, the conversion of sign language to text. From the C#
textbox we accept the Amharic word and, by string comparison, we convert the Amharic text to
numbers ranging from 1 up to 231. The C# code then concatenates the numbers into a string,
which is the input for the Matlab function. A sketch of this mapping follows.
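A minimal sketch of such a mapping in C#; only a few of the 231 characters are shown, and the exact numbering is an assumption based on the description.

using System.Collections.Generic;
using System.Text;

public static class FidelMap
{
    // Fidel-to-number table; the real one covers all 231 characters.
    private static readonly Dictionary<char, int> Map = new Dictionary<char, int>
    {
        { 'ሀ', 1 }, { 'ሁ', 2 }, { 'ሂ', 3 }, { 'ሃ', 4 },
        { 'ሄ', 5 }, { 'ህ', 6 }, { 'ሆ', 7 }, { 'ለ', 8 } // ... up to 231
    };

    // Builds the number string handed to the Matlab function, e.g. "ሀለ" -> "1 8".
    public static string ToNumberString(string amharic)
    {
        var sb = new StringBuilder();
        foreach (char c in amharic)
        {
            int code;
            if (Map.TryGetValue(c, out code))
                sb.Append(code).Append(' ');
        }
        return sb.ToString().Trim();
    }
}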

How Does the Audio Synthesis Work?

The approach used here is concatenative synthesis, in which the sound of each alphabet character
is concatenated to create a word; a given word is read out from the already saved sounds of the
characters. The special Amharic characters beyond the 231 are not included, being outside our
scope, while English letters are included. The sampling rate of each audio clip is the default,
which is 1200.

The steps used are as follows:

1. The text output is given as input to the voice synthesis class. The class is implemented
in C#, with a library call into Matlab.
2. The input text is matched with its corresponding numbers.
3. The Matlab function is called from C# using the following code:
var matlab = new MLApp.MLApp();
matlab.Execute(path);
A fuller sketch of this call appears after this list.
4. The Matlab function numberToVoice takes the number array and converts it to the
corresponding voices.
5. Silence is removed from each sound, with a threshold of 0.1.
6. After the silence is removed, the sound is enhanced using a speech enhancement function,
which reduces high-pitched signals using a low-pass filter.
7. The sound is returned to the C# environment and played.
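Putting the steps together, a sketch of the C#-to-Matlab hand-off via the MLApp COM interface; the command strings, the working directory, and the reuse of the FidelMap sketch above are illustrative assumptions.

// Assumes the Matlab COM server is registered and the .m files live in C:\Temp\temp.
var matlab = new MLApp.MLApp();
matlab.Execute(@"cd C:\Temp\temp");
string numbers = FidelMap.ToNumberString("ሀለ"); // e.g. "1 8"
// Calls numberToVoice with the number array; the synthesized sound is then played.
matlab.Execute("numberToVoice([" + numbers + "])");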

Chapter 6. Test Results
6.1. Purpose
The purpose of this chapter is to provide a summary of the results of the tests performed, as
outlined within this documentation.

6.2. Unit Functional Testing

This section presents the test results for each unit function of the system; each is tested according
to the test design suite prepared in the previous semester.

Table 5. Unit Function Testing

Test Case ID Test Case Name Tester Pass/Fail Severity of Defect Comments
1 Load Video Team Pass Less The video is loaded successfully.
2 Capture Team Pass Less The camera starts and begins capturing
Video the video.
3 Save Team Pass Less Saves the captured images when the
train button is clicked
4 Train Team Pass Less Trains the system with a video file, static
images, or a live camera feed
5 Play/Pause Team Pass Less It pauses or plays the video running
6 Play Audio Team Pass Some It plays the audio, but there is some delay
defect due while it tries to call the function in Matlab.
to some
delay
7 Mute Audio Team Pass Less It stops the audio sound as it is running.
8 Exit Team Pass Less Quits the whole program.

9 Show Text Team Pass less As the video frame is moving the
respective matched text is shown.
10 Show Text Team Pass Some As the video frames move, the
with sound defects respective matched text is shown and the
respective audio is played.

6.3 System Function Testing


This testing shows the results for the main functions of our system, which are to load a video into
the C# interface and convert it to sound. The project team implemented the application only for
2D images of sign language, i.e. static images, and, as explained in the scope, for only some
selected words. The sound synthesis for Amharic text, Amharic alphabet characters, and English
letters is implemented in Matlab and interfaced with C#.Net.

We also provide a training environment to train the system with more images and videos for
better results, since our application works in a controlled environment. Because we could not find
enough videos and images to train it ourselves, we added this training button so the system can
be trained.

Using some videos, we tested whether our system converts the video to sound and whether it
actually trains on new sets of images and videos, as shown in the table below.

Table 6. System Function Testing

Test Id Description Expected Result Actual Result


1 Pre-Condition: recorded video loaded A text output “አንድ”, ”ሁለት”, Got each output
to the application ”ሶስት”, ”አራት”,“ አምስት” , ”ዜሮ” with some delay.
A sign video containing numbers of 1 and related sound
,2,3,4,5,0
2 Pre-Condition: recorded video loaded A text output “ጆሮ”,”አፍንጫ” Got each output
to the application and related sound with some delay
A sign video containing words ear,
nose
3 Pre-Condition: recorded video loaded A text output “A”,”B”,”C” and Got each output
to the application related sound with some delay
English alphabets sign language of a
character “A”,”B”,”C”, “D”
4 A sign video with a meaning of eye A text output of “አይን” and This failed to
loaded to be trained and given the related sound work, but by
word equivalent “አይን” for the video. using the right
Then check the trained video training video it
can be made to
work.
5 A sign video of Amharic alphabets A text output of ‘ሀ’,’ለ’, ‘መ’, ‘ ቀ’ This works well
for ‘ሀ’,’ለ’, ‘መ’, ‘ ቀ’ then checked the is expected to be displayed with with some delay
video related sound. of sound.

Chapter 7. Conclusion and Recommendation
7.1 Conclusion
The documentation the team developed is for ESL to text and audio using an image processing
approach. The system is one way: it goes from gesture to text and audio. To develop the
documentation, the team gathered requirements through questionnaires, use case scenarios, and
group discussion, and modeled the gathered requirements using analysis models such as DFD
diagrams, sequence diagrams, activity diagrams, and class diagrams. The documentation is also
equipped with a test design used to test the implementation. In the implementation, we used the
EmguCV library, a cross-platform .NET wrapper for OpenCV, for image processing. With it we
trained on different sample videos and recognized gestures from files and a live camera. The
audio processing uses the concatenative approach, covering the whole 231-character alphabet but
excluding the special characters beyond it. The algorithms used are SURF, together with absolute
differences of images, for image recognition and matching. The videos used are short, because
longer videos increase processing time and slow performance. The image database is not huge;
the SURF algorithm is only efficient on a moderate number of images. There is some delay in
calling Matlab for the audio processing of the respective text. We developed the running prototype
as a desktop version only, and the communication is one way.

7.2 Recommendation
Finally, the project team recommends the following:
 For the future, the team recommends that the system be scaled to two directions, so that
both-way communication is possible.
 The system should also be available on smartphones and tablets, since these devices are
widely available in the world and in our country as well.
 The system should work on web based application technologies, and a web API needs to
be developed in the future.
 The system should support all the gestures of ESL.
 TTS needs to be developed for the Amharic dialects of Ethiopia's regions.

 A large trained database of gestures needs to be available, covering all the gestures of ESL.
 Better machine learning algorithms should be applied for training and pattern recognition
at large scale.

Chapter 8. User Manual
8.1. General Information
8.1.1 System Overview
The system is mainly intended to solve the communication gap between deaf and hearing people;
it translates ESL to Amharic text and sound. The system consists of three phases:

1. Video processing/Image Processing


2. Pattern Construction and Discrimination
3. Text and Sound Transformation

8.1.2 Authorization
Using the system in an illegal way that violates the university's rules and regulations, or making
unauthorized copies of data, reports, and documents, is not permitted.

8.1.3 Contact
If you face any difficulty while using the system, or have questions, recommendations, or
challenges, please contact us:
E-mail: eslteam2016project@gmail.com

8.2 Access level


The system is available to everybody. Users are not allowed to copy the code, sell the product,
or make any changes.

8.3 System Activities


The activities the system includes are:

8.3.1 Load Video


This activity helps the user input video by loading it into the system using a dialog box.

Figure 38. Loading the video

8.3.2 Capture Video


If the camera is working, recording starts. While recording, if a stop request is issued,
recording stops.

8.3.3 Video Description


Describes the information of the processed video:
 video size
 format
 duration
 codec of video.

8.3.4 Text Display


Displays the respective Amharic text of the processed video.

Figure 39. Text Display for the corresponding sign of Zero

Figure 40. Audio output

8.3.5 Exit the system


Used to exit (quit) the system after use. The application is disposed after the exit button or menu
is selected.

8.3.6 Training the system
After clicking the train button on the main GUI, another screen appears, from which the system
can be trained using a loaded video, the camera, or static images, although the resulting
performance of the three differs. After training successfully, the user can check whether the
system is trained properly.

8.4. How the System works


The system works with Visual Studio versions no earlier than 2010, together with EmguCV
version 2.4.10. It needs the DLLs Emgu.CV.dll, Emgu.Util.dll, and Emgu.CV.UI.dll; these are
required and must be added as references. EmguCV can be downloaded freely from
http://www.emgu.com/wiki/index.php/Version_History_2.x#Emgu.CV-2.4.10 . It is open source,
a wrapper of OpenCV for C#.Net. After installing the exe or extracting the zip file, its location
must be added to the system path so that the image processing DLLs can be found. Matlab
version 2011 or later is also expected for the application to run. The temp folder needs to be
placed in the C directory, like the following: C:\Temp\temp. The project contains Matlab code,
which is found inside the temp directory.

References
[1] Dagnachew F. Wolde. Machine Translation System for Amharic Text to Ethiopian Sign
Language: Thesis report, Addis Ababa Institute of Technology, 2011.

[2] http://www.africansignlanguages.org/countriespays/#Ethiopia Accessed November 8, 2015.

[3] Information Bulletin of the Faculty of Humanities in Addis Ababa University Vol. 1 Issue 2
January 2012, Addis Ababa.

[4] Eyasu Haile. Sign Language News at Addis Ababa University: case study, Addis Ababa
University, 2008.

[5] https://en.wikipedia.org/wiki/Interview Accessed November 11, 2015.

[6] https://en.wikipedia.org/wiki/Feasibility_study Accessed November 11, 2015.

[7] http://www.gizmag.com/uni-sign-language-translator/34247/ Accessed November 20, 2015.

[8] https://www.youtube.com/watch?v=zVLj5Qm-xq0 Accessed November 17, 2015.

[9] A. E. El-Alfi, A. F. El-Gamal & R. A. El-Adly. Real Time Arabic Sign Language to Arabic
Text and Sound Translation System: Journal Report, Mansoura, Egypt. Retrieved October 30,
2016 from http://www.ijert.org/view-pdf/9906/.

[10] http://www.techrepublic.com/blog/10-things/10-techniques-for-gathering-requirements/
Accessed November 18, 2015.

[11] https://en.wikipedia.org/wiki/Literature_review. Accessed November 20, 2015.

[12] Chaelynne M. Wolak. Gathering Requirements: The Use Case Approach. Retrieved
November 20, 2015 from http://www.bus.iastate.edu/nilakant/MIS538/Readings.

[13] http://deafblindinethiopia.blogspot.com/p/about-us.html. Accessed November 24, 2015.

[14] http://freecode.com/projects/espeak-for-android Accessed November 28, 2015.

[15] http://www.computerworld.com/article/2988167/microsoft-windows/microsoft-
claims110m-devices-now-run-windows-10.html Accessed November 28, 2015.

[16] Shekainah Paulson and Mrs. B. Thilagavathi. An Adaptable Speech-to-Sign Language
Translation System. International Journal of Engineering Research & Technology. Retrieved from
http://www.ijert.org/download/8874/an-adaptable-speech-to-sign-language-translation-system

[17] Abdelmoty M. Ahmed, Reda Abo Alez, Muhammad Taha and Gamal Tharwat. Propose a
New Method for Extracting Hand Using in the Arabic Sign Language Recognition (ArSLR)
System. Retrieved from http://www.ijert.org/download/14297/propose-a-new-method-for-
extracting-hand-using-in-the-arabic-sign-language-recognition-arslr-system

[18] Sarad Dhungel. American Sign Language (ASL) to Text/Voice. Retrieved December 1,
2015 from http://www.mwftr.com/SD1415/ASL2Text.pdf

[19] Jessica Hutchinson. Literature Review: Analysis of Sign Language Notations for Parsing in
Machine Translation of SASL. Thesis Report. Rhodes University South Africa, 2012. Retrieved
December 1, 2015 from http://www.cs.ru.ac.za/research/g09h2318/LiteratureReview.pdf

[20] https://en.wikipedia.org/wiki/Hamburg_Notation_System. Accessed November 28, 2015.

[21] http://www.speechtherapyct.com/whats_new/phonology.pdf. Accessed November 30, 2015.

[22] http://www-01.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsPhonology.htm.
Accessed December 2, 2015.

[23] http://www.research.ibm.com/tts. Accessed December 8, 2015.

[24] http://www.webopedia.com/TERM/T/TTS.htm Accessed December 8, 2015.

[25] http://mary.dfki.de/. Accessed December 8, 2015.

[26] https://en.wikipedia.org/wiki/Morphology_(linguistics). Accessed December 8, 2015.

[27] http://dimio.altervista.org/. Accessed December 9, 2015.

[28] Chelsia Amy Doukim et al. Comparison of Three Colour Spaces in Skin Detection.
Universiti Malaysia Sabah, 88999 Kota Kinabalu, Sabah, Malaysia.

Appendix A. Questionnaire
Dear respondent, you are requested to fill in this questionnaire for the purpose of requirement
gathering for our project. There are a total of 8 questions, and you are expected to tick (X) in the
space provided.

Q.No Assumed Questions that can cover the whole user Yes No Don’t know
requirements

1 Do you agree if the system has live video feed?

2 Do you agree if the system has synchronous text and


speech output?

3 Is there a purpose to save the video recorded?

4 Is there a purpose to check the live feed video?

5 The system should recover even if an error occurs

6 Do you think that performance matters?

7 The system should be secured. Do you agree?

8 Do you think that the system should be scalable?

Appendix B. Project Schedule
The project ends at the end of the semester, and the project schedule is based on the end-of-
semester deadline for the documentation part, the Software Requirement Specification Document
(SRSD). The project schedule for the system to be developed is illustrated as follows.

Task Start Date End date


Project proposal Submission 30/10/2015 23/11/2015
Literature review 17/11/2015 3/12/2015
System Feature(Requirement Gathering) 25/11/2015 3/12/2015

System Design Initiation 3/12/2015 25/12/2015


Full System Design, finalize all 25/12/2015 20/01/2016
documentation
Prototype Design 10/01/2016 20/01/2016
Break Break Break
First phase implementation and testing 01/02/2016 01/03/2016
Second phase implementation and testing 01/03/2016 01/04/2016
Third phase implementation and testing 01/04/2016 01/05/2016
Integration Phase and testing 01/05/2016 15/05/2016
Full system and testing 15/05/2016 25/05/2016
Finalize and submission 25/05/2016 01/06/2016

The project schedule in terms of a Gantt chart is shown as follows. The end dates in both the
first and second semesters might change if the University schedule changes.
