AMERICAN SIGN LANGUAGE DETECTION
BY
Aadarsh B.k.
Sandhya Thapa
Gitanjali Shah
Madan Shahi
JUNE, 2023
ACKNOWLEDGEMENT
First and foremost, we extend our deepest gratitude to our project supervisor
Er. Santosh Bhattarai for his invaluable guidance, expertise, and constant
motivation. His insightful feedback, constructive criticism, and patience have
played a pivotal role in shaping the direction and quality of this project.
We would also like to thank our classmates and fellow students who provided
valuable insights, engaging discussions, and a collaborative environment that
fostered learning and growth. Their willingness to share ideas, exchange feedback,
and offer assistance has been truly invaluable.
Lastly, we would like to thank all the participants who willingly volunteered their
time and expertise to contribute to this project. Their cooperation and willingness
to share their experiences and insights have greatly enriched the findings and
outcomes of this study.
ABSTRACT
This project aims to build a machine learning model able to classify the various
hand gestures used for fingerspelling in sign language. In this user-independent
model, classification algorithms are trained on one set of image data and tested
on a completely different set. Depth images are used for the image dataset; these
gave better results than some of the previous literature [4], owing to the reduced
pre-processing time. Various machine learning algorithms are applied to the
datasets, including a Convolutional Neural Network (CNN). An attempt is made
to increase the accuracy of the CNN model by pre-training it on the ImageNet
dataset. However, only a small dataset was used for pre-training, which gave an
accuracy of 15 percent during training.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 OVERVIEW
1.1 Introduction
1.2 Motivation
1.3 Problem definition
1.4 Objective
1.5 Project Scope and Applications
2 LITERATURE REVIEW
3 REQUIREMENT ANALYSIS
3.1 Hardware Requirements
3.2 Software Requirements
4 METHODOLOGY
4.1 System Block Diagram
5 RESULTS AND ANALYSIS
5.1 Data Set
5.2 CNN
5.3 Creating the Model
5.4 ANN
5.5 Training vs Validation Accuracy
6 FUTURE ENHANCEMENTS
6.1 Future Enhancements
7 CONCLUSION
References
LIST OF FIGURES
Figure 5.2: Convolutional Neural Network
Figure 5.3: Artificial Neural Network
LIST OF ABBREVIATIONS
ANN Artificial Neural Network
ASL American Sign Language
CNN Convolutional Neural Network
CPU Central Processing Unit
PIL Python Imaging Library
RAM Random Access Memory
VM Virtual Machine
CHAPTER 1
OVERVIEW
1.1 Introduction
In recent years, there has been growing interest in developing automated systems
that can detect and interpret sign language gestures. These systems aim to bridge
the communication gap between hearing-impaired individuals and the general
population, enabling more inclusive interactions and accessibility.
In the context of sign language detection, deep learning techniques offer promising
potential. By training neural networks on large datasets of sign language gestures,
these models can learn to recognize and classify different hand and body
movements accurately. The availability of annotated sign language datasets, such as
videos or images of signers performing various gestures, has further facilitated the
development of deep learning-based sign language detection systems.
1.2 Motivation
The motivation behind our project proposal lies in the transformative impact
that sign language detection can have on the lives of individuals with hearing
impairments. By developing an advanced and efficient system that can accurately
interpret sign language gestures, we aim to empower the deaf and hard-of-hearing
community by providing them with a means to communicate effortlessly and
naturally with the broader society. This technology holds immense potential
in numerous domains, such as education, healthcare, public services, and social
interactions.
1.4 Objective
iii. Adaptability to Different Sign Languages: Create a system that can adapt
and recognize different sign languages, considering the variations and differences
in signs across regions and cultures.
1.5 Project Scope and Applications
This project aims to develop a robust sign language detection system capable of
accurately recognizing and interpreting a wide range of sign language gestures. The
scope includes designing efficient algorithms for gesture recognition, optimizing
real-time processing, accommodating variations across different sign languages, and
creating a user-friendly interface. The system’s primary goal is to enhance
communication accessibility and inclusivity for the deaf and hard-of-hearing community
in various domains and applications.
CHAPTER 2
LITERATURE REVIEW
and artistic forms such as sign poetry and sign dance. The literature discusses the
importance of recognizing and valuing sign languages as unique and independent
languages, promoting deaf rights and cultural inclusivity. Efforts to document and
preserve sign languages have resulted in sign language dictionaries, corpora, and
linguistic databases, providing valuable resources for linguistic research, language
documentation, and language revitalization initiatives.[3]

In recent years, the literature has witnessed increased attention on multimodal
communication, involving the integration of sign language with other modalities
such as speech, text, and haptic feedback. This interdisciplinary field explores
methods for enabling effective communication between deaf and hearing individuals
through the use of sign-language interpreters, speech-to-sign translation systems,
and sign-language avatars. The literature highlights challenges in achieving
seamless multimodal communication, such as synchronization, context awareness, and
cultural nuances. Researchers have investigated the design and evaluation of
multimodal interfaces and communication technologies to support inclusive
communication and equal access to information and services for deaf individuals.[4]

The goal of this project was to build a neural network able to classify which
letter of the American Sign Language (ASL) alphabet is being signed, given an
image of a signing hand. This project is a first step towards building a possible
sign language translator, which could take communications in sign language and
translate them into written and oral language. Such a translator would greatly
lower the barrier for many deaf and mute individuals to communicate better with
others in day-to-day interactions. This goal is further motivated by the isolation
felt within the deaf community: loneliness and depression exist at higher rates
among the deaf population, especially when they are immersed in a hearing world.[5]

Sign language translation is a promising application for vision-based gesture
recognition methods, in which highly structured combinations of static and
dynamic gestures correlate to a given lexicon. Machine learning techniques can be
used to create interactive educational tools or to help a hearing-impaired person
communicate more effectively with someone who does not know sign language. In
this paper, the development of an online sign language recognizer is described.
The scope of the project is limited to static letters in the American Sign
Language (ASL) alphabet.[6]

In this technologically advanced world, we must utilize the power of artificial
intelligence to solve challenging real-life problems. One of the major issues
that the world is still trying to cope with is establishing an efficient way
for people to communicate. Between 6 and 8 million people in the United States
have some form of language impairment.[7]
CHAPTER 3
REQUIREMENT ANALYSIS
ii. Google Colab's CPU: Google Colab offers a virtual machine (VM) environment
for running Python code and carrying out machine learning operations. These VMs
include CPUs, and you can select the runtime type that best meets your requirements.
iii. Memory: Sufficient RAM is required to store and manipulate video frames,
as well as perform complex computations. At least 8 GB of RAM is recommended,
but more may be necessary depending on the specific requirements of your system.
requiring any software installation.
e. PIL: PIL (Python Imaging Library) is a popular library in Python for image
processing and manipulation. It provides a wide range of functions and methods
for opening, manipulating, and saving various image file formats.
f. Pickle: Pickle is a built-in module in Python that allows you to serialize Python
objects into a binary format and deserialize them back into Python objects. It’s
commonly used for saving and loading complex data structures, such as lists,
dictionaries, and custom objects.
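As an illustration of how these two libraries fit together in this kind of
pipeline, the short Python sketch below loads and resizes a gesture image with
PIL and then serializes the prepared sample with Pickle. The file names and the
180x180 image size are illustrative assumptions, not values taken from the project.

import pickle

import numpy as np
from PIL import Image

# Open a gesture image, resize it, and convert it to a NumPy array.
# "gesture_a.jpg" is a placeholder file name.
image = Image.open("gesture_a.jpg").convert("RGB").resize((180, 180))
sample = {"image": np.asarray(image), "label": "A"}

# Serialize the prepared sample to disk in binary format...
with open("sample.pkl", "wb") as f:
    pickle.dump(sample, f)

# ...and deserialize it back into a Python object later.
with open("sample.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored["label"], restored["image"].shape)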
CHAPTER 4
METHODOLOGY
4.1 System Block Diagram
The following elements are commonly included in the block diagram for American
Sign Language detection:
1. User: Represents the person using or signing in front of the sign language
recognition software.
3. Hand Detection: The goal of the hand detection procedure is to locate and
isolate the hand region within the video frames.
7. Output Mapping: This step maps the identified gestures to their sign language
equivalents or corresponding meanings.
8. Text/Speech Output: For the user or interpreter, the final identified gestures
are output as text or speech.
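To make the flow above concrete, the sketch below shows one possible realization
in Python, using OpenCV for video capture and MediaPipe for the hand-detection
step. The report does not prescribe these libraries, so treat them as
assumptions; the classification and output-mapping steps are only indicated by
comments, since they depend on the trained model.

import cv2
import mediapipe as mp

# 3. Hand detection: MediaPipe Hands locates hands in each frame.
hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)

capture = cv2.VideoCapture(0)  # 1. User: signs in front of the webcam.
while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    # MediaPipe expects RGB input, while OpenCV captures BGR frames.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # 7. Output mapping: a trained classifier would map the detected
        # hand region or landmarks to a letter here.
        # 8. Text/speech output: the letter would then be shown as text
        # or passed to a text-to-speech engine.
        print("hand detected in frame")
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
capture.release()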
CHAPTER 5
RESULTS AND ANALYSIS
5.1 Data Set
We created our own dataset for training and testing. This dataset depicts the
alphabet of American Sign Language in images and videos.
5.2 CNN:
A Convolutional Neural Network is a type of deep learning model commonly used for
image classification, object detection, and other computer vision tasks. CNNs are
designed to automatically learn spatial hierarchies and patterns from image data.
Figure 5.2: Convolutional Neural Network
5.3 Creating the Model:
The model consists of three convolution blocks, each followed by a max-pooling
layer. On top of these is a fully connected layer with 128 units activated by a
ReLU activation function, with num_classes = 2.
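A minimal Keras sketch consistent with this description follows. The 128-unit
dense layer and num_classes = 2 come from the text above; the filter counts
(16, 32, 64) and the 180x180 input size are assumptions made for illustration.

import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 2

# Three convolution blocks, each followed by a max-pooling layer,
# topped by a fully connected layer with 128 ReLU units.
model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(180, 180, 3)),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes),  # logits for the two classes
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.summary()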
5.4 ANN:
Figure 5.3: Artificial Neural Network
5.5 Training vs Validation Accuracy:
To check whether our preprocessing really produced a more robust model, we
evaluated the models on a test set made up of photos from the original dataset
together with our own collected images. The model trained on the filtered images
performs much better: because it does not overfit, it outperforms the model
trained on the original photos. We also reviewed the confusion matrix for the
Filtered Model on the Kaggle test set.
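For reference, this confusion-matrix review can be reproduced with scikit-learn
as in the brief sketch below; model, x_test, and y_test are assumed to come from
the training code and are not defined in the report.

import numpy as np
from sklearn.metrics import confusion_matrix

# The predicted class is the argmax over the model's output logits.
predictions = np.argmax(model.predict(x_test), axis=1)
print(confusion_matrix(y_test, predictions))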
CHAPTER 6
FUTURE ENHANCEMENTS
Due to the size of our initial dataset, using it requires a server with plenty
of RAM and disk space. Potential remedies are to split the file names into
training, validation, and test sets and to load the photos dynamically in the
dataset class, as sketched below. With such a loading strategy, we could train
the model on more samples from the dataset.
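A minimal sketch of such a loading strategy is given below, assuming a tf.data
pipeline: the file names are split up front, and each image is decoded from disk
only when a batch is requested, so the full dataset never has to fit in RAM.
The paths, labels, and image size are placeholders.

import tensorflow as tf

def make_dataset(file_paths, labels, image_size=(180, 180), batch_size=32):
    """Build a dataset that decodes images on demand rather than up front."""
    def load(path, label):
        image = tf.io.read_file(path)
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        return image, label

    ds = tf.data.Dataset.from_tensor_slices((file_paths, labels))
    return ds.map(load, num_parallel_calls=tf.data.AUTOTUNE).batch(batch_size)

# Example: split file names into train/validation/test before loading.
# train_ds = make_dataset(train_paths, train_labels)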
CHAPTER 7
CONCLUSION
In conclusion, the field of American Sign Language recognition has shown promise,
utilizing computer vision and machine learning methods to promote effective
communication between sign language users and non-signers. These systems are
designed to bridge the communication gap and support inclusivity for the deaf
and hard-of-hearing community by recognizing and translating sign language
gestures in real time.
References
[2] F. Zhang and K. Kim, “Id-based blind signature and ring signature from
pairings,” in Advances in Cryptology — ASIACRYPT 2002, Y. Zheng, Ed.
Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 533–547.
[4] A. O. M. Salih, “Audio noise reduction using low pass filters,” Open Access
Library Journal, vol. 4, no. 11, pp. 1–7, 2017.
[5] D. Liu, P. Smaragdis, and M. Kim, “Experiments on deep learning for speech
denoising,” in Fifteenth Annual Conference of the International Speech
Communication Association, 2014.
[6] J. Zhang, J.-g. Yao, and X. Wan, “Towards constructing sports news from live
text commentary,” in Proceedings of the 54th Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 1361–1371.
[7] C. van der Lee, E. Krahmer, and S. Wubben, “PASS: A Dutch data-to-text
system for soccer, targeted towards specific audiences,” in Proceedings of
the 10th International Conference on Natural Language Generation, 2017, pp.
95–104.