
Contents

List of Figures
Abstract
Acknowledgement
Dedication
Introduction
Final Output Layer
Motivation
Problem Statement
Objectives
Project Scope
Literature Review
Functional Requirements
Nonfunctional Requirements
Project Significance
Libraries Used in Our Project
TensorFlow
OpenCV
MediaPipe
Methodology
Dataset Generation
Project Diagram
Use Case Diagram
Activity Diagram
Test Cases
Test Case #1
Code Implementation
Conclusion & Future Work
LIST OF FIGURES
Figure 1.1: Sign Languages Data
Figure 1.2: Use Case
Figure 1.3: Activity Diagram

ABSTRACT
Deaf and hard-of-hearing people, as well as those who cannot speak, mostly use sign language to communicate with others and within their community. In this language, hand gestures take the place of spoken words. The goal of sign language recognition (SLR) is to identify hand gestures and translate them into text or speech. Sign language gestures can be divided into static and dynamic hand gestures. Both types of recognition are valuable to the community, even though static hand gesture recognition is easier than dynamic hand gesture recognition. To recognize hand gestures, we can use deep learning and computer vision, in which a model learns to identify hand gesture images over time. When the model successfully recognizes a gesture, it generates the corresponding English sentence. Because such a model makes communication more effective, it becomes simpler to communicate with deaf and hard-of-hearing persons. The use of deep learning for sign language recognition will be covered in this study.

ACKNOWLEDGEMENT

In the name of Allah, the most Gracious and the Most Merciful.
Peace and blessings of Allah be upon Prophet Muhammad ﷺ.
First, praise be to Allah for giving us this opportunity and the strength and patience to finally complete our FYP after many challenges and difficulties. We would like to thank our supervisor "Sir Mazahir Hussain" for his guidance, motivation, and, most of all, his significant contribution to this project, and the experts "Sir Ayub Latif", "Sir Affan Ahmed", "Sir Farhan", "Sir Azam", and "Sir Affan Alim" for giving us the opportunity to work on this project. We would also like to thank our parents for their financial and moral support, and our friends who have helped and motivated us throughout. May Allah reward them all abundantly. Ameen.
DEDICATION
This report is dedicated to PAF-KIET University, our Teachers, our Supervisor, our Parents, our fellow colleagues, and the hard-working students of PAF-KIET, with the hope that they will succeed in every aspect of their academic careers and that this project may help them in some aspect of their lives.

INTRODUCTION
Sign language is a visual means of communication that conveys meaning through facial expressions, hand gestures, and body movements. For those who have trouble hearing or speaking, sign language is extremely helpful. Sign language recognition refers to the translation of these gestures into the words or alphabet of an existing, officially spoken language. Translating sign language into words using an algorithm or model can therefore help close the communication gap between people with hearing or speech disabilities and the rest of society.

Computer vision and machine learning researchers are now conducting intensive research on image-based hand gesture identification. Many researchers work in this area with the aim of making human-computer interaction (HCI) simpler and more natural without the use of additional devices. The main objective of research on gesture recognition is therefore to develop systems that can recognize specific human gestures and use them, for instance, to convey information.

For this to work, vision-based hand gesture interfaces need fast and highly reliable hand detection as well as real-time gesture recognition. Hand gestures are a powerful human communication tool with many possible uses, and in this context sign language is the deaf community's primary means of communication. Hand gesture detection for human-computer interaction is an active field of research in computer vision and machine learning, and one of its main objectives is to build systems that can recognize particular gestures and use them to transmit information or control devices. Note that hand postures are the static configurations of the hand, whereas gestures are its dynamic movements, and gestures need to be described in both the spatial and temporal domains.

The most natural means of communication for deaf individuals is sign language, although it has been noted that they have trouble interacting with hearing people. Just as spoken languages have a vocabulary of words, sign languages have a vocabulary of signs. The grammar of sign languages varies from country to country and is not standardized or universal.
FINAL OUTPUT LAYER

We connect the values obtained from the fully connected layer to a final layer of neurons, whose count equals the total number of classes, in order to estimate the likelihood that each image belongs to a particular class.
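As a rough sketch (not our exact network definition), such a final layer could be added in Keras as shown below; the layer sizes, the input shape, and the hypothetical num_classes value are assumptions for illustration only.

import tensorflow as tf

num_classes = 26  # assumption: one class per static ASL letter

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),          # assumed input image size
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),      # fully connected layer
    # Final output layer: one neuron per class; softmax yields class probabilities
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])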

MOTIVATION

Our project on sign language detection and conversion into speech aims to address the communication
gap between the deaf community and those who do not understand sign language. By developing an
accurate and efficient system, we can empower individuals who rely on sign language by converting
their gestures into spoken language. This project strives to promote inclusivity, break down language
barriers, leverage advancements in AI and computer vision, and contribute to educational and research
initiatives. Join us in creating a world where effective communication is accessible to all, regardless of
their preferred mode of expression.

PROBLEM STATEMENT
Sign language uses many gestures, so it appears to be a language of movement made up of a series of hand and arm motions. Different countries have different sign languages and hand gestures. It should also be noted that unfamiliar words can be conveyed simply by signing each letter, since sign language has a particular gesture for every letter of the English alphabet. These signs can be divided into two categories: static gestures and dynamic gestures. Static gestures represent the alphabet, whereas dynamic gestures express specific concepts, words, sentences, and so on. Moreover, even identical signs look very different when performed by different signers or viewed from different vantage points. This project focuses on using object detection and a processing pipeline to create a static sign language translator. We developed a lightweight network that can be used on low-resource embedded devices, in standalone programs, and in web applications.
OBJECTIVES
The Sign Language Recognition Prototype is a real-time, vision-based system intended to recognise the American Sign Language alphabet presented in Figure 1.1. The prototype's goals were to evaluate the viability of a vision-based system for sign language recognition and, concurrently, to evaluate and choose hand features that could be applied in any real-time sign language recognition system via machine learning algorithms.

The adopted approach uses only one camera and is predicated on the following assumptions:

1. The user must be in front of the camera, inside a specified boundary region.
2. Due to camera restrictions, the user must be within a specific distance range.
3. The hand stance is unobscured by any other items and is defined by the bare hand.
4. Since the chosen camera does not perform well in sunlight, the system must be utilized indoors.

PROJECT SCOPE
The scope of our project is to provide a practical solution that helps deaf and mute people communicate with those who do not understand sign language. According to a 2019 survey, there are about 900,000 people in Pakistan who cannot speak or hear; for them, this software will make it easier to communicate with everyone. The detected gestures are converted into alphabet letters, and the letters are then combined into sentences.

The scope of the Sign Language Detection Project is to develop a real-time sign language recognition
system that can accurately recognize and translate sign language into written text or speech. The system
will use computer vision techniques to identify and track hand gestures, and it will be tested on a large
dataset of sign language gestures. The project will also include a user interface to allow users to interact
with the system and receive real-time translations.

LITERATURE REVIEW
Sign language recognition has been a field of active research for several decades. Many studies have
been conducted on sign language recognition using computer vision techniques, such as hand gesture
recognition, deep learning, and convolutional neural networks. There is a growing body of literature on
the topic, and researchers have made significant progress in the development of sign language
recognition systems. However, there is still room for improvement, particularly in terms of the accuracy
and speed of recognition.

Most of the research in this area has been conducted with the use of glove-based devices. A glove-based system has sensors on each finger, including potentiometers, accelerometers, and others, and the corresponding letter is displayed according to the sensor readings. One glove-based gesture recognition system that could learn new gestures and update the model of each gesture online was able to distinguish 14 letters of the hand alphabet. Over time, sophisticated glove technologies such as the Sayre Glove, Dexterous Hand Master, and Power Glove have been developed. The fundamental issue with the glove-based approach is that the image processing unit must be recalibrated for each new user before the fingertips can be recognized. Our project instead uses live detection techniques. Its key benefit is that it is not limited to a black background: it can be used with any background, and our method does not require the user to wear colored bands.

FUNCTIONAL REQUIREMENTS
• Real-time Sign Language Detection:

The system should accurately detect and track hand movements in real time from video input.

• Gesture Recognition:

The system should recognize a wide range of sign language gestures and symbols, covering various sign languages.

• Gesture-to-Text Conversion:

The system should convert detected sign language gestures into corresponding textual representations.

• Speech Synthesis:

The system should generate clear and natural speech output based on the converted text.

• Multi-Platform Compatibility:

The system should be compatible with different platforms, such as desktops, mobile devices, and embedded systems.

• Robustness and Error Handling:

The system should handle noise, occlusions, and variations in lighting conditions, ensuring reliable performance.

• User Interface:

The system should provide a user-friendly interface for easy interaction and control.

NONFUNCTIONAL REQUIREMENTS
• Accuracy and Reliability:

The system should have a high accuracy rate in detecting and recognizing sign language gestures, ensuring reliable conversions and speech synthesis.

• Real-time Performance:

The system should operate in real time, with minimal latency, to provide seamless and natural communication.

• Scalability:

The system should be scalable to handle a large number of users and support concurrent processing.

• Accessibility:

The system should adhere to accessibility standards, such as providing alternative output options for individuals with visual or hearing impairments.

• Privacy and Security:

The system should prioritize the privacy and security of user data, ensuring that personal information is protected during the process.

• Adaptability and Customizability:

The system should be adaptable to different users' needs and preferences, allowing for customization of gestures, language models, and speech synthesis options.

• System Integration:

The system should be easily integrated with other applications, platforms, or assistive technologies to enable seamless communication across various contexts.

• Documentation and Support:

The project should provide comprehensive documentation, including user guides and technical support resources, to facilitate system setup, usage, and troubleshooting.

PROJECT SIGNIFICANCE

The project on sign language detection and speech conversion holds significant importance in promoting
inclusive communication and empowering the deaf community. By accurately interpreting sign language
gestures and converting them into spoken language, the project enables effective communication
between sign language users and those who do not understand sign language. This has wide-ranging
benefits, including improved access to education, increased employment opportunities, enhanced social
integration, and the advancement of technology and research in fields such as artificial intelligence and
computer vision. Ultimately, the project aims to create a more inclusive and equitable society by
breaking down language barriers and fostering understanding and inclusivity.

LIBRARIES USED IN OUR PROJECT


TENSORFLOW
A complete open-source platform for machine learning is called TensorFlow. Researchers can advance
the state-of-the-art in machine learning thanks to its extensive, adaptable ecosystem of tools, libraries,
and community resources, and developers can simply create and deploy apps that are powered by
machine learning.

TensorFlow provides a variety of abstraction levels, so you can pick the one that best suits your requirements. The high-level Keras API makes it simple to get started with TensorFlow and machine learning and to build and train models.

Eager execution allows for fast iteration and intuitive debugging when you require additional flexibility. For large ML training jobs, the Distribution Strategy API can be used to distribute training across different hardware configurations without modifying the model definition.
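As a minimal sketch of the Distribution Strategy API mentioned above (illustrative only, not taken from our project code), a Keras model can be created inside a strategy scope so that training is mirrored across the available GPUs without changing the model definition itself.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs by default
with strategy.scope():
    # The model definition does not change; only where it is constructed does.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(64,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')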

OPENCV
An open-source library of programming functions called OpenCV (Open-Source Computer Vision) is
utilised for real-time computer vision.
It is mostly utilised for image processing, video capture, and feature analysis such as face and object detection. Its primary interface is C++, although bindings are also available for Python, Java, and MATLAB/Octave.
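For example (a generic snippet, not our project code; the file name and target size are assumptions), OpenCV can load an image, convert its colour space, and resize it before it is passed to a detection model.

import cv2

img = cv2.imread('hand.jpg')                        # assumed sample image, loaded as BGR
if img is not None:
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)      # convert BGR to RGB for most ML models
    resized = cv2.resize(rgb, (320, 320))           # assumed detector input size
    cv2.imwrite('hand_preprocessed.jpg', cv2.cvtColor(resized, cv2.COLOR_RGB2BGR))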

MEDIAPIPE
The MediaPipe pipeline for sign language detection and speech conversion is designed to accurately
interpret sign language gestures and convert them into spoken language. It leverages the capabilities of
MediaPipe, an open-source framework for building real-time perceptual computing pipelines, to enable
efficient and robust processing of video input.

The Sign Language Detection and Speech Conversion MediaPipe pipeline provides a comprehensive
solution for bridging the communication gap between sign language users and those who rely on spoken
language. By combining hand detection, gesture recognition, sign language to text conversion, and
speech synthesis, the pipeline enables effective and inclusive communication for the deaf and hard-of-
hearing community.
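The snippet below is a minimal sketch of MediaPipe's hand-tracking solution (illustrative only; it assumes the mediapipe Python package and a sample image named gesture.jpg, and is not our full pipeline). The 21 hand landmarks it returns can serve as features for a gesture classifier.

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True,
                                 max_num_hands=1,
                                 min_detection_confidence=0.5)

frame = cv2.imread('gesture.jpg')                   # assumed sample image
if frame is not None:
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # 21 (x, y, z) landmarks for the detected hand
        for landmark in results.multi_hand_landmarks[0].landmark:
            print(landmark.x, landmark.y, landmark.z)
hands.close()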

METHODOLOGY
The project uses a vision-based methodology. Because all signs are made with bare hands, the problem of using any artificial devices for interaction is eliminated.

DATASET GENERATION
We searched for pre-made datasets for the project but were unable to locate any that were in the form of raw photos and met our specifications; the only datasets we could find were in the form of RGB values. So, we decided to compile our own dataset. The procedure we used to produce it is described below.

To create our dataset, we used the Open Computer Vision (OpenCV) library.

First, we captured roughly 200 and 800 photos of each ASL (American Sign Language) symbol for training and testing purposes, respectively.
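A simplified sketch of such a capture loop is shown below (illustrative only; the key bindings, folder layout, and image count are assumptions rather than our exact script).

import os
import cv2

label = 'A'                                  # assumed ASL letter being recorded
save_dir = os.path.join('dataset', label)
os.makedirs(save_dir, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while count < 200:                           # assumed number of images per class
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Capture - press s to save, q to quit', frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('s'):
        cv2.imwrite(os.path.join(save_dir, '%s_%d.jpg' % (label, count)), frame)
        count += 1
    elif key == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()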

PROJECT DIAGRAM
Here are some illustrations that demonstrate how our project or system can achieve the desired goals
based on the literature review and project scope from above.
USE CASE DIAGRAM
ACTIVITY DIAGRAM

TEST CASES:
TEST CASE #1:
WITHOUT DETECTION:
WITH DETECTION:
CODE IMPLEMENTATION:

import os
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder

# CONFIG_PATH and CHECKPOINT_PATH are assumed to be defined earlier (paths to the
# trained detector's pipeline.config and checkpoint directory).
configs = config_util.get_configs_from_pipeline_file(CONFIG_PATH)
detection_model = model_builder.build(model_config=configs['model'], is_training=False)

# Restore checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(CHECKPOINT_PATH, 'ckpt-31')).expect_partial()
print(ckpt)

@tf.function
def detect_fn(image):
    image, shapes = detection_model.preprocess(image)
    prediction_dict = detection_model.predict(image, shapes)
    detections = detection_model.postprocess(prediction_dict, shapes)
    return detections

import cv2
import numpy as np

from tkinter import Label, Tk, Button


import tkinter as tk
from PIL import Image, ImageTk
import visualFile as vf   # project-specific helper; like viz_utils but also returns the top detection's label
import time

# ANNOTATION_PATH is assumed to point to the folder containing label_map.pbtxt
category_index = label_map_util.create_category_index_from_labelmap(ANNOTATION_PATH + '/label_map.pbtxt')

base = Tk()
base.title('Sign - Text - Speech')
base.state('zoomed')
lbl = Label(base)
lbl.pack()
lbl2 = Label(base, text='Words: ', font=('Arial', 20))
lbl2.place(relx=0.33, rely=0.70)
lbl2 = Label(base, text='Sentences: ', font=('Arial', 20))
lbl2.place(relx=0.33, rely=0.80)

#Label For Words


lbl3 = Label(base, justify='left', wraplength=200)
lbl3.place(relx=0.48, rely=0.70)
#Label For Sentences
lbl4 = Label(base, justify='left', wraplength=200)
lbl4.place(relx=0.48, rely=0.80)

cap = cv2.VideoCapture(0)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

words = ""
sentences=""
tmpwords = ""
tmpsentences=""
def update_frame():

    ret, frame = cap.read()


    if ret:
        frame1 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image_np = np.array(frame1)
        image_np_expanded = np.expand_dims(image_np, axis=0)

        input_tensor = tf.convert_to_tensor(image_np_expanded, dtype=tf.float32)


   
        detections = detect_fn(input_tensor)

        num_detections = int(detections.pop('num_detections'))
        detections = {key: value[0, :num_detections].numpy()
                      for key, value in detections.items()}
        detections['num_detections'] = num_detections

        # detection_classes should be ints.


        detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

        label_id_offset = 1
        image_np_with_detections = image_np.copy()

        #viz_utils.visualize_boxes_and_labels_on_image_array(
        #            image_np_with_detections,
        #            detections['detection_boxes'],
        #            detections['detection_classes']+label_id_offset,
        #            detections['detection_scores'],
        #            category_index,
        #            use_normalized_coordinates=True,
        #            max_boxes_to_draw=5,
        #            min_score_thresh=.5,
        #            agnostic_mode=False)
        prtext, _ = vf.visualize_boxes_and_labels_on_image_array(
            image_np_with_detections,
            detections['detection_boxes'],
            detections['detection_classes']+label_id_offset,
            detections['detection_scores'],
            category_index,
            use_normalized_coordinates=True,
            max_boxes_to_draw=1,
            min_score_thresh=.3,
            agnostic_mode=False)

        photo_img = ImageTk.PhotoImage(image=Image.fromarray(image_np_with_detections))
        lbl.config(image=photo_img)
        lbl.image = photo_img

        scores = detections['detection_scores']
        category_dict = {value.get('id'): value.get('name') for _, value in category_index.items()}

        if (prtext in category_dict.values() and round(scores[0] * 100) > 75):


            global sentences
            global words
            if (len(prtext) > 1):
                #if (prtext == 'space'):
                #    sentences = sentences +" "+ words
                #    lbl4.config(text=sentences, font=('Arial', 20))
                #else:
                #global sentences
                #sentences = sentences +" "+ prtext
                #lbl4.config(text=sentences, font=('Arial', 20))
               
                if (prtext == 'space'):
                    tmpwords = words.strip()
                    words = ""
                    tmpsentences = sentences +" "+ tmpwords
                    lbl4.config(text=tmpsentences, font=('Arial', 20))
                else:
                    sentences = sentences +" "+ prtext
                    #sentences = "My name is Abubakar"
                    lbl4.config(text=sentences, font=('Arial', 20))
            else:
                words = words + prtext
                lbl3.config(text=words, font=('Arial', 20))
            time.sleep(2)  # brief pause so a held gesture is not appended repeatedly

    # Schedule the next frame even if this read failed, so the loop keeps running
    base.after(2, update_frame)

## Speech output: English via pyttsx3, Urdu via googletrans + gTTS

import pyttsx3 as p
from gtts import gTTS
from googletrans import Translator

def clearBtnClick():
    global words
    words=""
    global sentences
    sentences = ""
    lbl3.config(text="")
    lbl4.config(text="")

def backBtnClick():
    global words
    words = words[:-1]
    lbl3.config(text=words)

def engSpeech():
    global sentences
    engine = p.init()
    engine.say(sentences)
    engine.runAndWait()

def urduSpeech():
    global sentences
    translator = Translator()
    result = translator.translate(text=sentences, dest='ur')
    txt = result.text
    obj = gTTS(text=txt, lang='ur')
    obj.save('lastUrduSentence.mp3')
    os.system("mpg123 lastUrduSentence.mp3")  # requires the mpg123 command-line player
   

bckBtn = Button(base, text= "Back", height=3, width=10, command=backBtnClick)


bckBtn.place(relx= 0.63,rely=0.70)
clearBtn = Button(base, text= "Clear", height=3, width=10, command=clearBtnClick)
clearBtn.place(relx= 0.63,rely=0.80)
engBtn = Button(base, text= "English", height=3, width=10, command=engSpeech)
engBtn.place(relx=0.73, rely=0.70)
urduBtn = Button(base, text= "Urdu", height=3, width=10, command=urduSpeech)
urduBtn.place(relx=0.73, rely=0.80)

update_frame()
base.mainloop()

CONCLUSION & FUTURE WORK


In recent years, there has been a significant advancement in the field of sign language detection
and conversion into speech, enabling better communication between individuals who are deaf
or hard of hearing and those who do not understand sign language. However, the future holds
even more promising developments in this area. Researchers and technologists are actively
working on refining and expanding these technologies to make them more accurate, accessible,
and user-friendly.
One aspect of future work involves improving the accuracy of sign language detection systems.
Current systems rely on computer vision algorithms to interpret hand movements and gestures,
but there is still room for enhancement. Researchers are exploring the use of advanced
machine learning techniques, such as deep learning and neural networks, to train models that
can better recognize and understand a wider range of sign language gestures. This would
enable more precise translation from sign language to spoken language.
Additionally, efforts are being made to develop portable and wearable devices that can
facilitate real-time sign language translation. These devices would incorporate compact
cameras and sensors to capture sign language movements and convert them into speech
output, either through built-in speakers or connected earpieces. The aim is to create
lightweight, unobtrusive devices that can be easily worn by individuals who are deaf or hard of
hearing, enabling them to communicate effortlessly with people who do not understand sign
language.
Another area of future research involves exploring natural language processing techniques to
improve the translation of sign language into spoken language. Sign languages often differ in
syntax and grammar from spoken languages, posing challenges for accurate translation.
Researchers are investigating methods to bridge this gap by developing algorithms that can
analyze the context and meaning of sign language expressions, considering the nuances and
cultural variations of different sign languages. This would result in more fluent and contextually
appropriate speech output.
Furthermore, there is a growing interest in integrating sign language detection and conversion
technologies into mainstream communication platforms and devices. The goal is to make these
technologies widely available and seamlessly integrated into everyday life. Imagine a future
where smartphones, tablets, and other communication devices have built-in sign language
detection capabilities, allowing individuals who are deaf or hard of hearing to communicate
effortlessly through video calls, social media platforms, and messaging apps.
In summary, future work on sign language detection and conversion into speech holds
great potential. With advancements in computer vision, machine learning, natural language
processing, and device integration, we can expect more accurate, portable, and user-friendly
systems that facilitate effective communication between individuals who use sign language and
those who do not. These advancements have the power to bridge communication barriers and
create a more inclusive and accessible society for everyone.
