
MACHINE LEARNING PROJECT REPORT

Sign Language Recognition


(UML501)
Fifth-Semester
Submitted by:
(102153031) PRANAV KUMAR AGRAWAL
(102153003) ADITYA JHA

BE Third Year, COE

Submitted To:

DR. ARUN SINGH PUNDIR

Computer Science and Engineering Department


Thapar Institute of Engineering and Technology, Patiala
NOVEMBER 2023
Table of Contents
1. Introduction
2. Methodology
3. Python Modules Used
4. Python Code Description
5. Machine Learning Model
6. Library Used for the ML Modeling
7. Results
8. Conclusion
9. References
1. Introduction
Sign Language Recognition has the potential to revolutionize communication for the deaf and hard-of-hearing community, creating a more inclusive and accessible society. Sign languages are rich and complex visual-spatial languages that involve the use of hand gestures, facial expressions, body movements, and other non-manual signals to convey meaning.
Through a combination of image processing and data analysis techniques, computers can learn to recognize and interpret hand gestures, translating them into written or spoken words.
Challenges in Sign Language Recognition:
1. Variability: Sign languages exhibit considerable variability due to regional differences, individual signing styles, and context-dependent meanings. Adapting to this variability is a significant challenge.
2. Real-time Processing: Achieving real-time recognition is crucial for practical applications, especially in communication scenarios where delays can hinder effective interaction.
3. Vocabulary Size: Sign languages often have a large vocabulary, and creating comprehensive training datasets that cover all possible gestures can be challenging.
Ongoing research aims to improve accuracy, robustness, and the
overall usability of these systems in real-world applications.
2. Methodology
1. Data Collection: The process begins with capturing sign language gestures through images or videos. This input is typically obtained through cameras or depth sensors.
2. Data Pre-processing: The captured data may undergo preprocessing steps to enhance its quality and reduce noise. This can include tasks such as image normalization, background removal, and hand tracking.
3. Feature Extraction: Extracting relevant features from the input data is crucial for effective sign language recognition. Common features include hand shape, movement, orientation, and facial expressions.
4. Training a Machine Learning Model: Machine learning techniques such as LSTM (Long Short-Term Memory), a type of recurrent neural network (RNN), can be used to train a model on the dataset so that it learns the mapping between input gestures and their corresponding meanings.
5. Testing of the ML Model: Testing a trained Sign Language Recognition (SLR) model involves evaluating its performance on a separate dataset that it has not seen during training.
6. Recognition Output: The final output of the system is the recognized sign language gesture, presented in textual or symbolic form. This output can then be used to generate spoken language or text for communication.
3. Python Modules Used
A brief description of the imported modules:

Fig 3.1

• import os: Imports the os module, which provides a way to interact with the operating system. In this case, it is used for potential future operations involving the file system, although it is not directly used in the provided snippet.
• import cv2: Imports the OpenCV library. OpenCV is a powerful computer vision library that provides tools for image and video processing.

Fig 3.2
• MediaPipe is an open-source library by Google that offers solutions for various media processing tasks, including face detection, hand tracking, pose estimation, and more. It provides pre-built models and easy-to-use APIs for integrating these functionalities into applications.

Fig 3.3

• This line imports the sleep function from the time module.
• The sleep function is used to pause the execution of the program for a specified number of seconds.

Fig 3.4
• from function import *: This line imports all functions from a module named function. It's a wildcard import, and it's generally recommended to import only the specific functions or classes you need to avoid namespace conflicts.
• from sklearn.model_selection import train_test_split: This line imports the train_test_split function from scikit-learn, which is a popular machine learning library in Python. This function is often used to split a dataset into training and testing sets.
• from keras.utils import to_categorical: This line imports the to_categorical function from Keras, a high-level neural networks API running on top of TensorFlow. This function is commonly used to one-hot encode categorical variables.
• from keras.models import Sequential: This line imports the Sequential class from Keras. The Sequential model is a linear stack of layers that can be easily created and built layer by layer.
• from keras.layers import LSTM, Dense: This line imports the LSTM and Dense layers from Keras. LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) layer, and Dense is a standard fully connected layer.
• from keras.callbacks import TensorBoard: This line imports the TensorBoard callback from Keras. The TensorBoard callback is used to visualize training metrics in the TensorBoard web interface during model training.
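
Taken together, the imports described in Fig 3.1-3.4 amount to roughly the following block (a sketch combining the snippets shown in the figures; the exact grouping across the individual scripts may differ):

import os                      # file-system access for the image/keypoint folders
import cv2                     # OpenCV: camera capture and image display
import mediapipe as mp         # hand-landmark detection
from time import sleep         # short pauses between capture steps

from function import *         # project helper functions (wildcard import)
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import TensorBoard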
4. Python Code Description
• collectdata.py: This Python script uses the OpenCV library to capture video from the default camera (camera index 0) and allows the user to capture images of specific regions of interest (ROIs) corresponding to different letters of the alphabet. Here's a high-level explanation of the code flow:

Video Capture Initialization:

The code uses the OpenCV library (cv2) to initialize a video capture object (cap) for the default camera
(index 0). This object is used to read frames from the camera.

Directory Setup and Image Counting:

The script sets up a directory named 'Image/' where it intends to save images for each letter of the
alphabet ('A' to 'Z'). The variable count is a dictionary that keeps track of the number of images saved for
each letter. The count is determined by listing the files in the corresponding subdirectories ('A' to 'Z')
and calculating the length of the list.

Frame Processing:

The script enters into an infinite loop (while True) where it continuously reads frames from the camera
using cap.read().

It defines a region of interest (ROI) on the captured frame by drawing a rectangle with the vertices (0,40)
and (300,400). The ROI is then displayed in a window named "data" using cv2.imshow().

Another window named "ROI" displays the actual region of interest extracted from the frame
(frame[40:400,0:300]), focusing on the specified rectangle.

Image Capture on Key Press:

The script waits for a key press using cv2.waitKey(10). If a key is pressed, it checks the ASCII value of the
key using ord() and compares it to the ASCII values of letters 'a' to 'z'.

If a key corresponds to a letter, it saves the current ROI as an image in the respective subdirectory ('A' to
'Z') inside the 'Image/' directory. The image file is named based on the count for that letter.

The script continues to capture frames and save images until the user interrupts by pressing a key. After
exiting the loop, it releases the video capture (cap.release()) and closes all OpenCV windows
(cv2.destroyAllWindows()).
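
A minimal sketch of this flow is given below. It assumes the 'Image/' directory layout described above; the quit key and the exact file-naming details are assumptions rather than the report's exact code.

import os
import cv2

directory = 'Image/'
cap = cv2.VideoCapture(0)                      # default camera (index 0)

while True:
    _, frame = cap.read()
    # number of images already saved per letter, used to name the next file
    count = {letter: len(os.listdir(os.path.join(directory, letter)))
             for letter in map(chr, range(ord('A'), ord('Z') + 1))}

    # draw the region of interest and show both windows
    cv2.rectangle(frame, (0, 40), (300, 400), (255, 255, 255), 2)
    cv2.imshow("data", frame)
    roi = frame[40:400, 0:300]
    cv2.imshow("ROI", roi)

    key = cv2.waitKey(10) & 0xFF
    if key == 27:                              # Esc: stop collecting (assumed quit key)
        break
    if ord('a') <= key <= ord('z'):            # a letter key saves the current ROI
        letter = chr(key).upper()
        cv2.imwrite(os.path.join(directory, letter,
                                 str(count[letter]) + '.png'), roi)

cap.release()
cv2.destroyAllWindows()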
Fig 4.1 Fig 4.2
Fig 4.3
• function.py: This Python script is a helper file in which various functions used by the other Python scripts are defined.

The functions defined in this file are as follows:

mediapipe_detection(image, model)

This function prepares an image for processing by a MediaPipe model: it converts the color space, sets a flag to control write access, runs inference using the model, resets the flag, and converts the image back to the original color space before returning the processed image and the results.

draw_styled_landmarks(image, results)

This function helps in visualizing the detected hand landmarks and their connections on an image, which can be useful for debugging, analysis, or user-interface applications.

extract_keypoints(results)

This function takes the output of a hand-tracking model from the MediaPipe library and extracts the 3D coordinates of the hand landmarks. The function assumes that the input results contain information about the detected hand landmarks.
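
The three helpers could look roughly like the following sketch, assuming the MediaPipe Hands solution; the drawing style and the zero-padding behaviour when no hand is detected are assumptions, not the report's exact code.

import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

def mediapipe_detection(image, model):
    # BGR -> RGB, lock the frame for inference, then convert back
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image.flags.writeable = False
    results = model.process(image)
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    return image, results

def draw_styled_landmarks(image, results):
    # overlay the detected hand landmarks and their connections
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(image, hand_landmarks,
                                      mp_hands.HAND_CONNECTIONS)

def extract_keypoints(results):
    # flatten the 21 (x, y, z) hand landmarks; zeros when no hand is detected
    if results.multi_hand_landmarks:
        hand = results.multi_hand_landmarks[0]
        return np.array([[lm.x, lm.y, lm.z] for lm in hand.landmark]).flatten()
    return np.zeros(21 * 3)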

Fig 4.4
• data.py: This Python script collects hand gesture keypoint data using the MediaPipe library. Here's the code flow:

Data Directory Setup:

The script first attempts to create directories for each action and sequence in the specified data path
(DATA_PATH). It iterates over the actions and sequences, trying to create directories using os.makedirs.
If a directory already exists, it moves on without raising an exception.

MediaPipe Hand Detection Setup:

The script uses the mp_hands (MediaPipe hands) module to initialize hand tracking. It sets various
parameters such as model complexity, minimum detection confidence, and minimum tracking
confidence.

Data Collection Loop:

The script enters a loop to capture frames for each action, sequence, and frame number. It reads frames
from the specified image files (e.g., 'Image/{}/{}.png') instead of capturing them in real-time using a
camera. This could be useful for collecting pre-recorded data or for debugging.

Visualization and Wait Logic:

For each frame, it performs hand detection using the mediapipe_detection function and draws the
detected landmarks on the image using draw_styled_landmarks. It also displays information about the
ongoing data collection process on the screen using OpenCV.

During the first frame of each sequence, it displays a special message indicating the start of data
collection, and then, on subsequent frames, it shows a message indicating the action and sequence
being recorded.

Keypoint Extraction and Data Export:

For each frame, it extracts keypoints from the detected hand landmarks using the extract_keypoints
function. The keypoints are then saved as NumPy arrays in the specified directory
(DATA_PATH/action/sequence/frame_num.npy). This data likely serves as the training input for a
machine learning model.

The script also checks for the 'q' key press during the loop. If the 'q' key is pressed, the loop breaks,
allowing the user to stop the data collection process gracefully.

Note: The lines related to video capture (cap) are commented out, and frames are loaded from image
files. Uncommenting and using cap would allow real-time video capture from a camera.
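
A condensed sketch of this flow is shown below; actions, no_sequences, sequence_length and DATA_PATH are assumed to be defined in function.py, and the exact image path format and on-screen text are assumptions based on the description above.

import os
import cv2
import numpy as np
import mediapipe as mp
from function import *   # mediapipe_detection, draw_styled_landmarks, extract_keypoints, constants

# create DATA_PATH/action/sequence folders, skipping those that already exist
for action in actions:
    for sequence in range(no_sequences):
        os.makedirs(os.path.join(DATA_PATH, action, str(sequence)), exist_ok=True)

with mp.solutions.hands.Hands(model_complexity=0,
                              min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as hands:
    for action in actions:
        for sequence in range(no_sequences):
            for frame_num in range(sequence_length):
                # frames are read from previously captured images, not the camera
                frame = cv2.imread('Image/{}/{}.png'.format(action, sequence))
                image, results = mediapipe_detection(frame, hands)
                draw_styled_landmarks(image, results)
                cv2.putText(image, 'Collecting {} sequence {}'.format(action, sequence),
                            (15, 12), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
                cv2.imshow('OpenCV Feed', image)

                # save the flattened keypoints for this frame as a .npy file
                keypoints = extract_keypoints(results)
                np.save(os.path.join(DATA_PATH, action, str(sequence),
                                     str(frame_num)), keypoints)

                if cv2.waitKey(10) & 0xFF == ord('q'):   # 'q' stops collection gracefully
                    break

cv2.destroyAllWindows()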
Fig 4.5
• trainmodel.py: In this script, a machine learning model for hand gesture recognition is trained using a deep learning approach with the Keras library and TensorFlow. Here's an explanation of the code flow:

Data Preparation:

The script loads previously collected hand gesture data stored as NumPy arrays. It creates sequences of
frames for each action and labels them accordingly. The data is split into training and testing sets using
train_test_split. The sequences represent the temporal aspect of the hand gestures.

Label Mapping:

The script creates a mapping (label_map) from action labels to numerical values. This mapping is used to
convert action labels into numerical representations for training the machine learning model.

Model Definition:

The script defines a sequential Keras model for training. It uses Long Short-Term Memory (LSTM) layers,
which are suitable for handling sequences of data. The model architecture consists of multiple LSTM
layers followed by dense layers. The output layer uses the softmax activation function, indicating that
the model is designed for multiclass classification.

Model Training:

The model is compiled using the Adam optimizer and categorical cross-entropy loss function. It is then
trained on the prepared training data (X_train and y_train) for a specified number of epochs (200 in this
case). The training progress is logged using TensorBoard callbacks, which can be visualized for analysis.

Model Saving:

After training, the script saves the trained model in two formats: as a JSON file (model.json) to store the
architecture and as an HDF5 file (model.h5) to store the weights and biases. These files can be later used
for making predictions without retraining the model.

Note: The script assumes certain data structures such as actions, no_sequences, sequence_length, and
DATA_PATH, which are likely defined earlier in the code. The success of the model training depends on
the quality and diversity of the collected hand gesture data.
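
The flow above could be sketched roughly as follows; the layer sizes, the 63-value keypoint vector (21 landmarks × 3 coordinates) and the test split are assumptions chosen to match the description rather than the report's exact code.

import os
import numpy as np
from function import *   # actions, no_sequences, sequence_length, DATA_PATH
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import TensorBoard

# map each action label to a numerical class index
label_map = {label: num for num, label in enumerate(actions)}

# build sequences of per-frame keypoint arrays and their labels
sequences, labels = [], []
for action in actions:
    for sequence in range(no_sequences):
        window = [np.load(os.path.join(DATA_PATH, action, str(sequence),
                                       '{}.npy'.format(frame_num)))
                  for frame_num in range(sequence_length)]
        sequences.append(window)
        labels.append(label_map[action])

X = np.array(sequences)
y = to_categorical(labels).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)

# stacked LSTM layers followed by dense layers and a softmax output
model = Sequential([
    LSTM(64, return_sequences=True, activation='relu',
         input_shape=(sequence_length, 63)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(len(actions), activation='softmax'),
])
model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.fit(X_train, y_train, epochs=200,
          callbacks=[TensorBoard(log_dir='Logs')])

# save the architecture (model.json) and the weights (model.h5) separately
with open('model.json', 'w') as f:
    f.write(model.to_json())
model.save_weights('model.h5')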

Fig 4.6 Fig 4.7


Fig 4.8
• app.py: This script performs real-time hand gesture recognition using the model pre-trained in trainmodel.py. Here's the code flow:

Loading Pre-Trained Model:

The script reads the architecture of a pre-trained neural network model from a JSON file (model.json)
and loads the weights from an HDF5 file (model.h5). This allows the script to use a previously trained
model for hand gesture recognition.

Visualization Functions:

The script defines a function prob_viz for visualizing the prediction probabilities on the input frame. It
draws rectangles with different colors representing the confidence levels for each action and overlays
the action labels.

Real-Time Gesture Recognition Loop:

The script enters a real-time loop using OpenCV's video capture. It captures frames from the camera
feed and processes them for hand gesture recognition using the MediaPipe hands module.

It extracts hand keypoints from the detected landmarks and maintains a sequence of the last 30 frames
(sequence).

The pre-trained model is then used to predict the action based on the sequence of keypoints. If the
prediction confidence surpasses a threshold, the recognized action is added to a sentence, and the
result is displayed on the frame.

Displaying Output and Accuracy:

The recognized gesture sequence is displayed on the top of the video feed, along with the
corresponding prediction accuracy. The script keeps track of the recognized actions in a sentence list
and their corresponding accuracy in the accuracy list.

Break Gracefully:

The script breaks out of the loop and releases resources if the 'q' key is pressed.

Note: Ensure that the necessary libraries (cv2, mp_hands, etc.) are imported, and the required functions
(mediapipe_detection, extract_keypoints, etc.) are defined in the code. Additionally, the script assumes
that a suitable pre-trained model (model.json and model.h5) is available for use.
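
A sketch of the recognition loop is given below; the confidence threshold, the 30-frame window length and the on-screen layout are assumptions based on the description above.

import cv2
import numpy as np
import mediapipe as mp
from function import *   # mediapipe_detection, extract_keypoints, actions
from keras.models import model_from_json

# rebuild the model from model.json and load the trained weights from model.h5
with open('model.json', 'r') as f:
    model = model_from_json(f.read())
model.load_weights('model.h5')

sequence, sentence, accuracy = [], [], []
threshold = 0.8                                  # assumed confidence threshold
cap = cv2.VideoCapture(0)

with mp.solutions.hands.Hands(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        _, frame = cap.read()
        image, results = mediapipe_detection(frame, hands)

        # keep a rolling window of the last 30 frames of keypoints
        sequence.append(extract_keypoints(results))
        sequence = sequence[-30:]

        if len(sequence) == 30:
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            if res[np.argmax(res)] > threshold:
                sentence.append(actions[np.argmax(res)])
                accuracy.append('{:.1f}%'.format(res[np.argmax(res)] * 100))

        # show the recognized gestures at the top of the feed
        cv2.putText(image, ' '.join(sentence[-5:]), (3, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
        cv2.imshow('OpenCV Feed', image)
        if cv2.waitKey(10) & 0xFF == ord('q'):   # 'q' breaks gracefully
            break

cap.release()
cv2.destroyAllWindows()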
Fig 4.9
Fig 4.10
5. Machine Learning Model

• LSTM: LSTM, or Long Short-Term Memory, is a type of recurrent neural network (RNN) architecture that is well suited to processing sequential data, such as text, speech, and time series. Unlike traditional RNNs, which can suffer from the vanishing gradient problem, LSTMs have a memory cell that allows them to learn long-term dependencies in sequences. This makes them well suited to tasks such as machine translation, speech recognition, and text generation.
• Dense Layer: Dense layers are used in the final stages of a neural network to classify or predict data. The number of neurons in the final dense layer is typically equal to the number of classes or categories that the network is trying to distinguish. For example, if a neural network classifies images of animals into 10 categories, the output dense layer would have 10 neurons, one for each animal class.
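
As a minimal illustration of this point (the sizes here are arbitrary, not the project's actual configuration):

from keras.models import Sequential
from keras.layers import LSTM, Dense

num_classes = 26                       # e.g. the letters A-Z in this project
model = Sequential([
    LSTM(64, input_shape=(30, 63)),    # 30 frames of 63 keypoint values each
    Dense(num_classes, activation='softmax'),   # one output neuron per class
])
model.summary()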
6. Library Used for the ML Modeling
• Keras: Keras is a high-level API, or wrapper, for the TensorFlow machine learning platform. It provides an intuitive and easy-to-use interface for building and training deep learning models. Keras is designed to be flexible and extensible, making it a powerful tool for a wide variety of machine learning tasks.
• TensorFlow: TensorFlow is a numerical computation library that uses data flow graphs to represent and execute machine learning models. These graphs allow you to build and train complex models efficiently, and the library provides a variety of built-in operations for manipulating data, performing arithmetic, and training neural networks.
7. Results

Fig 6.1

The model recognizes and outputs the hand gesture; here the output is 'A'.

Fig 6.2

The model recognizes and outputs the hand gesture; here the output is 'C'.


8. Conclusion
In conclusion, while sign language recognition systems face
challenges, advancements in machine learning and computer vision offer promising solutions.
Future trends include the development of more robust algorithms, improved training datasets,
and the integration of sensor technologies. With continued research and collaboration, the
power of machine learning can be fully harnessed to accurately recognize and interpret sign
language, enhancing communication and inclusivity for the deaf community.
9. References

• NumPy: Van der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), 22-30.
• Scikit-learn: Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830.
• TensorFlow: https://www.tensorflow.org/learn
• Introduction to TensorFlow (GeeksforGeeks): https://www.geeksforgeeks.org/introduction-to-tensorflow/
• MediaPipe: https://developers.google.com/mediapipe
