ABSTRACT

The project introduces an application that uses computer vision for hand gesture recognition. A camera
records a live video stream, from which a snapshot is taken through the interface. The system is
trained at least once for each of the counting hand gestures (one, two, three, four, and five). After that
a test gesture is given to it and the system tries to recognize it.

Research was carried out on a number of algorithms that could best differentiate hand
gestures. It was found that the diagonal sum algorithm gave the highest accuracy rate. In the
preprocessing phase, a self-developed algorithm removes the background of each training
gesture. After that the image is converted into a binary image and the sums of all diagonal
elements of the picture are taken. These sums help in differentiating and classifying
different hand gestures.

Previous systems have used data gloves or markers as input to the system. This system imposes no such
constraints: the user can make hand gestures naturally in view of the camera.
A completely robust hand gesture recognition system is still under heavy research
and development; the implemented system serves as an extendible foundation for future
work.
ACKNOWLEDGMENT

To accomplish even the simplest task, the help of Allah Almighty is required, and my thesis, with its
complexities and difficulties, was not an easy task. After completing the task of its production, I am
thankful to Allah The Most Powerful that He helped at every step and never let my courage waver,
even in the face of the worst problems.
My thesis supervisor, Pascal Rebreyend, was also very supportive of everything. He gave me
tremendous support and help throughout, even though it took up a lot of his precious time.
I wish to acknowledge all my university teachers who supported and guided me throughout this degree,
especially Dr. Hasan Fleyeh, Siril Yella, Mark Dougherty and Jerker Westin. I am also thankful to my
family, who supported me in my studies.
INDEX

Contents Page Numbers

CHAPTER-1 INTRODUCTION
1.0 Introduction ---------------------------------------------------------------------1

1.1 Digital Image Processing ---------------------------------------------------1


1.2 Biometrics --------------------------------------------------------------------2

1.3 Hand Gesture Detection and Recognition -----------------------------------3

1.3.1 Detection ------------------------------------------------------------------3

1.3.2 Recognition ---------------------------------------------------------------4

1.4 Motivation --------------------------------------------------------------------6


1.5 Scope ---------------------------------------------------------------------------7

1.6 Software Tools ----------------------------------------------------------------7

1.7 Objectives ----------------------------------------------------------------------7

1.8 Conclusion ---------------------------------------------------------------------8

CHAPTER-2 LITERATURE REVIEW


2.0 Introduction-------------------------------------------------------------------- 9
2.1 Lighting ----------------------------------------------------------------------10
2.2 Camera orientations and distance-----------------------------------------10
2.3 Background Selections-----------------------------------------------------11

2.4 Different recognition approaches ----------------------------------------11

2.41 Pen-Based gesture recognition-----------------------------------------11

2.42 Tracker based gesture recognition ------------------------------------11

2.43 Data gloves ---------------------------------------------------------------13

2.44 Body suits -----------------------------------------------------------------14

2.45 Head and face gestures --------------------------------------------------14

2.46 Hand and arm gestures --------------------------------------------------14

2.47 Body gestures -------------------------------------------------------------15

2.48 Vision-Based Gesture Recognition-------------------------------16

2.5 Conclusion -------------------------------------------------------------------17


CHAPTER-3 METHODOLOGY
3.0 Introduction --------------------------------------------------------------------18
3.1 Project Constraints -----------------------------------------------------------19
3.2 The Webcam System (USB Port) ------------------------------------------20
3.3 Brief Outline of the Implemented System ---------------------------------20
3.3.1 Pre-Processing -----------------------------------------------------------21
3.3.1.1 Skin Modelling ---------------------------------------------------------21
3.3.3 Standardize Data---------------------------------------------------------20
3.4 Feature Extraction-------------------------------------------------------------20
3.4.1 Segmentation Technique-------------------------------------------------21
3.4.2 Threshold Method--------------------------------------------------------21
3.4.3 Edge-Based Segmentation-----------------------------------------------22
3.4.4 Clustering-Based Segmentation Methods-----------------------------23
3.5 Image Analysis----------------------------------------------------------------24
3.6 UML Diagrams--------------------------------------------------------------24
3.7 Algorithm----------------------------------------------------------------------32
3.7.1 Purpose of Using This Algorithm--------------------------------------32
3.7.2 Algorithm CNN----------------------------------------------------------33
3.7.2.1 Architecture of CNN-----------------------------------------------33
3.7.2.2 Convolutional Layer------------------------------------------------34
3.7.2.3 Rectified Linear Unit (ReLU)----------------------------------------35
3.7.2.4 Pooling----------------------------------------------------------------36
3.7.2.5 Flattening-------------------------------------------------------------37
3.7.2.6 Dense Neural Network----------------------------------------------38
3.7.3 Explanation of Algorithm-----------------------------------------------39
3.7.4 Conclusion-----------------------------------------------------------------42

CHAPTER-4 IMPLEMENTATION
4.1 Introduction of Software Techniques------------------------------------43
4.2 Importance of Python Features-------------------------------------------45
4.3 Software and Hardware Requirement Analysis-------------------------46
4.3.1 Python Language--------------------------------------------------------46
4.3.2 Independence Across Platforms--------------------------------------47
4.3.3 Consistency and Simplicity--------------------------------------------47
4.3.4 Frameworks and Libraries Variety-----------------------------------47
4.3.5 A Low Entry Barrier------------------------------------------------------48
4.3.6 Versatility------------------------------------------------------------------48
4.3.7 IDLE Software-----------------------------------------------------------48
4.3.8 Libraries Used (Pandas)------------------------------------------------49
4.3.9 Key Features of Pandas--------------------------------------------------50
4.3.10 Benefits of Pandas------------------------------------------------------50
4.3.11 Python Pandas Data Structure----------------------------------------50
4.3.12 Software Requirements-------------------------------------------------51
4.3.13 Hardware Requirements------------------------------------------------51
4.4 Python Libraries Used---------------------------------------------------------51
4.4.1 TensorFlow----------------------------------------------------------------51
4.4.2 NumPy--------------------------------------------------------------------52
4.4.3 Scikit-Learn-------------------------------------------------------------53
4.4.4 Matplotlib----------------------------------------------------------------53
4.4.5 Pandas--------------------------------------------------------------------53
4.4.6 Tkinter--------------------------------------------------------------------54
4.4.7 Python Imaging Library (PIL/Pillow)----------------------------------54
4.4.8 OpenCV-----------------------------------------------------------------55
4.4.9 PyTorch-------------------------------------------------------------------55
4.4.10 Keras--------------------------------------------------------------------55
4.5 Process of Implementation-----------------------------------------------56
4.5.1 Data Collection from Kaggle for Brain Tumor Classification-56
4.5.2 Data Preprocessing Techniques for Brain Tumor Classification-57
4.5.2.1 Code for Preprocessing-------------------------------------------59
4.5.2.2 Explanation of Code----------------------------------------------63
4.5.2.3 Creating GUI Frames---------------------------------------------64

4.5.2.4 Explanation of Code---------------------------------------------68

Chapter-5 Testing Of Project--------------------------------------------71


5.0 Explanation Of Testing------------------------------------------------------71

5.1 Test Cases--------------------------------------------------------------------74

Chapter-6 Result-----------------------------------------------------------77
6.0 Way Of Execution------------------------------------------------------------77
6.1 Result Analysis---------------------------------------------------------------80
Chapter-7--------------------------------------------------------------------82
Conclusion And Future Scope---------------------------------------------------------82
References----------------------------------------------------------------------------------83

LIST OF FIGURES

FIG NO      FIGURE NAME                                            PAGE NO
Fig:1.1 Survival Rate After Tumor Detected-------------------------------------4
Fig:3.1 Basic System Architecture-------------------------------------------------17
Fig:3.2 Brain MRIs-------------------------------------------------------------------18
Fig:3.3 Image Before And After Thresholding----------------------------------21
Fig:3.4 Image Before And After Edge Based Segmentation------------------22
Fig:3.5 Image Before And After Clustering------------------------------------23
Fig:3.6 Use Case Diagram-----------------------------------------------------------26
Fig:3.7 Class Diagram----------------------------------------------------------------28
Fig:3.8 Sequence Diagram-----------------------------------------------------------29
Fig:3.9 Activity Diagram------------------------------------------------------------31
Fig:3.10 Basic Architecture Of CNN-----------------------------------------------34
Fig:3.11 CNN Reading A Human Smile-------------------------------------------34
Fig:3.12 Scan Of Image By CNN---------------------------------------------------35
Fig:3.13 Graphical Representation Of ReLU------------------------------------35
Fig:3.14 Function Of ReLU----------------------------------------------------------36
Fig:3.15 Function Of Max And Avg Pooling-------------------------------------37
Fig:3.16 Function Of Flattening-----------------------------------------------------38
Fig:3.17 Basic Representation Of Dense Neural Network---------------------39
Fig:3.18 Steps Involved In Convolution Layer-----------------------------------40
Fig:3.19 Convolution Neural Network---------------------------------------------40
Fig:3.20 Working Flow Of The Proposed CNN Model-------------------------42
Fig:4.0 GUI Screen-------------------------------------------------------------------70
Fig:6.1 Execution Of Step-1--------------------------------------------------------77
Fig:6.2 Execution Of Step-2--------------------------------------------------------77
Fig:6.3 Execution Of Step-3--------------------------------------------------------78
Fig:6.4 Execution Of Step-4---------------------------------------------------------78
Fig:6.5 Execution Of Step-5---------------------------------------------------------79
Fig:6.6 Execution Of Step-6---------------------------------------------------------79
FIG:6.7 EXECUTION OF STEP-7----------------------------------------------80

LIST OF TABLES

TABLE NUMBER TABLE NAME PAGE NUMBER

TABLE:6.1 ACCURACY TABLE-----------------------------------80


INDEX

CONTENTS                                                      PAGE NUMBERS

CHAPTER 1: INTRODUCTION ....................................................... 1
1.1 DIGITAL IMAGE PROCESSING .................................................. 1
1.2 BIOMETRICS ................................................................ 1
1.3 HAND GESTURE DETECTION AND RECOGNITION ................................... 2
1.3.1 DETECTION ............................................................... 2
1.3.2 RECOGNITION ............................................................. 3
1.4 MOTIVATION ................................................................ 4
1.5 SCOPE ..................................................................... 4
1.6 SOFTWARE TOOLS ............................................................ 4
1.7 OBJECTIVES ................................................................ 5
1.8 CONCLUSION ................................................................ 5

CHAPTER 2: LITERATURE REVIEW .................................................. 6
INTRODUCTION ..................................................................
2.1 LIGHTING .................................................................. 7
2.2 CAMERA ORIENTATIONS AND DISTANCE ......................................... 8
2.3 BACKGROUND SELECTION ...................................................... 8
2.4 DIFFERENT RECOGNITION APPROACHES ......................................... 8
2.41 PEN-BASED GESTURE RECOGNITION ........................................... 8
2.42 TRACKER-BASED GESTURE RECOGNITION ....................................... 9
2.43 DATA GLOVES .............................................................. 9
2.44 BODY SUITS .............................................................. 10
2.45 HEAD AND FACE GESTURES .................................................. 10
2.46 HAND AND ARM GESTURES ................................................... 11
2.47 BODY GESTURES ........................................................... 11
2.48 VISION-BASED GESTURE RECOGNITION ....................................... 12
2.5 CONCLUSION ...............................................................

CHAPTER 3: METHODOLOGY ....................................................... 14
INTRODUCTION .................................................................
3.1 PROJECT CONSTRAINTS ...................................................... 14
3.2 THE WEBCAM SYSTEM (USB PORT) ............................................ 14
3.3 BRIEF OUTLINE OF THE IMPLEMENTED SYSTEM ................................. 15
3.31 PRE-PROCESSING .......................................................... 15
3.31.1 SKIN MODELING ......................................................... 16
3.31.2 REMOVAL OF BACKGROUND ................................................. 17
3.31.3 CONVERSION FROM RGB TO BINARY ........................................ 18
3.31.4 HAND DETECTION ........................................................ 18
3.32 FEATURE EXTRACTION ALGORITHMS .......................................... 19
3.33 REAL TIME CLASSIFICATION ................................................ 19
3.4 CONCLUSION ...............................................................

CHAPTER 4: IMPLEMENTATION ....................................................
INTRODUCTION .................................................................
4.1 NEURAL NETWORKS ..........................................................
4.2 ROW VECTOR ALGORITHM .....................................................
4.3 EDGING AND ROW VECTOR PASSING ALGORITHM .................................
4.4 PYTHON LANGUAGE ..........................................................
4.4.1 PYTHON LIBRARIES USED ..................................................
4.4.2 TENSORFLOW .............................................................
4.4.3 OPENCV .................................................................
4.4.4 MEDIAPIPE ..............................................................
4.4.5 MEDIAPIPE TOOLKIT ......................................................
4.4.6 GRAPHS .................................................................
4.4.7 NUMPY ..................................................................
4.5 MEAN AND STANDARD DEVIATION OF EDGED IMAGE ..............................
4.6 DIAGONAL SUM ALGORITHM ...................................................
4.61 GUI DESIGN ..............................................................
4.62 NN TRAINING .............................................................
4.63 PERFORMANCE OF NN .......................................................
4.64 DETECTION AND RECOGNITION ...............................................
4.65 CODE FOR PRE-PROCESSING .................................................
4.7 GRAPHICAL USER INTERFACE (GUI) ...........................................
4.8 CONCLUSION ...............................................................

CHAPTER 5: SYSTEM DESIGN .....................................................
INTRODUCTION .................................................................
5.1 ROW VECTOR ALGORITHM .....................................................
5.2 EDGING AND ROW VECTOR PASSING ALGORITHM .................................
5.3 MEAN AND STANDARD DEVIATION OF EDGED IMAGE ..............................
5.4 DIAGONAL SUM ALGORITHM ...................................................
5.5 PROCESSING TIME ..........................................................
5.5.1 USE CASE DIAGRAM .......................................................
5.5.2 CLASS DIAGRAM ..........................................................
5.5.3 SEQUENCE DIAGRAM .......................................................
5.5.4 ACTIVITY DIAGRAM .......................................................
5.6 ROTATION VARIANT .........................................................
5.7 EXPERIMENT AND ANALYSIS ..................................................
5.7.1 EFFECT WITH DIFFERENT SKIN TONES ......................................
5.7.2 EFFECT OF TRAINING PATTERN .............................................
5.7.3 GESTURE RECOGNITION ....................................................
5.8 FAILURE ANALYSIS .........................................................
5.9.1 RECOVERY TESTING .......................................................
5.9.2 SENSITIVITY TESTING ....................................................
5.10 CONCLUSION ..............................................................

CHAPTER 6: TESTING OF PROJECT ................................................
INTRODUCTION .................................................................
6.1 EXPLANATION OF TESTING ...................................................
6.2 TYPES OF TESTS ...........................................................
6.2.1 UNIT TESTING ...........................................................
6.2.2 INTEGRATION TESTING ....................................................
6.2.3 FUNCTIONAL TESTING .....................................................
6.3 WHITE BOX TESTING ........................................................
6.4 BLACK BOX TESTING ........................................................
6.5 CONCLUSION ...............................................................

CHAPTER 7: CONCLUSION AND ANALYSIS ...........................................
7.1 CONCLUSION ...............................................................
7.2 FUTURE WORK ..............................................................
REFERENCES ...................................................................

LIST OF FIGURES

FIGURE 1.1: LIGHTING CONDITION AND BACKGROUND ................................ 2
FIGURE 1.2: HAND GESTURE RECOGNITION FLOW CHART .............................. 3
FIGURE 2.1: THE EFFECT OF SELF SHADOWING (A) AND CAST SHADOWING (B) [25] .... 7
FIGURE 3.1: SYSTEM IMPLEMENTATION ........................................... 15
FIGURE 3.2: DIFFERENT ETHNIC GROUP SKIN PATCHES ............................. 17
FIGURE 3.3: REMOVAL OF BACKGROUND ........................................... 18
FIGURE 3.4: LABELING SKIN REGION ............................................ 19
FIGURE 3.5: REAL TIME CLASSIFICATION ........................................ 20
FIGURE 4.1: NEURAL NET BLOCK DIAGRAM ........................................ 21
FIGURE 4.2: NN FOR ROW VECTOR AND EDGING ROW VECTOR ......................... 22
FIGURE 4.3: NN FOR MEAN AND STANDARD DEVIATION .............................. 22
FIGURE 4.4: ROW VECTOR OF AN IMAGE .......................................... 23
FIGURE 4.5: ROW VECTOR FLOW CHART ........................................... 23
FIGURE 4.6: EDGING AND ROW VECTOR FLOW CHART ................................ 24
FIGURE 4.7: MEAN & S.D. FLOW CHART .......................................... 25
FIGURE 4.8: DIAGONAL SUM .................................................... 26
FIGURE 4.9: DIAGONAL SUM FLOW CHART ......................................... 26
FIGURE 4.10: GRAPHICAL USER INTERFACE ....................................... 27
FIGURE 4.11: NN TRAINING .................................................... 28
FIGURE 4.12: PERFORMANCE CHART .............................................. 28
FIGURE 4.13: GRAPHICAL USER INTERFACE OUTPUT ................................ 29
FIGURE 5.1: PERFORMANCE PERCENTAGE .......................................... 31
FIGURE 5.2: DEGREE OF ROTATION CLASSIFICATION ............................... 31
FIGURE 5.3: DEGREE OF ROTATION MISCLASSIFICATION ............................ 31
FIGURE 5.4: ETHNIC GROUP SKIN DETECTION ..................................... 31
FIGURE 5.5: MIX TRAINING PATTERN ............................................ 31
FIGURE 5.6: CLASSIFICATION PERCENTAGE ....................................... 31

CHAPTER 1

INTRODUCTION
Recent developments in computer software and related hardware technology have provided value-added
services to users. In everyday life, physical gestures are a powerful means of communication.
They can economically convey a rich set of facts and feelings. For example, waving one's hand from
side to side can mean anything from a "happy goodbye" to "caution". Use of the full potential of
physical gesture is also something that most human-computer dialogues lack [14].

The task of hand gesture recognition is one of the important and elemental problems in computer vision.
With recent advances in information technology and media, automated human interaction systems are
being built which involve hand processing tasks such as hand detection, hand recognition and hand tracking.

This prompted my interest, so I planned to make a software system that could recognize human gestures
through computer vision, a subfield of artificial intelligence. The purpose of the software was to
program a computer to "understand" a scene or features in an image.

A first step in any hand processing system is to detect and localize the hand in an image. The hand
detection task is challenging because of variability in pose, orientation, location and
scale. Different lighting conditions add further variability.

1.1 DIGITAL IMAGE PROCESSING

Image processing is reckoned as one of the most rapidly evolving fields of the software industry, with
growing applications in all areas of work. It holds the possibility of developing the ultimate machines
of the future, which would be able to perform the visual functions of living beings. As such, it forms the
basis of all kinds of visual automation.
1. Medical Imaging: Techniques like MRI, CT scans, and ultrasound heavily rely on image processing
for diagnosis and treatment planning. Advanced algorithms can assist in detecting anomalies,
tumors, or other medical conditions.
2. Autonomous Vehicles: Image processing plays a crucial role in the development of self-driving
cars. Cameras capture real-time images of the vehicle's surroundings, which are then processed to
identify objects, pedestrians, road signs, and obstacles.
3. Surveillance and Security: Image processing is vital in surveillance systems for monitoring and
analyzing video feeds to detect suspicious activities, identify individuals, and enhance security
measures.
4. Satellite Imaging: Satellite imagery is used for various purposes such as environmental monitoring,
urban planning, agriculture, and disaster management. Image processing techniques help in
analyzing vast amounts of data to extract meaningful insights.

5. Augmented Reality (AR) and Virtual Reality (VR): Image processing enables the integration of
virtual elements into real-world environments, enhancing user experiences in gaming, entertainment,
training simulations, and various other applications.
6. Industrial Automation: Image processing is widely used in quality control, defect detection,
sorting, and robotic guidance systems across industries like manufacturing, electronics, and
agriculture.
7. Biometrics: Image processing techniques are employed in biometric systems for recognizing and
verifying individuals based on physiological or behavioral characteristics like fingerprints, facial
features, iris patterns, or voice.
8. Retail and Marketing: Image processing facilitates tasks like object recognition, facial analysis,
and customer behavior tracking, enabling personalized marketing strategies, inventory management,
and customer engagement.

1.2 BIOMETRICS
Biometric systems are systems that recognize or verify human beings. Some of the most important
biometric features are based on physical features like the hand, fingers, face and eyes. For instance, fingerprint
recognition utilizes the ridges and furrows on the skin surface of the palm and fingertips. Hand gesture
detection is related to locating the presence of a hand in a still image or in a sequence of images, i.e.
moving images. Other biometric features are determined by human behavior, like voice, signature and
gait. The way humans generate sound with the mouth, nasal cavities and lips is used for voice recognition.
Signature recognition looks at the pattern and speed of the pen when writing one's signature.
1. Fingerprint Recognition: This method relies on the distinct patterns formed by ridges and furrows
on the surface of fingertips and palms. Fingerprint recognition is widely used in various applications,
including law enforcement, access control, and mobile devices.
2. Face Recognition: Face recognition systems analyze facial features such as the arrangement of eyes,
nose, and mouth to identify individuals. It finds applications in surveillance, security systems, and
user authentication on devices.
3. Iris Recognition: Iris recognition involves analyzing the unique patterns in the colored part of the
eye (the iris). It is known for its high accuracy and is often used in high-security applications such as
border control and national identification programs.
4. Hand Gesture Detection: Hand gesture detection identifies the presence and movement of hands in
images or video sequences. It has applications in human-computer interaction, virtual reality, and
sign language recognition.
5. Voice Recognition: Voice recognition systems analyze the unique characteristics of an individual's
voice, including pitch, tone, and pronunciation, to verify their identity. Voice recognition is used in
voice-controlled devices, telephone banking, and authentication systems.
6. Signature Recognition: Signature recognition examines the distinctive features of an individual's
handwritten signature, including stroke pattern, speed, and pressure. It is utilized in banking,
document authentication, and forensic analysis.
7. Gait Recognition: Gait recognition identifies individuals based on their unique walking patterns. It
is often used in video surveillance for identifying people at a distance or in scenarios where facial
recognition is not feasible.

1.3 HAND GESTURE DETECTION AND RECOGNITION

1.3.1 DETECTION
Hand detection is related to locating the presence of a hand in a still image or in a sequence of
images, i.e. moving images. In the case of moving sequences it can be followed by tracking of the hand in
the scene, but this is more relevant to applications such as sign language. The underlying difficulty of
hand detection is that human eyes can detect objects with an accuracy that machines cannot yet match;
from a machine's point of view, it is like a man fumbling around with his senses to find an object.

The factors which make the hand detection task difficult to solve are:

Variations in image plane and pose

The hands in the image vary due to rotation, translation and scaling of the camera pose or the hand itself. The
rotation can be both in and out of the plane.

Skin Color and Other Structure Components

The appearance of a hand is largely affected by skin color and size; the presence or absence of
additional features, such as hair on the hand, adds further variability.

Lighting Condition and Background

As shown in Figure 1.1 light source properties affect the appearance of the hand. Also the background,
which defines the profile of the hand, is important and cannot be ignored.

Figure: 1.1: Lighting Condition and Background

1.3.2 RECOGNITION
Hand detection and recognition have been significant subjects in the field of computer vision and
image processing during the past 30 years. There have been considerable achievements in these fields
and numerous approaches have been proposed.
1. Traditional Computer Vision Techniques: In the early years, hand detection and recognition often
relied on traditional computer vision techniques such as edge detection, contour analysis, and
template matching. These methods were limited by their reliance on handcrafted features and often
struggled with variations in hand appearance and pose.
2. Statistical and Machine Learning Approaches: With the advent of statistical and machine learning
techniques, researchers began to explore methods such as Support Vector Machines (SVMs),
Random Forests, and Neural Networks for hand detection and recognition. These approaches
allowed for more robust modeling of hand appearance and motion patterns, leading to improved
performance in various conditions.
3. Feature-based Methods: Feature-based methods involve extracting key features from hand images
or video frames, such as color histograms, texture descriptors, or geometric properties. These
features are then used for hand detection and recognition tasks, often in combination with machine
learning algorithms.

4. Depth-based Methods: Depth sensors, such as Microsoft Kinect or Intel RealSense, have enabled
the development of depth-based hand detection and recognition systems. By capturing depth
information, these systems can overcome challenges related to variations in lighting conditions and
background clutter.
5. Deep Learning: In recent years, deep learning techniques, particularly Convolutional Neural
Networks (CNNs), have revolutionized hand detection and recognition. Deep learning models can
automatically learn hierarchical representations of hand features from raw pixel data, leading to
state-of-the-art performance in various hand-related tasks.
6. Gesture Recognition: Gesture recognition goes beyond simple hand detection and aims to
understand the meaning and intent behind hand movements. Gesture recognition systems often
leverage machine learning and deep learning techniques to classify and interpret gestures in real-
time.
7. Real-time Systems: With advancements in hardware and algorithms, real-time hand detection and
recognition systems have become feasible. These systems are capable of processing hand gestures
and movements in video streams with low latency, enabling applications such as gesture-based
interfaces and virtual reality interactions.
However, the typical procedure of a fully automated hand gesture recognition system can be illustrated
in the Figure 1.2 below:

Figure 1.2: Hand Gesture Recognition Flow Chart
1.4 MOTIVATION
Biometric technologies make use of various physical and behavioral characteristics of human such as
fingerprints, expression, face, hand gestures and movement. These features are then processed using
sophisticated machines for detection and recognition and hence used for security purposes. Unlike
common security measures such as passwords, security cards that can easily be lost, copied or stolen;
these biometric features are unique to individuals and there is little possibility that these pictures can be
replaced or altered.

Within the biometric sector, hand gesture recognition is gaining more and more attention because of
its demand in security applications, both for law enforcement agencies and in private sectors such as
surveillance systems.

In a video conferencing system, there is a need to automatically control the camera in such a way that the
current speaker always has the focus. One simple approach to this is to guide the camera based on
sound or simple cues such as motion and skin color.
Hand gestures are important to intelligent human-computer interaction; to build fully automated
systems that analyze information contained in images, fast and efficient hand gesture recognition
algorithms are required.

1.5 SCOPE

The scope of this project is to build a real-time gesture classification system that can automatically
detect gestures in natural lighting conditions. In order to accomplish this objective, a real-time
gesture-based system is developed to identify gestures.

This system will work as a step toward future Artificial Intelligence and computer vision applications
with a user interface. It creates a method to recognize hand gestures based on different parameters. The
main priority of this system is to be simple, easy and user friendly without requiring any special
hardware. All computation will occur on a single PC or workstation. The only special hardware used is
the device that digitizes the image (a digital camera).

1.6 SOFTWARE TOOLS

Due to the time constraint and the complexity of implementing the system in C++, the aim was to design a
prototype under MATLAB that was optimized for detection performance. A system that accepted
varying inputs of different sizes and image resolutions was implemented, constructing a well-coded
and documented system for easier future development.

1.7 OBJECTIVES
The first objective of this project is to create a complete system to detect, recognize and interpret hand
gestures through computer vision.

The second objective of the project is therefore to provide a new low-cost, high-speed color image
acquisition system.

1.8 CONCLUSION
In conclusion, the integration of computer software with advanced hardware technology has
significantly enhanced user experiences by introducing value-added services. Recognizing the
importance of physical gestures in everyday communication, there has been a growing interest in
developing software systems that can understand and interpret these gestures through computer vision,
a subfield of artificial intelligence.

Hand gesture recognition is a crucial component of such systems, enabling automated human
interaction in various applications. However, it presents significant challenges due to the variability in
hand pose, orientation, location, scale, and lighting conditions.

Despite these challenges, recent advancements in information technology and media have facilitated
the development of automated systems capable of detecting, recognizing, and tracking human hand
gestures. By leveraging computer vision techniques, these systems aim to enhance human-computer
dialogue by enabling computers to "understand" and respond to gestures in a manner akin to human
communication.

Moving forward, continued research and development in this field hold the potential to revolutionize
human-computer interaction, making it more intuitive, efficient, and accessible across various domains,
from interactive gaming and virtual reality to assistive technologies and robotics.

CHAPTER 2
LITERATURE REVIEW

INTRODUCTION
Hand gesture recognition research is classified into three categories. The first is “Glove-based
Analysis”, which attaches mechanical or optical sensors to gloves to transduce flexion of the fingers into
electrical signals for hand posture determination, with an additional sensor for the position of the hand. This
sensor is usually acoustic or magnetic and is attached to the glove. A lookup-table software toolkit is
provided with some applications to recognize hand postures.

The second approach is “Vision-based Analysis”, based on the way human beings get information from their
surroundings; this is probably the most difficult approach to employ in a satisfactory way. Many
different implementations have been tested so far. One is to deploy a 3-D model of the human hand.
Several cameras are attached to this model to determine the parameters corresponding to matching images of
the hand, palm orientation and joint angles, in order to perform hand gesture classification. Lee and Kunii
developed a hand gesture analysis system based on a three-dimensional hand skeleton model with 27
degrees of freedom. They incorporated five major constraints based on human hand kinematics to
reduce the model parameter space search. To simplify the model matching, specially marked gloves
were used [3].

The third implementation is “Analysis of Drawing Gestures”, which uses a stylus as an input device. Such
drawing analysis leads to the recognition of written text. Mechanical sensing has been used for hand
gesture recognition at a vast level for direct and virtual environment manipulation. Mechanically sensing
hand posture has many problems, such as electromagnetic noise, reliability and accuracy. Visual sensing
can make gesture interaction potentially practical, but it is one of the most difficult problems for machines.

Full American Sign Language recognition systems (words, phrases) incorporate data gloves. Takashi
and Kishino discuss a data-glove-based system that could recognize 34 of the 46 Japanese gestures
(user dependent) using a joint angle and hand orientation coding technique. From their paper, it seems
the test user made each of the 46 gestures 10 times to provide data for principal component and cluster
analysis. The user created a separate test set from five iterations of the alphabet, with each gesture well
separated in time. While these systems are technically interesting, they suffer from a lack of training [1,
2].

Excellent work has been done in support of machine sign language recognition by Sperling and Parish,
who have done careful studies on the bandwidth necessary for a sign conversation using spatially and
temporally sub-sampled images. Point-light experiments (where “lights” are attached to significant
locations on the body and just these points are used for recognition) have been carried out by Poizner.
Most systems to date study isolated/static gestures; in most cases these are fingerspelling signs [13].
2.1 LIGHTING
The task of differentiating the skin pixels from those of the background is made considerably easier by
a careful choice of lighting. According to Ray Lockton, if the lighting is constant across the view of the
camera then the effects of self-shadowing can be reduced to a minimum [25]. (See Figure 2.1)

Figure 2.1: The effect of self shadowing (A) and cast shadowing (B)
[25].
The top three images were lit by a single light source situated off to the left. A self-shadowing effect
can be seen on all three, especially marked on the right image where the hand is angled away from the
source. The bottom three images are more uniformly lit, with little self-shadowing. Cast shadows do
not affect the skin for any of the images and therefore should not degrade detection. Note how an
increase of illumination in the bottom three images results in a greater contrast between skin and
background [25].

The intensity should also be set to provide sufficient light for the CCD in the camera. However, since
this system is intended to be used by the consumer it would be a disadvantage if special lighting
equipment were required. It was decided to attempt to extract the hand information using standard
room lighting. This would permit the system to be used in a non-specialist environment [25].
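
For illustration only, the following short Python sketch shows one common way of separating skin pixels from the background under ordinary room lighting, using OpenCV and a fixed YCrCb threshold. The threshold values and the use of OpenCV here are assumptions made for the example, not the method implemented in this project.

import cv2
import numpy as np

def skin_mask(bgr_image):
    # Convert to YCrCb, where skin tones tend to fall in a compact range.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # Assumed illustrative bounds; not the thresholds used in this work.
    lower = np.array([0, 135, 85], dtype=np.uint8)
    upper = np.array([255, 180, 135], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Remove small speckles caused by shadows and sensor noise.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)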

2.2 CAMERA ORIENTATIONS AND DISTANCE


It is very important to be careful about the direction of the camera to permit an easy choice of background. Two
good and effective approaches are to point the camera towards a wall or the floor. Standard room lighting was
used; the intensity of light is higher and shadowing effects are lower when the camera is pointed downwards.
The distance of the camera from the hand should be such that it covers the entire gesture. No effect on the
accuracy of the system was found whether or not the image is zoomed; the principle is to cover the entire hand
area.

2.3 BACKGROUND SELECTION


Another important aspect is to maximize differentiation: the color of the background must be as different
as possible from skin color. The floor color in this work was black. This color was chosen
because it offered minimal self-shadowing problems compared to other background colors.

2.4 DIFFERENT RECOGNITION APPROACHES


The different recognition approaches studied are as follows:

2.41 PEN-BASED GESTURE RECOGNITION

Recognizing gestures from two-dimensional input devices such as a pen or mouse has been considered
for some time. The early Sketchpad system in 1963 used light-pen gestures, for example. Some
commercial systems have used pen gestures since the 1970s. There are examples of gesture recognition
for document editing for air traffic control, and for design tasks such as editing splines.

More recently, systems such as the OGI Quick Set system have demonstrated the utility of pen-based
gesture recognition in concert with speech recognition to control a virtual environment. Quick Set
recognizes 68 pen gestures, including map symbols, editing gestures, route indicators, area indicators,
and taps.

Oviatt has demonstrated significant benefits of using both speech and pen gestures together in certain
tasks. Zeleznick and Landay and Myers developed interfaces that recognize gestures from pen-based
sketching [3].

There have been commercially available Personal Digital Assistants (PDAs) for several years, starting
with the Apple Newton, and more recently the 3Com Palm Pilot and various Windows CE devices,
Long, and Rowe survey problems and benefits of these gestural interfaces and provide insight for
interface designers. Although pen-based gesture recognition is promising for many HCI environments,
it presumes the availability of, and proximity to, a flat surface or screen. In virtual environments, this is
often too constraining – techniques that allow the user to move around and interact in more natural
ways are more compelling [3].

2.42 TRACKER-BASED GESTURE RECOGNITION

There are many tracking systems available commercially which can be used for gesture recognition,
primarily tracking eye gaze, hand gestures, and the overall body and its position. In virtual environment
interaction each sensor has its own strengths and weaknesses. Eye gaze can be useful for a gestural
interface, so I focus here on gesture-based input from tracking the hand and the body.

1. Hand Gesture Tracking:

   Strengths:

   - Precision: Hand gesture tracking systems can offer high precision, allowing for fine-grained control and manipulation in virtual environments.
   - Versatility: Hand gestures are highly expressive and can convey a wide range of commands and interactions, making them versatile for various applications.
   - Familiarity: Users are accustomed to using their hands for interaction in the physical world, making hand gesture tracking a natural and intuitive input method.

   Weaknesses:

   - Occlusion: Hand tracking systems may struggle with occlusion when one hand obstructs the view of the other or when objects come between the hands and the sensors, leading to inaccuracies in tracking.
   - Fatigue: Extended use of hand gestures for interaction may lead to user fatigue, especially in scenarios requiring prolonged or repetitive gestures.
   - Limited Degrees of Freedom: Some hand tracking systems may have limitations in tracking complex hand movements or gestures involving multiple fingers due to technical constraints.

2. Body Movement Tracking:

   Strengths:

   - Immersion: Tracking the movement of the entire body allows for a more immersive user experience, enabling users to engage with virtual environments in a natural and intuitive manner.
   - Full-body Interaction: Body tracking facilitates interactions beyond hand gestures, such as walking, running, jumping, and other full-body movements, enriching the range of interactions possible in virtual environments.
   - Accessibility: Body tracking can be more accessible to users with mobility impairments or those who may find hand gestures challenging to perform.

   Weaknesses:

   - Space Requirements: Body tracking systems often require larger physical spaces for operation, limiting their feasibility in constrained environments or for users with limited space.
   - Calibration and Setup: Achieving accurate body tracking may require calibration and setup procedures, which can be time-consuming and may require technical expertise.
   - Occlusion and Ambiguity: Similar to hand gesture tracking, body tracking systems may encounter issues with occlusion when body parts obstruct each other or when users adopt ambiguous poses, leading to tracking errors.

2.43 DATA GLOVES

For communication and manipulation, people use their hands for a wide variety of tasks. The hand, including
the wrist, has approximately 29 degrees of freedom and is very dexterous, extremely expressive and quite
convenient. In a variety of application domains, the hand could be used as a sophisticated input and control
device, providing real-time control with many degrees of freedom for complex tasks. Sturman analyzed
task characteristics and requirements, hand action capabilities, and device capabilities, and discussed
important issues in developing whole-hand input techniques [4].

Sturman suggested a taxonomy of whole-hand input that categorizes input techniques along two
dimensions: classes of hand actions, which can be continuous or discrete, and interpretation of hand
actions, which can be direct, mapped, or symbolic [4].

A given interaction task can be evaluated as to which style best suits it. Mulder presented an
overview of hand gestures in human-computer interaction, discussing the classification of hand
movement, standard hand gestures, and hand gesture interface design [3].

For measuring the position and configuration of the hand, many commercial devices are available with
varying degrees of precision, accuracy and completeness. These devices include exoskeletons and
instrumented gloves mounted on the hand and fingers, known as “data gloves”. Among the advantages of
data gloves are direct measurement of hand and finger parameters, provision of data at a high sampling
frequency, ease of use, no line-of-sight requirement, low-cost versions, and translation independence of the data.

However, along with the advantages of data gloves there are a few disadvantages, such as difficulty of
calibration, reduction in range of motion and comfort, noise in inexpensive systems, and the expense of
accurate systems. Moreover, it is compulsory for the user to wear a cumbersome device.

Many projects have used hand input from data gloves for “point, reach, and grab” operations or more
sophisticated gestural interfaces. Latoschik and Wachsmuth present a multi-agent architecture for
detecting pointing gestures in a multimedia application. Väänänen and Böhm developed a neural
network system that recognized static gestures and allowed the user to interactively teach new gestures
to the system. Böhm et al. extended that work to dynamic gestures using a Kohonen Feature Map (KFM)
for data reduction [3].

The HIT Lab at the University of Washington developed Glove GRASP, a C/C++ class library that
allows software developers to add gesture recognition capabilities to SGI systems, including user-
dependent training and one- or two-handed gesture recognition. A commercial version of this system is
available from General Reality [3].
2.44 BODY SUITS

By observing a small number of strategically placed dots on the human body, people can perceive patterns such
as gestures, activities, identities and other aspects of the body. One approach to recognizing
postures and human movements is to optically measure the 3-D positions of markers attached to the body
and then recover the time-varying articulated structure of the body. Articulated sensing of position and
joint angles can also be done using electromechanical sensors. Although some of these systems only require
small balls or dots placed on top of the user's clothing, I refer to body motion capture by “body suits” generically.

Body suits have advantages and disadvantages similar to those of data gloves. At high sampling rates they
provide reliable results, but they are cumbersome and very expensive, and calibration is non-trivial. Optical
systems use several cameras and typically process the data offline, although they avoid wires and tethers.

2.45 HEAD AND FACE GESTURES

When people interact with one another, they use an assortment of cues from the head and face to
convey information. These gestures may be intentional or unintentional, they may be the primary
communication mode or back channels, and they can span the range from extremely subtle to highly
exaggerated. Some examples of head and face gestures include: nodding or shaking the head, direction
of eye gaze, raising the eyebrows, opening the mouth to speak, winking, flaring the nostrils, and looks
of surprise, happiness, disgust, anger, sadness, etc. [5].

People display a wide range of facial expressions. Ekman and Friesen developed a system called FACS
for measuring facial movement and coding expression; this description forms the core representation
for many facial expression analysis systems [6].

A real-time system to recognize actions of the head and facial features was developed by Zelinsky and
Heinzmann, who used feature template tracking in a Kalman filter framework to recognize thirteen
head/face gestures [6].
Essa and Pentland used optical flow information with a physical muscle model of the face to produce
accurate estimates of facial motion. This system was also used to generate spatiotemporal motion-
energy templates of the whole face for each different expression – these templates were then used for
expression recognition [3].

2.46 HAND AND ARM GESTURES

These two parts of the body (hand and arm) have received the most attention among those who study gestures;
in fact many references consider only these two for gesture recognition. The majority of automatic
recognition systems are for deictic gestures (pointing), emblematic gestures (isolated signs) and sign
languages (with a limited vocabulary and syntax). Some are components of bimodal systems,
integrated with speech recognition. Some produce precise hand and arm configurations while others
only coarse motion [3].

Stark and Kohler developed the ZYKLOP system for recognizing hand poses and gestures in real-time.
After segmenting the hand from the background and extracting features such as shape moments and
fingertip positions, the hand posture is classified. Temporal gesture recognition is then performed on
the sequence of hand poses and their motion trajectory. A small number of hand poses comprises the
gesture catalog, while a sequence of these makes a gesture [3].

Similarly, Maggioni and Kämmerer described the Gesture Computer, which recognized both hand
gestures and head movements. There has been a lot of interest in creating devices to automatically
interpret various sign languages to aid the deaf community. One of the first to use computer vision
without requiring the user to wear anything special was built by Starner, who used HMMs to recognize
a limited vocabulary of ASL sentences. The recognition of hand and arm gestures has been applied to
entertainment applications [3].

Freeman developed a real-time system to recognize hand poses using image moments and orientation
histograms, and applied it to interactive video games. Cutler and Turk described a system for children
to play virtual instruments and interact with lifelike characters by classifying measurements based on
optical flow [3].

2.47 BODY GESTURES

This section includes tracking full body motion, recognizing body gestures, and recognizing human
activity. Activity may be defined over a much longer period of time than what is normally considered a
gesture; for example, two people meeting in an open area, stopping to talk and then continuing on their
way may be considered a recognizable activity. Bobick proposed a taxonomy of motion understanding in
terms of: Movement – the atomic elements of motion; Activity – a sequence of movements or static
configurations; and Action – a high-level description of what is happening in context.

Most research to date has focused on the first two levels [3].

The Pfinder system developed at the MIT Media Lab has been used by a number of groups to do body
tracking and gesture recognition. It forms a two-dimensional representation of the body, using
statistical models of color and shape. The body model provides an effective interface for applications
such as video games, interpretive dance, navigation, and interaction with virtual characters [3].

Lucente combined Pfinder with speech recognition in an interactive environment called Visualization
Space, allowing a user to manipulate virtual objects and navigate through virtual worlds [3].

Paradiso and Sparacino used Pfinder to create an interactive performance space where a dancer can
generate music and graphics through their body movements – for example, hand and body gestures can
trigger rhythmic and melodic changes in the music [3].

Systems that analyze human motion in virtual environments may be quite useful in medical
rehabilitation and athletic training. For example, a system like the one developed by Boyd and Little to
recognize human gaits could potentially be used to evaluate rehabilitation progress [3].

Davis and Bobick used a view-based approach by representing and recognizing human action based on
“temporal templates,” where a single image template captures the recent history of motion. This
technique was used in the Kids Room system, an interactive, immersive, narrative environment for
children [3].

Video surveillance and monitoring of human activity has received significant attention in recent years.
For example, the W4 system developed at the University of Maryland tracks people and detects
patterns of activity [3].

2.48 VISION-BASED GESTURE RECOGNITION

The most significant disadvantage of tracker-based systems is that they are cumbersome. This
detracts from the immersive nature of a virtual environment by requiring the user to put on an unnatural
device that cannot easily be ignored, and which often requires significant effort to put on and calibrate.
Even optical systems with markers applied to the body suffer from these shortcomings, albeit not as
severely. What many have wished for is a technology that provides real-time data useful for analyzing
and recognizing human motion that is passive and non-obtrusive. Computer vision techniques have the
potential to meet these requirements.
Vision-based interfaces use one or more cameras to capture images, at a frame rate of 30 Hz or more,
and interpret those images to produce visual features that can be used to interpret human activity and
recognize gestures [3].
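
As a rough sketch of such an interface (not the code used in this project), the Python loop below grabs frames from a single camera with OpenCV and passes each one to a placeholder recognizer; the recognize() function here is a stand-in for the segmentation and classification stages discussed elsewhere.

import cv2

def recognize(frame):
    # Placeholder: a real system would segment the hand here and classify the gesture.
    return None

cap = cv2.VideoCapture(0)                  # default webcam
while cap.isOpened():
    ok, frame = cap.read()                 # typically around 30 frames per second
    if not ok:
        break
    gesture = recognize(frame)
    if gesture is not None:
        print("Detected gesture:", gesture)
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()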

Typically the camera locations are fixed in the environment, although they may also be mounted on
moving platforms or on other people. For the past decade, there has been a significant amount of
research in the computer vision community on detecting and recognizing faces, analyzing facial
expression, extracting lip and facial motion to aid speech recognition, interpreting human activity, and
recognizing particular gestures [3].

Unlike sensors worn on the body, vision approaches to body tracking have to contend with occlusions.
From the point of view of a given camera, there are always parts of the user's body that are occluded
and therefore not visible – e.g., the backside of the user is not visible when the camera is in front. More
significantly, self-occlusion often prevents a full view of the fingers, hands, arms, and body from a
single view. Multiple cameras can be used, but this adds correspondence and integration problems [3].

The occlusion problem makes full body tracking difficult, if not impossible, without a strong model of
body kinematics and perhaps dynamics. However, recovering all the parameters of body motion may
not be a prerequisite for gesture recognition. The fact that people can recognize gestures leads to three
possible conclusions: we infer the parameters that we cannot directly observe, we don't need these
parameters to accomplish the task, and we infer some and ignore others [3].

Unlike special devices, which measure human position and motion, vision uses a multipurpose sensor;
the same device used to recognize gestures can be used to recognize other objects in the environment
and also to transmit video for teleconferencing, surveillance, and other purposes. There is a growing
interest in CMOS-based cameras, which promise miniaturized, low cost, low power cameras integrated
with processing circuitry on a single chip [3].

Currently, most computer vision systems use cameras for recognition. Analog cameras feed their signal
into a digitizer board, or frame grabber, which may do a DMA transfer directly to host memory. Digital
cameras bypass the analog-to-digital conversion and go straight to memory. There may be a
preprocessing step, where images are normalized, enhanced, or transformed in some manner, and then
a feature extraction step. The features – which may be any of a variety of two- or three-dimensional
features, statistical properties, or estimated body parameters – are analyzed and classified as a
particular gesture if appropriate [3].

This technique was also used in this project for recognizing hand gestures in real time. With the help of a web
camera, I took pictures of the hand against a prescribed background and then applied the classification
algorithm for recognition.

The gradient method is also known as the edge detection method. The subtraction method is very simple: it
subtracts the input image, pixel by pixel, from another image or a constant value to produce the output. I have
also studied different approaches to hand gesture recognition and found that techniques such as PCA and the
gradient method are complicated to implement, while the same output can be produced with a simpler and
easier implementation. So, I tried four different algorithms and finally selected the most efficient one,
i.e. the diagonal sum algorithm. This algorithm is able to recognize the largest number of gestures correctly.

CONCLUSION :

In conclusion, hand gesture recognition research can be classified into three main categories: glove-
based analysis, vision-based analysis, and analysis of drawing gestures.

1. Glove-based Analysis: This approach involves attaching sensors to gloves to transduce finger
flexion into electrical signals for determining hand posture. Additional sensors may track the
position of the hand. While this method provides a direct way to capture hand movements, it has
limitations such as electromagnetic noise, reliability issues, and the need for specially marked
gloves.
2. Vision-based Analysis: This method relies on computer vision techniques to analyze hand gestures.
It often involves deploying 3D models of the human hand and using multiple cameras to determine
parameters such as hand shape, palm orientation, and joint angles. While powerful, this approach is
challenging due to the complexity of interpreting visual data accurately.
3. Analysis of Drawing Gestures: This approach utilizes stylus or other input devices to recognize
hand gestures, particularly in the context of drawing or writing. While mechanical sensing can be
used for this purpose, it has its own set of challenges, including electromagnetic noise and accuracy
issues.
These different approaches have been applied in various contexts, including American Sign Language
recognition systems. While there has been notable progress in this field, challenges remain, such as the
need for extensive training data and the complexity of recognizing dynamic gestures in real-time
conversations.
Overall, hand gesture recognition research is advancing rapidly, driven by the increasing demand for
intuitive human-computer interaction systems. Continued innovation in this area holds the potential to
revolutionize how we interact with technology in diverse domains ranging from virtual reality and
gaming to assistive technologies and robotics.

CHAPTER 3
METHODOLOGY
INTRODUCTION
There has been a great deal of research in this field, and several methodologies have been proposed, such as
the Principal Component Analysis (PCA) method, the gradient method and the subtraction method. PCA is a
linear transformation based on a statistical approach. It gives us a powerful tool for pattern recognition and
data analysis, and is mostly used in image processing for data compression, dimensionality reduction and
decorrelation. The gradient method is another image processing technique that detects colour patches by
applying low-pass filters. The three methods are summarized below, followed by a short illustrative code sketch.

1. Principal Component Analysis (PCA):

 PCA is a statistical technique used for dimensionality reduction and feature extraction.

 In the context of gesture recognition, PCA can be applied to reduce the dimensionality of feature
vectors representing hand or body movements.

 By projecting the data onto a lower-dimensional subspace defined by the principal components, PCA
can help remove redundant information and capture the most significant variations in the data.

 PCA is often used in conjunction with other classification algorithms, such as support vector
machines (SVMs) or neural networks, for gesture recognition tasks.
2. Gradient Method:

 The gradient method is a technique commonly used in image processing for edge detection and
feature extraction.

 In the context of gesture recognition, the gradient method can be applied to detect changes in
intensity or color patches in images or video frames.

 By applying gradient-based filters, such as Sobel or Prewitt filters, edges and boundaries of objects,
including hand gestures or body movements, can be detected.

 The gradient method can provide valuable information about the spatial layout and shape of
gestures, which can be used as features for classification.

3. Subtraction Method:

 The subtraction method is a simple yet effective technique used for background subtraction in image
processing.

 In gesture recognition, the subtraction method can be applied to isolate the moving hand or body
from the background in video sequences.

 By subtracting a reference background image or frame from the current frame, regions of motion
corresponding to hand gestures or body movements can be extracted.

 The resulting binary mask or silhouette can then be further processed and analyzed for gesture
classification or tracking.
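The thesis implementation does not rely on library routines for these steps, but the following minimal sketch
(an illustration only, with placeholder file names such as 'background.jpg' and 'frame.jpg') shows how the
subtraction and gradient ideas look in OpenCV/NumPy:

import cv2
import numpy as np

background = cv2.imread('background.jpg', cv2.IMREAD_GRAYSCALE)   # reference image without the hand
frame = cv2.imread('frame.jpg', cv2.IMREAD_GRAYSCALE)             # current frame containing the hand

# Subtraction method: difference against the reference background, then threshold
diff = cv2.absdiff(frame, background)
_, motion_mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)   # 30 is an arbitrary example threshold

# Gradient method: Sobel filters give horizontal/vertical gradients; their magnitude highlights edges
gx = cv2.Sobel(frame, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(frame, cv2.CV_64F, 0, 1, ksize=3)
edges = (cv2.magnitude(gx, gy) > 100).astype(np.uint8) * 255       # example threshold on gradient magnitude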

3.1 PROJECT CONSTRAINTS

I propose a vision-based approach to accomplish the task of hand gesture detection. As discussed
above, the task of hand gesture recognition with any machine learning technique suffers from the
variability problem. To reduce the variability in the hand recognition task, I make the following
assumptions:

• A single colored camera is mounted above a neutral colored desk.

• The user interacts by gesturing in the view of the camera.
• Training is a must: the system must be trained at least once for each gesture.
• The hand is not rotated while the image is being captured.

The real time gesture classification system depends on the hardware and software.

Hardware
• Computer system with a minimum 2.8 GHz processor (or faster)
• 52X CD-ROM drive

• Web cam (For real-time hand Detection)
Software
• Windows 2000 (Service Pack 4), XP, Vista or Windows 7
• MATLAB 8.0 or later (with the Image Processing Toolbox installed)
• Vcapg2.dll (Video Capture Program Generation 2)
• DirectX 9.0 (for supporting Vcapg2)

3.2 THE WEBCAM SYSTEM (USB PORT)

Below is the summary of the specifications of the camera which this system required:

Resolution: 640x480
Video frame rate: 30fps @640x480
Sensor resolution: Minimum 1.3 megapixels
Connection port: USB

In my project the web cam was attached via the USB port of the computer. The web cam worked by
continually capturing frames. In order to capture a particular frame, the user just needs to select the
particular algorithm from the METHOD button on the interface, and the hand is detected in that
frame. The web cam took color pictures, which were then converted into grayscale format. The main
reason for sticking to grayscale was the extra amount of processing required to deal with color images.

3.3 BRIEF OUTLINE OF THE IMPLEMENTED SYSTEM

Hand gesture recognition system can be divided into following modules:

• Preprocessing
• Feature extraction of the processed image
• Real time classification

[Block diagram: Camera → Pre-processing → Feature extraction algorithms → Real-time classification → Output]

Figure 3.1: System Implementation

3.31 PRE-PROCESSING

Like many other pattern recognition tasks, pre-processing is necessary for enhancing robustness and
recognition accuracy.

The preprocessing prepares the image for recognition: before the diagonal sum or any of the other
algorithms is applied, a pre-processing step is performed to obtain the appropriate image required for
real-time classification. The net effect of this processing is to extract only the hand from the given
input, because once the hand is detected it can be recognized easily. The pre-processing step mainly
consists of the following tasks:

• Skin Modeling
• Removal of background
• Conversion from RGB to binary
• Hand Detection

3.31.1 SKIN MODELLING

There are numerous methods used for skin detection, such as RGB (Red, Green, Blue), YCbCr
(Luminance–Chrominance) and HSV (Hue, Saturation, Value).

• RGB:

RGB is a 3D color space in which each pixel at a specific location is a combination of the three colors
Red, Green and Blue. This representation is widely used in image processing for identifying skin regions.

• YCbCr (Luminance Chrominance):

This color space is used in digital video; the color information is represented by the two chrominance
components Cb and Cr. Cb is the difference between the Blue component and a reference value, and Cr
is the difference between the Red component and a reference value. It is basically a transformation of
RGB into YCbCr that separates luminance from chrominance for color modelling.

• HSV (Hue, Saturation and Value):

In HSV, Hue identifies the dominant color and Saturation defines colourfulness, whilst Value measures
intensity or brightness. This is good enough to choose a single color, but it ignores the complexity of
color appearance, and it trades computation speed (it is computationally expensive) against perceptual
relevance.
My approach in this thesis is to work with RGB and binarization techniques using an explicitly defined
skin region.

• Skin Detection:

Skin color detection is one of the important goals in hand gesture recognition. We have to build
decision rules that discriminate between skin pixels and non-skin pixels. This is usually accomplished
by introducing a metric that measures the distance of a pixel's color from skin tone. This type of metric
is known as skin modelling.

• Explicitly Defined Skin Region

The following are some common ethnic skin groups and their RGB color values:

FIGURE 3.2: DIFFERENT ETHNIC GROUP SKIN PATCHES

One way to build a skin classifier is to define explicitly, through a number of rules, the boundaries of
the skin color cluster in some color space. The advantage of this method is the simplicity of the skin
detection rules, which leads to a very fast classifier. For example [7]:

(R, G, B) is classified as skin if:

R > 95 and G > 40 and B > 20 and
max{R, G, B} − min{R, G, B} > 15 and
|R − G| > 15 and R > G and R > B

In this classifier the thresholds are defined to maximize the chance of recognizing the skin region for
each color. In Figure 3.2 we can see that the Red value in every skin sample is greater than 95, Green is
greater than 40 and Blue is greater than 20. With these thresholds, the classifier can easily detect almost
all kinds of skin.

This is one of the easiest methods, as it explicitly defines skin-color boundaries in a given color space.
Ranges of thresholds are defined for each color space component, and the image pixels that fall within
the predefined ranges are considered skin pixels. The advantage of this method is obviously its
simplicity, which avoids overly complex rules and reduces the risk of over-fitting the data. However, it
is important to select a good color space and suitable decision rules to achieve a high recognition rate
with this method [8].
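A minimal sketch of this explicitly defined skin rule, written here with NumPy purely for illustration (the
input array is assumed to be in R, G, B channel order), is:

import numpy as np

def skin_mask(rgb):
    # rgb: H x W x 3 uint8 array in R, G, B order (note that OpenCV loads images as BGR)
    r = rgb[:, :, 0].astype(int)
    g = rgb[:, :, 1].astype(int)
    b = rgb[:, :, 2].astype(int)
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    return ((r > 95) & (g > 40) & (b > 20) &
            ((mx - mn) > 15) &
            (np.abs(r - g) > 15) & (r > g) & (r > b))

Each pixel for which the mask is true is treated as a skin pixel.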

3.31.2 REMOVAL OF BACKGROUND

I have found that the background greatly affects the results of hand detection, which is why I decided to
remove it. For this I wrote my own code instead of using any built-in routines.

FIGURE: 3.3: REMOVAL OF BACKGROUND

3.31.3 CONVERSION FROM RGB TO BINARY

All algorithms accept an input in RGB form and then convert it into binary format, which makes
recognizing a gesture easier while retaining the luminance information of the image.
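As an illustration (not the exact thesis code), the conversion chain can be written with OpenCV as follows;
'gesture.jpg' is a placeholder file name and 127 simply corresponds to a mid-grey (0.5) threshold:

import cv2

img = cv2.imread('gesture.jpg')                               # input color image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)                  # keep only the luminance information
_, binary = cv2.threshold(gray, 127, 1, cv2.THRESH_BINARY)    # 0/1 binary image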

3.31.4 HAND DETECTION

An image could have more than one skin area, but only the hand is required for further processing. For this I
chose image labeling as the selection criterion, which works as follows:
• Labeling:
The number of skin regions in the image is determined by labelling all of them. A label is basically an
integer value; 8-connectivity is used to label all the pixels of a skin area. If a pixel belongs to an already
labelled object it is marked with that label; otherwise a new label with a new integer value is used. After
counting all the labelled regions of the segmented image, I sort them by area and choose the region with the
maximum area, because I assume that the hand is the biggest skin region in the image. To separate the region
of interest, a new image is created that is one at the positions where that label occurs and zero elsewhere.

Figure 3.4: Labeling Skin Region
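The following is a small sketch of this labeling step using OpenCV's connected-components routine (the
thesis uses its own implementation; this only illustrates the same idea):

import cv2
import numpy as np

def largest_skin_region(mask):
    # mask: binary uint8 image (0/255) marking skin pixels
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    if num <= 1:
        return np.zeros_like(mask)                       # no skin region found
    # stats[:, cv2.CC_STAT_AREA] holds the area of every labelled region; index 0 is the background
    hand_label = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    # keep only the largest region (assumed to be the hand): one where it occurs, zero elsewhere
    return np.where(labels == hand_label, 1, 0).astype(np.uint8)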

3.32 FEATURE EXTRACTION ALGORITHMS

There are four algorithms that I studied and implemented, namely:

• Row vector algorithm


• Edging and row vector passing
• Mean and standard deviation of edged image
• Diagonal sum algorithm
For details of these algorithms, see Chapter 4.

3.33 REAL TIME CLASSIFICATION

Figure 3.5 shows the concept for real time classification system. A hand gesture image will be
passed to the computer after being captured through camera at run time and the computer will try to
recognize and classify the gesture through computer vision.

FIGURE 3.5: REAL TIME CLASSIFICATION

In real-time classification the system tries to classify gestures that were not saved beforehand but are
given at run time. The system first trains itself with the user's count gestures at run time and then tries
to classify new test gestures given by the user. The algorithm used by the system for real-time
classification is the diagonal sum algorithm, as outlined below and illustrated with a short sketch after
the list.
1. Training Phase:
 During the training phase, the system collects data on the gestures provided by the user at
runtime.

 For each gesture, the system extracts relevant features that capture important characteristics
of the gesture. These features could include aspects such as hand movement trajectory, hand
shape, speed, etc.
 The system then stores these feature vectors along with their corresponding gesture labels.
2. Classification Phase:
 In the classification phase, when a new gesture is provided by the user in real-time, the
system extracts features from this gesture.
 It then compares the features of the new gesture with the features of the gestures stored
during the training phase.
 The diagonal sum algorithm involves calculating a diagonal sum feature for each gesture image.
This is achieved by summing the elements along every diagonal of the binary hand image and adding
these sums together (see Chapter 4).
 Once the diagonal sum is computed for both the new gesture and each stored gesture in the
training set, the system compares these values.
 The stored gesture with the closest diagonal sum to that of the new gesture is considered the
best match.
 Finally, the system classifies the new gesture based on the label of the closest match.
3. Real-Time Classification:
 The entire process described above is performed in real-time as the user provides new
gestures.
 The system continuously updates its training data and adapts its classification model based
on the gestures provided by the user during runtime.
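The following minimal sketch illustrates this train-and-match cycle. It is not the thesis code:
diagonal_feature condenses each binary hand image into a single scalar (the sum of the sums of all its
diagonals, as detailed in Chapter 4), and the gesture labels are placeholders.

import numpy as np

training_data = {}                                   # gesture label -> stored feature value

def diagonal_feature(binary_image):
    # sum of the sums of all diagonals of the binary hand image
    rows, cols = binary_image.shape
    return sum(np.trace(binary_image, offset=k) for k in range(-(rows - 1), cols))

def train(label, binary_image):
    training_data[label] = diagonal_feature(binary_image)

def classify(binary_image):
    value = diagonal_feature(binary_image)
    # the stored gesture with the closest feature value is taken as the best match
    return min(training_data, key=lambda lbl: abs(training_data[lbl] - value))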

CHAPTER 4
IMPLEMENTATION

INTRODUCTION
In this chapter, I describe all four feature extraction algorithms in detail. First I would like to
discuss the neural network used in the first three algorithms.

4.1 NEURAL NETWORKS

Neural networks are composed of simple elements operating in parallel. These elements are inspired by
biological nervous systems. As in nature, the network function is determined largely by the

connections between elements. We can train a neural network to perform a particular function by
adjusting the values of the connections (weights) between elements [9].

Commonly neural networks are adjusted, or trained, so that a particular input leads to a specific target
output. Such a situation is shown in Figure 4.1 below. There, the network is adjusted, based on a
comparison of the output and the target, until the network output matches the target. Typically many
such input/target pairs are used, in this supervised learning to train a network [9].

Figure 4.1 Neural net Block Diagram

Neural networks have been trained to perform complex functions in various fields of application
including pattern recognition, identification, classification, speech, vision and control
systems. Today neural networks can be trained to solve problems that are difficult for conventional
computers or human beings [9].
Once the data is ready for representation, the next step is to design the NN for training and testing. The first
two algorithms, Row Vector and Edging and Row Vector Passing, use a three-layer feed-forward network
with an input, a hidden and an output layer. The number of neurons in the input layer is 640, equal to the
number of features extracted by each of these algorithms, and there is one neuron in the output layer for the
class to be recognized. For Mean and Standard Deviation there are only two inputs, again equal to the number
of features extracted by that algorithm. The network architecture has a number of parameters, such as the
learning rate (lr), the number of epochs and a stopping criterion based on validation data. Training stops when
the mean square error at the output layer reaches a target value, which was set by trial and error over several experiments.
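As an illustration only (the thesis used MATLAB's neural network toolbox), an equivalent three-layer
feed-forward network could be sketched in Keras as below; the hidden-layer size and optimizer are
illustrative assumptions, not values taken from the thesis:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_row_vector_net(num_inputs=640, hidden_units=20):
    # 640 inputs = one value per image column; hidden size is an arbitrary example
    model = Sequential([
        Dense(hidden_units, activation='sigmoid', input_shape=(num_inputs,)),
        Dense(1, activation='sigmoid'),              # single output neuron, as described above
    ])
    model.compile(optimizer='adam', loss='mse')      # mean square error, matching the training criterion
    return model

# X: n_samples x 640 matrix of row vectors, y: target value per training gesture
# model = build_row_vector_net()
# model.fit(X, y, epochs=100, validation_split=0.2) # validation-based stopping is assumed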

Figure 4.2: NN for Row Vector and Edging Row Vector

Figure 4.3: NN for Mean and S.D

4.2 ROW VECTOR ALGORITHM


We know that behind every image is a matrix of numbers with which we do manipulations to derive
some conclusion in computer vision. For example we can calculate a row vector of the matrix. A row
vector is basically a single row of numbers with resolution 1×Y, where Y is the total number of columns in
the image matrix. Each element in the row vector represents the sum of its respective column entries, as
illustrated in Figure 4.4:

Figure 4.4 Row vector of an image
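In code, the row vector is simply a column-wise sum of the image matrix; a short NumPy sketch
(illustrative only) is:

import numpy as np

def row_vector(binary_image):
    # sum of each column -> a 1 x Y vector (Y = number of columns), as in Figure 4.4
    return binary_image.sum(axis=0)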

The first algorithm I studied and implemented makes use of the row vector of the hand gestures. For
each type of hand gesture, I took several hand images, performed skin modeling, labeling, background
removal and RGB-to-binary conversion in the preprocessing phase, calculated their row vectors and
then trained the neural network with these row vectors. Ultimately, the neural network was able to
recognize the row vectors that each gesture count can possibly have. Hence, after training, the system
was tested to see the recognition power it had achieved.

Mathematically, we can describe the image for training or testing purpose given to the neural network
as:

Input to neural network = Row vector (after image preprocessing)

The flowchart of the algorithm is given below in Figure 4.5:

Figure 4.5: Row Vector Flow Chart

4.3 EDGING AND ROW VECTOR PASSING ALGORITHM

In the pre-processing phase of this algorithm, I perform skin modeling, background removal, etc. on
the captured gesture image. This image is then converted from RGB into grayscale. Grayscale images
represent an image as a matrix where every element has a value corresponding to how bright or dark
the pixel at the corresponding position should be colored.

There are two ways of representing pixel brightness numerically. The first, the double class, assigns a
floating-point number (“decimal”) between 0 and 1 to each pixel: the value zero (0) represents black
and the value one (1) corresponds to white. The second class, known as uint8, assigns an integer
between 0 and 255 to each pixel's brightness: zero (0) corresponds to black and 255 to white. The uint8
class requires less storage than double, roughly 1/8 as much.

After the conversion of the image into grayscale, I took the edge of the image with a
fixed threshold, i.e. 0.5. This threshold helps in removing noise from the image. In
the next step, a row vector of the edged image is calculated. This row vector is then
passed on to the neural network for training. The neural network (NN) is later tested
for the classification of the gestures.

Mathematically, the input to the neural network is given as:

Input to NN= Row vector [Edge (Grayscale image)]

Figure 4.6: Edging and Row Vector Flow Chart
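A rough Python sketch of this feature (illustrative only: MATLAB's edge() with a 0.5 threshold is
approximated here by thresholding the Sobel gradient magnitude, which is an assumption rather than the
exact routine used in the thesis) is:

import cv2
import numpy as np

def edged_row_vector(gray):
    # gray: grayscale image scaled to the [0, 1] double-class range described above
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edges = (cv2.magnitude(gx, gy) > 0.5).astype(np.uint8)   # fixed threshold of 0.5, as in the thesis
    return edges.sum(axis=0)                                  # row vector of the edged image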

4.4 Python Language

In this method, the Python language is used for hand gesture detection.
Machine learning and AI, as a field, are still developing but are rapidly growing in usage due
to the need for automation. Artificial Intelligence makes it possible to create innovative
solutions to common problems, such as fraud detection, personal assistants, spam filters,
search engines, and recommendation systems. The demand for smart solutions to real-world
problems drives the need to develop AI further in order to automate tasks that are
tedious to program without it. The Python programming language is well suited to
automating such tasks, and it offers greater simplicity and consistency than many
other programming languages. Further, the presence of an engaged Python community
makes it easy for developers to discuss projects and contribute ideas on how to enhance their
code.

Independence across platforms. Due to its ability to run on multiple platforms without changes,
developers prefer Python over many other programming languages. Python runs across different
platforms, such as Windows, Linux, and macOS, requiring little or no modification. These platforms
are fully compatible with the Python programming language, which means there is little to no need for
a Python expert to adapt the program's code.

Consistency and simplicity. The Python programming language is a haven for software developers
looking for simplicity and consistency in their work. Python code is concise and readable, which
simplifies presentation. A developer can write code more easily and concisely than in many other
programming languages, and can receive input from other developers in the community to help
enhance the software.
Variety of frameworks and libraries. Libraries and frameworks are vital in preparing a suitable
programming environment. Python frameworks and libraries offer a reliable environment that
reduces software development time significantly. A library basically includes prewritten code
that developers can use to speed up coding when working on complex projects.

A low entry barrier. There is a shortage of programmers around the world. Python is an easy
language to learn, so the barriers to entry are very low. Data scientists can learn Python
quickly to participate in machine learning projects. Python is so similar to English that it is
easy to understand, and thanks to its simple syntax you can confidently build complex systems.

Versatility. Python is easy to use and supports various libraries and frameworks, making the
language very versatile. It is mainly used in two categories:

● Web development
● Machine learning

There are, of course, other areas where Python is a weaker fit. For instance, it may be tough to program
hardware-level or operating-system applications in it, and it can be challenging to use the language for
a single-page-application (SPA) front end. However, it works very well on the backend.

IDLE Software. Python IDLE is one of the IDEs used for Python programming. IDLE stands for
Integrated Development and Learning Environment. It can be accessed by opening the command
prompt and typing IDLE. Once it opens, a Python shell appears where you can begin coding. The shell
is an interactive interpreter: it provides the output for each line of code immediately. Pressing the Enter
key not only changes the line but produces the immediate result of the line just entered. Unlike Jupyter
Notebook, IDLE does not let us write the complete code first and then compute the results. But if a user
wants to check each line of code as it is typed, Python IDLE is preferable to Jupyter Notebook. So
basically it depends on the user: one may want to complete the code and then run it, or check every line
while writing it. If you want a visually attractive application to code in, you should go with Jupyter
Notebook.

Advantages:
● Very simple and basic
● Runs without any server or browser
● Only requires an Anaconda installation
● It has an in-built debugger
● It can be customized according to the user's preferences

Libraries Used (Pandas). Pandas is defined as an open-source library that provides high-performance
data manipulation in Python.

The name Pandas is derived from the term Panel Data, an econometrics term for multidimensional
data. It is used for data analysis in Python. Data analysis requires lots of processing, such as
restructuring, cleaning, merging, etc. Different tools are available for fast data processing, such as
NumPy, SciPy, Cython, and Pandas, but we prefer Pandas because working with it is fast, simple, and
more expressive than other tools. Pandas is built on top of the NumPy package, which means NumPy is
required for operating Pandas. Before Pandas, Python was capable of data preparation, but it only
provided limited support for data analysis. So Pandas came into the picture and enhanced the
capabilities of data analysis. It can perform the five significant steps required for processing and
analysing data irrespective of its origin, i.e. load, manipulate, prepare, model, and analyze.
Software Requirements

● Operating System: Windows, Mac or Linux
● Browser Specification: Any Internet Browser

Hardware Requirements

● Processor: i5 or above
● Hard Disk : 500 GB.
● Input Devices: Keyboard, Mouse
● RAM: 8 GB

4.4.1 PYTHON LIBRARIES USED

4.4.2 TENSOR FLOW

TensorFlow is basically a framework for defining and running computations that involve tensors,
which are partially defined computational objects that eventually produce a value. TensorFlow is an
open-source software library for numerical computation using data flow graphs. The graph nodes
represent mathematical operations, while the graph edges represent the multidimensional data arrays
(tensors) that flow between them. This flexible architecture enables you to deploy computation to one
or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow
provides stable Python and C APIs, as well as APIs without backward-compatibility guarantees for
C++, Go, Java, JavaScript, and Swift.

Features:

1. Better computational graph visualizations.
2. Reduces error by 50 to 60 percent in neural machine learning.
3. Parallel computing to execute complex models.
4. Seamless library management backed by Google.
5. Quicker updates and frequent new releases to provide you with the latest features.

4.4.3 OpenCV

OpenCV is one of the most famous and widely used open-source libraries for computer
vision tasks such as image processing, object detection, face detection, image segmentation,
face recognition, and many more. Other than this, it can also be used for machine learning
tasks. This is developed by Intel in 2002. It is written in C++ but developers have provided
Python and Java bindings. It is easy to read and use. To build computer vision and machine
learning models, OpenCV has more than 2500+ algorithms. These algorithms are very much
useful to perform various tasks such as face recognition, object detection, and many more.

1. Functionality: OpenCV provides a wide range of functionalities for tasks such as image
processing, object detection and tracking, facial recognition, image segmentation, and more. It
includes algorithms for feature detection and extraction, image filtering, geometric
transformations, and machine learning.
2. Cross-platform and Language Support: OpenCV is cross-platform and can be used on
various operating systems including Windows, Linux, macOS, Android, and iOS. While it is
primarily written in C++, it provides bindings for Python, Java, and other languages, making it
accessible to a broader audience of developers.
3. Ease of Use: OpenCV is designed to be user-friendly with easy-to-understand functions and
interfaces. Its Python bindings make it particularly popular among developers due to the
simplicity and readability of Python syntax.
4. Rich Set of Algorithms: OpenCV offers a vast collection of pre-built algorithms (over 2500)
for various computer vision and machine learning tasks. These algorithms range from basic
image processing operations to advanced techniques such as deep learning-based object
detection and image segmentation.
5. Community and Support: OpenCV has a large and active community of developers and
researchers contributing to its development and maintenance. This community provides
support through forums, documentation, tutorials, and contributions to the library itself.
6. Integration with Machine Learning Libraries: OpenCV can be integrated with popular
machine learning libraries such as TensorFlow, PyTorch, and scikit-learn, allowing developers
to combine computer vision techniques with machine learning models for more complex
applications.

4.4.4 MEDIAPIPE
What is MediaPipe?

MediaPipe is a Framework for building machine learning pipelines for processing time-series
data like video, audio, etc. This cross-platform Framework works on Desktop/Server, Android, iOS,
and embedded devices like Raspberry Pi and Jetson Nano.

A Brief History of MediaPipe


Since 2012, Google has used it internally in several products and services. It was initially developed
for real-time analysis of video and audio on YouTube. Gradually it got integrated into many more
products; the following are some.

1. Perception system in NestCam


2. Object detection by Google Lens
3. Augmented Reality Ads
4. Google Photos
5. Google Home
6. Gmail
7. Cloud Vision API, etc.

MediaPipe powers revolutionary products and services we use daily. Unlike power-hungry machine
learning Frameworks, MediaPipe requires minimal resources. It is so tiny and efficient that even
embedded IoT devices can run it. In 2019, MediaPipe opened up a new world of opportunity for
researchers and developers following its public release.

4.4.5 MediaPipe Toolkit

MediaPipe Toolkit comprises the Framework and the Solutions. The following diagram shows the
components of the MediaPipe Toolkit.

FIGURE 4.45:1: MediaPipe Toolkit
Framework

The Framework is written in C++, Java, and Obj-C, which consists of the following APIs.

1. Calculator API (C++).


2. Graph construction API (Protobuf).
3. Graph Execution API (C++, Java, Obj-C).

4.4.6 Graphs

The MediaPipe perception pipeline is called a Graph. Let us take the example of the first solution, Hands.
We feed a stream of images as input which comes out with hand landmarks rendered on the images.


The flow chart below represents the MP (Abbr. MediaPipe) hand solution graph.

FIGURE 4.46:1: MediaPipe Hands Solution Graph

4.4.7 NumPy

NumPy is a well-known general-purpose array-processing package. An extensive collection


of sophisticated mathematical functions makes NumPy powerful for processing large multi-
dimensional arrays and matrices. NumPy is very useful for handling linear algebra, Fourier
transforms, and random numbers. Other libraries like TensorFlow use NumPy at the backend
for manipulating tensors.
With NumPy, we can define arbitrary data types and easily integrate with most databases.
NumPy can also serve as an efficient multi-dimensional container for any generic data that is
in any datatype. The key features of NumPy include powerful N-dimensional array objects,
broadcasting functions, and out-of-box tools to integrate C/C++ and Fortran code.

Features:
1.Provides fast, precompiled functions for numerical routines
2.Array-oriented computing for better efficiency
3.Supports an object-oriented approach

4.5 MEAN AND STANDARD DEVIATION OF EDGED IMAGE

In the pre-processing phase, several steps are performed, such as removing the background and
converting the RGB image into grayscale, as done in the previous algorithm. The edge of the grayscale
image is then taken with a fixed threshold, i.e. 0.5, and the mean and standard deviation of the processed
image are calculated.

Mean is calculated by taking the sum of all the pixel values and dividing it by the total number of values in
the matrix. Mathematically, it is defined as:

X̄ = (1/n) Σ Xi ,  i = 1, …, n

The standard deviation can be calculated from the mean and is mathematically defined as:

S.D. = sqrt( (1/n) Σ (Xi − X̄)² ),  i = 1, …, n
The mean and standard deviation of each type of count gesture are given to the neural network for
training. In the end, the system is tested to see the success rate of classification this technique provides.
Mathematically, the input given to the neural network is defined as:

Input to NN= Mean (Edge (Binary image)) + S.D (Edge (Binary Image))

Figure 4.5:1: Mean & S.D Flow Chart
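Computing the two features is straightforward; a short illustrative sketch (not the thesis code) is:

import numpy as np

def mean_std_features(edged_image):
    # two features per gesture image: mean and standard deviation of all pixel values
    pixels = edged_image.astype(float).ravel()
    return np.array([pixels.mean(), pixels.std()])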


4.6 DIAGONAL SUM ALGORITHM

In the pre-processing phase, the steps mentioned in the methodology are performed: skin modeling,
removal of the background, conversion from RGB to binary, and labeling. The binary image format
also stores an image as a matrix but can only color a pixel black or white (and nothing in between); it
assigns a 0 for black and a 1 for white. In the next step, the sum of all the elements in every diagonal is
calculated. The main diagonal is represented as k = 0 in Figure 4.6 below; the diagonals below the main
diagonal are represented by k < 0 and those above it by k > 0.

Figure 4.6: Diagonal Sum

The gesture recognition system developed with this algorithm first trains itself with the diagonal sums
of each type of gesture count at least once, and then its power can be tested by providing it with a test
gesture at real time. Mathematically, the input given to the system at real time is:

Xi = sum of the elements of the i-th diagonal,  i = 1, …, n  (n = total number of diagonals)

Input = Σ Xi ,  i = 1, …, n
The flowchart of the algorithm is given below in Figure 4.9:

Figure 4.9: Diagonal Sum Flow Chart
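The per-diagonal sums can be computed with NumPy's trace at every offset k, matching the k < 0, k = 0 and
k > 0 notation of Figure 4.6; the sketch below is an illustration, not the thesis code:

import numpy as np

def diagonal_sums(binary_image):
    # one entry per diagonal: k = 0 is the main diagonal, k < 0 below it, k > 0 above it
    rows, cols = binary_image.shape
    return [np.trace(binary_image, offset=k) for k in range(-(rows - 1), cols)]

def diagonal_sum_feature(binary_image):
    # single scalar used for training and recognition: the sum over all diagonals
    return sum(diagonal_sums(binary_image))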


GRAPHICAL USER INTERFACE (GUI):

GUIDE is MATLAB's Graphical User Interface Development Environment. GUIDE is used for GUIs
containing various styles of figure windows with user interface objects. To create a GUI, each object must be
programmed to respond to user interaction.

4.61 GUI DESIGN


The next stage was to design a GUI that reflects the GUI requirements stated above. The following
Figure 4.10 shows the GUI design:

FIGURE 4.10: GRAPHICAL USER INTERFACE

The START button activates the web cam. The user can view his/her hand in the first box, which is above the
START button. On selecting any gesture from the drop-down list under the TRAINING button, the image is
captured and displayed in the right-hand box.

The first step is training, for which different images are captured and the respective gesture numbers are
selected from the drop-down menu in order to train the system. When an option is chosen from the
drop-down menu, the user is asked to enter a name for the training image. The recognition process can now
be started by capturing a test gesture and then clicking any of the algorithms under the METHOD button.
This displays a save window that stores the test gesture under the name you give it. After that, a progress
bar indicates the processing of the system (i.e. the preprocessing and recognition phases). The result of the
system appears in the RESULT textbox. The EXIT button enables the user to quit MATLAB.

4.62 Neural Network Training

If an NN-based algorithm is selected:

FIGURE 4.11: NN TRAINING

This neural network training window pops up when we select the Row Vector, Edging and Row Vector, or
Mean and Standard Deviation algorithm. It does not apply to the Diagonal Sum algorithm, which is
classified in real time.

4.63 : Performance of NN

Figure: 4.12: Performance Chart

4.64 Detection and Recognition of a gesture

FIGURE: 4.13: GRAPHICAL USER INTERFACE OUTPUT

For the Diagonal Sum algorithm we need to train 5 different gestures by selecting them from the drop-down
menu under TRAINING for real-time training, and for recognition we select the Diagonal Sum algorithm
under the METHOD drop-down menu. There is no neural network for this algorithm, but the remaining 3
algorithms use a neural network to train the system.

4.7 CODE FOR PRE-PROCESSING
import cv2
import numpy as np
import mediapipe as mp
import tensorflow as tf
from tensorflow.keras.models import load_model

# Initialize MediaPipe hand detection
mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=1, min_detection_confidence=0.7)
mpDraw = mp.solutions.drawing_utils

# Load the pre-trained gesture recognizer model
model = load_model('mp_hand_gesture')

# Load the class names (one gesture name per line)
f = open('gesture.names', 'r')
classNames = f.read().split('\n')
f.close()
print(classNames)

# Initialize the webcam
cap = cv2.VideoCapture(0)

while True:
    # Read each frame from the webcam
    ret, frame = cap.read()

    # Flip the frame horizontally so it acts as a mirror for the user
    frame = cv2.flip(frame, 1)
    framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Get the hand landmark prediction
    result = hands.process(framergb)

    className = ''
    height, width, _ = frame.shape
    x = width
    y = height

    # Post-process the result
    if result.multi_hand_landmarks:
        landmarks = []
        for handslms in result.multi_hand_landmarks:
            for lm in handslms.landmark:
                # Convert the normalized landmark coordinates to pixel coordinates
                lmx = int(lm.x * x)
                lmy = int(lm.y * y)
                landmarks.append([lmx, lmy])

            # Draw the landmarks on the frame
            mpDraw.draw_landmarks(frame, handslms, mpHands.HAND_CONNECTIONS)

            # Predict the gesture from the landmark coordinates
            prediction = model.predict([landmarks])
            classID = np.argmax(prediction)
            className = classNames[classID]

    # Show the prediction on the frame
    cv2.putText(frame, className, (10, 50), cv2.FONT_HERSHEY_SIMPLEX,
                1, (0, 0, 255), 2, cv2.LINE_AA)

    # Show the final output
    cv2.imshow("Output", frame)

    if cv2.waitKey(1) == ord('q'):
        break

# Release the webcam and destroy all active windows
cap.release()
cv2.destroyAllWindows()

output:

5.1 CONCLUSION :

The provided code demonstrates a real-time hand gesture recognition system using the MediaPipe
library for hand landmark detection and a pre-trained TensorFlow model for gesture recognition.
Here's a conclusion for the code:
The script begins by importing necessary libraries including OpenCV, NumPy, MediaPipe, and
TensorFlow. It then initializes the hand detection model from MediaPipe and loads a pre-trained
gesture recognition model using TensorFlow's Keras API. Additionally, it loads class names from a
file.
Inside the main loop, the script continuously captures frames from the webcam, flips them
horizontally (so the displayed view acts as a mirror for the user), and processes them to detect
hand landmarks using MediaPipe. It then extracts the landmark coordinates and draws them on the
frame.
Subsequently, it feeds the extracted landmarks into the loaded gesture recognition model to predict
the corresponding gesture. The predicted gesture class is then displayed on the frame using
OpenCV's cv2.putText() function.
Finally, the processed frame with the predicted gesture is displayed in a window. The loop
continues until the user presses the 'q' key to quit, at which point the script releases the webcam
and closes all active windows.
In conclusion, the provided code demonstrates a simple yet effective real-time hand gesture
recognition system using popular computer vision and machine learning libraries. With further
optimization and training, it could be integrated into various applications such as gesture-
controlled interfaces, virtual reality systems, or interactive gaming platforms.

CHAPTER 5

SYSTEM DESIGN

INTRODUCTION
The hand gesture recognition system has been tested with hand images under various conditions. The
performance of the overall system with different algorithms is detailed in this chapter. Examples of
accurate detection and cases that highlight limitations to the system are both presented, allowing an
insight into the strengths and weaknesses of the designed system. Such insight into the limitations of
the system is an indication of the direction and focus for future work.

System testing is actually a series of different tests whose primary purpose is to fully exercise the
computer-based system. It helps us in uncovering errors that were made inadvertently as the system
was designed and constructed. We began testing in the 'small' and progressed to the 'large'. This
means that early testing focused on the algorithms with a very small gesture set, and we ultimately moved to
a larger gesture set with improved classification accuracy.

5.1 ROW VECTOR ALGORITHM


The detection rate of the system achieved through this algorithm was 39%. It was noticed that the
performance of the system improved as the data set given to neural network (NN) for training was
increased. For each type of gesture, 75 images were given to the system for training. At the end, the
system was tested with 20 images of each kind of gesture. The results of the algorithm are given below
in Figure 5.1.

The row vector algorithm failed to give satisfactory results because the parameter (row vector) of two
different pictures sometimes happened to be the same for different gestures. This resulted in wrong
classification of some of the gestures, and the training process also took too much time, so a need was felt
for improvement in the parameter passed to the neural network (NN). This resulted in the evolution of my
edging and row vector passing algorithm.

5.2 EDGING AND ROW VECTOR PASSING ALGORITHM


The detection rate of the system achieved through this algorithm was 47%. It was noticed that the
performance of the system improved as the data set for training was increased. For each type of
gesture, 75 images were given to the system for training. At the end, the system was tested with 20
images of each kind of gesture. The results of the algorithm are given below in Figure 5.1.

The introduction of the edge parameter along with the row vector gained an improvement in performance,
but the self-shadowing effect in the edges deteriorated the detection accuracy, and it was again necessary to
improve the quality of the parameter passed to the neural network (NN). It also has the drawback of being
time consuming, taking more than the usual time for the training process. This gave birth to the mean and
standard deviation of edged image algorithm.

5.3 MEAN AND STANDARD DEVIATION OF EDGED IMAGE


The detection rate of the system achieved through this algorithm was 67%. It was noticed that the
performance of the system improved as the data set for training was increased. For each type of
gesture, 75 images were given to the system for training. At the end, the system was tested with 20
images of each kind of gesture. The implementation details of the algorithm are given below in Figure
5.1.

The mean and standard deviation algorithm did help us attain an average result, and it also took
less time for the training process, but the performance was still not as good as I wanted. The main
reason was the variation in skin colors and light intensity.

5.4 DIAGONAL SUM ALGORITHM


The poor detection rate of the above algorithms resulted in the evolution of the diagonal sum algorithm,
which uses the sum of the diagonals to train and test the system. This is a real-time classification
algorithm: the user needs to train the system first and can then try to recognize gestures. The user has to
give each gesture to the system at least once for the training process. The detection rate achieved by this
algorithm was 86%. For each type of gesture, multiple images were given to the system for training.
After every training process the system was tested 20 times. At the end, the system was tested
with 20 images of each kind of gesture. The results of the algorithm are given below in Figure 5.1.

The diagonal sum algorithm also leaves room for improvement, as its detection accuracy was not 100%,
but it was good.

Figures 5.1: Performance Percentage
5.5 PROCESSING TIME
Evaluation of the time taken in an image processing and recognition procedure is very important for assessing
results and performance, and it shows the tendency of all the techniques I used to recognize hand gestures. A
few factors influence the results, such as the quality of the image, the size of the image (e.g. 640x480) and
the parameters of the recognition techniques or algorithms. In the first three algorithms, i.e. Row Vector,
Edging and Row Vector, and Mean and Standard Deviation, a neural network is used for training and testing,
whereas the Diagonal Sum algorithm does real-time classification without an NN, so it takes less time. The
times reported include the training phase: for an image given to the system for testing, they include training,
testing, feature extraction and recognition of that particular image. The following is a comparison of the
processing times for all algorithms:

Algorithm          | Row Vector (NN) | Edging and Row Vector (NN) | Mean & S.D (NN) | Diagonal Sum
Processor Speed    | Intel 2.53 GHz  | Intel 2.53 GHz             | Intel 2.53 GHz  | Intel 2.53 GHz
Test Image Size    | 640x480         | 640x480                    | 640x480         | 640x480
Time               | 240 sec         | 240 sec                    | 180 sec         | 60 sec

Table 1: Processing Time (Testing)


5.6 UML DIAGRAMS :

UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling
language in the field of object-oriented software engineering. The standard is managed, and
was created by, the Object Management Group. The goal is for UML to become a common
language for creating models of object-oriented computer software. In its current form UML is
comprised of two major components: a meta-model and a notation. In the future, some form of
method or process may also be added to, or associated with, UML. The Unified Modeling
Language is a standard language for specifying, visualizing, constructing and documenting
the artifacts of a software system, as well as for business modeling and other non-software
systems. The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems. The UML is a very important part of
developing object-oriented software and the software development process. The UML uses
mostly graphical notations to express the design of software projects.

5.6.1 USE CASE DIAGRAM :

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a Use-case analysis. Its purpose is to present a graphical overview of the
functionality provided by a system in terms of actors, their goals (represented as use cases), and any
dependencies between those use cases. The main purpose of a use case diagram is to show what
system functions are performed for which actor. The roles of the actors in the system can also be depicted.

FIGURE 5.6.1 USE CASE DIAGRAM

5.6.2 CLASS DIAGRAM :

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static
structure diagram that describes the structure of a system by showing the system's classes, their
attributes, operations (or methods), and the relationships among the classes. It explains which class
contains information.

GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so that they can develop and
exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development process.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks, patterns and
components.
7. Integrate best practices.

FIGURE 5.6.2 CLASS DIAGRAM :

5.6.3 SEQUENCE DIAGRAM :

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and
timing diagrams.

FIGURE 5.6.3 SEQUENCE DIAGRAM :

5.6.4 ACTIVITY DIAGRAM :


Activity diagrams are graphical representations of workflows of stepwise activities and actions with support
for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams can be used to
describe the business and operational step-by-step workflows of components in a system. An activity
diagram shows the overall flow of control.

FIGURE 5.6.4 ACTIVITY DIAGRAM

The above activity diagram visually describes the sequence of activities or actions that occur within a
System .
All the activities briefly discussed below:
1.The User starts by collecting a dataset.
2.The collected dataset is sent to the DataPreprocess activity for preprocessing.
3.The DataPreprocess activity, once completed, loads the machine learning models.
4.The User interacts with the Model to predict inputs based on the loaded models.
5.Meanwhile, the Admin views users' information.

6.The Admin can activate users based on the viewed information.

5.6 ROTATION VARIANT


The influence of rotating the same gesture by different degrees also plays an important role in the gesture
recognition process. Consider the first three methods, i.e. Row Vector, Edging and Row Vector, and Mean &
S.D. There are multiple images of different people at different angles in the training database, so the neural
network is able to learn to classify different variations of gesture position. Increasing the number of training
patterns gives more effective results, because I observed that the neural network is able to generalize better
if we increase the number of gesture patterns made by different people, thereby gaining a better ability to
extract the features of a specific gesture rather than the features of the same gesture made by a single person.
The main motivation for using a neural network in pattern recognition is that once the network is properly
trained, the system produces good results even in the presence of incomplete patterns and noise.

In real-time classification, e.g. with the Diagonal Sum algorithm, the influence of rotation does matter, but it
depends on the degree of rotation. The system can easily classify a gesture if the rotation is between 10 and
15 degrees, as can be seen in Figure 5.2 below. But if the degree of rotation is greater than this, the gesture
may be misclassified. Suppose that during the training process we give the system a gesture oriented
vertically, and for testing the same gesture is given at a noticeably different angle (see Figure 5.3 below).
Then it is possible that the diagonal sum value will change and the output will be misclassified. Another
cause of misclassification in real time is the difficulty of determining, during training, where one gesture
ends and another begins.

Figure 5.2: Degree of Rotation classification

Figure 5.3: Degree of Rotation misclassification

5.7 EXPERIMENTS AND ANALYSIS

The experiments performed show the achieved results and evaluate the gesture recognition system presented
in Chapter 4. The experiments are divided into two categories to better analyze the system's performance and
capabilities. The more general approach is a user-independent system developed to interact with multiple
users with different skin colors and hand shapes. Attempting an independent multi-user system is an
important goal, as the system can then be used by various users.

There are two main aims of this work: to detect the hand, and to recognize hand gestures with the neural
network and with real-time classification. The first aim is to detect the hand with different skin tones, using
the explicitly defined skin region. The second is gesture recognition with the neural network and with
real-time classification by the different algorithms. The system was designed to test the hypothesis that the
detection and recognition rates would increase with:

• Hand detection across different skin tones
• More training patterns used to train the neural network
• Gesture recognition

The analysis of each experiment mentioned above is presented one by one in the above sequence.

5.71 EFFECT WITH DIFFERENT SKIN TONES:

The hand detection experiment with different ethnic groups was designed to test the hypothesis that the
detection rate would increase. If we are able to detect skin, then ultimately the hand will be detected.
Using the explicitly defined skin region classifier makes it possible to detect almost all kinds of skin
from different ethnic groups.

Figure 5.4: Ethnic Group Skin Detection

Figure 5.4 is a graph showing the effect of using the explicit skin classifier with different ethnic groups.
10 images of each group were given to it, and it gives an almost 100% result. It has the ability to detect
almost all kinds of skin tones. The explicitly defined skin region makes this possible, and the experiment
has confirmed this with different skin tones.

It was observed that the detection accuracy varied with different skin colors and light intensity, but using
the skin classifier makes the system efficient and able to detect different skin tones.

5.72 EFFECT OF TRAINING PATTERN:

The hypothesis of this experiment was that misclassification would decrease as the number of training
patterns increased. There are multiple images of different people at different angles in the training
database, so the neural network is able to learn to classify different variations of gesture position.
Increasing the number of training patterns gives more effective results.

Initially I tried to train the neural network with 30 images, i.e. 6 images of each individual gesture, but the
classification result was poor. It was observed that as the number of patterns from different people was
increased, the classification result improved, as can be seen in Figure 5.5.

Figure 5.5: Mix Training Pattern

This experiment shows that the neural network is able to generalize better if we increase the number of
gesture patterns made by different people, thereby gaining a better ability to extract the features of a
specific gesture rather than the features of the same gesture made by a single person. The main motivation
for using a neural network in pattern recognition is that once the network is properly trained, the system
produces good results even in the presence of incomplete patterns and noise.

5.73 GESTURE RECOGNITION:

The hypothesis of this experiment was that the system could recognize the gestures that the user gave to it,
either after training with the neural network or with real-time classification. The database to test the hand
gesture recognition system consisted of 75 static images with different backgrounds. The static images, of
size 640x480 pixels, were collected from the camera. I grabbed gestures from different people and
implemented the different methods of recognition with more than one user; ultimately the system is able to
recognize the gestures of different people.

The accuracy percentage was measured per hand gesture for different users. Figure 5.6 shows the
effectiveness of all the methods implemented. The classification of each gesture can be seen from the
following results, in which the classification percentage varies with the different methods.

Figure 5.6: Classification Percentage

This experiment shows that the system is able to recognize the hand gestures of multiple users with different skin colors. The classification accuracy can be seen while classifying between the five gestures. These results are not enough to establish a conclusive trend for the gesture recognition experiment, but the most important point is that gesture recognition was possible using the neural network and real-time classification.
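A small helper like the one below, assuming the true and predicted labels of the 75 test images are available, is enough to tabulate the per-gesture classification percentages of the kind shown in Figure 5.6. The example labels are hypothetical.

from collections import Counter

def per_gesture_accuracy(true_labels, predicted_labels):
    # Percentage of correctly classified samples for each gesture label.
    totals, correct = Counter(true_labels), Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            correct[t] += 1
    return {label: 100.0 * correct[label] / totals[label] for label in totals}

# Hypothetical results for the five count gestures:
truth = ["one", "two", "three", "four", "five"] * 3
preds = ["one", "two", "three", "four", "five",
         "one", "two", "two", "four", "five",
         "one", "two", "three", "four", "four"]
print(per_gesture_accuracy(truth, preds))   # e.g. "three" and "five" at about 67%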

5.8 FAILURE ANALYSIS

ORIENTATION/ ROTATION

There are a few factors that can decrease the efficiency of the system, especially in real-time classification. Problems occur during training when one gesture ends and another begins, and when the user presents a gesture at a different angle the system fails to recognize it.

In different hand images the hand can have a different orientation, and this affects recognition accuracy: when images are captured from the camera for training and testing, the distance of the camera from the hand and the rotation of the wrist make a difference. It is therefore essential to handle as many degrees of freedom as possible in order to make the recognition process realistic.

Orientation must be dealt with in the preprocessing phase by framing every image: finding the main axis of the hand, calculating its orientation, and reorienting it. For the rotation of the wrist there are several ways to deal with the problem, e.g. using the COG (centre of gravity). In the preprocessing phase every training and testing image could be rotated clockwise or anticlockwise about an absolute orientation point.
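A rough sketch of this reorientation idea is shown below, assuming a binary hand mask is already available: the centre of gravity and the main axis are estimated from the pixel covariance, and the mask is rotated so the main axis becomes vertical. This illustrates the proposed fix rather than the implemented preprocessing.

import numpy as np
from scipy import ndimage

def normalise_orientation(mask):
    # Rotate a binary hand mask so its main axis is (approximately) vertical.
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                 # centre of gravity (COG)
    cov = np.cov(np.vstack([xs - cx, ys - cy]))   # 2x2 covariance of pixel coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]        # direction of the main axis
    angle = np.degrees(np.arctan2(major[1], major[0]))
    # Rotate so the main axis maps onto the vertical; the sign of the angle
    # may need adjusting for a given image coordinate convention.
    return ndimage.rotate(mask.astype(float), angle - 90, reshape=False, order=0) > 0.5

# Usage: upright_mask = normalise_orientation(binary_hand_image)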

5.9 CONVENTIONAL TEST STRATEGIES APPLIED

5.91 RECOVERY TESTING


A system can fail in many ways, but recovery must be performed properly. For example, when a person trains the system with his gestures and the training fails, the system gives an error message indicating that it was not properly trained, and it keeps doing so until it gets a valid gesture (one that the system can accept for differentiation). Only when the system is properly trained can we expect it to give accurate results.

5.92 SENSITIVITY TESTING


Sensitivity testing looks for invalid input classes that may cause instability or improper processing. It was found during sensitivity testing that if the system was fully trained for all gesture types, it gave accurate results; if it was trained for only one or two gestures and then tested, it performed erroneous processing.

5.10 CONCLUSION:
In conclusion, the chapter presents a comprehensive analysis of the hand gesture recognition
system's performance using various algorithms and testing methodologies. The evaluation aims to
provide insights into the system's strengths, weaknesses, and areas for future improvement. Here's
a summary of the key findings:
1. Algorithm Performance Evaluation: The chapter discusses the performance of different
algorithms such as the Row Vector Algorithm, Edging and Row Vector Passing Algorithm,
Mean and Standard Deviation of Edged Image Algorithm, and Diagonal Sum Algorithm.
Each algorithm was tested with a dataset and evaluated based on detection rates. The results
indicated varying levels of accuracy and processing time for each algorithm.
2. Effect of Rotation Variant: The influence of rotation on gesture recognition was analyzed,
highlighting its impact on classification accuracy. While certain algorithms showed
robustness to rotation within a specific range, significant rotation could lead to
misclassification. Techniques to address rotation issues, such as image preprocessing and
feature extraction, were proposed.
3. Experiments and Analysis: The chapter describes experiments conducted to evaluate the
system's performance under different conditions, including skin tones, training patterns, and
gesture recognition accuracy. The results demonstrated the system's ability to detect hands
with various skin tones and recognize gestures accurately, especially with increased training
patterns.
4. Failure Analysis: Potential failure scenarios, such as orientation and rotation issues, were
identified and discussed. Strategies to mitigate these issues, such as image framing and
rotation correction, were proposed to improve system robustness.
5. Conventional Test Strategies: The chapter highlights the importance of recovery testing and
sensitivity testing to ensure the system's reliability and stability. These test strategies help
identify and address potential failure points, ensuring accurate and consistent performance.
In conclusion, while the hand gesture recognition system showed promising results in certain aspects, there are still challenges to overcome, particularly in handling orientation and rotation variations. Future work could focus on refining algorithms, optimizing training processes, and incorporating advanced techniques to improve the system's accuracy and robustness across diverse conditions and user scenarios.

CHAPTER 6

TESTING OF PROJECT
6.0 INTRODUCTION
This chapter introduces the concept of testing in software development, emphasizing its
importance in ensuring that software systems meet requirements and function correctly. It outlines
various types of tests, including unit testing, integration testing, functional testing, and acceptance
testing, each serving specific purposes in validating different aspects of the software.
Unit testing is described as testing individual software units to ensure their internal logic functions
properly, typically done after completing each unit before integration. Integration testing focuses
on testing integrated software components to ensure they function as a single program. Functional
testing verifies that functions tested are available as specified by requirements and documentation,
covering areas such as valid input, invalid input, functions, output, and system/procedures.
Acceptance testing involves testing by end-users to ensure the system meets functional
requirements.
The chapter also introduces two approaches to testing: white box testing and black box testing. White
box testing involves testing with knowledge of the software's inner workings and structure, while
black box testing is performed without knowledge of the internal implementation, treating the
software as a "black box."
Furthermore, the chapter outlines the objectives and features to be tested in a software system,
emphasizing the importance of verifying correct functionality, proper formatting of entries,
prevention of duplicate entries, and accurate navigation through links.
Overall, this chapter provides a comprehensive overview of testing in software development,
highlighting its role in ensuring software quality and reliability.

6.1 EXPLANATION OF TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, subassemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of test; each test type addresses a specific testing requirement.

6.2 Types Of Tests

6.2.1 Unit testing

Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application; it is done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and exercise a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
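As an illustration of unit testing in this project's context, the hypothetical pytest cases below exercise a stand-alone helper in isolation; the diagonal_sum function, its name and its behaviour are assumptions made only for the sake of the example.

import numpy as np

def diagonal_sum(binary_image):
    # Hypothetical unit under test: sum of the main-diagonal pixels of a binary image.
    return float(np.trace(binary_image))

def test_diagonal_sum_of_identity_matrix():
    # Each unit test checks one small, well-defined piece of logic in isolation.
    assert diagonal_sum(np.eye(5, dtype=np.uint8)) == 5.0

def test_diagonal_sum_of_empty_image():
    assert diagonal_sum(np.zeros((4, 4), dtype=np.uint8)) == 0.0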

6.2.2 Integration testing

Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

Software integration testing is the incremental integration testing of two or more integrated software components on a single platform to produce failures caused by interface defects. The task of the integration test is to check that components or software applications (e.g. components in a software system or, one step up, software applications at the company level) interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects were encountered.

Acceptance Testing: User acceptance testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects were encountered.

6.2.3 Functional testing

Functional tests provide systematic demonstrations that functions tested are available as specified by the
business and technical requirements, system documentation, and user manuals. Functional testing is centered
on the following items:
Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures : interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

System Test: System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.

6.3 White Box Testing

White Box Testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.

6.4 Black Box Testing


Black Box Testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, as with most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot "see" into it. The test provides inputs and responds to outputs without considering how the software works.

Unit Testing: Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.

Test strategy and approach: Field testing will be performed manually and functional tests will be written in detail.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested
• Verify that the entries are of the correct format.
• No duplicate entries should be allowed.
• All links should take the user to the correct page.

6.5 CONCLUSION:

This chapter provides a comprehensive overview of the testing methodologies employed in software development. It begins by highlighting the fundamental purpose of testing: to discover errors and ensure that software systems meet requirements and user expectations without failing in an unacceptable manner.
The types of tests discussed include:
1. Unit Testing: This involves testing individual software units to validate internal program
logic and ensure that program inputs produce valid outputs. Unit tests focus on specific
business processes, applications, or system configurations, verifying that each unique path
performs accurately as per documented specifications.
2. Integration Testing: Integration tests are designed to test integrated software components to
ensure they function as one program. These tests expose problems that may arise from the
combination of components, demonstrating that the combination is correct and consistent.
3. Functional Testing: Functional tests systematically demonstrate that functions tested are
available as specified by business and technical requirements, system documentation, and
user manuals. These tests focus on validating valid input acceptance, rejection of invalid
input, exercising identified functions, exercising identified classes of application outputs, and
invoking interfacing systems or procedures.
4. System Testing: System testing ensures that the entire integrated software system meets requirements and tests a configuration to ensure known and predictable results. It is based on process descriptions and flows, emphasizing pre-driven process links and integration points.
Additionally, the chapter discusses two approaches to testing:
• White Box Testing: Testing where the tester has knowledge of the inner workings, structure, or language of the software, used to test areas unreachable from a black-box level.
• Black Box Testing: Testing without knowledge of the inner workings, structure, or language of the module being tested. Inputs and outputs are provided without considering the internal functioning of the software.
Finally, the chapter concludes by outlining test objectives and features to be tested, emphasizing the
importance of verifying correct functionality, proper formatting of entries, prevention of duplicate
entries, and accurate navigation through links.
Overall, this chapter provides a comprehensive understanding of testing methodologies, emphasizing their
critical role in ensuring software quality and reliability.

CHAPTER 7

RESULTS AND ANALYSIS

7.1 CONCLUSION

This chapter summarizes my work at every stage of the project. At the time I started my thesis, I had only a rough idea of how I would take it from a topic on paper to a real product. Thanks to my knowledge of Computer Vision and Biometrics I had a background in the image-processing field, though not at an expert level, but my constant effort helped me see it through and succeed eventually.
As in every project, research was of the utmost importance, so I spent a considerable amount of time going through the background literature. I looked at various approaches to the thesis and developed four different methods: the row vector algorithm, the edging and row vector passing algorithm, the mean and standard deviation of edged image algorithm, and the diagonal sum algorithm.

Each of these algorithms was tried with neural networks; their performance rates increase in the order listed above.

The first limitation discovered in all the algorithms used with neural networks was that their performance depended on the amount of training data provided. The system worked more efficiently after being trained on a larger dataset than on a smaller one.

The row vector algorithm used initially was a rather weak approach to classification, as experiments showed that the row vectors of two different images could happen to be the same.

In the edging and row vector passing algorithm, an edging parameter was introduced in addition to the row vector to improve the gesture classification accuracy, but it was found that, due to the self-shadowing effect present in edges, the detection rate did not improve sufficiently.

The next parameters tried for classification were mean and standard deviation. They also failed to give
satisfactory results (i.e. above 60%) but still they were among the best parameters used for detection
with neural networks.
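For illustration, the mean and standard deviation features can be sketched as below, using a Sobel gradient magnitude as a stand-in edge detector; the thesis's actual edge detector and normalisation may differ.

import numpy as np
from scipy import ndimage

def mean_std_of_edges(gray):
    # Edge magnitude via Sobel gradients (an assumed edge detector),
    # reduced to a two-value feature: mean and standard deviation.
    gx = ndimage.sobel(gray.astype(float), axis=1)
    gy = ndimage.sobel(gray.astype(float), axis=0)
    edges = np.hypot(gx, gy)
    return float(edges.mean()), float(edges.std())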

Due to the unusual behavior of the neural network with all the mentioned parameters, the diagonal sum algorithm was finally implemented in real time. The system was tested with 60 pictures and the detection rate was found to be 86%. The strengths and weaknesses of gesture recognition using the diagonal sum have been presented and discussed. With the implemented system serving as an extendible foundation for future research, extensions to the current system have been proposed.
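The diagonal sum feature itself can be sketched as follows, assuming the preprocessed gesture is a binary image: the sum of every diagonal (one per offset) forms a feature vector, and a test gesture is assigned to the closest stored training vector. The matching rule shown is an assumption for illustration, not necessarily the exact rule implemented.

import numpy as np

def diagonal_sums(binary_image):
    # An H x W image has H + W - 1 diagonals (offsets -(H-1) .. W-1);
    # summing each one gives a fixed-length feature vector for that image size.
    h, w = binary_image.shape
    return np.array([binary_image.trace(offset=k) for k in range(-(h - 1), w)], dtype=float)

def classify(test_vec, train_vecs):
    # train_vecs: dict mapping gesture label -> stored diagonal-sum vector.
    return min(train_vecs, key=lambda label: np.linalg.norm(test_vec - train_vecs[label]))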

7.2 FUTURE WORK

The system could also be made smart enough to be trained for only one or two gestures, rather than all of them, before being made ready for testing. This would require only a few changes in the current interface code, which were not made due to shortage of time.

The one-time training constraint of the real-time system could be removed if the algorithm were made efficient enough to work with all skin types and lighting conditions, which does not seem feasible at present. Framing with the COG (centre of gravity) to control the orientation factor could make the system more suitable for real applications.

The system's preprocessing speed could be improved if the code were developed in VC/VC.Net.

