e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:04/Issue:07/July-2022 Impact Factor- 6.752 www.irjmets.com
HAND GESTURE RECOGNITION USING CNN
Shivamurthy RC*1, Musaveer Khan*2, Ruchitha M*3,
Sathvik Nidhi YT*4, Yadunandan AR*5
*1Head of the Department, Department of Computer Science and Engineering, Maharaja
Institute of Technology, Mysore, Karnataka, India.
*2,3,4,5Department of Computer Science and Engineering, Maharaja Institute of
Technology, Mysore, Karnataka, India.
ABSTRACT
Owing to its adaptability and user-friendliness, hand gesture recognition is one of the most actively researched
areas of the human-computer interface. Gesture recognition is being used to develop systems for operating
devices and for communicating with persons who are impaired. Variations in illumination, irregular
backgrounds, variations in the size and shape of a user's hand, and high inter-class similarity among hand
gesture poses present significant hurdles to the creation of an effective hand gesture recognition algorithm.
This paper examines a user-independent static hand gesture recognition method that uses specially crafted
features, namely a histogram of oriented gradients (HOG) and a deep convolutional neural network (CNN).
The deep features are extracted using two fully connected layers.
I. INTRODUCTION
Since ancient times, gesture has been the first mode of communication; verbal communication developed later
with human civilization, but non-verbal communication remains equally significant. Such non-verbal
communication is not only used by physically challenged persons but can also be employed efficiently in
applications such as 3D gaming, aviation, and surveying. It is the most natural way to interact with a computer
without any peripheral devices. Researchers are still developing new, robust, and efficient hand gesture
recognition techniques. The major steps in designing such a system are data acquisition, segmentation and
tracking, feature extraction, and gesture recognition, with different methodologies and sub-steps associated
with each step.
Various segmentation, tracking, feature extraction, and recognition techniques are studied and analyzed here,
and a comparative study of the hand gesture recognition techniques presented to date is reviewed. To offer
new possibilities for interacting with machines and to design more natural, more intuitive interactions with
computing machines, our research aims at the automatic interpretation of gestures based on computer vision.
In this project, we propose a technique that commands the computer using six static and eight dynamic hand
gestures. Hand gesture recognition has received great attention in recent years because of its manifold
applications and its ability to support efficient human-computer interaction. This project presents a survey of
recent hand gesture recognition systems, together with the key issues and challenges of such systems.
The aim of the work is to improve the recognition of human hand postures in a human-computer interaction
application, to reduce computation time, and to improve user comfort with the hand postures employed. The
authors developed an application for PC mouse control; based on the proposed algorithm, the hand-pad colour,
and the selected hand features, the application performs well with respect to computation time. The user gains
increased comfort from the proposed hand postures used to control the system, and hand gestures can be used
to identify and express signs as language through a speech synthesizer.
II. METHODOLOGY
2.1 Convolutional Neural Network
In the field of deep learning, the convolutional neural network (CNN) is a class of deep neural network most
commonly deployed for image analysis and recognition. A CNN uses a special mathematical operation known
as convolution and is a powerful algorithm for image processing.

These are currently the best algorithms we have for the automated processing of images, and many companies
use them for tasks such as identifying the objects in an image.
2.1.1 CNN’s basic architecture
A CNN architecture consists of two key components:
• A convolution tool that separates and identifies the distinct features of an image for analysis in a process
known as Feature Extraction.
• A fully connected layer that takes the output of the convolution process and predicts the image’s class based
on the features retrieved earlier.
2.1.2 Working of CNN
The CNN is made up of three types of layers: convolutional layers, pooling layers, and fully connected (FC)
layers.

Figure 1: Working of CNN


Convolutional layer: This is the first layer of the CNN and is responsible for extracting features from the input
images. In this layer, the mathematical convolution operation is performed between the input image and a
filter of a specific size M×M.
Pooling layer: The pooling layer is responsible for reducing the spatial size of the convolved feature. By
significantly reducing the dimensions of the data, it lowers the computing power required to process it.
Fully connected layer: The fully connected (FC) layer comprises the weights and biases together with the
neurons and connects the neurons of two separate layers. The last several FC layers of a CNN architecture are
usually positioned just before the output layer.
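The three layer types above can be sketched in a few lines of NumPy. This is a minimal illustration of the operations themselves, not the network used in this project:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid convolution (cross-correlation) of a 2-D image with an MxM filter."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # slide the filter over the image and sum the element-wise products
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Max pooling: keep the largest value in each size x size window."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def fully_connected(features, weights, bias):
    """FC layer: flatten the pooled feature map and apply weights plus bias."""
    return features.reshape(-1) @ weights + bias
```

Chaining `conv2d`, `max_pool`, and `fully_connected` in that order mirrors the feature-extraction-then-classification structure described in section 2.1.1.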
III. MODELING
3.1 System design
Three-tier is a client-server architecture in which the user interface, the business process (business rules), and
data storage and access are developed and maintained as independent modules, most often on separate
platforms.
The architecture of the application is based on the three-tier model. The three logical tiers are:
• Presentation tier - XML forms, images.
• Middle tier - Java classes.
• Data tier - Firebase database.
The main reasons for considering a three-tier architecture for the application are as follows:
1. Flexibility.
2. Reusability.
3. Security.
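As a rough sketch of this separation, the tiers can be modelled as independent modules that only talk to the tier below them. The class and method names here are illustrative Python stand-ins, not the project's actual Java classes or Firebase schema:

```python
class DataTier:
    """Data tier: owns storage and data access (here a plain dict)."""
    def __init__(self):
        self._store = {}

    def save(self, key, value):
        self._store[key] = value

    def load(self, key):
        return self._store.get(key)

class MiddleTier:
    """Middle tier: business rules; talks to the data tier only."""
    def __init__(self, data):
        self.data = data

    def record_gesture(self, label):
        count = (self.data.load(label) or 0) + 1
        self.data.save(label, count)
        return count

def presentation(middle, label):
    """Presentation tier: formats results for display, no business logic."""
    return f"gesture '{label}' seen {middle.record_gesture(label)} time(s)"
```

Because each tier depends only on the one below it, a tier can be swapped (for example, replacing the dict with a real database) without touching the others, which is the flexibility and reusability argued for above.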


Figure 2: Three-tier architecture


3.1.1 Detailed design
Detailed design starts after the system design phase is completed and the system design has been certified
through the review. The goal of this phase is to develop the internal logic of each of the modules identified
during system design.
In system design the focus is on identifying the modules, whereas during detailed design the focus is on
designing the logic for those modules. In other words, system design is concerned with what components are
needed, while detailed design addresses how those components can be implemented in software.
3.1.2 Data flow diagram
A DFD graphically represents the functions, or processes, which capture, manipulate, store, and distribute
data between a system and its environment and between the components of a system. This visual
representation makes it a good communication tool between the user and the system designer. The structure
of a DFD allows starting from a broad overview and expanding it into a hierarchy of detailed diagrams.

Figure 3: Data flow diagram (processes: load data sets, upload/load image, preprocessing, segmentation,
comparison using KNN & SVM, gesture recognition)


3.1.3 Use case diagram
A use case diagram is a graph of actors and a set of use cases enclosed by a system boundary, with
communication associations between the actors and the use cases. The use case diagram describes how a
system interacts with outside actors; each use case represents a piece of functionality that the system provides
to its users. A use case is drawn as an ellipse containing the name of the use case, and an actor is shown as a
stick figure with the name of the actor below it. Use cases are used during the analysis phase of a project to
identify and partition system functionality. They separate the system into actors and use cases. Actors
represent roles played by users of the system; those users can be humans, other computers, pieces of
hardware, or even other software systems.

Figure 4: Use case diagram (actor: User; use cases: load training data, upload images, preprocessing,
comparison using KNN & SVM, feature extraction, gesture recognition)


3.1.4 Sequence diagram
A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes
involved in the scenario and the sequence of messages exchanged between them to carry out the functionality
of the scenario. Sequence diagrams are sometimes called event diagrams or event scenarios.
Figure 5: Sequence diagram (User and Server interaction: load training data set, load image, preprocessing,
segmentation/features, classification using KNN & SVM, gesture recognition)


IV. EXPERIMENTS AND DISCUSSION
Implementation
Implementation is the process of converting a new or revised system design into an operational one. The
objective is to put the new or revised system that has been tested into operation while holding costs, risks, and
personal irritation to a minimum. A critical aspect of the implementation process is to ensure that there will
be no disruption to the functioning of the organization. The best method for keeping control while
implementing any new system is to use well-planned tests for all new programs. Before production files are
used to test live data, test files must be created on the old system, copied over to the new system, and used for
the initial test of each program.
Implementation is the most crucial stage in achieving a successful system and in giving users confidence that
the new system is workable and effective. Here, a modified application is implemented to replace an existing
one. This type of conversion is relatively easy to handle, provided there are no major changes in the system.
System Implementation
There are three major types of implementation, of which the following are proposed for this project.
In the initial phase of this project, an image is captured through a webcam; the gestures are converted into a
textual representation, and using a text-to-speech synthesizer the resulting text can be converted into the
user's habitual language, i.e., speech.
Following are the Algorithms used for this project:
HSV segmentation (Hue Saturation Value): Palm identification and mapping. LBP (Local Binary Pattern): Used
for edge detection of hand.
CNN (Convolution Neural Networks): To detect depth and intensity for preprocessing.
GMM (Gaussian mixture model): To detect image of the hand from the video sequences.
HMM (Hidden Markov Model): Gesture recognition to string conversion and string conversion to speech
synthesizer
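These stages can be chained as a simple sequential pipeline. In the sketch below each lambda is only a placeholder standing in for the real algorithm named in its label, shown to illustrate the data flow rather than the algorithms themselves:

```python
def run_pipeline(frame, stages):
    """Pass the input through each (name, function) stage in order."""
    for name, fn in stages:
        frame = fn(frame)
    return frame

# Placeholder stages: each lambda stands in for the named algorithm above.
stages = [
    ("hsv_segmentation", lambda x: x + "->hsv"),
    ("lbp_edge_detection", lambda x: x + "->lbp"),
    ("cnn_preprocessing", lambda x: x + "->cnn"),
    ("gmm_hand_detection", lambda x: x + "->gmm"),
    ("hmm_gesture_to_text", lambda x: x + "->hmm"),
]
```

Structuring the system this way keeps each algorithm replaceable in isolation, which matches the modular implementation methodology described below.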
Parallel conversion type of implementation
In this type of implementation, both the current system and the proposed system run in parallel. This
continues until the user gains complete confidence in the proposed system, at which point the current system
is cut off.
Implementation methodology of the project
The project is implemented in a modular approach. Each module is coded as per the requirements and tested,
and this process is iterated until all the modules have been thoroughly implemented.
The hand is detected using the background subtraction method, and the result of hand detection is
transformed into a binary image. Then the fingers and palm are segmented to facilitate finger recognition, and
the fingers are detected and recognized. Finally, hand gestures are recognized using a simple rule classifier.
Hand detection
The images are captured with a normal camera under the same conditions, and the background of these
images is identical, so it is easy and effective to detect the hand region from the original image using the
background subtraction method. However, in some cases other moving objects are included in the result of
background subtraction. Skin colour can then be used to discriminate the hand region from the other moving
objects. The colour of the skin is measured with the HSV model; the HSV (hue, saturation, value) of the skin
colour is 315, 94, and 37, respectively.
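A minimal sketch of this detection step, assuming a static grayscale background frame and the skin HSV value quoted above. The difference threshold and the tolerance band are illustrative choices, not values from the paper:

```python
import colorsys
import numpy as np

def background_subtract(frame, background, thresh=30):
    """Binary hand mask from a grayscale frame and its static background."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return (diff > thresh).astype(np.uint8)

# Band around the skin HSV value quoted above (H=315 deg, S=94%, V=37%);
# the tolerances are illustrative, not taken from the paper.
SKIN_H, SKIN_S, SKIN_V = 315 / 360, 0.94, 0.37

def is_skin(r, g, b, tol=(0.05, 0.25, 0.25)):
    """True if an RGB pixel falls inside the skin-colour band in HSV space."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    dh = min(abs(h - SKIN_H), 1 - abs(h - SKIN_H))  # hue is circular
    return dh < tol[0] and abs(s - SKIN_S) < tol[1] and abs(v - SKIN_V) < tol[2]
```

Applying `is_skin` to the pixels kept by `background_subtract` discards the non-skin moving objects described above.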
Fingers and palm segmentation
The output of the hand detection is a binary image in which the white pixels are the members of the hand
region, while the black pixels belong to the background. Then, the following procedure is implemented on the
binary hand image to segment the fingers and palm.
(i) Palm Point: The palm point is defined as the centre point of the palm. It is found by the distance transform
method. The distance transform, also called a distance map, is a representation of an image in which each
pixel records the distance between itself and the nearest boundary pixel.
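The palm-point search can be illustrated with a brute-force distance transform on a small binary mask; a real implementation would use an optimized routine such as OpenCV's distanceTransform. The returned distance is also the radius of the inner circle described in step (ii):

```python
import numpy as np

def palm_point(mask):
    """For every hand pixel, compute the distance to the nearest background
    pixel; the hand pixel with the largest distance is the palm point, and
    that distance is the maximal inner-circle radius."""
    bg_y, bg_x = np.nonzero(mask == 0)
    best, best_d = None, -1.0
    for y, x in zip(*np.nonzero(mask)):
        d = np.min(np.hypot(bg_y - y, bg_x - x))
        if d > best_d:
            best_d, best = d, (y, x)
    return best, best_d
```

On a 3x3 block of hand pixels centred in a 5x5 mask, the centre pixel is farthest from the background and is returned as the palm point.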
(ii) Inner Circle of Maximal Radius: Once the palm point is found, a circle can be drawn inside the palm with
the palm point as its centre. The circle is called the inner circle because it is contained inside the palm. The
radius of the circle is gradually increased until it reaches the edge of the palm; that is, the radius stops
increasing when black pixels would be included in the circle.
(iii) Hand Rotation: Once the palm point and wrist point are obtained, an arrow can be drawn pointing from
the palm point to the middle point of the wrist line at the bottom of the hand. The hand is then rotated so that
the arrow points north.
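Assuming (x, y) image coordinates with y increasing downwards, the correction angle for this rotation can be computed as follows. This is a sketch of the geometry, not the paper's exact formulation:

```python
import math

def rotation_to_north(palm, wrist):
    """Angle in degrees by which to rotate the hand so the wrist-to-palm
    arrow points straight up ('north'). Points are (x, y), y pointing down."""
    vx = palm[0] - wrist[0]
    vy = palm[1] - wrist[1]
    # deviation of the arrow from north, clockwise positive
    angle = math.degrees(math.atan2(vx, -vy))
    return -angle  # rotate by the opposite amount to face north
```

A hand already pointing upright needs no rotation, while a hand lying on its side needs a quarter turn.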

(iv) Fingers and Palm Segmentation: With the help of the palm mask, fingers and the palm can be segmented
easily. The part of the hand that is covered by the palm mask is the palm, while the other parts of the hand are
fingers.
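This masking step amounts to two boolean operations over the binary hand image, sketched below:

```python
import numpy as np

def split_fingers_palm(hand, palm_mask):
    """Hand pixels covered by the palm mask are palm; the rest are fingers."""
    hand = hand.astype(bool)
    palm_mask = palm_mask.astype(bool)
    return hand & ~palm_mask, hand & palm_mask
```

Any white pixel of the hand that lies outside the palm mask is classified as a finger pixel.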
Recognition of hand gestures
When the fingers are detected and recognized, the hand gesture can be recognized using a simple rule classifier.
In the rule classifier, the hand gesture is predicted according to the number and content of fingers detected. The
content of the fingers means what fingers are detected. The rule classifier is very effective and efficient.
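A rule classifier of this kind can be sketched as a lookup from the set of detected fingers to a gesture label. The mapping below is hypothetical, since the paper does not list its exact rules:

```python
# Hypothetical finger-set -> gesture table; the paper's actual rules differ.
RULES = {
    frozenset(): "fist",
    frozenset({"index"}): "point",
    frozenset({"index", "middle"}): "victory",
    frozenset({"thumb", "index", "middle", "ring", "little"}): "open palm",
}

def classify_gesture(fingers):
    """Predict the gesture from the number and content of detected fingers."""
    return RULES.get(frozenset(fingers), "unknown")
```

Because the decision is a single dictionary lookup on the detected finger set, the classifier is both effective and cheap to evaluate.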
System security
We will be able to control the system through hand gestures. The method provides an efficient way of
recognizing gestures, and the gestures of sign language are converted to speech with the help of a speech
synthesizer.
V. CONCLUSION
In this work, the hand is detected by background subtraction, the fingers and palm are segmented, and the
detected fingers are recognized. The hand gesture is then predicted with a simple rule classifier according to
the number and content of the fingers detected, an approach that proves both effective and efficient.
