
e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:04/Issue:06/June-2022 Impact Factor- 6.752 www.irjmets.com
ALPHANUMERIC RECOGNITION USING HAND GESTURES
Shashank Krishna Naik*1, Mihir Singh*2, Pratik Goswami*3,
Mahadeva Swamy GN*4
*1,2,3,4Department of Computer Science and Engineering, JSS Science and
Technology University, Mysore, India.
ABSTRACT
The goal of the work presented in this paper is to develop a system for the automatic translation of static alphabet gestures in American Sign Language. The required images for the selected alphabets are obtained using a digital camera and passed through the MediaPipe framework. The MediaPipe pipeline utilizes multiple models, including a palm detection model and a hand landmark model, which give us three-dimensional hand coordinates. These coordinates are reduced to two dimensions, and the Euclidean distance of each landmark from a reference point is calculated. Feature selection is then applied to this data to reduce the computational cost and increase performance. Finally, a supervised learning algorithm (KNN) performs multi-class classification on this data to predict the alphanumeric character accurately.
Keywords: Machine Learning, MediaPipe, Euclidean Distance, Alphanumeric, Supervised Learning.
I. INTRODUCTION
In this paper, we focus on using pointing behavior as a natural interface and a machine learning algorithm for multi-class classification of static alphanumeric gestures. Human-machine interfaces based on hand gesture recognition have been developed vigorously in recent years, but because of lighting effects and complex backgrounds, most visual hand gesture recognition systems work only in a restricted environment. To classify static hand gestures, we developed a simple and fast method using a supervised learning algorithm that takes the distances between hand landmarks as input. In recent years, gesture control has become a new development trend for many human-oriented electronic products; it lets people control these products more naturally, intuitively and conveniently. In this paper, a fast gesture recognition scheme is proposed as an interface for human-machine interaction (HMI). The paper presents low-complexity algorithms and gestures that reduce gesture recognition complexity and make the scheme more suitable for controlling real-time computer systems. We also host a website that recognises the static hand gestures, so that anyone with an internet connection can access this functionality.
II. EXISTING SYSTEM
A. Hand gesture recognition with skin detection and deep learning method
This method has four main steps. First, skin color is detected in the image to locate the hand; the image is then passed to a preprocessing stage, since the resulting binary image is usually noisy, and preprocessing fills the holes and restores the image. After the noise is removed, contours are extracted, with each point cluster considered as one contour. Among these contours, only one represents the hand region; moreover, the hand region and the face region are the two largest contours. Based on this fact, the problem of finding the hand among the contours becomes the problem of separating hands from faces. To do this, 100 samples each of face and hand regions are collected and classified with VGGNet, a deep convolutional neural network developed by the Visual Geometry Group at the University of Oxford. A pyramid pooling module and an attention mechanism are then used to increase the receptive field and classify the details more efficiently.
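As a rough illustration of this pipeline's first stages, the sketch below segments skin-colored pixels and keeps the two largest contours. It is a minimal sketch assuming OpenCV; the YCrCb threshold values are common illustrative defaults, not the method's exact settings, and the resulting crops would still need to be classified by the CNN.

```python
# Minimal sketch: skin-color segmentation and contour extraction with OpenCV.
# The YCrCb thresholds below are illustrative assumptions, not the paper's settings.
import cv2
import numpy as np

def largest_contours(bgr_image, keep=2):
    """Segment skin-colored pixels and return the largest contours
    (the hand and face regions, per the assumption described above)."""
    # Convert to YCrCb, where skin tones cluster in a compact Cr/Cb range.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, np.array([0, 133, 77]), np.array([255, 173, 127]))

    # Morphological closing fills small holes in the noisy binary mask.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Each connected point cluster becomes one contour.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return sorted(contours, key=cv2.contourArea, reverse=True)[:keep]
```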
B. Static gesture recognition combining graph and appearance features
This method has two main phases; the first is the feature extraction phase. In the feature extraction phase, the first step is to take an image from a digital camera or webcam. Once the frame is captured, it requires RGB-to-gray conversion, since the captured image is in RGB form. After the conversion we have a grayscale image, which is then converted to binary using Otsu's segmentation algorithm. The pixels of the binary image take only two numeric values: 0 (black) and 1 (white). Binary images are often produced by thresholding a grayscale image against the background, and the result preserves sharp and clear details of the image.

The goal of edge detection is to extract the boundary of the desired object for its shape details; the edge detector defines the location of the desired features in the image. The Canny edge detector is a very effective algorithm, but it provides more detail than needed. For this reason, the Modified Moore Neighbor Contour Tracing Algorithm was selected as the feature extraction method, to obtain only the required detail from the image.
The classification phase takes the different features produced by the feature extraction method and applies a rule-based approach for recognition. Three features are considered: centroid, roundness and scan-line features. The values of these features are collected, and recognition rules are derived from them; the classification method is entirely rule based. Since learning-based approaches are much more time-consuming, a rule-based approach is used for fast recognition.
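A minimal sketch of the preprocessing chain just described (RGB to grayscale to Otsu binarization, followed by boundary extraction) is shown below, assuming OpenCV. The Modified Moore Neighbor tracing step itself is not reproduced; cv2.findContours is used here only as a stand-in boundary extractor, and the file path is illustrative.

```python
# Minimal sketch of the preprocessing chain: RGB -> grayscale -> Otsu binary.
import cv2

frame = cv2.imread("gesture.jpg")  # captured frame (path is illustrative)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Otsu's method picks the threshold automatically from the histogram;
# the result is a binary image (OpenCV uses 0 for black and 255 for white).
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Boundary extraction for shape features (centroid, roundness, scan lines);
# a stand-in for the Moore Neighbor contour tracing step.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
```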
III. PROPOSED METHODOLOGY
Sign language is a visual-gestural language used by people who are deaf or have hearing impairments. Particular movements of the hands convey particular words and phrases, and sign language has its own vocabulary, comparable to that of spoken languages. Our alphanumeric recognition system attempts to convert sign language alphabet and number gestures into text; from a series of recognised alphabets, a meaningful sentence can be formed.
1. We use a video camera, specifically the webcam on a laptop, to collect a live video feed from which we collect hand data and send it through the recognition algorithm.
2. The next step is detecting the hand: the webcam captures a series of video frames for analysis.
3. These frames are run through the MediaPipe framework. The MediaPipe pipeline utilizes multiple models, including a palm detection model that returns an oriented hand bounding box from the full image.
4. The image region cropped by the palm detector is fed to a hand landmark model, which returns high-fidelity 3D hand key points.

Fig 1: Hand landmark model


5. These key points contain x, y and z coordinates, of which we consider only the x and y coordinates.
6. These coordinates are then used to calculate the Euclidean distance of each landmark from the reference point. From our analysis, we concluded that taking the center point (9_MIDDLE_FINGER_MCP) as the reference point is most effective; the distances of the landmarks on the left side of the center point are multiplied by negative one to differentiate the left and right hand.

7. Feature selection:
a. Feature selection is the process of reducing the number of input variables when developing a predictive model.
b. Reducing the number of input variables both lowers the computational cost of modeling and, in some cases, improves the performance of the model.
c. Before feature selection:
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20']
d. After feature selection:
['0','2', '3', '4', '6', '7', '8', '10', '11', '12', '14', '15', '16', '18', '19', '20']
[Note: '0' represents the Euclidean distance from hand landmark 9 to landmark 0, as shown in Fig. 1. Similarly, every feature represents the Euclidean distance of the corresponding landmark from hand landmark point '9'.]
8. When the capture button is pressed, the calculated distances are sent to the backend for processing.
9. These distances are then passed through a supervised learning algorithm (KNN) for multi-class classification, to classify the data and predict the alphabet or number accurately (a minimal code sketch of steps 3 to 9 follows below).
10. The predicted character is sent as the response to the front-end application and displayed on screen.
In this manner we process the sequence of captured images and identify individual alphabets or numbers; from this series of alphabets, a meaningful sentence can be formed.
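The sketch below strings steps 3 to 9 together. It assumes the legacy MediaPipe Hands Python solution, OpenCV for frame capture, and a pre-trained scikit-learn KNN classifier bound to the illustrative name knn; the interpretation of "left side" as an x coordinate smaller than the reference's is our assumption.

```python
# Minimal sketch of steps 3-9, assuming MediaPipe's legacy Hands solution and
# a pre-trained scikit-learn KNN classifier named `knn` (illustrative name).
import math
import cv2
import mediapipe as mp

KEPT = [0, 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19, 20]  # after feature selection
REF = 9  # 9_MIDDLE_FINGER_MCP, the reference point

def distance_features(landmarks):
    """Euclidean distance of each kept landmark from landmark 9 (x, y only);
    landmarks to the left of the reference are negated (assumed convention)."""
    ref = landmarks[REF]
    feats = []
    for i in KEPT:
        p = landmarks[i]
        d = math.hypot(p.x - ref.x, p.y - ref.y)
        if p.x < ref.x:          # assumed meaning of "left side"
            d = -d
        feats.append(d)
    return feats

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)        # step 1: live webcam feed
ok, frame = cap.read()           # step 2: one captured frame
if ok:
    # Steps 3-4: palm detection + hand landmark model (21 normalized 3D points).
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        feats = distance_features(results.multi_hand_landmarks[0].landmark)
        print(knn.predict([feats]))  # step 9: KNN multi-class prediction
```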
IV. SYSTEM ARCHITECTURE

Fig 2: System architecture


Fig 3: Media Pipe architecture


V. EXPERIMENTAL RESULTS AND ANALYSIS
A. Experimental Results
The performance of the recognition system is evaluated by testing its ability to classify the signs using the non-parametric supervised learning classifier KNN.

Fig 4: Sample result


As shown in the figure, the red dots indicate the hand landmarks, and the letter at the top-left corner of the video output frame is the output predicted by our model. We tested the model with different static gesture signs, and it is able to recognise the alphabets from A to Z (except J and Z, whose signs involve motion) and the numbers from zero to nine.
B. Analysis
a. KNN vs SVM
We obtained 98.96% training accuracy and 98.58% testing accuracy with the KNN model, and 97.74% training accuracy and 97.17% testing accuracy with the SVM model.

Fig 5: KNN classification report Fig 6: SVM classification report


Precision: the percentage of predicted positive instances that are actually positive, TP / (TP + FP).

Recall: the percentage of actual positive instances that are correctly predicted, TP / (TP + FN).

F1 score: the harmonic mean of precision and recall, 2 × (precision × recall) / (precision + recall).

The performance of the KNN model is higher in our implementation because our feature set for recognition is comparatively large, and under such conditions KNN has been reported to achieve better accuracy than SVM. We therefore chose the KNN classifier.
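A hedged sketch of this comparison is given below, assuming the distance features are already collected in arrays X (one 16-value row per sample) and y (labels); the variable names and the 80/20 split are illustrative, not the paper's stated protocol.

```python
# Sketch of the KNN-vs-SVM comparison, assuming features X and labels y exist.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("KNN", KNeighborsClassifier(n_neighbors=5)), ("SVM", SVC())]:
    model.fit(X_train, y_train)
    print(name, "train:", model.score(X_train, y_train),
          "test:", model.score(X_test, y_test))
    # Per-class precision, recall and F1, as in Figs. 5 and 6.
    print(classification_report(y_test, model.predict(X_test)))
```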
b. Value of K in KNN classifier
To determine the optimal value of K, we calculated the mean error for K values from one to forty; the value with the lowest mean error is taken as K.


Fig 7: K value vs Mean error graph


To confirm the value of K, we calculated the testing and training accuracy of the model for K values from one to nine.

Fig 8: K value vs Accuracy graph


As can be seen in Fig. 8, the testing accuracy increases when the K value becomes 5, and in Fig. 7 the mean error is correspondingly low. Hence, we selected 5 as the K value when generating the KNN model.
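A minimal sketch of this selection procedure follows, reusing the X_train/X_test split assumed in the previous sketch; the mean error here is the fraction of misclassified test samples.

```python
# Sweep K = 1..40 and record the mean prediction error on the test set,
# reusing X_train, X_test, y_train, y_test from the previous sketch.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

mean_error = []
for k in range(1, 41):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    mean_error.append(np.mean(knn.predict(X_test) != y_test))

best_k = int(np.argmin(mean_error)) + 1  # K = 5 in our experiments (Fig. 7)
```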
VI. FUTURE WORK
This system can be integrated with search bars so that deaf and mute people can search easily using sign language. Similarly, it can be integrated with payment applications such as PhonePe and GPay, and with e-commerce websites such as Flipkart and Amazon, so that users can pay or order items directly using sign language. We can also add a feature that speaks out the text recognised from the hand gestures.
VII. CONCLUSION
In this project, we develop alphanumeric sign language recognition from hand gestures using a new approach that can recognise both alphabets and numbers. The approach recognises gestures by tracking the coordinates of the hand to determine which alphanumeric symbol has been shown, and it is lightweight and efficient. It offers a number of advantages that could assist differently-abled (deaf and mute) people. The output of the sign language is displayed as text in real time. This could lead to a revolution in communication, as a disability would no longer stop anyone from expressing their opinions. We can further improve this by adding a feature that
will spell out the complete sentence that is recognized by the system.
