
Gesture Control Gaming

Tarinee Prasad Sahoo, Kumar Mapanip Saheb Sahoo

Department of Computer Science and Engineering


National Institute of Technology Rourkela
Gesture Control Gaming

Project report submitted in partial fulfillment

of the requirements for the degree of

Bachelor of Technology
in

Computer Science and Engineering

by

Tarinee Prasad Sahoo, Kumar Mapanip Saheb Sahoo


(Roll Number: 116CS0224, 716CS1046)

based on research carried out

under the supervision of

Prof. Arun Kumar

October, 2016

Department of Computer Science and Engineering


National Institute of Technology Rourkela
Department of Computer Science and Engineering
National Institute of Technology Rourkela
October 07, 2016

Certificate of Examination
Roll Number: 116CS0224, 716CS1046
Name: Tarinee Prasad Sahoo, Kumar Mapanip Saheb Sahoo
Title of Dissertation: Gesture Control Gaming

We the undersigned, after checking the project report mentioned above and the official
record book(s) of the students, hereby state our approval of the project report submitted in
partial fulfillment of the requirements of the degree of Bachelor of Technology in Computer
Science and Engineering at National Institute of Technology Rourkela. We are satisfied with
the volume, quality, correctness, and originality of the work.

Arun Kumar
Principal Supervisor

Member, DSC          Member, DSC          Member, DSC

External Examiner          Chairperson, DSC

Ashok K Turuk
Head of the Department
Department of Computer Science and Engineering
National Institute of Technology Rourkela

Prof. Arun Kumar


Associate Professor

October 07, 2016

Supervisors’ Certificate

This is to certify that the work presented in the project report entitled Gesture Control
Gaming submitted by Tarinee Prasad Sahoo and Kumar Mapanip Saheb Sahoo, Roll
Numbers 116CS0224 and 716CS1046, is a record of original research carried out by them
under our supervision and guidance in partial fulfillment of the requirements of the degree of
Bachelor of Technology in Computer Science and Engineering. Neither this project report
nor any part of it has been submitted earlier for any degree or diploma to any institute or
university in India or abroad.

Arun Kumar
Associate Professor
Dedication
This project is dedicated to all those people who have helped us in different stages of life.
We’re thankful to our teachers, friends and family who supported us during the tenure of
this project.

Signature
Declaration of Originality
We, Tarinee Prasad Sahoo and Kumar Mapanip Saheb Sahoo, Roll Numbers 116CS0224
and 716CS1046, hereby declare that this project report entitled Gesture Control Gaming presents
our original work carried out as undergraduate students of NIT Rourkela and, to the best of
our knowledge, contains no material previously published or written by another person, nor
any material presented by us for the award of any degree or diploma of NIT Rourkela or
any other institution. Any contribution made to this research by others, with whom we have
worked at NIT Rourkela or elsewhere, is explicitly acknowledged in the dissertation. Works
of other authors cited in this dissertation have been duly acknowledged under the sections
“Reference” or “Bibliography”. We have also submitted our original research records to the
scrutiny committee for evaluation of our dissertation.

We are fully aware that in case of any non-compliance detected in future, the Senate of NIT
Rourkela may withdraw the degree awarded to us on the basis of the present dissertation.

October 07, 2016
NIT Rourkela
Tarinee Prasad Sahoo, Kumar Mapanip Saheb Sahoo
Acknowledgment
The project has helped us learn a lot about image processing and deep learning. This will
be a crucial factor in improving our skills and will help us in future projects. We thank all
those people who have directly and indirectly helped in the completion of the project. The
results obtained have been validated by rigorous experimentation, and we are open to any
discussion regarding the legitimacy of the algorithms and formulas used.

November 30, 2018
NIT Rourkela
Tarinee Prasad Sahoo, Kumar Mapanip Saheb Sahoo
Roll Number: 116CS0224, 716CS1046
Abstract
This project implements concepts of deep learning and image processing to design
a gesture-based system which uses various hand gestures to mimic keyboard commands.
These virtual commands will be used as the input for a game. The game will be designed
using the PyGame module of Python.

Keywords: Deep Learning; Image Processing; PyGame.


Contents

Certificate of Examination ii

Supervisors’ Certificate iii

Dedication iv

Declaration of Originality v

Acknowledgment vi

Abstract vii

1 Introduction 1
1.1 Why gesture? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 How Gesture Recognition works? . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Stages of Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Algorithms Used and Working 3


2.1 Adaptive Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.1 CNN Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3 Progress and Conclusion 6


3.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.3 CNN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

References 9


Chapter 1
Introduction
The main aim of this project is to enable interaction between human and computer,
controlling applications running on the computer through basic shapes made by the hand.
Our hand movements play an important role when we interact with other people, as they
convey very rich information in many ways. Gesture recognition is a perceptual computing
technology that permits computers to capture and interpret human gestures as commands.
Gesture recognition is essentially the ability of a computer to understand gestures and
execute commands based on those gestures.
Many companies are working on gesture recognition systems. Most of them work with
additional sensors, such as wireless gloves fitted with sensors, or the popular Microsoft
Kinect, whose depth camera segments the body and derives the gesture from that
segmentation. On the contrary, we decided to feed a neural network with a large amount
of data and see whether we could achieve the same result without much hardware: we only
need a laptop camera to build a gesture recognition system.

1.1 Why gesture?


Gesture is a natural form of expression and has been a vital mode of communication for
humans. Building a gesture-based recognition system is a first step towards a computer
understanding human body language, thus building a bridge between human and computer
interaction.

1.2 How Gesture Recognition works?


Gesture recognition is an alternative technique for providing real-time data to a
computer: rather than typing on a keyboard or tapping on a touch screen, a motion-sensing
element perceives and interprets movements as the primary source of data input. The
recognized gesture can then be used as data input for games.
• The camera feeds image data into a sensing device connected to the computer.
• The programmed package then identifies meaningful gestures from a predefined gesture
library, where each gesture is mapped to an associated command.
• The package takes each real-time gesture, interprets it, and uses the library to find the
meaningful gesture that matches it. Once the gesture has been understood, classification
is performed.
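The mapping step described above can be sketched as a simple lookup table. This is an illustrative sketch only: the gesture names and commands below are our assumptions, not the actual gesture library used in the project.

```python
# Minimal sketch of a predefined gesture library: each recognized gesture
# label is mapped to an associated keyboard-style command that a game
# loop can consume. Gesture names and commands are illustrative only.
GESTURE_LIBRARY = {
    "open_palm": "UP",
    "fist": "DOWN",
    "point_left": "LEFT",
    "point_right": "RIGHT",
}

def gesture_to_command(gesture, library=GESTURE_LIBRARY):
    """Return the command associated with a recognized gesture,
    or None when the gesture is not in the library."""
    return library.get(gesture)
```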

1.3 Stages of Processing


There are four main stages in the working of the system, namely
• Data Collection
• Data Preprocessing
• Deep Learning
• Score Computation

1.4 Related Work


Gesture recognition is essentially the ability of a computer to understand gestures and
execute commands based on those gestures. Most users are familiar with the Wii Fit, Xbox
and PlayStation games such as “Just Dance” and “Kinect Sports.”
Chapter 2
Algorithms Used and Working
We have used two algorithms, namely:
» Adaptive Thresholding
» Convolutional Neural Network

2.1 Adaptive Thresholding


Adaptive thresholding typically takes a grayscale or color image as input and, in the
simplest implementation, outputs a binary image representing the segmentation. For every
pixel in the image, a threshold has to be calculated. If the pixel value is below the threshold
it is set to the background value; otherwise it assumes the foreground value.
There are two main approaches to finding the threshold: (i) the Chow and Kaneko
approach and (ii) local thresholding. The idea behind both strategies is that smaller image
regions are more likely to have approximately uniform illumination, and are therefore more
suitable for thresholding. Chow and Kaneko divide an image into an array of overlapping
subimages and then find the optimum threshold for each subimage by investigating its
histogram. The threshold for each individual pixel is found by interpolating the results of
the subimages. The drawback of this technique is that it is computationally expensive and,
therefore, not suitable for real-time applications.
An alternative approach to finding the local threshold is to statistically examine the
intensity values in the local neighborhood of each pixel. Which statistic is most appropriate
depends largely on the input image. Simple and fast functions include the mean of the local
intensity distribution,
T = Mean
the median value,
T = Median
or the mean of the minimum and maximum values,
T = (Max + Min)/2
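The three candidate statistics can be written directly as functions of a pixel's neighborhood; a minimal NumPy sketch:

```python
import numpy as np

def t_mean(window):
    """T = Mean of the local intensity distribution."""
    return np.mean(window)

def t_median(window):
    """T = Median of the local intensity distribution."""
    return np.median(window)

def t_midrange(window):
    """T = (Max + Min) / 2, the mean of the extreme values."""
    return (np.max(window) + np.min(window)) / 2.0
```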
The size of the neighborhood has to be large enough to cover sufficient foreground
and background pixels, otherwise a poor threshold is chosen. On the other hand, choosing
regions that are too large can violate the assumption of roughly uniform illumination. This
technique is less computationally intensive than the Chow and Kaneko approach and
produces good results for some applications.
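A minimal sketch of the local-mean variant follows. The window size, the handling of border pixels (clipping the neighborhood to the image bounds), and the 0/255 output values are our assumptions; a production implementation would vectorize this loop.

```python
import numpy as np

def adaptive_mean_threshold(image, block=3, fg=255, bg=0):
    """Per-pixel local thresholding: compare each pixel against the mean
    of its (block x block) neighborhood (T = Mean). Pixels below the
    local threshold are set to the background value, the rest to the
    foreground value."""
    h, w = image.shape
    r = block // 2
    out = np.empty((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            # Neighborhood clipped to the image bounds at the borders.
            window = image[max(0, y - r):y + r + 1,
                           max(0, x - r):x + r + 1]
            out[y, x] = bg if image[y, x] < window.mean() else fg
    return out
```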

2.2 Convolutional Neural Network


A Convolutional Neural Network (CNN) is a class of deep neural networks, most commonly
applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons
designed to require minimal preprocessing. They are also known as shift-invariant or
space-invariant artificial neural networks (SIANN), based on their shared-weights
architecture and translation-invariance characteristics. Convolutional networks were
inspired by biological processes, in that the connectivity pattern between neurons resembles
the organization of the animal visual cortex. Individual cortical neurons respond to stimuli
only in a restricted region of the visual field known as the receptive field. The receptive
fields of different neurons partially overlap such that they cover the entire visual field. CNNs
use relatively little pre-processing compared to other image classification algorithms. This
means that the network learns the filters that in traditional algorithms were hand-engineered.
This independence from prior knowledge and human effort in feature design is a major
advantage.
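The shared-weights, local-receptive-field idea can be illustrated with a toy 2-D convolution (implemented, as in deep-learning frameworks, as cross-correlation in 'valid' mode with stride 1); this sketch is ours and is not the project's network:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image. Each output unit only
    'sees' a small receptive field of the input, and every position
    reuses the same weights."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out
```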

2.2.1 CNN Parameters
• First layer: convolution layer with a 5 × 5 kernel and ‘relu’ activation function.
• Second layer: max-pooling layer with a 2 × 2 pooling window.
• Third layer: convolution layer with a 5 × 5 kernel and ‘relu’ activation function.
• Fourth layer: max-pooling layer with a 2 × 2 pooling window.
• Flattening of the outputs from the fourth layer.
• Fifth layer: fully connected layer with 256 input nodes and ‘relu’ activation.
• Sixth layer: four output nodes with sigmoid activation.
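Given the 64 × 64 grayscale inputs described in Chapter 3, the spatial sizes through this stack can be traced as below. The filter counts are not stated in the report, and 'valid' convolutions with stride 1 are assumed:

```python
def conv_out(size, kernel=5):
    """'Valid' convolution with stride 1: output = input - kernel + 1."""
    return size - kernel + 1

def pool_out(size, window=2):
    """Non-overlapping pooling: output = input // window."""
    return size // window

def trace(size=64):
    """Spatial side length after each conv/pool layer in the stack."""
    sizes = [size]
    for _ in range(2):                      # two conv -> pool pairs
        sizes.append(conv_out(sizes[-1]))   # 5 x 5 convolution
        sizes.append(pool_out(sizes[-1]))   # 2 x 2 max pooling
    return sizes
```

Under these assumptions each final-layer filter contributes a 13 × 13 feature map to the flattened vector that feeds the 256-node fully connected layer.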
Chapter 3
Progress and Conclusion

Setup
We developed our algorithms in Python 3. We employed TensorFlow 1.4 and Keras 2.0.9
to build the network and the loss function. We used OpenCV 3.3.0 for image processing
and for drawing bounding boxes and text. NumPy 1.2.1 was used for mathematical
computations. For model training, an NVIDIA NC6 instance on AWS was used.

Dataset
In order to train a convolutional neural network, a large amount of data is required for good
accuracy. The entire dataset was collected and prepared by us.

Progress
3.1 Data Collection
• We used the OpenCV module of Python for image processing; images are extracted frame
by frame from live camera video.
• Images are then converted to grayscale.
• Adaptive mean thresholding with binary thresholding is used and a mask is applied.
• The final saved image is a 64 × 64 grayscale image.
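The per-frame preprocessing can be sketched without OpenCV as follows; the luminosity weights and block-average downsampling are a simplified stand-in for the cv2 calls actually used:

```python
import numpy as np

def to_grayscale(frame):
    """Convert an (H, W, 3) RGB frame to grayscale using the standard
    luminosity weights."""
    return frame @ np.array([0.299, 0.587, 0.114])

def downsample(gray, out_size=64):
    """Block-average a square grayscale image down to out_size x out_size.
    Assumes the input side length is a multiple of out_size."""
    f = gray.shape[0] // out_size
    return gray.reshape(out_size, f, out_size, f).mean(axis=(1, 3))
```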

3.2 Data Preprocessing


• This step preprocesses the data, i.e. normalizing it and splitting it into train and test sets.
• The split ratio is 75:25.
• Total data points (approx.): Train ≈ 8000, Test ≈ 2000.
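The normalization and split can be sketched as below; scaling 8-bit pixels to [0, 1] and the fixed shuffle seed are our assumptions:

```python
import numpy as np

def preprocess_and_split(images, labels, train_frac=0.75, seed=0):
    """Scale 8-bit images to [0, 1], shuffle, and split 75:25 into
    training and test sets."""
    x = images.astype(np.float32) / 255.0
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    cut = int(train_frac * len(x))
    return x[idx[:cut]], labels[idx[:cut]], x[idx[cut:]], labels[idx[cut:]]
```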

3.3 CNN Model


• This section describes the architecture used.


• The model consists of two CONV BOXes followed by a fully connected layer.
• One CONV BOX consists of three operations:
Convolution → ReLU → Max Pooling
• This operation was carried out twice.
• The data is then flattened and fed into a fully connected ANN.

Fig. 1

The plot in Fig. 1 shows the steps performed in training the neural network with an
appropriate set of hyperparameters after proper tuning.

Result

Fig. 2

The plot in Fig. 2 shows the validation loss for the training set.

Fig. 3

The plot in Fig. 3 shows the validation loss vs. number of epochs and the accuracy vs.
number of epochs for the test set.

Conclusion and Future Work


As can be observed from the graphs, the accuracy of our model is very good, and hence it
will work satisfactorily for four-control systems. For future work we plan to develop a game
with multiple controls and drive it with gestures instead of the keyboard.
References

• https://www.youtube.com/watch?v=FTr3n7uBIuE
• http://cs231n.github.io/convolutional-networks/
• https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
• http://colah.github.io/posts/2014-07-Conv-Nets-Modular/
• http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/


