
Gesture Recognition based Virtual Mouse and Keyboard

CHAPTER 1

INTRODUCTION
1.1 Introduction
A mouse, in computing terms, is a pointing device that detects two-dimensional movement relative to a surface. This movement is converted into the movement of a pointer on a display, allowing the user to control the Graphical User Interface (GUI) on a computer platform. Several types of mouse have existed in modern-day technology. The mechanical mouse determines movement by a hard rubber ball that rolls around as the mouse is moved. Years later, the optical mouse was introduced, replacing the hard rubber ball with an LED sensor that detects tabletop movement and sends the information to the computer for processing.

1.1.1 History of Input Devices

In 2004, the laser mouse was introduced to improve tracking accuracy for even the slightest hand movement; it overcame the main limitation of the optical mouse, namely the difficulty of tracking high-gloss surfaces. However, no matter how accurate a mouse may be, limitations remain within the mouse itself, in both physical and technical terms. For example, a computer mouse is a consumable hardware device, as it requires replacement in the long run: either the mouse buttons degrade, causing unintended clicks, or the whole mouse is no longer detected by the computer. Despite these limitations, computer technology continues to grow, and so does the importance of human-computer interaction. Ever since the introduction of mobile devices with touch screen technology, the world has demanded the same technology on every technological device, including the desktop system. However, even though touch screen technology for desktop systems already exists, the price can be very steep.

Therefore, a virtual human-computer interaction device that replaces the physical mouse or keyboard by using a webcam or another image-capturing device can be an alternative to the touch screen. The webcam is constantly monitored by software that watches the gestures given by the user, processes them, and translates them into the motion of a pointer, similar to a physical mouse.


1.1.2 Existing Input Devices

Various types of physical computer mouse exist in modern technology; the following discusses the types and their differences. In the trackball mouse, commonly used in the 1990s, the ball within the mouse is supported by two rotating rollers that detect the movement made by the ball itself. One roller detects forward/backward motion while the other detects left/right motion. The ball is made of steel covered with a layer of hard rubber, so that detection is more precise. The common functions included are the left/right buttons and a scroll wheel.

However, due to the constant friction between the mouse ball and the rollers, the mouse is prone to degradation: over time the rollers wear out and can no longer detect motion properly, rendering the mouse useless. The switches in the mouse buttons are no different, as long-term usage may cause the mechanism within to loosen and stop registering clicks until the mouse is disassembled and repaired.
It is fair to say that the virtual mouse may substitute the traditional physical mouse in the near future, as people aim towards a lifestyle where every technological device can be controlled and interacted with remotely, without peripheral devices such as remotes and keyboards. It does not just provide convenience; it is cost-effective as well. For the virtual mouse to replace the physical computer mouse while still interacting with and controlling the computer system accurately, the software must be fast enough to capture and process every image, in order to successfully track the user's gestures.

1.1.3 Applications

Therefore, this project develops a software application with the aid of up-to-date coding techniques and the open-source computer vision library known as OpenCV. The applications of the project are as below:

(i) Real-time applications such as 2D and 3D drawing can be realized with the AI virtual system using hand gestures.

(ii) A user-friendly application that provides greater flexibility than the existing system and is easy to adapt.

(iii) Removes the requirement of having a physical mouse.


1.2 Motivation


• For disabled or physically challenged people, it is very difficult to control the computer. These barriers can be grouped into three functional categories: barriers to providing computer input, interpreting the output, and reading supporting documentation. The virtual system is developed to solve this problem.

• To provide an easier interaction routine to the user.

• Real-time applications such as 2D and 3D drawing can be realized with the AI virtual system using hand gestures.

• Constant operation of input devices by hand can strain finger muscles and may lead to numbness (carpal tunnel syndrome), so this has medical importance.

• It helps people with physical disabilities.

• During surgeries it is difficult to handle devices manually; in such cases gestures make interaction easier.

• On keyboards, some letters stop typing when a tiny piece of debris gets stuck under a few keys. In such situations our virtual keyboard will be helpful.

• Amidst the COVID-19 situation it is not safe to use devices by touching them, because touching shared devices may spread the virus.

• The proposed system can be used to overcome these problems.

1.3 Problem statement

“To design and implement a smart instrument to recognize the gesture of the user and imitate the activities of the input devices”

• Input: Gesture
• Process:
  - Preprocessing: Image Segmentation, Filtering, Tracking
  - Feature Extraction
  - Image Classification
• Output: Smart Instrument which Recognizes the Gesture of the User.


It is no surprise that all technological devices have their own limitations, especially computer devices. After reviewing the various types of physical mouse, the problems are identified and generalized.

The following describes the general problems that the current physical mouse suffers from:

• A physical mouse is subject to mechanical wear and tear.

• A physical mouse requires special hardware and a surface to operate.

• A physical mouse is not easily adaptable to different environments, and its performance varies depending on the environment.

• The mouse has limited functions even in present operational environments.

• All wired and wireless mouse have a limited lifespan.

1.4 Scope of the project

• The proposed model has an accuracy of 99%, far greater than that of other proposed virtual mouse models, and it has many applications.

• Amidst the COVID-19 situation it is not safe to use devices by touching them, because touching shared devices may spread the virus; the proposed AI virtual mouse can be used to control PC mouse functions without a physical mouse.

• The system can be used to control robots and automation systems without physical devices.

• 2D and 3D images can be drawn using the AI virtual system with hand gestures.

• The AI virtual mouse can be used to play virtual reality- and augmented reality-based games without wired or wireless mouse devices.

• Persons with problems in their hands can use this system to control the mouse functions of the computer.

• In the field of robotics, the proposed system can be used for human-computer interaction (HCI) to control robots.


• In design and architecture, the proposed system can be used for virtual prototyping.

• Provides greater flexibility than the existing system and is easy to adapt.

1.5 Objectives of Proposed System

The main objectives of the project are:

• To develop a Virtual Mouse application that targets a few aspects of significant development.

• To program the camera to continuously capture images, which will be analyzed using various image processing techniques.

• To convert hand gestures/motion into mouse input mapped to a particular screen position.

• To eliminate the need for a physical mouse while still being able to interact with the computer system through a webcam, using various image processing techniques, which will help beginners.

• To develop a Virtual Mouse application that is operational on all kinds of surfaces and environments.

• To detect the position of the defined colours, which will be set as the position of the mouse pointer.

1.6 Literature survey

As modern human-computer interaction technology becomes important in our everyday lives, varieties of mouse of all shapes and sizes have been invented, from the casual office mouse to the hard-core gaming mouse. However, there are limitations to this hardware, as it is not as environment-friendly as it seems. For example, the physical mouse requires a flat surface to operate, not to mention a certain area to fully utilize the functions offered. Furthermore, some of this hardware is completely useless when it comes to interacting with the computer remotely, due to cable length limitations rendering it inaccessible.


1. Zhengyou, Z., Ying, W. and Shafer, S. (2001)

“Visual Panel: Virtual Mouse, Keyboard and 3D Controller with an Ordinary Piece of Paper”

To overcome the stated problems, Zhengyou et al. (2001) proposed an interface system named Visual Panel that utilizes an arbitrary quadrangle-shaped planar object as a panel, allowing the user to use any tip-pointer tool to interact with the computer. The interaction movements are captured and analysed to compute the positions of the tip-pointer, resulting in accurate and robust interaction with the computer. The overall system consists of a panel tracker, a tip-pointer tracker, homography calculation and update, and an action detector and event generator, and it can simulate both mouse and keyboard.

Figure 1.1: The system overview of Visual Panel (Zhengyou, Ying and Shafer, 2001)

However, although the proposed system solves the issue of cable length limitations, it still requires a certain area and material to operate. Zhengyou et al. mention that the system accepts any panel as long as it is quadrangle-shaped, meaning panels of any other shape are not allowed.

2. Niyazi, K. (2012).

“Mouse Simulation Using Two Coloured Tapes”

Kamran Niyazi et al. (2012) mentioned that a ubiquitous computing method is required to solve the stated problem. Thus, a colour-tracking mouse simulation was proposed. The said system tracks two colour tapes on the user's fingers by utilizing computer vision


technology. One of the tapes will be used for controlling the movement of the cursor while
the other will act as an agent to trigger the click events of the mouse.

Figure 1.2: The system architecture of the mouse-simulation (Niyazi, 2012)

To detect the colours, the system first needs to process the captured image by separating the hand pixels from the non-hand pixels, which can be done by a background subtraction scheme that segments the hand-movement information from the non-changing background scene. To implement this, the system captures a pair of images representing the static workplace from the camera view.

When the subtraction process is complete, the system undergoes another process that separates the RGB pixels, calculating probabilities and differentiating the RGB values to determine which pixels are skin and which are not. After that process is completed, the system starts detecting the defined colour in the image; the image's RGB pixels are converted into the HSV colour plane in order to eliminate variation in shades of similar colours. The resulting image is converted to a binary image and undergoes a filtering process to reduce the noise within the image.
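As an illustration only (not the authors' code), the pipeline described above can be sketched in Python with OpenCV; the HSV bounds for the tape and the motion threshold are hypothetical placeholder values:

    import cv2
    import numpy as np

    # Hypothetical HSV bounds for one colour tape (placeholder values)
    lower = np.array([20, 100, 100])
    upper = np.array([30, 255, 255])

    cap = cv2.VideoCapture(0)
    ret, background = cap.read()                      # static workplace reference frame
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        diff = cv2.absdiff(frame, background)         # background subtraction
        motion = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        _, motion = cv2.threshold(motion, 30, 255, cv2.THRESH_BINARY)
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)  # shade-tolerant colour plane
        mask = cv2.inRange(hsv, lower, upper)         # binary image of the tape colour
        mask = cv2.bitwise_and(mask, motion)          # keep tape pixels that moved
        mask = cv2.medianBlur(mask, 5)                # filtering to reduce noise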


Snapshots 1.1: Yellow colour tape for cursor movement (Niyazi, 2012)

Even though the proposed system solves most of the stated issues, the functions it offers are limited, as it is merely able to perform common functions such as cursor movements, left/right click, and double click, while other functions such as the middle click and mouse scroll were ignored.

3. Sekeroglu, K. (2010).

“Virtual Mouse Using a Webcam”

Another colour detection method was proposed by Kazim Sekeroglu (2010); the system requires three fingers with three colour pointers to simulate the click events. The proposed system is capable of detecting the pointers by referring to the defined colour information, tracking the motion of the pointers, moving the cursor according to the position of the pointer, and simulating single and double left and/or right click events of the mouse.


Snapshots 1.2: Input image using one and two pointers (Sekeroglu, 2010)

To detect the colours, they utilized MATLAB's built-in "imsubtract" function, combined with noise filtering using a median filter, which is effective in filtering out, or at least reducing, "salt and pepper" noise. The captured image is converted to a binary image using MATLAB's built-in "im2bw" function to differentiate the possible values for each pixel. When the conversion is done, the captured image undergoes another filtering process using "bwareaopen" to remove small areas, in order to get an accurate count of the objects detected in the image.
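The MATLAB pipeline described above can be approximated in Python with OpenCV as a sketch; the function substitutions and the threshold and area values here are assumptions, not the author's code:

    import cv2

    # frame and background are BGR images captured earlier (assumed inputs)
    diff = cv2.absdiff(frame, background)            # roughly MATLAB's imsubtract
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                   # median filter for salt-and-pepper noise
    _, binary = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY)  # roughly im2bw

    # Roughly bwareaopen: drop connected components smaller than min_area
    min_area = 200
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    for i in range(1, n):                            # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] < min_area:
            binary[labels == i] = 0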

4. Shuman Tian, Xianbin Cao, October 2016

“Portable Vision-Based Human Computer Interaction (HCI)”

Another "Ubiquitous Computing" approach proposed by Chu-Feng Lien (2015), requires only
finger-tips to control the mouse cursor and click events. The proposed system doesn't requires
hand-gestures nor colour tracking in order to interact with the system, instead it utilize Motion
History Images(MHI) , a method that used to identify movements with a row of images in time
increase of error rate.
Furthermore, due to the mouse click events occurred when the finger hold on a certain
positions, this may lead to user constant finger movements to prevent false alarm, which may
result inconvenience.


Figure 1.3: The Flow Chart of Portable Vision-Based Human Computer Interaction (Chu-Feng, 2008)

There is an abundance of methods for computer interaction besides the traditional physical mouse. As modern technology and programming evolve, so do Human Computer Interaction (HCI) methods, allowing unlimited ways to access computers. This approach lets developers design specific, unique systems that suit the needs of users, from gesture-movement tracking to colour tracking; it is no surprise that in the near future the physical mouse may no longer be needed, replaced by video cameras that track gestures.
5. Krutika Jadhav, Fahad Jagirdar, Sayali Mane, Jahangir Shahabadi - 2016

“Virtual Keyboard and Virtual Mouse”


Computing is not limited to desktops and laptops; it has found its way into mobile devices such as palmtops and even cell phones. What has not changed for the last fifty or so years, however, is the input device: the good old QWERTY keyboard. A virtual keyboard uses sensor technology and artificial intelligence to let users work on any surface as if it were a keyboard. This paper develops an application that visualizes the computer keyboard using the concept of image processing. The virtual keyboard should be accessible and functional: with the help of a camera, an image of the keyboard is fetched, and the camera captures finger movements as the user types on a keyboard simply drawn on paper or cardboard, thus giving a virtual keyboard.

The paper also presents a vision-based virtual mouse that takes finger coordinates as input and uses the finger for cursor recognition. Virtual keyboard technology is an application of virtual reality, which means enabling single or multiple users to move and react in a computer-simulated environment, with various types of devices that allow users to sense and manipulate virtual objects. However, all of these used different methods to generate a clicking event. In our project we use both a virtual keyboard and a virtual mouse.

In the virtual keyboard we capture the movement of fingers tapping on blank paper or cardboard, while in the mouse we capture finger movement with the help of the camera. A virtual keyboard/mouse that makes human-computer interaction simpler, being a small, handy, well-designed and easy-to-use application, turns into a perfect solution for cross-platform multilingual text input.

1.7 Organization of the Report

The report is organized into the following chapters.

Chapter 1- Introduction:

Introduction to the project, motivation, problem statement, scope of the project, objectives, literature survey, organization of the report, and summary.


Chapter 2 – System Requirement Specification:

As the name suggests, the second chapter covers the specific requirements and the software and hardware requirements used in this project. The chapter is summarized at the end.

Chapter 3 – High Level Design:

This chapter contains the design considerations made, the architecture of the proposed system, and the use case diagrams used to specify the system. It also includes a data flow diagram for each module and the states used.

Chapter 4 – Detailed Design:

Chapter 4 explains the detailed functionality and description of each module and the structural chart diagram. Overall, this report is organized as six chapters: Introduction, Analysis, Design, Implementation, Testing, and lastly Conclusion and Future Enhancements.

1.8 Summary

The first chapter gives a short introduction to gesture recognition based virtual mouse and keyboard. The motivation of the project is discussed in Section 1.2. The problem statement of the project is explained in Section 1.3. The scope and objectives of the project are described in Sections 1.4 and 1.5 respectively. Section 1.6 details the literature survey reviews of the important papers referred to above, and Section 1.7 outlines the organization of the report.


CHAPTER 2

SYSTEM REQUIREMENT SPECIFICATION


The system requirement specification is gathered by extracting the appropriate information needed to implement the system. It elaborates the conditions that the system needs to attain. Moreover, the SRS delivers complete knowledge of the system, making clear what the project is going to achieve without placing constraints on how to achieve that goal. The SRS does not expose the implementation plan to outside parties; it hides the design and gives few implementation details.

2.1 Specific requirements

2.1.1 Python Programming Language

In technical terms, Python is an object-oriented, high-level programming language with integrated dynamic semantics, used primarily for web and app development. It is extremely attractive in the field of Rapid Application Development because it offers dynamic typing and dynamic binding options. Python is relatively simple and easy to learn, since its syntax focuses on readability. Developers can read and translate Python code much more easily than other languages. In turn, this reduces the cost of program maintenance and development, because it allows teams to work collaboratively without significant language and experience barriers.

Additionally, Python supports the use of modules and packages, which means that programs can be designed in a modular style and code can be reused across a variety of projects. Once you have developed a module or package you need, it can be scaled for use in other projects, and it is easy to import or export these modules. One of the most promising benefits of Python is that both the standard library and the interpreter are available free of charge, in both binary and source form. There is no exclusivity either, as Python and all the necessary tools are available on all major platforms. Therefore, it is an enticing option for developers who do not want to worry about paying high development costs. Python is a programming language used to develop software on the web and in app form, including mobile. It is relatively easy to learn, and the necessary tools are available to all free of charge.


That makes Python accessible to almost anyone. Python is a general-purpose programming language, which is another way to say that it can be used for nearly everything. Most importantly, it is an interpreted language, which means that the written code is translated to a computer-readable format at runtime, whereas most programming languages perform this translation before the program is ever run.


The concept of a "scripting language" has changed considerably since its inception, because Python is now used to write large, commercial-style applications rather than just trivial ones. This reliance on Python has grown even more as the internet has gained popularity. A large majority of web applications and platforms rely on Python, including Google's search engine, YouTube, and the web-oriented transaction system of the New York Stock Exchange (NYSE). You know the language must be pretty serious when it is powering a stock exchange system. In fact, NASA uses Python when programming its equipment and space machinery.

Benefits of using Python

There are many benefits to learning Python, especially as your first language. It is a language that is remarkably easy to learn, and it can be used as a stepping stone into other programming languages and frameworks. If you are an absolute beginner and this is your first time working with any type of coding language, that is something you definitely want.

2.1.2 Python Framework

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.

The library has more than 2500 optimized algorithms, which includes a comprehensive set of both classic
and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and
recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects,
extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a
high resolution image of an entire scene, find similar images from an image database, remove red eyes from images
taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented
reality, etc. OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 18 million. The library is used extensively in companies, research groups and by governmental bodies.

Along with well-established companies like Google, Yahoo, Microsoft, Intel, IBM, Sony, Honda, Toyota
that employ the library, there are many start-ups such as Applied Minds, VideoSurf, and Zeitera, that make
extensive use of OpenCV. OpenCV’s deployed uses span the range from stitching street view images together,
detecting intrusions in surveillance video in Israel, monitoring mine equipment in China, helping robots navigate


and pick up objects at Willow Garage, detection of swimming-pool drowning accidents in Europe, running interactive art in Spain and New York, checking runways for debris in Turkey, and inspecting labels on products in factories around the world, to rapid face detection in Japan.

It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS.
OpenCV leans mostly towards real-time vision applications and takes advantage of MMX and SSE instructions
when available. Full-featured CUDA and OpenCL interfaces are being actively developed. There are over 500 algorithms and about 10 times as many functions that compose or support those algorithms. OpenCV is written natively in C++ and has a templated interface that works seamlessly with STL containers.
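As a minimal illustration of the library's Python interface, the following sketch opens a webcam and displays its frames until the q key is pressed:

    import cv2

    cap = cv2.VideoCapture(0)            # open the default webcam
    while True:
        ret, frame = cap.read()          # grab one frame
        if not ret:
            break
        cv2.imshow("frame", frame)       # show it in a window
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()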


2.2 Hardware Requirements


The following describes the hardware needed in order to execute and develop the Virtual
Mouse application:

• Computer Desktop or Laptop

The computer desktop or laptop will be utilized to run the vision software and display what the webcam has captured. A notebook, which is a small, lightweight and inexpensive laptop computer, is proposed to increase mobility. The system will use:

Processor: Core i3/i5

Main Memory: 4 GB RAM and above

Hard Disk: 320 GB

• Webcam
A webcam is a necessary component for detecting the image. The sensitivity of the mouse is directly proportional to the resolution of the camera: if the resolution of the camera is good enough, an enhanced user experience is guaranteed. The webcam serves the purpose of taking real-time images whenever the computer starts. On the basis of the gestures and motion of the fingers, the system decides the respective action.

2.3 Software Requirements

• Windows 7 and above/Linux/macOS: Microsoft Windows, commonly referred to as Windows, is a group of several proprietary graphical operating system families, all of which are developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry.

Microsoft introduced an operating environment named Windows on November 20, 1985, as a graphical operating system shell for MS-DOS in response to the growing interest in graphical user interfaces (GUIs).[5] Microsoft Windows came to dominate the world's personal computer (PC) market.

Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating
system kernel first released on September 17, 1991, by Linus Torvalds.
Linux is typically packaged in a Linux distribution.

Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name "GNU/Linux" to emphasize the importance of GNU software.

macOS is a series of proprietary graphical operating systems developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac computers. Within the market of desktop, laptop and home computers, and by web usage, it is the second most widely used desktop OS, after Microsoft Windows. macOS succeeded the classic Mac OS, a Macintosh operating system with nine releases from 1984 to 1999. During this time, Apple cofounder Steve Jobs had left Apple and started another company, NeXT, developing the NeXTSTEP platform that would later be acquired by Apple to form the basis of macOS.

Figure 2.1: Block diagram of Software working


• Anaconda Navigator: It is a free and open-source distribution of Python for scientific computing that aims to simplify package management and deployment. The distribution includes data science packages suitable for Windows, Linux and macOS.

Package versions in Anaconda are managed by the package management system conda. This package
manager was spun out as a separate open-source package as it ended up being useful on its own and for
things other than Python. There is also a small, bootstrap version of Anaconda called Miniconda, which
includes only conda, Python, the packages they depend on, and a small number of other packages.


• Imutils: a series of convenience functions to make basic image processing functions such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV, on both Python 2.7 and Python 3.

• Jupyter notebook: Project Jupyter is a non-profit organization created to "develop open-source software, open standards, and services for interactive computing across dozens of programming languages". It was spun off from IPython in 2014 by Fernando Pérez and Brian Granger. Project Jupyter's name is a reference to the three core programming languages supported by Jupyter, which are Julia, Python and R, and also a homage to Galileo's notebooks recording the discovery of the moons of Jupiter. Project Jupyter has developed and supported the interactive computing products Jupyter Notebook, JupyterHub, and JupyterLab. Jupyter is a NumFOCUS fiscally sponsored project.

• OpenCV: OpenCV (Open Source Computer Vision Library) is the open source computer vision and machine learning software library already described in Section 2.1.2; it supplies the real-time image processing algorithms used in this project, with interfaces for C++, Python, Java and MATLAB.

• NumPy: NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software with many contributors, and is a NumFOCUS fiscally sponsored project.
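For illustration, a small sketch of the array operations this project relies on; the sample values are made up:

    import numpy as np

    hues = np.array([105.0, 102.0, 108.0, 99.0, 111.0])  # sample hue readings
    print(hues.mean())   # arithmetic mean of the array
    print(hues.std())    # standard deviation, used later during calibration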

• PyAutoGUI: This library lets Python scripts control the mouse and keyboard to automate interactions with other applications. Its application programming interface is designed to be simple. It works on Windows, macOS, and Linux, and runs on Python 2 and 3. PyAutoGUI is the Python module that can automate your GUI and programmatically control your keyboard and mouse.
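A minimal sketch of the PyAutoGUI calls this project depends on; the coordinates and text are examples only:

    import pyautogui

    width, height = pyautogui.size()                          # screen resolution
    pyautogui.moveTo(width // 2, height // 2, duration=0.25)  # move pointer to the centre
    pyautogui.click()                                         # left click
    pyautogui.write('hello')                                  # type on the keyboard
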
2.4 Functional Requirements
Functional requirements are a set of statements that define the functions the system should provide, how the system should react to particular inputs, and how the system should behave in particular situations. In some cases, the functional requirements may also specify what the system should not do. A function is described as a set of inputs, the behaviour, and outputs. The functional requirements of the proposed system are as follows:

• System Input: The system should recognize the hand gestures through which the virtual mouse and keyboard work.

• Expected Output: Move the cursor according to the position of the centre of the pointer; simulate the single and double left click and the right click of the mouse.

• Expected Behaviour: Detect the pointer using the defined colour information, define the region and the centre of the pointer, and draw a bounding box around it to track the motion of the pointer.

2.5 Non-functional Requirements


A non-functional requirement specifies criteria that can be used to judge the operation of a system, rather than specific behaviours. Non-functional requirements are the constraints on the services or functions offered by the system; they do not directly affect the functioning of the system but do affect its performance. The non-functional requirements of the proposed system are as follows:

• Input should be present in the local storage.

• Increased processing speed.

• Usability: an easy interface for capturing images.

• Performance: should not take excessive time.

• Supportability: contains easy-to-understand code with provisions for future enhancement.

2.6 Interfaces
When referring to software, an interface is a program that allows a user to interact with computers, in person or over a network. An interface may also refer to controls used in a program that allow the user to interact with the program. One of the best examples of an interface is a GUI (Graphical User Interface), the type of interface you use to navigate your computer.

Internal Interfaces

The internal network interface is the computer hardware that connects the computer to a network.

External Interfaces

External interfaces are typically a product's lifeline to the outside world. Such interfaces may be used for a number of purposes, including connecting to peripherals, field programming, or testing during product manufacturing.

Ex: user interfaces, buttons, functions on every screen.

2.7 Summary
Chapter 2 considers all the system requirements needed to develop the proposed system. Section 2.1 presents the specific requirements, Section 2.2 the hardware requirements, and Section 2.3 the software requirements, such as the programming language used. Sections 2.4 and 2.5 explain the functional and non-functional requirements respectively, and Section 2.6 gives a brief overview of the interfaces.


CHAPTER 3

HIGH LEVEL DESIGN

3.1 System Design of Gesture Recognition System

System design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design can be seen as the application of systems theory to product development.


Figure 3.1: System architecture of the gesture recognition based virtual input devices.

3.2 Design Overview


Figure 3.2: Virtual Mouse Block Diagram

The colour recognition process contains two major phases: the calibration phase and the recognition phase. The purpose of the calibration phase is to allow the system to recognize the Hue-Saturation-Value (HSV) ranges of the colours chosen by the user; it stores these values and settings in text documents, to be used later during the recognition phase. In the recognition phase, the system starts to capture frames and searches for colour input based on the values recorded during the calibration phase. The phases of the virtual mouse are as shown in the figures below.


3.3 Use Case diagram

A use case diagram is a dynamic or behaviour diagram in UML. Use case diagrams model the functionality of a system using actors and use cases. Use cases are a set of actions, services, and functions that the system needs to perform. In this context, a "system" is something being developed or operated, such as a web site. The "actors" are people or entities operating under defined roles within the system. Use case diagrams are valuable for visualizing the functional requirements of a system, which translate into design choices and development priorities.

Figure 3.3: Use case diagram of the proposed system

Figure 3.3 depicts the use case diagram of the system, where the user provides a hand-gesture image captured through the webcam. The rest of the work is done by the system, which initializes by extracting the features, pre-processing the image, segmenting it, and then performing recognition.


Figure 3.4: Use case diagram of the recognition phase

Figure 3.4 illustrates the use case diagram of the recognition phase, where the user provides a hand gesture as input that is captured by the web camera. The captured image goes through frame acquisition to obtain a proper hand-gesture image, which then undergoes preprocessing in the system.

Figure 3.5: Use case diagram of the calibration phase

Figure 3.5 represents the use case diagram of the calibration phase, involving the system and the input device. The system recognizes the colour inputs using the acquisition method; the frames then undergo noise filtering, by which the undesired frames are discarded, and are forwarded to the standard deviation calculation. Using this, the desired hand gesture is recognized, the command is executed, and the desired output is displayed.

3.4 Data Flow Diagram

A data-flow diagram is a way of representing a flow of data through a process or a system (usually an information system). The DFD also provides information about the outputs and inputs of each entity and the process itself. A data-flow diagram has no control flow: there are no decision rules and no loops. Specific operations based on the data can be represented by a flowchart.

There are several notations for displaying data-flow diagrams. The notation used here was described in 1979 by Tom DeMarco as part of structured analysis.

For each data flow, at least one of the endpoints (source and/or destination) must exist in a process. The refined representation of a process can be done in another data-flow diagram, which subdivides this process into sub-processes.

The data-flow diagram is a tool that is part of structured analysis and data modeling. When using UML, the activity diagram typically takes over the role of the data-flow diagram. A special form of data-flow plan is a site-oriented data-flow plan.

Data-flow diagrams can be regarded as inverted Petri nets, because places in such networks
correspond to the semantics of data memories. Analogously, the semantics of transitions from
Petri nets and data flows and functions from data-flow diagrams should be considered equivalent.

Entity names should be comprehensible without further comments. The DFD is created by analysts based on interviews with system users. It is intended for system developers on the one hand and the project contractor on the other, so entity names should be adapted to the model domain, whether for amateur users or professionals. Entity names should be general (independent of, e.g., the specific individuals carrying out the activity), but should clearly specify the entity.

Processes should be numbered for easier mapping of, and referral to, specific processes. The numbering is arbitrary; however, it is necessary to maintain consistency across all DFD levels (see DFD hierarchy). A DFD should be clear: the recommended maximum number of processes in one DFD is 6 to 9, and the minimum is 3.


The exception is the so-called context diagram, where the only process symbolizes the modelled system and all the terminators with which the system communicates.

Figure 3.6: Data flow diagram of the recognition phase

Figure 3.6 represents the data flow diagram of the recognition phase. The user's live video of the gestures undergoes image frame acquisition to obtain the desired frames; these undergo noise filtering, and the image then undergoes binarization to obtain a binary image, from which the colour coordinates are calculated to drive the display of the desired commands.


Figure 3.7: Data flow diagram of the calibration phase

Figure 3.7 depicts the DFD of the calibration phase. Here the user's input video undergoes image frame acquisition to obtain the frames; these undergo noise filtering, where undesired background noise is removed, and the enhanced image then undergoes binarization to produce a binary image. From this binary image the colour coordinates are calculated; the HSV image undergoes the standard deviation calculation, and the calculated values drive the mouse and keyboard operations, with the result displayed on the screen.

3.5 Flow Chart

A flowchart is a type of diagram that represents a workflow or process. A flowchart can also be defined as a diagrammatic representation of an algorithm, a step-by-step approach to solving a task. The flowchart shows the steps as boxes of various kinds, and their order by connecting the boxes with arrows.


Figure 3.8: Flow chart for the gesture recognition system

Figure 3.8 shows the flowchart. A flowchart is a simple yet powerful tool to improve productivity in both personal and work life. Flowcharts can be helpful for documenting a process, presenting a solution to a problem, brainstorming ideas in a meeting, designing an operating system, explaining a decision-making process, storing information, drawing an organizational chart, creating a visual user journey, and creating a sitemap.


3.6 State Chart Diagram

A state chart diagram is a type of diagram used in computer science and related fields to
describe the behaviour of systems. State diagrams require that the system described is composed
of a finite number of states; sometimes, this is indeed the case, while at other times this is a
reasonable abstraction. Many forms of state diagrams exist, which differ slightly and have
different semantics. State diagrams are used to give an abstract description of the behaviour of a
system.

This behaviour is analyzed and represented by a series of events that can occur in one or more possible states. Hereby "each diagram usually represents objects of a single class and tracks the different states of its objects through the system". A state chart diagram shows the behaviour of classes in response to external stimuli. Specifically, a state diagram describes the behaviour of a single object in response to a series of events in a system. Sometimes it is also known as a Harel state chart or a state machine diagram. This UML diagram models the dynamic flow of control from state to state of a particular object within a system.

State chart diagrams are good at describing the behaviour of an object across several use
cases. State diagrams are not very good at describing behaviour that involves a number of objects
collaborating. As such, it is useful to combine state diagrams with other techniques. For instance,
interaction diagrams are good at describing the behaviour of several objects in a single use case,
and activity diagrams are good at showing the general sequence of actions for several objects
and use cases.

The figure below represents the state chart diagram of the gesture recognition system. The first step is to capture the video using the live video capture method; this is the initial state. The video then undergoes filtering, which includes frame noise filtering, where the background noise is subtracted. It then undergoes binarization, from frame to HSV and from HSV to the binary image, followed by colour detection, which includes colour combination and comparison. Last is the execution phase, which executes the display of the desired commands.


Figure 3.9: State Chart Diagram of proposed system

3.7 Summary

In the third chapter, the high level design of the proposed method is discussed. Section 3.1 presents the system design of the gesture recognition system. Section 3.2 gives a design overview and the basic working of the proposed system. Section 3.3 describes the use case diagrams, Section 3.4 the data flow diagrams, Section 3.5 the flow chart, and Section 3.6 the state chart diagram.

CHAPTER 4

DETAILED DESIGN

Detailed design is the process of defining the components, modules, interfaces and data for a system to satisfy specified requirements. It is the phase in which the internal logic of each of the modules specified in the high-level design is decided. In this phase further details are specified, and other low-level components and subcomponents are described as well. The detailed design of each module of our project is described below.

4.1 Structural Chart


Structural Chart is a static diagram. It represents the static view of an application. A class diagram is used not only for visualizing, describing, and documenting different aspects of a system, but also for constructing executable code of the software application.

A class diagram describes the attributes and operations of a class and also the constraints imposed on the system. Class diagrams are widely used in the modelling of object-oriented systems because they are the only UML diagrams that can be mapped directly to object-oriented languages, and thus they are widely used at the time of construction. Class diagrams are the main building blocks of every object-oriented method. The class diagram can be used to show the classes, relationships, interfaces, associations, and collaborations; since classes are the building blocks of an application based on OOP, the class diagram has an appropriate structure to represent the classes, inheritance, relationships, and everything OOP involves. It describes various kinds of objects and the static relationships between them. Below is Figure 4.1.


Figure 4.1: Class diagram of gesture recognition input devices

Figure 4.1 shows the class diagram of the image detection and classification process. First the video is captured using cv2.VideoCapture(); the image is then resized using imutils.resize() and filtered using cv2.GaussianBlur(). Next the image undergoes binary translation using cv2.cvtColor() and thresholding to a binary pattern; contours are then found and drawn using cv2.findContours() and cv2.drawContours(), and the final step is to display the output using cv2.imshow().
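Putting the calls named above together, a sketch of the per-frame pipeline might look as follows; the HSV threshold values are placeholders, and the cv2.findContours() return signature assumed here is that of OpenCV 4.x:

    import cv2
    import imutils
    import numpy as np

    cap = cv2.VideoCapture(0)                              # capture live video
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = imutils.resize(frame, width=640)           # resize
        blurred = cv2.GaussianBlur(frame, (5, 5), 0)       # filtering
        hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)     # colour-space conversion
        mask = cv2.inRange(hsv, np.array([20, 100, 100]),
                           np.array([30, 255, 255]))       # binary translation (placeholder bounds)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)   # find contours
        cv2.drawContours(frame, contours, -1, (0, 255, 0), 2)     # draw contours
        cv2.imshow("output", frame)                        # display the output
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()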

4.2 Functionality and Processing


• Real-Time Image Acquisition

The program starts off by capturing real-time images via a webcam, where it waits for the user's colour input. The acquired image is compressed to a reasonable size to reduce the load of processing the pixels within the captured frame.

• User's Colour Input Acquisition

The program acquires the frames containing the input colours submitted by the user; each captured frame is sent for processing, where it undergoes a series of transitions and calculations to acquire the calibrated HSV values.

• Frame Noise Filtering

Every captured frame contains noise that affects the performance and accuracy of the program, so the frame needs to be noise-free. To achieve this, filters are applied to the captured frames to cancel out the unwanted noise. For the current project, a Gaussian filter is used, a common smoothing method for eliminating noise in a frame. This can be done using GaussianBlur(InputArray src, OutputArray dst, Size ksize, double sigmaX, double sigmaY = 0, int borderType = BORDER_DEFAULT).

Snapshot 4.1: Comparison between the unfiltered (before) and filtered (after) frame
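In the Python interface the same call reduces to a single line; the 5x5 kernel size here is an assumed choice, and sigmaX = 0 lets OpenCV derive the sigma from the kernel size:

    import cv2
    blurred = cv2.GaussianBlur(frame, (5, 5), 0)   # 5x5 Gaussian kernel, sigma from ksize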


• HSV Frame Transition

The captured frame needs to be converted from BGR format to HSV format, which can be done using cvtColor(src, dst, CV_BGR2HSV). Colour vision can be processed using the RGB colour space or the HSV colour space. The RGB colour space describes colours in terms of the amounts of red, green, and blue present, while the HSV colour space describes colours in terms of Hue, Saturation, and Value. In situations where colour description plays an integral role, the HSV colour model is often preferred over the RGB model.

The HSV model describes colours similarly to how the human eye tends to perceive colour. RGB defines colour in terms of a combination of primary colours, whereas HSV describes colour using more familiar comparisons such as hue, vibrancy and brightness.

Snapshot 4.2: HSV image of captured image

RGB - HSV Conversion

Steps:

1. Divide r, g, b by 255.
2. Compute cmax, cmin, and the difference diff = cmax - cmin.
3. Hue calculation:
   1. if cmax equals cmin, then h = 0
   2. if cmax equals r, then h = (60 * ((g - b) / diff) + 360) % 360
   3. if cmax equals g, then h = (60 * ((b - r) / diff) + 120) % 360
   4. if cmax equals b, then h = (60 * ((r - g) / diff) + 240) % 360
4. Saturation computation:
   1. if cmax = 0, then s = 0
   2. otherwise, s = (diff / cmax) * 100
5. Value computation:
   1. v = cmax * 100
RGB - HSV Calculation

Red = 45
Green = 215
Blue = 0

r = 45/255 = 0.18, g = 215/255 = 0.84, b = 0/255 = 0

cmax = 0.84 (green)
cmin = 0
diff = 0.84

Since cmax equals g:

H = (60 * ((b - r) / diff) + 120) % 360
  = (60 * ((0 - 0.18) / 0.84) + 120) % 360
  = 107.4

S = (diff / cmax) * 100
  = (0.84 / 0.84) * 100
  = 100

V = cmax * 100
  = 0.84 * 100
  = 84

H = 107.4, S = 100, V = 84
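The steps above translate directly into code. This sketch reproduces the worked example; for RGB = (45, 215, 0) it prints approximately (107.4, 100.0, 84.3):

    def rgb_to_hsv(r, g, b):
        r, g, b = r / 255.0, g / 255.0, b / 255.0       # step 1
        cmax, cmin = max(r, g, b), min(r, g, b)         # step 2
        diff = cmax - cmin
        if diff == 0:                                   # step 3: hue
            h = 0
        elif cmax == r:
            h = (60 * ((g - b) / diff) + 360) % 360
        elif cmax == g:
            h = (60 * ((b - r) / diff) + 120) % 360
        else:
            h = (60 * ((r - g) / diff) + 240) % 360
        s = 0 if cmax == 0 else (diff / cmax) * 100     # step 4: saturation
        v = cmax * 100                                  # step 5: value
        return h, s, v

    print(rgb_to_hsv(45, 215, 0))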

• HSV Values Extraction

In order to acquire the HSV values, the converted frame needs to be split into three single-channel planes; that is, the frame is divided from a multi-channel array into single-channel arrays, which can be done using split(const Mat& src, Mat* mvbegin).

Hues are the three primary colours (red, blue, and yellow) and the three secondary colours (orange, green, and violet) that appear on the colour wheel or colour circle. When you refer to hue, you are referring to pure colour, or the visible spectrum of basic colours that can be seen in a rainbow.

• Standard Deviation Calculation

To obtain the maximum and minimum HSV values, the collected values go through a standard deviation calculation, a measurement used to quantify the amount of variation or dispersion among the HSV values. Furthermore, to obtain an accurate range of values, the three-sigma rule is applied in the calculation, so that the captured values have a very high probability of falling within the three-sigma interval.
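A sketch of how the three-sigma range might be derived with NumPy; it assumes hsv already holds a calibration frame containing mostly marker pixels:

    import cv2
    import numpy as np

    h, s, v = cv2.split(hsv)                     # single-channel planes
    h = h.astype(float)
    h_min = max(h.mean() - 3 * h.std(), 0)       # three-sigma lower bound
    h_max = min(h.mean() + 3 * h.std(), 179)     # OpenCV stores hue as 0-179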
• Webcam & Variables Initialization

On the early stage of the recognition phase, the program will initialize the required variables
which will be used to hold different types of frames and values where each are will be used to
carry out certain task. Furthermore, this is the part where the program collects the calibrated HSV
values and settings where it will be used later during the transitions of Binary Threshold.

• Real-Time Image Acquisition

The real-time image is captured from the webcam using cv::VideoCapture cap(0);, where every captured image is stored in a frame variable (cv::Mat), which is flipped and compressed to a reasonable size to reduce the processing load.

• Frame Noise Filtering

Similar to the noise filtering during the calibration phase, a Gaussian filter is applied to reduce the noise in the captured frames. This can be done using GaussianBlur(InputArray src, OutputArray dst, Size ksize, double sigmaX, double sigmaY = 0, int borderType = BORDER_DEFAULT).


The 3x3 Gaussian mask used is:

         1 2 1
1/16  x  2 4 2
         1 2 1

Example neighbourhood:

7 9 5
4 6 8
2 0 1

Gaussian filter = 1/16 [7x1 + 9x2 + 5x1 + 4x2 + 6x4 + 8x2 + 2x1 + 0x2 + 1x1]
                = 1/16 [81]
                = 5.06
                ≈ 5

* used to blur the edges and reduce contrast

* similar to the median filter, but faster

• Binary Threshold Transition

The converted HSV frame undergoes a range check to determine whether its HSV values lie between the values of the HSV variables gathered during the calibration phase. The result of the range check converts the frame into a binary threshold image: a pixel of the frame is set to 255 (1) if it lies within the specified HSV values, and set to 0 otherwise.


Snapshot 4.3: Indications of the original captured frame and the converted Binary Threshold

Calculation of Binary Threshold

A simple thresholding example would be selecting a threshold value T, and then setting all pixel intensities less than T to 0, and all pixel values greater than or equal to T to 255. By thresholding the image using HSV, we can separate the image into the vision target (foreground) and the other things that the camera sees (background). The code sketch after Figure 4.2 converts an HSV image into a binary image by thresholding with HSV values.

Suppose T = 5 (values >= T are set to 255, values < T to 0):

Input:         Output:

7 9 5          255 255 255
4 6 8            0 255 255
2 0 1            0   0   0
Figure 4.2 : Binary Threshold
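A minimal sketch of this HSV thresholding; the bounds shown are the ones used in the masking code of Section 5.4, not universal constants, and `frame` is assumed to be a captured frame:

import cv2 as cv
import numpy as np

hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)
lower = np.array([100, 128, 0])               # calibrated lower HSV bound
upper = np.array([215, 255, 255])             # calibrated upper HSV bound
binary = cv.inRange(hsv, lower, upper)        # 255 inside the range, 0 outside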
• Binary Threshold Morphological Transformation


After the binary threshold is obtained, the frame undergoes a process called Morphological Transformation, which is a structuring operation to eliminate any holes and small objects lurking around the foreground. The transformation consists of two morphological operators, known as Erosion and Dilation.

The Erosion operator is responsible for eroding the boundaries of the foreground object, decreasing the region of the binary threshold, which is useful for removing small noise. Dilation is the opposite of erosion: it increases the region of the binary threshold, allowing an eroded object to return to its original form.

For the current project, both operators were used for morphological Opening and Closing, where Opening consists of an erosion followed by a dilation, which is very useful in removing noise, whereas Closing is the opposite of Opening, which is useful in closing small holes inside the foreground object. A morphological closing operation is then performed on the obtained binary image.

The morphological close operation is a dilation followed by an erosion. It groups together pixels in close proximity to form a single object. The result is a binary image showing only the moved blue objects. The factors of the neighbourhood, such as shape and size, can be decided by the programmer, thereby constructing programmer-defined morphological operations for the input image.

The most basic morphological operations are dilation and erosion. Dilation adds pixels to the objects in an image, while erosion removes pixels on object boundaries. According to the structuring element used, the number of pixels added or removed will differ.

Figure 4.3 Dilation and Erosion
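Continuing from the thresholding sketch above, the opening and closing steps can be sketched with the same kernel sizes as the code in Section 5.4:

import cv2 as cv
import numpy as np

# Opening: erosion followed by dilation, removes small noise specks
opened = cv.morphologyEx(binary, cv.MORPH_OPEN, np.ones((5, 5), np.uint8))
# Closing: dilation followed by erosion, fills small holes in the foreground
closed = cv.morphologyEx(opened, cv.MORPH_CLOSE, np.ones((20, 20), np.uint8))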


• Colour Combination Comparison

After obtaining results from the Morphological Transformation process, the program calculates the remaining number of objects by highlighting them as blobs; this process requires the cvBlob library, which is an add-on to OpenCV.

The results of the calculation are then sent for comparison to determine the mouse functions based on the colour combinations found within the captured frames.

A Blob, in a sense, is anything that is considered a large object, or anything bright on a dark background. In images, we can generalize it as a group of pixel values that forms a colony, or a large object that is distinguishable from its background. Using image processing, we can detect such blobs in an image.

Figure 4.4 Binary Large Objects [BLOBs]

• Colours' Coordinates Acquisition

For every object within the binary threshold, the program will highlight the overall shape of the object (cvRenderBlobs(const IplImage *imgLabel, CvBlobs &blobs, IplImage *imgSource, IplImage *imgDest, unsigned short mode=0x000f, double alpha=1.);), where it calculates the area of the shape and the coordinates of the midpoint of the shape.

The coordinates are saved and used later either to set cursor positions, or to calculate the distance between two points to execute various mouse functions based on the result collected.
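A comparable sketch using plain OpenCV contours instead of the cvBlob add-on, to show how the area and midpoint of each object can be obtained:

import cv2 as cv

# `binary` is the mask produced by the thresholding and morphology steps above
conts, _ = cv.findContours(binary, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
for c in conts:
    m = cv.moments(c)
    if m["m00"] == 0:                 # skip degenerate contours
        continue
    area = m["m00"]                   # blob area
    cx = int(m["m10"] / m["m00"])     # x coordinate of the blob midpoint
    cy = int(m["m01"] / m["m00"])     # y coordinate of the blob midpoint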

• Execution of Mouse Action

The program executes the mouse actions based on the colour combinations existing in the processed frame. The mouse actions are performed according to the coordinates provided by the program, and the program continues to acquire and process the next real-time image until the user exits from the program.
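A minimal sketch of how a blob midpoint could be mapped to a cursor position with pynput; the screen size, and the click threshold of 40 pixels, are assumptions, while the capture size matches the initialization code in Chapter 5:

import numpy as np
from pynput.mouse import Button, Controller

mouse = Controller()
sx, sy = 1920, 1080      # assumed screen size; the project reads it via wx.GetDisplaySize()
camx, camy = 320, 240    # capture size, matching the Chapter 5 initialization code

def perform_mouse_action(cx, cy, other=None):
    # Scale camera-space blob coordinates to screen coordinates
    mouse.position = (int(cx * sx / camx), int(cy * sy / camy))
    # With two tracked colours, a short distance between them can trigger a click
    if other is not None and np.hypot(cx - other[0], cy - other[1]) < 40:  # assumed threshold
        mouse.click(Button.left)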

4.3 Activity Diagram

Activity diagram is another important diagram in UML to describe the dynamic aspects
of the system. Activity diagram is basically a flowchart to represent the flow from one activity
to another activity. The activity can be described as an operation of the system.
The control flow is drawn from one operation to another. This flow can be sequential, branched, or concurrent. Activity diagrams deal with all types of flow control by using different elements such as fork, join, etc.

The basic purposes of activity diagrams are similar to those of the other four diagrams: to capture the dynamic behaviour of the system. The other four diagrams are used to show the message flow from one object to another, but the activity diagram is used to show the flow from one activity to another.

An activity is a particular operation of the system. Activity diagrams are not only used for visualizing the dynamic nature of a system, but also to construct the executable system by using forward and reverse engineering techniques. The only thing missing in the activity diagram is the message part: it does not show any message flow from one activity to another. The activity diagram is sometimes considered a flowchart. Although the diagrams look like a flowchart, they are not; they show different flows such as parallel, branched, concurrent, and single.

Activity diagrams are mainly used as a flowchart that consists of activities performed
by the system. Activity diagrams are not exactly flowcharts as they have some additional
capabilities. These additional capabilities include branching, parallel flow, swimlane, etc.

Before drawing an activity diagram, we must have a clear understanding about the
elements used in activity diagram. The main element of an activity diagram is the activity itself.
An activity is a function performed by the system. After identifying the activities, we need to
understand how they are associated with constraints and conditions.


In the below Figure 4.5 the activity diagram of the system is given, where step-by-step procedures are followed to recognize the user's gesture. Here the user presents a gesture to the webcam; if any error occurs or any interruption happens, the process stops right at that moment and the user must restart the procedure.

Figure 4.5: Activity Diagram of gesture recognition input devices

If the frame is successfully captured, the system takes the required steps to identify the gesture appearing in the frame. The process involves pre-processing, feature extraction, segmentation and recognition, and at last the corresponding input action is generated.


4.4 Sequence Diagram


A sequence diagram shows object interaction arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the logical view of the system under development.

Sequence diagrams are sometimes called event diagrams or event scenarios.

Sequence Diagrams are interaction diagrams that detail how operations are carried out. They capture the interaction between objects in the context of a collaboration. Sequence Diagrams are time-focused: they show the order of the interaction visually by using the vertical axis of the diagram to represent time, indicating what messages are sent and when. They are a popular dynamic modelling solution in UML because they specifically focus on lifelines, i.e. the processes and objects that live simultaneously.

Figure 4.6: Sequence Diagram


4.5 Summary
The fourth chapter gives the detailed design of the system. Section 4.1 shows the structure chart for the system and section 4.2 gives the class diagram. Section 4.3 gives the Activity Diagram and section 4.4 gives the Sequence Diagram.


CHAPTER 5

IMPLEMENTATION REQUIREMENTS

In this project, hardware usage is minimal. We use a personal computer or a laptop (Intel i3/i5, 2.4 GHz) with 4/8 GB of RAM. A webcam is used throughout the project for gesture recognition.

When it comes to software implementation, an operating system is mandatory for every system to work. In this project we are using Windows 10 as the operating system, although Windows XP / Windows 7 can be used. Python is used as the programming language to code the program. The OpenCV library, a freely available Python library, is used for hand gesture detection and recognition. It can easily be installed on Anaconda using the pip install command. An image processing toolbox is used for image processing.
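For reference, the libraries used in this report can typically be installed from an Anaconda prompt as follows; the package names are as published on PyPI, and the exact set is an assumption based on the imports used later in this chapter:

pip install opencv-python numpy pynput pyautogui wxPython imutils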

5.1 Implementation Requirement

To recognize the hand gesture, the software used is as follows:
➢ Python is used as the programming language.
➢ Windows 10 is the operating system.
➢ Input device operations are based on gestures.

5.2 Programming Language Selection

The main component required to develop the proposed method is the Python programming language, which provides both the execution platform and the programming idiom.

5.2.1 Python Language


In technical terms, Python is an object-oriented, high-level programming
language with integrated dynamic semantics primarily for web and app development.
It is extremely attractive in the field of Rapid Application Development because it offers
dynamic typing and dynamic binding options.


Python is relatively simple, so it's easy to learn since it requires a unique syntax
that focuses on readability. Developers can read and translate Python code much easier
than other languages. In turn, this reduces the cost of program maintenance and
development because it allows teams to work collaboratively without significant
language and experience barriers.

Additionally, Python supports the use of modules and packages, which means
that programs can be designed in a modular style and code can be reused across a variety
of projects. Once you've developed a module or package you need, it can be scaled for
use in other projects, and it's easy to import or export these modules.

One of the most promising benefits of Python is that both the standard library
and the interpreter are available free of charge, in both binary and source form. There
is no exclusivity either, as Python and all the necessary tools are available on all major
platforms. Therefore, it is an enticing option for developers who don't want to worry
about paying high development costs.

If this description of Python went over your head, don't worry. You'll understand it soon enough. What you need to take away from this section is that Python is a programming language used to develop software on the web and in app form, including mobile. It's relatively easy to learn, and the necessary tools are available to all free of charge.

That makes Python accessible to almost anyone. If you have the time to learn,
you can create some amazing things with the language.

Python is a general-purpose programming language, which is another way to say that it can be used for nearly everything. Most importantly, it is an interpreted language, which means that the written code is not translated to a computer-readable format until runtime, whereas most programming languages do this conversion before the program is even run. This type of language is also referred to as a "scripting language" because it was initially meant to be used for trivial projects.
The concept of a "scripting language" has changed considerably since its


inception, because Python is now used to write large, commercial style applications,
instead of just banal ones. This reliance on Python has grown even more so as the
internet gained popularity. A large majority of web applications and platforms rely on
Python, including Google's search engine, YouTube, and the web-oriented transaction
system of the New York Stock Exchange (NYSE). You know the language must be
pretty serious when it's powering a stock exchange system. In fact, NASA actually uses
Python when they are programming their equipment and space machinery.

Benefits of using Python


There are many benefits of learning Python, especially as your first language, which we
will discuss.

It is a language that is remarkably easy to learn, and it can be used as a stepping stone
into other programming languages and frameworks. If you're an absolute beginner and this is
your first time working with any type of coding language, that's something you definitely
want.

Python is widely used, including by a number of big companies like Google, Pinterest, Instagram, Disney, Yahoo!, Nokia, IBM, and many others. The Raspberry Pi, which is a mini computer and DIY lover's dream, relies on Python as its main programming language too. You're probably wondering why either of these things matters, and that's because once you learn Python, you'll never have a shortage of ways to utilize the skill. Not to mention, since a lot of big companies rely on the language, you can make good money as a Python developer. Other benefits include:

1) Python can be used to develop prototypes quickly, because it is so easy to work with
and read.
2) Most automation, data mining, and big data platforms rely on Python. This is because it
is the ideal language to work with for general purpose tasks.
3) Python allows for a more productive coding environment than massive languages like C#
and Java. Experienced coders tend to stay more organized and productive when working
with Python, as well.
4) Python is easy to read, even if you're not a skilled programmer. Anyone can
begin working with the language, all it takes is a bit of patience and a lot of


practice. Plus, this makes it an ideal candidate for use among multi-programmer
and large development teams.
5) Python powers Django, a complete and open source web application
framework. Frameworks - like Ruby on Rails - can be used to simplify the
development process.
6) It has a massive support base thanks to the fact that it is open source and community
developed. Millions of like-minded developers work with the language on a daily
basis and continue to improve core functionality. The latest version of Python
continues to receive enhancements and updates as time progresses. This is a great way
to network with other developers.

5.2.2 OpenCV
OpenCV (Open Source Computer Vision Library) is an open source computer vision
and machine learning software library. OpenCV was built to provide a common
infrastructure for computer vision applications and to accelerate the use of machine
perception in the commercial products. Being a BSD-licensed product, OpenCV makes it
easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which includes a
comprehensive set of both classic and state-of-the-art computer vision and machine
learning algorithms. These algorithms can be used to detect and recognize faces, identify
objects, classify human actions in videos, track camera movements, track moving objects,
extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images
together to produce a high resolution image of an entire scene, find similar images from an
image database, remove red eyes from images taken using flash, follow eye movements,
recognize scenery and establish markers to overlay it with augmented reality, and so on. OpenCV has a user community of more than 47 thousand people, and the estimated number of downloads exceeds 18 million. The library is used extensively in companies, research groups and by governmental bodies.

Along with well-established companies like Google, Yahoo, Microsoft, Intel, IBM, Sony,
Honda, Toyota that employ the library, there are many start-ups such as Applied Minds,
VideoSurf, and Zeitera, that make extensive use of OpenCV. OpenCV’s deployed uses
span the range from stitching street view images together, detecting intrusions in
surveillance video in Israel, monitoring mine equipment in China, helping robots navigate


and pick up objects at Willow Garage, detection of swimming pool drowning accidents in
Europe, running interactive art in Spain and New York, checking runways for debris in
Turkey, inspecting labels on products in factories around the world on to rapid face
detection in Japan.

It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS. OpenCV leans mostly towards real-time vision applications and takes advantage of MMX and SSE instructions when available. Full-featured CUDA and OpenCL interfaces are being actively developed. There are over 500 algorithms and about 10 times as many functions that compose or support those algorithms. OpenCV is written natively in C++ and has a templated interface that works seamlessly with STL containers.

5.3 Main Packages


Some of the main packages used in this project are mentioned below.

Packages                        Description

import cv2 as cv                cv2 is a library of Python bindings designed to
                                solve computer vision problems.

pynput                          This library allows you to control and monitor
                                input devices. Currently, mouse and keyboard
                                input and monitoring are supported.

from pynput.mouse import ...    The package pynput.mouse contains classes for
                                controlling and monitoring the mouse.

import wx                       In a GUI interface, the input is most commonly
                                collected in a textbox where the user can type
                                using the keyboard; wxPython provides these GUI
                                objects through the wx module.

NumPy                           A Python library used for working with arrays,
                                with functions for working in the domain of
                                linear algebra and matrices. NumPy stands for
                                Numerical Python; it was created in 2005 by
                                Travis Oliphant, is open source and free to
                                use, and provides fast mathematical computation
                                on arrays and matrices.

OpenCV                          Used for all sorts of image and video analysis,
                                like facial recognition and detection, license
                                plate reading, photo editing, advanced robotic
                                vision, optical character recognition, and a
                                whole lot more.

imutils                         A series of convenience functions to make basic
                                image processing functions such as translation,
                                rotation, resizing, skeletonization, and
                                displaying Matplotlib images easier with OpenCV
                                on both Python 2.7 and Python 3.

Python                          A high-level, interpreted, interactive,
                                object-oriented programming language, written
                                in a way that is easy to understand, with a lot
                                of English keywords.

pyautogui                       This library lets your Python scripts control
                                the mouse and keyboard to automate interactions
                                with other applications. The API is designed to
                                be as simple as possible. It works on Windows,
                                macOS, and Linux, and runs on Python 2 and 3.

json                            Python comes with a built-in package called
                                json for encoding and decoding JSON data.

5.4 Main user defined functions

A User-Defined Function (UDF) is a function provided by the user of a program


or environment, in a context where the usual assumption is that functions are built into
the program or environment. Below mentioned are the most important user-defined
functions used.
➢ Getting an input image from the user
The code given below represents getting an input image from the user
# Imports needed by this snippet (added for completeness)
import cv2 as cv
import numpy as np
import wx
from pynput.mouse import Controller

mouse = Controller()              # pynput mouse controller
app = wx.App(False)
(sx, sy) = wx.GetDisplaySize()    # screen resolution
(camx, camy) = (320, 240)         # capture resolution
cam = cv.VideoCapture(0)          # open the default webcam
cam.set(3, camx)                  # property 3: frame width
cam.set(4, camy)                  # property 4: frame height
mlocold = np.array([0, 0])        # previous smoothed cursor location
mouseloc = np.array([0, 0])       # current cursor location
damfac = 3                        # damping factor to smooth cursor movement
pinch_flag = 0                    # whether a pinch (click) is in progress

➢ Masking technique

The code given below represents the masking technique:

hsv_img = cv.cvtColor(img, cv.COLOR_BGR2HSV)
mask = cv.inRange(hsv_img, np.array([100, 128, 0]), np.array([215, 255, 255]))
# cv.imshow("mask", mask)
mask_open = cv.morphologyEx(mask, cv.MORPH_OPEN, np.ones((5, 5)))           # remove specks
# cv.imshow("mask open", mask_open)
mask_close = cv.morphologyEx(mask_open, cv.MORPH_CLOSE, np.ones((20, 20)))  # fill holes
# cv.imshow("mask close", mask_close)
mask_final = mask_close
# cv.imshow("mask final", mask_final)
conts, _ = cv.findContours(mask_final.copy(), cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)


➢ Background subtraction

The code given below represents the background subtraction step:

img = cv.GaussianBlur(img, (5, 5), 0)       # smooth the frame to reduce noise
hsv_img = cv.cvtColor(img, cv.COLOR_BGR2HSV)
mask = cv.inRange(hsv_img, np.array([100, 128, 0]), np.array([215, 255, 255]))
mask_open = cv.morphologyEx(mask, cv.MORPH_OPEN, np.ones((5, 5)))
mask_close = cv.morphologyEx(mask_open, cv.MORPH_CLOSE, np.ones((20, 20)))
mask_final = mask_close
conts, _ = cv.findContours(mask_final.copy(), cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
cv.drawContours(img, conts, -1, (255, 0, 0), 3)   # outline detected objects in blue


CHAPTER 6

TESTING

Testing is an important phase in the development life cycle of the product. This is the phase where errors remaining from all the previous phases are detected. Hence testing plays a very critical role in quality assurance and in ensuring the reliability of the software. During testing, the program to be tested was executed with a set of test cases, and the output of the program for each test case was evaluated to determine whether the program was performing as expected. Errors were found and corrected by using the following testing steps, and the corrections were recorded for future reference. Thus, a series of tests was performed on the system before it was ready for implementation.

Test Environment
A testing environment is a setup of software and hardware on which the testing team performs the testing of the newly built software product. This setup consists of a physical setup, which includes the hardware, and a logical setup, which includes the server operating system, client operating system, database server, front-end running environment, browser, or any other software components required to run the product.
This testing setup is to be built on both the server and the client. The software was then tested on the following platforms:

 Anaconda Prompt

 Windows Operating System (OS)

• Test Cases
A test case is a document which has a set of test data, preconditions, expected result and
post conditions for a particular test scenario in order to verify compliance against specific
requirements.
 Features to be tested

 Items to be tested

 Purpose of testing

 Pass/Fail Criteria

6.1 Unit testing

Unit testing is the testing of individual hardware or software units or groups of related units. Using white box testing techniques, testers verify that the code does what it is intended to do at a very low structural level.

Unit testing is generally done within a class or a component. Unit testing focuses verification effort on the unit of software design (the module). Using the unit test plans prepared in the design phase of the system development as a guide, important control paths are tested to uncover errors within the boundary of the modules.

Each unit in this project was thoroughly tested to check if it might fail in any possible situation. This testing was carried out at the completion of each unit. At the end of the unit testing phase, each unit was found to be working satisfactorily with regard to the expected output from the module. Figure 6.1 shows the possible unit test cases.

Figure 6.1: Unit testing


6.2 Integration Testing

Integration testing is the testing in which software components, hardware components, or both are combined and tested to evaluate the interaction between them. Integration testing is the process of testing the interface between two software units or modules. It focuses on determining the correctness of the interface. The purpose of integration testing is to expose faults in the interaction between integrated units. Once all the modules have been unit tested, integration testing is performed.

Using both black box and white box testing techniques, the tester verifies that units work together when they are integrated into a larger code base.

Data can be lost across an interface; one module can have an adverse effect on another's sub-functions; when combined, modules may not produce the desired major function; and global data structures can present problems. Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with the interfaces.

Integration testing is performed to expose defects in the interfaces and in the interactions between components or systems. Upon completion of unit testing, the units or modules are integrated, which gives rise to integration testing. Figure 6.2 shows the test cases for integration testing. The related modules are combined and tested.


Figure 6.2: Integration testing

6.3 System Testing

System testing is the testing conducted on a complete, integrated system to evaluate the system's compliance with its specified requirements. System testing involves putting the new program in many different environments to ensure that the program works in typical customer environments with various versions and types of operating systems and/or applications. Figure 6.3 shows the system testing for the project.

Figure 6.3: System testing

6.4 Summary

This chapter presents unit testing, integration testing and system testing, which consist of test cases for the various modules of the recognition system.


CHAPTER 7
RESULTS AND DISCUSSIONS
7.1 Snapshots

Fig: Code on Anaconda prompt to run the application

Fig: Virtual Keyboard


Fig: Typing using Virtual Keyboard

Fig: Virtual Mouse


Fig: Color Object Detection

7.2 SUMMARY
This chapter shows the snapshots of the results obtained in the proposed work, which give an overall idea of the working of the system.


CHAPTER 8
CONCLUSION AND FUTURE ENHANCEMENTS

Conclusion
In conclusion, it is no surprise that the physical mouse will be replaced by a virtual, non-physical mouse in Human-Computer Interaction (HCI), where every mouse movement can be executed with a swipe of your fingers anywhere and anytime, without any environmental restrictions. This project has developed a colour recognition program with the purpose of replacing the generic physical mouse without sacrificing accuracy and efficiency; it is able to recognize colour movements and combinations and translate them into actual mouse functions. Because accuracy and efficiency play an important role in making the program as useful as an actual physical mouse, a few techniques had to be implemented.

First and foremost, the coordinates of the colours that are in charge of handling the cursor movements are averaged over a collection of coordinates; the purpose of this technique is to reduce and stabilize the sensitivity of cursor movements, as a slight movement might otherwise lead to unwanted cursor motion. Other than that, several colour combinations were implemented, with the addition of distance calculations between the two colours within a combination, as different distances trigger different mouse functions. The purpose of this implementation is to make controlling the program convenient and hassle-free, so that actual mouse functions can be triggered accurately with minimal trial and error.

Furthermore, to promote efficient and flexible tracking of colours, a calibration phase was implemented; this allows the users to choose their own colours for different mouse functions, as long as the selected colours don't fall within the same or similar RGB values (e.g. blue and sky-blue). Adaptive calibration was implemented as well: it basically allows the program to save different sets of HSV values from different angles, which are used later during the recognition phase.

Overall, modern technologies have come a long way in making society's life better in terms of productivity and lifestyle. Therefore, society must not cling to past technologies while remaining reluctant to accept the newer ones; instead, it is advisable to embrace changes in order to have a more efficient and productive lifestyle.


Future Enhancements

There are several features and improvements needed in order for the program to be more user-friendly, accurate, and flexible in various environments. The following describes the required improvements and features:
a) Smart Recognition Algorithm
Because the current recognition process is limited to a radius of 25 cm, an adaptive zoom-in/out function is required to improve the covered distance, so that the program can automatically adjust the focus rate based on the distance between the user and the webcam.
b) Better Performance
The response time relies heavily on the hardware of the machine; this includes the processing speed of the processor, the size of the available RAM, and the capabilities of the webcam. Therefore, the program may perform better when running on a decent machine with a webcam that performs well under different types of lighting.


