Special Issue - 2020 International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181
NCETESFT - 2020 Conference Proceedings

Air-Written Digits Recognition using CNN for Device Control

Easily predict handwritten digits drawn in the air using a convolutional neural network to control an IR device

Akshay J R
Department of Computer Science and Engineering
Cambridge Institute of Technology
Bengaluru, India

Varalatchoumy M
Associate Professor, Department of Computer Science and Engineering
Cambridge Institute of Technology
Bengaluru, India

Hitesh C
Department of Computer Science and Engineering
Cambridge Institute of Technology
Bengaluru, India

Chethan V
Department of Computer Science and Engineering
Cambridge Institute of Technology
Bengaluru, India

Jagadish C
Department of Computer Science and Engineering
Cambridge Institute of Technology
Bengaluru, India

Abstract - Air-written digits provide a complementary modality for general human-computer interaction. They are intended to be simple, so that users can easily learn and perform them. An air-writing system refers to writing a semantic character or digit in free space by moving a marker or some other handheld device. It is particularly relevant where conventional pen-up and pen-down writing systems are impractical, and because of its simple writing style it has a clear advantage over gesture-based systems. However, recognition is difficult because of non-uniform characters and diverse writing styles. In this project, we developed an air-written digit recognition system that uses a camera to track a marker and recognize the digits; the recognized digits are then sent to an IR module, which can be used to control any device that supports IR signals. For better feature selection, we processed the data using erosion and morphological pre-processing techniques, and employed a convolutional neural network (CNN), a deep learning algorithm for image recognition. The model was tested and verified on a self-generated dataset. To evaluate the robustness of the system, the recognition model and the IR module were tested with 700 test samples and more than 5 IR devices respectively. The results verify that the proposed model is invariant across digits and IR devices.

Keywords - Air-written digits, Camera, CNN, IR module.

I. INTRODUCTION

Humans interact with electronic devices such as a TV using a remote controller, but communication with such devices can be made easier by an air-written digit recognition system. Air-written digit recognition empowers users to communicate with a machine and interact naturally without any mechanical device, where air-writing is the process of writing something in 3D space using characters, gestures or trajectory data. It allows users to write in a touchless framework. It is particularly valuable where conventional writing is troublesome, for example in gesture-based communication, augmented reality (AR) and virtual reality (VR). In gesture-based writing, the number of gestures is constrained by human posture; it is possible to expand the set of gestures by combining them, but remembering such combinations is difficult for new users. In air writing, on the other hand, users can write digits in the same familiar form as the traditional digits 0-9. Using this idea of air-written digit recognition, it is possible to point a marker or finger towards the camera to perform the desired task. So, in this project, a camera was used to gather the air-written digit data; once this data is collected, erroneous, temporal and spatial noise is removed by applying filters and normalization techniques such as erosion and other morphological operations. However, zigzag effects may persist due to the peculiar writing style: users write digits in 3D space inside an imaginary box that is not fixed, and no delimiter distinguishes the boundary of the region of interest. The digits are therefore unstable and unwanted strokes are drawn, which makes the recognition task more challenging. Hence, an efficient deep learning algorithm, the convolutional neural network (CNN), was deployed to predict the digit drawn in the air; a CNN was used mainly because it works well with image datasets. The main contributions of this project are as follows. (1) A proficient deep learning model was designed that performs air-written digit recognition with an accuracy rate of 93%. (2) An IR device was successfully controlled using the predicted digits.

II. RELATED WORK

Past attempts to handle the air-writing problem depended on multi-camera arrangements to estimate depth, on depth sensors such as LEAP Motion [1] and Kinect [2], or on motion-control devices such as Myo [3] and other wearable gesture-control devices. While these approaches offer simpler tracking and better precision, they suffer in terms of cost-effective general-purpose usage because of their reliance on external instruments.


Schick et al. [4] proposed a sensor-free, marker-less system using multiple cameras for 3D hand tracking, followed by recognition with a Hidden Markov Model (HMM). This method is reported to attain a recognition rate of 86.15% for characters and 97.54% for words. Chen et al. [5], [6] used LEAP Motion for tracking and an HMM for recognition, and reported an error rate of 0.8% for word recognition and 1.9% for letter recognition. Dash et al. [7] used the Myo armband sensor with a novel fusion model architecture combining one convolutional neural network (CNN) and two Gated Recurrent Units (GRU). The fusion model is reported to surpass other commonly used models, such as support vector machines and k-nearest neighbours, achieving an accuracy of 91.7% in a person-independent evaluation and 96.7% in a person-dependent evaluation.

III. PROPOSED METHODOLOGIES

All the existing systems above make use of either special-purpose sensors or a dedicated camera setup; such setups increase the overall cost of the system into which they are built, so the mainstream adoption of these systems is restricted. To overcome these restrictions, a generic video camera was used to implement an air-writing system, which can be used with any device (such as a TV) that has an in-built or attached video camera.

In order to achieve this aim, the project concentrates on constructing a machine consisting of a camera and an IR circuit, interconnected with each other using an Arduino and a Raspberry Pi.

Fig. 1 represents the architecture of the system described above, consisting of a Raspberry Pi, camera, Arduino, and IR transmitter and receiver.

Fig. 1. Representation of System Architecture

The first layer of the architecture is the User layer, which comprises the people who interact with the framework to obtain the required results. The next layer is the Raspberry Pi, which consists of the local host, the database, the camera and the CNN model for digit recognition. The local host serves web pages developed using HTML, CSS and Python on a Django server. These web pages are accessed by the user through a browser to record a new remote, or to select or delete an existing one. The Raspberry Pi also holds a database that stores the information of all the remotes recorded by the user; this information is the set of IR codes of the buttons of each remote, stored in a '.txt' file named after the remote itself. After selecting an appropriate remote, the user draws a digit in the air; the camera captures this air-written digit and sends it to the CNN module for prediction. The CNN module consists of a dedicated CNN model for handwritten digit recognition, which classifies the digit drawn in the air. The predicted digit is then converted to the respective IR code using the IR database. This IR code is sent to the Arduino through a serial port; the Arduino hosts the IR module, built on the IR library "irremote.h", through which the IR code is transmitted to the respective IR device with the help of an IR LED.

A. Marker Segmentation

The major problem we faced was the high variability of human skin color, which made it difficult to separate the hand from the background using a color-based segmentation technique. To overcome this difficulty, a simplistic approach was adopted in the present work: the feasibility of the air-writing system is demonstrated by ignoring finger detection and hand segmentation, and instead using a marker of a fixed color for writing the digits in the air, which is a three-dimensional space.
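As an illustration of this fixed-color-marker approach, the sketch below shows one way the segmentation could be done with OpenCV, combining a color threshold with the erosion and morphological filtering mentioned earlier. The HSV bounds, kernel size and function names are assumptions for a blue marker cap rather than values taken from this paper.

```python
import cv2
import numpy as np

# Assumed HSV range for a blue marker cap; tune for the actual marker color.
LOWER_HSV = np.array([100, 150, 50])
UPPER_HSV = np.array([130, 255, 255])
KERNEL = np.ones((5, 5), np.uint8)

def marker_mask(frame_bgr):
    """Return a binary mask of the fixed-color marker in a camera frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_HSV, UPPER_HSV)
    # Erosion followed by a morphological closing removes speckle noise and fills small gaps.
    mask = cv2.erode(mask, KERNEL, iterations=1)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, KERNEL)
    return mask

def marker_tip(mask):
    """Return the (x, y) centre of the largest marker blob, or None if nothing is found."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    c = max(contours, key=cv2.contourArea)
    M = cv2.moments(c)
    if M["m00"] == 0:
        return None
    return (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))
```

The centroid returned by a helper like marker_tip would then be appended, frame by frame, to the trajectory used in the next subsection.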
B. Digits Recognition

The marker-tip trajectory of an air-written digit is projected from three-dimensional space onto a two-dimensional image plane, and a pre-trained convolutional neural network (CNN) is used to predict the written character from the resulting projected image. As part of this work, a dataset of air-written digits was created; the CNN model was trained on this dataset, and the trained model was later used to predict the air-written digit.
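A minimal sketch of this projection-and-prediction step is given below, assuming the per-frame marker-tip positions have already been collected and that a Keras model trained on 28x28 air-written digit images is available; the model file name and drawing parameters are illustrative, not taken from the paper.

```python
import cv2
import numpy as np
from tensorflow import keras

# Assumed: a CNN already trained on 28x28 air-written digit images (illustrative file name).
model = keras.models.load_model("air_digit_cnn.h5")

def predict_digit(trajectory, frame_size=(480, 640)):
    """Project the marker-tip trajectory onto a 2D canvas and classify it with the CNN."""
    canvas = np.zeros(frame_size, dtype=np.uint8)
    # Draw the stroke by connecting consecutive tip positions.
    for (x1, y1), (x2, y2) in zip(trajectory, trajectory[1:]):
        cv2.line(canvas, (x1, y1), (x2, y2), 255, thickness=8)
    digit = cv2.resize(canvas, (28, 28), interpolation=cv2.INTER_AREA)
    digit = digit.astype("float32") / 255.0   # scale pixel values to [0, 1]
    digit = digit.reshape(1, 28, 28, 1)       # batch of one grayscale image
    probs = model.predict(digit, verbose=0)
    return int(np.argmax(probs))
```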

Fig. 2. Representation of CNN model

The architecture of the CNN model is shown in Fig. 2. As discussed above, in this project the CNN model is used for mapping the images to their corresponding labels.


The CNN model consists of three layers, namely the input layer, the hidden layer and the output layer. A detailed description of each layer is given below.

• Input Layer

The input layer is where the image is read. The image given to the input layer is of size 28x28. The image is pre-processed in order to remove any noise present in the background and at the edges of the digit. These images are saved in a CSV file in which each row consists of the digit label and the 784 pixel values (28x28) of the image.

• Hidden Layer

This layer is the backbone of the CNN architecture. Feature extraction is performed here with a series of convolution layers, pooling layers and activation functions.

  - Convolution Layer

  The convolution layer is used for extracting features from the images. A kernel of size 3x3 is used in the convolution layer to scan through the image and find distinctive features. The kernel weights are initialized randomly and are multiplied with the scanned image patches from left to right.

  - Pooling Layer

  The main purpose of this layer is to reduce the spatial dimensionality. The output of the pooling layer is a pooled feature map. The pooling layer is generally placed between two convolution layers and helps to control overfitting and to select features. The most commonly used pooling is max pooling. For example, a 4x4 convolved output can be reduced to 2x2 by sliding a 2x2 kernel over it and keeping only the maximum value inside the kernel; the resulting output is a 2x2 feature map.

  - Activation Layer

  This layer is used to introduce non-linearity into the system. Common activation functions are SoftMax, ReLU, etc. The activation function used in the present work is the rectified linear unit (ReLU). The ReLU function rectifies the values present in the layer: negative inputs are replaced with 0 in the output, while positive values pass through unchanged.

• Output Layer

This is also known as the classification layer. It is a fully connected feed-forward network used as the classifier. In this fully connected layer, the neurons are connected to all the neurons of the previous layer. The layer computes the predicted classes based on the combined result of the features learned in the previous layers. The number of output classes depends on the number of labels in the target dataset. The SoftMax function is used in the present work to assign each input image to a class, based on the features generated in the previous layers and the trained data.
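Putting the layers above together, a compact Keras definition of such a network might look as follows. The number of filters and dense units are assumptions, since the paper does not state them, but the building blocks (28x28 grayscale input, 3x3 convolutions, 2x2 max pooling, ReLU activations and a SoftMax output over the ten digit classes) follow the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_classes=10):
    """CNN for 28x28 grayscale digit images: conv/pool feature extraction + SoftMax classifier."""
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),                  # input layer: 28x28 grayscale image
        layers.Conv2D(32, (3, 3), activation="relu"),    # 3x3 kernels extract local features
        layers.MaxPooling2D((2, 2)),                     # 2x2 max pooling reduces spatial size
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"), # output layer: class probabilities
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```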
C. Signal Transmission

Once the convolutional neural network (CNN) predicts the character, the respective character is converted to an IR raw code using the IR database stored on the Raspberry Pi. The IR raw code consists of approximately 64 numbers, which represent the address bits and command bits of an IR signal. Once the IR raw code is fetched from the database, it is sent to the Arduino board, which is configured as a 38 kHz IR remote. The Arduino was configured at 38 kHz in this project because the majority of IR remotes work around this particular frequency. With further work it was recognized that there is no fixed representation of zeros and ones for data transmission across different IR methods; there are various encoding techniques for generating these codes. The significance of 38 kHz is that it is the frequency at which the signal oscillates when it is logically high, i.e. the carrier frequency of the signal. Fig. 3 shows an example IR signal in the NEC protocol.

Fig. 3. Representation of an IR signal in NEC protocol

There are many different protocols for IR data transmission and reception, such as NEC, RC5, etc. Among these, the NEC protocol is widely used by different device brands, so let us take it as an example to demonstrate how the receiver converts the modulated IR signal into binary ones and zeros.

The NEC protocol follows these rules for transmitting and receiving IR data: a logical '1' starts with a 562.5 µs long HIGH pulse of 38 kHz IR followed by a 1,687.5 µs long LOW pulse, while a logical '0' is transmitted with a 562.5 µs long HIGH pulse followed by a 562.5 µs long LOW pulse.

Therefore, for transmitting a digit through the Arduino using the NEC protocol, the message is transmitted in the following order: first a 9 ms leading pulse burst (16 times the pulse-burst length used for a logical data bit) followed by a 4.5 ms space; then the 8-bit address of the receiving device and the 8-bit logical inverse of the address; followed by the 8-bit command and the 8-bit logical inverse of the command; and finally a 562.5 µs pulse burst to signify the end of the message transmission. The address and command are sent together with their inverses for error reduction and validation. Fig. 4 illustrates the format of an NEC IR transmission frame, for an address of 00h (00000000b) and a command of ADh (10101101b).
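To make the frame layout concrete, the sketch below builds the NEC mark/space timing sequence (in microseconds) for a given 8-bit address and command, appending the logical inverse of each byte as described above. The LSB-first bit order follows the standard NEC convention, and the helper name is illustrative.

```python
def nec_frame(address, command):
    """Return the NEC pulse train as [mark, space, mark, space, ...] durations in microseconds."""
    def bits_lsb_first(value):
        return [(value >> i) & 1 for i in range(8)]   # NEC sends each byte LSB first

    timings = [9000.0, 4500.0]                        # 9 ms leading burst + 4.5 ms space
    payload = (bits_lsb_first(address) + bits_lsb_first(address ^ 0xFF)
               + bits_lsb_first(command) + bits_lsb_first(command ^ 0xFF))
    for bit in payload:
        timings.append(562.5)                         # every bit starts with a 562.5 us burst
        timings.append(1687.5 if bit else 562.5)      # '1' -> long space, '0' -> short space
    timings.append(562.5)                             # final burst marks the end of the frame
    return timings

# Example: the frame for address 00h and command ADh discussed above.
frame = nec_frame(0x00, 0xAD)
print(len(frame), "marks/spaces,", sum(frame) / 1000.0, "ms including the stop burst")
```

On the Raspberry Pi side, a timing list like this (or the raw code read from the remote's '.txt' file) would be written to the Arduino over the serial port, which then replays it with the IRremote library; the exact serial message format is an implementation choice not specified in the paper.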


Fig. 4. Representation of IR signal in analog waveform

From Fig. 3 we can notice that the 16-bit device address (address + logical inverse) is transmitted in 27 ms, another 27 ms is taken to transmit the 16-bit command (command + logical inverse), and 67.5 ms is needed to transmit the message frame completely. The inverses of the command and address are sent for reliability and verification purposes.
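These figures follow directly from the NEC bit timings quoted earlier: a byte plus its inverse always contains exactly eight '1' bits (2.25 ms each) and eight '0' bits (1.125 ms each), which the short check below confirms.

```python
# NEC bit durations: a 562.5 us burst plus a 1687.5 us ('1') or 562.5 us ('0') space.
ONE_MS = (562.5 + 1687.5) / 1000.0    # 2.25 ms per logical '1'
ZERO_MS = (562.5 + 562.5) / 1000.0    # 1.125 ms per logical '0'

address_block = 8 * ONE_MS + 8 * ZERO_MS   # address + inverse: always 8 ones and 8 zeros
command_block = 8 * ONE_MS + 8 * ZERO_MS   # command + inverse: likewise
frame_total = 9.0 + 4.5 + address_block + command_block

print(address_block, command_block, frame_total)   # 27.0 27.0 67.5 (milliseconds)
```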
IV. EXPERIMENTAL RESULTS AND FUTURE ENHANCEMENT

In this project a robust CNN model is proposed for air-written digit recognition in order to control an IR device. To avoid the difficulties of human skin segmentation, a marker of uniform color is used. Recognition is performed by conditioning the air-writing data and passing it to a pre-trained CNN model. In this experiment the proposed system achieved a 93% recognition rate over English numerals and also supported all the tested IR devices. This strategy uses only a camera and is completely independent of any depth or motion sensor such as Kinect, LEAP Motion or Myo, which is the main advantage of this system over the existing methodologies, as it decreases the cost drastically.

Fig. 5. Drawing the Numerical

Fig. 6. Prediction from the CNN Model

From Fig. 5 and Fig. 6 we can note that the CNN model is able to predict the numeral drawn in the air.

This work can additionally be enhanced by: adapting the system to work directly on bare hands, without a marker, for better unrestricted generalization of the proposed method; reducing the cost of the machine further by designing a circuit board that contains only the components necessary for this framework; adding a voice recognition system that assists the user when air-written recognition fails; enabling OTA updates for the framework in order to improve the efficiency of the CNN model; dynamically learning different kinds of handwriting and digit shapes from user input, so that the efficiency of the CNN model automatically improves with respect to the user; and enabling backup of the OS, the training data and the CNN model for easier recovery of the system.

REFERENCES

[1] Leap Motion Inc. LEAP Motion. 2010. URL: https://www.leapmotion.com/.
[2] Microsoft Corporation. Kinect. 2010. URL: https://developer.microsoft.com/en-us/windows/kinect/.
[3] Thalmic Labs Inc. Myo. 2013. URL: https://www.myo.com.
[4] A. Schick, D. Morlock, C. Amma, et al. "Vision-based Handwriting Recognition for Unrestricted Text Input in Mid-air". In: Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI '12). ACM, 2012, pp. 217-220.
[5] M. Chen, G. AlRegib, and B. H. Juang. "Air-Writing Recognition - Part I: Modeling and Recognition of Characters, Words, and Connecting Motions". In: IEEE Transactions on Human-Machine Systems 46.3 (June 2016), pp. 403-413. ISSN: 2168-2291. DOI: 10.1109/THMS.2015.2492598.
[6] M. Chen, G. AlRegib, and B. H. Juang. "Air-Writing Recognition - Part II: Detection and Recognition of Writing Activity in Continuous Stream of Motion Data". In: IEEE Transactions on Human-Machine Systems 46.3 (June 2016), pp. 436-444. ISSN: 2168-2291. DOI: 10.1109/THMS.2015.2492599.
[7] A. Dash, A. Sahu, R. Shringi, et al. "AirScript - Creating Documents in Air". In: 14th International Conference on Document Analysis and Recognition. 2017, pp. 908-913.
[8] P. O. Kristensson, T. Nicholson, and A. Quigley. "Continuous recognition of one-handed and two-handed gestures using 3D full-body motion tracking sensors". In: Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces. ACM, 2012, pp. 89-92.
[9] Xuan Yang et al. "Multi-Digit Using Convolution Neural Network". Stanford University, 2016.
[10] T. Siva Ajay. "Hand Written Digit Recognition Using Convolution Neural Network". In: International Research Journal of Engineering and Technology (IRJET), July 2017.
[11] Wan Zhu. "Classification of MNIST Hand Written Digit Database Using Neural Network". 2018.
[12] Vijayalaxmi R. Rudraswaminmath et al. "Hand Written Digit Recognition". In: International Journal of Innovative Science and Technology, June 2019.
[13] Prasun Roy, Subhankar Ghosh, and Umapada Pal. "A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing". In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018.
[14] www.mdpi.com
[15] Mingyu Chen, Ghassan AlRegib, and Biing-Hwang Juang. "Air-Writing Recognition - Part I: Modeling and Recognition of Characters, Words, and Connecting Motions". In: IEEE Transactions on Human-Machine Systems, 2016.
[16] techdocs.altium.com
[17] www.circuitbasics.com
