Analysis On Handwriting Using Pen-Tablet For Identification of Person and Handedness

2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), 27-28 February, Dhaka

2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD) | 978-1-6654-1460-9/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICICT4SD50815.2021.9397018

Shammi Akhtar†, Moumita Mehjabin Dipti†, Tahasina Afroze Tinni†, Pallab Khan†, Raihan Kabir‡, and Md Rashedul Islam†,∗

†Department of CSE, University of Asia Pacific, Dhaka, Bangladesh
‡Department of CSE, University of Aizu, Fukushima, Japan
Email: shammi@uap-bd.edu, moumita15229@gmail.com, tahasinatinni@gmail.com, pallabkhan59@gmail.com,
raihan.kabir.cse@gmail.com, rashed.cse@gmail.com

∗Corresponding author

Abstract: Human handwriting has unique properties that express the behavior and personality of a person. State-of-the-art systems analyze handwriting on paper manually and demand human expertise. Moreover, it is very difficult to recognize a person by analyzing handwriting manually. Thus, this paper proposes a system for person identification, along with another system that identifies the handedness of individuals, using handwriting data analysis. The handwritten texts used in these studies were collected with a pen-tablet. Six parameters are captured, and distinguishable features are extracted to identify the attributes of a person's handwriting. To identify the person and the handedness from the extracted features, two classification algorithms are used: Support Vector Machine (SVM) with a linear kernel and Random Forest Classifier. The proposed systems show 69.39% and 72.44% accuracy using SVM and Random Forest Classifier, respectively, for person identification, and 97.25% accuracy for handedness identification using SVM.

Keywords: Online, Pen-tablet, Person Identification, Handedness, System-I, System-II, SVM, Random Forest Classifier

I. INTRODUCTION
Handwriting is one of the most popular and most frequently analyzed topics. It is an old topic, and much work has been done on it over the years. Handwriting is also very convenient, and a large number of people prefer handwriting over typing, so researchers keep trying to improve handwriting analysis and to find new ways of performing it. Handwriting is unique to each person. Since it involves the brain, it reflects some functions of the mind that differ from one person to another. These functions can be determined by analyzing the handwritten text. Because handwriting is unique, it is well suited to recognition. Both identification and authentication are categorized as writer recognition in [1]. Person identification and verification through handwriting are used for security purposes and sometimes as evidence [2]. As mentioned earlier, handwriting involves the brain, which is why it can reflect the personality and psychology of a person. Thus, many other attributes, such as age, handedness, and mental condition, can be predicted using handwriting analysis. Handedness is important as evidence in most cases and can be used to determine the dominant hand of an ambidextrous person. Handedness can also reflect the brain's asymmetry. The two cerebral hemispheres of the brain essentially govern the handedness of human beings [3]. Handedness plays an important role in understanding a person's brain functionality, which in turn helps in understanding that person's personality. Thus, handedness detection is a main concern of this paper. In this paper, person identification and handedness detection are performed by analyzing handwriting data collected with a pen-tablet.

II. LITERATURE REVIEW
This section presents an overview of the approaches that different researchers have used for person and handedness identification through handwriting. Since systems that rely on machine learning require a large amount of data to produce the desired result, many ML-based tools have been developed to train the machine, as described in [4].

A. Online vs. Offline
Both online and offline approaches have their own roles in handwritten document analysis, and their working procedures differ. The online approach involves the automatic conversion of text as it is written on a special digitizer or PDA, which contains a sensor that picks up the tip movements of the pen or stylus. This approach relies on special applications designed to communicate with the computer. Alternatively, the offline approach involves the automatic conversion of text in an image into letters that are readable by the computer. In this approach, the handwritten text is available in image format.

B. Previous Work
A strategy for person identification through handwriting, formulated as a retrieval problem and based on a codebook-based vector of local aggregate descriptors, has been proposed in [5]. Five features are extracted from the handwriting, all computed between a pair of points in a stroke: speed, writing direction, curvature, vicinity aspect, and vicinity curliness. An SVM classifier is trained on the codebook descriptor extracted from each enrolled document. An end-to-end framework for online text-independent person identification through handwriting has been proposed in [6], using a recurrent neural network (RNN). The handwritten data of each individual are represented by a set of random hybrid strokes (RHS). An RHS is a randomly sampled short sequence that represents pen-tip movements and pen-up or pen-down states. The features are timestamp, button status, azimuth, altitude, x-coordinate, y-coordinate, and pressure. In [7], an online system that determines the handedness of a person, along with the gender, has been presented. Eighteen features, such as speed, writing direction, curvature, normalized x- and y-coordinates, speed in the x- and y-directions, and overall acceleration, have been used. Two classification approaches were used: a discriminative approach based on SVM and a generative approach based on the Gaussian mixture model (GMM). As for the results, SVM with

various kernels, namely linear, RBF (radial basis function), sigmoid, and polynomial, yielded 54.69%, 62.57%, 54.24%, and 61.46% accuracy, respectively. GMM obtained an accuracy of 84.66%. In [8], an online system has been proposed to detect handedness using pen-tablet data. The authors created a reference model and used a dynamic programming matching method that calculates the distance between sample and input data. They used SVM and obtained an accuracy of 95%.

III. PROPOSED MODEL
This section describes the proposed models. Two systems have been created to serve two different purposes. From now on, the system for person identification is termed System-I and the system for handedness identification is termed System-II. Fig. 1 shows the system diagram.

Fig. 1. General flow diagram of proposed system

A. Data acquisition device: Pen-Tablet
A pen-tablet has been used to collect data. Its working procedure has been described in [9]. It is capable of detecting the position of the pen. The pen-tablet contains a communicator that communicates with the device driver and a specific application on the external computer. Fig. 2 shows both devices. For our experiment, we used an application named Draw Application. The pen is used to write on the pad. The time, pressure, angle, and other features of the writing are measured by the pen, and the axes are measured through the pad. The application creates an Excel file in which the feature values are stored.

Fig. 2. The data acquisition device: Pen-Tablet

B. Data Collection
We have some predefined English words and some different types of shapes as the data items in our dataset. To conduct the experiments, we collected data based on those items from several people. Data from 20 people were collected for System-I. Each person wrote the predefined words on the digital pad 5 times. Thus, 50 handwritten texts were obtained from each person, and 1000 handwritten texts in total were obtained from 20 persons for 10 English words. For System-II, we collected handwritten texts from 8 persons, of whom 4 are right-handed and the remaining 4 are left-handed. There are 8 data items, which are 8 different types of shapes. Each person wrote each data item 10 times with each hand. Thus, we collected 80 handwritten samples written with the right hand and 80 written with the left hand from each person, giving 1280 handwritten samples in total for 8 persons. Fig. 3 shows the words and symbols used in data collection.

Fig. 3. Words and symbols for handwriting data collection

C. Feature Extraction
Six features have been used: time, pressure, x-axis, y-axis, horizontal angle, and vertical angle. As mentioned above, these features are captured by the pen and pad and then stored in an Excel file by the Draw Application. The values of these features form the basis of this paper.

D. Data Analysis
For System-I, we collected raw data from the pen-tablet sensor, so pre-processing was necessary to conduct our experiments smoothly. There were many files consisting of the numerical values of the six features. We used the average as our statistical feature. The average of each column was computed, which yields a single row for one person's one handwritten text of one word. In this way, a single Excel file was created that contains the handwritten texts of all persons, in which each row holds the information about one instance of a person's handwriting. Two more columns, word and name, were added to ease our work. The word column contains the data item and the name column contains the name of the person. Those columns were created to let the machine know which person's information it has been provided with and for which word.
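The averaging step described in Section III-D can be sketched as follows. This is a minimal illustration, not the authors' code: the column names in FEATURES, the helper names summarize_sample and build_dataset, and the toy values are assumptions, and each recording is assumed to have already been loaded from its Excel file into a list of per-point dictionaries.

```python
from statistics import fmean

# Hypothetical column names for the six recorded parameters; the paper does
# not give the actual headers written by the drawing application.
FEATURES = ["time", "pressure", "x", "y", "horizontal_angle", "vertical_angle"]


def summarize_sample(samples, word, name):
    """Collapse one recording (a list of per-point dicts) into a single row.

    Each feature column is replaced by its arithmetic mean, and the two
    label columns ("word" and "name") are appended.
    """
    if not samples:
        raise ValueError("a recording must contain at least one sampled point")
    row = {f: fmean(point[f] for point in samples) for f in FEATURES}
    row["word"] = word
    row["name"] = name
    return row


def build_dataset(recordings):
    """Turn an iterable of (samples, word, name) triples into dataset rows."""
    return [summarize_sample(s, w, n) for s, w, n in recordings]


# Toy recording with two sampled points (values are made up for illustration).
toy = [
    {"time": 0.0, "pressure": 100, "x": 10, "y": 20,
     "horizontal_angle": 30, "vertical_angle": 60},
    {"time": 2.0, "pressure": 300, "x": 30, "y": 40,
     "horizontal_angle": 50, "vertical_angle": 80},
]
dataset = build_dataset([(toy, "Thank You", "writer_01")])
```

Each returned row holds the six averages plus the word and name labels, which matches the one-row-per-handwritten-text layout that the section describes.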

Fig. 4. Plot for Time, Pressure, and X-axis for System-I

Fig. 5. Plot for Y-axis, Horizontal and Vertical Angles for System-I

Fig. 6. Collected Text-Thank You

Fig. 7. Plot for Time, Pressure, and X-axis for Left Hand: (a) Time, (b) Pressure, (c) X-axis

Fig. 8. Plot for Y-axis, Horizontal and Vertical Angles for Left Hand: (a) Y-axis, (b) Horizontal angle, (c) Vertical angle

Fig. 9. Plot for Time, Pressure, and X-axis for Right Hand

Fig. 10. Plot for Y-axis, Horizontal and Vertical Angles for Right Hand

The values were on different scales and some were very large, so they were scaled to a certain range so that all values are on the same scale. Fig. 4 and Fig. 5 show the variation of the average of all features for each person. Fig. 6 shows what the collected data look like when plotted using the x-axis and y-axis values.

System-II was developed in a similar way to System-I. The dataset created after preprocessing contains three more columns in this system: shapes, person, and hand. The shapes column contains the shape names, the person column contains the person's name, and the hand column contains the hand used to write. The hand column uses 0/1 coding, where '0' denotes the left hand and '1' denotes the right hand. Fig. 7, Fig. 8, Fig. 9, and Fig. 10 show the variation of the average of all features for each person's hand. Fig. 11 shows what the collected data look like when plotted using the x-axis and y-axis values.

Fig. 11. Collected Shape-Spiral

E. Classification Model
Classification is used for predictive problems in which a class label needs to be predicted for a given input. Two classification models have been used in our systems: Support Vector Machine and Random Forest Classifier.

1) Support Vector Machine: The SVM is one of the most popular machine learning models for solving linear and non-linear classification problems. It is applicable to many real-world problems such as text and hypertext categorization, image classification, satellite data, handwritten text, and biological and other sciences. It uses a hyperplane as the classification plane [10]. It uses a kernel to transform the data and, based on that, creates an optimal hyperplane that separates the data into classes [11]. It can be expressed as equation (1).

$$f(x) = \sum_{i \in S} \alpha_i \, k(x_i, x) + b \qquad (1)$$

where $b$ is the bias, $S$ is the set of observations, $\alpha_i$ is the learning parameter, and $k$ is the kernel. Here we used a linear kernel.

2) Random Forest Classifier: This classifier is an ensemble classifier that consists of numerous decision trees acting in parallel. It takes the prediction of each tree into account and, based on that, predicts the final output [12]. In our experiments, we used 1000 trees.

IV. EXPERIMENT AND RESULT ANALYSIS
A. System-I
A comprehensive experiment was conducted using a system developed in Python. After pre-processing, we obtained a new dataset that contains all the information of each person. That dataset was split into two parts, training and testing: 80% of the data were used for training and 20% for testing. The data were passed through the classification algorithms. First, SVM with a linear kernel was used. Then, in search of better accuracy, Random Forest Classifier was also used, with the data passed through 1000 trees.

B. System-II
The new processed dataset is split into training and testing parts. Here, 80% of the data are used for training and 20% for testing. We used SVM with a linear kernel as the classification algorithm.

TABLE I. ACCURACY OF SYSTEM-I
Classifier    Training Accuracy    Testing Accuracy
SVM-Linear    87.36%               69.39%
RFC           98.67%               72.44%

TABLE II. ACCURACY OF SYSTEM-II
Classifier    Training Accuracy    Testing Accuracy
SVM-Linear    98.72%               97.25%

C. Result and Analysis
Table I shows the accuracy obtained with the two algorithms for System-I. The SVM classifier shows a training accuracy of 87.36% and a testing accuracy of 69.39%. The Random Forest Classifier shows a training accuracy of 98.67% and a testing accuracy of 72.44%. Table II shows the accuracy obtained for System-II using SVM: 98.72% for training and 97.25% for testing. Table III compares a state-of-the-art model [8] with our System-II.

TABLE III. COMPARISON WITH STATE-OF-THE-ART METHOD FOR SYSTEM-II
Subject               Existing Work [8]                    Our Work (System-II)
Features              Elapsed time, X-axis, Y-axis,        Time, Pressure, X-axis, Y-axis,
                      Pen-pressure, Pen-orientation,       Horizontal angle, Vertical angle
                      Pen-height
Statistical Features  Average pressure, Maximum pressure,  Average time, Average pressure,
                      Minimum pressure, Average velocity   Average x-y axes, Average of both angles
Classification Model  SVM with linear, polynomial, and     SVM with linear kernel
                      RBF kernels
Accuracy              95%                                  97.25%

As System-I has been developed using two algorithms, we compare their results through their classification reports. Precision, recall, and F1-score were calculated after applying both classifiers in System-I. Table IV and Table V show the classification reports obtained using SVM and RFC for System-I. Table VI presents the classification report for System-II.
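The experimental setup of Sections IV-A and IV-B (scaling, an 80/20 split, a linear-kernel SVM, and a Random Forest with 1000 trees) could be reproduced along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: it assumes scikit-learn, min-max scaling to [0, 1] fitted on the training part only (the paper only states that values were scaled to a certain range), an unstratified random split, and synthetic stand-in data. The function name evaluate and its parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def evaluate(X, y, n_trees=1000, seed=0):
    """Hold out 20% for testing, scale the features, and fit both classifiers."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Assumption: min-max scaling to [0, 1], fitted on the training part only.
    scaler = MinMaxScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    models = {
        "SVM-Linear": SVC(kernel="linear"),
        "RFC": RandomForestClassifier(n_estimators=n_trees, random_state=seed),
    }
    results = {}
    for label, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[label] = {
            "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
            "test_accuracy": accuracy_score(y_test, pred),
            # Per-class precision, recall, F1-score, and support as text.
            "report": classification_report(y_test, pred, zero_division=0),
        }
    return results


# Synthetic stand-in data: 3 "writers", 40 samples each, 6 averaged features.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 6)) for c in (0, 3, 6)])
y_demo = np.repeat(["writer_a", "writer_b", "writer_c"], 40)
demo = evaluate(X_demo, y_demo, n_trees=50)  # fewer trees keep the demo fast
```

The returned dictionary mirrors the layout of Tables I and II (training and testing accuracy per classifier), and the report string has the same columns as the classification reports in Tables IV to VI. The synthetic clusters are far easier to separate than real handwriting data, so the demo accuracies say nothing about the paper's results.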

TABLE IV. CLASSIFICATION RESULT OF SYSTEM-I USING SVM
Subjects    Precision  Recall  f1-score  Support
Tinni       0          0       0         9
Fahim       0.67       0.36    0.47      11
Mohiuddin   1          1       1         9
Nishat      0.73       1       0.84      8
Lubna       0.62       0.62    0.62      8
Jannatul    0.38       0.83    0.53      6
Arafat      0.62       0.89    0.73      9
Pia         1          0.91    0.95      11
Pallab      0.6        0.67    0.63      9
Sondha      1          0.58    0.74      12
Snigdha     1          0.83    0.91      12
Razon       0.58       0.78    0.67      9
Sharif      0.62       0.77    0.69      13
Janee       0.5        0.6     0.55      15
Noyon       1          0.86    0.92      14
Tushar      0.62       0.62    0.62      8
Mridula     0.83       1       0.91      5
Rimi        0.33       0.22    0.27      9
Rafia       1          0.8     0.89      10
Mahi        0.5        0.67    0.57      9
Average     0.7        0.69    0.68      196

TABLE V. CLASSIFICATION RESULT OF SYSTEM-I USING RFC
Subjects    Precision  Recall  f1-score  Support
Rimi        0.8        0.44    0.57      9
Arafat      0.5        0.56    0.53      9
Tushar      0.56       0.62    0.59      8
Rafia       0.75       0.9     0.82      10
Pallab      0.8        0.89    0.84      9
Razon       0.58       0.78    0.67      9
Nishat      0.86       0.75    0.8       8
Janee       0.58       0.47    0.52      15
Sondha      1          0.58    0.74      12
Sorif       0.63       0.92    0.75      13
Lubna       0.67       0.75    0.71      8
Mohiuddin   0.9        1       0.95      9
Mahi        0.75       0.67    0.71      9
Snigdha     1          0.92    0.96      12
Fahim       0.7        0.64    0.67      11
Jannatul    0.23       0.5     0.32      6
Pia         1          0.82    0.9       11
Mridula     1          1       1         5
Tinni       0.8        0.44    0.57      9
Noyon       0.92       0.86    0.89      14
Average     0.76       0.72    0.73      196

TABLE VI. CLASSIFICATION RESULT OF SYSTEM-II
            Precision  Recall  F1-Score  Support
Left        0.97       0.98    0.97      121
Right       0.98       0.97    0.97      134
Avg. Total  0.97       0.97    0.97      255

V. CONCLUSION AND FUTURE WORK
In this paper, two systems are presented for online person identification and handedness detection using handwriting analysis. The main focus of those systems is to predict human behaviors in order to identify the person and the handedness using handwriting data analysis. In the proposed models, a digital pen-tablet is used to collect handwriting data consisting of 6 writing parameters. The data are analyzed and statistical features are extracted from the handwriting data. Two classifiers have been applied: SVM and Random Forest Classifier. Person identification is performed by SVM and Random Forest Classifier, which gave testing accuracies of 69.39% and 72.44%, respectively. Our second system identifies a person's handedness using SVM and gave 97.25% accuracy.

As the usage of pen-tablets is increasing, our work will make handwriting analysis with pen-tablet data more convenient. Our presented systems can be applied to any pen-tablet related analysis in the future.

We have created two systems for two different purposes, but it would be an improvement to create a single system that can identify a person and that person's handedness at the same time. In future work, we plan to extend our experiments to create one system that serves both purposes.

REFERENCES
[1] A. Bensefia and T. Paquet, “Writer verification based on a single handwriting word samples,” EURASIP J. on Image and Video Proc., vol. 1, p. 34, October 2016.
[2] M. Tapiador and J. A. Sigüenza, “Writer identification method based on forensic knowledge,” in Proc. of Int. Conf. on Bio. Authentication. Springer, pp. 555–561, 2004.
[3] P. V. Wlassoff, “Handedness: What does it say about your brain structure?” Neuroscience & Neurology, BrainBlogger, February 2018. Accessed on: Nov. 5, 2020. [Online]. Available: http://www.brainblogger.com/2018/02/19/handedness-what-does-it-say-about-your-brain-structure/
[4] M. Mahmud, M. S. Kaiser, T. M. McGinnity, and A. Hussain, “Deep learning in mining biological data,” Cognitive Computation, vol. 2021, no. 13, pp. 1–33, January 2021.
[5] V. Venugopal and S. Sundaram, “An online writer identification system using regression-based feature normalization and codebook descriptors,” Expert Sys. with App., vol. 72, pp. 196–206, 2017.
[6] X.-Y. Zhang, G.-S. Xie, C.-L. Liu, and Y. Bengio, “End-to-end online writer identification with recurrent neural network,” IEEE Trans. on Human-Machine Sys., vol. 47, no. 2, pp. 285–292, 2016.
[7] M. Liwicki, A. Schlapbach, P. Loretan, and H. Bunke, “Automatic detection of gender and handedness from on-line handwriting,” in Proc. 13th Conf. of the Graphonomics Society, Jan. 2007, pp. 179–183.
[8] J. Shin and M. A. Rahim, “Handedness detection based on drawing patterns using machine learning techniques,” in Proc. The Thirteenth Int. Conf. on Adv. in Comp.-Human Interactions, June 2020.
[9] Y. Nakajima, K. Angelov, V. Blazhev, and T. Dimitrova, “Pen tablet, handwritten data recording device, handwritten data drawing method, and handwritten data synthesis method,” Apr. 21 2020, US Patent 10,627,921.
[10] R. Ebrahimzadeh and M. Jampour, “Efficient handwritten digit recognition based on histogram of oriented gradients and svm,” Int. J. of Comp. App., vol. 104, no. 9, pp. 10–13, October 2014.
[11] G. M. Foody and A. Mathur, “Toward intelligent training of supervised image classifications: directing training data acquisition for svm classification,” Remote Sensing of Environment, vol. 93, no. 1-2, pp. 107–117, October 2004.
[12] T. Yiu, “Understanding random forest,” Towards Data Science, August 2019. Accessed on: Nov. 12, 2020. [Online]. Available: https://towardsdatascience.com/understanding-random-forest-58381e0602d2
