A Novel Approach: Recognition of Devanagari Handwritten Numerals
Abstract: Recognition of Indian-language scripts is a challenging problem. In Optical Character Recognition (OCR), the
character or symbol to be recognized may be machine-printed or handwritten. Several approaches address the recognition of
numerals and characters; they differ in the type of features extracted and in the way those features are extracted.
In this paper, an automatic recognition system for isolated handwritten Devanagari numerals is proposed. The system
relies on a feature extraction technique based on recursive subdivision of the character image, so that the sub-images
produced at each iteration contain, as nearly as possible, equal numbers of foreground pixels. A Support Vector
Machine (SVM) is used for classification. An accuracy of 98.98% has been obtained on the standard dataset provided by ISI
(Indian Statistical Institute), Kolkata.
1 INTRODUCTION
© 2011 IJEECS
http://www.ijeecs.org
four sub-images that contain approximately the same number of foreground pixels. Repeating the division gives 4, 16, … sub-images. The features are computed at each level, the recognition rate is then calculated at each level, and the level at which the highest recognition rate is achieved is chosen.

2 DATABASE

The database is provided by the ISI (Indian Statistical Institute, Kolkata) [15]. The Devanagari script was initially developed to write Sanskrit but was later adapted to write many other languages, such as Hindi, Marathi and Nepali. The printed Devanagari numerals are shown in Figure 1; there are variations in the shapes of numerals 5, 8 and 9 in their printed forms. Figure 2 shows samples from the handwritten Devanagari numerals database. The distribution of training data and test data is shown in Table 1.

Figure 1: Devanagari Numerals

Table 1: Distribution of numerals in Devanagari Database
Digits   Training Set   Test Set   Total
0        1844           369        2213
1        1891           378        2269
2        1891           378        2269
3        1882           377        2259
4        1876           376        2252
5        1889           378        2267
6        1869           374        2243
7        1869           378        2247
8        1887           377        2264
9        1886           378        2264
Total    18784          3763       22547

Figure 2: Handwritten Devanagari Numerals Samples

3 PROPOSED METHOD

The entire database consists of gray-scale images that contain noise and are not in normalized form. All experiments were carried out in Matlab 7.10.0.

3.1 Preprocessing

i) Adjust the intensity values of the image using the imadjust() function of Matlab.
ii) Convert the image into a binary image using a threshold value of 0.8.
iii) Remove from the binary image all connected components (objects) that have fewer than 30 pixels.
iv) Apply median filtering, a nonlinear operation often used in image processing to reduce "salt and pepper" noise.
v) Normalize the image to 90×90 pixels.

3.2 Feature Extraction Algorithm

Suppose that im(x, y) is a handwritten character image in which the foreground pixels are denoted by 1's and the background pixels by 0's. The feature extraction algorithm subdivides the character image recursively. At granularity level 0 the image is divided into four parts, which gives a division point (DP) (x0, y0). The following algorithm shows how x0 is calculated; y0 is calculated likewise.

Algorithm:
Step 1: Input im(xmax, ymax), where xmax and ymax are the width and the height of the character image.
Step 2: Let v0[xmax] be the vertical projection of the image (fig 3.b).
Step 3: Create the array v1[2*xmax] by inserting a '0' before each element of v0 (fig 3.c).
Step 4: Find xq in v1 that minimizes the difference between the sum of the left partition [1, xq] and the sum of the right partition [xq, 2*xmax]; if the two sums cannot be made equal, the left partition should have the greater sum.
Step 5: x0 = xq/2;
Step 6: If xq mod 2 = 0
    the 2 sub-images are [(1, 1), (x0, ymax)] and [(x0, 1), (xmax, ymax)]
Else
    the 2 sub-images are [(1, 1), (x0, ymax)] and [(x0+1, 1), (xmax, ymax)]

3.3 Classification

The classification step is divided into two phases.

(i) Training phase
In this phase, features are extracted at gradually increasing levels of granularity, starting with level 1. The recognition rate is calculated at each level and plotted against the level of granularity (Figure 6). From this graph, the level with the highest recognition rate (Lbest) is identified.

Figure 6: Example of finding the best level (Lbest)
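As an illustration, the division-point search of Section 3.2 (Steps 1 to 6) can be sketched in Python as follows. This is a minimal sketch, not the Matlab implementation used in the experiments. The function names division_point and level0_division_point, the list-of-rows image representation, integer division in Step 5, and the tie-breaking order (prefer a left sum that is not smaller than the right sum, then the smallest xq) are assumptions made for the example. The recursive subdivision into deeper levels is not shown.

```python
def division_point(projection):
    """Return (x0, shared) for a 1-D projection of foreground-pixel counts.

    shared is True when xq is even, i.e. column x0 belongs to both sub-images.
    Positions are 1-based, as in the algorithm of Section 3.2.
    """
    # Step 3: insert a 0 before each element of the projection.
    v1 = []
    for count in projection:
        v1.extend([0, count])

    total = sum(v1)
    best_key, best_xq = None, None
    left = 0
    for xq in range(1, len(v1) + 1):
        left += v1[xq - 1]                  # sum of v1[1..xq]
        right = total - left + v1[xq - 1]   # sum of v1[xq..end]; xq counts on both sides
        # Step 4: smallest difference first; on ties prefer left >= right
        # (assumed reading of the tie rule), then the smallest xq.
        key = (abs(left - right), 0 if left >= right else 1, xq)
        if best_key is None or key < best_key:
            best_key, best_xq = key, xq

    x0 = best_xq // 2                       # Step 5, assuming integer division
    return x0, best_xq % 2 == 0             # Step 6: even xq means a shared column


def level0_division_point(im):
    """Return (x0, y0) for a binary image given as a list of rows of 0/1."""
    column_counts = [sum(column) for column in zip(*im)]  # vertical projection
    row_counts = [sum(row) for row in im]                 # horizontal projection
    x0, _ = division_point(column_counts)
    y0, _ = division_point(row_counts)
    return x0, y0


# A 4x4 image whose upper-left and lower-right 2x2 blocks are foreground.
example = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
dp = level0_division_point(example)
```

For the example image, both projections are [2, 2, 2, 2], the balanced cut falls between the second and third columns (xq = 5), and the sketch gives the division point (2, 2). An all-background image is not handled by this sketch.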
4 CLASSIFIER (SVM)

The Support Vector Machine is a supervised machine learning technique. The place of SVM is shown in Figure 7. It is primarily a two-class classifier. The optimization criterion is the width of the margin between the classes, i.e. the empty area around the decision boundary, defined by the distance to the nearest training patterns. These patterns, called support vectors, finally define the classification function.

Figure 7 [diagram; labels: Computer Vision, Artificial Intelligence, Clustering, Support Vector Machine]

All the experiments are done with LIBSVM 3.0.1 [18], a multi-class SVM, with the RBF (Radial Basis Function) kernel selected. A feature vector set fv(xi), i = 1…m, where m is the total number of characters in the training set, and a class set cs(yj), j = 1…m, cs(yj) ∈ {0, 1, …, 9}, which defines the classes of the training set, are fed to the multi-class SVM.

LIBSVM implements the "one against one" approach (Knerr et al., 1990) [16] for multi-class classification. Early works applying this strategy to SVM include, for example, Kressel (1998) [17]. If k is the number of classes, then k(k-1)/2 classifiers are constructed, and each one is trained on data from two classes. For training data from the ith and jth classes, we solve the following two-class classification problem:

In classification we use a voting strategy: each binary classification is considered to be a vote, where votes can be cast for all data points x; in the end, a point is assigned to the class with the maximum number of votes. In case two classes have identical votes, though it may not be a good strategy, we simply choose the class appearing first in the array storing the class names.

LIBSVM is used with the Radial Basis Function (RBF) kernel, a popular, general-purpose yet powerful kernel, denoted as

$$K(x_i, x_j) \equiv \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right) \qquad (2)$$

5 EXPERIMENTS AND RESULTS

In order to classify the handwritten numerals and evaluate the performance of the technique, we carried out experiments with various settings of the parameters Lbest, gamma (γ) and the cost parameter (c). All experiments were performed on an Intel® Core 2 Duo CPU T6400 @ 2 GHz with 3 GB RAM under the 32-bit Windows 7 Ultimate operating system.

The training set of Devanagari handwritten numerals provided by ISI, Kolkata, contains 18784 samples and was used to determine the best granularity level. To obtain the recognition accuracy at each granularity level, the cross-validation function of LIBSVM was used with n = 10, γ = 0.5 and c = 500. The recognition accuracy at the different granularity levels is shown in fig 8. At level 3, the highest accuracy, 98.98%, is obtained.

After obtaining the best granularity level, LIBSVM is trained on the ISI training set. The size of the feature vector is 170, that is, $2 \cdot 4^{L}$ summed over the levels L = 0, …, 3: $2 \cdot 4^0 + 2 \cdot 4^1 + 2 \cdot 4^2 + 2 \cdot 4^3 = 2 + 8 + 32 + 128 = 170$. The same granularity level is applied to the test data to form the feature vectors, and an accuracy of 98.40% is obtained when γ and c are set to 1.1 and 500. The confusion matrix (Table 2, in bold letters) shows that 2 is confused with 3, 4 with 5, and 7 with 6; the highest recognition rate is 99.73%, for 8. The computation time taken by the training phase and the testing phase is shown in Table 3.

Table 4: Comparison of accuracy obtained by different methods
S.n   Method proposed by    Data Size   Accuracy Obtained
1     R. Bajaj et al [6]    400         89.6 %
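The one-against-one voting rule and the RBF kernel of Eq. (2), both described in Section 4, can be illustrated with a short Python sketch. It does not call LIBSVM and is not its implementation. The names rbf_kernel and one_against_one_vote are chosen for the example, and pairwise_winner is a hypothetical stand-in for a trained binary classifier that returns one of its two class arguments.

```python
import math
from itertools import combinations


def rbf_kernel(xi, xj, gamma):
    """Eq. (2): K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def one_against_one_vote(class_names, pairwise_winner):
    """Return the class that collects the most pairwise votes.

    pairwise_winner(a, b) is a hypothetical stand-in for the binary
    classifier trained on classes a and b; it must return a or b.
    """
    votes = {name: 0 for name in class_names}
    for a, b in combinations(class_names, 2):   # k(k-1)/2 class pairs
        votes[pairwise_winner(a, b)] += 1
    # max() keeps the first maximal element, so a tie goes to the class
    # that appears first in class_names.
    return max(class_names, key=lambda name: votes[name])


digits = list(range(10))
num_pairs = len(list(combinations(digits, 2)))  # binary classifiers for k = 10
```

For the ten digit classes, k(k-1)/2 gives 45 binary classifiers. With a cyclic outcome in which every class receives the same number of votes, the sketch returns the class listed first, which matches the tie rule stated in Section 4.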
Table 3: Computational time (features extracted at level 3)
Phase      Sample size   Time Required
Training   18784         66 seconds
Testing    3763          15 seconds

The test dataset and the training dataset were combined to perform the cross-validation function of LIBSVM with n = 10, γ = 0.5 and c = 500. The feature vectors for the whole dataset (22547 samples) were calculated at level 3 (Lbest), and a recognition rate of 98.98% was obtained.

REFERENCES
[1] Øivind Due Trier, Anil K. Jain, Torfinn Taxt, "Feature extraction methods for character recognition - A survey", Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.
[2] Sandhya Arora, Debotosh Bhattacharjee, Mita Nasipuri, D. K. Basu, M. Kundu, "Recognition of Non-Compound Handwritten Devnagari Characters using a Combination of MLP and Minimum Edit Distance", International Journal of Computer Science and Security (IJCSS), Volume (4): Issue (1), pp. 107-120.
[3] P. M. Patil, T. R. Sontakke, "Rotation, scale and translation invariant handwritten Devanagari numeral character recognition using general fuzzy neural network", Pattern Recognition, Elsevier, 2007.
[4] Anil K. Jain, Robert P.W. Duin, and Jianchang Mao, "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, pp. 4-37, January 2000.
[5] G. S. Lehal, Nivedan Bhatt, "A Recognition System for Devnagri and English Handwritten Numerals", Proc. of ICMI, 2000.
[6] Reena Bajaj, Lipika Dey, Santanu Chaudhury, "Devanagari Numeral Recognition by Combining Decision of Multiple Connectionist Classifiers", Sadhana, Vol. 27, Part I, pp. 59-72, 2002.
[7] R. J. Ramteke, S. C. Mehrotra, "Recognition of Handwritten Devanagari Numerals", International Journal of Computer Processing of Oriental Languages, 2008.
[8] U. Bhattacharya, S. K. Parui, B. Shaw, K. Bhattacharya, "Neural Combination of ANN and HMM for Handwritten Devnagari Numeral Recognition".
[9] U. Pal, T. Wakabayashi, N. Sharma and F. Kimura, "Handwritten Numeral Recognition of Six Popular Indian Scripts", Proc. 9th ICDAR, Curitiba, Brazil, Vol. 2 (2007), pp. 749-753.
[10] J. Park, V. Govindaraju, S. N. Srihari, "OCR in Hierarchical Feature Space", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, Vol. 22, No. 4, pp. 400-408.
[11] H. Samet, "The Design and Analysis of Spatial Data Structures", Addison-Wesley Longman Publishing Co., Inc., 1990.
[12] S. Mozaffari, K. Faez, M. Ziaratban, "Character Representation and Recognition using Quadtree-based Fractal Encoding Scheme", Proceedings of the 8th International Conference on Document Analysis and Recognition, Seoul, Korea, 2005, Vol. 2, pp. 819-823.
[13] A. P. Sexton, V. Sorge, "Database-Driven Mathematical Character Recognition", Graphics Recognition, Algorithms and Applications (GREC), Lecture Notes in Computer Science (LNCS), Hong Kong, 2006, pp. 206-217.
[14] Georgios Vamvakas, Basilis Gatos, Stavros J. Perantonis, "Handwritten character recognition through two-stage foreground sub-sampling", Pattern Recognition, 43 (2010), pp. 2807-2816.
[15] U. Bhattacharya and B.B. Chaudhuri, "Databases for Research on Recognition of Handwritten Characters of Indian Scripts", Proc. Eighth Int'l Conf. Document Analysis and Recognition (ICDAR '05), vol. 2, pp. 789-793, 2005.
[16] S. Knerr, L. Personnaz, and G. Dreyfus, "Single-layer learning revisited: a stepwise procedure for building and training a neural network", in J. Fogelman, editor, Neurocomputing: Algorithms, Architectures and Applications, Springer-Verlag, 1990.
[17] U. H.-G. Kressel, "Pairwise classification and support vector machines", in B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pp. 255-268, Cambridge, MA, MIT Press, 1998.
[18] http://www.csie.ntu.edu.tw/~cjlin/libsvm
[19] http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
[20] http://www.isical.ac.in/~ujjwal/download/database.html

Mahesh Jangid is an M.Tech. student in the computer science & engineering department of Dr. B R Ambedkar National Institute of Technology. He completed his B.E. degree in 2007 at Rajasthan University. He has 2 years of teaching experience at JECRC Jaipur. His research areas are image processing, optical character recognition and pattern recognition.

Renu Dhir received her Ph.D. in computer science and engineering from Punjabi University in 2007 and her M.Tech. in computer science & engineering from TIET Patiala in 1997. Her area of research is mainly image processing and character recognition. She has published more than 35 papers in various international journals and conferences. She is now working as an Associate Professor at NIT Jalandhar.

Rajneesh Rani is pursuing a Ph.D. at NIT Jalandhar. She completed her M.Tech. in computer science and engineering at Punjabi University, Patiala in 2003. She has 7 years of teaching experience. Her area of research is image processing and character recognition.