Handwritten Kannada Kagunita
net/publication/275925802
All content following this page was uploaded by Sasikumar Mukundan on 03 February 2016.
International Journal of Computer Theory and Engineering, Vol.3, No.1, February, 2011
1793-8201
information and consonant information to recognize them separately. A neural network, a multi-layer perceptron (MLP) with back-propagation (BP), is used for classification. A confusion matrix is formed for analysis.
The paper is organized as follows. Section II briefly discusses the Kannada script, including the Kagunita structure. Section III briefly explains the technologies used for feature extraction. Section IV describes the feature extraction. Section V reports the experimental results and analysis. Section VI gives the conclusion of our study.

II. KANNADA LANGUAGE SCRIPT
The current Kannada basic character set has 15 vowels and 34 consonants in the modified list. Manual observation of this set showed that many characters have similar shapes with slight modifications, as shown in fig. 1 with typed characters in similar shape groups and with a sample handwritten page of the basic character set. Similar shapes are marked with the same colored circle and square. We observed manually that there are about 13 groups of similar character shapes.

అఆ ఇఞ
ఐణటచ ఖబభఒ
ద డపహవఏ థధఢఫషఘఛ
ఠరగ నస
ళశ యఋమ
కత ఙజఓ
అం అః

vowel అ. All other vowels modify the consonant appearance in the same way. The explicit appearance of a vowel in a syllable overrides the inherent vowel ఆ of a consonant. The other vowel matra shapes and positions are as in fig. 2(b). The dotted circle indicates the position of the consonant. Examples of combining a consonant with a vowel to form a Kagunita are shown in fig. 2(c) and (d). The first character in these examples is the basic character in its original form, and the remaining 14 characters show the shape changes due to the added matra.

(a) అ ఆ ఇ ఈ ఉ ఊ ఋ ఎ ఏ ఐ ఒ ఓ ఔ అం అః
(b) ◌ా ◌ి ◌ిౕ ◌ు ◌ూ ◌ృ ◌ె ◌ెౕ ◌ై ◌ెూ ◌ెూౕ ◌ౌ ◌ం ◌ః
(c) క ಕా ಕಿ ಕಿౕ కు కూ కృ ಕె ಕెౕ ಕై ಕెూ ಕెూౕ ಕౌ కం కః
(d) య ಯా ౕ యు యూ యృ ౕ ౖ ౕ ಯౌ యం యః
Figure 2. (a) 15 vowels, (b) matras, (c), (d) sample examples of Kagunita

Thus recognizing Kagunita means recognizing 510 different aksharas (34 consonants, each in 15 vowel forms), a very large number of classes and so a very complex task. Since a Kagunita is a combination of a vowel and a consonant, if we identify these two components in a given form, then we can recognize the Kagunita. Hence the problem becomes recognizing one vowel out of 15 and one consonant out of 34, both extracted from a Kagunita. This is the approach we take in this paper.
Recognizing the vowel in the presence of the 34 consonant variations, and recognizing the consonant in the presence of the 15 vowel variations, is a difficult task. As the size of the vowel matra varies, it influences the size of the consonant in the normalized image. A vowel matra has similarities with other vowel matras, and a consonant has similarities with other consonants. Hence recognizing a vowel or a consonant in the presence of such large similarities and dissimilarities makes the problem complex. This raises the issue of using two types of feature extraction: one suitable for identifying the vowel element and another for the consonant element.
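The decomposition described above can be illustrated with a short sketch. It assumes a hypothetical consonant-major numbering of the 510 Kagunita classes; the paper does not specify any class numbering, so the names `combine` and `split` and the index convention are illustrative only.

```python
NUM_VOWELS = 15      # vowels in the basic character set (from the text)
NUM_CONSONANTS = 34  # consonants in the basic character set (from the text)
NUM_KAGUNITA = NUM_CONSONANTS * NUM_VOWELS  # 510 aksharas


def combine(consonant_idx: int, vowel_idx: int) -> int:
    """Map a (consonant, vowel) pair to one Kagunita class index.

    The consonant-major ordering is an assumption made for illustration.
    """
    if not 0 <= consonant_idx < NUM_CONSONANTS:
        raise ValueError("consonant index out of range")
    if not 0 <= vowel_idx < NUM_VOWELS:
        raise ValueError("vowel index out of range")
    return consonant_idx * NUM_VOWELS + vowel_idx


def split(kagunita_idx: int) -> tuple:
    """Recover the (consonant, vowel) pair from a Kagunita class index."""
    if not 0 <= kagunita_idx < NUM_KAGUNITA:
        raise ValueError("Kagunita index out of range")
    return divmod(kagunita_idx, NUM_VOWELS)
```

With this view, a 15-class vowel classifier and a 34-class consonant classifier together identify one of the 510 classes, instead of a single classifier having to separate all 510 directly.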
ratio $\alpha = \sigma_y / \sigma_x$ a constant.
Using the 0th, 1st, 2nd and 3rd order moments the following
viewed in the vertical direction. Also, some characters carry more information, some less, and some close to the average, with thin, wide or squared shapes. These features play an important role in finding the density of the information and its distribution in the image, so we also considered statistical features to strengthen the feature set.
1) Information content of the image
A global feature called the information content of the image is chosen as one of the features, since some characters have a larger number of curvatures, which gives the image more information in the same space. This feature is calculated as the ratio of the total number of black pixels to the total number of pixels in the image.
$IC = \dfrac{T_{black}}{T_{allpixels}}$ (12)
2) Aspect Ratio
Some of the characters can be easily distinguished by their typical form, such as thinness, broadness, squareness or tallness. The minimum and maximum positional values of x and y are used in the computation.
$\left( (y_{max} - y_{min})^{2} + (x_{max} - x_{min})^{2} \right)$
D. Cut images for Kagunita
Kannada script does not have any vowel that modifies the left side of the consonant. The vowel sign is on the top, right or bottom of the character. The matras are grouped based on the top variations, side variations and bottom variations, as shown in fig. 8.
There are very small variations of matra shapes with the same size on the top. At the bottom there are only two variations of the same size. To make this information prominent for recognition, this portion can be cut at around 20% on the top and bottom respectively, giving 2 different images. The right vowel matra varies in size. We can observe four different sizes of vowel matra.
To recognize the right-side vowel, we therefore considered 4 images of different sizes, one original image and 3 cut images, so that at least one category can best represent the vowel matra. The first image category is the one with no cuts (original whole image), the second with around a 20% cut, the third with around a 40% cut and the fourth with around a 60% cut. Hence there will be around 6 images to help in the recognition of the vowel in the Kagunita, as shown in fig. 9(a).
Similarly, for recognizing the consonant component in the Kagunita: because the images are size normalized, the variation of the vowel matra changes the size of the consonant present to the left of the vowel matra. If there is no vowel matra to the right, the consonant occupies the whole image. With a 20% vowel matra the consonant occupies 80% of the total space, and so on. Therefore, to recognize the consonant we again considered 4 images, one original and 3 cut images, as shown in fig. 9(b).
(a) cut images for vowel recognition
A. Moment features from Original and Directional images
The curvature shape of a Kannada character can be best extracted as a feature if we extract moment features from the directional images. We therefore obtain 4 directional binary images using Gabor wavelets from the dynamically preprocessed original images, and then extract moment features from them. The directional image centralness, divergence and imbalance should describe the relevant characteristics of the original image much more strongly than moments taken on the original image.
Based on this intuition, we are computing the moments
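The information content ratio of equation (12) and the six vowel cut images described under "Cut images for Kagunita" can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a binary image stored as a list of rows with 1 for a black pixel, and it assumes that a "cut" keeps the strip of the stated fraction taken from that side. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = black (ink) pixel, 0 = background


def information_content(img: Image) -> float:
    """IC = T_black / T_allpixels, following equation (12)."""
    total = sum(len(row) for row in img)
    if total == 0:
        raise ValueError("empty image")
    black = sum(sum(row) for row in img)
    return black / total


def cut_top(img: Image, frac: float) -> Image:
    """Keep the top `frac` of the rows (assumed meaning of a top cut)."""
    rows = max(1, round(len(img) * frac))
    return img[:rows]


def cut_bottom(img: Image, frac: float) -> Image:
    """Keep the bottom `frac` of the rows."""
    rows = max(1, round(len(img) * frac))
    return img[-rows:]


def cut_right(img: Image, frac: float) -> Image:
    """Keep the rightmost `frac` of the columns."""
    cols = max(1, round(len(img[0]) * frac))
    return [row[-cols:] for row in img]


def vowel_cut_images(img: Image) -> List[Image]:
    """Six images for vowel recognition: top 20%, bottom 20%,
    the whole image, and right cuts of about 20%, 40% and 60%."""
    right_cuts = [cut_right(img, f) for f in (0.20, 0.40, 0.60)]
    return [cut_top(img, 0.20), cut_bottom(img, 0.20), img] + right_cuts
```

Under the same assumption, the consonant images would be the complementary left portions (for example the left 80% when the matra occupies the right 20%).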
images, 510 sets are formed. From these, 128 sets of samples are separated out as test samples.
A. Performance of moments and statistical features on whole Kagunita images
For the experiments we considered features from original images and directional images. From the dynamically preprocessed original images we computed central moments and geometric moments. To test the efficiency of the feature set, we considered central moments from the original images as one feature set (O-CM) and both central and geometric moments as another feature set (O-GMCM) to find their contributions; both feature sets are normalized for vowel and consonant recognition. The recognition capabilities were tested; the results for vowel recognition are in table IV and for consonant recognition in table V. Geometric moments improved the results from 40% to 47% for vowels and from 24% to 29% for consonants.

TABLE IV. VOWEL RECOGNITION RESULTS OF MOMENTS FEATURES FROM WHOLE IMAGES
(Vowel recognition results in %; no. of epochs: 1000)
| Sl no | No of features | Feature description | Mean | Min | Max |
|-------|----------------|---------------------|------|-----|-----|
| 1     | 10             | O-CM                | 40   | 15  | 70  |
| 2     | 14             | O-GMCM              | 47   | 18  | 74  |
| 3     | 40             | G-CM                | 63   | 42  | 81  |
| 4     | 56             | G-GMCM              | 64   | 40  | 82  |

Similarly, we computed central moments (G-CM) and both geometric and central moments together (G-GMCM) as features on the four directional images, normalized for vowel and consonant recognition. The recognition capabilities were tested; the results for vowels are in table IV and for consonants in table V. G-GMCM contributed only a 1% improvement, from 63% to 64%, over G-CM for both vowels and consonants.

(b) Recognition results for consonants
Figure 9. Individual class Moments results from series 1 - original image, series 2 - four directional Gabor images

As expected they show a nonlinear relation, as shown with yellow circles. This suggests they can be used together as features in further experiments.
The combined moments vowel recognition results along with statistical features are shown in table VI. When the moment features G-CM and O-GMCM are used together, the average vowel recognition result improved from 63% to 71%. The statistical features G-S showed a performance of 21% individually, but when combined with G-CM the results improved from 63% to 69%. When all these features are used together, the average result is 77%.

TABLE VI. VOWEL RECOGNITION RESULTS OF COMBINED MOMENTS FEATURES FROM WHOLE IMAGES
(Vowel recognition results in %; no. of epochs: 1000)
| Sl no | No of features | Feature description | Mean | Min | Max |
|-------|----------------|---------------------|------|-----|-----|
| 1     | 54             | G-CM+O-GMCM         | 71   | 57  | 83  |
| 2     | 48             | G-S                 | 21   | 9   | 43  |
| 3     | 88             | G-CM+G-S            | 69   | 50  | 84  |
| 4     | 102            | G-CM+O-GMCM+G-S     | 77   | 56  | 92  |
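The feature counts reported in tables IV and VI are consistent with combined feature sets being formed by simple concatenation of the individual feature vectors. The sketch below only checks that arithmetic. The concatenation itself is an inference from the counts adding up, not something the text states, and the helper name `combined_size` is hypothetical.

```python
# Feature-vector sizes as reported in tables IV and VI.
FEATURE_SIZES = {
    "O-CM": 10,    # central moments, original image
    "O-GMCM": 14,  # geometric + central moments, original image
    "G-CM": 40,    # central moments, four Gabor directional images
    "G-GMCM": 56,  # geometric + central moments, four directional images
    "G-S": 48,     # statistical features, directional images
}


def combined_size(description: str) -> int:
    """Size of a combined feature set such as "G-CM+G-S",
    assuming the parts are concatenated end to end."""
    return sum(FEATURE_SIZES[part] for part in description.split("+"))


# The directional sets are four times the single-image sets.
assert FEATURE_SIZES["G-CM"] == 4 * FEATURE_SIZES["O-CM"]
assert FEATURE_SIZES["G-GMCM"] == 4 * FEATURE_SIZES["O-GMCM"]
```

For example, `combined_size("G-CM+O-GMCM+G-S")` gives 102, matching row 4 of table VI.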
20T305070R20B cut, with one image of a 20% cut on the top, 3 images with 30%, 50% and 70% cuts from the right and one image with a 20% cut on the bottom of the original image, performed well for vowel recognition compared to the other cuts, with an average of 69%.

TABLE VIII. RESULTS OF CUT EXPERIMENTS FOR VOWEL RECOGNITION
| Sl no | No of features | Feature description | Mean | Min | Max |
|-------|----------------|---------------------|------|-----|-----|
| 1     | 48             | 507090              | 42   | 18  | 78  |
| 2     | 48             | 406080              | 46   | 15  | 88  |
| 3     | 48             | 305070              | 34   | 14  | 64  |
| 4     | 64             | 305070L30B          | 37   | 16  | 76  |
| 5     | 64             | 406080L30B          | 35   | 13  | 72  |
| 6     | 64             | 507090L30B          | 36   | 11  | 74  |

The moments and statistical features extracted from the cut images 20T305070R20B and from the four Gabor directional images (G-CM+G-S) perform well, with an average recognition rate of 85% for vowel recognition, as in table X. For O-GMCM the result is 86% with an improvement of 1% for vowel recognition.

TABLE X. VOWEL RECOGNITION RESULTS WITH COMBINED FEATURE SET
(Vowel recognition results in %; no. of epochs: 1000)
| Sl no | No of features | Feature description           | Mean | Min | Max |
|-------|----------------|-------------------------------|------|-----|-----|
| 1     | 168            | G-CM+G-S+20T305070R20B        | 85   | 75  | 94  |
| 2     | 182            | G-CM+G-S+O-GMCM+20T305070R20B | 84   | 72  | 93  |

5 ఊ ◌ూ అం ◌ం
6 ఋ ◌ృ ఐ ◌ై
7 ఏ ◌ెౕ ఓ ◌ెూౕ
8 ఔ ◌ౌ ఆ ◌ా ఇ ◌ి
9 అం ◌ం ఈ ◌ిౕ అః ◌ః
Figure 10. Vowel Confusion Matrix

The analysis of the confusion matrix shows that the confusion is between similar vowel matras. It is also observed that some Kagunita shapes become similar to other Kagunita shapes.
For example, మ and వು , ಲె and , ತా and ಲౌ etc.
With large writing variations, if the circle in the (అం) ◌ం matra becomes smaller or the circle in the (ఆ) ◌ా matra becomes bigger, the two become closely similar. If the (ఎ) ◌ె matra is written with a short horizontal line, it can be confused with (ఇ) ◌ి