Published by
"Gheorghe Asachi" Technical University of Iaşi
Tome LVIII (LXII), Fasc. 2, 2012
Section
AUTOMATIC CONTROL and COMPUTER SCIENCE
1. Introduction
∗ Corresponding author; e-mail: fleon@cs.tuiasi.ro
32 Eugen-Dumitru Tăutu and Florin Leon
2. Preprocessing Techniques
2.1. The Otsu Method
$\sigma_W^2(t) = w_1(t)\,\sigma_1^2(t) + w_2(t)\,\sigma_2^2(t)$  (1)
$\sigma_B^2(t) = \sigma^2 - \sigma_W^2(t) = w_1(t)\,w_2(t)\,[\mu_1(t) - \mu_2(t)]^2$  (2)
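Eqs. (1) and (2) translate directly into an exhaustive search over all candidate thresholds t: pick the t that maximizes the between-class variance. A minimal Python sketch (the function name and the 0–255 intensity range are assumptions, not from the paper):

```python
def otsu_threshold(pixels):
    """Return the threshold t maximizing the between-class variance
    sigma_B^2(t) = w1(t) * w2(t) * (mu1(t) - mu2(t))^2 of Eq. (2)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w1 = 0        # pixel count of the class at or below t
    sum1 = 0.0    # intensity sum of that class
    for t in range(256):
        w1 += hist[t]
        if w1 == 0:
            continue
        w2 = total - w1
        if w2 == 0:
            break
        sum1 += t * hist[t]
        mu1 = sum1 / w1
        mu2 = (sum_all - sum1) / w2
        var_between = (w1 / total) * (w2 / total) * (mu1 - mu2) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

A clearly bimodal intensity list should yield a threshold separating the two modes.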
The segmentation of the character areas in the image is performed by a new algorithm that scans the image from left to right and from bottom to top and, upon finding a black pixel, considers it the initial area delimiting the character it belongs to. This area is then expanded in three directions, namely up, left and right, so as to include the remaining pixels of the handwritten character. Expansion in one direction stops when the frontier of pixels brought in by that expansion contains no black pixel, and resumes when the expansions in the other directions bring new black pixels into its border. The process ends either when no further expansion in any direction is possible or when the algorithm finishes scanning the entire picture.
The steps of the algorithm are the following:
− P1 - Scan the image from left to right and from bottom to top;
− P2 - For each black pixel encountered which is not part of an area
already found do:
− P2.1 - Tag the up, left and right directions as possible expansions;
− P2.2 - If there is a direction whose frontier contains no black
pixels, mark that direction as not available for expansion;
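The expansion procedure described above can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the function name and the single-loop retry scheme are assumptions. Re-checking all three directions on every pass implements the rule that a stalled direction resumes whenever another direction grows the box:

```python
def find_character_region(img, start_x, start_y):
    """Grow a bounding box around a black pixel (value 1) upward,
    left and right; a direction stops when its new frontier row or
    column contains no black pixel. img is a list of rows, row 0 at
    the top; since scanning is bottom-to-top, the bottom edge is fixed."""
    height, width = len(img), len(img[0])
    left = right = start_x
    top = bottom = start_y
    grew = True
    while grew:
        grew = False
        # frontier column just left of the current box
        if left > 0 and any(img[y][left - 1] for y in range(top, bottom + 1)):
            left -= 1
            grew = True
        # frontier column just right of the current box
        if right < width - 1 and any(img[y][right + 1] for y in range(top, bottom + 1)):
            right += 1
            grew = True
        # frontier row just above the current box
        if top > 0 and any(img[top - 1][x] for x in range(left, right + 1)):
            top -= 1
            grew = True
    return left, top, right, bottom
```

Starting from the bottom pixel of a plus-shaped character, the box grows until it covers the whole shape.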
$R_1 = \min(W_1, H_1) / \max(W_1, H_1)$  (3)
$R_2 = \min(W_2, H_2) / \max(W_2, H_2)$  (4)
Table 1
Functions for Aspect Ratio Mapping

Method                         Function
Fixed aspect ratio             R2 = 1
Aspect ratio preserved         R2 = R1
Square root of aspect ratio    R2 = √R1
Cubic root of aspect ratio     R2 = ∛R1
Sine of aspect ratio           R2 = sin(π R1 / 2)
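The mapping functions of Table 1 can be written down directly; the function and method names below are illustrative, not from the paper:

```python
import math

def map_aspect_ratio(r1, method):
    """Map the original aspect ratio R1 (Eq. 3) to the normalized
    ratio R2 (Eq. 4) using one of the functions of Table 1."""
    return {
        "fixed":     1.0,                          # R2 = 1
        "preserved": r1,                           # R2 = R1
        "sqrt":      math.sqrt(r1),                # R2 = sqrt(R1)
        "cbrt":      r1 ** (1.0 / 3.0),            # R2 = R1^(1/3)
        "sine":      math.sin(math.pi / 2 * r1),   # R2 = sin(pi/2 * R1)
    }[method]
```

All methods map a square (R1 = 1) to a square, and the sqrt/cbrt/sine variants pull elongated shapes toward a squarer normalized form.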
$x' = x'(x, y),\quad y' = y'(x, y)$  (5)
and:
$x = x(x', y'),\quad y = y(x', y')$  (6)
In the forward mapping, the discrete coordinates $(x, y)$ scan the original image pixels and the pixel value $f(x, y)$ is assigned to all target pixels that fall within the range $([x'(x, y)], [y'(x, y)])$ to $([x'(x+1, y+1)], [y'(x+1, y+1)])$. The forward mapping is mostly used because meshing the mapped coordinates $(x, y)$ can be done easily.
The functions for the forward and backward mappings are given in Table 2. The scale factors $\alpha$ and $\beta$ used there are given by:
$\alpha = W_2 / W_1,\quad \beta = H_2 / H_1$  (7)
Table 2
Functions for Coordinate Mapping

Method       Forward mapping                       Backward mapping
Linear       x' = αx,  y' = βy                     x = x'/α,  y = y'/β
Moment       x' = α(x − x_c) + x'_c,               x = (x' − x'_c)/α + x_c,
             y' = β(y − y_c) + y'_c                y = (y' − y'_c)/β + y_c
Non-linear   x' = W_2·h_x(x),  y' = H_2·h_y(y)     x = h_x⁻¹(x'/W_2),  y = h_y⁻¹(y'/H_2)
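As an illustration of the forward linear mapping of Table 2 with the scale factors of Eq. (7), the sketch below resizes a binary character matrix to a fixed W2 × H2 size. The function name and the cell-filling details are assumptions; the paper only describes the mapping itself:

```python
def normalize_linear(img, w2, h2):
    """Resize a binary character matrix to w2 x h2 using the linear
    forward mapping of Table 2: x' = alpha*x, y' = beta*y, with
    alpha = W2/W1 and beta = H2/H1 (Eq. 7). Each source pixel value
    is written to every target pixel of its mapped cell, as described
    for the forward mesh."""
    h1, w1 = len(img), len(img[0])
    alpha, beta = w2 / w1, h2 / h1
    out = [[0] * w2 for _ in range(h2)]
    for y in range(h1):
        for x in range(w1):
            # target cell covered by source pixel (x, y); ensure at
            # least one target pixel even when downscaling
            x_lo = int(alpha * x)
            x_hi = max(int(alpha * (x + 1)), x_lo + 1)
            y_lo = int(beta * y)
            y_hi = max(int(beta * (y + 1)), y_lo + 1)
            for yy in range(y_lo, min(y_hi, h2)):
                for xx in range(x_lo, min(x_hi, w2)):
                    out[yy][xx] = img[y][x]
    return out
```

Upscaling a 2×2 checkerboard to 4×4 replicates each pixel into a 2×2 block, as the forward mesh prescribes.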
For extracting the features that define the characters in the image we used the discrete cosine transformation (Watson, 1994), a technique that converts a signal into its elementary frequency components. Each line of M pixels from an image can be represented as a sum of M weighted cosine functions, evaluated at discrete points, as shown by the following equation (in the one-dimensional case):
$T_i = \sqrt{\dfrac{2}{M}}\; C_i \sum_{x=0}^{M-1} s_x \cos\dfrac{(2x+1)\,i\,\pi}{2M}$  (8)

for $0 \le i < M$, where $C_i = \dfrac{\sqrt{2}}{2}$ for $i = 0$, otherwise $C_i = 1$.
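Eq. (8) can be computed naively in O(M²); a minimal sketch (the function name is illustrative):

```python
import math

def dct_1d(s):
    """Discrete cosine transform of Eq. (8):
    T_i = sqrt(2/M) * C_i * sum_x s_x * cos((2x+1)*i*pi / (2M)),
    with C_i = sqrt(2)/2 for i = 0 and C_i = 1 otherwise."""
    m = len(s)
    out = []
    for i in range(m):
        c = math.sqrt(2) / 2 if i == 0 else 1.0
        acc = sum(s[x] * math.cos((2 * x + 1) * i * math.pi / (2 * m))
                  for x in range(m))
        out.append(math.sqrt(2.0 / m) * c * acc)
    return out
```

A constant signal concentrates all its energy in the DC coefficient T_0; all higher-frequency coefficients vanish.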
$T_{i,j} = \dfrac{1}{4} C_i C_j \sum_{x=0}^{7} \sum_{y=0}^{7} s_{y,x} \cos\dfrac{(2x+1)\,j\,\pi}{16} \cos\dfrac{(2y+1)\,i\,\pi}{16}$  (9)

In eq. (9) it is considered that $C_i = C_j = \dfrac{1}{\sqrt{2}}$ for $i, j = 0$, otherwise $C_i = C_j = 1$.
The transformed matrix elements with lower indices correspond to the coarser details in the image and those with higher indices to the finer ones. Therefore, if we analyze the matrix T obtained by processing different blocks of an image, we see high values (positive or negative) in the upper left corner of the matrix, while toward the bottom right corner the values decline, tending to 0.
The next step is the actual selection of certain elements of the matrix. The first operation is to order the elements of the matrix into a one-dimensional array so as to group as many zero values as possible at its end. The ordering is done by reading the matrix in zigzag.
To extract the necessary features for character recognition we can select
the first N values from this array. As N increases, so does the recognition
accuracy, but that happens at the expense of increasing the training time of the
support vector machine.
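The zigzag reading and the selection of the first N coefficients can be sketched as follows; the diagonal orientation follows the usual JPEG convention, which the paper does not spell out, and the function names are illustrative:

```python
def zigzag(matrix):
    """Read a square matrix along its anti-diagonals i + j = d,
    alternating direction, so low-index (coarse) coefficients come
    first and the small high-frequency ones cluster at the end."""
    n = len(matrix)
    result = []
    for d in range(2 * n - 1):
        coords = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        if d % 2 == 0:
            coords.reverse()   # even diagonals run bottom-left -> top-right
        result.extend(matrix[i][j] for i, j in coords)
    return result

def dct_features(t_matrix, n_features):
    """Keep the first N zigzag coefficients of the DCT matrix T
    as the feature vector."""
    return zigzag(t_matrix)[:n_features]
```

For a 3×3 matrix the traversal visits positions in the order familiar from JPEG coefficient coding.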
3. System Architecture
A first step through which the image passes is its conversion to gray levels, after which the optimal threshold for binarization is computed using the Otsu method. With this threshold, the image is converted to black and white, thus highlighting the handwritten characters it contains.
The next step is to segment the areas corresponding to the letters of the
handwritten words from the image converted to black and white, after which
these areas are converted to a matrix with values of 0 and 1. One last operation
performed by this module is to normalize this matrix to a predetermined size so
as to facilitate the feature extraction process.
This module was designed to extract, from the segmented areas of the image, the features of the characters to be recognized: traits that distinguish an area corresponding to one letter from the areas corresponding to other letters.
To begin with, the first n components of the discrete cosine transformation of a segmented area are considered to be the features that describe it. In the next phase, certain statistical details of the area are added to the discrete cosine transformation components to define its features:
− number of black pixels in the matrix (the so-called "on" pixels);
− mean of the horizontal positions of all the "on" pixels, relative to the centre of the image and to its width;
− mean of the vertical positions of all the "on" pixels, relative to the centre of the image and to its height;
− mean of the horizontal distances between "on" pixels;
− mean of the vertical distances between "on" pixels;
− mean product of the vertical and horizontal distances between "on" pixels;
− mean product of the square of the horizontal distances and the vertical distances between all "on" pixels;
− mean product of the square of the vertical distances and the horizontal distances between all "on" pixels;
− mean number of margins met by scanning the image from left to right;
− sum of the vertical positions of the margins met by scanning the image from left to right;
− mean number of margins met by scanning the image from bottom to top;
− sum of the horizontal positions of the margins met by scanning the image from top to bottom.
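A few of the statistics above can be sketched as follows for a binary character matrix. The exact normalizations are assumptions, since the paper does not give formulas; the function and key names are illustrative:

```python
def pixel_statistics(img):
    """Compute a subset of the statistical features listed above on
    a binary character matrix (1 = "on" pixel); a sketch, not the
    authors' exact implementation."""
    h, w = len(img), len(img[0])
    on = [(x, y) for y in range(h) for x in range(w) if img[y][x]]
    n = len(on)
    # horizontal / vertical positions relative to the centre,
    # normalized by the width / height of the matrix
    mean_x = sum((x - w / 2) / w for x, _ in on) / n
    mean_y = sum((y - h / 2) / h for _, y in on) / n
    # mean number of edges ("margins") met scanning each row left to right
    edges = 0
    for y in range(h):
        edges += sum(1 for x in range(1, w) if img[y][x] != img[y][x - 1])
    mean_edges = edges / h
    return {"n_on": n, "mean_x": mean_x, "mean_y": mean_y,
            "mean_edges": mean_edges}
```

For a vertical bar in the right column of a 2×2 matrix, the two "on" pixels average to the horizontal centre and each row crosses exactly one margin.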
One last operation implemented by this module is the normalization of the results obtained so far, so that they correspond to the format accepted by the support vector machine module.
various parameters of these kernels (Hsu et al., 2010). After setting the type of
kernel and its parameters, the support vector machine is trained with the set of
features given by the other modules. Once the training is over, the support vector
machine can be used to classify new sets of characters. For building this module,
we used the SVM.NET library (Johnson, 2009) which is an implementation of the
libSVM library (Chang & Lin, 2012) for the .NET platform.
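The four kernel families compared in Tables 3–5 have the standard libSVM forms. The parameter names below (d, γ, r) follow the libSVM convention that the SVM.NET wrapper exposes; how they map onto the table columns is an assumption:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    # k(u, v) = u . v
    return dot(u, v)

def polynomial_kernel(u, v, gamma, r, d):
    # k(u, v) = (gamma * u.v + r)^d
    return (gamma * dot(u, v) + r) ** d

def rbf_kernel(u, v, gamma):
    # k(u, v) = exp(-gamma * ||u - v||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid_kernel(u, v, gamma, r):
    # k(u, v) = tanh(gamma * u.v + r)
    return math.tanh(gamma * dot(u, v) + r)
```

The RBF kernel of any vector with itself is 1, which makes the identity checks below immediate.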
4. Experimental Results
For testing the accuracy of the system, we used, in a first test scenario,
an image which contained 100 small letters only (Fig. 2). The construction of
the training set, which consisted of two images containing 40 examples of each
small letter in the English alphabet, took 18.5058 sec. The results are presented
in Table 3.
Table 3
Results for Training with Sets that Correspond only to Small Letters

Kernel function   C     d     γ     σ      r    Precision
Linear            1     −     −     −      −    88%
Linear            10    −     −     −      −    86%
Linear            100   −     −     −      −    86%
RBF               10    −     −     0.25   −    91%
RBF               10    −     −     0.15   −    91%
RBF               10    −     −     0.1    −    92%
RBF               10    −     −     0.05   −    93%
RBF               10    −     −     0.03   −    92%
RBF               10    −     −     0.02   −    95%
RBF               10    −     −     0.01   −    96%
RBF               10    −     −     0.005  −    95%
Polynomial        10    2     2     −      −    93%
Polynomial        10    2     4     −      −    80%
Polynomial        10    2     1     −      −    88%
Polynomial        10    2     0.5   −      −    87%
Polynomial        10    3     2     −      −    92%
Polynomial        10    3     4     −      −    80%
Polynomial        10    3     1     −      −    88%
Polynomial        10    3     0.5   −      −    87%
Polynomial        10    4     2     −      −    92%
Polynomial        10    4     4     −      −    82%
Polynomial        10    4     1     −      −    88%
Polynomial        10    4     0.5   −      −    87%
Sigmoid           10    −     0.5   −      1    93%
Sigmoid           10    −     0.5   −      5    83%
Sigmoid           10    −     0.2   −      1    66%
Sigmoid           10    −     0.7   −      1    49%
For the next test scenario we used for training only the features
corresponding to capital letters. The image used for testing contained 100
letters, and the construction of the training set, which consisted of two images
containing 40 examples of each capital letter in the English alphabet, took
19.5075 sec. The results are presented in Table 4.
Table 4
Results for Training with Sets that Correspond only to Capital Letters

Kernel function   C     d     γ     σ      r    Precision
Linear            1     −     −     −      −    87%
Linear            10    −     −     −      −    88%
Linear            100   −     −     −      −    86%
RBF               10    −     −     0.25   −    90%
RBF               10    −     −     0.15   −    89%
RBF               10    −     −     0.1    −    93%
RBF               10    −     −     0.05   −    92%
RBF               10    −     −     0.03   −    89%
RBF               10    −     −     0.02   −    89%
RBF               10    −     −     0.01   −    88%
RBF               10    −     −     0.005  −    90%
Polynomial        10    2     2     −      −    93%
Polynomial        10    2     4     −      −    75%
Polynomial        10    2     1     −      −    93%
Polynomial        10    2     0.5   −      −    92%
Polynomial        10    3     2     −      −    93%
Polynomial        10    3     4     −      −    71%
Polynomial        10    3     1     −      −    95%
Polynomial        10    3     0.5   −      −    91%
Polynomial        10    4     2     −      −    94%
Polynomial        10    4     4     −      −    71%
Polynomial        10    4     1     −      −    92%
Polynomial        10    4     0.5   −      −    91%
Sigmoid           10    −     0.5   −      1    47%
Sigmoid           10    −     0.2   −      1    66%
The last test cases consisted of training the support vector machine with
sets corresponding to both small and capital letters. The images used for testing
were the previous ones, and the construction of the training set took 37.9334
seconds. The results are presented in Table 5.
Table 5
Results for Training with Sets Corresponding to Both Small and Capital Letters

Kernel function   C     d     γ     σ      r    Precision
Linear            1     −     −     −      −    79%
Linear            10    −     −     −      −    74.5%
RBF               10    −     −     0.25   −    76.5%
RBF               10    −     −     0.1    −    79%
RBF               10    −     −     0.05   −    80%
RBF               10    −     −     0.01   −    76.5%
Polynomial        10    2     2     −      −    74%
Polynomial        10    3     2     −      −    71%
Polynomial        10    4     2     −      −    71%
Sigmoid           10    −     0.5   −      1    47%
Sigmoid           10    −     0.2   −      1    50%
lengthiest operation of the system), the processor was loaded to 33% and the
application occupied 55.120 MB of memory. During idle mode, the application
consumes 44.892 MB of memory.
6. Conclusions
The system uses the Otsu technique for computing the global threshold of an image, which is later used for its binarization. Following that, an original algorithm segments the characters from the image, after which the areas it determines are transformed and normalized to the same dimensions in order to facilitate the feature extraction process. The features extracted from an area include components of the discrete cosine transformation applied to the image and some statistical details of the area. One last step before sending these features to the support vector machine is to scale them to a range it accepts.
Reaching a precision of over 90% when training with sets corresponding to small or capital letters alone, and of over 75% when training the support vector machine with sets of both small and capital letters, the system achieved its goal: the recognition of characters from an image.
A future direction for extending the system would be the addition of techniques that automatically determine the optimal parameters of the kernel functions. An original implementation of the support vector machine module could also be added later on.
REFERENCES
Sandu V., Leon F., Recognition of Handwritten Digits Using Multilayer Perceptrons. Bul. Inst. Polit. Iaşi, s. Automatic Control and Computer Science, LV (LIX), 4, 103−114, 2009.
Watson B.A., Image Compression Using the Discrete Cosine Transform. Mathematica Journal, 4, 1, 81−88, 1994.