
Pattern Recognition 58 (2016) 172–189

Contents lists available at ScienceDirect

Pattern Recognition
journal homepage: www.elsevier.com/locate/pr

A multi-objective approach towards cost effective isolated handwritten Bangla character and digit recognition
Ritesh Sarkhel, Nibaran Das*, Amit K. Saha, Mita Nasipuri
Computer Science and Engineering Department, Jadavpur University, Kolkata 700032, India

Article info

Article history:
Received 22 September 2015
Received in revised form 22 March 2016
Accepted 13 April 2016
Available online 22 April 2016

Keywords:
Feature set
Region sampling
Handwritten character recognition
Multi-objective evolutionary algorithm
Harmony search
NSGA-II
AFS theory

Abstract

Identifying the most informative local regions of a handwritten character image is necessary for a robust handwritten character recognition system, but identifying them from a character image is a difficult task. If this task is to be performed at the minimum possible cost, it becomes more challenging because two independent, apparently contradicting objectives need to be optimized simultaneously, i.e. maximizing the recognition accuracy and minimizing the associated recognition cost. To address the problem a multi-objective approach is required. In the present task, two popular multi-objective optimization algorithms, (1) a Non-Dominated Sorting Harmony-Search Algorithm (NSHA) and (2) a Non-Dominated Sorting Genetic Algorithm-II (NSGA-II, Deb et al., 2002 [18]), are employed for region sampling separately. The method objectively selects the most informative set of local regions using the framework of Axiomatic Fuzzy Set (AFS) theory, from the sets of pareto-optimal solutions provided by the multi-objective region sampling algorithms. The system has been evaluated on two isolated handwritten Bangla datasets, (1) a dataset of randomly mixed handwritten Bangla Basic and Compound characters and (2) a dataset of handwritten Bangla numerals separately, with an SVM based classifier, using a feature set containing convex-hull based features and CG based quad-tree partitioned longest-run based local features extracted from the selected local regions. The results have shown a significant increase in recognition accuracy and decrease in recognition cost for all the datasets. Thus the present system introduces a cost effective approach towards isolated handwritten character recognition systems.
© 2016 Elsevier Ltd. All rights reserved.

* Corresponding author: N. Das. Tel./fax: +91 3324146766.
E-mail address: nibaran@gmail.com (N. Das).

1. Introduction

Optical Character Recognition (OCR) is an active area of research. While there are many systems commercially available for recognizing printed text [1–4], their success is yet to be extended to handwritten characters. Several reasons can be cited to explain this apparent anomaly. Shape and size of handwritten characters vary from one individual to another. They may even vary for a single individual from time to time, depending on various factors. These challenges make the task of recognizing handwritten characters very difficult. Researchers all around the world have proposed several methods [5] for handwritten character recognition, but most of them are focused on Roman scripts [6], concentrating on English and other European languages. Among Asian languages, Chinese [7], Japanese and Korean are dominant in the literature. Indian scripts like Malayalam, Tamil, Telugu, and Hindi have started to get the attention of researchers during the past decade [8,9], but development of OCR for complete Bangla script [10] has not received much attention from researchers until recently. Bangla is the second most popular script in India and the fifth most popular script in the world [11]. Bangla alphabet contains some of the most intricate and complex characters, which differ from one another only by a single period, a modifier ref or an upper horizontal line or Matra, as shown in an example in Fig. 1. Bangla alphabet contains about 50 Basic characters (11 vowels and 39 consonants) and more than 334 Compound characters [12]. Samples of a few Bangla Basic and Compound characters are shown in Fig. 2.

One of the most common approaches taken up by OCR researchers is zoning, i.e. dividing the character image into several zones or local regions [13] and generating the invariant local feature set by extracting features from every local region. There are several different zoning methods [13] mentioned in the literature, but most of them can be classified into two major categories: static [4,10] and dynamic zoning methods [13]. Static zoning methods divide a handwritten character image into a fixed set of overlapping or non-overlapping windows, where the number of windows is fixed. Basu et al. used a static zoning method in [14] and sub-divided the handwritten numerals' image into 9 fixed-sized, overlapping local

http://dx.doi.org/10.1016/j.patcog.2016.04.010
0031-3203/© 2016 Elsevier Ltd. All rights reserved.
R. Sarkhel et al. / Pattern Recognition 58 (2016) 172–189 173

Fig. 1. Similarity in shape and size between different Bangla characters. (a). Bangla Basic character ‘ ’ (b) Bangla Basic character ‘ ’.

Fig. 2. Samples of handwritten Bangla characters.

regions and extracted longest-run based features from each sub-region. On the other hand, dynamic zoning methods sub-divide a handwritten character image into local regions by dynamically creating windows based on some statistical or topological feature of that specific character. Cao et al. [8] proposed a similar technique to generate a hierarchical feature-space based on a quin-tree partition of the character image, where zones were dynamically created based on the centroid of the contour segment of the character residing in the parent zone. Das et al. [10,15] have used a GA based selection mechanism to find out the most optimal set of local regions for recognition of handwritten Bangla numerals.

In those papers, the researchers have emphasized achieving better recognition accuracies, but the associated recognition costs incurred in the process were not taken into consideration. For example, Das et al. presented a two pass approach towards handwritten character recognition in [10], which produced a significant increase in the recognition accuracy, but at a recognition cost which is almost 8.5 times the average per character recognition cost incurred by the traditional single pass approach towards handwritten character recognition. This may prove to be undesirable to users who want to use such a system for real-life applications. An extensive study of recognition accuracy versus associated recognition cost is undertaken in our experimental setup to investigate the scope of a practical optical character recognition system, in terms of both recognition accuracy and associated recognition cost.

In the present work, a multi-objective approach towards optical character recognition (OCR) is proposed, which attempts to find a trade-off between the recognition accuracy achieved by the system and its associated recognition costs. In real life applications of an OCR system, an insignificant increase in recognition accuracy at the expense of high recognition cost may not be acceptable to the users of the system. In such cases, a multi-objective approach can provide the user with a set of good solutions. In the present work, the framework of a novel, multi-objective isolated handwritten character recognition system is proposed. There are several variants of multi-objective Evolutionary Algorithms [16] present in the literature. A Non-dominated Sorting Harmony-search Algorithm (NSHA [17]) based region sampling method and a Non-dominated Sorting Genetic Algorithm-II (NSGA-II [18]) based region sampling method are introduced in our present work. These two multi-objective region sampling algorithms mark one of the contributions of the present work. Both of the region sampling algorithms are employed over the decision space separately. These algorithms have two objective functions: (1) maximizing handwritten character recognition accuracy and (2) minimizing associated recognition costs. In our experimental setup, recognition accuracy is measured using an SVM based classifier and recognition costs are measured by: (i) average time taken by the recognition system to recognize each handwritten character in the test-set and (ii) the number of local regions used to represent each handwritten character in the test-set. Two sets of pareto-optimal solutions provided by these two algorithms are then combined using Axiomatic Fuzzy Set (AFS) Theory [19]. The multi-objective region sampling algorithms and the AFS theory based approach to objectively combine the pareto-optimal solutions provided by the multi-objective algorithms mark one of the contributions of the present work.

The proposed method tries to find an objective solution over the decision space, while providing an optimal trade-off between recognition accuracy and corresponding recognition costs, making it suitable to use in practical applications. The present work has been evaluated on datasets of isolated handwritten Bangla characters and handwritten Bangla numerals separately. Results from these experiments have been compared with some of the other popular handwritten character recognition methods present in the literature, to prove its superiority.

The rest of the paper is organized as follows: in Section 2, a brief overview of multi-objective evolutionary algorithm based region sampling techniques is presented; the basics of Axiomatic Fuzzy Set (AFS) theory are introduced in Section 3; Section 4 describes the featureset; our present work is discussed in detail in Section 5; and experimental results are presented in Section 6. Finally, a brief conclusion is drawn based on the results gathered from the experiments.

2. Motivation behind using multi-objective evolutionary algorithms for region sampling

Region sampling based OCR systems try to identify the most discriminative set of local regions from handwritten character

images. The easiest approach is to exhaustively enumerate every possible combination of local regions until the best combination is found. This approach, however, would take time of exponential order. Therefore, Evolutionary Algorithm based meta-heuristic approaches are generally employed, so that a good enough solution is found within a reasonable amount of time.

Most of the proposed methods present in handwritten character recognition literature [11], however, focus only on increasing the accuracy of the recognition system. Sometimes such increments in recognition accuracy may come at the expense of increased recognition cost. In such a scenario, where moderate increments in recognition accuracy are achieved by incurring high recognition costs, the solutions provided by a recognition system may not be acceptable to the end-user. A handwritten character recognition system, to be used in real-life applications, should not just increase the average recognition accuracy of its test dataset but keep a check on the associated recognition cost as well. Our present work uses Multi-objective Evolutionary Algorithms to find such a set of solutions. The multi-objective Evolutionary Algorithm based region sampling technique, used in our experimental setup, tries to identify the most informative set of local regions by heuristically searching the solution space, while incurring minimum possible recognition cost.

To formalize, in our experimental setup, the multi-objective region sampling techniques try to optimize two separate objectives simultaneously, which are: (a) maximizing the recognition accuracy and (b) minimizing the cost associated with the recognition process, as described before. The recognition cost is represented in terms of two major factors: (i) number of local regions needed to represent the handwritten characters and (ii) average time taken by the proposed system to recognize each handwritten character. This set of solutions, which is also called the pareto-optimal solutions [20], presents the users of the system with a set of choices. Users may choose a single solution from the pareto-optimal set, based on their real-time requirement and domain expertise.

There are several variants of multi-objective Evolutionary Algorithms [21] present in the literature. In our present work, we have used a widely popular multi-objective Evolutionary Algorithm, Non-dominated Sorting Genetic Algorithm (NSGA-II), proposed by Deb et al. [18]. Since its introduction, NSGA-II has been applied in various applications such as water network design [22], construction management [23], economics [24], population planning [25] etc. To the best of our knowledge, the performance of NSGA-II in a region sampling based handwritten character recognition system is yet to be explored. In the present work, an NSGA-II based region sampling algorithm is proposed. We have included an extensive study of another multi-objective Evolutionary Algorithm in our experimental setup. Harmony Search Algorithm is a relatively new, music inspired meta-heuristics algorithm, proposed by Geem et al. [26,44]. It has been proved [27] to provide better performance than Genetic Algorithms for some applications. Several variants of multi-objective Harmony Search Algorithms [28] are presented in the literature. They have been used to address problems such as economic/environmental dispatch [29], optimal power flow problem [17] etc., but to the best of our knowledge, the performance of a multi-objective Harmony Search Algorithm in a region sampling based handwritten character recognition system is yet to be explored. Here, we have introduced a Non-dominated Sorting Harmony Search Algorithm (NSHA) based region sampling technique for this purpose.

Both the NSGA-II and NSHA based region sampling techniques, proposed in our present work, have been tested extensively on publicly available datasets of isolated handwritten Bangla characters and isolated handwritten Bangla numerals. Details about the datasets used in the present work are provided in Section 4. Using multiple region sampling algorithms and comparing their results helps us to reach a conclusive decision about the performance of the proposed system, as the results become independent of the specificities of any particular algorithm in consideration.

In our present work, an Axiomatic Fuzzy Set (AFS) Theory based fuzzy logic is proposed to combine and finally return a single solution from the candidate set of pareto-optimal solutions extracted from the multi-objective region sampling algorithms used in our experimental setup. AFS Theory has been employed in our experimental setup because its framework provides much greater flexibility [19] in computing the interaction between a set of local regions. It is also easier to compute the combined class-separating power of a set of local regions, compared to other fuzzy logic based systems. Important aspects of a fuzzy logic based system, like the definition of membership functions and logical operations between a set of elements in the fuzzy featureset, are already defined [19] in the framework of AFS theory.

3. A Brief overview on axiomatic set theory based fuzzy logic

A priori identification of the set of local regions containing the most informative set of features is a difficult task. Local region or feature selection methods, described in the literature, can be broadly classified into two categories: wrappers and filters [30]. Wrappers identify the most informative set of features from the discourse with the help of a training dataset and an efficient learning algorithm, whereas filters use some kind of heuristics to identify the feature set that has the most promise. For applications like handwritten character recognition, the merit of a local region is interpreted by its contribution to the recognition system's class-separability power. Although there has been some literature that uses entropy based fuzzy interpretation [31] of local features in feature-ranking techniques for recognition of handwritten Hindi numerals, fuzzification of features to determine the feature set with most class-separability power for handwritten character recognition has not been addressed that much. A collection of individually good features does not ensure a good feature set [32]. Finding out all the possible combinations of local features is also a difficult and time consuming task [33]. Our present work uses the framework of AFS theory [19] based algebra to define fuzzy semantics, membership functions and logical operations on a set of local regions to identify the subset of local regions which has the most class-separating power.

3.1. AFS algebra based fuzzy logic

AFS algebra mainly focuses on two things: how fuzzy sets are created, i.e. the definition of membership functions, and how logical operations on fuzzy sets are defined. The concepts extracted from a given dataset depend strongly on the observed data [19]. For example, the concept of "heavy" (in terms of a person's weight) is not interpreted the same way in a dojo of Japanese Sumo wrestlers, as it would be interpreted in a Ballet studio of Broadway. Strong additional background knowledge on the dataset is thus necessary to extract a concept.

A fuzzy concept is defined based on one or more features of the dataset. Let $M$ be the set of fuzzy concepts, $M = \{m_1, m_2, \ldots, m_n\}$, where a concept $m_i$ is defined on a feature of $X_i \in F$. $M$ can be viewed as a building block, containing the elementary concepts associated to each feature. Using $M$, every possible concept on $X$ can be easily formulated and represented, which essentially means that every possible concept $Y$ on $X$ can be easily formulated using

$M$,

$$Y = \sum_{j \in J} \Big( \prod_{m_i \in C_j} m_i \Big) \quad \text{where } m_i \subseteq C_j. \tag{1}$$

3.1.1. Definitions

Let $M$ be a non-empty set. Then the set $EM^*$ can be defined as

$$EM^* = \Big\{ \sum_{j \in J} \Big( \prod_{m_i \in C_j} m_i \Big) \;\Big|\; C_j \in 2^M,\ j \in J,\ J \text{ may be any non-empty index set} \Big\}. \tag{2}$$

A binary relation $R$ between two fuzzy concepts is defined as follows:

For $\sum_{j \in J} \big( \prod_{m_i \in C_j} m_i \big),\ \sum_{i \in I} \big( \prod_{m_k \in D_i} m_k \big) \in EM^*$,

$$\sum_{j \in J} \Big( \prod_{m_i \in C_j} m_i \Big) \; R \; \sum_{i \in I} \Big( \prod_{m_k \in D_i} m_k \Big) \iff \begin{cases} \forall\, C_j\ (j \in J)\ \exists\, D_i\ (i \in I) \text{ such that } C_j \supseteq D_i \\ \forall\, D_i\ (i \in I)\ \exists\, C_j\ (j \in J) \text{ such that } D_i \supseteq C_j \end{cases} \tag{3}$$

It is clear from (3) that $R$ is an equivalence relation and the semantics of $\sum_{j \in J} \big( \prod_{m_i \in C_j} m_i \big)$ and $\sum_{i \in I} \big( \prod_{m_k \in D_i} m_k \big)$ are equivalent under $R$.

The quotient $EM^*/R$ is called $EM$ and each

$$\sum_{i \in I} \Big( \prod_{m_k \in D_i} m_k \Big) \in EM \text{ is called a fuzzy concept.} \tag{4}$$

3.2. AFS structure

AFS structure formalizes a mathematical description of the data structure used by the AFS algebra, a completely distributive lattice [19] generated by the fuzzy datasets and the concepts behind them. An AFS structure can be defined as a triplet $(M, \tau, X')$.

3.2.1. Definition

Let $X'$, $M$ be two finite sets. Let $\tau$ be a relation defined as $\tau: X' \times X' \rightarrow 2^M$. $\tau$ is called an AFS structure if it follows two axioms:

$$\begin{aligned} &(a)\ \forall\, (x', y') \in X' \times X',\ \tau(x', y') \subseteq \tau(x', x') \\ &(b)\ \forall\, (x', y'), (y', z') \in X' \times X',\ \tau(x', y') \cap \tau(y', z') \subseteq \tau(x', z') \end{aligned} \tag{5}$$

In this definition, $X'$ is the universe of discourse, $M$ is the concept set and $\tau$ is the axiomatic structure. In real world applications, $\tau$ can be defined as follows:

$$\tau(x', y') = \{ m \mid m \in M,\ x' R_m y' \} \in 2^M, \tag{6}$$

where $R_m$ represents the binary relation of simple concept $m \in M$, and $x' R_m y'$ means the degree of $x'$ belonging to attribute $m$ is larger than or equal to that of $y'$.

The membership function of a fuzzy concept $\gamma = \sum_{j \in J} \big( \prod_{m_i \in C_j} m_i \big)$ is defined as follows:

$$\theta_\gamma = \sup_j \prod_{\gamma \in C_j} \mathcal{M}_\gamma\big( C_j^{\tau}(x') \big), \quad \text{where } C^{\tau}(x') = \{ z' \in X' \mid \tau(x', z') \supseteq C \}, \tag{7}$$

where $C^{\tau}(x')$ is the set of all elements whose degree of belonging to $\prod_{m \in C} m$ is less than or equal to that of $x'$ and $\mathcal{M}_\gamma(\cdot)$ is a function whose value always lies between 0 and 1.

4. Design of the featureset

4.1. Dataset of the experiment

The proposed method has been evaluated on four publicly available benchmark [12,34] datasets. A brief overview of the datasets used in the present work is described in Table 1. More details about these databases such as sample collection, preparation techniques etc. can be found in the 'Reference' column provided in Table 1.

4.2. Design of the feature set

The extracted features, used in the present work, can be classified into two categories: global and local. While the number of global features extracted from a handwritten character or digit image is fixed, in case of local features it may vary based on the number of local regions considered. Global features are extracted from the whole image, whereas local features are extracted from a sub-region or local region [13] of the image being considered. Both global and local features are used in the feature set so that the inherent distinguishing pattern of a character or digit can be sufficiently quantified.

4.2.1. Global features

The number of global features used in our experimental setup is 175, out of which 155 are convex-hull based features and the rest are quad-tree based longest run based features. Convex hull based features, proposed by Das et al. [40], are used in the present work. The feature set includes maximum dcp, total number of rows having dcp > 0, average dcp, number of visible bays, etc. CG based quad-tree partitioned longest-run features were suggested by Basu et al. in the year 2009. Longest run features extracted from the root node (1*4 = 4) and the first-level child-nodes (4*4 = 16) of the resultant quad-tree also contribute to the global feature set (4 + 16 = 20). Details about the feature extraction techniques can be found in reference [41].

4.2.2. Local features

The features extracted from the second level of child nodes of the CG based quad-tree [42] contribute to the local feature set. The number of nodes in the second level of the quad-tree is 16 ($4^2 = 16$). Longest run based features are extracted from each local region represented by a node in the quad-tree, along four axes: horizontal, vertical and two diagonals. Each local region is represented by 4 local features. Hence, the total number of local features in the feature set of our present work is 64 (16*4 = 64) and the aggregate number of features in the integrated feature set of the experimental setup is 239 (175 + 64 = 239).
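The feature counts above can be checked with a minimal sketch. It assumes, as an illustration only, that a longest-run feature along one axis is the sum of the longest runs of foreground pixels over all lines parallel to that axis, normalised by the region area; the exact definition is the one given in the cited references. The function names `longest_run_features` and `feature_vector`, and the mapping of the two diagonal directions, are hypothetical and not taken from the paper.

```python
def longest_run(line):
    """Length of the longest run of consecutive foreground (1) pixels."""
    best = current = 0
    for pixel in line:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


def diagonals(region, anti=False):
    """Group the pixels of a 2-D list into its diagonals (or anti-diagonals)."""
    lines = {}
    for r, row in enumerate(region):
        for c, pixel in enumerate(row):
            key = r + c if anti else r - c
            lines.setdefault(key, []).append(pixel)
    return list(lines.values())


def longest_run_features(region):
    """Four features (horizontal, vertical, two diagonals) of a binary region.

    Assumption: each feature is the area-normalised sum of per-line longest runs.
    """
    rows = [list(row) for row in region]
    cols = [list(col) for col in zip(*region)]
    area = float(len(region) * len(region[0]))
    axes = (rows, cols, diagonals(region), diagonals(region, anti=True))
    return [sum(longest_run(line) for line in lines) / area for lines in axes]


def feature_vector(global_features, regions, mask):
    """Global features followed by 4 local features per selected region."""
    local = []
    for region, keep in zip(regions, mask):
        if keep:
            local.extend(longest_run_features(region))
    return list(global_features) + local
```

With 175 global features and all 16 second-level regions selected, `feature_vector` returns 239 values, matching the aggregate count stated above; selecting fewer regions shortens the vector by 4 features per dropped region, which is how the number of regions enters the recognition cost.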

Table 1
Datasets used in the present work.

Index | Name of the dataset | Dataset type | Number of classes | Number of training samples | Number of test samples | Reference
DB1 | CMATERdb 3.1.1 | Isolated, handwritten Bangla numerals | 10 | 4000 | 2000 | [35,36]
DB2 | ISI handwritten Bangla numeral database | Isolated, handwritten Bangla numerals | 10 | 19392 | 4000 | [37,38]
DB3 | CMATERdb 3.2.1 | Isolated, handwritten Bangla Basic characters | 50 | 12000 | 3000 | [35,36]
DB4 | CMATERdb 3.2.1 + CMATERdb 3.1.3.3 | Randomly mixed, isolated, handwritten Bangla Basic and Compound characters | 384 | 46919 | 11661 | [36,39]

A brief overview of the feature extraction techniques employed in our present work is presented in Fig. 3.

5. Present work

As discussed before, the objectives of our present work are threefold. They are: (a) proposing a region sampling methodology that returns a subset of local regions containing the most discriminating features, while incurring minimum possible recognition cost, (b) evaluating the performance of the proposed method on two separate, publicly available datasets of Bangla handwritten characters and (c) comparing the performance of the proposed method against some of the popular contemporaries present in the literature. Fig. 4 presents a block diagram of the proposed system.

5.1. Definitions and notations

Let $I_{H \times W}$ be a 2D array that denotes a digital image $M$ of dimensions $H \times W$, such that $I_{H \times W} = \{ f(i,j) \mid 0 \le i \le H-1 \text{ and } 0 \le j \le W-1 \}$, where $f(i,j)$ denotes the intensity of the pixel at position $(i,j)$. In our present work, only binary images of Bangla handwritten characters and numerals are considered. For a binary image the value of $f(i,j) \in \{0, 1\}$. As discussed before, a 2D binary image is assumed to be a combination of a number of overlapping/non-overlapping regions $R_k$, $k = \{1, 2, \ldots, n\}$, i.e. $M = \bigcup_k R_k$. Here the regions $R_k$, $k = \{1, 2, \ldots, n\}$ are rectangular in shape with edges parallel to corresponding edges of $M$. Thus, a region $R_k$ is defined by the pixel-pairs at its top-left and bottom-right corners, i.e. $R_k \equiv \{ i_k^{TL}, j_k^{TL}, i_k^{BR}, j_k^{BR} \}$, where the pixel at $(i_k^{TL}, j_k^{TL})$ denotes the pixel at the top-left corner of the region and the pixel at $(i_k^{BR}, j_k^{BR})$ denotes the pixel at the bottom-right corner of the region. The goal of the present work is to find a subset $M_i \subseteq M$, such that the recognition cost of the character image (when described by the set of regions $M_i$) is minimal and the recognition accuracy achieved by the proposed system is maximum.

5.2. Region sampling methodology

As discussed before, the local feature set used in the present work is comprised of longest run based features [15] computed along 4 directions, extracted from the second level of the CG based quad-tree partition of a sample handwritten character or digit image. Let $i$ denote the partition level of a CG based quad-tree, where $i \in \{0, 1, 2\}$. $S_i$ denotes the set of local regions at the $i$-th level of the quad-tree; then $S_i = \{ R_{ij} \mid j \in \{0, 1, \ldots, 4^i - 1\} \}$. Clearly, a sample image $M$ can be defined as $M = \bigcup_j R_{ij}$. Let the set of local features extracted from the region $R_{ij}$ of $M$ be denoted by $V_{ij}$, where $V_{ij} = \{ F_{ij}^{H}, F_{ij}^{V}, F_{ij}^{D1}, F_{ij}^{D2} \}$. $F_{ij}^{H}$, $F_{ij}^{V}$, $F_{ij}^{D1}$, $F_{ij}^{D2}$ denote the values of the longest run based feature extracted from the local region $R_{ij}$ of the handwritten character or numeral image, along its four major axes, i.e. horizontal axis, vertical axis, principal diagonal axis-1 (south-west diagonal) and principal diagonal axis-2 (north-east diagonal) respectively, as shown in Fig. 8. The global feature set of $M$ is defined as $G = \big( \bigcup_{i \in \{0,1\},\, j} V_{ij} \big) \cup CF$, where $CF$ denotes the convex-hull based features extracted from the entire image. The local feature set of $M$ is defined as $L = \bigcup_{i = 2,\, j} V_{ij}$. The fitness value of a set of regions $L_m \subset L$ is denoted by $f(L_m \cup G)$, where $f(\cdot)$ denotes the recognition accuracy achieved by an SVM based classifier on the sample handwritten character or digit image, described by only the set of local regions $L_m$.

The present work performs region sampling in two phases. In the first phase of the proposed system, a non-dominated sorting harmony-search algorithm (NSHA) based region sampling method and a non-dominated sorting genetic algorithm (NSGA-II by Deb et al. [18]) based region sampling method are employed on the dataset separately and the non-dominated points are extracted from both of the pareto-optimal fronts. In the final phase, a quality consensus is performed among the extracted points. Local regions with a majority of votes (more than 50% vote share) are preserved separately and the rest of the candidate regions are reinterpreted as fuzzy features with the help of AFS theory [43]. The subset of regions with the most class-separating power is selected. Finally, the union of the fuzzily selected subset of local regions and the set of previously separated regions is returned. Fig. 4 shows a schematic diagram of the proposed system.

5.2.1. The Non-dominated Sorting Genetic Algorithm (NSGA-II) based region sampling methodology

Since its introduction, NSGA-II [18] has been widely used in various applications such as water network design [22], construction management [23], economics [24], population planning [25] etc. In our present work, we have investigated the efficacy of NSGA-II for the selection of the most discriminating set of local regions from a handwritten character or digit image, while incurring minimum possible recognition cost. In the following

Fig. 3. Feature extraction techniques implemented under present work.



Fig. 4. Schematic representation of the integrated system developed under present work.

section, an NSGA-II based region sampling methodology is introduced for this purpose.

(1) Initialization: The algorithm is initialized with an empty set ß; upon termination ß contains the non-dominated points extracted from the pareto-optimal front. The population size is fixed at the number of possible non-trivial subset cardinalities of the local region set. In our present work, the population size of the NSGA-II based region sampling methodology is 15. The crossover-probability (pc) is set to 0.9 and the mutation-probability (pm) is set to (1/(total number of local regions)), or 0.0625 in our present work, as suggested by Deb et al. [18].
(2) Identification of the pareto-optimal points in search-space: Through reproduction and natural selection, the algorithm heuristically tries to identify the set of local regions having the most class-separability power, while incurring minimal recognition cost. The region selection method is described in detail in Appendix A as Algorithm 1.
(3) Objective functions: The algorithm has two objectives: (1) minimization of average recognition cost for handwritten character/digit and (2) maximization of recognition accuracy of the test dataset, using an SVM-based classifier. The recognition cost is expressed by (i) average per character recognition time and (ii) number of local regions used to represent each handwritten character/digit.
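The comparison implied by item (3) can be sketched as a Pareto-dominance test over accuracy (maximised) and the two cost terms (minimised). This is only an illustrative sketch of the standard dominance relation used by non-dominated sorting, not the paper's Algorithm 1; the `Candidate` type and the `dominates` and `non_dominated` helpers are hypothetical names.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    accuracy: float  # recognition accuracy, to be maximised
    time: float      # average per-character recognition time, to be minimised
    regions: int     # number of selected local regions, to be minimised


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.time <= b.time
                and a.regions <= b.regions)
    better = (a.accuracy > b.accuracy
              or a.time < b.time
              or a.regions < b.regions)
    return no_worse and better


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (first front)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Mutation probability as stated in item (1): one over the number of local regions.
MUTATION_PROBABILITY = 1.0 / 16
```

A candidate that is both less accurate and more expensive than another is discarded, while candidates that trade accuracy against cost all remain on the front, which is why the algorithm returns a set of solutions rather than a single one.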

featureset F R extracted from V R (as discussed in Section 4.2) , f T


denotes the average per character recognition time achieved by
the proposed system and f R denotes the number of local regions
used to represent each handwritten character/digit.
(4) Termination criteria: The algorithm is terminated when the
algorithm has successfully reproduced 25 generations of the
initial population. Hence, the maximum number of iterations
(NI) of the algorithm is 25.

Pseudocode of the proposed NSGA-II based region sampling


algorithm and a detailed analysis of the algorithm, is described in
Appendix A as Algorithm 1.

5.2.2. A Non-dominated Sorting Harmony-search (NSHA) based


region sampling methodology
Harmony search is a derivative-free nature-inspired meta-
heuristics algorithm, proposed by Geem et al. [22], in the year
2001. It mimics a musician’s journey towards a better state of har-
mony. Since then, it has been used extensively by researchers for
various applications [45,46]. Several variations of multi-objective
harmony search algorithm have also been proposed by researchers
[28], but the performance of a multi-objective harmony search
algorithm for region sampling based handwritten character recog-
nition system is yet to be explored. To address this, a non-dominated
sorting harmony-search based region sampling method is proposed
in our present work. The proposed method uses a non-dominated
sorting and crowding-distance based ranking technique, introduced
by Deb et al. in [18], to extract the pareto-optimal solutions from the
solution space. The exploration (harmony memory consideration rate
Fig. 5. Binary operators used in a genetic algorithm. or HMCR) and exploitation (pitch adjustment rate or PAR) parameters
used by this algorithm are dynamically adjusted with respect to the
current generation of the population.
In our experimental setup, a comparative study between the proposed algorithm and a basic harmony search based region sampling method was performed on the dataset of randomly mixed handwritten Bangla Basic and Compound characters [39,12]. The present work provided a 14.2965% increase in recognition accuracy and a 6.25% decrease in the associated recognition cost, compared to the basic harmony search based region sampling method [47].

Fig. 6. Crowding distance of point P is the perimeter of the cuboid shown in dashed line.

To formalize, the optimization problem can be described as following:

$$\begin{cases} \text{Maximize } f_A & (O1) \\ \text{Minimize } f_T & (O2) \\ \text{Minimize } f_R & (O3) \end{cases}$$

with respect to the following constraints:

$$\begin{cases} f_A \ge 0 & (C1) \\ f_T > 0 & (C2) \\ f_R = |V_R| & (C3) \\ 1 \le f_R < 16 & (C4) \end{cases}$$

where $V_R$ denotes the region-vector representing the encoding of local regions returned by the algorithm, $f_A$ denotes the recognition accuracy achieved by the SVM based classifier using the features extracted from the global featureset G and the local

(1) Initialization: The algorithm is initialized with an empty set ß; upon termination ß contains the non-dominated points extracted from the pareto-optimal front. Harmony memory size (HMS) is fixed at the number of possible non-trivial subset cardinalities of the set of local regions. In our present work, HMS is set to 15. The parameters harmony memory consideration rate (HMCR) and pitch adjustment rate (PAR) are self-adaptive. The value of HMCR at the t-th population generation is denoted by $\mathrm{HMCR}_t$ and the value of PAR at the t-th population generation is denoted by $\mathrm{PAR}_t$, defined as follows:

$$\mathrm{HMCR}_t = \mathrm{HMCR}_{min} + (\mathrm{HMCR}_{max} - \mathrm{HMCR}_{min}) / t \times NI \qquad (5)$$

$$\mathrm{PAR}_t = \mathrm{PAR}_{max} - (\mathrm{PAR}_{max} - \mathrm{PAR}_{min}) / t \times NI \qquad (6)$$

$\mathrm{PAR}_{max}$ and $\mathrm{PAR}_{min}$ denote the maximum and minimum values of the pitch adjustment rate (PAR), whereas $\mathrm{HMCR}_{max}$ and $\mathrm{HMCR}_{min}$ denote the maximum and minimum values of the harmony memory consideration rate (HMCR). The values of $\mathrm{HMCR}_{max}$ and $\mathrm{HMCR}_{min}$ are set to 1.0 and 0.9 respectively, whereas the values of $\mathrm{PAR}_{max}$ and $\mathrm{PAR}_{min}$ are 1.0 and 0.0 respectively, as suggested by Pan et al. in [48].
(2) Identification of the pareto-optimal points in search-space: The algorithm heuristically searches for the most informative set of local regions that will incur minimal recognition cost. The region selection method employed in the present work is described in detail in Appendix A as Algorithm 2.
(3) Objective functions: The algorithm has two objectives: (1) minimization of the average recognition cost for a handwritten character/digit and (2) maximization of the recognition accuracy on the test dataset, using an SVM-based classifier. The recognition cost is expressed by (i) the average per character recognition time and (ii) the number of local regions used to represent each handwritten character/digit, the same as in the NSGA-II based region sampling method described before. Therefore, the optimization problem can be formalized in the same way as for the NSGA-II based region sampling algorithm described before.
(4) Termination criteria: The algorithm is terminated when it has successfully reproduced 25 generations of the initial population. Hence, the maximum number of iterations (NI) of the algorithm is 25. Pseudocode and a detailed analysis of the proposed NSHA based region sampling algorithm are given in Appendix A as Algorithm 2.

Both of the algorithms proposed in our experimental setup have been extensively tested on a dataset of randomly mixed handwritten Bangla Basic and Compound characters and on a dataset of Bangla numerals separately. The results of the experiments are shown in Section 6. It is to be noted that both of the algorithms discussed above are developed independent of any specific dataset and can be applied to any region sampling based pattern recognition task.

Fig. 7. Analogy of musical improvisation and optimization technique in Harmony Search Method.

Fig. 8. Directional vector model of a region in residual regions-space.

5.3. Integrating the pareto-optimal solutions using AFS theory based fuzzy logic

Applying NSHA and NSGA-II on a test dataset of handwritten Bangla characters/numerals (discussed in 5.2.1 and 5.2.2) produces two pareto-optimal sets of local region subsets. Let the pareto-optimal set returned by NSHA be denoted by ß1 and the pareto-optimal set returned by NSGA-II be denoted by ß2. The present work uses a novel region sampling method to objectively choose the most discriminative set of regions, using the information gathered from ß1 and ß2. This is one of the major contributions of the present work.
It has been proved in [21] that elitism helps accelerate the convergence rate of a multi-objective evolutionary algorithm. To choose a solution from the set of candidate solutions in both ß1 and ß2, a quality consensus is performed. The regions with majority consensus are excluded from the region-space for further consideration and maintained separately henceforth. Let this set of regions be denoted by E. The resultant region space $V_R$ forms the vocabulary for fuzzy concepts and semantics; it is called the residual region-space. Clearly, $V_R = E^c$.
As previously discussed in Section 3, one of the main motivations of using fuzzy logic is better interpretability of complex concepts to model the outstanding human ability of decision making with imprecise understanding or lack of background knowledge. In the final phase of the proposed method, the residual
region-space is reinterpreted as a fuzzy feature set. The steps are discussed in the following section.

5.3.1. Scalarization of the residual feature set
As discussed earlier in Section 4.2.2, longest run features are extracted along 4 axes from each local region. Considering a unit vector along the direction of each of the four axes, h (horizontal), v (vertical), d1 (principal diagonal 1) and d2 (principal diagonal 2), each region can be represented as a vector of the longest run of black pixels (shown in Fig. 3b) along each of the four directions. The first step of the final phase of the proposed method is scalarization of the local feature set extracted from each region. Let the scalarized feature set be denoted by S. The proposed directional vector model of longest run based features used in the present work is shown in Fig. 8.
In our present work, scalarization of a local region is done by representing it as a vector of directional components.
Example:
Suppose the feature set extracted from a local region R is given below:

Vertical    Horizontal    Principal Diagonal 1    Principal Diagonal 2
A           B             C                       D

Using the proposed vector model, the region R can be represented as

$$R = A\,v + B\,h + C\,d_1 + D\,d_2 \qquad (7)$$

Using vector decomposition, (7) can be rewritten as

$$R = A\,v + B\,h + C\,(h\sin 45^\circ - v\cos 45^\circ) + D\,(h\sin 45^\circ + v\cos 45^\circ)$$

which yields

$$R = (A - C\cos 45^\circ + D\cos 45^\circ)\,v + (B + C\sin 45^\circ + D\sin 45^\circ)\,h \qquad (8)$$

Hence, the local region R can be scalarized to:

$$\sqrt{(A - C\cos 45^\circ + D\cos 45^\circ)^2 + (B + C\sin 45^\circ + D\sin 45^\circ)^2}$$

In the feature set S, there will be a scalarized feature against each of the regions of the residual region-space.

5.3.2. Defining the fuzzy feature set
The fuzzy features are defined such that they reflect the proximity of a sample character image to its true class with respect to all the other possible classes. A feature $f \in S$ is represented as the fuzzy feature $u_f$. The value of $u_f$ for a handwritten character/digit image $x \in X$ is denoted by $u_f^x$. It is defined as follows:

$$u_f^x = \frac{\dfrac{\mathrm{spread}(class_x)}{\left| x_f - mean_f^{class_x} \right|}}{\sum_{i \in C} \dfrac{\mathrm{spread}(i)}{\left| x_f - mean_f^{i} \right|}} \times v_f \qquad (9)$$

where $x_f$ is the value of the feature f for element x, $class_x$ is the true class of x and C denotes the set of all possible classes, $mean_f^i$ is the mean of the values of the feature f in class i, $v_f$ is the vote share of the local region corresponding to the feature f in the previously performed quality consensus on the pareto-optimal sets extracted from both NSGA-II and NSHA, and spread(i) denotes the spread of the elements in class i (standard deviation is used for this measure in the present work).
It has been mentioned in the literature [49] that associating too many fuzzy sets and rules can make the fuzzy model hard to interpret, thus defeating the purpose of fuzzifying the original featureset in the first place. In our present work, a one to one mapping is maintained between a feature and its corresponding fuzzy feature, therefore the dimensionality of the fuzzy featureset does not exceed that of the original featureset. The cardinalities of the fuzzy feature set $U_F$ and the scalarized feature set S are the same, as the mapping between $U_F$ and S is one to one.
In a system with more than two elements, understanding the overall interaction between a set of elements is best indicated by investigating the interaction between a pair of elements and then their combination with other elements. In a similar way, in our present work, a subset of features is selected by investigating the interaction between a pair of features first and then combining them with other features such that the resultant subset of features with the best overall interaction between them is returned.

5.3.3. Computing the class-separation power of a set of fuzzy features
Using AFS theory [43], the membership function of a pair of fuzzy features $u_1, u_2 \in U_F$ can be calculated by computing $\theta_{u_1 u_2}(x)$ (as shown in Section 3). The membership function denotes the apparent class-separation power of the combination of the two features $u_1, u_2$. Another metric for class-separation is introduced in the present work, denoted by $\mathrm{norm}(u_1, u_2)$. It is defined in the following way:

$$\mathrm{norm}(u_1, u_2) = \frac{\sum_{x \in X} \theta_{u_1 u_2}(x)}{|X|} \qquad (10)$$

5.3.4. Selecting the set of regions with the most class-separating power
To consider all possible combinations of local regions, a 16 × 16 matrix M is formed (the number of local regions in our experimental setup is 16), where each row and column corresponds to a local region from the residual region-space. M[i,j] denotes the combined class-separation power of the i-th and j-th feature from the fuzzy feature set $U_F$, represented by the metric defined in (10). The diagonal elements are set to zero to nullify the effect of combining the i-th feature with itself. The task of selecting the most informative set of local regions now reduces to finding the maximum sum k × k non-trivial sub-matrix of M. The sub-matrix is formed such that, if the i-th row is selected, then the i-th column also has to be selected [50], which essentially means that the i-th region is being considered. The k × k maximum sum sub-matrix returns the best k features from the fuzzy featureset. As there is a one to one mapping between the original featureset and the fuzzy featureset, this identifies the k best features from the residual featureset. Each feature in the residual featureset has a one to one mapping with a local region in the residual region-space, therefore we get the k best local regions from the residual region-space at the end of this method. Pseudocode of the algorithm described above is given in Appendix A as Algorithm 3.
Finally, the union of the set of k selected regions and the elitist set of regions E, which was previously separated and maintained, is returned as the best candidate solution.

Table 2
Values of the parameters used in our experimental framework.

Region sampling methodology                          Parameter values used in the present work
NSHA [17] based region sampling methodology          HMCRmax = 1.0, HMCRmin = 0.9, PARmax = 1.0, PARmin = 0.0, POP = 15, NI = 25
NSGA-II [18] based region sampling methodology       pm = 0.0625, pc = 0.9, POP = 15, NI = 25
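The scalarization of Section 5.3.1 can be sketched as a short function. It takes the four longest-run values A (vertical), B (horizontal), C (principal diagonal 1) and D (principal diagonal 2), resolves the diagonal components onto the v and h axes as in Eq. (8), and returns the magnitude of the resulting vector. The function name and argument names are illustrative assumptions, not identifiers from the paper.

```python
import math


def scalarize_region(a: float, b: float, c: float, d: float) -> float:
    """Magnitude of the region vector after resolving d1 and d2 onto v and h.

    a, b, c, d are the longest-run features along the vertical, horizontal,
    principal diagonal 1 and principal diagonal 2 directions respectively.
    """
    k = math.cos(math.radians(45.0))  # cos 45 degrees equals sin 45 degrees
    v_component = a - c * k + d * k
    h_component = b + c * k + d * k
    return math.hypot(v_component, h_component)


# With no diagonal runs the result is the Euclidean norm of (A, B).
value = scalarize_region(3.0, 4.0, 0.0, 0.0)
```

One scalar per region keeps the one to one mapping between regions and features that the later selection step relies on.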
Fig. 9. Pareto-optimal surface for the dataset (DB4) of randomly mixed handwritten Bangla Basic and Compound characters.

Fig. 10. Pareto-optimal surface for the dataset (DB2) of handwritten Bangla digits.

6. Experimental results

The objectives of the present work are: (a) increasing the recognition accuracy and (b) decreasing the corresponding recognition cost for the recognition of isolated Bangla handwritten characters. The recognition cost is represented in our experimental setup as (i) the average per character recognition time and (ii) the number of local regions used to represent each handwritten digit/character. The integrated system shown in Fig. 4 is implemented for the experimental setup. As listed in Table 1 of Section 4.1, the performance of the system is evaluated on four separate datasets: (a) a dataset of handwritten Bangla numerals (DB1), (b) a dataset of handwritten Bangla numerals (DB2), (c) a dataset of handwritten Bangla Basic characters (DB3), and (d) a randomly mixed dataset of handwritten Bangla Basic and Compound characters (DB4). In our experimental setup, LIBSVM [51], an open-source tool based on SVM [52], is used for classification tasks. The values of the different parameters used in our experimental setup are given in Table 2. The experiments are performed on an Intel(R) Core(TM) i5 processor with 3.10 GHz clock-frequency and 8 GB RAM.
As shown in Fig. 4, in the first phase of our experimental setup, pareto-optimal solutions are extracted from the final population of the evolutionary algorithms (NSHA and NSGA-II). These solutions are plotted as an ordered triplet of [recognition accuracy, average per character recognition time, number of local regions used to describe each handwritten character/digit] over the solution-space. Figs. 9 and 10 show the resultant representative pareto-optimal surfaces for two of the larger datasets (DB2 and DB4 in Table 1) used in our experimental setup.
Finally, in the second phase, a quality consensus is performed among the regions extracted from the pareto-surfaces and the regions with majority consensus are separated. The minority regions form the residual region-space. The local regions can now be selected from the residual region-space, based on their class-separability power, using AFS theory based fuzzy logic (as discussed in Section 5.3), such that the recognition accuracy achieved by the proposed system is maximized.
A comparative analysis of the maximum recognition accuracies achieved by employing NSHA and NSGA-II separately on all of the datasets used in the present work, and by their fuzzy recombination, is shown in Fig. 11.
The final results returned by the system are shown in Tables 3 and 4. Dataset indexes are the same as in Table 1. The results in Tables 3 and 4 show that the proposed method increases the recognition accuracy and decreases the overall recognition cost for all of the datasets. For the dataset of randomly mixed Bangla Basic and Compound characters, the per character average recognition time taken by the classifier has slightly increased. This may be attributed to the greater number of support vectors that need to be generated to define the Maximal Marginal Hyperplane [52] for the dataset. This apparent contradiction highlights the difficulties involved in handwritten character recognition. Overall, however, the proposed system improves on the configuration without region sampling on all the datasets used in our present experimental setup.
Fig. 11. A comparative analysis of the maximum recognition accuracy achieved by the proposed system. Recognition accuracy (%) shown in the four panels:

Dataset                                                                      NSHA       NSGA-II    Present work
DB1 (handwritten Bangla digits)                                              97.3       97.3       97.8
DB2 (handwritten Bangla digits)                                              98.1       98.15      98.23
DB3 (handwritten Bangla Basic characters)                                    86.3095    86.6363    87.28
DB4 (randomly mixed handwritten Bangla Basic and Compound characters)        86.3095    86.6363    86.6478

Table 3
Final results on the dataset of handwritten Bangla numerals. Columns: (A) recognition accuracy (%) by SVM classifier; (B) average per character recognition time (ms); (C) number of local regions used.

Table 3.1. Dataset index DB1, isolated handwritten Bangla numerals.

Methodology                              (A)        (B)        (C)
The present work                         97.8       16.531     11
Without region sampling                  97.15      17.57      16
Improvement using present work (%)       0.67       5.913      31.25

Table 3.2. Dataset index DB2, isolated handwritten Bangla numerals.

Methodology                              (A)        (B)        (C)
The present work                         98.23      30.6872    14
Without region sampling                  97.5       30.72      16
Improvement using present work (%)       0.73       0.107      12.5

Table 4
Final results on the dataset of handwritten Bangla characters. Columns: (A) recognition accuracy (%) by SVM classifier; (B) average per character recognition time (ms); (C) number of local regions used.

Table 4.1. Dataset index DB3, isolated handwritten Bangla Basic characters.

Methodology                              (A)        (B)        (C)
The present work                         87.28      28.96      14
Without region sampling                  84.5750    30.02      16
Improvement using present work (%)       3.1983     3.531      12.5

Table 4.2. Dataset index DB4, randomly mixed isolated handwritten Bangla Basic and Compound characters.

Methodology                              (A)        (B)        (C)
The present work                         86.6478    90.6675    15
Without region sampling                  85.33      85.52      16
Improvement using present work (%)       1.3178     -6.019     6.25
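The cost figures in Tables 3 and 4 can be cross-checked with a few lines of arithmetic. The sketch below assumes, as the text states, that the reduction in recognition cost is the sum of the percentage reduction in per character recognition time and the percentage reduction in the number of local regions, each taken relative to the configuration without region sampling. The function names are illustrative only.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline` (negative if higher)."""
    return (baseline - value) / baseline * 100.0


def cost_reduction(time_base: float, time_new: float,
                   regions_base: int, regions_new: int) -> float:
    """Sum of the relative reductions in recognition time and in region count."""
    return (relative_reduction(time_base, time_new)
            + relative_reduction(regions_base, regions_new))


# Values from Table 3.1 (DB1) and Table 4.2 (DB4).
db1_cost = cost_reduction(17.57, 16.531, 16, 11)
db4_cost = cost_reduction(85.52, 90.6675, 16, 15)
```

For DB4 the time term is negative (recognition became slower), and it is the reduction in the number of regions that keeps the combined figure slightly positive.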

A brief comparative analysis of the proposed method with some of the contemporary methods present in the literature is presented in Table 5. Competing methods are tested on the same datasets [12,38] used in our experimental setup and their performance is compared based on two evaluation metrics: (a) the maximum recognition accuracy achieved by the method after repeating the experiment 25 times and (b) the reduction in recognition cost using the method. In our experimental setup, the recognition accuracy of a competing method is computed by using an SVM based classifier, whereas the reduction in recognition cost is computed as the sum of the average reduction in per character recognition time and the average reduction in the number of local regions needed to represent each character. The local regions used to represent each character are produced by a two-level CG based quad-tree partitioning [41]. The proposed method achieved (1) 97.8% recognition accuracy and 37.163% reduction in recognition cost, (2) 98.23% recognition accuracy and 12.61% reduction in recognition cost, (3) 87.28% recognition accuracy and 16.031% reduction in recognition cost and (4) 86.6478% recognition accuracy and 0.231% reduction in recognition cost for the datasets DB1, DB2, DB3 and DB4 respectively.
To assess the improvement offered by the method proposed in the present work, statistical significance tests are performed on the results shown in Table 5. In our experimental setup, paired t-tests at the 5% significance level have been performed. Results of the paired t-tests are shown in Table 6. Indexes are the same as in Table 5. The results in Table 6 show that the improvement provided by the present work is statistically significant for 9 out of 10 competing methods in the case of Bangla characters and 10 out of 11 competing methods in the case of Bangla digits.
In spite of the various precautions taken to disambiguate similarly shaped handwritten Bangla characters, some of the characters and digits in the test dataset were misclassified by the proposed methodology due to the variety of writing styles and/or the complex shapes of different Bangla characters and digits, which highlights the challenges inherent in handwritten character recognition. Fig. 12 shows some of the characters correctly classified and misclassified by the proposed method.

Table 5
Comparative analysis of the proposed method with contemporaries.

Table 5.1

Index  Work reference               Classification scheme   Recognition accuracy (%)   Reduction in recognition cost (%)

Database DB1, isolated handwritten Bangla numerals
A1     Hassan et al. [53]           SRC                     94.00                      0
A2     Hassan et al. [54]           KNN                     96.70                      0
A3     Das et al. [55]              SVM                     97.70                      0
A4     Das et al. [15]              SVM                     97.70                      3.9
A5     Roy et al. [56]              SVM                     93.30                      0
A6     Xu et al. [57]               Bayesian Network        87.50                      0
A7     Present work                 SVM                     97.80                      37.163

Database DB2, isolated handwritten Bangla numerals
B1     Wen et al. [58]              SVM                     75.05                      0
B2     Bhattacharya et al. [34]     SVM                     84.50                      -170.11
B3     Basu et al. [42]             MLP                     67.70                      0
B4     Das et al. [15]              SVM                     80.58                      -100
B5     Roy et al. [56]              SVM                     87.26                      -846.59
B6     Present work                 SVM                     98.23                      12.61

Table 5.2

Index  Work reference               Classification scheme   Recognition accuracy (%)   Reduction in recognition cost (%)

Database DB3, isolated handwritten Bangla Basic characters
C1     Basu et al. [59]             SVM                     80.58                      -200
C2     Roy et al. [60]              SVM                     86.40                      13.52
C3     Das et al. [61]              SVM                     82.27                      0
C4     Bhowmick et al. [62]         MLP                     84.23                      0
C5     Sarkhel et al. [47]          SVM                     86.533                     8.583
C6     Present work                 SVM                     87.28                      16.031

Database DB4, randomly mixed, isolated handwritten Bangla Basic and Compound characters
D1     Das et al. [61]              SVM                     75.05                      0
D2     Roy et al. [60]              SVM                     84.50                      -170.11
D3     Basu et al. [14]             MLP                     67.70                      0
D4     Basu et al. [59]             SVM                     80.58                      -100
D5     Das et al. [10]              SVM                     86.96                      -846.59
D6     Present work                 SVM                     86.6478                    0.231

Table 6
Results of paired t-test at 5% significance level.

Table 6.1

Index Database Type of data p-Value t-Value H r H0

A1 DB1 Bangla digits 0.0348 5.2197 Yes


A2 0.0472 4.4388 Yes
A3 0.4597 0.7454 No
A4 0.0009 3.5355 Yes
A5 0.0001 18.6852 Yes
A6 0.0012 3.4333 Yes
B1 DB2 Bangla digits 0.0002 66.65 Yes
B2 0.0001 24104 Yes
B3 0.0004 47.44 Yes
B4 0.0448 2.212 Yes
B5 0.0114 9.297 Yes

Table 6.2

Index Database Type of data p-Value t-Value H r H0

C1 DB3 Bangla Basic characters 0.0310 2.2222 Yes


C2 0.0077 2.7828 Yes
C3 0.0168 2.4773 Yes
C4 0.0001 6.1000 Yes
C5 0.0492 2.0182 Yes
D1 DB4 Randomly mixed Bangla 0.0112 9.361 Yes
D2 Basic and Compound 0.0015 25.52 Yes
D3 characters 0.0122 8.954 Yes
D4 0.0801 3.209 No
D5 0.0001 139.4 Yes
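For readers who want to see what a paired t-test computes, a minimal sketch of the test statistic follows. It assumes two equally long lists of accuracies obtained from repeated runs of two methods on the same data; the sample values below are hypothetical and are not the measurements behind Table 6. The p-value would then be read from Student's t distribution with n - 1 degrees of freedom, for instance with `scipy.stats.ttest_rel`, which is not used here to keep the sketch free of third-party dependencies.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of the paired differences a[i] - b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical paired scores, chosen so the result is easy to check by hand.
t_value = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A large positive statistic indicates that the first method scores consistently higher across the paired runs.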
Fig. 12. Samples of correctly classified and misclassified characters by the present work.

Fig. 13. Samples of broken and degraded Bangla handwritten characters and digits which were correctly classified by the present work.

It is to be noted that our present work can successfully handle broken and degraded characters. Fig. 13 shows some such characters present in the datasets that have been used in our experiment. All these samples were successfully recognized by our proposed method, despite the apparent discontinuities present in the shapes of the characters. This may be attributed to the similarities in the sub-images generated at the first and second levels of depth of the quad-tree partitioning of the handwritten character/digit image. Interested readers can find more details regarding this in [10].

7. Conclusion

A novel region sampling method based on multi-objective evolutionary algorithms is presented here. The proposed method tries to identify the set of most discriminative local regions while incurring minimal recognition cost. A non-dominated sorting harmony search based region sampling method and a non-dominated sorting genetic algorithm based region sampling methodology have been employed in our experimental setup. It provided 1.3178% and 0.73% increase in recognition accuracy, while decreasing the associated recognition cost by 0.231% and 12.607%, for a dataset of handwritten Bangla characters [35] and a dataset of handwritten Bangla digits [38] respectively. Both of the multi-objective evolutionary region sampling algorithms used in the experimental setup have been developed independent of any particular dataset. The results extracted from the pareto-optimal surfaces of both algorithms are fuzzily combined to return the best compromise solution. The multi-objective region sampling algorithms and the AFS theory based fuzzy region selection methodology proposed in the current work mark one of the contributions of this paper. Also, to the best of our knowledge, this is the first work which tries to find a trade-off between maximizing the achieved recognition accuracy and minimizing the associated recognition cost for a handwritten character recognition system. The experimental results are promising and point towards efficient optical character recognition systems that can be used in various real-life applications.

Conflict of interest

None declared.

Acknowledgment

The authors are thankful to CVPR unit, ISI, Kolkata for providing the dataset for the experiment. The authors are also thankful to “Centre for Microprocessor Application for Training Education and Research” and Department of Computer Science & Engineering, Jadavpur University, Kolkata for kindly providing the resources and infrastructural facilities that helped to complete this work.

Appendix A

To describe the algorithms in more detail, pseudocode of the algorithms developed in our present work is given here. In this section, the proposed NSGA-II [18] based region sampling methodology, the NSHA [17] based region sampling methodology and the AFS theory based local region selection methodology are described as Algorithm 1, Algorithm 2 and Algorithm 3 respectively.
Algorithm 1. A non-dominated sorting genetic algorithm based region sampling methodology.

Input: Initial feature set extracted from the Bangla handwritten character/digit database.
Output: A set of non-dominated points extracted from the pareto-optimal front.
Initialize the parameters: population size (POP), crossover-probability (pc), mutation-probability (pm), maximum number of iterations (NI)

Begin
ß = Φ    /* ß denotes the pareto-optimal set */
for (i = 1 to NI) {
    C = ß    /* C denotes population of the current generation, represented by a set of ordered pairs (recognition accuracy, recognition cost) */
    for (j = 1 to POP-1) {    /* POP denotes the maximum possible population size */
        if (population is empty) {
            Initialize the population with a set S containing 2 sets of j randomly selected local regions.
            C = C ∪ S
        }
        Generate a uniform random number r1 ∈ (0, 1).
        if (r1 ≤ pc) {    // pc denotes the crossover probability
            Randomly select two parents from the current generation.
            /* These two parents constitute the mating pool. */
            Randomly select a crossover point and perform single-point crossover operation. Let the two offspring generated by this operation be denoted by O1 and O2.
            Update population: C = C ∪ O1 ∪ O2
        }
        Generate a uniform random number r2 ∈ (0, 1).
        if (r2 ≤ pm) {    // pm denotes the mutation probability
            Randomly select a parent from the current generation. Make a copy of this parent for the mutation operation. Perform mutation on this parent (shown in Fig. 5).
            Let the mutated parent be denoted by MP.
            C = C ∪ MP
        }
    }
    for the population in C {
        Perform fast non-dominated sort and assign non-domination rank, as suggested by Deb et al. [16]. For a parent p ∈ C, the non-domination rank of p is the number of individuals in C that dominate p.
        To preserve diversity among the pareto-optimal points, a metric called the crowding-distance (shown in Fig. 6) is also integrated when trying to find the non-dominated fronts.
        Using these two metrics, i.e. non-domination rank and crowding-distance, non-dominated fronts are computed based on a partial order relation denoted by ≺n. For any p, q ∈ C, p ≺n q implies [(p_rank < q_rank) or ((p_rank = q_rank) and (p_distance > q_distance))]. The points of the pareto-optimal front are represented by the set ß.
    }
}
return ß.    /* ß contains the pareto-optimal set of points */
End

Analysis of the algorithm

Using the metaheuristic search ability of the algorithm, the pareto-optimal front offers the user several alternative solutions to choose from. The time complexity of each iteration of the algorithm is as follows:

1) Time complexity of non-dominated sorting: $O(M(2N)^2)$;
2) Time required for crowding-distance assignment: $O(M(2N)\log(2N))$;
3) Sorting on ≺n requires time of the order of $O(2N\log(2N))$.

In aggregate, the total time required for each iteration of the algorithm is of the order of $O(MN^2)$, where N is the total number of local regions in the local feature set and M is the number of objectives to be optimized. This is a significant reduction from experimenting with all the possible combinations of local regions, which would have taken time of the order of $O(2^N)$.
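The two variation operators used in Algorithm 1 can be sketched as follows. The sketch assumes that a candidate is encoded as a list of 16 bits, one per local region, with 1 meaning the region is selected, and that mutation flips a single randomly chosen bit; both the encoding and the exact form of the mutation are assumptions made for illustration, since the paper only refers to Fig. 5 for the operators. Function names are hypothetical.

```python
import random
from typing import List, Tuple

Mask = List[int]  # one bit per local region; 1 means the region is selected


def single_point_crossover(p1: Mask, p2: Mask,
                           rng: random.Random) -> Tuple[Mask, Mask]:
    """Swap the tails of two parents after a randomly chosen cut point."""
    point = rng.randrange(1, len(p1))  # cut strictly inside the mask
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def flip_one_bit(parent: Mask, rng: random.Random) -> Mask:
    """Return a copy of the parent with one randomly chosen bit inverted."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


rng = random.Random(0)
o1, o2 = single_point_crossover([1] * 16, [0] * 16, rng)
mutated = flip_one_bit([0] * 16, rng)
```

Working on a copy in `flip_one_bit` mirrors the pseudocode, which mutates a copy so that the original parent stays in the population.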

Algorithm 2. A self-adaptive non-dominated sorting harmony-


search region sampling methodology.

Input: Initial feature set extracted from the Bangla handwritten character/digit database.
Output: A set of non-dominated points extracted from the pareto-optimal front.
Initialize the parameters: harmony memory size (HMS), minimum value of HMCR (HMCRmin), maximum value of HMCR (HMCRmax),
minimum value of PAR (PARmin), maximum value of PAR (PARmax), maximum number of iterations (NI).
Begin
1. ß ¼ Φ /* ß denotes the pareto-optimal set */
2. for (i ¼ 1 to NI ){
3.C ¼ ß /* C denotes population of the current generation, represented by a set of ordered pairs (recognition
accuracy, recognition cost) */
4. for ( j ¼ 1 to HMS-1){ /* HMS denotes the maximum possible population size */
5. if (population is empty){
6. Initialize the population with a set S containing 2 sets of j randomly selected local regions.
R. Sarkhel et al. / Pattern Recognition 58 (2016) 172–189 187

7. C ¼ C[S
8. }
9. Generate a uniform random number r1 ϵ (0, 1).
10. if (r1 r HMCRi ){ /* The value of HMCRi is set by (5) */
11. Randomly select a parent from the current generation. Make a copy of this parent for pitch adjustment.
12. Generate a uniform random number r2 ϵ (0, 1).
13. if (r2 r PARi ){ /* The value of PARi is set by (6) */
14. Randomly select a region from the previously chosen parent i.e. a note from a harmony.
15. Replace the selected region with a region selected randomly from the set of local regions that are not included in the
chosen parent.
16. Let the pitch adjusted parent is denoted by P.
17. C ¼ C⋃P
18. }
19. }
20. else {
21. Initialize a parent T with j randomly selected regions such that, T 2= C.
22. C ¼ C⋃T
23. }
24. }
25. for the population in C {
26. Perform fast non-dominated sort and assign non-domination rank, as suggested by Deb et al. [16]. For a parent p A C, non-
domination rank of p is the number of individuals in C that dominates p.
27. To preserve diversity among the pareto-optimal points, a metric called the crowding-distance (shown in Fig. 6) is also inte-
grated, when trying to find the non-dominated fronts.
28. Using these two metrics i.e. non-domination ranks and crowding-distance, non-dominated fronts are computed based on a partial
order relation denoted by ! n . For any p, q A C, p ! n q implies [(prank o qrank) or ((prank ¼ qrank) and (pdistance 4qdistance))]. The points of the
pareto-optimal front are represented by the set ß.
29. }
30. }
31. return ß. /* ß contains the pareto-optimal set of points */
End
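
Steps 26 to 28 can be illustrated with a minimal Python sketch. It assumes each individual is an (accuracy, cost) pair in which accuracy is maximized and cost is minimized. The function names (`dominates`, `domination_rank`, `crowding_distance`, `crowded_less`) and the sample population are hypothetical and not taken from the paper. The rank follows the definition in step 26, and the crowding distance follows the usual NSGA-II construction, computed here over the whole population for brevity (NSGA-II itself computes it front by front).

```python
from math import inf

def dominates(a, b):
    """True if a dominates b: no worse in both objectives, strictly better in one.
    Points are (accuracy, cost); accuracy is maximized, cost is minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def domination_rank(pop):
    """Rank of p = number of individuals in pop that dominate p (step 26)."""
    return [sum(dominates(q, p) for q in pop) for p in pop]

def crowding_distance(pop):
    """Crowding distance per individual; boundary points get infinity."""
    n = len(pop)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(2):  # the two objectives: accuracy, cost
        order = sorted(range(n), key=lambda i: pop[i][m])
        lo, hi = pop[order[0]][m], pop[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = inf
        if hi == lo:
            continue  # all values equal: no interior contribution
        for k in range(1, n - 1):
            i = order[k]
            dist[i] += (pop[order[k + 1]][m] - pop[order[k - 1]][m]) / (hi - lo)
    return dist

def crowded_less(i, j, rank, dist):
    """p ≺_n q: lower rank wins; ties are broken by larger crowding distance."""
    return rank[i] < rank[j] or (rank[i] == rank[j] and dist[i] > dist[j])

# Hypothetical population of (recognition accuracy, recognition cost) pairs.
pop = [(0.90, 30), (0.85, 20), (0.80, 25), (0.95, 40)]
rank = domination_rank(pop)
dist = crowding_distance(pop)
front = [p for p, r in zip(pop, rank) if r == 0]  # rank 0 = non-dominated
```

In this sample, (0.80, 25) is dominated by (0.85, 20), so it is the only point left out of `front`.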

Analysis of the algorithm


The proposed method searches for the most informative set of local regions in the search space of (recognition accuracy, recognition cost) and returns the pareto-optimal set. The time complexity of each iteration of the algorithm is as follows:
 
1) Time complexity for non-dominated sorting is: $O(M(2N)^2)$;
2) Time required for crowding-distance assignment is: $O(M(2N)\log(2N))$;
3) Sorting on $\prec_n$ requires time of the order of: $O(2N\log(2N))$.
 
In aggregate, the total time required for each iteration of the algorithm is of the order of $O(MN^2)$, where N is the total number of local regions in the local feature set and M is the number of objectives to be optimized. This is a significant reduction from experimenting with all the possible combinations of local regions, which would have taken time of the order of $O(2^N - 1)$.
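
To give a feel for the gap between the two bounds, the short Python sketch below evaluates M·N² and 2^N − 1 for a few values of N with M = 2 objectives. The values of N and the function names are illustrative assumptions, not figures from the paper, and constant factors are ignored.

```python
def per_iteration_cost(n_regions, n_objectives=2):
    """Order-of-growth estimate M * N^2 for one iteration."""
    return n_objectives * n_regions ** 2

def exhaustive_cost(n_regions):
    """Number of non-empty subsets of N local regions: 2^N - 1."""
    return 2 ** n_regions - 1

# Illustrative values of N only.
for n in (8, 16, 32):
    print(n, per_iteration_cost(n), exhaustive_cost(n))
```

The per-iteration figure still has to be multiplied by the number of iterations NI, but it grows polynomially in N while exhaustive enumeration grows exponentially.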

Algorithm 3. An AFS theory based feature selection technique

Input: The set of training samples X, original featureset F.
Output: The most discriminating subset of features from the residual featureset.
1 for x ∈ X // X is the set of training samples
2   Calculate u_xf according to x_f; // f ∈ F (the original set of features) and
                                     // u_xf denotes the fuzzy features for the
                                     // element x, as shown in Eq. (9)
3 for u_i, u_j ∈ U_F // U_F denotes the fuzzy featureset
4   Calculate θ_{u1 u2}(x); // As shown in Eq. (7)
5 for u_i, u_j ∈ U_F
6   Calculate norm(u1, u2); // As shown in Eq. (10)
7 Create matrix M;
8 Find maximum sum k × k sub-matrix from M; // The k × k sub-matrix is chosen from M such
                                            // that if the i-th row is selected, the i-th
                                            // column must also be selected.
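
Step 8 asks for a k × k sub-matrix of M with maximum sum in which the same indices are used for rows and columns. A brute-force sketch in Python follows, assuming M is a small square matrix given as a list of lists. The function name `best_principal_submatrix` and the sample matrix are hypothetical, and the paper does not state which search strategy it uses. Exhaustive enumeration needs C(n, k) evaluations, so it is only practical for small n.

```python
from itertools import combinations

def best_principal_submatrix(M, k):
    """Return (indices, total) of the k x k sub-matrix of M with maximum sum,
    where the same index set selects both the rows and the columns."""
    n = len(M)
    best_idx, best_sum = None, float("-inf")
    for idx in combinations(range(n), k):
        total = sum(M[i][j] for i in idx for j in idx)
        if total > best_sum:
            best_idx, best_sum = idx, total
    return best_idx, best_sum

# Hypothetical symmetric matrix of pairwise scores between four fuzzy features.
M = [
    [0, 5, 1, 2],
    [5, 0, 4, 1],
    [1, 4, 0, 6],
    [2, 1, 6, 0],
]
idx, total = best_principal_submatrix(M, 2)
```

For the sample matrix with k = 2, the search selects the pair of features whose mutual entries are largest.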
References

[1] A. Ul-Hasan, S. Bin Ahmed, F. Rashid, F. Shafait, T.M. Breuel, Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks, in: Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, 2013, pp. 1061–1065.
[2] A. Ray, S. Rajeswar, S. Chaudhury, Text recognition using deep BLSTM networks, in: Proceedings of the 2015 Eighth International Conference on Advances in Pattern Recognition (ICAPR), 2015, pp. 1–6.
[3] A. Park, Offline text recognition without intraword character segmentation based on two-dimensional low frequency discrete Fourier transforms, 24-May-1994.
[4] A. Vinciarelli, S. Bengio, H. Bunke, Offline recognition of unconstrained handwritten texts using HMMs and statistical language models, IEEE Trans. Pattern Anal. Mach. Intell. 26 (6) (2004) 709–720.
[5] H. Fujisawa, Forty years of research in character and document recognition—an industrial perspective, Pattern Recognit. 41 (8) (2008) 2435–2446.
[6] R.M. Bozinovic, S.N. Srihari, Off-line cursive script word recognition, IEEE Trans. Pattern Anal. Mach. Intell. 11 (1) (1989) 68–83.
[7] P.-K. Wong, C. Chan, Off-line handwritten Chinese character recognition as a compound Bayes decision problem, IEEE Trans. Pattern Anal. Mach. Intell. 20 (9) (1998) 1016–1023.
[8] J. Cao, M. Ahmadi, M. Shridhar, Recognition of handwritten numerals with multiple feature and multistage classifier, Pattern Recognit. 28 (2) (1995) 153–160.
[9] S.V. Rajashekararadhya, P.V. Ranjan, Efficient zone based feature extraction algorithm for handwritten numeral recognition of four popular south indian scripts, J. Theor. Appl. Inf. Technol. 4 (12) (2008) 1171–1181.
[10] N. Das, R. Sarkar, S. Basu, P.K. Saha, M. Kundu, M. Nasipuri, Handwritten Bangla character recognition using a soft computing paradigm embedded in two pass approach, Pattern Recognit. 48 (6) (2014) 2054–2071.
[11] U. Pal, B.B. Chaudhuri, Indian script character recognition: a survey, Pattern Recognit. 37 (9) (2004) 1887–1899.
[12] R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, CMATERdb1: a database of unconstrained handwritten Bangla and Bangla–English mixed script document image, Int. J. Doc. Anal. Recognit. 15 (1) (2011) 71–83.
[13] D. Impedovo, G. Pirlo, Zoning methods for handwritten character recognition: a survey, Pattern Recognit. 47 (3) (2014) 969–981.
[14] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, Handwritten ‘Bangla’ alphabet recognition using an MLP based classifier, in: Proceedings of the 2nd National Conference on Computer Processing of Bangla—2005, 2005, pp. 285–291.
[15] N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application, Appl. Soft Comput. 12 (5) (2012) 1592–1606.
[16] K. Deb, Multi-objective optimization using evolutionary algorithms: an introduction, 2011, pp. 1–24.
[17] S. Sivasubramani, K.S.S. Swarup, Multi-objective harmony search algorithm for optimal power flow problem, Int. J. Electr. Power Energy Syst. 33 (3) (2011) 745–752.
[18] K. Deb, A. Member, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197.
[19] L. Xiaodong, The fuzzy theory based on AFS algebras and AFS structure, J. Math. Anal. Appl. 217 (2) (1998) 459–478.
[20] V. Chankong, Y.Y. Haimes, Multiobjective Decision Making: Theory and Methodology, North Holland, New York, 1983.
[21] E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results, Evol. Comput. 8 (2) (2000) 173–195.
[22] M. Atiquzzaman, S.-Y. Liong, X. Yu, Alternative decision making in water distribution network with NSGA-II, J. Water Resour. Plan. Manag. 132 (2) (2006) 122–126.
[23] E. Fallah-Mehdipour, O. Bozorg Haddad, M.M. Rezapour Tabari, M.A. Mariño, Extraction of decision alternatives in construction management projects: application and adaptation of NSGA-II and MOPSO, Expert Syst. Appl. 39 (3) (2012) 2794–2803.
[24] S. Dhanalakshmi, S. Kannan, K. Mahadevan, S. Baskar, Application of modified NSGA-II algorithm to combined economic and emission dispatch problem, Int. J. Electr. Power Energy Syst. 33 (4) (2011) 992–1002.
[25] S. Kannan, S. Baskar, J.D. McCalley, P. Murugan, Application of NSGA-II algorithm to generation expansion planning, IEEE Trans. Power Syst. 24 (1) (2009) 454–461.
[26] K.S. Lee, Z.W. Geem, A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice, Comput. Methods Appl. Mech. Eng. 194 (36–38) (2005) 3902–3933.
[27] X.S. Yang, Harmony search as a metaheuristic algorithm, Stud. Comput. Intell. 191 (2009) 1–14.
[28] J.R. Germ, Multiobjective Harmony Search Algorithm Proposals, vol. 281, 2011, pp. 51–67.
[29] S. Sivasubramani, K.S. Swarup, Environmental/economic dispatch using multi-objective harmony search algorithm, Electr. Power Syst. Res. 81 (9) (2011) 1778–1785.
[30] I. Tsamardinos, C.F. Aliferis, Towards principled feature selection: relevancy, filters, and wrappers, in: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
[31] M. Hanmandlu, A.V. Nath, A.C. Mishra, V.K. Madasu, Fuzzy Model Based Recognition of Handwritten Hindi Numerals using Bacterial Foraging, in: Proceedings of the 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007), 2007, pp. 309–314.
[32] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. 3 (2003) 1157–1182.
[33] M. Ramze Rezaee, B. Goedhart, B.P.F. Lelieveldt, J.H.C. Reiber, Fuzzy feature selection, Pattern Recognit. 32 (12) (1999) 2011–2019.
[34] U. Bhattacharya, B.B. Chaudhuri, Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals, IEEE Trans. Pattern Anal. Mach. Intell. 31 (3) (2009) 444–457.
[35] cmaterdb – CMATERdb: The pattern recognition database repository – Google Project Hosting. [Online]. Available: https://code.google.com/p/cmaterdb/. (Accessed: 31.01.15).
[36] R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, CMATERdb1: a database of unconstrained handwritten Bangla and Bangla–English mixed script document image, 2012, pp. 71–83.
[37] S.K. Parui, K. Guin, U. Bhattacharya, B.B. Chaudhuri, Online Handwritten Bangla Character Recognition Using HMM 3. Analysis of strokes in handwritten, IEEE, 2008, pp. 1–4.
[38] ISI Image Databases of Handwritten Isolated Bangla Numerals. [Online]. Available: http://www.isical.ac.in/~ujjwal/download/BanglaNumeral.html. (Accessed: 20.04.15).
[39] CMATERdb3.1.3.3 – cmaterdb – Handwritten Bangla Compound character image database – CMATERdb: The pattern recognition database repository – Google Project Hosting.
[40] N. Das, S. Basu, R. Sarkar, M. Kundu, M. Nasipuri, D. Kumar Basu, An improved feature descriptor for recognition of handwritten Bangla alphabet, in: Proceedings of the ICSIP 2009, 2009, pp. 451–454.
[41] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D. Kumar Basu, A novel framework for automatic sorting of postal documents with multi-script address blocks, Pattern Recognit. 43 (10) (2010) 3507–3521.
[42] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, Recognition of numeric postal codes from multi-script postal address blocks, in: Pattern Recognition and Machine Intelligence, Springer, 2009, pp. 381–386.
[43] X. Liu, W. Pedrycz, Axiomatic Fuzzy Set Theory and Its Applications, Springer, Berlin, Heidelberg, 2009.
[44] G.V. Geem, Loganathan, A new heuristic optimization algorithm: harmony search, Simulation 76 (2) (2001) 60–68.
[45] S. Salcedo-Sanz, A. Pastor-Sánchez, J. Del Ser, L. Prieto, Z.W. Geem, A coral reefs optimization algorithm with harmony search operators for accurate wind speed prediction, Renew. Energy 75 (2015) 93–101.
[46] Z.W. Geem, J.-H. Kim, Wastewater treatment optimization for fish migration using harmony search, Math. Probl. Eng. 2014 (2014) 1–5.
[47] R. Sarkhel, A. Saha, N. Das, An enhanced harmony search method for bangla handwritten character recognition using region sampling, in: Proceedings of the 2nd IEEE International Conference on Recent Trends in Information Systems (ReTIS-15), 2016, p. (in press).
[48] Q.-K. Pan, P.N. Suganthan, J.J. Liang, M.F. Tasgetiren, A local-best harmony search algorithm with dynamic subpopulations, Eng. Optim. 42 (2) (2010) 101–117.
[49] Y. Li, Z.-F. Wu, Fuzzy feature selection based on min–max learning rule and extension matrix, Pattern Recognit. 41 (1) (2008) 217–226.
[50] A. Roy, N. Das, R. Sarkar, S. Basu, M. Kundu, An Axiomatic Fuzzy Set Theory Based Feature Selection Methodology for Handwritten Numeral Recognition.
[51] C. Chang, C. Lin, LIBSVM: A Library for Support Vector Machines, vol. 2 (3), 2011.
[52] V.N. Vapnik, An overview of statistical learning theory, IEEE Trans. Neural Netw. 10 (5) (1999) 988–999.
[53] H.A. Khan, A. Al Helal, K.I. Ahmed, Handwritten Bangla digit recognition using Sparse Representation Classifier, in: Proceedings of the 2014 International Conference on Informatics, Electronics & Vision (ICIEV), 2014, pp. 1–6.
[54] T. Hassan, H.A. Khan, Handwritten Bangla numeral recognition using Local Binary Pattern, in: Proceedings of the 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), 2015, pp. 1–4.
[55] N. Das, J.M. Reddy, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, A statistical–topological feature combination for recognition of handwritten numerals, Appl. Soft Comput. 12 (8) (2012) 2486–2495.
[56] A. Roy, N. Mazumder, N. Das, R. Sarkar, S. Basu, M. Nasipuri, A new quad tree based feature set for recognition of handwritten bangla numerals, in: AICERA 2012 – Annual International Conference on Emerging Research Areas: Innovative Practices and Future Trends, 2012, pp. 1–6.
[57] J. Xu, J. Xu, Y. Lu, Handwritten Bangla digit recognition using hierarchical Bayesian network, in: Proceedings of the 2008 3rd International Conference on Intelligent System and Knowledge Engineering, 2008, vol. 1, pp. 1096–1099.
[58] Y. Wen, Y. Lu, P. Shi, Handwritten Bangla numeral recognition system and its application to postal automation, Pattern Recognit. 40 (1) (2007) 99–107.
[59] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, A hierarchical approach to recognition of handwritten Bangla characters, Pattern Recognit. 42 (7) (2009) 1467–1484.
[60] A. Roy, N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, Region selection in handwritten character recognition using artificial bee colony optimization, in: Proceedings of the Third International Conference on Emerging Applications of Information Technology (EAIT–2012), 2012, pp. 183–186.
[61] N. Das, B. Das, R. Sarkar, S. Basu, M. Kundu, Handwritten Bangla basic and compound character recognition using MLP and SVM classifier, J. Comput. 2 (2) (2010) 109–115.
[62] S.K.P.T.K. Bhowmik, U. Bhattacharya, Recognition of Bangla Handwritten Characters Using an MLP Classifier Based on Stroke Features, Springer, Berlin, Heidelberg, 2004.

Ritesh Sarkhel received his B.C.S.E. degree from Jadavpur University in 2012. He worked as an R&D Engineer at Samsung Research Institute, Noida from 2012 to 2014. He is currently pursuing an M.C.S.E. degree at Jadavpur University. His areas of current research interest are OCR of handwritten text, optimization techniques and computer vision.

Nibaran Das received his B.Tech degree in Computer Science and Technology from Kalyani Govt. Engineering College under Kalyani University, in 2003. He received his M.C.S.E. degree from Jadavpur University, in 2005. He received his Ph.D. (Engg.) degree thereafter from Jadavpur University, in 2012. He joined J.U. as a lecturer in 2006. His areas of current research interest are OCR of handwritten text, Bengali fonts, optimization techniques and image processing. He has been an editor of the Bengali monthly magazine “Computer Jagat” since 2005.

Amit K. Saha received his B.Tech degree in Information Technology from WBUT, in 2011. He received his M.T.C.T. degree from Jadavpur University, in 2015. His areas of
current research interest are OCR of handwritten text, Nature Inspired Computing and Multi-Objective Optimization.

Mita Nasipuri received her B.E.Tel.E., M.E.Tel.E., and Ph.D. (Engg.) degrees from Jadavpur University, in 1979, 1981 and 1990, respectively. Prof. Nasipuri has been a faculty member of J.U. since 1987. Her current research interests include image processing, pattern recognition, and multimedia systems. She is a senior member of the IEEE, U.S.A., and a Fellow of I.E. (India) and W.B.A.S.T., Kolkata, India.
