Combination of Hough Transform and Neural Network On Recognizing Mathematical Symbols

Aouadi Nabil
Latice Laboratory
Tunis, Tunisia
nabil.aouadi@utic.rnu.tn

2021 8th International Conference on ICT & Accessibility (ICTA) | 978-1-6654-6641-7/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICTA54582.2021.9809779

Abstract: Offline recognition of printed mathematical symbols is a particularly difficult task. Recognizing mathematical symbols is one stage within the overall system for recognition of mathematical documents. We describe experiments using a MultiLayer Perceptron (MLP), the Hough Transform (HT), k Nearest Neighbors (kNN) and the structural Freeman chain code to enhance symbol recognition of printed mathematics. First, we investigate the use of an MLP based method. Second, we evaluate the performance of a proposed neural structural method, named HT-MLPs, on symbols that the initial MLP usually confuses. The inclusion of HT in the MLP reduces the symbol confusion rate by 21% and improves the recognition rate from 72% to 93%. To assess the proposed method, we compare it to kNN and then to Freeman code based methods, both commonly used in pattern recognition. Analyzing the results, we show that HT-MLPs always gives lower mean confusion and rejection rates and a higher success rate than the other solutions.
Index Terms: Perceptron Multilayer Network, Hough Transform, Mathematical Symbol, Specialized Neural Network

I. INTRODUCTION
One of the big challenges of Human Computer Interaction research is the full inclusion of people with special needs in the digital world. We think especially of blind people. Many studies, such as those on printed and handwritten mathematical symbol recognition, aim at addressing their needs [1]. Significant research effort has been reported for handwritten math formula or symbol recognition. It is an attractive field of pattern recognition leading to practical applications [2], [3]. According to [4], the symbol recognition process has some limitations that prevent it from achieving high accuracy. Three types of limitations can be considered:
• Symbols with different meanings and similar shapes such as: 9q gq bh GC KR DO UV QO B8 Z2 S5 I1 Uv.
• Upper and lower case symbols with the same shapes such as: nN fF wW zZ xX yY cC uU sS kK mM pP vV.
• Interpreting the mathematical expression.

This paper addresses the problem of recognizing mathematical notation, which poses challenges from its two-dimensional pattern, the rich set of symbols used, the range of similar looking symbols whose bold, calligraphic, and italic varieties must be recognized distinctly, the imbalance and paucity of available training data, and the impossibility of final verification through spell check. Many researchers have worked on Handwritten Mathematical Expression (HME) recognition for many years, starting with Anderson's effort in the 1960s [5], [6]. In this work, the general objective is to contribute to an automatic system that recognizes mathematical formulas automatically extracted from images of scientific documents [7]. Notice that mathematical formula recognition is mainly composed of three major steps: formula segmentation, symbol extraction and recognition, and finally formula structural analysis. The first step segments the input image into a set of symbols. The second and third steps analyze the spatial arrangement of this set of symbols to recover the information content of the mathematical formula. Although the segmentation phase is known to be generally easier than the later ones, reducing recognition errors is still important, both because wrong symbol recognition results often harm the structural analysis phase and because correct results reduce the search space of that phase. Although the proposed system achieved good results for symbol recognition before this investigation, its failure to distinguish certain common symbols would be bothersome to any serious user. So we aim 1) to expand the symbol set used for the training and test steps in order to recognize other symbols, Greek letters and new operators and connectors, and 2) to improve the accuracy of single symbol recognition. To this end, we implemented several distinct recognition algorithms and performed experiments to determine which algorithms are best suited for recognition of mathematical symbols.
The paper is organized as follows. First, we examine the existing literature on mathematical notation recognition. Then, we study the ability of an MLP, as a neural classifier, to distinguish mathematical symbols. Next, we show how the classifier may be integrated with the Hough Transform (HT) to improve its classification ability. Afterwards, we analyze and compare our experimental results with those obtained by the kNN method and by the Freeman code. We close the paper with some concluding remarks and future work.

II. PREVIOUS RELATED RESEARCH WORK
It is reported in [8] that the failure of conventional OCR systems to treat mathematical symbols has several consequences:

- Readers of mathematical documents cannot automatically search for earlier occurrences of a variable or operator when tracing the notation and definitions used by a journal article.
- The appearance of mathematics on the same line as text
often confounds OCR treatment of surrounding words.
- Equations can only be represented as graphics by semantic transformation systems, such as those converting digital documents into braille for accessibility by blind readers.

Note that several research centers are interested in the recognition of mathematical formulas and symbols (printed or handwritten), as shown in Fig. 1.
According to [9], algorithms based on the nearest neighbor scheme for offline character recognition can reach high accuracy but are slow. The authors proposed a classifier based on nearest neighbor classification in a space of 27 features. For each symbol, the bounding box is divided into a 5 by 5 rectangular grid and the percentage of black pixels in each cell is extracted as a feature, which gives 25 features. The other two features are the aspect ratio and the absolute height in pixels of the bounding box. The Euclidean distance is used as the dissimilarity between two symbols. In the method attributed to Alvaro et al. [10], each bounding box is normalized to a fixed size and the Euclidean distance between two images is calculated from the pixel-wise differences. This method obtained a 94.24% symbol recognition rate on the InftyCDB-1 database.
In the last decades, many papers have highlighted the importance of this research for pattern recognition and document analysis. Hidden Markov Models (HMMs) have already been used by [11] and [12] for symbol recognition. They had 82 symbols, each written 50 times. A combination of HMMs and Artificial Neural Networks (ANNs) was then proposed by [13]. In another method, proposed by [14], an improved version of the Kuhn-Munkres matching algorithm is used with a 94 symbol set. Later on, the authors of [15] proposed a recognition method using a Support Vector Machine (SVM) with a set of 43 mathematical symbols. Methods based on combining classifiers for mathematical symbol recognition were tested by [16]. They used feature template matching together with HMMs on a 198 symbol set. Notice that the majority of symbol recognition research in mathematical notation focuses on handwritten input produced online via a data tablet. In pattern recognition in general, there are two major approaches: segmentation-based methods (structural approach) and segmentation-free methods (statistical approach). In segmentation-based methods, the symbol is split into segments to be recognized, using a segmentation algorithm. In contrast, segmentation-free methods use features of the whole symbol image. Current trends no longer rely on a single method, and the use of mixed methods is becoming general. In [4], the process of symbol recognition needs two stages: symbol segmentation and accurate classification for over 300 classes. Many multidimensional mathematical symbols need both horizontal and vertical projection to be segmented. In the next section, we describe the use of different methods for symbol recognition (neuron based method: MLP, neural structural method: HT-MLPs, statistical method: kNN, and structural method based on the Freeman code) and present the results of our experiments along with their analysis.

III. PROPOSED METHODS FOR SYMBOL RECOGNITION
A basic method in symbol recognition is to record several samples for each symbol of interest, then to compare input symbols against those stored models. There are currently 24 symbol classes in our database, with 140 samples for each. Two thirds of the database are used for training and the rest for testing. These symbols are extracted from different scientific journals, articles and books, with various sizes and fonts, then scanned at a resolution of 300 dpi. We believe that two main factors influence the training: sample quality and value diversity.

A. MLP, neural classifier
This section covers the use of an MLP to classify mathematical symbols. Recall that an MLP is a trainable algorithm that can learn to solve complex problems from training data consisting of pairs of inputs and desired outputs. It is composed of interconnected processing elements, called neurons, that work together to produce an output. It is a modification of the standard linear perceptron in that it uses three or more layers of neurons with nonlinear activation functions, and it is more powerful than the perceptron in that it can distinguish data that is not linearly separable. In the proposed MLP (see Fig. 2), the input and model symbols both consist of bitmaps of size 20 × 15 pixels. It is a three-layer network. Depending on the experiment, the input layer comprises: 1) 80 or 150 neurons, which respectively correspond to the black pixel densities of each 4-pixel or 2-pixel portion of the symbol bitmap, 2) 300 neurons when each pixel of the symbol bitmap is considered, or 3) 7 neurons if the seven Hu moments are used as input to the MLP. The output layer consists of 24 neurons referring to the different classes of mathematical symbols, associated with their activation degrees.
After several tests, the best MLP design was found to be a three-layer network with 15 neurons in the single hidden layer and 0.3 as training factor. To measure the MLP performance, we compute the recognition (R), rejection (J) and confusion (C) rates. Fig. 3 gives the average rates obtained as a function of the MLP input layer content. We conclude that 1) the best results are obtained with the black pixel densities of 4-pixel portions as input layer and 2) the Hu moments do not seem suitable to characterize mathematical symbols: because they are invariant to rotation, symbols that differ mainly in orientation are confused (e.g. ⊂ versus ∩ and ∪, → versus ←, ( versus ), [ versus ]).
Analyzing the results obtained with 4-pixel portion densities as input layer, and observing the confusion cases, we find that certain distinct symbols, such as ∈, (, ∉ and ←, closely resemble each other. They have roughly similar pixel densities. We remark that the structure of the symbol should play a decisive role in removing such ambiguities. By counting the number of segments in a symbol, for example, we can distinguish many variants of symbols (see Fig. 4).
To this end, we resorted to HT to detect lines, especially as most symbols are composed of segments. We demonstrate, as will be shown in the next subsection, that it is possible
to improve the MLP results by using HT as a front-end stage. We call this hybrid neural structural method HT-MLPs.

B. HT-MLPs, neural structural classifier
The idea is to extract segments from the symbol bitmap using HT (after a skeletonization step using a thinning algorithm) and then, according to their number (0, 1, or 2 and more segments), to call the appropriate MLP as shown in Fig. 9. This implies that the initial MLP is split into three specialized MLPs: MLP0, MLP1 and MLP2+, which learn and recognize symbols composed of 0, 1, and 2 or more segments respectively. Here are the best configurations obtained for the three MLPs:
- MLP0: input layer: 80 neurons; hidden layer: 1 layer (4 neurons); number of iterations: 14.
- MLP1: input layer: 80 neurons; hidden layer: 1 layer (3 neurons); number of iterations: 100.
- MLP2+: input layer: 80 neurons; hidden layer: 1 layer (12 neurons); number of iterations: 1100.
Fig. 8 summarizes the results obtained with the proposed neural structural method, HT-MLPs. We observe a distinct improvement over the initial MLP, since the symbol misrecognition [11] rate is reduced by 21% and the recognition rate is increased from 72% to 93%. However, some confusion remains between similar symbols composed of the same number of segments (e.g. among ∏, ∩, ∑, (, [ and ]). We considered some of the misrecognitions to be too difficult for any classifier to resolve on the basis of the number of segments alone. Further structural primitives should be taken into account to avoid such confusions.

C. KNN classifier using Hu moments
The kNN algorithm is reported to be amongst the simplest of all machine learning algorithms. We apply it here to symbol recognition. A symbol is classified by a majority vote of its neighbors, being assigned to the class most common amongst its k nearest neighbors, where k is a positive integer, typically small. If k = 1, the symbol is simply assigned to the class of its nearest neighbor. The neighbors are taken from a set of symbols for which the correct classification is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. In order to identify neighbors, the symbols are represented here by Hu moments in a multidimensional feature space. We used the Euclidean distance to compare the Hu moment values of input and model symbols. After several tests varying the value of k from 1 to 4, we find that the best recognition rate is obtained with k equal to 4 and the first three Hu moments.

D. Structural Freeman chain code
Another method, based on contours, is the structural Freeman chain code of the symbol's border. To detect the symbol border, we tested different filters such as the Roberts and Sobel filters. We find that the Sobel filter is the simplest and gives good results. Then, we used the Freeman code to describe the contours of symbols. Recall that this code moves along a digital curve or a sequence of border pixels based on eight-connectivity. Each movement is classified as a direction and encoded with a numbering scheme {i | i = 0, 1, ..., 7}, where i denotes an angle of 45i degrees counterclockwise from the positive x-axis. In this method, the chain code stores the absolute position of the first pixel and the relative positions of successive pixels along the symbol's border. Thus, input and model symbols are converted to sequences of digits, called chain codes. Digit sequences are compared using histograms of slopes along the contour and histogram matching techniques. We used various distance metrics (e.g. Euclidean, Manhattan, and Canberra distances) and similarities (histogram intersection) to compare two histograms (stored as vectors). In the case of a distance, we look for the nearest elements (to minimize the distance). In the case of a similarity, we look for the most similar elements (to maximize the similarity index). This method achieved a recognition rate of about 46.67%.

IV. CONCLUSION
In this paper, we mainly described a successful classification method based on three specialized MLPs, utilizing HT. We evaluated the effects of this hybridization by comparing its recognition rates with those of the initial MLP and of other methods based on kNN and the Freeman code. The results of this hybridization are very satisfactory, given the small number of samples. It achieved 93% single symbol recognition. These results show that 1) giving slightly higher weight to the structural information (the number of segments) and 2) specializing the MLP produce better results. However, the problem of symbol recognition is still open, especially as the ultimate objective is to recognize any mathematical symbol, whether isolated or in the formula context, handwritten or printed. It is important to note that the rates of recognition, confusion and rejection are not the only criteria to evaluate this work, since the formula context should play a decisive role in deciding the real identity of a symbol. To improve this work, we plan further research, in particular: 1) to test the efficiency and performance of the proposed neural structural method on a wider set of mathematical symbols. Until now, we have formed a set of reasonable size. Classical methods have limitations, but the development of robust methods depends on the availability of a large database to test the performance, robustness and reliability of the proposed methods and to conduct meaningful statistical tests comparing them against each other; 2) to distinguish the symbols that the HT-MLPs classifier still confuses by considering additional features such as segment orientation and other structural primitives, like the presence of loops, branch points [11] and endpoints, that could distinguish many variants of symbols; and 3) to combine the results of several recognizers, which should give more accurate results than using only one.
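The routing step of Section III-B (count the segments, then call MLP0, MLP1 or MLP2+) can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the function names `count_segments` and `select_classifier`, the four-angle accumulator, the vote threshold of 5 and the toy 5 × 5 bitmaps are all assumptions made for illustration, and the bitmap is assumed to be already thinned.

```python
import math
from collections import Counter


def count_segments(bitmap, min_votes=5, angles_deg=(0, 45, 90, 135)):
    """Count Hough accumulator cells that collect at least min_votes pixels.

    bitmap is a list of rows of 0/1 values, assumed to be already thinned.
    The coarse angle set and the threshold are illustrative choices only.
    """
    votes = Counter()
    for y, row in enumerate(bitmap):
        for x, pixel in enumerate(row):
            if not pixel:
                continue
            for angle in angles_deg:
                theta = math.radians(angle)
                # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(rho, angle)] += 1
    return sum(1 for n in votes.values() if n >= min_votes)


def select_classifier(n_segments):
    """Route a symbol to one of the three specialized networks by segment count."""
    if n_segments == 0:
        return "MLP0"
    if n_segments == 1:
        return "MLP1"
    return "MLP2+"


# Hypothetical toy bitmaps: three isolated dots, a horizontal bar, a plus sign.
DOTS = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 0]]
MINUS = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]
PLUS = [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [1, 1, 1, 1, 1], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]

for name, bitmap in (("dots", DOTS), ("minus", MINUS), ("plus", PLUS)):
    print(name, select_classifier(count_segments(bitmap)))
```

A practical detector would use a finer angular resolution and suppress neighbouring accumulator peaks so that one thick or slightly slanted stroke is not counted twice; the sketch only shows how the segment count drives the choice of network.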

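The voting rule of Section III-C can be sketched in a few lines. The two-dimensional feature vectors and the labels below are hypothetical placeholders standing in for Hu moment values, and `knn_classify` is a name chosen for this sketch; only the Euclidean distance and the majority vote among the k nearest neighbors come from the text.

```python
import math
from collections import Counter


def knn_classify(sample, training_set, k=4):
    """Return the label most common among the k training items nearest to sample.

    training_set is a list of (feature_vector, label) pairs; distances are
    Euclidean, as in the text. No training step is needed.
    """
    neighbours = sorted(training_set, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors (placeholders, not real Hu moments).
TRAINING = [
    ((0.0, 0.0), "minus"), ((0.1, 0.0), "minus"), ((0.0, 0.1), "minus"),
    ((1.0, 1.0), "plus"), ((1.1, 1.0), "plus"), ((1.0, 1.1), "plus"),
]

print(knn_classify((0.05, 0.05), TRAINING, k=4))
print(knn_classify((0.9, 0.9), TRAINING, k=1))
```

With k = 1 the function reduces to the nearest neighbor rule mentioned in the text. An even k such as 4 can produce tied votes, so a real system would need an explicit tie-breaking or rejection policy, which the sketch does not define.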
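The chain code and histogram comparison of Section III-D can be sketched as follows. This is an illustration under stated assumptions: the contour is given as an ordered list of 8-connected (x, y) points with the y-axis pointing upwards, so that code i is a step at 45i degrees counterclockwise from the positive x-axis; border detection is not shown; and all function names and the two toy contours are hypothetical.

```python
import math

# Code i corresponds to a unit step at 45*i degrees counterclockwise (y axis up).
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode an ordered 8-connected contour as a list of Freeman codes."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def direction_histogram(codes):
    """Normalized frequency of each of the 8 codes (needs at least one code)."""
    return [codes.count(d) / len(codes) for d in range(8)]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def canberra(p, q):
    # Bins that are empty in both histograms are skipped to avoid 0/0.
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(p, q) if a or b)


def intersection(p, q):
    """Histogram intersection: a similarity, larger means more alike."""
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical closed contours: an axis-aligned square and a diamond.
SQUARE = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
DIAMOND = [(1, 0), (2, 1), (1, 2), (0, 1), (1, 0)]

h_square = direction_histogram(chain_code(SQUARE))
h_diamond = direction_histogram(chain_code(DIAMOND))
print(chain_code(SQUARE), chain_code(DIAMOND))
print(manhattan(h_square, h_diamond), intersection(h_square, h_diamond))
```

The square uses only the even codes and the diamond only the odd ones, so their histograms do not overlap at all. A histogram of directions discards the order of the moves, which may be one reason a comparison of this kind separates symbols less well than the other methods.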
REFERENCES
[1] Bianchini, Claudia and Borgia, Fabrizio and De Marsico, Maria, "A concrete example of inclusive design: deaf-oriented accessibility", arXiv preprint arXiv:1911.13207, 2019.
[2] Ayeb, Kawther Khazri and Meguebli, Yosra and Echi, Afef Kacem, "Deep Learning Architecture for Off-Line Recognition of Handwritten Math Symbols", Mediterranean Conference on Pattern Recognition and Artificial Intelligence, 2020.
[3] Ayeb, Kawther Khazri and Echi, Afef Kacem and Belaïd, Abdel, "Arabic/Latin and handwritten/machine-printed formula classification and recognition", 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), 2017.
[4] Nazemi, Azadeh and Tavakolian, Niloofar and Fitzpatrick, Donal and Suen, Ching Y and others, "Offline handwritten mathematical symbol recognition utilising deep learning", arXiv preprint arXiv:1910.07395, 2019.
[5] R. H. Anderson, "Syntax-directed Recognition of Hand-printed Two-dimensional Mathematics", in Symposium on Interactive Systems for Experimental Applied Mathematics: Proceedings of the Association for Computing Machinery Inc. Symposium, 1967, pp. 436–459.
[6] D. Blostein and A. Grbavec, "Recognition of Mathematical Notation", in Handbook of Character Recognition and Document Image Analysis, pp. 557–582.
[7] A. Kacem, A. Belaïd and M. Benahmed, "Automatic extraction of printed mathematical formulas using fuzzy logic and propagation of context", in Jour. of IJDAR, volume 4, number 2, pp. 97–108, December 2001.
[8] Malon, Christopher and Uchida, Seiichi and Suzuki, Masakazu, "Mathematical symbol recognition with support vector machines", Pattern Recognition Letters, volume 29, number 9, pages 1326–1332, 2008.
[9] Yang, Michael and Fateman, Richard, "Extracting mathematical expressions from postscript documents", Proceedings of the 2004 international symposium on Symbolic and algebraic computation, pages 305–311, 2004.
[10] Alvaro, Francisco and Zanibbi, Richard, "A shape-based layout descriptor for classifying spatial relationships in handwritten math", Proceedings of the 2013 ACM symposium on Document engineering, pages 123–126, 2013.
[11] M. Koschinski, H.J. Winkler and M. Lang, "Segmentation and recognition of symbols within handwritten mathematical expressions", in Proc. of ICASSP, vol. 4, Detroit, MI, pp. 2439–2442, 1995.
[12] H.J. Winkler, H. Fahner and M. Lang, "A soft-decision approach for structural analysis of handwritten mathematical expressions", in Proc. of ICASSP, vol. 4, Detroit, MI, pp. 2459–2462, 1995.
[13] A. Kosmala, G. Rigoll, S. Lavirotte and L. Pottier, "Online handwritten formula recognition using hidden Markov models and context dependent graph grammars", in Proc. of ICDAR, Bangalore, Karnataka, India, pp. 107–110, 1999.
[14] Z. Xuejun, L. Xinyu, P. Boachang and Y. Tang, "Online recognition handwritten mathematical symbols", in Proc. of ICDAR, Ulm, Germany, pp. 645–648, 1997.
[15] E. Tapia and R. Rojas, "Recognition of online handwritten mathematical formulas in the E-chalk system", in Proc. of ICDAR, Edinburgh, U.K., pp. 980–984, 2003.
[16] U. Garain, B. Chaudhuri, and A. Ray Chaudhuri, "Identification of embedded mathematical expressions in scanned documents", in Proc. of ICPR, Cambridge, UK, pp. 138–149, 2004.

Fig. 1. The CROHME competition for mathematic formula and symbol recognition.

Fig. 2. MLP design with 4 pixel portion black pixel densities as input.

Fig. 3. MLP rates.
Fig. 4. Symbol confusion cases.

Fig. 5. MLP0 rates.

Fig. 6. MLP1 rates.

Fig. 7. MLP2+ rates.

Fig. 8. Average Rates of HT-MLPs.

Fig. 9. Proposed System for Mathematical Symbol Recognition
