1043 - Ziaratban
Abstract-- In this paper, a directional filter is proposed to describe the curvedness of
textures. The proposed filter is inspired by the basic Gabor filter and has an elliptic form;
such filters are therefore called directional elliptic Gabor (DEG) filters. Characters and
subwords in Farsi machine-printed texts are constructed from both straight and curved segments.
Moreover, the amount of curvedness differs among Farsi fonts. Therefore, features based on the
proposed filter can be useful in Farsi font recognition. Describing the straightness and
curvedness of text components more accurately increases the separability among fonts.
Experiments demonstrate that using both Gabor filters and the proposed DEG filters for texture
feature extraction improves the Farsi font recognition accuracy.

Keywords-- Farsi font recognition, Directional elliptic Gabor filters.

I. INTRODUCTION
Converting document images into editable text files is one of the interesting goals of optical
character recognition (OCR). The accuracy of OCR on machine-printed texts with known fonts is
significantly higher than on texts with unknown fonts [1]. Several works have addressed optical
font recognition for various languages such as Latin [2-12], Chinese [6,13-16], Arabic [17-19],
and Farsi [20,21]. The related works on optical font recognition are briefly listed in Table I.
In all of these works, the font of the whole text in a document image was assumed to be uniform.
For complex multi-font text images, a preprocessing stage is required to segment a multi-font
document into several single-font text parts. This preprocessing stage is a wide subject in the
OCR field and is beyond the scope of this paper. Thus, like the previous works, we propose an
algorithm to recognize an unknown uniform font of a machine-printed document.

Font recognition approaches can be roughly divided into three main categories: typographical
feature-based methods, frequently used component-based methods, and global texture analysis.

Typographical feature-based algorithms [3-5] extract features such as character skews, the
widths of between-character and between-word spaces, and projections in the upper, center, and
lower zones of the line from the printed texts. The main drawback of these approaches is that
they require high-quality and noise-free document images [21].

The approaches based on frequently used components [13,17,18] have been proposed for
content-independent font recognition applications. In these methods, the learning set consists
of a number of samples of frequently used components in all font classes. The text font is
determined based on the fonts of the samples of these components detected in a document image.
The algorithms in this category require computing the matching scores between all predetermined
most-frequent components in all font classes and all components of a test image. Due to the
large number of required matchings, these methods are very time-consuming. With more font
classes and a larger number of words in the test document images, the complexity and the
processing time increase considerably.

Texture analysis was used in many approaches [2,6,8,10,14,19-21] to determine the fonts of text
blocks. These approaches first normalize the spaces between text lines, words, and characters in
text images. Text blocks are normalized by filling the empty spaces at the end of text blocks.
Then, the texture features are extracted from the normalized text blocks.

Gabor filters were used in [20] to extract texture features from Farsi document images. Seven
font types and four font styles were considered, and a recognition rate of 85% was obtained by
using a weighted Euclidean distance (WED) classifier.

Recently, Khosravi and Kabir [21] proposed a Sobel-Roberts feature extraction method for Farsi
font recognition. These features statistically describe the texture of the texts. Using 15000
training and 5000 test samples, a recognition rate of 94.16% was achieved for ten Farsi font
types.

The rest of the paper is organized as follows. The proposed DEG filters, which are used for
Farsi font recognition, are discussed in Section II. In Section III, the experimental results
are presented. Finally, conclusions are drawn in Section IV.
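As an illustration of the texture-based pipeline reviewed above, the following Python sketch extracts basic Gabor features from a text block and assigns a font label with a weighted Euclidean distance (WED). It is not the implementation of [20], nor of the proposed DEG filters. The kernel parameters, the choice of mean and standard deviation of the response magnitude as features, the WED form (squared differences divided by per-class feature variances), the toy stripe images, and all function names are assumptions made for this example.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0):
    """Real (even-symmetric) Gabor kernel; size is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_valid(image, kernel):
    """Correlate over the 'valid' region (no padding). The even Gabor kernel is
    symmetric under point reflection, so this equals convolution."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        out_row = []
        for j in range(w - k + 1):
            s = 0.0
            for a in range(k):
                for b in range(k):
                    s += image[i + a][j + b] * kernel[a][b]
            out_row.append(s)
        out.append(out_row)
    return out

def texture_features(image, thetas, wavelength=4.0, sigma=2.0, size=7):
    """Mean and standard deviation of |response| for each orientation."""
    feats = []
    for theta in thetas:
        resp = filter_valid(image, gabor_kernel(size, wavelength, theta, sigma))
        vals = [abs(v) for row in resp for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        feats += [mean, math.sqrt(var)]
    return feats

def wed(feature, class_mean, class_std, eps=1e-9):
    """Assumed WED form: squared differences weighted by inverse class variance."""
    return sum((f - m) ** 2 / (s * s + eps)
               for f, m, s in zip(feature, class_mean, class_std))

def classify(feature, class_stats):
    """Return the class label with the smallest WED to the feature vector."""
    return min(class_stats, key=lambda name: wed(feature, *class_stats[name]))

def stripes(axis, shift=0, n=16):
    """Toy binary texture of period 4 varying along x (axis=0) or y (axis=1)."""
    return [[1.0 if ((x, y)[axis] + shift) % 4 < 2 else 0.0 for x in range(n)]
            for y in range(n)]

# Toy stand-ins for two "fonts": vertical versus horizontal strokes.
thetas = [0.0, math.pi / 2]
vertical, horizontal = stripes(0), stripes(1)
unit_std = [1.0] * (2 * len(thetas))  # placeholder per-class standard deviations
stats = {
    "vertical-strokes": (texture_features(vertical, thetas), unit_std),
    "horizontal-strokes": (texture_features(horizontal, thetas), unit_std),
}
query = stripes(0, shift=1)  # shifted copy of the vertical-stroke texture
print(classify(texture_features(query, thetas), stats))
```

In practice the per-class means and standard deviations would be estimated from many training text blocks per font rather than from a single prototype with unit deviations.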
Fig. 1. Differing curvedness of four Farsi fonts. All text images contain the same text.
Fig. 4. Noise sensitivity of various feature extraction methods. From left: (1st column) sample text blocks; (2nd and 3rd columns) Sobel and Roberts phase
images, respectively; (4th and 5th columns) Gabor and DEG filtered images, respectively. (1st row) Results for the original text block; (2nd row) results for
the noisy text block with an SNR of 25 dB.
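For readers who wish to reproduce a noisy text block at a given SNR, the following Python sketch adds zero-mean white Gaussian noise whose power is set from the target SNR in dB. The noise model, the definition of signal power as the mean squared pixel value, the toy image, and the function names are assumptions; the text above states only the SNR value.

```python
import math
import random

def add_gaussian_noise(image, snr_db, rng=None):
    """Add white Gaussian noise so that signal_power / noise_power matches snr_db.
    Signal power is taken as the mean squared pixel value (an assumption)."""
    if rng is None:
        rng = random.Random(0)
    pixels = [p for row in image for p in row]
    signal_power = sum(p * p for p in pixels) / len(pixels)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean image and its noisy version."""
    n = len(clean) * len(clean[0])
    signal_power = sum(p * p for row in clean for p in row) / n
    noise_power = sum((q - p) ** 2
                      for rc, rn in zip(clean, noisy)
                      for p, q in zip(rc, rn)) / n
    return 10.0 * math.log10(signal_power / noise_power)

# Toy 64x64 block pattern standing in for a text block.
clean = [[255.0 if (x // 4 + y // 4) % 2 else 0.0 for x in range(64)]
         for y in range(64)]
noisy = add_gaussian_noise(clean, 25.0, random.Random(1))
print(round(measured_snr_db(clean, noisy), 2))
```

The measured value differs slightly from the target because the noise power is estimated from a finite number of pixels.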
IV. CONCLUSION
In this paper, directional elliptic Gabor features were proposed to describe the curvature of
textures. The combination of Gabor features and the proposed DEG features described both the
straightness and the curvature of different Farsi fonts and gave the best performance.
Combining the proposed DEG features with the basic Gabor features improves the Farsi font
recognition rate by about 1.9% and 1.7% over the basic Gabor and Sobel-Roberts features,
respectively. Furthermore, the basic Gabor features, the DEG filter-based features, and their
combination were more robust to noise.

V. REFERENCES
[1] H.S. Baird and G. Nagy, “A Self-Correcting 100-Font Classifier,” In Proc. SPIE, Vol. 2181,
1994, pp. 106-115.
[2] H. Ma and D. Doermann, “Font Identification Using the Grating Cell Texture Operator,” In
Proc. of DRR, 2005, pp. 148-156.
[3] A. Zramdini and R. Ingold, “Optical Font Recognition from Projection Profiles,” Electronic
Publishing, Vol. 6, No. 3, 1993, pp. 249-260.
[4] A. Zramdini and R. Ingold, “Optical Font Recognition Using Typographical Features,” IEEE
Trans. on PAMI, Vol. 20, No. 8, 1998, pp. 877-882.
[5] M.C. Jung, Y.C. Shin and S.N. Srihari, “Multifont Classification using Typographical
Attributes,” In Proc. of ICDAR, India, 1999, pp. 353-356.
[6] Y. Zhu, T. Tan and Y. Wang, “Font Recognition Based on Global Texture Analysis,” IEEE Trans.
on PAMI, Vol. 23, No. 10, 2001, pp. 1192-1200.
[7] S.H. Kim, “Word-Level Optical Font Recognition Using Typographical Features,” IJPRAI, Vol.
18, No. 4, 2004, pp. 541-561.
[8] C.A. Cruz, R.R. Kuoppa, M.R. Ayala, A.A. Gonzalez and R.E. Perez, “High-order Statistical
Texture Analysis-Font Recognition Applied,” Pattern Recognition Letters, Vol. 26, 2005, pp.
135-145.
[9] B.B. Chaudhuri and U. Garain, “Extraction of Type Style-based Meta-information from Image
Documents,” IJDAR, Vol. 3, 2001, pp. 138-149.
[10] B. Allier and H. Emptoz, “Font Type Extraction and Character Prototyping Using Gabor
Filters,” In Proc. of ICDAR, 2003, pp. 799-803.
[11] H. Shi and T. Pavlidis, “Font Recognition and Contextual Processing for More Accurate Text
Recognition,” In Proc. of ICDAR, 1997, pp. 39-44.
[12] S.L. Manna, A.M. Colla and A. Sperduti, “Optical Font Recognition for Multi-Font OCR and
Document Processing,” In Proc. of 10th Int. Workshop on Database & Expert Systems Applications,
1999, pp. 549-553.
[13] C.F. Lin, Y.F. Fang and Y.T. Juang, “Chinese text distinction and font identification by
recognizing most frequently used characters,” Image and Vision Computing, Vol. 19, 2001, pp.
329-338.
[14] M.H. Ha, X.D. Tian and Z.R. Zhang, “Optical Font Recognition Based on Gabor Filter,” In
Proc. of Int. Conf. on Machine Learning and Cybernetics, 2005, pp. 4864-4869.
[15] Z. Yang, L. Yang, D. Qi and C.Y. Suen, “An EMD-based Recognition Method for Chinese Fonts
and Styles,” Pattern Recognition Letters, Vol. 27, 2006, pp. 1692-1701.
[16] X. Ding, L. Chen and T. Wu, “Character Independent Font Recognition on a Single Chinese
Character,” IEEE Trans. on PAMI, Vol. 29, No. 2, 2007, pp. 197-204.
[17] I.S.I. Abuhaiba, “Arabic Font Recognition Based on Templates,” Int. Arab Journal of
Information Technology, Vol. 1, 2003, pp. 33-39.
[18] I.S.I. Abuhaiba, “Arabic Font Recognition Using Decision Trees Built from Common Words,”
Journal of Computing and Information Technology (CIT), Vol. 13, No. 3, 2005, pp. 211-223.
[19] B. Moussa, A. Zahour, M.A. Alimi and A. Benabdelhafid, “Can Fractal Dimension Be Used in
Font Classification,” In Proc. of ICDAR, 2005, pp. 146-150.
[20] A. Borji and M. Hamidi, “Support Vector Machine for Persian Font Recognition,” Int. Journal
of Intelligent Systems and Technologies, Vol. 2, 2007, pp. 178-183.
[21] H. Khosravi and E. Kabir, “Farsi font recognition based on Sobel-Roberts features,” Pattern
Recognition Letters, Vol. 31, 2010, pp. 75-82.
[22] M. Ziaratban, K. Faez and F. Bagheri, “Content-Independent Farsi Font Recognition Based on
Dynamic Most-Frequent Connected Components,” 21st International Conference on Pattern
Recognition, ICPR’12, Japan, 2012, pp. 729-733.
[23] Y. Freund and R.E. Schapire, “Experiments with a new boosting algorithm,” In Proc. Int.
Conf. on Machine Learning, Bari, Italy, 1996, pp. 148-156.