Abstract
Text detection in images is an important field. Reading text in images is challenging because of the wide variation among images. Text detection in images is useful for many navigational purposes, e.g. text in Google API imagery and on traffic panels. This paper analyzes the work done on text detection by many researchers, critically evaluates the techniques designed for text detection, and states the limitations of each approach. We have integrated the work of many researchers to give a brief overview of the available techniques, and their strengths and limitations are also discussed to give readers a clear picture. The major datasets discussed in these papers are ICDAR 2003, 2005, 2011, 2013 and SVT (Street View Text).
natural scene images as well as in videos, to highlight the strengths and limitations of existing text detection techniques.

2.1 Text Detection in Natural Scene Images

For static natural images, Iqbal et al. [1] suggested an adaptive classifier threshold to detect text. The adaptive classifier threshold is based on the geometric mean and standard deviation of the posterior probabilities of MSER-based candidate characters, computed using Bayesian network scores. A candidate character is discarded if its posterior probability is below the adaptive threshold value. The method is evaluated on ICDAR 2013 and achieves significant, competitive performance against recently published algorithms. However, the proposed method still needs to be tested on other datasets; as it stands, it detects text only on the given dataset rather than on arbitrary natural images.
Jaderburg et al. [2] perform text detection using deep features. They divide the work of text detection into two tasks: spotting the word region and recognizing the words. A Convolutional Neural Network (CNN) classifier is used for case-sensitive, case-insensitive and bigram classification, and saliency maps are generated for each. The convolutional structure of the CNN passes over the whole image once. A mining technique run over the images produces word-level and character-level annotations. The output is divided into a character/background classifier, a case-sensitive classifier, a case-insensitive classifier and a bigram classifier, and these classifications add to the efficiency of the work. They used the ICDAR (2003, 2005, 2011, 2013) Robust Reading datasets and the Street View Text dataset. The system has limitations due to the saliency maps, which may sometimes give poor resolution and wrong results.
Yin et al. [3] presented a robust method for detecting text in natural images and designed an effective pruning algorithm that uses the strategy of minimizing regularized variations to extract Maximally Stable Extremal Regions (MSERs). They further proposed a distance metric learning algorithm and then used single-link clustering to group candidate characters into text regions. Finally, non-text candidates were eliminated by an AdaBoost classifier based on posterior probabilities. The method is evaluated on the ICDAR 2011 Robust Reading Competition dataset and also on a multilingual (Chinese and English) dataset. However, the proposed system does not work for all natural scene images containing text of arbitrary orientation.
Huang et al. [4] introduced a new technique for text localization in natural images using the Stroke Feature Transform (SFT) and Text Covariance Descriptors (TCDs). First, the low-level SFT filter is used to remove background pixels. The candidates are then obtained using color homogeneity and consistency between stroke widths. For component detection, two maps are generated with SFT: a stroke width map and a stroke color map. Two Text Covariance Descriptors are then used for text region and text line classification. By using the SFT and the two TCDs, the system's performance improved many fold. The datasets of ICDAR 2005 and ICDAR 2011 are used. The proposed low-level SFT filter leads to high recall, and the effectiveness of the two-level TCDs leads to high precision, but this method can only predict horizontal text, not vertical.
Yao et al. [5] proposed a multi-scale representation of text in images using strokelets. Over the conventional approaches, strokelets have four advantages (usability, robustness, generality and expressivity), called the URGE properties. They designed an efficient algorithm for text recognition: character identification is done first by seeking maxima in Hough maps. They used the discriminative clustering algorithm designed by Singh et al. [35], with certain changes made according to their requirements. A random forest classifier is used for the classification of components. The SVT and ICDAR 2013 datasets are used to evaluate the results, and the proposed algorithm consistently outperformed the existing state-of-the-art approaches. However, the results could show variations over other datasets, and the system has limitations due to the random forest's behavior.
Ye et al. [6] invented a new approach to text detection in images by combining both appearance and consensus component representations into a discriminative model. An SVM (support vector machine) is used for the dictionary classifiers. The discriminative model performs two tasks: differentiating text from non-text, and determining component grouping. In text detection, candidate components are built on MSERs. They proposed a definition of "text patterns" using a sequence of classifiers, and a multi-class SVM training algorithm is used to train the dictionary classifier. On the given set of samples, they then calculated the classifier responses of the component consensus features and the components, and hypotheses on the text are made until one becomes negative. They evaluated the work on two datasets: the ICDAR 2011 scene text dataset and the Street View Text (SVT) dataset. Their proposed approach improved precision, recall and f-measure over previous approaches. The problem with the method is that if the distance between candidate characters is as large as the text height, the object may fail to be recognized.
Neumann et al. [7] worked on text detection in scene images with oriented stroke detection and presented an unconstrained end-to-end localization and recognition method. This method detects a character as an image region that contains strokes of specific orientation and relative position. The detected strokes induce the set of rectangles to be classified. The strokes are modelled as responses to oriented filters in the gradient projection scale space, and the relative stroke position is modelled by subsampling the responses into a fixed-size matrix. Character recognition is done by recognizing a known stroke pattern with a trained classifier. Then, to detect words in the image and recognize their content, an optimal sequence is found in each text line. The results were evaluated on the ICDAR 2011 dataset. The advantage of the method is that the number of rectangles is reduced many fold due to the classification of sets of strokes; the limitation is the ambiguity that a sub-region of a character might itself be another character.
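Stroke-width consistency, which Huang et al. [4] exploit through the SFT and which also underlies Neumann et al.'s oriented strokes [7], can be illustrated with a minimal sketch. This is not any author's implementation: the real Stroke Width Transform shoots rays along gradient directions, while the toy version below, written purely for illustration, approximates stroke width by horizontal run lengths in a binary mask.

```python
# Simplified stroke-width estimate from a binary character mask.
# Real SWT follows gradient directions; horizontal run lengths are
# enough here to show why stroke-width consistency separates text
# from background clutter.

def run_lengths(row):
    """Lengths of consecutive runs of 1s in a binary row."""
    widths, count = [], 0
    for px in row:
        if px:
            count += 1
        elif count:
            widths.append(count)
            count = 0
    if count:
        widths.append(count)
    return widths

def stroke_width_stats(mask):
    """Mean and variance of run lengths over all rows of a 0/1 mask."""
    widths = [w for row in mask for w in run_lengths(row)]
    if not widths:
        return 0.0, 0.0
    mean = sum(widths) / len(widths)
    var = sum((w - mean) ** 2 for w in widths) / len(widths)
    return mean, var

# A letter-like shape has near-constant stroke width (low variance) ...
glyph = [
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
]
# ... while random clutter does not.
clutter = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
print(stroke_width_stats(glyph))    # low variance
print(stroke_width_stats(clutter))  # much higher variance
```

Text-like shapes yield a low variance of stroke width, which is exactly the cue the stroke-based detectors above use to discard background pixels.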
117 Sana et al. / Jurnal Teknologi (Sciences & Engineering) 78: 4–3 (2016) 115–126
Minetto et al. [8] proposed SnooperText, which works in a multi-scale fashion. First, it locates candidate characters in the images by image segmentation and shape-based binary classification. The segmentation algorithm used was developed by Fabrizio et al. [9] to define local foreground and background. Character filtering is done using an SVM classifier. Next, the candidate characters are grouped by simple geometric criteria to form either candidate words or candidate text lines. The grouping module in iTowns does not work well when it comes across wide spaces, so to overcome this issue the module is run twice. A further advantage of the multi-scale approach is that it makes the segmentation algorithm insensitive to character texture such as high-frequency details. They used four datasets for testing: ITW (the iTowns project's image collection), SVT (Street View Text), EPS (used by Epshtein) and ICDAR (half of the 2005 set). A comparison was made between SnooperText with TessBack and with TesseRact, and the results showed TessBack to be the better one. The method has limitations: the grouping module does not work well for wide spaces, and it cannot detect tilted or vertical text.
Ganesh et al. [10] worked on text detection in big-data imagery. They used the Google API to obtain many street view images, which acted as the Big Data in their work. After obtaining the images, filters are applied to remove noise: first an averaging filter, then a median filter, and finally an adaptive filter to eliminate the low-frequency regions and noise. Next, image processing is done in two steps: first, a color-based partition method; second, classifiers are applied to detect whether the resulting partitions contain text or not. Text line grouping is then performed using the Hough transform. The strength of the technique is its handling of big data, which is crucial, but it has the limitation that results may vary for big data other than the dataset used in this technique.
Gao et al. [11] worked on detection in natural scene images and invented a Hierarchical Visual Saliency Model. Since the region containing a character is sometimes salient and can be detected by a saliency-based method, they studied how much such regions affect the detection of characters, proposed a new method, and compared it with Itti et al.'s model [12]. First, the salient regions are obtained and the image is filtered. In the second step, local saliency regions inside the global salient region are evaluated, which yields the required final map. A scenery image database containing 3018 images is used. The results showed that the hierarchical saliency method performed better than Itti et al.'s [12]. A problem with this method is that the thresholding and hierarchical clustering used for cropping the saliency region do not always give good results.
Xiao et al. [13] proposed a method for text detection in images with complex backgrounds. The method uses density-based information and a rectangular window in the residual edge image. The input consists of low-illumination images with complicated backgrounds in RGB format, so the conversion to YCbCr is done first. A tone-mapping function is then used to enhance the image. After this, the vertical edges of the image are extracted using the Sobel operator to obtain edge density. To detect candidate regions, they estimated the edge density across the edge image by applying a Gaussian kernel to it. Next, complicated background curves and noise are removed by morphological opening and closing. Text localization is then done using density-based information and a rectangular window in the residual edge image. Finally, for text segmentation, they used the work of Nomura et al. [36]. The ICDAR 2005 dataset is used, and the results showed that the proposed method works well even in low illumination. The system's strength is its detection in low illumination, while its drawback is that for long curves the system may have to scan twice to remove noise.
González et al. [14] proposed a text reading algorithm for natural images. Their process is based on two steps: one, the image is analyzed to detect text using geometric features; two, recognition is performed. The contributions of their work are: 1) a combination of an adaptive thresholding method and Maximally Stable Extremal Regions for segmentation; 2) a brief study of different text features to discriminate between text and non-text; 3) a restoration stage for rejected characters; and 4) a method for detecting single characters using K-nearest neighbor, with dynamic programming used for misspelled words. The datasets of ICDAR 2003, 2005 and 2011 are used. The results showed that the proposed system performed better and scored first in precision (meaning the number of false positives was smallest). The limitation is that the method does not work for multi-oriented text.
González et al. [15] worked on text detection on traffic panels from street-level imagery using visual appearance. They used the text detection and recognition technique (with modifications) of the same authors in [14]. They applied blue and white segmentation to the image, and classification is then done using Naive Bayes and SVM accordingly. The technique is based on color segmentation and a BOVW (bag of visual words) approach, and the method is applied only to those areas detected by the blue and white masks. Feature extraction is then done using a Harris-Laplace salient point detector. They also compared the following descriptors for their work: SIFT, C-SIFT, Hue and Histogram. The strengthening features were that character recognition was enhanced to detect symbols and that a new technique for blue region description was introduced. The limitation of the method is that it applies only to areas detected as blue and white.
Iqbal et al. [16] worked on text localization in scene images using the K2 algorithm and Bayesian network scores. First, they detect candidate regions using the MSER algorithm, and filtration is done by constraints. As a second step, textual regions are constructed. Due to its intense pixel values, every candidate binary region has its own features, so for each feature the K2 algorithm is used to obtain a Bayesian network score. Finally, classification of true candidates is done using a text classifier. They tested their method on the ICDAR 2013 Robust Reading Competition dataset, where their BayesText method ranked 4th out of 10 among the recently published results. The limitation of the method is its dependence on Yin et al. [20].
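Several of the methods above (e.g. Iqbal et al. [16]) filter MSER candidates by geometric constraints before classification. The following is a minimal sketch of such a filtration step; the threshold values and the box format are illustrative assumptions, not taken from any of the cited papers.

```python
# Hypothetical geometric filtering of candidate character regions,
# in the spirit of the constraint-based filtration applied after
# MSER extraction. Thresholds are illustrative, not from any paper.

def plausible_character(box, img_w, img_h,
                        min_aspect=0.1, max_aspect=10.0,
                        min_fill=0.1, max_rel_area=0.5):
    """box = (x, y, w, h, pixel_count); True if the region passes
    simple size/shape constraints."""
    x, y, w, h, pixels = box
    if w == 0 or h == 0:
        return False
    aspect = w / h
    fill = pixels / (w * h)            # fraction of box that is foreground
    rel_area = (w * h) / (img_w * img_h)
    return (min_aspect <= aspect <= max_aspect
            and fill >= min_fill
            and rel_area <= max_rel_area)

candidates = [
    (10, 10, 12, 20, 120),   # letter-like: kept
    (0, 0, 600, 2, 900),     # long thin line: rejected (aspect ratio)
    (0, 0, 500, 400, 10),    # huge sparse blob: rejected (fill, area)
]
kept = [b for b in candidates if plausible_character(b, 640, 480)]
print(len(kept))  # 1
```

Constraints like these are cheap and remove most non-text regions before the (much more expensive) classifier stage runs.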
Jaderberg et al. [17] worked on text detection in natural scenes using synthetic data and artificial neural networks. The method considers the whole image for word recognition, which makes it distinctive. A synthetic engine is used to produce data and to generate words as wholes, and a neural network is then trained on that data. They used three models: dictionary encoding, bag-of-N-grams encoding and sequence encoding. The ICDAR 2003, ICDAR 2013 and Street View Text datasets are used together with data generated through the synthetic engine. Their method improved greatly over the standard datasets because of its simple, fast machinery and the reduced cost of data acquisition. The strength of the system is that it takes images as a whole; a limitation is that the synthetic engine may produce so much data that it creates difficulties.
Yao et al. [18] worked on multi-oriented text detection and proposed a new unified framework for it. Text detection and recognition are done together and the same features are utilized, and the system works well on text of multiple orientations. A new search-based dictionary approach is introduced to eliminate errors caused by resembling symbols. First, candidates are generated via clustering and the Stroke Width Transform (SWT); recognition is then performed using a similar classification schema. For this they used randomized trees, while component-level classification is done with a Random Forest classifier. The result is forwarded to a dictionary to correct errors, and the detected text is the final output of the framework. The datasets of ICDAR 2003, 2005 and 2011 are used; furthermore, the MSRA-TD500 dataset is also used because there was as yet no standard dataset for multi-oriented text. The proposed system performed quite well on all these datasets. The strength of the system is multi-oriented text detection, and its limitation is due to the random forest classifier's behavior.
Milevskiy et al. [19] worked on joint energy-based detection and classification of multilingual text lines. A new hierarchical MDL model is introduced for the classification and detection of images taken by any handheld device, and the energy of the image can be optimized by fusion moves. The method segments images into multiple classes of language and text line by looking at geometric errors. The original image is detected as text blobs using an edge-based technique, an AdaBoost classifier is then applied, and finally text lines are detected using an energy-based algorithm. The combined text detection and classification is based on energy minimization by BCD (block coordinate descent). The dataset consisted of 500 images taken by mobile cameras. The model's combination of geometric error costs adds to its strength, while its narrowness is due to the AdaBoost classifier.
Barlas et al. [20] worked on recognizing handwritten or typed text in heterogeneous documents and developed an analysis system for text recognition. They presented LITIS's proposed system for segmentation (into 8 classes). In their work they presented a connected-component-based strategy for identification and segmentation of text, and heterogeneous documents are handled by it, which makes it distinctive and is credited to the learning-based approach. Figure 1 shows the working flow of the system.
Figure 1 A typed and handwritten text block segmentation system for heterogeneous and complex documents
LITIS's segmentation constitutes many detectors. First the noise is removed, then text and non-text are separated using a learning-based approach. These characters are then input to an MLP classifier (trained on 2000 document images in the MAURDOR dataset). Finally, layer separation is done to differentiate textual connected components into typed or handwritten text using a codebook, after which block segmentation is done using the run length smoothing algorithm and the method presented by Breuel [21]. The method was tested on the MAURDOR dataset. The strength of the method is the detection of handwritten text along with typed text in heterogeneous documents; the drawback of the system is that text on graphical portions cannot be detected.
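The run length smoothing algorithm (RLSA) used in the block segmentation step can be sketched in one dimension as follows. The idea is to smear short background gaps between foreground pixels so that characters on the same line merge into one block; the threshold value here is an illustrative assumption.

```python
# Minimal one-dimensional Run Length Smoothing Algorithm (RLSA):
# background runs shorter than `threshold` become foreground, so
# nearby characters fuse into a single block.

def rlsa_1d(row, threshold=3):
    """Fill background (0) runs shorter than `threshold` with 1s."""
    out = list(row)
    gap_start = None
    for i, px in enumerate(out + [1]):        # sentinel closes a trailing gap
        if px == 0 and gap_start is None:
            gap_start = i
        elif px == 1 and gap_start is not None:
            if i - gap_start < threshold:     # short gap: smear it
                for j in range(gap_start, i):
                    out[j] = 1
            gap_start = None
    return out

row = [1, 0, 0, 1, 0, 0, 0, 0, 1, 1]
print(rlsa_1d(row, threshold=3))
# the two length-2 gaps are filled, the length-4 gap is kept
```

In practice RLSA is run horizontally and vertically and the two results are combined, but the one-dimensional pass above already shows the smearing behavior that produces text blocks.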
Yin et al. [22] presented a new approach for localization of text in images using MSER, geometric features and AdaBoost for classification. The approach works as follows: first, candidate letters are extracted using MSER, and non-letters are removed based on geometric features. After this, the candidate groups' features are extracted and the groups are classified using an AdaBoost classifier. The algorithm was evaluated on ICDAR 2011, and the results were better than the algorithms published up to 2011. Compared with the traditional sliding-window and connected-component approaches, the new method is better, but its detection is limited to horizontal text only.
Hanif et al. [23] worked on text detection and localization in grey-scale images using a constrained AdaBoost algorithm and presented a text detector based on a cascade of boosted ensembles. Neural networks are utilized to obtain automatic rules for localization. Feature extraction is done using rectangular text segments, and the LRT (likelihood ratio test) is chosen as the weak classifier. The classical AdaBoost algorithm is then used to remove classification errors missed by the weak classifiers, so a strong classifier is constructed using hypotheses based on the extracted features. Using the attentional cascade, many non-object features are removed at early stages. The connected components are projected onto the grey-level image and an edge map is then computed with a Canny filter. A Multilayer Perceptron (MLP) is then used for validation, and finally the verified components are clustered to form a single rectangle over the text word. The ICDAR 2003 dataset is used; the results showed that the false alarm rate is comparatively high for constrained AdaBoost compared with classical AdaBoost, but the standard deviation is lower. The strength of the method is that it works well for different fonts, styles and complex backgrounds, but it is limited to grey-scale images only.
Ye et al. [24] worked on text detection and restoration in natural images and presented a robust method for text detection, recognition and restoration. First, the GLVQ (generalized learning vector quantization) algorithm is used to segment pixels to locate characters. Then text is differentiated from non-text, with wavelet coefficients used along with color variance to detect text patterns; an SVM classifier is then applied. Text-line characters are then binarized and input to an OCR engine to reduce false alarms. After this, the restoration of text is done by a process based on plane-to-plane homography; it is independent of camera parameters and is applied wherever deformation of the text is detected. Figure 2 shows the working of the system.
A dataset of 1500 captured images with different backgrounds, illuminations etc. is used. Their method showed robust performance. The OCR stage reduces false alarms, which adds to the strength of the system, but the limitation of the system is due to the inconsistent behavior of the GLVQ algorithm.
Shahab et al. [25] analyzed multiple techniques for text detection in natural scene images based on Challenge 2 of ICDAR 2011. According to them, text reading in an image has three sub-parts: 1) text localization, 2) character detection and 3) word detection. In 2011 a competition was held to test algorithms developed for two of these parts, text localization and word recognition, and the 2003 dataset was enhanced by adding images to it. For text localization the evaluation method of Wolf et al. [26] was used. The following methods participated: 1) Yi's Method, 2) Kim's Method, 3) Text Hunter, 4) KAIST AIPR System, 6) LIP6-Retin, 7) TDM IACAS, 8) TH-TextLoc, TH-OCR System, 9) ECNU-CCG Method. Kim's method attained first position in text localization, and in text detection there were three participants, among which TH-OCR performed best. The results showed there was still a lot of room for improvement, and only a few methods took part in word recognition.
Toyama et al. [27] worked on text recognition using eye gaze and developed a system to translate text from French to English. OCR (optical character recognition) technology is used, and human eye gaze is taken as input to obtain efficient results. They have
extended the work of [28]. Feedback is given through a head-mounted display, and not only translation but also navigational help is provided. They utilized SMI's Eye Tracking Glasses (ETG) 2 for gaze input. The following gaze algorithms are used: Gaze Repetitive Leap (GRL) and Gaze Scan. The OCR is then activated, and the text region is extracted from the end and start fixation points of the gesture. Navigation is then provided on the HMD screen. The advantage of the method is its navigational help for foreigners, but a limitation is that the results showed only the basic effectiveness of their algorithm.

2.2 Text Detection in Videos

Wang et al. [29] focused on videos, detecting text by using simultaneous localization and mapping (SLAM) to extract planar scene surfaces ("tiles"). All the sensor data is input using LCM (Lightweight Communications and Marshalling); the system then tracks the sensor ring's motion using incremental LIDAR scan-matching, and a local map consisting of line segments is built. Tiles are then projected onto the camera, which generates the observations, and a fronto-parallel view is obtained through a homography transformation. These observations are then considered for text detection and decoding. They used DCT and MSER to produce character detection regions, which are then provided to Tesseract. A clustering process then groups the characters into word candidates. The output is a sequence of characters, each comprising a small number of candidates. They designed their own metric to evaluate the test results of their work. The strength of the method is text detection in images captured in motion, but there is a limitation as well: when the observation distance is 1.5 m, the decoder may not work.
Shivakumara et al. [30] worked on arbitrarily oriented text detection in videos using gradient vector flow and grouping-based methods. The Sobel edge map of the image is used to identify dominant text pixels; edge components in the Sobel edge map corresponding to dominant pixels are then extracted and called Text Candidates (TC). To handle arbitrary orientation they proposed a two-stage grouping for the TC: the first stage grows the perimeter of each TC to identify the nearest neighbor, which gives the text components; in the next stage, the tails of the candidate text components are used to identify the direction of the text and find nearest neighbors, the objective being to form words. Next, word patches are combined to detect text lines. The datasets used are as follows: Hua's dataset, the ICDAR 2003 dataset, arbitrarily oriented data, non-horizontal data and horizontal data. The proposed method outperforms the existing methods. The strength of the system lies in the fact that it detects arbitrarily oriented text in videos, but a problem with the method is that it may not give good results for horizontal text lines with little spacing. Table 1 provides a detailed analysis of all the above techniques.
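The Sobel step that Shivakumara et al. [30] use to find dominant text pixels can be sketched as follows. This is a pure-Python illustration, not the authors' implementation, and the keep-above-mean rule is a simplifying assumption.

```python
# Sobel gradient magnitude and a simple "dominant pixel" rule:
# keep pixels whose magnitude exceeds the image mean. Real systems
# use an image library; a hand-rolled 3x3 convolution suffices here.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def conv3(img, k, y, x):
    """Apply a 3x3 kernel centered at (y, x)."""
    return sum(k[dy][dx] * img[y + dy - 1][x + dx - 1]
               for dy in range(3) for dx in range(3))

def dominant_pixels(img):
    """Binary map of pixels whose Sobel magnitude exceeds the mean."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = conv3(img, SOBEL_X, y, x)
            gy = conv3(img, SOBEL_Y, y, x)
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    mean = sum(map(sum, mag)) / (h * w)
    return [[1 if m > mean else 0 for m in row] for row in mag]

# A vertical dark-to-bright step edge is flagged as dominant.
img = [[0, 0, 9, 9]] * 5
print(dominant_pixels(img)[2])  # → [0, 1, 1, 0]
```

High-gradient pixels concentrate on character boundaries, which is why the edge components around them make good text candidates.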
Table 1 Detailed analysis of all the above techniques

Author | Method | Advantages | Limitations | Dataset | Precision | Recall | F-measure
Iqbal et al. [1] | K2-algorithm-based text detection using a classified threshold | Better than the entries of ICDAR 2011's reading competition. | The K2 algorithm has a drawback: its high dependence on the ordering of nodes in a structure. | ICDAR 2013 | 84.97% | 62.37% | 71.94%
Jaderburg et al. [2] | Deep features for text spotting | The CNN goes through the whole image once, which simplifies the task; automatic production of word-level and character-level annotations; the classified output added to the efficiency of the method. Better than Wu [31] and Alsharif [32]. | The generated saliency maps may sometimes give bad resolution and wrong results. | ICDAR 2003, 2005, 2011, 2013; Street View Text | X | X | X
Yin et al. [3] | Robust text detection in natural images | Minimized regularized variation is better than linear smoothing or median filtering, which reduce noise but smooth the edges to some degree. | Single-link clustering could lead to a chaining phenomenon, which may produce impractically heterogeneous clusters and difficulties. | ICDAR 2011; multilingual dataset (Chinese and English) | 86.29% | 68.26% | 76.22%
Huang et al. [4] | Text localization in natural images using SFT and Text Covariance Descriptors | The SFT filter and the two TCD classifiers enhanced system performance many fold, leading to high recall and high precision respectively. Better than Neumann [33]. | Using SFT, the direction of the stroke is not determined, so only horizontal text lines can be detected. | ICDAR 2005; ICDAR 2011 | 81% and 82% | 72% and 73% | 74% and 75% respectively
Yao et al. [5] | Strokelets: a learned multi-scale representation for scene text recognition | Strokelets have the URGE properties (usability, robustness, generality and expressivity), which make them better than previous traditional approaches; the proposed algorithm achieved better results than existing algorithms. An RF classifier is used instead of an SVM because it performs better. | A random forest is used for classification, but its regression cannot predict beyond the range found in the training data. | ICDAR 2003 (full), accuracy 80.33%; ICDAR 2003 (half), accuracy 88.48%; SVT, accuracy 72.89%; ICDAR 2013 | X | X | X
Ye et al. [6] | Scene text detection via integrated discrimination of component appearance and consensus | Gamma correlation is used to extract low-contrast text components. Using all hypotheses generated in the method could be time consuming, so they used loose … | If the distance between candidate characters is as large as the text height, the object may be missed during recognition. | ICDAR 2011; SVT | 67.52% | 43.89% | 53.20%
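The F-measures reported in Table 1 are the harmonic mean of precision and recall, which can be checked against, for example, the Yin et al. [3] row:

```python
# F-measure as the harmonic mean of precision and recall,
# the convention used for the figures in Table 1.

def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.8629, 0.6826), 4))  # → 0.7622, matching the table
```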
[14] Gonzalez, A., and L. M. Bergasa. 2013. A Text Reading Algorithm For Natural Images. Image and Vision Computing. 31(3): 255-274.
[15] Gonzalez, A., L. M. Bergasa, and J. J. Yebes. 2014. Text Detection And Recognition On Traffic Panels From Street-Level Imagery Using Visual Appearance. Intelligent Transportation Systems, IEEE Transactions. 15(1): 228-238.
[16] Iqbal, K., X. C. Yin, H. W. Hao, S. Asghar, and H. Ali. 2014. Bayesian Network Scores Based Text Localization In Scene Images. In Neural Networks (IJCNN), 2014 International Joint Conference. Beijing, China. 6-11 July 2014. 2218-2225.
[17] Jaderberg, M., K. Simonyan, A. Vedaldi, and A. Zisserman. 2014. Synthetic Data And Artificial Neural Networks For Natural Scene Text Recognition. arXiv preprint arXiv:1406.2227. 9 June 2014.
[18] Yao, C., X. Bai, and W. Liu. 2014. A Unified Framework For Multi-Oriented Text Detection And Recognition. Image Processing, IEEE Transactions. 23(11): 4737-4749.
[19] Milevskiy, I., and Y. Boykov. 2014. Joint Energy-based Detection and Classification of Multilingual Text Lines. arXiv preprint arXiv:1407.6082. 23 July 2014.
[20] Barlas, P., S. Adam, C. Chatelain and T. Paquet. 2014. A Typed And Handwritten Text Block Segmentation System For Heterogeneous And Complex Documents. In Document Analysis Systems (DAS), 11th IAPR International Workshop. Tours, France. 7-10 April 2014. 46-50.
[21] Breuel, T. M. 2002. Two Geometric Algorithms For Layout Analysis. In Document Analysis Systems V. 19 August 2002. 188-199.
[22] Yin, X., X. C. Yin, H. W. Hao, and K. Iqbal. 2012. Effective Text Localization In Natural Scene Images With MSER, Geometry-Based Grouping And AdaBoost. In Pattern Recognition (ICPR), 2012 21st International Conference. Tsukuba, Japan. 11-15 November 2012. 725-728.
[23] Hanif, S. M., and L. Prevost. 2009. Text Detection And Localization In Complex Scene Images Using Constrained AdaBoost Algorithm. In Document Analysis and Recognition (ICDAR '09), 10th International Conference. Barcelona, Spain. 26-29 July 2009. 1-5.
[24] Ye, Q., J. Jiao, J. Huang and H. Yu. 2007. Text Detection And Restoration In Natural Scene Images. Journal of Visual Communication and Image Representation. 18(6): 504-513.
[25] Shahab, A., F. Shafait, and A. Dengel. 2011. ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text In Scene Images. In Document Analysis and Recognition (ICDAR), 2011 International Conference. Beijing, China. 18-21 September 2011. 1491-1496.
[26] Wolf, C., and J. M. Jolion. 2006. Object Count/Area Graphs For The Evaluation Of Object Detection And Segmentation Algorithms. International Journal of Document Analysis and Recognition (IJDAR). 8(4): 280-296.
[27] Toyama, T., D. Sonntag, A. Dengel, T. Matsuda, M. Iwamura and K. Kise. 2014. A Mixed Reality Head-Mounted Text Translation System Using Eye Gaze Input. In Proceedings of the 19th International Conference on Intelligent User Interfaces. Haifa, Israel. 24-27 February 2014. 329-334.
[28] Kobayashi, T., T. Toyama, F. Shafait, M. Iwamura, K. Kise, and A. Dengel. 2012. Recognizing Words In Scenes With A Head-Mounted Eye-Tracker. In Document Analysis Systems (DAS), 2012 10th IAPR International Workshop. Gold Coast, QLD. 27-29 March 2012. 333-338.
[29] Wang, H. C., Y. Landa, M. Fallon, and S. Teller. 2014. Spatially Prioritized And Persistent Text Detection And Decoding. In Camera-Based Document Analysis and Recognition. 23 August 2014. 3-17.
[30] Shivakumara, P., T. Q. Phan, S. Lu, and C. L. Tan. 2013. Gradient Vector Flow And Grouping-Based Method For Arbitrarily Oriented Scene Text Detection In Video Images. Circuits and Systems for Video Technology, IEEE Transactions. 23(10): 1729-1739.
[31] Wang, T., D. J. Wu, A. Coates, and A. Y. Ng. 2012. End-To-End Text Recognition With Convolutional Neural Networks. In Pattern Recognition (ICPR), 2012 21st International Conference. Tsukuba, Japan. 11-15 November 2012. 3304-3308.
[32] Alsharif, O., and J. Pineau. 2013. End-To-End Text Recognition With Hybrid HMM Maxout Models. arXiv preprint arXiv:1310.1811. 7 October 2013.
[33] Neumann, L., and J. Matas. 2012. Real-Time Scene Text Localization And Recognition. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference. Providence, RI. 16-21 June 2012. 3538-3545.
[34] Wang, K., B. Babenko, and S. Belongie. 2011. End-To-End Scene Text Recognition. In Computer Vision (ICCV), 2011 IEEE International Conference. Barcelona, Spain. 6-13 November 2011. 1457-1464.
[35] Singh, S., A. Gupta, and A. A. Efros. 2012. Unsupervised Discovery Of Mid-Level Discriminative Patches. Computer Vision – ECCV 2012. 73-86.
[36] Nomura, S., K. Yamanaka, O. Katai, H. Kawakami and T. Shiose. 2005. A Novel Adaptive Morphological Approach For Degraded Character Image Segmentation. Pattern Recognition. 38(11): 1961-1975.