
Recognition of handwritten text using long short term memory (LSTM) recurrent neural network (RNN)
Cite as: AIP Conference Proceedings 2095, 030011 (2019); https://doi.org/10.1063/1.5097522
Published Online: 09 April 2019

I. Joe Louis Paul, S. Sasirekha, D. Raghul Vishnu, and K. Surya



© 2019 Author(s).
Recognition of Handwritten Text using Long Short Term
Memory (LSTM) Recurrent Neural Network (RNN)
I Joe Louis Paul1,a), S Sasirekha1,b), D Raghul Vishnu1,c), K Surya1,d)

1Department of Information Technology, SSN College of Engineering, Rajiv Gandhi Salai (OMR), Kalavakkam – 603110, Chennai, Tamilnadu, India.

a)Corresponding author: joelouisi@ssn.edu.in
b)sasirekhas@ssn.edu.in
c)raghulvishnu14087@it.ssn.edu.in
d)suryak14116@it.ssn.edu.in

Abstract. Handwriting recognition is a technique used to produce machine-readable text from a given text image. Here, the handwritten text is captured as an image with a mobile camera. Handwritten characters are usually recognized with Optical Character Recognition (OCR) scanners, but with the widespread use of mobile phones, detecting text from a mobile camera has plenty of applications such as medical script processing and exam script evaluation. A camera image contains far more noise than an OCR-scanned image. Therefore, the image is pre-processed to reduce noise using image processing techniques such as binarization and thresholding. The letters are segmented and extracted from the image, and features in the form of binary codes are extracted from the letters. A neural network classifier is built using a Long Short Term Memory (LSTM) network, which is trained on an already built character dataset and then used to test the input images. The output is provided as a text document containing the recognized words. Since the input is obtained from camera images, the noise is high compared to the existing systems, whose input sets use scanned images; a noise reduction technique, low-intensity pixel removal, is applied to the input image to improve efficiency.

INTRODUCTION
Character recognition is one of the active research fields in computer vision, pattern matching and artificial intelligence. Using character recognition, images of handwritten text are converted into machine-editable text. With handwritten mobile camera images as input, the intention is to design a handwritten character recognition system as a software system which can recognize any handwritten character efficiently. Moreover, it becomes possible to make the computer interpret, understand or learn written or printed text in the particular language specified by the user. Even though the character recognition field has been evolving continuously for many years, with the advent of mobile communication technology the focus here is on applying proven character recognition techniques to mobile camera images.
Nowadays, most information is kept in digitized form. So, in order to create a paperless environment, the system is built to recognize handwritten text and convert it into a text document, so that people can use this information for a longer period of time. Hence, this paper mainly focuses on reducing the human work of typing large documents. Existing systems use Optical Character Recognition (OCR) scanners for recognizing text from images, whereas this paper uses mobile camera images to recognize handwritten text. Fig. 1 shows the difference between OCR and mobile camera images. The proposed system can be used in many fields such as digital character conversion, meaning translation, content-based image retrieval, keyword spotting, signboard translation, text-to-speech conversion and scene image analysis.

Recent Developments in Mathematical Analysis and Computing


AIP Conf. Proc. 2095, 030011-1–030011-13; https://doi.org/10.1063/1.5097522
Published by AIP Publishing. 978-0-7354-1825-7/$30.00


FIGURE 1. OCR versus mobile camera images.

There are two types of handwriting recognition, namely online character recognition and offline character recognition. This work uses offline character recognition so that the user can access the system without the help of the Internet. Figure 2 describes the various types of handwriting recognition, and Fig. 3 illustrates the digital conversion of handwritten text.

FIGURE 2. Types of handwritten character recognition.

FIGURE 3. Typical digital conversion of handwritten text.

Hence, the idea of this proposed work is to design and develop an efficient system which takes its input in a digital image format. The system processes the image for better comparison. Here the use of Long Short Term Memory (LSTM) provides better efficiency and execution time when compared to other neural networks. At the end, the character recognition system gives a prediction of the character with a percentage accuracy.

RELATED WORKS
Vinciarelli et al. proposed an offline recognition system for large-vocabulary cursive handwritten lines of text produced by a single writer. The system is based on continuous-density Hidden Markov Models (HMM) and a statistical language model [1].
Tuan Nguyen and Nakagawa proposed an improved segmentation of online handwritten text using a Recurrent Neural Network (RNN), showing an application of a Bi-directional Long Short Term Memory Recurrent Neural Network to the segmentation of online handwritten English text [2].
Sun et al. proposed a network architecture called the convolutional multi-directional recurrent network (CDRN) for offline handwritten text recognition. The module is used to abstract contextual information in various directions [3].
Voigtlaender et al. proposed handwriting recognition with a large multi-dimensional long short term memory recurrent neural network implemented with the help of a Graphics Processing Unit (GPU), which greatly reduces training times by processing the input in a diagonal-wise fashion. The authors achieve impressive results for handwriting recognition [4, 5, 6, 7].
In recent literature, it is found that the Long Short-Term Memory (LSTM) network, an RNN variant, is in use for character recognition. LSTMs hold information in a gated cell outside the normal flow of the recurrent network. Information can be stored in the cell, written to it or read from it, much like data in a computer's memory. The cell decides what to store and when to allow reading, writing and erasure via gates that open and close. The LSTM architecture allows the network to store information for longer periods of time and prevents gradients from vanishing or exploding.
The key to LSTMs is the cell state, the horizontal line at the top of the standard LSTM diagram. The cell state is like a conveyor belt: it runs through the whole chain, with only a few minor linear interactions, so information can flow along it essentially unchanged. The LSTM has the ability to remove or add information to the cell state, carefully regulated by structures called gates. Gates are a way to optionally let information through; each is composed of a sigmoid neural net layer and a pointwise multiplication operation.
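The gating behaviour described above can be sketched in a few lines of Python. The paper's own implementation used MATLAB's nnstart; the single-unit cell below, with made-up toy weights, is only meant to show how the forget, input and output gates regulate the cell state, not to reproduce the trained network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step for a scalar input and state (toy dimensions).

    w holds weights for the forget (f), input (i) and output (o) gates
    and for the candidate cell value (g)."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g       # cell state: the "conveyor belt"
    h = o * math.tanh(c)         # hidden state, read out through the gate
    return h, c

# Toy weights, chosen only for illustration.
w = {k: 0.5 for k in ["wf", "uf", "bf", "wi", "ui", "bi",
                      "wo", "uo", "bo", "wg", "ug", "bg"]}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:       # a short input sequence
    h, c = lstm_step(x, h, c, w)
```

Because the cell state is updated additively (f * c_prev + i * g) rather than by repeated multiplication, gradients flowing back through c are far less prone to vanishing, which is the property the text above refers to.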

SYSTEM ARCHITECTURE
Character recognition is a classic pattern recognition problem that researchers have been working on since computer vision was first introduced. With today's omnipresence of cameras, the applications of automatic character recognition are wider than ever. For Latin script, this is largely regarded as a solved problem in restricted situations, such as images of scanned documents containing common font characters on a uniform background.
The input image is obtained from the user and subjected to pre-processing steps such as digitization. The input image is first converted to a grayscale image, and the grayscale image is then converted to a binary image based on a threshold value. The edges of the text in the image are identified, and the image is dilated and filled. Then the text is segmented into lines, words and characters, and features are extracted from each character. These features are used to predict the class with a classifier built using a Long Short Term Memory (LSTM) Recurrent Neural Network (RNN). The recognized class labels are appended to a file to produce the desired result.
Figure 4 shows the system flow of the handwriting recognition process. However, images obtained with popular cameras and handheld devices still pose a huge challenge for character recognition, and the dataset used here illustrates the challenging aspects of this problem. Figure 5 shows a sample set of images from the chars74k dataset.

FIGURE 4. System flow of the handwriting recognition process.

FIGURE 5. Sample character set.

The images captured from handheld devices need to be pre-processed to remove the unnecessary areas and focus on the text areas of the camera image. The focused text areas have inadequate lighting and varying stroke width.
The digitization of text can be carried out by a scanner, computer or mobile camera. The images captured with these devices are in gray tone.

Binarization is a technique that converts gray images to binary images. We used a histogram-based threshold approach to convert grayscale images into two-tone images. Morphological operations such as opening and closing are used to remove noise and connect the edge points of disjoint characters. Our assumption is that intensity levels are normalized and reversed, meaning that white pixels indicate intensity 1.
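The binarization step can be sketched as follows. This is a minimal pure-Python version that uses a crude global mean threshold in place of the paper's histogram-based threshold, and it omits the morphological opening/closing; it only illustrates the reversed-intensity convention in which ink pixels become 1.

```python
def binarize(gray, threshold=None):
    """Convert a grayscale image (list of rows, values 0-255) into a
    binary image with text (ink) pixels as 1 and background as 0."""
    flat = [p for row in gray for p in row]
    if threshold is None:
        threshold = sum(flat) / len(flat)  # crude global threshold (mean)
    # Dark pixels below the threshold are ink -> 1 (reversed intensity).
    return [[1 if p < threshold else 0 for p in row] for row in gray]

gray = [[250, 240, 30],
        [245, 20, 25],
        [35, 240, 250]]
binary = binarize(gray)
```

A histogram-based threshold such as Otsu's method would replace the mean with the value that best separates the two peaks of the intensity histogram; the rest of the function is unchanged.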
If the document is carelessly fed to the scanner, the digitized image can be skewed. Skew correction can be performed by rotating the document by the same skew amount in the opposite direction. We rotate the document through a range of angles and determine the maximum row-histogram value at each angle. When the text lines are aligned with the horizontal direction, the maximum row-histogram value is obtained; further rotation of the document decreases it.
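The search described above can be sketched in Python. As a simplification, instead of rotating the whole image, the sketch rotates only the coordinates of the ink pixels and bins their resulting rows, which gives the same row-histogram score; the angle range is an assumption.

```python
import math
from collections import Counter

def skew_score(points, angle_deg):
    """Maximum row-histogram value after rotating the ink-pixel
    coordinates by angle_deg. points: list of (row, col) ink positions."""
    a = math.radians(angle_deg)
    rows = [round(r * math.cos(a) - c * math.sin(a)) for r, c in points]
    return max(Counter(rows).values())

def estimate_skew(points, angles=range(-45, 46)):
    """Pick the rotation that best aligns text rows with the horizontal,
    i.e. the one that maximizes the row-histogram peak."""
    return max(angles, key=lambda a: skew_score(points, a))

# A synthetic stroke slanted at 45 degrees: the correct rotation
# collapses all of its pixels onto a single row.
stroke = [(i, i) for i in range(40)]
angle = estimate_skew(stroke)
```

As the text states, the score peaks when the lines become horizontal and falls off as the document is rotated past that angle.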
We have employed a hierarchical approach for segmentation, performed at different levels: line segmentation, word segmentation, character segmentation and zone-wise modifier isolation.
The histogram plays a central role in segmentation. It is very easy to calculate the boundaries of each line from the histogram of the test image. For the isolation of text lines, the image document is scanned horizontally to count the number of pixels in each row. To build the line histogram, the frequency of black pixels in each row is counted; this count is zero in the gaps between lines.
The column histogram of each segmented line gives us the boundaries of each word. A continuous black-pixel portion of the line is considered a word; if a vertical scan finds no black pixel, it is considered spacing between words. Figure 6 shows the segmentation of words.

FIGURE 6. Segmentation of words.
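The two projection passes above (rows for lines, then columns for words) can be sketched together in pure Python; the tiny binary page below stands in for a real pre-processed image.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries:
    the boundaries where a projection histogram is above zero."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment(binary):
    """Split a binary page (1 = ink) into lines, then each line into words."""
    row_hist = [sum(row) for row in binary]          # horizontal projection
    lines = runs(row_hist)                           # zero count = line gap
    words = []
    for top, bottom in lines:
        band = binary[top:bottom]
        col_hist = [sum(col) for col in zip(*band)]  # vertical projection
        words.append(list(runs(col_hist)))           # zero column = word gap
    return lines, words

page = [[0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 1],
        [1, 1, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0]]
lines, words = segment(page)
```

Character segmentation proceeds the same way, applying the column projection again inside each word span.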

There are 62 class labels in the dataset: handwritten capital and small letters and 10 numerical digits [A-Z a-z 0-9]. The chars74k dataset is a set of photographed text images for the English language. The images are obtained from various sources, such as street scenes, and are processed to focus on the text portion of the image.
The testing set is obtained from handheld mobile devices. The images are pre-processed by binarization, image dilation and filling. The resulting image is similar in characteristics to an OCR image and is used for the prediction of characters.
The binary code is obtained after pre-processing and resizing the image to a standard size. The binary code, along with the class labels, is used to train the neural network. Figure 7 shows the binary code of an image.

FIGURE 7. Binary code of an image.
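The paper does not give its exact resize-and-flatten code, so the following is a plausible pure-Python sketch of the feature extraction: a nearest-neighbour resize of a segmented binary character to a fixed grid, flattened into the binary-code vector (the 8x8 grid size is an assumption for illustration).

```python
def binary_code(glyph, size=8):
    """Resize a binary character image to a size x size grid with
    nearest-neighbour sampling and flatten it into a feature vector."""
    h, w = len(glyph), len(glyph[0])
    code = []
    for r in range(size):
        for c in range(size):
            # Map each output cell back to its nearest source pixel.
            code.append(glyph[r * h // size][c * w // size])
    return code

# A 4x4 'T'-like glyph expanded to an 8x8 = 64-bit binary code.
glyph = [[1, 1, 1, 1],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0]]
code = binary_code(glyph)
```

Fixing the grid size makes every character yield a feature vector of the same length, which is what the LSTM classifier's input layer requires.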

EXPERIMENTAL RESULTS
The experimental work involves image acquisition, pre-processing, feature extraction, training an LSTM RNN, and finally testing by predicting the classes of handwritten images using the weight values from the trained network. Figures 8 to 16 illustrate the various screenshots obtained for the proposed system, showing the input image with and without noise, image dilation, image filling, the Graphical User Interface (GUI), LSTM training and the output text document, respectively. The software tools used for the analysis of the proposed work are MATLAB scripts for image processing, MATLAB GUIDE for GUI development and MATLAB nnstart for the neural network implementation. For implementation, a training set of 18,600 images is used. Binary code is used as the extracted feature of the given input image. A histogram-based hierarchical approach has been used for segmentation, and finally an Artificial Neural Network (i.e., LSTM RNN) has been used for prediction.

FIGURE 8. Input image with noise.

Figure 9 shows the input image without noise while Fig. 10 shows the input image after dilation.

FIGURE 9. Input image without noise.

FIGURE 10. Image dilation.

Figure 11 shows the input image with filling while Fig. 12 shows the final processed image.

FIGURE 11. Image filling.

FIGURE 12. Final processed image.

Figure 13 displays the GUI of the proposed handwritten recognition system.

FIGURE 13. GUI.

Figure 14 displays the number of objects in that image and the extracted characters from the image.

FIGURE 14. Extracted characters from the image.

Figure 15 shows the output documents for the given input image.

FIGURE 15. Output Text Document.

It is found that our proposed system has achieved an accuracy of 88.32% for character recognition on our own custom mobile camera dataset of handwritten English text. Figure 16 illustrates the training accuracy achieved by the proposed system.

FIGURE 16. Training Accuracy.

The performance of the LSTM RNN has been analyzed. Figure 17 shows the Receiver Operating Characteristic (ROC) curve, plotted from the confusion matrix, to measure the performance of the proposed LSTM RNN based handwriting recognition system.

FIGURE 17. Receiver Operating Characteristic (ROC).

Figure 18 depicts the error histogram for 20 bins to visualize the performance of the proposed study.

FIGURE 18. Error Histogram.

Figure 19 displays the neural network validation at the given epoch of 54 with the gradient and validation checks
respectively.

FIGURE 19. Neural Network Validation.

The performance of the LSTM RNN has been compared with existing works. Evidence of the generality and power of LSTM is provided by data from a recent international Arabic recognition competition, in which it outperformed all entries (91.4% accuracy compared to 87.2% for the competition winner) on an OCR-based Arabic language dataset.

CONCLUSION
The aim of this work is to design a system which accepts input images of handwritten text and recognizes the characters in the image by building a neural network. The primary objectives are to build a dataset of mobile camera handwritten images, to reduce the noise in the handwritten text image for efficient character recognition, and to use a recurrent neural network to train the system to recognize characters.
The existing LSTM systems for handwriting recognition use OCR input, which has less noise. The proposed system uses images of handwritten text from mobile cameras (more noise) while aiming for the same accuracy. This study mainly focuses on recognition of handwritten text using an RNN supported by LSTM, and can be applied to exam paper evaluation, medical scripts, document conversion, etc. Here, LSTM is used to carry information over long periods. However, the training period for LSTM is very long and requires intense processing power; moreover, camera images of handwritten text are highly affected by noise, and each person's handwriting varies considerably. Future systems may be proposed to reduce these challenges.

REFERENCES
1. A. Vinciarelli, S. Bengio and H. Bunke, Proceedings of the Seventh International Conference on Document Analysis and Recognition, Edinburgh, 2003, pp. 1101–1105.
2. C. T. Nguyen and M. Nakagawa, Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR 2016), Shenzhen, 2016, pp. 246–251.
3. Z. Sun, L. Jin, Z. Xie, Z. Feng and S. Zhang, Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR 2016), Shenzhen, 2016, pp. 240–245.
4. P. Voigtlaender, P. Doetsch and H. Ney, Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR 2016), Shenzhen, 2016, pp. 228–233.
5. L. Zhang, G. Bai and C. Xuan, Proceedings of the IEEE International Conference on Intelligent Computing and Intelligent Systems, Xiamen, 2010, pp. 670–674.
6. A. Yuan, G. Bai, P. Yang, Y. Guo and X. Zhao, Proceedings of the International Conference on Frontiers in Handwriting Recognition, Bari, 2012, pp. 207–212.
7. Y. Chherawala, P. Roy and M. Cheriet, IEEE Transactions on Cybernetics, 46, 2825–2836 (2016).

