
Research paper


Optical Character System in Banking

Submitted as partial fulfilment for the award of

Session 2020-21
Information Technology

Tulika Sharma, 1703213114

Muskan Gupta, 1703221059
Shivanshdeep, 1703213102

Under the guidance of:

Mr. Manish Kumar Sharma

Designation: Assistant Professor (Senior Scale)



(Formerly UPTU)
Acknowledgment

We would like to acknowledge, with respect and gratitude, the contributions and support of our project guide,
Mr. Manish Kumar Sharma, Assistant Professor (Senior Scale), IT Department, whose expertise, guidance, support,
encouragement, and enthusiasm have made this report possible. We are also thankful to Prof. (Dr.) Amit Sinha,
H.O.D. of the Information Technology Department, and Mr. Ashwin Perti, A.H.O.D. of the Information Technology
Department, for their constant encouragement, valuable suggestions and moral support.

Although it is not possible to name everyone individually, we shall ever remain indebted to the faculty members
of ABES Engineering College, Ghaziabad, for their persistent support and cooperation extended during this work.

This acknowledgment would remain incomplete if we failed to express our deep sense of obligation to our parents
and God for their consistent blessings and encouragement.

Signature of student Signature of student Signature of student

(Name: Tulika Sharma) (Name: Muskan Gupta) (Name: Shivanshdeep)

(Roll No.: 1703213114) (Roll No.: 1703221059) (Roll No.: 1703213102)

Student’s Declaration

We hereby declare that the work being presented in this report entitled “Optical Character System in Banking”
is an authentic record of our own work carried out under the supervision of Mr. Manish Kumar Sharma,
Assistant Professor (Senior Scale), Information Technology.
The matter embodied in this report has not been submitted by us for the award of any other degree.


Signature of student Signature of student Signature of student

(Name: Tulika Sharma) (Name: Muskan Gupta) (Name: Shivanshdeep)

(Roll No.: 1703213114) (Roll No.: 1703221059) (Roll No.: 1703213102)

Department: IT Department: IT Department: IT

This is to certify that the above statement made by the candidate(s) is correct to the best of

my knowledge.

Signature of HOD Signature of Supervisor

(Name: ) (Name:............................. )

(Name of Department) (Designation)

Date:.......................... (Name of Department)

Table of Contents

Acknowledgment
Student’s Declaration
List of Figures
Abstract

Chapter 1 : Introduction
1.1 : Requirements of the project
1.2 : Analysis of Problem Based on Project Objective

Chapter 2 : Literature Survey

Chapter 3 : Proposed Methodology
3.1 : Proposed Algorithm
3.2 : Proposed Model

Chapter 4 : Results and Discussion

Chapter 5 : Conclusion and Future Scope

References
List of Figures
Serial No. Figure name

Fig 1.1 : Bank Cheque

Fig 2 : OCR algorithm flowchart

Fig 3 : Identifying location of the word


Abstract

Entering data from bank records into the bank database is a long and error-prone chore. In this paper we propose a
system that examines bank records and extracts data from them in the form of text, which can then be stored in the
bank database easily.
In the proposed system, the features of every character written in the input data are extracted and then passed to a
neural network. Data sets containing text written by various people are used to train the system. The proposed
recognition system gives high levels of accuracy compared with conventional approaches in this field. The system
can efficiently recognize text and transform it into a structured form that can be stored and organized effortlessly.

I. Introduction

In today’s world, there is a growing demand for software systems that can recognize characters when data is
scanned from paper documents, since a great many books and newspapers on many subjects exist only in print.
A simple way to bring the data in these paper documents into a computer system is to scan the documents and
store them as images. The computer, however, cannot identify the characters in such images when reading them.
For this processing we require software known as a character recognition system.
Our aim is therefore to develop a character recognition software system that performs document image analysis, so
that documents can be transformed from paper format to electronic format. The conversion of paper documents
into electronic format is an ongoing task in many organizations, especially in research and development (R&D)
departments, large businesses and enterprises, government institutions, and so on.
Character recognition and conversion is the electronic or mechanical conversion of images of handwritten,
typewritten or printed text into machine-encoded text. It is the method by which printed texts are digitized so that
they can be searched electronically and stored more compactly. Information is scanned from paper documents,
stored as images, and then processed into text. This process is hard for computers to perform, because any scanned
document is a graphical file, that is, a pattern of pixels. Once the text has been recognized, it becomes feasible to
extract useful information, and the machine-readable text can then be used for different purposes. The character
recognition and conversion system is based on a grid infrastructure and performs various processes such as image
analysis and processing of electronic documents converted from paper format. The aim of this project is to detect,
extract and recognize text from images, particularly bank forms, using the OCR algorithm; for example, extracting
the cheque number, amount, and similar data from a bank cheque.

Fig 1.1 : Bank Cheque

II. Literature Survey

OCR is used to find and search for text in electronic documents and papers, or to publish the text on websites.
It is also used in CAPTCHA-related work and in optical music recognition.
Another use of OCR is in computer vision research and in the design of systems that can detect and analyse
computer-printed papers and handwritten text.
Earlier work has dealt with the combined features of OCR and has mainly focused on applications such as an
image-based Sudoku solver, car license plate detection and recognition, and recognition of handwritten and
computer-printed documents. Another work defined a data structure that presents OCR information to an operator
through an interface built with HTML and JavaScript. The data structure contains the OCR result with a confidence
level for every field and letter.

III. Proposed Work

In this paper, a feature extraction scheme for the recognition of handwritten characters is proposed. An
overview of the system is given in the block diagram of Figure 2.

Fig 2 : OCR algorithm flowchart

The chief idea in automatic pattern recognition is to first teach the machine the classes of patterns that may
occur and what the members of each class have in common. In OCR the patterns are letters, numbers and special
symbols such as commas and question marks, while the various classes correspond to the various characters.
Our proposed system is an OCR system based on a grid infrastructure, that is, a character recognition system that
supports the identification of characters in various languages. The grid infrastructure removes the problem of
language-specific character identification and supports many operations performed on the data.

3.1 Image Acquisition

In image acquisition, the recognition system acquires a scanned image as its input. The image must be in a
specific format such as JPEG or BMP.
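
The format requirement can be checked before any further processing. The following is only a minimal sketch in Python, not part of the proposed system: it inspects the leading bytes of a file, relying on the fact that JPEG files begin with the bytes FF D8 FF and BMP files begin with the ASCII characters "BM". The function names `detect_image_format` and `is_accepted_scan` are hypothetical.

```python
def detect_image_format(header: bytes) -> str:
    """Return 'JPEG', 'BMP' or 'unknown' based on the leading bytes of a file."""
    if header[:3] == b"\xff\xd8\xff":  # JPEG start-of-image marker plus next marker byte
        return "JPEG"
    if header[:2] == b"BM":            # BMP signature
        return "BMP"
    return "unknown"


def is_accepted_scan(path: str, accepted=("JPEG", "BMP")) -> bool:
    """Read the first few bytes of the file at `path` and check its format."""
    with open(path, "rb") as handle:
        return detect_image_format(handle.read(8)) in accepted
```

A scan that fails this check would be rejected before the pre-processing stage.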

3.2 Pre-processing

Once the scanner has scanned an image, there may be a certain amount of noise in it, or the characters may be
broken. Pre-processing enhances the image so that the characters can be recognized successfully.
Techniques used in this process:
• De-Skew
• Despeckle
• Binarization
• Removal of line
• Zoning
• Detection of Line and Word
• Script identification
• Segmentation
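
As an illustration of one of the techniques listed above, binarization, the sketch below thresholds a greyscale image using Otsu's method. The paper does not state which binarization method is used, so Otsu's method is an assumption made for this example, as are the function names and the representation of the image as a list of rows of 8-bit grey levels.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold for a 2-D list of 8-bit grey levels (0-255)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]           # pixels with value <= level
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark   # pixels with value > level
        if weight_light == 0:
            break
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(gray):
    """Map dark pixels (ink) to 1 and light pixels (paper) to 0."""
    threshold = otsu_threshold(gray)
    return [[1 if value <= threshold else 0 for value in row] for row in gray]
```

The convention that ink becomes 1 and paper becomes 0 assumes dark writing on a light background, as on a typical cheque.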

3.3 Segmentation
Segmentation is the third step. In this process, an image containing a group of characters is divided into
meaningful sub-images. The pre-processed input image is segmented into isolated characters, and a labelling
process assigns a number to every character.
Fig 3 : Identifying location of the word
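
The labelling process can be sketched as connected-component labelling on the binarized image. This is an assumed realization, since the paper does not name the labelling algorithm: each 8-connected group of ink pixels receives a number, and its bounding box is recorded so that the character can be cropped. The function name `label_components` is hypothetical.

```python
from collections import deque


def label_components(binary):
    """Label 8-connected regions of 1-pixels.

    Returns (labels, boxes): `labels` has the same shape as `binary`, and
    `boxes` maps each label to (top, left, bottom, right), inclusive.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    boxes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue
            next_label += 1
            labels[r][c] = next_label
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            boxes[next_label] = (top, left, bottom, right)
    return labels, boxes
```

Labels are assigned in row-major scan order, which is not necessarily reading order; sorting the boxes by their left edge within each text line would be a further step. Characters that touch, or characters made of several strokes, would also need extra handling that this sketch does not attempt.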

3.4 Feature Extraction

Feature extraction means reducing the input data to a set of attributes, that is, finding the important
characteristics that make a pattern recognizable. This method:
1) Extracts properties that identify a character uniquely.
2) Extracts properties that differentiate between similar characters.
Technique used: Gabor feature extraction.
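
A minimal sketch of Gabor feature extraction follows. It builds a real-valued Gabor kernel (a Gaussian envelope multiplied by a cosine wave) at several orientations and uses the mean absolute filter response at each orientation as one feature. The kernel size, sigma, wavelength, number of orientations and the choice of mean absolute response as the feature are all assumptions for illustration; the paper does not give these parameters.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter as a size x size list (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine wave varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def filter_energy(image, kernel):
    """Mean absolute response over all positions where the kernel fits entirely."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            response = sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(k) for j in range(k))
            total += abs(response)
            count += 1
    return total / count if count else 0.0


def gabor_features(image, orientations=4, size=5, sigma=2.0, wavelength=4.0):
    """One feature per orientation, with angles evenly spaced over [0, pi)."""
    return [filter_energy(image,
                          gabor_kernel(size, sigma, n * math.pi / orientations, wavelength))
            for n in range(orientations)]
```

For a character image dominated by vertical strokes, the feature at orientation 0 (wave varying horizontally) is expected to be the largest, which is what lets such features separate characters with different stroke directions.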

3.5 Classification
Every character is assigned to a specific category. The extracted features can then be passed to the neural
network (NN), which is trained to recognize characters. Classification is done using a technique called the
k-nearest neighbours (KNN) classifier.
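
The KNN step can be sketched as follows: a character's feature vector is compared with labelled training vectors, and the most common label among the k closest ones is returned. Euclidean distance, k = 3 and the function name `knn_classify` are assumptions for this example.

```python
import math
from collections import Counter


def knn_classify(sample, training_set, k=3):
    """Classify `sample` by majority vote among its k nearest training vectors.

    `training_set` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(training_set,
                        key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

In use, `training_set` would hold feature vectors extracted from handwriting samples of known characters, and `sample` the feature vector of a segmented character from the scanned document.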

3.6 Post-processing

OCR accuracy can be increased if the output is constrained by a lexicon, a list of words that are permitted to
occur in a document. This could be, for example, all the words in the English language, or a more technical
dictionary for a particular field.
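
One way to apply a lexicon, shown here only as a sketch, is to replace each recognized word with the closest lexicon entry when the two differ by a small number of character edits. The use of Levenshtein edit distance, the limit of two edits and the sample lexicon are assumptions for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_word(word, lexicon, max_distance=2):
    """Return the nearest lexicon entry, or `word` itself if none is close enough."""
    if word in lexicon:
        return word
    best = min(lexicon, key=lambda entry: edit_distance(word, entry))
    return best if edit_distance(word, best) <= max_distance else word


# Hypothetical lexicon for the words printed or written on a cheque.
CHEQUE_LEXICON = ["account", "amount", "cheque", "rupees", "thousand"]
```

With this lexicon, a misread word such as "chegue" is mapped to "cheque", while a word that is far from every entry is left unchanged. Numeric fields such as the cheque number would need a different constraint, for example a fixed digit count.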


IV. Results and Discussion

OCR technology provides fast, automated information collection, which can save considerable time and labour
costs in the banking system.
The system has various benefits: it automates tedious tasks, reduces time complexity, reduces the size of the
database, and adapts better to inputs it was not trained on, with only a small number of attributes to compute.

The suggested system can be implemented using software such as MATLAB. The project is based on a machine
learning (ML) OCR algorithm. The scanned image is taken as input and a feed-forward architecture is applied.

The neural network consists of an input layer with fifty-four inputs, two hidden layers of one hundred neurons
each, and an output layer with twenty-six neurons. The network is trained by gradient descent backpropagation
with momentum and an adaptive learning rate, using the log-sigmoid transfer function. The neural network was
trained on a known dataset. The number of input nodes is chosen according to the number of attributes.
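
The network structure described above can be sketched in Python as follows. The layer sizes (54, 100, 100, 26) and the log-sigmoid transfer function come from the description; the random weight initialization, the learning rate and momentum values, and all function names are assumptions. Only the forward pass and a single momentum update are shown. Backpropagation of the gradients and the adaptive learning-rate rule are omitted.

```python
import math
import random

LAYER_SIZES = [54, 100, 100, 26]  # inputs, two hidden layers, outputs


def logsig(x):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def build_network(seed=0):
    """Create (weights, biases) per layer with small random weights (assumed init)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(features, layers):
    """Propagate a 54-element feature vector through the network."""
    activation = features
    for weights, biases in layers:
        activation = [logsig(sum(w * a for w, a in zip(row, activation)) + b)
                      for row, b in zip(weights, biases)]
    return activation


def momentum_update(weight, gradient, velocity, learning_rate=0.01, momentum=0.9):
    """One gradient-descent-with-momentum step for a single weight."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity
```

If the twenty-six outputs correspond to the letters A to Z, which is an assumption the description suggests but does not state, the predicted letter is the one whose output neuron has the highest activation.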


V. Conclusion and Future Scope

We have chosen Optical Character Recognition as the fundamental method for recognizing characters.
Converting paper documents to electronic format is a tedious job that is also error-prone when done manually.
A bank handles many documents every day whose information needs to be stored in the bank database. We have
therefore proposed a system based on the Optical Character Recognition algorithm that automates this task and
so makes it effortless.

This project should be helpful in the banking field, especially during the pandemic and afterwards.
OCR should prove to be a robust tool for future data-entry applications in banking.
The Optical Character Recognition software can be enhanced in the future in several ways: training and
recognition speeds can be improved, and the software can be made more user-friendly. This project can serve as
a stepping stone for such work.

References

[1] W. N. Manegoli and Prof. P. Desai, “Optical Character Recognition for running Hand writing”, International
Research Journal of Engineering and Technology, Vol. 4, No. 5, pp. 793–795, 2016.
[2] N. Agarwal and M. Yadav, “Hand writing Recognition System – A Review”, International Journal of
Computer Applications, Vol. 214, No. 19, pp. 46–40, 2015.
[3] P. Senior and P. Robinson, “An Offline running Hand writing Recognition System”, IEEE Transactions on
Pattern recognition and Machine Intelligence, Vol. 23, No. 3, pp. 309–312, April 2014.
[4] Ray Smith, “Hybrid Page Layout Analysis by a Tab-Stop Detection”, Proceedings of the 11th international
conference on document analysis and recognition, 2009.
