
OCR Based Text Recognition & Translation

Capstone Project Report


December Evaluation

Submitted by:

(101603179) MAGANJOT SINGH

(101603121) HIMANSHU GOYAL

(101611024) HIMANSHU SHARMA

(101610030) GEETIKA

BE Third Year, Computer Engineering

CPG No: 56

Under the Mentorship of

Dr. Singara Singh

Associate Professor

Computer Science and Engineering Department

TIET, Patiala

December 2019
ABSTRACT

The aim of the project 'Multilingual OCR' is to develop OCR software for online/offline
handwriting recognition. OCR (Optical Character Recognition) is the mechanical or electronic
translation of images of handwritten or typewritten text (usually captured by a scanner) into
machine-editable text, which can then be summarized. OCR is a field of research in pattern
recognition, artificial intelligence and machine vision.
Handwriting recognition most often describes the ability of a computer to translate human
writing into text. This may take place in one of two ways: either by scanning written text or by
writing directly on peripheral input devices.
Text summarization is the process of creating a condensed form of a text document that
maintains the significant information and general meaning of the source text. Automatic text
summarization has become an important way of finding relevant information precisely in large
texts, in a short time and with little effort.

DECLARATION

We hereby declare that the design principles and working prototype model of the project
entitled OCR Based Text Recognition & Translation is an authentic record of our own work
carried out in the Computer Science and Engineering Department, TIET, Patiala, under the
guidance of Dr. Singara Singh during the 6th semester (2019).

Date: 18th December 2019

Roll No. Name Signature

101603179 Maganjot Singh

101603121 Himanshu Goyal

101611024 Himanshu Sharma

101610030 Geetika

Counter Signed By:

Faculty Mentor:

Dr. Singara Singh

Associate Professor

Computer Science & Engineering Department,

TIET, Patiala

ACKNOWLEDGMENT

We would like to express our thanks to our mentor Dr. Singara Singh. He has been of great
help in our venture, and an indispensable resource of technical knowledge. He is truly an
amazing mentor to have.

We are also thankful to Dr. Maninder Singh, Head, Computer Science and Engineering
Department, entire faculty and staff of Computer Science and Engineering Department, and
also our friends who devoted their valuable time and helped us in all possible ways towards
successful completion of this project. We thank all those who have contributed either directly
or indirectly towards this project.

Lastly, we would also like to thank our families for their unyielding love and encouragement.
They always wanted the best for us, and we admire their determination and sacrifice.

Date: 18th December 2019

Roll No. Name Signature

101603179 Maganjot Singh

101603121 Himanshu Goyal

101611024 Himanshu Sharma

101610030 Geetika

TABLE OF CONTENTS

ABSTRACT
DECLARATION
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

1. INTRODUCTION
1.1 Project Overview
1.1.1 Technical Terminology
1.1.2 Problem Statement
1.1.3 Goal
1.1.4 Solution
1.2 Need Analysis
1.3 Research Gaps
1.4 Problem Definition and Scope of the Project
1.5 Assumptions & Constraints
1.6 Approved Objectives
1.7 Methodology Used
1.8 Summary of Project Outcomes
1.9 Novelty of Work

2. REQUIREMENT ANALYSIS
2.1 Literature Survey
2.1.1 Theory Associated With Problem Area
2.1.2 Existing Systems and Solutions
2.1.3 Research Findings for Existing Literature
2.1.4 The Problem That Has Been Identified
2.1.5 Survey of Tools and Technologies Used
2.2 Standards
2.3 Software Requirement Specifications
2.3.1 Introduction
2.3.1.1 Purpose
2.3.1.2 Intended Audience and Reading Suggestions
2.3.1.3 Project Scope
2.3.2 Overall Description
2.3.2.1 Product Perspective
2.3.2.2 Product Features
2.3.3 External Interface Requirements
2.3.3.1 User Interfaces
2.3.3.2 Hardware Interfaces
2.3.3.3 Software Interfaces
2.3.4 Other Non-Functional Requirements
2.3.4.1 Performance Requirements
2.3.4.2 Safety Requirements
2.3.4.3 Security Requirements
2.4 Cost Analysis
2.5 Risk Analysis

3. METHODOLOGY ADOPTED
3.1 Investigative Techniques
3.2 Proposed Solution
3.3 Work Breakdown Structure
3.4 Tools & Technologies Used

4. DESIGN SPECIFICATIONS
4.1 System Architecture
4.2 Design Level Diagram
4.3 User Interface Diagrams

5. IMPLEMENTATION AND EXPERIMENTAL RESULTS
5.1 Experimental Setup
5.2 Experimental Analysis
5.2.1 Data
5.2.2 Performance Parameters
5.3 Working of the Project
5.3.1 Procedural Work Flow
5.3.2 Algorithmic Approaches Used
5.3.3 Project Deployment
5.3.4 System Screenshots
5.4 Testing Process
5.4.1 Test Plan
5.4.1.1 Features to be Tested
5.4.1.2 Test Strategy
5.4.1.3 Test Techniques
5.4.2 Test Cases
5.4.3 Test Results
5.5 Results and Discussions
5.6 Validation of Objectives

6. CONCLUSIONS AND FUTURE DIRECTIONS
6.1 Conclusions
6.2 Environmental, Economic, Societal Benefits
6.3 Reflections
6.4 Future Work

7. PROJECT METRICS
7.1 Challenges Faced
7.2 Relevant Subjects
7.3 Interdisciplinary Knowledge Sharing
7.4 Peer Assessment Matrix
7.5 Role Playing and Work Schedule
7.6 Student Outcomes Description and Performance Indicators
7.7 Brief Analytical Assessment

APPENDIX-A REFERENCES

APPENDIX-B PLAGIARISM REPORT
LIST OF TABLES

Table No. Caption

Table 1 Assumptions and Constraints
Table 2 Research Findings for Existing Literature
Table 3 Cost Analysis
Table 4 Risk Analysis in empty spot detection
Table 5 Risk Analysis in localization and guidance
Table 6 Peer Assessment Matrix
Table 7 Role Playing and Work Schedule
Table 8 Student Outcomes
LIST OF FIGURES

Figure No. Caption

Figure 1 Proposed Solution
Figure 2 Work Breakdown Structure
Figure 3 Component Diagram
Figure 4 Design Level Diagram
Figure 5 MVC Tier Architecture
Figure 6 Basic Architecture
Figure 7 User Interface Diagram
Figure 8 OCR Experimental Flow
Figure 9 Image Segmentation
Figure 10 Activity Diagram
Figure 11 Simple Thresholding Output
Figure 12 Adaptive Thresholding Output
Figure 13 OTSU
Figure 14 Document Scanning Output
Figure 15 Component Diagram
Figure 16 Deployment Diagram
Figure 17 System Prototype Screenshots
Figure 18 Test Inputs
Figure 19 Test Outcomes
INTRODUCTION

1.1 Project Overview


The project revolves around OCR, language translation and summarization of text. The human mind
can perceive the message(s) conveyed by an image simply by looking at it: we see the text in the
image and can read it according to our understanding. Computers do not work this way; they
require the content in a more concrete, robust and organized form that the system can interpret.

How is this actually achieved? The image is first scanned and the text and visual contents are
converted into a bitmap, which is essentially a matrix of monochrome dots. The image is then
pre-processed: brightness, contrast and contours are adjusted to enhance the accuracy of the
process. The graphic is then broken into segments identifying the areas of interest (areas
comprising images and text), which starts off the abstraction. The areas containing text can be
divided further into lines, words and characters, by virtue of which the software is able to map
the characters to the datasets through comparison, identification and various other detection and
matching algorithms. The conclusive result is the text extracted from the input image. The process
might not turn out to be 100% accurate and might require human aid and intelligence to correct
elements that could not be scanned properly due to irregular surfaces and lighting. A dictionary
or other NLP techniques can also be applied to handle the errors.

Further down, for the translation part, the Google Translate AJAX API will be used, which
provides a simple programmatic interface for detecting the language of an arbitrary string and
translating it into any supported language, using modern neural machine translation techniques
such as LSTMs. It is highly responsive, so web platforms and applications can easily interact with
the integrated API for fast, efficient translation of the detected/input text from the automatically
detected source language to the chosen destination language (such as French to English). Finally,
summarization is done using the Python Natural Language Toolkit (NLTK).

1.1.1 Technical Terminology

Natural Language Toolkit- NLTK provides many corpora and lexical resources, such as
WordNet, along with text processing libraries for operations such as classification, tagging,
parsing, tokenization, stemming, and semantic reasoning. In addition, wrappers for
industrial-strength libraries can be accessed through NLTK.
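As a quick illustration, the minimal sketch below tokenizes a sentence, drops English stopwords
and stems the remaining words (the sample sentence and the choice of the Porter stemmer are our
own, not part of the project code):

    import nltk
    # one-time downloads of the tokenizer models and stopword list
    nltk.download('punkt')
    nltk.download('stopwords')
    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    sentence = "OCR converts scanned pages into machine-editable text."
    tokens = word_tokenize(sentence)
    # keep alphabetic tokens that are not stopwords, then stem them
    stemmer = PorterStemmer()
    stems = [stemmer.stem(t) for t in tokens
             if t.isalpha() and t.lower() not in stopwords.words('english')]
    print(stems)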

Kick-off- To initiate or begin a process

Optical Character Recognition- Recognising Text/Characters from images that might contain
noise/other interference

1.1.2 Problem Statement


Mankind’s struggle to communicate meaning ‘twixt pen and mind has long been a problem for
anyone wishing to converse or convey effectively. Optical Character Recognition, widely known
as OCR, is a state-of-the-art answer to this nearly never-ending riddle, brought into present-day
requirements, enabling machines to understand humanistic graphics, symbols and texts.
How many times have you been frustrated that you cannot select the text in an image? How often
could you have found the image you were searching for by simply typing a keyword you know it
contains, only to realize that your computer is not capable of “reading” the contents of that image?
How much time have you wasted typing words that already exist in picture form?

1.1.3 Goal
To build a website that provides a one-stop solution for easy understanding of signs/text in other
languages by incorporating OCR, translation of the recognized text, and an option to summarize
long texts, in a way that puts minimum load on the servers and provides maximum responsiveness
and reliability.

1.1.4 Solution

• Build an OCR module that identifies text from input images and sends it to the translation module
• Provide language options for translation of the output
• Provide options for summarization and text-to-speech conversion of the text
User:

➢ Input Image for recognition: The Recognition is done on the basis of input by the user,
be it an already clicked image to upload, or a live image input
➢ Translation: User can translate the recognised text into the given choices of languages.
➢ Summarization: The system allows users to summarize the text if it is too lengthy and
complex to digest quickly.
➢ Text to Speech: The user can also convert the text to speech for ease of access

1.2 Need Analysis


• OCR software is not yet widely studied worldwide and needs to be studied at a deeper
level.
• Current OCR modules are not highly accurate on handwritten corpora.
• English-language modules are the only OCR modules readily available.
• For people who travel and find it difficult to interpret texts in other languages, OCR software
comes to the rescue.
• In an era dominated by concepts and techniques of AI and machine learning, OCR becomes
an appropriate target of study.

1.3 Research Gaps


• We will not be able to deploy this as an Android app as of yet, since we do not yet have the
required expertise.
• We need a system that is able to process highly variable handwriting styles through
advanced pre-processing.
• Noise is random variation in an image that can make the text in the image more difficult
to read. Some noise cannot be removed in Tesseract's binarization step, which can lead to a
drop in accuracy rates.
• Dark borders around an image can be picked up as extra characters, which would result in
wrong output.
• The degree of skewness directly affects the quality of OCR by reducing the quality of
Tesseract's line segmentation. To resolve this, the page should be rotated so that the text
lines are horizontal.
• Converting an image to black and white: Tesseract uses the Otsu algorithm to do this, but
the results are not always faithful to the actual image.

1.4 Problem Definition and Scope
The scope of this project is outlined as follows:-

• Find out the advantages of building a unified platform to recognise, translate and summarize
text.

• Study the feasibility and different options to enhance the user's experience and the potency of
OCR.
• Develop a prototype system that integrates a backend system and a mobile client running
on a smartphone.
• Evaluate and review the strengths and weaknesses of the project outcome, and identify areas
for further improvement.

1.5 Assumptions and Constraints


The following table describes the assumptions and constraints made for our project:-
TABLE 1: Assumptions and Constraints

1. It is assumed that this is a web app used by the general public, who love to travel to
different places, for:
• Reading text written in other languages on various articles
• Summarizing lengthy texts
Assuming both tasks are single-user tasks, we have designed a distributed translation system
that can simultaneously convert the text into other languages.

2. It is assumed that the database for the translation model will be made available.

3. The requirements specified in the SRS may be affected by various factors, including:
• The workability of system modules such as those dealing with simultaneous translation.
• A basic word-count module being set up.
To facilitate the decision whether to accept or reject a job, users are assumed to have a fair
estimate of job execution times.

4. It is assumed that the product can always be used on mobile phones that have sufficient
performance. If the required hardware resources are not available on the phone, there may be
scenarios where the application does not work as intended or even at all.

1.6 Approved Objectives

• To develop a user-friendly webpage

• Easy scanning of handwritten text image input

• Translation of scanned text to language of choice

• Achieve text summarization in multiple target languages

1.7 Methodology Used


The complete process is explained by the following flow:-

• High-quality image input from the camera
• Pre-processing of the image using thresholding and a four-point transform for better outputs
• Optical character recognition for different variations of fonts and texts
• Integrating the Google AJAX API for translation to other languages
• Summarization using NLP

1.8 Project Outcomes and Deliverables


Outcomes:-

• Detection of text through camera


• Image/Video to Digital Text Conversion
• Translation of Text to Various Languages
• Text Summarization
• Text to speech conversion (Future Scope)

Deliverables:-
1. A website with the following integrated features:
i OCR
ii Summarization
iii Translation
iv Important Words Highlight
2. A working model
3. Trained Dataset

1.9 Novelty of work


• Our project minimizes the human effort and time spent executing all the tasks that our
product provides on a unified platform.
• We integrate the OCR engine, translator and summarizer with a website, whereas most
systems use Android apps, which usually eat up space, are difficult to maintain and
sometimes face bugs.
• We can store all the translated text in the cloud for future use.
• With the help of this mechanism we aim to build efficient and interactive future smart
classes.
• We aim to reduce academic and professional language barriers.
• Our product will also help preserve old books, eventually saving a lot of human effort.

REQUIREMENT ANALYSIS

In software and systems engineering, requirement analysis encompasses the tasks that go into
determining the requirements or conditions to be satisfied for a new or altered product or
project, taking account of the possibly conflicting requirements of the various stakeholders,
and analysing, documenting, verifying and managing software or system requirements.

2.1 Literature Survey


The idea of creating this product has become possible through vast libraries like Tesseract that
have been developed and deployed in Python environments. One of the critical issues associated
with smart classrooms is the language barrier created by cultural differences. In current times it
is the need of the hour that classrooms be multilingual, which our website makes possible.
Over time, Optical Character Recognition technology has become more precise, accurate and
complex due to advancements in technology areas like machine learning, NLP and AI.
Nowadays, OCR software uses image pre-processing techniques along with various feature
identification and detection methods, and text mining, to transform documents more quickly and
accurately.

2.1.1 Theory Associated With Problem Area


Mankind’s struggle to communicate meaning ‘twixt pen and mind has long been a problem for
anyone wishing to converse or convey effectively. Optical Character Recognition, widely known
as OCR, is a state-of-the-art answer to this nearly never-ending riddle, brought into present-day
requirements, enabling machines to understand humanistic graphics, symbols and texts.

2.1.2 Existing Systems and Solutions


• OCR: existing works include:-

➢ Google Translate uses the live camera or a pre-saved image for text recognition and translation.
➢ Apps like Google Keep or Evernote are very useful for occasional OCR work.
➢ The Text Fairy app transforms images to text in no time: the image is first enhanced and then
various pre-processing steps are applied.
➢ CamScanner is an app that enables us to make PDFs from scanned copies and also serves as a
means to store documents.
• Summarization:-
➢ SMMRY (pronounced SUMMARY) was created to summarize paragraphs and long articles
into shorter summarized articles.
➢ SMMRY reduces the text to its important sentences using algorithms.
➢ It accomplishes its targets by:
▪ Ranking sentences by priority using algorithms
▪ Removing transition phrases
▪ Removing excessive examples
• Translation:-
➢ Microsoft Translator is another popular translation app. It features instant translation
through the camera, though it supports about half as many languages as Google Translate.
➢ Online-Translation.com is known for providing automated online translation to and from
most European languages. High-quality translations are provided by this service with the
help of its unique linguistic technologies.

2.1.3 Research Findings for Existing literature

All the literature findings from various research papers have been listed below:-
TABLE 2: Research Findings for Existing Literature

1. (101603121, Himanshu Goyal) Paper: "Selecting automatically pre-processing methods to
improve OCR performances". Tools/Technology: Convolutional Neural Network, pre-processing
method selection. Findings: the effectiveness of a pre-processing method depends on the nature
of the OCR and the type of distortion included in the document. Citation: Quang Anh Bui,
David Mollard, Salvatore Tabbone.

2. (101603121, Himanshu Goyal) Paper: "Preprocessing Techniques in Character Recognition".
Tools/Technology: processing techniques. Findings: filters, transformations, histogram
processing and image enhancement techniques. Citation: Yasser Alginahi.

3. (101603121, Himanshu Goyal) Paper: "Optical Character Recognition Techniques".
Tools/Technology: OCR, templates. Findings: OCR techniques based on neural networks provide
more accurate results than other techniques. Citation: Sukhpreet Singh.

4. (101603121, Himanshu Goyal) Paper: "Handwritten Character Recognition – A Review".
Tools/Technology: Artificial Neural Network, K-Nearest Neighbour, Support Vector Machine.
Findings: the best recognition accuracy was reported for binarization, noise removal,
skeletonization and normalization as preprocessing methods. Citation: Surya Nath R S,
Afseena S.

5. (101603121, Himanshu Goyal) Paper: "Thresholding: A Pixel-Level Image Processing
Methodology Preprocessing Technique for an OCR System for Brahmi Script".
Tools/Technology: OCR, pixel-level processing algorithms, thresholding. Findings: various
thinning and thresholding algorithms yield better results for an input image. Citation: H.K.
Anasuya Devi.

6. (101611024, Himanshu Sharma) Paper: "Locating text in complex color images".
Tools/Technology: text location, OCR, color segmentation, spatial variance. Findings: the
methods give good results on a variety of test images; the algorithms are relatively robust to
variations in font, color, or size of the text. Citation: Yu Zhong, Kalle Karu, Anil K. Jain.

7. (101611024, Himanshu Sharma) Paper: "A Review on Optical Character Recognition
Techniques". Tools/Technology: character recognition system, image segmentation, OCR,
preprocessing, skew correction, classifier. Findings: a projection-profile-based method is used
for segmentation, the Fourier transform technique for preprocessing, and a nearest-neighbour
classifier for classification. Citation: Hiral Modi, M. C. Parikh.

8. (101611024, Himanshu Sharma) Paper: "Handwriting Recognition – 'Offline' Approach".
Tools/Technology: handwriting identification, feature extraction, handwriting individuality,
large-scale systems for offline analysis. Findings: the design of human-computer interfaces based
on handwriting is part of a tremendous research effort, together with speech recognition,
language processing and translation, to facilitate communication of people with computer
networks. Citation: P. Shankar Rao, J. Aditya.

9. (101611024, Himanshu Sharma) Paper: "Detecting Text Based Image With Optical Character
Recognition for English Translation and Speech using Android". Tools/Technology: Android,
OCR, text translator, text to speech. Findings: an Android application is developed by
integrating the Tesseract OCR engine, the Bing translator and the phone's built-in speech
technology. Citation: Sathiapriya Ramiah, Tan Yu Liong, Manoj Jayabalan.

10. (101603179, Maganjot Singh) Paper: "Optical Character Recognition of Bangla Characters
using neural network: A better approach". Tools/Technology: word detection, zoning, character
separation and character recognition. Findings: the OCR system gives much better results in
terms of performance and accuracy than the existing usual approach, due to efficient line and
word detection, zoning, character separation and character recognition, and the employment of
skewness correction, thinning, better scaling and a neural network for character detection.
Citation: Ahmed Asif Chowdhury, Ejaj Ahmed, Shameem Ahmed, Shohrab Hossain, Chowdhury
Mofizur Rahman.

11. (101603179, Maganjot Singh) Paper: "Optical Character Recognition (OCR) System".
Tools/Technology: OCR, Neural Network, Fuzzy Logic. Findings: the network has been trained
and tested for a number of widely used fonts; recognition of new font characters by the system
is very easy and quick; the information in documents can be edited conveniently and reused as
and when required. Citation: Najib Ali Mohamed Isheawy, Habibul Hasan.

12. (101603179, Maganjot Singh) Paper: "Optical Character Recognition by Open Source OCR
Tool Tesseract: A Case Study". Tools/Technology: OCR, open source, Tesseract. Findings:
Tesseract is a command-based, open-source tool available in the form of a Dynamic Link
Library (DLL); it can easily be made available in graphics mode. Citation: Chirag Patel, Atul
Patel, Dharmendra Patel.

13. (101603179, Maganjot Singh) Paper: "Extraction of Text from an Image and its Language
Translation Using OCR". Tools/Technology: text extraction, Android, OCR, Tesseract. Findings:
OCR is easily portable, and its scalability allows it to recognize various languages. Citation:
G. R. Hemalakshmi, M. Sakthimanimala, J. Salai Ani Muthu.

14. (101610030, Geetika) Paper: "Optical character recognition technique algorithms".
Tools/Technology: OCR, handwritten character recognition, neural network recognition
technique. Findings: the proposed method computes the error rate efficiently, resulting in
increased neural-network accuracy. Citation: N. Venkata Rao, Dr. A.S.C.S. Sastry, A.S.N.
Chakravarthy, Kalyanchakravarthi P.

15. (101610030, Geetika) Paper: "Diagonal based feature extraction for handwritten alphabets
recognition system using neural network". Tools/Technology: handwritten character recognition,
image processing, feature extraction, feed-forward neural networks. Findings: the diagonal
method of feature extraction yields the highest recognition accuracy: 97.8% for 54 features and
98.5% for 69 features. Citation: J. Pradeep, E. Srinivasan, S. Himavathi.

16. (101610030, Geetika) Paper: "Optical Character Recognition Using Artificial Neural
Network". Tools/Technology: OCR, Artificial Neural Network, supervised learning, Multi-Layer
Perceptron, the back-propagation algorithm. Findings: despite the computational complexity
involved, artificial neural networks offer several advantages in back-propagation networks and
classification, in the sense of emulating adaptive human intelligence to a small extent. Citation:
Sameeksha Barve.

17. (101610030, Geetika) Paper: "Summarization: some problems and methods".
Tools/Technology: information retrieval, text summarization. Findings: the indexing of
documents is a valuable aid to reliable and consistent retrieval, and indexing is itself a variety
of summarization; information access and retrieval depend crucially on these various types and
tools of knowledge organization. Citation: John Hutchins.

2.1.4 Problems Identified


• Identifying type of distortions in an input image.
• Selection of optimal pre-processing technique.
• Noise getting induced into the input image.
• Accuracy in detecting handwritten texts and complicated fonts.
• Inefficient processing of low contrast images.

2.1.5 Survey of Tools and Technologies Used
• Anaconda Spyder
• Raspberry Pi
• Sublime Text
• Internet.
• Internet Browser.
• Laptop.

2.2 Standards
• IEEE 754 for documentation
• IEEE 1233-1998 For System Requirement Development
• IEEE 830-1998 For System Requirement Specification Analysis
• Python 3.6
• Tesseract 4.0.0
• Raspberry Pi 3 Model B+
• OpenCV 4.0.0

2.3 Software Requirements Specification


2.3.1 Introduction
This section specifies all the requirements and dependencies which must be met in order to
complete the software.

2.3.1.1 Purpose
Tourists and Corporate Travellers face many problems on a frequent basis when understanding or
translating text/symbols into recognisable and easy ways.

• Translating and Recognising Text into digital form is not very common
• Summarization of a text embedded in an image based on POS Tags in multiple languages
• The Apps consume a lot of space in the current Smartphone OSs’
• Ease of Access for the user to perform all the operations on a text on a single platform

2.3.1.2 Intended Audience and Reading Suggestions
The software targets the students and professors who require the text from books in editable form
without the need to type the entire text.

2.3.1.3 Project Scope

• Investigate the value and the benefits of developing a system for OCR and translation in
corporate and tourism settings
• Study the feasibility and different options to enhance the user's domain (by supporting
multiple languages/adding features for the user)
• Study the website development procedure for a global platform that is responsive even on
the lowest specifications
• Design and develop a prototype system which includes a backend system and a mobile client
running on a smartphone
• Evaluate and review the strengths and weaknesses of the project outcome, and seek and
identify areas for further improvement
2.3.2 Overall Description
2.3.2.1 Product Perspective
To build a website that provides a one stop solution to easy understanding of signs/text in
other languages by incorporating OCR, Translation of the recognized text and
summarization of long texts in a way that it puts minimum load on the servers and provides
maximum responsiveness and reliability to the product.
2.3.2.2 Product Features
- OCR Recognition
- Read another Image
- Translation
- Summarization
- Text to Speech
2.3.3 External Interface Requirements

2.3.3.1 User Interfaces
We require an interactive web interface which would be easily accessible and hostable from all
platforms.

2.3.3.2 Hardware Interfaces


We require a laptop, PiCamera Module, Raspberry Pi Model 3, and connecting wires to form the
first hardware.
2.3.3.3 Software Interfaces
Software interfaces include the use of PHP & HTML for web development, Apache & Hadoop for
Backend development, Translation API for translation, & Tesseract Based Modules for OCR.
2.3.4 Other Non-functional Requirements
These are those requirements which don’t have a direct affect on functionality of our project but are
in some way required for building an efficient and optimized model.
2.3.4.1 Performance Requirements
The model should be efficient, responsive and accurate, reducing the user's time and effort.
Users are more likely to use the website if it meets their real-time needs.
2.3.4.2 Safety Requirements
The system should be reliable and safe from external threats, bugs and hacks, since important
documents that a user scans might otherwise be hacked or leaked online. Our model should
therefore be safe from external threats such as information theft.
2.3.4.3 Security Requirements
The system should be end-to-end secure, so that scanned or retrieved information is visible only
to the user or owners and not to any third-party source or person.
2.3.4.4 Accessibility Requirements
The system should be easily accessible across all types of platforms

2.4 Cost Analysis
TABLE 3: Cost Analysis

S.No Items Quantity Price (INR)

1 API Integration 1 1500

2 Website Hosting 1 800

Total 2300

2.5 Risk Analysis


1. Empty spot detection
TABLE 4: Risk Analysis in Empty Spot Detection

Major modes of failure | Risk involved | Risk Mitigation
Line failure | Could detect lines and figures as text | Better pre-processing using OTSU and image opening & closing
Failure in image processing (software error) | Inability to read text | Better training modules
Low contrast failure | Failure to read text from low-contrast images | Image enhancement during pre-processing

2. Failure in localization and guidance
TABLE 5: Risk Analysis in Localization and Guidance

Major modes of failure | Risk involved | Risk Mitigation
Loose connections | Inability to pick up image input | Check proper connectivity
Inability to recognise font | Might be due to a different and strange handwritten pattern | Can use accelerometer data
METHODOLOGY ADOPTED

Methodology is the systematic, theoretical analysis of the methods applied to a chosen area of
study. It involves a theoretical investigation of the body of methods and theories associated with
a branch of knowledge. Typically, it incorporates concepts such as paradigms, theoretical models,
process steps, and other quantitative or qualitative techniques.

3.1 Investigative Techniques


In our project, experimental investigative techniques are being used, with the initial idea of
contributing to society using our engineering skills. An experimental investigative technique
involves coming up with a plan, designing the procedure required for executing the program and
then coming up with the final hypothesis for the desired project. Our idea comprised developing
a unified platform to recognise, translate and summarize text. We incorporated web development,
Tesseract, summarization and translation to give the user easy access to all the functionality. The
procedure designed by us included splitting the project into four modules: website development,
building the Tesseract model (OCR engine), building the translation and summarization modules,
and the integration of the API (for users to choose between languages).
The following are the independent variables
• The source language of the input image/text.
• The size of the input image.
• The contrast of the input image
• Number of users online.
The following are the dependent variables
• Processing Time Threshold.
Hypothesis:-
• If the user inputs an image, the Tesseract-based OCR model converts the textual content of
the image into digital text.
• The digital text goes into the translation module, which translates the output text and allows
the user to further summarize the text for their benefit.
• The user can also use the summarization function, which uses functions of the NLTK
toolkit to summarize around the important words and can also generate a parse tree of the
pinpointed words.

3.2 Proposed Solution


• Login to the website
• Choose to upload image or click an image
• Click the proceed Option to proceed with the Text Recognition Phase
• Once the text is recognised, the user can decide to Translate/Summarize the text
• Once Translation Module gets activated, the source language is identified and the user needs
to input the desired language to be translated to
• Summarization is done using NLTK toolkit

FIGURE 1: Proposed Solution

3.3 Work Breakdown Structure
The following shows the work breakdown structure of our project:-

Task | Assignee | Start | Finish | Duration (days) | % Complete

1 Initializing the Project | ALL | Tue 2-12-19 | Fri 3-22-19 | 39 | 100%
1.1 Project Management Phase | | Tue 2-12-19 | Sat 3-02-19 | 19 | 100%
1.1.1 Feasibility Study | | Tue 2-12-19 | Sun 2-17-19 | 6 | 100%
1.2 Requirement Gathering Analysis | | Sun 3-03-19 | Thu 3-07-19 | 5 | 100%
1.2.1 Analyze Requirements | | Sun 3-03-19 | Tue 3-05-19 | 3 | 100%
1.3 Research Work | | Fri 3-08-19 | Fri 3-22-19 | 15 | 100%

2 Design Phase | ALL | | | |
2.1 Basis of project | | Fri 4-05-19 | Sun 4-14-19 | 10 | 100%
2.1.1 Brainstorming session | | Fri 4-05-19 | Mon 4-08-19 | 4 | 100%
2.1.2 Role division | | Tue 4-09-19 | Sun 4-14-19 | 6 | 100%
2.2 Specifications | | Wed 4-17-19 | Wed 5-01-19 | 15 | 100%
2.2.1 Dataset Collection | | Thu 5-02-19 | Sat 5-11-19 | 10 | 100%

3 Implementation | | Sat 5-11-19 | Thu 10-17-19 | 160 |
3.1 Webpage Development | Himanshu Sharma | Sat 5-11-19 | Sun 6-09-19 | 30 | 100%
3.2 Dataset Collection | Maganjot Singh, Geetika | Sat 5-11-19 | Sun 6-09-19 | 30 | 100%
3.3 Image to Text Conversion | Maganjot Singh | Mon 6-10-19 | Tue 7-09-19 | 30 | 100%
3.4 Text Summarization | Himanshu Goyal | Sat 5-11-19 | Sun 6-09-19 | 30 | 100%
3.5 Text to Speech | Geetika | Mon 6-10-19 | Tue 7-09-19 | 30 | 100%
3.6 API development | Himanshu Goyal | Sat 5-11-19 | Tue 7-09-19 | 60 | 100%
3.7 API integration | Himanshu Sharma | Wed 7-10-19 | Tue 7-16-19 | 7 | 100%
3.8 Investigation of modules | ALL | Sat 7-20-19 | Sun 8-18-19 | 30 | 100%
3.9 Review of errors | ALL | Tue 8-20-19 | Tue 10-08-19 | 50 | 100%

4 Testing | ALL | Sun 10-20-19 | Fri 11-08-19 | 20 |
4.1 Unit Testing | | Sun 10-20-19 | Tue 10-29-19 | 10 | 100%
4.2 Analysing and removing errors | | Wed 10-30-19 | Fri 11-08-19 | 10 | 100%

5 Project Closure | ALL | Sun 11-10-19 | Sat 11-30-19 | 21 |
5.1 Final Documentation | | Sun 11-10-19 | Sat 11-30-19 | 21 | 100%

FIGURE 2: Work Breakdown Structure

3.4 Tools and Technologies Used:-
• Tesseract
• Python
➢ Scikit-Image
➢ ImUtils
➢ OpenCV
➢ Numpy
➢ Argparse
➢ GoogleTrans
➢ os
• HTML, CSS, PHP, Flask
• Cloud Server

DESIGN SPECIFICATIONS

A product design specification is a detailed document providing information about a designed
product or process. For instance, the design specification should include all necessary
drawings, dimensions, environmental factors, technology factors, aesthetic factors, required
maintenance, etc. It may additionally provide specific examples of how the design ought to be
executed, helping others work properly.

4.1 System Architecture (Component Diagram)


A component diagram is a unique type of diagram in the UML family. Its objective differs from
all the other diagrams discussed so far: it does not describe the functionality of the system, but
rather the components used to create those functionalities.

FIGURE 3: Component Diagram

4.2 Design Level Diagram

It is a design-phase structure that allows us to build a more modular and efficient program
structure, resulting in reduced complexity.

FIGURE 4: Design level Diagram

FIGURE 5: MVC Tier Architecture

FIGURE 6: Basic Architecture

4.3 User Interface Diagram
It is used for modelling user-to-system interactions and gives the user a high-level, generalised
overview of the software.

FIGURE 7: User Interface Diagram

IMPLEMENTATION AND EXPERIMENTAL RESULTS

5.1 Experimental Setup

FIGURE 8: OCR Experimental Flow

The system that we have developed uses the Tesseract library to identify characters, together with
various other knowledge sources to enhance the performance parameters. Different methods are
employed for different languages. Mixed characters are first divided into their component,
individually identifiable symbols, which, in addition to being a natural way of dealing with the
Devanagari script, helps reduce the size of the symbol set. The adaptive trainer makes two passes
over the text-containing image to detect the characteristics of all the symbols in the script.
According to the sequence of events, our system first captures one or more image frames of
handwritten or printed text from the camera; each frame is then sent for back-end pre-processing,
which is achieved using scikit-image and OpenCV. The content that has been converted to text
becomes the input for the Google Translate AJAX API, which returns the content in the required
language, and we also get the choice to summarize the information for better understanding.
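As a concrete illustration of this sequence, the following minimal sketch chains pre-processing,
recognition and translation (assuming the pytesseract and googletrans Python bindings; the file
name and parameter values are illustrative only, not the project's exact code):

    import cv2
    import pytesseract
    from googletrans import Translator

    # 1. acquire a frame (here: read from disk instead of the camera)
    image = cv2.imread("frame.jpg")

    # 2. pre-process: grayscale, denoise, binarize with Otsu
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # 3. recognize the text with the Tesseract engine
    text = pytesseract.image_to_string(binary)

    # 4. translate the recognized text (source language auto-detected)
    result = Translator().translate(text, dest='en')
    print(result.src, '->', result.dest)
    print(result.text)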

5.2 Experimental Analysis
5.2.1 Data
The following datasets can be used by Tesseract as a training dataset:

Born-digital image (web and email)

Images are often used to embed text information in electronic documents (web and email). The use
of images as text carriers stems from many requirements. For example, images are used to beautify
(such as headings and titles), to attract attention (such as advertisements), to hide information (such
as evading text-based filtering in spam emails), or even to tell a computer and a human apart
(CAPTCHA tests).

Automatically extracting text from born-digital images is an interesting possibility, as it provides
enabling technology for many applications such as improved indexing and retrieval of web content,
increased accessibility of content, and content filtering (such as of advertisements or spam email).

While born-digital text images are similar to real-scene text images on the surface (both feature
text in complex colour settings), they are different at the same time. Born-digital images are
inherently low-resolution (created to be transmitted and displayed on screens online) and the text
on the image is digitally created; real-scene text images, on the other hand, are captured with
high-resolution cameras. Although born-digital images may suffer from compression artifacts and
severe anti-aliasing, they do not share the illumination and geometric problems of real-scene
images. Therefore it is not guaranteed that methods developed for one domain will work in the
other.

ETL Character Database

The "ETL Character Database" is a collection of approximately 1.2 million hand-written and
machine-printed numerals, symbols, images of the Latin alphabet and Japanese characters and is
compiled into 9 datasets (ETL-1 to ETL-9). This database was reorganized as the Electrotechnical
Laboratory (currently the Institute of Advanced Industrial Science and Technology (AIST)) in
collaboration with the Japan Electronic Industry Development Association (currently reorganized
as the Japan Electronics Industry and Information Technology Industry Association), universities,
and others. Was reorganized into). (Has been reorganized). Research Organization for Character
Recognition Research from 1973 to 1984. The database can only be used for research purposes for
free. This database was supplied by mail, sending recorded media such as magnetic tape and CD-
R. This database has been available for download since April 2011. Since January 2014 the database
Page 29 of 80
has been moved to etlcdb.db.aist.go.jp. Specification: ETL-1, ETL-2, ETL-3, ETL-4, ETL-5, ETL-
6, ETL-7, ETL-8, ETL-9

5.2.2 Performance Parameters

We study metrics for the evaluation of OCR performance, both in terms of physical segmentation
and in terms of recognition of textual content. These metrics compare the OCR output (the
hypothesis) with a reference (also known as the ground truth) in a given input format. Two
evaluations are considered: the quality of the segmentation and the recognition rate. Three pairs
of input formats arise from the two types of inputs: text only (text) and text with spatial
information (xml). These reference-to-hypothesis pairs are: 1) text-to-text, 2) xml-to-xml and
3) text-to-xml. For the text-to-text pair, we selected the RETAS method to perform experiments
and show its limits. Regarding text-to-xml, a new method based on unique word anchors is
proposed to solve the problem of aligning different kinds of information. We define the
ZoneMapAltCnt metric for the xml-to-xml approach and show that it provides the most reliable
and complete evaluation of the two. The open-source OCRs Tesseract and OCRopus were chosen
for the experiments. The dataset used a collection of documents from the ISTEX document base,
which collects contributions from a French newspaper as well as invoices and administrative
documents.

FIGURE 9: Image Segmentation


5.2.2.1 Character Metric
Character metric is based on the Levenshtein distance, which gives the minimum number of edit
operations required to correct the hypothesis text (the OCR-generated text) so that it matches the
reference text. Let Cins be the number of character insertions needed in the hypothesis zone HZ
to match the reference zone RZ, Cdel the number of character deletions and Csub the number of
character substitutions.
Let CH be the number of characters in HZ and CR the number of characters in RZ. The total
character error Cerror is defined by Equation 1:

Cerror = Cins + Cdel + Csub    (1)

The number of correct characters Ccorrect is given as:

Ccorrect = Caln − Cerror    (2)

where Caln is the number of characters in the aligned sequence. Character precision and recall
are then computed as defined in Equations 3a and 3b:

CPrecision = Ccorrect / CH    (3a)
CRecall = Ccorrect / CR    (3b)
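As an illustration, the sketch below computes Cerror with a standard dynamic-programming
Levenshtein distance and approximates the number of correct characters with difflib's matching
blocks (the sample strings are our own, chosen only to exercise the metric):

    from difflib import SequenceMatcher

    def levenshtein(ref, hyp):
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        m, n = len(ref), len(hyp)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[m][n]

    ref = "optical character recognition"
    hyp = "optical charaoter recogniton"
    c_error = levenshtein(ref, hyp)            # Cins + Cdel + Csub
    matches = sum(b.size for b in
                  SequenceMatcher(None, ref, hyp).get_matching_blocks())
    precision = matches / len(hyp)             # approximates Ccorrect / CH
    recall = matches / len(ref)                # approximates Ccorrect / CR
    print(c_error, precision, recall)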

5.2.2.2 Word Metric


Word metric is based on the Levenshtein distance, like the character metric, but at the word level.
This means that we count the minimum number of word insertions, deletions and substitutions
required to correct the hypothesis text so that it matches the reference text. A word in RZ is
considered correct if it is aligned, without substitution, with the corresponding word in HZ. The
number of erroneous words Werror, the number of correct words Wcorrect, the word precision
WPrecision and the word recall WRecall are calculated analogously to the character metric.
5.2.2.3 Strict Word Metric
Strict word metric is similar to the word metric, except for what counts as a correct word. A word
in RZ is strictly correct only if all of its characters align with those of the corresponding word in
HZ and there are no extra characters in HZ; that is, the word in RZ must exactly match the word
in HZ, with no substitutions at all.

5.3 Working of the Project


5.3.1 Procedural Workflow

• Upload the image whose text is to be recognized.

• The image is pre-processed using various image processing techniques.
• A perspective transform module is built.
• Edge detection takes place.
• Using the detected edges, an outline representing the piece of paper being scanned is found.
• A perspective change is applied to obtain a bird's-eye view of the document or scanned text.
• Otsu thresholding is applied, which enhances the image.
• The image is then segmented into blocks to recognize separate characters and words.
• Then, based on the gaps between different letters, the words are recognized character by
character.
• Outlier detection is then applied to see which words or characters were not recognized
properly.
• This is fed into a loop until no outlier is found.
• Broken characters are then recognized using classifiers.
• They match the characters with a prototype to find which character the broken characters
map to.
• The recognized text is then fed to the translator.
• The translator uses the Google AJAX API for the process of text translation.
• The user inputs the destination language and the text is translated to the desired language.
• The user has the option to summarize the recognized text.
• The text is fed to the summarizer, which uses natural language processing techniques.
• The summarizer uses the TF-IDF vectorization technique to weight the meaningful words
(see the sketch after this list).
• On the basis of this weighting, a summarized meaningful paragraph is generated and shown
to the user on the screen.
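A minimal sketch of such a TF-IDF based extractive summarizer, treating each sentence as a
document (the function name and the exact scoring details are our own simplification, not the
project's production code):

    import math
    from nltk.tokenize import sent_tokenize, word_tokenize
    from nltk.corpus import stopwords

    def summarize(text, k=3):
        sentences = sent_tokenize(text)
        stop = set(stopwords.words('english'))
        # bag of content words per sentence
        bags = [[w.lower() for w in word_tokenize(s)
                 if w.isalpha() and w.lower() not in stop]
                for s in sentences]
        n = len(bags)
        # document frequency of each word across sentences
        df = {}
        for bag in bags:
            for w in set(bag):
                df[w] = df.get(w, 0) + 1
        # score each sentence by its mean TF-IDF weight
        scores = []
        for bag in bags:
            if not bag:
                scores.append(0.0)
                continue
            tfidf = sum((bag.count(w) / len(bag)) * math.log(n / df[w])
                        for w in set(bag))
            scores.append(tfidf / len(set(bag)))
        # keep the k best sentences, in their original order
        top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
        return ' '.join(sentences[i] for i in top)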

FIGURE 10: Activity Diagram

5.3.2 Algorithmic Approaches Used

5.3.2.1 Thresholding


Noise is a random variation of colour or brightness between the pixels of an image. Noise
decreases the readability of text in an image. There are two major types of noise: salt-and-pepper
noise and Gaussian noise.
There are 3 standard types of thresholding used.
i) Simple Thresholding
The method is straightforward: for each pixel, the same threshold value is applied. If the pixel
value is less than the threshold, it is set to 0; otherwise it is set to the maximum value. The
function cv.threshold is used to implement thresholding. The first argument is the source image,
which needs to be a grayscale image. The second argument is the threshold value used for the
classification of pixel values. The third argument is the maximum value that is assigned to pixel
values above the threshold. OpenCV provides different thresholding types, selected by the fourth
parameter of the function. The method returns two outputs: the first is the threshold that was
used and the second is the thresholded output image.
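For instance (the file name and the threshold of 127 are illustrative values):

    import cv2

    img = cv2.imread('page.jpg', cv2.IMREAD_GRAYSCALE)
    # pixels above 127 become 255 (white), the rest become 0 (black)
    ret, th1 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)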

FIGURE 11: Simple Thresholding Output


ii) Adaptive Thresholding
In the previous section, we used a global value as a category. But this may not be good in all cases,
e.g. If an image has different lighting conditions in different areas. In that case, the adaptive threshold
may help. Here, the algorithm determines a pixel based on a small area around it. So we obtain
different thresholds for different regions of the same image which gives better results for images with
different illumination.
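A sketch of the corresponding call (the Gaussian-weighted 11x11 neighbourhood and the constant
2 are typical example values, not prescribed ones):

    import cv2

    img = cv2.imread('page.jpg', cv2.IMREAD_GRAYSCALE)
    # threshold for each pixel = Gaussian-weighted mean of its 11x11
    # neighbourhood, minus the constant 2
    th2 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 11, 2)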

FIGURE 12: Adaptive Thresholding Output

iii) Otsu's Binarization


In global thresholding, we used an arbitrarily chosen value as the threshold. In contrast, Otsu's
method avoids having to choose a value and determines it automatically.
Consider an image with only two distinct image values (a bimodal image), where the histogram
contains only two peaks. A good threshold would lie in between those two peaks. Otsu's method
determines such an optimal global threshold value from the image histogram.
To use it, the cv.threshold() function is called with cv.THRESH_OTSU passed as an additional
flag. The threshold value argument can be chosen arbitrarily, since the algorithm finds the optimal
threshold itself and returns it as the first output.
See the examples below. The input image is a noisy image. In the first case, a global threshold
with a value of 127 is applied. In the second case, Otsu's thresholding is applied directly. In the
third case, the image is first filtered with a 5x5 Gaussian kernel to remove noise, and then Otsu
thresholding is applied. Note how the results improve with filtering.
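In code, the three cases look like this (a minimal sketch; 'noisy.jpg' is an assumed input file):

    import cv2

    img = cv2.imread('noisy.jpg', cv2.IMREAD_GRAYSCALE)
    # 1) arbitrary global threshold of 127
    ret1, th1 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
    # 2) Otsu's thresholding applied directly (threshold argument is ignored)
    ret2, th2 = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # 3) 5x5 Gaussian blur to remove noise, then Otsu's thresholding
    blur = cv2.GaussianBlur(img, (5, 5), 0)
    ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    print(ret2, ret3)  # the automatically chosen threshold values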

FIGURE 13: OTSU Comparison Histograms

5.3.2.2 Preprocessing

Using OpenCV, a document-scanning pre-processing unit can be created in 5 easy steps:

➢ Step 1: Development of a perspective change module


➢ Step 2: Edge Detection
➢ Step 3: The edges found in the image are then used to find the outline that represents the
piece of paper that undergoes scanning.
➢ Step 4: Perspective change is applied to obtain a bird's-eye view of the document or scanned text.
➢ Step 5: Application of Otsu thresholding

Step-1 Building The Perspective Transform Module

We begin by importing the required dependencies. The order_points function that we define takes
a single argument, pts, which is a container of four points specifying the (x, y) coordinates of each
corner of the rectangle. We start by allocating the required memory for the four ordered points.
The top-left point is detected, which will have the smallest coordinate sum, as well as the
bottom-right point, which will have the largest coordinate sum.

Finally, we take the difference between the coordinates of each point, using the np.diff function of
the NumPy library. The point with the smallest difference will be the top-right one, while the
point with the largest difference will be the bottom-left one. The ordered points are then returned
to the calling function.
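The original code listing was lost in extraction; a reconstruction consistent with the description
above (a sketch, not the report's verbatim code) is:

    import numpy as np

    def order_points(pts):
        # allocate memory for the four ordered points:
        # top-left, top-right, bottom-right, bottom-left
        rect = np.zeros((4, 2), dtype="float32")
        s = pts.sum(axis=1)
        rect[0] = pts[np.argmin(s)]      # top-left: smallest x + y
        rect[2] = pts[np.argmax(s)]      # bottom-right: largest x + y
        diff = np.diff(pts, axis=1)
        rect[1] = pts[np.argmin(diff)]   # top-right: smallest y - x
        rect[3] = pts[np.argmax(diff)]   # bottom-left: largest y - x
        return rect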

We then define the four_point_transform function, which first calls the order_points function and
unpacks the returned coordinates for convenience. Next we calculate the dimensions of the new
warped image: the width is taken as the maximum distance between the bottom pair or the top
pair of x-coordinates and, in the same fashion, the height is derived from the y-coordinates.

Then, we define 4 points that represent the "top-down" view of the image. The first entry, (0, 0),
indicates the top-left corner. The next entry is (maxWidth - 1, 0), which represents the top-right
corner. Next, we have (maxWidth - 1, maxHeight - 1), which is the bottom-right corner. Finally,
(0, maxHeight - 1) is the bottom-left corner of the image.

To actually obtain the top-down, "bird's eye" view of the image, we use the
cv2.getPerspectiveTransform function, and then apply the resulting transformation matrix with the
cv2.warpPerspective function.
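Again, the listing itself was an image in the original report; a reconstruction matching the
description (a sketch, building on order_points above) is:

    import cv2
    import numpy as np

    def four_point_transform(image, pts):
        rect = order_points(pts)
        (tl, tr, br, bl) = rect
        # width: max distance between the bottom or the top pair of points
        widthA = np.linalg.norm(br - bl)
        widthB = np.linalg.norm(tr - tl)
        maxWidth = max(int(widthA), int(widthB))
        # height: max distance between the right or the left pair of points
        heightA = np.linalg.norm(tr - br)
        heightB = np.linalg.norm(tl - bl)
        maxHeight = max(int(heightA), int(heightB))
        # destination points for the "top-down" view
        dst = np.array([
            [0, 0],
            [maxWidth - 1, 0],
            [maxWidth - 1, maxHeight - 1],
            [0, maxHeight - 1]], dtype="float32")
        M = cv2.getPerspectiveTransform(rect, dst)
        return cv2.warpPerspective(image, M, (maxWidth, maxHeight))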

Step-2 Detect The Edges

We start by importing the needed packages, including the four_point_transform function that we
created above. We also use the imutils module, which includes convenient functions for resizing,
rotating, and cropping images.

First, we load our image from disk. To speed up our image processing, as well as to make our
edge detection step more accurate, we resize the scanned image to a height of around 500 pixels.
From here, we convert the image from RGB to grayscale, apply Gaussian blurring to remove
high-frequency noise, and perform Canny edge detection. The output is then displayed.
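A sketch of this step (the file name is illustrative; the Canny thresholds 75 and 200 are common
choices, not mandated ones):

    import cv2
    import imutils

    image = cv2.imread('document.jpg')
    ratio = image.shape[0] / 500.0   # remember the resize factor
    orig = image.copy()
    image = imutils.resize(image, height=500)

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(gray, 75, 200)

    cv2.imshow('Edged', edged)
    cv2.waitKey(0)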

Step-3 Finding The Contours

We begin by finding the contours in the input image, handling the fact that different OpenCV
versions (2.4, 3 and 4) return contours differently. A performance trick we like is to sort the
contours by area and keep only the largest ones. This allows us to restrict our examination to the
largest contours, discarding the rest. We then loop over the contours and approximate each one by
a polygon. The scanner app makes two fairly safe assumptions: (1) the document being scanned
is the main focal point of the image, and (2) the document is rectangular, and will thus have four
distinct edges.
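A sketch of the contour step, continuing from the edge map computed above
(imutils.grab_contours hides the OpenCV version differences):

    cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST,
                            cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)          # OpenCV 2/3/4 compatible
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]

    screenCnt = None
    for c in cnts:
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        if len(approx) == 4:                    # four corners => our document
            screenCnt = approx
            break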

Step 4: Apply the Perspective Transform & Threshold

All the heavy lifting is handled by the four_point_transform function. We pass two arguments to
four_point_transform: the first is the original image that we loaded from disk, and the second is
the contour representing the document, multiplied by our resize ratio.
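In code (a sketch continuing the earlier steps; screenCnt, orig and ratio come from the previous
listings):

    # map the four document corners back to the original image scale
    warped = four_point_transform(orig, screenCnt.reshape(4, 2) * ratio)

    # give the scan a clean black-and-white look with Otsu thresholding
    warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    _, scanned = cv2.threshold(warped, 0, 255,
                               cv2.THRESH_BINARY + cv2.THRESH_OTSU)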

FIGURE 14: Document Scanning Output

Step-5 Otsu Thresholding

Since we are working with bimodal images, Otsu's algorithm tries to find a threshold value t
which minimizes the weighted within-class variance, given by the relation

σw²(t) = q1(t)·σ1²(t) + q2(t)·σ2²(t)

where q1 and q2 are the probabilities of the two classes separated by the threshold t, and σ1² and
σ2² are the variances of those classes. It finds a value of t that lies between the two peaks such
that the variance within both classes is minimal. This can be implemented in Python as follows:
5.3.2.3 Tesseract Engine Working
Tesseract is an open-source optical character recognition engine that was developed between
1984 and 1994 at Hewlett-Packard. Like a supernova, it appeared from nowhere for the 1995
UNLV Annual Test of OCR Accuracy.
Its outline-based analysis had an important advantage: by inspecting the nesting of outlines and
the number of child and grandchild outlines, it is easy to detect inverse text and recognise it as
easily as black-on-white text.
Blobs are organized into text lines, and the lines and regions are analyzed for fixed-pitch or
proportional text. Text lines are broken into words according to the kind of character spacing:
fixed-pitch text is chopped immediately by character cells, while proportional text is broken into
words using definite spaces and fuzzy spaces.
Recognition then proceeds as a two-pass process. In the first pass, an attempt is made to
recognize each word in turn. Each word that is satisfactory is passed to an adaptive classifier as
training data. The adaptive classifier then gets a chance to recognize text lower down the page
more accurately.

A final step not only resolves fuzzy spaces, but also checks alternative hypotheses for the
x-height to locate small-cap text.
5.3.2.3.1 Line and Word Finding

5.3.2.3.1.1 Line Finding


The line finding algorithm is designed so that a skewed page can be recognised without having to
de-skew it, thus avoiding loss of image quality. A simple percentile height filter removes
drop-caps and vertically touching characters. The median height approximates the text size in the
region, so it is safe to filter out blobs that are smaller than some fraction of the median height,
these being most likely punctuation, diacritical marks, and noise.
The filtered blobs are more likely to fit a model of non-overlapping, parallel, but sloping lines.
Sorting and processing the blobs by x-coordinate makes it possible to assign blobs to a unique
text line while tracking the slope across the page, with greatly reduced danger of assigning to an
incorrect text line in the presence of skew. Once the filtered blobs have been assigned to lines, a
least median of squares fit is used to estimate the baselines. The final step merges blobs that
overlap horizontally by at least half, putting diacritical marks together with the correct base and
correctly associating parts of some broken characters.
5.3.2.3.1.2 Fixed Pitch Detection and Chopping
Tesseract then tests the text lines to determine whether they are fixed pitch. Where it finds
fixed-pitch text, Tesseract chops the words into characters using the pitch, and disables the
chopper and associator on these words for the word recognition step.

5.3.2.3.2 Word Recognition

Part of the recognition process for any character recognition engine is to identify how a word
should be segmented into characters. The initial segmentation output from line finding is
classified first. The remaining stages of word recognition apply only to non-fixed-pitch text.
5.3.2.3.2.1 Chopping Joined Characters
While the result from a word is unsatisfactory, Tesseract attempts to improve it by chopping the
blob with the worst confidence from the character classifier. Candidate chop points are found
from concave vertices of a polygonal approximation of the outline, and may have either another
concave vertex opposite or a line segment. It may take up to 3 pairs of chop points to
successfully separate joined characters from the ASCII set.

5.3.2.3.2.2 Associating Broken Characters


When all potential chops have been exhausted, if the word is still not good enough, it is given to
the associator. The associator makes an A* (best-first) search of the segmentation graph of
possible combinations of the maximally chopped blobs into candidate characters. It does this
without actually building the segmentation graph, but instead maintains a hash table of visited
states.

5.3.2.3.3 Static Character Classifier

5.3.2.3.3.1 Features
Early versions of Tesseract used topological features developed from the work of Shillman et al.
Though nicely independent of font and size, these features are not robust to the problems found
in real-life images, as Bokser describes. An intermediate idea involved the use of segments of the
polygonal approximation as features, but this approach is also not robust to damaged characters.

5.3.2.3.3.2 Classification
Classification proceeds as a two-step process. In the first step, a class pruner creates a shortlist of
character classes that the unknown might match. Each feature fetches, from a coarsely quantized
3-dimensional look-up table, a bit-vector of classes that it might match, and the bit-vectors are
summed over all the features. The classes with the highest counts (after correcting for the
expected number of features) become the shortlist for the next step.
Each feature of the unknown then looks up a bit-vector of prototypes of the given class that it
might match, and the actual similarity between them is computed. The best combined distance,
which is calculated from the feature and prototype evidence, is the best over all the stored
configurations of the class.
5.3.2.3.4 Training Data
Since the classifier is able to recognize damaged characters easily, the classifier was not trained on
damaged characters. In fact, the classifier was trained on a mere 20 samples of 94 characters from 8 fonts
in a single size, but with 4 attributes (normal, bold, italic, bold italic), making 60,160 training
samples. This is a significant contrast to other published classifiers, such as the Calera classifier
with more than a million samples, and Baird's 100-font classifier with 1,175,000 training samples.
5.3.2.3.5 Linguistic Analysis

Tesseract contains relatively little linguistic analysis. Whenever the word recognition module is
considering a new segmentation, the linguistic module chooses the best available word string in each
of the following categories: top frequent word, top dictionary word, top numeric word, top UPPER
case word, top lower case word (with optional initial upper), top classifier choice word. The
final decision for a given segmentation is simply the word with the lowest total distance rating, where
each of the above categories is multiplied by a different constant.
Words from different segmentations may have different numbers of characters in them. It is hard to
compare such words directly, even where a classifier claims to be producing probabilities, which
Tesseract does not. This problem is solved in Tesseract by generating two numbers for each character
classification. The first, called the confidence, is minus the normalized distance from the prototype.
This makes it a "confidence" in the sense that greater numbers are better, but it is still a distance:
the farther from zero, the greater the distance. The second output, called the rating, multiplies the
normalized distance from the prototype by the total outline length in the unknown character. Ratings
for characters within a word can be summed meaningfully, because the total outline length of all the
characters within a word is always the same.
5.3.2.4 Translation Working
Googletrans is a free and unlimited Python library that implements the Google Translate API.
It uses the Google Translate Ajax API to make calls to methods such as detect and translate.
i) If the source language is not provided, Google Translate tries to detect the source language
automatically.
ii) Advanced Use (Bulk Content): arrays can be used to translate batches of strings in a single
method call and a single HTTP session.
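A minimal sketch of this usage follows (assuming the googletrans package is installed; the sample
strings and language codes are illustrative only):

```python
from googletrans import Translator

translator = Translator()

# Single string: when src is omitted, the source language is auto-detected
result = translator.translate('Bonjour le monde', dest='en')
print(result.src, '->', result.text)

# Explicit language detection
detected = translator.detect('Ceci est une phrase.')
print(detected.lang)

# Bulk content: a list of strings is handled in a single method call
# and a single HTTP session
for r in translator.translate(['The quick brown fox', 'jumps over the lazy dog'], dest='hi'):
    print(r.origin, '->', r.text)
```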

5.3.3 Project Deployment

FIGURE 15: Component Diagram

FIGURE 16: Deployment Diagram

5.3.4 System Screenshots
The following are the screenshots taken while running or executing the model or its integrated
components:-

FIGURE 17: System Prototype Screenshots

5.4 Testing Process
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
components, sub-assemblies, assemblies and/or the finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of tests, each
addressing a specific testing requirement.
5.4.1 Test Plan
5.4.1.1 Features to be tested
- Recognition Ability
- Translation of recognised text
- Summarization of recognised text
- Pipelining of Data
- Text to Speech conversion ability
5.4.1.2 Test Strategy
• Testing normal working of website server
• Testing whether the OCR page is working well or not.
• Testing the accuracy of text obtained from OCR.
• Checking whether the pipelining from OCR page to translator and summarizer page is
working fine or not
• Testing the working of translator and summarizer
• Checking the accuracy of translator and summarizer.
5.4.1.3 Test Techniques
5.4.1.3.1 Unit Testing
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly, and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the application;
it is done after the completion of an individual unit, before integration. This is structural testing,
which relies on knowledge of the unit's construction and is invasive.
Unit tests perform basic tests at component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and expected
results.
Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct
phases.
Test strategy and approach:
Field testing will be performed manually, and functional tests will be written in detail.
Test objectives:
- All field entries must work properly.
- Pages must be activated from the identified link.
- The entry screen, messages and responses must not be delayed.
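As an illustration of these objectives applied to our components, the following is a minimal unit-test
sketch (hypothetical: the module paths ocr_app.recognition and ocr_app.summarizer, the helper
functions, and the sample image path are illustrative stand-ins for the project's actual units, and a
pytest setup is assumed):

```python
# test_ocr_units.py -- minimal pytest sketch (hypothetical module and function names)
import pytest
from ocr_app.recognition import recognize_text   # hypothetical import
from ocr_app.summarizer import summarize         # hypothetical import

def test_recognize_returns_text():
    # A known sample image should yield a non-empty string
    text = recognize_text('tests/sample_scan.png')
    assert isinstance(text, str) and text.strip()

def test_recognize_rejects_missing_file():
    # Invalid input must be rejected, not silently ignored
    with pytest.raises(FileNotFoundError):
        recognize_text('tests/does_not_exist.png')

def test_summary_is_not_longer_than_source():
    source = 'First sentence. Second sentence. Third sentence. Fourth sentence.'
    assert len(summarize(source)) <= len(source)
```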
5.4.1.3.2 Integration Testing
Integration tests are designed to test integrated software components to determine whether they actually
run as one program. Testing is event-driven and is more concerned with the basic outcome of screens
or fields. Integration tests demonstrate that although the components were individually satisfactory,
as shown by successful unit testing, the combination of components is correct and consistent.
Integration testing is specifically aimed at exposing the problems that arise from the combination
of components.
Software integration testing is the incremental integration testing of two or more integrated software
components on a single platform, intended to produce failures caused by interface defects. The task of
the integration test is to check that components or software applications, e.g. components in a software
system or, one step up, software applications at the company level, interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
5.4.1.3.3 System Testing
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.
User Acceptance Testing is a critical phase of any project and requires significant participation by
the end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
5.4.1.3.4 Functional Testing
Functional tests provide a systematic demonstration that the functions tested are available as specified
by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
System Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or
special test cases. In addition, systematic coverage pertaining to identifying business process flows,
data fields, predefined processes, and successive processes must be considered for testing. Before
functional testing is complete, additional tests are identified and the effective value of current tests
is determined.
There are two basic approaches to functional testing:
a. Black box or functional testing
b. White box or structural testing
5.4.1.3.4.1 Black box testing
This method is used when knowledge of the specified function that a product has been designed to
perform is known. The concept of the black box is used to represent a system whose inside workings
are not available for inspection. In a black box, the test item is treated as "black", since its logic is
unknown; all that is known is what goes in and what comes out, i.e. the input and output. In black box
testing, we try various inputs and examine the resulting outputs. Black box testing can also be used
for scenario-based tests, in which we verify whether the system accepts valid input and produces the
resultant output to the user. It is an imaginary box that hides the internal workings. In our project, a
valid input is an image, and a well-structured output should be received in return.
5.4.1.3.4.2 White box testing
White box testing is concerned with testing the implementation of the program. The intent of structural
testing is not to exercise all the inputs or outputs, but to exercise the different programming and data
structures used in the program. Thus structural testing aims to derive test cases that achieve the
desired coverage of the different structures. Two types of path testing are:
1. Statement testing
2. Branch testing
5.4.1.3.4.2.1 Statement Testing
The main idea of statement coverage testing is to test every statement in the object's methods by
executing it at least once. Realistically, however, it is impossible to test a program on every single
input, so one can never be sure that a program will not fail on some input.
5.4.1.3.4.2.2 Branch Testing
The main idea behind branch coverage testing is to perform enough tests to ensure that every branch
alternative has been executed at least once under some test. As with statement coverage, it is
infeasible to fully test any program of considerable size.
5.4.2 Test Cases

FIGURE 18: Test Inputs
5.4.3 Test Results

FIGURE 19: Test Outcomes

5.5 Results and Discussions


OCR refers to optical character recognition: the conversion of images of documents or scenes into
machine-encoded text. There are many tools available to implement OCR in a system, such as
Tesseract OCR and Cloud Vision, which use AI and machine learning as well as trained custom
models. The quality of the recognized text depends on a wide variety of factors, and OCR output
depends highly on the quality of the input image. This is why each OCR engine provides guidelines
regarding the quality and size of the input image; these guidelines help the OCR engine deliver
accurate results. This is where image pre-processing comes in: it improves image quality so that the
OCR engine can produce accurate output.
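The following is a minimal sketch of such pre-processing (assuming the opencv-python and
pytesseract packages are installed; the file name is illustrative): the image is converted to grayscale,
denoised, and binarized with Otsu's method before being handed to Tesseract.

```python
import cv2
import pytesseract

def preprocess_for_ocr(path):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # drop colour information
    gray = cv2.medianBlur(gray, 3)                    # suppress salt-and-pepper noise
    # Otsu's method chooses the binarization threshold automatically
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

# Hand the cleaned image to the Tesseract engine
text = pytesseract.image_to_string(preprocess_for_ocr('scan.jpg'))
print(text)
```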

5.6 Validation of Objectives


The objectives we aimed at achieving have been met successfully. The following are the
objectives:
• We aim to develop a unified user-friendly web-page for all the tasks
• Easy scanning & recognition of image input
• Translation of text to other languages
• Achieve text summarization
• Optional Text to Speech

CONCLUSIONS AND FUTURE DIRECTIONS

6.1 Conclusion
A system has been developed which uses the Tesseract library to optically recognize characters and
various knowledge sources to improve the performance. Different approaches have been taken for
different languages: e.g. composite characters are first segmented into their constituent symbols,
which helps in reducing the size of the symbol set, in addition to being a natural way of dealing with
the Devanagari script. The automated trainer makes two passes over the text image to learn the features
of all the symbols of the script. In sequence, the system first takes in multiple image frames of
handwritten or printed text from the camera; the frames are then sent to the backend for pre-processing;
the content that has been converted to text becomes the input for our Google translation API, from
which we obtain the translated content; and we also get the option to summarize the information for
better understanding.
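Expressed as code, this sequence reads roughly as follows (a sketch only, assuming the frames have
already been pre-processed to binarized images and that the pytesseract and googletrans packages
are available; the function and parameter names are illustrative, and the optional summarization step
is omitted for brevity):

```python
import pytesseract
from googletrans import Translator

def process_document(frames, dest_lang='hi'):
    """Sketch of the pipeline: OCR each pre-processed frame, then translate."""
    # 1. Recognize text in every camera frame and join the pieces
    text = '\n'.join(pytesseract.image_to_string(f) for f in frames)
    # 2. Translate the recognized content into the requested language
    return Translator().translate(text, dest=dest_lang).text
```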
6.2 Environmental, Economic and Societal Benefits
1. Environmental: We've seen notes below all inbound emails, which ask us to think twice
before printing a document, and we all understand that reducing paper consumption has a
measurable impact on the environment. But imaging can also help with environmental
concerns in ways you may not have considered. How much fuel is used to ship paper to the
office? What about moving documents to offsite storage? How about running climate
control for document storage areas?
2. Economic:
i. Reduced Cost: In addition to the actual cost of employees' labor, there are
countless other costs that can be mitigated by implementing an imaging and OCR
solution. Some of these areas include printing, copying, maintenance of
consumables and office equipment, lost-document costs, and shipping costs. Such
a solution can save a company as much as half a million dollars in annual
shipping costs alone.
ii. Reduced Errors: Unfortunately, people make mistakes. We forget things, make
typos, lose and misfile documents. Imaging and OCR will reduce expensive
errors that can cost a healthcare organization hundreds of thousands of dollars a
year.
3. Societal:
i. Availability: By scanning paper documents and extracting their information with
OCR, the information and the document image become available in multiple
locations and multiple systems, with no delay in image search and information
retrieval. With a few clicks of the mouse, the document is available to those who need it.
ii. Security: In healthcare, document security is a constant concern. In addition to
the compliance issues brought by extensive regulation, there are real costs that
mishandled or inaccurate documents can incur. With electronic documents,
access can be strictly controlled, and any access to documents can be fully
audited.

6.3 Reflections
The following reflections can be stated based on our working and completion of the project:
1. The model we had in mind of a single tool for translation, text recognition and summarization
has been successfully implemented and is now a working webpage.
2. The webpage incorporates a number of features, including character recognition from an
image, text translation from one language to another, and condensing long paragraphs into
summaries. All these tools can be used simultaneously with the help of pipelining.
3. We are continuing to work on making the tool more accurate and user-friendly.

6.4 Future Work


• Integrate the running modules to build a unified framework.
• Once the website development phase is complete, set up a server to integrate the project code into
the backend.
• Work on optimization and on the time and space complexity of the project.
• Set up hosting for the website and take the project live.
• Identify problems and possible improvements in the project.
• Test the live project with sample cases and build a user-friendly interface for the target
audience.
• Once the testing phase is completed, the project is complete and the only remaining task will be
the maintenance of the website and server.

• There are particular optimization techniques available for Otsu's binarization. We can investigate
and implement them.

PROJECT METRICS

7.1 Challenges Faced


• The font characteristics of the characters in paper documents and the quality of the images are only
some of the challenges.
• Characters may not be recognized correctly by the computer system.
• There are only very slight differences between some digits and letters, making it hard for computers
to recognize them and distinguish one from the other correctly.
• The need to scale techniques for generating abstracts.
• Learning the Django framework.
• Translation difficulties:
i. The knowledge of the two languages;
ii. The meaning of the words;
iii. Perceptions.
7.2 Relevant Subjects
▪ Image Processing
▪ Natural Language Processing
▪ Python
▪ Machine Learning

7.3 Interdisciplinary Knowledge Sharing


The various disciplines used are as follows:
1. Image Processing: The whole optical character recognition part of the project needs a good
knowledge of image processing and of how to implement various algorithms in order to reach
the required state.
2. Natural Language Processing: The language translation part of the project requires a fairly
good knowledge of natural language processing, because various problems could occur
otherwise.
3. Python: The implementation of the whole project is done in Python, so this is also essential.
4. Machine Learning: The training and testing of the project essentially requires knowledge of
machine learning to optimize accuracy.
As the above-mentioned subjects/disciplines are all essential to building this project and the
knowledge of them is shared fairly among the team, this constitutes interdisciplinary knowledge
sharing.

7.4 Peer Assessment Matrix


TABLE 6: Peer Assessment Matrix

                     Evaluation Of
                 S1      S2      S3      S4
Evaluation   S1
By           S2
             S3
             S4

S1 - Geetika Valecha
S2 - Himanshu Goyal
S3 - Himanshu Sharma
S4 - Maganjot Singh

7.5 Role Playing and Work Schedule
TABLE 7: Role Playing and Work Schedule

Level | WBS   | Task Description               | Assigned To     | Start      | End
1     | 1     | Initializing the Project       | ALL             | 12/02/2019 | 04/04/2019
2     | 1.1   | Project Management Phase       |                 | 12/02/2019 | 23/03/2019
3     | 1.1.1 | Feasibility Study              |                 | 12/02/2019 |
2     | 1.2   | Requirement Gathering Analysis |                 |            |
3     | 1.2.1 | Analyze Requirements           |                 |            |
2     | 1.3   | Research Work                  |                 |            |
1     | 2     | Design Phase                   | ALL             | 05/04/2019 | 06/05/2019
2     | 2.1   | Basis of project               |                 |            |
3     | 2.1.1 | Brainstorming Session          |                 |            |
3     | 2.1.2 | Role division                  |                 |            |
2     | 2.2   | Specifications                 |                 |            |
3     | 2.2.1 | Dataset Collection             |                 |            |
1     | 3     | Implementation                 |                 | 11/05/2019 | 12/10/2019
2     | 3.1   | Webpage Development            | Himanshu Sharma |            |
2     | 3.2   | Image to Text Conversion       | Maganjot Singh  |            |
2     | 3.3   | Text Summarization             | Geetika         |            |
2     | 3.4   | Text to Speech                 | Geetika         |            |
2     | 3.5   | API development                | Himanshu Goyal  |            |
2     | 3.6   | API integration                | Himanshu Sharma |            |
2     | 3.7   | Investigation of modules       | ALL             |            |
2     | 3.8   | Review of errors               | ALL             |            |
1     | 4     | Testing                        | ALL             | 15/10/2019 | 05/11/2019
2     | 4.1   | Unit Testing                   |                 |            |
2     | 4.2   | Analysing and removing errors  |                 |            |
1     | 5     | Project Closure                | ALL             | 05/11/2019 | 30/11/2019
2     | 5.1   | Final Documentation            |                 |            |

7.6 Student Outcomes, Description and Performance Indicators
TABLE 8: Student Outcomes
SO | Student Outcome Description | Outcome
A1 | Applying mathematical concepts to obtain analytical and numerical solutions. | Mathematical concepts were required in the processing of the input images.
A2 | Applying basic principles of science towards solving engineering problems. | We trained the dataset with camera-clicked inputs to make it function on most devices.
A3 | Applying engineering techniques for solving computing problems. | We developed this system to ease the process of document scanning and to meet travelling professionals' requirements.
B1 | Identify the constraints, assumptions and models for the problems. | Assumptions and constraints were feasible within the time available.
B2 | Use appropriate methods, tools and techniques for data collection. | We have used the Born-Digital Image and ETL Character databases.
C1 | Design software systems to address desired needs in different problem domains. | The system is developed in Python.
C2 | Can understand scope and constraints such as economic, environmental, social, political, ethical, health and safety, manufacturability, and sustainability. | It does not have any adverse effect on the environment.
D1 | Fulfill assigned responsibility in multidisciplinary teams. | We divided the work into image processing, model training and testing, documentation, UML diagrams and testing.
D2 | Can play different roles as a team player. | Mutual understanding and teamwork have been heightened by discussion and helping each other.
E1 | Identify engineering problems. | To make the system efficiently recognize text from an image and be able to translate and summarize it.
E2 | Use analytical and computational methods to obtain solutions. | Used prediction techniques to solve the problem.
F1 | Showcase professional responsibility while interacting with peers and professional communities. | In assessments we experienced and learned to present our project in a professional way.
F2 | Able to evaluate the ethical dimensions of a problem. | Many problems were identified as we went deeper into working on the project.
G1 | Produce a variety of documents such as project reports using the given format. | Produced reports according to the format instructed by the coordinator.
G2 | Deliver well-organized and effective oral presentations. | We presented our project through presentations and working prototypes.
H1 | Aware of environmental and societal impact of engineering solutions. | The project has no negative effect on society.
H2 | Examine economic trade-offs in computing systems. | The project is economically sound to maintain and improve.
I1 | Able to explore and utilize resources to enhance self-learning. | We took help from seniors, YouTube videos, courses by foreign universities, open-source codebases and guidance from our mentor.
I2 | Recognize the importance of lifelong learning. | Trying and failing is the best way to learn more and more about a subject.
J1 | Comprehend the importance of contemporary issues. | OCR is not known to all and is a complex task when performed manually; this is why we took up the project to make it easier.
K1 | Write code in different programming languages. | The code is written in Python.
K2 | Apply different data structures and algorithmic techniques. | We applied algorithms for image acquisition, environment setup, feature extraction, etc.
K3 | Use software tools necessary for the computer engineering domain. | We used Python for the development of the system.

7.7 Brief Analytical Assessment


Q1. What sources of information did your team explore to arrive at the list of possible project
problems?

Ans. Various sources of information were explored to list the possible project problems, such as
different research papers and journals; some help was also taken from the internet.

Q2. What analytical, computational and/or experimental methods did your project team use to obtain
solutions to the problems in the project?

Ans. We used various image processing techniques such as perspective transform, edge detection,
outline detection, Otsu thresholding, segmentation, and outlier detection. We used the Google
Translate Ajax API for translation, and various natural language processing techniques such as
tokenization and tf-idf vectorization for summarization.
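A minimal sketch of such tf-idf-based extractive summarization follows (assuming NLTK with the
punkt tokenizer and scikit-learn are installed; keeping three sentences is an arbitrary illustration):

```python
from nltk.tokenize import sent_tokenize            # requires nltk.download('punkt')
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize(text, n_sentences=3):
    sentences = sent_tokenize(text)
    if len(sentences) <= n_sentences:
        return text
    # Score each sentence by the sum of its tf-idf term weights
    tfidf = TfidfVectorizer(stop_words='english').fit_transform(sentences)
    scores = tfidf.sum(axis=1).A1
    # Keep the top-scoring sentences, preserved in their original order
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_sentences])
    return ' '.join(sentences[i] for i in top)
```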
Q3. Did the project demand demonstration of knowledge of fundamentals, scientific and/or
engineering principles? If yes, how did you apply it?

Ans. Yes, the project demanded the demonstration of knowledge of fundamentals and scientific
principles. We used the basic principles of edge detection and line detection to form an outline border
for the image. Using this, we de-skewed the image and applied basic thresholding techniques to
refine it. The chopping and segmentation principle is used to chop characters into meaningful
recognized words.

Q4. How did your team share responsibility and communicate scheduling information with
others in the team to coordinate design and manufacturing dependencies?

Ans: We have a WhatsApp group to communicate with each other. Apart from that, we divide most
of our work into groups of two and sit together to do it. As far as responsibility is concerned, we all
share a common purpose of learning, so if there is one person in the group who knows how to work
with something that the other members don't, the other members first learn that thing, and then we
all sit together to complete the project.

Q5. What resources did you use to learn new materials not taught in class for the course of the
project?

Ans. We mainly referred to online resources such as Wikipedia and YouTube in order to learn
new material that was not taught in class.

Q6. Does the project make you appreciate the need to solve problems in real life using
engineering, and did the project development make you proficient with software development
tools and environments?

Ans: Yes, this project made us realize what we have learned in the last four years and how much
power we hold. It does not matter whether we already know some things or not; the internet is there
to pick us up and help us start running. All we need is the passion and willingness to do what we want.

We learned how to develop a website and worked with various image processing algorithms and
natural language processing techniques. So yes, this did help us get comfortable with many
environments and algorithms.

REFERENCES

[1] N. Venkata Rao, P. Kalyanchakravarthi, A.S.C.S. Sastry, A.S.N. Chakravarthy, "Optical
Character Recognition Technique Algorithms", Journal of Theoretical and Applied Information
Technology, vol. 83, no. 2, 2016.
[2] Neetu Bala, "Optical Character Recognition Techniques: A Review", International Journal of
Advanced Research in Computer Science and Software Engineering, vol. 4, issue 5, 2014.
[3] Hiral Modi, M.C. Parikh, "A Review on Optical Character Recognition Techniques",
International Journal of Computer Applications (0975-8887), vol. 160, no. 6, February 2017.
[4] Yu Zhong, Kalle Karu, Anil K. Jain, "Locating Text in Complex Color Images", Pattern
Recognition, Elsevier Science Ltd., vol. 28, no. 10, pp. 1523-1535, 1995.
[5] Issam Bazzi, Richard Schwartz, John Makhoul, "An Omnifont Open-Vocabulary OCR System
for English and Arabic", Pattern Analysis and Machine Intelligence 21, pp. 495-504, 1999.
[6] P. Shankar Rao, J. Aditya, "Handwriting Recognition - 'Offline' Approach", Department of
CSE, Andhra University, 2010.
[7] J. Pradeep, E. Srinivasan, S. Himavathi, "Diagonal Based Feature Extraction for Handwritten
Alphabets Recognition System Using Neural Network", International Journal of Computer Science
& Information Technology (IJCSIT), vol. 3, no. 1, February 2011.
[8] John Hutchins, "Summarization: Some Problems and Methods", in Meaning: The Frontier of
Informatics, Informatics 9, Proceedings of a conference jointly sponsored by Aslib, the Aslib
Informatics Group, and the Information Retrieval Specialist Group of the British Computer Society,
King's College Cambridge, 26-27 March 1987; edited by Kevin P. Jones, London: Aslib, 1987,
pp. 151-173.
[9] R. Karpinski, D. Lohani, A. Belaïd, "Metrics for Complete Evaluation of OCR Performance",
Int'l Conf. IP, Comp. Vision, and Pattern Recognition (IPCV'18).
[10] S.V. Rice, F.R. Jenkins, T.A. Nartker, The Fourth Annual Test of OCR Accuracy, Technical
Report 95-03, Information Science Research Institute, University of Nevada, Las Vegas, July 1995.
[11] R.W. Smith, The Extraction and Recognition of Text from Multimedia Document Images, PhD
Thesis, University of Bristol, November 1987.
[12] R. Smith, "A Simple and Efficient Skew Detection Algorithm via Text Row Accumulation",
Proc. of the 3rd Int. Conf. on Document Analysis and Recognition (vol. 2), IEEE, 1995, pp. 1145-1148.
[13] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, Wiley-IEEE, 2003.
[14] S.V. Rice, G. Nagy, T.A. Nartker, Optical Character Recognition: An Illustrated Guide to the
Frontier, Kluwer Academic Publishers, USA, 1999, pp. 57-60.
[15] P.J. Schneider, "An Algorithm for Automatically Fitting Digitized Curves", in A.S. Glassner
(ed.), Graphics Gems I, Morgan Kaufmann, 1990, pp. 612-626.
[16] R.J. Shillman, Character Recognition Based on Phenomenological Attributes: Theory and
Methods, PhD Thesis, Massachusetts Institute of Technology, 1974.
[17] B.A. Blesser, T.T. Kuklinski, R.J. Shillman, "Empirical Tests for Feature Selection Based on a
Psychological Theory of Character Recognition", Pattern Recognition 8(2), Elsevier, New York, 1976.
[18] M. Bokser, "Omnidocument Technologies", Proc. IEEE 80(7), IEEE, USA, July 1992,
pp. 1066-1078.
[19] H.S. Baird, R. Fossey, "A 100-Font Classifier", Proc. of the 1st Int. Conf. on Document Analysis
and Recognition, IEEE, 1991, pp. 332-340.
[20] G. Nagy, "At the Frontiers of OCR", Proc. IEEE 80(7), IEEE, USA, July 1992, pp. 1093-1100.
[21] G. Nagy, Y. Xu, "Automatic Prototype Extraction for Adaptive OCR", Proc. of the 4th Int. Conf.
on Document Analysis and Recognition, IEEE, August 1997, pp. 278-282.
[22] I. Marosi, "Industrial OCR Approaches: Architecture, Algorithms and Adaptation Techniques",
Document Recognition and Retrieval XIV, SPIE, January 2007, 6500-01.

PLAGIARISM REPORT

Plagiarism Checker X Originality Report


Similarity Found: 18%

Date: Tuesday, October 22, 2019


Statistics: 1186 words Plagiarized / 6586 Total words
Remarks: Medium Plagiarism Detected - Your Document needs Selective Improvement.
----------------------------------------------------------------------------------------------------------------------
INTERNET SOURCES:
----------------------------------------------------------------------------------------------------------------------
0% - Empty
0% - https://en.wikipedia.org/wiki/Wikipedia:
0% - https://www.geeksforgeeks.org/counting-n
0% - https://stackoverflow.com/questions/5479
0% - http://www.nltk.org/
0% - https://www.careerlauncher.com/machine-l
0% - https://docs.nvidia.com/gameworks/conten
0% - http://houseofbots.com/tagged-news?tag=R
0% - https://machinelearningmastery.com/machi
0% - https://www.researchgate.net/publication
0% - https://www.biblegateway.com/passage/?se
0% - https://www.researchgate.net/publication
0% - http://read.pudn.com/downloads167/source
0% - http://www.in-mediakg.com/software/tts/t
0% - https://d3bxy9euw4e147.cloudfront.net/os

0% - http://jts2019.com/session-programme/
0% - https://www.codeproject.com/articles/167
0% - https://jivp-eurasipjournals.springerope
0% - https://github.com/tesseract-ocr/tessera
0% - http://usednissanpatrol.com.au/repair-ma
0% - http://homepages.inf.ed.ac.uk/rbf/HIPR2/
0% - http://www.pondiuni.edu.in/storage/dde/d
0% - https://www.ittoolkit.com/articles/proje
0% - https://quizlet.com/5104182/glossary-of-
0% - https://www.academia.edu/3732087/A_surve
0% - http://www.bing.com/Translator
0% - https://monkeylearn.com/text-analysis/
0% - https://newhorizonsmt.wixsite.com/websit
0% - https://businessofsoftware.org/2011/08/j
0% - https://kintronics.com/how-alpr-works/
0% - https://www.learntoplay.net/1-simple-tri
0% - https://cdn.ymaws.com/www.a4pt.org/resou
0% - https://www.ijsr.in/upload/1746695408Cha
0% - https://www.researchgate.net/publication
0% - https://apps.apple.com/gb/app/microsoft-
0% - https://www.online-translator.com/About/
0% - https://www.researchgate.net/publication
0% - https://www.pyimagesearch.com/2018/09/26
0% - https://source.android.com/compatibility
0% - https://stackoverflow.com/questions/3207
0% - https://stackoverflow.com/questions/2039
0% - https://www.cips.org/Documents/Qualifica
0% - https://www.ukessays.com/essays/manageme
0% - https://www.adafruit.com/categories/105
0% - https://reqtest.com/requirements-blog/fu
0% - https://nrcan.gc.ca/energy/efficiency/bu
0% - https://careerfoundry.com/en/blog/ux-des
0% - https://stackoverflow.com/questions/4801
0% - https://aapm.onlinelibrary.wiley.com/doi
0% - https://www.slideshare.net/abubashars/su
0% - https://www.researchgate.net/profile/Chi
0% - https://study.com/academy/lesson/writing
0% - https://html.com/input-type-button/
0% - https://www.mathworks.com/help/images/re
0% - http://www.bibalex.org/isis/UploadedFile
0% - https://www.nltk.org/api/nltk.html
0% - https://textminingonline.com/getting-sta
0% - https://www.smartdraw.com/gantt-chart/st
0% - https://www.uml-diagrams.org/component-d
0% - https://www.tutorialspoint.com/uml/uml_c
0% - https://www.tutorialspoint.com/software_
0% - https://www.conceptdraw.com/examples/and
0% - https://www.hosiaisluoma.fi/blog/archima
0% - https://www.microsoft.com/en-au/store/co
0% - https://www.modishproject.com/civil-serv
0% - https://www.sejda.com/compress-pdf
0% - http://www.promeng.eu/downloads/training
0% - https://www.academia.edu/23879666/Evalua
0% - https://biopython.org/DIST/docs/tutorial
0% - https://www.codesdope.com/cpp-dynamic-me
0% - https://opencvpython.blogspot.com/2012/0
0% - https://www.pyimagesearch.com/2014/08/25
0% - https://plus.cs.aalto.fi/o1/2018/w05/ch0
0% - https://datascienceplus.com/how-to-extra
0% - https://minhld.wordpress.com/2017/06/
0% - https://www.pyimagesearch.com/2014/08/25
0% - https://biopython.org/DIST/docs/tutorial
0% - https://datascienceplus.com/how-to-extra
0% - http://docshare.tips/opencv-tutorials-24
0% - https://www.sciencedirect.com/science/ar
0% - https://www.coursehero.com/file/p27o563b
0% - https://quizlet.com/19668018/web-111-fla
0% - https://theailearner.com/tag/image-proce
0% - https://mafiadoc.com/calculus-applicatio
0% - https://opticspy.github.io/lightpipes/de
0% - https://arxiv.org/pdf/1003.5893.pdf
0% - http://static.googleusercontent.com/medi
0% - https://docs.microsoft.com/en-us/windows
0% - https://www.researchgate.net/publication
0% - http://static.googleusercontent.com/medi
0% - https://storage.googleapis.com/pub-tools
0% - https://epdf.pub/the-fuzzy-systems-handb
0% - https://github.com/tesseract-ocr/tessera
0% - https://machinelearningmedium.com/2019/0
0% - https://dl.acm.org/citation.cfm?id=31673
0% - https://www.gutenberg.org/files/28490/28
0% - https://home.deib.polimi.it/gini/robot/d
0% - http://static.googleusercontent.com/medi
0% - https://www.academia.edu/30969770/An_Ove
0% - https://machinelearningmedium.com/2019/0
0% - http://www.uap-bd.edu/ce/anam/Anam_files
0% - https://www.academia.edu/30969770/An_Ove
0% - http://ecomputernotes.com/fundamental/in
0% - https://mnsl-journal.springeropen.com/ar
0% - https://storage.googleapis.com/pub-tools
0% - https://static.googleusercontent.com/med
0% - https://www.yahoo.com/
1% - https://www.academia.edu/30969770/An_Ove
0% - https://static.googleusercontent.com/med
0% - http://lili.org/forlibs/ce/able/course7/
0% - https://www.academia.edu/32328825/An_ove
0% - https://www.researchgate.net/publication
2% - https://www.cs.bgu.ac.il/~elhadad/hocr/
0% - https://statistics.berkeley.edu/computin
1% - https://www.academia.edu/30969770/An_Ove
0% - http://www.infitt.org/ti2014/papers/121_
1% - https://www.academia.edu/30969770/An_Ove
0% - https://www.litcharts.com/literary-devic
0% - https://pypi.org/project/googletrans/
0% - https://davidwalsh.name/google-translate
0% - https://unitedlanguagegroup.com/blog/why
0% - https://github.com/elzeard91/py-googletr
0% - http://hayko.at/vision/dataset.php
0% - https://www.federalregister.gov/document
0% - https://www.jeffbullas.com/meta-titles-a
0% - http://dagdata.cvc.uab.es/icdar2013compe
0% - https://www.securityinformed.com/news/da
0% - https://photographylife.com/advantages-a
0% - http://szeliski.org/Book/drafts/Szeliski
0% - http://etlcdb.db.aist.go.jp/
0% - https://cedar.buffalo.edu/~srihari/paper
0% - https://link.springer.com/article/10.100
0% - https://www.jeita.or.jp/english/about/20
0% - https://www.topuniversities.com/universi
0% - https://patents.google.com/patent/US7730
0% - https://www.sightline.us/images/pdf/ETL_
1% - https://csce.ucmss.com/cr/books/2018/LFS
0% - https://dev.mysql.com/doc/en/char.html
1% - https://csce.ucmss.com/cr/books/2018/LFS
1% - https://csce.ucmss.com/cr/books/2018/LFS
1% - https://csce.ucmss.com/cr/books/2018/LFS
0% - http://computationalculture.net/out-of-b
0% - https://www.researchgate.net/publication
0% - https://pinoybix.org/2014/11/mcqs-in-dig
0% - https://stackoverflow.com/questions/1321
0% - https://en.m.wikipedia.org/wiki/Units_of
0% - https://journals.plos.org/ploscompbiol/a
0% - https://en.wikipedia.org/wiki/Kebab_case
0% - https://mafiadoc.com/proceedings-of-pape
1% - https://medium.com/cashify-engineering/i
0% - https://www.geeksforgeeks.org/python-thr
0% - https://www.pyimagesearch.com/2014/09/08
0% - https://ufo-filters.readthedocs.io/en/ma
0% - https://gregorkovalcik.github.io/opencv_
0% - https://docs.opencv.org/2.4/modules/imgp
0% - https://opencvpython.blogspot.com/
0% - https://medium.com/@Kittipop.P/concept-o
0% - https://docs.opencv.org/3.4/d7/d4d/tutor
0% - https://docs.opencv.org/trunk/d7/d4d/tut
0% - https://stackoverflow.com/questions/4292
0% - https://support.minitab.com/en-us/minita
0% - http://aircconline.com/cseij/V6N1/6116cs
0% - https://docs.opencv.org/trunk/d7/d4d/tut
0% - https://flask-limiter.readthedocs.io/en/
0% - https://docs.opencv.org/trunk/d7/d4d/tut
0% - http://ijarcet.org/wp-content/uploads/IJ
1% - https://medium.com/cashify-engineering/i
0% - https://www.altexsoft.com/blog/datascien
1% - https://medium.com/cashify-engineering/i
1% - https://medium.com/cashify-engineering/i
0% - https://dl.acm.org/citation.cfm?id=25952
0% - https://en.wikipedia.org/wiki/Programmin
0% - https://www.academia.edu/5853095/A_Revie
0% - https://www.researchgate.net/publication
0% - https://owl.purdue.edu/owl/research_and_