Final Evaluation Report
Submitted by:
(101610030) GEETIKA
CPG No: 56
Faculty Mentor:
Dr. Singara Singh, Associate Professor
TIET, Patiala
December 2019
ABSTRACT
The aim of the project 'Multilingual OCR' is to develop OCR software for online/offline
handwriting recognition. OCR (Optical Character Recognition) is the mechanical or
electronic translation of images of handwritten or typewritten text (usually captured by a
scanner) into machine-editable text, which can then be summarized. OCR is a field of research in
pattern recognition, artificial intelligence and machine vision.
Handwriting recognition most often describes the ability of a computer to translate
human writing into text. This may happen in one of two ways: either by scanning of written
text or by writing directly on peripheral input devices.
Text summarization is the process of creating a condensed form of a text document that
maintains the significant information and general meaning of the source text. Automatic text
summarization is becoming an important way of finding relevant information precisely in large
texts, in a short time and with little effort.
DECLARATION
We hereby declare that the design principles and working prototype model of the project
entitled 'OCR Based Text Recognition & Translation' are an authentic record of our own work
carried out in the Computer Science and Engineering Department, TIET, Patiala, under the
guidance of Dr. Singara Singh during the 6th semester (2019).
101610030 Geetika
Faculty Mentor:
Dr. Singara Singh
Associate Professor
TIET, Patiala
ACKNOWLEDGMENT
We would like to express our thanks to our mentor Dr. Singara Singh. He has been of great
help in our venture, and an indispensable resource of technical knowledge. He is truly an
amazing mentor to have.
We are also thankful to Dr. Maninder Singh, Head, Computer Science and Engineering
Department, entire faculty and staff of Computer Science and Engineering Department, and
also our friends who devoted their valuable time and helped us in all possible ways towards
successful completion of this project. We thank all those who have contributed either directly
or indirectly towards this project.
Lastly, we would also like to thank our families for their unyielding love and encouragement.
They always wanted the best for us, and we admire their determination and sacrifice.
101610030 Geetika
TABLE OF CONTENTS
ABSTRACT
DECLARATION
ACKNOWLEDGMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1 Project Overview
1.1.1 Technical Terminology
1.1.2 Problem Statement
1.1.3 Goal
1.1.4 Solution
1.2 Need Analysis
1.3 Research Gaps
1.4 Problem Definition and Scope of the Project
1.5 Assumptions & Constraints
1.6 Approved Objectives
1.7 Methodology Used
1.8 Summary of Project Outcomes
1.9 Novelty of Work
2. REQUIREMENT ANALYSIS
2.1 Literature Survey
2.1.1 Theory Associated With Problem Area
2.1.2 Existing Systems and Solutions
2.1.3 Research Findings for Existing Literature
2.1.4 The Problem That Has Been Identified
2.1.5 Survey of Tools and Technologies Used
2.2 Standards
2.3 Software Requirement Specifications
2.3.1 Introduction
2.3.1.1 Purpose
2.3.1.2 Intended Audience and Reading Suggestions
2.3.1.3 Project Scope
2.3.2 Overall Description
2.3.2.1 Product Perspective
2.3.2.2 Product Features
2.3.3 External Interface Requirements
2.3.3.1 User Interfaces
2.3.3.2 Hardware Interfaces
2.3.3.3 Software Interfaces
2.3.4 Other Non-Functional Requirements
2.3.4.1 Performance Requirements
2.3.4.2 Safety Requirements
2.3.4.3 Security Requirements
2.4 Cost Analysis
2.5 Risk Analysis
3. METHODOLOGY ADOPTED
3.1 Investigative Technology
3.2 Proposed Solution
3.3 Work Breakdown Structure
3.4 Tools & Technologies Used
4. DESIGN SPECIFICATIONS
4.1 System Architecture
4.2 Design Level Diagram
4.3 User Interface Diagrams
5.3 Working of the project
5.3.1 Procedural Work Flow
5.3.2 Algorithmic Approaches Used
5.3.3 Project Deployment
5.3.4 System Screenshots
5.4 Testing Process
5.4.1 Test Plan
5.4.1.1 Features to be tested
5.4.1.2 Test Strategy
5.4.1.3 Test Techniques
5.4.2 Test Cases
5.4.3 Test Results
5.5 Results and Discussions
5.6 Validation of Objectives
7. PROJECT METRICS
APPENDIX-A REFERENCES
LIST OF TABLES
LIST OF FIGURES
Figure 13: OTSU Comparison Histograms
INTRODUCTION
How is this actually achieved? The image is first scanned, and the text and visual
contents are converted into a bitmap, which is essentially a matrix of monochrome dots. The image
is then pre-processed: brightness, contrast and contours are adjusted to enhance the
accuracy of the process. The graphic is then broken into segments identifying the areas of interest
(areas comprising images and text), which helps start off the extraction. The areas containing text
can be divided further into lines, words and characters, by virtue of which the software is able to
map the characters to the datasets through comparison, identification and various other detection
and matching algorithms. The conclusive result is the text extracted from the input image. The
process might not turn out to be 100% accurate and might require human aid and intelligence to
correct some elements that could not be scanned properly due to irregular surfaces and lighting.
A dictionary or other NLP techniques can also be applied to handle the errors.
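The character-matching step described above (comparing each segmented glyph against a dataset of known character bitmaps) can be sketched in a few lines. This is an illustrative toy only, not the project's actual matcher: the 3x3 "font" and all names below are invented for the example, and a real engine such as Tesseract works on extracted features rather than raw pixel comparison.

```python
def hamming(a, b):
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

# A toy reference dataset: 3x3 monochrome bitmaps for three "characters".
DATASET = {
    "I": ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
    "O": ((1, 1, 1),
          (1, 0, 1),
          (1, 1, 1)),
}

def classify(glyph):
    """Return the dataset character whose bitmap differs in the fewest pixels."""
    return min(DATASET, key=lambda ch: hamming(DATASET[ch], glyph))

# A noisy "I" (one corner pixel flipped) is still matched correctly.
noisy_i = ((1, 1, 0),
           (0, 1, 0),
           (0, 1, 0))
print(classify(noisy_i))  # I
```

The same nearest-match idea is what makes the process tolerant of the imperfect scans mentioned above: small pixel-level noise changes the distance only slightly.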
Further down, for the translation part, the Google Translate AJAX API will be used,
which provides a simple programmatic interface for translating (or identifying the language of)
an arbitrary string into any supported language, using modern neural machine translation and
LSTMs. It is highly responsive, so web platforms and applications can easily interact with the
integrated API for fast, efficient translation of the detected/input text from the automatically
detected source language to the chosen destination language (such as French to English).
Finally, the summarization will be done using the Python Natural Language Toolkit.
Natural Language Toolkit: NLTK includes many corpora and lexical resources such as
WordNet, along with several text-processing libraries for operations such as classification,
tagging, parsing, tokenization, stemming and semantic reasoning. In addition, wrappers for
industrial-strength libraries can be accessed through NLTK.
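As a rough illustration of what an NLTK-backed summarization step does, here is a minimal frequency-based extractive summarizer. It is a sketch only: a real pipeline would use nltk.tokenize and nltk.corpus.stopwords instead of the regex splitting and the tiny stopword list assumed here.

```python
import re
from collections import Counter

# A deliberately tiny stopword list for the sketch; NLTK ships a full one.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it"}

def summarize(text, n_sentences=1):
    """Keep the n_sentences sentences whose content words are most frequent."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower())
                   if w not in STOPWORDS)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve the original sentence order in the summary.
    return " ".join(s for s in sentences if s in ranked)

text = ("OCR converts images of text into machine-editable text. "
        "OCR is widely used. "
        "Summarization condenses long text.")
print(summarize(text))
```

Sentence scoring by word frequency is the classic extractive approach; the ranked sentences are re-emitted in their original order so the condensed text still reads naturally.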
Optical Character Recognition: recognising text/characters from images that might contain
noise or other interference.
1.1.3 Goal
To build a website that provides a one-stop solution for easy understanding of signs/text in other
languages by incorporating OCR, translation of the recognized text and an option to
summarize long texts, in a way that puts minimum load on the servers and provides maximum
responsiveness and reliability.
1.1.4 Solution
• Build an OCR module that identifies text from input images and sends it on for translation
• Provide language options for translation of the output
• Provide options for summarization and text-to-speech conversion of the text
User:
➢ Input image for recognition: recognition is performed on the user's input, be it an
already captured image that is uploaded, or a live image input
➢ Translation: the user can translate the recognised text into the given choice of languages.
➢ Summarization: the system allows the user to summarize the text if it is
lengthy and complex.
➢ Text to speech: the user can also convert the text to speech for ease of access
1.4 Problem Definition and Scope
The scope of this project is outlined as follows:
• Find out the advantages of building a unified platform to recognise, translate and summarize
text.
• Study the feasibility of, and the different options for, enhancing the user's experience and the
potency of OCR.
• Develop a prototype system that integrates a backend system and a mobile client running
on a smartphone.
• Evaluate and review the strengths and weaknesses of the project outcome, and identify
areas for further improvement.
One assumption about the product is that it will be used on mobile phones that have
sufficient performance. If the required hardware resources are not available on the phone
for the application to perform well, there may be scenarios where the application does not
work as intended, or at all.
Deliverables:
1. A website with the following integrated features:
i OCR
ii Summarization
iii Translation
iv Important Words Highlight
2. A working model
3. Trained Dataset
REQUIREMENT ANALYSIS
In software and systems engineering, requirement analysis encompasses the tasks that go
into determining the requirements or conditions to be satisfied for a new or altered product or
project, taking account of the possibly conflicting requirements of the various stakeholders,
and analysing, documenting, verifying and managing software or system requirements.
➢ Google Translate uses the live camera or a pre-saved image for text recognition and translation.
➢ Apps like Google Keep or Evernote are very useful for occasional OCR work.
➢ The Text Fairy app quickly transforms images to text. The image is first enhanced,
and then various pre-processing steps are applied.
➢ CamScanner is an app that enables us to make PDFs from scanned copies and also serves as a
means to store documents.
• Summarization:
➢ SMMRY (pronounced SUMMARY) was created to summarize paragraphs and long articles
into shorter summarized articles.
➢ SMMRY reduces the text to its important sentences using algorithms.
➢ It accomplishes its targets by:
▪ Ranking sentences by priority using algorithms
▪ Removing transition phrases
▪ Removing excessive examples
• Translation:
➢ Microsoft Translator is another popular translation app. It features instant translation
through the camera, though it supports about half as many languages as Google Translate.
➢ Online-Translation.com is known for providing automated online translation to and
from most European languages. The service provides high-quality translations with the
help of its unique linguistic technologies.
All the literature findings from the various research papers reviewed are listed below:
TABLE 2: Research Findings for existing literature
(Each entry lists: S.No. | Roll No., Team Member | Paper Title | Keywords | Findings | Authors)

1. (Entry truncated in the source.) Author: Salvatore Tabbone.
2. 101603121, Himanshu Goyal. Paper: "Preprocessing Techniques in Character Recognition". Keywords: Processing Techniques. Findings: Filters, Transformations, Histogram Processing, Image Enhancement Techniques. Author: Yasser Alginahi.
3. 101603121, Himanshu Goyal. Paper: "Optical Character Recognition Techniques". Keywords: OCR, Templates. Findings: The OCR techniques based on neural networks provide more accurate results than other techniques. Author: Sukhpreet Singh.
4. 101603121, Himanshu Goyal. Paper: "Handwritten Character Recognition – A Review". Keywords: Artificial Neural Network, K-Nearest Neighbor, Support Vector Machine. Findings: The best recognition accuracy was reported with Binarization, Noise removal, Skeletonization and Normalization as preprocessing methods. Authors: Surya Nath R S, Afseena S.
5. 101603121, Himanshu Goyal. Paper: "Thresholding: A Pixel level Image Processing Methodology Preprocessing Technique for an OCR System for Brahmi Script". Keywords: Optical Character Recognition, Pixel-level processing algorithms, thresholding. Findings: Various thinning and thresholding algorithms yield better results for an input image. Author: H.K. Anasuya Devi.
6. 101611024, Himanshu Sharma. Paper: "Locating text in complex color images". Keywords: Locating text, Optical character recognition, Color segmentation, Spatial variance. Findings: The methods give good results on a variety of test images; the algorithms are relatively robust to variations in font, color, or size of the text. Authors: Yu Zhong, Kalle Karu, Anil K. Jain.
7. 101611024, Himanshu Sharma. Paper: "A Review on Optical Character Recognition Techniques". Keywords: Character Recognition System, Image Segmentation, OCR, Preprocessing, Skew correction, Classifier. Findings: A projection-profile-based method is used for segmentation, the Fourier transform technique for preprocessing, and a nearest-neighbour classifier for classification. Authors: Hiral Modi, M. C. Parikh.
8. 101611024, Himanshu Sharma. Paper: "Handwriting Recognition – 'Offline' Approach". Keywords: Handwriting identification, feature extraction, handwriting individuality, large-scale systems for offline analysis. Findings: The design of human-computer interfaces based on handwriting is part of a tremendous research effort, together with speech recognition, language processing and translation, to facilitate communication of people with computer networks. Authors: P. Shankar Rao, J. Aditya.
9. 101611024, Himanshu Sharma. Paper: "Detecting Text Based Image With Optical Character Recognition for English Translation and Speech using Android". Keywords: Android, OCR, text translator, text to speech. Findings: An Android application is developed by integrating the Tesseract OCR engine, the Bing translator and the phone's built-in speech-output technology. Authors: Sathiapriya Ramiah, Tan Yu Liong, Manoj Jayabalan.
10. 101603179, Maganjot Singh. Paper: "Optical Character Recognition of Bangla Characters using neural network: A better approach". Keywords: Word detection, zoning, character separation and character recognition. Findings: The OCR system gives much better results in terms of performance and accuracy in comparison with the existing usual approach, due to efficient line and word detection, zoning, character separation and character recognition, and the employment of skewness correction, thinning, a better approach to scaling, and a neural network for character detection. Authors: Ahmed Asif Chowdhury, Ejaj Ahmed, Shameem Ahmed, Shohrab Hossain, Chowdhury Mofizur Rahman.
11. 101603179, Maganjot Singh. Paper: "Optical Character Recognition (OCR) System". Keywords: Optical Character Recognition, Neural Network, Fuzzy Logic. Findings: The network has been trained and tested for a number of widely used fonts. Recognition of new-font characters by the system is very easy and quick; the information in documents can be edited more conveniently and reused as and when required. Authors: Najib Ali Mohamed Isheawy, Habibul Hasan.
12. 101603179, Maganjot Singh. Paper: "Optical Character Recognition by Open Source OCR Tool Tesseract: A Case Study". Keywords: Optical Character Recognition (OCR), Open Source, Tesseract. Findings: Tesseract is a command-line tool, which is open source and is available in the form of a Dynamic Link Library (DLL); it can easily be made available in graphics mode. Authors: Chirag Patel, Atul Patel, Dharmendra Patel.
13. 101603179, Maganjot Singh. Paper: "Extraction of Text from an Image and its Language Translation Using OCR". Keywords: Text Extraction, Android, OCR, Tesseract. Findings: OCR is easily portable, and its scalability allows it to recognize various languages. Authors: G.R. Hemalakshmi, M. Sakthimanimala, J. Salai Ani Muthu.
14. 101610030, Geetika. Paper: "Optical Character Recognition Technique Algorithms". Keywords: OCR, Handwritten Character Recognition, Neural Network. Findings: The proposed method computes the error rate efficiently, which results in increasing the accuracy of the neural network. Authors: N. Venkata Rao, Dr. A.S.C.S. Sastry, A.S.N. Chakravarthy, Kalyanchakravarthi P.
15. 101610030, Geetika. Paper: "Diagonal based feature extraction for handwritten alphabets recognition system using neural network". Keywords: Handwritten character recognition, Image processing, Feature extraction, feed-forward neural networks. Findings: The diagonal method of feature extraction yields the highest recognition accuracy: 97.8% for 54 features and 98.5% for 69 features. Authors: J. Pradeep, E. Srinivasan, S. Himavathi.
16. 101610030, Geetika. Paper: "Optical Character Recognition Using Artificial Neural Network". Keywords: Optical character recognition, Artificial Neural Network, supervised learning, Multi-Layer Perceptron, the back-propagation algorithm. Findings: Despite the computational complexity involved, artificial neural networks offer several advantages in back-propagation networks and classification, in the sense of emulating adaptive human intelligence to a small extent. Author: Sameeksha Barve.
17. 101610030, Geetika. Paper: "Summarization: some problems and methods". Keywords: Information Retrieval, Text summarization. Findings: The indexing of documents is a valuable aid to reliable and consistent retrieval, and indexing is itself a variety of summarization; information access and retrieval depend crucially on all these various types and tools of knowledge organization. Author: John Hutchins.
2.1.5 Survey of Tools and Technologies Used
• Anaconda Spyder
• Raspberry Pi
• Sublime Text
• Internet
• Internet browser
• Laptop
2.2 Standards
• IEEE 754 for floating-point arithmetic
• IEEE 1233-1998 for system requirements development
• IEEE 830-1998 for software requirements specifications
Tool and platform versions:
• Python 3.6
• Tesseract 4.0.0
• Raspberry Pi 3 Model B+
• OpenCV 4.0.0
2.3.1.1 Purpose
Tourists and corporate travellers frequently face problems in understanding or translating
text/symbols into a recognisable, easily understood form.
• Translating and recognising text into digital form is not very common
• Summarization of text embedded in an image, based on POS tags, in multiple languages
• Dedicated apps consume a lot of space on current smartphone OSs
• Ease of access for the user to perform all the operations on a text on a single platform
2.3.1.2 Intended Audience and Reading Suggestions
The software targets students and professors who require the text from books in editable form
without the need to type the entire text.
• Investigate the value and the benefits of developing a system for OCR and translation for
corporate use and tourism
• Study the feasibility of, and different options for, enhancing the user's domain (by supporting
multiple languages and adding features for the user)
• Study the website development procedure for a global platform that is responsive even on
minimal specifications
• Design and develop a prototype system that includes a backend system and a mobile client
running on a smartphone
• Evaluate and review the strengths and weaknesses of the project outcome, and identify
areas for further improvement
2.3.2 Overall Description
2.3.2.1 Product Perspective
To build a website that provides a one-stop solution for easy understanding of signs/text in
other languages by incorporating OCR, translation of the recognized text and
summarization of long texts, in a way that puts minimum load on the servers and provides
maximum responsiveness and reliability.
2.3.2.2 Product Features
- OCR Recognition
- Read another Image
- Translation
- Summarization
- Text to Speech
2.3.3 External Interface Requirements
2.3.3.1 User Interfaces
We require an interactive web interface that is easily accessible and hostable on all
platforms.
2.4 Cost Analysis
TABLE 3: Cost Analysis
Total 2300
Risk: Line Failure | Effect: Could detect lines and figures | Mitigation: Better pre-processing
using OTSU and Image Opening & Closing
2. Failure in localization and guidance
TABLE 5: Risk Analysis in localization and guidance failure
Risk: Loose Connections | Effect: Inability to pick image input | Mitigation: Check proper
connectivity
METHODOLOGY ADOPTED
Methodology is the systematic, theoretical analysis of the methods applied to a chosen field of
study. It involves a theoretical investigation of the body of methods and theories associated with a
branch of knowledge. Typically, it incorporates concepts such as paradigms, theoretical models,
process steps, and other quantitative or qualitative techniques.
• The user can also use the summarization function, which uses the NLTK toolkit
to summarize around the important words, and can also generate a parse tree of the
pinpointed words.
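The "important words" idea (also listed among the deliverables as Important Words Highlight) can be sketched as a simple frequency-based highlighter. This is a stand-in for the NLTK/POS-tag approach the project describes: it marks the most frequent non-stopword words, and the stopword list below is a tiny invented one for the example.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "are", "in", "that"}

def highlight_important(text, top_n=2):
    """Wrap the top_n most frequent non-stopword words in *asterisks*."""
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]
    important = {w for w, _ in Counter(words).most_common(top_n)}
    return re.sub(r"\w+",
                  lambda m: f"*{m.group(0)}*" if m.group(0).lower() in important
                  else m.group(0),
                  text)

print(highlight_important("OCR reads text and OCR translates text quickly."))
```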
3.3 Work Breakdown Structure
The following shows the work breakdown structure of our project:
3.4 Tools and Technologies Used:
• Tesseract
• Python
➢ Scikit-Image
➢ ImUtils
➢ OpenCV
➢ Numpy
➢ Argparse
➢ GoogleTrans
➢ os
• HTML, CSS, PHP, Flask
• Cloud Server
DESIGN SPECIFICATIONS
A product design specification is a detailed document providing information about a designed
product or process. For instance, the design specification should include all necessary
drawings, dimensions, environmental factors, technology factors, aesthetic factors,
the maintenance that will be required, etc. It may additionally provide specific examples of how
the design should be executed, helping others work properly.
4.2 Design Level Diagram
It is a design-phase structure that allows us to build a more modular and efficient program
structure, which results in reduced complexity.
FIGURE 5: MVC Tier Architecture
4.3 User Interface Diagram
It is used to model the user's interactions with the system, and gives the user a high-level,
generalised overview of the software.
IMPLEMENTATION AND EXPERIMENTAL RESULTS
The system we have developed uses the Tesseract library to identify characters, along with
various other knowledge sources to enhance performance. Different methods are applied for
different languages. Mixed characters are first divided into their component, individually
identifiable symbols, which, in addition to being a natural way of dealing with the Devanagari
script, helps reduce the size of the symbol set. The automated trainer makes two passes over
the text-containing image to detect the characteristics of all the symbols in the script. In
sequence, our system first captures image frames of handwritten or printed text from the
camera; each frame is then sent for pre-processing, which is carried out using scikit-image and
OpenCV. The contents that have been converted to text become the input for the Google
Translate AJAX API, giving us the language-changed content required, and we also get the
choice to summarize the information for better understanding.
5.2 Experimental Analysis
5.2.1 Data
The following datasets can be used by Tesseract for training:
Images are often used to embed text information in electronic documents (web and email). The use
of images as text carriers stems from many requirements. For example, images are used to beautify
(as in headings and titles), to attract attention (as in advertisements), to hide information (as in
spam emails that evade text-based filtering), or even to tell a computer and a human apart
(CAPTCHA tests).
Automatically extracting text from born-digital images is an interesting possibility, as it provides
enabling technology for many applications such as improved indexing and retrieval of web content,
increased accessibility of content, and content filtering (such as of advertising or spam email), etc.
While born-digital text images are superficially similar to real-scene text images (both feature
text in complex colour settings), they also differ. Born-digital images are
inherently low-resolution (created to be transmitted and displayed on screens), and the text on the
image is digitally created; real-scene text images, on the other hand, are captured with
high-resolution cameras. Although born-digital images may suffer from compression artifacts and
severe anti-aliasing, they do not share the illumination and geometric problems of real scene
images. It is therefore not guaranteed that methods developed for one domain will work in the other.
The "ETL Character Database" is a collection of approximately 1.2 million images of hand-written
and machine-printed numerals, symbols, Latin alphabet characters and Japanese characters,
compiled into 9 datasets (ETL-1 to ETL-9). The database was compiled by the Electrotechnical
Laboratory (now part of the National Institute of Advanced Industrial Science and Technology,
AIST) in collaboration with the Japan Electronic Industry Development Association (since
reorganized as the Japan Electronics and Information Technology Industries Association),
universities and other research organizations, for character recognition research from 1973 to
1984. The database can be used free of charge for research purposes only. It was originally
supplied by mail, on recorded media such as magnetic tape and CD-R, and has been available for
download since April 2011. Since January 2014 the database has been hosted at
etlcdb.db.aist.go.jp. Specification: ETL-1, ETL-2, ETL-3, ETL-4, ETL-5, ETL-6, ETL-7, ETL-8,
ETL-9
where C_aln is the number of characters in the aligned sequence. Character precision and recall
are computed as defined in equations 3a and 3b, respectively.
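Equations 3a and 3b are not reproduced in this extract; the standard character-level definitions, which are assumed here, divide the number of correctly aligned characters by the number of recognized characters (precision) and by the number of ground-truth characters (recall). A minimal sketch, using a crude position-wise alignment rather than a real edit-distance alignment:

```python
def char_precision_recall(recognized, ground_truth):
    """Character precision/recall under a naive position-wise alignment.
    (Assumed standard definitions; real evaluations align by edit distance.)"""
    correct = sum(r == g for r, g in zip(recognized, ground_truth))
    precision = correct / len(recognized) if recognized else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall

p, r = char_precision_recall("he1lo", "hello")
print(p, r)  # 0.8 0.8 (4 of 5 characters match)
```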
FIGURE 10: Activity Diagram
5.3.2 Algorithmic Approaches Used
FIGURE 12: Adaptive Thresholding
FIGURE 13: OTSU Comparison Histograms
5.3.2.2 Preprocessing
Using OpenCV, a document-scanning pre-processing unit can be created in five easy steps:
Step-1 Building The Perspective Transform Module
We begin by importing the required dependencies. The order_points function we define takes
a single argument, pts, a container of four points specifying the (x, y) coordinates of each
corner of the rectangle. We start by allocating the required memory for the four to-be-ordered
points. The top-left point is then found: it will have the smallest coordinate sum. Likewise, the
bottom-right point will have the largest coordinate sum. Finally, we take the difference between
the coordinates of each point using the np.diff function of the NumPy library. The point with
the smallest difference will be the top-right one, while the point with the largest difference
will be the bottom-left one. The ordered coordinates are then returned to the calling function.
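The order_points logic just described can be written with NumPy as follows. This is a sketch consistent with the description above (note that np.diff on an (x, y) pair computes y - x); the sample points are made up for the example.

```python
import numpy as np

def order_points(pts):
    """Order 4 (x, y) points as top-left, top-right, bottom-right, bottom-left,
    using the coordinate-sum and coordinate-difference trick described above."""
    pts = np.asarray(pts, dtype="float32")
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)            # x + y
    rect[0] = pts[np.argmin(s)]    # top-left: smallest sum
    rect[2] = pts[np.argmax(s)]    # bottom-right: largest sum
    d = np.diff(pts, axis=1)       # y - x
    rect[1] = pts[np.argmin(d)]    # top-right: smallest difference
    rect[3] = pts[np.argmax(d)]    # bottom-left: largest difference
    return rect

pts = [(50, 0), (0, 5), (60, 40), (5, 45)]   # corners in arbitrary order
print(order_points(pts))
```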
We begin by defining the four_point_transform function, which starts by calling the
order_points function. We then unpack the returned coordinates for convenience, and compute
the resolution of the new warped image: from the corner coordinates we can set the width of the
new image and, in the same way, derive its height.
Then, we define 4 points that represent a "top-down" view of the image. The first entry (0, 0) in
this list indicates the top-left corner. The next entry, (maxWidth - 1, 0), represents the top-right
corner. Next, we have (maxWidth - 1, maxHeight - 1), which is the bottom-right corner. Finally,
we have (0, maxHeight - 1), which is the bottom-left corner of the image.
To actually obtain this top-down, "bird's-eye" view, we use the cv2.getPerspectiveTransform
function, and apply the resulting transformation matrix using the cv2.warpPerspective function.
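In the pipeline itself, cv2.getPerspectiveTransform and cv2.warpPerspective do this work. Purely as an illustration of what the transform computation involves, the 3x3 homography can also be derived by solving a small linear system; the corner values below are invented for the example.

```python
import numpy as np

def get_perspective_transform(src, dst):
    """NumPy stand-in for cv2.getPerspectiveTransform: solve for the 3x3
    homography H (with H[2,2] = 1) that maps each src point to its dst point."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Map a point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

src = [(0, 5), (50, 0), (60, 40), (5, 45)]   # ordered document corners
dst = [(0, 0), (59, 0), (59, 44), (0, 44)]   # the top-down rectangle
H = get_perspective_transform(src, dst)
print(apply_h(H, (60, 40)))  # ~ (59.0, 44.0)
```

Warping then amounts to sampling the source image through the inverse of H for every destination pixel, which is exactly what cv2.warpPerspective does efficiently.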
Step-2 Detect The Edges
We start by importing the needed packages, including the four_point_transform
function that we created above. We also use the imutils module, which includes convenience
functions for resizing, rotating, and cropping images.
First, we load our image from disk. To speed up our image processing, as well as to make our
edge detection step more accurate, we resize our scanned image to a height of around 500 pixels.
From here, we convert our image from RGB to grayscale, apply a Gaussian blur to remove
high-frequency noise, and run Canny edge detection. Then the output is displayed.
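The project uses cv2.GaussianBlur and cv2.Canny for this step. The NumPy-only sketch below illustrates the same idea (smooth, take gradients, threshold the gradient magnitude) on a synthetic image; it is a crude stand-in, not the Canny algorithm itself.

```python
import numpy as np

def simple_edges(gray, thresh=100):
    """Bare-bones edge map: 3x3 box blur, central-difference gradients,
    then a threshold on the gradient magnitude."""
    g = gray.astype(float)
    # 3x3 box blur via shifted sums over an edge-padded copy.
    p = np.pad(g, 1, mode="edge")
    blur = sum(p[i:i + g.shape[0], j:j + g.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    # Central-difference gradients (borders left at zero).
    gx = np.zeros_like(blur); gy = np.zeros_like(blur)
    gx[:, 1:-1] = blur[:, 2:] - blur[:, :-2]
    gy[1:-1, :] = blur[2:, :] - blur[:-2, :]
    mag = np.hypot(gx, gy)
    return (mag > thresh).astype(np.uint8)

# A white square on a black background: edges appear only around its border.
img = np.zeros((20, 20), dtype=np.uint8)
img[5:15, 5:15] = 255
edges = simple_edges(img)
```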
Step-3 Finding The Contours
We begin by finding the contours in the input image, handling the fact that different OpenCV
versions (OpenCV 2.4 versus OpenCV 4) return the contours in tuples of different shapes. A
performance trick we like is to sort the contours by area and keep only the largest ones. This
allows us to restrict our examination to the largest contours, excluding the rest. We then loop
over the contours and approximate each one by a polygon, counting its vertices. This rests on a
fairly safe assumption: the scanner app assumes that (1) the document being scanned is the main
focal point of our image and (2) the document is rectangular, and thus will have four distinct
edges.
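The "sort contours by area and keep the largest four-cornered one" assumption can be illustrated without OpenCV using the shoelace formula (in the real pipeline this is cv2.findContours, cv2.contourArea and cv2.approxPolyDP; the contours below are invented for the example).

```python
def shoelace_area(poly):
    """Absolute area of a polygon given as a list of (x, y) vertices."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def largest_quad(contours):
    """Return the largest contour with exactly four vertices, mirroring the
    scanner's assumption that the document is the biggest rectangle in view."""
    for c in sorted(contours, key=shoelace_area, reverse=True):
        if len(c) == 4:
            return c
    return None

contours = [
    [(0, 0), (4, 0), (2, 3)],                  # a small triangle
    [(0, 0), (100, 0), (100, 60), (0, 60)],    # the "document"
    [(10, 10), (20, 10), (20, 20), (10, 20)],  # a small rectangle
]
print(largest_quad(contours))  # the 100x60 rectangle
```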
All the heavy lifting is handled by the four_point_transform function. We pass two arguments
to it: the first is the original image that we loaded from disk, and the second is the contour
representing the document, multiplied by our resize ratio.
Since we are working with bimodal images, Otsu's algorithm tries to find a threshold value T
that minimizes the weighted within-class variance, given by the relation
σw²(T) = q1(T)·σ1²(T) + q2(T)·σ2²(T)
It effectively finds a value of T that lies between the two peaks such that the variances within
both classes are minimal. This can be implemented in Python:
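The threshold search promised above can be sketched as an exhaustive search that minimizes the weighted within-class variance. (cv2.threshold with the THRESH_OTSU flag does this internally; the synthetic bimodal data below is invented for the example.)

```python
import numpy as np

def otsu_threshold(gray):
    """Exhaustively search the T in [1, 255] that minimizes the weighted
    within-class variance q1(T)*var1(T) + q2(T)*var2(T)."""
    vals = gray.ravel().astype(float)
    best_t, best_score = 0, np.inf
    for t in range(1, 256):
        lo, hi = vals[vals < t], vals[vals >= t]
        if lo.size == 0 or hi.size == 0:
            continue  # both classes must be non-empty
        q1, q2 = lo.size / vals.size, hi.size / vals.size
        score = q1 * lo.var() + q2 * hi.var()
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# A bimodal "image": dark background around 30, bright text around 200.
rng = np.random.default_rng(0)
img = np.clip(np.concatenate([rng.normal(30, 5, 500),
                              rng.normal(200, 5, 500)]), 0, 255)
T = otsu_threshold(img)
print(30 < T < 200)  # True: the threshold lands between the two peaks
```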
5.3.2.3 Tesseract Engine Working
Tesseract is an open-source optical character recognition engine that was developed at
Hewlett-Packard between 1984 and 1994. Like a supernova, it appeared from nowhere for the
1995 UNLV Annual Test of OCR Accuracy.
This had an important advantage: by inspecting the nesting of outlines, and the number of child
and grandchild outlines, it is easy to detect inverse (white-on-black) text and recognise it as
easily as black-on-white text.
Blobs are organized into segments and text lines, and the lines and regions are analyzed for
fixed-pitch or proportional text. Text lines are broken into words according to the kind of
character spacing. Fixed-pitch text is chopped immediately by character cells. Proportional
text is broken into words using definite spaces and fuzzy spaces.
Recognition then proceeds as a two-pass process. In the first pass, an attempt is made to
recognize each word in turn. Each word that is satisfactory is passed to an adaptive classifier
as training data. The adaptive classifier then gets a chance to recognize text lower down the
page more accurately.
A final pass not only resolves fuzzy spaces, but also checks alternative hypotheses for the
x-height to locate small-cap text.
5.3.2.3.1 Line and Word Finding
At the heart of the recognition process for any character recognition engine is identifying how
a word should be divided into characters. The initial segmentation output of the line search is
classified first. The remaining stages of word recognition apply only to non-fixed-pitch text.
5.3.2.3.2.1 Chopping Joined Characters
While the result for a word is unsatisfactory, Tesseract attempts to improve the result by
chopping the blob with the worst confidence from the character classifier. Candidate chop
points are found from concave vertices of a polygonal approximation of the outline, and may
have either another concave vertex opposite, or a line segment. It may take up to 3 pairs of
chop points to successfully separate joined characters from the ASCII set.
5.3.2.3.3.1 Features
The early versions of Tesseract used topological features developed from the work of Shillman
et al. Though nicely independent of font and size, these features are not robust to the problems
found in real-life images, as Bokser describes. An intermediate idea involved using segments of
the polygonal approximation as features, but this method is also not robust to damaged
characters.
5.3.2.3.3.2 Classification
Classification proceeds as a two-step process. In the first step, a class pruner creates a
shortlist of character classes that the unknown might match. Each feature fetches, from a
coarsely quantized 3-dimensional look-up table, a bit-vector of classes it might match, and the
bit-vectors are summed over all the features. The classes with the highest counts (after
correcting for the expected number of features) become the shortlist for the next step.
Each feature of the unknown looks up a bit-vector of prototypes of the given class that it might
match, and then the actual similarity between them is computed. The best combined distance, which
is calculated from the summed feature and prototype evidence, is the best over all stored
configurations of the class.
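The two-stage idea can be sketched in a few lines; the feature names, the look-up dictionary, and the set-based distance below are deliberately simplified stand-ins for Tesseract's quantized look-up tables and prototype configurations:

```python
from collections import Counter

# Stage 1: each feature votes for the classes it might match
# (a stand-in for the bit-vector look-up table).
FEATURE_TO_CLASSES = {
    "f1": {"A", "B"},
    "f2": {"A"},
    "f3": {"A", "C"},
    "f4": {"B", "C"},
}

def class_pruner(features, shortlist_size=2):
    """Sum the per-feature votes and keep the highest-scoring classes."""
    votes = Counter()
    for f in features:
        for cls in FEATURE_TO_CLASSES.get(f, ()):
            votes[cls] += 1
    return [cls for cls, _ in votes.most_common(shortlist_size)]

# Stage 2: detailed distance from each shortlisted class prototype.
PROTOTYPES = {"A": {"f1", "f2", "f3"}, "B": {"f1", "f4"}, "C": {"f3", "f4"}}

def classify(features):
    """Pick the shortlisted class whose prototype is nearest the unknown."""
    shortlist = class_pruner(features)
    feats = set(features)

    def distance(cls):
        # Normalized mismatch between unknown features and the prototype.
        proto = PROTOTYPES[cls]
        return len(feats ^ proto) / len(feats | proto)

    return min(shortlist, key=distance)
```

The cheap pruning stage keeps the expensive distance computation restricted to a handful of plausible classes, which is the point of the two-step design.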
5.3.2.3.4 Training Data
Since the classifier is able to recognize damaged characters easily, the classifier was not trained
on damaged characters. In fact, the classifier was trained on 20 samples of 94 characters from 8
fonts in a single size, but with 4 attributes (normal, bold, italic, bold italic), totaling 60,160
training samples. This is a significant contrast to other published classifiers, such as the Calera
classifier with more than a million samples, and Baird's 100-font classifier with 1,175,000
training samples.
5.3.2.3.5 Linguistic Analysis
Tesseract has relatively little linguistic analysis. Whenever the word recognition module is
considering a new segmentation, the linguistic module chooses the best available word string in
each of the following categories: top frequent word, top dictionary word, top numeric word, top
UPPER case word, top lower case word (with optional initial upper), and top classifier choice
word. The final decision for a given segmentation is simply the word with the lowest total distance
rating, where each of the above categories is multiplied by a different constant.
Words from different segmentations may have different numbers of characters in them. It is hard to
compare such words directly, even where a classifier claims to be producing probabilities, which
Tesseract does not. This problem is solved in Tesseract by generating two numbers for each
character classification. The first, called the confidence, is minus the normalized distance from
the prototype. This makes it a "confidence" in the sense that greater numbers are better, while
remaining a distance: the farther from zero, the greater the distance. The second output, called
the rating, multiplies the normalized distance from the prototype by the total outline length in
the unknown character. Ratings for characters within a word can be summed meaningfully, because
the total outline length for all characters within a word is always the same.
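A minimal sketch of the two outputs, using placeholder distances and outline lengths rather than values from a real classifier:

```python
def confidence(normalized_distance):
    """Larger is better: minus the normalized distance from the prototype."""
    return -normalized_distance

def rating(normalized_distance, outline_length):
    """Distance scaled by outline length, so per-character ratings sum
    meaningfully across any segmentation of the same word."""
    return normalized_distance * outline_length

# Hypothetical (distance, outline length) pairs for three characters of a
# word; their ratings can be added because the total outline length is the
# same for every segmentation of that word.
word_rating = sum(rating(d, l) for d, l in [(0.1, 30), (0.3, 20), (0.2, 25)])
```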
5.3.2.4 Translation Working
Googletrans is a free and unlimited Python library that implements the Google Translate API. It
uses the Google Translate Ajax API to make calls to methods such as detect and translate.
i) If the source language is not provided, Google Translate tries to detect it automatically.
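The detect-then-translate flow of point (i) can be sketched as follows; the real project uses googletrans' Translator object, whose detect() and translate() methods the offline stand-in below mimics so that the sketch stays self-contained:

```python
def translate(text, dest, translator, src=None):
    """Translate text to `dest`; detect the source language when not given.

    `translator` is any object with googletrans-style detect()/translate()
    methods; injecting it keeps this sketch offline and testable.
    """
    if src is None:
        src = translator.detect(text).lang
    return translator.translate(text, src=src, dest=dest)

class FakeDetection:
    def __init__(self, lang):
        self.lang = lang

class FakeTranslator:
    """Offline stand-in for googletrans.Translator."""
    def detect(self, text):
        # Crude illustration: treat Devanagari text as Hindi, else English.
        is_hindi = any("\u0900" <= c <= "\u097f" for c in text)
        return FakeDetection("hi" if is_hindi else "en")

    def translate(self, text, src, dest):
        return f"[{src}->{dest}] {text}"
```

In the real system the stand-in is replaced by googletrans.Translator(), and translate() returns an object whose .text attribute holds the translated string.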
5.3.3 Project Deployment
FIGURE 16: Deployment Diagram
5.3.4 System Screenshots
The following are the screenshots taken while running the model and its integrated components:
FIGURE 17: System Prototype Screenshots
5.4 Testing Process
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
components, sub assemblies, assemblies and/or a finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of tests; each
type addresses a specific testing requirement.
5.4.1 Test Plan
5.4.1.1 Features to be tested
- Recognition Ability
- Translation of recognised text
- Summarization of recognised text
- Pipelining of Data
- Text to Speech conversion ability
5.4.1.2 Test Strategy
• Testing normal working of website server
• Testing whether the OCR page is working well or not.
• Testing the accuracy of text obtained from OCR.
• Checking whether the pipelining from OCR page to translator and summarizer page is
working fine or not
• Testing the working of translator and summarizer
• Checking the accuracy of translator and summarizer.
5.4.1.3 Test Techniques
5.4.1.3.1 Unit Testing
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly, and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application; it is done after the completion of an individual unit, before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive.
Unit tests perform basic tests at component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and expected
results.
Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct
phases.
Test strategy and approach-
Field testing will be performed manually and functional tests will be written in detail.
Test objectives:
- All field entries must work properly.
- Pages must be activated from the identified link.
- The entry screen, messages and responses must not be delayed.
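As an illustration of this strategy, a unit test for one small unit of the pipeline (here a hypothetical whitespace-normalization helper, not a function from our actual codebase) could look like:

```python
import unittest

def normalize_text(raw):
    """Collapse the runs of whitespace that OCR output often contains."""
    return " ".join(raw.split())

class NormalizeTextTest(unittest.TestCase):
    def test_collapses_internal_whitespace(self):
        self.assertEqual(normalize_text("hello   world\n"), "hello world")

    def test_whitespace_only_input_becomes_empty(self):
        self.assertEqual(normalize_text("   \t\n"), "")
```

Each unit of the pipeline (pre-processing, recognition, translation, summarization) gets its own small test cases of this shape before the units are integrated.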
5.4.1.3.2 Integration Testing
Integration tests are designed to test integrated software components to determine whether they
actually run as one program. Testing is event driven and is more concerned with the basic outcome
of screens or fields. Integration tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is correct and
consistent. Integration testing is specifically aimed at exposing the problems that arise from the
combination of components.
Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform, to produce failures caused by interface defects. The
task of the integration test is to check that components or software applications (e.g. components
in a software system or, one step up, software applications at the company level) interact without
error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
5.4.1.3.3 System Testing
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.
User Acceptance Testing is a critical phase of any project and requires significant participation by
the end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
5.4.1.3.4 Functional Testing
Functional tests provide a systematic demonstration that the functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
System Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or
special test cases. In addition, systematic coverage of identified business process flows, data
fields, predefined processes, and successive processes must be considered for testing. Before
functional testing is complete, additional tests are identified and the effective value of current
tests is determined.
There are two basic approaches to functional testing:
a. Black box or functional testing
b. White box testing or structural testing
5.4.1.3.4.1 Black box testing
This method is used when knowledge of the specified function that a product has been designed to
perform is known. The concept of a black box is used to represent a system whose inner workings
are not open to inspection. In a black box, the test item is treated as "black": its logic is
unknown, and all that is known is what goes in and what comes out, i.e. the input and the output.
In black box testing, we try various inputs and examine the resulting outputs. Black box testing
can also be used for scenario-based tests, in which we verify whether the system takes valid input
and produces the expected output for the user. It is an imaginary box that hides the internal
workings. In our project, the valid input is an image, and the resultant output should be
well-structured recognized text.
5.4.1.3.4.2 White box testing
White box testing is concerned with testing the implementation of the program. The intent of
structural testing is not to exercise all the inputs or outputs, but to exercise the different
programming and data structures used in the program. Thus structural testing aims to derive test
cases that achieve the desired coverage of the different structures. Two types of path testing are:
1. Statement testing
2. Branch testing
5.4.1.3.4.2.1 Statement Testing
The main idea of statement coverage is to test every statement in the object's methods by executing
each at least once. Realistically, however, it is impossible to test a program on every single
input, so you can never be sure that a program will not fail on some input.
5.4.1.3.4.2.2 Branch Testing
The main idea behind branch coverage is to perform enough tests to ensure that every branch
alternative has been executed at least once under some test. As with statement coverage, it is
infeasible to fully test any program of considerable size.
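To make the distinction concrete, consider this illustrative function: each of the three calls below forces a different branch outcome, and together they achieve full branch (and hence statement) coverage of clamp:

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value

# Branch coverage needs tests that exercise every branch outcome:
branch_tests = [
    (clamp(-5, 0, 10), 0),    # first condition true
    (clamp(15, 0, 10), 10),   # first false, second true
    (clamp(5, 0, 10), 5),     # both conditions false
]
```

Note that no single input can cover all three statements here, which is why branch (and statement) coverage both require a set of tests rather than one.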
5.4.2 Test Cases
FIGURE 18: Test Inputs
5.4.3 Test Results
FIGURE 19: Test Outcomes
CONCLUSIONS AND FUTURE DIRECTIONS
6.1 Conclusion
A system has been developed which uses the Tesseract library to optically recognize characters,
together with various knowledge sources to improve performance. Different approaches have been
taken for different languages; for example, composite characters are first segmented into their
constituent symbols, which helps in reducing the size of the symbol set, in addition to being a
natural way of dealing with Devanagari script. The automated trainer makes two passes over the
text image to learn the features of all the symbols of the script. In sequence, our system first
takes in multiple image frames of handwritten or printed text from the camera; the frames are then
sent to the backend for pre-processing; the content that has been converted to text becomes the
input to our Google translation API call, from which we obtain the content converted into the
required language; and we also get the option to summarize the information for better
understanding.
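The end-to-end flow summarized above can be sketched as a simple pipeline; every stage below is a hypothetical stand-in (lambdas in place of the Tesseract, Google Translate, and summarizer components), meant only to show how the data flows:

```python
def run_pipeline(frames, ocr, translate, summarize, dest_lang,
                 want_summary=True):
    """Chain the stages: frames -> recognized text -> translation -> summary."""
    text = " ".join(ocr(frame) for frame in frames)
    translated = translate(text, dest_lang)
    return summarize(translated) if want_summary else translated

# Stand-in stages, just to exercise the data flow:
result = run_pipeline(
    frames=["img1", "img2"],
    ocr=lambda frame: f"text({frame})",
    translate=lambda text, dest: f"{dest}:{text}",
    summarize=lambda text: text[:20],
    dest_lang="hi",
)
```

In the deployed webpage the same chaining (what we call pipelining) lets recognition, translation, and summarization be used together in one pass.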
6.2 Environmental, Economic and Societal Benefits
1. Environmental: We've seen notes below all inbound emails, which ask us to think twice
before printing a document, and we all understand that reducing paper consumption has a
measurable impact on the environment. But imaging can also help with environmental
concerns in ways you may not have considered. How much fuel is used to ship paper to the
office? What about moving documents to offsite storage? How about running climate
control for document storage areas?
2. Economic:
i. Reduced Cost: In addition to the actual cost of your employees' labor, there are
countless other costs that can be mitigated by implementing an imaging and OCR
solution. Some of these areas include printing, copying, maintenance of
consumables and office equipment, lost-document costs, and shipping costs. Such
solutions have helped companies save half a million dollars in annual shipping
costs alone.
ii. Reduced Errors: Unfortunately, people make mistakes. We forget things, make
typos, lose and misfile documents. Imaging and OCR will reduce expensive
errors that can cost a healthcare organization hundreds of thousands of dollars a
year.
3. Societal:
i. Availability: By scanning paper documents and extracting their information with
OCR, the document image and its information become available in multiple locations
and multiple systems, with no delay in searching for and retrieving them. With a
few clicks of the mouse, the document is available to those who need it.
ii. Security: In healthcare, document security is a constant concern. In addition to
the compliance issues brought by extensive regulation, there are real costs that
mishandled or inaccurate documents can incur. With electronic documents, access
can be strictly controlled, and any access to documents can be fully audited.
6.3 Reflections
The following reflections can be stated based on our working and completion of the project:
1. The model we had in our mind of a single tool for translation, text recognition and summarization
has been successfully implemented and is now a working webpage.
2. The webpage incorporates a number of features, including character recognition from the image,
text translation from one language to another, and summarization of long paragraphs. All these
tools can be used simultaneously with the help of pipelining.
3. We are working on our idea to make it more accurate and user-friendly.
• There are particular optimization techniques available for Otsu's binarization. We can
investigate and implement them.
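For reference, Otsu's method picks the threshold that maximizes the between-class variance of the grayscale histogram. The plain-Python sketch below is the textbook form and is only illustrative; in practice the same method is available through OpenCV's cv2.threshold with the THRESH_OTSU flag:

```python
def otsu_threshold(pixels):
    """Return the grayscale threshold maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg = 0.0       # weighted sum of the background (<= t) pixels
    weight_bg = 0      # count of background pixels
    best_var, best_t = 0.0, 0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t
```

Any optimization of Otsu's binarization would refine this exhaustive search over all 256 candidate thresholds.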
PROJECT METRICS
4. Machine Learning: The training and testing of the project essentially require knowledge of
machine learning to optimize accuracy.
As the above-mentioned subjects/disciplines are all essential to building this project, and the
work is fairly shared among them, this defines the interdisciplinary knowledge sharing.
Peer Evaluation Matrix: Evaluation By (S1-S4) versus Evaluation Of (S1-S4)
7.5 Role Playing and Work Schedule
TABLE 7: Role Playing and Work Schedule
7.6 Student Outcomes, Description and Performance Indicators
TABLE 8: Student Outcomes
SO | Student Outcome Description | Outcomes
B2 | Use appropriate methods, tools and techniques for data collection. | We have used the Born-Digital Image and ETL Character databases.
   | documentation, UML diagrams and testing |
D2 | Can play different roles as a team player. | Mutual understanding and team work have been heightened by discussion and helping each other.
I1 | Able to explore and utilize resources to enhance self-learning. | We took help from seniors, YouTube videos, courses by foreign universities, open-source codebases and guidance from our mentor.
I2 | Recognize the importance of lifelong learning. | Trying and failing is the best way to learn more and more about a subject.
Ans. Various sources of information were explored to list the possible project problems, such as
different research papers and journals; some help from the internet was also taken.
Q2.What analytical, computational and/or experimental methods did your project team use to obtain
solutions to the problems in the project?
Ans. Yes, the project demanded the demonstration of knowledge of fundamentals and scientific
principles. We used the basic principle of edge detection and line detection to form an outline border
for the image. Using this we de-skewed the image and applied basic thresholding techniques to
refine the image. The chopping and segmentation principle is used to chop characters into
meaningful recognized words.
Q4.How did your team share responsibility and communicate schedule information with others in the
team to coordinate design and manufacturing dependencies?
Ans: We have a WhatsApp group to communicate with each other. Apart from that, we divide our work
mostly in groups of two and sit together to do it. As far as responsibility is concerned, we all
share a common purpose of learning, so even if there is only one person in our group who knows how
to work with something that the other members don't, all the other members first learn that thing,
and then we all sit together to complete the project.
Q5.What resources did you use to learn new materials not taught in class for the course of the
project?
Ans. We mainly referred to online resources such as Wikipedia and YouTube in order to learn new
material that was not taught in class.
Q6.Does the project make you appreciate the need to solve problems in real life using
engineering and could the project development make you proficient with software development
tools and environments?
Ans: Yes, this project made us realize what we have learned in the last four years and how much
power we hold. It does not matter if we know some stuff or not. The Internet is there to pick us up
and help us start running. All that we need is the passion and willingness to do what we want.
We learned how to develop a website and worked with various image processing algorithms and
natural language processing techniques. So yes, this did help us get comfortable with many
environments and algorithms.
REFERENCES
[1] N. Venkata Rao, P. Kalyanchakravarthi, A.S.C.S. Sastry, A.S.N. Chakravarthy, "Optical
Character Recognition Technique Algorithms", Journal of Theoretical and Applied Information
Technology, vol. 83, no. 2, 2016.
[2] Er. Neetu Bala, "Optical Character Recognition Techniques: A Review", International Journal
of Advanced Research in Computer Science and Software Engineering, vol. 4, issue 5, 2014.
[3] Hiral Modi, M.C. Parikh, "A Review on Optical Character Recognition Techniques",
International Journal of Computer Applications (0975-8887), vol. 160, no. 6, February 2017.
[4] Yu Zhong, Kalle Karu, Anil K. Jain, "Locating Text in Complex Color Images", Pattern
Recognition, Elsevier Science Ltd., vol. 28, no. 10, pp. 1523-1535, 1995.
[5] Issam Bazzi, Richard Schwartz, John Makhoul, "An Omnifont Open-Vocabulary OCR System for
English and Arabic", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21,
pp. 495-504, 1999.
[6] P. Shankar Rao, J. Aditya, “Handwriting Recognition – “Offline” Approach”, Department of
CSE, Andhra University, 2010.
[7] J. Pradeep, E. Srinivasan, S. Himavathi, "Diagonal Based Feature Extraction for Handwritten
Alphabets Recognition System Using Neural Network", International Journal of Computer Science
& Information Technology (IJCSIT), vol. 3, no. 1, Feb 2011.
[8] John Hutchins “SUMMARIZATION: SOME PROBLEMS AND METHODS” [From:
Meaning: the frontier of informatics. Informatics 9. Proceedings of a conference jointly sponsored
by Aslib, the Aslib Informatics Group, and the Information Retrieval Specialist Group of the British
Computer Society, King's College Cambridge, 26-27 March 1987; edited by Kevin P. Jones.
(London: Aslib, 1987), p. 151-173.]
[10] S.V. Rice, F.R. Jenkins, T.A. Nartker, The Fourth Annual Test of OCR Accuracy,
Technical Report 95-03, Information Science Research Institute, University of Nevada, Las Vegas,
July 1995.
[11] R.W. Smith, The Extraction and Recognition of Text from Multimedia Document
Images, PhD Thesis, University of Bristol, November 1987.
[12] R. Smith, “A Simple and Efficient Skew Detection Algorithm via Text Row
Accumulation”, Proc. of the 3rd Int. Conf. on Document Analysis and Recognition (Vol. 2), IEEE
1995, pp. 1145-1148.
[13] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, Wiley-IEEE,
2003.
[14] S.V. Rice, G. Nagy, T.A. Nartker, Optical Character Recognition: An Illustrated Guide
to the Frontier, Kluwer Academic Publishers, USA 1999, pp. 57-60.
[15] P.J. Schneider, “An Algorithm for Automatically Fitting Digitized Curves”, in A.S.
Glassner, Graphics Gems I, Morgan Kaufmann, 1990, pp. 612-626.
[17] B.A. Blesser, T.T. Kuklinski, R.J. Shillman, "Empirical Tests for Feature Selection Based
on a Psychological Theory of Character Recognition", Pattern Recognition 8(2), Elsevier, New
York, 1976.
[18] M. Bokser, “Omnidocument Technologies”, Proc. IEEE 80(7), IEEE, USA, Jul 1992, pp.
1066-1078.
[19] H.S. Baird, R. Fossey, “A 100-Font Classifier”, Proc. of the 1st Int. Conf. on Document
Analysis and Recognition, IEEE, 1991, pp 332-340.
[20] G. Nagy, “At the frontiers of OCR”, Proc. IEEE 80(7), IEEE, USA, Jul 1992, pp 1093-
1100.
[21] G. Nagy, Y. Xu, “Automatic Prototype Extraction for Adaptive OCR”, Proc. of the 4th
Int. Conf. on Document Analysis and Recognition, IEEE, Aug 1997, pp 278-282.
PLAGIARISM REPORT
0% - http://jts2019.com/session-programme/
0% - https://www.codeproject.com/articles/167
0% - https://jivp-eurasipjournals.springerope
0% - https://github.com/tesseract-ocr/tessera
0% - http://usednissanpatrol.com.au/repair-ma
0% - http://homepages.inf.ed.ac.uk/rbf/HIPR2/
0% - http://www.pondiuni.edu.in/storage/dde/d
0% - https://www.ittoolkit.com/articles/proje
0% - https://quizlet.com/5104182/glossary-of-
0% - https://www.academia.edu/3732087/A_surve
0% - http://www.bing.com/Translator
0% - https://monkeylearn.com/text-analysis/
0% - https://newhorizonsmt.wixsite.com/websit
0% - https://businessofsoftware.org/2011/08/j
0% - https://kintronics.com/how-alpr-works/
0% - https://www.learntoplay.net/1-simple-tri
0% - https://cdn.ymaws.com/www.a4pt.org/resou
0% - https://www.ijsr.in/upload/1746695408Cha
0% - https://www.researchgate.net/publication
0% - https://apps.apple.com/gb/app/microsoft-
0% - https://www.online-translator.com/About/
0% - https://www.researchgate.net/publication
0% - https://www.pyimagesearch.com/2018/09/26
0% - https://source.android.com/compatibility
0% - https://stackoverflow.com/questions/3207
0% - https://stackoverflow.com/questions/2039
0% - https://www.cips.org/Documents/Qualifica
0% - https://www.ukessays.com/essays/manageme
0% - https://www.adafruit.com/categories/105
0% - https://reqtest.com/requirements-blog/fu
0% - https://nrcan.gc.ca/energy/efficiency/bu
0% - https://careerfoundry.com/en/blog/ux-des
0% - https://stackoverflow.com/questions/4801
0% - https://aapm.onlinelibrary.wiley.com/doi
0% - https://www.slideshare.net/abubashars/su
0% - https://www.researchgate.net/profile/Chi
0% - https://study.com/academy/lesson/writing
0% - https://html.com/input-type-button/
0% - https://www.mathworks.com/help/images/re
0% - http://www.bibalex.org/isis/UploadedFile
0% - https://www.nltk.org/api/nltk.html
0% - https://textminingonline.com/getting-sta
0% - https://www.smartdraw.com/gantt-chart/st
0% - https://www.uml-diagrams.org/component-d
0% - https://www.tutorialspoint.com/uml/uml_c
0% - https://www.tutorialspoint.com/software_
0% - https://www.conceptdraw.com/examples/and
0% - https://www.hosiaisluoma.fi/blog/archima
0% - https://www.microsoft.com/en-au/store/co
0% - https://www.modishproject.com/civil-serv
0% - https://www.sejda.com/compress-pdf
0% - http://www.promeng.eu/downloads/training
0% - https://www.academia.edu/23879666/Evalua
0% - https://biopython.org/DIST/docs/tutorial
0% - https://www.codesdope.com/cpp-dynamic-me
0% - https://opencvpython.blogspot.com/2012/0
0% - https://www.pyimagesearch.com/2014/08/25
0% - https://plus.cs.aalto.fi/o1/2018/w05/ch0
0% - https://datascienceplus.com/how-to-extra
0% - https://minhld.wordpress.com/2017/06/
0% - https://www.pyimagesearch.com/2014/08/25
0% - https://biopython.org/DIST/docs/tutorial
0% - https://datascienceplus.com/how-to-extra
0% - http://docshare.tips/opencv-tutorials-24
0% - https://www.sciencedirect.com/science/ar
0% - https://www.coursehero.com/file/p27o563b
0% - https://quizlet.com/19668018/web-111-fla
0% - https://theailearner.com/tag/image-proce
0% - https://mafiadoc.com/calculus-applicatio
0% - https://opticspy.github.io/lightpipes/de
0% - https://arxiv.org/pdf/1003.5893.pdf
0% - http://static.googleusercontent.com/medi
0% - https://docs.microsoft.com/en-us/windows
0% - https://www.researchgate.net/publication
0% - http://static.googleusercontent.com/medi
0% - https://storage.googleapis.com/pub-tools
0% - https://epdf.pub/the-fuzzy-systems-handb
0% - https://github.com/tesseract-ocr/tessera
0% - https://machinelearningmedium.com/2019/0
0% - https://dl.acm.org/citation.cfm?id=31673
0% - https://www.gutenberg.org/files/28490/28
0% - https://home.deib.polimi.it/gini/robot/d
0% - http://static.googleusercontent.com/medi
0% - https://www.academia.edu/30969770/An_Ove
0% - https://machinelearningmedium.com/2019/0
0% - http://www.uap-bd.edu/ce/anam/Anam_files
0% - https://www.academia.edu/30969770/An_Ove
0% - http://ecomputernotes.com/fundamental/in
0% - https://mnsl-journal.springeropen.com/ar
0% - https://storage.googleapis.com/pub-tools
0% - https://static.googleusercontent.com/med
0% - https://www.yahoo.com/
1% - https://www.academia.edu/30969770/An_Ove
0% - https://static.googleusercontent.com/med
0% - http://lili.org/forlibs/ce/able/course7/
0% - https://www.academia.edu/32328825/An_ove
0% - https://www.researchgate.net/publication
2% - https://www.cs.bgu.ac.il/~elhadad/hocr/
0% - https://statistics.berkeley.edu/computin
1% - https://www.academia.edu/30969770/An_Ove
0% - http://www.infitt.org/ti2014/papers/121_
1% - https://www.academia.edu/30969770/An_Ove
0% - https://www.litcharts.com/literary-devic
0% - https://pypi.org/project/googletrans/
0% - https://davidwalsh.name/google-translate
0% - https://unitedlanguagegroup.com/blog/why
0% - https://github.com/elzeard91/py-googletr
0% - http://hayko.at/vision/dataset.php
0% - https://www.federalregister.gov/document
0% - https://www.jeffbullas.com/meta-titles-a
0% - http://dagdata.cvc.uab.es/icdar2013compe
0% - https://www.securityinformed.com/news/da
0% - https://photographylife.com/advantages-a
0% - http://szeliski.org/Book/drafts/Szeliski
0% - http://etlcdb.db.aist.go.jp/
0% - https://cedar.buffalo.edu/~srihari/paper
0% - https://link.springer.com/article/10.100
0% - https://www.jeita.or.jp/english/about/20
0% - https://www.topuniversities.com/universi
0% - https://patents.google.com/patent/US7730
0% - https://www.sightline.us/images/pdf/ETL_
1% - https://csce.ucmss.com/cr/books/2018/LFS
0% - https://dev.mysql.com/doc/en/char.html
1% - https://csce.ucmss.com/cr/books/2018/LFS
1% - https://csce.ucmss.com/cr/books/2018/LFS
1% - https://csce.ucmss.com/cr/books/2018/LFS
0% - http://computationalculture.net/out-of-b
0% - https://www.researchgate.net/publication
0% - https://pinoybix.org/2014/11/mcqs-in-dig
0% - https://stackoverflow.com/questions/1321
0% - https://en.m.wikipedia.org/wiki/Units_of
0% - https://journals.plos.org/ploscompbiol/a
0% - https://en.wikipedia.org/wiki/Kebab_case
0% - https://mafiadoc.com/proceedings-of-pape
1% - https://medium.com/cashify-engineering/i
0% - https://www.geeksforgeeks.org/python-thr
0% - https://www.pyimagesearch.com/2014/09/08
0% - https://ufo-filters.readthedocs.io/en/ma
0% - https://gregorkovalcik.github.io/opencv_
0% - https://docs.opencv.org/2.4/modules/imgp
0% - https://opencvpython.blogspot.com/
0% - https://medium.com/@Kittipop.P/concept-o
0% - https://docs.opencv.org/3.4/d7/d4d/tutor
0% - https://docs.opencv.org/trunk/d7/d4d/tut
0% - https://stackoverflow.com/questions/4292
0% - https://support.minitab.com/en-us/minita
0% - http://aircconline.com/cseij/V6N1/6116cs
0% - https://docs.opencv.org/trunk/d7/d4d/tut
0% - https://flask-limiter.readthedocs.io/en/
0% - https://docs.opencv.org/trunk/d7/d4d/tut
0% - http://ijarcet.org/wp-content/uploads/IJ
1% - https://medium.com/cashify-engineering/i
0% - https://www.altexsoft.com/blog/datascien
1% - https://medium.com/cashify-engineering/i
1% - https://medium.com/cashify-engineering/i
0% - https://dl.acm.org/citation.cfm?id=25952
0% - https://en.wikipedia.org/wiki/Programmin
0% - https://www.academia.edu/5853095/A_Revie
0% - https://www.researchgate.net/publication
0% - https://owl.purdue.edu/owl/research_and_