Graduation Project: Text Recognition in Drug Images: A Comparative Study of Several End-To-End OCR and Enhancements
Graduation Project
Report
presented at
by
To all my family,
To all my friends,
First of all, praise, gratitude and thanks to God, the most powerful and merciful, who
gave me the opportunity and the ability and made me believe in every moment of despair
that everything is possible, that the next is easier and that success is a matter of time.
I would like to express my sincere thanks to my academic supervisor Mr. Slim Kanoun,
professor at the National School of Engineers of Sfax, for the quality of his supervision,
his continuous encouragement, his support and his precious advice which have been very
beneficial and fruitful.
I would also like to thank my industrial supervisor Mr. Slim Kanoun, professor at the
National School of Engineers of Sfax, for his availability, for the good working conditions,
his precious advice and above all for his patience and continual encouragement throughout
this project.
Also, I express my great gratitude to the members of the jury for having honored me
by accepting to evaluate this work. I hope they find in it the qualities of clarity and
motivation that they expect.
Finally, I would like to express my deep gratitude to all the teachers at the national
engineering school of Sfax for their availability and kindness during my training cycle at
this institution.
CONTENTS Khaled Ben chikha
Contents
Contents
List of Tables
Acronyms
Implementation
3.1 Introduction
3.2 Hardware Environment
3.3 Software Environment
3.3.1 Programming language and frameworks
3.3.2 Used libraries
3.4 Interfaces
3.5 Code-based realization and results
3.6 Conclusion
List of Figures
List of Tables
Acronyms
AI: Artificial Intelligence
NLP: Natural Language Processing
ML: Machine Learning
DL: Deep Learning
OS: Operating System
NER: Named Entity Recognition
PDF: Portable Document Format
CNN: Convolutional Neural Network
OCR: Optical Character Recognition
RNN: Recurrent Neural Network
CV: Computer Vision
HOG: Histogram of Oriented Gradients
SVM: Support Vector Machines
LSTM: Long Short-Term Memory
CTC: Connectionist Temporal Classification
SAST: Static Application Security Testing
General Introduction
For decades, the volume of documents to be processed and the burden of manual
data entry have kept increasing. At the same time, rapid technological progress
has brought acquisition equipment such as scanners and cameras, significant
innovation in information technology, the emergence and development of
artificial intelligence, and major improvements in computer hardware.
Additionally, most business workflows involve receiving information from print
media. Paper forms, invoices, scanned legal documents, and printed contracts are
all part of business processes. These large volumes of paperwork take a lot of
time and space to store and manage. Although paperless document management
is the way forward, scanning documents to images creates its own challenges:
the process requires manual intervention and can be tedious and slow.
Moreover, digitizing this documentary content produces image files in which
the text is locked away: text in images cannot be processed by word processors
the way text documents can. All of this has led to the emergence of OCR
technology to solve these problems.
OCR stands for Optical Character Recognition. It is a CV task that combines
several CV techniques such as text localization, detection, and recognition. It
denotes the process of converting printed material into word-processing or text
files that can be read, edited, and managed on a computer, using AI models based
on ML and DL algorithms.
OCR itself has gone through many stages to reach its current level of
technological maturity and importance. It began with cheque recognition, the
scanning of postal addresses and of forms in the first experiments, then moved
to old archive centers and far more complex administrative documents, to arrive
today at OCR systems capable of recognizing entire documents of hundreds of
pages, such as books, and producing a digital copy in a few minutes. This has
vastly increased the human knowledge accessible via the web. Thanks to its
comprehensiveness, ease of use and availability, this conceptually simple but
technically vast research field has also reached the medical domain, which has
benefited enormously from OCR.
One of the most widely used OCR frameworks is EasyOCR. It has many advantages,
such as high average recognition accuracy, ease of use and fast processing. It
also has significant drawbacks: it is not robust to low-quality images, and it
does not perform well on images affected by artifacts such as partial occlusion,
distorted perspective or complex backgrounds. Overall, it is a medium-quality
OCR: neither the worst nor the best available today. This is precisely why we
chose it; as this comparative study will explain, our aim is to enhance its
results by applying it to the medical field, and specifically to drug images.
This project is structured around three chapters and closes with a conclusion.
The chapters are presented as follows:
• Chapter 1: describes the context of the project and sets out the state of
the art on end-to-end systems for text detection and recognition. First, we
detail the related work on the handcrafted approach. Second, we discuss the
related work on the deep learning approach. Finally, we deal with the
existing solutions.
Chapter 1
A comparative study of several end-to-end OCR on drug image text recognition
1.1 Introduction
This first chapter presents the general context of our project. Subsequently,
we deal with end-to-end systems for text detection and recognition. It starts
with the related work on the handcrafted approach, used only for text detection
and localization. After that, we present the related work on the deep learning
approach, used for both text detection and recognition. Finally, we close with
a discussion of existing solutions.
Machine learning is a subset of AI that uses computer algorithms to analyze data and make
intelligent decisions based on what it has learned, without being explicitly programmed.
A crucial aspect of machine learning is that it allows machines to solve problems on their
own and make accurate predictions using the data provided.
Deep learning is a dedicated subset of machine learning that uses layered neural networks
to simulate human decision making. Deep learning algorithms may be used to label and
classify data and recognize patterns. This allows AI systems to continuously learn on the
job, and to improve the quality and accuracy of results by figuring out whether decisions
were correct.
Computer vision is a field of artificial intelligence (AI) that enables computers and systems
to derive meaningful information from digital images, videos and other visual inputs and
take actions or make recommendations based on that information. If AI enables computers
to think, computer vision enables them to see, observe and understand.
Computer vision works much the same as human vision, except humans have a head
start. Human sight has the advantage of lifetimes of context to train how to tell objects
apart, how far away they are, whether they are moving and whether there is something
wrong in an image.
Existing image features may be roughly divided into two classes: handcrafted
and learned. By handcrafted features we mean those extracted from individual
images according to a manually predefined algorithm based on expert knowledge.
The handcrafted approach was thus the traditional technique before the
appearance of deep learning. Handcrafted features were usually fed to
traditional machine learning methods for object recognition and computer
vision, such as Support Vector Machines. Newer methods such as convolutional
neural networks, however, generally do not need them, since they can learn the
features from the image data. The handcrafted approach is best used for the
detection phase; it cannot be employed in the recognition phase because of the
complexity of that task and the approach's limitations, and it needs an ML
algorithm to perform the recognition task. Many feature extraction techniques
have been proposed in the literature for image processing. HOG and DCT have
been used in many pattern recognition problems. Researchers used DCT features
in their work on a word-based recognition system, and their results showed
better accuracy than other strategies. Others used the Radon transform and DCT
for feature extraction, the Radon transform serving to enhance the
low-frequency component, with an SVM used for classification; experimental
results showed the proposed features to be efficient. DCT is also used for
vehicle plate recognition, video text detection and iris recognition. There is
also an algorithm for ruling estimation of historical manuscripts.
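As an illustration of a handcrafted descriptor, the gradient-orientation histogram at the core of HOG can be sketched in pure Python. This is only a minimal sketch: the patch and bin count are illustrative, and real HOG implementations add cell grids, overlapping blocks and normalisation.

```python
import math

def orientation_histogram(gray, bins=9):
    """Histogram of gradient orientations over a small grayscale patch.

    `gray` is a 2-D list of intensities. Gradients are taken with
    central differences; each interior pixel votes its gradient
    magnitude into one of `bins` bins over [0, 180) degrees (unsigned).
    """
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang // (180.0 / bins)) % bins] += mag
    return hist

# A patch with a vertical edge: all gradient energy falls in the 0-degree bin.
patch = [[0, 0, 10, 10]] * 4
print(orientation_histogram(patch))
```

Such a histogram, computed per cell and concatenated, is exactly the kind of fixed, expert-designed feature vector that an SVM would then classify.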
local features which ultimately define the digits. An extensive study of
back-propagation for optical recognition of both handwritten letters and digits
is given by Martin and Pittman. Keeler, Martin, and others at MCC worked on
combining segmentation and recognition in a single integrated system. That is
important when characters touch in such a way that they cannot be separated by
a straightforward segmentation process. Numerous comparative studies have also
been carried out, either by fixing the data set and varying the techniques or
by also using a number of data sets. An early reference compares simple and
multilayer perceptrons with statistical distance-based classifiers such as ANN
in recognizing handwritten digits for the automatic reading of postal codes. A
comparison of CNNs, multilayer perceptrons, and radial basis functions in
recognizing handwritten digits is given by Lee, together with an analysis of
the task and of various neural and traditional approaches. For the task of
optical handwritten character recognition, a significant step was the
production of a CD-ROM (Special Database 3) by NIST, which includes a huge set
of digitized character images and computer subroutines that process them. A
number of the above-mentioned works use this data set or its predecessor; it is
available by writing to NIST. A comparison of four statistical and three neural
network classifiers is given by Blue et al. for optical character recognition
and for a similar task, fingerprint recognition (for which a comparable CD-ROM
was also made available through NIST). Researchers from NIST carried out
several studies using this data set, and technical reports can be accessed over
the internet. Recently, with the drop in the cost of computing power and
memory, it has become feasible to build multiple systems for the same task and
combine them to improve accuracy. One technique is to run parallel models and
then take a vote. Another is to cascade models, where simpler models classify
easier images and more complex techniques classify images of poorer quality.
Otherwise, deep learning may be compared to the handcrafted approach as in the
figure below.
1.5.1.1 EasyOCR
EasyOCR is a Python-based OCR library made by the Jaided AI organization which
extracts text from images. It is a ready-to-use OCR with more than forty
supported languages, including Chinese, Japanese, Korean and Thai. It is an
open-source project licensed under Apache 2.0.
It performs the usual pre-processing steps (grey-scaling, etc.) within the
library and extracts the text. It also applies the CRAFT algorithm to detect
the text. CRAFT is a scene text detection method that locates text regions by
exploring each individual character and the affinity between characters. The
recognition model uses a CRNN. Sequence labelling is carried out by an LSTM
and CTC; here, CTC serves to label unsegmented sequence data with an RNN.
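The grey-scaling and binarization pre-processing steps mentioned above can be sketched in pure Python. This is only an illustration of the idea: the pixel values are made up, and EasyOCR performs its own internal pre-processing.

```python
def to_grayscale(rgb_image):
    """Convert a 2-D grid of (R, G, B) triples to luminance values
    using the common ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def binarize(gray_image, threshold=128):
    """Simple global thresholding: 1 for 'ink', 0 for background."""
    return [[1 if px < threshold else 0 for px in row]
            for row in gray_image]

rgb = [[(255, 255, 255), (0, 0, 0)],
       [(30, 30, 30), (250, 250, 250)]]
print(binarize(to_grayscale(rgb)))  # -> [[0, 1], [1, 0]]
```

Real OCR pipelines typically use an adaptive threshold (e.g. Otsu's method) rather than a fixed one, but the principle is the same.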
EasyOCR exposes multiple hyper-parameters for its read-text method, covering
each layer of the processing mechanism, and it also supports hyper-parameters
that belong to CRAFT; GPU processing is supported as well. Its output is a
nested array in which the first element gives the coordinates that can be used
to mark the text in the image, the next is the actual text, and the last is the
confidence value. When forming a sentence from the extracted text, the order in
which the texts are extracted causes some inconvenience.
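The nested output described above can be consumed as follows. The EasyOCR calls are shown only as comments, since they require the `easyocr` package, a model download and an input image (the file name is illustrative); the result list mocks the documented (bounding box, text, confidence) structure.

```python
# Typical EasyOCR call (commented out -- requires the easyocr package,
# a model download and an input image such as 'drug_box.jpg'):
#
#   import easyocr
#   reader = easyocr.Reader(['en'], gpu=False)
#   results = reader.readtext('drug_box.jpg')

def confident_text(results, min_confidence=0.5):
    """Keep only the recognised strings whose confidence passes the bar."""
    return [text for _box, text, conf in results if conf >= min_confidence]

# Mocked result in EasyOCR's (bounding box, text, confidence) format:
results = [
    ([[0, 0], [120, 0], [120, 24], [0, 24]], "DOLIPRANE", 0.94),
    ([[0, 30], [70, 30], [70, 50], [0, 50]], "1000 mg", 0.88),
    ([[0, 60], [40, 60], [40, 80], [0, 80]], "~#!", 0.12),
]
print(confident_text(results))  # -> ['DOLIPRANE', '1000 mg']
```

Filtering on the confidence value in this way is a simple first defence against the spurious detections that low-quality drug images tend to produce.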
The quality of the source image is important. If the quality of the original
source is good, that is, if the human eye can read it clearly, good OCR results
are achievable. However, if the original source itself is not clear, then OCR
results will most likely contain mistakes. The higher the quality of the
original source image, the easier it is to distinguish characters from the
rest, and the higher the OCR accuracy will be.
1.5.1.2 Tesseract
It was only recently, in 2005, that some people at HP decided to bring it back
to life by releasing the code, with help from the Information Science Research
Institute and some debugging from Google. The latter, greatly interested in OCR
techniques, is now responsible for its development and improvements. Tesseract
is a bare recognition engine, in the sense that it does not offer a user
interface, perform layout analysis, or format the results it produces. Another
of its limitations is that it only recognizes US-ASCII characters and
consequently only works correctly with documents written in English. Finally,
the acquisition of grayscale or colour documents remains difficult. It can be
compiled and run under both GNU/Linux and Microsoft systems. Its average
accuracy is 0.97.
EasyOCR supports GPU execution, performs well on GPU, and provides a confidence
score for the extracted text which can be used for further analysis. EasyOCR
copes better with noisy images than Tesseract.
Tesseract supports a customized pre-processing layer based on the user's needs
and works faster on batches of images. It returns its output as a sentence,
which is not the case with EasyOCR. Its performance is directly linked to the
quality of the image. It has a configuration option to extract only digits, and
it also supports training on customized data.
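The digits-only configuration could be invoked roughly as below. The pytesseract call is commented out because it needs the `pytesseract` package and the Tesseract binary, and the file name is illustrative; the `--psm` and `tessedit_char_whitelist` flags are classic Tesseract configuration variables. A small stdlib helper then pulls dosage-like digit groups out of any recognised string.

```python
import re

# Typical pytesseract call (commented out -- requires the pytesseract
# package and the tesseract binary; 'label.png' is illustrative):
#
#   import pytesseract
#   digits = pytesseract.image_to_string(
#       'label.png', config='--psm 6 -c tessedit_char_whitelist=0123456789')

def digit_groups(text):
    """Post-processing fallback: extract runs of digits from OCR output."""
    return re.findall(r"\d+", text)

print(digit_groups("PARACETAMOL 500 mg - 16 tablets"))  # -> ['500', '16']
```

For drug labels, where dosages are the most safety-critical tokens, such a digits-only pass is a useful cross-check against the full-text recognition.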
We would also like to point out the general limitations of both Tesseract and
EasyOCR. In general, poor-quality scans produce poor-quality OCR. If a document
contains languages outside those given in the language arguments, results may
be poor. On handwritten text, both give low results. Neither does well with
images affected by artifacts including partial occlusion, distorted
perspective, and complex backgrounds. Both Tesseract and EasyOCR are good at
scanning clean documents, where they yield higher accuracy, and both support
LSTM.
1.5.2.1 EasyOCR
1.5.2.2 PaddleOCR
PaddleOCR, developed by Baidu, is based on the deep learning framework
PaddlePaddle (PArallel Distributed Deep LEarning). It supports Linux, Windows,
macOS, and other systems.
Text detection models: EAST, DB, SAST. Text recognition models: CRNN, Rosetta,
STAR-Net, RARE, SRN.
You can train and deploy PaddleOCR on servers, mobile (both iOS and Android),
embedded, and IoT devices. It has a Paddle Lite library that allows integration
with cross-platform hardware for easy deployment. It supports both CPU and GPU;
for faster computing, GPU is preferred.
PaddleOCR supports the Python programming language, but for inference and
deployment you can use C++ as well as Python. Various options are available for
serving and benchmarks. You can easily install any package using the pip
package manager, and a GitHub repo exists to convert Paddle models to PyTorch.
PaddleOCR is not always the most accurate, but after some post-processing it
offers tough competition to Tesseract, especially for the Chinese language. At
the time of writing, it supports more than 80 languages, including Korean,
German, and French.
1.5.2.3 GOCR
GOCR was developed under the GNU public license by Joerg Schulenburg. (It was
initially called GNU OCR but was later changed to Joerg's OCR.) It supports
input formats such as TIFF, GIF, PNG, PNM, PBM, and BMP and outputs a text
file. It supports Windows, Linux, and OS/2.
You can integrate GOCR with different frontends, which makes it easy to port to
different operating systems and architectures. You don't have to train the
program or store large fonts; you can simply call it from the command line to
get the results.
It is not always accurate, as it has difficulty reading handwritten text, noisy
images, and overlapping characters. It is available in English and can also
translate barcodes. At the time of writing, GOCR is not actively maintained,
with no new release since 2018, and it doesn't appear to have significant
community support, either.
It works on the CPU but not on the GPU. It was written in the C programming
language. A few wrappers are available, such as gocr-php, a Golang
implementation, and GOCR.js.
1.5.2.4 Cognitive OpenOCR (Cuneiform)
Cognitive OpenOCR (Cuneiform), by Cognitive Technologies, was developed by
combining databases from other openware OCR applications with user input and
feedback. It supports twenty to thirty languages, including Russian, English,
Turkish, and Italian.
As its database is built in, you don't need an internet connection to use it;
however, it has not been actively maintained since 2019 and does not provide
community support.
Most of the time, outputs need editing, and the tool gives poor results on
low-contrast images, making it less accurate. It works on the CPU but does not
support GPU. It was written in C and C++, and wrappers are available on the
internet.
1.6 Conclusion
In this chapter, we discussed the handcrafted approach and its techniques.
Then, we detailed the related work on the deep learning approach. After that,
we moved on to optical character recognition systems and made a comparative
study between OCR systems, retaining only EasyOCR, as it is the most suitable
for our project context.
Chapter 2
Experimental study on text detection and recognition on drug images
2.1 Introduction
In this chapter, we provide an experimental study of text detection and
recognition on drug images. We begin with the handcrafted approach results.
Then, we dive into the deep learning approach, starting with the EasyOCR
experimental results and finishing with the CNN-based OCR results. Finally, we
end with a general performance evaluation of our system.
The handcrafted approach is mainly based on SVM, KNN and many other traditional
techniques; it can only perform the text detection task, which in the EasyOCR
framework is handled by the CRAFT system. So, in this section, we compare
handcrafted approaches to the detection part of our system only, namely CRAFT.
Handcrafted techniques here are mainly SVM- and KNN-based text detection. These
techniques are traditional and give poor results compared to advanced text
detection techniques such as CRAFT, which is part of the EasyOCR framework.
CRAFT stands for Character Region Awareness For Text detection. Traditional
character-level bounding-box detection techniques are not adequate for such
regions, and ground-truth generation for character-level text processing is a
tedious and costly task. The CRAFT text detection model uses a convolutional
neural network to compute region scores and affinity scores: the region score
is used to localize character regions, while the affinity score is used to
group the characters into text regions.
CRAFT uses a fully convolutional neural network based on the VGG16 model.
Inference produces word-level bounding boxes. The CRAFT text detection model
works well at various scales, from large to small texts, and has been shown to
be effective on unseen datasets as well. However, on Arabic and Bangla text,
which have continuous characters, the detection performance of the model is not
up to the mark.
The time taken by the CTPN, EAST and MSER text detection methods is lower than
that of the CRAFT text detection engine. However, CRAFT is more accurate, and
its bounding boxes are more precise when the text is long, curved, rotated or
deformed, as illustrated in the figures below.
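CRAFT's grouping of the thresholded region-score map into character boxes can be illustrated with a toy connected-components pass. This is only a sketch of the grouping idea: the score map and threshold are made up, and the real model computes its scores from VGG16-based feature maps.

```python
def character_regions(score_map, threshold=0.5):
    """Threshold a 2-D region-score map and return one bounding box
    (min_x, min_y, max_x, max_y) per 4-connected component."""
    h, w = len(score_map), len(score_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or score_map[y][x] < threshold:
                continue
            # Flood-fill one component and track its bounding box.
            stack, box = [(x, y)], [x, y, x, y]
            seen[y][x] = True
            while stack:
                cx, cy = stack.pop()
                box = [min(box[0], cx), min(box[1], cy),
                       max(box[2], cx), max(box[3], cy)]
                for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] \
                            and score_map[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            boxes.append(tuple(box))
    return boxes

scores = [[0.9, 0.8, 0.0, 0.7],
          [0.9, 0.0, 0.0, 0.7]]
print(character_regions(scores))  # -> [(0, 0, 1, 1), (3, 0, 3, 1)]
```

In CRAFT itself, the affinity score then decides which of these character regions are merged into one word box.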
CRAFT is very advanced and gives more beneficial results than traditional
systems, as shown in the figure below.
In this section, we detail some advanced approaches used recently to obtain the
best accuracy in text detection as well as in text recognition, based mainly on
deep learning. Thus, we begin with the EasyOCR approach and its main results.
Then, we detail the other techniques based on CNNs. Finally, we close with a
general performance evaluation in which we compare the two cited techniques to
our system.
2.3.1.3 EasyOCR compared to other OCR systems in terms of accuracy and speed
Here, we compare EasyOCR to several other OCR systems in terms of accuracy and
processing speed, as illustrated by the figures below.
Interpreting these results, we deduce that EasyOCR is more accurate but
relatively slow. In our case the framework still needs enhancement, because
medicine recognition has critical accuracy needs and fewer speed concerns.
Accordingly, in our system we enhance this framework with a segmentation step
based on the EAST detector and segmentation code, and with an auto-correction
algorithm based on dictionary comparison, which we discuss in the
implementation phase.
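The dictionary-based auto-correction idea can be sketched with the standard library's difflib. The drug-name dictionary below is a made-up example for illustration; the project would use a real medicine lexicon, and the cutoff value is an assumption.

```python
import difflib

# Illustrative lexicon -- a real system would load a medicine dictionary.
DRUG_DICTIONARY = ["doliprane", "paracetamol", "ibuprofen", "amoxicillin"]

def autocorrect(word, dictionary=DRUG_DICTIONARY, cutoff=0.6):
    """Snap an OCR token to the closest dictionary entry, if any is
    similar enough; otherwise return the token unchanged."""
    matches = difflib.get_close_matches(word.lower(), dictionary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(autocorrect("PARACETAM0L"))  # common OCR confusion: letter O read as 0
print(autocorrect("xyz123"))       # no close match: token returned unchanged
```

Because drug names come from a closed vocabulary, even this simple edit-similarity snap can repair many character-level OCR mistakes without retraining the recognizer.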
We conclude that EasyOCR has great accuracy and an acceptable processing speed
compared to many other OCR systems. In addition, EasyOCR is the best fit for
our project context. So, given the high accuracy needs of medical applications
in general and of our case in particular, we have retained this system for our
project, while enhancing the technique to be more suitable and to show better,
more realistic results.
In the next subsection, we discuss other CNN-based systems together with their
experimental results and interpretations.
This is an end-to-end CNN-based system for image text detection and
recognition. The model used for the problem is called a Convolutional Recurrent
Neural Network (CRNN), since it combines Deep Convolutional Neural Networks
(DCNN) and RNNs to construct an end-to-end system for sequence recognition. The
model consists of three components: the convolutional layers, the recurrent
layers, and a transcription layer. The convolutional layers automatically
extract a feature sequence from each input image. On top of the convolutional
network, a recurrent neural network is built to make a prediction for each
frame of the feature sequence output by the convolutional layers. The
transcription layer at the top of the CRNN translates the per-frame predictions
of the recurrent layers into a label sequence. Although the CRNN is composed of
different kinds of network architectures (DCNN and RNN), it can be jointly
trained with one loss function.
A deep bi-directional recurrent neural network is built on top of the
convolutional layers as the recurrent layers. The recurrent layers predict a
label distribution y_t for each frame x_t in the feature sequence
x = (x_1, ..., x_T). We use a bi-directional RNN here because, in image-based
sequences, contexts from both directions are useful and complementary to each
other.
2.3.2.3 Transcription
Transcription is the process of converting the per-frame predictions made by
the RNN into a label sequence. We use Connectionist Temporal Classification
(CTC) in our transcription process to decode the output of the RNN and convert
it into a text label. Let us now discuss Connectionist Temporal Classification.
The sequence labeling problem consists of input sequences X = [x1, x2, ..., xT]
and corresponding output sequences Y = [y1, y2, ..., yU]. We need to find an
accurate mapping from X's to Y's, but there are a few issues: both X and Y can
vary in length, and we don't have an exact alignment (correspondence of
elements) between X and Y. The CTC algorithm overcomes these challenges. For a
given X, it gives us an output distribution over all possible Y's, which we can
use either to infer a likely output or to assess the probability of a given
output. This is only a brief overview of CTC; more detailed treatments of
connectionist temporal classification are available in the literature.
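CTC's decoding rule (merge consecutive repeats, then drop blanks) can be shown with a greedy best-path decoder over per-frame predictions. The alphabet and frame outputs here are toy values for illustration; real decoders work on the full per-frame probability matrix and may use beam search instead.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding: merge consecutive repeats, remove blanks.

    `frame_labels` holds the arg-max label index of each RNN time-step.
    """
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Toy alphabet: 0 = CTC blank, 1 = 'a', 2 = 'b'.
alphabet = {1: "a", 2: "b"}
frames = [1, 1, 0, 1, 2, 2, 0]          # per-frame arg-max indices
text = "".join(alphabet[i] for i in ctc_greedy_decode(frames))
print(text)  # -> "aab"
```

Note how the blank between the first and second 'a' is what allows CTC to output a genuinely repeated character rather than collapsing it.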
2.3.2.5 Modeling
We trained two CRNN models on the same data to see which of the two gives
better results.
Model 1: a CNN with a bi-directional LSTM as the RNN, trained with the Adam
optimizer.
Model 2: a CNN with a bi-directional GRU as the RNN, trained with the RAdam
optimizer. Rectified Adam (RAdam) is a recent state-of-the-art optimizer.
Model 1
Train stage: takes the input image, the text labels (encoded as integers), the
input length (number of time-steps) and the label length, and outputs the CTC
loss. Prediction stage: takes the input image and outputs a matrix of
dimensions 42x37, where 42 is the number of RNN time-steps and 37 is the number
of letters plus 1 character for the CTC blank.
Model 1 was trained on 200,000 images and validated on 12,000 images for 20
epochs with early stopping.
Model 2
The Model 2 architecture has train and prediction stages like Model 1. It was
trained on the same set of images as Model 1 for 20 epochs with early stopping.
Summary
Let us look at the summary of results for both models on the test data images.
Of the two models, Model 1 has the better results. Here is another comparison
of the cited models using a sample image.
Conclusion: Model 1, trained on 200,000 images from the SynthText dataset,
performs reasonably well on 15,000 unseen test images with variable-length
labels, with an accuracy of 0.88 and a letter accuracy of 0.94. Model 2, also
trained on the same 200,000 images, has an accuracy of 0.82 and a letter
accuracy of 0.93 on the same 15,000 test images.
As further improvements of this end-to-end CNN-based text detection and
recognition model, we could train the models on more images, e.g. 300,000 or
400,000, to see whether the results improve; experiment with the RAdam
optimizer for Model 1 instead of Adam; and, since the model is trained on
single-word images, train it further to recognize special characters and
sentences. In conclusion, CNN-based systems outperform standard OCR systems
such as EasyOCR or Tesseract. We now apply a performance evaluation that
compares our system to the handcrafted and deep learning approaches cited
above.
In this section, we compare our system, based on an enhanced EasyOCR, to the handcrafted approaches as well as to plain EasyOCR and the CNN systems.
The general performance comparison between our enhanced system and the previously cited methods is illustrated in the following table.
As an interpretation, we conclude that our system achieves full accuracy with little time consumption, which can be very beneficial for extracting valuable information from medicine images.
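Such a comparison can be produced with a small harness that measures exact-match accuracy and runtime per system. The sketch below uses hypothetical stand-in recognizers, not the real EasyOCR or CNN pipelines:

```python
import time

def evaluate(recognize, samples):
    """Measure exact-match accuracy and total runtime of a recognizer
    over (image, expected_text) pairs."""
    start = time.perf_counter()
    correct = sum(recognize(img) == truth for img, truth in samples)
    elapsed = time.perf_counter() - start
    return correct / len(samples), elapsed

# Hypothetical stand-ins for the compared systems and a toy sample set.
samples = [("img1", "doliprane"), ("img2", "aspirin")]
systems = {
    "baseline easyocr": lambda img: "dol1prane" if img == "img1" else "aspirin",
    "enhanced easyocr": lambda img: {"img1": "doliprane", "img2": "aspirin"}[img],
}
for name, fn in systems.items():
    acc, secs = evaluate(fn, samples)
    print(f"{name}: accuracy={acc:.2f}, time={secs:.4f}s")
```

In practice `recognize` would wrap each real OCR pipeline so that all systems are timed on the same image set.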
2.5 Conclusion
In this chapter, we presented the different existing approaches and compared them to one another. Finally, we compared these techniques to our system in terms of accuracy. In the next chapter, we discuss the implementation, the technologies used to develop our system, and the interfaces composing the realized system.
CHAPTER 3. Implementation Khaled Ben chikha
Chapter 3
Implementation
3.1 Introduction
Having analyzed our needs and justified the use of an enhanced EasyOCR algorithm, it remains to choose the tools, languages and frameworks we will use. Before starting development, we identify what suits us best materially and technically. Hence, in this last part, we first present the working environment and the various development tools used; then we detail the realization.
The characteristics of the machine on which we carried out our work are given in table 2.

Feature        Description
Brand          MSI
Processor      Intel(R) Core(TM) i5-11400H
RAM            8 GB
Disk           512 GB SSD
Graphics card  Intel(R) UHD Graphics
OS             Windows 11 Professional
In this section, we will present the technologies and tools we have used.
3.3.1.1 Python
Python is an interpreted, high-level, general-purpose programming language. We used it through Anaconda, a free and open-source distribution of the Python and R programming languages for data science and machine learning applications, which aims to simplify package management and deployment.
3.3.1.2 Jupyter
Project Jupyter develops open-source software, open standards and services for interactive computing across several programming languages. It was spun off from IPython in 2014 by Fernando Pérez and Brian Granger.
3.3.1.3 Visual Studio Code
Visual Studio Code is an extensible code editor developed by Microsoft for Windows, Linux, and macOS. Features include debugging support, syntax highlighting, smart code completion, snippets, code refactoring, and built-in Git support.
3.3.1.4 Streamlit
Streamlit is an open-source Python framework for machine-learning engineers and data scientists. This framework allows you to create web applications that can easily integrate machine learning models and data visualization tools.
3.3.2.1 OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source library of programming functions mainly aimed at real-time computer vision and image processing.
3.3.2.2 TensorFlow
TensorFlow is an open-source machine learning tool developed by Google. Its source code was opened on November 9, 2015 and released under the Apache license. It is based on the DistBelief framework, initiated by Google in 2011, and has interfaces for Python, Julia and R.
3.3.2.4 NumPy
At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance.
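A short example of the ndarray object and its vectorized operations, as used when manipulating image data:

```python
import numpy as np

# An ndarray holds homogeneous data; operations run in compiled code.
img = np.array([[0, 128, 255],
                [64, 192, 32]], dtype=np.uint8)
print(img.shape)   # (2, 3)
print(img.dtype)   # uint8

# Vectorized arithmetic: invert a grayscale image without Python loops.
inverted = 255 - img
print(inverted[0, 2])  # 0
```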
3.4 Interfaces
This figure shows the main interface, which is loaded locally via the Streamlit server and uses a Streamlit-designed user interface. It has two buttons: the first, Browse files, uploads a medicine package image from the computer; the second, Text localization and recognition with non-improved easyocr, calls the basic, non-enhanced EasyOCR function to perform detection, then recognition, and finally prints the output text.
This figure shows the image upload interface. The user clicks the Browse files button and selects a medicine package image stored on the computer; the website then displays the selected image. The image can be changed by clicking Browse files again and choosing another one, and the system updates automatically. Clicking the exit icon, which appears when hovering over the displayed image, removes the image from the screen and returns the interface to the main interface.
The figure shows the text recognition interface. The recognized text is displayed by clicking Recognize with easyocr, and it can also be copied and pasted into another file to store the result.
This figure shows the effect of segmentation and its important contribution to the full-accuracy results.
Figure 24 shows that the keras-ocr we used brings some improvements compared to EasyOCR, but it still has some drawbacks that could be addressed by other techniques.
The previous two figures show the improvement in recognition accuracy obtained through enhanced detection and label segmentation, but it is still not enough for our case.
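One classical way to segment labels into text lines is a horizontal projection profile on the binarized image: rows containing no ink separate the text bands. This is an illustrative sketch, not necessarily the exact segmentation used in our pipeline:

```python
import numpy as np

def segment_rows(binary):
    """Split a binary image (1 = ink) into horizontal bands of text by
    looking for empty rows in the horizontal projection profile."""
    profile = binary.sum(axis=1)          # ink pixels per row
    bands, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                     # band begins at first inked row
        elif count == 0 and start is not None:
            bands.append((start, i))      # band ends at first empty row
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands

# Two "text lines" separated by a blank row.
img = np.array([[0, 1, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0],
                [1, 1, 0, 0]])
print(segment_rows(img))  # [(0, 2), (3, 4)]
```

Each band can then be cropped and passed to the recognizer separately, which is what makes per-label recognition possible.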
This figure shows that the thresholding method led to an accuracy enhancement, but it is still not enough, since it depends on a single binarization threshold interval.
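A minimal NumPy sketch of interval-based binarization makes the limitation visible: a single fixed [low, high] interval (the values here are assumptions for illustration) cannot adapt to varying lighting or packaging colors.

```python
import numpy as np

def binarize(gray, low=100, high=200):
    """Keep pixels whose intensity falls inside [low, high] as foreground (1);
    everything else becomes background (0). A single fixed interval like this
    fails when lighting or packaging colors vary between images."""
    return ((gray >= low) & (gray <= high)).astype(np.uint8)

gray = np.array([[50, 120, 210],
                 [150, 90, 180]])
print(binarize(gray))
```

Adaptive methods (e.g. per-region thresholds) avoid this single-interval dependency, which motivates the further enhancements below.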
This figure shows the effect of auto-correction on accuracy and on medicine-name extraction and correction. It yields the best results of all the previous enhancements and matches the high accuracy our context requires.
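A simple way to implement such auto-correction is to snap each recognized word to the nearest entry of a dictionary of known medicine names by edit distance. The dictionary and distance threshold below are illustrative assumptions, not the actual lexicon used:

```python
def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance between two strings."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

# Hypothetical dictionary of known medicine names.
DRUG_NAMES = ["doliprane", "aspirin", "ibuprofen", "paracetamol"]

def autocorrect(word, max_dist=2):
    """Snap an OCR output to the closest known drug name, if close enough."""
    best = min(DRUG_NAMES, key=lambda name: edit_distance(word, name))
    return best if edit_distance(word, best) <= max_dist else word

print(autocorrect("dol1prane"))  # "doliprane"
print(autocorrect("xyz"))        # unchanged: no close match
```

This is what lets the system recover correct medicine names even when the raw OCR output contains character-level errors.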
Finally, we note that this is a research-oriented study: rather than an extensive development project, it is a comparative study that applies theoretical and experimental knowledge through incremental verification and experimental interpretation.
3.6 Conclusion
In this chapter, we focused on the realization phase of this project, based on comparative OCR results, from which we conclude that the study was incremental and iterative. In the following, we close this document with a general conclusion and a statement of perspectives.
General conclusion and perspectives Khaled Ben chikha
This document presents the work carried out during our final-year internship at the MIRACL laboratory. Our mission was to design and implement a text detection and recognition system for medicine package images using AI techniques. We started by understanding the general context of the project and by identifying the different requirements of the system. This project required months of hard work, both in theoretical study and in technical work. These efforts resulted in a solution that meets our needs and combines efficiency and performance. The report is divided into three main parts. The first part reviews related work on approaches similar to our system: we went through handcrafted and deep-learning approaches, and finally discussed existing OCR systems. The second part contains the experimental study, in which we compared different techniques and closed with a performance evaluation. The third part concerns the implementation, which successively presents the hardware environment, the software environment, the interfaces and, finally, some results based on the developed pieces of code, showing an iterative and incremental study that progressively led to a very performant result. This project allowed us to grow as future engineers by applying the notions acquired during our training and, especially, by developing a spirit of initiative and adaptation to the specifications. We faced many problems and managed to overcome them all. This work improved our knowledge in the field of artificial intelligence and deep learning. Furthermore, we discovered how optical character recognition is practiced and applied, and its role in the medical field.
Although we are quite happy with the solution that has been developed, we think it can be developed further and that additional features can be added or modified. As a first perspective, since our system does not yet exploit its results, the recognized text could be used to generate a message or a sound to help people taking medication; this could be implemented with Python sound-related libraries such as pyaudio. We started converting the text to sound and generating an email, but it was not an easy task: the main challenge is the algorithmic complexity caused by image variability, which is not easy to deal with. As a second perspective, our system cannot yet perform translation; natural language processing could be used for this task, so that the recognized text is easily understood and reaches a worldwide audience through web search-engine requests.
As a third and last perspective, our system does not yet perform named entity recognition, although named entity recognition techniques are useful for drug-name matching, drug-to-drug interaction detection and the extraction of associated chemical and medical information.
Finally, our application is open to extensions covering fields other than computer science, computer vision and the medical field.