
Republic of Tunisia
Ministry of Higher Education and Scientific Research
Sfax University
National Engineering School of Sfax
Computer Engineering & Applied Mathematics Department
ST-EN07/00
Serial N°: 2022 / DIMA-069

Graduation Project
Report
presented at

National Engineering School of Sfax


(Computer Engineering & Applied Mathematics Department)

in order to obtain the

National Engineering Diploma in Computer Science

by

Khaled Ben chikha

Text recognition in drug images: A comparative study of
several end-to-end OCR and enhancements
Defended on 27/12/2022 in front of the committee composed of

Dr. Mouna Baklouti Kammoun President


Dr. Fadoua Drira Reviewer
Dr. Slim Kanoun Academic Supervisor
Dedication

It’s all thanks to God

With delight, I dedicate this work

In memory of my dear grandfather.

To my dear parents Hedia and Abderraouf, for the sacrifices they


have made and continue to make, for the love with which they have
showered me. May this work be the symbol of my gratitude for your
precious generosity. I would like to thank you for your support,
which has illuminated my path. There aren’t enough words to say
how much I feel for you two.
God bless you.

To my dear brothers and sister Yacine, Yousra and Mounir:


Source of my joys and secrets of my triumphs, Thank you
for your unconditional love and support.
God bless you.

To all my family,

To all my friends,

Khaled Ben chikha


Thanks

First of all, praise, gratitude and thanks to God, the most powerful and merciful, who
gave me the opportunity and the ability and made me believe in every moment of despair
that everything is possible, that the next is easier and that success is a matter of time.

I would like to express my sincere thanks to my academic supervisor Mr. Slim Kanoun,
professor at the National School of Engineers of Sfax, for the quality of his supervision,
his continuous encouragement, his support and his precious advice which have been very
beneficial and fruitful.

I would also like to thank my industrial supervisor Mr. Slim Kanoun, professor at the
National School of Engineers of Sfax, for his availability, for the good working conditions,
his precious advice and above all for his patience and continual encouragement throughout
this project.

My sincere thanks to the MIRACL laboratory, which welcomed me for my end-of-study internship.

Also, I express my great gratitude to the members of the jury for having honored me
by accepting to evaluate this work. I hope they find in it the qualities of clarity and
motivation that they expect.

Finally, I would like to express my deep gratitude to all the teachers at the national
engineering school of Sfax for their availability and kindness during my training cycle at
this institution.
CONTENTS Khaled Ben chikha

Contents
Contents .............................................................................................................................. 3

List of Figures ...................................................................................................................... 5

List of Tables........................................................................................................................ 6

Acronyms ............................................................................................................................ 7

General Introduction .......................................................................................................... 1

A comparative study of several end-to-end OCR on drug image text recognition ................. 5


1.1 Introduction .......................................................................................................... 5
1.2 Basic definitions .................................................................................................... 5
1.2.1 Artificial intelligence ................................................................................. 5
1.2.2 Machine Learning ..................................................................................... 7
1.2.3 Deep Learning ........................................................................................... 7
1.2.4 Computer vision ........................................................................................ 7
1.3 Related work on handcrafted approach ............................................................... 8
1.4 Related work on deep learning approach........................................................... 10
1.5 Discussion on existing solutions ........................................................................ 12
1.5.1 Easyocr compared to tesseract ............................................................... 12
1.5.2 Easyocr compared to other OCR ............................................................ 15
1.6 Conclusion .......................................................................................................... 17

Experimental study on text detection and recognition on drug images ....................... 18


2.1 Introduction ........................................................................................................ 18
2.2 Handcrafted approach ........................................................................................ 18
2.2.1 Handcrafted techniques ........................................................................ 19
2.2.2 CRAFT technique ................................................................................... 19
2.2.3 CRAFT performance compared to other techniques ............................. 21
2.3 Deep Learning approach .................................................................................... 21
2.3.1 Based on easyocr ................................................................................... 22
2.3.2 Based on CNN ......................................................................................... 25
2.4 Performance Evaluation ..................................................................................... 31
2.5 Conclusion .......................................................................................................... 32

Implementation ................................................................................................................ 33
3.1 Introduction ........................................................................................................ 33
3.2 Hardware Environment ..................................................................................... 33
3.3 Software Environment ........................................................................................ 34
3.3.1 Programming language and frameworks ............................................. 34
3.3.2 Used libraries ......................................................................................... 35
3.4 Interfaces ............................................................................................................ 36
3.5 Code based realization and results ..................................................................... 39
3.6 Conclusion .......................................................................................................... 45

General conclusion and perspectives ............................................................................... 46

Webography and Bibliography ......................................................................................... 48


LIST OF FIGURES Khaled Ben chikha

List of Figures

Figure 1. handcrafted approach represented as feature extractor in traditional machine learning flow ........................................................................................... 9
Figure 2. handcrafted approach ...................................................................................................... 10
Figure 3. Deep learning pipeline compared to handcrafted pipeline...............................................12
Figure 4. easyocr framework ............................................................................................................ 15
Figure 5. CRAFT architecture .......................................................................................................... 20
Figure 6. CRAFT compared to RCNN ............................................................................................. 20
Figure 7. CRAFT results compared to handcrafted ML techniques .................................................21
Figure 8. Easyocr internal architecture ........................................................................................... 22
Figure 9. General quality performance ........................................................................................... 23
Figure 10. Percentage of tesseract to easyocr in terms of accuracy according to image quality ...... 23
Figure 11. Percentage of easyocr to tesseract in terms of accuracy according to image quality ....... 23
Figure 12. Processing rapidity comparative study .......................................................................... 24
Figure 13. Accuracy comparative study ........................................................................................... 24
Figure 14. General easyocr performance comparative study .......................................................... 25
Figure 15. Model 1 results ................................................................................................................ 28
Figure 16. Model 2 results ............................................................................................................... 29
Figure 17. Summary comparison of the two models ....................................................................... 29
Figure 18. Model 1 test results ......................................................................................................... 30
Figure 19. Model 2 test results......................................................................................................... 30
Figure 20. Main application interface ............................................................................................. 36
Figure 21. Image upload interface ................................................................................................... 37
Figure 22. Image text recognition interface .................................................................................... 38
Figure 23. Enhanced keras-ocr by segmentation ............................................................................ 39
Figure 24. Keras-ocr another example showing errors ................................................................... 40
Figure 25. EAST detection result ......................................................................................................41
Figure 26. Drug name segmentation effect ......................................................................................41
Figure 27. Thresholding method accuracy improvement ............................................................... 42
Figure 28. Auto-correction algorithm result ................................................................................... 42
Figure 29. Full auto-correction result ............................................................................................. 43

LIST OF TABLES Khaled Ben chikha

List of Tables

Table 1: General performance comparison ......................................................................................31


Table 2: Hardware Environment .................................................................................... 33

Acronyms Khaled Ben chikha

Acronyms
AI: Artificial Intelligence
NLP: Natural Language Processing
ML: Machine Learning
DL: Deep Learning
OS: Operating System
NER: Named Entity Recognition
PDF: Portable Document Format
CNN: Convolutional Neural Network
OCR: Optical Character Recognition
RNN: Recurrent Neural Network
CV: Computer Vision
HOG: Histogram of Oriented Gradients
SVM: Support Vector Machines
LSTM: Long Short-Term Memory
CTC: Connectionist Temporal Classification
SAST: Single-Shot Arbitrarily-shaped Text detector
General Introduction Khaled Ben chikha

General Introduction

For decades, the burden of manual input has grown together with the number and mass of
documents to process. At the same time, acquisition equipment such as scanners and
cameras has appeared and developed rapidly, information technology has seen significant
innovation, artificial intelligence has emerged and matured, and computer hardware has
improved considerably.
Additionally, most business workflows involve receiving information from print
media. Paper forms, invoices, scanned legal documents, and printed contracts are
all part of business processes. These large volumes of paperwork take up a lot of
time and space to store and manage. Although paperless document management
is the way forward, scanning the document to image creates challenges. This
process requires manual intervention and can be tedious and slow.
Moreover, the digitization of this documentary content creates image files in which
the text is hidden. Text in images cannot be processed by word processors in the
same way as text documents. All this has allowed the appearance of OCR
technology to solve all these problems.


OCR stands for Optical Character Recognition. It is a CV field that uses many
CV techniques such as text localization, detection, and recognition. It denotes the
process of converting printed material into word-processing or text files that
can be read, edited, and managed on computers, using AI models based on ML
and DL algorithms.
OCR itself went through many stages to reach its current technological maturity
and vital importance. It began with cheque recognition and the scanning of postal
addresses, then moved on to forms, archive collections and increasingly complex
administrative documents, arriving today at OCR systems capable of recognizing
entire documents of hundreds of pages, such as books, and producing a digital copy
in a few minutes. This has vastly expanded the human knowledge available via the
web. Thanks to its breadth, ease of use and availability, this conceptually simple
but technically vast research field has also reached the medical domain, which has
benefited enormously from OCR.

One of the most widely used OCR frameworks is easyocr. It has many advantages, such as
high average recognition accuracy, ease of use and fast processing. But it also has notable
drawbacks, such as a lack of robustness to poor-quality images; in addition, it does not
do well with images affected by artifacts including partial occlusion, distorted
perspective, and complex backgrounds. Globally, this is a medium-quality OCR, neither
the worst nor the best available today. That is precisely why we chose this OCR,
as we will further explain: this comparative study aims to enhance its results by
applying it to the medical field and specifically to drug images.


In this context, this end-of-study project aims to develop a system capable of
transforming a medicine package image into text, enhancing the accuracy of the result
through segmentation techniques and an auto-correction algorithm, no matter how poor
the image quality is. This is a study-based, research-oriented project: rather than an
extensive development effort, it is a comparative study that applies theoretical and
experimental knowledge, along with incremental observations and experimental
interpretations, to enhance this medium-quality OCR.
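To make the auto-correction idea concrete, here is a minimal sketch of one common approach: matching each OCR word against a lexicon by edit distance. The lexicon and drug names below are purely illustrative, not the system's actual dictionary.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def autocorrect(word, lexicon, max_dist=2):
    # Replace the OCR word with the closest lexicon entry when it is
    # within max_dist edits; otherwise keep the OCR output unchanged.
    best = min(lexicon, key=lambda w: edit_distance(word.lower(), w.lower()))
    return best if edit_distance(word.lower(), best.lower()) <= max_dist else word

# Hypothetical drug-name lexicon, for illustration only.
LEXICON = ["Doliprane", "Paracetamol", "Ibuprofen"]
print(autocorrect("Dol1prane", LEXICON))  # -> Doliprane
```

A real system would build the lexicon from a drug database and tune max_dist to the observed OCR error rate.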


This project is structured around three chapters and closes with a general conclusion. The
chapters are presented as follows:

• Chapter 1: describes the context of the project and sets out the state of the
art on end-to-end systems for text detection and recognition. First, we detail the
related work on the handcrafted approach. Second, we discuss the related work on
the deep learning approach. Finally, we review the existing solutions.

• Chapter 2: presents an experimental study on text detection and recognition
on drug images. We successively discuss the handcrafted approach and the deep
learning approaches based on easyocr and on CNN, then finish with a performance
evaluation of the cited techniques.

• Chapter 3: details the implementation. First, we present the technologies:
programming language, libraries and frameworks. Finally, we describe the
interfaces of our system.

CHAPTER 1. A comparative study of several end-to-end OCR on drug image text recognition Khaled Ben chikha

Chapter 1

A comparative study of several end-to-end OCR on drug image text recognition

1.1 Introduction

This first chapter presents the general context of our project. Subsequently,
we deal with end-to-end systems for text detection and recognition. It starts with the
related work on the handcrafted approach, used only for text detection and localization.
After that, we present the related work on the deep learning approach, used for both text
detection and recognition. Finally, we summarize with a discussion of existing solutions.

1.2 Basic definitions

1.2.1 Artificial intelligence


Artificial intelligence is a branch of computer science that treats the simulation of


intelligent behavior. It is the ability of a system to adequately interpret external data, to
learn from that data and to use this knowledge to fulfill specific goals and tasks through
flexible adaptation. Artificial intelligence is a wide concept containing many sub-fields,
notably machine learning, deep learning, and natural language processing.
The term artificial intelligence was coined by John McCarthy. One of the field's
founders, Marvin Lee Minsky, defined it as the construction of computer programs that
engage in tasks that are, for the time being, performed more satisfactorily by human
beings because they require high-level mental processes such as perceptual learning,
memory organization and critical reasoning. This imitation can take place in reasoning,
for example in games or the practice of mathematics; in the understanding of natural
languages; in perception, whether visual (interpretation of images and scenes), auditory
(understanding of spoken language) or through other sensors; or in controlling a robot
in an unknown or hostile environment.


1.2.2 Machine Learning

Machine learning is a subset of AI that uses computer algorithms to analyze data and make
intelligent decisions based on what it has learned, without being explicitly programmed.
A crucial aspect of machine learning is that it allows machines to solve problems on their
own and make accurate predictions using the data provided.

1.2.3 Deep Learning

Deep learning is a dedicated subset of machine learning that uses layered neural networks
to simulate human decision making. Deep learning algorithms may be used to label and
classify data and recognize patterns. This allows AI systems to continuously learn on the
job, and to improve the quality and accuracy of results by figuring out whether decisions
were correct.

1.2.4 Computer vision

Computer vision is a field of artificial intelligence (AI) that enables computers and systems
to derive meaningful information from digital images, videos and other visual inputs and
take actions or make recommendations based on that information. If AI enables computers
to think, computer vision enables them to see, observe and understand.

Computer vision works much the same as human vision, except humans have a head
start. Human sight has the advantage of lifetimes of context to train how to tell objects
apart, how far away they are, whether they are moving and whether there is something
wrong in an image.


1.3 Related work on handcrafted approach

Existing image features may be roughly divided into two classes: handcrafted features and
learned ones. By handcrafted features we mean those that are extracted from individual
images according to a certain manually predefined algorithm based on expert knowledge.
The handcrafted approach was the traditional technique before the appearance of deep
learning. Handcrafted features were usually used with traditional machine learning
methods for object recognition and computer vision, such as Support Vector Machines.
Newer methods like convolutional neural networks generally do not need to be supplied
with such features, as they can learn them from the image data. The handcrafted approach
is best used for the detection phase; it cannot be employed during the recognition phase
because of the complexity of that task and the approach's limitations, so it needs an ML
algorithm to perform the recognition task. Many feature extraction techniques have been
proposed in the image processing literature. HOG and DCT have been used in many pattern
recognition problems. Researchers used DCT features in their work on a word-based
recognition system; their results showed better accuracy compared to other techniques.
Others used the Radon transform and DCT for feature extraction, applying the Radon
transform to enhance the low-frequency component; SVM was used for classification, and
experimental results show that the proposed features are efficient. DCT is also used for
vehicle plate recognition, video text detection and iris recognition. An algorithm for
ruling estimation of historical manuscripts has also been presented.


HOG features have also been applied to classify image windows as pedestrian or
non-pedestrian. Gradient features were used for basic character recognition on samples
collected from various writers: the sample contained 7200 characters, the images were
normalized to 8100-pixel sizes, and Support Vector Machines (SVM) were used for
classification, reaching an accuracy of 0.94. Based on this related work, the authors
noticed that no text recognition system for ancient documents existed. In summary, the
related work on the handcrafted approach relies mainly on SVM, HOG and DCT.
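To make the HOG idea above concrete, here is a toy sketch of its core step, a histogram of gradient orientations, in plain Python. A real HOG descriptor (for example the one in scikit-image) adds cell and block pooling plus normalization; this single-histogram version is only illustrative.

```python
import math

def orientation_histogram(patch, bins=9):
    """Toy HOG-style descriptor: one histogram of gradient orientations
    over a grayscale patch given as a list of rows of intensities."""
    hist = [0.0] * bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # horizontal gradient
            gy = patch[y + 1][x] - patch[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180  # unsigned orientation
            hist[int(ang / (180 / bins)) % bins] += mag   # magnitude-weighted vote
    return hist

# A vertical edge: all votes fall into the 0-degree (horizontal-gradient) bin.
patch = [[0, 0, 255, 255]] * 4
print(orientation_histogram(patch))
```

Feeding such histograms, computed over a grid of cells, to an SVM is exactly the handcrafted detection pipeline described above.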

The handcrafted approach is illustrated by the two figures below.

Figure 1. handcrafted approach represented as feature extractor in traditional machine learning flow


Figure 2. handcrafted approach

1.4 Related work on deep learning approach

Optical character recognition is one of the most popular pattern recognition
applications, and many systems have been proposed toward this goal in the past. This is
because it is a sizeable application of obvious economic value, and also because it is a
test bed before more complicated visual pattern recognition applications are attempted.
One of the earliest neural network-based systems for handwritten character recognition
is the Neocognitron of Fukushima. A massive amount of work on optical recognition of
postal ZIP codes was performed by a group at AT&T Bell Labs by Le Cun and others. The
system makes use of a multilayered network with local connections and weight sharing,
trained with backpropagation for classification. This implements a hierarchical cone in
which simpler local features are extracted in parallel, which combine to form higher-level, less


local features that ultimately define the digits. An extensive study of backpropagation
for optical recognition of both handwritten letters and digits is given by Martin and
Pittman. Keeler, Martin, and others at MCC worked on combining segmentation and
recognition in a single integrated system. This is important when characters touch in
such a way that they cannot be separated by a straightforward segmentation process.
Numerous comparative studies have also been done, either by fixing the data set and
varying the techniques or by also using a number of data sets. Man on et al. is an early
reference where simple and multilayer perceptrons are compared with statistical
distance-based classifiers in recognizing handwritten digits for automatic reading of
postal codes. A comparison of CNN, multilayer perceptron, and radial basis functions in
recognizing handwritten digits is given by Lee, together with an evaluation of the task
and of various neural and traditional approaches.

For the task of optical handwritten character recognition, a significant step was the
production by NIST of a CD-ROM (Special Database 3) which includes a large set of
digitized character images and computer subroutines that process them. A number of the
above-mentioned works use this data set or its predecessor; it is available by writing
to NIST. A comparison of four statistical and three neural network classifiers is given
by Blue et al. for optical character recognition and for a similar task, fingerprint
recognition (for which a similar CD-ROM was also made available through NIST).
Researchers from NIST carried out several studies using this data set, and the technical
reports can be accessed over the internet. Recently, with the reduction in the cost of
computing power and memory, it has become feasible to build multiple systems for the
same task and then combine them to improve accuracy. One technique is to run parallel
models and then take a vote. Another approach is to cascade models, where simpler models
classify easier images and complex techniques classify images of poorer quality.
Otherwise, deep learning may be compared to the handcrafted approach in the figure below.
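The parallel-models-plus-voting scheme mentioned above can be sketched in a few lines; the classifier outputs below are illustrative placeholders for the predictions of real models.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions for one sample by plurality vote.
    Ties go to the model listed first (Counter preserves first-seen
    order among equal counts)."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical digit classifiers voting on the same image.
votes = ["8", "8", "3"]
print(majority_vote(votes))  # -> 8
```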


Figure 3. Deep learning pipeline compared to handcrafted pipeline

1.5 Discussion on existing solutions

In this section, we discuss existing OCR solutions.

1.5.1 Easyocr compared to tesseract

1.5.1.1 Easyocr

Easyocr is a Python-based OCR library made by the Jaided-AI organization which extracts
text from images. It is a ready-to-use OCR with more than forty languages supported,
including Chinese, Japanese, Korean and Thai. It is an open-source project licensed
under Apache 2.0.

It performs certain pre-processing steps (grey scaling, etc.) within the library
and extracts the text. It also applies the CRAFT algorithm to detect

the text. CRAFT is a scene text detection method that effectively detects text areas
by exploring each individual character and the affinity between characters. The
recognition model uses a CRNN; sequence labelling is carried out by an LSTM with CTC,
where CTC is meant for labelling the unsegmented sequence data produced by the RNN.
Easyocr supports multiple hyper-parameters for its read-text method, covering each layer
of the processing mechanism, including hyper-parameters that belong to CRAFT, and it
supports GPU processing. Its output is a nested array in which the first element gives
the coordinates that can be used to mark the text in the image, the next is the actual
text, and the last is the confidence value. When we form a sentence from the extracted
text, the order in which the texts are extracted causes some discomfort.
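A small sketch of handling this nested output, here reordering results into reading order before joining them into a sentence. The sample results are illustrative; in practice they would come from `easyocr.Reader(['en']).readtext(image)`.

```python
def reading_order(results, line_tol=10):
    """Sort easyocr-style results (bbox, text, confidence) top-to-bottom,
    then left-to-right, and join the text into one string. line_tol groups
    boxes whose tops differ by less than roughly this many pixels."""
    def top_left(item):
        box = item[0]                      # four [x, y] corners
        x = min(p[0] for p in box)
        y = min(p[1] for p in box)
        return (round(y / line_tol), x)    # coarse row index, then column
    ordered = sorted(results, key=top_left)
    return " ".join(item[1] for item in ordered)

# Illustrative results in the (bbox, text, confidence) shape easyocr returns.
sample = [
    ([[120, 12], [200, 12], [200, 40], [120, 40]], "500mg", 0.91),
    ([[10, 10], [100, 10], [100, 40], [10, 40]], "PARACETAMOL", 0.88),
    ([[10, 60], [90, 60], [90, 85], [10, 85]], "Tablets", 0.95),
]
print(reading_order(sample))  # -> PARACETAMOL 500mg Tablets
```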

The quality of the source image is important. If the quality of the original source is
good, meaning the human eye can see it clearly, it will be possible to obtain good OCR
results. However, if the original source itself is not clear, the OCR output will most
likely contain mistakes. The higher the quality of the original source image, the easier
it is to distinguish the characters from the rest, and the higher the OCR accuracy
will be.
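The CTC labelling step mentioned above can be illustrated with a minimal greedy decoder: merge consecutive repeated labels, then drop the blank symbol. The blank symbol and the time-step outputs below are illustrative.

```python
BLANK = "-"  # CTC blank symbol (illustrative choice)

def ctc_greedy_decode(per_step_labels):
    """Greedy CTC decoding: collapse consecutive repeats, then remove
    blanks. per_step_labels is the best label at each RNN time step."""
    out = []
    prev = None
    for label in per_step_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

# Time-step outputs for the word "DRUG" with repeats and blanks.
print(ctc_greedy_decode(["D", "D", "-", "R", "R", "U", "-", "-", "G", "G"]))
```

Note how a blank between two identical labels preserves genuine double letters, which is why CTC needs the blank symbol at all.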

1.5.1.2 Tesseract

An optical character recognition engine, Tesseract-OCR was released in August 2006.
Historically, Tesseract was created in 1985 by Hewlett-Packard and discontinued 10 years
later. Aware of the potential of this software, it was eventually decided to make it
available to everyone by releasing it under the Apache v2 license. Tesseract-OCR is far
from being as powerful as the proprietary software currently available on the market;
however, it is clearly becoming the best free character recognition engine. Developed by
Hewlett-Packard from 1985 to 1994, Tesseract was one of the best optical character
recognition software programs. After HP's withdrawal from the OCR market, it remained
unchanged for a long time.


It was only recently that some people at HP decided to bring it back to life, in 2005,
by releasing the code, with help from the Information Science Research Institute and
some debugging from Google. The latter, notably interested in OCR techniques, is now
responsible for its development and improvements. Tesseract is a bare recognition
engine, in the sense that it does not offer a user interface, perform layout analysis,
or format the results it produces. Another of its limitations is that it only recognizes
US-ASCII characters and consequently only works correctly with documents written in
English. Finally, the acquisition of grayscale or color documents remains difficult. It
can be compiled and executed under both GNU/Linux and Microsoft systems. Its average
accuracy is 0.97.
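From Python, Tesseract is commonly driven through the pytesseract wrapper; its `image_to_data` call returns a TSV whose `conf` column allows filtering out weak detections. The sketch below parses such TSV output; the sample string is illustrative, not real Tesseract output.

```python
def confident_words(tsv, min_conf=60.0):
    """Parse Tesseract image_to_data-style TSV and keep words whose
    confidence meets min_conf (non-word rows carry conf == -1)."""
    lines = tsv.strip().splitlines()
    header = lines[0].split("\t")
    ci, ti = header.index("conf"), header.index("text")
    words = []
    for row in lines[1:]:
        cols = row.split("\t")
        if float(cols[ci]) >= min_conf and cols[ti].strip():
            words.append(cols[ti])
    return words

# Illustrative TSV fragment; real output has more columns, but indexing
# by header name keeps the parser independent of the exact layout.
SAMPLE = "conf\ttext\n96\tDOLIPRANE\n-1\t\n41\t5OOmg\n88\t1000\n"
print(confident_words(SAMPLE))  # -> ['DOLIPRANE', '1000']
```

Lowering min_conf trades recall for precision: the low-confidence, OCR-garbled token "5OOmg" reappears at a threshold of 30.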

1.5.1.3 Advantages of easyocr

Easyocr supports a GPU version, performs well on GPU, and provides the confidence of the
extracted text, which can be used for further analysis. EasyOCR works better with noisy
images when compared with Tesseract.

1.5.1.4 Advantages of Tesseract

Tesseract supports a customized pre-processing layer based on the user's needs
and works quite fast on multiple images. It gives its output as sentences, which
is not the case with EasyOCR. Its performance is directly linked to the quality
of the image. It has a configuration option to extract only digits, and it also
supports training on custom data.
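For instance, digit-only extraction relies on Tesseract's `tessedit_char_whitelist` configuration variable; a hedged sketch of assembling the corresponding command line (the image file name is a placeholder):

```python
def tesseract_digits_cmd(image_path):
    """Build a tesseract CLI invocation restricted to digits.

    tessedit_char_whitelist is a standard Tesseract config variable;
    'stdout' makes tesseract print the recognized text instead of
    writing an output file.
    """
    return ["tesseract", image_path, "stdout",
            "-c", "tessedit_char_whitelist=0123456789"]

# Placeholder file name, for illustration only.
print(" ".join(tesseract_digits_cmd("dose_label.png")))
```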

1.5.1.5 Limitations of both Tesseract and EasyOCR

We would like to point out the general limitations shared by Tesseract and EasyOCR.

In general, poor-quality scans may produce poor-quality OCR. If a document
contains languages outside of those given in the LANG arguments, results may be
poor. On


handwritten text, both give low results. Neither does well with images affected
by artifacts including partial occlusion, distorted perspective, and complex
backgrounds.

Both Tesseract and EasyOCR are good at scanning clean documents, where they
achieve higher accuracy, and both support LSTM-based recognition.

1.5.2 EasyOCR compared to other OCR engines

1.5.2.1 EasyOCR

The EasyOCR pipeline is illustrated by the figure below.

Figure 4. EasyOCR framework

1.5.2.2 PaddleOCR

PaddleOCR, developed by Baidu, is based on the deep learning framework
PaddlePaddle (PArallel Distributed Deep LEarning). It supports Linux, Windows,
macOS, and other systems.

PaddleOCR includes an ultra-lightweight and a standard OCR model, integrating
OCR algorithms such as:

Text detection models: EAST, DB, SAST
Text recognition models: CRNN, Rosetta, STAR-Net, RARE, SRN

You can train and deploy PaddleOCR on servers, mobile (both iOS and Android),
embedded, and IoT devices. It has a Paddle Lite library that allows integration


with cross-platform hardware for easy deployment. It supports both CPU and GPU.
For faster computing, GPU is preferred.

PaddleOCR supports the Python programming language, and for inference and
deployment you can use C++ as well as Python. Various options are available for
serving and benchmarks. You can easily install any package using the pip package
manager, and a GitHub repository also exists to convert Paddle models to PyTorch.

At the time of writing, PaddleOCR is actively maintained and developing quickly,
with 18.8k stars and 3.9k forks. It offers good community support and serves as
a dependency for numerous GitHub projects.

PaddleOCR is not always the most accurate, but after some post-processing it
offers tough competition to Tesseract, especially for the Chinese language. At
the time of writing, it supports more than 80 languages, including Korean,
German, and French.

1.5.2.3 GOCR

GOCR was developed under the GNU Public License by Joerg Schulenburg. (It was
initially called GNU OCR but was later changed to Joerg's OCR.) It supports
input formats like TIFF, GIF, PNG, PNM, PBM, and BMP and outputs a text file. It
supports Windows, Linux, and OS/2.

You can integrate GOCR with different frontends, which makes it easy to port to
various operating systems and architectures. You don't have to train a program
or store large fonts; you can simply call it from the command line to get the
results.

It is not always accurate, because it has difficulty reading handwritten text,
noisy images, and overlapping characters. It is available in English and can
also translate barcodes. At the time of writing, GOCR is not actively
maintained, with no new release since 2018. It doesn't appear to have large
community support, either.


It works on the CPU but not with the GPU. It was written in the C programming
language. A few wrappers are available, like gocr-php, a Golang implementation,
and GOCR.js.

1.5.2.4 Cognitive OpenOCR

Cognitive OpenOCR (CuneiForm), by Cognitive Technologies, was developed by
combining databases from other openware OCR applications with user input and
feedback. It supports twenty to thirty languages, including Russian, English,
Turkish, and Italian.

As its database is built in, you don't need an internet connection to use it;
however, it has not been actively maintained since 2019 and does not offer
community support.

Most of the time, outputs need editing, and the tool gives poor results on
low-contrast images, making it less accurate. It works on the CPU but does not
support the GPU. It was written in C and C++ and has a wrapper available on the
internet.

1.6 Conclusion

In this chapter, we discussed the handcrafted approach and its techniques. Then,
we detailed related work on the deep learning approach. After that, we moved to
optical character recognition systems, carried out a comparative study between
OCR systems, and retained only EasyOCR, as it is the most suitable for our
project context.


Chapter 2
Experimental study on text detection
and recognition on drug images

2.1 Introduction

In this chapter, we provide an experimental study on text detection and
recognition on drug images. We begin with the handcrafted approach results.
Then, we dive into the deep learning approach, starting with the EasyOCR
experimental results and finishing with the CNN-based OCR results. Finally, we
close with a general performance evaluation of our system.

2.2 Handcrafted approach

The handcrafted approach is mainly based on SVM, KNN, and many other traditional
techniques, and it covers only the text detection task, which in the EasyOCR
framework is handled by the CRAFT system. So, in this section we compare the
handcrafted approaches only to the detection part of our system, namely CRAFT.


2.2.1 Handcrafted techniques

Handcrafted text detection techniques are mainly based on SVM and KNN. These
techniques are traditional and give poor results compared to advanced text
detection techniques like CRAFT, which is part of the EasyOCR framework.

2.2.2 CRAFT technique

CRAFT stands for Character Region Awareness For Text detection. Traditional
character-level bounding-box detection techniques are not adequate for
arbitrarily shaped text regions. In addition, ground-truth generation for
character-level text processing is a tedious and costly task. The CRAFT text
detection model uses a convolutional neural network to calculate region scores
and affinity scores: the region score is used to localize character regions,
while the affinity score is used to group the characters into text regions.
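As a toy illustration of this grouping idea (not CRAFT's actual implementation), detected characters can be merged into words by linking pairs whose affinity score passes a threshold, using a union-find structure; the scores and threshold below are invented for the example:

```python
def group_characters(num_chars, affinities, threshold=0.5):
    """Group character indices into words via high-affinity links.

    `affinities` maps a pair (i, j) of character indices to an affinity
    score; pairs at or above the threshold end up in the same group.
    """
    parent = list(range(num_chars))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for (i, j), score in affinities.items():
        if score >= threshold:
            parent[find(i)] = find(j)  # union the two groups

    groups = {}
    for i in range(num_chars):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Characters 0-1-2 are strongly linked; 3-4 form a second word.
affinities = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
print(group_characters(5, affinities))  # [[0, 1, 2], [3, 4]]
```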

CRAFT uses a fully convolutional neural network based on the VGG16 model. The
inference is done by providing word-level bounding boxes. The CRAFT text detection
model works well at various scales, from large to small texts and is shown to be effective
on unseen datasets as well. However, on Arabic and Bangla text, which have continuous
characters, the text detection performance of the model is not up to the mark.

The time taken by CTPN, EAST and MSER text detection methods is lower compared to
the CRAFT text detection engine. However, CRAFT is more accurate, and the bounding
boxes are more precise when the text is long, curved, rotated or deformed as illustrated
in the figures below.


Figure 5. CRAFT architecture

Figure 6. CRAFT compared to RCNN


2.2.3 CRAFT performance compared to other techniques

CRAFT is very advanced and shows much better results than traditional systems,
as shown in the figure below.

Figure 7. CRAFT results compared to handcrafted ML techniques

As an interpretation, CRAFT shows very good results compared to the ML
techniques, which explains the strength of recent techniques like EasyOCR in the
text detection field, and the poor results of the traditional techniques, which
remain in use only for a few specific tasks that do not match our case.

2.3 Deep Learning approach

In this section, we detail some advanced approaches recently used to reach the
best accuracy in text detection as well as in text recognition, based mainly on
deep learning. We begin with the EasyOCR approach and its main results. Then, we
detail the other techniques, based on CNN. Finally, we close with a general
performance evaluation in which we compare the two cited techniques to our
system.


2.3.1 Based on EasyOCR

2.3.1.1 EasyOCR architecture

EasyOCR is a framework based on three main layers:

. Detection layer: based on the CRAFT technique
. Recognition layer: based on a combined system of ResNet, LSTM, and CTC
. Post-processing layer: based on computer vision techniques

The figure below illustrates the internal EasyOCR architecture.

Figure 8. EasyOCR internal architecture

2.3.1.2 EasyOCR general quality and time performance

We have compared EasyOCR to Tesseract in terms of image-quality-based accuracy
and runtime performance, as shown by the figures below.


Figure 9. General quality performance

Figure 10. Accuracy of Tesseract relative to EasyOCR as a function of image quality

Figure 11. Accuracy of EasyOCR relative to Tesseract as a function of image quality

As an interpretation, we conclude that EasyOCR is more resistant to poor quality
than Tesseract, but its results are still not highly accurate, whereas medicine
package text recognition needs high accuracy and resistance to bad-quality
images.


2.3.1.3 EasyOCR compared to other OCR systems in terms of accuracy and speed

Here, we compare EasyOCR to several other OCR systems in terms of accuracy and
processing speed, as illustrated by the figures below.

Figure 12. Processing rapidity comparative study

Figure 13. Accuracy comparative study

As a result interpretation, we deduce that EasyOCR is more accurate but
relatively slow, which should be improved in our case, because medicine
recognition has critical accuracy needs and fewer speed concerns. For our case,
we enhance this framework with segmentation based on the EAST detector and a
segmentation step, as well as an auto-correction algorithm based on dictionary
comparison, which we discuss in the implementation phase.

2.3.1.4 General EasyOCR performance

In this subsection, we present a general experimental performance study showing
a full comparison of EasyOCR to other systems.


Figure 14. General easyocr performance comparative study

We conclude that EasyOCR has great accuracy and acceptable processing speed
compared to many other OCR systems. Moreover, EasyOCR is the best fit for the
context of our project. So, given the high accuracy needs of medical
applications in general and of our case in particular, we retained this system
and used it in our project, but we enhanced the technique to make it more
suitable and to produce better, more realistic results. In the next subsection,
we discuss other CNN-based systems, their experimental results, and their
interpretations.

2.3.2 Based on CNN

This is an end-to-end CNN-based system for image text detection and recognition.
The model used for this problem is called a Convolutional Recurrent Neural
Network (CRNN), since it combines Deep Convolutional Neural Networks (DCNN) and
RNNs to construct an end-to-end system for sequence recognition. The model
consists of three components: the convolutional layers, the recurrent layers,
and a transcription layer. The convolutional layers automatically extract a
feature sequence from each input image. On top of the convolutional network, a
recurrent

neural network is built to make a prediction for each frame of the feature
sequence output by the convolutional layers. The transcription layer at the top
of the CRNN translates the per-frame predictions of the recurrent layers into a
label sequence. Though the CRNN is composed of different kinds of network
architectures (DCNN and RNN), it can be jointly trained with a single loss
function.

2.3.2.1 Image Feature Extraction Using CNN

In the CRNN model, the convolutional component is constructed from convolutional
and max-pooling layers. This component is used to extract a sequential feature
representation from the input image.

2.3.2.2 Sequence Labeling Using RNN

A deep bi-directional recurrent neural network is built on top of the
convolutional layers as the recurrent layers. The recurrent layers predict a
label distribution y_t for each frame x_t in the feature sequence
x = (x_1, ..., x_T). We use a bi-directional RNN here because, in image-based
sequences, contexts from both directions are observed to be useful and
complementary to each other.

2.3.2.3 Transcription

Transcription is the process of converting the per-frame predictions made by the
RNN into a label sequence. We use Connectionist Temporal Classification (CTC) in
our transcription process to decode the output of the RNN and convert it into a
text label. Let us now discuss Connectionist Temporal Classification.

2.3.2.4 Connectionist Temporal Classification (CTC)

The sequence labeling problem consists of input sequences X = [x_1, x_2, ..., x_T]
and corresponding output sequences Y = [y_1, y_2, ..., y_U]. We need to find an
accurate mapping from X to Y, but there are a few issues: both X


and Y can vary in length, and we do not have an accurate alignment
(correspondence of the elements) between X and Y. The CTC algorithm overcomes
these challenges: for a given X, it gives us an output distribution over all
possible Y's. We can use this distribution either to infer a likely output or to
assess the probability of a given output. CTC is only briefly explained here;
more detailed presentations of connectionist temporal classification are
available in the literature.
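The decoding side of this idea can be sketched with best-path (greedy) decoding, the simplest CTC decoder: take the most likely label at each time step, collapse consecutive repeats, then remove blanks. The label values below are illustrative:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding: collapse consecutive repeats, drop blanks.

    `frame_labels` is the argmax label index per RNN time step.
    """
    decoded, previous = [], None
    for label in frame_labels:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded

# 0 is the blank; repeated 8s collapse, a blank separates true repeats.
print(ctc_greedy_decode([0, 8, 8, 0, 8, 5, 5, 0]))  # [8, 8, 5]
```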

2.3.2.5 Modeling

We trained two CRNN models on the same data to see which of the two gives better
results.

Model 1: CNN with a bi-directional LSTM as the RNN, trained with the Adam
optimizer.
Model 2: CNN with a bi-directional GRU as the RNN, trained with the RAdam
optimizer. Rectified Adam (RAdam) is a recent state-of-the-art optimizer.

Model 1

The model architecture has two stages:

Train stage: takes the input image, the text labels (encoded as integers), the
input length (number of time steps), and the label length, and outputs the CTC
loss.
Prediction stage: takes the input image and outputs a matrix of dimensions
42x37, where 42 is the number of RNN time steps and 37 is the number of letters
plus one character for the CTC blank.

We trained Model 1 on 200,000 images, validating on 12,000 images, for 20 epochs
with early stopping.


Figure 15. Model 1 results

Model 2

The Model 2 architecture has train and prediction stages like Model 1.

We trained Model 2 on the same set of images as Model 1 for 20 epochs with early
stopping.


Figure 16. Model 2 results

.Summary

Let us look at the summary of results for both models on the test data images.

Figure 17. Summary comparison of the two models


Of the two models, Model 1 gives better results. Here is another comparison of
the cited models using a sample image.

Figure 18. Model 1 test results

Figure 19. Model 2 test results

Conclusion: Model 1, trained on 200,000 images from the SynthText dataset,
performs reasonably well on 15,000 unseen test images with variable-length
labels, with an accuracy of 0.88 and a letter accuracy of 0.94. Model 2, also
trained on the same 200,000 images, has an accuracy of 0.82 and a letter
accuracy of 0.93 on the same 15,000 test images.


2.3.2.6 Possible further improvements of this end-to-end specific system

As further improvements of this end-to-end CNN-based text detection and
recognition system, we could train the models on more images, such as 300,000 or
400,000, to see whether the results improve, or experiment with the RAdam
optimizer for Model 1 instead of Adam. These models were trained on single-word
images; they could be trained further to recognize special characters and
sentences. In conclusion, we note that CNN-based systems outperform standard OCR
systems such as EasyOCR or Tesseract. We now apply a performance evaluation that
compares our system to the handcrafted and deep learning approaches cited above.

2.4 Performance Evaluation

In this section, we compare our system, based on enhanced EasyOCR, to the
handcrafted approach as well as to EasyOCR and the CNN system.

The general performance comparison between our enhanced system and the
previously cited methods is given in the following table.

Approach      Images  Time (s)  Word accuracy (%)  Device
Handcrafted   100     150       45                 CPU
EasyOCR       100     15        83                 CPU
CNN           100     12        98                 CPU
Our system    100     17        100                CPU

Table 1: General performance comparison

As an interpretation, we conclude that our system achieves full accuracy at the
cost of slightly more processing time, which can be very beneficial for
extracting valuable information from medicine images.


2.5 Conclusion

In this chapter, we presented the different existing approaches and compared
them to each other. Finally, we compared the different techniques to our system
in terms of accuracy. In the next chapter, we discuss the implementation, the
technologies used to develop our system, and the interfaces composing the
realized system.


Chapter 3

Implementation

3.1 Introduction

Having analyzed our needs and justified our use of an enhanced EasyOCR
algorithm, all that remains is to choose the tools, languages, and frameworks we
will use. Before starting to develop, we identify what suits us best materially
and technically. Hence, in this last part, we first present the working
environment and the various development tools used. Then we detail the
realization.

3.2 Hardware Environment

The characteristics of the machine on which we carried out our work are given in table 2.

Features       Description
Brand          MSI
Processor      Intel(R) Core(TM) i5-11400H
RAM            8 GB
Disk           512 GB SSD
Graphics card  Intel(R) UHD Graphics
OS             Windows 11 Professional

Table 2: Hardware environment


3.3 Software Environment

In this section, we will present the technologies and tools we have used.

3.3.1 Programming language and frameworks

3.3.1.1 Python

Python, shown in the figure below, is an open-source interpreted programming
language. It is widely used in the fields of data science and machine learning.
3.3.1.2 Anaconda

Anaconda is a free and open-source distribution of the Python and R programming
languages for the development of applications dedicated to data science and
machine learning; it aims to simplify package management and deployment.

.Jupyter

The Jupyter project develops open-source software, open standards, and services
for interactive computing in several programming languages. It was spun off from
IPython in 2014 by Fernando Pérez and Brian Granger.
3.3.1.3 Visual studio code
Visual Studio Code is an extensible code editor developed by Microsoft for
Windows, Linux, and macOS. Features include debugging support, syntax
highlighting, smart code completion, snippets, code refactoring, and built-in
Git.

.Streamlit

Streamlit is an open-source Python framework specially designed for machine
learning engineers and data scientists. This framework allows you to create web
applications that can easily integrate machine learning models and data
visualization tools.


3.3.2 Used libraries

3.3.2.1 Opencv

OpenCV is a free library, initially developed by Intel, specialized in real-time
image processing. The robotics company Willow Garage and then the company Itseez
took over support of this library.
3.3.2.2 Keras
Keras is an open-source deep learning library written in Python. It provides an
interface to deep neural network and machine learning backends, including
TensorFlow, Theano, and Microsoft Cognitive Toolkit.
3.3.2.3 Tensorflow

TensorFlow is an open-source machine learning tool developed by Google. The
source code was opened on November 9, 2015 by Google and released under the
Apache license. It is based on the DistBelief framework, initiated by Google in
2011, and has interfaces for Python, Julia, and R.
3.3.2.4 Numpy

NumPy is a library for the Python programming language, intended to manipulate
matrices and multidimensional arrays, as well as mathematical functions
operating on these arrays. NumPy is the fundamental package for scientific
computing in Python. It is a Python library that provides a multidimensional
array object, various derived objects (such as masked arrays and matrices), and
an assortment of routines for fast operations on arrays, including mathematical,
logical, shape manipulation, sorting, selecting, I/O, discrete Fourier
transforms, basic linear algebra, basic statistical operations, random
simulation, and much more.


At the core of the NumPy package is the ndarray object. This encapsulates
n-dimensional arrays of homogeneous data types, with many operations performed
in compiled code for performance.
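For instance, a small grayscale image stored as an ndarray can be reduced and compared with vectorized operations; the pixel values below are illustrative:

```python
import numpy as np

# A tiny 2x3 "grayscale image" as an ndarray of homogeneous dtype.
image = np.array([[30, 200, 45],
                  [220, 10, 180]], dtype=np.uint8)

print(image.shape)          # (2, 3)
print(image.mean())         # average intensity, computed in compiled code
print((image > 128).sum())  # number of bright pixels
```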

3.4 Interfaces

Figure 20. Main application interface

This figure shows the main interface, which is loaded locally via the Streamlit
server and uses Streamlit's user interface design. It has two buttons: the
first, called Browse files, uploads a medicine package image from the computer;
the second, called Text localization and recognition with non-improved easyocr,
calls the basic, non-enhanced EasyOCR function to perform detection, then
recognition, and then prints the output text.


Figure 21. Image upload interface

This figure shows the image upload interface. The user clicks the Browse files
button and selects a medicine package image stored on the computer. Then the
website displays the selected image. The image can be changed by clicking Browse
files and choosing another image; the system updates automatically. Also,
clicking the exit icon, which appears when moving the mouse over the displayed
image, deletes the image from the screen, and the interface returns to the main
interface.


Figure 22. Image text recognition interface


The figure shows the image text recognition interface. The recognized text is
displayed by clicking on Recognize with easyocr, and it can also be copied and
pasted into another file to store the result.

3.5 Code-based realization and results

Figure 23. Enhanced keras-ocr by segmentation

This figure shows the effect of segmentation and the significant accuracy
improvement it brings.


Figure 24. Keras-ocr another example showing errors

Figure 24 shows that keras-ocr brings some improvements compared to EasyOCR, but
it still has some drawbacks that can be addressed by other techniques.


Figure 25. EAST detection result

Figure 26. Drug name segmentation effect

The previous two figures show the improvement of recognition accuracy by means
of enhanced detection and label segmentation, which is still not enough for our
case.


Figure 27. Thresholding method accuracy improvement

This figure shows that the thresholding method led to an accuracy improvement,
but it is still not enough, as it depends on a single binarization threshold
interval.
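The single-threshold behaviour mentioned above can be sketched in a few lines (pure Python on nested pixel lists; the 128 threshold and the toy image are arbitrary illustrative choices):

```python
def binarize(pixels, threshold=128):
    """Binarize a grayscale image given as nested lists of 0-255 values:
    pixels at or above the threshold become white (255), others black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in pixels]

# Toy 2x3 "image": light text on a dark background.
image = [[30, 200, 45], [220, 10, 180]]
print(binarize(image))  # [[0, 255, 0], [255, 0, 255]]
```

A single fixed threshold fails whenever lighting varies across the package, which is exactly the limitation discussed above.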

Figure 28. Auto-correction algorithm result


This figure shows the auto-correction algorithm, based on the
dictionary-comparison method and a character-replacement algorithm, which leads
to drug name correction and extraction. This reflects the full enhancement path
of our study: from the baseline, basic EasyOCR without improvement, to
keras-ocr, which shows better results, then to the combined
EAST-and-segmentation technique, up to thresholding with 99.99% accuracy, and
ending with this auto-correction algorithm reaching 100% accuracy.
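A dictionary-comparison auto-correction step of this kind can be sketched with an edit-distance search (a minimal illustration with a hypothetical drug dictionary, not our full character-replacement algorithm):

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def autocorrect(word, dictionary):
    """Replace an OCR output word with the closest dictionary entry."""
    return min(dictionary,
               key=lambda entry: edit_distance(word.lower(), entry.lower()))

# Hypothetical drug-name dictionary; "D0lipr4ne" mimics typical OCR confusions.
drugs = ["Doliprane", "Aspirin", "Ibuprofen"]
print(autocorrect("D0lipr4ne", drugs))  # Doliprane
```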

Figure 29. Full auto-correction result


This figure shows the effect of auto-correction on accuracy and on medicine name
extraction and correction. It gives the best results compared to the previous
enhancements, in accordance with our high-accuracy context.

Finally, we note that this is a research-oriented study: it is not an extensive
development project but rather a comparative study that applies theoretical and
experimental knowledge, together with incremental assessment and experimental
interpretation.


3.6 Conclusion

In this chapter, we focused on the realization phase of this project, based on
comparative OCR results, and we conclude that the study is incremental and
iterative. In the following, we close this document with a general conclusion
and a statement of perspectives.


General conclusion and perspectives

This document presents the work done during our final-year internship at the
MIRACL laboratory. Our mission was to design and implement a medicine package
image text detection and recognition system using AI techniques. We started by
understanding the general context of the project and by identifying the
different requirements of the system. This project required months of hard work,
both in theoretical study and in technical work. These efforts resulted in a
solution that meets our needs and combines efficiency and performance. The
implementation of our project is divided into three main parts. The first part
covers related work on approaches similar to our system: we went through
handcrafted approaches, deep learning, and finally existing OCR systems. The
second part includes the experimental study, in which we compared different
techniques and closed with a performance evaluation. The third part concerns the
implementation, which consecutively covered the hardware environment, the
software environment, the interfaces, and finally some results based on the
developed pieces of code, showing an iterative and incremental study that led
progressively to a very performant result. This project allowed us to increase
our knowledge as future engineers, by applying the notions obtained during our
training and especially by developing a spirit of initiative and adaptation to
the specifications. We faced many problems and managed to overcome them all.
This work allowed us to improve our knowledge in the fields of artificial
intelligence and deep learning. Furthermore, we discovered how optical character
recognition is practiced and applied, and its role in the medical field.


Although we are quite happy with the solution that has been developed, we think
it can be developed further, and additional features can be added or modified.
As a first perspective, because our system does not yet exploit its results, we
could use the recognized text to generate a message or a sound to help people
taking medication, using Python sound libraries such as PyAudio. We started to
convert the text to sound and to generate an email, but it was not an easy task:
the main challenge is the complexity of the algorithm caused by image
variability, which is not easy to deal with. As a second perspective, our system
cannot currently perform translation tasks; we could use natural language
processing for this task, so that the recognized text can be easily understood
and reached worldwide through web search engine requests.

As a third and last perspective, our system cannot perform named entity
recognition. Named entity recognition techniques would be useful for drug name
matching, drug-to-drug interaction, and associated chemical and medical
information extraction.

Finally, our application is open to extensions toward fields other than computer
science, computer vision, and the medical field.

