Thesis Proposal
Submitted by:
Apurba Datta
20101040
Md. Mehedi Hasan
20101007
Md. Niaz Mahamud
20101019
Supervised by:
Dr. Bilkis Jamal Ferdosi
Professor, CSE, UAP
Co-Supervised by:
Sk. Tanzir Mehedi
Lecturer, CSE, UAP
Chapter 1: Introduction
1.1 Motivation:
The right to good health is a fundamental human right in our country, and the government is
striving to improve the health care system. In Bangladesh, lack of access to a patient's medical
history is a major obstacle to effective diagnosis and treatment. Every patient faces the
problem of preserving all of their medical documents. The proposed system will help patients
collect all their documents through an Electronic Health Record (EHR) system.
Every healthcare sector faces challenges in managing data from medical papers, particularly in
deciphering crucial information written in doctors' cursive handwriting. The security of
healthcare data is also under threat. These issues need to be addressed in order to
revolutionize the way medical prescriptions and test results are maintained and reviewed. This
system uses extraction and classification algorithms to address these challenges in the
Bangladeshi healthcare system. This innovation not only changes healthcare management but
also improves treatment, because knowing a patient's medical history is very important for any
doctor deciding on a treatment.
Our proposed system extracts and records medical data from medical documents by employing
image processing techniques, providing patients with a medical history. Individuals are
empowered to take control of their own well-being, healthcare professionals can easily access
vital data, and the overall healthcare journey becomes simpler and more effective.
Moreover, the proposed solution targets challenges such as deciphering doctors' cursive
handwriting through text localization, character recognition, and classification of medical terms.
In short, the problem statement revolves around the need to develop a seamless, efficient, and
user-friendly tool that enables patients to establish a medical record from images of their medical
documents by extracting and categorizing data and addressing the issue of illegible handwriting.
1.7 Our proposed method [briefly]
We propose a system that collects medical information from doctors' prescriptions. Our proposed
system has four major parts:
1. Image acquisition and related word segmentation
2. Optical Character Recognition
3. Classification of Medical terms
4. Patient’s Database
Chapter 2: Background Study
2.1 Convolutional Neural Network (CNN):
A Convolutional Neural Network (CNN) is a type of neural network used for image recognition
and image classification. It is also widely used in object detection and face recognition.
When an image is given to a CNN model for training or testing, it passes through several
convolution layers with filters, pooling layers, and fully connected layers. A softmax function
is then applied so that each class receives a probability between 0 and 1, and the object is
classified accordingly. The complete CNN process is shown in the figure.
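The layer sequence described above can be illustrated with a minimal forward pass. This is only a sketch in plain Python: the single 2x2 filter, the weights, and the function names are hypothetical values chosen for illustration, and a real model would learn its filters and weights with a framework such as TensorFlow.

```python
import math

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernel, weights, biases):
    """Convolution -> ReLU -> pooling -> fully connected layer -> softmax."""
    pooled = max_pool(relu(conv2d(image, kernel)))
    features = [v for row in pooled for v in row]  # flatten to a vector
    scores = [sum(w * x for w, x in zip(ws, features)) + b
              for ws, b in zip(weights, biases)]
    return softmax(scores)

# Hypothetical 5x5 input, one 2x2 filter, and a 2-class output layer.
image = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]
probabilities = forward(image, kernel, [[0.1] * 4, [0.2] * 4], [0.0, 0.0])
print(probabilities)
```

The output is a list with one probability per class; the class with the largest probability is taken as the prediction.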
2.2 OCR: Optical Character Recognition (OCR) examines handwritten and printed text in a
document and determines the shapes of the characters by detecting patterns of dark and light
with a digital electronic device. It then converts them into digital text.
2.3 NLP: Natural Language Processing (NLP) is an important component of AI. It is normally
used to analyse and process natural language data. It is also used for document text
classification: a model learns from labelled text and predicts the classes of unseen text.
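As an illustration of how text classes can be predicted for unseen text, the sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing. The choice of Naive Bayes, the category names, and the toy training phrases are all assumptions made for this example; they are not the classifier or data used in this proposal.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Collect the word and label counts that multinomial Naive Bayes needs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    vocabulary = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary

def predict(text, model):
    """Return the label with the highest log posterior for the given text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # log prior plus log likelihoods with add-one (Laplace) smoothing
        score = math.log(label_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, not taken from any real dataset.
SAMPLES = [
    ("paracetamol 500 mg tablet", "medication"),
    ("omeprazole 20 mg capsule", "medication"),
    ("fever and headache", "symptom"),
    ("cough and chest pain", "symptom"),
    ("blood test cbc", "test"),
    ("chest x-ray test", "test"),
]
model = train(SAMPLES)
print(predict("napa 500 mg tablet", model))
```

Words never seen in training (such as "napa" here) still receive a small non-zero probability because of the smoothing, so the remaining words decide the class.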
2.4 YOLOv8:
The YOLOv8 architecture is an object detection algorithm that utilizes a single neural network to
simultaneously predict bounding boxes and class probabilities for multiple objects in an image.
Chapter 3: Related Works
We have arranged the related work into three categories: Text Localization, Medical Document
Analysis, and Optical Character Recognition (OCR).
3.1 Review the related previous works
3.1.1 Text Localization
Dhande et al. focused on character recognition for cursive English handwriting, identifying
pharmaceutical names from prescriptions written by doctors. This task is challenging
due to the connected nature of the characters [4].
To achieve accurate text-line and word segmentation in cursive handwriting, the paper employs
horizontal and vertical projection methods for text-line and word segmentation, the convex hull
algorithm for feature extraction, which computes 125 features for each character, and a Support
Vector Machine (SVM) for the classification and recognition of medicine names. The system
achieved an accuracy of 95% for text-line segmentation, 92% for word segmentation, and 85%
for recognition.
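The projection idea can be sketched as follows: counting ink pixels per row (horizontal projection) separates text lines, and counting ink pixels per column inside one line (vertical projection) separates words. This is a simplified illustration on a tiny hypothetical binary image, not the implementation of Dhande et al.; the function names and the `min_ink` parameter are assumptions.

```python
def horizontal_projection(binary):
    """Number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in binary]

def vertical_projection(binary, top, bottom):
    """Number of ink pixels in each column, restricted to rows top..bottom-1."""
    return [sum(row[c] for row in binary[top:bottom])
            for c in range(len(binary[0]))]

def segment_runs(profile, min_ink=1):
    """Return (start, end) pairs, end exclusive, of consecutive entries with ink."""
    runs, start = [], None
    for index, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = index
        elif count < min_ink and start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

# Hypothetical 5x7 binary image: two text lines, the first with two words.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
]
lines = segment_runs(horizontal_projection(page))
words = [segment_runs(vertical_projection(page, top, bottom))
         for top, bottom in lines]
print(lines)
print(words)
```

In real cursive handwriting the gaps between words and between letters differ only in width, so a practical system would also need a minimum gap width before splitting a line into words.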
The image dataset is acquired by photographing or scanning the documents.
The limitations of this paper include the need for improvement in recognition accuracy and the
absence of information on the dataset used.
Reis et al. employed the YOLOv8 model, a state-of-the-art single-shot object detection system,
for real-time flying object recognition [14].
The paper uses the YOLOv8 architecture for real-time flying object detection. It trains an initial
model on 40 different flying object classes to extract abstract feature representations. Transfer
learning is then performed on a more representative dataset to refine the model. Mosaic
augmentation is applied to the training set, combining four random images into a single mosaic
image that is used as input for the model. The initial, generalized model achieves a mAP50-95 of
0.685 and an average inference speed of 50 fps, while the refined model keeps the same inference
speed and obtains an improved mAP50-95 of 0.835.
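The mAP50-95 metric averages the average precision over Intersection over Union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the IoU part of that computation, assuming boxes are given as (x_min, y_min, x_max, y_max); the helper `thresholds_passed` is a hypothetical name added for illustration and does not compute a full mAP.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    intersection = max(0.0, inter_w) * max(0.0, inter_h)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# The ten IoU thresholds that mAP50-95 averages over: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(predicted_box, true_box):
    """Thresholds at which the prediction would count as a correct detection."""
    score = iou(predicted_box, true_box)
    return [t for t in THRESHOLDS if score >= t]

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(thresholds_passed((0, 0, 10, 10), (2, 0, 12, 10)))
```

A prediction that overlaps the ground truth loosely counts as correct only at the lower thresholds, which is why mAP50-95 is a stricter measure than mAP at a single threshold of 0.50.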
The initial model's training dataset contains 15,064 images of various flying objects, with an
80% training and 20% validation split.
The paper discusses the challenges in detecting flying objects due to factors like spatial variance,
speed, occlusion, and clustered backgrounds, as well as the limitations of the refined model's
performance.
Roy et al. introduced keyword-based searching in prescriptions to efficiently extract relevant
data [3].
They divided a prescription into two parts: the printed letterhead and the handwritten part, or body.
Identifying the department name from the letterhead captures the doctor's domain-specific
knowledge, which makes it possible to search only for relevant words in the handwritten portion.
A Hidden Markov Model was trained on 500 prescriptions, and they achieved
encouraging results.
They proposed a Multilayer Perceptron based Tandem feature to improve performance, but
the searching efficiency depends entirely on the size of the keyword database.
Shanmugalingam et al. addressed the need for automatic language identification in the context
of social media, particularly for code-mixed text. Language identification becomes crucial for
analyzing the noisy content on social media, where users often mix two or more languages
to express their thoughts [5].
Existing language detectors fail in the context of social media due to phenomena such as
code-mixing, lexical borrowing, and phonetic typing. The paper proposes an approach using
Natural Language Processing (NLP) and Machine Learning (ML) techniques to provide
word-level language tags for Tamil-English code-mixed data from social media posts and
comments. Using an SVM classifier, the paper achieved an accuracy of 89.46% in language
identification on this data.
The dataset used for training and testing consists of 3,000 Tamil-English code-mixed sentences,
with 2,000 sentences. The British National Corpus (BNC) and LEXNORM Corpus are used to
identify English words.
Achkar et al. addressed the challenge of deciphering doctors' handwritten prescriptions, which
often contain Latin abbreviations and medical terminology that are difficult for most people to
understand [6].
The paper proposes the use of Artificial Neural Networks (ANN), specifically the Deep
Convolutional Recurrent Neural Network (CRNN), to develop a system for recognizing
handwritten English medical prescriptions. The proposed system achieves good recognition rates
and an accuracy of 98%.
The training is done using 66 different classes, including alphanumeric characters, punctuation
marks, and spaces.
The system reduces the risk of negative consequences due to incorrect deciphering. However, the
program currently requires manual insertion of text on a line-by-line basis, which can be
time-consuming.
Carchiolo et al. studied how to extract text from scanned prescriptions and effectively classify
medical prescription data using Natural Language Processing (NLP) and machine learning
techniques [7].
The proposed system extracts text from scanned prescriptions and classifies it. It employs a
RESTful Web Service and uses syntactic rules and rule-based tagging.
The first version left 30% of prescriptions unclassifiable, especially when spelling corrections
aren't needed. The improved solution classified significantly more, with only 5% of
prescriptions considered unclassifiable.
The paper uses a dataset of 800,000 medical prescription text rows to test the proposed system for
prescription classification. The dataset includes 5,000 mapping rules and is used to train and
validate a supervised machine learning classifier for detecting white prescriptions.
Overall, the system classified medical prescriptions effectively, utilizing embedded terms and
categorizing patient data.
Khandokar et al. applied handwritten character recognition (HCR) using a convolutional neural
network (CNN) [8].
The image processing pipeline involves pre-processing, segmentation, feature extraction,
classification, and recognition. Pre-processing cleans the input image to remove noise
and converts it to grayscale or binary format. Segmentation separates individual characters,
eliminates boundaries, and scales them to specific sizes. Feature extraction is done using a CNN
with a ReLU activation function, which forms a reduced matrix and compacts it into a feature
vector. The accuracy of the CNN improves with an increasing number of training images,
reaching 92.91%.
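The pre-processing step can be sketched as a grayscale conversion followed by thresholding. This is a minimal illustration, not the pipeline of Khandokar et al.: the fixed threshold of 128 is an assumption, and methods such as Otsu's choose the threshold automatically from the image histogram.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels (0-255) to luma with ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_image]

def binarize(gray_image, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (paper) to 0."""
    return [[1 if value < threshold else 0 for value in row]
            for row in gray_image]

# Hypothetical 2x2 image: white paper, black ink, dark blue ink, yellow highlight.
rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(0, 0, 128), (255, 255, 0)],
]
print(binarize(to_grayscale(rgb)))
```

The binary output uses 1 for ink so that later steps, such as counting ink pixels per row, can simply sum the values.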
The system was trained with 1,000 images from the NIST dataset, achieving 92.91% accuracy on
200 test images.
According to the paper, the accuracy cannot exceed the 92.91% limit due to numerical errors and
constraints on CNN image differentiation for labels.
Tabassum et al. created a system that can recognize a doctor's handwriting and generate digital
prescriptions. A bidirectional Long Short-Term Memory (LSTM) network is used in the study as
the machine learning method [9].
For the purpose of predicting doctors' handwriting, an online character recognition system was
developed using a bidirectional LSTM model. Using bidirectional LSTM and RSS data
augmentation, the study conducted eight trials on the handwritten dataset and achieved an
average accuracy of 93.0% (maximum: 94.5%, minimum: 92.1%). Compared to the recognition
result without any data expansion, this accuracy was 19.6% higher.
The 'Handwritten Medical Term Corpus' dataset includes 17,431 samples of 480 medical terms.
Data augmentation methods are presented to expand the sample size and boost recognition
effectiveness. These methods include pattern rotation, shifting, and stretching.
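Because online handwriting is recorded as sequences of pen coordinates, rotation, shifting, and stretching can be sketched as simple geometric transformations of the points. The code below is an illustrative assumption of how such augmentation might look; the parameter values and function names are hypothetical and are not those reported by Tabassum et al.

```python
import math

def rotate(stroke, degrees):
    """Rotate every (x, y) point of a stroke around the origin."""
    angle = math.radians(degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a) for x, y in stroke]

def shift(stroke, dx, dy):
    """Translate every point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in stroke]

def stretch(stroke, sx, sy):
    """Scale the x and y coordinates independently."""
    return [(x * sx, y * sy) for x, y in stroke]

def augment(stroke,
            angles=(-5, 5),
            shifts=((2, 0), (0, 2)),
            scales=((1.1, 1.0), (1.0, 1.1))):
    """Return the original stroke plus one variant per transformation setting."""
    variants = [stroke]
    variants += [rotate(stroke, a) for a in angles]
    variants += [shift(stroke, dx, dy) for dx, dy in shifts]
    variants += [stretch(stroke, sx, sy) for sx, sy in scales]
    return variants

# Hypothetical pen stroke with three sampled points.
stroke = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
print(len(augment(stroke)))
```

Each original sample yields several slightly different samples with the same label, which is how augmentation enlarges a small handwriting dataset.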
Handwriting recognition technology in smartpens can digitize doctors' writing in real time,
potentially reducing medical errors, saving costs, and promoting healthy living in poor nations.
The accuracy of the model, which recognizes doctors' handwriting, varies across the different
augmentation stages.
Sakib et al. discussed a system that extracts and classifies medical text from prescription
photographs [10].
For this purpose, the system uses a four-phase method. The phases are text localization and
extraction from prescription photographs, image categorization, OCR-enabled image-to-text
conversion, and text classification into categories such as symptoms, medications, diagnostic
tests, and others. The research uses a very deep convolutional network known as VGG-16 for image
classification. The Bidirectional Encoder Representations from Transformers (BERT) model is
used for text categorization. Tesseract OCR is used to extract text from English printed text
images, with an error rate of only 3.62%. The image classification training accuracy is 94.44%,
and the medical term classification accuracy is 99.9%.
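An OCR error rate is commonly measured as the edit distance between the recognized text and the reference text, divided by the reference length. The sketch below assumes a character-level error rate; the text above does not state how the 3.62% figure was measured, so this is an illustration of the general technique rather than the evaluation used by Sakib et al.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

# Hypothetical OCR output with one missing character.
print(character_error_rate("paracetamol", "paracetmol"))
```

One missing character in an eleven-character word gives an error rate of 1/11, or about 9%.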
Through the extraction and classification of data from unstructured Bangladeshi medical
prescriptions, the proposed system seeks to establish a medical history archive. This paper
extracts every word in an unordered manner, so there is no sequence or
relationship between the words. Also, it works only with English handwritten and
printed text.
Padmanabhan et al. studied deep learning methods, including CNN, RNN, and LSTM, to train a
model for recognizing doctors' handwritten prescriptions in a variety of languages
[11].
The study used grayscale conversion for image preprocessing, deep learning techniques like
CNN, RNN, and LSTM for image recognition, and OCR and word segmentation for converting
medical prescription images into digital text. Fuzzy search and market basket analysis were used
to optimize the pharmaceutical database and provide structured output to the user, while Unicode
was used to match words from various languages.
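Fuzzy search maps a possibly misread word to the closest entry in a known vocabulary. The sketch below uses `difflib.get_close_matches` from the Python standard library; the medicine list and the cutoff value are hypothetical, and this is not necessarily the fuzzy search method used by Padmanabhan et al.

```python
import difflib

# Hypothetical medicine vocabulary, for illustration only.
MEDICINES = ["paracetamol", "omeprazole", "amoxicillin", "metformin"]

def fuzzy_lookup(ocr_word, vocabulary=MEDICINES, cutoff=0.7):
    """Return the closest vocabulary entry to a possibly misread word, or None."""
    matches = difflib.get_close_matches(
        ocr_word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_lookup("Paracetmol"))
print(fuzzy_lookup("xyz"))
```

A higher cutoff reduces wrong corrections but leaves more words unmatched, so the value would need tuning on real OCR output.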
The paper uses a dataset of 17,431 handwritten medical prescriptions from 39 Bangladeshi
doctors, preprocessed and augmented using SRP, and trains a model using the IAM dataset.
The model was trained for 50 epochs, with the data divided into 90% for training and 10% for
testing, and the CTC loss was kept low. Tested at a pharmacy shop, it successfully recognized
handwritten notes from physicians, delivering prescriptions in consumers' preferred language.
Paul et al. extracted medication names from handwritten prescriptions using a "weakly
supervised" method [12].
The paper introduces a weakly supervised method for extracting medicine names from
handwritten prescriptions using weakly labeled data. It uses weak labels to identify regions of
interest and injects a domain-specific medicine language model (an n-gram model). The model
uses a pre-trained OCR backbone and a CTC decoder for symbol classification. The suggested
method outperforms existing methods for extracting drug names from handwritten prescriptions
using weakly labeled data by more than 2.5 times.
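A CTC decoder turns a per-frame sequence of symbols into text by merging consecutive repeats and removing the blank symbol. The sketch below shows greedy (best-path) decoding, the simplest variant, and assumes the most likely symbol per frame has already been chosen. The blank character "-" is an assumption, and the decoder of Paul et al., which also injects an n-gram language model, is more elaborate than this.

```python
BLANK = "-"  # assumed blank symbol for this illustration

def ctc_greedy_decode(frame_symbols, blank=BLANK):
    """Collapse consecutive repeats, then drop blanks (best-path CTC decoding)."""
    decoded, previous = [], None
    for symbol in frame_symbols:
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)

print(ctc_greedy_decode("nn-aa-pp-a"))
print(ctc_greedy_decode("hhee-l-lloo"))
```

The blank between two identical symbols is what allows genuine double letters, such as the "ll" in the second example, to survive the collapsing step.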
The study uses a dataset of handwritten prescriptions, consisting of 9645 images written by 117
doctors. The dataset contains weak labels with medicine names, and 500 images are strongly
annotated for evaluation. The study also uses more than 90,000 medicine names.
OCR errors hinder advanced information extraction methods, and adapting models to
domain-specific training data is costly due to limited medical document availability.
Sharma et al. discussed the use of Optical Character Recognition (OCR) systems to extract text
from photos and convert it to ASCII text [13].
The process involves digitization, pre-processing, text localization, splitting, categorization, and
character recognition. The system uses TensorFlow and Keras for data pre-processing, and the
accuracy of CNN classification of handwritten digits is evaluated using a single layer and a
hidden layer. The paper also mentions the Leptonica library, an open-source library for
computer vision and image processing applications. The accuracy is found to be 93% on
average.
The MNIST dataset is used as the training sample by the system.
This project reduces the complexity and improves the accuracy of OCR, reaching its highest
accuracy of around 93% on average.
3.2 Limitation of previous work
Reference | Year | Publisher / Venue | Dataset | Method | Accuracy | Limitation
--- | --- | --- | --- | --- | --- | ---
Roy et al. [3] | 2017 | Elsevier | 500 prescription images | Keyword spotting, Hidden Markov Model | Not specified | Efficiency depends on keyword database size
Dhande et al. [4] | 2017 | ICCUBEA | Not specified (they created their own dataset by taking photos) | Line and word segmentation, convex hull algorithm, SVM | Segmentation: line (95%) and word (92%); recognition accuracy (85%) | Recognition accuracy is low, and they intend to improve it in the future
Tabassum et al. [9] | 2022 | Scientific Reports | 17,431 samples of 480 medical terms | LSTM, RSS data augmentation | 93% | Performance depends on the augmentation used
Sakib et al. [10] | 2022 | ICCIT | Used their own dataset | OCR, BERT, VGG-16 | Image classification (94.44%), medical term classification (99.9%) | No relationship between the words; works only with English handwritten and printed text
Paul et al. [12] | 2023 | ICDAR | 9,645 handwritten medical prescriptions; also uses 90,000 medicine names | Weakly supervised method, OCR backbone, CTC decoder | Performs 2.5 times better than other methods | Model adaptation is costly due to limited medical document availability
Sarkar et al. [1] | 2013 | Springer | Not specified | Word spotting | Not specified | Performance drops when search query words have four or fewer characters
Veit et al. [2] | 2016 | arXiv | COCO-Text dataset | Three different OCR approaches | Not specified | Difficulty in accurately classifying text by language due to ambiguity
1. Prescription Image Capture: - The procedure starts when a prescription is captured or
scanned. The system assigns a unique identification number to the patient who holds the
prescription. Every further piece of information pertaining to that patient's prescriptions uses
this patient ID as a point of reference.
2. Produce an ID for the patient: - The system creates or retrieves a patient ID when capturing
a new prescription image. To ensure the uniqueness and integrity of patient information, this ID
can be generated sequentially, or it can be acquired from an already-existing patient database.
3. Store Data in the Database: - The captured prescription photograph and the related data
(patient ID, prescription details, date, etc.) are stored in a structured database. Using a
relational database model or another suitable data storage technology keeps the data organized
and retrievable.
4. Display Data from the Database: - The system can retrieve and display prescription data that
has been stored. To retrieve certain prescription records, this may entail executing a database
query using the patient ID or other pertinent criteria. After the data has been retrieved,
authorized users can view it for inspection or additional processing.
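The four steps above can be sketched with Python's built-in `sqlite3` module. The table names, column names, and helper functions below are hypothetical choices for illustration; the actual schema and database engine of the proposed system may differ.

```python
import sqlite3

def create_schema(conn):
    """Create one table for patients and one for their prescriptions."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS patients (
            patient_id INTEGER PRIMARY KEY AUTOINCREMENT,
            name       TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS prescriptions (
            prescription_id INTEGER PRIMARY KEY AUTOINCREMENT,
            patient_id      INTEGER NOT NULL REFERENCES patients(patient_id),
            image_path      TEXT NOT NULL,
            details         TEXT,
            issued_on       TEXT
        );
    """)

def get_or_create_patient(conn, name, patient_id=None):
    """Step 2: reuse a known patient ID, otherwise generate the next one."""
    if patient_id is not None:
        row = conn.execute(
            "SELECT patient_id FROM patients WHERE patient_id = ?",
            (patient_id,)).fetchone()
        if row:
            return row[0]
    cursor = conn.execute("INSERT INTO patients (name) VALUES (?)", (name,))
    return cursor.lastrowid

def store_prescription(conn, patient_id, image_path, details, issued_on):
    """Step 3: store the captured image path and the extracted data."""
    conn.execute(
        "INSERT INTO prescriptions (patient_id, image_path, details, issued_on) "
        "VALUES (?, ?, ?, ?)",
        (patient_id, image_path, details, issued_on))
    conn.commit()

def prescriptions_for(conn, patient_id):
    """Step 4: retrieve all stored prescriptions of one patient by date."""
    return conn.execute(
        "SELECT image_path, details, issued_on FROM prescriptions "
        "WHERE patient_id = ? ORDER BY issued_on",
        (patient_id,)).fetchall()

# Usage with an in-memory database and hypothetical values.
conn = sqlite3.connect(":memory:")
create_schema(conn)
pid = get_or_create_patient(conn, "Patient A")
store_prescription(conn, pid, "scan_001.jpg", "Napa 500 mg", "2024-01-05")
print(prescriptions_for(conn, pid))
```

Parameterized queries (the `?` placeholders) are used throughout so that patient data is never concatenated into SQL strings.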
4.2 Current Status
First, we collect data (prescriptions) from different sources such as friends and relatives. Then
we manually annotate some of the images and train a YOLOv8 model to detect the annotated
regions. The detected regions are then cropped out as separate images. The main goal is to
provide better health care and improve the usability, accuracy, and accessibility of prescription data.
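Cropping a detected region amounts to slicing the image rows and columns inside the bounding box. The sketch below assumes a row-major image stored as nested lists and a box given as (x_min, y_min, x_max, y_max) with exclusive maximum edges; in practice an image library would be used, and the function name is hypothetical.

```python
def crop(image, box):
    """Cut the region (x_min, y_min, x_max, y_max) out of a row-major image."""
    height, width = len(image), len(image[0])
    x_min, y_min, x_max, y_max = box
    # Clamp the box to the image so slightly oversized detections do not fail.
    x_min, y_min = max(0, x_min), max(0, y_min)
    x_max, y_max = min(width, x_max), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]

# Hypothetical 4x4 image whose pixel values are simply 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(crop(image, (1, 1, 3, 3)))
```

The cropped region can then be passed on to the character recognition stage on its own.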
Medical documents collection (prescription): Collecting prescriptions, which hold the full
medical history, is one of the major challenges in this process. The steps include arranging,
extracting, storing, and checking prescriptions in a database. Patients are assured that their
prescriptions are stored safely and that their identity is hidden.
Image capture (prescription): This phase's main task is scanning or photographing
prescriptions accurately. The hard copy of the prescription is digitized into a soft copy in this
step so that it can be processed and analyzed further. The prescriptions are then stored in
sequence so that they can be easily accessed or retrieved.
Image Annotation (Manually using API): In this phase we use CVAT Application
Programming Interface (API) tools to annotate the images manually. We create Regions of
Interest (ROI), which cover the medicine name, dose, weight, and the number of days the
medicine is to be continued. We manually categorize and label these fields in the prescription
image, which gives the coordinates of each categorized and labeled field.
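YOLO training expects each annotation as a text line of the form `class x_center y_center width height`, with the four box values normalized to the range 0 to 1 by the image size. The sketch below converts a pixel-coordinate box into such a line; the class list is a hypothetical mapping of the fields named above, and the pixel box format (x_min, y_min, x_max, y_max) is an assumption about the annotation export.

```python
# Hypothetical class list for the annotated prescription fields.
CLASSES = ["medicine_name", "dose", "weight", "duration"]

def to_yolo_label(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A medicine name annotated at pixels (100, 200)-(300, 400) in a 1000x800 image.
print(to_yolo_label(CLASSES.index("medicine_name"), (100, 200, 300, 400), 1000, 800))
```

One such line is written per annotated field, with all lines for an image collected in a text file that shares the image's base name.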
Training Models with YOLOv8: YOLOv8 is a deep learning model used for object detection in
images and video; here it is used to detect the annotated text regions. We use the annotated
images from the previous step to train the YOLOv8 model, and we try to use as much data as
possible to get better performance, because the accuracy of the output depends on the quality
of the training data. After testing the model, we obtain a patient's medicine, weight, and dosage
information.
5.2 Budget
To build our dataset, we need to collect around 2,000 medical prescriptions, which requires a
good amount of manpower. The collectors will gather prescriptions from different hospitals with
the consent of the respective patients.
To develop our system we need 800 working hours. At a cost of 100 taka per hour, we need
800 × 100 = 80,000 taka to develop our system.
[1] Sarkar, Sayantan. (2013). Word Spotting in Cursive Handwritten Documents Using Modified
Character Shape Codes. doi: 10.1007/978-3-642-31600-5_27.
[2] Veit, Andreas, Matera, Tomas, Neumann, Lukas, Matas, Jiri, Belongie, Serge. (2016).
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images.
arXiv: Computer Vision and Pattern Recognition.
[3] Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prithviraj Dhar, Umapada Pal,
Keyword spotting in doctor's handwriting on medical prescriptions, Expert Systems with
Applications, Volume 76, 2017, Pages 113-128, ISSN 0957-4174,
doi: https://doi.org/10.1016/j.eswa.2017.01.027.
[4] P. S. Dhande and R. Kharat, "Character Recognition for Cursive English Handwriting to
Recognize Medicine Name from Doctor's Prescription," 2017 International Conference on
Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 2017, pp. 1-5,
doi: 10.1109/ICCUBEA.2017.8463842.
[8] Khandokar, I & Hasan, Md Munirul & Ernawan, Ferda & Islam, Md & Kabir, Muhammad
Nomani. (2021). Handwritten character recognition using convolutional neural network. Journal
of Physics: Conference Series. 1918. 042152. doi: 10.1088/1742-6596/1918/4/042152.
[9] Tabassum, Shaira & Abedin, Nuren & Rahman, Md & Rahman, Md & Ahmed, Mostafa &
Maruf, Rafiqul & Ahmed, Ashir. (2022). An online cursive handwritten medical words
recognition system for busy doctors in developing countries for ensuring efficient healthcare
service delivery. Scientific Reports. 12. doi:10.1038/s41598-022-07571-z.
[10] A. M. Sakib, B. Jamal Ferdosi, S. Jahan and K. Jashim, "Medical Text Extraction and
Classification from Prescription Images," 2022 25th International Conference on Computer and
Information Technology (ICCIT), Cox's Bazar, Bangladesh, 2022, pp. 472-477, doi:
10.1109/ICCIT57492.2022.10055123.
[12] Paul, S., Madan, G., Mishra, A., Hegde, N., Kumar, P., Aggarwal, G. (2023). Weakly
Supervised Information Extraction from Inscrutable Handwritten Document Images. In: Fink,
G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023.
ICDAR 2023. Lecture Notes in Computer Science, vol 14190. Springer, Cham. doi:
https://doi.org/10.1007/978-3-031-41685-9_28
[13] Sharma, K. , Kona, S.J. , Jangwal, A. , M, A. , Rajabai C, P. and Sona, D.R. 2023.
Handwritten Digits and Optical Characters Recognition. International Journal on Recent and
Innovation Trends in Computing and Communication. 11, 4 (May 2023), 20–24.
DOI:https://doi.org/10.17762/ijritcc.v11i4.6376.
[14] Reis, D., Kupec, J., Hong, J., & Daoudi, A. (2023). Real-Time Flying Object Detection with
YOLOv8. ArXiv, doi: 10.48550/arxiv.2305.09972
Chapter 7: Appendix (CEP mapping)
7.1 Complex Engineering Problem & Mapping
How Ks are addressed through the project and mapping among Ps, COs and POs
Ks | Attributes | How Ks are addressed | COs | POs
--- | --- | --- | --- | ---
K8 | Research literature | Research on different paper to figure out | CO1 | PO-(l) Life-long learning
How Ps are addressed through the project and mapping among Ps, COs, and POs
Ps | Attributes | How Ps are addressed | COs | POs
--- | --- | --- | --- | ---
P3 | Depth of analysis required | There is no clear solution due to the availability and variations of handwritten medical prescriptions, so in-depth analysis is needed to select a proper pipeline from many alternatives. Proper data analysis is also needed. [k2] | CO1, CO2 | PO-(l) Life-long learning, PO-(b) Problem analysis
How As are addressed through the project and mapping among Ps, COs, and POs
As | Attributes | How A’s are addressed
--- | --- | ---
A1 | Range of resources | The project needs a variety of medical prescriptions as a dataset, and technology such as high-end PCs, Jupyter notebooks, and TensorFlow.
A2 | Level of interaction | When the system receives a prescription image, it converts it into a digital format and stores it in the database. This helps patients and doctors access any medical information easily.
A4 | Consequences for society and environment | Our work helps people understand the cursive handwriting of medical prescriptions and their medical history.
COs | Description | POs
--- | --- | ---
CO1 | Identifying a real-life problem that can be transmitted to an engineering or computing solution through design, development and validation. | PO(l) Life-long learning
CO2 | Identify, formulate, and analyze a real world problem based on requirement analysis. | PO(b) Problem analysis, PO(c) Design/development
CO4 | Use modern development tools which are popular among s/w developers. | PO(c) Design/development, PO(d) Investigation
CO5 | Identify societal, health, safety, legal and cultural issues related to the project. | PO(g) Environment and sustainability
CO8 | Use modern analysis and design tools in the process of designing and validating of a system and subsystem | PO(e) Modern tool usage