
Diabetes Retinopathy Prediction Using Multi-Model Hyper-Tuned Machine Learning


A PROJECT REPORT

Submitted by
S. PRIYADHARSHINI (18132001)
S. HARIPRIYA (18132009)
A. AARTHI (18132011)

Under the guidance of

Dr. B.V. BAIJU


Assistant Professor (S.G)

in partial fulfillment for the award of the degree


of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

SCHOOL OF COMPUTING SCIENCES

DEPARTMENT OF INFORMATION TECHNOLOGY

HINDUSTAN INSTITUTE OF TECHNOLOGY AND SCIENCE

CHENNAI- 603 103

APRIL 2022

BONAFIDE CERTIFICATE

Certified that the project report titled “DIABETES RETINOPATHY PREDICTION USING MULTI MODEL HYPER TUNED MACHINE LEARNING” is the bonafide work of S. PRIYADHARSHINI (18132001), S. HARIPRIYA (18132009), and A. AARTHI (18132011), who carried out the project work under my supervision.

SIGNATURE SIGNATURE

Dr. V. CERONMANI SHARMILA Dr.B.V. BAIJU

ASSOCIATE PROFESSOR ASSISTANT PROFESSOR

HEAD OF THE DEPARTMENT SUPERVISOR

DEPARTMENT OF IT DEPARTMENT OF IT

INTERNAL EXAMINER EXTERNAL EXAMINER

Name: ________________________ Name: _____________________

Designation: ___________________ Designation:________________

Project Viva Voce conducted on ___________

ACKNOWLEDGEMENT

First and foremost, I would like to thank the Lord Almighty for His presence

and immense blessings throughout the project work. It’s a matter of pride and

privilege for me to express my deep gratitude to the management of HITS for

providing me the necessary facilities and support.

I wish to express my heartfelt gratitude to Dr.V. Ceronmani Sharmila, Head

of the Department, Department of Information Technology, for his valuable

support and encouragement in carrying out this work.

I would like to thank my internal guide Dr.B.V. Baiju, Assistant Professor,

Department of Information Technology for continually guiding and actively

participating in my project, giving valuable suggestions to complete the project

work.

I would like to express my deepest gratitude to my parents, who were pillars of support when I was looking for encouragement. Last but not least, special thanks to all my department faculty for their support and prompt guidance throughout the project.

ABSTRACT

Diabetic retinopathy is a chronic complication of diabetes mellitus, a condition caused by a lack of insulin, and it can lead to loss of vision if it is not identified at an early stage. Diabetic retinopathy is also considered one of the major causes of vision impairment in older people. Diabetes can lead to acute complications such as cardiovascular disease and stroke. If preventive measures are not taken, it can lead to further complications such as nephropathy, diabetic foot and retinopathy. Data mining plays an important role in diabetic retinopathy detection, which can be beneficial for the better health of society. The proposed model identifies diabetic retinopathy using classification models; applying machine learning algorithms with hyperparameter tuning produces higher disease prediction accuracy.

TABLE OF CONTENTS

CHAPTER TITLE PAGE


NO. NO.
Abstract iv
List of Abbreviations ix
List of Figures x
List of Tables xi
1 INTRODUCTION 1
1.1 INTRODUCTION 1
1.2 OVERVIEW OF PROJECT 2
1.3 SCOPE OF THE PROJECT 3
1.4 SUMMARY 3
2 LITERATURE REVIEW 4
2.1 INTRODUCTION 4
2.2 INITIAL INVESTIGATION 4
2.3 FEASIBILITY STUDY 4
2.3.1 Technical Feasibility
2.3.2 Economical Feasibility
2.3.3 Behavioral Feasibility 5
2.4 LITERATURE SURVEY 6

2.5 SUMMARY 10

3 PROJECT DESCRIPTION 11
3.1 INTRODUCTION 11
3.2 EXISTING SYSTEM 11
3.2.1 Limitations of Existing System 12
3.3 PROPOSED SYSTEM 12
3.3.1 Advantages over Existing System 13
3.4 SUMMARY 13
4 PROJECT REQUIREMENT 14
4.1 INTRODUCTION
4.2 SOFTWARE REQUIREMENTS
4.2.1 Hardware and software specification
4.2.2 Python
4.2.3 Anaconda Software
4.2.4 OpenCV 14-20
4.2.5 Jupyter Notebook
4.3 TECHNOLOGY OVERVIEW
4.3.1 Multi Model ML
4.3.2 HyperParameter tuning
4.4 SUMMARY
5 SYSTEM DESIGN 21
5.1 INTRODUCTION 21
5.2 SYSTEM ARCHITECTURE
5.2.1 Input Design 21-24
5.2.2 User Interface design

5.2.3 Procedural Design
5.2.4 Output Design
5.3 SUMMARY
6 MODULE DESCRIPTION 25
6.1 INTRODUCTION 25
6.2 MODULES 25
6.2.1 Module 1: Data Collection and
preprocessing
6.2.2 Module 2: Exploratory data analysis
6.2.3 Module 3: Implementation of ML
algorithm 25-28
6.2.4 Module 4: Prediction of diabetic
retinopathy
6.2.5 Module 5: Performance analysis
6.3 SUMMARY
7 IMPLEMENTATION 28
7.1 INTRODUCTION 28
7.2 PROCEDURE 28
7.3 ALGORITHMS USED 29
7.3.1 Logistic Regression algorithm 29
7.3.2 KNN Algorithm 29
7.3.3 Decision Tree Algorithm 30
7.3.4 Random Forest Algorithm 30
7.3.5 SVM Algorithm 31
7.4 ALGORITHM WITH HYPERPARAMETER 31
7.4.1 Web Application 32

7.5 Evaluation Parameters 32
7.5.1 Confusion Matrix 32
7.5.2 F1-Score 33
7.5.3 Precision 33
7.5.4 Recall 33
7.6 Summary 34
8 CONCLUSION & FUTURE WORK 35
8.1 CONCLUSION 35
REFERENCES 36
APPENDIX 39
A. SCREENSHOTS 39
B. SAMPLE CODE 47
C. PUBLICATION

LIST OF FIGURES

Fig. No Title Page No.

1 System Architecture 21
2 Input Design 22
3 UI Design 23
4 Procedural Design 23
5 Confusion Matrix 33

LIST OF ABBREVIATIONS

DR Diabetic Retinopathy

ML Machine Learning

SVM Support Vector Machine

KNN K-Nearest Neighbor

GDM Gestational Diabetes Mellitus

PDR Proliferative diabetic retinopathy

NPDR Non-Proliferative Diabetic Retinopathy

EDA Exploratory data analysis

BDR Background Diabetic retinopathy

SDR Simple Diabetic retinopathy

CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION
Diabetic retinopathy is a severe retinal disorder that occurs because of uncontrolled diabetes mellitus and leads to loss of vision if the disease is not detected promptly. Diabetic retinopathy is a major disorder as per World Health Organization reports, and is projected to affect 191 million people by 2030. Retinal diseases caused by diabetes include glaucoma, cataracts and retinopathy. There is a large chance that diabetic patients remain untreated for a very long time. The foremost aim of this work is to construct a working model to detect diabetic retinopathy by using multi-model machine learning with hyperparameter tuning to obtain higher accuracy than the existing system. Once the model is trained and tested with the datasets, it can be used to predict diabetic retinopathy affecting a person at an earlier stage. Diabetic retinopathy mainly affects the retinal blood vessels and diminishes the vision of the human eye. Microvascular disease and macrovascular disease are the most common diabetic complications. Microvascular disease threatens the tiny blood vessels and arteries; diabetes mainly attacks the eyes (retinopathy), nerves (neuropathy) and kidneys (nephropathy). Crucial macrovascular complications include cardiovascular disease, manifested as strokes, among other serious conditions. There are different types of diabetes that can affect a human; the most common are type 1 diabetes, type 2 diabetes and gestational diabetes mellitus (GDM). Type 1 diabetes commonly affects children; type 2 diabetes commonly affects middle-aged people and, less frequently, older people; gestational diabetes mellitus (GDM) affects women during pregnancy. Generally, diabetic retinopathy treatment is conducted by an ophthalmologist, who gathers retinal images of the patient. The data mining process can be exceedingly helpful to medical practitioners for extracting hidden medical knowledge. Diabetic retinopathy is a complex disorder that cannot be ignored, according to clinical research in ophthalmology. DR is classified into two stages: non-proliferative DR (NPDR) and proliferative DR (PDR); diabetic retinopathy is further classified into mild and severe stages. Early treatment of DR can prevent patients from further deterioration. Today, diabetic retinopathy treatment is handled by the most experienced doctors, who classify the critical stages of patients by identifying the disease manually. The performance of the Random Forest and Decision Tree algorithms was better than that of the Support Vector Machine, KNN and K-means algorithms, because the parameters used for the multi-model algorithms gave higher accuracy and detected diabetic retinopathy at an earlier stage.

1.2 OVERVIEW OF PROJECT

● The proposed system classifies normal and abnormal input data by using all necessary ML algorithms. Computational methods and algorithms have been developed to analyze as well as quantify biomedical datasets.
● The proposed technique applies supervised machine learning to the sector of clinical analysis in order to reduce the time and strain experienced by the ophthalmologist and other members of the team in the screening, diagnosis and treatment of diabetic retinopathy.
● As diabetes continues to increase in prevalence, more and more people are looking for ways to manage the disease. Some of the most popular methods used to treat diabetes include diet, exercise, and medication. However, there is still much debate surrounding how effective these treatments are.
● Multi-model machine learning is a machine learning method that can help predict which type of diabetes will develop in a person. This is important because it can help make better decisions about treatment and medications.

1.3 SCOPE OF THE PROJECT


The project has significant scope in the future. According to the report, diabetic retinopathy is a condition that affects the eyes. It can damage the blood vessels in the light-sensitive tissue at the back of the eye (retina), and blindness may ultimately arise due to this problem.
1.4 SUMMARY
This chapter covered the overview of the project; its major areas, data mining techniques and machine learning algorithms with hyperparameter tuning, were discussed here.

3
CHAPTER 2
LITERATURE REVIEW
2.1 INTRODUCTION
Research in a specific field should include a detailed study of the literature related to that subject, and a highly structured review is required to acquire a strong background in the domain. A complete study of the literature gives insight into what has already been done in the same field, which leads to a significant investigation. This chapter presents a detailed review of the problem of diabetic retinopathy disease prediction, a disease induced by diabetes. Various methods of predicting the different diseases caused by diabetes have been analyzed and discussed in this chapter.

2.2 INITIAL INVESTIGATION


Diabetic retinopathy is a chronic complication of diabetes mellitus caused by high levels of sugar in the blood, which can affect the human organs depending on the severity of the diabetes. The initial investigation of diabetic retinopathy research seeks to find the organs affected by diabetes mellitus. One hundred and ninety-one insulin-taking patients with diabetes of at least 5 years' duration were identified from their doctors' records in 1970–1971. Patients were seen again in 1972–1973 and 1976–1977. The proposed classification scheme appears useful for characterizing patients' overall retinopathy severity on the basis of gradings of fundus photographs.

4
2.3 FEASIBILITY STUDY
The feasibility study is research into the software product's viability, or a quantification of how beneficial product development would be for the enterprise from a practical standpoint. A feasibility study is performed for several reasons: to determine whether a software product is viable in terms of development, implementation, and cost to the company. The feasibility study is an important stage of the software project management process, as it determines whether to continue with the proposed project because it is practically viable, or to halt it because it is not feasible to develop and to reconsider the proposal. Along with this, a feasibility study assists in identifying risk factors involved in developing and deploying a system, and in planning for risk analysis. Additionally, by studying the various parameters associated with proposed project development, a feasibility study helps narrow business options and improve success rates.
2.3.1 Technical feasibility
In this phase, existing resources, including hardware and software, as well as
necessary technologies, are analyzed/assessed in order to build the project.
Appropriate trials were conducted to determine the project's feasibility in terms of
technical feasibility, convenience, and economic viability. The findings of the
examination indicate that the project is technically viable. The performance study
demonstrates that the project achieves a higher level of accuracy and precision,
implying that it is feasible in terms of performance criteria.
2.3.2 Economical Feasibility
The economic feasibility of a solution is determined by its cost. Economic feasibility research examines the project's costs and benefits. During this feasibility study, an in-depth evaluation of the project's development costs is conducted, covering all costs needed for final development, including hardware and software support requirements, design and development costs, and operations costs, among others. The proposal is easy to implement and economically viable.

2.3.3 Behavioral Feasibility


Behavioral feasibility is a measure of how well a proposed system solves issues, how
well it exploits possibilities found during scope definition, and how well it meets
requirements established during the requirements analysis phase of system
development. It works quite well with a modest internet connection. The connection
is encrypted, and user information is securely kept. User inputs are accurately
captured, and responses are immediately recorded. This application is compatible
with all JavaScript enabled browsers. It is compatible with all mobile and desktop
devices.

2.4 LITERATURE SURVEY


Diabetes is a chronic disease that affects human organs; it has various stages, and depending on the stage it affects different parts of the human body. Most of the food that we eat is broken down into sugar and released into our bloodstream. When our blood sugar increases, it signals the pancreas to release insulin. Dr. B.V. Baiju et al. identified a proposed algorithm for identifying the diabetic diseases which affect a person's neuropathy, nephropathy and cardiovascular system. The authors used a hybrid model to achieve higher accuracy compared to the standard model, using the K-means, Sparsity Correlation, Decision Tree, MLDDM and MAIM algorithms, and report an accuracy of 99.5% using 500 samples of data.

Vaibhav V. Kamble et al. [2] proposed classifying a retinal image dataset using an RBF neural network. The trial reports an accuracy of 71.2, sensitivity of 0.83 and specificity of 0.043 for DIARETDB0.

Amol Prataprao Bhatkar [3] proposed a Multilayer Perceptron Neural Network (MLPNN) to recognize diabetic retinopathy in retinal images. The MLPNN classifier is introduced to classify retinal images as normal and abnormal. The Train-N-Times technique was used to train the MLPNN to find the best feature subset. The training and cross-validation rates of the MLPNN are 100 percent for detection of normal and abnormal retinal images.

Cut Fiarni et al. [4] recommended a prediction model for diabetes complication diseases based on data mining techniques and clustering and classification algorithms. The model separates diabetes clinical data into four classes: nephropathy, retinopathy, neuropathy, and mixed effects. To develop the ideal rule-based model for prediction, they measured performance using clustering and classification algorithms.

Elmogy, Mohammed [5] proposed an ML-CAD framework that, for ophthalmologists, visualizes distinctive disease changes and diagnoses DR grades. They start by eliminating noise, improving the quality, and normalizing the retinal image sizes. The researchers computed the gray-level run-length matrix average in four particular directions to distinguish healthy and DR individuals.

S. Sankaranarayanan and Pramananda Perumal T. [6] proposed two significant data mining procedures, FP-Growth and Apriori, which were applied to a diabetes dataset; association rules are generated by both of these algorithms.

Jiangxue Han et al. [7] developed a computer vision system for recognizing and automating this complaint, using a neural network to give findings for an enormous number of cases in a short amount of time.

Y. Sun and D. Zhang [8] proposed an assortment of five machine learning models for identifying DR in cases using Electronic Health Record (EHR) data, as well as a set of treatment options. The final testing findings demonstrate that random forest is a machine learning model that achieves 92% accuracy and performs well.

G. Kalyani et al. [9] detect and classify diabetic retinopathy using the proposed capsule network. The convolution and primary capsule layers are used to extract features from fundus images, while the class capsule layer and softmax layer are used to estimate the probability that an image belongs to a particular class. Four performance measures are used to check the proposed network's efficiency using the Messidor dataset.

Manisha Sharma et al. [10] proposed a computer vision system, automated using a neural network and also detecting the disease, to give results for an enormous number of cases in a small amount of time.

Harry Pratt et al. [11] proposed a CNN approach for diagnosing diabetic retinopathy from digital fundus images and directly classifying its severity. He also developed a network with a CNN architecture and data augmentation that could identify the intricate features involved in the classification task, such as micro-aneurysms, exudates and haemorrhages on the retina, and accordingly provide a diagnosis automatically and without user input. This network was trained using a high-end GPU to obtain high classification accuracy.

V. Deepa et al. [12] proposed a procedure presenting an ensemble of multi-stage deep CNN models for diabetic retinopathy grading based on image patches. The pre-processing stage allows the input images to provide far more useful information than the raw input images; normalization and resizing are used as the pre-processing methods in this work. The proposed multi-stage algorithm is executed in three primary stages to perform the grading measures of the decision network.

Syna Sreng and Noppadol Maneerat [13] proposed a strategy in which the image is preprocessed to remove small noise and improve the contrast of the image. Threshold-based segmentation is utilized to detect the bright lesions. Then the red lesions are detected based on top-hat morphological filtering methods. Furthermore, the bright and dark lesions are combined using a logical AND operator. To leave only pathological signs, the noise near vessels is further eliminated using blob analysis. Finally, morphological features are extracted and passed to the SVM classifier.

Suriyaharayananm et al. [14] proposed a recent method for blind-spot detection: first detect the vessel and exudate patches and extract each to obtain points. Features such as vessels, exudates and points are detected accurately using suitably applied morphological operations.

Carlos Santos et al. [15] proposed a strategy based on deep neural network models that perform one-stage object detection, using moderate data augmentation and transfer learning techniques, to provide a model that detects fundus lesions. The model was trained on and built around the YOLOv5 architecture and the PyTorch framework, achieving good values for mAP.

Kavakiotis et al. [16] proposed, in the field of diabetes research, applications of AI, data mining strategies, and tools for prediction and diagnosis. The utilization of ML and data mining strategies on advanced datasets that incorporate clinical and biological data is expected to lead to more in-depth investigation toward the diagnosis and treatment of DM, owing to the advent of biotechnology and the tremendous amount of data generated.

2.5 SUMMARY
This chapter covered the initial investigation of the project, the feasibility study, and the literature survey. The major areas of the project, data mining techniques and machine learning algorithms with hyperparameter tuning, were discussed here.

CHAPTER 3
PROJECT DESCRIPTION
3.1 INTRODUCTION
Diabetic retinopathy is a disease caused by changes in the blood vessels of the retina. In most people with DR, the blood vessels in the retina may swell and leak fluid. Crucial macrovascular complications include cardiovascular disease, manifested as strokes, among other serious conditions. There are different types of diabetes that can affect a human; the most common are type 1 diabetes, type 2 diabetes and gestational diabetes mellitus (GDM). Type 1 diabetes commonly affects children; type 2 diabetes commonly affects middle-aged people and, less frequently, older people. Gestational diabetes mellitus (GDM) affects women during pregnancy. Generally, diabetic retinopathy treatment is conducted by an ophthalmologist, who gathers retinal images of the patient. The data mining process can be exceedingly helpful to medical practitioners for extracting hidden medical knowledge.
3.2 EXISTING SYSTEM
The existing work deals with the issue of disease prediction using a biomedical diabetic data set, which has been analyzed in detail. Accordingly, the authors present an impact-measure-based disease prediction algorithm. First, the method reads the biomedical data set and performs noise removal. Second, the features are extracted, and for each data point a multi-attribute relational similarity measure (MARSM) is evaluated against the different clusters available. Based on the evaluated MARSM measure, a single class is identified for the item in question. The procedure produces higher effectiveness in clustering as well as in disease prediction, reduces the false classification ratio, and reduces the time complexity. The work could further be improved by including environmental factors in diabetes-dependent disease prediction. Researchers might additionally add instances of diabetic retinopathy to the dataset for the prediction of diabetic retinopathy.
3.2.1 Limitations of Existing System
The existing system was designed to show the higher accuracy of the proposed algorithms (MAIM, MLDDM) by combining two algorithms using machine learning methods. The proposed method reported that the hybrid model achieved higher accuracy compared to the standard model, using the K-means, Sparsity Correlation, Decision Tree, MLDDM and MAIM algorithms to predict diabetes, diabetic nephropathy, diabetic neuropathy and diabetic cardiovascular disease. The existing method did not predict the diabetic disease that affects human vision (retinopathy).

3.3 PROPOSED SYSTEM


The proposed work is intended to demonstrate data mining techniques in diabetic disease prediction systems in the medical domain. In order to perform this task, retinopathy disease data was selected for analysis and prediction. In the proposed system we used five supervised machine learning algorithms to compare their accuracy and show which supervised algorithm has the highest accuracy in detecting diabetic retinopathy disease. The best model algorithm then predicts diabetic retinopathy based on the user's input data. In the proposed model we have designed a web application with sample data, so when a user gives any input corresponding to the data in the web application, the machine learning model predicts the relevant information and concludes whether a person with such data is affected by diabetic retinopathy or not. In this system we used a hyperparameter tuning method to compare each model's prediction of diabetic retinopathy without and with hyperparameter tuning.
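The comparison described above can be sketched as follows. This is a minimal illustrative sketch, not the project's actual code: it assumes scikit-learn is available, substitutes a small synthetic dataset for the retinopathy data, and tunes only one example hyperparameter grid per model via GridSearchCV.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the retinopathy feature data (hypothetical).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# The five supervised models, each paired with a small example search grid.
models = {
    "LogisticRegression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "DecisionTree": (DecisionTreeClassifier(random_state=42), {"max_depth": [3, 5, None]}),
    "RandomForest": (RandomForestClassifier(random_state=42), {"n_estimators": [50, 100]}),
    "SVM": (SVC(), {"C": [0.1, 1, 10]}),
}

scores = {}
for name, (model, grid) in models.items():
    # Baseline test accuracy without tuning ...
    base = model.fit(X_train, y_train).score(X_test, y_test)
    # ... versus test accuracy after grid-searched hyperparameter tuning.
    search = GridSearchCV(model, grid, cv=5).fit(X_train, y_train)
    scores[name] = (base, search.score(X_test, y_test))

best = max(scores, key=lambda n: scores[n][1])  # best tuned model
for name, (base, tuned) in scores.items():
    print(f"{name}: untuned={base:.3f} tuned={tuned:.3f}")
print("Best model:", best)
```

In a web application, the fitted best estimator would then be applied to the user's input row to return the retinopathy prediction.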
3.3.1 Advantages over Existing System
The existing system was built to show the accuracy of predicting diabetes, diabetic nephropathy, diabetic neuropathy and diabetic cardiovascular disease; that model measured accuracy with the K-means, Sparsity Correlation, Decision Tree, MLDDM and MAIM machine learning algorithms. The proposed model was designed to predict diabetic retinopathy using data mining techniques, and the proposed method includes multiple machine learning algorithms so that their accuracies can be compared and the most accurate algorithm used to predict retinopathy disease. We have improved algorithm accuracy by adding hyperparameter tuning and show the resulting accuracy of each algorithm.

3.4 SUMMARY
This chapter covered the existing and proposed systems of the project; the major areas of the project were data mining and machine learning. The advantages of the proposed system over the existing system were discussed here.

CHAPTER 4
PROJECT REQUIREMENT
4.1 INTRODUCTION
A software requirement may be understood as a property that the software must exhibit in order for it to correctly perform its function. This function may be to automate some part of a task of the people who will use the software, to support the business processes of the organisation that has commissioned the software, to control a device in which the software is to be embedded, and many more. The functioning of the users, the business processes, or the device will typically be complex and, by extension, the requirements on the software will be a complex aggregate of requirements from different people at different levels of an organisation and from the environment in which the software must execute.
4.2 SOFTWARE REQUIREMENTS
The requirements specification is a technical specification of requirements for the software product. It is the first step in the requirements analysis process; it lists the requirements of a particular software system, including functional, performance and security requirements. The requirements also provide usage scenarios from a user, an operational and an administrative perspective. The purpose of a software requirements specification is to provide a detailed overview of the software project, its parameters and goals. It describes the project's target audience and its user interface, hardware and software requirements. It defines how the client, team and audience see the project and its functionality.

4.2.1 HARDWARE AND SOFTWARE SPECIFICATION

• Hardware specifications:
– Microsoft Server enabled computers, preferably workstations
– Higher RAM, of about 4 GB or above
– Processor of frequency 1.5 GHz or above
• Software specifications:
– Python 3.6 and higher
– Anaconda software
– Jupyter Notebook

4.2.2 PYTHON
Python is a programming language that supports the creation of a wide variety of applications. Developers regard it as a great choice for Artificial Intelligence (AI), Machine Learning, and Deep Learning projects.

● It has a large variety of libraries and frameworks: The Python language comes with many libraries and frameworks that make coding easy, which also saves a significant amount of time. The most famous libraries are NumPy, which is used for scientific calculations; SciPy for more advanced computations; and scikit-learn for data mining and data analysis. These libraries work alongside powerful frameworks like TensorFlow, CNTK, and Apache Spark, and they are critical when it comes to machine learning and deep learning projects.

● Simplicity: Python code is concise and readable even to new developers, which is useful for machine and deep learning projects. Due to its simple syntax, development of applications with Python is fast compared to many programming languages. Furthermore, it allows the developer to test algorithms without fully implementing them. Readable code is also crucial for collaborative coding: many people can work together on a complicated project, and one can easily find a Python developer for the team, as Python is a familiar platform. A brand-new developer can therefore quickly get familiar with Python's principles and work on the project straight away.

● Large online support: Python is an open-source programming language and enjoys excellent support from many resources and fine documentation worldwide. It also has a huge and active community of developers who offer help at any stage of development.

● Fast development: Python has a syntax that is straightforward to understand and friendly. Furthermore, its numerous frameworks and libraries enhance software development. By using out-of-the-box solutions, a lot can be accomplished with a few lines of code. Python is good for developing prototypes, which enhances productivity.

● Flexible integrations: Python projects can be integrated with other systems coded in different programming languages. This means it is much easier to blend it with other AI projects written in other languages. Also, since it is extensible and portable, Python can be used to carry out cross-language tasks. The adaptability of Python makes it easy for data scientists and developers to train machine learning models.

● Fast code tests: Python provides plenty of code review and test tools. Developers can quickly check the correctness and quality of the code. AI projects tend to be time-consuming, so a well-established environment for testing and checking for bugs is needed. Python is an ideal language because it supports these features.

● Performance: Some developers argue that Python is relatively slow in comparison to other programming languages. While speed is not one of Python's strong suits, it provides a solution called Cython, a superset of the Python language designed to achieve code performance similar to the C language. Developers can use Cython to code C extensions the same way they code in Python, as its syntax is nearly the same. Cython increases the language's performance significantly.

● Visualization tools: Python comes with a huge variety of libraries, and some of these frameworks provide excellent visualization tools. In AI, machine learning, and deep learning, it is vital to present data in a human-readable format.

4.2.3 ANACONDA SOFTWARE


Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment. The distribution includes data science packages suitable for Windows, Linux, and macOS. It is developed and maintained by Anaconda, Inc., which was founded by Peter Wang and Travis Oliphant in 2012. As an Anaconda, Inc. product, it is also called Anaconda Distribution or Anaconda Individual Edition, while other products from the company are Anaconda Team Edition and Anaconda Enterprise Edition, both of which are not free. Package versions in Anaconda are managed by the package management system conda. This package manager was spun out as a separate open-source package, as it ended up being useful on its own and for things other than Python. There is also a small, bootstrap version of Anaconda called Miniconda, which includes only conda, Python, the packages they depend on, and a small number of other packages.

4.2.4 OpenCV

OpenCV is a large open-source library for computer vision, machine learning, and image processing, and it now plays a major role in the real-time operation that is critical in today's systems. With it, one can process images and videos to identify objects, faces, or even human handwriting. When integrated with libraries such as NumPy, Python is able to process the OpenCV array structure for analysis. To identify an image pattern and its various features, we use vector space and perform mathematical operations on those features. The first OpenCV version was 1.0. OpenCV is released under a BSD license and is therefore free for both academic and commercial use. It has C++, C, Python, and Java interfaces and supports Windows, Linux, Mac OS, iOS, and Android. When OpenCV was designed, the main focus was real-time applications and computational efficiency, so everything is written in optimized C/C++ to take advantage of multi-core processing.

4.2.5 Jupyter Notebook

This section gives a simple overview of the Jupyter Notebook App and its components, and of the history of Project Jupyter, which shows how it is linked to IPython. The three most popular ways to run notebooks are with the help of a Python distribution, with pip, or in a Docker container. A practical introduction to these components, complete with examples of Pandas DataFrames, an explanation of how to make notebook documents "magical", and best practices and tips, will help you make your notebook an added value to any data-science project. The simplest way for a beginner to get started with Jupyter Notebooks is by installing Anaconda, the most widely used Python distribution for data science, which comes pre-loaded with the most popular libraries and tools. As well as Jupyter, some of the biggest Python libraries wrapped up in Anaconda include NumPy, pandas, and Matplotlib, though the full list of 1,000+ packages is exhaustive. This lets you hit the ground running in your own fully stocked data-science workshop without the hassle of managing endless installations or worrying about dependencies and OS-specific (read: Windows-specific) installation issues.

4.3 TECHNOLOGY OVERVIEW


The technology overview is the starting point for developing a well-defined value proposition, that is, a single sentence that describes the product that uses the technology, indicating the tangible outcomes for customers and showing undeniable market leadership within a given target segment.

4.3.1 MULTI MODEL ML


Multi-model machine learning (ML) is an approach that can be used to predict the performance of an individual model on a data set. A model is a set of predictions made by a system, and multi-model ML can be used to predict the performance of multiple models on a data set. This is significant because it allows for the training and prediction of models that are not limited to one specific model or data set. Multi-model ML can improve the accuracy of predictions by allowing the training of more diverse models. In order to improve the accuracy of our machine learning models, it is crucial to understand and explore multi-model techniques, that is, to understand how different models can be used together to create a better solution. Predictive modeling problems in which the structure of the problem itself suggests the use of multiple models are common; such problems can be mechanically divided into subproblems.
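As an illustrative sketch of the multi-model idea (not the project's exact pipeline), several scikit-learn models can be trained on the same split and compared side by side; the synthetic dataset below is only a stand-in for the real diabetes data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the diabetes dataset (illustrative only)
X, y = make_classification(n_samples=500, n_features=8, random_state=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

# Train several models on the same split and record each one's accuracy
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=10),
    "Random Forest": RandomForestClassifier(n_estimators=20, random_state=10),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Comparing all models on one held-out test set keeps the comparison fair, since every model sees exactly the same training and testing data.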

4.3.2 Hyperparameter tuning


Hyperparameter tuning (or hyperparameter optimization) is the process of determining the right combination of hyperparameters that maximizes model performance. It works by running multiple trials in a single training process. Each trial is a complete execution of the training application with values for the chosen hyperparameters, set within the limits you specify. Once this process is complete, it yields the set of hyperparameter values best suited for the model to produce optimal results. Hyperparameter settings can have a large effect on the prediction accuracy of the trained model, and optimal settings often vary between datasets, so they must be tuned for each dataset. Since the training process does not set the hyperparameters, there needs to be a meta-process that tunes them; this is what we mean by hyperparameter tuning.
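The trial-and-evaluate loop described above can be sketched with scikit-learn's GridSearchCV; the grid values and synthetic data here are illustrative assumptions, not the project's actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Each combination in the grid is evaluated with cross-validation;
# the best-scoring setting is then refit on the full training set.
param_grid = {"n_estimators": [10, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)
print("Best hyperparameters:", search.best_params_)
print("Test accuracy:", search.score(X_test, y_test))
```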

4.4 SUMMARY
This chapter covers the software and hardware requirements of the project; the major areas of the project are data mining and machine learning. The requirements and the technology overview were discussed here.

CHAPTER 5
SYSTEM DESIGN
5.1 INTRODUCTION
The model was designed, optimized, and evaluated, achieving high testing accuracy by adding a hyperparameter tuning method to the machine learning system. The system design shows the overall process of the model, which includes multiple modules. The architecture diagram depicts the entire flow of the model, from data preprocessing to the final prediction and the evaluation parameters.

5.2 SYSTEM ARCHITECTURE

Figure 1: System Architecture


The first phase is data cleaning, which involves fixing or removing duplicated,
poorly formatted, or damaged data. The second phase is data integration, which
involves combining data from several sources into a single perspective. The data
reduction stage is the third step, in which the data is encoded, scaled, and sorted if
necessary. The data transformation is the last phase in which the data is turned into
the desired format.

5.2.1 Input Design

Figure 2: Input design


The multi-model prediction algorithm identifies the input data and then verifies its completeness. The method then computes the multi-level disease dependency measure for each algorithm across each dimension. The final model is a web application where the user types in the relevant data, and the model predicts whether or not the user has diabetic retinopathy.

5.2.2 User Interface design

Figure 3: UI design
In this project we have designed a web application in Python where the user can enter details such as Glucose Level, Blood Pressure, High Pressure, Insulin, BMI, Diabetes, and Age; based on this input, the web application displays whether the user is affected by diabetic retinopathy or not. If the person does have diabetic retinopathy, the application shows a suggestion to the affected person.
5.2.3 Procedural Design

Figure 4: Procedural design

The procedural diagram depicts the entire flow of the machine learning model. The dataset was collected from Kaggle and pre-processed, and data mining techniques were used to find any anomalies in it. The dataset was then split into training and testing parts, and finally, using the evaluation parameters, we identified the machine learning algorithm with the highest accuracy for predicting diabetic retinopathy.

5.2.4 Output Design


The final output of the proposed project identifies whether or not the user is affected by diabetic retinopathy. We built our model in Python as a Flask web application; since Flask is a micro-framework and easy to manage given its Pythonic nature, we chose it.

5.3 SUMMARY
This chapter covers the system design and the input and output design of the project; the major areas of the project are data mining and machine learning. The system architecture and design overview were discussed here.

CHAPTER 6
MODULE DESCRIPTION
6.1 INTRODUCTION
The system is intended to demonstrate data mining techniques for diabetic retinopathy prediction. Features can be extracted from the input data to detect diabetic retinopathy. The data mining approach described here focuses on feature relevance and classification techniques to accurately categorize the disease associated with the retina, based on the features extracted from the input parameters. Checkpoints were created to stop training once the model reached a higher accuracy, and the best model that produces high accuracy was saved. Finally, retinopathy is predicted when input data is given.

6.2 Modules
6.2.1 Module 1: Data collection and preprocessing
1. The dataset is pre-processed, and the cleaned data is used for training.
2. The cleaned data is then explored and prepared: noise is removed and all parameters are converted to the same scale.
3. Data preprocessing comprises four steps. The first step is data cleaning, in which duplicate, incorrectly formatted, or corrupted data is fixed or removed.
4. The second step is data integration, which combines multiple sources of data into a single view.
5. The third step is data reduction, in which the data is encoded, scaled, and sorted if needed.
6. The final step is data transformation, in which the data is converted into the required format.

6.2.2 Module 2: Exploratory data analysis


EDA is a vital phase following data collection and preprocessing, in which the data is simply displayed, plotted, and manipulated without making any assumptions, in order to aid data-quality evaluation and model construction.
● Exploratory data analysis is a very important phase for learning about and investigating various data sets and summarizing their major characteristics.
● The main purpose of exploratory analysis is to look for distribution, outliers, and anomalies in the data that will lead to specific testing of the hypothesis.
● Applying EDA can help uncover hidden patterns in datasets, detect outliers, and identify important variables and any anomalies in the data.
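A small pandas sketch of the EDA tasks named above — summary statistics plus a simple interquartile-range (IQR) rule for flagging outliers; the values below are made-up stand-ins for the real records:

```python
import pandas as pd

# Illustrative stand-in values for two of the input parameters
df = pd.DataFrame({"Glucose": [85, 148, 183, 0, 137, 116],
                   "BMI": [26.6, 33.6, 23.3, 43.1, 25.6, 31.0]})

# Summary statistics reveal the distribution and suspicious values
print(df.describe())

# IQR rule: flag values more than 1.5 * IQR outside the quartiles
q1, q3 = df["Glucose"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["Glucose"] < q1 - 1.5 * iqr) | (df["Glucose"] > q3 + 1.5 * iqr)]
print("Potential outliers:")
print(outliers)
```

Here the impossible glucose reading of 0 is flagged, the kind of anomaly EDA is meant to surface before model training.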

6.2.3 Module 3: Implementation Machine learning algorithms

● To predict diabetic retinopathy, we propose a multi-model supervised learning-based system using Random Forest, Support Vector Machine, Decision Tree, Logistic Regression, and KNN.
● The fundamental advantage of integrating numerous approaches is that each method may benefit from the complementary predictive properties of the others.
● Multi-component algorithms use multi-layer and deep architectures to progressively extract the data's fundamental properties from the lowest to the highest levels, and they can also uncover diverse patterns in enormous amounts of data.
● The prediction model will assist us in identifying the presence of diabetic retinopathy in the given sample of users.

6.2.4 Module 4: Prediction of Diabetes retinopathy

● To analyze DR, we created an application that is able to take all of the input data, process it using distinct techniques, and then accurately predict whether the person is diagnosed with diabetic retinopathy or not.
● The best model is saved and used for prediction. The data given by the user is predicted using the pre-trained model; the accuracy of the prediction model is over 95%.
● The retinopathy result is finally displayed on the GUI so that the potential patient can know the current condition of the ailment.
● A checkpoint is created for the model that produces the best accuracy, and prediction is carried out with it.
● The application can act as a virtual assistant in major clinical laboratories, healthcare centers, and medical clinics.

6.2.5 Module 5: Performance analysis


● Confusion Matrix
● F1 Score
● Precision

● Recall
● Support

6.3 SUMMARY
This chapter covers the Modules description of the project and the major area of
the project was Data mining and machine learning. Five modules were discussed
here.

CHAPTER 7
IMPLEMENTATION

7.1 INTRODUCTION
To build and test the model we used five algorithms: Logistic Regression, Random Forest, Decision Tree, KNN, and Support Vector Machine. We applied hyperparameter tuning to each algorithm and report its individual accuracy. This work compares the five algorithms with and without hyperparameter tuning for identifying patients with possible diabetic retinopathy. The results show that each algorithm achieves higher accuracy with hyperparameter tuning than without it.

7.2 PROCEDURE
There are four stages to data preparation.
1. The first phase is data cleaning, which involves fixing or removing duplicated,
poorly formatted, or damaged data.
2. The second phase is data integration, which involves combining data from
several sources into a single perspective.
3. The data reduction stage is the third step, in which the data is encoded, scaled,
and sorted if necessary.
4. The data transformation is the last phase in which the data is turned into the
desired format.
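The four preparation stages can be sketched with pandas and scikit-learn; the toy frame below is a hypothetical stand-in for the raw diabetes records, and step 2 is only noted because this project uses a single source:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative frame standing in for the raw records (note the
# duplicate row and the missing glucose value)
df = pd.DataFrame({
    "Glucose": [148, 85, 148, 183, None],
    "BMI": [33.6, 26.6, 33.6, 23.3, 28.1],
    "Outcome": [1, 0, 1, 1, 0],
})

# 1. Cleaning: drop duplicates, then fill missing values with the mean
df = df.drop_duplicates().fillna(df.mean(numeric_only=True))
# 2. Integration would merge frames from several sources (single source here)
# 3. Reduction: encode/scale the numeric features
features = ["Glucose", "BMI"]
df[features] = StandardScaler().fit_transform(df[features])
# 4. Transformation: the frame is now in the format the models expect
print(df)
```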

7.3 ALGORITHMS USED
In this work we used supervised ML methods to detect the early stage of DR in a person; the algorithms used are Logistic Regression, Random Forest, Decision Tree, KNN, and Support Vector Machine.

7.3.1 Logistic regression algorithm


Logistic regression is a widely used Machine Learning algorithm that comes under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables. The term "logistic" is taken from the logit function that is used in this method of classification. Logistic regression is a supervised machine learning algorithm that models the probability of a class or event. It is used for linearly separable data, which means logistic regression is generally applied to binary classification problems. As a supervised learning algorithm, it requires input variables (x) and a target variable (y) when you train the model.
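A minimal logistic-regression sketch with scikit-learn, again on synthetic stand-in data; `predict_proba` exposes the class probabilities that make this model suitable for binary classification:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
# predict_proba gives the estimated probability of each class
proba = clf.predict_proba(X_test[:1])
print("P(class 0), P(class 1):", proba[0])
print("Accuracy:", clf.score(X_test, y_test))
```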

7.3.2 KNN Algorithm


KNN, also called the k-nearest neighbors method, is a supervised ML technique. By examining the closest data points, its basic purpose is to classify and predict test data points. As a result it has been used in data mining for classification, regression, and missing-value estimation. k-means, a related method, is among the simplest algorithms for segmenting and classifying an image into different clusters based on attribute and feature values. There are various ways to choose the value of k, but the simplest is to run the algorithm with several values and choose the one that performs best. While the model is being developed, each data point is assigned its largest local neighborhood, the one that encompasses the greatest number of data points, and the procedure is repeated until every data point is covered by a selected representative. In this case we do not need to specify the exact k of the technique during model creation: the number of data points owned by each neighborhood serves as an implicit k, although it varies across representatives, and k is determined automatically while the model is created. Furthermore, using the resulting classification model not only decreases the volume of data to be classified but also gradually improves efficiency.
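The "try several k values and keep the best" strategy mentioned above can be sketched directly with scikit-learn's KNeighborsClassifier on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2)

# Try several candidate k values and keep the best-performing one
best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = knn.score(X_test, y_test)
    if acc > best_acc:
        best_k, best_acc = k, acc
print("Best k:", best_k, "accuracy:", round(best_acc, 3))
```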

7.3.3 Decision Tree


The decision tree is a popular and powerful algorithm for classification and prediction. As the name suggests, it is a tree-like structure that helps identify a result by following a tree-like graph. This work focuses on the decision tree algorithm in particular. Unlike the numerical weights on the connections between nodes of a neural network, which are substantially more difficult to turn into conceptual rules, decision trees are a common and interpretable classification method in data mining. They offer a wide range of applications due to their ease of use and accuracy across many data types. A decision tree performs a recursive split of the instance space to classify the data. In the simplest and most common situation, each test evaluates a single attribute; for numeric properties, the condition refers to a range of values. Geometrically, a decision tree corresponds to a collection of hyperplanes, and decision makers naturally favor less complicated solutions because they are more understandable.
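The interpretability claimed above can be seen directly: scikit-learn's `export_text` prints the learned splits as readable rules. Data and feature names here are illustrative stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=4, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=3)

# A shallow tree keeps the rule set small and human-readable
tree = DecisionTreeClassifier(max_depth=3, random_state=3).fit(X_train, y_train)
print(export_text(tree, feature_names=[f"f{i}" for i in range(4)]))
print("Accuracy:", tree.score(X_test, y_test))
```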

7.3.4 Random Forest
Random Forest is a supervised ML algorithm that helps solve regression and classification problems. It builds decision trees on multiple samples and takes their majority vote for classification and their average for regression. The Random Forest algorithm creates the final result by combining the results of many decision trees. It follows the bagging principle, which creates different training subsets from the sample training data with replacement, and the final result is based mostly on voting. Each tree depends on the values of a random vector sampled independently and with the same distribution across the forest. As the number of trees grows large, the generalization error for the forest converges to a limit. The strength of the individual trees in the forest, as well as the correlation between them, is crucial to the generalization error of a forest of tree classifiers. As the name implies, the Random Forest algorithm relies on generated random numbers. We estimate an unknown joint distribution, and the main objective is to identify the prediction function for forecasting the value; a loss function quantifies the penalty incurred for inaccurate predictions.
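A minimal Random Forest sketch on synthetic stand-in data; the `feature_importances_` attribute illustrates how the ensemble's vote can also rank input variables, which is how the risk factors in the conclusion are ranked:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=4)

forest = RandomForestClassifier(n_estimators=50, random_state=4).fit(X_train, y_train)
print("Accuracy:", forest.score(X_test, y_test))
# Feature importances (summing to 1) show which inputs drive the vote most
for i, imp in enumerate(forest.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```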
7.3.5 Support Vector Machine
SVM, the support vector machine, is a machine learning algorithm that can be used for classification as well as regression problems. SVM is a linear model for classification and regression; it can solve linear and non-linear problems and works well for various practical problems. scikit-learn is a widely available library for implementing ML algorithms, and its SVM support includes model fitting and validation. SVM works by mapping data to a high-dimensional feature space so that data points can be separated even when the data is not otherwise linearly separable. A separator between the categories is found, and the data are transformed in such a way that the separator can be drawn as a hyperplane.
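A short SVM sketch with scikit-learn on synthetic stand-in data; the RBF kernel performs the high-dimensional mapping described above, and a scaler is included because SVMs are sensitive to feature scale:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)

# Scaling matters for SVMs; the RBF kernel handles non-linear boundaries
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("Accuracy:", svm.score(X_test, y_test))
```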
7.4 ALGORITHM WITH HYPERPARAMETER TUNING
In ML, hyperparameters are used to improve the accuracy of a machine learning algorithm. Hyperparameter tuning can be applied to an algorithm with low accuracy and will help improve it; this tuning is key to machine learning algorithms (note the distinction between a "model parameter" and a "model hyperparameter").
There are two common hyperparameter tuning methods, GridSearchCV and RandomizedSearchCV. We used GridSearchCV for our model to improve its accuracy. GridSearchCV has an added advantage over RandomizedSearchCV: the randomized search is faster, but because it samples settings at random it may miss good values, and there is a chance the accuracy of the algorithm will decrease; grid search is slower, but it evaluates every point on the grid and finds the best value to improve the accuracy of the model.
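The trade-off between the two searches can be seen side by side; the grid below is an illustrative assumption, and the randomized search samples only 4 of the 9 combinations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=6)
params = {"n_estimators": [10, 30, 50], "max_depth": [3, 5, None]}

# GridSearchCV tries every combination (9 candidates per CV fold here)
grid = GridSearchCV(RandomForestClassifier(random_state=6), params, cv=3).fit(X, y)
# RandomizedSearchCV samples a fixed number of combinations instead
rand = RandomizedSearchCV(RandomForestClassifier(random_state=6), params,
                          n_iter=4, cv=3, random_state=6).fit(X, y)
print("Grid best:", grid.best_params_, round(grid.best_score_, 3))
print("Random best:", rand.best_params_, round(rand.best_score_, 3))
```

Because the randomized search only samples a subset of the same grid, its best cross-validated score can never exceed the exhaustive grid search's, which is the advantage argued above.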
7.4.1 Web Application
We built our model using Python in a Flask web application. The model was trained and saved as a pickle file. The front end was designed for user input: users enter valid data and the result is displayed in the web application. Flask is extensible and doesn't force a particular directory structure or require complicated boilerplate code before getting started. Flask's framework is more explicit than Django's and is also easier to learn because it has less base code to implement a simple web application.
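A minimal sketch of such a Flask endpoint, assuming Flask is installed; `predict()` here is a hypothetical threshold stand-in for the pickled model, and the field names are illustrative:

```python
from flask import Flask, request

app = Flask(__name__)

# Hypothetical stand-in for the trained model loaded from a pickle file
def predict(glucose, bmi):
    return 1 if glucose > 140 and bmi > 30 else 0

@app.route("/predict", methods=["POST"])
def predict_route():
    # Read the form fields submitted by the front end
    form = request.form
    result = predict(float(form["glucose"]), float(form["bmi"]))
    # Returning a dict makes Flask serialize it as JSON
    return {"diabetic_retinopathy": bool(result)}

# To serve locally: app.run(debug=True)
```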

7.5 EVALUATION PARAMETERS
Once the model has been built, its accuracy has to be evaluated with performance metrics for machine learning methods. We used the F1 score, precision, recall, and the confusion matrix.

7.5.1 Confusion Matrix


The confusion matrix is a very intuitive cross tab of actual class values and predicted class values. It contains the count of observations that fall in each category.
Build a model → make class predictions on test data using the model → create a confusion matrix for each model.

Figure 5: Confusion matrix
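The build → predict → tabulate steps above can be sketched with scikit-learn; the label vectors below are made-up stand-ins for real test predictions:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual classes (stand-in values)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions (stand-in values)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN FP FN TP =", tn, fp, fn, tp)
```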

7.5.2 F1- Score


The F1 score captures the balance between precision and recall: it is not high if one of these measures is improved at the expense of the other.
F1 Score = 2 * (Recall * Precision) / (Recall + Precision).

7.5.3 Precision
Ratio of true positives to all predicted positives. Important when: the cost of false positives is high.
Precision = TP/(TP+FP)

7.5.4 Recall
Ratio of true positives to actual positives in the data. Important when: identifying
the positives is crucial.
Sensitivity or Recall = TP/(TP+FN)
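The precision, recall, and F1 formulas above can be checked by hand against scikit-learn, reusing the stand-in label vectors from the confusion-matrix example:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn = 3, 1, 1  # counts tallied from the vectors above

precision = tp / (tp + fp)                         # TP/(TP+FP)
recall = tp / (tp + fn)                            # TP/(TP+FN)
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```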

7.6 SUMMARY
This chapter covers the algorithms and implementation of the project; the major areas of the project are data mining and machine learning. The machine learning models with hyperparameter tuning were discussed here.

CHAPTER 8
CONCLUSION & FUTURE WORK
8.1 CONCLUSION
Diabetic Retinopathy (DR) is the most prevalent eye disease resulting in blindness in diabetic patients. Diabetic retinopathy (damage to the retina) is the most common sight-threatening diabetic eye disease and causes major vision loss and blindness; a patient with diabetes needs to undergo periodic eye screening.
The most important targets of this work were:
a) Development of a system that will be capable of identifying patients with BDR and PDR from either a colour image or a grey-level fundus image.
b) The diabetic retinopathy signs of interest include red spots and bleeding, both of which fall between the BDR and PDR stages of the disease, while SDR cases are expected to be referred to the ophthalmologist.
State-of-the-art ML strategies were adopted in this study. Traditional regression analysis relies on hypothesis-driven assumptions, while the ML techniques used here do not require a predetermined assumption. This allows data-driven exploration of non-linear patterns that predict risk for a given individual, i.e., specific risk stratification. As found in this study, the importance ranking showed that the duration of diabetes, HbA1c, systolic blood pressure, TG, BMI, serum creatinine, age, education level, duration of hypertension, and income level were the ten most important factors for RDR. Furthermore, the given ML algorithm requires only minimal input during the model development stage, which is especially important since ML models can easily incorporate new data to update and optimize, thereby constantly improving their discriminative performance over time.
Our models provided evidence for DR screening in high-risk populations and might help reduce the frequency of ocular examinations in low-risk populations. Limited research has been available on risk stratification of DR based on ML and non-ocular parameters. By training on the data of 1,782 patients (without using cross-validation), the logit model obtained an AUC of 0.760 based on backward elimination as a feature selection strategy. We divided the clinical data of 536 patients in Taiwan into training and validation sets (at an 80:20 ratio) and compared the performance of four models (support vector machine, decision tree, ANN, and logistic regression) for DR detection, and found that the support vector machine performed best with an AUC of 0.839. Random Forest outperformed logistic regression for DR detection with AUCs of 0.84 and 0.77, respectively. The studies cited above were based on hospital data, but population-based data are more applicable to the reality of DR screening programmes. This study applied ML techniques to population-based data and demonstrated their usefulness for RDR detection, with AUCs comparable to those in hospital-based research.
The importance ranking analysis showed that the amount and duration of smoking and drinking were also important for RDR. Finally, the ranking of risk factors may offer insight into the prevention of DR.
In this secondary analysis of a large-scale population-based survey, we first extracted demographic variables, laboratory test results, and clinical and family history, and then applied different ML algorithms to rank risk factors and identify RDR. The Random Forest algorithm achieved the best overall performance based on ten simple variables. Using ML algorithms to rank epidemic risk factors (apart from ophthalmic examinations) to identify referable patients will reduce the cost and has high utility value in resource-poor areas.

REFERENCES
1. Amol Prataprao Bhatkar and G.U. Kharat (2015), "Detection of Diabetic Retinopathy in Retinal Images Using MLP Classifier", IEEE International Symposium on Nanoelectronic and Information Systems (iNIS), Vol. 1, pp. 331-335.

2. B.V. Baiju and John Aravindhar (2019), "Multi attribute inter dependency relational clustering of diabetic data with influence measure based disease prediction", Journal of Green Engineering (JGE), Vol. 9, Issue 1.

3. Cut Fiarni et al. (2019), "Analysis and Prediction of Diabetes Complication Disease using Data Mining Algorithm", Procedia Computer Science, Vol. 161, pp. 449-457.

4. Carlos Santos et al. (2021), "A New Method Based on Deep Learning to Detect Lesions in Retinal Images using YOLOv5", IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 3513-3520.

5. Dipika Gadriye and Gopichand Khandale (2014), "Neural Network Based Method for the Diagnosis of Diabetic Retinopathy", International Conference on Computational Intelligence and Communication Networks (CICN), Vol. 1, pp. 1073-1077.

6. V. Deepa (2021), "Ensemble of multi-stage deep convolutional neural networks for automated grading of diabetic retinopathy using image patches", Journal of King Saud University - Computer and Information Sciences.

7. Eman Abdelmaksoud et al. (2021), "Automatic diabetic retinopathy grading system based on detecting multiple retinal lesions", IEEE Access, Vol. 9.

8. Farrukh Aslam Khan et al. (2021), "Detection and Prediction of Diabetes Using Data Mining: A Comprehensive Review", IEEE Access, Vol. 9, pp. 43711-43735.

9. Harry Pratt et al. (2016), "Convolutional Neural Networks for Diabetic Retinopathy", Procedia Computer Science, International Conference on Medical Imaging Understanding and Analysis, Vol. 90, pp. 200-205.

10. Jayant Yadav et al. (2017), "Diabetic Retinopathy detection using feedforward neural network", Tenth International Conference on Contemporary Computing (IC3), pp. 1-3.

11. Jiaxi Gao et al. (2019), "Diabetic Retinopathy classification using an efficient convolutional neural network", IEEE International Conference on Agents (ICA).

12. Kavakiotis et al. (2017), "Machine Learning and Data Mining Methods in Diabetes Research", Computational and Structural Biotechnology Journal.

13. Kalyani et al. (2021), "Diabetic Retinopathy detection and classification using capsule networks", Springer journals.

14. Anwar F. et al. (2022), "A comparative analysis on diagnosis of diabetes mellitus using different approaches", pp. 2352-9148.
APPENDIX

A. SCREENSHOTS

1. RANDOM FOREST
1.1 Random Forest Algorithm with hyperparameter

Figure 6: Confusion matrix for Random Forest with hyper parameter tuning

The above screenshots show the accuracy of the Random Forest algorithm with
hyper parameter tuning techniques.

1.2 Random Forest Algorithm without hyperparameter

Figure 7: Confusion matrix for Random Forest without hyper parameter tuning

The above screenshots show the accuracy of the Random Forest algorithm without hyperparameter tuning; hence the accuracy of the model is lower than with the hyperparameter tuning method.

2. LOGISTIC REGRESSION

2.1 Logistic Regression Algorithm with hyperparameter

Figure 8: Confusion matrix for Logistic Regression with hyper parameter tuning

The above screenshots show the accuracy of the Logistic Regression algorithm with
hyper parameter tuning techniques.

2.2 Logistic Regression Algorithm without hyperparameter

Figure 9: Confusion matrix for Logistic Regression without hyper parameter tuning

The above screenshots show the accuracy of the Logistic Regression algorithm without hyperparameter tuning; hence the accuracy of the model is lower than with the hyperparameter tuning method.

3. KNN ALGORITHM

3.1 KNN with hyperparameter

Figure 10: Confusion matrix for KNN with hyper parameter tuning

The above screenshots show the accuracy of the KNN algorithm with hyper
parameter tuning techniques.

3.2 KNN without hyperparameter

Figure 11: Confusion matrix for KNN without hyper parameter tuning

The above screenshots show the accuracy of the KNN algorithm without hyperparameter tuning; hence the accuracy of the model is lower than with the hyperparameter tuning method.

4. DECISION TREE

4.1 Decision tree with hyperparameter

Figure 12: Confusion matrix for Decision Tree with hyper parameter tuning

The above screenshots show the accuracy of the Decision Tree algorithm with hyper
parameter tuning techniques.

4.2 Decision Tree without hyperparameter tuning

Figure 13: Confusion matrix for Decision Tree without hyperparameter tuning

The screenshot above shows the accuracy of the Decision Tree algorithm without
hyperparameter tuning; its accuracy is lower than that of the tuned model.

5. SVM ALGORITHM

5.1 SVM with hyperparameter tuning

Figure 14: Confusion matrix for SVM with hyperparameter tuning

The screenshot above shows the accuracy of the Support Vector Machine algorithm
with hyperparameter tuning.

5.2 SVM without hyperparameter tuning

Figure 15: Confusion matrix for SVM without hyperparameter tuning

The screenshot above shows the accuracy of the Support Vector Machine algorithm
without hyperparameter tuning; its accuracy is lower than that of the tuned model.

6. ALGORITHM COMPARISON

6.1 Comparison of algorithms with hyperparameter tuning

Figure 16: Comparison of algorithms with hyperparameter tuning

The comparison graph above shows the overall accuracy of each algorithm with
hyperparameter tuning; the Random Forest algorithm achieves the highest accuracy.

6.2 Comparison of algorithms without hyperparameter tuning

Figure 17: Comparison of algorithms without hyperparameter tuning

The comparison graph above shows the overall accuracy of each algorithm without
hyperparameter tuning; the Random Forest algorithm again achieves the highest
accuracy.
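Comparison graphs like Figures 16 and 17 can be produced from the `model_` and `score_` lists assembled in the sample code of Appendix B. A minimal sketch, using placeholder accuracy values rather than the project's actual results:

```python
# Sketch of the algorithm-comparison bar chart.
# The scores below are illustrative placeholders, not the project's results.
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

model_ = ["Random Forest", "Logistic Regression", "KNN",
          "Decision Tree", "SVM"]
score_ = [89.0, 77.5, 74.0, 76.5, 78.0]  # placeholder accuracies (%)

plt.figure(figsize=(8, 5))
bars = plt.bar(model_, score_, color="steelblue")
plt.ylabel("Accuracy (%)")
plt.title("Algorithm comparison")
plt.xticks(rotation=30, ha="right")
# annotate each bar with its accuracy value
for bar, s in zip(bars, score_):
    plt.text(bar.get_x() + bar.get_width() / 2, s + 0.5,
             f"{s:.1f}", ha="center")
plt.tight_layout()
plt.savefig("comparison.png")
```

The same plotting code serves both the tuned and untuned comparisons; only the contents of `score_` change between the two runs.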

B. SAMPLE CODE
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

model_, score_ = [], []  # model names and their accuracy scores

plt.figure(figsize=(10,10))
sns.heatmap(data.corr(), annot=True)
X = data.iloc[:,:-1]
y = data['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=10)
print("Train Set: ", X_train.shape, y_train.shape)
print("Test Set: ", X_test.shape, y_test.shape)
model = RandomForestClassifier(n_estimators=20)
model.fit(X_train, y_train)
acc_rand = accuracy_score(y_test, model.predict(X_test))*100
model_.append("Random Forest Classifier")
score_.append(acc_rand)
# search for optimum parameters using grid search
params = {'penalty': ['l1', 'l2'],
          'C': [0.01, 0.1, 1, 10],
          'class_weight': ['balanced', None]}
# the liblinear solver supports both l1 and l2 penalties
logistic_clf = GridSearchCV(LogisticRegression(solver='liblinear'),
                            param_grid=params, cv=10)
# train the model
logistic_clf.fit(X_train, y_train)
# make predictions
logistic_predict = logistic_clf.predict(X_test)
log_accuracy = accuracy_score(y_test, logistic_predict)

log_accuracy = round(log_accuracy*100, 2)
model_.append("Logistic Regression")
score_.append(log_accuracy)
cm = confusion_matrix(y_test, logistic_predict)
conf_matrix = pd.DataFrame(data=cm, columns=['Predicted:0','Predicted:1'],
                           index=['Actual:0','Actual:1'])
plt.figure(figsize=(8,5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap="YlGnBu")
# search for optimum parameters using grid search
params = {'n_neighbors': np.arange(1, 5)}
knn_clf = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=params,
                       scoring='accuracy', cv=10, n_jobs=-1)
# train the model
knn_clf.fit(X_train, y_train)
knn_clf.best_params_
# make predictions
knn_predict = knn_clf.predict(X_test)
# accuracy
knn_accuracy = accuracy_score(y_test, knn_predict)
knn_accuracy = round(knn_accuracy*100, 2)
model_.append("K Nearest Neighbour")
score_.append(knn_accuracy)
cm = confusion_matrix(y_test, knn_predict)
conf_matrix = pd.DataFrame(data=cm, columns=['Predicted:0','Predicted:1'],
                           index=['Actual:0','Actual:1'])
plt.figure(figsize=(8,5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap="YlGnBu")
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier(random_state=7)
# grid search for optimum parameters
params = {'max_features': ['auto', 'sqrt', 'log2'],
          'min_samples_split': list(range(2, 16)),
          'min_samples_leaf': list(range(1, 12))}
tree_clf = GridSearchCV(dtree, param_grid=params, n_jobs=-1)
# train the model
tree_clf.fit(X_train, y_train)
tree_clf.best_params_
# make predictions
tree_predict = tree_clf.predict(X_test)
# accuracy
tree_accuracy = accuracy_score(y_test, tree_predict)
tree_accuracy = round(tree_accuracy*100, 2)
model_.append("Decision Tree Classifier")
score_.append(tree_accuracy)
cm = confusion_matrix(y_test, tree_predict)
conf_matrix = pd.DataFrame(data=cm, columns=['Predicted:0','Predicted:1'],
                           index=['Actual:0','Actual:1'])
plt.figure(figsize=(8,5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap="YlGnBu")
# grid search for optimum parameters
Cs = [0.001, 0.01, 0.1, 1, 10]
gammas = [0.001, 0.01, 0.1, 1]
param_grid = {'C': Cs, 'gamma': gammas}
svm_clf = GridSearchCV(SVC(kernel='rbf', probability=True), param_grid,
                       cv=10)
# train the model
svm_clf.fit(X_train, y_train)
# make predictions
svm_predict = svm_clf.predict(X_test)
# accuracy
svm_accuracy = round(accuracy_score(y_test, svm_predict)*100, 2)
model_.append("Support Vector Machine")
score_.append(svm_accuracy)
cm = confusion_matrix(y_test, svm_predict)
conf_matrix = pd.DataFrame(data=cm, columns=['Predicted:0','Predicted:1'],
                           index=['Actual:0','Actual:1'])
plt.figure(figsize=(8,5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap="YlGnBu")

C. PUBLICATION

ICICCS 2022 Conference Acceptance Letter



ICCS Conference <iccs.conf.org@gmail.com>          Fri, 29 Apr 2022 at 3:46 pm
To: HARIPRIYA S <haripriya12072001@gmail.com>

Dear Author,

Your manuscript was accepted and recommended for publication in the Springer
Series - Advances in Intelligent Systems and Computing.

Please refer the acceptance letter and technical comments.

Kindly ensure the following points before uploading the final paper.
1. Final manuscript must be as per Springer template, Refer Springer sample word document
Click Here
2. Minimum 15-20 references are expected and that must be in the article and all
references must be cited in the text. Like [1], [2],.....
3. The article has few typographical errors which may be carefully looked at.
4. Complete the Consent to publish form (Publishing agreement).
5. Ensure all the figures and tables are cited in the sequential order.
6. Mark * for the corresponding author name and email address in the first page of the paper.

Important dates:
Conference Date: 29-30, June 2022
Last Date for Registration: 8, May 2022

Registration Information available at


http://iciccs.co.in/2022/registration.html

For payment, Please use the townscript link:

https://www.townscript.com/v2/e/4th-international-conference-on-intelligent-computing-
information-and-control- systems-204131/booking/tickets

Registration Method: Send Final paper (in both .doc & .pdf), Response to Reviewer
Comments, Publishing agreement and screen snapshot of payment proof to
iccs.conf.org@gmail.com

For any queries contact:


Email: iccs.conf.org@gmail.com
Mobile: 9597324073
Regards,
Conference chair ICICCS 2022
