
Skin Cancer Detection


A Project Report Submitted
In Partial Fulfillment of the Requirement for the Degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
by
Divyanshi Singh - 2018021051, Abhishek Kumar Yadav - 2018021007, Hritik Singh - 2018021056

Under the Supervision of


Prof. P. K. Singh
Department - Computer Science and Engineering

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Madan Mohan Malaviya University of Technology,


Gorakhpur (U.P.) - INDIA
June, 2022


© M. M. M. University of Technology, Gorakhpur (U.P.) – 273010, INDIA


ALL RIGHTS RESERVED


CERTIFICATE

Certified that Divyanshi Singh (2018021051), Abhishek Kumar Yadav (2018021007), and Hritik Singh (2018021056) have carried out the project work presented in this report, entitled “Skin Cancer Detection”, for the award of Bachelor of Technology in Computer Science and Engineering from Madan Mohan Malaviya University of Technology (formerly Madan Mohan Malaviya Engineering College), Gorakhpur (U.P.), under my supervision and guidance. The report embodies the results of original work and study carried out by the students themselves, and the contents of the report do not form the basis for the award of any other degree to the candidates or to anybody else.

Prof. P. K. Singh


Department - CSE
M.M.M.U.T. Gorakhpur

Date:


CANDIDATE’S DECLARATION

I declare that this written submission represents my work and ideas in my own words, and where other ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or falsified any idea/data/fact/source in my submission. I understand that any violation of the above will be cause for disciplinary action by the University and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

Divyanshi Singh - 2018021051


Abhishek Kumar Yadav - 2018021007
Hritik Singh - 2018021056
B.Tech (CSE), Final Year
Department of Computer Science and Engineering


APPROVAL SHEET

This project report entitled “Skin Cancer Detection” by Divyanshi Singh - 2018021051, Abhishek Kumar Yadav - 2018021007, and Hritik Singh - 2018021056 is approved for the degree of Bachelor of Technology in Computer Science and Engineering.

Examiner

Supervisor

Head of Department

Date:

Place:


ACKNOWLEDGEMENT

It is a matter of great pleasure and satisfaction for me to present this dissertation work entitled “Skin Cancer Detection”, as a part of the curriculum for the award of “Bachelor of Technology” from Madan Mohan Malaviya University of Technology, Gorakhpur (U.P.), India.

I am very grateful to the Head of the Department, Prof. Udai Shankar. It has been truly reassuring to know that he is always willing to share his quest for new problems and new solutions, which creates a very challenging and rewarding environment for us. He provided all kinds of academic as well as administrative support for the smooth completion of my dissertation work. Without his valuable guidance, this work would never have been successful.

I am very much thankful to my supervisor, Prof. P. K. Singh, for encouraging me to work in this emerging area of research and for his continuous guidance and support throughout my work. I would also like to thank all my classmates for their valuable suggestions and helpful discussions.

At last, I am grateful to my family members, especially my beloved parents, for their encouragement and tender support. Without them, I would not have had the strength to finish this dissertation.

Date Divyanshi Singh


Roll No. 2018021051


LIST OF FIGURES

Figure No. Description of Figures Page no.

Fig. 1.1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 2

Fig. 1.2 xxxxxxxxxxxxxxxxxxxxxxxxxx 3


LIST OF TABLES

Table No. Description Page no.

Table 3.1 Name of table 1

Table 3.2 Name of table 45


ABSTRACT

Skin cancer, the most common human malignancy, is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy, and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions. Recently, however, some researchers using deep convolutional neural networks have outperformed expert dermatologists. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists. Outfitted with deep neural networks, mobile devices can potentially extend the reach of dermatologists outside of the clinic.


TABLE OF CONTENTS

Certificate
Candidate’s Declaration
Approval Sheet
Acknowledgement
List of Figures
List of Tables
Abstract
Table of Contents

CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 Problem Statement
1.3 Proposed Solution

CHAPTER 2 LITERATURE SURVEY
2.1 Literature Survey on Skin Cancer Cure

CHAPTER 3 PROJECT DESCRIPTION

CHAPTER 4 TECHNOLOGY COMPONENTS USED
4.1 Python Anaconda Setup
4.2 TensorFlow Framework
4.3 HTML and CSS

CHAPTER 5 DATASET AND PREPROCESSING
5.1 Data Descriptor
5.2 Background & Summary
5.3 Digitization of Diapositives

CHAPTER 6 DEEP LEARNING TECHNIQUES FOR SKIN CANCER DETECTION
6.1 Convolution
6.2 Max Pooling
6.3 Flattening
6.4 Full Connection

CHAPTER 7 BUILDING THE CNN MODEL
7.1 Initializing the CNN
7.2 Adding the Convolution Layer
7.3 Adding the Pooling Layer
7.4 Adding Flattening to the Layer
7.5 Adding the Fully-Connected Layer & Output Layer
7.6 Compiling the CNN
7.7 Fitting the CNN Model to the Dataset

CHAPTER 8 TESTING AND PERFORMANCE OF OUR MODEL

CHAPTER 9 DEPLOYING A FLASK APP USING GUNICORN TO APP PLATFORM

REFERENCES


INTRODUCTION

Skin cancer is the single most common malignancy affecting humankind, with over 5.4 million cases diagnosed in the USA alone on a yearly basis. Yet diagnosis is still a visual process, which relies on the long-winded procedure of clinical screenings, followed by dermoscopic analysis, then a biopsy, and finally a histopathological examination. This process easily takes months, requires many medical professionals, and is still only ~77% accurate. The sheer amount of time and technicality involved in diagnosing a patient (let alone beginning their treatment), and the many opportunities for human error, leave thousands dead annually. But with the development of artificial intelligence and machine learning capabilities, there is shining potential to save time and mitigate errors, saving millions of lives in the long run. In particular, Convolutional Neural Networks (CNNs) can automate most of the diagnosis process with equal or better accuracy than the current methods. (This report assumes a fundamental understanding of NNs and CNNs.) To see for myself, I replicated a CNN using 10,000 training images to compare the results to those of human experts and analyze how they contrast. This report goes through the process and explains how a basic CNN can be enhanced quite easily to match or surpass human capabilities. In 14 relatively simple steps, I’ll show how I built and tuned this model, as well as the final results and how they compare. The dataset I used includes 7 major categories of skin cancers: Melanocytic nevi, Melanoma, Benign keratosis-like lesions, Basal cell carcinoma, Actinic keratoses, Vascular lesions, and Dermatofibroma.

Problem Statement
Skin cancer is the most commonly diagnosed cancer in the United States,
and most cases are preventable. Skin cancer greatly affects quality of life,
and it can be disfiguring or even deadly. Medical treatment for skin cancer
creates substantial health care costs for individuals, families, and the
nation. The number of Americans who have had skin cancer at some point in the last three decades is estimated to be higher than the number for all other cancers combined, and skin cancer incidence rates have continued to increase in recent years. Each year in the United States, nearly 5 million people are treated for all skin cancers combined, with an annual cost estimated at $8.1 billion [10]. Melanoma is responsible for the most deaths of all skin cancers, with nearly 9,000 people dying from it each year [11]. It is also one of the most common types of cancer among U.S. adolescents and young adults [12]. Annually, about $3.3 billion of skin cancer treatment costs are attributable to melanoma [10]. Despite efforts to address skin cancer risk factors, such as inadequate sun protection and intentional tanning behaviors, skin cancer rates, including rates of melanoma, have continued to increase in the United States and worldwide. With adequate support and a unified approach, comprehensive, communitywide efforts to prevent skin cancer can work. Although such success will require a sustained commitment and coordination across diverse partners and sectors,


significant reductions in illness, deaths, and health care costs related to skin
cancer can be achieved.

Proposed Solution

We designed a web application to detect skin cancer in its early stages. Deep learning has revolutionized the entire landscape of machine learning during recent decades. It is considered the most sophisticated machine learning subfield, concerned with artificial neural network algorithms. These algorithms are inspired by the function and structure of the human brain. Deep learning techniques are implemented in a broad range of areas such as speech recognition, pattern recognition, and bioinformatics. Compared with classical approaches of machine learning, deep learning systems have achieved impressive results in these applications. Various deep learning approaches have been used for computer-based skin cancer detection in recent years. This report thoroughly discusses and analyzes skin cancer detection techniques based on deep learning, focusing on a comprehensive, systematic literature review of classical deep learning approaches, such as artificial neural networks (ANN) and convolutional neural networks (CNN), for skin cancer detection. A significant amount of research has been performed on this topic. Thus, it is vital to accumulate and analyze the studies, classify them, and summarize the available research findings. To conduct a valuable systematic review of skin cancer detection techniques using deep neural network-based classification, we built search strings to gather relevant information. We kept our search focused on publications in well-reputed journals and conferences. We established multi-stage selection


criteria and an assessment procedure, and on the basis of the devised search, some relevant research papers were selected. These papers were thoroughly evaluated and analyzed from different aspects. We are greatly encouraged by the trends in skin cancer detection systems, but there is still space for further improvement in present diagnostic techniques.

Literature Survey on Skin Cancer Cure

A deep learning model is proposed in this study to detect skin cancer in dermoscopy images. These images included melanoma and non-melanoma cancers. The proposed system achieved a high performance (AUC = 0.91) in distinguishing malignant and benign lesions. These results demonstrate the strength of deep learning in detecting cancer. In this work, dermatologists’ performance was not investigated for comparison with the model accuracy. However, previous studies have shown that skin cancer detection by human experts involves significant error. Since performance is highly related to the dataset, it is not possible to compare the results of this work to the performance achieved by dermatologists in other studies. Thus, in future work, we intend to collect the opinions of dermatologists in order to compare their performance with that of the model on the same dataset. The performance of deep learning models highly depends on the amount of training data. Since the majority of images in the HAM10000 dataset are benign, only 3400 images were used in this study so that the data comprise an equal number of benign and malignant images. Despite the limited amount of training data, the proposed model achieved high accuracy in discriminating benign and malignant lesions. Nevertheless, it is expected that larger training sets would significantly enhance the performance of the model.


Therefore, the main challenge in this area is the lack of a large, comprehensive public database of skin lesion images. The threshold for the confidence score of the CNN was set to 0.5 in this analysis. However, the threshold can be adjusted based on user preference. For example, if it is more important not to miss cancerous lesions than it is to misidentify benign lesions as malignant, then the user should reduce the threshold to improve sensitivity at the expense of specificity. This means moving the red point on the curve towards the left. The proposed approach can be deployed in computer-aided detection systems to assist dermatologists in identifying skin cancer. Moreover, it can be implemented in smartphones to be applied to skin lesion photographs taken by patients. This allows for early detection of cancer, especially for those without access to doctors. Early diagnosis can significantly facilitate treatment and improve the chance of survival.

Technology Components Used

• Python anaconda setup


• Machine Learning
• Deep learning
• Flask
• Tensorflow framework
• Keras
• HTML and CSS


Components used in this project are explained below.

Keras is a compact, easy-to-learn, high-level Python library that runs on top of the TensorFlow framework. It is made with a focus on understanding deep learning techniques, such as creating layers for neural networks while maintaining the concepts of shapes and mathematical details. A model can be created using either of the following two APIs −

● Sequential API
● Functional API

Consider the following eight steps to create a deep learning model in Keras −

● Loading the data


● Preprocess the loaded data
● Definition of model
● Compiling the model
● Fit the specified model
● Evaluate it
● Make the required predictions
● Save the model

We will use the Jupyter Notebook for execution and display of output as shown below −

Step 1 − Loading the data and preprocessing the loaded data are implemented first to execute the deep learning model.

import warnings
warnings.filterwarnings('ignore')

import numpy as np
np.random.seed(123)  # for reproducibility

from keras.models import Sequential
from keras.layers import Flatten, MaxPool2D, Conv2D, Dense, Reshape, Dropout
from keras.utils import np_utils
from keras.datasets import mnist

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape to (samples, height, width, channels) and scale pixel values to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# One-hot encode the 10 digit classes
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

This step can be defined as “Import libraries and modules”, which means all the libraries and modules are imported as an initial step.

Step 2 − In this step, we will define the model architecture −

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

Step 3 − Let us now compile the specified model −

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])


Step 4 − We will now fit the model using training data −

model.fit(X_train, Y_train, batch_size = 32, epochs = 10, verbose = 1)

The output of iterations created is as follows −

Epoch 1/10 60000/60000 [==============================] - 65s - loss: 0.2124 - acc: 0.9345
Epoch 2/10 60000/60000 [==============================] - 62s - loss: 0.0893 - acc: 0.9740
Epoch 3/10 60000/60000 [==============================] - 58s - loss: 0.0665 - acc: 0.9802
Epoch 4/10 60000/60000 [==============================] - 62s - loss: 0.0571 - acc: 0.9830
Epoch 5/10 60000/60000 [==============================] - 62s - loss: 0.0474 - acc: 0.9855
Epoch 6/10 60000/60000 [==============================] - 59s - loss: 0.0416 - acc: 0.9871
Epoch 7/10 60000/60000 [==============================] - 61s - loss: 0.0380 - acc: 0.9877
Epoch 8/10 60000/60000 [==============================] - 63s - loss: 0.0333 - acc: 0.9895
Epoch 9/10 60000/60000 [==============================] - 64s - loss: 0.0325 - acc: 0.9898
Epoch 10/10 60000/60000 [==============================] - 60s - loss: 0.0284 - acc: 0.9910

Dataset Used

The dataset consists of 10015 dermatoscopic images, which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. These are microscopic images of benign and malignant skin cancer cells. The main issue is that the images are quite large, and therefore I would require a GPU or a better CPU to process all these images without Jupyter crashing constantly. For the purposes of this project, I am using 1000 images in my training dataset (500 benign and 500 malignant), and 400
images in my test dataset (200 benign and 200 malignant).


Data Descriptor: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

Training of neural networks for automated diagnosis of pigmented skin


lesions is hampered by the small size and lack of diversity of available
datasets of dermatoscopic images. We tackle this problem by releasing the
HAM10000 (“Human Against Machine with 10000 training images”)
dataset. We collected dermatoscopic images from different populations
acquired and stored by different modalities. Given this diversity, we had to
apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The
final dataset consists of 10015 dermatoscopic images which are released as
a training set for academic machine learning purposes and are publicly
available through the ISIC archive. This benchmark dataset can be used for
machine learning and for comparisons with human experts. Cases


include a representative collection of all important diagnostic categories in


the realm of pigmented lesions. More than 50% of lesions have been
confirmed by pathology, while the ground truth for the rest of the cases was
either follow-up, expert consensus, or confirmation by in-vivo confocal
microscopy.

Background & Summary


Dermatoscopy is a widely used diagnostic technique that improves the diagnosis of benign and malignant pigmented skin lesions in comparison to examination with the unaided eye [1]. Dermatoscopic images are also a suitable source to train artificial neural networks to diagnose pigmented skin lesions automatically. In 1994, Binder et al. [2] already used dermatoscopic images successfully to train an artificial neural network to differentiate melanomas, the deadliest type of skin cancer, from melanocytic nevi. Although the results were promising, the study, like most earlier studies, suffered from a small sample size and the lack of dermatoscopic images other than melanomas or nevi. Recent advances in graphics card capabilities and machine learning techniques set new benchmarks with regard to the complexity of neural networks and raised expectations that automated diagnostic systems will soon be available that diagnose all kinds of pigmented skin lesions without the need for human expertise [3]. Training of neural-network-based diagnostic algorithms requires a large number of annotated images [4], but the number of high-quality dermatoscopic images with reliable diagnoses is limited or restricted to only a few classes of diseases. In 2013, Mendonça et al. made 200 dermatoscopic images available as the PH2 dataset, including 160 nevi and 40 melanomas [5]. Pathology was the ground truth for melanomas but not available for most nevi. Because the set is publicly available (http://www.fc.up.pt/addi/) and includes comprehensive metadata, it has served

as a benchmark dataset for studies of the computer diagnosis of melanoma until now. Accompanying the book Interactive Atlas of Dermoscopy [6], a CD-ROM is commercially available with digital versions of 1044 dermatoscopic images, including 167 images of non-melanocytic lesions and 20 images of diagnoses not covered in the HAM10000 dataset. Although this is one of the most diverse available datasets in regard to covered diagnoses, its use is probably limited because of its constrained accessibility. The ISIC Archive (https://isic-archive.com/) is a collection of multiple databases and currently includes 13786 dermatoscopic images (as of February 12th, 2018). Because of its permissive licensing (CC-0), well-structured availability, and large size, it is currently the standard source for dermatoscopic image analysis research. It is, however, biased towards melanocytic lesions (12893 of 13786 images are nevi or melanomas). Because this portal is the most comprehensive, technically advanced, and accessible resource for digital dermatoscopy, we will provide our dataset through the ISIC archive. Because of the limitations of available datasets, past research focused on melanocytic lesions (i.e., the differentiation between melanoma and nevus) and disregarded non-melanocytic pigmented lesions although they are common in practice. The mismatch between the small diversity of available training data and the variety of real-life data resulted in a moderate performance of automated diagnostic systems in the clinical setting despite excellent performance in experimental settings [3,5,7,8]. Building a classifier for multiple diseases is more challenging than binary classification [9]. Currently, reliable multi-class predictions are only available for clinical images of skin diseases but not for dermatoscopic images [10,11]. To boost research on the automated diagnosis of dermatoscopic images, we released the HAM10000 (“Human Against Machine with 10000 training images”) dataset. The dataset will be provided to the participants of the ISIC 2018 classification challenge hosted


by the annual MICCAI conference in Granada, Spain, but will also be


available to research groups who do not participate in the challenge.
Because we will also use this dataset to collect and provide information on
the performance of human expert diagnosis, it could serve as a benchmark
set for the comparisons of humans and machines in the future. In order to
provide more information to machine-learning research groups who intend
to use the HAM10000 training set for research, we describe the evolution
and the specifics of the dataset (Fig. 1) in detail.

Methods:
The 10015 dermatoscopic images of the HAM10000 training set were
collected over a period of 20 years from two different sites, the Department
of Dermatology at the Medical University of Vienna, Austria, and the skin
cancer practice of Cliff Rosendahl in Queensland, Australia. The Australian
site stored images and meta-data in PowerPoint files and Excel databases.
The Austrian site started to collect images before the era of digital cameras
and stored images and metadata in different formats during different time
periods.
Extraction of images and meta-data from PowerPoint files: Each PowerPoint file contained consecutive clinical and dermatoscopic images of one calendar month of clinical workup, where each slide contained a single image and a text field with a unique lesion identifier. Because of the large amount of data, we applied an automated approach to extract and sort those images.


We used the Python package python-pptx to access the PowerPoint files


and to obtain the content. We iterated through each slide and automatically
extracted and stored the source image, the corresponding identifier, and the
year of documentation, which was part of the file name.
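The snippet below is a minimal sketch of that extraction step, assuming one picture and one identifier text box per slide; the input file name and output naming scheme are hypothetical:

from pptx import Presentation

prs = Presentation('workup_2016_03.pptx')  # hypothetical monthly PowerPoint file
for i, slide in enumerate(prs.slides):
    lesion_id = None
    for shape in slide.shapes:
        if shape.has_text_frame:           # text field holding the lesion identifier
            lesion_id = shape.text_frame.text.strip()
        elif shape.shape_type == 13:       # 13 = MSO_SHAPE_TYPE.PICTURE
            with open('{}.jpg'.format(lesion_id or i), 'wb') as f:
                f.write(shape.image.blob)  # raw bytes of the embedded image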

Digitization of diapositives: Before the introduction of digital cameras, dermatoscopic images at the Department of Dermatology in Vienna, Austria were stored as diapositives. We digitized the diapositives with a Nikon Coolscan 5000 ED scanner with a two-fold scan with Digital ICE and stored the files as JPEG images at the highest quality (300 DPI; 15 × 10 cm). We manually cropped the scanned images with the lesion centered to 800x600px at 72 DPI and applied manual histogram corrections to enhance visual contrast and color reproduction.

Extraction of data from a digital dermatoscopy system


The Department of Dermatology at the University of Vienna is equipped
with the digital dermatoscopy system MoleMax HD (Derma Medical
Systems, Vienna, Austria). We extracted cases from this system by filtering
SQL tables with a proprietary tool provided by the manufacturer. We
selected only nonmelanocytic lesions with a consensus benign diagnosis,
nevi with >1.5 years of digital dermatoscopic follow-up, and excised
lesions with a histopathologic report. Histopathologic reports were matched
manually to specific lesions. From a series of multiple sequential images of
the same nevus, we extracted only the most recent one. Some melanomas of this set were also photographed with a DermLite FOTO (3Gen) camera. These additional images also became part of the ViDIR image series, where different images of the same lesion were labeled with a common identifier string. Original images of the MoleMax HD system had a resolution of 1872x1053px with non-quadratic pixels. We manually cropped all MoleMax HD images to 800x600px (72 DPI), centered the lesion if necessary, and converted the format to quadratic pixels.

Filtering of dermatoscopic images: The source image collections of both sites contained not only dermatoscopic images but also clinical close-ups and overviews. Because there was no reliable annotation of the imaging type, we had to separate the dermatoscopic images from the others. To deal with the large amount of data efficiently, we developed an automated method to screen and categorize >30000 images, similar to Han et al. [12]: We hand-labeled 1501 image files of the Australian


collection into the categories "overviews", "close-ups" and "dermatoscopy". Using the hand-labeled images as a training set, we fine-tuned an InceptionV3 architecture [13] (weights pre-trained on ImageNet [4] data) to classify the images according to image type. After training for 20 epochs with Stochastic Gradient Descent, with a learning rate initialized at 0.0003, step-down (gamma 0.1) at epochs 7 and 13, and a batch size of 64, we obtained a top-1 accuracy of 98.68% on our hold-out test set. This accuracy was sufficient to accelerate the selection process of dermatoscopic images. The few remaining misclassified close-ups and overviews were removed by hand in a second revision.

Unifying pathologic diagnoses: Histopathologic diagnoses showed high variability within and between
sites including typos, different dermatopathology terminologies, multiple
diagnoses per lesion, or uncertain diagnoses. Cases with uncertain
diagnoses and collisions were excluded except for melanomas in
association with a nevus. We unified the diagnoses and formed seven
generic classes, and specifically avoided ambiguous classifications. The
histopathologic expression "superficial spreading melanoma in situ, arising
in a preexisting dermal nevus", for example, should only be allocated to the
"melanoma" class and not to the nevus class. The seven generic classes
were chosen for simplicity and in regard to the intended use as a
benchmark dataset for the diagnosis of pigmented lesions by humans and
machines. The seven classes covered more than 95% of all pigmented
lesions examined in the daily clinical practice of the two study sites. A
more detailed description of the disease classes is given in the usage notes
below.

Manual quality review


A final manual screening and validation round was performed on all
images to exclude cases with the following attributes:


Type: Close-up and overview images that were not removed by automatic filtering.
Identifiability: Images with potentially identifiable content such as garments, jewelry, or tattoos.
Quality: Images that were out of focus or had disturbing artifacts like obstructing gel bubbles. We specifically tolerated the presence of terminal hairs.
Content: Completely non-pigmented lesions and ocular, subungual, or mucosal lesions.

Remaining cases were reviewed for appropriate color reproduction and luminance and, if necessary, corrected via manual histogram correction.

Code availability: Custom generated code for the described methods is available at https://github.com/ptschandl/HAM10000_dataset.

Data Records: All data records of the HAM10000 dataset are deposited at the Harvard Dataverse (Data Citation 1). Table 1 shows a summary of the number of images in the HAM10000 training set according to diagnosis in comparison to existing databases. Images and metadata are also accessible at the public ISIC archive through the archive gallery as well as through standardized API calls.

Data Preprocessing
In this step we read the CSV file, joining it with the path of the image folder, the base folder where all the images are placed, named base_skin_dir. After that, we made some new columns that are easier to reference later: a path column which contains the full path for each image_id, a cell_type column which contains the short name of the lesion type, and finally a categorical column cell_type_idx in which the lesion types are encoded as codes from 0 to 2.
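A hedged sketch of this preprocessing step follows; the metadata file name, folder layout, and the lesion-code dictionary (shown as a subset) are assumptions for illustration:

import os
import glob
import pandas as pd

base_skin_dir = 'data'  # base folder where all the images are placed

# Map every image_id (file name without extension) to its full path on disk
image_paths = {os.path.splitext(os.path.basename(p))[0]: p
               for p in glob.glob(os.path.join(base_skin_dir, '*', '*.jpg'))}

# Short names for the lesion codes (illustrative subset)
lesion_names = {'nv': 'Melanocytic nevi', 'mel': 'Melanoma',
                'bkl': 'Benign keratosis-like lesions'}

df = pd.read_csv(os.path.join(base_skin_dir, 'HAM10000_metadata.csv'))
df['path'] = df['image_id'].map(image_paths)                 # full image path
df['cell_type'] = df['dx'].map(lesion_names)                 # short lesion name
df['cell_type_idx'] = pd.Categorical(df['cell_type']).codes  # integer codes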


Deep Learning Techniques for Skin Cancer Detection


Deep neural networks play a significant role in skin cancer detection. They
consist of a set of interconnected nodes. Their structure is similar to the
human brain in terms of neuronal interconnectedness. Their nodes work
cooperatively to solve particular problems. Neural networks are trained for
certain tasks; subsequently, the networks work as experts in the domains in
which they were trained. In our study, neural networks were trained to
classify images and to distinguish between various types of skin cancer.
Different types of skin lesions from the International Skin Imaging Collaboration (ISIC) dataset are presented in the figure above. We searched for different learning techniques, such as ANN, CNN, KNN, and GAN, for skin cancer detection systems.


There are 4 steps involved in building a CNN:

Step 1 - Convolution:

Convolution is a function derived from two given functions by integration that expresses how the shape of one is modified by the other. There are 3 main elements of the convolution operation (a toy example follows this list):

● Input image: the actual image in pixels (basically our input data).
● Feature detector: these can also be called filters; they detect certain features in the input image. Feature detectors are placed over the input image (though they are much smaller in size), and we count the number of cells in which the feature detector matches a subset of the input image. The feature detector then moves along the input image to cover all of its areas, and the distance it moves each time is referred to as the “stride”.
● Feature map: contains the responses recorded by the filter over the input image. The value from the input image is initially inserted in the top-left cell of the feature map, and the window moves a block to the right, recording the observation of every stride.
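As a toy illustration of the operation just described, the snippet below slides a 3 x 3 feature detector over a 5 x 5 input with a stride of 1, producing a 3 x 3 feature map (scipy is used here only for the illustration and is an assumption, not part of the project pipeline):

import numpy as np
from scipy.signal import correlate2d

image = np.random.randint(0, 2, (5, 5))  # toy binary input image
detector = np.array([[1, 0, 1],
                     [0, 1, 0],
                     [1, 0, 1]])         # toy 3x3 feature detector
feature_map = correlate2d(image, detector, mode='valid')
print(feature_map.shape)                 # (3, 3): one response per stride position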

Step 2- Max Pooling:


The purpose of max pooling is to enable the CNN to detect an image when presented with basic modifications (flipped, mirrored, upside-down). In this step, we determine a pooled feature map. Note: this is different from the process used to determine feature maps in convolution. This process involves placing a smaller matrix on top of the feature map and inserting the maximum value within the matrix into our pooled feature map. Then we keep moving towards the right until we fill in all the values, as was done in the convolution step.

Max pooling is concerned with teaching the convolutional neural network to recognize that despite all of these differences, they are all images of the same thing. In order to do that, the network needs to acquire a property known as “spatial invariance”, so it can recognize an object in an image even if it is spatially different from another image of the same object. There are also other pooling techniques, such as mean pooling (which takes the average) and sum pooling (which takes the sum).
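The toy example below applies 2 x 2 max pooling with a stride of 2 to a 4 x 4 feature map; it is a plain NumPy sketch, independent of Keras:

import numpy as np

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 1],
               [0, 2, 5, 7],
               [1, 0, 3, 2]])

# Split the map into 2x2 blocks and keep the maximum of each block
pooled = fm.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 2]
               #  [2 7]]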

Step 3 - Flattening:

By the time we reach this step, we have a pooled feature map. As the name of this step implies, we are literally going to flatten our pooled feature map into the shape of a column. So instead of looking like a squared matrix, our pooled feature map now looks like a vertical column. After flattening, we end up with a long vector of input data that we then pass through the artificial neural network for further processing. If we didn’t do this step, it would be hard for the neural network to read our data.

Step 4 - Full Connection:

The fully connected layer in the CNN is the same as a hidden layer in an
ANN. The role of the artificial neural network is to take this data and
combine the features into a wider variety of attributes that make the
convolutional network more capable of classifying images. This is also the
step where we calculate the error function that our network takes into
account before making predictions.

In an ANN, it was called the Loss Function. The machine can now place
weights on each of the fully-connected layers to determine the binary
outcome of our independent variable.


1. We start with an input image. In our case, we would use a single image from our dataset of 1000 images, and later we would loop the function over the other images.
2. We apply filters or feature detectors to the input image, which gives us a convolutional layer.
3. We then break up the linearity of that image using the rectifier function.
4. The image becomes ready for pooling, the purpose of which is to provide our CNN with “spatial invariance”, as explained in the max-pooling step above. After pooling, we end up with a pooled feature map.
5. We then flatten our pooled feature map before inserting it into an artificial neural network. Throughout this entire process, the network’s building blocks, like the weights and the feature maps, are trained and repeatedly altered in order for the network to reach the optimal performance that will make it able to classify images and objects as accurately as possible.

Building the CNN Model


We start by importing the Keras packages required for this pipeline. There are 5 Keras classes, explained below; you can read through to understand how each is going to be utilized in the model:
1. Sequential: this class is used to initialize our CNN. There are 2 ways of initializing a model: firstly as a sequence of layers, and secondly as a graph.
2. Convolution2D: this is the class for our convolution operation. As mentioned above, the convolution operation is one of the most crucial steps in building a CNN. This class is used for 2D models, which is what we have.
3. MaxPooling2D: another 2D class, which runs the max pooling operation on our CNN, given a few parameters regarding our data. As mentioned earlier, max pooling is a necessary step in building a CNN, as it makes the model more robust to changes such as flipping or mirroring of the image.


4. Flatten: this class is involved in the next step in building our model. In order for our machine to understand the data, we must convert it from a matrix to a column, which can be done by flattening the data.
5. Dense: the most essential class, since it creates an output layer for the ANN, which will be important in optimizing our weights for the model and in assigning a loss/error function to evaluate the effectiveness of the model.

Initializing the CNN


We created an object called classifier, which is essentially the CNN classification model I’ll be building throughout this project. We attached this object to the Sequential class to initialize the CNN model as a sequence of layers.
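In code form, this step is just the instantiation sketched below (a minimal sketch; classifier is the object name used throughout the rest of this chapter):

from keras.models import Sequential

# Initialize the CNN as a sequence of layers
classifier = Sequential()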

Adding the Convolution Layer

The next step is to add a convolution layer. As I mentioned above, the convolution layer applies a filter or feature detector to the input image and creates a feature map for the images. I used the Convolution2D class, whose parameters I explain below (a sketch of the resulting call follows the list):
● The first parameter refers to the number of feature detectors. The
default value for this is 64, but since I’m using a CPU and not a
GPU, I chose to go with 32 feature detectors to save time and be
more resourceful since I have 1000 training images, and 400 test
images. However, 64 feature detectors could make this model a lot
more accurate.
● The second and third parameters refer to the size or dimensions of
our feature detectors, which would be a 3 x 3 matrix, thus I input (3,
3).


● The next parameter is the input shape, which is the shape and size of the input image. Since all my images are of different sizes, I will later convert them into 32 x 32 pixels. I specified 3 for the number of color channels, as these are colored images and use 3 channels (RGB). If they were black and white, I would input 1 instead of 3.
● The final parameter is the activation function, which we use to activate neurons in the neural network. I’m using the rectifier activation function as this is a non-linear model, thus we input 'relu'.
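Putting those parameters together gives the call sketched below (written in the Keras 2 style, where the 3 x 3 kernel dimensions are passed as a single tuple):

from keras.layers import Conv2D

# 32 feature detectors of size 3x3, applied to 32x32 RGB inputs
classifier.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), activation='relu'))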

Adding the Pooling Layer

The next step consists of adding the pooling layer to the CNN model. Using the add method again, I add the MaxPooling2D class and specify the pool size, which will slide over the feature map to create a pooled feature map. In this case, we will use a 2 x 2 pool size. This step is important in reducing the size of our feature map, making the model less complex and less computationally expensive.
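A minimal sketch of the corresponding call, continuing the classifier object from above:

from keras.layers import MaxPooling2D

# Slide a 2x2 window over each feature map, keeping the maximum value
classifier.add(MaxPooling2D(pool_size=(2, 2)))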

Adding Flattening to the layer


Below, we used the add function again to add the Flatten class to our
object, classifier. This, as we had mentioned before, creates a column shape
from the pooled feature matrix by flattening it. There are no parameters
required for this class, so we leave the brackets empty.
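A minimal sketch of the corresponding call:

from keras.layers import Flatten

# Flatten the pooled feature maps into a single long input vector
classifier.add(Flatten())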

Adding the Fully-Connected layer & Output layer.


The next step is the final part of building the CNN model. We build the fully-connected layer in this model, which is similar to the hidden layers I had in my ANN model. So the first line of code below creates the hidden

layer. Using the add method, I utilize the Dense class, which has 2 parameters:
● The first is the number of nodes for the layer. In my ANN model, I had taken the average of the sum of input and output layers; however, in this case, even that number would be too big. Thus, I did some research and discovered that we shouldn’t use a number too small either. I learned that having at least 128 nodes is a good way to start.
● The second parameter is the activation function, which will again be relu, as I’m using the rectifier function for this non-linear model. Then I created the output layer with a similar line of code. Since the output is a binary variable, it will have 1 node, and since I want to know the probability that this model will predict whether a cell is benign or malignant, I use the sigmoid activation function.

classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))

Compiling the CNN

Now that I have built the CNN model, I need to compile it and optimize the
weights and the loss function to evaluate the model. To do this, I use the
compile method in my classifier object and input the following parameters:
● Optimizer, which is the algorithm I used to find the optimal weights for the CNN model. I use adam, which is a stochastic gradient descent algorithm; the adam algorithm is one of the faster ones.


● The loss function will be binary_crossentropy (just like my ANN


model) since what we are trying to predict is a binary variable (1 or
0; Benign or Malignant).
● The metrics function will evaluate the model. I use accuracy again, which is found by dividing the number of true predictions by the total number of predictions.
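Together, these three parameters give the compile call sketched below:

classifier.compile(optimizer='adam',
                   loss='binary_crossentropy',
                   metrics=['accuracy'])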

Fitting the CNN model to the dataset

● Now that the model is built and compiled, the next step is to fit the CNN model to the image dataset. The code below seems much more difficult than what I had to do to build and compile the model. However, the Keras website provides this code within its documentation, as image augmentation is a common practice using Keras. As I explain the code, I will refer to certain important parameters:
● First I had to import the ImageDataGenerator class, which is used to rescale, zoom, and flip the images (to make our CNN model more robust). I kept the default parameters for rescaling, shear_range, zoom_range, and horizontal_flip. Then we rescale the test image data by the same amount. Next, I use the flow_from_directory method to fit the training set and test set to our image data. The following parameters were used for this:
○ The first parameter refers to the path of the training/test data
within the directory.
○ The target size refers to the size of the target image, which is
32 x 32 pixels.
○ The batch_size refers to how often we want to change our
weights. I chose 10 for the training dataset, and 32 for the test


dataset to reoptimize the weights after 10 & 32 observations


for the two sets respectively.
○ The class_mode will be binary since the machine is predicting a binary variable. Finally, we fit the dataset using the fit_generator method (see the sketch after this list). Here, I specify the following parameters:
● Steps per epoch refer to the number of images in our training set,
which is 1000.
● Epochs: the number of times I want to run the complete model.
According to my research the more, the better. But the more you
have, the more time it takes. For now, let’s go with 25 epochs.
● The validation data is our test set, which will see how accurately the
machine can predict new data.
● Validation steps refer to the number of images in our test set, which
is 400.
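A hedged sketch of this fitting step follows; the directory names are assumptions, the augmentation values are the ones from the Keras documentation example referenced above, and fit_generator is the era-appropriate API:

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2,
                                   zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)  # only rescale the test images

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size=(32, 32),
                                                 batch_size=10,
                                                 class_mode='binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size=(32, 32),
                                            batch_size=32,
                                            class_mode='binary')

classifier.fit_generator(training_set,
                         steps_per_epoch=1000,   # number of training images, per the text
                         epochs=25,
                         validation_data=test_set,
                         validation_steps=400)   # number of test images, per the text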


Testing and performance of our model


The testing accuracy and validation accuracy of our model are checked, and the confusion matrix is plotted. The number of misclassified images of each type is also determined.
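A hedged sketch of this evaluation step, assuming the classifier and test_set generator from the previous chapter (the test generator must be created with shuffle=False so the predictions line up with test_set.classes):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

loss, acc = classifier.evaluate_generator(test_set, steps=len(test_set))
print('Test accuracy:', acc)

y_prob = classifier.predict_generator(test_set, steps=len(test_set))
y_pred = (y_prob > 0.5).astype(int).ravel()  # sigmoid output -> 0/1 labels
y_true = test_set.classes

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=['Benign', 'Malignant']).plot()
plt.show()

# Misclassified image count of each type
print('Benign misclassified as malignant:', cm[0, 1])
print('Malignant misclassified as benign:', cm[1, 0])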

Deploy a Flask App Using Gunicorn to App Platform
We build a Python application using the Flask microframework on
DigitalOcean’s App Platform. Flask is a Python-based microframework that is
popular with web developers, given its lightweight nature and ease of use.

We then deploy the Flask app to App Platform using Gunicorn. Gunicorn is a Python WSGI HTTP server that uses a pre-fork worker model. By using Gunicorn, you’ll be able to serve your Flask application on more than one thread.

Prerequisites
● A GitHub account.
● Python 3 installed on your local machine. You can follow a tutorial for installing Python on Windows, Mac, or Linux.


● A text editor. You can use Visual Studio Code or your favorite text
editor.

Step 1: Creating a Python Virtual Environment for your Project

Before you get started, you need to set up your Python developer
environment. You will install your Python requirements within a virtual
environment for easier management.

First, let’s create a project directory for our code and requirements.txt file to be
stored in and change into that directory. Run the following commands:

mkdir flask-app

cd flask-app

Next, create a directory in your home directory that you can use to store all of
your virtual environments:

mkdir ~/.venvs

Now create your virtual environment using Python:

python3 -m venv ~/.venvs/flask

This creates a directory called flask within your .venvs directory. Inside, it
installs a local version of Python and a local version of pip. You can use this to
install and configure an isolated Python environment for your project.

Before you install your project’s Python requirements, you need to activate
the virtual environment.

Use the following command:

source ~/.venvs/flask/bin/activate


Your prompt changes to indicate that you are now operating within a Python
virtual environment. It looks like this: (flask)user@host:~$.

With your virtual environment active, install Flask and gunicorn using the
local instance of pip:

pip install Flask gunicorn


Now that you have the flask package installed, save this requirement and its
dependencies so App Platform can install them later.

Do this now using pip and then saving the information to a requirements.txt file:

pip freeze > requirements.txt


You now have all of the software needed to start a Flask app. You are almost
ready to deploy.

Step 2: Creating a Minimal Flask App

In this step, you will build a standard Hello Sammy! Flask application. You
won’t focus on the mechanics of Flask outside of how to deploy it to App
Platform. If you wish to deploy another application, the following steps will
work for a wide range of Flask applications.

Using your favorite text editor, open a file named app.py:

nano app.py


Now add the following code to the file:

from flask import Flask


app = Flask(__name__)

@app.route('/')
def hello_world():


    return 'Hello Sammy!'


This code is the standard Hello World example for Flask with a slight
modification to say hello to your favorite shark. For more information about
this file and Flask, visit the official Flask documentation.

You have written your application code. Now you will configure the Gunicorn
server.

Step 3: Setting Up Your Gunicorn Configuration

Gunicorn is a Python WSGI HTTP server that many developers use to deploy
their Python applications. This WSGI (Web Server Gateway Interface) is
necessary because traditional web servers do not understand how to run
Python applications. For your purposes, a WSGI allows you to deploy your
Python applications consistently. You can also configure multiple threads to
serve your Python application, should you need them. In this example, you
will make your application accessible on port 8080, the standard App Platform
port. You will also configure two worker-threads to serve your application.

Open a file named gunicorn_config.py:

nano gunicorn_config.py


Now add the following code to the file:

bind = "0.0.0.0:8080"

workers = 2


This is all you need to do to have your app run on App Platform using
Gunicorn. Next, you’ll commit your code to GitHub and then deploy it.


Step 4: Pushing the Site to GitHub


DigitalOcean’s App Platform deploys your code from GitHub repositories.
This means that you must get your site in a git repository and then push that
repository to GitHub.

First, initialize your project directory containing your files as a git repository:

git init


When you work on your Flask app locally, certain files get added that are
unnecessary for deployment. Let’s exclude those files using Git’s ignore list.
Create a new file called .gitignore:

nano .gitignore


Add the following code to the file:

*.pyc


Save and close the file.

Now execute the following command to add files to your repository:

git add app.py gunicorn_config.py requirements.txt .gitignore


Make your initial commit:

git commit -m "Initial Flask App"


Your files commit:

Output:
[master (root-commit) aa78a20] Initial Flask App


4 files changed, 18 insertions(+)


create mode 100644 .gitignore
create mode 100644 app.py
create mode 100644 gunicorn_config.py

create mode 100644 requirements.txt


Open your browser and navigate to GitHub, log in with your profile, and
create a new repository called flask-app. Create an empty repository without a
README or license file.

Once you’ve created the repository, return to the command line and push your
local files to GitHub.

First, add GitHub as a remote repository:

git remote add origin https://github.com/your_username/flask-app


Next, rename the default branch to main, to match what GitHub expects:

git branch -M main


Finally, push your main branch to GitHub’s main branch:

git push -u origin main


Your files transfer:

Output:
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 1.20 KiB | 77.00 KiB/s, done.
Total 6 (delta 0), reused 0 (delta 0)
To github.com:MasonEgger/flask-app.git
* [new branch] main -> main


Branch 'main' set up to track remote branch 'main' from 'origin'.


Enter your GitHub credentials when prompted to push your code.

Your code is now on GitHub and accessible through a web browser. Now you
will deploy your app to DigitalOcean’s App Platform.

Step 5: Deploying to DigitalOcean with App Platform

Once you push the code, visit The App Platform Homepage and click Launch
Your App. A prompt requests that you connect your GitHub account:

Connect your account and allow DigitalOcean to access your repositories.


You can choose to let DigitalOcean access all of your repositories or just to
the ones you wish to deploy.

Click Install and Authorize. GitHub returns you to your DigitalOcean


dashboard.

Once you’ve connected your GitHub account, select the your_account/flask-app

repository and click Next.

Next, provide your app’s name, choose a region, and ensure the main branch is
selected. Then ensure that Autodeploy code changes is checked. Click Next
to continue.

DigitalOcean detects that your project is a Python app and automatically


populates a partial Run command.


Flow chart of the background workings of the web application


How to deploy your machine learning model using the DigitalOcean cloud platform: there is no doubt that doing a data science and machine learning project, starting from collecting the data, processing the data, visualizing insights about the data, and developing a machine learning model to do a predictive task, is a fun thing to do. What makes it more fun and doable is that we can do all of those steps on our local machine and then be done with it. However, wouldn’t it be awesome if other people could make use of our machine learning model to do fun and cool stuff? The true magic of machine learning comes when our model gets into other people’s hands and they can do useful stuff with it.

The next important step is to process the image the user has uploaded. The
processing step includes resizing the image to the same size as training and
validation images. After resizing the image, then the loaded model should
predict in which category this image belongs.

import cv2
from PIL import Image, ImageOps
import numpy as np
import streamlit as st

def import_and_predict(image_data, model):
    # Resize the upload to the same size as the training and validation images
    size = (150, 150)
    image = ImageOps.fit(image_data, size, Image.ANTIALIAS)
    image = np.asarray(image)
    img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_resize = cv2.resize(img, dsize=(75, 75), interpolation=cv2.INTER_CUBIC)
    # Add a batch dimension before passing the image to the model
    img_reshape = img_resize[np.newaxis, ...]
    prediction = model.predict(img_reshape)
    return prediction

if file is None:
    st.text("Please upload an image file")
else:
    image = Image.open(file)
    st.image(image, use_column_width=True)
    prediction = import_and_predict(image, model)
    if np.argmax(prediction) == 0:
        st.write("Benign!")
    elif np.argmax(prediction) == 1:
        st.write("Malignant")
    else:
        st.write("No Skin Cancer!")
    st.text("Probability (0: Benign, 1: Malignant, 2: No Skin Cancer)")
    st.write(prediction)


After that, you need to save the Python file in the same directory as your previous Python file. We are basically all set now! To check what our web app looks like, open your prompt and navigate to the working directory of your Python files. In the working directory, you can type the following command:
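streamlit run app.py

(Here app.py is assumed as the name of the Streamlit file saved above; substitute the actual file name.)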


Now you will see from your prompt that you can check your web app on
your localhost. If you wait a little bit, a new window will be launched


shortly after you run your Streamlit app. Below is the screenshot of the
simple image classification web app.

Conclusion

The CNN model I created above was tested to recognize from an image whether a skin cancer cell is benign or malignant. The model has an accuracy rate on the training set of 96.7%, with a loss of 0.089 in the latest epoch, that being the highest accuracy among all epochs. The model has an accuracy rate on the test set of 71.51%, with a loss of 1.4443 in the latest epoch. However, the model’s highest level of accuracy, 75.25%, was in epoch 5/25. Overall, I would say the model has done well and has achieved my goal of being more than 70% accurate.

References


1. https://www.sciencedirect.com/science/article/pii/S2666827021000177
2. https://ieeexplore.ieee.org/document/8641762
3. https://reader.elsevier.com/reader/sd/pii/S2666827021000177?token=FD976C9B7F8F83FC83BCD551982485FCDEEE2
4. https://arxiv.org/abs/2105.04895
5. https://www.researchgate.net/publication/325116934_Image_classification_using_Deep_learning
