Divyanshi Thesis
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
by
Divyanshi Singh - 2018021051, Abhishek Kumar Yadav - 2018021007, Hritik Singh - 2018021056
CERTIFICATE
Date:
CANDIDATE’S DECLARATION
I declare that this written submission represents my work and ideas in my own words
and where other ideas or words have been included, I have adequately cited and
referenced the original sources. I also declare that I have adhered to all principles of
academic honesty and integrity and have not misrepresented or falsified any
idea/data/fact/source in my submission. I understand that any violation of the above will
be cause for disciplinary action by the University and can also evoke penal action from
the sources which have thus not been properly cited or from whom proper permission
has not been taken when needed.
APPROVAL SHEET
Examiner
Supervisor
Head of Department
Date:
Place:
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
TABLE OF CONTENTS
Certificate
Candidate’s Declaration
Approval Sheet
Acknowledgement
List of Figures
List of Tables
Abstract
Table of Contents
CHAPTER 2
2.1 Literature Survey on Skin Cancer Cure
INTRODUCTION
In relatively simple steps, I’ll show how I built and tuned this model, as well as the final results and how they compare. The dataset I used includes seven major categories of skin cancer: Melanocytic nevi, Melanoma, Benign keratosis-like lesions, Basal cell carcinoma, Actinic keratoses, Vascular lesions, and Dermatofibroma.
Problem Statement
Skin cancer is the most commonly diagnosed cancer in the United States, and most cases are preventable. Skin cancer greatly affects quality of life, and it can be disfiguring or even deadly. Medical treatment for skin cancer creates substantial health care costs for individuals, families, and the nation. The number of Americans who have had skin cancer at some point in the last three decades is estimated to be higher than the number for all other cancers combined, and skin cancer incidence rates have continued to increase in recent years. Each year in the United States, nearly 5 million people are treated for all skin cancers combined, with an annual cost estimated at $8.1 billion [10]. Melanoma is responsible for the most deaths of all skin cancers, with nearly 9,000 people dying from it each year [11]. It is also one of the most common types of cancer among U.S. adolescents and young adults [12]. Annually, about $3.3 billion of skin cancer treatment costs are attributable to melanoma [10]. Despite efforts to address skin cancer risk factors, such as inadequate sun protection and intentional tanning behaviors, skin cancer rates, including rates of melanoma, have continued to increase in the United States and worldwide. With adequate support and a unified approach, comprehensive, community-wide efforts to prevent skin cancer can work. Although such success will require a sustained commitment and coordination across diverse partners and sectors, significant reductions in illness, deaths, and health care costs related to skin cancer can be achieved.
Proposed Solution
We designed a web application to detect skin cancer in its early stages. Deep learning has revolutionized the entire landscape of machine learning during recent decades. It is considered the most sophisticated machine learning subfield, concerned with artificial neural network algorithms. These algorithms are inspired by the function and structure of the human brain. Deep learning techniques are applied in a broad range of areas such as speech recognition, pattern recognition, and bioinformatics. Compared with classical approaches of machine learning, deep learning systems have achieved impressive results in these applications. Various deep learning approaches have been used for computer-based skin cancer detection in recent years. This report thoroughly discusses and analyzes skin cancer detection techniques based on deep learning. It presents a comprehensive, systematic literature review of classical deep learning approaches for skin cancer detection, such as artificial neural networks (ANN) and convolutional neural networks (CNN). A significant amount of research has been performed on this topic. Thus, it is vital to accumulate and analyze the studies, classify them, and summarize the available research findings. To conduct a valuable systematic review of skin cancer detection techniques using deep neural network-based classification, we built search strings to gather relevant information. We kept our search focused on publications in well-reputed journals and conferences, and we established multi-stage selection criteria for including studies.
Keras is a compact, easy-to-learn, high-level Python library that runs on top of the TensorFlow framework. It is made with a focus on understanding deep learning techniques, such as creating layers for neural networks while maintaining the concepts of shapes and mathematical details. A model can be created using either of the following two APIs:
● Sequential API
● Functional API
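To illustrate the difference, here is a minimal sketch of the same small network written both ways; the layer sizes here are arbitrary and not taken from this project:

from tensorflow.keras import Model, Sequential
from tensorflow.keras.layers import Dense, Input

# Sequential API: layers are stacked in a simple linear order.
seq_model = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='softmax'),
])

# Functional API: layers are called on tensors, allowing arbitrary graphs.
inputs = Input(shape=(100,))
x = Dense(64, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(x)
func_model = Model(inputs=inputs, outputs=outputs)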
Consider the following eight steps to create a deep learning model in Keras. We will use a Jupyter Notebook for execution and display of the output, as shown below.
Step 1 − Loading the data and preprocessing it is implemented first in order to execute the deep learning model.

import warnings
warnings.filterwarnings('ignore')

import numpy as np
np.random.seed(123)  # for reproducibility

This step can be described as “import libraries and modules”: all the required libraries and modules are imported as an initial step.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # first convolution layer
model.add(Conv2D(32, (3, 3), activation='relu'))  # second convolution layer
model.add(MaxPool2D(pool_size=(2, 2)))  # downsample the feature maps
model.add(Dropout(0.25))  # regularization to reduce overfitting
model.add(Flatten())  # flatten feature maps into a 1-D vector
model.add(Dense(128, activation='relu'))  # fully connected hidden layer
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))  # 10 output classes
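The excerpt jumps from the model definition to the training log, so the compile and fit steps are not shown here. A minimal sketch of what they would look like, assuming preprocessed arrays X_train and Y_train (the 60000-sample, 10-epoch run suggested by the output below); the batch size is an assumption:

# Compile with a loss/optimizer suitable for 10-class softmax output,
# then fit on the prepared training data (X_train/Y_train assumed above).
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=32, epochs=10, verbose=1)  # batch_size assumed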
... - acc: 0.9898
Epoch 10/10
60000/60000 [==============================] - 60s - loss: 0.0284 - acc: 0.9910
Dataset Used
Methods:
The 10015 dermatoscopic images of the HAM10000 training set were
collected over a period of 20 years from two different sites, the Department
of Dermatology at the Medical University of Vienna, Austria, and the skin
cancer practice of Cliff Rosendahl in Queensland, Australia. The Australian
site stored images and meta-data in PowerPoint files and Excel databases.
The Austrian site started to collect images before the era of digital cameras
and stored images and metadata in different formats during different time
periods.
Extraction of images and meta-data from PowerPoint files

Each PowerPoint file contained consecutive clinical and dermatoscopic images of one calendar month of clinical workup, where each slide contained a single image and a text field with a unique lesion identifier. Because of the large amount of data, we applied an automated approach to extract and sort those images.
FOTO (3Gen™) camera. These additional images also became part of the ViDIR image series, where different images of the same lesion were labeled with a common identifier string. Original images of the MoleMax HD system had a resolution of 1872x1053px with non-quadratic pixels. We manually cropped all MoleMax HD images to 800x600px (72 DPI), centered the lesion if necessary, and reverted the format to quadratic pixels.

Filtering of dermatoscopic images

The source image collections of both sites contained not only dermatoscopic images but also clinical close-ups and overviews. Because there was no reliable annotation of the imaging type, we had to separate the dermatoscopic images from the others. To deal with the large amount of data efficiently, we developed an automated method to screen and categorize >30000 images, similar to Han et al. [12]: we hand-labeled 1501 image files of the Australian site. Images were excluded for the following reasons:
● Type: close-up and overview images that were not removed by automatic filtering.
● Identifiability: images with potentially identifiable content such as garments, jewelry, or tattoos.
● Quality: images that were out of focus or had disturbing artifacts such as obstructing gel bubbles. We specifically tolerated the presence of terminal hairs.
● Content: completely non-pigmented lesions and ocular, subungual, or mucosal lesions.

Remaining cases were reviewed for appropriate color reproduction and luminance and, if necessary, corrected via manual histogram correction.

Code availability

Custom-generated code for the described methods is available at https://github.com/ptschandl/HAM10000_dataset.
Data Records

All data records of the HAM10000 dataset are deposited at the Harvard Dataverse (Data Citation 1). Table 1 shows a summary of the number of images in the HAM10000 training set according to diagnosis, in comparison to existing databases. Images and metadata are also accessible at the public ISIC archive through the archive gallery as well as through standardized API calls.
Data Preprocessing
Processing data

In this step we read the CSV and join it with the path of the image folder, named base_skin_dir, which is the base folder where all the images are placed. After that, we make some new columns that are easier to understand for later reference: a column path which contains the full path for each image_id, a column cell_type which contains the short name of the lesion type, and finally a categorical column cell_type_idx in which the lesion type is encoded as integer codes from 0 to 6, one per lesion category.
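A minimal sketch of this preprocessing step, assuming the standard HAM10000_metadata.csv file with its image_id and dx columns, and the images unpacked as JPEGs under base_skin_dir:

import os
from glob import glob
import pandas as pd

base_skin_dir = 'HAM10000_images'  # assumed folder containing all images

# Map each image_id to its full file path
imageid_path = {os.path.splitext(os.path.basename(p))[0]: p
                for p in glob(os.path.join(base_skin_dir, '*.jpg'))}

# Short diagnosis codes from the metadata mapped to readable lesion names
lesion_type = {
    'nv': 'Melanocytic nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign keratosis-like lesions',
    'bcc': 'Basal cell carcinoma',
    'akiec': 'Actinic keratoses',
    'vasc': 'Vascular lesions',
    'df': 'Dermatofibroma',
}

skin_df = pd.read_csv('HAM10000_metadata.csv')
skin_df['path'] = skin_df['image_id'].map(imageid_path.get)            # full image path
skin_df['cell_type'] = skin_df['dx'].map(lesion_type.get)              # readable lesion name
skin_df['cell_type_idx'] = pd.Categorical(skin_df['cell_type']).codes  # integer codes 0-6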
Step 1 - Convolution:

● Feature map: contains the values recorded by the filter over the input image. The value from the input image is initially inserted in the top-left cell of the feature map, and the filter then moves a block to the right, recording the output at every stride.
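As a toy illustration (not code from this project), sliding a hypothetical 3x3 filter over a 5x5 input with stride 1 and no padding yields a 3x3 feature map; scipy is used here purely for demonstration:

import numpy as np
from scipy.signal import correlate2d

image = np.random.rand(5, 5)   # toy input image
kernel = np.random.rand(3, 3)  # toy 3x3 filter (feature detector)

# Slide the filter over the image; each stride records one value,
# yielding a 3x3 feature map for a 5x5 input with a 3x3 filter.
feature_map = correlate2d(image, kernel, mode='valid')
print(feature_map.shape)  # (3, 3)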
Max Pooling is concerned with teaching the convolutional neural network to recognize that, despite differences such as position, rotation, or lighting, images of the same object are all images of the same thing. In order to do that, the network needs to acquire a property known as “spatial invariance”, so that it can recognize an object in an image even if the object is spatially different from another image of the same object. There are also other pooling techniques, such as Mean Pooling (which takes the average) and Sum Pooling (which takes the sum).
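A toy illustration of 2x2 Max Pooling over a 4x4 feature map (values are arbitrary):

import numpy as np

feature_map = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 feature map

# Take the maximum of each non-overlapping 2x2 block,
# halving the width and height of the feature map.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[ 5.  7.] [13. 15.]]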
Step 3 — Flattening:
By the time we reach this step, we have a pooled feature map. As the name of this step implies, we are literally going to flatten the pooled feature map into the shape of a column. So instead of looking like a squared matrix, our pooled feature map now looks like a vertical column. After flattening, we end up with a long vector of input data that we then pass through the artificial neural network to be processed further. If we didn’t do this step, it would be hard for the neural network to read our data.
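Continuing the toy example above, flattening is just reshaping the pooled map into one long vector:

import numpy as np

pooled = np.array([[5., 7.], [13., 15.]])  # toy 2x2 pooled feature map
flattened = pooled.reshape(-1)             # flatten to a 1-D vector
print(flattened)  # [ 5.  7. 13. 15.]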
The fully connected layer in the CNN plays the same role as a hidden layer in an ANN. The role of the artificial neural network is to take this data and combine the features into a wider variety of attributes that make the convolutional network more capable of classifying images. This is also the step where we calculate the error function that our network takes into account before making predictions. In an ANN, this was called the loss function. The machine can now place weights on each of the fully connected layers to determine the binary outcome of our dependent variable.
1. We start with an input image. In our case, we would use a single image from our dataset of 1000 images, and later we would loop the function over the other images.
2. We apply filters or feature detectors to the input image, which gives us a convolutional layer.
3. We then break up the linearity of that image using the rectifier function.
4. The image becomes ready for pooling, the purpose of which is to provide our CNN with “spatial invariance”. After pooling, we end up with a pooled feature map.
5. We then flatten our pooled feature map before inserting it into an artificial neural network.

Throughout this entire process, the network’s building blocks, like the weights and the feature maps, are trained and repeatedly altered in order for the network to reach the optimal performance that will make it able to classify images and objects as accurately as possible.
4. Flatten: this class is involved in the next step of building our model. In order for our machine to understand the data, we must convert it from a matrix to a column, which can be done by flattening the data.
5. Dense: this is the most essential class, since it creates an output layer for the ANN, which will be important in optimizing the weights for the model and in assigning a loss/error function to evaluate the effectiveness of the model.
The next step is to add a convolution layer. As mentioned above, the convolution layer applies a filter or feature detector to the input image and creates a feature map for the image. I used the Convolution2D class, whose parameters I explain below (a sketch of the resulting code follows this list):
● The first parameter refers to the number of feature detectors. The default value for this is 64, but since I’m using a CPU and not a GPU, I chose to go with 32 feature detectors to save time and be more resourceful, since I have 1000 training images and 400 test images. However, 64 feature detectors could make this model a lot more accurate.
● The second and third parameters refer to the size or dimensions of our feature detectors, which would be a 3 x 3 matrix, thus I input (3, 3).
● The next parameter is the input shape, which is the shape and size of the input image. Since all my images are of different sizes, I will later convert them into 32 x 32 pixels. I specified 3 for the number of color channels, as these are colored images and use 3 channels (RGB). If they were black and white, I would input 1 instead of 3.
● The final parameter is the activation function, which we use to activate neurons in the neural network. I’m using the rectifier activation function, as this is a non-linear model, thus we input ‘relu’.
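Putting these parameters together, a minimal sketch of this step (the classifier object name anticipates the code shown later in this report):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

classifier = Sequential()
# 32 feature detectors of size 3x3, 32x32 RGB input, rectifier activation
classifier.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), activation='relu'))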
The next step consists of adding the pooling layer to the CNN model. Using the add method again, I add the MaxPooling2D class and specify the pool size, which will slide over the feature map to create a pooled feature map. In this case, we will use a 2 x 2 pool size. This step is important in reducing the size of our feature map, making the model less complex and less computationally expensive. A sketch of this step is shown below.
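A minimal sketch of this step; the Flatten layer between pooling and the dense layers is not shown in the excerpt, and is included here as an assumption consistent with the flattening step described above:

from tensorflow.keras.layers import MaxPooling2D, Flatten

classifier.add(MaxPooling2D(pool_size=(2, 2)))  # 2x2 pooled feature map
classifier.add(Flatten())                       # flatten before the dense layers (assumed)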
The next step is to add the fully connected layer. Using the add method, I utilize the Dense function, which has 2 parameters:

● The first is the number of nodes for the output layer. In my ANN model, I had taken the average of the sum of the input and output layers; however, in this case, even that number would be too big. Thus, I did some research and discovered that we shouldn’t be using a number too small either. I learned that having at least 128 nodes is a good way to start.
● The second parameter is the activation function, which will again be relu, as I’m using the rectifier function for this non-linear model. Then I created the output layer with a similar line of code. Since the output is a binary variable, it will have 1 node, and since I want to know the probability that this model will predict whether a cell is benign or malignant, I use the sigmoid activation function.
classifier.add(Dense(units=128, activation='relu'))   # fully connected hidden layer
classifier.add(Dense(units=1, activation='sigmoid'))  # binary output layer
Now that I have built the CNN model, I need to compile it and optimize the
weights and the loss function to evaluate the model. To do this, I use the
compile method in my classifier object and input the following parameters:
● Optimizer, which is the algorithm I used to find the optimal weights
for the CNN Model. I use adam, which is a stochastic gradient
descent algorithm. The ‘adam’ algorithm is one of the faster ones.
Now that the model is built and compiled, the next step is to fit the CNN model to the image dataset. The code below seems much more difficult than what I had to do to build and compile the model. However, the Keras website provides this code within its documentation, as image augmentation is a common practice in Keras. As I explain the code, I will refer to certain important parameters:
● First, I had to import the ImageDataGenerator class, which is used to rescale, zoom, and flip the images (to make our CNN model more robust). I kept the default parameters for rescaling, shear_range, zoom_range, and horizontal_flip. Then we rescale the test image data by the same amount. Next, I use the flow_from_directory method to fit the training set and test set to our image data. The following parameters were used for this:
○ The first parameter refers to the path of the training/test data within the directory.
○ The target size refers to the size of the target image, which is 32 x 32 pixels.
○ The batch_size refers to how often we want to change our weights. I chose 10 for the training dataset, and 32 for the test dataset.

A sketch of the compile-and-fit code described above follows.
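This is a minimal sketch assuming the parameters described above; the directory paths are placeholders, the epoch count of 25 is inferred from the conclusion’s mention of “epoch 5/25”, and the loss is assumed to be binary_crossentropy to match the binary sigmoid output:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Augment the training images; only rescale the test images.
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2,
                                   zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',  # placeholder path
                                                 target_size=(32, 32),
                                                 batch_size=10,
                                                 class_mode='binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',  # placeholder path
                                            target_size=(32, 32),
                                            batch_size=32,
                                            class_mode='binary')

classifier.fit(training_set, validation_data=test_set, epochs=25)  # epoch count assumed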
The testing accuracy and validation accuracy of our model are checked, and the confusion matrix is plotted. The count of misclassified images of each type is also determined.
We deploy the Flask app to App Platform using Gunicorn. Gunicorn is a Python WSGI HTTP server that uses a pre-fork worker model. By using Gunicorn, you’ll be able to serve your Flask application on more than one thread.
Prerequisites
● A GitHub account.
● Python 3 installed on your local machine. You can follow a tutorial for installing Python on Windows, Mac, or Linux.
● A text editor. You can use Visual Studio Code or your favorite text
editor.
Before you get started, you need to set up your Python developer
environment. You will install your Python requirements within a virtual
environment for easier management.
First, let’s create a project directory for our code and requirements.txt file to be
stored in and change into that directory. Run the following commands:
mkdir flask-app
cd flask-app
Next, create a directory in your home directory that you can use to store all of your virtual environments:

mkdir ~/.venvs

Then create a virtual environment named flask (the creation command is missing from the excerpt; the standard venv module accomplishes this):

python3 -m venv ~/.venvs/flask

This creates a directory called flask within your .venvs directory. Inside, it installs a local version of Python and a local version of pip. You can use this to install and configure an isolated Python environment for your project.
Before you install your project’s Python requirements, you need to activate
the virtual environment.
source ~/.venvs/flask/bin/activate
Your prompt changes to indicate that you are now operating within a Python
virtual environment. It looks like this: (flask)user@host:~$.
With your virtual environment active, install Flask and Gunicorn using the local instance of pip:

pip install flask gunicorn

Now that you have the flask package installed, save this requirement and its dependencies so App Platform can install them later. Do this using pip freeze, saving the output to a requirements.txt file:

pip freeze > requirements.txt
You now have all of the software needed to start a Flask app. You are almost
ready to deploy.
In this step, you will build a standard Hello Sammy! Flask application. You
won’t focus on the mechanics of Flask outside of how to deploy it to App
Platform. If you wish to deploy another application, the following steps will
work for a wide range of Flask applications.
nano app.py

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello Sammy!'

This code is the standard Hello World example for Flask, with a slight modification to say hello to your favorite shark. For more information about this file and Flask, visit the official Flask documentation.
You have written your application code. Now you will configure the Gunicorn
server.
Gunicorn is a Python WSGI HTTP server that many developers use to deploy
their Python applications. This WSGI (Web Server Gateway Interface) is
necessary because traditional web servers do not understand how to run
Python applications. For your purposes, a WSGI allows you to deploy your
Python applications consistently. You can also configure multiple threads to
serve your Python application, should you need them. In this example, you
will make your application accessible on port 8080, the standard App Platform
port. You will also configure two workers to serve your application.
nano gunicorn_config.py
bind = "0.0.0.0:8080"
workers = 2
This is all you need to do to have your app run on App Platform using
Gunicorn. Next, you’ll commit your code to GitHub and then deploy it.
First, initialize your project directory containing your files as a git repository:
git init
When you work on your Flask app locally, certain files get added that are
unnecessary for deployment. Let’s exclude those files using Git’s ignore list.
Create a new file called .gitignore:
nano .gitignore
*.pyc

Now add your project files to the repository and commit them:

git add .
git commit -m "Initial Flask App"

Output:
[master (root-commit) aa78a20] Initial Flask App
Open your browser and navigate to GitHub, log in with your profile, and
create a new repository called flask-app. Create an empty repository without a
README or license file.
Once you’ve created the repository, return to the command line and push your local files to GitHub. First, add the new repository as a remote (the SSH remote URL here matches the push output below):

git remote add origin git@github.com:MasonEgger/flask-app.git

Next, rename the default branch to main, to match what GitHub expects:

git branch -M main

Then push your code to the new repository:

git push -u origin main

Output:
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 1.20 KiB | 77.00 KiB/s, done.
Total 6 (delta 0), reused 0 (delta 0)
To github.com:MasonEgger/flask-app.git
 * [new branch] main -> main
Your code is now on GitHub and accessible through a web browser. Now you
will deploy your app to DigitalOcean’s App Platform.
Once you push the code, visit the App Platform homepage and click Launch Your App. A prompt requests that you connect your GitHub account.
Next, provide your app’s name, choose a region, and ensure the main branch is
selected. Then ensure that Autodeploy code changes is checked. Click Next
to continue.
This section covers how to deploy your machine learning model using the DigitalOcean cloud platform. There is no doubt that doing a data science and machine learning project, starting from collecting the data, processing the data, visualizing insights about the data, and developing a machine learning model to do a predictive task, is a fun thing to do. What makes it more fun and doable is that we can do all of those steps on our local machine and then be done with it. However, wouldn’t it be awesome if other people could make use of our machine learning model to do fun and cool stuff? The true magic of machine learning comes when our model gets into other people’s hands and they can do useful stuff with it.
The next important step is to process the image the user has uploaded. The processing step includes resizing the image to the same size as the training and validation images. After resizing, the loaded model should predict to which category the image belongs.
import numpy as np
import streamlit as st
from PIL import Image, ImageOps

def import_and_predict(image_data, model):
    size = (150, 150)
    # Resize the uploaded image to the size used for training and validation
    image = ImageOps.fit(image_data, size, Image.LANCZOS)
    image = np.asarray(image)
    img_reshape = image[np.newaxis, ...]  # add a batch dimension
    # (if the training images were rescaled, the same rescaling would be needed here)
    prediction = model.predict(img_reshape)
    if np.argmax(prediction) == 0:
        st.write("Benign!")
    elif np.argmax(prediction) == 1:
        st.write("Malignant")
    else:
        st.write("No Skin Cancer!")
After that, you need to save the Python file in the same directory as your previous Python file. We are basically all set now! To check what our web app looks like, open your prompt and navigate to the working directory of your Python files. In the working directory, you can type the following command, replacing app.py with the name of your Streamlit script:

streamlit run app.py

Now you will see from your prompt that you can check your web app on your localhost. If you wait a little bit, a new window will be launched shortly after you run your Streamlit app. Below is a screenshot of the simple image classification web app.
Conclusion
The CNN model I created above was tested on recognizing from an image whether a skin cancer cell is benign or malignant. The model has an accuracy rate on the training set of 96.7%, with a loss of 0.089 in the latest epoch, that being the highest accuracy among all epochs. The model has an accuracy rate on the test set of 71.51%, with a loss of 1.4443 in the latest epoch. However, the model’s highest level of test accuracy, 75.25%, was reached in epoch 5/25. Overall, I would say the model has done well and has achieved my goal of being more than 70% accurate.
References
1. https://www.sciencedirect.com/science/article/pii/S2666827021000177
2. https://ieeexplore.ieee.org/document/8641762
3. https://reader.elsevier.com/reader/sd/pii/S2666827021000177?token=FD976C9B7F8F83FC83BCD551982485FCDEEE2
4. https://arxiv.org/abs/2105.04895
5. https://www.researchgate.net/publication/325116934_Image_classification_using_Deep_learning