Skin Cancer Classification
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING
Submitted by
BANKA DHARMIK
20B91A0526
BONAFIDE CERTIFICATE
This is to certify that the project work entitled “AI-Powered Early Warning
System for Skin Manifestations” is the bonafide work of Banka Dharmik bearing
20B91A0526, who carried out the project work under my supervision in partial
fulfilment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science and Engineering.
I hereby declare that the project work entitled “AI-Powered Early Warning
System for Skin Manifestations” is a genuine work carried out by me in B.Tech
( Computer Science and Engineering ) at SRKR Engineering College(A),
Bhimavaram and has not been submitted either in part or full for the award of any
other degree or diploma in any other institute or University.
Banka Dharmik
20B91A0526
ABSTRACT
Skin diseases rank as the fourth leading cause of the burden of nonfatal diseases worldwide.
Beyond their direct effect on the individual, they are often strong indicators of underlying diseases.
Early detection of skin lesions is essential for the management of dermatological disorders.
Effective skin care in underserved areas is challenging because of communication problems and a
shortage of diagnostic tools and trained dermatologists. Even the preliminary screening of
dermatological manifestations is a demanding task.
By automating the preliminary screening process, the proposed tool facilitates remote screening, supports
timely intervention, and reduces the burden on healthcare systems. These technological advances
address the challenges posed by the global burden of skin diseases, contributing to improved health
outcomes, psychological well-being, functioning and social participation of individuals affected.
Keywords: Dermatological disorders, skin lesions, machine learning, deep learning, early
detection, image segmentation, classification, severity assessment, recommendation, skin cancer.
TABLE OF CONTENTS
1 INTRODUCTION
1.1 The Urgency of Early Detection
1.2 AI Steps Up to the Challenge:
1.3 The Power of Deep Learning:
1.4 Beyond MNIST: The Complexity of Skin Lesions:
1.5 A Glimpse into the Future:
2 LITERATURE SURVEY
3 PROBLEM STATEMENT
4 METHODOLOGY
4.1 Preprocessing and Data Augmentation
4.2 CNN Architecture Design
4.3 Model Optimization
4.4 Overfitting Prevention Techniques
4.5 Model Evaluation and Fine-tuning
5 IMPLEMENTATION
5.1 Data Preparation
5.1.1 Importing the modules
5.1.2 Image Data Loading
5.1.3 Lesion Type Categorization
5.2 Image Preprocessing and Augmentation
5.2.1 Image Resizing and Normalization
5.2.2 Data Augmentation
5.2.3 Class Balancing
5.3 Splitting Training and Testing Dataset
5.4 CNN Model Architecture
5.4.1 Convolutional Layers and Activation Functions
5.4.2 Pooling, Dropout, and Batch Normalization
5.4.3 Model Design
5.5 Model Training
5.6 Hyperparameter Tuning and Model Optimization
5.6.1 Early Stopping and Model Checkpoints
5.6.2 Optimization Algorithms
5.7 Model Evaluation and Analysis
5.7.1 Loading the Best Model and Calculating Accuracy
5.7.2 Class-wise Accuracy Assessment
6 RESULT ANALYSIS
6.1 Overall Performance
6.2 Accuracy and Loss Trends
6.3 Class-wise Performance
6.4 Visual Comparisons
7 CONCLUSION AND FUTURE WORK
8 REFERENCES
1 INTRODUCTION
The human skin, our largest and most visible organ, acts as a constant shield against the external
world. Yet, this protective barrier itself is susceptible to a multitude of diseases, impacting millions
worldwide. These conditions, ranging from common rashes to potentially life-threatening cancers,
can significantly affect individuals physically, psychologically, and socially. Early detection and
management are crucial for effective treatment and improved quality of life. However, access to
healthcare professionals and diagnostic tools remains a challenge, particularly in underserved
communities. This is where the power of artificial intelligence (AI) emerges, offering a glimmer of
hope for bridging the healthcare gap and revolutionizing skin disease detection.
1.1 The Urgency of Early Detection
Skin lesions, often the first visible signs of dermatological conditions, play a critical role in early
diagnosis. Take skin cancer, the most common human malignancy. Traditionally, its detection relies
on visual inspection, followed by dermoscopy, biopsy, and histopathological examination. This
process, while effective, can be time-consuming, resource-intensive, and prone to human error.
Delays in diagnosis can have severe consequences, highlighting the need for faster, more accurate
methods.
1.2 AI Steps Up to the Challenge:
This project delves into the exciting realm of AI-powered skin lesion analysis, proposing a novel
solution: a system that utilizes convolutional neural networks (CNNs) for proactive detection.
Imagine a world where a simple image captured by a smartphone can be analyzed by an AI system,
providing valuable insights into potential skin concerns. This system aims to bridge the healthcare
gap by offering:
1. Cost-effectiveness: By automating the preliminary screening process, the system can reduce
the reliance on expensive specialist consultations.
2. Accessibility: The system's digital nature makes it readily available, even in remote areas
with limited access to dermatologists.
3. Efficiency: AI algorithms can analyze large volumes of data quickly and
accurately, facilitating faster diagnosis and treatment initiation.
1.3 The Power of Deep Learning:
The core of this system lies in the power of CNNs, a type of deep learning architecture inspired
by the structure and function of the human visual cortex. By training these networks on vast
datasets of labelled skin lesion images, such as the HAM10000 dataset (Fig: 1.1), the system learns
to recognize patterns and subtle differences crucial for accurate classification.
Another widely used and influential dataset is the ISIC Skin Cancer Dataset
(Fig: 1.2). This set consists of 2357 images of malignant and benign oncological diseases,
compiled by the International Skin Imaging Collaboration (ISIC). It is publicly available and
serves as a valuable training ground for CNN models.
Fig 1.1: ISIC Skin Cancer Dataset Sample Images
1.4 Beyond MNIST: The Complexity of Skin Lesions:
While the success of deep learning in tasks like image recognition is undeniable, skin lesion
analysis presents a unique set of challenges compared to simpler datasets like MNIST (handwritten
digits). The high variability in shape, color, and texture of skin lesions, coupled with the often
subtle differences between benign and malignant cases, demands a more sophisticated approach.
This project delves into the intricate process of building and evaluating a CNN model that can
effectively navigate these complexities, paving the way for a more accurate and reliable AI-
powered diagnostic tool.
1.5 A Glimpse into the Future:
This project is just one step in the ongoing journey towards harnessing the power of AI for
improved healthcare. As technology advances and datasets grow, the accuracy and efficiency of AI-
powered skin lesion analysis are poised to further improve. Imagine a future where AI-powered
systems seamlessly integrate into routine healthcare checkups, offering real-time feedback and
personalized recommendations. This future holds immense potential for early detection, improved
treatment outcomes, and ultimately, a healthier future for all.
2 LITERATURE SURVEY
Debelee [1] conducts a systematic review of machine learning (ML) for skin disease
detection, focusing on evaluating datasets, current methods, and challenges. Through a detailed
methodology including research question formulation and database search strategies, the review explores
the use of ML and AI, particularly deep learning models like EfficientNets and MobileNet V2, in
dermatology. The review highlights the importance of datasets such as HAM10000 and ISIC in
advancing diagnostic accuracy but points out the need for more extensive, varied datasets and
model generalizability for future progress.
Swapna et al. [2] explore the application of deep learning for skin disease classification, utilizing
a CNN architecture alongside AlexNet, ResNet, and InceptionV3 models. The study focuses on
enhancing diagnostic accuracy through the HAM10000 dataset, comprising various skin conditions,
extended with images of cuts and burns. It underscores the effectiveness of deep learning in
simplifying the diagnostic process, promising significant improvements in accuracy and efficiency.
Nancy Girdhar, Aparna Sinha, and Shivang Gupta [3] introduce DenseNet-II, an advancement in
CNNs aimed at improving melanoma detection. Focused on overcoming the challenges of variable
image quality in diagnosis, their study highlights the critical need for accurate detection tools
amidst rising cancer cases. DenseNet-II merges the strengths of existing models like DenseNet,
VGG-16, InceptionV3, and ResNet into a superior classifier tested on the HAM10000 dataset. This
innovation represents a significant leap in medical imaging, offering enhanced accuracy in
identifying melanoma, thereby contributing to better outcomes in cancer care.
Evgin Goceri and Ayse Akman Karakas [4] conducted a comparative study to classify skin
diseases using CNNs, focusing on networks like VGG16, VGG19, GoogleNet, InceptionV3, and
ResNet101. They ensured uniform testing conditions across networks and assessed them based on
accuracy, precision, and other metrics. InceptionV3 was highlighted for its novel architecture,
contributing to effective skin lesion classification. Ultimately, ResNet101 showed the highest
accuracy at 77.72%. This research highlights the potential of specific CNN models, including
InceptionV3, in improving dermatological diagnostics.
Kassem, Hosny, and Fouad [5] propose a deep learning model utilizing transfer learning with
GoogleNet for the accurate classification of skin lesions. Their study, focusing on the ISIC 2019
cancer dataset, demonstrates the model's ability to distinguish between eight different skin lesion
classes. By modifying GoogleNet's architecture and employing transfer learning, they achieve
classification accuracy, sensitivity, specificity, and precision percentages of 94.92%, 79.8%, 97%,
and 80.36%, respectively. This work highlights the effectiveness of their approach in handling the
ISIC 2019 dataset.
Esmaeilzadeh's [6] study delves into consumer attitudes towards adopting AI in healthcare,
pinpointing technological, ethical, and regulatory factors as major influencers on acceptance. It
underscores the critical balance between perceived benefits and risks associated with AI tools for
health purposes, aiming to enhance user trust and guide ethical AI integration into healthcare
practices.
Elder et al. [7] discuss the transformative role of AI in cosmetic dermatology, particularly focusing on
skincare. They highlight AI-driven innovations such as personalized skincare regimens and
augmented reality tools for skin analysis. These advancements empower patients in their skincare
decisions and suggest a future where AI further personalizes and enhances dermatological care.
Berman et al. [8] highlight the increased risk of skin cancer in solid organ transplant recipients
(SOTRs), underscoring the need for diligent prevention, regular screenings, and careful
management. The study advocates for adjusted immunosuppressive treatments and emphasizes
multidisciplinary care to mitigate skin cancer risks in this vulnerable population.
Sander et al. [9] stress the importance of sunscreen in mitigating skin cancer risk, underscoring its
proven effectiveness against melanoma and nonmelanoma types. They recommend broad-spectrum
sunscreens with at least SPF 30, while noting ongoing research into the safety and environmental
impacts of sunscreen ingredients.
Shaheen's [10] review highlights the transformative role of AI in healthcare, particularly in areas
like drug discovery, clinical trials, and patient care. It showcases AI's capacity to accelerate
pharmaceutical research, streamline clinical trials through efficient data handling, and significantly
improve patient treatment outcomes. The review underlines AI's potential in enhancing healthcare
efficiency and making healthcare services more accessible and personalized.
3 PROBLEM STATEMENT
Skin conditions, a major global health issue, often go undiagnosed and untreated, especially in
resource-poor areas. This "diagnostic divide" leaves many facing the detrimental impacts of
untreated skin diseases without access to proper care. The lack of skilled dermatologists and
essential diagnostic tools in these communities further exacerbates the issue by delaying necessary
treatment, and such delays can be fatal.
Skin cancer ranks among the most common cancers worldwide, posing a significant health
challenge, especially in regions with limited access to dermatology experts and diagnostic tools.
The capability for early detection and precise classification of skin lesions is vital for effective
treatment and improved patient outcomes. Yet, in many areas, a lack of medical infrastructure and
specialists heightens the risks associated with delayed diagnosis and treatment errors. Artificial
intelligence (AI), particularly through Convolutional Neural Networks (CNNs), emerges as a key
solution to this issue, offering a method to accurately classify skin cancer types based on
dermatoscopic images.
The HAM10000 dataset, which includes dermatoscopic images across seven diagnostic categories
of pigmented skin lesions, provides a valuable resource for training AI models in skin cancer classification. This
enables not only the advancement of machine learning in healthcare but also the possibility to make
dermatological diagnostics more accessible to underserved communities. The methodology
described herein focuses on creating and refining a CNN model for this purpose, leveraging AI to
potentially transform skin cancer diagnostics globally.
4 METHODOLOGY
Normalization: Standardizes the pixel values across all images to have a mean of 0 and a
standard deviation of 1, ensuring that the model trains more efficiently.
Data Augmentation Algorithms: Techniques like rotation, zoom, flip, and translation are
applied to increase the diversity of the training dataset, helping the model generalize better
to new, unseen images.
Dropout: A regularization technique where randomly selected neurons are ignored during
training, preventing them from co-adapting too much. This helps in reducing overfitting by
making the network more robust.
Early Stopping: Monitors the model's performance on a validation set and stops training
when performance degrades, as indicated by an increase in validation loss.
Transfer Learning: Leveraging models pre-trained on large datasets and fine-tuning them
on the HAM10000 dataset can significantly improve accuracy, especially when the dataset
is relatively small. Architectures such as VGG, ResNet, or Inception can serve as the starting
point.
Ensemble Methods: Combining predictions from multiple models or variations of a model
can improve accuracy and robustness. Techniques like bagging and boosting may be
employed to aggregate the outputs of multiple models.
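The ensemble idea in the last item can be illustrated with a minimal soft-voting sketch, which simply averages the class probabilities of several models. This is a simpler aggregation scheme than bagging or boosting, and it assumes each model outputs one probability vector per image. The function name `soft_vote` and the toy probabilities are hypothetical and are not taken from the project code.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Average class-probability arrays from several models (soft voting).

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    weights:   optional per-model weights; equal weighting if None.
    Returns (averaged probabilities, predicted class index per sample).
    """
    probs = np.stack(prob_list, axis=0)   # (n_models, n_samples, n_classes)
    avg = np.average(probs, axis=0, weights=weights)
    return avg, avg.argmax(axis=1)

# Toy example: two hypothetical models, two images, three classes.
model_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model_b = np.array([[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]])
avg_probs, labels = soft_vote([model_a, model_b])
```

In this toy case the two models disagree on both images, and the averaged probabilities decide the final label.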
This methodology provides a foundation for building a powerful CNN model capable of
classifying skin cancer with high accuracy. By carefully selecting and fine-tuning these algorithms,
it's possible to create a model that generalizes effectively to new, unseen images, making strides in
addressing the global challenge of skin cancer diagnosis.
5 IMPLEMENTATION
In addressing the pressing challenge of skin cancer detection, this project harnesses the power of
Convolutional Neural Networks (CNNs) to classify dermatoscopic images from the HAM10000
dataset. The implementation aims to develop a robust, accurate, and efficient model that can assist
in early diagnosis and contribute to improved patient outcomes. This section outlines the systematic
steps taken, from data preparation through model evaluation, to achieve these goals.
5.1 Data Preparation
To optimize the CNN's performance for skin cancer classification, we undertook several dataset
preparation steps. Our objective was to transform the raw HAM10000 image data into a format
conducive to effective machine learning. The steps included:
1. Importing libraries and modules
2. Image Data Loading
3. Lesion Type Categorization
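5.1.1 Importing the modules
We began by importing essential libraries and modules, including NumPy for linear
algebra, Pandas for data processing, and TensorFlow along with Keras for building and training
our CNN model.
import os                          # file and directory handling
import cv2                         # image reading and resizing (OpenCV)
import numpy as np                 # linear algebra
import pandas as pd                # metadata processing
import matplotlib.pyplot as plt    # plotting
import tensorflow as tf            # building and training the CNN
from tensorflow import keras       # high-level model API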
5.1.2 Image Data Loading
We initiated the preparation by loading the dermatoscopic images along with their metadata
from the HAM10000 dataset. This step involved reading the image files and the accompanying
CSV file that contains metadata, including diagnoses (dx), image IDs, and patient information.
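A minimal sketch of this loading step is given here. It assumes the metadata CSV has an `image_id` column whose values match the image file names without extension, and that the images are stored as `.jpg` files somewhere under one directory. The helper name `load_metadata` and both path arguments are hypothetical.

```python
import os
from glob import glob

import pandas as pd

def load_metadata(csv_path, image_dir):
    """Read the metadata CSV and attach the file path of each image."""
    df = pd.read_csv(csv_path)
    # Map image_id (file name without extension) to the full file path.
    paths = {
        os.path.splitext(os.path.basename(p))[0]: p
        for p in glob(os.path.join(image_dir, "**", "*.jpg"), recursive=True)
    }
    # Rows whose image file is not found receive a missing value.
    df["path"] = df["image_id"].map(paths)
    return df
```

With such a helper, a call like `df_skin = load_metadata(csv_path, image_dir)` would produce the `df_skin` frame used in the rest of this section, with the actual locations substituted for the two arguments.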
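5.1.3 Lesion Type Categorization
# Full names of the seven HAM10000 diagnosis codes (wording of the names is assumed).
lesion_type_dict = {
    'nv': 'Melanocytic nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign keratosis-like lesions',
    'bcc': 'Basal cell carcinoma',
    'akiec': 'Actinic keratoses',
    'vasc': 'Vascular lesions',
    'df': 'Dermatofibroma'
}
# Integer class ID for each diagnosis code.
lesion_ID_dict = {
    'nv': 0,
    'mel': 1,
    'bkl': 2,
    'bcc': 3,
    'akiec': 4,
    'vasc': 5,
    'df': 6
}
# Map each diagnosis code (dx) to its lesion type name and numeric ID.
df_skin['lesion_type'] = df_skin['dx'].map(lesion_type_dict)
df_skin['lesion_ID'] = df_skin['dx'].map(lesion_ID_dict)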
5.2 Image Preprocessing and Augmentation
To optimize our CNN for skin cancer classification, we resized the HAM10000 images for
consistency and applied data augmentation techniques like rotation, flip, and zoom. These steps
enriched the dataset and improved model generalization.
1. Image Resizing and Normalization
2. Data Augmentation
3. Class Balancing
5.2.1 Image Resizing and Normalization
Given the varied sizes of dermatoscopic images, we resized each image to a uniform
dimension (e.g., 100x100 pixels) to ensure consistency.
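The resizing and normalization step can be sketched as follows. To keep the example self-contained, it uses a NumPy-only nearest-neighbour resize as a stand-in; in practice an image library call such as OpenCV's `cv2.resize` would be the usual choice, and which interpolation the project used is not stated. The helper names and the random input array are hypothetical.

```python
import numpy as np

def resize_nearest(img, out_h=100, out_w=100):
    """Nearest-neighbour resize of an (H, W, C) image using index sampling."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return img[rows[:, None], cols]

def standardize(batch):
    """Shift and scale a batch of images to zero mean and unit standard deviation."""
    batch = batch.astype(np.float32)
    return (batch - batch.mean()) / (batch.std() + 1e-7)

# A random 450x600 RGB array stands in for a dermatoscopic image (not real data).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(450, 600, 3), dtype=np.uint8)
small = resize_nearest(raw)          # shape (100, 100, 3)
x = standardize(small[None, ...])    # add a batch axis, then normalize
```

Here the statistics are computed over the whole batch; computing them on the training set only and reusing them for the test set is the safer design when the two sets are processed separately.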
5.2.2 Data Augmentation
To enhance the diversity of the training dataset and mitigate overfitting, we applied data
augmentation techniques such as rotation, zoom, flip, and translation. This step generated
additional synthetic images from the original dataset, increasing its robustness.
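A minimal sketch of such augmentation is shown here using only NumPy. It covers flips, rotations in 90-degree steps, and a small horizontal translation; arbitrary-angle rotation and zoom would need an image-processing library and are left out. The function name `augment` and the shift range are assumptions made for illustration, and a square input is assumed so that rotation keeps the image shape.

```python
import numpy as np

def augment(img, rng):
    """Return a randomly flipped, rotated, and shifted copy of a square (H, W, C) image."""
    out = img
    if rng.random() < 0.5:
        out = np.fliplr(out)                        # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)                        # vertical flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotate by 0, 90, 180 or 270 degrees
    shift = int(rng.integers(-5, 6))                # translation of up to 5 pixels
    out = np.roll(out, shift, axis=1)               # shifted pixels wrap around the edge
    return np.ascontiguousarray(out)

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)  # stand-in image
augmented = augment(image, rng)
```

Every operation here only rearranges pixels, so the label of the lesion is preserved while its position and orientation change.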
5.2.3 Class Balancing
Recognizing the imbalanced nature of the dataset, with some lesion types being more
prevalent than others, we computed class weights. This approach allowed us to give higher
importance to underrepresented classes, aiming for balanced model sensitivity across all categories.
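One common way to compute such weights is the "balanced" heuristic, in which each class is weighted by n_samples / (n_classes * class_count). The sketch below assumes this formula; the exact weighting used in the project is not stated, and the function name and toy labels are hypothetical.

```python
import numpy as np

def balanced_class_weights(labels, n_classes):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = np.bincount(labels, minlength=n_classes)
    # np.maximum guards against division by zero for classes with no samples.
    weights = len(labels) / (n_classes * np.maximum(counts, 1))
    return {cls: float(w) for cls, w in enumerate(weights)}

# Toy labels: class 0 is common, classes 1 and 2 are rare.
toy_labels = np.array([0] * 8 + [1] * 2 + [2] * 2)
weights = balanced_class_weights(toy_labels, n_classes=3)
# weights == {0: 0.5, 1: 2.0, 2: 2.0}
```

A dictionary of this form can be passed as the `class_weight` argument of Keras `model.fit`, so that errors on rare classes contribute more to the loss.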
5.3 Splitting Training and Testing Dataset
We split the dataset into training and test sets, allocating 75% of the images for training and
25% for testing. This division allowed us to evaluate the CNN’s performance on unseen data,
ensuring the model's generalizability.
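The 75/25 split can be sketched with a single shuffled permutation of the sample indices. This is a NumPy-only illustration; a library helper such as scikit-learn's `train_test_split` would typically be used instead, and whether the project stratified the split by class is not stated. The function name and toy arrays are hypothetical.

```python
import numpy as np

def split_train_test(x, y, test_fraction=0.25, seed=0):
    """Shuffle the samples once and hold out `test_fraction` of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Toy data: 20 samples with labels cycling through the seven classes.
x = np.arange(20).reshape(20, 1)
y = np.arange(20) % 7
x_train, x_test, y_train, y_test = split_train_test(x, y)
```

Fixing the seed keeps the split reproducible, so that accuracy figures from different runs refer to the same test images.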
5.4 CNN Model Architecture
The architecture incorporates modern neural network practices to extract meaningful patterns
from the data, enhancing the model's predictive accuracy. The core components of our CNN include
convolutional layers with activation functions, pooling to reduce dimensionality, dropout for
regularization, and batch normalization to accelerate training.
5.4.1 Convolutional Layers and Activation Functions
Our CNN employs multiple convolutional layers with filters of various sizes to capture a
wide range of features essential for skin cancer classification. Each convolutional layer is
followed by a ReLU activation function, introducing non-linearity and enabling efficient
learning of complex patterns. This structure aids in preventing the vanishing gradient issue and
improves the model's ability to generalize, making it robust in detecting different skin cancer
types.
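What a single convolutional filter followed by ReLU computes can be shown on a tiny example. The sketch below is a plain NumPy illustration of the operation, not the project's Keras layers; the function names, the toy image, and the edge filter are all hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2-D `image` without padding (cross-correlation, as in CNN layers)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: negative responses become zero."""
    return np.maximum(x, 0.0)

# A vertical-edge filter applied to a tiny image with a dark-to-bright step.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=np.float64)
kernel = np.array([[-1.0, 1.0]])
feature_map = relu(conv2d_valid(image, kernel))
# Each row of feature_map is [0, 1, 0]: the filter responds only at the edge.
```

In a trained network the filter values are learned rather than hand-written, and many such filters run in parallel to produce a stack of feature maps.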
5.4.2 Pooling, Dropout, and Batch Normalization
Pooling: To reduce the spatial dimensions of the feature maps and mitigate overfitting,
we incorporated Max Pooling layers after specific convolutional layers. This approach
helps in reducing the number of parameters and computation in the network, making the
model more efficient.
Dropout: Recognizing the importance of preventing overfitting, especially when
working with a relatively limited dataset, we integrated Dropout layers within the
network architecture. By randomly dropping units during training, Dropout forces the
model to learn more robust features that are generalizable across unseen data.
Batch Normalization: To ensure stable and faster training, we applied Batch
Normalization after each convolutional layer. This technique normalizes the inputs to
each layer, reducing internal covariate shift and improving the overall training dynamics.
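The three operations above can be sketched in NumPy to show what each one computes. These are simplified illustrations under stated assumptions: pooling uses a 2x2 window with stride 2, dropout is the "inverted" variant that rescales surviving units, and batch normalization omits the learnable scale and shift parameters and the running statistics that a framework layer maintains. All function names are hypothetical.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on an (H, W) feature map (H and W even)."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def dropout(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def batch_norm(x, eps=1e-5):
    """Normalize each feature (column) over the batch dimension (rows)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 1., 2., 3.],
                 [4., 5., 6., 0.]])
pooled = max_pool_2x2(fmap)   # [[4., 8.], [9., 6.]]
```

Dropout is applied only during training; at inference time all units are kept, which is why the surviving activations are rescaled while training.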
5.7 Model Evaluation and Analysis
We meticulously assessed the performance of our CNN model, focusing on accuracy, loss
metrics, and class-wise accuracy to determine its efficacy in classifying various skin cancer types.
This evaluation not only highlighted the model's predictive strengths but also identified areas for
potential improvement, setting a foundation for future enhancements.
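5.7.1 Loading the Best Model and Calculating Accuracy
The best-performing model was saved during training, using the checkpoints established
based on validation accuracy. This saved model represents our CNN after hyperparameter tuning
and training, and it is the one loaded to calculate accuracy on the test set.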
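The interplay of checkpointing and early stopping can be sketched in plain Python. The sketch monitors validation loss with a patience counter; the project's checkpoints were based on validation accuracy, for which the same logic applies with the comparison reversed. The function name, the patience value, and the loss curve are hypothetical, and in Keras this behaviour is typically obtained with the `EarlyStopping` and `ModelCheckpoint` callbacks.

```python
def early_stopping_epochs(val_losses, patience=3):
    """Scan per-epoch validation losses and return (best_epoch, stop_epoch).

    best_epoch: epoch whose weights a checkpoint would keep (lowest loss so far).
    stop_epoch: epoch at which training would halt.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # a checkpoint would be saved here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Hypothetical validation-loss curve (not measured results).
history = [0.90, 0.70, 0.62, 0.65, 0.64, 0.66, 0.60]
best_epoch, stop_epoch = early_stopping_epochs(history)
# best_epoch == 2, stop_epoch == 5
```

Training halts at epoch 5 after three epochs without improvement, and the weights from epoch 2 are the ones kept. The later improvement at epoch 6 is never reached, which illustrates the trade-off in choosing the patience value.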
5.7.2 Class-wise Accuracy Assessment
To ensure our model performs well across all lesion types, we calculated class-wise
accuracy. This assessment highlights how effectively the model classifies each specific type of
skin cancer, revealing any biases or weaknesses.
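# Accuracy per lesion class: fraction of test images of class i predicted as class i.
acc_tot = []
for i in range(7):
    mask = (y_test2 == i)
    acc_parz = round(np.mean(y_test2[mask] == y_pred[mask]), 2)
    lab_parz = lesion_names[i]
    print('accuracy for', lab_parz, '=', acc_parz)
    acc_tot.append(acc_parz)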
6 RESULT ANALYSIS
Upon the comprehensive evaluation of our Convolutional Neural Network (CNN) model,
designed for the classification of skin cancer types using the HAM10000 dataset, we have observed
promising results that underscore the potential of AI in dermatological diagnostics. The model
achieved a notable accuracy, highlighting its capability to effectively distinguish between various
skin lesion types. The accuracy and loss curves, plotted over the training and validation phases,
exhibited a favourable convergence, indicating a balanced learning process without significant
overfitting or underfitting issues.
6.1 Overall Performance
The CNN model demonstrated promising results in classifying skin cancer types from the
HAM10000 dataset. Achieving an accuracy of 87.28%, with precision, recall, and F1-scores
reflecting a balanced performance across various classes, the model validates the effectiveness of
the chosen architecture and training regimen. These metrics underscore the model's capability in
distinguishing between different skin lesion types, an essential aspect of early and accurate skin
cancer diagnosis.
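For reference, per-class precision, recall, and F1-score can be derived from a confusion matrix as sketched below. The function name is hypothetical and the toy labels are illustrative only; they are not the results reported in this section.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Return per-class precision, recall and F1 computed from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)           # how often each class was predicted
    actual = cm.sum(axis=1)              # how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Toy labels for three classes.
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 2, 2])
precision, recall, f1 = per_class_metrics(y_true, y_pred, n_classes=3)
```

On an imbalanced dataset these per-class values are more informative than overall accuracy, since a model can score well overall while missing most samples of a rare class.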
6.2 Accuracy and Loss Trends
Throughout the training and validation phases, the model's accuracy improved consistently,
while the loss decreased, indicative of effective learning. However, a careful examination of the
trends revealed minor signs of overfitting, as evidenced by a slight divergence between training and
validation accuracy in the later epochs. This observation guided adjustments in model training,
including the incorporation of dropout and early stopping, to mitigate overfitting and enhance
generalization.
6.3 Class-wise Performance
The model's performance varied across different skin lesion classes, with Vascular lesions and
Dermatofibroma showing the highest accuracy rates. In contrast, Melanocytic nevi and Benign
keratosis-like lesions proved more challenging and were predicted with lower accuracy. This
variation highlights the model's strengths and areas where further tuning could yield improvements.
6.4 Visual Comparisons
Visual inspection of test images alongside their predicted and actual labels revealed the model's
proficiency in identifying distinct lesion characteristics. Successful predictions across a spectrum of
lesion types demonstrated the model's robustness. However, notable misclassifications, particularly
in less accurately predicted classes, underscored the necessity for continuous model refinement.
7 CONCLUSION AND FUTURE WORK
The CNN model for skin cancer classification represents a significant step forward in applying
artificial intelligence to dermatological diagnosis. By achieving high accuracy and uncovering
specific areas for improvement, this project not only contributes to the ongoing efforts in medical
AI but also outlines a clear path for future advancements.
In conclusion, this project lays the groundwork for transformative changes in skin cancer
diagnosis, with AI-powered models offering speed, accuracy, and accessibility. Continuous
innovation and collaboration between technologists and medical professionals will be key to
realizing the full potential of this promising field.
8 REFERENCES
[1] Debelee, Taye Girma. "Skin Lesion Classification and Detection Using Machine Learning
Techniques: A Systematic Review." Diagnostics 13.19 (2023): 3147.
[2] Swapna, T., et al. "Detection and Classification of Skin diseases using Deep Learning." The
International journal of analytical and experimental modal analysis, ISSN 0886-9367 (2021).
[3] Girdhar, Nancy, Aparna Sinha, and Shivang Gupta. "DenseNet-II: An improved deep
convolutional neural network for melanoma cancer detection." Soft Computing 27.18 (2023): 13285-13304.
[4] Goceri, Evgin, and Ayse Akman Karakas. "Comparative evaluations of CNN based networks
for skin lesion classification." 14th International conference on computer graphics, visualization,
computer vision and image processing (CGVCVIP), Zagreb, Croatia. 2020.
[5] Kassem, Mohamed A., Khalid M. Hosny, and Mohamed M. Fouad. "Skin lesions classification
into eight classes for ISIC 2019 using deep convolutional neural network and transfer learning."
IEEE Access 8 (2020): 114822-114832.
[6] Esmaeilzadeh, Pouyan. "Use of AI-based tools for healthcare purposes: a survey study from
consumers’ perspectives." BMC medical informatics and decision making 20.1 (2020): 1-19.
[7] Elder, Alexandra, et al. "The role of artificial intelligence in cosmetic dermatology—current,
upcoming, and future trends." Journal of Cosmetic Dermatology 20.1 (2021): 48-52.
[8] Berman, Hannah, et al. "Skin cancer in solid organ transplant recipients: A review for the
nondermatologist." Mayo Clinic Proceedings. Elsevier, 2022.
[9] Sander, Megan, et al. "The efficacy and safety of sunscreen use for the prevention of skin
cancer." CMAJ 192.50 (2020): E1802-E1808.