
Healthcare Analytics 3 (2023) 100199


A comprehensive assessment of Convolutional Neural Networks for skin and oral cancer detection using medical images
Dhatri Raval ∗, Jaimin N. Undavia
Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Faculty of Computer Science and Applications, Charotar University of Science and
Technology, Changa 388421, India

ARTICLE INFO

Keywords:
Deep learning
Convolutional Neural Network
Computer-Aided Cancer Detection
Data augmentation
Skin cancer
Oral cancer

ABSTRACT

Early detection is essential to effectively treat two of the most prevalent cancers, skin and oral. Deep learning approaches have demonstrated promising results in effectively detecting these cancers using Computer-Aided Cancer Detection (CAD) and medical imagery. This study proposes a deep learning-based method for detecting skin and oral cancer using medical images. We discuss various Convolutional Neural Network (CNN) models such as AlexNet, VGGNet, Inception, ResNet, DenseNet, and the Graph Neural Network (GNN). Image processing techniques such as image resizing and image filtering are applied to skin cancer and oral cancer images to improve their quality and remove noise. Data augmentation techniques are then used to expand the training dataset and strengthen the robustness of the CNN model. The best CNN model is selected based on the training accuracy, training loss, validation accuracy, and validation loss. The study shows DenseNet achieves state-of-the-art performance on the skin cancer dataset.

1. Introduction

Skin and oral cancers are significant public health concerns, with millions of cases diagnosed each year. Early detection and accurate classification of skin and oral cancers are crucial for effective treatment and improved patient outcomes. Traditional methods of cancer classification rely on human experts to visually examine biopsy samples, which can be time-consuming, subjective, and prone to errors.

In recent years, deep learning techniques have shown promise in automating cancer classification tasks. Deep learning algorithms can automatically extract relevant features from large datasets and learn complex patterns that can be used to accurately classify cancer types. Deep learning algorithms, including CNNs, can benefit from larger training datasets, but the actual amount of data required for good results varies with several factors, such as the complexity of the problem and the architecture of the network.

In some cases, a relatively small dataset can be sufficient if the problem is simple and the network is not too deep or complex. However, in other cases, a large dataset may be necessary to train a deep neural network effectively.

In image classification tasks, a larger dataset can help improve the accuracy of the model, but it is not always feasible to collect or label a large amount of data. In the case of oral cancer, the available image dataset is very small, so achieving good accuracy for oral cancer is difficult.

The main problem addressed in this article is to find the best-suited CNN model to detect skin cancer from medical images.

The desire to enhance early detection and diagnosis of skin cancer and oral cancer, which are among the most common types of cancer worldwide, drives the motivation for research on skin cancer detection using deep learning. Skin cancer can often be successfully treated if identified early; however, the accuracy of the diagnosis depends on the dermatologist who examines the skin lesion.

2. Skin cancer

Skin cancer is a type of cancer that originates in the cells of the skin. It occurs when the cells in the skin grow uncontrollably, forming a mass or lesion. According to the World Health Organization (WHO), skin cancer is one of the most common cancers worldwide, with more than 2 million cases reported each year. The highest incidence rates are typically found in countries with fair-skinned populations living in areas with high levels of sun exposure, such as Australia, New Zealand, and South Africa [1].

Cancer incidence rates are higher in fair-skinned populations who live in countries with high amounts of ultraviolet (UV) radiation from the sun. This is because fair skin is more vulnerable to UV radiation and has less melanin, which provides some natural protection against UV radiation's adverse effects. As a result, those with pale skin are more likely to get skin cancer.

∗ Corresponding author.
E-mail addresses: dhatriraval.mca@charusat.ac.in (D. Raval), jaiminundavia.mca@charusat.ac.in (J.N. Undavia).

https://doi.org/10.1016/j.health.2023.100199
Received 30 March 2023; Received in revised form 5 May 2023; Accepted 17 May 2023

2772-4425/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).

Table 1
Characteristics of skin cancer [11].

A — Asymmetry: One half of the mole or lesion is different from the other half.
B — Border irregularity: The edges of the mole or lesion are uneven or not well defined.
C — Color variation: The mole or lesion has different shades of brown, black, or other colors, or it may have areas of white, gray, pink, or red.
D — Diameter: The mole or lesion is larger than the size of a pencil eraser.
E — Evolving: The mole or lesion is changing in size, shape, or color.
F — Firmness: The mole or lesion feels firm or raised.
G — Growing: The mole or lesion is growing rapidly or has increased in size.

Fig. 1. Three types of skin cancer [10].

According to the World Cancer Research report, Australia has the highest number of skin cancer cases and New Zealand the second highest [2].

In India, the incidence of skin cancer is relatively low compared to other countries, with a reported incidence rate of 0.4–1.0 cases per 100,000 population. However, the incidence of skin cancer in India is gradually increasing, primarily due to changes in lifestyle and an increase in exposure to UV radiation from the sun. According to the Indian Council of Medical Research (ICMR), the most common type of skin cancer in India is squamous cell carcinoma, followed by basal cell carcinoma and melanoma [3,4].

It is essential to note that skin cancer is a preventable disease, and taking preventive measures can significantly reduce the risk of developing skin cancer.

Prevention of skin cancer:

• Wear sunscreen: Apply SPF 30 or higher broad-spectrum sunscreen to all exposed skin, including the face, neck, ears, and hands.
• Avoid being in direct sunlight between 10 a.m. and 4 p.m., when the sun's rays are at their strongest.
• When in the sun, wear protective clothing such as long-sleeved shirts, trousers, and a wide-brimmed hat.
• Avoid tanning beds: Because tanning beds raise the risk of skin cancer, it is preferable to avoid them entirely.

Therefore, it is crucial to regularly check your skin for any new or suspicious lesions and to seek medical attention if you notice any changes or abnormalities [3,5].

There are three main types of skin cancer, as shown in Fig. 1: basal cell carcinoma, squamous cell carcinoma, and melanoma.

Basal cell carcinoma (BCC) is the most common type of skin cancer, accounting for approximately 80% of all skin cancers. It usually appears as a small, shiny bump or a patch of red, scaly skin that grows slowly and rarely spreads to other parts of the body. BCC is usually caused by long-term exposure to ultraviolet (UV) radiation from the sun or indoor tanning [6]. As per the latest American Cancer Society data, over 5.4 million basal and squamous cell skin malignancies are diagnosed in the United States each year. Approximately 8 out of every 10 of these are basal cell cancers; squamous cell tumors are less common [7].

Squamous cell carcinoma (SCC) is the second most common type of skin cancer, accounting for approximately 16% of all skin cancers. It usually appears as a firm, red nodule or a scaly, crusty lesion that grows rapidly and can spread to other parts of the body if left untreated. SCC is also caused by long-term exposure to UV radiation [8].

Melanoma is the least common type of skin cancer, but it is the most dangerous. It accounts for approximately 4% of all skin cancers but is responsible for the majority of skin cancer-related deaths. Melanoma usually appears as a dark, irregularly shaped mole that grows quickly and can spread to other parts of the body. Melanoma is caused by a combination of genetic factors and exposure to UV radiation [3,5,9].

A mole or a wound on the skin that does not heal and that shows the characteristics listed in Table 1 carries a high risk of skin cancer.

Skin cancer can be detected early through regular self-examination and screening by a dermatologist. The American Cancer Society recommends that individuals perform self-exams of their skin every month to look for any new or changing moles or spots. Dermatologists may also use various techniques to examine the skin, such as a dermatoscope or a biopsy, to accurately diagnose skin cancer.

It is important to note that not all skin cancers exhibit these characteristics, and not all moles or lesions that show these characteristics are cancerous. Therefore, it is essential to regularly check the skin for any new or suspicious lesions and to seek medical attention if one notices any changes or abnormalities. A dermatologist or healthcare professional can perform a proper diagnosis and recommend appropriate treatment options [12,13].

People with malignant melanoma benefit tremendously from early identification. Although the ABCD assessments are useful in determining which pigmented lesions require surgery, the differential diagnosis of pigmented lesions is difficult. The most important risk factor for all skin neoplasms remains sun exposure. Thus, people should be taught basic precautions such as avoiding the sun during peak ultraviolet B (UVB) hours, wearing sunscreen and protective clothing, and avoiding sun tanning.

Detecting the cancer at an early stage offers a chance of better treatment and increases a person's survival rate.

This article studies the detection of the skin cancers basal cell carcinoma, squamous cell carcinoma, and malignant melanoma.

3. Oral cancer

Oral cancer refers to any malignant tumor that develops in the mouth or throat, which can include the lips, tongue, gums, cheeks, floor of the mouth, hard and soft palate, and tonsils. There are several types of oral cancer, including:

Squamous cell carcinoma: This is the most common type of oral cancer, accounting for more than 90% of cases. It usually starts in the squamous cells, which are the thin, flat cells lining the surface of the mouth and throat.

Verrucous carcinoma: This type of cancer is rare and grows slowly. It usually appears as a white or gray lump with a rough, wart-like surface.

Minor salivary gland carcinoma: This type of cancer starts in the minor salivary glands, which are located throughout the mouth and throat. It can occur anywhere in the mouth, but is most common in the palate.

Lymphoma: Lymphoma is a cancer of the lymphatic system, which is part of the immune system. It can occur in the mouth and throat.

Melanoma: This is a rare form of cancer that develops from pigment-producing cells in the skin or mucous membranes. It can occur in the mouth or on the lips.

It is important to note that oral cancer can also spread from other parts of the body, such as the lungs or breast, and develop in the mouth or throat as a secondary cancer.

According to the World Health Organization, there were approximately 354,864 new cases of oral cancer worldwide in 2020, and


177,384 deaths from the disease. According to the American Cancer Society, in the United States alone, it is estimated that in 2022 about 54,010 new cases of oral cavity and oropharyngeal cancers (including cancers of the lips, tongue, tonsils, and throat) will be diagnosed and about 10,850 deaths will occur from these types of cancers [14–16].

Table 2
Deep learning approaches with medical images.

Reference                       No. of instances   Methodology                         Accuracy
S. Sharma et al. [27]           569                KNN                                 95%
Hela Elmannai et al. [28]       4800               Transfer learning                   94.97%
Rahman, A.-u. [29]              1483               Transfer learning                   90.02%
C.A. Hamm et al. [30]           494                CNN                                 92%
N. Patel and A. Mishra [31]     85                 Back-propagation neural network     98%
H. Xie et al. [32]              1018               2D-CNN, Faster R-CNN                86%
Y. Wang et al. [33]             446                ResNet                              78%
M. Toğaçar, Z. [34]             3297               SVM, CNN                            92%

Here are some additional statistics about oral cancer:

• Oral cancer is the sixth most common cancer in the world. India contributes one third of the total number of oral cancer cases; tobacco and alcohol usage is one of the key causes. According to the World Health Organization (WHO), tobacco is used by around 90% of patients with oral cancer, whether in the form of cigarettes, chewing tobacco, or snuff. Alcohol use raises the chance of getting oral cancer when combined with tobacco use [17].
• Men are twice as likely as women to develop oral cavity and oropharyngeal cancers.
• Oral cancer is most commonly diagnosed in people over the age of 40, although it can occur at any age. The average age at the time of diagnosis is 62 years.
• Tobacco use is the leading cause of oral cancer, with alcohol use also being a significant risk factor.
• Human papillomavirus (HPV) infection is a growing cause of oral cancer, particularly in younger people.
• The five-year survival rate for all stages of oral cavity and oropharyngeal cancers combined is approximately 66%. However, survival rates vary depending on the stage of cancer at the time of diagnosis. For example, the five-year survival rate for localized cancers is approximately 84%, while it drops to 39% for cancers that have spread to distant sites.
• The use of oral cancer screening tests, such as visual exams and tissue biopsies, can help detect the disease early and improve survival rates.

Oral cancer is a significant public health concern in India. According to the World Health Organization, oral cancer is the most common cancer among men and the third most common cancer among women in India. Here are some statistics on oral cancer in India:

∙ Incidence: In 2020, it was estimated that there were around 77,000 new cases of oral cancer in India.
∙ Mortality: Oral cancer is responsible for approximately 47,000 deaths annually in India.
∙ Age group: Oral cancer affects individuals in their mid-fifties on average in India, with incidence rates increasing with age.
∙ Gender: Men are more likely to develop oral cancer than women. In fact, the incidence of oral cancer is two times higher among men than women in India.
∙ Tobacco: The use of tobacco products, including smoking and chewing tobacco, is the primary cause of oral cancer in India.
∙ Alcohol: Alcohol consumption is also a significant risk factor for oral cancer in India.
∙ Geographical distribution: Oral cancer is more prevalent in certain regions of India, including the northeastern states, Uttar Pradesh, and Bihar.

It is important to note that these statistics are generalizations, and individual cases may vary depending on a variety of factors, including the type and stage of cancer, the individual's overall health, and the effectiveness of treatment [16,18].

The difficulty in diagnosing the disease at an early, curable stage has been blamed for the high mortality rate in oral cancer patients. Furthermore, because early carcinomas are asymptomatic, the majority of cases of oral cancer are usually detected in advanced stages. Current diagnostic methods rely on clinical and histological exams, which have low sensitivity and a high chance of missing malignant lesions in hidden locations [19–21].

Regular dental checkups are one of the most effective methods for detecting oral cancer early. The dentist will look for any abnormalities or questionable spots in the mouth or throat during a dental exam. If a concerning spot is discovered, the dentist may perform a biopsy to determine the presence of malignancy. Early detection and treatment can significantly improve outcomes, so it is important to maintain regular oral cancer screenings and to seek medical attention if you notice any unusual symptoms or changes in your mouth or throat [15,22,23].

∙ Complexity of the problem: The level of difficulty in detecting skin cancer through image processing depends on the algorithms and methodologies used. Image processing for skin cancer diagnosis involves many phases, including image capture, image preprocessing, feature extraction, and classification.

◦ Diversity of skin lesions: Skin lesions vary widely in shape, color, texture, and appearance. Some may be subtle and difficult to detect, while others may be evident. Because of this diversity, developing a universal strategy for skin cancer screening is difficult.
◦ Image quality: The quality of images used for skin cancer detection might vary greatly depending on the imaging instrument and the circumstances. When the image quality is inadequate, accurately detecting and diagnosing skin lesions might be difficult.
◦ Algorithm complexity: Image processing techniques such as edge detection, feature extraction, segmentation, and classification are just a few of the many that can be used to identify skin cancer. Choosing the best set of algorithms for a specific task might be tricky.

4. Deep learning recent approaches in Computer-Aided Diagnosis cancer detection (CAD)

Deep learning algorithms have achieved promising results in computer-aided cancer diagnosis (CAD) across multiple cancer types [24–26]. Some recent research publications that indicate deep learning's potential in this field are summarized in Table 2.

Deep learning models have shown promising results in the field of medical imaging analysis and diagnosis, and skin cancer detection using deep learning is an active area of research. The development of accurate and efficient deep learning models for skin cancer detection could potentially improve the accuracy and speed of diagnosis, making it easier to detect and treat skin cancer at an early stage [35–37].

5. Literature review

Early detection of skin cancer and oral cancer is critical for successful treatment, and computer-aided diagnosis (CAD) systems can help in the detection of skin cancer. Over the years, various imaging


techniques have been developed to help in the early detection of skin and oral cancer. One of the emerging techniques is the use of convolutional neural networks (CNNs), which have shown promising results in the detection and classification of skin cancer and oral cancer from medical images. This literature review aims to explore the use of CNNs in skin cancer and oral cancer detection and classification.

Dermoscopy is a non-invasive imaging technique that has been widely used for the detection of skin cancer. Several studies have reported that dermoscopy can improve the accuracy of clinical diagnosis of skin cancer, especially for melanoma [38].

Reflectance confocal microscopy (RCM) is a non-invasive imaging technique that allows for in vivo imaging of skin at cellular resolution. Several studies have shown that RCM can improve the diagnostic accuracy of skin cancer, especially for difficult-to-diagnose lesions [39].

Artificial intelligence (AI) and machine learning (ML) algorithms have been increasingly used for the detection and diagnosis of skin cancer. Several studies have reported high sensitivity and specificity of AI/ML algorithms for the detection of skin cancer using dermoscopy and/or RCM images [40].

Kousis et al. [41] used a CNN and achieved an accuracy of about 92% on the HAM10000 dataset. The authors chose the DenseNet 169 CNN. The model was trained with a relatively small number of images; by providing more images, the model could be trained further.

Mohakud et al. [42] used a CNN with the grey wolf optimization algorithm. The proposed method achieved a high level of accuracy, and it can be used in its current form, or with minor modifications to the algorithm, to detect more than one type of cancer.

The authors of [43] applied four separate CNN models based on ResNet, SqueezeNet, DenseNet, and Inception v3, in that order. They used the HAM10000 dataset to train and test them, which was separated into training (80%), validation (10%), and test (10%) sets. For the two-class prediction instance, the best model (Inception v3) achieved 95.74%. The model was trained with a small number of images and for two classes only; by providing more images, it could be trained further.

The authors of [44] created a model for skin disease categorization that combined the MobileNet and LSTM architectures. For training and evaluation, they used the HAM10000 dataset. The authors also developed a mobile app using MobileNet and LSTM.

Oral visual examination is the primary method for the detection of oral cancer. However, the accuracy of visual examination depends on the experience and skill of the examiner [45].

Autofluorescence imaging (AFI) is a non-invasive imaging technique that uses a special light to highlight abnormal tissue in the mouth. Several studies have shown that AFI can improve the accuracy of oral cancer detection [46].

Narrow-band imaging (NBI) is a type of endoscopic imaging that enhances the visibility of blood vessels in the mucosa. Several studies have reported high accuracy of NBI for the detection of oral cancer [47].

AI and ML algorithms have also been used for the detection of oral cancer using various imaging modalities, such as AFI and NBI. Several studies have reported high sensitivity and specificity of AI/ML algorithms for the detection of oral cancer [48].

''Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm'' by Han et al. [49]: The study utilized a deep learning approach for skin cancer diagnosis. The authors trained a CNN model using 129,450 clinical images of skin lesions, including benign and malignant melanomas. The model achieved a sensitivity of 90.0% and specificity of 92.1% in classifying melanoma versus benign nevi.

''Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging'' by Codella et al. (2018): The authors present a challenge to develop a CNN-based model to diagnose skin lesions, particularly melanoma. The challenge used a dataset of over 2000 dermoscopy images. The top-performing algorithm achieved an area under the curve (AUC) of 0.86, which outperformed the performance of 157 dermatologists who participated in the study [50].

''Automated Skin Cancer Detection through Spectral-Spatial Feature Fusion and Deep Learning'' by Zhang et al. (2019): The study proposed a novel approach to skin cancer detection that combined spectral and spatial features with a deep learning model. The authors trained the model using a dataset of 2000 dermoscopy images and achieved an AUC of 0.955 for melanoma detection and an AUC of 0.916 for basal cell carcinoma detection.

''Dermatologist-level classification of skin cancer with deep neural networks'' by Esteva et al. (2017): The authors trained a CNN model using over 130,000 clinical images of skin lesions to classify images into three categories: benign, malignant, or non-neoplastic. The model achieved a sensitivity of 95% and specificity of 84%, which outperformed the performance of 21 board-certified dermatologists who participated in the study [40].

''A Novel Multi-task Deep Learning Model for Skin Lesion Segmentation and Classification'': The authors proposed a multi-task CNN model for the simultaneous classification of seven types of skin lesions, including melanoma, nevus, and basal cell carcinoma. The model was trained using a dataset of over 10,000 dermoscopy images and achieved an overall accuracy of 83.6% [51].

In conclusion, CNNs have shown great potential in the detection of skin cancer and oral cancer from medical images. These studies provide evidence that CNNs can accurately detect and classify skin cancer and oral cancer, and can potentially be used as a tool to aid in the early diagnosis of skin cancer and oral cancer. These studies utilized different approaches and datasets, but all achieved significant levels of accuracy in diagnosing skin lesions. The use of CNN models in dermatology has the potential to improve diagnostic accuracy and reduce unnecessary biopsies, leading to better patient outcomes.

6. Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a type of deep learning neural network that is commonly used for image recognition and computer vision tasks.

CNNs are designed to recognize patterns in visual data by processing images through multiple layers of convolutions and pooling operations. Each layer consists of a set of filters that are convolved with the input image, producing a set of activation maps that highlight the presence of certain features in the image. Pooling layers then downsample the activation maps, reducing the size of the data and capturing the most salient information [52,53].

CNNs also typically include fully connected layers at the end of the network, which take the output of the convolutional and pooling layers and use it to classify the input image into one of several possible categories.

CNNs have achieved state-of-the-art results on a wide range of computer vision tasks, including image classification, object detection, and segmentation. They have also been used in other domains such as natural language processing and speech recognition.

CNN architecture:
The architecture of a CNN used for skin cancer diagnostics typically includes convolutional layers, pooling layers, fully connected layers, and output layers. The following is a summary of each layer of a CNN.

Convolutional layers: As Fig. 2 shows, these layers employ a collection of trainable filters or kernels to perform the convolution process on the input image. The filters help to extract relevant information from the input image, such as texture patterns, corners, and edges.

Pooling layers: These layers downsample the output of the convolutional layers by taking the maximum or average value of a group of pixels. Pooling reduces the spatial scale of the input image, increasing the model's resistance to input changes.


Fig. 2. Convolutional neural networks architecture [54].

Fig. 3. AlexNet architecture [56].

Fully connected layers: These layers connect every neuron in the previous layer to every neuron in the layer above. They help with both prediction and detecting the essential parts of the given image.

Output layer: The model's output layer generates the predicted class of the input image (for example, melanoma or non-melanoma), which is the model's ultimate output.

In addition to these layers, some CNN architectures incorporate dropout, batch normalization, and data augmentation to improve the model's performance and generalizability.

All the other CNN models work on the same basic CNN architecture shown in Fig. 2.

There are several methods to improve the performance of a CNN on image classification:

• Data augmentation is the process of creating new training data from existing data by performing various transformations such as rotation, scaling, flipping, and adding noise. Data augmentation can help improve the size and diversity of the training set, boosting the model's capacity to generalize.
• Transfer learning: Transfer learning entails fine-tuning pre-trained models that have been learned on big datasets for a specific goal. This strategy can help increase CNN performance with minimal training data while also reducing training time.
• Optimizers: Various optimizers, such as Adam, RMSprop, and SGD, can be used to update the CNN weights during training. Choosing the right optimizer can improve the model's training speed and convergence rate.
• CNN architecture design: The CNN architecture design is critical in deciding the model's performance. To increase the model's accuracy and save training time, techniques such as depthwise separable convolution, residual connections, and attention mechanisms can be applied.
• Tuning hyperparameters: Hyperparameters such as the learning rate, batch size, and number of epochs can all have a major impact on the model's performance. The accuracy of the model can be improved by tuning these hyperparameters using strategies such as grid search or random search.

The following subsections discuss the various Convolutional Neural Networks (CNNs).

6.1. AlexNet

AlexNet is a convolutional neural network that was introduced by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012. It was designed to participate in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which is an annual competition that aims to evaluate the performance of computer vision algorithms for image recognition tasks.

AlexNet consists of eight layers, including five convolutional layers and three fully connected layers. It was one of the first deep convolutional neural networks to use rectified linear units (ReLU) instead of the more traditional sigmoid activation function. The use of ReLU allowed AlexNet to train faster and achieve better performance than previous networks. Fig. 3 shows the AlexNet architecture [55].

As Fig. 3 shows, each layer of AlexNet performs a specific operation on the input data, which is typically an image. The input to the first layer is a 227 × 227 × 3 image (where 227 × 227 is the size of the image and 3 is the number of color channels), and the output of the last layer is a vector of probabilities indicating the likelihood of each of the 1000 possible classes in the ImageNet dataset.

Input layer:
The input layer of AlexNet takes an RGB image of size 227 × 227 × 3, where 227 × 227 is the spatial size of the image and 3 is the number of color channels.

Convolutional layer 1:
The first convolutional layer in AlexNet applies 96 filters of size 11 × 11 × 3 to the input image with a stride of 4 and no padding. The output of this layer is passed through a ReLU activation function, which introduces non-linearity into the output. The mathematical representation of this layer can be written as:

$H^1 = \mathrm{ReLU}(W^1 * X + b^1)$ (1)

where $X$ is the input image, $W^1$ is the set of 96 filters of size 11 × 11 × 3, and $b^1$ is the bias term for each filter.

Max pooling layer 1:
After the first convolutional layer, a max pooling layer is applied to the output with a filter size of 3 × 3 and a stride of 2. The mathematical representation of this layer is:

$P^1_{ijk} = \max(Z^1_{2i-1:2i+1,\,2j-1:2j+1,\,k})$ (2)

Convolutional layer 2:
The second convolutional layer in AlexNet applies 256 filters of size 5 × 5 × 48 to the output of the first max pooling layer with a stride of 1 and padding of 2. The output of this layer is passed through a ReLU activation function. The mathematical representation of this layer can be written as:

$H^3 = \mathrm{ReLU}(W^2 * H^2 + b^2)$ (3)

where $W^2$ is the set of 256 filters of size 5 × 5 × 48 and $b^2$ is the bias term for each filter.

Similar to the first max pooling layer, a second max pooling layer is applied to the output of the second convolutional layer with a filter


size of 3 × 3 and a stride of 2. The mathematical representation of this layer is the same as that of max pooling layer 1:

$P^2_{ijk} = \max(Z^2_{2i-1:2i+1,\,2j-1:2j+1,\,k})$ (4)

Convolutional layer 3:
The third convolutional layer in AlexNet applies 384 filters of size 3 × 3 × 256 to the output of the second max pooling layer with a stride of 1 and a padding of 1:

$H^4 = \mathrm{ReLU}(W^3 * H^3 + b^3)$ (5)

where $W^3$ is the set of 384 filters of size 3 × 3 × 256 and $b^3$ is the bias term for each filter.

The third layer of AlexNet is a max pooling layer with a pool size of 3 × 3 and a stride of 2. Let us denote the output feature map of the third convolutional layer as $Z^3$ and the output feature map of the third pooling layer as $P^3$. Then, the output of the third pooling layer is obtained by taking the maximum value over each non-overlapping 3 × 3 region of the third convolutional layer. This can be represented mathematically as:

$P^3_{ijk} = \max(Z^3_{2i-1:2i+1,\,2j-1:2j+1,\,k})$ (6)

where $P^3_{ijk}$ is the (i, j, k)th element of the output feature map of the third pooling layer, $Z^3_{2i-1:2i+1,\,2j-1:2j+1,\,k}$ is the 3 × 3 region of the kth channel of the third convolutional layer centered at (2i, 2j), and max() takes the maximum value over the elements of the 3 × 3 region.

Convolutional layer 4:
The fourth convolutional layer of AlexNet has 384 filters, each with a size of 3 × 3 × 192. The convolution operation is performed with a stride of 1 and zero-padding of size 1:

$H^5 = \mathrm{ReLU}(W^4 * H^4 + b^4)$ (7)

The fourth layer of AlexNet is also a max pooling layer with a pool size of 3 × 3 and a stride of 2. Let us denote the output feature map of the fourth convolutional layer as $Z^4$ and the output feature map of the fourth pooling layer as $P^4$. Then, the output of the fourth pooling layer is obtained by taking the maximum value over each non-overlapping 3 × 3 region of the fourth convolutional layer. This can be represented mathematically as:

$P^4_{ijk} = \max(Z^4_{2i-1:2i+1,\,2j-1:2j+1,\,k})$ (8)

where $P^4_{ijk}$ is the (i, j, k)th element of the output feature map of the fourth pooling layer, $Z^4_{2i-1:2i+1,\,2j-1:2j+1,\,k}$ is the 3 × 3 region of the kth channel of the fourth convolutional layer centered at (2i, 2j), and max() takes the maximum value over the elements of the 3 × 3 region.

Convolutional layer 5:
The fifth convolutional layer of AlexNet has 256 filters, each with a size of 3 × 3 × 192. The convolution operation is performed with a stride of 1 and zero-padding of size 1 on the output feature map of the fourth layer:

$H^6 = \mathrm{ReLU}(W^5 * H^5 + b^5)$ (9)

After the fifth convolutional layer, AlexNet applies max pooling with a stride of 2 and a filter size of 3 × 3 on the output feature map. Let us denote the output feature map of the fifth convolutional layer as $Z^5$ and the output feature map after max pooling as $P^5$. Then, the max pooling operation is defined as follows:

$P^5_{ijk} = \max(Z^5_{2i-1:2i+1,\,2j-1:2j+1,\,k})$ (10)

AlexNet uses three fully connected layers, also known as dense layers. The output of the last max pooling layer is flattened into a vector and fed into the first fully connected layer. Here is the detail with equations.

Flattening layer:
The output feature map of the last max pooling layer is a 6 × 6 × 256 tensor. To pass it into the fully connected layers, it needs to be flattened into a 1D vector. This can be done using the reshape function:

$F = \mathrm{reshape}(P^5, [1, 6 \times 6 \times 256])$ (11)

where $F$ is the flattened feature vector, $P^5$ is the output feature map of the last max pooling layer, and reshape() converts the 6 × 6 × 256 tensor into a 1D vector of size 1 × 9216.

Fully connected layer 1:
The first fully connected layer of AlexNet has 4096 neurons. It takes the flattened feature vector $F$ as input and applies a matrix multiplication with a weight matrix $W^5$ of size 9216 × 4096, followed by an element-wise addition of a bias vector $b^5$ of size 1 × 4096. The output is passed through the ReLU activation function:

$H^6 = f(W^5 * F + b^5)$ (12)

where $H^6$ is the output of the first fully connected layer, $f()$ is the ReLU activation function, * represents the matrix multiplication operation, and + represents the element-wise addition operation.

Fully connected layer 2:
The second fully connected layer of AlexNet also has 4096 neurons. It takes the output of the first fully connected layer $H^6$ as input and applies a matrix multiplication with a weight matrix $W^6$ of size 4096 × 4096, followed by an element-wise addition of a bias vector $b^6$ of size 1 × 4096. The output is passed through the ReLU activation function:

$H^7 = f(W^6 * H^6 + b^6)$ (13)

where $H^7$ is the output of the second fully connected layer.

Fully connected layer 3:
The final fully connected layer of AlexNet has 1000 neurons, corresponding to the 1000 classes in the ImageNet dataset. It takes the output of the second fully connected layer $H^7$ as input and applies a matrix multiplication with a weight matrix $W^7$ of size 4096 × 1000, followed by an element-wise addition of a bias vector $b^7$ of size 1 × 1000. The output is passed through the softmax activation function:

$Y = \mathrm{softmax}(W^7 * H^7 + b^7)$ (14)

where $Y$ is the output of the final fully connected layer, which represents the predicted probabilities of the input image belonging to each of the 1000 classes in the dataset. The softmax function normalizes the output into a probability distribution over the classes.
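To make Eqs. (1)–(14) concrete, the following is a minimal Keras sketch of an AlexNet-style network, assuming the TensorFlow/Keras stack listed in Table 7. It follows the filter counts and sizes given above, with pooling after the first, second, and fifth convolutions (the arrangement that yields the 6 × 6 × 256 tensor flattened in Eq. (11)); it is an illustrative reconstruction, not the original AlexNet code.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_alexnet(num_classes=1000):
    # Illustrative AlexNet-style stack; layer sizes follow the text above.
    return models.Sequential([
        layers.Input(shape=(227, 227, 3)),
        layers.Conv2D(96, 11, strides=4, activation="relu"),        # Eq. (1)
        layers.MaxPooling2D(pool_size=3, strides=2),                # Eq. (2)
        layers.Conv2D(256, 5, padding="same", activation="relu"),   # Eq. (3)
        layers.MaxPooling2D(pool_size=3, strides=2),                # Eq. (4)
        layers.Conv2D(384, 3, padding="same", activation="relu"),   # Eq. (5)
        layers.Conv2D(384, 3, padding="same", activation="relu"),   # Eq. (7)
        layers.Conv2D(256, 3, padding="same", activation="relu"),   # Eq. (9)
        layers.MaxPooling2D(pool_size=3, strides=2),                # Eq. (10)
        layers.Flatten(),                                           # Eq. (11)
        layers.Dense(4096, activation="relu"),                      # Eq. (12)
        layers.Dense(4096, activation="relu"),                      # Eq. (13)
        layers.Dense(num_classes, activation="softmax"),            # Eq. (14)
    ])

model = build_alexnet()
model.summary()  # the feature map before Flatten is 6 x 6 x 256 = 9216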
6.2. VGGNet

Fig. 4. VGGNet architecture [57].

VGGNet is a deep convolutional neural network architecture proposed by Karen Simonyan and Andrew Zisserman in 2014. It was designed to improve the performance of convolutional neural networks (CNNs) on image classification tasks by using smaller filter sizes and deeper network structures. This section explains VGGNet in detail, including the mathematical equations involved in its architecture.

VGGNet consists of 19 layers (excluding the input layer), as can be seen in Fig. 4, and is composed of a series of convolutional layers, followed by pooling layers, and then fully connected layers. The architecture can be summarized as follows:


As per Fig. 4: Input (224 × 224 × 3) → Convolutional Layers → Pooling Layers → Fully Connected Layers → Output (1000 classes).

The first layer in the network is an input layer, which takes an RGB image of size 224 × 224 × 3. The input is then passed through a series of convolutional layers, where each layer applies a set of learnable filters to the input. The output of each convolutional layer is then passed through a rectified linear unit (ReLU) activation function, which introduces non-linearity into the network.

After several convolutional layers, the output is passed through pooling layers, which reduce the size of the feature maps and introduce some level of translation invariance. VGGNet uses max pooling, where the maximum value within a pool of neurons is taken as the output.

Finally, the output of the last pooling layer is passed through fully connected layers, which perform the final classification. The output of the network is a softmax function over the 1000 classes in the ImageNet dataset.

Convolutional layers:
VGGNet uses convolutional layers with small 3 × 3 filters, which have been shown to be more effective than larger filters. Each convolutional layer applies a set of learnable filters to the input and produces an output feature map. The output feature map has the same depth as the number of filters used in the layer. The size of the output feature map is determined by the following equation:

output_size = (input_size − filter_size + 2 ∗ padding) / stride + 1

where input_size is the size of the input feature map, filter_size is the size of the filter, padding is the number of zero-padding pixels added to the border of the input feature map, and stride is the stride used to move the filter over the input feature map.
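The small helper below is a direct transcription of the equation above; the example values (3 × 3 filters with padding 1 and stride 1, and 2 × 2 pooling with stride 2) are typical VGGNet settings and are used here only for illustration.

def conv_output_size(input_size, filter_size, padding, stride):
    # output_size = (input_size - filter_size + 2*padding) / stride + 1
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(224, 3, 1, 1))  # 224: a 3x3 conv with padding 1 keeps the size
print(conv_output_size(224, 2, 0, 2))  # 112: a 2x2 max pool with stride 2 halves it

This is why a stack of 3 × 3 convolutions can be made deep without shrinking the feature maps, leaving all downsampling to the pooling layers.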
6.3. Inception

The Inception architecture was introduced by C. Szegedy et al. in 2014 and is distinguished by the application of numerous parallel convolutional layers with various filter sizes []. This allows the network to capture features at multiple scales and resolutions. Before sending the feature maps to the following layer, the Inception design also performs 1 × 1 convolutions to minimize their dimensionality.

In a traditional CNN architecture, a single filter size is used at each layer to capture local features. In contrast, Inception uses a combination of filters with different sizes to capture features at different scales. This is achieved by using a combination of 1 × 1, 3 × 3, and 5 × 5 convolutions in parallel, as well as a pooling operation. The output of each operation is concatenated and passed on to the next layer.

Mathematically, let us assume an input feature map $X$ to which we want to apply a set of filters $\{W_1, W_2, \ldots, W_n\}$ with different sizes to capture features at different scales. In a traditional CNN, we apply each filter separately to the input feature map:

$Y_i = f_i(X; W_i)$ (15)

where $f_i$ is a non-linear activation function and $Y_i$ is the output feature map produced by filter $W_i$.

In Inception, we apply a set of filters with different sizes in parallel and concatenate their outputs:

$Y = \mathrm{concat}(f_i(X; W_i))$ (16)

where ''concat'' denotes the concatenation operation and $f_i$ is a non-linear activation function.

In addition to the parallel filter sizes, Inception also includes 1 × 1 convolutions, which can be used to reduce the number of input channels and the computational cost of the network. These are known as bottleneck layers and are used to reduce the number of input channels before applying the more expensive 3 × 3 and 5 × 5 convolutions.

The output of each Inception module is then passed through a pooling layer, which can be used to downsample the feature map and capture global features.

In summary, Inception is a deep learning architecture that uses a combination of filters with different sizes in parallel to capture features at different scales. This allows the network to capture both local and global features and can improve the performance of the model. The use of 1 × 1 convolutions and pooling layers also helps to reduce the computational cost of the network.
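A minimal Keras sketch of one Inception module, in the sense of Eq. (16), is given below: parallel 1 × 1, 3 × 3, and 5 × 5 branches (the larger ones behind 1 × 1 bottlenecks) plus a pooling branch, concatenated along the channel axis. The branch widths are illustrative assumptions, not values taken from this article.

import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1=64, f3=128, f5=32, fpool=32):
    # Branch 1: plain 1x1 convolution
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    # Branch 2: 1x1 bottleneck followed by a 3x3 convolution
    b3 = layers.Conv2D(f3 // 2, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b3)
    # Branch 3: 1x1 bottleneck followed by a 5x5 convolution
    b5 = layers.Conv2D(f5 // 2, 1, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b5)
    # Branch 4: pooling followed by a 1x1 convolution
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    # Eq. (16): Y = concat(f_i(X; W_i)) along the channel axis
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])

inputs = layers.Input(shape=(224, 224, 3))
model = tf.keras.Model(inputs, inception_module(inputs))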


6.4. ResNet

Fig. 5. Residual block [60].

Residual Network (ResNet) is a type of deep neural network that has been widely used in computer vision tasks such as image recognition, object detection, and segmentation. ResNet was introduced by Kaiming He et al. in their paper ''Deep Residual Learning for Image Recognition'' in 2016 [58,59].

A ResNet layer consists of two main components: the identity shortcut connection and the residual block, as shown in Fig. 5.

In summary, a ResNet layer takes an input $x$, applies a series of convolutional layers with batch normalization and activation functions to obtain $F(x)$, adds an identity shortcut connection to preserve the information from the input, and finally adds the output of the convolutional layers and the identity shortcut connection to obtain the output of the layer.
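The following is a minimal Keras sketch of the residual block just summarized: two convolutions with batch normalization produce F(x), the identity shortcut carries x around them, and the block outputs ReLU(F(x) + x). The filter count is an illustrative assumption.

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x  # identity shortcut connection
    y = layers.Conv2D(filters, 3, padding="same")(x)   # F(x), first conv
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)   # F(x), second conv
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                    # F(x) + x
    return layers.Activation("relu")(y)

inputs = layers.Input(shape=(56, 56, 64))  # channels must match the shortcut
model = tf.keras.Model(inputs, residual_block(inputs))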
6.5. DenseNet

Fig. 6. Demonstrates the block of DenseNet [60].

Dense Convolutional Network (DenseNet) is a deep learning architecture that was introduced in 2017 by researchers at Facebook AI Research. DenseNet is based on the idea of dense connections, which allow feature maps to be reused and combined in a highly efficient manner.

In a traditional CNN architecture, each layer takes the output of the previous layer as its input. In contrast, DenseNet layers take as input the concatenation of all previous feature maps. This is known as a dense block, and it allows each layer to receive direct input from all previous layers.

Mathematically, let us assume a set of feature maps $\{X_0, X_1, \ldots, X_{K-1}\}$ that needs to be combined into a new set of feature maps $X_K$. A typical convolutional layer would compute $X_k = f_k([X_0, X_1, \ldots, X_{K-1}]; \theta_k)$, where $[X_0, X_1, \ldots, X_{K-1}]$ is the concatenation of all previous feature maps and $f_k$ is a function parameterized by $\theta_k$, as per Fig. 6.


Fig. 7. DenseNet architecture [60].

In a dense block, however, the output feature maps are computed as per Fig. 7 and the following equation:

$X_k = H_k([X_0, X_1, \ldots, X_{k-1}])$ (17)

where $H_k$ is a set of non-linear transformations that are applied to the concatenation of all previous feature maps.

In other words, the output feature maps of each layer in a dense block are the concatenation of all previous feature maps and the output of the non-linear transformation applied to them.

This allows feature maps to be reused and combined in a highly efficient manner, since each layer has access to all the previous feature maps. This also leads to a significant reduction in the number of parameters in the network, since each layer only needs to learn a small number of additional filters.

DenseNet architectures typically consist of a series of dense blocks, with each block followed by a transition layer that reduces the number of feature maps by using a 1 × 1 convolution and a pooling layer. The last layer of the network is a fully connected layer that outputs the class probabilities.
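A minimal Keras sketch of the dense block of Eq. (17) and the transition layer described above follows; each new layer receives the concatenation of all previous feature maps. The growth rate and block depth are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    features = [x]
    for _ in range(num_layers):
        # H_k applied to the concatenation [X_0, X_1, ..., X_{k-1}], Eq. (17)
        h = features[0] if len(features) == 1 else layers.Concatenate(axis=-1)(features)
        h = layers.BatchNormalization()(h)
        h = layers.Activation("relu")(h)
        h = layers.Conv2D(growth_rate, 3, padding="same")(h)
        features.append(h)  # X_k is reused by every later layer
    return layers.Concatenate(axis=-1)(features)

def transition_layer(x, out_channels):
    # 1x1 convolution plus pooling, reducing the number of feature maps
    x = layers.Conv2D(out_channels, 1, padding="same")(x)
    return layers.AveragePooling2D(2, strides=2)(x)

inputs = layers.Input(shape=(56, 56, 64))
model = tf.keras.Model(inputs, transition_layer(dense_block(inputs), 128))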
6.6. GNN

Graph Neural Networks (GNNs) are a type of neural network that can model and analyze graph-structured data. They have become increasingly popular in recent years due to their ability to represent and process complex relationships between entities, such as molecules, social networks, and biological networks, among others.

Advantages of GNNs include the ability to model graph-structured data, the ability to capture complex relationships, the ability to handle variable input sizes, and the ability to integrate multiple data modalities.

Potential for transfer learning: GNNs can be trained on one graph and transferred to another graph with similar structure, enabling them to be used in different contexts or domains.

To describe the GNN with the mathematics in depth, we start by introducing some notation. Let us assume that we have a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. Each node $i \in V$ has a feature vector $x_i \in \mathbb{R}^d$, and each edge $(i, j) \in E$ has a weight $w_{ij}$.

The goal of a GNN is to learn a representation $h_i \in \mathbb{R}^h$ for each node $i$, which captures both its own features and the features of its neighboring nodes. The key idea behind a GNN is to define a message passing scheme, which allows nodes to aggregate information from their neighbors and update their own representations [61].

One common message passing scheme is the Graph Convolutional Network (GCN), which can be defined as follows:

$h_i = \sigma\left(\sum_{j \in N(i)} w_{ij} \, h_j W\right)$ (18)

where $N(i)$ is the set of neighboring nodes of node $i$, $W$ is a learnable weight matrix, and $\sigma(\cdot)$ is an activation function. In this equation, the term $\sum_{j \in N(i)} w_{ij} \, h_j$ represents the aggregated information from the neighbors of node $i$, which is then combined with its own representation $h_i$ through a linear transformation $W$.

The above equation can be rewritten as (see Fig. 8):

$h_i = \sigma\left(W \sum_{j \in N(i)} w_{ij} \, h_j\right)$ (19)

This form shows that the GCN is simply a linear transformation of the aggregated neighbor features, followed by an activation function.

Fig. 8. Demonstrates the architecture of GNN [62].

Another message passing scheme that has gained popularity in recent years is the Graph Attention Network (GAT), which uses an attention mechanism to selectively attend to important neighbors. The GAT can be defined as follows:

$e_{ij} = \alpha(x_i, x_j) = \mathrm{softmax}(\mathrm{LeakyReLU}(a^T [W x_i \parallel W x_j]))$ (20)

where $\alpha(x_i, x_j)$ is the attention weight between node $i$ and node $j$, LeakyReLU is a non-linear activation function, $a$ and $W$ are learnable weight matrices, and $\parallel$ denotes concatenation. The attention weight $\alpha(x_i, x_j)$ is computed as a function of the feature vectors $x_i$ and $x_j$ and the weight matrices $a$ and $W$. The softmax function normalizes the attention weights, ensuring that they sum to one.

The above equations show how a GNN can be used to learn node representations that capture both the local and global structure of the graph. By aggregating information from neighbors and updating node representations, a GNN can learn complex relationships between nodes in the graph, making it a powerful tool for a wide range of applications. The GNN will be used for oral cancer, as the dataset available for oral cancer is small compared with the dataset available for skin cancer.

The following steps summarize how to implement a GNN (a small numeric sketch of the message-passing update follows the list):

• The initial step in developing GNNs is to represent the input data as a graph. A graph is described as a collection of nodes and edges, where nodes represent entities and edges reflect relationships between them. Nodes in GNNs can represent any form of object, such as social network users, atoms in a chemical compound, or words in a paper.
• Node characteristics: Each node in the graph has a feature vector that describes its qualities or characteristics. These characteristics might be obtained from the traits of the entity represented by the node or learned from the data.
• Message passing: Message passing between nodes is the primary operation in GNNs. Message passing involves the exchange of information between nodes in a graph in order to update their representations. The message sent from node $i$ to node $j$ is determined by the characteristics of node $i$ and the edge between $i$ and $j$. This message is then utilized to update node $j$'s representation.
• Aggregation occurs when the message has been transmitted from node $i$ to all of its neighboring nodes $j$, and these messages are aggregated to construct a new representation for node $i$. Any function that takes in a set of vectors and returns a single vector can be used as the aggregation function. Summation is a common aggregating function in which messages are simply added up.


• After the updated node representations have been computed, they are passed through one or more neural network layers to generate the final output. These neural network layers might be of any sort, such as fully connected layers or convolutional layers.
• The final output of the GNN is determined by the task being performed. In node classification, for example, the GNN is trained to predict a node's label based on its features and the features of its neighbors. In graph classification, the GNN is trained to predict the label of an entire graph based on its structure and node attributes.
7. Dataset

A dataset is a collection of data that is used to train, validate, and test machine learning models. It is a critical component of any machine learning project, as the quality and quantity of the data can have a significant impact on the performance of the model. A good dataset should be large, diverse, balanced, and free from errors and biases. It should also be annotated with labels or annotations that provide the ground truth for the problem being solved. Access to high-quality datasets is essential for advancing the field of machine learning, as it allows researchers and developers to test and improve their models on a wide range of problems and applications.

One popular skin cancer dataset is the International Skin Imaging Collaboration (ISIC) dataset, which contains over 25,000 clinical images of skin lesions, including malignant melanoma, nevi, and seborrheic keratosis. The images were collected from 17 different sources, including clinical settings, dermoscopy devices, and consumer-grade cameras. The dataset also includes additional metadata, such as patient age, sex, and diagnosis [63–65].

Another skin cancer dataset is the HAM10000 dataset, which contains over 10,000 images of skin lesions. The dataset includes seven different types of skin lesions, including melanoma, nevus, and basal cell carcinoma. The images were collected from a variety of sources, including clinical settings and consumer-grade cameras, and were manually annotated by dermatologists. HAM10000 is a subset of ISIC [65].

The PAD-UFES-20 dataset was collected in conjunction with the Dermatological and Surgical Assistance Program (PAD) at the Federal University of Espírito Santo (UFES) in Brazil. The dataset consists of 2298 samples of six different types of skin lesions, including Basal Cell Carcinoma (BCC), Squamous Cell Carcinoma (SCC), Actinic Keratosis (ACK), Seborrheic Keratosis (SEK), Bowen's disease (BOD), Melanoma (MEL), and Nevus (NEV) [66].

There are 170 images in the MED-NODE dataset, of which 100 are nevi and 70 are melanomas.

A dataset for histopathologic oral cancer detection using CNNs (HOCD) contains a total of 5192 histopathologic images of oral lesions, including 2494 benign lesions and 2698 malignant lesions.

8. Preprocessing

This article applies CNN machine learning to a dataset of skin lesions. Some data preprocessing must be done on the data before using a CNN for that purpose.

Image resizing: This technique involves resizing all the images in the dataset to a standard size. This can help in reducing the size of the dataset, making it easier to work with, and reducing the time taken for training the model.

Image normalization: Normalization involves adjusting the pixel values of the images so that they have a common scale. This can help in improving the performance of the model, as it makes it easier to compare images.

Image augmentation: This technique involves generating new images from the existing ones by applying transformations such as rotation, scaling, and flipping. This can help in increasing the size of the dataset and making it more diverse, which can help in improving the performance of the model.

Image filtering: Filtering involves applying filters such as Gaussian, median, or Sobel filters to the images. This can help in removing noise from the images and making them clearer, which can improve the accuracy of the model.

Image segmentation: Segmentation involves dividing the image into smaller parts based on certain features such as color or texture. This can help in identifying specific areas of interest in the image, which can be useful for diagnosis or treatment of skin cancer.

Feature extraction: Feature extraction involves extracting relevant features from the images such as color, texture, and shape. These features can then be used as inputs to the model for classification or detection of skin cancer.

Preprocessing techniques applied to the dataset (a short code sketch of this pipeline follows the list):

1. Image resizing: In deep learning, images are typically represented as matrices of pixel values. Larger images require more memory to store and process, which can become a bottleneck when working with limited computational resources. Resizing an image to a smaller size reduces the memory footprint, making it easier to work with.
2. Image augmentation to increase the training data: In deep learning, the performance of the model is often limited by the size and quality of the training dataset. By augmenting the dataset, we can increase the amount of data available for training, which can lead to better performance and reduced overfitting. Some common image augmentation techniques include:

a. Rotation: the process of rotating an image by a specific angle to produce new versions of the original image.
b. Flip and mirror: flipping or mirroring an image horizontally or vertically to produce new versions.
c. Scaling and cropping: scaling an image or cropping a piece of it to produce new versions.
d. Shifting: shifting an image horizontally or vertically to create new versions.
e. Adding noise: adding random noise to an image to produce new versions.
f. Brightness and contrast: changing the brightness and contrast of an image to produce new variations.
g. Shearing: the process of distorting an image along a specified axis to create new variations.
h. Gaussian blur: applying a Gaussian blur to an image to generate fresh variations.

Of the techniques mentioned above, rotation was used to enhance the size of the training dataset.

3. Image filtering: Image filtering techniques are used to remove noise, such as hair, from images. Hair removal in digital photos can be accomplished using a variety of image processing techniques. Here are a few examples of common techniques:

a. Median filtering is a technique for removing tiny hair-like features from images. Each pixel in the image is replaced by the median value of the neighboring pixels in median filtering. This aids in the smoothing of the image and the removal of tiny hair-like formations.
b. Thresholding: This technique is used to remove hair that is darker or lighter than the image's surrounding areas. Thresholding involves selecting a threshold value and replacing any pixels with values that are greater or lower than the threshold with values that match the surrounding pixels.
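The sketch below is one hedged way to combine the median filtering, morphological, thresholding, and inpainting techniques described above into a single hair-removal step using OpenCV. It follows the spirit of classic dermoscopic hair-removal pipelines rather than the exact procedure used in this study.

import cv2
import numpy as np

def remove_hair(image_bgr: np.ndarray) -> np.ndarray:
    """Illustrative hair removal: median filter, black-hat, threshold, inpaint."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 3)  # median filtering smooths tiny artifacts
    # Morphological black-hat highlights thin dark structures such as hairs.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 17))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    # Thresholding converts the hair response into a binary mask.
    _, mask = cv2.threshold(blackhat, 10, 255, cv2.THRESH_BINARY)
    # Inpainting fills the masked hair pixels from the surrounding skin.
    return cv2.inpaint(image_bgr, mask, 5, cv2.INPAINT_TELEA)

# Usage (file path is a placeholder):
# clean = remove_hair(cv2.imread("lesion.jpg"))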

The dataset used in this study includes a variety of skin anomalies. As indicated in Table 5, it comprises 27 419 images in total, of which 9562 (the melanoma, basal cell carcinoma, and squamous cell carcinoma classes) are malignant. In this article, all experiments are carried out on skin cancer: as Table 4 shows, very little oral cancer data is available in comparison.

Table 3
Insights of skin dataset (number of images per class and source).
Skin deformity class      ISIC     PDA
Actinic keratosis          867     730
Basal cell carcinoma      3323     845
Benign keratosis          2624       –
Dermatofibroma             239       –
Melanoma                  4522      52
Melanocytic nevus       12 900     244
Squamous cell carcinoma    628     192
Vascular lesion            253       –
Total                   25 356    2063

Table 4
Insights of oral dataset (number of images per class, HOCD).
Oral lesions class             HOCD
Oral squamous cell carcinoma   2698
Non-cancerous                  2494
Total                          5192

Table 5
Train/validation/test split of the skin dataset.
Class                     Train   Validation   Test    Total
Actinic keratosis          1180          320     97     1597
Basal cell carcinoma       2918          833    417     4168
Benign keratosis           1837          525    262     2624
Dermatofibroma              167           48     24      239
Melanoma                   3200          915    459     4574
Melanocytic nevus          9200         2628   1316   13 144
Squamous cell carcinoma     574          164     82      820
Vascular lesion             177           51     25      253
Total                    19 253         5484   2682   27 419

9. Experiment and result

The quality and quantity of the skin cancer data used to train the model can have a significant impact on its performance. It is critical to have a large amount of high-quality data that is diverse and representative of the various types of skin cancers and skin types in order to construct an accurate and reliable model.

Quality of data: data quality refers to the precision and dependability of the data. When it comes to skin cancer, data quality is vital because a correct diagnosis is critical for determining the tumor's aggressiveness and the appropriate course of treatment. If the data used to train the model is inaccurate, the model's performance may suffer, leading to misdiagnosis or incorrect predictions.

Quantity of data: data quantity refers to the amount of data utilized to train the model. In the case of skin cancer, more data will almost certainly improve the model's performance. A model that has been trained on only a small sample is likely to be less accurate and to suffer from overfitting, which occurs when the model fits the training data too closely and struggles to generalize to new data.
The following steps are involved in training a Convolutional Neural Network (CNN) for skin cancer detection (a code sketch illustrating the last two steps follows Table 7):

• Data Collection: Gather images of skin cancer from ISIC and PDA to create a dataset [63,64,66]. It is critical that the dataset have a balanced distribution of all kinds of skin cancer; an uneven dataset can induce bias in the model's predictions.
• Data Preprocessing: Prepare the dataset by resizing images to consistent sizes, converting them to grayscale or RGB, and normalizing pixel values to be between 0 and 1.
• Dataset Splitting: Divide the dataset into training, validation, and testing sets. The split used here is roughly 70% training, 20% validation, and 10% testing (see Table 5).
• Build a model: Create the CNN architecture with a framework such as Keras, PyTorch, or TensorFlow. Begin with a minimal design and progressively add complexity as needed.
• Train the model: Using an optimization algorithm, train the model on the training set. During training, the model learns to recognize patterns in the images and to classify them as skin cancer.

AlexNet, VGGNet, ResNet, and DenseNet models were implemented using TensorFlow and the Keras library in such a way that loss is minimized and accuracy is optimized at each epoch. The hyperparameter configuration is listed in Table 6, and the experiment's hardware and tools are listed in Table 7. AlexNet, VGGNet, ResNet, and DenseNet are compared in Table 8 in terms of training accuracy, validation accuracy, training loss, and validation loss.

Table 6
Hyperparameters for network training.
Batch size           32
Epochs               20
Learning rate        0.0001
Activation function  ReLU
Optimizer            Adam
Loss function        Binary cross entropy

Table 7
Hardware and tools used for network training.
IDE             Google Colab Pro
Processor       TPU (Tensor Processing Unit)
Version         Python 3 Google Compute Engine
RAM             36 GB
APIs/Libraries  Python 3, TensorFlow, Keras, Matplotlib
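As an illustration of the build and train steps, here is a minimal Keras sketch wired to the hyperparameters in Table 6 (batch size 32, 20 epochs, learning rate 0.0001, ReLU, Adam, binary cross entropy). The architecture is a small placeholder rather than one of the networks compared in Table 8, and the directory layout is an assumption.

import tensorflow as tf
from tensorflow.keras import layers

# Directory layout (benign/ and malignant/ subfolders under train/ and val/)
# is assumed for illustration; it is not described in this form in the paper.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "skin_dataset/train", label_mode="binary",
    image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "skin_dataset/val", label_mode="binary",
    image_size=(224, 224), batch_size=32)

# A deliberately small placeholder CNN; in the study, AlexNet, VGGNet,
# ResNet, and DenseNet take the place of this block.
model = tf.keras.Sequential([
    layers.Rescaling(1.0 / 255),             # normalize pixels to [0, 1]
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # binary benign/malignant output
])

# Hyperparameters from Table 6.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(train_ds, validation_data=val_ds, epochs=20)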


Table 8
Comparative study for AlexNet, VGGNet, ResNet and DenseNet (TA = training accuracy, VA = validation accuracy, TL = training loss, VL = validation loss).
CNN AlexNet VGGNet ResNet DenseNet
Epoch TA VA TL VL TA VA TL VL TA VA TL VL TA VA TL VL
1. 0.3738 0.3623 0.7116 0.7212 0.3218 0.2874 0.6575 0.6963 0.4715 0.4419 0.8081 1.1546 0.6572 0.5556 0.5274 0.9167
2. 0.4485 0.402 0.6836 0.6703 0.5671 0.5096 0.6333 0.6632 0.5873 0.4546 0.5984 0.8853 0.7579 0.5868 0.4665 0.7817
3. 0.4889 0.4496 0.6895 0.6751 0.5894 0.5485 0.5712 0.6082 0.6817 0.564 0.5965 0.737 0.7642 0.5951 0.3943 0.6295
4. 0.4922 0.4759 0.6839 0.6721 0.6098 0.5732 0.5527 0.6098 0.7085 0.5796 0.5933 0.7087 0.7964 0.6149 0.3562 0.5052
5. 0.5011 0.4954 0.6474 0.6632 0.6753 0.5707 0.4887 0.5547 0.7096 0.6071 0.588 0.6916 0.8053 0.6362 0.3476 0.4539
6. 0.5048 0.4933 0.6434 0.6598 0.7067 0.6895 0.4779 0.5083 0.7128 0.6136 0.5781 0.6849 0.8259 0.6403 0.3358 0.4096
7. 0.5089 0.5017 0.6358 0.6412 0.7398 0.7041 0.4752 0.5272 0.7144 0.6249 0.5729 0.6823 0.8474 0.6595 0.3261 0.3739
8. 0.5124 0.5065 0.6114 0.6385 0.7411 0.7148 0.4686 0.5291 0.7286 0.6585 0.5647 0.6492 0.8551 0.6734 0.3197 0.3638
9. 0.5176 0.5165 0.5947 0.6111 0.742 0.7229 0.4661 0.5543 0.7384 0.6676 0.5579 0.6581 0.8674 0.7037 0.3127 0.3581
10. 0.5199 0.5172 0.5373 0.5889 0.7444 0.7279 0.4463 0.5501 0.7436 0.6833 0.5467 0.641 0.8702 0.7613 0.3059 0.3474
11. 0.5222 0.5175 0.5395 0.5556 0.7457 0.7297 0.4471 0.5224 0.7643 0.6887 0.5263 0.6379 0.8839 0.7863 0.2948 0.3319
12. 0.5271 0.5222 0.5363 0.5442 0.7486 0.7311 0.4285 0.5151 0.7711 0.6913 0.5151 0.621 0.8895 0.7961 0.2868 0.3074
13. 0.5298 0.5212 0.5311 0.5464 0.7519 0.7381 0.3722 0.4871 0.7753 0.6961 0.5083 0.6136 0.8932 0.8073 0.2641 0.2948
14. 0.5342 0.5298 0.5214 0.5281 0.7532 0.7455 0.3817 0.4763 0.7888 0.7065 0.4715 0.6091 0.9039 0.8353 0.2475 0.2726
15. 0.538 0.5301 0.4888 0.5176 0.7549 0.7509 0.3587 0.4614 0.7976 0.7188 0.4557 0.5876 0.9044 0.8452 0.2331 0.2631
16. 0.5409 0.5371 0.4722 0.5186 0.76 0.7524 0.3368 0.4527 0.8061 0.7385 0.4173 0.5561 0.9101 0.8598 0.2277 0.2569
17. 0.5428 0.5399 0.4647 0.4615 0.7636 0.7549 0.3334 0.4071 0.8256 0.7431 0.4057 0.5278 0.9137 0.8784 0.2156 0.2482
18. 0.5458 0.5416 0.4523 0.4593 0.7677 0.7565 0.3214 0.3541 0.8556 0.7473 0.3569 0.5021 0.9172 0.8961 0.2078 0.2263
19. 0.5488 0.5427 0.4447 0.4434 0.7726 0.7577 0.3055 0.3201 0.8647 0.7499 0.3319 0.4831 0.9233 0.9009 0.1932 0.2097
20. 0.5533 0.5401 0.4301 0.4489 0.7734 0.7601 0.3011 0.3187 0.8788 0.7519 0.3261 0.4695 0.9285 0.9047 0.1871 0.2011
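Convergence plots such as Figs. 9 and 10 can be reproduced from a per-epoch history like Table 8. The following hedged matplotlib sketch assumes a Keras History object, as returned by model.fit in the training sketch above.

import matplotlib.pyplot as plt

def plot_convergence(history, title):
    """Plot accuracy and loss curves from a Keras History object."""
    epochs = range(1, len(history.history["accuracy"]) + 1)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(epochs, history.history["accuracy"], label="train accuracy")
    ax1.plot(epochs, history.history["val_accuracy"], label="val accuracy")
    ax2.plot(epochs, history.history["loss"], label="train loss")
    ax2.plot(epochs, history.history["val_loss"], label="val loss")
    ax1.set_xlabel("epoch")
    ax2.set_xlabel("epoch")
    ax1.legend()
    ax2.legend()
    fig.suptitle(title)
    plt.show()

# Usage: plot_convergence(history, "DenseNet convergence")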

Fig. 9. VGGNet and AlexNet model convergence.

The convergence of each model is depicted in Figs. 9 and 10. It is evident that DenseNet outperforms ResNet, VGGNet, and AlexNet. The experiments were carried out only on the skin cancer dataset, as the available oral cancer data are not sufficient; the full epoch-by-epoch results are given in Table 8.

10. Conclusion

A dataset of skin cancer medical images was collected and analyzed in this study to train multiple CNN models, including AlexNet, VGGNet, ResNet, and DenseNet. The performance of these models was compared using training accuracy, training loss, validation accuracy, and validation loss. In terms of accuracy and loss, the results indicate that the DenseNet model outperformed the other CNN models. These findings could have a significant impact on the development of skin cancer detection and diagnosis technologies.

11. Future aspects of proposed research

The next steps are first to prepare a set of oral cancer medical images and then to evaluate the performance of the GNN model on those images.

Declaration of competing interest

The authors declare that they have no conflicts of interest to report. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. The research presented in this article has been conducted independently, and the authors have made every effort to present unbiased findings.

Data availability

Data will be made available on request.


Fig. 10. ResNet and DenseNet model convergence.

References

[1] G.-M.M.Á. Aguilar-Ruiz, P. Ramos-García, Challenges in the early diagnosis of oral cancer, evidence gaps and strategies for improvement: A scoping review of systematic reviews, Cancers (2022) 1–30.
[2] Skin cancer statistics, https://www.wcrf.org/cancer-trends/skin-cancer-statistics.
[3] S.T. Lal, R.P.S. Banipal, D.J. Bhatti, Y.H. Prasad, Changing trends of skin cancer: A tertiary care hospital study in Malwa region of Punjab, J. Clin. Diagn. Res. 10 (6) (2016).
[4] Clinicopathological Profile of Cancers in India: A Report of the Hospital Based Cancer Registries, ICMR, Delhi, 2021.
[5] Skin Cancer (Non-Melanoma): Risk Factors and Prevention, Cancer.Net, 2022.
[6] D. Piyu, N. Parth, Munaf, Basal cell carcinoma: A narrative review on contemporary diagnosis and management, Oncol. Therapy 10 (2022) 317–335.
[7] S.R. Christensen, K.G. Lewis, American Cancer Society Facts & Figures 2023, American Cancer Society, Atlanta, GA, 2023.
[8] P. Fontanillas, B. Alipanahi, N. Furlotte, et al., Disease risk scores for skin cancers, Nat. Commun. 12 (2021) 160.
[9] D. Z, et al., A multi-task convolutional neural network for skin lesion classification, diagnosis and segmentation, Pattern Recognit. 107 (2020) 107419.
[10] M.J. Lynn, What are the different types of skin cancer? 2023. [Online]. Available: https://www.everydayhealth.com/skin-cancer/what-are-the-different-types-of-skin-cancer/ [Accessed March 2023].
[11] R.B. Aldridge, M. Zanotto, L. Ballerini, R.B. Fisher, J. Rees, Novice identification of melanoma: not quite as straightforward as the ABCDs, Acta Derm. Venereol. (2011) 125–130.
[12] H.C. Williams, Strengths and limitations of evidence-based dermatology, Indian J. Dermatol. 59 (2) (2014) 127–133.
[13] P. Tschandl, N. Codella, B.N. Akay, G. Argenziano, R.P. Braun, H. Cabo, et al., Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: An open, web-based, international, diagnostic study, Lancet Oncol. 20 (7) (2019) 938–947.
[14] C. Watters, S. Brar, T. Pepper, Oral mucosa cancer, in: StatPearls [Internet], 2022.
[15] L. Alzubaidi, H. Hassan, et al., Convolutional neural network for classification of oral cancer images, Int. J. Adv. Comput. Sci. Appl. 11 (7) (2020) 241–247.
[16] Y. Wang, P.J.X. Zhang, X. Li, et al., Deep learning for oral cancer detection and diagnosis: A review, Front. Oncol. 9 (2019) 1510.
[17] V. Borse, A.N. Konwar, et al., Oral cancer diagnosis and perspectives in India, Sens. Int. (2020).
[18] S. Dixit, A. Kumar, K. Srinivasan, A current review of machine learning and deep learning models in oral cancer diagnosis: Recent technologies, open challenges, and future research directions, Diagnostics 13 (7) (2023) 1353.
[19] Q. Zhang, D. Hou, X. Wen, M. Xin, Z. Li, L. Wu, J.L. Pathak, Gold nanomaterials for oral cancer diagnosis and therapy: Advances, challenges, and prospects, Mater. Today Bio 15 (2022).
[20] Y. Liu, J. Dong, J. Zhang, Q. Wang, H. Wang, et al., A deep learning-based CAD system for oral cancer detection using multimodal images, IEEE Trans. Med. Imaging 41 (1) (2022) 261–271.
[21] A. Menegola, J.G. Tavares, et al., Deep learning based oral cancer detection using convolutional neural network, J. Appl. Oral Sci. 27 (2019) e20180637.
[22] Y. Yang, X. Zhang, Y. Zhao, et al., Early detection of oral cancer using salivary exosomes and deep learning, Front. Oncol. 12 (2022).
[23] Y. Wang, W. Chen, et al., Automatic segmentation of oral cancer regions based on the residual U-Net with deep supervision, Comput. Med. Imaging Graph. 98 (2022).
[24] J. Zhou, O. Troyanskaya, et al., A regularization approach to solving sparse and low-rank matrices, J. Mach. Learn. Res. 20 (7) (2019) 1–43.
[25] T. Leng, C. Li, Y. Li, et al., A deep convolutional neural network for detecting tongue cancer from intraoral images, J. Med. Syst. 46 (1) (2022).
[26] Y. Kumar, et al., A systematic review of artificial intelligence techniques in cancer prediction and diagnosis, Arch. Comput. Methods Eng. 29 (2022) 2043–2070.
[27] S. Sharma, A. Aggarwal, T. Choudhury, Breast cancer detection using machine learning algorithms, in: International Conference on Computational Techniques, Belgium, 2018.
[28] H. Elmannai, M. Hamdi, A. AlGarni, Deep learning models combining for breast cancer histopathology image classification.
[29] A.-u. Rahman, A. Alqahtani, N. Aldhafferi, M. Nasir, M. Khan, M. Khan, A. Mosavi, Histopathologic oral cancer prediction using oral squamous cell carcinoma biopsy empowered with transfer learning, Sensors 22 (2022) 3833.
[30] C.A. Hamm, C.J. Wang, L.J. Savic, M. Ferrante, I. Schobert, T. Schlachter, Deep learning for liver tumor diagnosis part I: development of a convolutional neural network classifier for multi-phasic MRI, Eur. Radiol. 29 (2019) 3338–3347.
[31] N. Patel, A. Mishra, Automated leukaemia detection using microscopic images, Procedia Comput. Sci. 58 (2015) 635–642.
[32] H. Xie, D. Yang, N. Sun, Z. Chen, Y. Zhang, Automated pulmonary nodule detection in CT images using deep convolutional neural networks, Pattern Recognit. 85 (2019) 109–119.
[33] Y. Wang, et al., Deep learning enhances polarization speckle for in vivo skin cancer detection, Opt. Laser Technol. 140 (2020).
[34] M. Toğaçar, Z. Cömert, B. Ergen, Intelligent skin cancer detection applying autoencoder, MobileNetV2 and spiking neural networks, Chaos Solitons Fractals 144 (2021) 110714.
[35] H. Chen, Deep learning-based classification of dermoscopic images of melanoma and nevus, IEEE J. Biomed. Health Inf. 24 (3) (2020) 981–990.
[36] M. Jindal, M.K. Dutta, An efficient deep learning approach for skin cancer detection, J. Ambient Intell. Humaniz. Comput. 10 (1) (2019) 287–296.
[37] Y.J. Kim, B.H. Kang, H.S. Jung, M.J. Park, et al., A deep learning-based automated diagnostic system for distinguishing between benign and malignant skin lesions, Comput. Methods Programs Biomed. 215 (2022).
[38] G. Argenziano, Dermoscopy: The ultimate tool for melanoma diagnosis, Semin. Cutan. Med. Surg. 31 (3) (2012) 70–178.
[39] M. Rajadhyaksha, et al., In vivo confocal scanning laser microscopy of human skin II: advances in instrumentation and comparison with histology, J. Investig. Dermatol. 113 (3) (1999) 293–303.
[40] A. Esteva, et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature 542 (2017) 115–118.
[41] I. Kousis, I. Perikos, I. Hatzilygeroudis, M. Virvou, Deep learning methods for accurate skin cancer recognition and mobile application, Electronics 11 (2022) 1294.
[42] R. Mohakud, R. Dash, Designing a grey wolf optimization based hyper-parameter optimized convolutional neural network classifier for skin cancer detection, J. King Saud Univ. Comput. Inf. Sci. (2021).
[43] M.A. Kadampur, S. Al Riyaee, Skin cancer detection: Applying a deep learning based model driven architecture in the cloud for classifying dermal cell images, Inform. Med. Unlocked 18 (2020) 100282.
[44] P. Srinivasu, J. SivaSai, M. Ijaz, A. Bhoi, W. Kim, J. Kang, Classification of skin disease using deep learning neural networks with MobileNet V2 and LSTM, Sensors 21 (2021) 2852.
[45] L. P, et al., The accuracy of visual methods for detecting oral cancer: a meta-analysis, Oral Surg. Oral Med. Oral Pathol. Oral Radiol. Endod. 96 (3) (2003) 382–391.
[46] C.F. Poh, et al., Fluorescence visualization detection of field alterations in tumor margins of oral cancer patients, Clin. Cancer Res. 12 (22) (2006) 6716–6722.
[47] H. Yoshida, Narrow-band imaging system with magnifying endoscopy for the screening of esophageal cancer in patients with head and neck cancer, Int. J. Oncol. 37 (5) (2010) 1193–1200.
[48] X. Zhou, An artificial intelligence system for the diagnosis of oral cancer based on the surface-enhanced Raman spectroscopy of saliva, J. Biophotonics 14 (5) (2021).
[49] S.S. Han, M.S. Kim, W. Lim, G.H. Park, I. Park, S.E. Chang, Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm, J. Investig. Dermatol. 138 (7) (2018) 1529.
[50] N.C. Codella, et al., Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC), 2018.
[51] X. Yang, Z. Zeng, S.Y. Yeo, C. Tan, H.L. Tey, Y. Su, A novel multi-task deep learning model for skin lesion segmentation and classification, 2017.
[52] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, 2014, arXiv:1409.4842v1.
[53] T. Li, X. Ding, Y. Gao, et al., A novel CNN-based algorithm for automatic detection of malignant, 2019.
[54] A. Kumar, Different types of CNN architectures explained: Examples, Data Analytics (2022).
[55] P. Haripriya, R. Porkodi, Deep learning pre-trained architecture of AlexNet and GoogLeNet for DICOM image classification, Int. J. Sci. Technol. Res. 8 (11) (2019) 3130–3136.
[56] http://learnopencv.com/wp-content/uploads/2018/05/AlexNet-1.png [Online].
[57] H. Kataoka, K. Iwata, Y. Satoh, Feature evaluation of deep convolutional neural networks for object recognition and detection, 2015, arXiv:1509.07627.
[58] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, 2015, arXiv, http://arxiv.org/abs/1512.03385.
[59] L. Yu, H. Chen, Q. Dou, J. Qin, P.A. Heng, Automated melanoma recognition in dermoscopy images via very deep residual networks, IEEE Trans. Med. Imaging 36 (4) (2018) 994–1004.
[60] A. Shukla, A. Patel, Abnormality detection from X-ray bone images using DenseNet convolutional neural network, Int. J. Curr. Res. Rev. (2021), http://dx.doi.org/10.31782/IJCRR.2021.131026.
[61] L. Waikhom, R. Patgiri, Graph neural networks: Methods, applications, and opportunities, 2021.
[62] https://neptune.ai/blog/graph-neural-network-and-some-of-gnn-applications.
[63] BCN_20000 dataset: (c) Department of Dermatology, Hospital Clínic de Barcelona.
[64] M.D. Anonymous, https://arxiv.org/abs/1710.05006 and https://arxiv.org/abs/1902.03368.
[65] HAM10000 dataset: (c) ViDIR Group, Department of Dermatology, Medical University of Vienna, http://dx.doi.org/10.1038/sdata.2018.161.
[66] A.G.C. Pacheco, G.R. Lima, A.S. Salomão, B. Krohling, I.P. Biral, G.G. de Angelo, F.C.R. Alves Jr., J.G.M. Esgario, A.C. Simora, P.B.C. Castro, F.B. Rodrigues, P.H.L. Frasson, R.A. Krohling, et al., PAD-UFES-20: A skin lesion dataset composed of patient data and clinical images collected from smartphones, Data in Brief 32 (2020) 106221.
