
Multimedia Systems (2022) 28:1239–1250

https://doi.org/10.1007/s00530-021-00840-3

SPECIAL ISSUE PAPER

Automatic segmentation of melanoma skin cancer using transfer learning and fine-tuning
Rafael Luz Araújo1 · Flávio H. D. de Araújo1,2 · Romuere R. V. e Silva1,2

Accepted: 14 August 2021 / Published online: 30 August 2021


© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021

Abstract
The massive use of multimedia technologies has enabled the exploration of information in many kinds of data, such as texts, audio, videos, and images. Computational methods are being developed for several purposes, such as monitoring, security, business, and even health, through the automatic diagnosis of diseases from medical images. Among these diseases is melanoma, a skin cancer that causes a large number of fatalities worldwide. Several methods for the automatic diagnosis of melanoma in dermoscopic images have been developed; for these methods to be more efficient, it is essential to isolate the lesion region. This study uses a melanoma segmentation method based on the U-net and LinkNet deep learning networks combined with transfer learning and fine-tuning techniques. Additionally, we evaluate the models' ability to learn to segment the disease, rather than just the dataset, by combining datasets. The experiments were carried out on three datasets (PH2, ISIC 2018, and DermIS) and obtained promising results, emphasizing the U-net, which obtained an average Dice of 0.923 on the PH2 dataset, Dice = 0.893 on ISIC 2018, and Dice = 0.879 on the DermIS dataset.

Keywords Melanoma · Segmentation · U-net · LinkNet · Transfer learning · Fine-tuning

1 Introduction

The increase in the use of multimedia technologies in recent years has enabled the exploration of information about human society in large amounts of data. These data can be text, audio, video, and image files found in various media, such as digital bibliographic collections, monitoring and surveillance applications, web applications, social networks, and many others that deal with a large volume of data [1–3].

Several computational methods use multimedia content for several purposes, such as improving health, safety, and the economy. We can see this in [4], which implements efficient, high-performance pedestrian detection for intelligent vehicles; Arain et al. [5] propose a technique for planning trips for tourists, observing their time and preference needs. One area that stands out is computing applied to health, where several studies have been carried out to create computer-aided diagnostic (CAD) tools based on medical images [6, 7].

One of the diseases that CAD tools can diagnose is melanoma skin cancer. Malignant melanoma has a lower incidence than non-melanoma skin cancers. However, it causes more deaths and is more likely to be reported and diagnosed accurately than non-melanoma skin cancers. Its incidence has also increased dramatically since the early 1970s, with an average increase of 4% each year in the United States [8].

Although deadly, if melanoma is diagnosed early, it can be treated, increasing the patient's chances of cure. The formal diagnosis performed by the dermatologist consists of a visual examination of the injured region. However, because some types of lesions are very similar, the specialist has difficulty making an accurate diagnosis, requiring a lot of experience for good accuracy [9]. Dermatologists have an average accuracy rate of around 65%–80% when performing the diagnosis without additional technical support such as a special high-resolution camera and magnifying glass [10].

Several studies have been carried out to develop computational techniques that assist specialist doctors in the diagnosis of melanoma through images of skin lesions [11]. To have a better interpretation of the images, in

* Romuere R. V. e Silva
romuere@ufpi.edu.br

1 Electrical Engineering, Federal University of Piauí, Teresina, PI, Brazil
2 Information Systems, Federal University of Piauí, Picos, PI, Brazil


Table 1  Summary of works identified in the state of the art: work, dataset, and segmentation technique

Work  Dataset  Technique

Barata et al. 2013 [12]  PH2  Histogram computation + peak detection + threshold estimation
Fan et al. 2017 [21]  PH2, EDRA, ISBI 2016  Color map + brightness map + thresholding
Ahn et al. 2017 [22]  PH2, ISIC  Saliency-based segmentation via background detection
Al-Masni et al. 2018 [23]  PH2, ISBI 2017  Deep full-resolution convolutional networks
Aljanabi et al. 2018 [24]  PH2, DermIS, ISBI 2016, ISBI 2017  Artificial Bee Colony algorithm
Peng et al. 2019 [25]  PH2, ISBI 2016  Generative adversarial network + U-net
Goyal et al. 2019 [26]  PH2, ISIC 2017  Combination of Mask R-CNN and DeepLabv3+
Khan et al. 2019 [27]  DermIS  K-means
Abraham and Khan 2019 [28]  ISIC 2018, BUS 2017  U-net
Santos et al. 2020 [29]  PH2, DermIS, ISIC 2016, ISIC 2017  SLIC0 algorithm + Fuzzy C-means + post-processing
Amin et al. 2020 [6]  ISIC 2018, PH2, ISBI 2016, ISBI 2017  Biorthogonal 2-D wavelet transform + thresholding
Guo et al. 2020 [30]  ISIC 2018  Adaptive atrous convolution (AAC) and knowledge aggregation module (KAM)
Nazi and Abir 2020 [7]  PH2, ISIC 2018  U-net
Araújo et al. 2021 [31]  PH2, DermIS  Deep learning + post-processing
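Several of the techniques summarized in Table 1 are threshold-based (for example, the Otsu step applied in [21]). As a purely illustrative reference point, and not the implementation of any cited work, Otsu's threshold can be computed from a grayscale histogram as follows:

```python
def otsu_threshold(hist):
    """Return the threshold t that maximizes the between-class variance.

    hist[i] = number of pixels with intensity i; pixels with intensity <= t
    are treated as one class (e.g., background), the rest as the other.
    """
    total = sum(hist)
    total_mass = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0        # pixel count of the first class so far
    mass0 = 0.0   # intensity mass of the first class so far
    for t, h in enumerate(hist):
        w0 += h
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mass0 += t * h
        mu0 = mass0 / w0
        mu1 = (total_mass - mass0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

hist = [0] * 256
hist[10], hist[200] = 100, 100      # ideal bimodal histogram
print(otsu_threshold(hist))         # 10
```

On a clean bimodal histogram, the maximizing threshold sits at the lower mode, separating the two intensity populations; real dermoscopic images add noise and artifacts, which is exactly why the text below describes additional steps around plain thresholding.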

some cases, the lesion region must be isolated. This stage is called segmentation and has been the target of many studies. According to [6, 12, 13], the images have noise and other artifacts besides the lesion, making segmentation a challenging and crucial task.

Several techniques based on deep learning models are used to perform the segmentation of the skin lesion, such as U-net [14] and LinkNet [15]. These models usually get good results, but small datasets are insufficient to adequately train deep models from scratch. To circumvent this problem, transfer learning is usually applied [16].

Transfer learning consists of reusing a pre-trained model in a new problem. According to [17], the use of pre-trained weights allows the model to achieve a faster and better convergence. In addition to taking features learned in one problem and taking advantage of them in another, similar problem, an optional last step can be performed to potentially achieve improvements. This step is called fine-tuning and consists of unfreezing the trained model, or part of it, and performing the training again on the new data with a low learning rate. This makes the pre-trained features adapt incrementally to the new data [18].

In this sense, this work has as its main contributions: (1) a comparison of the deep learning models U-net and LinkNet applied to the segmentation of melanoma with the use of transfer learning and fine-tuning techniques; (2) an evaluation of the ability of our technique to learn to segment the disease or just each dataset in isolation, through cross-testing with three datasets.

2 Related works

Several computational methods for automatic segmentation of melanoma have emerged in recent years. Table 1 presents a summary of the related works present in the literature; we can see that several segmentation methods use traditional techniques such as thresholding [19] and K-means [20]. In [21], the segmentation of melanoma occurs by combining a color map and a brightness map, and then the Otsu algorithm is applied. Barata et al. [12] apply a thresholding algorithm comprising three steps: (1) histogram computation, (2) peak detection, and (3) threshold estimation.

Santos et al. [29] proposed a segmentation method that generated superpixels and extracted texture features for a seeded Fuzzy C-means algorithm to group the superpixels based on the specialist's markings; afterward, post-processing is applied. However, when looking at the literature, we noticed that the works with better results used techniques based on deep learning, such as U-net [14]. We can see several segmentation methods based on deep learning, like [26], which presents a fully automated ensemble method based on deep learning that obtains an average Dice = 0.907 in the PH2 dataset. In [7], the U-net network was used to segment images from the ISIC 2018 dataset, obtaining an average Dice = 0.87.

In research proposed by Araújo et al. [31], the authors used deep-learning techniques combined with
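Contribution (2) relies on cross-testing every combination of the three datasets. A short sketch of how such a test grid can be enumerated (an illustrative helper, not the authors' code; the dataset abbreviations follow the result tables later in the paper):

```python
from itertools import combinations

DATASETS = ["PH2", "ISIC18", "DermIS"]

def build_tests(datasets):
    """Enumerate (train_datasets, test_dataset) configurations."""
    tests = []
    # isolated and crossover tests: train on one dataset, test on each dataset
    for train in datasets:
        for test in datasets:
            tests.append(((train,), test))
    # train on two datasets combined, test on the remaining one
    for pair in combinations(datasets, 2):
        held_out = next(d for d in datasets if d not in pair)
        tests.append((pair, held_out))
    # train and test on all datasets combined
    tests.append((tuple(datasets), "ALL"))
    return tests

tests = build_tests(DATASETS)
print(len(tests))  # 13
```

With three datasets this yields thirteen configurations: nine single-dataset train/test pairs, three two-versus-one combinations, and one test with all data combined, matching the rows of the result tables.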


Fig. 1  Proposed methodology for segmentation of skin lesions with deep learning networks combined with transfer learning and fine-tuning
techniques

post-processing, obtaining an average Dice = 0.883 in the ISIC 2018 dataset. The paper [28], using U-net, achieved Dice = 0.856 in the same dataset. The FrCN method proposed by [23] directly learned the full-resolution features of each pixel of the input data, without the need for pre- or post-processing operations, such as artifact removal, low contrast adjustment, or further enhancement of the segmented skin lesion limits, and obtained a mean Dice = 0.917 in the PH2 dataset.

The literature presents several methods for the segmentation of dermoscopic images, including methods based on deep learning networks. When analyzing these works, we identified some deficiencies/limitations. Some works use only one dataset. Usually, those that use several datasets concatenate them to increase the amount of data and do not evaluate the datasets separately and crosswise to investigate whether the model is learning characteristics of the dataset or of the disease.

Some works do not use the full dataset, reducing the credibility of the results, as this allows the removal of images with low performance. Some studies report few evaluation metrics, or just one, such as accuracy. Accuracy alone is usually not sufficient for segmentation assessment, requiring the use of metrics such as Dice and Jaccard.

Most works perform pre-processing and resizing, disregarding the distortion of skin lesions. There is also great use of data augmentation to tackle challenges such as small datasets, unbalanced classes, and overfitting. Deep learning models require a large number of images for proper training [22, 23, 32]. As these models have many layers and parameters, they are susceptible to overfitting when trained with few images; that is why data augmentation is used to obtain better performance. In addition, of the studies found, only [6, 7, 12, 27] perform a classification stage, and none of them evaluates the impact that segmentation causes on the melanoma classification.

Our work uses the U-net and LinkNet deep learning models applied to the segmentation of melanoma, and its differentials are to investigate the impact that transfer learning and fine-tuning techniques have on these models. We reduced the computational cost by using small images (128 × 128) through a resize that maintains the proportions of the lesions. We achieved high performance on small datasets with unbalanced classes without using data augmentation. In addition, we used three well-known datasets to carry out isolated and crossover tests, assessing the ability of our technique to learn to segment the disease or just each dataset in isolation.

3 Materials and methods

The methodology proposed¹ in this research aims to segment skin lesion images with deep learning networks combined with the transfer learning and fine-tuning techniques, and it follows the steps presented in Fig. 1: starting from the acquisition of the images, where we choose the datasets; the segmentation stage, where transfer learning and fine-tuning are applied to the U-net and LinkNet; and finally the validation with the main segmentation metrics.

¹ The codes, databases, and information for using the proposed methodology are available in a public repository: https://github.com/rafaluz/skin-seg-unet-linknet.

3.1 Image acquisition

In the image acquisition stage, we gathered the most used melanoma datasets in the literature. All datasets chosen contain images with skin lesions and segmentation masks made by a specialist doctor. The three datasets chosen are:

• PH2: the PH2 dataset has 40 images of melanoma and 160 of non-melanoma, totaling 200 dermoscopic images of melanocytic lesions [33];
• ISIC 2018: the International Skin Imaging Collaboration (ISIC) 2018 dataset consists of an international effort to improve melanoma diagnosis, being publicized in a


Fig. 2  Examples of melanoma, non-melanoma, and segmentation of skin lesions made by experts in the PH2, ISIC 2018, and DermIS datasets

competition for researchers to create high-performance methods for segmentation and classification of the disease. The dataset has 519 melanoma images and 2075 non-melanoma images, totaling 2594 images [34, 35];
• DermIS: the Dermatology Information System (DermIS) is a service that offers information about dermatology on the internet, where an atlas of images of skin diseases such as melanoma is available [36]. We gathered 207 images, 119 of which were melanoma and 88 non-melanoma.

Figure 2 shows examples of melanoma, non-melanoma, and segmentation of skin lesions made by specialists in the three datasets used in this research.

3.2 Segmentation

The segmentation step aims to identify a region of interest in the image; in our case, the skin lesion. This step is essential in an automatic detection system, as it prevents noise and elements external to the lesion from being analyzed. Several segmentation techniques have been developed, from more traditional algorithms [19, 20] to techniques using deep learning [23, 25, 37]. In this research, we use deep learning networks for segmentation. In [17], we find the U-net and LinkNet architectures used in this work. Both enable the segmentation of binary and multiclass images, with 25 backbones available for each architecture. All backbones have pre-trained weights, so the potential of transfer learning can be exploited. Here, we use the ResNet34 backbone with ImageNet weights, and the Jaccard loss was used for segmentation. Additionally, we perform the fine-tuning step to obtain better results.

3.2.1 U-net

The U-net network, initially proposed by [14], is a fully convolutional network model that was created for the segmentation of biomedical images. Its architecture comprises a contraction path that captures contextual information; a precise segmentation is then obtained through a symmetric expansion path. The main strategy that differentiates U-Net from other segmentation architectures is combining the feature maps of the contraction stage with their symmetric counterparts in the expansion stage, allowing the propagation of context information to the high-resolution feature maps. Figure 3 presents the U-net architecture and a pre-trained simplification.

The architecture used here differs from that of [14] in its number of layers. As it is pre-trained, it has several layers according to the chosen backbone; we used the ResNet34 backbone for this work. In a simplified way, the contraction path of the U-net network is composed of a pre-trained ResNet34 model without the fully connected layer.

3.2.2 LinkNet

Most pixel-based semantic segmentation algorithms for understanding visual scenes, although accurate, do not focus on the efficient use of neural network parameters so that they can be used satisfactorily in real-time applications. In this sense, LinkNet was designed to be a deep learning network that allows learning without a significant increase in the number of parameters. Its architecture is similar to U-net and other segmentation networks: an encoder on the left and a decoder on the right. The encoder has the function of encoding the information in the feature space, and the decoder maps this information into the spatial categorization to perform the segmentation [15]. We can see the original LinkNet architecture and a pre-trained simplification in Fig. 4.

In the original LinkNet network, the input images pass through an initial block and follow through residual blocks represented as encoder blocks (i); then, in the decoder, they pass through decoder blocks (i). The decoder also uses the full convolution technique. LinkNet's great advantage lies in the way it connects encoders and decoders: according to [15], the input of each encoder layer is bypassed to the output of its corresponding decoder, making it possible for spatial information lost in the encoder to be used by the decoder. This knowledge sharing allows the decoder to use fewer


Fig. 3  U-net architectures: a Original U-net architecture [14]. Each blue box corresponds to a multi-channel feature map. The white boxes represent copied feature maps. The arrows indicate the different operations. b Simplification of the pre-trained U-net architecture adapted from [17]. On the left, we see the contraction path made up of the ResNet34 backbone

Fig. 4  LinkNet architectures: a LinkNet architecture adapted from [15]. b Simplification of the pre-trained LinkNet architecture adapted from [17]; on the left, we see the encoder formed by the ResNet34 backbone

parameters, resulting in a more efficient network for real-time operations. In our work, the LinkNet encoder is composed of a pre-trained ResNet34 model without the fully connected layer.

3.3 Transfer learning

The transfer learning stage consists of storing the knowledge acquired while solving a problem and applying it to


a different but related problem. This is very effective when the data set is small, making the model easier to adapt to the new data, which would not be as fast or effective if trained from scratch.

In this research, we use models with weights pre-trained on ImageNet. ImageNet is a set of image data with tens of millions of cleanly ordered images, created to provide researchers around the world with an easily accessible image database [38].
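As a schematic of this stage (hypothetical layer records, not the authors' actual training code; in practice, pre-trained backbones such as ResNet34 come ready-made in the segmentation library of [17]), transfer learning amounts to copying weights learned on another problem into the encoder and freezing them, so that only the randomly initialized decoder is trained at first:

```python
# Schematic only: a "model" is a dict of layer records; each layer carries a
# weight value and a trainable flag. Real frameworks expose the same idea
# through per-layer `trainable` attributes.

def build_model(encoder_depth=3, decoder_depth=3):
    enc = [{"name": f"enc{i}", "w": 0.0, "trainable": True} for i in range(encoder_depth)]
    dec = [{"name": f"dec{i}", "w": 0.0, "trainable": True} for i in range(decoder_depth)]
    return {"encoder": enc, "decoder": dec}

def load_pretrained(model, pretrained_weights):
    # reuse knowledge from another problem (e.g., ImageNet classification)
    for layer, w in zip(model["encoder"], pretrained_weights):
        layer["w"] = w

def freeze_encoder(model):
    # phase 1 (transfer learning): keep pre-trained weights fixed
    for layer in model["encoder"]:
        layer["trainable"] = False

def unfreeze_all(model):
    # phase 2 (fine-tuning): release the encoder for low-learning-rate updates
    for layer in model["encoder"] + model["decoder"]:
        layer["trainable"] = True

model = build_model()
load_pretrained(model, [0.5, 0.7, 0.9])
freeze_encoder(model)
print([l["name"] for l in model["encoder"] + model["decoder"] if l["trainable"]])
# ['dec0', 'dec1', 'dec2']
unfreeze_all(model)
```

The later fine-tuning stage then continues training the unfrozen layers with a low learning rate, so the transferred weights are only slightly adjusted.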

3.4 Fine-tuning

The fine-tuning stage serves to achieve further improvements in the model's results. It assumes that, for a model to adjust to certain observations, its parameters must be adjusted very precisely. Fine-tuning consists of unfreezing a trained model, or part of it, and then training it again on the new data using a low learning rate. This causes the weights already learned to be only slightly adjusted.

In this research, fine-tuning occurred with the freezing of the encoder weights. According to [17], in order not to damage the weights of an encoder that was properly trained with huge gradients during the first stages of training, we can freeze it and train only the decoder, which was randomly initialized.

3.5 Validation

To compute the effectiveness of our methodology, we compare the segmentation masks provided by the specialists in the databases with the ones obtained by U-net and LinkNet. This process is performed for each pixel in both images; thereby, it is possible to compute a confusion matrix of our results, where:

• True Positive (TP): when lesion pixels are correctly segmented as lesion;
• True Negative (TN): when pixels without lesion are correctly segmented as background;
• False Positive (FP): when pixels without lesion are incorrectly segmented as lesion;
• False Negative (FN): when lesion pixels are incorrectly segmented as background.

Figure 5 shows an illustration of how the confusion matrix is computed. From those values, it is possible to compute some metrics; in our work, we used the following: Sensitivity (SEN) [39], in Eq. 1; Specificity (ESP) [39], in Eq. 2; Accuracy (ACC) [40], in Eq. 3; AUC [40], in Eq. 4; Dice (DSC) [41], in Eq. 5; and Jaccard (JAC) [42], in Eq. 6. Below are the equations for each metric:

SEN = TP / (TP + FN),  (1)

ESP = TN / (TN + FP),  (2)

ACC = (TP + TN) / (TP + TN + FP + FN),  (3)

AUC = 1 − 0.5 × (FP / (FP + TN) + FN / (FN + TP)),  (4)

DSC = 2 × TP / (2 × TP + FP + FN),  (5)

JAC = TP / (TP + FP + FN).  (6)

Fig. 5  Performance evaluation values computed

4 Experimental results and discussion

To carry out the experiments, we performed a proportional resizing of the images from the PH2, ISIC 2018, and DermIS datasets to standardize the images' size and reduce the computational cost. In our work [31], we found that the size 128 × 128 is sufficient to obtain satisfactory results. The next step was to map how many tests would be performed to contemplate all possible combinations between the three datasets, to validate the networks' level of generalization. We designed thirteen tests, as shown in Fig. 6.
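The confusion counts and the metrics of Eqs. 1–6 can be sketched in a few lines of plain Python over flattened binary masks (an illustrative sketch, not the paper's evaluation code):

```python
def confusion(pred, truth):
    """Pixel-wise confusion counts for flattened binary masks (1 = lesion)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    return {
        "SEN": tp / (tp + fn),                                # Eq. 1
        "ESP": tn / (tn + fp),                                # Eq. 2
        "ACC": (tp + tn) / (tp + tn + fp + fn),               # Eq. 3
        "AUC": 1 - 0.5 * (fp / (fp + tn) + fn / (fn + tp)),   # Eq. 4
        "DSC": 2 * tp / (2 * tp + fp + fn),                   # Eq. 5
        "JAC": tp / (tp + fp + fn),                           # Eq. 6
    }

# toy masks: 10 lesion pixels (6 hit, 4 missed), 10 background (8 hit, 2 false)
truth = [1] * 10 + [0] * 10
pred = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
m = metrics(*confusion(pred, truth))
print(round(m["DSC"], 3))  # 0.667
```

Note that DSC and JAC ignore true negatives, which is why they are preferred over plain accuracy when the lesion occupies a small fraction of the image.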


We use the U-net and LinkNet networks with the ResNet34 backbone for the tests. The backbone has weights pre-trained on ImageNet. For fine-tuning, we freeze the encoder network and train only the decoder. In tests where the training base is the same as the test base, we divide the data into 60% training, 20% test, and 20% validation. Additionally, we use the k-fold cross-validation method in these cases, with k = 5, to increase the results' reliability. K-fold [43] enables the partitioning of the data set into mutually exclusive subsets.

Fig. 6  Tests performed with the combinations between the PH2, ISIC 2018, and DermIS datasets

4.1 Results with U-net

Table 2  Results of U-net segmentation tests with transfer learning

Train  Test  SEN  ESP  ACC  AUC  DSC  JAC
PH2  PH2  0.878  0.985  0.959  0.932  0.912  0.839
PH2  ISIC18  0.797  0.958  0.933  0.877  0.785  0.646
PH2  DermIS  0.911  0.917  0.917  0.914  0.591  0.419
ISIC18  PH2  0.937  0.954  0.950  0.946  0.901  0.819
ISIC18  ISIC18  0.866  0.986  0.968  0.925  0.891  0.803
ISIC18  DermIS  0.902  0.977  0.972  0.939  0.809  0.679
DermIS  PH2  0.505  0.996  0.877  0.751  0.666  0.499
DermIS  ISIC18  0.492  0.997  0.919  0.745  0.652  0.484
DermIS  DermIS  0.816  0.996  0.983  0.906  0.868  0.767
DermIS + ISIC18  PH2  0.938  0.960  0.955  0.949  0.909  0.834
PH2 + DermIS  ISIC18  0.734  0.986  0.947  0.859  0.810  0.681
PH2 + ISIC18  DermIS  0.908  0.974  0.969  0.941  0.798  0.664
ALL  ALL  0.868  0.987  0.969  0.927  0.894  0.809

Table 2 shows the results of the pre-trained U-net with weights from ImageNet. The pre-trained weights proved to be efficient in adapting to the three sets of images, obtaining promising results. In the PH2 dataset, a mean DSC = 0.912 and JAC = 0.839 were obtained; in the ISIC 2018 dataset, an average DSC = 0.891 and JAC = 0.803; and finally, in DermIS, an average DSC = 0.868 and JAC = 0.767. The network obtained similar results in the three datasets, although the best results were in PH2.

When conducting training with the PH2 dataset, we noticed that the tests on the DermIS database were not satisfactory, with DSC = 0.591. On ISIC 2018, a DSC = 0.785 was obtained, indicating that training with the PH2 set does not make the network generalist enough to identify the disease in other, more diversified images. The same behavior is observed when training the network with the DermIS dataset. Training with the ISIC 2018 set, on the other hand, proved to be more efficient, obtaining good results when testing with the PH2 (DSC = 0.901) and DermIS (DSC = 0.809) datasets. This must occur because the ISIC 2018 database has a larger number of images and more diversity, making the network learn more characteristics of the disease instead of being attached to aspects of the dataset. This is more evident when observing that training on ISIC 2018 and testing on PH2 obtains better results than training/testing only with ISIC 2018.

When combining two datasets for training, we noticed that the PH2 dataset ends up decreasing the results. This is


Table 3  Results of U-net segmentation tests with fine-tuning

Train  Test  SEN  ESP  ACC  AUC  DSC  JAC
PH2  PH2  0.905  0.982  0.963  0.944  0.923  0.858
PH2  ISIC18  0.766  0.972  0.941  0.869  0.799  0.665
PH2  DermIS  0.899  0.943  0.940  0.921  0.665  0.499
ISIC18  PH2  0.929  0.958  0.952  0.944  0.903  0.823
ISIC18  ISIC18  0.866  0.987  0.968  0.927  0.893  0.807
ISIC18  DermIS  0.899  0.978  0.972  0.939  0.812  0.683
DermIS  PH2  0.603  0.994  0.899  0.799  0.744  0.593
DermIS  ISIC18  0.554  0.994  0.927  0.774  0.699  0.537
DermIS  DermIS  0.839  0.995  0.985  0.917  0.879  0.785
DermIS + ISIC18  PH2  0.935  0.961  0.955  0.948  0.909  0.834
PH2 + DermIS  ISIC18  0.748  0.986  0.949  0.867  0.819  0.693
PH2 + ISIC18  DermIS  0.906  0.977  0.972  0.941  0.810  0.681
ALL  ALL  0.861  0.987  0.968  0.924  0.889  0.801

evident when comparing training/test (ISIC 2018/DermIS), which obtains DSC = 0.809, and training/test (PH2 + ISIC 2018/DermIS), which obtains a DSC of just 0.798. We found that the PH2 dataset has little diversity and many dataset-specific characteristics, making it easy to obtain good results on it but hard to obtain generalized training for the disease.

By combining the three datasets, we combine a greater diversity of images and of characteristics underlying the disease. We concluded that the results, although not as high, have greater consistency in terms of generalization. To further improve the results and make the network better adjust the pre-trained weights, we use the fine-tuning technique, whose results are presented in Table 3.

After performing fine-tuning, we noticed a smaller increase in some cases and, in others, a very expressive one, showing that the technique brings improvements to our model. In the PH2 dataset, an average DSC = 0.923 was obtained, an increase of 1.1%, and the same increase occurred in the DermIS dataset.

In other cases, this increase was much more expressive, such as in training/testing (PH2/DermIS), where the increase was 7.4%. In the training/test combination (DermIS/PH2), the increase was 7.8%. In the combinations of two datasets for training and one for testing, the increase was smaller, as it was in the test with the three datasets together. The fine-tuning technique proved useful and brought many improvements to our method, making it achieve more significant results.

4.2 Results with LinkNet

Table 4 shows the results of the pre-trained LinkNet network with weights from ImageNet. It shows that the results, although promising, are much lower than those of the U-net for the configurations used. In the PH2 dataset, the average DSC = 0.859 was 5.3% lower than that of the U-net. In the ISIC 2018 dataset, the DSC was 0.882, only 0.9% lower, and in DermIS, it had DSC = 0.788, which is 8% lower than the result obtained with the U-net.

The difference in results for the other tests compared to U-net was also quite expressive, pointing out that the U-net, in these configurations, presented a performance relatively higher than LinkNet for the segmentation of dermoscopic images.

When analyzing Table 5, we noticed that there was a significant improvement in the results of LinkNet. In the PH2 dataset, the average DSC increased to 0.901, which represents a 4.2% increase. In the DermIS dataset, the DSC increased to 0.834 (a 4.6% increase). These significant increases show that fine-tuning is efficient and improves the accuracy of segmentation models based on deep learning. Figure 7 presents examples of lesions, the specialists' masks, and the predictions obtained by U-net and LinkNet in the three datasets.

4.3 Comparison with related works

Although the results obtained in the tests have shown to be promising, we compared them with high-performance works present in the literature that perform the segmentation of melanoma in the datasets used in this research. Table 6 shows the comparison of the proposed methodology with the works listed for the PH2 dataset. We noticed that some works obtained better results for the SEN metric, such as [24], which had the highest SEN and also obtained the highest ESP of all. However, we obtained the best results in ACC, AUC, and mainly in the DSC and JAC indices, which are among the primary metrics for segmentation assessment.

Table 7 shows the comparison with the works that performed segmentation in the DermIS dataset. As we can see, our proposed method obtained the best results in almost all metrics, second only to [31] in AUC. We also observed that


Fig. 7  Examples of lesions, the specialists' masks, and the predictions obtained by U-net and LinkNet in the three datasets

Table 4  Results of LinkNet segmentation tests with transfer learning

Train  Test  SEN  ESP  ACC  AUC  DSC  JAC
PH2  PH2  0.796  0.983  0.938  0.890  0.859  0.757
PH2  ISIC18  0.729  0.952  0.918  0.841  0.732  0.578
PH2  DermIS  0.862  0.890  0.888  0.876  0.505  0.338
ISIC18  PH2  0.927  0.956  0.949  0.942  0.898  0.816
ISIC18  ISIC18  0.865  0.982  0.964  0.923  0.882  0.789
ISIC18  DermIS  0.862  0.985  0.977  0.924  0.832  0.712
DermIS  PH2  0.401  0.998  0.853  0.699  0.570  0.398
DermIS  ISIC18  0.349  0.997  0.898  0.673  0.511  0.343
DermIS  DermIS  0.676  0.997  0.976  0.836  0.788  0.653
DermIS + ISIC18  PH2  0.928  0.952  0.946  0.940  0.893  0.808
PH2 + DermIS  ISIC18  0.641  0.992  0.938  0.817  0.763  0.616
PH2 + ISIC18  DermIS  0.889  0.983  0.977  0.936  0.840  0.725
ALL  ALL  0.854  0.983  0.963  0.919  0.878  0.783

U-net has the best results, losing to LinkNet only in ESP. The small number of works listed for this dataset is because it does not have a fixed number of images, so in some works that we found, the authors collected few images, such as [44], who performed tests on only two images. In the work of [45], a combination of 44 images from the DermIS dataset with 96 images from two other datasets was used, due to which it was not added to the table.

We can compare related works in the ISIC 2018 dataset in Table 8, where we see that [6] obtained the highest results for SEN and JAC. However, our pre-trained U-net methodology submitted to fine-tuning obtained better ESP, ACC, AUC, and mainly DSC results. These comparison results on the three datasets show that our technique was very robust and promising, obtaining results that compete with the high-performance works present in the literature.

4.4 Discussions

After the tests, we found that the U-net network had a higher segmentation performance than LinkNet, indicating that even though LinkNet makes efficient use of parameters and number of operations for real-time applications, it loses precision compared to U-net, which forms deeper features than


Table 5  Results of LinkNet segmentation tests with fine-tuning

Train  Test  SEN  ESP  ACC  AUC  DSC  JAC
PH2  PH2  0.870  0.980  0.953  0.925  0.901  0.820
PH2  ISIC18  0.750  0.960  0.928  0.855  0.761  0.615
PH2  DermIS  0.885  0.878  0.878  0.881  0.490  0.325
ISIC18  PH2  0.936  0.957  0.952  0.946  0.904  0.826
ISIC18  ISIC18  0.862  0.986  0.967  0.924  0.889  0.800
ISIC18  DermIS  0.882  0.984  0.978  0.933  0.841  0.726
DermIS  PH2  0.516  0.996  0.880  0.756  0.676  0.510
DermIS  ISIC18  0.467  0.996  0.915  0.731  0.628  0.457
DermIS  DermIS  0.752  0.996  0.980  0.874  0.834  0.718
DermIS + ISIC18  PH2  0.923  0.956  0.948  0.940  0.896  0.812
PH2 + DermIS  ISIC18  0.699  0.989  0.944  0.844  0.795  0.660
PH2 + ISIC18  DermIS  0.893  0.981  0.975  0.937  0.828  0.706
ALL  ALL  0.866  0.986  0.968  0.926  0.893  0.807

Table 6  Comparison of the proposed method with related methods on the PH2 dataset

References  SEN  ESP  ACC  AUC  DSC  JAC
Barata et al. 2013 [12]  0.904  0.970  0.928  –  0.900  0.837
Fan et al. 2017 [21]  –  –  –  –  0.893  –
Ahn et al. 2017 [22]  –  –  –  –  0.910  –
Al-Masni et al. 2018 [23]  0.937  0.956  0.950  –  0.917  0.847
Aljanabi et al. 2018 [24]  0.955  0.984  0.960  –  0.922  0.852
Peng et al. 2019 [25]  0.870  0.970  0.930  –  0.900  0.850
Goyal et al. 2019 [26]  0.929  0.932  0.938  –  0.907  0.839
Proposed method (U-net)  0.905  0.982  0.963  0.944  0.923  0.858
Proposed method (LinkNet)  0.870  0.980  0.953  0.925  0.901  0.820

Table 7  Comparison of the proposed method with related methods on the DermIS dataset

References  SEN  ESP  ACC  AUC  DSC  JAC
Khan et al. 2019 [27]  –  –  0.940  –  –  –
Santos et al. 2020 [29]  –  –  0.975  –  0.834  –
Araújo et al. 2021 [31]  0.923  0.987  0.983  0.955  0.871  0.774
Proposed method (U-net)  0.839  0.995  0.985  0.917  0.879  0.785
Proposed method (LinkNet)  0.752  0.996  0.980  0.874  0.834  0.718

Table 8  Comparison of the proposed method with related methods on the ISIC 2018 dataset

References                  SEN    ESP    ACC    AUC    DSC    JAC
Abraham and Khan 2018 [28]  –      –      –      –      0.856  –
Amin et al. 2020 [6]        1.000  0.900  –      –      0.820  0.850
Guo et al. 2020 [30]        –      –      0.950  –      0.864  0.776
Nazi and Abir 2020 [7]      –      –      –      –      0.870  0.800
Araujo et al. 2021 [31]     0.866  0.983  0.965  0.924  0.883  0.791
Proposed Method (U-net)     0.866  0.987  0.968  0.927  0.893  0.807
Proposed Method (LinkNet)   0.862  0.986  0.967  0.924  0.889  0.800

other targeting networks. We also found that transfer learning and fine-tuning techniques positively impact the models, making them learn even with a few unbalanced images and classes. We observed that the results obtained are promising, competing with the best works found in the state-of-the-art.
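All of the overlap metrics reported in Tables 5–8 (SEN, ESP, ACC, DSC, JAC) are derived from the pixel-wise confusion matrix between the predicted and ground-truth binary masks; AUC additionally requires the network's soft probability maps and is therefore omitted below. The following is a minimal sketch of those computations, not the authors' code, and the function names are ours:

```python
# Overlap metrics of Tables 5-8, computed from a predicted and a
# ground-truth binary mask (flattened to sequences of 0s and 1s).
# A minimal sketch; function names are ours, not from the original code.

def confusion_counts(pred, truth):
    """Return (TP, FP, TN, FN) for two equal-length binary sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, tn, fn

def segmentation_metrics(pred, truth):
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return {
        "SEN": tp / (tp + fn),                   # sensitivity (recall)
        "ESP": tn / (tn + fp),                   # specificity
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # accuracy
        "DSC": 2 * tp / (2 * tp + fp + fn),      # Dice similarity coefficient
        "JAC": tp / (tp + fp + fn),              # Jaccard index (IoU)
    }

# Toy example: a 3x3 lesion mask flattened row by row.
truth = [0, 1, 1,
         0, 1, 1,
         0, 0, 0]
pred  = [0, 1, 1,
         0, 1, 0,
         0, 0, 0]
m = segmentation_metrics(pred, truth)
```

Note that DSC and JAC ignore true negatives, which is why they separate the methods far more sharply than ACC or ESP on lesions that occupy a small fraction of the image, as the DermIS rows of Table 5 illustrate.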


Even with promising results, our technique still has some limitations. The device used to capture images influences characteristics such as resolution, colors, sharpness, and lighting. In this sense, further tests would be necessary to ensure that the technique performs well on images captured by devices other than those used in the datasets. In addition, the variation in skin color across different populations presents itself as a limiting factor in the reproducibility of the results, motivating future investigations. Moreover, according to [46], the white population has approximately ten times more risk of developing cutaneous melanoma than the black, Asian, or Hispanic population.

Our work also had some shortcomings that we can eliminate in the future. We have not investigated the impact of pre-processing and data augmentation, to reduce computational cost. However, in future work, we must perform new tests using these techniques, because pre-processing can remove noise and other elements inherent to the dataset, which can improve the level of learning of the model on the characteristics of the disease. In addition, the use of data augmentation can reduce the occurrence of overfitting in the model and make it better at treating unbalanced classes.

5 Conclusion and future works

The segmentation stage is a difficult process and fundamental to the success of diagnostic methods for dermoscopic images. In this research, we present a melanoma segmentation approach that uses the U-net and LinkNet deep learning networks combined with transfer learning and fine-tuning techniques. In addition, we investigate the networks' ability to learn the characteristics of the disease, or only specific characteristics of the datasets, through dataset combination tests.

We performed experiments with three public datasets (PH2, ISIC 2018, and DermIS). The U-net network proved to be more efficient than LinkNet for the configurations used. We also found that the use of fine-tuning considerably improves the accuracy of our two models. With U-net, we obtained results that compete with the best results found in the literature, with mean DSC = 0.923 on the PH2 dataset, DSC = 0.893 on ISIC 2018, and DSC = 0.879 on DermIS.

When analyzing the tests with combinations of datasets, we concluded that training the model with a single dataset containing few images is not enough to create a generalist model that learns to segment the disease; it learns only the characteristics inherent to the dataset. This was evident in training with the PH2 dataset, which, when tested on other datasets, obtained unsatisfactory results. With the ISIC 2018 dataset, the network was able to adapt more to the disease, obtaining promising results on other datasets.

This occurrence indicates that the PH2 dataset has little diversity between the images and specific characteristics of the base, making the model achieve good accuracy on it and poor accuracy on other sets. The ISIC 2018 dataset, on the other hand, has a considerably larger number of images, indicating greater diversity and enabling the network to learn characteristics inherent to the disease. The results also point out that the combination of the three datasets results in a more general model.

The results obtained in this work are promising, but they can still be improved. For this, in future works, we will evaluate the impact that different data augmentation techniques have on the network's generalization capacity. We will also investigate whether the impact of our segmentation is positive during the lesion classification stage. Additionally, we will evaluate other classification techniques applied to the detection of melanoma, such as the convolutional neural networks ResNet, DenseNet, and Inception, and the new Capsule networks.

References

1. Memon, M.H., Khan, A., Li, J.-P., Shaikh, R.A., Memon, I., Deep, S.: Content based image retrieval based on geo-location driven image tagging on the social web, in: 2014 11th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), IEEE, 2014, pp. 280–283
2. Memon, I.: Authentication user's privacy: An integrating location privacy protection algorithm for secure moving objects in location based services. Wireless Personal Communications 82(3), 1585–1600 (2015)
3. Memon, M.H., Li, J.-P., Memon, I., Arain, Q.A.: Geo matching regions: multiple regions of interests using content based image retrieval based on relative locations. Multimedia Tools Appl. 76(14), 15377–15411 (2017)
4. Said, Y., Atri, M.: Efficient and high-performance pedestrian detector implementation for intelligent vehicles. IET Intell. Trans. Syst. 10(6), 438–444 (2016)
5. Arain, Q.A., Memon, H., Memon, I., Memon, M.H., Shaikh, R.A., Mangi, F.A.: Intelligent travel information platform based on location base services to predict user travel behavior from user-generated gps traces. Int. J. Comput. Appl. 39(3), 155–168 (2017)
6. Amin, J., Sharif, A., Gul, N., Anjum, M.A., Nisar, M.W., Azam, F., Bukhari, S.A.C.: Integrated design of deep features fusion for localization and classification of skin cancer. Pattern Recognit. Lett. 131, 63–70 (2020)
7. Al Nazi, Z., Abir, T.A.: Automatic skin lesion segmentation and melanoma detection: Transfer learning approach with u-net and dcnn-svm, in: Proceedings of International Joint Conference on Computational Intelligence, Springer, 2020, pp. 371–381
8. WHO, World Health Organization, https://www.who.int/news-room/q-a-detail/ultraviolet-(uv)-radiation-and-skin-cancer, online; accessed 04 August 2020 (2020)
9. Haenssle, H.A., Fink, C., Schneiderbauer, R., Toberer, F., Buhl, T., Blum, A., Kalloo, A., Hassen, A.B.H., Thomas, L., Enk, A., et al.: Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann. Oncol. 29(8), 1836–1842 (2018)
ease, obtaining promising results in other datasets.


10. Argenziano, G., Soyer, H.P.: Dermoscopy of pigmented skin lesions—a valuable tool for early diagnosis of melanoma. Lancet Oncol. 2(7), 443–449 (2001)
11. Moura, N., Veras, R., Aires, K., Machado, V., Silva, R., Araújo, F., Claro, M.: Abcd rule and pre-trained cnns for melanoma diagnosis. Multimedia Tools Appl. 78(6), 6869–6888 (2019)
12. Barata, C., Ruela, M., Francisco, M., Mendonça, T., Marques, J.S.: Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE Syst. J. 8(3), 965–979 (2013)
13. Tang, P., Liang, Q., Yan, X., Xiang, S., Sun, W., Zhang, D., Coppola, G.: Efficient skin lesion segmentation using separable-unet with stochastic weight averaging. Comput. Methods Programs Biomed. 178, 289–301 (2019)
14. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241
15. Chaurasia, A., Culurciello, E.: Linknet: Exploiting encoder representations for efficient semantic segmentation, in: 2017 IEEE Visual Communications and Image Processing (VCIP), IEEE, 2017, pp. 1–4
16. Hosny, K.M., Kassem, M.A., Foaud, M.M.: Classification of skin lesions using transfer learning and augmentation with alex-net. PloS One 14(5), e0217293 (2019)
17. Yakubovskiy, P.: Segmentation models, https://github.com/qubvel/segmentation_models (2019)
18. Keras, Transfer learning & fine-tuning, https://keras.io/guides/transfer_learning, online; accessed 20 October 2020 (2020)
19. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybernet. 9(1), 62–66 (1979)
20. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Oakland, CA, USA, 1967, pp. 281–297
21. Fan, H., Xie, F., Li, Y., Jiang, Z., Liu, J.: Automatic segmentation of dermoscopy images using saliency combined with otsu threshold. Comput. Biol. Med. 85, 75–85 (2017)
22. Ahn, E., Kim, J., Bi, L., Kumar, A., Li, C., Fulham, M., Feng, D.D.: Saliency-based lesion segmentation via background detection in dermoscopic images. IEEE J. Biomed. Health Informatics 21(6), 1685–1693 (2017)
23. Al-Masni, M.A., Al-Antari, M.A., Choi, M.-T., Han, S.-M., Kim, T.-S.: Skin lesion segmentation in dermoscopy images via deep full resolution convolutional networks. Comput. Methods Programs Biomed. 162, 221–231 (2018)
24. Aljanabi, M., Özok, Y.E., Rahebi, J., Abdullah, A.S.: Skin lesion segmentation method for dermoscopy images using artificial bee colony algorithm. Symmetry 10(8), 347 (2018)
25. Peng, Y., Wang, N., Wang, Y., Wang, M.: Segmentation of dermoscopy image using adversarial networks. Multimedia Tools Appl. 78(8), 10965–10981 (2019)
26. Goyal, M., Oakley, A., Bansal, P., Dancey, D., Yap, M.H.: Skin lesion segmentation in dermoscopic images with ensemble deep learning methods. IEEE Access 8, 4171–4181 (2019)
27. Khan, M.Q., Hussain, A., Rehman, S.U., Khan, U., Maqsood, M., Mehmood, K., Khan, M.A.: Classification of melanoma and nevus in digital images for diagnosis of skin cancer. IEEE Access 7, 90132–90144 (2019)
28. Abraham, N., Khan, N.M.: A novel focal tversky loss function with improved attention u-net for lesion segmentation, in: IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), IEEE, 2019, pp. 683–687
29. Santos, E., Veras, R., Miguel, H., Aires, K., Claro, M.L., Junior, G.B.: A skin lesion semi-supervised segmentation method, in: 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), IEEE, 2020, pp. 33–38
30. Guo, X., Chen, Z., Yuan, Y.: Complementary network with adaptive receptive fields for melanoma segmentation, in: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), IEEE, 2020, pp. 2010–2013
31. Araújo, R.L., Ricardo de Andrade, L.R., Rodrigues, J.J., e Silva, R.R.: Automatic segmentation of melanoma skin cancer using deep learning, in: 2020 IEEE International Conference on E-health Networking, Application & Services (HEALTHCOM), IEEE, 2021, pp. 1–6
32. Al-Masni, M.A., Al-Antari, M.A., Park, J.-M., Gi, G., Kim, T.-Y., Rivera, P., Valarezo, E., Choi, M.-T., Han, S.-M., Kim, T.-S.: Simultaneous detection and classification of breast masses in digital mammograms via a deep learning yolo-based cad system. Comput. Methods Programs Biomed. 157, 85–94 (2018)
33. Mendonca, T., Celebi, M., Mendonca, T., Marques, J.: Ph2: A public database for the analysis of dermoscopic images, in: Dermoscopy Image Analysis, CRC Press, 2015
34. Codella, N., Rotemberg, V., Tschandl, P., Celebi, M.E., Dusza, S., Gutman, D., Helba, B., Kalloo, A., Liopyris, K., Marchetti, M., et al.: Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic), arXiv preprint arXiv:1902.03368
35. Tschandl, P., Rosendahl, C., Kittler, H.: The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 (2018)
36. DermIS, Dermatology information system, https://www.dermis.net/dermisroot/en/home/index.htm, online; accessed 25 June 2020 (2020)
37. Bi, L., Kim, J., Ahn, E., Kumar, A., Feng, D., Fulham, M.: Step-wise integration of deep class-specific learning for dermoscopic image segmentation. Pattern Recognit. 85, 78–89 (2019)
38. Imagenet, About imagenet, http://www.image-net.org/about-overview, online; accessed 22 October 2020 (2020)
39. Zeng, C., Zhu, Z., Liu, G., Hu, W., Wang, X., Yang, C., Wang, H., He, D., Tan, J.: Randomized, double-blind, placebo-controlled trial of oral enalapril in patients with neurally mediated syncope. Am. Heart J. 136(5), 852–858 (1998)
40. Provost, F., Domingos, P.: Well-trained pets: Improving probability estimation trees, Technical report IS-00-04, Stern School of Business, New York University
41. Ginsberg, J.R., Young, T.P.: Measuring association between individuals or groups in behavioural studies. Animal Behaviour 44(1), 377–379 (1992)
42. Hamers, L., et al.: Similarity measures in scientometric research: The jaccard index versus salton's cosine formula. Information Processing and Management 25(3), 315–318 (1989)
43. Burman, P.: A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika 76(3), 503–514 (1989)
44. Chakkaravarthy, A.P., Chandrasekar, A.: Automatic detection and segmentation of melanoma using fuzzy c-means, in: 2019 Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), Vol. 1, IEEE, 2019, pp. 132–136
45. Dey, N., Rajinikanth, V., Ashour, A.S., Tavares, J.M.R.: Social group optimization supported segmentation and evaluation of skin melanoma images. Symmetry 10(2), 51 (2018)
46. Rastrelli, M., Tropea, S., Rossi, C.R., Alaibac, M.: Melanoma: epidemiology, risk factors, pathogenesis, diagnosis and classification. In Vivo 28(6), 1005–1011 (2014)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
