
sensors

Article
Skin Lesion Analysis towards Melanoma Detection
Using Deep Learning Network
Yuexiang Li 1,2 and Linlin Shen 1,2,*

1 Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University,
Shenzhen 518060, China; yuexiang.li@szu.edu.cn
2 Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University,
Shenzhen 518060, China
* Correspondence: llshen@szu.edu.cn

Received: 19 December 2017; Accepted: 8 February 2018; Published: 11 February 2018

Abstract: Skin lesions are a severe disease globally. Early detection of melanoma in dermoscopy images significantly increases the survival rate. However, accurate recognition of melanoma is extremely challenging due to the low contrast between lesions and skin, the visual similarity between melanoma and non-melanoma lesions, etc. Hence, reliable automatic detection of skin tumors is very useful for increasing the accuracy and efficiency of pathologists. In this paper, we propose two deep learning methods to address the three main tasks emerging in the area of skin lesion image processing, i.e., lesion segmentation (task 1), lesion dermoscopic feature extraction (task 2) and lesion classification (task 3). A deep learning framework consisting of two fully convolutional residual networks (FCRN) is proposed to simultaneously produce the segmentation result and the coarse classification result. A lesion index calculation unit (LICU) is developed to refine the coarse classification results by calculating a distance heat-map. A straightforward CNN is proposed for the dermoscopic feature extraction task. The proposed deep learning frameworks were evaluated on the ISIC 2017 dataset. Experimental results show the promising performance of our frameworks: scores of 0.753 for task 1 (Jaccard index), 0.848 for task 2 (AUC) and 0.912 for task 3 (AUC) were achieved.

Keywords: skin lesion classification; melanoma recognition; deep convolutional network; fully-convolutional residual network

1. Introduction
Melanoma is the most deadly form of skin cancer and accounts for about 75% of deaths associated with skin cancer [1]. Accurate recognition of melanoma at an early stage can significantly increase the survival rate of patients. However, manual detection of melanoma places a huge demand on well-trained specialists and suffers from inter-observer variation. A reliable automatic system for melanoma recognition, which increases the accuracy and efficiency of pathologists, is therefore worth developing.
The dermoscopy technique has been developed to improve the diagnostic performance for melanoma. Dermoscopy is a noninvasive skin imaging technique that acquires a magnified and illuminated image of a skin region for increased clarity of the spots [2]; it enhances the visibility of the skin lesion by removing surface reflection. Nevertheless, automatic recognition of melanoma from dermoscopy images is still a difficult task, as it presents several challenges. First, the low contrast between skin lesions and the surrounding normal skin makes it difficult to segment accurate lesion areas. Second, melanoma and non-melanoma lesions may have a high degree of visual similarity, making it difficult to distinguish melanoma from non-melanoma lesions. Third, the variation of skin conditions among patients, e.g., skin color, natural hairs or veins, produces different appearances of melanoma in terms of color, texture, etc.


Skin lesion segmentation is the essential step for most classification approaches. A recent review of automated skin lesion segmentation algorithms can be found in [3]. Accurate segmentation can benefit
the accuracy of subsequent lesion classification. Extensive studies [4–12] have been made to produce
decent lesion segmentation results. For example, Gomez et al. proposed an unsupervised algorithm,
named Independent Histogram Pursuit (IHP), for the segmentation of skin lesion [13]. The algorithm
was tested on five different dermatological datasets, and achieved a competitive accuracy close to 97%.
Zhou et al. developed several mean-shift-based approaches for segmenting skin lesions in dermoscopic
images [14–16]. Garnavi et al. proposed an automated segmentation approach for skin lesion using
optimal color channels and hybrid thresholding technique [17]. In more recent research, Pennisi et al.
employed Delaunay Triangulation to extract binary masks of skin lesion regions, which does not
require any training stage [18]. Ma proposed a novel deformable model using a newly defined speed
function and stopping criterion for skin lesion segmentation, which is robust against noise and yields
effective and flexible segmentation performance [19]. Yu et al. used a deep learning approach, i.e., a fully
convolutional residual network (FCRN), for skin lesion segmentation in dermoscopy images [20].
Based on the segmentation results, hand-crafted features can be extracted for melanoma
recognition. Celebi et al. extracted several features, including color and texture from segmented lesion
region for skin lesion classification [21]. Schaefer used an automatic border detection approach [22] to
segment the lesion area and then assembled the extracted features, i.e., shape, texture and color, for
melanoma recognition [23]. On the other hand, some investigations [24] have attempted to directly
employ hand-crafted features for melanoma recognition without a segmentation step. Different
from approaches using hand-crafted features, deep learning networks use hierarchical structures to
automatically extract features. Due to the breakthroughs made by deep learning in an increasing
number of image-processing tasks [25–28], some research has started to apply deep learning approaches
for melanoma recognition. Codella et al. proposed a hybrid approach, integrating convolutional
neural network (CNN), sparse coding and support vector machines (SVMs) to detect melanoma [29].
In recent research, Codella and his colleagues established a system combining recent developments in
deep learning and machine learning approaches for skin lesion segmentation and classification [30].
Kawahara et al. employed a fully convolutional network to extract multi-scale features for melanoma
recognition [31]. Yu et al. applied a very deep residual network to distinguish melanoma from
non-melanoma lesions [20].
Although lots of work has been proposed, there is still a margin of performance improvement for
both skin lesion segmentation and classification. The International Skin Imaging Collaboration (ISIC)
is a collaboration focusing on the automatic analysis of skin lesions, and has continuously expanded its
datasets since 2016. In ISIC 2017, annotated datasets for three processing tasks related to skin lesion
images, including lesion segmentation, dermoscopic feature extraction and lesion classification, were
released for researchers to promote the accuracy of automatic melanoma detection methods. Different
from the extensively studied lesion segmentation and classification, dermoscopic feature extraction is
a new task in the area. Consequently, few studies have been proposed to address the problem.
In this paper, we proposed deep learning frameworks to address the three main processing tasks
of skin lesion images proposed in ISIC 2017. The main contributions of this paper can be summarized
as follows:

(1) Existing deep learning approaches commonly use two networks to separately perform lesion
segmentation and classification. In this paper, we proposed a framework consisting of
multi-scale fully-convolutional residual networks and a lesion index calculation unit (LICU)
to simultaneously address lesion segmentation (task 1) and lesion classification (task 3).
The proposed framework achieved excellent results in both tasks. Henceforth, the proposed framework is named the Lesion Indexing Network (LIN).
(2) We proposed a CNN-based framework, named Lesion Feature Network (LFN), to address task 2, i.e., dermoscopic feature extraction. Experimental results demonstrate the competitive performance of our framework. To the best of our knowledge, we are not aware of any previous work proposed for this task. Hence, this work may become the benchmark for the following related research in the area.
(3) We made detailed analysis of the proposed deep learning frameworks in several respects, e.g., the performances of networks with different depths, and the impact caused by adding different components (e.g., batch normalization, weighted softmax, etc.). This work provides useful guidelines for the design of deep learning networks in related medical research.
2. Methods

In this section, we introduce the deep learning methods developed for different tasks.
2.1. Lesion Segmentation and Classification (Task 1 & 3)

2.1.1. Pre-Processing

The original training set contains 2000 skin lesion images of different resolutions. The resolutions of some lesion images are above 1000 × 700, which requires a high cost of computation. It is necessary to rescale the lesion images for the deep learning network. As directly resizing images may distort the shape of the skin lesion, we first cropped the center area of the lesion image and then proportionally resized the area to a lower resolution. The size of the center square was set to be 0.8 of the height of the image, and it was automatically cropped with reference to the image center. As illustrated in Figure 1, this approach not only enlarges the lesion area for feature detection, but also maintains the shape of the skin lesion.

Figure 1. Pre-processing for the skin lesion image. First crop the center area and then proportionally resize to a lower resolution. (The numbers of image size are measured in pixels.)
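For illustration, the following Python sketch (not the authors' MatConvNet implementation) performs the described crop-and-resize step with Pillow; the 0.8 crop ratio follows the text, the 320 × 320 target size follows the training resolution quoted in Section 3.3.1, and the function name is hypothetical.

    from PIL import Image

    def preprocess_lesion_image(path, crop_ratio=0.8, target_side=320):
        """Crop a centered square whose side is crop_ratio * image height,
        then proportionally resize the crop to target_side x target_side."""
        img = Image.open(path)
        w, h = img.size
        side = min(int(crop_ratio * h), w, h)   # square side: 0.8 of the image height
        left = (w - side) // 2                  # crop with reference to the image center
        top = (h - side) // 2
        crop = img.crop((left, top, left + side, top + side))
        # the crop is square, so a proportional resize keeps the lesion shape
        return crop.resize((target_side, target_side), Image.BILINEAR)

    # Example usage (hypothetical file name):
    # sample = preprocess_lesion_image("ISIC_0000000.jpg")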

2.1.2. Data Augmentation

The dataset contains three categories of skin lesion, i.e., Melanoma, Seborrheic keratosis and Nevus. As the number of images of the different categories varies widely, we accordingly rotated the images belonging to different classes to establish a class-balanced dataset. The dataset augmented with this step is denoted as DR. The numbers of images of the original training set and DR are listed in Table 1. The numbers in the brackets after the category names are the angles for each rotation.

Table 1. Detailed information of data augmentation (task 1 & 3).

             Melanoma (18°)   Seborrheic Keratosis (18°)   Nevus (45°)
Original     374              254                          1372
DR           7480             5080                         10,976

The images in DR are randomly flipped along the x- or y-axis to establish another, paired dataset, called DM. The two datasets are separately used to train the FCRNs.
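A minimal sketch of the class-balancing rotation and the random mirror operation described above; the per-class rotation steps follow Table 1, while the SciPy-based implementation and the function names are illustrative assumptions.

    import numpy as np
    from scipy import ndimage

    # Rotation step per class (degrees), as given in Table 1.
    ROTATION_STEP = {"melanoma": 18, "seborrheic_keratosis": 18, "nevus": 45}

    def rotations_for_dr(image, label):
        """Return the rotated copies of one image used to build the balanced set DR
        (e.g., 360/18 = 20 copies per Melanoma image: 374 x 20 = 7480)."""
        step = ROTATION_STEP[label]
        return [ndimage.rotate(image, angle, reshape=False, mode="nearest")
                for angle in range(0, 360, step)]

    def random_mirror(image, rng=None):
        """Random flip along the x- or y-axis, used to derive the paired dataset DM."""
        rng = rng or np.random.default_rng()
        return np.flip(image, axis=int(rng.integers(0, 2)))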

2.1.3. Lesion Indexing Network (LIN)

Network Architecture

The fully convolutional residual network, i.e., FCRN-88, proposed in our previous work [32], which outperforms the FCRN-50 and FCRN-101 [33], was extended to simultaneously address the tasks of lesion segmentation and classification in this paper. In the previous work [32], a novel residual in residual (RiR) module (Figure 2c) was proposed to replace the original residual module (Figure 2a,b) to better address the gradient vanishing problem as the network goes deeper. Using the RiR module, the original FCRN-50 was transformed into a deeper model, i.e., FCRN-88. The improved FCRN-88 achieves new state-of-the-art results for the segmentation of HEp-2 specimen images.

Figure 2. Residual building blocks. (a) Plain identity shortcut; (b) Bottleneck; (c) Residual in Residual (RiR). (a,b) are adopted in the original FCRN-50 and FCRN-101.
Based on FCRN-88, we construct a Lesion Indexing Network (LIN) for skin lesion image analysis. The flowchart of LIN is presented in Figure 3. Two FCRNs trained with datasets using different data augmentation methods are involved. The lesion index calculation unit (LICU) is designed to refine the probabilities for Melanoma, Seborrheic keratosis and Nevus.

Figure 3. Flowchart of the Lesion Indexing Network (LIN). The framework contains two FCRNs and a calculation unit for the lesion index. (The numbers of image size are measured in pixels.)

In the testing stage, as the fully convolutional network accepts inputs with different sizes, we proportionally resize the skin lesion images to two scales, i.e., ~300 × 300 and ~500 × 500, and send them to the FCRNs, respectively. The results of the different scales are interpolated to the original resolution of the testing image and summed up to yield the coarse possibility maps. The LICU employs a distance map representing the importance of each pixel to refine the coarse possibilities of skin lesions.
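The following sketch outlines this multi-scale inference step; the `model.predict` call standing in for a trained FCRN-88 that returns per-pixel possibility maps for the three classes is an assumption, as is the use of scikit-image for resizing.

    import numpy as np
    from skimage.transform import resize

    def coarse_possibility_maps(models, image, scales=(300, 500)):
        """Proportionally resize the image to ~300 and ~500, run each FCRN-88 on its
        scale, interpolate the outputs back to the original resolution and sum them."""
        h, w = image.shape[:2]
        summed = np.zeros((h, w, 3), dtype=np.float64)
        for model, target in zip(models, scales):
            factor = target / min(h, w)
            resized = resize(image, (round(h * factor), round(w * factor)),
                             preserve_range=True, anti_aliasing=True)
            maps = model.predict(resized)                      # (h', w', 3), assumed API
            summed += resize(maps, (h, w), preserve_range=True)
        return summed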
The reason for using separate FCRN-88s trained on different datasets, i.e., DR and DM, is that we found the 'mirror' operation seems to fool the FCRN-88 during training. The segmentation and classification accuracies on the validation set verified our findings, i.e., the separate networks provide better segmentation and classification performance than that of a single FCRN-88 trained on DR + DM.

Lesion Index Calculation Unit (LICU)

As the accurate possibility maps of the different lesion categories of a skin lesion image provide useful information for pathologists, we proposed a component, named the Lesion Index Calculation Unit (LICU), to refine the coarse skin lesion possibility maps from the FCRNs.
First, the coarse possibility maps after summation need to be normalized to [0, 1]. Let v_i(x, y) be the value at (x, y) in the i-th coarse map; the normalized possibility for skin lesions (p_i) can be deduced by:

p_i(x, y) = \frac{v_i(x, y) - \min_{i \in \{1,2,3\}} v_i(x, y)}{\sum_{i=1}^{3} \left( v_i(x, y) - \min_{i \in \{1,2,3\}} v_i(x, y) \right)}, \quad i \in \{1, 2, 3\}   (1)
Each pixel in the lesion area has a different importance for lesion classification. It can be observed from Figure 4a,c that the area near the lesion border of some skin lesion images has a more similar appearance, i.e., color/texture, to skin than that of the center area. The blue lines in Figure 4a,c are the borders of lesions produced by LIN. The lesion area with similar features to skin may provide less information for lesion recognition. Hence, the distances from pixels to the nearest border are used to represent the importance of pixels for lesion classification. Examples of distance maps are shown in Figure 4b,d. The colors in the distance map represent the weights for the corresponding pixels. The distance map is multiplied with each of the normalized coarse possibility maps to generate refined maps. Finally, we average the possibilities in the lesion area of the refined maps to obtain the indexes for the different categories of skin lesion.

Figure 4. Examples of skin lesion images with outlines (blue) and distance maps. The first column (a,c) shows the original lesion images and the second (b,d) shows the corresponding distance maps. The scales for the original lesion images are about 1300 pixels × 1000 pixels and 1000 pixels × 800 pixels, respectively. The numbers of image size of distance maps are measured in pixels. The numbers in the color-bar represent the corresponding weights.
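A compact sketch of the LICU computation, combining the normalization of Eq. (1) with the distance-map weighting described above; the use of SciPy's Euclidean distance transform to obtain the distance to the nearest lesion border is an assumption, as are the function and argument names.

    import numpy as np
    from scipy import ndimage

    def lesion_indexes(coarse_maps, lesion_mask):
        """coarse_maps: (H, W, 3) summed possibility maps; lesion_mask: (H, W) binary
        segmentation. Returns the three refined lesion indexes."""
        # Eq. (1): per-pixel normalization of the three class possibilities to [0, 1]
        shifted = coarse_maps - coarse_maps.min(axis=2, keepdims=True)
        normalized = shifted / np.maximum(shifted.sum(axis=2, keepdims=True), 1e-12)
        # distance from every lesion pixel to the nearest border (background) pixel
        distance = ndimage.distance_transform_edt(lesion_mask.astype(bool))
        refined = normalized * distance[..., None]     # weight pixels by importance
        inside = lesion_mask.astype(bool)
        return refined[inside].mean(axis=0)            # average over the lesion area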
Implementation

The proposed LIN is established using the MatConvNet toolbox [34]. While 80% of the training dataset is used for training, the remainder is used for validation. The FCRNs were individually trained with a mini-batch size of 128 on one GPU (GeForce GTX TITAN X, 12 GB RAM). The details of the training setting are the same as in [32]. We stopped the network training early, after 6 epochs, to overcome the overfitting problem.

2.2. Dermoscopic Feature Extraction (Task 2)

Dermoscopic feature extraction is a new task announced in ISIC 2017, which aims to extract clinical features from dermoscopic images. Little previous work has addressed this task. In this section, we introduce a CNN-based approach, i.e., the Lesion Feature Network (LFN), developed to address the challenge.

2.2.1. Superpixel Extraction

The ISIC dermoscopic images contain four kinds of dermoscopic features, i.e., Pigment Network (PN), Negative Network (NN), Streaks (S) and Milia-like Cysts (MC). To locate the positions of the dermoscopic features, the dermoscopic images were subdivided into superpixels using the algorithm introduced in [35]. An example is shown in Figure 5. The original skin lesion image (Figure 5a) was divided into 996 superpixel areas (Figure 5b), which are separated by black lines.
Each superpixel area can be classified into one of five categories: the four kinds of dermoscopic features and background (B). Hence, the problem of feature extraction is converted to the classification of superpixel areas. We extract the content of each superpixel according to [35] and resize them to a uniform size, i.e., 56 × 56, for the proposed Lesion Feature Network.

Figure 5. Example of superpixels. The original image (a) was subdivided into 996 pieces of superpixel areas (b) separated by black lines. The scale for the lesion image is 1022 pixels × 767 pixels.
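As an illustration of the patch-extraction step, the sketch below uses SLIC superpixels from scikit-image as a stand-in for the superpixel algorithm of [35] (which we do not reproduce here) and resizes the bounding box of each superpixel to 56 × 56.

    import numpy as np
    from skimage.segmentation import slic
    from skimage.transform import resize

    def extract_superpixel_patches(image, n_segments=996, patch_size=56):
        """Subdivide the image into superpixels and return one patch per superpixel."""
        labels = slic(image, n_segments=n_segments, start_label=0)
        patches = []
        for label in range(labels.max() + 1):
            rows, cols = np.nonzero(labels == label)
            if rows.size == 0:
                continue
            box = image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
            patches.append(resize(box, (patch_size, patch_size), preserve_range=True))
        return np.stack(patches)                     # (n_patches, 56, 56, channels)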

2.2.2. Data Augmentation

The extracted patch dataset is extremely imbalanced: most of the patches only contain background information. Hence, data augmentation processing is needed to balance the number of images of the different categories. Two processing techniques, i.e., random sampling and patch rotation, were adopted. The numbers of images of the original and augmented patch datasets are listed in Table 2.

Table 2. Detailed information of data augmentation (task 2).

                          Original   Random Sample + Rotation
Background (B)            >90,000    87,089
Pigment Network (PN)      >80,000    77,325
Negative Network (NN)     ~3000      12,908
Milia-like Cysts (MC)     ~5000      18,424
Streaks (S)               ~2000      8324

Random Sample

As listed in Table 2, the volume of the original background patches is much larger than that of the other categories. However, most of the background patches contain similar contents; hence, the background patches contain lots of redundant information. To remove the redundancy and decrease the patch volume, the background patches for LFN training are randomly selected from the original patch dataset, which ultimately formed a set of 87,089 background patches.
Due to the extremely large volume of Pigment Network (PN) patches in the original patch dataset, the random sample operation was also applied to PN, resulting in a set of 77,325 PN patches.

Patch Rotation

The volumes of NN, MC and S patches are relatively small in the original dataset. Image rotation is employed to augment the volumes. Three angles, i.e., 90°, 180° and 270°, were adopted for patch rotation, which increases the patch volumes to 12,908, 18,424 and 8324 for NN, MC and S, respectively.

2.2.3. Lesion Feature Network (LFN)

The augmented training set was used to train our Lesion Feature Network (LFN), whose architecture is presented in Figure 6.

Figure 6. Flowchart of the Lesion Feature Network (LFN).

In Figure 6, the blue rectangles represent the convolutional layers, and the numbers give the number of kernels and the kernel size. LFN involves 12 convolutional layers for feature extraction, which can be separated into 4 stages, i.e., 3 convolutional layers per stage. As the 1 × 1 convolution can integrate the features extracted by the 3 × 3 convolutions for better feature representation, a network-in-network-like structure [36] is adopted for each stage. FC is the fully connected layer. Both max pooling (MP) and average pooling (AP) are used, and the network was trained with the softmax loss, defined in (2):

L = \frac{1}{N} \sum_i L_i = \frac{1}{N} \sum_i - \log \left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right)   (2)

where f_j denotes the j-th element (j ∈ [1, K], K is the number of classes) of the vector of class scores f, y_i is the label of the i-th input feature and N is the number of training data.
Although the data augmentation operation was performed, the obtained training dataset is still imbalanced. To address the problem, weights are assigned to the different classes while calculating the softmax loss, to pay more attention to the classes with fewer samples. According to the numbers of images in the augmented training set, the weights are set to 1, 1, 5, 3 and 8 for B, PN, NN, MC and S, respectively.
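A sketch of the weighted softmax loss in the tf.keras API (the original LFN was built with the Keras toolbox, but this snippet is an illustrative reconstruction, not the authors' code); the class weights follow the values quoted above for B, PN, NN, MC and S.

    import tensorflow as tf
    from tensorflow import keras

    CLASS_WEIGHTS = tf.constant([1.0, 1.0, 5.0, 3.0, 8.0])   # B, PN, NN, MC, S

    def weighted_softmax_loss(y_true, y_pred):
        """Cross-entropy of Eq. (2), scaled per sample by the weight of its true class."""
        per_sample = keras.losses.categorical_crossentropy(y_true, y_pred)
        weights = tf.reduce_sum(y_true * CLASS_WEIGHTS, axis=-1)   # y_true is one-hot
        return per_sample * weights

    # model.compile(optimizer="sgd", loss=weighted_softmax_loss, metrics=["accuracy"])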
2.2.4. Implementation

The proposed LFN is developed using the Keras toolbox. The patch dataset is separated into a training set and a validation set according to the percentages of 80:20, respectively. The network is optimized by Stochastic Gradient Descent (SGD) [37] with an initial learning rate of 0.01 and a momentum of 0.9. The learning rate decreases with gamma = 0.1. The network was trained on a single GPU (GeForce GTX TITAN X, 12 GB RAM) and was observed to converge after 10 epochs of training.
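For reference, a hedged sketch of these optimization settings in tf.keras; the interval at which the learning rate is multiplied by gamma = 0.1 is not stated in the text, so the step of 4 epochs below is purely an assumption.

    from tensorflow import keras

    def make_sgd_and_schedule(initial_lr=0.01, momentum=0.9, gamma=0.1, step=4):
        """SGD with the stated learning rate and momentum, plus a step-decay schedule."""
        optimizer = keras.optimizers.SGD(learning_rate=initial_lr, momentum=momentum)
        def decay(epoch, lr):
            return lr * gamma if epoch > 0 and epoch % step == 0 else lr
        return optimizer, keras.callbacks.LearningRateScheduler(decay)

    # optimizer, scheduler = make_sgd_and_schedule()
    # model.compile(optimizer=optimizer, loss=weighted_softmax_loss)
    # model.fit(x_train, y_train, validation_split=0.2, epochs=10, callbacks=[scheduler])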

3. Performance Analysis

3.1. Datasets
We use the publicly available International Skin Imaging Collaboration (ISIC) 2017 dataset [38] for the experiments in this paper. ISIC 2017 provides 2000 skin lesion images as a training set with masks for segmentation, superpixel masks for dermoscopic feature extraction and annotations for classification. The lesion images are classified into three categories, Melanoma, Seborrheic keratosis and Nevus. Melanoma is a malignant skin tumor, which leads to a high death rate. The other two kinds of lesion, i.e., Seborrheic keratosis and Nevus, are benign skin tumors derived from different cells. Figure 7 presents lesion images from ISIC 2017 and their masks for the different tasks. The first row in Figure 7 shows the original skin lesion images. The second row shows the masks for lesion segmentation, while the third row shows the superpixel masks for dermoscopic feature extraction. ISIC 2017 also provides a publicly available validation set with another 150 skin lesion images for evaluation.
In this section, we analyze the performances of the proposed LIN and LFN on the ISIC 2017 validation set. The comparison with benchmark algorithms will be presented in the next section.

Figure 7. Examples of lesion images from ISIC 2017 and their masks (columns, left to right: Melanoma, Seborrheic keratosis, Nevus). The first row shows the original images of the different lesions. The second row shows the segmentation masks. The third row shows the superpixel masks for dermoscopic feature extraction. The scales for the lesion images are 1022 pixels × 767 pixels, 3008 pixels × 2000 pixels and 1504 pixels × 1129 pixels, respectively.

3.2. Evaluation Metrics

3.2.1. Lesion Segmentation

The ISIC recommends several metrics for performance evaluation, which include accuracy (AC), Jaccard index (JA), Dice coefficient (DI), sensitivity (SE) and specificity (SP). Let N_{tp}, N_{tn}, N_{fp} and N_{fn} represent the numbers of true positive, true negative, false positive and false negative, respectively. The criteria can be defined as:

AC = \frac{N_{tp} + N_{tn}}{N_{tp} + N_{fp} + N_{tn} + N_{fn}},   (3)

JA = \frac{N_{tp}}{N_{tp} + N_{fn} + N_{fp}}, \quad DI = \frac{2 \times N_{tp}}{2 \times N_{tp} + N_{fn} + N_{fp}},   (4)

SE = \frac{N_{tp}}{N_{tp} + N_{fn}}, \quad SP = \frac{N_{tn}}{N_{tn} + N_{fp}}.   (5)

In this paper, we mainly used the JA metric for the evaluation of segmentation performance. The other metrics are measured as reference.
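The five criteria can be computed directly from a pair of binary masks, as in the short sketch below (an illustration, not part of the ISIC evaluation code).

    import numpy as np

    def segmentation_metrics(pred_mask, gt_mask):
        """AC, JA, DI, SE and SP of Eqs. (3)-(5) from binary prediction/ground-truth masks."""
        pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
        tp = np.sum(pred & gt)
        tn = np.sum(~pred & ~gt)
        fp = np.sum(pred & ~gt)
        fn = np.sum(~pred & gt)
        return {
            "AC": (tp + tn) / (tp + fp + tn + fn),
            "JA": tp / (tp + fn + fp),
            "DI": 2 * tp / (2 * tp + fn + fp),
            "SE": tp / (tp + fn),
            "SP": tn / (tn + fp),
        }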
3.2.2. Dermoscopic Feature Extraction and Lesion Classification

The same evaluation metrics, i.e., AC, SE and SP, are employed to assess the performance of dermoscopic feature extraction and lesion classification. Average precision (AP), defined in [38], is also involved. In this paper, the primary metric for these two tasks is the area under the ROC curve, i.e., AUC, which is generated by evaluating the true positive rate (TPR), i.e., SE, against the false positive rate (FPR), defined in (6), at various threshold settings:

FPR = \frac{N_{fp}}{N_{tn} + N_{fp}} = 1 - SP   (6)

3.3. Lesion Indexing Network (LIN)

3.3.1. The Performance on Lesion Segmentation
To visually analyze the segmentation performance of the proposed LIN, some examples of its segmentation results are presented in Figure 8. The blue and red lines represent the segmentation outlines of LIN and the ground truths, respectively. The examples illustrate some primary challenges in the area of skin lesion image processing. The contrast between the lesion and the skin region is low in Figure 8b,c,f. Human hair near the lesion region of Figure 8d may influence the segmentation. The artificial scale measure in Figure 8a–c,e,f is another kind of noise information for lesion segmentation. Nevertheless, it can be observed from Figure 8 that the proposed Lesion Indexing Network yields satisfactory segmentation results for all of the challenging cases.

Figure 8. Examples of skin lesion segmentation results produced by LIN for the ISIC 2017 validation set. (a–d) are the results for Melanoma, while (e–h) are the results for Seborrheic keratosis and Nevus. The blue and red lines represent the segmentation results and ground truths.

Training with DR and DM

In the experiments, 'rotation' and 'mirror' operations were adopted to enlarge the training dataset for the Lesion Indexing Network. However, the FCRN-88 seems to be fooled by the 'mirror' operation. Figure 9 shows the loss curves of FCRN-88 trained with DR, DM and DR + DM, respectively. Note that 'trloss' represents the training loss and 'valoss' represents the validation loss. The validation loss of the FCRN-88 trained on DR/DM is stable around 0.2. In contrast, the loss of the FCRN-88 trained on DR + DM decreases to about 0.18 and then gradually increases to over 0.2. The FCRN-88 trained with DR + DM has the lowest training loss (green line) but the highest validation loss (cyan line) among the frameworks. This is because the samples of DR and DM are paired. The similar appearances of paired samples make the very deep FCRN-88 easily overfitted to the dataset.

Figure 9. Loss curves of LIN trained with DR, DM and DR + DM.

Table 3 listed the JA of single FCRN-88 trained on DR/DR + DM and our LIN evaluated on
ISIC 2017 validation set. For comparison convenience, the frameworks only take a single scale of
lesion images, i.e., ~300 × 300, as input. As shown in Table 3, due to the overfitting problem, the
JA of FCRN-88 trained with DR + DM is the lowest, i.e., 0.607. The proposed LIN achieves the best
performance, i.e., 0.710.

Table 3. JA of frameworks on ISIC 2017 validation set.

Model JA
FCRN-88 (DR) 0.697
FCRN-88 (DR + DM) 0.607
LIN (ours) 0.710

Experiments on the Multi-Scale Input Images


Taking computation efficiency into account, the original skin lesion images were cropped and
resized to 320 × 320 for network training. However, lesion images of larger scale (~500 × 500) provide
a clearer view of the lesion area, e.g., the texture, for feature extraction. To demonstrate the importance
of processing skin lesion images at multiple scales, a set of experiments were conducted. Three
scales of testing images were selected, i.e., ~300 × 300, ~500 × 500 and ~700 × 700, for comparison.
The comparison results are presented in Table 4.
For single scale, an input image of ~300 achieves the best performance on the ISIC validation
set, i.e., a JA of 0.710. Degradation of segmentation performance is observed when only using the
larger-scale images, i.e., degradations of 0.012 and 0.048 for ~500 and ~700, respectively. However, the
larger-scale input images can assist LIN to perform more accurate segmentation. The LIN using all of
three scales achieves the best JA, i.e., 0.753, which is 0.002 higher than the second-rank, i.e., LIN using
~300 and ~500. In consideration of computational efficiency, the LIN using ~300 and ~500 is preferable
for experiments and applications.

Table 4. JA of frameworks with different scales of inputs.

Model JA
LIN (~300) 0.710
LIN (~500) 0.698
LIN (~700) 0.662
LIN (~300 + ~500) 0.751
LIN (~300 + ~500 + ~700) 0.753

3.3.2. The Performance on Lesion Classification

Performance of LICU
Each pixel in the lesion images has a different importance for the final classification result. Although the FCRN-88 can simultaneously perform the segmentation and classification tasks, it assigns equal importance to all pixels. The Lesion Index Calculation Unit (LICU) measures the pixel importance
by distance map, and accordingly refines the possibility maps from FCRN-88s. Experiments were
conducted on the ISIC 2017 validation set to assess the performance of LICU. Table 5 lists the results.
Compared to the plain LIN, i.e., 0.891, the LICU component produces an improvement of 0.021 for
LIN, i.e., 0.912.

Table 5. AUC of frameworks with/without LICU.

Model AUC
LIN without LICU 0.891
LIN with LICU 0.912

3.4. Lesion Feature Network (LFN)

3.4.1. Analysis of Network Architecture


To analyze the influence caused by layer width, we transform the original LFN to two variations
for comparison, i.e., Narrow LFN and Wide LFN, the detailed information for which is listed in Table 6.

Table 6. Detailed information of the different LFNs (each entry gives the number of kernels and the kernel size).

LFN Narrow LFN Wide LFN


16, (3,3) 16, (3,3) 32, (3,3)
Stage 1 16, (1,1) 16, (1,1) 32, (1,1)
16, (3,3) 16, (3,3) 32, (3,3)
32, (3,3) 16, (3,3) 64, (3,3)
Stage 2 32, (1,1) 16, (1,1) 64, (1,1)
32, (3,3) 16, (3,3) 64, (3,3)
64, (3,3) 16, (3,3) 64, (3,3)
Stage 3 64, (1,1) 16, (1,1) 64, (1,1)
64, (3,3) 16, (3,3) 64, (3,3)
128, (3,3) 32, (3,3) 128, (3,3)
Stage 4 128, (1,1) 32, (1,1) 128, (1,1)
128, (3,3) 32, (3,3) 128, (3,3)
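For concreteness, the sketch below builds the LFN column of Table 6 in tf.keras, following the description in Section 2.2.3 (three convolutions per stage with a 1 × 1 convolution in the middle, BN before ReLU, and both max and average pooling); the exact pooling placement and the size of the fully connected layer are assumptions, so this is a reconstruction rather than the authors' code.

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_lfn(num_classes=5, stage_widths=(16, 32, 64, 128)):
        """LFN column of Table 6: four stages of (3x3, 1x1, 3x3) convolutions."""
        inputs = keras.Input(shape=(56, 56, 3))
        x = inputs
        for i, width in enumerate(stage_widths):
            for kernel in (3, 1, 3):                      # network-in-network-like stage
                x = layers.Conv2D(width, kernel, padding="same")(x)
                x = layers.BatchNormalization()(x)        # BN between conv and ReLU
                x = layers.ReLU()(x)
            if i < len(stage_widths) - 1:
                x = layers.MaxPooling2D(2)(x)             # MP between stages (assumed)
            else:
                x = layers.GlobalAveragePooling2D()(x)    # AP at the end (assumed)
        x = layers.Dense(128, activation="relu")(x)       # FC size is an assumption
        outputs = layers.Dense(num_classes, activation="softmax")(x)
        return keras.Model(inputs, outputs)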

The performances of the three LFNs were evaluated on the ISIC 2017 validation set, as reported in Table 7. By comparing the AUC of LFN and Narrow LFN, we notice that the narrower layers decrease the feature representation capacity of the framework. The AUC of Narrow LFN is 0.822, which is 0.026 lower than that of LFN, i.e., 0.848. On the other hand, layers that are too wide lead to the overfitting problem, which also decreases the performance of LFN. The AUC of Wide LFN (0.803) is 0.045 lower than that of the original LFN. Hence, the proposed LFN better balances the feature representation capacity of the framework against the network overfitting problem.

Table 7. AUC of LFNs on the validation set.

Model AUC
Narrow LFN 0.822
Wide LFN 0.803
LFN 0.848
LFN (without WSL) 0.778
LFN (without BN) 0.842

3.4.2. Performance of Weighted Softmax Loss (WSL)


Although a data augmentation approach was used to balance the sample volumes of different
categories, the generated training set is still imbalanced. Weighted softmax loss (WSL) is another
important tool for alleviating the influence caused by an imbalanced training set during network
training. As shown in Table 7, without using WSL, the AUC of LFN sharply decreases to 0.778, which
demonstrates the importance of weighted softmax loss.

3.4.3. Usage of Batch Normalization (BN)


Batch normalization (BN) [39] components can reduce internal covariate shift and accelerate the training process, and have been widely adopted in many deep learning frameworks, e.g., ResNet [33] and Inception [40]. In the proposed LFN, BN is adopted between the convolutional layers and the rectified linear unit layers. The results presented in Table 7 indicate that the BN components yield an improvement of 0.006 in AUC.

4. Comparison with Benchmarks


To further evaluate the performance of proposed LIN and LFN, we compared them with several
existing deep learning frameworks on ISIC 2017 validation set.

4.1. Lesion Segmentation


For lesion segmentation, the fully convolutional network (FCN) proposed by Long et al. [41], the
U-net [42], the fully-convolutional Inception (II-FCN) [43] and the encoder-decoder network using
RNN layer (Auto-ED) [44] are included for comparison. The results are listed in Table 8.
Although our LIN is the deepest network among the listed algorithms, the balanced data
augmentation strategy and dual-network structure alleviate the overfitting problem. Table 8 shows
that the proposed LIN achieved the best JA (0.753), AC (0.950) and DC (0.839) among the presented
benchmark algorithms. The Auto-ED ranks second, i.e., 0.738, 0.936 and 0.824 were achieved for JA, AC
and DC, respectively. The U-net and II-FCN produced the best SE (0.853) and SP (0.984), respectively.

Table 8. Lesion segmentation performances of different frameworks.

Method JA AC DC SE SP
FCN-8s [41] 0.696 0.933 0.783 0.806 0.954
U-net [42] 0.651 0.920 0.768 0.853 0.957
II-FCN [43] 0.699 0.929 0.794 0.841 0.984
Auto-ED [44] 0.738 0.936 0.824 0.836 0.966
LIN (ours) 0.753 0.950 0.839 0.855 0.974

4.2. Dermoscopic Feature Extraction


For the task of dermoscopic feature extraction, as little work has addressed it, only the framework
proposed by Kawahara [45] was included for comparison. The results are shown in Table 9.

Table 9. Dermoscopic feature extraction performances of different frameworks.

Method AUC AC AP SE SP
J. Kawahara [45] 0.893 0.985 0.185 0.534 0.987
LFN (ours) 0.848 0.902 0.422 0.693 0.902

The framework proposed by Kawahara converted the problem of dermoscopic feature extraction
to a semantic segmentation task, which was supervised by a revised F1 score. As the F1 score
takes the overlapping area between prediction and ground truth as the main criterion for network
training, Kawahara’s framework yields better performance on predicting the topological structure of
dermoscopic features. As a consequence, it achieves higher AUC (0.893), AC (0.985) and SP (0.987) than
that of the proposed LFN. Different from Kawahara’s framework, the proposed LFN is a patch-based
classification network. It yields decent results on edge detection of dermoscopic features, which results
in a higher average precision (AP) and sensitivity (SE), i.e., 0.422 and 0.693, than that of the framework
ranking in second place.

4.3. Lesion Classification


Table 10 lists the lesion classification results of different frameworks, which includes AlexNet [46],
VGG-16 [47], ResNet-50/101 [33] and Inception-v3 [40]. The proposed LIN achieved the best AUC
(0.912), AC (0.857) and AP (0.729) among the presented benchmark algorithms, which are 0.02, 0.01
and 0.017 higher than the second ranks, respectively. The ResNet-50 and ResNet-101 produce excellent
performances for SE (0.845) and SP (0.986), respectively. As the Inception-v3 is an extremely deep
network, it easily encounters the overfitting problem and achieves relatively low AUC (0.800) and AP
(0.564) among the benchmarking algorithms.

Table 10. Lesion classification performances of different frameworks.

Method AUC AC AP SE SP
AlexNet [46] 0.859 0.823 0.651 0.343 0.969
VGG-16 [47] 0.892 0.847 0.709 0.586 0.919
ResNet-50 [33] 0.873 0.723 0.690 0.845 0.694
ResNet-101 [33] 0.869 0.840 0.712 0.336 0.986
Inception-v3 [40] 0.800 0.800 0.564 0.355 0.933
LIN(Ours) 0.912 0.857 0.729 0.490 0.961

5. Conclusions
In this paper, we proposed two deep learning frameworks, i.e., the Lesion Indexing Network
(LIN) and the Lesion Feature Network (LFN), to address three primary challenges of skin lesion image
processing, i.e., lesion segmentation, dermoscopic feature extraction and lesion classification.
The Lesion Indexing Network was proposed to simultaneously address lesion segmentation
and classification. Two very deep fully convolutional residual networks, i.e., FCRN-88, trained with
different training sets, are adopted to produce the segmentation result and coarse classification result.
A lesion index calculation unit (LICU) is proposed to measure the importance of a pixel for the
decision of lesion classification. The coarse classification result is refined according to the distance map
generated by LICU.
The Lesion Feature Network was proposed to address the task of dermoscopic feature extraction,
and is a CNN-based framework trained by the patches extracted from the dermoscopic images. To the
best of our knowledge, we are not aware of any previous work available for this task. Hence, this work
may become a benchmark for subsequent related research.
Our deep learning frameworks have been evaluated on the ISIC 2017 dataset. The JA and AUC of LIN for lesion segmentation and classification are 0.753 and 0.912, respectively, outperforming the existing deep learning frameworks. The proposed LFN achieves the best average precision and sensitivity,
i.e., 0.422 and 0.693, for dermoscopic feature extraction, which demonstrates its excellent capacity for
addressing the challenge.

Acknowledgments: The work was supported by Natural Science Foundation of China under Grants No. 61672357
and 61702339, the Science Foundation of Shenzhen under Grant No. JCYJ20160422144110140, and the China
Postdoctoral Science Foundation under Grant No. 2017M622779.
Author Contributions: Yuexiang Li and Linlin Shen conceived and designed the experiments; Yuexiang Li
performed the experiments; Yuexiang Li and Linlin Shen analyzed the data; Linlin Shen contributed
reagents/materials/analysis tools; Yuexiang Li and Linlin Shen wrote the paper.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Jerant, A.F.; Johnson, J.T.; Sheridan, C.D.; Caffrey, T.J. Early detection and treatment of skin cancer. Am. Fam. Phys.
2000, 62, 381–382.
2. Binder, M.; Schwarz, M.; Winkler, A.; Steiner, A.; Kaider, A.; Wolff, K.; Pehamberger, H. Epiluminescence
microscopy. A useful tool for the diagnosis of pigmented skin lesions for formally trained dermatologists.
Arch. Dermatol. 1995, 131, 286–291. [CrossRef] [PubMed]
3. Celebi, M.E.; Wen, Q.; Iyatomi, H.; Shimizu, K.; Zhou, H.; Schaefer, G. A state-of-the-art survey on lesion border
detection in dermoscopy images. In Dermoscopy Image Analysis; CRC Press: Boca Raton, FL, USA, 2015.
4. Erkol, B.; Moss, R.H.; Stanley, R.J.; Stoecker, W.V.; Hvatum, E. Automatic lesion boundary detection in
dermoscopy images using gradient vector flow snakes. Skin Res. Technol. 2005, 11, 17–26. [CrossRef]
[PubMed]
5. Celebi, M.E.; Aslandogan, Y.A.; Stoecker, W.V.; Iyatomi, H.; Oka, H.; Chen, X. Unsupervised border detection
in dermoscopy images. Skin Res. Technol. 2007, 13. [CrossRef]
6. Iyatomi, H.; Oka, H.; Celebi, M.E.; Hashimoto, M.; Hagiwara, M.; Tanaka, M.; Ogawa, K. An improved
Internet-based melanoma screening system with dermatologist-like tumor area extraction algorithm.
Comput. Med. Imag. Graph. 2008, 32, 566–579. [CrossRef] [PubMed]
7. Celebi, M.E.; Kingravi, H.A.; Iyatomi, H.; Aslandogan, Y.A.; Stoecker, W.V.; Moss, R.H.; Malters, J.M.;
Grichnik, J.M.; Marghoob, A.A.; Rabinovitz, H.S. Border detection in dermoscopy images using statistical
region merging. Skin Res. Technol. 2008, 14, 347. [CrossRef] [PubMed]
8. Norton, K.A.; Iyatomi, H.; Celebi, M.E.; Ishizaki, S.; Sawada, M.; Suzaki, R.; Kobayashi, K.; Tanaka, M.;
Ogawa, K. Three-phase general border detection method for dermoscopy images using non-uniform
illumination correction. Skin Res. Technol. 2012, 18, 290–300. [CrossRef] [PubMed]
9. Xie, F.; Bovik, A.C. Automatic segmentation of dermoscopy images using self-generating neural networks
seeded by genetic algorithm. Pattern Recognit. 2013, 46, 1012–1019. [CrossRef]
10. Sadri, A.; Zekri, M.; Sadri, S.; Gheissari, N.; Mokhtari, M.; Kolahdouzan, F. Segmentation of dermoscopy
images using wavelet networks. IEEE Trans. Biomed. Eng. 2013, 60, 1134–1141. [CrossRef] [PubMed]
11. Celebi, M.E.; Wen, Q.; Hwang, S.; Iyatomi, H.; Schaefer, G. Lesion border detection in dermoscopy images
using ensembles of thresholding methods. Skin Res. Technol. 2013, 19, e252–e258. [CrossRef] [PubMed]
12. Peruch, F.; Bogo, F.; Bonazza, M.; Cappelleri, V.M.; Peserico, E. Simpler, faster, more accurate melanocytic
lesion segmentation through MEDS. IEEE Trans. Biomed. Eng. 2014, 61, 557–565. [CrossRef] [PubMed]
13. Gómez, D.D.; Butakoff, C.; Ersbøll, B.K.; Stoecker, W. Independent histogram pursuit for segmentation of
skin lesions. IEEE Trans. Biomed. Eng. 2008, 55, 157–161. [CrossRef] [PubMed]
14. Zhou, H.; Schaefer, G.; Sadka, A.; Celebi, M.E. Anisotropic mean shift based fuzzy c-means segmentation of
skin lesions. IEEE J. Sel. Top. Signal Process. 2009, 3, 26–34. [CrossRef]
15. Zhou, H.; Schaefer, G.; Celebi, M.E.; Lin, F.; Liu, T. Gradient vector flow with mean shift for skin lesion
segmentation. Comput. Med. Imaging Graph. 2011, 35, 121–127. [CrossRef] [PubMed]
16. Zhou, H.; Li, X.; Schaefer, G.; Celebi, M.E.; Miller, P. Mean shift based gradient vector flow for image
segmentation. Comput. Vis. Image Underst. 2013, 117, 1004–1016. [CrossRef]
17. Garnavi, R.; Aldeen, M.; Celebi, M.E.; Varigos, G.; Finch, S. Border detection in dermoscopy images using
hybrid thresholding on optimized color channels. Comput. Med. Imaging Graph. 2011, 35, 105–115. [CrossRef]
[PubMed]
18. Pennisi, A.; Bloisi, D.D.; Nardi, D.; Giampetruzzi, A.R.; Mondino, C.; Facchiano, A. Skin lesion image
segmentation using delaunay triangulation for melanoma detection. Comput. Med. Imaging Graph. 2016, 52,
89–103. [CrossRef] [PubMed]
19. Ma, Z.; Tavares, J. A novel approach to segment skin lesions in dermoscopic images based on a deformable
model. IEEE J. Biomed. Health Inform. 2017, 20, 615–623. [CrossRef] [PubMed]
20. Yu, L.; Chen, H.; Dou, Q.; Qin, J.; Heng, P.A. Automated melanoma recognition in dermoscopy images via
very deep residual networks. IEEE Trans. Med. Imaging 2017, 36, 994–1004. [CrossRef] [PubMed]
21. Celebi, M.E.; Kingravi, H.A.; Uddin, B.; Iyatomi, H.; Aslandogan, Y.A.; Stoecker, W.V.; Moss, R.H.
A methodological approach to the classification of dermoscopy images. Comput. Med. Imaging Graph.
2007, 31, 362–373. [CrossRef] [PubMed]
22. Celebi, M.E.; Iyatomi, H.; Schaefer, G.; Stoecker, W.V. Lesion border detection in dermoscopy images.
Comput. Med. Imaging Graph. 2009, 33, 148–153. [CrossRef] [PubMed]
23. Schaefer, G.; Krawczyk, B.; Celebi, M.E.; Iyatomi, H. An ensemble classification approach for melanoma
diagnosis. Memet. Comput. 2014, 6, 233–240. [CrossRef]
24. Stanley, R.J.; Stoecker, W.V.; Moss, R.H. A relative color approach to color discrimination for malignant
melanoma detection in dermoscopy images. Skin Res. Technol. 2007, 13, 62–72. [CrossRef] [PubMed]
25. Hospedales, T.; Romero, A.; Vázquez, D. Guest editorial: Deep learning in computer vision. IET Comput. Vis.
2017, 11, 621–622.
26. Sulistyo, S.B.; Woo, W.L.; Dlay, S.S. Regularized neural networks fusion and genetic algorithm based on-field
nitrogen status estimation of wheat plants. IEEE Trans. Ind. Inform. 2017, 13, 103–114. [CrossRef]
27. Sulistyo, S.B.; Wu, D.; Woo, W.L.; Dlay, S.S.; Gao, B.; Member, S. Computational deep intelligence vision
sensing for nutrient content estimation in agricultural automation. IEEE Trans. Autom. Sci. Eng. 2017, in
press. [CrossRef]
28. Sulistyo, S.; Woo, W.L.; Dlay, S.; Gao, B. Building a globally optimized computational intelligent image
processing algorithm for on-site nitrogen status analysis in plants. IEEE Intell. Syst. 2018, in press. [CrossRef]
29. Codella, N.; Cai, J.; Abedini, M.; Garnavi, R.; Halpern, A.; Smith, J.R. Deep learning, sparse coding, and svm
for melanoma recognition in dermoscopy images. In International Workshop on Machine Learning in Medical
Imaging; Springer: Cham, Switzerland, 2015; pp. 118–126.
30. Codella, N.; Nguyen, Q.B.; Pankanti, S.; Gutman, D.; Helba, B.; Halpern, A.; Smith, J.R. Deep learning
ensembles for melanoma recognition in dermoscopy images. IBM J. Res. Dev. 2016, 61. [CrossRef]
31. Kawahara, J.; Bentaieb, A.; Hamarneh, G. Deep features to classify skin lesions. In Proceedings of the 2016
IEEE 13th International Symposium on Biomedical Imaging (ISBI), Prague, Czech Republic, 13–16 April
2016; pp. 1397–1400.
32. Li, Y.; Shen, L.; Yu, S. HEp-2 specimen image segmentation and classification using very deep fully
convolutional network. IEEE Trans. Med. Imaging 2017, 36, 1561–1572. [CrossRef] [PubMed]
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June
2016; pp. 770–778.
34. Vedaldi, A.; Lenc, K. MatConvNet—Convolutional neural networks for MATLAB. In Proceedings of the
ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 689–692.
35. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; SüSstrunk, S. SLIC superpixels compared to
state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282. [CrossRef]
[PubMed]
36. Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
37. Lecun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation
applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [CrossRef]
38. Codella, N.C.F.; Gutman, D.; Celebi, M.E.; Helba, B.; Marchetti, M.A.; Dusza, S.W.; Kalloo, A.; Liopyris, K.;
Mishra, N.; Kittler, H.; et al. Skin lesion analysis toward melanoma detection: A challenge at the
2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging
collaboration (ISIC). arXiv 2017, arXiv:1710.05006.
39. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating deep network training by reducing internal covariate
shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July
2015; pp. 448–456.
40. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer
vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV,
USA, 27–30 June 2016; pp. 2818–2826.
41. Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015;
pp. 3431–3440.
42. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation.
In Proceedings of the Medical Image Computing and Computer Assisted Interventions, Munich, Germany,
5–9 October 2015; pp. 234–241.
43. Wen, H. II-FCN for skin lesion analysis towards melanoma detection. arXiv 2017, arXiv:1702.08699.
44. Attia, M.; Hossny, M.; Nahavandi, S.; Yazdabadi, A. Spatially aware melanoma segmentation using hybrid
deep learning techniques. arXiv 2017, arXiv:1702.07963.
45. Kawahara, J.; Hamarneh, G. Fully convolutional networks to detect clinical dermoscopic features. arXiv
2017, arXiv:1703.04559.
46. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks.
In Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe,
Nevada, 3–6 December 2012; pp. 1097–1105.
47. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv
2015, arXiv:1409.1556.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).