A Color and Texture Based Hierarchical K-NN Approach to the Classification of Non-melanoma Skin Lesions

Lucia Ballerini, Robert B. Fisher, Ben Aldridge, and Jonathan Rees

Abstract This chapter proposes a novel hierarchical classification system based
on the K-Nearest Neighbors (K-NN) model and its application to non-melanoma
skin lesion classification. Color and texture features are extracted from skin lesion
images. The hierarchical structure decomposes the classification task into a set of
simpler problems, one at each node of the classification. Feature selection is em-
bedded in the hierarchical framework that chooses the most relevant feature subsets
at each node of the hierarchy. The accuracy of the proposed hierarchical scheme is
higher than 93 % in discriminating cancer and potential at risk lesions from benign
lesions, and it reaches an overall classification accuracy of 74 % over five common
classes of skin lesions, including two non-melanoma cancer types. This is the most
extensive known result on non-melanoma skin cancer classification using color and
texture information from images acquired by a standard camera (non-dermoscopy).

1 Introduction
Skin cancers are the most common forms of human malignancies in fair-skinned
populations [18]. Although malignant melanoma is the form of “skin cancer” with
the highest mortality, the “non-melanoma skin cancers” (basal cell carcinomas
and squamous cell carcinomas, etc.) are far more common. The incidence of both
melanoma and non-melanoma skin cancers is increasing, with the number of cases

L. Ballerini (✉) · R.B. Fisher
School of Informatics, University of Edinburgh, Edinburgh, UK
e-mail: lucia.ballerini@ed.ac.uk
R.B. Fisher
e-mail: rbf@inf.ed.ac.uk

B. Aldridge · J. Rees
Department of Dermatology, University of Edinburgh, Edinburgh, UK
B. Aldridge
e-mail: ben.aldridge@ed.ac.uk
J. Rees
e-mail: jonathan.rees@ed.ac.uk


being diagnosed doubling approximately every 15 years [35]. It is widely accepted
that early detection is fundamental to reducing the diseases’ morbidity and mortality.
Automatic detection systems may offer benefit for this key diagnostic task.
There are a considerable number of published studies on classification meth-
ods relating to the diagnosis of cutaneous malignancies. The first published work
presenting an automatic classification of melanoma dates from 1987 [11].
A paper describing the first complete system appeared a few years later [29]. The
number of published papers has increased every year and the significant progress
that has occurred in this field is demonstrated by the recent journal special issue that
summarizes the state of the art in computerized analysis of skin cancer images and
provides future directions for this exciting subfield of medical image analysis [16].
Different techniques for enhancement, segmentation, feature extraction and clas-
sification have been reported by several authors. Enhancement includes color cali-
bration and normalization [32, 54].
Concerning segmentation, Celebi et al. [14] presented a systematic overview of
main border detection methods: clustering followed by active contours are the most
popular. Improvements in lesion border detection are described in recent papers [26,
39, 54, 61, 66].
Numerous features have been extracted from skin images, including shape, color,
texture and border properties [19, 37, 43, 52, 56, 57, 63]. It is common to use fea-
tures related to the ABCD mnemonic rule [49]. However, our experiments suggest
that the use of the ABCD rule in the development of automatic classifiers should
arguably be discouraged [64].
Classification methods range from discriminant analysis to neural networks and
support vector machines [15, 41, 55]. See Maglogiannis et al. [40] for a review of
the state of the art of computer vision systems for skin lesion characterization.
These methods have been mainly developed for images acquired by epilumi-
nescence microscopy (ELM or dermoscopy). However, newer technologies, includ-
ing digital dermoscopy, infrared imaging, multispectral imaging, and confocal mi-
croscopy, have recently come to the forefront in providing greater diagnostic accu-
racy [16].
Moreover, published studies mainly focus on differentiating melanocytic naevi
(moles) from melanoma. Whilst this is undeniably important (as malignant melano-
ma is the form of skin cancer with the highest mortality), in the “real-world” the
majority of lesions presenting to dermatologists for assessment are not covered by
this narrow domain, and such systems ignore other benign lesions and crucially
the two most common skin cancers (Squamous Cell Carcinomas and Basal Cell
Carcinomas) [10, 27, 60].
The proposed work uses only high resolution color images acquired using stan-
dard cameras. To our knowledge only two melanoma pre-screening systems are
based on standard camera images [1, 12].
In the current study, color and texture features are used for the classification. We
focus on 5 common classes of skin lesions: Actinic Keratosis (AK), Basal Cell Car-
cinoma (BCC), Melanocytic Nevus/Mole (ML), Squamous Cell Carcinoma (SCC),
Seborrhoeic Keratosis (SK). As far as we can tell there is no research on automatic
classification of these lesion types (other than moles) outside our group [39].

Moreover, this paper introduces a new hierarchical framework for skin lesion
classification. This framework comprises a modified version of the K-Nearest
Neighbors (K-NN) classifier, the Hierarchical K-Nearest Neighbors (HKNN) clas-
sifier, and a new similarity measure, based on color and texture features, that uses
different feature sets for comparing similarity at each node of the hierarchy.
The motivation for using a K-NN classifier can be seen in Fig. 4. The clusters
overlap greatly, yet remain distinguishable; no hard boundary (of the kind usable
by a support vector machine or Bayesian classifier) could separate them.
Below we describe how the lesion classes can be organized in a hierarchical
scheme (Sect. 2) that suggests the use of the hierarchical classifier (Sect. 3). Then
we introduce the feature pool (Sect. 4). We make 2 claims:
1. The use of a hierarchical K-NN classifier improves classification accuracy to
74 %, compared with 70 % for a non-hierarchical K-NN and 67 % and 69 % for
a flat and a hierarchical Bayes classifier, respectively.
2. This is the most extensive paper to present lesion classification results for non-
melanoma skin cancer using color imagery acquired by a standard camera, unlike
the dermoscopy method, which requires a specialised sensor.
While 74 % is low compared to the 90+ % rates achieved by melanoma clas-
sification, we argue that 74 % is worth publication: (a) the melanoma results are
from only the 2 class problem of melanoma vs melanocytic naevi (moles), and (b) it
has taken more than 20 years of research specifically on that problem to reach the
90+ % levels, whereas this is the first research on image-based classification of AK,
BCC, SCC and SK. We accept that whilst classification rates of this magnitude seem
low in the sphere of informatics research, these rates are significantly above what is
currently being achieved in non-specialist medical practice [10, 21, 27, 44, 51, 60].

2 Skin Class Hierarchy


Some images of the five classes are shown in Fig. 1. The hierarchy is fixed a pri-
ori by grouping our image classes into two main groups. The first group, henceforth
called Group1, contains the lesion classes Actinic Keratosis (AK), Basal Cell
Carcinoma (BCC) and Squamous Cell Carcinoma (SCC). The second group, henceforth
called Group2, contains the lesion classes Melanocytic Nevus/Mole (ML) and
Seborrhoeic Keratosis (SK). We note that AK, BCC, SCC, ML and SK are diagnostic
classes defined by dermatologists. The two groups were constructed by clustering
classes whose images are visually similar at the first split. However, we can give
some meaning to the two groups by observing that the first group comprises BCC
and SCC, the two most common types of skin cancer, and AK, which is considered
a pre-malignant condition that can give rise to SCCs and can sometimes be
visually similar to early superficial BCCs. In the second group, ML and SK are both
benign forms of skin lesions having a similar appearance. The class grouping leads
to the hierarchical structure shown in Fig. 2. This structure makes a coarse separa-
tion among classes at the upper level while finer decisions are made at a lower level.
As a result, this scheme decomposes the original problem into 3 sub-problems.

Fig. 1 Examples of skin lesion images from the different classes used in this work

Fig. 2 Block diagram of the hierarchical organization of skin lesion classes

3 Hierarchical K-NN Classifier


A large number of classifier combinations have been proposed in the literature [33].
They may have different feature sets, different training sets, different classification
methods or different training sessions, all resulting in a set of classifiers whose out-
put may be combined, with the hope of improving the overall classification accuracy.
The schemes for combining multiple classifiers can be grouped into three main cate-
gories according to their architecture: (1) parallel, (2) cascading and (3) hierarchical.
In the hierarchical architecture, individual classifiers are combined into a structure
which is similar to a decision tree classifier. The advantage of this architecture is the
high efficiency and flexibility in exploiting the discriminant power of different types
of features [33]. A large number of studies have shown that classifier combination
can improve recognition accuracy [33]. It has been shown that in many domains an
ensemble of classifiers outperforms any of its single components [42]. The approach
used in our research falls within the hierarchical model.
Our approach divides the classification task into a set of smaller classification
problems corresponding to the splits in the classification hierarchy (see Fig. 2).
Each of these subtasks is significantly simpler than the original task, since the clas-
sifier at a node in the hierarchy need only distinguish between a smaller number of
classes. Therefore, it may be possible to separate the smaller number of classes with
higher accuracy. Moreover, it may be possible to make this determination based on
a smaller set of features.
The proposed approach also addresses the feature selection problem. The reduc-
tion in the feature space avoids many problems related to high dimensional feature
spaces, such as the “curse of dimensionality” problem [33], where the indexing
structures degrade and the significance of each feature decreases, making the pro-
cess of storing, indexing and classifying extremely time consuming. Moreover, in
several situations, many features are correlated, meaning that they bring redundant
information about the images that can deteriorate the ability of the system to cor-
rectly distinguish them. Dimensionality reduction or feature selection has been an
active research area in pattern recognition, statistics and data mining communities.
The main idea of feature selection is to choose a subset of input features by elimi-
nating features with little or no predictive information.
It is important to note that the key here is not merely the use of feature selection,
but its integration with the hierarchical structure. In practice we build different clas-
sifiers using different sets of training images (according to the set of classifications
made at the higher levels of the hierarchy). So each classifier uses a different set
of features optimized for those images. This forces the individual classifiers to use
potentially independent information.
Hierarchical classifiers are well known [28, 45, 58] and commonly used for doc-
ument and text classification [13, 20, 23, 50], including a hierarchical K-NN clas-
sifier [24]. While we found papers describing applications of hierarchical systems
to medical image classification and annotation tasks [22, 47, 59], to the best of our
knowledge only a hierarchical neural network model has been applied to skin le-
sions [53]. Its authors claim over 90 % accuracy on 58 images including 4 melanomas,
but unfortunately many technical details are not described in the paper. On the other
hand, only poor performance has been reported for the classification of melanoma
using the K-NN method [7, 31]. Some promising results have been presented very
recently by using a K-NN followed by a Decision Tree classifier [12].

3.1 K-NN Classifier

K-NN is a well-known classifier, first introduced by Fix and Hodges [25] in 1951.
It is well explored in the literature and has been shown to have good classifi-
cation performance on a wide range of real world data sets [17]. Many lazy learning
algorithms are derivatives of the K-NN. A review of them is presented in the paper
of Wettschereck et al. [62]. A recent application of one of these similarity-based
learning algorithms, namely the lazy CL procedure, to melanoma is described by
Armengol [5].
To classify an unknown example T , the K-NN classifier finds the K nearest
neighbors among the training data and uses the categories of the K neighbors
to weight the category candidates. Then majority voting among the categories of
data in the neighborhood is used to decide the class label of $T$. Given $M$ classes
$C_1, C_2, \ldots, C_M$ and $N$ training samples $I_1, I_2, \ldots, I_N$, the classification of $I_i$
with respect to category $C_j$ ($i = 1, \ldots, N$; $j = 1, \ldots, M$) is

$$y(I_i, C_j) = \begin{cases} 1 & I_i \in C_j \\ 0 & I_i \notin C_j \end{cases} \tag{1}$$

and the decision rule in K-NN can be written as:

$$\text{assign } T \text{ to } C_j \ \text{ if } \ \operatorname{score}(T, C_j) = \arg\max_{j=1}^{M} \sum_{i=1}^{K} y(I_i, C_j) \tag{2}$$

where the training examples $I_i$ are ranked according to their similarity to the test
example $T$.
The K-NN classifier has only one free parameter, $K$, which can be optimized by
a leave-one-out cross-validation procedure, given the distance function $Dist$ (see
Eq. (13)) which is used to determine the ‘nearest’ neighbors. Choosing the proper-
ties to be used in each classifier is a core issue, and is addressed next. The actual
distance metrics are presented in Sect. 4.
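To make the decision rule concrete, here is a minimal sketch in Python (all names are hypothetical; `dist` stands for any distance function, such as the combined measure of Eq. (13) defined in Sect. 4.4):

```python
import numpy as np
from collections import Counter

def knn_classify(test_feat, train_feats, train_labels, k, dist):
    """Majority vote (Eq. (2)) among the k training samples nearest to
    the test sample, ranked by the supplied distance function."""
    distances = [dist(test_feat, f) for f in train_feats]
    nearest = np.argsort(distances)[:k]                # k most similar images
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]                  # winning class label
```

For instance, `knn_classify(x, X, y, 15, lambda a, b: np.linalg.norm(a - b))` would implement a plain Euclidean 15-NN.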

3.2 Learning Phase

Our Hierarchical K-NN classifier (HKNN) is composed of three distinct K-NN clas-
sifier systems, one at the top level and two at the bottom level. The top level clas-
sifier is fed with all the images in the training set and classifies them into one of the
two groups. The other two classifiers are trained using only the images of the cor-
responding group (i.e. AK/BCC/SCC or ML/SK) that were correctly classified by
the top classifier during the training stage, and classify them into one of the 2 or 3
diagnostic classes.
The learning phase consists of the feature selection process for the three distinct
K-NN classifiers. A sequential forward selection algorithm [34] (SFS) is used for
feature selection. The goal of feature selection is to maximize the classification
accuracy. We used a weighted classification accuracy due to the uneven class
distribution of our data set: for each class we compute the rate at which the system
correctly identifies that class, and then average these rates over the number of
classes. Therefore our overall classification accuracy is defined as:

$$\text{Overall accuracy} = \frac{1}{M} \sum_{j=1}^{M} \frac{\text{correctly\_classified}(C_j)}{\text{number\_of\_test\_images}(C_j)} \tag{3}$$

where M is the number of classes.


A leave-one-out cross-validation method is used during feature selection: each
image is used as a test image, and all the remaining images in the training set are
ranked according to their similarity to the test image. Finally the test image is clas-
sified to the class which is most frequent among the K samples nearest to it using

Eq. (2). The features that maximize the classification accuracy over all the images
in the training set are selected among all the extracted features.
At the end, there will be three sets of features for the three classification tasks, one
selected for the top classifier and two selected for the subclassifiers. The feature sets
for the two subsystems are also selected using SFS, but only using images from the
appropriate classes (i.e. AK/BCC/SCC or ML/SK). Note that, since every subnode
in the hierarchy has only a subset of the total classes, and the subnodes each have
fewer images, the additional cost of feature selection is not substantially more than
that of a flat classification scheme.
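The selection loop can be sketched as follows, assuming NumPy feature matrices and label arrays; the plain Euclidean distance here stands in for the full measure of Eq. (13), and all function names are hypothetical:

```python
import numpy as np

def weighted_accuracy(true, pred):
    """Eq. (3): average of the per-class correct-classification rates."""
    return np.mean([np.mean(pred[true == c] == c) for c in np.unique(true)])

def loo_accuracy(feats, labels, cols, k):
    """Leave-one-out K-NN accuracy using only the feature columns `cols`."""
    X, preds = feats[:, cols], []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)   # distances to all images
        d[i] = np.inf                          # exclude the test image itself
        vals, counts = np.unique(labels[np.argsort(d)[:k]], return_counts=True)
        preds.append(vals[np.argmax(counts)])  # majority vote, Eq. (2)
    return weighted_accuracy(labels, np.array(preds))

def sfs(feats, labels, k, n_select=10):
    """Greedy sequential forward selection [34] of feature columns."""
    selected = []
    remaining = set(range(feats.shape[1]))
    for _ in range(n_select):
        score, best = max((loo_accuracy(feats, labels, selected + [j], k), j)
                          for j in remaining)
        selected.append(best)
        remaining.remove(best)
    return selected
```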

3.3 Classification Phase

In the classification phase all the test images are classified through the hierarchical
structure. Each image is first classified into one of the two groups by the top level
classifier, which uses the first set of features. Then one of the second-level classi-
fiers is invoked, according to the output group of the top classifier, and the image is
classified into one of the 5 diagnostic classes using one of the two other subsets of
features.
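The dispatch itself then reduces to a few lines; a sketch reusing the hypothetical `knn_classify` from the Sect. 3.1 sketch, where each node of the hierarchy carries its own training data, value of K and selected feature columns:

```python
import numpy as np

def hknn_classify(x, top, group1, group2):
    """Two-level classification. `top`, `group1` and `group2` are dicts
    holding one K-NN subclassifier each: training features and labels,
    the value of K, and the feature columns selected for that node."""
    def predict(clf):
        return knn_classify(x[clf['cols']], clf['feats'][:, clf['cols']],
                            clf['labels'], clf['k'],
                            dist=lambda a, b: np.linalg.norm(a - b))
    if predict(top) == 'Group1':
        return predict(group1)   # one of AK / BCC / SCC
    return predict(group2)       # one of ML / SK
```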
A drawback of the proposed method is that errors at the first classification level
cannot be corrected at the second level. If an example is incorrectly classified at
the top level and assigned to a group that does not contain the true class, then the
classifiers at lower levels have no chance of achieving a correct classification. This
is known as the “blocking” problem [58]. An attempt to solve this problem could
be to use classifiers on the second level which classify to more than the two or three
classes for which they are optimized. Our attempts in this direction showed that
these classifiers not only gave much worse results, but also incurred additional
problems: the very small number of images wrongly classified at the first level
makes the classes more unbalanced.

4 Feature Description
Here, skin lesions are characterized by their color and texture. In this section we
will describe a set of features that can capture such properties.

4.1 Color Features

Color features are represented by the mean colors $\mu = (\mu_R, \mu_G, \mu_B)$ of the lesion
and their covariance matrices $\Sigma$. Let

$$\mu_X = \frac{1}{N} \sum_{i=1}^{N} X_i \quad \text{and} \quad C_{XY} = \frac{1}{N} \sum_{i=1}^{N} X_i Y_i - \mu_X \mu_Y \tag{4}$$

where $N$ is the number of pixels in the lesion and $X_i$ is the color component of
channel $X$ ($X, Y \in \{R, G, B\}$) of pixel $i$. In the RGB (Red, Green, Blue) color
space, the covariance matrix is:

$$\Sigma = \begin{bmatrix} C_{RR} & C_{RG} & C_{RB} \\ C_{GR} & C_{GG} & C_{GB} \\ C_{BR} & C_{BG} & C_{BB} \end{bmatrix} \tag{5}$$
In this work, the RGB, HSV (Hue, Saturation, Value), CIE_Lab, CIE_Lch (Munsell
color coordinate system [48]) and Ohta [46] color spaces were considered. Four
normalization techniques, applied before extracting color features, were investigated
to reduce the impact of lighting. In the end, we normalized each color component
by dividing it by the average of the same component over the healthy skin of the
same patient, because this performed best among the normalization techniques.
After experimenting with the 5 different color spaces, we chose the normalized
RGB, because it gave slightly better results than the other color spaces (see
Sect. 5.4.2).
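As a sketch of the resulting color descriptor (NumPy assumed; both arguments are hypothetical (N, 3) arrays of RGB pixel values, from the lesion and from a healthy-skin patch of the same patient):

```python
import numpy as np

def color_features(lesion_rgb, healthy_rgb):
    """Mean color and covariance (Eqs. (4)-(5)) of the lesion pixels,
    after dividing each channel by its healthy-skin average."""
    norm = lesion_rgb / healthy_rgb.mean(axis=0)    # per-channel normalization
    mu = norm.mean(axis=0)                          # (mu_R, mu_G, mu_B)
    sigma = np.cov(norm, rowvar=False, bias=True)   # 3x3 matrix of Eq. (5)
    return mu, sigma
```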

4.2 Texture Features

Texture features are extracted from generalized co-occurrence matrices (GCM).
Assume an image $I$ having $N_x$ columns, $N_y$ rows and $N_g$ gray levels. Let
$L_x = \{1, 2, \ldots, N_x\}$ be the columns, $L_y = \{1, 2, \ldots, N_y\}$ be the rows, and
$G_x = \{0, 1, \ldots, N_g - 1\}$ be the set of quantized gray levels. The co-occurrence
matrix $P_\delta$ is a matrix of dimension $N_g \times N_g$, where [30]:

$$P_\delta(i, j) = \#\bigl\{\bigl((k, l), (m, n)\bigr) \in (L_y \times L_x) \times (L_y \times L_x) \mid I(k, l) = i,\; I(m, n) = j\bigr\} \tag{6}$$

i.e. the number of co-occurrences of the pair of gray levels $i$ and $j$ which are a
distance $\delta = (d, \theta)$ apart. In our work, the pixel pairs $(k, l)$ and $(m, n)$ have distance
$d = 5, 10, 15, 20, 25, 30$ and orientation $\theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$, i.e.
$(m = k + d, n = l)$, $(m = k + d/\sqrt{2}, n = l + d/\sqrt{2})$, $(m = k, n = l + d)$,
$(m = k - d/\sqrt{2}, n = l + d/\sqrt{2})$.
Generalized co-occurrence matrices are the extension of the co-occurrence ma-
trix to multispectral images, i.e. images coded on $n$ color channels [6]. Let $u$ and $v$
be two color channels. The generalized co-occurrence matrices are:

$$P_\delta^{(u,v)}(i, j) = \#\bigl\{\bigl((k, l), (m, n)\bigr) \in (L_y \times L_x) \times (L_y \times L_x) \mid I_u(k, l) = i,\; I_v(m, n) = j\bigr\} \tag{7}$$

For example, in the case of color images coded on three channels (RGB), we have
six co-occurrence matrices: (RR), (GG), (BB), which are the same as gray level co-
occurrence matrices computed on one channel, and (RG), (RB), (GB), which take
into account the correlations between the channels.

Fig. 3 Areas of lesions where ratio features were calculated

In order to have orientation invariance for our set of GCMs, we averaged the
matrices with respect to $\theta$. Quantization levels $N_g = 64, 128, 256$ are used for the
three color spaces: RGB, HSV and CIE_Lab.
From each GCM we extracted 12 texture features: energy, contrast, corre-
lation, entropy, homogeneity, inverse difference moment, cluster shade, cluster
prominence, max probability, autocorrelation, dissimilarity and variance as defined
in [30], for a total of 3888 texture features (12 features × 6 inter-pixel distances ×
6 color pairs × 3 color spaces × 3 gray level quantisations). Two sets of texture
features are extracted from GCMs calculated over the lesion area of the image, as
well as over a patch of healthy skin of the same image. Differences and ratios of
each of the lesion and normal skin values are also calculated, giving 2 more sets of
features:
$$\text{feature}_{l-s} = \text{feature}_{lesion} - \text{feature}_{healthy\_skin} \tag{8}$$
$$\text{feature}_{l/s} = \text{feature}_{lesion} / \text{feature}_{healthy\_skin} \tag{9}$$

Altogether, for a given feature family we use $\{\text{feature}_{lesion}, \text{feature}_{healthy\_skin}, \text{feature}_{l-s}, \text{feature}_{l/s}\}$.
This gives a total of $4 \times 3888 = 15552$ possible texture features, from which we
extracted a good subset. All features are z-normalized over all
training data.
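A sketch of the GCM computation for one channel pair and one displacement, with the four orientations approximated by integer pixel offsets (helper names are hypothetical; the channel images are assumed to be already quantized to integer levels):

```python
import numpy as np

def pairs(Iu, Iv, dr, dc):
    """All valid pixel pairs of Iu and Iv separated by offset (dr, dc)."""
    rows, cols = Iu.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    return (Iu[r0:r1, c0:c1].ravel(),
            Iv[r0 + dr:r1 + dr, c0 + dc:c1 + dc].ravel())

def gcm(Iu, Iv, d, n_levels):
    """Generalized co-occurrence matrix of Eq. (7) for distance d,
    averaged over orientations 0, 45, 90 and 135 degrees."""
    s = int(round(d / np.sqrt(2)))
    P = np.zeros((n_levels, n_levels))
    for dr, dc in [(0, d), (-s, s), (-d, 0), (-s, -s)]:
        i, j = pairs(Iu, Iv, dr, dc)
        np.add.at(P, (i, j), 1)          # accumulate co-occurrence counts
    return P / 4.0                       # orientation average

def energy(P):
    """One of the 12 Haralick-style features [30] extracted from each GCM."""
    p = P / P.sum()                      # normalize to joint probabilities
    return np.sum(p ** 2)
```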

4.3 Ad Hoc Color Ratio Features

Color ratio features are designed ad hoc for skin lesions, by observing color varia-
tions inside the lesion area. Mean colors $\mu_A$ and $\mu_B$ are extracted over the areas $A$
and $B$ shown in Fig. 3, and their ratio calculated as:

$$\text{ratio} = \frac{\mu_A}{\mu_B} \tag{10}$$
Two different area sizes are considered. In the first case, the border area contains
10 % of the lesion area pixels. In the second case, the diameter of the
inner area is 1/3 of the diameter of the whole lesion. Since lesions are not circular,
the morphological erosion operator is applied iteratively inward from the border
until the desired percentages of lesion area pixels are reached. These features seem
particularly useful for BCCs, which present pearly edges.
Ad hoc color ratio features are calculated for the three color spaces: RGB, HSV
and CIE_Lab, and all feature sets are z-normalized. These properties are included in
the texture feature set.
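A sketch of the erosion-based construction of the two areas, assuming SciPy is available; `rgb` is the color image, `mask` the boolean lesion segmentation, and the fraction parameter is illustrative:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def color_ratio(rgb, mask, border_frac=0.10):
    """Eq. (10): mean color of the border ring (area A) divided by the
    mean color of the remaining inner region (area B). The ring is peeled
    off by iterative erosion until border_frac of the lesion pixels lie
    outside the eroded mask."""
    inner = mask.copy()
    while inner.sum() > (1.0 - border_frac) * mask.sum():
        inner = binary_erosion(inner)        # peel one pixel layer inward
    ring = mask & ~inner
    mu_A = rgb[ring].mean(axis=0)            # mean color over the border ring
    mu_B = rgb[inner].mean(axis=0)           # mean color over the inner area
    return mu_A / mu_B                       # one ratio per color channel
```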

4.4 Distance Measure

The color and texture features are combined to construct a distance measure between
each test image $T$ and a database image $I$.
For color covariance-based features, the Bhattacharyya distance metric

$$BD_{CF}(T, I) = \frac{1}{8} (\mu_T - \mu_I)^T \left( \frac{\Sigma_T + \Sigma_I}{2} \right)^{-1} (\mu_T - \mu_I) + \frac{1}{2} \ln \frac{\left| \frac{\Sigma_T + \Sigma_I}{2} \right|}{\sqrt{|\Sigma_T| |\Sigma_I|}} \tag{11}$$

is used, where $\mu_T$ and $\mu_I$ are the average (over all pixels in the lesion) color feature
vectors, $\Sigma_T$ and $\Sigma_I$ are the covariance matrices of the lesion of $T$ and $I$ respectively,
and $|\cdot|$ denotes the matrix determinant.
The Euclidean distance

$$ED_{TF}(T, I) = \left\| f^T_{subset} - f^I_{subset} \right\| = \sqrt{\sum_{i=1}^{S} \left( f^T_i - f^I_i \right)^2} \tag{12}$$

is used for distances between a subset of $S$ texture features $f_{subset}$, selected as de-
scribed later. Other metric distances (Mahalanobis, city-block) have been considered,
but gave worse results.
We aggregated the two distances into a distance matching function as:

$$Dist(T, I) = w \cdot BD_{CF}(T, I) + (1 - w) \cdot ED_{TF}(T, I) \tag{13}$$

where $w$ is a weighting factor that has been selected experimentally, after trying
all the values $\{0.1, 0.2, \ldots, 0.9\}$. In our case, $w = 0.7$ gave the best results. A low
value of $Dist$ indicates a high similarity.
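Putting Eqs. (11)–(13) together, the measure can be sketched as follows (NumPy assumed; each image is represented by a hypothetical triple of color mean, color covariance and selected texture features):

```python
import numpy as np

def bhattacharyya(mu_t, sig_t, mu_i, sig_i):
    """Color distance of Eq. (11) between two lesion color distributions."""
    sig = (sig_t + sig_i) / 2.0
    dmu = mu_t - mu_i
    return (dmu @ np.linalg.inv(sig) @ dmu / 8.0
            + 0.5 * np.log(np.linalg.det(sig)
                           / np.sqrt(np.linalg.det(sig_t) * np.linalg.det(sig_i))))

def combined_dist(T, I, w=0.7):
    """Eq. (13): weighted sum of the color and texture distances.
    T and I are (mu, sigma, texture_subset) triples."""
    bd = bhattacharyya(T[0], T[1], I[0], I[1])
    ed = np.linalg.norm(T[2] - I[2])     # Euclidean texture distance, Eq. (12)
    return w * bd + (1 - w) * ed
```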

5 Methods

The features described in previous sections were extracted from the lesions in our
image database. In this section we will describe in detail the image analysis and the
choices of the model parameters.

5.1 Acquisition and Preprocessing

Our image database comprises 960 lesions, belonging to 5 classes (45 AK,
239 BCC, 331 ML, 88 SCC, 257 SK). The ground truth used for the experiments is
based on the agreed classifications by 2 dermatologists and a pathologist.
Images are acquired using a Canon EOS 350D SLR camera. Lighting was con-
trolled using a ring flash and all images were captured at the same distance (∼50 cm)
resulting in a pixel resolution of about 0.03 mm. Lesions are segmented using the

region-based active contour approach described in [39]. The segmentation method
uses a statistical model based on the level-set framework. Morphological opening
has been applied to the segmented lesions to ensure that the patches from which
features are extracted contain only lesion or healthy skin.

5.2 Highlight Removal

Specular highlights appear as small and bright regions in various parts of our skin
images. The highlights created by specular reflections are a major obstacle for
proper color and texture feature extraction.
Specular highlights are often characterized by the local coincidence of intense
brightness ($I$) and low color saturation ($S$). Intensity and saturation are defined as
follows:

$$I = \frac{R + G + B}{3} \tag{14}$$
$$S = 1 - \frac{\min(R, G, B)}{I} \tag{15}$$

and candidate specular reflection regions can be identified using appropriate
threshold values (motivated by [38]):

$$I > I_{thr} \cdot I_{max} \tag{16}$$
$$S < S_{thr} \cdot S_{max} \tag{17}$$
where $I_{max}$ and $S_{max}$ are the maximum intensity and saturation in the image,
respectively.
The threshold values chosen experimentally ($I_{thr} = 0.8$ and $S_{thr} = 0.5$) differ
from the values proposed in [38], probably due to the different nature of the
images.
We did not apply any subsequent filling procedure on the detected regions, as
this may destroy the original texture and therefore have a negative impact on the
subsequent feature extraction. Areas identified as “highlight” were simply excluded
from the region where the feature extraction process takes place.
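A sketch of this detection step (NumPy assumed; `rgb` is a float (H, W, 3) image and the returned boolean mask marks the pixels to exclude):

```python
import numpy as np

def highlight_mask(rgb, i_thr=0.8, s_thr=0.5):
    """Flag specular highlights via Eqs. (14)-(17): bright, desaturated
    pixels relative to the image maxima."""
    I = rgb.mean(axis=2)                                  # intensity, Eq. (14)
    S = 1.0 - rgb.min(axis=2) / np.maximum(I, 1e-12)      # saturation, Eq. (15)
    return (I > i_thr * I.max()) & (S < s_thr * S.max())  # Eqs. (16)-(17)
```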

5.3 Feature Normalization

The features described in the previous sections have very different value ranges. To
account for this, an objective rescaling of the features is achieved by normalizing
each feature set to z-scores, defined as

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \tag{18}$$

where $x_{ij}$ represents the $i$th sample measure of feature $j$, $\mu_j$ the mean value of all
samples for feature $j$, and $\sigma_j$ the standard deviation of the samples for feature $j$.
In addition, feature values outside the 5th–95th percentile range have been trun-
cated to the 5th or 95th percentile value, and the normalising $\mu$ and $\sigma$ calculated
from the truncated set. The normalising parameters were constant over all experi-
ments.
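A sketch of this truncated z-normalization over a (samples × features) matrix (NumPy assumed):

```python
import numpy as np

def truncated_zscore(X):
    """Eq. (18) with 5th/95th percentile truncation: each feature is
    clipped to its central range, and mu and sigma are computed from
    the clipped values."""
    lo, hi = np.percentile(X, [5, 95], axis=0)
    Xc = np.clip(X, lo, hi)                    # truncate outlying values
    return (Xc - Xc.mean(axis=0)) / Xc.std(axis=0)
```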

5.4 Evaluation

To assess performance, training and test sets were created by randomly splitting the
data set into 3 equal subsets. The only constraint on the otherwise random partition-
ing was that each class was represented equally in each subset. A 3-fold cross-validation
method was used, i.e. 3 sets composed of two-thirds of the data were created and
used as training sets for feature selection and the remaining one-third of the data as
the test set using the selected features for classification. Thus no training example
used for feature selection was used as a test example in the same experiment. Three
experiments were conducted independently and performance reported as mean and
standard deviation over the three experiments.
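One way to reproduce this class-balanced splitting, assuming scikit-learn is available (the feature matrix and labels here are placeholders):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

features = np.random.rand(960, 20)            # placeholder feature matrix
labels = np.random.choice(['AK', 'BCC', 'ML', 'SCC', 'SK'], size=960)

# Three folds; every class is represented proportionally in each subset.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(features, labels):
    pass  # feature selection on the training two-thirds, testing on the rest
```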
For the hierarchical classifiers mentioned in previous sections, the most commonly
used performance measures are the classic information retrieval notions of precision
and recall, or a combination of the two measures [58]. As we are dealing with a
classification task and not a retrieval task, we use the classification accuracy derived
from the confusion matrix. In the training stage, confusion matrices are obtained
by a leave-one-out scheme, where each image is used as a test image and classi-
fied according to the known classifications of the remaining images in the training
set. In the classification stage, confusion matrices are obtained in a slightly differ-
ent way: each image of the test set is classified according to the known classifica-
tions of the K nearest neighbors in the training set.

5.4.1 Influence of the K Parameter

Classification results when varying the value of K of the K-NN classifiers have
been evaluated. In some experiments we noticed a slight improvement by using a
smaller value of K for feature selection and a larger one for classification. Table 1
shows our evaluation. The numbers (mean ± standard deviation of the accuracy over
the three sets) in the first column are obtained in the feature selection stage, i.e.
using the value of K written on their left. The highest classification accuracy over
the test sets for each value of K used during the feature selection is highlighted in
boldface. We chose values of K: (1) to be odd numbers, (2) to be smaller than the
training class sizes and (3) to span what seemed like a sensible range. Since perfor-
mance does not vary much for the K = 11 or K = 15 test cases, any value of K in
this range is probably approximately equally effective. In the following, the presented
results are obtained using the combination of K that gave the best classification
accuracy (underlined in the table) on the test set for each subclassifier (top level
classifier: train K = 15, test K = 15; AK/BCC/SCC classifier: train K = 15, test
K = 11; ML/SK classifier: train K = 11, test K = 15). Recalling that K is the num-
ber of nearest samples used to classify the image under examination, it is technically
correct to use different values at the classification stage than those used during the
feature selection stage.

Table 1 Accuracy of the three subclassifiers varying the value of K. Each row shows the value of
K used in training; columns show the K used in testing

(a) Top level
         Training Set      Test Set
                           K = 7             K = 11            K = 15
K = 7    95.80 ± 0.53      91.67 ± 0.93      92.09 ± 1.59      91.88 ± 1.23
K = 11   95.68 ± 0.18      93.33 ± 0.67      92.71 ± 1.17      92.61 ± 0.74
K = 15   95.73 ± 0.63      93.23 ± 1.42      93.33 ± 0.95      93.86 ± 0.72

(b) Group1 (AK, BCC, SCC)
         Training Set      Test Set
                           K = 7             K = 11            K = 15
K = 7    79.40 ± 0.75      69.48 ± 0.98      71.50 ± 1.28      70.95 ± 2.31
K = 11   79.96 ± 3.40      69.07 ± 3.38      70.04 ± 0.88      70.86 ± 1.14
K = 15   81.87 ± 3.62      70.87 ± 0.91      72.64 ± 2.41      71.79 ± 2.06

(c) Group2 (ML, SK)
         Training Set      Test Set
                           K = 7             K = 11            K = 15
K = 7    91.97 ± 0.42      85.82 ± 0.88      86.01 ± 0.86      85.82 ± 0.39
K = 11   91.88 ± 0.54      85.64 ± 0.58      86.00 ± 0.70      86.19 ± 0.59
K = 15   90.80 ± 1.20      84.55 ± 0.86      85.84 ± 0.81      85.67 ± 1.43

5.4.2 Influence of Color Features

A comparison of the accuracy (mean ± standard deviation over the three sets) of
the three subclassifiers using only color features is reported in Table 2, using the K
values reported in the previous section. Note that the values for RGB differ from
Table 1 because texture features are not used here.
The best results are obtained using the RGB and Ohta color spaces. In fact, all
accuracies are nearly identical before normalization. After normalization, RGB and
Ohta color spaces still give the best results for the top classifier, while RGB gives
much better results for the other two subclassifiers. These data also indicate that
color features are more important at the top level of the hierarchy, i.e. in discrimi-
nating cancerous vs non cancerous lesions.

Table 2 Accuracy of the three subclassifiers over the three sets using different color spaces,
before and after normalization

(a) Before color normalization
        Top level         Group1            Group2
RGB     87.82 ± 2.14      64.37 ± 3.80      55.03 ± 1.31
HSV     87.81 ± 2.21      62.61 ± 2.88      55.03 ± 1.31
Lab     87.20 ± 2.68      63.47 ± 2.73      55.95 ± 1.33
Lch     86.46 ± 2.52      63.48 ± 2.91      55.05 ± 0.62
Ohta    87.82 ± 0.97      64.37 ± 3.80      55.85 ± 1.37

(b) After color normalization
        Top level         Group1            Group2
RGB     92.71 ± 0.66      74.38 ± 1.81      84.35 ± 1.19
HSV     89.80 ± 1.95      62.65 ± 4.16      54.45 ± 2.60
Lab     91.04 ± 1.45      62.93 ± 3.68      56.87 ± 1.94
Lch     87.71 ± 1.80      65.23 ± 3.57      53.54 ± 2.40
Ohta    92.71 ± 0.66      62.31 ± 2.71      57.79 ± 3.40

5.4.3 Influence of Texture Features

The texture feature set that best discriminates between the groups at the first level
of the hierarchy is different from the feature sets that best discriminate at the second
level, and these two sets also differ from each other. Figure 4 shows a scatter
plot of the two top features for each classifier. The list of selected features for each
level of the hierarchy is reported in the Appendix (see Table 8). Considering that
a potentially very different set of features is selected at each node of the hierarchy,
we can say that the hierarchical method, as a whole, actually uses a larger set of
features in a more efficient way, without running into problems like the “curse of
dimensionality”. Hence, there is a benefit from the hierarchical scheme.
We can observe that color information is also important in the texture features,
because texture properties extracted using different color channels are selected.
The plots of the accuracy vs the number of features (from 1 to 10 for each level
of the hierarchy) are shown in Fig. 5. We show only the plots for one of the three
subsets and for the best K combinations. Keeping in mind that the color features are
fixed and feature selection is applied only to texture features, the nearly flat trend of
the top level classifier (Fig. 5 top) confirms that color features are more important
in discriminating AK/BCC/SCC from ML/SK, and adding more texture features does
not improve its performance by more than 2 %. On the other hand, the trend of the
AK/BCC/SCC and ML/SK subclassifiers (Fig. 5 bottom) indicates the usefulness of
using texture features at this level of the hierarchy.

Fig. 4 Scatter plots of the top 2 features for each of the three sets. Top graph shows Group1
(AK/BCC/SCC) in red and Group2 (SK/ML) in green

Fig. 5 Plots of accuracy vs number of texture features for one of the 3 subsets, using color
features + 1 to 10 texture features

Table 3 Accuracy of the three subclassifiers over the three training sets, validation sets and test
sets

(a) Stop when validation accuracy decreases once
                  Top level         Group1            Group2
Training set      94.06 ± 1.37      71.63 ± 6.20      87.84 ± 1.01
Validation set    91.88 ± 0.81      69.16 ± 0.36      85.99 ± 0.39
Test set          91.56 ± 1.06      71.49 ± 1.91      86.01 ± 0.33
# Features        5, 5, 3           6, 9, 3           3, 3, 14

(b) Stop when validation accuracy decreases twice
                  Top level         Group1            Group2
Training set      94.06 ± 1.90      70.77 ± 6.17      87.56 ± 1.45
Validation set    92.40 ± 0.77      69.69 ± 1.92      86.55 ± 0.70
Test set          90.83 ± 0.94      71.51 ± 2.70      85.32 ± 3.08
# Features        16, 13, 2         5, 18, 2          18, 2, 12

5.4.4 Influence of Feature Number and Selection Algorithm

Referring again to the plots shown in Fig. 5, we can see that it is reasonable to stop
after adding 10 texture features to the color features, as the accuracy on the test set
was no longer improving significantly.
A slight overfitting problem evident in some plots prompted us to run experi-
ments using the three sets as train, validation and test sets respectively. Results (mean
± standard deviation of the accuracy over the three subsets) are reported in Table 3.
We stopped selecting additional features when the accuracy on the validation set
decreased (once in the top table, twice in the bottom one).
We did not notice any significant improvement. This is probably due to the
smaller training set size that further reduced the size of the smallest class.
The number of features selected for each of the three subsets is given in the last row
of the tables. The high variation suggests that the number of selected features is not
a crucial choice, given that the three subsets are created by randomly splitting the
data in such a way that the 5 lesion classes are equally represented in each subset.
The SFS feature selection algorithm is known not to be optimal; however, in our
case the use of a greedy sequential forward backward algorithm (see Table 4) did
not show any significant improvement. Once again we note a high variation in the
number of selected features for the three subsets.

Table 4 Accuracy of the three subclassifiers over the three training sets, validation sets and test
sets using a greedy forward backward algorithm

                  Top level         Group1            Group2
Training set      95.35 ± 1.73      80.25 ± 5.09      91.35 ± 0.40
Validation set    91.67 ± 0.45      67.19 ± 1.67      85.45 ± 0.78
Test set          90.52 ± 0.97      72.08 ± 2.92      86.75 ± 2.55
# Features        5, 20, 10         10, 20, 4         20, 4, 20

Table 5 Comparison of the overall percentage accuracy of the hierarchical and flat classifiers
over the three training sets and test sets

                  Flat KNN       HKNN           Flat Bayes     Hierarc. Bayes
Training set      77.6 ± 1.4     83.4 ± 1.4     74.3 ± 2.2     81.9 ± 1.5
Test set          69.8 ± 1.6     74.3 ± 2.5     67.7 ± 2.3     69.6 ± 0.4

5.5 Comparison with Other Methods

In Table 5 we compare our results with the results obtained using a non-hierarchical
approach, i.e. a flat K-NN classifier and a Bayes classifier that use a single set of
features for all 5 classes. The flat classifiers were trained using features selected
using the same SFS algorithm. Results of a hierarchical Bayes classifier, having the
same hierarchy as the HKNN classifier and whose subclassifiers were trained using
the same features and the same SFS algorithm, are also reported in the table. We
see that the use of the hierarchy gives an improvement on both the training and test
sets.

6 Overall Results

The final results are reported in Table 6. The final accuracy of the top classifier and
the two subclassifiers at the bottom level are also reported here. The values are
the mean ± standard deviation over the three training and test sets. These results
are obtained using the best combination of K determined in Sect. 5.4.1, the RGB
color features and 10 texture features for each subclassifier. We decided to use a
fixed number of features as the train-validation-test scheme did not enhance perfor-
mance (see the considerations in Sect. 5.4.4 about set sizes and number of features).
Similarly, the results from the different configurations and numbers of features are
all at about the same level, given the estimated standard deviations, suggesting that
there is little risk of overtraining.
Recall that the top level classifier discriminates between cancer and pre-malignant
conditions (AK/BCC/SCC) and benign forms of skin lesions (ML/SK). Therefore,
its very high accuracy (above 93 %) indicates the good performance of our system in
identifying cancer and potential at risk conditions. Analysis of the wrongly classified
images at the top level pointed out that these were the lesions on which clinical
diagnosis of experienced dermatologists was most uncertain.

Table 6 Accuracy of the three subclassifiers and combined classifier over the three training sets
and test sets. Note that the Group1/2 results are only over the lesions correctly classified at the top
level. On the other hand, the full classifier results report accuracy based on both levels

                  Top level      Group1         Group2         Full classifier
Training set      95.7 ± 0.6     81.9 ± 3.6     91.9 ± 0.5     83.4 ± 1.4
Test set          93.9 ± 0.7     72.6 ± 2.4     86.2 ± 0.6     74.3 ± 2.5

Table 7 Classification results: confusion matrix on the test images. Rows are true classes,
columns are the selected classes

        AK     BCC    ML     SCC    SK
AK      7      27     1      9      1
BCC     2      210    6      14     7
ML      10     10     269    10     42
SCC     8      34     5      36     5
SK      9      8      33     8      199

The overall classification accuracy on the test set is 74.3 ± 2.5 %, as shown in
the right column of Table 6. The overall result also includes the ∼6 % misclassified
samples from the first level.
The overall performance (74 %) is not yet at the 90 % level (achieved after 20+
years of research) for differential diagnosis of moles versus melanoma, however,
our method addresses lesion classes that seem to have no previous automated image
analysis (outside of research from our group [2–4, 8, 9, 36, 39, 65]) and, as high-
lighted previously, our algorithms’ performance is above the diagnostic accuracy
currently being achieved by non-specialists.
Table 7 shows the confusion matrix of the whole HKNN system on the test im-
ages. This matrix has been obtained by adding the three confusion matrices from
the three test sets, as they are disjoint. We note a good percentage of correctly clas-
sified BCC, ML and SK. The number of correctly classified AK and SCC at first
glance looks quite low. This is due to the small number of images in each of these
two classes. However, most of the AKs are misclassified as BCC, and we should re-
member that AK is a pre-malignant lesion. Also, many SCC are classified as BCC,
which is another kind of cancer. Therefore the consequences of these mistakes are
not as dramatic as if they were diagnosed as benign. An additional split in the hier-
archy may improve results.

7 Conclusions

We have presented an algorithm based on a novel hierarchical K-NN classifier, and
its application to the first classification of the 5 most common classes of non-melanoma
skin lesions from color images. Our approach uses a hierarchical combination of
three classifiers, utilizing feature selection to tailor the feature set of each classifier
to its task. The hierarchical K-NN structure improves the performance of the system
over the flat K-NN and a Bayes classifier.
As the accuracy is above 70 %, this system could be used in the future as a
diagnostic aid for skin lesion images, particularly as the cancerous vs non cancerous
results are ∼94 %.
These results were produced by optimizing classification accuracy. For medical
use, future research should incorporate the cost of decisions into the optimization
process.
Further studies will include the extraction of other texture related features, the
evaluation of other feature selection methods and the use of a weighted K-NN
model, where neighbor images are weighted according to their distance to the test
image. In the future, it would be interesting to extend the hierarchical approach to
more than two hierarchical levels, including self-learned hierarchies.

Acknowledgement We thank the Wellcome Trust for funding this project (Grant No: 083928/Z/
07/Z).

Appendix

List of texture features selected for each level of the final tree. (See Table 8.)

Table 8 Legend: R = Red, G = Green, B = Blue, H = Hue, S = Saturation, V = Value, L, a, b =
Lab color space. Texture features are defined in [30]

(a) Top level
Texture feature       Interp. distance   Quant. levels   Color channels   Site
Entropy               5                  128             HV               Lesion
Cluster Shade         10                 128             Lb               Skin
Inv. Diff. Moment     5                  64              SV               Lesion
Contrast              15                 64              ab               Skin
Energy                5                  64              HV               Lesion
Inv. Diff. Moment     15                 64              RG               Lesion
Max Probability       5                  128             HH               Lesion/skin
Cluster Prominence    25                 64              GG               Skin
Correlation           10                 256             HV               Lesion
Cluster Prominence    15                 64              LL               Lesion-skin

(b) Group 1 (AK, BCC, SCC)
Texture feature       Interp. distance   Quant. levels   Color channels   Site
Variance              30                 64              HS               Lesion/skin
Energy                25                 256             ab               Lesion
Inv. Diff. Moment     25                 64              BB               Lesion
Entropy               15                 128             HH               Lesion
Max Probability       5                  256             RB               Lesion/skin
Cluster Prominence    10                 64              SS               Lesion-skin
Contrast              5                  64              SS               Lesion
Homogeneity           15                 256             RG               Lesion
Homogeneity           25                 128             SV               Lesion/skin
Inv. Diff. Moment     5                  64              HS               Skin

(c) Group 2 (ML, SK)
Texture feature       Interp. distance   Quant. levels   Color channels   Site
Correlation           10                 256             SS               Lesion
Dissimilarity         5                  64              bb               Skin
Cluster Shade         5                  64              HH               Lesion
Cluster Shade         5                  64              HV               Lesion
Cluster Prominence    5                  64              HH               Lesion
Cluster Prominence    5                  64              HS               Lesion
Cluster Shade         10                 64              HV               Lesion
Cluster Shade         5                  256             HV               Lesion
Correlation           5                  64              HV               Skin
Contrast              5                  64              HH               Lesion

References
1. Alcón JF, Heinrich A, Uzunbajakava N, Krekels G, Siem D, de Haan G (2009)
Automatic imaging system with decision support for inspection of pigmented skin lesions and
melanoma diagnosis. IEEE J Sel Top Signal Process 3:14–25
2. Aldridge RB, Glodzik D, Ballerini L, Fisher RB, Rees JL (2011) The utility of non-rule-based
visual matching as a strategy to allow novices to achieve skin lesion diagnosis. Acta Derm-
Venereol 91:279–283
3. Aldridge RB, Li X, Ballerini L, Fisher RB, Rees JL (2010) Teaching dermatology using 3-
dimensional virtual reality. Arch Dermatol 149(10)
4. Aldridge RB, Zanotto M, Ballerini L, Fisher RB, Rees JL (2011) Novice identification of
melanoma: not quite as straightforward as the ABCDs. Acta Derm-Venereol 91:125–130
5. Armengol E (2011) Classification of melanomas in situ using knowledge discovery with ex-
plained case-based reasoning. Artif Intell Med 51:93–105
6. Arvis V, Debain C, Berducat M, Benassi A (2004) Generalization of the cooccurrence matrix
for colour images: application to colour texture classification. Image Anal Stereol 23(1):63–
72

7. Aslandogan Y, Mahajani G (2004) Evidence combination in medical data mining. In: Proceed-
ings of international conference on information technology: coding and computing, vol 2, pp
465–469
8. Ballerini L, Li X, Fisher RB, Aldridge B, Rees J (2010) Content-based image retrieval of skin
lesions by evolutionary feature synthesis. In: di Chio C, et al (eds) Application of evolutionary
computation, Istanbul, Turkey. Lectures notes in computer science, vol 6024, pp 312–319
9. Ballerini L, Li X, Fisher RB, Rees J (2010) A query-by-example content-based image retrieval
system of non-melanoma skin lesions. In: Caputo B (ed) Proceedings MICCAI-09 workshop
MCBR-CDS 2009: medical content-based retrieval for clinical decision support. LNCS, vol
5853. Springer, Berlin, pp 31–38
10. Basarab T, Munn S, Jones RR (1996) Diagnostic accuracy and appropriateness of general
practitioner referrals to a dermatology out-patient clinic. Br J Dermatol 135(1):70–73
11. Cascinelli N, Ferrario M, Tonelli T, Leo E (1987) A possible new tool for clinical diagnosis
of melanoma: the computer. J Am Acad Dermatol 16(2):361–367
12. Cavalcanti PG, Scharcanski J (2011) Automated prescreening of pigmented skin lesions using
standard cameras. Comput Med Imaging Graph 35(6):481–491
13. Ceci M, Malerba D (2003) Hierarchical classification of HTML documents with WebClassII.
In: Proceedings of the 25th European conference on information retrieval, pp 57–72
14. Celebi ME, Iyatomi H, Schaefer G, Stoecker WV (2009) Lesion border detection in der-
moscopy images. Comput Med Imaging Graph 33(2):148–153
15. Celebi ME, Kingravi HA, Uddin B, Iyatomi H, Aslandogan YA, Stoecker WV, Moss RH
(2007) A methodological approach to the classification of dermoscopy images. Comput Med
Imaging Graph 31(6):362–373
16. Celebi ME, Stoecker WV, Moss RH (2011) Advances in skin cancer image analysis. Comput
Med Imaging Graph 35(2):83–84
17. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory
13(1):21–27
18. Cancer research UK (CRUK). CancerStats, Internet (2011). URL http://info.cancerresearchuk.
org/cancerstats. Accessed 03/08/2011
19. Dalal A, Moss RH, Stanley RJ, Stoecker WV, Gupta K, Calcara DA, Xu J, Shrestha B, Drugge
R, Malters JM, Perry LA (2011) Concentric decile segmentation of white and hypopigmented
areas in dermoscopy images of skin lesions allows discrimination of malignant melanoma.
Comput Med Imaging Graph 35(2):148–154
20. D’Alessio S, Murray K, Schiaffino R, Kershenbaum A (2000) The effect of using hierarchical
classifiers in text categorization. In: Proceedings of 6th international conference recherche
d’information assistee par ordinateur, pp 302–313
21. Day GR, Barbour RH (2000) Automated melanoma diagnosis: where are we at? Skin Res
Technol 6:1–5
22. Dimitrovski I, Kocev D, Loskovska S, Dzeroski S (2011) Hierarchical annotation of medical
images. Pattern Recognit 44(10–11):2436–2449
23. Dumais S, Chen H (2000) Hierarchical classification of web content. In: Proceedings of the
23rd annual international ACM SIGIR conference on research and development in information
retrieval. ACM, New York, pp 256–263
24. Duwairi R, Al-Zubaidi R (2011) A hierarchical K-NN classifier for textual data. Int Arab J Inf
Technol 8(3):251–259
25. Fix E, Hodges JL (1989) Discriminatory analysis. Nonparametric discrimination: consistency
properties. Int Stat Rev 57(3):238–247
26. Garnavi R, Aldeen M, Celebi ME, Varigos G, Finch S (2011) Border detection in dermoscopy
images using hybrid thresholding on optimized color channels. Comput Med Imaging Graph
35(2):105–115
27. Gerbert B, Maurer T, Berger T, Pantilat S, McPhee SJ, Wolff M, Bronstone A, Caspers N
(1996) Primary care physicians as gatekeepers in managed care: primary care physicians’ and
dermatologists’ skills at secondary prevention of skin cancer. Arch Dermatol 132(9):1030–
1038

28. Gordon AD (1987) A review of hierarchical classification. J R Stat Soc A 150(2):119–137


29. Green A, Martin N, McKenzie G, Pfitzner J, Quintarelli F, Thomas BW, O’Rourke M, Knight
N (1991) Computer image analysis of pigmented skin lesions. Melanoma Res 1:231–236
30. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification.
IEEE Trans Syst Man Cybern 3(6):610–621
31. Hintz-Madsen M, Hansen LK, Larsen J, Olesen E, Drzewiecki KT (1995) Design and evalua-
tion of neural classifiers application to skin lesion classification. In: Proceedings of the 1995
IEEE workshop on neural networks for signal processing V, pp 484–493
32. Iyatomi H, Celebi ME, Schaefer G, Tanaka M (2011) Automated color calibration method for
dermoscopy images. Comput Med Imaging Graph 35(2):89–98
33. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pat-
tern Anal Mach Intell 22(1):4–37
34. Jain AK, Zongker D (1997) Feature-selection: evaluation, application, and small sample per-
formance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158
35. Ko CB, Walton S, Keczkes K, Bury HPR, Nicholson C (1994) The emerging epidemic of skin
cancer. Br J Dermatol 130:269–272
36. Laskaris N, Ballerini L, Fisher RB, Aldridge B, Rees J (2010) Fuzzy description of skin le-
sions. In: Manning DJ, Abbey CK (eds) Medical imaging 2010: image perception, observer
performance, and technology assessment. Proceedings of the SPIE, vol 7627, pp 762,717-1–
762,717-10
37. Lee TK, Claridge E (2005) Predictive power of irregular border shapes for malignant
melanomas. Skin Res Technol 11(1):1–8
38. Lehmann TM, Palm C (2001) Color line search for illuminant estimation in real-world scenes.
J Opt Soc Am A 18(11):2679–2691
39. Li X, Aldridge B, Ballerini L, Fisher R, Rees J (2009) Depth data improves skin lesion seg-
mentation. In: Proceedings of the 12th international conference on medical image computing
and computer assisted intervention (MICCAI), London, pp 1100–1107
40. Maglogiannis I, Doukas CN (2009) Overview of advanced computer vision systems for skin
lesions characterization. IEEE Trans Inf Technol Biomed 13(5):721–733
41. Maglogiannis I, Pavlopoulos S, Koutsouris D (2005) An integrated computer supported ac-
quisition, handling, and characterization system for pigmented skin lesions in dermatological
images. IEEE Trans Inf Technol Biomed 9(1):86–98
42. Martínez-Otzeta JM, Sierra B, Lazkano E, Astigarraga A (2006) Classifier hierarchy learning
by means of genetic algorithms. Pattern Recognit Lett 27(16):1998–2004
43. Mete M, Kockara S, Aydin K (2011) Fast density-based lesion detection in dermoscopy im-
ages. Comput Med Imaging Graph 35(2):128–136
44. Morrison A, O’Loughlin S, Powell FC (2001) Suspected skin malignancy: a comparison of di-
agnoses of family practitioners and dermatologists in 493 patients. Int J Dermatol 40(2):104–
107
45. Murtagh F (1983) A survey of recent advances in hierarchical clustering algorithms. Comput
J 26(4):354–359
46. Ohta YI, Kanade T, Sakai T (1980) Color information for region segmentation. Comput Graph
Image Process 13(1):222–241
47. Pourghassem H, Ghassemian H (2008) Content-based medical image classification using a
new hierarchical merging scheme. Comput Med Imaging Graph 32(8):651–661
48. Rahman MM, Desai BC, Bhattacharya P (2006) Image retrieval-based decision support system
for dermatoscopic images. In: IEEE symposium on computer-based medical systems. IEEE
Computer Society, Los Alamitos, pp 285–290
49. Rigel DS, Russak J, Friedman R (2010) The evolution of melanoma diagnosis: 25 years be-
yond the ABCDs. CA: Cancer J Clinicians 60(5):301–316
50. Rodriguez C, Boto F, Soraluze I, Pérez A (2002) An incremental and hierarchical K-NN
classifier for handwritten characters. In: Proceedings of the 16th international conference on
pattern recognition (ICPR’02), vol 3. IEEE Computer Society, Washington, pp 98–101

51. Rosado B, Menzies S, Harbauer A, Pehamberger H, Wolff K, Binder M, Kittler H (2003)


Accuracy of computer diagnosis of melanoma: a quantitative meta-analysis. Arch Dermatol
139(3):361–367
52. Sadeghi M, Razmara M, Lee TK, Atkins M (2011) A novel method for detection of pigment
network in dermoscopic images using graphs. Comput Med Imaging Graph 35(2):137–143
53. Salah B, Alshraideh M, Beidas R, Hayajneh F (2011) Skin cancer recognition by using a
neuro-fuzzy system. Cancer Inform 10:1–11
54. Schaefer G, Rajab MI, Celebi ME, Iyatomi H (2011) Colour and contrast enhancement for
improved skin lesion segmentation. Comput Med Imaging Graph 35(2):99–104
55. Schmid-Saugeons P, Guillod J, Thiran JP (2003) Towards a computer-aided diagnosis system
for pigmented skin lesions. Comput Med Imaging Graph 27:65–78
56. Seidenari S, Pellacani G, Pepe P (1998) Digital videomicroscopy improves diagnostic accu-
racy for melanoma. J Am Acad Dermatol 39(2):175–181
57. Stoecker WV, Wronkiewiecz M, Chowdhury R, Stanley RJ, Xu J, Bangert A, Shrestha B,
Calcara DA, Rabinovitz HS, Oliviero M, Ahmed F, Perry LA, Drugge R (2011) Detection of
granularity in dermoscopy images of malignant melanoma using color and texture features.
Comput Med Imaging Graph 35(2):144–147
58. Sun A, Lim EP, Ng WK (2003) Performance measurement framework for hierarchical text
classification. J Am Soc Inf Sci Technol 54:1014–1028
59. Tommasi T, Deselaers T (2010) The medical image classification task. In: ImageCLEF: the
information retrieval series, vol 32, pp 221–238
60. Viola KV, Tolpinrud WL, Gross CP, Kirsner RS, Imaeda S, Federman DG (2011) Outcomes
of referral to dermatology for suspicious lesions: implications for teledermatology. Arch Der-
matol 147(5):556–560
61. Wang H, Moss RH, Chen X, Stanley RJ, Stoecker WV, Celebi ME, Malters JM, Grichnik JM,
Marghoob AA, Rabinovitz HS, Menzies SW, Szalapski TM (2011) Modified watershed tech-
nique and post-processing for segmentation of skin lesions in dermoscopy images. Comput
Med Imaging Graph 35(2):116–120
62. Wettschereck D, Aha DW, Mohri T (1997) A review and empirical evaluation of feature
weighting methods for a class of lazy learning algorithms. Artif Intell Rev 11:273–314
63. Wollina U, Burroni M, Torricelli R, Gilardi S, Dell’Eva G, Helm C, Bardey W (2007) Digital
dermoscopy in clinical practise: a three-centre analysis. Skin Res Technol 13:133–142
64. Zanotto M (2010) Visual description of skin lesions. Master’s thesis, School of Informatics,
University of Edinburgh
65. Zanotto M, Ballerini L, Aldridge B, Fisher RB, Rees J (2011) Visual cues do not improve
skin lesion ABC(D) grading. In: Manning DJ, Abbey CK (eds) Medical imaging 2011: image
perception, observer performance, and technology assessment. Proceedings of the SPIE, vol
7966, pp 79,660U-1–79,660U-10
66. Zhou H, Schaefer G, Celebi ME, Lin F, Liu T (2011) Gradient vector flow with mean shift for
skin lesion segmentation. Comput Med Imaging Graph 35(2):121–127
