A Color and Texture Based Hierarchical K-NN Approach to the Classification of Non-melanoma Skin Lesions
1 Introduction
Skin cancers are the most common forms of human malignancy in fair-skinned populations [18]. Although malignant melanoma is the form of “skin cancer” with the highest mortality, the “non-melanoma skin cancers” (basal cell carcinomas, squamous cell carcinomas, etc.) are far more common. The incidence of both melanoma and non-melanoma skin cancers is increasing.
This paper introduces a new hierarchical framework for skin lesion classification. The framework comprises a modified version of the K-Nearest Neighbors (K-NN) classifier, the Hierarchical K-Nearest Neighbors (HKNN) classifier, and a new similarity measure, based on color and texture features, that uses different feature sets for comparing similarity at each node of the hierarchy.
The motivation for using a K-NN classifier can be seen in Fig. 4: the clusters overlap greatly yet remain distinguishable, and no hard boundary of the kind used by a support vector machine or a Bayesian classifier could separate them.
Below we describe how the lesion classes can be organized in a hierarchical scheme (Sect. 2) that suggests the use of a hierarchical classifier (Sect. 3). Then we introduce the feature pool (Sect. 4). We make two claims:
1. The use of a hierarchical K-NN classifier improves classification accuracy from 70 % to 74 % compared with a non-hierarchical K-NN, and from 67 % and 69 % for a flat and a hierarchical Bayes classifier, respectively,
2. This is the most extensive paper to present lesion classification results for non-
melanoma skin cancer using color imagery acquired by a standard camera, unlike
the dermoscopy method, which requires a specialised sensor.
While 74 % is low compared to the 90+ % rates achieved for melanoma classification, we argue that 74 % is worth publishing: (a) the melanoma results come from the two-class problem of melanoma vs melanocytic naevi (moles) only, and (b) it has taken more than 20 years of research specifically on that problem to reach the 90+ % levels, whereas this is the first research on image-based classification of AK, BCC, SCC and SK. We accept that, whilst classification rates of this magnitude seem low in the sphere of informatics research, they are significantly above what is currently being achieved in non-specialist medical practice [10, 21, 27, 44, 51, 60].
At each node of a hierarchical scheme, a classifier has to distinguish only a subset of the classes. Therefore, it may be possible to separate this smaller number of classes with higher accuracy. Moreover, it may be possible to make this determination based on a smaller set of features.
The proposed approach also addresses the feature selection problem. Reducing the feature space avoids many problems associated with high-dimensional feature spaces, such as the “curse of dimensionality” [33], where indexing structures degrade and the significance of each feature decreases, making storing, indexing and classifying extremely time consuming. Moreover, in many situations features are correlated, so they carry redundant information about the images, which can deteriorate the ability of the system to distinguish them correctly. Dimensionality reduction or feature selection has been an active research area in the pattern recognition, statistics and data mining communities. The main idea of feature selection is to choose a subset of the input features by eliminating features with little or no predictive information.
It is important to note that the key here is not merely the use of feature selection, but its integration with the hierarchical structure. In practice we build different classifiers using different sets of training images (according to the classifications made at the higher levels of the hierarchy), so each classifier uses a different set of features optimized for those images. This forces the individual classifiers to use potentially independent information.
Hierarchical classifiers are well known [28, 45, 58] and commonly used for document and text classification [13, 20, 23, 50], including a hierarchical K-NN classifier [24]. While we found papers describing applications of hierarchical systems to medical image classification and annotation tasks [22, 47, 59], to the best of our knowledge only a hierarchical neural network model has been applied to skin lesions [53]. Its authors claim over 90 % accuracy on 58 images including 4 melanomas; unfortunately many technical details are not described in the paper. In contrast, only poor performance has been reported for the classification of melanoma using the K-NN method [7, 31]. Some promising results have been presented very recently using a K-NN followed by a Decision Tree classifier [12].
The K-NN classifier was first introduced by Fix and Hodges [25] in 1951. It is well explored in the literature and has been shown to perform well on a wide range of real-world data sets [17]. Many lazy learning algorithms are derivatives of K-NN; a review of them is presented by Wettschereck et al. [62]. A recent application of one of these similarity-based learning algorithms, namely the lazy CL procedure, to melanoma is described by Armengol [5].
To classify an unknown example T , the K-NN classifier finds the K nearest
neighbors among the training data and uses the categories of the K neighbors
to weight the category candidates. Then majority voting among the categories of
data in the neighborhood is used to decide the class label of T. Given M classes C_1, C_2, ..., C_M and N training samples I_1, I_2, ..., I_N, let the classification of I_i with respect to category C_j (i = 1, ..., N; j = 1, ..., M) be

$$y(I_i, C_j) = \begin{cases} 1 & I_i \in C_j \\ 0 & I_i \notin C_j \end{cases} \qquad (1)$$

Then the decision rule in K-NN can be written as:

$$\text{assign } T \text{ to } C_j \text{ if } \operatorname{score}(T, C_j) = \arg\max_{j=1,\dots,M} \sum_{i=1}^{K} y(I_i, C_j) \qquad (2)$$
where the training examples Ii are ranked according to their similarity to the test
example T .
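A minimal NumPy sketch of this decision rule follows; the function and variable names are ours, and `dist` stands for a distance function such as the one defined later in Eq. (13).

```python
import numpy as np

def knn_classify(test_feat, train_feats, train_labels, K, dist):
    """Assign a class to the test example by majority voting among its
    K nearest training samples, following Eqs. (1)-(2)."""
    # Rank the training examples by their distance (low = similar) to the test example
    d = np.array([dist(test_feat, f) for f in train_feats])
    nearest = np.argsort(d)[:K]                      # indices of the K nearest neighbours
    labels = np.asarray(train_labels)
    # score(T, C_j) = number of the K neighbours that belong to class C_j
    scores = {c: int(np.sum(labels[nearest] == c)) for c in np.unique(labels)}
    return max(scores, key=scores.get)               # class with the highest score
```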
The K-NN classifier has only one free parameter, K, which can be optimized by a leave-one-out cross-validation procedure given the distance function Dist (see Eq. (13)) used to determine the ‘nearest’ neighbors. Choosing the properties to be used in each classifier is a core issue and is addressed next. The actual distance metrics are presented in Sect. 4.
Our Hierarchical K-NN classifier (HKNN) is composed of three distinct K-NN classifiers, one at the top level and two at the bottom level. The top-level classifier is fed with all the images in the training set and classifies them into one of the two groups. The other two classifiers are trained using only the images of the corresponding group (i.e. AK/BCC/SCC or ML/SK) that were correctly classified by the top classifier during the training stage, and classify them into one of the 2 or 3 diagnostic classes.
The learning phase consists of the feature selection process for the three distinct K-NN classifiers. A sequential forward selection (SFS) algorithm [34] is used for feature selection. The goal is to maximize classification accuracy. Because of the uneven class distribution of our data set, we use a weighted classification accuracy: the rate at which the system correctly identifies each class, averaged over the number of classes. Therefore our overall classification accuracy is defined as:
$$\text{Overall accuracy} = \frac{1}{M} \sum_{j=1}^{M} \frac{\mathrm{correctly\_classified}(C_j)}{\mathrm{number\_of\_test\_images}(C_j)} \qquad (3)$$
where correct classification is determined by the decision rule in Eq. (2). The features that maximize the classification accuracy over all the images in the training set are selected from among all the extracted features.
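A bare-bones sketch of the weighted accuracy of Eq. (3) and of a greedy forward-selection loop is given below; it is illustrative only, and `evaluate` is a placeholder for the leave-one-out K-NN evaluation on the training set.

```python
import numpy as np

def overall_accuracy(y_true, y_pred, classes):
    """Eq. (3): mean of the per-class correct-classification rates."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rates = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(rates))

def sequential_forward_selection(candidates, evaluate, max_features=10):
    """Greedy SFS: repeatedly add the candidate feature that most improves
    the weighted accuracy returned by `evaluate(selected_features)`."""
    selected, remaining, best_acc = [], list(candidates), 0.0
    while remaining and len(selected) < max_features:
        acc, best_f = max(((evaluate(selected + [f]), f) for f in remaining),
                          key=lambda t: t[0])
        if acc <= best_acc:              # stop when no candidate improves accuracy
            break
        selected.append(best_f)
        remaining.remove(best_f)
        best_acc = acc
    return selected, best_acc
```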
At the end, there are three sets of features for the three classification tasks: one selected for the top classifier and two selected for the subclassifiers. The feature sets for the two subsystems are also selected using SFS, but using only images from the appropriate classes (i.e. AK/BCC/SCC or ML/SK). Note that, since every subnode in the hierarchy handles only a subset of the classes and each subnode has fewer images, the additional cost of feature selection is not substantially more than that of a flat classification scheme.
In the classification phase all test images are classified through the hierarchical structure. Each image is first assigned to one of the two groups by the top-level classifier, which uses the first set of features. Then one of the second-level classifiers is invoked according to the output group of the top classifier, and the image is classified into one of the 5 diagnostic classes using one of the two other subsets of features.
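Schematically, the classification phase can be sketched as follows (an illustration, not the authors' code; each classifier object is assumed to wrap a K-NN with its own feature subset and training images):

```python
def hknn_classify(image_feats, top_clf, group1_clf, group2_clf):
    """Two-level HKNN classification: the top-level K-NN separates
    cancer/pre-malignant lesions (AK/BCC/SCC) from benign ones (ML/SK);
    the matching subclassifier then assigns one of the 5 diagnostic classes.
    Each classifier carries its own selected feature subset and training images."""
    group = top_clf.predict(image_feats)            # uses feature subset 1
    if group == "AK/BCC/SCC":
        return group1_clf.predict(image_feats)      # subset 2: AK, BCC or SCC
    return group2_clf.predict(image_feats)          # subset 3: ML or SK
```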
A drawback of the proposed method is that errors at the first classification level cannot be corrected at the second level. If an example is incorrectly classified at the top level and assigned to a group that does not contain the true class, the classifiers at lower levels have no chance of producing a correct classification. This is known as the “blocking” problem [58]. One attempt to solve this problem could be to let the second-level classifiers classify into more than the two or three classes for which they are optimized. Our experiments in this direction showed that not only did these classifiers give much worse results, they also incurred additional problems: the very small number of images wrongly classified at the first level makes the classes even more unbalanced.
4 Feature Description
Here, skin lesions are characterized by their color and texture. In this section we
will describe a set of features that can capture such properties.
Color features are represented by the mean colors μ = (μ_R, μ_G, μ_B) of the lesion and their covariance matrices Σ. Let

$$\mu_X = \frac{1}{N} \sum_{i=1}^{N} X_i \qquad \text{and} \qquad C_{XY} = \frac{1}{N} \sum_{i=1}^{N} X_i Y_i - \mu_X \mu_Y \qquad (4)$$
where N is the number of pixels in the lesion and X_i is the color component of channel X (X, Y ∈ {R, G, B}) of pixel i. In the RGB (Red, Green, Blue) color space, the covariance matrix is:

$$\Sigma = \begin{bmatrix} C_{RR} & C_{RG} & C_{RB} \\ C_{GR} & C_{GG} & C_{GB} \\ C_{BR} & C_{BG} & C_{BB} \end{bmatrix} \qquad (5)$$
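In NumPy, Eqs. (4)-(5) amount to the following sketch, where `lesion_pixels` is an assumed (N, 3) array of RGB values:

```python
import numpy as np

def color_mean_and_covariance(lesion_pixels):
    """Mean RGB color and 3x3 covariance matrix of Eqs. (4)-(5).
    `lesion_pixels` is an (N, 3) array of the R, G, B values of the lesion."""
    pix = np.asarray(lesion_pixels, dtype=float)
    mu = pix.mean(axis=0)                                    # (mu_R, mu_G, mu_B)
    # C_XY = (1/N) * sum_i X_i * Y_i  -  mu_X * mu_Y,  for X, Y in {R, G, B}
    sigma = pix.T @ pix / len(pix) - np.outer(mu, mu)
    return mu, sigma
```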
In this work, the RGB, HSV (Hue, Saturation, Value), CIE_Lab, CIE_Lch (Munsell color coordinate system [48]) and Ohta [46] color spaces were considered. Four normalization techniques for reducing the impact of lighting were investigated; these were applied before extracting the color features. In the end, we normalized each color component by dividing it by the average of the same component over the healthy skin of the same patient, because this gave the best performance of the normalization techniques tried. After experimenting with the 5 different color spaces, we chose normalized RGB, because it gave slightly better results than the other color spaces (see Sect. 5.4.2).
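The chosen normalization can be sketched as below (illustrative; `healthy_skin_pixels` is an assumed patch of normal skin from the same patient):

```python
import numpy as np

def normalize_by_healthy_skin(lesion_pixels, healthy_skin_pixels):
    """Divide each color component of the lesion pixels by the mean of the
    same component over a patch of healthy skin from the same patient."""
    skin_mean = np.asarray(healthy_skin_pixels).mean(axis=0)   # (mean_R, mean_G, mean_B)
    return np.asarray(lesion_pixels) / skin_mean
```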
Texture features are derived from generalized co-occurrence matrices (GCMs) [6], computed for pairs of color channels at several inter-pixel distances and orientations θ. In order to make our set of GCMs orientation invariant, we averaged the matrices with respect to θ. Quantization levels N_G = 64, 128, 256 are used for the three color spaces: RGB, HSV and CIE_Lab.
From each GCM we extract 12 texture features: energy, contrast, correlation, entropy, homogeneity, inverse difference moment, cluster shade, cluster prominence, maximum probability, autocorrelation, dissimilarity and variance, as defined in [30], for a total of 3888 texture features (12 features × 6 inter-pixel distances × 6 color pairs × 3 color spaces × 3 gray level quantisations). Two sets of texture features are extracted from GCMs calculated over the lesion area of the image and over a patch of healthy skin of the same image. Differences and ratios of the lesion and normal-skin values are also calculated, giving 2 more sets of features:
$$\mathrm{feature}_{l-s} = \mathrm{feature}_{lesion} - \mathrm{feature}_{healthy\_skin} \qquad (8)$$
$$\mathrm{feature}_{l/s} = \mathrm{feature}_{lesion} / \mathrm{feature}_{healthy\_skin} \qquad (9)$$

Altogether, for a given feature family we use {feature_lesion, feature_healthy_skin, feature_{l-s}, feature_{l/s}}. This gives a total of 4 × 3888 = 15552 possible texture features, from which we extract a good subset. All features are z-normalized over all training data.
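For concreteness, the sketch below builds a co-occurrence matrix for one pair of quantized color channels, accumulated over four assumed orientations, and computes two of the listed features; it is only an illustration of the idea, not the authors' implementation, which uses six inter-pixel distances, six color pairs and three color spaces.

```python
import numpy as np

def gcm(chan_x, chan_y, d, ng=64):
    """Generalized co-occurrence matrix for a pair of color channels, summed
    over four orientations (0, 45, 90, 135 degrees) for orientation
    invariance. `chan_x`, `chan_y` are 2-D arrays with values in [0, 1]."""
    qx = np.clip((chan_x * ng).astype(int), 0, ng - 1)   # quantize to ng levels
    qy = np.clip((chan_y * ng).astype(int), 0, ng - 1)
    m = np.zeros((ng, ng))
    for dr, dc in [(0, d), (-d, d), (-d, 0), (-d, -d)]:  # the four orientations
        a = qx[max(0, -dr):qx.shape[0] - max(0, dr), max(0, -dc):qx.shape[1] - max(0, dc)]
        b = qy[max(0, dr):qy.shape[0] + min(0, dr), max(0, dc):qy.shape[1] + min(0, dc)]
        np.add.at(m, (a.ravel(), b.ravel()), 1)          # count level co-occurrences
    return m / m.sum()

def energy(m):
    """Angular second moment of the normalized co-occurrence matrix."""
    return float(np.sum(m ** 2))

def contrast(m):
    """Contrast between co-occurring quantization levels."""
    i, j = np.indices(m.shape)
    return float(np.sum((i - j) ** 2 * m))
```

The lesion/skin differences and ratios of Eqs. (8)-(9) are then simple element-wise operations on the resulting feature vectors.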
Color ratio features are designed ad hoc for skin lesions, by observing color variations inside the lesion area. Mean colors μ_A and μ_B are extracted over the areas A and B shown in Fig. 3, and their ratio is calculated as:

$$\mathrm{ratio} = \frac{\mu_A}{\mu_B} \qquad (10)$$
Two different area sizes are considered: in the first case, the thickness of the border area corresponds to 10 % of the lesion area; in the second case, the diameter of the inner area is 1/3 of the diameter of the whole lesion. Since lesions are not circular, the morphological erosion operator is applied iteratively inward from the border until the desired percentage of lesion-area pixels is reached. These features seem particularly useful for BCCs, which present pearly edges.
Ad hoc color ratio features are calculated for the three color spaces RGB, HSV and CIE_Lab, and all feature sets are z-normalized. These properties are included in the texture feature set.
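One possible realization of this feature, assuming a binary lesion mask and using iterative binary erosion from SciPy, is sketched below for the inner-area variant (the fraction and names are illustrative):

```python
import numpy as np
from scipy import ndimage

def inner_to_border_ratio(rgb_image, lesion_mask, inner_fraction=1/3):
    """Color-ratio feature in the spirit of Eq. (10): mean color of an inner
    area divided by the mean color of the surrounding border area, with the
    inner area obtained by eroding the lesion mask until it covers the desired
    fraction of the lesion pixels."""
    inner = lesion_mask.astype(bool)
    target = inner_fraction * inner.sum()
    while inner.sum() > target:                 # erode inward, one pixel at a time
        inner = ndimage.binary_erosion(inner)
    border = lesion_mask.astype(bool) & ~inner
    mu_inner = rgb_image[inner].mean(axis=0)    # mean (R, G, B) over each area
    mu_border = rgb_image[border].mean(axis=0)
    return mu_inner / mu_border
```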
The color and texture features are combined to construct a distance measure between
each test image T and a database image I .
For the color covariance-based features, the Bhattacharyya distance metric

$$BD_{CF}(T, I) = \frac{1}{8}(\mu_T - \mu_I)^{T} \left( \frac{\Sigma_T + \Sigma_I}{2} \right)^{-1} (\mu_T - \mu_I) + \frac{1}{2} \ln \frac{\left| \frac{\Sigma_T + \Sigma_I}{2} \right|}{\sqrt{|\Sigma_T| |\Sigma_I|}} \qquad (11)$$

is used, where μ_T and μ_I are the average (over all pixels in the lesion) color feature vectors, Σ_T and Σ_I are the covariance matrices of the lesions of T and I respectively, and | · | denotes the matrix determinant.
The Euclidean distance

$$ED_{TF}(T, I) = \left\| \mathbf{f}^{T}_{subset} - \mathbf{f}^{I}_{subset} \right\| = \sqrt{ \sum_{i=1}^{S} \left( f^{T}_{i} - f^{I}_{i} \right)^{2} } \qquad (12)$$

is used for the distances between a subset of S texture features f_subset, selected as described later. Other distance metrics (Mahalanobis, city-block) were considered but gave worse results.
We aggregate the two distances into a distance matching function:

$$\mathrm{Dist}(T, I) = w \cdot BD_{CF}(T, I) + (1 - w) \cdot ED_{TF}(T, I) \qquad (13)$$

where w is a weighting factor selected experimentally after trying all the values {0.1, 0.2, ..., 0.9}; in our case w = 0.7 gave the best results. A low value of Dist indicates a high similarity.
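Putting Eqs. (11)-(13) together, a sketch of the combined distance might look like this (names are ours; `color_*` are the (μ, Σ) pairs and `tex_*` the selected, z-normalized texture feature vectors):

```python
import numpy as np

def bhattacharyya_distance(mu_t, sigma_t, mu_i, sigma_i):
    """Color distance of Eq. (11) between the two lesions' color distributions."""
    sigma = (sigma_t + sigma_i) / 2
    diff = mu_t - mu_i
    maha = diff @ np.linalg.inv(sigma) @ diff / 8
    logdet = 0.5 * np.log(np.linalg.det(sigma) /
                          np.sqrt(np.linalg.det(sigma_t) * np.linalg.det(sigma_i)))
    return maha + logdet

def combined_distance(color_t, color_i, tex_t, tex_i, w=0.7):
    """Eq. (13): weighted combination of the Bhattacharyya color distance and
    the Euclidean distance (Eq. (12)) between selected texture features."""
    bd = bhattacharyya_distance(*color_t, *color_i)   # color_* = (mu, Sigma)
    ed = np.linalg.norm(np.asarray(tex_t) - np.asarray(tex_i))
    return w * bd + (1 - w) * ed
```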
5 Methods
The features described in previous sections were extracted from the lesions in our
image database. In this section we will describe in detail the image analysis and the
choices of the model parameters.
Our image database comprises 960 lesions, belonging to 5 classes (45 AK,
239 BCC, 331 ML, 88 SCC, 257 SK). The ground truth used for the experiments is
based on the agreed classifications by 2 dermatologists and a pathologist.
Images are acquired using a Canon EOS 350D SLR camera. Lighting was controlled using a ring flash, and all images were captured at the same distance (∼50 cm), resulting in a pixel resolution of about 0.03 mm. Lesions are segmented before feature extraction.
Specular highlights appear as small, bright regions in various parts of our skin images. The highlights created by specular reflections are a major obstacle to proper color and texture feature extraction. Specular highlights are often characterized by the local coincidence of intense brightness (I) and low color saturation (S). Intensity and saturation are defined as follows:
$$I = \frac{R + G + B}{3} \qquad (14)$$
$$S = 1 - \frac{\min(R, G, B)}{I} \qquad (15)$$

and candidate specular reflection regions can be identified using appropriate threshold values (motivated by [38]):

$$I > I_{thr} \cdot I_{max} \qquad (16)$$
$$S < S_{thr} \cdot S_{max} \qquad (17)$$
where I_max and S_max are the maximum intensity and saturation in the image, respectively. The experimentally chosen threshold values (I_thr = 0.8 and S_thr = 0.5) differ from the values proposed in [38], probably because of the different nature of the images.
We did not apply any subsequent filling procedure to the detected regions, as this may destroy the original texture and therefore have a negative impact on the subsequent feature extraction. Areas identified as “highlight” were simply excluded from the region in which feature extraction takes place.
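A sketch of this masking step, with the thresholds quoted above, could be (illustrative only):

```python
import numpy as np

def specular_highlight_mask(rgb, i_thr=0.8, s_thr=0.5):
    """Flag candidate specular-highlight pixels using Eqs. (14)-(17): high
    intensity combined with low saturation. `rgb` is an (H, W, 3) float array;
    flagged pixels are excluded from feature extraction rather than inpainted."""
    intensity = rgb.sum(axis=-1) / 3                                   # Eq. (14)
    saturation = 1 - rgb.min(axis=-1) / np.maximum(intensity, 1e-6)    # Eq. (15)
    return (intensity > i_thr * intensity.max()) & \
           (saturation < s_thr * saturation.max())                     # Eqs. (16)-(17)
```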
The features described in the previous sections have very different value ranges. To account for this, an objective rescaling of the features is achieved by normalizing each feature set to z-scores, defined as

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \qquad (18)$$
where x_ij represents the ith sample of feature j, μ_j the mean value of all samples of feature j, and σ_j the standard deviation of the samples of feature j. In addition, feature values outside the 5th–95th percentile range are truncated to the 5th or 95th percentile value, and the normalising μ and σ are calculated from the truncated set. The normalising parameters were kept constant over all experiments.
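A compact sketch of this normalization, with the 5–95 percentile truncation applied before computing μ and σ, follows (illustrative names):

```python
import numpy as np

def truncated_z_scores(train_features):
    """Eq. (18) with 5-95 percentile truncation: feature values are clipped to
    the 5th/95th percentiles, and mu, sigma are computed on the clipped data.
    `train_features` is an (n_samples, n_features) array; the returned
    parameters are then reused unchanged for the test data."""
    lo, hi = np.percentile(train_features, [5, 95], axis=0)
    clipped = np.clip(train_features, lo, hi)
    mu, sigma = clipped.mean(axis=0), clipped.std(axis=0)
    return (clipped - mu) / sigma, mu, sigma
```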
5.4 Evaluation
To assess performance, training and test sets were created by randomly splitting the data set into 3 equal subsets. The only constraint on the otherwise random partitioning was that each class was represented equally in each subset. A 3-fold cross-validation scheme was used: 3 sets composed of two-thirds of the data were created and used as training sets for feature selection, with the remaining one-third of the data used as the test set for classification with the selected features. Thus no training example used for feature selection was used as a test example in the same experiment. Three experiments were conducted independently, and performance is reported as the mean and standard deviation over the three experiments.
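For reference, an equivalent stratified 3-fold split can be written with scikit-learn (a sketch under the assumption of a flat feature matrix; the original work did not necessarily use this library):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

features = np.random.rand(960, 20)                                      # placeholder feature matrix
labels = np.random.choice(["AK", "BCC", "ML", "SCC", "SK"], size=960)   # placeholder labels

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(features, labels):
    # two thirds of the data: feature selection and training;
    # the remaining third: testing only, with the selected features
    train_feats, test_feats = features[train_idx], features[test_idx]
    train_labels, test_labels = labels[train_idx], labels[test_idx]
```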
For the hierarchical classifiers mentioned in the previous sections, the most commonly used performance measures are the classic information-retrieval notions of precision and recall, or a combination of the two [58]. As we are dealing with a classification task rather than a retrieval task, we use the classification accuracy derived from the confusion matrix. In the training stage, confusion matrices are obtained by a leave-one-out scheme, where each image is used as a test image and classified according to the known classifications of the remaining images in the training set. In the classification stage, confusion matrices are obtained slightly differently: each image of the test set is classified according to the known classifications of its K nearest neighbors in the training set.
Classification results were evaluated while varying the value of K in the K-NN classifiers. In some experiments we noticed a slight improvement when using a smaller value of K for feature selection and a larger one for classification. Table 1 shows our evaluation. The numbers (mean ± standard deviation of the accuracy over the three sets) in the first column are obtained in the feature selection stage, i.e. using the value of K written to their left. The highest classification accuracy over the test sets for each value of K used during feature selection is highlighted in boldface. We chose values of K that (1) are odd, (2) are smaller than the training class sizes and (3) span what seemed like a sensible range. Since performance does not vary much between the K = 11 and K = 15 test cases, any value of K in this range is probably about equally effective. In the following, the presented
Table 1 Accuracy of the three subclassifiers when varying the value of K. Each row shows the value of K used in training; columns show the K used in testing
(a) Top level
              Training Set    Test Set
              K = 7    K = 11    K = 15
results are obtained using the combination of K that gave the best classification
accuracy (underlined in the table) on the test set for each subclassifier (top level
classifier: train K = 15, test K = 15; AK/BCC/SCC classifier: train K = 15, test
K = 11; ML/SK classifier: train K = 11, test K = 15). Recalling that K is the num-
ber of nearest samples used to classify the image under examination, it is technically
correct to use different values at the classification stage than those used during the
feature selection stage.
A comparison of the accuracy (mean ± standard deviation over the three sets) of
the three subclassifiers using only color features is reported in Table 2, using the K
values reported in the previous section. Note that values for RGB are different from
Table 1 because texture features are not used here.
The best results are obtained using RGB and Ohta color spaces. Actually all
accuracies are nearly identical before normalization. After normalization, RGB and
Ohta color spaces still give best results for the top classifier, while RGB gives much
better results for the other two subclassifiers. These data also indicate that color
features are more important at the top level of the hierarchy, i.e. in discriminating
cancerous vs non cancerous lesions.
The texture feature set that best discriminates between the groups at the first level
of the hierarchy is different from the feature sets that best discriminate at the second
level, and these two sets also differ between each other. Figure 4 shows a scatter
plot of the two top features for each classifier. The list of selected features for each
level of the hierarchy is reported in the Appendix (see Table 8). Considering that
a potentially very different set of features is selected at each node of the hierarchy,
we can say that the hierarchical method, as a whole, actually uses a larger set of
features in a more efficient way, without ending up in problems like the “curse of
dimensionality”. Hence, there is a benefit from the hierarchical scheme.
We can observe that color information is important also in the texture features
because texture properties extracted using different color channels are selected.
The plots of accuracy vs the number of features (from 1 to 10 for each level of the hierarchy) are shown in Fig. 5. We show only the plots for one of the three subsets and for the best K combinations. Keeping in mind that the color features are fixed and feature selection is applied only to the texture features, the nearly flat trend of the top-level classifier (Fig. 5 top) confirms that color features are more important in discriminating AK/BCC/SCC from ML/SK, and adding more texture features does
Table 3 Accuracy of the three subclassifiers over the three training sets, validation sets and test sets
(a) Stop when validation accuracy decreases once
                   Top level        Group1          Group2
Training set       94.06 ± 1.37     71.63 ± 6.20    87.84 ± 1.01
Validation set     91.88 ± 0.81     69.16 ± 0.36    85.99 ± 0.39
Test set           91.56 ± 1.06     71.49 ± 1.91    86.01 ± 0.33
# Features         5, 5, 3          6, 9, 3         3, 3, 14
not improve its performance by more than 2 %. On the other hand, the trend of the AK/BCC/SCC and ML/SK subclassifiers (Fig. 5 bottom) indicates the usefulness of texture features at this level of the hierarchy.
Referring again to the plots shown in Fig. 5, we can see that it is reasonable to stop after adding 10 texture features to the color features, as the accuracy on the test set no longer improves significantly.
A slight overfitting problem evident in some plots prompted us to run experiments using the three sets as training, validation and test sets respectively. The results (mean ± standard deviation of the accuracy over the three subsets) are reported in Table 3. We stopped selecting additional features when the accuracy on the validation set decreased (once for the top table, twice for the bottom one).
We did not notice any significant improvement. This is probably due to the smaller training set, which further reduced the size of the smallest class. The number of features selected for each of the three subsets is given in the last row of the tables. The high variation suggests that the number of selected features is not a crucial choice. Indeed, the three subsets were created by randomly splitting the data so that the 5 lesion classes were equally represented in each subset.
The SFS feature selection algorithm is known not to be optimal; however, in our case the use of a sequential forward-backward greedy algorithm (see Table 4) did not show any significant improvement. Once again we note a high variation in the number of selected features for the three subsets.
Table 5 Comparison of the overall percentage accuracy of the hierarchical and flat classifiers over
the three training sets and test sets
Flat KNN HKNN Flat Bayes Hierarc. Bayes
Training set 77.6 ± 1.4 83.4 ± 1.4 74.3 ± 2.2 81.9 ± 1.5
Test set 69.8 ± 1.6 74.3 ± 2.5 67.7 ± 2.3 69.6 ± 0.4
In Table 5 we compare our results with those obtained using a non-hierarchical approach, i.e. a flat K-NN classifier and a Bayes classifier that use a single set of features for all 5 classes. The flat classifiers were trained using features selected with the same SFS algorithm. Results of a hierarchical Bayes classifier, with the same hierarchy as the HKNN classifier and whose subclassifiers were trained using the same features and the same SFS algorithm, are also reported in the table. We see that the use of the hierarchy gives an improvement on both the training and test sets.
6 Overall Results
The final results are reported in Table 6. The final accuracy of the top classifier and of the two subclassifiers at the bottom level is also reported here. The values are the mean ± standard deviation over the three training and test sets. These results are obtained using the best combination of K determined in Sect. 5.4.1, the RGB color features and 10 texture features for each subclassifier. We decided to use a fixed number of features because the train-validation-test scheme did not enhance performance (see the considerations in Sect. 5.4.4 about set sizes and numbers of features). Similarly, the results from the different configurations and numbers of features are all at about the same level, given the estimated standard deviations, which suggests that there is little risk of overtraining.
Recall that the top-level classifier discriminates between cancer and pre-malignant conditions (AK/BCC/SCC) and benign forms of skin lesions (ML/SK). Therefore, its very high accuracy (above 93 %) indicates the good performance of our system in identifying cancer and potentially at-risk conditions. Analysis of the wrongly classified
Table 6 Accuracy of the three subclassifiers and combined classifier over the three training sets
and test sets. Note that the Group1/2 results are only over the lesions correctly classified at the top
level. On the other hand, the full classifier results report accuracy based on both levels
Top level Group1 Group2 Full classifier
Training set 95.7 ± 0.6 81.9 ± 3.6 91.9 ± 0.5 83.4 ± 1.4
Test set 93.9 ± 0.7 72.6 ± 2.4 86.2 ± 0.6 74.3 ± 2.5
images at the top level pointed out that these were the lesions on which clinical
diagnosis of experienced dermatologists was most uncertain.
The overall classification accuracy on the test set is 74.3 ± 2.5 %, as shown in the right column of Table 6. This overall result also accounts for the ∼6 % of samples misclassified at the first level.
The overall performance (74 %) is not yet at the 90 % level (achieved after 20+ years of research) for the differential diagnosis of moles versus melanoma; however, our method addresses lesion classes that appear to have no previous automated image analysis (outside research from our own group [2–4, 8, 9, 36, 39, 65]) and, as highlighted previously, our algorithm’s performance is above the diagnostic accuracy currently achieved by non-specialists.
Table 7 shows the confusion matrix of the whole HKNN system on the test images. This matrix was obtained by summing the three confusion matrices from the three test sets, as they are disjoint. We note a good percentage of correctly classified BCC, ML and SK. The number of correctly classified AK and SCC looks quite low at first glance; this is due to the small number of images in each of these two classes. However, most of the AKs are misclassified as BCC, and AK is a pre-malignant lesion; likewise, many SCCs are classified as BCC, which is another kind of cancer. The consequences of these mistakes are therefore not as serious as if the lesions had been diagnosed as benign. An additional split in the hierarchy may improve results.
7 Conclusions
We have presented a hierarchical approach to the classification of non-melanoma skin lesions from color images. Our approach uses a hierarchical combination of three classifiers, with feature selection used to tailor the feature set of each classifier to its task. The hierarchical K-NN structure improves the performance of the system over a flat K-NN and a Bayes classifier.
As the accuracy is above 70 %, this system could in the future be used as a diagnostic aid for skin lesion images, particularly as the cancerous vs non-cancerous accuracy is ∼94 %.
These results were produced by optimizing classification accuracy. For medical use, future research should include the cost of decisions in the optimization process.
Further studies will include the extraction of other texture-related features, the evaluation of other feature selection methods, and the use of a weighted K-NN model in which neighbor images are weighted according to their distance to the test image. In the future, it would also be interesting to extend the hierarchical approach to more than two levels, including self-learned hierarchies.
Acknowledgement We thank the Wellcome Trust for funding this project (Grant No: 083928/Z/
07/Z).
Appendix
List of texture features selected for each level of the final tree. (See Table 8.)
Table 8 (Continued)
(b) Group 1 (AK,BCC,SCC)
Texture feature    Inter-pixel distance    Quant. levels    Color channels    Site
Variance 30 64 HS Lesion/skin
Energy 25 256 ab Lesion
Inv. Diff. Moment 25 64 BB Lesion
Entropy 15 128 HH Lesion
Max Probability 5 256 RB Lesion/skin
Cluster Prominence 10 64 SS Lesion-skin
Contrast 5 64 SS Lesion
Homogeneity 15 256 RG Lesion
Homogeneity 25 128 SV Lesion/skin
Inv. Diff. Moment 5 64 HS Skin
References
1. Alcón JF, Heinrich A, Uzunbajakava N, Krekels G, Siem D, de Haan G (2009) Automatic imaging system with decision support for inspection of pigmented skin lesions and melanoma diagnosis. IEEE J Sel Top Signal Process 3:14–25
2. Aldridge RB, Glodzik D, Ballerini L, Fisher RB, Rees JL (2011) The utility of non-rule-based
visual matching as a strategy to allow novices to achieve skin lesion diagnosis. Acta Derm-
Venereol 91:279–283
3. Aldridge RB, Li X, Ballerini L, Fisher RB, Rees JL (2010) Teaching dermatology using 3-
dimensional virtual reality. Archives of Dermatology 149(10)
4. Aldridge RB, Zanotto M, Ballerini L, Fisher RB, Rees JL (2011) Novice identification of
melanoma: not quite as straightforward as the ABCDs. Acta Derm-Venereol 91:125–130
5. Armengol E (2011) Classification of melanomas in situ using knowledge discovery with ex-
plained case-based reasoning. Artif Intell Med 51:93–105
6. Arvis V, Debain C, Berducat M, Benassi A (2004) Generalization of the cooccurence matrix
for colour images: application to colour texture classification. Image Anal Stereol 23(1):63–
72
7. Aslandogan Y, Mahajani G (2004) Evidence combination in medical data mining. In: Proceed-
ings of international conference on information technology: coding and computing, vol 2, pp
465–469
8. Ballerini L, Li X, Fisher RB, Aldridge B, Rees J (2010) Content-based image retrieval of skin
lesions by evolutionary feature synthesis. In: di Chio C, et al (eds) Application of evolutionary
computation, Istanbul, Turkey. Lectures notes in computer science, vol 6024, pp 312–319
9. Ballerini L, Li X, Fisher RB, Rees J (2010) A query-by-example content-based image retrieval
system of non-melanoma skin lesions. In: Caputo B (ed) Proceedings MICCAI-09 workshop
MCBR-CDS 2009: medical content-based retrieval for clinical decision support. LNCS, vol
5853. Springer, Berlin, pp 31–38
10. Basarab T, Munn S, Jones RR (1996) Diagnostic accuracy and appropriateness of general
practitioner referrals to a dermatology out-patient clinic. Br J Dermatol 135(1):70–73
11. Cascinelli N, Ferrario M, Tonelli T, Leo E (1987) A possible new tool for clinical diagnosis
of melanoma: the computer. J Am Acad Dermatol 16(2):361–367
12. Cavalcanti PG, Scharcanski J (2011) Automated prescreening of pigmented skin lesions using
standard cameras. Comput Med Imaging Graph 35(6):481–491
13. Ceci M, Malerba D (2003) Hierarchical classification of HTML documents with WebClassII.
In: Proceedings of the 25th European conference on information retrieval, pp 57–72
14. Celebi ME, Iyatomi H, Schaefer G, Stoecker WV (2009) Lesion border detection in der-
moscopy images. Comput Med Imaging Graph 33(2):148–153
15. Celebi ME, Kingravi HA, Uddin B, Iyatomi H, Aslandogan YA, Stoecker WV, Moss RH
(2007) A methodological approach to the classification of dermoscopy images. Comput Med
Imaging Graph 31(6):362–373
16. Celebi ME, Stoecker WV, Moss RH (2011) Advances in skin cancer image analysis. Comput
Med Imaging Graph 35(2):83–84
17. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory
13(1):21–27
18. Cancer Research UK (CRUK) (2011) CancerStats. URL http://info.cancerresearchuk.org/cancerstats. Accessed 03/08/2011
19. Dalal A, Moss RH, Stanley RJ, Stoecker WV, Gupta K, Calcara DA, Xu J, Shrestha B, Drugge
R, Malters JM, Perry LA (2011) Concentric decile segmentation of white and hypopigmented
areas in dermoscopy images of skin lesions allows discrimination of malignant melanoma.
Comput Med Imaging Graph 35(2):148–154
20. D’Alessio S, Murray K, Schiaffino R, Kershenbaum A (2000) The effect of using hierarchical
classifiers in text categorization. In: Proceedings of 6th international conference recherche
d’information assistee par ordinateur, pp 302–313
21. Day GR, Barbour RH (2000) Automated melanoma diagnosis: where are we at? Skin Res
Technol 6:1–5
22. Dimitrovski I, Kocev D, Loskovska S, Dzeroski S (2011) Hierarchical annotation of medical
images. Pattern Recognit 44(10–11):2436–2449
23. Dumais S, Chen H (2000) Hierarchical classification of web content. In: Proceedings of the
23rd annual international ACM SIGIR conference on research and development in information
retrieval. ACM, New York, pp 256–263
24. Duwairi R, Al-Zubaidi R (2011) A hierarchical K-NN classifier for textual data. Int Arab J Inf
Technol 8(3):251–259
25. Fix E, Hodges JL (1989) Discriminatory analysis. Nonparametric discrimination: consistency
properties. Int Stat Rev 57(3):238–247
26. Garnavi R, Aldeen M, Celebi ME, Varigos G, Finch S (2011) Border detection in dermoscopy
images using hybrid thresholding on optimized color channels. Comput Med Imaging Graph
35(2):105–115
27. Gerbert B, Maurer T, Berger T, Pantilat S, McPhee SJ, Wolff M, Bronstone A, Caspers N
(1996) Primary care physicians as gatekeepers in managed care: primary care physicians’ and
dermatologists’ skills at secondary prevention of skin cancer. Arch Dermatol 132(9):1030–
1038