
Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xian, 15-17 July, 2012

DERMATOLOGICAL DISEASE DIAGNOSIS USING COLOR-SKIN IMAGES

M. SHAMSUL ARIFIN1, M. GOLAM KIBRIA2, ADNAN FIROZE1, M. ASHRAFUL AMIN1, HONG YAN3

1 Computer Vision and Cybernetics Group, SECS, Independent University, Bangladesh
2 Department of Computer Science & Engineering, University of Information Technology & Sciences
3 Department of Electronic Engineering, City University of Hong Kong, Hong Kong

E-MAIL: arifin_cham@yahoo.com, golam.kibria@uits.edu.bd, adnan.firoze@gmail.com, aminmdashraful@ieee.org, h.yan@cityu.edu.hk

Abstract:
This paper presents an automated dermatological diagnostic system. Dermatology is the medical discipline concerned with the analysis and treatment of skin anomalies. The presented system introduces machine intervention into dermatological diagnosis, which has conventionally relied on the judgment of medical personnel. The system works in two dependent steps: the first detects skin anomalies and the second identifies the diseases. It operates on visual input, i.e. high-resolution color images, together with patient history. For detection, the system uses color image processing, k-means clustering and color gradient techniques to identify the diseased skin. For disease classification, it uses feedforward backpropagation artificial neural networks. The system exhibits a diseased skin detection accuracy of 95.99% and a disease identification accuracy of 94.016% when tested on a total of 2055 diseased areas in 704 skin images covering 6 diseases.

Keywords:
Skin anomalies, Color Gradient, Clustering, GLCM, Dermatology.

978-1-4673-1487-9/12/$31.00 ©2012 IEEE

1. Introduction

Dermatology is the branch of medical science concerned with the diagnosis and treatment of skin disorders. The spectrum of dermatological disorders varies geographically and seasonally due to temperature, humidity and other environmental factors. Human skin is one of the most unpredictable and difficult terrains to synthesize and analyze automatically, owing to its jaggedness, variation in tone, the presence of hair and other confounding features. Even though several studies have detected and modeled human skin using computer vision techniques, very few have concentrated on the medical side of the problem, and expert diagnostic systems for dermatological disorders are hard to find in the scholarly literature on intelligent and expert systems. Our system therefore addresses these complexities within dermatology.

Even though a substantial amount of work has been done at the intersection of medicine and computer science, little research has connected computer vision and dermatology, and to our knowledge none has taken place in the South Asian subcontinent.

Techniques for segmenting dermatoscopic images using Stabilized Inverse Diffusion Equations (SIDE) were introduced by Gao et al. [1]. They focused on segmenting lesions from regular skin surfaces. In their experiment they applied 6 segmentation procedures to 87 images of nevi and melanomas. Their accuracies using the algorithms MMRF, MMRF/SIDE, MSIDE, median cut and adaptive thresholding in the blue channel were 75.29%, 71.76%, 63.5%, 54.77% and 51.5% respectively. Multispectral image processing techniques for dermatology were proposed by Jalil [2], who described several approaches to segmenting and labeling skin anomalies using transformations in the frequency domain and reported an average accuracy of 86.4% in detecting skin abnormalities. Two studies close to the work presented in this paper were conducted by Antkowiak [3, 5]. His dataset consisted of 215 dermatological images. He used the Fourier transforms of the images directly as features for Artificial Neural Networks (ANN) and Support Vector Machines (SVM), and compared the classification performance of the two. His disease-wise accuracies (SVM, ANN) were: Acne Vulgaris (45.1%, 96.1%), Atopic Dermatitis (42.1%, 93.9%), Granuloma Annulare (42.9%, 94.0%), Keloid (55.0%, 95.4%), Melanocytic Nevus (64.1%, 97.4%), Melanoma Maligna (48.4%, 96.8%) and Nevus Pilosus (66.1%, 97.4%). ANN thus showed significantly better classification performance. Similar ANN-based approaches were described by Agatonovic-Kustrin and Beresford [4] and by Bishop [6]. Varied imaging datasets were introduced by du Vivier [7] and eCureMe [8]. Gniadecka et al. [9] and Delgado Gomez et al. [10] introduced specialized neural techniques for the detection of melanoma and psoriasis, respectively. In an extensive study, Handels et al. [11] introduced techniques to extract features from images of skin tumors. Hoffmann et al. [12] presented an exemplary study on the diagnosis of skin cancer using neural approaches.

2. The Dermatological Disease Diagnosis System

The complete methodology of our system is represented as a flowchart in Fig. 1. The individual steps are modular; most are autonomous, while some depend on each other.

[Fig. 1 flowchart stages: Data Collection and Image Acquisition → Color Gradient Generation → Clustering/Thresholding and Detecting of the Regions of Interest (ROI) → Feature Extraction → System Training, Testing and Validation]
Fig. 1. A flowchart of the methodology of our system

2.1. Skin Image Acquisition and Data Collection

The first and primary task was to collect the patient data needed to develop the system. In addition to the images, non-visual data from patient histories had to be collected. A specialist doctor was therefore present to validate and record external data about the patients, such as disease history, sensations in the diseased part of the body and elevation of the diseased region. Some characteristic details of the data collection are given below.
• Camera Build: Panasonic Lumix FZ-35
• Focusing: The camera was focused manually to adjust to the variable nature of natural light
• Image Resolution: 12 Megapixels (4000 x 3000 pixels)
• Shutter Speed: 1/20 to 1/125 seconds (based on natural lighting)
• Reference object: a 50 paisa coin, a standard Bangladeshi coin 2.2 centimeters in diameter
• Source Institution: Sir Salimullah Medical College and Mitford Hospital, Dhaka, Bangladesh
• Department: Department of Dermatology
• Patients Participated: 112; details of the collected data are provided in Table I.

TABLE I: DETAILED SAMPLE INFORMATION

Disease Name      Patients   Number of Pictures Taken
Acne              19         107
Eczema            18         102
Psoriasis         18         105
Tinea Corporis    14         107
Scabies           26         182
Vitiligo          26         101

2.2. Image Pre-processing and ROI Detection

In image pre-processing, our task is to segment the diseased skin from the healthy skin and to prepare the data for the classification stage.

2.2.1 Cropping the Images

The raw pictures contained objects, clothing and human accessories. We removed these manually by cropping, so that the refined dataset contains only "skin" (both healthy and diseased).

2.2.2 Color Gradient Generation

We used a modified Sobel operator that works on the color image instead of a gray-level image. Let c(x, y) denote the color image as a function of the coordinates x and y, and let g_x and g_y be the linear horizontal and vertical gradients computed in each of the R, G and B channels. These gradients are combined across the channels as

$$ g_{xx} = \left|\frac{\partial R}{\partial x}\right|^2 + \left|\frac{\partial G}{\partial x}\right|^2 + \left|\frac{\partial B}{\partial x}\right|^2, \quad g_{yy} = \left|\frac{\partial R}{\partial y}\right|^2 + \left|\frac{\partial G}{\partial y}\right|^2 + \left|\frac{\partial B}{\partial y}\right|^2, \quad g_{xy} = \frac{\partial R}{\partial x}\frac{\partial R}{\partial y} + \frac{\partial G}{\partial x}\frac{\partial G}{\partial y} + \frac{\partial B}{\partial x}\frac{\partial B}{\partial y} $$

Using this notation, it can be shown that the direction of the maximum rate of change of c(x, y) is given by the angle

$$ \theta(x,y) = \frac{1}{2}\tan^{-1}\left[\frac{2g_{xy}}{g_{xx}-g_{yy}}\right] \qquad (1) $$

The value of the rate of change (i.e. the magnitude of the gradient) in the directions given by the elements of θ(x, y) is given by

$$ F_\theta(x,y) = \left\{\frac{1}{2}\left[(g_{xx}+g_{yy}) + (g_{xx}-g_{yy})\cos 2\theta(x,y) + 2g_{xy}\sin 2\theta(x,y)\right]\right\}^{1/2} \qquad (2) $$

2.2.3 Labeling the Regions of Interest (ROI) on the Image

After color gradient generation, a threshold was applied and k-means clustering was performed on the color gradient. Morphological closing was then performed on the clusters to obtain a binary mask, and by applying the mask we segmented the diseased part from the healthy skin. All the consecutive steps are illustrated in Fig. 2.
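The color gradient of Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-channel derivatives use the standard 3 x 3 Sobel kernels with replicated borders, θ(x, y) is evaluated with `arctan2` (which avoids division by zero when g_xx = g_yy and selects the direction that maximizes Eq. (2)), and the magnitude is scaled to [0, 1] so that a threshold such as the 0.8 reported in Section 3.1 can be applied. The function names `sobel_xy` and `color_gradient` are hypothetical.

```python
import numpy as np


def sobel_xy(channel):
    """Horizontal (gx) and vertical (gy) Sobel derivatives of one channel, borders replicated."""
    p = np.pad(channel, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy


def color_gradient(img):
    """Color gradient magnitude F_theta(x, y) of Eqs. (1)-(2), scaled to [0, 1] (assumption)."""
    gxx = gyy = gxy = 0.0
    for k in range(img.shape[2]):  # accumulate over the R, G and B channels
        dx, dy = sobel_xy(img[:, :, k].astype(float))
        gxx = gxx + dx * dx
        gyy = gyy + dy * dy
        gxy = gxy + dx * dy
    # Eq. (1); arctan2 avoids dividing by zero when gxx == gyy
    theta = 0.5 * np.arctan2(2 * gxy, gxx - gyy)
    # Eq. (2); clip tiny negative round-off before the square root
    f2 = 0.5 * ((gxx + gyy) + (gxx - gyy) * np.cos(2 * theta) + 2 * gxy * np.sin(2 * theta))
    f = np.sqrt(np.maximum(f2, 0.0))
    return f / f.max() if f.max() > 0 else f
```

For a vertical step edge between two flat colors, the scaled magnitude is 1 on the two columns bordering the edge and 0 in the flat regions; rotating the image by transposition gives the transposed map, which reflects the rotation-independent nature of Eq. (2).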

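The labeling step of Section 2.2.3 can be sketched as follows, starting from a color-gradient map scaled to [0, 1]. The paper does not specify exactly how the threshold and k-means interact, so every choice here is an assumption: the map is first smoothed with a 5 x 5 averaging filter (as in Fig. 2(c)), pixels at or above the threshold (0.8 by default, following Section 3.1) are kept, their coordinates are grouped by plain k-means into k clusters where k is the user-supplied number of ROIs (our reading of the semi-supervised variant), and each cluster is closed with a square structuring element to form its binary mask. The names `mean_filter`, `closing`, `kmeans` and `label_rois` are hypothetical.

```python
import numpy as np


def mean_filter(a, size=5):
    """Averaging filter with replicated borders (suppresses thin structures such as hair)."""
    r = size // 2
    p = np.pad(a, r, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(size) for j in range(size)) / size**2


def _window_reduce(mask, r, op, init):
    """Combine all shifts of `mask` within a (2r+1)x(2r+1) window; outside pixels count as False."""
    p = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.full((h, w), init)
    for i in range(2 * r + 1):
        for j in range(2 * r + 1):
            out = op(out, p[i:i + h, j:j + w])
    return out


def closing(mask, r=1):
    """Binary morphological closing: dilation followed by erosion with a square element."""
    dilated = _window_reduce(mask, r, np.logical_or, False)
    return _window_reduce(dilated, r, np.logical_and, True)


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on an (n, d) array; returns per-point labels and centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the previous center if a cluster empties
                centers[c] = points[labels == c].mean(axis=0)
    return labels, centers


def label_rois(grad, k, threshold=0.8, smooth=5, close_r=2):
    """Label map from a color-gradient map: 0 = healthy skin, 1..k = candidate ROIs."""
    g = mean_filter(grad.astype(float), smooth)
    if g.max() > 0:
        g = g / g.max()
    pts = np.argwhere(g >= threshold)  # coordinates of strong-gradient pixels
    out = np.zeros(g.shape, dtype=int)
    if len(pts) == 0:
        return out
    k = min(k, len(pts))
    labels, _ = kmeans(pts, k)
    for c in range(k):
        m = np.zeros(g.shape, dtype=bool)
        m[tuple(pts[labels == c].T)] = True
        out[closing(m, close_r) & (out == 0)] = c + 1  # earlier clusters win overlaps
    return out
```

Clustering pixel coordinates rather than gradient values is what lets a user-supplied k split one image into several separately labeled ROIs, matching the per-ROI samples described in Section 3.2.1; a different reading of the paper would cluster the gradient values instead.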

Fig. 2. (a) Original image (b) Color gradient of the original image (c) Averaging filter applied over (b) to ignore hair or scars (d) Binary mask obtained by thresholding and clustering the image in (c) (e) Labeled region of interest (f) Isolated ROI (region of interest, i.e. diseased skin)

2.3. Feature extraction

The features are of two types: automated visual features (from the images) and external features (from the patient history forms). The two types were extracted by different procedures.

2.3.1 Automated Visual feature extraction:

The features extracted from the images of the ROI and the healthy skin are the following:
• Mean and standard deviation of each of the 3 color channels (R, G and B) of the ROI
• Mean and standard deviation of each of the 3 color channels (R, G and B) of the healthy skin
• Distribution (scattering of the ROIs)
• Area (in mm²)
• Energy, entropy, contrast and homogeneity from the GLCM of each color channel [21]

At this point, we have separated the healthy skin and the individual ROIs into structures that contain the images of the RGB components. From the RGB components, we computed the mean and standard deviation of each channel individually. Energy, entropy, contrast and homogeneity were computed from the regions by computing the gray-level co-occurrence matrix (GLCM) of each channel, as proposed by Shahbahrami et al. [21]. The separation of the ROI and the healthy skin was done using masks, as shown in Fig. 3, which illustrates how the healthy and diseased skin were separated before their colors were extracted.

Fig. 3. (a) Original image with the ROIs labeled (b) Image with the ROIs taken apart (c) Original image with no healthy skin present

However, to obtain a clearer idea of the mean colors present inside the region of interest, we used overlapping concentric circles inside the ROI. Circles were used because they provide a rotation-invariant measure, and they were overlapped to attain translation invariance. Five concentric circles were used to extract the color features, as illustrated in Fig. 4.

Fig. 4. Mean color extracted using concentric windows inside the ROI (the leftmost arrow represents the mean color of healthy skin)

To compute energy, entropy, contrast and homogeneity, we first computed the GLCM of each color channel in each window. Following Shahbahrami et al. [21], the G x G GLCM P_d for a displacement vector d = (dx, dy) is the matrix whose entry (i, j) is the number of occurrences of the pair of gray levels i and j that are a distance d apart. The equations for obtaining the 4 features from a GLCM can be found in Shahbahrami et al. [21].

The distribution describes how far the diseased parts are spread out from each other. It was computed by calculating the Euclidean distances from one connected component to another and taking their mean. It can be expressed by the following equation, where every i is an ROI in a single image, dist_ij is the distance from ROI i to ROI j and m is the total number of ROIs in the image. This feature captures the "spread" of the ROIs in an image.


$$ \text{Distribution} = \left[\frac{1}{m}\sum_{i=1}^{m} dist_{ij}\right] \times \frac{Y\ \text{mm}}{X\ \text{pixels}} \qquad (3) $$

where the factor (Y mm)/(X pixels) converts distances measured in pixels to millimeters.

The area or size of an ROI is very important for classifying the disease. It was obtained by first cropping the labeled ROIs using the masks generated to detect the regions of interest. We then converted the area in pixels to square millimeters using the scale obtained from the reference object present in the images (a 50 paisa coin, 2.2 cm in diameter).

2.3.2 External features extraction

The features collected from the patient history are the following: diseased body part, elevation, and frequency of occurrence (in months). The diseased body parts were quantified and added manually to the sample descriptions. Since stereo imaging for capturing 3D images was not feasible in a government hospital, the elevation of the ROIs was taken from the patient history forms, with the assistance of a dermatologist. Elevation refers to the depth or height of the diseased lesion or region of interest (ROI); since it is a length, it was measured in millimeters.

2.4. System Training and Testing

We used a feed-forward back-propagation neural network for this step. We validated and tested our system using tenfold cross-validation. The advantage of cross-validation is that, within each fold, the test data do not overlap with the training data, which makes the testing results reliable.

3. Experimental results and discussion

The optimal color-gradient threshold and the number of neurons of the feed-forward back-propagation ANN were selected by testing different configurations. We selected the semi-supervised system because it gives a better response than the unsupervised system; both are discussed in the next subsection.

3.1. ROC Curves for Optimal Color Gradient Threshold

The ROC curves in Fig. 5 illustrate the selection of the best threshold. Since the images were cropped to include only skin, True Negative = 100% in all cases. It should be noted that in the ROC curves:
• True Positives (TP) = correctly detected regions of interest (diseased skin)
• False Positives (FP) = incorrectly detected regions of interest (healthy skin detected as ROI)
• True Negatives (TN) = correctly detected healthy skin (since the images are cropped to contain skin only, this value is constant at 100% at all times for our system)
• False Negatives (FN) = incorrectly detected healthy skin, i.e. diseased skin left in the healthy-skin region (a measure of how much of the diseased parts the system was unable to detect)

From these results (Fig. 5), the best ROI detection accuracy is obtained at a color-gradient threshold of 0.8, with accuracies of 86.04% for unsupervised clustering (Fig. 5, top) and 92.25% for semi-supervised clustering, in which the user inputs the number of ROIs (Fig. 5, bottom). We therefore chose semi-supervised clustering for further validation, since it exhibited an improved detection rate over the unsupervised system.

[Fig. 5 plots: "ROC Curves varying over Color Gradient Threshold (unsupervised system)" (top) and "ROC Curves varying over Color Gradient Threshold (semi-supervised system)" (bottom); x-axis: Color Gradient threshold, 0.10 to 1.00; y-axis: percentage, 0 to 100; series: True Positive %, False Positive %, True Negative %, False Negative %, Accuracy %]
Fig. 5. ROC curves of ROI detection for varying color-gradient threshold using unsupervised clustering (top) and semi-supervised clustering (bottom), for sample size 2055.

3.2. Classification Performance

After creating the training matrix of dimensions 103 x 2055 (2055 samples with 103 features each) and the target matrix of dimensions 6 x 2055 (2055 samples from 6 diseases), we performed tenfold cross-validation. In each fold we used 10% of the samples for testing and 90% for training, so that over the 10 folds we obtain comprehensive accuracy results. We tested our system with 70 to 150 neurons in a single hidden layer. The plot in Fig. 6 shows that the highest classification accuracy, 94.667%, was obtained with 98 neurons in the hidden layer.

[Fig. 6 plot: "Neuron Numbers in Hidden Layer vs. Accuracy (Sample Size: 2055)"; x-axis: Neurons in Hidden Layer; y-axis: classification accuracy (%); marked point: 98 Neurons]
Fig. 6. Neurons in hidden layer vs. tenfold cross-validated average classification accuracy in percentage (98 neurons gives the highest classification accuracy)

3.2.1 Disease-wise classification performance using tenfold cross-validation:

Our 2055 samples cover 6 diseases. Note that each sample is an ROI rather than an image, which makes the system more versatile: an image containing multiple diseases can also be classified successfully. The disease-wise accuracy and data information are presented in Table II, using 98 neurons in the hidden layer of our Artificial Neural Network.

TABLE II: OVERALL CLASSIFICATION PERFORMANCE GENERATION

Disease          Sample Size   Accuracy   Accuracy x Sample Size
Acne             405           96.66%     391.473
Eczema           320           88.00%     281.6
Psoriasis        288           89.33%     257.2704
Tinea Corporis   280           88.33%     247.324
Scabies          507           98.67%     500.2569
Vitiligo         255           99.67%     254.1585
Total            2055                     1932.083

Overall (sample-weighted) accuracy = (1932/2055) x 100% = 94.0146%

3.2.2 Disease Classification Accuracy Evaluation:

We evaluated the system's overall performance by weighting each disease's accuracy by its sample size. Based on Table II, we conclude that our system achieves a classification accuracy of 94.0146% when trained on 2055 samples of 6 diseases with an ANN that has one hidden layer of 98 neurons.

4. Conclusions

In this paper we presented a robust and automated method for the diagnosis of dermatological diseases. This was a challenging task, since human skin is one of the most difficult surfaces or terrains to analyze. The work is unique and novel in that our dataset was not acquired from secondary sources; it was the result of months of work in an actual hospital in Bangladesh, making it a new standard dataset from the perspective of Bangladesh. We should point out that the system is not intended to replace doctors, because no machine can yet replace human analysis and intuition. Nevertheless, we consider our system a significant leap for machine intervention in medicine in Bangladesh. The system exhibits a diseased skin detection accuracy of 95.99% and a disease identification accuracy of 94.016% when tested on a total of 2055 diseased areas in 704 skin images covering 6 diseases.

Acknowledgement

This work is supported by a grant from Citygroup Bangladesh (www.citygroup.com.bd).

References

[1] Jianbo Gao, Jun Zhang, Matthew G. Fleming, Ilya Pollak, Armand B. Cognetta. Segmentation of Dermatoscopic Images by Stabilized Inverse Diffusion Equations. Computerized Medical Imaging and Graphics, 22(5):375-389, 1998.
[2] Bushra Jalil. Multispectral Image Processing Applied to Dermatology. Thesis submitted for the degree of MSc, Erasmus Mundus in Vision and Robotics (VIBOT), Universite de Bourgogne, 2008.
[3] Michal Antkowiak. Artificial Neural Networks vs. Support Vector Machines for Skin Diseases Recognition. Master's Thesis in Computing Science, Umea University, Sweden, 2006.


[4] S. Agatonovic-Kustrin and R. Beresford. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. Journal of Pharmaceutical and Biomedical Analysis, 22:717-727, 2000.
[5] Michal Antkowiak. Recognition of skin diseases using artificial neural networks. In Proceedings of USCCS'05, pages 313-325, 2005.
[6] Chris M. Bishop. Neural networks and their applications. Review of Scientific Instruments, 65(6):1803-1832, 1994.
[7] Anthony du Vivier. Atlas of Clinical Dermatology. Churchill Livingstone, 2002.
[8] eCureMe. Medical Dictionary. [Online]. Available: http://www.ecureme.com. Retrieved: June 16, 2011.
[9] Monika Gniadecka, Peter Alshede Philipsen, Sigurdur Sigurdsson, Sonja Wessel, et al. Melanoma diagnosis by Raman spectroscopy and neural networks: Structure alterations in proteins and lipids in intact cancer tissue. J Invest Dermatol, 122:443-449, 2004.
[10] David Delgado Gomez, Toke Koldborg Jensen, Sune Darkner, and Jens Michael Carstensen. Automated visual scoring of psoriasis. [Online]. Available: http://www.imm.dtu.dk/visiondag/VD02/medicinsk/articulo.pdf, 2002. Retrieved: June 16, 2011.
[11] Handels H., Roß T., Kreusch J., Wolff H. H., and Pöppl S. J. Feature selection for optimized skin tumor recognition using genetic algorithms. Artificial Intelligence in Medicine, 16:283-297, 1999.
[12] K. Hoffmann, T. Gambichler, A. Rick, M. Kreutz, et al. Diagnostic and neural analysis of skin cancer (DANAOS). A multicentre study for collection and computer-aided analysis of data from pigmented skin lesions using digital dermoscopy. British Journal of Dermatology, 149:801-809, 2003.
[13] Samuel Freire da Silva. Dermatological Atlas [Online]. Available: www.atlasdermatologico.com.br/index.html. Retrieved: July 27, 2011.
[14] NIH: National Institute of Health (2011). Medline Plus - Scabies [Online]. Available: http://www.nlm.nih.gov/medlineplus/scabies.html. Retrieved: July 27, 2011.
[15] NIH: National Institute of Health (2011). Medline Plus - Eczema [Online]. Available: http://www.nlm.nih.gov/medlineplus/eczema.html. Retrieved: July 27, 2011.
[16] NIH: National Institute of Health (2011). Medline Plus - Psoriasis [Online]. Available: http://www.nlm.nih.gov/medlineplus/psoriasis.html. Retrieved: July 27, 2011.
[17] NIH: National Institute of Health (2011). Medline Plus - Vitiligo [Online]. Available: http://www.nlm.nih.gov/medlineplus/vitiligo.html. Retrieved: July 27, 2011.
[18] NIH: National Institute of Health (2011). Medline Plus - Tinea Corporis [Online]. Available: http://www.nlm.nih.gov/medlineplus/tineacorporis.html. Retrieved: July 27, 2011.
[19] NIH: National Institute of Health (2011). Medline Plus - Scabies [Online]. Available: http://www.nlm.nih.gov/medlineplus/scabies.html. Retrieved: July 27, 2011.
[20] Rafael Gonzalez, Richard Woods. Color Image Processing. Digital Image Processing with MATLAB, 2nd Edition, pp. 204-207. Prentice Hall Publications, USA, 2003.
[21] Shahbahrami A., Borodin D., Juurlink B. Comparison Between Color and Texture Features for Image Retrieval. In Proceedings of ACM Multimedia Conference, Boston, MA, pp. 361-371, 2008.
[22] Health Policy of Bangladesh (2011). Ministry of Health and Family Welfare - Government of People's Republic of Bangladesh [Online]. Available: http://nasmis.dghs.gov.bd/mohfw/index.php?option=com_content&task=view&id=388&Itemid=483. Retrieved: Dec 11, 2011.
