
Journal of Computing Theories and Applications

Research Article
Texture Analysis and Classification of Benign and Malignant
Skin Disease using KNN Algorithm
Mamet Adil Araaf1,*

1 Faculty of Computer Science, Dian Nuswantoro University, Semarang, Central Java 50131, Indonesia; e-mail :
111201710656@mhs.dinus.ac.id
* Corresponding Author : Mamet Adil Araaf

Abstract: The skin is the largest human organ and serves as the outermost protector of the organs beneath it. As a result, it is frequently affected by disease, yet many people still ignore skin conditions. Skin diseases fall into two categories: benign and malignant, the latter being potentially cancerous. Distinguishing the two with the naked eye is very difficult because the lesion surface is complex yet appears smooth. Image classification is therefore used to help distinguish benign from malignant skin diseases. This study uses Gray Level Co-occurrence Matrix (GLCM) feature extraction and the K-Nearest Neighbor (KNN) classification algorithm. Four dataset sizes were used, 3297, 1649, 825, and 210 images, each containing benign and malignant images. The data were tested with K = 1, 3, 5, 7, and 9 and GLCM angles of 0°, 45°, 90°, and 135°. The highest accuracy was 79.24% on the 3297-image dataset, 79.39% on the 1649-image dataset, 83.63% on the 825-image dataset, and 100% on the 210-image dataset (200 training and 10 test images).

Keywords: Skin Disease; Classification image of skin disease; Gray Level Co-occurrence Matrix
(GLCM); K-Nearest Neighbor (KNN)

1. Introduction
The skin is the largest organ of the human body, with a mass of approximately 4 kg to 5 kg and a surface area of about 1.2 m² to 2.2 m². It serves many functions, including protecting the body, regulating body temperature, and providing the sense of touch [1]. Skin disorders are common and arise from factors such as climate, place of residence, environment, unhealthy living habits, and allergies. Skin diseases are classified into two main categories: benign and malignant [2].
Copyright: © 2023 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In Indonesia, skin cancer (malignant skin tumors) ranks third after uterine and breast cancer and accounts for 5.9% to 7.8% of all cancers per year. The most common types of skin cancer in Indonesia are basal cell carcinoma (65.5%), squamous cell carcinoma (23%), and melanoma (7.9%), followed by other types [3]. Malignant skin diseases can be detected with a series of laboratory tests, but this approach is considered too time consuming [4]. Computer-assisted diagnosis that identifies and classifies skin diseases from images is therefore needed to speed up the process. Related work on skin disease classification includes a study that used deep learning Convolutional Neural Networks (CNN) with two proposed models, MobileNet v1 and Inception V3. The data were divided into training and test images, and preprocessing was applied to the training and validation data. Training data were then added to balance the number of images across all disease classes. The data were processed before entering the training stage, which comprised the CNN model and model evaluation. In the testing stage, the preprocessed test data were evaluated with the trained model. The dataset used in that study was MNIST HAM10000, which

Journal of Computing Theories and Applications 2023, vol 1, no 1. publikasi.dinus.ac.id/index.php/jcta/



consisted of 10,015 images of skin diseases. The resulting accuracy was 72% with Inception V3 and 58% with MobileNet [5]. Another study performed classification with an Artificial Neural Network (ANN), chosen because it is considered better at handling complex relationships between parameters and at classifying based on training data. The success of a neural network depends strongly on its architecture and on the algorithm used to train it. The model in that study had an input layer, a single hidden layer of ten neurons, and an output layer, and was trained with the Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), and Bayesian Regularization (BR) algorithms. The dataset was collected from several web sources, mostly DermNet, and contained 463 images of skin diseases; the resulting accuracy was 76.9% [6].
These related studies indicate that skin cancer can be diagnosed with computer assistance based on an image of the disease. This study therefore performs image classification to help distinguish benign from malignant skin diseases, using Gray Level Co-occurrence Matrix (GLCM) feature extraction and the K-Nearest Neighbor (KNN) classification algorithm. GLCM is reported to segment images well, to be very accurate at extracting features that represent image texture [7], and to compute quickly. KNN trains quickly, is easy to implement, and works effectively on large training sets, but it is susceptible to noise and requires choosing the value of k, which can introduce bias [8] [9].

2. Methodology
2.1. Dataset
This study uses a public dataset from https://www.kaggle.com/fanconic/skin-cancer-malignant-vs-benign. The dataset was compiled by The International Skin Imaging Collaboration (ISIC) and contains 3297 images in two classes, benign and malignant. All images have a uniform size of 224 x 224 pixels, so no cropping or resizing is needed. Figure 1 shows the number of images in each class.

Figure 1. Graph of Image Class.

2.2 Preprocessing
The images are preprocessed before classification with the K-Nearest Neighbor algorithm. The preprocessing steps are described below.

2.2.1 Grayscaling Image


Because the Gray Level Co-Occurrence Matrix (GLCM) method operates on gray levels, each color image is first converted to grayscale so that texture values can be computed for each type of tumor.

(a) (b)
Figure 2. Conversion from RGB to Grayscale.
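The grayscale step can be sketched as follows. The paper does not state which conversion formula was used, so the luma weights (0.299, 0.587, 0.114) and the function name `rgb_to_gray` are assumptions made for illustration only.

```python
def rgb_to_gray(image):
    """Convert an RGB image (rows of (r, g, b) tuples, 0-255) to gray levels 0-255.

    The weights are the common luma weights; they are an assumption,
    not necessarily the conversion used in the study.
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in image
    ]


# Tiny 2 x 2 illustrative image: red, green, blue, and white pixels.
rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = rgb_to_gray(rgb)
```

Each output pixel is a single intensity, which is the form of input the co-occurrence matrix needs.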

2.2.2 Average Filter


The grayscale image is then smoothed with an average filter to reduce noise.

(a) (b)
Figure 3. Denoising Image using Average Filter.
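A minimal sketch of an average (mean) filter is given below. The kernel size and the border handling are not specified in the paper; the 3 x 3 window and the choice to average only the neighbors that lie inside the image are assumptions.

```python
def average_filter(gray, size=3):
    """Mean filter on a 2-D list of gray levels.

    Border pixels average only the neighbors inside the image
    (an assumed border policy; the paper does not specify one).
    """
    h, w = len(gray), len(gray[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                gray[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            out[y][x] = int(round(sum(window) / len(window)))
    return out


# A single bright outlier surrounded by a flat background.
noisy = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
smooth = average_filter(noisy)
```

The outlier at the center is pulled toward its neighborhood mean, which illustrates how the filter suppresses isolated noise before texture features are computed.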

Preprocessing is intended to yield better feature extraction values. Features are extracted with the Gray Level Co-Occurrence Matrix (GLCM) at angles of 0°, 45°, 90°, and 135° for each tumor image, producing values for contrast, correlation, dissimilarity, energy, and homogeneity. These values are then passed to the classification process. Equations (1), (2), (3), (4), and (5) define each GLCM feature.

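$$\mathrm{Contrast} = \sum_{i} \sum_{j} (i - j)^2 \, p(i,j) \tag{1}$$

$$\mathrm{Correlation} = \sum_{i} \sum_{j} \frac{(i - \mu_i)(j - \mu_j)\, p(i,j)}{\sigma_i \sigma_j} \tag{2}$$

$$\mathrm{Energy} = \sum_{i} \sum_{j} p(i,j)^2 \tag{3}$$

$$\mathrm{Homogeneity} = \sum_{i} \sum_{j} \frac{p(i,j)}{1 + |i - j|} \tag{4}$$

$$\mathrm{Dissimilarity} = \sum_{i} \sum_{j} |i - j| \, p(i,j) \tag{5}$$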
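The five features in equations (1) to (5) can be sketched in plain Python as follows. The function names, the symmetric counting of pixel pairs, the fallback value for a constant image, and the 4-level example image are assumptions made for illustration; the study's own implementation is not described, and a library routine would normally be used on 256-level images.

```python
import math


def offset(angle, d=1):
    """(row step, column step) for the four GLCM angles at distance d."""
    return {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}[angle]


def glcm(gray, levels, angle, d=1):
    """Normalized co-occurrence matrix p(i, j), counted symmetrically (assumed)."""
    dy, dx = offset(angle, d)
    h, w = len(gray), len(gray[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                i, j = gray[y][x], gray[yy][xx]
                counts[i][j] += 1  # the pair (i, j)
                counts[j][i] += 1  # and its mirror, so the matrix is symmetric
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy, homogeneity, dissimilarity per eqs. (1)-(5)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # A constant image has zero variance; 1.0 is an assumed fallback.
        "correlation": cov / (sd_i * sd_j) if sd_i * sd_j > 0 else 1.0,
        "energy": sum(v ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
    }


# Small 4-level illustrative image (not taken from the dataset).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4, angle=0, d=1)
features = glcm_features(p)
```

Calling `glcm` once per angle (0, 45, 90, 135) and concatenating the resulting feature dictionaries gives one feature vector per image, which is the form a KNN classifier consumes.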

2.3 Classification
After the GLCM feature values have been extracted, the next step is classification with the K-Nearest Neighbor (KNN) algorithm. The data are divided into classes, the system assigns each new sample to the appropriate class, and the accuracy of the model is then determined.
The KNN calculation starts by choosing k, the number of nearest neighbors. Odd values are typically used, for example k = 1, k = 3, k = 5, and so on. Next, the distance between the test image and each training image is calculated using the Euclidean distance in equation (6). The distances are then sorted from shortest to longest, the k nearest neighbors are selected, and their labels are checked. Finally, the test image is assigned to the majority class among its k nearest neighbors in the training data, which determines whether it is labeled benign or malignant.

$$d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2} \tag{6}$$
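The KNN procedure with the Euclidean distance of equation (6) can be sketched as below. The function names and the two-feature toy vectors are hypothetical and serve only to illustrate the steps; they are not data from the study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors, as in equation (6)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, sample, k=3):
    """Label `sample` with the majority class of its k nearest training vectors."""
    # Sort training samples by distance to the query and keep the k closest.
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: euclidean(pair[0], sample)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature vectors, for illustration only.
train_x = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85), (0.9, 0.2), (0.8, 0.1)]
train_y = ["benign", "benign", "benign", "malignant", "malignant"]
pred = knn_predict(train_x, train_y, (0.12, 0.88), k=3)
```

With two classes, an odd k avoids tied votes, which is one reason odd values are preferred.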

The flowchart of the KNN classification process is shown in Figure 4, and the complete process of the proposed model is shown in Figure 5.

Figure 4. Classification Flowchart.

Figure 5. Proposed Method.

3. Results and Discussion


This research produces a system that classifies skin cancer as malignant or benign using the GLCM method and the KNN algorithm. Tests were carried out on four dataset sizes: 3297 images (2637 training and 660 test), 1649 images (1319 training and 330 test), 825 images (660 training and 165 test), and 210 images (200 training and 10 test).

The images are converted to grayscale and smoothed with the average filter, and GLCM feature extraction then yields the contrast, homogeneity, correlation, dissimilarity, and energy features shown in Figure 6. These features are used to train the KNN model. The trained model is then tested on the test data by finding the closest training images using the Euclidean distance. The results are evaluated with a confusion matrix to obtain the accuracy. Figure 7 shows the average test accuracy over the k values, the angles 0°, 45°, 90°, and 135°, and the distances d = 1, 2, 3, both with and without the average filter. Adding the filter improves the average accuracy.
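Accuracy can be derived from a confusion matrix as in the sketch below. The labels in the example are invented for illustration and are not results from this study; treating "malignant" as the positive class is also an assumption.

```python
def confusion_matrix(y_true, y_pred, positive="malignant"):
    """Return (tp, fp, fn, tn) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Invented labels, for illustration only.
y_true = ["malignant", "malignant", "benign", "benign", "benign"]
y_pred = ["malignant", "benign", "benign", "benign", "malignant"]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
```

The same four counts also give sensitivity and specificity, which can matter for malignant lesions where a missed case is costlier than a false alarm.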

Figure 6. GLCM feature extraction results in each image.

Figure 7. Classification Results.

4. Conclusions
Features were extracted with the GLCM method at angles of 0°, 45°, 90°, and 135°, and classification used K-Nearest Neighbor with k = 1, 3, 5, 7, and 9 for the 3297- and 1649-image datasets and k = 1 to k = 10 for the 825- and 210-image datasets. The resulting accuracies show that GLCM feature extraction and the KNN classification algorithm can be applied reasonably well to classifying images of benign and malignant skin diseases. Applying an average filter to the grayscale images improves accuracy. The lower accuracy on the larger datasets is attributed to several damaged images; as the number of images decreases, fewer damaged images remain and accuracy increases, most notably on the 825- and 210-image datasets.
The system could be improved by using other extracted features or by adding preprocessing steps that support classification. Dataset selection also matters: the dataset used in this study still contains several damaged images, which affect classification and reduce the accuracy obtained when more training and test data are used. In addition, the optimal value of k could be determined beforehand with an evaluation procedure such as cross-validation.
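One way to choose k by cross-validation is sketched below. This is a suggestion in the spirit of the future work described above, not a procedure carried out in the study; the fold scheme, the function names, and the synthetic data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train, sample, k):
    """`train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5):
    """Mean KNN accuracy over `folds` interleaved folds of `data`."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th sample, starting at f
        train = [item for idx, item in enumerate(data) if idx % folds != f]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / folds


def best_k(data, candidates=(1, 3, 5, 7, 9), folds=5):
    """Candidate k with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(data, k, folds))


# Synthetic, well-separated clusters, for illustration only.
data = [((i * 0.01, 0.0), "benign") for i in range(10)]
data += [((1 + i * 0.01, 1.0), "malignant") for i in range(10)]
chosen = best_k(data)
```

Because every sample is used for testing exactly once, the selected k depends less on a single train/test split than the fixed splits used in the experiments.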

Funding: This research received no external funding.


Conflicts of Interest: The authors declare no conflict of interest.

References
[1] M. N. Islam, J. Gallardo-Alvarado, M. Abu, N. A. Salman, S. P. Rengan, and S. Said, “Skin disease recognition using texture analysis,”
2017 IEEE 8th Control Syst. Grad. Res. Colloquium, ICSGRC 2017 - Proc., vol. 1, no. August, pp. 144–148, 2017, doi:
10.1109/ICSGRC.2017.8070584.
[2] T. Goswami, V. K. Dabhi, and H. B. Prajapati, “Skin Disease Classification from Image - A Survey,” pp. 599–605, 2020, doi:
10.1109/ICACCS48705.2020.9074232.
[3] S. Wilvestra, S. Lestari, and E. Asr, “Studi Retrospektif Kanker Kulit di Poliklinik Ilmu Kesehatan,” vol. 7, no. Supplement 3, pp.
47–49, 2018.
[4] N. Alifa and D. Juniati, “Analisis Jenis Tumor Kulit Menggunakan Dimensi Fraktal Box Counting Dan K-Means,” J. Ris. dan Apl.
Mat., vol. 3, no. 2, p. 71, 2019, doi: 10.26740/jram.v3n2.p71-77.
[5] I. K. E. Purnama et al., “Disease Classification based on Dermoscopic Skin Images Using Convolutional Neural Network in
Teledermatology System,” 2019 Int. Conf. Comput. Eng. Network, Intell. Multimedia, CENIM 2019 - Proceeding, vol. 2019-Novem, pp.
1–5, 2019, doi: 10.1109/CENIM48368.2019.8973303.
[6] P. Dubal, S. Bhatt, C. Joglekar, and S. Patii, “Skin cancer detection and classification,” Proc. 2017 6th Int. Conf. Electr. Eng. Informatics
Sustain. Soc. Through Digit. Innov. ICEEI 2017, vol. 2017-Novem, pp. 1–6, 2018, doi: 10.1109/ICEEI.2017.8312419.
[7] N. Sarashadarti, I. B. Hidayat, P. D. Suhardjo, and M. S. S. K, “SINTESA PENELITIAN DETEKSI KISTA PERIAPIKAL
RADIOGRAF DENGAN METODE BINARY LARGE OBJECT (BLOB) DAN METODE GRAY LEVEL CO-
OCCURRENCE MATRIX (GLCM),” vol. 5, no. 3, pp. 5538–5545, 2018.
[8] S. Mutrofin, A. Izzah, A. Kurniawardhani, and M. Masrur, “Optimasi teknik klasifikasi modified k nearest neighbor menggunakan
algoritma genetika,” no. September, pp. 130–134, 2014, doi: 10.13140/RG.2.2.22330.08648.
[9] I. N. Atthalla, “Klasifikasi Penyakit Kanker Payudara Menggunakan Metode K Nearest Neighbor,” vol. 4, no. 1, pp. 978–979, 2018.
