Professional Documents
Culture Documents
Hybrid of Convolutional Neural Network Algorithm and Autoregressive Integrated Moving Average Model For Skin Cancer Classification Among Malaysian
Hybrid of Convolutional Neural Network Algorithm and Autoregressive Integrated Moving Average Model For Skin Cancer Classification Among Malaysian
Chee Ka Chin1, Dayang Azra binti Awang Mat2, Abdulrazak Yahya Saleh3
1,2Department
of Electrical and Electronic, Faculty of Engineering, Universiti Malaysia Sarawak, 94300 Kota Samarahan,
Sarawak, Malaysia
3FSKPM Faculty, Universiti Malaysia Sarawak (UNIMAS), Kota Samarahan 94300, Sarawak, Malaysia
Corresponding Author:
Chee Ka Chin
Department of Electrical and Electronic, Faculty of Engineering
Universiti Malaysia Sarawak
94300 Kota Samarahan, Sarawak, Malaysia
Email: 19020098@siswa.unimas.my
1. INTRODUCTION
In National Cancer Registry data, Malaysia’s reported that skin cancers account for 2.6% cancer
cases in the country which considers as minor cases in Malaysia [1]. Based on data collected from
Dermatology Clinic in Hospital Kuala Lumpur from 2006 to 2014 [2], the most common type of skin cancer
is basal cell carcinoma with 34.9%, followed by cutaneous lymphoma with 25.7% and squamous cell
carcinoma in 20.6% cases while melanoma is the least common cases which only 5.4% of the patients
diagnosed. Even though there are less skin cancer cases in Malaysia, this disease still needs to be concerned
due to skin cancer consider as most predominant type of cancer in worldwide [3].
Dermatologists often adopt various diagnosis guide for skin lesion such as ABCDE rule (A–stands
for asymmetrical shape, B–irregular borders of lesion, C–uneven distribution of color, D–diameter, and E-a
presence of evolution) [4], [5] and 7-point checklist [6], [7] but need to be further examined via
epiluminescence microscopy (ELM). Garnavi et al. described that there is an improvement of 5-30% in the
detection while using ELM as compared to the examination of naked eye [8], [9]. Therefore, it is possible to
assist dermatologists via computer-aided diagnosis (CAD) system for analyzing the pigmented skin lesions
efficiently with the development of ELM through automatic and quantitative assessment of the skin lesion in
clinical practice.
Recently, convolutional neural network (CNN) allowed computers to outperform in the
classification tasks of skin cancer as compare to the machine learning due to the CNN has ability to handle
the large and complex datasets that produce supremacy performance in terms of accuracy [10]. This can be
proved through the study about the artificial intelligence (AI) program that trained on nearly 130,000 images
of moles, rashes, and lesions using a technique known as deep learning which provide “at least” 91 percent
accuracy [11]. The benefit of CNN is this architecture able to classify the skin lesions according to the high-
level features instead of the conventional method that incorporate the visual information of low-level
dermoscopic that needed a segmentation step before the extraction of those features [12].
In addition, autoregressive integrated moving average (ARIMA) model which known as the Box-
Jenkins method [13] is one of the machine learning algorithms also show the great performance in medical
cases forecasting recently through the statistical analysis [14]. Therefore, the hybrid approach of CNN and
ARIMA has been proposed for skin cancer classification problem in this paper. This approach believe that will
be performed efficiently on a reliable skin cancer classification with the ability to differentiate between three
different common types of skin cancer such as melanoma, basal cell carcinoma and squamous cell carcinoma.
Actually, the aim of this paper is to evaluate the hybrid of CNN algorithm and ARIMA model performance for
skin cancer classification among Malaysian.
2. RELATED WORKS
Recently, researchers around the world focus on solving the classification problems in medical field
especially skin cancer classification by using the machine learning and deep learning approaches. Codella et
al. [15] depicted the combination of deep learning and sparse coding as feature extractor while the fused
support vector machine (SVM) algorithm was used for classification task. The beauty of this paper is the
adoption of the deep learning in feature extraction task which able to increase time and space complexity of
the model. After that, an improvement approach has been introduced by same author in terms of the
automated segmentation, multi-contextual analysis of the extracted features and additional machine learning
techniques for classification task [16].
In Ozkan and Koklu [17] proposed a skin lesion classification system by applying four machine
learning methods to classify the lesion into melanoma, abnormal, and normal. In this work, the highest
accuracy percentage of 92.50% was performed by the artificial neural networks (ANN). This obviously
shows that machine learning techniques still consider as low accuracy performance when compare to ANN
due to the machine learning methods unable to get the deep features from network flow. Oliveira et al. [18]
used used the ABCD rule and the Box-Counting approach as attribute extraction descriptors and support
vector machine (SVM) as classifier for the identification pigmented skin lesions in macroscopic images. The
best performance was achieved by SVM with histogram intersection kernel give the accuracy of 79.01% at
the classification between nevi and seborrheic keratosis.
For the utilization of deep learning approach in skin cancer classification, Emara et al. [19]
proposed the modified Inception-v4 architecture which was also a CNN architecture to classify skin lesion
through imbalance HAM 10000 skin cancer dataset [20]. The enhancement of this work was employing the
feature reuse using long residual connection to improve classification performance with the achievement of
high accuracy, 94.70% but this pretrained DNN has the complex network with a huge number of parameters
that may causes the limited classification accuracy. Mukherjee et al. [21] using CNN methods named as
CNN malignant lesion detection (CMLD) architecture to analyze the classification accuracy performance of
malignant melanoma. The best performance in this work was achieved with accuracy of 90.58% when using
Dermofit dataset for model training and testing. Recently, Kassem et al. [22] used modified GoogleNet by
replacing last two layers with multiclass support vector machine (SVM) with transfer learning to classify the
skin lesion. This hybrid approach achieved classification performance with accuracy, sensitivity, specificity,
and precision are 94.92%, 79.80%, 97.00%, and 80.36%.
Since ARIMA model shows the outstanding performance in forecasting tasks, researchers tend to
use hybridisation methods for enhancement due to its outstanding performance in solving a lot of engineering
problems. Wang et al. [23] developed the hybrid model of SVM and ARIMA (SVM-ARIMA) to the garlic
short-term price and outperformed the ARIMA and SVM models. Ma et al. [24] proposed hybrid of
multilayer perceptron (MLP) neural network with ARIMA model which named as NN-ARIMA and hybrid of
multidimensional support vector regression (MSVR) with ARIMA model which known as MSVR-ARIMA
Int J Artif Intell, Vol. 10, No. 3, September 2021: 707 - 716
Int J Artif Intell ISSN: 2252-8938 709
for network-wide traffic state prediction. MLP neural network and MSVR were implemented to capture the
network-scale co-movement pattern of all traffic flows while ARIMA was utilized to postprocess the residual
time series out of neural network in this paper. Suhermi et al. [25] employed the hybrid deep neural network
(DNN) and ARIMA model for roll motion prediction. This proposed approach is based on the supremacy
performance of ARIMA and artificial neural network (ANN) models in linear and non-linear modelling. In
addition, the advance hybrid approach has been yielded by Ji et al. [26]. The authors introduced the hybrid of
ARIMA model, CNN and long short-term memory network (LSTM) to forecast the future prices of carbon.
The idea behind of this hybrid method is ARIMA used for linear features capturing. CNN utilized for
hierarchical data structure capturing while LSTM utilized for long-term dependencies capturing in the data.
The beauty of these papers were the empirical results of hybrid model performed better than that of non-
hybrid model and proven that the hybrid method quite effective in improvement of forecast accuracy.
3. METHODOLOGY
In this paper, a new technique has been introduced for skin cancer classification using hybridisation
technique by combining CNN algorithm and ARIMA model. Our technique has been implemented and
validated. From the CNN-ARIMA model, CNN algorithm will be used to extract the features of skin cancer
and ARIMA model behave as classifier to classify the skin cancer based on three types common skin cancer
which consists of melanoma, basal cell carcinoma and squamous cell carcinoma [27].
3.1. Dataset
The dataset used in this paper was collected from The International Skin Imaging Collaboration
(ISIC) via Kaggle website [28]. This dataset consists of 2357 RGB dermoscopic images in JPEG format
which categorized in 9 different skin cancer diseases based on the classification taken from ISIC. These
images were very clean with high resolution. Therefore, the dataset from ISIC is suitable for deep learning
implementation. Besides that, there were also different image resolution from this source since the archive
combines many sub datasets. In this paper, there were 995 images which comprised of 3 different classes in
Figure 1 such as melanoma, basal cell carcinoma and squamous cell carcinoma were selected due to these
three types of skin cancers considered as common in Malaysia. Figure 1 illustrates the selected skin cancer
types that used in this work such as Melanoma in Figure 1(a), Basal cell carcinoma in Figure 1(b) and
Squamous cell carcinoma in Figure 1(c).
Figure 1. Selected types of skin cancer [28]: (a) Melanoma, (b) Basal cell carcinoma, (c) Squamous cell
carcinoma
Hybrid of convolutional neural network algorithm and autoregressive integrated… (Chee Ka Chin)
710 ISSN: 2252-8938
contour will be enhanced by using image thresholding method Figure 2(d). The fine hair contour will be
removed and inpaint the original image to produce the clean skin cancer images without fine hair Figure 2(e).
(d) (e)
Figure 2. Hair removal process using OpenCV for original basal cell carcinoma image: (a) Original basal cell
carcinoma, (b) Grayscale image, (c) After morphological black hat operation,
(d) Thresholded image, (e) Finalized image without fine hair
After that, the further augmented approaches such as width shift, height shift, zoom, stretch and
rotation were selected to avoid overfit or underfit issues help the model to generalize properly. This process
able to provide the data set with acceptable size. The detail parameters choice for further data augmentation
of skin cancer datasets were summarized in Table 2.
Int J Artif Intell, Vol. 10, No. 3, September 2021: 707 - 716
Int J Artif Intell ISSN: 2252-8938 711
Before preprocessed and augmented dataset implement in the CNN-ARIMA model, all of these
datasets need to resize to the shape of 32×32 and consequently convert every pixel of images to array which
known as feature maps. The input will be the skin cancer image with 2D array that corresponding to the pixel
values. According to Figure 3, the proposed CNN-ARIMA model is the multistage architecture consists of
three stages that consists of multiple layers and fully connected layers which are connected after these stages.
The details of the proposed CNN-ARIMA model are such as:
First stage - Third Stage: The input initially fed into first stage of the CNN-ARIMA model. The first
and second convolutional layers in this stage contain 64 filters with the kernel size of 5×5 followed by
rectified linear unit (ReLU) as an activation function after each convolutional layer. The output size
after pooling layer decreases by half since a stride of 2 have been used with pooling size of 2×2. The
dropout layer with the ratio of 0.5 also added after pooling layer to avoid overfitting. The final dropout
layer in this stage has an output of 12×12×64 which will behave as input in next stage. The same step
will be repeated in next two stages except the 32 filters with kernel size of 3×3 and 16 filters with kernel
size of 1×1 are used in second stage and third stage respectively.
Fully connected layer: The output from final dropout layer in third stage convert to a single array
through flatten layer which also means convert the 3D array into a 1D array of size 2×2×16= 64. After
that, the ARIMA model obtain in the ARIMA layer via the algorithm;
Where p is the auto regressive value, d is the differencing value and q is the moving average value.
However, whatever value insert in p, d and q, the output parameter still the same which will not affect
the performance in this study. In the final last 3 layers, the dense implement according to large 512-unit
layers with the dropout ratio of 0.5 followed by the final layer with 3 nodes for classification
performance among 3 classes of skin cancer which are basal cell carcinoma, melanoma and squamous
cell carcinoma through a softmax activation function.
In addition, there are also the parameters compute after every layer in model except input layer,
dropout layer and flatten layer due to there is no backpropagation learning involve in these layers. The
number of parameters for convolutional layer, 𝑃𝑎𝑟𝑎𝑚𝑐𝑜𝑛𝑣 and fully connected layer, 𝑃𝑎𝑟𝑎𝑚𝑓𝑢𝑙𝑙𝑦 are
computed by the formula;
𝑃𝑎𝑟𝑎𝑚𝑓𝑢𝑙𝑙𝑦 = (𝑙 + 1) × 𝑐 (3)
Where 𝑚 is the shape of width of the filter, 𝑛 is the shape of height of the filter, 𝑓 is the number of filters in
the previous layer, 𝑘 is the number of filters in current layer, 𝑙 is the previous layer neurons and 𝑐 is the current
layer neurons. The total parameter computed for the proposed CNN-ARIMA model as shown in Table 3 is
136422 which also based on the sum of every single parameter from each layer.
Hybrid of convolutional neural network algorithm and autoregressive integrated… (Chee Ka Chin)
712 ISSN: 2252-8938
Int J Artif Intell, Vol. 10, No. 3, September 2021: 707 - 716
Int J Artif Intell ISSN: 2252-8938 713
𝑇𝑃+𝑇𝑁
𝐴𝐶𝐶 = × 100% (4)
𝑇𝑃+𝑇𝑁+𝐹𝑃+𝐹𝑁
𝑇𝑃
𝑇𝑃𝑅 = × 100% (5)
𝑇𝑃+𝐹𝑁
𝑇𝑁
𝑇𝑁𝑅 = × 100% (6)
𝑇𝑁+𝐹𝑃
𝑇𝑃
PPV = × 100% (7)
𝑇𝑃+𝐹𝑃
Where 𝑇𝑃 is true positive value, 𝑇𝑁 is true negative value, 𝐹𝑃 is false positive value and 𝐹𝑁 is false
negative value while 𝑇𝑃𝑅, 𝑇𝑁𝑅 and PPV refers to true positive rate, true negative rate and positive
prediction value respectively. Therefore, the classification result performance in this paper consists of overall
test accuracy with the average sensitivity, average specificity and average precision according to basal cell
carcinoma, melanoma and squamous cell carcinoma.
(a) (b)
Hybrid of convolutional neural network algorithm and autoregressive integrated… (Chee Ka Chin)
714 ISSN: 2252-8938
Figure 5. The confusion matrix of the classification accuracy for the proposed CNN-ARIMA model
According to Table 4, the performance measure of three skin cancer types via confusion matrix as
shown in Figure 5 achieve the average sensitivity of 92.39%, average specificity of 96.11% and average
precision of 92.40%. In addition, the performance of the proposed CNN-ARIMA model also compared with
the existing techniques using the different amount of skin cancer dataset and different classes of skin cancer
to classify. Results of comparative study as presented in Table 5 show that the proposed CNN-ARIMA
model achieve better performance than the existing tehniques in the term of sensitivity and precision, as it has
been able to accurately distinguish skin cancer in various classes of skin cancer. However, the classification
accuracy and specifity for the proposed model of CNN-ARIMA is still marginally lower than the previous
work proposed by Kassem et al. [22].
5. CONCLUSION
This research introduced the hybrid algorithm of deep neural network and machine learning
approach for skin cancer classification. The hybrid approach was relied on the combination of CNN and
ARIMA. The proposed hybrid performance shown significantly higher results in terms of sensitivity,
specificity and precision as compared to the state-of-art skin cancer classification approaches that used single
machine learning and single deep learning methods. In addition, this work also did not require any prior
segmentation approach.
ACKNOWLEDGEMENTS
It is hereby declared that, this paper has not been submitted anywhere for publication and there is no
conflict of interest for publishing this paper. Authors would like to thank University Malaysia Sarawak
Int J Artif Intell, Vol. 10, No. 3, September 2021: 707 - 716
Int J Artif Intell ISSN: 2252-8938 715
(UNIMAS) and Fusionex International for supporting this research through UNIMAS-Fusionex Research
Grant (RG/F02/FUSX/01/2019).
REFERENCES
[1] Beacon Hospital, "Skin Cancer," 2016. [Online] Available: http://www.beaconhospital.com.my/skin-cancer/
[Accessed Dec. 10. 2018]., p. 2018, 2018.
[2] A. M. Affandi, "Is skin cancer common in Malaysia?" The Star Online, Feb. 2015. [Online], Available
https://www.thestar.com.my. [Accessed Dec. 10, 2018]., p. 2018, 2018.
[3] Z. Khazaei, M. Sohrabivafa, K. Mansori, H. Naemi, and E. Goodarzi, “Incidence and mortality of cervix cancer and
their relationship with the human development index in 185 countries in the world: An ecology study in 2018,”
Advances in Human Biology, vol. 9, no. 3, p. 222, 2019, doi: 10.4103/aihb.aihb_15_19.
[4] M. Kunz and W. Stolz, "ABCD rule," Dermoscopedia Organization, 17 Jan. 2018. Available:
https://dermoscopedia.org/ABCD_rule. [Accessed 11 Nov. 2020].,” p. 2020, 2020.
[5] B. Tas, “Faster Evaluation of ABCD Rule and Total Dermoscopic Score for Nevomelanocytic Lesions:
Dermoscopic Score Scale,” JOURNAL OF SKIN AND STEM CELL, vol. 6, no. 2, 2020, doi: 10.5812/jssc.102138.
[6] P. Carli et al., "Education Pattern analysis, not simplified algorithms, is the most reliable method for teaching
dermoscopy for melanoma diagnosis to residents in dermatology," British Journal of Dermatology, vol. 148, no. 5,
pp. 981-984, 2003, doi: 10.1046/j.1365-2133.2003.05023.x.
[7] J. Kawahara, S. Daneshvar, G. Argenziano, and G. Hamarneh, "Seven-Point Checklist and Skin Lesion
Classification Using Multitask Multimodal Neural Nets," in IEEE Journal of Biomedical and Health Informatics,
vol. 23, no. 2, pp. 538-546, March 2019, doi: 10.1109/JBHI.2018.2824327.
[8] R. Garnavi, M. Aldeen, and J. Bailey, "Computer-Aided Diagnosis of Melanoma Using Border- and Wavelet-Based
Texture Analysis," in IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 6, pp. 1239-1252,
Nov. 2012, doi: 10.1109/TITB.2012.2212282.
[9] F. Kaliyadan, “The scope of the dermoscope,” Indian Dermatol. Online J., vol. 7, no. 5, p. 359, 2016, doi:
10.4103/2229-5178.190496.
[10] T. Tiwari, T. Tiwari, and S. Tiwari, “How Artificial Intelligence, Machine Learning and Deep Learning are
Radically Different?,” International Journal of Advanced Research in Computer Science and Software
Engineering, vol. 8, no. 2, p. 1, 2018, doi: 10.23956/ijarcsse.v8i2.569.
[11] J. Vincent, “Artificial intelligence can spot skin cancer as well as a trained doctor,” The Verge, p. 14396500, 2019.
[12] S. Loussaief and A. Abdelkrim, “Convolutional Neural Network hyper-parameters optimization based on Genetic
Algorithms,” International Journal of Advanced Computer Science and Applications, vol. 9, no. 10, pp. 252-266,
2018, doi: 10.14569/IJACSA.2018.091031.
[13] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, "Time Series Analysis: Forecasting and Control,"
Journal of Time Series Analysis, vol. 37, no. 5, pp. 712, March 2016, doi: 10.1111/jtsa.12194.
[14] M. A. F. Quioc, S. C. Ambat, and A. C. Lagman, "Medical Cases Forecasting for the Development of Resource
Allocation Recommender System," 2019 IEEE 4th International Conference on Computer and Communication
Systems (ICCCS), 2019, pp. 414-418, doi: 10.1109/CCOMS.2019.8821673.
[15] N. C. B, J. Cai, M. Abedini, and R. Garnavi, "Deep Learning, Sparse Coding, and SVM for Melanoma Recognition
in Dermoscopy Images," International Workshop on Machine Learning in Medical Imaging, vol. 1, pp. 118-126,
2015.
[16] N. C. F. Codella et al., “Deep learning ensembles for melanoma recognition in dermoscopy images 1 Dataset,” vol.
61, no. 4, pp. 1-28, 2017.
[17] M. Koklu and I. A. Ozkan, “Skin Lesion Classification using Machine Learning Algorithms,” International Journal
of Intelligent Systems and Applications in Engineering, vol. 4, no. 5, pp. 285–289, 2017, doi:
10.18201/ijisae.2017534420.
[18] R. B. Oliveira, N. Marranghello, A. S. Pereira, and J. M. R. S. Tavares, “A computational approach for detecting
pigmented skin lesions in macroscopic images,” Expert Systems with Applications, vol. 61, pp. 53-63, 2016, doi:
10.1016/j.eswa.2016.05.017.
[19] T. Emara, H. M. Afify, F. H. Ismail and A. E. Hassanien, "A Modified Inception-v4 for Imbalanced Skin Cancer
Classification Dataset," 2019 14th International Conference on Computer Engineering and Systems (ICCES), 2019,
pp. 28-33, doi: 10.1109/ICCES48960.2019.9068110.
[20] P. Tschandl, C. Rosendahl, and H. Kittler, "The HAM10000 dataset, a large collection of multi-source
dermatoscopic images of common pigmented skin lesions," Scientific Data, vol. 5, no. 1, pp. 1-9, 2018, doi:
10.1038/sdata.2018.161.
[21] S. Mukherjee, A. Adhikari, and M. Roy, “Malignant Melanoma Classification Using Cross-Platform Dataset with
Deep Learning CNN Architecture,” Recent Trends in Signal and Image Processing, pp. 31-41, 2019, doi:
10.1007/978-981-13-6783-0.
[22] M. A. Kassem, K. M. Hosny, and M. M. Fouad, “Skin Lesions Classification into Eight Classes for ISIC 2019
Using Deep Convolutional Neural Network and Transfer Learning,” IEEE Access, vol. 8, pp. 114822-114832,
2020, doi: 10.1109/ACCESS.2020.3003890.
[23] B. Wang et al., “Research on hybrid model of garlic short-term price forecasting based on big data,” Computers,
Materials and Continua, vol. 57, no. 2, pp. 283-296, 2018, doi: 10.32604/cmc.2018.03791.
[24] T. Ma, C. Antoniou, and T. Toledo, “Hybrid machine learning algorithm and statistical time series model for
network-wide traffic forecast,” Transportation Research Part C: Emerging Technologies, vol. 111, pp. 352-372,
Hybrid of convolutional neural network algorithm and autoregressive integrated… (Chee Ka Chin)
716 ISSN: 2252-8938
BIOGRAPHIES OF AUTHORS
Mr. Chee Ka Chin is currently pursuing the master's degree of engineering in Faculty of
Engineering, University Malaysia Sarawak (UNIMAS). His research interests include artificial
intelligence and deep learning application.
Dr. Dayang Azra binti Awang Mat is a Senior Lecturer in Faculty of Engineering, University
Malaysia Sarawak (UNIMAS) from 2006 until now. She graduated from Kyushu University
Japan, in 2014 with Doctor of Engineering (in RF and millimeter-wave design). She received
her MEng. (Computer and Communication) in 2006 at Universiti Kebangsaan Malaysia and
degree in BEng. (Hons) Electronic and Telecommunication at Universiti Malaysia Sarawak.
She is expert in mobile and wireless communication, antenna and filter design, RF both in
microwave and millimeter-wave.
Dr. Abdulrazak Yahya Saleh is a Senior Lecturer in Faculty of Cognitive Sciences and
Human Development, Universiti Malaysia Sarawak (UNIMAS). He received his PhD from
Universiti Teknologi Malaysia, MSc and BSc from University of Science and Technology,
Yemen. His research is mainly on the cognitive science, artificial intelligence, neural networks,
spiking neural networks, brain modeling, optimization methods, multi-objective optimization
algorithms and machine learning techniques and their applications. The current focus of his
research is on deep learning applications and challenges in big data analytics.
Int J Artif Intell, Vol. 10, No. 3, September 2021: 707 - 716