Junaedi 2019

2019 International Conference on Information and Communications Technology (ICOIACT)

Tuberculosis Detection in Chest X-Ray Images Using Optimized Gray Level Co-Occurrence Matrix Features

Imam Junaedi, Erni Yudaningtyas, Rahmadwati
Department of Electrical Engineering, Brawijaya University, Malang, Indonesia
imam.junaedi90@yahoo.com, erni@ub.ac.id, rahma@ub.ac.id

Abstract— Tuberculosis (TB) is a deadly infectious disease caused by Mycobacterium tuberculosis (MTB). Chest X-ray (CXR) imaging has historically been the main tool for detecting pulmonary TB. Radiologists analyze CXR images to determine whether there are signs of TB in the lungs, but their readings are affected by subjective factors such as experience, viewing conditions, and fatigue. A computer-aided diagnosis system can reduce this subjectivity. This paper proposes a TB detection system for CXR images that uses optimized Gray Level Co-Occurrence Matrix (GLCM) features as input. The GLCM features are optimized using Principal Component Analysis (PCA) and then classified using a Support Vector Machine (SVM). CXR images are classified as normal, primary TB (PTB), or secondary TB (STB). The results show that the classification system with optimized GLCM features as input performs better than the one with regular GLCM features as input. In 8-fold cross-validation, the system with optimized GLCM input achieves an accuracy of 100% for the normal class, 98.72% for the PTB class, and 98.72% for the STB class.

Keywords— computer aided diagnosis, tuberculosis (TB), gray level co-occurrence matrix (GLCM), principal component analysis (PCA), support vector machine (SVM)

978-1-7281-1655-6/19/$31.00 ©2019 IEEE

I. INTRODUCTION
TB is a deadly infectious disease caused by MTB [1]. The World Health Organization (WHO) reported that in 2017 Indonesia was among the 20 high TB burden countries, with a total of 446,732 cases, and WHO estimates 107,000 TB deaths in Indonesia in 2017 [2]. Historically, CXR has been the main tool for detecting TB [3]. CXR has fairly high sensitivity for TB detection, especially for detecting pulmonary abnormalities associated with TB in patients who do not have visible symptoms. Radiologists analyze CXR images to determine whether there are abnormalities in the lungs, and the results are influenced by the radiologist's subjectivity, such as experience, viewing conditions, and fatigue [4]. This subjectivity can be reduced by a computerized TB detection system for CXR images, which can also serve as a second opinion when making a diagnosis.
Detection of TB in CXR images has been investigated in recent years, and many methods have been used to extract features and classify them for TB detection. The feature extraction methods used include Histogram of Oriented Gradients (HOG) [5], Gray Level Co-occurrence Matrix [6], Haar Wavelet Transform (HWT) [7], and Local Binary Pattern (LBP) [8]. GLCM is known as a mathematical method for detecting lung abnormalities and gives doctors an opportunity to localize abnormal tissue types, including both tumors and pulmonary edema (fluid buildup in the air sacs of the lungs) [9]. In Pawar and Ganorkar's research [6], ten GLCM features detected TB with an accuracy of 75%, specificity of 75%, and sensitivity of 75%. Pawar's TB detection system could be improved by applying feature optimization to its GLCM features.
PCA is used in this research to optimize the GLCM features. PCA can optimize GLCM features by reducing the number of GLCM texture features, and it produced better classification in a texture-based classification of sub-Antarctic vegetation communities [10]. PCA has also reduced GLCM features in electroencephalogram (EEG) spectrogram analysis from eighty to five components [11].
Classification methods that can be used include Euclidean distance [6], SVM [5], K-nearest neighbor (K-NN) [12], and convolutional neural network (CNN) [13]. SVM is a supervised machine learning method that can be used for classification, and it has a fairly high level of accuracy. SVM is widely applied in biomedical applications such as TB detection [5], brain tumor detection [14], and medical image classification [15]. SVM has detected TB in CXR images with an accuracy of 88%, sensitivity of 88%, and specificity of 88% [5]. The classification output of the SVM can be improved by using optimized GLCM features as input.
In this paper, a computer aided diagnosis system was designed to detect TB in CXR images using Matlab 2018a and Weka 3.8. CXR images are processed through several stages, namely preprocessing, segmentation, and classification. The optimized GLCM features are used as input for the classification process, and the classification method is SVM. This research aims to reduce the subjectivity factor in the process of detecting TB in CXR images.

II. RESEARCH METHOD
This research proposes a TB detection system for CXR images that uses GLCM features optimized by PCA as input and classifies them with a multiclass SVM. The CXR images were obtained from the Shenzhen chest X-ray set, Guangdong Medical College [16]. The data consist of 46 normal images, 24 PTB images, and 8 STB images. CXR images are processed through several stages, namely preprocessing, segmentation, feature extraction, feature optimization, and classification, as shown in the block diagram in Fig. 1.
Fig. 1. Block diagram of TB detection system in CXR images (flow: Start → CXR image → Preprocessing → Segmentation → GLCM feature extraction → Feature optimization using PCA → SVM classification → Classification: Normal, Primary TB, or Secondary TB → End)

A. Preprocessing
Preprocessing is the first stage of CXR image processing in this research. It consists of reducing the image size (resizing), sharpening image features, and reducing noise, and it is done using Matlab 2018a.
The first preprocessing step reduces the CXR images to 512x512 pixels to speed up processing. Next, features in the resized CXR image are sharpened with the adapthisteq function in Matlab 2018a. adapthisteq improves image contrast by transforming contrast values with the Contrast Limited Adaptive Histogram Equalization (CLAHE) method, which enhances contrast locally while limiting how much the noise in the image is amplified. The final step reduces noise in the CXR images with a Gaussian filter, which suppresses noise and smooths the image to facilitate edge detection in the segmentation process. The preprocessing result is shown in Fig. 2.

Fig. 2. Preprocessing result

B. Segmentation
Segmentation separates the region of interest (ROI) from the background of the image; here, it separates the lung fields from the rest of the CXR image. The segmentation in this research uses morphological filters, binary segmentation, and the selection of the two largest binary large objects (blobs) to obtain a mask. The mask is used to separate the ROI from the background of the CXR image. The segmentation result for the preprocessed image in Fig. 2 is shown in Fig. 3.

Fig. 3. Segmentation result

C. Feature Extraction
The features extracted in this research are GLCM features. Twelve features are used, namely contrast, correlation, energy, homogeneity, mean, standard deviation, entropy, root mean square (RMS), variance, smoothness, kurtosis, and skewness.
1) Contrast
Contrast measures the intensity variation in gray level between a reference pixel and its neighboring pixel. Contrast is zero for a plain (constant) image. Contrast is given by (1),
$$\text{Contrast} = \sum_{i}\sum_{j} (i-j)^{2}\, P[i,j] \tag{1}$$
where P[i, j] is the entry of the normalized GLCM for the pair of gray values i and j.
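To make (1) concrete, the following minimal sketch builds a normalized GLCM for a single offset (each pixel paired with its right-hand neighbor) and evaluates contrast on two toy images, together with the energy and homogeneity features defined in (3) and (4). The offset, the non-symmetric pair counting, the quantization to four gray levels, the toy images, and the helper names (`quantize`, `glcm`, `contrast`, `energy`, `homogeneity`) are assumptions for illustration; the paper does not state which offsets or how many gray levels were used in Matlab. As the equations in this section indicate, contrast, correlation, energy, homogeneity, and entropy are computed from P[i, j], whereas mean, standard deviation, RMS, variance, smoothness, kurtosis, and skewness are computed from the pixel values themselves.

```python
from typing import List

Matrix = List[List[float]]


def quantize(image: List[List[int]], levels: int = 8) -> List[List[int]]:
    """Map 8-bit intensities (0-255) onto gray levels 0..levels-1."""
    return [[pixel * levels // 256 for pixel in row] for row in image]


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> Matrix:
    """Normalized GLCM: P[i][j] is the fraction of pixel pairs whose reference
    pixel has gray level i and whose neighbor at (row + dy, col + dx) has level j."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that leave the image
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def contrast(p: Matrix) -> float:
    """Equation (1): sum of (i - j)^2 * P[i, j]."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


def energy(p: Matrix) -> float:
    """Sum of squared GLCM entries, as in (3)."""
    return sum(v * v for row in p for v in row)


def homogeneity(p: Matrix) -> float:
    """Sum of P[i, j] / (1 + |i - j|), as in (4)."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))


# Two hypothetical 4x4 test images: a plain one and vertical stripes.
plain = quantize([[200] * 4 for _ in range(4)], levels=4)
stripes = quantize([[0, 255, 0, 255] for _ in range(4)], levels=4)
for name, img in [("plain", plain), ("stripes", stripes)]:
    p = glcm(img, levels=4)
    print(name, contrast(p), energy(p), homogeneity(p))
```

For the plain image every pixel pair falls into one GLCM cell, so contrast is 0 and energy is 1, matching the properties stated for (1) and (3); the striped image puts all of its probability mass off the diagonal at |i - j| = 3, giving a large contrast and a low homogeneity. Texture analyses often combine several offsets (for example 0°, 45°, 90°, and 135°), which this sketch omits.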
2) Correlation
Correlation measures how a reference pixel is related to its neighbor in an image. The means are denoted $\mu_i$ and $\mu_j$ and the standard deviations $\sigma_i$ and $\sigma_j$ (computed over the rows and columns of the GLCM). Correlation is given by (2),
$$\text{Correlation} = \sum_{i}\sum_{j} \frac{(i-\mu_i)(j-\mu_j)\, P[i,j]}{\sigma_i \sigma_j} \tag{2}$$
3) Energy
Energy measures the local uniformity of the gray levels. Energy equals one for a plain image. Energy is given by (3),
$$\text{Energy} = \sum_{i}\sum_{j} P[i,j]^{2} \tag{3}$$
4) Homogeneity
Homogeneity measures how closely the elements of the co-occurrence matrix are distributed around the GLCM diagonal. Homogeneity is given by (4),
$$\text{Homogeneity} = \sum_{i}\sum_{j} \frac{P[i,j]}{1+|i-j|} \tag{4}$$
5) Mean
The mean ($\mu$) is the average pixel value in an image. The mean is given by (5),
$$\mu = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} p(i,j) \tag{5}$$
where p(i, j) is the pixel value at point (i, j) of an image of size M x N.
6) Standard Deviation
The standard deviation ($\sigma$) measures how far the gray values of the pixels spread around the mean value; it describes the dispersion within the local area. The standard deviation is given by (6),
$$\sigma = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \bigl(p(i,j)-\mu\bigr)^{2}} \tag{6}$$
7) Entropy
Entropy measures the randomness of the gray level distribution and is used as a measure of the distribution of variations within a region. Entropy is given by (7),
$$\text{Entropy} = -\sum_{i}\sum_{j} P[i,j]\,\log P[i,j] \tag{7}$$
8) RMS
RMS is the root mean square of the pixel intensities, measured about their average intensity. RMS is given by (8),
$$\text{RMS} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \bigl(I_{ij}-\bar{I}\bigr)^{2}} \tag{8}$$
where $I_{ij}$ is the intensity of the pixel in row i and column j of the two-dimensional M x N image, and $\bar{I}$ is the average intensity of all pixels in the image.
9) Variance
Variance is the square of the standard deviation ($\sigma$). Variance is given by (9),
$$\text{Variance} = \sigma^{2} \tag{9}$$
10) Smoothness
Smoothness describes the relative smoothness of the image intensity and is also used as a statistic for selecting image blocks that contain less noise. Smoothness is given by (10),
$$\text{Smoothness} = 1 - \frac{1}{1+\sigma^{2}} \tag{10}$$
11) Kurtosis
Kurtosis (K) measures how peaked or heavy-tailed the pixel distribution is compared with a normal distribution; because of the -3 term in (11), a normal distribution has K = 0. Kurtosis is given by (11),
$$K = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(\frac{p(i,j)-\mu}{\sigma}\right)^{4} - 3 \tag{11}$$
12) Skewness
Skewness (S) measures the degree of asymmetry of the pixel distribution around the mean. Skewness is given by (12),
$$S = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(\frac{p(i,j)-\mu}{\sigma}\right)^{3} \tag{12}$$

D. Feature Optimization
The twelve GLCM features from the extraction process are optimized using PCA, whose purpose is to reduce the dimensionality of the GLCM features. Weka 3.8 is used for the feature optimization. PCA first computes the mean of each feature, which is used to form the 12x12 correlation matrix of the twelve GLCM features. The eigenvalues and eigenvectors of this matrix are then calculated, and the features are projected onto the eigenvectors to form the new GLCM features. The new GLCM features are used as input to the SVM classification system. The result of GLCM optimization by PCA is shown in Fig. 4.

Fig. 4. Feature optimization result

E. Classification
The classification process determines whether a CXR image is normal, PTB, or STB. Weka 3.8 is used for the classification process. The classification method used in this research is multiclass SVM, specifically the C-SVC (C-support vector classification) type. C-SVC seeks the best hyperplane for separating different classes by maximizing the distance between the hyperplane and the nearest sample points (the margin) while minimizing misclassification errors.
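The C-SVC idea can be illustrated with a minimal sketch. It trains a linear soft-margin SVM by full-batch subgradient descent on $\frac{1}{2}\lVert w\rVert^{2} + C\sum_i \xi_i$ with hinge-loss slacks $\xi_i = \max(0, 1 - y_i(w \cdot x_i + b))$, the same margin-versus-error trade-off that the cost function in (13) expresses, and it handles the three classes with one binary SVM per class (one-vs-rest). This is a simplified stand-in, not the solver Weka 3.8 runs: C-SVC implementations such as LIBSVM solve the dual problem, usually with a kernel, and combine binary classifiers one-vs-one. The learning rate, epoch count, function names, and the toy two-dimensional data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weight vector w, bias b)


def train_binary_svm(xs: Sequence[Sequence[float]], ys: Sequence[int], c: float = 1.0,
                     lr: float = 0.01, epochs: int = 1000) -> Model:
    """Minimize 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)) by full-batch
    subgradient descent. Labels in ys must be +1 or -1."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for x, y in zip(xs, ys):
            margin = y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
            if margin < 1:  # slack xi_i > 0: inside the margin or misclassified
                for k in range(dim):
                    grad_w[k] -= c * y * x[k]
                grad_b -= c * y
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def train_one_vs_rest(xs: Sequence[Sequence[float]], labels: Sequence[str],
                      c: float = 1.0) -> Dict[str, Model]:
    """Train one binary SVM per class: that class is +1, all other classes are -1."""
    return {cls: train_binary_svm(xs, [1 if lab == cls else -1 for lab in labels], c=c)
            for cls in sorted(set(labels))}


def predict(models: Dict[str, Model], x: Sequence[float]) -> str:
    """Pick the class whose binary SVM gives the largest decision value."""
    return max(models, key=lambda cls: score(models[cls], x))


# Hypothetical 2-D feature vectors (e.g. two PCA components) for three toy classes.
X = [[0.0, 0.0], [0.3, 0.2], [0.2, 0.4],     # normal
     [4.0, 0.0], [4.2, 0.3], [3.8, -0.2],    # PTB
     [0.0, 4.0], [0.3, 4.2], [-0.2, 3.8]]    # STB
y = ["normal"] * 3 + ["PTB"] * 3 + ["STB"] * 3
models = train_one_vs_rest(X, y, c=1.0)  # C = 1, the value reported in this research
print([predict(models, x) for x in X])
```

Because the toy clusters sit at the corners of a triangle, each class is linearly separable from the other two, so every training point should be assigned to its own class. With real GLCM or PCA features, inputs would normally be standardized first, and a kernel may be needed when the classes are not linearly separable.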
The cost function that C-SVC minimizes is given by (13),
$$\Phi(w, \xi) = \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i} \xi_{i} \tag{13}$$
where w is the weight vector, $\xi_i$ is the slack variable of sample i, $\Phi$ is the cost function, and C is the regularization parameter. The value of C in this research is 1.
The SVM classifier in this research receives two types of input, namely the GLCM features without optimization and the GLCM features optimized using PCA.

III. RESULT AND ANALYSIS
The performance of the system is measured by accuracy, sensitivity, and specificity. Each parameter is calculated for each class, treating that class as positive and the other two classes as negative. Accuracy is the proportion of data that the system classifies correctly. Sensitivity measures how well the system classifies data points in the positive class, and specificity measures how well it classifies data points in the negative class. Accuracy, sensitivity, and specificity are given by (14), (15), and (16),
$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} \tag{14}$$
$$\text{Sensitivity} = \frac{TP}{TP+FN} \tag{15}$$
$$\text{Specificity} = \frac{TN}{TN+FP} \tag{16}$$
where TP (true positive) is an outcome where the system correctly predicts the positive class, TN (true negative) is an outcome where the system correctly predicts the negative class, FP (false positive) is an outcome where the system incorrectly predicts the positive class, and FN (false negative) is an outcome where the system incorrectly predicts the negative class.
The SVM classification system is validated using cross-validation with 2, 4, 6, 8, and 10 folds. The validation results of the SVM with unoptimized GLCM input are compared with those of the SVM with GLCM input optimized using PCA.
Fig. 5 shows the accuracy, sensitivity, and specificity of the SVM with unoptimized GLCM input. This configuration performs best in 6-fold cross-validation, with 62.82% accuracy, 86.96% sensitivity, and 28.13% specificity for the normal class; 67.95% accuracy, 29.17% sensitivity, and 85.19% specificity for the PTB class; and 89.74% accuracy, 0% sensitivity, and 100% specificity for the STB class.

Fig. 5. Graph of SVM (GLCM without optimization) performance

Fig. 6 shows the accuracy, sensitivity, and specificity of the SVM with GLCM input optimized using PCA. This configuration performs best in 8-fold cross-validation, with 100% accuracy, 100% sensitivity, and 100% specificity for the normal class; 98.72% accuracy, 100% sensitivity, and 98.15% specificity for the PTB class; and 98.72% accuracy, 87.50% sensitivity, and 100% specificity for the STB class.

Fig. 6. Graph of SVM (optimized GLCM by PCA) performance

The SVM classification system with unoptimized GLCM input cannot classify the PTB and STB classes properly; even at its best setting it reaches only 29.17% sensitivity for PTB and 0% sensitivity for STB. The SVM with GLCM input optimized by PCA performs better than the SVM with unoptimized GLCM input. PCA thus optimizes the GLCM features in a way that improves the performance of the CXR image classification system.

IV. CONCLUSION
A TB detection system for CXR images has been developed to reduce the subjectivity factor in detecting TB in CXR images. The SVM classification system with GLCM input optimized by PCA performs better than SVM classification with unoptimized GLCM input, showing that PCA can optimize the GLCM features to improve the performance of the CXR image classification system. The SVM with GLCM input optimized by PCA performs best in 8-fold cross-validation, with 100% accuracy for the normal class, 98.72% accuracy for the PTB class, and 98.72% accuracy for the STB class.

REFERENCES
[1] F. Varaine and M. L. Rich, “Tuberculosis: practical guide for clinicians, nurses, laboratory technicians and medical auxiliaries,” Médecins Sans Frontières, 2014.
[2] WHO, “Global tuberculosis report 2018,” WHO, 2018.
[3] WHO, “Chest radiography in tuberculosis detection – summary of current WHO recommendations and guidance on programmatic approaches,” WHO, 2016.
[4] A. Tingberg, “Quantifying the quality of medical x-ray images,” Lund University, 2000.
[5] G. S. Mahajan and S. R. Ganorkar, “Tuberculosis detection using neural network,” International Journal of Innovative Research in Computer and Communication Engineering, vol. 5, issue 5, May 2017, pp. 10391-10398, doi:10.15680/IJIRCCE.2017.0505366.
[6] C. C. Pawar and S. R. Ganorkar, “Tuberculosis screening using digital image processing techniques,” International Research Journal of Engineering and Technology, vol. 3, issue 7, pp. 623-627, July 2016.
[7] R. Krithika and K. Alice, “Automatic detection of tuberculosis in chest radiography using fuzzy classifier,” International Journal of Engineering and Computer Science, vol. 6, issue 3, March 2017, pp. 20721-20730, doi:10.18535/ijecs/v6i3.58.
[8] A. R. Chand and G. M. Raj, “Advanced tuberculosis detection system using chest radiographs,” International Advanced Research Journal in Science, vol. 3, special issue 3, August 2016, pp. 77-80, doi:10.17148/IARJSET.
[9] N. Zayed and H. A. Elnemr, “Statistical analysis of Haralick texture features to discriminate lung abnormalities,” International Journal of Biomedical Imaging, vol. 2015, article ID 267807, doi:10.1155/2015/267807.
[10] H. Murray, A. Lucieer, and R. Williams, “Texture-based classification of sub-Antarctic vegetation communities on Heard Island,” International Journal of Applied Earth Observation and Geoinformation, 2010, doi:10.1016/j.jag.2010.01.006.
[11] M. Mustafa, M. N. Taib, Z. Murat, and L. Sahrim, “GLCM texture feature reduction for EEG spectrogram image using PCA,” IEEE Student Conference on Research and Development, Dec. 2010, doi:10.1109/SCORED.2010.5704047.
[12] B. Antony and N. Banu P. K, “Lung tuberculosis detection using x-ray images,” International Journal of Applied Engineering Research, vol. 12, no. 24, 2017, pp. 15196-15201.
[13] C. Liu, Y. Cao, M. Alcantara, B. Liu, M. Brunette, J. Peinado, and W. Curioso, “TX-CNN: detecting tuberculosis in chest x-ray images using convolutional neural network,” 2017 IEEE International Conference on Image Processing (ICIP).
[14] M. Kadam and A. Dhole, “Brain tumor detection using GLCM with the help of KSVM,” International Journal of Engineering and Technical Research, vol. 7, issue 2, February 2017.
[15] Z. Camlica, H. R. Tizhoosh, and F. Khalvati, “Medical image classification via SVM using LBP features from saliency-based folded data,” 2015 IEEE 14th International Conference on Machine Learning and Applications.
[16] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wang, P.-X. Lu, and G. Thoma, “Two public chest x-ray datasets for computer-aided screening of pulmonary diseases,” Quantitative Imaging in Medicine and Surgery, vol. 4, no. 6, p. 475, 2014.