Available online at www.sciencedirect.com

ScienceDirect

www.elsevier.com/locate/procedia
Procedia Computer Science 132 (2018) 40–46

International Conference on Computational Intelligence and Data Science (ICCIDS 2018)


Application of Feature Extraction and Classification Methods for Histopathological Image using
GLCM, LBP, LBGLCM, GLRLM and SFTA

Şaban Öztürk a,*, Bayram Akdemir b

a Amasya University, Amasya, 05000, Turkey
b Selçuk University, Konya, 42000, Turkey

Abstract

Classification of histopathologic images and identification of cancerous areas is quite challenging due to image background
complexity and resolution. The difference between normal tissue and cancerous tissue is very small in some cases. So, the
features of the tissue patches in the image have key importance for automatic classification. Using only one feature or using a
few features leads to poor classification results because of the small difference between the textures. In this study, the
classification results are compared using different feature extraction algorithms that can extract various features from
histopathological image texture. For this study, GLCM, LBP, LBGLCM, GLRLM and SFTA algorithms which are successful
feature extraction algorithms have been chosen. The features obtained from these methods are classified with SVM, KNN, LDA
and Boosted Tree classifiers. The most successful feature extraction algorithm for histopathological images is determined and the
most successful classification algorithm is determined.

© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Computational Intelligence and
Data Science (ICCIDS 2018).

* Corresponding author. Tel.: +905065702451
E-mail address: saban.ozturk@amasya.edu.tr

1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
10.1016/j.procs.2018.05.057

Keywords: feature extraction; GLCM; LBP; LBGLCM; GLRLM; SFTA; SVM; LDA; KNN; histopathological image.

1. Introduction

Image processing techniques play an important role in the interpretation of medical images and in automatic
diagnosis. Especially in recent years, the development of whole-slide imaging techniques and the increase in cancer
cases have drawn many researchers to automatic histopathological image analysis [1]. Whole-slide images have a
very high resolution, so examining them takes experts a long time. Computer-aided automatic image processing
methods have been proposed to ease this exhausting process. These methods help the expert decide on the analysis
of the image, and in some cases assume the role of decision maker [2]. Image features are used in the automatic
classification of images and in the decision-making process. Many features, such as texture differences, shape
differences, light fluctuations and color changes in the image, provide useful information for classification
algorithms. The most important point is to determine the correct features and to select an appropriate
classification algorithm for them. Different feature extraction algorithms can yield different classification
results for the same image [3]. Therefore, feature selection is one of the most important steps in classification.
The purpose of feature extraction algorithms is to identify features that best represent the image while using
fewer parameters. With such features, the image can be expressed meaningfully using fewer parameters. Eliminating
unimportant parameters makes classification faster and more successful at a lower computational load [4]. Both
low-level and high-level features are commonly extracted from images. Low-level features are simpler and have a
lower computational load, but their classification success is low for complex images. High-level features are more
complex and have a higher computational load. The choice of features depends on the problem. For this reason, the
literature contains many feature extraction algorithms with different approaches.
In this study, large histopathological images are divided into small image pieces of 128x128 pixels. Each image
piece is labeled as normal tissue or cancer tissue. Although each tissue type has its own characteristics, it is
often difficult to distinguish cancerous tissue from normal tissue. In general, cells in cancerous tissues tend to
expand, the tissue color becomes lighter, and irregularities appear in the cell arrangement. In normal tissue, the
cell shapes are more regular and the color is darker. However, classification success for such complex structures
remains low when only these low-level features are used. For this reason, different algorithms capable of
extracting higher-level features are tested on histopathologic images in this study: the GLCM, LBP, LBGLCM, GLRLM
and SFTA feature extraction algorithms, which have been successful in the literature. Each algorithm is applied in
turn to the whole data set, and a feature matrix is extracted for each image. The feature matrices obtained from
the labelled images are then classified in turn using the SVM, KNN, LDA and Boosted Tree algorithms. The purpose of
the study is to determine the feature extraction algorithm that yields the most appropriate features for
histopathological images. The experiments identify the feature extraction algorithm that best represents image
texture and the classification algorithm that best classifies the resulting features.

2. Feature Extraction Methods from Histopathological Images

Histopathological images are quite large, and processing large images is time consuming. For this reason, the
images are divided into smaller pieces to shorten the processing time. Feature extraction is then applied to each
image piece. The descriptive feature matrices obtained from the feature extraction process are classified by
classification algorithms. The method is shown in Fig. 1.

[Figure: Whole-Slide Image → Image Pieces → Feature Extraction (GLCM, LBP, LBGLCM, GLRLM, SFTA) → Classification (SVM, KNN, LDA, Boosted Tree) → Classified Image Pieces]
Fig. 1. System overview

a) Gray level co-occurrence matrix (GLCM) is a popular texture-based feature extraction method. The GLCM
captures the textural relationship between pixels using second-order statistics of the image. Pixels are usually
considered in pairs for this operation [5]. The GLCM records how frequently each combination of pixel brightness
values occurs, that is, the frequency of occurrence of pixel pairs [6]. The GLCM of an image is a matrix whose
numbers of rows and columns both equal the number of gray values in the image. Each element of this matrix depends
on how often the two specified pixel values occur together, and the pixel pairs considered depend on the chosen
neighborhood. The matrix elements contain the second-order statistical probability values for the gray values given
by the row and column. If the range of intensity values is wide, the co-occurrence matrix is quite large, which
makes the computation time consuming [7].
The GLCM features used in this study are as follows: autocorrelation, contrast, correlation, cluster prominence,
cluster shade, dissimilarity, energy, entropy, homogeneity, maximum probability, sum of squares (variance), sum
average, sum variance, sum entropy, difference variance, difference entropy, information measure of correlation,
inverse difference normalized, inverse difference moment normalized. Using these features, a GLCM feature matrix
that can represent an image with fewer parameters is generated.
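
The construction described above can be illustrated with a minimal sketch. It builds a symmetric, normalized co-occurrence matrix for horizontally adjacent pixels and derives four of the listed features. The function names, the offset, the toy 4-level patch and the use of plain Python lists are assumptions made for illustration. The feature formulas are commonly used definitions, which the paper does not spell out.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """A few texture features computed from a normalized GLCM p."""
    idx = range(len(p))
    return {
        "contrast": sum((i - j) ** 2 * p[i][j] for i in idx for j in idx),
        "energy": sum(p[i][j] ** 2 for i in idx for j in idx),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i in idx for j in idx),
        "entropy": -sum(p[i][j] * math.log(p[i][j])
                        for i in idx for j in idx if p[i][j] > 0),
    }

# Toy patch with 4 gray levels (illustrative values, not taken from the dataset).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch, levels=4)
features = glcm_features(p)
```

The matrix has one row and one column per gray level, so an image quantized to 256 levels would need a 256x256 matrix. This is the size problem noted above for wide intensity ranges.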
b) Local binary pattern (LBP) is a useful feature extraction algorithm that is robust to illumination
variations. The LBP process can be described simply as follows: a window with a specified neighborhood size is
moved over the image, and a label is assigned to the center pixel of each window. In this process, the pixels
adjacent to the center pixel are thresholded against the center pixel value. The LBP code is then calculated from
the thresholded local neighborhood values, taken in clockwise or counterclockwise order. In this way, the
statistical and structural model of the texture is calculated mathematically [8]. The most important properties of
the LBP algorithm are its robustness to gray-level changes and its computational simplicity, which makes it usable
in real-time applications [9]. Equations 1 and 2 are used for labeling the pixels.

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p \tag{1}$$

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{2}$$

where g_c represents the gray value of the center pixel and g_p represents the gray values of its neighbors. P
represents the number of neighbors and R represents the radius of the neighborhood. In this study, a feature matrix
containing 10 image features is obtained for each image using the LBP algorithm. Classifiers are then trained using
these feature matrices.
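
As a worked illustration of Equations 1 and 2, the sketch below computes the LBP code of a pixel for P = 8 and R = 1 and then summarizes an image as a 10-bin histogram. The paper does not state how its 10 LBP features are derived. The rotation-invariant uniform mapping used here (P + 2 = 10 bins) is one common choice and is an assumption, as are the function names and the sample values.

```python
# Neighbour offsets (row, column), visited clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """LBP code of pixel (r, c) for P = 8, R = 1, following Equations 1 and 2."""
    gc = image[r][c]
    code = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] - gc >= 0:  # s(g_p - g_c) = 1
            code += 2 ** p
    return code

def uniform_label(code, P=8):
    """Map a code to one of P + 2 labels (assumed uniform-pattern mapping)."""
    bits = [(code >> p) & 1 for p in range(P)]
    transitions = sum(bits[p] != bits[(p + 1) % P] for p in range(P))
    return sum(bits) if transitions <= 2 else P + 1

def lbp_histogram(image, P=8):
    """Normalized histogram of uniform LBP labels over all interior pixels."""
    hist = [0] * (P + 2)
    n = 0
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[uniform_label(lbp_code(image, r, c), P)] += 1
            n += 1
    return [h / n for h in hist]

# Illustrative 3x3 neighbourhood with center value 6.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch, 1, 1)   # neighbours >= 6 set bits 0, 4, 5, 6 and 7
hist = lbp_histogram(patch)
```

Because only the sign of g_p - g_c enters the code, adding a constant to every pixel leaves the result unchanged, which is the robustness to gray-level changes mentioned above.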

c) Local Binary Gray Level Co-occurrence Matrix (LBGLCM) feature extraction method combines the LBP and GLCM
algorithms. First, the LBP operator is applied to the raw image to create an LBP texture image. Then, the GLCM
features of this LBP image are extracted [6]. The traditional GLCM algorithm operates on a pixel and its next
neighbor pixel when extracting features and does not consider other local patterns in the image. In the LBGLCM
method, however, features are extracted by considering the whole texture structure together with spatial
information. Thanks to these advantages, the LBGLCM algorithm can produce more successful results than the GLCM
algorithm in many image processing applications [10]. In this study, the LBGLCM features of the histopathological
images are computed with the same formulas as the GLCM features.
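
The two-stage idea can be sketched as follows: an LBP image is computed first, and a co-occurrence matrix is then built over the LBP codes rather than the raw gray values. This is a minimal illustration under stated assumptions (P = 8, R = 1, horizontal neighbours, a sparse dictionary instead of a 256x256 matrix, hypothetical function names), not the authors' implementation.

```python
from collections import Counter

# Neighbour offsets (row, column), visited clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(image):
    """LBP code (8 neighbours, radius 1) for every interior pixel."""
    return [
        [
            sum(2 ** p for p, (dr, dc) in enumerate(OFFSETS)
                if image[r + dr][c + dc] >= image[r][c])
            for c in range(1, len(image[0]) - 1)
        ]
        for r in range(1, len(image) - 1)
    ]

def cooccurrence(codes):
    """Symmetric, normalized co-occurrence of horizontally adjacent LBP codes."""
    counts = Counter()
    for row in codes:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def contrast(p):
    """GLCM contrast, evaluated on the sparse co-occurrence dictionary."""
    return sum((i - j) ** 2 * v for (i, j), v in p.items())

# Illustrative image: one bright pixel on a flat background.
image = [
    [1, 1, 1, 1],
    [1, 9, 1, 1],
    [1, 1, 1, 1],
]
codes = lbp_image(image)   # the bright pixel gets code 0, its flat neighbour 255
lbglcm_contrast = contrast(cooccurrence(codes))
```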
d) Gray Level Run Length Matrix (GLRLM) is a texture representation model that extracts spatial features of the
pixels based on higher-order statistics [11]. The result of this process is a 2D matrix in which each element gives
the number of runs of a given gray level and run length in the given direction [5]. Let P(i, j) denote this
run-length matrix, where i indexes the G gray levels and j indexes the R run lengths. The GLRLM features used in
this study are obtained using the following equations:

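$$SRE = \sum_{i=1}^{G} \sum_{j=1}^{R} \frac{P(i,j)}{j^2} \tag{3}$$

$$LRE = \sum_{i=1}^{G} \sum_{j=1}^{R} j^2 P(i,j) \tag{4}$$

$$GLN = \sum_{i=1}^{G} \left( \sum_{j=1}^{R} P(i,j) \right)^2 \tag{5}$$

$$RLN = \sum_{j=1}^{R} \left( \sum_{i=1}^{G} P(i,j) \right)^2 \tag{6}$$

$$RP = \frac{1}{n} S \tag{7}$$

$$LGRE = \sum_{i=1}^{G} \sum_{j=1}^{R} \frac{P(i,j)}{i^2} \tag{8}$$

$$HGRE = \sum_{i=1}^{G} \sum_{j=1}^{R} i^2 P(i,j) \tag{9}$$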
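
To make Equations 3 to 9 concrete, the following sketch builds a horizontal run-length matrix for a small image and evaluates the seven features. Several points are assumptions: the horizontal direction, the function names and the toy image are illustrative, and the symbols of Equation 7 are not defined in the text, so RP is taken here as the number of runs divided by the number of pixels, which is a common definition. Many references additionally divide the other features by the number of runs. The sketch follows the unnormalized forms given above.

```python
def glrlm(image, levels):
    """Horizontal run-length matrix: P[i][j - 1] = runs of gray level i with length j."""
    max_run = len(image[0])
    P = [[0] * max_run for _ in range(levels)]
    for row in image:
        run_value, run_length = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_length += 1
            else:
                P[run_value][run_length - 1] += 1
                run_value, run_length = value, 1
        P[run_value][run_length - 1] += 1  # close the last run of the row
    return P

def glrlm_features(P, n_pixels):
    G, R = len(P), len(P[0])
    # Equations 3-9 use 1-based gray levels and run lengths, so index k maps to k + 1.
    cells = [(i + 1, j + 1, P[i][j]) for i in range(G) for j in range(R)]
    n_runs = sum(v for _, _, v in cells)
    return {
        "SRE": sum(v / j ** 2 for _, j, v in cells),
        "LRE": sum(j ** 2 * v for _, j, v in cells),
        "GLN": sum(sum(row) ** 2 for row in P),
        "RLN": sum(sum(P[i][j] for i in range(G)) ** 2 for j in range(R)),
        "RP": n_runs / n_pixels,  # assumed reading of Equation 7
        "LGRE": sum(v / i ** 2 for i, _, v in cells),
        "HGRE": sum(i ** 2 * v for i, _, v in cells),
    }

# Illustrative image with 3 gray levels (0, 1, 2).
image = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 2],
]
P = glrlm(image, levels=3)
features = glrlm_features(P, n_pixels=12)
```

Long runs raise LRE and lower SRE, so a coarse texture and a fine texture with the same gray-level histogram give different feature values.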

e) Segmentation-based Fractal Texture Analysis (SFTA) is a successful feature extraction algorithm that performs
fractal analysis of image texture [12]. The SFTA algorithm consists of two main steps. In the first step,
multi-level thresholding is applied to the gray-level input image, converting it into a set of binary images. The
most commonly used method for this operation is Two-Threshold Binary Decomposition (TTBD). In the second step,
features are extracted from each binary image. The fractal measurements in the SFTA algorithm describe the boundary
complexity of the objects and structures in the image [13]. Let I(x, y) be the gray-level input image for the TTBD
algorithm. Each selected pair of thresholds, a lower threshold t_l and an upper threshold t_u, is applied as in
Equation 10.

$$I_b(x,y) = \begin{cases} 1, & \text{if } t_l < I(x,y) \le t_u \\ 0, & \text{otherwise} \end{cases} \tag{10}$$


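Equation 11 defines the border image Δ(x, y) of each binary image, from which the SFTA features are determined;
N_8[(x, y)] denotes the 8-connected neighborhood of pixel (x, y).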

$$\Delta(x,y) = \begin{cases} 1, & \text{if } \exists\,(x',y') \in N_8[(x,y)] : I_b(x',y') = 0 \,\wedge\, I_b(x,y) = 1 \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

Classifiers: Classification algorithms are essential for a system to make decisions automatically. For a system
to decide independently of the human factor, its classification algorithm must first be trained on examples [14].
For this reason, many classification algorithms have been proposed in the literature. These algorithms operate on
the extracted features. In most cases, existing classification algorithms can be applied to many problems with
successful results. However, more specialized classification algorithms have been developed for some problems.
In this study, the Support Vector Machine (SVM) [15], K-nearest neighbors (KNN) [16], linear discriminant analysis
(LDA) [17] and Boosted Tree [18] algorithms, which are frequently used in the literature and produce successful
results, are used. The feature matrices obtained from the feature extraction algorithms are classified by these
algorithms. Because the classifiers have different characteristics, their classification results differ even for
the same feature matrices.
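
As an illustration of how a classifier consumes a feature matrix, the sketch below implements a minimal K-nearest-neighbors vote, the simplest of the four classifiers. The two-dimensional feature vectors, the labels and the choice k = 3 are made-up values for illustration and are not taken from the paper's experiments.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors (one row per image piece) and their class labels.
train_features = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.15],   # normal tissue
    [0.90, 0.80], [0.80, 0.90], [0.85, 0.85],   # cancer tissue
]
train_labels = ["normal"] * 3 + ["cancer"] * 3

prediction = knn_predict(train_features, train_labels, [0.8, 0.8])
```

KNN compares raw distances, so features on very different scales would normally be rescaled first. That step is omitted from this sketch.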

3. Experiments and Experimental Results

In the experiments, 1416 histopathological images are used. These are gray-level images of 128x128 pixels, each
cut from a large-scale whole-slide histopathological image. In total, there are 708 cancerous image pieces and 708
normal image pieces. Of these, 1016 images (508 cancer tissue images and 508 normal tissue images) are used to
train the classifiers, and 400 images (200 cancer tissue images and 200 normal tissue images) are used for testing.
The proposed method is applied to a whole-slide histopathological image as follows: first, the whole-slide image
is divided into pieces of the determined size. Each image piece is classified with a trained classifier. Each
classified piece is then placed back at its position in the original image. Some sample images from the dataset
are shown in Fig. 2.

Fig. 2. Specimens of histopathological images, a) normal tissue, b) cancer tissue
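
The division of a large image into 128x128 pieces can be sketched as follows. Each piece is returned with its top-left position so that a classified piece can be placed back in the original image. Dropping incomplete pieces at the right and bottom edges is an assumption, since the paper does not say how image borders are handled. The function name and the synthetic image are illustrative.

```python
def split_into_patches(image, size=128):
    """Cut an image (list of rows) into non-overlapping size x size pieces."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            patch = [row[left:left + size] for row in image[top:top + size]]
            patches.append(((top, left), patch))  # keep the position for reassembly
    return patches

# Synthetic 256x384 gray-level image standing in for a whole-slide region.
image = [[(r + c) % 256 for c in range(384)] for r in range(256)]
patches = split_into_patches(image)
```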

Feature extraction algorithms and classifiers are implemented on a computer with an Intel Core i7-7700K (4.2
GHz) processor, 32 GB DDR4 RAM and an NVIDIA GeForce GTX 1080 graphics card. The Camelyon challenge dataset is
used for the experiments [19].
Images taken from the real world often contain noise and other disturbing factors, which reduce the success of
image processing algorithms. To minimize these effects, preprocessing methods are applied to the images [20]. A
preprocessing pipeline combines several algorithms to obtain the desired result. The preprocessing algorithms are
chosen according to the type of image, the noise level, or the requirements of the main algorithm. After this
process, the image has improved contrast and lower noise. In this study, preprocessing is applied to the
histopathological images before feature extraction, because false feature coefficients caused by noise and other
undesirable factors should not appear in the feature matrices. First, a 2D Gaussian smoothing filter is applied to
the image, as in Equation 12.

$$G(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{12}$$
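
Equation 12 can be sampled on a small grid to obtain a discrete smoothing kernel. The kernel size and the value of sigma below are hypothetical, since the paper does not report them. Rescaling the weights so that they sum to 1 is a common practical step that keeps the overall image brightness unchanged, and it is added here as an assumption.

```python
import math

def gaussian_kernel(size, sigma):
    """Sample Equation 12 on a size x size grid centered at (0, 0) and normalize it."""
    half = size // 2
    kernel = [
        [
            math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            for x in range(-half, half + 1)
        ]
        for y in range(-half, half + 1)
    ]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]  # weights now sum to 1

kernel = gaussian_kernel(size=5, sigma=1.0)  # hypothetical size and sigma
```

Smoothing then amounts to replacing each pixel with the kernel-weighted average of its neighbourhood.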

In the second step, the two-dimensional image matrix is flattened into a single row and the median of its values
is calculated [21], giving a single scalar. This value is then subtracted from the pixels of the original image,
which reduces the gray-level fluctuations and brightness in the image background. At this stage, however, the
important details of the image are blurred, and this blurring makes it difficult to capture important features.
For this reason, the image is sharpened in the third step with a high-pass filter so that the cells and cell
boundaries become more apparent. The gray-level transitions at object edges are emphasized, and the objects in the
image become clearly visible. However, noise resembling small dots remains in the image. In the last step of
preprocessing, a 2D median filter is used to remove this noise.
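
Two of the preprocessing steps above, the global median subtraction and the final 2D median filter, are easy to sketch. The 3x3 window and the choice to leave border pixels unchanged are assumptions, because the paper does not give the filter size or border handling. The function names and the sample image are illustrative.

```python
from statistics import median

def subtract_global_median(image):
    """Flatten the image, take the median, and subtract it from every pixel."""
    m = median(v for row in image for v in row)
    return [[v - m for v in row] for row in image]

def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are left unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out

# Illustrative image: a flat background with a single bright noise dot.
image = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
centered = subtract_global_median(image)
filtered = median_filter_3x3(image)
```

The isolated dot disappears after median filtering because it is never the middle value of any 3x3 window.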
Feature extraction is performed after the images are cleaned. All preprocessed images in the training dataset are
used to compare the success of the different feature extraction algorithms. The GLCM, LBP, LBGLCM, GLRLM and SFTA
algorithms are applied to each image in turn, and each algorithm outputs a separate feature matrix for each image.
In the training phase, each feature matrix contains the image features plus 1 class label: GLCM yields 22 image
features, LBP 10, LBGLCM 22, GLRLM 7 and SFTA 27.
Four classification algorithms are trained using the obtained feature matrices and label values: SVM, KNN, LDA
and Boosted Tree, which are frequently used in the literature and can produce successful results. The test images
are then classified using the trained classifiers. In this way, the performance of the feature extraction
algorithms and classifiers on histopathological images is compared.
Table 1 compares the performance of the 5 feature extraction algorithms and 4 classification algorithms used for
histopathological images. The highest success is obtained when the feature matrix produced by SFTA is classified
by the Boosted Tree algorithm, and the second highest when it is classified by SVM. The lowest success in the
table is obtained when the feature matrix produced by LBP is classified with KNN. Overall, SFTA has the highest
success with every classifier, and LBP has lower results than the other feature extraction algorithms. Among the
classification algorithms, SVM and Boosted Tree produce the highest success.

Table 1. Comparison of Classification Results

Classifier      GLCM    LBP     LBGLCM  GLRLM   SFTA
SVM             92.8%   89.6%   92.9%   91.7%   94%
KNN             91.6%   84.2%   90.6%   87.6%   93.4%
LDA             90.3%   84.5%   91.5%   90.3%   92.6%
BOOSTED TREE    92.8%   89.8%   92.2%   91.8%   94.3%
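
The comparisons drawn from Table 1 can be checked with a few lines of arithmetic. The sketch below copies the values from the table and derives the best feature extractor per classifier, the best and worst combinations, and simple averages. The averages are an added summary for illustration and are not reported in the paper.

```python
# Classification results from Table 1, in percent.
results = {
    "SVM":          {"GLCM": 92.8, "LBP": 89.6, "LBGLCM": 92.9, "GLRLM": 91.7, "SFTA": 94.0},
    "KNN":          {"GLCM": 91.6, "LBP": 84.2, "LBGLCM": 90.6, "GLRLM": 87.6, "SFTA": 93.4},
    "LDA":          {"GLCM": 90.3, "LBP": 84.5, "LBGLCM": 91.5, "GLRLM": 90.3, "SFTA": 92.6},
    "BOOSTED TREE": {"GLCM": 92.8, "LBP": 89.8, "LBGLCM": 92.2, "GLRLM": 91.8, "SFTA": 94.3},
}

# Best feature extraction algorithm for each classifier.
best_per_classifier = {clf: max(row, key=row.get) for clf, row in results.items()}

# Best and worst (result, classifier, feature extractor) combinations.
cells = [(acc, clf, feat) for clf, row in results.items() for feat, acc in row.items()]
best, worst = max(cells), min(cells)

# Average result per feature extractor and per classifier.
feature_means = {
    feat: sum(results[clf][feat] for clf in results) / len(results)
    for feat in results["SVM"]
}
classifier_means = {clf: sum(row.values()) / len(row) for clf, row in results.items()}
top_two_classifiers = sorted(classifier_means, key=classifier_means.get, reverse=True)[:2]
```

Averaged over the four classifiers, SFTA comes out highest (about 93.6%) and LBP lowest (about 87.0%), while SVM and Boosted Tree have the two highest averages over the five feature extractors.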

4. Conclusion

In this study, well-known feature extraction and classification algorithms are compared on histopathologic
images. The images used in the experiments are cut from whole-slide histopathologic images into small pieces to
reduce processing time. Feature matrices extracted from the image pieces by GLCM, LBP, LBGLCM, GLRLM and SFTA are
classified by SVM, KNN, LDA and Boosted Tree, and the results are compared in a table. The feature matrices
obtained by the SFTA algorithm produce more successful results than those of the other feature extraction
algorithms, while the LBP algorithm produces the least successful results. Among the classification algorithms,
SVM and Boosted Tree are the most successful. The most successful combination is SFTA with Boosted Tree, at 94.3%.

References

[1] Sertel, O., Lozanski, G., Shana’ah, A., & Gurcan, M. N. (2010). Computer-aided detection of centroblasts for follicular lymphoma grading
using adaptive likelihood-based cell segmentation. IEEE Transactions on Biomedical Engineering, 57(10), 2613-2616.
[2] Mikhaylov, V. V., & Bakhshiev, A. V. (2017). The System for Histopathology Images Analysis of Spinal Cord Slices. Procedia Computer
Science, 103, 239-243.
[3] Nabizadeh, N., & Kubat, M. (2015). Brain tumors detection and segmentation in MR images: Gabor wavelet vs. statistical features.
Computers & Electrical Engineering, 45, 286-301.
[4] Nagarajan, G., Minu, R. I., Muthukumar, B., Vedanarayanan, V., & Sundarsingh, S. D. (2016). Hybrid Genetic Algorithm for Medical Image
Feature Extraction and Selection. Procedia Computer Science, 85, 455-462.
[5] Albregtsen, F., Nielsen, B., & Danielsen, H. E. (2000). Adaptive gray level run length features from class distance matrices. In Pattern
Recognition, 2000. Proceedings. 15th International Conference on (Vol. 3, pp. 738-741). IEEE.
[6] Sastry, S. S., Kumari, T. V., Rao, C. N., Mallika, K., Lakshminarayana, S., & Tiong, H. S. (2012). Transition temperatures of thermotropic
liquid crystals from the local binary gray level cooccurrence matrix. Advances in Condensed Matter Physics, 2012.
[7] Mohanaiah, P., Sathyanarayana, P., & GuruKumar, L. (2013). Image texture feature extraction using GLCM approach. International Journal
of Scientific and Research Publications, 3(5), 1.
[8] Heikkilä, M., Pietikäinen, M., & Schmid, C. (2009). Description of interest regions with local binary patterns. Pattern Recognition, 42(3),
425-436.
[9] Gunay, A., & Nabiyev, V. V. (2008, October). Automatic age classification with LBP. In Computer and Information Sciences, 2008. ISCIS'08.
23rd International Symposium on (pp. 1-4). IEEE.
[10] Nanni, L., Brahnam, S., Ghidoni, S., Menegatti, E., & Barrier, T. (2013). Different approaches for extracting information from the
co-occurrence matrix. PloS one, 8(12), e83554.
[11] Mohanty, A. K., Beberta, S., & Lenka, S. K. (2011). Classifying benign and malignant mass using GLCM and GLRLM based texture features
from mammogram. International Journal of Engineering Research and Applications, 1(3), 687-693.
[12] Costa, A. F., Humpire-Mamani, G., & Traina, A. J. M. (2012, August). An efficient algorithm for fractal analysis of textures. In Graphics,
Patterns and Images (SIBGRAPI), 2012 25th SIBGRAPI Conference on (pp. 39-46). IEEE.
[13] Saraswathi, D., Sharmila, G., & Srinivasan, E. (2014, February). An automated diagnosis system using wavelet based SFTA texture features.
In Information Communication and Embedded Systems (ICICES), 2014 International Conference on (pp. 1-5). IEEE.
[14] Zhang, Y., Lee, K., & Lee, H. (2016, June). Augmenting supervised neural networks with unsupervised objectives for large-scale image
classification. In International Conference on Machine Learning (pp. 612-621).
[15] Vapnik, V., & Izmailov, R. (2017). Knowledge transfer in SVM and neural networks. Annals of Mathematics and Artificial Intelligence, 1-17.
[16] Tanveer, M., Shubham, K., Aldhaifallah, M., & Ho, S. S. (2016). An efficient regularized K-nearest neighbor based weighted twin support
vector regression. Knowledge-Based Systems, 94, 70-87.
[17] Treder, M. S., Porbadnigk, A. K., Avarvand, F. S., Müller, K. R., & Blankertz, B. (2016). The LDA beamformer: Optimal estimation of ERP
source time series using linear discriminant analysis. NeuroImage, 129, 279-291.
[18] Al Shamsi, F., & Aung, Z. (2016, December). Automatic patent classification by a three-phase model with document frequency matrix and
boosted tree. In Electronic Devices, Systems and Applications (ICEDSA), 2016 5th International Conference on (pp. 1-4). IEEE.
[19] https://camelyon16.grand-challenge.org/
[20] Öztürk, Ş., & Akdemir, B. (2017). Fuzzy logic-based segmentation of manufacturing defects on reflective surfaces. Neural Computing and
Applications, 1-10. https://doi.org/10.1007/s00521-017-2862-6
[21] Pang, J., Zhang, S., & Zhang, S. (2016, May). A median filter based on the proportion of the image variance. In Information Technology,
Networking, Electronic and Automation Control Conference, IEEE (pp. 123-127). IEEE.
