Micron 45 (2013) 97–106

Contents lists available at SciVerse ScienceDirect

Micron
journal homepage: www.elsevier.com/locate/micron

Machine learning approach for automated screening of malaria parasite using light microscopic images

Dev Kumar Das a, Madhumala Ghosh a, Mallika Pal b, Asok K. Maiti b, Chandan Chakraborty a,∗

a School of Medical Science and Technology, IIT Kharagpur, India
b Department of Pathology, Midnapur Medical College & Hospital, Midnapur, West Bengal, India

Article info

Article history:
Received 24 November 2011
Received in revised form 3 November 2012
Accepted 6 November 2012

Keywords:
Malaria parasite
Erythrocyte
Texture
Bayesian classifier
Machine learning

Abstract

This paper addresses the development of computer assisted malaria parasite characterization and classification using a machine learning approach based on light microscopic images of peripheral blood smears. To this end, microscopic image acquisition from stained slides, illumination correction and noise reduction, erythrocyte segmentation, feature extraction, feature selection and finally classification of different stages of malaria (Plasmodium vivax and Plasmodium falciparum) have been investigated. The erythrocytes are segmented using marker controlled watershed transformation, and subsequently a total of ninety six features describing the shape, size and texture of erythrocytes are extracted to distinguish parasite infected from non-infected cells. Ninety four features are found to be statistically significant in discriminating the six classes. A feature selection-cum-classification scheme has been devised by combining the F-statistic with statistical learning techniques, i.e., Bayesian learning and support vector machine (SVM), in order to obtain higher classification accuracy using the best set of discriminating features. Results show that the Bayesian approach provides its highest accuracy, 84%, for malaria classification by selecting the 19 most significant features, while SVM provides its highest accuracy, 83.5%, with the 9 most significant features. Finally, the performance of these two classifiers under the feature selection framework has been compared for malaria parasite classification.
© 2012 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +91 3222 283570; fax: +91 3222 282221.
E-mail address: chandanc@smst.iitkgp.ernet.in (C. Chakraborty).

0968-4328/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.micron.2012.11.002

1. Introduction

Malaria is a parasitic infectious disease caused by Plasmodium species, viz. Plasmodium falciparum (P. falciparum), Plasmodium vivax (P. vivax), Plasmodium malariae (P. malariae) and Plasmodium ovale (P. ovale) (Greer et al., 2009). The parasite exhibits a complex life cycle involving an insect vector (mosquito) and a vertebrate host (human). Malaria is common in Asian and sub-Saharan African populations (Frean, 2010) and is responsible for 1.5–2.7 million deaths per year (Raviraja et al., 2006). In the Indian population, the incidence of P. vivax infection is higher than that of P. falciparum: ∼50–60% of malaria patients are affected by P. vivax, while ∼40–50% are affected by P. falciparum (NVBDCP, 2010-2011).

As with other diseases, early detection of malaria infection enables prevention and cure through timely treatment and management. The malaria parasites mainly affect red blood cells (erythrocytes). In human blood, the parasite cycles through three life stages, viz. trophozoite, schizont and gametocyte. These infection stages are visible under a light microscope in peripheral blood smears. The trophozoite stage is often known as the ring stage (WHO, 2010). In P. falciparum infection, the trophozoite and gametocyte stages are visible under the microscope, but the schizont stage is rarely visible because it remains in the capillaries and bone marrow (Cuomo et al., 2012). Thus, during peripheral blood smear screening, all three stages are visible in P. vivax infection and two stages (trophozoite and gametocyte) in P. falciparum infection. Clinicians examine erythrocytes under a light microscope to study the color and morphological changes for malaria diagnosis. The accuracy of the evaluation depends mostly on the expert's clinico-pathological understanding. Such a procedure is therefore subject to human error arising from subjectivity, which leads to inconsistent and lower diagnostic accuracy. To increase diagnostic precision by minimizing this subjectivity, the development of a computer assisted malaria parasite detection tool has gained importance in modern pathological services, where it can help a clinician reach better decisions quickly.

In modern diagnostic system development, machine learning techniques have contributed substantially to achieving higher diagnostic precision in medical imaging informatics, including microscopy, ultrasound imaging, MRI, and CT. Microscopic image analysis is the most important as well as highly informative tool toward
pathological evaluation of different diseases, viz. hematological disorders, oral cancer, breast cancer, cervical cancer, etc. As in other microscopic image analyses, peripheral blood smear screening is one of the essential diagnostic techniques for identifying hematological disorders (anemia, thalassemia, etc.) and parasitic infections (malaria, filaria) in the blood. For malaria detection, pathologists frequently use a light microscope to detect infection in erythrocytes based on color as well as morphological changes of the erythrocyte.

Various techniques for malaria diagnosis are now available on the market (Tangpukdee et al., 2009), but conventional microscopy remains the gold standard. Other methods are not cost effective and require further improvement in diagnostic precision. A few studies have suggested computer vision approaches to detect malaria infection from digital microscopic images of peripheral blood smears. Color histogram based malaria parasite detection has been carried out (Tek et al., 2006). Diaz et al. (2009) quantified and classified P. falciparum infected erythrocytes. Ross et al. (2006) used morphological and novel threshold selection techniques to identify erythrocytes. Makkapati and Rao (2009) segmented the malaria parasite in HSV (hue, saturation, value) color space. Erythrocytes infected by malaria parasites have been detected using a statistical approach (Raviraja et al., 2008). Mathematical morphology and granulometry (Dempster and Ruberto, 1999) and gray level thresholding (Toha and Ngah, 2007) have been applied to estimate parasitemia. Kumar et al. (2006) suggested a clump splitting algorithm and a rule-based approach to separate clumped erythrocytes in peripheral blood smear images. Sio et al. (2007) applied a rule-based approach to detect P. falciparum infection. Most of these studies classified malaria using cultured blood smear samples. No comprehensive approach toward a pathological decision support system is yet available for computerized detection of malaria parasitemia, viz. P. vivax and P. falciparum, from peripheral blood smear images.

In view of this, our study focuses on developing a machine learning approach for discriminating five stages of infected erythrocytes (three of P. vivax and two of P. falciparum) and non-infected erythrocytes using color, textural and morphological information. Fig. 1 depicts the systematic approach followed in developing the malaria screening tool.

Fig. 1. Work flow diagram of the proposed methodology.

2. Materials and methods

2.1. Sample collection

Thin peripheral blood smear samples were prepared and collected at Midnapur Medical College & Hospital, West Midnapur, India. The thin smears were prepared on clean, disinfected slides and stained with Leishman stain to visualize the different cellular components. Peripheral blood smear slides were collected from a total of 600 patients (50 normal, 496 P. vivax and 54 P. falciparum). Of these 600 slides, 150 (70 P. vivax, 40 P. falciparum and 40 normal) were considered, because the other slides were not well prepared and were not clearly visible under the microscope. All samples were verified and labeled by three pathologists as P. vivax, P. falciparum or normal cases. Approval for this study was obtained from the research ethics committee of the Institute.

2.1.1. Image acquisition

Leishman stained peripheral blood smear images of the malaria patients' slides were optically grabbed by Leica Observer (Leica DM750) under 100× oil objectives (NA 1.5150) in JPEG format, with an image size of 2048 × 1536. The effective magnification and pixel size were 1000 and 0.064 μm, respectively. All images were labeled by the pathologists, who marked each image with the P. vivax or P. falciparum infection stage. Fig. 2 shows grabbed images of different malaria infection stages.

Fig. 2. Original grabbed images. (a) Gametocyte and (b) trophozoite infection stages of P. vivax.

2.2. Preprocessing

2.2.1. Illumination correction

Staining variability of the blood smear and camera calibration cause changes in the illumination of the microscope images. Several illumination correction techniques have been applied in the literature. Here we use the gray world assumption (Lam, 2005) for correcting illumination. Fig. 3 shows the result of illumination correction. Mathematically, the gray world assumption is described as follows.

Let f(x, y) be the grabbed image of size M × N, with red, green and blue channels f_r(x, y), f_g(x, y) and f_b(x, y), respectively. The average value of each channel is calculated as

$$ F_r^{avg} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} f_r(x, y) \tag{1} $$

$$ F_g^{avg} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} f_g(x, y) \tag{2} $$

Fig. 3. Illumination corrected Images. (a–c) Original images and (d–f) illumination corrected images.
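As a minimal sketch of the gray world correction illustrated in Fig. 3, the code below averages each color channel over the image, leaves the green channel unchanged, and scales the red and blue channels so that their averages match the green average. The function name `gray_world_correct`, the nested-list image representation and the clipping to `max_value` are assumptions made for illustration; they are not taken from the original study.

```python
def gray_world_correct(image, max_value=255.0):
    """Gray world correction with green as the reference channel.

    image: list of rows, each row a list of (r, g, b) tuples.
    Assumes the red and blue channel averages are non-zero.
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)

    # Channel averages over all M x N pixels.
    avg_r = sum(p[0] for p in pixels) / n
    avg_g = sum(p[1] for p in pixels) / n
    avg_b = sum(p[2] for p in pixels) / n

    # Gains that bring the red and blue averages to the green average.
    gain_r = avg_g / avg_r
    gain_b = avg_g / avg_b

    # Clipping to max_value is an added safeguard (assumption).
    return [
        [(min(max_value, gain_r * r), float(g), min(max_value, gain_b * b))
         for (r, g, b) in row]
        for row in image
    ]


demo = [[(100, 120, 80), (60, 80, 40)]]
corrected = gray_world_correct(demo)
```

After correction, the three channel averages of `corrected` coincide (up to clipping), which is the defining property of the gray world assumption.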
$$ F_b^{avg} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} f_b(x, y) \tag{3} $$

In this method, the green channel is kept unchanged and the gains for the red and blue channels are computed as

$$ g_r = \frac{F_g^{avg}}{F_r^{avg}} \quad \text{and} \quad g_b = \frac{F_g^{avg}}{F_b^{avg}} $$

Subsequently, the red and blue channels are adjusted as

$$ f_r^{adj}(x, y) = g_r \times f_r(x, y) \quad \text{and} \quad f_b^{adj}(x, y) = g_b \times f_b(x, y) \tag{4} $$

2.2.2. Noise reduction

Noise removal is still a challenge in medical image analysis. Several filtering methods exist, but each has its own limitations. Based on quantitative performance measures (Thangavel et al., 2009), the geometric mean filter (Gonzalez and Woods, 2002) was selected for impulse noise removal. Table 1 shows the performance of five different filters and Fig. 4 shows their outputs. The geometric mean filter can be defined as

$$ f_1(x, y) = \left[ \prod_{(s,t) \in S_{xy}} g(s, t) \right]^{1/mn} \tag{5} $$

where g(s, t) is the gray image of the illumination corrected image and S_xy is the set of coordinates of a window of size m × n.

Table 1
Quantitative performance measure of different denoising filters.

Filtering technique      MSE             PSNR     RMSE    SNR
Median filter            1.025           48.021   1.012   23.949
Wiener                   1.070           47.837   1.034   23.764
Max filter               0.469           51.416   0.685   27.344
Geometric mean filter    0.360 × 10^-4   92.560   0.006   41.164
M3 filter                0.399 × 10^-3   82.115   0.020   30.719

2.3. Erythrocyte segmentation

A peripheral blood smear image consists of four components, viz. erythrocytes, leukocytes, platelets and plasma. Leukocytes and platelets differ morphologically from erythrocytes. In this study, erythrocytes are the main regions of interest. The marker controlled watershed algorithm (Gonzalez and Woods, 2002; Beucher and Meyer, 1993) has been applied to separate erythrocytes from the microscopic image. Fig. 5 shows the segmentation result for peripheral blood smear images. In principle, the marker controlled watershed algorithm consists of the following three steps.

Step 1 – Determine the gradient image using a Sobel filter.
Step 2 – Extract the foreground and background regions.
Step 3 – Compute the watershed transform.

2.4. Feature extraction

Feature extraction is a transformation of image data into another domain that produces features. In the present study, five different types of malaria infected erythrocytes and non-infected cells were discriminated from peripheral blood smear images. In the manual evaluation process, pathologists consider color and morphological variation to identify which types of infected erythrocytes are present on a particular peripheral blood smear slide. We have computed a total of 80 textural features (entropy, Haralick textural features, local binary pattern, fractal dimension, histogram based features, gray level run length matrix based texture) and 16 morphological features (shape features and Hu's moments) to discriminate the six types of infected and non-infected erythrocytes (see Table 2).

2.4.1. Entropy

Entropy is the measure of uncertainty associated with randomness. We have considered five entropy measures, viz. the Shannon, Renyi, Havrda and Charvat, and Kapur entropies (Pharwaha and Sing, 2009) and Yager's measure (Ghosh et al., 2010). Let I(x, y) be the erythrocyte (infected or non-infected) image with L distinct gray values, where N_i (i = 0, 1, 2, . . ., L−1) is the number of pixels having gray value i.

Fig. 4. Output images of the different types of impulse noise removal filter.
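The geometric mean filter compared in Fig. 4 can be sketched as follows. The product over the window is evaluated as the exponential of the mean logarithm, which is numerically safer than multiplying many gray values. The function name, the +1 offset that avoids log(0), and the border handling (windows are cropped at the image edge, so the exponent uses the actual number of pixels in the cropped window) are assumptions of this sketch rather than details given in the study.

```python
import math


def geometric_mean_filter(gray, m=3, n=3):
    """Apply an m x n geometric mean filter to a 2-D list of gray values."""
    rows, cols = len(gray), len(gray[0])
    half_m, half_n = m // 2, n // 2
    out = [[0.0] * cols for _ in range(rows)]

    for x in range(rows):
        for y in range(cols):
            log_sum = 0.0
            count = 0
            # Visit the window S_xy centred on (x, y), cropped at the borders.
            for s in range(x - half_m, x + half_m + 1):
                for t in range(y - half_n, y + half_n + 1):
                    if 0 <= s < rows and 0 <= t < cols:
                        # +1 offset so that zero-valued pixels are allowed.
                        log_sum += math.log(gray[s][t] + 1.0)
                        count += 1
            # exp(mean of logs) equals the count-th root of the product.
            out[x][y] = math.exp(log_sum / count) - 1.0
    return out


flat_patch = [[50, 50, 50], [50, 50, 50], [50, 50, 50]]
spike_patch = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
flat_out = geometric_mean_filter(flat_patch)
spike_out = geometric_mean_filter(spike_patch)
```

On a constant patch the filter returns the same constant, while an isolated bright impulse in `spike_patch` is pulled strongly toward its neighbors, which is the behavior wanted for impulse noise.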
Fig. 5. Marker controlled watershed segmentation result. (a) Original image and (b) segmented image.

Table 2
List of extracted quantitative features.

No. of features    Feature
1                  Havrda and Charvat entropy
2                  Kapur's entropy
3                  Renyi entropy
4                  Yager's measure
5                  Shannon entropy
6–24               Haralick textural features (19)
25                 Fractal dimension
26–31              Local binary pattern (6)
32–36              Histogram features (5)
37–45              Shape features (9)
46–52              Hu's moments (7)
53–96              GLRLM (44)

The normalized histogram can be defined for a particular region of interest of size (M × N) as

$$ H_i = \frac{N_i}{MN} \tag{6} $$

The Shannon entropy can be defined as

$$ S = -\sum_{i=0}^{L-1} H_i \log_2(H_i) \tag{7} $$

Similarly, the other entropy measures are described as follows.

Renyi entropy

$$ R = \frac{1}{1-\alpha} \log_2 \sum_{i=0}^{L-1} H_i^{\alpha}, \quad \text{where } \alpha \neq 1,\ \alpha > 0 \tag{8} $$

Havrda and Charvat's entropy

$$ HC = \frac{1}{1-\alpha} \left( \sum_{i=0}^{L-1} H_i^{\alpha} - 1 \right), \quad \text{where } \alpha \neq 1,\ \alpha > 0 \tag{9} $$

Kapur's entropy

$$ K_{\alpha,\beta} = \frac{1}{\beta-\alpha} \log_2 \frac{\sum_{i=0}^{L-1} H_i^{\alpha}}{\sum_{i=0}^{L-1} H_i^{\beta}}, \quad \text{where } \alpha \neq \beta,\ \alpha > 0,\ \beta > 0 \tag{10} $$

Yager's measure

$$ Y = 1 - \frac{\sum_{i=0}^{L-1} |2H_i - 1|}{|M \times N|} \tag{11} $$

In addition, five first order statistical features, viz. mean, variance, skewness, kurtosis and energy, were computed from the image histogram.

2.4.2. GLCM based textural features

Gray level co-occurrence matrix (GLCM) (Haralick et al., 1973) based textural features provide useful textural information in many diagnostic problems. In view of this, we used the GLCM and extracted 19 textural features such as energy, entropy, variance, and information measure of correlation (Tan et al., 2009, 2010).

Suppose I(x, y) denotes the segmented erythrocyte (infected or non-infected) image having N distinct gray level intensities (0, 1, 2, 3, . . ., N − 1). First, we calculate the GLCM of order N × N, where N is the number of gray levels. Based on this GLCM, Haralick described statistical features for describing the textural pattern of an image. Some of these are calculated as follows.

If P(i, j) is the normalized dependence matrix and N is the number of gray levels present in the erythrocyte, then

Entropy

$$ I_1 = -\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} P(i, j) \log(P(i, j)) \tag{12} $$

Energy

$$ I_2 = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} P(i, j)^2 \tag{13} $$

Correlation

$$ I_3 = \frac{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (i\,j)\,P(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{14} $$

where σ_x, σ_y, μ_x, and μ_y indicate the standard deviations and means of P_x and P_y, which are the partial probability density functions. P_x(i) is the ith entry in the marginal-probability matrix obtained by summing the rows of P(i, j).

Variance

$$ I_4 = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (i - \mu)^2 P(i, j), \quad \text{where } \mu = \text{mean of } P(i, j) \tag{15} $$

Information correlation measure 1

$$ I_5 = \frac{I_1 - HXY1}{\max(HX, HY)} \tag{16} $$

Information correlation measure 2

$$ I_6 = \left(1 - \exp[-2(HXY2 - I_1)]\right)^{1/2} \tag{17} $$

where HX and HY are the entropies of P_x and P_y, and

$$ HXY = -\sum_{i,j=0}^{N-1} P(i, j) \log(P(i, j)) $$

$$ HXY1 = -\sum_{i,j=0}^{N-1} P(i, j) \log(P_x(i) P_y(j)) $$
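To make the GLCM features of Section 2.4.2 concrete, the sketch below builds a normalized co-occurrence matrix and evaluates the entropy of Eq. (12) and the energy of Eq. (13). It assumes a single pixel offset (one step to the right), a non-symmetric matrix, and gray values already quantized to integers in 0..levels−1; the study does not state its offsets or quantization, so these choices and the function names are illustrative only.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix P(i, j) for one offset."""
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def glcm_entropy(p):
    """Eq. (12); empty cells are skipped because 0 * log(0) is taken as 0."""
    return -sum(v * math.log(v) for row in p for v in row if v > 0)


def glcm_energy(p):
    """Eq. (13): sum of squared matrix entries."""
    return sum(v * v for row in p for v in row)


patch = [[0, 0, 1],
         [0, 0, 1],
         [2, 2, 2]]
P = glcm(patch, levels=3)
entropy_value = glcm_entropy(P)
energy_value = glcm_energy(P)
```

For this patch the six horizontal pixel pairs fall equally into three cells of the matrix, so the entropy is log 3 and the energy is 1/3.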
$$ HXY2 = -\sum_{i,j=0}^{N-1} P_x(i) P_y(j) \log(P_x(i) P_y(j)) $$

Sum entropy

$$ I_7 = -\sum_{i=2}^{2(N-1)} P_{x+y}(i) \log(P_{x+y}(i)) \tag{18} $$

2.4.3. GLRLM based textural features

The gray level run length matrix (GLRLM) was proposed by Galloway (1975) for the analysis of coarse texture structure. For an erythrocyte image I(x, y), the run length matrix R(i, j) specifies the number of runs of length j in the given direction for a particular gray value i. Based on this run length matrix, Galloway (1975) proposed five textural features. Further, Chu et al. (1990) and Dasarathy and Holder (1991) proposed six new textural features. In total, 11 textural features (Tang, 1998) were computed for each of the 0°, 45°, 90°, and 135° direction angles as follows, where N_g is the number of gray levels and N_r is the number of run lengths.

Short run emphasis (SRE)

$$ SRE = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)/j^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{19} $$

Long run emphasis (LRE)

$$ LRE = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} j^2 R(i, j)}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{20} $$

Gray-level non uniformity (GLNU)

$$ GLNU = \frac{\sum_{i=1}^{N_g} \left( \sum_{j=1}^{N_r} R(i, j) \right)^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{21} $$

Run length non uniformity (RLNU)

$$ RLNU = \frac{\sum_{j=1}^{N_r} \left( \sum_{i=1}^{N_g} R(i, j) \right)^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{22} $$

Run percentage (RP)

$$ RP = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)}{P} \tag{23} $$

Here P is the total number of image pixels.

Low gray-level run emphasis (LGRE)

$$ LGRE = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)/i^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{24} $$

High gray-level run emphasis (HGRE)

$$ HGRE = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j) \cdot i^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{25} $$

Short run low gray-level run emphasis (SRLGE)

$$ SRLGE = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} \frac{R(i, j)}{i^2 \cdot j^2}}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{26} $$

Short run high gray-level run emphasis (SRHGE)

$$ SRHGE = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} (R(i, j) \cdot i^2)/j^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{27} $$

Long run low gray-level run emphasis (LRLGE)

$$ LRLGE = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} (R(i, j) \cdot j^2)/i^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{28} $$

Long run high gray-level run emphasis (LRHGE)

$$ LRHGE = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j) \cdot i^2 \cdot j^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} R(i, j)} \tag{29} $$

2.4.4. Fractal dimension

Fractal dimension is used for estimating the roughness of an image surface (Krishnan et al., 2012). If we consider the gray scale profile as the third dimension along with the two dimensions of the image, the variation in this profile gives the roughness or textural changes of the virtual surface consisting of the infected erythrocyte. Here we considered modified differential box counting with a sequential algorithm (Mandelbrot, 1982; Sarkar and Choudhury, 1994). Mathematically, the fractal dimension can be defined as

$$ D = \lim_{r \to 0} \frac{\log N_r}{\log(1/r)} \tag{30} $$

The summation of the differences between maximum and minimum intensities provides N_r, and r can be found by

$$ r = \frac{S}{M} $$

where S and M denote the grid size and the minimum size of the image, respectively. The grid contribution N_r can be calculated as follows

$$ N_r = \sum_{i,j} n_r(i, j) \tag{31} $$

Let the maximum and minimum gray levels of the image I(x, y) in the (i, j)th grid fall in box numbers k and l, respectively. Then

$$ n_r(i, j) = k - l + 1 \tag{32} $$

2.4.5. Local binary pattern (LBP)

LBP is an important textural feature describing the local neighborhood of gray scale images (Ojala et al., 2002; Krishnan et al., 2011). Here we have considered a circular neighborhood with bilinear interpolation of pixel values for LBP computation. Let P be the number of pixel points in the circular neighborhood of radius R, let G_c be the gray value of the center pixel of the circular neighborhood in an erythrocyte image I(x, y), and let G_p, for p = 0, . . ., P − 1, be the gray values of the corresponding neighborhood pixels. Depending on the gray value of the center pixel G_c, the P circular points are converted into a binary (0 or 1) pattern. The local texture of the image I(x, y) is defined as

$$ T = t(G_c, G_0, \ldots, G_{P-1}) \tag{33} $$

The LBP for the center pixel can be defined as

$$ LBP_{P,R} = \sum_{p=0}^{P-1} F(G_p - G_c)\, 2^p, \quad \text{where } F(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{otherwise} \end{cases} \tag{34} $$

In the rotation invariant mapping, each neighborhood pattern is rotated in the clockwise direction until the LBP code has the maximum number of most significant bits equal to zero.

$$ LBP^{ri}_{P,R} = \min\{ROR(LBP_{P,R}, i) \mid i = 0, 1, \ldots, P - 1\} \tag{35} $$

where ROR(x, i) denotes the circular rotation of a bit sequence x by i steps. To remove sampling artifacts, a uniformity measure (U) is
considered based on the transitions in the neighborhood pattern. In the LBP code, only patterns with U ≤ 2 are considered.

$$ LBP^{riu2}_{P,R} = \begin{cases} \sum_{p=0}^{P-1} F(G_p - G_c), & \text{if } U(LBP_{P,R}) \leq 2 \\ P + 1, & \text{otherwise} \end{cases} \tag{36} $$

Here we have considered two features (mean and variance) for every radius (R = 1, 2, 3), with the corresponding P being 8, 16, and 24.

2.4.6. Morphological feature

Morphometric information plays a significant role in characterizing abnormal erythrocytes. In anemia, for example, erythrocyte shape and size become irregular compared with normal cells. Here nine shape features, viz. area, perimeter, circularity, eccentricity, orientation, major axis, minor axis, roundness and form factor (Gonzalez and Woods, 2002), and seven invariant moments (Hu, 1962; Das et al., 2010) are extracted.

3. Feature selection

In this work, a total of 96 textural and morphological features were generated for malaria parasite infected erythrocytes. One-way ANOVA was applied to obtain the F-value of each feature for feature selection. In addition, the distribution profiles (Rastogi, 2008; Gun et al., 2008) of features among abnormal and normal erythrocytes were quantified using probability density estimation and box-whisker plots. Figs. 6 and 7 show the density and distribution plots for the six-class data. For malaria infected erythrocytes, 94 features were found to be statistically significant. Table 3 reports the outcome of the F-test used as the discriminating criterion for feature ranking.

Fig. 6. Class conditional density plot for non-infected and the other five types of malarial infection stage: (a) Kapur's entropy and (b) fractal dimension.

Fig. 7. Box-whisker plot for non-infected and the other five types of malarial infection stage: (a) Kapur's entropy and (b) fractal dimension.

Table 3
Statistical test of features describing malaria samples.

Feature set        Feature
1*                 Havrda and Charvat entropy
2*                 Kapur's entropy
3*                 Renyi entropy
4*                 Yager's measure
5*                 Shannon entropy
6–24*              Haralick textural features (19)
25*                Fractal dimension
26–31*             Local binary pattern (6)
32, 34–36*         Histogram features (4)
37–40, 42–45*      Shape features (8)
46–52*             Hu's moments (7)
53–96*             GLRLM (44)
33                 Histogram feature
41                 Shape feature

* p < 0.001 indicates statistical significance.

3.1. One-way ANOVA

One-way ANOVA is a study of the relationship between different samples (Gun et al., 2008). It compares the means of three or more classes using the F distribution. The F statistic is defined as the ratio of the mean square between classes to the mean square within classes. To calculate the mean squares, the sum of squares between classes and the sum of squares within classes are computed first.

4. Malaria infection stage classification

4.1. Bayesian approach

Classifying the malarial infection stage is a challenging task. Here the Naïve Bayes classifier (Duda et al., 2007; Han and Kamber, 2006) is used for classifying five stages of malaria infected
erythrocytes (ring, schizont and gametocyte for P. vivax; ring and gametocyte for P. falciparum) and non-infected erythrocytes. Suppose there are m classes, viz. C_1, C_2, C_3, . . ., C_m, and the d-dimensional feature vector X = (x_1, x_2, . . ., x_d) is taken as the object descriptor. For a particular feature vector X, the classifier assigns the infection stage to the class that attains the highest posterior probability, i.e., the erythrocyte belongs to class C_i if and only if

$$ P(C_i|X) > P(C_j|X) \quad \text{for } 1 \leq j \leq m,\ j \neq i. \tag{37} $$

The posterior probability can be defined based on Bayes' theorem as

$$ P(C_i|X) = \frac{P(X|C_i) \cdot P(C_i)}{P(X)} \tag{38} $$

where P(C_i) is the prior probability of class C_i and P(X) is the evidence, defined by

$$ P(X) = \sum_{i=1}^{m} P(X|C_i) P(C_i) \tag{39} $$

P(X|C_i) denotes the likelihood of class C_i with respect to X. Under the naïve assumption, the likelihood function becomes the product of the marginal density functions, defined as

$$ P(X|C_i) = \prod_{k=1}^{d} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times P(x_3|C_i) \times \cdots \times P(x_d|C_i) \tag{40} $$

To predict the class of an unknown feature vector X*, the posterior probability is calculated for each class label C_i, and the class label with the maximum posterior probability is predicted.

4.2. Support vector machine (SVM)

SVM is a well known supervised learning technique. It optimizes the class separating hyperplane so as to maximize the distance between the patterns and the hyperplane. Here the data are not linearly separable (Martis et al., 2012; Krishnan et al., 2012), so an RBF kernel has been applied to project the data into a higher dimensional space where they are linearly separable. Let each sample have a d-dimensional feature vector X = (x_1, x_2, . . ., x_d) and a class label y that takes one of two values, +1 or −1. Then the boundary hyperplane is defined as

$$ W^T X + b = 0 \tag{41} $$

Here, W is the weight coefficient vector and b is the bias term. The main aim of this algorithm is to minimize the cost function J(W), defined as

$$ J(W) = \frac{1}{2} W^T W \tag{42} $$

For linearly non-separable data, Eq. (41) can be written as

$$ W^T \Phi(X) + b = 0 \tag{43} $$

where Φ is the kernel transformation to a higher dimensional space.

Table 4
Performance analysis of feature selection-cum-classification scheme.

F value    % Accuracy (Naïve Bayes)    % Accuracy (SVM)    Feature set
>4         68.35                       67.11               94
>100       70.60                       67.11               74
>150       74.43                       67.11               63
>200       77.25                       67.68               53
>249       80.85                       67.68               47
>300       82.43                       77.70               32
>350       81.64                       76.91               28
>400       84.00                       76.80               19
>450       83.78                       80.96               13
>500       80.96                       75.67               11
>550       82.31                       83.55               9
>600       79.95                       83.44               7
>650       68.24                       73.19               6
>700       65.42                       71.95               3

5. Results

In the proposed scheme, Table 1 shows the MSE, RMSE, SNR and PSNR for various filters; the geometric mean filter provides the highest SNR and the lowest MSE in minimizing impulse noise. The erythrocytes are then segmented using the marker controlled watershed method. Tables 2 and 3 list all 96 extracted features, of which 94 are found to be statistically significant as evaluated by Fisher's F-criterion. Table 4 gives the performance analysis of the feature selection-cum-classification scheme, which is used to select the optimum set of features for achieving the highest accuracy with both learning techniques. A 10-fold cross validation approach generated confusion matrices (see Tables 5 and 6) for Bayesian

Table 5
Bayesian learning based confusion matrix for accuracy 84% and 19 most significant features.

                  PV gametocyte   PV schizont   PV trophozoite   PF gametocyte   PF trophozoite   Non-infected
PV gametocyte     143             5             0                0               0                0
PV schizont       12              132           0                0               4                0
PV trophozoite    6               1             121              0               20               0
PF gametocyte     10              0             0                138             0                0
PF trophozoite    3               0             21               0               110              14
Non-infected      0               0             6                0               40               102

PV: P. vivax; PF: P. falciparum.

Table 6
SVM learning based confusion matrix for accuracy 83.5% and 9 most significant features.

                  PV gametocyte   PV schizont   PV trophozoite   PF gametocyte   PF trophozoite   Non-infected
PV gametocyte     110             16            4                18              0                0
PV schizont       12              130           1                0               5                0
PV trophozoite    6               1             118              0               23               0
PF gametocyte     3               0             0                145             0                0
PF trophozoite    0               0             12               3               108              25
Non-infected      0               0             7                0               10               131

PV: P. vivax; PF: P. falciparum.
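The summary figures quoted for the two classifiers can be recovered from Tables 5 and 6 with a few lines of arithmetic. The sketch below assumes that rows are the true classes and columns the predicted classes (each row sums to 148, the number of erythrocytes per class), that sensitivity counts an infected cell as detected whenever it is assigned to any infected class, and that specificity is the fraction of non-infected cells labeled non-infected. This reading is an inference that happens to reproduce the reported values; the function name is illustrative.

```python
BAYES = [
    [143, 5, 0, 0, 0, 0],
    [12, 132, 0, 0, 4, 0],
    [6, 1, 121, 0, 20, 0],
    [10, 0, 0, 138, 0, 0],
    [3, 0, 21, 0, 110, 14],
    [0, 0, 6, 0, 40, 102],
]
SVM = [
    [110, 16, 4, 18, 0, 0],
    [12, 130, 1, 0, 5, 0],
    [6, 1, 118, 0, 23, 0],
    [3, 0, 0, 145, 0, 0],
    [0, 0, 12, 3, 108, 25],
    [0, 0, 7, 0, 10, 131],
]


def summarize(matrix, negative_index=5):
    """Return (accuracy, sensitivity, specificity) for a confusion matrix.

    Rows are assumed to be true classes; negative_index marks the
    non-infected class (the last row and column in Tables 5 and 6).
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))

    infected_rows = [row for i, row in enumerate(matrix) if i != negative_index]
    infected_total = sum(sum(row) for row in infected_rows)
    missed = sum(row[negative_index] for row in infected_rows)

    negatives = matrix[negative_index]
    accuracy = correct / total
    sensitivity = (infected_total - missed) / infected_total
    specificity = negatives[negative_index] / sum(negatives)
    return accuracy, sensitivity, specificity


bayes_summary = summarize(BAYES)
svm_summary = summarize(SVM)
```

Under these assumptions, `bayes_summary` is approximately (0.840, 0.981, 0.689) and `svm_summary` approximately (0.836, 0.966, 0.885), in line with the 84% and 83.5% accuracies in the table captions.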


Table 7
Comparative study of the proposed methodology with the existing methods.

Study                   Malaria sp.                 Staining    Performance
Diaz et al. (2009)      P. falciparum               Giemsa      Sensitivity – 94%; Specificity – 98.7%
Sio et al. (2007)       P. falciparum               Giemsa      –
Tek et al. (2006)       P. falciparum               Giemsa      Sensitivity – 74%; Specificity – 98%
Ross et al. (2006)      P. falciparum               Giemsa      Sensitivity – 85%; Positive predictive value – 81%
Proposed methodology    P. falciparum & P. vivax    Leishman    (a) Bayesian learning: Sensitivity – 98.10%; Specificity – 68.91%
                                                                (b) SVM learning: Sensitivity – 96.62%; Specificity – 88.51%

Fig. 8. Malaria classification accuracy corresponding to F-value and feature set.
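The Bayesian curve in Fig. 8 pairs an F-value threshold with a naïve Bayes classifier trained on the surviving features. The sketch below illustrates that pairing on toy data: a one-way ANOVA F-value is computed per feature, features above a threshold are kept, and a naïve Bayes model is fitted on them. Gaussian class-conditional densities, the small variance floor and all function names are assumptions of this sketch; the study does not state which density model was used.

```python
import math


def f_statistic(groups):
    """One-way ANOVA F: mean square between classes / mean square within."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_features(samples, labels, threshold):
    """Indices of features whose F-value exceeds the threshold."""
    classes = sorted(set(labels))
    keep = []
    for j in range(len(samples[0])):
        groups = [[s[j] for s, y in zip(samples, labels) if y == c] for c in classes]
        if f_statistic(groups) > threshold:
            keep.append(j)
    return keep


def fit_gaussian_nb(samples, labels):
    """Per class: prior and (mean, variance) of every feature."""
    model = {}
    for c in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == c]
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # variance floor
            stats.append((mean, var))
        model[c] = (len(rows) / len(samples), stats)
    return model


def predict(model, x):
    """Class with the largest log posterior (the evidence P(X) cancels)."""
    def log_posterior(c):
        prior, stats = model[c]
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


samples = [(1.0, 5.0), (1.2, 3.0), (0.8, 4.0), (3.0, 4.5), (3.2, 3.5), (2.8, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]
kept = select_features(samples, labels, threshold=4.0)
reduced = [tuple(s[j] for j in kept) for s in samples]
model = fit_gaussian_nb(reduced, labels)
```

In the toy data only the first feature separates the two classes, so it is the only one kept; repeating the selection for a series of thresholds and recording cross-validated accuracy at each would give a curve of the kind summarized in Table 4.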

and SVM techniques based on the most significant set of features. Finally, a comparative study is shown in Table 7 to compare the proposed machine learning scheme with existing methods with respect to sensitivity and specificity.

6. Discussions

Automated detection of malaria infected erythrocytes from peripheral blood smear samples using light microscopy is still a challenging task in pathological decision making. Infection stage classification is particularly important for early diagnosis of malaria. Nowadays both types of malaria infection (P. falciparum and P. vivax) are even observed in a single patient. Pathologists often overlook the presence of both infections because the features of the trophozoite stages overlap. Our proposed methodology provides significant discriminating capability to differentiate the trophozoite stages of P. falciparum and P. vivax with reduced subjective error. Most studies in the literature considered cultured blood samples, whereas here we have considered Leishman stained peripheral blood smear samples. Both textural and morphological information is necessary for characterizing malaria infected erythrocytes, so we have incorporated both kinds of features to discriminate P. vivax and P. falciparum infected erythrocytes and non-infected erythrocytes. In our proposed approach, a total of 888 erythrocytes (148 per class) were considered for training and testing. Table 4 shows that as the F-value threshold, which measures the discriminating potential of the features, increases, the dimension of the feature space decreases and the classification accuracy varies. From Table 4 and Fig. 8, the Bayesian approach provides its highest accuracy (84%) with the 19 most significant features, corresponding to F ≥ 400, and SVM reaches its highest accuracy (83.5%) with the 9 most significant features, at F ≥ 550. Such an interactive statistical feature selection process becomes important when the dimension of the feature set is large. Table 7 shows that most previous studies were designed only for P. falciparum infection stage classification using cultured blood samples with Giemsa stain. The proposed machine learning scheme using the Bayesian approach achieved 84% screening accuracy, 98.10% sensitivity and 68.91% specificity for recognition of both P. falciparum and P. vivax parasites. The SVM-based approach achieved 83.5% screening accuracy, 96.62% sensitivity and 88.51% specificity for recognition of both parasites.

7. Conclusion

In the field of quantitative microscopy, machine learning plays an important role in the structural and textural characterization of tissues and cells. In view of this, an attempt has been made here to develop computer aided pattern recognition of malaria parasitemia along with its stages based on learning techniques. The proposed scheme is able to quantitatively characterize both P.
106 D.K. Das et al. / Micron 45 (2013) 97–106

vivax and P. falciparum for better pathological understanding. In addition, it is able to automatically classify malaria-infected erythrocytes into trophozoite, schizont and gametocyte stages. Moreover, it may be applicable in telemedicine to provide quick diagnosis in remote places where pathologists are often not accessible.
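
As a hedged illustration of how the screening accuracy, sensitivity and specificity quoted in the discussion are defined, the following Python sketch computes them from the counts of a binary (infected versus non-infected) confusion matrix. The function name `screening_metrics` and all counts are assumptions made for illustration and are not values reported by the authors. The counts were chosen only so that the resulting rates resemble those quoted for the Bayesian approach, under the further assumption of 740 infected and 148 non-infected cells.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for a binary screening test.

    tp: infected cells flagged as infected
    fn: infected cells that were missed
    tn: non-infected cells correctly passed
    fp: non-infected cells flagged as infected
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one infected and one non-infected cell")
    sensitivity = tp / (tp + fn)   # fraction of infected cells detected
    specificity = tn / (tn + fp)   # fraction of non-infected cells passed
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (an assumption, not taken from the paper).
sens, spec, acc = screening_metrics(tp=726, fn=14, tn=102, fp=46)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.4f}")
```

The accuracy computed here is the binary accuracy of the hypothetical table. It is not necessarily comparable to the 84% figure quoted above, which may refer to the multi-class problem.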

Acknowledgement

The authors acknowledge the Dept. of Information Technology, Govt. of India for providing financial support to carry out this work (Ref. No. IIT/SRIC/SMST/DPR/2009-10/15).
References National vector borne disease control programme. Trend of malaria (2010-2011).
Available at http://nvbdcp.gov.in/malaria9.html
Beucher, S., Meyer, F., 1993. The morphological approach to segmentation: the Ojala, T., Pietikainen, M., Maenpaa, T., 2002. Multiresolution gray-scale and rotation
watershed transformation. Mathematical Morphology in Image Processing, vol. invariant texture classification with local binary patterns. IEEE Transactions on
34. Marcel Dekker, New York, pp. 433–481 (Chapter 12). Pattern Analysis and Machine Intelligence 24 (7), 971–981.
Chu, A., Schgal, C.M., Greenleaf, J.F., 1990. Use of gray value distribution of run lengths Pharwaha, A.P.S., Sing, B., 2009. Shannon and non-Shannon measures of entropy for
for texture analysis. Pattern Recognition Letter 11 (6), 415–420. statistical texture feature extraction in digitized mammograms. In: Proceedings
Cuomo, M.J., Noel, L.B., White, D.B., 2012. Diagnosing Medical Parasites: A Public of WCECS, vol. I/II, San Francisco, USA, pp. 1286–1291.
Health Officers Guide to Assisting Laboratory and Medical Officers. Retrieved Rastogi, V., 2008. Fundamentals of Biostatistics. Ane Books, India.
from http://www.phsource.us/PH/PARA/Diagnosing Medical Parasites.pdf Raviraja, S., Osman, S.S., Kardman, 2008. A novel technique for malaria diag-
Das, D., Ghosh, M., Chakraborty, C., Pal, M., Maity, A.K., 2010. Invariant moment based nosis using invariant moments and by image compression. In: IFMBE
feature analysis for abnormal erythrocyte recognition. In: Proceedings of IEEE, Proceedings of 4th International Conference on Biomedical Engineering, vol. 21,
ICSMB, IEEE, IIT Kharagpur, India, pp. 242–247. pp. 730–733.
Dasarathy, B.R., Holder, E.B., 1991. Image characterization based on joint gray-level Raviraja, S., Bajpai, G., Sharma, S.K., 2006. Analysis of detecting the malarial
run-length distribution. Pattern Recognition Letter 12 (8), 497–502. parasite infected blood images using statistical based approach. In: IFMBE
Dempster, A., Ruberto, C.D., 1999. Morphological processing of malarial slide images. Proceedings of 3rd International conference on Biomedical Engineering, vol. 15,
In: Matlab DSP Conference, Espoo, Finland, pp. 16–17. pp. 534–537.
Diaz, G., Gonzalez, F.A., Romero, E., 2009. A semi automatic method for quantification Ross, N.E., Pritchard, C.J., Rubin, D.M., Duse, A.G., 2006. Automatic image processing
and classification of erythrocytes infected with malaria parasites in microscopic method for the diagnosis and classification of malaria on thin blood smears.
image. Journal of Biomedical Informatics 42 (2), 296–307. Medical and Biological Engineering Computing 44 (5), 427–436.
Duda, R., Hart, P.E., Stork, D.G., 2007. Pattern Classification, 2nd ed. Wieley Pub, New Sarkar, N., Choudhury, B.B., 1994. An efficient differential box-counting approach
Delhi. to compute fractal dimension of image. IEEE Transaction on Systems, Man and
Frean, J., 2010. Microscopic determination of malaria parasite load: role of image Cybernetics 24 (1), 115–120.
analysis. Microscopy: Science, Technology, Application and Education, FORMA- Sio, S.W.S., Sun, W., Kumar, S., Bin, W.Z., Tan, S.S., et al., 2007. Malaria count: an
TEX 3, 862–866. image analysis-based program for the accurate determination of parasitemia.
Galloway, M.M., 1975. Texture analysis using gray level run lengths. Computer Journal of Microbiological Methods 68 (1), 11–18.
Graphics and Image Processing 4 (2), 172–179. Tan, J.H., Ng, E.Y.K., Acharya, R.U., Chee, C., 2010. Study of normalocular ther-
Gonzalez, R.C., Woods, R.E., 2002. Digital Image Processing, 2nd ed. Prentice Hall, mogram using textural parameters. Infrared Physics & Technology 53 (2),
New York. 120–126.
Ghosh, M., Das, D., Chakraborty, C., 2010. Entropy based divergence for leukocyte Tan, J.H., Ng, E.Y.K., Acharya, U.R., Chee, C., 2009. Infrared thermography on ocu-
image segmentation. In: Proceedings of IEEE, ICSMB, IEEE, IIT Kharagpur, India, lar surface temperature: a review. Infrared Physics & Technology 52 (4),
pp. 409–413. 97–108.
Greer, J.P., Foerster, J., Rodgers, G.M., Paraskevas, F., Glader, B., et al., 2009. Wintrobe’s Tang, X., 1998. Texture information in run-lengths matrices. IEEE Transaction on
Clinical Hematology, 12th ed. Lippincott Williams & Wilkins, Philadelphia. Image Processing 7 (11), 1602–1609.
Gun, A.M., Gupta, M.K., Dasgupta, B., 2008. Fundamentals of Statistics, vol. 2. The Tangpukdee, N., Duangdee, C., Wilairatana, P., Krudsood, S., 2009. Malaria diagnosis:
World Press Pvt. Ltd., Kolkata, India. a brief review. Korean Journal of Parasitology 47 (2), 93–102.
Han, J., Kamber, M., 2006. Data Mining: Concept and Techniques. Morgan Kaufmann Tek, F.B., Dempster, A.G., Kale, I., 2006. Malaria parasite detection in peripheral blood
Publishers, San Francisco, USA. images. In: Proceeding of British Machine Vision Conference.
Haralick, R.M., Shanmugam, K., Dinstein, I., 1973. Textural features for image Thangavel, K., Manavalan, R., Laurence Aroquiaraj, I., 2009. Removal of speckle noise
classification. IEEE Transaction on Systems. Man and Cybernetics 3 (6), from ultrasound medical image based on special filters: comparative study.
610–621. ICGST-GVIP Journal 9 (3), 25–32.
Hue, M.K., 1962. Visual pattern recognition by moment invariants. IRE Transaction Toha, S.F., Ngah, U.K., 2007. Computer Aided Medical Diagnosis for the Identifi-
on Information Theory 8 (2), 179–187. cation of Malaria Parasites. In: Proceedings of IEEE ICSCN, MIT Campus, Anna
Krishnan, M.M.R., Shah, P., Choudhury, A., Chakraborty, C., Paul, R.R., et al., 2011. University, Chennai, India, pp. 521–522.
Textural characterization of histopathological images for oral sub-mucous fibro- WHO, 2010. Basic Malaria Microscopy: Part I. Learner’s Guide, 2nd ed. World Health
sis detection. Tissue and Cell 43 (5), 318–330. Organization, Geneva, Switzerland, pp. 51–67.
