A R T I C L E  I N F O

Keywords: Glaucoma; Retinal fundus image; Structural features; Non-structural features; Classification

A B S T R A C T

Glaucoma, also known as the silent thief of sight, is a leading cause of blindness across the globe. It is mainly caused by damage to the optic nerve of the eye, leading to permanent blindness. Traditional approaches used by ophthalmologists for diagnosis include assessment of intraocular pressure by tonometry, pachymetry, etc. But all these evaluations are time-consuming, require human interaction and may be prone to subjective errors. Thus, to overcome these challenges, researchers are working in the field of medical imaging on the analysis of retinal images for glaucoma diagnosis. Computer aided diagnosis (CAD) systems can be developed to overcome these challenges using machine learning approaches to classify retinal images as 'abnormal' or 'normal'. This paper presents a new set of reduced hybrid features derived from structural and non-structural features to classify the retinal fundus image, which could serve as a second opinion for ophthalmologists. The structural features extracted include Disc damage likelihood scale (DDLS) and Cup to disc ratio (CDR), whereas the non-structural features include Grey level run length matrix (GLRM), Grey level co-occurrence matrix (GLCM), First order statistical (FoS), Higher order spectra (HOS), Higher order cumulant (HOC) and Wavelet features. Finally, the paper presents a comparative analysis of K nearest neighbor (k-NN), Neural network (NN), Random forest (RF), Support vector machine (SVM) and Naïve bayes (NB) classifiers using metrics such as accuracy, specificity, precision and sensitivity.
Abbreviations: CAD, computer aided diagnosis; DDLS, disc damage likelihood scale; CDR, cup to disc ratio; GLRM, grey level run length matrix; GLCM, grey level co-occurrence matrix; FoS, first order statistical; HOS, higher order spectra; HOC, higher order cumulant; k-NN, K nearest neighbor; NN, neural network; RF, random forest; SVM, support vector machine; NB, naïve bayes; ROI, region of interest; SO, sequential optimization; MLP, multilayer perceptron; RBF, radial basis function; PCA, principal component analysis; EAS, evolutionary attribute selection; OS, operating system; LARKIFCM, level set based adaptively regularized kernel based intuitionistic fuzzy c means; DWT, discrete wavelet transform; IG, information gain; GR, gain ratio; SRE, short run emphasis; LRE, long run emphasis; GLN, grey level non-uniformity; RP, run percentage; RLN, run length non-uniformity; autoc, autocorrelation; contr, contrast; corrp, correlation; cprom, cluster prominence; cshad, cluster shade; dissi, dissimilarity; entro, entropy; homop, homogeneity; maxpr, maximum probability; sosvh, sum of squares variance; savgh, sum average; svarh, sum variance; senth, sum entropy; dvarh, difference variance; denth, difference entropy; inf1h, information measure of correlation 1; inf2h, information measure of correlation 2; indnc, inverse difference normalized; idmnc, inverse difference moment normalized; var, variance; kr, kurtosis; sk, skewness; db, daubechies; bior, biorthogonal; Tn, true negative; Tp, true positive; Fn, false negative; Fp, false positive.
* Corresponding author.
E-mail addresses: niharikathakur04@gmail.com (N. Thakur), mamtajuneja@pu.ac.in (M. Juneja).
https://doi.org/10.1016/j.bspc.2020.102137
Received 2 November 2019; Received in revised form 28 July 2020; Accepted 4 August 2020
Available online 26 August 2020
1746-8094/© 2020 Elsevier Ltd. All rights reserved.
N. Thakur and M. Juneja Biomedical Signal Processing and Control 62 (2020) 102137
Fig. 2. (a) Normal fundus retinal image (b) Abnormal fundus retinal image.
In India, 12 million people are estimated to be glaucoma suspects, and the cases of blindness are approximated to be 1.2 million [2]. The screening of glaucoma is usually carried out by skilled ophthalmologists using measurements of the fluid in the eye, the angle of the drainage or outflow channel, and visual analysis of the retinal fundus image. But all these manual tasks and the visual analysis of any medical image are time-consuming, may take approximately 20–45 min, and are prone to subjective evaluations. Thus, there is a need for a Computer aided diagnosis (CAD) system [3] for glaucoma which can act as an assistant to the ophthalmologists for comprehensive analysis of medical images in a short duration. The CAD system comprises pre-processing, segmentation and classification of medical images for diagnosis of glaucoma. Pre-processing is the beginning step, removing noise and outliers in the input image that cause problems for further analysis, by use of various filters, morphological operations, etc. Further, segmentation comprises dividing the input image into multiple regions to extract the required region of interest (ROI), which carries more meaning than the other regions, so that the abnormality can be easily analyzed. Finally, classification is a decision-making approach to classify images as 'abnormal' or 'normal'. This work emphasizes the classification module of CAD systems using machine learning approaches on retinal images. Fig. 1 shows the generalized CAD system with its different modules.

In the area of medical image processing, glaucoma can also be diagnosed by analysis of retinal images acquired using retinal fundus cameras. Fig. 2 presents a retinal image with absence of glaucoma (normal) and presence of glaucoma (abnormal). Here, it can be seen that an eye with glaucoma contains an enlarged optic cup occupying the optic disc [4].

Various approaches have been proposed to date for classification of glaucoma using different types of features extracted from retinal images [5]. Some of them include:

Rajendra et al. presented a computerized approach for diagnosis of glaucoma by Higher order spectra (HOS) and textural features. The classification was performed using Support vector machine (SVM), Sequential optimization (SO), Naïve bayes (NB) and Random forest (RF). Comparison of the approaches showed that RF outperforms the other approaches with an accuracy of 90% [6]. Thereafter, Sumeet et al. extracted wavelet features for classification of glaucoma using SVM, SO, NB and RF. Based on the analysis, they found that SVM outperforms all the other classifiers with an accuracy of 93% [7]. Also, Rama Krishnan et al. used SVM to classify glaucoma based on wavelet and HOS features with an accuracy of 95% [8]. Further, Noronha et al. extracted HOS features and classified glaucoma using SVM and NB classifiers. Amongst the two, the NB classifier performed better than the SVM with an accuracy of 92.65% [9]. Thereafter, Rajendra et al. used SVM and NB classifiers to classify glaucoma based on Gabor features. They found that the SVM classifier performs better than NB with an accuracy of 93.13% [10]. Further, Issac et al. extracted structural features such as cup to disc ratio (CDR) and disc damage likelihood scale (DDLS) after segmentation of the disc and cup from the retinal image. Thereafter, the classification was performed using SVM and a neural network (NN). Based on the performance, SVM was found to be better than NN with an accuracy of 94.11% and a specificity of 100% [11]. Later, Salem et al. extracted structural, textural and intensity-based features for detection of glaucoma using an SVM classifier. The values of sensitivity, specificity and accuracy were observed to be 100%, 87% and 92%, respectively [12]. Haleem et al. presented an approach for classification between abnormal and normal cases of glaucoma using an SVM classifier with an accuracy of 94.4%; the features extracted for classification were structural, gaussian, gabor and wavelet features [13]. Further, Claro et al. classified retinal images using Multilayer perceptron (MLP), RF, Random committee and Radial basis function (RBF) classifiers. They extracted structural and textural features to test on the different classifiers. Based on the analysis, the MLP classifier was found to outperform the others with an accuracy of 93.03% [14]. Similarly, Singh et al. extracted wavelet features for glaucoma classification using RF, NB, K nearest neighbor (k-NN), ANN and SVM. The extracted features were thereafter selected using Principal component analysis (PCA) and Evolutionary attribute selection (EAS). Testing with the different classifiers showed that the features selected by EAS and classified using RF/NN give an accuracy of 94.7%. Also, the features selected using PCA
Higher order cumulant (HOC) and Discrete wavelet transform (DWT) features are extracted from the grayscale converted retinal image. Further, the more significant features are selected by Information gain (IG), Gain ratio (GR), Correlation, Relief and Wrapper; and classification is performed using k-NN, NN, SVM, RF and NB. The flow diagram of the methodology used for classification is represented in Fig. 3, and the procedure is discussed in Sections 3.1 to 3.5.

3.1. Pre-processing

This is the initial step, comprising channel separation and vessel removal. The input images were first separated into red, green and blue channels to determine the most suitable channel for further processing. Finally, the red and green channels were chosen for segmentation of the optic disc and optic cup after removal of vessels using morphological closing with a structuring element of type 'disc'. Fig. 4 shows the input image, the gray-scale image, and the vessel-removed red and green channels that can be used for segmentation.
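As an illustration of the vessel-removal step, a minimal Python sketch of grayscale morphological closing with a disk structuring element, mirroring MATLAB's imclose with strel('disc', r). This is an assumed re-implementation, not the authors' MATLAB code, and the radius is an illustrative value:

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Binary disk-shaped structuring element, akin to MATLAB strel('disk', r)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x * x + y * y) <= radius * radius

def remove_vessels(channel, radius=7):
    """Suppress thin dark vessels in a fundus channel by grayscale closing:
    dilation followed by erosion fills dark structures narrower than the disk."""
    return ndimage.grey_closing(channel, footprint=disk(radius))
```

Closing the red and green channels this way leaves the brighter disc and cup regions intact while filling in the darker vasculature.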
and DWT are obtained directly from the grayscale converted retinal image shown in Fig. 4(b). The details of the structural and non-structural features are given below:

3.3.1. Structural features

These are the features which emphasize the external boundary of the detected ROI, i.e. area, volume, length, breadth, perimeter, etc. The structural features discussed here include CDR and DDLS:

(i) Cup to disc ratio (CDR): It is the ratio of the optic cup to the optic disc diameter, used in ophthalmology to assess the progression of glaucoma. An increasing value of CDR signifies the progression of the disease.

CDR = Dia_cup / Dia_disc   (1)

Here, CDR is the ratio of cup to disc, Dia_cup is the diameter of the optic cup and Dia_disc is the diameter of the optic disc. Dia_cup and Dia_disc are calculated from the segmented optic cup and optic disc shown in Fig. 4(a) and Fig. 4(b) using the 'EquivDiameter' property of the MATLAB function 'regionprops'. The values of Dia_cup and Dia_disc must be in the same unit.

(ii) Disc damage likelihood scale (DDLS): DDLS is the ratio of the rim width to the disc diameter, which is used for prediction of glaucoma. It provides a quantitative assessment of the disc damage:

DDLS = RIM_width / Dia_disc   (2)

Here, DDLS is the scale of disc damage likelihood, RIM_width is the width of the region between disc and cup (i.e. the rim) shown in Fig. 4(c), while Dia_disc is the disc diameter used in Eq. (1) above. Here also, RIM_width and Dia_disc are calculated using the 'EquivDiameter' property of 'regionprops' in MATLAB, with the values of RIM_width and Dia_disc in the same units.

Fig. 6. DDLS Chart.

3.3.2. Non-structural features

(i) Grey level run length matrix (GLRM) features: Here, P(i, j|θ) is the run length matrix for direction θ and p(i, j|θ) is the normalized matrix given as p(i, j|θ) = P(i, j|θ)/Nr(θ). The values of θ in this case are 0° corresponding to the horizontal, 45° representing the diagonal and 90° signifying the vertical direction. The final value of each GLRM feature is computed for each angle and the mean of these values is returned. The commonly extracted GLRM features are SRE, LRE, GLN, RP and RLN.

(a) Short run emphasis (SRE): It is the measure of the distribution of short run lengths, calculated using Eq. (3). A greater value of SRE indicates fine texture.

SRE = (∑_{i=1}^{Ng} ∑_{j=1}^{Nr} P(i, j|θ)/j²) / Nr(θ)   (3)

(b) Long run emphasis (LRE): It is a measure of the distribution of long run lengths, with higher values indicating coarse texture. LRE is computed using Eq. (4).

LRE = (∑_{i=1}^{Ng} ∑_{j=1}^{Nr} P(i, j|θ)·j²) / Nr(θ)   (4)
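The run length features can be sketched in Python as follows, assuming a single direction θ = 0° (horizontal runs) rather than the mean over angles used in the paper; this is an illustrative re-implementation, not the authors' code:

```python
import numpy as np

def run_length_matrix(img, levels):
    """Run length matrix P(i, j | θ=0°): P[i, j-1] counts horizontal runs
    of grey level i having length j."""
    rows, cols = img.shape
    P = np.zeros((levels, cols), dtype=np.int64)
    for row in img:
        j = 0
        while j < len(row):
            k = j
            while k < len(row) and row[k] == row[j]:
                k += 1                      # extend the current run
            P[row[j], k - j - 1] += 1       # record a run of length k - j
            j = k
    return P

def sre_lre(P):
    """Short and long run emphasis, Eqs. (3) and (4)."""
    j = np.arange(1, P.shape[1] + 1, dtype=float)   # run lengths 1..Nr
    n_runs = P.sum()                                # Nr(θ)
    sre = (P / j ** 2).sum() / n_runs
    lre = (P * j ** 2).sum() / n_runs
    return sre, lre
```

A fine texture dominated by length-1 runs drives SRE towards 1, while long uniform runs inflate LRE.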
(c) Grey level non-uniformity (GLN): GLN is the measure of similar intensity values in an image, computed using Eq. (5). A lower value of GLN indicates higher similarity in intensity values.

GLN = (∑_{i=1}^{Ng} (∑_{j=1}^{Nr} P(i, j|θ))²) / Nr(θ)   (5)

(d) Run percentage (RP): It is the measure of coarseness in the texture of an image, using the ratio of the number of runs to the number of voxels in the ROI. It is calculated using Eq. (6), and higher values of RP indicate fine texture.

RP = Nr(θ) / Np   (6)

Here, the values of RP lie between 1/Np and 1.

(e) Run length non-uniformity (RLN): It is the measure of the similarity of run lengths through the image, calculated using Eq. (7). Lower values of RLN signify more homogeneity.

RLN = (∑_{j=1}^{Nr} (∑_{i=1}^{Ng} P(i, j|θ))²) / Nr(θ)   (7)

(ii) Grey level co-occurrence matrix (GLCM) features:

(a) Autocorrelation (autoc) [27]: autoc is the measure of the coarseness and fineness of the texture. It is calculated using Eq. (8), and in this case a higher magnitude of autoc indicates more fineness.

autoc = ∑_{i=1}^{Ng} ∑_{j=1}^{Ng} p(i, j)·i·j   (8)

(b) Contrast (contr) [27]: contr is the measure of variation in intensity and is computed using Eq. (9) below. A larger value of contr signifies disparity in the values of the image intensity.

contr = ∑_{i=1}^{Ng} ∑_{j=1}^{Ng} (i − j)²·p(i, j)   (9)

(c) Correlation (corrp) [27]: corrp lies between 0 and 1, indicating the dependency of grey level values in the respective voxels, calculated by Eq. (10). A larger value of corrp signifies higher dependency.

corrp = (∑_{i=1}^{Ng} ∑_{j=1}^{Ng} p(i, j)·i·j − μx·μy) / (σx·σy)   (10)
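The GLCM features can be sketched as follows, assuming a single one-pixel horizontal offset; the number of grey levels, offsets and any averaging over directions are assumptions for illustration, not details from the paper:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Normalized grey level co-occurrence matrix p(i, j) for one offset."""
    P = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    for y in range(rows - dy):
        for x in range(cols - dx):
            P[img[y, x], img[y + dy, x + dx]] += 1   # count the pixel pair
    return P / P.sum()

def glcm_contrast(p):
    """Contrast, Eq. (9): sum of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return ((i - j) ** 2 * p).sum()

def glcm_autocorrelation(p):
    """Autocorrelation, Eq. (8): sum of i * j * p(i, j), with 1-based indices."""
    i, j = np.indices(p.shape)
    return ((i + 1) * (j + 1) * p).sum()
```

A checkerboard-like image maximizes contrast for a one-pixel offset, while a constant image drives it to zero.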
Entropy (entro) [27]: entro is the measure of randomness or uncertainty in the image, computed using Eq. (15). Higher values signify an increase in randomness and uncertainty.

entro = −∑_{i=1}^{Ng} ∑_{j=1}^{Ng} p(i, j)·log(p(i, j) + ε)   (15)

Homogeneity (homop) [27]: homop is computed using Eq. (16).

homop = ∑_{i=1}^{Ng} ∑_{j=1}^{Ng} p(i, j) / (1 + |i − j|²)   (16)

Sum variance (svarh) [28]: svarh is computed using Eq. (20).

svarh = ∑_{k=2}^{2Ng} (k − SA)²·p_{x+y}(k)   (20)

(n) Difference variance (dvarh) [28]: It is the heterogeneity measure that places higher weights on intensity pairs deviating from the mean, using Eq. (22). An increase in dvarh signifies an increase in heterogeneity.

dvarh = ∑_{k=0}^{Ng−1} (k − μ_{x−y})²·p_{x−y}(k)   (22)

Difference entropy (denth) [28]: denth is computed using Eq. (23).

denth = −∑_{k=0}^{Ng−1} p_{x−y}(k)·log(p_{x−y}(k) + ε)   (23)

(iii) First order statistical (FoS) features [30]
(a) Mean: It is the measure of the average grey level of each region, computed using Eq. (28).

b̄ = ∑_{b=0}^{L−1} b·p(b)   (28)

(b) Standard deviation (std): std is the measure of variation or dispersion from the mean, defined using Eq. (29).

σ_b = √(∑_{b=0}^{L−1} (b − b̄)²·p(b))   (29)

(c) Variance (var): var is the measure of the spread of the grey levels around the mean value, computed using Eq. (30).

σ_b² = ∑_{b=0}^{L−1} (b − b̄)²·p(b)   (30)

(d) Kurtosis (kr): kr is the measure of the 'peak' values distributed in the ROI of the image. A higher value of kr signifies a distribution with mass concentrated towards the tails instead of the mean, while a lower value of kr implies a distribution with mass concentrated in a spike near the mean.

kr = (1/σ_b⁴)·∑_{b=0}^{L−1} (b − b̄)⁴·p(b) − 3   (31)

(e) Skewness (sk): sk is the degree of asymmetry of the image values distributed near the mean value. It can be positive or negative depending upon the elongation of the tail and the distribution of mass.

sk = (1/σ_b³)·∑_{b=0}^{L−1} (b − b̄)³·p(b)   (32)

(f) Entropy (ent): ent is the measure of randomness/uncertainty in the values of the image, calculated using Eq. (33). It measures the average information required to encode the values of an image. In this case, higher values of 'ent' specify more randomness.

ent = −∑_{b=0}^{L−1} p(b)·log p(b)   (33)

(iv) Higher order cumulant (HOC) features [31]

HOC features are used to analyse the non-stationary, non-linear and non-gaussian characteristics hidden in the image. Cumulants have certain properties: they keep the same value regardless of permutations of their arguments; the cumulant of a scaled random variable equals the product of all the scale factors times the cumulant; and the cumulant of a sum of independent random processes is the sum of the cumulants. Also, third and higher order cumulants of a gaussian process are identically zero. Further, cumulants follow a special symmetry which makes computation in a sub-domain sufficient. HOC uses higher order statistics derived from the moments/correlations of a given image. Moments are particular weights assigned to image pixels or intensities using a function for interpretation. For an image with random variable X, the moments are given as μn = E(Xⁿ), n = 1, 2, …, ∞. Here, the generating function of X, φX(λ), is:

φX(λ) = E(e^{mλX}) = 1 + ∑_{n=1}^{∞} ((mλ)ⁿ/n!)·μn, λ ∈ ℝ   (34)

Also, from Eq. (34) it can be observed that (−m)ⁿ·[dⁿφ(λ)/dλⁿ]_{λ=0} = μn. Further, the function for generating cumulants is given as:

ψX(λ) = ln φX(λ) = ∑_{n=1}^{∞} ((mλ)ⁿ/n!)·Kn   (35)

Here Kn represents the nth cumulant of the variable X and is given as Kn = (−m)ⁿ·[dⁿψX(λ)/dλⁿ]_{λ=0}. On comparison of Eqs. (34) and (35), the cumulants can be expressed as K1 = μ1, K2 = μ2 − (μ1)², etc. Further, for a variable X with variance σ² and mean μ, φX(λ) = exp(mλμ − λ²σ²/2), which signifies Kn = 0 for n ≥ 3. Thus, the cumulants of order three at orientations of 10°, 50°, 90°, 130° and 180° are given as y_cum10, y_cum50, y_cum90, y_cum130 and y_cum180. These orientations were determined by repeated testing at different degrees; only these orientations were used, as the difference between them was significant for the 'normal' and 'abnormal' cases, while at other degrees the difference was not significant.

(v) Higher order spectral (HOS) features [6]

HOS features are also used to analyse the non-stationary, non-linear and non-gaussian characteristics hidden in the image. They use the phase and amplitude information of a given signal and can be applied to both random processes and deterministic signals. These features are derived from the third-order statistics of the signal, namely the bispectrum. The bispectrum is given as B(f1, f2) = E[X(f1)X(f2)X*(f1 + f2)], where X(f) is the fourier transform of the signal x(nT), n is the integer index, T is the sampling interval and E[.] signifies the expectation operation. Features are calculated by integrating the bispectrum along the lines f2 = a·f1, where a is the slope. The bispectral invariant P(a), termed the phase of the integrated bispectrum, is given as P(a) = arctan(Ii(a)/Ir(a)), where Ir and Ii refer to the real and imaginary parts of the integrated bispectrum. The bispectrum contains information about the shape of the waveform within the window, is invariant to amplification and shift, and is robust to changes in time scale. These features are distributed uniformly and symmetrically about zero in the interval [−π, +π].

The commonly used HOS features are as follows:

(a) Entropy: It is the measure of the distribution of the spectral power in the image or signal, computed using Eq. (36). It is derived from the probability distribution in the frequency domain and is also termed Shannon entropy.

Ph = −∑_n p(ψn)·log p(ψn)   (36)

where p(ψn) = (1/L)·∑_Ω l(φ(B(f1, f2)) ∈ ψn), ψn = {φ | −π + 2πn/N ≤ φ ≤ −π + 2π(n + 1)/N}, n = 0, 1, …, N − 1, L is the number of points in the region Ω, φ is the phase angle of the bispectrum and l(.) is the indicator function which gives the value 1 when the phase angle is within the range of ψn.

(b) Mean: Similar to the mean defined in Eq. (28), here the mean is defined in the context of spectral analysis using Eq. (37).

Mean = (1/L)·∑_Ω |B(f1, f2)|   (37)

(c) Entropy1: It is the measure of entropy with degree 1, computed using Eq. (38).
Ent1 = −∑_{nn} t_nn·log t_nn   (38)

where the probability distribution for entropy with degree 1 is t_nn = |B(f1, f2)| / ∑_Ω |B(f1, f2)|.

(d) Entropy2: It is the measure of entropy with degree 2, computed using Eq. (39).

Ent2 = −∑_{nn} q_nn·log q_nn   (39)

where the probability distribution for entropy with degree 2 is q_nn = |B(f1, f2)|² / ∑_Ω |B(f1, f2)|².

(e) Entropy3: It is the measure of entropy with degree 3, computed using Eq. (40).

Ent3 = −∑_{nn} r_nn·log r_nn   (40)

where the probability distribution for entropy with degree 3 is r_nn = |B(f1, f2)|³ / ∑_Ω |B(f1, f2)|³.

Table 2
Number of features.

S.no. | Type | Name | Number
1 | Structural features | CDR, DDLS | 2
2 | GLRM | SRE, LRE, GLN, RP, RLN | 5
3 | GLCM | contr, corrp, autoc, cprom, cshad, dissi, maxpr, sosvh, energy, entro, homop, savgh, svarh, denth, senth, dvarh, inf2h, inf1h, indnc, idmnc | 20
4 | FoS | Mean, std, var, kr, sk, ent | 6
5 | HOC | y_cum10, y_cum50, y_cum90, y_cum130, y_cum180 | 5
6 | HOS | Entropy, Mean, Entropy1, Entropy2, Entropy3 | 5
7 | Wavelet | db3H, db3V, db3D, sym3H, sym3V, sym3D, bior3.3H, bior3.3V, bior3.3D, bior3.5H, bior3.5V, bior3.5D, bior3.7H, bior3.7V, bior3.7D, haarH, haarV, haarD | 18
TOTAL number of features | | | 61

(vi) Discrete wavelet transform (DWT) features [32,33]

The wavelet transform is the division of data, operators or functions into different frequency components, studying each with a resolution matched to its scale. The wavelet transform here decomposes the image into four different components, namely approximation, horizontal, diagonal and vertical. The approximation component is utilized in scaling and the other three in translation. For images, a separable wavelet transform with a one-dimensional filter bank is applied to the rows and columns of each channel. The scaling function φ_{j,m,n}(x, y) and translation function ψⁱ_{j,m,n}(x, y) used here for m rows and n columns of an image are given in Eqs. (41) and (42). Further, the three high-pass channels corresponding to the horizontal ψᴴ(x, y), vertical ψⱽ(x, y) and diagonal ψᴰ(x, y) functions are extracted along with the scaling function φ(x, y). The high-pass channels are used to acquire the high frequency components, i.e. abrupt changes, from the horizontal, vertical and diagonal components of the image.

φ_{j,m,n}(x, y) = 2^{j/2}·φ(2ʲx − m, 2ʲy − n)   (41)

ψⁱ_{j,m,n}(x, y) = 2^{j/2}·ψⁱ(2ʲx − m, 2ʲy − n), i = {H, V, D}   (42)

Here, H, V and D signify the horizontal, vertical and diagonal components of the wavelet transform. Thereafter, the DWT of a function f(x, y) of size M × N is given in Eqs. (43) and (44).

W_φ(j0, m, n) = (1/√(MN))·∑_{x=0}^{M−1} ∑_{y=0}^{N−1} f(x, y)·φ_{j0,m,n}(x, y)   (43)

Wⁱ_ψ(j, m, n) = (1/√(MN))·∑_{x=0}^{M−1} ∑_{y=0}^{N−1} f(x, y)·ψⁱ_{j,m,n}(x, y), i = {H, V, D}   (44)

Here, j0 is an arbitrary starting scale, W_φ(j0, m, n) signifies the approximation coefficients of f(x, y) at scale j0 and Wⁱ_ψ(j, m, n) adds the H, V and D coefficients for scales j ≥ j0. Further, the types of DWT used, namely daubechies 3 (db3), symlets 3 (sym3), biorthogonal 3.3 (bior3.3), biorthogonal 3.5 (bior3.5), biorthogonal 3.7 (bior3.7) and haar, each with H, V and D components, are given as:

(a) Daubechies (db3: db3H, db3V, db3D)

db3 is the third order wavelet filter with square modulus of the transfer function h. Consider P(y) = ∑_{k=0}^{N−1} C_k^{N−1+k}·yᵏ, wherein C_k^{N−1+k} signifies the binomial coefficients. Thus, the function m0(ω) is given in Eq. (45):

|m0(ω)|² = (cos²(ω/2))ᴺ·P(sin²(ω/2))   (45)

Here, m0(ω) = (1/√2)·∑_{k=0}^{2N−1} h_k·e^{−ikω}, the length of ψ and φ is 2N − 1 and the number of vanishing moments of ψ is N = 3.

(b) Symlets (sym3: sym3H, sym3V, sym3D)

sym3 is a third order wavelet filter with a few modifications of db3 to increase symmetry. The function m0 is re-used by considering |m0(ω)|² of Eq. (45) as a function W of z = e^{iω}. Here, W is of the form W(z) = U(z)·U(1/z), and the value of U is selected in such a way that all its roots have modulus greater than or equal to 1.

(c) Biorthogonal (bior3.3: bior3.3H, bior3.3V, bior3.3D; bior3.5: bior3.5H, bior3.5V, bior3.5D; bior3.7: bior3.7H, bior3.7V, bior3.7D)

bior is the extension of wavelets which considers two wavelets instead of one to overcome the incompatibility of symmetry and exact reconstruction. The first wavelet ψ is used for analysis and the second wavelet ψ̃ performs synthesis, with the coefficients given in Eqs. (46) and (47):

c_{j,k} = ∫ s(x)·ψ̃_{j,k}(x) dx   (46)

s = ∑_{j,k} c_{j,k}·ψ_{j,k}   (47)

Thus, the wavelets ψ and ψ̃ related by duality are given in Eqs. (48) and (49):

∫ ψ̃_{j,k}(x)·ψ_{j′,k′}(x) dx = 0 if j ≠ j′ or k ≠ k′   (48)

∫ φ̃_{0,k}(x)·φ_{0,k′}(x) dx = 0 if k ≠ k′   (49)

(d) haar (haarH, haarV, haarD)

The haar is the simplest wavelet, comprising the scaling function φ(t) and translating function ψ(t) given in Eqs. (50) and (51):

φ(t) = { 1, 0 ≤ t ≤ 1; 0, else }   (50)
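One level of the separable 2-D decomposition can be sketched for the haar case as follows. This is an illustrative sketch using a mean-based normalization (the orthonormal haar filter uses 1/√2), band-naming conventions vary, and even image dimensions are assumed:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a separable 2-D haar transform: approximation (A) plus
    three detail channels (here labelled H, V, D per one common convention).
    Rows are filtered first, then columns; dimensions must be even."""
    x = img.astype(float)
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2.0   # low-pass along rows
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2.0   # high-pass along rows
    A = (lo_r[0::2] + lo_r[1::2]) / 2.0      # low-low: approximation
    H = (hi_r[0::2] + hi_r[1::2]) / 2.0      # high along rows
    V = (lo_r[0::2] - lo_r[1::2]) / 2.0      # high along columns
    D = (hi_r[0::2] - hi_r[1::2]) / 2.0      # high along both: diagonal
    return A, H, V, D

def wavelet_energy(band):
    """Energy of a detail band, a typical scalar feature per channel."""
    return float((band ** 2).sum())
```

On a constant image all three detail bands vanish and only the approximation survives, which is a quick sanity check for any DWT implementation.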
Table 3
Ranking of features.

S.no. | Features | IG | GR | Correlation | Relief | Wrapper
1 | CDR | 1 | 1 | 2 | 2 | 1
2 | DDLS | 2 | 2 | 1 | 1 | 2
3 | homom | 3 | 8 | 8 | 9 | 7
4 | dvarh | 4 | 4 | 7 | 5 | 6
5 | dissi | 5 | 5 | 5 | 6 | 4
6 | contr | 6 | 6 | 6 | 7 | 3
7 | homop | 7 | 7 | 9 | 8 | 5
8 | denth | 8 | 3 | 10 | 10 | 10
9 | idmnc | 9 | 9 | 4 | 3 | 8
10 | indnc | 10 | 10 | 3 | 4 | 9

I(S) = −∑_{i=1}^{v} p_i·log2(p_i)   (54)

where p_i signifies the probability of a random sample. Now, for a specific feature 'S' with v distinct values taken into consideration, the entropy is computed using Eq. (55):

H(S) = ∑_{i=1}^{v} (|S_i|/|S|)·I(S_i)   (55)

Further, the information gained is given using Eq. (56) and its normalization is performed using Eq. (57):

Gain(S) = I(S) − H(S)   (56)

GR(S) = Gain(S) / SplitInfo(S), where SplitInfo(S) = −∑_{i=1}^{v} (|S_i|/|S|)·log2(|S_i|/|S|)   (57)
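Eqs. (54)–(56) can be sketched for one discrete feature as follows; a toy illustration of information gain, not the authors' implementation:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Class entropy I(S), Eq. (54)."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature_values, labels):
    """Information gain of one discrete feature, Eqs. (54)-(56):
    Gain(S) = I(S) - sum_i (|S_i|/|S|) * I(S_i)."""
    total = entropy(labels)
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)   # weighted H(S), Eq. (55)
    return total - cond
```

A feature that perfectly separates the two classes attains the maximum gain (1 bit for balanced binary labels), while an uninformative feature scores zero.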
R = (1/m′)·(W·V)   (61)

Here, m′ is the number of iterations. Similar to the above selection criteria, the features with a score of 0.59 or higher are considered significant, whereas the rest, with scores less than 0.59, are discarded. The ten significant features accordingly are DDLS, CDR, indnc, idmnc, dissi, contr, dvarh, homom, homop and denth, and their ranks are given in Table 3.

3.4.5. Wrapper selection [38]

Wrapper selection is an approach that evaluates a feature subset using an algorithm which employs a search strategy, such as forward feature selection, backward feature selection, exhaustive feature selection or bidirectional search, to explore the space of possible feature subsets. A greedy algorithm is evaluated on each subset to determine the best possible combination. The wrapper approach detects the interactions between variables and finds the optimal feature subset. In this case, the search strategy employed is bidirectional search due to its better input [39]. The extracted features are ranked and fed into the classifiers for categorization into normal and abnormal cases. The classifiers used in this work are discussed below.

3.5.1. Naïve bayes (NB)

The Naïve bayes (NB) classifier operates on feature vectors and labels drawn from a finite set. Also, it requires only a small amount of training data to estimate the parameters significant for classification. It can also be considered as a conditional probability model: for instance, a vector x = (x1, …, xn) with n features is assigned the probabilities t(CCk | x1, …, xn) for each of the k possible outcomes, the classes CCk. Thus, according to bayes theorem, the conditional probability is given using Eq. (62):

t(CCk | x) = t(CCk)·t(x | CCk) / t(x)   (62)

Here, t(x | CCk) specifies the likelihood, i.e. the probability of a feature 'x' in a class 'CCk', whereas t(CCk) is the prior probability of class 'CCk' and t(x) is the prior probability of feature 'x'. It can be seen from Eq. (62) that the denominator does not depend on CC and is thus effectively constant. Further, the numerator is identical to the joint probability model t(CCk, x1, …, xn), which can be written as given in Eq. (63) using the chain rule for conditional probability; under the naïve independence assumption it reduces to Eq. (65):

t(CCk | x1, …, xn) = (1/Z)·t(CCk)·∏_{i=1}^{n} t(xi | CCk)   (65)
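Eq. (65) can be sketched as a tiny categorical naïve bayes. This is an illustrative sketch, not the authors' implementation: Z is omitted since it is constant across classes, and no smoothing of the likelihoods is applied:

```python
from collections import defaultdict

def train_nb(X, y):
    """Estimate the priors t(CCk) and per-feature likelihoods t(xi | CCk)
    by counting, for discrete feature vectors X with labels y."""
    classes = sorted(set(y))
    priors = {c: y.count(c) / len(y) for c in classes}
    likes = {c: [defaultdict(float) for _ in X[0]] for c in classes}
    for xi, c in zip(X, y):
        for f, v in enumerate(xi):
            likes[c][f][v] += 1
    for c in classes:
        n_c = y.count(c)
        for f in range(len(X[0])):
            for v in likes[c][f]:
                likes[c][f][v] /= n_c          # normalize counts to t(xi | CCk)
    return priors, likes

def predict_nb(priors, likes, x):
    """argmax_k t(CCk) * prod_i t(xi | CCk); the evidence Z is dropped."""
    best, best_p = None, -1.0
    for c, prior in priors.items():
        p = prior
        for f, v in enumerate(x):
            p *= likes[c][f].get(v, 0.0)
        if p > best_p:
            best, best_p = c, p
    return best
```

In practice a small Laplace smoothing term is usually added so that unseen feature values do not zero out the whole product.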
3.5.2. Support vector machine (SVM) [41]

SVM uses an optimal linear hyperplane to separate the images in the data set in a feature space. The ideal hyperplane is achieved by maximizing the margin between the two sets. Thus, the final hyperplane depends on the training patterns at the border, known as support vectors. It mainly performs two operations: the first is the nonlinear mapping of the input vector into a high dimensional feature space, hidden from both input and output; the second is the construction of the ideal hyperplane for separation of the discovered features. It considers a vector x drawn from an input space of dimension m0; {φj(x)} for j = 1 to m1 signifies the non-linear transformations from the input space to the feature space, where m1 is the dimension of the feature space. Also, {wj} for j = 1 to m1 denotes the set of linear weights connecting the feature space to the output space, {φj(x)} represents the input supplied to weight wj through the feature space, b represents the bias, αi is the lagrange coefficient and di is the corresponding output. Further, the steps involved in designing the SVM are as follows:

Step 1: Define the hyperplane H acting as the decision surface. Here the constant C, determined by h, sets the trade-off between the hyperplanes and ensures that xi lies on the optimal hyperplane.

Step 4: Finally, the linear weight wo with the optimal values αo,i is given as:

wo = ∑_{i=1}^{N} αo,i·di·φ(xi)   (74)

Here, φ(xi) represents the image of xi in the feature space, while bo signifies the optimal bias. Finally, the points x whose decision value is ≥ 1 are assigned to one side of the hyperplane, and those with a value < 1 to the other.

3.5.3. Random forest (RF) [42]

RF is an ensemble learning approach for classification that performs computation by constructing a multitude of decision trees at training time and outputs the class that is the mode (or mean) of the classes of the individual trees. The decision tree is a popular approach for machine learning, meeting the need for an off-the-shelf process for data mining. The algorithm for training RF uses the general technique of bagging, or bootstrap aggregation, to make the trees learn. Now, for a training set X = x1, …, xn with labels Y = y1, …, yn, bagging iteratively (B times) chooses a random sample with replacement from the training set and fits trees to these samples, as given below:

i. For b = 1 to B

3.5.4. K nearest neighbor (k-NN)

For a point x′ in the d-dimensional space of the feature set X, the function fx′(x): ℝᵈ → ℝ is computed based on the euclidean metric fx′(x) = ‖x − x′‖. Further, the entire training set comprising the X samples is ordered with respect to this distance:

i. Choose the k minimum distances.
ii. Choose the most common class among these k distances.
iii. This common class is considered the class for the new item.
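The k-NN steps above can be sketched as:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """k-NN classification: rank the training samples by the euclidean
    metric ||x - x'||, take the k nearest, and majority-vote their classes."""
    d = np.linalg.norm(X_train - x_new, axis=1)   # f_{x'}(x) for every sample
    nearest = np.argsort(d)[:k]                   # indices of k minimum distances
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]             # the most common class
```

An odd k is usually chosen for two-class problems such as normal/abnormal so that the vote cannot tie.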
The first activation used here is the hyperbolic tangent, lying in the range -1 to 1, whereas the second is the logistic function, which lies in the range 0–1. Here, yi signifies the output of the ith node and vi represents the weighted sum of the input connections. As the MLP is an extensively connected architecture, each individual node in a single layer interconnects with weight wij to each node of the following layer. Learning is performed by changing the weights on the basis of corrections that minimize the error in the entire output using

ε(n) = (1/2) Σj ej²(n)   (80)

Here, ej(n) = dj(n) − yj(n) is the degree of error at an output node j for the nth sample of the training data, where d is the expected output and y is the predicted output. Further, the change in each weight using gradient descent is given as Δwji(n) = −η (∂ε(n)/∂vj(n)) yi(n), where yi = wij * xi + b (b being the bias) is the output of the previous neuron and η is the rate of learning, chosen to ensure that the weights converge to a response quickly without oscillations. The weight updation can be performed as wij = wij + η * ej * x. Now the derivative to be calculated depends on the induced local field vj, and the derivative for an output node can be computed using Eq.

(i) Accuracy: Accuracy is the proportion of correctly classified samples and is given as:

Accuracy = (Tp + Tn) / (Tp + Tn + Fp + Fn) × 100   (83)

(ii) Precision: Precision is the proportion of predicted positive labels that belong to the actual positive labels and is given as:

Precision = Tp / (Tp + Fp)   (84)

(iii) Sensitivity: Sensitivity is the proportion of correctly detected diseased occurrences to the total number of diseased occurrences and is given as:

Sensitivity = Tp / (Tp + Fn)
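The metrics defined above can be computed directly from the confusion-matrix counts Tp, Tn, Fp and Fn. A minimal sketch, with made-up counts for illustration:

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy, precision and sensitivity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100   # Eq. (83), in %
    precision = tp / (tp + fp)                          # Eq. (84)
    sensitivity = tp / (tp + fn)                        # true-positive rate
    return accuracy, precision, sensitivity

# Hypothetical counts for a 36-image test set (not the paper's data)
acc, prec, sens = evaluate(tp=17, tn=18, fp=1, fn=0)
print(round(acc, 1), round(prec, 2), round(sens, 2))  # prints: 97.2 0.94 1.0
```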
Table 6
Performance analysis for split method (first value: full set of 61 features; second value: reduced set of 10 features).
Classifier   Accuracy (in %)           Specificity                Precision                  Sensitivity
k-NN         83.3 ± 4.6 / 91.6 ± 3.3   0.83 ± 0.02 / 0.90 ± 0.02  0.84 ± 0.03 / 0.91 ± 0.03  0.83 ± 0.02 / 0.91 ± 0.02
NN           88.8 ± 5.2 / 94.4 ± 4.1   0.87 ± 0.03 / 0.93 ± 0.03  0.88 ± 0.04 / 0.95 ± 0.04  0.88 ± 0.02 / 0.94 ± 0.02
SVM          91.6 ± 3.3 / 97.2 ± 3.1   0.89 ± 0.02 / 0.96 ± 0.02  0.91 ± 0.03 / 0.97 ± 0.03  0.91 ± 0.01 / 0.97 ± 0.01
RF           88.8 ± 5.2 / 94.4 ± 4.1   0.87 ± 0.03 / 0.93 ± 0.03  0.88 ± 0.04 / 0.95 ± 0.04  0.88 ± 0.02 / 0.94 ± 0.02
NB           76.3 ± 4.9 / 89.6 ± 4.3   0.74 ± 0.03 / 0.89 ± 0.03  0.76 ± 0.03 / 0.89 ± 0.03  0.76 ± 0.01 / 0.88 ± 0.01
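The balanced 70/30 split behind the table above can be sketched as follows; the stratification helper and the random placeholder data are illustrative assumptions, not the authors' setup:

```python
import numpy as np

def stratified_split(X, y, train_frac=0.7, seed=0):
    """70/30 split keeping the normal/abnormal balance in each part,
    with no overlap between training and testing indices."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

# 20 normal + 20 abnormal placeholder samples with 10 features each
X = np.random.rand(40, 10)
y = np.array(['normal'] * 20 + ['abnormal'] * 20)
tr, te = stratified_split(X, y)
print(len(tr), len(te))  # prints: 28 12
```

Because the split is performed per class, both partitions keep the 50/50 normal-to-abnormal ratio described in Section 4.2.1.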
Fig. 9. Plots of Specificity, Precision, Sensitivity for different classifiers using split method.
Table 7
3-fold cross validation (first value: full set of 61 features; second value: reduced set of 10 features).
Classifier   Accuracy (in %)           Specificity                Precision                  Sensitivity
k-NN         79.9 ± 5.5 / 88.6 ± 5.1   0.76 ± 0.04 / 0.88 ± 0.04  0.78 ± 0.03 / 0.87 ± 0.03  0.78 ± 0.02 / 0.87 ± 0.02
NN           83.3 ± 4.6 / 93.1 ± 4.2   0.81 ± 0.03 / 0.92 ± 0.03  0.82 ± 0.03 / 0.91 ± 0.03  0.81 ± 0.02 / 0.91 ± 0.02
SVM          90.3 ± 3.8 / 96.5 ± 3.3   0.89 ± 0.02 / 0.95 ± 0.02  0.91 ± 0.02 / 0.96 ± 0.02  0.90 ± 0.01 / 0.95 ± 0.01
RF           83.3 ± 4.6 / 93.1 ± 4.2   0.81 ± 0.03 / 0.92 ± 0.03  0.82 ± 0.03 / 0.91 ± 0.03  0.81 ± 0.02 / 0.91 ± 0.02
NB           75.8 ± 5.8 / 85.3 ± 5.3   0.73 ± 0.04 / 0.85 ± 0.04  0.75 ± 0.04 / 0.84 ± 0.04  0.74 ± 0.03 / 0.84 ± 0.03
Table 8
5-fold cross validation (first value: full set of 61 features; second value: reduced set of 10 features).
Classifier   Accuracy (in %)           Specificity                Precision                  Sensitivity
k-NN         76.5 ± 6.1 / 85.6 ± 5.7   0.75 ± 0.04 / 0.84 ± 0.04  0.74 ± 0.03 / 0.84 ± 0.03  0.74 ± 0.02 / 0.83 ± 0.02
NN           79.2 ± 5.6 / 88.6 ± 5.1   0.78 ± 0.03 / 0.87 ± 0.03  0.78 ± 0.03 / 0.86 ± 0.03  0.77 ± 0.02 / 0.86 ± 0.02
SVM          87.7 ± 4.2 / 93.2 ± 3.9   0.87 ± 0.02 / 0.92 ± 0.02  0.86 ± 0.02 / 0.92 ± 0.02  0.86 ± 0.01 / 0.91 ± 0.01
RF           79.2 ± 5.6 / 88.6 ± 5.1   0.78 ± 0.03 / 0.87 ± 0.03  0.78 ± 0.03 / 0.86 ± 0.03  0.77 ± 0.02 / 0.86 ± 0.02
NB           72.9 ± 6.3 / 79.8 ± 6.1   0.71 ± 0.04 / 0.79 ± 0.04  0.70 ± 0.04 / 0.78 ± 0.04  0.71 ± 0.03 / 0.79 ± 0.03
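The 3-fold and 5-fold protocols behind the two tables above rotate the held-out fold so that every sample is tested exactly once. A minimal sketch of the fold construction (the generator and sample count are illustrative, not the authors' implementation):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Split sample indices into k shuffled folds; each fold serves as
    the test set once while the remaining k-1 folds form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

n = 30  # placeholder dataset size
for k in (3, 5):
    sizes = [len(test) for _, test in kfold_indices(n, k)]
    print(k, sizes)  # every sample is held out exactly once across the k folds
```

Shuffling before splitting serves the role the text describes: folds are chosen arbitrarily so no specific subset of samples biases the estimate.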
4.1. Parameter selection for classifiers

For the parameter selection of the classifiers, different parameters were varied to check for optimal performance. The parameters used in NB were "with kernel estimation" and "without kernel estimation"; the accuracy of NB "with kernel estimation" was 86.32% and that "without kernel estimation" was 89.65%. Further, the kernels used in SVM were the radial basis function, exp(-gamma*|u-v|^2), the polynomial function, (gamma*u*v + coef0)^degree, and the sigmoid function, tanh(gamma*u*v + coef0), where gamma = 1.25. The accuracy with the radial basis function was 94.4%, with the polynomial function 97.2%, and with the sigmoid function 94.4%. Thereafter, the parameter varied in the case of random forest was the number of trees (n), with values from 10 to 50. Accuracy at n = 10 was observed to be 91.6%, at n = 30 was
Fig. 10. Plots of Accuracy with cross validation method for different classifiers.
Fig. 11. Plots of Specificity, Precision, Sensitivity for different classifiers using 3-fold cross method.
Fig. 12. Plots of Specificity, Precision, Sensitivity for different classifiers using 5-fold cross method.
94.4% and at n = 50 was 90.2%. Thereafter, in the case of k-NN the value of k was varied from 3 to 7. The accuracy for k = 3 was 86.9%, k = 4 was 89.6%, k = 5 was 91.6%, k = 6 was 89.6% and k = 7 was 87.3%. Whereas, in the case of NN the number of hidden layers (l) was varied from 1 to 10; accuracy for l = 1–5 was 92.6%, l = 6–8 was 94.4% and l = 9–10 was 91.3%. Fig. 7 shows the accuracy of each classifier with changing parameters. Finally, Table 5 presents the optimal values used for classification.

4.2. Performance analysis

This section presents the analysis of the different classifiers, namely NB, SVM, RF, k-NN and NN, on two types of feature sets. First is the full set of 61 features consisting of structural and non-structural features. Second is the reduced set of 10 significant features shown in Table 3, extracted from the 61 features using the different ranking and selection criteria discussed in section 3.2. Further, the analysis has been performed with a split method and k-fold cross validation for both sets of features.

4.2.1. Split method
The dataset was divided into training and testing sets using a split method: 70% of the dataset was used for training and 30% for testing, without any overlap between the training and testing sets. Here, the 70% training images comprise 50% normal cases, i.e. ones without glaucoma, and 50% abnormal cases, i.e. ones with glaucoma. Similarly, the 30% testing images comprise 50% glaucoma cases and 50% non-glaucoma cases. Analysis was then performed for both feature sets using the different classifiers. Table 6 shows the values of accuracy, specificity, precision and sensitivity of the k-NN, NN, SVM, RF and NB classifiers for the full set of 61 features and the reduced set of 10 features. Further, Fig. 8 shows the plots of accuracy for the full set of 61 hybrid features and the reduced set of 10 hybrid features, while Fig. 9 presents the plots of specificity, precision and sensitivity for both sets.

Thus, from Table 6 and Figs. 8–9 it can be observed that the accuracy of k-NN for 61 features was 83.3% and that for 10 features was 91.6%, while the values of specificity, precision and sensitivity for 61 features were 0.83, 0.84 and 0.83; and those for 10 features were 0.90, 0.91 and 0.91.

Similarly, the accuracy of NN and RF for 61 features was 88.8% and that for 10 features was 94.4%. On the other hand, the values of specificity, precision and sensitivity for 61 features were 0.87, 0.88 and 0.88; and those for 10 features were 0.93, 0.95 and 0.94.

Further, the accuracy of SVM for 61 features was 91.6% and that for 10 features was 97.2%, while the values of specificity, precision and sensitivity for 61 features were 0.89, 0.91 and 0.91; and that for 10
Table 9
Comparison of proposed feature set with State-of-the-art feature sets.
Approach   Dataset Used   Classifier Used   Features extracted   Classification Accuracy (Split Approach)   Classification Accuracy (5-fold cross validation)
Fig. 13. Plots of accuracy for different feature sets on SVM classifier.
features were 0.96, 0.97 and 0.97.

Also, the accuracy of NB for 61 features was 76.3% and that for 10 features was 89.6%, whereas the values of specificity, precision and sensitivity for 61 features were 0.74, 0.76 and 0.76; and those for 10 features were 0.89, 0.89 and 0.88 respectively.

Thus, it can be concluded that the proposed set of reduced hybrid features outperforms the full set of 61 features for all the classifiers. Also, the SVM classifier is found to outperform the other classifiers with the highest accuracy for both sets of features.

4.2.2. K-fold cross validation method
The dataset is here divided into k parts, with k-1 parts used for training and the remaining part for testing during the first iteration. The complete process is iterated k times, where every fold is selected arbitrarily to avoid bias from selecting specific samples. Both 3-fold and 5-fold cross validation were performed on the entire dataset, and the analysis performed for both feature sets using the different classifiers is shown in Tables 7 and 8, followed by graph plots in Figs. 10–12.

On the basis of the experimental analysis in Tables 7–8 and Figs. 10–12, it can be observed that the accuracy of k-NN for 61 features was 79.9% with 3-fold cross validation and 76.5% with 5-fold cross validation, and that for 10 features was 88.6% with 3-fold and 85.6% with 5-fold cross validation. The values of specificity, precision and sensitivity for 3-fold cross validation with 61 features were 0.76, 0.78 and 0.78; and with 10 features were 0.88, 0.87 and 0.87. Similarly, for 5-fold cross validation with 61 features they were 0.75, 0.74 and 0.74; and with 10 features 0.84, 0.84 and 0.83.

Also, the accuracy of NN and RF for 61 features was observed to be 83.3% with 3-fold cross validation and 79.2% with 5-fold cross validation, and that for 10 features was 93.1% with 3-fold and 88.6% with 5-fold cross validation. The values of specificity, precision and sensitivity for 3-fold cross validation with 61 features were 0.81, 0.82 and 0.81; and with 10 features were 0.92, 0.91 and 0.91. Similarly, for 5-fold cross validation with 61 features they were 0.78, 0.78 and 0.77; and with 10 features 0.87, 0.86 and 0.86.

Thereafter, the accuracy of SVM for 61 features was 90.3% with 3-fold cross validation and 87.7% with 5-fold cross validation, and that for 10 features was 96.5% with 3-fold and 93.2% with 5-fold cross validation. The values of specificity, precision and sensitivity for 3-fold cross validation with 61 features were 0.89, 0.91 and 0.90; and with 10 features were 0.95, 0.96 and 0.95. Similarly, for 5-fold cross validation with 61 features they were 0.87, 0.86 and 0.86; and with 10 features 0.92, 0.92 and 0.91.

Finally, the accuracy of NB for 61 features was observed to be 75.8% with 3-fold cross validation and 72.9% with 5-fold cross validation, and that for 10 features was 85.3% with 3-fold and 79.8% with 5-fold cross validation. The values of specificity, precision and sensitivity for 3-fold cross validation with 61 features were 0.73, 0.75 and 0.74; and with 10 features were 0.85, 0.84 and 0.84. Similarly, for 5-fold cross validation with 61 features they were 0.71, 0.70 and 0.71; and with 10 features 0.79, 0.78 and 0.79. Thus, it can be concluded that the proposed set of reduced hybrid features outperforms the full set of 61 features for all the classifiers. Also, the SVM classifier is found to outperform the other classifiers with the highest accuracy for both sets of features.

Now, as the SVM classifier is found to outperform all other classifiers, a comparison of the state-of-the-art feature sets with the proposed feature set
for the SVM classifier is presented in Table 9 and Fig. 13, with plots of accuracies to analyse the proposed feature set comprising 10 features.

On the basis of the results presented in Table 9 and Fig. 13, it can be observed that the proposed feature set gives an accuracy of 97.22% using the split method and 93.2% using 5-fold cross validation on the SVM classifier. Thus, the proposed feature set gives higher accuracy than the other state-of-the-art feature sets, and is the most suitable for classification of retinal images as 'normal' or 'abnormal'.

5. Conclusion

This work presented a machine learning based classification using a reduced set of proposed hybrid features, comprising CDR, DDLS, homom, dvarh, dissi, contr, homop, denth, idmnc and indnc, extracted from a set of 61 hybrid features for better diagnosis of glaucoma using retinal images. In the existing studies, the features, classifiers and datasets used vary from study to study. Also, very few studies have employed feature selection using appropriate ranking schemes. Thus, this study extracts all possible feature sets based on the state of the art, extracts the significant features using five feature ranking schemes and performs classification with varied classifiers on the same dataset. The methodology used comprises pre-processing to remove outliers; segmentation of the optic disc and optic cup to find CDR and DDLS; feature extraction to extract all the possible features; feature ranking/selection to retain only the significant features; followed by classification of retinal images as 'normal' or 'abnormal' based on training with features as input and labels as output. A total of 61 features consisting of structural and non-structural features such as DDLS, CDR, GLRM, GLCM, HOS, FoS, HOC and Wavelets were extracted from the input retinal images. These features were then reduced using IG, GR, Correlation, relief and wrapper approaches, and were fed into k-NN, NN, SVM, RF and NB for classifying input images as abnormal or normal. Based on the experimental analysis, the proposed feature set was found to outperform the full set of 61 features on all classifiers. Also, the ranking/selection reduced the training time of the classifiers. Further, SVM was found to be the most suitable classifier, as it offered better accuracy than k-NN, NN, RF and NB. Thus, the study conducted can also be used as a second opinion by medical practitioners in glaucoma diagnosis for large-scale clinical testing. In the near future, it can be further extended for automation to improve accuracy by deep learning approaches with a larger number of images.

Author statement

It is certified that all the authors have contributed equally to the manuscript.

Declaration of Competing Interest

The authors report no declarations of interest.