Deep Learning For Weak Supervision of Diabetic Retinopathy Abnormalities
All content following this page was uploaded by Harshit Pande on 08 February 2020.
ABSTRACT
Deep learning-based grading of fundus images of the retina is an active area of research. Existing studies use different deep learning architectures on different datasets, and the results of some studies could not be replicated in other studies. A benchmarking study across multiple architectures, spanning both classification and localization, is therefore needed. We present a comparative study of state-of-the-art architectures trained on a proprietary dataset and tested on the publicly available Messidor-2 dataset. Although evidence is of utmost importance in AI-based medical diagnosis, most studies limit themselves to classification performance and do not quantify the performance of abnormality localization. To alleviate this, we also report a comparison of localization scores for the different architectures using class activation maps. For classification, we found that models perform better as the number of parameters increases, with NASNet yielding the highest accuracy and average precision, recall, and F1-scores of around 95%. For localization, VGG19 outperformed all the models with a mean Intersection over Minimum of 0.45. We also found that there is a trade-off between classification performance and localization performance: as the models get deeper, their receptive field increases, causing them to perform well on classification but underperform on the localization of fine-grained abnormalities.

Index Terms— Diabetic retinopathy, Messidor-2, Abnormality localization, Class activation maps

1. INTRODUCTION

Diabetic Retinopathy (DR) is one of the leading causes of preventable blindness, with an estimated 347 million [1] diabetics worldwide. Globally, 1.9% of moderate to severe vision loss and 2.6% of total blindness is caused by DR [2]. DR is broadly classified into two categories – non-referable and referable DR. Referable DR pertains to more than mild DR severity. There are many granular features like microaneurysms (MA), hard exudates (HE), and superficial hemorrhages, along with higher-level features like new vessels on disc (NVD) or new vessels elsewhere (NVE) and pre-retinal or vitreous hemorrhages, that an ophthalmologist takes into account while viewing a fundus image. For this study, we have classified the images into three classes based on their severity – non-referable DR (NRDR), non-proliferative DR (NPDR), and proliferative DR (PDR).

Fig. 1: Diabetic Retinopathy abnormalities

Automated detection of DR is an active area of research among computer vision researchers. Gulshan et al. [3] use an InceptionV3 [4] model pre-trained on ImageNet [5]. For training, they use a proprietary EyePACS-1 dataset and report the performance on the Messidor-2 [6, 7] dataset. A replication attempt conducted by Voets et al. [8] on the Kaggle DR dataset, which is a different version of the EyePACS-1 dataset, reports an inability to replicate the original results of Gulshan et al. [3]. Voets et al. [8] report an area under the receiver operating characteristic curve (AUC) of 0.94 on Kaggle EyePACS and 0.80 on Messidor-2, while Gulshan et al. [3] report an AUC of 0.99 on both EyePACS and Messidor-2.

Deep learning models have yielded state-of-the-art results in classification tasks, but the reasoning behind a particular prediction made by a model has not been much studied. The work by Zhou et al. [9] has played a crucial role in localization through weak label annotations only, using class activation maps (CAMs) computed from the global average pooling (GAP) layer. For DR, Gargeya et
A proprietary dataset of 10,274 images, captured with multiple different cameras and collected from Narayana Netralaya (NN), C.L. Gupta Eye Institute (CLGEI), and Sai Retinal Foundation (SRF) in India, is used for training. It is annotated into 3 classes, namely referable non-proliferative DR (NPDR), non-referable DR (NRDR), and proliferative DR (PDR), by a panel of 3 annotators having more than 5 years of experience in the retina sub-specialty. The ground truth label for an image is assigned if two or more annotators give it the same label. Images with disagreement among all three annotators are discarded and not used in training. The final distribution of the training data is given in Table 1. The results of all the models are reported on the Messidor-2 dataset annotated by a panel of 2 annotators. The dataset has 1748 images, out of which consensus is achieved for 1327 images for the above-mentioned 3 classes. The class distribution is given in Table 2. The abnormalities pertaining to DR are annotated by a single annotator to evaluate the localization capabilities of the models. Only 240 images belonging to the abnormal classes are annotated. The distribution of abnormalities in these 240 images is shown in Table 3.

Label                         Count
Non-Proliferative DR (NPDR)   3564
Non-Referable DR (NRDR)       4630
Proliferative DR (PDR)        2080

Table 1: Dataset distribution used for training

As a part of the pre-processing steps, a background image, Ib, is created by applying a Gaussian blur with a standard deviation of 30 on a fundus image Ic. The edge image, Ie, is created by subtracting Ib from Ic. The edge image Ie is resized to 512×512 using bi-cubic interpolation, and the image is rescaled to the range 0 to 1. During training, random rotations in the range 0°–360°, random horizontal and vertical flips, and a zoom range of [0.8, 1.2] are applied. To compensate for the varying illumination of images, gamma correction with a gamma range of [0.3, 1.6] and a gain of 1.0 is applied with a probability of 0.5. Since there is a skew towards the NPDR class, the PDR class is balanced by randomly repeating images to ensure that the numbers of images in the two abnormal classes are comparable. The dataset in Table 1 is split into 85% training and 15% validation in a stratified manner for each class.

3. BENCHMARKING METHODOLOGY

Transfer learning through Keras with models pre-trained on ImageNet [5] is used to train VGG16, VGG19 [12], InceptionV3 [4], InceptionResNetV2 [13], Xception [14], DenseNet121 [15], ResNet50 [16], and NASNet [17]. All the models are trained and tested on a machine with a 12-core CPU, 110 GB RAM, and an Nvidia K80 GPU.

For all the models, the flattening layer after the last convolutional layer is replaced with a global average pooling (GAP) layer to reduce the number of parameters and to aid localization via class activation maps, as done by Zhou et al. [9]. Adam with a learning rate of 5e-5 and a batch size of 16, with categorical cross-entropy as the loss function, is used. While training, the best model is saved on the basis of minimum validation loss. The models are evaluated on precision, recall, and F1-scores of all three classes for classification benchmarking.
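As a minimal sketch (not the authors' code) of the pre-processing and training setup described above: the edge-image computation followed by a backbone with its flattening top replaced by GAP and a 3-class softmax head. SciPy and TensorFlow/Keras are assumed; `weights=None` keeps the sketch light, whereas the benchmark uses ImageNet pre-trained weights, and VGG16 stands in for any of the eight backbones.

```python
import numpy as np
from scipy import ndimage
import tensorflow as tf
from tensorflow.keras import layers, models


def preprocess(fundus):
    """Edge image as described above: Ib = GaussianBlur(Ic, sigma=30),
    Ie = Ic - Ib, resized to 512x512 (cubic) and rescaled to [0, 1]."""
    i_c = fundus.astype(np.float64)
    # Background image Ib: Gaussian blur with standard deviation 30,
    # applied per colour channel (sigma 0 on the channel axis).
    i_b = ndimage.gaussian_filter(i_c, sigma=(30, 30, 0))
    i_e = i_c - i_b                      # edge image Ie = Ic - Ib
    # Resize to 512x512 with cubic (order=3) spline interpolation.
    zoom = (512 / i_e.shape[0], 512 / i_e.shape[1], 1)
    i_e = ndimage.zoom(i_e, zoom, order=3)
    # Rescale to the range [0, 1].
    return (i_e - i_e.min()) / (i_e.max() - i_e.min() + 1e-8)


# Backbone without its top; the flattening layer is replaced by GAP.
backbone = tf.keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(512, 512, 3))

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),        # GAP enables CAMs [9]
    layers.Dense(3, activation="softmax"),  # NRDR / NPDR / PDR
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# During training, the checkpoint with minimum validation loss would
# be kept, e.g. with tf.keras.callbacks.ModelCheckpoint(
#     "best.h5", monitor="val_loss", save_best_only=True).
```

With this head, the VGG16 variant has 14,716,227 parameters, matching the Pr column of Table 4.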
To validate the localization capabilities of classifica-
tion models, following Wang et al. [18] we compute the
mean Intersection over Minimum area (equation 1) with
respect to each type of abnormality. The ground truth con-
tours are created from the boundary of the abnormalities
marked by the annotators and the predicted contours are gen-
erated from the boundary of CAMs thresholded at a value of
1.5 ⇥ mean(CAM ).
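The thresholding step can be sketched as follows, assuming the CAM is a 2-D array upsampled to image size:

```python
import numpy as np


def cam_to_mask(cam):
    """Binarise a class activation map at 1.5 x its mean value; the
    predicted contours are taken from the boundary of this mask."""
    return cam >= 1.5 * cam.mean()
```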
mIoM = (1/n) Σ_{i=1}^{n} mIoM_i        (1)

where mIoM_i is the mean Intersection over Minimum area for the i-th predicted contour, and n is the total number of predicted contours.

Fig. 2: (a) Inception-ResNet-V2, (b) VGG19. The heatmap generated by Inception-ResNet-V2 (2a) is slightly shifted due to its large receptive field, as can be observed from the red contours in comparison with the ground truth in blue contours. On the other hand, VGG19 (2b) has precise contours.
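Equation (1) can be sketched on binary masks rasterised from the contours. Scoring each predicted contour against its best-overlapping ground-truth contour is an illustrative assumption here; the exact pairing follows Wang et al. [18].

```python
import numpy as np


def intersection_over_minimum(pred, gt):
    """IoM of two binary masks: intersection area divided by the
    smaller of the two contour areas."""
    inter = np.logical_and(pred, gt).sum()
    return inter / min(pred.sum(), gt.sum())


def miom(pred_masks, gt_masks):
    """Equation (1): average the per-contour IoM over all n predicted
    contours (best-overlap pairing assumed for illustration)."""
    per_contour = [max(intersection_over_minimum(p, g) for g in gt_masks)
                   for p in pred_masks]
    return float(np.mean(per_contour))
```

A contour fully contained in a larger one scores 1.0, which is why IoM, unlike IoU, does not penalise a coarse CAM that covers a small lesion.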
Model               NPDR             NRDR             PDR              Weighted Average  A     Pr
                    P    R    F1     P    R    F1     P    R    F1     P    R    F1
DenseNet-121        0.70 0.84 0.76   0.97 0.92 0.94   0.45 0.86 0.59   0.92 0.90 0.91   0.90  7,040,579
Inception-ResNetV2  0.76 0.88 0.82   0.98 0.95 0.96   0.79 0.86 0.83   0.94 0.93 0.94   0.94  54,341,347
Inception-V3        0.83 0.79 0.82   0.97 0.97 0.97   0.62 0.95 0.75   0.94 0.94 0.94   0.93  21,808,931
NASNet              0.83 0.87 0.85   0.97 0.96 0.97   0.77 0.91 0.83   0.95 0.95 0.95   0.95  84,928,917
ResNet50            0.68 0.89 0.77   0.98 0.90 0.94   0.40 0.86 0.55   0.92 0.90 0.90   0.90  23,593,859
VGG16               0.83 0.84 0.84   0.98 0.96 0.97   0.56 0.82 0.67   0.95 0.94 0.94   0.94  14,716,227
VGG19               0.78 0.87 0.82   0.98 0.95 0.96   0.59 0.73 0.65   0.94 0.93 0.94   0.93  20,025,923
Xception            0.73 0.91 0.81   0.99 0.92 0.95   0.51 0.91 0.66   0.94 0.92 0.92   0.92  20,867,627

Table 4: Classification metrics. Precision (P), Recall (R), F1-score (F1), Accuracy (A), and parameter count (Pr) for Non-Proliferative DR (NPDR), Non-Referable DR (NRDR), and Proliferative DR (PDR).
Model               Deep-H  Hard-    Micro-     NVD-NVE-  Others  Scar  Soft-    Sub-       Super-     wt.Avg
                            Exudate  aneurysm   Fibrosis                Exudate  Hyaloid-H  ficial-H
DenseNet-121        0.24    0.56     0.12       0.50      0.37    0.36  0.45     0.35       0.30       0.33
Inception-ResNetV2  0.19    0.27     0.07       0.37      0.32    0.20  0.28     0.10       0.19       0.20
Inception-V3        0.23    0.42     0.10       0.52      0.14    0.24  0.23     0.28       0.23       0.25
NASNet              0.18    0.53     0.14       0.43      0.28    0.17  0.17     0.42       0.18       0.23
ResNet50            0.20    0.47     0.07       0.40      0.21    0.30  0.42     0.27       0.26       0.27
VGG16               0.42    0.56     0.13       0.26      0.41    0.55  0.48     0.50       0.43       0.44
VGG19               0.42    0.65     0.14       0.32      0.49    0.46  0.51     0.51       0.46       0.45
Xception            0.30    0.51     0.12       0.51      0.21    0.27  0.22     0.25       0.28       0.29

Table 5: mIoM of abnormality localization; H - Hemorrhage; wt.Avg - Weighted Average (support in Table 3).
Fig. 3: Green contours are marked by the annotator and white contours are predicted by models
6. REFERENCES

[1] Sebahat Atalikoğlu Başkan and Mehtap Tan, "Research of type 2 diabetes patients' problem areas and affecting factors," Journal of Diabetes Mellitus, vol. 07, no. 03, pp. 175–183, 2017.

[2] Rupert R. A. Bourne, Gretchen A. Stevens, Richard A. White, Jennifer L. Smith, Seth R. Flaxman, Holly Price, Jost B. Jonas, Jill Keeffe, Janet Leasher, Kovin Naidoo, Konrad Pesudovs, Serge Resnikoff, and Hugh R. Taylor, "Causes of vision loss worldwide, 1990–2010: a systematic analysis," The Lancet Global Health, vol. 1, no. 6, pp. e339–e349, Dec. 2013.

[3] V. Gulshan, L. Peng, M. Coram, et al., "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," JAMA, vol. 316, no. 22, pp. 2402–2410, 2016.

[4] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna, "Rethinking the inception architecture for computer vision," CoRR, vol. abs/1512.00567, 2015.

[5] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.

[6] Etienne Decencière, Xiwei Zhang, Guy Cazuguel, Bruno Lay, Béatrice Cochener, Caroline Trone, Philippe Gain, Richard Ordonez, Pascale Massin, Ali Erginay, Béatrice Charton, and Jean-Claude Klein, "Feedback on a publicly distributed image database: the Messidor database," Image Analysis & Stereology, vol. 33, no. 3, pp. 231, Aug. 2014.

[7] G. Quellec, M. Lamard, P. M. Josselin, G. Cazuguel, B. Cochener, and C. Roux, "Optimal wavelet transform for the detection of microaneurysms in retina photographs," IEEE Transactions on Medical Imaging, vol. 27, no. 9, pp. 1230–1241, Sep. 2008.

[8] Mike Voets, Kajsa Møllersen, and Lars Ailo Bongo, "Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," CoRR, vol. abs/1803.04337, 2018.

[9] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in CVPR, 2016.

[10] Rishab Gargeya and Theodore Leng, "Automated identification of diabetic retinopathy using deep learning," Ophthalmology, vol. 124, no. 7, pp. 962–969, Jul. 2017.

[11] Zhiguang Wang and Jianbo Yang, "Diabetic retinopathy detection via deep convolutional networks for discriminative localization and visual explanation," in AAAI Workshops, 2018.

[12] Karen Simonyan and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.

[13] Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," CoRR, vol. abs/1602.07261, 2016.

[14] François Chollet, "Xception: Deep learning with depthwise separable convolutions," CoRR, vol. abs/1610.02357, 2016.

[15] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger, "Densely connected convolutional networks," CoRR, vol. abs/1608.06993, 2016.

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015.

[17] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le, "Learning transferable architectures for scalable image recognition," CoRR, vol. abs/1707.07012, 2017.

[18] Zhe Wang, Yanxin Yin, Jianping Shi, Wei Fang, Hongsheng Li, and Xiaogang Wang, "Zoom-in-Net: Deep mining lesions for diabetic retinopathy detection," CoRR, vol. abs/1706.04372, 2017.