Professional Documents
Culture Documents
Artificial Intelligence in Diabetic Eye Disease Screening
Artificial Intelligence in Diabetic Eye Disease Screening
timely treatment and vision loss prevention. However, interpretation of well as approval and registration of a fundus camera device for
retinal images requires specialized knowledge and expertise in diabetic DR screening using DL technology by the US Food and Drug
eye disease. Furthermore, current DR screening programs are capital- and Administration (FDA).17 A study by Google Health, in particular,
labor-intensive, which makes it difficult to rapidly scale up and expand has drawn substantial media publicity, showing extremely high
diabetic eye screening to meet the needs of this growing global epidemic. sensitivity and specificity in detecting referable DR from fundus
Deep learning (DL), a new branch of machine learning technology photographs.3 These developments have been cited in mainstream
under the broad term of artificial intelligence (AI), has made remarkable media and editorials in non-ophthalmology literature.18–21
breakthrough in medical imaging in particular for pattern recognition and Despite the promise and hype, there remains significant
image classification. In ophthalmology, AI and DL technology has been challenges in the actual implementation of DL in DR screening,
developed from big image datasets in assessment of retinal photographs and translating DL research into direct clinical applications of
for detection and screening of DR as well as the segmentation and screening in a community setting. This review will focus on
assessment of OCT images for diagnosis and screening of DME. This the current progress and the development of using AI and DL
review aimed to summarize the current progress and the development of technology for DR screening.
using AI and DL technology for diabetic eye disease screening as well
as current challenges in the actual implementation of DL in screening
programs, and translating DL research into direct clinical applications of WHY IT IS IMPORTANT TO SCREEN FOR DR?
screening in a community setting. Screening for DR is based on a high level of evidence. By
2040, 642 million people worldwide is projected to have diabetes
Key Words: artificial intelligence, deep learning, diabetic retinopathy, mellitus (DM).22 This is a global epidemic with heavy health
optical coherence tomography, screening burden to individuals and societies across the world, affecting
populations not only in highly developed countries, but also
(Asia Pac J Ophthalmol (Phila) 2019;8:158–164) increasingly in developing countries.22–24 Of note, DR is already
the most specific microvascular complication of DM that leads to
progressive visual impairment and even blindness. The prevalence
Copyright © 2019 Asia-Pacific Academy of Ophthalmology. Unauthorized reproduction of this article is prohibited.
Asia-Pacific Journal of Ophthalmology • Volume 8, Number 2, March/April 2019 AI in Diabetic Eye Disease Screening
Copyright © 2019 Asia-Pacific Academy of Ophthalmology. Unauthorized reproduction of this article is prohibited.
TABLE 1. A Summary of Recent Deep Learning Systems for Screening DR and DME
Size of Dataset
Cheung et al
Output(s) of DR/ Deep Learning Size of Dataset for for External Sensitivity for Specificity for AUC for Output of
Study Year Imaging DME Techniques Development Validation Other Outputs Output of DR/DME Output of DR/DME DR/DME
Abràmoff et 2016, Fundus mtmDR (defined AlexNet and Messidor-2: 1748 819 subjects Sufficient image 87.2% 90.7% Not reported
160 | www.apjo.org
al2,48 2018 photograph as ETDRS level VGG network images recruited from 10 quality: yes/no
≥35 and/or DME architectures primary care sites
present)
Gulshan et al3 2016 Fundus Referable DR Inception-V3 128,175 images EyePACS-1: 9963 All-cause referable EyePACS-1: 97.5% EyePACS-1: 93.4% EyePACS-1: 0.991
photograph architecture images Messidor-2: 96.1% Messidor-2: 93.9% Messidor-2: 0.990
Messidor-2: 1748
images
Gargeya and 2017 Fundus Any stage of DR, Customized CNNs 75,137 images Messidor-2: 1748 Nil Messidor-2 (no DR Messidor-2 (no DR Messidor-2 (no DR
Leng4 photograph mild DR images vs any stage of DR): vs any stage of DR): vs any stage of DR):
E-Ophtha: 405 93% 87% 0.94
images Messidor-2 (no DR Messidor-2 (no DR Messidor-2 (no DR
vs mild DR): 74% vs mild DR): 80% vs mild DR): 0.83
E-Ophtha (no DR vs E-Ophtha (no DR vs E-Ophtha (no DR vs
mild DR): 90% mild DR): 94% mild DR): 0.95
Ting et al1 2017 Fundus Referable DR and Adapted VGGNet DR: 76,370 images Primary: 71,896 Referable possible Referable DR: Referable DR: Referable DR:
photograph VTDR architecture Possible glaucoma: images glaucoma 90.5% 91.6% 0.936
125,189 images 10 additional Referable AMD VTDR: 100% VTDR: 91.1% VTDR: 0.958
AMD: 72,610 datasets: 40,752
images images
Li et al5 2018 Fundus Vision-threatening Inception V3 71,043 images 35,201 images Nil 92.5% 98.5% 0.955
photograph referable DR architecture
Kanagasingam 2018 Fundus DR Inception V3 30,000 images 193 subjects Nil 100% 92% Nil
et al6 photograph architecture
Kermany et al14 2018 OCT DME Inception V3 108,312 images 1000 images CNV; drusen; Multiclass model: Multiclass model: Multiclass model:
architecture and urgent referral 97.8% 97.4% 0.999
transfer learning Limited model: Limited model: Limited model:
96.6% 94.0% 0.988
Binary model: Binary model: Binary model:
96.8% 99.6% 0.999
De Fauw et al15 2018 OCT Diagnosis 3D U-Net 877 segmented 997 subjects Referral suggestion, Not reported Not reported 0.990
probability of architecture for scans for the tissue volume
macular edema segmentation segmentation of drusen and
network network; 14,884 epiretinal
Customized CNNs scans with membrane,
for classification diagnoses and and diagnosis
Copyright © 2019 Asia-Pacific Academy of Ophthalmology. Unauthorized reproduction of this article is prohibited.
network referral decisions probability of
for classification other retinal
network abnormalities
Li et al49 2019 OCT DME VGG-16 network 109,312 images 1000 images CNV; drusen 98.8% 98.8% 0.999
architecture
AMD indicates age-related macular degeneration; AUC, area under the curve; CNN, convolutional neural network; CNV, choroidal neovascularization; DME, diabetic macular edema; DR, diabetic retinopathy; ETDRS,
Early Treatment Diabetic Retinopathy Study; mtmDR, more-than-mild diabetic retinopathy; OCT, optical coherence tomography; VGG, Visual Geometry Group; VTDR, vision-threatening diabetic retinopathy.
2. A major development in DL for DR detection was the study sensitivity (>90%), specificity (>73.3%) and AUC (>0.89)
by Gulshan et al3 from Google Healthcare which developed for identifying other referable eye conditions.
a DL system for detecting referable DR using 128,175
retinal photographs which were graded 3 to 7 times for DR,
DME, and image gradability by a panel of 54 US-licensed DL IN OCT FOR DETECTING DME
ophthalmologists and ophthalmology senior residents. The A major issue in DR screening is the detection of
DL algorithm achieved a high sensitivity (≥87%), specificity DME, which is not easily captured from 2-dimensional (2D)
(≥90%) and area under the receiver operating characteristic fundus photographs.35,38 Thus, recent studies have focused on
curve (AUC) (≥0.99) in the external validation using 2 demonstrating that DL algorithms can be trained using OCT
public databases (EyePACS-1: n = 9963 and Messidor-2: images to detect DME and other retinal diseases. Kermany et al14
n = 1748); both graded by at least 7 US board-certified firstly applied DL and transfer learning techniques in the detection
ophthalmologists.3 Other groups have reported similar of DME as well as choroidal neovascularization, drusen, and
results. Gargeya and Leng4 developed a DL algorithm for normal from 2D OCT images. Two models (multiclass model
detecting DR (any DR and mild DR) using 75,137 retinal and binary model) were firstly developed using a training dataset
photographs obtained from the EyePACS public dataset. of 108,312 2D OCT images and a “limited model” was further
The DL algorithm has further been evaluated using 2 other trained using 1000 2D OCT images randomly selected from each
public databases (Messidor-2: n = 1748 and E-Ophtha: n = class during training to compare transfer learning performance
405) with a sensitivity of ≥74%, specificity of ≥80%, and using limited data compared with results using a large dataset.14
AUC of ≥0.83.4 Li et al5 developed another DL algorithm In the validation, they evaluated the performance of the models
and internally validated it using 71,043 retinal photographs of a dataset of 1000 2D OCT images and found that both models
acquired from a total of 36 hospital ophthalmology achieved high performance even in the limited model (sensitivity
departments, optometry clinics, and screening settings ≥96%, specificity ≥94%, and AUC ≥0.99).14 This study
in China. In the external validation, the DL algorithm highlighted the application of transfer learning in conditions with
was evaluated using 35,201 retinal photographs from small datasets.
population-based cohorts of Malay, Caucasian Australians, Another major study was performed by Google’s Deepmind
and Indigenous Australians with a sensitivity of 92.5%, a in collaboration with Moorfields Eye Hospital, where the group
specificity of 98.5% and an AUC of 0.955.5 Kanagasingam applied a combined 3D segmentation and classification networks
et al6 developed a DL algorithm based on several training for interpreting 3D OCT images in predicting tissue volumes and
dataset including DiaRetDB1, EyePACS, and the Australian identifying and referring different retinal abnormalities.15 They
Tele-eye care DR database. They deployed the DL algorithm developed the DL algorithm using 877 segmented scans for
prospectively and evaluated the performance in 193 patients the segmentation network and 14,884 scans from 7621 patients
at a primary care clinic in Australia with a specificity of 92% with multiple clinical diagnoses and referral decisions for
and a positive predictive value of just 12% driven by the low classification network. The classification network learned to map
incidence of DR in their cohort (2 of 193).6 a segmentation map to 4 referral decisions (urgent, semi-urgent,
3. A third major development addresses a major gap in routine, and observation only) and 10 different retinal pathologies
previous studies in which the DL algorithms were not trained (macular retinal edema, choroidal neovascularization, full-
to detect other common sight-threatening eye conditions, thickness macular hole, partial-thickness macular hole, epiretinal
such as glaucoma and AMD. To address this gap, Ting et al1 membrane, geographic atrophy, drusen, vitreomacular traction,
developed a DL system to screen for possible glaucoma and central serous retinopathy, and normal). Their framework
AMD, in addition to referable DR and VTDR. The DL system (predicting tissue volumes, identifying retinal abnormalities,
was composed of 8 CNNs, all using an adaptation of the and providing referral suggestions) closely matched the clinical
VGGNet architecture: (i) an ensemble of 2 networks for the decision-making process, separating judgments about the scan
classification of DR severity; (ii) an ensemble of 2 networks itself from the subsequent referral decision, allowing a clinician
for the identification of referable possible glaucoma; (iii) an to inspect and visualize an interpretable segmentation, rather than
ensemble of 2 networks for the identification of referable simply being presented with a diagnosis and referral suggestion.
AMD; (iv) 1 network to assess image quality; and (v) 1
network to reject invalid non-retinal images. The DL system
was validated using retinal photographs collected from VISUALIZING DL AND EXPLAINABILITY
the Singapore Integrated Diabetic Retinopathy Program A key challenge for clinical adoption of DL algorithms is
(SIDRP) and from 10 additional multiethnic datasets from a major mindset shift in how clinicians entrust clinical care to
different countries with diverse community- and clinic- machines. The issue of ‘black box’ problem has been identified as
based populations with DM.1 In the primary validation an impediment to the application of DL in healthcare.53 Previous
dataset of 71,896 images from 17,974 patients, the AUC reports have already performed visualization heat map analysis to
of the DL system was 0.936 for referable DR, 0.958 for represent intuitively the learning procedure of the CNNs for DR
VTDR, 0.941 for glaucoma suspect, and 0.931 for referable and DME.4,14 Recently, Keel et al54 conducted a study to analyze
AMD. The DL system had AUCs between 0.889 and 0.983 the CNN learning procedure for 2 validated DL systems including
for referable DR on 10 additional datasets (n = 40,752 one for the detection of referable DR. They developed and applied
images). In the secondary validation datasets, the DL system a CNN-independent adaptive kernel visualization technique for
had AUCs between 0.889 and 0.983 for referable DR (n = visualizing the learning procedure of the networks. Using this
40,752 images). The DL system could also achieve high method, discriminative image regions were highlighted when the
Copyright © 2019 Asia-Pacific Academy of Ophthalmology. Unauthorized reproduction of this article is prohibited.
Cheung et al Asia-Pacific Journal of Ophthalmology • Volume 8, Number 2, March/April 2019
classification possibility output of being diagnosed was greater screening programs (Fig. 1B). By using this screening model, the
than 70%, and a heat map was then generated highlighting highly sensitivity of the DL system may need to be set higher in order to
prognostic regions in the input image. They found that typical minimalize false-negative cases.
referable DR lesions (retinal hemorrhage, exudate, intraretinal
microvascular abnormality, venous beading) could be identified
as the most important prognostic regions in 96% of true-positive CURRENT CHALLENGES
referable DR cases whereas nontraditional fundus regions (eg, The performance of DL algorithms is extremely promising
optic disc and fundus areas adjacent to vessels) were identified in as seen from current literature. However, there are still several
85% of false-positive referable DR cases. This study may further challenges and complexities that limit the ability to move forward
promote the clinical adoption of using DL algorithm and enable quickly.
clinicians to understand important exposure variables in real time. First, as mentioned above, the “black box” approach is still
unacceptable. Deep learning uses millions of image features most
predictive for classification rather than explicitly detecting clinical
HOW DOES DL FIT IN CURRENT DR SCREENING features that physicians are familiar with (eg, microaneurysms,
STRATEGY? hard exudates). It is unclear exactly what the machine “thinks”
A critical question in the conceptualization of using DL and “sees”.18 Transparency of DL algorithm is required in order
technology for DR screening is “where does the DL system to let users understand the basis to justify a particular diagnosis,
fit?”. Deep learning systems could potentially be deployed in treatment recommendation, or outcome prediction that the DL
2 different settings.1,18 Figure 1 illustrates the 2 proposed DL- algorithm offered, and also allow examination for any potential
based screening models for DR, compared with an existing bias.57
model. First, the DL system could be incorporated in existing DR Second, AI most likely succeeds when it is used with
screening programs such as in the UK and Singapore to assist high-quality “labeled” input data validated by multiple medical
human assessors in centralized reading centers (a semi-automated specialists for learning and classifying images in relation to
model).1,55,56 The retinal images can be firstly analyzed by the DL outcomes. Also, DL algorithms are highly data hungry, often
system and the human assessors only review images flagged as requiring millions of observations to reach acceptable performance
“referable” or “ungradable” by the DL system (2nd screening) for levels.58 Moreover, training using single clinical dataset is always
further confirmation. This model not only decreases the massive limited by potential biases (eg, same ethnicity, same device)
workload to read those non-referable retinal images but also which may affect both performance and generalizability.59,60
avoids those false-positive cases referring to ophthalmologists Using dataset from diverse settings and populations for training
(Fig. 1A). Second, the DL system could be a fully automated DL algorithm is ideal. Nevertheless, the availability of such
model in that all retinal images can be analyzed by the DL system dataset is very limited and it is very costly.
which will be useful in communities without any existing DR Third, “real-world” experiments are essential in the
A B C
FIGURE 1. Two artificial intelligence–based screening models for diabetic eye diseases have been proposed. A, A semi-automated model that retinal
images will be firstly analyzed by the deep learning systems. The images will be read by doctors or trained graders (2nd screening) if it is flagged by
the deep learning systems. B, A fully automated model that retinal images will be analyzed by the deep learning systems. C, Existing model by human
assessors.
Copyright © 2019 Asia-Pacific Academy of Ophthalmology. Unauthorized reproduction of this article is prohibited.
Asia-Pacific Journal of Ophthalmology • Volume 8, Number 2, March/April 2019 AI in Diabetic Eye Disease Screening
validation of DL algorithm. For example, prospective study age-related macular degeneration. JAMA Ophthalmol. 2018;136:1359–1366.
design in appropriate clinical settings and testing DL algorithm 10. Li Z, He Y, Keel S, et al. Efficacy of a deep learning system for detecting
using independent validation datasets under different settings and glaucomatous optic neuropathy based on color fundus photographs.
conditions that played no role in model development are critical Ophthalmology. 2018;125:1199–1206.
to evaluate the performance of DL algorithms. Several recent 11. Medeiros FA, Jammal AA, Thompson AC. From machine to machine:
studies have already reported the performance of DL algorithm An OCT-trained deep learning algorithm for objective quantification of
using a prospective study design.6,48,61 glaucomatous damage in fundus photographs. Ophthalmology. 2019;126:
Fourth, a continuous data supply is necessary for ongoing 513–521.
improvement and refinement of the DL algorithms and data 12. Redd TK, Campbell JP, Brown JM, et al. Evaluation of a deep learning
may need to be shared across multiple centers for widespread image assessment system for detecting severe retinopathy of prematurity. Br
implementation. However, the current healthcare environment J Ophthalmol. 2018 Nov 23. Epub ahead of print.
holds little incentive for data sharing.57 13. Brown JM, Campbell JP, Beers A, et al. Automated diagnosis of plus
Lastly, before DL algorithms or systems can be deployed disease in retinopathy of prematurity using deep convolutional neural
in the clinical workflow of diabetic eye disease screening networks. JAMA Ophthalmol. 2018;136:803–810.
programs, there are still more questions need to be addressed. For 14. Kermany DS, Goldbaum M, Cai W, et al. Identifying medical diagnoses
example, “If a patient suffers an adverse event due to an AI-based and treatable diseases by image-based deep learning. Cell. 2018;172:
technology, who is responsible?”.57 In addition, more rigorous 1122–1131.e9.
scientific evidence on safety, validity, reproducibility, reliability, 15. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep
usability, and patient and clinician acceptability is also critical, learning for diagnosis and referral in retinal disease. Nat Med. 2018;24:
before the clinical deployment. 1342–1350.
16. Schlegl T, Waldstein SM, Bogunovic H, et al. Fully automated detection and
quantification of macular fluid in OCT using deep learning. Ophthalmology.
CONCLUSIONS 2018;125:549–558.
In summary, AI and DL are entering the mainstream of clinical 17. FDA permits marketing of artificial intelligence-based device to detect
medicine. This technology can augment human intelligence to certain diabetes-related eye problems [news release]. US Food & Drug
improve decision making and operational processes. Screening of Administration; April 11, 2018. https://www.fda.gov/NewsEvents/
DR is almost an ideal task for AI in health care. Looking forward, Newsroom/PressAnnouncements/ucm604357.htm?fbclid=IwAR2h4qBp
AI will become ubiquitous and indispensable for the screening, SAD6vx8EErts2nhhundX8654ZUL3xwvVahXyLxnSYSxCAOMEias.
with expectation to improve the efficiency and accessibility of Accessed February 24, 2019.
screening programs, and therefore will be able to prevent visual 18. Wong TY, Bressler NM. Artificial intelligence with deep learning
loss and blindness from this devastating disease. technology looks into diabetic retinopathy screening. JAMA. 2016;316:
2366–2367.
19. Beam AL, Kohane IS. Translating artificial intelligence into clinical care.
REFERENCES JAMA. 2016;316:2368–2369.
1. Ting DSW, Cheung CY, Lim G, et al. Development and validation of a 20. Google AI matches diabetic retinopathy screens [medical news]. Medscap;
deep learning system for diabetic retinopathy and related eye diseases November 30, 2016. https://www.medscape.com/viewarticle/872555.
using retinal images from multiethnic populations with diabetes. JAMA. Accessed February 24, 2019.
2017;318:2211–2223. 21. Metz C. India fights diabetic blindness with help from A.I. New York
2. Abràmoff MD, Lou Y, Erginay A, et al. Improved automated detection of Times. March 10, 2019: https://www.nytimes.com/2019/03/10/technology/
diabetic retinopathy on a publicly available dataset through integration of artificial-intelligence-eye-hospital-india.html. Accessed March 12, 2019.
deep learning. Invest Ophthalmol Vis Sci. 2016;57:5200–5206. 22. Ogurtsova K, da Rocha Fernandes JD, Huang Y, et al. IDF Diabetes Atlas:
3. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes
learning algorithm for detection of diabetic retinopathy in retinal fundus Res Clin Pract. 2017;128:40–50.
photographs. JAMA. 2016;316:2402–2410. 23. Chan JC, Malik V, Jia W, et al. Diabetes in Asia: epidemiology, risk factors,
4. Gargeya R, Leng T. Automated identification of diabetic retinopathy using and pathophysiology. JAMA. 2009;301:2129–2140.
deep learning. Ophthalmology. 2017;124:962–969. 24. Chan JC, Zhang Y, Ning G. Diabetes in China: a societal solution for a
5. Li Z, Keel S, Liu C, et al. An automated grading system for detection of personal challenge. Lancet Diabetes Endocrinol. 2014;2:969–979.
vision-threatening referable diabetic retinopathy on the basis of color fundus 25. Wong TY, Cheung CM, Larsen M, et al. Diabetic retinopathy. Nat Rev Dis
photographs. Diabetes Care. 2018;41:2509–2516. Primers. 2016;2:16012.
6. Kanagasingam Y, Xiao D, Vignarajan J, et al. Evaluation of artificial 26. Cheung N, Mitchell P, Wong TY. Diabetic retinopathy. Lancet. 2010;376:
intelligence-based grading of diabetic retinopathy in primary care. JAMA 124–136.
Netw Open. 2018;1:e182665. 27. Yau JW, Rogers SL, Kawasaki R, et al. Global prevalence and major risk
7. Burlina PM, Joshi N, Pacheco KD, et al. Assessment of deep generative factors of diabetic retinopathy. Diabetes Care. 2012;35:556–564.
models for high-resolution synthetic retinal image generation of age-related 28. Cheung CY, Ikram MK, Klein R, et al. The clinical implications of recent
macular degeneration. JAMA Ophthalmol. 2019 Jan 10. Epub ahead of print. studies on the structure and function of the retinal microvasculature in
8. Peng Y, Dharssi S, Chen Q, et al. DeepSeeNet: A deep learning model for diabetes. Diabetologia. 2015;58:871–885.
automated classification of patient-based age-related macular degeneration 29. Zhang P, Zhang X, Brown JB, et al. In: Economic Impact of Diabetes. IDF
severity from color fundus photographs. Ophthalmology. 2019;126:565–575. Diabetes Atlas. 4th ed. International Diabetes Federation; 2010.
9. Burlina PM, Joshi N, Pacheco KD, et al. Use of deep learning for detailed 30. Zhang X, Low S, Kumari N, et al. Direct medical cost associated with
severity characterization and estimation of 5-year risk among patients with diabetic retinopathy severity in type 2 diabetes in Singapore. PLoS One.
Copyright © 2019 Asia-Pacific Academy of Ophthalmology. Unauthorized reproduction of this article is prohibited.
Cheung et al Asia-Pacific Journal of Ophthalmology • Volume 8, Number 2, March/April 2019
Copyright © 2019 Asia-Pacific Academy of Ophthalmology. Unauthorized reproduction of this article is prohibited.