
2018 IEEE International Conference on Information Reuse and Integration for Data Science

Diabetic Retinopathy Stage Classification using Convolutional Neural Networks

Xiaoliang Wang1, Yongjin Lu2, Yujuan Wang3, Wei-Bang Chen4*

1 Department of Technology, Virginia State University, Virginia, USA
2 Department of Mathematics and Economics, Virginia State University, Virginia, USA
3 State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
4 Department of Engineering and Computer Science, Virginia State University, Virginia, USA

1 xwang@vsu.edu; 2 ylu@vsu.edu; 3 yujuanwang2013@gmail.com; 4 wchen@vsu.edu
* Corresponding Author

Abstract—Diabetic Retinopathy (DR) stage classification is regarded as a critical step in the evaluation and management of diabetic retinopathy. Because of the damage to the retinal blood vessels caused by high blood glucose levels, microstructures of varying extent, such as micro-aneurysms, hard exudates, and neovascularization, can occupy the retinal area. Deep learning based Convolutional Neural Networks (CNNs) have recently been proven a promising approach in biomedical image analysis. In this work, representative Diabetic Retinopathy (DR) images have been aggregated into five categories according to the expertise of ophthalmologists, and a group of deep Convolutional Neural Network methods have been employed for DR stage classification. A state-of-the-art accuracy result has been achieved by InceptionNet V3, which demonstrates the effectiveness of utilizing deep Convolutional Neural Networks for DR image recognition.

Keywords—diabetic retinopathy; image classification; deep convolutional neural network

I. INTRODUCTION

Diabetes mellitus, commonly known as diabetes, is a metabolic disorder that causes a high blood glucose level over a prolonged period. On the basis of epidemic estimation, more than 370 million people worldwide will be affected by diabetes mellitus by 2030 [1]. Individuals who suffer from diabetes mellitus have a higher risk of developing Diabetic Retinopathy (DR) due to the aforementioned damage to the retinal blood vessels caused by high blood glucose levels. In order to provide appropriate therapy and prevent visual loss, it is extremely important to categorize diabetic retinopathy based on its severity [2]. Based upon the findings of the Early Treatment Diabetic Retinopathy Study and the Wisconsin Epidemiologic Study of Diabetic Retinopathy, the International Clinical Diabetic Retinopathy and Diabetic Macular Edema Disease Severity Scales, one of the most widely used standards, was proposed by Wilkinson in 2003 to classify diabetic retinopathy into five stages: 1) Stage I: No apparent retinopathy, 2) Stage II: Mild Non-Proliferative Diabetic Retinopathy (NPDR), 3) Stage III: Moderate NPDR, 4) Stage IV: Severe NPDR, and 5) Stage V: Proliferative diabetic retinopathy [3][4]. Table 1 summarizes the disease severity level of the five diabetic retinopathy stages based on the findings observable by ophthalmoscopy with the pupil dilated (a.k.a. fundoscopy).

TABLE 1. INTERNATIONAL CLINICAL DIABETIC RETINOPATHY & DIABETIC MACULAR EDEMA DISEASE SEVERITY SCALES

Stage | Dilated Ophthalmoscopy Observable Findings | Severity
I | No abnormalities | No DR
II | Micro-aneurysms only | Mild non-proliferative DR
III | Any of the following: micro-aneurysms; retinal dot and blot haemorrhages; hard exudates or cotton wool spots; no signs of severe non-proliferative diabetic retinopathy | Moderate non-proliferative DR
IV | Any of the following: more than 20 intra-retinal hemorrhages in each of 4 quadrants; definite venous beading in 2 or more quadrants; prominent intra-retinal microvascular abnormality (IRMA) in 1 or more quadrants; no signs of proliferative retinopathy | Severe non-proliferative DR
V | One or both of the following: neovascularization; vitreous/pre-retinal hemorrhage | Proliferative DR

Figure 1 demonstrates the different stages of diabetic retinopathy. Figure 1(a) illustrates a fundoscopic image of Stage I: No DR; Figure 1(b) shows a fundoscopic image of Stage II: Mild non-proliferative DR; Figure 1(c) depicts a fundoscopic image of Stage III: Moderate non-proliferative DR; Figure 1(d) exemplifies a fundoscopic image of Stage IV: Severe non-proliferative DR; and Figure 1(e) exhibits a fundoscopic image of Stage V: Proliferative diabetic retinopathy. As can be seen in Figure 1(b), several micro-aneurysms can be observed on the right of the fundoscopic image. In Figure 1(c), many micro-aneurysms and hard exudates can be found on the right of the fundoscopic image. In Figure 1(d), more than 20 intra-retinal hemorrhages in each of the four quadrants and several intra-retinal microvascular abnormalities (IRMA) can be identified as clinical signs of severe NPDR. Neovascularization and vitreous/pre-retinal hemorrhage can be recognized in Figure 1(e), which indicates a case of proliferative diabetic retinopathy.

978-1-5386-2659-7/18/$31.00 ©2018 IEEE    DOI 10.1109/IRI.2018.00074
Figure 1. Fundoscopic images of different stages of diabetic retinopathy. (a) Stage I: No diabetic retinopathy; (b) Stage II: Mild non-proliferative diabetic retinopathy; (c) Stage III: Moderate non-proliferative diabetic retinopathy; (d) Stage IV: Severe non-proliferative diabetic retinopathy; (e) Stage V: Proliferative diabetic retinopathy. Annotations in the panels mark micro-aneurysms or haemorrhages, hard exudates, intra-retinal microvascular abnormalities (IRMA), neovascularization, and pre-retinal hemorrhage.

Diabetic retinopathy (DR) staging is important for the estimation of diabetes mellitus (DM) and the evaluation of the associated retinopathy; it is also closely related to the proper management and prognosis of DR. In order to objectively and accurately determine the diabetic retinopathy stage, the goal of this paper is to introduce an image analysis-based approach that automatically differentiates the five stages of diabetic retinopathy from fundoscopic images. Image analysis has been widely and successfully applied in the biomedical field, for example, to objectively differentiate embryonic developmental stages [5] and to classify the severity of melanoma and nevi from skin lesions [6]. Deep learning based Convolutional Neural Networks (CNNs) have recently been proven a promising approach for various medical image analysis tasks [7]. Anthimopoulos et al. deployed a deep convolutional neural network for lung pattern classification of interstitial lung diseases [8]. Esteva et al. leveraged deep convolutional neural networks for dermatologist-level classification of skin cancer [9].

On the topic of using deep learning algorithms to classify diabetic retinopathy in retinal fundus photographs, large-scale medical studies have been conducted in the past [10][11]. However, these studies combine several stages together and aim only at building binary classifiers. One previous work that also aims to build a five-class severity classifier for diabetic retinopathy using CNNs is given in [12]. However, the results reported in [12] have a lower prediction accuracy, which will be compared with our results later.

In this article, we investigate the performance of different deep convolutional neural network architectures when they are deployed to classify the five DR stages. The experiment has been performed on a total of 166 fundoscopic images extracted from the publicly available Kaggle dataset provided by EyePACS [13]. We find that InceptionNet V3, which is devised with the most advanced convolutional neural network building techniques, gives the highest 5-fold cross-validation average classification accuracy of 63.23%, compared to the traditional AlexNet [14] and VGG16 [15], using only this small number of images.
II. METHOD

Convolutional neural networks (CNNs) are a category of neural networks that have proven effective in image recognition and classification. A CNN extends the regular neural network by adding the operations of convolution, non-linearity, and sub-sampling. The purpose of convolution is to extract features from the input images: by convolving the input images with specially chosen small square matrices, certain image processing effects, such as edge detection, sharpening, and blurring, can be realized. Another operation, the Rectified Linear Unit (ReLU), which can be applied after every convolution, is a non-linear operation that replaces all negative pixel values in the feature map with zero; its purpose is to introduce nonlinearity. The third operation, called pooling or sub-sampling, reduces the dimensionality of each feature map while retaining the most important information; it is realized by taking and storing, for example, the max, average, or sum of each sub-region in the feature map. After stacking an appropriate number of layers of these three operations, the output feature map is connected to a classical neural network to complete the classification task. A minimal code sketch of this pipeline is given below.
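To make these three operations concrete, the following PyTorch sketch (illustrative only; the layer widths and the 224 × 224 input size are assumptions, and this is not one of the three networks evaluated below) chains convolution, ReLU, and max-pooling before a fully connected classifier for the five DR stages:

```python
import torch
import torch.nn as nn

class TinyDRNet(nn.Module):
    """Minimal CNN sketch: convolution -> ReLU -> pooling -> classifier."""
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # feature extraction
            nn.ReLU(inplace=True),                       # zero out negatives
            nn.MaxPool2d(kernel_size=2, stride=2),       # sub-sampling
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # For a hypothetical 224x224 input, two 2x2 poolings give 56x56 maps.
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

logits = TinyDRNet()(torch.randn(1, 3, 224, 224))  # -> shape (1, 5)
```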
In this paper, we deploy three state-of-the-art representative convolutional neural network architectures, AlexNet, VGG16, and InceptionNet V3, for DR stage classification. We go through the architectures of these three models and explain how we set up different configurations to leverage these CNNs for DR stage image classification.

A. AlexNet

AlexNet is a convolutional neural network equipped with approximately 650,000 neurons and 60 million parameters. It was introduced in 2012 by Alex Krizhevsky and his co-authors [14]. An important feature of AlexNet is the introduction of the ReLU nonlinearity into the training of neural networks: compared to saturating nonlinear activation functions, such as the hyperbolic tangent (tanh) and the sigmoid, the non-saturating ReLU activation reaches the same training error rate six times faster on the CIFAR-10 dataset [16] than an equivalent network with tanh neurons. Another important feature of AlexNet is "dropout": setting the output of each hidden neuron to zero with a probability of 50% in the last few fully connected layers to further mitigate overfitting. The dropped-out neurons do not contribute to the forward propagation process. The architecture of AlexNet is shown in Figure 2, and a sketch of such a dropout head follows the figure caption.

Figure 2. AlexNet architecture.
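As a small illustration of the dropout mechanism described above (a sketch, not AlexNet's exact classifier head; the 4096-unit widths follow the AlexNet design, but the surrounding wiring here is hypothetical):

```python
import torch
import torch.nn as nn

# Hypothetical fully connected head with 50% dropout, in the spirit of
# AlexNet's final layers; widths are illustrative, not the original model.
head = nn.Sequential(
    nn.Dropout(p=0.5),            # each hidden activation is zeroed w.p. 0.5
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 5),           # five DR stages instead of 1000 classes
)
head.train()                       # dropout active during training
logits = head(torch.randn(8, 4096))
head.eval()                        # dropout disabled automatically at inference
```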

B. VGG16

VGG16 [15], which has approximately 138 million parameters and 16 weighted layers, is an evolved convolutional neural network model compared to AlexNet; it investigates the effect of convolutional network depth on classification accuracy in the large-scale image recognition setting. The advantage of VGG16 is its simple and standardized approach to constructing the hidden layers: all convolution layers use 3 × 3 filters with stride 1 and "same" padding, while all pooling layers are max-pooling layers using 2 × 2 filters with stride 2. In addition, the number of filters in a convolution layer is a power of 2, starting from 64 and gradually increasing to 512. The architecture of VGG16 is illustrated in Figure 3, and a sketch of the layer pattern follows the figure caption.

Figure 3. VGG16 architecture.
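The standardized pattern described above can be expressed compactly; the following sketch (an illustration under the stated conventions, not the reference implementation) builds VGG16's convolutional body from repeated 3 × 3 convolutions and 2 × 2 max-pooling:

```python
import torch.nn as nn

def vgg_stage(in_ch: int, out_ch: int, n_convs: int) -> list[nn.Module]:
    """One VGG-style stage: n_convs 3x3/stride-1/same-pad convs + 2x2 max-pool."""
    layers: list[nn.Module] = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return layers

# VGG16's convolutional body: filter counts double from 64 up to 512.
cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]
features = nn.Sequential(*[m for args in cfg for m in vgg_stage(*args)])
```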
C. InceptionNet

Although VGGNet [15] has the compelling feature of architectural simplicity, this comes at a high cost: evaluating the network requires a great deal of computation. The Inception architecture of GoogLeNet, on the other hand, is designed to perform well even under strict constraints on memory and computational budget [17]. To achieve this, the authors of GoogLeNet utilized channel concatenation: the channels obtained from 1 × 1 convolution, 3 × 3 convolution, 5 × 5 convolution, and pooling are concatenated while keeping the height and width of each channel unchanged. To avoid the exponential increase in the number of mathematical operations required by channel concatenation, 1 × 1 "bottleneck" layers with shallower channel depths are applied before the 3 × 3 or 5 × 5 convolutions; these "bottleneck" layers reduce the number of mathematical operations for a particular convolution by a factor of 10. As a result of channel concatenation and 1 × 1 "bottleneck" layers, GoogLeNet employs only 5 million parameters, a 12× reduction with respect to its predecessor AlexNet [14], which used 60 million parameters (VGGNet, by comparison, employs about 3× more parameters than AlexNet). Compared to GoogLeNet [17], InceptionNet V3 [18] utilizes factorized convolution strategies to further increase computational efficiency. Figure 4 exemplifies how the inception module of InceptionNet V3 leverages spatial factorization into asymmetric (1 × n and n × 1) convolutions to further save computation cost; a sketch of such a module follows the figure caption.

Figure 4. Inception module of InceptionNet V3: four parallel branches from the base layer (1 × 1 convolutions, factorized 1 × n / n × 1 convolution stacks, and pooling), merged by filter concatenation.
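A sketch of a factorized inception module in the spirit of Figure 4 (the channel width mid = 64 and the factorization size n = 7 are hypothetical defaults, not InceptionNet V3's exact values):

```python
import torch
import torch.nn as nn

def conv(in_ch: int, out_ch: int, kernel_size, padding) -> nn.Module:
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding),
                         nn.ReLU(inplace=True))

class FactorizedInceptionModule(nn.Module):
    """Sketch of the Figure 4 module: four parallel branches whose outputs
    share height/width and are merged by channel (filter) concatenation.
    1x1 'bottleneck' convs shrink channel depth before the spatial convs,
    and each n x n conv is factorized into a 1 x n followed by an n x 1."""
    def __init__(self, in_ch: int = 256, mid: int = 64, n: int = 7):
        super().__init__()
        p = n // 2  # padding that keeps height and width unchanged
        self.branch1 = nn.Sequential(  # 1x1 -> (1xn, nx1) x 2
            conv(in_ch, mid, 1, 0),
            conv(mid, mid, (1, n), (0, p)), conv(mid, mid, (n, 1), (p, 0)),
            conv(mid, mid, (1, n), (0, p)), conv(mid, mid, (n, 1), (p, 0)))
        self.branch2 = nn.Sequential(  # 1x1 -> 1xn -> nx1
            conv(in_ch, mid, 1, 0),
            conv(mid, mid, (1, n), (0, p)), conv(mid, mid, (n, 1), (p, 0)))
        self.branch3 = nn.Sequential(  # pool -> 1x1
            nn.AvgPool2d(3, stride=1, padding=1), conv(in_ch, mid, 1, 0))
        self.branch4 = conv(in_ch, mid, 1, 0)  # plain 1x1 branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

out = FactorizedInceptionModule()(torch.randn(1, 256, 17, 17))  # (1, 256, 17, 17)
```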
D. Optimization Algorithm

To accelerate convergence, we apply stochastic gradient descent with momentum (SGDM) to seek the global minimum of the cost function for all three CNNs [19]. The update rule for the parameter $\theta$ follows an exponentially weighted moving average of the gradients of the cost function $J(\theta)$:

$$v_t = \beta v_{t-1} + (1 - \beta)\,\nabla_{\theta} J(\theta),$$
$$\theta = \theta - \alpha v_t,$$

where $\beta$ is the momentum parameter, $\alpha$ is the learning rate, and $t$ stands for the iteration number. SGDM accelerates learning in the relevant directions while smoothing out oscillations in highly volatile directions. The parameter $\beta$ determines the approximate moving window over which this weighted average is calculated: if $\beta = 0.9$, the moving window is approximately 10 iterations, while $\beta = 0.98$ sets it to approximately 50 iterations. The default value of $\beta$ is 0.9.
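A direct NumPy transcription of the two update formulas above, with a toy quadratic cost standing in for the network's cost function J:

```python
import numpy as np

def sgdm_step(theta, v, grad, alpha=0.001, beta=0.9):
    """One SGDM update, exactly as in the two formulas above."""
    v = beta * v + (1.0 - beta) * grad   # EWMA of the gradients
    theta = theta - alpha * v            # descend along the averaged direction
    return theta, v

# Toy example: minimize J(theta) = ||theta||^2 / 2, whose gradient is theta.
theta, v = np.ones(3), np.zeros(3)
for t in range(1000):
    theta, v = sgdm_step(theta, v, grad=theta, alpha=0.1)
print(theta)  # approaches the global minimum at the origin
```

Note that some libraries implement momentum as $v_t = \beta v_{t-1} + \nabla_{\theta} J(\theta)$, without the $(1 - \beta)$ factor, in which case the learning rate must be rescaled to obtain the same trajectory.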

E. Transfer Learning

For the training procedure of all three employed CNNs, we adopt a fine-tuning strategy starting from models pretrained on ImageNet [20]. To ensure that the transfer learning process executes properly, the images in our DR dataset are resized to 227 × 227 pixels, 224 × 224 pixels, and 299 × 299 pixels for AlexNet, VGG16, and InceptionNet V3, respectively. The resizing of the images unfortunately causes distortion (loss of aspect ratio) and loss of fidelity (the pixel density is reduced to approximately 1% of the original image). We will address some of these problems in the future.
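As one possible realization of this fine-tuning setup (a sketch using torchvision's ImageNet-pretrained Inception v3; the preprocessing values are the standard ImageNet statistics, and the pipeline is an assumption rather than the authors' exact code):

```python
import torch.nn as nn
from torchvision import models, transforms

# Load an ImageNet-pretrained Inception v3 and replace its final layers
# so that it predicts the five DR stages instead of 1000 ImageNet classes.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 5)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 5)

# Resize to the 299 x 299 input the network expects; note that a plain
# Resize distorts the aspect ratio, as discussed above.
preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```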
III. EXPERIMENTAL RESULTS

A. Dataset

The more than thirty-five thousand fundoscopic images in the Kaggle dataset have relatively inhomogeneous quality because they were not acquired in a controlled laboratory environment [13]. Different cameras were used to capture the images, which have resolutions ranging from 2594 × 1944 pixels to 4752 × 3168 pixels and contain considerable noise due to, for example, sub-optimal lighting conditions [12]. To overcome this difficulty, the images in the Kaggle dataset have been screened, and relabelled by an ophthalmologist when necessary. A total of 166 representative, high-quality images were thus chosen from the Kaggle dataset to form the dataset used to train the classification models reported in this work. The composition of these 166 images is listed in Table 2.

TABLE 2. DATASET COMPOSITION

Stage | Description | Number of Images
I | No DR | 31
II | Mild NPDR | 30
III | Moderate NPDR | 50
IV | Severe NPDR | 31
V | Proliferative DR | 24

Since we aim to train a five-class classifier, the influence of the imbalanced samples of the five stages in the original dataset (approximately 74% of the images belong to Stage I) on the training process should not be ignored. Thus, the 166 images were selected in such a way that each stage has relatively balanced samples in the reorganized dataset.


B. Evaluation Metrics

In order to evaluate and compare the performance of classifiers of various designs, accuracy is adopted to measure the effectiveness of the different CNN-based classifiers. Accuracy (ACC) is formally defined as

$$\mathrm{ACC} = \frac{1}{m}\sum_{i=1}^{m}\chi_i,$$

where $\chi_i = 1$ if the predicted label $\hat{y}^{(i)}$ equals the true label $y^{(i)}$ for example $i$, and $\chi_i = 0$ otherwise.
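In code, ACC is simply the fraction of correctly predicted examples; a minimal NumPy sketch:

```python
import numpy as np

def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """ACC = (1/m) * sum over i of indicator(y_pred_i == y_true_i)."""
    return float(np.mean(y_pred == y_true))

print(accuracy(np.array([1, 2, 3, 4, 5]), np.array([1, 2, 3, 4, 4])))  # 0.8
```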


C. Cross Validation

In order to objectively evaluate the performance of predictive models, we adopt n-fold cross-validation in our experiments. N-fold cross-validation randomly partitions the original sample into a training set used to train the model and a test set used to validate it. The advantage of the n-fold cross-validation process is that all observations in the original sample are used for both training and validating the predictive models, and each observation is used for validation exactly once.

In this research, a 5-fold cross-validation process is used to train and test the classification models. The original dataset is split into five sub-datasets of approximately equal size. Five models are then produced, each using four of the five sub-datasets as the training set and the remaining one as the validation set. Cross-validation accuracy is calculated for each of the five classification models.
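A sketch of this 5-fold protocol using scikit-learn (StratifiedKFold additionally keeps each fold's stage distribution balanced; train_model and evaluate are hypothetical placeholders for the CNN training and accuracy computation described in this section):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(images: np.ndarray, stages: np.ndarray, n_splits: int = 5):
    """5-fold CV: each of the 166 images is used for validation exactly once."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, val_idx in skf.split(images, stages):
        model = train_model(images[train_idx], stages[train_idx])      # hypothetical
        accuracies.append(evaluate(model, images[val_idx], stages[val_idx]))  # hypothetical
    return float(np.mean(accuracies))
```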
D. Hyperparameter Tuning

Since the performance of the trained CNN models is sensitive to changes in hyperparameters, we ran small-scale experiments to test and tune some of the important hyperparameters, including the initial learning rate, learning rate decay schedule, learning rate decay factor, and batch size. We sampled initial learning rates of 0.001, 0.0001, and 0.00001; mini-batch sizes of 16, 20, 24, 28, and 32; and stairwise and exponential learning rate decay schedules with various decay factors. The one hyperparameter that we did not tune is the momentum β, which is set to the default value of 0.9 for SGDM. The tuned hyperparameters are listed in Table 3, and a sketch of the two decay schedules follows the table.

TABLE 3. HYPERPARAMETER VALUES

Hyperparameters | AlexNet | VGG16 | InceptionNet V3
Initial Learning Rate, α0 | 0.0001 | 0.0001 | 0.001
Learning Rate Decay Schedule | Stairwise | Stairwise | Exponential
Learning Rate Decay Factor | 0.1 | 0.1 | 0.16
Mini-Batch Size | 20 | 20 | 32
Momentum, β (untuned) | 0.9 | 0.9 | 0.9
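To illustrate the two decay schedules named in Table 3 (a sketch; PyTorch schedulers are used here as one possible realization, and the decay interval step_size is an assumption): stairwise decay multiplies the learning rate by the decay factor at fixed intervals, while exponential decay applies the factor at every scheduler step.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR, ExponentialLR

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model parameters

# AlexNet/VGG16 setting from Table 3: alpha_0 = 0.0001, stairwise decay by 0.1.
opt_step = SGD(params, lr=1e-4, momentum=0.9)
stairwise = StepLR(opt_step, step_size=10, gamma=0.1)   # step_size is hypothetical

# InceptionNet V3 setting from Table 3: alpha_0 = 0.001, exponential decay 0.16.
opt_exp = SGD(params, lr=1e-3, momentum=0.9)
exponential = ExponentialLR(opt_exp, gamma=0.16)

for epoch in range(3):
    # ... optimizer.step() over the mini-batches of the epoch, then:
    stairwise.step()
    exponential.step()
```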

E. Performance Evaluation

In this study, the three convolutional neural networks, i.e., AlexNet, VGG16, and InceptionNet V3, are trained to produce predictive models using the cross-validation process. The comparison of their average 5-fold cross-validation accuracies on the five-class classification task is summarized in Table 4. The average accuracies of AlexNet, VGG16, and InceptionNet V3 are 37.43%, 50.03%, and 63.23%, respectively.
TABLE 4. EXPERIMENT RESULTS

Training Algorithm | Average CV Accuracy
AlexNet | 37.43%
VGG16 | 50.03%
InceptionNet V3 | 63.23%

IV. DISCUSSIONS

The average 5-fold cross-validation accuracy of 63.23% obtained by InceptionNet V3 is a 40.5% improvement on the results reported in [12], achieved using just 0.47% of the images in the Kaggle dataset. The possible reasons for the improvement in classification performance and efficiency are that: 1) the images in our dataset were handpicked by domain experts, so some of the problems in the original Kaggle dataset identified in [12], such as heterogeneity and noise, are avoided to a certain degree; 2) the relatively homogeneous composition of each of the five classes in our dataset could be another reason that our experiments perform much better with greater efficiency; and 3) the InceptionNet V3 used in this article is an improvement on the original GoogLeNet. However, we should point out that the performance of VGG16, a shallower CNN than the GoogLeNet used in [12], is still an 11% improvement with our tuned hyperparameters; this should be attributable to the first two reasons and to the fine-tuning of the hyperparameters.

It is worth mentioning that, as pointed out in [12], the classification of diabetic retinopathy on the basis of fundoscopic images is not a trivial task, even for a well-trained human expert. Our experimental results indicate the effectiveness of CNNs in staging diabetic retinopathy. Although the accuracy of the predictive models obtained from InceptionNet V3 is acceptable given such a small training data size, there is still considerable room for improvement. This paper is our initial step, and it sets the foundation for further exploration of machine learning based classifiers for staging diabetic retinopathy. We believe that our efforts will provide a useful software-based tool for ophthalmologists to evaluate the severity of diabetes mellitus by recognizing the different diabetic retinopathy stages, which will further support the proper management and prognosis of diabetic retinopathy.
V. FUTURE WORKS

Future efforts could be made in several respects. First, more advanced CNN-based image classification models could be deployed to further improve DR categorization accuracy. Second, CNN-based object detection methods, e.g., for blood vessels, could be leveraged to assist DR stage image recognition. Last but not least, CNN-based image segmentation strategies, e.g., for cotton wool spots, could also be investigated to implement fine-grained DR stage classification.
REFERENCES

[1] G. Danaei, M. M. Finucane, Y. Lu, G. M. Singh, M. J. Cowan, C. J. Paciorek, J. K. Lin, F. Farzadfar, Y.-H. Khang, G. A. Stevens, M. Rao, M. K. Ali, L. M. Riley, C. A. Robinson, and M. Ezzati, "National, regional, and global trends in fasting plasma glucose and diabetes prevalence since 1980: systematic analysis of health examination surveys and epidemiological studies with 370 country-years and 2·7 million participants," The Lancet, vol. 378, issue 9785, 2011, pp. 31-40.
[2] L. Wu, P. Fernandez-Loaiza, J. Sauma, E. Hernandez-Bogantes, and M. Masis, "Classification of diabetic retinopathy and diabetic macular edema," World Journal of Diabetes, vol. 4, issue 6, Dec. 2013, pp. 290-294.
[3] C. P. Wilkinson, F. L. Ferris, R. E. Klein, P. P. Lee, C. D. Agardh, M. Davis, D. Dills, A. Kampik, R. Pararajasegaram, J. T. Verdaguer, and G. D. R. P. Group, "Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales," Ophthalmology, vol. 110, issue 9, Sep. 2003, pp. 1677-1682.
[4] T. Y. Wong, C. M. G. Cheung, M. Larsen, S. Sharma, and R. Simó, "Diabetic retinopathy," Nature Reviews Disease Primers, vol. 2, Mar. 2016, pp. 1-16.
[5] H. Zhong, W.-B. Chen, and C. Zhang, "Classifying fruit fly early embryonic developmental stage based on embryo in situ hybridization images," in Proceedings of IEEE International Conference on Semantic Computing (ICSC 2009), IEEE, Sep. 2009, pp. 145-152.
[6] J. D. Osborne, S. Gao, W.-B. Chen, A. Andea, and C. Zhang, "Machine classification of melanoma and nevi from skin lesions," in Proceedings of the 2011 ACM Symposium on Applied Computing (SAC 2011), ACM, Mar. 2011, pp. 100-105.
[7] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, Dec. 2017, pp. 60-88.
[8] M. Anthimopoulos, S. Christodoulidis, L. Ebner, A. Christe, and S. Mougiakakou, "Lung pattern classification for interstitial lung diseases using a deep convolutional neural network," IEEE Transactions on Medical Imaging, vol. 35, issue 5, May 2016, pp. 1207-1216.
[9] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, issue 7639, Feb. 2017, pp. 115-118.
[10] D. S. W. Ting, C. Y. Cheung, G. Lim, G. S. W. Tan, N. D. Quang, A. Gan, H. Hamzah, R. Garcia-Franco, I. Y. San Yeo, S. Y. Lee, E. Y. M. Wong, C. Sabanayagam, M. Baskaran, F. Ibrahim, N. C. Tan, E. A. Finkelstein, E. L. Lamoureux, I. Y. Wong, N. M. Bressler, S. Sivaprasad, R. Varma, J. B. Jonas, M. G. He, C. Y. Cheng, G. C. M. Cheung, T. Aung, W. Hsu, M. L. Lee, and T. Y. Wong, "Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes," The Journal of the American Medical Association, vol. 318, issue 22, Dec. 2017, pp. 2211-2223.
[11] V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, R. Kim, R. Raman, P. C. Nelson, J. L. Mega, and D. R. Webster, "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," The Journal of the American Medical Association, vol. 316, issue 22, Dec. 2016, pp. 2402-2410.
[12] M. Alban and T. Gilligan, "Automated detection of diabetic retinopathy using fluorescein angiography photographs," Stanford Technical Report, 2016.
[13] Diabetic retinopathy detection: identify signs of diabetic retinopathy in eye images, Kaggle. https://www.kaggle.com/c/diabetic-retinopathy-detection
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of International Conference on Neural Information Processing Systems (NIPS 2012), Curran Associates Inc., Dec. 2012, pp. 1097-1105.
[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proceedings of International Conference on Learning Representations (ICLR 2015), Sep. 2015.
[16] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Technical Report, 2009.
[17] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), IEEE, Jun. 2015, pp. 1-9.
[18] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), IEEE, Jun. 2016, pp. 2818-2826.
[19] N. Qian, "On the momentum term in gradient descent learning algorithms," Neural Networks, vol. 12, issue 1, Jan. 1999, pp. 145-151.
[20] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), IEEE, Jun. 2009, pp. 248-255.
