
Journal Pre-proof

Deep Learning to Detect Alzheimer's Disease from Neuroimaging: A Systematic Literature Review

Mr Amir Ebrahimighahnavieh, Dr. Raymond Chiong

PII: S0169-2607(19)31094-6
DOI: https://doi.org/10.1016/j.cmpb.2019.105242
Reference: COMM 105242

To appear in: Computer Methods and Programs in Biomedicine

Received date: 8 July 2019


Revised date: 13 November 2019
Accepted date: 25 November 2019

Please cite this article as: Mr Amir Ebrahimighahnavieh, Dr. Raymond Chiong, Deep Learning to
Detect Alzheimer's Disease from Neuroimaging: A Systematic Literature Review, Computer Methods
and Programs in Biomedicine (2019), doi: https://doi.org/10.1016/j.cmpb.2019.105242

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.


Highlights

• A review of Alzheimer's Disease (AD) detection using deep learning is carried out
• Intensity normalization & registration are key preprocessing methods in AD detection
• Patch-based methods on disease-related regions are more useful for feature extraction
• Convolutional neural networks are increasingly utilized with impressive results
• Transfer learning and data augmentation are useful with a limited number of patients

Deep Learning to Detect Alzheimer’s
Disease from Neuroimaging: A
Systematic Literature Review
Mr Amir Ebrahimighahnavieh

Dr. Raymond Chiong

The University of Newcastle

University Drive

Callaghan 2308

Australia

E-mail: raymond.chiong@newcastle.edu.au

Abstract: Alzheimer’s Disease (AD) is one of the leading causes of death in developed countries. From a
research point of view, impressive results have been reported using computer-aided algorithms, but
clinically no practical diagnostic method is available. In recent years, deep models have become popular,
especially in dealing with images. Since 2013, deep learning has begun to gain considerable attention in
AD detection research, with the number of published papers in this area increasing drastically since 2017.
Deep models have been reported to be more accurate for AD detection compared to general machine
learning techniques. Nevertheless, AD detection is still challenging, and for classification, it requires a
highly discriminative feature representation to separate similar brain patterns. This paper reviews the
current state of AD detection using deep learning. Through a systematic literature review of over 100
articles, we set out the most recent findings and trends. Specifically, we review useful biomarkers and
features (personal information, genetic data, and brain scans), the necessary pre-processing steps, and
different ways of dealing with neuroimaging data originating from single-modality and multi-modality
studies. Deep models and their performance are described in detail. Although deep learning has achieved
notable performance in detecting AD, there are several limitations, especially regarding the availability of
datasets and training procedures.

Keywords: deep learning; Alzheimer’s disease; convolutional neural networks; recurrent neural networks;
auto-encoders; transfer learning.

1 Introduction
With new artificial intelligence technologies, computer systems can be used to enhance the accuracy and
speed of detecting diseases in hospitals, even those that have few medical experts. Advances in medical imaging
and analysis have delivered powerful tools for detecting neurodegeneration, and there is great interest in using
imaging information to diagnose a disease. It has recently been shown that a computer can make as accurate an
assessment as a radiologist [1].

Alzheimer‘s Disease (AD) is an irreversible progressive neurodegenerative disorder that slowly destroys
memory and leads to difficulty in communication and performing daily activities such as speaking and walking.
It is eventually fatal. AD is the most common type of dementia, comprising an estimated 60–80% of all
dementia cases. It typically starts in middle or old age, possibly initiated by accumulation of protein in and
around neurons, and leads to a steady deterioration in memory (associated with synaptic dysfunction, brain
shrinkage, and cell death) [2]. The first changes in the brain occur before cognitive decline begins, and some
biomarkers may become abnormal at this early stage. Research suggests that brain changes related to AD may
begin at least 20 years before symptoms appear [2, 3].

Patients at the initial stage of AD are classified as having Mild Cognitive Impairment (MCI) [4, 5], although
not all patients with MCI will develop AD. MCI is a transitional stage from normal to AD, in which a person
has mild changes in cognitive ability that are obvious to the person affected and to relatives but is still able to
perform everyday activities. About 15–20% of people aged 65 or older have MCI, and 30–40% of individuals
with MCI develop AD within 5 years [2]. The conversion time ranges from 6 to 36 months but is typically 18
months. MCI patients can then be categorized as MCI converters (MCIc) or MCI non-converters (MCInc),
meaning the patient did or did not convert to AD within 18 months. There are also other subtypes of MCI
that are rarely mentioned in the literature, such as early/late MCI.

The most significant risk factors for AD are family history and the presence of related genes in a person's
genome. An AD diagnosis is based on a clinical examination as well as a comprehensive interview of the patient
and their relatives [6, 7]. Nevertheless, a 'ground truth' diagnosis of AD can only be made via autopsy, which is
not clinically helpful. A group of AD patients with an autopsy-confirmed diagnosis is utilized in [8].

Without ground truth data, patients need some other criteria to confirm AD. Such criteria could improve our
understanding of AD, and make diagnosis possible for living patients. In 1984, NINCDS1 and ADRDA2
established criteria for the clinical diagnosis of AD; in 2007 they were revised based on memory impairment
and the presence of at least one additional supportive feature: abnormal Magnetic Resonance Imaging (MRI)
and Positron Emission Tomography (PET) neuroimaging or abnormal cerebrospinal fluid amyloid and tau
biomarkers [5, 9-11]. NIA3 and the Alzheimer‘s Association have also begun revising diagnostic criteria for AD
[12-16]. The new proposed diagnostic criteria include measures of brain amyloid, neuronal injury, and
degeneration. It has recently been concluded that updates to the criteria are probably warranted every 3–4 years
in order to incorporate new knowledge about the pathophysiology and progression of the disease [17].

1 National Institute of Neurological and Communicative Disorders and Stroke
2 Alzheimer's Disease and Related Disorders Association
3 National Institute on Aging

The Mini-Mental State Examination (MMSE) [18] and the Clinical Dementia Rating (CDR) [19] are two of
the most frequently used tests in evaluating AD [20], although it should be noted that using them as ground truth
labels for AD might be incorrect. Based on the criteria mentioned above, the reported accuracies of clinical
diagnosis of AD compared to post-mortem diagnosis are in the range of 70–90% [21-24]. Despite its limitations,
clinical diagnosis is the best available reference standard [25]. It is also worth noting that the availability of all
the recognised biomarkers is quite limited.

In 2010, the number of people over 60 years old living with dementia was reported to be 35.6 million
worldwide and 310,000 in Australasia. The numbers are expected to almost double every 20 years so that by
2050 there would be 115 million worldwide and 790,000 in Australasia [26]. Dementia has become the second
leading cause of death in Australia, with 13,126 cases reported in 2016 [27]. The cost of nursing for AD patients
and other types of dementia is expected to increase considerably, making AD one of the most expensive chronic
diseases [2, 28]. Although a number of treatment strategies have been investigated to prevent or slow down the
disease, success has been limited [29]. In future, the early and accurate detection of AD is vital for appropriate
treatment. Early detection of AD means patients can maintain their independence for longer; new research
efforts will lead to better understanding of the disease process and the development of new treatments [30, 31].

Considering all the above, there is a need for a multi-class clinical decision, unbiased by variable
radiological expertise, which can automatically distinguish AD and its different stages from a Normal Control
(NC). Generally, classifying AD patients from NCs or MCIs is not as valuable as predicting MCI conversion,
because AD is clearly apparent without using any expertise when it is too late for treatment. Nevertheless, many
studies still tackle the AD vs. NC problem, since it is helpful in other classification tasks, especially in
understanding the early signs of AD. The main challenge in AD assessment is to determine whether someone
has MCI and to predict whether an MCI patient will develop the disease. Although the
available computer-aided systems are still not able to replace a medical expert, they can supply supporting
information to improve the accuracy of clinical decisions. It should be noted that not all studies work on AD,
MCI, or NC. Other stages of the disease such as early/late MCI are also considered.

Detecting AD using artificial intelligence is usually a challenge for researchers due to:

• Low medical image acquisition quality and errors in pre-processing and brain segmentation.
• Unavailability of a comprehensive dataset including a vast number of subjects and biomarkers.
• Low between-class variance in different stages of AD. Sometimes the signs that distinguish AD, for
example, brain shrinkage, can also be found in the normal, healthy brains of older people [32].
• The ambiguity of boundaries between AD/MCI and MCI/NC based on AD diagnostic criteria [25].
• Lack of expert knowledge, especially in identifying Regions-Of-Interest (ROIs) in the brain.
• The complexity of medical images compared to usual natural images.

There are a few review studies on AD detection using machine learning, which cover topics such as different
types of classifiers, single-modal and multi-modal models, feature extraction algorithms, feature selection
methods, validation approaches, and dataset properties [3, 20, 33-35]. Also, competition challenges – such as
CADDementia4 [25], TADPOLE5 [36], The Alzheimer‘s Disease Big Data DREAM Challenge6 [37], and the

4 http://caddementia.grandchallenge.org

international challenge for automated prediction of MCI from MRI data7 (hosted by the Kaggle platform) [38] –
have been shown to be effective in AD analysis; they can provide unbiased comparisons of algorithms and tools
on standardized data involving participants worldwide. In these studies and competitions, many different
machine learning techniques have been investigated and evaluated, but traditional machine learning approaches
are not satisfactory for dealing with such complicated issues as AD [39]. Detecting AD is difficult, and
successful classification calls for a strong ability to discriminate certain features among similar brain image
patterns.

The increase in processing power of Graphics Processing Units (GPUs) has enabled the development of
cutting-edge deep learning algorithms. Deep learning is a subset of machine learning in artificial intelligence
that imitates the workings of the human brain in data processing and pattern recognition to solve complex
decision-making problems. Methods based on deep learning have revolutionised performance in numerous
areas, such as object recognition, detection, tracking, image segmentation, and audio classification. Successful
deep learning in the classification of 2D natural images has benefited studies of deep learning in the domain of
medical images [40, 41]. In recent years, deep learning models, particularly Convolutional Neural Networks
(CNNs), have performed well in the field of medical imaging for organ segmentation and disease detection [42].
Based on neuroimaging data, deep learning models can discover hidden representations, find links between
different parts of images, and identify disease-related patterns. Deep learning models have been successfully
applied to medical images such as structural MRI (simply called MRI in this paper), functional MRI (fMRI),
PET, and Diffusion Tensor Imaging (DTI). In this way, researchers have recently begun using deep learning
models for detecting AD from medical images [40]; however, there is still a long way to go before deep learning
techniques can be used to accurately detect AD.

This paper aims to review the current state of AD detection using deep learning. In particular, we aim to set
out how deep learning can be used in supervised and unsupervised modes to provide a better understanding of
AD. We review AD detection using deep learning to ascertain recent findings and current trends.

A typical block diagram of a computer-aided AD detection system is shown in Figure 1. The context here is
to see what kind of biomarkers and factors can be used in AD detection, which datasets are available, what kind
of pre-processing techniques are needed to deal with biomarkers (especially in neuroimaging), how to extract
single features from 3D brain scans, which deep models are capable of capturing disease-related patterns of AD,
and how to handle multi-modal data.

Typical machine learning methods are composed of three main steps: feature extraction, feature dimension
reduction, and classification. Nevertheless, researchers usually combine all these stages when using deep
learning techniques. All the papers included in this review can be categorized in terms of inputs, what
biomarkers have been used, how biomarkers have been managed, and what deep learning technique was
employed.

To begin, our search strategy and the inclusion/exclusion criteria are first set out to indicate how the papers
were selected for review. Next, biomarkers for AD detection, especially brain scans, are explained. After that,

5 The Alzheimer's Disease Prediction Of Longitudinal Evolution, https://tadpole.grand-challenge.org
6 http://dreamchallenges.org
7 https://www.kaggle.com/c/mci-prediction

data management methods for dealing with brain scans are discussed: voxel-based, slice-based, ROI-based, and
patch-based (along with necessary pre-processing steps). Details of deep learning models used for AD detection
are then described, together with the specific advantages of each. Finally, training parameters, datasets, and
software platforms are discussed, followed by highlights and future challenges.

Figure 1. A typical block diagram of a computer-aided AD detection system: brain scans and other variables are
pre-processed, passed through a data management stage, and fed to a deep model for classification.

2 The review protocol


Since 2013 the exploration of new neural network structures has gained momentum, with much deeper
models coming to the fore, especially for dealing with the processing of medical images [42]. The importance of
deep learning in AD detection has been revealed, and since 2017 the number of papers published in this area has
increased rapidly, as can be seen in Figure 2. Those numbers in Figure 2 show both pre-print and peer-reviewed
papers, but do not include book chapters and theses. Specifically, there are 9 pre-prints and 105 peer-reviewed
articles derived from our search and selection process.

In terms of classification accuracy, deep models are generally more accurate than general machine learning
techniques [43-60]. Many different techniques based on deep learning have been applied to AD detection.
Nevertheless, a number of controversial findings exist, which motivated us to conduct this literature review, in
order to see what the current state of play is, and what the future trends might be. Our main research question
was to investigate whether deep learning techniques were capable of detecting AD using neuroimaging data.

Our systematic literature review follows a well-defined methodology that aims to be as fair and objective as
possible, in contrast to a traditional review that simply summarizes the main results [61-63]. A systematic literature
review consists of three main stages: planning, executing, and reporting. It aims to set a research question, then
develop a review protocol, identify already available reviews, develop a comprehensive search strategy, select
studies based on the selection criteria, analyze content, update the review protocol, perform a quality
assessment, interpret results, and lastly produce the final document [64, 65].

The review protocol details how the review will be conducted. It outlines the research question, specifies the
process to be followed, and sets out the conditions to be applied when selecting studies; there are quality metrics
to ensure the studies chosen are relevant, and team members are given certain tasks in developing the review
protocol. The review protocol here was designed by the first author, and reviewed and revised by the other co-
authors. Mistakes and defects identified by co-authors in data collection and aggregation procedures were used
to revise the research protocol and the research questions. Each study was reviewed at least three times to make
sure that the extracted data fully complied with the final protocol. Data extraction was facilitated by having a

standard data extraction form for each of the research questions. The data extraction form was compiled when
the study protocol was first defined and later revised if any changes were made. Even if the information
provided in a study was incomplete, the whole of the available data was extracted for each research question.
Statistics reported here relate to information provided in primary articles.

The research questions of this study are listed below, together with the sections that address those questions.

• RQ1: What kind of biomarkers and factors are involved in AD detection? (Section 3 and Table 2 of
Appendix 1)

• RQ2: What kind of pre-processing techniques are necessary to deal with biomarkers, especially in
neuroimaging? (Section 3.1)

• RQ3: How can feature extraction be handled for 3D brain scans? (Section 3.2 and Table 2 of Appendix 1)

• RQ4: Which deep models have been used for capturing disease-related patterns of AD? (Section 4 and
Table 2 of Appendix 1)

• RQ5: What datasets and software platforms are applicable in this area? (Section 5 and Table 3 of
Appendix 1)

• RQ6: How can training parameters be chosen in the training process? (Section 6)

• RQ7: What is the current state of AD detection accuracy using deep models? (Tables 4 to 9 of
Appendix 1).

The search strategy and inclusion/exclusion criteria are described in the next section, followed by a
description of the quality assessment process.

Figure 2. Papers using deep learning to detect AD over the years (2013: 2, 2014: 3, 2015: 6, 2016: 16, 2017: 33, 2018: 47).

2.1 Search strategy


To identify contributions in AD detection, IEEE Xplore, ScienceDirect, SpringerLink, and ACM digital
libraries were queried for papers containing "Alzheimer" and "deep" in the title, abstract, or keywords. Also,
Web of Science and Scopus were queried to cross-check the findings and locate other papers in lesser-known
libraries. These online databases were chosen since they offer the most important peer-reviewed full-text

journals and conference proceedings covering the field of deep learning. The search terms used were expected to
cover most, if not all, of the work incorporating deep learning methods for AD detection. In addition, Google
Scholar was used for forward-searching, that is, checking for citations of found papers to update our search and
to look for other papers to ensure nothing was overlooked. The search process was performed by the first author
and the last update was done on April 8th, 2019.

2.2 Inclusion/exclusion criteria


The study selection criteria determine whether a study will be included or excluded from the systematic
review. A pilot version of the selection criteria was defined on a subset of the primary studies and was further
developed after the review protocol was finalized. In this section, the final version of the selection criteria is
explained. Decisions about inclusion/exclusion were not affected by the names of the authors, their institutions,
the journal, or year of publication.

Some analyses tried to distinguish AD from other brain abnormalities such as Parkinson's disease, Down
syndrome, schizophrenia, and autism. These illnesses or disorders are outside the scope of this research, and therefore
subjects categorized as NCs are considered to be completely healthy without any neurological/psychiatric
disorder or treatment [66]. In this study, papers that did not use at least one neuroimaging modality were
excluded. This means that studies utilizing EEG, retina, visual attention, speech disfluencies, and the like,
without involving brain scans, were not included. Moreover, studies without clearly reported results on
classification problems (AD/MCI/NC) are not considered. In other words, we did not include papers
investigating the estimation of, for example, MMSE score or time of conversion from MCI to AD, modeling AD
progression, neuroimaging data completion, image processing techniques, or brain segmentation without clearly
reported classification accuracy. When overlapping studies were reported in multiple publications (such as [67,
68]), all the publications were included so as to understand even small differences.

Finally, after performing a full-text search by the first author, the found papers were reduced to 114 articles
written in English from 2013 onwards on deep learning for AD detection using neuroimaging modalities.
Among these 114 papers, Suk [54, 55, 69-74], Aderghal [75-80], and Cheng & Liu [81-86] had the most papers.
According to Google Scholar, references [11], [30], and [32] are the most cited, whereas reference [87] has the
highest number of citations per year. In this area of research, leading conferences are the IEEE International
Symposium on Biomedical Imaging and the International Workshop on Machine Learning in Medical Imaging;
and the leading journals are NeuroImage, Medical Image Analysis, and IEEE Journal of Biomedical and Health
Informatics. AD studies based on deep learning can be categorized along the following dimensions:

• Detection or diagnosis. Although most studies work on AD detection, a few studies try to identify the
nature of the disease, for example, with ROI extraction.
• Cross-sectional or longitudinal. The former evaluates each subject at a specific point in time, but the
latter follows subjects over time [58-60, 88, 89].
• Single-modality or multi-modality. In contrast to single-modality studies, multi-modality studies use
more than one neuroimaging modality per subject so as to gain complementary information.
• Automatic or manual. Although fully automatic systems are preferred, some studies involved manual
intervention, especially to reduce brain segmentation errors.

2.3 Quality assessment
It is sometimes necessary to assess the quality of a study in order to support the inclusion/exclusion process
or perform comparative analysis. In this secondary review study, there was a limited number of primary articles,
and we did not need any other exclusion metric to reduce the number of studies. Nevertheless, a quality
assessment is still helpful to interpret primary study findings or investigate whether quality differences explain
differences in study results. It is also useful as a means of weighting the importance of individual studies and
guiding recommendations for further research [61-63].

Quality assessment depends strongly on the type of systematic literature review and the contents of the
studies, and so many studies do not perform it [90, 91]. There are no universal definitions of quality metrics for
primary studies, and every systematic literature review has its own task-specific criteria. Quality criteria can be
related to the types of articles (journal or conference), peer-reviewed or not, novelty of the presented idea, and
completeness of the information provided. In our case, four different quality metrics were used including the
article type (pre-print, conference, or journal), scientific impact (number of citations per year), study size
(number of subjects in the dataset), and the completeness of the provided information (according to the research
questions of this review). The following point-based system is used to evaluate the quality of each primary
study:

Table 1. The defined quality metrics in our quality assessment process.

Quality metric        Points (lowest to highest)
Article type          Pre-print / Conference / Journal
Scientific impact     Number of citations per year
Study size            Number of subjects in the dataset
Completeness          Incomplete / Partially complete / Nearly complete

According to Table 1, the maximum possible score is 8, which only one article in our literature review achieved
[55], followed by four articles with a score of 7 [92-95]. The distribution of quality scores of primary studies is
given in Figure 3, which shows that the minimum score was 1, and the average was 3.6. The quality score of
each study is listed in Table 2 of Appendix 1.

Figure 3. Distribution of quality scores in primary studies (score 8: 1 paper; score 7: 4; score 6: 7; score 5: 17;
score 4: 29; score 3: 31; score 2: 22; score 1: 3; score 0: 0).

3 Biomarkers and features in AD detection


Accurate AD detection at the beginning stages of the disease requires evaluation of some quantitative
biomarkers. For detecting AD, several non-invasive neuroimaging modalities such as MRI, fMRI, and PET have
been investigated. Of these biomarkers, MRI is the most widely available and used biomarker for AD detection
and has demonstrated high performance in the literature [35, 42, 96]. It uses a powerful magnetic field and
radiofrequency pulses to create a 3D representation of organs, soft tissues, and bones. fMRI reflects the changes
associated with blood flow. PET is a functional imaging technique based on nuclear medicine methods that can
observe metabolic processes within the body.

In addition to multiple neuroimaging modalities, there are many other factors that are possibly relevant to
AD detection: age, gender, educational level, speech pattern, EEG, retinal abnormalities, postural kinematic
analysis, cerebrospinal fluid (CSF) biomarkers, neuropsychological measures (NMs), MMSE and CDR score,
logical memory test (LM), as well as certain genes that are believed to be responsible for about 70% of the risk
[35]. These factors, together with the multiple neuroimaging modalities, can complicate the training of deep
learning models. Figure 4 shows, in all the studied articles, the prevalence of single-modality and multi-
modality studies, the percentage of each neuroimaging modality among the single-modality approaches, and the
percentage where grey matter (GM) measures were used in MRI-based studies. For simplicity, here only the
prevalence of single-modality brain scans is shown (since the multi-modality category can be very complex, as
shown in Table 2 of Appendix 1).

In this section, pre-processing techniques for brain scans are first explained. We then detail different data
management methods for dealing with 3D brain scans. Finally, we make a broad comparison of data
management methods.

Figure 4. (a) The prevalence of single-modality (73%) and multi-modality (27%) studies; (b) of the single-modality
approaches, which neuroimaging modality was used (MRI 83%, fMRI 9%, PET 8%); and (c) frequency of use of grey
matter measures (GM, 22%) versus raw images (78%) in MRI-based studies. All figures based on our literature review
papers.

3.1 Pre-processing
After sketching the neuroimaging modalities used for AD detection, we next need to look at the way that
studies use these modalities in their deep learning architecture. As a preliminary, however, the necessary pre-
processing steps need to be recognised. Most studies, especially those in machine learning, need pre-processing
before the data can be manipulated. The final success of an intelligent rating system depends strongly on
effective pre-processing. With the advent of deep learning techniques, some pre-processing steps have become
less critical [82, 83]. However, most studies still use pre-processing techniques on raw data such as intensity
normalization, registration, tissue segmentation, skull stripping, and motion correction. At the same time, some
new deep learning methods have been proposed for different pre-processing routines [97]. In this section, the
most common pre-processing techniques are set out.

Intensity normalization means mapping the intensities of all pixels or voxels onto a reference scale. The
intensities are normalized so that similar structures have similar intensities [98-103]. The most commonly
adopted approach is to use the N3 nonparametric non-uniform intensity normalization algorithm [104]. N3 is a
robust and well-established algorithm for sharpening histogram peaks so that any intensity non-uniformity is
reduced. It has been applied to correct non-uniform tissue intensities in about 30% of studies. Another technique
used in about 20% of our studies is smoothing with a Gaussian filter, usually with FWHM (full-width at half
maximum) of between 5–8 mm; this reduces the noise level in the image while retaining the signal level [73].
Another method of intensity normalization is to shift the distribution of voxel intensities about zero (i.e., zero-
centred), which was reported in 15% of studies. Some studies have used other special intensity normalization
methods, for example, processing to remove magnetic field inhomogeneities that occurred during image
acquisition.
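To make the intensity normalization step concrete, the following Python sketch (assuming the nibabel and SciPy packages and a hypothetical NIfTI file path) zero-centres the voxel intensities of a scan and then smooths it with a Gaussian kernel specified by its FWHM in millimetres; it illustrates the two techniques described above rather than any specific pipeline from the reviewed studies.

```python
import numpy as np
import nibabel as nib
from scipy.ndimage import gaussian_filter

def normalize_and_smooth(nifti_path, fwhm_mm=6.0):
    """Zero-centre voxel intensities and apply Gaussian smoothing."""
    img = nib.load(nifti_path)
    data = img.get_fdata().astype(np.float32)

    # Zero-centred intensity normalization: subtract the mean and divide by
    # the standard deviation of the (non-zero) brain voxels.
    brain = data > 0
    data[brain] = (data[brain] - data[brain].mean()) / (data[brain].std() + 1e-8)

    # Convert FWHM (mm) to a Gaussian sigma in voxels using the voxel size
    # stored in the image header, then smooth to suppress noise.
    voxel_sizes = np.array(img.header.get_zooms()[:3])
    sigma_vox = (fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))) / voxel_sizes
    smoothed = gaussian_filter(data, sigma=sigma_vox)

    return nib.Nifti1Image(smoothed, img.affine, img.header)
```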

Registration is the process of spatially aligning image scans to a reference anatomical space. It is essential
due to the complexity of brain structures and the differences between the brains of different subjects. Image
registration aids in standardizing the neuroimaging modalities with respect to a common fixed-size template (such
as the MNI template of the Montreal Neurological Institute). This alignment makes it possible to compare the voxel
intensities of brain scans from different subjects,
ensuring that a certain voxel in one scan has the same anatomical position as in the brain of another patient.
However, registration is not only about using a standard space but is also sometimes used for co-registering
multiple modalities. The anterior commissure (AC) and posterior commissure (PC) are two major anatomical
landmarks in the brain, so another way of aligning image geometry is AC-PC correction: a brain that has been
AC-PC aligned has the AC and PC in the same axial plane. Another pre-processing step is Gradwarp, which
corrects geometrical distortions due to gradient non-linearity.
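As a rough illustration of registration to a common template, the sketch below uses SimpleITK for a rigid, mutual-information-driven alignment of a subject scan to a fixed reference; the file names are hypothetical, and the reviewed studies typically rely on dedicated neuroimaging tools (e.g., SPM, FSL, or ANTs) with affine or non-linear transforms rather than this exact code.

```python
import SimpleITK as sitk

# Load the reference template (e.g., an MNI template) and the subject scan.
fixed = sitk.ReadImage("mni_template.nii.gz", sitk.sitkFloat32)
moving = sitk.ReadImage("subject_mri.nii.gz", sitk.sitkFloat32)

# Initialize a rigid (6 degrees of freedom) transform from the image geometry.
initial = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.Euler3DTransform(),
    sitk.CenteredTransformInitializerFilter.GEOMETRY)

# Mutual-information-driven registration with a gradient descent optimizer.
reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=2.0, minStep=1e-4, numberOfIterations=200)
reg.SetInitialTransform(initial, inPlace=False)
reg.SetInterpolator(sitk.sitkLinear)

final_transform = reg.Execute(fixed, moving)

# Resample the subject scan into the template space.
registered = sitk.Resample(moving, fixed, final_transform,
                           sitk.sitkLinear, 0.0, moving.GetPixelID())
sitk.WriteImage(registered, "subject_mri_registered.nii.gz")
```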

The role of tissue segmentation in MRI brain scanning is to measure the volume of tissue in each region.
Since neurodegeneration affects grey matter (GM) at its initial stages, especially in the medial temporal lobe
region [16, 105], GM probability maps (where GM is compared with white matter, WM) are usually used as the
input in classification problems. GM probability maps give a quantitative picture of the spatial distribution of
this tissue in the brain where the brightness of each voxel reflects the amount of local GM. However, a different
method has been used in which non-WM has been extracted from an MRI using a GM mask on the
corresponding FDG-PET scan [106]. Another widely used pre-processing technique is skull stripping, which removes
the bone of the skull from images. This can be used alone or together with cerebellum removal or neck removal.
The last technique is motion correction, where motion artefacts in brain scans are suppressed. Figure 5 shows
the prevalence of each pre-processing technique in the literature. As can be seen, intensity normalization and
registration are done in more than 50% of studies.

Figure 5. The prevalence of each pre-processing technique in the literature (y-axis: prevalence in %).

3.2 Input data management


The main aim of feature extraction techniques is to create a quantified set of accurate information such as
shape, texture, and volume of different parts of the brain based on neuroimaging data. The information should
convey the disease pattern and be readily classified. In general, every classification problem has three stages:
feature extraction, feature dimension reduction, and finally classification. Thanks to the structure of deep
learning models, all these steps can be merged into one. However, managing the whole neuroimaging modality

is still a challenge. Considering all the studies reviewed here, approaches to input data management can roughly
be grouped into four different categories, depending on the type of extracted features: voxel-based, slice-based,
patch-based, and ROI-based [34, 35]. The prevalence of each category is shown in Figure 6, with more details
in the following sections (note, however, that not all studies fall into these categories; for example, a feature
extraction method was used in [107, 108]).

Figure 6. The prevalence of each approach to input data management (ROI-based 34%, slice-based 27%,
voxel-based 21%, patch-based 9%, combinational 9%).

3.2.1 Voxel-based
Voxel-based approaches are the most straightforward analysis techniques. They use voxel intensity values
from the whole neuroimaging modalities or tissue components (GM/WM in MRI). This technique typically
requires spatial co-alignment (registration), where the individual images of the brain are standardized to a
standard three-dimensional space. Most studies in this category (about 70%) performed a full-brain image
analysis in either single-modality or multi-modality mode. In the rest of the studies, however, tissue
segmentation (the extraction of GM) was performed on MRI images before applying a deep model. Voxel-based
studies performing tissue segmentation cannot be considered full-brain image analysis as they work on only a
part of the brain. The advantage of tissue segmentation in MRI brain scans is explained in Section 3.1. In voxel-
based machine learning methods, a feature dimension reduction technique is usually applied, but this is not
necessarily useful in deep structures. Nevertheless, to overcome high feature dimensionality, a voxel
preselection method can be applied to each neuroimaging modality independently; as an example, Ortiz and
colleagues used the t-test algorithm in an ROI-based study to eliminate non-significant voxels and decrease
computational load [109].
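As an illustration of such voxel preselection, the following sketch applies a two-sample t-test at every voxel and keeps only those with a significant between-group difference, in the spirit of the t-test used by Ortiz and colleagues [109]; the arrays of flattened, co-registered scans and the p-value threshold are hypothetical.

```python
import numpy as np
from scipy import stats

def preselect_voxels(ad_scans, nc_scans, p_threshold=0.01):
    """Voxel preselection via a two-sample t-test.

    ad_scans, nc_scans: arrays of shape (n_subjects, n_voxels) holding the
    flattened, co-registered scans of the AD and NC groups.
    Returns a boolean mask over voxels and the reduced feature matrices.
    """
    # Two-sample t-test at every voxel position (vectorized over subjects).
    _, p_values = stats.ttest_ind(ad_scans, nc_scans, axis=0)

    # Keep only voxels with a significant between-group difference.
    mask = p_values < p_threshold
    return mask, ad_scans[:, mask], nc_scans[:, mask]

# Hypothetical usage: 50 AD and 50 NC subjects, 100,000 voxels each.
ad = np.random.rand(50, 100_000).astype(np.float32)
nc = np.random.rand(50, 100_000).astype(np.float32)
mask, ad_sel, nc_sel = preselect_voxels(ad, nc)
print(f"{mask.sum()} of {mask.size} voxels retained")
```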

3.2.2 Slice-based
Slice-based architectures assume that certain properties of interest can be reduced to two-dimensional
images, reducing the number of hyper-parameters. Many studies have used their own unique technique to
extract 2D image slices from a 3D brain scan, whereas others consider standard projections of neuroimaging
modalities, such as sagittal or median plane, coronal or frontal plane, and axial or horizontal plane.
Nevertheless, none of the studies in this category performed a full brain analysis, since a 2D image slice cannot
include all the information from a brain scan. In addition to using tissue segmentation, slice-based methods
usually take in the central part of the brain and ignore the rest.

Axial projection is the most widely used view. For example, Farooq et al. used slice-based axial scans of
GM volumes such that slices from the start and end, which contain no information, were discarded [110]. Other
examples have used median axial slices from an MRI [111], 166 axial slices of GM [112], 43 axial slices of
fMRI [32], and 3 axial slices of MRI [113]. In two papers, the last 10 slices along the axial plane of each subject
were removed from the GM, as well as slices with zero mean pixels, while all the other slices were concatenated
and used [114, 115]. Axial slices from fMRI data were also used in [116, 117], and again the first 10 slices of
each scan were removed as they contained no functional information. A similar effort by Qiu et al. [118] used
three slices in the axial plane of an MRI, including anatomical areas previously reported as regions of interest,
and these were correlated with AD and MCI. Luo and colleagues [119] extracted seven groups of slices (5 slices
in each group) of the mid-axial plane of an MRI, with one classifier per group.

An entropy-based sorting procedure [120, 121] was used to select the 32 most informative slices from the axial
plane of each MRI scan. In this method, the image entropy of each slice was computed from the histogram,
which delivered a measure of variation in each slice, and the slices with the highest entropy values were
considered the most informative. Although using these informative slices for training will provide robustness,
high entropy is not necessarily discriminative. Wu et al. adopted a new method that combined 3 slices into an
RGB colour image to meet the requirement of their CNN architectures [122]. From among about 160 axial
slices of MRI scans, the first 15 slices and the last 15 slices without anatomical information were discarded,
resulting in about 130 slices for each scan. Next, 48 different slices were selected randomly from the remaining
slices at intervals of 4, and thus 16 RGB colour images were generated for each scan.
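A minimal sketch of the entropy-based slice ranking described above is given below; it assumes a NumPy array holding a 3D scan with axial slices along the first axis, the choice of 32 slices follows the cited studies, and the histogram bin count is an assumption.

```python
import numpy as np

def slice_entropy(slice_2d, bins=256):
    """Shannon entropy of a 2D slice computed from its intensity histogram."""
    hist, _ = np.histogram(slice_2d, bins=bins)
    p = hist.astype(np.float64)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

def most_informative_slices(volume, n_slices=32):
    """Indices of the n_slices axial slices with the highest entropy."""
    entropies = np.array([slice_entropy(volume[i]) for i in range(volume.shape[0])])
    return np.argsort(entropies)[::-1][:n_slices]

# Hypothetical usage on a 3D MRI volume stored as (slices, height, width).
volume = np.random.rand(160, 192, 192)
selected = most_informative_slices(volume)
```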

According to Gunawardena et al. [52], the coronal view covers the three most important AD-related regions
in the brain (hippocampus, cortex, and ventricles), and they used only a couple of image slices from the coronal
plane of MRI scans. Under the assumption that the middle slices include areas that have essential features for
classification, 20 [123] and 7 [124] mid-coronal slices of an MRI have been used. A similar approach [125]
emphasized the discriminative potential of the coronal view. Five sagittal slices of an MRI at the centre of the
hippocampus [77], 62 mid-sagittal slices of GM [126], and one sagittal slice of MRI (including hippocampus)
were employed in [127]. Gao and colleagues [128] selected the 50 largest pieces of the sagittal plane from each
MRI scan, and then removed the noisiest and less useful image slices; the value of 50 was chosen based on the
opinion of neurologists.

Since using all three views of 3D scans may supply complementary features useful for classification, there
are some studies that take all image views into account. For example, a majority voting strategy for deep
networks designed for each view was applied in [94, 129]. In [85, 86] the decomposed image slices from each
projection of FDG-PET scans were divided into a number of groups at specific intervals that had some overlap
but no registration or segmentation. In related work [75, 76], the hippocampal region of MRI scans was utilized
on all 3 projections, but with only 3 slices at the centre of the hippocampal region in each projection. A similar
approach was used with morphological information of MRI images, such as cortical volume, surface area,
average cortical thickness, and standard deviation of thickness in each ROI [130]. Aderghal used a similar
method, except that it used a multi-modality approach (MRI+DTI) [78].
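For the multi-view studies above, the fusion rule itself can be as simple as majority voting over the per-view class predictions; the sketch below illustrates this with hypothetical predictions from axial, coronal, and sagittal classifiers, and is not the exact fusion scheme of any particular paper.

```python
import numpy as np

def majority_vote(view_predictions):
    """Combine per-view predictions of shape (n_views, n_subjects) by majority voting."""
    view_predictions = np.asarray(view_predictions)
    n_classes = view_predictions.max() + 1
    # Count the votes for each class, one column (subject) at a time.
    votes = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=n_classes), 0, view_predictions)
    return votes.argmax(axis=0)

# Hypothetical predictions from axial, coronal, and sagittal networks
# for five subjects (0 = NC, 1 = MCI, 2 = AD).
axial    = [0, 1, 2, 2, 1]
coronal  = [0, 1, 2, 1, 1]
sagittal = [1, 1, 2, 2, 0]
print(majority_vote([axial, coronal, sagittal]))  # -> [0 1 2 2 1]
```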

3.2.3 ROI-based
Instead of being concerned with the whole brain, ROI methods focus on particular parts of the brain known
to be affected in the early stages of AD. The definition of ROIs usually requires previous knowledge of the
abnormal regions and a brain atlas such as the Automated Anatomical Labeling (AAL) [131] or the Kabani
reference work [132], combined with the long-term experience of researchers. In this way, the GM tissue
volume of 93 ROIs only from MRI [54, 55] along with the mean intensity from PET of the same number of
ROIs were computed as features in [67-71, 87, 133-135]. Similarly, 83 functional regions from MRIs (GM)
and PET were extracted in [43, 44, 136]. Choi and colleagues [72] computed GM tissue volumes of 93 ROIs,
and then picked out regional abnormalities using a deep model of each region. In other work [50, 137, 138],
Principal Component Analysis (PCA) was applied after extracting 93 ROI-based volumetric features from MRI
and the same number of features for PET. In [46, 47], 90 ROIs were extracted from fMRI images and the
correlation coefficient between each possible pair of brain regions computed. Ortiz and colleagues [109], using a
voxel preselection method, selected 98 ROIs from both MRI (GM only) and PET, and designed a deep model
for each ROI. Suk et al. [74] selected 116 ROIs from fMRI images and then trained a deep model on the mean
intensities of each ROI; in this way they found, in an unsupervised and hierarchical way, the non-linear relations
between the ROIs. Together with patch-based features of GM and deformation magnitudes (DM) from MRI
scans, Shi et al. extracted 113 ROI volumes [59]. Image patches were extracted in each of 62 ROIs of PET [99]
or MRI images in [98], while 85 ROIs from PET [49] and 87 ROIs from PET and MRI (GM only) [93] were
extracted, from which these ROIs were further utilized in a patch-based method. In another study, 90 ROIs were
extracted and then a brain network connectivity matrix calculated from multi-modal data [139].
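Several of these ROI-based pipelines reduce an fMRI scan to per-ROI mean signals and then to a region-by-region correlation (connectivity) matrix. A minimal NumPy sketch of that reduction is shown below; the atlas label volume, the number of ROIs, and the data shapes are hypothetical.

```python
import numpy as np

def roi_mean_time_series(fmri_4d, atlas_labels, n_rois):
    """Average the fMRI signal over the voxels of each atlas-defined ROI.

    fmri_4d: array of shape (x, y, z, time); atlas_labels: array of shape
    (x, y, z) with integer ROI labels 1..n_rois (0 = background).
    Returns an array of shape (n_rois, time).
    """
    series = np.zeros((n_rois, fmri_4d.shape[-1]))
    for roi in range(1, n_rois + 1):
        series[roi - 1] = fmri_4d[atlas_labels == roi].mean(axis=0)
    return series

def connectivity_matrix(series):
    """Pearson correlation between every pair of ROI time series."""
    return np.corrcoef(series)

# Hypothetical usage with a 90-ROI atlas (e.g., AAL) and 120 time points.
fmri = np.random.rand(61, 73, 61, 120)
labels = np.random.randint(0, 91, size=(61, 73, 61))
conn = connectivity_matrix(roi_mean_time_series(fmri, labels, 90))  # (90, 90)
```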

The median slice and its closest neighbours inside a 3D bounding box of the hippocampus were chosen in
[75-78]. This method was called the "2D+ approach" since they moved from a 3D volume to 2D images.
Bhatkoti et al. [100] devised a patch-based representation of different brain sub-regions, including left and right
hippocampus, mid-occipital, parahippocampus, vermis, and fusiform. Shakeri et al. extracted morphological
features as 3D surface meshes from the hippocampus structure of MRIs [57]. Dolph et al. [48] extracted Fractal
Dimension (FD) texture features, together with volumetric, cortical thickness, and surface area features of the
segmented hippocampus, from MRIs and then calculated the statistical properties of the Gray-Level Co-
Occurrence Matrix (GLCM) to describe the FD feature pattern. In a multi-modality study [79, 80], the left and
right lobe of the hippocampus were selected as the most discriminative parts, and a deep model was designed for
each region. Collazos-Huertas et al. [140] used morphological measurements of different parts of MRI scans,
including cortical and subcortical volumes, average thickness and standard deviation, and surface area. In
another MRI study [141], the two hippocampi were segmented and a local 3D image patch extracted from the
centre of each; a deep model was then used for classification. In [142], 430 features were selected, including
cortical thickness, curvature, surface area, and volume, as well as the hippocampus, and analysed together with
gender, age, and baseline MMSE total score. Highly correlated features were then removed to produce
independent features. Finally, a random forest classifier was used for feature selection to identify the 20 most
important features.

After discarding the first and last 10 sagittal slices, Karwath et al. used a deep model to extract informative
ROIs from PET scans [102]. Li et al. [51] selected ROIs from GM segmentation of MRIs, and then calculated a

weighted connectivity matrix of ROIs, which represents the connection strength between region pairs, to
produce a final brain network. Instead of directly learning the topological features from complex brain networks,
the method learns the corresponding eigenvalues of the matrix, giving a compact and complete feature
representation. In [143], a voxel-based morphometric analysis of regional GM differences between two groups
of patients (MCIc, MCInc) was used to obtain the 5 most significant ROIs related to GM damage. Ortiz et al.
[60] considered 42 ROIs that were closely related to AD, and then computed an estimate of the corresponding
inverse covariance between regions.

3.2.4 Patch-based
A patch is defined as a three-dimensional cube. Patch-based approaches can capture disease-related patterns
in a brain by extracting features from small image patches. The main challenge in patch-based methods is to
choose the most informative image patches for capturing both local (patch-level) and global (image-level)
features [94]. This approach has been used in a number of studies for AD detection [144]. For example, Cheng
et al. [82] extracted 27 uniform fixed-size local patches of voxels, with 50% overlap, from each FDG-PET
image. A similar approach was proposed in a multi-modality study [83]. Somewhat differently, landmark-based
methods have been used to automatically extract discriminative anatomical landmarks of AD from MRIs via
group comparison of subjects; first, the top 50 discriminative AD-related landmark locations were identified
(bilateral hippocampal, parahippocampal, and fusiform) using a landmark discovery algorithm, and then 27
fixed-size image patches around these detected landmarks were extracted [94, 95, 103, 145]. A similar patch-
based approach was followed in a multi-modality study [146]. In another study [147], the whole brain MRI was
uniformly partitioned into different local regions of the same size, and several 3D patches were extracted from
each region. Then the patches from each region were grouped into different clusters with the k-means clustering
method before the final classification.
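To illustrate the kind of patch extraction used in these studies, the sketch below cuts a 3D volume into fixed-size cubes on a regular grid with a configurable overlap, loosely following the uniform partitioning with 50% overlap described by Cheng et al. [82]; the patch size and volume dimensions are hypothetical.

```python
import numpy as np

def extract_patches(volume, patch_size=32, overlap=0.5):
    """Extract fixed-size 3D patches on a regular grid with a given overlap."""
    step = max(1, int(patch_size * (1.0 - overlap)))
    patches, locations = [], []
    for x in range(0, volume.shape[0] - patch_size + 1, step):
        for y in range(0, volume.shape[1] - patch_size + 1, step):
            for z in range(0, volume.shape[2] - patch_size + 1, step):
                patches.append(volume[x:x + patch_size,
                                      y:y + patch_size,
                                      z:z + patch_size])
                locations.append((x, y, z))
    return np.stack(patches), locations

# Hypothetical usage: a 96^3 volume with 32^3 patches and 50% overlap
# yields 5 x 5 x 5 = 125 patches.
volume = np.random.rand(96, 96, 96)
patches, locs = extract_patches(volume)
print(patches.shape)  # (125, 32, 32, 32)
```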

Suk and colleagues [73] proposed a latent high-level feature representation method using class-
discriminative patches (based on a statistical significance test between classes) in one multi-modality study (GM
of MRI + FDG-PET). They selected class-discriminative patches from two modalities before the final three-
level classifier (patch-level, mega-patch-level, and image-level). A similar approach [58, 59] was applied in a
longitudinal study with a difference in patch selection after patch extraction. The 100 most class-discriminative
patches of fixed-size were selected in a greedy manner using less than 50% volume overlap. A 3D scalar field of
DM was then calculated, based on estimated voxel deformations matching the baseline and follow-up MRIs of
each subject, before the final classification stage. In addition to these patch-based features, 113 volumes
of ROIs were also extracted by Shi [59]. A multi-scale approach that extracted data at multiple scales from
smaller sub-regions (fine-scale) to larger sub-regions (coarse-scale) was suggested by Lu [49]. First, 85 ROIs of
FDG-PET were extracted, and then the voxels inside each ROI were further subdivided into patches at three
different scales in preparation for the classification stage. The same approach was applied by Lu [93] for a
multi-modal study where the mean intensity of each patch in FDG-PET scans was used to form a feature vector
representing metabolic activity, and the volume of each GM patch from MRI was used to represent the brain
structure. A similar method was proposed by Lian [92], where anatomical landmarks were used as prior
knowledge to efficiently filter out uninformative regions and assist in defining relatively informative patches.

3.3 Summary of biomarker and handling issues
When dealing with biomarkers and features, it needs to be recognised that MRI is the most prevalent type of
neuroimaging modality. Although several studies have reported that MRI is more discriminative compared with
PET [56, 67, 73, 148] or DTI [78], others regard MRI to be as discriminative as PET [69-71, 81, 87, 133, 146]
or slightly less discriminative [83, 93]. Since other studies regard DTI [139] or fMRI [115] as most helpful,
comparison of neuroimaging modalities still needs further investigation.

Managing the input to deep models is also an important issue. Using 2D slices as input instead of the whole
3D image avoids generating millions of training parameters and results in simplified networks (at the cost, of
course, of losing spatial dependency between adjacent slices). When using slice-based methods, sagittal [75, 76]
and coronal [52] views are reported to be more discriminative, although axial views are the most widely used
(some studies [78, 86] say there is no significant difference between planes). In terms of classification power,
multi-view studies have been shown to outperform single-view studies by capturing complementary information
[75, 76, 86]. By way of contrast, voxel-based methods can obtain all 3D information in a single brain scan, but
they typically treat all brain regions uniformly without any adaptation to special anatomical structures.
Moreover, voxel-based methods ignore local information, since they treat each voxel independently, and carry
high feature dimensionality and high computational load. To overcome the high feature dimensionality, voxel
preselection methods might be necessary. Additional benefits of voxel-based methods are presented in [106,
149], where voxels are shown to be more valuable than 2D slices.

Raw brain scans inevitably suffer from noise, which arises from different sources and at different levels
depending on the type of scan. Noise sources usually originate from random neural activity of the patient,
operator issues, equipment, and the environment. A single brain scan contains a complex pattern of voxels and a
large amount of data, creating difficulties in classifying and interpreting features. To classify images, it is
therefore necessary to extract a limited number of discrete pre-defined representative regions instead of using
full-brain image analysis. The strength of ROI-based methods is that they are easily interpreted and
implemented in clinical practice. Although the dimensionality of ROI-based features depends on the number of
defined ROIs, it is always smaller than with slice/voxel-based approaches, meaning the entire brain is
represented by fewer features. With ROI-based studies, knowledge that only specific parts of the brain,
particularly the hippocampus, are involved in AD is put to good use. The hippocampus is a complex brain
structure located in the temporal lobe with a key role in learning and memory, making it one of the most
significant regions for AD detection. At the initial stages of AD, the volume, shape, and texture of the
hippocampus are already affected and have been used as a marker of early AD in various studies [57, 75-80, 98,
99, 141]. Ali and colleagues reported that while the average reduction in volume of the hippocampus is between
0.24% and 1.73% per year, AD patients suffer shrinkage at between 2.2% and 5.9% [150]. In this context, shape
analysis was reported to be more sensitive than volumetry, in particular, at the MCI stage [7]. According to
Leandrou and colleagues [20], texture analysis can outperform both shape and volumetric analysis in
classification accuracy. A few studies combined volume, thickness, shape, intensity, and texture features in
evaluating AD [35, 151], which may result in better classification performance. Note that although the regions
mainly affected by AD are well-known, it should be remembered that other brain regions might also play a role
in the diagnosis of AD/MCI; however, their contribution is still not well explored [69, 71].

While ROI-based feature extraction can considerably decrease feature dimensions, because of the coarse-
scale nature of ROIs some small abnormalities might be ignored. Also, an abnormal region might occupy only a
small part of a pre-defined ROI, may have an irregular shape, and may be distributed over many incompletely
known brain regions; if so, it could lead to loss of discriminative information and limit the representational
power of the extracted features [95]. Accordingly, there could be instability in classification performance [152].
On the other hand, since patch-based methods occupy the intermediate scale between voxel-based and ROI-
based features, they can efficiently handle high feature dimensions and are sensitive to small changes [73, 147].
Because patch extraction does not require ROI identification, the necessity to involve a human expert is reduced
compared to ROI-based approaches. In the end, Cheng and colleagues reported that patch-based methods are
more accurate compared with voxel-based methods [145]. However, challenges in selecting the most
informative image patches still remain. By only using discriminative patches instead of all patches in a brain
scan, Suk and colleagues found both enhanced classification performance and reduced computational cost [73].
A summary of data handling methods is given in Table 2.

Table 2. A summary of data handling methods for AD detection.

• Slice-based. Strengths: avoids dealing with millions of parameters during training and results in simplified
networks. Limitations: loses spatial dependencies between adjacent slices.
• Voxel-based. Strengths: can obtain the full 3D information of a brain scan. Limitations: high feature
dimensionality and high computational load; ignores the local information of the neuroimaging modalities as it
treats each voxel independently.
• ROI-based. Strengths: easily interpretable; low feature dimension; fewer features can reflect the entire brain.
Limitations: relies on limited available knowledge about the brain regions involved in AD; ignores detailed
abnormalities.
• Patch-based. Strengths: sensitive to small changes; does not require ROI identification. Limitations:
challenging to select the most informative image patches.

4 A review on deep models for AD detection


The goal of this section is to outline fundamental concepts and algorithms of deep learning techniques and
their architectures that are found in AD detection. The methods are divided into unsupervised and supervised,
which are further separated into Auto-Encoder (AE), Restricted Boltzmann Machine (RBM), Deep Neural
Network (DNN), Deep Polynomial Network (DPN), Recurrent Neural Network (RNN), and 2D/3D
Convolutional Neural Network (CNN). Figure 7 shows the prevalence of each deep model in AD detection, and
more details can be found in [153]. There are also a few methods that cannot fit into this categorization scheme;
for example, Collazos-Huertas and colleagues proposed a deep supervised feature extraction method using
General Stochastic Networks through supervised and layer-wise non-linear mapping learning [140].

Figure 7. The prevalence of each type of deep model used in AD detection from neuroimaging data (y-axis: number of papers).

4.1 Unsupervised deep learning


Unsupervised deep learning networks try to obtain a task-specific representation from neuroimaging data.
The step from machine learning into deep learning was first taken by using an unsupervised deep neural network
to extract high-level features [50, 69]. Suk and colleagues developed an approach where they iteratively
discarded uninformative features in a hierarchical manner to select features from MRI, PET, and CSF
biomarkers [70]. In the end, all of these studies used SVMs for classification. More details on unsupervised
feature learning methods are given in the following subsections.

4.1.1 Auto-Encoder
The AE is a particular type of neural network consisting of two modules: an encoder and a decoder. An AE
can obtain compressed representations from input data by minimizing the reconstruction error between the input
and output values of the network. Encoding maps the data from the input space to a representation space to keep
the most definitive features, whereas decoding maps it back into the input space, thus reconstructing the data.
For classification purposes, the features learned in the middle layer of an AE can be extracted and used as the
pre-training phase for feature extraction and dimension reduction in an unsupervised way, followed by a
classifier. Because of its simple and shallow structure, the representational power of AEs is very limited.
However, multiple AEs can be stacked in a configuration known as stacked AEs, which can considerably
enhance the representational power by using the values of the hidden units of one AE as the input to the next.
The key characteristic of stacked AEs is their ability to learn or discover highly nonlinear and complex patterns.
As the depth of the model increases, higher-level representations are learned.
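As an illustration of this layer-wise stacking, the sketch below builds and greedily trains a small stack of AEs in Keras; the input dimensionality, layer sizes, and random placeholder data are illustrative assumptions rather than settings taken from any reviewed study.

```python
# A minimal sketch of greedy layer-wise stacking of auto-encoders (Keras).
# The 500-dimensional input (e.g., flattened ROI/voxel features), the layer
# sizes, and the random data are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

def build_autoencoder(input_dim, hidden_dim):
    """One AE: the encoder maps the input to a code, the decoder reconstructs it."""
    inp = layers.Input(shape=(input_dim,))
    code = layers.Dense(hidden_dim, activation='sigmoid')(inp)   # encoder
    out = layers.Dense(input_dim, activation='linear')(code)     # decoder
    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, code)
    autoencoder.compile(optimizer='adam', loss='mse')            # reconstruction error
    return autoencoder, encoder

X = np.random.rand(100, 500).astype('float32')    # placeholder feature vectors

# The codes of one AE become the input of the next (stacked AEs).
codes = X
for hidden_dim in [128, 64, 32]:
    ae, enc = build_autoencoder(codes.shape[1], hidden_dim)
    ae.fit(codes, codes, epochs=10, batch_size=32, verbose=0)
    codes = enc.predict(codes, verbose=0)         # higher-level representation

# 'codes' now holds the top-level representation, which can be fed to a
# softmax layer or an SVM, and the stacked encoders can be fine-tuned jointly.
```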

There are different variations of AEs: sparse AE, de-noising AE, convolutional AE, and variational AE.
General deep learning structures in this section usually consist of an AE and a softmax layer. Stacked sparse
AEs with two [43-47, 136] or three [48, 49, 93] hidden layers and a softmax layer have been trained and fine-
tuned in several single-modality [45-49] and multi-modality [43, 44, 93, 136] studies. Ortiz and colleagues

followed a similar approach with three hidden layers in a stacked de-noising AE [60]. Shakeri and colleagues
introduced a variational AE for finding the latent feature representation from hippocampus morphological
variations [57]. Moussavi-Khalkhali et al. employed a partially cascaded architecture of sparse/denoising AEs
with three hidden layers to extract high-level representations, integrating low-level features with a softmax layer
[124]. Kim et al. used a stacked sparse AE with three hidden layers for MRI, PET, and CSF data [134];
another AE was then used to fuse the high-level representations of all the data, which were classified with a kernel-based
extreme learning machine.

Choi and colleagues used a stacked denoising AE with two hidden layers to extract regional abnormalities
[72]. In a related fashion, a stacked de-noising sparse AE, which is a combination of both denoising and sparse
AEs, has been proposed, an approach that uses three hidden layers together with an SVM for classification [58,
59]. By concatenating the learned feature representations of stacked AEs from MRI, PET, and CSF data with the
the original low-level features, Suk and co-workers were able to construct an augmented feature vector that was
then fed into a multi-kernel SVM [69, 71]. Three hidden layers were used for MRI, PET, and their
concatenation, and two hidden layers were used for CSF. Other workers have used a sparse AE to extract
features, followed by a 2D CNN [144] or DNN [100] for classification. Motivated by stacked de-noising AEs,
Majumdar and colleagues proposed a deep dictionary learning platform using both clean and noisy samples,
finally using a neural network for classification [138]. Li and collaborators used a combination of features from
3D CNN and multi-scale 3D convolutional AEs with three hidden layers, ultimately using a softmax layer for
classification [84].

One of the uses of AEs is to find a good initialization for deep neural networks. For example, an AE was
trained to find a suitable filter for convolution, and then the classification was achieved using a CNN [143, 146,
149, 154, 155], an arrangement whereby weights of the hidden layer form a matrix corresponding to
convolutional filters. A high-level layer concatenation AE, which is a variant of the convolutional AE network,
was used by Vu and co-workers to pre-train a CNN in which the high-resolution features from the encoding
layer are concatenated with the corresponding decoding layer [106].

4.1.2 Restricted Boltzmann Machine


The RBM is a single-layer undirected graphical model with a visible layer and a hidden layer. It adopts
symmetric links between visible and hidden layers but has no connections among units within the same layer.
Like an AE, it can generate input data from hidden representations and has been used in a few studies. For
example, Li et al. used a deep model consisting of a stack of RBMs to extract features in an unsupervised
manner [137], finishing with a linear SVM for classification. Suk et al. stacked together multiple RBMs to
transform the input features of an fMRI into an embedding low-dimensional space by detecting non-linear
relations among ROIs [74]. The Suk team first detected hierarchical non-linear functional relations, and then a
Hidden Markov Model (HMM) was used to approximate the likelihood of the input features of the fMRI; for
classification, fitting to the corresponding disease status was done with a linear SVM.

Like AEs, RBMs can be stacked to construct a deep architecture known as a Deep Belief Network (DBN).
The DBN has undirected connections at the top two layers and directed connections at the lower layers. A DBN
with three hidden layers for voxel values of MRI (GM tissue) with a linear SVM as the classifier was proposed
in [53]. Ortiz and colleagues used a set of DBNs for all ROIs as feature extractors, with a linear SVM at the

final stage for classification of all DBNs [109]. As pointed out by Guo, despite the benefits, it is computationally
expensive to create a DBN model because of the complicated initialization process [153]. A Deep Boltzmann
Machine (DBM) is also constructed by stacking multiple RBMs as building blocks to find a latent hierarchical
feature representation. Yet, in contrast to DBNs, all the layers in DBMs form an undirected generative model
following the RBM stacking. Although joint optimization is time-consuming in DBMs, and may be impossible
for large datasets, DBMs can deal with ambiguous inputs more robustly by incorporating top-down feedback
[39, 153]. Instead of using noisy voxel values directly, Suk et al. found that a high-level 3D representation obtained via
a DBM was more robust to noise and thus helped improve diagnostic performance; their multi-modal DBM derived its
features from paired patches of transformed GM tissue densities (MRI) and voxel intensities (PET), with a linear SVM
used as the final classifier [73].
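As a simplified, hedged illustration of using stacked RBMs as an unsupervised feature extractor followed by a linear SVM (without the joint fine-tuning of a full DBN/DBM), the sketch below uses scikit-learn's BernoulliRBM; the layer sizes and random placeholder data are illustrative assumptions.

```python
# Stacked RBMs as an unsupervised feature extractor with a linear SVM on top.
# Layer sizes and the random data are illustrative assumptions.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X = np.random.rand(200, 500)          # placeholder features scaled to [0, 1]
y = np.random.randint(0, 2, 200)      # placeholder labels (e.g., AD vs. NC)

model = Pipeline([
    ('rbm1', BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20, random_state=0)),
    ('rbm2', BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    ('svm', LinearSVC(C=1.0)),
])

model.fit(X, y)                        # each RBM is trained on the previous one's output
print(model.score(X, y))               # training accuracy (illustrative only)
```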

4.2 Supervised deep learning


We have seen so far that a typical unsupervised method comprises a feature extractor, usually an AE, and a
classifier, such as an SVM. Supervised methods, in which feature extraction and classification are merged into one
model, are more popular in our literature review. In this section, supervised methods for AD detection
are reviewed. Further details about the methods used in each paper are set out in Table 2 of Appendix 1, which
summarizes a wide variety of deep models together with their biomarkers.

4.2.1 Deep Neural Network


The DNN has the same configuration as a traditional Multi-Layer Perceptron (MLP) network, yet
incorporates more stacked layers. It can discover complicated relations of input patterns and provides a better
understanding of the nature of the data [50]. DNNs are purely supervised and widely used in different research
areas to discover previously unknown, highly abstract patterns and correlations. However, the training
process of DNNs is not optimal, at least compared to SVMs, and their learning process is also very slow [39].
Bhatkoti et al. proposed a DNN with one hidden layer after a feature extraction step with a modified sparse AE
[100]. Amoroso and colleagues used a DNN with 11 layers that combined an optimized mix of activation
functions with a decreasing number of neurons in each layer, followed by a softmax at the end [142]. Cui and
co-workers used a DNN with two hidden layers to extract features and train an RNN [88]. A three-stage deep
feature learning and fusion framework for MRI, PET, and genetic data was proposed in [67, 68], with the first
stage involving the learning of latent representations of each modality and the last stage the learning of joint
latent features. In each stage, a DNN was used, and each DNN consisted of several fully-connected hidden
layers and one output softmax layer. A similar approach was used by Thung and collaborators who combined
these modalities together with age, gender, and educational level [135]. Although DNNs are mostly used in a
supervised manner, an unsupervised approach with three hidden layers was suggested by Li and colleagues to
extract high-level feature representation with a linear kernel SVM as the classifier [50].

4.2.2 Deep Polynomial Network


The DPN is another supervised deep learning algorithm, which may have similar, or even better,
performance compared with DBNs and stacked AEs [156]. Motivated by successful applications of DBNs and
stacked AEs, DPNs can also be stacked to construct a much deeper configuration to further enhance
representation performance. A multi-modal stacked DPN (SDPN) consisting of two-stage SDPNs has been

proposed for feature extraction from multi-modal neuroimaging data [87, 133]. Two SDPNs are first used to
learn high-level features from MRI and PET, from where features are then fed to another stacked DPN to fuse
multi-modal neuroimaging information. The final learned high-level features contain both the intrinsic
properties of each modality and correlations between the modalities. The whole model was trained in an
unsupervised manner with a linear SVM for classification.

4.2.3 Convolutional Neural Network


CNNs are a type of deep neural network inspired by the visual cortex of the brain. They are the most
successful deep model for image analysis and have been designed to better utilize spatial information by taking
2D/3D images as input and extracting features by stacking several convolutional layers; the result is a hierarchy
of gradually more abstract features [39, 42]. The main idea behind CNNs, and their major advantage, is to
merge feature extraction and classification. The logic behind them is that training a classifier independently
from the feature extraction stage can lead to poor learning performance, possibly due to the heterogeneous
nature of the extracted features and the classifier. The structural information between neighbouring pixels or
voxels is also very important for images. In the deep models we are considering, the inputs are mostly in vector
form, yet vectorization inevitably destroys structural information in images. Also, in contrast to DNNs, the
pooling layers and shared weights in CNNs drastically reduce the number of parameters [42]. In recent
years, CNNs have become very popular in image-based applications. However, the need for a large dataset
can be considered a weakness of these models.

CNNs were first introduced in 1989 by LeCun and colleagues [157]. Despite initial success, they were not
widely employed until recently, when various new methods emerged for efficiently training deep
networks and computer hardware improved [42]. CNNs attracted great interest after deep CNNs achieved
remarkable results in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions, where
they have been successfully applied to a dataset of about a million images that included 1000 different classes
[158]. Although many studies prefer to design their own structure based on CNNs, it is common to use well-
known and proven structures like LeNet [159], AlexNet [160], CaffeNet [161], VGGNet [162], GoogLeNet
[163], ResNet [164], DenseNet [165], and Inception [166]. The success of these models has already been
reported in the literature, and especially demonstrated in ImageNet competitions; more details can be found in
[153, 167-169]. In the following subsection, the architecture of CNNs is reviewed, and then related studies using
2D/3D CNNs for AD detection are discussed.

4.2.3.1 CNN architecture


CNNs are made up of several convolutional layers, activation layers, pooling layers, fully-connected layers,
and a softmax layer. Training a CNN involves a forward stage, which computes the loss between the predicted
output and the ground-truth labels, and a backward stage, in which gradients are propagated using the chain rule. The first and
fundamental layer is the convolutional layer, which convolves the input image with the learned filters and
produces appropriate feature maps. While the first layers of the CNNs extract discriminative shift/scale-
invariant features of local image patches, the last layers permit task-specific classification using these features
[153]. A typical convolutional layer is usually followed by applying a nonlinear activation function such as a
sigmoid, hyperbolic tangent (Tanh), or Rectified Linear Unit (ReLU) to build a feature map for each filter. This
non-linearity enables models to learn more complex representations. ReLU has been employed in most studies,

while sigmoid or Tanh is still popular. The second type of layer that comes after the convolutional layer is the
pooling layer, which down-samples the input feature map by replacing each non-overlapping block with its
maximum or average. Pooling helps reduce the number of parameters, feature dimensions, and computations in
the network while keeping the most influential features more compact in low to higher layers. It therefore
achieves a degree of robustness to certain distortions and geometric variations such as shift, scale, and rotation.
The fourth type of layer is the fully-connected layer, which performs like a traditional neural network layer and
typically contains about 90% of the parameters in a CNN [153]. After a series of convolutional and pooling layers,
the 2D/3D feature maps are flattened into a 1D feature vector that no longer has spatial coordinates; one or more
fully-connected layers are then applied. Fully-connected layers connect all feature elements in the previous layer
to the output layer, which is helpful in learning non-linear relationships between the local features. Finally, the softmax
layer classifies subjects by selecting the highest predicted probabilities for each label. The softmax function
highlights the largest values in a vector while suppressing those that are significantly below the maximum.
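For a vector of class scores z, the softmax output for class i is exp(z_i) divided by the sum of exp(z_j) over all classes j. The minimal Keras sketch below follows the layer ordering described above (convolution with ReLU, pooling, flattening, fully-connected layer, softmax); the slice size, filter counts, and layer sizes are illustrative assumptions rather than an architecture from any reviewed study.

```python
# A minimal 2D CNN in the layer order described above; the 96x96 single-channel
# input slice and the filter/layer sizes are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(96, 96, 1)),                 # one grey-level image slice
    layers.Conv2D(16, (3, 3), activation='relu'),    # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                     # down-sampling
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                # 2D feature maps -> 1D vector
    layers.Dense(64, activation='relu'),             # fully-connected layer
    layers.Dense(2, activation='softmax'),           # class probabilities (e.g., AD vs. NC)
])
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```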

4.2.3.2 2D CNNs
CNNs are specifically designed to recognize patterns in two-dimensional images. Many studies have used
2D CNNs for 3D neuroimaging. Also, according to [85, 86], building a 3D CNN requires a larger number of
parameters than 2D CNNs. Section 3.2.2 explained how different studies have extracted 2D information from
3D images by splitting volumetric data into image slices. Here, different deep architectures using 2D CNNs are
discussed. The usual deep model here is a couple of convolutional layers paired with pooling layers and
followed by fully-connected layers and a softmax layer. For instance, 2D CNNs with two [170], three [107, 108,
130], or five [171] convolutional layers in a single-modality approach, or with four convolutional layers in a
multi-modal study [139], have been employed. Other examples are several 2D CNNs with two convolutional
layers in a couple of MRI image slices from the coronal view [52]; six convolutional layers on one sagittal MRI
slice including the hippocampus [127]; and one convolutional layer together with a polynomial-kernel SVM,
where the filters of the CNN were provided by an AE [143]. Luo and co-workers used seven 2D CNNs, each with
three convolutional layers, on seven groups of slices [119]; if one or more of the classifiers
classified a subject as AD, the subject was classified as AD. Aderghal et al. used another 2D
CNN architecture consisting of two convolutional layers on a few slices of the hippocampus [77]. This
architecture was further modified to fuse all different views with a majority vote [75, 76]. To achieve better
results it has been further developed as the core of a multi-modality approach to combining all views [78]. A 2D
CNN with two convolutional layers has been used by taking, as weak learners, the predicted response values
from multiple sparse regression models [54, 55]. Li and co-workers proposed the architecture of a spectral CNN
based on a common CNN made up of convolutional layers, subsampling layers, and fully-connected layers [51].
The input layer of this CNN was the spectral domain representation of a brain network, where a set of
eigenvalues represented the connections between regional pairs and the locations of nodes.

However, due to the absence of kernel sharing across the third dimension, a scheme of using 2D CNNs is
inefficient in encoding the spatial information from 3D images [83]. That is why Islam et al. designed three 2D
CNNs to obtain three different views of an MRI [129]. Each CNN consisted of four convolutional layers and
four dense blocks (each with 12 convolutional layers), and the final decision was made with majority voting. To
directly capture the spatial information in a 3D image, a novel structure combining a CNN and an RNN has
been proposed [85, 86]. In these studies, 2D CNNs were built to capture the intra-slice features (similar

structures in a single slice), while RNN was used to extract the inter-slice features (similar structures in adjacent
slices) for final classification. First, an FDG-PET scan was decomposed into several 2D image slices in the
coronal, sagittal, and axial directions; the decomposed image slices were further partitioned into several groups,
and for each group of slices, a deep 2D CNN with five convolutional layers was built and trained. After testing
the classification performance of each CNN, the one with the highest classification accuracy was selected for each
of the three planes. Finally, the CNNs and an RNN were combined in each direction to obtain three prediction
scores. The final classification was performed by weighted averaging of the three prediction scores obtained
from the three different views [85, 86].

Focusing on well-known 2D architectures and training from scratch, an adjusted LeNet and GoogLeNet have
been used to classify 2D GM slices [114] or fMRI slices [116, 117] or both [115]. Farooq and colleagues used a
2D CNN based on GoogLeNet and ResNet on 2D MRI images [110]. Kazemi et al. demonstrated that both
AlexNet and GoogLeNet performed well on 2D fMRI images for classifying different stages of AD [32].
However, GoogLeNet was reported to be more time-consuming, and so AlexNet was chosen as the classifier. It
is common practice to use proven pre-trained CNNs in the initialization stage for one domain-specific task and
then re-train them for new tasks by fine-tuning the CNNs. This is possible because the lower layers of CNNs
have more general features that can be applied to many tasks and are therefore able to be transferred from one
application domain to another, a process known as "transfer learning". Using transfer learning from the
ImageNet dataset, Wu and co-workers fine-tuned CaffeNet and GoogLeNet to predict the risk of conversion
from MCI to AD on 2D MRI images [122]. Similarly, several 2D CNNs, which have used image slices as input,
have been built based on pre-trained VGGNet-16 [121, 123], ResNet-18 [111], Inception-V4 [172], DenseNet-
121 [125], VGGNet-16 and Inception-V4 [120], and GoogLeNet and ResNet-152 [112]. Gao and collaborators
fine-tuned ResNet-18 and used it together with RNN [128]. Zheng et al. fine-tuned AlexNet for 2D images at
the centre of each ROI of PET scans [99], selecting an ensemble of 30% of the well-performed AlexNets as the
classifier using a voting strategy. Islam and colleagues applied transfer learning to an ensemble of three
DenseNet-styled models with different depths (121, 161, and 169), where the final decision was made by majority
voting [173]. In another study, Qiu et al. independently trained two MLP models on MMSE and LM test results,
and VGGNet-11 architecture was fine-tuned for three selected MRI slices [118]. The predictions from these
three models were further combined by using majority voting to make the final decision. In a different transfer
learning concept, Wegmayr and colleagues used a 2D deep model based on Inception-V3 as a static feature
extractor [113]. In the study, only one additional linear layer was trained on top of the concatenated features.
The model achieved the same accuracy as the 3D-CNN model trained from scratch; however, it trained much
quicker because the top-tuned layer had many fewer parameters.

4.2.3.3 3D CNNs
Because neuroimaging provides 3D images, and there is a spatial relationship among the image slices, 3D CNNs
are popular. Despite their complexity, 3D CNNs for AD detection must take the whole image, or some ROIs, as the input.
However, this may require training a large number of parameters on a small dataset, which may result in
overfitting [83]. In direct methods, 3D CNNs with twelve [174], five [175], and four [101, 176] convolutional
layers have been utilized. Another network with seven convolutional layers, where three different filter sizes
were chosen in its first convolutional layer to capture input features on different length scales, was used in
[177]. Li et al. used a combination of features from a 3D CNN with six convolutional layers and multi-scale 3D

convolutional AEs, with three hidden layers and a softmax layer for classification [84]. In a related approach,
3D CNNs were pre-trained with an AE using one [149] or three [154, 155] convolutional layers. Vu and
colleagues used two 3D CNNs, each with a convolutional layer pre-trained with a sparse AE, on two modalities,
combining them with a fully-connected layer [146]. Punjabi et al. used a 3D CNN with three convolutional
layers and two fully-connected layers for each modality, combining all the layers with another fully-connected
layer [148]. Other proposals have been for 3D CNNs with five [178] or seven [179] convolutional layers fused
with a DNN at the final fully connected layer. The input data used was clinical and genetic data for the DNN,
and MRI for the 3D CNN. Feng et al. used two independent 3D CNNs, each with six convolutional layers for
the MRI and PET images [56], and used a stacked bidirectional RNN instead of traditional fully-connected
layers to gain further advanced semantic information.
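As a minimal illustration of the patch- or voxel-level 3D CNNs discussed in this subsection, the Keras sketch below applies Conv3D layers to a small 3D patch; the patch size and filter counts are illustrative assumptions, not the configuration of any cited study.

```python
# A small 3D CNN on a 32x32x32 patch of a brain scan; sizes are illustrative.
from tensorflow.keras import layers, models

model_3d = models.Sequential([
    layers.Input(shape=(32, 32, 32, 1)),              # a 3D patch (or ROI crop)
    layers.Conv3D(8, (3, 3, 3), activation='relu'),   # 3D convolution keeps spatial context
    layers.MaxPooling3D((2, 2, 2)),
    layers.Conv3D(16, (3, 3, 3), activation='relu'),
    layers.MaxPooling3D((2, 2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(2, activation='softmax'),
])
model_3d.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```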

Karwath and colleagues employed a 3D CNN classifier with seven convolutional layers to extract ROIs from
PET voxel data [102]. In this approach, the contribution of each voxel was calculated relative to the accuracy of
the utilized 3D CNN, so that an inaccurate voxel could be excluded from input to the 3D CNN. In contrast to
many deep learning methods that target the classification, this method targets the ROIs. Chen et al. used a 3D
CNN with seven convolutional layers for each ROI, with the final decision made by majority voting [98].
Khvostikov et al. used the 3D CNN for left and right lobes of the hippocampus in each modality, and then
combined all CNNs with fully-connected layers [79]. In this work, slightly different configurations were
evaluated for AD vs. NC, and a configuration with six convolutional layers was selected. Two 3D CNNs were
designed with five convolutional layers for the left and right lobes of the hippocampus, and the final
classification was made by combining the prediction scores from both 3D CNNs [141]. Liu and colleagues used
3D CNN models with five [95] and six convolutional layers using a concatenation of features [94]; the approach
was to learn patch-based morphological features from each landmark and make a final classification using a
majority voting strategy on all CNNs for all landmarks. A similar configuration with six convolutional layers
was later proposed [103], in which the researchers concatenated features from all landmarks with fully-
connected layers, and incorporated personal information (e.g., age, gender, and education level) in another set of
fully-connected layers. This deep learning framework thereby embedded personal information and automatically
learnt MRI representations without requiring any expert knowledge of pre-defined features (similar to [94, 95]).
In another related paper, Cheng and colleagues utilized multiple 3D CNNs with four convolutional layers on
several local image patches [145]. The CNNs for the ensemble were selected according to the classification
accuracy of the validation data, and jointly fine-tuned in the last few layers to be more adapted to the global
classification task. Esmaeilzadeh et al. trained a 3D CNN with three convolutional layers on two classes (AD vs.
NC) [180] then fine-tuned the weights so as to classify the subjects into three categories. Choi and colleagues, in
a multi-modal study [181], used a 3D CNN with three convolutional layers for predicting MCI conversion. For
discrimination between MCIc and MCInc, the network was first trained by AD/NC data and then directly
transferred to classify MCIc from MCInc.

Focusing on well-known 3D architectures and training from scratch, Karasawa et al. used a 3D CNN based
on ResNet [182]. Similar ones based on VGGNet and ResNet have been proposed [183, 184]. Cheng et al. used
a 3D CNN structure inspired by LeNet with four convolutional layers for each image patch [82]. In this work,
each network was individually optimized on each patch, and the features extracted from all CNNs were stacked
to form 3D feature maps of the structure. Further, to learn the global features, a deep 3D CNN was constructed

at the highest level, followed by a fully-connected layer and a softmax layer for ensemble classification. Tang
and co-workers used a 3D CNN based on VGGNet, with an extra shortcut to merge low-level and high-level
feature information and alleviate gradient vanishing [126]. Vu and colleagues used a 3D CNN based on
VGGNet pre-trained by a high-level layer concatenation of AE [106]. To address the problem of limited training
data, Wang et al. introduced dense connections to 3D-CNN [185], with the dense connections improving
information content and the propagation of gradients throughout the network. Senanayake and colleagues used a
fusion pipeline in which information from multiple modalities was fused seamlessly through a 3D deep model
based on DenseNet [186]. In similar work, Li et al. constructed multiple 3D DenseNets with the same structure,
and then selected the most discriminative DenseNets having high classification accuracy from the validation sets
[147]. Khvostikov et al. applied a 3D deep model consisting of four sequential combinations of Inception blocks
on each ROI [80], finally concatenating all models to produce the classification result. Note that the extension
of 2D to 3D in these models creates considerable challenges, including an increased number of parameters and
considerable memory and computational requirements.

4.2.3.4 Cascaded 2D/3D CNNs


Cascaded CNNs have been built to learn features from MRI and PET brain scans [81, 83]. First, multiple
deep 3D CNNs with four convolutional layers are constructed on different local image patches to transform the
local brain image into more compact high-level features. Then a high-level 2D CNN with two convolutional
layers is cascaded to combine the high-level features and generate the latent multi-modal correlation features of
the corresponding image patches. Finally, these extracted features are combined by a fully-connected layer
followed by a softmax layer for AD classification.

4.2.4 Recurrent Neural Network


In time-series problems, such as video applications, RNNs include a "memory" to model temporal
dependency. In this scheme, past information is implicitly stored in hidden units called state vectors, and using
these state vectors, the output of the current sequential input is computed by considering the current input data
as well as all previous input data. RNNs are not as deep as DNNs or CNNs in terms of the number of layers, and
they may have problems in memorizing long-term input data [39]. They still require large datasets. Fortunately,
substituting the simple perceptron hidden units with more complex units such as LSTM (Long Short-Term
Memory) or GRU (Gated Recurrent Unit), which function as memory cells, helps considerably in overcoming
the memory problem. The LSTM contains three gate units and a memory cell unit, a more complicated
arrangement compared with a traditional RNN, but one that can effectively capture valuable information in a
sequence [88]. GRU is a simpler kind of LSTM with slightly better performance [85, 88]. Although 2D images
involve spatial information instead of sequential information, a 3D image can be treated as a sequence of 2D
images. Nowadays, RNNs are being increasingly applied to images [42].

A 2D CNN together with an RNN has been constructed and trained, in which the hierarchical 2D CNNs
captured the intra-slice features (similar structures in a single slice), while the GRU was used to extract the
inter-slice features (similar structures in adjacent slices) for final classification [85, 86]. Individual CNNs and
GRU combination models for axial, sagittal, and coronal planes have been trained to produce three prediction
scores. The final classification was performed by weighted averaging of the three prediction scores from three
different planes. Feng et al. designed an independent 3D CNN for each modality, and to obtain more detailed

information they used a stacked bidirectional RNN at the end instead of traditional fully-connected layers [56].
Cui and co-workers constructed and trained a model with two GRU layers to capture longitudinal changes from
time-series data [88]. To extract temporal features, two GRUs were stacked with a sequence of input feature
vectors, which were generated from the first layer of an MLP. In the study, the GRUs utilized intra-image
features and extracted longitudinal features. To facilitate disease diagnosis, Gao et al. designed an LSTM-based
architecture [128] to extract longitudinal features and capture pathological changes. Another longitudinal study
based on RNNs was done by Lee and colleagues [89].

4.3 Comparative analysis of different deep models


Unfortunately, most of the studies in our literature review did not submit their source code to any
software development hosting platform or online competition platform. Therefore, it is not easy to compare
the studies impartially. In addition, most studies that compared their results to
those of others did not actually implement the competitors' algorithms, but simply reported their final accuracies.
Even if a competing algorithm was re-implemented, there is no guarantee it would be identical to the original.
Nevertheless, comparative information collected from all the studies in our literature review is given in this
section, together with our perspective on the individual studies.

Before comparing different deep models, it is first necessary to discuss transfer learning. Training a deep
model from scratch was done in most of the studies; however, it is often inefficient to do so since the training
process is time-consuming and a dataset of satisfactory size (millions of images) is required [83, 123, 128, 167].
Neuroimaging datasets typically have only hundreds of images, a circumstance that gives rise to over-fitting.
Transfer learning is faster and achieves better results compared to training from scratch [111, 113, 120]. There
appears to be close competition among well-known 2D CNNs such as GoogLeNet and ResNet-152 [110, 112];
however, it seems that Inception-V4, ResNet, and CaffeNet have outperformed GoogLeNet, VGGNet-16, and
AlexNet [80, 120, 122, 172, 184]. One paper reports that DenseNet outperforms ResNet and LeNet [147]. A
recent study that implements several well-known 2D CNNs using transfer learning is available [187].

When comparing deep models, in one single-modality study [59] a combination of patch-based and ROI-
based methods gave higher accuracy than an ROI-based method in another multi-modality study [44] that
used stacked AEs. A combination of patch-based and ROI-based stacked AEs [93] outperformed a combination
of patch-based and voxel-based DBM [73]. Voxel-based 3D CNN + stacked 3D AEs [154] was reported to be
more accurate compared with ROI-based stacked AEs [44, 69] and RBMs [137]. ROI-based stacked DPNs [87,
133] outperformed ROI-based single DPNs, stacked AEs [69], and RBMs [137]. An ROI-based ensemble of 3D
CNNs [98] and AlexNets [99] in two single-modality studies outperformed several multi-modality studies (ROI-
based stacked AEs [43, 44, 69], RBMs [137], and DBNs [109]). Slice-based VGGNet-16 [123] outperformed
patch-based [144] and voxel-based [149] combination of AEs and CNNs. Slice-based Inception-V4 [120],
voxel-based 3D CNNs [106, 155, 175, 177, 182], voxel-based 3D CNN + stacked RNN [56], and patch-based
ensemble of 3D and 2D CNNs [83] were reported to be more accurate than AEs using other data management
methods [43, 44, 48, 69, 71, 144, 149, 154]. A combination of 3D and 2D CNNs [81] outperformed a 3D
CNN + stacked 3D AEs [154]. A 3D CNN outperformed a 2D CNN [126], and a voxel-based 3D CNN + stacked RNN
[56] outperformed a patch-based ensemble of 3D CNNs + 2D CNN [83]. Nevertheless, slice-based multi-view 2D
CNNs + RNN [86] were reported to have better performance than both 2D and 3D CNNs.

Regarding input data management methods, ROI-based and patch-based methods are more efficient than
others, but multi-view studies in slice-based methods have shown good performance too. Comparisons of these
methods are summarized in Section 3.3. According to our review, unsupervised deep models are best utilized to
extract features and feed them to a classifier. Among these models, a stacked AE can significantly improve the
representational power of highly nonlinear and complex patterns. Another benefit of AEs is that they can find
good initialization parameters for CNNs. However, with suitable initialization methods like Xavier [188] and
transfer learning, this advantage of AEs no longer holds. Supervised methods were more popular in our
literature review, as they merge feature extraction and classification into a single model. When a DPN or a
stacked DPN is utilized, supervised methods were reported [133] to perform better than AEs. DNNs are well
suited to vector-based problems, although their training process is not optimal compared to SVMs and their
learning process is slow [39].

Among the supervised methods, the main competition is between 3D CNNs and 2D CNNs (with or without
RNNs), which are optimized for image-based problems. The former can capture 3D information from the 3D
volume of a brain scan and has shown better performance compared with 2D CNNs [126]. However, the
complexity of training is an issue here, although it can be resolved using patch-based or ROI-based methods
instead of voxel-based ones. On the other hand, 2D CNNs are easier to train. Nevertheless, a scheme employing
2D CNNs is not efficient in encoding the spatial information of the 3D images due to the absence of kernel
sharing across the third dimension [83]. That is the reason some studies consider all three views of a brain scan
or use RNNs after 2D CNN to capture 3D information in adjacent image slices in a sequence of images. The
effect of depth in CNNs has been examined by Wang et al. [185], and the reported results show that shallow and very
deep networks do not necessarily give good results. The strengths and limitations of each deep model are given
in Table 3, with more details in [39, 153, 189-191]. The superiority of CNNs in terms of accuracy, sensitivity,
and specificity, both for highly-cited studies and for the whole set of primary studies, is reported in Section 7
and in Tables 4 to 9 of Appendix 1.

Table 3. A summary of each deep model utilized in AD detection.

Models | Strengths | Limitations
AE | Can represent highly nonlinear and complex patterns; good initialization for CNNs; good for dimension reduction; easy to implement | Learns to capture as much information as possible rather than as much relevant information
RBM | Can learn a very good generative model; able to create patterns if there are missing data | Computationally expensive in the training process
DNN | Good for vector-based problems; can handle datasets with a large number of samples; can detect complex nonlinear relationships | Slow training process and not optimal for images; generalization issues
DPN | Can effectively learn feature representations from small samples | Limited performance due to the simple concatenation of the learned hierarchical features from different layers
RNN | Good for sequential 2D images; good for longitudinal studies | Training issues due to vanishing/exploding gradients
2D CNN | Good performance in local feature extraction in images; easy to train | Cannot encode the spatial information of 3D images across the third dimension
3D CNN | Good performance in local feature extraction in images; can capture 3D information from the 3D volume of a brain scan | Computationally expensive in the training process

5 Datasets and software platforms


Although AD detection is a complex task, researchers do not have to work alone. Different online datasets
and software packages are available to assist. Brain image analysis packages such as FreeSurfer 9, FSL10,
MIPAV11, and SPM12 provide powerful tools for different automated pre-processing techniques [20, 192], which
were explained in Section 3.1. Also, software packages such as MATLAB13, Keras14, Tensorflow15, Theano16,
Caffe17, and Torch18 are employed to implement deep models [42, 96, 191, 193]. The popularity of each
software package in our literature review is shown in Figure 8. In addition, online datasets such as ADNI19 [194],
AIBL20 [195], OASIS21 [196], and MIRIAD22 [197] are very helpful too [20]. These datasets make publicly

9 See surfer.nmr.mgh.harvard.edu/
10 Functional magnetic resonance imaging of the brain Software Library, see fsl.fmrib.ox.ac.uk/
11 Medical Image Processing, Analysis, and Visualization, see mipav.cit.nih.gov
12 Statistical Parameter Mapping, see www.fil.ion.ucl.ac.uk/spm
13 See mathworks.com
14 See keras.io
15 See tensorflow.org
16 See deeplearning.net/software/theano
17 See caffe.berkeleyvision.org
18 See torch.ch
19 Alzheimer's Disease Neuroimaging Initiative, see adni.loni.usc.edu
20 Australian Imaging, Biomarkers and Lifestyle, see aibl.csiro.au
21 Open Access Series of Imaging Studies, see oasis-brains.org
22 Minimal Interval Resonance Imaging in Alzheimer's Disease, see ucl.ac.uk/drc/research/methods/minimal-interval-resonance-imaging-alzheimers-disease-miriad

available biomarkers such as neuroimaging modalities, genetic and blood information, and clinical and cognitive
assessments. Among them all, ADNI is notable for being a longitudinal, multicentre study. It is the most
common dataset in our literature review, being used in about 90% of studies by itself or in combination with
others. ADNI was launched in 2003 by NIA, NIBIB23, FDA24, private pharmaceutical companies, and non-profit
organizations as a $60 million, 5-year public/private partnership. It was a North American-based study that
aimed to recruit 800 adults (about 200 cognitively normal older individuals, 400 people with MCI, and 200
people with early AD) to participate in the research and be followed for 2–3 years. Acquisition of this data is
performed according to the ADNI protocol [194]. ADNI subjects aged from 55 to 90 have been recruited from
over 50 sites across the U.S. and Canada. The primary goal of ADNI is to test whether serial MRI, PET, genetic,
biospecimen, and clinical and neuropsychological assessments can be combined to measure the progression of
MCI and early AD.

OASIS is a project aimed at freely distributing brain MRI data, including two comprehensive datasets. The
cross-sectional dataset includes MRI data of 416 subjects (young, middle-aged, non-demented, and demented
older adults) aged 18 to 96. The longitudinal dataset includes MRI data of 150 subjects (non-demented and
demented older adults) aged 60 to 96.

Another dataset is AIBL, funded by CSIRO25, which includes clinical and cognitive assessments, MRI, PET,
biospecimen, and dietary/lifestyle assessments. The MIRIAD dataset is a database of MRI brain scans collected
from participants at intervals of 2 weeks to 2 years; the study is designed to investigate the feasibility of using
MRI for clinical trials of AD treatments. Finally, some studies prefer to employ their own local datasets. Further
details about additional datasets and the total number of subjects in each paper are shown in Table 3 of
Appendix 1. Additional details on datasets and software packages can be found in [20].

[Two pie charts; (a) pre-processing software: FreeSurfer, FSL, SPM, MIPAV; (b) deep learning software: Keras, Tensorflow, MATLAB, Theano, Caffe, Torch.]

Figure 8. The frequency with which pre-processing software (a) and deep learning software (b) appears in the
literature review.

6 Training considerations
After the above introduction to biomarkers, deep models, and datasets, some concerns about training issues
and parameters need to be addressed. Due to the complexity of deep models and neuroimaging modalities, many

23 National Institute of Biomedical Imaging and Bioengineering
24 Food and Drug Administration
25 Commonwealth Scientific and Industrial Research Organisation

different parameters are involved. In this section, the considerations for training the parameters are reviewed.
All the parameter values in this section come from the studies reviewed here and are summarized in Table 4.

The first important matter is optimization algorithms used for training purposes, such as Stochastic Gradient
Descent (SGD), Adam, Adadelta, RMSProp, and Adagrad. Although some studies have done comparisons [107,
185], the most commonly used one is SGD (with or without momentum), which has been employed in about
60% of studies.

The second issue involves employing well-designed parameter initialization for deep models. Random
initialization (especially close to zero with or without a specific distribution) and Xavier [188] initialization of
weights are the two most common methods, and the latter is frequently used to speed up the training of deeper
networks. However, an initialization method first introduced by [198] has been reported [170] to be better than
Xavier. Also, in transfer learning methods, weights coming from pre-trained networks (especially on ImageNet)
have recently attracted much attention. Each of these initialization methods was utilized more or less equally in
the primary studies of this review.

Another issue in the process of training is the rate of learning. The base learning rate is usually within the
range of 0.000001 to 0.5, with a typical value of 0.001. The momentum specifies the amount of the old weight
change, which is added to the current change, with a typical value of 0.9. Training parameters interact, and there
is no generally adopted set of values. However, the range and typical values utilized in the primary studies are
shown in Table 4, and they can be used as a starting point for future research on AD detection using
neuroimaging modalities. The mini-batch size refers to the number of training examples employed in a single
iteration, usually within the range of 4 to 256 with a typical value of 64. The mini-batch size can usually be
chosen almost arbitrarily, but small values add noise to the gradient and can make convergence harder, while
excessively large mini-batch sizes are limited by memory and can cause convergence to a suboptimal local
minimum. However, the accuracy of classification performance has been shown to improve by 4% as batch size
increases from 4 to 48 [113].
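A minimal sketch of these typical settings (SGD with momentum 0.9, a base learning rate of 0.001, and a mini-batch size of 64) is given below in Keras; the tiny model and random data are placeholders, not a real AD classifier.

```python
# Typical training settings reported in the reviewed studies; the model and
# data below are placeholders used only to make the snippet runnable.
import numpy as np
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(100,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(2, activation='softmax'),
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.001, momentum=0.9),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])

x = np.random.rand(256, 100).astype('float32')
y = np.random.randint(0, 2, 256)
model.fit(x, y, batch_size=64, epochs=5, validation_split=0.2, verbose=0)
```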

There are some special techniques to improve generalization and robustness, increase learning speed, and
reduce overfitting during the training of deep learning models. Batch normalization performs normalization for
each mini-batch and then back-propagates the gradients through the normalization parameters [113]. In about
50% of studies using 2D/3D CNNs, the use of batch normalization improved the speed, performance, and
stability of deep models. Drop-out is a well-known regularization technique where some of the nodes are
randomly dropped (forced to zero) to improve the generalization capability of a model. Neurons that are
dropped out do not contribute to the forward pass and the backpropagation steps. This prevents nodes from co-
adapting weights, forcing them to act independently and reducing overfitting while also alleviating memory and
computational issues [123, 149]. It is also more likely to discover local patterns and structures in the image and
can overcome the problem of insufficient samples [50, 137, 149]. The fraction of deactivated nodes varied from
20% to 80% from one study to another, with a typical value of 50%. This technique has been employed in most
of the studies in our review, consistently increasing performance [50, 137, 170, 180, 185]. However, it has been
reported [54, 55] that dropout is not useful in a batch-normalized network. Another regularization method, L1/L2
regularization, adds a penalty term as model complexity increases. This decreases the importance
given to higher terms and steers the model towards a less complex solution, improving performance. The key
difference between L1 and L2 regularization is in the penalty term: the former adds the absolute value of the
coefficients to the loss function, whereas the latter adds their squared magnitude. L1 and L2 are used
alone or together in about 25% of studies [170, 180].
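The sketch below combines these techniques (batch normalization, a 50% drop-out rate, and an L2 weight penalty) in one small Keras model; the layer sizes and the regularization coefficient are illustrative assumptions.

```python
# Batch normalization, drop-out and L2 regularization in a small CNN.
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Input(shape=(96, 96, 1)),
    layers.Conv2D(16, (3, 3), use_bias=False),
    layers.BatchNormalization(),                      # normalize each mini-batch
    layers.Activation('relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),   # squared-magnitude penalty
    layers.Dropout(0.5),                              # randomly deactivate 50% of the nodes
    layers.Dense(2, activation='softmax'),
])
```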

With any dataset, it is vital to know how many subjects are needed for training. According to the literature,
about 60–80% of scans are selected for the training set, and all the others for validation and testing. Apart from
that, training data should be randomly chosen to ensure that it has a similar distribution to the original dataset.
After training and to evaluate the success of the training procedure, a validation technique like cross-validation
is necessary. Cross-validation is a statistical method for evaluating classifiers. The idea behind cross-validation
is to use a fraction of the dataset to train the classifier, and then use the remainder as a new and unseen set to test
the performance of the classifier. The most well-known method for cross-validation is k-fold. In k-fold, samples
are divided into k folds. Subsequently, k iterations of training and validation are performed, so that each fold is
used once and only once for validation. In our literature review, k was within the range of 5 to 20, with a typical
value of 10 used in about 65% of studies. However, there are two drawbacks in k-fold cross-validation. The first
is that the training and testing of the classifier have to be repeated k times, which increases computation time
and cost, especially in deep models. Another drawback relates to the number of subjects in the dataset: large
values of k result in a limited number of subjects in the validation dataset and eventually cause unreliable
results. For this reason, some studies prefer to use the hold-out method (as reported in about 10% of studies) or
use smaller values of k.
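A minimal sketch of 10-fold cross-validation with scikit-learn is given below; the classifier and the random feature vectors are placeholders for whatever feature extractor and model a study actually uses.

```python
# 10-fold cross-validation (the typical choice in the reviewed studies).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

X = np.random.rand(200, 100)          # placeholder feature vectors
y = np.random.randint(0, 2, 200)      # placeholder AD/NC labels

accuracies = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    clf = LinearSVC().fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print('mean accuracy over 10 folds:', np.mean(accuracies))
```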

Table 4. A summary of the parameters and techniques described in Section 6.

Method/Parameter | Values/methods | Typical value/method
Optimization | SGD, Adam, Adadelta, RMSProp, Adagrad | SGD (with or without momentum)
Initialization | Random, Xavier, transfer learning | -
Base learning rate | 0.000001 to 0.5 | 0.001
Momentum | 0.9 | 0.9
Mini-batch size | 4 to 256 | 64
Drop-out factor | 20% to 80% | 50%
Validation | Hold-out, k-fold cross-validation with k = 5 to 20 | k-fold cross-validation with k = 10

7 Highlights
This paper systematically reviews strategies for improving AD detection based on deep learning and
neuroimaging modalities (with or without other biomarkers). In this section, the highlights that have emerged
from this systematic review are listed.

First, regarding the classification task, distinguishing early MCI patients from NCs and predicting MCI
conversion to AD are more valuable than other classification tasks. Some related findings, such as the success of deep
models compared with traditional machine learning methods, were not reported since many researchers have
already discussed them. Similarly, longitudinal studies are known to be more sensitive to early disease-related
changes in the brain, giving more accurate diagnosis [88].

A key finding is the crucial role of pre-processing of brain scans. The performance of an AD detection
system depends in large measure on the quality of the neuroimaging. At the very least, intensity normalization
and registration need to be done. Other key factors are set out below.

• ROI-based and patch-based methods are more efficient.

ROI-based features have low feature dimensions and can be easily interpreted, whereas patch-based methods
are sensitive to small abnormal changes in the brain. Both techniques are more efficient compared with slice-
based and voxel-based ones. However, multi-view studies in slice-based methods and voxel pre-selection
techniques in voxel-based methods deliver comparable performance.

• Multi-modality studies outperform single-modality ones.

Neuroimaging modalities such as MRI, PET, fMRI, and DTI are fundamental to AD detection. Other factors
such as age, gender, educational level, memory test score, and genetic information are also helpful. Although the
most discriminative neuroimaging modality is still controversial, combining them is likely to be most effective,
since a combination reflects different aspects of AD; this is especially helpful for early AD detection and for predicting
conversion from the prodromal stages of the disease. Considering the results of single-modality and multi-
modality studies, there is a trade-off between higher accuracy and the financial cost of acquiring additional
biomarkers. Generally speaking, multi-modality studies achieve better results compared with single-modality
studies [56, 67, 69-71, 73, 78, 81, 87, 93, 133, 135, 139, 146, 148], which is expected due to the complexity and
heterogeneity of AD.

• Data augmentation.

The size of the training dataset is known to have a significant effect on the performance of a classifier on an
unseen test set [185]. The numbers of AD and MCI subjects can be very limited in each dataset, which is
inadequate for testing deep models. The situation is worse for multi-modality studies. Therefore, some studies
have combined datasets. Although combining different datasets will result in more heterogeneity, it does lead to
the creation of a large and robust model for classification and prediction. Another way to solve the limited
number of subjects in a dataset is to use data augmentation. Data augmentation is a strategy that increases the
diversity of data available for training models, without actually collecting new data. Data augmentation
techniques such as reflection, random translation, rotation, noise injection, Gamma correction, blurring,
cropping, and scaling have been used, when required, in about 20% of studies to improve classification
performance [111, 180]. In addition, longitudinal datasets provide several brain scans per subject at different
time points, and while their original purpose was to investigate disease progression, they can also be used in a
time-independent way for data augmentation [60, 93, 110, 112, 172]. However, adding additional images of the
same subjects does not necessarily increase performance compared to increasing the number of subjects [113].
Scans from the same subject should not be used in both the training and test sets. Ignoring this factor leads to an
"information leak" and overfitting to the individual patient instead of learning the general disease pattern, and
this causes over-optimistic test results [107, 113, 148, 175]. While some studies explicitly avoided this problem by
using only one image per subject or by using a correct train/test split [107, 148, 177], others did not. We conclude
that the training and test sets must be augmented independently [129], even though some studies prefer not to
use any augmentation at all [77, 78, 83].
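The sketch below illustrates a subject-level split followed by augmentation of the training images only, so that scans of one subject never appear in both sets; the subject grouping, image shapes, and transforms are illustrative assumptions.

```python
# Subject-level train/test split, then augmentation of the training set only.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

scans = np.random.rand(300, 96, 96)                  # placeholder 2D slices
labels = np.random.randint(0, 2, 300)
subject_ids = np.repeat(np.arange(100), 3)           # three scans per subject (assumed)

# Split by subject, not by scan, to avoid an "information leak".
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(scans, labels, groups=subject_ids))

def augment(image):
    """Simple augmentations of the kind reported above: reflection and noise."""
    flipped = image[:, ::-1]
    noisy = image + np.random.normal(0.0, 0.01, image.shape)
    return np.stack([image, flipped, noisy])

x_train = np.concatenate([augment(scans[i]) for i in train_idx])
y_train = np.repeat(labels[train_idx], 3)
x_test, y_test = scans[test_idx], labels[test_idx]   # the test set is left unaugmented
```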

• A balanced dataset is recommended.

Another issue is class imbalance (too few subjects in one class compared with others), which can be handled
by either a data augmentation method or a reduction in the number of original scans from the over-sampled class

[77, 127]. Results from both balanced and unbalanced datasets suggest that accuracy changes slightly with the
distribution of data in each class. Balancing the dataset can improve performance even if it makes the dataset
smaller [112, 170].
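Two simple ways of handling such an imbalance are sketched below: undersampling the over-represented class, and weighting the loss per class; the class counts are illustrative assumptions.

```python
# (a) undersample the over-represented class; (b) weight the loss per class.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 300 + [1] * 100)                  # e.g., 300 NC scans vs. 100 AD scans

# (a) keep only as many NC scans as there are AD scans
nc_keep = np.random.choice(np.where(y == 0)[0], size=100, replace=False)
balanced_idx = np.concatenate([nc_keep, np.where(y == 1)[0]])

# (b) compute class weights for an imbalance-aware loss (e.g., model.fit(..., class_weight=...))
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
class_weight = dict(enumerate(weights))
```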

• The success of CNNs.

Table 5 shows the diversity of methods used in those papers with the highest average number of citations
per year (from the year of publication to 2019). Here, the results are for the AD vs. NC classification. All the
deep models mentioned are still in use for AD detection. However, the main competition seems to be between
3D CNNs and 2D CNNs (with or without RNNs). To encode the spatial information of 3D images, patch-based
or ROI-based 3D CNNs are competing with multi-view slice-based 2D CNNs combined with RNNs [86, 126].

Table 5. A list of papers with the highest average number of citations per year.

Ref. | Data* | Deep model | Dataset | ACC | SEN | SPE
Shi et al. [87] | MRI (GM), PET /R | Multi-modal stacked DPN and a linear kernel SVM | ADNI, 202 subjects | 97.13 | 95.93 | 98.53
Suk et al. [73] | MRI (GM), PET /V+P | Multi-modal DBM with an SVM | ADNI, 398 subjects | 95.35 | 94.65 | 95.22
Suk et al. [69, 71] | MRI (GM), PET, CSF /R | Stacked AEs with a multi-kernel SVM | ADNI, 202 subjects | 98.8 | - | -
Payan & Montana [149] | MRI /V | Sparse AEs and 3D CNN | ADNI, 2265 scans | 95.39 | - | -
Liu et al. [43, 44] | MRI (GM), PET /R | Stacked sparse AEs and a softmax layer | ADNI, 311 subjects | 91.40 | 92.32 | 90.42
Liu et al. [94] | MRI /P | A 3D CNN model for each landmark with concatenation at final stages | ADNI + MIRIAD, 1526 scans | 91.09 | 88.05 | 93.5
Ortiz et al. [109] | MRI (GM), PET /R | A set of DBNs for all ROIs and an SVM | ADNI, 275 subjects | 90 | 86 | 94
Wang et al. [127] | MRI /S | A 2D CNN | OASIS + local data, 196 subjects | 97.65 | 97.96 | 97.35
Suk et al. [55] | MRI (GM) /R | A combination of sparse regression models and a 2D CNN | ADNI, 805 subjects | 91.02 | 92.72 | 89.94
Li et al. [137] | MRI, PET, CSF /R | PCA features, stacked RBMs and a linear kernel SVM | ADNI, 202 subjects | 91.4 | - | -
Hosseini-Asl et al. [154, 155] | MRI /V | A 3D CNN pre-trained with stacked 3D convolutional AEs | CADDementia + ADNI, 240 subjects | 99.3 | 100 | 98.6
Lu et al. [93] | MRI (GM), PET /P+R | Multimodal and multiscale DNNs consisting of 7 DNNs (each DNN: a stacked AE and a softmax layer) | ADNI, 1242 subjects | 84.6 | 80.2 | 91.8
Korolev et al. [183] | MRI /V | 3D CNN based on ResNet and VGGNet | ADNI, 231 subjects | 88 | - | -
Choi & Jin [181] | FDG-PET, AV-45 PET /V | A multi-modal 3D CNN | ADNI, 492 subjects | 96 | 93.5 | 97.8
Sarraf & Tofighi [115, 116] | fMRI /S | GoogLeNet and LeNet-5 | ADNI, 144 subjects | 100 | - | -
Gupta et al. [144] | MRI /P | Sparse AE followed by a CNN and a neural network | ADNI, 755 scans | 94.74 | 95.24 | 94.26
* S: Slice-based; R: ROI-based; V: Voxel-based; P: Patch-based
** ACC: Accuracy; SEN: Sensitivity; SPE: Specificity

• Transfer learning gives excellent results.

Although training a deep neural network from scratch is done in many studies, it is often not feasible to do
so: the training process can take too long, or the dataset is too small [83, 123, 128, 167]. While datasets for
general object detection and classification have millions of images, neuroimaging datasets typically have only
hundreds of images, which leads to over-fitting during training. Generally, it is helpful to use proven, pre-
trained CNNs on one dataset for initialization and then re-train them on another dataset using only fine-tuning of
the CNNs (transfer learning). This is possible since the lower CNN layers include more general features that can
benefit many classification tasks and can be transferred from one application domain to another. Transfer
learning is faster and achieves better performance compared to training from scratch, even with distant tasks
[111, 113, 120]. The first transfer learning approach for AD detection using deep learning was set out in [144],
which involved, after feature extraction with a sparse AE, a 2D CNN with one convolutional layer and a max-
pooling layer, and finally, a neural network with a single hidden layer. It was shown that using natural images to
train the AE enhanced classification performance in the following layers. A 3D CNN with three convolutional
layers pre-trained with an AE on one MRI dataset and re-trained with another dataset was proposed in [154,
155], where the generality of the features with the AE pre-trained on the CADDementia dataset was enhanced.
A 3D CNN with three convolutional layers was used for MRI images [180]. First, the model was trained on two
classes (AD vs. NC), and then a third class (MCI) was added and the weights were fine-tuned to classify the
input into three categories. This fine-tuning strategy actually involved transfer learning from the domain of the
two-class learned model to a three-class case, which was said to improve performance. Transfer learning was
also done in [78], where three 2D CNNs with two convolutional layers (one for each view) were trained on MRI
images. With a limited amount of DTI images and instead of training from scratch, this work used transfer
learning of models that had been trained on the MRI dataset to the target DTI dataset. Finally, the combination
of all the networks allowed the final decision to be made using a majority voting strategy. In another example, a
CNN model trained on AD vs. NC was used to initialize the parameters of a 3D CNN model of MCIc vs. NC
classification, decreasing training time and improving classification performance [83]. Similarly, a CNN model
trained on MCIc vs. NC was also used to initialize the parameters of a 3D CNN model for MCInc vs. NC
classification. A simpler approach was used in [92, 130], where a CNN was initially trained for AD vs. NC
classification and then used for MCI conversion prediction. To summarize, although the usefulness and success
of transfer learning depend on a similarity between datasets, using models pre-trained on ImageNet for transfer
learning significantly increases accuracy compared to training from scratch [111, 113, 120]. Even so, there are
still points of agreement and disagreement about applying well-known CNN models, which had previously
been shown to perform well on ImageNet, to AD detection (see the last paragraph of Section 4.2.3.2 and [32, 80, 110,
112, 120, 122, 127, 129, 147, 172, 183, 184, 187]).
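To illustrate the fine-tuning procedure described above, a minimal sketch in PyTorch is given below. It assumes the brain scans have already been converted to three-channel 2D slice images and wrapped in a data loader (here called train_loader); the choice of ResNet-18, the decision to freeze all convolutional layers, and the learning rate are illustrative assumptions rather than settings taken from any of the reviewed studies.

```python
# Illustrative transfer-learning sketch: fine-tune an ImageNet-pretrained
# ResNet-18 for binary AD vs. NC classification from 2D slice images.
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = models.resnet18(pretrained=True)       # weights learned on ImageNet
for param in model.parameters():               # freeze the generic lower layers
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # new task-specific head (AD vs. NC)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)

def fine_tune(train_loader, epochs=10):
    """Train only the replaced classification head on the target dataset."""
    model.train()
    for _ in range(epochs):
        for slices, labels in train_loader:
            slices, labels = slices.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(slices), labels)
            loss.backward()
            optimizer.step()
```

A common variant is to unfreeze the deeper convolutional blocks with a smaller learning rate once the new head has converged, which trades longer training time for closer adaptation to the neuroimaging domain.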

8 Future challenges
Although deep learning methods have shown noteworthy results, there are still issues to be resolved before
AD detection can be deployed in clinical settings. These issues are mostly about data handling, fusing
information from different biomarkers, and datasets. The key challenges are highlighted as follows.

 More investigations are needed in patch-based and ROI-based studies.
Recognizing ROIs requires expert knowledge, which is still incomplete. Finding discriminative patches is also
an open issue.

 Finding the optimal combination of different biomarkers is essential.

One of the most critical issues in multi-modality studies is how to fuse information from all modalities.
The easiest way is feature concatenation, where the features extracted from all inputs are concatenated and
classified. However, direct concatenation does not consider that the modalities express similar disease patterns in
the same regions of the brain, and may therefore result in an inaccurate detection model. In addition, including
other factors such as genetic information is also a challenge.
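As a point of reference, the following sketch shows the feature-concatenation baseline described above; the feature dimensions, layer sizes, and the restriction to two modalities are arbitrary placeholders, not values taken from any reviewed study.

```python
import torch
import torch.nn as nn

class ConcatFusionClassifier(nn.Module):
    """Baseline multi-modality fusion: concatenate per-modality feature
    vectors and classify them jointly (dimensions are illustrative only)."""
    def __init__(self, mri_dim=256, pet_dim=256, n_classes=2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(mri_dim + pet_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, mri_feat, pet_feat):
        fused = torch.cat([mri_feat, pet_feat], dim=1)  # simple concatenation
        return self.classifier(fused)

# Example: 8 subjects, each with a 256-dim MRI and a 256-dim PET feature vector.
logits = ConcatFusionClassifier()(torch.randn(8, 256), torch.randn(8, 256))
```

Because the modalities are joined only as flat feature vectors, any spatial correspondence between them is lost, which is exactly the limitation noted above.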

 Incomplete datasets in multi-modality studies must be resolved.

Another challenge in multi-modality studies is that the data is usually incomplete, and some modalities
might be missing for some subjects. This means that if a single deep model is trained on all modalities, only
those subjects with complete multi-modality data (often only around 70% of the subjects in a study) can be used,
which limits the scope of a model. To overcome this issue, a three-stage deep feature extraction and fusion
framework for MRI, PET, and genetic data has been proposed [67, 68]. It begins with feature extraction from
each modality, then joins the extracted features, and finally does the classification. In this way, all subjects can
be used to train three individual deep learning models for three modalities. Furthermore, it is possible to have
different numbers of hidden layers as well as different numbers of hidden neurons in each layer, making it
possible to learn the latent representations in each modality and modality combination. A similar approach was
reported in [135], in which the complete data were grouped into subsets using different modality combinations. As
another possible solution, an image generation task can be formulated with an encoder/decoder deep neural
network [101, 199]. These works model the general relationship between MRI and PET to predict missing PET
scans from available MRI scans; in the adversarial approach [101], the predicted PET scan is concatenated with the
MRI scan and used as an input pair for the discriminator network. To summarize, although studies on combining different
biomarkers might show promising results, a comprehensive dataset including all these factors is currently not
available. Put briefly, multi-modal studies suffer from a lack of generalizability. Whenever possible, the ability
to consider all features and modalities is beneficial. More details on multi-modality studies, configurations, and
challenges are set out in [200, 201].
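The staged scheme can be outlined roughly as follows. This is only a loose sketch of the general idea, not a reproduction of the architecture in [67, 68]; the encoder sizes, three-class output, and input dimensions are placeholder assumptions.

```python
import torch
import torch.nn as nn

def make_encoder(in_dim, out_dim=64):
    """Stage 1: a small modality-specific feature extractor (sizes illustrative)."""
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

mri_enc, pet_enc, gen_enc = make_encoder(256), make_encoder(256), make_encoder(50)

# Stage 1: each encoder can be trained on every subject that has that
# particular modality, so subjects with incomplete data are not discarded.

# Stages 2-3: join the per-modality features and classify; this part is
# trained on the subjects for which the required modalities are available.
fusion_classifier = nn.Sequential(nn.Linear(64 * 3, 64), nn.ReLU(), nn.Linear(64, 3))

def predict(mri, pet, genetics):
    fused = torch.cat([mri_enc(mri), pet_enc(pet), gen_enc(genetics)], dim=1)
    return fusion_classifier(fused)

logits = predict(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 50))
```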

 Data generation needs more investigations.

Despite all the efforts to avoid overfitting, such as employing transfer learning and data augmentation, a lack
of sufficient data samples causes major generalization issues. To tackle such problems, generative models can be
employed, where data generation means generating new images from already existing images to expand the
dataset. It has already been mentioned, for example, that the relationship between MRI and PET can be modeled
so as to predict missing PET scans from available MRI scans [101, 199]. However, this area of research needs
more work, and the effectiveness of data generation on medical images is still unknown.
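A minimal sketch of such an encoder/decoder is given below; the 3D layer counts, channel sizes, and input resolution are arbitrary assumptions, and the adversarial (discriminator) part of [101] is omitted for brevity.

```python
import torch
import torch.nn as nn

class MRItoPET(nn.Module):
    """Toy encoder/decoder that maps an MRI volume to a synthetic PET volume
    (channel and layer sizes are illustrative placeholders)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(16, 8, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, mri):
        return self.decoder(self.encoder(mri))

# Train with a voxel-wise reconstruction loss on subjects that have both scans,
# then use the model to impute PET volumes for MRI-only subjects.
model = MRItoPET()
fake_pet = model(torch.randn(2, 1, 64, 64, 64))  # batch of 2 MRI volumes
loss = nn.functional.mse_loss(fake_pet, torch.randn(2, 1, 64, 64, 64))
```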

 Clear explanations of deep models are needed.

Table 3 compares the strengths and limitations of each deep model for classification tasks. As shown in
Figure 7, CNNs are the most widely utilized deep structures. The success of CNNs is clearly shown in Table 5,
but there is no clear way to select and design a CNN model for AD detection. This means that the number of
convolutional and fully-connected layers, and the way these layers are combined, must be chosen arbitrarily or based on
prior experience. At the moment, many CNN models are being used, but researchers have not explained their
selection methodology.

 A benchmarking platform should be provided.

The choice of dataset is important and can affect the results of the classifier. Since studies use different
datasets, different numbers of subjects, and even different subject selections, comparing various methods is
often not possible. Even for studies on the same dataset with the same number of subjects and the same subject
selection, the results may still not be comparable because different fractions of subjects may be used for the
training and test sets. Further details about the results of each paper can be seen in Tables 4 to 9 in
Appendix 1. In these tables, the results reported by each paper are listed for the different classification targets.
The average accuracy is about 92%, 83%, 80%, 79%, and 76% for NC vs. AD, MCI vs. AD, NC vs. MCI,
Multi-class (NC, MCI, AD), and MCIc vs. MCInc, respectively, in line with the findings reported by Wang et al.
[185].
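One building block of such a benchmark would be a fixed, subject-level split, so that scans from the same individual never appear in both the training and test sets. The sketch below uses scikit-learn's GroupShuffleSplit for this purpose; the 80/20 split and the random seed are arbitrary examples rather than a proposed standard.

```python
# Illustrative subject-level split: scans are grouped by subject ID so that a
# subject's images never end up in both the training and test sets.
from sklearn.model_selection import GroupShuffleSplit

def subject_level_split(scan_paths, labels, subject_ids, test_size=0.2, seed=42):
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(scan_paths, labels, groups=subject_ids))
    return train_idx, test_idx

# Example: 6 scans from 3 subjects (two scans each).
train_idx, test_idx = subject_level_split(
    scan_paths=["s1_a", "s1_b", "s2_a", "s2_b", "s3_a", "s3_b"],
    labels=[1, 1, 0, 0, 1, 1],
    subject_ids=["s1", "s1", "s2", "s2", "s3", "s3"],
)
```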

9 Conclusion
AD is one of the leading causes of death, especially in developed countries. Since the early detection of AD
is a challenging task in clinical practice, the use of computer-based systems alongside medical experts has much to
recommend it. For this task, deep learning has attracted strong attention in recent years. In this
paper, we have set out how deep learning has enabled the development of AD detection systems. We started this
paper with the definition of AD and its symptoms, followed by an explanation of the current criteria for
diagnosis and of related biomarkers such as MRI, PET, and fMRI. It is clear that combining these neuroimaging
modalities can aid AD detection and can be used with other factors like memory test scores and genetic
information to deliver a more accurate diagnosis.

In terms of pre-processing, intensity normalization and registration to a standard anatomical space are
recommended. For image handling, ROI-based and patch-based methods have been reported to be more
efficient compared with slice-based and voxel-based ones due to their ability to include only AD-related features
in a brain scan. Many deep models have been discussed in this paper. In terms of classification method, CNNs
have been used most frequently, with better reported accuracies in this area compared to other deep models. As
the final target for an AD detection system, an automatic multi-modal longitudinal approach is preferred.
However, regardless of the final AD detection system, overfitting issues related to the dataset still need to be
resolved.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that
could have appeared to influence the work reported in this paper.
References
[1] S. Klöppel et al., "Accuracy of dementia diagnosis—A direct comparison between radiologists and a
computerized method," Brain, vol. 131, no. 11, pp. 2969-2974, 2008.
[2] Alzheimer's Association, "Alzheimer's disease facts and figures: Includes a special report on the
financial and personal benefits of early diagnosis," 2018.
[3] F. Falahati, E. Westman, and A. Simmons, "Multivariate data analysis and machine learning in
Alzheimer's disease with a focus on structural magnetic resonance imaging," Journal of Alzheimer's
Disease, vol. 41, no. 3, pp. 685-708, 2014.
[4] R. C. Petersen, G. E. Smith, S. C. Waring, R. J. Ivnik, E. G. Tangalos, and E. Kokmen, "Mild cognitive
impairment: Clinical characterization and outcome," Archives of Neurology, vol. 56, no. 3, pp. 303-
308, 1999.
[5] B. Dubois and M. L. Albert, "Amnestic MCI or prodromal Alzheimer's disease?," The Lancet
Neurology, vol. 3, no. 4, pp. 246-248, 2004.
[6] J. P. Lerch et al., "Automated cortical thickness measurements from MRI can accurately separate
Alzheimer's patients from normal elderly controls," Neurobiology of Aging, vol. 29, no. 1, pp. 23-30,
2008.
[7] E. Gerardin et al., "Multidimensional classification of hippocampal shape features discriminates
Alzheimer's disease and mild cognitive impairment from normal aging," Neuroimage, vol. 47, no. 4,
pp. 1476-1486, 2009.
[8] S. Klöppel et al., "Automatic classification of MR scans in Alzheimer's disease," Brain, vol. 131, no. 3,
pp. 681-689, 2008.
[9] G. McKhann, D. Drachman, M. Folstein, R. Katzman, D. Price, and E. M. Stadlan, "Clinical diagnosis
of Alzheimer's disease report of the NINCDS-ADRDA work group under the auspices of department
of health and human services task force on Alzheimer's disease," Neurology, vol. 34, no. 7, pp. 939-
939, 1984.
[10] B. Dubois et al., "Research criteria for the diagnosis of Alzheimer's disease: Revising the NINCDS–
ADRDA criteria," The Lancet Neurology, vol. 6, no. 8, pp. 734-746, 2007.
[11] R. C. Petersen, "Mild cognitive impairment as a diagnostic entity," Journal of Internal Medicine, vol.
256, no. 3, pp. 183-194, 2004.
[12] C. R. Jack Jr et al., "Introduction to the recommendations from the National Institute on Aging-
Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease," Alzheimer's &
Dementia, vol. 7, no. 3, pp. 257-262, 2011.
[13] G. M. McKhann et al., "The diagnosis of dementia due to Alzheimer‘s disease: Recommendations
from the National Institute on Aging-Alzheimer‘s Association workgroups on diagnostic guidelines for
Alzheimer's disease," Alzheimer's & Dementia, vol. 7, no. 3, pp. 263-269, 2011.
[14] M. S. Albert et al., "The diagnosis of mild cognitive impairment due to Alzheimer‘s disease:
Recommendations from the National Institute on Aging-Alzheimer‘s Association workgroups on
diagnostic guidelines for Alzheimer's disease," Alzheimer's & Dementia, vol. 7, no. 3, pp. 270-279,
2011.
[15] W. E. Klunk et al., "Imaging brain amyloid in Alzheimer's disease with Pittsburgh Compound-B,"
Annals of Neurology: Official Journal of the American Neurological Association and the Child
Neurology Society, vol. 55, no. 3, pp. 306-319, 2004.
[16] C. R. Jack Jr et al., "Brain beta-amyloid measures and magnetic resonance imaging atrophy both
predict time-to-progression from mild cognitive impairment to Alzheimer‘s disease," Brain, vol. 133,
no. 11, pp. 3336-3348, 2010.
[17] M. C. Carrillo et al., "Revisiting the framework of the National Institute on Aging-Alzheimer's
Association diagnostic criteria," Alzheimer's & Dementia, vol. 9, no. 5, pp. 594-601, 2013.
[18] M. F. Folstein, S. E. Folstein, and P. R. McHugh, "'Mini-mental state': A practical method for grading
the cognitive state of patients for the clinician," Journal of Psychiatric Research, vol. 12, no. 3, pp.
189-198, 1975.
[19] J. C. Morris, "The Clinical Dementia Rating (CDR): Current version and scoring rules," Neurology,
vol. 43, no. 11, pp. 2412-2414, 1993.
[20] S. Leandrou, S. Petroudi, P. A. Kyriacou, C. C. Reyes-Aldasoro, and C. S. Pattichis, "Quantitative MRI
brain studies in mild cognitive impairment and Alzheimer's disease: A methodological review," IEEE
Reviews in Biomedical Engineering, vol. 11, pp. 97-111, 2018.
[21] J. Mattila et al., "Optimizing the diagnosis of early Alzheimer's disease in mild cognitive impairment
subjects," Journal of Alzheimer's Disease, vol. 32, no. 4, pp. 969-979, 2012.
[22] A. Lim et al., "Clinico-neuropathological correlation of Alzheimer's disease in a community-based
case series," Journal of the American Geriatrics Society, vol. 47, no. 5, pp. 564-569, 1999.
[23] H. Petrovitch et al., "Accuracy of clinical criteria for AD in the Honolulu–Asia Aging Study, a
population-based study," Neurology, vol. 57, no. 2, pp. 226-234, 2001.
[24] A. Kazee, T. Eskin, L. Lapham, K. Gabriel, K. McDaniel, and R. Hamill, "Clinicopathologic correlates
in Alzheimer disease: assessment of clinical and pathologic diagnostic criteria," Alzheimer Disease and
Associated Disorders, 1993.
[25] E. E. Bron et al., "Standardized evaluation of algorithms for computer-aided diagnosis of dementia
based on structural MRI: The CADDementia challenge," NeuroImage, vol. 111, pp. 562-579, 2015.
[26] M. Prince, R. Bryce, E. Albanese, A. Wimo, W. Ribeiro, and C. P. Ferri, "The global prevalence of
dementia: A systematic review and metaanalysis," Alzheimer's & Dementia, vol. 9, no. 1, pp. 63-75,
2013.
[27] Australian Bureau of Statistics, "Causes of death, Australia, 2015," 2016.
[28] M. D. Hurd, P. Martorell, A. Delavande, K. J. Mullen, and K. M. Langa, "Monetary costs of dementia
in the United States," New England Journal of Medicine, vol. 368, no. 14, pp. 1326-1334, 2013.
[29] F. Mangialasche, A. Solomon, B. Winblad, P. Mecocci, and M. Kivipelto, "Alzheimer's disease:
Clinical trials and drug development," The Lancet Neurology, vol. 9, no. 7, pp. 702-716, 2010.
[30] M. Prince, R. Bryce, and C. Ferri, "World Alzheimer report 2011: The benefits of early diagnosis and
intervention," Alzheimer's Disease International, 2011.
[31] S. Paquerault, "Battle against Alzheimer's disease: The scope and potential value of magnetic
resonance imaging biomarkers," Academic Radiology, vol. 19, no. 5, pp. 509-511, 2012.
[32] Y. Kazemi and S. K. Houghten, "A deep learning pipeline to classify different stages of Alzheimer's
disease from fMRI data," in Proceedings of the IEEE Conference on Computational Intelligence in
Bioinformatics and Computational Biology (CIBCB), 2018, pp. 1-8.
[33] A. Khan and M. Usman, "Early diagnosis of Alzheimer's disease using machine learning techniques: A
review paper," in Proceedings of the 7th International Joint Conference on Knowledge Discovery,
Knowledge Engineering and Knowledge Management (IC3K), 2015, vol. 1, pp. 380-387.
[34] C. Zheng, Y. Xia, Y. Pan, and J. Chen, "Automated identification of dementia using medical imaging:
A survey from a pattern classification perspective," Brain Informatics, vol. 3, no. 1, pp. 17-27, 2016.
[35] R. Cuingnet et al., "Automatic classification of patients with Alzheimer's disease from structural MRI:
A comparison of ten methods using the ADNI database," Neuroimage, vol. 56, no. 2, pp. 766-781,
2011.
[36] R. V. Marinescu et al., "TADPOLE Challenge: Prediction of longitudinal evolution in Alzheimer's
disease," arXiv preprint arXiv:1805.03909, 2018.
[37] G. I. Allen et al., "Crowdsourced estimation of cognitive decline and resilience in Alzheimer's disease,"
Alzheimer's & Dementia, vol. 12, no. 6, pp. 645-653, 2016.
[38] A. Sarica, A. Cerasa, A. Quattrone, and V. Calhoun, "A machine learning neuroimaging challenge for
automated diagnosis of mild cognitive impairment," Neuroscience Methods, vol. 302, pp. 10-13, 2016.
[39] M. I. Razzak, S. Naz, and A. Zaib, "Deep Learning for Medical Image Processing: Overview,
Challenges and the Future," in Classification in BioApps: Springer, 2018, pp. 323-350.
[40] J. Ker, L. Wang, J. Rao, and T. Lim, "Deep learning applications in medical image analysis," IEEE
Access, vol. 6, pp. 9375-9389, 2018.
[41] D. Shen, G. Wu, and H.-I. Suk, "Deep learning in medical image analysis," Annual Review of
Biomedical Engineering, vol. 19, pp. 221-248, 2017.
[42] G. Litjens et al., "A survey on deep learning in medical image analysis," Medical Image Analysis, vol.
42, pp. 60-88, 2017.
[43] S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, and D. Feng, "Early diagnosis of Alzheimer's disease with
deep learning," in Proceedings of the IEEE 11th International Symposium on Biomedical Imaging
(ISBI), 2014, pp. 1015-1018.
[44] S. Liu et al., "Multimodal neuroimaging feature learning for multiclass diagnosis of Alzheimer's
disease," IEEE Transactions on Biomedical Engineering, vol. 62, no. 4, pp. 1132-1140, 2015.
[45] D. Jha and G. Kwon, "Alzheimer‘s disease detection using sparse autoencoder, scale conjugate
gradient and softmax output layer with fine tuning," International Journal of Machine Learning and
Computing, vol. 7, no. 1, pp. 13-17, 2017.
[46] C. Hu, R. Ju, Y. Shen, P. Zhou, and Q. Li, "Clinical decision support for Alzheimer's disease based on
deep learning and brain network," in Proceedings of the IEEE International Conference on
Communications (ICC), 2016, pp. 1-6.
[47] R. Ju, C. Hu, and Q. Li, "Early diagnosis of Alzheimer's disease based on resting-state brain networks
and deep learning," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB),
vol. 16, no. 1, pp. 244-257, 2019.
[48] C. V. Dolph, M. Alam, Z. Shboul, M. D. Samad, and K. M. Iftekharuddin, "Deep learning of texture
and structural features for multiclass Alzheimer's disease classification," in Proceedings of the
International Joint Conference on Neural Networks (IJCNN), 2017, pp. 2259-2266.
[49] D. Lu, K. Popuri, G. W. Ding, R. Balachandar, M. F. Beg, and the Alzheimer‘s Disease Neuroimaging
Initiative, "Multiscale deep neural network based analysis of FDG-PET images for the early diagnosis
of Alzheimer‘s disease," Medical Image Analysis, vol. 46, pp. 26-34, 2018.
[50] F. Li, L. Tran, K.-H. Thung, S. Ji, D. Shen, and J. Li, "Robust deep learning for improved classification
of AD/MCI patients," in Proceedings of the International Workshop on Machine Learning in Medical
Imaging, 2014, pp. 240-247.
[51] X. Li, Y. Li, and X. Li, "Predicting clinical outcomes of Alzheimer‘s disease from complex brain
networks," in Proceedings of the International Conference on Advanced Data Mining and
Applications, 2017, pp. 519-525.
[52] K. Gunawardena, R. Rajapakse, and N. Kodikara, "Applying convolutional neural networks for pre-
detection of Alzheimer's disease from structural MRI data," in Proceedings of the 24th International
Conference on Mechatronics and Machine Vision in Practice (M2VIP), 2017, pp. 1-7.
[53] M. Faturrahman, I. Wasito, N. Hanifah, and R. Mufidah, "Structural MRI classification for Alzheimer's
disease detection using deep belief network," in Proceedings of the 11th International Conference on
Information & Communication Technology and System (ICTS), 2017, pp. 37-42.
[54] H.-I. Suk and D. Shen, "Deep ensemble sparse regression network for Alzheimer‘s disease diagnosis,"
in Proceedings of the International Workshop on Machine Learning in Medical Imaging, 2016, pp.
113-121.
[55] H.-I. Suk, S.-W. Lee, D. Shen, and the Alzheimer‘s Disease Neuroimaging Initiative, "Deep ensemble
learning of sparse regression models for brain disease diagnosis," Medical Image Analysis, vol. 37, pp.
101-113, 2017.
[56] C. Feng, A. Elazab, P. Yang, T. Wang, B. Lei, and X. Xiao, "3D convolutional neural network and
stacked bidirectional recurrent neural network for Alzheimer‘s disease diagnosis," in Proceedings of
the International Workshop on Predictive Intelligence in Medicine, 2018, pp. 138-146.
[57] M. Shakeri, H. Lombaert, S. Tripathi, S. Kadoury, and the Alzheimer‘s Disease Neuroimaging
Initiative, "Deep spectral-based shape features for Alzheimer‘s disease classification," in Proceedings
of the International Workshop on Spectral and Shape Analysis in Medical Imaging, 2016, pp. 15-24.
[58] Y. Chen, B. Shi, C. D. Smith, and J. Liu, "Nonlinear feature transformation and deep fusion for
Alzheimer‘s disease staging analysis," in Proceedings of the International Workshop on Machine
Learning in Medical Imaging, 2015, pp. 304-312.
[59] B. Shi, Y. Chen, P. Zhang, C. D. Smith, J. Liu, and the Alzheimer‘s Disease Neuroimaging Initiative,
"Nonlinear feature transformation and deep fusion for Alzheimer's disease staging analysis," Pattern
Recognition, vol. 63, pp. 487-498, 2017.
[60] A. Ortiz, J. Munilla, F. J. Martínez-Murcia, J. M. Górriz, J. Ramírez, and the Alzheimer‘s Disease
Neuroimaging Initiative, "Learning longitudinal MRI patterns by SICE and deep learning: Assessing
the Alzheimer‘s disease progression," in Proceedings of the Annual Conference on Medical Image
Understanding and Analysis, 2017, pp. 413-424: Springer.
[61] P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil, "Lessons from applying the
systematic literature review process within the software engineering domain," Journal of Systems and
Software, vol. 80, no. 4, pp. 571-583, 2007.
[62] B. Kitchenham and S. Charters, "Guidelines for performing systematic literature reviews in software
engineering," Keele University & University of Durham, 2007.
[63] B. Kitchenham, "Procedures for performing systematic reviews," Keele University & Empirical
Software Engineering National ICT Australia Ltd, 2004.
[64] M. Hosni, I. Abnane, A. Idri, J. M. C. de Gea, and J. L. F. Alemán, "Reviewing ensemble classification
methods in breast cancer," Computer Methods and Programs in Biomedicine, vol. 177, pp. 89-112,
2019.
[65] H. M. Aljaroodi, M. T. Adam, R. Chiong, and T. Teubner, "Avatars and embodied agents in
experimental information systems research: A systematic review and conceptual framework,"
Australasian Journal of Information Systems, vol. 23, 2019.
[66] C. Jack et al., "Comparison of different MRI brain atrophy rate measures with clinical disease
progression in AD," Neurology, vol. 62, no. 4, pp. 591-600, 2004.
[67] T. Zhou, K.-H. Thung, X. Zhu, and D. Shen, "Feature learning and fusion of multimodality
neuroimaging and genetic data for multi-status Dementia diagnosis," in Proceedings of the
International Workshop on Machine Learning in Medical Imaging, 2017, pp. 132-140: Springer.
[68] T. Zhou, K. H. Thung, X. Zhu, and D. Shen, "Effective feature learning and fusion of multimodality
data using stage-wise deep neural network for dementia diagnosis," Human Brain Mapping, vol. 40,
no. 3, pp. 1001-1016, 2019.
[69] H.-I. Suk and D. Shen, "Deep learning-based feature representation for AD/MCI classification," in
Proceedings of the International Conference on Medical Image Computing and Computer-Assisted
Intervention, 2013, pp. 583-590.
[70] H.-I. Suk, S.-W. Lee, D. Shen, and the Alzheimer‘s Disease Neuroimaging Initiative, "Deep sparse
multi-task learning for feature selection in Alzheimer‘s disease diagnosis," Brain Structure and
Function, vol. 221, no. 5, pp. 2569-2587, 2016.
[71] H.-I. Suk, S.-W. Lee, D. Shen, and the Alzheimer‘s Disease Neuroimaging Initiative, "Latent feature
representation with stacked auto-encoder for AD/MCI diagnosis," Brain Structure and Function, vol.
220, no. 2, pp. 841-859, 2015.
[72] J.-S. Choi, E. Lee, and H.-I. Suk, "Regional abnormality representation learning in structural MRI for
AD/MCI diagnosis," in Proceedings of the International Workshop on Machine Learning in Medical
Imaging, 2018, pp. 64-72.
[73] H.-I. Suk, S.-W. Lee, D. Shen, and the Alzheimer‘s Disease Neuroimaging Initiative, "Hierarchical
feature representation and multimodal fusion with deep learning for AD/MCI diagnosis," NeuroImage,
vol. 101, pp. 569-582, 2014.
[74] H.-I. Suk, C.-Y. Wee, S.-W. Lee, and D. Shen, "State-space model with deep learning for functional
dynamics estimation in resting-state fMRI," NeuroImage, vol. 129, pp. 292-307, 2016.
[75] K. Aderghal, J. Benois-Pineau, and K. Afdel, "Classification of sMRI for Alzheimer's disease diagnosis
with CNN: Single siamese networks with 2D+ϵ approach and fusion on ADNI," in Proceedings of the
ACM International Conference on Multimedia Retrieval, 2017, pp. 494-498.
[76] K. Aderghal, J. Benois-Pineau, K. Afdel, and C. Gwenaëlle, "FuseMe: Classification of sMRI images
by fusion of deep CNNs in 2D+ ϵ projections," in Proceedings of the 15th International Workshop on
Content-Based Multimedia Indexing, 2017, p. 34.
[77] K. Aderghal, M. Boissenin, J. Benois-Pineau, G. Catheline, and K. Afdel, "Classification of sMRI for
AD diagnosis with convolutional neuronal networks: A pilot 2D+ϵ study on ADNI," in Proceedings of
the International Conference on Multimedia Modeling, 2017, pp. 690-701.
[78] K. Aderghal, A. Khvostikov, A. Krylov, J. Benois-Pineau, K. Afdel, and G. Catheline, "Classification
of Alzheimer disease on imaging modalities with deep CNNs using cross-modal transfer learning," in
Proceedings of the IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS),
2018, pp. 345-350.
[79] A. Khvostikov, K. Aderghal, J. Benois-Pineau, A. Krylov, and G. Catheline, "3D CNN-based
classification using sMRI and MD-DTI images for Alzheimer disease studies," arXiv preprint
arXiv:1801.05968, 2018.
[80] A. Khvostikov, K. Aderghal, A. Krylov, G. Catheline, and J. Benois-Pineau, "3D Inception-based CNN
with sMRI and MD-DTI data fusion for Alzheimer's disease diagnostics," arXiv preprint
arXiv:1809.03972, 2018.
[81] D. Cheng and M. Liu, "CNNs based multi-modality classification for AD diagnosis," in Proceedings of
the 10th International Congress on Image and Signal Processing, BioMedical Engineering and
Informatics (CISP-BMEI), 2017, pp. 1-5.
[82] D. Cheng and M. Liu, "Classification of Alzheimer‘s disease by cascaded convolutional neural
networks using PET images," in Proceedings of the International Workshop on Machine Learning in
Medical Imaging, 2017, pp. 106-113.
[83] M. Liu, D. Cheng, K. Wang, Y. Wang, and the Alzheimer‘s Disease Neuroimaging Initiative, "Multi-
modality cascaded convolutional neural networks for Alzheimer‘s disease diagnosis,"
Neuroinformatics, vol. 16, pp. 295–308, 2018.
[84] F. Li, D. Cheng, and M. Liu, "Alzheimer's disease classification based on combination of multi-model
convolutional networks," in Proceedings of the IEEE International Conference on Imaging Systems
and Techniques (IST), 2017, pp. 1-5.
[85] D. Cheng and M. Liu, "Combining convolutional and recurrent neural networks for Alzheimer's disease
diagnosis using PET images," in Proceedings of the IEEE International Conference on Imaging
Systems and Techniques (IST), 2017, pp. 1-5.
[86] M. Liu, D. Cheng, and W. Yan, "Classification of Alzheimer‘s disease by combination of convolutional
and recurrent neural networks using FDG-PET images," Frontiers in Neuroinformatics, vol. 12, 2018,
Art. no. 35.
[87] J. Shi, X. Zheng, Y. Li, Q. Zhang, and S. Ying, "Multimodal neuroimaging feature learning with
multimodal stacked deep polynomial networks for diagnosis of Alzheimer's disease," IEEE Journal of
Biomedical and Health Informatics, vol. 22, no. 1, pp. 173-183, 2018.
[88] R. Cui, M. Liu, and G. Li, "Longitudinal analysis for Alzheimer's disease diagnosis using RNN," in
Proceedings of the IEEE 15th International Symposium on Biomedical Imaging (ISBI) 2018, pp. 1398-
1401.
[89] G. Lee, K. Nho, B. Kang, K.-A. Sohn, and D. Kim, "Predicting Alzheimer‘s disease progression using
multi-modal deep learning approach," Scientific Reports, vol. 9, no. 1, 2019, Art. no. 1952.
[90] B. Kitchenham, O. P. Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, "Systematic
literature reviews in software engineering – A systematic literature review," Information and Software
Technology, vol. 51, no. 1, pp. 7-15, 2009.
[91] J. V. Hacker, M. Johnson, C. Saunders, and A. L. Thayer, "Trust in virtual teams: A multidisciplinary
review and integration," Australasian Journal of Information Systems, vol. 23, 2019.
[92] C. Lian, M. Liu, J. Zhang, and D. Shen, "Hierarchical fully convolutional network for joint atrophy
localization and Alzheimer's disease diagnosis using structural MRI," IEEE Transactions on Pattern
Analysis and Machine Intelligence, 2018.
[93] D. Lu, K. Popuri, G. W. Ding, R. Balachandar, and M. F. Beg, "Multimodal and multiscale deep neural
networks for the early diagnosis of Alzheimer‘s disease using structural MR and FDG-PET images,"
Scientific Reports, vol. 8, no. 1, 2018, Art. no. 5697.
[94] M. Liu, J. Zhang, E. Adeli, and D. Shen, "Landmark-based deep multi-instance learning for brain
disease diagnosis," Medical Image Analysis, vol. 43, pp. 157-168, 2018.
[95] M. Liu, J. Zhang, D. Nie, P.-T. Yap, and D. Shen, "Anatomical landmark based deep feature
representation for MR images in brain disease diagnosis," IEEE Journal of Biomedical and Health
Informatics, vol. 22, no. 5, pp. 1476-1485, 2018.
[96] J. Liu et al., "Applications of deep learning to MRI images: A survey," Big Data Mining and Analytics,
vol. 1, no. 1, pp. 1-18, 2018.
[97] Z. Akkus, A. Galimzianova, A. Hoogi, D. L. Rubin, and B. J. Erickson, "Deep learning for brain MRI
segmentation: State of the art and future directions," Journal of Digital Imaging, vol. 30, no. 4, pp.
449-459, 2017.
[98] Y. Chen, H. Jia, Z. Huang, and Y. Xia, "Early identification of Alzheimer‘s disease using an ensemble
of 3D convolutional neural networks and magnetic resonance imaging," in Proceedings of the
International Conference on Brain Inspired Cognitive Systems, 2018, pp. 303-311.
[99] C. Zheng, Y. Xia, Y. Chen, X. Yin, and Y. Zhang, "Early diagnosis of Alzheimer‘s disease by
ensemble deep learning using FDG-PET," in Proceedings of the International Conference on
Intelligent Science and Big Data Engineering, 2018, pp. 614-622.
[100] P. Bhatkoti and M. Paul, "Early diagnosis of Alzheimer's disease: A multi-class deep learning
framework with modified k-sparse autoencoder classification," in Proceedings of the International
Conference on Image and Vision Computing New Zealand (IVCNZ), 2016, pp. 1-5.
[101] L. Cai, Z. Wang, H. Gao, D. Shen, and S. Ji, "Deep adversarial learning for multi-modality missing
data completion," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, 2018, pp. 1158-1166.
[102] A. Karwath, M. Hubrich, S. Kramer, and the Alzheimer‘s Disease Neuroimaging Initiative,
"Convolutional neural networks for the identification of regions of interest in PET scans: A study of
representation learning for diagnosing Alzheimer‘s disease," in Proceedings of the Conference on
Artificial Intelligence in Medicine in Europe, 2017, pp. 316-321.
[103] M. Liu, J. Zhang, E. Adeli, and D. Shen, "Deep multi-task multi-channel learning for joint
classification and regression of brain status," in Proceedings of the International Conference on
Medical Image Computing and Computer-Assisted Intervention, 2017, pp. 3-11.
[104] J. G. Sled, A. P. Zijdenbos, and A. C. Evans, "A nonparametric method for automatic correction of
intensity nonuniformity in MRI data," IEEE Transactions on Medical Imaging, vol. 17, no. 1, pp. 87-
97, 1998.
[105] C. R. Jack Jr et al., "Tracking pathophysiological processes in Alzheimer's disease: an updated
hypothetical model of dynamic biomarkers," The Lancet Neurology, vol. 12, no. 2, pp. 207-216, 2013.
[106] T.-D. Vu, N.-H. Ho, H.-J. Yang, J. Kim, and H.-C. Song, "Non-white matter tissue extraction and deep
convolutional neural network for Alzheimer‘s disease detection," Soft Computing, vol. 22, no. 20, pp.
6825-6833, 2018.
[107] A. M. Taqi, A. Awad, F. Al-Azzo, and M. Milanova, "The impact of multi-optimizers and data
augmentation on TensorFlow convolutional neural network performance," in Proceedings of the IEEE
Conference on Multimedia Information Processing and Retrieval (MIPR), 2018, pp. 140-145.
[108] J. Qiao, Y. Lv, C. Cao, Z. Wang, and A. Li, "Multivariate deep learning classification of Alzheimer‘s
disease based on hierarchical partner matching independent component analysis," Frontiers in Aging
Neuroscience, vol. 10, 2018, Art. no. 417.
[109] A. Ortiz, J. Munilla, J. M. Gorriz, and J. Ramirez, "Ensembles of deep learning architectures for the
early diagnosis of the Alzheimer‘s disease," International Journal of Neural Systems, vol. 26, no. 07,
2016.
[110] A. Farooq, S. Anwar, M. Awais, and S. Rehman, "A deep CNN based multi-class classification of
Alzheimer's disease using MRI," in Proceedings of the IEEE International Conference on Imaging
Systems and Techniques (IST), 2017, pp. 1-6.
[111] A. Valliani and A. Soni, "Deep residual nets for improved Alzheimer's diagnosis," in Proceedings of
the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health
Informatics, 2017, pp. 615-615.
[112] A. Farooq, S. Anwar, M. Awais, and M. Alnowami, "Artificial intelligence based smart diagnosis of
Alzheimer's disease and mild cognitive impairment," in Proceedings of the Smart International Cities
Conference (ISC2), 2017, pp. 1-4.
[113] V. Wegmayr and D. Haziza, "Alzheimer classification with MR images: Exploration of CNN
performance factors," in Proceedings of 1st Conference on Medical Imaging with Deep Learning
(MIDL), 2018, pp. 1-7.
[114] S. Sarraf and G. Tofighi, "Classification of Alzheimer's disease structural MRI data by deep learning
convolutional neural networks," arXiv preprint arXiv:1607.06583, 2016.
[115] S. Sarraf and G. Tofighi, "DeepAD: Alzheimer's disease classification via deep convolutional neural
networks using MRI and fMRI," bioRxiv 070441, p. 070441, 2016.
[116] S. Sarraf and G. Tofighi, "Classification of Alzheimer's disease using fMRI data and deep learning
convolutional neural networks," arXiv preprint arXiv:1603.08631, 2016.
[117] S. Sarraf and G. Tofighi, "Deep learning-based pipeline to recognize Alzheimer's disease using fMRI
data," in Proceedings of the Future Technologies Conference (FTC), 2016, pp. 816-820.
[118] S. Qiu, G. H. Chang, M. Panagia, D. M. Gopal, R. Au, and V. B. Kolachalama, "Fusion of deep
learning models of MRI scans, Mini–Mental State Examination, and logical memory test enhances
diagnosis of mild cognitive impairment," Alzheimer's & Dementia: Diagnosis, Assessment & Disease
Monitoring, vol. 10, pp. 737-749, 2018.
[119] S. Luo, X. Li, and J. Li, "Automatic Alzheimer‘s disease recognition from MRI data using deep
learning method," Journal of Applied Mathematics and Physics, vol. 5, no. 09, 2017, Art. no. 1892.
[120] M. Hon and N. Khan, "Towards Alzheimer's disease classification through transfer learning," in
Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2017, pp.
1166-1169.
[121] R. Jain, N. Jain, A. Aggarwal, and D. J. Hemanth, "Convolutional neural network based Alzheimer‘s
disease classification from magnetic resonance brain images," Cognitive Systems Research, vol. 57, pp.
147-159, 2019.
[122] C. Wu et al., "Discrimination and conversion prediction of mild cognitive impairment using
convolutional neural networks," Quantitative Imaging in Medicine and Surgery, vol. 8, no. 10, pp.
992–1003, 2018.
[123] C. D. Billones, O. J. L. D. Demetria, D. E. D. Hostallero, and P. C. Naval, "DemNet: A convolutional
neural network for the detection of Alzheimer's disease and mild cognitive impairment," in
Proceedings of the IEEE Region 10 Conference (TENCON), 2016, pp. 3724-3727.
[124] A. Moussavi-Khalkhali, M. Jamshidi, and S. Wijemanne, "Feature fusion for denoising and sparse
autoencoders: Application to neuroimaging data," in Proceedings of the 15th IEEE International
Conference on Machine Learning and Applications (ICMLA), 2016, pp. 605-610.
[125] J. Islam and Y. Zhang, "Deep convolutional neural networks for automated diagnosis of Alzheimer‘s
disease and mild cognitive impairment using 3D brain MRI," in Proceedings of the International
Conference on Brain Informatics, 2018, pp. 359-369.
[126] H. Tang, E. Yao, G. Tan, and X. Guo, "A fast and accurate 3D fine-tuning convolutional neural
network for Alzheimer‘s disease diagnosis," in Proceedings of the International CCF Conference on
Artificial Intelligence, 2018, pp. 115-126.
[127] S.-H. Wang, P. Phillips, Y. Sui, B. Liu, M. Yang, and H. Cheng, "Classification of Alzheimer‘s disease
based on eight-layer convolutional neural network with leaky rectified linear unit and max pooling,"
Journal of Medical Systems, vol. 42, no. 5, p. 85, 2018.
[128] L. Gao et al., "Brain disease diagnosis using deep learning features from longitudinal MR images," in
Proceedings of the Joint International Conference on Web and Big Data Asia-Pacific Web (APWeb)
and Web-Age Information Management (WAIM) 2018, pp. 327-339.
[129] J. Islam and Y. Zhang, "Brain MRI analysis for Alzheimer‘s disease diagnosis using an ensemble
system of deep convolutional neural networks," Brain Informatics, vol. 5, no. 2, pp. 1-14, 2018.
[130] W. Lin et al., "Convolutional neural networks-based MRI image analysis for the Alzheimer‘s disease
prediction from mild cognitive impairment," Frontiers in Neuroscience, vol. 12, 2018, Art. no. 777.
[131] N. Tzourio-Mazoyer et al., "Automated anatomical labeling of activations in SPM using a macroscopic
anatomical parcellation of the MNI MRI single-subject brain," Neuroimage, vol. 15, no. 1, pp. 273-
289, 2002.
[132] N. J. Kabani, D. J. MacDonald, C. J. Holmes, and A. C. Evans, "3D anatomical atlas of the human
brain," NeuroImage, vol. 7, no. 4, p. S717, 1998.
[133] X. Zheng, J. Shi, Y. Li, X. Liu, and Q. Zhang, "Multi-modality stacked deep polynomial network based
feature learning for Alzheimer's disease diagnosis," in Proceedings of the IEEE 13th International
Symposium on Biomedical Imaging (ISBI), 2016, pp. 851-854: IEEE.
[134] J. Kim and B. Lee, "Identification of Alzheimer's disease and mild cognitive impairment using
multimodal sparse hierarchical extreme learning machine," Human Brain Mapping, vol. 39, no. 9, pp.
3728-3741, 2018.
[135] K.-H. Thung, P.-T. Yap, and D. Shen, "Multi-stage diagnosis of Alzheimer‘s disease with incomplete
multimodal data via multi-task deep learning," in Deep Learning in Medical Image Analysis and
Multimodal Learning for Clinical Decision Support: Springer, 2017, pp. 160-168.
[136] S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, and D. D. Feng, "Multi-phase feature representation
learning for neurodegenerative disease diagnosis," in Proceedings of the Australasian Conference on
Artificial Life and Computational Intelligence, 2015, pp. 350-359.
[137] F. Li, L. Tran, K.-H. Thung, S. Ji, D. Shen, and J. Li, "A robust deep model for improved classification
of AD/MCI patients," IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 5, pp. 1610-
1616, 2015.
[138] A. Majumdar and V. Singhal, "Noisy deep dictionary learning: Application to Alzheimer's disease
classification," in Proceedings of the International Joint Conference on Neural Networks (IJCNN),
2017, pp. 2679-2683.
[139] Y. Wang et al., "A novel multimodal MRI analysis for Alzheimer's disease based on convolutional
neural network," in Proceedings of the 40th Annual International Conference of the IEEE Engineering
in Medicine and Biology Society (EMBC), 2018, pp. 754-757.
[140] D. Collazos-Huertas, A. Tobar-Rodriguez, D. Cárdenas-Peña, and G. Castellanos-Dominguez, "MRI-
based feature extraction using supervised general stochastic networks in dementia diagnosis," in
Proceedings of the International Work-Conference on the Interplay Between Natural and Artificial
Computation, 2017, pp. 363-373: Springer.
[141] R. Cui and M. Liu, "Hippocampus analysis based on 3D CNN for Alzheimer‘s disease diagnosis," in
Proceedings of the 10th International Conference on Digital Image Processing (ICDIP), 2018, vol.
10806, p. 108065O.
[142] N. Amoroso et al., "Deep learning reveals Alzheimer's disease onset in MCI subjects: Results from an
international challenge," Journal of Neuroscience Methods, vol. 302, pp. 3-9, 2018.
[143] F. Çitak-ER, D. Goularas, and B. Ormeci, "A novel convolutional neural network model based on
voxel-based morphometry of imaging data in predicting the prognosis of patients with mild cognitive
impairment," Journal of Neurological Sciences, vol. 34, no. 1, pp. 52-69, 2017.
[144] A. Gupta, M. Ayhan, and A. Maida, "Natural image bases to represent neuroimaging data," in
Proceedings of the International Conference on Machine Learning, 2013, pp. 987-994.
[145] D. Cheng, M. Liu, J. Fu, and Y. Wang, "Classification of MR brain images by combination of multi-
CNNs for AD diagnosis," in Proceedings of the 9th International Conference on Digital Image
Processing (ICDIP), 2017, vol. 10420, p. 1042042.
[146] T. D. Vu, H.-J. Yang, V. Q. Nguyen, A.-R. Oh, and M.-S. Kim, "Multimodal learning using
convolution neural network and sparse autoencoder," in Proceedings of the IEEE International
Conference on Big Data and Smart Computing (BigComp), 2017, pp. 309-312.
[147] F. Li, M. Liu, and the Alzheimer‘s Disease Neuroimaging Initiative, "Alzheimer's disease diagnosis
based on multiple cluster dense convolutional networks," Computerized Medical Imaging and
Graphics, vol. 70, pp. 101-110, 2018.
[148] A. Punjabi, A. Martersteck, Y. Wang, T. B. Parrish, and A. K. Katsaggelos, "Neuroimaging modality
fusion in Alzheimer's classification using convolutional neural networks," arXiv preprint
arXiv:1811.05105, 2018.
[149] A. Payan and G. Montana, "Predicting Alzheimer's disease: A neuroimaging study with 3D
convolutional neural networks," arXiv preprint arXiv:1502.02506, 2015.
[150] E. M. Ali, A. F. Seddik, and M. H. Haggag, "Automatic detection and classification of Alzheimer's
disease from MRI using TANNN," International Journal of Computer Applications, vol. 148, no. 9,
pp. 30-34, 2016.
[151] R. Wolz et al., "Multi-method analysis of MRI images in early diagnostics of Alzheimer's disease,"
PLOS ONE, vol. 6, no. 10, p. e25446, 2011.
[152] Y. Fan, D. Shen, R. C. Gur, R. E. Gur, and C. Davatzikos, "COMPARE: Classification of
morphological patterns using adaptive regional elements," IEEE Transactions on Medical Imaging,
vol. 26, no. 1, pp. 93-105, 2007.
[153] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, "Deep learning for visual understanding:
A review," Neurocomputing, vol. 187, pp. 27-48, 2016.
[154] E. Hosseini-Asl, R. Keynto, and A. El-Baz, "Alzheimer's disease diagnostics by adaptation of 3D
convolutional network," in Proceedings of the IEEE International Conference on Image Processing
(ICIP), 2016, pp. 126-130.
[155] E. Hosseini-Asl, G. Gimel'farb, and A. El-Baz, "Alzheimer's disease diagnostics by a deeply supervised
adaptable 3D convolutional network," arXiv preprint arXiv:1607.00556, 2016.
[156] R. Livni, S. Shalev-Shwartz, and O. Shamir, "An algorithm for training polynomial networks," arXiv
preprint arXiv:1304.7045, 2013.
[157] Y. LeCun et al., "Backpropagation applied to handwritten zip code recognition," Neural Computation,
vol. 1, no. 4, pp. 541-551, 1989.
[158] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," International Journal of
Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
[159] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document
recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[160] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural
networks," Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[161] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd
ACM International Conference on Multimedia, 2014, pp. 675-678.
[162] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition,"
arXiv preprint arXiv:1409.1556, 2014.
[163] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[164] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[165] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional
networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017,
vol. 1, no. 2, pp. 4700-4708.
[166] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, inception-resnet and the impact
of residual connections on learning," in Proceedings of the 31st Association for the Advancement of
Artificial Intelligence Conference on Artificial Intelligence (AAAI) 2017, vol. 4, p. 12.
[167] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. Garcia-Rodriguez, "A
review on deep learning techniques applied to semantic segmentation," arXiv preprint
arXiv:1704.06857, 2017.
[168] D. Su, H. Zhang, H. Chen, J. Yi, P.-Y. Chen, and Y. Gao, "Is robustness the cost of accuracy? A
comprehensive study on the robustness of 18 deep image classification models," in Proceedings of the
European Conference on Computer Vision (ECCV), 2018, pp. 631-648.
[169] J. Fu and Y. Rui, "Advances in deep learning approaches for image tagging," APSIPA Transactions on
Signal and Information Processing, vol. 6, 2017.
[170] J. M. Ortiz-Suárez, R. Ramos-Pollán, and E. Romero, "Exploring Alzheimer's anatomical patterns
through convolutional networks," in Proceedings of the 12th International Symposium on Medical
Information Processing and Analysis, 2017, vol. 10160, p. 10160Z.
[171] G. Awate, S. Bangare, G. Pradeepini, and S. Patil, "Detection of Alzheimers disease from MRI using
convolutional neural network with Tensorflow," arXiv preprint arXiv:1806.10170, 2018.
[172] J. Islam and Y. Zhang, "A novel deep learning based multi-class classification method for Alzheimer‘s
disease detection using brain MRI data," in Proceedings of the International Conference on Brain
Informatics, 2017, pp. 213-222.
[173] J. Islam and Y. Zhang, "An ensemble of deep convolutional neural networks for Alzheimer's disease
detection and classification," Accepted poster at NIPS 2017 Workshop on Machine Learning for
Health, arXiv preprint arXiv:1712.01675, 2017.
[174] S. Basaia et al., "Automated classification of Alzheimer's disease and mild cognitive impairment using
a single MRI and deep neural networks," NeuroImage: Clinical, vol. 21, 2019, Art. no. 101645.
[175] K. Bäckström, M. Nazari, I. Y.-H. Gu, and A. S. Jakola, "An efficient 3D deep convolutional network
for Alzheimer's disease diagnosis using MR images," in Proceedings of the IEEE 15th International
Symposium on Biomedical Imaging (ISBI), 2018, pp. 149-153.
[176] E. Jabason, M. O. Ahmad, and M. S. Swamy, "Shearlet based stacked convolutional network for
multiclass diagnosis of Alzheimer‘s disease using the Florbetapir PET Amyloid imaging data," in
Proceedings of the 16th IEEE International New Circuits and Systems Conference (NEWCAS), 2018,
pp. 344-347.
[177] V. Wegmayr, S. Aitharaju, and J. Buhmann, "Classification of brain MRI with big data and deep 3D
convolutional neural networks," in Proceedings of the SPIE Medical Imaging 2018: Computer-Aided
Diagnosis, 2018, vol. 10575, p. 10575S.
[178] S. E. Spasov, L. Passamonti, A. Duggento, P. Liò, and N. Toschi, "A multi-modal convolutional neural
network framework for the prediction of Alzheimer‘s disease," in Proceedings of the 40th Annual
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2018, pp.
1271-1274.
[179] S. Spasov, L. Passamonti, A. Duggento, P. Lio, N. Toschi, and the Alzheimer‘s Disease Neuroimaging
Initiative, "A parameter-efficient deep learning approach to predict conversion from mild cognitive
impairment to Alzheimer's disease," Neuroimage, vol. 189, pp. 276-287, 2019.
[180] S. Esmaeilzadeh, D. I. Belivanis, K. M. Pohl, and E. Adeli, "End-to-end Alzheimer‘s disease diagnosis
and biomarker identification," in Proceedings of the International Workshop on Machine Learning in
Medical Imaging, 2018, pp. 337-345.
[181] H. Choi, K. H. Jin, and the Alzheimer‘s Disease Neuroimaging Initiative, "Predicting cognitive decline
with deep learning of brain metabolism and amyloid imaging," Behavioural Brain Research, vol. 344,
pp. 103-109, 2018.
[182] H. Karasawa, C.-L. Liu, and H. Ohwada, "Deep 3D convolutional neural network architectures for
Alzheimer‘s disease diagnosis," in Proceedings of the Asian Conference on Intelligent Information and
Database Systems, 2018, pp. 287-296.
[183] S. Korolev, A. Safiullin, M. Belyaev, and Y. Dodonova, "Residual and plain convolutional neural
networks for 3D brain MRI classification," in Proceedings of the IEEE 14th International Symposium
on Biomedical Imaging (ISBI), 2017, pp. 835-838.
[184] C. Yang, A. Rangarajan, and S. Ranka, "Visual explanations from deep 3D convolutional neural
networks for Alzheimer's disease classification," in Proceedings of the AMIA Annual Symposium,
2018, pp. 1571–1580.
[185] H. Wang et al., "Ensemble of 3D densely connected convolutional network for diagnosis of mild
cognitive impairment and Alzheimer‘s disease," Neurocomputing, vol. 333, pp. 145-156, 2019.
[186] U. Senanayake, A. Sowmya, and L. Dawes, "Deep fusion pipeline for mild cognitive impairment
diagnosis," in Proceedings of the IEEE 15th International Symposium on Biomedical Imaging (ISBI),
2018, pp. 1394-1997.
[187] A. Ebrahimi-Ghahnavieh, S. Luo, and R. Chiong, "Transfer learning for Alzheimer's disease detection
on MRI images," in IEEE International Conference on Industry 4.0, Artificial Intelligence, and
Communications Technology (IAICT), 2019, pp. 133-138.
[188] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks,"
in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010, pp.
249-256.
[189] S. Khan and T. Yairi, "A review on the application of deep learning in system health management,"
Mechanical Systems and Signal Processing, vol. 107, pp. 241-265, 2018.
[190] R. Miotto, F. Wang, S. Wang, X. Jiang, and J. T. Dudley, "Deep learning for healthcare: Review,
opportunities and challenges," Briefings in Bioinformatics, vol. 19, no. 6, pp. 1236-1246, 2017.
[191] D. Ravì et al., "Deep learning for health informatics," IEEE Journal of Biomedical and Health
Informatics, vol. 21, no. 1, pp. 4-21, 2016.
[192] N. Vinutha, P. D. Shenoy, and K. Venugopal, "Efficient morphometric techniques in Alzheimer‘s
disease detection: Survey and tools," Neuroscience International, vol. 7, no. 2, pp. 19-44, 2016.
[193] P. V. Rouast, M. Adam, and R. Chiong, "Deep learning for human affect recognition: Insights and new
developments," IEEE Transactions on Affective Computing, 2019.
[194] C. R. Jack Jr et al., "The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods," Journal
of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic
Resonance in Medicine, vol. 27, no. 4, pp. 685-691, 2008.
[195] K. A. Ellis et al., "The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging:
methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of
Alzheimer's disease," International Psychogeriatrics, vol. 21, no. 4, pp. 672-687, 2009.
[196] D. S. Marcus, T. H. Wang, J. Parker, J. G. Csernansky, J. C. Morris, and R. L. Buckner, "Open Access
Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented,
and demented older adults," Journal of Cognitive Neuroscience, vol. 19, no. 9, pp. 1498-1507, 2007.
[197] I. B. Malone et al., "MIRIAD—Public release of a multiple time point Alzheimer's MR imaging
dataset," NeuroImage, vol. 70, pp. 33-36, 2013.
[198] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level
performance on ImageNet classification," in Proceedings of the IEEE International Conference on
Computer Vision, 2015, pp. 1026-1034.
[199] R. Li et al., "Deep learning based imaging data completion for improved brain disease diagnosis," in
Proceedings of the International Conference on Medical Image Computing and Computer-Assisted
Intervention, 2014, pp. 305-312.
[200] D. Ramachandram and G. W. Taylor, "Deep multimodal learning: A survey on recent advances and
trends," IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 96-108, 2017.
[201] J. G. Mannheim et al., "PET/MRI hybrid systems," in Proceedings of the Seminars in Nuclear
Medicine, 2018, vol. 48, no. 4, pp. 332-347.