AutoEncoder for Neuroimage
Mingli Zhang1(B), Fan Zhang2(B), Jianxin Zhang3(B), Ahmad Chaddad4, Fenghua Guo5, Wenbin Zhang6, Ji Zhang7, and Alan Evans1
1 Montreal Neurological Institute, McGill University, Montreal, Canada
mingli.zhang@mcgill.ca
2 Shandong Future Intelligent Financial Engineering Laboratory, Yantai, China
3 Dalian Minzu University, Dalian, China
4 School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, China
5 Shandong University, Jinan, China
6 Carnegie Mellon University, Pittsburgh, USA
7 University of Southern Queensland, Darling Heights, Australia
Abstract. The variational autoencoder (VAE), a class of neural networks that perform nonlinear dimensionality reduction, has become an effective tool in neuroimaging analysis. Most current studies on VAEs use unsupervised learning to capture latent representations and, to some extent, this strategy may be under-explored for neuroimaging datasets with heavy noise and class imbalance. From the reinforcement learning point of view, it is necessary to consider the class-wise capability of the decoder. The latent space of an autoencoder depends on the distribution of the raw data, the architecture of the model, and the dimension of the latent space, so combining a supervised linear autoencoder with a VAE may improve classification performance. In this paper, we propose a supervised cascade dual autoencoder with linear and nonlinear stages, which increases the discriminative capability of the latent space by feeding the low-dimensional latent representation from a semi-supervised VAE into a further linear encoder-decoder model. The effectiveness of the proposed approach is demonstrated on brain development data, and the method is also evaluated on imbalanced neural spiking classification.
1 Introduction
Modeling brain age is critical for the diagnosis of neuropsychiatric disorders. Investigations of brain age have benefited from advances in magnetic resonance imaging (MRI) [3] and from large-scale initiatives such as the Pediatric Imaging, Neurocognition, and Genetics (PING) study [6]. One of the simplest ways to model brain age is to predict a participant's age from MRI data through machine learning and statistical analysis. Another important topic in neuroscience is the design of an effective encoding model and its application to neural spiking prediction [4]. For example, a shared variance component analysis method has been applied to estimate the part of a neural population's variance that reliably encodes a latent signal [4].
© Springer Nature Switzerland AG 2021
C. Strauss et al. (Eds.): DEXA 2021, LNCS 12924, pp. 84–90, 2021.
https://doi.org/10.1007/978-3-030-86475-0_9
Supervised learning with neural networks is commonly used in a variety of neuroimaging tasks. Unsupervised VAEs are commonly used to learn the complex distribution of a dataset [1]. Autoencoder (AE) based supervised regression is proposed in [8], and a supervised linear AE is proposed and applied to brain age prediction in [6]. Autoencoders can be treated as a truncated Principal Component Analysis (PCA) [2] or as analysis and synthesis dictionary learning [6,7].
We propose a framework that integrates the supervised linear AE [6] and the nonlinear VAE based regression [8] into cascade dual autoencoders. Intuitively, a supervised linear autoencoder is fitted on top of the variational autoencoder based regression. The dual autoencoder ties the latent representation more tightly to its high-dimensional input, establishing a more robust latent representation and a discriminative linear combination that relates the preliminary latent representation to the high-dimensional input dataset. In addition, in the supervised linear autoencoder we adopt class-wise information estimation to produce a more discriminative output for the final classification and to realize supervised learning.
The major contributions of this work are as follows:
– We propose a novel cascade dual autoencoder for generating discriminative and robust latent representations, which is trained with variational autoencoder based regression and a class-wise linear autoencoder.
– We present a joint learning framework that embeds the inputs into a discriminative latent space with the variational autoencoder of the cascade dual autoencoder, and assigns them, together with the initial input, to the ideal distribution with the class-wise linear autoencoder.
– The proposed nonlinear and linear cascade dual autoencoder framework is more efficient and robust across different kinds of datasets than current AEs.
– Empirical experiments on two different datasets demonstrate the effectiveness of the proposed approach, which outperforms state-of-the-art methods.
2 The Proposed Approach

The proposed approach is composed of two components: 1) the variational autoencoder based regression [8], with an encoder network $E$, a decoder network $D$, and a regression estimate $R$ of the label $L$; and 2) the linear class-wise autoencoder, with $P = [P_1, \cdots, P_k, \cdots, P_K]$ as the linear encoder and $D = [D_1, \cdots, D_k, \cdots, D_K]$ as the linear decoder [6], where $P_k \in \mathbb{R}^{d \times m}$ and $D_k \in \mathbb{R}^{m \times d}$. We denote the initial input samples from all $K$ classes as $X = [X_1, \cdots, X_k, \cdots, X_K]$, and the data of the $k$-th class as $X_k \in \mathbb{R}^{d \times N_k}$, where $d$ is the dimension of the training samples and $N_k$ is the number of samples in class $k$.
2.1 Variational AutoEncoder Based Regression

In VAE based regression, the encoder $E$ maps the input $X$ to a reduced latent representation $Z$, the decoder network $D$ reconstructs $X$ from $Z$ as $\hat{X}$, and $Z$ is associated with the labels $L$. Our goal is to incorporate the labels into the variational autoencoder based regression, and then to use $Z = [Z_1, \cdots, Z_k, \cdots, Z_K]$ with $Z_k \in \mathbb{R}^{d_1 \times N_k}$, together with $X$ and $L$, as the input of the class-wise linear autoencoder for more accurate and robust classification. We first train the variational autoencoder based regression of the cascade dual autoencoder framework.
Generative Model. For the input features $X$, there is a latent representation $Z \in \mathbb{R}^{d_1 \times N}$ associated with the labels $L \in \mathbb{R}^{1 \times N}$, where $d_1$ is the dimension of the latent representation and $N$ is the total number of training samples; the latent representation is not one-dimensional, but it is related to the 1D label. To guarantee the quality of the latent representation, we construct a deep generative model whose generative process for $x$ is $p(x, z, l) = p(x|z)\,p(z|l)\,p(l)$, where $p(z) = \mathcal{N}(z|0, I)$ and $p_\theta(x|z) = f(x; z, \theta)$. Here $f(x; z, \theta)$ is a suitable likelihood function (e.g., a Gaussian distribution, or a Bernoulli distribution when $x$ is binary), $p(l)$ is the prior on the label, and $p(z|l)$ is a label-related prior on the latent representation, given by a linear generator model $p(z|l) \sim w^T l + \|w\|_1$, s.t. $w^T w = I$, where $\|\cdot\|_1$ is the $\ell_1$ norm. Samples estimated from the posterior distribution $p(z|x)$ are used to predict the label $l$.
Inference Model. To obtain scalable and tractable variational inference and parameter learning, we adopt standard variational inference with an auxiliary distribution $q_\phi((z, l)|x)$, written $q((z, l)|x)$ with the parameter $\phi$ omitted, to approximate the posterior $p((z, l)|x)$. The log-evidence $\log p(x)$ is the sum of the divergence between $q((z, l)|x)$ and $p((z, l)|x)$ and a variational lower bound. We assume the factorization $q((z, l)|x) = q(z|x)\,q(l|x)$. We follow the variational principle by applying a lower bound objective function to guarantee the accuracy of the posterior approximation. The lower bound objective can be written as

$$J_1 := -D(q(l|x), p(l)) + \lambda\, \mathbb{E}_{q(z|x)}[\log p(x|z)] - \mathbb{E}_{q(l|x)}[D(q(z|x), p(z|l))] \tag{1}$$

where $\lambda$ is a hyper-parameter that controls the relative weight of the discriminative and generative learning. The higher $\lambda$ is, the more weight is placed on the reconstruction decoded from the latent representation. $D(q(\cdot), p(\cdot))$ is a divergence function, such as the KL-divergence. $q(l|x)$ is formulated as a univariate Gaussian distribution. $q(z|x)$, the probabilistic encoder, maps the input to a latent space with a multivariate normal distribution [8]. The last term of Eq. (1) forces $q(z|x)$ to be as close as possible to the label prior $p(z|l)$.
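As a rough numerical illustration of Eq. (1), the sketch below evaluates the three terms of the lower bound for a single sample with NumPy. It is not the authors' implementation. It assumes diagonal Gaussian posteriors, a standard normal label prior p(l), a unit-variance Gaussian likelihood evaluated at a single decoded reconstruction `x_hat`, and a label prior of the hypothetical form p(z|l) = N(w l, I). All function and argument names are illustrative.

```python
import numpy as np


def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    mu_q, var_q, mu_p, var_p = (
        np.asarray(a, dtype=float) for a in (mu_q, var_q, mu_p, var_p)
    )
    terms = np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    return 0.5 * float(np.sum(terms))


def lower_bound_j1(x, x_hat, mu_z, var_z, mu_l, var_l, w,
                   lam=1.0, n_samples=64, seed=0):
    """Single-sample estimate of the objective in Eq. (1).

    Assumptions (not taken from the paper): p(l) = N(0, 1),
    p(x|z) is a unit-variance Gaussian centred on x_hat,
    and p(z|l) = N(w * l, I).
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    mu_z, var_z, w = (np.asarray(a, dtype=float) for a in (mu_z, var_z, w))
    rng = np.random.default_rng(seed)

    # First term: -D(q(l|x), p(l)) with a univariate Gaussian q(l|x).
    label_term = -kl_diag_gauss(mu_l, var_l, 0.0, 1.0)

    # Second term: lam * E_q(z|x)[log p(x|z)], using one reconstruction.
    log_lik = -0.5 * np.sum((x - x_hat) ** 2 + np.log(2.0 * np.pi))
    recon_term = lam * float(log_lik)

    # Third term: -E_q(l|x)[D(q(z|x), p(z|l))], Monte Carlo over l.
    labels = rng.normal(mu_l, np.sqrt(var_l), size=n_samples)
    kls = [kl_diag_gauss(mu_z, var_z, w * l, np.ones_like(mu_z)) for l in labels]
    prior_term = -float(np.mean(kls))

    return label_term + recon_term + prior_term
```

With these assumptions, a better reconstruction raises the second term and a latent mean closer to `w * l` raises the third, which matches the roles the text assigns to those terms.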
2.2 Supervised Linear Autoencoder

To obtain more accurate classification results than VAE based regression alone, we introduce a class-wise linear autoencoder that takes $X_k$ and the latent representation $Z_k$ from the variational autoencoder as input features, $S_k = \{X_k, Z_k\}$, where the label is the class number $k$. The projective and reconstructive equations are

$$A_k = P_k S_k + n_0, \qquad \hat{X}_k = D_k A_k + n_1 \tag{2}$$

where $S_k \in \mathbb{R}^n$ denotes the input features, $A_k \in \mathbb{R}^d$ is the $d$-dimensional hidden latent variable, and $\hat{X}_k \in \mathbb{R}^n$ is the estimate of the input $S_k$ reconstructed from $A_k$. $P_k$ is the linear encoder that transforms the input features into $A_k$, $D_k$ is the linear decoder that back-projects $A_k$ to the estimated outputs, and $n_0$ and $n_1$ are biases.

Table 1. The total number of spikes in each spike group.
Group   1      2     3     4    5   6   7  8
#       25390  3168  1523  339  69  15  1  2

The class-wise linear autoencoder is modeled by minimizing the expected squared reconstruction error as

$$J_2 := \arg\min_{P, D} \sum_{k=1}^{K} \left( \|S_k - D_k P_k S_k\|_F^2 + \lambda \|P_k \bar{S}_k\|_1 \right), \quad S_k = \{X_k, Z_k\} \tag{3}$$

where the Frobenius-norm term is the fidelity term, and the strict-sparsity $\ell_1$ term forces the samples of the other classes, $\bar{S}_k$, not to fit the model of the current class, which makes the model class-wise discriminative. This supervised linear autoencoder can be solved in the same way as in previous work [6,7].
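To make the two terms of Eq. (3) concrete, the following sketch evaluates the class-wise objective for given encoders and decoders. It only computes the objective value; it does not perform the optimization. It assumes that each $S_k$ is stored as an array with one column per sample, that each encoder has shape (m, n) and each decoder shape (n, m) so that the product $D_k P_k S_k$ has the shape of $S_k$, and that the $\ell_1$ term is the entrywise sum of absolute values. The function name and argument layout are illustrative.

```python
import numpy as np


def class_wise_objective(S, P, D, lam=0.1):
    """Value of the class-wise objective in Eq. (3).

    S: list of K arrays, S[k] has shape (n, N_k), one column per sample.
    P: list of K encoders, P[k] has shape (m, n)   (assumed shape).
    D: list of K decoders, D[k] has shape (n, m)   (assumed shape).
    """
    total = 0.0
    for k in range(len(S)):
        # Fidelity: squared Frobenius norm of the reconstruction residual.
        residual = S[k] - D[k] @ (P[k] @ S[k])
        fidelity = np.linalg.norm(residual, "fro") ** 2

        # Discrimination: samples of all other classes should receive
        # near-zero codes under the encoder of class k.
        others = [S[j] for j in range(len(S)) if j != k]
        sparsity = np.abs(P[k] @ np.hstack(others)).sum() if others else 0.0

        total += fidelity + lam * sparsity
    return float(total)
```

For example, with identity encoders and decoders the fidelity term vanishes and only the penalty on the other classes' codes remains, which shows that a perfect reconstruction alone does not minimize the objective.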
Therefore, the overall loss of the proposed supervised cascaded autoencoder network is

$$\min_{\theta, \phi, P, D} \; J_1 + J_2. \tag{4}$$
2.3 Implementation Details

The supervised variational autoencoder takes the data $(X, L)$ as input and outputs $(\hat{X}, \hat{L})$, where $\hat{L}$ is the predicted label corresponding to the label posterior. The networks are implemented in Keras. $q(l|x)$ is trained with three convolutional layers, two max-pooling layers, and one softmax layer, followed by dropout and ReLU activation. $q(z|x)$ is trained with three convolutional layers and two fully connected layers with batch normalization, dropout, and ReLU activation. For $p(x|z)$ and $p(x|l, z)$, a fully connected layer with three ReLU activations is used, followed by a sigmoid for the output. The linear supervised autoencoder is optimized with the alternating direction method of multipliers (ADMM).
3 Experiments
The proposed cascade dual autoencoder framework is evaluated on modeling brain age from 3 to 21 years using cortical thickness from the PING dataset; details of the subjects are given in [6]. The framework is also evaluated on modeling single-neuron spiking activity with calcium imaging of 30507 slides. For each sequence, the firing rate and power spectrogram are computed. The distribution of spikes is presented in Table 1. The cortical calcium images are in time-sequence format, and each spike is related not only to the corresponding cortical calcium image but also to its neighbors. We therefore apply a non-local image mean based approach to the spike calcium imaging to generate a number of non-local mean calcium imaging sequences. For details about the dataset, please refer to [5] and Table 1. 5-fold cross-validation is applied in these experiments. Performance is measured by prediction accuracy (ACC), root mean square error (RMSE), and mean absolute error (MAE).

Fig. 1. The predicted age versus the ground-truth test age, $R^2 = 0.7730$ (x-axis: Test Values; y-axis: Predicted Values).

Table 2. Classification results on the brain age.
Method  RMSE    MAE     ACC
RF      3.5503  2.6186  0.7200
NDPL    3.806   2.799   0.7001
SVM     4.1641  3.3645  0.5525
VAE     3.6832  2.6850  0.6771
Ours    3.5077  2.529   0.7336
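The two error measures named above can be computed as in the following minimal sketch, which uses the standard definitions of RMSE and MAE. The function names are illustrative, and the paper does not state how its values were computed beyond naming the metrics.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)
```

Because RMSE squares each residual before averaging, it penalizes a few large age errors more heavily than MAE does, so the two numbers are best read together.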
Prediction of Brain Age: We first demonstrate the performance of the proposed approach by predicting brain age from 3 to 21 years using the cortical thickness from T1 structural MRI in the PING database. We measure the contribution of the proposed method by comparing its results against the similar approaches NDPL [6] and VAE [8]. Figure 1 and Table 2 give the prediction accuracy in terms of correlation ($R^2$), RMSE, MAE, and accuracy (ACC). The proposed method achieves the best accuracy among the compared baselines. Figure 1 shows the general relationship between cortical thickness and age, which reflects a high correlation ($R^2 = 0.773$) between age and predicted age (brain age). As expected, lower prediction accuracy is observed at both ends of the age range, owing to the 'regression to the mean' effect in this challenging regression problem.
Prediction of Spiking: The performance of the proposed approach is also demonstrated on predicting single-neuron spiking from calcium imaging. Here, we evaluate our method in a classification setting, applying a sliding window to the 30507 spikes as the data preprocessing scheme. 5-fold cross-validation is applied in these experiments. Performance is measured by overall accuracy (oACC) and balanced accuracy (bACC). The overall accuracy is $\mathrm{oACC} = N_c / N_t$, where $N_c$ is the total number of correctly classified subjects and $N_t$ is the number of all test subjects. The balanced accuracy is

$$\mathrm{bACC} = \frac{1}{K} \sum_{k=1}^{K} \frac{N_c^k}{N_t^k},$$

where $N_c^k$ is the number of correctly classified test subjects in class $k$ and $N_t^k$ is the total number of test subjects in class $k$.
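The difference between the two accuracy measures matters for imbalanced data such as the spike groups in Table 1. The sketch below implements both definitions directly. It assumes that $K$ is the number of classes that actually occur in the test labels, and the function names are illustrative.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """oACC = N_c / N_t: fraction of all test subjects classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """bACC: mean over classes of the per-class fraction classified correctly.

    Assumption: K is the number of classes present in y_true.
    """
    total_per_class = Counter(y_true)
    correct_per_class = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    ratios = [correct_per_class[k] / total_per_class[k] for k in total_per_class]
    return sum(ratios) / len(ratios)
```

A classifier that always predicts the majority class on labels `[1, 1, 1, 2]` reaches an oACC of 0.75 but a bACC of only 0.5, which is why bACC is the more informative figure when a few groups dominate.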
Table 3 compares the RMSE, MAE, oACC, and bACC obtained by our approach with those of the recently proposed NDPL method [6], the variational autoencoder (VAE), the support vector machine (SVM), and random forest (RF). We see that the proposed approach outperforms the state-of-the-art method. NLM denotes the use of the non-local calcium image mean as input features. From Table 3, the non-local frame mean features give lower residual classification errors (RMSE and MAE) than using the calcium image features directly. With non-local calcium image features as input, the proposed approach achieves the best RMSE and bACC, improving on the same model with direct calcium image features by about 0.1805 in RMSE, 0.0855 in MAE, and 0.0367 in bACC.
Table 3. Classification results on the single neuron spikes with 5-fold CV.
Method      RMSE    MAE     oACC    bACC
RF(NLM)     0.9932  0.7834  0.8400  0.3100
NDPL(NLM)   1.2549  1.0749  0.5608  0.2377
SVM(NLM)    1.4891  1.8575  0.4525  0.1727
VAE         1.1182  0.8700  0.6371  0.2718
Ours        1.0755  0.8840  0.8418  0.3213
Ours(NLM)   0.8950  0.7985  0.8070  0.3580
4 Conclusion
We proposed an efficient and robust cascade dual autoencoder framework to model brain development and to predict spikes from calcium images and behavior video frames. Compared with conventional methods, this approach learns discriminative features by combining a variational autoencoder with a class-wise linear autoencoder and by imposing an $\ell_1$ sparsity constraint on the coefficients of the non-current classes. Experiments on predicting brain age and modeling spikes showed the benefit of our approach compared with state-of-the-art methods for these tasks. Furthermore, our approach can be used to understand the influence of gender on brain development.
Acknowledgements. This work was partially supported by the NSFC (61902220, 61972062), the Young and Middle-aged Talents Program of the National Civil Affairs Commission, the Fonds de recherche du Québec-Santé (FRQS 271636, 298507), and the Science and Technology Innovation Program for Distributed Young Talents of Shandong Province Higher Education Institutions under Grant No. 2019KJN042.
References
1. Benou, A., Veksler, R., Friedman, A., Raviv, T.R.: De-noising of contrast-enhanced MRI sequences by an ensemble of expert deep neural networks. In: Deep Learning and Data Labeling for Medical Applications, pp. 95–110. Springer, New York (2016)
2. Bzdok, D., Eickenberg, M., Grisel, O., Thirion, B., Varoquaux, G.: Semi-supervised factored logistic regression for high-dimensional neuroimaging data. In: Advances in Neural Information Processing Systems, pp. 3348–3356 (2015)
3. Cole, J.H., Franke, K.: Predicting age using neuroimaging: innovative brain ageing biomarkers. Trends Neurosci. 40(12), 681–690 (2017)
4. Stringer, C., Pachitariu, M., Steinmetz, N., Reddy, C.B., Carandini, M., Harris, K.D.: Spontaneous behaviors drive multidimensional, brainwide activity. Science 364(6437), 255–255 (2019)
5. Xiao, D., et al.: Mapping cortical mesoscopic networks of single spiking cortical or sub-cortical neurons. Elife 6, e19976 (2017)
6. Zhang, M., et al.: Brain status modeling with non-negative projective dictionary learning. NeuroImage 206, 116226 (2020)
7. Zhang, M., Guo, Y., Zhang, C., Poline, J.-B., Evans, A.: Modeling and analysis brain development via discriminative dictionary learning. In: Knoll, F., Maier, A., Rueckert, D., Ye, J.C. (eds.) MLMIR 2019. LNCS, vol. 11905, pp. 80–88. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33843-5_8
8. Zhao, Q., Adeli, E., Honnorat, N., Leng, T., Pohl, K.M.: Variational autoencoder for regression: application to brain aging analysis. In: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.-T., Khan, A. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 823–831. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_91
