A Separated Feature Learning based DBN Structure for

Classification of SSMVEP Signals*


Yaguang Jia, Jun Xie, Member, IEEE, Guanghua Xu, Member, IEEE, Min Li, Member, IEEE, Sicong
Zhang, Ailing Luo and Xingliang Han

Abstract— Signal processing is one of the key points in brain-computer interface (BCI) applications. Common methods for BCI signal classification include canonical correlation analysis (CCA) and the support vector machine (SVM). However, because BCI signals are very complex and valid signals are often confounded with background noise, many current classification methods lose meaningful information embedded in human EEG. Moreover, owing to the large inter-subject variability in the characteristics and patterns of BCI signals, classification accuracy often differs considerably between subjects. Since BCI signals have high dimensionality and multi-channel properties, this paper proposes a novel deep belief network (DBN) structure stacked from restricted Boltzmann machines (RBMs) to extract efficient features from steady-state motion visual evoked potential (SSMVEP) signals and to perform the subsequent classification. The DBN extracts local features from the BCI data of each channel separately, fuses the local features, and then passes the fused features to an output classifier consisting of softmax units. Results show that the proposed algorithm achieves higher accuracy and lower inter-subject variability at short response times than the conventional CCA method.

* The research leading to these results has received funding from the National High Technology Research and Development Program (863) of China (Approval No. 2015AA042301) and the National Natural Science Foundation of China (Approval No. 61503298).
J. Xie (corresponding author), M. Li, S. C. Zhang, A. L. Luo and X. L. Han are with the School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an 710049 China (phone: 86-29-82663707; fax: 86-29-82664257; e-mail: xiejun@mail.xjtu.edu.cn; min.li@xjtu.edu.cn; wenzhao@stu.xjtu.edu.cn; 823745876@qq.com; hanxliang@stu.xjtu.edu.cn).
G. H. Xu (co-corresponding author) is with the State Key Laboratory for Manufacturing Systems Engineering, and School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an 710049 China (e-mail: ghxu@mail.xjtu.edu.cn).
Y. G. Jia is with the School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049 China (e-mail: jyg.4589815@stu.xjtu.edu.cn).

I. INTRODUCTION

A brain-computer interface (BCI) is a system that the brain can use to interact with the world directly, without the peripheral nerves and muscles. The steady-state visual evoked potential (SSVEP) has been a hot topic in BCI research because it offers a portable acquisition device, easy operation, a high signal-to-noise ratio (SNR), and no need for subject-specific training. A high-performance BCI should classify the SSVEP efficiently so as to recognize the user's command exactly, and should perform the classification on signals sampled over a short time so as to achieve a high communication speed. However, valid EEG information is always embedded in confounded background noise, which usually originates from eye blinks, ECG, EOG, EMG and so on. The most common methods, such as canonical correlation analysis (CCA) and the support vector machine (SVM), usually extract features manually; such features easily lose meaningful information, and these methods need the brain signals to be strong enough, which leads to large differences in performance between subjects. Furthermore, 3-5 seconds or more are usually needed to guarantee satisfactory recognition accuracy [1], which is a long time for a BCI to respond to a control intention. To achieve high classification accuracy, a short response time, and alleviation of the large inter-subject variability, more effective signal processing algorithms should be implemented.

With their capability of processing high-dimensional and nonlinear data, deep learning (DL) methods have achieved great success in image processing, computer vision and natural language processing over the last decade [2]. The BCI signal is complex, nonlinear and stochastic, and its high dimensionality and multi-channel properties make DL an inherently suitable technique for processing it. The key point is to design an efficient network for each specific BCI paradigm. In recent years, a variety of DL structures have been used for feature extraction and pattern recognition of BCI signals, and DL algorithms have been shown to be more efficient than traditional algorithms such as SVM and CCA [3-9]. In 2014, Xiu et al. [3] implemented a deep belief network (DBN) with eight hidden layers to process motor-imagery data; combined with the Ada-boost algorithm, it achieved 4-6% higher classification accuracy than SVM. In 2016, Yanagimoto et al. [6] applied a convolutional neural network (CNN) to emotion recognition and achieved higher recognition accuracy with little difference between subjects. More efficient DL algorithms for BCI continue to be proposed, but applications to SSVEP processing, which has high practical value, are still lacking.

This paper designs a novel DBN stacked from restricted Boltzmann machines (RBMs) to process a novel steady-state motion visual evoked potential (SSMVEP, i.e., a kind of SSVEP elicited by oscillating Newton's rings) [10]. The features of the multi-channel signals are extracted by this DBN separately from the data of each channel; the extracted features are fused and passed through two further layers before being submitted to the output softmax classification layer [11]. Using the proposed method, we achieved high classification accuracy with a short response time and low inter-subject variability between subjects.

II. METHODOLOGY

A. Experimental Setups and Recordings

In the BCI system of this paper, as shown in Fig. 1, EEG data were sampled from seven subjects under stimulation by the Newton's ring paradigm proposed by Xie et al. [10], which preserves the advantages of the traditional SSVEP-based BCI and has a higher SNR with lower visual fatigue for its user.


The sampling frequency was 1200 Hz, and three electrodes were located at O1, Oz and O2 according to the international 10-20 electrode position system. Four Newton's rings, contracting and expanding at the different frequencies of 8.57 Hz, 10 Hz, 12 Hz and 15 Hz, with their positions listed in Table I, were displayed simultaneously on the screen. Each trial lasted 5 seconds, and every subject was asked to stare at each Newton's ring for 20 consecutive trials. When all four targets had been completed, one run was finished; to ensure the availability of the data, six runs were completed by each subject. With seven subjects recruited in our study, 3360 trials of data were initially collected. The Fast Fourier Transform (FFT) was then used to check the validity of the data, and 3140 trials of effective data were finally selected for further analysis. To obtain enough data for our DBN implementation, we used a sliding window to truncate samples from each trial. The length of each sample was set to 2400 data points, corresponding to 2 seconds of data, and the step of the sliding window was set to 600 data points. Furthermore, to avoid disturbance from the transient potentials that exist in SSMVEP [12], the truncating procedure started from the 841st data point, i.e., 0.7 second after the start of every trial. In this way, five samples could be derived from every trial, giving 15700 samples across all seven subjects, where each sample contained three-channel EEG data. The data were labeled according to the frequency of the attended Newton's ring.

Figure 1. The structure of the BCI system: stimulation by the display, EEG acquisition through the electrodes, preprocessing, feature extraction and classification by the DBN pretrained with RBMs, and feedback of the result on the BCI display.

TABLE I. ALIGNMENT OF TARGET POSITIONS

  Position    Frequency
  Left        8.57 Hz
  Right       10 Hz
  Up          12 Hz
  Down        15 Hz
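To make the sampling scheme above concrete, the following is a minimal sketch of the sliding-window truncation; the function and variable names are ours, and the trial layout (channels × time points) is an assumption for illustration, not the authors' code.

```python
import numpy as np

FS = 1200      # sampling rate (Hz)
WIN = 2400     # sample length: 2 seconds of data
STEP = 600     # sliding-window step
START = 840    # 0-based index of the 841st data point (0.7 s after trial onset)

def truncate_trial(trial):
    """Cut one 5-second, 3-channel trial (shape (3, 6000)) into 2-second samples."""
    samples = []
    start = START
    while start + WIN <= trial.shape[1]:
        samples.append(trial[:, start:start + WIN])
        start += STEP
    return samples

# A fake trial illustrates that exactly five overlapping samples are produced.
trial = np.random.randn(3, 5 * FS)
assert len(truncate_trial(trial)) == 5
```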

B. Using RBM to Initialize the DBN

The RBM is a probabilistic graphical model that can be interpreted as a stochastic neural network, and it is very good at modeling high-dimensional data. The structure of the RBM is shown in Fig. 2.

Figure 2. The structure of the RBM: visible units v_1, ..., v_{n_v} with biases a ∈ R^{n_v}, hidden units h_1, ..., h_{n_h} with biases b ∈ R^{n_h}, and weight matrix W ∈ R^{n_h×n_v}.

The type of RBM we employed in this study is a binary RBM, which uses real-valued visible units between 0 and 1 and binary hidden units. A probability is assigned to each pair of visible and hidden vectors v and h:

  p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} e^{-E(\mathbf{v}, \mathbf{h})}    (1)

Equation (2) gives the energy function:

  E(\mathbf{v}, \mathbf{h}) = -\sum_{i \in \mathrm{visible}} a_i v_i - \sum_{j \in \mathrm{hidden}} b_j h_j - \sum_{i,j} v_i h_j w_{ij}    (2)

where v_i and h_j are the states of visible unit i with bias a_i and hidden unit j with bias b_j, and w_{ij} is the weight between them. Equation (3) gives Z in (1):

  Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}    (3)

where Z is called the "partition function". The RBM uses the maximum likelihood function as its training objective and assigns a probability to every possible visible vector v:

  p(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}    (4)

Given a visible vector v, the probability that h_j equals 1 is:

  p(h_j = 1 \mid \mathbf{v}) = \sigma\left(b_j + \sum_i v_i w_{ij}\right)    (5)

where σ(x) is the logistic sigmoid function 1/(1 + e^{-x}). Similarly, given a hidden vector h, the probability that v_i equals 1 is:

  p(v_i = 1 \mid \mathbf{h}) = \sigma\left(a_i + \sum_j h_j w_{ij}\right)    (6)

Contrastive divergence (CD), proposed by Hinton, is used to train the RBM [13]. First, a training vector is presented to the visible layer. Then (5) is used to compute the corresponding states of the hidden units. Finally, a "reconstruction" of the visible layer is obtained by sampling with the probabilities computed by (6). The learning rules are:

  \Delta w_{ij} = \epsilon\left[\langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon}\right]    (7)

  \Delta a_i = \epsilon\left[\langle v_i \rangle_{data} - \langle v_i \rangle_{recon}\right]    (8)

  \Delta b_j = \epsilon\left[\langle h_j \rangle_{data} - \langle h_j \rangle_{recon}\right]    (9)

Hinton has shown that using pretrained RBMs to initialize a neural network can effectively avoid poor local minima [14]. We used mini-batch gradient descent to pretrain the RBMs in this study; the pretraining is greedy and unsupervised.
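As a concrete illustration of Eqs. (1)-(9), the following is a minimal NumPy sketch of a binary RBM trained with one step of contrastive divergence (CD-1). The class and variable names are ours, and the learning rate of 0.1 follows Table II; this is a sketch of the technique under those assumptions, not the authors' released implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_hidden, n_visible))  # W in R^{n_h x n_v}
        self.a = np.zeros(n_visible)   # visible biases
        self.b = np.zeros(n_hidden)    # hidden biases
        self.lr = lr
        self.rng = rng

    def hidden_probs(self, v):
        # Eq. (5): p(h_j = 1 | v)
        return sigmoid(self.b + v @ self.W.T)

    def visible_probs(self, h):
        # Eq. (6): p(v_i = 1 | h)
        return sigmoid(self.a + h @ self.W)

    def cd1_update(self, v0):
        """One CD-1 step on a mini-batch v0 of shape (batch, n_visible), values in [0, 1]."""
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample binary hidden states
        v1 = self.visible_probs(h0)                            # "reconstruction" of the visible layer
        ph1 = self.hidden_probs(v1)
        batch = v0.shape[0]
        # Eqs. (7)-(9): data-driven minus reconstruction-driven statistics
        self.W += self.lr * (ph0.T @ v0 - ph1.T @ v1) / batch
        self.a += self.lr * (v0 - v1).mean(axis=0)
        self.b += self.lr * (ph0 - ph1).mean(axis=0)
```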
C. The Novel Structure of DBN

A novel structure of DBN was designed from a plurality of RBMs; the structure used in this study is shown in Fig. 3. In this novel structure, the local features of the data are extracted from each channel separately. In the next layer, the three local features are combined into a single layer. In this way, the EEG data were fitted to a five-layer DBN. Every layer is an RBM except the output classification layer, which is a softmax layer, and the layers were pretrained one by one before back propagation (BP) to avoid poor local minima.

Figure 3. The structure of the DBN: data from O1, Oz and O2 are fed to separate local-feature layers (500 units each), followed by a distributed-feature layer for the 3 channels (2000 units), a more abstract feature layer (1000 units), and a softmax classification layer (4 units).

In the BP stage, we still used batch gradient descent to learn the global optimum, with a batch size ten times that used in pretraining. This training stage is supervised.

D. The Final Meta-parameters

We trained this DBN a number of times to decide the best meta-parameters; the main meta-parameters of our network are shown in Table II.

TABLE II. MAIN META-PARAMETERS OF THE DBN

  Stage               Meta-parameter                 Value
  Pretrain (RBM)      Learning rate (weights)        0.1
                      Learning rate (visible bias)   0.1
                      Learning rate (hidden bias)    0.1
                      Batch size                     60
                      Iteration number               5
  Back propagation    Batch size                     600
                      Iteration number               100

As Table II shows, the optimal iteration number for pretraining is 5. We trained the DBN several times to decide this meta-parameter, and the results show that more pretraining iterations are not always better; there is a tradeoff between computational precision and time consumption. This should be noted when using RBMs to fit BCI signals.
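To make the topology of Fig. 3 and the layer sizes of Table II concrete, the following sketch wires up the same separated-feature structure as a Keras graph. TensorFlow/Keras is our choice for illustration only; in the paper each hidden layer is an RBM whose pretrained weights would initialize the corresponding Dense layer before back propagation (that initialization is not shown here).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

SAMPLE_LEN = 2400   # one 2-second, 1200 Hz single-channel sample

# One input and one 500-unit local-feature layer per channel (O1, Oz, O2)
inputs = [layers.Input(shape=(SAMPLE_LEN,), name=ch) for ch in ("O1", "Oz", "O2")]
local = [layers.Dense(500, activation="sigmoid", name=f"local_{ch}")(x)
         for ch, x in zip(("O1", "Oz", "O2"), inputs)]

fused = layers.Concatenate(name="fuse")(local)                        # fuse the 3 local features
h1 = layers.Dense(2000, activation="sigmoid", name="distributed")(fused)
h2 = layers.Dense(1000, activation="sigmoid", name="abstract")(h1)
out = layers.Dense(4, activation="softmax", name="classifier")(h2)    # 4 stimulus frequencies

model = Model(inputs, out)
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```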
E. CCA Method

CCA is a non-parametric multivariate method. Applying CCA to two sets of multidimensional variables reveals the correlation between them: the two sets are transformed into a pair of linear combinations that have maximum correlation.

Given K stimulus frequencies f_1, ..., f_K, a set of standard reference signals Y_i, composed of cosinusoids and sinusoids at each stimulus frequency and its harmonics, can be constructed as:

  Y_i = \begin{pmatrix} \cos(2\pi f_i t) \\ \sin(2\pi f_i t) \\ \vdots \\ \cos(2\pi H f_i t) \\ \sin(2\pi H f_i t) \end{pmatrix}, \quad t = \frac{1}{F_s}, \ldots, \frac{S}{F_s}    (10)

where F_s is the sampling rate, H is the number of harmonics, and S is the number of sample points, which is also the time window of the SSMVEP signal X recorded from C channels.

Given the multi-dimensional variables X and Y_i and their linear transformations x = X^T W_x and y_i = Y_i^T W_{y_i}, the maximum canonical correlation of x and y_i (i = 1, ..., K) can be computed as:

  \max_{W_x, W_{y_i}} \rho(x, y_i) = \frac{E(x^T y_i)}{\sqrt{E(x^T x)\, E(y_i^T y_i)}}    (11)

where ρ(x, y_i) is the canonical correlation between x and y_i; its maximum is the recognition basis for stimulus frequency f_i (i = 1, ..., K). By performing CCA for every f_i (i = 1, ..., K) separately, we obtain ρ_{f_i} for each stimulus frequency. The target, with stimulus frequency f_target, is then given by:

  f_{target} = \arg\max_{i = 1, \ldots, K} \rho_{f_i}    (12)

In the CCA calculations of this paper, the stimulus frequencies f_i (i = 1, ..., 4) were set to the motion reversal frequencies of the Newton's rings, the number of channels C was set to 3, and the number of harmonics H was set to 1. The recognition accuracy was estimated as the percentage of correctly judged samples.
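The following is a minimal sketch of the CCA reference method of Eqs. (10)-(12): for each stimulus frequency a reference matrix of sines and cosines (H = 1 here) is built, its canonical correlation with the multi-channel sample X is computed, and the frequency with the largest correlation is taken as the detected target. scikit-learn's CCA solver and the function names are our choices; the paper does not specify an implementation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

FS = 1200
FREQS = [8.57, 10.0, 12.0, 15.0]   # motion reversal frequencies of the four rings

def reference(f, n_samples, n_harmonics=1):
    """Reference matrix Y_i of shape (S, 2H) for stimulus frequency f, Eq. (10)."""
    t = np.arange(1, n_samples + 1) / FS
    comps = []
    for h in range(1, n_harmonics + 1):
        comps += [np.cos(2 * np.pi * h * f * t), np.sin(2 * np.pi * h * f * t)]
    return np.column_stack(comps)

def max_corr(X, Y):
    """Largest canonical correlation between X (S, C) and Y (S, 2H), Eq. (11)."""
    cca = CCA(n_components=1)
    Xc, Yc = cca.fit_transform(X, Y)
    return np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1]

def detect(X):
    """X: one SSMVEP sample of shape (S, 3). Returns the detected frequency, Eq. (12)."""
    rhos = [max_corr(X, reference(f, X.shape[0])) for f in FREQS]
    return FREQS[int(np.argmax(rhos))]
```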

III. RESULTS

A. Accuracies Assessed by the DBN and CCA Methods

Fig. 4 shows the accuracies obtained by the DBN and CCA methods with 2-second EEG segments for each of the seven subjects.

Figure 4. Accuracies assessed by the DBN and CCA methods.

In Fig. 4, the accuracies assessed by the DBN method exceeded the 70% level even under a very short response time (i.e., the 2 seconds adopted in our study). Moreover, the performance obtained by the DBN exceeded that of the CCA method for every subject.

A one-way analysis of variance (ANOVA) was used to assess the decrement in accuracy from the DBN to the CCA method (a minimal sketch of such a test follows this paragraph); the results are shown in Table III.
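The following is a hedged illustration of such a one-way ANOVA using SciPy; the two accuracy arrays are placeholders, since the per-run scores behind Table III are not published in the paper.

```python
import numpy as np
from scipy.stats import f_oneway

# Placeholder per-run accuracies for one subject under the two methods (illustrative only)
dbn_acc = np.array([0.72, 0.70, 0.74, 0.71, 0.69, 0.73])
cca_acc = np.array([0.52, 0.55, 0.50, 0.53, 0.51, 0.54])

F, p = f_oneway(dbn_acc, cca_acc)
print(f"F = {F:.1f}, p = {p:.2e}")  # a small p indicates a significant accuracy decrement
```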
TABLE III. ACCURACY DECREMENT FROM DBN TO CCA

  Subject   DBN Mean Accuracy   CCA Mean Accuracy   Decrement   F       p
  S1        70.92%              52.50%              18%         103.3   8.9788×10^-10
  S2        72.50%              33.00%              40%         913.3   2.0732×10^-19
  S3        75.50%              57.00%              17%         167.9   8.9525×10^-12
  S4        73.54%              57.50%              18%         150.2   2.6580×10^-11
  S5        74.45%              53.50%              20%         193.8   9.4522×10^-12
  S6        71.45%              52.55%              18%         114.8   3.0486×10^-9
  S7        71.95%              52.95%              19%         216.7   1.7665×10^-11

The results in Table III indicate that the DBN method proposed in our study clearly outperformed the traditional CCA method in detection accuracy. Across subjects, the grand-averaged accuracy declined significantly, by 22% (F = 416.4, p = 6.8791×10^-46), from the DBN method (72.94% ± 3.83%) to the CCA method (51.18% ± 8.67%).
B. Analysis of the Inter-subject Variability

The statistical variability of SSVEP responses is critical for the applicability of BCI systems. The inter-subject variability, assessed by one-way ANOVA with Bonferroni-corrected multiple comparisons, revealed that for the accuracies obtained by the CCA method there were significant differences between individual subjects (F = 60.93, p = 2.3634×10^-24): subject S2 exhibited significantly lower accuracies while subjects S3 and S4 showed significantly higher accuracies, indicating a large variability between subjects. This may be due to the large response variability across subjects. For the accuracies obtained by the proposed DBN method, however, there were no significant accuracy differences between subjects (F = 2.19, p > 0.05; one-way ANOVA with Bonferroni-corrected multiple comparisons), which reflects an alleviation of the large inter-subject variability.
IV. CONCLUSION

In this study, a separated feature learning based DBN structure was proposed for the classification of SSMVEP signals. The results indicate that this method is more robust and can achieve higher SSMVEP discrimination accuracies and lower inter-subject variability at short response times than the traditional CCA method, which demonstrates the capability of the proposed DBN structure for SSVEP detection. However, the achieved accuracy is still not high enough and did not exceed the common accuracy level of 80%. Further research will focus on a variety of promising approaches, such as mixtures of multiple models and Dropout, to improve the generalization of the DBN network and thereby the detection accuracy; a Gaussian-unit RBM may be another way to model the BCI signals more realistically.

ACKNOWLEDGMENT

We want to thank the subjects for participating in these experiments and the anonymous reviewers for their helpful comments.

REFERENCES

[1] M. Xu, H. Qi and B. Wan, "A hybrid BCI speller paradigm combining P300 potential and the SSVEP blocking feature", Journal of Neural Engineering, 2013, 10(2): 026001.
[2] Y. LeCun, Y. Bengio and G. E. Hinton, "Deep learning", Nature, vol. 521, pp. 436-444, 28 May 2015.
[3] X. An, D. Kuang and X. Guo, "A deep learning method for classification of EEG data based on motor imagery", ICIC 2014, LNBI 8590, pp. 203-210, 2014.
[4] N. Lu, T. Li and X. Ren, "A deep learning scheme for motor imagery classification based on restricted Boltzmann machines", IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17 August 2016, DOI: 10.1109/TNSRE.2016.2601240.
[5] T. Ma, H. Li and H. Yang, "The extraction of motion-onset VEP BCI features based on deep learning and compressed sensing", Journal of Neuroscience Methods, vol. 275, 1 January 2017, pp. 80-92.
[6] M. Yanagimoto and C. Sugimoto, "Recognition of persisting emotional valence from EEG using convolutional neural networks", in Proceedings of the 2016 IEEE International Workshop on Computational Intelligence and Applications (IWCIA), November 2016.
[7] J. Li, Z. Struzik and L. Zhang, "Feature learning from incomplete EEG with denoising autoencoder", Neurocomputing, vol. 165, 1 October 2015, pp. 23-31.
[8] Y. R. Tabar and U. Halici, "A novel deep learning approach for classification of EEG motor imagery signals", Journal of Neural Engineering, 2017 Feb, 14(1): 016003.
[9] Y. R. Tabar and U. Halici, "A novel deep learning approach for classification of EEG motor imagery signals", Journal of Neural Engineering, 30 November 2016, vol. 14, no. 1.
[10] J. Xie, G. Xu, J. Wang, F. Zhang and Y. Zhang (2012), "Steady-state motion visual evoked potentials produced by oscillating Newton's rings: Implications for brain-computer interfaces", PLoS ONE 7(6): e39707.
[11] M. Welling, M. Rosen-Zvi and G. E. Hinton (2005), "Exponential family harmoniums with an application to information retrieval", Advances in Neural Information Processing Systems, pp. 1481-1488, Cambridge, MA: MIT Press.
[12] F. B. Vialatte, M. Maurice and J. Dauwels, "Steady-state visually evoked potentials: Focus on essential paradigms and future perspectives", Progress in Neurobiology, 2010, 90(4): 418-438.
[13] G. E. Hinton (2002), "Training products of experts by minimizing contrastive divergence", Neural Computation, 14(8): 1771-1800.
[14] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks", Science, 28 July 2006, vol. 313, no. 5786, pp. 504-507.