Subject-independent Classification on Brain-Computer Interface using Autonomous Deep Learning for Finger Movement Recognition


Khairul Anam, Saiful Bukhori, Faruq Sandi Hanggara, and Mahardhika Pratama, Member, IEEE

Khairul Anam is with the Department of Electrical Engineering, Universitas Jember, 68121, Indonesia (corresponding author; phone: +62331-484977; e-mail: khairul@unej.ac.id). Saiful Bukhori is with the Faculty of Computer Science, Universitas Jember, Indonesia (e-mail: saiful.ilkom@unej.ac.id). Faruq Sandi Hanggara is with Intelligent System and Robotics, CDAST, Universitas Jember, Indonesia (e-mail: faruqsandi@gmail.com). Mahardhika Pratama is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore (e-mail: mpratama@ntu.edu.sg).

Abstract— The degradation of subject-independent classification performance on a brain-computer interface is a challenging issue. The most common remedy is to collect as many subjects as possible and then train the system across all of them. To address the issue, this article instead introduces a streaming online learner called autonomous deep learning (ADL) to classify five individual fingers from electroencephalography (EEG) signals. ADL is a deep learning architecture that constructs its own structure through streaming learning and adapts that structure to changes occurring in the input. In this article, the input to ADL is a common spatial pattern (CSP) feature vector extracted from the EEG signals of healthy subjects. Experimental results on subject-dependent classification across four subjects using 5-fold cross-validation show that ADL achieved a classification accuracy of around 77%. This performance was excellent compared to a random forest (RF) and a convolutional neural network (CNN), which achieved accuracies of about 53% and 72%, respectively. On subject-independent classification, ADL outperforms CNN by producing stable accuracies for both training and testing, whereas CNN suffers an accuracy degradation of approximately 50%. These results imply that ADL is a promising machine learning approach for dealing with the subject-independent classification problem.

I. INTRODUCTION

The prevalence of disability across various countries is considerable. For example, as many as five million people in India have a disability [1]. Similarly, around 2-3% of Indonesian citizens experience disabilities [2]. Many factors cause disability, one of which is amputation. The causes of amputation are various, such as diabetes, work accidents, or traffic accidents. Various efforts need to be taken to restore the quality of life of those affected, especially for disabilities of the hands. Losing a hand is a severe problem for sufferers because the functionality of the hands in human life is very high.

One method of rehabilitation for amputees is to provide a prosthetic hand robot that replaces the lost hand functions. The challenge is how to design a control system for the robot in such a way that the user feels it is part of their own body. To produce such a system, we have to design a controller that can predict the user's intention from human body signals such as electromyography (EMG).

Many studies have developed pattern recognition systems for hand movements based on EMG signals. However, EMG signals for hand movements cannot be obtained from patients who have amputations above the elbow. In addition, in patients who have very severe weakness of motor function (neuromuscular disease), the EMG signal cannot be recorded. The solution is to use another body signal, namely a brain signal or electroencephalogram (EEG).

EEG signals are dynamic and non-stationary. As a result, a motion detection system is susceptible to changes in the EEG signal caused by fatigue, humidity of the EEG electrodes due to sweating, or a slight shift of an electrode. Therefore, we need a detection system that is robust, insensitive to even small changes, and adaptive. One reason for the lack of robustness of existing systems is the use of learning machines that do not work online. In other words, classifiers such as the artificial neural network (ANN), support vector machine (SVM), k-nearest neighbor (kNN), extreme learning machine (ELM), and linear discriminant analysis (LDA) work in a batch learning fashion. In batch learning, the system is trained offline, and the trained system is then applied for testing. The problem is that, due to the non-stationary nature of the EEG signal, the data seen in offline learning differ from the data encountered in real-time applications. Furthermore, the gap becomes more prominent when the system is used for a long period of time, so the system will certainly need recalibration, which is tiresome for the patient.

An online learning system, also known as streaming learning, can be a solution to this problem. In such a pattern recognition system, the system operates and, at the same time, learns and adapts to the new environment. To some extent, such systems can also search for the appropriate parameters by themselves to produce a robust system. This approach is also known as autonomous machine learning or independent machine learning [3].

The success of deep learning in image processing has inspired researchers to apply it to EEG signal processing [4]. The main advantage of deep learning is that it eliminates the feature extraction stage because feature extraction becomes part of the network itself. However, the learning process is still batch (offline) learning, so it remains less robust to changes. The use of online learning methods in deep learning is still at the stage of testing on benchmark datasets and, to the best of the authors' knowledge, none of them have been applied to overcome the inherent problems of EEG signal processing. This article therefore evaluates the performance of online deep learning, the so-called autonomous deep learning (ADL) [5][3], in producing a robust EEG signal processing system.

This article focuses on the performance of the proposed system in overcoming the robustness issue, namely the degradation of subject-independent classification performance, specifically for EEG-based finger movement recognition. The common way to mitigate this degradation is to collect as many subjects as possible [6][7]. This article takes a different approach by introducing streaming deep learning, called autonomous deep learning, to classify five individual finger movements from EEG signals.

The rest of the paper is organized as follows. Section II presents the theoretical background of autonomous deep learning. The material and method section comes after it. Results and discussion and then the conclusion end the paper.

II. AUTONOMOUS DEEP LEARNING

Autonomous deep learning (ADL) is a supervised elastic deep neural network whose hidden units and hidden layers are constructed by the network itself in a lifelong learning mechanism. The basis of ADL is a multi-layer perceptron (MLP) consisting of a first layer and multiple intermediate layers. The former is the input feature, while the latter are the hidden layers, each connected to its own classifier, as shown in Figure 1.

[Figure 1. The autonomous deep learning structure with two hidden layers: the input layer, the first hidden layer, the second hidden layer, and the output layer.]
The equations of ADL are as follows:

    $\hat{C} = \max_{o=1,\dots,m} \hat{Y}_o; \qquad \hat{Y} = \sum_{l=1}^{L} \beta^{(l)}\, y^{(l)}$        (1)

where

    $y^{(l)} = \mathrm{softmax}\big(W_s^{(l)} h^{(l)} + b_s^{(l)}\big), \quad \forall l = 1,\dots,L$        (2)

    $h^{(l)} = \sigma\big(W^{(l)} h^{(l-1)} + b^{(l)}\big), \qquad h^{(0)} = X$        (3)

In equation (1), $y^{(l)}$ is the multiclass probability produced by the classifier of the $l$-th hidden layer, and $L$ is the number of hidden layers. In equations (2) and (3), $W^{(l)} \in \mathbb{R}^{K_l \times d}$ is the weight of the $l$-th hidden layer, where $K_l$ and $d$ are the number of hidden nodes and the number of inputs of the $l$-th hidden layer. Next, $b^{(l)} \in \mathbb{R}^m$ is the corresponding bias term, and $\beta^{(l)}$ is the voting weight of the $l$-th hidden layer's classifier, which is adjusted by a reward-and-punishment factor.

ADL performs two kinds of adaptation, i.e., adaptation of the network width and of the network depth. The former is governed by the network significance (NS) method, which creates new hidden units and prunes inconsistent ones. The latter is driven by the drift detection scenario (DDS), which creates and prunes hidden layers. In general, the learning mechanism of ADL is depicted in Figure 2.

[Figure 2. Autonomous deep learning [3]]
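To make the layer-wise voting in equations (1)-(3) concrete, the following is a minimal Python/NumPy sketch of the ADL forward pass, not the authors' implementation. It interprets equation (2) as a softmax over the classes (an assumption consistent with $y^{(l)}$ being a multiclass probability), and all variable names (W, b, Ws, bs, beta) are illustrative; in ADL proper, the number of layers, the hidden sizes $K_l$, and $\beta^{(l)}$ all evolve over the data stream via NS and DDS, whereas here they are fixed inputs.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def adl_forward(x, W, b, Ws, bs, beta):
        """Sketch of the ADL forward pass in equations (1)-(3).

        x      : input feature vector (e.g., CSP components), shape (d,)
        W, b   : per-layer hidden weights/biases, W[l] has shape (K_l, K_{l-1})
        Ws, bs : per-layer classifier weights/biases, Ws[l] has shape (m, K_l)
        beta   : per-layer voting weights (reward/punishment adjusted), length L
        Returns the predicted class index, i.e. the maximiser in eq. (1).
        """
        h = x                                   # h^(0) = X, eq. (3)
        Y = np.zeros(Ws[0].shape[0])            # accumulated vote over m classes
        for l in range(len(W)):
            h = sigmoid(W[l] @ h + b[l])        # hidden activation, eq. (3)
            logits = Ws[l] @ h + bs[l]          # per-layer classifier, eq. (2)
            y_l = np.exp(logits - logits.max())
            y_l /= y_l.sum()                    # multiclass probability y^(l)
            Y += beta[l] * y_l                  # weighted sum over layers, eq. (1)
        return int(np.argmax(Y))                # C-hat

The per-layer classifiers let shallow layers contribute early in the stream while deeper, later-grown layers take over as their voting weights increase.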
III. METHOD

The experiments were carried out following the flow shown in Figure 3. The goal of the ADL-based pattern recognition is to decode five individual fingers from EEG signals. The dataset is the EEG motor imagery dataset for brain-computer interfaces obtained from [8]. The recording sessions we chose are those with five finger-motion mental imagery types, so as to discriminate finger movement imageries. All motor imageries are of a single hand.

The EEG montage consisted of 19 leads configured according to the international 10-20 system, one bipolar lead X3 for data synchronization, and two ground leads, A1 and A2. Although the dataset provides data from subject A to subject M, we only used the four subjects {A, B, C, F} because their EEG signals were recorded at both 200 Hz and 1000 Hz sampling frequencies.

In this study, the features of the EEG signal are the five components of the common spatial pattern (CSP) [9]. The windowing process uses a window length of 1 second and an overlap of 0.75 seconds. The number of data samples for subjects A, B, C, and F is {4974, 4959, 5941, 4947} at 200 Hz and {4853, 4962, 4856, 4944} at 1000 Hz. We used a chunk size of 100 samples, so there are around 40 streaming learning steps for each subject. Before being separated into chunks, the data are shuffled.
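As an illustration of the segmentation described above, the Python sketch below cuts a continuous multichannel EEG recording into 1 s windows with 0.75 s overlap (a 0.25 s step) and groups the resulting feature vectors into chunks of 100 for streaming learning. The CSP projection matrix is assumed to be computed elsewhere (e.g., with the multiclass CSP of [9]) and is only a placeholder argument here; using the log-variance of the projected signals as the feature is the usual CSP convention and an assumption, since the paper only states that five CSP components are used.

    import numpy as np

    def sliding_windows(eeg, fs, win_s=1.0, overlap_s=0.75):
        """Segment an EEG array (channels x samples) into overlapping windows."""
        win = int(win_s * fs)                  # e.g., 200 samples at 200 Hz
        step = int((win_s - overlap_s) * fs)   # e.g., 50 samples at 200 Hz
        starts = range(0, eeg.shape[1] - win + 1, step)
        return np.stack([eeg[:, s:s + win] for s in starts])   # (n_win, ch, win)

    def csp_features(windows, csp_filters):
        """Project each window with precomputed CSP filters (placeholder) and
        take the normalized log-variance of the projections as features."""
        proj = np.einsum('fc,ncs->nfs', csp_filters, windows)  # (n_win, n_csp, win)
        var = proj.var(axis=2)
        return np.log(var / var.sum(axis=1, keepdims=True))    # (n_win, n_csp)

    def make_chunks(features, labels, chunk_size=100, seed=None):
        """Shuffle, then split into chunks of 100 samples for streaming learning."""
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(features))
        X, y = features[order], labels[order]
        n = (len(X) // chunk_size) * chunk_size
        return (X[:n].reshape(-1, chunk_size, X.shape[1]),
                y[:n].reshape(-1, chunk_size))

With roughly 5000 windows per subject, a chunk size of 100 yields on the order of 40 training chunks once a fold of data is held out for testing.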

[Figure 3. Brain-computer interface for predicting the hand movements: EEG data acquisition, common spatial pattern feature extraction, classification using autonomous deep learning, and the predicted finger movement.]
In this paper, we separate the evaluation into subject-dependent and subject-independent tests to examine the robustness of the system. In the subject-dependent test, the model is trained and tested on data from the same subject. In the subject-independent evaluation, one subject's dataset is regarded as the test data and the others as the training data. The model is trained with 5-fold cross-validation with shuffling, under the assumption that a 4:1 ratio of training to testing data is good enough for each fold to represent the aggregate data.
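The two evaluation protocols can be reproduced with scikit-learn. The snippet below is a sketch under the assumption that per-subject feature matrices are already available as NumPy arrays and that make_model() returns any classifier with fit/predict (a placeholder standing in for ADL or the baselines); none of these names come from the original implementation.

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.metrics import accuracy_score

    def subject_dependent_cv(X, y, make_model, n_splits=5, seed=0):
        """5-fold cross-validation with shuffling on one subject (4:1 split)."""
        scores = []
        for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(X):
            model = make_model()
            model.fit(X[tr], y[tr])
            scores.append(accuracy_score(y[te], model.predict(X[te])))
        return np.mean(scores), np.std(scores)

    def subject_independent(data, test_subject, make_model):
        """Train on all subjects except one, test on the held-out subject."""
        X_tr = np.vstack([X for s, (X, y) in data.items() if s != test_subject])
        y_tr = np.hstack([y for s, (X, y) in data.items() if s != test_subject])
        X_te, y_te = data[test_subject]
        model = make_model()
        model.fit(X_tr, y_tr)
        return accuracy_score(y_te, model.predict(X_te))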
For comparison, we used a random forest (RF) with 100 trees and a convolutional neural network (CNN). The CNN comprises six layers: three convolutional layers and three fully connected layers. The input matrix size is 28x28, while the feature sizes of the convolutional layers are 32x32, 64x64, and 64x64 for the first, second, and third layers, respectively.
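For reference, the following Keras sketch builds a CNN with three convolutional and three fully connected layers on a 28x28 input, interpreting the reported 32x32/64x64/64x64 feature sizes as 32, 64, and 64 feature maps; the kernel sizes, dense-layer widths, and optimizer are assumptions, since the paper does not specify them.

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_cnn_baseline(num_classes=5):
        """Sketch of the CNN baseline: 3 conv + 3 dense layers, 28x28 input."""
        model = keras.Sequential([
            keras.Input(shape=(28, 28, 1)),
            layers.Conv2D(32, 3, padding="same", activation="relu"),  # assumed 3x3 kernels
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),   # assumed hidden widths
            layers.Dense(64, activation="relu"),
            layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

The RF baseline with 100 trees corresponds to sklearn.ensemble.RandomForestClassifier(n_estimators=100).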
IV. RESULT AND DISCUSSION

A. Subject-dependent classification

The subject-dependent evaluation was performed on all subjects. Each new model is trained with part of a subject's data and then tested on the remaining data of the same subject. The experimental results are shown in Figure 4 and Figure 5.

[Figure 4. The average testing accuracy per chunk across four subjects using 5-fold cross-validation on data with a sampling frequency of 200 Hz.]

[Figure 5. The average testing accuracy per chunk across four subjects using 5-fold cross-validation on data with a sampling frequency of 1000 Hz.]

Both figures indicate that the EEG pattern recognition based on ADL was able to predict the five individual fingers with an accuracy of around 70 to 80%. The performance of the proposed pattern recognition system stands out further when compared with another deep learning method and a powerful ensemble classifier, i.e., the random forest (RF), as can be seen in Table I to Table IV.

From these tables, we can see that the accuracy of ADL is the worst at the training stage. However, the testing accuracy of ADL is the best, with an average accuracy of around 78% across the four subjects, compared to 53% and 72% for RF and CNN, respectively. Compared to the work of [8], which used the same dataset with a support vector machine (SVM), the ADL-based EEG pattern recognition developed in this article achieves better results. This fact supports the advantage of ADL over the other classifiers.

TABLE I. THE TRAINING ACCURACY (%) OF RF, CNN, AND ADL USING 5-FOLD CROSS-VALIDATION ON 200 HZ DATA

  SUBJECT    RF Mean   RF STD   CNN Mean   CNN STD   ADL Mean   ADL STD
  A          100.00    0.00     99.95      0.01      77.57      0.22
  B          100.00    0.00     99.91      0.07      77.73      0.16
  C          100.00    0.00     99.90      0.07      81.76      0.31
  F          100.00    0.00     99.93      0.03      78.05      0.20
  AVERAGE    100.00    0.00     99.92      0.05      78.78      0.22

TABLE II. THE TESTING ACCURACY (%) OF RF, CNN, AND ADL USING 5-FOLD CROSS-VALIDATION ON 200 HZ DATA

  SUBJECT    RF Mean   RF STD   CNN Mean   CNN STD   ADL Mean   ADL STD
  A          54.63     0.71     71.28      0.69      77.46      0.91
  B          50.91     0.52     74.92      0.49      77.82      0.60
  C          55.56     0.60     70.67      0.76      81.62      1.06
  F          54.11     0.58     76.86      0.76      78.13      0.82
  AVERAGE    53.80     0.60     73.43      0.67      78.76      0.85

TABLE III. THE TRAINING ACCURACY (%) OF RF, CNN, AND ADL USING 5-FOLD CROSS-VALIDATION ON 1000 HZ DATA

  SUBJECT    RF Mean   RF STD   CNN Mean   CNN STD   ADL Mean   ADL STD
  A          100.00    0.00     99.77      0.27      77.61      0.08
  B          100.00    0.00     99.70      0.26      74.73      0.21
  C          100.00    0.00     99.73      0.26      77.57      0.06
  F          100.00    0.00     99.91      0.06      77.19      0.20
  AVERAGE    100.00    0.00     99.78      0.21      76.77      0.14

TABLE IV. THE TESTING ACCURACY (%) OF RF, CNN, AND ADL USING 5-FOLD CROSS-VALIDATION ON 1000 HZ DATA

  SUBJECT    RF Mean   RF STD   CNN Mean   CNN STD   ADL Mean   ADL STD
  A          48.29     1.18     68.56      1.09      77.75      0.33
  B          52.03     0.34     71.53      0.82      74.61      1.02
  C          55.51     0.78     69.50      1.24      77.49      0.27
  F          59.08     0.74     79.26      0.79      77.20      0.77
  AVERAGE    53.73     0.76     72.21      0.99      76.76      0.60

B. Subject-independent classification

In many cases, EEG pattern recognition systems work well on the subject they were trained on; however, the performance becomes poor when a system trained on one subject is tested on different subjects. This section tests the performance of ADL in subject-independent classification. In this experiment, the system was trained using one subject and then tested using the rest of the subjects. The experimental results are depicted in Figure 6 and Figure 7.
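The four scenarios reported in Tables V and VI (e.g., "A vs BCF") can be generated as sketched below, assuming per-subject feature arrays in a dict and a placeholder make_model() standing in for ADL or CNN; as described in this section, the model is trained on a single subject and tested on the concatenation of the remaining three.

    import numpy as np
    from sklearn.metrics import accuracy_score

    def one_vs_rest_scenarios(data, make_model):
        """data: dict like {'A': (X_A, y_A), ...}. For each subject, train on
        that subject only and test on the union of the other subjects."""
        results = {}
        for train_subject in data:
            X_tr, y_tr = data[train_subject]
            others = [s for s in data if s != train_subject]
            X_te = np.vstack([data[s][0] for s in others])
            y_te = np.hstack([data[s][1] for s in others])
            model = make_model()
            model.fit(X_tr, y_tr)
            acc = accuracy_score(y_te, model.predict(X_te))
            results[f"{train_subject} vs {''.join(others)}"] = acc
        return results   # e.g., {'A vs BCF': ..., 'B vs ACF': ..., ...}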
[Figure 6. The testing accuracy per chunk of ADL in subject-independent classification with 5-fold cross-validation on 200 Hz data.]

[Figure 7. The testing accuracy per chunk of ADL in subject-independent classification with 5-fold cross-validation on 1000 Hz data.]

These two figures show that ADL works very well in subject-independent classification. Its performance is similar to that in the subject-dependent classification, with an average accuracy of around 78% over the four scenarios. This shows that ADL is very promising for overcoming the subject-independent classification problem that haunts most bio-signal pattern recognition implementations.

The benefit of ADL is highlighted further when it is compared to CNN, as shown in Table V and Table VI. As in the subject-dependent classification, the training accuracy of CNN is very good, almost 100%. Unfortunately, its testing accuracy is extremely poor, dropping to around 20%. In contrast, ADL presents a promising performance: its accuracy did not decrease and tended to be stable. These experimental results show that ADL is a promising solution for robustness issues in bio-signal pattern recognition, especially in EEG-based systems.

TABLE V. THE CLASSIFICATION ACCURACY (%) OF CNN AND ADL USING 5-FOLD CROSS-VALIDATION ON 200 HZ DATA FOR THE SUBJECT-INDEPENDENT TEST

  Scenario    CNN Training   ADL Training   CNN Testing   ADL Testing
  A vs BCF    99.95          79.15          22.10         77.46
  B vs ACF    99.93          79.10          20.85         77.82
  C vs ABF    99.89          77.83          21.07         81.62
  F vs ABC    99.93          78.96          19.37         78.13
  AVERAGE     99.93 ± 0.03   78.76 ± 0.63   20 ± 1.13     78.76 ± 1.93

TABLE VI. THE CLASSIFICATION ACCURACY (%) OF CNN AND ADL USING 5-FOLD CROSS-VALIDATION ON 1000 HZ DATA FOR THE SUBJECT-INDEPENDENT TEST

  Scenario    CNN Training   ADL Training   CNN Testing   ADL Testing
  A vs BCF    99.96          76.43          20.26         77.75
  B vs ACF    99.59          77.46          20.02         77.82
  C vs ABF    99.85          76.51          20.50         81.62
  F vs ABC    99.95          76.55          20.96         78.13
  AVERAGE     99.84 ± 0.17   76.74 ± 0.48   20.43 ± 0.40  78.83 ± 1.87

V. CONCLUSION

This article proposes a new method to overcome the performance degradation in subject-independent classification by introducing a brain-computer interface (BCI) based on autonomous deep learning (ADL). One of the main advantages of ADL is its ability to construct its structure by itself and adapt to changes in the input behavior. The proposed BCI consists of CSP features and ADL, and its goal is to classify five individual fingers based on EEG signals. The experimental results on subject-dependent classification show that ADL outperforms the random forest and CNN by producing stable accuracies of about 78% on both training and testing. In subject-independent classification, the virtue of ADL stands out even more compared to CNN: ADL did not experience performance degradation, while CNN experienced a performance degradation of about 50%.

REFERENCES

[1] D. Bright, A. Nair, and D. Salvekar, "EEG-based brain controlled prosthetic arm," pp. 479-483, 2016.
[2] Irwanto, E. R. Kasim, A. Fransiska, M. Lusli, and O. Siradj, "Analisis Situasi Penyandang Disabilitas di Indonesia: Sebuah Desk-Review," p. 32, Nov. 2010.
[3] A. Ashfahani and M. Pratama, "Autonomous deep learning: Continual learning approach for dynamic environments," in Proc. SIAM Int. Conf. Data Mining (SDM 2019), pp. 666-674, 2019.
[4] Y. R. Tabar and U. Halici, "A novel deep learning approach for classification of EEG motor imagery signals," J. Neural Eng., vol. 14, no. 1, p. 16003, 2017.
[5] M. Pratama, M. De Carvalho, E. Lughofer, and J. Lu, "ATL: Autonomous knowledge transfer from many streaming processes," 2019.
[6] S. Fazli, C. Grozea, M. Danóczy, F. Popescu, B. Blankertz, and K. R. Müller, "Subject independent EEG-based BCI decoding," in Adv. Neural Inf. Process. Syst. 22 (Proc. 2009 Conf.), pp. 513-521, 2009.
[7] A. M. Ray et al., "A subject-independent pattern-based brain-computer interface," Front. Behav. Neurosci., vol. 9, pp. 1-15, Oct. 2015.
[8] M. Kaya, M. K. Binli, E. Ozbay, H. Yanar, and Y. Mishchenko, "A large electroencephalographic motor imagery dataset for electroencephalographic brain computer interfaces," Sci. Data, vol. 5, no. 1, p. 180211, Dec. 2018.
[9] H. Meisheri, N. Ramrao, and S. Mitra, "Multiclass common spatial pattern for EEG based brain computer interface with adaptive learning classifier," pp. 1-8, 2018.
