Motor Imagery Classification
2 Related Work
Convolutional neural networks have proved to be powerful feature extractors in computer vision, NLP, and even time series problems. In the field of motor imagery classification as well, CNNs have been the first choice of many. Research at the US Army Research Laboratory introduced a compact convolution-based motor imagery classifier and achieved state-of-the-art results. Causal temporal networks are CNN-based neural networks known to mimic the nature of recurrent models, focusing on only one part of the input at a time. Multiscale CNNs (citation here) have used both spatial- and temporal-domain convolutions to generate a rich feature bank and classify based on the features generated. EEG signals have spatial correlations among each other. Apart from extracting common spatial patterns (CSP) (citation here), projecting the data into the Riemannian space extracts the correlations amongst the input channels by generating an SPD matrix. Learning these correlations in this manifold space has proved to be quite useful, especially when a smaller number of training samples is present, and has also proven better than traditional methods for dimensionality reduction. Deep belief networks, also popular for feature extraction, provide the features in their encoded stage.
Figure 1: Time series plots of two datapoints from the same class.
Figure 2: Time series plots of each class; from the top left in clockwise direction: class 769, class 770, class 771, class 772.
Figure 4: Time series plots of two datapoints from the same class.
Channel Weighting.
The pre-processed data is then fed into the LSTM network for classification.
I took two approaches to this paper: first, raw data without any pre-processing (to get a baseline score), and second, the method proposed in the paper. I was not able to implement the channel-weighting step, therefore I skipped it and implemented the rest as it is.
Pre-processed data: 38% average accuracy, with 45% being the best.
Figure 7: Visualization of a Temporal Causal Convolution.
4 Proposed Models
The sections henceforth discuss the models implemented and explored. The methods adopted draw motivation from the field of computer vision as well as from representation learning.
where ∗ is the cross-correlation function given by

    (f ∗ g)[n] = Σ_{m=0}^{N−1} f[m] g[m + n]

where f ∈ R^n, g ∈ R^m, and θ, ρ, γ are the first, second, and third feature maps respectively. The obtained feature maps are then stacked onto each other, with θ on the top and γ on the bottom. The resultant feature vector is of the shape N ∗ 3 ∗ K ∗ t. Note that there is no non-linearity applied between any of the obtained feature maps. This new feature vector becomes the input for the residual network.
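The feature-map stacking described above can be sketched as follows. This is a hypothetical illustration: the 22 input channels, K = 64 filters per map, and kernel size 7 are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

class StackedFeatureMaps(nn.Module):
    """Produce three feature maps (theta, rho, gamma) and stack them
    into an (N, 3, K, t) tensor, with no non-linearity between maps."""

    def __init__(self, in_channels=22, k=64):
        super().__init__()
        # 'same'-style padding keeps the time dimension t unchanged,
        # so the three maps can be stacked along a new axis.
        self.theta = nn.Conv1d(in_channels, k, kernel_size=7, padding=3)
        self.rho   = nn.Conv1d(in_channels, k, kernel_size=7, padding=3)
        self.gamma = nn.Conv1d(in_channels, k, kernel_size=7, padding=3)

    def forward(self, x):                                   # x: (N, in_channels, t)
        maps = [self.theta(x), self.rho(x), self.gamma(x)]  # each (N, K, t)
        return torch.stack(maps, dim=1)                     # (N, 3, K, t)

feats = StackedFeatureMaps()(torch.randn(4, 22, 313))
print(feats.shape)  # torch.Size([4, 3, 64, 313])
```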
The residual network contains three residual units, with the number of channels increasing from 64 to 256 and finally 512. There is a max-pooling layer after each residual block with kernel size 5 and stride 2. This is followed by a global max pool and a linear classifier layer. The residual block can be defined by the following set of equations, where n is the number of channels:
    F(N ∗ n ∗ K ∗ t) = β(n) + Σ_{0}^{n} ω(n) ∗ x(n)
    F(N ∗ n ∗ K ∗ t) = ReLU(F(N ∗ n ∗ K ∗ t))
    F(N ∗ n ∗ K ∗ t) = BatchNorm(F(N ∗ n ∗ K ∗ t))
    r(n) = F(N ∗ n ∗ K ∗ t)
    Output = r(n) + x(n)
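A minimal sketch of one such residual unit, following the equations above (convolution with weights ω and bias β, ReLU, batch normalization, then the skip connection). The channel count and kernel size here are assumptions.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """One residual unit: F = BatchNorm(ReLU(conv(x))), Output = F + x."""

    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # the conv carries the omega (weights) and beta (bias) terms
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):                      # x: (N, n, K, t)
        f = self.bn(torch.relu(self.conv(x)))  # r(n)
        return f + x                           # Output = r(n) + x(n)

out = ResidualUnit()(torch.randn(2, 64, 16, 32))
print(out.shape)  # torch.Size([2, 64, 16, 32])
```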
4.1.4 Results
Following the above training paradigm, this section discusses the four-class classification. I also compare a 2-way classification, which sheds light on per-class performance.
Subject    Accuracy (4-way, %)
Subject 1 77.5
Subject 2 65
Subject 3 67.5
Subject 4 63.75
Subject 5 74
Subject 6 67.5
Subject 7 71.25
Subject 8 67.5
Subject 9 82.5
4.1.5 Conclusion
This section developed stacking feature maps and serving them as input to a residual network. The network contains one layer for upscaling and 3 layers for extracting feature maps, followed by residual blocks, a downsampling layer, global max pooling, and lastly a fully connected layer for classification, with a total of 24 layers. Using a cyclic learning rate enables the model to navigate the loss surface and take appropriately large steps. Contrary to conventional approaches, where motor imagery classification relies heavily on data preprocessing, this method does not make use of data preprocessing or feature engineering whatsoever. The model can be made more robust by increasing the receptive field through stacking more feature maps.
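The cyclic learning rate mentioned above can be set up with PyTorch's built-in scheduler; the base_lr, max_lr, and step_size_up values below are illustrative assumptions, not the values used in the experiments.

```python
import torch

# A toy model and optimizer just to drive the scheduler.
model = torch.nn.Linear(10, 4)
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=1e-4, max_lr=1e-2, step_size_up=50, mode="triangular")

lrs = []
for _ in range(100):
    opt.step()            # in real training this follows loss.backward()
    sched.step()
    lrs.append(sched.get_last_lr()[0])

# the learning rate ramps up to max_lr over 50 steps, then back down
print(f"lr range: {min(lrs):.2g} to {max(lrs):.2g}")
```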
4.2.2 Introduction
A representation l of dimension d of an input signal x is a good representation if x ≈ ϕ(l), where the decoder network ϕ recreates the signal with minimum deviation from the input. This representation l is considered a feature vector, a signature on the basis of which classification can be done. Apart from manual channel selection and various dimensionality-reduction and feature-extraction techniques like principal or independent component analysis, the powerful Boltzmann machines have also been used to extract a feature vector by recreating the same signal using their greedy training approach. Symbolic aggregation methods, which are based on regression techniques, are also popular in the field of time series analysis. Convolutional autoencoders have proven very useful in computer vision, in tasks like image denoising and anomaly detection, and in complex tasks like single-image super-resolution. They are generative networks with an unsupervised manner of learning where the input is first encoded and then reconstructed from the encoded state. This encoded state can be considered a signature representative of that particular class, upon which classification is performed. EEG signals are long and multi-channeled in nature. It is imperative that the feature vector captures both the spatial and the temporal relations between the channels. Here, I propose a deep layered convolutional autoencoder with skip connections, which encodes the given EEG signal into a 1-dimensional representation. This 1-dimensional encoded feature vector serves as the input to a classification model.
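A minimal sketch of such a convolutional autoencoder with a skip connection, assuming 22 channels and 313 timesteps; the layer sizes and the latent dimension of 128 are assumptions, not the exact architecture used.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """Encode a (22, 313) EEG trial to a 1-D latent vector and reconstruct it;
    one encoder feature map is passed to the decoder as a skip connection."""

    def __init__(self, channels=22, t=313, latent=128):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv1d(channels, 64, 5, stride=2, padding=2), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(64, 128, 5, stride=2, padding=2), nn.ReLU())
        t2 = (t + 1) // 2                       # length after first stride-2 conv
        t4 = (t2 + 1) // 2                      # length after second stride-2 conv
        self.to_latent = nn.Linear(128 * t4, latent)
        self.from_latent = nn.Linear(latent, 128 * t4)
        self.dec2 = nn.Sequential(nn.ConvTranspose1d(128, 64, 5, stride=2, padding=2), nn.ReLU())
        self.dec1 = nn.ConvTranspose1d(64, channels, 5, stride=2, padding=2)
        self.t2, self.t4 = t2, t4

    def forward(self, x):                       # x: (N, 22, 313)
        e1 = self.enc1(x)                       # (N, 64, 157)
        e2 = self.enc2(e1)                      # (N, 128, 79)
        z = self.to_latent(e2.flatten(1))       # 1-D signature: (N, latent)
        d2 = self.from_latent(z).view(-1, 128, self.t4)
        d1 = self.dec2(d2)[..., :self.t2] + e1  # skip connection from encoder
        out = self.dec1(d1)[..., :x.shape[-1]]
        return out, z

x = torch.randn(2, 22, 313)
recon, sig = ConvAE()(x)
print(recon.shape, sig.shape)  # torch.Size([2, 22, 313]) torch.Size([2, 128])
```

Training would minimize the mean squared error between `recon` and `x`; the signature `sig` is then used for classification.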
Figure 10: A high level structure of the Encoder Decoder Architecture
followed.
where x represents the input, t the time step, u the update vector, and r the reset gate, which decides how much past information is allowed to flow. The amount of information allowed to pass is given by

    u(t) = σ(b + U x(t) + W h(t − 1))

The reset and the update gates can individually ignore parts of the state vector. The overall architecture consists of a GRU with a hidden size of 512, unrolled over the 313 timesteps, followed by a linear layer for classification. The encoded data is first normalized using layer normalization and then fed into the network.
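The classifier described above can be sketched as follows; the input feature size of 22 per timestep is an assumption.

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    """Layer-normalize the sequence, run a GRU (hidden size 512) over the
    313 timesteps, and classify the final hidden state with a linear layer."""

    def __init__(self, in_features=22, hidden=512, classes=4):
        super().__init__()
        self.norm = nn.LayerNorm(in_features)
        self.gru = nn.GRU(in_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, classes)

    def forward(self, x):             # x: (N, 313, in_features)
        x = self.norm(x)
        _, h = self.gru(x)            # h: (1, N, hidden), the last hidden state
        return self.fc(h.squeeze(0))  # class logits: (N, classes)

logits = GRUClassifier()(torch.randn(2, 313, 22))
print(logits.shape)  # torch.Size([2, 4])
```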
Figure 11: A plot across timesteps showing the MSE between the original and reconstructed signals.
    y = W · x + b
    out = PReLU(y)
4.2.8 Results
All the above methods rely on the quality of the data reconstruction and on how distinct the signature of one class is from the others. The autoencoder achieves a mean squared error of 0.004 on the test dataset, thus achieving a satisfactory quality of reconstruction. The plots show the difference between the generated and the original data across all 22 EEG channels over the 313 timesteps present; these plots were generated from randomly chosen data in the test dataset (Figure 11). Figure 12 visualizes the signatures obtained for the various classes. As seen from the plots, the one-dimensional feature vectors are distinct and achieve their peaks at different times by significantly different amounts, so a good decision boundary can be expected.
Figure 12: Signatures obtained; from the top left in clockwise direction: class 769, class 770, class 771, class 772.
Subject Accuracy
Subject 1 62.5
Subject 2 60
Subject 3 65
Subject 4 65
Subject 5 55
Subject 6 79
Subject 7 70
Subject 8 63
Subject 9 70
Subject Accuracy
Subject 1 67.5
Subject 2 60
Subject 3 65
Subject 4 65
Subject 5 70
Subject 6 67.5
Subject 7 70
Subject 8 63
Subject 9 72.5
Subject Accuracy
Subject 1 40
Subject 2 35
Subject 3 50
Subject 4 23
Subject 5 45
Subject 6 23
Subject 7 40
Subject 8 39
Subject 9 50
4.3.2 Introduction
Graph neural networks are among the most powerful deep learning techniques: they can work on non-Euclidean data as well as in Euclidean space if modelled correctly. Many applications, such as social-network analysis, graph analysis, point-cloud segmentation, and molecular structures, cannot be represented as a vector and have to be modelled as a graph to correctly capture the relations amongst all the elements in the data. Graph neural networks are also useful for manifold classification. EEG signals can be visualized in the same way, with the EEG channels connected by edges. Graph convolution networks can capture, at once, the structure of the channels spread across the EEG headset and the signals being generated throughout time.
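One graph-convolution step over such a channel graph can be sketched as below, in the Kipf–Welling style of normalized propagation. The adjacency matrix here is randomly generated for illustration only, not the real electrode layout.

```python
import torch

def gcn_layer(x, adj, weight):
    """Mix node features through the normalized adjacency
    A_hat = D^{-1/2} (A + I) D^{-1/2}, then apply a linear transform."""
    a = adj + torch.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = a.sum(1).pow(-0.5)              # D^{-1/2} (degrees >= 1)
    a_hat = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    return torch.relu(a_hat @ x @ weight)        # propagate and transform

nodes, feats, out_feats = 22, 313, 64            # 22 channels, 313 timesteps
x = torch.randn(nodes, feats)                    # one trial: a signal per channel
adj = (torch.rand(nodes, nodes) > 0.7).float()   # hypothetical channel graph
adj = ((adj + adj.T) > 0).float()                # make it symmetric
w = torch.randn(feats, out_feats) * 0.01
h = gcn_layer(x, adj, w)
print(h.shape)  # torch.Size([22, 64])
```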
Figure 13: Topological graphs of the various classes.
Subject Accuracy
Subject 1 70
Subject 2 82.3
Subject 3 82.5
Subject 4 75
Subject 5 75
Subject 6 77.5
Subject 7 81
Subject 8 77
Subject 9 81
5 Conclusion
Over the course of six months, I studied a wide variety of deep learning techniques for the classification of EEG signals. Broadly, the residual convolutional method was mostly inspired by the field of computer vision, specifically video classification. Learning representations of the EEG signals was inspired by the domain of natural language processing: the word2vec models enabled us to discover new avenues. This got me thinking that for a network to recreate data from the same class, there has to be a distinctive signature for it, and classifying that signature would be much easier. While studying Riemannian techniques and manifold learning, I read about how non-Euclidean data can be distinguished using the sophisticated field of geometric learning. This gave me the idea to capture the EEG data as a graph and "view" the whole topology at once.

The main objective of all the methodologies was to avoid manual feature selection and pre-processing as much as possible, with the ideology of "Let the network figure it out." There lie many unexplored avenues even in the above-mentioned approaches; for example, the time2vec model is more sophisticated at representation learning than the autoencoder method described here. In the residual convolutional architecture, the receptive field can be increased by stacking more feature maps, which can further improve the performance of the model. The balance between the complexity of the model and the amount of data available was handled tactfully, for example by increasing the receptive field; the results obtained were satisfactory, and some accuracies obtained on this particular dataset were higher than those of the other methodologies I have read about.

On a personal note, the previous six months have been full of learning and exploring new concepts in machine learning, all the places they can be applicable, and how they could be implemented in the field of time series classification. I will continue to find and implement techniques relevant to motor imagery classification as I come across them. My sincere thanks for providing me with this wonderful opportunity.