
2018 17th IEEE International Conference on Machine Learning and Applications

Automatic Seizure Detection via an Optimized Image-based Deep Feature Learning
Ibrahim Alkanhal (ialkanha@cmu.edu), B.V.K Vijaya Kumar (kumar@ece.cmu.edu), Marios Savvides (marioss@andrew.cmu.edu)
Department of Electrical and Computer Engineering, Carnegie Mellon University
Pittsburgh, USA

Abstract: In this paper, our goal is to find an optimized approach that can learn features from multichannel EEG time-series data to perform automatic seizure detection. In general, it is not easy to learn robust features from EEG signals because of both intra-patient and inter-patient variability. To achieve good generalization, we use an algorithm that tries to capture spectral, temporal and spatial information, in contrast to standard EEG analysis techniques that ignore spatial aspects. The first stage of this algorithm transforms the EEG signals into a sequence of topology-preserving multi-spectral and temporal images. These generated images are then fed as inputs to a convolutional neural network. By overcoming the lack of data, especially positive samples, creating a process to deal with unbalanced datasets, and optimizing the complexity of the network, our convolutional neural network learns a general spatially invariant representation of a seizure in a reasonable time and improves sensitivity, specificity and accuracy, with results comparable to the state of the art.

I. INTRODUCTION

Epilepsy is a neurological disorder characterized by abnormal or excessive episodes of neuronal signals inside the brain. These abnormal brain activities are called seizures and can occur either with partial onset, which comes from a specific brain region, or with general onset, which comes from the brain as a whole. There are many side effects associated with seizures, such as loss of consciousness and abnormal movements. As a result, seizure patients require varying medical care that depends on their situation. In the chronic phase, medications are taken on a daily basis to prevent further seizures. For focal seizures, however, a surgical procedure can be applied. Hence, in order to give suitable treatment, we require accurate detection of seizures. Furthermore, seizure detection has many other uses, such as monitoring patients who have had surgical resection or are under treatment [1][8]. The standard diagnosis technique for epilepsy is to monitor brain activity through electroencephalograms (EEG). However, EEG readings must be analyzed by highly trained medical professionals, who may need several hours to analyze one day of recording from a single patient. Therefore, it is beneficial to develop a technique that can automatically detect seizures from the EEG recordings. Most work done to date uses expert hand-crafted features and relies on either spectral or temporal information. However, it is well known that epileptic seizures are highly non-stationary phenomena. Recent research has shown that using deep learning in such an application can improve the results [1][15][16]. However, many problems still face the use of machine learning algorithms in general, and deep learning approaches in particular, in medical applications. One of these problems is the lack of data, especially positive samples, which leads to unbalanced datasets and makes the training process harder.

The goal of this research effort is to optimize existing designs of automatic seizure detectors that can learn spectral, temporal and spatial features from multichannel EEG time-series data.

II. RELATED WORK

There exist only a few publicly available EEG seizure datasets. In this paper we consider works using the biggest freely available dataset (CHB-MIT) and focus on offline seizure detection, which involves analyzing and labeling patients' recordings for the purposes of diagnosis, monitoring, or treatment planning. The main use of offline detectors is to replace the laborious visual analysis, or at least to make that job easier [1]. Most work done in this field is concerned either with training detectors for a specific patient or with using cross-patient data. For patient-specific detectors, recent research used machine learning algorithms, and the benchmark result was an SVM classifier that achieved a sensitivity of 0.96 and a low false positive rate of 0.08/hour [2]. Similar results have been obtained recently using a recurrent convolutional neural network that used an image representation of the raw EEG signals [1]. However, because cross-patient detectors are more general and can be used without an expert to label patient-specific data, these patient-specific results tend to be less important than the cross-patient ones. On the other hand, it is more challenging to design good cross-patient detectors due to the variability from patient to patient. The benchmark cross-patient result was achieved using a recurrent convolutional neural network that used an image representation of the raw EEG signals, with an average sensitivity of 0.85, a specificity of 0.83 and a false detection rate of 0.77/hour on the CHB-MIT dataset [3]. Recently, another deep neural network approach showed good performance on only part of the dataset and achieved an average sensitivity of 0.812 and a false detection rate of 0.16/hour [4]. Converting raw EEG time-series signals into an image representation as a sequence of topology-preserving multi-spectral images and then using it in a deep neural network was first proposed for a task focused on mental load classification [5].

978-1-5386-6805-4/18/$31.00 ©2018 IEEE
DOI 10.1109/ICMLA.2018.00086
Fig. 1. Overview of our approach: (1) multichannel EEG time series are obtained; (2) construct topographical maps for each time frame by applying FFT;
(3) sequence of 3-channel images are fed into a CNN for representation learning and classification.

In (Thodoroff et al., 2016), deep learning showed promising performance on seizure detection; however, many challenges still exist. In this paper, we aim to deal with two main challenges. The first is overcoming the lack of data, especially positive samples, and creating a process to deal with unbalanced datasets. The second is optimizing the complexity of the network so that the time consumed by the training process can be minimized [1].

III. METHODOLOGY

In this paper, we investigate a deep learning approach for seizure detection from the EEG signal. As shown in Figure 1, the proposed method consists of two main parts. The first converts raw EEG time-series signals into an image representation as a sequence of topology-preserving multispectral images. After that, we construct a convolutional neural network trained to capture spectral, temporal and spatial patterns to predict whether the image contains a seizure or not. We develop a unique optimized training technique and apply it to the developed network.

A. Design Description

Making Images from EEG Time-series Signals: EEG signals consist of multiple time series that represent the measurements over different spatial cortex locations. As with many time series, the most salient features occur in the frequency domain and can be analyzed using the spectrogram of the signal. In our case, the EEG signal has an additional spatial dimension [5]. In order to represent both spatial and spectral features, we follow several steps. First, we project the 3D coordinates of the patient's 21 electrodes onto a 2D surface using Polar Projection, also called Azimuthal Equidistant Projection (AEP), in order to preserve the distance between electrodes in 3D space, as shown in Figure 2 [11][12]. Then, we apply a Fast Fourier transform over a thirty-second time window and calculate power values in different frequency bands (1-7, 7-14, 14-49 Hz) for each electrode projection, as shown in Figure 3. Finally, to create a continuous image, we interpolate the values of each electrode projection using cubic interpolation with the Clough-Tocher scheme [5]. This scheme is used to interpolate the scattered power measurements over the scalp and to estimate the values between the electrodes over the 16x16 area. This procedure creates images of shape (3x16x16). Each image has three color channels (one for each frequency band) with a height and width of 16 pixels.

Fig. 2. Making images from EEG time-series signals. This process follows two paths: 1) Converting 3D electrode locations into a 2D representation. 2) Applying three different frequency bands over the time series.

Fig. 3. Converting 3D electrode locations into a 2D representation.

Constructing Convolutional Neural Network: We construct a convolutional neural network (CNN) to deal with the inherent structure of EEG data, as shown in Figure 4. CNNs have been shown to provide good representations of image data; therefore, we use one here to work with the image representation of the spectral, spatial and temporal features [6]. In this part, we use a CNN that is similar to the well-known AlexNet used in the ImageNet classification challenge [6]. As shown in Figure 5, the network configuration consists of ten layers (two conv3-32 layers, two conv3-64 layers, two conv3-128 layers, three max pooling layers, one fully connected layer and a final softmax layer). We adapted the network used in the benchmark paper and added two more conv3-128 layers and one max pooling layer in order to handle the large amount of data in the training set [1]. In addition, dropout with a probability of 0.5 was used on the last fully connected layer to avoid the overfitting that may happen due to the presence of a binary classification problem. For the convolutional layers we padded the input with 1 pixel to preserve the spatial resolution after convolution, and the number of kernels is doubled in each following stack. Also, we used small receptive fields of size 3x3 and a stride of 1 pixel with the ReLU activation function. The max-pooling layers, with a 2x2 window and a stride of 2 pixels, perform down-sampling [9][10].
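The layer arithmetic in the CNN description can be checked with a short shape trace. This is a minimal sketch, not the authors' implementation: the ordering of the layers (two convolutions followed by one pooling layer, repeated three times) is an assumption, and the names `LAYERS`, `conv_size`, `pool_size` and `trace_shapes` are hypothetical. Only the 3x16x16 input, the 3x3 kernels with 1-pixel padding and stride 1, and the 2x2 pooling with stride 2 come from the text.

```python
# Layer stack as listed in the text; the ordering (two convs, then a pool,
# repeated three times) is an assumption, not confirmed by the paper.
LAYERS = [
    ("conv", 32), ("conv", 32), ("pool", None),
    ("conv", 64), ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
]


def conv_size(size, kernel=3, stride=1, pad=1):
    """Output height/width of a convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_size(size, window=2, stride=2):
    """Output height/width of max pooling."""
    return (size - window) // stride + 1


def trace_shapes(channels=3, size=16):
    """Return the (channels, height, width) after each layer and the conv parameter count."""
    shapes = [(channels, size, size)]
    n_params = 0
    for kind, width in LAYERS:
        if kind == "conv":
            n_params += (3 * 3 * channels + 1) * width  # weights plus one bias per kernel
            channels, size = width, conv_size(size)
        else:
            size = pool_size(size)
        shapes.append((channels, size, size))
    return shapes, n_params


shapes, n_params = trace_shapes()
for shape in shapes:
    print(shape)
flat_features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("inputs to the fully connected layer:", flat_features)
print("convolutional parameters:", n_params)
```

Under this assumed ordering, the padded 3x3 convolutions keep the spatial size and each pooling layer halves it (16, 8, 4, 2), so the fully connected layer would receive 128 x 2 x 2 = 512 values.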
Fig. 4. Deep convolutional neural network for image classification problem.
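The three image-construction steps described under "Making Images from EEG Time-series Signals" (azimuthal equidistant projection, band power from an FFT, cubic interpolation onto a 16x16 grid) could look roughly like the sketch below. It rests on several assumptions that are not stated in the paper: a 256 Hz sampling rate, band power taken as the mean squared FFT magnitude, and synthetic electrode positions and signals in the demo. The function names are hypothetical. SciPy's `griddata` with `method="cubic"` uses a Clough-Tocher interpolant for 2D data.

```python
import numpy as np
from scipy.interpolate import griddata

BANDS_HZ = ((1, 7), (7, 14), (14, 49))  # frequency bands named in the text


def azimuthal_equidistant(xyz):
    """Project (n, 3) electrode positions to 2D, keeping angular distance from the vertex."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    elevation = np.arctan2(z, np.hypot(x, y))
    azimuth = np.arctan2(y, x)
    rho = np.pi / 2 - elevation  # angle measured from the top of the head
    return np.column_stack((rho * np.cos(azimuth), rho * np.sin(azimuth)))


def band_powers(window, fs):
    """window: (n_channels, n_samples). Returns (n_channels, 3) mean power per band."""
    power = np.abs(np.fft.rfft(window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(window.shape[1], d=1.0 / fs)
    cols = [power[:, (freqs >= lo) & (freqs < hi)].mean(axis=1) for lo, hi in BANDS_HZ]
    return np.column_stack(cols)


def window_to_image(window, xyz, fs, size=16):
    """Return a (3, size, size) image: one interpolated scalp map per band."""
    xy = azimuthal_equidistant(xyz)
    powers = band_powers(window, fs)
    gx, gy = np.mgrid[xy[:, 0].min():xy[:, 0].max():size * 1j,
                      xy[:, 1].min():xy[:, 1].max():size * 1j]
    # Pixels outside the convex hull of the electrodes are filled with 0.
    maps = [griddata(xy, powers[:, b], (gx, gy), method="cubic", fill_value=0.0)
            for b in range(len(BANDS_HZ))]
    return np.stack(maps)


# Demo on synthetic data (NOT real electrode coordinates or EEG).
rng = np.random.default_rng(0)
elev = rng.uniform(0.1, 1.4, 21)
azim = rng.uniform(0.0, 2 * np.pi, 21)
electrodes = np.column_stack((np.cos(elev) * np.cos(azim),
                              np.cos(elev) * np.sin(azim),
                              np.sin(elev)))
fs = 256  # assumed sampling rate in Hz
t = np.arange(30 * fs) / fs  # one thirty-second window
window = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal((21, t.size))
image = window_to_image(window, electrodes, fs)
print(image.shape)
```

In the demo, a 10 Hz test tone puts most of its energy into the middle (7-14 Hz) channel of the image, which is a quick sanity check on the band boundaries.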
IV. DATASET AND DATA PREPROCESSING

In this paper, we use the biggest freely available dataset for seizure patients (CHB-MIT) [2]. This dataset consists of several EEG recordings grouped into 23 cases, which were collected from 22 subjects: 5 males, ages 3-22, and 17 females, ages 1.5-19. These recordings total more than 951 hours, which is equal to about 3.4 million seconds. One of the challenges is that the channel formatting varies from one patient to another. Therefore, we need to deal with each case separately to obtain the power values over 21 channels for each patient. Another challenge related to the dataset is the labeling process, which we had to do manually for all three and a half million seconds.

V. TRAINING

Our goal in the training process is to find the weight values that minimize the loss generated by the softmax function. We used the Adam algorithm, which has been shown to give a fast convergence rate when training neural networks, with a learning rate of 0.001 [13]. To overcome the overfitting problem, we used several approaches that are often used to address this issue. First, we used dropout with a probability of 0.5 in the last fully connected layer of the second configuration [14]. Furthermore, by subsampling the negative samples of the dataset we rebalance the ratio between non-seizure and seizure data, thus delaying overfitting and generating better validation and testing results.

A. Sampling procedure

In this part, we followed two types of sampling procedure. First, by randomly subsampling the negative samples of the dataset we rebalance the ratio between non-seizure and seizure data (from 950/3 to 80/20) [1]. This method generates acceptable outcomes compared to the benchmark results, as shown in Table 1. Since the sampling is random, we consider the best result out of ten trials. Second, to improve these results, we followed a trial-and-error technique to come up with an appropriate sampling procedure to replace the random sampling process. As shown in Table 1, different combinations of what we call Not seizure, Pre-seizure, Pro-seizure and Pure seizure can lead to different results. We define these terms as follows:

Not seizure: Signals with no seizure.
Pre-seizure: Signals with no seizure that occur within the five seconds before a seizure happens.
Pro-seizure: Signals within the first or the last five seconds of a seizure.
Pure seizure: Seizure signals other than the first and last five seconds.

We kept the ratio between negative and positive samples the same as in the random sampling (80/20). However, the percentages shown in Table 1 represent the ratio of each class separately.
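The four-class sampling procedure can be illustrated with a small sketch: label every second of a recording, then draw a training subset whose class mix follows a chosen row of Table 1. Everything here is an assumption for illustration: the helper names `label_seconds` and `sample_by_ratio`, the short class keys, the seizure intervals and the subset size are all hypothetical, and the paper does not state how the authors implemented the draw.

```python
import random


def label_seconds(n_seconds, seizures, margin=5):
    """Assign one of the four classes to every second of a recording.

    seizures: list of (onset, offset) in seconds, offset exclusive.
    """
    labels = ["not"] * n_seconds
    for onset, _ in seizures:  # the seconds just before each onset
        for t in range(max(0, onset - margin), onset):
            labels[t] = "pre"
    for onset, offset in seizures:  # seizure seconds override "pre"
        for t in range(onset, offset):
            at_edge = t < onset + margin or t >= offset - margin
            labels[t] = "pro" if at_edge else "pure"
    return labels


def sample_by_ratio(labels, percents, total, seed=0):
    """Draw about `total` indices so that each class gets its share in `percents`."""
    rng = random.Random(seed)
    pools = {}
    for index, label in enumerate(labels):
        pools.setdefault(label, []).append(index)
    chosen = []
    for label, percent in percents.items():
        wanted = round(total * percent / 100)
        pool = pools.get(label, [])
        # A class with too few samples contributes everything it has.
        chosen.extend(rng.sample(pool, min(wanted, len(pool))))
    return sorted(chosen)


# Synthetic 1000-second recording with two made-up seizures.
labels = label_seconds(1000, [(300, 340), (700, 730)])
best_mix = {"not": 50, "pre": 30, "pro": 10, "pure": 10}  # the 50/30/10/10 row of Table 1
train_idx = sample_by_ratio(labels, best_mix, total=20)
counts = {name: sum(labels[i] == name for i in train_idx) for name in best_mix}
print(counts)
```

With these inputs the subset contains 10 Not seizure, 6 Pre-seizure, 2 Pro-seizure and 2 Pure seizure seconds, which keeps the overall 80/20 split between negative and positive samples.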

Fig. 5. Convolutional neural network configuration. Ten layers (Convolution + max pooling), followed by a prediction layer (Softmax) for class prediction

                  Training Dataset Format
  Non-seizure (80%)               Seizure (20%)
Not seizure %  Pre-seizure %  Pro-seizure %  Pure seizure %  Sensitivity %  Specificity %  False Alarm (/hour)
Randomly       Randomly       Randomly       Randomly        84.14          85.5           0.78
40             40             10             10              85.7           83.4           0.8
20             60             5              15              83.9           81.6           0.82
60             20             15             5               87.1           82.3           0.81
30             50             10             10              84.2           79.4           0.84
50             30             10             10              87.95          86.5           0.75
20             60             10             10              85.8           83.6           0.8
60             20             10             10              86.6           81.3           0.82
30             50             15             5               85.3           80.5           0.83
50             30             5              15              86.4           85.8           0.78
TABLE I
TEST RESULTS OF DIFFERENT COMBINATIONS OF TRAINING DATASET FORMAT.
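The three measures reported in Table I can be computed from per-second labels and predictions as sketched below. The paper does not define how a false alarm is counted, so the rule used here (one false alarm per run of consecutive false-positive seconds) is an assumption, as are the function name `detection_metrics` and the synthetic example.

```python
def detection_metrics(y_true, y_pred, seconds_per_sample=1.0):
    """y_true, y_pred: equal-length sequences of 0/1 labels in time order."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)

    # Assumption: a "false alarm" is the start of a run of consecutive
    # false-positive samples, so one long false detection counts once.
    false_alarms = 0
    previous_fp = False
    for t, p in zip(y_true, y_pred):
        current_fp = bool(p) and not t
        if current_fp and not previous_fp:
            false_alarms += 1
        previous_fp = current_fp

    hours = len(y_true) * seconds_per_sample / 3600.0
    return {
        "sensitivity": tp / (tp + fn),   # fraction of seizure samples detected
        "specificity": tn / (tn + fp),   # fraction of non-seizure samples rejected
        "false_alarms_per_hour": false_alarms / hours,
    }


# Synthetic one-hour example: 3590 non-seizure seconds, then a 10-second seizure.
y_true = [0] * 3590 + [1] * 10
y_pred = [0] * 3600
for i in (100, 101, 2000):       # two separate false detections
    y_pred[i] = 1
for i in range(3592, 3600):      # seizure caught after a 2-second delay
    y_pred[i] = 1
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

In this example the sensitivity is 0.8 (8 of 10 seizure seconds), the specificity is 3587/3590, and there are 2 false alarms per hour. The function assumes the recording contains at least one seizure sample and one non-seizure sample; otherwise the ratios are undefined.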

Method                 Classifier                               Sensitivity %  False Alarm (/hour)
Proposed Method        Convolutional neural network             87.95          0.75
Thodoroff et al. [1]   Recurrent convolutional neural network   85.16          0.8
Bolagh et al. [3]      Riemannian Manifold                      85.77          0.77
Wilson et al. [7]      REVEAL                                   67             1.7
TABLE II
TEST RESULTS FOR OUR APPROACH COMPARED TO THE BENCHMARK RESULTS.

Fig. 6. Test results of sensitivity for 23 patients compared to the deep neural network benchmark result (Thodoroff et al).

Fig. 7. Test results of false alarm for 23 patients compared to the deep neural network benchmark results (Thodoroff et al).

VI. RESULTS

To evaluate the performance of our model, we compare our result with the deep neural network and the overall benchmark results mentioned at the beginning of this paper [1][3][7]. We applied the training process over 20 epochs to determine where the best average validation result occurs. In this stage, we used leave-one-out cross validation, which trains and validates on the N-1 other patients, then tests on all of the withheld patient's data, and repeats this process N times so that each patient is tested. The validation results took some time before starting to perform well in both sensitivity and specificity; they show noticeably better sensitivity and specificity after the sixteenth epoch. In general, the results seem valid. However, for the random sampling this may still be debatable, given that different training subsets could generate different results. Therefore, we validate over ten trials of random sampling, as previously mentioned in this paper. In Table 1, the random-sampling result represents the best random set over the ten trials. On the other hand, we can notice that different sampling processes produced different results. From Table 1, we can see that the (50/30), (10/10) combination produced the best result, which even exceeds the benchmark results, as shown in Table 2. In Figure 6 and Figure 7, the sensitivity and false alarm rate for each patient show that our approach achieved better generalization across patients than the neural network benchmark results [1].

Fig. 8. Examples of (A) non-seizure and (B) seizure images. It is not easy to recognize the seizure from multi-second, multichannel images.

VII. CONCLUSION

In this paper, we used an approach that can learn robust feature representations from EEG data that are tolerant to the differences between seizure patients and to the noise associated with EEG data. We overcome the problem of high non-stationarity in epileptic seizure signals by generating an image representation of these signals over the human scalp. This algorithm has been shown to preserve the spectral, spatial and temporal features. Having these image data enabled us to use a CNN for classification, as opposed to standard EEG analysis techniques that ignore some spatial information. The performance of the proposed design showed that using a multi-second time window could replace the need for a recurrent neural network and hence reduce the overall design complexity. We also overcome the problem of unbalanced datasets by choosing an appropriate sampling technique. The proposed approach demonstrates significant improvements in classification accuracy over the state-of-the-art results.

REFERENCES

[1] Thodoroff, P., Pineau, J., & Lim, A. (2016, December). Learning robust features using deep learning for automatic seizure detection. In Machine learning for healthcare conference (pp. 178-190).
[2] Shoeb, A. H. (2009). Application of machine learning to epileptic seizure onset detection and treatment (Doctoral dissertation, Massachusetts Institute of Technology).
[3] Bolagh, S. N. G., & Clifford, G. (2017). Subject Selection on a Riemannian Manifold for Unsupervised Cross-subject Seizure Detection. arXiv preprint arXiv:1712.00465.
[4] Truong, N. D., Nguyen, A. D., Kuhlmann, L., Bonyadi, M. R., Yang, J., & Kavehei, O. (2017). A Generalised Seizure Prediction with Convolutional Neural Networks for Intracranial and Scalp Electroencephalogram Data Analysis. arXiv preprint arXiv:1707.01976.
[5] Bashivan, P., Rish, I., Yeasin, M., & Codella, N. (2015). Learning representations from EEG with deep recurrent-convolutional neural networks. arXiv preprint arXiv:1511.06448.
[6] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[7] Wilson, S. B., Scheuer, M. L., Emerson, R. G., & Gabor, A. J. (2004). Seizure detection: evaluation of the Reveal algorithm. Clinical Neurophysiology, 115, 2280-2291.
[8] Panayiotopoulos, C. P. (2010). A clinical guide to epileptic syndromes and their treatment, chapter 6. Springer.
[9] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (Vol. 1). Cambridge: MIT Press.
[10] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
[11] Snyder, J. P. (1987). Map projections: A working manual (Vol. 1395). US Government Printing Office.
[12] Alfeld, P. (1984). A Trivariate Clough-Tocher Scheme for Tetrahedral Data (No. MRC-TSR-2702). Wisconsin Univ-Madison Mathematics Research Center.
[13] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[14] Plis, S. M., Hjelm, D. R., Salakhutdinov, R., Allen, E. A., Bockholt, H. J., Long, J. D., ... & Calhoun, V. D. (2014). Deep learning for neuroimaging: a validation study. Frontiers in neuroscience, 8, 229.
[15] LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10), 1995.
[16] Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., & Toderici, G. (2015). Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4694-4702).
