
Physical and Engineering Sciences in Medicine

https://doi.org/10.1007/s13246-020-00938-4

SCIENTIFIC PAPER

EEG-based deep learning model for the automatic detection of clinical depression
Pristy Paul Thoduparambil1 · Anna Dominic1 · Surekha Mariam Varghese1

Received: 13 May 2020 / Accepted: 10 October 2020


© Australasian College of Physical Scientists and Engineers in Medicine 2020

Abstract
Clinical depression is a neurological disorder that can be identified by analyzing the Electroencephalography (EEG) signals.
However, the major drawback in using EEG to accurately identify depression is the complexity and variation that exist in
the EEG of a depressed individual. There are several strategies for automated depression diagnosis, but they all have flaws,
which make the diagnostic task inaccurate. In this paper, a deep model that integrates a Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) is designed for the detection of depression. CNN and
LSTM are used to learn the local characteristics and the EEG signal sequence, respectively. In the deep learning model, filters
in the convolution layer are convolved with the input signal to generate feature maps. All the extracted features are given to
the LSTM for it to learn the different patterns in the signal, after which the classification is performed using fully connected
layers. LSTM has memory cells to remember the essential features for a long time. It also has different functions to update
the weights during training. Testing of the model was done using the random splitting technique, and it obtained accuracies of 99.07% and 98.84% for the right and left hemisphere EEG signals, respectively.

Keywords  CD · EEG · CNN · LSTM

* Pristy Paul Thoduparambil
  pristypault@gmail.com

1 Department of Computer Science and Engineering, Mar Athanasius College of Engineering, Kothamangalam, Kerala, India

Introduction

Clinical depression (CD), or simply depression, is one of the most common psychological conditions and has an adverse impact on many lives [1]. Feelings of sadness, guilt, loss of interest, difficulty in concentrating, trouble in making decisions, and sleep disturbances are some of the main symptoms of clinical depression. Such conditions may also become chronic, which in turn might hinder a person from carrying out everyday responsibilities. This can lead to various disturbing thoughts and, in the worst cases, self-harm attempts [2]. It can affect any individual, regardless of age and social position [3].

The three levels of depression are mild, medium, and extreme, depending upon the severity of the symptoms [2]. Depression is attributed primarily to an inter-hemisphere imbalance: a hyperactive Right Hemisphere (RH) and a comparatively hypoactive Left Hemisphere (LH). Negative thinking and depression are related to more prominent activation within the frontal area of the RH. The intensity of depression is strongly associated with the hyperactivity of the RH [4].

Studies indicate that women are the foremost victims of depression [5]. To help depressed individuals return to a healthy life, early identification of depression is necessary. Clinicians recommend a reliable and early diagnosis to increase the chances of successful recovery. Additionally, early diagnosis can aid throughout the recovery period and further enhance the quality of life for patients.

A person's neurological condition is embodied within the brain's electrical activity. Electroencephalography (EEG) is a diagnostic examination that can be administered to assess the electrical activity of the brain over a certain period and is helpful in diagnosing various neurological conditions such as depression or epilepsy. EEG recording is a simple process and is achieved by positioning metal electrodes on the scalp at different positions; these measure the tiny electrical potentials that occur on the head as a result of neuronal brain behaviour.
Compared to other brain imaging methods, EEG has an extremely high time resolution and can monitor events inside the brain with millisecond precision [6]. Thus, depression can be diagnosed by comparing the EEG of the left and right hemispheres of the brain.

A sample EEG signal in the time domain and in the frequency domain is shown in Figs. 1 and 2, respectively.

Fig. 1  Sample EEG signal in time domain

Fig. 2  EEG signal in frequency domain

The EEG signal is primarily divided into four frequency bands: delta (< 4 Hz), theta (4–8 Hz), alpha (8–15 Hz) and beta (> 15 Hz), as seen in Fig. 2.

– Delta This band has a frequency of 4 Hz or less, which can be observed in babies and adults when they are asleep. It appears as the waves with the highest amplitude and the lowest frequency. It is normal as the dominant pattern in children up to 1 year of age and in sleep stages 3 and 4.

– Theta This band has a frequency of 4 to 8 Hz and is labelled as a slow activity. It is detected in children and adults when asleep. It is normal in children up to the age of 13 years and in sleep, but it is considered abnormal in awake adults. It can be seen in diffuse abnormalities such as metabolic encephalopathy or some instances of hydrocephalus.


– Alpha This band has a frequency of 8 to 15 Hz. It is typically best seen on either side in the posterior areas of the brain. It is observed when the eyes are shut and the subject is relaxed, and it fades when the eyes are open or when one thinks, calculates, etc. This is a typical pattern in relaxed adults and occurs the most during a lifetime, particularly after the age of thirteen.

– Beta This band has a frequency greater than 15 Hz and is labelled as a fast activity. It is observable across the parietal and frontal lobes. In regions of cortical injury, it may be missing or reduced in intensity. It is found to be the dominant pattern in adults who are highly alert, nervous, or have their eyes wide open. It is commonly called the natural rhythm [7].
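To make these band definitions concrete, the following sketch (not from the paper) estimates the power in each band for a single EEG segment with Welch's method from SciPy. The 500 Hz sampling rate matches the dataset described later; the 45 Hz upper edge for beta is an assumed cut-off, since the text only specifies > 15 Hz.

import numpy as np
from scipy.signal import welch

FS = 500  # sampling rate in Hz (the dataset described later is sampled at 500 Hz)
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 15), "beta": (15, 45)}

def band_powers(eeg_segment, fs=FS):
    """Return the absolute power in each EEG band for a 1-D signal."""
    freqs, psd = welch(eeg_segment, fs=fs, nperseg=fs * 2)   # 2-second windows
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = np.trapz(psd[mask], freqs[mask])      # integrate the PSD over the band
    return powers

# Example with a synthetic 10-s segment dominated by 10 Hz (alpha) activity
t = np.arange(0, 10, 1 / FS)
segment = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.randn(t.size)
print(band_powers(segment))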
EEG is non-linear and non-stationary in nature [8]. Manually reading and understanding EEG signals is time consuming, tedious, and troublesome, especially with the large quantity of EEG information that must be studied [9]. Due to this, many EEG analysis packages have been developed for computerized abnormality detection. Subha et al. [10] used an eight-level multiresolution decomposition method of discrete wavelet transformation for classifying the EEG frequency band into approximation stages. Using the technique of total variation filtering, the high-frequency noise components in the EEG were eliminated. Relative wavelet energy and entropy characteristics of each stage were measured and provided to an Artificial Neural Network (ANN) as input for identifying normal and depressed subjects. Hosseinifard et al. [11] concentrated on the classification of depressed and normal subjects by nonlinear EEG signal analysis. The four non-linear characteristics extracted from the signal were detrended fluctuation analysis (DFA), correlation dimension, Higuchi fractal dimension, and the Lyapunov exponent. The most appropriate characteristics were selected using a genetic algorithm. K-nearest neighbour, linear discriminant analysis, and logistic regression were used for classification, and the logistic regression classifier was found to be the best with 90% accuracy.

Faust et al. [12] used the wavelet packet decomposition method and nonlinear algorithms together for the classification purpose. Non-linear measures were used for the processing of the EEG frequency bands retrieved by decomposition, and only the features selected by the t-test [13] were provided to different classifiers. The Probabilistic Neural Network (PNN) showed better performance than the other classifiers. Acharya et al. [14] investigated several other non-linear features: fractal dimension, largest Lyapunov exponent, sample entropy, DFA, Hurst's exponent, higher-order spectra, and recurrence quantification analysis. After calculating the t-value [15], all the extracted features were ranked according to their importance and given to five separate classifiers: Support Vector Machine (SVM), k-nearest neighbour, Naive Bayes, PNN, and decision tree. With 98% precision, SVM achieved the best classification. Besides, they suggested an index for the diagnosis of depression by incorporating the above features. ANNs are mathematical models inspired explicitly by biological neural networks and can learn various patterns and make predictions [16]. Many studies [17–19] used classifiers based on ANN for the automatic detection of pancreatic cancer, depression, and epilepsy, respectively.

In all traditional machine learning approaches [10–12, 14, 17–19], it is required to extract relevant features through some non-linear inspection and then apply a machine learning categorization algorithm like SVM, PNN, Enhanced PNN or ANN. Typically, features are selected for accurate results by trial and error, but this can be time consuming. There is also a possibility that significant features get omitted and insignificant features get extracted. All this makes the classification a complex activity, and these methods are rarely used in medical practice due to their low precision, low sensitivity, and general inaccuracy.

Recently, deep learning has become more popular in the medical field for accurate and automatic decision making. Deep learning, a subfield of ANN, is a learning mechanism that lets computers automatically learn from examples and make recognitions/predictions that imitate the way humans do. It often achieves greater accuracy and receives high priority in areas such as medical diagnosis, recognition of images, prediction making, recognition of speech, and so on [9].

In recent years, the Convolutional Neural Network (CNN) has been found to be a significant and robust methodology in deep learning. It has achieved considerable success in computer vision and has recently become widely used in biomedical signal and image processing tasks [20]. Moreover, various researchers have focused on developing computer-aided diagnosis systems using CNN in the medical field. Acharya et al. [21] constructed a CNN model with an average accuracy of 88.7% for extracting EEG features and classifying the signal as normal, preictal, and seizure. Yildirim et al. [22] proposed a one-dimensional CNN to classify normal and abnormal EEG signals automatically and obtained a 20.66% classification error. Recently, Anna et al. [9] and Acharya et al. [23] developed CNN models with 11 and 13 layers, respectively, for identifying patients having depression using the EEG signal. Yildirim et al. [24] developed a one-dimensional CNN based model with a 91.33% accuracy to efficiently distinguish cardiac arrhythmia from long-term ECG signal segments. Aswathy et al. [25] conducted a comparative analysis of the performance of three different neural networks, a feed-forward neural network, a block-based neural network, and a CNN, in the diagnosis of Alzheimer's disease, from which the CNN was observed to be the best classifier.
Another type of deep learning algorithm, proposed by German researchers Hochreiter and Schmidhuber in 1997, is Long Short Term Memory (LSTM) [26]. It is widely used for analyzing time-series data [27] and is a primary technique in natural language processing, image identification, and speech recognition [28]. It is an evolution of the Recurrent Neural Network (RNN) that deals with the vanishing gradient problem of the traditional RNN. The LSTM units are connected in such a way that information cycles through a loop, which enables the network to learn the temporal dynamics present in the data. It has the capability to retain important/relevant information and to remove/ignore the irrelevant [27]. For the prediction of seizures, Petrosian et al. [29] used an RNN to extract features from wavelet-decomposed sub-bands of EEG time series data. In another analysis [30], a wavelet sequence based deep bidirectional LSTM was used for grouping ECG signals into various categories. Tsiouris et al. [31] used LSTM for the prediction of epileptic seizures using both time and frequency domain features extracted from the EEG signal.

Oh et al. [27] proposed a system that automatically diagnoses arrhythmia from noisy ECG signals by integrating LSTM with CNN. Supratak et al. [32] implemented a model called DeepSleepNet combining CNN and bidirectional LSTM for the automatic classification of sleep stages. Swapna et al. [33] presented an automated diabetes detection system with 95.1% accuracy using a CNN-LSTM framework and cardiac signals. For the prediction of mortality in the ICU, Mohammad [34] used a CNN-LSTM model. Shahzadi et al. [35] implemented a categorization model for 3D brain tumor MR images by cascading CNN with LSTM. Shahbazi et al. [36] presented a CNN and LSTM integrated network that uses multichannel EEG for predicting epileptic seizures with 98.21% sensitivity. Wahyuningrum et al. [37] developed a new method using CNN-LSTM for classifying the severity of knee osteoarthritis from radiographic images. Jianfeng et al. [38] used two networks, one using a one-dimensional CNN and LSTM and another with a two-dimensional CNN and LSTM, to learn the local and global emotion-associated characteristics of speech. Both models performed well on the task of recognizing speech emotion, particularly the two-dimensional CNN-LSTM framework, which outperforms conventional methods with 95.89% accuracy.

Most of the current depression diagnosis studies have generally been based on the extraction of hand-crafted features followed by a shallow classifier trained on these selected features. This study aims to develop a fully automatic system for depression detection without the need for extraction and selection of features, thereby eliminating these obstacles to achieving the best classification accuracy. CNN-LSTM, a combined model, is considered in this study to automatically extract local characteristics and learn the long-term relations embedded in the EEG data.

Method and materials

The different modules in the CNN-LSTM combined automatic diagnostic system are signal preprocessing, local feature extraction, sequence learning, and classification. Figure 3 shows the overall architecture of the automatic depression diagnostic system.

Fig. 3  Automatic depression detection system using EEG signal

EEG dataset

The EEG signals were obtained from the "PATIENT REPOSITORY FOR EEG DATA + COMPUTATIONAL TOOLS" (University of New Mexico) [39].


The collected dataset includes EEG recordings captured from the scalp using 64 Ag/AgCl electrodes with the Synamps2 system (500 Hz sampling rate, band-pass filter 0.5–100 Hz, impedance less than 10 kΩ).

Preprocessing

For EEG data, preprocessing is vital to transform the raw signal into a format close to the original neural signal, making it appropriate for further processing. Due to the lack of spatial information, the signal received from the scalp may not be the same as the signal generated in the brain. Various artifacts such as muscle movement or eye blinking may corrupt the data, resulting in distorted signals. Furthermore, the signal can contain unwanted noise. All of these are impediments to getting a correct result from EEG analysis. Finally, the relevant signals need to be separated from randomly occurring neural activity in the EEG recordings.

Bad channels that can tamper with the accuracy of the data may be present in the EEG recordings. Inappropriate placement of an electrode or loss of contact with the scalp, bridging of multiple channels, as well as over-saturation of an electrode in the case of wet electrodes can lead to a bad channel [9]. It is necessary to eliminate these in the early stages since they can have adverse effects on further procedures and analysis. In the collected EEG data, bad channels and eye blinks were already identified and removed using the FASTER algorithm [40] and independent component analysis.
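The dataset used here already has these artifacts handled, so no cleanup step was re-implemented in this work. Purely as an illustration of what such a step typically looks like, below is a hedged sketch using the MNE library on a synthetic recording; the channel marked bad and the excluded ICA component are hypothetical, and FASTER itself is not part of MNE's core API.

import numpy as np
import mne

# Small synthetic Raw object so the sketch is self-contained;
# in practice the recordings from the repository would be loaded instead.
ch_names = ["Fp1", "Fp2", "F3", "F4", "C3", "C4", "O1", "O2"]
info = mne.create_info(ch_names=ch_names, sfreq=500.0, ch_types="eeg")
raw = mne.io.RawArray(np.random.randn(8, 5000) * 1e-5, info)

# Mark and drop a hypothetical bad channel (the paper's data used FASTER for this step).
raw.info["bads"] = ["F3"]
raw.drop_channels(raw.info["bads"])

# Remove a hypothetical blink-related component with ICA.
ica = mne.preprocessing.ICA(n_components=5, random_state=42)
ica.fit(raw.copy().filter(l_freq=1.0, h_freq=None))  # ICA is usually fit on high-passed data
ica.exclude = [0]                                     # index of the component to discard
raw_clean = ica.apply(raw.copy())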
Before passing the signal to the model, Z-score normalization [27] was applied to eliminate the amplitude scaling problem and to remove the offset effect in the EEG data, thereby standardizing the signal.
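A minimal sketch of this standardization step, assuming each EEG segment is handled as a 1-D NumPy array: subtracting the mean removes the offset and dividing by the standard deviation removes the amplitude scaling.

import numpy as np

def zscore(signal: np.ndarray) -> np.ndarray:
    """Standardize one EEG segment to zero mean and unit variance."""
    centered = signal - signal.mean()           # removes the offset effect
    return centered / (signal.std() + 1e-8)     # removes amplitude scaling; epsilon avoids /0

segment = np.array([12.0, 15.0, 9.0, 11.0, 13.0])   # toy amplitudes in microvolts
print(zscore(segment))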
Feature extraction and sequence learning

From previous work [23], it has been observed that CNN is highly efficient at automatically extracting time-domain features but weak at learning sequential data. LSTM, in contrast, is widely used for learning long-term dependencies in a time-series signal [27]. Both local and sequential learning are essential for efficiently and automatically analyzing EEG signals, and both CNN and LSTM have performed relatively well on EEG signals. For this reason, a deep learning model that combines CNN and LSTM was developed for the automatic diagnosis of depression.

The main layers in an ANN are the input layer, the output layer, and the hidden layers [9]. Deep learning is completely based on ANN and has a larger number of hidden layers. The performance of a deep learning system depends mostly on the hidden layers in the network. The developed model can achieve high accuracy by adding more hidden layers. However, when the number of hidden layers grows beyond a limit, the model will get over-trained and the accuracy will decrease. Therefore, it is the responsibility of the developer to decide the appropriate number of layers in the deep learning model, and the best layer count can only be determined through trial and error. In this model, the accuracy reached its maximum when the total layer count was 13 (including the input layer), and the performance of the system improved after adding the dropout layer. Therefore, the number of layers was chosen as 13, and a deep model with 13 layers was developed for automatic depression detection. The various parameters, such as the order and number of layers, kernel sizes, the type and number of filters, and strides, were decided based on previous observations [9, 21–24, 27, 33–38] and through trial and error. The architecture of the model, detailing the different layers, is depicted in Fig. 4, and a detailed overview of each layer is given in Table 1.

Layer 0, the input layer, accepts the preprocessed EEG signal. Layers 1 to 6 together form the local feature extraction module, consisting mainly of convolution and pooling operations. Convolution is principally performed to extract the local characteristics of the input EEG signal [23]. The convolution operation is performed by moving the kernel one step at a time across the input vector and summing the products of the input and kernel values that fall at the same place. This is repeated until the kernel reaches the end of the input vector. The output of the convolution process is one feature map per kernel. The equation of the convolution operation [23] is given below.

c_r = \sum_{n=0}^{N-1} f_n x_{r-n}    (1)

where x denotes the input EEG signal to the system, f is the filter (also called the kernel), c is the resulting output/feature map, and N indicates the number of data elements in the input signal x. The subscripts n and r denote the nth component of the filter and the rth output, respectively. For better results, 128, 64 and 32 filters with kernel sizes of 7 × 1, 5 × 1 and 3 × 1, respectively, are chosen. During the training period, the network itself constantly adjusts the kernel weights to capture the relevant information present in the input data.
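The sketch below evaluates Eq. (1) for a toy input and a toy 3 × 1 kernel (not the learned filters), with the sum taken over the kernel taps at the positions where the kernel fully overlaps the signal; NumPy's convolve gives the same result.

import numpy as np

def conv1d(x, f):
    """c_r = sum_n f[n] * x[r - n], evaluated where the kernel fully overlaps the input."""
    N = len(f)
    return np.array([np.dot(f, x[r - N + 1:r + 1][::-1]) for r in range(N - 1, len(x))])

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])   # toy input signal
f = np.array([0.5, 1.0, -0.5])                  # toy 3 x 1 kernel
print(conv1d(x, f))
print(np.convolve(x, f, mode="valid"))          # same result from NumPy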
After each convolution operation, an activation function is applied; here, the leaky rectified linear unit (LeakyReLU) [41] is applied to improve the CNN's non-linearity. As shown in Eq. (2) [23], the ReLU function outputs zero for inputs less than zero and the input value itself for inputs greater than zero.

f(x) = max(0, x)    (2)

All the extracted feature maps are given as input to the next layer, the pooling layer, which reduces the size of the feature maps obtained from the previous convolution layer while retaining the essential features [23]. Min pooling, average pooling, and max pooling are the pooling operations available. Since the max pooling function is used in this analysis, after each max pooling operation only the larger value within each stride-2 window of the feature map is preserved. The addition of max pooling between successive convolution layers reduces the number of parameters and the computational load, thereby controlling the overfitting of the model.

Fig. 4  Proposed CNN-LSTM model

Table 1  Various parameters of the proposed deep model

Layer | Type         | Filter size | Other parameters
1     | Conv1D       | 128 × 7     | Strides = 1, activation = ReLU
2     | MaxPooling1D | 2           | Strides = 2
3     | Conv1D       | 64 × 5      | Strides = 1, activation = ReLU
4     | MaxPooling1D | 2           | Strides = 2
5     | Conv1D       | 32 × 3      | Strides = 1, activation = ReLU
6     | MaxPooling1D | 2           | Strides = 2
7     | LSTM         | –           | Unit size = 32, return sequences = true
8     | LSTM         | –           | Unit size = 32, return sequences = true
9     | Flatten      | –           | –
10    | Dense        | –           | Unit size = 10, activation = ReLU
11    | Dropout      | –           | Rate = 0.2
12    | Dense        | –           | Unit size = 2, activation = SoftMax
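For reference, the same stack can be expressed with the Keras API named later in the Results section. The layer order, filter counts, kernel sizes, strides, LSTM units, dropout rate, and activations follow Table 1; the input segment length of 2000 samples is only an assumption for illustration, as it is not fixed by the table.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, LSTM, Flatten,
                                     Dense, Dropout)

SEGMENT_LEN = 2000  # assumed number of samples per EEG segment (not stated here)

model = Sequential([
    Conv1D(128, 7, strides=1, activation="relu", input_shape=(SEGMENT_LEN, 1)),  # layer 1
    MaxPooling1D(2, strides=2),                                                  # layer 2
    Conv1D(64, 5, strides=1, activation="relu"),                                 # layer 3
    MaxPooling1D(2, strides=2),                                                  # layer 4
    Conv1D(32, 3, strides=1, activation="relu"),                                 # layer 5
    MaxPooling1D(2, strides=2),                                                  # layer 6
    LSTM(32, return_sequences=True),                                             # layer 7
    LSTM(32, return_sequences=True),                                             # layer 8
    Flatten(),                                                                   # layer 9
    Dense(10, activation="relu"),                                                # layer 10
    Dropout(0.2),                                                                # layer 11
    Dense(2, activation="softmax"),                                              # layer 12
])
model.summary()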
In order to extract the complete features of a large, complex, continuous, non-linear EEG signal, it is necessary to consider the patterns in the previous time steps in addition to the current patterns. Therefore, sequence learning, which aims at modelling short- and long-term memory, is required for correctly identifying the disorder, and this is possible through certain deep learning methods.


In a previous study [29], an RNN was used for learning the EEG signal by exploiting its short-term memory capacity. However, it was found to be unsuccessful at long-term memory due to the vanishing gradient problem [42] that appears during back-propagation while training the ANN, which prevents the correct learning of the earlier layers in the network during training. This can be overcome by deciding on and storing important information over a long period, and this is possible through LSTM, a type of RNN.

In the developed model, Layers 7 and 8 are a series of LSTMs with 32 units each for finding various patterns in the signal from the feature maps produced by the CNN part. This facilitates learning the long-term dependencies in the signal. The stacked LSTM enables learning of the complex information in the time series signal, thereby increasing the prediction accuracy. The LSTM architecture, shown in Fig. 5, consists of a memory cell for storing information and three gates, the input gate, output gate, and forget gate, which regulate the flow of data into and out of the cell. Four different functions, sigmoid, tanh, multiplication, and addition [26], are present in the LSTM to make the weight updates easier during model training.
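For reference, these gates and functions correspond to the standard (forget-gate) LSTM cell update commonly used in modern implementations of [26], where x_t is the current input, h_{t-1} the previous output, c_t the memory cell, \sigma the sigmoid function and \odot element-wise multiplication:

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

Here the sigmoid and tanh are the squashing functions mentioned above, while the multiplications and the addition combine the gated quantities into the updated cell state and output.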
The flatten layer is introduced between the LSTM and the fully connected layers; it converts the matrix-shaped features into a vector to be given as input to the fully connected part of the deep learning model. The fully connected layers, layers 10 and 12, perform the actual classification based on the output from the last LSTM.

In a fully connected layer, each neuron in one layer is connected to all the neurons in the next layer. The equation for it is represented as [23]

x_i = \sum_j w_{ji} y_j + b_i    (3)

where w denotes the weights, y is the input to the neuron (the output from the previous layer), b is the bias, and x indicates the output of the current neuron.

In the last fully connected layer, the softmax activation function is used to enable the neural network to perform multi-class classification. The network is then able to determine the probability that a particular EEG signal comes from a depressed subject or not.

The dropout layer is included as the 11th layer to prevent the model from overfitting and to improve generalization [43]. The dropout technique randomly drops units/neurons during forward propagation, thereby making the system learn from a subset of features. This also trains the system to continue working in situations where some of the neurons are non-functional.

The network is trained for 17 epochs with a batch size of 64. The adaptive moment estimation (Adam) [44] optimization algorithm is used to update the various parameters of the developed network. This lets the network converge at a rapid pace and thus enhances the training phase performance.
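A sketch of the corresponding Keras training call, assuming the model object from the earlier sketch and arrays x_train, y_train, x_val, y_val produced by the random split described in the Results section; the loss function is an assumption, since the paper does not name it, while the optimizer, epoch count, and batch size are the stated ones.

model.compile(optimizer="adam",                        # adaptive moment estimation [44]
              loss="sparse_categorical_crossentropy",  # assumed loss for the 2-unit softmax output
              metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=17, batch_size=64)          # settings stated in the text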

Fig. 5  Overview of LSTM model


Experiments were done with and without dropout, and the performance was found to be more accurate in the network with the dropout layer. The last part of the network without dropout is shown in Fig. 6 (a total of 12 layers in the network). The last part of the network with dropout is shown in Fig. 7 (the final network, with a total of 13 layers). Here, some of the neurons in the fully connected layers are dropped. This prevents the network from over-training. Through this dropout mechanism, the network also learns to keep working even in cases where some neurons are not functioning.

Fig. 6  Network without dropout

Fig. 7  Network with dropout

Results

The proposed hybrid model was implemented in the Python language with the Keras and TensorFlow libraries. The model was trained and evaluated, and it achieved high performance compared to the existing methods. The performance of the system was determined by considering correctly and incorrectly recognized EEG signals. To find out what role the right and left hemispheres play in depression, EEG samples from both hemispheres were used.

The performance of the system while varying the total number of layers in the network is shown in Fig. 8. The maximum accuracy was obtained when the total layer count was 13.

Approaches to evaluating the system's performance include random splitting and tenfold cross-validation. Random splitting was used here; it is the method of splitting the collected data randomly into training, validation, and testing sets. The performance measures, such as accuracy, specificity, and sensitivity, were found to be the best when the training, validation, and testing data were 80%, 10%, and 10%, respectively, as shown in Fig. 9.
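A minimal sketch of such an 80/10/10 random split using scikit-learn (the splitting utility is not named in the paper; segments and labels stand in for the preprocessed EEG segments and their class labels):

import numpy as np
from sklearn.model_selection import train_test_split

segments = np.random.randn(1000, 2000, 1)       # placeholder EEG segments
labels = np.random.randint(0, 2, size=1000)     # placeholder normal/depressed labels

# Carve off 20% first, then split that part equally into validation and test sets.
x_train, x_rest, y_train, y_rest = train_test_split(segments, labels,
                                                    test_size=0.20, random_state=1)
x_val, x_test, y_val, y_test = train_test_split(x_rest, y_rest,
                                                test_size=0.50, random_state=1)
print(len(x_train), len(x_val), len(x_test))    # 800 100 100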

Fig. 8  Performance evaluated by varying the number of layers


The model was trained using both the training and validation data. The performance of the system becomes better or worse depending on the training, which is crucial in every machine learning method. Data that is not used during the training process is called testing data, and this data is not familiar to the system; therefore, this new/unknown data was used for measuring the overall performance of the system. Experiments were performed over 17 epochs, with an average of 47 s. As shown in Figs. 10 and 11, the model achieved the following accuracies after training for 17 epochs. Using the RH data, it achieved an accuracy of 99.42% during training and 98.84% during validation. Using the LH data, it achieved an accuracy of 98.55% during training and 98.61% during validation.

The performance of the system was calculated using an unused dataset after completing the training. The model achieved accuracies of 99.07% and 98.84% using the right and left hemisphere data, respectively, as shown in Table 2.

Discussion

Over the years, several different methods for computer-aided depression detection have been developed. All these methods used ANN and other conventional machine learning classifiers after extracting relevant features by approaches such as relative wavelet energy [10], signal entropy [10, 12], and non-linear features [11, 14]. However, these methods have the overhead of finding the relevant features from the EEG for detecting depression. Additionally, one needs to select an appropriate method for collecting features.

Fig. 9  Performance measures of the model evaluated by changing the training data percentage

Fig. 10  Accuracy of the model using right hemisphere data


Fig. 11  Accuracy of the model using left hemisphere data

In some cases, there is a chance of missing important characteristics and including irrelevant ones. Thus, such methods are complex and time-consuming, and they require the developer to be professionally experienced.

There have been attempts by many researchers to remove these challenges by developing automatic depression detection systems. Such systems have mainly used CNN [9, 21–25]. These methods of diagnosis lack sequence learning, which is necessary for learning a signal completely.

In the proposed method, all the essential features are extracted automatically using CNN, and the sequence learning is done using LSTM. The benefit of this system is its high performance in the task of diagnosing depression by studying both local characteristics and overall EEG signal patterns. Although this model used only a small set of EEG signals, it achieved an accuracy of 99.07% using the right hemisphere data and 98.84% using the left hemisphere data. From the results, it is evident that the right hemisphere shows higher performance compared to the left hemisphere.

Negative emotions, bad moods, and depression have been correlated with comparatively higher activity in the frontal cortex of the RH than in the homotopic area of the LH [4]. In depressive patients, hypometabolism and hypermetabolism characterize the LH and RH, respectively. The depression rate is positively related to higher RH activity. Because of this, the accuracy obtained with RH data is higher than that with LH data during the diagnosis of depression using EEG. Hence, the EEG of the RH is more significant in the diagnosis of clinical depression than that of the LH.

Table 2  Performance measures using test data

      | Accuracy (%) | Sensitivity (%) | Specificity (%)
Right | 99.07        | 99.5            | 98.6
Left  | 98.84        | 98.61           | 99.06
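For completeness, a small sketch of how the accuracy, sensitivity, and specificity reported in Table 2 are conventionally computed from test-set predictions, treating the depressed class as positive (the counts below are toy values, not the study's data):

import numpy as np

def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (recall of the positive class) and specificity."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Toy example: 1 = depressed, 0 = normal
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])
print(evaluate(y_true, y_pred))   # (0.75, 0.75, 0.75)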
In the future, this work can be extended using more subjects, and the developed method can be implemented in clinics as a diagnostic tool.

Conclusions

The diagnosis of depression in its early stages is critical for reducing its risks with proper care and anti-depression treatments. An automated system will help to diagnose depression regardless of the knowledge level and experience of neurologists. EEG can be used effectively for detecting CD. The developed EEG-based automatic system that combines CNN and LSTM makes the detection more accurate by learning both the local characteristics and the long-term dependencies in the EEG signal. Thus, the system has proved to be highly efficient for detecting clinical depression. This method can be used effectively in clinical sites, provided sufficient data is available for successful training.

Funding  None.

Compliance with ethical standards

Conflict of interest  The authors declare that they have no conflict of interest.

Informed consent  Data used in this research is taken from the database made available under the Public Domain Dedication and License v1.0, as part of a Brain Initiative Seed Award supported by the University of New Mexico. All participants provided written informed consent that was approved by the University of Arizona.


Research involving human and animal participants  This paper does not contain any studies with human participants or animals performed by any of the authors.

References

1. World Federation for Mental Health (2012) Depression: a global crisis. World Federation for Mental Health, Occoquan
2. Mental Health Foundation. https://www.mentalhealth.org.uk/a-to-z/d/depression. Accessed 17 Oct 2020
3. American Psychiatric Association. https://www.psychiatry.org/patients-families/what-is-mental-illness. Accessed 17 Oct 2020
4. Hecht D (2010) Depression and the hyperactive right-hemisphere. Neurosci Res 68(2):77–87
5. Albert PR (2015) Why is depression more prevalent in women? J Psychiatry Neurosci 40(4):219–221
6. Casson AJ, Abdulaal M, Dulabh M, Kohli S, Krachunov S, Trimble E (2018) Electroencephalogram. Seamless healthcare monitoring. Springer, Cham, pp 45–81
7. The McGill Physiology Virtual Laboratory. http://www.medicine.mcgill.ca/physio/vlab/biomed_signals/EEG_n.htm. Accessed 17 Oct 2020
8. Acharya UR, Bhat S, Faust O, Adeli H, Chua EC-P, Lim WJE, Koh JEW (2015) Nonlinear dynamics measures for automated EEG based sleep stage detection. Eur Neurol 74:268–287
9. Anna D, Aswathy KJ, Surekha Mariam V (2019) Deep learning in computer aided diagnosis of MDD. Int J Innov Technol Explor Eng 8(6):464–468
10. Subha DP, Joseph PK (2012) Classification of EEG signals in normal and depression conditions by ANN using RWE and signal entropy. J Mech Med Biol 12(4):1240019
11. Hosseinifard B, Moradi MH, Rostami R (2013) Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal. Comput Methods Programs Biomed 109(3):339–345
12. Faust O, Ang PCA, Subha DP, Joseph PK (2014) Depression diagnosis support system based on EEG signal entropies. J Mech Med Biol 14(3):1450035
13. Boneau CA (1960) The effects of violations of assumptions underlying the t test. Psychol Bull 57(1):49–64
14. Acharya UR, Sudarshan VK, Adeli H, Santhosh J, Koh JEW, Subha DP, Adeli A (2015) A novel depression diagnosis index using nonlinear features in EEG signals. Eur Neurol 74(1–2):79–83
15. Gao JB, Cao Y, Gu L, Harris JG, Principe JC (2003) Detection of weak transitions in signal dynamics using recurrence time statistics. Phys Lett A 317:64–72
16. IGI Global. https://www.igi-global.com/dictionary/artificial-neural-networks/1519. Accessed 17 Oct 2020
17. Sanoob MU, Anand M, Ajesh KR, Surekha Mariam V (2016) Artificial neural network for diagnosis of pancreatic cancer. Int J Cybern Inform 5(2):40–49
18. Erguzel TT, Ozekes S, Tan O, Gultekin S (2015) Feature selection and classification of electroencephalographic signals: an artificial neural network and genetic algorithm based approach. Clin EEG Neurosci 46(4):321–326
19. Anusha KS, Mathews MT, Puthankattil SD (2012) Classification of normal and epileptic EEG signal using time & frequency domain features through artificial neural network. In: International Conference on Advances in Computing and Communications, Cochin, Kerala, pp 98–101
20. Li Q, Peng Q, Yan C (2018) Multiple VLAD encoding of CNNs for image classification. Comput Sci Eng 20(2):52–63
21. Acharya UR, Oh SL, Hagiwara Y, Tan JH, Adeli H (2018) Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Comput Biol Med 100:270–278
22. Yıldırım Ö, Baloglu UB, Acharya UR (2018) A deep convolutional neural network model for automated identification of abnormal EEG signals. Neural Comput Appl. https://doi.org/10.1007/s00521-018-3889-z
23. Acharya UR, Oh SL, Hagiwara Y, Tan JH, Adeli H, Subha DP (2018) Automated EEG-based screening of depression using deep convolutional neural network. Comput Methods Programs Biomed 161:103–113
24. Yıldırım Ö, Pławiak P, Tan RS, Acharya UR (2018) Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Comput Biol Med 102:411–420
25. Aswathy KJ, Surekha Mariam V (2018) Neural network in diagnosis of Alzheimer's from electroencephalography. J Emerg Technol Innov Res 5(3):284–288
26. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
27. Oh SL, Ng EYK, Tan RS, Acharya UR (2018) Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats. Comput Biol Med 102:278–287
28. Song E, Soong FK, Kang HG (2017) Effective spectral and excitation modeling techniques for LSTM-RNN-based speech synthesis systems. IEEE/ACM Trans Audio Speech Lang Process 25(11):2152–2161
29. Petrosian A, Prokhorov D, Homan R, Dasheiff R, Wunsch D (2000) Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 30(1):201–218
30. Yildirim Ö (2018) A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification. Comput Biol Med 96:189–202
31. Tsiouris KM, Pezoulas VC, Zervakis M, Konitsiotis S, Koutsouris DD, Fotiadis DI (2018) A long short-term memory deep learning network for the prediction of epileptic seizures using EEG signals. Comput Biol Med 99:24–37
32. Supratak A, Dong H, Chao W, Yike G (2017) DeepSleepNet: a model for automatic sleep stage scoring based on raw single-channel EEG. IEEE Trans Neural Syst Rehabil Eng 25(11):1998–2008
33. Swapna G, Soman KP, Vinayakumar R (2018) Automated detection of diabetes using CNN and CNN-LSTM network and heart rate signals. Procedia Comput Sci 132:1253–1262
34. Khan MH (2019) A CNN-LSTM for predicting mortality in the ICU. Master's thesis, University of Tennessee
35. Shahzadi I, Tang TB, Meriadeau F, Quyyum A (2018) CNN-LSTM: cascaded framework for brain tumour classification. In: IEEE-EMBS conference on biomedical engineering and sciences. IEEE, Piscataway, pp 633–637
36. Shahbazi M, Aghajan H (2018) A generalizable model for seizure prediction based on deep learning using CNN-LSTM architecture. In: IEEE global conference on signal and information processing, Anaheim, CA, USA, pp 469–473
37. Wahyuningrum RT, Anifah L, Eddy Purnama IK, Hery Purnomo M (2019) A new approach to classify knee osteoarthritis severity from radiographic images based on CNN-LSTM method. In: IEEE 10th international conference on awareness science and technology, Morioka, Japan, pp 1–6
38. Jianfeng Z, Xia M, Lijiang C (2019) Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomed Signal Process Control 47:312–323
39. Patient Repository for EEG Data. http://predict.cs.unm.edu/downloads.php. Accessed 15 Jan 2020
40. Nolan H, Whelan R, Reilly RB (2010) FASTER: fully automated statistical thresholding for EEG artifact rejection. J Neurosci Methods 192:152–162
41. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: IEEE international conference on computer vision, Santiago, pp 1026–1034
42. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166
43. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(56):1929–1958
44. Kingma DP, Ba LJ (2015) Adam: a method for stochastic optimization. In: 3rd international conference on learning representations (ICLR), San Diego

Publisher's Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
