
Doctoral Dissertation in Computer Science, Shanghai University

Deep Learning and its Applications to Retinal Vessel Segmentation and Motion Brain Imagery Classification

Name: Sadaqat Ali
Supervisor: 张武
Major: Computer Science

School of Computer Engineering and Science, Shanghai University
December 13, 2021

Name: Sadaqat Ali    Student ID: 16860004
Thesis title: Deep Learning and its Applications to Retinal Vessel Segmentation and Motion Brain Imagery Classification

Declaration of Originality

I hereby declare that the work presented in this thesis was carried out by myself under the guidance of my supervisor. Except where specifically noted and acknowledged in the text, the thesis contains no research results previously published or written by others, and any contribution made to this research by colleagues who worked with me has been clearly acknowledged in the thesis.

Signature:            Date:


Authorization for Use of the Thesis

I fully understand the regulations of Shanghai University on the retention and use of dissertations: the University has the right to retain the thesis and to submit copies of it, and to allow the thesis to be consulted and borrowed; the University may publish all or part of the contents of the thesis. (A classified thesis shall comply with these regulations after declassification.)

Signature:            Supervisor's signature:            Date:



姓 名 :
S ad a
q a t A
l i  学号 :
1 68 6 0 0 0 4

论文题 目 : D e e p  L e ar n i n g an d i t s  A p p l i c at i o n s  t o R e t i n a l  Ve ss e  S e gm e n t a t
l i on

a n d  M o t i on  B r a i n  I ma g e r y  C l a s s i f i c at i on

上 海大 学

本 论文 经答 辩 委 员 会 全 体 委 员 审 查 确 认 符

 ,

合 上海 大 学 博 士 学 位论 文 质 量 要 求 

答 辩 委 员 会签 名 

主任 

委员 :



导 师 

於,
欠 齡 p 甘日
: i  : 崦 !
上海大学博士学位论文

A Dissertation Submitted to Shanghai University for the Degree of PhD


in Computer Science

Deep Learning and its Applications to


Retinal Vessel Segmentation
and Motion Brain Imagery Classification

PhD Candidate: Sadaqat Ali
Supervisor: 张武
Major: Computer Science

Department of Computer Engineering and Science,
Shanghai University
December 13, 2021


ACKNOWLEDGEMENTS

I would like to thank the many people who have supported me during my time at SHU. I am especially grateful to my adviser, Prof. Dr. Wu Zhang, who not only made research enjoyable but also was an invaluable resource during the odyssey of the Doctor of Philosophy (PhD).
I am especially grateful to my co-workers in Dr. Zhang's research group. I would like to thank Yonglin Xu for mentoring me in my initial stages as a researcher and for his valuable feedback on my research. I would like to thank Dr. Mushtaq Hussain for giving advice on the methods and the IBM SPSS® software. I would like to thank Dr. SM. Raza Abidi for guidance in my research. I would like to thank Jianyue Ni, Xiangmeng Wang, Sen Ge, and Xiaoxiao for their untiring, always-available support and counsel in the computer laboratory. I would also like to thank the Shanghai Govt Scholarship Council (SGSC) Fellowship Program for granting the full funding of the PhD degree program.
I would also like to thank the College of International Exchange (CIE) for their help and support in all matters of living, studying, and training at SHU, and for providing opportunities to explore the emerging China.
Lastly, I would like to express appreciation and dedication to my family: especially to my father, who instilled confidence in me and taught me how to stand with constancy in difficulties; my mother, who always prayed for me with infinite love, backing, and praise; my siblings for their continuous love, support, and encouragement; my wife for her untiring love, cheer, and optimism; my children, who are waiting anxiously for me; and my whole maternal and paternal family, who trusted me to achieve this milestone.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
摘要
ABSTRACT

1 Introduction
  1.1 Brief Overview
    Research Objectives
    Research Methodology
    Importance and Novel Contributions to the Field
  1.2 Literature Survey
    Application of Deep Learning onto Vessel Segmentation
    Application of Deep Learning onto Imagery Classification
  1.3 Structure of Thesis

2 Retinal Vessel Segmentation
  2.1 Introduction and Related Works
  2.2 Methodology
    Objective Function
  2.3 Model Architecture
    Generator Network
    Patch-based Discriminator Network
  2.4 Model Training
    Hyperparameters
    System Configuration
  2.5 Results and Discussions
    Datasets and Evaluation Metrics
    Data Augmentation
    Stand-alone Generator Network
    Influence of patch size
    Cross-training
    Correctness evaluation
    Robustness to the noise
    Comparison with the state-of-the-art methods
    Computational Complexity
  2.6 Conclusion

3 Motor Imagery Classification
  3.1 Introduction and Related Works
  3.2 Methodology
    Data Preprocessing
  3.3 Normalization
    Common spatial pattern (CSP) filters
    Energy maps
    Classification
  3.4 Results and discussion
    Dataset configuration
    Evaluation benchmark
    System configuration
    Influence of temporal trimming window
    Influence of frequency bands
    Influence of LSTM memory cells count
    Evaluation of classifiers
    Comparison with the state-of-the-art methods
    Computational complexity
    Limitations
  3.5 Conclusion

4 Precision-Recall Curves and Receiver Operating Characteristics
  4.1 Introduction and Related Works
  4.2 Methodology
  4.3 Experiments and Results
  4.4 Conclusion

5 Motor Imagination Recognition
  5.1 Introduction and Related Works
  5.2 Methodology
    Preprocessing
    Feature Extraction
    Classification
    Dataset Configuration
  5.3 Results and Discussion
    Comparison of Classifiers
    Influence of Memory Cells
    Comparison with other state-of-the-art Methods
  5.4 Conclusion

6 Conclusions

References

Publications


LIST OF TABLES

Table 2-1. Evaluation of the generator in stand-alone configuration.

Table 2-2. Evaluation of the discriminator for different sizes of patches.

Table 2-3. Cross-training evaluation of the CPGAN.

Table 2-4. AUC performance comparison on all three datasets.

Table 2-5. Performance comparison of the proposed model on the DRIVE, STARE, and CHASEDB1 datasets with state-of-the-art methods.

Table 3-1. Influence of temporal window selection on classification performance (mean kappa value). The model is trained on BCI Competition IV dataset 2a [144].

Table 3-2. Influence of frequency bands on classification performance (mean kappa value).

Table 3-3. Influence of LSTM memory cells on dataset 2a of BCI Competition IV [144].

Table 3-4. Classifier performance evaluation (kappa coefficients) on dataset 2a of BCI Competition IV [144].

Table 3-5. Comparison of classification accuracy (kappa coefficients) on dataset 2a of BCI Competition IV [144].

Table 4-1. Comparison of generator models in standalone mode and with discriminators of different patch sizes (640 x 640; 120 x 120; 40 x 40; 10 x 10; 1 x 1) in terms of PR and ROC on the DRIVE [170] and STARE [171] datasets. Key terms: precision-recall (PR), receiver operating characteristic (ROC), standalone (SA), with discriminator (D).

Table 4-2. Classification accuracy for naive Bayes and flexible Bayes on various data sets. Key terms: precision-recall (PR), receiver operating characteristic (ROC).

Table 5-1. Comparison of classifiers in terms of kappa coefficients on dataset 2a of BCI Competition IV.

Table 5-2. Comparison of classification accuracy (kappa coefficients) on dataset 2a of BCI Competition IV.


LIST OF FIGURES

Fig. 2-1. Some of the features associated with retinal fundus images.

Fig. 2-2. The proposed Conditional Patch Generative Adversarial Network (CPGAN). (A) Overall workflow, (B) generator network, (C) discriminator network. (Key terms: G: generator, D: discriminator, x: conditional data, z: noise vector, y: ground truth, G(x): predicted segmentation map.)

Fig. 2-3. Flow diagram of the proposed model with loss functions.

Fig. 2-4. Qualitative evaluation of cross-training results on the STARE and DRIVE datasets. From left to right: input test fundus image, corresponding manual annotation, corresponding predicted probability maps.

Fig. 2-5. Visual interpretation of the credibility of the proposed model on the STARE and DRIVE datasets. Columns 1, 3: input test images; columns 2, 4: corresponding probability maps evaluated in terms of TP (green), FN (red), and FP (blue).

Fig. 2-6. Exemplar results of the proposed model on challenging cases: (a) central reflex vessels, (b) cotton wools, (c) low contrast, (d) lesions. From top to bottom: input fundus image, enlarged target patch of the fundus image, corresponding manual annotation, and the predicted probability maps.

Fig. 2-7. Sample results of the proposed CPGAN on the STARE (columns 1-2), DRIVE (columns 3-4), and CHASEDB1 (columns 5-6) datasets. Row 1: input; row 2: ground truth; row 3: probability maps; row 4: probability maps after applying Otsu automatic thresholding.

Fig. 3-1. Brain-computer interface pipeline.

Fig. 3-2. Flow diagram of the proposed method, consisting of progressive preprocessing, feature extraction, and classification modules. Input is multi-class multi-channel motor imagery EEG data and output is the class confidence. Key terms: FFTEM: Fast Fourier Transform Energy Maps; LSTM: Long Short-Term Memory.

Fig. 3-3. EEG signal segment generation and FFT energy map generation.

Fig. 3-4. Long short-term memory (LSTM) unit.

Fig. 3-5. Paradigm for extraction of a single trial.

Fig. 3-6. Confusion matrices for a few representative runs of the proposed model compared with different classifiers on dataset 2a of BCI Competition IV [144].

Fig. 4-1. The proposed Conditional Patch Generative Adversarial Network (CPGAN) framework for retinal vessel segmentation. (Key terms: G: generator, D: discriminator, X: conditional data, Z: noise vector, Y: synthetic data.)

Fig. 4-2. Precision-recall and receiver operating characteristic (ROC) curves on the DRIVE [170] and STARE [171] datasets.

Fig. 4-3. Comparison of the proposed model with DRIU. First row: DRIVE dataset; second row: STARE dataset. First and third columns: results of the DRIU [161] model; second and fourth columns: results of the proposed model. Green, blue, red marks: true positives, false positives, and false negatives.

Fig. 4-4. Visual comparison of the obtained results. Column 1: fundoscopic image; column 2: ground truth; column 3: DRIU [161] results; column 4: results of the proposed method. Rows 1-2: DRIVE dataset; rows 3-4: STARE dataset.

Fig. 5-1. Proposed model. Input: 4-class motor imagery data; output: class confidence.

Fig. 5-2. Temporal segmentation and sequence of FFT energy maps (EM).


LIST OF ABBREVIATIONS

AUC Area Under the Curve

BCI Brain computer interface

CAR Class Association Rule

CNN Convolutional Neural Network

CSR Corporate Social Responsibility

DBN Dynamic Bayesian Network

DIC Deviance Information Criterion

DL Deep Learning

DMML Data Mining and Machine Learning

DT Decision Tree

FLM Fast Large Margin

FPR False Positive Rate

GB Gradient Boost

GLM Generalized Linear Model

NNET Neural Network

ROC Receiver Operating Characteristic

RMSE Root Mean Squared Error

SVM Support Vector Machine

TPR True Positive Rate


摘要

Deep-learning-based techniques keep emerging in various fields of science and engineering and can deliver competitive results. This thesis applies deep learning to two domains, retinal vessel segmentation and motor imagery classification, with the following main results.
1. Retinal blood vessels are diagnostic biomarkers of ophthalmologic disease and diabetic retinopathy, and vessel thickness can be used for the diagnosis and monitoring of retinal diseases. Most current deep learning methods segment retinal vessels with a unified loss function. However, the difference in spatial features between thick and thin vessels and their uneven distribution create a thickness imbalance that makes the unified loss function useful only for thick vessels. To address this challenge, this thesis proposes a patch-based generative adversarial network that iteratively learns thick and thin vessels in fundoscopic images. An additional loss function is introduced into a combined objective, allowing the generator network to learn both thin and thick vessels while the discriminator network assists in segmenting both kinds of vessels. On the STARE, DRIVE, and CHASEDB1 datasets, the proposed model achieves better results in accuracy, sensitivity, and specificity than the state of the art.
2. Electroencephalography (EEG) is a brain-computer interface (BCI) technique for measuring brain activity. EEG records the electrical activity of the brain non-invasively, and this information can be used to build BCIs. Over the past decade, EEG has attracted the attention of many researchers as a tool for recognizing human behavior. However, temporal information is rarely exploited in multi-class (more than two classes) motor imagery classification. This thesis proposes a long-short-term-memory-based deep learning model to learn hidden sequential patterns. Two types of features serve as inputs to the proposed model: Fourier Transform Energy Maps (FTEMs) and Common Spatial Pattern (CSP) filters. Multiple experiments were conducted on a publicly available dataset. Extracting spatial and temporal features with CSP filters and FTEMs allows the sequence-to-sequence model to learn hidden sequential features. The proposed method was trained, evaluated, and optimized on the public dataset, yielding a mean kappa coefficient of 0.81. The results demonstrate the robustness of the model and a relatively high classification accuracy that is usable in practice.


ABSTRACT

Deep learning-based techniques are emerging in various fields and can provide competitive outcomes. This study deals with the application of deep learning to two domains: retinal vessel segmentation and motor imagery classification. Retinal blood vessels, the diagnostic biomarkers of ophthalmologic disease and diabetic retinopathy, provide thick and thin vessels for diagnostic and monitoring purposes. Existing deep learning methods attempt to segment the retinal vessels using a unified loss function. However, the difference in spatial features between thick and thin vessels and their biased distribution create a thickness imbalance, rendering the unified loss function useful only for thick vessels. To address this challenge, a patch-based generative adversarial network technique is proposed which iteratively learns both thick and thin vessels in fundoscopic images. It introduces an additional loss function that allows the generator network to learn thin and thick vessels, while the discriminator network assists in segmenting out both vessels, as a combined objective function. Compared with state-of-the-art techniques, the proposed model demonstrates enhanced accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve on the STARE, DRIVE, and CHASEDB1 datasets.

Electroencephalography (EEG) is a brain-computer interface (BCI) method that measures brain activity by recording the electrical activity of the brain non-invasively; this information can be used to build BCIs. Over the last decade, EEG has attracted researchers' attention as a means of distinguishing human activities. However, temporal information has rarely been exploited for multi-class (more than two classes) motor imagery classification. This research proposes a long-short-term-memory-based deep learning model to learn the hidden sequential patterns. Two types of features are used to feed the proposed model: Fourier Transform Energy Maps (FTEMs) and Common Spatial Pattern (CSP) filters. Multiple experiments have been conducted on a publicly available dataset. Extraction of spatial and spectro-temporal features using CSP filters and FTEMs allows the sequence-to-sequence model to learn the hidden sequential features. The proposed method is trained, evaluated, and optimized on a publicly available benchmark data set, yielding a mean kappa value of 0.81. The obtained results demonstrate the model's robustness to artifacts and its suitability for real-life applications with comparable classification accuracy.

Keywords: Deep Learning Based Techniques, Fundoscopic Images, Electroencephalography (EEG), Brain Computer Interface (BCI)


1 Introduction

1.1 Brief Overview

A powerful deep learning model, a conditional patch-based generative adversarial network (CPGAN), has been created to help segment retinal blood vessels in fundoscopic pictures. For segmentation performance, it is crucial to allow the discriminator to learn to separate vascular from non-vascular pixels, and to train the generator to learn the micro-transitions in thin vessels. For diagnostic and monitoring purposes, retinal blood vessels, the diagnostic biomarkers of ophthalmologic disease and diabetic retinopathy, provide thick and thin vessels. Present deep learning approaches try to segment the retinal vessels using a unified loss function. The brain-computer interface (BCI) method of electroencephalography (EEG) monitors brain functions. EEG is a non-invasive way of recording the electrical activity of the brain, and this information can be utilized to create BCIs.

Retinal fundus pictures are useful in the diagnosis and treatment of cardiovascular and ophthalmic disorders. However, manual examination of a retinal fundus picture takes time and requires empirical expertise. As a result, automated analysis of retinal fundus pictures is required. Because several characteristics of retinal blood vessels, such as width, tortuosity, and branching pattern, are key signs of illness, retinal blood vessel segmentation is the core task of retinal fundus image processing. Furthermore, segmentation of retinal blood vessels is beneficial for other applications such as optic disc detection: based on the locations of the vessels, the optic disc and fovea in the fundus picture may be identified by their proximity to blood vessels.
Motor imagery is now commonly utilized as a therapy to promote motor learning and neurological rehabilitation in stroke patients, and it has also been shown to be effective in musicians. Warming up, relaxation and focus, and then mental simulation of the precise exercise are all part of this type of practice. According to a recent analysis of four randomized controlled studies, there is limited evidence supporting the additional effect of motor imagery in patients with stroke compared to conventional physiotherapy alone. Motor imagery can function as a substitute for imagined behavior, resulting in comparable effects on cognition and behavior.
A patch-based generative adversarial network approach is presented for iteratively learning both thick and thin vessels in fundoscopic pictures. As a combined objective function, it provides an extra loss term that helps the generator network to learn thin and thick vessels, while the discriminator network assists in segmenting out both vessels. The suggested model combines this objective function with a patch discriminator and a conditional sampling technique. To learn the underlying sequential patterns, this study also offers a deep learning model based on long short-term memory. The proposed model is fed by two types of features: Fourier Transform Energy Maps (FTEMs) and Common Spatial Pattern (CSP) filters. Several tests were carried out using a publicly available dataset. By sequentially feeding these samples to the model, it can learn the most discriminative features to categorize multi-class EEG data. The deep learning model is composed of many building blocks that include an information retention mechanism, which enables the model to acquire contextual and temporal information in order to understand and categorize the samples. In summary, motor imagery classification uses a long-short-term-memory-based deep learning model to learn the hidden sequential patterns, while vessel segmentation uses a patch-based generative adversarial network technique that iteratively learns both thick and thin vessels in fundoscopic images. The sequential feeding of the samples in motor imagery classification allows the model to learn the most discriminative features to classify multi-class EEG data. Similarly, multi-model or multi-scale frameworks are required to detect arteries (thick vessels) and veins (thin vessels). A comparison of state-of-the-art methods with the proposed LSTM- and CNN-based methods is mandatory, both in terms of mean kappa value and computational complexity.
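To make the combined objective concrete, the standard conditional-GAN form of such a loss is sketched below. This is a minimal sketch of the kind of objective described, with λ an assumed weighting hyperparameter; the exact objective used by CPGAN is defined in Chapter 2.

```latex
\mathcal{L}_{\mathrm{GAN}}(G,D)
  = \mathbb{E}_{x,y}\left[\log D(x,y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x,z))\right)\right],
\qquad
G^{*} = \arg\min_{G}\max_{D}\;
  \mathcal{L}_{\mathrm{GAN}}(G,D) + \lambda\,\mathcal{L}_{\mathrm{seg}}(G)
```

Here x is the conditional fundus image, z the noise vector, and y the ground-truth vessel map (the notation of Fig. 2-2); the additional pixel-wise term between G(x,z) and y gives the generator a direct segmentation signal covering both thin and thick vessels.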

Research Objectives

▪ The latest study on multiclass motor imagery classification offered a convolutional neural-network-based model that was trained for four MI classes in binary mode and outperformed traditional machine learning models.
▪ Most techniques only deal with two-class motor imagery classification. The four-class motor imagery classification problem still has a significant research gap. To identify the highly nonlinear motor imagery signals, a well-generalized deep learning technique is needed, in addition to identifying hidden patterns and the effect of temporal relationships among small segments.
▪ The existence of lesions can decrease the color difference between vessels and background, which is a problem.
▪ Another issue is the segmentation of thin vessels, which adds comprehensive detail to fundoscopic images when they are detected.
▪ A CNN model was trained to categorize pixels into background, artery, vein, or uncertain classes, casting the artery/vein labeling task as a four-class segmentation problem.
▪ In the process of binarizing the produced probability maps, blurring problems can remove thin vessels from the segmentation maps (see the thresholding sketch at the end of this subsection).

The classification of vessel detection tasks and motor imagery signals in BCI research is complicated by the following factors, which are the main challenges encountered and resolved.
▪ The merging of parallel vessels and crossover regions.
▪ The presence of cotton wools and lesions.
▪ The presence of vascular structures with diluted boundaries and poor contrast.
▪ The contamination of recorded brain signals by noise.
▪ The low amplitude and low signal-to-noise ratio of MI-EEG signals, which pose a classification problem.
▪ EEG signals, both real and imagined, are subject-specific and differ from one subject to the next.
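Because the binarization of probability maps noted above can erase thin vessels, the thresholding step itself matters. Below is a minimal sketch of converting a generator's probability map into a binary vessel mask with Otsu's automatic threshold, the method applied in Fig. 2-7; `prob_map` here is a random stand-in, not real model output.

```python
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(0)
prob_map = rng.random((584, 565))   # stand-in for a DRIVE-sized probability map

t = threshold_otsu(prob_map)        # data-driven global threshold
vessel_mask = prob_map > t          # binary segmentation map
print(f"threshold = {t:.3f}, vessel pixels = {vessel_mask.sum()}")
```

How the threshold is chosen directly controls how many low-probability thin-vessel pixels survive binarization.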

Research Methodology

A deep learning-based system comprises preprocessing, handcrafted feature extraction, a deep neural network for deep feature computation, and a pattern-based classification stage. Each stage uses several sub-stages to perform specific tasks, and all phases are optimized to achieve the best classification results. Sequential feeding of the samples allows the model to learn the most discriminative features to identify multi-class EEG data. To capture satisfactory details of images, a conditional patch-based discriminator with an additional loss term is proposed. Generator G and discriminator D are the two networks that make up the proposed model. The discriminator network divides all of its inputs into patches and attempts to discriminate each patch separately.
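A minimal PyTorch sketch of such a patch discriminator follows. The channel widths, depth, and implied patch (receptive-field) size are illustrative placeholders, not the configuration evaluated in Chapter 2.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Scores image patches rather than the whole image.

    A fully convolutional stack maps the (conditional image, segmentation
    map) pair to a grid of patch logits; averaging the grid yields the
    final real/fake judgement, as described above.
    """
    def __init__(self, in_channels=4):  # e.g. 3-channel fundus image + 1-channel map
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),  # one logit per patch
        )

    def forward(self, conditional_image, segmentation_map):
        pair = torch.cat([conditional_image, segmentation_map], dim=1)
        patch_scores = self.net(pair)             # (N, 1, H', W') grid of patch logits
        return patch_scores.mean(dim=[1, 2, 3])   # average patch scores -> one score
```

Because the network is fully convolutional, each output logit depends only on a local region of the input pair, which is what lets the discriminator judge thin-vessel detail locally.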

Importance and Novel Contributions to the Field

Regularized Linear Discriminant Analysis (RLDA)-based combined models have been used to integrate temporal, spatial, and spectral information, yielding a 0.74 mean kappa score. Similarly, another recent study used FBCSP to extract spatial features and a deep fusion network based on the LSTM network to integrate sequential data. With a mean kappa score of 0.80, their proposed method came in second. This demonstrates the significance of spatio-temporal features and LSTM networks: using spatio-temporal features, they were able to extract relevant and discriminative features.
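For reference, the kappa scores quoted here and throughout are Cohen's kappa, which corrects raw accuracy for chance agreement:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where p_o is the observed agreement (classification accuracy) and p_e the agreement expected by chance. For a balanced four-class motor imagery task, p_e = 0.25, so an accuracy a corresponds to kappa = (a - 0.25)/0.75.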

In the segmentation of thin and thick vessels, CPGAN is able to detect low-contrast thin vessels that were not marked as vessels by the first human annotator but were marked as vessels by the second human annotator. These findings support the CPGAN model's accuracy in segmenting thick and thin retinal vessels. The ability to capture thin vessels is a key feature of the proposed model.

In the deep learning-based system, each stage uses several sub-stages to perform specific tasks, and all phases are optimized to achieve the finest classification results. The model acquires the most discriminative features to identify multi-class EEG data thanks to the sequential feeding of these samples. To capture adequate details of images, a conditional patch-based discriminator with an extra loss term is suggested. Generator G and discriminator D are the two networks that make up the proposed model.

Monitoring and diagnosis depend completely on expert examination of both thin and thick retinal vessels, which has recently been carried out by numerous artificial intelligence methods. Deep learning approaches currently in use aim to segment retinal vessels by means of a single loss mechanism that prioritizes thin and thick vessels equally. Because of the variable thickness, biased distribution, and differences in the spatial features of thin and thick vessels, the unified loss mechanism has a stronger influence on thick vessel recognition, resulting in poor segmentation.

To segment out the retinal vessels with noise filtration capability, a hybrid active contour-based model was proposed. The proposed model was applied to the entire fundoscopic image [1], and the results were analyzed for thin and thick vessels. The synthetic segmentation maps are generated using noise and conditional input samples. The discriminator accepts two pairs of data as input: the conditional input with the synthetic map created by the generator, and the conditional input with the real segmentation map (ground truth). The discriminator network divides all of its inputs into patches, attempts to discriminate each patch, and makes a final judgement by averaging the scores of all patches.

To learn the underlying sequential patterns [2], this study offers a deep learning model based on long short-term memory. The model is fed by two types of features: Fourier Transform Energy Maps (FTEMs) and Common Spatial Pattern (CSP) filters. Many tests were carried out on a publicly available dataset. The proposed sequence-to-sequence model can learn the hidden sequential characteristics by extracting spatial and spectro-temporal information using CSP filters and FTEMs. The suggested technique was trained, assessed, and optimized on a publicly accessible benchmark data set, yielding a mean kappa value of 0.81.
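As a rough illustration of such a sequence model (not the tuned architecture of Chapter 3), a trial-level classifier can be sketched in PyTorch as follows; the feature dimension, memory-cell count, and class count are placeholder values.

```python
import torch
import torch.nn as nn

class MotorImageryLSTM(nn.Module):
    """Sequence classifier of the kind described above.

    Each trial is a sequence of per-window feature vectors (e.g. flattened
    FFT energy maps concatenated with CSP features); an LSTM summarizes
    the sequence and a linear head outputs 4-class scores.
    """
    def __init__(self, feature_dim=128, hidden_size=64, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                 # x: (batch, time_steps, feature_dim)
        _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden_size)
        return self.head(h_n[-1])         # class logits per trial
```

Feeding the windows in order is what lets the memory cells accumulate the temporal context that frame-wise classifiers discard.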


Transformers use Convolutional Neural Networks [3] in conjunction with attention models to try to tackle the challenge of parallelization. Attention accelerates the model's ability to translate from one sequence to another; in particular, the Transformer employs self-attention. The Transformer's internal architecture is similar to that of previous sequence-to-sequence models [4], but it has six encoders and six decoders. All the encoders share the same architecture, and the decoders are likewise quite similar to one another.
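A minimal sketch of the scaled dot-product self-attention mechanism referred to above is given below; the projection matrices and shapes are illustrative assumptions.

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) learned projections.
    Every position attends to every other position at once, which removes
    the step-by-step bottleneck and enables parallelization.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # pairwise similarities
    weights = torch.softmax(scores, dim=-1)   # attention distribution per position
    return weights @ v                        # weighted sum of value vectors
```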

1.2 Literature Survey

Deep learning-based techniques are emerging in various fields and can provide competitive outcomes. This study deals with the application of deep learning to two domains: vessel segmentation and motor imagery classification. Deep learning is a machine learning and artificial intelligence (AI) technique that mimics how humans acquire knowledge. Data science, which covers statistics and predictive modelling, incorporates deep learning as a key component. Deep learning is highly useful for data experts who are tasked with gathering, analyzing, and understanding immense quantities of data; it speeds up and simplifies the process. "Deep" refers to the number of hidden layers in a neural network: traditional neural networks have only two or three hidden layers, whereas deep neural networks may contain up to 150. Deep learning models are trained with a huge amount of labelled data and neural network topologies that learn features from the input without the requirement for manual feature extraction. At its most basic level, deep learning can be regarded as a means to automate predictive analytics. Deep learning algorithms, unlike traditional machine learning algorithms, are stacked in a hierarchy of increasing complexity and abstraction. Deep learning algorithms go through a process similar to a child learning to recognize a dog. Each algorithm in the hierarchy applies a nonlinear transformation to its input before generating a statistical model as an output. Iterations are performed until the result is accurate enough to be useful; "deep" refers to the number of processing layers through which the data must flow.


Application of Deep Learning onto Vessel Segmentation

The retina, which lines the interior of the eye, is a complex tissue. It converts received light into action potentials, which the brain's visual centers subsequently process. In vivo, blood vessels in the retina may be consistently recognized without invasive procedures [5]. In medicine, imaging the retina and creating algorithms to evaluate such pictures are incredibly important. In the last two decades, technological advancements have led to the development of digital retinal imaging devices [6]. The diameter, length, branching angle, and tortuosity of retinal veins and arteries have diagnostic value and can be used in the monitoring, treatment, and evaluation of numerous cardiovascular and ophthalmologic illnesses [7,8]. Automatic segmentation of retinal vessels is the first stage in the development of a computer-aided diagnostic system for ocular illnesses [9], because manual blood vessel segmentation is an inefficient and repetitive process that needs training and expertise. Because some eye disorders influence the vessel tree itself, automatic segmentation of blood vessels in retinal pictures is critical in the identification of a variety of eye disorders. In some circumstances (for example, pathological lesions), excluding the blood vascular tree from the analysis may increase the performance of automatic detection approaches. As a result, every automated screening system must have automatic vessel segmentation [10]. Traditional supervised approaches usually have two steps: feature extraction and classification. Because feature selection has a significant impact on segmentation, discovering the ideal set of features (one that reduces segmentation error) is a tough issue. Convolutional neural networks (CNNs) have recently been used to segment pictures, allowing feature extraction to be learnt from data rather than manually constructed. In a variety of applications, these methods yield state-of-the-art outcomes [11].

Image-driven methods (such as edge-based and region-based methods), pattern recognition methods, model-based methods, tracking-based methods, and neural-network-based methods are among the algorithms for segmenting vessel-like structures in medical images [12]. Pattern recognition techniques discussed in [13] include matched filtering, vessel tracking, mathematical morphology, multiscale methods [13,14], model-based methods, as well as parallel/hardware-based methods. In clinical imaging, vessel segmentation approaches based on image processing techniques have long been used to outline the vascular tree. Many vessel segmentation algorithms exist [15], where noninvasive methods allow for the external evaluation of blood vessel condition and structure. More complex approaches for recognizing the vascular tree based on learning networks, including more recent breakthroughs in Deep Neural Networks (DNNs) [18], have superseded segmentation methods with limited success rates. The segmentation of retinal blood vessels has resulted in a vast number of algorithms and methodologies being published, and a number of review publications [16,19] have documented and characterized these advances. Many works have evaluated algorithms on the DRIVE database, making it possible to review past findings and see which approaches prevailed, as well as how strongly neural networks are represented. In a variety of domains, including neuroimaging, multiple deep learning models have demonstrated tremendous potential for picture classification and segmentation tasks. Deep learning-based segmentation algorithms are gaining popularity due to their ability to self-learn and generalize from massive data sets.

The state of the retinal vascular system is a reliable biomarker for a variety of ophthalmologic and circulatory disorders [20], so automated vessel segmentation may be necessary for diagnosis and monitoring. To deal with the changing width and orientation of the vascular structure in the retina, one technique combines the multiscale analysis given by the Stationary Wavelet Transform with a multiscale Fully Convolutional Neural Network. The solution employs rotation operations as the foundation of a joint method for both data augmentation and prediction, allowing the segmentation to be enhanced by exploiting the information gained during training.

Manual vessel segmentation by experienced professionals, on the other hand, is a time-consuming and tedious operation. Many methods for automatically segmenting the retinal vasculature have been developed during the last two decades. For example, [21] retrieved a 7-D vector for pixel representation composed of gray-level and moment-invariant-based features, then classified each pixel by means of a neural network technique. The final vessel segmentation in [22] was achieved using a multi-scale line detection method. [23] presented a fully-connected Conditional Random Fields (CRFs)-based automated vessel segmentation technique that learned its configuration by means of a structured-output SVM. The majority of existing techniques, on the other hand, are based on manually generated features that lack effective discrimination and are easily influenced by diseased areas in fundus pictures.

Deep learning (DL) architecture is made up of numerous linear and non-linear data transformations with the objective of producing more abstract and, eventually, more usable representations. It has been shown to be effective in a variety of computer vision applications. CNNs are deep learning architectures that are improving not just whole-image classification [24], but also local tasks with structured outputs [25].

[26] showed that fully convolutional networks trained end to end, pixels to pixels, on semantic segmentation outperformed the most advanced approaches without the need for additional machinery. A holistically nested edge detection (HED) technique based on fully convolutional networks automatically learns rich hierarchical representations with deep supervision, resolving the uncertainty in edge and object-boundary recognition. Retinal vascular segmentation can thus be turned into a boundary detection issue, using a fully convolutional architecture to train discriminative features that well describe the essential concealed outlines in fundus pictures corresponding to vessels and background.

[27] proposed a comparable DL-based vessel segmentation study that turned retinal vascular segmentation into a pixel-level binary classification job. To categorize each pixel in the fundus picture, they utilized a deep neural network. Nevertheless, it has two drawbacks: first, the classification of each separate pixel captures less than the global consistency correlation, causing the process to fail in some local pathological areas; second, the pixel-to-pixel classification approach devotes a significant amount of time to both the training and testing stages.

The automatic segmentation of retinal vessels in fundus pictures can help doctors diagnose illnesses like diabetes and hypertension. The Deformable U-Net (DUNet) performs retinal vascular segmentation in an end-to-end way, leveraging the local characteristics of the retinal vessels with a U-shaped design. Inspired by the recently presented deformable convolutional networks, deformable convolution is incorporated into the network. The DUNet is intended to extract context information and enable precise localization by merging low-level and high-level features, using upsampling operators to increase output resolution.

For such studies, imaging methods such as optical imaging and computed tomography (CT) are routinely used. It is frequently required to obtain accurate quantitative information on the associated vascular shape in order to make valid diagnostic and prognostic judgements. As a result, precise and reliable picture segmentation algorithms are critical. However, there are two significant obstacles in the way of a reliable and effective system. First, because of the enormous complexity of biological vasculatures, manual identification of vascular trees is frequently infeasible for clinical practice, since it is too time-consuming and laborious.

Second, vascular structures look different in medical imaging depending on the modality and patient. As a result, a broad description of the vasculatures is difficult to come by, and most traditional approaches rely on certain assumptions about the formation of a candidate structure and are tailored to a specific job using exclusive models. These models are generally meticulously built to capture a certain goal, and they may need to be tweaked based on structural size. Such techniques are frequently restricted in their robustness due to the structural complexity and appearance unpredictability, and require additional processing stages for improved results.

Deep learning has been shown in a variety of applications to be capable of capturing the characteristics of complicated shape and appearance patterns [28], outperforming most state-of-the-art approaches. By utilizing deep learning, the challenge of mathematically modelling the appearance is moved to the acquisition of adequate training data, which is more tractable in most situations. Recent research demonstrated the usefulness of CNNs for vessel recognition, but the efficiency of such algorithms, as well as their capacity to create a tree structure, can be a difficulty. Multiple studies have looked at the resilience and efficiency of probabilistic tracking techniques.

The segmentation of vessels is an important step in a diversity of medical applications. This study offers a deep learning framework for improving retinal vascular segmentation performance. Deep learning architectures have been shown to have a strong capacity to learn rich hierarchical representations automatically. In this study, fully convolutional neural networks (CNNs) are used to create a vessel probability map by transforming vessel segmentation into a boundary detection issue [29]. The vessels and background in low-contrast areas are distinguished by the vessel probability map, which is resilient to diseased regions in the fundus picture. In addition, to integrate the discriminative vessel probability map with long-range interactions between pixels, a fully connected Conditional Random Field (CRF) is used.

Application of Deep Learning onto Imagery Classification

Single-trial motor imagery classification is an imperative element of brain-computer applications. As a result, signal characteristics comprising motor imagery movements must be extracted and discriminated. When creating these sorts of motor-imagery-based brain-computer interface applications, Riemannian-geometry-based feature extraction algorithms are useful. Riemannian geometry is primarily utilized with covariance matrices in information theory. Research has revealed that when the method is applied after the filter bank method, the covariance matrix maintains the signal's frequency and spatial statistics.
The brain-computer interface (BCI) is gaining popularity in a variety of domains, including neuroscience, neuroimaging, rehabilitation medicine, pattern recognition, signal processing, and machine learning [30]. One of the main areas of BCI investigation is to develop a new signal channel to restore or augment some functions in people with severe neuromuscular problems [31]. Motor imagery is a major study issue in the realm of BCI, which involves mentally simulating a certain activity, such as visualizing limb motions [32]. The mental simulation is started by an external event, such as a visual cue, which generates event-related synchronization (ERS) and desynchronization (ERD) in the sensorimotor rhythms at different regions across the scalp at the same time. In the lab, several brain activity monitoring techniques can be employed to investigate these occurrences. One of the most popular devices for capturing such brain waves is electroencephalography (EEG), which is noninvasive and relatively straightforward to use [31]. Nevertheless, EEG records often have a low signal-to-noise ratio (SNR) owing to the volume-conduction effect, making it difficult to correctly analyze brain dynamics and distinguish distinct motor imageries [33]. 2D cursor control [34], wheelchair control [37], quadcopter control [36], and other exciting applications based on motor imagery classification have been demonstrated. As a result, much effort has gone into motor imagery feature extraction and classification during the last decade in order to make better use of it [37,40].
Only a few studies, to the authors' knowledge, have applied a deep learning approach to BCI research. Employing spectral power characteristics of very low dimension, Li et al. proposed using a denoising autoencoder to classify EEG [41]. However, there have been no systematic studies of algorithm performance with various network parameters, and no benchmark dataset was investigated in [41]. An et al. [42] developed a deep learning strategy based on deep belief networks (DBNs) and AdaBoost, in which a DBN was trained for each EEG channel and these channels were then boosted together. In [42], no baseline dataset is included, and time-domain data is fed directly to each DBN for training. In [43], a prior-supervised convolutional stacked auto-encoder (PCSA) for ECoG classification that incorporated label information in the corresponding phase was devised. DBN was reported to be superior to PCA for ECoG data correlation analysis by Freudenberg et al. [44]. Using stacked autoencoders, power spectral density, and principal component analysis (PCA), an EEG-based emotion identification technique was suggested in [45]. A deep learning strategy has been developed for recognizing target imagery in a rapid serial visual presentation (RSVP) challenge based upon EEG monitoring [46]; uncorrelated discriminant features from linear discriminant analysis (LDA) are fed into the deep neural network, with performance measured by the area under the ROC curve (AUC). Yang et al. [47] employed a convolutional neural network (CNN) with an augmented common spatial pattern filter (ACSP) to categorize motor imagery; superior performance was achieved compared to the filter-bank common spatial pattern (FBCSP). Wulsin et al. [48] created a DBN system that uses raw EEG information to detect epileptiform discharges and seizure-like activity. Raw data input is effective for EEG pattern identification, according to [47], and can attain classification performance analogous to conventional techniques.
Motor imagery classification is a key topic in brain-computer interface (BCI) research because it permits the identification of a subject's intent to, for example, control a prosthesis. Electroencephalography (EEG) is used to quantify the brain dynamics of motor imagery as a nonstationary time series with a poor signal-to-noise ratio. Although a number of approaches for learning EEG signal characteristics have been established in the past [49], deep learning has seldom been used to build novel representations of EEG features and enhance performance for motor imagery classification.
An external event, such as a visual cue, is used to initiate the mental simulation, which causes event-related synchronization (ERS) and event-related desynchronization (ERD) in the sensorimotor rhythms at various locations throughout the scalp at the same time. Various brain activity monitoring techniques can be used to investigate these events in the lab. Electroencephalography (EEG), which is noninvasive and very simple to use, is one of the most common methods for recording such brain waves [50]. However, owing to the volume-conduction effect, EEG recordings often have a poor signal-to-noise ratio (SNR), making it difficult to correctly comprehend brain dynamics and categorize distinct motor imageries [51].
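Since ERS/ERD manifests as band-power changes in these rhythms, a per-window, per-channel spectral energy of the following kind is the basic building block of the FFT energy maps used in Chapters 3 and 5. This is a minimal sketch; the sampling rate and the mu/beta band shown are typical values, not necessarily the settings used later.

```python
import numpy as np

def band_energy(eeg_window, fs=250.0, band=(8.0, 30.0)):
    """Spectral energy of one EEG-channel window in a frequency band."""
    spectrum = np.fft.rfft(eeg_window * np.hanning(len(eeg_window)))
    freqs = np.fft.rfftfreq(len(eeg_window), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sum(np.abs(spectrum[mask]) ** 2))

# toy usage: one second of synthetic signal at 250 Hz
rng = np.random.default_rng(0)
print(band_energy(rng.normal(size=250)))
```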
There are five types of motor imagery classification methods [52]: (1) linear classifiers, such as linear discriminant analysis (LDA) and the support vector machine (SVM); (2) nonlinear Bayesian classifiers; (3) nearest neighbor classifiers; (4) neural network methods; and (5) combinations of classifiers. The starting weights for conventional neural network techniques must be properly set, which is one of the biggest barriers to their wider implementation. Large starting weight values may result in unsatisfactory local minima, whilst tiny values may render the multilayer system untrainable owing to gradient diffusion. A new class of techniques and methods known as deep learning has recently been created and has become popular in both academia and industry to solve this challenge and build neural networks with high expressive power.
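As a hedged illustration of category (1), the snippet below cross-validates LDA and an SVM with scikit-learn on synthetic stand-in features; because the data is random, the scores hover near the 0.25 chance level of a four-class task, whereas in practice X would hold per-trial CSP or band-power features.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))      # stand-in for per-trial spatial features
y = rng.integers(0, 4, size=200)   # four motor imagery classes

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("SVM", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name} accuracy: {acc:.2f}")   # ~0.25 on random data
```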
The main building blocks of the deep learning structure are the restricted Boltzmann machine (RBM) and the autoencoder [53], which are trained layer by layer and may be used to form deep neural networks. Contrastive divergence (CD) was created particularly for training an RBM, grounded on Gibbs sampling theory. In the pretraining step, the features collected by the RBM or autoencoder are used to initialize the multilayer neural network, which can result in a significant performance increase. Error backpropagation is then used to fine-tune the weights in the deep neural network based on the pre-training outcomes.
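To make the contrastive-divergence step concrete, here is a minimal NumPy sketch of one CD-1 weight update for a binary RBM; biases and momentum are omitted, and the names and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def cd1_update(v0, W, lr=0.01):
    """One CD-1 step: a single Gibbs chain v0 -> h0 -> v1 -> h1
    approximates the log-likelihood gradient."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    h0 = sigmoid(v0 @ W)                               # hidden probabilities
    h0_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h0_sample @ W.T)                      # reconstruction
    h1 = sigmoid(v1 @ W)
    # positive phase minus negative phase, averaged over the batch
    grad = (v0.T @ h0 - v1.T @ h1) / v0.shape[0]
    return W + lr * grad

W = rng.normal(scale=0.01, size=(784, 100))            # visible x hidden weights
W = cd1_update(rng.integers(0, 2, size=(32, 784)).astype(float), W)
```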
The presence of noise makes it difficult to classify EEG data; both external and internal noise can affect EEG readings. Independent component analysis [54] is one of several techniques proposed to decrease noise in EEG data. Combinations of these characteristics are utilized to produce subject-dependent spatial and frequency-based features. Filter banks and CSPs were employed in the filter bank CSP (FBCSP) method [55]. Feature extraction and classification techniques based on Riemannian geometry have recently gained momentum in BCI applications; the distance between the covariance matrix and the Riemannian mean covariance matrix was utilized as a classification feature by the same authors. Because of the intricacy of the recording and the restricted quantity of signals, deep-learning-based classification techniques are seldom used in BCI applications. Bashivan et al. [56] utilized power spectral densities based on three frequency ranges of EEG data to create pictures for each range by interpolating topological characteristics that preserved the scalp surface. They combined 1D convolutions and long short-term memory (LSTM) [57] layers with the VGG (visual geometry group) model. When compared to other designs, the ConvNet and LSTM/1D-Conv architectures produced the greatest results [58].

1.3 Structure of Thesis

Chapter 2 presents a study on Conditional Patch-Based Generative Adversarial Network for Retinal Vessel Segmentation.

Chapter 3 investigates Sequence-to-Sequence Deep Neural Network with Spatio-Spectro and Temporal Features for Motor Imagery Classification.

Chapter 4 examines Conditional Patch-based Generative Adversarial Network for Retinal Vessel Segmentation.

Chapter 5 explores Recurrent Deep Learning for EEG-based Motor Imagination Recognition.

Chapter 6 gives overall conclusions of this dissertation.

2 Retinal Vessel Segmentation

2.1 Introduction and Related Works

Fundus images of retinal blood vessels show the directly observable, non-invasive tree
structure of the human circulatory system and can easily be photographed. Fundus images
have been widely used in the monitoring, analysis, and diagnosis of medical disorders
such as diabetic retinopathy, age-related macular degeneration, and glaucoma. The
structure of vascular trees and morphological attributes such as patterns, thickness, tortuosity,
density, and angles of vessels can assist ophthalmologists in the diagnostic process [59].
Among these features of retinal vessels, thickness plays a crucial role in the diagnosis of
diabetic retinopathy. However, manual detection of these vessels is time-consuming, less
accurate, and requires the involvement of an experienced ophthalmologist. Measurements that
are taken in this way are prone to errors [60], which may be due to fatigue and subject to user
bias. Therefore, computer-aided automatic detection of retinal vessels is necessary, and
segmentation is one way to detect retinal vessels. With color fundus photography as input, the
goal of this development is to push an automated detection system to the limit of producing
binary output images with realistic clinical potential. A retinal image contains the vascular
structure of arteries and veins in the form of trees and branches, as depicted in Fig. 2-1.
Varying width and gradual changes in direction are the basic structures of tubular vessels.
Multi-model or multi-scale frameworks are required to detect both arteries (thick vessels) and
veins (thin vessels) [61].
Major challenges in vessel detection tasks in fundoscopic images are:
• The central light reflex along the center of thick vessels.
• Crossover regions and the merging of parallel vessels.
• The presence of cotton wools and lesions.
• The presence of low-contrast vascular structures with diluted boundaries.

Fig. 2-1. Some of the features associated with retinal fundus images.

An increasing number of studies have proposed computer-aided retinal vessel segmentation
solutions to address these issues. However, improvements are still possible with respect to the
challenges listed above. Our initial results revealed that focusing on both thick and thin vessels
can address these challenges. In the literature, three types of retinal vessel segmentation
solutions exist: unsupervised learning, supervised learning, and deep learning.
Usually, the unsupervised techniques are defined as rule-based methodologies that work
without prior knowledge and labeled ground truths to extract the vessel patterns. The most
widely used approach in unsupervised learning is filter-based.
Filter-based techniques model a vessel as a piece-wise linear structure whose diameter
degrades along the vascular length and enhance the vascular features using a Gaussian filter
or its derivatives (a short code sketch of this idea is given at the end of this survey of
unsupervised methods). An extension of the filter-based approach was proposed [62] to extract
retinal vessel features by combining the matched filter with the first-order derivative of a
Gaussian filter: the matched filter was used to detect retinal vessels using a certain threshold,
and the threshold value was obtained from the first-order derivative of the Gaussian filter.
Similarly, the first-order derivative of a Gaussian filter was applied in four directions to extract
centerlines of retinal vessels, and a multidirectional morphological top-hat operator was used to
detect the retinal vessels [63]. Another advancement was reported [64] in which a trainable
B-COSFIRE (bar-selective combination of shifted filter responses) approach was utilized, and
thresholded segmentation maps were generated by combining a mean-shift operation and a
difference-of-Gaussians (DoG) filter. Further, to obtain orientation-invariant filtering, two
B-COSFIRE filters were combined. Another orientation-invariant approach was proposed in
which researchers extracted energy maps using the Fourier transform and applied an
orientation-aware detector to detect retinal vessels [65]. Zhang et al. [66] used the wavelet
transform to map 2D vascular images into a 3D lifted domain and used a multi-scale
second-order Gaussian filter to detect retinal vessels. A hybrid active-contour-based model with
noise filtration capability was proposed to segment out the retinal vessels [67].
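
As promised above, the following is a minimal sketch of the classical matched-filter idea: a zero-mean Gaussian cross-section kernel is rotated over a set of orientations, and the maximum response per pixel is kept; the parameter values are illustrative assumptions rather than those of the cited methods:

import numpy as np
from scipy.ndimage import convolve, rotate

def matched_filter_kernel(sigma=2.0, length=9):
    # Vessels are darker than the background, hence the negated Gaussian profile;
    # subtracting the mean makes the kernel zero-mean, as matched filtering requires.
    x = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
    profile = -np.exp(-x ** 2 / (2 * sigma ** 2))
    profile -= profile.mean()
    return np.tile(profile, (length, 1))       # replicate along the vessel direction

def vessel_response(green, n_angles=12, sigma=2.0):
    # Maximum response over orientations; thresholding this map yields a
    # rough vessel segmentation of the fundus image's green channel.
    base = matched_filter_kernel(sigma)
    responses = [convolve(green.astype(float), rotate(base, a, reshape=False))
                 for a in np.linspace(0.0, 180.0, n_angles, endpoint=False)]
    return np.max(responses, axis=0)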
Supervised approaches classify every pixel of an image as vascular or non-vascular by
learning from given rules. According to the type of model, existing supervised approaches can
further be categorized into classical machine learning and deep learning. Classical machine
learning methods mainly utilize classifiers that learn decision boundaries on handcrafted
features; these include linear discriminant analysis (LDA), the support vector machine (SVM),
and the K-nearest-neighbor classifier. For retinal vessel segmentation, an SVM classifier was
utilized with a handcrafted feature obtained using two orthogonal line detectors [68]. A BAT
and random forest based algorithm with a 40-dimensional feature vector, including local,
phase, and morphological features, was proposed to extract the retinal blood vessels [69].
In recent years, there has been growing interest in deep learning for classification and
segmentation problems, since deep models automatically learn the complex hierarchy of
features inherent in the input data. Automatic feature learning introduces a paradigm shift:
deep learning models only need optimization rather than the computation of handcrafted
features. Deep learning methods have recently been utilized in biomedical applications,
including the segmentation of retinal blood vessels. For this purpose, Convolutional Neural
Networks (CNNs) are widely used as the building blocks of deep learning models due to their
segmentation performance. Many attempts have been made to prove CNN’s ability for retinal

18
上海大学博士学位论文

vessel segmentation and even they exceeded in segmentation performance as compared to


humans and traditional approaches [70–85]. All these techniques utilized deep learning for
retinal vessel segmentation. A deep neural network was proposed to segment blood vessels and
evaluated on the DRIVE dataset [78]. A fully convolutional neural network with the stationary
wavelet transform was proposed [86], in which the vessels were enhanced using a UNet
architecture with two skip connections. The authors also deployed a stand-alone UNet and
compared its results with the stationary-wavelet-transform variant, achieving comparable
results on the DRIVE [87], STARE [88], and CHASEDB1 [89] datasets. A cross-modality deep
learning technique was proposed in [80] to overcome the lesion problem, as the presence of
lesions can reduce the color difference between vessels and background. Similarly, several deep
learning approaches [80, 82, 90, 91] were deployed and claimed to resolve the lesion and
correctness challenges. These deep learning methods have tended to focus on the general
structure of vessels. An additional problem is the segmentation of thin vessels, since the
detection of thin vessels provides detailed features of fundoscopic images. Several recent
studies [74–77] focus on thin vessels rather than on the entire structure of the vessels. Recent
evidence utilized discriminative dictionary learning (DL) and the fusion of multiple features for
retinal vessel segmentation [72]. Similarly, a deep learning structure called the Gaussian net
(GNET) model, combined with a saliency model [71], was proposed for retinal vessel
segmentation. To classify arteries and veins, a UNet-based method was proposed that takes
such uncertainty into account: the authors formulated the artery/vein classification task as a
four-class segmentation problem, and a CNN model was trained to classify pixels into
background, artery, vein, or uncertain classes [70].
However, these methods suffer from blurring issues and produce false positives around
indistinct and tiny vessel branches. The main reason for these rather contradictory results is that
existing deep learning approaches define a unified loss function in a pixel-wise manner, and a
unified function segments thin and thick blood vessels with the same importance. Blurring can
then remove the thin vessels from the segmentation maps during the binarization of the
generated probability maps. For these reasons, the utilization of additional information or
object-oriented approaches would result in better segmentation performance, as suggested in
[85].
To address these issues, we propose a deep-learning-based approach for retinal vessel
segmentation using a conditional patch-based generative adversarial network (CPGAN) to
learn more discriminative features for both thick and thin vessels. A loss term is integrated with
the main objective function to learn low-frequency edges, and a patch-based discriminator is
utilized to learn the small variations and sharp edges of high-resolution blood vessels. By
introducing the additional loss function and the patch-based discriminator, the proposed model
can effectively segment thick and thin vascular pixels from non-vascular pixels. The proposed
model does not require extra labeling, as suggested in [73], and is trained on the entire image:
rather than separating vessel labels into thin and thick vessels, we train the proposed model on
the entire fundoscopic image and analyze the results for thin and thick vessels. Experiments are
conducted on the publicly available datasets DRIVE, STARE, and CHASEDB1 and compared
with state-of-the-art methods for a fair evaluation.

2.2 Methodology

An efficient end-to-end deep learning approach is proposed for retinal vessel
segmentation by extending the generative adversarial network to a conditional setting, such that
the conditional sample gives the generator a head start in learning retinal vessel maps. In
addition to the conditional sample, we propose a patch-based scheme for the discriminator to
discriminate between the ground truth and the generated synthetic label. The proposed model is
inspired by [92], which proposed a conditional patch-based discriminator integrated with an
additional loss term to translate fine details of images. The architecture of the proposed model
is shown in Fig. 2-2. The proposed model consists of two networks: a generator G and a
discriminator D. The generator takes noise and conditional input samples to generate the
synthetic segmentation maps. The discriminator takes two sets as input: the first set contains a
conditional input and a generated synthetic map, and the second set contains the conditional
input and the actual segmentation map (ground truth). All inputs to the discriminator network
are divided into patches, and the discriminator network tries to discriminate each patch pair.
Averaging the scores of all patches yields the final decision of the discriminator. The combined
objective function of the proposed model, with the patch discriminator, the conditional
sampling scheme, and the model architecture, is discussed in the following subsections.

Objective Function

Fig. 2-2. The proposed Conditional Patch Generative Adversarial Network (CPGAN). (A) Overall
workflow (B) Generator network (C) Discriminator network. (Key terms: G: generator, D:
discriminator, x: conditional data, z: noise vector, y: ground truth, G(x): predicted segmentation
map.)

Given a conditional fundoscopic image sample x of width W and height H, its corresponding
vessel segmentation map y, and a random noise distribution z, the generator network G is a
mapping function which maps the conditional input x and the noise z to a segmentation map
G(x, z), i.e., G : {x, z} → G(x, z). A patch-based discriminator network D takes the two pairs
{x, y} and {x, G(x, z)} as input in the form of patches and discriminates each patch as ground
truth y or synthetic segmentation map G(x, z) by producing a score between one and zero, i.e.,
{1, 0}^n, where n is a hyper-parameter of the model and represents the total number of patches
fed to the discriminator D.

The hyper-parameter n can be chosen between 1 and the total number of pixels of an input
sample. Any other value of n, 1 < n ≤ (W × H), divides the input of the discriminator into n
patches, and the score of the discriminator on a single image is computed by averaging the
scores of all patches.

The loss functions of the generator and discriminator networks can be formulated as

L_G(G, D) = E_{x,z}[ -log D(x, G(x, z)) ]    (1)

L_D(G, D) = E_{x,y}[ -log D(x, y) ] + E_{x,z}[ -log(1 - D(x, G(x, z))) ]    (2)

where z is the input noise vector of the latent space, x is the conditional sample, and y is the
corresponding ground truth. The adversarial value function of the proposed model can be
formulated as

J_CPGAN(G, D) = E_{x,y}[ log D(x, y) ] + E_{x,z}[ log(1 - D(x, G(x, z))) ],
with G* = arg min_G max_D J_CPGAN(G, D)    (3)

Studies have found that the L2 norm, in generative networks, produces blurry images: it
captures the low-frequency components present in fundoscopic images, in the form of smooth
edges, but misses the high-frequency detail in the generated images. In such cases a new
framework is not required; an L1 norm can capture the low-frequency components while
producing less blur. These findings motivated us to integrate an L1 norm into the objective
function to handle low-frequency correctness and to let the discriminator network model only
the high-frequency components.

Fig. 2-3. Flow diagram of the proposed model with loss functions.

The L1 norm used in the generator loss function can be formulated as,

L_L1(G) = E_{x,y,z}[ ||y - G(x, z)||_1 ]    (4)

Combining the objective function and the additional loss term, the final objective function
of the proposed model can be formulated as,

J*_CPGAN = arg min_G max_D J_CPGAN(G, D) + λ L_L1(G)    (5)

where λ is a hyper-parameter. The discriminator D is trained with the objective of maximizing
the probability assigned to the training data and minimizing the probability assigned to samples
obtained from the generator G. The generator is trained to confuse the discriminator between
generated segmentation maps and ground truths. The discriminator and generator can be
trained in alternation using stochastic gradient descent. A flow diagram with the combined loss
function is shown in Fig. 2-3, where the generator loss is computed at the output of the
generator and added to the primary objective function to form a unified combined objective.
Both the noise z and the conditional sample x are required as input to the generator G; the
random noise z can be sampled easily from a Gaussian distribution, but the conditional sample
x must be sampled carefully as conditional data to enforce the training of the generator and
discriminator specifically on the given condition. Any non-standard distribution of the data can
introduce a local-minima problem. The conditional sample is selected from a Gaussian
distribution to ensure that the generator sees a random sample at each training step, which helps
to avoid local minima.
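
The alternating scheme just described can be sketched in a few lines of PyTorch; the helper below is illustrative only (the networks G and D, the 100-dimensional noise, and the binary cross-entropy form of Eqs. (1)-(2) are assumptions, not the thesis implementation):

import torch
import torch.nn.functional as F

def cpgan_step(G, D, x, y, opt_G, opt_D, lam=10.0):
    # x: conditional fundus images, y: ground-truth vessel maps.
    z = torch.randn(x.size(0), 100, device=x.device)   # noise from a Gaussian

    fake = G(x, z)

    # Discriminator step: real pairs -> 1, generated pairs -> 0 (Eq. 2).
    opt_D.zero_grad()
    real_score = D(x, y)
    fake_score = D(x, fake.detach())
    loss_D = 0.5 * (F.binary_cross_entropy(real_score, torch.ones_like(real_score))
                    + F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score)))
    loss_D.backward()
    opt_D.step()

    # Generator step: fool D and stay close to y in L1 (Eqs. 1, 4, 5).
    opt_G.zero_grad()
    score = D(x, fake)
    loss_G = (F.binary_cross_entropy(score, torch.ones_like(score))
              + lam * F.l1_loss(fake, y))
    loss_G.backward()
    opt_G.step()
    return loss_G.item(), loss_D.item()

Setting lam = 10 corresponds to the trade-off coefficient λ reported in the hyper-parameter section below.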

2.3 Model Architecture

In the segmentation problem, the extraction of low-level features is a crucial step toward
better segmentation results. UNet- and SegNet-style architectures are utilized in the
generator network to incorporate low-level features. Each layer in the discriminator and
generator networks uses a basic module of hidden layers as a sequence of convolution, batch
normalization, and activation layers. Fundoscopic images and their vessel segmentation maps
share the basic structure of vessels and some raised edges. Skip connections are added between
the lower-level and higher-level convolutional layers, following the shape of U-Net, to
circumvent the bottleneck for low-level features.

Generator Network

The generator network can be divided into two parts, an encoder and a decoder, to segment
the foreground from the background. The encoder consists of shrinking layers (the left half of
the network), and the decoder consists of expanding layers (the right half of the network). Two
encoding layers follow a standard convolutional architecture with activation and pooling layers;
we use the rectified linear unit (ReLU) as the activation function in both. The encoder part of
the generator network extracts hidden features and down-samples the input image, whereas the
decoder part up-samples the feature maps at each step until the last layer. Each image passes
through all the layers, which can limit the learning of the network at some intermediate layers.
Sharing low-level features from the encoder with the high-level features of the decoder is
therefore desirable to pass low-level information across the network.

In the decoder network, features obtained from the encoder are up-sampled using
convolutional filters, and the number of feature channels is reduced by a factor of two in each
convolutional layer. After each convolution layer, the ReLU activation function is applied.
Feature maps from the encoding part are concatenated with those of the corresponding decoder
part. The last layer of the network consists of a convolutional filter of dimension 2 x 2 with a
depth equal to the required number of classes. In our case, the depth of the final layer is set to 2,
as there are only two classes (retinal vessels and background) in retinal vessel segmentation
datasets.
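
A compact PyTorch sketch of such an encoder-decoder generator with skip connections follows; the channel widths, the two-level depth, and the 1 x 1 output head are simplifying assumptions (the thesis describes a 2 x 2 final filter), and the noise input is omitted for brevity:

import torch
import torch.nn as nn

def block(c_in, c_out):
    # The basic module used throughout: convolution, batch norm, activation.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out),
                         nn.ReLU(inplace=True))

class MiniUNetGenerator(nn.Module):
    def __init__(self, in_ch=3, n_classes=2, base=32):
        super().__init__()
        self.enc1 = block(in_ch, base)
        self.enc2 = block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.mid = block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = block(base * 4, base * 2)      # concatenation doubles the channels
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = block(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)  # one output map per class

    def forward(self, x):
        # Assumes H and W are divisible by 4 for the two pooling steps.
        e1 = self.enc1(x)                           # encoder: extract and down-sample
        e2 = self.enc2(self.pool(e1))
        m = self.mid(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(m), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)                        # decoder output at input resolution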

Patch-based Discriminator Network

To capture high-frequency components, a patch-based discriminator is a strong candidate,
since it operates on local patches of an image. Therefore, a patch-based discriminator is
employed that penalizes each patch of size N x N and is tasked with discriminating real from
generated segmentation maps. The patch-based discriminator applies convolutions over the
entire image in a patch-wise manner and produces a final decision by averaging all patch-wise
decisions over an image. By assuming independence between different patches, such a
discriminator processes the input as a Markov random field. Small patches allow the network
to converge more rapidly while yielding high-resolution segmentation maps, and the
discriminator can be applied to images of different dimensions.
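
A minimal sketch of such a patch-based (PatchGAN-style) discriminator is given below; the layer widths and the input channel count are illustrative assumptions. Because every output position scores one receptive-field patch, averaging over the output grid implements the patch-wise averaging described above:

import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=4):   # e.g. 3-channel fundus image + 1-channel vessel map
        super().__init__()
        layers, c = [], in_ch
        for c_out in (64, 128, 256):
            layers += [nn.Conv2d(c, c_out, 4, stride=2, padding=1),
                       nn.BatchNorm2d(c_out),
                       nn.LeakyReLU(0.2, inplace=True)]
            c = c_out
        layers += [nn.Conv2d(c, 1, 4, padding=1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x, y):
        # Each spatial position of the output scores one local patch of (x, y);
        # averaging the patch decisions yields the image-level score.
        scores = self.net(torch.cat([x, y], dim=1))
        return scores.mean(dim=(2, 3))

Being fully convolutional, the same network applies unchanged to images of different dimensions.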

2.4 Model Training

Weights of the generator and discriminator networks are initialized randomly following a
standard normal distribution. Fixed seeds of this distribution are used for weight initialization
in each experiment. Training of the proposed model proceeds through the following steps until
the discriminator converges and gets confused between generated and actual inputs given the
conditional data samples.
• Initially, the generator network generates random noise regardless of the input data fed
to the generator.

• The discriminator is trained on the predictions generated from the generator network and
tries to learn to discriminate the generated samples from the actual ground truth.
• The generator tries to learn the basic blood vessel segments until the discriminator gets
confused between generated and real data.
• The discriminator learns the difference between generated data and ground truth, whereas
the generator tries to fool the discriminator. Further, the discriminator learns to utilize the
conditional samples (x) to learn the pattern of blood vessels in fundoscopic images.

Imposing the conditional data samples (x) on the discriminator allows the model to minimize
the loss further as compared to a plain GAN.

Table 2-1. Evaluation of the generator in stand-alone configuration.

DATASET     Acc      Sp       Se       AUC
DRIVE       0.9247   0.9780   0.7731   0.9743
STARE       0.9623   0.9835   0.7929   0.9867
CHASEDB1    0.9572   0.9802   0.7640   0.9763

Hyperparameters

All the hyper-parameters used in training the proposed model are selected empirically
based on the loss returned on the test data; they include the trade-off coefficient (λ = 10),
learning rate (lr = 0.002), learning rate decay factor (n = 0.75), momentum (m = 0.002), and the
early stopping criterion. Stochastic gradient-based learning with the Adam solver is used to
train the generator and discriminator in an alternating mode. The early stopping criterion is
used to avoid overfitting of the model. We observed the training and testing loss for each epoch,
and the training process is stopped once the early stopping criterion is met. Fig. 4 shows the
training and validation loss over several epochs.

System Configuration

Training and testing experiments are conducted on an NVIDIA 1080Ti GPU with 11 GB of
built-in memory, installed in an Intel Core i7 (5th generation) machine with 128 GB of system
RAM. CUDA 8.0 and cuDNN v5 are used to accelerate the processing. To speed up the training
process, the model is trained with the largest possible mini-batch size of 8
training examples, which is limited by the GPU video memory. Batch-wise updating of the
parameters also makes the gradients less noisy and allows the parallelization of computations on
the GPU.

2.5 Results and Discussions

CPGAN has been applied to extract retinal blood vessels in fundoscopic images. An
extensive number of experiments have been conducted to analyze the performance of the
proposed model with respect to architecture choice, patch size, objective function, additional
loss term, and hyper-parameters.

Datasets and Evaluation Metrics

The proposed retinal vessel segmentation model is trained and evaluated using the publicly
available, manually annotated datasets DRIVE, STARE, and CHASEDB1. For evaluation and
fair comparison of the proposed model with state-of-the-art techniques, the same protocol as in
[86] is followed for the division of all three datasets into training, testing, and validation sets.
Accuracy (Acc), sensitivity (Se), and specificity (Sp) are used as benchmarks for quantitative
evaluation, and the area under the receiver operating characteristic curve (AUC) is used for
threshold-independent evaluation of the proposed model.
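
For reference, the four measures can be computed from a thresholded probability map as sketched below; the function name and the 0.5 threshold are illustrative assumptions, and the AUC is computed on the raw probabilities before thresholding:

import numpy as np
from sklearn.metrics import roc_auc_score

def segmentation_metrics(prob_map, ground_truth, threshold=0.5):
    pred = (prob_map >= threshold).ravel().astype(int)
    gt = ground_truth.ravel().astype(int)
    tp = int(np.sum((pred == 1) & (gt == 1)))   # vessel pixels correctly found
    tn = int(np.sum((pred == 0) & (gt == 0)))   # background correctly kept clean
    fp = int(np.sum((pred == 1) & (gt == 0)))
    fn = int(np.sum((pred == 0) & (gt == 1)))
    return {"Acc": (tp + tn) / (tp + tn + fp + fn),
            "Se": tp / (tp + fn),               # sensitivity (recall on vessels)
            "Sp": tn / (tn + fp),               # specificity
            "AUC": roc_auc_score(gt, prob_map.ravel())}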

Data Augmentation

Fundoscopic images are affected by bias field distortion, which makes the intensity of
different regions vary across the image. To address this, we applied contrast enhancement and
intensity normalization. Contrast enhancement helps to discriminate the foreground from the
background in low-contrast regions. Intensity normalization is applied by linearly transforming
the original intensities, which equalizes the histogram of each input sample and makes the
resulting samples similar across the dataset in terms of intensity. Furthermore, all three datasets
contain only 80 fundus images, captured from a specific angle with a predefined field of view.
To generalize the proposed model to unseen data with different lighting or orientation, rotation
and flipping augmentation schemes are applied to all three datasets. Rotation augmentation
includes 15-degree clockwise and anti-clockwise rotations, which yield two additional samples
for each input sample. Horizontal and vertical flipping schemes are utilized to further enlarge
the sample set. These augmentation techniques not only increase the sample space and
segmentation performance but also generalize the model to a diverse range of fundoscopic
images.
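
A minimal sketch of this augmentation scheme is shown below; the generator-style function and the use of nearest-neighbour interpolation for label masks are assumptions made for illustration:

import numpy as np
from scipy.ndimage import rotate

def augment(image, label):
    # Original sample, +/-15 degree rotations, and the two flips described above,
    # applied identically to the fundus image and its vessel label.
    yield image, label
    for angle in (15, -15):
        yield (rotate(image, angle, reshape=False, order=1),
               rotate(label, angle, reshape=False, order=0))  # keep masks binary
    yield np.fliplr(image).copy(), np.fliplr(label).copy()
    yield np.flipud(image).copy(), np.flipud(label).copy()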

Table 2-2. Evaluation of the discriminator for different sizes of patches.

DATASET    METRIC   640 x 640   120 x 120   40 x 40   10 x 10   1 x 1
DRIVE      Acc      0.9558      0.9762      0.9556    0.9552    0.9547
           Sp       0.9820      0.9894      0.9818    0.9816    0.9811
           Se       0.7740      0.8746      0.7737    0.7734    0.7732
           AUC      0.9749      0.9743      0.9747    0.9744    0.9742
STARE      Acc      0.9749      0.9747      0.9643    0.9640    0.9636
           Sp       0.9645      0.9912      0.9849    0.9843    0.9841
           Se       0.9851      0.7940      0.7934    0.7930    0.7928
           AUC      0.7936      0.9885      0.9878    0.9876    0.9875
CHASEDB1   Acc      0.9588      0.9792      0.9585    0.9581    0.9579
           Sp       0.9807      0.9811      0.9805    0.9801    0.9798
           Se       0.7652      0.7697      0.7651    0.7648    0.7642
           AUC      0.9781      0.9892      0.9779    0.9775    0.9730

Stand-alone Generator Network

To evaluate the generator network without the integration of a discriminator network, we
trained the generator network as a base model on all three datasets. The additional loss term of
the generator network is used as the objective function, and the results are summarized in
Table 2-1. For the DRIVE dataset, the generator network in stand-alone settings achieves
0.9247, 0.9780, 0.7731, and 0.9743 for Acc, Sp, Se, and AUC, respectively. Similarly, it
achieves 0.9623, 0.9835, 0.7929, 0.9867 and 0.9572, 0.9802, 0.7640, 0.9763 for Acc, Sp, Se,
and AUC on the STARE and CHASEDB1 datasets, respectively. These results are not
satisfactory compared with the state-of-the-art methods and the proposed CPGAN model.
Hence, this shows that the generator network alone, i.e., a simple deep learning network, is not
sufficient to extract the thin retinal vessels. The next sections discuss the impact of the
discriminator network combined with the generator network.

Influence of patch size

Performance evaluation of the discriminator network for different patch sizes is conducted
on all three datasets. The performance of the CPGAN in terms of Acc, Sp, Se, and AUC for five
patch sizes, i.e., 640 x 640, 120 x 120, 40 x 40, 10 x 10, and 1 x 1, is summarized in Table 2-2.
These patch sizes are varied while keeping the generative network unchanged. Feeding a patch
of size 640 x 640 to the discriminator corresponds to a conditional GAN, and feeding a patch of
size 1 x 1 corresponds to a conditional pixel-GAN. For the DRIVE dataset, the conditional
GAN (patch size = 640 x 640) yields 0.9558, 0.9820, 0.7740, and 0.9749, and the conditional
pixel-GAN (patch size = 1 x 1) yields 0.9547, 0.9811, 0.7732, and 0.9742 for Acc, Sp, Se, and
AUC. These results indicate that the patch-based CPGAN performs better than the conditional
pixel-GAN; in particular, CPGAN with patch size 120 x 120 outperforms all other patch sizes.
Similarly, for STARE and CHASEDB1, we obtained the best results with this patch size. These
results support the hypothesis that focusing on regions smaller than the entire image can
increase segmentation performance by learning the structure of both thin and thick vessels.

Fig. 2-4. Qualitative evaluation of cross-training results on STARE and DRIVE datasets.
From left to right: input test fundus image, corresponding manual annotation,
corresponding predicted probability maps.

Table 2-3. Cross-training evaluation of the CPGAN ("-" = not reported).

Test Set   Method                  Acc      Sp       Se       AUC
DRIVE      Soares [93]             0.9397   -        -        -
           Ricci [68]              0.9266   -        -        -
           Marin [94]              0.9448   -        -        -
           Fraz [95]               0.9456   0.9792   0.7274   0.9697
           Li [80]                 0.9486   0.9810   0.7273   0.9677
           Zengqiang [85]          0.9444   0.9802   0.7014   0.9568
           Yang [72]               0.9354   0.9870   0.3240   -
           Proposed                0.9532   0.9818   0.7384   0.9733
STARE      Soares [93]             0.9327   -        -        -
           Ricci [68]              0.9464   -        -        -
           Marin [94]              0.9528   -        -        -
           Fraz [95]               0.9495   0.9770   0.7010   0.9660
           Li [80]                 0.9545   0.9828   0.7027   0.9671
           Zengqiang [85]          0.9580   0.9840   0.7319   0.9678
           Yang [72]               0.9454   0.9758   0.6654   -
           Sathananthavathi [69]   0.9589   0.9787   0.7484   -
           Proposed                0.9562   0.9863   0.7384   0.9733

Increasing the patch size thus improves segmentation up to a point; beyond a certain size,
120 x 120, the performance of the discriminator network tends to decrease again, and the
model fails to capture thin vessels. This substantiates previous findings in the literature that the
choice of discriminator is a vital part of a generative adversarial network.

Cross-training

The reliability of CPGAN is investigated using a cross-training evaluation as conducted in
[56]: the model is trained on one dataset and tested on another. When tested on the unseen
DRIVE dataset, CPGAN successfully detects more thin vessels without changing the
segmentation results for thick vessels, yielding 0.9532, 0.9818, 0.7384, and 0.9733 for Acc, Sp,
Se, and AUC, respectively, and outperforms the compared models, as shown in Table 2-3.
Sample results of the cross-training evaluation are shown in Fig. 2-4. It can be observed that the
model trained on STARE and tested on the DRIVE dataset, while able to detect the thick
vessels, missed a few thin vessels, because the ground truth of the STARE dataset mainly
contains thick retinal vessels and the model was trained to detect thick vessels.

Table 2-4. AUC performance comparison on all three datasets.

NETWORK          VESSELS   DRIVE    STARE    CHASEDB1
GENERATOR ONLY   ALL       0.9743   0.9885   0.9892
                 THIN      0.9535   0.9625   0.9593
                 THICK     0.9829   0.9893   0.9854
COMPLETE MODEL   ALL       0.9753   0.9885   0.9892
                 THIN      0.9567   0.9759   0.9681
                 THICK     0.9842   0.9914   0.9886

Conversely, compared with the STARE dataset, the DRIVE dataset contains more thin
vessels. When CPGAN is tested on the STARE dataset while trained on the DRIVE dataset, it
detects both thick and thin vessels, as shown in Fig. 2-5. Even though thin vessels are not
present in the ground truths of the STARE dataset provided by the first human annotator,
CPGAN successfully detects the thinnest vessels, which are indeed actual vessels. Testing the
CPGAN on the STARE dataset while trained on the DRIVE dataset results in 0.9562, 0.9863,
0.7384, and 0.9733 for Acc, Sp, Se, and AUC, respectively. This analysis is competitive with
all other methods compared in Table 2-3, which shows that the proposed model achieves
overall better performance on the segmentation of vascular and non-vascular pixels.

Correctness evaluation

To compare the influence of the discriminator network and the L1 norm, we trained only
the generator network by freezing the discriminator layers, such that the generator produced the
segmentation maps. The obtained segmentation maps are evaluated in three configurations:
(i) all vessels, (ii) thin vessels, and (iii) thick vessels. To divide the vessels into thick and thin
categories, we adopted the categorical division from [73]: vessels that are more than two pixels
wide are marked as thick vessels, and the remaining vessels are marked as thin vessels. For
training, all the vessels are used as a single class rather than separate classes. The evaluation of
the proposed model for thick and thin vessels is summarized in Table 2-4. It can be seen that
thin vessels are detected more accurately, in terms of AUC, by the proposed complete model
than by the generator network alone for all three datasets. These results show that integrating
the L1 norm alongside the adversarial objective of the discriminator network allows the model
to learn thin vessels more accurately. For an interpretable evaluation of the CPGAN,
segmentation results are shown in Fig. 2-5 in terms of true positives (green), false positives
(blue), and false negatives (red). It can be seen that the proposed model segmented a small
number of thin vessels as false negatives (shown in red). These pixels were marked as vessels
by the human annotator, but the proposed model segmented them as non-vessel pixels.
Similarly, the proposed model marked some pixels as vessels (shown in blue) that were marked
as non-vessels by the human annotator. This may reflect cases in the STARE dataset where
these pixels belong to thin vessels, but the annotator marked them as non-vessel.
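
The width-based division into thick and thin vessels can be approximated morphologically, as in the sketch below; this erosion-and-dilation heuristic is an assumption made for illustration and not the exact protocol of [73]:

import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def split_thick_thin(vessel_mask, width_px=2):
    # Pixels that survive erosion with a (width+1)-square structuring element
    # belong to vessels wider than width_px; dilating the surviving cores back
    # recovers the thick-vessel bodies, and the remainder is labelled thin.
    mask = vessel_mask.astype(bool)
    selem = np.ones((width_px + 1, width_px + 1), dtype=bool)
    core = binary_erosion(mask, structure=selem)
    thick = binary_dilation(core, structure=selem, iterations=2) & mask
    thin = mask & ~thick
    return thick, thin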

Fig. 2-5. Visual interpretation of the credibility of the proposed model on STARE and
DRIVE datasets. Column 1, 3: input test images, Column 2, 4: corresponding probability
maps evaluated in terms of TP (green), FN (red) and FP (blue).

Robustness to the noise

The separate loss function integrated with the main loss function of the proposed network
helps the generator to learn low-contrast vessels, which have weak intensity transitions. In
addition to these low-contrast vessels, CPGAN can effectively detect vessels that were not
annotated by the human observer. In the proposed model, the generator network learns the
low-contrast vessels, whereas the discriminator network forces the model to learn the
non-vessel pixels as well by predicting a zero score. In this way, the entire model learns the
structure and appearance of vessels simultaneously.

Therefore, the challenging case of the central reflex vessel can be addressed, as shown in
Fig. 2-6. The central reflex is the bright region between the boundary walls of a vessel, such
that a single vessel can appear as two parallel thin vessels. The proposed model is tested on this
challenging scenario, and the results show robustness to the central reflex, as shown in
Fig. 2-6 (a). Furthermore, the model segmented out small thin vessels as well, which were not
marked by the human annotator.

Cotton wools and lesions are bright and dark regions that affect the extraction of vascular
features. It can be seen in Fig. 2-6 (b, d) that the model can segment out the thin vessels in the
presence of cotton wools and lesions even though these vessels were not annotated. Vessel
segmentation is also prone to errors on low-contrast fundoscopic images, which can result in
more false positives or false negatives. Fig. 2-6 (c) shows that the proposed model is robust to
the low-contrast problem and can segment out the vessel boundaries.

In summary, the proposed model can effectively address the main challenging cases by
training the generator and discriminator networks alternately and integrating an additional L1
loss term.

Table 2-5. Performance comparison of the proposed model on the DRIVE, STARE and
CHASEDB1 datasets with state-of-the-art methods ("-" = not reported).

                                       DRIVE                              STARE                              CHASEDB1
METHOD (YEAR)                Acc     Sp      Se      AUC       Acc     Sp      Se      AUC       Acc     Sp      Se      AUC
Human observer               0.9472  0.9724  0.7760  -         0.9349  0.9384  0.8952  -         0.9545  0.9711  0.8105  -

UNSUPERVISED
Zhang [62] (2010)            0.9382  0.9724  0.7120  -         0.9484  0.9753  0.7177  -         -       -       -       -
Fraz [63] (2012)             0.9430  0.9759  0.7152  -         0.9442  0.9686  0.7311  -         -       -       -       -
Roychowdhury [96] (2015)     0.9494  0.9782  0.7395  0.9672    0.9560  0.9842  0.7317  0.9673    0.9467  0.9575  0.7615  0.9623
Azzopardi [64] (2015)        0.9442  0.9704  0.7655  0.9614    0.9497  0.9701  0.7716  0.9563    0.9387  0.9587  0.7585  0.9487
Yin [65] (2015)              0.9403  0.9790  0.7246  -         0.9325  0.9419  0.8541  -         -       -       -       -
Zhang [66] (2016)            0.9476  0.9725  0.7743  0.9636    0.9554  0.9758  0.7791  0.9748    0.9452  0.9661  0.7626  0.9606
Zengqiang [85] (2018)        0.9538  0.9820  0.7631  0.9750    0.9636  0.9857  0.7735  0.9833    0.9607  0.9806  0.7641  0.9776
Khan [97] (2019)             0.9506  0.9651  0.7696  -         0.9513  0.9812  0.7521  -         -       -       -       -
Khan [98] (2019)             0.9440  0.9640  0.7540  -         0.9480  0.9560  0.7520  -         -       -       -       -

SUPERVISED
You [99] (2011)              0.9434  0.9751  0.7410  -         0.9497  0.9756  0.7260  -         -       -       -       -
Marin [94] (2011)            0.9452  0.9801  0.7067  0.9588    0.9526  0.9819  0.6944  0.9769    -       -       -       -
Fraz [95] (2012)             0.9480  0.9807  0.7406  0.9747    0.9534  0.9763  0.7548  0.9768    0.9469  0.9711  0.7224  0.9712
Orlando [100] (2017)         -       0.9684  0.7897  -         -       0.9738  0.7680  -         -       0.9715  0.7277  -
Dasgupta [101] (2017)        0.9533  0.9801  0.7691  0.9744    -       -       -       -         -       -       -       -
Sathananthavathi [69] (2018) 0.9534  0.9645  0.8285  -         -       -       -       -         -       -       -       -

DEEP LEARNING
Melin [78] (2015)            0.9466  0.9785  0.7276  0.9749    -       -       -       -         -       -       -       -
Oliveira [79] (2015)         0.9576  0.9804  0.8039  -         -       -       -       -         0.9653  0.9864  0.7779  -
U-Net [79] (2015)            0.9531  0.9820  0.7537  -         -       -       -       -         0.9578  0.9701  0.8288  -
Li [80] (2016)               0.9527  0.9816  0.7569  0.9738    0.9628  0.9844  0.7726  0.9879    0.9581  0.9793  0.7507  0.9716
Liskowski [81] (2016)        0.9515  0.9806  0.7520  0.9710    0.9696  0.9866  0.8145  0.9880    -       -       -       -
Fu [82] (2016)               0.9523  -       0.7603  -         0.9585  -       0.7412  -         0.9489  -       0.7130  -
Recurrent U-Net [74] (2018)  0.9556  0.9816  0.7751  -         -       -       -       -         0.9622  0.9836  0.7459  -
R2U-Net [74] (2018)          0.9556  0.9816  0.7792  -         -       -       -       -         0.9636  0.9820  0.7756  -
DUNet [75] (2019)            0.9697  0.9870  0.7894  -         -       -       -       -         0.9724  0.9821  0.8229  -
Dense U-Net [76] (2019)      0.9511  0.9736  0.7986  -         0.9538  0.9722  0.9740  -         -       -       -       -
M2 U-Net [77] (2019)         0.9630  -       -       0.9714    -       -       -       -         0.9703  -       -       0.9666
Yang [72] (2019)             0.9421  0.9696  0.7560  -         0.9477  0.9733  0.7202  -         -       -       -       -
PROPOSED (2019)              0.9762  0.9894  0.8746  0.9743    0.9747  0.9912  0.7640  0.9885    0.9792  0.9811  0.7697  0.9892

Fig. 2-6. Exemplar results of the proposed model on challenging cases: (a): central reflex
vessels, (b): cotton wools, (c): low contrast, (d) lesions. From top to bottom: input fundus
image, enlarged target patch of fundus image, corresponding manual annotation and the
predicted probability maps.

Comparison with the state-of-the-art methods

A comparison of the CPGAN with state-of-the-art methods is summarized in Table 2-5.
The evaluation is conducted for all three datasets and categorized into unsupervised, supervised,
and deep learning schemes. For the DRIVE dataset, CPGAN achieves 0.9762, 0.9894, 0.8746,
and 0.9743 for Acc, Sp, Se, and AUC, respectively, achieving better results for Acc, Sp, and Se
than all current state-of-the-art techniques. It can be seen that Zengqiang [102] performs better
in terms of AUC, but for the other metrics it achieves marginally lower scores. Overall, the
performance of the proposed model is much better than that of the compared methods.

For the STARE dataset, the proposed model achieves 0.9747, 0.9912, 0.7640, and 0.9885
for Acc, Sp, Se, and AUC, respectively. In terms of Acc, Sp, and AUC, the proposed model
competes with all current state-of-the-art models but achieves the third-best score in terms of
Se. Wang [76] achieves the best results in terms of Se, obtaining 0.0200 more sensitivity, but
obtains lower scores in terms of Sp and Acc compared with the proposed model.

For the CHASEDB1 dataset, the proposed CPGAN model achieves 0.9792, 0.9811, 0.7697,
and 0.9892 for Acc, Sp, Se, and AUC, respectively, among which the results for Acc and AUC
outperform all reported state-of-the-art methods, as indicated in Table 2-5, whereas
Oliveira [86] and DUNet [75] perform better than the proposed model in terms of Sp and Se.

An important characteristic of the proposed model is that it can capture thin vessels.
Results generated by the proposed model on all three datasets are shown in Fig. 2-7. In addition
to the segmentation of thin and thick vessels, CPGAN can detect low-contrast thin vessels,
some of which were not marked as vessels by the first human annotator while the second
annotator marked those pixels as vessels. These results demonstrate the credibility of the
CPGAN model for the segmentation of thick and thin retinal vessels.

Fig. 2-7. Sample results of the proposed CPGAN. From top to bottom: results on STARE
(column 1,2), DRIVE (column 3,4), and CHASEDB1 (column 5,6) dataset. Row 1: input,
row 2: ground truth, row 3: probability maps, row 4: probability maps generated after
applying Otsu automatic thresholding.

Computational Complexity

In segmentation techniques, feature learning schemes are found to be computationally more
complex than preprocessing and post-processing techniques and hence dominate the overall
computational complexity. On the other hand, the testing-time complexity of a deep learning
model is a vital cost for real-time applications. With the system configuration given above, the
proposed model is trained until the early stopping criterion is met. The number of epochs or
training iterations depends on the model complexity, including the patch size. On average, the
proposed model is trained for 40 epochs, where each epoch requires 70 minutes on the
mentioned system configuration, so a single training run took around 60 hours. Hence, the
training time of a model is not a bottleneck for inference: once the deep learning model is
trained, it is only applied to unseen data samples. The proposed model, on average, runs
inference on a single test sample in 1.09 seconds on the same system.

2.6 Conclusion

An efficient deep learning model, the conditional patch-based generative adversarial
network (CPGAN), has been proposed that can potentially address the segmentation of retinal
blood vessels in fundoscopic images. Training the generator network to learn small transitions
in thin vessels, while allowing the patch-wise discriminator to learn to discriminate vascular
from non-vascular pixels, is beneficial for segmentation performance.

Results on three publicly available datasets showed that the proposed model is competitive
with current state-of-the-art techniques. The CPGAN computes the average error on small
patches rather than on an entire fundoscopic image, and the additional loss term in the main
objective function acts as leverage and enhances the effectiveness of the CPGAN. The model
also has the potential to probe different patch sizes so that the influence of the patch-based
discriminator on segmentation performance can be analyzed further.

3 Motor Imagery Classification

3.1 Introduction and Related Works

A Brain-Computer Interface (BCI) connects the human cortex to external computer-aided
devices. BCI technology can help handicapped persons who suffer from complete or partial
loss of control over their motor functions; recording EEG signals can help such users regain
control to some extent. Brain activities can be captured using focal, parietal, or standard 10-20
electrode systems, in non-invasive or invasive manners. EEG is the most common method for
capturing motor imagery signals and supports many computer-aided applications, e.g.,
prosthetic limb control, robotic device control, text writing, and computer cursor
control [103-105] (see Fig. 3-1).

Fig. 3-1. Brain-computer interface pipeline.

For this purpose, many researchers have made efforts to capture motor imagery EEG
signals and distinguish the imagined tasks [106-122]. Classification of motor imagery signals in
BCI research faces the following three challenges:

a) Signal-to-noise ratio (SNR): The recorded brain signals have very low amplitude, giving
rise to a strong classification challenge for MI-EEG signals. Artifacts from events such as
muscular movements, eye blinking, eye movement, heart rhythm, and teeth grinding leave
the EEG signal with low SNR values. These artifacts influence the decoding system's
performance and result in incorrect decoding of human thoughts.

b) Non-stationarity: EEG signals are sensitive to human behavior, and small variations in
body state result in changes in mean, variance, and covariance over time [123].

c) Subject-specificity: Both actual and imagined EEG signals are subject-dependent and
vary from subject to subject. Imagined EEG signals are even more subject-specific and
are close to unique to each subject.

Several different signal processing methods and classifiers have been proposed to address
the above challenges. These methods exploit time, frequency, and time-frequency techniques to
classify motor imagery tasks. The most common method is the Fast Fourier Transform (FFT),
which has been combined with a decision tree classifier to classify epileptiform signals [124].
Features for EEG signal classification in BCI systems can also be extracted through common
spatial patterns as optimal spatial filters: CSP maximizes the variance difference between
classes to distinguish binary classes as additive sub-components [125]. The Local Temporal
Common Spatial Pattern (LTCSP) was presented as an alternative to CSP [111]; the noise
susceptibility of the covariance matrix was minimized by using a time-dependent neighboring
matrix. It utilized LDA and an OVR (one-versus-rest) classifier on the LTCSP features, which
yielded a mean kappa value of 0.61 on the four-class EEG data. An optimized
spatial-frequency-temporal feature extraction method was proposed [82], where researchers
obtained spatial-frequency-temporal features through sparse regularization by applying CSP
filters. A naive Bayesian classifier was used, and the method resulted in a 0.53 mean kappa
value.
Increasing the number of features can decrease overall performance, since it increases the
probability of redundant features and the complexity of the method. To address this challenge,
a Filter Bank Common Spatial Pattern (FBCSP) was proposed in [127] for the autonomous
selection of key temporal-spatial discriminative characteristics of EEG signals. In contrast to
the CSP used for binary classifiers, FBCSP maximizes the variance between the underlying
classes. Researchers applied different methods for the classification of multi-class data, namely
naive Bayesian Parzen, pair-wise (PW), and divide-and-conquer (DC) classification, and
obtained a mean kappa value of 0.57 by applying FBCSP with PW classification and OVR
methods.
Advancements in neural networks, especially convolutional neural networks (CNNs), and
their application to sophisticated tasks have suggested a significant approach to MI
classification in the domain of machine learning. A CNN is a classification strategy that
operates on the images provided at the input. In MI classification, the automatic feature
learning characteristic of CNNs makes them attractive compared with classical (signal
processing) approaches, which use handcrafted features obtained through image and signal
processing techniques. However, the temporal information of EEG signals has not yet been
incorporated as a sequence-to-sequence learning scheme to train such models, which remains
an unexplored area. Temporal information can increase classification performance, and it has
been shown that incorporating prior information can significantly improve a classifier's
performance.

The literature on deep learning for EEG motor imagery classification is limited.
Steady-state visually evoked potentials (SSVEP) were utilized in [128], where a four-layered
CNN, accompanied by a Fourier transform layer, was integrated with a visual stimulus. The
transformation of temporal data into the frequency domain caused a considerable increase in
classification accuracy.

To exploit CNN classifiers with spectral features, a novel method was proposed in [129]. In
this model, the researchers used a four-layered CNN with the FFT to train the models on
features in the frequency domain; FFT features were mapped to images and fed to the CNN
classifier, achieving promising results. Similarly, in [130] a CNN was proposed to classify
P300 EEG data. For feature extraction, the authors explored the frequency and time domains:
2D maps of size 64 x 64 were generated from 64-channel EEG data, and different
configurations of the CNN classifier were tried. Further evidence [131] on MI classification
provides an analysis of multi-class MI signal classification, where an LDA classifier in OVR
mode was utilized with an autoencoder and band power features for feature reduction.
Deep Belief Networks (DBNs) are deep models that learn patterns layer by layer, such that
each layer obtains its input from the previous layer. Several papers have concentrated on DBNs
for the classification of motor imagery signals [132]. A DBN was utilized along with
differential entropy features computed from EEG for emotion classification, and a hidden
Markov model was used to obtain authentic emotional stage switching [133]. The classification
of two-class motor imagery tasks has also employed deep learning [132]: a DBN classifier
combines various restricted Boltzmann machines (RBMs) using the AdaBoost algorithm after
training on single-channel data. The DBN training employed the contrastive divergence (CD)
algorithm, providing better performance than SVM. Similarly, an RBM-based deep learning
strategy was presented in [134]. Wavelet packet decomposition (WPD) and the FFT were
utilized to obtain features in the frequency domain, and a four-layered classifier named the
Frequential Deep Belief Network (FDBN) was presented, stacking three consecutive RBM
layers and a final softmax regression layer. For accurate tuning of the network, conjugate
gradient and backpropagation were utilized. Compared with advanced methods, the FDBN
resulted in considerably improved performance assessed on standard datasets.
Sequence-to-sequence (S2S) learning methods have shown significant improvements by
dividing an input signal into small progressive chunks [135, 136]. Most S2S learning methods
utilize LSTM-based recurrent networks. A novel scheme based on an LSTM network and
wavelet analysis was introduced in [137] to incorporate time series analysis into the
classification of multi-class motor imagery. Much work on the potential of temporal
information has been carried out [138-141, 121]. In [121], a Multi-Layer Perceptron (MLP)
and a CNN model were proposed as a parallel architecture: the MLP is aimed at learning
spatial features, the CNN retains temporal information, and Hilbert-transform-based
channel-relative energies are used as temporal features. The two networks are connected by
averaging, resulting in a considerable improvement in classification accuracy. Another recent
study proposed a novel representation of temporal information [122], again combining a
multilayer perceptron and a CNN to learn temporal and spatial features, with the two models
connected via averaging and a significant increase in classification accuracy.
The artificial neural networks used for classification problems mainly utilize convolutional
neural networks or perceptrons: CNN models contain a mesh of neurons, whereas perceptrons
form linear layers. The spiking neuron is the third-generation neuron model, which fires on a
specific threshold, in contrast to first- and second-generation neurons. The most recent
evidence on multiclass motor imagery classification proposed a spiking-neuron-based model
trained for four MI classes in binary mode and achieved better results than classical machine
learning models [142]. A fundamental problem with much of the literature on LSTM-based
networks is that those methods mainly deal with only two-class motor imagery classification;
there is still a considerable gap in research on four-class motor imagery classification
problems. A better generalization of deep learning methods is required to handle highly
nonlinear motor imagery signals. Furthermore, discovering hidden patterns and the impact of
temporal relationships among small chunks of a recording, along with spectral and spatial
features, has not been dealt with in depth.
In this study, we build a method to classify four-class motor imagery signals by learning
spectral, spatial, and temporal information, where spatial and spectral features are extracted
using common spatial pattern (CSP) filters and Fast Fourier Transform (FFT) energy maps.
These features are divided into non-overlapping chunks and provided to a memory-based deep
learning model in the form of sequential images. The deep learning model utilizes LSTM cells
to learn and retain information from previous samples. This research is an extension of recent
work [143] which used the same feature extraction and classification method; here, we further
optimize each stage and conduct an extensive analysis. Furthermore, this research also analyzes
the impact of the preprocessing, feature extraction, and classification stage hyper-parameters on
multi-class motor imagery classification performance. The contributions of this research are:
• This research proposes a 3D recurrent CNN architecture to classify non-stationary
four-class motor imagery signals.
• It proposes novel spatio-spectral energy maps computed on small segments of each EEG
trial.
• In addition to the main proposed architecture, experiments are also performed on the 3D
recurrent CNN without the LSTM layer, an LDA classifier, and an SVM classifier in
one-versus-all mode.
• The proposed deep learning architecture is trained and analyzed on a four-class motor
imagery dataset [144].
• Furthermore, this research provides preprocessing parameter optimization and analysis
over a set of ranges.

3.2 Methodology

To address the multi-class motor imagery classification problem, this research proposes a
deep-learning-based method consisting of preprocessing, manual feature extraction, and a deep
neural network for deep feature computation and pattern-based classification. Each stage
performs dedicated tasks using multiple sub-stages, and all stages are optimized to yield the
best classification performance. The complete pipeline of the proposed model is shown in
Fig. 3-2. The preprocessing stage processes the input data, which involves temporal trimming
of trials, segmentation of the trimmed signals, and frequency band selection to extract the
appropriate frequency ranges. The feature extraction stage extracts hidden spatial features for
the target classes using common spatial pattern (CSP) filters on each preprocessed frequency
band. To learn and discriminate the spectral and temporal information of the EEG signals, the
Fourier transform is applied, which yields spectro-temporal features on top of the
spatio-temporal features. The obtained chunks of spatio-spectral features are converted
into energy maps, and each segment of the EEG signal is mapped to an energy map. Obtained
temporal segments of spatio-spectral features are used as a sequence to train the deep learning
model. The sequential feeding of these samples allows the model to learn the most
discriminative features to classify multi-class EEG data. The deep learning model consists of
several building blocks with an information retaining mechanism which allows the model to
learn contextual and temporal information to learn and classify the samples. Each stage,
including sub-stages, is discussed in the following sections.

3.2.1 Data Preprocessing

Non-invasive EEG recording techniques capture brain activity during motor imagery tasks
by placing electrodes at predefined head locations. These external electromagnetic field
receptors also capture surrounding noise, which influences the signal and reduces signal
classification performance. Furthermore, the quality of EEG signals is subjective, and the
presence of unwanted EEG artifacts makes it harder to differentiate the underlying multiclass
EEG problems. The preprocessing stage aims to clean the noise and artifacts from EEG signals
using spatial and temporal filtration methods. Motor imagery epochs differ among subjects in
terms of frequency range, actual task duration, and the starting/ending time of the event, so
learning the most common features for task classification, parameter optimization, and
selection are crucial steps. This research applies multiple preprocessing techniques for noise
reduction and for making the signals more distinguishable, to effectively train the classification
models. The temporal window where the actual motor imagery occurs contributes most to the
core task, and the remaining signal can reduce classification accuracy. To exclude the
redundant time frames, trimming each trial is a crucial step. Furthermore, we are interested in
the frequency ranges where the actual motor imagery phenomenon occurs, and only the most
relevant frequency ranges are carried forward for further processing. All sequential
preprocessing steps are discussed in the following sections.

Fig. 3-2. Flow diagram of the proposed method consisting of progressive preprocessing,
feature extraction, and classification modules. The input is multi-class, multi-channel motor
imagery EEG data and the output is the class confidence. Key terms: FFTEM: Fast Fourier
Transform Energy Maps; LSTM: Long Short-Term Memory.

3.2.1.1 Temporal trimming

An EEG trial consists of actual motor imagery task signals wrapped between cue and
settling time windows to separate the consecutive trials. The duration and central point of a real
motor imagery activity task are subject-specific and may vary in successive trials of a single
subject. The presence of EEG signals, other than actual tasks, can lead to performance
degradation. So, the selection of a temporal window in each trial is a crucial step. This research
tried multiple window ranges between 3.0 and 6.5 s out of the full 8.0 s trial, as given in Table 3-1.
Trimming each trial reduces the feature length, model complexity, and training time.

3.2.1.2 Temporal segmentation

The trimmed trials retain only the strongest task-related signals, which helps classification
and model generalization. Learning over an entire trial allows the model to learn the spatial
information, whereas temporal information is required to learn patterns over time within a
single trial. To incorporate temporal information alongside the spatio-spectral features,
temporal segmentation is applied to each trimmed trial. Each trimmed trial is divided into small
segments, where a duration of 0.1 s is selected empirically; each segment has the same length
across all electrodes. Temporal segmentation results in 20 non-overlapping sequential segments,
which are provided to the frequency band selection stage.
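
As an illustration, the following minimal sketch (in Python, with shapes and the helper name
segment_trial as our illustrative assumptions, not the thesis code) splits a trimmed trial into
the non-overlapping 0.1 s segments described above:

import numpy as np

def segment_trial(trial, fs=250, seg_sec=0.1):
    # Split a trimmed trial of shape (n_channels, n_samples) into
    # non-overlapping segments of seg_sec seconds each.
    seg_len = int(fs * seg_sec)              # 25 samples per 0.1 s segment
    n_segs = trial.shape[1] // seg_len       # e.g. a 2.0 s window gives 20 segments
    trial = trial[:, :n_segs * seg_len]      # drop any trailing remainder
    return np.stack(np.split(trial, n_segs, axis=1))

segments = segment_trial(np.random.randn(22, 500))   # 22 channels, 2 s at 250 Hz
print(segments.shape)                                # (20, 22, 25)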

3.2.1.3 Frequency bands selection

Using non-invasive EEG recording, motor imagery signals usually span the 8 Hz to 30 Hz
frequency range, which also varies between subjects. The µ and β bands are the frequency
bands that cover the motor imagery spectrum. As suggested in [101], the µ (8-13 Hz) and
β (13-30 Hz) ranges belong to motor imagery tasks, so we selected these ranges and dropped the
other frequency ranges. To extract the required µ and β ranges, a 5th-order Butterworth filter
is used.
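
A minimal sketch of this band filtering step, assuming SciPy's Butterworth implementation
(the exact filter configuration used in the experiments may differ):

import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(data, low, high, fs=250):
    # 5th-order Butterworth band-pass, applied zero-phase along the time axis.
    b, a = butter(N=5, Wn=[low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, data, axis=-1)

eeg = np.random.randn(22, 500)          # 22 channels, 2 s at 250 Hz
mu_band = bandpass(eeg, 8.0, 13.0)      # µ band (8-13 Hz)
beta_band = bandpass(eeg, 13.0, 30.0)   # β band (13-30 Hz)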

3.3 Normalization

Normalization of EEG signals is required to reduce the impact of constant bias in
electrodes, which can introduce bias in the classifier. Normalization is applied to each time
segment of each electrode signal by subtracting the mean μ(x_i) of the trial from its
instantaneous values x_i(t) and dividing the result by the standard deviation σ(x_i) of that
trial. The normalization method can be expressed mathematically as:

x_i^*(t) = \frac{x_i(t) - \mu(x_i)}{\sigma(x_i)}    (1)

where x_i(t) is the input sample and x_i^*(t) is the resulting normalized sample.
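
Eq. (1) amounts to a per-trial z-score; a minimal sketch, assuming trials stored as
(n_channels, n_samples) arrays:

import numpy as np

def normalize_trial(trial):
    # Subtract the per-electrode mean and divide by the per-electrode standard
    # deviation, as in Eq. (1); the small epsilon guarding against division by
    # zero on flat segments is our addition, not part of Eq. (1).
    mu = trial.mean(axis=1, keepdims=True)
    sigma = trial.std(axis=1, keepdims=True)
    return (trial - mu) / (sigma + 1e-8)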

3.3.1 Common spatial pattern (CSP) filters

To learn complex EEG patterns and their corresponding features, spatial and spectral
features are obtained for each temporal segment. CSP filters are used to increase the
inter-class separability by variance maximization. Maximizing the variance of one class while
minimizing the variance of the other class can be achieved by joint diagonalization of the
covariance matrices. A band-pass filtered EEG trial can be represented as A ∈ IR^{N×T}, where N
is the number of channels and T is the number of samples per channel. Using the Rayleigh
criterion, the normalized covariance can be written as:

R = \frac{A A^T}{\mathrm{trace}(A A^T)}    (2)

Let C_1 and C_2 denote the left-hand and right-hand spatial covariance matrices, respectively.
The average spatial covariance matrix for each class can be formulated as:

\bar{C}_1 = \frac{1}{T} \sum_{i=1}^{T} C_1^{(i)}    (3)

\bar{C}_2 = \frac{1}{T} \sum_{i=1}^{T} C_2^{(i)}    (4)

These average spatial covariance matrices are combined to form the composite average spatial
covariance matrix:

C = \bar{C}_1 + \bar{C}_2    (5)

The eigenvectors U and eigenvalues λ are obtained by decomposing the composite average
spatial covariance matrix; the eigenvalues are sorted in descending order and a whitening
transform decorrelates the data:

C = U \lambda U^T    (6)

P = \sqrt{\lambda^{-1}} U^T    (7)

If \bar{C}_1 and \bar{C}_2 are whitened as S_1 = P \bar{C}_1 P^T and S_2 = P \bar{C}_2 P^T,
then S_1 and S_2 share common eigenvectors:

if S_1 = B \lambda_1 B^T, then S_2 = B \lambda_2 B^T, with \lambda_1 + \lambda_2 = I    (8)

where I is the identity matrix. The eigenvector with the smallest eigenvalue for S_1 has the
largest eigenvalue for S_2, and vice versa. Features can be obtained from the projection of an
input trial A onto the projection matrix W = BP as:

D = W A    (9)

where D has dimensions 2m × T and each row corresponds to a feature vector. The common
spatial patterns appear as the columns of the projection matrix W^{-1} and form time-invariant
signal vectors. The CSP feature set F_CSP can be formulated as:

F_{CSP} = \log\!\left( \frac{\mathrm{var}(D_p)}{\sum_{i=1}^{n} \mathrm{var}(D_i)} \right)    (10)

where D_p are the CSP components. The dataset is sampled at 250 Hz and contains 66 trials,
where each trial is trimmed to a specific time window T, resulting in T × 250 × 66 features. For
feature reduction, we used a one-versus-one strategy and obtained 6 CSP filters, keeping the
first three and last three rows of each filter matrix, similar to the method in [146].
The obtained CSP spatial features F_CSP and the time segments are used to generate energy
maps of spectral features using the fast Fourier transform.
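
The following sketch outlines Eqs. (2)-(10) by solving the equivalent generalized eigenvalue
problem C_1 w = λ(C_1 + C_2) w; the function names and the use of scipy.linalg.eigh are our
illustrative choices, not the thesis implementation:

import numpy as np
from scipy.linalg import eigh

def normalized_cov(trial):
    # Eq. (2): spatial covariance normalized by its trace.
    c = trial @ trial.T
    return c / np.trace(c)

def csp_filters(trials_a, trials_b, m=3):
    C1 = np.mean([normalized_cov(t) for t in trials_a], axis=0)  # Eq. (3)
    C2 = np.mean([normalized_cov(t) for t in trials_b], axis=0)  # Eq. (4)
    _, vecs = eigh(C1, C1 + C2)        # joint diagonalization, cf. Eqs. (5)-(8)
    W = vecs.T                         # rows act as spatial filters
    return np.vstack([W[:m], W[-m:]])  # keep first/last m filters (6 in total)

def csp_features(W, trial):
    D = W @ trial                      # Eq. (9)
    var = D.var(axis=1)
    return np.log(var / var.sum())     # Eq. (10)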

3.3.2 Energy maps

Spectral information hidden in EEG signals can be exploited as temporal frequency
information, which provides relational information. The fast Fourier transform is utilized to
extract spectral features over a period of time. The frequency spectrum obtained from the
spatial features of each temporal segment is further manipulated to extract the energy maps.
Energy map generation on the obtained spectra can be formulated as:

EM_i = |\mathrm{FFT}(f_p^{(i)})|, \quad i = 1, \dots, n    (11)

where EM_i is the two-dimensional energy map of temporal segment i, and n is the number of
energy maps. Each energy map contains the features of all 22 channels; an example of a
sequence of energy maps is shown in Fig. 3-3.
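
A minimal sketch of Eq. (11), mapping each temporal segment of CSP-filtered signals to a 2D
(channel × frequency) energy map; the shapes are illustrative:

import numpy as np

def fft_energy_map(segment):
    # segment: (n_channels, seg_len) -> (n_channels, seg_len // 2 + 1)
    spectrum = np.fft.rfft(segment, axis=-1)   # one-sided FFT per channel
    return np.abs(spectrum)                    # magnitude spectrum = EM_i

segments = np.random.randn(20, 22, 25)          # stand-in for filtered segments
maps = np.stack([fft_energy_map(s) for s in segments])
print(maps.shape)                               # (20, 22, 13): sequence for the classifier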

3.3.3 Classification

The classifier learns distinguishing features from the input EEG samples and their
corresponding labels. The proposed methodology uses the spatio-spectral and temporal
information to train the classifier. A CNN-based deep learning model is proposed as the
classifier, with memory-retaining capability provided by an LSTM layer, to learn the motor
imagery patterns of the given classes. Multiple experiments were conducted to select optimal
hyper-parameters. For a fair comparison, we also trained the model without the LSTM layer.
Several classical supervised classifiers were tried, including Linear Discriminant Analysis
(LDA) and Support Vector Machine (SVM). The performance of the SVM classifier mainly depends
on two intrinsic parameters, the kernel window and the margin size, and effective training of
the SVM classifier requires a near-optimal choice of these parameters. A range of kernel
windows (between 1 × 10^2 and 1 × 10^6) and margin sizes (between 1/10^2 and 10^6) was tried
out to find near-optimal parameters.
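
A hedged sketch of this search using scikit-learn, reading the "kernel window" as the RBF
kernel parameter gamma and the "margin size" as the soft-margin constant C (our interpretation
of the text, not necessarily the thesis setup):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "gamma": np.logspace(2, 6, 5),   # kernel window: 1e2 ... 1e6
    "C": np.logspace(-2, 6, 9),      # margin size: 1e-2 ... 1e6
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(features, labels)  # features: flattened FFTEM maps, labels: classes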

3.3.3.1 Convolutional neural networks classifier

CNNs have captured the attention of researchers because of their ability to learn from
two-dimensional arrays using multi-layer perceptron cells. Combining these multi-layer
perceptron cells results in nonlinear convolutional filters, and these filters learn the
features corresponding to the respective classes.

Fig. 3-3. EEG signal segment generation and FFT energy map generation.

The architecture of the proposed model is shown in Fig. 3-2, which contains two 2D
convolutional layers where each convolutional layer is followed by a pooling layer. To map the
learned 2D features on a vector of classes, a dense layer (fully connected layer) is used, which
is connected to the final classification layer. In the last layer, Softmax is used where the number
of neurons is specified as the total number of target classes.
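
A minimal Keras sketch of this base CNN branch; the filter counts, kernel sizes, and the
22 × 13 input map are illustrative assumptions, not the exact thesis configuration:

from tensorflow.keras import layers, models

def build_cnn(input_shape=(22, 13, 1), n_classes=4):
    # Two conv + max-pooling blocks, a dense layer, and a 4-way Softmax.
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                      input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])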
For the base CNN model (without the LSTM layer), the normalized feature maps are provided
as input to the convolution layer, where each convolution layer contains multiple nonlinear
filters. These filters convolve with the input 2D energy maps and learn the filter weights
that map inputs to outputs. At each iteration (epoch), the weights of the convolution layers
are updated so as to minimize the loss between the ground truth and the prediction. A
convolutional layer can be expressed as:

Y_{i'j'k'} = \sum_{ijk} W_{ijk} \, H'_{i+i',\, j+j',\, k}    (12)

where H' is the input, Y is the output, W is the matrix of learned weights, and i, j, k are the
input feature dimensions. As a result of the convolution, the layer's output may contain large
values, either positive or negative. These large positive and negative filter values can make
several neurons dead and prevent them from learning further signal patterns.
The activation layer is used to address this issue by mapping the high values to a smaller
range. Based on the mapping function, there are multiple activation functions available in the
literature, including Sigmoid, tanh, ReLU, and their variants. Various experiments are
conducted with different activation layers, and the ReLU activation is selected as the optimal
activation layer.
A large number of convolution layers produces many kernels or filters, which increases the
size of the learned features; model complexity and training time grow with feature size. After
each activation layer, a pooling layer is used to reduce the feature size. Max-pooling is used,
which selects the maximum value from each 2 × 2 window of a filter; the pooling layer is
applied over all convolutional features with a stride of 2 in both dimensions. The pooling
function can be formulated as:

Z_{ijk} = \max(Y_{i'j'k})    (13)

i \le i' < i + p, \quad j \le j' < j + p    (14)

where Y is the input, Z is the output of the pooling layer, and p is the pooling window size.

The final layer is a Softmax layer that computes the Softmax outputs for all spatial
locations across all feature channels in a convolutional manner. The number of neurons depends
on the number of classes, four in our case, as we evaluate the proposed method on a four-class
motor imagery dataset. The Softmax layer maps the input to output class probabilities and
assigns the maximum probability to the most likely class:


S_{ijk} = \frac{e^{Z_{ijk}}}{\sum_{t=1}^{Q} e^{Z_{tjk}}}    (15)

where Z is the input vector of dimension Q and S is the output of the Softmax layer.

Fig. 3-4. Long short-term memory (LSTM) unit.

3.3.3.2 LSTM network

Learning patterns from temporal data requires information retaining functionality.


Recurrent neural networks (RNNs) are strong candidates for learning sequential
information. However, recurrent deep learning models can face the vanishing gradient problem,
where a large number of layers shrinks the loss-function gradients and makes the neural
network hard to train [147]. To reduce the vanishing gradient problem, LSTMs can be used for
dynamic nonlinear time series. Similarly, the exploding gradient problem occurs when some
neurons take on very large values, which limits the further learning process; LSTM also
suppresses the exploding gradient problem.
To train the LSTM-based CNN model, the non-overlapping segments of energy maps obtained
from FFTEM are fed to the LSTM layer. The LSTM layer learns from the current input and retains
the information until the next input segment; with the arrival of every new input segment, a
fraction of the information from previous samples is also incorporated. The architecture of an
LSTM memory cell is given in Fig. 3-4, where a cell consists of three gating functions: the
forget, input, and output gates. By controlling these gates, a fraction of the prior inputs can
be selected as additional information alongside the current input. Each gating function
contains a corresponding nonlinear activation (sigmoid) that receives the current input x_t
and the prior hidden state h_{t-1}. The input-gate decision vector i_t can be formulated as:

i_t = f_g(w_{hi} h_{t-1} + w_{xi} x_t + b_i)    (16)

where w_{hi} and w_{xi} are weight matrices and b_i is the bias vector. The candidate
information is generated from h_{t-1} and x_t:

\tilde{C}_t = f_i(w_{hc} h_{t-1} + w_{xc} x_t + b_c)    (17)

The fraction of the prior cell state C_{t-1} to retain is determined by the decision vector
f_t, which is obtained from the forget gate f_g. The decision vector can be expressed as:

f_t = f_g(w_{hf} h_{t-1} + w_{xf} x_t + b_f)    (18)

A sigmoid scales the decision vector f_t between 0 and 1. The output gate f_o provides the
output of an LSTM cell as:

o_t = f_g(w_{ho} h_{t-1} + w_{xo} x_t + b_o)    (19)

Without prior information, it is not possible to know which model configuration best suits
the given problem. To obtain the best performance, we tried up to three LSTM layers with memory
cell counts ranging between 32 and 256. Multiple experiments have been carried out, and the
results are evaluated on the unseen (test) dataset.
Optimization of the proposed model targets all three stages: preprocessing, feature
extraction, and classification. For the LSTM-based CNN model, the Adam optimizer is used for
efficient and fast optimization, and categorical cross-entropy (CE) with mini-batch gradient
descent is used as the loss function. An initial learning rate of 0.001 is used, with
exponential decay rates of 0.9 and 0.999 for the first and second moments.
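
The following Keras sketch combines the pieces described above: an LSTM over the sequence of
flattened energy maps, followed by the convolutional blocks, trained with Adam and categorical
cross-entropy. Apart from the 20 segments, 128 memory cells, and optimizer settings stated in
the text, the layer sizes are illustrative assumptions:

from tensorflow.keras import layers, models, optimizers

n_steps, map_h, map_w, n_classes = 20, 22, 13, 4
model = models.Sequential([
    layers.Reshape((n_steps, map_h * map_w), input_shape=(n_steps, map_h, map_w)),
    layers.LSTM(128, return_sequences=True),   # 128 memory cells (cf. Table 3-4)
    layers.Reshape((n_steps, 128, 1)),         # per-step LSTM outputs as a 2D map
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999),
    loss="categorical_crossentropy", metrics=["accuracy"])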

3.4 Results and discussion

3.4.1 Dataset configuration

The EEG motor imagery dataset used in this research comes from BCI Competition IV 2a
[120]. This dataset consists of four motor imagery classes: tongue, both legs, left hand, and
right hand. A total of 9 subjects participated on two different days, and the data were
collected using 22 non-invasive Ag/AgCl electrodes. 288 trials were recorded on each day using
22 channels, while 3 extra channels recorded the electrooculogram (EOG). One session is used
for training and the other session is used for validation. The signals were band-limited to
0.5-100 Hz and sampled at 250 Hz; eye movements are separated from the class-discriminative
activity by filtering the data between 4 Hz and 40 Hz. Fixation, cue, motor imagery, break, and
resting segments were marked for each trial, as shown in the single-trial extraction diagram
in Fig. 3-5. Training and testing sessions were recorded separately, and the unseen test
sessions are used for the final evaluation.

3.4.2 Evaluation benchmark

For a comprehensive measurement of classification performance, the mean kappa coefficient
(κ) is used as the evaluation metric. Cohen's kappa and its mean are used for evaluation, as
recommended by the competition committee. κ is a robust statistical measure used for
inter-rater reliability testing and can be formulated as:

Fig. 3-5. Paradigm for extraction of a single trial.

\kappa = \frac{\sum_i \pi_{ii} - \sum_i \pi_{i+} \pi_{+i}}{1 - \sum_i \pi_{i+} \pi_{+i}}    (20)

where π_ii is the probability that both raters assign an event to the same class i, and π_{i+}
and π_{+i} are the marginal probabilities obtained from the confusion matrix, so Σ_i π_{i+}π_{+i}
is the agreement expected by chance. κ = 1 indicates full agreement, κ = 0 indicates agreement
no better than chance, and κ ≤ 0 denotes that the chance agreement exceeds the observed
agreement.
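
A minimal sketch computing Eq. (20) from a confusion matrix (equivalent, up to rounding, to
sklearn.metrics.cohen_kappa_score applied to raw labels):

import numpy as np

def cohen_kappa(confusion):
    pi = confusion / confusion.sum()            # joint probabilities π_ij
    p_obs = np.trace(pi)                        # Σ π_ii, observed agreement
    p_chance = pi.sum(axis=1) @ pi.sum(axis=0)  # Σ π_i+ π_+i, chance agreement
    return (p_obs - p_chance) / (1.0 - p_chance)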

3.4.3 System configuration

A mini-batch of 256 training examples is used to speed up the training process; 256 is
the maximum mini-batch size the system allows, limited by GPU memory. For training and
evaluation of the proposed model, its variants, and the classical classifiers, we used an
NVIDIA 1080Ti GPU, which enables parallelization and makes the gradients less noisy.
TensorFlow and Keras are used as machine learning frameworks.

3.4.4 Influence of temporal trimming window

A single trial of motor imagery EEG comprises 3 s (3-6 s), as shown in Fig. 3-5.
Classification performance for different time-frame ranges is given in Table 3-2. In addition
to the proposed LSTM-based CNN model, we also experimented with CNN, LDA, and SVM classifiers.
The CNN and LSTM-based CNN classifiers achieved their best mean κ in the 2.5-5.0 s time frame,
whereas the LDA and SVM classifiers achieved better mean κ values in the 2.5-4.5 s time frame.
These results indicate that the 2.5-5.0 s and 2.5-4.5 s time frames are the optimal choices
for extracting the most useful features. However, selecting a larger time frame also adds
computational overhead. This temporal trimming scheme reduces the computational complexity by
selecting the most effective time frames irrespective of the subject.

3.4.5 Influence of frequency bands

The selection of the frequency bands (µ, β) and their ranges may influence classification
performance. To analyze the effect of the µ and β ranges, we conducted a large number of
experiments with all feasible combinations of the two bands. Exemplar results of these
experiments are given in Table 3-3. All the classifiers perform best on the frequency ranges
(8-14 Hz, 19-25 Hz); a significant drop in mean kappa value can be seen for other µ and β
ranges. The selection of appropriate bands is a crucial task, as frequency bands are
subject-specific. This method uses the average contribution of all subjects to select the
optimal and relevant frequency bands.

Table 3-2. Influence of temporal window selection on classification performance (mean κ
value). The model is trained on BCI Competition IV 2a [144].

Time frame (seconds)   LDA    SVM    CNN    LSTM
2.5-6.0 0.55 0.56 0.72 0.76
2.5-5.5 0.55 0.56 0.72 0.78
2.5-5.0 0.57 0.57 0.75 0.81
2.5-4.5 0.58 0.58 0.7 0.76
2.5-4.0 0.57 0.58 0.67 0.7
3.0-6.0 0.55 0.57 0.64 0.68
3.0-5.5 0.55 0.56 0.64 0.67
3.0-5.0 0.53 0.55 0.64 0.67
3.0-4.5 0.53 0.54 0.62 0.65

Table 3-3. Influence of frequency bands on classification performance (mean kappa value).
Frequency Bands (µ, β) LDA SVM CNN LSTM
7-12, 16-20 0.55 0.55 0.6 0.63
8-11, 18-22 0.55 0.56 0.61 0.65
8-12, 19-23 0.57 0.57 0.64 0.72
8-13, 19-24 0.57 0.57 0.69 0.78
8-14, 19-25 0.57 0.58 0.75 0.81
8-15, 20-23 0.56 0.58 0.71 0.77
9-14, 21-25 0.55 0.57 0.68 0.72
9-14, 22-30 0.55 0.56 0.65 0.70

Table 3-4. Influence of the number of LSTM memory cells on dataset 2a, BCI Competition IV
[144].

Subjects   32     64     128    256   (number of memory cells)
A01 0.82 0.85 0.88 0.87
A02 0.69 0.71 0.76 0.74
A03 0.8 0.83 0.86 0.83
A04 0.82 0.82 0.88 0.87
A05 0.88 0.91 0.93 0.92
A06 0.65 0.68 0.69 0.68
A07 0.71 0.78 0.78 0.78
A08 0.73 0.76 0.74 0.74
A09 0.74 0.77 0.77 0.77
Mean 0.76 0.79 0.81 0.80

3.4.6 Influence of LSTM memory cells count

The number of memory cells in an LSTM unit corresponds to the number of prior samples
that can be remembered. A large number of memory cells increases the model's complexity and
training time, whereas a small number restricts the overall memory capacity. For an optimal
selection of the memory cell count, several experiments with different numbers of memory cells
have been carried out, and Table 3-4 shows the obtained results.
The proposed model performed best in the experiments that used a single LSTM layer
containing 128 memory cells. These results are consistent with the results reported in [148]
(which classified ERP signals). Increasing the LSTM memory cell count improved performance on
the training set by over-fitting the model, and reduced classification performance on the test
set. This analysis shows that the number of LSTM layers and the memory cell count are crucial
parameters that can degrade the evaluation results if increased or decreased away from the
optimal selection.

Table 3-5. Classifier performance evaluation (kappa coefficients) on dataset 2a, BCI
Competition IV [144].
Subjects LDA SVM CNN LSTM
A01 0.61 0.64 0.79 0.88
A02 0.45 0.47 0.76 0.76
A03 0.67 0.56 0.82 0.86
A04 0.63 0.71 0.66 0.88
A05 0.77 0.72 0.85 0.93
A06 0.45 0.48 0.86 0.69
A07 0.49 0.55 0.74 0.78
A08 0.51 0.56 0.72 0.74
A09 0.55 0.53 0.75 0.77
Mean k 0.57 0.58 0.75 0.81
Standard Deviation 0.109 0.091 0.064 0.079

3.4.7 Evaluation of classifiers

This research proposes an LSTM-based CNN classifier together with the feature extraction
pipeline. Along with the proposed model, the base CNN model without the LSTM layer and other
classical classifiers were tried for four-class motor imagery EEG signal classification. The
evaluation of all experimented classifiers is given in Table 3-5. For the multi-class
classification problem, the LDA classifier is trained and evaluated in one-versus-all mode.
The SVM classifier is trained and optimized over its kernel-size and soft-margin
hyper-parameters. The SVM and LDA classifiers performed similarly for individual subjects;
however, the SVM classifier outperformed the LDA classifier in mean κ. Similarly, the CNN
model without LSTM layers performed better than the LDA and SVM classifiers, showing that CNN
models learn more discriminative features than conventional classifiers. Furthermore, the CNN
classifier had the lowest standard deviation (0.064); a lower standard deviation indicates
better generalization of the classifier. The proposed LSTM-based CNN model achieved a mean κ
of 0.81 with a standard deviation of 0.079. This result indicates that spatio-spectral
features, combined with the capability to retain prior information, yield better
classification by remembering complicated trends and transitions.

3.4.8 Comparison with the state-of-the-art methods

The previous sections discussed the hyper-parameter optimization of the preprocessing,
feature extraction, and classification stages; each classifier type was optimized together
with the best-suited preprocessing and feature extraction hyper-parameters. In this regard,
the proposed method has proven able to classify four-class motor imagery signals with an
improved mean κ value. However, a comparison of the proposed LSTM- and CNN-based method with
state-of-the-art methods is necessary, both in terms of mean κ and computational complexity.
This section discusses and compares classification accuracy, and the next section discusses
the computational complexity of different approaches relative to the proposed method. A
performance comparison of the proposed method and state-of-the-art methods is given in Table
3-6, where the individual κ value for each subject is reported and the mean κ scores are shown
in the last column.
Separable Common Spatio-Spectral Patterns (SCSSP) [115], a modified form of CSP
filtering, yielded the lowest mean κ score of 0.44. The transductive and inductive learning
method [116] reported a mean κ of 0.58, using KNN and SVM in a cascaded manner; the authors
reported that such sequential modeling adds computational overhead compared to a single
classical classifier (SVM or LDA). The work in [127] utilized the divide-and-conquer (DC)
multi-class scheme and achieved a mean κ of 0.57. The LTCSP feature extraction method with an
LDA classifier [111] achieved a mean κ of 0.61, performing better than
[127,149,109,110,112,115-117,119] and equal to [120]. FBCSP filters with an SVM classifier
achieved a mean κ of 0.59 for multi-class motor imagery classification [75]. Another work on
spatiotemporal features [120] utilized energy maps as the final feature set, fed them to a
naive CNN-based classifier, and obtained a mean κ of 0.61 on BCI Competition IV dataset 2a. A
Non-Negative Matrix Factorization (NNMF) feature extraction method with an SVM classifier
reported a mean κ of 0.62 [150].
To combine temporal, spatial, and spectral information, [114] used Regularized Linear
Discriminant Analysis (RLDA)-based combined models and achieved a mean κ of 0.74; their
results confirm the value of temporal, spatial, and spectral features for feature extraction.
Similarly, another recent work utilized FBCSP for spatial feature extraction and a deep hybrid
network based on LSTM to incorporate temporal information; their method achieved the
second-highest mean κ of 0.80 [122], showing the importance of spatiotemporal features and
LSTM-based networks, which allowed them to extract meaningful and discriminative features. In
[143], researchers used the same approach with a different LSTM configuration, combining
spatio-spectral and temporal features with an LSTM-based CNN classifier; this approach
achieved a mean κ of 0.64 after optimizing each module. In contrast, the proposed method
extracts spatio-spectral features and incorporates the temporal information using an
LSTM-based CNN model with extensive hyper-parameter optimization. Spatio-spectral and temporal
features and optimal hyper-parameters allowed the proposed model to achieve a mean κ of 0.81.
These results show that the proposed model outperforms all compared models.
We also show confusion matrices for a few representative subjects (subject 2 and subject
7) in Fig. 3-6, for which the CNN and LSTM classifiers are evaluated. For subject 2, the LSTM
classifier outperforms all compared classifiers; similarly, for subject 7, the proposed
classification model performs better than all compared classifiers. Moreover, the LSTM not
only detects more true positives but also predicts fewer false positives than the CNN
classifier.

Table 3-6. Comparison of classification accuracy (kappa coefficient κ) per subject on dataset
2a, BCI Competition IV [144].

Methods                        A01   A02   A03   A04   A05   A06   A07   A08   A09   Mean κ
Ang et al. [127]               0.71  0.37  0.66  0.41  0.40  0.26  0.73  0.58  0.50  0.52
Ang et al. [127]               0.68  0.42  0.75  0.48  0.40  0.27  0.77  0.75  0.61  0.57
Ang et al. [127]               0.78  0.40  0.75  0.52  0.41  0.19  0.80  0.74  0.54  0.57
Wang et al. [149]              0.56  0.41  0.43  0.41  0.68  0.48  0.80  0.72  0.63  0.57
Asensio-Cubero et al. [151]    0.70  0.45  0.71  0.96  0.60  0.43  0.55  0.61  0.57  0.62
Asensio-Cubero et al. [109]    0.76  0.32  0.76  0.47  0.31  0.34  0.59  0.76  0.74  0.56
Kam et al. [110]               0.74  0.35  0.76  0.53  0.38  0.31  0.84  0.74  0.74  0.60
Ghaheri and Ahmadyfard [111]   0.74  0.43  0.80  0.57  0.31  0.38  0.76  0.75  0.80  0.61
Metlicka [112]                 0.66  0.42  0.77  0.51  0.50  0.21  0.30  0.69  0.46  0.50
Nicolas-Alonso et al. [114]    0.84  0.55  0.90  0.71  0.66  0.44  0.94  0.85  0.76  0.74
Aghaei et al. [115]            0.62  0.28  0.66  0.33  0.14  0.25  0.41  0.60  0.66  0.44
Raza et al. [116]              0.88  0.22  0.88  0.39  0.53  0.33  0.38  0.85  0.81  0.58
Zanini et al. [117]            0.74  0.38  0.72  0.50  0.26  0.34  0.69  0.71  0.76  0.57
Hadsund and Leerskov [118]     0.83  0.51  0.88  0.68  0.56  0.35  0.90  0.84  0.75  0.70
Abbas and Khan [119]           0.68  0.43  0.70  0.61  0.80  0.42  0.55  0.55  0.56  0.59
Abbas and Khan [120]           0.70  0.45  0.76  0.58  0.71  0.51  0.63  0.56  0.60  0.61
Sakhavi et al. [121]           0.83  0.54  0.87  0.55  0.50  0.27  0.86  0.78  0.72  0.66
Zhang et al. [122]             0.85  0.54  0.87  0.78  0.77  0.66  0.95  0.83  0.90  0.80
Rammy et al. [143]             0.71  0.47  0.84  0.64  0.56  0.68  0.59  0.64  0.63  0.64
Proposed                       0.88  0.76  0.86  0.88  0.93  0.69  0.78  0.74  0.77  0.81

3.4.9 Computational complexity

EEG signal classification mainly comprises preprocessing, feature extraction, and
classification stages, and the computational complexity of each stage depends mainly on the
type of method used. During training, the classifier requires more computing power than the
preprocessing or feature computation stages. At inference time, by contrast, the compute
requirements of feature extraction and classification become comparable.
To measure the computational complexity at test (inference) time, let the input to a
classifier be R samples of dimension n × d. The FBCSP feature extraction method was used in
[127] for multi-class motor imagery classification; with feature size m, the computational
complexity of FBCSP is O(m^3 + Rm^2). The authors used the PZ classifier of complexity
O(n, d, R), so the overall computational complexity of their approach can be expressed as
O(n, d, R) + O(m^3 + Rm^2).
A CSD-based algorithm was used in [116], which mainly consists of a moving-average filter
and statistical hypothesis tests of the K-S and HT-S types. The K-S test has computational
complexity O(n log n) and the HT-S test O(n). The overall computational complexity of the CSD
method can be expressed as O(n) + O(nd) + O(Nn) + O(n log n) + O(n).
Compared to FBCSP, the LTCSP method [111] selects the weight parameters manually rather
than via inter-class variance, which adds computational overhead to the LTCSP method.
Combining the LTCSP feature extraction approach with an LDA classifier, the overall
computational complexity is O(nd^2) + O(m^3 + Rm^2).
The most recent comparable method utilized FBCSP for feature extraction and an LSTM-CNN
classifier for four-class motor imagery classification, using only spatial and temporal
features; the overall complexity of its feature extraction and classifier can be formulated
as O(m^3 + Rm^2) + O(n^2).
In contrast, we use CSP, FFTEM, and an LSTM-based CNN classifier, for which the overall
computational complexity can be expressed as O(m^3 + Rm^2) + O(n log n) + O(n^2). The overall
computational complexity of the proposed method is therefore slightly higher than the compared
methods but similar to the approach presented in [143]. With the advances in accelerated
processors, however, computational complexity is not a major issue. In practice, the average
training and inference times of the proposed model on the described system configuration are
3.2 min and 240 ms, respectively.

Fig. 3-6. Confusion matrices for a few representative subjects, comparing the proposed model
with different classifiers on dataset 2a, BCI Competition IV [144].

3.4.10 Limitations
This research uses forehead and frontal-lobe electrodes to capture the EEG MI signals.
Forehead-based EEG devices are easier to use in daily life and easier to install than
traditional EEG devices. However, forehead-type brain-computer interfaces are prone to EOG and
electromyography (EMG) contamination: movements of the forehead and eye muscles introduce
inevitable EOG and EMG signals, so motor imagery BCI research should ensure minimal
involvement of EOG and EMG signals. A further limitation of this research concerns dataset
generalization: the dataset used here contains only 9 subjects, while EEG signals are known to
be highly subject-specific. For better generalization, a larger number of subjects is required
to demonstrate the robustness of the proposed method.

3.5 Conclusion

This research proposes a novel method for multi-class motor imagery EEG signal
classification that learns the intricate patterns in multi-channel signals. The proposed
approach utilizes CSP filters to extract spatial features and FFTEM to extract spectral
features, and the spatio-spectral features are converted into 2D images. To learn the temporal
features using the LSTM layer, each EEG trial is divided into small temporal segments, which
are used to train the classifier. The LSTM-based CNN classifier retains prior sample
information and learns the spatio-spectral features of the segments; learning in this way
allows the model to extract the hidden patterns of EEG signals. Several experiments have been
carried out to find the optimal parameters for the preprocessing, feature extraction, and
classification modules. In comparison with existing state-of-the-art methods, the proposed
method can learn and classify complex patterns effectively. The experimental results and
analysis demonstrate that the proposed method can be used to classify multi-class motor
imagery EEG signals. However, further improvements could enhance the generalization of the
proposed method, for example by synthesizing data using generative adversarial networks [152].

4 Retinal Vessel Segmentation with Conditional Patch Generative Adversarial Networks

4.1 Introduction and Related Works

Over the last decade, retinal blood vessel segmentation of fundus images has attracted the
attention of medical specialists for the analysis and identification of diseases. In a report
published by the American Diabetes Association [153], glaucoma and diabetic retinopathy are
identified as major sources of visual impairment, especially among people aged above 20 years,
with approximately 4.2 and 2.3 million Americans suffering from diabetic retinopathy and
glaucoma, respectively. According to a report published by the International Diabetes
Federation [154], there are 342 million diabetes patients worldwide.
Similar to pathology diagnosis, computer-aided retinal blood vessel segmentation can
assist diagnosis based on the patterns present in vessel branches and visible features such as
width, density, and tortuosity. Such analysis reveals the cause of a specific pattern in
individual patients, and ophthalmologists can predict signs of hypertension and vascular
disease at an early stage by analyzing retinal vascular patterns. The rate of blindness caused
by diabetic retinopathy also grows with the number of diabetic patients. Computer-aided
retinal vessel segmentation with high accuracy is therefore required to offload clinicians and
save resources [155].
In recent years, machine learning for classification and segmentation has attracted
significant attention from researchers due to its performance [156-159]. Many attempts have
proven CNNs' ability for retinal vessel segmentation, even exceeding humans and traditional
approaches on different datasets [160-163]. These methods, however, suffer from blur and
produce false positives around indistinct and tiny vessel branches. The main reason for this
rather contradictory result is the objective function used in state-of-the-art CNN models,
which is applied in a pixel-wise manner; unfortunately, a pixel-wise objective function cannot
capture the complex structure of blood vessels. The most recent evidence on retinal
segmentation using CNNs [164] employed a Fully Convolutional Network (FCN); that approach
detects sharp edges by applying post-processing to the segmentation results, but segmentation
of both thick and thin vessels has not been reported.
To address this problem, a novel approach for retinal vessel segmentation is proposed in
this research. In contrast to other models, the proposed model differs in its choice of
generator as well as discriminator. The proposed model not only extracts sharp retinal vessels
with fewer false positives but is also able to segment both thin and thick vessels. Our
investigation shows that the generative model improves the quality of the vessel segmentation
results.

4.2 Methodology

The proposed model for retinal vessel segmentation is applied to each fundus image in two
stages. In the first stage, a deep generative model is trained to generate vessel maps. In the
second stage, the output of the generative model and the ground-truth vessel masks are passed
to the discriminator, which discriminates generated from actual vessel maps until it can no
longer tell the two inputs apart. The proposed vessel segmentation model is evaluated on two
publicly available benchmark datasets. The model framework is shown in Fig. 4-1.

Fig. 4-1. The proposed Conditional Patch Generative Adversarial Network (CP-GAN) framework
for retinal vessel segmentation. Key terms: G: generator; D: discriminator; X: conditional
data; Z: noise vector; Y: synthetic data.

In segmentation problems, the extraction of low-level features is a crucial step toward
finer segmentation results [165, 166]. To incorporate low-level features, we follow the spirit
of UNet [166] and SegNet [167] for the generator network and extract low-level features to
segment thin blood vessels. Since the dataset samples are very few in our case, training can
be very hard. This problem can be addressed with pre-trained neural networks, as proposed in
[165], or with data augmentation that increases the number of samples, as suggested in UNet
[166]. Furthermore, providing a copy of the low-level features to the corresponding high-level
features allows feature propagation, which not only preserves finer details but also
facilitates the training process during backward propagation.
Given a conditional fundoscopic image sample x of width W and height H, its corresponding
vessel segmentation map y, and a random noise distribution z, the generator network G is a
mapping function that maps the conditional input x and the noise z to a segmentation map
G(x, z): G : {x, z} → G(x, z). A patch-based discriminator network D takes the two pairs
{x, y} and {x, G(x, z)} as input in the form of patches and classifies each patch as ground
truth y or synthetic segmentation map G(x, z), producing scores between one and zero: {1, 0}^n,
where the hyperparameter n represents the total number of patches fed to the discriminator D.
The hyperparameter n can be chosen between 1 and the total number of pixels of the input
sample; any value 1 < n ≤ (W × H) divides the discriminator input into n patches, and the
score on a single image is computed by averaging the scores of all patches. The loss functions
of the generator and discriminator networks can be formulated as:

L_G(G, D) = E_{x,z}\left[ -\log D(x, G(x, z)) \right]    (1)

L_D(G, D) = E_{x,y}\left[ -\log D(x, y) \right] + E_{x,z}\left[ -\log(1 - D(x, G(x, z))) \right]    (2)

where z is the input noise vector of a latent space, x is the conditional sample, and y is the
corresponding ground truth. The objective function of the proposed model can be formulated as:

J_{CPGAN} = \arg\min_G \max_D \; E_{x,y}\left[ \log D(x, y) \right] + E_{x,z}\left[ \log(1 - D(x, G(x, z))) \right]    (3)

Studies have found that the L2 norm in generative networks produces blurry images: it fails to
capture the low-frequency components present in fundoscopic images, in the form of smooth
edges, and skips the high frequencies in all generated images [168]. In such cases new
frameworks are not required, since the L1 norm can capture the low-frequency components, as
suggested in [169]. This motivated us to integrate the L1 norm into the objective function to
handle low-frequency correctness and let the discriminator network model only the
high-frequency components. The L1 norm used in the generator loss function can be formulated as:

L_{L1}(G) = E_{x,y,z}\left[ \| y - G(x, z) \|_1 \right]    (4)

Combining the objective function and the additional loss term, the final objective function of
the proposed model can be formulated as,


J^*_{CPGAN} = \arg\min_G \max_D \; J_{CPGAN}(G, D) + \lambda L_{L1}(G)    (5)

where λ is a hyperparameter. The discriminator D is trained to maximize the probability
assigned to the training data and to minimize the probability assigned to samples obtained
from the generator G. The generator is trained to confuse the discriminator between generated
segmentation maps and ground truths. The discriminator and generator can be trained in
alternation using the stochastic gradient descent method.
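
A hedged TensorFlow sketch of the losses in Eqs. (1)-(5); generator and discriminator are
assumed to be Keras models, the disc_* arguments are raw discriminator logits over the n
patches, and LAMBDA = 10 as in the experiments below. Averaging the patch-wise cross-entropy
implements the patch-score averaging described above:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 10.0

def generator_loss(disc_fake, fake_y, real_y):
    adv = bce(tf.ones_like(disc_fake), disc_fake)     # -log D(x, G(x, z)), Eq. (1)
    l1 = tf.reduce_mean(tf.abs(real_y - fake_y))      # L1 term, Eq. (4)
    return adv + LAMBDA * l1                          # combined objective, Eq. (5)

def discriminator_loss(disc_real, disc_fake):
    real = bce(tf.ones_like(disc_real), disc_real)    # -log D(x, y)
    fake = bce(tf.zeros_like(disc_fake), disc_fake)   # -log(1 - D(x, G(x, z)))
    return real + fake                                # Eq. (2)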

Table 4-1. Comparison of generator models in standalone (SA) mode, using 640 × 640 inputs, and
with discriminators (D) of different patch sizes (640 × 640, 120 × 120, 40 × 40, 10 × 10,
1 × 1), in terms of PR and ROC on the DRIVE [170] and STARE [171] datasets. Key terms:
precision-recall (PR), receiver operating characteristics (ROC).

Generator        Mode         DRIVE PR   DRIVE ROC   STARE PR   STARE ROC
RCNN             SA           0.9062     0.9758      0.9008     0.9791
                 D 640×640    0.9057     0.9773      0.9080     0.9795
                 D 120×120    0.9062     0.9775      0.9086     0.9803
                 D 40×40      0.9043     0.9770      0.9067     0.9789
                 D 10×10      0.9035     0.9768      0.9052     0.9781
                 D 1×1        0.9030     0.9764      0.9033     0.9775
DeepMask         SA           0.8955     0.9718      0.8956     0.9749
                 D 640×640    0.9008     0.9736      0.9001     0.9768
                 D 120×120    0.9033     0.9745      0.9012     0.9771
                 D 40×40      0.8991     0.9729      0.8993     0.9765
                 D 10×10      0.8972     0.9725      0.8979     0.9760
                 D 1×1        0.8961     0.9722      0.8962     0.9757
UNet             SA           0.9038     0.9778      0.9049     0.9789
                 D 640×640    0.9102     0.9794      0.9125     0.9826
                 D 120×120    0.9149     0.9803      0.9167     0.9838
                 D 40×40      0.9088     0.9791      0.9099     0.9811
                 D 10×10      0.9071     0.9788      0.9064     0.9798
                 D 1×1        0.9054     0.9784      0.9051     0.9794
Proposed model   SA           0.9057     0.9789      0.9062     0.9790
                 D 640×640    0.9137     0.9803      0.9154     0.9831
                 D 120×120    0.9154     0.9817      0.9180     0.9845
                 D 40×40      0.9126     0.9795      0.9133     0.9819
                 D 10×10      0.9098     0.9789      0.9098     0.9802
                 D 1×1        0.9065     0.97879     0.9082     0.9795

4.3 Experiments and Results

To explore the credibility of each candidate and find the best-suited generator, we
conducted an extensive number of experiments on both datasets (DRIVE [170] and STARE [171]),
as shown in Table 4-1. All experiments are conducted in Keras with TensorFlow as the back-end.
Two publicly available datasets (DRIVE [170] and STARE [171]) are used for training and
evaluation. The DRIVE dataset is divided into training and testing sets: the first annotator's
data are used for training and the second annotator's data for testing. The STARE dataset
contains 20 images, of which the first 10 are used for training and the last 10 for testing.
To remove bias from the images, each channel is normalized to a z-score by subtracting the
mean and dividing by the standard deviation. Augmentation (rotation and horizontal flips) is
applied to generalize the dataset. A fixed learning rate of 0.0002 with β1 = 0.5 is used for
all experiments. The trade-off coefficient λ in the objective function is fixed to λ = 10.
The Area Under the Curve for the Receiver Operating Characteristic (ROC-AUC), the Dice
coefficient (F1), and the Area Under the Curve for Precision and Recall (PR-AUC) are used as
benchmarks for the evaluation of the proposed method. For the Dice coefficient, a threshold
for the probability map is chosen as proposed in [177], a method often used to separate
background and foreground in grey-level histograms. Because the STARE dataset does not contain
mask images, masks are generated using blob detection and the number of pixels in the field of
view is counted.
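
A sketch of these metrics on flattened probability maps using scikit-learn; prob_map and
gt_mask are illustrative arrays restricted to field-of-view pixels, and the fixed threshold
stands in for the histogram-based choice of [177]:

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, f1_score

def evaluate(prob_map, gt_mask, threshold=0.5):
    y_score = prob_map.ravel()
    y_true = gt_mask.ravel().astype(int)
    pr_auc = average_precision_score(y_true, y_score)            # PR-AUC
    roc_auc = roc_auc_score(y_true, y_score)                     # ROC-AUC
    dice = f1_score(y_true, (y_score >= threshold).astype(int))  # Dice / F1
    return pr_auc, roc_auc, dice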

Table 4-2. Comparison with state-of-the-art methods on the STARE and DRIVE datasets. Key
terms: precision-recall (PR), receiver operating characteristic (ROC), Dice coefficient (DICE).

Model                    STARE                      DRIVE
                         PR      ROC     DICE       PR      ROC     DICE
Kernel Boost [174] 0.8888 - - 0.8464 0.9306 0.8000
HED [163] 0.8433 0.9764 0.8050 0.8773 0.9696 0.7960
Wavelet [175] - 0.9694 0.7740 0.8149 0.9536 0.7620
N4-Fields [176] - - - 0.8851 0.9686 0.8050
DRIU [161] 0.9101 0.9772 0.8310 0.9064 0.9793 0.8220
FCN [164] - 0.9810 - - 0.9900 -
Human Expert - - 0.7600 - - 0.7910
Proposed Model 0.917 0.9845 0.8420 0.9154 0.9817 0.8320

For the selection of hyper-parameters, multiple experiments were carried out with an
extensive set of generator and discriminator combinations. In Table 4-1, the deep learning
generator models are first evaluated in a standalone scheme, trained in the same environment
on both datasets. Precision-recall (PR) and ROC metrics are used for evaluation.
The standalone UNet achieved a PR of 0.9038 and an ROC of 0.9778. The performance of
different discriminators is evaluated on both the DRIVE [170] and STARE [171] datasets: the
standard deep learning models are trained standalone (without discriminators) as well as with
a discriminator, and different patch sizes are selected for the discriminators. Each image is
resized to 640 × 640, and discriminator patches of 640 × 640, 120 × 120, 40 × 40, 10 × 10, and
1 × 1 are used. First, each generator network is trained with a discriminator patch of size
640 × 640; in a similar manner, each network is trained for each patch dimension and evaluated
using the PR and ROC metrics, as shown in Table 4-1. Discriminators with large patches
(640 × 640 and 120 × 120) performed better than those with 40 × 40, 10 × 10, and 1 × 1
patches, with the 120 × 120 patch giving the best results.
Further, it can be seen in Fig. 4-2 that the UNet-style generators with a patch-based
discriminator outperformed the networks built around the other generators. This substantiates
previous findings in the literature that the choice of discriminator is a vital part of a
generative adversarial network [178], [179]. The proposed GAN model is trained and tested on
the training and testing data, respectively. The Dice coefficients and Area Under the Curve
(AUC) for the Receiver Operating Characteristic (ROC) of the different models are summarized
in Table 4-2. For a fair comparison, the images and Dice coefficients of the other methods are
adopted from [161]. We computed the F1 score (Dice coefficient), which is similar to the
Jaccard similarity index. The proposed model surpasses human performance on the STARE and
DRIVE datasets and outperforms all compared state-of-the-art models.
The precision-recall (PR) curves for the proposed model and the state-of-the-art methods
on both datasets are shown in Fig. 4-2. The results show that the proposed model outperforms
all compared models on both datasets and matches the consistency of human annotations in
blood vessel segmentation.
False positives, false negatives, true positives, and true negatives measure the
uncertainty of the results and identify wrongly segmented pixels. A qualitative comparison of
the proposed model and the DRIU [161] method is shown in Fig. 4-3. It shows that DRIU [161]
produces more false negatives (red marks) and false positives (blue marks) than the proposed
model, due to over-confident probability mapping, while the proposed model produces more true
positives (green marks) around narrow vessels and at the ending edges of vessels. The
patch-based GAN yields few false positives (blue marks) and false negatives (red marks)
because, like a human annotator, it assigns a low probability to the segmentation maps near
uncertain blocks of vessels.
Further, Fig. 4-4 shows a qualitative comparison of the proposed model and the
best-reported model (DRIU); both models are evaluated on the DRIVE (first rows) and STARE
datasets. The proposed model outperforms DRIU: its results (column iv) are closer to the
ground truths (column ii) than the DRIU results (column iii). The proposed model not only
generates probability maps very close to the ground truths but also segments the thin
vessels. These results justify the overall network configuration and loss function compared
to a standalone generator network.

Fig. 4-2. Precision-recall and receiver operating characteristic (ROC) curves on the DRIVE
[170] and STARE [171] datasets.

4.4 Conclusion

We proposed a novel Conditional Patch Generative Adversarial Network for the segmentation
of retinal blood vessels from fundoscopic images. First, the generative network is used for
segmentation, and then a discriminator is applied to increase the accuracy of the generator.
We use a patch-based discriminator, which distinguishes the generated segmentation maps from
the ground truth patch-wise and averages the scores over the entire image. Further, we impose
a condition on the generator and discriminator to train the model. We evaluated the proposed
model against the state-of-the-art methods reported in the literature on the publicly
available DRIVE and STARE datasets. The overall evaluation shows that the proposed model
outperforms all compared methods. The proposed model also has an advantage over plain GANs
and end-to-end deep learning models, as it yields better precision-recall and ROC results.

Fig. 4-3. Comparison of the proposed model with DRIU.
First row: DRIVE dataset; second row: STARE dataset.
First and third columns: results of the DRIU [161] model; second and fourth columns: results
of the proposed model. Green, blue, and red marks: true positives, false positives, and false
negatives.

Fig. 4-4. Visual comparison of the obtained results.
Column 1: fundoscopic image; column 2: ground truth; column 3: DRIU [161] results; column 4:
results of the proposed method. Rows 1-2: DRIVE dataset; rows 3-4: STARE dataset.

5 Motor Imagination Recognition

5.1 Introduction and Related Works

BCI is an emerging technology that interfaces with the motor cortex area of the human
brain and enables brain-aided control of devices and computers. Handicapped people who have
lost control over motor functions due to severe injuries can be connected to such a BCI system
to restore their abilities. Brain activity related to motor imagery (MI) tasks can be recorded
with the 10-20 electrode placement system in invasive or non-invasive settings;
electroencephalography is considered the most effective way to collect non-invasive MI
signals. A BCI system empowers patients to utilize motor imagery signals, by mapping their
sensory-motor function, to control prosthetic devices, 2D cursors [180], wheelchairs [181],
and quad-copters [182]. Therefore, great efforts have been made to classify motor imagery
signals effectively [183]. In the last decade, various publications have addressed the
classification of MI signals using Convolutional Neural Networks (CNNs). In [184], the
authors utilized Steady-State Visually Evoked Potentials (SSVEP), in which a visual stimulus
was used with a four-layer CNN followed by a Fourier transform layer; they contended that
converting temporal data into the frequency domain results in better classification accuracy.
Deep Belief Networks (DBNs) are another deep architecture that learns patterns layer by
layer, such that the current layer's output serves as the input to the next layer. Several
works have also focused on DBNs to classify motor imagery signals. To classify emotions,
[185] employed a DBN fed with differential entropy features computed on EEG signals; to
capture reliable emotional stage switching, a Hidden Markov Model was utilized. Another deep
learning method was reported to classify two-class motor imagery tasks [186]: a DBN
classifier was trained on single-channel data, and multiple Restricted Boltzmann Machines
(RBMs) were later combined as a stack using the AdaBoost algorithm. The DBN was trained with
the Contrastive Divergence (CD) algorithm, and the authors reported better performance than
SVM classifiers. Similarly, a new RBM-based deep learning scheme was proposed in [187]:
frequency-domain features were obtained using Wavelet Packet Decomposition (WPD) and the fast
Fourier transform, and a four-layer classifier was proposed by stacking three RBMs topped with
a Softmax regression layer as the final prediction layer; the network was named the
Frequential Deep Belief Network (FDBN). Conjugate gradient and back-propagation were used to
fine-tune the network. FDBN was evaluated on benchmark datasets and showed significant
performance improvement in comparison with the state-of-the-art methods.
However, the major challenge in the classification of MI EEG signals arises from the fact
that the recorded brain signals are very small in amplitude. Events such as eye blinks, eye
movements, muscular movements, teeth grinding, and heart rhythm therefore interfere with the
EEG signal, resulting in a signal with a low signal-to-noise ratio (SNR). This prevents the
decoding system from correctly decoding user thoughts. Further, while time- and
frequency-based features are reported in the literature, feeding these features as a temporal
stream has not been reported yet. Training a model such that it must remember the previous
temporal inputs to learn a sequential pattern can increase classification performance. To
address this issue, we propose a novel approach that retains the information of previous
samples: in contrast to the previous deep learning based methods, we incorporate the temporal
information of previous samples rather than training the classifier only on the current sample.

Fig. 5-1. Proposed model. Input: four-class motor imagery data; output: class confidence.

5.2 Methodology

The proposed deep learning model consists of three sequential stages: (i) data preprocessing,
(ii) feature extraction, and (iii) classification. The structure of the proposed model is
shown in Fig. 5-1. In the preprocessing stage, temporal trimming, temporal segmentation,
frequency band selection, and normalization are performed. To yield the best results,
considerable attention has been paid to all preprocessing techniques, and near-optimal
parameters are selected from an exhaustive list of experimental variants. To find the most
discriminative features and to ensure maximum variance among all the features, CSP filters
are used. These discriminative features are transformed into energy maps using Fast Fourier
Transform Energy Maps (FFTEM) and used as the final feature set for the classifier. An
LSTM-based convolutional neural network is proposed as the classifier to retain the temporal
information and classify a sequence of temporal segments.

5.2.1 Preprocessing

Artifact reduction is important in the preprocessing stage, as artifacts in raw EEG data
can affect classification performance. Temporal trimming around the occurrence of the actual
motor imagery task and the corresponding frequency ranges are also important factors. A
time-based window is used to trim each EEG trial so that the trimmed samples contain the most
effective motor imagery patterns. To form a temporal stream, temporal segmentation is applied
by dividing each trial into a stream of chunks. Since each chunk may contain different
amplitude ranges, each chunk of the EEG signal is normalized to balance the data ranges.

5.2.1.1 Temporal Trimming and Segmentation

In practice, the entire window of each trial is used for feature extraction. Previous
experiments have shown that the effectiveness of feature extraction and classification
performance is highly dependent on subject-specific time segments of EEG signals [188]. To
address this issue, overlapping time segments of different durations in the 3.0-6.5 s range
are tried out to extract a near-optimal range of each trial. To form the temporal sequence of
a single trial, we further divide each trial obtained from the temporal trimming stage into
multiple chunks, where each chunk covers a 0.1 s interval of the trimmed trial. Dividing a
trial in this way yields 20 temporal segments as a sequence.

5.2.1.2 Frequency Bands Selection

The occurrence of a motor imagery task can vary within the 8-30 Hz frequency range and
depends strongly on the subject. Based on prior knowledge, the two frequency bands µ (8-13 Hz)
and β (13-30 Hz) are filtered, as these bands are reported as the most effective band ranges
in the literature [189]. For the filtration of these bands, a 5th-order Butterworth filter is
used, yielding the two frequency bands (µ, β).

5.2.1.3 Normalization

After trimming, segmenting and filtering the data, normalization is applied on each
segment to remove the constant bias. For each session and each electrode i, the mean µ(xi) of
the signal is subtracted from every time measurement sample xi(t). The result is divided by the
standard deviation 𝜎(xi).

5.2.2 Feature Extraction

The filtered streams obtained from the preprocessing stage are fed to the feature
extraction stage, where cascaded features are computed. In the first stage, CSP filters are
applied to find discriminative spatial features that maximize the variance of one class while
simultaneously minimizing the variance of all other classes. Variance maximization and
minimization is achieved by determining spatial patterns via simultaneous diagonalization of
the covariance matrices of all classes. These discriminative features are then used to
generate FFT energy maps that capture hidden spectral and temporal information.

5.2.2.1 Energy Maps

In addition to spectral features, computing the energy density of the spectral features can
yield more prominent features of an event. To extract the spectral information, we use the
fast Fourier transform, from which the energy maps are obtained as Fourier Transform Energy
Maps. For the generation of 2D (energy vs. frequency) feature maps, FFTEM is a strong
candidate compared to the Fourier transform (FT) and wavelets in terms of computational
complexity, without compromising performance [190].
The discriminative features obtained from the CSP filters are fed to the FFTEM module to
compute energy-vs-frequency maps for the µ and β bands. The resulting maps contain the
information of all 22 channels for the µ and β frequency bands. An example of an energy map
(for the entire frequency range) is shown in Fig. 5-2, where it can be seen that the high
energies lie between 8 and 27 Hz and are distributed among all channels. All EEG channels are
mapped onto the spectral features, which can be formulated as:
𝐸𝑀𝑖 = |𝐹𝐹𝑇(𝑦𝑖 )|, 𝑖 = ̅̅̅̅̅
1, 𝑁 (1)

where $EM_i$ is the obtained 2D energy map and N is the total number of segments. $y_i$ is a
single CSP feature obtained from the CSP filter and is used as input to the FFTEM module.
These energy maps are used as the final feature set for the classifier in both the training and
testing phases.
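
A direct reading of Eq. (1) in NumPy might look as follows; the use of the real-input FFT (`rfft`) is an assumption made for compactness.

```python
import numpy as np

def fft_energy_map(csp_segments):
    """FFTEM (Eq. (1)): magnitude spectrum |FFT(y_i)| of each CSP-filtered
    segment, giving one 2D (filter x frequency-bin) energy map per segment."""
    return np.abs(np.fft.rfft(csp_segments, axis=-1))

# csp_segments has shape (n_segments, n_csp_filters, samples),
# e.g. the result of projecting each normalized chunk with z = W @ chunk.
energy_maps = fft_energy_map(np.random.randn(20, 4, 25))  # dummy input
```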

Fig. 5-2. Temporal segmentation and sequence of FFT Energy Maps (EM).


5.2.3 Classification

To incorporate the temporal information of the EEG signal, we introduce an LSTM-based CNN
classifier, in which an LSTM layer is placed before the CNN classifier. For a fair comparison,
we also experimented with a stand-alone CNN network without the LSTM layer and with
conventional classifiers: Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM).
To classify multi-class motor imagery EEG signals, a baseline CNN is used that contains two
convolutional layers, a pooling layer after each convolutional layer, a fully connected (FC)
layer, and a final classification (Softmax) layer, as shown in Fig. 5-1.
The convolution layer consists of non-linear filters and performs 3D convolution over the
normalized feature maps obtained from the FFTEM module. The filters of a convolutional layer
learn the discriminative features hidden in the energy maps and update the weights of the model
on each iteration. Features obtained from the convolutional layer may contain a large number of
positive or negative values, which can prevent some neurons from learning the patterns
effectively. To solve this issue, an activation function is used to map the convolution features
onto a predefined interval. No single activation function suits every problem, so we evaluated
multiple activation functions for the given problem and found ReLU to be the most effective.
The convolution layer produces a large number of features, which can increase the
computational complexity. To reduce the feature size, a pooling layer is used: a window of size
2 x 2 with a stride of 2 in both dimensions is applied over all convolutional features, and the
pooling layer keeps the maximum value from each window while discarding the remaining three
entries. The final Softmax layer computes a probability for each class, and the maximum
probability corresponds to the predicted class label.
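
A sketch of this baseline CNN in Keras is shown below; the input shape, filter counts, and kernel sizes are illustrative assumptions, and the stacked (µ, β) maps processed with 2D convolutions are one plausible reading of the described 3D convolution over the energy-map volume.

```python
from tensorflow import keras
from tensorflow.keras import layers

baseline_cnn = keras.Sequential([
    layers.Input(shape=(22, 32, 2)),                   # channels x freq bins x (mu, beta); assumed
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # keeps the max, discards the other 3 entries
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),              # fully connected (FC) layer
    layers.Dense(4, activation='softmax'),             # probabilities over the four MI classes
])
```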
Recurrent neural networks are reported to be a good choice for retaining information about
long-term events. However, RNNs can suffer from the vanishing or exploding gradient problem
during back-propagation as the dependency on previous samples increases [191]. EEG signals,
which measure neural activity, can be regarded as dynamic non-linear temporal series, and
recurrent networks such as LSTMs are an effective method of choice for retaining the temporal
information of EEG signals while mitigating the vanishing or exploding gradient problem. The
temporal stream of energy maps of discriminative features, obtained from the FFTEM module, is
fed to the LSTM network to learn the patterns of each class. An LSTM layer contains three gate
structures, i.e., the input gate, the forget gate, and the output gate. These gates determine
which chunk of prior information should be added to the main data flow and which information
should be discarded. At each gate, an activation function is applied to map the data through a
nonlinearity (Sigmoid). These functions are denoted as the input activation ($f_i$), the gate
activation ($f_g$), and the output activation ($f_o$), and they generate decision vectors. For
example, the input gate generates a decision vector $i_t$ from the prior hidden state $h_{t-1}$
and the current input $x_t$. The generation of $i_t$ can be formalized as:

$i_t = f_g(w_{hi} h_{t-1} + w_{xi} x_t + b_i)$    (2)

where $b_i$ is the bias and $w_{hi}$, $w_{xi}$ are the recurrent and input weight matrices,
respectively. $x_t$ and $h_{t-1}$ are also used to generate the input candidate information
$\tilde{C}_t$, as in:

$\tilde{C}_t = f_i(w_{hc} h_{t-1} + w_{xc} x_t + b_c)$    (3)

The forget gate generates the decision vector $f_t$, which determines how much of the prior
cell state $C_{t-1}$ is retained. $f_t$ can be formulated as:

$f_t = f_g(w_{hf} h_{t-1} + w_{xf} x_t + b_f)$    (4)

The vector $f_t$ is scaled between 0 and 1 by the Sigmoid function. The output state of the
LSTM cell is obtained from the output gate and can be formulated as:

$o_t = f_g(w_{ho} h_{t-1} + w_{xo} x_t + b_o)$    (5)

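The gate equations above can be traced in a few lines of NumPy; here $f_g$ is the Sigmoid, while the tanh choices for $f_i$ and $f_o$ and the cell/hidden-state updates follow the standard LSTM formulation rather than anything stated explicitly in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (2)-(5). `p` is a dict of weight
    matrices and biases; shapes and names are illustrative assumptions."""
    i_t = sigmoid(p['w_hi'] @ h_prev + p['w_xi'] @ x_t + p['b_i'])     # input gate, Eq. (2)
    c_cand = np.tanh(p['w_hc'] @ h_prev + p['w_xc'] @ x_t + p['b_c'])  # candidate, Eq. (3)
    f_t = sigmoid(p['w_hf'] @ h_prev + p['w_xf'] @ x_t + p['b_f'])     # forget gate, Eq. (4)
    o_t = sigmoid(p['w_ho'] @ h_prev + p['w_xo'] @ x_t + p['b_o'])     # output gate, Eq. (5)
    c_t = f_t * c_prev + i_t * c_cand     # new cell state (standard update)
    h_t = o_t * np.tanh(c_t)              # new hidden state (standard update)
    return h_t, c_t
```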
To obtain the best number of LSTM layers and of units per layer, we experimented with up to
three layers and with 32 to 256 memory cells and evaluated the models on unseen EEG data (the
results are summarized in Section 5.3). The model is trained using the fast and efficient Adam
optimizer, a mini-batch gradient descent variant, with categorical cross-entropy (CE) as the
loss function. The learning rate for Adam was set to 0.001, with decay rates of 0.9 and 0.999
for the first and second moments, respectively. We used a 10-fold cross-validation strategy for
evaluation and measured the mean kappa value (mean k).
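
A sketch of this training and evaluation loop is given below; the optimizer settings match the text, while the epoch count, the `build_model` factory, and the fold shuffling are assumptions.

```python
import numpy as np
from tensorflow import keras
from sklearn.model_selection import KFold
from sklearn.metrics import cohen_kappa_score

def mean_kappa_cv(build_model, X, y_onehot, n_splits=10, epochs=50):
    """10-fold cross-validation reporting the mean Cohen's kappa."""
    kappas = []
    for tr, te in KFold(n_splits=n_splits, shuffle=True).split(X):
        model = build_model()
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001,
                                                      beta_1=0.9, beta_2=0.999),
                      loss='categorical_crossentropy')
        model.fit(X[tr], y_onehot[tr], epochs=epochs, verbose=0)
        pred = model.predict(X[te]).argmax(axis=1)
        kappas.append(cohen_kappa_score(y_onehot[te].argmax(axis=1), pred))
    return float(np.mean(kappas))
```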
For a fair comparison, we also applied other classifiers to the obtained features (FFTEMs),
including Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and CNN. LDA and
SVM are binary classifiers; to use them for a multi-class problem, we applied the One-vs-Rest
(OVR) technique and selected the class that classifies the test data with the greatest margin,
without changing other parameters.
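
With scikit-learn, this one-vs-rest wrapping is a one-liner per classifier; the flattened per-trial feature vectors and the RBF kernel are assumptions.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# One feature vector per trial (flattened energy maps; layout assumed).
ovr_lda = OneVsRestClassifier(LinearDiscriminantAnalysis())
ovr_svm = OneVsRestClassifier(SVC(kernel='rbf', C=1.0))
# ovr_svm.fit(X_train, y_train); the class with the greatest decision margin wins.
```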

5.2.4 Dataset Configuration

In this study, we used BCI Competition IV dataset 2a [189], which consists of four-class motor
imagery signals: left hand, right hand, both feet, and tongue. The dataset was collected in a
controlled environment from 9 subjects on two different days, and 288 trials were conducted on
each day using 22 EEG channels and 3 EOG channels. The recording was cue-based: a fixation
cross was shown together with a beep, prompting the subject to imagine one of the four MI
actions (left hand, right hand, both feet, and tongue). There was a short break between the
recordings of each trial. Trials with artifacts were discarded to make the dataset less noisy,
and additional artifact-reduction corrections were performed by the dataset developers.
Training and evaluation sessions were recorded separately. Cohen's Kappa (k) is recommended for
evaluation by the BCI Competition IV committee and is therefore used to evaluate the proposed
model.
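
For reference, Cohen's kappa reported in the tables below corrects the raw accuracy $p_o$ for the chance agreement $p_e$:

$\kappa = \dfrac{p_o - p_e}{1 - p_e}$

For a balanced four-class problem, $p_e = 0.25$; under that assumption, a mean k of 0.64 corresponds to a raw accuracy of about 0.73.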

Table 5-1. Comparison of classifiers in terms of kappa coefficients on dataset 2a of BCI
Competition IV.

Subjects   LDA     SVM     CNN     LSTM (No. of Memory Cells)
                                   32      64      128     256
S1         0.61    0.64    0.70    0.68    0.70    0.71    0.68
S2         0.45    0.47    0.45    0.42    0.42    0.47    0.45
S3         0.67    0.56    0.76    0.63    0.69    0.84    0.81
S4         0.63    0.71    0.58    0.59    0.62    0.64    0.60
S5         0.77    0.72    0.69    0.48    0.51    0.56    0.52
S6         0.45    0.48    0.54    0.62    0.71    0.68    0.65
S7         0.49    0.55    0.63    0.53    0.55    0.59    0.61
S8         0.51    0.56    0.56    0.60    0.62    0.64    0.59
S9         0.55    0.53    0.59    0.61    0.58    0.63    0.66
Mean k     0.57    0.58    0.61    0.57    0.60    0.64    0.62

5.3 Results and Discussion

In this section, we present our experimental results for the different classifiers and compare
them with techniques from the literature.

5.3.1 Comparison of Classifiers

In this research, we used the LDA, SVM, and CNN classifiers alongside the proposed LSTM
classifier for multi-class motor imagery classification. A comparison of all classifiers is
summarized in Table 5-1. The LDA classifier is used in OVR mode, while the kernel and
soft-margin hyperparameters of the SVM are tuned to obtain its best results. It can be seen
that the LDA and SVM classifiers perform very close to each other, but the SVM surpasses LDA in
terms of mean k. The CNN classifier yields a mean k of 0.61 with a standard deviation of 0.095.
In contrast, the final proposed model (LSTM) yields a mean k of 0.64 with a standard deviation
of 0.109.

5.3.2 Influence of Memory Cells

Regarding memory cells, a large number can increase the model complexity, while a small number
can restrict the memory capacity. To find the optimal number of memory cells and LSTM layers, a
range of memory-cell counts was evaluated; the obtained results are summarized in Table 5-1.
The model performed best with a single layer and 128 LSTM memory cells, which is consistent
with the results obtained in [198] for ERP signal classification. More layers and a larger
number of LSTM memory cells improved training accuracy but led to strong overfitting and a
performance decrease on the validation sets. Fewer than 128 memory cells led to underfitting,
observed as a decrease in both training and validation accuracy.

5.3.3 Comparison with other state-of-the-art Methods

The performance of the proposed method is compared with state-of-the-art methods in Table 5-2.
Kappa values are computed for each subject, and mean k values are computed for each method. In
[197], researchers introduced Separable Common Spatio-Spectral Patterns (SCSSP) for multi-class
MI data and achieved 0.44 mean k, which is not a promising outcome compared with the other
reported methods. In [195], researchers achieved 0.58 mean k by combining two classifiers, the
first (KNN) for the fusion of new information and the second (SVM) for classification, in a
transductive and inductive learning scheme; they noted that two simultaneous classifiers can
increase the computational complexity. In [196], researchers tried different classification
schemes, i.e., Divide-and-Conquer (DC), Pair-Wise (PW), and OVR, and achieved a maximum mean k
of 0.57 with the OVR and PW schemes.
In [193], researchers achieved 0.61 mean k by using the Local Temporal Common Spatial Pattern
(LTCSP) for feature extraction and LDA for classification. In [192], researchers achieved 0.62
mean k by using Non-Negative Matrix Factorization (NNMF) for feature extraction and an SVM as
the classifier. In contrast, our proposed method applies an LSTM on top of CSP and FFTEM
features and outperforms the other techniques. Compared with [192-194], the proposed model is
also more robust to noise, as shown by the lower variance across subjects, and can therefore be
used as a generalized method regardless of subject-specific limitations.

Table 5-2. Comparison of classification accuracy (kappa coefficients) on dataset 2a of BCI
Competition IV.

Subjects   LSTM   NNMF   CNN    LTCSP  FBCSP  TI     PW     OVR    DC     SCSSP
S1         0.71   0.70   0.70   0.74   0.68   0.88   0.78   0.68   0.71   0.62
S2         0.47   0.45   0.45   0.43   0.43   0.22   0.40   0.42   0.37   0.28
S3         0.84   0.71   0.76   0.80   0.70   0.88   0.75   0.75   0.66   0.66
S4         0.64   0.96   0.58   0.57   0.61   0.39   0.52   0.48   0.41   0.33
S5         0.56   0.60   0.71   0.31   0.80   0.53   0.41   0.40   0.40   0.14
S6         0.68   0.43   0.51   0.38   0.42   0.33   0.19   0.27   0.26   0.25
S7         0.59   0.55   0.63   0.76   0.55   0.38   0.80   0.77   0.73   0.41
S8         0.64   0.61   0.56   0.75   0.55   0.85   0.74   0.75   0.58   0.60
S9         0.63   0.57   0.60   0.80   0.56   0.81   0.54   0.61   0.50   0.66
Mean k     0.64   0.62   0.61   0.61   0.59   0.58   0.57   0.57   0.52   0.44
Std. dev.  0.11   0.19   0.10   0.20   0.12   0.27   0.21   0.18   0.17   0.19

5.4 Conclusion

A novel sequence-to-sequence learning method using LSTM has been proposed for multi-class
motor imagery classification. In order to extract discriminative features, the CSP and FFTEM
algorithms are applied to generate energy maps over frequencies and channels. A Long Short-Term
Memory based Convolutional Neural Network model is proposed as the classifier for the
multi-class EEG data. To enable sequence learning, temporal segmentation is applied to break
each trial into sequential chunks.
In comparison with existing techniques, the proposed model can leverage the spatial and
spectral information encoded in EEG data to classify EEG signals more accurately. We calculated
FFTEM features for each trial of each subject after passing each trial through CSP filters in
OVR mode, so that the variance of each class is maximized against the remaining classes.
Extensive experimental evaluation of each module has been carried out on a publicly available
benchmark dataset and yielded state-of-the-art results. The experimental results demonstrate
that the proposed method may be useful for high-performance motor imagery classification
applications.

6 Conclusions

Deep learning-based analysis has been comprehensively carried out and implemented in two
different domains. For fundoscopic images, CPGAN was used to segment retinal blood vessels. A
large number of experiments were carried out to evaluate the proposed model in terms of
architecture, patch size, objective function, auxiliary loss term, and hyperparameters. The
proposed retinal vessel segmentation model was trained and tested on the DRIVE, STARE, and
CHASE_DB1 datasets, which are all publicly accessible and manually annotated. The same protocol
was used to divide all three datasets into training, testing, and validation sets for the
assessment and fair comparison of the proposed model with state-of-the-art techniques. The
quantitative evaluation of the proposed model is based on precision, sensitivity, and
specificity, while the qualitative evaluation is based on the area under the receiver operating
characteristic curve. The DRIVE dataset comprises more thin vessels than the STARE dataset;
CPGAN detected both thick and thin vessels when tested on the STARE dataset although it was
trained on the DRIVE dataset. Even though thin vessels are not present in the STARE dataset's
ground truths, which were provided by the first human annotator, CPGAN successfully detects the
majority of thin vessels that are, in fact, vessels. This study is competitive with all other
models, showing that the proposed model performs well on both vascular and non-vascular pixel
segmentation overall.

The EEG motor imagery dataset used in this analysis comes from the BCI Competition. This
dataset contains four motor imagery classes: tongue, both feet, left hand, and right hand. The
first session is used for training, while the second is used for validation. Bias field
distortion affects fundoscopic images by causing the intensity of several regions to differ
across the image. We used contrast enhancement and intensity normalization to compensate for
this. In low-contrast areas, contrast enhancement helps to distinguish between foreground and
background.

The chosen frequency bands (µ, β) and their ranges can have an impact on classification
results. We performed a large number of experiments with all plausible combinations of µ and β
bands to investigate the influence of the µ and β ranges.

According to this analysis, both classifiers perform best on the selected frequency ranges; in
other µ and β ranges, there is a substantial drop in the mean kappa value. Since frequency
bands are specific to the individual subject, selecting suitable bands is a critical activity.

This approach takes into account the average influence of all subjects in determining the most
appropriate and applicable frequency bands.

This study proposed an LSTM-based CNN classifier together with a feature extraction pipeline.

For four-class motor imagery EEG signal classification, the proposed model was evaluated
against the baseline CNN model without the LSTM layer as well as other traditional classifiers.
Feature learning schemes were found to be computationally more complex than the preprocessing
and post-processing techniques in segmentation pipelines, and thus dominate the overall
computational complexity.

The computational complexity of a deep learning model is, on the other hand, a crucial
constraint on the framework for real-time applications.

By learning the complicated patterns in multi-channel signals, this study proposes a new method
for multi-class motor imagery EEG signal classification.

CSP filters are used to extract spatial features, FFTEM is used to extract spectral features,
and the spatio-temporal features are transformed into 2D images in the proposed method. The
best settings for the initialization, feature extractor, and classifier modules were discovered
through a series of experiments.

In contrast to prevailing state-of-the-art approaches, the proposed technique can learn and
distinguish complex patterns more efficiently.

A powerful deep learning model has been offered to support retinal blood vessel segmentation in
fundoscopic images. It is beneficial for segmentation performance to train the generator
network to learn minor variations in thin vessels and to let the discriminator acquire the
ability to discriminate between vascular and non-vascular pixels. The proposed model is
competitive with present state-of-the-art methods, according to results on three publicly
accessible datasets. Rather than computing the average error on a complete fundoscopic image,
the CPGAN computes the average error on small patches. The use of an additional loss term in
the main objective function provides leverage and increases the CPGAN's effectiveness. The
model is capable of probing different patch sizes in order to better understand the effect of
patch-based discriminators on segmentation efficiency.

References

[1] Son, J., Park, S. J., & Jung, K. H. (2017). Retinal vessel segmentation in fundoscopic
images with generative adversarial networks. arXiv:1706.09318.
[2] Wang, C., Zhao, Z., Ren, Q., Xu, Y., & Yu, Y. (2019). Dense U-net based on
patch-based learning for retinal vessel segmentation. Entropy, 21(2), 168.
[3] Tuli, S., Dasgupta, I., Grant, E., & Griffiths, T. L. (2021). Are Convolutional Neural
Networks or Transformers more like human vision?. arXiv:2105.07197.
[4] Van Aken, B., Winter, B., Löser, A., & Gers, F. A. (2019, November). How does bert
answer questions? a layer-wise analysis of transformer representations. In Proceedings
of the 28th ACM International Conference on Information and Knowledge
Management (pp. 1823-1832).
[5] Hui, F., Nguyen, C. T., He, Z., Vingrys, A. J., Gurrell, R., Fish, R. L., & Bui, B. V.
(2017). Retinal and cortical blood flow dynamics following systemic blood-neural
barrier disruption. Frontiers in neuroscience, 11, 568.
[6] Maamari, R. N., Keenan, J. D., Fletcher, D. A., & Margolis, T. P. (2014). A mobile
phone-based retinal camera for portable wide field imaging. British Journal of
Ophthalmology, 98(4), 438-441.
[7] Kanski, J. J., Bowling, B., Nischal, K., & Burk, A. (2012). Klinische Ophthalmologie.
München: Elsevier, Urban & Fischer.
[8] Kalitzeos, A. A., Lip, G. Y., & Heitmar, R. (2013). Retinal vessel tortuosity measures
and their applications. Experimental eye research, 106, 40-46.
[9] Fraz, M. M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A. R., Owen, C.
G., & Barman, S. A. (2012). Blood vessel segmentation methodologies in retinal
images–a survey. Computer methods and programs in biomedicine, 108(1), 407-433.
[10] Khan, T. M., Alhussein, M., Aurangzeb, K., Arsalan, M., Naqvi, S. S., & Nawaz, S. J.
(2020). Residual connection-based encoder decoder network (RCED-Net) for retinal
vessel segmentation. IEEE Access, 8, 131257-131272.

[11] Giusti, A., Cireşan, D. C., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2013,
September). Fast image scanning with deep max-pooling convolutional neural networks.
In 2013 IEEE International Conference on Image Processing (pp. 4034-4038). IEEE.
[12] Sun, K. (2011). Development of segmentation methods for vascular angiogram. IETE
Technical Review, 28(5), 392-399.
[13] Samuel, P. M., & Veeramalai, T. (2020). Review on retinal blood vessel
segmentation-an algorithmic perspective. International Journal of Biomedical
Engineering and Technology, 34(1), 75-105.
[14] Rest, A., Scolnic, D., Foley, R. J., Huber, M. E., Chornock, R., Narayan, G., & Waters,
C. (2014). Cosmological Constraints from Measurements of Type Ia Supernovae
discovered during the first 1.5 yr of the Pan-STARRS1 Survey. The Astrophysical
Journal, 795(1), 44.
[15] Fraz, M. M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A. R., Owen, C.
G., & Barman, S. A. (2012). Blood vessel segmentation methodologies in retinal
images–a survey. Computer methods and programs in biomedicine, 108(1), 407-433.
[16] Kitrungrotsakul, T., Han, X. H., Iwamoto, Y., Foruzan, A. H., Lin, L., & Chen, Y. W.
(2017, March). Robust hepatic vessel segmentation using multi deep convolution
network. In Medical Imaging 2017: Biomedical Applications in Molecular, Structural,
and Functional Imaging (Vol. 10137, p. 1013711). International Society for Optics and
Photonics.
[17] Faust, A. (2014, April). The history of Tel ‘Eton following the results of the first seven
seasons of excavations (2006–2012). In Proceedings of the 8th International Congress
on the Archaeology of the Ancient Near East (ICAANE) (Vol. 30, pp. 585-604).
[18] Wen, W., Wu, C., Wang, Y., Chen, Y., & Li, H. (2016). Learning structured sparsity in
deep neural networks. Advances in neural information processing systems, 29,
2074-2082.
[19] Ciecholewski, M., & Kassjański, M. (2021). Computational Methods for Liver Vessel
Segmentation in Medical Imaging: A Review. Sensors, 21(6), 2027.

[20] Oliveira, A., Pereira, S., & Silva, C. A. (2017, February). Augmenting data when
training a CNN for retinal vessel segmentation: How to warp?. In 2017 IEEE 5th
Portuguese Meeting on Bioengineering (ENBENG) (pp. 1-4). IEEE.
[21] Saba, T., Bokhari, S. T. F., Sharif, M., Yasmin, M., & Raza, M. (2018). Fundus image
classification methods for the detection of glaucoma: A review. Microscopy research
and technique, 81(10), 1105-1121.
[22] Khawaja, A., Khan, T. M., Khan, M. A., & Nawaz, S. J. (2019). A multi-scale
directional line detector for retinal vessel segmentation. Sensors, 19(22), 4949.
[23] Fu, H., Xu, Y., Wong, D. W. K., & Liu, J. (2016, April). Retinal vessel segmentation
via deep learning network and fully-connected conditional random fields. In 2016 IEEE
13th international symposium on biomedical imaging (ISBI) (pp. 698-701). IEEE.
[24] Fu, H., Xu, Y., Lin, S., Wong, D. W. K., & Liu, J. (2016, October). Deepvessel:
Retinal vessel segmentation via deep learning and conditional random field.
In International conference on medical image computing and computer-assisted
intervention (pp. 132-139). Springer, Cham.
[25] Xu, P., Liao, L., Zhou, C., Xue, R., & Fu, W. (2017, April). Simulation research on the
process of large scale ship plane segmentation intelligent workshop. In AIP Conference
Proceedings (Vol. 1834, No. 1, p. 040024). AIP Publishing LLC.
[26] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for
semantic segmentation. In Proceedings of the IEEE conference on computer vision and
pattern recognition (pp. 3431-3440).
[27] Boudegga, H., Elloumi, Y., Akil, M., Bedoui, M. H., Kachouri, R., & Abdallah, A. B.
(2021). Fast and efficient retinal blood vessel segmentation method based on deep
learning network. Computerized Medical Imaging and Graphics, 90, 101902.
[28] Shin, S. Y., Lee, S., Yun, I. D., & Lee, K. M. (2019). Deep vessel segmentation by
learning graphical connectivity. Medical image analysis, 58, 101556.
[29] Zhao, H., Li, H., & Cheng, L. (2020). Improving retinal vessel segmentation with joint
local loss by matting. Pattern Recognition, 98, 107068.

[30] Nishi, H., Oishi, N., Ishii, A., Ono, I., Ogura, T., Sunohara, T., ... & Miyamoto, S.
(2020). Deep Learning–Derived High-Level Neuroimaging Features Predict Clinical
Outcomes for Large Vessel Occlusion. Stroke, 51(5), 1484-1492.
[31] Yuan, H., & He, B. (2014). Brain–computer interfaces using sensorimotor rhythms:
current state and future perspectives. IEEE Transactions on Biomedical
Engineering, 61(5), 1425-1435.
[32] Jiang, Z., Wang, H., Li, Z., Grimm, M., Zhou, M., Eck, U., ... & Navab, N. (2021).
Motion-Aware Robotic 3D Ultrasound. arXiv preprint arXiv:2107.05998.
[33] Blankertz, B., Lemm, S., Treder, M., Haufe, S., & Müller, K. R. (2011). Single-trial
analysis and classification of ERP components—a tutorial. NeuroImage, 56(2),
814-825.
[34] Ma, T., Li, H., Deng, L., Yang, H., Lv, X., Li, P., ... & Xu, P. (2017). The hybrid BCI
system for movement control by combining motor imagery and moving onset visual
evoked potential. Journal of neural engineering, 14(2), 026015.
[35] Long, J., Li, Y., Wang, H., Yu, T., Pan, J., & Li, F. (2012). A hybrid brain computer
interface to control the direction and speed of a simulated or real wheelchair. IEEE
Transactions on Neural Systems and Rehabilitation Engineering, 20(5), 720-729.
[36] LaFleur, K., Cassady, K., Doud, A., Shades, K., Rogin, E., & He, B. (2013).
Quadcopter control in three-dimensional space using a noninvasive motor
imagery-based brain–computer interface. Journal of neural engineering, 10(4), 046003.
[37] Shen, Y., Lu, H., & Jia, J. (2017, September). Classification of motor imagery EEG
signals with deep learning models. In International Conference on Intelligent Science
and Big Data Engineering (pp. 181-190). Springer, Cham.
[38] Stewart, A. X., Nuthmann, A., & Sanguinetti, G. (2014). Single-trial classification of
EEG in a visual object task using ICA and machine learning. Journal of neuroscience
methods, 228, 1-14.

[39] Suk, H. I., & Lee, S. W. (2012). A novel Bayesian framework for discriminative
feature extraction in brain-computer interfaces. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 35(2), 286-299.
[40] Lemm, S., Blankertz, B., Dickhaus, T., & Müller, K. R. (2011). Introduction to
machine learning for brain imaging. Neuroimage, 56(2), 387-399.
[41] Li, J., Struzik, Z., Zhang, L., & Cichocki, A. (2015). Feature learning from incomplete
EEG with denoising autoencoder. Neurocomputing, 165, 23-31.
[42] An, X., Kuang, D., Guo, X., Zhao, Y., & He, L. (2014, August). A deep learning
method for classification of EEG data based on motor imagery. In International
Conference on Intelligent Computing (pp. 203-210). Springer, Cham.
[43] Wang, Z., Lyu, S., Schalk, G., & Ji, Q. (2013, June). Deep feature learning using target
priors with applications in ECoG signal decoding for BCI. In Twenty-Third
International Joint Conference on Artificial Intelligence.
[44] Freudenburg, Z. V., Ramsey, N. F., Wronkiewicz, M., Smart, W. D., Pless, R., &
Leuthardt, E. C. (2011). Real-time naive learning of neural correlates in ECoG
electrophysiology. International Journal of Machine Learning and Computing, 1(3),
269.
[45] Jirayucharoensak, S., Pan-Ngum, S., & Israsena, P. (2014). EEG-based emotion
recognition using deep learning network with principal component based covariate shift
adaptation. The Scientific World Journal, 2014.
[46] Ahmed, S., Merino, L. M., Mao, Z., Meng, J., Robbins, K., & Huang, Y. (2013,
December). A deep learning method for classification of images RSVP events with
EEG data. In 2013 IEEE Global Conference on Signal and Information Processing (pp.
33-36). IEEE.
[47] Yang, H., Sakhavi, S., Ang, K. K., & Guan, C. (2015, August). On the use of
convolutional neural networks and augmented CSP features for multi-class motor
imagery of EEG signals classification. In 2015 37th Annual International Conference
of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 2620-2623).
IEEE.
[48] Wulsin, D. F., Gupta, J. R., Mani, R., Blanco, J. A., & Litt, B. (2011). Modeling
electroencephalography waveforms with semi-supervised deep belief nets: fast
classification and anomaly measurement. Journal of neural engineering, 8(3), 036015.
[49] Craik, A., He, Y., & Contreras-Vidal, J. L. (2019). Deep learning for
electroencephalogram (EEG) classification tasks: a review. Journal of neural
engineering, 16(3), 031001.
[50] Thomas, K. P., & Vinod, A. P. (2017). Toward EEG-based biometric systems: the
great potential of brain-wave-based biometrics. IEEE Systems, Man, and Cybernetics
Magazine, 3(4), 6-15.
[51] Xu, M., Yao, J., Zhang, Z., Li, R., Yang, B., Li, C., ... & Zhang, J. (2020). Learning
EEG topographical representation for classification via convolutional neural
network. Pattern Recognition, 105, 107390.
[52] Lu, N., Li, T., Ren, X., & Miao, H. (2016). A deep learning scheme for motor imagery
classification based on restricted Boltzmann machines. IEEE transactions on neural
systems and rehabilitation engineering, 25(6), 566-576.
[53] Lin, J. S., & Shihb, R. (2018). A motor-imagery BCI system based on deep learning
networks and its applications. Evolving BCI Therapy-Engaging Brain State
Dynamics, 75(5).
[54] Wu, X., Zhou, B., Lv, Z., & Zhang, C. (2019). To explore the potentials of
independent component analysis in brain-computer interface of motor imagery. IEEE
journal of biomedical and health informatics, 24(3), 775-787.
[55] Park, Y., & Chung, W. (2019). Frequency-optimized local region common spatial
pattern approach for motor imagery classification. IEEE Transactions on Neural
Systems and Rehabilitation Engineering, 27(7), 1378-1388.

[56] Wu, H., Niu, Y., Li, F., Li, Y., Fu, B., Shi, G., & Dong, M. (2019). A parallel
multiscale filter bank convolutional neural networks for motor imagery EEG
classification. Frontiers in neuroscience, 13, 1275.
[57] Chen, C., Zhu, W., Steibel, J., Siegford, J., Wurtz, K., Han, J., & Norton, T. (2020).
Recognition of aggressive episodes of pigs based on convolutional neural network and
long short-term memory. Computers and Electronics in Agriculture, 169, 105166.
[58] Bashivan, P., Rish, I., Yeasin, M., & Codella, N. (2015). Learning representations from
EEG with deep recurrent-convolutional neural networks. arXiv:1511.06448.
[59] Rajan, S. P. (2020). Recognition of Cardiovascular Diseases through Retinal Images
Using Optic Cup to Optic Disc Ratio. Pattern Recognition and Image Analysis, 30(2),
256-263.
[60] Fraz, M.M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A.R., Owen, C.G.,
et al.: ‘Blood vessel segmentation methodologies in retinal images–a survey’, Computer
methods and programs in biomedicine, 2012, 108, (1), pp. 407–433
[61] Wong, T.Y., Knudtson, M.D., Klein, R., Klein, B.E., Meuer, S.M., Hubbard, L.D.:
‘Computer-assisted measurement of retinal vessel diameters in the beaver dam eye
study: methodology, correlation between eyes, and effect of refractive errors’,
Ophthalmology, 2004, 111, (6), pp. 1183–1190
[62] Lindeberg, T.: ‘Scale-space for discrete signals’, IEEE transactions on pattern analysis
and machine intelligence, 1990, 12, (3), pp. 234–254
[63] Zhang, B., Zhang, L., Zhang, L., Karray, F.: ‘Retinal vessel extraction by matched
filter with first-order derivative of gaussian’, Computers in biology and medicine, 2010,
40, (4), pp. 438–445
[64] Fraz, M.M., Barman, S.A., Remagnino, P., Hoppe, A., Basit, A., Uyyanonvara, B., et
al.: ‘An approach to localize the retinal blood vessels using bit planes and centerline
detection’, Computer methods and programs in biomedicine, 2012, 108, (2), pp. 600–
616

[65] Azzopardi, G., Strisciuglio, N., Vento, M., Petkov, N.: ‘Trainable cosfire filters for
vessel delineation with application to retinal images’, Medical image analysis, 2015, 19,
(1), pp. 46–57
[66] Yin, B., Li, H., Sheng, B., Hou, X., Chen, Y.,Wu,W., et al.: ‘Vessel extraction from
non-fluorescein fundus images using orientation-aware detector’, Medical image
analysis, 2015, 26, (1), pp. 232–242
[67] Zhang, J., Dashtbozorg, B., Bekkers, E., Pluim, J.P., Duits, R., ter Haar Romeny, B.M.:
‘Robust retinal vessel segmentation via locally adaptive derivative frames in orientation
scores’, IEEE transactions on medical imaging, 2016, 35, (12), pp. 2631–2644
[68] Karn, P.K., Biswal, B., Samantaray, S.R.: ‘Robust retinal blood vessel segmentation
using hybrid active contour model’, IET Image Processing, 2018, 13, (3), pp. 440–450
[69] Ricci, E., Perfetti, R.: ‘Retinal blood vessel segmentation using line operators and
support vector classification’, IEEE transactions on medical imaging, 2007, 26, (10), pp.
1357–1365
[70] Sathananthavathi, V., Indumathi, G.: ‘Bat algorithm inspired retinal blood vessel
segmentation’, IET Image Processing, 2018, 12, (11), pp. 2075–2083
[71] Galdran, A., Meyer, M., Costa, P., Campilho, A., et al. ‘Uncertainty-aware artery/vein
classification on retinal images’. In: 2019 IEEE 16th International Symposium on
Biomedical Imaging (ISBI 2019). (IEEE, 2019. pp. 556–560
[72] Xue, L.y., Lin, J.w., Cao, X.r., Zheng, S.h., Yu, L.: ‘A saliency and gaussian net model
for retinal vessel segmentation’, Frontiers of Information Technology & Electronic
Engineering, 2019, 20, (8), pp. 1075–1086
[73] Yang, Y., Shao, F., Fu, Z., Fu, R.: ‘Discriminative dictionary learning for retinal vessel
segmentation using fusion of multiple features’, Signal, Image and Video Processing,
2019, pp. 1–9
[74] Zhang, Y., Chung, A.C. ‘Deep supervision with additional labels for retinal vessel
segmentation task’. In: International Conference on Medical Image Computing and
Computer-Assisted Intervention. (Springer, 2018. pp. 83–91

[75] Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K.: ‘Recurrent residual
convolutional neural network based on u-net (r2u-net) for medical image segmentation’,
arXiv preprint arXiv:180206955, 2018,
[76] Jin, Q., Meng, Z., Pham, T.D., Chen, Q., Wei, L., Su, R.: ‘Dunet: A deformable
network for retinal vessel segmentation’, Knowledge-Based Systems, 2019, 178, pp.
149–162
[77] Wang, C., Zhao, Z., Ren, Q., Xu, Y., Yu, Y.: ‘Dense u-net based on patch-based
learning for retinal vessel segmentation’, Entropy, 2019, 21, (2), pp. 168
[78] Laibacher, T., Weyde, T., Jalali, S. ‘M2u-net: Effective and efficient retinal vessel
segmentation for real-world applications’. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops. (, 2019. pp. 0–0
[79] Melinščak, M., Prentašić, P., Lončarić, S. ‘Retinal vessel segmentation using deep
neural networks’. In: VISAPP 2015 (10th International Conference on Computer Vision
Theory and Applications). (, 2015.
[80] Li, Q., Feng, B., Xie, L., Liang, P., Zhang, H.,Wang, T.: ‘A cross-modality learning
approach for vessel segmentation in retinal images’, IEEE transactions on medical
imaging, 2015, 35, (1), pp. 109–118
[81] Li, Q., Feng, B., Xie, L., Liang, P., Zhang, H.,Wang, T.: ‘A cross-modality learning
approach for vessel segmentation in retinal images.’, IEEE Trans Med Imaging, 2016,
35, (1), pp. 109–118
[82] Liskowski, P., Krawiec, K.: ‘Segmenting retinal blood vessels with deep neural
networks’, IEEE transactions on medical imaging, 2016, 35, (11), pp. 2369–2380
[83] Fu, H., Xu, Y., Wong, D.W.K., Liu, J. ‘Retinal vessel segmentation via deeplearning
network and fully-connected conditional random fields’. In: Biomedical Imaging (ISBI),
2016 IEEE 13th International Symposium on. (IEEE, 2016. pp. 698–701
[84] Fu, H., Xu, Y., Lin, S., Wong, D.W.K., Liu, J. ‘Deepvessel: Retinal vessel
segmentation via deep learning and conditional random field’. In: International
Conference on Medical Image Computing and Computer-Assisted Intervention.
(Springer, 2016. pp. 132–139
[85] Maninis, K.K., Pont-Tuset, J., Arbeláez, P., Van Gool, L. ‘Deep retinal image
understanding’. In: International Conference on Medical Image Computing and
Computer-Assisted Intervention. (Springer, 2016. pp. 140–148
[86] Yan, Z., Yang, X., Cheng, K.T.: ‘A three-stage deep learning model for accurate
retinal vessel segmentation’, IEEE Journal of Biomedical and Health Informatics, 2018,
pp. 1–1
[87] Oliveira, A.F.M., Pereira, S.R.M., Silva, C.A.B.: ‘Retinal vessel segmentation based
on fully convolutional neural networks’, Expert Systems with Applications, 2018
[88] Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B.:
‘Ridge-based vessel segmentation in color images of the retina’, IEEE transactions on
medical imaging, 2004, 23, (4), pp. 501–509
[89] Hoover, A., Kouznetsova, V., Goldbaum, M.: ‘Locating blood vessels in retinal images
by piecewise threshold probing of a matched filter response’, IEEE Transactions on
Medical imaging, 2000, 19, (3), pp. 203–210
[90] Owen, C.G., Rudnicka, A.R., Mullen, R., Barman, S.A., Monekosso, D., Whincup,
P.H., et al.: ‘Measuring retinal vessel tortuosity in 10-year-old children: validation of the
computer-assisted image analysis of the retina (caiar) program’, Investigative
ophthalmology & visual science, 2009, 50, (5), pp. 2004–2010
[91] Zhou, L., Zhao, Y., Yang, J., Yu, Q., Xu, X.: ‘Deep multiple instance learning for
automatic detection of diabetic retinopathy in retinal images’, IET Image Processing,
2017, 12, (4), pp. 563–571
[92] Thangaraj, S., Periyasamy, V., Balaji, R.: ‘Retinal vessel segmentation using neural
network’, IET Image Processing, 2017, 12, (5), pp. 669–678
[93] Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A. ‘Image-to-image translation with conditional
adversarial networks’. In: Proceedings of the IEEE conference on computer vision and
pattern recognition. (, 2017. pp. 1125–1134

[94] Soares, J.V., Leandro, J.J., Cesar, R.M., Jelinek, H.F., Cree, M.J.: ‘Retinal vessel
segmentation using the 2-d gabor wavelet and supervised classification’, IEEE
Transactions on medical Imaging, 2006, 25, (9), pp. 1214–1222
[95] Marín, D., Aquino, A., Gegúndez-Arias, M.E., Bravo, J.M.: ‘A new supervised method
for blood vessel segmentation in retinal images by using gray-level and moment
invariants-based features’, IEEE transactions on medical imaging, 2011, 30, (1), pp. 146
[96] Fraz, M.M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A.R., Owen, C.G.,
et al.: ‘An ensemble classification-based approach applied to retinal blood vessel
segmentation’, IEEE Transactions on Biomedical Engineering, 2012, 59, (9), pp. 2538–
2548
[97] Roychowdhury, S., Koozekanani, D.D., Parhi, K.K.: ‘Iterative vessel segmentation of
fundus images’, IEEE Transactions on Biomedical Engineering, 2015, 62, (7), pp.
1738–1749
[98] Khan, M.A., Khan, T.M., Bailey, D., Soomro, T.A.: ‘A generalized multi-scale
linedetection method to boost retinal vessel segmentation sensitivity’, Pattern Analysis
and Applications, 2019, 22, (3), pp. 1177–1196
[99] Khan, M.A., Khan, T.M., Soomro, T.A., Mir, N., Gao, J.: ‘Boosting sensitivity of a
retinal vessel segmentation algorithm’, Pattern Analysis and Applications, 2019, 22, (2),
pp. 583–599
[100] You, X., Peng, Q., Yuan, Y., Cheung, Y.m., Lei, J.: ‘Segmentation of retinal blood
vessels using the radial projection and semi-supervised approach’, Pattern Recognition,
2011, 44, (10-11), pp. 2314–2324
[101] Orlando, J.I., Prokofyeva, E., Blaschko, M.B.: ‘A discriminatively trained fully
connected conditional random field model for blood vessel segmentation in fundus
images’, IEEE Transactions on Biomedical Engineering, 2017, 64, (1), pp. 16–27
[102] Dasgupta, A., Singh, S. ‘A fully convolutional neural network based structured
prediction approach towards the retinal vessel segmentation’. In: Biomedical Imaging
(ISBI 2017), 2017 IEEE 14th International Symposium on. (IEEE, 2017. pp. 248–251

[103] Yan, Z., Yang, X., Cheng, K.T.: ‘A skeletal similarity metric for quality evaluation of
retinal vessel segmentation’, IEEE transactions on medical imaging, 2018, 37, (4), pp.
1045–10.
[104] Li Y, Long J, Yu T, Yu Z, Wang C, Zhang H, et al. An EEGbased BCI system for
2-D cursor control by combining mu/ beta rhythm and p300 potential. IEEE Trans
Biomed Eng 2010; 57(10):2495–505.
[105] Long J, Li Y, Wang H, Yu T, Pan J, Li F. A hybrid braincomputer interface to
control the direction and speed of a simulated or real wheelchair. IEEE Trans Neural
Syst Rehabil Eng 2012; 20(5):720–9.
[106] LaFleur K, Cassady K, Doud A, Shades K, Rogin E, He B. Quadcopter control in
three-dimensional space using a noninvasive motor imagery-based brain–computer
interface. J Neural Eng 2013; 10(4):046003.
[107] Lotte F, Congedo M, Lécuyer A, Lamarche F, Arnaldi B. A review of
classification algorithms for EEG-based brain–computer interfaces. J Neural Eng 2007;
4(2):R1.
[108] Suk H-I, Lee S-W. A novel Bayesian framework for discriminative feature
extraction in brain–computer interfaces. IEEE Trans Pattern Anal Mach Intell 2012; 35
(2):286–99.
[109] Lemm S, Blankertz B, Dickhaus T, Müller K-R. Introduction to machine
learning for brain imaging. Neuroimage 2011; 56(2):387–99.
[110] Asensio-Cubero J, Gan JQ, Palaniappan R. Extracting optimal tempo-spatial
features using local discriminant bases and common spatial patterns for brain computer
interfacing. Biomed Signal Process Control 2013; 8(6):772–8.
[111] Kam T-E, Suk H-I, Lee S-W. Non-homogeneous spatial filter optimization for
electroencephalogram (EEG)-based motor imagery classification. Neurocomputing
2013; 108:58–68.

[112] Ghaheri H, Ahmadyfard A. Extracting common spatial patterns from EEG time
segments for classifying motor imagery classes in a brain computer interface (BCI),
Scientia Iranica. Trans D Comput Sci Eng Electr 2013; 20(6):2061.
[113] Metlická M. EEG signal pattern matching on GPU. Technical University Ostrava;
2013.
[114] Stewart AX, Nuthmann A, Sanguinetti G. Single-trial classification of EEG in a
visual object task using ICA and machine learning. J Neurosci Methods 2014; 228:1–14.
[115] Nicolas-Alonso LF, Corralejo R, Gomez-Pilar J, Álvarez D, Hornero R. Adaptive
stacked generalization for multiclass motor imagery-based brain computer interfaces.
IEEE Trans Neural Syst Rehabil Eng 2015; 23(4):702–12.
[116] Aghaei AS, Mahanta MS, Plataniotis KN. Separable common spatio-spectral
patterns for motor imagery BCI systems. IEEE Trans Biomed Eng 2015; 63(1):15–29.
[117] Raza H, Cecotti H, Prasad G. A combination of transductive and inductive
learning for handling non-stationarities in motor imagery classification. International
Joint Conference on Neural Networks (IJCNN) 2016; 763–70.
[118] Zanini P, Congedo M, Jutten C, Said S, Berthoumieu Y. Transfer learning: a
Riemannian geometry framework with applications to brain–computer interfaces. IEEE
Trans Biomed Eng 2017; 65(5):1107–16.
[119] Hadsund JT, Leerskov KK. Feasibility of using error-related potentials as an
appropriate method for adaptation in a brain–computer interface.
[120] Abbas W, Khan NA. FBCSP-based multi-class motor imagery classification using
BP and TDP features. 40th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC) 2018; 215–8.
[121] Abbas W, Khan NA. DeepMI: deep learning for multiclass motor imagery
classification. 40th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC) 2018; 219–22.

[122] Sakhavi S, Guan C, Yan S. Learning temporal information for brain–computer
interface using convolutional neural networks. IEEE Trans Neural Netw Learn Syst
2018; 29 (11):5619–29.
[123] Zhang R, Zong Q, Dou L, Zhao X. A novel hybrid deep learning scheme for
four-class motor imagery classification. J Neural Eng 2019; 16(6):066004.
[124] Vaughan TM, Heetderks WJ, Trejo LJ, Rymer WZ, Weinrich M, Moore MM, et al.
Brain–computer interface technology: a review of the second international meeting.
IEEE Trans Neural Syst Rehabil Eng 2003; 11(2):94–109.
[125] Polat K, Günes S. Classification of epileptiform EEG using a hybrid
system based on decision tree classifier and fast Fourier transform. Appl Math Comput
2007; 187 (2):1017–26.
[126] Miao M, Zeng H, Wang A, Zhao C, Liu F. Discriminative
spatial-frequency-temporal feature extraction and classification of motor imagery EEG:
A sparse regression and weighted Naïve Bayesian classifier-based approach.
J Neurosci Methods 2017; 278:13–24.
[127] Ang KK, Chin ZY, Wang C, Guan C, Zhang H. Filter bank common spatial
pattern algorithm on BCI competition iv datasets 2A and 2B. Front Neurosci 2012; 6:39.
[128] Cecotti H, Graeser A. Convolutional neural network with embedded Fourier
transform for EEG classification. 19th International Conference on Pattern Recognition
2008; 1–4.
[129] Bevilacqua V, Tattoli G, Buongiorno D, Loconsole C, Leonardis D, Barsotti M, et
al. A novel BCI-SSVEP based approach for control of walking in virtual environment
using a convolutional neural network. International Joint Conference on Neural
Networks (IJCNN) 2014; 4121–8.
[130] Cecotti H, Graser A. Convolutional neural networks for p300 detection with
application to brain–computer interfaces. IEEE Trans Pattern Anal Mach Intell 2010;
33(3):433–45.

[131] Helal MA, Eldawlatly S, Taher M. Using autoencoders for feature enhancement in
motor imagery brain–computer interfaces. 13th IASTED International Conference on
Biomedical Engineering (BioMed) 2017; 89–93.
[132] Zheng W-L, Zhu J-Y, Peng Y, Lu B-L. EEG-based emotion classification using
deep belief networks. IEEE International Conference on Multimedia and Expo (ICME);
2014. pp. 1–6.
[133] An X, Kuang D, Guo X, Zhao Y, He L. A deep learning method for classification
of EEG data based on motor imagery. International Conference on Intelligent
Computing. Springer; 2014. p. 203–10.
[134] Lu N, Li T, Ren X, Miao H. A deep learning scheme for motor imagery
classification based on restricted Boltzmann machines. IEEE Trans Neural Syst Rehabil
Eng 2016; 25 (6):566–76.
[135] Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural
networks. Advances in neural information processing systems. 2014; 3104–12.
[136] Dai AM, Le QV. Semi-supervised sequence learning. Advances in neural
information processing systems. 2015; 3079–87.
[137] Zhou J, Meng M, Gao Y, Ma Y, Zhang Q. Classification of motor imagery EEG
using wavelet envelope analysis and LSTM networks. Chinese Control And Decision
Conference (CCDC); 2018. pp. 5600–5.
[138] Tabar YR, Halici U. A novel deep learning approach for classification of EEG
motor imagery signals. J Neural Eng 2016; 14(1):016003.
[139] Schirrmeister RT, Springenberg JT, Fiederer LDJ, Glasstetter M, Eggensperger K,
Tangermann M, et al. Deep learning with convolutional neural networks for EEG
decoding and visualization. Hum Brain Mapp 2017; 38 (11):5391–420.
[140] Xu B, Zhang L, Song A, Wu C, Li W, Zhang D, et al. Wavelet transform
time-frequency image and convolutional network-based motor imagery EEG
classification. IEEE Access 2018; 7:6084–93.

[141] Ma X, Qiu S, Du C, Xing J, He H. Improving EEG-based motor imagery
classification via spatial and temporal recurrent neural networks. 40th Annual
International Conference of the IEEE Engineering in Medicine and Biology Society
(EMBC); 2018. pp. 1903–6.
[142] Antelis JM, Falcón LE, et al. Spiking neural networks applied to the classification
of motor tasks in EEG signals. Neural Netw 2020; 122:130–43.
[143] Rammy SA, Abrar M, Anwar SJ, Zhang W. Recurrent deep learning for
EEG-based motor imagination recognition. 3rd International Conference on
Advancements inComputational Sciences (ICACS); 2020. pp. 1–6.
[144] Brunner C, Leeb R, Müller-Putz G, Schlögl A, Pfurtscheller G.
BCI competition 2008 – Graz data set A, 16. Institute for Knowledge Discovery
(Laboratory of Brain–Computer Interfaces), Graz University of Technology; 2008. p. 1–
6.
[145] Tangermann M, Müller K-R, Aertsen A, Birbaumer N, Braun C, Brunner C, et al.
Review of the BCI competition IV. Front Neurosci 2012; 6:55.
[146] Yang B, He M, Liu Y, Han Z. Multi-class feature extraction based on common
spatial patterns of multi-band cross filter in bcis. International Computer Science
Conference. Springer; 2012. p. 399–408.
[147] Bengio Y, Simard P, Frasconi P, et al. Learning long-term dependencies with
gradient descent is difficult. IEEE Trans Neural Netw 1994; 5(2):157–66.
[148] Bashivan P, Rish I, Yeasin M, Codella N. Learning representations from EEG
with deep recurrentconvolutional neural networks. arXiv preprint arXiv:1511.06448.
[149] Wang D, Miao D, Blohm G. Multi-class motor imagery EEG decoding for brain–
computer interfaces. Front Neurosci 2012; 6:151.
[150] Abbas W, Khan NA. A discriminative spectral-temporal feature set for motor
imagery classification. IEEE International Workshop on Signal Processing Systems
(SiPS); 2017. pp. 1–6.

[151] Asensio-Cubero J, Gan J, Palaniappan R. Multiresolution analysis over simple
graphs for brain computer interfaces. J Neural Eng 2013; 10(4):046014.
[152] Abdelfattah SM, Abdelrahman GM, Wang M. Augmenting the size of EEG
datasets using generative adversarial networks. International Joint Conference on Neural
Networks (IJCNN); 2018. pp. 1–6.
[153] M. D. Abràmoff, M. K. Garvin, and M. Sonka, “Retinal imaging and image
analysis,” IEEE reviews in biomedical engineering, vol. 3, pp. 169–208, 2010.
[154] International Diabetes Federation, IDF Diabetes Atlas, vol. 6, 2013. ISBN
2-930229-85-3.
[155] R. J. Winder, P. J. Morrow, I. N. McRitchie, J. Bailie, and P. M. Hart,
“Algorithms for digital image processing in diabetic retinopathy,” Computerized
medical imaging and graphics, vol. 33, no. 8, pp. 608–622, 2009.
[156] W. Abbas and N. A. Khan, “A discriminative spectral-temporal feature set for
motor imagery classification,” in 2017 IEEE International Workshop on Signal
Processing Systems (SiPS). IEEE, 2017, pp. 1–6.
[157] W. Abbas and N. A. Khan, “Fbcsp-based multi-class motor imagery
classification using bp and tdp features,” in 2018 40th Annual International Conference
of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018, pp.
215–218.
[158] W. Abbas and N. A. Khan, “Deepmi: Deep learning for multiclass motor imagery
classification,” in 2018 40th Annual International Conference of the IEEE Engineering
in Medicine and Biology Society (EMBC). IEEE, 2018, pp. 219–222.
[159] W. Abbas and N. A. Khan, “A high performance approach for classification of
motor imagery eeg,” in 2018 IEEE Biomedical Circuits and Systems Conference
(BioCAS). IEEE, 2018, pp. 1–4.
[160] H. Fu, Y. Xu, S. Lin, D. W. K. Wong, and J. Liu, “Deepvessel: Retinal vessel
segmentation via deep learning and conditional random field,” in International
Conference on Medical Image Computing and Computer-Assisted Intervention.
Springer, 2016, pp. 132–139.
[161] K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool, “Deep retinal image
understanding,” in International Conference on Medical Image Computing and
Computer-Assisted Intervention. Springer, 2016, pp. 140–148.
[162] M. Melinščak, P. Prentašić, and S. Lončarić, “Retinal vessel segmentation
using deep neural networks,” in VISAPP 2015 (10th International Conference on
Computer Vision Theory and Applications), 2015.
[163] S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proceedings of the IEEE
international conference on computer vision, 2015, pp. 1395–1403.
[164] Z. Jiang, H. Zhang, Y. Wang, and S.-B. Ko, “Retinal blood vessel segmentation
using fully convolutional network with transfer learning,” Computerized Medical
Imaging and Graphics, vol. 68, pp. 1–15, 2018.
[165] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic
segmentation,” in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2015, pp. 3431–3440.
[166] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for
biomedical image segmentation,” in International Conference on Medical image
computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
[167] V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional
encoder-decoder architecture for image segmentation,” IEEE transactions on pattern
analysis and machine intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
[168] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, “Autoencoding
beyond pixels using a learned similarity metric,” arXiv preprint arXiv:1512.09300,
2015.
[169] V. Nguyen, T. F. Y. Vicente, M. Zhao, M. Hoai, and D. Samaras, “Shadow
detection with conditional generative adversarial networks,” in Computer Vision
(ICCV), 2017 IEEE International Conference on. IEEE, 2017, pp. 4520–4528.

[170] M. Niemeijer, J. Staal, B. Ginneken, M. Loog, and M. Abramoff, “Drive: digital
retinal images for vessel extraction,” Methods for Evaluating Segmentation and
Indexing Techniques Dedicated to Retinal Ophthalmology, 2004.
[171] A. Hoover, V. Kouznetsova, and M. Goldbaum, “Locating blood vessels in
retinal images by piecewise threshold probing of a matched filter response,” IEEE
Transactions on Medical imaging, vol. 19, no. 3, pp. 203–210, 2000.
[172] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object
detection with region proposal networks,” in Advances in neural information processing
systems, 2015, pp. 91–99.
[173] P. O. Pinheiro, R. Collobert, and P. Dollár, “Learning to segment object
candidates,” in Advances in Neural Information Processing Systems, 2015, pp. 1990–
1998.
[174] C. Becker, R. Rigamonti, V. Lepetit, and P. Fua, “Supervised feature learning
for curvilinear structure segmentation,” in International Conference on Medical Image
Computing and Computer-Assisted Intervention. Springer, 2013, pp. 526–533.
[175] J. V. Soares, J. J. Leandro, R. M. Cesar, H. F. Jelinek, and M. J. Cree,
“Retinal vessel segmentation using the 2-d gabor wavelet and supervised classification,”
IEEE Transactions on medical Imaging, vol. 25, no. 9, pp. 1214–1222, 2006.
[176] Y. Ganin and V. Lempitsky, “N^4-fields: Neural network nearest neighbor
fields for image transforms,” in Asian Conference on Computer Vision. Springer, 2014,
pp. 536–551.
[177] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE
transactions on systems, man, and cybernetics, vol. 9, no. 1, pp. 62–66, 1979.
[178] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural
information processing systems, 2014, pp. 2672–2680.

[179] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning
with deep convolutional generative adversarial networks,” arXiv preprint
arXiv:1511.06434, 2015.
[180] Y. Li, J. Long, T. Yu, Z. Yu, C. Wang, H. Zhang, and C. Guan, “An eegbased
bci system for 2-d cursor control by combining mu/beta rhythm and p300 potential,”
IEEE Transactions on Biomedical Engineering, vol. 57, no. 10, pp. 2495–2505, 2010.
[181] J. Long, Y. Li, H. Wang, T. Yu, J. Pan, and F. Li, “A hybrid brain computer
interface to control the direction and speed of a simulated or real wheelchair,” IEEE
Transactions on Neural Systems and Rehabilitation Engineering, vol. 20, no. 5, pp. 720–
729, 2012.
[182] K. LaFleur, K. Cassady, A. Doud, K. Shades, E. Rogin, and B.
He, “Quadcopter control in three-dimensional space using a noninvasive motor
imagery-based brain–computer interface,” Journal of neural engineering, vol. 10, no. 4,
p. 046003, 2013.
[183] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, and B. Arnaldi, “A review
of classification algorithms for eeg-based brain–computer interfaces,” Journal of neural
engineering, vol. 4, no. 2, p. R1, 2007.
[184] H. Cecotti and A. Graeser, “Convolutional neural network with embedded
fourier transform for eeg classification,” in 2008 19th International Conference on
Pattern Recognition. IEEE, 2008, pp. 1–4.
[185] W.-L. Zheng, J.-Y. Zhu, Y. Peng, and B.-L. Lu, “Eeg-based emotion
classification using deep belief networks,” in 2014 IEEE International Conference on
Multimedia and Expo (ICME). IEEE, 2014, pp. 1–6.
[186] X. An, D. Kuang, X. Guo, Y. Zhao, and L. He, “A deep learning method for
classification of eeg data based on motor imagery,” in International Conference on
Intelligent Computing. Springer, 2014, pp. 203–210.

[187] N. Lu, T. Li, X. Ren, and H. Miao, “A deep learning scheme for motor
imagery classification based on restricted boltzmann machines,” IEEE transactions on
neural systems and rehabilitation engineering, vol. 25, no. 6, pp. 566–576, 2016.
[188] H. Azami, H. Hassanpour, J. Escudero, and S. Sanei, “An intelligent approach
for variable size segmentation of non-stationary signals,” Journal of advanced research,
vol. 6, no. 5, pp. 687–698, 2015.
[189] M. Tangermann, K.-R. Müller, A. Aertsen, N. Birbaumer, C. Braun, C.
Brunner, R. Leeb, C. Mehring, K. J. Miller, G. Mueller-Putz et al., “Review of the bci
competition iv,” Frontiers in neuroscience, vol. 6, p. 55, 2012.
[190] P. Heckbert, “Fourier transforms and the fast fourier transform (fft) algorithm,”
Computer Graphics, vol. 2, pp. 15–463, 1995.
[191] Y. Bengio, P. Simard, P. Frasconi et al., “Learning long-term dependencies
with gradient descent is difficult,” IEEE transactions on neural networks, vol. 5, no. 2,
pp. 157–166, 1994.
[192] W. Abbas and N. A. Khan, “A discriminative spectral-temporal feature set for
motor imagery classification,” in 2017 IEEE International Workshop on Signal
Processing Systems (SiPS). IEEE, 2017, pp. 1–6.
[193] H. Ghaheri and A. Ahmadyfard, “Extracting common spatial patterns from
eeg time segments for classifying motor imagery classes in a brain computer interface
(bci),” Scientia Iranica. Transaction D, Computer Science & Engineering, Electrical, vol.
20, no. 6, p. 2061, 2013.
[194] W. Abbas and N. A. Khan, “Fbcsp-based multi-class motor imagery
classification using bp and tdp features,” in 2018 40th Annual International Conference
of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018, pp.
215–218.
[195] H. Raza, H. Cecotti, and G. Prasad, “A combination of transductive and
inductive learning for handling non-stationarities in motor imagery classification,” in
2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016, pp.
763–770.
[196] K. K. Ang, Z. Y. Chin, C. Wang, C. Guan, and H. Zhang, “Filter bank
common spatial pattern algorithm on bci competition iv datasets 2a and 2b,” Frontiers in
neuroscience, vol. 6, p. 39, 2012.
[197] A. S. Aghaei, M. S. Mahanta, and K. N. Plataniotis, “Separable common
spatio-spectral patterns for motor imagery bci systems,” IEEE Transactions on
Biomedical Engineering, vol. 63, no. 1, pp. 15–29, 2015.
[198] P. Bashivan, I. Rish, M. Yeasin, and N. Codella, “Learning representations
from EEG with deep recurrent-convolutional neural networks,” arXiv preprint
arXiv:1511.06448, 2015.

Publications

1. Sadaqat Ali Rammy, Waseem Abbas, Naqy-Ul-Hassan, Asif Raza, Wu Zhang, CPGAN:
Conditional Patch-based Generative Adversarial Network for Retinal Vessel
Segmentation, IET Image Processing [J] 14 (2020) 1081-1090.
2. Sadaqat Ali Rammy, Waseem Abbas, Syed Shahid Mahmood, Haider Riaz, Haseeb Ur
Rehman, Rao Zain Ul Abideen, Muhammad Aqeel, Wu Zhang, Sequence-to-sequence
deep neural network with spatio-spectro and temporal features for motor imagery
classification, Biocybernetics and Biomedical Engineering [J], 41 (2021) 97-110.
3. Sadaqat Ali Rammy, Sadia Jabbar Anwar, Muhammad Abrar, Wu Zhang, Conditional
Patch-based Generative Adversarial Network for Retinal Vessel Segmentation, 22nd
International Multitopic Conference (INMIC) [EI], (2019)
10.1109/INMIC48123.2019.9022732.
4. Sadaqat Ali Rammy, Muhammad Abrar, Sadia Jabbar Anwar, Wu Zhang, Recurrent
Deep Learning for EEG-based Motor Imagination Recognition, 3rd International
Conference on Advancements in Computational Sciences (ICACS) [EI], (2020).
10.1109/ICACS47775.2020.9055952.
