Assignment 1 ECG Classification PDF

SCHOOL OF BIOMEDICAL ENGINEERING AND HEALTH SCIENCES
FACULTY OF ENGINEERING
ARTIFICIAL INTELLIGENCE
SMBE 4083
Topic: Assignment 1 - ECG Classification
Lecturers: Dr. Ting Chee Ming
Dr. Muhammad Amir As’ari
Section: 01
Group No: 1
Group Members:
Lina Suhaili Binti Rosidi (A16MB0079)
Mohamed Wahid Mohamed Ahmed (A16MB4012)
Muhammad Fawwaz Bin Badrul Hisham (A16MB0111)
Tan Lee Hui (A16SC0324)
Nur Aina Athiraa Binti Azman (A16MB0145)
Nurhidayah Binti Sanat (A16MB0179)

Problem: Classification of Atrial Fibrillation (AF) from Single-Lead ECG
Given a set of 139 features extracted from ECG data from the PhysioNet/Computing in
Cardiology Challenge 2017 (for details, see https://physionet.org/content/challenge-2017/1.0.0/),
the problem is to automatically differentiate AF subjects from normal subjects using different
machine learning classifiers and different feature sets.
Method 1: Logistic Regression
CODING
Feature Set 1: ECG morphological and statistical features
RESULTS
Figure 1 Confusion Matrix for Train & Test Set 1
PERFORMANCE
Measurement    Calculation                                  Value
Accuracy       (541 + 4006) / 4630                          0.9821
Recall         541 / (541 + 49)                             0.9169
Specificity    4006 / (4006 + 34)                           0.9916
Precision      541 / (541 + 34)                             0.9409
F1 score       2 x 0.9169 x 0.9409 / (0.9169 + 0.9409)      0.9287

Table 1 Feature Set 1 (Train Set) Performance

Measurement    Calculation                                  Value
Accuracy       (130 + 994) / 1158                           0.9706
Recall         130 / (130 + 18)                             0.8784
Specificity    994 / (994 + 16)                             0.9842
Precision      130 / (130 + 16)                             0.8904
F1 score       2 x 0.8784 x 0.8904 / (0.8784 + 0.8904)      0.8844

Table 2 Feature Set 1 (Test Set) Performance
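The calculations in Tables 1 and 2 can be reproduced with a short script. The following is a minimal Python sketch, not the code used in this assignment (which is not reproduced here); the function name `confusion_metrics` and its argument names are assumptions chosen for illustration. It is applied to the Feature Set 1 test-set counts: 130 true positives, 18 false negatives, 994 true negatives and 16 false positives.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return the five performance measures for a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    recall = tp / (tp + fn)        # sensitivity, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Feature Set 1, test set (counts taken from Table 2)
test_metrics = confusion_metrics(tp=130, fn=18, tn=994, fp=16)
for name, value in test_metrics.items():
    print(f"{name}: {value:.4f}")
```

The same function can be called with the counts of any other table in this section.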
Feature Set 2: Frequency-domain based features
RESULTS
PERFORMANCE
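Measurement    Calculation                                  Value
Accuracy       (56 + 3983) / 4630                           0.8724
Recall         56 / (56 + 534)                              0.0949
Specificity    3983 / (3983 + 57)                           0.9859
Precision      56 / (56 + 57)                               0.4956
F1 score       2 x 0.0949 x 0.4956 / (0.0949 + 0.4956)      0.1593

Table 3 Feature Set 2 (Train Set) Performance

Measurement    Calculation                                  Value
Accuracy       (17 + 985) / 1158                            0.8653
Recall         17 / (17 + 131)                              0.1149
Specificity    985 / (985 + 25)                             0.9752
Precision      17 / (17 + 25)                               0.4048
F1 score       2 x 0.1149 x 0.4048 / (0.1149 + 0.4048)      0.1790

Table 4 Feature Set 2 (Test Set) Performance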

Feature Set 3: Statistical and FFT related features
RESULTS
PERFORMANCE
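Measurement    Calculation                                  Value
Accuracy       (490 + 3977) / 4630                          0.9648
Recall         490 / (490 + 100)                            0.8305
Specificity    3977 / (3977 + 63)                           0.9844
Precision      490 / (490 + 63)                             0.8861
F1 score       2 x 0.8305 x 0.8861 / (0.8305 + 0.8861)      0.8574

Table 5 Feature Set 3 (Train Set) Performance

Measurement    Calculation                                  Value
Accuracy       (119 + 994) / 1158                           0.9611
Recall         119 / (119 + 29)                             0.8041
Specificity    994 / (994 + 16)                             0.9842
Precision      119 / (119 + 16)                             0.8815
F1 score       2 x 0.8041 x 0.8815 / (0.8041 + 0.8815)      0.8410

Table 6 Feature Set 3 (Test Set) Performance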

Feature Set 4: HRV features
RESULTS
PERFORMANCE
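Measurement    Calculation                                  Value
Accuracy       (367 + 3934) / 4630                          0.9289
Recall         367 / (367 + 223)                            0.6220
Specificity    3934 / (3934 + 106)                          0.9738
Precision      367 / (367 + 106)                            0.7759
F1 score       2 x 0.6220 x 0.7759 / (0.6220 + 0.7759)      0.6905

Table 7 Feature Set 4 (Train Set) Performance

Measurement    Calculation                                  Value
Accuracy       (95 + 994) / 1158                            0.9404
Recall         95 / (95 + 53)                               0.6419
Specificity    994 / (994 + 16)                             0.9842
Precision      95 / (95 + 16)                               0.8559
F1 score       2 x 0.6419 x 0.8559 / (0.6419 + 0.8559)      0.7336

Table 8 Feature Set 4 (Test Set) Performance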

Feature Set 5: Mixture of features (Spectral, energy, time-domain, statistical)
RESULTS
PERFORMANCE
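With 4630 training records, false positives = 4630 - (375 + 215 + 3999) = 41.

Measurement    Calculation                                  Value
Accuracy       (375 + 3999) / 4630                          0.9447
Recall         375 / (375 + 215)                            0.6356
Specificity    3999 / (3999 + 41)                           0.9899
Precision      375 / (375 + 41)                             0.9014
F1 score       2 x 0.6356 x 0.9014 / (0.6356 + 0.9014)      0.7455

Table 9 Feature Set 5 (Train Set) Performance

Measurement    Calculation                                  Value
Accuracy       (90 + 998) / 1158                            0.9396
Recall         90 / (90 + 58)                               0.6081
Specificity    998 / (998 + 12)                             0.9881
Precision      90 / (90 + 12)                               0.8824
F1 score       2 x 0.6081 x 0.8824 / (0.6081 + 0.8824)      0.7200

Table 10 Feature Set 5 (Test Set) Performance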

Feature Set 6: All features
RESULTS
PERFORMANCE
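Measurement    Calculation                                  Value
Accuracy       (542 + 3973) / 4630                          0.9752
Recall         542 / (542 + 48)                             0.9186
Specificity    3973 / (3973 + 67)                           0.9834
Precision      542 / (542 + 67)                             0.8900
F1 score       2 x 0.9186 x 0.8900 / (0.9186 + 0.8900)      0.9041

Table 11 Feature Set 6 (Train Set) Performance

Measurement    Calculation                                  Value
Accuracy       (141 + 982) / 1158                           0.9698
Recall         141 / (141 + 7)                              0.9527
Specificity    982 / (982 + 28)                             0.9723
Precision      141 / (141 + 28)                             0.8343
F1 score       2 x 0.9527 x 0.8343 / (0.9527 + 0.8343)      0.8896

Table 12 Feature Set 6 (Test Set) Performance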
DISCUSSION OF METHOD 1: LOGISTIC REGRESSION
In Method 1, the classification model used for the binary classification is logistic
regression. In the code, the function ‘fitclinear’ is applied; it trains linear classification
models for two-class (binary) learning with high-dimensional, full or sparse predictor data. The
eligible hyperparameters for ‘fitclinear’ are ‘Lambda’ and ‘Learner’: for ‘Lambda’, ‘fitclinear’
searches among positive values, log-scaled by default, and for ‘Learner’ it searches among
‘svm’ and ‘logistic’.
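To make the role of the regularisation strength concrete, the following is a minimal Python sketch of ridge-penalised logistic regression trained by plain gradient descent. It is an illustration only, not the implementation inside ‘fitclinear’: the function names, the learning rate, the number of epochs and the toy data are all assumptions, and `lam` only plays a role analogous to ‘Lambda’.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lam=1e-3, lr=0.5, epochs=2000):
    """Minimise mean logistic loss + (lam / 2) * ||w||^2 by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the ridge penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Label a row 1 when its predicted probability is at least 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Hypothetical one-feature toy data: small values are class 0, large values class 1
X_toy = [[0.1], [0.4], [0.5], [1.5], [1.8], [2.2]]
y_toy = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X_toy, y_toy)
print(predict(X_toy, w, b))
```

A larger `lam` shrinks the weights towards zero, which is the trade-off a search over ‘Lambda’ explores.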
From the results, we can see that each feature set performs differently. The performance
measures are accuracy, recall, specificity, precision and F1 score. The feature set with the
highest test-set accuracy is Feature Set 1 (ECG morphological and statistical features) with
97.06%. In contrast, the feature set with the lowest test-set accuracy is Feature Set 2
(frequency-domain based features) with 86.53%.
Recall, also known as sensitivity, measures the true positive rate and is highest for
Feature Set 6 at 95.27%. Specificity, the true negative rate, is high for almost all feature
sets. Precision quantifies the proportion of positive-class predictions that actually belong to
the positive class; on the test set it is similar across the feature sets (83.43% to 89.04%)
except for Feature Set 2, which reaches only 40.48%. Likewise, Feature Set 2 has the lowest test
F1 score, only 0.1790. In conclusion, all feature sets perform well on the test set except
Feature Set 2.
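Because the two classes are imbalanced, accuracy alone can flatter a weak classifier. As a rough check (an added illustration, not part of the original analysis), the Python sketch below computes the accuracy of a hypothetical baseline that labels every record as normal, using the test-set class counts implied by the tables above (148 AF and 1010 normal records). The function name is an assumption.

```python
def majority_class_accuracy(n_positive, n_negative):
    """Accuracy of a baseline that always predicts the larger class."""
    return max(n_positive, n_negative) / (n_positive + n_negative)


baseline = majority_class_accuracy(n_positive=148, n_negative=1010)
feature_set_2_accuracy = 0.8653  # test-set accuracy reported for Feature Set 2

print(f"baseline accuracy: {baseline:.4f}")
print("Feature Set 2 beats baseline:", feature_set_2_accuracy > baseline)
```

Under these counts the baseline reaches about 87.2% accuracy, slightly above the 86.53% of Feature Set 2, which is why recall and F1 score are the more informative measures for this feature set.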
Method 2: SVM
CODING
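Feature Set 1: ECG morphological and statistical features
RESULTS
Figure 7 Confusion Matrix for Train & Test Set 1 (SVM)
PERFORMANCE
Table 13 Feature Set 1 (Train & Test Set) Performance (SVM)

Feature Set 2: Frequency-domain based features
RESULTS
PERFORMANCE
Table 14 Feature Set 2 (Train Set) Performance (SVM)

Feature Set 3: Statistical and FFT related features
RESULTS
PERFORMANCE

Feature Set 4: HRV features
RESULTS
PERFORMANCE

Feature Set 5: Mixture of features (Spectral, energy, time-domain, statistical)
RESULTS
PERFORMANCE

Feature Set 6: All features
RESULTS
PERFORMANCE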

DISCUSSION OF METHOD 2: SVM
In Method 2, the learning algorithm used is the Support Vector Machine (SVM). ‘fitcsvm’ is
used to train the SVM classifiers for this ECG classification. ‘fitcsvm’ trains SVM classifiers
for one-class and binary classification, so it suits this task, where the output is only 0 or 1.
Mdl = fitcsvm(XTrain,YTrain) returns an SVM classifier trained using the predictors in the
matrix XTrain and the class labels in the vector YTrain. 'Standardize' is specified as a
name-value pair and is set to ‘true’ here, so that the software trains the classifier using the
standardized predictors but stores the unstandardized predictors as a matrix or table in the
classifier property X. Besides that, we set ‘KernelFunction’ to ‘gaussian’; the Gaussian kernel
is the default only for one-class learning, so it has to be specified explicitly for this
two-class problem. The software divides all elements of the predictor matrix X by the value of
‘KernelScale’ and then applies the appropriate kernel norm to compute the Gram matrix. When
‘KernelScale’ is set to 'auto', the software selects an appropriate scale factor using a
heuristic procedure. The three dots (...) in the call are MATLAB's line-continuation marker;
every other setting keeps its default.
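The effect of standardization and of the kernel scale can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the internals of ‘fitcsvm’: the function names and the toy values are hypothetical, the standard deviation is the sample (n - 1) form, and the kernel is taken as exp(-||x - y||^2) evaluated after every predictor value has been divided by the kernel scale.

```python
import math


def standardize(column):
    """Centre a feature column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]


def gaussian_kernel(x, y, kernel_scale=1.0):
    """Gaussian kernel on predictors divided by kernel_scale."""
    squared_distance = sum(((a - b) / kernel_scale) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance)


def gram_matrix(rows, kernel_scale=1.0):
    """Matrix of kernel values between every pair of observations."""
    return [[gaussian_kernel(r, s, kernel_scale) for s in rows] for r in rows]


column = standardize([1.0, 2.0, 3.0])          # hypothetical feature values
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]    # hypothetical predictor rows
gram = gram_matrix(rows, kernel_scale=2.0)
print(column)
print(gram)
```

A larger kernel scale shrinks the scaled distances, so the kernel values move towards 1 and the decision boundary becomes smoother.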
From the results, we can see that each feature set performs differently. The performance
measures are accuracy, recall, specificity, precision and F1 score. The feature set with the
highest test-set accuracy is Feature Set 6 (the combination of all features) with 97.93%. In
contrast, the feature set with the lowest test-set accuracy is Feature Set 2 (frequency-domain
based features) with 87.65%.
Recall, also known as sensitivity, measures the true positive rate and is highest for
Feature Set 6 at 89.86%. Specificity, the true negative rate, is high for all feature sets, and
most of the values approach 100%. Precision quantifies the proportion of positive-class
predictions that actually belong to the positive class; it is similar across the feature sets
except for Feature Set 2, which reaches only 60.87%. Likewise, Feature Set 2 has the lowest F1
score, only 16.37%. The other feature sets all exceed a 60% F1 score, and most of their test
measures are consistent and high compared to Feature Set 2. In short, all feature sets perform
well except Feature Set 2.
When evaluating a model from its confusion matrix, we consider four measures: accuracy,
precision, recall and F1 score. The formulas used to calculate them are:

\[ \text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{False Positive} + \text{True Negative} + \text{False Negative}} \]

\[ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} = \frac{\text{True Positive}}{\text{Total Predicted Positive}} \]

\[ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} = \frac{\text{True Positive}}{\text{Total Actual Positive}} \]

\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
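Substituting the precision and recall definitions into the F1 formula gives an equivalent form in terms of the confusion-matrix counts directly. Writing TP, FP and FN for the true positive, false positive and false negative counts (abbreviations introduced here for brevity):

```latex
\[
F_1 = \frac{2 \cdot \dfrac{TP}{TP + FP} \cdot \dfrac{TP}{TP + FN}}
           {\dfrac{TP}{TP + FP} + \dfrac{TP}{TP + FN}}
    = \frac{2\,TP}{2\,TP + FP + FN}
\]
```

For example, with 130 true positives, 16 false positives and 18 false negatives (the logistic regression test set for Feature Set 1), F1 = 260 / 294 = 0.8844.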
Method 3: Neural Network
CODING
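Feature Set 1: ECG morphological and statistical features
RESULTS
Figure 1: Confusion matrix of training and testing set feature 1
PERFORMANCE
F1 Score (Train) = 2 x 0.945 x 0.899 / (0.945 + 0.899) = 0.9214
F1 Score (Test)  = 2 x 0.948 x 0.986 / (0.948 + 0.986) = 0.9666

Feature Set 2: Frequency-domain based features
RESULTS
PERFORMANCE:
F1 Score (Train) = 2 x 0.807 x 0.347 / (0.807 + 0.347) = 0.4853
F1 Score (Test)  = 2 x 0.733 x 0.372 / (0.733 + 0.372) = 0.4935

Feature Set 3: Statistical and FFT related features
RESULTS
PERFORMANCE:
F1 Score (Train) = 2 x 0.914 x 0.846 / (0.914 + 0.846) = 0.8787
F1 Score (Test)  = 2 x 0.967 x 0.797 / (0.967 + 0.797) = 0.8738

Feature Set 4: HRV features
RESULTS
PERFORMANCE:
F1 Score (Train) = 2 x 0.761 x 0.714 / (0.761 + 0.714) = 0.7368
F1 Score (Test)  = 2 x 0.837 x 0.797 / (0.837 + 0.797) = 0.8165

Feature Set 5: Mixture of features (Spectral, Energy, Time domain, Statistical)
RESULTS
PERFORMANCE:
F1 Score (Train) = 2 x 0.919 x 0.753 / (0.919 + 0.753) = 0.8278
F1 Score (Test)  = 2 x 0.830 x 0.297 / (0.830 + 0.297) = 0.4375

Feature Set 6: All features
RESULTS
PERFORMANCE:
F1 Score (Train) = 2 x 0.965 x 0.898 / (0.965 + 0.898) = 0.9303
F1 Score (Test)  = 2 x 0.979 x 0.953 / (0.979 + 0.953) = 0.9658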
DISCUSSION OF METHOD 3: NEURAL NETWORK
In Method 3, the classification model used is a neural network. The network is trained with
the ‘trainlm’ function. ‘trainlm’ is a network training function that updates weight and bias
values according to Levenberg-Marquardt optimization. It is often the fastest backpropagation
algorithm in the toolbox and is highly recommended as a first-choice supervised algorithm,
although it requires more memory than other algorithms.
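For reference, the Levenberg-Marquardt update is usually written in the following textbook form. This is the standard statement of the algorithm, not something read from the toolbox source; the symbols are introduced here: w is the vector of weights and biases, e the vector of network errors, J the Jacobian of e with respect to w, I the identity matrix and mu a damping factor.

```latex
\[
\mathbf{w}_{k+1} = \mathbf{w}_k - \left( J^{\top} J + \mu I \right)^{-1} J^{\top} \mathbf{e}
\]
```

When mu is close to zero the step approaches a Gauss-Newton step, and when mu is large it approaches gradient descent with a small step size. Forming and inverting the matrix built from J is the reason for the higher memory requirement mentioned above.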
From the results, Feature Set 6 has the highest training F1 score (0.9303), and Feature
Set 1 and Feature Set 6 have the highest test F1 scores (0.9666 and 0.9658). In contrast,
Feature Set 2 has the lowest training F1 score (0.4853) and Feature Set 5 the lowest test F1
score (0.4375), with Feature Set 2 close behind (0.4935). The specificity, recall, accuracy and
precision are read from the confusion matrix, and the F1 score is calculated from the precision
and recall.
Feature Set 6 also achieves the highest accuracy, with 98.3% for the train set and 99.1%
for the test set. Precision is high for all feature sets except Feature Set 2, where it is
markedly lower. In conclusion, the neural network classifies the ECG well for all feature sets
except Feature Set 2 and, on the test set, Feature Set 5.
Summary and Conclusion:
F1 scores by model and feature set

Model                Set       Set 1   Set 2   Set 3   Set 4   Set 5   Set 6
Logistic Regression  Training  0.9287  0.1593  0.8574  0.6905  0.7455  0.9041
Logistic Regression  Testing   0.8844  0.1790  0.8410  0.7336  0.7200  0.8896
SVM                  Training  0.9579  0.5322  0.8914  0.8247  0.8666  0.9613
SVM                  Testing   0.9129  0.1637  0.8132  0.8014  0.6550  0.9173
Neural Network       Training  0.9214  0.4853  0.8787  0.7368  0.8278  0.9303
Neural Network       Testing   0.9666  0.4935  0.8738  0.8165  0.4375  0.9658

According to the table, the highest test F1 score is achieved by the neural network with
96.66%, while the highest training F1 score is achieved by SVM with 96.13%. The lowest training
and testing F1 scores come from logistic regression (15.93%) and SVM (16.37%) respectively, both
on Feature Set 2. It can be concluded that the neural network is generally the best of the three
methods.
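One simple way to compare the three methods across all feature sets is to average their test F1 scores. The Python sketch below does this with the testing rows of the summary table; the averaging itself is an added illustration rather than part of the original analysis, and the variable names are assumptions.

```python
# Test F1 scores for Feature Sets 1 to 6, copied from the summary table
test_f1 = {
    "Logistic Regression": [0.8844, 0.1790, 0.8410, 0.7336, 0.7200, 0.8896],
    "SVM": [0.9129, 0.1637, 0.8132, 0.8014, 0.6550, 0.9173],
    "Neural Network": [0.9666, 0.4935, 0.8738, 0.8165, 0.4375, 0.9658],
}

# Mean test F1 score per model over the six feature sets
mean_test_f1 = {
    model: sum(scores) / len(scores) for model, scores in test_f1.items()
}
best_model = max(mean_test_f1, key=mean_test_f1.get)

for model, mean_score in mean_test_f1.items():
    print(f"{model}: {mean_score:.4f}")
print("Best on average:", best_model)
```

Averaged over the six feature sets, the neural network has the highest test F1 score (about 0.759, against about 0.711 for SVM and 0.708 for logistic regression), which is consistent with the conclusion above.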

Remaining figure and table captions:
Figure 2 Confusion Matrix for Train & Test Set 2
Figure 3 Confusion Matrix for Train & Test Set 3
Figure 4 Confusion Matrix for Train & Test Set 4
Figure 5 Confusion Matrix for Train & Test Set 5
Figure 6 Confusion Matrix for Train & Test Set 6
Figure 8 Confusion Matrix for Train & Test Set 2 (SVM)
Figure 9 Confusion Matrix for Train & Test Set 3 (SVM)
Figure 10 Confusion Matrix for Train & Test Set 4 (SVM)
Figure 11 Confusion Matrix for Train & Test Set 5 (SVM)
Figure 12 Confusion Matrix for Train & Test Set 6 (SVM)
Table 15 Feature Set 3 (Train Set) Performance (SVM)
Table 16 Feature Set 4 (Train Set) Performance (SVM)
Table 17 Feature Set 5 (Train Set) Performance (SVM)