Assignment 1 ECG Classification PDF
FACULTY OF ENGINEERING
ARTIFICIAL INTELLIGENCE
SMBE 4083
Section 01
Group No 1
Given a set of 139 features extracted from ECG data from the PhysioNet/Computing in
Cardiology Challenge 2017 (for details, see https://physionet.org/content/challenge-2017/1.0.0/),
the problem is to automatically differentiate atrial fibrillation (AF) subjects from normal
subjects using different machine learning classifiers and different feature sets.
Method 1: Logistic Regression
CODING
Feature Set 1: ECG morphological and statistical features
RESULTS and PERFORMANCE for Feature Sets 1 to 6: [figures and tables not preserved in this copy]
In Method 1, the classification model used for binary classification is logistic
regression. In the code, the function ‘fitclinear’ is used to train linear classification
models for two-class (binary) learning with high-dimensional, full or sparse predictor data. The
hyperparameters eligible for optimization in ‘fitclinear’ are ‘Lambda’ and ‘Learner’. For
‘Lambda’, ‘fitclinear’ searches among positive values, by default log-scaled in the range
[1e-5/NumObservations, 1e5/NumObservations]. For ‘Learner’, it searches among ‘svm’ and
‘logistic’.
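As a rough illustration of the call described above, the sketch below trains a logistic-regression
learner with ‘fitclinear’ and lets the built-in optimizer tune ‘Lambda’. The synthetic data, the
variable names (XTrain, YTrain, XTest, YTest), the 30% hold-out split and the optimization options
are all assumptions made for the example, not the settings used in this assignment.

```matlab
% Synthetic stand-in data (assumption): 200 observations, 5 features,
% labels 0 (normal) and 1 (AF). Replace with the real feature matrix.
rng(1);                                  % reproducible example
X = [randn(100,5); randn(100,5) + 1.5];
Y = [zeros(100,1); ones(100,1)];

% Hold out 30% of the observations for testing (assumed split).
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv),:);  YTrain = Y(training(cv));
XTest  = X(test(cv),:);      YTest  = Y(test(cv));

% Logistic-regression learner; Lambda is tuned by the built-in optimizer.
Mdl = fitclinear(XTrain, YTrain, ...
    'Learner', 'logistic', ...
    'OptimizeHyperparameters', {'Lambda'}, ...
    'HyperparameterOptimizationOptions', ...
        struct('ShowPlots', false, 'Verbose', 0, 'MaxObjectiveEvaluations', 10));

% Predict on the held-out set and report accuracy.
YPred = predict(Mdl, XTest);
testAccuracy = mean(YPred == YTest);
fprintf('Test accuracy: %.2f%%\n', 100 * testAccuracy);
```

Passing 'auto' instead of {'Lambda'} would also let the optimizer choose between the ‘svm’ and
‘logistic’ learners. The sketch fixes the learner so that the model stays a logistic regression.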
The results show that performance differs between feature sets. The performance
measures are accuracy, recall, specificity, precision and F1 score. The feature set with the
highest test-set accuracy is Feature Set 1 (ECG morphological and statistical features) at
97.06%. In contrast, the feature set with the lowest test-set accuracy is Feature Set 2
(frequency-domain based features) at 86.53%.
Recall, also known as sensitivity, measures the true positive rate and is highest for
Feature Set 6 at 95.27%. Specificity, the true negative rate, is high for almost all feature
sets. Precision quantifies the proportion of positive class predictions that actually belong to
the positive class. It is almost the same for all feature sets except Feature Set 2, which
reaches only 40.48%. Likewise, Feature Set 2 has the lowest F1 score of all feature sets, at
only 0.1790 (17.90%). In conclusion, all feature sets perform well except Feature Set 2, which
performs poorly compared to the others.
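To see why the F1 score of Feature Set 2 is so low even though its precision is about 40%, the
F1 formula can be inverted to give the recall implied by the two reported figures. The short
MATLAB sketch below does that arithmetic. It assumes the reported precision (40.48%) and F1 score
(0.1790) are exact to the digits shown, and the variable names are only illustrative.

```matlab
P  = 0.4048;   % reported test-set precision, Feature Set 2
F1 = 0.1790;   % reported test-set F1 score, Feature Set 2

% F1 = 2*P*R/(P + R)  =>  R = F1*P / (2*P - F1)
R = F1 * P / (2*P - F1);
fprintf('Implied recall: %.1f%%\n', 100 * R);

% Sanity check: plugging R back in reproduces the reported F1.
F1check = 2 * P * R / (P + R);
```

Under these assumptions the implied recall is roughly 11.5%, so the model misses most AF records
with this feature set, which is what pulls the F1 score down.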
Method 2: SVM
CODING
Feature Set 1: ECG morphological and statistical features
RESULTS and PERFORMANCE for Feature Sets 1 to 6: [figures and tables not preserved in this copy]
In Method 2, the learning algorithm used is the Support Vector Machine (SVM). ‘fitcsvm’
trains SVM classifiers for one-class and binary classification, so it is suitable for this ECG
classification, where the output is only 0 or 1. Mdl = fitcsvm(XTrain,YTrain) returns an SVM
classifier trained using the predictors in the matrix XTrain and the class labels in the vector
YTrain for one-class or two-class classification. The ‘Standardize’ name-value argument is set
to true so that the software trains the classifier on the standardized predictors but stores
the unstandardized predictors as a matrix or table in the classifier property X. In addition,
‘KernelFunction’ is set to ‘gaussian’. The Gaussian kernel is the default only for one-class
learning, so it has to be set explicitly for this two-class problem. The software divides all
elements of the predictor matrix X by the value of ‘KernelScale’ and then applies the
appropriate kernel norm to compute the Gram matrix. When ‘KernelScale’ is set to 'auto', the
software selects an appropriate scale factor using a heuristic procedure. The three dots (...)
in the call are MATLAB's line-continuation marker, and all other settings are left at their
defaults.
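The settings described above can be sketched as a single ‘fitcsvm’ call. This is a minimal
example on synthetic data, not the assignment's actual script: the data and the variable names
trainPred and trainAccuracy are assumptions for illustration.

```matlab
rng(1);                                        % 'auto' scale uses subsampling
XTrain = [randn(100,5); randn(100,5) + 1.5];   % synthetic predictors (assumption)
YTrain = [zeros(100,1); ones(100,1)];          % 0 = normal, 1 = AF

Mdl = fitcsvm(XTrain, YTrain, ...
    'Standardize', true, ...          % train on standardized predictors
    'KernelFunction', 'gaussian', ... % not the default for two-class learning
    'KernelScale', 'auto');           % heuristic choice of the scale factor

% The model keeps the unstandardized predictors in X and the
% standardization constants in Mu and Sigma.
disp(size(Mdl.X));
disp(Mdl.KernelParameters.Scale);     % scale factor picked by the heuristic

% Resubstitution accuracy on the training data.
trainPred = predict(Mdl, XTrain);
trainAccuracy = mean(trainPred == YTrain);
```

Because the heuristic behind 'auto' relies on subsampling, the chosen scale can vary from run to
run unless the random seed is fixed, which is why the sketch calls rng first.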
The results show that performance differs between feature sets. The performance
measures are accuracy, recall, specificity, precision and F1 score. The feature set with the
highest test-set accuracy is Feature Set 6 (a combination of all features) at 97.93%. In
contrast, the feature set with the lowest test-set accuracy is Feature Set 2 (frequency-domain
based features) at 87.65%.
Recall, also known as sensitivity, measures the true positive rate and is highest for
Feature Set 6 at 89.86%. Specificity, the true negative rate, is high for all feature sets, and
most of the values approach 100%. Precision quantifies the proportion of positive class
predictions that actually belong to the positive class. It is almost the same for all feature
sets except Feature Set 2, which reaches only 60.87%. Likewise, Feature Set 2 has the lowest F1
score of all feature sets, at only 16.37%. The other feature sets have F1 scores above 60%, and
most of their test measures are consistent and high compared to Feature Set 2. In short, all
feature sets perform well except Feature Set 2, which performs poorly compared to the others.
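From the confusion matrix, four measures need to be considered when evaluating a model:
accuracy, precision, recall and F1 score. With TP, TN, FP and FN denoting the numbers of true
positives, true negatives, false positives and false negatives, they are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F_1\ \text{Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$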
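The measures used throughout this report can all be computed from the four confusion-matrix
counts. The MATLAB sketch below shows one way to do this with ‘confusionmat’. The label vectors
are hypothetical toy data, not results from the assignment.

```matlab
% Hypothetical true and predicted labels (1 = AF, 0 = normal).
yTrue = [1 1 1 1 0 0 0 0 0 0]';
yPred = [1 1 1 0 0 0 0 0 1 1]';

% Rows are true classes, columns are predicted classes, in the order [0 1].
C  = confusionmat(yTrue, yPred, 'Order', [0 1]);
TN = C(1,1);  FP = C(1,2);
FN = C(2,1);  TP = C(2,2);

accuracy    = (TP + TN) / sum(C(:));
recall      = TP / (TP + FN);          % sensitivity, true positive rate
specificity = TN / (TN + FP);          % true negative rate
precision   = TP / (TP + FP);
f1          = 2 * precision * recall / (precision + recall);

fprintf('Acc %.2f  Rec %.2f  Spec %.2f  Prec %.2f  F1 %.2f\n', ...
    accuracy, recall, specificity, precision, f1);
```

Fixing the class order with 'Order' keeps the positions of TN, FP, FN and TP in the matrix
unambiguous, whichever label happens to appear first in the data.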
Method 3: Neural Network
CODING
Feature Set 1: ECG morphological and statistical features
RESULTS and PERFORMANCE for Feature Sets 1 to 6, including F1 Score (Train) and F1 Score (Test): [figures and tables not preserved in this copy]
In Method 3, the classification model used is a neural network. The network is trained
with the ‘trainlm’ function, a network training function that updates weight and bias values
according to Levenberg-Marquardt optimization. It is often the fastest backpropagation
algorithm in the toolbox and is highly recommended as a first-choice supervised algorithm,
although it requires more memory than other algorithms.
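For orientation, a network trained with ‘trainlm’ might be set up as in the sketch below. The
network type (‘feedforwardnet’), the hidden-layer size of 10, the 0.5 decision threshold and the
synthetic data are all assumptions for illustration. The report does not state the architecture
that was actually used.

```matlab
rng(1);
X = [randn(100,5); randn(100,5) + 1.5];   % synthetic predictors (assumption)
T = [zeros(100,1); ones(100,1)];          % 0 = normal, 1 = AF

% Feedforward network trained with Levenberg-Marquardt backpropagation.
net = feedforwardnet(10, 'trainlm');      % 10 hidden neurons is an assumption
net.trainParam.showWindow = false;        % suppress the training GUI

% The toolbox expects one column per observation, hence the transposes.
net = train(net, X', T');

% Threshold the continuous output at 0.5 to obtain class labels.
scores   = net(X');
YPred    = double(scores' >= 0.5);
accuracy = mean(YPred == T);
```

‘feedforwardnet’ uses mean squared error as its default performance function, which matters here
because ‘trainlm’ only supports the mse and sse performance functions.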
The results show that Feature Set 6 has the highest F1 score for both the train
set and the test set. In contrast, Feature Set 2 has the lowest F1 scores. The confusion matrix
gives the specificity, recall, accuracy and precision, and the F1 score is calculated from
these measures.
Feature Set 6 also has the highest accuracy, with 98.3% and 99.1% for the train set and
test set respectively. Precision is high for all feature sets except Feature Set 2, where it is
below a quarter (25%). In conclusion, the neural network classifies the ECG data well for all
feature sets except Feature Set 2.
Summary and Conclusion:
According to the data in the table, the highest F1 score for testing is achieved by the neural
network with 96.66%, while the highest F1 score for training is achieved by the SVM with 96.13%.
The lowest F1 scores are 15.93% for training, with logistic regression, and 16.37% for testing,
with the SVM. It can be concluded that the neural network is the best-performing method of the
three overall.