Voice Pathology Classification Using Machine Learning


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/354047975


Conference Paper · August 2021


All content following this page was uploaded by Hussein M. A. Mohammed on 21 August 2021.



Voice Pathology Classification Using Machine Learning

Hussein M. A. Mohammed
Department of Electrical & Electronic Engineering, Ataturk University, Erzurum, Turkey
halameri080@gmail.com

Asli Nur Omeroglu
Department of Electrical & Electronic Engineering, Ataturk University, Erzurum, Turkey
asli.omeroglu@atauni.edu.tr

Merve Polat
Department of Electrical & Electronic Engineering, Ataturk University, Erzurum, Turkey
merve.polat12@ogr.atauni.edu.tr

Emin Argun Oral
Department of Electrical & Electronic Engineering, Ataturk University, Erzurum, Turkey
eminoral@atauni.edu.tr

Ibrahim Yucel Ozbek
Department of Electrical & Electronic Engineering, Ataturk University, Erzurum, Turkey
iozbek@atauni.edu.tr

Abstract: In this study, the diagnosis of voice disorders, which affect the voice quality of many people at some point in their lives, is examined. Systems for the automatic classification of healthy and pathological voices are of great interest for the early detection of voice disorders. The main purpose of this study is to investigate and compare the performance of various machine learning techniques for voice pathology detection. All analyses are performed using the Saarbruecken Voice Database. Voice pathology detection is evaluated in terms of accuracy, sensitivity and specificity. Depending on the features evaluated, the best accuracy of 85.91% is obtained with the SVM algorithm.

Keywords: Voice Disorder; Machine Learning.

I. INTRODUCTION

Voice is the most important means of communication among human beings. Vocal communication is a basic skill that people acquire during their lives and need in order to express their feelings in daily social interactions. Professionals such as singers, actors, auctioneers, lawyers, and teachers, who use their voices more heavily than average, are at risk of pathological voice problems. People who have voice disturbances due to voice misuse, neurological disorders, drug use or unhealthy social habits may also encounter many problems when communicating, which can lead to social and personal complications. Vocal cord vibration is affected in different ways depending on the type and location of the disease in the vocal fold. Voice pathologies, which manifest as changes in voice quality, pitch and volume, can be clinically determined by several procedures, such as acoustic analysis. Acoustic analysis consists of estimating appropriate parameters extracted from an audio signal to assess possible changes in the audio path, in accordance with the rules of the SIFEL protocol [2] (Società Italiana di Foniatria e Logopedia) prepared by the Italian Logopedic and Phoniatrics Association. The acoustic parameters are then used to evaluate voice health status. The accuracy of these parameters plays an important role in the detection of voice disturbances, and it depends on the algorithms used to estimate them. Recently, there has been interest in applying machine learning techniques to voice pathology in order to increase this accuracy. In this study, the application of feature selection techniques and machine learning algorithms to distinguish pathological from healthy voices with the best accuracy is investigated. In detail, we evaluate the pathology using different features obtained from voice signals, in addition to the patients' age and gender information. The performance of the machine learning methods used is evaluated in terms of accuracy, sensitivity and specificity.

The rest of this paper is organized as follows. Section II presents studies from the literature on the detection of voice disorders using machine learning techniques. Section III introduces the data set, features and machine learning algorithms used for classification in the experiments. Our results are given and discussed in Section IV, and conclusions are drawn in Section V.

Figure 1. Illustration of the voice pathology detection system.

II. RELATED WORK

From a data science perspective, voice pathology detection is a multi-class classification problem based on audio signals from individuals. Several machine learning classifiers are usually applied to detect voice pathology. In recent years, several approaches have been developed to improve accuracy in separating healthy and pathological voices. These approaches focus on identifying parameters for measuring voice quality and on new techniques that can detect voice disorders. A brief summary of some recent studies on this topic follows. Ghulam Muhammad et al. [4] proposed a smart healthcare framework using edge computing. With the proposed system they achieved 98.5 percent accuracy using part of the Saarbruecken Voice Database (SVD). Daria Hemmerling [5] worked on an efficient and accurate system for the automatic detection of normal voices and three different voice pathologies using the SVD database.

The decision system is based on neural networks. Laura Verde et al. [6] focused on investigating and comparing the performance of several machine learning techniques for voice pathology detection. All analyses were performed on a dataset of voices selected from the SVD database. A novel system for pathological voice detection using a Convolutional Neural Network (CNN) is presented in [7, 8]; this system uses parts of the SVD database. Kebin Wu et al. [9] proposed a novel model, JOLL4R (JOint Learning based on Label Relaxed low-Rank Ridge Regression), to fuse audio recordings for voice-based disease detection. In these studies, NN (Neural Network) and SVM (Support Vector Machine) classifiers are used to detect voice disorders by combining the American and German databases MEEI and SVD [9, 10]. Pavol Harar et al. [12] described a preliminary investigation of voice pathology detection using Deep Neural Networks (DNN).

III. METHOD

A. Database

In our study, we used the publicly available Saarbruecken Voice Database (SVD), recorded by the Phonetic Institute of the University of Saarland in Germany. SVD is composed of voice recordings of healthy and pathological individuals. There are 1354 pathological recordings (627 male and 727 female), covering 71 different diseases, and 687 healthy recordings (259 male and 428 female). The database contains the vowels /a/, /i/ and /u/ and the German sentence "Guten Morgen, wie geht es Ihnen?" (Good morning, how are you?).

In this study, we used only the /a/ vowel, taken from all available records in the SVD database, for the classification of healthy and pathological voices. Each sample of the data used in this study consists of the voice ID (a number identifying the record), age, gender and class information (healthy or pathological). The experiments are evaluated using the whole database and a balanced database composed of 685 healthy and 685 pathological recordings. The distribution of healthy and pathological samples in the balanced database by age and gender is given in Table 1.

Table 1. Balanced dataset used in the study

              Healthy           Pathological
Age       Female    Male      Female    Male
17-29     359       138       58        23
30-39     27        62        94        24
40-49     15        37        85        43
50-59     13        9         87        74
60+       14        11        104       93
Total          685                 685

B. Features

Feature extraction, in general, is an important step in improving the success of a classification algorithm, as it helps to better analyze the given data. Features commonly used in voice classification studies with machine learning techniques are also utilized in the current study. For that purpose, raw features are first obtained in terms of MFCC and statistical features such as kurtosis, skewness, etc., all calculated over the vowel /a/. Processed features are then formed from these raw features. They are as follows:

• Age: age of each subject
• Gender: gender of each subject
• Mean and standard deviation of the raw features, calculated over the epochs of each record

The class of each record, healthy or pathological, is used as the label during the experiments.

C. Classifiers

Different machine learning algorithms are used to make a comprehensive comparison of the features obtained from the audio files. These techniques are described below.

Support Vector Machine (SVM): Support Vector Machines are mainly used to linearly separate data from different classes. The aim is to determine an optimal decision plane that is as far as possible from both classes. Training a support vector machine requires solving a quadratic optimization problem. The Sequential Minimal Optimization (SMO) technique can be used for this solution [14]. Kernels are utilized to convert a non-linearly separable problem into a linearly separable form.

Logistic Model Tree (LMT): This is a classification model that combines logistic regression and decision trees [15]. Unlike ordinary decision trees, the leaves of a logistic model tree contain logistic regression models.

Decision Tree (DT): In this algorithm, the learned function is represented by a decision tree, a structure used to divide a data set into smaller clusters by applying a set of decision rules. To classify the data, J48, an implementation of the C4.5 tree classifier [16], was used.

K-Nearest Neighbor (KNN): In this algorithm, a new sample to be classified is compared to each example in the data in terms of the distance between them. The examples are sorted by distance, and the k closest ones are used for the decision. In this study, we used k = 50 and the Euclidean distance function to classify the data.

D. Feature Selection

Among the various features, some are more valuable than others for voice pathology classification. Feature selection algorithms form a subset of the best features to increase the speed and accuracy of the algorithm by eliminating irrelevant features and reducing the size of the feature set.

There are many algorithms available for feature selection. In this work, a gradient boosting machine is used for that purpose. This algorithm trains very precise classifiers quickly while choosing high-quality features [13]. The ten most valuable features obtained by the feature selection algorithm are shown in Figure 2.
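The final step of the feature selection in Section III-D, keeping only the highest-ranked features, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that one importance score per feature is already available (for example from a trained gradient boosting model), and the scores, the record values and the helper names `top_k_features` and `select_columns` are hypothetical. The paper keeps the ten best features; the toy example keeps three of five.

```python
from typing import List, Sequence


def top_k_features(importances: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest importance scores, best first."""
    if not 0 < k <= len(importances):
        raise ValueError("k must be between 1 and the number of features")
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return ranked[:k]


def select_columns(rows: Sequence[Sequence[float]],
                   keep: Sequence[int]) -> List[List[float]]:
    """Keep only the chosen feature columns of every record."""
    return [[row[i] for i in keep] for row in rows]


# Hypothetical importance scores for five features (placeholders, not from the paper).
importances = [0.05, 0.40, 0.10, 0.30, 0.15]
keep = top_k_features(importances, k=3)

# Two toy records with five features each (placeholder values).
records = [[1.0, 2.0, 3.0, 4.0, 5.0],
           [6.0, 7.0, 8.0, 9.0, 10.0]]
reduced = select_columns(records, keep)
```

Here `keep` lists the retained column indices in order of decreasing importance, and `reduced` is the feature matrix that would be passed on to the classifiers.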

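The KNN decision rule of Section III-C (sort the training examples by Euclidean distance and take a majority vote among the k closest) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the classifier used in the experiments: the function name `knn_predict` and the two-dimensional toy data are hypothetical, and k = 3 is used only because the toy set is tiny, whereas the study uses k = 50.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def knn_predict(train: Sequence[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int) -> str:
    """Label a query by majority vote among its k nearest training records."""
    # Euclidean distance from the query to every training record.
    distances = [(math.dist(features, query), label)
                 for features, label in train]
    # Sort by distance and keep the k closest neighbours.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature training set (placeholder values, not from the SVD database).
train = [
    ([0.0, 0.0], "healthy"),
    ([0.1, 0.2], "healthy"),
    ([0.2, 0.1], "healthy"),
    ([1.0, 1.0], "pathological"),
    ([0.9, 1.1], "pathological"),
    ([1.1, 0.9], "pathological"),
]

prediction = knn_predict(train, [0.95, 1.0], k=3)
```

With real data, features on different scales would usually be normalized first, since Euclidean distance is otherwise dominated by the features with the largest numeric range.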
Figure 2. Most valuable features obtained by the feature selection algorithm.

IV. RESULTS AND DISCUSSION

A. Performance Criteria

The performance of the proposed method is evaluated with K-fold cross-validation, in which K-1 folds are used as the training set while the remaining fold is used as the test set. The performance results are averaged over all folds. In our experiments, K was chosen as ten. In addition to accuracy, sensitivity and specificity are also used to evaluate the performance of the selected classification techniques.

Accuracy, the percentage of correctly classified pathological and healthy voices, is defined in Eq. 1 as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

Sensitivity, the proportion of pathological voices that are correctly detected, and specificity, the proportion of healthy voices that are correctly detected, are defined in Eq. 2 and Eq. 3, respectively.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

Here TP, TN, FP and FN are defined as follows.

• True Positive (TP): the algorithm recognizes the voice sample as pathological, and it is actually pathological.
• True Negative (TN): the algorithm recognizes the voice sample as healthy, and it is actually healthy.
• False Positive (FP): the voice sample is healthy, but the algorithm recognizes it as pathological.
• False Negative (FN): the voice sample is pathological, but the algorithm recognizes it as healthy.

Table 2. Classification results without feature selection (all features)

Dataset         Classifier   Accuracy (%)   Sensitivity (%)   Specificity (%)
All data        SVM          83.82          81.66             84.90
All data        LMT          83.52          88.00             74.80
All data        DT           81.47          86.20             72.20
All data        KNN          74.41          80.30             62.90
Balanced data   SVM          85.91          88.47             83.36
Balanced data   LMT          84.96          82.90             87.00
Balanced data   DT           78.24          78.50             78.00
Balanced data   KNN          78.75          69.90             87.60

Table 3. Classification results with feature selection (selected features)

Dataset         Classifier   Accuracy (%)   Sensitivity (%)   Specificity (%)
All data        SVM          83.57          84.70             81.40
All data        LMT          84.90          89.70             75.40
All data        DT           83.62          88.50             74.10
All data        KNN          77.50          82.70             67.20
Balanced data   SVM          85.03          81.80             88.30
Balanced data   LMT          84.30          81.30             87.30
Balanced data   DT           82.99          86.60             79.40
Balanced data   KNN          83.64          77.80             89.50

B. Classification Results

In this work, four sets of experiments are conducted using two different datasets, namely all and balanced, and two different feature sets, namely all and selected, as explained in Section III. Table 2 presents the test results of all four classifiers when all features are used, for all data as well as for balanced data. The corresponding results are presented in Table 3 when only the ten most valuable features, suggested by the gradient boosting machine, are considered. When the balanced dataset is used, the best accuracy of 85.91% in voice pathology detection is obtained with the SVM classifier using all features. This is slightly higher than the 85.77% reported in [6] for a dataset of the same size, as shown in Table 4. When the whole dataset is considered with the features selected by the gradient boosting machine, Logistic Model Tree gives the best accuracy and sensitivity, while the best specificity is obtained with SVM. On the whole dataset, feature selection increases the accuracy, sensitivity and specificity of LMT, DT and KNN, while for SVM only the sensitivity increases (Tables 2 and 3). On the other hand, when feature selection is used with the balanced dataset, the accuracy, sensitivity and specificity all increase for DT and KNN, whereas only the specificity increases for SVM and LMT.
Table 4. Comparison of results from different studies

Ref.   Method              Accuracy (%)   Database (SVD)
[6]    SVM, DT, LMT        85.77          71 pathologies; 685 healthy and 685 pathological voices
[8]    CNN                 77             6 of 71 pathologies; 482 healthy and 482 pathological voices
[9]    SVM, NN             87.86          2 of 71 pathologies; 686 healthy and 69 pathological voices
[12]   LSTM                68.08          71 pathologies; 687 healthy and 1354 pathological voices
[17]   ANN                 83.3           71 pathologies; 70 healthy and 70 pathological voices
Ours   SVM, DT, LMT, KNN   85.91          71 pathologies; 685 healthy and 685 pathological voices

V. CONCLUSION

In this study, four different machine learning algorithms are used to detect pathology in voice signals. For the classification problem, all the models are trained, validated and tested using the sustained vowel /a/ produced at normal pitch, obtained from the Saarbruecken Voice Database, which contains 71 types of pathology. All tests are performed on the entire SVD dataset as well as on 1370 samples equally distributed between the healthy and pathological classes. These samples are selected so that all age groups and all 71 disease types are included in the balanced (equally distributed) dataset. For both the balanced and the whole data set, the voice pathology identification performances of Support Vector Machine, Decision Tree, Logistic Model Tree and KNN are compared.

When we consider the results given in the literature, the proposed method shows a significant improvement compared to various state-of-the-art studies [8, 9, 12, 17]. On the other hand, it shows a slight improvement over the best results reported in the literature [6], as illustrated in Table 4.

REFERENCES

[1] Smith, J. O. and Abel, J. S., "Bark and ERB Bilinear Transforms", IEEE Trans. Speech and Audio Proc., 7(6):697-708, 1999.
[2] Lee, K.-F., Automatic Speech Recognition: The Development of the SPHINX SYSTEM, Kluwer Academic Publishers, Boston, 1989.
[3] Rudnicky, A. I., Polifroni, Thayer, E H., and Brennan, R. A. "Interactive problem solving with speech", J. Acoust. Soc. Amer., Vol. 84, 1988, p S213(A).
[4] Muhammad, Ghulam, et al. "Edge computing with cloud for voice disorder assessment and treatment." IEEE Communications Magazine 56.4 (2018): 60-65.
[5] Hemmerling, Daria. "Voice pathology distinction using autoassociative neural networks." 2017 25th European Signal Processing Conference (EUSIPCO). IEEE, 2017.
[6] Verde, Laura, Giuseppe De Pietro, and Giovanna Sannino. "Voice disorder identification by using machine learning techniques." IEEE Access 6 (2018): 16246-16255.
[7] Alhussein, Musaed, and Ghulam Muhammad. "Voice pathology detection using deep learning on mobile healthcare framework." IEEE Access 6 (2018): 41034-41041.
[8] Wu, Huiyi, et al. "A Deep Learning Method for Pathological Voice Detection Using Convolutional Deep Belief Networks." Interspeech. Vol.
[9] Wu, Kebin, et al. "Joint learning for voice based disease detection." Pattern Recognition 87 (2019): 130-139.
[10] Verde, Laura, et al. "Dysphonia Detection Index (DDI): A New Multi-Parametric Marker to Evaluate Voice Quality." IEEE Access 7 (2019): 55689-55697.
[11] Ezzine, Kadria, and Mondher Frikha. "Investigation of glottal flow parameters for voice pathology detection on SVD and MEEI databases." 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP). IEEE, 2018.
[12] Harar, Pavol, et al. "Voice pathology detection using deep learning: a preliminary study." 2017 International Conference and Workshop on Bioinspired Intelligence (IWOBI). IEEE, 2017.
[13] Xu, Zhixiang, Gao Huang, Kilian Q. Weinberger, and Alice X. Zheng. "Gradient boosted feature selection." In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 522-531. ACM, 2014.
[14] B. Schölkopf, C. J. Burges, and A. J. Smola, Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
[15] Sumner, Marc, Eibe Frank, and Mark Hall. "Speeding up logistic model tree induction." European Conference on Principles of Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg, 2005.
[16] S. L. Salzberg, "C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993," Machine Learning, vol. 16, no. 3, pp. 235–240, 1994.
[17] Teixeira, João Paulo, Paula Odete Fernandes, and Nuno Alves. "Vocal Acoustic Analysis–Classification of Dysphonic Voices with Artificial Neural Networks." Procedia Computer Science 121 (2017): 19-26.
