
Publications of the Astronomical Society of Australia (2022), 1–9

doi:10.1017/um5r.2022.32

REVIEW

Overview of machine learning and deep learning models for acoustic signal-based COVID-19 identification
Y. Siyah, Y. Lakrad, and M. EL Omari
Computer and Telecommunication Research Laboratory, Master IT, Faculty of Science, Rabat, Morocco
Author for correspondence: Y. Siyah, Email: youssef.siyah@um5r.ac.ma.

(Received ; revised ; accepted ; first published online )

Abstract
COVID-19 can be pre-screened on the basis of symptoms and confirmed by laboratory tests such as reverse transcription polymerase chain reaction (RT-PCR). The RT-PCR test is expensive and time consuming, and administering it breaks social distancing rules. The rapid antigen test (RAT) is another testing method that alleviates the time limitation of RT-PCR but has a high false-negative rate (Section 3.1), i.e. low sensitivity. Alternative methods for detecting COVID-19 are therefore needed. The techniques covered in this overview are machine learning (Section 4.1) and deep learning (Section 4.2) techniques based on the acoustic signal. They can detect the virus quickly and give everyone the opportunity to test at home, without going to a laboratory, by recording audio clips of their coughing and breathing sounds and uploading the data anonymously. In this overview, we have collected the most recent methods used in previous work. After a comparison between these methods (Section 2), the LSTM, 8-CNN and CR19 models were found to give an accuracy of 98%, the highest reported among the models covered in this overview.
Keywords: COVID-19, Machine learning, Deep learning, acoustic signal

1. INTRODUCTION

The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The novel virus was first identified from an outbreak in Wuhan, China, in December 2019. Attempts to contain it there failed, allowing the virus to spread to other areas of China and later worldwide. The World Health Organization (WHO) declared the outbreak a public health emergency of international concern on 30 January 2020 and a pandemic on 11 March 2020. As of 22 August 2022, the pandemic had caused more than 596 million cases and 6.45 million confirmed deaths, making it one of the deadliest in history [4]. Figure 1 shows the evolution of COVID-19 cases and deaths up to August 2022. The WHO reports fever, dry cough, loss of taste and smell, and fatigue as the most common symptoms of COVID-19; the symptoms of a severe COVID-19 condition are mainly shortness of breath, loss of appetite, confusion, persistent pain or pressure in the chest, and a temperature above 38 degrees Celsius. Monitoring the development of the pandemic and screening the population for symptoms is mandatory. Arguably, the most used procedures are temperature measurement (e.g., before boarding a plane) and diverse corona rapid tests (e.g., before being allowed to visit a care home). In the clinical test for diagnosing COVID-19 infection, an anterior nasal swab sample is collected, as suggested by Hanson et al. [18].

To date, reverse transcription polymerase chain reaction (RT-PCR) is considered the gold standard for coronavirus testing [34]. However, the RT-PCR test requires person-to-person contact to administer, needs variable time to produce results, and is still unaffordable to most global populations. Sometimes, it is unpleasant for children. Not least, this test is not yet accessible to people living in remote areas, where medical facilities are scarce [3]. Alarmingly, physicians suspect that members of the general public refuse the COVID-19 test for fear of stigma [1]. Governments worldwide have initiated free mass testing campaigns to stop the spread of the virus, and these campaigns are costing them billions of dollars per day at an average rate of $23 per test [32]. Hence, easily accessible, quick, and affordable testing is essential to limit the spread of the virus. COVID-19 detection using human audio signals can play an important role here.

Recently, AI has been extensively implemented in the digital health sector and specifically in the COVID-19 health crisis, owing to the variety of information it provides, such as COVID-19 growth-rate detection, risk and infection-severity identification, and death prediction. AI is a broad umbrella that covers many subdivisions, including machine learning (ML) and deep learning (DL), which both imitate the functionality of the human brain and its behaviors based on the data that is fed to cluster tasks [19]. AI has numerous applications in both speech-signal processing and digital-image processing. Both facilitate the process of controlling, monitoring, and overcoming the COVID-19 epidemic through their four-step procedure: detection, prevention, recovery, and response [30]. Scientists believe that it is possible to determine the presence of a COVID-19 infection by analyzing the sounds generated by the respiratory system, whether cough, breathing, or regular speech. Furthermore, medical imaging data such as chest X-ray and lung computed tomography (CT) scans can be beneficial in COVID-19 detection [7].
In this overview, we present several machine learning and deep learning techniques that work on sounds of the respiratory system (coughing, breathing, speech) and evaluate these techniques to select the best performing model against COVID-19. The document is organized as follows. Section 2 presents the data sets collected. It also details the classification methods with their performance. In Section 3, the experiments and results are detailed in a table. Section 4 defines machine learning and deep learning and then presents the steps for developing a machine learning model. Finally, the conclusion of the reported work and the future directions of the proposed approach are presented in Section 5.

Figure 1. COVID-19 cases reported weekly by WHO Region, and global deaths, as of 21 August 2022 [25].

2. MATERIALS AND METHODS

Model 1: In 2022, Rumana Islam, Esam Abdel-Raheem and Mohamed Tarique [21] provided a method that uses deep learning to automatically classify and detect COVID-19 from cough sounds. This model used a database of COVID-19 cough sound samples called Virufy. Virufy is a volunteer-run organization that has built a global database to identify COVID-19 patients using AI. The database contains both clinical and crowdsourced (participatory) data. The clinical data is accurate, as it was collected and processed in a hospital according to a standard operating procedure (SOP). Subjects were confirmed as healthy individuals (i.e. COVID-19 negative) or COVID-19 patients (i.e. COVID-19 positive) using the RT-PCR test, and the data was labeled accordingly.

Model 2: In 2021, Yunus Emre Erdogan and Ali Narin [16] provided a method based on machine learning algorithms to automatically classify and detect COVID-19 from cough sounds. The first step of this model is the extraction of traditional features from the acoustic cough data using empirical mode decomposition (EMD) and the discrete wavelet transform (DWT). The features are then selected by the ReliefF algorithm and classified by a support vector machine (SVM). This method used a database of COVID-19(+) and COVID-19(-) cough acoustic samples obtained from the open access site https://virufy.org/. The data was provided by a mobile application developed by Stanford University. The data belong to a total of 1187 people. All data was labeled as positive or negative according to the results obtained from the RT-PCR test. The database contains 595 people tagged COVID-19(+) and 592 people tagged COVID-19(-). Noise was removed from all data. In addition, all data were normalized before the study.

Model 3: In 2022, Madhurananda Pahar, Marisa Klopper, Robin Warren and Thomas Niesler [27] provided a method to detect COVID-19 in cough, breath and speech using deep transfer learning and bottleneck features. The first step of this model is the extraction of features from the samples: the mel-frequency cepstral coefficients (MFCC) and linearly spaced log-filterbank energies, together with their velocity and acceleration coefficients, as well as the zero-crossing rate (ZCR) [34] and kurtosis [34]. Feature selection is then performed using CNN, LSTM and ResNet50 architectures to improve the performance of COVID-19 classification based on cough, breath and speech audio. Finally, the shallow classifiers are LR, SVM, KNN and MLP. To pre-train the neural architectures, this method used a database containing 10.29 h of annotated audio recordings with four class labels (cough, speech, sneeze, noise). The composition of this dataset is summarized as 11202 cough sounds (2.45 h of audio), 2.91 h of speech from male and female participants, 1013 sneeze sounds (13.34 min of audio) and 2.98 h of other unvoiced sounds.

Model 4: In 2022, Ezz El Din Hemdan, Walid El Shafai and Amged Sayed [20] provided a method called CR19 to detect COVID-19 in cough audio signals using machine learning algorithms for automated medical diagnostic applications. CR19 consists of a hybrid GA-ML with six different machine learning algorithms, namely Linear Regression, Naïve Bayes, K Nearest Neighbors, Support Vector Machine, Logistic Regression, and Decision Tree. The first stage of this model is data acquisition and collection, in which the audio data is collected from mobile phone sources. The second stage is data preprocessing, which loads the collected data and adapts it for further processing within the proposed framework, so that each audio signal in the data set can be marked as a positive or negative COVID-19 case. The third stage is training, and the final stage is the classification and detection of COVID-19 in cough data.

Model 5: In 2022, Mahmoud Aly, Kamel H. Rahouma and Safwat M. Ramzy [8] provided a COVID-19 diagnostic method using machine learning and crowdsourced breathing and voice recordings. The network architecture of this deep learning model is very simple and contains three layers: the first layer contains 16 nodes with the ReLU activation function, the second layer reduces overfitting with a rate of 0.5 (presumably a dropout layer), and the third layer contains a single node with the sigmoid activation function. This model used the Coswara dataset, in which each user recorded 9 different types of sounds, such as coughing, breathing, and speaking, tagged with COVID-19 status. The database contains 9 separate datasets. Each dataset contains samples of a single type of sound (deep breath, shallow breath, heavy cough, shallow cough, fast count, normal count, vowel-A, vowel-E, vowel-O) labeled with COVID-19 status (positive or negative). The total number of samples in each dataset was 1299, of which 126 were positive.
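As an illustration of the three-layer network described for Model 5, the forward pass can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in [8]: the function names (`forward`, `init_params`), the random initial weights, and the reading of the second layer as a dropout layer with rate 0.5 are all assumptions made here for illustration.

```python
import math
import random


def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_params(n_features, n_hidden=16, seed=0):
    """Hypothetical initialisation: small random weights, zero biases."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def forward(features, w_hidden, b_hidden, w_out, b_out,
            drop_rate=0.5, training=False, rng=None):
    """Dense(16, ReLU) -> dropout(0.5, assumed) -> Dense(1, sigmoid)."""
    # Layer 1: fully connected layer with ReLU activation.
    hidden = [relu(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Layer 2: dropout, active only during training (inverted scaling).
    if training:
        rng = rng or random.Random()
        keep = 1.0 - drop_rate
        hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    # Layer 3: single output node giving a probability of COVID-19 positive.
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(z)
```

With all-zero input features and zero biases the hidden activations are zero, so the output is sigmoid(0) = 0.5. A trained model would instead use weights learned from the labeled recordings.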
In each of these Coswara datasets, the remaining 1173 samples were negative.

Model 6: In 2022, I. Anee Daisy and R. Lavanya [14] provided a deep learning-based method for non-vocal human sound classification and COVID-19 cough detection. This model uses a convolutional neural network (CNN) to extract the characteristics of each audio signal. The acoustic characteristics are selected in the time and frequency domains, including the short-term energy, the loudness, the zero-crossing rate (ZCR), the power spectral density (PSD), and the spectral entropy. This model used the Coswara dataset, in which each user recorded 9 different types of sounds, such as coughing, breathing, and speaking, tagged with COVID-19 status. The COVID-19 identification in this study is based on the ResNet50 variant, with a database that contains different types of audio sounds such as cough sounds, snoring, sneezing and COPD.

Model 7: In 2022, Mohammed Usman, Vinit Kumar Gunjan, Mohd Wajid, Mohammed Zubair and Kazy Noor-e-alam Siddiquee [35] provided a method using speech as a biomarker for COVID-19 detection based on machine learning. This model runs on the Microsoft Azure Machine Learning Studio (MAMLS) cloud platform to perform binary classification, and classification performance is analyzed and compared for five state-of-the-art classification algorithms available in MAMLS. This model uses short-time Fourier transform (STFT) coefficients as the characteristics of each audio recording. The algorithms used on this platform are NN, SVM, LoR, BDT and DF. The speech recordings used in this study include two categories: speech from healthy individuals with no known pre-existing medical condition at resting heart rate, and speech from asymptomatic COVID-19 positive individuals. The COVID-19 identification in this study is based on the ResNet50 variant, with a database containing different types of audio sounds such as cough sounds, snoring, sneezing and COPD.

Model 8: In 2021, Ahmed Fakhry, Xinyi Jiang, Jaclyn Xiao, Gunvant Chaudhari, Asriel Han, Amil Khanzada et al. [17] of Virufy (a non-profit research organization developing artificial intelligence (AI) technology to screen for COVID-19 from cough) provided a deep learning-based method for classification of non-vocal human sounds and detection of COVID-19 cough. This model uses multi-branch deep learning (multimodal data-based failure prognosis model). This model used the Coughvid dataset, "A database of breath, cough, and voice sounds for COVID-19 diagnosis". This study provided experimental results reaching an AUC of 91%.

Model 9: In 2021, Mahmoud Aly, Kamel H. Rahouma, Safwat M. Ramzy et al. [8] provided a machine learning-based method for classification of non-vocal human sounds and detection of COVID-19 cough. This model uses ResNet50 (ResNet-50 is a convolutional neural network that is 50 layers deep; ResNet, short for Residual Network, is a classic neural network used as a backbone for many computer vision tasks). This study used the Coswara dataset, in which each user recorded 9 different types of sounds, such as coughing, breathing, and speaking, labeled with COVID-19 status. A combination of models trained on different sounds can diagnose COVID-19 more accurately than a single model trained on cough or breath only.

Model 10: In 2021, Harry Coppock, Alex Gaskell, Panagiotis Tzirakis, Alice Baird, Lyn Jones, Björn Schuller et al. [13] provided a machine learning-based method for classification of non-vocal human sounds and detection of COVID-19 cough. This model uses a ResNet (a type of artificial neural network; artificial neural networks, usually simply called neural networks or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains) with Mel-spectrogram features. This model used a database of people who are positive, "A database of breath, cough, and voice sounds for COVID-19 diagnosis".

Model 11: In 2021, Pahar [26] presented a machine learning-based COVID-19 cough classifier that can discriminate COVID-19 positive coughs from negative and healthy coughs recorded on a smartphone. The datasets used contain both forced and natural coughs. The Coswara dataset, which is publicly available, contains 92 COVID-19 positive subjects and 1079 healthy subjects. The method uses a CNN (convolutional neural networks are powerful artificial intelligence (AI) systems that use deep learning to perform both generative and descriptive tasks) based on residual connections, ResNet50, which is the classifier that gives the best performance.

Model 12: In 2022, Kawther A. Al Dhlan [6] provided a method for detecting COVID-19 using a deep learning approach. The system proposed in this study consists of two steps, preprocessing and classification. In the preprocessing stage, a least mean squares filter removes artifacts or noise from the input speech signal. After preprocessing, a Generative Adversarial Network (GAN) classifier analyzes the filtered signal to classify COVID-19 and non-COVID-19 signals. The database used in this model contains healthy and unhealthy sound samples, including COVID-19 identification.

Model 13: In 2022, Ali Bou Nassif, Ismaïl Shahin, Mohamed Bader, Abdelfatah Hassan and Naoufel Werghi [24] provided a COVID-19 detection system using deep learning algorithms based on speech and image data. In this study, the features extracted for the optimal representation of the speech signal are the Mel-frequency cepstral coefficients (MFCC) [2], and the model used is the long short-term memory (LSTM) model. LSTM is an advanced variant of the RNN. It stores information for an extended period of time, so past data is easier to retrieve from memory. The database used in this model consists of 1159 cough, breath and speech sound samples obtained from 592 participants, divided into 379 healthy patients and 213 COVID-19 infected patients. All samples in the dataset were captured using a moving microphone.

Model 14: In 2022, Rahman, Tawsifur and Ibtehaz, Nabil
and Khandakar and others [29] use a novel machine learning approach to detect COVID-19 patients (symptomatic and asymptomatic). A research group at the University of Cambridge shared a dataset of cough and breath sound samples from 582 healthy patients and 141 COVID-19 patients. The collected dataset includes data from 245 healthy individuals and 78 asymptomatic and 18 symptomatic COVID-19 patients. The CNN stacking model is based on a meta-learning logistic regression classification using spectrograms (in which the energy distribution of the sound is plotted against time and frequency) generated from the user's data, using the combined data set (Cambridge and collected).

Model 15: In 2020, Kotra Venkata Sai Ritwik, Shareef Babu Kalluri and Deepu Vijayasenan [31] attempted to study the presence of cues regarding COVID-19 disease in voice data. They use an approach that is similar to speaker recognition. The dataset consists of audio clips extracted from YouTube videos of television interviews (often a video call) of COVID-19 positive patients. The speech portion of the audio was extracted at a sampling rate of 44.1 kHz. The audio samples were converted to a single-channel (mono) wav format and passed through a 300 Hz to 3.4 kHz bandpass filter. This is done to simulate telephone quality speech. The audio is finally downsampled to 8 kHz.

The authors collected speech data from nineteen speakers, including ten COVID-19-positive speakers and nine COVID-19-negative speakers. The composition of male and female speakers in each class is presented in Table 1 of [31], with the number of utterances per class and gender. In this work, they also use the short-term mel spectrum as low-level features. They follow an approach similar to the supervectors that were originally used in speaker recognition to extract features at the utterance level. An SVM trained with these features is used to predict COVID-19 from the speech data.

Model 16: In 2021, Rohan Kumar Das, Maulik Madhavi and Haizhou Li [15] focused on new acoustic front-ends for the detection of COVID-19, since the data for the challenge is very limited. They consider auditory acoustic cues based on the long-term transform, the gammatone filter bank and the equivalent rectangular bandwidth spectrum to capture the discriminative signal characteristics for COVID-19 detection. The database released for the DiCOVA challenge is derived from the Coswara corpus, which is collected by a crowdsourcing platform from COVID-19 positive and negative individuals. Each participant provided 9 audio recordings, including soft and loud cough, soft and deep breathing, sustained phonation of vowels, and counting the digits 1-20 at a fast and a normal rate, using the web application. The collected data are then divided into two tracks for the DiCOVA Challenge. Track 1 includes sound recordings of the individuals' coughs, while Track 2 consists of deep breathing, vowels, and normal speed number counts. The authors focus only on the Track 1 database of the challenge in this work. They find that their three auditory acoustic features perform better than the baseline MFCC features with the LR and RF classifiers. Among them, erbSpec-RF obtains the best AUC.

Model 17: In 2022, Ananya Muguli, Lancelot Pinto, Sharma et al. [23] provided a method for detecting COVID-19 using a machine learning approach. The dataset used is derived from the Coswara dataset, a set of sound recordings of COVID-19 positive and non-COVID-19 individuals. The volunteering subjects are advised to record their respiratory sounds in a quiet environment, and each subject provides 9 audio recordings. The development data sets consisted of 1040 subjects (965 non-COVID) and 990 subjects (930 non-COVID), respectively. In this study they used three classifiers: logistic regression (LR), multi-layer perceptron (MLP) and random forest (RF). Finally, this study aims to draw attention to the importance of human voice versus other breath sounds in sound-based diagnosis of COVID-19.

In all the above works, the working steps are the same, and the only difference is the algorithm that the researchers select for the same purpose, which is to achieve an accurate and reliable system to detect the presence of the coronavirus in the subjects to be tested. Figure 2 represents the different stages of audio processing used to diagnose COVID-19.

Figure 2. Framework for classification of COVID-19 status on speech [20].

3. RESULTS

3.1 Performance metrics

To evaluate the results of a given system, either in machine learning or in deep learning, the evaluation metrics of this system are calculated (accuracy, recall, F1-score, specificity and precision), as shown in Table 1. They are based on the confusion matrix (TP, TN, FP and FN), whose entries are the numbers of true positives, true negatives, false positives and false negatives [22], respectively. Here, TP is the number of correctly classified positive samples.
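The metrics named in this section are simple ratios of the four confusion-matrix counts. The following Python sketch shows the computation, using the same ratios as listed in Table 1. The function name `classification_metrics` and the convention of returning 0.0 for an empty denominator are assumptions made here for illustration, not part of any cited work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the Table 1 metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }
```

For example, `classification_metrics(tp=90, tn=80, fp=20, fn=10)` gives an accuracy of 0.85, a recall of 0.90 and a specificity of 0.80. A screening test with many false negatives shows up here as a low recall, which is why recall matters most when a missed COVID-19 case is costly.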
The remaining confusion-matrix counts are:

• FN: the number of positive samples that were misclassified.
• FP: the number of negative samples that were misclassified.
• TN: the number of negative samples that were correctly classified.

Based on this matrix, we derive the set of performance measures shown in the following table:

Table 1. The metric equations.

Number  Name         Equation
1       Accuracy     (TN + TP) / (TN + TP + FN + FP)
2       Recall       TP / (TP + FN)
3       Specificity  TN / (TN + FP)
4       Precision    TP / (TP + FP)
5       F1-score     2 x (Precision x Recall) / (Precision + Recall)

3.2 Performance Analysis

Model 1: The performance of the proposed system [21] in terms of accuracy, precision, F1 score and negative predictive value (NPV) is 0.912, 0.829, 0.889 and 0.873, respectively, for the time domain feature vector; 1.000, 0.975, 0.974 and 0.952 for the frequency domain feature vector; and 0.941, 0.938, 0.937 and 0.934 for the mixed domain feature vector. The comparison between these vectors shows that the proposed system achieves better performance with the frequency domain feature vector on the cough sound samples, and is thus more accurate and reliable with it than with the other vectors.

Model 2: In [16], the highest performance values obtained for the linear SVM are a precision of 98.4%, a recall of 99.5%, a specificity of 97.3%, an accuracy of 97.4% and an F1 score of 98.6%, with the EMD and DWT features (traditional features) and ReliefF feature selection. The highest performance values obtained from ResNet50 deep features selected with ReliefF for the CNN are a precision of 97.8%, a recall of 98.5%, a specificity of 97.3%, an accuracy of 97.4%, and an F1 score of 98.0%. From this short comparison, it can be stated that features obtained by traditional feature approaches show higher performance than deep features.

Model 3: In [27], the effectiveness of transfer learning and bottleneck feature extraction is evaluated using CNN, LSTM and ResNet50 architectures to improve the performance of COVID-19 classification based on cough, breath and speech audio. The experimental results provided by the system proposed in this study reach an area under the ROC curve (AUC) of 0.982 for cough, followed by an AUC of 0.942 for breath and 0.923 for speech.

Model 4: In [20], a method called CR19 is provided to detect COVID-19 in cough audio signals using machine learning algorithms for automated medical diagnostic applications. The study provides experimental results demonstrating that GA-ML improves accuracy for all classifiers compared with non-GA-based techniques. Among the six classifiers, GA-KNN was more than 97% accurate in diagnosing COVID-19 from cough audio signals.

Model 5: In [8], the experimental results for the combination of models (shallow breath, heavy cough, shallow cough, fast count, vowel-E, vowel-O) reached an AUC of 96.4%, and the AUC for the combination of models (heavy cough, shallow cough, vowel-E) was 92%. Ultimately, this study demonstrates the importance of using different breath, cough, and speech sounds for accurate and more reliable diagnosis and screening of COVID-19.

Model 6: In [14], the model successfully detected COVID-19 subjects with a maximum sensitivity of 94.21%, a specificity of 94.96% and an area under the receiver operating characteristic curve (AUROC) of 0.90.

Model 7: The experimental results obtained in [35] using the five classification algorithms showed an average level of performance, with the evaluated metrics being around 70%, but the best performance was observed for the determination algorithm trend, whose highest value, for recall, is 0.7892. A higher recall value means fewer samples wrongly classified as "false negative". The cost of misclassifying a COVID-19 positive sample as COVID-19 negative is high, and therefore recall is the evaluation measure that should be maximized while adjusting model parameters.

Model 8: In [17], the model adopts multi-branch deep learning (multimodal data-based fault prediction model), and the experimental results provided in this study reach an AUC of 91%. Finally, this study aims to draw attention to the importance of human voice and other breath sounds for auditory diagnosis of COVID-19.

Model 9: In [8], the results show that by averaging the predictions of several models trained and evaluated separately on different types of sounds, a simple binary classifier achieves an AUC of 96.4% and an accuracy of 96%. Finally, this study aims to draw attention to the importance of the human voice over other breath sounds in the sound-based diagnosis of COVID-19.

Model 10: In [13], COVID-19 is diagnosed using end-to-end deep learning from a crowdsourced audio sample dataset, with a ResNet (an artificial neural network) and Mel-spectrogram features. This model used a database of people who are positive, "A database of breath, cough, and voice sounds for COVID-19 diagnosis". The results show that the use of simple binary classifiers achieves an AUC of 96.4%.

Model 11: In [26], the method uses a CNN (convolutional neural network) based on the residual architecture ResNet50 (a convolutional neural network that is 50 layers deep), which is the best performing classifier with an AUC of 94% and an accuracy of 95.33%.

Model 12: The proposed system [6] reduces the validation loss and increases the validation accuracy, which enables the model to learn with a low root mean square error. The experimental results obtained using the proposed GAN method, in terms of accuracy, recall, accuracy, and F-measure, are 96.54%, 96.15%, 98.56%, and 0.96, respectively. It was also observed that these results are good compared with the ANN, CNN and RNN
models [6].

Model 13: For the proposed system [24], three types of experiments were conducted, using speech-based, image-based, and speech-and-image-based models. Long short-term memory (LSTM) was used for the classification of the patient's cough, voice and breathing, achieving a precision greater than 98%. Additionally, the CNN models VGG16, VGG19, DenseNet201, ResNet50, InceptionV3, InceptionResNetV2, and Xception were calibrated for classification of chest X-ray images. The VGG16 model outperforms all other CNN models, achieving 85.25% precision without fine tuning and 89.64% after fine tuning. Additionally, the speech-and-image-based model was evaluated using the same seven models, achieving a precision of 82.22% with the InceptionResNetV2 model. From this short comparison of the results, we conclude that the best model is the LSTM, which is based only on speech.

Model 14: In [29], a novel machine learning method is used to detect both symptomatic and asymptomatic COVID-19 cases with the help of a CNN. The accuracy, sensitivity, and specificity were 96.5%, 96.42%, and 95.47% for symptomatic cases and 98.85%, 97.01%, and 99.6% for asymptomatic cases, respectively.

Model 15: In [31], the short-term mel spectrum is used as low-level features, and an approach similar to the supervectors originally used in speaker recognition extracts features at the utterance level. An SVM trained with these features is used to predict COVID-19 from the speech data, with an accuracy of 88.6% and an F1-score of 92.7%.

Model 16: In [15], the three auditory acoustic features performed better than the baseline MFCC features with the LR and RF classifiers. Among them, erbSpec-RF achieves the best AUC of 73.4% on the validation set, showing its strong potential for detecting COVID-19 from cough sounds.

Model 17: In [23], detecting COVID-19 cases with the LR, MLP and RF classifiers showed that RF performed better than the others. Its best performance was for breathing, with an AUC of 76.85%.

In addition to these results, Table 2 below contains a small comparison regarding: database used, models and classifiers used

This data will be used to feed the machine learning model so that it learns how to solve the problem for which it is designed. The data can be labeled, in order to indicate to the model the characteristics that it will have to identify. It can also be unlabeled, in which case the model will have to spot and extract the recurring features by itself.

The second step is to select an algorithm to run on the training dataset. The type of algorithm to use depends on the type and volume of training data and the type of problem to be solved.

The third step is training the algorithm. This is an iterative process. Variables are run through the algorithm, and the results are compared with those it should have produced. The weights and bias can then be adjusted to increase the accuracy of the result. The variables are then run again until the algorithm produces the correct result most of the time. The algorithm, thus trained, is the machine learning model.

The fourth and final step is to use and improve the model. The model is used on new data, the origin of which depends on the problem to be solved. For example, a machine learning model designed to detect spam will be used on emails.

Machine learning algorithms are divided into two categories, supervised and unsupervised algorithms [9]. Supervised machine learning is an elementary but strict technology. Operators present the computer with sample inputs and the desired outputs, and the computer searches for solutions that produce those outputs from those inputs. The goal is for the computer to learn the general rule that maps inputs to outputs. The main supervised machine learning algorithms are random forests, decision trees, the k-nearest neighbors (K-NN) algorithm, linear regression, the Naïve Bayes algorithm, the support vector machine (SVM), logistic regression and gradient boosting. In unsupervised machine learning, the algorithm itself determines the structure of the input (no labels are given to the algorithm). This approach can be a goal in itself (it makes it possible to discover structures buried in the data) or a means to achieve a certain goal. This approach is also called "feature learning". The main unsupervised machine learning algorithms are K-Means, clustering/hierarchical clustering and dimensionality reduction.

4.2 Deep learning

Deep Learning [10] is a branch of Machine Learning, but it is the most commonly used today. It is an invention of Geoffrey
and the performance of each method . Hinton, dated 1986. Simply put, Deep Learning is an improved
version of Machine Learning. Deep learning uses a technique
that gives it a superior ability to detect even the most subtle
4. Artificial Intelligence
patterns. This technique is called deep neural network. This
4.1 Machine Learning depth corresponds to the large number of layers of computing
Machine Learning or automatic learning [10] is a scientific nodes that make up these networks and work in collaboration
field, and more specifically a subcategory of artificial intelli- to process data and deliver predictions. These neural networks
gence. It consists of letting algorithms discover “patterns”, are directly inspired by the functioning of the human brain.
namely recurring patterns, in data sets. This data can be num- Computing nodes are comparable to neurons, and the network
bers, words, images, statistics... The development of a Machine itself is similar to the brain.
Learning model involves four main steps represented in the
figure3.
The first step is to select and prepare a set of training data.
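The iterative training step described in Section 4.1 (run the data through the algorithm, compare the results with the expected outputs, adjust the weights and bias, repeat) can be illustrated with a minimal sketch. It assumes logistic regression trained by batch gradient descent, which is only one of the supervised algorithms listed above and is not the method of any reviewed paper. The toy data set, the function names and the hyperparameters are hypothetical and chosen for illustration only.

```python
import math

# Hypothetical toy data: two numeric features per sample, label 1 or 0.
# In a real system these would be acoustic features extracted from recordings.
X_TOY = [(1.0, 2.0), (2.0, 1.5), (1.5, 1.0), (2.5, 2.0),
         (-1.0, -2.0), (-2.0, -1.5), (-1.5, -1.0), (-2.5, -2.0)]
Y_TOY = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that sample x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def predict(weights, bias, x):
    """Hard decision at the usual 0.5 threshold."""
    return 1 if predict_proba(weights, bias, x) >= 0.5 else 0


def log_loss(weights, bias, X, y):
    """Mean cross-entropy between predictions and expected labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, target in zip(X, y):
        p = min(max(predict_proba(weights, bias, x), eps), 1.0 - eps)
        total -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return total / len(X)


def train(X, y, lr=0.1, epochs=200):
    """Repeat: run the data, compare with targets, adjust weights and bias."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    history = [log_loss(weights, bias, X, y)]
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        history.append(log_loss(weights, bias, X, y))
    return weights, bias, history


if __name__ == "__main__":
    w, b, hist = train(X_TOY, Y_TOY)
    print("loss before/after training:", hist[0], hist[-1])
    print("predictions:", [predict(w, b, x) for x in X_TOY])
```

The fourth step, using the model, then amounts to calling `predict` on samples that were not part of the training data.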

Figure 3. Steps in developing models using AI algorithms[2].

Table 2. Comparison with many previous studies for sound-based COVID-19 diagnosis

Research | Dataset | Sound type | Models / Classifiers | Performance
Fakhry et al. (Virufy) [17] | Coughvid | Cough | Multi-Branch Deep Learning | AUC=91%
Pahar et al. [26] | Coswara | Cough | ResNet50 | AUC=98%
Coppock et al. [13] | Covid-19-sounds | Cough/Breathing | ResNet | AUC=84.6%
N. Sharma (2020) [33] | Healthy/COVID-19(+):941 | Cough, Breathing, Vowel, Counting (1-20) | Random forest classifier | ACC=76.74%
C. Brown et al. (2021) [11] | COVID-19(+):141/(+):298 | Cough/Breathing | CNN using spectrogram | ACC=80%
Pahar et al. (2021) [26] | Coswara | Cough | Residue-based CNN | ACC=95.33%
Pahar (2021) [28] | Coswara | Breaths | ResNet50 | AUC=92%
Erdoğan [16] | Coswara | Cough | ResNet50 CNN | ACC=96.83%
N. Sharma [33] | Healthy/COVID(+)=941 | Cough/Breathing | Spectral contrast, MFCC / RF | ACC=66.76%
Proposed system [33] | COVID(+):50 COVID(-):50 | Cough | MFCC, chroma vector / DNN | ACC=97.5%
Pahar (2022) [27] | Coswara | Coughing, Talking, Sneezing, Noise | CNN, LSTM / ResNet50 | AUC=94.2%
Hemdan (2022) [20] | Coswara | Breath sounds/Coughing/Speech | CR19 | ACC=97%
Usman (2022) [35] | COVID-19(+)/COVID-19(-) | Speech | STFT | ACC=78.92%
Al-Dhlan (2022) [6] | Healthy, unhealthy | Sound | GAN | ACC=96.54%
Nassif (2022) [24] | Coswara | Coughing, Breathing, Speaking | LSTM | ACC=98%
Das (2021) [15] | Coswara | Cough, Breath, Counting from 1-20 | erbSpec/RF | AUC=81%
Das (2021) [15] | Coswara | Cough, Breath, Counting from 1-20 | GTCC/MLP | AUC=78.61%
Chaudhari (2020) [12] | Coswara, Coughvid | Cough | CNN, LSTM, CRNN | AUC=77.1%
Ritwik (2020) [31] | COVID(+)=10, COVID(-)=19 | Cough | SVM | ACC=88.6%
Akman (2022) [5] | CCS, CSS | Cough, Speech | CIdeR | ACC=78%
Muguli (2021) [23] | Coswara | Cough, Breath | RF | AUC=76.85%
Rahman (2022) [29] | Cambridge | Coughing, Breathing | 8-CNN | ACC=98%
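Table 2 reports some systems by accuracy (ACC) and others by area under the ROC curve (AUC), and Section 3 also quotes sensitivity and specificity. As a reminder of how these figures differ, the following minimal sketch computes all four from a list of true labels and classifier scores. The function names and the example labels and scores are hypothetical, and the AUC is computed with the pairwise ranking definition (the fraction of positive/negative pairs ranked correctly, ties counted as one half) rather than with any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of all samples that are classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true, y_pred):
    """Share of positive samples that are detected (true positive rate)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Share of negative samples that are rejected (true negative rate)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = COVID-19 positive) and classifier scores.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
    decisions = [1 if s >= 0.5 else 0 for s in scores]
    print("ACC:", accuracy(labels, decisions))
    print("Sensitivity:", sensitivity(labels, decisions))
    print("Specificity:", specificity(labels, decisions))
    print("AUC:", roc_auc(labels, scores))
```

Accuracy, sensitivity and specificity depend on the decision threshold (0.5 in this sketch), whereas AUC summarizes the ranking over all thresholds, so ACC and AUC values in the table are not directly comparable.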

5. Discussion
One of the most discussed topics in recent years is undoubtedly COVID-19. The most important step in ending this outbreak is correct detection. For this reason, many studies use machine learning and deep learning algorithms with audio cues to diagnose coronavirus, and most of their methods are detailed in Section 2.
From this study of the different methods used in this area, we have seen that every proposed method and every system implemented by researchers in this field performs well, as shown in Table 2, with the best methods being [24], [20] and [29], but only on a limited and unpublicized database (Coswara). Returning to the subject of this work, researchers are looking for a solution that detects this epidemic at the lowest cost compared to the tests already used in clinics, for example RT-PCR, while remaining reliable and accurate. Machine learning and deep learning algorithms offer many advantages in the medical field for detecting diseases, and in particular COVID-19. The problem with the current RT-PCR test is that it takes a long time to return a result, and during this period a person can spread the disease to those around them, so the risk is high. The proposed system, in contrast, responds immediately. As for cost, the PCR test costs 700 dirhams per test, whereas for the proposed system it is enough to download the application. On the other hand, the main disadvantage of these methods is that the performance obtained for a particular method varies with the type of data used and with the type of features extracted from the data. In practice, therefore, the system works well only if the user's condition is comparable to the data used to develop the method. Otherwise, the system will not detect the disease correctly and the results will not be accurate.
For the proposed methods to be more reliable and accurate, it is necessary to use a larger database that is more representative of the population. In addition, future work will try to develop these models so that they are not limited to detecting the presence of the virus but can also indicate whether the positive person needs to be hospitalised. We will also use these techniques to detect other respiratory diseases that are similar to COVID-19 (asthma, bronchitis, lung cancer, etc.).

6. Conclusion
The COVID-19 outbreak has had a significant effect on the well-being of people around the world, with a sharp increase in the number of casualties. Deep learning and machine learning techniques have provided considerable help since the beginning of the global epidemic. In this overview, several approaches to COVID-19 classification based on coughing and breathing sounds have been presented, together with the performance of each method. The results obtained are compared in Table 2 in order to identify the best technique for COVID-19 classification. According to this comparison, the LSTM, 8-CNN and CR19 models are the most efficient, with accuracies of up to 98%, and the ResNet50 model also performs well, with an Area Under the ROC Curve (AUC) of 98%.

References
[1] 2021, More than the virus, fear of stigma is stopping people from getting tested: Doctors, The New Indian Express, available at https://www.newindianexpress.com/states/karnataka/2020/aug/06/more-than-virus-fear-of-stigma-is-stopping-people-from-getting-tested-doctors-2179656.html accessed on November 22
[2] 2021, Steps in developing models using AI algorithms, available at https://www.researchgate.net/figure/Steps-in-developing-models-using-AI-algorithms_fig1_351583350 accessed on May
[3] 2021, World Bank and WHO, Half of the world lacks access to essential health services, 100 million still pushed into extreme poverty because of health expenses, available at https://www.who.int/news/item/13-12-2017-world-bank-and-who-half-the-world-lacks-access-to-essential-health-services-100-million-still-pushed-into-extreme-poverty-because-of-health-expenses accessed on November 23
[4] 2022, Coronavirus Disease (COVID-19) Dashboard, World Health Organization, available at https://covid19.who.int/ accessed on August 22
[5] Akman, A., Coppock, H., Gaskell, A., et al. 2022, Frontiers in Digital Health, 4
[6] Al-Dhlan, K. A. 2022, International Journal of Speech Technology, 25, 641
[7] Alafif, T., Tehame, A. M., Bajaba, S., Barnawi, A., & Zia, S. 2021, International Journal of Environmental Research and Public Health, 18, 1117
[8] Aly, M., Rahouma, K. H., & Ramzy, S. M. 2022, Alexandria Engineering Journal, 61, 3487
[9] Ayodele, T. O. 2010, New Advances in Machine Learning, 3, 19
[10] Bastien, L. 2020, Machine Learning: Définition, fonctionnement, utilisations
[11] Brown, C., Chauhan, J., Grammenos, A., et al. 2020, arXiv preprint arXiv:2006.05919
[12] Chaudhari, G., Jiang, X., Fakhry, A., et al. 2020, arXiv preprint arXiv:2011.13320
[13] Coppock, H., Gaskell, A., Tzirakis, P., et al. 2021, BMJ Innovations, 7
[14] Daisy, M. I. A., & Lavanya, R. ????
[15] Das, R. K., Madhavi, M., & Li, H. 2021, in 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, 4276–4280
[16] Erdoğan, Y. E., & Narin, A. 2021, Computers in Biology and Medicine, 136, 104765
[17] Fakhry, A., Jiang, X., Xiao, J., et al. 2021, arXiv preprint arXiv:2103.01806
[18] Hanson, K. E., Caliendo, A. M., Arias, C. A., et al. 2020, Clinical Infectious Diseases
[19] Hassan, A., Shahin, I., & Alsabek, M. B. 2020, in 2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI), IEEE, 1–5
[20] Hemdan, E. E.-D., El-Shafai, W., & Sayed, A. 2022, Journal of Ambient Intelligence and Humanized Computing, 1
[21] Islam, R., Abdel-Raheem, E., & Tarique, M. 2022, Biomedical Engineering Advances, 3, 100025
[22] Madeeh, O. D., & Abdullah, H. S. 2021, IOP Publishing, 012008
[23] Muguli, A., Pinto, L., Sharma, N., et al. 2021, arXiv preprint arXiv:2103.09148
[24] Nassif, A. B., Shahin, I., Bader, M., Hassan, A., & Werghi, N. 2022, Mathematics, 10, 564
[25] World Health Organization, et al. 2022
[26] Pahar, M., Klopper, M., Warren, R., & Niesler, T. 2021, Computers in Biology and Medicine, 135, 104572
[27] Pahar, M., Klopper, M., Warren, R., & Niesler, T. 2022, Computers in Biology and Medicine, 141, 105153
[28] Pahar, M., & Niesler, T. 2021

[29] Rahman, T., Ibtehaz, N., Khandakar, A., et al. 2022, Diagnostics, 12, 920
[30] Rajkarnikar, L., Shrestha, S., & Shrestha, S. 2021, Int. J. Adv. Eng, 4, 337
[31] Ritwik, K. V. S., Kalluri, S. B., & Vijayasenan, D. 2020, arXiv preprint arXiv:2011.04299
[32] Kliff, S. 2021, Most Coronavirus Tests Cost About $100. Why Did One Cost $2,315? The New York Times, available at https://www.nytimes.com/2020/06/16/upshot/coronavirus-test-cost-varies-widely.html accessed on November 23
[33] Sharma, N., Krishnan, P., Kumar, R., et al. 2020, arXiv preprint arXiv:2005.10548
[34] Udugama, B., Kadhiresan, P., Kozlowski, H. N., et al. 2020, ACS Nano, 14, 3822
[35] Usman, M., Gunjan, V. K., Wajid, M., Zubair, M., et al. 2022, Computational Intelligence and Neuroscience, 2022
