Effect of Noise on Generic Cough Models

Sayanton V. Dibbo1 * , Yugyeong Kim2 * , Sudip Vhaduri3

1 Dartmouth College, NH, USA, 2 Fordham University, NY, USA, and 3 Purdue University, IN, USA

* Both authors contributed equally to this research

Abstract—Respiratory diseases, such as chronic obstructive pulmonary disease (COPD) and asthma, are two major causes of death across the globe. In addition to these common inflammatory respiratory diseases, some human-transmissible respiratory diseases, such as those caused by coronaviruses, can cause a global pandemic. One major symptom of these inflammatory respiratory diseases is coughing. Identifying coughing using smartphone-microphone recordings is easily doable from a remote setup and can help physicians and researchers make an early assessment of a situation for an individual and a community. However, smartphone-microphone recordings can be affected by environmental noises, which can impact the performance of models that are developed to detect coughing from microphone recordings. Therefore, in this work, we present a detailed analysis of noise impacts on cough detection models. We develop models using voluntary coughs and other background sounds obtained from three public datasets and test the performance of those models while detecting various types of coughs, including COPD and COVID-19, obtained from three separate datasets in the presence of background noises.

Index Terms—audio analytics; cough; noise; smartphone

I. INTRODUCTION

A. Motivation

Coughing and its patterns are associated with different types of inflammatory respiratory diseases, such as COPD [1], asthma [2], tuberculosis [3], and coronavirus-caused COVID-19 or 2019-nCoV, MERS-CoV, and SARS-CoV-2 diseases [4], among several others [5]. According to the World Health Organization, about 67.7% of COVID-19 patients have a dry cough [6]. Therefore, physicians rely on coughing and its patterns as one of the major symptoms while diagnosing various respiratory diseases and their stages. However, the most common approaches to assess coughing patterns are based on various patient-reported surveys [7], [8], which inherently suffer from various limitations of self-reported surveys, including human errors and recall bias [9]–[11]. Furthermore, these self-reported cough symptoms do not correlate well with objective cough recordings. Therefore, an objective reporting of cough symptoms in different environments using the audio sensing and computing capability of smartphones can be extremely helpful not only to better assess various respiratory disease symptoms but also to foster wide-scale coverage at a low cost, both in normal times and during pandemics of human-transmissible diseases.

B. Related Work

With the advancement of mobile networks due to the emergence of the internet of things (IoT) [12] and the improvement of the sensing and computing capabilities of smartphones, we are able to accomplish various tasks, including sleep monitoring [13], [14], mental and physical health monitoring [15]–[19], user authentication [20]–[26], and place discovery [27]–[31], among many others using various smartphone services. Motivated by this, researchers have started relying on smartphone-microphone data, i.e., audio recordings, to detect various types of non-speech human sounds, including coughs [8], due to their low cost and wide-scale applicability. While some researchers have developed machine learning models to detect coughs [32], other researchers have started using deep learning to develop cough detection models [33]. While most of these works rely on server-based implementations to achieve good performance, they bring additional challenges, including offloading privacy-sensitive audio recordings from the users' end to the server. Additionally, most of the works aim to develop models to detect disease-specific coughs using a limited dataset with limited to no consideration of environmental effects, e.g., the presence of different types of noises at varying levels [34]. Moreover, models developed from one disease-specific cough may underperform when applied to a different disease-specific cough. Sometimes it is even more challenging to develop cough models for an unknown or new disease before its outbreak, such as COVID-19. Therefore, it is important to develop cough models that are applicable to users with different types of respiratory diseases in the presence of different background noises.

C. Contribution

In this work, we first present three modeling approaches (Section II-A). Next, models developed from three publicly available datasets are extensively tested on three separate datasets, including two disease-specific cough (COVID-19 and COPD) datasets (Section II-B), using 15 types of background noises at varying "signal-to-noise ratio" values to better assess the generalizability and broad applicability of regular cough-driven models (Section III). Findings from this work will provide guidelines for future research in this direction.

II. METHODS

In this section, we introduce our modeling schemes. We also introduce our datasets, pre-processing steps, feature generation and selection, and hyper-parameter optimization.

A. Modeling Approaches

In this work, we present three modeling schemes. In the unguided modeling approach, we develop unary (one-class)
models using only the target sound, e.g., cough. On the other hand, in the guided modeling approach, we develop three separate binary models considering three categories (i.e., animal, non-cough human, and non-living being) of background noises. In these three models, class-1 consists of the same cough sounds, but class-0 consists of one particular noise category, which comprises five types of noises (Section II-B). Finally, in the semi-guided modeling approach, we develop binary models for the semi-guided environments using coughs (class-1) and 15 types of sounds obtained from the three categories of sounds (class-0).

Fig. 1: Bar graphs with error bars of model performances when testing on different types of coughs (no noise augmentation)

B. Audio Data Collection

In this work, we collect various cough and non-cough sounds from six separate audio datasets, including two respiratory disease datasets. The Environmental Sound Classification (ESC-50) dataset [35] consists of five categories of sound clips recorded at 44.1 kHz. The FreeSound dataset [36] is a collaborative repository with 400k+ sounds and effects, which cover a wide range of recordings from field to synthesized sounds, recorded at 44.6 ± 4.2 kHz. The Urban Sound 8K (US-8K) dataset [37] consists of 8k+ sound clips recorded at 44.1 kHz from 10 types of urban sounds. In addition to cough clips (obtained from ESC-50), from these three datasets we create three common categories of background sounds, which we use as class-0 (i.e., non-cough class) while developing models and as background noises to modify test coughs while testing noise effects.

The animal category consists of cricket, crow, dog, frog, and rooster sounds obtained from the ESC-50 dataset.

The non-cough human category consists of breathing, laughing, snoring, and sneezing sounds obtained from the ESC-50 dataset and the throat-clearing sounds obtained from the FreeSound dataset.

The non-living being category consists of door knock, washing machine, vacuum cleaner, and engine sounds obtained from the ESC-50 dataset, and the air conditioner sounds obtained from the US-8K dataset.

The SoundSnap (SNP) dataset [38] consists of 250k+ professional sound effects. We obtain test-cough sounds recorded at 46.65 ± 11.10 kHz. The Coswara COVID-19 dataset [39] consists of cough and breathing sounds of COVID-19 positive patients and healthy participants. We obtain the COVID-19 positive test-cough sounds recorded at 47.82 ± 0.83 kHz. The chronic obstructive pulmonary disease (COPD) dataset consists of cough sounds obtained from 12 patients (avg. age 56.2 ± 0.9 years). We use the RecForge II Android app to collect these test coughs at 44.1 kHz. While we primarily use the first three datasets for model development, the last three datasets, i.e., the SNP, COVID-19, and COPD datasets, are used to test the models.

C. Pre-processing

We first change the sampling frequency of all clips to 44.1 kHz. Next, we segment the clips to collect ground truth labels, augment the data, and split them into train-test sets.

1) Data Segmentation and Labeling: In this work, we use the Audacity [40] desktop audio-processing application to segment and label cough clips into two- or three-phase cough events [7]. In summary, we obtain 106 (ESC-50), 106 (SNP), 170 (COVID-19), and 282 (COPD) cough events, and 40 non-cough events from one of the 15 types of background sounds.

2) Data Augmentation: Often audio recordings, e.g., a person's coughing patterns, can be altered due to background changes or a user's physical state or mood (tired, excited, exercising, and numerous other states). To simulate these effects, we consider three types of augmentations, i.e., 14 pitch shifts (±0.5, ±1, ±1.5, ±2, ±2.5, ±3, and ±3.5), three time stretches (0.5, 0.25, and 0.75), and four "signal-to-noise ratio" (SNR) values (0.5, 0.1, 2, and 10) for noise superposition. While pitch shift and time stretch are used to augment the cough and non-cough sounds obtained from the ESC-50, FreeSound, and US-8K datasets (during model development), noise augmentations are used to modify test coughs obtained from the ESC-50, SNP, COVID-19, and COPD datasets.

3) Train-Test Splits: As discussed before, we primarily use the 106 ESC-50 coughs and their 17 pitch-shift and time-stretch augmentations, i.e., a total of 1908 cough events (106 × (1 + 17)), to form class-1 while developing models. We first split the 106 coughs randomly 10 times using a 9:1 train-test ratio. Then, we combine each cough with its augmented versions to maintain mutual exclusion between the train and test sets. For class balancing, we also select the same number of train-test samples uniformly from the five or 15 types of background sounds (class-0 non-cough samples) to form class-0 while developing binary guided or semi-guided models, respectively.

D. Feature Engineering

We compute 40 Mel-frequency cepstral coefficients (MFCCs) as well as their 40 first and 40 second temporal derivatives. In summary, we compute a set of 120 candidate features. Using the "Select K Best" and variance-based approaches, we find that the 120 and 70 most influential features are a good compromise for binary and unary classifiers, respectively.

E. Parameter Optimization and Classifier Selection

While modeling each split, we separately perform the hyper-parameter optimization using grid search with various ranges of values. Finally, we select the combination of parameter values that achieves the highest model performance across all 10 splits as the optimal combination.
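The selection of one optimal parameter combination across all 10 splits can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `select_best_combination`, the use of the mean accuracy as the aggregate over splits, and the toy accuracy values are all hypothetical, since the text does not state how performance is aggregated across splits.

```python
from statistics import mean


def select_best_combination(split_scores):
    """Return the parameter combination with the highest mean score.

    split_scores maps a hashable parameter combination (here a tuple of
    (name, value) pairs) to a list of per-split accuracies.
    Assumption: "highest performance across all splits" is read as the
    highest mean accuracy; the paper does not specify the aggregate.
    """
    if not split_scores:
        raise ValueError("no parameter combinations were evaluated")
    return max(split_scores, key=lambda combo: mean(split_scores[combo]))


# Toy, made-up accuracies for two hypothetical random forest settings.
toy_scores = {
    (("n_estimators", 50),): [0.80, 0.82, 0.81],
    (("n_estimators", 100),): [0.85, 0.84, 0.86],
}
best = select_best_combination(toy_scores)
```

In practice the per-split accuracies would come from a grid search run separately on each of the 10 train-test splits; a different aggregate (for example, the median) could be swapped in by replacing `mean`.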
From our experimentation with a wide range of classifiers, we find that the random forest (RF) with 100 estimators works the best for the guided models trained with the non-cough human and non-living being background sounds as class-0. Similarly, gradient boosting (GB) with 100 estimators works the best for guided models with animal background sounds as class-0. For semi-guided models, random forest (RF) with 100 estimators works the best. The SVM classifier with a polynomial kernel (degree = 2 and regularization parameter = 1) works the best for unary models, i.e., unguided models. All our analysis presented in this manuscript (Section III) is based on these optimal models and their optimal parameter values.

Fig. 2: Bar graphs with error bars of model performances when testing on noise augmented coughs

Fig. 3: (a) Generic model performance (performance degradation), (b) cough distribution (t-SNE plot)
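The noise superposition used to build the noise-augmented test coughs (Section II-C2) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it treats the SNR values (0.5, 0.1, 2, and 10) as linear power ratios rather than decibels, assumes both clips share a sampling rate, and tiles or trims the noise to the cough length. The function names `mean_power` and `mix_at_snr` and the toy signals are hypothetical.

```python
import math


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(cough, noise, snr):
    """Superpose noise on a cough so that P_cough / P_scaled_noise == snr.

    Assumptions: snr is a linear power ratio (not dB), both clips share a
    sampling rate, and the noise is tiled or cut to the cough's length.
    """
    if not cough or not noise:
        raise ValueError("cough and noise must be non-empty")
    if snr <= 0:
        raise ValueError("snr must be positive")
    # Repeat the noise if it is shorter than the cough, then trim it.
    repeats = math.ceil(len(cough) / len(noise))
    noise = (list(noise) * repeats)[:len(cough)]
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise clip is silent")
    # Gain that scales the noise power to P_cough / snr.
    gain = math.sqrt(mean_power(cough) / (snr * p_noise))
    return [c + gain * n for c, n in zip(cough, noise)]


# Toy signals standing in for a cough event and a background sound.
cough = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, -0.1]
noisy = {snr: mix_at_snr(cough, noise, snr) for snr in (0.5, 0.1, 2, 10)}
```

Under this reading, a lower SNR value means a louder background relative to the cough, which matches the text's description of lowered SNR values as increased noise levels.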
III. ANALYSIS

In this section, we compare the performance of the three modeling approaches (discussed in Section II-A) while detecting coughs obtained from the three datasets, including two respiratory disease-cough datasets, in the presence or absence of background noises. We primarily use accuracy (ACC) to compare models while testing on different types of coughs.

We use “Binary GB 5 (Animal)”, “Binary RF 5 (Non-cough)”, and “Binary RF 5 (Non-living)” to refer to the best guided models developed for the three types of environments (i.e., environments with the three categories of background sounds: animal, non-cough human, and non-living being) using binary classifiers with five types of sound from one of the three categories of environments. Similarly, we use “Binary RF 15” and “Unary” to indicate the best semi-guided and unguided models, developed using a binary classifier with 15 types of background sounds and a unary classifier with no background sounds (i.e., only cough sounds are used to develop models), respectively.

We first present our analysis using cough events without noise augmentation. In Figure 1, we present a detailed analysis of different models (trained from ESC-50 coughs) when testing on different types of coughs. In general, we observe that the binary guided models trained with non-living being background sounds as class-0 (i.e., “Binary RF 5 (Non-living)” models) achieve an average accuracy of more than 0.9, which is higher than the accuracy of the other two guided models trained with animal or non-cough human sounds as class-0. These two guided models achieve accuracies lower than 0.75 when testing on COVID-19 and COPD coughs.

In general, we observe that unguided models outperform the semi-guided models, which is probably because the “unary” unguided models are biased to class-1, i.e., cough sounds, which is the only class that is used to train the models. Similarly, in general, the best semi-guided model (i.e., “Binary RF 15”) performs worse than the best guided model, but better than the unguided model (with COPD coughs as an exception).

Now, we present our analysis using noise-augmented cough events to determine the robustness of our models in the presence of noises. In Figure 2, we present the performance of different models while testing on three categories of background noises at four separate SNR values. In the figure, we observe a performance drop as noise is added at increasing noise levels, i.e., lowered SNR values. Considering all four SNR values, we find an average test accuracy of more than 0.8 for all noisy coughs, except COPD, for which the average test accuracy is 0.72. As before, we observe that guided models trained with the non-living being and non-cough human sounds perform the best and worst, respectively. At SNR value 0.1, the best model outperforms the worst model by 24%.

To better understand the performance drop of different cough models while testing on various types of coughs in the absence or presence of noises at different SNR values, we present the median test accuracy drop (%) in Figure 3a in an aggregated form. In this analysis, we consider the median
