Attention-Based CRNN Models For Identification of Respiratory Diseases From Lung Sounds

IEEE - 56998
2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT) | 979-8-3503-3509-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCCNT56998.2023.10306490

Sanjana J
Dept. of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
sanjanajagadish13@gmail.com

Poornanand Purushottam Naik
Dept. of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
poornanandnaik24@gmail.com

Mahesh A. Padukudru
Dept. of Respiratory Medicine, J.S.S. Medical College, JSSAHER, Mysore, India
mahesh1971in@yahoo.com

Shashidhar G. Koolagudi
Dept. of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
koolagudi@nitk.edu.in

Jeny Rajan
Dept. of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
jenyrajan@nitk.edu.in

14th ICCCNT IEEE Conference
July 6-8, 2023
IIT - Delhi, Delhi, India

Abstract—Respiratory diseases are a major global health concern, with millions of people suffering from disorders such as asthma, bronchitis, chronic obstructive pulmonary disease (COPD), and pneumonia. In recent years, machine learning and other forms of Artificial Intelligence have proven to be useful tools for solving problems in the medical field. In this study, we examine the diagnostic utility of Convolutional Recurrent Neural Network (CRNN) models for identifying respiratory diseases from digitally recorded lung sounds. We developed two deep learning models to diagnose and classify lung diseases: a binary classification model that distinguishes COPD from non-COPD, and a multi-class classification model that classifies five lung disorders (COPD, upper respiratory tract infection (URTI), pneumonia, bronchiectasis, and bronchiolitis) and healthy conditions. The ICBHI 2017 challenge dataset [1] was used to develop the models. The binary and multi-class classification models achieved accuracies of 98.6% and 97.6%, respectively, with ICBHI Scores of 0.9866 and 0.9723.

Index Terms—Convolutional Recurrent Neural Network, Attention mechanism, Long Short-Term Memory, Gated Recurrent Units, Respiratory Disorders.

I. INTRODUCTION
The healthcare sector is one of the most crucial industries, and people hold high standards for diagnosis, treatment, and service quality. Recent years have seen a rise in the prevalence of lung disorders, making them the third leading cause of death across the world [2]. Every year, respiratory problems claim the lives of almost 7 million individuals worldwide, and these disorders are particularly prevalent in low- and middle-income nations. The WHO is attempting to lessen the impact of these diseases worldwide by improving diagnosis, prevention, and treatment. Effective treatment and management of many such disorders depend on their early discovery and precise diagnosis. One promising approach is to use machine learning algorithms to classify respiratory diseases based on lung sounds.
Pulmonary auscultation is a time-honoured tradition for determining respiratory health. Clinical use of this technique has been deemed economical, non-invasive, and risk-free for monitoring the state of the respiratory system [3]. A pulmonologist can diagnose pulmonary conditions such as asthma, pneumonia, and bronchiectasis (BRON) by listening to the sounds the lungs make during inhalation and exhalation [4]. Adventitious breathing sounds from the lungs, such as crackles and/or wheezes, provide useful information about various pulmonary disorders. To automatically identify and characterise adventitious lung sounds, computational lung sound analysis (CLSA) systems have been developed that employ digital audio recorders, signal processing techniques, and machine learning methodologies.
In this paper, we present a novel approach to respiratory disease classification using the ICBHI (International Conference on Biomedical and Health Informatics) dataset and an attention-based CRNN (Convolutional Recurrent Neural Network) model. The ICBHI dataset is a publicly available collection of lung sound recordings from patients with different respiratory conditions, making it an ideal resource for training and evaluating machine learning models. To improve the robustness and generalisation of our model, we use preprocessing techniques such as data augmentation and normalization. Data augmentation techniques, such as time and pitch shifting, increase the variability and diversity of the dataset, thereby reducing overfitting and improving the model's performance on unseen data. Normalization techniques, such as z-score normalization, ensure that the input features have similar distributions, making it easier for the model to learn meaningful patterns. We have leveraged the attention mechanism to capture both temporal and spectral features of lung sounds and to focus on the most informative parts of the input to increase the accuracy
with which the model can classify data. Our experiments indicate that the proposed model outperforms several present-day techniques in terms of various performance evaluation metrics.
The paper is structured as follows: Section II presents an overview of the literature, Section III describes the materials and methodology, Section IV presents the results and discussion, and Section V concludes the study.

II. LITERATURE REVIEW
The deep-learning architecture created by Diego Perna et al. [5] for diagnosing aberrant lung sounds (wheezes and crackles) and chronic and non-chronic diseases was based on a combination of MFCCs and an advanced Recurrent Neural Network (RNN) model. For pathology-driven classification, the model achieved an ICBHI score of 0.91 for two-class classification (healthy or unhealthy) and 0.90 for three-class classification (chronic disease, non-chronic disease, or healthy).
Victor Basu et al. [6] developed a Gated Recurrent Unit (GRU)-based model for classifying a variety of respiratory diseases such as COPD, bronchiectasis, pneumonia, URTI (upper respiratory tract infection), and bronchiolitis. The authors used the ICBHI dataset to develop the model. Data augmentation techniques were implemented to overcome the data imbalance caused by the large number of COPD samples. Forty MFCC features were extracted from each audio file for training the model. The model attained a classification accuracy of 95.67%, a precision of 95.69%, a sensitivity of 95.65%, and an F1-score of 95.66%.
Neeraj Baghel et al. [7] presented a deep learning-based CNN model to classify various forms of pulmonary disorders using lung sounds. The dataset contained lung sounds collected from 532 patients belonging to four major categories: Crepitation, Normal, Rhonchi, and Wheezing. The model used a Gaussian filter to remove noise and extract clean lung sounds, which were then used for computation. In experiments, the proposed model achieved an accuracy of 94.24%.
M. Fraiwan et al. [8] combined the ICBHI dataset with a private dataset of stethoscopic lung sounds from 103 patients. Various preprocessing techniques, such as wavelet smoothing, displacement artifact removal, and z-score normalization, were implemented to reduce signal noise. Their deep neural network model had two stages: CNN layers and bidirectional LSTM (BDLSTM) units. The CNN+BDLSTM model attained a maximum average accuracy of 99.62% and a precision of 98.85% in classifying various pulmonary diseases.
T. Nguyen et al. [9] proposed a robust framework for classifying adventitious lung sounds and identifying respiratory disorders from stethoscopic lung recordings (auscultation). The framework used pre-trained models from various ResNet architectures and explored various fine-tuning techniques to improve performance. Additionally, spectrum correction and inverting data augmentation were incorporated into the proposed model to increase its robustness. The proposed model was evaluated on the ICBHI dataset and on a database of lung sounds recorded from multiple channels, giving F1-scores of 84.71% and 87.59%, respectively.
To detect COPD from breathing sounds, Arpan Srivastava et al. [10] suggested a straightforward CNN-based deep learning assistance model that uses fewer computational resources. To evaluate the model on the ICBHI 2017 dataset, they used features from the Librosa audio-analysis library to compare various spectral features, such as MFCC, Chroma, Mel-Spectrogram, Chroma (Constant-Q), and Chroma CENS. Compared with the other spectral features, MFCC was found to be the most effective in detecting COPD. Metrics such as AUC and ICBHI Score were used to assess the models' performance; the AUC was 0.89, and the ICBHI Score was 0.92.
Conor Wall et al. [11] described an audio-based COVID-19 diagnostic and classification system built on a model ensemble of four deep neural networks. The baseline models and the ensemble model were trained and tested using the respiratory ICBHI dataset and the Coswara breathing, cough, and speech datasets. They also included an attention mechanism, giving the models the ability to attend to the information most relevant to the task at hand.
To address the data imbalance of the ICBHI dataset, Jane Saldanha et al. [12] developed a variety of variational autoencoders (VAEs), such as convolutional, conditional, and multilayer perceptron VAEs, for data augmentation. Evaluation metrics such as cross-correlation, Fréchet Audio Distance (FAD), and Mel Cepstral Distortion were used to evaluate the synthesising ability of the variational autoencoders. Principal Component Analysis (PCA) was used to compare the features of the synthetic and original audio. Various neural network models were trained on the ICBHI dataset together with the samples generated by the VAE models, and the ResNet-50 model achieved the highest ICBHI score of 0.85.

III. MATERIALS AND METHODS
A. Dataset Description
The ICBHI 2017 challenge dataset [1] is a collection of respiratory sound recordings from patients with respiratory diseases and from healthy individuals. The dataset was used in the 2017 ICBHI challenge for automatic pulmonary disease classification based on respiratory sounds. It consists of 920 recordings in WAV format, with sampling rates of up to 44.1 kHz and a bit depth of 16 bits. Each recording lasts between 10 and 90 seconds. The recordings fall into eight classes: seven respiratory conditions (asthma, bronchiectasis, bronchiolitis, COPD, pneumonia, URTI, and LRTI) and healthy. Each recording is labelled with the respiratory condition associated with it. The dataset also includes metadata about the patients, such as age, sex, and smoking status. The ICBHI 2017 challenge dataset has been widely used in research on respiratory disease diagnosis and classification, as well as in the creation and testing of ML
systems for the automatic identification of different respiratory disorders. The number of audio samples belonging to each respiratory disease is given in Table I.

TABLE I: ICBHI Dataset Description.
CLASS | NUMBER OF SAMPLES
COPD | 793
Healthy | 35
Bronchiectasis | 16
Bronchiolitis | 13
URTI | 23
Pneumonia | 37
Asthma | 1
LRTI | 2

Waveform visualisations of the various types of respiratory disorders are shown in Figure 1.
Fig. 1: Waveform Visualisations of the different respiratory disorders

B. Data Preprocessing
The dataset contained several anomalies and unstructured information. Using the Python module Librosa, we standardised the recordings by trimming and padding the audio samples to a duration of 20 seconds [13]. A class-wise data imbalance was observed in the dataset: it included a single instance of asthma and two of LRTI. To improve the efficiency of the model, we therefore removed the samples belonging to those two categories. COPD samples still far outnumbered those of the other five classes, so the dataset remained imbalanced. As a result, data augmentation was performed on the other categories of the dataset. The following five audio augmentation techniques were applied to the non-COPD samples.
1) Time Shifting: Time shifting involves altering the start and end times of an audio clip, effectively shifting it forward or backward in time. This technique is useful because it can help simulate the variability in timing that naturally occurs in speech and other audio signals.
2) Scaling: Scaling is an audio augmentation technique used to alter the loudness or volume of an audio signal. It involves multiplying the original signal by a constant value without changing its shape or frequency content. This technique can increase the diversity of the training data, as it can help mimic changes in the loudness or intensity of the lung sounds.
3) Pitch Shifting: Pitch shifting for respiratory audio alters the pitch of recorded breathing sounds, which can highlight specific features such as wheezes or crackles in the lungs. Frequency-based and time-based pitch shifting can be used to isolate and enhance the desired features, making the sounds easier to analyse when diagnosing respiratory conditions. This technique can be used to replicate changes in the age of the patient or in the pitch of the lung sounds.
4) Loudness Augmentation: Loudness augmentation for respiratory audio refers to increasing the volume or amplitude of respiratory sounds to improve their clarity and make them easier to analyse. This process can be useful in medical settings, where doctors and researchers need to listen to respiratory sounds to help identify and track the progression of respiratory diseases such as asthma, pneumonia, and COPD.
5) Speed Augmentation: Speed augmentation for respiratory audio refers to increasing or decreasing the speed or tempo of the audio recording to adjust its duration without changing its pitch. This technique can be useful in medical settings to analyse respiratory sounds, as it can help highlight specific features or patterns in the audio that may be difficult to discern at normal playback speeds.
No augmentation was performed on the COPD samples. The number of audio samples in each category after augmentation for binary classification is shown in Table II.

TABLE II: Augmentation for Binary Classification
Class | Before Augmentation | After Augmentation
COPD | 793 | 793
NON-COPD | 127 | 635

The number of audio samples in each category after augmentation for multi-class classification is shown in Table III.

C. Feature Extraction
We used a total of five features: mel-frequency cepstral coefficients (MFCCs), the mel-spectrogram, a chromagram derived from the waveform or power spectrogram, spectral contrast, and spectral centroid.

TABLE III: Augmentation for Multi-Class Classification
Class | Before Augmentation | After Augmentation
COPD | 793 | 793
Healthy | 35 | 280
Bronchiectasis | 16 | 128
Bronchiolitis | 13 | 156
URTI | 23 | 184
Pneumonia | 37 | 286

Mel-frequency cepstral coefficients (MFCCs) [14] are a powerful feature extraction technique used in audio processing. To compute them, a Fast Fourier Transform (FFT) converts the audio signal to the frequency domain, and a mel filterbank is applied to mimic the frequency resolution of the human auditory system. The logarithms of the resulting filterbank energies are then transformed using a Discrete Cosine Transform (DCT) to produce a set of coefficients referred to as MFCCs.
Mel spectrograms [15] provide a visualization of the frequency content of the respiratory audio signals over time. By converting the frequency axis to the mel scale, mel spectrograms can highlight the distinctive features of respiratory sounds, which can be used for tasks such as detecting respiratory diseases or monitoring respiratory patterns.
A chromagram [16] is a representation of the spectral content of an audio signal. It is similar to a spectrogram, but instead of displaying energy across individual frequency bins over time, it folds the spectrum into pitch classes (typically 12, one per semitone) and displays their energy over time.
Spectral contrast [17] is a method for characterising the spectral content of an audio signal, that is, the distribution of energy across different frequency bands. It divides the frequency spectrum of the signal into multiple sub-bands, typically on a logarithmic (octave-based) frequency scale, and measures the difference between spectral peaks and valleys within each sub-band.
Spectral centroid measures the weighted average of the frequencies present in the audio signal, with each frequency weighted by its magnitude. It captures information about the overall spectral shape of the signal and has been used in tasks such as respiratory sound classification, snore detection, and wheeze detection.
For each feature, we take the mean across the columns of its matrix, which correspond to frames (time steps). Taking the mean along the columns (axis=1) is a common approach to dimensionality reduction, as it retains a summary of each coefficient while greatly reducing the size of the matrix. The rationale is that the mean value of each coefficient across all frames is a good approximation of its overall contribution to the audio signal.

D. Proposed Methodology
The proposed methodology comprises three steps: pre-processing, feature extraction, and model training. The workflow of the proposed system is illustrated in Figure 2. During the pre-processing stage, trimming and padding were done to ensure that all signals had a consistent length of 20 seconds. Further, z-score normalisation (1) was applied so that the signal amplitudes have zero mean and unit standard deviation.
$$z = \frac{x - \mu}{\sigma} \tag{1}$$
where x is a data point, µ is the data's mean, σ is its standard deviation, and z is the data point's z-score. Various data augmentation techniques were then applied to overcome the class-wise data imbalance and reduce overfitting.
Fig. 2: Block diagram of the proposed system
Five different spectral features were used to capture different aspects of the frequency content. This allowed the model to learn important frequency patterns and variations in respiratory sounds that may be associated with different respiratory conditions. Furthermore, spectral features provide a more compact and informative representation of respiratory audio signals than raw time-domain signals, reducing the amount of data required for training, improving the efficiency of the training process, and reducing the risk of overfitting. By concatenating multiple spectral features, the resulting feature vector captured a more comprehensive representation of the frequency content of the signal. This enabled the CRNN model to learn more discriminative features, which helped it classify respiratory diseases better. It also helped reduce the impact of noise and variations in the data, because different spectral features are robust to different types of noise or variations.

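The layer stack of Table IV can be written as a compact Keras model. The following is a hedged sketch, not the authors' implementation, and it rests on several assumptions:

- The concatenated feature vector is fed in as a one-channel 1-D sequence.
- The self-attention step (L8) uses Keras's dot-product `layers.Attention`, with the recurrent outputs serving as both query and value.
- A `GlobalAveragePooling1D` layer, which Table IV does not list, collapses the attended sequence before the dense head.
- The ReLU on the 64-unit layer, the single sigmoid unit for the binary case, and the loss functions are guesses.

The `# L1` to `# L10` comments refer to the rows of Table IV.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def recurrent(kind: str, units: int = 128) -> layers.Layer:
    """One of the four recurrent variants compared in the paper (GRU, LSTM, BiGRU, BiLSTM)."""
    base = layers.LSTM if kind.endswith("LSTM") else layers.GRU
    rnn = base(units, return_sequences=True)  # keep the time axis so attention can use it
    return layers.Bidirectional(rnn) if kind.startswith("Bi") else rnn


def build_crnn(n_features: int, num_classes: int, kind: str = "BiLSTM") -> tf.keras.Model:
    inputs = layers.Input(shape=(n_features, 1))                     # feature vector as a 1-D sequence
    x = layers.Conv1D(32, kernel_size=2, activation="relu")(inputs)  # L1
    x = layers.MaxPooling1D(pool_size=2)(x)                          # L2
    x = layers.Conv1D(32, kernel_size=2, activation="relu")(x)       # L3
    x = layers.MaxPooling1D(pool_size=2)(x)                          # L4
    x = layers.Dropout(0.2)(x)                                       # L5
    x = recurrent(kind)(x)                                           # L6
    x = recurrent(kind)(x)                                           # L7
    x = layers.Attention()([x, x])  # L8: self-attention, query = value = recurrent outputs
    x = layers.GlobalAveragePooling1D()(x)  # assumption: collapse steps before the dense head
    x = layers.Dense(64, activation="relu")(x)                       # L9
    if num_classes == 2:  # L10, binary (COPD vs NON-COPD): one sigmoid unit
        outputs = layers.Dense(1, activation="sigmoid")(x)
        loss = "binary_crossentropy"
    else:                 # L10, multi-class: softmax over the classes
        outputs = layers.Dense(num_classes, activation="softmax")(x)
        loss = "sparse_categorical_crossentropy"
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])  # Adam, as in the paper
    return model
```

For instance, `build_crnn(n_features, 6, "BiGRU")` corresponds to the best multi-class configuration and `build_crnn(n_features, 2, "BiLSTM")` to the best binary one, where `n_features` is the length of the concatenated feature vector. The early stopping and model checkpointing mentioned in the paper map onto the Keras callbacks `tf.keras.callbacks.EarlyStopping` and `tf.keras.callbacks.ModelCheckpoint`, passed to `model.fit`.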
Previous studies have used simple RNN or CNN models for respiratory disease classification from audio signals. However, these models are limited in their ability to capture long-term dependencies in the input signal and to selectively attend to relevant features.
In this study, different attention-based Convolutional Recurrent Neural Network (CRNN) models were analysed to identify the model that performs best for the given classification problem. A CRNN model that combines a CNN with an RNN was developed to capture both local and temporal features in the input data. Spatial characteristics were extracted from the input sequence using the CNN layers, while temporal relationships were captured by the RNN layers; the output of the CNN layers served as the input to the RNN layers. The architecture of the proposed CRNN model is described in Table IV.

TABLE IV: CRNN Network Architecture
Layer | Layer Description
L1 | Conv1D (32, kernel size=2, ReLU)
L2 | MaxPooling1D (pool size=2)
L3 | Conv1D (32, kernel size=2, ReLU)
L4 | MaxPooling1D (pool size=2)
L5 | Dropout - 0.2
L6 | CRNN - 128 (GRU/LSTM/BiGRU/BiLSTM)
L7 | CRNN - 128 (GRU/LSTM/BiGRU/BiLSTM)
L8 | Attention Mechanism
L9 | Dense - 64
L10 | FC Dense (num of classes, sigmoid/softmax)

We developed LSTM and GRU models to effectively capture long-term dependencies in sequential data. Respiratory audio classification involves analysing audio signals that vary in length and complexity, making it a challenging task for traditional machine learning models. However, LSTM and GRU models have been shown to be effective in capturing the subtle variations in respiratory audio signals that are critical for classification. The key advantage of LSTM and GRU models over traditional RNN models is their ability to mitigate the vanishing gradient problem, a common problem when training RNNs on long sequences of data. LSTM and GRU units address this issue with gating mechanisms that regulate the flow of information within the network and keep gradients from vanishing.
To further enhance the performance of the LSTM and GRU models, we incorporated an attention mechanism that allows the model to focus on the most pertinent portions of the input signal. The attention mechanism is a state-of-the-art component of RNN architectures that has been the subject of much prior exploration [18]. The basic idea behind attention mechanisms is to assign a weight to each element of the input sequence, indicating its importance for the task at hand. We used the self-attention mechanism, in which each element of the input sequence is compared with all other elements and a weight is assigned based on the similarity of each pair of elements.
We used 80% of the ICBHI data for training and 20% for testing, and the Adam optimiser was used to train the model. Additionally, Early Stopping and Model Checkpointing were implemented to prevent the model from overfitting and to save the best-performing model during training. Finally, two classification models were developed: a binary classification model to differentiate between COPD and NON-COPD, developed to benefit from the large volume of COPD samples, and a multi-class classification model to distinguish between five respiratory diseases (COPD, pneumonia, bronchiectasis, bronchiolitis, and URTI) and healthy conditions.

IV. RESULTS AND DISCUSSION
Our attention-based CRNN model leverages both temporal and spectral features of lung sounds, thus capturing the complex patterns and variations in respiratory sounds that are indicative of different diseases. Additionally, the attention mechanism enabled the model to concentrate on the most informative parts of the given input, further improving its classification performance. Model performance was assessed using a variety of evaluation metrics, including accuracy (on test data), sensitivity (Se), specificity (Sp), F1 Score, and ICBHI Score. The ICBHI score is computed as shown in Equation 2.
$$\text{ICBHI score} = \frac{\text{Sensitivity} + \text{Specificity}}{2} \tag{2}$$

A. Binary Classification Model
The results obtained with the different attention-based CRNN models for the binary classification problem, across the performance evaluation metrics considered, are given in Table V.

TABLE V: Performance of Binary Classification Model
Attention Model | Accuracy | F1 Score | ICBHI Score
CRNN - GRU | 0.97202 | 0.97 | 0.97320
CRNN - LSTM | 0.97902 | 0.98 | 0.98027
CRNN - BiGRU | 0.97203 | 0.97 | 0.97172
CRNN - BiLSTM | 0.98601 | 0.99 | 0.98660

The attention-based CRNN model with Bidirectional LSTM units captures both past and future contexts of the input sequence, thus providing the highest accuracy of 98.601% and a macro-average F1-score of 0.99. The classification report of the model is given in Table VI.

TABLE VI: Report on Binary Classification
Class | Precision | Recall | F1-score | Support
COPD | 0.99 | 0.98 | 0.99 | 158
NON-COPD | 0.98 | 0.99 | 0.98 | 128
macro avg | 0.99 | 0.99 | 0.99 | 286
weighted avg | 0.99 | 0.99 | 0.99 | 286

B. Multi Class Classification Model
The results obtained using different attention-based CRNN models implemented for the multi class classification problem
across the different performance evaluation metrics considered are given in Table VII.

TABLE VII: Performance of Multi-Class Classification Model
Attention Model | Accuracy | F1 Score | ICBHI Score
CRNN - GRU | 0.96078 | 0.94 | 0.96539
CRNN - LSTM | 0.96358 | 0.94 | 0.96154
CRNN - BiGRU | 0.96918 | 0.96 | 0.97238
CRNN - BiLSTM | 0.95518 | 0.93 | 0.95326

The attention-based CRNN model with Bidirectional GRU units can selectively attend to certain parts of the input sequence while processing it in both the forward and backward directions, thus achieving the highest accuracy of 96.92% and a macro-average F1-score of 0.96. The classification report of the model is given in Table VIII.

TABLE VIII: Report on Multi-Class Classification
Class | Precision | Recall | F1-score | Support
COPD | 0.99 | 0.98 | 0.98 | 159
Healthy | 0.96 | 0.98 | 0.97 | 48
URTI | 0.94 | 0.91 | 0.93 | 35
Bronchiectasis | 0.93 | 0.97 | 0.95 | 29
Pneumonia | 0.96 | 0.99 | 0.97 | 69
Bronchiolitis | 1.00 | 0.88 | 0.94 | 17
macro avg | 0.96 | 0.95 | 0.96 | 357
weighted avg | 0.97 | 0.97 | 0.97 | 357

C. Performance comparison to earlier research publications
Table IX contrasts the results obtained by our model with those of recent research publications that have used machine learning techniques to categorise lung sounds from the ICBHI 2017 database in order to diagnose respiratory illnesses.

TABLE IX: Comparison of performance to existing literature
Author | Dataset | Methods and Results
María Teresa García-Ordás et al. (2020) [19] | ICBHI Dataset; 6 classes (normal, bronchiolitis, bronchiectasis, pneumonia, URTI, and COPD) | Mel-Spectrogram + CNN; Accuracy - 99%, F1-Score - 90%
V. Basu et al. (2020) [6] | ICBHI Dataset; 6 classes (normal, bronchiolitis, bronchiectasis, pneumonia, URTI, and COPD) | MFCC Features + RNN; Accuracy - 95.67%, F1-Score - 95.66%
Luay Fraiwan et al. (2021) [8] | KAUH Dataset + ICBHI Dataset; 6 classes (normal, BRON disorders, pneumonia, asthma, and heart failure) | Entropy features + Decision Trees; Accuracy - 98.27%, F1-Score - 93.61%
Arpan Srivastava et al. (2021) [10] | ICBHI Dataset; 2 classes (COPD and NON-COPD) | MFCCs + CNN; ICBHI Score - 0.93, AUC - 0.89
Jane Saldanha et al. (2022) [12] | ICBHI Dataset + VAE-generated synthetic samples; 7 classes (normal, bronchiolitis, bronchiectasis, pneumonia, URTI, LRTI, and COPD) | Variational Auto Encoders (VAE) + MFCCs + CNN; ICBHI Score - 0.85
Our Study | ICBHI Dataset; 2 classes (COPD and NON-COPD); 6 classes (normal, URTI, bronchiectasis, COPD, pneumonia, and bronchiolitis) | Five Spectral Features + Attention-based CRNN; 2 classes: Accuracy - 98.6%, ICBHI Score - 0.9866; 6 classes: Accuracy - 97%, ICBHI Score - 0.9723

V. CONCLUSION
In this research, we have explored different attention-based CRNN models for respiratory disease classification using audio inputs. Based on a thorough evaluation on the ICBHI Challenge data, the empirical findings of our study show that our attention-based CRNN framework outperforms several existing approaches in both binary and multi-class respiratory disease classification tasks. For the binary classification task, the attention-based CRNN model with BiLSTM units achieved the highest test accuracy of 98.6% and an ICBHI score of 0.9866. For the multi-class classification task, the attention-based CRNN model with BiGRU units attained the maximum test accuracy of 97% and an ICBHI Score of 0.9723.
In future investigations, it would be helpful to extend the research to a broader range of medical audio recordings, possibly evaluating other medical conditions in addition to respiratory ailments. The system's capabilities could be expanded to help medical professionals identify numerous other illnesses, such as assessing the likelihood of a heart attack or cardiac arrest from heartbeat sounds, or diagnosing asthma from lung sounds. Moreover, the existing system could be extended to determine disease severity. It would also be beneficial to deploy the suggested model on smartphones with limited computational resources [20] to evaluate how well it performs in practical situations.

REFERENCES
[1] B. M. Rocha, D. Filos, L. Mendes, G. Serbes, S. Ulukaya, Y. P. Kahya, N. Jakovljevic, T. L. Turukalo, I. M. Vogiatzis, E. Perantoni, et al., “An open access database for the evaluation of respiratory sound classification algorithms,” Physiological Measurement, vol. 40, no. 3, p. 035001, 2019.
[2] S. Lehrer, Understanding Lung Sounds. Steven Lehrer, 2018.
[3] D. Bardou, K. Zhang, and S. M. Ahmad, “Lung sounds classification using convolutional neural networks,” Artificial Intelligence in Medicine, vol. 88, pp. 58–69, 2018.
[4] E. Andrès, R. Gass, A. Charloux, C. Brandt, and A. Hentzler, “Respiratory sound analysis in the era of evidence-based medicine and the world of medicine 2.0,” Journal of Medicine and Life, vol. 11, no. 2, p. 89, 2018.
[5] D. Perna and A. Tagarelli, “Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks,” in 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), pp. 50–55, IEEE, 2019.
[6] V. Basu and S. Rana, “Respiratory diseases recognition through respiratory sound with the help of deep neural network,” in 2020 4th International Conference on Computational Intelligence and Networks (CINE), pp. 1–6, IEEE, 2020.
[7] N. Baghel, V. Nangia, and M. K. Dutta, “ALSD-Net: Automatic lung sounds diagnosis network from pulmonary signals,” Neural Computing and Applications, vol. 33, pp. 17103–17118, 2021.
[8] M. Fraiwan, L. Fraiwan, M. Alkhodari, and O. Hassanin, “Recognition of pulmonary diseases from lung sounds using convolutional neural networks and long short-term memory,” Journal of Ambient Intelligence and Humanized Computing, pp. 1–13, 2021.
[9] T. Nguyen and F. Pernkopf, “Lung sound classification using co-tuning and stochastic normalization,” IEEE Transactions on Biomedical Engineering, vol. 69, no. 9, pp. 2872–2882, 2022.
[10] A. Srivastava, S. Jain, R. Miranda, S. Patil, S. Pandya, and K. Kotecha, “Deep learning based respiratory sound analysis for detection of chronic obstructive pulmonary disease,” PeerJ Computer Science, vol. 7, p. e369, 2021.
[11] C. Wall, L. Zhang, Y. Yu, A. Kumar, and R. Gao, “A deep ensemble neural network with attention mechanisms for lung abnormality classification using audio inputs,” Sensors, vol. 22, no. 15, p. 5566, 2022.
[12] J. Saldanha, S. Chakraborty, S. Patil, K. Kotecha, S. Kumar, and A. Nayyar, “Data augmentation using variational autoencoders for improvement of respiratory disease classification,” PLOS ONE, vol. 17, no. 8, p. e0266467, 2022.
[13] Librosa, “Feature extraction—librosa 0.8.0 documentation,” 2020. Accessed: January 2023.
[14] M. A. Hossan, S. Memon, and M. A. Gregory, “A novel approach for MFCC feature extraction,” in 2010 4th International Conference on Signal Processing and Communication Systems, pp. 1–5, IEEE, 2010.
[15] B. Thornton, “Audio recognition using mel spectrograms and convolution neural networks,” 2019.
[16] M. Kattel, A. Nepal, A. Shah, and D. Shrestha, “Chroma feature extraction,” in Conference: Chroma Feature Extraction Using Fourier Transform, no. 20, 2019.
[17] J. Yang, F.-L. Luo, and A. Nehorai, “Spectral contrast enhancement: Algorithms and comparisons,” Speech Communication, vol. 39, no. 1-2, pp. 33–46, 2003.
[18] G. Brauwers and F. Frasincar, “A general survey on attention mechanisms in deep learning,” IEEE Transactions on Knowledge and Data Engineering, 2021.
[19] M. T. García-Ordás, J. A. Benítez-Andrades, I. García-Rodríguez, C. Benavides, and H. Alaiz-Moretón, “Detecting respiratory pathologies using convolutional neural networks and variational autoencoders for unbalancing data,” Sensors, vol. 20, no. 4, p. 1214, 2020.
[20] T. Lawrence and L. Zhang, “IoTNet: An efficient and accurate convolutional neural network for IoT devices,” Sensors, vol. 19, no. 24, p. 5541, 2019.
