
Preprocessing

1. Collect audio datasets of cough, mucus, and asthma sounds.

2. Convert each audio file to an image by plotting its spectrogram.

A spectrogram represents the frequency content of audio as colors in an image. The frequency content of successive millisecond-scale chunks is strung together as colored vertical bars, so a spectrogram is essentially a two-dimensional graph with a third dimension encoded as color.

Time runs from left (oldest) to right (newest) along the horizontal axis.

The vertical axis represents frequency, with the lowest frequencies at the bottom and the highest frequencies at the top.

The amplitude (or energy, or "loudness") of a particular frequency at a particular time is represented by the third dimension, color: dark blues correspond to low amplitudes, and brighter colors up through red correspond to progressively stronger (louder) amplitudes.


How this will be done:

a) Divide the audio file into millisecond-scale chunks.
b) Compute the Short-Time Fourier Transform (STFT) of each chunk.
c) Plot each chunk as a colored vertical line in the spectrogram.
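The steps above can be sketched in Python with SciPy and Matplotlib. The sample rate, chunk size, and synthetic test signal here are illustrative assumptions, not values from this document:

```python
# Sketch of steps (a)-(c): chunk the audio, compute the STFT of each
# chunk, and plot the magnitudes as a spectrogram image.
# Sample rate and chunk size are illustrative assumptions.
import numpy as np
from scipy.signal import stft
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

sample_rate = 16000  # Hz (assumed)
duration = 2.0       # seconds of synthetic audio standing in for a real clip
t = np.linspace(0.0, duration, int(sample_rate * duration), endpoint=False)
audio = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# nperseg=512 samples at 16 kHz gives 32 ms chunks (step a);
# stft computes the Fourier transform of each chunk (step b).
freqs, times, Zxx = stft(audio, fs=sample_rate, nperseg=512)
magnitude = np.abs(Zxx)  # one column per chunk: the vertical lines of step (c)

# Plot on a decibel scale so quieter frequencies stay visible, then save.
plt.pcolormesh(times, freqs, 20 * np.log10(magnitude + 1e-10), shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.savefig("spectrogram.png")
```

For a real dataset, the synthetic signal would be replaced by the samples loaded from each audio file.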

Training

3. Train the CNN on these spectrogram images to classify the audio clips into
asthma, hypothorax, and other diseases. This will be a supervised training
process, since a label is available for each audio clip. The labels will be
stored in a CSV file containing the path of each spectrogram image and its
corresponding label.
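The label file described above can be written and read with Python's standard csv module. The file name and column names below are assumptions for illustration, not part of the document:

```python
# Write and read a labels CSV of the form described above: each row
# holds a spectrogram image path and its disease label.
# The file name and column names are illustrative assumptions.
import csv

rows = [
    ("spectrograms/clip_001.png", "asthma"),
    ("spectrograms/clip_002.png", "cough"),
]

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image_path", "label"])  # header row
    writer.writerows(rows)

# Read the image paths and labels back for training.
with open("labels.csv", newline="") as f:
    reader = csv.DictReader(f)
    paths, labels = zip(*[(r["image_path"], r["label"]) for r in reader])
```

During training, each path would be used to load the spectrogram image while the label supplies the supervised target.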

The CNN will have the following layers:

2 convolution layers with the same kernel size (to be decided)

1 max pooling layer with a 2×2 pooling size

1 dropout layer

1 flattening layer

2 dense (fully connected) layers at the end

We will use Keras to train the network after the preprocessing step.
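A minimal Keras sketch of the layer stack listed above. The input image size, filter counts, kernel size, dropout rate, and number of classes are placeholder assumptions, since the document leaves these undecided:

```python
# Minimal sketch of the described architecture in Keras.
# Input size, filter counts, kernel size, dropout rate, and the number
# of classes are placeholder assumptions, not values from the document.
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 3  # e.g. asthma, hypothorax, other (assumed)

model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),             # spectrogram image (assumed size)
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution layer 1
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution layer 2, same kernel size
    layers.MaxPooling2D(pool_size=(2, 2)),         # 2×2 max pooling layer
    layers.Dropout(0.25),                          # dropout layer
    layers.Flatten(),                              # flattening layer
    layers.Dense(64, activation="relu"),           # dense layer 1
    layers.Dense(num_classes, activation="softmax"),  # dense layer 2: class scores
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Training would then call `model.fit` on the spectrogram images and their one-hot encoded labels from the CSV file.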

Testing

4. Audio clips recorded using microphones will be used to test the model.
