

An automatic siren detection algorithm using Fourier Decomposition Method and MFCC
Binish Fatimah∗, Preethi A†, Hrushikesh V‡, Akhilesh Singh B§, Harshanikethan R Kotion¶
Department of Electronics and Communication Engineering, CMR Institute of Technology, Bengaluru
Email: ∗ binish.f@cmrit.ac.in, † preethi.a@cmrit.ac.in, ‡ hruv16e@ccmrit.ac.in, § aksi16ec@cmrit.ac.in, ¶ hark16ec@cmrit.ac.in

Abstract—In this work, an automatic ambulance detector has been proposed using features extracted from the siren of the ambulance. The smart ambulance detector can reduce the time required by an ambulance to reach its destination in emergency situations. In the present scenario, cars are designed to provide more and more luxury to the driver and insulation from outside noise. Along with the traffic noise, siren sounds also get muffled, leaving drivers clueless about the emergency vehicles in their vicinity. In these cases, the proposed system can be used to alert the driver about the approaching emergency vehicle. The proposed system uses audio sensors to record the siren sound and pre-processes the acquired signal using a bandpass filter. In this work, two sets of features are computed: the first set consists of Mel-frequency cepstrum coefficients of the filtered signal; for the second set, the signal is decomposed in the frequency domain using the Fourier decomposition method, and statistical features such as kurtosis, energy, and variance are computed from each of the sub-bands. Relevant features are selected based on the Kruskal-Wallis test. The selected feature set is then used to train a machine learning model to identify siren sounds from the background traffic noise. We have compared the performance of various machine learning algorithms such as kNN, SVM and ensemble bagged trees to select the best model. The dataset used in this work includes signals from two publicly available datasets with ambulance siren audio files and traffic sound files, as well as audio data collected from different sources on the internet. We have also recorded siren sounds in city traffic conditions.

Index Terms—Mel-frequency cepstrum coefficients, Fourier Decomposition Method, Siren detection, SVM, Ensemble Bagged Trees
I. INTRODUCTION

Ambulances, fire trucks, police and other emergency vehicles conventionally use a flashing light as a visual indicator and a high-decibel siren sound as an audio indicator to alert other vehicles on the road and pedestrians. Indifference to nearby traffic and pre-occupation with activities like listening to music or hands-free communication leave drivers clueless about approaching emergency vehicles. Better and more efficient sound-proofing materials are being used in most of the latest car models, which further inhibit external sounds. Thus, even with the use of flashing lights and a high-decibel siren signal, most drivers remain unaware of emergency vehicles in city traffic. This problem can be addressed via a smart detecting and indicating device. The device can be integrated with the vehicle's interior and can be designed to lower the volume of the music being played and create an audio and visual indication on the dashboard to warn the driver. Making drivers alert helps the ambulance reach its destination faster. The proposed algorithm can be used in various applications, some of which are listed below:

• In the opening and closing of an automatic gate for emergency vehicles, which can be used in remote places with harsh climatic conditions or during pandemics when human activity needs to be minimized for comfort or safety reasons.
• In autonomous vehicles that are operated via artificial intelligence and require a mechanism to detect the presence of emergency vehicles automatically.
• The device can be used by a person with hearing impairment and by traffic policemen.
• Integrating traffic signals with such a device helps to reduce delays.

Various authors [1]–[11] have presented different algorithms to detect the siren sound using time domain and frequency domain features. Meucci et al. [1] proposed a pitch-based algorithm, deployed on an Atmel processor, for siren detection based on its periodic repetition characteristics. A Module Difference Function (MDF) and peak detection algorithms were used to represent the received signal as a pitch function varying over time. It was also emphasized that the MDF implementation requires a lower computational load on low-power devices like the Atmel compared to conventional correlation methods. The maximum detection accuracy reported in [1] is 98% when the SNR (signal-to-noise ratio) is 10 dB. Liaw et al. [2] proposed a method based on the pitch of the ambulance siren with the Longest Common Subsequence (LCS) and obtained an accuracy of 85%.

Takuya et al. [3] used a two-times FFT on a dsPIC to detect the siren sound. The authors showed that the proposed method could detect the siren sound efficiently even under the Doppler effect and for an SNR as low as 0 dB. However, the time lag in this case was found to be 8 seconds. Schröder et al. [4] evaluated the applicability of Part Based Models (PBMs) in acoustic signal detection. The authors showed that PBMs are more flexible in the spectral dimension and can provide accuracy equivalent to that of a Hidden Markov Model (HMM) when the SNR values are zero.

Sung-Won Park and Trevino [5] used statistical features like the mean and variance of the reflection coefficients to automatically detect siren sounds. The linear prediction algorithm was implemented on a TM DSP chip using the Durbin algorithm.


Carmel et al. [6] used a support vector machine (SVM) to detect sirens, obtaining an accuracy of 98% and a false positive rate of 0.4%. The feature set included multiple time domain, frequency domain and wavelet-based features. The ReliefF algorithm was used to select the best differentiating features. An automatic siren detection algorithm based on mechanical resonant filters was proposed by Fragoulis et al. in [7]. The device essentially implements a mechanical narrow filter bank over the frequency range of a siren. Here, the pre-requisites are a high sound intensity of the siren to activate the filter bank and a high SNR.

In [8], Tran, Yan and Tsai developed a recurrent neural network to detect the siren sound using Mel-frequency cepstral coefficients (MFCC). The accuracy of siren detection in this case is reported as 90% on synthetic data with -15 dB noise and 93.8% on data recorded in traffic conditions. Tran and Tsai [9] proposed a deep learning model to detect emergency vehicles based on siren sounds of variable input lengths. The system incorporated two CNN-based networks: one stream processed raw acoustic information and the other was trained with combined MFCC and log-mel spectrogram features. The study reported a maximum accuracy of 98.24%.
Although the classification accuracy obtained in [9] is higher than that of [8], this comes at a higher computational cost and a higher feature set dimensionality. For a practically realizable and real-time implementable system, our aim is to obtain higher accuracy with the least number of features and a computationally efficient machine learning algorithm. In this paper we propose a system which uses audio sensors to record the siren sound and pre-processes the acquired signal using a bandpass filter to select frequencies between 500 Hz and 2000 Hz. Two sets of features are computed: MFCCs are computed from the signal obtained after the pre-processing step, and features such as kurtosis, energy, and variance are computed from the sub-bands of the signal obtained using FDM. Since the audio signal is non-stationary and non-linear, FDM is a suitable decomposition scheme, as shown in various works [12]–[14]. All the features computed from the band-pass filtered signal need not be relevant; thus, a statistical study is conducted to select only the discriminative features using the Kruskal-Wallis test. The selected feature set is then used to train a machine learning model to identify siren sounds from the background traffic noise. The proposed system is a novel, low-cost model and can be upgraded to serve in multiple traffic control related applications for emergency vehicles.

The paper is presented in four further sections: Section II gives a brief discussion of the dataset used in this work; Section III presents the methodology used to detect the siren; the results are presented in Section IV; conclusions and future work are given in Section V.

II. DATASET

The objective of this work is to build a model to detect siren sounds from traffic sounds. We consider this as a binary classification problem where the two classes are siren and non-siren sounds, the latter including, among others, car horns, trains, trams and animal sounds. The dataset considered here has been collected from four sources. The first source is available at https://www.kaggle.com/vishnu0399/emergency-vehicle-siren-sounds, where 200 ambulance siren sounds, 200 fire truck sirens and 200 traffic noise audio files are provided, each file of 3 second duration. This data has been manually collected from YouTube and Google. We have taken the ambulance siren data and the traffic noise data from this source. The second source is a publicly available dataset [15], from which we selected all 40 of the siren sounds provided and 40 signals from the non-siren category, each signal of at most 5 second duration collected at 44.1 kHz. The third source for siren and traffic noise is another publicly available dataset [16]. Five files corresponding to sirens and 243 files corresponding to various background sounds, such as cars, animals, wind, birds, construction site noise, park background noise, crowd noise, car horns, fountain sounds, rain noise and school yard noise, have been selected from this data.

As the fourth source, we have collected siren sounds under busy city traffic conditions. The duration of each file in this case lies in the range of 4-10 seconds and the sampling rate is 44.1 kHz. This dataset is varied in terms of the duration of each file, place and background conditions.
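As an illustration of how such a heterogeneous collection might be handled in practice, the short Python sketch below loads audio clips of differing duration and sampling rate while preserving the native rate of each file. The directory layout, the label convention (1 = siren) and the use of librosa are assumptions made purely for this example and are not part of the original work.

```python
# Hypothetical loader for the mixed-source siren / non-siren dataset described above.
import os
import librosa

def load_dataset(root="dataset", classes=("non_siren", "siren")):
    """Load audio files of varying duration and sampling rate.
    sr=None keeps the native rate of every recording, since the proposed
    algorithm is designed to be independent of sampling frequency and length."""
    signals, labels = [], []
    for label, class_dir in enumerate(classes):        # 0 = non-siren, 1 = siren
        class_path = os.path.join(root, class_dir)
        for fname in sorted(os.listdir(class_path)):
            if not fname.lower().endswith((".wav", ".mp3", ".ogg")):
                continue
            x, fs = librosa.load(os.path.join(class_path, fname), sr=None, mono=True)
            signals.append((x, fs))
            labels.append(label)
    return signals, labels
```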
III. METHODOLOGY

All emergency vehicles, such as ambulances, fire trucks and police vehicles, use sirens to inform the other road users of their active state. The other vehicles in this situation should immediately know of the approaching emergency vehicle and make way for it to reach its destination as soon as possible. However, as discussed in the introduction, because of the luxurious and comfortable car interiors and any in-vehicle noise due to the radio or a conversation, one might miss these sirens. The objective of the proposed work is to detect the siren and to inform the driver about the approaching emergency vehicle. The block diagram of the proposed methodology is shown in Fig. 1. The main steps of the algorithm are signal acquisition, pre-processing, feature extraction, feature selection and siren detection.

Fig. 1: Block diagram of the proposed siren detection algorithm (siren acquisition using audio sensors; pre-processing with a 500-2000 Hz band-pass filter and normalization; MFCC computation and Fourier decomposition with kurtosis, energy and variance per sub-band; feature selection; ML classifier; ambulance detection).

1) Signal acquisition: The audio signal is recorded using audio sensors. In this case, we have collected the signals using different audio sensors and also from different internet sources; therefore the sampling frequency, as well as the duration of the recorded signal, differs from case to case. Thus, the proposed algorithm is developed to be independent of these quantities.

2) Pre-processing: The recorded signal is then filtered using a band-pass filter with a lower cut-off frequency of 500 Hz and a higher cut-off frequency of 2000 Hz, as this is the operating frequency range for siren signals used by emergency vehicles, as mentioned in the automotive industry standard AIS-125. This frequency range is selected to ensure that the proposed framework can be used for a very wide range of sirens and in most countries.

In this work we have used a finite impulse response filter of order 64 to implement the required band-pass filter. Since the signals have been recorded with different devices, we have made sure to normalize the filtered signals such that their amplitude range does not exceed |1|.
3) Feature Extraction: MFCCs are considered popular and efficient features for audio signal processing. MFCC has been used in [17] for speech recognition, in [18] for speaker recognition, in [19] for phoneme recognition, and in [9], [20] for siren recognition. The algorithm presented in [21], [22] has been used here to compute the MFCCs. The following steps are used to compute MFCCs from the signal obtained after the pre-processing step:

Step 1. The high-frequency content of the signal is boosted using a pre-emphasis filter. This is implemented using a finite impulse response filter with transfer function H(z) = 1 + αz^{-1}. The value of α can be between 0.4 and 1.

Step 2. The signal is segmented into short frames, as audio signals are in general considered highly non-linear and non-stationary. There is also an overlap between adjacent frames, which can be decided based on the application and the signal at hand. In the case of sirens, we have selected the frame duration as 1.3 seconds and the frame shift as 30 milliseconds. A Hamming window has been used for tapering each frame.

Step 3. The magnitude spectrum, X(k), 0 ≤ k ≤ N, of each windowed frame is then computed using the N-point discrete Fourier transform (DFT).

Step 4. A mel filter bank is designed and the magnitude spectrum of each frame from Step 3 is filtered to give the mel spectrum. The mel filter bank comprises band-pass filters uniformly spaced on the mel scale; the filters designed in this case are triangular. The mel spectrum is obtained by taking the product of the magnitude spectrum with each of the mel weighting filters:

M(n) = \sum_{i=0}^{N} |X(i)|^2 H_n(i), \qquad 0 \le n \le M - 1,    (1)

where M is the number of channels in the mel filter bank and H_n(i) is the weight of the i-th energy spectrum bin contributing to the n-th output band.

Step 5. To ensure that the energies in the filter bank channels are independent of each other, the log-compressed values of these energies are decorrelated using the discrete cosine transform to produce the cepstral coefficients.

Step 6. A sinusoidal lifter is employed to produce liftered MFCCs, so as to obtain results similar to [22].
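The following Python sketch mirrors Steps 1-6 for a single pre-processed signal. The frame duration (1.3 s), frame shift (30 ms) and the use of a Hamming window are taken from the text; the pre-emphasis coefficient, the number of mel channels and the liftering parameter are illustrative assumptions, and the triangular mel filter bank is borrowed from librosa for brevity. This is not the melfcc.m/HTK implementation cited in [21], [22].

```python
# Generic MFCC pipeline following Steps 1-6 (illustrative values where unstated).
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_features(x, fs, n_mfcc=8, n_mels=26, frame_dur=1.3, frame_shift=0.030,
                  alpha=0.97, lifter=22):
    # Step 1: pre-emphasis (common convention y[n] = x[n] - alpha * x[n-1]).
    x = np.append(x[0], x[1:] - alpha * x[:-1])

    # Step 2: segmentation into overlapping frames, tapered by a Hamming window.
    frame_len, hop = int(frame_dur * fs), int(frame_shift * fs)
    n_fft = int(2 ** np.ceil(np.log2(frame_len)))
    frames = librosa.util.frame(x, frame_length=frame_len, hop_length=hop).T
    frames = frames * np.hamming(frame_len)

    # Step 3: magnitude spectrum of each windowed frame (N-point DFT).
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

    # Step 4: triangular mel filter bank applied to the energy spectrum, Eq. (1).
    mel_fb = librosa.filters.mel(sr=fs, n_fft=n_fft, n_mels=n_mels)
    mel_energies = mag ** 2 @ mel_fb.T

    # Step 5: log compression and DCT to decorrelate the filter-bank channels.
    ceps = dct(np.log(mel_energies + 1e-10), axis=1, norm="ortho")[:, :n_mfcc]

    # Step 6: sinusoidal liftering of the cepstral coefficients.
    n = np.arange(n_mfcc)
    return ceps * (1.0 + (lifter / 2.0) * np.sin(np.pi * n / lifter))
```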
The second set of features is computed using the Fourier decomposition method (FDM). The signal is decomposed into M sub-bands using FDM, and features such as kurtosis, energy and variance of each sub-band are computed. FDM is an adaptive decomposition technique used for non-linear and non-stationary data. It has been shown in various studies that the performance of FDM is far better than that of empirical mode decomposition (EMD) or the discrete wavelet transform (DWT) when the given signal is non-linear or non-stationary [12]–[14]. FDM is also computationally efficient, as it is implemented using the fast Fourier transform (FFT). For more detail on FDM refer to [12]. The following expressions are used to compute the time-domain features for each sub-band signal:

Variance = \sigma_i^2 = \frac{1}{N} \sum_{n=0}^{N-1} \left( sb_i[n] - \mu_i \right)^2,    (2)

where \mu_i is the mean of the i-th sub-band sb_i[n],

Kurtosis = \frac{1}{N} \sum_{n=0}^{N-1} \left( \frac{sb_i[n] - \mu_i}{\sigma_i} \right)^4,    (3)

Energy = \sum_{n=0}^{N-1} \left| sb_i[n] \right|^2.    (4)
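A simplified illustration of this feature set is sketched below. For brevity the spectrum is split into uniform bands via the FFT and each band is inverted back to the time domain; the adaptive band selection of the actual FDM [12] is not reproduced here. The per-band features follow Eqs. (2)-(4).

```python
# Uniform FFT-based sub-band split (a stand-in for FDM) plus the features of Eqs. (2)-(4).
import numpy as np
from scipy.stats import kurtosis

def subband_features(x, n_bands=40):
    N = len(x)
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1, dtype=int)
    feats = []
    for i in range(n_bands):
        Xi = np.zeros_like(X)
        Xi[edges[i]:edges[i + 1]] = X[edges[i]:edges[i + 1]]
        sb = np.fft.irfft(Xi, n=N)                 # i-th sub-band signal sb_i[n]
        mu = sb.mean()
        variance = np.mean((sb - mu) ** 2)         # Eq. (2)
        kurt = kurtosis(sb, fisher=False)          # Eq. (3), non-excess kurtosis
        energy = np.sum(sb ** 2)                   # Eq. (4)
        feats.extend([variance, kurt, energy])
    return np.array(feats)
```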


The time-frequency response of a siren sound is shown in Fig. 2, and Fig. 3 gives the time-frequency representation obtained after FDM. We have also shown similar results for a car horn in Fig. 4 and Fig. 5.

Fig. 2: Time-frequency representation of the siren signal (time-frequency-energy plot; frequency in Hz versus time in seconds).

Fig. 3: Time-frequency representation of the siren signal decomposed using FDM.

Fig. 4: Time-frequency representation of a car horn.

Fig. 5: Time-frequency representation of the car horn signal decomposed using FDM.

4) Feature selection: In this work, we have used the Kruskal-Wallis test to reduce the dimensionality of the feature set computed in the previous step and to obtain information regarding the contribution of each feature to the classification task. Features with p-values greater than 0.001 are removed from the feature set; the confidence for rejecting the null hypothesis is therefore fixed at 99.9%.
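A minimal sketch of this selection step, assuming SciPy and a feature matrix with one row per recording, is shown below; the threshold of 0.001 follows the text.

```python
# Kruskal-Wallis feature selection: keep a feature only if its p-value <= 0.001.
import numpy as np
from scipy.stats import kruskal

def select_features(X, y, p_threshold=1e-3):
    """X: (n_samples, n_features) feature matrix; y: binary labels (1 = siren)."""
    keep = [j for j in range(X.shape[1])
            if kruskal(X[y == 0, j], X[y == 1, j]).pvalue <= p_threshold]
    return np.array(keep), X[:, keep]
```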
5) Siren Detection: Once the feature set has been selected based on the p-values, we train and test the siren detection model. For this purpose, we have used three popular machine learning classifiers, namely k-nearest neighbours (kNN), support vector machine (SVM) and ensemble bagged trees (EBT); for more detail on these classifiers refer to [23]. The performance of these classifiers is measured using parameters such as accuracy (ACC), sensitivity (SNS) and specificity (SPC). These quantities can be computed from the confusion matrix, Table I, where positive stands for the class of siren signals and negative stands for the signals recorded from traffic, car horns, trains, etc., i.e. audio signals other than sirens. TP stands for true positives, i.e. siren sounds classified as siren; TN stands for true negatives, i.e. non-siren sounds classified as non-siren; FP stands for false positives, i.e. signals other than sirens that are classified by the model as siren; and FN stands for false negatives, i.e. siren signals classified as non-siren. The following equations are thus used:

ACC = \frac{TP + TN}{TP + TN + FP + FN}    (5)

SNS = \frac{TP}{TP + FN}    (6)

SPC = \frac{TN}{TN + FP}    (7)

TABLE I: Confusion matrix

                               True Class
                           Positive   Negative
Predicted class  Positive     TP         FP
                 Negative     FN         TN
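The sketch below illustrates this stage with scikit-learn: the three classifiers named above are trained on the selected features and scored with ACC, SNS and SPC computed from the confusion matrix of Table I via Eqs. (5)-(7). The ensemble bagged trees model is approximated by a bagging ensemble of decision trees, and all hyper-parameters are illustrative rather than the authors' settings.

```python
# Training and evaluating the candidate classifiers (illustrative settings).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

def evaluate(model, X_train, X_test, y_train, y_test):
    model.fit(X_train, y_train)
    # Binary labels: 1 = siren (positive class), 0 = non-siren (negative class).
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)   # Eq. (5)
    sns = tp / (tp + fn)                    # Eq. (6)
    spc = tn / (tn + fp)                    # Eq. (7)
    return acc, sns, spc

models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM (Gaussian)": SVC(kernel="rbf"),
    "EBT": BaggingClassifier(DecisionTreeClassifier(), n_estimators=30),
}
# Given the selected feature matrix X and labels y:
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
# for name, model in models.items():
#     print(name, evaluate(model, X_tr, X_te, y_tr, y_te))
```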


We have also used the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to select the model that gives the best true prediction rate with the least false alarm rate for the proposed methodology.
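For completeness, this ROC/AUC-based model selection could be sketched as follows, assuming a fitted scikit-learn classifier that exposes class probabilities.

```python
# ROC curve and AUC for a fitted binary classifier (siren = positive class 1).
from sklearn.metrics import roc_curve, roc_auc_score

def roc_and_auc(model, X_test, y_test):
    scores = model.predict_proba(X_test)[:, 1]   # probability of the siren class
    fpr, tpr, _ = roc_curve(y_test, scores)
    return fpr, tpr, roc_auc_score(y_test, scores)
```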
IV. SIMULATIONS

In this section, simulation results are presented for the proposed methodology. The performance results when the classifier considered is EBT are presented in Table II. Here, the number of FDM sub-bands is varied from 5 to 40 to obtain the optimum number of sub-bands; the number of MFCCs computed is 8. We increased the number of MFCCs to 13, but the classification accuracy did not change.

TABLE II: Performance of the EBT classifier vs. number of sub-bands obtained with FDM

No. of sub-bands   SNS(%)   SPC(%)   ACC(%)
5                  93.6     98.32    96.69 ± 1.62
10                 93.6     97.9     96.42 ± 2.44
15                 92.8     97.9     96.15 ± 1.56
20                 95.6     97.06    96.55 ± 1.75
30                 94       98.31    97.37 ± 1.22
40                 97.6     98.74    98.49 ± 1.37

Also, to compare the contribution of the features to the classification task, we present in Table III the accuracies obtained with the EBT classifier.

TABLE III: Performance analysis of the proposed algorithm for different features when EBT is the classifier

Feature     SNS(%)   SPC(%)   ACC(%)
MFCC        85.6     93.69    90.9
Kurtosis    83.2     96.22    91.7
Energy      95.2     98.95    97.7
Variance    96.8     98.32    97.8

To select the best classifier for the proposed framework, we compare the performance of different machine learning algorithms for the given task, as shown in Table IV.

TABLE IV: Performance analysis of different machine learning algorithms

Classifier       SNS(%)   SPC(%)   mean ACC(%)
kNN              86.4     98.32    94.2
Decision Tree    95.6     99.57    98.2
SVM linear       90.00    98.95    95.87
SVM cubic        90.8     69.54    76.86
SVM quadratic    91.6     95.56    94.21
SVM Gaussian     83.6     98.74    93.53
EBT              97.6     98.74    98.49

The proposed methodology gives better performance than most of the existing algorithms, as shown in Table V; however, since different datasets have been used in the contemporary works existing in the literature, this comparison has its limitations. The proposed algorithm uses a small feature set to train and test the model, which ensures a shorter waiting time for the detector and also leads to an efficient model. Also, the classifiers used in this work are classical machine learning algorithms, which are computationally less expensive than deep learning algorithms.

TABLE V: Comparison of the proposed algorithm with the literature

Author                       Features                                             ACC(%)
L. Marchegiani et al. [24]   EBMs, GTG, MFCC                                      94
Tran et al. [9]              MFCC + spectrogram                                   98.24
Tran et al. [8]              MFCC                                                 93.8
Liaw et al. [2]              LCS                                                  85
Schröder et al. [4]          MFCC + spectrogram                                   95
Meucci et al. [1]            pitch                                                98.3
Carmel et al. [6]            time domain, frequency domain and wavelet features   98
Proposed work                MFCC and statistical features                        98.49 ± 1.37

V. CONCLUSIONS

In this work, a novel siren detection algorithm has been presented to detect an approaching emergency vehicle. The dataset used here consists of siren sounds and various kinds of traffic sounds, such as car horns, trains and traffic noise, collected from various sources. We have computed mel-frequency cepstrum coefficients from the sound files and used FDM to decompose the signal into sub-bands. Kurtosis, energy and variance are then computed from each of these sub-bands to form the feature set along with the MFCCs. Statistical analysis of these features is then used to select a smaller-dimensional, more discriminative feature set. With this reduced feature set, machine learning models are trained and the results are compared with the existing literature. The proposed algorithm can detect the siren with an accuracy of 98.49%, and this performance is expected to carry over when the work is implemented in real life. Deep learning algorithms can be explored for this purpose, along with features computed using higher order statistics of the audio signals.

REFERENCES

[1] F. Meucci, L. Pierucci, E. Del Re, L. Lastrucci, and P. Desii, "A real-time siren detector to improve safety of guide in traffic environment," 2008 16th European Signal Processing Conference, pp. 1–5, 2008.
[2] J. Liaw, W. Wang, H. Chu, M. Huang, and C. Lu, "Recognition of the ambulance siren sound in Taiwan by the longest common subsequence," 2013 IEEE International Conference on Systems, Man, and Cybernetics, pp. 3825–3828, 2013.
[3] T. Miyazaki, M. Shimakawa, and Y. Kitazono, "Research of ambulance siren detector using dsPIC," The Japanese Journal of the Institute of Industrial Applications Engineers, vol. 2, no. 1, pp. 11–15, 2014.
[4] J. Schröder, S. Goetze, V. Grützmacher, and J. Anemüller, "Automatic acoustic siren detection in traffic noise by part-based models," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 493–497, 2013.
[5] S.-W. Park and J. Trevino, "Automatic detection of emergency vehicles for hearing impaired drivers," Texas A&M University-Kingsville EE/CS Department MSC 192 Kingsville, J. Acoust. Soc. Am., vol. 50, pp. 637–665, 2013.
[6] D. Carmel, A. Yeshurun, and Y. Moshe, "Detection of alarm sounds in noisy environments," 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1839–1843, 2017.
[7] D. Fragoulis and J. Avaritsiotis, "A siren detection system based on mechanical resonant filters," Sensors, vol. 1, Sep. 2001.


[8] V.-T. Tran, Y.-C. Yan, and W.-H. Tsai, “Detection of ambulance and fire
truck siren sounds using neural networks,” Proceedings of 51st Research
World International Conference, Hanoi, Vietnam, 26th -27th July, 2018.
[9] V. Tran and W. Tsai, “Acoustic-based emergency vehicle detection using
convolutional neural networks,” IEEE Access, vol. 8, pp. 75 702–75 713,
2020.
[10] M. Buck, J. Premont, and F. Faubel, “System and method for acous-
tic detection of emergency sirens,” Jan. 23 2020, US Patent App.
16/516,786.
[11] N. Divij, K. Divya, and A. Badage, "IoT based automated traffic light
control system for emergency vehicles using LoRa."
[12] P. Singh, S. D. Joshi, R. K. Patney, and K. Saha, “The Fourier decom-
position method for nonlinear and non-stationary time series analysis,”
Proceedings of the Royal Society of London A: Mathematical, Physical
and Engineering Sciences, vol. 473: 20160871, pp. 1–27, 2017.
[13] A. Singhal, P. Singh, B. Fatimah, and R. B. Pachori, “An efficient
removal of power-line interference and baseline wander from ecg signals
by employing fourier decomposition technique,” Biomedical Signal
Processing and Control, vol. 57, p. 101741, 2020.
[14] B. Fatimah, P. Singh, A. Singhal, and R. B. Pachori, “Detection of
apnea events from ecg segments using fourier decomposition method,”
Biomedical Signal Processing and Control, vol. 61, p. 102005, 2020.
[15] K. J. Piczak, “ESC: Dataset for Environmental Sound Classification,”
2015. [Online]. Available: https://doi.org/10.7910/DVN/YDEPUT
[16] J.-R. Gloaguen, A. Can, M. Lagrange, and J.-F. Petiot, “Creation
of a corpus of realistic urban sound scenes with controlled acoustic
properties,” Proceedings of Meetings on Acoustics, vol. 30, no. 1, p.
055009, 2017.
[17] U. Bhattacharjee, S. Gogoi, and R. Sharma, “A statistical analysis
on the impact of noise on mfcc features for speech recognition,”
2016 International Conference on Recent Advances and Innovations in
Engineering (ICRAIE), pp. 1–5, 2016.
[18] P. Bansal, S. A. Imam, and R. Bharti, “Speaker recognition using mfcc,
shifted mfcc with vector quantization and fuzzy,” 2015 International
Conference on Soft Computing Techniques and Implementations (IC-
SCTI), pp. 41–44, 2015.
[19] S. Dabbaghchian, H. Sameti, M. P. Ghaemmaghami, and B. BabaAli,
“Robust phoneme recognition using mlp neural networks in various
domains of mfcc features,” 2010 5th International Symposium on
Telecommunications, pp. 755–759, 2010.
[20] A. Otlora, D. Osorio, and N. Moreno, “Methods for extraction of features
and discrimination of emergency sirens,” pp. 1525–1532, 2017.
[21] D. Ellis, Reproducing the feature outputs of common
programs using Matlab and melfcc.m., 2005. [Online]. Available:
http://labrosa.ee.columbia.edu/matlab/rastamat/mfccs.html
[22] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu,
G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland,
The HTK book. Cambridge University Engineering Department, 01
2002.
[23] E. Alpaydin, Introduction to Machine Learning, ser. Adaptive Compu-
tation and Machine Learning series. MIT Press, 2014.
[24] L. Marchegiani and P. Newman, “Listening for sirens: Locating and clas-
sifying acoustic alarms in city scenes,” arXiv preprint arXiv:1810.04989,
2018.

