This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3132061, IEEE Access
C-AVDI: Compressive
Measurement-Based Acoustic Vehicle
Detection and Identification
BILLY DAWTON1 (Student Member, IEEE), SHIGEMI ISHIDA2 (Member, IEEE), AND YUTAKA ARAKAWA1 (Member, IEEE)
1 ISEE, Kyushu University, Fukuoka, 819-0395 Japan (e-mail: {bdawton@f., arakawa@}ait.kyushu-u.ac.jp)
2 Future University Hakodate, Hokkaido, 041-8655 Japan (e-mail: ish@fun.ac.jp)
Corresponding author: Billy Dawton (e-mail: bdawton@f.ait.kyushu-u.ac.jp).
A conference paper containing preliminary results of this paper appeared in IEEE VTC2020-Fall [1]. This work was supported in part by
JSPS KAKENHI Grant Numbers JP19KK0257 and JP21K11847 and the Cooperative Research Project Program of RIEC, Tohoku
University.
ABSTRACT
As society grows ever more interconnected, the need for sophisticated signal processing and data analysis
techniques becomes increasingly apparent. This is particularly true in the field of intelligent transportation
systems (ITSs), where various sensing applications generate data at an exponential rate. In this paper,
we present C-AVDI, a compressive measurement-based acoustic vehicle detection and identification
architecture capable of extracting information from vehicle audio signals while sampling at sub-Nyquist
rates. In addition, we further reduce the overall complexity by performing any necessary signal filtering
during the acquisition process, removing the need for a separate filtering stage in the system’s front-end. Our
results obtained from data collected under a range of weather conditions show an accuracy of 80% with a back-end analog-to-digital converter (ADC) sample rate of 3 kHz, while an initial microcontroller (MCU) implementation of the proposed system achieves an accuracy of 72%.
INDEX TERMS Acoustic signal detection, compressive sensing (CS), intelligent transportation systems
(ITS), vehicle detection
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In a bid to reduce the computational costs associated with AVDI systems, we proposed in our previous work [1] an initial attempt at creating a compressive measurement-based acoustic vehicle sensing architecture, capable of detecting and identifying vehicles while sampling at sub-Nyquist rates. By leveraging certain key properties of compressive sensing (CS), a technique first presented in [16] and [17], the system was able to detect and identify vehicles with an accuracy of 83% for a back-end sample rate of 3 kHz.

The goal of this paper is to present a low-cost, low-complexity alternative to existing AVDI sensing systems that can be implemented on a microcontroller (MCU) and whose performance is comparable to that of currently available systems. This is achieved first and foremost by reducing the sample rate at which the vehicle sounds are acquired. Indeed, the biggest limitation of MCUs is the amount of available volatile and nonvolatile memory, and while this is not a problem when working with short signals such as in [18], it makes it impossible to use MCUs in AVDI applications where the signal length is typically on the order of a few seconds. By reducing the number of samples, we lower the memory requirements for both the implementation of the classifier (nonvolatile memory) and the feature extraction process (volatile memory).

We seek to expand and build upon the results obtained by our proof-of-concept system in [1] in a number of ways. First, we remove the front-end filtering stage and instead filter the input signal directly during the acquisition process using a spectrally tailored bipolar pseudorandom sequence. This reduces the number of components in the system and allows the filtering parameters to be adjusted by modifying the spectrum of the sequence rather than the hardware. Second, we test the system under adverse weather conditions and observe their effects on the detection and identification performance. Third, we implement the feature extraction and classification processes on an MCU, demonstrating the reduction in both cost and computational complexity presented by our newly proposed system.

Our main contributions are as follows:
• We propose a compressive measurement-based AVDI (C-AVDI) architecture capable of operating at sub-Nyquist rates. Leveraging the inherent structure of vehicle sound signals enables us to reduce the number of samples required to detect and identify passing vehicles by a factor of 16 while maintaining an accuracy comparable to those of existing systems.
• We develop a method to simultaneously sample and filter incoming signals using spectrally shaped bipolar sequences generated from a pair of Markov chains. The shape of these sequences' spectra is controlled by varying the Markov chain length and transition probability. In our paper, we use the term filtering to refer to both the attenuation and amplification of frequencies present in a signal.
• We demonstrate the viability of C-AVDI as a low-cost, low-complexity sensing architecture by benchmarking our proposed system against our previous work and prototyping an initial MCU-based implementation of the system's feature extraction and classification processes.

Finally, while the purpose of our paper is to design a C-AVDI system for ITS applications, the underlying concepts can be applied to a wide range of sensing and machine learning (ML) applications. As highlighted in [19] and [20], if we are to ensure the continued sustainability and accessibility of ITS, as well as that of ML and artificial intelligence (AI) in general, then it is necessary to reduce the financial and environmental impact of these technologies. We believe that the techniques outlined in this paper can help contribute towards this goal.

The remainder of this paper is structured as follows: we examine the existing literature in Section II, describe the theory and properties of CS in Section III, and present our proposed system in Section IV. The system evaluation results are shown in Section V and discussed in Section VI.

II. RELATED WORK
There is a wide variety of existing research that uses supervised learning techniques to detect and identify vehicles using features extracted from their acoustic signatures.

The frequency-domain features extracted from vehicle audio signals are used with a support vector machine (SVM) classifier to identify and classify vehicles and their associated parameters in [6], and used to detect vehicles for the purpose of collision avoidance in non-line-of-sight situations in [7].

Mel-frequency cepstral coefficients (MFCCs) are used in conjunction with ML or deep learning (DL) as features in a number of existing AVDI systems: in [8] they are used with a modified MLP, in [5] they are extracted from a specific high-energy audio region and used with an ANN and k-nearest neighbors (KNN) classifier, and in [9] they are used in a feature set containing the pitch class profile (PCP) and short-term energy (STE) of vehicle audio signals in a hybrid convolutional neural network (CNN) containing a long short-term memory (LSTM) layer.

In [4], Göksu presents a system capable of analyzing the acoustic signatures of vehicles independently of any changes in engine sound. Using wavelet packet decomposition (WPD) and an MLP classifier, the system obtains engine speed-independent features from the acoustic signals of passing vehicles. On the other hand, the system in [10] identifies different vehicles based on the sound of their engines using modulated per-channel energy normalization (Mod-PCEN) features in tandem with a Siamese neural network (SNN).

All of these aforementioned existing AVDI systems share a common, or very similar, goal and basic approach, but differ in their implementation, applications, and performance. While these systems generally all present good performance metrics, the use of computationally intensive input signal processing or a complex supervised learning method offsets the improvements in system cost, complexity, efficiency, and flexibility.
We have proposed several acoustic vehicle detection systems in our previous work.

In [11], we present a sequential acoustic vehicle detector (SAVeD) that operates by fitting S-curves generated using generalized cross-correlation (GCC) to points drawn on a sound map using an estimation method based on random sample consensus (RANSAC). This system presents an F-measure of 83%. An initial attempt at expanding this method to include multilane detection was proposed in [12].

In [2], we design a stereo microphone-based AVDI system (SMBAS) capable of identifying passing vehicles through the use of features extracted from the frequency-domain representation of the vehicles' sound signatures using successive short-time Fourier transforms (STFTs). The signals obtained by a pair of microphones are time-aligned and combined, creating an emphasized sound signal from which features are drawn and used in an SVM classifier. The time alignment and combination process improves the system estimation accuracy, particularly when faced with simultaneously and successively passing vehicles, resulting in a system accuracy of 95%.

Both of these systems present good performance metrics, but again the computational cost associated with them makes low-power, embedded applications difficult.

In [13], we propose an ultra-low power vehicle detector (ULP-VD) capable of detecting passing vehicles with minimal computational cost using logistic regression (LR). This system, however, is only able to detect the presence of passing vehicles and requires an additional stage to identify them.

Most recently, in [1], we proposed an initial attempt at creating a CS-based AVDI system capable of detecting and identifying vehicles while sampling at sub-Nyquist rates. By combining a more traditional sub-Nyquist sampling architecture with a tailored analog front-end filtering section, the system is able to detect and identify vehicles with an accuracy of 83% and a back-end sampling rate of 3 kHz using a random forest (RF) classifier. The front-end filtering section is an integral part of the system; however, its implementation as a separate stage requires the use of additional components, increasing deployment and implementation costs.

The main characteristics of these AVDI systems are summarized in Table 1. To the best of our knowledge, there is no current AVDI system capable of performing both detection and identification on an MCU.

TABLE 1. Summary of Related AVDI Work

System | Can Detect | Can Identify | Can Run on MCU | Input Signal Processing | Detection/Identification
[1]    | X          | X            |                | CS                      | RF
[2]    | X          | X            |                | STFT                    | SVM
[4]    | X          | X            |                | WPD                     | MLP
[5]    | X          | X            |                | MFCC                    | KNN/ANN
[6]    | X          | X            |                | DFT                     | SVM
[7]    | X          |              |                | STFT                    | SVM
[8]    | X          | X            |                | MFCC                    | MLP
[9]    | X          | X            |                | MFCC/PCP/STE            | CNN/LSTM
[10]   | X          | X            |                | Mod-PCEN                | SNN
[11]   | X          |              |                | GCC                     | RANSAC
[12]   | X          |              |                | GCC                     | RANSAC
[13]   | X          |              | X              | DWT                     | LR

Sub-Nyquist signal processing using the measurements obtained during the CS process has been explored as an alternative to traditional digital signal processing in a range of existing work.

The authors of [21] present a methodology in which signal processing is performed directly on the compressive measurements obtained during the CS process. The paper demonstrates that filtering, detection, and classification can be performed directly on the compressive measurements without recovering the signal beforehand. Similarly, the authors of [22] propose a multiband compressive signal processing architecture in which information acquired at sub-Nyquist rates from different frequency bands is used in a range of applications without prior signal reconstruction.

There is also existing research that focuses on the creation of sampling strategies used to optimize the performance of CS-based systems.

Both [23] and [24] propose a method for shaping the spectra of the pseudorandom bipolar sequences used in various CS architectures to improve signal reconstruction performance. The first proposes a Markov chain-based method inspired by run-length limited (RLL) sequences to both shape the spectrum of the bipolar sequence and limit the rate at which it switches polarity. This paper lays the groundwork for subsequent research, but does not fully explore the filtering effects of such a sequence in the context of CS-based sensing, instead focusing on optimizing signal reconstruction. The second proposes a method based on convex relaxation to create a binary sequence whose spectrum resembles that of a notch filter. While this method is able to produce sequences with well-defined spectra, the complexity associated with it makes it unsuitable for low-power applications.

III. COMPRESSIVE SENSING
A. BASICS OF COMPRESSIVE SENSING
First presented in [16] and [17], CS is a method to efficiently sample sparse or compressible signals (a signal is called sparse if it contains only a few nonzero components compared to its total length, and is called compressible if it contains a large number of nonzero components, but only a few of significant magnitude), enabling us to directly acquire the information of interest in a given signal at sub-Nyquist rates rather than sample the signal at the Nyquist rate and subsequently discard unwanted information.

Let us begin by defining a sparse signal x ∈ R^N of length Ts whose highest frequency component is W/2 Hz and with N = W Ts. This signal can be expressed as a combination of discrete coefficients α ∈ C^N and vectors ψn that form the sparsifying basis Ψ, i.e., x = Ψα.
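As a minimal numerical illustration of this sparse-representation model (our own sketch, not taken from the paper): a multitone signal is sparse in an orthonormal Fourier basis Ψ.

```python
import numpy as np

# Orthonormal inverse-DFT basis Psi, so that x = Psi @ alpha.
N = 256
Psi = np.fft.ifft(np.eye(N), norm="ortho")

# alpha holds only 6 nonzero coefficients (3 tones plus their
# conjugate-symmetric pairs, so that x is real-valued).
alpha = np.zeros(N, dtype=complex)
alpha[[10, 30, 50]] = [1.0, 0.5, 0.25]
alpha[[-10, -30, -50]] = [1.0, 0.5, 0.25]

x = Psi @ alpha  # a length-256 multitone signal, 6-sparse in Psi
```

Here only 6 of the 256 coefficients are nonzero; CS exploits exactly this kind of structure to acquire the signal from far fewer than N samples.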
FIGURE 1. Proposed system overview.

FIGURE 2. Average audio signals for three vehicle classes.

scooters, and periods without a passing vehicle, sampled for a duration Ts = 2 s at a rate W = 48 kHz. We can see that the signals show the clearest separation for the highest signal power over a narrow band of interest located just under 3 kHz.

As highlighted in Figure 1, in our proposed system, the processes that constitute the Random Demodulator and Spectral Shaper sections are performed on continuous-time analog signals, and the processes that constitute the Feature Extraction and Classification section are performed on discrete-time digital signals. These stages will be examined in more detail in the rest of Section IV.

B. RANDOM DEMODULATOR
At the heart of our proposed system is an architecture called the Random Demodulator (RD), which is composed of the following blocks: a signal generator, a mixer, a low-pass filter (LPF), and an analog-to-digital converter (ADC). In our proposed C-AVDI system, we include an additional Markov chain block that is used in tandem with the signal generator to spectrally shape the PRS.

First presented in [30] and further expanded upon in [31] and most notably in [32], the RD enables us to perform CS on sparse or compressible continuous-time signals rather than the exclusively discrete signals described in initial theoretical work. Compared to more sophisticated architectures such as the modulated wideband converter (MWC) in [22] and the quadrature analog-to-information converter (QAIC) in [33], the RD is more straightforward and cheaper to implement as it is single-channel and only requires a single ADC. Furthermore, the RD is particularly suitable for our application as it is designed to acquire sparse single-band multitone signals, unlike the MWC, for instance, which is a multichannel architecture designed to acquire sparse multiband signals. More details on different CS acquisition strategies can be found in [27], and [34] provides an in-depth comparison of the RD and MWC systems in particular.

Intuitively, we can describe the RD's operation as follows: rather than acquiring a signal through traditional Nyquist sampling, the RD first demodulates the signal by multiplying it with a white noise-like pseudorandom sequence, spreading the signal's frequency content across the entirety of the spectrum. The resulting signal is then low-pass filtered^a before being sampled at a sub-Nyquist rate. If required, the original signal can be recovered from the sub-Nyquist samples through l1 minimization as in Section III-A.

Let us describe the operation of the RD more formally. An analog signal as described in (1) is combined with a pseudorandom bipolar sequence of unitary amplitude defined as:

PRS(t) = \varepsilon_n, \quad t \in \left[ \frac{n}{C}, \frac{n+1}{C} \right), \quad n = 0, 1, \ldots, N-1 \qquad (7)

where \varepsilon_n is a Rademacher sequence that switches between the values {−1, +1} at a rate C = W.

The combined signal x(t)PRS(t) is passed through an anti-aliasing LPF h(t) and sampled at a rate R < W to obtain linear compressive samples y[m]. This procedure can be expressed as a multiplication followed by a convolution in the time domain:

y[m] = \left. \int_{-\infty}^{\infty} x(\tau)\, PRS(\tau)\, h(t - \tau)\, d\tau \right|_{t = mR} \qquad (8)

     = \sum_{n=1}^{N} \alpha_n \int_{-\infty}^{\infty} \psi_n(\tau)\, PRS(\tau)\, h(mR - \tau)\, d\tau \qquad (9)

from which we obtain an expression for Θ, whose entries θ_{m,n} for row m and column n are defined as:

\theta_{m,n} = \int_{-\infty}^{\infty} \psi_n(\tau)\, PRS(\tau)\, h(mR - \tau)\, d\tau \qquad (10)

^a In the traditional presentation of the RD architecture, the filtering is described as being performed by an integrator. Often, however, in practice this integrator is considered as, or replaced by, an LPF [23, 32, 35–37]. In our implementation of the RD, the filtering is performed using an LPF, and it will be referred to and represented as such in the remainder of the paper.
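As a rough discrete-time sketch of this acquisition chain (our own illustration: in the real system the mixing and filtering happen in the analog domain, and the fourth-order Butterworth LPF and per-sample chipping used here are assumptions, not the paper's exact components):

```python
import numpy as np
from scipy.signal import butter, lfilter

def random_demodulator(x, prs, W, R):
    """Demodulate x with a bipolar chipping sequence, low-pass
    filter at R/2, then decimate to the sub-Nyquist rate R (cf. eq. (8))."""
    demod = x * prs                        # mixer: x(t) * PRS(t)
    b, a = butter(4, (R / 2) / (W / 2))    # LPF h(t) with cutoff R/2
    filtered = lfilter(b, a, demod)
    return filtered[:: int(W // R)]        # back-end ADC samples y[m]

# Toy usage: a 2 s two-tone signal at W = 48 kHz, sampled down to R = 3 kHz.
W, R, Ts = 48_000, 3_000, 2.0
t = np.arange(int(W * Ts)) / W
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
prs = np.random.default_rng(0).choice([-1.0, 1.0], size=x.size)  # Rademacher
y = random_demodulator(x, prs, W, R)  # 6000 compressive samples instead of 96000
```

The key point of the sketch is the sample-count reduction: a 96,000-sample input yields only 6,000 back-end measurements.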
[Figure: sorted coefficient magnitudes (log scale) of the 96,000-sample average audio signals versus sorted indices, illustrating their compressibility.]

[Figure: spectra (power in dB versus frequency in kHz) of 2-state Markov chain-generated PRS(t) for transition probabilities p = 0.555, 0.445, 0.223, and 0.001.]
[Figure: spectra (power in dB versus frequency in kHz) of 2-state Markov chain-generated PRS(t) for transition probabilities p = 0.999, 0.888, 0.666, 0.555, 0.445, 0.223, and 0.001.]

FIGURE 6. Spectra of test signals xLF, xMF, xHF, and xBB. Each signal has the same bandlimit but a different frequency composition (low-frequency, mid-frequency, high-frequency, and broadband).

[Figure: two-state Markov chain (State 1, State 2) used to generate the bipolar PRS(t); each state outputs one of the two polarities.]
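A two-state chain like the one above can be sketched in a few lines, where p is the probability of switching state (and hence polarity) at each chip; this is our own minimal sketch, and the paper's actual generator and parameterization may differ.

```python
import numpy as np

def markov_prs(n_chips, p, seed=0):
    """Bipolar +/-1 sequence generated by a 2-state Markov chain.

    p is the per-chip probability of switching polarity: p = 0.5 gives a
    white, Rademacher-like sequence; p < 0.5 yields long runs and thus
    concentrates power at low frequencies; p > 0.5 favors rapid
    alternation and concentrates power at high frequencies.
    """
    rng = np.random.default_rng(seed)
    out = np.empty(n_chips)
    state = 1.0
    for i in range(n_chips):
        out[i] = state
        if rng.random() < p:
            state = -state
    return out

low = markov_prs(10_000, p=0.001)   # slowly switching, low-pass-like spectrum
high = markov_prs(10_000, p=0.999)  # rapidly switching, high-pass-like spectrum
```

Varying p therefore shapes the sequence's spectrum without any hardware change, which is the property the Spectral Shaper block relies on.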
FIGURE 8. Reconstructed test signals x̂LF, x̂MF, x̂HF, and x̂BB using 2-state Markov chain-generated PRS(t) with transition probability p. Each row represents one of the four test signals presented in Figure 6, and each column represents a different value of p.
and comparing their REs to those of their single chain-generated counterparts shown in Table 3. The spectra of the optimal dual chain-generated sequences and of the reconstructed test signals are shown in Figure 10, and the optimal dual chain reconstruction parameters are summarized in Table 4. These results show that using dual chain-generated sequences improves reconstruction performance compared to single chain-generated sequences, even if only marginally.
• The purpose of the second reconstruction test is to visualize the effects of any mismatches between the input signal and the dual chain-generated sequence. This is achieved by reconstructing the four test signals using PRSC(t) designed to maximize reconstruction RE and comparing the spectra of the original and reconstructed signals. The spectra of the reconstructed signals and of their respective sequences are shown in Figure 11. These results show the impact of a mismatched PRSC(t) on signal reconstruction, underlining the importance of using an appropriately designed PRSC(t).
• The purpose of the third reconstruction test is to visualize the filtering properties of the PRSC(t). We reconstruct the broadband signal, xBB, using four PRSC(t) sequences whose spectra can be likened to a low-pass, band-pass, and high-pass filter for the first three signals, and with the spectrum of the final sequence resembling white noise. The spectra of the bipolar sequences and of the reconstructed test signals are shown in Figure 12. These results show that in addition to having a direct impact on reconstruction performance, a tailored PRSC(t) can attenuate or amplify the frequency content of the input signal.
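The reconstructions above are obtained via l1 minimization; as an illustrative stand-in for readers without a convex solver, a compact orthogonal matching pursuit (OMP) — a different, greedy sparse-recovery method, not the solver used in the paper — recovers a k-sparse α from y = Θα:

```python
import numpy as np

def omp(Theta, y, k):
    """Greedy orthogonal matching pursuit for k-sparse recovery."""
    _, n = Theta.shape
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(Theta.T @ residual)))  # best-matching atom
        if j not in support:
            support.append(j)
        # least-squares refit over the atoms selected so far
        coef, *_ = np.linalg.lstsq(Theta[:, support], y, rcond=None)
        residual = y - Theta[:, support] @ coef
    alpha = np.zeros(n)
    alpha[support] = coef
    return alpha

# Toy problem: a 3-sparse vector observed through a random 32x64 matrix.
rng = np.random.default_rng(0)
Theta = rng.standard_normal((32, 64)) / np.sqrt(32)
alpha_true = np.zeros(64)
alpha_true[[5, 20, 40]] = [3.0, -2.0, 1.5]
alpha_hat = omp(Theta, Theta @ alpha_true, k=3)
```

With a well-conditioned Θ and sufficiently few nonzeros, OMP typically recovers the support {5, 20, 40} exactly.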
FIGURE 9. Reconstructed test signals x̂LF, x̂MF, x̂HF, and x̂BB using 4-state Markov chain-generated PRS(t) with transition probability p. Each row represents one of the four test signals presented in Figure 6, and each column represents a different value of p.
F. FEATURE EXTRACTION AND CLASSIFICATION
The Feature Extraction and Classification section is composed of a feature extraction block and a classifier block. In the feature extraction block, we extract a set of 5 features from the y[m] measurements produced by the RD. We select the 5 most important features out of the 9 used in our preliminary work [1] to use in our C-AVDI system:
• mean
• standard deviation
• median
• absolute max value
• interquartile range
In the classifier block, these extracted features are used as inputs to a multiclass classifier to detect and identify passing vehicles. Classification is performed using an RF classifier, chosen for its robustness in the presence of outliers, inherent suitability for multiclass classification, and minimal preprocessing requirements (no input data rescaling required). The RF is implemented using the scikit-learn library [41].

V. EVALUATION
System evaluation is performed using a software-based implementation of the proposed system with audio data collected from vehicles traversing a university campus.

A. DATA ACQUISITION
The data acquisition setup is shown in Figure 13. A pair of Azden SGM-990 microphones are installed parallel to a one-lane two-way road and connected to a Sony HDR-MV1 video camera. The microphones are 1 m from the ground, the intra-microphone distance is 50 cm, the distance between the microphones and the center of the front lane is 3 m, and the distance between the microphones and the center of the back lane is 6 m. The microphones' pickup pattern is set to
FIGURE 10. Spectra of (a) PRSC and (b) reconstructed test signals x̂LF, x̂MF, x̂HF, and x̂BB obtained during the first reconstruction test, with the corresponding p1, p2, and respective chain lengths indicated in parentheses above each subplot. [Subplot parameters: x̂LF: p1 = 0.666, p2 = 0.223 (2|4); x̂MF: p1 = 0.999, p2 = 0.999 (2|4); x̂HF: p1 = 0.555, p2 = 0.445 (2|2); x̂BB: p1 = 0.666, p2 = 0.888 (2|4).]
FIGURE 11. Spectra of (a) PRSC and (b) reconstructed test signals x̂LF, x̂MF, x̂HF, and x̂BB obtained during the second reconstruction test, with the corresponding p1, p2, and respective chain lengths indicated in parentheses above each subplot. [Subplot parameters: x̂LF: p1 = 0.999, p2 = 0.999 (2|4); x̂MF: p1 = 0.999, p2 = 0.999 (2|2); x̂HF: p1 = 0.001, p2 = 0.999 (2|4); x̂BB: p1 = 0.999, p2 = 0.999 (2|2).]
FIGURE 12. Spectra of (a) PRSC and (b) reconstructed test signal x̂BB obtained during the third reconstruction test, with the corresponding p1, p2, and respective chain lengths indicated in parentheses above each subplot. [Subplot parameters: p1 = 0.999, p2 = 0.999 (2|2); p1 = 0.999, p2 = 0.999 (2|4); p1 = 0.001, p2 = 0.999 (4|2); p1 = 0.666, p2 = 0.666 (2|4).]
TABLE 4. Optimal dual chain reconstruction parameters and relative errors (REs).

1st Chain | 2nd Chain | Signal | p1    | p2    | RE [×10⁻⁵]
2-state   | 2-state   | x̂LF    | 0.666 | 0.001 | 6.64
2-state   | 2-state   | x̂MF    | 0.888 | 0.555 | 9.20
2-state   | 2-state   | x̂HF    | 0.555 | 0.445 | 6.64
2-state   | 2-state   | x̂BB    | 0.666 | 0.445 | 12.2
2-state   | 4-state   | x̂LF    | 0.666 | 0.223 | 6.52
2-state   | 4-state   | x̂MF    | 0.999 | 0.999 | 7.98
2-state   | 4-state   | x̂HF    | 0.999 | 0.001 | 6.81
2-state   | 4-state   | x̂BB    | 0.666 | 0.888 | 11.2
4-state   | 2-state   | x̂LF    | 0.223 | 0.666 | 6.52
4-state   | 2-state   | x̂MF    | 0.999 | 0.999 | 7.98
4-state   | 2-state   | x̂HF    | 0.001 | 0.999 | 6.81
4-state   | 2-state   | x̂BB    | 0.888 | 0.666 | 11.2
4-state   | 4-state   | x̂LF    | 0.999 | 0.999 | 6.68
4-state   | 4-state   | x̂MF    | 0.999 | 0.001 | 8.12
4-state   | 4-state   | x̂HF    | 0.555 | 0.445 | 7.00
4-state   | 4-state   | x̂BB    | 0.888 | 0.666 | 11.3

FIGURE 13. Experimental setup.

cardioid, and they record the sound of passing vehicles for a 30-minute duration at a sample rate of 48 kHz and a bit depth of 16 bits. We average the signals obtained from the two microphones to create a mono signal used in subsequent analysis.

We obtain 14 different datasets by recording vehicle sounds at the same location with the same setup on 14 different days; at different times of day; on different days of the week; and under clear, windy, and rainy weather conditions. The average vehicle signal plots using data obtained under all weather conditions are shown in Figure 2, and the individual average vehicle signal plots for each weather condition are shown in Figure 14.

The three classes considered for classification are cars, scooters/motorbikes, and no vehicles. We refer to these as Car, Scooter, and NoVeh, respectively. The time at which a given vehicle passes in front of the middle of the microphone pair is defined as tp, and the time window for the vehicle as Tr = [tp − Ts/2, tp + Ts/2], where Ts = 2 s. As we are not seeking to detect successively or simultaneously passing vehicles in this evaluation, we retain only the vehicle signals
Computer
Vehicle Sound
ADC y[m]
3kHz
Random 12 Bits Feature Extraction
Demodulator and Classification
whose Tr does not overlap with those of the preceding or following signals. The total number of vehicle sounds obtained for each class on each day can be seen in Table 5.

FIGURE 14. Average audio signals for three vehicle classes recorded in a) clear, b) windy, and c) rainy conditions. [Power (dB) versus frequency (0-3 kHz), with the band of interest marked.]

B. SYSTEM SIMULATION
In this section, we evaluate the system described in Section IV-A by simulating its operation in software. Figure 15 shows an overview of the software implementation of the system proposed in Figure 1. As previously mentioned, our proposed system operates by taking an audio signal emitted from a passing vehicle as input, sampling it at a sub-Nyquist rate, and outputting a predicted vehicle type. System performance is evaluated using the accuracy metric.

As seen in Section IV-B, the switching rate C of a bipolar sequence is defined as twice the highest frequency contained in x(t). Typically, C would be set to 48 kHz to match the sample rate of the microphone's ADC, ensuring that the entirety of the input signal's frequency content has at least a partial tone signature representation at baseband. We, however, are interested in the information contained in a narrow band whose upper limit is 3 kHz, so we set the PRS(t) switching rate to C = 6 kHz accordingly. This limits the amount of high-frequency content spread across the frequency spectrum during the demodulation process: in our case, we can expect all the frequency components below 3 kHz in the input signal to have at least a partial tone signature representation at baseband, but not the components located above 3 kHz.
FIGURE 17. Compressive measurements y[m] drawn from the average audio signals of a) Cars, b) Scooters, and c) No Vehicles. Subplot d) shows the respective normalized frequency representations of the y[m] measurements. [Amplitude over 0-6000 measurements for a)-c); power (dB) versus normalized frequency (π rads/measurement) for d).]
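A normalized-frequency view in the style of Figure 17 d) can be produced with a periodogram. The sketch below uses a synthetic stand-in for the y[m] measurements, not the recorded data; setting fs=2 places the frequency axis on a 0-1 scale in units of π rads/measurement.

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(1)
y = rng.normal(size=6000)              # synthetic stand-in for the M = 6000 measurements

# fs=2 puts f on a 0..1 scale, i.e. units of pi rads/measurement as in Figure 17 d)
f, pxx = periodogram(y, fs=2.0)
power_db = 10 * np.log10(pxx + 1e-20)  # log scale; small offset avoids log(0)
print(f[0], f[-1])   # 0.0 1.0
```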
We are looking to design a PRS(t) that, when used to demodulate an input signal, will strongly emphasize the frequency content present in the band of interest outlined in Figure 2 and Figure 14. This same sequence will be used when acquiring data under all three weather conditions, as the location of the frequency band of interest is the same in all three cases.

We create a PRSC[n] ∈ R^N with switching rate C, from PRS1[n] ∈ R^N, a 4-state chain with p = 0.001, and PRS2[n] ∈ R^N, a 2-state chain with p = 0.999. The spectrum of our PRSC[n] is shown in Figure 16.

Our input signal is represented as a discrete version of the analog input signal x(t), defined as x[n] ∈ R^N sampled at rate W = 48 kHz during the data acquisition process. We approximate the sparsity level of x(t) as K = 400 by determining the number of DFT coefficient magnitudes shown in Figure 3 such that |αn| ≥ 10^-1, and rounding to a single significant figure.

The anti-aliasing LPF preceding the ADC is a 2nd-order Butterworth filter, and the measurements y[m] ∈ R^M are obtained by sampling and quantizing the combined x[n] ⊙ PRSC[n] signal (where ⊙ signifies elementwise multiplication) at rate R = 3 kHz and bit depth B = 12 bits. This corresponds to a reduction in sample rate by a factor of (48 kHz / 6 kHz)(6 kHz / 3 kHz) = 16.

Similarly to Section IV-D, the optimal value of M for our system is determined by first establishing a theoretical lower-bound value that is incrementally increased until we obtain the optimal balance between the smallest number of y[m] samples and maximum system prediction accuracy. We find this value to be M = 6000 and list it, along with the rest of the parameters used in our system simulation, in Table 6.

The discrete-time and frequency-domain representations of the average y[m] measurements for each class obtained by our simulated system are shown in Figure 17.

C. CLASSIFICATION
Classification is performed on 14 feature sets, which are obtained by extracting the 5 features outlined in Section IV-F from the y[m] measurements of each of the 14 datasets described in Section V-A.

We measure the performance of our system using leave-
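The two-chain construction behind PRSC[n] can be sketched as follows. For simplicity, both chains are generated as two-state flip processes; the paper's PRS1[n] is actually a 4-state chain (per the constrained random demodulator of [23]), and the elementwise-product combination shown here is an assumption, not the exact rule defined in Section IV.

```python
import numpy as np

def markov_prs(n, p, rng):
    """Bipolar +/-1 two-state chain: the sign flips with probability p per step.
    p near 0 -> long runs (energy concentrated at low frequencies);
    p near 1 -> near-constant alternation (energy near the Nyquist rate)."""
    flips = np.where(rng.random(n) < p, -1.0, 1.0)
    # a cumulative product turns per-step flip indicators into a +/-1 state sequence
    return np.cumprod(flips)

rng = np.random.default_rng(0)
N = 4096
prs1 = markov_prs(N, 0.001, rng)   # slowly switching chain (p = 0.001)
prs2 = markov_prs(N, 0.999, rng)   # rapidly alternating chain (p = 0.999)
# One simple way to combine them into a band-pass-shaped sequence (assumed here)
prs_c = prs1 * prs2
print(prs_c.shape)
```

Multiplying a low-frequency sequence by a near-Nyquist one shifts the low-frequency energy toward the alternation rate, which is the intuition behind shaping the PRS spectrum onto a band of interest.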
TABLE 5. Number of vehicle sounds obtained for each class on each day.

Day | Weather | Cars | Scooters | No Vehicles | Total
1 | Clear | 60 | 8 | 104 | 172
2 | Clear | 21 | 10 | 45 | 76
3 | Wind | 21 | 40 | 64 | 125
4 | Clear | 40 | 57 | 115 | 212
5 | Clear | 52 | 50 | 112 | 214
6 | Rain | 43 | 26 | 77 | 146
7 | Rain | 77 | 30 | 152 | 259
8 | Clear | 177 | 95 | 305 | 577
9 | Wind | 174 | 103 | 267 | 544
10 | Clear | 143 | 99 | 239 | 481
11 | Wind | 129 | 84 | 234 | 447
12 | Clear | 130 | 59 | 197 | 386
13 | Wind | 119 | 51 | 186 | 356
14 | Rain | 228 | 65 | 317 | 610

FIGURE 18. Confusion matrix of the proposed system's software implementation. [Actual versus estimated type over NoVeh, Car, Scooter; recoverable rows: Car 8.77 / 73.95 / 17.27, Scooter 1.62 / 8.64 / 89.74.]
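The accuracy metric reported alongside these confusion matrices is the sum of the diagonal over the total count. A minimal sketch, using a hypothetical count-valued matrix (the figures above show row-normalized percentages instead):

```python
import numpy as np

# Hypothetical 3-class confusion matrix in counts, rows = actual (NoVeh, Car, Scooter)
cm = np.array([[900,  60,  40],
               [ 70, 590, 140],
               [ 20,  90, 890]])
accuracy = np.trace(cm) / cm.sum()   # correct predictions / all predictions
print(round(accuracy, 3))            # 0.85
```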
FIGURE 19. Overview of our proposed system's MCU implementation: the Spectral Shaper combines a 1st, 4-state chain (p1 = 0.001, producing PRS1[n]) with a 2nd, 2-state chain (p2 = 0.999, producing PRS2[n]) into PRSC[n]; the Random Demodulator computes x[n] ⊙ PRSC[n], applies a 1.5 kHz LPF, and samples the result with a 3 kHz, 12-bit ADC to produce y[m] on the computer; the 5 extracted features are then passed to a Random Forest classifier on the MCU.

We port the software implementations onto the MCU using the emlearn library [42].

Figure 19 shows the MCU implementation of the proposed system: the Random Demodulator and Spectral Shaper sections are implemented in software, and the Feature Extraction and Classification section is implemented on the MCU.

Figure 20 shows the resulting confusion matrix. We obtain an accuracy of 72%, which is lower than the accuracy obtained by the simulated system in Section V-B. Upon inspection, we find that the features extracted from y[m] in the feature extraction block in both the software and MCU implementations of our system are identical. From this, we can infer that the difference in accuracy between the two implementations is due to the difference in size and complexity between the ported and original versions of the classifier models. Indeed, the limited memory of the MCU restricts the number of trees in the RF classifiers to 100, compared to the 1000 used in the software implementation, while also

FIGURE 20. Confusion matrix of the proposed system's MCU implementation. [Actual versus estimated type over NoVeh, Car, Scooter; recoverable rows: NoVeh 81.11 / 11.74 / 7.15, Scooter 3.37 / 5.94 / 90.69.]

VI. DISCUSSION
Board (MCU) | Volatile Memory | Non-Volatile Memory | ADC Resolution | Power Consumption
Teensy 4.1^b (iMX RT1062^c) | 1024 kB | 7936 kB | 12 bits | 10.0 mW at 24 MHz
LaunchPad^d (MSP-EXP430FR599^e) | 8 kB | 256 kB | 12 bits | 5.9 mW at 16 MHz
Arduino Uno^f (ATMega328P^g) | 2 kB | 32 kB | 10 bits | 26.0 mW at 8 MHz
PIC18F452^h | 8 kB | 32 kB | 10 bits | 8.0 mW at 4 MHz

b https://www.pjrc.com/store/teensy41.html
c https://www.embeddedartists.com/products/imx-rt1062-oem/, based on iMX RT1060: https://www.nxp.com/docs/en/application-note/AN12245.pdf
d https://www.ti.com/lit/pdf/slau678
e https://www.ti.com/lit/ds/slase54d/slase54d.pdf?ts=1626677655975
f https://www.farnell.com/datasheets/1682209.pdf
g http://ww1.microchip.com/downloads/en/DeviceDoc/ATmega48A-PA-88A-PA-168A-PA-328-P-DS-DS40002061A.pdf
h https://ww1.microchip.com/downloads/en/DeviceDoc/39564c.pdf
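The memory limits listed above are what force the ported RF classifiers down to 100 trees from the 1000 used in software. A minimal sketch of training both model sizes with scikit-learn is shown below; the data here is a random stand-in for the 5-feature sets, and the emlearn call (shown as a comment, per the library's convert/save interface [42]) is how the smaller model would then be emitted as a C header for the MCU.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy stand-in for the 5-feature datasets: 300 samples, 3 classes
X = rng.normal(size=(300, 5))
y = rng.integers(0, 3, size=300)

full = RandomForestClassifier(n_estimators=1000, random_state=0).fit(X, y)   # software model
small = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)   # MCU-sized model

# Porting the smaller model to C for the MCU is then a single emlearn call, e.g.:
#   import emlearn
#   emlearn.convert(small).save(file="classifier.h")
print(full.n_estimators, small.n_estimators)
```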
[7] Y. Schulz, A. K. Mattar, T. M. Hehn et al., "Hearing what you cannot see: Acoustic vehicle detection around corners," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2587-2594, 2021.
[8] E. Alexandre, L. Cuadra, S. Salcedo-Sanz et al., "Hybridizing extreme learning machines and genetic algorithms to select acoustic features in vehicle classification applications," Neurocomput., vol. 152, no. C, pp. 58-68, Mar. 2015. [Online]. Available: https://doi.org/10.1016/j.neucom.2014.11.019
[9] H. Chen and Z. Zhang, "Hybrid neural network based on novel audio feature for vehicle type identification," Sci Rep, vol. 11, Apr. 2021.
[10] L. Becker, A. Nelus, J. Gauer et al., "Audio feature extraction for vehicle engine noise classification," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 711-715.
[11] S. Ishida, J. Kajimura, M. Uchino et al., "SAVeD: Acoustic vehicle detector with speed estimation capable of sequential vehicle detection," in Proc. IEEE Conf. Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 906-912.
[12] M. Uchino, B. Dawton, Y. Hori et al., "Initial design of two-stage acoustic vehicle detection system for high traffic roads," in 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). Los Alamitos, CA, USA: IEEE Computer Society, Mar. 2020, pp. 1-6. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/PerComWorkshops48775.2020.9156248
[13] K. Kubo, C. Li, S. Ishida et al., "Design of ultra low power vehicle detector utilizing discrete wavelet transform," in Proc. ITS AP Forum, May 2018, pp. 1052-1063.
[14] M. Uchino, S. Ishida, K. Kubo et al., "Initial design of acoustic vehicle detector with wind noise suppressor," in Proc. Int. Workshop on Pervasive Computing for Vehicular Systems (PerVehicle), Mar. 2019, pp. 814-819.
[15] S. Ishida, M. Uchino, C. Li et al., "Design of acoustic vehicle detector with steady-noise suppression," in Proc. IEEE Conf. Intelligent Transportation Systems (ITSC), Oct. 2019.
[16] E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.
[17] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006.
[18] P. Prince, A. Hill, E. Piña Covarrubias et al., "Deploying acoustic detection algorithms on low-cost, open-source acoustic sensors for environmental monitoring," Sensors, vol. 19, no. 3, 2019. [Online]. Available: https://www.mdpi.com/1424-8220/19/3/553
[19] A. Fukuda, K. Hisazumi, T. Mine et al., "Toward sustainable smart mobility information infrastructure platform: Project overview," in New Trends in E-Service and Smart Computing, Studies in Computational Intelligence, T. Matsuo, T. Mine, and S. Hirokawa, Eds. Springer, Jan. 2018, vol. 742, pp. 35-46.
[20] R. Schwartz, J. Dodge, N. A. Smith et al., "Green AI," Commun. ACM, vol. 63, no. 12, pp. 54-63, Nov. 2020. [Online]. Available: https://doi.org/10.1145/3381831
[21] M. A. Davenport, P. T. Boufounos, M. B. Wakin et al., "Signal processing with compressive measurements," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 445-460, Apr. 2010.
[22] M. Mishali, Y. C. Eldar, O. Dounaevsky et al., "Xampling: Analog to digital at sub-Nyquist rates," IET Circuits Devices Syst., vol. 5, no. 1, pp. 8-20, 2011. [Online]. Available: https://doi.org/10.1049/iet-cds.2010.0147
[23] A. Harms, W. U. Bajwa, and A. R. Calderbank, "A constrained random demodulator for sub-Nyquist sampling," IEEE Trans. Signal Process., vol. 61, no. 3, pp. 707-723, 2013. [Online]. Available: https://doi.org/10.1109/TSP.2012.2231077
[24] D. Mo and M. F. Duarte, "Design of spectrally shaped binary sequences via randomized convex relaxations," in 2015 49th Asilomar Conference on Signals, Systems and Computers, 2015, pp. 164-168.
[25] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.
[26] E. Candes and M. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21-30, Mar. 2008.
[27] M. Rani, S. B. Dhok, and R. B. Deshmukh, "A systematic review of compressive sensing: Concepts, implementations and applications," IEEE Access, vol. 6, pp. 4875-4894, 2018.
[28] T. Wimalajeewa and P. Varshney, "Compressive sensing based classification in the presence of intra- and inter-signal correlation," IEEE Signal Processing Letters, vol. 25, pp. 1-1, Jul. 2018.
[29] X. Liu, E. Gönültaş, and C. Studer, "Analog-to-feature (A2F) conversion for audio-event classification," in 2018 26th European Signal Processing Conference (EUSIPCO), Sep. 2018, pp. 2275-2279.
[30] S. Kirolos, J. N. Laska, M. Wakin et al., "Analog-to-information conversion via random demodulation," in 2006 IEEE Dallas/CAS Workshop on Design, Applications, Integration and Software, 2006, pp. 71-74.
[31] J. N. Laska, S. Kirolos, M. F. Duarte et al., "Theory and implementation of an analog-to-information converter using random demodulation," in 2007 IEEE International Symposium on Circuits and Systems, 2007, pp. 1959-1962.
[32] J. A. Tropp, J. N. Laska, M. F. Duarte et al., "Beyond Nyquist: Efficient sampling of sparse bandlimited signals," IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 520-544, Jan. 2010.
[33] T. Haque, R. T. Yazicigil, K. J.-L. Pan et al., "Theory and design of a quadrature analog-to-information converter for energy-efficient wideband spectrum sensing," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 2, pp. 527-535, 2015.
[34] M. A. Lexa, M. E. Davies, and J. S. Thompson, "Reconciling compressive sampling systems for spectrally sparse continuous-time signals," IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 155-171, 2012.
[35] P. J. Pankiewicz, T. Arildsen, and T. Larsen, "Sensitivity of the random demodulation framework to filter tolerances," in 2011 19th European Signal Processing Conference, 2011, pp. 534-538.
[36] J. Zhang, N. Fu, W. Yu et al., "Analysis of non-idealities of low-pass filter in random demodulator," in Eighth International Symposium on Precision Engineering Measurement and Instrumentation, J. Lin, Ed., vol. 8759. International Society for Optics and Photonics, SPIE, 2013, pp. 428-435. [Online]. Available: https://doi.org/10.1117/12.2015006
[37] W. Xu, Y. Cui, Y. Wang et al., "A hardware implementation of random demodulation analog-to-information converter," IEICE Electronics Express, vol. 13, no. 16, pp. 20160465-20160465, 2016.
[38] R. Baraniuk, J. N. Laska, M. A. Davenport et al., An Introduction to Compressive Sensing. OpenStax, Apr. 2011. [Online]. Available: https://legacy.cnx.org/content/col11133/latest/
[39] A. Harms, W. U. Bajwa, and A. R. Calderbank, "Shaping the power spectra of bipolar sequences with application to sub-Nyquist sampling," in 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2013), St. Martin, France, Dec. 15-18, 2013. IEEE, 2013, pp. 236-239. [Online]. Available: https://doi.org/10.1109/CAMSAP.2013.6714051
[40] E. van den Berg and M. P. Friedlander, "SPGL1: A solver for large-scale sparse reconstruction," Dec. 2019, https://friedlander.io/spgl1.
[41] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[42] J. Nordby, "emlearn: Machine learning inference engine for microcontrollers and embedded devices," Mar. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.2589394

BILLY DAWTON (S'20) received the B.E. degree in Electrical and Electronic Engineering from The University of Sussex, Brighton, UK, in 2014, and the M.Sc. degree in Wireless and Optical Communications from University College London, London, UK, in 2015. He is currently a doctoral student with the Department of Information Science and Electrical Engineering at Kyushu University, Fukuoka, Japan. His current research focuses on new sensing technologies for intelligent transportation system applications, in particular low-cost, low-complexity acoustic vehicle detection and identification systems.