Calibration and Utilization of SMART-EAR


Noise Classification Using Sound Monitoring,
Assessment, and Recording Tool for
Environmental Acoustic Research (SMART-EAR)

by
Carl Kevin L. Mirhan

Undergraduate thesis submitted to the faculty of the


Department of Physics
in partial fulfillment of the requirements for the degree of

Bachelor of Science in Applied Physics

Department of Physics
School of Arts and Sciences
University of San Carlos, Cebu City, Philippines
Contents
Introduction........................................................................................................................1

1.1 Rationale....................................................................................................................1

1.2 Objectives..................................................................................................................4

1.3 Scope and Limitations................................................................................................5

Theory.................................................................................................................................6

2.1 Sound.........................................................................................................................6

2.2 Sound Pressure Level.................................................................................................8

2.3 Noise..........................................................................................................................8

2.4 Frequency Domain Analysis......................................................................................9

2.5 The Sampling Theorem............................................................................................12

2.6 Frequency Weighting...............................................................................................13

2.7 Calibration Methods.................................................................................................16

2.8 Principal Component Analysis................................................................................18

2.9 K-means Clustering.................................................................................................20

2.10 Logistic Regression................................................................................................21

2.11 Receiver Operating Characteristic.........................................................................21

Methodology.....................................................................................................................23

3.1 Calibration................................................................................................................23

3.1.1 Data Acquisition................................................................................................23

3.1.2 Data Analysis....................................................................................................25

3.1.3 Calibration Proper.............................................................................................26

3.2 Field Test.................................................................................................................27

3.2.1 Data Acquisition................................................................................................27

3.2.2 Preparation of Frequency Spectrums................................................................28

3.2.3 Principal Component Analysis..........................................................................31

3.2.4 Annotation.........................................................................................................31

3.2.5 Machine Learning Prediction and Accuracy Check..........................................32

3.3 Application of Logistic Regression Model..........................................................33

Results and Discussion....................................................................................................34

4.1 Calibration................................................................................................................34

4.1.1 Single Sensitivity Calculation...........................................................................34

4.1.2 Per Frequency Correction Factor Calculation...................................................36

4.2 Field Test.................................................................................................................41

4.2.1 Data Acquisition................................................................................................41

4.2.2 PCA and Annotation.........................................................................................42

4.2.3 K-means Clustering Results..............................................................................45

4.2.4 Logistic Regression Results..............................................................................54

4.3 Application of Logistic Regression Model..........................................................59

Conclusion and Recommendations................................................................................64

Bibliography......................................................................................................................68

List of Figures
Figure 2.1 A time domain signal (Sum) broken down into its frequency components....10

Figure 2.2 Comparison between correct sampling and undersampling a signal..............12

Figure 2.3 Equal loudness contours of the human ear for pure tones [9].........................13

Figure 2.4 SPL gains according to different frequencies for A, C and Z weightings [14]..15

Figure 2.5 An example of the microphone response with respect to frequency [17].......18

Figure 2.6 A sample plot of 50 observations [19]............................................................19

Figure 2.7 A plot of the same 50 observations with respect to their PC’s [19]................19

Figure 3.1 The SMART-EAR system [21].......................................................................24

Figure 3.2 Schematic diagram of experimental setup for calibration...............................25

Figure 3.3 The SMART-EAR system set up in West City Homes..................................28

Figure 3.4 (a) Reference sample in the time domain and (b) Frequency spectrum of the
reference sample................................................................................................................30

Figure 3.5 The SMART-EAR setup in Sidlakan Marketing............................................33


Figure 4.1 Graph of RMS recorded by test and reference microphone shows frequency
dependency........................................................................................................................35

Figure 4.2 Graph of RMS from the time domain vs the RMS from the frequency domain..37

Figure 4.3 Correction factors for each frequency present in the pink and white noise....38

Figure 4.4 SPL comparison of pure tones after correction factor application..................39

Figure 4.5 SPL comparison of pink and white noise after correction factor application. 40

Figure 4.6 The reference spectrum plotted against itself..................................................41

Figure 4.7 PC plot with labelling according to general time period.................................42

Figure 4.8 PC Plot According to Annotation 1.................................................................43

Figure 4.9 PC Plot According to Annotation 2.................................................................44

Figure 4.10 K-means clustering results............................................................................45

Figure 4.11 Confusion matrix of k-means vs. annotation 1 true values...........................46

Figure 4.12 Confusion matrix of k-means vs. annotation 2 true values...........................46

Figure 4.13 a) Frequency spectrum that contains the barking found in the reference
sample and at a similar intensity and b) The first test frequency spectrum plotted against
the reference frequency spectrum......................................................................................48

Figure 4.14 a) Frequency spectrum that contains the barking found in the reference
sample but not at the same intensity and b) The second test frequency spectrum plotted
against the reference..........................................................................................................50

Figure 4.15 a) Frequency spectrum that contains barking from different dogs found in
the reference sample and b) The third test frequency spectrum plotted against the
reference frequency spectrum............................................................................................52

Figure 4.16 Confusion matrix of logistic regression predictions using a model fitted with
annotation 1 labels vs. annotation 1 true values................................................................54

Figure 4.17 Confusion matrix of logistic regression predictions using a model fitted with
annotation 1 labels vs. annotation 2 true values................................................................55

Figure 4.18 Confusion matrix of logistic regression predictions using a model fitted with
annotation 2 labels vs. annotation 2 true values................................................................56

Figure 4.19 Confusion matrix of logistic regression predictions using a model fitted with
annotation 2 labels vs. annotation 1 true values................................................................57

Figure 4.20 PC plot of the 85 samples according to annotation 1 categorization............59

Figure 4.21 PC plot of the 85 samples according to annotation 2 categorization............60

Figure 4.22 Confusion matrix of the first iteration using the samples from Sidlakan
Marketing...........................................................................................................................61

Figure 4.23 a) Frequency spectrum of the sample with the ambulance siren and b)
Spectrum comparison of the sample with the ambulance and the reference sample.........62

Figure 4.24 Confusion matrix of the second iteration using the samples from Sidlakan
Marketing...........................................................................................................................62

List of Tables

Table 1.1 Rules and Regulations of the National Pollution Control Commission (NPCC)..3

Table 4.1 Different calculated values of sensitivity for each frequency...........................36

Table 4.2 Accuracy and standard error values for predictions made by k-means............59

Table 4.3 Accuracy and standard error values for predictions made by logistic regression..59

Chapter 1

Introduction

1.1 Rationale

Sound is a vital aspect of our day-to-day lives. Whether in the form of speech, music, or ambience (a ringing alarm or a humming machine), sound is constantly generated around us. However, while there are desirable and pleasant sounds, undesirable and unwanted sound is also present in our immediate surroundings. This undesirable sound, called noise, is often perceived as an environmental stressor and nuisance.

According to the World Health Organization, noise is an important public health issue, featured among the top environmental risks to health. [1] Although people often grow accustomed to the noise levels in their vicinity, chronic exposure above certain levels leads to negative health outcomes. [2] The commonality of these negative health outcomes is evident in the sheer number of studies correlating

noise and health. For example, in 2010, Vos et al. conducted a Global Burden of Disease

Study and


estimated that hearing loss affected 1.3 billion people, ranking it the 13th most important contributor to global years lived with disability (YLD). The study also found that adult-onset hearing loss unrelated to a specific disease process accounted for 79.0% of the total YLD from hearing loss. [3]

Stansfeld et al. also compiled a thorough study on noise and its many non-auditory effects on health. Ranging from sleep disturbance to cardiovascular disease to cognitive difficulties in children, the study showed that noise pollution and its effects extend much further than the aforementioned auditory complications. [4]

With rising concern over noise and its effects on society comes an increasing need to study environmental noise levels and, most importantly, to monitor them. Numerous acoustic sensor monitoring systems have been deployed across the

world. Noriega-Linares et al. utilized a Single Board Computer (SBC) known as

Raspberry Pi as a cost-efficient and customizable acoustic sensor. They created a fully

functional sensor with cloud connectivity, on-board calculations and real-time data

presentation remotely and online. In their pilot test, two devices were deployed in a local neighborhood in Spain, where they analyzed the sound field through long-term measurements, achieving precise calculations as well as the remote transmission and publication of the data obtained. [5]

Whytock et al. also developed an audio recorder which they named Solo using

Raspberry Pi for bioacoustics research. In their study, they were able to deploy around 40 Solo units which gathered 52,381 hours of audio recordings at a sampling rate of 16 kHz. Spectrograms of frequency vs. time showed that the extracted data from the

recorded bird songs of specific species could be accurately utilized to differentiate one

species from another. [6]



In the Philippines, acoustic monitoring is currently a necessity. The 1980 amendment to the Noise Control Regulations requires a specific maximum sound level for different classes of areas at different periods of the day. [7] Shown in Table 1.1 are the different categories of areas and their respective noise level regulations as set by the National Pollution Control Commission (NPCC). The tabulation

shows that for areas that require quietness, especially schools and homes for the aged, the

maximum allowable noise level is at 50 dB in the day and that for residential areas, the

maximum allowable noise level is at 55 dB. This coincides with the WHO guidelines for

noise control which recommends that for road traffic noise, one of the most common

sources of noise pollution in the Philippines, a noise level of 53 dB is the maximum as

road traffic noise above this level is associated with adverse health effects. [1]

Table 1.1 Rules and Regulations of the National Pollution Control Commission (NPCC)

The difficulty that arises concerning these regulations is that there is a lack of

proper implementation. A study performed by Vergel et al. discovered that tricycles in

Metro Manila exhibited noise levels that far exceeded the WHO recommended 53 dB.

Using a sound level meter, they measured the noise levels of tricycles traveling along the

road within the vicinity of major residential areas. They found that the noise levels

generated by tricycles ranged from 88 to 100 dBA, where dBA denotes the A-weighted noise level, a correction applied to a measured or calculated sound to mimic the ear's varying sensitivity to different frequencies. This range comes from

the variation of the load carried by the tricycles as well as the speed and the slope of the

road on which they were travelling. The study concluded that measured roadside noise

levels at a residential area with high tricycle traffic exceeded the local noise standards at

all times of the day. [8]

1.2 Objectives

This study aims to calibrate as well as test the effectiveness of an easily deployable acoustic

sensor system in the Philippines. The system known as SMART-EAR, or Sound

Monitoring, Assessment, and Recording Tool for Environmental Acoustic Research, will

be the primary focus of this study. Because the microphone attached to the SMART-EAR

device is uncalibrated, a comparison would have to be made between the device’s

microphone and a laboratory standard microphone to ensure accurate results.

Specifically, their computed SPL readings will be the main parameter to be compared.

This is important because with the help of this system, noise level data will be accurately

gathered across the



day, and noise levels in schools, offices and residential areas can be monitored with ease.

This study also aims to analyze the frequency spectrums extracted from the

recordings to characterize specific acoustic activities. A residential area, for example, can

easily be identified by the presence of dogs barking in the neighborhood. This will be the

primary activity used in this study and, by using a reference sample where this activity

was prevalent, samples where the activity took place can be identified and clustered. This

study will also attempt to identify and cluster samples that contained barking in general.

1.3 Scope and Limitations

This study will focus on the calibration of the SMART-EAR system using the computer

software, LabVIEW. It will not tackle the complete development of the system itself.

That can be found in Fornis, R.’s study entitled “Development of Sound Monitoring,

Assessment, and Recording Tool for Environmental Acoustic Research (SMART-

EAR)”[21].

During this study, gathering of data will only be carried out in one residential area

as well as one commercial area. West City Homes, a subdivision located in Labangon,

Cebu City as well as Sidlakan Marketing, a rice trading business located at Tabo-an,

Cebu City, will be the areas utilized for this study. For the sake of privacy, data

gathered by the SMART-EAR system will be analyzed in the form of a frequency

spectrum and not of a frequency-time spectrum. The acoustic activity used as a reference

in this study will be the barking of a female Yorkshire Terrier living in the subdivision.
Chapter 2

Theory

Before addressing the calibration as well as the application of the SMART-EAR system,

it is important to review what sound is and how it is produced. This would, in turn, give

greater insight into how the SMART-EAR system operates so that it can be applied and used as efficiently as possible. The concepts of sound pressure level, frequency weighting,

Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT) and noise will be

properly addressed in this section. Common microphone calibration methods will also be

discussed here.

2.1 Sound

Sound acts as a stimulus via the propagation of pressure changes in a wave motion across

an elastic medium. In the case of human beings, this medium is usually air. When this

wave of pressure changes reaches our ears, our sense of hearing is excited which then

translates


into our generalized perception. [9] These pressure changes are brought about by

mechanical vibrations of the objects surrounded by the medium.

Because sound is a propagation of pressure changes, it makes sense that, to measure sound, one would measure its sound pressure, denoted by p. This is, in fact, the most accessible parameter to measure. However, with regard to its

effects, the energy content of a certain sound signal over a period of time is more relevant

than its instantaneous value and thus, the root mean square (RMS) value is of more

importance. This RMS value is defined as [10]:

\tilde{p} = \sqrt{ \frac{1}{T} \int_0^T p^2(t)\, dt }    (2.1)

where T is the averaging time.
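As a sketch of how Eq. 2.1 applies to sampled data, the integral is replaced by a mean over the samples. The signal below is a hypothetical 1 kHz pure tone, not data from this study:

```python
import numpy as np

def rms(p):
    """Root mean square of a sampled pressure signal p (Eq. 2.1, discretized)."""
    return np.sqrt(np.mean(np.square(p)))

# Example: a 1 kHz sine of amplitude 1 Pa sampled at 44.1 kHz for 1 second
t = np.arange(0, 1.0, 1 / 44100)
p = np.sin(2 * np.pi * 1000 * t)
print(rms(p))  # a sine's RMS is its amplitude over sqrt(2), about 0.707
```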

The pressure fluctuation, however, is very small compared to normal atmospheric air pressure: the faintest perceivable sound is of the order of 20 μPa, or 2 × 10⁻⁵ Pa. On the other hand, the upper limit of perceivable sound, often called the threshold of pain, is of the order of 20 Pa. [11] This tells us that audible sound pressures span about six orders of magnitude, since

\frac{20\ \mathrm{Pa}}{2 \times 10^{-5}\ \mathrm{Pa}} = 10^6

2.2 Sound Pressure Level

Because of the large dynamic range of audible sounds, the strength of a sound is best

described as a logarithm of the sound pressure. This logarithm is commonly known as the

Sound Pressure Level, or SPL, and is defined by:

L = 10 \log_{10} \left( \frac{p^2}{p_0^2} \right) = 20 \log_{10} \left( \frac{p}{p_0} \right)    (2.2)

where L is the SPL, p denotes the pressure of a given sound signal at a given time, and p_0 = 2 × 10⁻⁵ Pa is the internationally standardized acoustic reference pressure. [11] This measurement has units of decibels (dB).
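Equation 2.2 can be illustrated with a short Python sketch; the pressure values below are illustrative, chosen to reproduce the thresholds of hearing and pain noted earlier:

```python
import numpy as np

P0 = 2e-5  # internationally standardized reference pressure in Pa

def spl(p):
    """Sound pressure level in dB for a sound pressure p in Pa (Eq. 2.2)."""
    return 20 * np.log10(p / P0)

print(spl(2e-5))   # 0 dB: the threshold of hearing
print(spl(20.0))   # about 120 dB: the threshold of pain
```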

2.3 Noise

As defined previously in the introduction, noise is any undesirable sound, often perceived

as an environmental stressor. It is worth noting that noise is rarely of a constant sound

pressure and therefore, its strength fluctuates over time. Keeping this in mind, a need for

an equivalent sound pressure level is important and this is attained if the RMS of sound

pressure is utilized instead of its instantaneous value.



By using Eq. 2.1 and applying it to Eq. 2.2, the equivalent SPL is:

L_{eq} = 10 \log_{10} \left( \frac{\tilde{p}^2}{p_0^2} \right) = 20 \log_{10} \left( \frac{ \sqrt{ \frac{1}{T} \int_0^T p^2(t)\, dt } }{ p_0 } \right)    (2.3)

This equation gives us a constant SPL over an averaging time T that carries the same total energy as the varying SPL over that same interval. In most guidelines and evaluations, the equivalent SPL (or noise level) is one of the most frequently utilized quantities.
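A minimal sketch of Eq. 2.3 in Python; the signal is a hypothetical calibrator-like tone of 1 Pa RMS, not data from this study:

```python
import numpy as np

P0 = 2e-5  # standardized reference pressure in Pa

def equivalent_spl(p):
    """Equivalent SPL (Eq. 2.3): the SPL of the RMS pressure over the window."""
    p_rms = np.sqrt(np.mean(np.square(p)))
    return 20 * np.log10(p_rms / P0)

# A sine of amplitude sqrt(2) Pa has an RMS of 1 Pa, which corresponds
# to roughly 94 dB (a level commonly produced by acoustic calibrators)
t = np.arange(0, 1.0, 1 / 44100)
p = np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)
print(equivalent_spl(p))  # close to 93.98 dB
```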

2.4 Frequency Domain Analysis

Signals are generally represented in the time domain, which simply provides the

amplitudes of a signal at the instants of time during which it was sampled. Fourier’s

theorem, however, states that a signal x(t) can be expressed as a sum of sinusoids of different frequencies. Graphing these frequency constituents along with their corresponding amplitudes produces the frequency spectrum of x(t), which is a frequency domain description instead of a time domain one.

Breaking down a time-domain signal into its frequency domain components

requires the utilization of the Fourier transform given by the following equation [12]:


X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt    (2.4)

where x(t) is the time domain signal and X(f) is its Fourier transform.

Figure 2.1 A time domain signal (Sum) broken down into its frequency components

Consequently, the time signal can be retrieved from its frequency components by

the inverse transform given by the formula


x(t) = \int_{-\infty}^{\infty} X(f)\, e^{2\pi i f t}\, df    (2.5)

Because this study deals with discretized, digital signals, Equations 2.4 and 2.5 cannot be applied directly since they only deal with continuous signals over a certain period.

The Fourier transform for discrete samples is known as the Discrete Fourier Transform

(DFT) and is characterized by the following formula [12]:

X(f) = \frac{1}{N} \sum_{n=0}^{N-1} x(t)\, e^{-2\pi i f t / N}    (2.6)

Also, the Discrete Inverse Fourier Transform is given by:

x(t) = \sum_{n=0}^{N-1} X(f)\, e^{2\pi i f t / N}    (2.7)

Computing the DFT directly from this definition is often a slow and impractical way of transforming a signal from the time domain to its frequency spectrum. To overcome this obstacle, most algorithms employ the Fast Fourier Transform (FFT), which not only reduces the number of operations from the order of N^2 to N \log_2 N but also retains all the properties of the DFT. [12] LabVIEW, the primary software to be utilized in this study, employs the FFT in calculating the DFT of a time-domain signal.
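LabVIEW's FFT routines are what this study uses; purely as an illustration of the same decomposition, here is a hedged NumPy sketch with a hypothetical two-tone signal:

```python
import numpy as np

fs = 1000                 # hypothetical sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

X = np.fft.rfft(x)                      # FFT of the real-valued signal
freqs = np.fft.rfftfreq(len(x), 1 / fs) # frequency of each bin
amps = np.abs(X) / len(x) * 2           # single-sided amplitude spectrum

# The two strongest bins recover the 50 Hz and 120 Hz components
peaks = freqs[np.argsort(amps)[-2:]]
print(sorted(peaks.tolist()))  # [50.0, 120.0]
```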

To check the accuracy of the FFT algorithm, Parseval’s Theorem will be used.

The theorem states that the total energy computed in the time domain must equal the total

energy computed in the frequency domain. [14] It is a statement of conservation of

energy defined by the following equation in the discrete form:

\sum_{i=0}^{n-1} |x_i|^2 = \frac{1}{n} \sum_{k=0}^{n-1} |X_k|^2    (2.8)

where x_i and X_k are discrete FFT pairs and n is the number of samples in the sequence.
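Parseval's check can be carried out numerically; the 1/n factor mirrors Eq. 2.8 (the test signal here is arbitrary random data, not measurements from this study):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)    # arbitrary test signal
X = np.fft.fft(x)

time_energy = np.sum(np.abs(x) ** 2)
freq_energy = np.sum(np.abs(X) ** 2) / len(x)  # Eq. 2.8, right-hand side

print(np.isclose(time_energy, freq_energy))  # True: energy is conserved
```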



2.5 The Sampling Theorem

Real signals are continuous-time, analog signals. This poses a problem for computers and

sensors which operate on discretized, digital data. Processing a continuous-time signal

through a discrete-time system bridges the gap between the continuous-time and discrete-

time worlds. The difference between the highest and lowest frequencies of the spectral

components of a signal is the bandwidth of the signal and the Sampling Theorem states

that a real signal whose spectrum is bandlimited to f max Hz, can be reconstructed exactly

from its samples taken uniformly at a rate f s ≥ 2 f max samples per second. The minimum

sampling rate is therefore [16]:

f_s = 2 f_{max}    (2.9)

This bandlimit f_max is called the Nyquist frequency, while the minimum sampling rate f_s is called the Nyquist rate. Applying this theorem removes aliasing, or the poor representation of signals, since the signal would be adequately sampled. An example of

the difference between an aliased signal and an adequately sampled signal is shown in

Figure 2.2.

Figure 2.2 Comparison between correct sampling and undersampling a signal
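The effect in Figure 2.2 can also be demonstrated numerically. In this hedged sketch, a hypothetical 7 Hz tone sampled at 10 Hz (below its 14 Hz Nyquist rate) produces exactly the same samples as a 3 Hz tone:

```python
import numpy as np

fs_low = 10                       # sampling rate below the Nyquist rate for 7 Hz
t = np.arange(20) / fs_low

# A 7 Hz tone undersampled at 10 Hz is indistinguishable from a 3 Hz tone
x7 = np.cos(2 * np.pi * 7 * t)
x3 = np.cos(2 * np.pi * 3 * t)
print(np.allclose(x7, x3))        # True: the 7 Hz signal aliases to 3 Hz

# Sampling at fs >= 2*f_max (here 20 Hz > 14 Hz) removes the ambiguity
fs_ok = 20
t_ok = np.arange(40) / fs_ok
print(np.allclose(np.cos(2 * np.pi * 7 * t_ok),
                  np.cos(2 * np.pi * 3 * t_ok)))  # False
```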



2.6 Frequency Weighting

In the case of human hearing, sensitivity is frequency dependent. This means that,

subjectively, comparing two tones of different frequencies will not sound equally loud

even if they both have the same SPL. [11] This was demonstrated in a study by Robinson et al. in 1956, wherein they employed the constant stimulus method. Participants in their

study were tasked to make comparisons between a pure tone of constant sound pressure

level and frequency and another pure tone of 1 kHz which had randomly varied pressure

levels. [13] They termed these loudness levels as ‘phon’ and because a 1 kHz tone was

used as reference, the phon would be equivalent to the sound pressure level of that 1 kHz

tone. Although this procedure required averaging of numerous results, the overall data

was consistent and so they compiled their findings in the figure below:

Figure 2.3 Equal loudness contours of the human ear for pure tones [9]

An example of this finding would be that an 80 dB, 50 Hz tone only generates a

loudness level of 60 phon while an 80 dB, 1 kHz tone generates a loudness level of 80

phon. This leads us to the conclusion that human hearing is very much frequency

dependent and so, when using a measuring device or sensor, a method is needed to mimic

this frequency response.

To bridge this gap between objective sensor measurements and subjective human

hearing, frequency weightings are used. These are networks with frequency dependent

gains, and the International standard for sound level meters, IEC 61672-1, commonly

uses the A, C and Z weightings. The attenuation for decibel readings may be obtained by

the following formula:

A_{weight}(f) = 20 \log_{10}(W(f))    (2.10)

where W ( f ) is the weight function in terms of frequency.

The Z weighting has no filter applied on the signal and its gain is therefore 0. Its

weight function is defined by:

W_Z(f) = 1    (2.11)
The C weighting is a network wherein a filter is applied up to a certain frequency cutoff point. It is primarily used to assess noise with low frequency content and focuses primarily on peak values of the signal. [14] Decibel gains for each frequency are computed using Equation 2.10, and its weight function is defined by:

W_C(f) = 1.007152 \left( \frac{(f/20.6\ \mathrm{Hz})^2}{1 + (f/20.6\ \mathrm{Hz})^2} \right) \left( \frac{1}{1 + (f/12194\ \mathrm{Hz})^2} \right)    (2.12)

The A weighting is similar to the C weighting except the filter is applied up to a

higher frequency cutoff point. It is mainly applied for general sound level measurement

and is most commonly used in occupational safety and health acts. Its weight function is

defined by:

W_A(f) = 1.258905 \left( \frac{(f/20.6\ \mathrm{Hz})^2}{1 + (f/20.6\ \mathrm{Hz})^2} \right) \left( \frac{f/107.7\ \mathrm{Hz}}{\sqrt{1 + (f/107.7\ \mathrm{Hz})^2}} \right) \left( \frac{f/737.9\ \mathrm{Hz}}{\sqrt{1 + (f/737.9\ \mathrm{Hz})^2}} \right) \left( \frac{1}{1 + (f/12194\ \mathrm{Hz})^2} \right)    (2.13)
Figure 2.4 shows a graph of the SPL gains plotted against the different

frequencies for different frequency weightings:

Figure 2.4 SPL gains according to different frequencies for A, C and Z weightings [14]
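The weight functions in Equations 2.11 to 2.13 can be sketched directly; as a sanity check, both the A and C gains should be approximately 0 dB at the 1 kHz reference frequency (an illustrative implementation, with frequencies in Hz):

```python
import numpy as np

def w_c(f):
    """C-weighting weight function (Eq. 2.12)."""
    r1 = (f / 20.6) ** 2 / (1 + (f / 20.6) ** 2)
    r2 = 1 / (1 + (f / 12194) ** 2)
    return 1.007152 * r1 * r2

def w_a(f):
    """A-weighting weight function (Eq. 2.13)."""
    r1 = (f / 20.6) ** 2 / (1 + (f / 20.6) ** 2)
    r2 = (f / 107.7) / np.sqrt(1 + (f / 107.7) ** 2)
    r3 = (f / 737.9) / np.sqrt(1 + (f / 737.9) ** 2)
    r4 = 1 / (1 + (f / 12194) ** 2)
    return 1.258905 * r1 * r2 * r3 * r4

def gain_db(w, f):
    """Decibel gain of weight function w at frequency f (Eq. 2.10)."""
    return 20 * np.log10(w(f))

print(gain_db(w_a, 1000))  # close to 0 dB at the 1 kHz reference
print(gain_db(w_c, 1000))  # also close to 0 dB
print(gain_db(w_a, 50))    # strongly negative: low frequencies are attenuated
```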

2.7 Calibration Methods

The focus of this study is to successfully calibrate as well as use the SMART-EAR

system in a practical setting. A digital microphone records sound as a voltage signal, and the microphone's sensitivity is used to convert the recorded voltage into a pressure value, as shown below:

P = \frac{V}{\mathrm{Sensitivity}}    (2.14)

Barrera-Figueroa et al. [15] discussed two important methods of microphone calibration. The first is the sequential method, wherein the

microphone under test (UT) and the reference microphone (REF) are located at the same

spatial position and are made to record the signal from a certain sound source one after

the other. The drawback of this method is that, for the sake of voltage comparison

between the two microphones, the sound signal must be temporally stable.

The second method, which is the simultaneous method, gets rid of this temporal

stability requirement by recording the same signal simultaneously. However, its

drawback is also that, at both positions, the sound pressure from the signal must be equal.

Typically, a distance of around 2 meters will be observed between the sound

source and the microphones. This is to ensure that the wave front of the signal will be flat

when it reaches the microphones. Another important aspect is that the room in which

these methods will be employed in must be anechoic. Considering this, the sequential

method is more advantageous in minimizing the effects of sound reflection that would

occur in the room since both microphones would be located in the same place. The

equation used to calibrate the microphone using the comparison method is shown by:

\mathrm{Sensitivity}_{UT} = \mathrm{Sensitivity}_{REF} \left( \frac{V_{UT}}{V_{REF}} \right)    (2.15)

where V_{UT} is the output voltage of the microphone under test and V_{REF} is the output voltage of the reference microphone.
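A one-line sketch of the comparison method in Eq. 2.15; the 50 mV/Pa reference sensitivity and the voltages below are hypothetical values, not measurements from this study:

```python
def sensitivity_ut(sens_ref, v_ut, v_ref):
    """Comparison-method sensitivity of the microphone under test (Eq. 2.15)."""
    return sens_ref * (v_ut / v_ref)

# Hypothetical values: a 50 mV/Pa reference mic; the test mic outputs half
# the voltage of the reference for the same sound, so its sensitivity is half
print(sensitivity_ut(50e-3, v_ut=12e-3, v_ref=24e-3))  # 0.025 V/Pa
```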

The comparison method, however, can only be applied to test microphones with a

flat frequency response. In the case of calibrating a microphone whose response varies

depending on the frequency being sampled, the frequency spectrums for both the

microphone under test and the reference microphone would have to be analyzed. In a

study published by Garg et al. in 2018, they employed a novel per-frequency averaging method for calibrating smartphone microphones. [17] After

simultaneously recording environmental noise as uncompressed WAV files using the

smartphone microphone (test) and a class 1 microphone (reference), the average

frequency spectrums with a frequency resolution of 1 Hz for both systems were

computed. They did this by dividing the time signal into a set of overlapping frames and

computing the average FFT for all frames. The correction factors were then obtained by

taking the difference of the two spectrums obtained from the WAV files. The researchers

used the following equation to define the ith correction factor:


\[
c_i = 10 \log_{10} \left( \frac{|X_i|^2}{|Y_i|^2} \right) \tag{2.16}
\]

where X_i and Y_i are the ith coefficients of the frequency spectrum for the reference and smartphone microphones, respectively.
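A minimal sketch of Eq. 2.16, with hypothetical spectra in which the test microphone reads every bin at half the reference magnitude:

```python
import numpy as np

def correction_factors(ref_spectrum, test_spectrum):
    """Eq. 2.16: c_i = 10 log10(|X_i|^2 / |Y_i|^2), one value per frequency bin."""
    return 10.0 * np.log10(np.abs(ref_spectrum) ** 2 / np.abs(test_spectrum) ** 2)

# Hypothetical spectra: the test reads each bin at half the reference magnitude,
# so every bin needs the same +10 log10(4) = +6.02 dB correction
X = np.array([1.0, 2.0, 4.0])
Y = X / 2.0
print(correction_factors(X, Y))
```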



Figure 2.5 An example of the microphone response with respect to frequency [17]

2.8 Principal Component Analysis

In this study, each sample will be broken down into its constituent frequency SPL values. Because the goal of this study is to identify characteristic acoustic events in the samples, it would be extremely difficult to compare every frequency across all samples. Principal Component Analysis (PCA) reduces the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set [19]. It is an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some scalar projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on [20]. Each new component is termed a Principal Component (PC).



Figure 2.6 A sample plot of 50 observations [19]

To visualize this process, suppose we have 50 samples, each of which is composed of two variables x_1 and x_2. PCA focuses on the variances of the two variables and transforms them into two principal components z_1 and z_2. The transformed plot is shown in Figure 2.7.

Figure 2.7 A plot of the same 50 observations with respect to their PC’s [19]

Mathematically, the transformation is defined by a set of l p-dimensional vectors of weights or coefficients w_(k) = (w_1, ..., w_p)_(k) that map each row vector x_(i) to a new vector of principal component scores t_(i) = (t_1, ..., t_l)_(i), given by [19]

\[
t_{k(i)} = \mathbf{x}_{(i)} \cdot \mathbf{w}_{(k)}, \qquad i = 1, \ldots, n, \quad k = 1, \ldots, l \tag{2.17}
\]

In the scenario that there are more than two variables, PCA would still work to reduce the dimensionality of the data. For example, in this study, a single recording is composed of thousands of frequencies, each having its own value. PCA would reduce these thousands of variables into a specified number of PCs. By reducing the number of variables to work with, PCA is a fundamental tool in making the data easier to visualize and understand.
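The projection in Eq. 2.17 can be sketched in a few lines of NumPy (an SVD-based sketch, not the scikit-learn routine used later in this study; the data below are hypothetical):

```python
import numpy as np

def pca_scores(X, n_components):
    """Project centered data onto the top principal directions (Eq. 2.17)."""
    Xc = X - X.mean(axis=0)                    # center each variable
    # Rows of Vt are the weight vectors w(k), ordered by variance explained
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T
    return Xc @ W                              # t(i) = x(i) . w(k)

# Hypothetical data: 50 observations of two strongly correlated variables
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, 2.0 * x1 + 0.1 * rng.normal(size=50)])
T = pca_scores(X, n_components=2)
print(T.shape)  # (50, 2); nearly all variance lies along the first column
```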

2.9 K-means Clustering

K-means clustering is an unsupervised machine learning method which partitions n observations into a defined target number of k clusters. Each cluster is a collection of data points aggregated together because of similarities in their features. Once a target number of clusters has been defined, k-means utilizes an expectation-maximization algorithm to locate the best centroid for each cluster. Once centroids have been located, each data point is allocated to a cluster by minimizing the in-cluster sum of squares. Because k-means is unsupervised, labelling of the data is not needed to carry out clustering; labels are, however, important for checking the accuracy of the clustering.
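The alternating assignment and update steps described above can be sketched as a minimal Lloyd's algorithm (a NumPy sketch, not the scikit-learn module used in this study; the two clusters and the seeded centroids are hypothetical):

```python
import numpy as np

def kmeans(X, init_centroids, n_iter=50):
    """Minimal Lloyd's algorithm: alternate assignment (E-step) and update (M-step)."""
    centroids = np.asarray(init_centroids, dtype=float)
    k = len(centroids)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # E-step: assign each point to its nearest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # M-step: move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Two well-separated hypothetical clusters of 20 points each;
# for this sketch the centroids are seeded with one point from each blob
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)), rng.normal(5.0, 0.2, (20, 2))])
labels, _ = kmeans(X, init_centroids=X[[0, 20]])
print(sorted(set(labels[:20].tolist())), sorted(set(labels[20:].tolist())))  # [0] [1]
```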

2.10 Logistic Regression

Whereas k-means is an unsupervised machine learning method, logistic regression is a supervised learning method. In logistic regression, data samples with an already existing indicator variable are used to train a logistic model. This model predicts the probability of a certain class or event and, because this is a form of binary regression, there are only two possible predictions (e.g. pass or fail) every time a new data sample is fed into the model. In this study, the threshold is set at 0.5. In other words, if the computed probability of a sample falls below 0.5, the model classifies the sample as 0; if the computed probability is 0.5 or greater, the model classifies the sample as 1.
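The thresholding step can be sketched as follows (a toy one-feature model, not the scikit-learn module used in this study; the fitted coefficients w and b are hypothetical):

```python
import math

def predict_label(x, w, b, threshold=0.5):
    """Logistic model: sigmoid of a linear score, thresholded at 0.5."""
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return (1 if p >= threshold else 0), p

# Hypothetical fitted coefficients: w = 2, b = -1
label, p = predict_label(1.0, w=2.0, b=-1.0)
print(label, round(p, 3))  # 1 0.731: sigmoid(1) is above the 0.5 threshold
label_lo, p_lo = predict_label(0.0, w=2.0, b=-1.0)
print(label_lo, round(p_lo, 3))  # 0 0.269: below the threshold, classified as 0
```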

2.11 Receiver Operating Characteristic

To test the accuracy of the clustering by the k-means algorithm as well as the predictions made by the logistic model, the Receiver Operating Characteristic (ROC) will be used. Data samples in this study are either positive or negative for a certain acoustic activity. ROC summarizes the results of both k-means and logistic regression by presenting, in a confusion matrix, the number of true positive (positive samples correctly labelled as positive), true negative (negative samples correctly labelled as negative), false positive (negative samples incorrectly labelled as positive, "false alarms"), and false negative (positive samples incorrectly labelled as negative, "misses") cases. The accuracy in both methods is defined by the following formula:

\[
\text{Accuracy (ACC)} = \frac{\sum \text{True Positive} + \sum \text{True Negative}}{\sum \text{Total Population}} \tag{2.18}
\]
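Tallying the four confusion-matrix cells and applying Eq. 2.18 can be sketched as follows; the example labels are hypothetical:

```python
def confusion_and_accuracy(y_true, y_pred):
    """Tally TP/TN/FP/FN and compute the accuracy of Eq. 2.18."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # misses
    acc = (tp + tn) / len(y_true)
    return (tp, tn, fp, fn), acc

# Hypothetical labels: one false alarm and one miss among six samples
counts, acc = confusion_and_accuracy([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(counts)  # (2, 2, 1, 1)
print(acc)     # accuracy = (2 + 2) / 6
```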
Chapter 3

Methodology

This chapter will focus on describing in detail the calibration of the Sound Monitoring,

Assessment, and Recording Tool for Environmental Acoustic Research (SMART-EAR)

system using the LabVIEW software as well as the deployment and analysis of the

recordings gathered.

3.1 Calibration

3.1.1 Data Acquisition

The SMART-EAR system comprises a Raspberry Pi 3 Model B+ single-board computer with built-in data storage. It utilizes an external ADC and, to record sound, a ReSpeaker 6-Mic Circular Array Kit attached to the Raspberry Pi. It runs on a modified Linux operating system named Raspbian which is already fitted with programs to record sound. It can record continuously at a sampling rate of 16 kHz on all six channels and stores the retrieved signal in the form of .wav files.

Figure 3.1 The SMART-EAR system [21]

The reference system used is a Brüel & Kjær (B&K) microphone attached to a National Instruments data acquisition module (NI DAQ). The NI DAQ was in turn connected to a computer running the LabVIEW software. This type of microphone is a laboratory-standard microphone characterized by a flat frequency response. Output signals generated by any sound source were measured using the B&K microphone from 1 Hz to 20 kHz. Through a program built in LabVIEW, the generated signal was sent to the computer for data logging and, by specifying the sampling rate, was stored at the same rate as the system under test. This recording was also stored in the form of .wav files.

For data acquisition, a speaker generating pink noise at varying sound pressure levels was used. Pure tones were also utilized in this study to test whether the test microphone has a flat or a frequency-dependent response. These signals were simultaneously recorded by the SMART-EAR system and the B&K microphone. Both setups were situated at least 2 meters from the sound source, as shown in Figure 3.2.

Figure 3.2 Schematic diagram of experimental setup for calibration

3.1.2 Data Analysis

A separate LabVIEW program was created to access a specified directory and read the .wav file stored there. The recorded waveform was then represented by the program as an array of amplitude values in the time domain. From that array, the program obtained the RMS value as well as the calculated SPL reading, based on equations 2.1 and 2.3. Because Eq. 2.3 calls for the pressure of the signal, the known sensitivity of the reference microphone was used to convert the signal from its RMS voltage to the corresponding pressure value.

For the frequency domain analysis, a second LabVIEW program was created that performed the Fast Fourier Transform (FFT) on the array based on the formula given by equation 2.6. However, to acquire a more accurate average frequency spectrum for each sample, the array of amplitude values in the time domain was first segmented into overlapping frames, as Garg et al. implemented in their study. The FFTs of these overlapping frames were then averaged to generate an average frequency spectrum. It is noted that the discrete Fourier transform is often defined with an additional factor of 1/N [18]; this normalization factor was implemented in the LabVIEW program. The results of the spectrum calculation were checked using Parseval's theorem (Eq. 2.8) to ensure that each sample would produce the same calculated RMS voltage in both the time domain and the frequency domain.
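The frame averaging and the Parseval check can be sketched in Python (the study itself used LabVIEW; the test tone, frame length, and hop size below are hypothetical):

```python
import numpy as np

def averaged_spectrum(x, frame_len, hop):
    """Average |FFT| magnitudes over overlapping frames, with 1/N normalization."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    mags = [np.abs(np.fft.fft(f)) / frame_len for f in frames]
    return np.mean(mags, axis=0)

# Hypothetical test signal: a 1 kHz tone sampled at 16 kHz for one second
fs, f0 = 16000, 1000.0
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t)

# Parseval's theorem (Eq. 2.8): the RMS agrees in the time and frequency domains
X = np.fft.fft(x)
rms_time = np.sqrt(np.mean(x ** 2))
rms_freq = np.sqrt(np.sum(np.abs(X) ** 2)) / len(x)
print(np.isclose(rms_time, rms_freq))  # True

spec = averaged_spectrum(x, frame_len=2048, hop=1024)
print(spec.argmax())  # 128, the bin at f0 * frame_len / fs
```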

3.1.3 Calibration Proper

The samples used for calibration were white noise as well as pure tones, all designed to lie within a certain operating frequency range. Because the ReSpeaker 6-Mic Circular Array Kit attached to the Raspberry Pi can only record at a sampling rate of 16 kHz, only frequencies up to 8 kHz were used as audio samples to ensure that there would be no undersampling, in observance of the constraints given by the Sampling Theorem.

For microphones with a flat frequency response, calibration would be done using the comparison method. By plotting the RMS voltage values of the samples obtained from the test microphone against the RMS voltage values of the samples from the reference microphone, a line of best fit would provide a sensitivity value for the microphone under test through equation 2.15. This value should hold regardless of the signal being used to calibrate the system.



However, in this study, no single sensitivity value could be applied to the system under test. Calibration then had to be carried out in the frequency domain. The approach employed here was similar to the one taken by Garg et al., wherein per-frequency correction factors were defined using equation 2.16. Applying these per-frequency correction factors to the recordings of the SMART-EAR system resulted in a corrected RMS voltage. The correct RMS pressure could then be obtained by combining the corrected RMS voltage with the sensitivity of the reference microphone.

To further verify the results of calibration, the correction factors were applied to the white noise and pink noise recordings obtained by the SMART-EAR system. Once the correction factors were applied, the RMS voltage values were divided by the sensitivity of the reference microphone, recorded at 0.04701 V/Pa, to obtain the corresponding corrected RMS pressures. Equation 2.2 was then used to calculate the SPL values. The SPL of each sample of the white noise and pink noise recordings of the reference microphone was calculated using LabVIEW and compared with the corresponding calibrated SPL.
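The conversion chain just described (corrected RMS voltage to RMS pressure via the reference sensitivity, then to SPL) can be sketched as follows; the SPL relation of Eq. 2.2 is assumed here to be the standard 20 log10(p/p0) with p0 = 20 µPa, and the input voltage is hypothetical:

```python
import math

SENSITIVITY_REF = 0.04701  # V/Pa, measured sensitivity of the reference B&K microphone
P_REF = 20e-6              # Pa, standard reference pressure (assumed for Eq. 2.2)

def spl_from_voltage(v_rms):
    """Convert a corrected RMS voltage to SPL via the reference sensitivity."""
    p_rms = v_rms / SENSITIVITY_REF          # Pa
    return 20.0 * math.log10(p_rms / P_REF)  # dB SPL

# Hypothetical reading: 0.04701 V RMS corresponds to exactly 1 Pa, i.e. ~94 dB SPL
print(round(spl_from_voltage(0.04701), 1))  # 94.0
```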

3.2 Field Test

3.2.1 Data Acquisition

Once calibrated, the SMART-EAR system was first deployed in West City Homes, a small subdivision, for field testing. The system was placed in an open garage situated right beside a main road of the subdivision to protect the device from the elements.

Figure 3.3 The SMART-EAR system set up in West City Homes

Using the arecord software on the Raspberry Pi, recordings were gathered over three 24-hour periods. Each sample was 2.5 minutes long and the system was set to record every 5 minutes. The .wav files were stored on a USB drive and transferred to a desktop for analysis.

3.2.2 Preparation of Frequency Spectrums

For this section of the study, preparation of the spectrums was carried out following the documentation prepared for the SMART-EAR system. Using the Python3 program written by Fornis, R., the frequency spectrum of each recording was extracted and saved as a .csv file. The program also applied the correction factors to the frequency spectrums. As recommended by the study of Fornis, R., considering only the spectrum contributions from 19.5 Hz to 8 kHz gives good accuracy: by discarding the frequencies below this range, the results of the per-frequency correction calibration became more accurate and fell below the most lenient tolerance of ±1.5 dB [21]. The same was carried out in this study. After obtaining the frequency spectrum of each .wav file, the voltage values for the first 19 frequencies were discarded and only the frequencies from 20 Hz to 8 kHz were used for further analysis.

A Python3 program was used to convert the voltage values from the frequency spectrum to their corresponding dB values. It is useful to note that these values are not dB SPL (the measurement of volume level in the real world) but dB Full Scale, or dBFS, the measurement of digital level relative to the maximum value.
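The band restriction and the dBFS conversion can be sketched as follows; it is assumed here that bin i of the 1 Hz resolution spectrum corresponds to (i + 1) Hz, so dropping the first 19 bins leaves the 7981 bins from 20 Hz to 8 kHz, and the flat example spectrum is hypothetical:

```python
import numpy as np

def band_dbfs(spectrum, full_scale):
    """Keep the 20 Hz - 8 kHz bins (1 Hz resolution) and convert to dBFS."""
    band = spectrum[19:8000]                    # drop the first 19 bins (below 20 Hz)
    return 20.0 * np.log10(band / full_scale)   # dB relative to full scale

# Hypothetical flat spectrum at half of full scale
spectrum = np.full(8000, 0.5)
dbfs = band_dbfs(spectrum, full_scale=1.0)
print(len(dbfs), round(dbfs[0], 2))  # 7981 bins, each at -6.02 dBFS
```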

After the frequency spectrums of each sample were prepared, a reference sample

was selected. For this study, the acoustic activity that was used as a reference is the

barking of a certain female Yorkshire Terrier in the subdivision. Figure 3.4a shows the

reference sample in the time domain while Figure 3.4b shows the frequency spectrum of

the reference sample.



Figure 3.4 (a) Reference sample in the time domain and (b) Frequency spectrum of the
reference sample

3.2.3 Principal Component Analysis

Each sample was then plotted against the reference sample and Principal Component Analysis (PCA) was performed on each comparison. The PCA algorithm used in this study is the one provided by the Python3 scikit-learn library. For this study, each (7981, 2) matrix, being the comparison of the reference frequency spectrum and a test frequency spectrum against each other, was reduced to a (1, 2) matrix. These values are principal component 1 (PC1) and principal component 2 (PC2). The principal component values of each sample with respect to the reference sample were stored in another .csv file.

3.2.4 Annotation

Annotation was carried out using two categories. For the first category, samples that

contained the exact barking found in the reference sample were annotated with a 1 and

those without the bark were annotated with a 0. This specific categorization will be

referred to as annotation 1 for the rest of this study.

For the second category, samples that contained any form of barking (even those

of dogs not present in the reference sample) were annotated with a 1. Samples without

any barking were annotated with a 0. This general categorization will be referred to as

annotation 2 for the rest of this study.



3.2.5 Machine Learning Prediction and Accuracy Check

Both k-means clustering and logistic regression were performed on the annotated data sets. This study utilized the machine learning modules provided by the Python3 scikit-learn library.

For k-means, accuracy was computed by comparing the clustered labels to the true values of both annotation 1 and annotation 2. Confusion matrices were also generated from the results of the clustering.

As for the logistic regression method, a randomly selected 30% of the total number of annotated samples was used to train the model, as this is the standard percentage used in most machine learning studies. For the first iteration, the model was trained with samples categorized by annotation 1 and its predictions were compared to the true values of annotation 1. For the second iteration, the model was trained with samples categorized by annotation 1 and its predictions were compared to the true values of annotation 2. For the third iteration, the model was trained with samples categorized by annotation 2 and its predictions were compared to the true values of annotation 2. For the final iteration, the model was trained with samples categorized by annotation 2 and its predictions were compared to the true values of annotation 1. The second and fourth iterations were done to check the robustness of this method: whether a model fit with the specific categorization can be applied to samples identified with the more generalized categorization, and vice versa.
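The four train/evaluate pairings can be sketched as a loop. Everything below is a toy stand-in: the "model" is a simple midpoint threshold rather than scikit-learn's logistic regression, a deterministic subset stands in for the random 30% split, and the features and annotations are hypothetical:

```python
def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def run_iterations(feats, annot1, annot2, fit, predict, train_idx):
    """The four train/evaluate pairings used to probe cross-annotation robustness."""
    pairings = [(annot1, annot1), (annot1, annot2), (annot2, annot2), (annot2, annot1)]
    results = []
    for train_labels, eval_labels in pairings:
        model = fit([feats[i] for i in train_idx], [train_labels[i] for i in train_idx])
        preds = [predict(model, x) for x in feats]
        results.append(accuracy(preds, eval_labels))
    return results

# Toy stand-in model: a midpoint threshold between the two training classes
fit = lambda xs, ys: (max(x for x, y in zip(xs, ys) if y == 0)
                      + min(x for x, y in zip(xs, ys) if y == 1)) / 2
predict = lambda thr, x: int(x > thr)

feats = [0.1, 0.8, 0.2, 0.9, 0.3, 1.0, 0.15, 0.85, 0.25, 0.95]
a1 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # specific barking only
a2 = [0, 1, 0, 1, 1, 1, 0, 1, 0, 1]  # any barking (one extra positive)
results = run_iterations(feats, a1, a2, fit, predict, train_idx=[0, 3, 6, 9])
print(results)  # [1.0, 0.9, 0.9, 1.0]: cross-annotation evaluation costs accuracy
```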



3.3 Application of Logistic Regression Model

Once the logistic regression model had been fitted under both annotations, it was applied to another dataset: recordings gathered from Sidlakan Marketing, a rice trading business in a commercial area. The entire data acquisition and preparation procedure was repeated for these recordings. The SMART-EAR system was again placed facing the main road.

Figure 3.5 The SMART-EAR setup in Sidlakan Marketing

Annotations 1 and 2 were applied to the samples gathered; however, because the dog found in the reference sample was not present in this area, annotation 1 was carried out by labelling all data samples with a true value of 0. Annotation 2 proceeded normally. The logistic regression models fitted with annotation 1 and annotation 2 values were used to predict labels for the samples gathered here. Predictions made by the model fitted with annotation 1 were compared to the true annotated values of the samples from the commercial area. The same was done for predictions made by the model fitted with annotation 2.
Chapter 4

Results and Discussion

4.1 Calibration
For calibration, the Brüel & Kjær microphone was used as the reference microphone. It was calibrated using a Brüel & Kjær mini calibrator, and its measured sensitivity was 0.04701 V/Pa.

4.1.1 Single Sensitivity Calculation

The simultaneous comparison method discussed in Section 2.7 was first employed to determine whether the SMART-EAR microphone's response was dependent on frequency. Each sample was run for nine iterations, each iteration at a different volume intensity, so that a line of best fit could be generated.


Figure 4.1 shows that the test microphone has different lines of best fit with respect to frequency, indicating that the microphone used is frequency dependent. Taking the slope of each line and applying equation 2.15 yields the sensitivity of the test microphone at each frequency; these values are compiled in Table 4.1.

Figure 4.1 Graph of RMS recorded by test and reference microphone shows frequency

dependency

Table 4.1 Different calculated values of sensitivity for each frequency

4.1.2 Per Frequency Correction Factor Calculation

Since the microphone being utilized is frequency dependent, calibration had to be carried out in the frequency domain. Using the LabVIEW program developed for this study, FFTs of pink and white noise samples from both the test microphone and the reference microphone were obtained.

The accuracy of the FFT was then checked by verifying Parseval's theorem: the RMS value of the signals in the time domain was plotted against the RMS value of the same signals in the frequency domain.



Figure 4.2 Graph of RMS from the time domain vs the RMS from the frequency domain

It can be seen in Figure 4.2 that there is a one-to-one correspondence of values. A line of best fit also shows a slope of 1, which means that each sample's computed RMS value is the same in the time domain and the frequency domain.

Equation 2.16 was then applied to calculate the correction factor for each frequency by taking the difference of each sample's spectrums from both microphones. Figure 4.3 shows the graph of the average frequency correction factors that resulted from the computation. The correction factors have a frequency interval of 1 Hz.



Figure 4.3 Correction factors for each frequency present in the pink and white noise

To check the accuracy of these correction factors, the FFTs of the samples in Section 4.1 were also computed. The correction factors were then applied by multiplying them with the FFTs of the samples recorded by the test microphone. The SPL values of the test and reference microphones were then compared.



Figure 4.4 SPL comparison of pure tones after correction factor application

Figure 4.4 shows a better alignment of results between the test and reference microphones for each frequency. Outliers found beyond the line of best fit can be attributed to samples that experienced clipping during recording. But because pure tones are rarely encountered in the environment, greater emphasis is placed on white and pink noise, since both are random signals having varying intensities over a range of frequencies. Applying the same method that was used on the pure tone samples, a graph of the SPL calculated from the samples of the test microphone plotted against the SPL calculated from the samples of the reference microphone was generated.



Figure 4.5 SPL comparison of pink and white noise after correction factor application

Both Figures 4.4 and 4.5 show that the test microphone has been calibrated and that its SPL readings match those of the laboratory-standard B&K microphone. With system calibration assured, the SMART-EAR system could be taken to the sound source identification portion of the study.



4.2 Field Test

4.2.1 Data Acquisition

After calibration was accomplished, the SMART-EAR system was deployed at West City Homes for field testing. The sample rate was set to 16 kHz, giving a Nyquist frequency of 8 kHz. The three 24-hour periods yielded 863 recordings, each 2.5 minutes long.

Frequency spectrums were extracted, the correction factors were applied following the SMART-EAR documentation, and the results were saved to a .csv file. The reference recording was selected and each of the 863 frequency spectrums was plotted against it.

Figure 4.6 The reference spectrum plotted against itself

The red line found in Figure 4.6 is the line that represents one-to-one

correspondence. The orange line, on the other hand, is the line of best fit. These lines and

how they affect the results will be explained in a later section.



4.2.2 PCA and Annotation

The principal components of each sample compared to the reference were computed and saved into a .csv file. A PC graph was constructed from the computed values, where each data point was assigned one of three labels. Samples recorded from 9 AM to 5 PM were labelled Daytime, samples from 5 to 9 AM and from 6 to 10 PM were labelled Morning and Evening, and samples from 10 PM to 5 AM were labelled Nighttime. This grouping of time periods is based on the NPCC guidelines (Table 1.1).

Figure 4.7 PC plot with labelling according to general time period



Clustering of samples can already be seen in Figure 4.7. The PC values of the reference sample plotted against itself are (1243.488515, 8.95E-14). It can therefore be assumed that the more similar a test sample is to the reference sample, the closer it lies to the x-axis. To test this assumption, samples were manually annotated using the categories defined by annotation 1 and annotation 2. However, due to time constraints, only 420 of the 863 samples could be categorized. (Sample data are found in Table A.1 of the Appendix.)

For annotation 1, 323 of the samples were annotated with 0 as they did not contain the barking of the Yorkshire Terrier in the reference sample. The remaining 97 samples were annotated with 1 as they contained the same barking of the Yorkshire Terrier but at different intensities.

Figure 4.8 PC Plot According to Annotation 1



For annotation 2, 218 of the samples were annotated with 0 since they did not contain barking of any kind. The other 202 samples, which contained barking of any kind, were annotated with 1.

Figure 4.9 PC Plot According to Annotation 2



4.2.3 K-means Clustering Results

The 420 samples underwent the k-means clustering algorithm; in Figure 4.10, the grey dots represent the centroids of the two clusters. Predictions made by the k-means clustering were compared to the true annotated values.

Figure 4.10 K-means clustering results



Figure 4.11 Confusion matrix of k-means vs. annotation 1 true values

Figure 4.12 Confusion matrix of k-means vs. annotation 2 true values



From Figure 4.11, k-means clustering was able to label samples with 95% accuracy when its predictions were compared with the true values of annotation 1. Repeating this method 20 times using only a randomly selected 200 samples per iteration gives an accuracy of 95.6 ± 0.2%.

Figure 4.12, on the other hand, shows that, when compared to the true values of annotation 2, k-means clustering only performs with 72% accuracy. Repeating this 20 times using 200 randomly selected samples per iteration gives an accuracy of 71.8 ± 0.5%.
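The repeated random-subset accuracy estimate can be sketched as follows (a sketch of the procedure only; the labels below are hypothetical stand-ins for the actual clustering output):

```python
import random
import statistics

def repeated_accuracy(y_true, y_pred, n_runs=20, subset=200, seed=0):
    """Mean accuracy and standard error over repeated random subsets."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_runs):
        idx = rng.sample(range(len(y_true)), subset)
        accs.append(sum(y_true[i] == y_pred[i] for i in idx) / subset)
    return statistics.mean(accs), statistics.stdev(accs) / len(accs) ** 0.5

# Hypothetical labels: predictions wrong on about 5% of 420 samples
truth = [0] * 420
preds = [1 if i % 20 == 0 else 0 for i in range(420)]
mean_acc, sem = repeated_accuracy(truth, preds)
print(round(mean_acc, 3))  # close to the overall accuracy of 0.95
```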

The results show that the first annotation is a more dependable basis of categorization for k-means, as there is a clearer delineation of where one cluster starts and the other ends. Comparing the results of k-means clustering in Figure 4.10 with the PC plot according to annotation 2 labels in Figure 4.9 shows that, because there is no proper delineation of the two clusters in annotation 2, k-means is unable to separate the data points that contained barking from those that did not.

This contrast in accuracy can be attributed to the fact that the PC values only quantify how close two frequency spectrums are to each other. As an example, Figure 4.13 shows the reference sample plotted against another sample which is closely related to the reference and which is found at the point (1249.432, 187.6807) on the PC graph. This sample was identified correctly by the k-means algorithm with respect to both annotation 1 and 2.

Figure 4.13 a) Frequency spectrum that contains the barking found in the reference
sample and at a similar intensity and b) The first test frequency spectrum plotted against
the reference frequency spectrum

This sample contains the same dog barking as in the reference sample and at a similar intensity. The SPL value of the reference sample is recorded at 79.422 dBA and the SPL value of the test sample at 71.849 dBA. Note that the line of best fit generated from the scatter plot has a slope of 0.986, which is nearly parallel to the line of one-to-one correspondence. Its y-intercept is at -8.218, placing the line only a small distance from the line of one-to-one correspondence. Because of these similarities, the computed PC2 is 187.6807, which is close to the x-axis.

For the second sample, the dog found in the reference sample is barking but not at the same intensity. This sample is found at the point (1306.846, 398.471) on the PC graph.

Figure 4.14 a) Frequency spectrum that contains the barking found in the reference
sample but not at the same intensity and b) The second test frequency spectrum plotted
against the reference

This sample was a false negative result with respect to both annotation 1 and 2. In this test sample, the SPL value is recorded at 54.837 dBA, a sizable difference from the SPL value of the reference sample. The line of best fit generated from the scatter plot has a slope of 0.980, which is still nearly parallel to the line of one-to-one correspondence. However, its y-intercept is at -25.224, a sizeable distance from the line of one-to-one correspondence. This distance explains the false negative prediction from the k-means algorithm (for annotation 1) and shows how SPL can affect the PC values used for clustering.

For the third sample, dogs are barking in the vicinity and the SPL of the recording is higher than that of the second example. However, the dog barking in this recording is of a different breed and has a different bark altogether. This sample is found at the point (1198.928, 358.9948) on the PC graph.



Figure 4.15 a) Frequency spectrum that contains barking from different dogs found in
the reference sample and b) The third test frequency spectrum plotted against the
reference frequency spectrum

The slope of the line of best fit is calculated at 0.846, while its y-intercept is at -21.500. In this third sample, the SPL value is recorded at 61.334 dBA. The PC values calculated from the difference in frequency spectrums place the sample closer to the k-means centroid of the no-barking cluster than to the centroid of the barking cluster. This explains the false negative result (for the second annotation) and shows how the frequencies involved can affect the PC values used for clustering.

From the three samples above, it can be observed that the slope of the line of best fit is determined by the dB values of the frequencies of the spectrum. Simply put, the more similar a test sample is to the reference sample, the closer the slope is to 1. The overall SPL of the recording, on the other hand, determines the distance of the line of best fit from the line of one-to-one correspondence. Both the SPL and the similarity of the frequencies involved in the test and reference samples can therefore affect the computation of the PC values, which, in turn, affects the clustering as well as the accuracy of the predictions made by k-means.
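The slope and intercept interpretation above can be sketched with a simple least-squares fit; the spectra below are hypothetical, constructed so the test recording has the same spectral shape but is 8 dB quieter:

```python
import numpy as np

def fit_against_reference(ref_db, test_db):
    """Slope and intercept of the best-fit line of a test spectrum vs the reference."""
    slope, intercept = np.polyfit(ref_db, test_db, deg=1)
    return slope, intercept

# Hypothetical spectra: identical shape, but the test recording is 8 dB quieter
ref = np.linspace(-60.0, -10.0, 100)
test = ref - 8.0
slope, intercept = fit_against_reference(ref, test)
print(round(slope, 3), round(intercept, 3))  # slope 1.0 (same content), intercept -8.0 (lower SPL)
```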



4.2.4 Logistic Regression Results

Turning to logistic regression, for the first iteration a model was trained using 30% of the data according to the annotation 1 true values. The model was then tested on all 420 samples, and its predictions were compared to the true values of annotation 1.

Figure 4.16 Confusion matrix of logistic regression predictions using a model fitted with
annotation 1 labels vs. annotation 1 true values

It is observed that the results from the first iteration closely resemble the results from k-means clustering vs. the annotation 1 true values. Repeating this with different sets of samples 20 times gives an accuracy of 95.4 ± 0.2%.



For the second iteration, the same model trained in the first iteration was used but

predictions made by the model were contrasted with the true values of annotation 2. The

following confusion matrix resulted from the test.

Figure 4.17 Confusion matrix of logistic regression predictions using a model fitted with
annotation 1 labels vs. annotation 2 true values

Repeating this with different sets of samples 20 times gives an accuracy of 73.2 ± 0.2%. There is an increase in false negative results because, much like in k-means clustering, the model could not distinguish samples that contained any form of barking from samples without barking.



In the third iteration, training of the model was done using samples with annotation 2 labelling. Predictions made by the model were contrasted with the true values of annotation 2. Figure 4.18 shows the confusion matrix that resulted from this iteration.

Figure 4.18 Confusion matrix of logistic regression predictions using a model fitted with
annotation 2 labels vs. annotation 2 true values

Of the true positive predictions, 96 of the 97 samples that were positive in

annotation 1 were predicted to be positive by the model. The remaining 54 true positive

predictions were samples positive for annotation 2 but not for annotation 1. Running this

iteration with different sets of samples 20 times gives an accuracy of 84.2 ± 0.3%.
Lastly, in the fourth iteration, the logistic regression model was again trained using samples with annotation 2 labelling, but its predictions were compared with the annotation 1 true values. Below is the confusion matrix that resulted from this iteration.

Figure 4.19 Confusion matrix of logistic regression predictions using a model fitted with
annotation 2 labels vs. annotation 1 true values

The result observed here is the opposite of that of the second iteration: the number of false positive predictions increases. The model predicted that more samples contained the barking found in the reference sample than actually did.

Running this 20 times with different sets of training samples gives an accuracy of 83.7 ± 0.5%.


Table 4.2 Accuracy and standard error values for predictions made by k-means

    Compared against              Accuracy
    Annotation 1 true values      95.6 ± 0.2%
    Annotation 2 true values      71.8 ± 0.5%

Table 4.3 Accuracy and standard error values for predictions made by logistic regression

    Training labels    Compared against    Accuracy
    Annotation 1       Annotation 1        95.4 ± 0.2%
    Annotation 1       Annotation 2        73.2 ± 0.2%
    Annotation 2       Annotation 2        84.2 ± 0.3%
    Annotation 2       Annotation 1        83.7 ± 0.5%

Tables 4.2 and 4.3 summarize the performance of both machine learning methods in this study. It can be seen that k-means clustering behaves similarly to the logistic regression model trained with annotation 1 labels. Both the logistic regression and k-means clustering approaches show that this method of using PCA to cluster data is only applicable to the specific case; generalization (i.e. being able to identify other forms of activity of the same nature) is a limitation of the current method. As for the accuracy values of the model trained with annotation 2, these may be attributed to overfitting by the scikit-learn algorithm.


4.3 Application of Logistic Regression Model


To test the effectiveness of the fitted models, a second set of 85 samples recorded from the commercial area of Sidlakan Marketing was prepared and annotated. Figures 4.20 and 4.21 show the PC plots with respect to annotations 1 and 2, respectively.

Figure 4.20 PC plot of the 85 samples according to annotation 1 categorization


Figure 4.21 PC plot of the 85 samples according to annotation 2 categorization

After the samples were annotated, the trained logistic regression models were applied to predict their labels. Figure 4.22 shows the confusion matrix that resulted from comparing the predictions of the model fitted with annotation 1 of the training samples against the annotation 1 true values.


Figure 4.22 Confusion matrix of the first iteration using the samples from Sidlakan
Marketing
For the first iteration, the logistic regression model predicted 96.5% of the labels correctly when compared with the annotation 1 true values. Two of the three false positive samples contained voices with frequencies similar to the barking in the reference sample. The third false positive sample contained an ambulance siren whose frequency content was also similar to the barking in the reference sample.

Figure 4.23 a) Frequency spectrum of the sample with the ambulance siren and b)
Spectrum comparison of the sample with the ambulance and the reference sample

For the second iteration, the logistic regression model was trained using annotation 2 of the original dataset. Comparing its predictions with the annotation 2 true labels of the new dataset produces the following confusion matrix.

Figure 4.24 Confusion matrix of the second iteration using the samples from Sidlakan
Marketing
Observe the sharp decrease in accuracy when applying the model trained with annotation 2 of the original dataset: the model identified only 69.4% of the new dataset accurately. It can be concluded that the second model struggled to determine which samples contained barking and which did not. This further shows that generalization is currently a limitation of the study and that this method can only identify the specific acoustic activity found in the reference sample.


Chapter 5

Conclusion and Recommendations

The study successfully calibrated the Sound Monitoring, Assessment, and Recording Tool for Environmental Acoustic Research (SMART-EAR) system. The ReSpeaker 6-Mic Circular Array Kit was used as the test microphone.

Two calibration procedures were implemented for this study: calibration by single-sensitivity calculation and calibration by per-frequency correction. The reference microphone used was a laboratory-standard Brüel & Kjær microphone connected to the LabVIEW software. The study revealed that calibration by single sensitivity was not effective in matching the performance of the test microphone to that of the reference microphone, as seen in the different sensitivity values calculated at each frequency. The per-frequency correction method was the better option: upon applying correction factors to the frequency spectrum, the calculated RMS and SPL values of the SMART-EAR system resembled those of the reference microphone, showing that the performance of the test microphone now matched that of the reference microphone.
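A simplified sketch of the per-frequency correction idea follows. The function names, the dB convention for the correction factors, and the RMS-based level computation are illustrative assumptions, not the thesis code:

```python
import numpy as np

def apply_per_frequency_correction(test_spectrum, correction_db):
    """Scale each frequency bin of the test microphone's magnitude
    spectrum by its own correction factor, given in dB."""
    return np.asarray(test_spectrum) * 10.0 ** (np.asarray(correction_db) / 20.0)

def overall_level_db(spectrum, p_ref=20e-6):
    """Simplified overall level (dB re 20 uPa) from the RMS of a
    pressure magnitude spectrum."""
    rms = np.sqrt(np.mean(np.asarray(spectrum) ** 2))
    return 20.0 * np.log10(rms / p_ref)

# A +20 dB correction multiplies a bin's magnitude by 10.
corrected = apply_per_frequency_correction(np.ones(4), np.full(4, 20.0))
```

In contrast, single-sensitivity calibration would apply one scalar factor to every bin, which cannot compensate for a frequency response that deviates by different amounts at different frequencies.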

Once calibration was completed, field testing was done. In this study, the SMART-EAR system was placed in West City Homes and was set to record 2.5-minute audio samples every 5 minutes. This was done over three 24-hour periods, yielding a total of 863 samples. One of these was selected as the reference sample; it contained the barking of a female Yorkshire Terrier living in the subdivision. By plotting the frequency spectrum of each sample against the reference sample, principal component analysis (PCA) was applied to reduce the number of dimensions from 7981 frequencies to just two principal components.
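The dimensionality reduction step can be sketched with scikit-learn's PCA; the array shapes below are placeholders for the study's 863 samples by 7981 frequency bins:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Placeholder spectra: rows are recordings, columns are frequency-bin
# magnitudes (7981 bins in the study; fewer here for illustration).
spectra = rng.normal(size=(100, 500))

# Project every high-dimensional spectrum onto its first two
# principal components, giving one (PC1, PC2) pair per sample.
pca = PCA(n_components=2)
pcs = pca.fit_transform(spectra)
```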

The PC plot that resulted from this dimensionality reduction was found to be very effective in clustering the samples in which the barking of the Yorkshire Terrier was present. The plot revealed that the more closely the frequency spectrum of a recording resembled that of the reference sample, the closer the data point lay to the x-axis. It was also revealed that both the computed SPL value of the recording, specifically the dBA, and the primary frequencies in the sample affected the computed PC values.

K-means clustering was found to be highly effective in separating samples that contained the activity found in the reference sample from those that did not, with an accuracy of 95.6 ± 0.2%. However, it struggled to separate samples that contained barking of any kind of dog from those that did not, with an accuracy of only 71.8 ± 0.5%. This proved that the current method is limited in identifying barking in the general sense and can only identify the barking found in the reference sample.
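As a sketch of the clustering step, again with synthetic placeholder PC values standing in for the real ones, the two-cluster k-means fit looks like this:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Placeholder PC values forming two well-separated groups, standing in
# for samples with and without the reference barking.
pcs = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
                 rng.normal(3.0, 0.3, size=(50, 2))])

# Partition the 2-D PC plane into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pcs)
```

Because k-means is unsupervised, the cluster indices must afterwards be matched against the annotations to compute an accuracy, as was done for Table 4.2.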
The results from applying logistic regression to the dataset were similar to the results of k-means clustering. Using samples categorized by annotation 1 to train the model, it predicted the presence of the reference barking in the test samples with an accuracy of 95.4 ± 0.2%. Conversely, using samples categorized by annotation 2 to train the model, it struggled to correctly predict the presence of any kind of barking in the test samples, with an accuracy of only 73.2 ± 0.2%.

Applying these models to predict the labels of a second dataset, this time with samples gathered from a commercial area, revealed strong performance for the model fitted with samples categorized by annotation 1, with an accuracy of 96.5%. On the other hand, the model fitted with samples categorized by annotation 2 struggled to determine which samples contained barking and which did not, giving an accuracy of only 69.4%.

The data shows that this method is able to identify and cluster samples that contain a certain reference acoustic activity. For future research, it is recommended to develop a method that can identify samples in a general sense, i.e. any kind of barking and not just the specific one found in the reference sample. For example, a study by Sethi et al. employed a convolutional neural network (CNN) to monitor ecosystems autonomously using soundscape data [22]. By creating a high-dimensional feature space, that study was able not only to cluster specific biomes but also to identify anomalies in the soundscape. The same could be done in future studies. However, instead of using a CNN to create the feature space, different reference samples could be used to identify and cluster the same samples. Combining these PC plots would then generate the feature space needed to identify generalized activities, i.e. using different kinds of barking as references to cluster samples with barking as a whole.
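One possible reading of this recommendation is sketched below. The helper names and the subtraction-based comparison against each reference are hypothetical choices for illustration, not the thesis method: each reference bark yields a 2-D PC pair per recording, and the pairs are stacked into one higher-dimensional feature vector.

```python
import numpy as np
from sklearn.decomposition import PCA

def pcs_against_reference(spectra, reference):
    """2-D PC coordinates of each spectrum relative to one reference
    sample (hypothetical comparison: bin-wise difference, then PCA)."""
    return PCA(n_components=2).fit_transform(spectra - reference)

def combined_feature_space(spectra, references):
    """Stack the PC pairs obtained against several reference samples
    into one feature vector per recording."""
    return np.hstack([pcs_against_reference(spectra, r) for r in references])

rng = np.random.default_rng(3)
spectra = rng.normal(size=(40, 200))             # placeholder spectra
refs = [rng.normal(size=200) for _ in range(3)]  # e.g. three different barks
feats = combined_feature_space(spectra, refs)    # (PC1, PC2) x 3 references
```

A classifier trained on this combined space would then see each recording through several different reference barks at once, which is the sense in which the feature space could support more general barking detection.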

It is also recommended to investigate how extracting the frequencies involved in the acoustic activity to be identified affects the results. In this study, only the spread of values relative to the reference sample was used for comparison; the specific frequencies involved in the reference sample were neither taken into account nor utilized.

Lastly, during the gathering of samples, there were instances where the SMART-EAR system's recording would stop abruptly. This was found to be caused by the length of the recording: the system often had difficulty writing 2.5 minutes of audio into a .wav file. It is recommended to shorten the recording time to 1 minute to limit the system errors that appear while recording.


Bibliography
[1] World Health Organization Regional Office for Europe (2018, October 9). Environmental Noise Guidelines for the European Region. Retrieved from http://www.euro.who.int/en/health-topics/environment-and-health/noise/publications/2018/environmental-noise-guidelines-for-the-european-region-2018.
[2] Basner, M., Müller, U., & Elmenhorst, E.-M. (2011). Single and Combined Effects of
Air, Road, and Rail Traffic Noise on Sleep and Recuperation. Sleep, 34(1), 11–23. doi:
10.1093/sleep/34.1.11
[3] Vos, T., Flaxman, A. D., Naghavi, M., et al. (2012). Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet, 380, 2163–2196.
[4] Stansfeld, S. A., & Matheson, M. P. (2003). Noise pollution: non-auditory effects on
health. British Medical Bulletin, 68(1), 243–257.
[5] Noriega-Linares, J., & Ruiz, J. N. (2016). On the Application of the Raspberry Pi as
an Advanced Acoustic Sensor Network for Noise Monitoring. Electronics, 5(4), 74.
[6] Whytock, R. C., & Christie, J. (2016). Solo: an open source, customizable and
inexpensive audio recorder for bioacoustic research. Methods in Ecology and Evolution,
8(3), 308–312.
[7] Department of Environment and Natural Resources and Environment Management
Bureau (1999) RA 8749: The Philippine Clean Air Act, Implementing Rules and
Regulations.
[8] Vergel, K. N., Cacho, F. T. & Capiz, C.L. (2004). A Study on Roadside Noise
Generated by Tricycles. Philippine Engineering Journal PEJ 2004; Vol. 25 No. 2:1–22
[9] Everest, F. A., & Pohlmann, K. C. (2015). Master handbook of acoustics (6th ed.).
New York: McGraw-Hill.
[10] Miyara, F. (2017). Software-based acoustical measurements. Springer.
[11] Kuttruff, H. (2014). Acoustics: an Introduction. Boca Raton: CRC Press.

[12] Randall, R. B. (2008). Spectral analysis and correlation. In Handbook of Signal Processing in Acoustics. Springer.
[13] Robinson, D.W., & Dadson, R.S. (1956). A re-determination of the equal-loudness
relations for pure tones. British Journal of Applied Physics, 7(5), 166-181.


[14] Bjor O. (2008). Filters. In Handbook of signal processing in acoustics. Springer.


[15] Barrera-Figueroa, S., Torras-Rosell, A., Rasmussen, K., Jacobsen, F., & Cutanda-Henriquez, V. (2012). A practical implementation of microphone free-field comparison calibration according to the standard IEC 61094-8. Acoustical Society of America.
[16] Lathi, B. P. (2002). Principles of Linear Systems and Signals. New York: Oxford
University Press.
[17] Garg, S., Lim, K. M., & Lee, H. P. (2019). An averaging method for accurately
calibrating smartphone microphones for environmental noise measurement. Applied
Acoustics, 143, 222–228.
[18] Miyara, F. (2017). Software-based acoustical measurements. Springer.
[19] Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.). Springer Series in Statistics. New York: Springer.
[20] Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.). Springer Series in Statistics. New York: Springer.
[21] Fornis, R. (2020) Development of Sound Monitoring, Assessment, and Recording
Tool for Environmental Acoustic Research (SMART-EAR).
[22] Sethi, S. S., Jones, N. S., Fulcher, B. D., Picinali, L., Clink, D. J., Klinck, H., . . .
Ewers, R. M. (2020). Characterizing soundscapes across diverse ecosystems using a
universal acoustic feature set. Proceedings of the National Academy of Sciences,
117(29), 17049-17
Appendix
Table A.1 Sample data with PC values as well as annotation
