Calibration and Utilization of SMART-EAR


Noise Classification Using Sound Monitoring,
Assessment, and Recording Tool for
Environmental Acoustic Research (SMART-EAR)

by
Carl Kevin L. Mirhan

Undergraduate thesis submitted to the faculty of the


Department of Physics
in partial fulfillment of the requirements for the degree of

Bachelor of Science in Applied Physics

Department of Physics
School of Arts and Sciences
University of San Carlos, Cebu City, Philippines
Contents
Introduction........................................................................................................................1

1.1 Rationale....................................................................................................................1

1.2 Objectives..................................................................................................................4

1.3 Scope and Limitations................................................................................................5

Theory.................................................................................................................................6

2.1 Sound.........................................................................................................................6

2.2 Sound Pressure Level.................................................................................................8

2.3 Noise..........................................................................................................................8

2.4 Frequency Domain Analysis......................................................................................9

2.5 The Sampling Theorem............................................................................................12

2.6 Frequency Weighting...............................................................................................13

2.7 Calibration Methods.................................................................................................16

2.8 Principal Component Analysis................................................................................18

2.9 K-means Clustering.................................................................................................20

2.10 Logistic Regression................................................................................................21

2.11 Receiver Operating Characteristic.........................................................................21

Methodology.....................................................................................................................23

3.1 Calibration................................................................................................................23

3.1.1 Data Acquisition................................................................................................23

3.1.2 Data Analysis....................................................................................................25

3.1.3 Calibration Proper.............................................................................................26

3.2 Field Test.................................................................................................................27

3.2.1 Data Acquisition................................................................................................27

3.2.2 Preparation of Frequency Spectrums................................................................28

3.2.3 Principal Component Analysis..........................................................................31

3.2.4 Annotation.........................................................................................................31

3.2.5 Machine Learning Prediction and Accuracy Check..........................................32

3.3 Application of Logistic Regression Model..........................................................33

Results and Discussion....................................................................................................34

4.1 Calibration................................................................................................................34

4.1.1 Single Sensitivity Calculation...........................................................................34

4.1.2 Per Frequency Correction Factor Calculation...................................................36

4.2 Field Test.................................................................................................................41

4.2.1 Data Acquisition................................................................................................41

4.2.2 PCA and Annotation.........................................................................................42

4.2.3 K-means Clustering Results..............................................................................45

4.2.4 Logistic Regression Results..............................................................................54

4.3 Application of Logistic Regression Model..........................................................59

Conclusion and Recommendations................................................................................64

Bibliography......................................................................................................................68

List of Figures
Figure 2.1 A time domain signal (Sum) broken down into its frequency components....10

Figure 2.2 Comparison between correct sampling and undersampling a signal..............12

Figure 2.3 Equal loudness contours of the human ear for pure tones [9].........................13

Figure 2.4 SPL gains according to different frequencies for A, C and Z weightings [14]..15

Figure 2.5 An example of the microphone response with respect to frequency [17].......18

Figure 2.6 A sample plot of 50 observations [19]............................................................19

Figure 2.7 A plot of the same 50 observations with respect to their PC’s [19]................19

Figure 3.1 The SMART-EAR system [21].......................................................................24

Figure 3.2 Schematic diagram of experimental setup for calibration...............................25

Figure 3.3 The SMART-EAR system set up in West City Homes..................................28

Figure 3.4 (a) Reference sample in the time domain and (b) Frequency spectrum of the
reference sample................................................................................................................30

Figure 3.5 The SMART-EAR setup in Sidlakan Marketing............................................33


Figure 4.1 Graph of RMS recorded by test and reference microphone shows frequency
dependency........................................................................................................................35

Figure 4.2 Graph of RMS from the time domain vs the RMS from the frequency domain..37

Figure 4.3 Correction factors for each frequency present in the pink and white noise....38

Figure 4.4 SPL comparison of pure tones after correction factor application..................39

Figure 4.5 SPL comparison of pink and white noise after correction factor application. 40

Figure 4.6 The reference spectrum plotted against itself..................................................41

Figure 4.7 PC plot with labelling according to general time period.................................42

Figure 4.8 PC Plot According to Annotation 1.................................................................43

Figure 4.9 PC Plot According to Annotation 2.................................................................44

Figure 4.10 K-means clustering results............................................................................45

Figure 4.11 Confusion matrix of k-means vs. annotation 1 true values...........................46

Figure 4.12 Confusion matrix of k-means vs. annotation 2 true values...........................46

Figure 4.13 a) Frequency spectrum that contains the barking found in the reference
sample and at a similar intensity and b) The first test frequency spectrum plotted against
the reference frequency spectrum......................................................................................48

Figure 4.14 a) Frequency spectrum that contains the barking found in the reference
sample but not at the same intensity and b) The second test frequency spectrum plotted
against the reference..........................................................................................................50

Figure 4.15 a) Frequency spectrum that contains barking from different dogs found in
the reference sample and b) The third test frequency spectrum plotted against the
reference frequency spectrum............................................................................................52

Figure 4.16 Confusion matrix of logistic regression predictions using a model fitted with
annotation 1 labels vs. annotation 1 true values................................................................54

Figure 4.17 Confusion matrix of logistic regression predictions using a model fitted with
annotation 1 labels vs. annotation 2 true values................................................................55

Figure 4.18 Confusion matrix of logistic regression predictions using a model fitted with
annotation 2 labels vs. annotation 2 true values................................................................56

Figure 4.19 Confusion matrix of logistic regression predictions using a model fitted with
annotation 2 labels vs. annotation 1 true values................................................................57

Figure 4.20 PC plot of the 85 samples according to annotation 1 categorization............59

Figure 4.21 PC plot of the 85 samples according to annotation 2 categorization............60

Figure 4.22 Confusion matrix of the first iteration using the samples from Sidlakan
Marketing...........................................................................................................................61

Figure 4.23 a) Frequency spectrum of the sample with the ambulance siren and b)
Spectrum comparison of the sample with the ambulance and the reference sample.........62

Figure 4.24 Confusion matrix of the second iteration using the samples from Sidlakan
Marketing...........................................................................................................................62

List of Tables

Table 1.1 Rules and Regulations of the National Pollution Control Commission (NPCC)..3

Table 4.1 Different calculated values of sensitivity for each frequency...........................36

Table 4.2 Accuracy and standard error values for predictions made by k-means............59

Table 4.3 Accuracy and standard error values for predictions made by logistic regression..59

Chapter 1

Introduction

1.1 Rationale

Sound is a vital aspect of our day-to-day lives. Whether in the form of speech, music, or ambience (a ringing alarm or a humming machine), sound is constantly generated around us. However, while there are desirable and pleasant sounds, undesirable and unwanted sound is also present in our immediate surroundings. This undesirable sound, called noise, is often perceived as an environmental stressor and nuisance.

According to the World Health Organization, noise is an important public health issue, featured among the top environmental risks to health. [1] Although people often grow accustomed to the noise levels in their vicinity, chronic exposure above certain levels leads to negative health outcomes. [2] The commonality of these negative health outcomes is evident in the sheer number of studies correlating

noise and health. For example, in 2010, Vos et al. conducted a Global Burden of Disease

Study and


estimated that hearing loss affected 1.3 billion people, ranking it the 13th most important contributor to global years lived with disability (YLD). The study also found that adult-onset hearing loss unrelated to a specific disease process accounted for 79.0% of the total YLD from hearing loss. [3]

Stansfeld et al. also compiled a thorough study on noise and its many non-auditory effects on health. Ranging from sleep disturbance to cardiovascular disease to cognitive difficulties in children, the study showed that noise pollution and its effects extend much further than the aforementioned auditory complications. [4]

With rising concern over noise and its effects on society comes an increasing need to study environmental noise levels and, most importantly, to monitor them. Numerous acoustic sensor monitoring systems have been deployed across the

world. Noriega-Linares et al. utilized a Single Board Computer (SBC) known as

Raspberry Pi as a cost-efficient and customizable acoustic sensor. They created a fully

functional sensor with cloud connectivity, on-board calculations and real-time data

presentation remotely and online. In their pilot test, two devices were deployed in a local neighborhood in Spain, where they analyzed the sound field through long-term measurements, achieving precise calculations as well as the remote transmission and publication of the data obtained. [5]

Whytock et al. also developed an audio recorder which they named Solo using

Raspberry Pi for bioacoustics research. In their study, they were able to deploy around 40 Solo units which gathered 52,381 hours of audio recordings at a sampling rate of 16 kHz. Spectrograms of frequency vs. time showed that the extracted data from the

recorded bird songs of specific species could be accurately utilized to differentiate one

species from another. [6]



In the Philippines, acoustic monitoring is currently a necessity. The 1980 amendment to the Noise Control Regulations requires a specific maximum sound level for different classes of areas at different periods of the day. [7] Shown in Table 1.1 are the different categories of areas and their respective noise level regulations as set by the National Pollution Control Commission (NPCC). The tabulation

shows that for areas that require quietness, especially schools and homes for the aged, the

maximum allowable noise level is at 50 dB in the day and that for residential areas, the

maximum allowable noise level is at 55 dB. This coincides with the WHO guidelines for

noise control which recommends that for road traffic noise, one of the most common

sources of noise pollution in the Philippines, a noise level of 53 dB is the maximum as

road traffic noise above this level is associated with adverse health effects. [1]

Table 1.1 Rules and Regulations of the National Pollution Control Commission (NPCC)

The difficulty that arises concerning these regulations is that there is a lack of

proper implementation. A study performed by Vergel et al. discovered that tricycles in

Metro Manila exhibited noise levels that far exceeded the WHO recommended 53 dB.

Using a sound level meter, they measured the noise levels of tricycles traveling along the

road within the vicinity of major residential areas. They found that the noise levels

generated by tricycles ranged from 88 to 100 dBA, where dBA denotes the A-weighted noise level, a correction applied to a measured or calculated sound to mimic the ear's varying sensitivity to different frequencies. This range comes from

the variation of the load carried by the tricycles as well as the speed and the slope of the

road on which they were travelling. The study concluded that measured roadside noise

levels at a residential area with high tricycle traffic exceeded the local noise standards at

all times of the day. [8]

1.2 Objectives

This study aims to calibrate as well as test the effectiveness of an easily deployable acoustic

sensor system in the Philippines. The system known as SMART-EAR, or Sound

Monitoring, Assessment, and Recording Tool for Environmental Acoustic Research, will

be the primary focus of this study. Because the microphone attached to the SMART-EAR

device is uncalibrated, a comparison would have to be made between the device’s

microphone and a laboratory standard microphone to ensure accurate results.

Specifically, their computed SPL readings will be the main parameter to be compared.

This is important because with the help of this system, noise level data will be accurately

gathered across the



day, and noise levels in schools, offices and residential areas can be monitored with ease.

This study also aims to analyze the frequency spectrums extracted from the

recordings to characterize specific acoustic activities. A residential area, for example, can

easily be identified by the presence of dogs barking in the neighborhood. This will be the

primary activity used in this study and, by using a reference sample where this activity

was prevalent, samples where the activity took place can be identified and clustered. This

study will also attempt to identify and cluster samples that contained barking in general.

1.3 Scope and Limitations

This study will focus on the calibration of the SMART-EAR system using the computer

software, LabVIEW. It will not tackle the complete development of the system itself.

That can be found in Fornis, R.’s study entitled “Development of Sound Monitoring,

Assessment, and Recording Tool for Environmental Acoustic Research (SMART-

EAR)”[21].

During this study, gathering of data will only be carried out in one residential area

as well as one commercial area. West City Homes, a subdivision located in Labangon,

Cebu City as well as Sidlakan Marketing, a rice trading business located at Tabo-an,

Cebu City, will be the areas utilized for this study. For the sake of privacy, data

gathered by the SMART-EAR system will be analyzed in the form of a frequency

spectrum and not of a frequency-time spectrum. The acoustic activity used as a reference

in this study will be the barking of a female Yorkshire Terrier living in the subdivision.
Chapter 2

Theory

Before addressing the calibration as well as the application of the SMART-EAR system,

it is important to review what sound is and how it is produced. This would, in turn, give

greater insight into how the SMART-EAR system operates so that it can be applied and used as efficiently as possible. The concepts of sound pressure level, frequency weighting,

Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT) and noise will be

properly addressed in this section. Common microphone calibration methods will also be

discussed here.

2.1 Sound

Sound acts as a stimulus via the propagation of pressure changes in a wave motion across

an elastic medium. In the case of human beings, this medium is usually air. When this

wave of pressure changes reaches our ears, our sense of hearing is excited which then

translates


into our generalized perception. [9] These pressure changes are brought about by

mechanical vibrations of the objects surrounded by the medium.

Because sound is a propagation of pressure changes, it makes sense that, to measure sound, one would measure its sound pressure, denoted by p. This is, in fact, the most accessible parameter to measure. However, with regard to its

effects, the energy content of a certain sound signal over a period of time is more relevant

than its instantaneous value and thus, the root mean square (RMS) value is of more

importance. This RMS value is defined as [10]:

\tilde{p} = \sqrt{ \frac{1}{T} \int_0^T p^2(t)\, dt }    (2.1)

where T is the averaging time.
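As a sketch of how Eq. 2.1 applies to sampled data, the integral is replaced by a mean over the samples. The signal below is a hypothetical 1 kHz pure tone, not data from this study:

```python
import numpy as np

def rms(p):
    """Root mean square of a sampled pressure signal p (Eq. 2.1, discretized)."""
    return np.sqrt(np.mean(np.square(p)))

# Example: a 1 kHz sine of amplitude 1 Pa sampled at 44.1 kHz for 1 second
t = np.arange(0, 1.0, 1 / 44100)
p = np.sin(2 * np.pi * 1000 * t)
print(rms(p))  # a sine's RMS is its amplitude over sqrt(2), about 0.707
```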

The pressure fluctuation, however, is very small compared to normal atmospheric air pressure: the faintest perceivable sound is of the order of 20 μPa, or 2 × 10⁻⁵ Pa. On the other hand, the upper limit of perceivable sound, often called the threshold of pain, is of the order of 20 Pa. [11] This tells us that audible sound pressures span about six orders of magnitude, since

\frac{20\ \mathrm{Pa}}{2 \times 10^{-5}\ \mathrm{Pa}} = 10^6

2.2 Sound Pressure Level

Because of the large dynamic range of audible sounds, the strength of a sound is best

described as a logarithm of the sound pressure. This logarithm is commonly known as the

Sound Pressure Level, or SPL, and is defined by:

L = 10 \log_{10} \left( \frac{p^2}{p_0^2} \right) = 20 \log_{10} \left( \frac{p}{p_0} \right)    (2.2)

where L is the SPL, p denotes the pressure of a given sound signal at a given time, and p_0 = 2 × 10⁻⁵ Pa is the internationally standardized acoustic reference pressure. [11] This measurement has units of decibels (dB).
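Equation 2.2 can be illustrated with a short Python sketch; the pressure values below are illustrative, chosen to reproduce the thresholds of hearing and pain noted earlier:

```python
import numpy as np

P0 = 2e-5  # internationally standardized reference pressure in Pa

def spl(p):
    """Sound pressure level in dB for a sound pressure p in Pa (Eq. 2.2)."""
    return 20 * np.log10(p / P0)

print(spl(2e-5))   # 0 dB: the threshold of hearing
print(spl(20.0))   # about 120 dB: the threshold of pain
```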

2.3 Noise

As defined previously in the introduction, noise is any undesirable sound, often perceived

as an environmental stressor. It is worth noting that noise is rarely of a constant sound

pressure and therefore, its strength fluctuates over time. Keeping this in mind, a need for

an equivalent sound pressure level is important and this is attained if the RMS of sound

pressure is utilized instead of its instantaneous value.



By using Eq. 2.1 and applying it to Eq. 2.2, the equivalent SPL is:

L_{eq} = 10 \log_{10} \left( \frac{\tilde{p}^2}{p_0^2} \right) = 20 \log_{10} \left( \frac{ \sqrt{ \frac{1}{T} \int_0^T p^2(t)\, dt } }{ p_0 } \right)    (2.3)

This equation gives us a constant SPL over an averaging time T that carries the same total energy as the varying SPL over that same interval. In most guidelines and evaluations, the equivalent SPL (or noise level) is one of the most frequently utilized quantities.
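A minimal sketch of Eq. 2.3 in Python; the signal is a hypothetical calibrator-like tone of 1 Pa RMS, not data from this study:

```python
import numpy as np

P0 = 2e-5  # standardized reference pressure in Pa

def equivalent_spl(p):
    """Equivalent SPL (Eq. 2.3): the SPL of the RMS pressure over the window."""
    p_rms = np.sqrt(np.mean(np.square(p)))
    return 20 * np.log10(p_rms / P0)

# A sine of amplitude sqrt(2) Pa has an RMS of 1 Pa, which corresponds
# to roughly 94 dB (a level commonly produced by acoustic calibrators)
t = np.arange(0, 1.0, 1 / 44100)
p = np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)
print(equivalent_spl(p))  # close to 93.98 dB
```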

2.4 Frequency Domain Analysis

Signals are generally represented in the time domain, which simply provides the

amplitudes of a signal at the instants of time during which it was sampled. Fourier’s

theorem, however, states that a signal x(t) can be expressed as a sum of sinusoids of different frequencies. Graphing these frequency constituents along with their corresponding amplitudes produces the frequency spectrum of x(t), which is a frequency domain description instead of a time domain one.

Breaking down a time-domain signal into its frequency domain components

requires the utilization of the Fourier transform given by the following equation [12]:


X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt    (2.4)

where x(t) is the time domain signal and X(f) is its Fourier transform.

Figure 2.1 A time domain signal (Sum) broken down into its frequency components

Consequently, the time signal can be retrieved from its frequency components by

the inverse transform given by the formula


x(t) = \int_{-\infty}^{\infty} X(f)\, e^{2\pi i f t}\, df    (2.5)

Because this study deals with discretized, digital signals, Equations 2.4 and 2.5 cannot be applied directly since they only deal with continuous signals over a certain period.

The Fourier transform for discrete samples is known as the Discrete Fourier Transform

(DFT) and is characterized by the following formula [12]:

X(f) = \frac{1}{N} \sum_{n=0}^{N-1} x(t)\, e^{-2\pi i f t / N}    (2.6)

Also, the Discrete Inverse Fourier Transform is given by:

x(t) = \sum_{n=0}^{N-1} X(f)\, e^{2\pi i f t / N}    (2.7)

Computing the DFT directly from this definition is often a slow and impractical way of transforming a signal from the time domain to its frequency spectrum. To overcome this obstacle, most algorithms employ the Fast Fourier Transform (FFT), which not only reduces the number of operations from the order of N^2 to N \log_2 N but also retains all the properties of the DFT. [12] LabVIEW, the primary software to be utilized in this study, employs the FFT in calculating the DFT of a time-domain signal.
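LabVIEW's FFT routines are what this study uses; purely as an illustration of the same decomposition, here is a hedged NumPy sketch with a hypothetical two-tone signal:

```python
import numpy as np

fs = 1000                 # hypothetical sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

X = np.fft.rfft(x)                      # FFT of the real-valued signal
freqs = np.fft.rfftfreq(len(x), 1 / fs) # frequency of each bin
amps = np.abs(X) / len(x) * 2           # single-sided amplitude spectrum

# The two strongest bins recover the 50 Hz and 120 Hz components
peaks = freqs[np.argsort(amps)[-2:]]
print(sorted(peaks.tolist()))  # [50.0, 120.0]
```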

To check the accuracy of the FFT algorithm, Parseval’s Theorem will be used.

The theorem states that the total energy computed in the time domain must equal the total

energy computed in the frequency domain. [14] It is a statement of conservation of

energy defined by the following equation in the discrete form:

\sum_{i=0}^{n-1} |x_i|^2 = \frac{1}{n} \sum_{k=0}^{n-1} |X_k|^2    (2.8)

where x_i and X_k are discrete FFT pairs and n is the number of samples in the sequence.
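Parseval's check can be carried out numerically; the 1/n factor mirrors Eq. 2.8 (the test signal here is arbitrary random data, not measurements from this study):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)    # arbitrary test signal
X = np.fft.fft(x)

time_energy = np.sum(np.abs(x) ** 2)
freq_energy = np.sum(np.abs(X) ** 2) / len(x)  # Eq. 2.8, right-hand side

print(np.isclose(time_energy, freq_energy))  # True: energy is conserved
```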



2.5 The Sampling Theorem

Real signals are continuous-time, analog signals. This poses a problem for computers and

sensors which operate on discretized, digital data. Processing a continuous-time signal

through a discrete-time system bridges the gap between the continuous-time and discrete-

time worlds. The difference between the highest and lowest frequencies of the spectral

components of a signal is the bandwidth of the signal and the Sampling Theorem states

that a real signal whose spectrum is bandlimited to f max Hz, can be reconstructed exactly

from its samples taken uniformly at a rate f s ≥ 2 f max samples per second. The minimum

sampling rate is therefore [16]:

f_s = 2 f_{max}    (2.9)

This bandlimit f_max is called the Nyquist frequency, while the minimum sampling rate f_s is called the Nyquist rate. Applying this theorem removes aliasing, or the poor representation of signals, since the signal would be adequately sampled. An example of

the difference between an aliased signal and an adequately sampled signal is shown in

Figure 2.2.

Figure 2.2 Comparison between correct sampling and undersampling a signal
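The effect in Figure 2.2 can also be demonstrated numerically. In this hedged sketch, a hypothetical 7 Hz tone sampled at 10 Hz (below its 14 Hz Nyquist rate) produces exactly the same samples as a 3 Hz tone:

```python
import numpy as np

fs_low = 10                       # sampling rate below the Nyquist rate for 7 Hz
t = np.arange(20) / fs_low

# A 7 Hz tone undersampled at 10 Hz is indistinguishable from a 3 Hz tone
x7 = np.cos(2 * np.pi * 7 * t)
x3 = np.cos(2 * np.pi * 3 * t)
print(np.allclose(x7, x3))        # True: the 7 Hz signal aliases to 3 Hz

# Sampling at fs >= 2*f_max (here 20 Hz > 14 Hz) removes the ambiguity
fs_ok = 20
t_ok = np.arange(40) / fs_ok
print(np.allclose(np.cos(2 * np.pi * 7 * t_ok),
                  np.cos(2 * np.pi * 3 * t_ok)))  # False
```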



2.6 Frequency Weighting

In the case of human hearing, sensitivity is frequency dependent. This means that,

subjectively, comparing two tones of different frequencies will not sound equally loud

even if they both have the same SPL. [11] This was demonstrated in a study by Robinson et al. in 1956, wherein they employed the constant stimulus method. Participants in their

study were tasked to make comparisons between a pure tone of constant sound pressure

level and frequency and another pure tone of 1 kHz which had randomly varied pressure

levels. [13] They termed these loudness levels as ‘phon’ and because a 1 kHz tone was

used as reference, the phon would be equivalent to the sound pressure level of that 1 kHz

tone. Although this procedure required averaging of numerous results, the overall data

was consistent and so they compiled their findings in the figure below:

Figure 2.3 Equal loudness contours of the human ear for pure tones [9]

An example of this finding would be that an 80 dB, 50 Hz tone only generates a

loudness level of 60 phon while an 80 dB, 1 kHz tone generates a loudness level of 80

phon. This leads us to the conclusion that human hearing is very much frequency

dependent and so, when using a measuring device or sensor, a method is needed to mimic

this frequency response.

To bridge this gap between objective sensor measurements and subjective human

hearing, frequency weightings are used. These are networks with frequency dependent

gains, and the International standard for sound level meters, IEC 61672-1, commonly

uses the A, C and Z weightings. The attenuation for decibel readings may be obtained by

the following formula:

A_{weight}(f) = 20 \log_{10}(W(f))    (2.10)

where W ( f ) is the weight function in terms of frequency.

The Z weighting has no filter applied on the signal and its gain is therefore 0. Its

weight function is defined by:

W_Z(f) = 1    (2.11)
The C weighting is a network wherein a filter is applied up to a certain frequency cutoff point. It is primarily used to assess noise with low frequency content and focuses primarily on peak values of the signal. [14] Decibel gains for each frequency are computed using Equation 2.10, and its weight function is defined by:

W_C(f) = 1.007152 \left( \frac{(f/20.6\ \mathrm{Hz})^2}{1 + (f/20.6\ \mathrm{Hz})^2} \right) \left( \frac{1}{1 + (f/12194\ \mathrm{Hz})^2} \right)    (2.12)

The A weighting is similar to the C weighting except the filter is applied up to a

higher frequency cutoff point. It is mainly applied for general sound level measurement

and is most commonly used in occupational safety and health acts. Its weight function is

defined by:

W_A(f) = 1.258905 \left( \frac{(f/20.6\ \mathrm{Hz})^2}{1 + (f/20.6\ \mathrm{Hz})^2} \right) \left( \frac{f/107.7\ \mathrm{Hz}}{\sqrt{1 + (f/107.7\ \mathrm{Hz})^2}} \right) \left( \frac{f/737.9\ \mathrm{Hz}}{\sqrt{1 + (f/737.9\ \mathrm{Hz})^2}} \right) \left( \frac{1}{1 + (f/12194\ \mathrm{Hz})^2} \right)    (2.13)
Figure 2.4 shows a graph of the SPL gains plotted against the different

frequencies for different frequency weightings:

Figure 2.4 SPL gains according to different frequencies for A, C and Z weightings [14]
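The weight functions in Equations 2.11 to 2.13 can be sketched directly; as a sanity check, both the A and C gains should be approximately 0 dB at the 1 kHz reference frequency (an illustrative implementation, with frequencies in Hz):

```python
import numpy as np

def w_c(f):
    """C-weighting weight function (Eq. 2.12)."""
    r1 = (f / 20.6) ** 2 / (1 + (f / 20.6) ** 2)
    r2 = 1 / (1 + (f / 12194) ** 2)
    return 1.007152 * r1 * r2

def w_a(f):
    """A-weighting weight function (Eq. 2.13)."""
    r1 = (f / 20.6) ** 2 / (1 + (f / 20.6) ** 2)
    r2 = (f / 107.7) / np.sqrt(1 + (f / 107.7) ** 2)
    r3 = (f / 737.9) / np.sqrt(1 + (f / 737.9) ** 2)
    r4 = 1 / (1 + (f / 12194) ** 2)
    return 1.258905 * r1 * r2 * r3 * r4

def gain_db(w, f):
    """Decibel gain of weight function w at frequency f (Eq. 2.10)."""
    return 20 * np.log10(w(f))

print(gain_db(w_a, 1000))  # close to 0 dB at the 1 kHz reference
print(gain_db(w_c, 1000))  # also close to 0 dB
print(gain_db(w_a, 50))    # strongly negative: low frequencies are attenuated
```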

2.7 Calibration Methods

The focus of this study is to successfully calibrate as well as use the SMART-EAR

system in a practical setting. A digital microphone records sound as a voltage signal, and the microphone's sensitivity is used to convert the recorded voltage into a pressure value, as shown below:

P = \frac{V}{\mathrm{Sensitivity}}    (2.14)

Barrera-Figueroa et al. [15] discussed two important methods of microphone calibration. The first is the sequential method, wherein the

microphone under test (UT) and the reference microphone (REF) are located at the same

spatial position and are made to record the signal from a certain sound source one after

the other. The drawback of this method is that, for the sake of voltage comparison

between the two microphones, the sound signal must be temporally stable.

The second method, which is the simultaneous method, gets rid of this temporal

stability requirement by recording the same signal simultaneously. However, its

drawback is also that, at both positions, the sound pressure from the signal must be equal.

Typically, a distance of around 2 meters will be observed between the sound

source and the microphones. This is to ensure that the wave front of the signal will be flat

when it reaches the microphones. Another important aspect is that the room in which

these methods will be employed in must be anechoic. Considering this, the sequential

method is more advantageous in minimizing the effects of sound reflection that would

occur in the room since both microphones would be located in the same place. The

equation used to calibrate the microphone using the comparison method is shown by:

\mathrm{Sensitivity}_{UT} = \mathrm{Sensitivity}_{REF} \left( \frac{V_{UT}}{V_{REF}} \right)    (2.15)

where V_{UT} is the output voltage of the microphone under test and V_{REF} is the output voltage of the reference microphone.
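A one-line sketch of the comparison method in Eq. 2.15; the 50 mV/Pa reference sensitivity and the voltages below are hypothetical values, not measurements from this study:

```python
def sensitivity_ut(sens_ref, v_ut, v_ref):
    """Comparison-method sensitivity of the microphone under test (Eq. 2.15)."""
    return sens_ref * (v_ut / v_ref)

# Hypothetical values: a 50 mV/Pa reference mic; the test mic outputs half
# the voltage of the reference for the same sound, so its sensitivity is half
print(sensitivity_ut(50e-3, v_ut=12e-3, v_ref=24e-3))  # 0.025 V/Pa
```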

The comparison method, however, can only be applied to test microphones with a

flat frequency response. In the case of calibrating a microphone whose response varies

depending on the frequency being sampled, the frequency spectrums for both the

microphone under test and the reference microphone would have to be analyzed. In a

study published by Garg et al. in 2018, they employed a novel per-frequency averaging method for calibrating smartphone microphones. [17] After

simultaneously recording environmental noise as uncompressed WAV files using the

smartphone microphone (test) and a class 1 microphone (reference), the average

frequency spectrums with a frequency resolution of 1 Hz for both systems were

computed. They did this by dividing the time signal into a set of overlapping frames and

computing the average FFT for all frames. The correction factors were then obtained by

taking the difference of the two spectrums obtained from the WAV files. The researchers

used the following equation to define the ith correction factor:


\[
c_i = 10 \log_{10} \left( \frac{|X_i|^2}{|Y_i|^2} \right) \tag{2.16}
\]

where X_i and Y_i are the ith coefficients of the frequency spectrum for the reference and smartphone microphones, respectively.
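A minimal sketch of Eq. 2.16, with hypothetical spectra in which the test microphone reads every bin at half the reference magnitude:

```python
import numpy as np

def correction_factors(ref_spectrum, test_spectrum):
    """Eq. 2.16: c_i = 10 log10(|X_i|^2 / |Y_i|^2), one value per frequency bin."""
    return 10.0 * np.log10(np.abs(ref_spectrum) ** 2 / np.abs(test_spectrum) ** 2)

# Hypothetical spectra: the test reads each bin at half the reference magnitude,
# so every bin needs the same +10 log10(4) = +6.02 dB correction
X = np.array([1.0, 2.0, 4.0])
Y = X / 2.0
print(correction_factors(X, Y))
```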



Figure 2.5 An example of the microphone response with respect to frequency [17]

2.8 Principal Component Analysis

In this study, each sample will be broken down into its constituent frequency SPL values. Because the goal of this study is to identify characteristic acoustic events in the samples, it would be extremely difficult to compare every frequency across all samples. Principal Component Analysis (PCA) reduces the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set [19]. It is an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some scalar projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on [20]. Each new component is termed a Principal Component (PC).



Figure 2.6 A sample plot of 50 observations [19]

To visualize this process, suppose we have 50 samples, each of which is composed of two variables x_1 and x_2. PCA focuses on the variances of the two variables and transforms them into two principal components z_1 and z_2. The transformed plot is shown in Figure 2.7.

Figure 2.7 A plot of the same 50 observations with respect to their PC’s [19]

Mathematically, the transformation is defined by a set of l p-dimensional vectors of weights or coefficients w_(k) = (w_1, ..., w_p)_(k) that map each row vector x_(i) to a new vector of principal component scores t_(i) = (t_1, ..., t_l)_(i), given by [19]

\[
t_{k(i)} = \mathbf{x}_{(i)} \cdot \mathbf{w}_{(k)}, \qquad i = 1, \ldots, n, \quad k = 1, \ldots, l \tag{2.17}
\]

In the scenario that there are more than two variables, PCA would still work to reduce the dimensionality of the data. For example, in this study, a single recording is composed of thousands of frequencies, each having its own value. PCA would reduce these thousands of variables into a specified number of PCs. By reducing the number of variables to work with, PCA is a fundamental tool in making the data easier to visualize and understand.
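The projection in Eq. 2.17 can be sketched in a few lines of NumPy (an SVD-based sketch, not the scikit-learn routine used later in this study; the data below are hypothetical):

```python
import numpy as np

def pca_scores(X, n_components):
    """Project centered data onto the top principal directions (Eq. 2.17)."""
    Xc = X - X.mean(axis=0)                    # center each variable
    # Rows of Vt are the weight vectors w(k), ordered by variance explained
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T
    return Xc @ W                              # t(i) = x(i) . w(k)

# Hypothetical data: 50 observations of two strongly correlated variables
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, 2.0 * x1 + 0.1 * rng.normal(size=50)])
T = pca_scores(X, n_components=2)
print(T.shape)  # (50, 2); nearly all variance lies along the first column
```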

2.9 K-means Clustering

K-means clustering is an unsupervised machine learning method which partitions n observations into a defined target number of k clusters. Each cluster is a collection of data points aggregated together because of similarities in their features. Once a target number of clusters has been defined, k-means utilizes an expectation-maximization algorithm to locate the best centroid for each cluster. Once centroids have been located, each data point is allocated to a cluster by minimizing the in-cluster sum of squares. Because k-means is unsupervised, labelling of the data is not needed to carry out clustering; labels are, however, important for checking the accuracy of the clustering.
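The alternating assignment and update steps described above can be sketched as a minimal Lloyd's algorithm (a NumPy sketch, not the scikit-learn module used in this study; the two clusters and the seeded centroids are hypothetical):

```python
import numpy as np

def kmeans(X, init_centroids, n_iter=50):
    """Minimal Lloyd's algorithm: alternate assignment (E-step) and update (M-step)."""
    centroids = np.asarray(init_centroids, dtype=float)
    k = len(centroids)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # E-step: assign each point to its nearest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # M-step: move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Two well-separated hypothetical clusters of 20 points each;
# for this sketch the centroids are seeded with one point from each blob
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)), rng.normal(5.0, 0.2, (20, 2))])
labels, _ = kmeans(X, init_centroids=X[[0, 20]])
print(sorted(set(labels[:20].tolist())), sorted(set(labels[20:].tolist())))  # [0] [1]
```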

2.10 Logistic Regression

Whereas k-means is an unsupervised machine learning method, logistic regression is a supervised learning method. In logistic regression, data samples with an already existing indicator variable are used to train a logistic model. This model predicts the probability of a certain class or event and, because this is a form of binary regression, there are only two possible predictions (e.g. pass or fail) every time a new data sample is fed into the model. In this study, the threshold is set at 0.5. In other words, if the computed probability of a sample falls below 0.5, the model classifies the sample as 0; if the computed probability is 0.5 or greater, the model classifies the sample as 1.
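The thresholding step can be sketched as follows (a toy one-feature model, not the scikit-learn module used in this study; the fitted coefficients w and b are hypothetical):

```python
import math

def predict_label(x, w, b, threshold=0.5):
    """Logistic model: sigmoid of a linear score, thresholded at 0.5."""
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return (1 if p >= threshold else 0), p

# Hypothetical fitted coefficients: w = 2, b = -1
label, p = predict_label(1.0, w=2.0, b=-1.0)
print(label, round(p, 3))  # 1 0.731: sigmoid(1) is above the 0.5 threshold
label_lo, p_lo = predict_label(0.0, w=2.0, b=-1.0)
print(label_lo, round(p_lo, 3))  # 0 0.269: below the threshold, classified as 0
```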

2.11 Receiver Operating Characteristic

To test the accuracy of the clustering by the k-means algorithm as well as the predictions made by the logistic model, the Receiver Operating Characteristic (ROC) will be used. Data samples in this study are either positive or negative for a certain acoustic activity. ROC summarizes the results of both k-means and logistic regression by presenting, in a confusion matrix, the number of true positive (positive samples correctly labelled as positive), true negative (negative samples correctly labelled as negative), false positive (negative samples incorrectly labelled as positive, "false alarms"), and false negative (positive samples incorrectly labelled as negative, "misses") cases. The accuracy in both methods is defined by the following formula:

\[
\text{Accuracy (ACC)} = \frac{\sum \text{True Positive} + \sum \text{True Negative}}{\sum \text{Total Population}} \tag{2.18}
\]
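Tallying the four confusion-matrix cells and applying Eq. 2.18 can be sketched as follows; the example labels are hypothetical:

```python
def confusion_and_accuracy(y_true, y_pred):
    """Tally TP/TN/FP/FN and compute the accuracy of Eq. 2.18."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # misses
    acc = (tp + tn) / len(y_true)
    return (tp, tn, fp, fn), acc

# Hypothetical labels: one false alarm and one miss among six samples
counts, acc = confusion_and_accuracy([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(counts)  # (2, 2, 1, 1)
print(acc)     # accuracy = (2 + 2) / 6
```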
Chapter 3

Methodology

This chapter will focus on describing in detail the calibration of the Sound Monitoring,

Assessment, and Recording Tool for Environmental Acoustic Research (SMART-EAR)

system using the LabVIEW software as well as the deployment and analysis of the

recordings gathered.

3.1 Calibration

3.1.1 Data Acquisition

The SMART-EAR system comprises a Raspberry Pi 3 Model B+ single-board computer with built-in data storage. It utilizes an external ADC and, to record sound, a ReSpeaker 6-Mic Circular Array Kit attached to the Raspberry Pi. It runs on a modified Linux operating system named Raspbian which is already fitted with programs to record sound. It can record continuously at a sampling rate of 16 kHz on all six channels and stores the retrieved signal in the form of .wav files.

Figure 3.1 The SMART-EAR system [21]

The reference system used is a Brüel & Kjær (B&K) microphone attached to a National Instruments data acquisition module (NI DAQ). The NI DAQ was in turn connected to a computer running the LabVIEW software. This type of microphone is a laboratory-standard microphone characterized by a flat frequency response. Output signals generated by any sound source were measured using the B&K microphone from 1 Hz to 20 kHz. Through a program built in LabVIEW, the generated signal was sent to the computer for data logging and, by specifying the sampling rate, was stored at the same rate as the system under test. This recording was also stored in the form of .wav files.

For data acquisition, a speaker generating pink noise at varying sound pressure levels was used. Pure tones were also utilized in this study to test whether the test microphone has a flat or a frequency-dependent response. These signals were simultaneously recorded by the SMART-EAR system and the B&K microphone. Both setups were situated at least 2 meters from the sound source, as shown in Figure 3.2.

Figure 3.2 Schematic diagram of experimental setup for calibration

3.1.2 Data Analysis

A separate LabVIEW program was created to access a specified directory and read the .wav file stored there. The recorded waveform was then represented by the program as an array of amplitude values in the time domain. From that array, the program obtained the RMS value as well as the calculated SPL reading, based on equations 2.1 and 2.3. Because Eq. 2.3 calls for the pressure of the signal, the known sensitivity of the reference microphone was used to convert the signal from its RMS voltage to the corresponding pressure value.

For the frequency domain analysis, a second LabVIEW program was created that performed the Fast Fourier Transform (FFT) on the array based on the formula given by equation 2.6. However, to acquire a more accurate average frequency spectrum for each sample, the array of amplitude values in the time domain was first segmented into overlapping frames, as Garg et al. implemented in their study. The FFTs of these overlapping frames were then averaged to generate an average frequency spectrum. It is noted that the discrete Fourier transform is often defined with an additional factor of 1/N [18]; this normalization factor was implemented in the LabVIEW program. The results of the spectrum calculation were checked using Parseval's theorem (Eq. 2.8) to ensure that each sample would produce the same calculated RMS voltage in both the time domain and the frequency domain.
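The frame averaging and the Parseval check can be sketched in Python (the study itself used LabVIEW; the test tone, frame length, and hop size below are hypothetical):

```python
import numpy as np

def averaged_spectrum(x, frame_len, hop):
    """Average |FFT| magnitudes over overlapping frames, with 1/N normalization."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    mags = [np.abs(np.fft.fft(f)) / frame_len for f in frames]
    return np.mean(mags, axis=0)

# Hypothetical test signal: a 1 kHz tone sampled at 16 kHz for one second
fs, f0 = 16000, 1000.0
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t)

# Parseval's theorem (Eq. 2.8): the RMS agrees in the time and frequency domains
X = np.fft.fft(x)
rms_time = np.sqrt(np.mean(x ** 2))
rms_freq = np.sqrt(np.sum(np.abs(X) ** 2)) / len(x)
print(np.isclose(rms_time, rms_freq))  # True

spec = averaged_spectrum(x, frame_len=2048, hop=1024)
print(spec.argmax())  # 128, the bin at f0 * frame_len / fs
```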

3.1.3 Calibration Proper

The samples used for calibration were white noise as well as pure tones, all designed to lie within a certain operating frequency range. Because the ReSpeaker 6-Mic Circular Array Kit attached to the Raspberry Pi can only record at a sampling rate of 16 kHz, only frequencies up to 8 kHz were used as audio samples to ensure that there would be no undersampling, in observance of the constraints given by the Sampling Theorem.

For microphones with a flat frequency response, calibration would be done using the comparison method. By plotting the RMS voltage values of the samples obtained from the test microphone against the RMS voltage values of the samples from the reference microphone, a line of best fit would provide a sensitivity value for the microphone under test through equation 2.15. This value should hold regardless of the signal being used to calibrate the system.



However, in this study, no single sensitivity value could be applied to the system under test. Calibration then had to be carried out in the frequency domain. The approach employed here was similar to the one taken by Garg et al., wherein per-frequency correction factors were defined using equation 2.16. Applying these per-frequency correction factors to the recordings of the SMART-EAR system resulted in a corrected RMS voltage. The correct RMS pressure could then be obtained by combining the corrected RMS voltage with the sensitivity of the reference microphone.

To further verify the results of calibration, the correction factors were applied to the white noise and pink noise recordings obtained by the SMART-EAR system. Once the correction factors were applied, the RMS voltage values were divided by the sensitivity of the reference microphone, recorded at 0.04701 V/Pa, to obtain the corresponding corrected RMS pressures. Equation 2.2 was then used to calculate the SPL values. The SPL of each sample of the white noise and pink noise recordings of the reference microphone was calculated using LabVIEW and compared with the corresponding calibrated SPL.
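The conversion chain just described (corrected RMS voltage to RMS pressure via the reference sensitivity, then to SPL) can be sketched as follows; the SPL relation of Eq. 2.2 is assumed here to be the standard 20 log10(p/p0) with p0 = 20 µPa, and the input voltage is hypothetical:

```python
import math

SENSITIVITY_REF = 0.04701  # V/Pa, measured sensitivity of the reference B&K microphone
P_REF = 20e-6              # Pa, standard reference pressure (assumed for Eq. 2.2)

def spl_from_voltage(v_rms):
    """Convert a corrected RMS voltage to SPL via the reference sensitivity."""
    p_rms = v_rms / SENSITIVITY_REF          # Pa
    return 20.0 * math.log10(p_rms / P_REF)  # dB SPL

# Hypothetical reading: 0.04701 V RMS corresponds to exactly 1 Pa, i.e. ~94 dB SPL
print(round(spl_from_voltage(0.04701), 1))  # 94.0
```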

3.2 Field Test

3.2.1 Data Acquisition

Once calibrated, the SMART-EAR system was first deployed in West City Homes, a small subdivision, for field testing. The system was placed in an open garage situated right beside a main road of the subdivision to protect the device from the elements.

Figure 3.3 The SMART-EAR system set up in West City Homes

Using the arecord software on the Raspberry Pi, recordings were gathered over three 24-hour periods. Each sample was 2.5 minutes long and the system was set to record every 5 minutes. The .wav files were stored on a USB drive and transferred to a desktop for analysis.

3.2.2 Preparation of Frequency Spectrums

For this section of the study, preparation of the spectrums was carried out following the documentation prepared for the SMART-EAR system. Using the Python3 program written by Fornis, R., the frequency spectrum of each recording was extracted and saved as a .csv file. The program also applied the correction factors to the frequency spectrums. As recommended by the study of Fornis, R., considering only the spectrum contributions from 19.5 Hz to 8 kHz gives good accuracy: by discarding the frequencies below this range, the results of the per-frequency correction calibration became more accurate and fell below the most lenient tolerance of ±1.5 dB [21]. The same was carried out in this study. After obtaining the frequency spectrum of each .wav file, the voltage values for the first 19 frequencies were discarded and only the frequencies from 20 Hz to 8 kHz were used for further analysis.

A Python3 program was used to convert the voltage values from the frequency spectrum to their corresponding dB values. It is useful to note that these values are not dB SPL (the measurement of volume level in the real world) but dB Full Scale, or dBFS, the measurement of digital level relative to the maximum value.
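The band restriction and the dBFS conversion can be sketched as follows; it is assumed here that bin i of the 1 Hz resolution spectrum corresponds to (i + 1) Hz, so dropping the first 19 bins leaves the 7981 bins from 20 Hz to 8 kHz, and the flat example spectrum is hypothetical:

```python
import numpy as np

def band_dbfs(spectrum, full_scale):
    """Keep the 20 Hz - 8 kHz bins (1 Hz resolution) and convert to dBFS."""
    band = spectrum[19:8000]                    # drop the first 19 bins (below 20 Hz)
    return 20.0 * np.log10(band / full_scale)   # dB relative to full scale

# Hypothetical flat spectrum at half of full scale
spectrum = np.full(8000, 0.5)
dbfs = band_dbfs(spectrum, full_scale=1.0)
print(len(dbfs), round(dbfs[0], 2))  # 7981 bins, each at -6.02 dBFS
```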

After the frequency spectrums of each sample were prepared, a reference sample

was selected. For this study, the acoustic activity that was used as a reference is the

barking of a certain female Yorkshire Terrier in the subdivision. Figure 3.4a shows the

reference sample in the time domain while Figure 3.4b shows the frequency spectrum of

the reference sample.



Figure 3.4 (a) Reference sample in the time domain and (b) Frequency spectrum of the
reference sample

3.2.3 Principal Component Analysis

Each sample was then plotted against the reference sample and Principal Component Analysis (PCA) was performed on each comparison. The PCA algorithm used in this study is the one provided by the Python3 scikit-learn library. For this study, each (7981, 2) matrix, being the comparison of the reference frequency spectrum and a test frequency spectrum against each other, was reduced to a (1, 2) matrix. These values are principal component 1 (PC1) and principal component 2 (PC2). The principal component values of each sample with respect to the reference sample were stored in another .csv file.

3.2.4 Annotation

Annotation was carried out using two categories. For the first category, samples that

contained the exact barking found in the reference sample were annotated with a 1 and

those without the bark were annotated with a 0. This specific categorization will be

referred to as annotation 1 for the rest of this study.

For the second category, samples that contained any form of barking (even those

of dogs not present in the reference sample) were annotated with a 1. Samples without

any barking were annotated with a 0. This general categorization will be referred to as

annotation 2 for the rest of this study.



3.2.5 Machine Learning Prediction and Accuracy Check

Both k-means clustering and logistic regression were performed on the annotated data sets. This study utilized the machine learning modules provided by the Python3 scikit-learn library.

For k-means, accuracy was computed by comparing the clustered labels to the true values of both annotation 1 and annotation 2. Confusion matrices were also generated from the results of the clustering.

As for the logistic regression method, a randomly selected 30% of the total number of annotated samples was used to train the model, as this is the standard percentage used in most machine learning studies. For the first iteration, the model was trained with samples categorized by annotation 1 and its predictions were compared to the true values of annotation 1. For the second iteration, the model was trained with samples categorized by annotation 1 and its predictions were compared to the true values of annotation 2. For the third iteration, the model was trained with samples categorized by annotation 2 and its predictions were compared to the true values of annotation 2. For the final iteration, the model was trained with samples categorized by annotation 2 and its predictions were compared to the true values of annotation 1. The second and fourth iterations were done to check the robustness of this method: whether a model fit with the specific categorization can be applied to samples identified with the more generalized categorization, and vice versa.
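The four train/evaluate pairings can be sketched as a loop. Everything below is a toy stand-in: the "model" is a simple midpoint threshold rather than scikit-learn's logistic regression, a deterministic subset stands in for the random 30% split, and the features and annotations are hypothetical:

```python
def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def run_iterations(feats, annot1, annot2, fit, predict, train_idx):
    """The four train/evaluate pairings used to probe cross-annotation robustness."""
    pairings = [(annot1, annot1), (annot1, annot2), (annot2, annot2), (annot2, annot1)]
    results = []
    for train_labels, eval_labels in pairings:
        model = fit([feats[i] for i in train_idx], [train_labels[i] for i in train_idx])
        preds = [predict(model, x) for x in feats]
        results.append(accuracy(preds, eval_labels))
    return results

# Toy stand-in model: a midpoint threshold between the two training classes
fit = lambda xs, ys: (max(x for x, y in zip(xs, ys) if y == 0)
                      + min(x for x, y in zip(xs, ys) if y == 1)) / 2
predict = lambda thr, x: int(x > thr)

feats = [0.1, 0.8, 0.2, 0.9, 0.3, 1.0, 0.15, 0.85, 0.25, 0.95]
a1 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # specific barking only
a2 = [0, 1, 0, 1, 1, 1, 0, 1, 0, 1]  # any barking (one extra positive)
results = run_iterations(feats, a1, a2, fit, predict, train_idx=[0, 3, 6, 9])
print(results)  # [1.0, 0.9, 0.9, 1.0]: cross-annotation evaluation costs accuracy
```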



3.3 Application of Logistic Regression Model

Once the logistic regression model had been fitted under both annotations, it was applied to another dataset: recordings gathered from Sidlakan Marketing, a rice trading business in a commercial area. The entire data acquisition and preparation procedure was repeated for these recordings. The SMART-EAR system was again placed facing the main road.

Figure 3.5 The SMART-EAR setup in Sidlakan Marketing

Annotations 1 and 2 were applied to the samples gathered; however, because the dog found in the reference sample was not present in this area, annotation 1 was carried out by labelling all data samples with a true value of 0. Annotation 2 proceeded normally. The logistic regression models fitted with annotation 1 and annotation 2 values were used to predict labels for the samples gathered here. Predictions made by the model fitted with annotation 1 were compared to the true annotated values of the samples from the commercial area. The same was done for predictions made by the model fitted with annotation 2.
Chapter 4

Results and Discussion

4.1 Calibration
For calibration, the Brüel & Kjær microphone was used as the reference microphone. It was calibrated using a Brüel & Kjær mini calibrator, and its measured sensitivity was 0.04701 V/Pa.

4.1.1 Single Sensitivity Calculation

The simultaneous comparison method discussed in Section 2.7 was first employed to determine whether the SMART-EAR microphone's response was dependent on frequency. Each sample was run for nine iterations, each iteration at a different volume intensity, so that a line of best fit could be generated.


Figure 4.1 shows that the test microphone has different lines of best fit with respect to frequency, indicating that the microphone used is frequency dependent. Taking the slope of each line and applying equation 2.15 yields the sensitivity of the test microphone at each frequency; these values are compiled in Table 4.1.

Figure 4.1 Graph of RMS recorded by test and reference microphone shows frequency

dependency

Table 4.1 Different calculated values of sensitivity for each frequency

4.1.2 Per Frequency Correction Factor Calculation

Since the microphone being utilized is frequency dependent, calibration had to be carried out in the frequency domain. Using the LabVIEW program developed for this study, FFTs of pink and white noise samples from both the test microphone and the reference microphone were obtained.

The accuracy of the FFT was then checked by verifying Parseval's theorem: the RMS value of the signals in the time domain was plotted against the RMS value of the same signals in the frequency domain.



Figure 4.2 Graph of RMS from the time domain vs the RMS from the frequency domain

It can be seen in Figure 4.2 that there is a one-to-one correspondence of values. A line of best fit also shows a slope of 1, which means that each sample's computed RMS value is the same in the time domain and the frequency domain.

Equation 2.16 was then applied to calculate the correction factor for each frequency by taking the difference of each sample's spectrums from both microphones. Figure 4.3 shows the graph of the average frequency correction factors that resulted from the computation. The correction factors have a frequency interval of 1 Hz.



Figure 4.3 Correction factors for each frequency present in the pink and white noise

To check the accuracy of these correction factors, the FFTs of the samples in Section 4.1 were also computed. The correction factors were then applied by multiplying them with the FFTs of the samples recorded by the test microphone. The SPL values of the test and reference microphones were then compared.



Figure 4.4 SPL comparison of pure tones after correction factor application

Figure 4.4 shows a better alignment of results between the test and reference microphones for each frequency. Outliers found beyond the line of best fit can be attributed to samples that experienced clipping during recording. But because pure tones are rarely encountered in the environment, greater emphasis is placed on white and pink noise, since both are random signals having varying intensities over a range of frequencies. Applying the same method that was used on the pure tone samples, a graph of the SPL calculated from the samples of the test microphone plotted against the SPL calculated from the samples of the reference microphone was generated.



Figure 4.5 SPL comparison of pink and white noise after correction factor application

Both Figures 4.4 and 4.5 show that the test microphone has been calibrated and that its SPL readings match those of the laboratory-standard B&K microphone. With system calibration assured, the SMART-EAR system could be taken to the sound source identification portion of the study.



4.2 Field Test

4.2.1 Data Acquisition

After calibration was accomplished, the SMART-EAR system was deployed at West City Homes for field testing. The sample rate was set to 16 kHz, giving a Nyquist frequency of 8 kHz. The three 24-hour periods yielded 863 recordings, each 2.5 minutes long.

Frequency spectrums were extracted, the correction factors were applied following the SMART-EAR documentation, and the results were saved to a .csv file. The reference recording was selected and each of the 863 frequency spectrums was plotted against it.

Figure 4.6 The reference spectrum plotted against itself

The red line found in Figure 4.6 is the line that represents one-to-one

correspondence. The orange line, on the other hand, is the line of best fit. These lines and

how they affect the results will be explained in a later section.



4.2.2 PCA and Annotation

The principal components of each sample compared to the reference were computed and saved into a .csv file. A PC graph was constructed from the computed values, where each data point was assigned one of three labels. Samples recorded from 9 AM to 5 PM were labelled Daytime, samples from 5 to 9 AM and from 6 to 10 PM were labelled Morning and Evening, and samples from 10 PM to 5 AM were labelled Nighttime. This grouping of time periods is based on the NPCC guidelines (Table 1.1).

Figure 4.7 PC plot with labelling according to general time period



Clustering of samples can already be seen in Figure 4.7. The PC values of the reference sample plotted against itself are (1243.488515, 8.95E-14). It can therefore be assumed that the more similar a test sample is to the reference sample, the closer it lies to the x-axis. To test this assumption, samples were manually annotated using the categories defined by annotation 1 and annotation 2. However, due to time constraints, only 420 of the 863 samples could be categorized. (Sample data are found in Table A.1 of the Appendix.)

For annotation 1, 323 of the samples were annotated with 0 as they did not contain the barking of the Yorkshire Terrier in the reference sample. The remaining 97 samples were annotated with 1 as they contained the same barking of the Yorkshire Terrier but at different intensities.

Figure 4.8 PC Plot According to Annotation 1



For annotation 2, 218 of the samples were annotated with 0 since they did not contain barking of any kind. The other 202 samples, which contained barking of any kind, were annotated with 1.

Figure 4.9 PC Plot According to Annotation 2



4.2.3 K-means Clustering Results

The 420 samples underwent the k-means clustering algorithm; in Figure 4.10, the grey dots represent the centroids of the two clusters. Predictions made by the k-means clustering were compared to the true annotated values.

Figure 4.10 K-means clustering results



Figure 4.11 Confusion matrix of k-means vs. annotation 1 true values

Figure 4.12 Confusion matrix of k-means vs. annotation 2 true values



From Figure 4.11, k-means clustering was able to label samples with 95% accuracy when its predictions were compared with the true values of annotation 1. Repeating this method 20 times using only a randomly selected 200 samples per iteration gives an accuracy of 95.6 ± 0.2%.

Figure 4.12, on the other hand, shows that, when compared to the true values of annotation 2, k-means clustering only performs with 72% accuracy. Repeating this 20 times using 200 randomly selected samples per iteration gives an accuracy of 71.8 ± 0.5%.
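The repeated random-subset accuracy estimate can be sketched as follows (a sketch of the procedure only; the labels below are hypothetical stand-ins for the actual clustering output):

```python
import random
import statistics

def repeated_accuracy(y_true, y_pred, n_runs=20, subset=200, seed=0):
    """Mean accuracy and standard error over repeated random subsets."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_runs):
        idx = rng.sample(range(len(y_true)), subset)
        accs.append(sum(y_true[i] == y_pred[i] for i in idx) / subset)
    return statistics.mean(accs), statistics.stdev(accs) / len(accs) ** 0.5

# Hypothetical labels: predictions wrong on about 5% of 420 samples
truth = [0] * 420
preds = [1 if i % 20 == 0 else 0 for i in range(420)]
mean_acc, sem = repeated_accuracy(truth, preds)
print(round(mean_acc, 3))  # close to the overall accuracy of 0.95
```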

The results show that the first annotation is a more dependable basis of categorization for k-means, as there is a clearer delineation of where one cluster starts and the other ends. Comparing the results of k-means clustering in Figure 4.10 with the PC plot according to annotation 2 labels in Figure 4.9 shows that, because there is no proper delineation of the two clusters in annotation 2, k-means is unable to separate the data points that contained barking from those that did not.

This contrast in accuracy can be attributed to the fact that the PC values only quantify how close two frequency spectrums are to each other. As an example, Figure 4.13 shows the reference sample plotted against another sample which is closely related to the reference and which is found at the point (1249.432, 187.6807) on the PC graph. This sample was identified correctly by the k-means algorithm with respect to both annotation 1 and 2.

Figure 4.13 a) Frequency spectrum that contains the barking found in the reference
sample and at a similar intensity and b) The first test frequency spectrum plotted against
the reference frequency spectrum

This sample contains the same dog barking as in the reference sample and at a similar intensity. The SPL value of the reference sample is recorded at 79.422 dBA and the SPL value of the test sample at 71.849 dBA. Note that the line of best fit generated from the scatter plot has a slope of 0.986, which is nearly parallel to the line of one-to-one correspondence. Its y-intercept is at -8.218, placing the line only a small distance from the line of one-to-one correspondence. Because of these similarities, the computed PC2 is 187.6807, which is close to the x-axis.

For the second sample, the dog found in the reference sample is barking but not at the same intensity. This sample is found at the point (1306.846, 398.471) on the PC graph.

Figure 4.14 a) Frequency spectrum that contains the barking found in the reference
sample but not at the same intensity and b) The second test frequency spectrum plotted
against the reference

This sample was a false negative result with respect to both annotation 1 and 2. In this test sample, the SPL value is recorded at 54.837 dBA, a sizable difference from the SPL value of the reference sample. The line of best fit generated from the scatter plot has a slope of 0.980, which is still nearly parallel to the line of one-to-one correspondence. However, its y-intercept is at -25.224, a sizeable distance from the line of one-to-one correspondence. This distance explains the false negative prediction from the k-means algorithm (for annotation 1) and shows how SPL can affect the PC values used for clustering.

For the third sample, dogs are barking in the vicinity and the SPL of the recording is higher than that of the second example. However, the dog barking in this recording is of a different breed and has a different bark altogether. This sample is found at the point (1198.928, 358.9948) on the PC graph.



Figure 4.15 a) Frequency spectrum that contains barking from different dogs found in
the reference sample and b) The third test frequency spectrum plotted against the
reference frequency spectrum

The slope of the line of best fit is calculated at 0.846, while its y-intercept is at -21.500. In this third sample, the SPL value is recorded at 61.334 dBA. The PC values calculated from the difference in frequency spectrums place the sample closer to the k-means centroid of the no-barking cluster than to the centroid of the barking cluster. This explains the false negative result (for the second annotation) and shows how the frequencies involved can affect the PC values used for clustering.

From the three samples above, it can be observed that the slope of the line of best fit is determined by the dB values of the frequencies of the spectrum. Simply put, the more similar a test sample is to the reference sample, the closer the slope is to 1. The overall SPL of the recording, on the other hand, determines the distance of the line of best fit from the line of one-to-one correspondence. Both the SPL and the similarity of the frequencies involved in the test and reference samples can therefore affect the computation of the PC values, which, in turn, affects the clustering as well as the accuracy of the predictions made by k-means.
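The slope and intercept interpretation above can be sketched with a simple least-squares fit; the spectra below are hypothetical, constructed so the test recording has the same spectral shape but is 8 dB quieter:

```python
import numpy as np

def fit_against_reference(ref_db, test_db):
    """Slope and intercept of the best-fit line of a test spectrum vs the reference."""
    slope, intercept = np.polyfit(ref_db, test_db, deg=1)
    return slope, intercept

# Hypothetical spectra: identical shape, but the test recording is 8 dB quieter
ref = np.linspace(-60.0, -10.0, 100)
test = ref - 8.0
slope, intercept = fit_against_reference(ref, test)
print(round(slope, 3), round(intercept, 3))  # slope 1.0 (same content), intercept -8.0 (lower SPL)
```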



4.2.4 Logistic Regression Results

Turning to logistic regression, for the first iteration a model was trained using 30% of the data according to the annotation 1 true values. The model was then tested on all 420 samples, and its predictions were compared to the true values of annotation 1.

Figure 4.16 Confusion matrix of logistic regression predictions using a model fitted with
annotation 1 labels vs. annotation 1 true values

It is observed that the results from the first iteration closely resemble the results from k-means clustering vs. the annotation 1 true values. Repeating this with different sets of samples 20 times gives an accuracy of 95.4 ± 0.2%.



For the second iteration, the same model trained in the first iteration was used but

predictions made by the model were contrasted with the true values of annotation 2. The

following confusion matrix resulted from the test.

Figure 4.17 Confusion matrix of logistic regression predictions using a model fitted with
annotation 1 labels vs. annotation 2 true values

Repeating this with different sets of samples 20 times gives an accuracy of 73.2 ± 0.2%. There is an increase in false negative results because, much like in k-means clustering, the model could not distinguish samples that contained any form of barking from samples without barking.



In the third iteration, training of the model was done using samples with annotation 2 labelling. Predictions made by the model were contrasted with the true values of annotation 2. Figure 4.18 shows the confusion matrix that resulted from this iteration.

Figure 4.18 Confusion matrix of logistic regression predictions using a model fitted with
annotation 2 labels vs. annotation 2 true values

Of the true positive predictions, 96 of the 97 samples that were positive in

annotation 1 were predicted to be positive by the model. The remaining 54 true positive

predictions were samples positive for annotation 2 but not for annotation 1. Running this

iteration with different sets of samples 20 times gives an accuracy of 84.2 ± 0.3%.
Lastly, in the fourth iteration, the logistic regression model was again trained using samples with annotation 2 labelling, but its predictions were compared with the annotation 1 true values. Below is the confusion matrix that resulted from this iteration.

Figure 4.19 Confusion matrix of logistic regression predictions using a model fitted with
annotation 2 labels vs. annotation 1 true values

The result observed here is the opposite of that of the second iteration: the number of false positive predictions increases. The model predicted that more samples contained the barking found in the reference sample than actually did.

Running this 20 times with different sets of training samples gives an accuracy of 83.7 ± 0.5%.


Table 4.2 Accuracy and standard error values for predictions made by k-means

    Compared against              Accuracy
    Annotation 1 true values      95.6 ± 0.2%
    Annotation 2 true values      71.8 ± 0.5%

Table 4.3 Accuracy and standard error values for predictions made by logistic regression

    Training labels    Compared against    Accuracy
    Annotation 1       Annotation 1        95.4 ± 0.2%
    Annotation 1       Annotation 2        73.2 ± 0.2%
    Annotation 2       Annotation 2        84.2 ± 0.3%
    Annotation 2       Annotation 1        83.7 ± 0.5%

Tables 4.2 and 4.3 summarize the performance of both machine learning methods in this study. It can be seen that k-means clustering behaves similarly to the logistic regression model trained with annotation 1 labels. Both the logistic regression and k-means clustering approaches show that this method of using PCA to cluster data is only applicable to the specific case; generalization (i.e. being able to identify other forms of activity of the same nature) is a limitation of the current method. As for the accuracy values of the model trained with annotation 2, these may be attributed to overfitting by the scikit-learn algorithm.


4.3 Application of Logistic Regression Model


To test the effectiveness of the fitted models, a second set of 85 samples recorded from the commercial area of Sidlakan Marketing was prepared and annotated. Figures 4.20 and 4.21 show the PC plots with respect to annotations 1 and 2, respectively.

Figure 4.20 PC plot of the 85 samples according to annotation 1 categorization


Figure 4.21 PC plot of the 85 samples according to annotation 2 categorization

After the samples were annotated, the trained logistic regression models were applied to predict their labels. Figure 4.22 shows the confusion matrix that resulted from comparing the predictions of the model fitted with annotation 1 of the training samples against the annotation 1 true values.


Figure 4.22 Confusion matrix of the first iteration using the samples from Sidlakan
Marketing
For the first iteration, the logistic regression model predicted 96.5% of the labels correctly when compared with the annotation 1 true values. Two of the three false positive samples contained voices with frequencies similar to the barking in the reference sample. The third false positive sample contained an ambulance siren whose frequency content was also similar to the barking in the reference sample.

Figure 4.23 a) Frequency spectrum of the sample with the ambulance siren and b)
Spectrum comparison of the sample with the ambulance and the reference sample

For the second iteration, the logistic regression model was trained using annotation 2 of the original dataset. Comparing its predictions with the annotation 2 true labels of the new dataset produces the following confusion matrix.

Figure 4.24 Confusion matrix of the second iteration using the samples from Sidlakan
Marketing
Observe the sharp decrease in accuracy when applying the model trained with annotation 2 of the original dataset: the model identified only 69.4% of the new dataset accurately. It can be concluded that the second model struggled to determine which samples contained barking and which did not. This further shows that generalization is currently a limitation of the study and that this method can only identify the specific acoustic activity found in the reference sample.


Chapter 5

Conclusion and Recommendations

The study successfully calibrated the Sound Monitoring, Assessment, and Recording Tool for Environmental Acoustic Research (SMART-EAR) system. The ReSpeaker 6-Mic Circular Array Kit was used as the test microphone.

Two calibration procedures were implemented for this study: calibration by single-sensitivity calculation and calibration by per-frequency correction. The reference microphone used was a laboratory-standard Brüel & Kjær microphone connected to the LabVIEW software. The study revealed that calibration by single sensitivity was not effective in matching the performance of the test microphone to that of the reference microphone, as seen in the different sensitivity values calculated at each frequency. The per-frequency correction method was the better option: upon applying correction factors to the frequency spectrum, the calculated RMS and SPL values of the SMART-EAR system resembled those of the reference microphone, showing that the performance of the test microphone now matched that of the reference microphone.
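A simplified sketch of the per-frequency correction idea follows. The function names, the dB convention for the correction factors, and the RMS-based level computation are illustrative assumptions, not the thesis code:

```python
import numpy as np

def apply_per_frequency_correction(test_spectrum, correction_db):
    """Scale each frequency bin of the test microphone's magnitude
    spectrum by its own correction factor, given in dB."""
    return np.asarray(test_spectrum) * 10.0 ** (np.asarray(correction_db) / 20.0)

def overall_level_db(spectrum, p_ref=20e-6):
    """Simplified overall level (dB re 20 uPa) from the RMS of a
    pressure magnitude spectrum."""
    rms = np.sqrt(np.mean(np.asarray(spectrum) ** 2))
    return 20.0 * np.log10(rms / p_ref)

# A +20 dB correction multiplies a bin's magnitude by 10.
corrected = apply_per_frequency_correction(np.ones(4), np.full(4, 20.0))
```

In contrast, single-sensitivity calibration would apply one scalar factor to every bin, which cannot compensate for a frequency response that deviates by different amounts at different frequencies.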

Once calibration was completed, field testing was done. In this study, the SMART-EAR system was placed in West City Homes and was set to record 2.5-minute audio samples every 5 minutes. This was done over three 24-hour periods, yielding a total of 863 samples. One of these was selected as the reference sample; it contained the barking of a female Yorkshire Terrier living in the subdivision. By plotting the frequency spectrum of each sample against the reference sample, principal component analysis (PCA) was applied to reduce the number of dimensions from 7981 frequencies to just two principal components.
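The dimensionality reduction step can be sketched with scikit-learn's PCA; the array shapes below are placeholders for the study's 863 samples by 7981 frequency bins:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Placeholder spectra: rows are recordings, columns are frequency-bin
# magnitudes (7981 bins in the study; fewer here for illustration).
spectra = rng.normal(size=(100, 500))

# Project every high-dimensional spectrum onto its first two
# principal components, giving one (PC1, PC2) pair per sample.
pca = PCA(n_components=2)
pcs = pca.fit_transform(spectra)
```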

The PC plot that resulted from this dimensionality reduction was found to be very effective in clustering the samples in which the barking of the Yorkshire Terrier was present. The plot revealed that the more closely the frequency spectrum of a recording resembled that of the reference sample, the closer the data point lay to the x-axis. It was also revealed that both the computed SPL value of the recording, specifically the dBA, and the primary frequencies in the sample affected the computed PC values.

K-means clustering was found to be highly effective in separating samples that contained the activity found in the reference sample from those that did not, with an accuracy of 95.6 ± 0.2%. However, it struggled to separate samples that contained barking of any kind of dog from those that did not, with an accuracy of only 71.8 ± 0.5%. This proved that the current method is limited in identifying barking in the general sense and can only identify the barking found in the reference sample.
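As a sketch of the clustering step, again with synthetic placeholder PC values standing in for the real ones, the two-cluster k-means fit looks like this:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Placeholder PC values forming two well-separated groups, standing in
# for samples with and without the reference barking.
pcs = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
                 rng.normal(3.0, 0.3, size=(50, 2))])

# Partition the 2-D PC plane into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pcs)
```

Because k-means is unsupervised, the cluster indices must afterwards be matched against the annotations to compute an accuracy, as was done for Table 4.2.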
The results from applying logistic regression to the dataset were similar to the results of k-means clustering. Using samples categorized by annotation 1 to train the model, it predicted the presence of the reference barking in the test samples with an accuracy of 95.4 ± 0.2%. Conversely, using samples categorized by annotation 2 to train the model, it struggled to correctly predict the presence of any kind of barking in the test samples, with an accuracy of only 73.2 ± 0.2%.

Applying these models to predict the labels of a second dataset, this time with samples gathered from a commercial area, revealed strong performance for the model fitted with samples categorized by annotation 1, with an accuracy of 96.5%. On the other hand, the model fitted with samples categorized by annotation 2 struggled to determine which samples contained barking and which did not, giving an accuracy of only 69.4%.

The data shows that this method is able to identify and cluster samples that contain a certain reference acoustic activity. For future research, it is recommended to develop a method that can identify samples in a general sense, i.e. any kind of barking and not just the specific one found in the reference sample. For example, a study by Sethi et al. employed a convolutional neural network (CNN) to monitor ecosystems autonomously using soundscape data [22]. By creating a high-dimensional feature space, that study was able not only to cluster specific biomes but also to identify anomalies in the soundscape. The same could be done in future studies. However, instead of using a CNN to create the feature space, different reference samples could be used to identify and cluster the same samples. Combining these PC plots would then generate the feature space needed to identify generalized activities, i.e. using different kinds of barking as references to cluster samples with barking as a whole.
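One possible reading of this recommendation is sketched below. The helper names and the subtraction-based comparison against each reference are hypothetical choices for illustration, not the thesis method: each reference bark yields a 2-D PC pair per recording, and the pairs are stacked into one higher-dimensional feature vector.

```python
import numpy as np
from sklearn.decomposition import PCA

def pcs_against_reference(spectra, reference):
    """2-D PC coordinates of each spectrum relative to one reference
    sample (hypothetical comparison: bin-wise difference, then PCA)."""
    return PCA(n_components=2).fit_transform(spectra - reference)

def combined_feature_space(spectra, references):
    """Stack the PC pairs obtained against several reference samples
    into one feature vector per recording."""
    return np.hstack([pcs_against_reference(spectra, r) for r in references])

rng = np.random.default_rng(3)
spectra = rng.normal(size=(40, 200))             # placeholder spectra
refs = [rng.normal(size=200) for _ in range(3)]  # e.g. three different barks
feats = combined_feature_space(spectra, refs)    # (PC1, PC2) x 3 references
```

A classifier trained on this combined space would then see each recording through several different reference barks at once, which is the sense in which the feature space could support more general barking detection.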

It is also recommended to investigate how extracting the frequencies involved in the acoustic activity to be identified affects the results. In this study, only the spread of values relative to the reference sample was used for comparison; the specific frequencies involved in the reference sample were neither taken into account nor utilized.

Lastly, during the gathering of samples, there were instances where the SMART-EAR system's recording would stop abruptly. This was found to be caused by the length of the recording: the system often had difficulty writing 2.5 minutes of audio into a .wav file. It is recommended to shorten the recording time to 1 minute to limit the system errors that appear while recording.


Bibliography
[1] World Health Organization Regional Office for Europe (2018, October 9). Environmental Noise Guidelines for the European Region. Retrieved from http://www.euro.who.int/en/health-topics/environment-and-health/noise/publications/2018/environmental-noise-guidelines-for-the-european-region-2018.
[2] Basner, M., Müller, U., & Elmenhorst, E.-M. (2011). Single and Combined Effects of
Air, Road, and Rail Traffic Noise on Sleep and Recuperation. Sleep, 34(1), 11–23. doi:
10.1093/sleep/34.1.11
[3] Vos, T., Flaxman, A. D., Naghavi, M., et al. (2012). Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet, 380, 2163–2196.
[4] Stansfeld, S. A., & Matheson, M. P. (2003). Noise pollution: non-auditory effects on
health. British Medical Bulletin, 68(1), 243–257.
[5] Noriega-Linares, J., & Ruiz, J. N. (2016). On the Application of the Raspberry Pi as
an Advanced Acoustic Sensor Network for Noise Monitoring. Electronics, 5(4), 74.
[6] Whytock, R. C., & Christie, J. (2016). Solo: an open source, customizable and
inexpensive audio recorder for bioacoustic research. Methods in Ecology and Evolution,
8(3), 308–312.
[7] Department of Environment and Natural Resources and Environment Management
Bureau (1999) RA 8749: The Philippine Clean Air Act, Implementing Rules and
Regulations.
[8] Vergel, K. N., Cacho, F. T. & Capiz, C.L. (2004). A Study on Roadside Noise
Generated by Tricycles. Philippine Engineering Journal PEJ 2004; Vol. 25 No. 2:1–22
[9] Everest, F. A., & Pohlmann, K. C. (2015). Master handbook of acoustics (6th ed.).
New York: McGraw-Hill.
[10] Miyara, F. (2017). Software-based acoustical measurements. Springer.
[11] Kuttruff, H. (2014). Acoustics: an Introduction. Boca Raton: CRC Press.

[12] Randall, R. B. (2008). Spectral analysis and correlation. In Handbook of Signal Processing in Acoustics. Springer.
[13] Robinson, D.W., & Dadson, R.S. (1956). A re-determination of the equal-loudness
relations for pure tones. British Journal of Applied Physics, 7(5), 166-181.


[14] Bjor O. (2008). Filters. In Handbook of signal processing in acoustics. Springer.


[15] Barrera-Figueroa, S., Torras-Rosell, A., Rasmussen, K., Jacobsen, F., & Cutanda-Henriquez, V. (2012). A practical implementation of microphone free-field comparison calibration according to the standard IEC 61094-8. Acoustical Society of America.
[16] Lathi, B. P. (2002). Principles of Linear Systems and Signals. New York: Oxford
University Press.
[17] Garg, S., Lim, K. M., & Lee, H. P. (2019). An averaging method for accurately
calibrating smartphone microphones for environmental noise measurement. Applied
Acoustics, 143, 222–228.
[18] Miyara, F. (2017). Software-based acoustical measurements. Springer.
[19] Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.). Springer Series in Statistics. New York: Springer.
[20] Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.). Springer Series in Statistics. New York: Springer.
[21] Fornis, R. (2020) Development of Sound Monitoring, Assessment, and Recording
Tool for Environmental Acoustic Research (SMART-EAR).
[22] Sethi, S. S., Jones, N. S., Fulcher, B. D., Picinali, L., Clink, D. J., Klinck, H., . . .
Ewers, R. M. (2020). Characterizing soundscapes across diverse ecosystems using a
universal acoustic feature set. Proceedings of the National Academy of Sciences,
117(29), 17049-17
Appendix
Table A.1 Sample data with PC values as well as annotation
