
Advanced Engineering Informatics 47 (2021) 101238

Contents lists available at ScienceDirect

Advanced Engineering Informatics


journal homepage: www.elsevier.com/locate/aei

Ambient acoustic event assistive framework for identification, detection, and recognition of unknown acoustic events of a residence
Sharnil Pandya a,*, Hemant Ghayvat b,c,d
a Symbiosis Institute of Technology and Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed) University, Pune, Maharashtra, India
b Innovation Division, Technical University of Denmark, Denmark Faculty of Technology, Denmark
c Computer Science Department, Linnaeus University, Vaxjo, Sweden
d Building Realization and Robotics, Technical University of Munich, Germany

A R T I C L E I N F O

Keywords: Smart home acoustics; Acoustic sensor network; Home monitoring; Residential assistance; LBP; LSTM-CNN

A B S T R A C T

In recent times, Ambient Assisted Living has emerged as Smart Living. Smart living is a subset of ambient intelligence, which uses the latest technologies, intellectual processes, and ambient intelligent methodologies to enable house residents to live independently with a virtual companion 24 × 7. Typically, these residents are so engrossed in their daily routine activities that they tend to ignore certain acoustic events, attributing them to white noise. Such events include tap water leakage, flush water leakage, door opening/closing, cupboard opening/closing, curtain opening/closing, television, shower, radio, chair movement, and many more. These unattended events lead to a waste of critical energy resources such as electricity, water, and gas and may cause accidents in some cases. For the conducted experiments, a customized dataset termed “unknown-2000” and ESC-50 have been used, with more than 2000 audio sound classification samples. The customized dataset consists of acoustic events of various lengths, ranging from 2 s to 10 s. In the presented work, we have identified, analyzed, and evaluated resident acoustic events using the Librosa library, LBP-based texture analysis, and LSTM-CNN, SVM, KNN, LSTM, Bi-LSTM, and Decision Tree-based classification approaches. Furthermore, based on the conducted rigorous and detailed analysis, we also envision prospective ways to enhance smart living concepts by proposing a novel Acoustic Event Detection and Classification System. The investigation results validate the success of the proposed approach. The obtained results indicate that the customized version of the LSTM-CNN based classification approach used in the conducted experiment has outperformed all the other customized classification approaches, such as SVM, KNN-based classification, C4.5 decision tree-based classification, LSTM, and Bi-LSTM based classification. The LSTM-CNN based classification model has achieved an average value of approximately 0.77 and a standard deviation of 0.2295. Furthermore, the obtained experimental results show that the proposed approach performs well in various noisy conditions such as SNR0, SNR3, SNR6, SNR9, SNR12, and SNR15. The system classification accuracy has been enhanced to 77% for various acoustic events of a residence. In the end, a detailed comparison of the approaches with and without LBP has been carried out, which shows that the combination of LBP and the LSTM-CNN classification approach provides better results than the approach without LBP. The proposed Ambient Acoustic Event Assistive Framework is a cost-effective alternative due to the use of low-cost microphone sensors in the conducted experiments.

1. Introduction

Cyber-physical systems aim to provide 24 × 7 connectivity between cyber and physical objects using the latest technologies such as the internet of things (IoT), wireless sensor networks, big data, and artificial intelligence. IoT plays a crucial role in establishing interconnectivity between smart sensing devices, real-world objects, and virtual objects [1–6]. The current era is of IoT and intelligent assistive technologies. The latest ambient intelligence technologies have established the concept of Ambient Assisted Living (AAL). The primary motivation of the presented ambient acoustic assistive framework is the issues older people face worldwide. In most countries, older populations are

* Corresponding author.
E-mail addresses: sharnil.pandya@sitpune.edu.in, sharnil.pandya@scaai.siu.edu.in (S. Pandya).

https://doi.org/10.1016/j.aei.2020.101238
Received 19 April 2020; Received in revised form 30 October 2020; Accepted 21 December 2020
Available online 8 January 2021
1474-0346/© 2021 Elsevier Ltd. All rights reserved.

maturing at a rapid pace. According to UN reports, the elderly population is expected to double by 2050 [7].

In recent times, due to heavy work stress and busy schedules, smart home occupants are so involved in their routine activities (ADLs) that they ignore certain acoustic events that directly impact vital energy resources such as water, gas, and electricity. Unattended events such as tap water leakage and flush water leakage directly impact a family's monthly water consumption. Frequently, house residents leave the fridge door open and start focusing on other daily routine activities, which significantly affects the electricity bill, overall electricity consumption, and the safety of home appliances such as the microwave oven, refrigerator, television, radio, and many more. Recognizing acoustic events such as main door opening/closing and wooden or metal cupboard opening/closing can help residents protect their valuable assets in their absence. For instance, recognizing such acoustic events can mitigate the possibility of theft, robbery, and unauthorized access to the premises by trespassers. However, automatically monitoring and identifying various unattended events at home remains a challenge in establishing a complete ambient acoustic assisting framework for house residents [8].

The presented approach discusses various AAL acoustic technologies that take sound as an input for detecting and classifying smart home acoustic events. Assistive health informatics is inevitable for everyone. Precise sensing information is vital for the detection, recognition, and classification of smart home acoustic events. In existing research on smart home event classification, fellow researchers have employed and examined heterogeneous sensing units in different corners of residences. They have also formulated wellness models to detect and classify routine activities [9,10]. In general, smart home sensing units can be categorized into two categories: wearable sensing units and non-wearable sensing units. Wearable sensing units, such as accelerometers and gyroscopes, are connected to the human body, form a body sensor network, and communicate sensing information over a cloud platform. These sensing units are used in remote health monitoring applications such as fall or gauge detection, elderly monitoring, cardiac arrest detection, detection of respiratory diseases, and many more. Non-wearable sensing units are typically placed in various corners of a smart home. These sensing units are utilized to measure events such as water leakage, trespassers, the lifestyle of the smart home occupants, sleep index, anomalies in the lifestyles, health diseases such as Dementia and Parkinson’s, and the well-being of smart home residents [11–14] (Ciampolini et al., 2019 plus Smart Aging 5 & 11). In wearable sensing unit scenarios, it is challenging for smart home occupants to wear smart gadgets such as smartwatches, smart belts, and smartphones. It is almost impossible to convince someone to wear a variety of smart gadgets on the body. The other option is the use of video-based methodologies to monitor and track resident movements. However, it is not advisable to equip the smart home with CCTV cameras due to security and privacy reasons. Smart home residents are not comfortable with cameras placed in private places such as the toilet and bedroom. Therefore, the research reported in this article is based on an ambient acoustic assistive network. However, the acoustic assistive framework for recognizing unattended house events is still an open research problem at an initial development stage.

In this study, a rigorous and detailed analysis of 22 ignored acoustic events has been conducted. These events were identified and recognized via microphone sensing units placed in various corners of a smart residence. Furthermore, we have applied diverse state-of-the-art machine and deep learning techniques to identify, detect, recognize, and classify ADLs. We have proposed to employ a variety of non-wearable devices and wireless sensor-based AAL approaches. Table 1 presents a list of the terminologies used in the conducted experiments.

Table 1
A list of used terminologies.

| Terminology | Description |
|---|---|
| LBP | Local Binary Pattern |
| LSTM-CNN | Long Short-Term Memory Network - Convolutional Neural Network |
| AAL | Ambient Assisted Living |
| SVM | Support Vector Machines |
| KNN | K-nearest neighbors |
| SNR | Signal to Noise Ratio |
| WSN | Wireless Sensor Networks |
| ADL | Activities of Daily Living |

The rest of the article is organized as follows: Section 2 discusses the state-of-the-art methodologies. Section 3 discusses the necessity and novel features of the proposed Ambient Acoustic Framework. Section 4 discusses the design and experimental setup used in the conducted experiments. Section 5 presents the proposed system topology design, layered architecture, and detailed workflow of the proposed system. It also discusses and analyses various unattended acoustic events in spectrograms, poly-grams, and tempo-grams. Section 6 discusses the obtained results of LBP, LSTM-CNN, KNN, SVM, and C4.5 based classification approaches. Section 7 presents concluding remarks and future enhancements.

2. Related work

Ren et al. [15] have proposed a cooperative acoustic assistive approach for MIMO networks. However, they did not discuss any ideas related to smart home acoustics. Navarro et al. (2018) have analyzed a real-time distributed architecture for remote acoustic monitoring of the elderly. However, they did not consider smart home acoustic events in the conducted experiments. Alsina-Pages et al. [17] have discussed the homeSound system for remote monitoring and behavior pattern surveillance. However, the system did not address any concepts of smart home acoustics, and smart home acoustic events were not considered in the conducted experiments. Lopez-Ballester et al. (2020) have researched an acoustic sensor network for real-time processing of psycho-acoustic parameters. However, they did not discuss anything related to smart home acoustic event classification. Vithiya et al. [19] have analyzed an underwater acoustic sensor network for assessing various routing protocols. However, the conducted work has not discussed any ideas related to smart home acoustic events. Jin et al. [20] have discussed a localization-based acoustic sensing framework. However, they did not discuss anything related to smart home acoustic monitoring and recognition. Pandya et al. [5,6] have discussed a deep learning model for preventive healthcare monitoring and ADLs. They also represent the presented ambient acoustic assistive framework. Vuegen et al. [22] have researched a wireless acoustic sensor network for monitoring ADLs in noisy and clean scenarios. However, the described approach did not discuss anything related to smart home acoustic events. Wang et al. [23] have performed contactless respiratory monitoring using ultrasonic signals. However, they did not discuss smart home acoustic events and their classification. Zhang et al. [24] have conducted respiratory monitoring of mobile users using RFID technology. However, they did not discuss any ideas related to smart home acoustic events. Can et al. [25] have analyzed urban sounds and conducted a noise assessment of mobility trends. However, the discussed work was not related to smart home acoustics. Spoladore et al. [29] have proposed a semantic framework for energy saving in smart homes. This is an ontology-based application that makes use of a variety of sensors and actuators. The proposed approach has not facilitated energy saving using acoustic event detection. Mesaros et al. (2019) have discussed a sound event detection system for recognizing task-based activities using deep neural networks for various human and non-human objects; however, the proposed approach was not applied to recognizing unattended events or smart home acoustics. Lostanlen et al. (2019) have discussed a bioacoustics sensor network for detecting sounds of animals and birds using CNN. The discussed approach was not tested on unknown acoustic events and routine activities of home occupants. Lojka et al. (2010) have


Table 2
A summary of existing research works.

| Year | Author | Purpose | Techniques Used | Dataset | Issues |
|---|---|---|---|---|---|
| 2018 | Navarro et al. [16] | Remote Acoustic Elderly Monitoring | Fog inspired distributed architecture; Two-stage audio event classification; Wireless Acoustic Sensors | Customized homeSound dataset of noisy events | The focus was on human activity detection rather than smart home acoustic event classification. They did not discuss any ideas related to smart home acoustics. |
| 2020 | Lopez-Ballester et al. [18] | Classification of Psycho-acoustic parameters | Wireless Acoustic Sensors; CNN based classification of Psycho-acoustic parameters | Customized dataset of typical sounds occurring in a city | They did not consider smart home acoustic events such as Tap water and Flush water in the experiments. |
| 2017 | Alsina-Pages et al. [17] | Classification of Behaviour and Surveillance Monitoring of Humans | Wireless Acoustic Sensors; Monitoring of 14 indoor environment events | Customized homeSound dataset of indoor events | They did not discuss any ideas related to smart home acoustics. |
| 2020 | Ren et al. [15] | Connectivity of Underwater Acoustic classification of Psycho-acoustic parameters | Underwater Wireless Acoustic Sensors | Monte Carlo Simulations were performed | They did not analyze or evaluate smart home acoustic events. |
| 2019 | Wang et al. [23] | Contactless respiratory monitoring using audio devices | Ultrasound signal processing | 25 different age-group participants were used to collect audio samples. | They did not discuss any ideas related to smart home acoustics. |
| 2020 | Zhang et al. [24] | Accurate respiratory monitoring of mobile users using RFID devices in driving environments | Tensor Canonical Polyadic Decomposition (CPD) | Experiments were conducted in a real-time driving environment. No dataset was used. | They did not discuss any ideas related to smart home acoustics. |
| 2020 | Can et al. [25] | Urban Sound Noise assessment and mitigation | Artificial Neural Network + Fuzzy Logic | Customized urban sound dataset was used. | They did not analyze or evaluate smart home acoustic events. |
| 2019 | Iftikhar et al. [26] | Ambient Acoustic energy Harvesting | Conventional machining operations | No specific dataset selected | They did not discuss any ideas related to smart home acoustics. |
| 2020 | Ghayvat et al. [21] | Smart Home Acoustic Monitoring | Microphone Sensor-based Acoustic Monitoring | The ESC-50 dataset was used in the conducted experiments. | They did not discuss smart home acoustic classification approaches. |
| 2019 | Spoladore et al. [29] | Semantic Framework for Energy Saving in Smart Homes | Ontology-based event classification and energy analysis of a smart home | No specific dataset was used | They did not discuss any ideas related to acoustic event detection, recognition, and classification. |
| 2019 | Bianchi et al. [14,12] | Localization and Identification for ZigBee Wireless Sensor Networks in Smart Homes | RSSI-based localization and identification using a finger-printing approach | No specific dataset was used | They did not discuss smart home acoustic classification approaches. |
| 2019 | Oguntala et al. [13] | RFID enabled human Activity recognition | A novel ambient HAR framework using the multivariate Gaussian | No specific dataset was used | They did not discuss any ideas related to acoustic event detection, recognition, and classification. |
| 2019; 2009 and 2013 | Bianchi et al. [14,12]; [27,28] | Personalized Human Activity Recognition using Wearable Sensors and Deep Learning in Smart Homes | CNN based smart home event classification | A customized HAR (Human Activity Recognition) dataset was used | They did not discuss smart home acoustic detection and classification approaches. |
| 2019 | Bellagente et al. [11] | Framework-oriented approach for the development of AAL systems | Implementation of new open multivendor AAL systems using the web and mobile technologies | Use-case Analysis | They did not discuss any ideas related to acoustic event detection, recognition, and classification. |
| 2019 | Ghayvat et al. [2] | Smart Aging System | Heterogeneous Wireless Sensor Network to monitor ADLs | Customized dataset of AALs and ADLs was used | They did not discuss smart home acoustic classification approaches. |
| 2015 | Forkan et al. [10] | Behavioral change detection and abnormality prediction | Fuzzy rule-based model to predict ADLs | Customized dataset of AALs and ADLs was used | They did not discuss any ideas related to acoustic event detection, recognition, and classification. |
| 2017 | Calvaresi et al. [9] | A review on exploring the AAL domain | Wearable and Non-wearable Sensing technologies to detect ADLs was discussed | No specific dataset was used | They did not discuss any ideas related to wireless acoustic sensing. |


Table 2 (continued)

| Year | Author | Purpose | Techniques Used | Dataset | Issues |
|---|---|---|---|---|---|
| 2013 | Magherini et al. [8] | Automated Recognition of Human Activity | Temporal Logic and Model Checking based Human Activity Recognition | No specific dataset was used | They did not discuss smart home acoustic classification approaches. |
| 2010 | Mesaros et al. [30] | Acoustic Event Detection of real-life scenarios | Recognition and temporal positioning of a sequence of events | No specific dataset was used | They did not discuss any ideas related to acoustic event detection, recognition, and classification. |
| 2017 | Mesaros et al. [31] | Sound Event Detection in the DCASE 2017 challenge | Statistical Analysis of Acoustic Events | No specific dataset was used | They did not discuss smart home acoustic classification approaches related to a smart home. |
| 2019 | Lostanlen et al. [32] | Robust Sound Event Detection using CNNs | Bioacoustic Sensor Networks | The BirdVox-full-night dataset was used | They did not discuss smart home acoustic classification approaches related to a smart home. |
| 2019 | Xie et al. [43] | Investigation of different CNN based model for Bird Sounds | CNN based Bird-sound Classification | Dataset of bird species was used | They did not discuss any ideas related to smart home acoustic event classification. |

researched an acoustic event detection system called “Ear-Tuke” for detecting dangerous acoustic events using modified Viterbi decoding and a Hidden Markov Model. The proposed system was employed for monitoring large urban areas. Alhazmi et al. [33] have analyzed a surface acoustic wave recognition system for civil infrastructures. However, the proposed system was not applied to a variety of unattended house events; the approach is designed explicitly for infrastructures. Choi et al. have researched a DNS based texture analysis approach for detecting railway sound signals. However, this approach has not been tested on smart home acoustic events [34]. Kouzoupis et al. have discussed a categorization of mouse ultrasonic vocalizations using machine learning techniques; however, it cannot be applied to smart home acoustic events [35]. Wang et al. have analyzed palmprint identification using a boosting local binary pattern [42], but there is no evidence that it was applied to acoustic events. Xie et al. have proposed an investigation of different CNN-based models for improved bird sound classification, but it was not applied to or tested on smart home acoustic events [43].

Su et al. have proposed an environment sound classification approach using a two-stream CNN-based classification applied to environmental noise data [44]. Lee has proposed a decision tree-based classification approach for data classification [45]. However, it was not applied to any kind of multimedia data, nor tested on residence acoustic signals. Chagjun and Yuzong have proposed an SVM and KNN based classification approach for vehicle classification [48], but there is no evidence available to confirm that this approach was applied to any kind of acoustic events. George et al. have proposed an acoustic signal classification approach using ANN and CNN algorithms for vehicle detection and classification [49]. However, this approach was not tested on residential acoustic signals. All the previous research focused on wearable gadgets, wireless sensor-based AAL recognition, and monitoring humans using microphone sensors. There is a requirement for developing a system that can classify domestic smart home acoustic events accurately. In this study, we propose a smart system that can detect and classify acoustic events. The proposed system can detect various domestic sounds that are primarily ignored and attributed to white noise at an ordinary residence. The system can help smart home residents mitigate the wastage of natural resources such as water, gas, and electricity, and improve safety. Table 2 presents a summary of existing methodologies.

3. Necessity of a proposed system

With the latest developments in ambient assisted technologies and enhanced living concepts such as home automation, monitoring of daily living activities, and intelligent WSN based energy-saving approaches, the lives of smart home residents have transformed drastically. In general, smart home residents are so engrossed in their daily routine activities that they tend to ignore certain acoustic events, attributing them to white noise. Such events include tap water leakage, flush water leakage, door opening/closing, cupboard opening/closing, curtain opening/closing, television, shower, radio, chair movement, and many more. These unattended events lead to a waste of critical energy resources such as electricity, water, and gas and may cause accidents in some cases. For the conducted experiments, a customized dataset termed “unknown-2000” and ESC-50 have been used, with more than 2000 audio sound classification samples. In the proposed research work, a detailed analysis of a variety of unattended acoustic events has been done using a variety of statistical representations: (i) spectrograms, (ii) poly features, and (iii) tempogram representations. The spectrogram representation analyses various unattended household events in terms of weak and robust frequency components (Internet Reports, 2013–2019). A variety of frequency distributions also categorize several unattended household events into easy-to-recognize and challenging-to-recognize classes using the proposed acoustic event detection system. Poly features represent unattended acoustic events in terms of pitch resolution and pitch class (Internet Reports, 2013–2019). They also indicate the octave loudness and octave height of each recorded acoustic event. The tempogram representation of household acoustic events represents the variation in intensity and measures acoustic tempo, length, and rhythms (Internet Reports, 2013–2019). Furthermore, based on the conducted rigorous and detailed analysis, the recorded acoustic events have been normalized using a linear transformation methodology.

Moreover, in the proposed research work, we have applied an LBP based texture extraction approach to the recorded acoustic events. This approach was previously used for image processing applications and was never applied to acoustic events. In the proposed work, we have extended the application of the LBP algorithm to the acoustic domain. We have presented a novel Ambient Acoustic Event Assistive Framework for various acoustic events of a smart home residence, which was not addressed previously. In the proposed approach, homogeneous microphone sensors were used to record acoustic events. The proposed Ambient Acoustic Event Assistive Framework is a cost-effective alternative due to the use of low-cost microphone sensors in the conducted experiments. Eventually, the validation of the proposed ambient acoustic assistive framework approach has been done using machine learning and deep learning-based classification techniques such as LSTM-CNN based classification, decision-tree classification (C4.5), SVM, KNN, LSTM, and Bi-LSTM based classification methodologies. The obtained results indicate that the customized version of the LSTM-CNN based classification approach used in the conducted experiment has outperformed all the other customized classification approaches, such as SVM, KNN-based classification, C4.5 decision tree-based classification,


LSTM, and Bi-LSTM based classification. The LSTM-CNN based classification model has achieved an average value of 0.7705 and a standard deviation of 0.2295. The LBP-based texture analysis and LSTM-CNN based classification results were tested at various noise conditions such as SNR0, SNR3, SNR6, SNR9, SNR12, SNR15, and SNR18. In the end, we have envisioned the prospective ways of changing smart living and have discussed specific future research directions.

Fig. 1. Design and experimental setup of the Acoustic Event Detection system: (a) microphone sound sensor, (b) USP probe for a sound card, (c) wireless acoustic setup.

Table 3
Technical specifications.

| Sr No | Technical | Specifications |
|---|---|---|
| 1 | Voltage | 3–3.5 V |
| 2 | Outputs | 1 analogue + digital |
| 3 | 2 indicator LEDs | 1 power + 1 comparator |
| 4 | Response Frequency | 20–50 kHz |
| 5 | Output Impedance | 50 Ω – 600 Ω |
| 6 | Sensitivity Values | 47–65 dB |
| 7 | Operating temp | −39 °C to +80 °C |
| 8 | Dimensions | 43 × 14 × 10 mm |

Fig. 2. The topology design of the proposed Ambient Acoustic Assistive System.

Fig. 3. The phase-wise system architecture of the proposed Ambient Acoustic Assistive Framework.

4. Design and experimental setup

The design and experimental setup employed for the conducted experiments is presented in Fig. 1. A USP probe and a microphone sound sensor were installed in a non-echoing room. The USP probe has been fixed 2 m above the floor, underneath the microphone sound sensor. The microphone sound sensor is placed 10 mm above the USP sound probe. A multi-channel audio interface has been placed in the setup to detect and record sound signals from the USP probe, which is connected with a cloud server. In addition to the placed hardware, an acoustic pulse and acoustic calibrators have been used to record and measure acoustic signals. The detailed design specifications are presented in Table 3.

Fig. 2 depicts the overall arrangement of sensing units and the topology design of the wireless acoustic network. Fig. 3 represents the overall structure of the resident acoustic event detection and classification system. The wireless acoustic event detection process is categorized into three different levels: (i) Physical Sensing Layer, (ii) Acoustic Event Detection Layer, and (iii) Acoustic Event Analysis Layer.

Fig. 4. Layered architecture of the proposed Ambient Acoustic Assistive System.


A microphone sound sensor enables the physical sensing layer by detecting various sounds, such as fridge opening and closing, tap water leakage, flush water noise, opening and closing of windows/doors, opening and closing of wooden/metal/plastic cupboards, and many more. An ADC converts the detected analog audio into digital form via a UART based Wi-Fi controller. The audio signals recorded at the physical sensing layer are passed to the acoustic event detection layer for further computations. The acoustic event detection layer is responsible for detecting, amplifying, and storing the audio signals received from the physical sensing layer. The acoustic event detection layer is equipped with a Wi-Fi router, and a 4-channel audio interface is placed to detect and record sound signals from the USP probe connected with a cloud server.

The amplified sounds of detected events are forwarded to the Acoustic Event Analysis Layer for noise removal and classification of detected acoustic events such as tap water leakage. The detected audio is classified and labeled at the acoustic event analysis layer using an acoustic audio classification algorithm, as represented in Fig. 3. A microphone sound sensor initiates this process by detecting an acoustic event such as fridge opening and closing, tap water leakage, and many more. During the process of windowing, the received acoustic signals are converted into digital form using the ADC. The acoustic signals are captured in the form of acoustic window frames of 100 ms. Figs. 4 and 5 represent a layered architecture of the proposed Ambient Acoustic Assistive Framework. The proposed layered design has been classified into seven layers: (i) Physical Layer, (ii) Fog Layer, (iii) Cloud Layer, (iv) Data Presentation Layer, (v) Objective Layer, (vi) Application Layer, and (vii) Connection Layer.
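The windowing step described above, in which digitized audio is captured in frames of 100 ms, can be illustrated with a minimal sketch. The function names `frame_signal` and `frame_energy`, the 16 kHz sampling rate, and the choice of non-overlapping frames are assumptions made for illustration; the section does not state the sampling rate or whether frames overlap.

```python
def frame_signal(samples, sample_rate, frame_ms=100):
    """Split a 1-D sequence of samples into consecutive, non-overlapping frames.

    Non-overlapping frames are an assumption of this sketch.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Drop the trailing partial frame so that every frame has the same length.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_energy(frame):
    """Mean squared amplitude of one frame (a simple activity measure)."""
    return sum(x * x for x in frame) / len(frame)


# One second of a dummy signal at an assumed 16 kHz sampling rate:
# half a second of silence followed by half a second of constant amplitude.
signal = [0.0] * 8000 + [0.5] * 8000
frames = frame_signal(signal, sample_rate=16000)
energies = [frame_energy(f) for f in frames]
```

With these assumed values, each frame holds 1600 samples, and the per-frame energy gives a simple indication of which 100 ms windows contain sound and could be passed on for classification.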

Fig. 5. Spectral representation of unknown acoustic events: (a) tap water, (b) flush water, (c) fridge opening, (d) fridge closing, (e) chair movement.

4.1. Physical layer

The physical sensing layer consists of various microphone sensors to sense various smart home acoustic events such as Fridge Opening, Fridge Closing, Wooden Door Knocking, Metal Door Knocking, and many more.

4.2. Fog layer

The fog layer is responsible for identifying various acoustic events such as Tap water and Flush water using location and event classifiers and sending them to the cloud layer for storage purposes [50].
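The event classifiers mentioned above need a feature representation of each recorded event, and Section 3 proposes LBP based texture extraction for this purpose. The sketch below illustrates only the classic 3 × 3 LBP operator on a small two-dimensional matrix standing in for a time-frequency representation such as a spectrogram. The function names `lbp_codes` and `lbp_histogram`, the neighbour ordering, and the use of a plain list of lists are assumptions for illustration; the exact LBP variant and input representation used by the authors are not specified in this section.

```python
def lbp_codes(matrix):
    """Return the 8-neighbour LBP code for every interior cell of a 2-D list."""
    # Neighbour offsets, clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(matrix), len(matrix[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = matrix[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as large as the center.
                if matrix[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(codes):
    """Normalised 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0] * 256
    total = 0
    for row in codes:
        for code in row:
            hist[code] += 1
            total += 1
    return [h / total for h in hist] if total else hist


# Toy 3 x 3 "spectrogram" patch; only the center cell has a full neighbourhood.
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
codes = lbp_codes(patch)
features = lbp_histogram(codes)
```

The resulting histogram is a fixed-length vector regardless of the event duration, which is one reason texture histograms are convenient inputs for classifiers such as SVM or KNN.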

Table 4
The representation of seven lines of actions for 22 smart home acoustic events.

| Acoustic Events | Physical Layer | Fog Layer | Cloud Layer | Data Presentation Layer | Objective Layer | Application Layer | Connection Layer |
|---|---|---|---|---|---|---|---|
| Tap water | Detection of the tap water acoustic event via microphone sensor | Identification of the tap water acoustic event using location and event classifiers | Data acquisition and storage of the processed tap water acoustics | Establish a secure link between a microcontroller embedded with a microphone sensing unit and a cloud server via the MQTT broker. | Monitoring of the classified tap water activity to detect situations such as tap water leakage | Detection of tap water acoustic event anomalies | Regular updates and notification related to tap water leakage to the caregivers and house members |
| Flush water | Detection of the flush water acoustic event via microphone sensor | Identification of the flush water acoustic event using location and event classifiers | Data acquisition and storage of the processed flush water acoustics | Establishment of a secure link between a microcontroller embedded with a microphone sensing unit, and a cloud server via MQTT broker | The monitoring of the classified flush water activity to detect situations such as tap water leakage | Detection of flush water acoustic event anomalies | Regular updates and notification related to the flush water leakage to the caregivers and house members |
| Wooden Door Opening | Detection of the wooden door opening acoustic event via microphone sensor | Identification of the wooden door opening acoustic event using location and event classifiers | Data acquisition and storage of the processed wooden door opening acoustics | Establishment of a secure link between a microcontroller embedded with a microphone sensing unit, and a cloud server via MQTT broker | The monitoring of the classified wooden door opening activity to detect situations such as tap water leakage | Detection of the wooden door opening acoustic event anomalies | Regular updates and notification related to the wooden door opening to the caregivers and house members |
| Wooden Door Closing | Detection of the wooden door closing acoustic event via microphone sensor | Identification of the wooden door closing acoustic event using location and event classifiers | Data acquisition and storage of the processed wooden door closing acoustics | Establishment of a secure link between a microcontroller embedded with a microphone sensing unit, and a cloud server via MQTT broker | The monitoring of the classified wooden door closing activity to detect situations such as tap water leakage | Detection of the wooden door closing acoustic event anomalies | Regular updates and notification related to the wooden door closing to the caregivers and house members |
| Metal Door Opening | Detection of the metal door opening acoustic event via microphone sensor | Identification of the tap water acoustic event using location and event classifiers | Data acquisition and storage of the processed tap water acoustics | Establishment of a secure link between a microcontroller embedded with a microphone sensing unit, and a cloud server via MQTT broker | The monitoring of the classified tap water activity to detect situations such as tap water leakage | Detection of the metal door opening acoustic event anomalies | Regular updates and notification related to the metal door opening to the caregivers and house members |
Metal Door Detection of the metal Identification of the metal Data acquisition and Establishment of a secure link The monitoring of the Detection of the metal Regular updates and
Closing door closing acoustic door closing acoustic event storage of the processed between a microcontroller embedded classified tap water activity to door closing acoustic notification related to the metal
event via microphone using location and event metal door closing with a microphone sensing unit, and a detect situations such as tap event anomalies door closing to the caregivers
sensor classifiers acoustics cloud server via MQTT broker water leakage and house members
7

Window Detection of the window Identification of the window Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
Opening opening acoustic event opening acoustic event using storage of the processed between a microcontroller embedded classified window opening window opening notification related to the
via microphone sensor location and event classifiers window opening with a microphone sensing unit, and a activity to detect situations acoustic event window opening to the
acoustics cloud server via MQTT broker such as tap water leakage anomalies caregivers and house members
Window Detection of the window Identification of the window Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
Closing closing acoustic event closing acoustic event using storage of the processed between a microcontroller embedded classified window closing window closing notification related to the
via microphone sensor location and event classifiers window closing with a microphone sensing unit, and a activity to detect situations acoustic event window closing to the
acoustics cloud server via MQTT broker such as tap water leakage anomalies caregivers and house members
Curtain Detection of the curtain Identification of the curtain Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
Opening opening acoustic event opening acoustic event using storage of the processed between a microcontroller embedded classified curtain opening curtain opening notification related to the
via microphone sensor location and event classifiers curtain opening with a microphone sensing unit, and a activity to detect situations acoustic event curtain opening to the
acoustics cloud server via MQTT broker such as tap water leakage anomalies caregivers and house members
Curtain Detection of the curtain Identification of the curtain Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
Closing closing acoustic event closing acoustic event using storage of the processed between a microcontroller embedded classified curtain closing curtain closing event notification related to the

Advanced Engineering Informatics 47 (2021) 101238


via microphone sensor location and event classifiers curtain closing with a microphone sensing unit, and a activity to detect situations anomalies curtain closing to the
acoustics cloud server via MQTT broker such as tap water leakage caregivers and house members
Fan Detection of the fan Identification of the fan Data acquisition and Establishment of a secure link The monitoring of the Detection of the fan Regular updates and
acoustic event via acoustic event using location storage of the processed between a microcontroller embedded classified fan activity to acoustic event notification related to the fan
microphone sensor and event classifiers fan acoustics with a microphone sensing unit, and a detect situations such as tap anomalies acoustic event to the caregivers
cloud server via MQTT broker water leakage and house members
Chair Detection of the chair Identification of the chair Data acquisition and Establishment of a secure link The monitoring of the Detection of the chair Regular updates and
Movement movement acoustic movement acoustic event storage of the processed between a microcontroller embedded classified chair movement movement acoustic notification related to the chair
event via microphone using location and event chair movement with a microphone sensing unit, and a activity to detect situations event anomalies movement to the caregivers
sensor classifiers acoustics cloud server via MQTT broker such as tap water leakage and house members
Wooden Detection of the wooden Identification of the wooden Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
Cupboard cupboard opening cupboard opening acoustic storage of the processed between a microcontroller embedded classified wooden cupboard wooden cupboard notification related to the
Opening acoustic event via event using location and wooden cupboard with a microphone sensing unit, and a opening activity to detect opening acoustic wooden cupboard opening to
microphone sensor event classifiers opening acoustics cloud server via MQTT broker event anomalies
(continued on next page)
S. Pandya and H. Ghayvat
Table 4 (continued )
Acoustic Physical Layer Fog Layer Cloud Layer Data Presentation Layer Objective Layer Application Layer Connection Layer
Events

situations such as tap water the caregivers and house


leakage members
Wooden Detection of the wooden Identification of the wooden Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
Cupboard cupboard closing cupboard closing acoustic storage of the processed between a microcontroller embedded classified wooden cupboard wooden cupboard notification related to the
Closing acoustic event via event using location and wooden cupboard with a microphone sensing unit, and a closing activity to detect closing acoustic event wooden cupboard closing to
microphone sensor event classifiers closing acoustics cloud server via MQTT broker situations such as tap water anomalies the caregivers and house
leakage members
Wooden Detection of the wooden Identification of the wooden Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
Door door knocking acoustic door knocking acoustic storage of the processed between a microcontroller embedded classified wooden door wooden door notification related to the
Knocking event via microphone event using location and wooden door knocking with a microphone sensing unit, and a knocking activity to detect knocking acoustic wooden door knocking to the
sensor event classifiers acoustics cloud server via MQTT broker situations such as tap water event anomalies caregivers and house members
leakage
Metal Door Detection of the metal Identification of the metal Data acquisition and Establishment of a secure link The monitoring of the Detection of the metal Regular updates and
Knocking door knocking acoustic door knocking acoustic storage of the processed between a microcontroller embedded classified metal door door knocking notification related to the metal
event via microphone event using location and metal door knocking with a microphone sensing unit, and a knocking activity to detect acoustic event door knocking acoustic event
sensor event classifiers acoustics cloud server via MQTT broker situations such as tap water anomalies to the caregivers and house
leakage members
Cooking Detection of the cooking Identification of the cooking Data acquisition and Establishment of a secure link The monitoring of the Detection of a cooking Regular updates and
Vessel vessel acoustic event via vessel acoustic event using storage of the processed between a microcontroller embedded classified cooking vessel vessel acoustic event notification related to a
microphone sensor location and event classifiers cooking vessel acoustics with a microphone sensing unit, and a activity to detect situations anomalies cooking vessel to the caregivers
cloud server via MQTT broker such as tap water leakage and house members
Television Detection of the Identification of the Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
8

television acoustic event television acoustic event storage of the processed between a microcontroller embedded classified television activity television acoustic notification related to
via microphone sensor using location and event television acoustics with a microphone sensing unit, and a to detect situations such as event anomalies television acoustic event to the
classifiers cloud server via MQTT broker tap water leakage caregivers and house members
Fridge Detection of the fridge Identification of the fridge Data acquisition and Establishment of a secure link The monitoring of the Detection of the fridge Regular updates and
Opening opening acoustic event opening acoustic event using storage of the processed between a microcontroller embedded classified fridge opening opening acoustic notification related to the
via microphone sensor location and event classifiers fridge opening acoustics with a microphone sensing unit, and a activity to detect situations event anomalies fridge opening acoustic event
cloud server via MQTT broker such as tap water leakage to the caregivers and house
members
Fridge Detection of the fridge Identification of the fridge Data acquisition and Establishment of a secure link The monitoring of the Detection of the fridge Regular updates and
Closing closing acoustic event closing acoustic event using storage of the processed between a microcontroller embedded classified fridge closing closing acoustic event notification related to the
via microphone sensor location and event classifiers fridge closing acoustics with a microphone sensing unit, and a activity to detect situations anomalies fridge closing acoustic event to
cloud server via MQTT broker such as tap water leakage the caregivers and house
members
Radio Detection of the radio Identification of the radio Data acquisition and Establishment of a secure link The monitoring of the Detection of the radio Regular updates and

Advanced Engineering Informatics 47 (2021) 101238


acoustic event via acoustic event using location storage of the processed between a microcontroller embedded classified radio acoustic event acoustic event notification related to radio
microphone sensor and event classifiers radio acoustics with a microphone sensing unit, and a to detect situations such as anomalies acoustic event to the caregivers
cloud server via MQTT broker tap water leakage and house members
Shower Detection of the shower Identification of the shower Data acquisition and Establishment of a secure link The monitoring of the Detection of the Regular updates and
acoustic event via acoustic event using location storage of the processed between a microcontroller embedded classified shower activity to shower acoustic event notification related to shower
microphone sensor and event classifiers shower acoustics with a microphone sensing unit, and a detect situations such as tap anomalies acoustic event to the caregivers
cloud server via MQTT broker water leakage and house members
S. Pandya and H. Ghayvat Advanced Engineering Informatics 47 (2021) 101238
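The data presentation column of Table 4 links the microcontroller to a cloud server through an MQTT broker. As a rough illustration only, the sketch below builds the kind of topic string and JSON payload such a link might carry for one classified event. The topic layout, the payload field names, and the `build_event_message` helper are assumptions made for this example and are not taken from the paper. Publishing itself would go through an MQTT client library and is deliberately left out.

```python
import json

# Hypothetical topic layout and payload fields; the paper does not specify them.
TOPIC_TEMPLATE = "home/{location}/acoustic/{event}"


def build_event_message(event, location, confidence, timestamp):
    """Return (topic, payload) for one classified acoustic event."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    slug = event.lower().replace(" ", "_")
    topic = TOPIC_TEMPLATE.format(location=location, event=slug)
    payload = json.dumps(
        {
            "event": event,
            "location": location,
            "confidence": round(confidence, 3),
            "timestamp": timestamp,  # seconds since the epoch, supplied by the caller
        },
        sort_keys=True,
    )
    return topic, payload


topic, payload = build_event_message("Tap water", "kitchen", 0.93, 1609459200)
print(topic)
print(payload)
```

Keeping the location and the event name in the topic would let a subscriber filter by room or by event type with MQTT topic wildcards, which is one possible reading of the "location and event classifiers" mentioned for the fog layer.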

4.3. Cloud layer

The cloud layer is responsible for acquiring and storing the received, processed smart home acoustic events such as Fridge Opening, Fridge Closing, Wooden Door Knocking, Metal Door Knocking, and many more on a cloud platform such as Google Firebase.

4.4. Data Presentation layer

The data presentation layer is responsible for establishing interfaces between a sensing layer, an MQTT cloud broker, a database storage server, and a mobile web interface. The Wi-Fi access point is essential for transmitting sensor data using MQTT (an application layer protocol) over the internet (with TCP as the underlying protocol).

4.5. Objective layer

The objective layer is responsible for further analyzing the data received from the data presentation layer, for instance, to monitor and observe irregular lifestyle activities, disorientation in the conduct of daily living activities, late-night television watching, and many more. Such signs can also be considered as emergency health situations for smart home residents. They can also help predict health conditions such as Dementia, Parkinson's, fall-related activities, and other emergencies.

4.6. Application layer

The application layer is responsible for the detection of smart home acoustic event anomalies. It provides real-time updates of various smart home acoustic events such as a cupboard opening in the absence of smart home occupants, Tap and Flush water leakages, presence of intruders, and many more.

4.7. Connection layer

The connection layer is responsible for passing real-time updates to the interconnected web and mobile applications and notifying house members, health experts, and caregivers about emergency health situations. It provides ignored or unknown information such as fridge door open/close, wooden/metal cupboard open/close, and the impacts of acoustic events on critical resources such as water, gas, and electricity.

5. Acoustic analysis of unattended household events

In the proposed research work, an analysis of 22 acoustic events has been carried out. These events are likely to trigger a warning signal in different corners of the home or places where an older person resides. In this approach, health experts or caregivers have proposed seven diverse lines of action for the recorded 22 acoustic events such as tap water leakage, flush water leakage, metal/wooden windows and door opening/closing, cupboard opening/closing, fridge opening/closing, curtain opening/closing, television, radio, and shower. The detailed analysis of these acoustic events will determine the feasibility of generating alarms/warnings based on the proposed classification of the 22 acoustic events. Table 4 represents seven lines of action for 22 smart home acoustic events.

These recorded acoustic events can be classified into three categories:

Acoustic Events detected after office hours: Certain audio signals such as radio, television, and main door opening/closing could also be considered as events that raise the alarm. Observed activities such as being awake till late at night highlight a distinctive lifestyle or disorientation in daily living activities. Such signs can also be considered as emergency health situations for smart home residents.

Acoustic Events detected before office hours: The acoustic events detected in the early morning or after midnight indicate that a resident is suffering from anxiety or lack of sleep.

Events detected during office hours (when nobody is present at home): The acoustic events detected during office hours fall into the category of precarious events such as the presence of an unauthorized person at home, intimidating situations such as theft or robbery, accessing confidential documents, and many more. Acoustic signals such as metal or plastic cupboard opening/closing audios, window opening/closing audios, and door opening/closing audios could indicate trespassing activities at home. Warning signals generated by such events are reactive measures taken by the proposed system to protect the home from intruders' presence.

The spectral representation of five acoustic events is presented in Fig. 6. To generate the spectrogram representations, we have used the Librosa library and its spectrogram documentation [36]. Although the frequency range appears similar in some of the acoustic events, in the proposed research work, a detailed analysis has been done to identify the subtle differences between the 22 acoustic events in order to distinguish and classify them. For example, it can be observed that acoustic events such as television and radio have similar frequency distributions. Still, it can be observed that radio has stronger components at high frequencies as well as low frequencies. Furthermore, in the case of tap and flush water leakage, it can be observed that flush water leakage has stronger components at high, medium, and low frequencies compared to tap water leakage. In the case of metal door opening/closing and wooden door opening/closing, the spectral representation indicates that metal door closing/opening has stronger components at all frequency levels, which also suggests that the proposed system can analyze and detect metal sounds more quickly than wooden sounds, given the recorded frequencies. Similarly, metal cupboard opening/closing audios show stronger components at high, medium, and low frequencies than wooden cupboard opening/closing sounds. Moreover, the curtain opening/closing audios show stronger components at high and low frequencies than window opening/closing sounds. Such sounds are therefore easier to detect in the absence of people at home than window opening/closing sounds. For infrequent events such as fan and chair movement, fans show strong components at low, medium, and high frequencies. It is clear from the spectral representation that it is not easy to accurately detect chair movement audios. The shower acoustic event has a spectral representation similar to that of a running fan, making it feasible for the proposed system to detect a person in a washroom among the acoustic events detected before and after office hours. As far as wooden and metal door knocking acoustic events are concerned, the spectral representations clearly show that metal door knocking acoustic events identify visitors outside the main door more quickly than wooden door knocking audios. The cooking vessel acoustic event has strong components at both high and low frequencies.

In addition to the spectral representations shown in Fig. 6, various spectral statistical representations have been computed to analyze and capture the timbral aspects of the recorded 22 acoustic events. We have used the Librosa library and its documentation for the polygram (poly-features) representation [38]. These Mel-scaled representations will help identify low resolution of pitches and pitch classes. The significance of pitch class is that it helps the proposed system identify differences and variations present in the recorded 22 acoustic events. The pitch class differentiates the recorded acoustic events in terms of their octave loudness and octave height. Fig. 6 represents the quadratic, linear, and constant terms of the two order poly features computed from the spectral representation of five recorded acoustic events; in this study, a further detailed analysis of the 22 acoustic events has been done to identify the subtle differences and variations, such as pitch class and variations in the octave loudness, temper, or height, that distinguish the recorded acoustic events.

Fig. 6. Polygram representation of unknown acoustic events: (a) tap water, (b) flush water, (c) fridge opening, (d) fridge closing, (e) chair movement.

For instance, it can be observed that there is a vast difference in the poly-features representation of radio and television acoustic events. It indicates that radio represents a better resolution of pitch class concerning octave loudness and height parameters in quadratic, linear, and constant forms. In the case of leakages such as tap and flush water, the poly features representation indicates that the flush water acoustic event is much louder than the tap water leakage acoustic event. Such acoustic events directly impact critical resources such as monthly water usage, location-based water consumption, and water wastage. Furthermore, for the activities which occur in the absence of smart home residents, such
as metal and wooden door opening/closing and metal and wooden cupboard opening/closing, the statistical poly representation indicates that the audios of the metal door and cupboard acoustic events have good octave height, octave loudness, and resolution as compared to wooden door and cupboard acoustic events. In metal and wooden door knocking acoustic event detections, the two order poly features representation highlights that the proposed system can quickly detect metal door knocking sounds due to higher pitch class. As far as unattended acoustic events such as curtain opening/closing and window opening/closing are concerned, the statistical representation indicates that it is challenging to recognize window opening and closing sounds due to low pitch and loudness parameters. For fan and chair movement acoustic events, the two order poly features representation indicates that even though a fan acoustic event represents constant poly features, the chair movement is easier to detect due to higher octave loudness and pitch class. For acoustic events such as shower and fan, fans have a better pitch resolution than a shower. However, the shower acoustic event has better loudness and height, making it easier for the proposed system to recognize activities such as a person's presence in a washroom before and after office hours. The acoustic events such as vessel noise indicate that the proposed system can effectively recognize this event due to high pitch class, better pitch resolution, loudness, and height.

The tempogram representation of five acoustic events is shown in Fig. 7. In general, the tempogram representation is used for analyzing music. However, in the conducted research work, the tempogram representation has been used to measure acoustic event intensity, tempo, length, and rhythm for smart home residents. To generate the tempogram representation, we have used the Librosa library and its tempogram documentation [39,40]. The tempogram representation is measured using time and BPM (beats per minute) units. In the case of acoustic events such as radio and television, the tempogram representation indicates that the intensity and onset strength of television acoustic events are higher than those of radio acoustic events. The estimated tempo values recorded for radio and television are 123.047 and 129.099, respectively.

Furthermore, the temporal representation indicates that although flush water leakage has higher intensity, the tap water leakage acoustic event represents constant onset strength and tempo values compared to flush water leakage. As far as acoustic events such as metal and wooden door and cupboard opening/closing are concerned, it can be observed that even though the onset strength of wooden door opening is higher, the estimated tempo of metal door opening is 123.047, compared to 117.454 for wooden door opening. In contrast, the proposed system can easily recognize the wooden door closing acoustic event due to its higher tempo value of 135.99, compared to around 103.359 for metal door closing. Identical observations have been made for metal and wooden door knocking acoustic events. The estimated tempo values of metal cupboard opening and closing are 143.555 and 151.999, respectively.

In the case of wooden cupboard opening and closing acoustic events, the recorded estimated tempo values are 112.347 and 99.384, respectively. In the case of frequent events such as fan and chair movement, it can be observed that even though the fan represents higher onset strength, the chair movement acoustic event represents steady onset strength. However, a fan's estimated tempo value is higher than that of the chair movement acoustic event, 129.999. In acoustic events such as shower and cooking, the tempogram representation indicates that they have almost similar onset strength. However, the estimated tempo value of the cooking vessel acoustic event, around 129.199, is higher than that of the shower acoustic event, around 123.047.

Fig. 7. Tempogram representation of unknown acoustic events: (a) tap water, (b) flush water, (c) fridge opening, (d) fridge closing, (e) chair movement.

6. Classification of smart home acoustic events

6.1. Data pre-processing module

In the initial phase, the acoustic signal is transformed into a 2D grayscale image. First, the linear transformation methodology is applied to normalize the detected acoustic event such as fridge opening/closing, cupboard opening/closing, tap water/flush water leakage, and many more. The customized dataset is used for the conducted experiments,
consisting of acoustic events of various lengths ranging from 2 s to 10 s. The acoustic events have been normalized to the same length without using a sliding time window. After completing the normalization process, the acoustic data is converted into a 2D grayscale image, as shown in Fig. 8. Each acoustic signal value is normalized between 0 and 255. As shown in Fig. 8, the acoustic sound signal is mapped vertically onto a 2D grayscale image of size p × p. In this phase, we have compared portrait priority with landscape priority and confirmed that, whichever is selected, the performance on the normalized signal is not affected; only the texture directions differ.

Fig. 8. Acoustic Event Normalization Process.

6.2. LBP based texture analysis module

In this phase, the texture information is extracted from the 2D grayscale image using the Local Binary Patterns (LBP) methodology. The LBP algorithm has already been applied successfully in image processing [41] (Wan et al., 2006). However, in the undertaken study, a novel approach has been presented by extending the LBP algorithm application to acoustic events. Fig. 9 depicts the process of extracting textures using the LBP algorithm. The 2D surface textures can be represented by two complementary measures: (i) local binary spatial patterns and (ii) grayscale level contrasts. The LBP classifier forms a variety of labels for image pixels $(X_c, Y_c)$. It also thresholds the image of size p × p with the center value and expresses the result in the form of a binary number. The function of LBP can be represented by,

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p$$

where $s(x) = 1$ if $x \ge 0$ and $s(x) = 0$ otherwise, $g_c$ is the gray value of the center pixel, and $g_p$ is the gray value of its p-th neighbor.

Fig. 9. LBP based Texture Analysis.

Fig. 10. Proposed Acoustic Event Detection System Process Flow.

The notation (P, R) is utilized for neighboring pixels, where R is the radius of a circle, and P represents the number of sampling points. The variance of the grayscale of the local neighborhood is used as a complementary contrast measure. The LBP histogram of $2^n$ labels can be used to represent texture descriptors. The LBP histogram (LBPH) can be represented by,

$$LBPH_i = \sum_{m,n} F\{I(m,n) = i\}, \quad i = 0, 1, 2, \ldots, p-1$$

where p is the number of labels produced by the LBP operator, F{x} equals 1 if x is true and 0 otherwise, and I(m, n) represents the LBP labeled image pixels. For LBP histograms of different sizes, a coherent description is given by the normalized histogram $LBPH_i^n$, in which each bin $LBPH_i$ is divided by the sum of all bins.
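The pre-processing and texture steps of Sections 6.1 and 6.2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the signal is zero-padded or truncated to p × p samples, the image is filled column by column as one reading of the "vertical" mapping, the LBP operator uses P = 8 neighbours on the square ring at R = 1, and border pixels are skipped. All function names are hypothetical.

```python
def normalize_to_gray(signal):
    """Linearly rescale samples to integer gray levels in [0, 255]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0 for _ in signal]
    return [round(255 * (v - lo) / (hi - lo)) for v in signal]


def to_image(gray, p):
    """Fill a p x p image column by column (assumed 'vertical' mapping)."""
    padded = (gray + [0] * (p * p))[: p * p]  # assumption: zero-pad or truncate
    return [[padded[col * p + row] for col in range(p)] for row in range(p)]


# Offsets of the P = 8 neighbours at radius R = 1, in a fixed clockwise order.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_label(image, r, c):
    """LBP_{8,1} = sum_p s(g_p - g_c) * 2^p with s(x) = 1 if x >= 0 else 0."""
    centre = image[r][c]
    label = 0
    for k, (dr, dc) in enumerate(NEIGHBOURS):
        if image[r + dr][c + dc] >= centre:
            label += 1 << k
    return label


def lbp_histogram(image):
    """Count each of the 2^8 labels over interior pixels (borders skipped)."""
    hist = [0] * 256
    size = len(image)
    for r in range(1, size - 1):
        for c in range(1, size - 1):
            hist[lbp_label(image, r, c)] += 1
    return hist


def normalize_histogram(hist):
    """Divide each bin by the sum of all bins."""
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)


# Toy 16-sample signal standing in for a length-normalized acoustic event.
signal = [0.0, 0.5, -0.5, 1.0, -1.0, 0.25, 0.75, -0.25,
          0.1, -0.1, 0.9, -0.9, 0.3, -0.3, 0.6, -0.6]
image = to_image(normalize_to_gray(signal), p=4)
hist = lbp_histogram(image)
norm_hist = normalize_histogram(hist)
print(sum(hist), round(sum(norm_hist), 6))
```

Because the normalized histogram sums to one regardless of image size, descriptors from images of different sizes become directly comparable, which is the role the normalized LBPH plays in the text.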

Table 5
Basic Properties of recorded 22 Acoustic Events.

| No. | Recorded Acoustic Events | SNR (dB) | Mean Intensity Value | Minimum Intensity Value | Maximum Intensity Value |
|---|---|---|---|---|---|
| 1 | Tap water | 9.6512 | −2.5 × 10⁻⁵ | −0.4012 | 0.4890 |
| 2 | Flush water | 9.2343 | −2.9 × 10⁻⁵ | −0.5556 | 0.5734 |
| 3 | Wooden Door Opening | 4.6812 | −3.5 × 10⁻⁵ | −0.8492 | 0.8890 |
| 4 | Wooden Door Closing | 4.6512 | −3.5 × 10⁻⁵ | −0.8312 | 0.8756 |
| 5 | Metal Door Opening | 4.3412 | −3.6 × 10⁻⁵ | −0.8989 | 0.8670 |
| 6 | Metal Door Closing | 4.3215 | −3.7 × 10⁻⁵ | −0.8567 | 0.8545 |
| 7 | Window Opening | 5.6559 | −3.5 × 10⁻⁵ | −0.8234 | 0.8123 |
| 8 | Window Closing | 5.4345 | −3.5 × 10⁻⁵ | −0.8478 | 0.8290 |
| 9 | Curtain Opening | 8.7876 | −1.5 × 10⁻⁵ | −0.9476 | 0.9110 |
| 10 | Curtain Closing | 8.8974 | −1.5 × 10⁻⁵ | −0.9456 | 0.9224 |
| 11 | Fan | 3.6543 | −8.4 × 10⁻⁵ | −0.5434 | 0.5578 |
| 12 | Chair Movement | 7.4565 | −4.0 × 10⁻⁵ | −0.8443 | 0.8311 |
| 13 | Wooden Cupboard Opening | 3.6510 | −3.5 × 10⁻⁵ | −0.8122 | 0.8450 |
| 14 | Wooden Cupboard Closing | 3.6423 | −3.5 × 10⁻⁵ | −0.8322 | 0.8545 |
| 15 | Wooden Door Knocking | 5.6512 | −1.1 × 10⁻⁵ | −0.90492 | 0.9145 |
| 16 | Metal Door Knocking | 7.4653 | −1.5 × 10⁻⁵ | −0.9623 | 0.9291 |
| 17 | Cooking Vessel | 7.4534 | −9.4 × 10⁻⁶ | −0.3652 | 0.3651 |
| 18 | Television | 7.6512 | −3.8 × 10⁻⁶ | −0.8492 | 0.8890 |
| 19 | Fridge Opening | 4.6810 | −3.7 × 10⁻⁵ | −0.8321 | 0.8990 |
| 20 | Fridge Closing | 4.6512 | −3.7 × 10⁻⁵ | −0.8212 | 0.8892 |
| 21 | Radio | 7.5513 | −9.4 × 10⁻⁶ | −0.3652 | 0.3651 |
| 22 | Shower | 4.6512 | −8.5 × 10⁻⁵ | −0.5456 | 0.5890 |
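Properties of the kind listed in Table 5 could be obtained along the following lines. The paper does not state how its SNR values were computed, so the sketch below assumes one common definition: the ratio of mean-square power between an event segment and a separately recorded noise-only segment, expressed in dB. It also assumes that the intensity values are raw waveform sample amplitudes. The function names and the toy samples are hypothetical.

```python
import math


def intensity_summary(samples):
    """Mean, minimum, and maximum sample value of a waveform."""
    return sum(samples) / len(samples), min(samples), max(samples)


def snr_db(signal_segment, noise_segment):
    """SNR in dB as 10 * log10(P_signal / P_noise), using mean-square power."""
    p_signal = sum(v * v for v in signal_segment) / len(signal_segment)
    p_noise = sum(v * v for v in noise_segment) / len(noise_segment)
    if p_noise == 0:
        raise ValueError("noise segment has zero power")
    return 10.0 * math.log10(p_signal / p_noise)


# Toy data: the event is ten times larger in amplitude than the noise floor,
# i.e. one hundred times larger in power.
event = [0.4, -0.4, 0.4, -0.4]
noise = [0.04, -0.04, 0.04, -0.04]

mean_value, min_value, max_value = intensity_summary(event)
print(mean_value, min_value, max_value)
print(round(snr_db(event, noise), 3))
```

A near-zero mean together with roughly symmetric minimum and maximum values, as in most rows of Table 5, is what one would expect from a zero-centred waveform.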

The normalized LBP histogram of Section 6.2 is represented by,

$$LBPH_i^n = \frac{LBPH_i}{\sum_{j=0}^{p-1} LBPH_j}$$

6.3. Classification modules

In the final phase, various machine learning and deep learning-based classifiers are applied to the transformed texture image produced by the LBP algorithm.

a. LSTM-CNN based classification

We have implemented an LSTM-CNN based deep learning classification algorithm [43] (Zhang et al., 2019). Machine learning algorithms can process raw data but cannot represent or convert raw data into the form a system could detect. Manual intervention is needed to convert the processed raw data into meaningful representations. In the proposed research work, a deep learning algorithm has been employed to process raw data and convert it into the representations needed for detection or classification without manual intervention.

A deep learning algorithm can efficiently deal with various linear and nonlinear data structures, such as audio, video, text, and images. The proposed algorithm is designed to process data in multi-dimensions or multiple arrays. The proposed acoustic event detection algorithm has mainly two kinds of layers: (i) CNN layers and (ii) pooling layers. Fig. 10 provides a use case of the processing of an acoustic image. Table 5 represents the SNR values and the minimum, maximum, and mean intensity values of the recorded 22 acoustic events. The CNN layers are responsible for generating features or feature maps as per the given filter values (weight values). As shown in Table 6, the convolution process is applied to the input data and performed as a sliding window. The features convolve over a local region of the data and produce an output that becomes the input values for the next layer.

b. Decision Tree classification:

In the proposed research work, a decision-tree-based classification algorithm (C4.5 rather than ID3) [45,46] has been used in the conducted experiments. The C4.5 algorithm is an extension of the ID3 algorithm, which will help data analysts understand and describe the achieved results compared to other complex probabilistic classification methodologies such as ID3 and multiclass classification. The decision-tree based classification algorithm (C4.5) will help handle numerical properties, tree-depth issues, anomalies, and cost issues.

c. SVM based classification:

SVM-based classification [47,48] is a widely used binary classification methodology for classification problems. The primary purpose of the SVM-based classification methodology is to find a hyperplane in an N-dimensional feature space that separates the data points of different classes.

d. KNN based classification:

The K-nearest neighbor (KNN) classification methodology [49] is a machine learning technique that identifies the closest data points for data classification. KNN makes use of the Euclidean distance to find the distance between the data points.

7. Results and discussions

An unknown-2000 dataset and the widely known ESC-50 sound classification dataset have been used to evaluate the proposed acoustic event detection system for unknown or ignored events of a residence. Furthermore, measuring robustness to noise is an essential factor in the


Table 6
The LSTM-CNN based classification model for acoustics classification.

| No. | Layer | Feature map | Size | Kernel size | Stride | Activation | Parameters |
|---|---|---|---|---|---|---|---|
| – | INPUT IMAGE | 1 | 60 × 41 × 2 | – | – | – | – |
| 1 | CONVOLUTIONS | 32 | 58 × 39 × 32 | 32 | 1 | Relu | 608 |
| 2 | BATCH NORMALIZATION | – | 58 × 39 × 32 | – | – | – | 128 |
| 3 | DROPOUT | – | 58 × 39 × 32 | – | – | – | 0 |
| 4 | CONVOLUTIONS | 32 | 56 × 37 × 32 | 32 | 1 | Relu | 9,248 |
| 5 | BATCH NORMALIZATION | – | 56 × 37 × 32 | – | – | – | 128 |
| 6 | MAXPOOLING | 32 | 28 × 18 × 32 | 32 | 1 | Relu | 0 |
| 7 | CONVOLUTIONS | 64 | 26 × 16 × 64 | 64 | 1 | Relu | 18,496 |
| 8 | BATCH NORMALIZATION | – | 26 × 16 × 64 | – | – | – | 256 |
| 9 | DROPOUT | – | 26 × 16 × 64 | – | – | – | 0 |
| 10 | CONVOLUTIONS | 64 | 24 × 14 × 64 | 64 | 1 | Relu | 36,928 |
| 11 | BATCH NORMALIZATION | – | 24 × 14 × 64 | – | – | – | 256 |
| 12 | MAXPOOLING | 64 | 12 × 7 × 64 | 64 | 1 | Relu | 0 |
| 13 | CONVOLUTIONS | 128 | 10 × 5 × 128 | 128 | 1 | Relu | 73,856 |
| 14 | BATCH NORMALIZATION | – | 10 × 5 × 128 | – | – | – | 512 |
| 15 | DROPOUT | – | 10 × 5 × 128 | – | – | – | 0 |
| 16 | CONVOLUTIONS | 128 | 8 × 3 × 128 | 128 | 1 | Relu | 147,584 |
| 17 | BATCH NORMALIZATION | – | 8 × 3 × 128 | – | – | – | 512 |
| 18 | MAXPOOLING | 128 | 4 × 1 × 128 | 128 | 1 | Relu | 0 |
| 19 | FC | – | 512 | – | – | – | 0 |
| 20 | BATCH NORMALIZATION | – | 512 | – | – | – | 2,048 |
| 21 | DROPOUT | – | 512 | – | – | – | 0 |
| 22 | FC | – | 1024 | – | – | Relu | 525,312 |
| 23 | BATCH NORMALIZATION | – | 1024 | – | – | – | 4,096 |
| 24 | DROPOUT | – | 1024 | – | – | – | 0 |
| 25 | FC | – | 512 | – | – | Relu | 524,800 |
| 26 | BATCH NORMALIZATION | – | 512 | – | – | – | 2,048 |
| 27 | DROPOUT | – | 512 | – | – | – | 0 |
| 28 | OUTPUT(FC) | – | 50 | – | – | SOFTMAX | 25,650 |

conducted experiments. To this end, the structural similarity (SSIM) method and Gaussian noise have been used in training the proposed system. Six different classification methodologies, namely LSTM-CNN based classification, SVM, KNN-based classification, C4.5 decision tree-based classification, LSTM, and Bi-LSTM based classification, have been applied to validate the proposed acoustic event detection and classification system. Figs. 12 and 13 represent the accuracy and loss performance of the proposed system using the LSTM-CNN based classification methodology.
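The layer sizes and parameter counts listed in Table 6 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes 3 × 3 convolution kernels, stride 1, no padding, and 2 × 2 max pooling. These settings are not stated in the text; they are chosen here because they reproduce the sizes and parameter counts given in Table 6. The function names are hypothetical.

```python
def conv2d_valid(h, w, c_in, filters, k=3):
    """Output shape and trainable parameters of an unpadded, stride-1 k x k convolution."""
    out_shape = (h - k + 1, w - k + 1, filters)
    params = (k * k * c_in + 1) * filters  # kernel weights plus one bias per filter
    return out_shape, params


def max_pool(h, w, c, pool=2):
    """Output shape of non-overlapping pool x pool max pooling (floor division)."""
    return (h // pool, w // pool, c)


def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer (weights plus biases)."""
    return (n_in + 1) * n_out


def table6_walkthrough():
    """Walk the conv/pool/FC stack of Table 6 under the assumptions above."""
    shape = (60, 41, 2)  # input image size from Table 6
    conv_params = []
    for filters in (32, 64, 128):      # three blocks of two convolutions each
        for _ in range(2):
            shape, p = conv2d_valid(*shape, filters)
            conv_params.append(p)
        shape = max_pool(*shape)       # one pooling layer closes each block
    flat = shape[0] * shape[1] * shape[2]
    fc_params = [dense_params(flat, 1024),
                 dense_params(1024, 512),
                 dense_params(512, 50)]
    return shape, flat, conv_params, fc_params


if __name__ == "__main__":
    print(table6_walkthrough())
```

Under these assumptions the final feature map is 4 × 1 × 128, which flattens to the 512 units of layer 19, and the convolution and fully connected parameter counts agree with the last column of Table 6.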

7.1. Data collection and annotation

A standard benchmark dataset, ESC-50, and the unknown-2000 dataset have been used in the undertaken study; together they provide a labeled collection of more than 2000 sound classification audio samples. Furthermore, microphone sensors have been used to record real-time acoustic events of a residence. The customized dataset used for the conducted experiments consists of acoustic events of various lengths, ranging from 2 s to 10 s. These events are likely to trigger an alarm in the different corners of a smart home or places where a smart home occupant resides. In this approach, we have proposed seven diverse lines of action for the 22 recorded acoustic events, such as tap water leakage, flush water leakage, metal/wooden windows and door opening/closing, cupboard opening/closing, fridge opening/closing, curtain opening/closing, television, radio, shower and many more, as shown in Fig. 3. The detailed analysis of these acoustic events has been done in Section 4 in the spectrogram, polygram, and tempogram representations.
Fig. 11. Two-level LBP based Texture Representation of unknown acoustic events (a) tap water (b) flush water (c) fridge opening (d) fridge closing (e) chair movement.

The spectrogram representation is a 2D grayscale image of the recorded 22 acoustic events. Polygram Mel-scaled representations help identify low resolution of pitches and pitch classes and the temporal representation of acoustic events to measure acoustic event intensity, tempo, length, and rhythm for smart home residents. The graphical


Fig. 12. The training and testing accuracy performance representation of the LSTM-CNN Classification Model.

representation of the recorded 22 acoustic events is presented in Figs. 6–8. The total sample size used for each residential acoustic event is 100. Furthermore, the Gaussian noise methodology has been used to generate signal-to-noise ratio (SNR) variations (SNR0, SNR3, SNR6, SNR9, SNR12, SNR15, and SNR18 dB). The SNR values are used with the widely known ESC-50 dataset and the recorded unknown-2000 acoustic events in a test dataset for the conducted experiments.

7.2. LBP based texture analysis results

The texture images were extracted from 2D grayscale acoustic signals using an LBP algorithm. In the conducted LBP experiments, the average execution time was 0.5 s, and the execution time ranged between 0.345 and 0.589 s. The generated results indicate that the proposed acoustic event detection and classification methodology can be used in real time to detect new acoustic events at home in scenarios such as before, after, and during office hours. Fig. 10 depicts the system architecture design of the proposed acoustic event detection and classification system. Fig. 11 represents the texture analysis of five detected unknown acoustic events along with their 2D grayscale images.
The Gaussian noise methodology has been applied to the recorded acoustic events to verify the noise robustness of the proposed acoustic event detection and classification system. Furthermore, the structural similarity methodology has been used to remove the noise present in the recorded acoustic events. As shown in Fig. 14, a comparison between the average SNR values of 22 acoustic events has been carried out with and without applying the LBP algorithm. The sequence of the acoustic events that has been considered is shown in Table 2. The gray line represents the SNR values without LBP, and the yellow line represents the SNR values after the application of LBP. Fig. 14 shows that there is a vast difference in the SNR values of the 22 acoustic events between the two scenarios, before and after applying the LBP methodology. Furthermore, Fig. 16(a) and (b) represent the SNR variations of the tap water and flush water leakage acoustic events. Structural similarity also decreases as the SNR value decreases [34]. However, better results have been acquired after applying the LBP algorithm to the 22 acoustic events. Similarly, we conducted SNR variation experiments for all the recorded acoustic events, and the obtained results highlight that the LBP-based texture algorithm has provided satisfactory results. Based on the conducted experiments, LBP based texture analysis can be effectively applied to acoustic data as well as to other structures such as image and text.

Model Validation Metrics

In the proposed research work, specificity (SP), sensitivity (SS), accuracy parameter (AP), precision (P), Negative Predictive Rate (NPR), False Positive Rate (FPR), False Discovery Rate (FDR), False Negative Rate (FNR), F1 Score (F), and correlation coefficient (CC) metrics have been analyzed to validate the proposed model [2]. Here TPS, TNS, FPS, and FNS denote the numbers of true positive, true negative, false positive, and false negative samples, and PS and NS denote the total numbers of positive and negative samples.
Specificity can be represented by,

$$\text{Specificity} = \frac{TNS}{TNS + FPS} \tag{4}$$

Sensitivity can be represented by,

$$\text{Sensitivity} = \frac{TPS}{TPS + FNS} \tag{5}$$

Accuracy can be represented by,

$$\text{Accuracy} = \frac{TPS + TNS}{PS + NS} \tag{6}$$

Precision can be represented by,

$$\text{Precision} = \frac{TPS}{TPS + FPS} \tag{7}$$

Negative Predictive Rate can be represented by,

$$\text{Negative Predictive Rate (NPR)} = \frac{TNS}{TNS + FNS} \tag{8}$$

Positive Predictive Rate can be represented by,

$$\text{Positive Predictive Rate (PPR)} = \frac{TPS}{TPS + FPS} \tag{9}$$

False Positive Rate can be represented by,

$$\text{False Positive Rate (FPR)} = \frac{FPS}{FPS + TNS} \tag{10}$$

False Discovery Rate can be represented by,

$$\text{False Discovery Rate (FDR)} = \frac{FPS}{FPS + TPS} \tag{11}$$

False Negative Rate can be represented by,

$$\text{False Negative Rate (FNR)} = \frac{FNS}{FNS + TPS} \tag{12}$$

F1 Score can be represented by,


Fig. 13. The training and testing loss representation of the LSTM-CNN Classification Model.

Fig. 14. The structural similarity comparison of the average value of 22 acoustic events as per Table 5 (ESC-50 and unknown-2000 dataset).

$$\text{F1 Score (F)} = \frac{2\,TPS}{2\,TPS + FPS + FNS} \tag{13}$$

The correlation coefficient can be represented by,

$$\text{Correlation Coefficient (CC)} = \frac{TPS \times TNS - FPS \times FNS}{\sqrt{(TPS + FPS)(TPS + FNS)(TNS + FPS)(TNS + FNS)}} \tag{14}$$

Table 6 represents the LSTM-CNN based acoustic classification model used in the conducted experiments. The detailed validation of the proposed acoustic detection and classification system has been done using machine learning and deep learning-based classification techniques such as LSTM-CNN based classification, decision-tree classification (C4.5), SVM, LSTM, Bi-LSTM, and KNN based classification. Fig. 12 represents the training and testing accuracy, and Fig. 13 depicts the loss representation of the LSTM-CNN classification model. Fig. 14 represents the structural similarity comparison of the average value of 22 acoustic events. Fig. 15 represents the confusion matrix of the average value of 22 acoustic events. Table 7 represents the F1 score comparison of 22 acoustic events. The obtained results indicate that the customized version of the LSTM-CNN based classification approach used in the conducted experiments has, on average, outperformed all the other customized classification approaches, namely SVM, KNN-based classification, C4.5 decision tree-based classification, LSTM, and Bi-LSTM based classification. The LSTM-CNN based classification model has achieved an average F1 score of approximately 0.77 and a standard deviation of 0.2308.
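The validation metrics of Eqs. (4)–(14) can all be derived from the four confusion-matrix counts. The following is a minimal sketch that follows the standard confusion-matrix definitions of these metrics; the function name and dictionary keys are hypothetical, and the sketch assumes that every denominator is non-zero.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Validation metrics from confusion-matrix counts.

    tp, tn, fp, fn play the role of TPS, TNS, FPS, FNS in the text.
    Assumption of this sketch: no denominator is zero.
    """
    cc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),            # also called PPR
        "npr": tn / (tn + fn),
        "fpr": fp / (fp + tn),                  # equals 1 - specificity
        "fdr": fp / (fp + tp),                  # equals 1 - precision
        "fnr": fn / (fn + tp),                  # equals 1 - sensitivity
        "f1": 2 * tp / (2 * tp + fp + fn),
        "cc": (tp * tn - fp * fn) / cc_den,     # correlation coefficient
    }


if __name__ == "__main__":
    # Hypothetical counts, used only to exercise the function.
    for name, value in classification_metrics(tp=5, tn=6, fp=6, fn=5).items():
        print(f"{name}: {value:.4f}")
```

With the hypothetical counts above, the sensitivity and specificity are both 0.5, the precision is about 0.4545, and the NPR is about 0.5455.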


Fig. 15. The confusion matrix representation of the average value of 22 acoustic events as per Table 5 (ESC-50 and unknown-2000 dataset).


Table 7
Classification of 22 Acoustic Events (ESC-50 and unknown-2000 dataset): F1 score per classifier.

| Acoustic Events | LSTM-CNN | Decision Tree | KNN | SVM | LSTM | Bi-LSTM |
|---|---|---|---|---|---|---|
| Tap water | 0.4800 | 0.4320 | 0.4011 | 0.4112 | 0.4770 | 0.4778 |
| Flush water | 0.5736 | 0.5540 | 0.5339 | 0.5689 | 0.5634 | 0.56504 |
| Wooden Door Opening | 0.8821 | 0.8884 | 0.9015 | 0.9195 | 0.8789 | 0.8790 |
| Wooden Door Closing | 0.8767 | 0.8450 | 0.8224 | 0.8334 | 0.8678 | 0.8752 |
| Metal Door Opening | 0.8686 | 0.8112 | 0.8115 | 0.8334 | 0.8470 | 0.8479 |
| Metal Door Closing | 0.8589 | 0.8200 | 0.8223 | 0.8356 | 0.8215 | 0.8234 |
| Window Opening | 0.8132 | 0.8078 | 0.8011 | 0.8122 | 0.8020 | 0.8112 |
| Window Closing | 0.8291 | 0.7877 | 0.7980 | 0.8020 | 0.7990 | 0.7998 |
| Curtain Opening | 0.9101 | 0.8978 | 0.8109 | 0.8865 | 0.8990 | 0.9011 |
| Curtain Closing | 0.9207 | 0.8890 | 0.8890 | 0.9211 | 0.9113 | 0.9178 |
| Fan | 0.5541 | 0.5111 | 0.4933 | 0.5434 | 0.5434 | 0.5479 |
| Chair Movement | 0.8361 | 0.7678 | 0.7767 | 0.8233 | 0.8211 | 0.8289 |
| Wooden Cupboard Opening | 0.8483 | 0.7988 | 0.8116 | 0.8523 | 0.8240 | 0.8411 |
| Wooden Cupboard Closing | 0.8592 | 0.8113 | 0.8223 | 0.8467 | 0.8332 | 0.8456 |
| Wooden Door Knocking | 0.9101 | 0.8656 | 0.8945 | 0.9044 | 0.8912 | 0.9067 |
| Metal Door Knocking | 0.9207 | 0.7988 | 0.8110 | 0.8977 | 0.9090 | 0.9178 |
| Cooking Vessel | 0.3671 | 0.2545 | 0.2789 | 0.3454 | 0.3154 | 0.3423 |
| Television | 0.8821 | 0.6756 | 0.7623 | 0.8765 | 0.8656 | 0.8689 |
| Fridge Opening | 0.8963 | 0.6989 | 0.7876 | 0.8777 | 0.8543 | 0.8779 |
| Fridge Closing | 0.8821 | 0.2443 | 0.2112 | 0.3233 | 0.8546 | 0.8612 |
| Radio | 0.3671 | 0.1990 | 0.2456 | 0.2901 | 0.3449 | 0.3511 |
| Shower | 0.5881 | 0.5110 | 0.5448 | 0.5667 | 0.5667 | 0.5697 |
| Average | 0.7692 | 0.6526 | 0.6832 | 0.7259 | 0.7495 | 0.7571 |
| Standard Deviation | 0.2308 | 0.3474 | 0.3168 | 0.2741 | 0.2505 | 0.24291 |

However, the C4.5, KNN, and SVM classifiers have achieved average values of 0.6526, 0.6832, and 0.7259, with standard deviations of 0.3474, 0.3168, and 0.2741, respectively. Furthermore, the LSTM and Bi-LSTM based advanced classification approaches have obtained average values of 0.7495 and 0.7571. Table 8 represents the classification metrics of the LSTM-CNN classification model for the 22 acoustic events. The conducted acoustic experiments also indicate that LBP-based texture analysis and LSTM-CNN based classification can provide significant classification results under various noisy conditions, namely SNR18, SNR15, SNR12, SNR9, SNR6, SNR3, and SNR0. Table 9 represents the classification of the 22 acoustic events under these acoustic-noise conditions (see also Fig. 16).

8. Conclusions and discussions

In general, home residents are so engrossed in their daily routine activities that they ignore certain acoustic events such as tap water leakage, flush water leakage, the acoustics of door opening/closing, cupboard opening/closing, curtain opening/closing, television, shower, radio, chair and many more. However, these unattended events have an enormous impact on critical resources such as electricity, water, and gas. A standard benchmark dataset, ESC-50, and the unknown-2000 dataset have been used in the conducted experiments; together they provide a labeled collection of more than 2000 sound classification audio samples. In the proposed approach, we have analyzed 22 unknown or ignored acoustic events sensed by an acoustic sensor network installed at various corners of a residence. The customized dataset is

Table 8
Classification Metrics of LSTM-CNN Classification Model for 22 Acoustic Events (ESC-50 and unknown-2000 dataset).

| Acoustic Events | Sensitivity | Specificity | PPR | NPR | FPR | FDR | FNR | Accuracy | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
| Tap water | 0.5 | 0.5 | 0.4545 | 0.5455 | 0.5 | 0.5455 | 0.5 | 0.5 | 0.4800 |
| Flush water | 0.5324 | 0.5263 | 0.6218 | 0.4348 | 0.4737 | 0.3782 | 0.4676 | 0.5299 | 0.5736 |
| Wooden Door Opening | 0.7895 | 0.7143 | 0.9993 | 0.0062 | 0.2857 | 0.0007 | 0.2105 | 0.7893 | 0.8821 |
| Wooden Door Closing | 0.7808 | 0.7143 | 0.9993 | 0.0062 | 0.2857 | 0.0007 | 0.2192 | 0.7807 | 0.8767 |
| Metal Door Opening | 0.7681 | 0.7143 | 0.9992 | 0.0062 | 0.2857 | 0.0008 | 0.2319 | 0.768 | 0.8686 |
| Metal Door Closing | 0.7538 | 0.375 | 0.998 | 0.0037 | 0.625 | 0.002 | 0.2462 | 0.7529 | 0.8589 |
| Window Opening | 0.6863 | 0.5 | 0.9977 | 0.005 | 0.5 | 0.0023 | 0.3137 | 0.6857 | 0.8132 |
| Window Closing | 0.7091 | 0.5 | 0.998 | 0.005 | 0.5 | 0.002 | 0.2909 | 0.7085 | 0.8291 |
| Curtain Opening | 0.8367 | 0.6 | 0.9976 | 0.0184 | 0.4 | 0.0024 | 0.1633 | 0.8355 | 0.9101 |
| Curtain Closing | 0.8545 | 0.6 | 0.9979 | 0.0184 | 0.4 | 0.0021 | 0.1455 | 0.8534 | 0.9207 |
| Fan | 0.685 | 0.0074 | 0.4652 | 0.0184 | 0.9926 | 0.5348 | 0.315 | 0.3853 | 0.5541 |
| Chair Movement | 0.7193 | 0.5 | 0.9981 | 0.005 | 0.5 | 0.0019 | 0.2807 | 0.7187 | 0.8361 |
| Wooden Cupboard Opening | 0.7377 | 0.375 | 0.9978 | 0.0037 | 0.625 | 0.0022 | 0.2623 | 0.7368 | 0.8483 |
| Wooden Cupboard Closing | 0.7538 | 0.625 | 0.9998 | 0.0062 | 0.375 | 0.0012 | 0.2462 | 0.7535 | 0.8592 |
| Wooden Door Knocking | 0.8367 | 0.6 | 0.9976 | 0.0184 | 0.4 | 0.0024 | 0.1633 | 0.8355 | 0.9101 |
| Metal Door Knocking | 0.8545 | 0.6 | 0.9979 | 0.0184 | 0.4 | 0.0021 | 0.1455 | 0.8534 | 0.9207 |
| Cooking Vessel | 0.4915 | 0.0071 | 0.2929 | 0.0164 | 0.9929 | 0.7071 | 0.5085 | 0.2278 | 0.3671 |
| Television | 0.7895 | 0.7143 | 0.9993 | 0.0062 | 0.2857 | 0.0007 | 0.2105 | 0.7893 | 0.8821 |
| Fridge Opening | 0.814 | 0.4118 | 0.9972 | 0.0087 | 0.5882 | 0.0028 | 0.186 | 0.8124 | 0.8963 |
| Fridge Closing | 0.7895 | 0.7143 | 0.9993 | 0.0062 | 0.2857 | 0.0007 | 0.2105 | 0.7893 | 0.8821 |
| Radio | 0.4915 | 0.0071 | 0.2929 | 0.0164 | 0.9929 | 0.7071 | 0.5085 | 0.2278 | 0.3671 |
| Shower | 0.7142 | 0.0074 | 0.4999 | 0.0184 | 0.9926 | 0.5001 | 0.2858 | 0.4184 | 0.5881 |
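As a worked check of how the columns of Table 8 relate to each other, the Flush water row can be recomputed by hand. The check assumes that PPR denotes precision and that the F1 score is the harmonic mean of PPR and sensitivity; this is the standard definition and is used here as an assumption.

```latex
\begin{aligned}
F &= \frac{2 \cdot \mathrm{PPR} \cdot \mathrm{Sensitivity}}{\mathrm{PPR} + \mathrm{Sensitivity}}
   = \frac{2 \times 0.6218 \times 0.5324}{0.6218 + 0.5324} \approx 0.5736,\\
\mathrm{FDR} &= 1 - \mathrm{PPR} = 1 - 0.6218 = 0.3782,\\
\mathrm{FPR} &= 1 - \mathrm{Specificity} = 1 - 0.5263 = 0.4737,\\
\mathrm{FNR} &= 1 - \mathrm{Sensitivity} = 1 - 0.5324 = 0.4676.
\end{aligned}
```

All four values agree with the Flush water row of Table 8.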


Table 9
Classification of 22 Acoustic Events under various acoustic-noise conditions (ESC-50 and unknown-2000 dataset).

| Acoustic Events | SNR18 | SNR15 | SNR12 | SNR9 | SNR6 | SNR3 | SNR0 |
|---|---|---|---|---|---|---|---|
| Tap water | 0.4890 | 0.4754 | 0.4565 | 0.4434 | 0.3998 | 0.2998 | 0.2540 |
| Flush water | 0.5734 | 0.5678 | 0.5643 | 0.5498 | 0.5209 | 0.5201 | 0.5120 |
| Wooden Door Opening | 0.8890 | 0.8773 | 0.8581 | 0.8324 | 0.8320 | 0.8110 | 0.8009 |
| Wooden Door Closing | 0.8756 | 0.8665 | 0.8578 | 0.8567 | 0.8334 | 0.7325 | 0.6912 |
| Metal Door Opening | 0.8670 | 0.8434 | 0.8123 | 0.8456 | 0.7345 | 0.7787 | 0.6334 |
| Metal Door Closing | 0.8545 | 0.8434 | 0.8232 | 0.8221 | 0.7233 | 0.7223 | 0.6123 |
| Window Opening | 0.8123 | 0.8009 | 0.8055 | 0.8032 | 0.7003 | 0.7113 | 0.6132 |
| Window Closing | 0.8290 | 0.8110 | 0.8090 | 0.7432 | 0.7767 | 0.7450 | 0.6456 |
| Curtain Opening | 0.9110 | 0.9002 | 0.8987 | 0.8655 | 0.8112 | 0.7877 | 0.7545 |
| Curtain Closing | 0.9224 | 0.9007 | 0.8776 | 0.8550 | 0.8343 | 0.7998 | 0.6776 |
| Fan | 0.5578 | 0.5468 | 0.5323 | 0.5112 | 0.4675 | 0.4223 | 0.3244 |
| Chair Movement | 0.8311 | 0.8311 | 0.8555 | 0.8432 | 0.7903 | 0.7223 | 0.6550 |
| Wooden Cupboard Opening | 0.8450 | 0.8450 | 0.8555 | 0.8432 | 0.7903 | 0.7223 | 0.6550 |
| Wooden Cupboard Closing | 0.8545 | 0.8545 | 0.8555 | 0.8432 | 0.7903 | 0.7223 | 0.6550 |
| Wooden Door Knocking | 0.9145 | 0.9145 | 0.8757 | 0.8343 | 0.7996 | 0.7434 | 0.6987 |
| Metal Door Knocking | 0.9291 | 0.9113 | 0.8778 | 0.8332 | 0.7845 | 0.6787 | 0.6114 |
| Cooking Vessel | 0.3651 | 0.3651 | 0.3443 | 0.3118 | 0.2878 | 0.2434 | 0.2111 |
| Television | 0.8890 | 0.8890 | 0.8654 | 0.8223 | 0.7865 | 0.7223 | 0.6878 |
| Fridge Opening | 0.8990 | 0.8990 | 0.8434 | 0.8190 | 0.7986 | 0.6654 | 0.6124 |
| Fridge Closing | 0.8892 | 0.8892 | 0.8232 | 0.8343 | 0.7345 | 0.7878 | 0.6323 |
| Radio | 0.3651 | 0.3554 | 0.3310 | 0.2987 | 0.2543 | 0.2232 | 0.1878 |
| Shower | 0.5890 | 0.5676 | 0.5221 | 0.5112 | 0.4678 | 0.4534 | 0.3878 |
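The noise conditions of Table 9 (SNR0 to SNR18 dB) are produced with Gaussian noise. The text does not specify how the noise is scaled, so the following is only a sketch of one common approach: white Gaussian noise is added with a variance chosen so that the ratio of signal power to noise power matches the target SNR. The function names, the fixed random seed, and the 440 Hz test tone are hypothetical choices made for illustration.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=0):
    """Return a copy of `signal` with white Gaussian noise at the target SNR (dB)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))  # SNR_dB = 10*log10(Ps/Pn)
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 16 kHz (illustrative input only).
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    for snr in (18, 15, 12, 9, 6, 3, 0):
        noisy = add_gaussian_noise(clean, snr)
        print(f"target {snr:2d} dB, measured {measured_snr_db(clean, noisy):.2f} dB")
```

Each noisy copy can then be passed through the same feature extraction and classification steps as the clean recording, which gives one row of scores per event across the seven noise conditions.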

Fig. 16. SNR variations (average values) of (a) Tap Water Leakage Acoustic Event (b) Flush Water Leakage Acoustic Event (ESC-50 and unknown-2000 dataset).

used for the conducted experiments and consists of acoustic events of various lengths, ranging from 2 s to 10 s. The detailed analysis of the recorded residence acoustic events has been done in spectrogram, polygram, and tempogram representations to identify the intensity, pitch, pitch class, resolution, tempo, length, and rhythm of the acoustic events using the Librosa libraries. Furthermore, based on this analysis, the recorded acoustic events have been normalized using a linear transformation methodology. In the undertaken study, we have applied an LBP based texture extraction approach, which was previously used for image processing applications only, to the recorded acoustic events. In the proposed work, we have thus extended the LBP methodology to sound signals.
The validation of the proposed approach has been done using machine learning and deep learning-based classification techniques such as LSTM-CNN based classification, decision-tree classification (C4.5), SVM, KNN, LSTM, and Bi-LSTM based classification. The obtained results indicate that the customized version of the LSTM-CNN based classification approach used in the conducted experiments has, on average, outperformed all the other customized classification approaches, namely SVM, KNN-based classification, C4.5 decision tree-based classification, LSTM, and Bi-LSTM based classification. The LSTM-CNN based classification model has achieved an average value of approximately 0.77 and a standard deviation of 0.2308. However, the C4.5, KNN, and SVM classifiers have achieved average values of 0.6526, 0.6832, and 0.7259, with standard deviations of 0.3474, 0.3168, and 0.2741, respectively. Furthermore, the LSTM and Bi-LSTM based advanced classification approaches have obtained average values of 0.7495 and 0.7571. The conducted acoustic experiments also indicate that LBP-based texture analysis and LSTM-CNN based classification can provide significant classification results under various noisy conditions, namely SNR18, SNR15, SNR12, SNR9, SNR6, SNR3, and SNR0. The LSTM-CNN based classifier has achieved approximately 77% classification performance, which is higher than that of the other classifiers. In the end, a detailed comparison of the approaches with and without LBP has been carried out, which shows that the combination of LBP and the LSTM-CNN classification approach provides better results than the classification approach without LBP.
In the future, the proposed acoustic detection and classification system can be used to find the impact of various recorded resident acoustic events on critical resources such as water, gas, and electricity under a variety of noisy conditions.

Summary Points:

(i) What is Already Known or not done
• Basic digital signal processing approaches and noise removal.
• Use of heterogeneous sensors to detect activities of daily living.
• Application of machine learning and deep learning-based image processing methodologies for recognizing AAL activities and emergencies.
• Prior research efforts are scattered, and a complete Ambient Acoustic Event Assistive Framework has not been proposed.
(ii) The contributions that this study makes are
• Design and development of a novel Ambient Acoustic Event Assistive Framework for various acoustic events of a smart home residence, which was not addressed previously. A standard benchmark dataset, ESC-50, and the unknown-2000 dataset have been used in the conducted experiments; together they provide a labeled collection of more than 2000 sound classification audio samples. In the proposed approach, homogeneous microphone sensors were used to record acoustic events. The detailed analysis of the recorded residence acoustic events has been done in spectrogram, polygram, and tempogram representations to identify the intensity, pitch, pitch class, resolution, tempo, length, and rhythm of the acoustic events using the Librosa libraries.
• The proposed system's performance has been tested under various noisy conditions, namely SNR18, SNR15, SNR12, SNR9, SNR6, SNR3, and SNR0 dB.
• The obtained results indicate that the customized version of the LSTM-CNN based classification approach used in the conducted experiments has, on average, outperformed all the other customized classification approaches, namely SVM, KNN-based classification, C4.5 decision tree-based classification, LSTM, and Bi-LSTM based classification. The LSTM-CNN based classification model has achieved an average value of 0.7692 and a standard deviation of 0.2308. However, the C4.5, KNN, and SVM classifiers have achieved average values of 0.6526, 0.6832, and 0.7259, with standard deviations of 0.3474, 0.3168, and 0.2741, respectively. Furthermore, the LSTM and Bi-LSTM based advanced classification approaches have obtained average values of 0.7495 and 0.7571. The conducted acoustic experiments also indicate that LBP-based texture analysis and LSTM-CNN based classification can provide significant classification results under various noisy conditions, namely SNR18, SNR15, SNR12, SNR9, SNR6, SNR3, and SNR0.
• In the end, a detailed comparison of the approaches with and without LBP has been carried out.
• The proposed Ambient Acoustic Event Assistive Framework is a cost-effective alternative due to the low-cost microphone sensors used in the conducted experiments.

Author contribution

Sharnil Pandya: Conceptualization, Data Collection, Methodology, Analysis and Interpretation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] R. Maskeliūnas, R. Damaševičius, Segal, A Review of Internet of Things Technologies for Ambient Assisted Living Environments, Future Internet, 2019, 11, 259, MDPI.
[2] H. Ghayvat, M. Awais, S. Pandya, H. Ren, S. Akbarzadeh, S. Chandra Mukhopadhyay, C. Chen, P. Gope, A. Chouhan, W. Chen, Smart aging system: uncovering the hidden wellness parameter for well-being monitoring and anomaly detection, Sensors 19 (4) (2019) 766.
[3] S. Pandya, H. Ghayvat, K. Kotecha, M. Awais, S. Akbarzadeh, P. Gope, S.C. Mukhopadhyay, W. Chen, Smart home anti-theft system: a novel approach for near real-time monitoring and smart home security for wellness protocol, Appl. Syst. Innov. 1 (42) (2018). MDPI.
[4] M. Awais, H. Ghayvat, A. Krishnan Pandarathodiyil, W.M. Nabillah Ghani, A. Ramanathan, S. Pandya, N. Walter, M.N. Saad, R.B. Zain, I. Faye, Healthcare professional in the loop (HPIL): classification of standard and oral cancer-causing anomalous regions of oral cavity using textural analysis technique in autofluorescence imaging, Sensors 20 (2020) 5780.
[5] S. Pandya, H. Ghayvat, A. Sur, M. Awais, K. Kotecha, S. Saxena, N. Jassal, G. Pingale, Pollution weather prediction system: smart outdoor pollution monitoring and prediction for healthy breathing and living, Sensors 20 (2020) 5448.
[6] S. Pandya, A. Sur, K. Kotecha, Smart epidemic tunnel: IoT-based sensor-fusion assistive technology for COVID-19 disinfection, Int. J. Pervasive Comput. Commun. (2020), https://doi.org/10.1108/IJPCC-07-2020-0091. Vol. ahead-of-print No. ahead-of-print.
[7] World Population Ageing UN reports 2019, Available: https://www.un.org/en/development/desa/population/publications/pdf/ageing/WorldPopulationAgeing2019-Report.pdf.
[8] T. Magherini, A. Fantechi, C.D. Nugent, E. Vicario, Using temporal logic and model checking in automated recognition of human activities for ambient-assisted living, IEEE Trans. Human-Machine Syst. 43 (6) (2013) 509–521.
[9] D. Calvaresi, D. Cesarini, P. Sernani, M. Marinoni, A.F. Dragoni, A. Sturm, Exploring the ambient assisted living domain: a systematic review, J. Amb. Intel. Hum. Comp. 8 (2017) 239–257.


[10] A.R.M. Forkan, I. Khalil, Z. Tari, S. Foufou, A. Bouras, A context-aware approach for long-term behavioral change detection and abnormality prediction in ambient assisted living, Pattern Recogn. (2015).
[11] P. Bellagente, C. Crema, A. Depari, A. Flammini, G. Lenzi, S. Rinaldi, Framework-oriented approach to ease the development of ambient assisted-living systems, IEEE Syst. J. 13 (4) (2019) 4421–4432.
[12] V. Bianchi, M. Bassoli, G. Lombardo, P. Fornacciari, M. Mordonini, I. De Munari, IoT wearable sensor and deep learning: an integrated approach for personalized human activity recognition in a smart home environment, IEEE Internet J. 6 (5) (2019) 8553–8562.
[13] G.A. Oguntala, et al., SmartWall: Novel RFID-Enabled ambient human activity recognition using machine learning for unobtrusive health monitoring, IEEE Access 7 (2019) 68022–68033.
[14] V. Bianchi, P. Ciampolini, I. De Munari, RSSI-Based indoor localization and identification for ZigBee wireless sensor networks in smart homes, IEEE Trans. Instrum. Measur. 68 (2) (2019) 566–575.
[15] Q. Ren, Y. Sun, Y. Huo, L. Zhang, S. Li, Connectivity on underwater MI-assisted acoustic cooperative MIMO networks, Sensors 20 (20) (2020) 3317.
[16] J. Navarro, E. Vidaña-Vila, R.M. Alsina-Pagès, M. Hervás, Real-time distributed architecture for remote acoustic elderly monitoring in residential-scale ambient assisted living scenarios, Sensors 18 (2018) 2492.
[17] R.M. Alsina-Pagès, J. Navarro, F. Alías, M. Hervás, homeSound: real-time audio event detection based on high-performance computing for behaviour and surveillance remote monitoring, Sensors 17 (2017) 854.
[18] J. Lopez-Ballester, A. Pastor-Aparicio, S. Felici-Castell, J. Segura-Garcia, M. Cobos, Enabling real-time computation of psycho-acoustic parameters in acoustic sensors using convolutional neural networks, IEEE Sensors J. 20 (19) (2020) 11429–11438, https://doi.org/10.1109/JSEN.2020.2995779.
[19] R. Vithiya, G. Sharmila, S. Karthika, Enhancing the performance of routing protocol in underwater acoustic sensor networks, in: 2018 IEEE International Conference on System, Computation, Automation and Networking (ICSCA), Pondicherry, 2018, pp. 1–5. https://doi.org/10.1109/ICSCAN.2018.8541155.
[20] N. Jin, X. Zhou, Z. Wang, Y. Liu, L. Wang, Robust sequence-based localization in acoustic sensor networks, in: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, AB, IEEE, 2018, pp. 3809–3813. https://doi.org/10.1109/ICASSP.2018.8461945.
[21] H. Ghayvat, S. Pandya, A. Patel, Deep learning model for acoustics signal based preventive healthcare monitoring and activity of daily living, in: 2nd International Conference on Data, Engineering and Applications (IDEA), Bhopal, India, 2020, pp. 1–7. https://doi.org/10.1109/IDEA49133.2020.9170666.
[22] L. Vuegen, B. Van Den Broeck, P. Karsmakers, H. Van Hamme, B. Vanrumste, Monitoring activities of daily living using Wireless Acoustic Sensor Networks in clean and noisy conditions, in: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, 2015, pp. 4966–4969, https://doi.org/10.1109/EMBC.2015.7319506.
[23] T. Wang, et al., Contactless respiration monitoring using ultrasound signal with off-the-shelf audio devices, IEEE Internet Things J. 6 (2) (2019) 2959–2973, https://doi.org/10.1109/JIOT.2018.2877607.
[24] S. Zhang, X. Liu, Y. Liu, B. Ding, S. Guo, J. Wang, Accurate respiration monitoring for mobile users with commercial RFID devices, IEEE J. Selected Areas Commun. (2020), https://doi.org/10.1109/JSAC.2020.3020604.
[25] A. Can, A. L’Hostis, P. Aumond, D. Botteldooren, M.C. Coelho, C. Guarnaccia, J. Kang, The future of urban sound environments: impacting mobility trends and insights for noise assessment and mitigation, Appl. Acoust. 170 (2020) 107518, https://doi.org/10.1016/j.apacoust.2020.107518.
[27] A.A. Anosov, R.V. Belyaev, V.A. Vilkov, et al., Dynamic deep temperature recovery by acoustic thermography using neural networks, Acoust. Phys. 59 (2013) 717–721, https://doi.org/10.1134/S1063771013050011.
[28] Y.N. Makov, Coated microbubbles: development of echo-contrast compositions in medical acoustics and dynamic models of such systems with nonlinear elastic shells, Acoust. Phys. 55 (2009) 547–555, https://doi.org/10.1134/S1063771009040113.
[29] D. Spoladore, A. Mahroo, A. Trombetta, M. Sacco, ComfOnt, A semantic framework for indoor comfort and energy saving in smart homes, Electronics 8 (2019) 1449.
[30] A. Mesaros, T. Heittola, A. Eronen, T. Virtanen, Acoustic event detection in real-life recordings, in: 18th European Signal Processing Conference, Aalborg, IEEE, 2010, pp. 1267–1271.
[31] A. Mesaros, A. Diment, B. Elizalde, T. Heittola, E. Vincent, et al., Sound event detection in the DCASE 2017 Challenge, IEEE/ACM Transactions on Audio, Speech and Language Processing 27 (6) (2017) 992–1006.
[32] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J.P. Bello, Robust sound event detection in bioacoustic sensor networks, PLOS ONE 14 (10) (2019).
[33] H. Alhazmi, R. Guldiken, Quantification of bolt tension by surface acoustic waves: an experimentally verified simulation study, Acoustics 1 (2019) 794–807.
[34] Y. Choi, O. Atif, J. Lee, D. Park, Y. Chung, Noise-robust sound-event classification system with texture analysis, Symmetry 10 (2018) 402.
[35] S. Kouzoupis, A. Neocleous, I. Athanassakis, Categorization of mouse ultrasonic vocalizations using machine learning techniques, Acoustics 1 (2019) 837–846.
[36] Librosa Tutorial Document, 2013-2019 (cited 10 Dec 2019) available: https://librosa.github.io/librosa/tutorial.html.
[38] Librosa Poly Features Extraction, 2013-2019 (cited 11 Dec 2019) available: https://librosa.github.io/librosa/generated/librosa.feature.poly_features.html.
[39] Librosa Feature Extraction, 2013-2019 (cited 11 Dec 2019) available: https://librosa.github.io/librosa/generated/librosa.feature.tempogram.html.
[40] Librosa Feature Extraction, 2013-2019 (cited 13 Dec 2019) available: https://librosa.github.io/librosa/generated/librosa.core.mel_frequencies.html.
[41] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intelligence 24 (7) (2002) 971–987.
[42] X. Wang, H. Gong, H. Zhang, B. Li, Z. Zhuang, Palmprint identification using boosting local binary pattern, in: 18th International Conference on Pattern Recognition (ICPR’06), IEEE, 2006, pp. 503–506.
[43] J. Xie, K. Hu, M. Zhu, J. Yu, Q. Zhu, Investigation of different CNN-based models for improved bird sound classification, IEEE Access 7 (2019) 175353–175361.
[44] Y. Su, K. Zhang, J. Wang, K. Madani, Environment sound classification using a two-stream CNN based on decision-level fusion, Sensors 19 (7) (2019) 1733.
[45] J. Lee, AUC4.5: AUC-based C4.5 decision tree algorithm for imbalanced data classification, IEEE Access 7 (2019) 106034–106042.
[46] S.R. Gaddam, V.V. Phoha, K.S. Balagani, K-Means+ID3: a novel method for supervised anomaly detection by cascading K-means clustering, and ID3 decision tree learning methods, IEEE Trans. Knowledge Data Eng. 19 (3) (2007) 345–354.
[47] A. Mathur, G.M. Foody, Multiclass and binary SVM classification: implications for training and classification users, IEEE Geosci. Remote Sensing Lett. 5 (2) (2008) 241–245.
[48] Z. Changjun, C. Yuzong, The research of vehicle classification using SVM and KNN in a ramp, Int. Forum Comput. Sci.-Technol. Appl. (2009) 391–394.
[49] J. George, L. Mary, K.S. Riyas, Vehicle detection and classification from the acoustic signal using ANN and KNN, in: International Conference on Control Communication and Computing (ICCC), IEEE, 2013, pp. 436–439.
[26] Iftikhar Ahmad, A. Hassan, M.U. Anjum, et al., Ambient acoustic energy harvesting [50] A. Kumari, S. Tanwar, S. Tyagi, N. Kumar, Fog computing for Healthcare 4.0
using two connected resonators with piezoelement for wireless distributed sensor environment: opportunities and challenges, Comput. Electr. Eng. 72 (2018) 1–13,
network, Acoust. Phys. 65 (2019) 471–477, https://doi.org/10.1134/ https://doi.org/10.1016/j.compeleceng.2018.08.015.
S1063771019050014.
