Illegal Logging Listeners Using IoT Networks
Authorized licensed use limited to: San Francisco State Univ. Downloaded on June 21,2021 at 06:51:51 UTC from IEEE Xplore. Restrictions apply.
Fig. 2. Overview of the system architecture, components and configurations, of our embedded sound acquisition system. A Raspberry Pi is programmed
to control a microphone and an attached sound card to continuously record sound in the forests. Audio signals captured are then automatically packaged
and periodically sent through cellular networks to a cloud storage, where the data can be downloaded remotely at any time for further processing.
With NCAPS, the number of cases caught before trees were cut doubled, while the number caught after damage had been done fell by more than half. This is significant because illegal logging should be intercepted in its early stages: the earlier it is caught, the less severe the damage.

NCAPS comprises four core components: efficient strategic planning of when teams move in to search for and arrest loggers, effective devices and technologies, readiness of park rangers, and strict enforcement of the law. NCAPS devices consist of motion sensors, infrared cameras, and GPS trackers. Sensors are wrapped in a camouflage casing and installed high up in the trees. Installation of these devices not only discourages a number of loggers but also catches them in the act. Once motion is detected, an email alert is sent to the rangers in charge. The few true positives that should be acted upon are buried in a very large number of false alarms; rangers can receive up to eight or nine hundred email alerts in a day.

Our research project aims to enhance the current camera-and-GPS-based NCAPS regime by integrating analysis of sound. Audio information can help confirm or reject alerts when suspicious motion is detected. It can be used to filter out false-positive email messages, leaving rangers with fewer alerts that require attention. Additionally, since sound can be heard farther away than the typically usable field of view of a camera, the sound of logging activities can be captured before loggers come into the view of a camera. This effectively increases the coverage distance of the system. In a similar work, Ahmad and Singh detected the sound of tree cutting by axes [2]. Unlike their work, we capture and visualize the use of chainsaws, which are among the most common tools used in both the harvesting and processing steps. We built an IoT sound acquisition network and deployed it in a real, but controlled, wilderness environment. From the recorded audio files, acoustic features were extracted to detect and classify illegal chainsaw activities.

III. EMBEDDED SYSTEM DESIGN AND DEVELOPMENT

Our embedded sound listener device was designed for high performance while remaining portable and affordable. An overview of its architecture and configuration is shown in Fig. 2. Every component is hand-picked so as to record high quality sound with a sufficiently high sampling rate, good audio sensitivity, and a minimal amount of unwanted noise. Each individual unit is designed so that it is readily extensible; multiple units can then be deployed as an extended logging listener network on a budget.

To capture high quality sound, we selected a professional-grade, broadcast-quality lavalier microphone, the RODE SmartLav+. Its high-performance omnidirectional condenser capsule enables our system to pick up acoustic signals equally from all directions in a three-dimensional sphere pattern around this 4.5-millimeter miniature microphone. Equipped with a wind shield or a pop filter, it is less sensitive to wind and other unwanted background pop noises. The RODE SmartLav+ spans a frequency range from 20 to 20,000 Hertz. Its output connector is a TRRS (i.e., tip/ring/ring/sleeve) jack, which we connected to a RODE S3 3.5 mm TRRS to TRS adapter.

Sounds captured are acoustic analog signals. Sound is a mechanical wave generated by physical vibrations of air particles or molecules of other media. It is a longitudinal pressure wave that propagates through a medium by means of high-pressure compression and low-pressure rarefaction. A sound wave propagates in the direction parallel to the displacement of the vibrating air particles. Sound travels in all directions and echoes when it hits and bounces off a solid surface. Conversion of analog sound waves to electrical impulses is necessary to enable digital sound signal processing. In our design, this conversion is carried out by an external sound card. The TRS jack output from the microphone cable is connected to a UGREEN 30724 external stereo sound adapter, which is then plugged into a Raspberry Pi 3 Model B+ embedded single-board computer through a Universal Serial Bus (USB) 2.0 port.

A Raspberry Pi module is our on-site, pint-sized operational control and computational unit. The Pi temporarily stores the audio data it receives. Audio data saved initially on its own local storage, a micro SD card, is periodically packed and sent to an off-site cloud storage. Connected to the Pi is a USB air card dongle with active cellular communication channels, and through cellular networks, data are deposited in the cloud. Our approach employs the fourth generation of broadband cellular network technology (4G) as used in [3]. This choice was made
Fig. 3. Scenes of our experimental sites where sound was recorded. Left: Site A, the eucalyptus reforestation. Right: Site B, the thick natural forest.

Fig. 4. Topographic map of our experimental sites with marked locations of our listeners and sound sources, where log cutting with chainsaws was conducted. At Site A, sound is recorded at 100 m and 300 m away from the source, whereas at Site B, the recorders are set at 100 m and 270 m. This map is generated using a GPSVisualizer tool by Adam Schneider.
to enable extended uploading of audio data with minimal interruption and maintenance. Although each listener unit incurs 4G data plan charges, the main advantage of using mobile networks over WiFi connections is the extended range. This is a critical factor for fieldwork in remote sites, as was also a concern in [4]. WiFi has a limited range within reach of a wireless router, whereas the coverage of 4G services is much wider and more accessible.

Fig. 5. Setting of our sound acquisition devices. Each listener is placed on the ground with a loosely covered camouflage tarp for protection.

Once in the cloud, recorded digital audio data can then be checked remotely and downloaded for further processing. Using cloud data storage instead of a local SD card or an on-site wired external hard disk enables building a very large data bank and also reduces the Size-Weight-and-Power (SWaP) requirements. Electrical power for any additional electronics could be challenging when devices are deployed off grid. We note here that although our prototype Raspberry Pi is currently powered by a power bank, our system is designed to be self-sustained and solar-powered, an option that could be readily implemented.
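The store-then-upload step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the 22,050 samples-per-second rate and the five-minute clip length are taken from the paper, while the 16-bit mono format, the ZIP packaging, and all function and file names are assumptions made for the example. Audio capture and the actual 4G upload are deliberately left out, since the text does not specify them.

```python
import io
import wave
import zipfile

SAMPLE_RATE = 22050   # samples per second, as stated in the paper
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM): an assumption
CHANNELS = 1          # mono: an assumption


def clip_size_bytes(seconds: int) -> int:
    """Uncompressed PCM payload size of one clip, useful for data-plan budgeting."""
    return seconds * SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS


def wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()


def package(clips: dict) -> bytes:
    """Pack several named clips (hypothetical names -> raw PCM) into one ZIP archive.

    The returned bytes are what a deployment-specific uploader would send
    to cloud storage over the cellular link.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        for name, pcm in clips.items():
            z.writestr(name, wav_bytes(pcm))
    return buf.getvalue()
```

Under these assumptions, `clip_size_bytes(300)` gives the uncompressed size of one five-minute recording, which helps when estimating how much data each listener would push through its 4G plan.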
TABLE I
VARIATIONS IN CHAINSAW AND LOG CUTTING ACTIVITIES.
STIHL chainsaw      MS180: small and light duty
                    MS382: large forestry and agricultural saw
                    MSA120C: electric, cordless, battery-powered
Condition of wood   Fresh, moist, or dry
Size of wood (mm)   60, 120, 145, 160, 175, 190, 275, or 320
Logging activities  Starting and warming up of the saw engine
                    Cutting with or without accelerating the engine

TABLE II
CHARACTERISTIC PARAMETERS OF THE SELECTED SOUND SAMPLES.
Sample  Site  Distance  Chainsaw  Wood   Size of log  Chainsaw engine
(a)     A     100 m     Large     Dry    321 mm       start and accelerate
(b)     A     300 m     Electric  Dry    120 mm       no acceleration
(c)     B     100 m     Large     Moist  275 mm       with acceleration
(d)     B     270 m     Small     Dry    160 mm       with acceleration

Fig. 7. Our wood cutting activities using chainsaws, conducted at the Wang Nam Khiao Forestry Student Training and Research Station. These served as sound sources of our data collection, simulating possible illegal logging.

types of chainsaws, ranging from compact-sized to heavy-duty, and also including electric saws. Making them even harder to hear and catch, some are apparently modified with sound-absorbing materials or soundproofing gear. For our experiment, three commonly used chainsaws were selected. Conditions and sizes of logs were varied, as well as how saw engines were run while wood was cut. These variations are summarized in TABLE I, and photographs of our logging activities are shown in Fig. 7.

Logging was conducted as sound was collected in intervals of five-minute recording followed by a five-minute break. The gap times allow audio files to finish being written to storage and transferred to the cloud. This experiment yielded 30 minutes of recorded logging activities per hour. Our log cutting activity time totaled 13 hours. With four listeners running in parallel, 26 hours of audio data were recorded (6.5 hours per listener).

Logging activities during the 5-minute recordings were performed as naturally as possible. Chainsaws were run for a minute or two or sometimes longer. Wood was cut, split, stacked, or moved. There were interruptions when chains failed and needed to be reset, or batteries ran out and needed to be changed. With these settings, the data collected contain both positive and negative samples of the sound of wood cutting with chainsaws, which is our target for detection of illegal logging. In addition to sound, environmental data recorded included temperature, humidity, wind speed, and wind direction. These parameters could potentially have an effect and will be used later in the analysis of acoustic signals.

V. ACOUSTIC SIGNAL PROCESSING AND VISUALIZATION

The simplest and most familiar visualization of sound is a time series plot of the amplitude. Fig. 8 shows four examples from our data set. Fig. 8(a) and 8(b) are from the restoring eucalyptus forests (Site A) while Fig. 8(c) and 8(d) are from natural forests (Site B). In the other dimension, Fig. 8(a) and 8(c) were recorded 100 meters away from the sound source, while Fig. 8(b) and 8(d) were recorded 300 meters and 270 meters away, respectively. Parameters of these four sound samples are listed in TABLE II. Examining the signal in Fig. 8(a), the sounds of active chainsaws are evident at the beginning of the time series, around the halfway mark, and at the end. Chainsaw patterns are less apparent as the distance from the sound source triples, as shown in Fig. 8(b). In the noisy environment at Site B, Fig. 8(c) and 8(d), the waves are noticeably overwhelmed by undesired noise, making it very difficult, if not impossible, to pinpoint during which segments of time chainsaw activities occur.

In digital sound processing, acoustic signals are typically transformed from the time domain to the frequency domain. A discrete Fourier transform (DFT) is used to decompose a signal into a linear combination of its sinusoidal frequency and phase contents. This projection of a sound wave onto the Fourier spectral domain uncovers its constituent frequency components, revealing which frequencies are present when and how often. Fourier transforms, however, are not generally applied to the entire length of the wave. Instead, audio signals are broken down into small chunks of time, and a short-time Fourier transform (STFT) is applied to each of these short segments. It thus details the frequency components of a nominally constant signal, unwrapping the underlying discriminative features. Following a commonly used setting, 25 ms chunks were implemented in this work. To maintain continuity of signals, an overlapping window of 10 ms was applied between every consecutive chunk.

The resulting Fourier spectra are complex-valued vectors, with one column vector for each 25 ms audio chunk. These are visualized through a spectrogram, also referred to as a periodogram [5]. It is the logarithmic scale of the power spectral density (PSD), denoted by P, of Fourier energies X at every frequency k. The STFT spans N samples of a time-varying signal x(n). Since a 25 ms chunk was implemented and our signal was recorded at 22,050 samples per second, N = 551. The PSD function is expressed as:

$$P(k) = \frac{1}{N}\left|X(k)\right|^{2} = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-i 2\pi k n / N}\right|^{2}$$

A single spectrogram is formed by stitching together, along the time axis, the columns of the power spectrum of each 25-ms segment. A spectrogram is a graphical representation of sound that shows sequential characteristics of the signal. It displays how the intensity of sound is distributed in the frequency domain and how the frequency spectra change over time. Fig. 9 depicts four periodograms corresponding to the four waves presented in Fig. 8. Sounds of chainsaws are clearly exposed in Fig. 9(a) and 9(c), both of which were recorded at 100 meters. While chainsaws are visually detectable in both the
Fig. 8. Examples of sound waves plotted in the time domain. The two signals in the top row, (a) and (b), are from Site A and the two in the bottom row, (c) and (d), are from Site B. Those on the left, (a) and (c), were recorded 100 m away from the sound source. The two on the right were recorded at 300 m and 270 m, respectively.
Fig. 9. Pictorial representation of our sample audio data in the frequency domain. Each plot is a spectrogram of the corresponding time series sound wave shown in Fig. 8. The x-axis is time in seconds. The y-axis is frequency in cycles per second, or Hertz (Hz). Colors represent the third dimension, which indicates the intensity of each frequency component of the acoustic signal on a logarithmic scale in decibels (dB).
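A spectrogram of the kind shown in Fig. 9 can be sketched with NumPy as follows. This is a hedged illustration rather than the authors' code. The 25 ms frame (N = 551 at 22,050 samples per second) and the PSD definition, squared DFT magnitude divided by N, come from the paper. Two points are assumptions: the 10 ms overlap is read literally, giving a hop of 15 ms (331 samples), and no window function is applied because the paper's PSD expression has none. If a 10 ms step between frames was intended instead, pass `hop=220`. The function names are hypothetical.

```python
import numpy as np

FS = 22050                    # sampling rate from the paper (samples per second)
FRAME = FS * 25 // 1000       # 25 ms frame -> N = 551 samples
OVERLAP = FS * 10 // 1000     # 10 ms overlap -> 220 samples (literal reading)
HOP = FRAME - OVERLAP         # step between consecutive frames -> 331 samples


def psd_spectrogram(x, frame=FRAME, hop=HOP):
    """Return P with shape (frequency bins, time frames), P(k) = |X(k)|^2 / N."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame) // hop
    # Row j of idx holds the sample indices of frame j.
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    X = np.fft.rfft(x[idx], axis=1)        # one-sided DFT of every frame
    return (np.abs(X) ** 2 / frame).T


def to_db(P, floor=1e-12):
    """Logarithmic (decibel) scale used for display; floor avoids log(0)."""
    return 10.0 * np.log10(np.maximum(P, floor))
```

Each column of the returned array is the power spectrum of one frame, so plotting `to_db(psd_spectrogram(x))` as an image, with time on the x-axis and frequency on the y-axis, gives a spectrogram of this kind.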
time and the frequency domain for sample (a), they are only distinguishable in the frequency domain for sample (c). The frequency spectrum has effectively separated the higher-intensity sound of log-cutting chainsaws from lower-intensity environmental noise. At the farther distances, samples (b) and (d), the sound captured is naturally less intense. Patterns of chainsawing are distinguishable but considerably more faded. In (d), in addition to chainsaws, the frequency separations also include the sound of bird calls. These appear as multiple short vertical bars in the periodogram.

The frequency scale in periodograms is linear, in units of cycles per second or Hertz (Hz). This, however, does not correspond to human auditory systems. On the Hertz scale, we perceive musical tones more discriminatively at lower frequencies, whereas acoustic signals at higher frequencies are perceived closer together. To reflect this non-linearity, f Hz on the Hertz scale is mapped to m mel on the Mel scale (Mel is short for melody), following [6]:

$$m = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

Mathematically, the powers of the Fourier spectrum are filtered with Mel-scale triangular overlapping windows. The Mel triangular filters resemble human perception of the frequency contents of sounds by being narrower at lower frequencies and becoming wider as frequencies increase. Taking the logarithm of the power of the Mel-filtered signals results in a waveform onto which a discrete cosine transform (DCT) is applied. In addition to extracting periodicity of the harmonics, since the DCT is closely related to the Karhunen–Loève transform and principal component analysis (PCA), it also de-correlates the log-energies. The DCT-transformed signal is presented in a quefrency domain, a nominal of the time domain, and its amplitudes are the Mel-frequency cepstral coefficients, or MFCCs.

The Mel-frequency cepstral coefficients of our four sample audio signals are shown in Fig. 10. Differences between the times at which there are chainsaw activities and those without can be observed more distinctly in samples (a) and (c). They are less pronounced in samples (b) and (d). The signal in (d) appears to be the most difficult of all four. It is not only recorded at the farther distance in a noisier site, but also intermixed with bird calls.

Focusing on individual characteristics of sound at instances of time, we plotted the MFCCs of half-a-second-long signals, randomly segmented from each of the four audio samples in Fig. 10. This reveals strikingly distinctive features, presenting a noticeably clear separation between chainsaw logging activities and other sounds; these are demonstrated in
Fig. 10. The Mel-frequency cepstrum of our sample audio data. Each plot corresponds to the time-series plot in Fig. 8 and the frequency-domain periodogram in Fig. 9. As in Fig. 8 and Fig. 9, the horizontal axis of the plots is the time from 0 to 4 minutes. The vertical axis is the columns of the MFCCs.
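The MFCC chain described in the text (Mel mapping, triangular filtering of the power spectrum, logarithm, then DCT) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The Hertz-to-Mel formula is the one given in the paper; the number of filters (26), the number of retained coefficients (13), the unnormalized DCT-II, and all function names are assumptions chosen for the example. The input `P` is assumed to be a one-sided power spectrogram of shape (frequency bins, time frames) computed with frame length `n_fft`.

```python
import numpy as np


def hz_to_mel(f):
    """Mel mapping from the paper: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    freqs = np.arange(n_bins) * fs / n_fft           # frequency of each bin (Hz)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)           # left slope of the triangle
        falling = (hi - freqs) / (hi - mid)          # right slope of the triangle
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x, n_out):
    """Unnormalized DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x


def mfcc_from_power(P, fs, n_fft, n_filters=26, n_coeffs=13, floor=1e-12):
    """MFCCs, shape (n_coeffs, time frames), from a power spectrogram P."""
    mel_energy = mel_filterbank(n_filters, n_fft, fs) @ P
    log_energy = np.log(np.maximum(mel_energy, floor))   # floor avoids log(0)
    return dct_ii(log_energy, n_coeffs)
```

With the paper's settings this would be called as `mfcc_from_power(P, fs=22050, n_fft=551)`. Because the filter edges are equally spaced in Mel, the triangles come out narrow at low frequencies and wide at high frequencies, which is the behavior the text attributes to the Mel filters.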