Biomedical Signal Processing and Control

Biomedical Signal Processing and Control 63 (2021) 102208

journal homepage: www.elsevier.com/locate/bspc
Robust heart sound segmentation based on spectral change detection and genetic algorithms
Miguel A. Alonso-Arévalo a ,∗, Alejandro Cruz-Gutiérrez a , Roilhi F. Ibarra-Hernández a ,
Eloísa García-Canseco b , Roberto Conte-Galván a
a Department of Electronics & Telecommunications, Applied Physics Division, Centre for Scientific Research and Higher Education at Ensenada (CICESE), Ensenada, B.C., Mexico
b Faculty of Sciences, Autonomous University of Baja California (UABC), Ensenada, B.C., Mexico
ARTICLE INFO

Keywords: Heart sounds; Segmentation; Spectral energy flux; Genetic algorithm; Differential evolution

ABSTRACT

Listening to cardiac sounds can quickly provide information about the functioning of the heart. The heart sound signal, also known as the phonocardiogram (PCG), plays an essential role in automatic auscultation. Segmentation of the PCG signal into its fundamental parts can significantly facilitate any further analysis. In this work, we propose a new method that segments the PCG into fundamental heart sounds and silences. This method can be divided into two stages: Detection and Selection. In the first stage, a function whose maxima indicate the presence of sound events is generated based on the calculation of the spectral flux, a measure of how quickly the spectrum of the PCG signal is changing with respect to time. In the second stage, the position of the beginning and termination of the fundamental heart sounds is detected by analyzing and selectively choosing the time positions of the maxima in the detection function. This selection is solved as an optimization problem through the estimation of an ideal detection function, whose solution is found using two genetic algorithms: a simple genetic algorithm (SGA) and differential evolution (DE). The proposed method was evaluated using the PhysioNet/CinC Challenge dataset, comprising more than 3,000 PCGs. Our results exhibit a mean $F_1$ score of 87.5% and 93.6% for the SGA and DE variants, respectively. The proposed system is robust and highly modular, which simplifies the reuse of specific parts to evaluate algorithm variants. The implementation of the proposed method is available as open-source software.
1. Introduction

According to the world health statistics released by the WHO [1], Cardiovascular Diseases (CVD) remain the number one cause of death throughout the world. If adequately performed by a trained care provider, cardiac auscultation can help to diagnose CVD in a timely manner and at a low cost [2]. Modern digital stethoscopes allow the acquisition of heart sounds, known as phonocardiogram (PCG) signals, with relatively good quality and simplicity. Recently, a push for the development of Computer Aided Diagnosis (CAD) systems based on PCG signals is underway, insomuch as these signals reflect the mechanics of the heartbeat (HB).

Two fundamental heart sounds (FHS), named S1 and S2, are produced inside the heart by the circulation of blood during the pumping cycle, also called Cardiac Cycle (CC). The CC has a quasi-periodic structure, formed by the Systole (contraction), marked at its beginning by S1; and by the Diastole (relaxation), started by S2. These sounds only represent a fraction of each of the CC sections, such that, in healthy individuals, there are two moments of quietness: the systolic (s-Sys) and the diastolic silences (s-Dia). Subjects with heart pathologies usually present additional sounds, such as the S3 and S4 sounds, murmurs, or clicks. The primary purpose of heart sound segmentation is to decompose the signal into meaningful cardiac cycle events: the first heart sound (S1), the systolic period, the second heart sound (S2), and the diastolic period. Many of the classification algorithms that process heart sounds use time segmentation as a pre-processing stage, since separating the PCG into its main components facilitates the identification and extraction of relevant acoustic features in each cardiac cycle and simplifies further analysis. Other applications of PCG segmentation include biometric systems and telemedicine.

An in-depth review of PCG segmentation and heart sound automatic analysis is beyond the scope of this work. A comprehensive assessment of available techniques can be found in [3]. In general, the segmentation techniques found in the literature can be divided into two stages: an event detection stage, followed by an event selection stage. The detection
∗ Corresponding author.
E-mail address: aalonso@cicese.edu.mx (M.A. Alonso-Arévalo).
https://doi.org/10.1016/j.bspc.2020.102208
Received 17 October 2019; Received in revised form 31 July 2020; Accepted 1 September 2020
Available online 8 September 2020
1746-8094/© 2020 Elsevier Ltd. All rights reserved.
stage of most methods relies on the computation of some kind of PCG energy envelope, which is used as a novelty or detection function. Among the techniques that have been used to compute the envelope are, for instance, the Shannon energy [4], wavelets [5–12], Shannon energy combined with the $S$-transform [13], homomorphic filtering [10,11,14–17], auto-regressive modeling [18,19] or the Viola integral [20]. In some proposals, the computation of an energy envelope is omitted or replaced by methods such as simplicity measurement [4], atomic time–frequency decomposition using the short-time Fourier transform (STFT) [9,21,22], or empirical mode decomposition followed by a windowed Kurtosis analysis [12,23]. Other methods segment the HS with the help of a reference signal such as ECG or the carotid pulse [12,24–26]. For the event selection stage, many different techniques have also been proposed. For instance, picking maxima above a fixed threshold [4], computing the inter-maxima distance [27], or an adaptive threshold [28]; analyzing the duration and intensity of the maxima in the detection function [6] using a singular value decomposition [29,30] or an entropy gradient function [8]. The boundaries (duration) of the HS have also been computed using an optimization method that controls the width of a Gaussian window that best matches the energy in the time and $S$-domain [13]. Given the behavior of the cardiac cycle, a very natural method for the event selection stage is to model the dynamic relationship of the sounds/silences that form the cardiac cycle as the states of a hidden Markov/semi-Markov model [12,15,16,19,31–33]. More recently, event selection approaches based on neural networks have also been proposed [9,11].

In this work, we introduce a new technique to segment PCG signals that relies on methods originally developed in the field of music signal processing. Our proposal is based on a model of the human ability to perceive sounds under the presence of noise or artifacts, as well as on methodologies for the estimation of rhythmic structures. The roots of the proposed segmentation approach can be found in the subjects of onset detection and tempo estimation [34–38]. On this basis, in the present work we will refer to the start and finish of S1 and S2 sounds as onsets and offsets, respectively.

2. Methods

The architecture of our segmentation algorithm consists of three main stages, as illustrated in Fig. 1. First, the PCG signal is pre-processed by a bandpass filtering and a decimation procedure. Then, a detection function is calculated using the Spectral Energy Flux algorithm. The maxima or peaks present in the detection function indicate the likelihood of occurrence of heart sound events. However, the detection function is prone to detect transient artifacts, such as stethoscope movements, speech, respiration, and gastric sounds, or to inadvertently omit the presence of an FHS. To overcome these limitations, a procedure to determine the time boundaries of each FHS based on an optimization problem is proposed as a selection phase.

2.1. PCG dataset

To assess the performance of the proposed algorithm, a set of PCG recordings was obtained from three different public databases, described in Table 1. The PASCAL B database was assembled for a HS segmentation and classification challenge [39]. From this database, only 50 PCGs obtained from healthy subjects were randomly selected. The HSCT11 database was assembled to evaluate and compare the performance of biometric systems based on PCG signals [40]. From this set, only 40 PCGs of good quality and obtained from healthy subjects were selected and clipped to a maximum duration of 20 s. For both datasets, the ground truth or reference segmentation locations were manually annotated by the authors.

The PhysioNet database was assembled for the PhysioNet/Computing in Cardiology Challenge 2016 (CinC) [41,42]. The public version of this database is composed of 6 different subsets comprising 3153 normal and pathological HS recordings, as shown in Table 1, with corresponding ground truth annotations by the challenge organizers [41]. From this database, all PCG signals were evaluated with the exception of the ones labeled by the CinC organizers as low-quality. Furthermore, all signals having a duration longer than 30 s were partitioned into shorter overlapping sections of 20 s each. The total number of evaluated PCG signals was then 3243, as shown in Table 1. After considering the partition and overlap of the PCG segments, the dataset used in our evaluation consisted of 185,458 FHS. This number does not include the FHS inside the noisy sections annotated by the CinC organizers; details are provided in [33,41].

2.2. Pre-processing

In our segmentation algorithm, we implemented a pre-processing stage similar to the one proposed by Schmidt et al. [15]. The input PCG signal is high-pass and low-pass filtered using third-order Butterworth digital filters, with cutoff frequencies of 25 Hz and 250 Hz, respectively. It has been shown that the dominant frequency content of the S1 and S2 sounds lies within the 24–144 Hz range [43]. The rationale for limiting the band of analysis to between 25 Hz and 250 Hz was to emphasize the S1 and S2 sounds and, more specifically, to reduce the influence of low- and high-frequency noise/artifacts. For instance, stethoscope friction spikes are broadband, and they often have higher magnitudes than the heart sounds. We think that another type of digital filter can be used for the same purpose without affecting performance; however, we have not evaluated a different approach. In addition, for all the PCG signals the sampling frequency was normalized to 2000 Hz with a resolution of 16 bits per sample, as proposed by Liu et al. [41].¹

¹ The software used for resampling was SoX (http://sox.sourceforge.net/Main/HomePage).

2.3. Spectral energy flux

One of the main characteristics of the FHS is their relatively compact support in time and frequency. The majority of FHS durations are in the range from 25 ms to 150 ms, with much of their power frequency distribution well below 200 Hz, centered around non-audible frequencies near 50 Hz [44,45]. The use of the Spectral Energy Flux (SEF) as a detection principle seeks to exploit these properties. That is, the time–frequency behavior of a PCG signal will exhibit broadband frequency energy bursts when an FHS is produced, with peak energy around HS frequencies and a tail-shaped energy decay at the termination of the sound, as seen in musical notes [37]. Detection of these sudden changes can be achieved by calculating the discrete-time derivative of the frequency content. However, this simple approach exhibits in general low accuracy. To improve this method, a so-called perceptual transformation is implemented, as proposed in [36–38]. Below we describe the steps to calculate the detection function.

1. First, the Time–Frequency Representation (TFR) of the PCG signal, $x(n)$, is calculated by using the discrete Short Time Fourier Transform (STFT):

$$X(m,k) = \sum_{n=0}^{N-1} x(n)\, w(n - mR)\, e^{-\jmath 2\pi nk/N}, \tag{1}$$

where $w(n)$ is a window function of length $L_w$ samples; $n$ is the discrete time index; $m$ is a new time (frame) index; $R$ is the hop size between two consecutive time frames $m$ and $m+1$, with $m = 0, 1, 2, \ldots, M-1$, where $M$ is the total number of STFT frames and $N$ is the discrete Fourier transform length. The set of frequency bins is defined as $f_k = k f_s / N$, for $k = 0, 1, 2, \ldots, N/2 - 1$. Finally, $f_s = 1/T$ is the sampling frequency of the PCG, and $f_{s\nu} = 1/(TR)$ is the sample frequency of the new time vector
Fig. 1. Block diagram of the proposed segmentation algorithm.
Table 1
Summary of evaluated PCG recordings obtained from public datasets.

Dataset          Clinical condition     # PCG     Stethoscope    f_s [Hz]    Duration [s]
PASCAL [39]      Healthy & Pathologic   50        Littmann       4000        2–13
HSCT11 [40]      Healthy                40        Thinklabs      11,025      15–20
PhysioNet [41]   Healthy & Pathologic   3153      Diverse        2000^a      8–300
  subset A                              409       Meditron       44,100
  subset B                              490       Littmann       4000
  subset C                              31        Meditron       4000
  subset D                              55        Prototype      8000
  subset E                              2054      Littmann       8000
  subset F                              114       JABES          8000

f_s: Sampling frequency; T: Time duration.
^a Signals resampled at 2 kHz.
$m$ [46,47]. The main reason for using the STFT is simplicity, computational speed and very good performance. It is feasible to replace the STFT by other Time–Frequency Analysis (TFA) techniques, for instance the Synchrosqueezed Chirplet Transform [48]. A more in-depth analysis on the use of the TFA is provided in Section 4.

2. Given the compact spectral distribution of the PCG signal, only a section of the frequency bins is selected. That is, we reduce the STFT $X(m,k)$ to the subset of frequencies $\mathcal{K} = \{k \mid 25\ \text{Hz} \le f_k \le 250\ \text{Hz}\}$.

3. Let $\bar{k} \in \mathcal{K}$; the SEF of the clipped STFT, $\bar{X}(m,\bar{k})$, is then calculated as

$$E(m,\bar{k}) = G(m,\bar{k}) * h_d(m) = \sum_{l} G(l,\bar{k})\, h_d(m - l), \tag{2}$$

where $h_d(m)$ is an approximation to a differentiator filter with an ideal frequency response

$$H_d(\omega) \approx \jmath 2\pi\omega, \tag{3}$$

where $\omega$ is the discrete-time frequency variable, and $G(m,\bar{k})$ is a perceptual transformation, defined as a non-linear dynamic compression of the STFT followed by a low-pass filtering

$$G(m,\bar{k}) = \log_{10}\left(\left|\bar{X}(m,\bar{k})\right|\right) * h(m). \tag{4}$$

The function $h$ is an integrator low-pass filter [35], defined as the $L$ coefficients taken from a $2L$ length Hann window

$$h = \{\mathrm{Hann}(i) : i = L, L+1, \ldots, 2L\}.$$

The differentiation of discrete sequences is not defined. Nevertheless, it is possible to obtain a good derivative approximation using an antisymmetric finite impulse response filter that behaves as denoted in Eq. (3) [49]. In the present work, we calculate the filter $h_d(m)$ using the central differentiation method proposed by Dvornikov [50], which consists of a $2L_d$ order polynomial that passes through $2L_d + 1$ samples. The differentiator filter coefficients are computed as follows:

$$g(i) = \frac{1}{i\,\alpha(i)},$$

with

$$\alpha(i) = \prod_{\substack{j=1 \\ j \neq i}}^{L_d} \left(1 - \frac{i^2}{j^2}\right),$$

and $i = 1, 2, \ldots, L_d$, such that

$$h_d = \left[-g(L_d), \ldots, -g(1), 0, g(1), \ldots, g(L_d)\right]. \tag{5}$$

4. After calculating the SEF, a detection function is formed by adding all frequency bins for each time frame $m$, that is

$$\upsilon(m) = \sum_{\bar{k}} E(m,\bar{k}). \tag{6}$$

5. The signal $\upsilon(m)$ exhibits maxima at the beginning or onset of sudden sounds, and it also exhibits minima when an end or offset occurs. However, given the nature of the proposed perceptual transformation and the longer decay of the FHS offsets, only the maxima of $\upsilon(m)$ are considered during the periodicity analysis. More specifically, the negative parts of the detection function $\upsilon(m)$ are discarded

$$\upsilon(m) \leftarrow 0 \quad \forall\, m \mid \upsilon(m) < 0. \tag{7}$$

6. Finally, the detection function is low-pass filtered

$$\nu(m) = \upsilon(m) * h_g(m), \tag{8}$$

where $h_g(m)$ is a length $L_g$ Gaussian window with standard deviation $\sigma_g$:

$$h_g(m) = e^{-\frac{1}{2}\left(\frac{m}{\sigma_g L_g / 2}\right)^2}. \tag{9}$$

Discarding the negative part of $\upsilon(m)$ only allows the calculation of a detection function for onsets. To detect offsets, another detection function
Fig. 2. Detection function generation example. (a) PCG signal, (b) STFT representation $X(m,k)$, (c) SEF representation $E(m,\bar{k})$, and (d) detection function $\nu(m)$.
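As a rough illustration of Eqs. (5)–(7), the sketch below computes the differentiator coefficients $g(i)$ and the filter $h_d$ exactly as the formulas are printed, and then applies the bin summation and the discarding of negative values to a small matrix. The function names (`differentiator`, `rectified_sum`) and the toy input are assumptions made for this example; they are not taken from the authors' implementation. For $L_d = 2$ the printed formula yields $g(1) = 4/3$ and $g(2) = -1/6$, which is proportional to (a factor of two larger than) the classical five-point central-difference weights $2/3$ and $-1/12$.

```python
def differentiator(L_d):
    """Return h_d of Eq. (5) built from g(i) = 1 / (i * alpha(i))."""
    g = []
    for i in range(1, L_d + 1):
        alpha = 1.0
        for j in range(1, L_d + 1):
            if j != i:
                alpha *= 1.0 - (i * i) / (j * j)  # product term of alpha(i)
        g.append(1.0 / (i * alpha))
    # [-g(L_d), ..., -g(1), 0, g(1), ..., g(L_d)]
    return [-x for x in reversed(g)] + [0.0] + g


def rectified_sum(E):
    """E[m][k] holds the SEF per frame m and bin k.

    Sum over bins (Eq. 6), then set negative frames to zero (Eq. 7).
    """
    return [max(0.0, sum(row)) for row in E]


h_d = differentiator(2)
upsilon = rectified_sum([[1.0, -0.5], [-2.0, 0.5]])
```

The filter is antisymmetric by construction, so its response to a constant input is zero, as expected from a differentiator.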
is calculated using the time-reversed PCG signal, that is $x(-n)$, or even a reversed TFR $\bar{X}(-m,\bar{k})$. The output of the detection phase then consists of two detection functions, one for onsets and one for offsets. An example of the estimation of a detection function is shown in Fig. 2.

2.4. Event selection using optimized correlation

The objective of the selection stage is to decide which maxima (i.e., time instants) present in the detection function correspond to the onsets and offsets of FHS in a PCG signal. By exploiting the quasi-periodic structure of the cardiac cycle it is possible to obtain a robust selection, as proposed in previous works [15,16,22,30,51].

A novelty of our proposal resides in the use of optimization techniques to derive a sort of paragon detection function ($\bar{\nu}(m)$) by carefully shifting each maximum corresponding to an onset/offset from an initial and imprecise approximation to an optimal location according to periodicity constraints.

2.4.1. Periodicity estimation

The estimation of the ideal detection function ($\bar{\nu}$) starts by computing the fundamental periodicity of the detection function $\nu(m)$, which is formed by the time duration of the cardiac cycle ($T_{cc}$), the systole ($T_s$), and the diastole ($T_d$). With these values, denoted as $T_\nu = [T_{cc}\ T_d\ T_s]$, a synthetic detection function $\nu_s(m)$ is then generated, serving as an initial and unrefined approximation of $\bar{\nu}$. As exemplified later, the time-alignment of $\nu_s$ with $\nu$ would highlight those maxima in $\nu$ that potentially correspond to true FHS onsets/offsets. We propose to calculate $T_\nu$ using four robust and straightforward periodicity estimation techniques used with high reliability in the context of music tempo analysis [38].

• Auto-Correlation Function (ACF), defined in terms of the standard biased sample covariance of a signal with itself [52]

$$\hat{r}(l) = \frac{1}{M} \sum_{m=l+1}^{M} \nu(m)\, \nu(m - l), \tag{10}$$

where $l = -(M-1), \ldots, -1, 0, 1, \ldots, M-1$ is the sample delay index, and for $l < 0$, $\hat{r}(l) = \hat{r}(-l)$.

• Windowed Auto-Correlation Function (WACF), where $\nu(m)$ is divided into $S_{\hat{r}}$ segments of $L_{\hat{r}}$ samples each, with an overlap of $O_{\hat{r}}$ samples between successive segments such that $\nu_j(m) = \nu\left((j-1) \cdot O_{\hat{r}} + m\right)$, for $j = 1, 2, \ldots, S_{\hat{r}}$, and where for each $\nu_j$ an ACF is calculated as indicated by Eq. (10):

$$\hat{r}_j(l) = \frac{1}{L_{\hat{r}}} \sum_{m=l+1}^{L_{\hat{r}}} \nu_j(m)\, \nu_j(m - l),$$

for $l = -(L_{\hat{r}} - 1), \ldots, -1, 0, 1, \ldots, L_{\hat{r}} - 1$. Then, by taking only the $l \ge 0$ indexes we define the WACF as follows:

$$\hat{r}_v(l) = \frac{1}{S_{\hat{r}}} \sum_{j=1}^{S_{\hat{r}}} \hat{r}_j(l), \tag{11}$$

where $S_{\hat{r}}$ is defined by

$$S_{\hat{r}} = \left\lfloor \frac{L_{\hat{r}} - M}{O_{\hat{r}}} + 1 \right\rfloor.$$

• Periodogram, defined in terms of Welch's Power Spectral Density (PSD) estimation [47,52]. As in the case of the WACF, $\nu(m)$ is divided into $S_{\hat{r}}$ overlapping segments called $\nu_j$ of $L_{\hat{r}}$ samples each. Then, the periodogram of the $j$th segment is computed as

$$\left|\mathcal{P}_j(k)\right|^2 = \frac{1}{M} \left| \sum_{m=1}^{M} w(m)\, \nu_j(m)\, e^{-\jmath 2\pi km/M} \right|^2,$$

where $w(m)$ is a smooth spectral analysis window. The Welch estimate of the power spectral density is then obtained by averaging the $S_{\hat{r}}$ periodograms as follows:

$$\mathcal{P}_\nu(k) = \frac{1}{S_{\hat{r}}} \sum_{j=1}^{S_{\hat{r}}} \left|\mathcal{P}_j(k)\right|^2. \tag{12}$$

• Spectral Product, proposed by Noll [53], which exploits the highly harmonic spectra property of periodic signals. For PCG signals, this property means that the most salient peaks in the PSD of $\nu(m)$ will likely be located at integer multiples of the frequencies given by $1/T_\nu$, particularly at $1/T_{cc}$. Noll [53] showed that by compressing the logarithm of the spectrum at whole multiples of each frequency bin and by adding with respect to time the result of each compression, it is possible to define a function whose
maximum is the fundamental frequency. The spectral product is then defined as

$$\psi(k) = \prod_{c=1}^{C} \mathcal{P}_\nu(ck), \tag{13}$$

where $c = 1, 2, \ldots, C$ represents the compression index and $C$ the number of compressions.

A search algorithm based on a decision tree determines the value of $T_\nu$ by combining the data of the maxima in the four periodicity functions that fall within physiological limits, that is, within the time range of 0.15 s to 1.6 s, or its frequency domain equivalent. A search procedure was developed by noticing that the properties of the different periodicity functions allow the following assumptions: (1) the time position of the global maximum in the WACF is almost always $T_s$; in turn, (2) the global maximum in the periodogram is practically never $1/T_{cc}$; (3) the global maximum in $\psi$ is almost always equal to $1/T_{cc}$, except when the systole and diastole have approximately the same duration. An example of periodicity estimation is depicted in Fig. 3.

2.4.2. Optimal correlation

Once the periodicity $T_\nu$ of $\nu(m)$ is estimated, a synthetic function $\nu_s(m)$ is generated by convolving a Gaussian window $h_g(m)$ with the sum of two Dirac Shah functions $\text{Ш}_T(m) = \sum_k \delta(m - kT)$, $k \in \mathbb{Z}$, both with period $T = T_{cc}$, but out-of-phase by $T_s$, that is

$$\nu_s(m) = h_g(m) * \left[\text{Ш}_{T_{cc}}(m) + \text{Ш}_{T_{cc}}(m - T_s)\right]. \tag{14}$$

The initial time-alignment of $\nu_s(m - l_0)$ with $\nu(m)$ is given by the location of the global maximum of the cross-correlation of both functions, that is

$$l_0 = \arg\max_{l} \left\{ \frac{1}{M} \sum_{m=l+1}^{M} \nu(m)\, \nu_s(m - l) \right\}. \tag{15}$$

As exemplified in Fig. 4, the alignment provides an initial positioning of the peaks in $\nu_s(m)$ that match the peaks in $\nu(m)$ which could correspond to FHS. It is possible to achieve a closer approximation to an ideal detection function ($\bar{\nu}(m)$) by independently shifting in time the position of each maximum in $\nu_s(m)$ such that the statistical correlation with $\nu(m)$ is the highest. As illustrated in Fig. 4, let us denote as $\varepsilon_i$ the position error between the $i$th peak in $\nu_s$ and a corresponding peak in $\nu$. By determining the optimal shift values $\varepsilon_i$ for every Gaussian pulse, we can generate a synthetic function that best approximates $\nu(m)$. In the present work, we propose to find $\bar{\nu}(m)$ using two heuristic optimization techniques: a Simple Genetic Algorithm (SGA) and Differential Evolution (DE).

To find the best fit between the detection function, $\nu(m)$, and the warped synthetic detection function, $\hat{\nu}_s(m)$, we use the statistical correlation to measure the degree of similarity, that is:

$$\bar{\nu}(m) = \left\{ \hat{\nu}_s(m) \;\middle|\; \max_{\hat{\nu}_s} \left| \rho\left(\nu(m), \hat{\nu}_s(m)\right) \right| \right\} \tag{16}$$

where

$$\rho(\mathbf{x}, \mathbf{y}) = \frac{\mathrm{Cov}[\mathbf{x}, \mathbf{y}]}{\sqrt{\mathrm{Var}[\mathbf{x}]} \cdot \sqrt{\mathrm{Var}[\mathbf{y}]}} = \frac{\sum_{i=1}^{L} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{L} (x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{L} (y_i - \bar{y})^2}}, \tag{17}$$

and where $\mathbf{x}$ and $\mathbf{y}$ are two vectors of length $L$.

As illustrated in Fig. 5, by moving the time/sample position of each peak in the synthetic function it is possible to approximate $\nu_s$ to the ideal detection function $\bar{\nu}$. If we consider the synthetic detection function as defined by Eq. (14), we refer to the time position of the maxima after the initial time-alignment $\nu_s(m - l_0)$ as the first set of time parameters

$$\mathbf{t}_j \big|_{j=0} = \left(t_1\ t_2\ \ldots\ t_i\ \ldots\ t_I\right), \tag{18}$$

where $I$ is the total number of pulses present in $\nu_s(m)$ and $j$ indexes the warped $\nu_s$ obtained after optimization step $j$. Modifying the value of any $t_i$ will create a different synthetic function $\hat{\nu}_{s_j}(m)$. For each optimization step, an additional characteristic that describes the time interval between successive heart sound events is defined as the vector of differences

$$\Delta_j = \left(\delta_1\ \delta_2\ \ldots\ \delta_{I-1}\right), \quad \delta_d = t_d - t_{d-1}. \tag{19}$$

Each $\Delta_j$ reflects the periodicity of the PCG; therefore, it is used during the optimization procedure, since any changes to each $t_i$ should avoid modifying the underlying periodicity $T_\nu$.

The optimization problem can then be defined as finding the vector $\mathbf{t}^*$ that maximizes the cost function

$$f_c(\mathbf{t}) = f_\nu(\mathbf{t}) + f_\Delta(\mathbf{t}) = \left|\rho\left(\nu(m), \hat{\nu}_{s_j}\{\mathbf{t}\}\right)\right| + \left|\rho\left(\Delta_0, \hat{\Delta}_j(\mathbf{t})\right)\right|, \tag{20}$$

where $f_\nu(\mathbf{t})$ is the cost function associated with the correlation between the detection function $\nu$ and a warped synthetic detection function $\hat{\nu}_{s_j}$; and where $f_\Delta(\mathbf{t})$ is the cost function associated with modifying the periodicity of $\nu_s$. Each maximum position $t_i$ is subject to $D_L^i \le t_i \le D_U^i$, where $D_L^i$ and $D_U^i$ are the lower and upper time location flexibility limits, respectively.

In our proposal, we calculate the optimal detection function $\bar{\nu}$ using the Simple Genetic Algorithm (SGA) [54] and Differential Evolution (DE) [55] optimization algorithms. Both SGA and DE algorithms are heuristic methods used in computing for finding optimized solutions to search problems. They are based on the theory of natural selection and evolutionary biology, with stages including selection, mutation, inheritance and recombination. The optimization procedure starts by creating a group of random individuals which generate a population of possible solutions. Each member is evaluated with the cost function, in our case Eq. (20). The result of the cost function indicates their fitness to solve the optimization problem. A number of the best individuals are then used to create one or more offspring through the mutation and recombination stages. With each offspring, a new optimal solution ($\mathbf{t}_j$) can be found by finding the result with the highest cost. Depending on the application, the procedure is repeated for several generations until the difference between generations is minimal (maximum is found) or until a certain number of generations have passed. In our particular case the initial individual is the vector $\mathbf{t}_0$. A thorough explanation of how to implement the SGA or DE is out of the scope of this work, but it can be found in [54] and [55], respectively.

3. Results

In this section, we evaluate the performance of the proposed algorithm regarding its ability to accurately estimate the location of the FHS onsets and offsets in a PCG signal. We adopted the same evaluation criteria proposed by Springer et al. [16] and Liu et al. [33]. The onset/offset positions computed by the segmentation algorithm are considered to be correct and labeled as True Positive (TP) if they lie within an error window of 100 ms with respect to the ground-truth annotations. Each non-detected reference position is considered as a type II error and labeled as False Negative (FN); also, each additional position estimated by the algorithm is taken as a type I error and marked as False Positive (FP). These criteria allow us to define four performance measurements: Sensitivity ($Se$), defined as $Se = TP/(TP + FN)$; Positive Predictive Value ($P_+$), defined as $P_+ = TP/(TP + FP)$; Accuracy, defined as $Acc = TP/(TP + FP + FN)$; and the $F_1$ score, the harmonic mean of $Se$ and $P_+$, defined as

$$F_1 = 2 \cdot \frac{Se \cdot P_+}{Se + P_+}.$$

Given the random nature of the genetic algorithms, the segmentation evaluation of each PCG signal was repeated 25 times, and the outcomes were averaged. Tables 2 and 3 present the overall performance results for the DE and SGA optimization algorithms, respectively.²

² Note that averaging the results of 25 evaluations resulted in rational numbers and in a slightly different total sum of FHS ($TP + FN$) from the one described in Section 2.
Fig. 3. Periodicity estimation examples. (a) Clean PCG signal case, all three time durations forming 𝑇𝜈 can be easily determined in time and in frequency. (b) Noisy PCG with high
CC frequency case; 𝑇𝑠 is prominent and easy to determine in time, however, determining 𝑇𝑑 or 𝑇𝑐𝑐 is not as straightforward. This is compensated by functions on the frequency
domain that help to calculate 𝑇𝑐𝑐 . (c) Abnormal PCG corresponding to a pathological sound, 𝑇𝑐𝑐 , 𝑇𝑠 and 𝑇𝑑 were properly estimated combining information from the time and
frequency domains.
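To make the time-domain part of the periodicity estimation more concrete, the following sketch evaluates the biased ACF of Eq. (10) and picks the lag with the largest value inside the physiological range of 0.15 s to 1.6 s. It covers only the plain ACF; the WACF, the periodogram, the spectral product and the decision tree that combines them are not reproduced. The function names, the `frame_rate` parameter (the detection-function sampling rate in Hz) and the toy pulse train with a single pulse per period are assumptions made for illustration.

```python
import math


def acf(nu):
    """Biased autocorrelation of Eq. (10) for lags l = 0 .. M-1."""
    M = len(nu)
    return [sum(nu[m] * nu[m - l] for m in range(l, M)) / M for l in range(M)]


def dominant_period(nu, frame_rate, t_min=0.15, t_max=1.6):
    """Return the period (s) of the largest ACF value within [t_min, t_max]."""
    r = acf(nu)
    lo = max(1, math.ceil(t_min * frame_rate))
    hi = min(len(nu) - 1, math.floor(t_max * frame_rate))
    best_lag = max(range(lo, hi + 1), key=lambda l: r[l])
    return best_lag / frame_rate


# Toy detection function: one unit pulse every 50 frames at 100 frames/s
nu = [1.0 if m % 50 == 0 else 0.0 for m in range(400)]
period = dominant_period(nu, 100.0)
```

Because the estimator is biased, the ACF value decreases with the lag, so the first multiple of the period inside the search range gives the largest value in this toy case.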
Fig. 4. Alignment example between the detection function 𝜈(𝑚) (black continuous line) and a synthetic function 𝜈𝑠 (𝑚) (blue dashed line). The positioning error between the 𝑖th
peak in 𝜈𝑠 and a corresponding peak in 𝜈 is represented by 𝜀𝑖 .
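The construction of the synthetic function in Eq. (14) and its initial alignment through Eq. (15) can be sketched as follows. Here the convolution of the Gaussian window with the two pulse trains is written directly as a sum of Gaussian pulses, all quantities are expressed in frames, and the lag search is limited to one cardiac cycle. The pulse width `sigma`, the search range `max_lag` and the function names are assumptions of this example rather than values from the text.

```python
import math


def synthetic_detection(length, t_cc, t_s, sigma=2.0):
    """Gaussian pulses at k*t_cc and k*t_cc + t_s, in frames (Eq. 14)."""
    centres = []
    k = 0
    while k * t_cc < length:
        centres += [k * t_cc, k * t_cc + t_s]
        k += 1
    return [sum(math.exp(-0.5 * ((m - c) / sigma) ** 2) for c in centres)
            for m in range(length)]


def best_lag(nu, nu_s, max_lag):
    """Lag l0 in [0, max_lag) maximising (1/M) * sum_m nu(m) * nu_s(m - l)."""
    M = len(nu)

    def xcorr(l):
        return sum(nu[m] * nu_s[m - l] for m in range(l, M)) / M

    return max(range(max_lag), key=xcorr)


# Toy case: the "measured" function is the synthetic one delayed by 17 frames
nu_s = synthetic_detection(300, t_cc=100, t_s=30)
nu = [0.0] * 17 + nu_s[:283]
l0 = best_lag(nu, nu_s, max_lag=100)
```

In this toy case the recovered lag equals the delay that was applied, since that is the only lag at which every pulse of the two functions overlaps.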
Tables 4 and 5 show the segmentation performance on each PCG signal considering separately healthy and pathological cases for the DE and SGA variants, respectively. Finally, the effect on $F_1$ performance when varying the error window on healthy and pathological signals is shown in Fig. 6.

An important characteristic of a segmentation algorithm is the computational complexity. In our proposal, the implementation and evaluation of the algorithm was carried out under MATLAB, except the cost function used in the genetic algorithms. This function was coded in the C language and compiled as a MATLAB executable (MEX) file to increase computational speed. We conducted a benchmark of the proposal on a PC with an Intel Core i5-7500 CPU processor at 3.40 GHz with 8 GB of RAM running Ubuntu 18.04.4 LTS and MATLAB version 9.4.0.813654 (R2018a). Table 6 shows the average execution time for both DE and SGA. From the benchmark results we can see that the computational complexity of the detection stage is low compared to the complexity of the selection stage. Nevertheless, the algorithm runs relatively fast when compared to the actual duration of the PCG signal. On average, both methods take less than 18% of the duration of the PCG signal to finish the segmentation.

4. Discussion

In this article, we have proposed a PCG segmentation algorithm based on the idea of detecting events related to changes in the spectral
Fig. 5. Illustration of the optimal correlation optimization problem.
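The optimization illustrated in Fig. 5 can be sketched with a very small differential evolution loop that maximizes the cost of Eq. (20). This is a toy, pure-Python illustration and not the authors' MATLAB/C implementation: the DE/rand/1/bin scheme, the parameters `flex`, `pop_size`, `gens`, `f`, `cr` and `sigma`, and the synthetic test signal are all assumptions chosen for the example. The initial vector is kept as the first individual, so the returned cost can never be lower than that of the initial alignment.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of Eq. (17); 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0.0 or syy == 0.0 else sxy / math.sqrt(sxx * syy)


def synth(t, length, sigma=3.0):
    """Warped synthetic function: Gaussian pulses centred at positions t."""
    return [sum(math.exp(-0.5 * ((m - ti) / sigma) ** 2) for ti in t)
            for m in range(length)]


def diffs(t):
    """Intervals between successive pulse positions (cf. Eq. 19)."""
    return [b - a for a, b in zip(t, t[1:])]


def cost(t, nu, delta0):
    """Eq. (20): |rho(nu, synth(t))| + |rho(delta0, diffs(t))|."""
    return abs(pearson(nu, synth(t, len(nu)))) + abs(pearson(delta0, diffs(t)))


def differential_evolution(t0, nu, flex=5.0, pop_size=15, gens=30,
                           f=0.6, cr=0.9, seed=0):
    rng = random.Random(seed)
    lo = [ti - flex for ti in t0]  # lower flexibility limits
    hi = [ti + flex for ti in t0]  # upper flexibility limits
    delta0, dim = diffs(t0), len(t0)
    pop = [list(t0)] + [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
                        for _ in range(pop_size - 1)]
    fit = [cost(p, nu, delta0) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            jrand = rng.randrange(dim)
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == jrand:  # mutation + crossover
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo[d]), hi[d])    # enforce the limits
                else:
                    v = pop[i][d]
                trial.append(v)
            ft = cost(trial, nu, delta0)
            if ft >= fit[i]:                         # keep the better one
                pop[i], fit[i] = trial, ft
    best = max(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]


# Toy data: true peaks deviate by a few frames from the aligned positions t0
t0 = [10, 40, 110, 140, 210, 240]
nu = synth([12, 38, 113, 141, 208, 243], 260)
best, fbest = differential_evolution(t0, nu)
```

The second term of the cost penalizes solutions whose interval pattern departs from that of the initial vector, which is how the periodicity constraint enters the search.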
Table 2
Results of the evaluation metrics for the Differential Evolution (DE) variant of the segmentation algorithm.

Event     Dataset     TP        FP        FN        Se (%)   P+ (%)   Acc (%)   F1 (%)
Onsets    HSCT11^a    1954.8    37.08     52.24     97.40    98.14    95.63     97.77
Onsets    PASCAL^a    832.92    23.2      43.08     95.08    97.29    92.63     96.17
Onsets    TA          31,001    2653.6    2225.8    93.30    92.12    86.40     92.70
Onsets    TB          6002.9    820.84    602.12    90.88    87.97    80.84     89.40
Onsets    TC          2735.9    567.12    874.12    75.79    82.83    65.50     79.15
Onsets    TD          1363.2    136       205.76    86.89    90.93    79.96     88.86
Onsets    TE          117,330   3931.2    10,244    91.97    96.76    89.22     94.30
Onsets    TF          9402.5    890.6     734.48    92.75    91.35    85.26     92.05
Onsets    Total       170,630   9059.7    14,981    91.93    94.96    87.65     93.42
Offsets   HSCT11^a    1948.7    50.96     58.28     97.10    97.45    94.69     97.27
Offsets   PASCAL^a    833.28    28.88     42.72     95.12    96.65    92.09     95.88
Offsets   TA          30,418    3187.6    2723      91.78    90.52    83.73     91.15
Offsets   TB          5823.7    987.2     685.32    89.47    85.51    77.69     87.44
Offsets   TC          2764.8    602.04    842.16    76.65    82.12    65.69     79.29
Offsets   TD          1338      151.68    228.96    85.39    89.82    77.85     87.55
Offsets   TE          117,800   6193.2    9269.6    92.71    95.01    88.40     93.84
Offsets   TF          9303.5    1012.5    824.52    91.86    90.19    83.51     91.01
Offsets   Total       170,230   12,214    14,675    92.06    93.31    86.36     92.68

^a Indicates that the database was used to adjust the algorithm.
Table 3
Results of the evaluation metrics for the Simple Genetic Algorithm (SGA) segmentation variant.

Event     Dataset     TP        FP        FN        Se (%)   P+ (%)   Acc (%)   F1 (%)
Onsets    HSCT11^a    1860.4    147.48    146.6     92.70    92.66    86.37     92.68
Onsets    PASCAL^a    776.6     119.68    99.4      88.65    86.65    78.02     87.64
Onsets    TA          30,005    3819.6    3222.4    90.30    88.71    80.99     89.50
Onsets    TB          5786.6    1113      818.44    87.61    83.87    74.98     85.70
Onsets    TC          2592.2    701.6     1017.8    71.81    78.70    60.12     75.09
Onsets    TD          1308      194.72    261.04    83.36    87.04    74.17     85.16
Onsets    TE          109,500   12,868    18,081    85.83    89.48    77.96     87.62
Onsets    TF          8538.6    2073.7    1598.4    84.23    80.46    69.93     82.30
Onsets    Total       160,360   21,038    25,246    86.40    88.40    77.60     87.39
Offsets   HSCT11^a    1854.4    182.32    152.64    92.40    91.05    84.71     91.72
Offsets   PASCAL^a    790.48    103.76    85.52     90.24    88.40    80.70     89.31
Offsets   TA          29,598    4301.1    3542.7    89.31    87.31    79.05     88.30
Offsets   TB          5676.6    1208.1    832.4     87.21    82.45    73.56     84.77
Offsets   TC          2673.6    709.08    933.4     74.12    79.04    61.95     76.50
Offsets   TD          1288      211.2     278.96    82.20    85.92    72.44     84.02
Offsets   TE          111,830   14,588    15,234    88.01    88.46    78.95     88.24
Offsets   TF          8133.1    2703.1    1994.9    80.30    75.06    63.39     77.59
Offsets   Total       161,850   24,007    23,055    87.53    87.08    77.47     87.31

^a Indicates that the database was used to adjust the algorithm.
Table 4
Average performance on each PCG for healthy vs. pathological cases using the DE variant of the segmentation algorithm.

                      Healthy                                Pathological
Event     Dataset     Se (%)   P+ (%)   Acc (%)   F1 (%)     Se (%)   P+ (%)   Acc (%)   F1 (%)
Onsets    HSCT11^a    97.55    98.13    95.88     97.81      –        –        –         –
Onsets    PASCAL^a    94.65    97.38    92.39     95.73      –        –        –         –
Onsets    TA          94.12    93.57    89.03     93.48      92.54    91.29    91.29     91.76
Onsets    TB          91.38    88.51    82.97     89.61      87.10    86.58    86.58     86.62
Onsets    TC          98.79    98.97    97.77     98.87      70.85    78.26    78.26     71.94
Onsets    TD          94.07    97.61    92.29     95.67      81.11    84.80    84.80     82.09
Onsets    TE          94.29    97.35    92.31     95.39      72.03    81.86    81.86     75.45
Onsets    TF          95.05    94.56    90.91     94.75      85.75    82.76    82.76     83.07
Onsets    Total       94.10    96.12    91.18     94.72      86.21    87.60    87.60     86.27
Offsets   HSCT11^a    97.03    97.34    94.69     97.15      –        –        –         –
Offsets   PASCAL^a    93.54    96.86    91.00     94.83      –        –        –         –
Offsets   TA          92.56    91.47    86.05     91.67      90.85    91.47    83.57     90.14
Offsets   TB          90.49    86.20    80.40     87.93      84.11    86.20    75.18     83.55
Offsets   TC          98.04    97.11    95.33     97.57      71.08    97.11    60.97     73.79
Offsets   TD          91.85    93.89    87.29     92.73      79.27    93.89    71.09     81.14
Offsets   TE          94.43    95.59    90.94     94.76      72.93    95.59    63.44     75.26
Offsets   TF          94.70    94.16    89.84     94.34      82.10    94.16    69.66     79.44
Offsets   Total       93.94    94.35    89.56     93.88      84.87    94.35    76.50     84.90

^a Indicates that the database was used to adjust the algorithm.
Table 5
Average performance on each PCG for healthy vs. pathological cases using the SGA variant of the segmentation algorithm.
Healthy Pathological
Dataset Se (%) P+ (%) Acc (%) F1 (%) Se (%) P+ (%) Acc (%) F1 (%)
Onsets
HSCT11a 93.46 93.53 88.56 93.39 – – – –
PASCALa 87.67 87.88 80.53 87.53 – – – –
TA 91.57 90.59 84.56 90.70 90.07 88.80 82.21 89.24
TB 88.69 85.19 78.52 86.58 84.16 83.95 75.16 83.77
TC 95.28 94.51 90.79 94.88 67.11 75.62 55.72 68.54
TD 93.01 96.64 90.50 94.61 77.65 81.56 66.89 78.60
TE 89.13 91.65 83.40 89.86 69.19 78.14 58.65 72.34
TF 87.21 84.71 77.13 85.87 80.52 76.24 66.06 77.13
Total 89.27 90.64 82.82 89.48 83.33 84.57 74.13 83.30
Offsets
HSCT11a 92.72 91.79 86.78 92.19 – – – –
PASCALa 88.19 90.13 81.68 88.66 – – – –
TA 90.61 89.03 82.46 89.46 88.69 89.03 79.89 87.77
TB 88.51 83.90 77.27 85.77 82.31 83.90 72.05 81.37
TC 95.58 93.34 89.89 94.43 68.69 93.34 57.76 70.21
TD 90.62 92.81 85.36 91.53 75.37 92.81 65.24 77.22
TE 90.59 90.49 83.75 90.25 70.75 90.49 60.25 72.95
TF 84.03 80.42 71.56 82.05 73.77 80.42 56.28 69.67
Total 90.09 89.31 82.50 89.40 82.25 89.31 72.26 81.91
a Indicates that the database was used to adjust the algorithm.
Fig. 6. Effect of varying the error window on the F1 score for healthy and pathological signals. As expected, increasing the error tolerance increases F1. The error window was varied from 10 ms to 200 ms. The zoomed region between 50 ms and 100 ms shows the spread of the per-PCG results (error bars), corresponding to the 10th and 90th percentiles, with many recordings achieving a 100% F1 score.
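The dependence of F1 on the error window can be illustrated with a small matching routine. This is only a sketch: the greedy one-to-one pairing rule, the function names `match_events` and `f1_score`, and the example time stamps are assumptions made for illustration, and the evaluation protocol used in the paper may differ in its details.

```python
def match_events(detected, reference, tol):
    """Pair detections with reference events that lie within +/- tol seconds.

    Both lists are sorted and scanned once; each reference event can be
    matched by at most one detection (greedy one-to-one matching, assumed).
    Returns the counts (TP, FP, FN).
    """
    detected, reference = sorted(detected), sorted(reference)
    i = j = tp = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tol:
            tp += 1          # detection falls inside the error window
            i += 1
            j += 1
        elif diff < 0:
            i += 1           # detection too early for this reference: unmatched
        else:
            j += 1           # no detection inside this reference's window
    return tp, len(detected) - tp, len(reference) - tp


def f1_score(tp, fp, fn):
    """F1 expressed directly from the counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


reference = [0.10, 0.45, 0.90, 1.25]  # hypothetical annotated onsets (s)
detected = [0.13, 0.52, 0.91, 1.60]   # hypothetical detected onsets (s)
for tol in (0.05, 0.10):
    print(tol, f1_score(*match_events(detected, reference, tol)))
```

In this toy example the detection at 0.52 s is 70 ms away from its reference, so it only counts as a true positive once the window is widened from 50 ms to 100 ms, which raises F1.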
Table 6
Benchmark of the proposed segmentation algorithm. PCG signals were processed 25 times; only average results are shown. The percentages in columns 4 and 5 are relative to the total processing time of the signal.
Method  PCG duration (s)  Total processing time (s)  Detection stage (%)  Selection stage (%)
DE      10                1.39                       4.8                  95.2
SGA     10                1.78                       3.7                  96.3

content of the PCG signal. After estimating their periodicity, these events are then carefully selected with the help of an optimization algorithm.

The application of the SEF methodology proved to be a robust approach for detecting HS, as exemplified in Fig. 2. By restricting the analysis of the spectral content of the PCG during the SEF calculation, noise and most artifacts in the PCG signal were filtered out. Using a perceptual transformation (dynamic compression), a central differentiation filter and by considering only positive changes in the spectral content, we were able to generate a detection function that clearly indicates the onsets of FHS (S1 and S2). This approach also helps to filter other physiological sounds with different spectral content and/or less salient sounds (slow energy increment). Although in principle the SEF could detect offsets by considering negative changes in the spectral content, this was found to be much less reliable and accurate than using positive values only. Hence, detecting offsets by processing the time-reversed PCG signal with the same detection algorithm proved to be a better approach. Although there exist several time–frequency analysis (TFA) tools that provide more accurate representations than the STFT, in general these advantages come at the cost of a much higher numerical complexity. In our previous evaluations [56], using better TFA tools did not improve the results. We believe that a high frequency resolution does not contribute to increasing performance, since all frequency information is lost during the spectral integration (see Eq. (6)). This is not a difficulty because we are only interested in time-domain segmentation, and the STFT time resolution (in our case ≈5 ms) is more than adequate for the application.

The proposed PCG signal periodicity estimation methodology, based on four algorithms, is very robust and copes with detection functions of widely varying quality. Each of these algorithms has particular advantages, which simplifies decision making when estimating the periodicity. The frequency-domain methods have low computational complexity, while the time-domain methods have moderate complexity (O(ℓ²) for a sequence of length ℓ), as compared to other methods [7]. We are not aware of the use of methods such as the spectral product, windowed autocorrelation or Welch's PSD for periodicity estimation of PCG signals.

The proposed detection stage exhibits good performance at finding sound events; however, not all detected events correspond to FHS. In fact, selecting the appropriate events is the most challenging task in PCG signal segmentation. We have not found any other PCG segmentation algorithm that poses the onset/offset selection stage as an optimization problem. As shown by the results, our approach is accurate. Nonetheless, the random nature of the SGA and DE algorithms yields non-deterministic results. For the SGA the mean score was F1 = 87.39 ± 0.09%, while for DE it was F1 = 93.42 ± 0.02%, in the case of onset detection. Results showed that the DE variant performs better than the SGA, since it seems to converge faster to the optimal time locations t* when tested with the same maximum number of generations. This can be observed from the results in Tables 2 and 3. We also noticed that both the DE and SGA variants are more accurate when detecting onsets than offsets. We believe this behavior is caused by the slightly longer energy decay following the FHS maximum energy amplitude. This effect can be observed in Fig. 6, where offset detection performance increases significantly beyond a 50 ms error window. As shown in Tables 4 and 5, segmentation performance is higher for healthy sounds than for pathological ones. We believe this outcome is related to pathological sounds inducing artifacts similar to FHS, such that the onset/offset selection is not correctly performed. Furthermore, we observed that the use of CC periodicity estimation limits the performance of the proposed method on long PCG signals (> 20 s), due to the inherent variation of the CC frequency over a given acquisition interval. For this reason we split longer signals into overlapping sections of 20 s.

The goal of our study is to develop a segmentation method adequate for clinical purposes that can be used on several types of pathological signals and that is robust to acoustic noise. Many segmentation methods use recordings from the PhysioNet database; however, not all of them conduct their evaluations with the same recordings or in the same way. A direct comparison of our results with other methods is not straightforward, and this is probably the case for most of the algorithms that use the PhysioNet dataset. Our evaluation considers all of the 2874 annotated recordings, and a small subset of the PASCAL and HSCT11 databases was used to adjust the algorithm. Other algorithms split the PhysioNet data differently into subsets for training, evaluation and testing. Among the top scores reported by different algorithms are the following. Liu et al. [33] report F1 = 97.9%; in their evaluation, subset A was used for training and the rest (subsets B to F) for testing. Messner et al. [9] report F1 = 95.6%; they use 2110 recordings for training and validation, and 744 recordings for testing. Noman et al. [19] report F1 = 90.2%; they split the database in half for training and half for testing using 5-fold cross-validation. The method of Renna et al. [11] achieves F1 = 95.5%; they select 786 PCG recordings, of which 406 correspond to pathological sounds and the remaining 386 correspond to normal sounds. The method of Shukla et al. [12] achieves F1 = 98.6%; they selected 792 PCG recordings with ECG data and used 192 of these for testing.

The proposed method exhibits very good performance at segmenting S1 and S2; however, its current limitation is that it cannot identify them, that is, label each segmented sound as S1 or S2. Another limitation, although considerably minor, is that the alignment algorithm is not deterministic: because of its random nature, the results vary between runs. This variation is nevertheless minimal, since it changes the F1 score by only 0.02%.

In our proposal, the stages of the segmentation method are independent of each other, which makes the method highly modular and easy to adapt or evolve into other segmentation schemes. The source code of the proposed algorithm is free under the terms of the GNU GPL license.

5. Conclusions

The goal of PCG signal segmentation is to determine the time instants where the fundamental heart sounds (FHS) start and end. Determining the systolic and diastolic phases considerably facilitates the subsequent analysis of cardiac sounds for applications such as computer-aided auscultation, heart sound classification or signal compression. In this article, we have proposed a PCG segmentation algorithm based on the idea of detecting events caused by changes in the spectral content of the heart sound signal. We refer to this algorithm as Spectral Energy Flux (SEF). The periodicity of these events is estimated using a very reliable methodology that combines four techniques, two in the time domain and two in the frequency domain. To select which of the detected events correspond to real FHS, we have proposed the use of two optimization algorithms, Differential Evolution (DE) and Simple Genetic Algorithm (SGA). A thorough assessment of the proposed methodology was conducted using the PhysioNet/CinC Challenge 2016 database. More specifically, we used the 2874 PCG recordings that have metadata available, and the proposed algorithm has an F1 score of 93.6% for the DE variant and 87.5% for the SGA variant. We consider that our proposal has low computational complexity, since it takes less than 18% of the duration of a PCG signal to finish the segmentation. The source code of the algorithm is free under the terms of the GNU GPL license.
CRediT authorship contribution statement

Miguel A. Alonso-Arévalo: Conceptualization, Methodology, Writing - original draft. Alejandro Cruz-Gutiérrez: Software, Methodology, Writing - original draft. Roilhi F. Ibarra-Hernández: Data curation, Software, Investigation, Writing - review & editing. Eloísa García-Canseco: Supervision, Formal analysis, Writing - review & editing. Roberto Conte-Galván: Data curation, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Mexican National Council for Science and Technology (CONACYT) through the Graduate Research Fellowship No. 338049. The authors want to thank the organizers of the PhysioNet/CinC 2016 Challenge, the organizers of the PASCAL Classifying Heart Sounds Challenge and the creators of the HSCT-11 dataset, for making available the heart sounds used in this research work. The authors also thank the anonymous reviewers for their valuable comments and suggestions that helped to improve the quality of the manuscript.

References

[1] World Health Organization, World Health Statistics 2016: Monitoring Health for the SDGs, Sustainable Development Goals, Technical Report, World Health Organization, 2016.
[2] C.B. Mahnke, Automated heartsound analysis/computer-aided auscultation: A cardiologist’s perspective and suggestions for future development, in: 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 2009, IEEE, 2009, pp. 3115–3118.
[3] A.K. Dwivedi, S.A. Imtiaz, E. Rodriguez-Villegas, Algorithms for automatic analysis and classification of heart sounds–a systematic review, IEEE Access 7 (2018) 8316–8345.
[4] H. Liang, S. Lukkarinen, I. Hartimo, Heart sound segmentation algorithm based on heart sound envelogram, in: Computers in Cardiology 1997, Vol. 24, IEEE, 1997, pp. 105–108.
[5] H. Liang, L. Sakari, H. Iiro, A heart sound segmentation algorithm using wavelet decomposition and reconstruction, in: Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. ‘Magnificent Milestones and Emerging Opportunities in Medical Engineering’ (Cat. No.97CH36136), Vol. 4, IEEE, 1997, pp. 1630–1633.
[6] D. Kumar, P. Carvalho, M. Antunes, J. Henriques, L. Eugenio, R. Schmidt, J. Habetha, Detection of S1 and S2 heart sounds by high frequency signatures, in: 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2006, pp. 1410–1416.
[7] A. Castro, T.T.V. Vinhoza, S.S. Mattos, M.T. Coimbra, Heart sound segmentation of pediatric auscultations using wavelet analysis, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Vol. 2013, IEEE, 2013, pp. 3909–3912.
[8] J. Oliveira, A. Castro, M. Coimbra, Exploring embedding matrices and the entropy gradient for the segmentation of heart sounds in real noisy environments, in: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 2014, IEEE, Chicago, 2014, pp. 3244–3247.
[9] E. Messner, M. Zöhrer, F. Pernkopf, Heart sound segmentation – an event detection approach using deep recurrent neural networks, IEEE Trans. Biomed. Eng. 65 (9) (2018) 1964–1974.
[10] J. Oliveira, F. Renna, M. Coimbra, A subject-driven unsupervised hidden semi-Markov model and Gaussian mixture model for heart sound segmentation, IEEE J. Sel. Top. Sign. Proces. 13 (2) (2019) 323–331.
[11] F. Renna, J. Oliveira, M.T. Coimbra, Deep convolutional neural networks for heart sound segmentation, IEEE J. Biomed. Health Inform. 23 (6) (2019) 2435–2445.
[12] S. Shukla, S.K. Singh, D. Mitra, An efficient heart sound segmentation approach using kurtosis and zero frequency filter features, Biomed. Signal Process. Control 57 (2020) 101762.
[13] A. Moukadem, A. Dieterlen, N. Hueber, C. Brandt, A robust heart sounds segmentation module based on s-transform, Biomed. Signal Process. Control 8 (3) (2013) 273–281.
[14] C.N. Gupta, R. Palaniappan, S. Swaminathan, S.M. Krishnan, Neural network classification of homomorphic segmented heart sounds, Appl. Soft Comput. 7 (1) (2005) 286–297.
[15] S.E. Schmidt, C. Holst-Hansen, C. Graff, E. Toft, J.J. Struijk, Segmentation of heart sound recordings by a duration-dependent hidden Markov model, Physiol. Meas. 31 (4) (2010) 513–529.
[16] D.B. Springer, L. Tarassenko, G.D. Clifford, Logistic regression-HSMM-based heart sound segmentation, IEEE Trans. Biomed. Eng. 63 (4) (2016) 822–832.
[17] A.P. Kamson, L. Sharma, S. Dandapat, Multi-centroid diastolic duration distribution based HSMM for heart sound segmentation, Biomed. Signal Process. Control 48 (2019) 265–272.
[18] A.A. Sepehri, A. Gharehbaghi, T. Dutoit, A. Kocharian, A. Kiani, A novel method for pediatric heart sound segmentation without using the ECG, Comput. Methods Programs Biomed. 99 (1) (2010) 43–48.
[19] F.M. Noman, S.-H. Salleh, C.-M. Ting, S.B. Samdin, H. Ombao, H. Hussain, A Markov-switching model approach to heart sound segmentation and classification, IEEE J. Biomed. Health Inf. (2019).
[20] S. Sun, Z. Jiang, H. Wang, Y. Fang, Automatic moment segmentation and peak detection analysis of heart sound pattern via short-time modified Hilbert transform, Comput. Methods Programs Biomed. 114 (3) (2014) 219–230.
[21] C.I. Nieblas, M.A. Alonso, R. Conte, S. Villarreal, High performance heart sound segmentation algorithm based on matching pursuit, in: 2013 IEEE Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), 2013, pp. 96–100.
[22] H. Tang, T. Li, T. Qiu, Y. Park, Segmentation of heart sounds based on dynamic clustering, Biomed. Signal Process. Control 7 (5) (2012) 509–516.
[23] C.D. Papadaniil, L.J. Hadjileontiadis, Efficient heart sound segmentation and extraction using ensemble empirical mode decomposition and kurtosis features, IEEE J. Biomed. Health Inf. 18 (4) (2014) 1138–1152.
[24] R.J. Lehner, R.M. Rangayyan, A three-channel microcomputer system for segmentation and characterization of the phonocardiogram, IEEE Trans. Biomed. Eng. BME-34 (6) (1987) 485–489.
[25] M. Malarvili, I. Kamarulafizam, S. Hussain, D. Helmi, Heart sound segmentation algorithm based on instantaneous energy of electrocardiogram, in: Computers in Cardiology, 2003, IEEE, 2003, pp. 327–330.
[26] R. Paiva, P. Carvalho, R. Couceiro, J. Henriques, M. Antunes, I. Quintal, J. Muehlsteff, Beat-to-beat systolic time-interval measurement from heart sounds and ECG, Physiol. Meas. 33 (2) (2012) 177.
[27] S.I. Malik, M.U. Akram, I. Siddiqi, Localization and classification of heartbeats using robust adaptive algorithm, Biomed. Signal Process. Control 49 (2019) 57–77.
[28] S. Choi, Z. Jiang, Comparison of envelope extraction algorithms for cardiac sound signal segmentation, Expert Syst. Appl. 34 (2) (2006) 1056–1069.
[29] D. Kumar, P. Carvalho, M. Antunes, R.P. Paiva, J. Henriques, Noise detection during heart sound recording using periodicity signatures, Physiol. Meas. 32 (5) (2011) 599–618.
[30] C. Castro Hoyos, S. Murillo-Rendón, C.G. Castellanos-Dominguez, Heart sound segmentation in noisy environments, in: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), in: LNCS, vol. 7930, Springer Berlin Heidelberg, 2013, pp. 254–263.
[31] L. Gamero, R. Watrous, Detection of the first and second heart sound using probabilistic models, in: Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE Cat. No.03CH37439), IEEE, 2003, pp. 2877–2880.
[32] D. Gill, N. Gavrieli, N. Intrator, Detection and identification of heart sounds using homomorphic envelogram and self-organizing probabilistic model, in: Computers in Cardiology, 2005, Vol. 32, IEEE, 2005, pp. 957–960.
[33] C. Liu, D. Springer, G. Clifford, Performance of an open-source heart sound segmentation algorithm on eight independent databases, Physiol. Meas. 38 (8) (2017) 1730.
[34] L.S. Smith, Sound segmentation using onsets and offsets, J. New Music Res. 23 (1994).
[35] E.D. Scheirer, Tempo and beat analysis of acoustic musical signals, J. Acoust. Soc. Am. 103 (1) (1998) 588.
[36] A. Klapuri, Sound onset detection by applying psychoacoustic knowledge, in: 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), Vol. 6, IEEE, 1999, pp. 3089–3092.
[37] J. Laroche, Efficient tempo and beat tracking in audio recordings, J. Audio Eng. Soc. 51 (4) (2003) 226–233.
[38] M. Alonso, B. David, G. Richard, Tempo and beat estimation of musical signals, in: Proc International Conference on Music Information Retrieval, Vol. 04, 2004, pp. 158–163.
[39] P. Bentley, G. Nordehn, M. Coimbra, S. Mannor, The PASCAL classifying heart sounds challenge 2011 (CHSC2011), 2011, Retrieved (Aug 2015) from: http://www.peterjbentley.com/heartchallenge/index.html.
[40] A. Spadaccini, F. Beritelli, Performance evaluation of heart sounds biometric systems on an open dataset, in: 2013 18th International Conference on Digital Signal Processing (DSP), IEEE, Fira, 2013, pp. 1–5.
[41] C. Liu, D. Springer, Q. Li, B. Moody, R.A. Juan, F.J. Chorro, F. Castells, J.M. Roig, I. Silva, A.E. Johnson, Z. Syed, S.E. Schmidt, C.D. Papadaniil, L. Hadjileontiadis, H. Naseri, A. Moukadem, A. Dieterlen, C. Brandt, H. Tang, M. Samieinasab, M.R. Samieinasab, R. Sameni, R.G. Mark, G.D. Clifford, An open access database for the evaluation of heart sound algorithms, Physiol. Meas. 37 (9) (2016).
[42] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals, Circulation 101 (23) (2000) e215–e220.
[43] P. Arnott, G. Pfeiffer, M. Tavel, Spectral analysis of heart sounds: relationships between some physical characteristics and frequency spectra of first and second heart sounds in normals and hypertensives, J. Biomed. Eng. 6 (2) (1984) 121–128.
[44] J.E. Hall, Guyton and Hall Textbook of Medical Physiology, thirteenth ed., Elsevier, 2016, p. 1046.
[45] J.S. Butterworth, Cardiac Auscultation, Including Audiovisual Principles, Grune & Stratton, 1960.
[46] S.K. Mitra, Digital Signal Processing: A Computer-Based Approach, second ed., McGraw-Hill, 2001.
[47] J.O. Smith III, Spectral Audio Signal Processing, W3K, Stanford, 2011.
[48] S.K. Ghosh, R. Ponnalagu, R. Tripathy, U.R. Acharya, Automated detection of heart valve diseases using chirplet transform and multiclass composite classifier with PCG signals, Comput. Biol. Med. 118 (2020) 103632.
[49] R.G. Lyons, Understanding Digital Signal Processing, third ed., Prentice Hall, 2011.
[50] M. Dvornikov, Formulae of numerical differentiation, 2003, Retrieved (Sep 2015) from: http://arxiv.org/abs/math/0306092.
[51] H. Naseri, M.R. Homaeinezhad, Detection and boundary identification of phonocardiogram sounds using an expert frequency-energy based metric, Ann. Biomed. Eng. 41 (2) (2013) 279–292.
[52] P. Stoica, R.L. Moses, Spectral Analysis of Signals, Prentice Hall, Upper Saddle River, NJ, 2005.
[53] A.M. Noll, Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum and a maximum likelihood estimate, in: Proceedings of the Symposium on Computer Processing in Communications, Vol. XIX, Polytechnic Press, 1970, pp. 779–797.
[54] A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, Natural Computing Series, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
[55] R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. (1997) 341–359.
[56] A. Cruz-Gutiérrez, Segmentación Robusta del Audio Cardiaco Mediante Análisis Tiempo-Frecuencia y Métodos de Optimización (Master’s thesis), Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), B.C., Mexico, 2016.