Synthesis of Diplophonic and Biphonic Voices
Y. Bennane, A. Kacha, F. Grenez
I. INTRODUCTION
Voice disorders are common; they are consequences of disease
or malfunction of the larynx. In children, especially those aged
2 years and above, dysphonia is most often due to vocal misuse
(malmenage). In adults for whom use of the voice is important
for professional or social reasons, the dysphonia is called
dysfunctional. This category of patients includes call center
employees, teachers, etc. Vocal activities have diversified, and
the number of people affected by voice disorders has therefore
increased significantly.
Clinical evaluation of voice disorders is routinely based on
listener perception of speech. For example, clinicians rate the
degree of perceived abnormality according to a protocol such
as the GRBAS (Grade, Roughness, Breathiness, Asthenia, and
Strain) auditory scale. This method of evaluation is subjective,
i.e. the outcome is listener-dependent.
Objective measures of voice disorders are obtained from
acoustic analysis of speech. The analysis of a speech signal may
indirectly provide information about the anatomy and physiology
of the speech apparatus, including the vibrating vocal folds,
without obstructing the production of speech. Intelligibility,
appropriate prosody and speaker timbre are the objects of
perceptual scoring, which is part of the assessment of voice
and speech given their communicative functions.
Many properties of the speech signal that identify speech
sounds phonetically are known. This is, however, not the case
for speaker timbre. Timbre is the attribute of sensation that
enables a listener to distinguish between two different sounds
even if they have the same pitch and loudness. In clinical practice, one
observes that the perceptual scoring as well as the discovery of
the acoustic cues that describe speaker timbre quantitatively
2015 IEEE
[Equations (1) to (5) are not recoverable from the source text.]
Fig. 1. Block diagram of the synthesizer (block labels: PDOS model, glottal area, lung pressure).
Fig. 2. Waveform of the glottal area Ag (red), glottal entry area A2 (green)
and glottal exit area A1 (blue).
Fig. 3. Glottal airflow waveform (blue), glottal airflow resistance (green) and
glottal airflow inertance (red).
[Block diagram: resonators R1 to R5.]
$$T(z) = \prod_{n=1}^{K} \frac{A_n}{1 - B_n z^{-1} - C_n z^{-2}} \tag{9}$$
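A cascade formant model multiplies the transfer functions of its second-order resonator sections, as in (9). The following is a minimal sketch of that product evaluated on the unit circle, not the authors' implementation. It assumes Klatt's standard resonator coefficient formulas; the formant frequencies, bandwidths and sampling rate are hypothetical values for an [a]-like vowel and are not taken from the paper.

```python
import cmath
import math


def resonator_coeffs(f_hz, bw_hz, fs_hz):
    """Second-order resonator coefficients (A, B, C), Klatt-style."""
    t = 1.0 / fs_hz  # sampling period T
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def cascade_response(freq_hz, formants, fs_hz):
    """Product of A_n / (1 - B_n z^-1 - C_n z^-2) at z = exp(j 2 pi f T)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    total = 1.0 + 0.0j
    for f_hz, bw_hz in formants:
        a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
        total *= a / (1.0 - b * z_inv - c * z_inv * z_inv)
    return total


# Hypothetical (frequency, bandwidth) pairs in Hz; assumed, not from the paper.
FORMANTS_A = [(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0),
              (3300.0, 250.0), (3750.0, 200.0)]
FS_HZ = 10000.0  # assumed sampling rate


def magnitude_db(freq_hz):
    """Magnitude of the cascade response in dB at one frequency."""
    return 20.0 * math.log10(abs(cascade_response(freq_hz, FORMANTS_A, FS_HZ)))
```

Sweeping `magnitude_db` over 0 to 5000 Hz gives a curve with a peak near each assumed formant, which is the kind of magnitude spectrum a cascade vocal tract model produces.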
1) Digital resonators
The block diagram of a digital resonator is shown in Fig. 5.
The input-output characteristics of a digital resonator are
identified by two parameters: the resonant (formant) frequency
F and the bandwidth BW. The resonator's output y(nT) is
calculated from the input x(nT) as follows:
$$y(nT) = A\,x(nT) + B\,y(nT-T) + C\,y(nT-2T) \tag{6}$$
where T is the sampling period and the constants A, B and C are
related to F and BW by:
$$C = -\exp(-2\pi\,BW\,T), \quad B = 2\exp(-\pi\,BW\,T)\cos(2\pi F T), \quad A = 1 - B - C \tag{7}$$
The transfer function of the resonator is:
$$T(f) = \frac{A}{1 - B z^{-1} - C z^{-2}}, \quad z = \exp(j 2\pi f T) \tag{8}$$
Fig. 6. Magnitude spectrum of the vocal tract transfer function for the vowel
[a].
2) Digital anti-resonator
An anti-resonator can be called an anti-formant or transfer-function
zero pair. It is obtained by making slight modifications
to (6) and (7). The output of the digital anti-resonator y(nT) is
calculated from the input x(nT) as follows:
$$y(nT) = A'\,x(nT) + B'\,x(nT-T) + C'\,x(nT-2T) \tag{10}$$
$$A' = 1/A, \quad B' = -B/A, \quad C' = -C/A \tag{11}$$
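The resonator recursion (6) and the anti-resonator of (10) and (11) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the coefficient formulas are Klatt's standard ones, and the 700 Hz formant, 130 Hz bandwidth and 10 kHz sampling rate are hypothetical values chosen for the example.

```python
import math


def resonator_coeffs(f_hz, bw_hz, fs_hz):
    """Coefficients (A, B, C) of a second-order digital resonator."""
    t = 1.0 / fs_hz  # sampling period T
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(x, a, b, c):
    """y(nT) = A x(nT) + B y(nT-T) + C y(nT-2T), starting from zero state."""
    y1 = y2 = 0.0  # the two previous outputs
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out


def antiresonate(x, a, b, c):
    """y(nT) = A' x(nT) + B' x(nT-T) + C' x(nT-2T), with A'=1/A, B'=-B/A, C'=-C/A."""
    a_p, b_p, c_p = 1.0 / a, -b / a, -c / a
    x1 = x2 = 0.0  # the two previous inputs
    out = []
    for xn in x:
        out.append(a_p * xn + b_p * x1 + c_p * x2)
        x2, x1 = x1, xn
    return out


# Hypothetical example values (assumed, not from the paper).
FS_HZ = 10000.0
A, B, C = resonator_coeffs(700.0, 130.0, FS_HZ)
impulse = [1.0] + [0.0] * 63
ringing = resonate(impulse, A, B, C)        # damped oscillation near 700 Hz
recovered = antiresonate(ringing, A, B, C)  # the zero pair cancels the pole pair
```

Because the anti-resonator built from the same A, B and C is the exact inverse filter of the resonator, `recovered` matches `impulse` up to rounding error, which is a convenient sanity check on both recursions.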
III.
A. Diplophonia
Diplophonia is a phonetic symptom of pathological voice. It is
perceived as two simultaneous voices or phonations. It is
characterized by a periodic signal containing two or more
glottal cycles of different shape and/or amplitude, and it is
represented by a spectrum containing multiple harmonic series
whose fundamental frequencies are in a rational ratio. A
diplophonic sound can be obtained by modifying the PDOS
model, in which the glottal area signal Ag is given by (1).
This modification puts the fundamental frequencies F1 and
F2 of A1 and A2, respectively, in a rational ratio. Equation (1)
becomes:
(12)
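When F1 and F2 are in a rational ratio p/q, both are integer multiples of a common fundamental F0 = F1/p = F2/q, so the combined signal stays periodic and repeats after p cycles of F1, which equal q cycles of F2. The sketch below only illustrates this arithmetic; it does not implement the PDOS model, and the frequencies 100 Hz and 150 Hz are hypothetical values, not taken from the paper.

```python
from fractions import Fraction


def common_fundamental(f1_hz, f2_hz):
    """Return (p, q, f0) with F1/F2 = p/q in lowest terms and f0 = F1/p = F2/q.

    Pass integers or Fractions so that the ratio is exactly rational.
    """
    ratio = Fraction(f1_hz) / Fraction(f2_hz)
    p, q = ratio.numerator, ratio.denominator
    f0 = Fraction(f1_hz) / p
    return p, q, f0


# Hypothetical values: F1 = 100 Hz and F2 = 150 Hz, a 2:3 ratio.
p, q, f0 = common_fundamental(100, 150)
repeat_time_s = 1 / f0  # duration of p cycles of F1, equal to q cycles of F2
```

Every harmonic of either series is a multiple of `f0`, which is why the spectrum of a diplophonic signal still sits on a single harmonic grid.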
B. Biphonation
Biphonation is a phonetic symptom of disordered voices. It
is the sound produced, when attempting to talk or sing, that
has two fundamental frequencies, each generating its own
harmonics. These fundamental frequencies are in an irrational
ratio. Biphonation is also described by a succession of
unequal glottal cycles. We simulate biphonation in the same
manner as diplophonia using (12), except that the ratio
between the two fundamental frequencies is irrational.
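One way to see why an irrational ratio rules out a common period is to approximate it by fractions p/q: each closer approximation needs a larger q, so the time after which a p:q locked pair would repeat grows without bound. The sketch below illustrates this with the square root of 2 as an assumed stand-in for the irrational ratio and a hypothetical F2 of 100 Hz; neither value comes from the paper, and a floating-point number can only approximate an irrational one.

```python
import math
from fractions import Fraction


def locked_approximations(ratio, bounds=(10, 100, 1000)):
    """Closest fractions p/q to `ratio`, with q limited by each bound in turn."""
    return [Fraction(ratio).limit_denominator(b) for b in bounds]


RATIO = math.sqrt(2.0)  # assumed stand-in for an irrational F1/F2
F2_HZ = 100.0           # hypothetical second fundamental frequency

approximations = locked_approximations(RATIO)
errors = [abs(float(a) - RATIO) for a in approximations]
# A p:q locked pair repeats after q cycles of F2, i.e. after q / F2 seconds.
repeat_times_s = [a.denominator / F2_HZ for a in approximations]
```

The entries of `errors` shrink while those of `repeat_times_s` grow, so no finite repeat time reproduces the irrational ratio exactly.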
IV. RESULTS
The examples presented in this section show the ability of
the synthesizer to simulate the vocal timbres of diplophonia and
biphonation. Fig. 7 shows the glottal area for diplophonia with a
ratio between F1 and F2 equal to , and Fig. 8 shows the
speech signal of vowel [a] and its spectrum. We notice two
harmonic series whose fundamental frequencies are in a
rational ratio equal to .
Fig. 7. Waveform of the glottal area for diplophonia (Qo=0.6 and Qa=0.5).
Fig. 8. Acoustic signal of vowel [a] for diplophonic voice (upper part) and
its spectrum (lower part) (Qo=0.6 and Qa=0.5).
Fig. 10. Acoustic signal of vowel [a] (upper part) and its spectrum (lower
part) (Qo=0.6 and Qa=0.5).
Fig. 11. Signal of the glottal area for biphonation (Qo=0.6 and Qa=0.5).
Fig. 12. Acoustic signal of vowel [a] for biphonic voice (upper part) and
its spectrum (lower part) (Qo=0.6 and Qa=0.5).
V. CONCLUSION
In this paper we have proposed a synthesizer of disordered
voices. The synthesizer, inspired by existing formant and
physics-based synthesizers, is based on the PDOS model to
simulate the glottal area waveforms, Rothenberg's model to
simulate the glottal flow rate and the cascade formant
synthesizer of Klatt to simulate the vocal tract. The synthesizer
has been used to simulate diplophonic and biphonic voices.
In future work we will focus on the auditory perception of these
voices; on the simulation and auditory perception of pressed or
asthenic voices; on modulation noise and additive noise at the
glottis; and on the simulation and auditory perception of singers'
timbres.