
4th International Conference on Electrical Engineering (ICEE 2015)

IGEE, Boumerdes, December 13th -15th, 2015

Synthesis of diplophonic and biphonic voices


Y. Bennane, A. Kacha
Laboratoire de Physique de Rayonnement et Applications
Université de Jijel
Jijel, Algeria
ben.qoub@yahoo.fr, kacha_a@yahoo.com

F. Grenez
LISA Laboratory, Université Libre de Bruxelles
Brussels, Belgium
fgrenez@ulb.ac.be

Abstract: This paper concerns the generation of speech sounds whose timbre simulates the voice quality of dysphonic speakers. The study focuses on diplophonia and biphonation, two phonetic symptoms of pathological voices. The proposed synthesizer combines three models: the phase-delayed overlapping sinusoid (PDOS) model, which simulates the glottal area waveform; Rothenberg's model, which yields the glottal flow rate; and a vocal tract model built from a concatenation of digital resonators (the cascade formant synthesizer of Klatt). Preliminary tests show that the simulator is able to synthesize diplophonic and biphonic voices.

Keywords: Diplophonia; Biphonation; synthesis of disordered voices.

I. INTRODUCTION
Voice disorders are common; they are consequences of disease or malfunction of the larynx. In children, especially those aged 2 years and above, dysphonia is most often due to vocal misuse (malmenage). In adults whose voice use is important for professional or social reasons, the dysphonia is called dysfunctional. This category of patients includes call-center employees, teachers, etc. As vocal activities have diversified, the number of people affected by voice disorders has increased significantly.
Clinical evaluation of voice disorders is routinely based on listener perception of speech. For example, clinicians rate the degree of perceived abnormality according to a protocol such as the GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) auditory scale. This method of evaluation is subjective, i.e. the outcome is listener-dependent.
Objective measures of voice disorders are obtained from acoustic analysis of speech. The analysis of a speech signal may inform indirectly on the anatomy and physiology of the speech apparatus, including the vibrating vocal folds, without obstructing the production of speech. Intelligibility, appropriate prosody and speaker timbre are the objects of perceptual scoring, which is part of the assessment of voice and speech given their communicative functions.
Many properties of the speech signal that identify speech sounds phonetically are known. This is, however, not the case for speaker timbre. Timbre is the attribute of sensation that enables a listener to distinguish between two sounds even if they have the same pitch and loudness. In clinical practice, one observes that perceptual scoring, as well as the discovery of the acoustic cues that describe speaker timbre quantitatively, has remained problematic. For instance, hundreds of acoustic cues have been shown to correlate only moderately with the auditory scales that are used routinely. Not surprisingly, the auditory scales according to which speaker timbre in general and voice timbre in particular are assessed have often been criticized because they lack reliability (inter-judge and intra-judge agreement) as well as validity.
In the past, perceptual experiments have been carried out successfully to discover the speech signal properties that convey the phonetic identity of a speech sound. The same approach has been used to learn about the link between signal properties and speaker timbre, but far less successfully. One reason is that general-purpose synthesizers have been used, which are not able to simulate a wide range of timbres.
Diplophonia and biphonation are two phonetic symptoms of pathological voices. Diplophonia is defined as a condition in which the voice simultaneously produces two sounds of different pitch. It is characterized by a periodic signal containing two or more glottal cycles of different shape and/or amplitude, and it is represented by a spectrum containing multiple harmonic series whose fundamental frequencies are in a rational ratio. Biphonation is characterized by two independent fundamental frequencies, each of which generates its own harmonics; these fundamental frequencies are in an irrational ratio.
Synthesizers of diplophonia and biphonation are rare in the literature. The first, given by Ishizaka in 1976 [1], obtained diplophonic waveforms by using an asymmetric two-mass model (two masses for each vocal fold). Later, Mergell and Herzel modeled biphonation by using asymmetric two-mass models of the vocal folds [2]. In [3], Hanquinet proposed a synthesis model for diplophonia and biphonation based on a modulation of the amplitude and instantaneous frequency of the harmonic driving function. In diplophonia, the modulation frequency is the pulse frequency times a ratio of two small integers; in biphonation, the multiplier is an irrational number. A few years later, Fraj [4] proposed a synthesizer based on a nonlinear memoryless model of the glottal area, driven by a harmonic excitation whose instantaneous frequency and amplitude are controlled. The glottal airflow rate is generated by means of an aerodynamic model of the glottis [5], and the trachea and vocal tract are modeled by a concatenation of cylindrical pipes of identical length but different cross-sections. Recently, Alonso et al. [6] introduced a synthesis model for subharmonic voice that considers additive oscillator combination [7], based on the modification of a classic glottal excitation synthesizer.

© 2015 IEEE
In this paper, we propose a synthesizer of voice disorders with a view to modeling two typical timbres of disordered voices, diplophonia and biphonation. Such a synthesizer is of great importance for clinical practice as well as for research. Indeed, it can be used to simulate voice timbres in order to study human perception and auditory evaluation of voice, as well as to test clinical speech analysis software with calibrated synthetic signals.
II. DESCRIPTION OF THE SYNTHESIZER
The implementation of the synthesizer that we propose is inspired by existing formant and physics-based synthesizers. The synthesizer represented in Fig. 1 includes three models: the first is the phase-delayed overlapping sinusoid (PDOS) model of Titze (2006) [5], which simulates the glottal area waveform; the second is Rothenberg's model, which simulates the glottal flow rate; and the last is the vocal tract model, based on the cascade formant synthesizer of Klatt [8].
[Fig. 1: block diagram with blocks labeled PDOS model, finite impulse response (FIR) filter, Rothenberg's model (glottal airflow model) and vocal tract model; signal labels: glottal area, lung pressure, speech.]
Fig. 1. Block diagram of the synthesizer.

A. Model of area glottis

The area glottis Ag is calculated by the PDOS model as the minimum of two signals delayed by a phase φ: the first is the glottal entry area A1 and the second is the glottal exit area A2.

$$A_g = \min(A_1, A_2)$$
$$A_1 = \max[\delta,\ \xi_{01} + \xi_1 \sin(2\pi F_0 t)]\, 2L \qquad (1)$$
$$A_2 = \max[\delta,\ \xi_{02} + \xi_2 \sin(2\pi F_0 t - \varphi)]\, 2L$$

In the previous equations, F0 is the fundamental frequency, L is the length of the glottis, δ is the smallest allowed glottal width (δ = 0), ξ01 and ξ02 are the lower and upper glottal half-widths for vocal fold posturing, respectively, and ξ1 and ξ2 are the lower and upper vibrational amplitudes, respectively.
The parameters ξ02, ξ01, ξ2, ξ1 and φ are obtained by (2):

[Equation (2) is not recoverable from the source text; the surviving fragments are "2=2A", "((QA+1))", "2QA" and "= +2 sin-1( QO) -2 QO".]    (2)

A is the vibrational amplitude, QA is the amplitude quotient, fixed at 0.5, and QO is the open quotient, fixed at 0.6.
The area glottis obtained by the PDOS model is filtered by a finite impulse response (FIR) low-pass filter of order 51 with a cut-off frequency of 80 Hz.
Fig. 2 illustrates the calculation of the area glottis as the minimum of the glottal entry area and the glottal exit area.

Fig. 2. Waveform of the area glottis Ag (red), glottal entry area A2 (green) and glottal exit area A1 (blue).

B. Model of glottal airflow

The area glottis obtained previously is fed into a second model, Rothenberg's model of the glottal airflow. The glottal airflow U is computed by (3):

$$U = -\frac{R A_g^2}{k\rho} + \sqrt{\left(\frac{R A_g^2}{k\rho} + U_r\right)^2 - \frac{2 L A_g^2\, \dot{U}_r}{k\rho}} \qquad (3)$$

where R is the acoustic resistance, L is the inertance, k is the recovery coefficient, ρ is the density of air and Ur is the resistive glottal airflow, which is given by the following formula:

$$U_r = A_g^2 \left[ -\frac{R}{k\rho} + \sqrt{\left(\frac{R}{k\rho}\right)^2 + \frac{2 P_L}{k\rho A_g^2}} \right] \qquad (4)$$

where PL represents the lung pressure.
Fig. 3 shows the airflow rate signal U (blue) as well as the resistive glottal airflow Ur (green) and the inertive glottal airflow Ui (red). The latter is obtained by subtracting the resistive airflow from the airflow rate:

$$U_i = U - U_r \qquad (5)$$

Fig. 3. Glottal airflow waveform (blue), glottal airflow resistance (green) and glottal airflow inertance (red).
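The source chain of Sections II-A and II-B can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation. It reads (4) as the positive root of kρUr²/(2Ag²) + R·Ur = PL and (3) as U = -R·Ag²/(kρ) + sqrt((R·Ag²/(kρ) + Ur)² - 2·L·Ag²·(dUr/dt)/(kρ)). All parameter values are hypothetical, the half-widths, amplitudes and phase are passed in directly instead of being derived through (2), and the FIR smoothing step is omitted. The time derivative of Ur is approximated by finite differences, and clipping a negative radicand to zero is an added assumption.

```python
import numpy as np

def pdos_area(t, f0, glottis_length, xi01, xi02, xi1, xi2, phi, delta=0.0):
    """Glottal area Ag(t) of (1): minimum of the entry and exit areas."""
    a1 = np.maximum(delta, xi01 + xi1 * np.sin(2 * np.pi * f0 * t)) * 2 * glottis_length
    a2 = np.maximum(delta, xi02 + xi2 * np.sin(2 * np.pi * f0 * t - phi)) * 2 * glottis_length
    return np.minimum(a1, a2)

def resistive_flow(ag, p_lung, r, k, rho):
    """Resistive airflow Ur of (4), rearranged to avoid dividing by Ag^2,
    so that a closed glottis (Ag = 0) simply gives Ur = 0."""
    c = r * ag**2 / (k * rho)
    return -c + np.sqrt(c**2 + 2.0 * p_lung * ag**2 / (k * rho))

def glottal_flow(ag, ur, dt, r, inertance, k, rho):
    """Airflow U of (3); dUr/dt is approximated by finite differences."""
    c = r * ag**2 / (k * rho)
    dur = np.gradient(ur, dt)
    radicand = (c + ur)**2 - 2.0 * inertance * ag**2 * dur / (k * rho)
    # Assumption: a negative radicand is clipped to zero to keep U real.
    return -c + np.sqrt(np.maximum(radicand, 0.0))

# Hypothetical parameter values in SI units (not taken from the paper).
fs = 20000.0                      # sampling rate in Hz
t = np.arange(800) / fs           # 40 ms, i.e. four cycles at 100 Hz
ag = pdos_area(t, f0=100.0, glottis_length=0.016,
               xi01=2e-4, xi02=2e-4, xi1=5e-4, xi2=5e-4, phi=0.6)
ur = resistive_flow(ag, p_lung=800.0, r=1e5, k=1.1, rho=1.14)
u = glottal_flow(ag, ur, 1.0 / fs, r=1e5, inertance=600.0, k=1.1, rho=1.14)
ui = u - ur                       # inertive component, as in (5)
```

Setting the inertance to zero makes `glottal_flow` return `ur` itself, which is a quick sanity check on the two formulas.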

C. Vocal tract model

The vocal tract model is based on the formant synthesizer of Klatt with a cascade configuration of digital resonators [8]. For a typical male vocal tract length (about 17 cm), five digital resonators are needed, as shown in Fig. 4.

[Fig. 4: cascade of blocks labeled Entry, R1, R2, R3, R4, R5, Exit.]
Fig. 4. Cascade configuration of digital resonators.

1) Digital resonators
The block diagram of a digital resonator is shown in Fig. 5. The input-output characteristics of a digital resonator are specified by two parameters: the resonant (formant) frequency F and the bandwidth BW. The resonator output y(nT) is calculated from the input x(nT) as follows:

$$y(nT) = A\,x(nT) + B\,y(nT - T) + C\,y(nT - 2T) \qquad (6)$$

where T is the sampling period and A, B and C are constants that depend on the resonant frequency F and the bandwidth BW of the resonator through the impulse-invariant transformation:

$$C = -\exp(-2\pi\, BW\, T)$$
$$B = 2\exp(-\pi\, BW\, T)\cos(2\pi F T) \qquad (7)$$
$$A = 1 - B - C$$

A digital resonator is a second-order difference equation. Its transfer function therefore has a sampled frequency response given by:

$$T(f) = \frac{A}{1 - B z^{-1} - C z^{-2}} \qquad (8)$$

where z = exp(j2πfT), f is the frequency in Hz, ranging from 0 to 5 kHz, and j is the imaginary unit, the square root of -1.

The transfer function of the vocal tract is the product of the transfer functions of the cascaded resonators:

$$T(f) = \prod_{n=1}^{K} \frac{A_n}{1 - B_n z^{-1} - C_n z^{-2}} \qquad (9)$$

where A_n, B_n and C_n are the coefficients of the n-th resonator and K is the number of digital resonators used in the vocal tract simulation; it is equal to 5.

Fig. 6. Magnitude spectrum of the vocal tract transfer function for the vowel [a].

2) Digital anti-resonator
An anti-resonator can be called an anti-formant or transfer-function zero pair. It is obtained by making slight modifications to (6) and (7). The output of the digital anti-resonator y(nT) is calculated from the input x(nT) as follows:

$$y(nT) = A'\,x(nT) + B'\,x(nT - T) + C'\,x(nT - 2T) \qquad (10)$$

where x(nT - T) and x(nT - 2T) are the previous two samples of the input x(nT). The constants A', B' and C' are defined by (11):

$$A' = 1.0/A, \qquad B' = -B/A, \qquad C' = -C/A \qquad (11)$$

where A, B and C are calculated by replacing the resonant (formant) frequency F and the bandwidth BW in (7) with the anti-resonance center frequency F' and bandwidth BW'.

One anti-resonator is used in the synthesizer to shape the spectrum of the voicing source and another is used to simulate the effects of nasalization in the vocal tract transfer function.
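The resonator and anti-resonator recursions can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' code: it uses the coefficient formulas of Klatt's cascade synthesizer [8] (C = -exp(-2π·BW·T), B = 2·exp(-π·BW·T)·cos(2πFT), A = 1 - B - C), a 10 kHz sampling rate chosen to match the 0 to 5 kHz range quoted for (8), and illustrative formant frequencies and bandwidths for an [a]-like vowel that are not taken from the paper. The function names are hypothetical.

```python
import numpy as np

def resonator_coeffs(f, bw, fs):
    """Coefficients A, B, C of (7) for centre frequency f and bandwidth bw in Hz."""
    t = 1.0 / fs
    c = -np.exp(-2 * np.pi * bw * t)
    b = 2 * np.exp(-np.pi * bw * t) * np.cos(2 * np.pi * f * t)
    a = 1.0 - b - c
    return a, b, c

def resonator(x, f, bw, fs):
    """Difference equation (6): y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    a, b, c = resonator_coeffs(f, bw, fs)
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        y[n] = a * xn + b * y1 + c * y2
        y2, y1 = y1, y[n]
    return y

def antiresonator(x, f, bw, fs):
    """Difference equation (10) with A' = 1/A, B' = -B/A, C' = -C/A."""
    a, b, c = resonator_coeffs(f, bw, fs)
    a_p, b_p, c_p = 1.0 / a, -b / a, -c / a
    y = np.zeros(len(x))
    x1 = x2 = 0.0
    for n, xn in enumerate(x):
        y[n] = a_p * xn + b_p * x1 + c_p * x2
        x2, x1 = x1, xn
    return y

def cascade_response(freqs, formants, bandwidths, fs):
    """Frequency response (9): product of the resonator responses (8)."""
    z = np.exp(1j * 2 * np.pi * freqs / fs)
    h = np.ones_like(z)
    for f, bw in zip(formants, bandwidths):
        a, b, c = resonator_coeffs(f, bw, fs)
        h *= a / (1 - b / z - c / z**2)
    return h

# Illustrative values only (assumed, not from the paper).
fs = 10000.0
formants = [700.0, 1220.0, 2600.0, 3300.0, 3750.0]
bandwidths = [130.0, 70.0, 160.0, 250.0, 200.0]
freqs = np.arange(0.0, 5000.0, 1.0)
h = cascade_response(freqs, formants, bandwidths, fs)
magnitude_db = 20 * np.log10(np.abs(h))
```

Because A = 1 - B - C, each resonator has unit gain at 0 Hz, so the cascade needs no separate gain normalization. An anti-resonator with the same F and BW exactly undoes the corresponding resonator.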
[Fig. 5: block diagram with labels INPUT X(nT), OUTPUT Y(nT), two Unit Delay blocks producing Y(nT-T) and Y(nT-2T), and gains B and C.]
Fig. 5. Block diagram of the digital resonator.

Fig. 6 represents the transfer function of the vocal tract, which is given by (9).

III. SYNTHESIS OF DIPLOPHONIA AND BIPHONATION

A. Diplophonia
Diplophonia is a phonetic symptom of pathological voices. It appears as a dual simultaneous voice or phonation. It is characterized by a periodic signal containing two or more glottal cycles of different shape and/or amplitude, and it is represented by a spectrum containing multiple harmonic series whose fundamental frequencies are in a rational ratio. Diplophonic sounds can be obtained by modifying the PDOS model, in which the area glottis Ag is given by (1). The modification assigns the fundamental frequencies F1 and F2 to A1 and A2, respectively, with F1 and F2 in a rational ratio. Equation (1) becomes:

$$A_g = \min(A_1, A_2)$$
$$A_1 = \max[\delta,\ \xi_{01} + \xi_1 \sin(2\pi F_1 t)]\, 2L \qquad (12)$$
$$A_2 = \max[\delta,\ \xi_{02} + \xi_2 \sin(2\pi F_2 t - \varphi)]\, 2L$$

with the ratio between F1 and F2 rational.
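A small numerical sketch of (12) shows why a rational ratio yields a periodic waveform with several unequal glottal cycles per period. The ratio F1/F2 = 2/3, the parameter values (lengths in cm) and the function name are assumptions made for illustration only; they are not the settings used in the paper.

```python
from fractions import Fraction
import numpy as np

def two_frequency_area(t, f1, f2, glottis_length, xi01, xi02, xi1, xi2, phi, delta=0.0):
    """Glottal area of (12): entry and exit areas oscillate at F1 and F2."""
    a1 = np.maximum(delta, xi01 + xi1 * np.sin(2 * np.pi * f1 * t)) * 2 * glottis_length
    a2 = np.maximum(delta, xi02 + xi2 * np.sin(2 * np.pi * f2 * t - phi)) * 2 * glottis_length
    return np.minimum(a1, a2)

# Hypothetical settings: F1 = 100 Hz and F1/F2 = 2/3, so F2 = 150 Hz.
fs = 20000
f1 = 100.0
ratio = Fraction(2, 3)
f2 = f1 * ratio.denominator / ratio.numerator
# With F1/F2 = p/q in lowest terms, both sinusoids repeat after p/F1 = q/F2.
common_period = ratio.numerator / f1
n_period = round(common_period * fs)

t = np.arange(3 * n_period) / fs
ag = two_frequency_area(t, f1, f2, glottis_length=1.6,
                        xi01=0.02, xi02=0.02, xi1=0.05, xi2=0.05, phi=0.5)

# The waveform repeats after the common period but not after one cycle of F1.
repeats_at_common_period = np.allclose(ag[:n_period], ag[n_period:2 * n_period])
repeats_at_f1_period = np.allclose(ag[:n_period // 2], ag[n_period // 2:n_period])
```

Here the common period spans two cycles of F1 and three cycles of F2, which is the pattern of alternating unequal glottal cycles that characterizes diplophonia.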

B. Biphonation
Biphonation is a phonetic symptom of disordered voices. The sound produced when the speaker attempts to talk or sing has two fundamental frequencies, each of which generates its own harmonics. These fundamental frequencies are in an irrational ratio. Biphonation is also described by a succession of unequal glottal cycles. We simulate biphonation in the same manner as diplophonia, using (12), except that the ratio between the two fundamental frequencies is irrational.
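The distinction between the two regimes can be made explicit with a short supplementary argument. It is a sketch added for illustration; the integers p and q and the common period T_0 are notation introduced here, not symbols from the paper. The two sinusoids in (12) share a period only when each completes a whole number of cycles within it:

```latex
% Common period T_0 of the two sinusoids in (12)
\[
F_1 T_0 = p, \quad F_2 T_0 = q, \quad p, q \in \{1, 2, 3, \dots\}
\quad\Longleftrightarrow\quad
\frac{F_1}{F_2} = \frac{p}{q}, \quad T_0 = \frac{p}{F_1} = \frac{q}{F_2}.
\]
% Diplophonia: F_1/F_2 is rational, so T_0 exists and A_g(t + T_0) = A_g(t).
% Biphonation: F_1/F_2 is irrational (for example e), so no integers p, q
% satisfy the condition and A_1, A_2 have no common period.
```

In the biphonic case the cycles of A1 and A2 therefore never realign exactly, which is consistent with the succession of unequal glottal cycles described above.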
IV. RESULTS
The examples presented in this section show the ability of the synthesizer to simulate the vocal timbres of diplophonia and biphonation. Fig. 7 shows the area glottis for diplophonia with a ratio between F1 and F2 equal to , and Fig. 8 shows the speech signal of the vowel [a] and its spectrum. We notice two harmonic series whose fundamental frequencies are in a rational ratio equal to .

Fig. 7. Waveform of the area glottis for diplophonia (Qo=0.6 and Qa=0.5).

Fig. 8. Acoustic signal of the vowel [a] for a diplophonic voice (upper part) and its spectrum (lower part) (Qo=0.6 and Qa=0.5).

Fig. 9 represents the waveform of the area glottis of a normophonic speaker, with the amplitude of the area glottis equal to 0.2. As observed, the area glottis is characterized by equal glottal cycles and only one harmonic series. The waveform of the speech signal of the vowel [a] and its corresponding spectrum are shown in Fig. 10.

Fig. 10. Acoustic signal of the vowel [a] (upper part) and its spectrum (lower part) (Qo=0.6 and Qa=0.5).

Fig. 11 shows the area glottis for biphonation with a ratio between F1 and F2 equal to Euler's number, and Fig. 12 shows the speech signal of the vowel [a] and its spectrum.

Fig. 11. Signal of the area glottis for biphonation (Qo=0.6 and Qa=0.5).

Fig. 12. Acoustic signal of the vowel [a] for a biphonic voice (upper part) and its spectrum (lower part) (Qo=0.6 and Qa=0.5).

V. CONCLUSION
In this paper we have proposed a synthesizer of disordered voices. The synthesizer, inspired by existing formant and physics-based synthesizers, is based on the PDOS model of Titze to simulate the glottal area waveform, Rothenberg's model to simulate the glottal flow rate and the cascade formant synthesizer of Klatt to simulate the vocal tract. The synthesizer has been used to simulate diplophonic and biphonic voices. Future work will focus on the auditory perception of these voices; on the simulation and auditory perception of pressed or asthenic voices, of modulation noise and of additive noise at the glottis; and on the simulation and auditory perception of singers' timbres.

Fig. 9. Waveform of the area glottis of a normophonic speaker (Qo=0.6 and Qa=0.5).

REFERENCES
[1] K. Ishizaka, "Computer simulation of pathological vocal cord vibration," J. Acoust. Soc. Am., vol. 60, pp. 1193-1198, 1976.
[2] P. Mergell and H. Herzel, "Modelling biphonation - The role of the vocal tract," Speech Communication, vol. 22, pp. 141-154, 1997.
[3] Hanquinet, F. Grenez and J. Schoentgen, "Synthesis of disordered speech," INTERSPEECH, Lisbon (Portugal), pp. 1077-1080, September 2005.
[4] S. Fraj, "Synthèse des voix pathologiques," PhD thesis, Université Libre de Bruxelles, 2010.
[5] I. Titze, "The myoelastic aerodynamic theory of phonation," National Center for Voice and Speech, Denver CO, Iowa City IA, 2006.
[6] J. B. Alonso, M. A. Ferrer, P. Henriquez, K. Lopez-de-Ipina, J. Cabrera, and C. M. Travieso, "A study of glottal excitation synthesizers for different voice qualities," Neurocomputing, vol. 150, pp. 367-376, 2015.
[7] P. Aichinger, "Diplophonic Voice: Definitions, models, and detection," PhD thesis, Medical University of Vienna, 2015.
[8] D. H. Klatt, "Software for a cascade/parallel formant synthesizer," J. Acoust. Soc. Am., vol. 67, pp. 971-995, 1980.
