Quarterly Progress and Status Report

Estimating perceived
phonatory pressedness in
singing from flow
Sundberg, J. and Thalén, M. and Alku, P. and
Vilkman, Erkki

journal: TMH-QPSR
volume: 43
number: 1
year: 2002
pages: 089-096
TMH-QPSR Vol. 43, 2002

Estimating perceived phonatory pressedness in singing from flow glottograms
Johan Sundberg*, Margareta Thalén**, Paavo Alku***, Erkki Vilkman****

* Speech Music Hearing department, KTH, Stockholm
** SMI (University College of Music Education in Stockholm)
*** Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing,
Box 3000, Fin-02015 TKK, Finland
**** University of Oulu, Department of Otolaryngology and Phoniatrics, Oulu, Finland, and
Helsinki University Central Hospital, Box 220, Fin-00029 HUS, Finland

Paper presented at the 31st Annual Symposium Care of the Professional Voice,
Philadelphia, June 2002

The normalized amplitude quotient (NAQ), defined as the ratio between the peak-
to-peak amplitude of the flow pulse and the negative peak amplitude of the
differentiated flow glottogram and normalized with respect to period time, has
been shown to be related to glottal adduction. Glottal adduction, in turn, affects
mode of phonation and hence perceived phonatory pressedness. The relationship
between NAQ and perceived phonatory pressedness was analyzed in a material
collected from a professional female singer and singing teacher who sang a triad
pattern in breathy, flow, neutral and pressed phonation in three different loudness
conditions (soft, middle, loud). In addition, she also sang the same triad pattern in
four different styles of singing, classical, pop, jazz and blues, in the same three
loudness conditions. A panel of nine experts rated the degree of perceived
phonatory press along visual analogue scales. Comparing the mean pressedness
ratings with the mean NAQ values for the various triads showed that
about 73% of the variation in perceived pressedness could be accounted for by
variations of NAQ.
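
As a rough illustration of the NAQ definition given above, the quotient can be computed from one
period of a sampled flow glottogram. The sketch below uses only the Python standard library. The
function name `naq`, the first-difference approximation of the derivative, and the synthetic
triangular pulse are illustrative assumptions, not the analysis procedure used in this study.

```python
def naq(flow, fs):
    """Normalized amplitude quotient of one glottal period.

    flow: flow samples covering exactly one fundamental period
    fs:   sampling rate in Hz
    """
    period = len(flow) / fs                  # period time T in seconds
    ac_amplitude = max(flow) - min(flow)     # peak-to-peak pulse amplitude
    # First difference as a simple stand-in for the differentiated glottogram.
    dflow = [(b - a) * fs for a, b in zip(flow, flow[1:])]
    d_peak = -min(dflow)                     # magnitude of the negative peak
    return ac_amplitude / (d_peak * period)


# Synthetic period: 100 samples at 10 kHz (T = 10 ms). The flow rises linearly
# for 40 samples, falls linearly for 20 samples (2 ms), then stays at zero.
fs = 10_000
pulse = [i / 40 for i in range(40)] + [1 - i / 20 for i in range(20)] + [0.0] * 40
print(round(naq(pulse, fs), 3))  # 0.2
```

For this idealized triangular pulse the amplitude quotient equals the duration of the closing
phase (2 ms), so dividing by the 10 ms period gives 0.2. Real glottograms would need period
detection and a more careful derivative estimate.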

Introduction
The voice source can be continuously varied with regard to vocal loudness and pitch, related
to the physiological voice control parameters subglottal pressure (Ps) and vocal fold length
and stiffness, respectively. In addition, the voice source is affected by the degree of glottal
adduction; a firm glottal adduction produces a pressed/hyperfunctional voice and a weak glottal
adduction gives a breathy/hypofunctional voice. Glottal adduction is relevant to vocal hygiene;
a habitually exaggerated glottal adduction is known to frequently lead to voice disorders such
as nodules.

A change of the glottal adduction affects the voice source in various ways. Increased
adduction tends to decrease the vibration amplitude of the vocal folds. This reduces the
amplitudes of the flow pulses and increases the closed quotient. More specifically, it changes
the relationship between Ps and various characteristics of the voice source waveform. This
relationship, in turn, is reflected in various measures. For example, the estimated projected
glottal area, defined as the ratio between the peak-to-peak pulse amplitude and the square
root of Ps, decreases when glottal adduction is increased. Glottal compliance, i.e., the ratio
between the air volume displaced in a pulse and the associated Ps, also decreases with
increasing glottal adduction. An increased glottal adduction also reduces the level difference
between the two lowest partials of the voice source spectrum. Moreover, the amplitude quotient
(AQ), referred to as Td by Fant (1997), has been shown to yield a measure that faithfully
reflects the degree of glottal adduction (Alku & Vilkman, 1996). AQ is defined as the ratio
between the peak-to-peak flow pulse amplitude and the negative peak amplitude of the
differentiated flow glottogram. Since AQ yields a time-domain measure that can be interpreted
as the length of a sub-section of the glottal closing phase, it is justified to normalize this
quotient by dividing it by the length of the fundamental period (Alku et al., 2002). A similar
parameterization scheme has also been used by Fant (1997). The normalized version of AQ is
denoted the Normalized Amplitude Quotient (NAQ).

Glottal adduction differs between singing styles. While in the classical operatic tradition
adduction seems to be reduced to a minimum, it is often quite firm in certain styles of singing
practiced in popular music, e.g. country (Sundberg et al., 1999b), Broadway (Stone et al., to
be printed) and belting (Sundberg et al., 1993; Bestebreurtje & Schutte, 2000). In a recent
investigation, co-authors JS and MT analyzed the voice source in different styles of singing as
produced by a professional female singer (Sundberg & Thalén, 2001). The singer subject also
produced a set of examples where she deliberately attempted to vary the degree of glottal
adduction. This material was then submitted to a listening test where expert subjects rated
the degree of phonatory pressedness. In the present investigation, we will compare these
ratings with different voice source measures of glottal adduction.

The subject (co-author MT) has long professional experience of singing and teaching in the
four different styles: Classical, Pop, Jazz and Blues. Originally trained in the Classical
style of singing, she performed professionally as a soloist, mostly in the Jazz, Pop and Blues
styles, for about 10 years. In addition, she has worked as a voice teacher in these styles for
more than 20 years.

The subject sang, in one breath group, the syllable [pæ] twice on each of the pitches of a
seventh chord (A3, C#4, E4, G4). She sang this pattern in four modes of phonation: breathy,
flow, neutral, and pressed, each in three degrees of vocal loudness (soft, middle and loud).
(Flow phonation is produced with the lowest degree of glottal adduction that produces a vocal
fold closure and is typically used in Classical singing, apparently corresponding to what has
been referred to as “resonant voice” (Verdolini et al., 1998).) The subject also sang the same
material in four styles: Classical, Pop, Jazz, and Blues, again each in three degrees of vocal
loudness (soft, middle and loud). The Classical style of singing was that typically used in
performances of the Lieder repertoire. The Pop and Jazz styles corresponded to those typically
used in Pop and Jazz ballads. The Pop style is typically represented by performers like Randy
Crawford and Whitney Houston. Typical representatives of the Jazz and Blues styles are
performers like Billie Holiday and Sarah Vaughan, and Bessie Smith and Janis Joplin,
respectively. The subject sang the pitch pattern in the following order: breathy, flow,
neutral, pressed, Classical, Pop, Jazz, Blues, starting with low, then medium and last high
degree of vocal loudness.

The flow signal was obtained from a flow mask (Glottal Enterprises, MSIF2). The audio signal
was recorded by a high fidelity microphone at a distance of 0.3 m from the mouth. Ps was
recorded as the oral pressure during the occlusion of the consonant /p/ in the syllable [pæ:].
This pressure was measured with a pressure transducer connected to a small plastic tube,
mounted in the flow mask, that the subject held in the corner of her mouth. All these signals
were recorded on separate channels of a TEAC PCM recorder.

Perceptual analysis
The recorded material was evaluated perceptually by a panel of 10 listeners, all experts on
voice pedagogy and acquainted with different singing styles. The task was to assess, on 100-mm
long visual analogue scales (VAS), the degree of phonatory pressedness in all styles and modes
of phonation. The ends of the VAS were marked “Extremely breathy” and “Extremely pressed”.
These terms were familiar to the listeners.

Each of the panel members received a copy of an audiotape with the stimuli in the same
randomized order. On the tape, each of the triads occurred twice (4 phonation modes x 3
loudnesses x 2 presentations plus 4 styles x 3 loudnesses x 2 presentations).
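
The stimulus count implied by this design (four phonation modes and four styles, each at three
loudness levels and presented twice) can be checked with a short enumeration. The sketch below
is illustrative only: the labels, the fixed random seed, and the use of Python's
`random.Random.shuffle` are assumptions, since the text does not state how the tape order was
randomized.

```python
import itertools
import random

modes = ["breathy", "flow", "neutral", "pressed"]
styles = ["Classical", "Pop", "Jazz", "Blues"]
loudnesses = ["soft", "middle", "loud"]
presentations = 2

# One entry per (condition, loudness) pattern, repeated for each presentation.
patterns = list(itertools.product(modes + styles, loudnesses))
stimuli = patterns * presentations

# Hypothetical randomization; every listener hears the same shuffled order.
random.Random(0).shuffle(stimuli)

print(len(patterns), len(stimuli))  # 24 48
```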

Table 1. Intra-rater consistency for ratings of phonatory press according to the Pearson correlation
coefficient r. Slope and Itcpt show the slope and intercept of the best linear fit to the data.

Subject #      1       2       3       4       5       6       7       8       9      10
Slope       0.98    1.13    1.00    0.62    0.51    0.91    1.00    0.26    0.98    0.92
Itcpt        2.3   -11.9    -4.2    22.0    22.8     2.6    -4.3    40.6    -4.4    2.68
r          0.974   0.969   0.932   0.669   0.562   0.877   0.961   0.216   0.855   0.926
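
To illustrate how the quantities in Table 1 can be obtained, the sketch below computes the
least-squares slope and intercept and Pearson's r between a rater's first and second ratings of
the same stimuli. The rating values are invented for illustration and are not data from this
study; treating the first presentation as the x variable is also an assumption. The last lines
average the r values listed in Table 1.

```python
from statistics import mean


def consistency(first, second):
    """Return (slope, intercept, r) for second ratings regressed on first."""
    mx, my = mean(first), mean(second)
    sxx = sum((x - mx) ** 2 for x in first)
    syy = sum((y - my) ** 2 for y in second)
    sxy = sum((x - mx) * (y - my) for x, y in zip(first, second))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical VAS ratings (mm) of five stimuli, two presentations each.
first = [10, 30, 50, 70, 90]
second = [14, 28, 55, 66, 92]
slope, intercept, r = consistency(first, second)
print(round(slope, 2), round(intercept, 1), round(r, 3))

# Mean of the ten r values listed in Table 1.
table_r = [0.974, 0.969, 0.932, 0.669, 0.562, 0.877, 0.961, 0.216, 0.855, 0.926]
print(round(mean(table_r), 3))  # 0.794
```

A slope near 1 and an intercept near 0 indicate that a rater reproduced the first ratings on
the second presentation; a low r, as for subject #8, indicates inconsistent ratings.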

Table 1 lists the correlation between the ratings of the first and second presentation
obtained for identical stimuli from each rater. Also shown are the slope and the intercept of
the best linear fit of the data points. The correlation averaged across raters for the
responses to repeated stimuli was 0.794 (SD=0.295). Subject #8 yielded an exceptionally low
correlation. The responses of this subject were therefore discarded.

Figure 1 shows the panel’s mean rated pressedness averaged across the pitches contained in
each triad. For both modes and styles, the mean rated pressedness was lowest for the soft
triads and highest for loud triads except for the pressed mode and the Blues style, where the
differences were small and showed a weak opposite trend. For the modes, the results showed the
lowest degree of rated pressedness for breathy and the highest for pressed, as expected. For
the styles, Classical showed the lowest and Blues the highest degree of pressedness.

[Figure 1: bar charts of mean rated pressedness (mm, 0-100) for Breathy, Flow, Neutral and
Pressed phonation (left) and Classical, Pop, Jazz and Blues styles (right), with separate bars
for Soft, Middle and Loud.]
Figure 1. Expert panel’s mean ratings of pressedness for the various phonation modes (left panel)
and styles (right panel). The bars represent ± 1 standard deviation.

Acoustic analysis
A block diagram of the analysis equipment is shown in Figure 2. The voice source waveform was
derived by inverse filtering the flow signal (Glottal Enterprises, MSIF2). The lowest pitch in
each seventh chord was captured on a transient recorder (Glottal Enterprises, BT-1) and the
frequencies and bandwidths of the two lowest formants were adjusted so as to minimize the
ripple during the closed phase. Then, all pitches of the seventh chord were run through the
inverse filter and stored in a file, using the Soundswell signal workstation (Ternström,
1992). Ps was simultaneously recorded on another channel of the same file.

[Figure 2: block diagram; recoverable block labels: multi channel audio recorder, oscilloscope,
inverse (label truncated), LP filter, PC.]
Figure 2. Block diagram of the equipment used during the analysis.

For each note in the triad a representative waveform was selected from the middle part of the
second syllable, Figure 3. The glottogram characteristics as well as Ps were then measured
using the Soundswell signal workstation. As illustrated in Figure 4, the following flow
glottogram parameters were measured:
- period time between two adjacent discontinuities corresponding to the closing of the
  glottis,
- closed phase, from the end of a flow pulse to the onset of the next flow pulse,
- glottal leakage, defined as the mean flow amplitude during the closed phase,
- peak-to-peak pulse amplitude, defined as the peak value of the pulse minus the glottal
  leakage, and
- air volume contained in the pulse, estimated from a triangle approximation of the pulse.

[Figure 3: oral pressure and flow traces for Blues, mf, C#4 and A3; time axis 0 to 3 s.]
Figure 3. Example of sound file for the pitches C#4 and A3 sung in Blues style. The upper and
lower traces show oral pressure and flow, respectively.

[Figure 4: flow glottogram (a) and differentiated flow glottogram (b); time axis 0 to 0.012 s.]
Figure 4a. Flow glottogram parameters analysed: period time (T), duration of closed phase, peak-
to-peak flow amplitude (P-t-p ampl.), and mean flow during the closed phase (Glottal leakage).
Figure 4b. Differentiated flow glottogram, the negative peak amplitude of which (dU/dt) was

Ps was determined from the [p] preceding the syllable analyzed; as the subject kept vocal
loudness constant during the syllable analyzed, this value of Ps was assumed to be
representative of the entire subsequent syllable.

The glottogram files were differentiated using the Extract module of the Swell package,
Figure 4b. The level difference between the first and second voice source spectrum partials,
henceforth H1-H2, was measured by means of a narrow bandwidth FFT analysis of the flow
glottogram analyzed.

From these data the following parameters were computed:
1. relative estimated glottal area, defined as the ratio between the peak-to-peak pulse
   amplitude and the square root of Ps,
2. closed quotient,
3. glottal compliance, defined as the ratio between the air volume contained in the glottal
   pulse and Ps, and
4. normalized amplitude quotient, defined as the ratio between peak-to-peak pulse amplitude
   and the negative peak amplitude of the differentiated flow glottogram divided by the length
   of the fundamental period.

Thus, one physiological measure was studied, Ps, and five measures related to glottal
adduction: relative estimated glottal area, H1-H2, closed quotient (Qclosed), glottal
compliance, and normalized amplitude quotient.

Figure 5 shows the relationships between the five adduction measures and the mean rated degree
of phonatory pressedness. Of these, the relative estimated glottal area showed the lowest
correlation. Among the other measures, H1-H2
showed the lowest (R2 = 0.660) while Qclosed, glottal compliance and NAQ showed the highest
(R2 = 0.718, R2 = 0.755, and R2 = 0.725, respectively). These results suggest that most of the
variation in perceived phonatory pressedness can be accounted for in terms of the latter three
measures. Of these, NAQ seems particularly interesting, being a robust measure that can be
derived automatically from high fidelity audio recordings (Alku et al., 2002). In addition, it
is closely related to perceived pressedness, accounting for 73% of the variation in perceived
pressedness.

[Figure 5: scatter plots of each adduction measure against mean rated pressedness (mm, 0-100),
with best linear fits:
  Relative estimated glottal area: y = 0.094 - 0.0003x, R2 = 0.1198
  H1-H2:                           y = 18.9 - 0.104x,   R2 = 0.6602
  Qclosed:                         y = 0.4587x - 5.2,   R2 = 0.7179
  Glottal compliance:              y = 0.075 - 0.0005x, R2 = 0.7545
  NAQ:                             y = 0.245 - 0.0014x, R2 = 0.7253]
Figure 5. Relationship between the five adduction measures and the mean rated degree of
phonatory pressedness. The best linear fits of the data points (lines) are shown by the
equations and the associated R2 values for the correlation.

According to these results, the mean NAQ can be regarded as a reasonable approximation of the
degree of phonatory pressedness. It is then interesting to specify the phonatory
characteristics of the styles and phonation modes in terms of mean degree of pressedness and
mean loudness. This can be realized in terms of a phonation map, showing mean NAQ versus mean
Ps. Figure 6 shows the results in terms of such a phonation map. Breathy and pressed were
located at opposite extremes, both with respect to pressedness and Ps. Neutral and flow share
mean Ps with breathy, but the mean NAQ is lower in neutral. With regard to the styles,
Classical is close to breathy, and Blues close to pressed, although the latter had a lower NAQ
value. Pop and Jazz have similar mean Ps, but Jazz has a considerably lower mean NAQ, midway
between neutral and pressed, while the NAQ for Pop was close to that of flow phonation.

[Figure 6: phonation map; horizontal axis mean subglottal pressure (cm H2O, ticks 5 to 15),
vertical axis mean NAQ (ticks 0.10 to 0.16); points for the styles and phonation modes.]
Figure 6. Phonation map showing Ps versus NAQ, both averaged across pitches and degrees of
vocal loudness, for the various styles and phonation modes.

Discussion
Normalized amplitude quotient is closely related to other flow glottogram measures. For
example, the peak-to-peak pulse amplitude has been found to be strongly correlated with the
amplitude of the voice source fundamental (Gauffin & Sundberg, 1989; Fant, 1997). Likewise, Ps
has been found to strongly correlate with the negative peak amplitude of the differentiated
flow glottogram (Sundberg et al., 1999a). This implies that basically NAQ should reflect the
relationship between the amplitude of the fundamental and Ps. On the other hand, those
relationships have been documented only for a limited variation of phonatory modes in previous
investigations. Therefore, the present material would represent a more valid test of the
relationship between perceived degree of pressedness and the various glottogram
characteristics.

The phonation map is an attempt to describe, in a compact form, main phonatory characteristics
of modes of phonation and singing styles. It has the advantage of taking into account two main
physiological variables for phonatory control, Ps and the degree of glottal adduction.
Co-authors JS and MT recently presented a similar map, where, however, the vertical axis
represented a pressedness factor, derived as a weighted sum of Qclosed, glottal compliance and
H1-H2. This measure showed a correlation of R2=0.876 with the mean pressedness ratings, while
the NAQ correlation was R2=0.725. On the other hand, the weights for the three terms in the
pressedness factor were optimized with respect to the correlation with the pressedness
ratings. Therefore, we believe that the NAQ measure is more useful than the pressedness
factor.

The locations of the phonation modes and the styles of singing in the phonation map appeared,
by and large, logical. Blues was found to be close to pressed, however slightly less pressed.
This difference may be important from the point of view of vocal health. The small difference
between breathy and Classical seems more problematic, since breathy phonation must be produced
with less adduction than Classical.

The present investigation was based on data from one singer subject. As the results are
promising, it would be worthwhile to apply a similar research strategy on some more voices.

Conclusions
The normalised amplitude quotient, available by automatic analysis of the audio signal, shows
a high correlation with perceived degree of phonatory pressedness. It seems useful for a
compact description of phonatory characteristics of different modes of phonation and styles of
singing. Such a description can be given the form of a phonation map, showing mean normalised
amplitude quotient versus mean Ps.

Acknowledgement
Co-author MT’s participation in this investigation was supported by a grant within the
Research and Development in the Arts program at the SMI (University College of Music Education
in Stockholm). The inverse filtering and the glottogram measurements were made by Jenny
Iwarsson.

References
Fant G (1997). The voice source in connected speech, Speech Comm 22: 125-139.
Gauffin J & Sundberg J (1989). Spectral correlates of glottal voice source waveform
characteristics, J Speech Hear Res 32: 556-565.
Stone RE (Ed), Cleveland TF, Sundberg J & Prokop J (submitted). Aerodynamic and acoustical
measures of speech, operatic, and Broadway vocal styles in a professional female singer.
Sundberg J & Thalén M (2001). Describing different styles of singing. A comparison of a female
singer’s voice source in ‘Classical’, ‘Pop’, ‘Jazz’ and ‘Blues’, Log Phon Vocol 26: 82-93.
Sundberg J, Andersson M & Hultqvist C (1999a). Effects of subglottal pressure variation on
professional baritone singers’ voice sources, J Acoust Soc Am 105: 1965-1971.
Sundberg J, Cleveland T, Stone RE & Iwarsson J
