Contributions of vocal tract shape to voice quality

Article  in  The Journal of the Acoustical Society of America · September 1998

DOI: 10.1121/1.423589

The relationship of vocal tract shape to three voice qualities
Brad H. Story
Department of Speech and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071

Ingo R. Titze
National Center for Voice and Speech, Wilbur James Gould Voice Research Center, Denver Center for the
Performing Arts, Denver, Colorado 80204 and Department of Speech Pathology and Audiology,
University of Iowa, Iowa City, Iowa 52242

Eric A. Hoffman
Division of Physiologic Imaging, Department of Radiology, University of Iowa College of Medicine,
Iowa City, Iowa 52242

(Received 5 May 2000; accepted for publication 3 January 2001)

Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, æ, ɑ, u] have been obtained from one male and one female speaker using magnetic resonance imaging (MRI). The two speakers were trained vocal performers and both were adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, each with one of the three voice qualities: normal, yawny, and twangy. The purpose of the study was to determine some ways in which the vocal tract shape can be manipulated to alter voice quality while retaining a desired phonetic quality. To summarize any overall tract shaping tendencies, mean area functions were subsequently computed across the four vowels produced within each specific voice quality. Relative to normal speech, both the vowel area functions and mean area functions showed, in general, that the oral cavity is widened and tract length increased for the yawny productions. The twangy vowels were characterized by shortened tract length, widened lip opening, and a slightly constricted oral cavity. The resulting acoustic characteristics of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and far apart for all the twangy vowels. © 2001 Acoustical Society of America.
[DOI: 10.1121/1.1352085]
PACS numbers: 43.70.Fq [AL]

I. INTRODUCTION

Voice quality is a broad term that refers to the extralinguistic aspects of a speaker's voice with regard to identity, personality, health, and emotional state. Quoting Abercrombie (1967, p. 91), voice quality is “those characteristics which are present more or less all the time that a person is talking.” A broad description of voice quality would include features contributed by all the subsystems of speech production; i.e., respiratory, phonatory, and articulatory systems.

Some components of voice quality arise from individualized organic considerations of the speech organs. Vocal fold mass, vocal tract length, tracheal length, jaw and tongue size, and nasal cavity volume all fall into this category and may indicate information about age, sex, physique, and health. Other aspects of voice quality are brought forth from the way in which a speaker habitually uses the vocal organs for speaking. These may include socio-linguistic qualities acquired from the influence of a particular speaking community (i.e., regional dialect, accent, familial tendencies, etc.), emotional and psychological effects, or other purely idiosyncratic speech patterns. Laver (1980) outlined a formal system to describe this second category of voice qualities based on a concept of “settings” of the speech organs. These so-called “settings” represent habitual muscle tensions throughout the speech production system that impose a specific pattern of use during speech and consequently a specific voice quality.

With the focus of this paper limited to the contributions of the vocal tract to voice quality, it is noted that Laver (1980) proposed that long-term “settings” of the vocal tract bias the resulting formant structure toward a particular type of global timbre. He defined two categories of vocal tract settings as longitudinal and latitudinal. Longitudinal settings describe the state of the long axis of the vocal tract such as larynx height and protrusion/retraction of the lips. The latitudinal settings are “quasi-permanent tendencies to maintain a particular constrictive (or expansive) effect” within some region located along the length of the vocal tract (Laver, 1980, p. 35). Included here would be labial, lingual, faucal, pharyngeal and mandibular settings. In a similar vein, Estill et al. (1996) have proposed a system for voice quality control that includes a set of six elements describing the state of the vocal tract: (1) soft palate control; (2) anchoring (use of large muscles in the head, neck and torso to facilitate control of smaller muscles in the larynx); (3) pharyngeal width; (4) pharyngeal length; (5) tongue control; and (6) aryepiglottic control (tightening of the aryepiglottic sphincter). Selection of the “position” of each element along with a level of effort can produce a wide variety of voice qualities. Regardless of the system used, the necessity to produce appropriate phonetic sounds for intelligible speech means that vocal tract settings or positions cannot be regarded as being rigidly imposed on the vocal tract at every instant in time, but will exert an influence on the tract shape whenever conditions allow. Hence, a long-term “quality” is imposed on the speech signal.

1651 J. Acoust. Soc. Am. 109 (4), April 2001 0001-4966/2001/109(4)/1651/17/$18.00 © 2001 Acoustical Society of America 1651
Studies of the vocal tract are most often concerned with understanding and reproducing articulatory configurations that generate appropriate phonetic sounds (e.g., Stevens and House, 1955; Fant, 1960; Baer et al., 1991; Narayanan et al., 1995; Story et al., 1996, 1998) or investigating gender and age differences of the vocal tract shape (e.g., Goldstein, 1980; Yang and Kasuya, 1994; Fitch and Giedd, 1999). The obvious importance of phonetic structure to transmission of the linguistic message makes this focus quite understandable. However, voice quality can also have a significant effect on the linguistic message, as was exemplified in the classic study of Ladefoged and Broadbent (1957), who found that a listener's identification of a test word was greatly influenced by the voice quality of a phrase preceding the test word. Thus voice quality altered the phonetic identification of vowels. This was confirmation of Joos's (1948) proposal that the relationship between formant frequencies of a particular vowel and those present in other words spoken by the same speaker will determine phonetic quality rather than absolute formant frequencies. Laver (1980) and Traunmüller (1994) have both urged that increased attention be given to voice quality in studies of speech communication because it plays a vital role in communicating information to a listener that may be highly relevant to the message. Speaker-specific differences may account for much of the acoustic variability encountered in speech signals.

This paper is concerned with measurement and acoustic modeling of the vocal tract shape (specifically vowel area functions) with regard to variations in voice quality. Specifically, vocal tract shapes produced by two professional vocal performers under the articulatory conditions for a yawny and twangy voice quality are compared to their normal speaking vocal tract. A yawny quality was speaking as if initiating a yawn while the twangy quality is best described as that often used by Country and Western singers as well as the voice of former United States presidential candidate Ross Perot (Estill et al., 1996).

Magnetic Resonance Imaging (MRI) was used to gather volumetric image sets of four vowel shapes ([i, æ, ɑ, u]) under the normal, yawny, and twangy conditions. The image sets were subsequently processed to yield three-dimensional reconstructions of the vocal tract and finally area functions. It must be emphasized that the use of vocal performers limits the scope of this study to the performers' own interpretation of these different voice qualities. Thus the results are not necessarily characteristic of normal, yawny, and twangy voice qualities in general. In addition, there is nothing inherently special about the yawny and twangy voice qualities, but they were chosen for comparison to the normal because they were hypothesized to have significantly enlarged and reduced cavity volumes, respectively, relative to the normal tract shape. Additionally, these qualities are “global” in the sense that they represent potential changes along the entire vocal tract rather than distinct local modifications such as Laver's (1980) latitudinal settings (e.g., palatalized, pharyngealized, etc.). With a limited amount of scanning time available for this study, large and global modifications were desired.

The first aim of the paper is to report twelve area functions (four vowels, three voice qualities) obtained from each of the two subjects (one male, one female). In addition, formant frequencies determined from acoustic recordings of the subjects and those computed from the area functions are compared. A method is also described that guides the modification of the area functions so their computed formants match those extracted from recorded speech. The second aim is to investigate the mean area function across the four vowels for each quality in an attempt to understand the possible vocal tract “settings” employed to produce each quality.

A. Image collection and analysis

The methods for image collection and analysis are identical to those presented in the authors' previous publications (Story et al., 1996, 1998) and will not be repeated here. Only specific information regarding the subjects and the protocol will be given.

Volumetric imaging (using MRI) of the vocal tract was used to collect 12 vocal tract shapes from one male and one female subject. These consisted of the four vowels [i, æ, ɑ, u] produced with three distinct voice qualities: normal, yawny, and twangy. Electron beam computed tomography (EBCT) was also used to collect one image set of the vowel [ɑ] for each subject. For the present study, this image set was used only to estimate the dimensions of the teeth in order to make a correction to the MR images during analysis (because of the low amount of hydrogen in the teeth they are effectively imaged as airspace by MRI).

At the time of scanning, the male subject (M1) was 37 years old with no history of speech or voice disorders and is native to the state of New York (Rochester). He was 6 ft. 1 in. tall and weighed approximately 190 pounds. The female subject (W1) was 42 years old and also had no history of speech or voice disorders. She was 5 ft. 4 in. tall, weighed approximately 110 pounds, and is native to the state of Colorado. Both subjects have had extensive training in the vocal arts and both are professional performers. M1 holds the degrees of Bachelor of Fine Arts (BFA), Master of Music (MM), and Doctor of Musical Arts (DMA) in vocal music and has been active in teaching voice for many years. W1 holds a Bachelor of Arts (BA) in vocal music and has taught voice for 20 years. More importantly, both subjects demonstrated an ease of producing a wide variety of vocal qualities.

Prior to the imaging sessions, each subject participated in two practice/training sessions in which they lay supine on a comfortable cushion and practiced sustaining each vowel spoken with each voice quality; concentration on maintaining a steady vocal tract shape was emphasized. The subjects were, for the most part, allowed to self-interpret each voice quality. However, during the training session some description of each quality was given. Normal speech was simply a normal speaking quality, yawny quality was speaking as if yawning, and twangy was described with examples of Country and Western singers and also the voice of former U.S. presidential candidate Ross Perot (Estill et al., 1996). The subjects were allowed to phonate at a comfortable pitch of their choice. M1 maintained the same pitch (B♭2 = 116 Hz) for all vowel productions and within each voice quality. W1's level of comfort with each voice quality was pitch dependent. She chose to use F♯3 = 185 Hz for normal, B2 = 175 Hz for yawny, and B♭3 = 233 Hz for twangy. By the end of the second practice session each speaker was adept at reproducing the three voice qualities. During image collection typical modal phonation was used for all vowel productions and all productions were verified by the experimenters (two of whom are speech scientists) to be representative of both a given vowel and voice quality.

The MR images were acquired using a General Electric Signa 1.5 Tesla scanner at the University of Iowa Hospitals and Clinics. The image acquisition mode and pulse sequence parameters were identical to those used in Story et al. (1996, 1998). A 24–28 slice series of 5 mm thick contiguous, parallel, axial sections extended from just superior to the hard palate down to about the first tracheal ring. The field of view (slice dimensions) for each slice was 24 cm × 24 cm which, with a pixel matrix of 256 × 256, gives a pixel dimension of 0.938 mm/pixel. Acquisition of the full volume of slices for a given vowel required approximately 5 min of actual scanning time (about 30 repetitions of the vowel). However, with pauses for respiration the total acquisition time was on the order of 10–15 min.

B. Image analysis

The image analysis proceeded with an airway segmentation technique followed by shape-based interpolation to generate a 3D reconstruction of each vocal tract shape (Story et al., 1996, 1998). As an example, the 3D reconstruction of M1's normal [u] is shown as a sagittal projection in Fig. 1(a). The point indicated as (0,0) is located just above the glottis and represents the inlet to the vocal tract. Cross-sectional areas between this point and the lip termination were determined by first finding the centerline through the 3D reconstruction with an iterative bisection algorithm (see Story et al., 1996, p. 542). A sagittal projection (2D) of the centerline is shown in Fig. 1(b); note that the origin of the centerline corresponds to the (0,0) point in Fig. 1(a). Next, areas were measured from oblique sections calculated to be locally perpendicular to the centerline. The collection of these areas extending from just above the glottis to the lips comprises the area function. Each area function was subsequently resampled with a cubic spline to contain 44 area sections. The distance $\Delta l$ between each cross-sectional area was dependent on the measured total length ($L_t$) of each vocal tract shape (i.e., $\Delta l = L_t/44$). A smoothing filter was also applied to each area function to remove small discontinuities assumed to be imaging artifacts. The filter function was implemented as

$$a_f(i) = 0.029\,a_o(i) + 0.471\,a_o(i-1) + 0.471\,a_o(i-2) + 0.029\,a_o(i-3), \tag{1}$$

where $a_o$ and $a_f$ are the original and filtered areas, respectively, and $i$ is the section number (section 1 is just above the glottis and section 44 is at the lips). Figure 1(c) shows the 44-section filtered area function, again for M1's normal [u], as a concatenated series of discrete area sections. Henceforth, however, all area functions will be shown with a smooth curve so that figures don't become unnecessarily complicated.

FIG. 1. Demonstration of determining the area function from a 3D reconstruction of the vocal tract for M1's vowel [u], (a) 3D reconstruction where the (0,0) point is located just above the glottis and represents the first element of the area function, (b) sagittal projection of the vocal tract centerline determined for the 3D reconstruction, and (c) the area function shown as a collection of concatenated

The piriform sinuses were not well represented in many of the image sets for the two subjects and hence no attempt was made to measure their cross-sectional areas. As is known from previous studies, the presence of the piriform sinuses can have an effect on the formant locations (Dang and Honda, 1997; Story et al., 1998) and the absence of them in the present study is noted as a limitation.

While the area function is most often used for quantitatively representing a vocal tract shape, it is often difficult to put this view into the anatomical perspective of a real speaker. In the authors' previous publications a sagittal projection of each 3D vocal tract reconstruction [like the one in Fig. 1(a)] was shown along with the area function. However, since this paper is of a comparative nature (i.e., comparing different voice qualities) a simpler representation seemed more useful. Toward this end, each area function is also presented in Sec. III as a “pseudo-midsagittal” projection. This was done by plotting each element of an area function as a line of equivalent diameter drawn perpendicular to the vocal tract centerline. Figure 2(a) demonstrates this process for M1's normal [u] vowel, where the curved dotted line is the sagittal (x-y) projection of the centerline (determined with the previously described iterative bisection algorithm). The thick solid lines are the equivalent diameters of successive area elements from glottis to lips. The endpoints of each equivalent diameter line are then connected to form the inner and outer profiles of the tract shape [Fig. 2(b)]. For presentation purposes in Sec. III only these profiles will be shown as demonstrated in Fig. 2(c).
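The smoothing step of Eq. (1) and the two small arithmetic relations in this section (pixel size from field of view, and section length from total tract length) can be illustrated with a minimal sketch. The function names are hypothetical, and the handling of the first three sections is an assumption: the text does not say how sections before the glottal end are treated, so the sketch simply repeats the first area.

```python
def smooth_area_function(areas):
    """Apply the four-tap filter of Eq. (1) to a list of cross-sectional areas.

    Assumption (not stated in the text): indices before the first section
    are clamped to the first section, so the output keeps the input length.
    """
    # Weights for a_o(i), a_o(i-1), a_o(i-2), a_o(i-3), as given in Eq. (1).
    coeffs = (0.029, 0.471, 0.471, 0.029)
    smoothed = []
    for i in range(len(areas)):
        total = 0.0
        for lag, c in enumerate(coeffs):
            j = max(i - lag, 0)  # clamp at the glottal end (assumed edge rule)
            total += c * areas[j]
        smoothed.append(total)
    return smoothed


def section_length_cm(total_length_cm, n_sections=44):
    """Section length, Delta l = L_t / 44 in the text."""
    return total_length_cm / n_sections


def pixel_size_mm(field_of_view_cm, matrix_size):
    """In-plane pixel dimension: field of view divided by matrix size."""
    return field_of_view_cm * 10.0 / matrix_size


if __name__ == "__main__":
    # Hypothetical area function (cm^2) with one isolated spike.
    raw = [1.0] * 10
    raw[5] = 3.0
    print(smooth_area_function(raw))
    print(pixel_size_mm(24.0, 256))  # 24 cm field of view, 256 x 256 matrix
```

Because the four weights sum to one, a constant area function passes through unchanged, while an isolated spike is spread over the following sections. Note that the filter as written looks only at the current and three preceding sections, so it also shifts features slightly toward the lips.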

FIG. 2. Demonstration of creating a “pseudo-midsagittal” representation of an area function (M1's [u]), (a) vocal tract centerline (dotted line) and equivalent diameters of cross-sectional areas plotted perpendicular to the centerline (solid lines), (b) same as (a) except consecutive endpoints of each equivalent diameter have been connected with solid lines, and (c) same as (b) but with the equivalent diameter lines and the centerline removed.
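The pseudo-midsagittal construction can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: "equivalent diameter" is taken to mean the diameter of a circle with the same area, the local centerline direction is estimated by a finite difference between neighbouring points, and the function names are hypothetical.

```python
import math


def equivalent_diameter(area):
    """Diameter of a circle with the given area (assumed definition)."""
    return 2.0 * math.sqrt(area / math.pi)


def profile_points(centerline, areas):
    """Return the two profile curves of a pseudo-midsagittal plot.

    centerline: list of (x, y) points along the sagittal centerline projection
    areas:      cross-sectional area at each centerline point (same length)

    Each area is drawn as a line of equivalent diameter perpendicular to the
    centerline; the endpoints on each side are collected into two curves.
    """
    side_a, side_b = [], []
    n = len(centerline)
    for i, ((x, y), area) in enumerate(zip(centerline, areas)):
        # Tangent from neighbouring points (one-sided at the two ends).
        x0, y0 = centerline[max(i - 1, 0)]
        x1, y1 = centerline[min(i + 1, n - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty)
        # Unit normal: the tangent rotated by 90 degrees.
        nx, ny = -ty / norm, tx / norm
        r = equivalent_diameter(area) / 2.0
        side_a.append((x + r * nx, y + r * ny))
        side_b.append((x - r * nx, y - r * ny))
    return side_a, side_b


if __name__ == "__main__":
    # Hypothetical straight vertical centerline with uniform areas (cm, cm^2).
    line = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
    print(profile_points(line, [math.pi] * 3))
```

Connecting the points in each returned list corresponds to the step from Fig. 2(a) to Fig. 2(b). The sketch needs at least two distinct centerline points so that a tangent direction exists.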

C. Acoustic considerations

Audio recordings of each speaker were not made during the scanning procedure but later during a separate recording session. Some researchers have successfully recorded acoustic signals during image acquisition with custom microphones (Baer et al., 1991). However, in general the MR examination room is magnetically and acoustically hostile to audio recording systems, so recording was not attempted for this study. It is recognized that this is a limitation in comparing the acoustic characteristics of the vocal tract area functions with recordings of natural speech. But one might also argue that since the image acquisition requires on the order of 30 or more repetitions of each vowel (Story et al., 1998), there is no short segment of acoustic signal that would truly correspond to the measured vocal tract shape.

The audio recording was acquired in a fully anechoic chamber at the University of Iowa. The subjects lay supine and wore ear plugs to partially simulate the conditions in the MR scanner. An AKG-C410 head-mounted microphone was positioned 4 cm from the speaker at 45 deg off-axis of the mouth. The speakers sustained each vowel within the three voice qualities for approximately 5 s. The microphone signal was recorded onto digital audio tape (DAT) and later transferred to digital audio files (44.1 kHz sampling frequency and 16 bits resolution) via a Signalogic SIG31 data acquisition board installed in a computer workstation.

To find the first three formants of each vowel production, a 46 coefficient LPC algorithm (auto-correlation method) was first used to estimate the frequency response function. Then a peak-picking technique with parabolic interpolation (Titze et al., 1987) determined the formant locations. A window size of 25 ms was used and the reported formants are the mean values over approximately 4 s of the vowel production.

For each area function, a corresponding frequency-response function was computed using a transmission-line type model (Sondhi and Schroeter, 1987). Losses due to yielding walls, viscosity, heat conduction, and radiation were included in the calculations. In all cases, for both speakers, the computed frequency response functions showed formant peaks that were at least slightly different than those determined from recorded speech; in some cases the differences were large. Since the image collection and audio recording sessions were conducted at different times and places and under different conditions, some discrepancy between computed formants and natural formants is expected. However, it is of interest to know what differences may have existed between the imaged vocal tract shape and the shape the speaker used during the audio recording. To do this, modifications were made to each area function so that the computed formant locations matched reasonably well with the formant values determined from the recorded speech.

Beautemps et al. (1995) have reported an optimization method for adjusting vocal tract area functions so that the error between the computed and measured formants is minimized. In the present study, a more interactive approach was taken. Area function modifications were carried out manually but were guided by the use of acoustic sensitivity functions calculated for F1, F2, and F3. The sensitivity of a particular formant is defined as the difference between the kinetic energy (KE) and potential energy (PE) divided by the total energy in the system (Fant and Pauli, 1975)

$$S_n(i) = \frac{KE_n(i) - PE_n(i)}{TE_n}, \qquad n = 1, 2, 3 \ \text{and}\ i = 1, \ldots, 44, \tag{2}$$

where $i$ is the section number (section 1 is just above the glottis and section 44 is at the lips), $n$ is the formant number and $TE_n = \sum_{i=1}^{44} [KE_n(i) + PE_n(i)]$. The kinetic and potential energies for each formant frequency are based on the pressures $P_n(i)$ and flows $U_n(i)$ computed for each section of an area function with the transmission-line type model mentioned previously. They are calculated as

$$KE_n(i) = \frac{1}{2}\,\frac{\rho\, l(i)}{a(i)}\,|U_n(i)|^2, \tag{3}$$

and

$$PE_n(i) = \frac{1}{2}\,\frac{a(i)\, l(i)}{\rho c^2}\,|P_n(i)|^2, \tag{4}$$

where $a(i)$ and $l(i)$ are the cross-sectional area and length of section $i$ within an area function, respectively, $\rho$ is the density of air, and $c$ is the speed of sound.

The sensitivity function can then be used to compute the change in a particular formant frequency ($F_n$) due to perturbation of the area function ($\Delta a$) with the relation,

$$\frac{\Delta F_n}{F_n} = \sum_{i=1}^{44} S_n(i)\,\frac{\Delta a(i)}{a(i)}. \tag{5}$$

This equation says that if the sensitivity function is positively valued and the area perturbation is also positive (area is increased) the change in formant frequency will be upward (positive). If the area change is negative (area decreased) the formant frequency will decrease. When the sensitivity function is negatively valued, the opposite effect occurs for positive or negative area perturbations. Because the sensitivity function is strictly valid for only small area perturbations it is difficult to efficiently use Eq. (5) in a quantitative fashion. Instead the sensitivity functions are used as a qualitative guide to making manual vocal tract shape changes.

As an example of using sensitivity functions to perform an area function modification, Fig. 3(a) shows original (solid line) and modified (dashed line) versions of M1's normal [i] area function. The corresponding frequency response functions are given in Fig. 3(b) where the three vertical lines indicate the formant frequencies obtained from LPC analysis of the original audio recording. This figure shows that the area function modification has indeed moved the first three formants to locations nearly coincident with those from the recorded speech.

The necessary modification was determined by first noting that for the computed formant peaks to coincide with the formant locations of the recorded speech, F1 and F3 must both increase while a decrease is required of F2; however, only a modest increase in F3 is needed. Using the calculated sensitivity functions (for F1, F2, and F3) shown in Fig. 3(c) as a guide, a region extending from about 10.5 cm to 15.3 cm (distance from glottis) is observed in which $S_1$ remains positive and $S_2$ remains negative. This means that an area expansion in this specific region will increase F1 and decrease F2, as desired. Also within this length region, $S_3$ is both positively and negatively valued so that an increase (or decrease) in area should have little effect; i.e., the effect on this formant would be largely cancelled if the area modification spans equivalent positive and negative portions of the sensitivity function. The goal is to expand the area in this region such that F3 is moved only slightly upward but F1 and F2 are moved more significantly.

The original area function, when multiplied by the two functions shown in Fig. 3(d), does move F1, F2, and F3 in the desired directions. Trial and error was used to create these functions with the equation,

$$m_k(i) = 1 + q_k\, e^{-18\left(\frac{i - l_k}{w_k}\right)^2}, \tag{6}$$

where $q_k$ is an amplitude scaling factor, $l_k$ is the location of the center of the modification (specified as distance from the glottis), $w_k$ is the length of the region over which the modification will have an effect, and $i$ is again the section number. (The “18” in the exponential is a scaling factor that allows $w_k$ to be specified as a region length.) A modified area function can then be specified as

$$a_m(i) = a(i) \prod_{k=1}^{K} m_k(i), \tag{7}$$

where $K$ is the required number of modification functions. In the case shown in Fig. 3, $K = 2$. For the first modification function $m_1$ [Fig. 3(d), solid line], the parameters were set to be $q_1 = 2.4$, $l_1 = 12.2$ cm, and $w_1 = 5.2$ cm; for the second function $m_2$ they were set to $q_2 = 0.8$, $l_2 = 12.4$ cm, and $w_2 = 3.7$ cm.

FIG. 3. Demonstration of using acoustic sensitivity functions to guide a modification of M1's [i] area function such that the first three formants coincide with those measured from audio recordings of the speaker's [i] vowel: (a) original (solid) and modified (dashed) [i] area function, (b) frequency response functions calculated for the original (solid) and modified (dashed) [i] area function (solid vertical lines represent formant frequencies determined from the speaker's audio recording), (c) sensitivity functions of F1, F2, and F3 for the original [i] area function, and (d) modification functions which, when multiplied by the original area function in (a), will produce the modified area function in (a).

The resulting modified area function [Fig. 3(a)] now provides a means for understanding the discrepancy between the formant locations determined from the recorded speech versus those computed for the measured area function. The expansion that was needed in the highly constricted portion of the [i] vowel suggests two possibilities: (1) the speaker produced the vowel in the scanner differently (i.e., with a tighter constriction) than in the recording session, or (2) the area in this region was underestimated because of imaging in only the axial plane [the slice thickness may have been comparable to the size of the airspace; see Story et al. (1998) pp. 475–476 for a discussion of this problem]. More likely, the discrepancy comes about as a combination of both performance differences and imaging artifacts. As will be seen in a later section, some of the modifications seem to be obvious performance differences while others lend themselves more to an explanation of imaging artifact. The value of showing each modification is that it indicates the possible magnitude of change that may have occurred between the imaging session and the recording session.

FIG. 4. Area functions, pseudo-midsagittal profiles, and formant spectra for M1's [i] vowel spoken in the normal, yawny, and twangy qualities. In each of the nine graphs the solid lines correspond to the original area function and the dashed lines to the modified area function (based on the method in Fig. 3). In each pseudo-midsagittal profile, the point (0,0) is located just above the glottis and the open end represents the lip termination. In addition, the horizontal axis represents the posterior-to-anterior dimension (P-A) while the vertical axis represents the inferior-to-superior dimension (I-S). The formant spectra plots include frequency response functions computed for the original (solid) and modified (dashed) area functions as well as the three vertical lines that indicate formant frequencies obtained from LPC analysis of the original audio recordings. This figure is shown in tabular form with each row representing one specific voice quality and each column giving a particular representation of the vowel (i.e., area function, pseudo-midsagittal profile, and formant spectra).

It must be clarified, however, that it is not possible to know that the modified area functions derived by this method are actually the true vocal tract shapes used by the speaker. In fact, Mermelstein (1966) showed that three formants (the number of formants used in the present analysis) do not provide enough information to uniquely specify a vocal tract area function. But the modifications do provide, at the least, a realistic possibility of how the tract may have been shaped during the recording of a particular sample. Note that only modifications in the form of area expansions or contractions have been used; it is also recognized that length modifications could move formant locations significantly.
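The sensitivity function, the first-order formant shift, and the bell-shaped modification functions described in this section can be put together in a short sketch. The function names are hypothetical, and one assumption goes beyond the text: because the center and width of each modification are given in centimeters, the sketch evaluates the modification function at each section's distance from the glottis rather than at the raw section number. The tract length of 17.6 cm in the usage example is invented for illustration; only the two parameter sets (2.4, 12.2 cm, 5.2 cm) and (0.8, 12.4 cm, 3.7 cm) come from the text.

```python
import math


def sensitivity(ke, pe):
    """Sensitivity per section for one formant: (KE - PE) / total energy."""
    total = sum(k + p for k, p in zip(ke, pe))
    return [(k - p) / total for k, p in zip(ke, pe)]


def relative_formant_shift(sens, areas, delta_areas):
    """First-order estimate of (Delta F / F): sum of S(i) * Delta a(i) / a(i)."""
    return sum(s * da / a for s, a, da in zip(sens, areas, delta_areas))


def modification(x_cm, q, center_cm, width_cm):
    """Bell-shaped multiplier 1 + q * exp(-18 * ((x - center) / width)^2).

    Assumption: x is the distance from the glottis in cm, so that it has the
    same units as the center and width quoted in the text.
    """
    return 1.0 + q * math.exp(-18.0 * ((x_cm - center_cm) / width_cm) ** 2)


def modify_area_function(areas, positions_cm, params):
    """Multiply each area by the product of all modification functions."""
    modified = []
    for a, x in zip(areas, positions_cm):
        m = 1.0
        for q, center, width in params:
            m *= modification(x, q, center, width)
        modified.append(a * m)
    return modified


if __name__ == "__main__":
    n = 44
    dl = 17.6 / n                                  # hypothetical tract length
    positions = [(i + 1) * dl for i in range(n)]   # assumed section positions
    areas = [1.0] * n                              # hypothetical uniform tube
    params = [(2.4, 12.2, 5.2), (0.8, 12.4, 3.7)]  # values quoted in the text
    print(modify_area_function(areas, positions, params))
```

With the factor 18 in the exponent, the multiplier has decayed to within about 1% of `q` above unity at a distance of half the width from the center, which is one way to read the remark that the width can be specified as a region length. The shift estimate is only meaningful for small perturbations, as the text notes, which is why the modifications were tuned by hand rather than solved for directly.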

FIG. 5. Area functions, pseudo-midsagittal profiles, and formant spectra for M1's [æ] vowel spoken in the normal, yawny, and twangy qualities. The arrangement and description of this figure is identical to Fig. 4.

III. RESULTS AND DISCUSSION

Figures 4–7 and Figs. 9–12 show the area functions, pseudo-midsagittal profiles, and formant spectra for each of the two speakers' four vowels and three voice qualities. Each figure contains information corresponding to only one vowel and is presented in tabular form with the three qualities (normal, yawn, and twang) listed vertically and the type of representation listed horizontally. Thus figure labels of (a), (b), etc. have not been used. In all figures the original measurements are shown with solid lines while modifications are shown with dashed lines. In each plot containing formant spectra, the three vertical lines indicate the formant frequencies obtained from LPC analysis of the original recordings. Tables I and II present total vocal tract length and volume for the male and female speaker, respectively. The vowels produced with a given voice quality are referred to in each figure with their appropriate IPA symbols, but augmented with the subscript n, y, or t to denote normal, yawn, or twang, respectively (e.g., the yawny a would be shown as a_y).

A. Male speaker (M1)

1. Vocal tract shape

The three voice quality variations of each vowel for speaker M1 are shown in Figs. 4–7. The first point to note about these figures concerns the "accuracy" of the area functions as assessed by the size of the modifications required to align the computed formants with those extracted from real speech (shown as three vertical lines in the formant spectra plots). In most cases, the area functions were altered only slightly, often requiring just a mild expansion of a constricted region. This type of error is likely due to an imaging artifact and subsequent underestimation of cross-sectional area. Of course, the possibility does exist that, during image collection, the subject did actually produce the vowels with a more tightly constricted portion of the vocal tract, potentially as a means of creating a highly intelligible, but extreme, version of the vowel.

In a few cases, synergistic expansions and contractions between front and back regions were necessary. The normal [æ_n] and [u_n] (Figs. 5 and 7) fall most distinctly into this category where, in each case, the front cavity needed to be expanded and the back cavity constricted. For this type of modification, the error is most likely due to performance differences between the scanning session and the audio recording session; error due to image artifact is more likely to be systematic (e.g., always underestimating or always overestimating cross-sectional area). Modifications to both vowels moved F2 and F3 down in frequency and F1 up in frequency.

In three of the yawn quality vowels, [i_y], [æ_y], and [ɑ_y] (see Figs. 4–6), a reduction of the lip aperture area was needed in order to move the formant peaks to the appropriate locations. While the speaker may have actually reduced the lip area during production of these vowels, it is also possible that the speaker lengthened the tract during the audio recording. Lip area contraction achieves formant changes similar to those of tract lengthening.

FIG. 6. Area functions, pseudo-midsagittal profiles, and formant spectra for M1's [ɑ] vowel spoken in the normal, yawny, and twangy qualities. The arrangement and description of this figure is identical to Fig. 4.

Also of note is that the male speaker produced the [i_y] (Fig. 4) vowel with the tongue tip in contact with the hard palate, creating two lateral air flow paths at this point instead of a single conduit as is typical of vowels. Thus the production was similar to an [l] except the contact point was several centimeters posterior, with the lateral airways beginning at 11.2 cm from the glottis and rejoining as a single tube at 15.3 cm from the glottis. This portion of the area function is represented as the sum of the two lateral pathways. The pseudo-midsagittal profile, which has an unnatural, right-angle shape, was based on the average of the centerlines determined for each lateral pathway. Considering the increased complexity of this vocal tract shape, the computed formants
were surprisingly close to those measured from recorded speech.

FIG. 7. Area functions, pseudo-midsagittal profiles, and formant spectra for M1's [u] vowel spoken in the normal, yawny, and twangy qualities. The arrangement and description of this figure is identical to Fig. 4.

TABLE I. Length (Lt), volume (Vo and Vm), and fundamental frequency (Fo) measurements for 12 male vowels (speaker M1). Length is given in cm, volume in cm³, and fundamental frequency in Hz. Vo and Vm are the volumes of the original and modified area functions, respectively.

      i_n    i_y    i_t    æ_n    æ_y    æ_t    ɑ_n    ɑ_y    ɑ_t    u_n    u_y    u_t
Lt   16.32  18.39  15.14  16.06  15.84  14.70  16.10  17.56  15.22  18.00  18.52  16.50
Vo   23.7   60.0   20.0   18.5   44.3   20.2   48.0   76.3   47.9   27.9   63.7   19.4
Vm   24.5   59.3   20.4   19.9   41.0   22.0   47.9   75.6   47.0   26.9   62.9   21.7
Fo   116    116    116    116    116    116    116    116    116    116    116    116

With regard to the voice qualities themselves, it is observed that the yawny vowels maintain significantly larger cross-sectional areas in the expanded portions of the vocal tract than do either the normal or twang qualities. For example, the pharyngeal cavity for [i_y] has nearly twice the peak area of the [i] vowels of either of the other two qualities. Similarly, the front cavities for [æ_y], [ɑ_y], and [u_y] have areas significantly larger than those of their normal and twang counterparts. The epi-larynx tube, which is roughly defined as the first 2–2.5 cm of area function length, is also seen to possess slightly larger cross-sectional areas in the yawn quality. An obvious result of enlarged cavities is
that the yawn vowels maintain a larger vocal tract volume than either normal or twang vowels, as can be observed in Table I. In fact, for three of the yawn vowels ([i_y], [æ_y], [u_y]), the total volume is nearly double (or more) that of the normal and twang vowels.

The [i_y], [ɑ_y], and [u_y] also show a larger total length (Lt in Table I) relative to normal and twang. This increased overall tract length could arise from either a lowered larynx or lip protrusion or possibly both. The exception is the [æ_y], in which the vocal tract length is slightly shorter than the [æ_n] but still more than a centimeter longer than [æ_t]. It is worth noting, however, that the modification to the [æ_y] area function that was necessary to bring the computed formant peaks in line with those extracted from recorded speech largely consisted of constricting the lip section. As mentioned previously, an alternative modification that would have a similar effect on formant frequencies would be to lengthen the vocal tract.

The most prominent feature of the twang quality is a tendency toward a large lip opening and a slightly constricted epi-larynx, both of which will cause formants to be relatively high. For [ɑ_t] and [i_t], the lip opening is larger than in both normal and yawn. Another strong feature of the twang tract shapes is that, relative to a normal voice quality, the vocal tract length is shortened. The amount of shortening is more than 1 cm for the [i_t], [æ_t], and [u_t] while the [ɑ_t] is shortened by about 0.9 cm.

2. Acoustic characteristics

The measured formants and computed formant spectra for the three qualities (fourth column in Figs. 4–7) show consistently that formants F1 and F2 in the yawn quality tend to be more closely spaced than in either the normal or twang qualities. Furthermore, for the twang quality F1 and F2 tend to be spread far apart.

Figures 8(a) and (b) show F1-F2 plots of the yawny and twangy vowels, respectively, relative to the normal vowels. For the yawny quality [Fig. 8(a)] all vowels are shifted downward (relative to the normal vowels) in the F2 dimension. In addition, the first formants (F1) of the [i_y] and [u_y] vowels are higher in frequency than in the normal quality while those of [æ_y] and [ɑ_y] are lower. Overall, the range of variation in both the F1 and F2 dimensions is reduced in the yawny quality.

For the twangy quality [Fig. 8(b)] all vowels are shifted upward in the F2 dimension. Interestingly, the [ɑ_t], which shows an increase in F1 of more than 100 Hz, is the only twangy vowel that shows a large change in the F1 dimension. The [i_t] and [u_t] show a slight increase in F1 while [æ_t] shows a small decrease.

FIG. 8. Vowel space plot for the male speaker's (M1) vowels with F2 on the y-axis and F1 on the x-axis, (a) yawny (dashed) and normal (solid) voice qualities, (b) twangy (dashed) and normal (solid) voice qualities.

TABLE II. Length (Lt), volume (Vo and Vm), and fundamental frequency (Fo) measurements for 12 female vowels (speaker W1). Length is given in cm, volume in cm³, and fundamental frequency in Hz. Vo and Vm are the volumes of the original and modified area functions, respectively.

      i_n    i_y    i_t    æ_n    æ_y    æ_t    ɑ_n    ɑ_y    ɑ_t    u_n    u_y    u_t
Lt   14.34  14.52  12.94  13.82  15.14  13.64  13.95  17.07  13.38  15.31  13.46  13.64
Vo   29.9   54.9   27.1   25.6   61.7   21.7   36.0   37.4   21.5   24.4   24.2   19.0
Vm   27.7   29.4   24.2   25.3   41.7   24.2   38.8   42.9   24.5   23.9   29.4   14.8
Fo   185    175    233    185    175    233    185    175    233    185    175    233

B. Female speaker (W1)

1. Vocal tract shape

The female speaker's (W1) vocal tract characteristics for the three qualities and four vowels are depicted in Figs. 9–12. To maintain consistency the axes on these figures have been made identical to those for the male speaker. The "accuracy" of W1's area functions is poorer than that of the male speaker's, as can be seen by the more extensive modifications required to align the computed formants with those measured from the audio recording. This is not surprising considering that this subject felt somewhat constrained by the MR scanner and often had the desire to move. The problem was not, however, one of claustrophobia, but simply that this speaker is one who tends to use more body movement
during vocalization, making the required supine position along with the enclosed space of the scanner a constraining environment.

FIG. 9. Area functions, pseudo-midsagittal profiles, and formant spectra for W1's [i] vowel spoken in the normal, yawny, and twangy qualities. The arrangement and description of this figure is identical to Fig. 4.

The most extensive modification had to be performed on the yawny [i_y] (Fig. 9) where the original shape seems to have little or no resemblance to an [i] vowel; in fact, it would appear to have been produced more like an [æ]. The applied modification constricted the front portion of the tract and expanded the back part, giving a more standard [i]-like shape. Even with this, the computed F1 is still higher than the measured value of F1. Interestingly, M1's yawny [i_y] (Fig. 4) was also produced rather unconventionally (with the tongue tip in contact with the hard palate), suggesting that an [i] vowel and yawn quality are somewhat incompatible because of the simultaneous need for a front constriction (to produce [i]) and an oral expansion (to create the yawn quality). As a result the subjects may have been confused as to how to configure the vocal tract for this shape. This, coupled with possible fatigue from the experiment, perhaps forced the subjects to adopt a compromised tract shape.

The remaining three yawny vowels also required significant modifications but not as extensive as for the [i_y]. In the case of the [æ_y] (Fig. 10), the oral cavity volume needed to be reduced in the region 8–13 cm from the glottis but slightly expanded near the lips. A synergistic modification was required for the [ɑ_y] (Fig. 11) in which the cross-sectional areas in the pharynx were moderately reduced while those in the oral cavity were expanded slightly except at the lips where a large expansion was needed. A similar modification was also needed for the [u_y] (Fig. 12) except that both a contraction and an expansion of areas were needed in the pharynx while a large expansion was required for the areas in the oral cavity between 10 and 12 cm from the glottis. The modifications applied to the normal and twangy vowels were also more extensive than those for the male but subtle in comparison to W1's yawny vowels.
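
As a rough illustration that is not part of the original analysis, the extent of the modifications can be summarized by the relative change between the original and modified volumes (Vo and Vm) listed in Tables I and II. The sketch below assumes that volume change is a usable proxy for the size of a modification, which holds only loosely because a synergistic expansion and contraction can leave the total volume nearly unchanged. The names `relative_change` and `summarize` and the ASCII vowel labels (`ae` for [æ], `a` for [ɑ]) are hypothetical.

```python
# Volumes in cm^3 transcribed from Tables I and II as (Vo, Vm) pairs.
# Keys are ASCII stand-ins for the vowel symbols: "ae" = [ae ligature], "a" = [script a].
M1 = {
    "i_n": (23.7, 24.5), "i_y": (60.0, 59.3), "i_t": (20.0, 20.4),
    "ae_n": (18.5, 19.9), "ae_y": (44.3, 41.0), "ae_t": (20.2, 22.0),
    "a_n": (48.0, 47.9), "a_y": (76.3, 75.6), "a_t": (47.9, 47.0),
    "u_n": (27.9, 26.9), "u_y": (63.7, 62.9), "u_t": (19.4, 21.7),
}
W1 = {
    "i_n": (29.9, 27.7), "i_y": (54.9, 29.4), "i_t": (27.1, 24.2),
    "ae_n": (25.6, 25.3), "ae_y": (61.7, 41.7), "ae_t": (21.7, 24.2),
    "a_n": (36.0, 38.8), "a_y": (37.4, 42.9), "a_t": (21.5, 24.5),
    "u_n": (24.4, 23.9), "u_y": (24.2, 29.4), "u_t": (19.0, 14.8),
}


def relative_change(vo, vm):
    """Signed fractional change from the original to the modified volume."""
    return (vm - vo) / vo


def summarize(volumes):
    """Return per-vowel changes, their mean absolute value, and the most-changed vowel."""
    changes = {k: relative_change(vo, vm) for k, (vo, vm) in volumes.items()}
    mean_abs = sum(abs(c) for c in changes.values()) / len(changes)
    largest = max(changes, key=lambda k: abs(changes[k]))
    return changes, mean_abs, largest


if __name__ == "__main__":
    for name, table in (("M1", M1), ("W1", W1)):
        changes, mean_abs, largest = summarize(table)
        print(f"{name}: mean |change| = {mean_abs:.1%}, "
              f"largest = {largest} ({changes[largest]:+.1%})")
```

Under this proxy, W1's mean absolute volume change exceeds M1's, and W1's yawny [i_y] shows the largest change of any vowel, in line with the description above.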

FIG. 10. Area functions, pseudo-midsagittal profiles, and formant spectra for W1's [æ] vowel spoken in the normal, yawny, and twangy qualities. The arrangement and description of this figure is identical to Fig. 4. The "N" shown in the pseudo-midsagittal profile for [æ_t] signifies that this vowel was

The fact that W1's yawny vowels required a great deal of modification suggests that she may have been the most uncomfortable or unfamiliar with this type of voice quality. In other words, the yawny quality may have been farther from her natural (normal) speaking voice than the twang. It is also notable that expanded vocal tract regions in the yawny vowels tend to be only slightly larger than those in the normal or twang vowels. In addition, the tract volumes for the modified area functions shown in Table II also indicate only slightly larger values for the yawn quality. This differs from M1's yawny vowels where the cross-sectional areas of expanded vocal tract regions and the tract volumes were much larger than those of the other two qualities.

With regard to vocal tract lengths, Table II shows that for the [æ_y] and [ɑ_y], the total tract lengths are approximately 1.5 cm and 3.0 cm longer, respectively, than their normal and twang counterparts. The total length of the [i_y] is just slightly longer than the normal [i_n] and about 1.5 cm longer than the twang [i_t]. Oddly, for the [u_y] the tract length is the shortest of the three qualities. The modification of the [u_y], however, still leaves the computed F1 and F2 at higher frequency locations than the measured formants, suggesting that the vowel may have actually been produced with a longer tract length during the audio recording.

Like those of the male speaker, W1's twang vowels have a tendency toward a large lip opening. However, unlike the male, the cross-sectional area of the epi-larynx remains similar to that of the normal quality (in which it is already quite constricted). What seems to be the most prominent feature of W1's twangy vowels is an overall constrictive effect on the entire tract, except at the lips where the effect is expansive. This is seen in the oral cavity of the [i_t], the mid-tract region of the [æ_t], the pharynx of the [ɑ_t], and the mid-tract region
of the [u_t]. This constrictive effect is also demonstrated by the tract volumes in Table II where the twangy vowels have the smallest values. It is also of interest that the constricted region in the [æ_t] at about 8 cm from the glottis is located at nearly the same place as the mid-tract constriction for the [u_t]. The front cavity shapes of these two vowels, however, are dramatically different, with a rapid increase of area out to the lip termination in the [æ_t] and a contracted frontal space for the [u_t]. A feature unique to two of W1's twang vowels was that of nasalization. The [æ_t] and [ɑ_t] each showed a distinct airway branching off the main vocal tract and coursing up into the nasal passages, where nasal port cross-sectional areas were measured to be 0.2 cm² and 0.1 cm², respectively. Estill et al. (1996) note that twang is sometimes observed to be produced with nasalization. In the case of these two vowels, the discrepancy between the formants obtained from the audio recordings and those observed in the computed frequency responses may be due in part to the fact that the nasalization was not taken into account in the calculation. With the exception of the [u_t], all of W1's twangy vowels have tract lengths shorter than those of the other two qualities, as can be seen in Table II.

FIG. 11. Area functions, pseudo-midsagittal profiles, and formant spectra for W1's [ɑ] vowel spoken in the normal, yawny, and twangy qualities. The arrangement and description of this figure is identical to Fig. 4. The "N" shown in the pseudo-midsagittal profile for [ɑ_t] signifies that this vowel was

2. Acoustic characteristics

As with the male speaker, the spectral structure of the frequency response functions and the measured formant frequencies (fourth column of Figs. 9–12) again show that the yawn quality tends toward reducing the distance between F1 and F2. The vowel space plot in Fig. 13(a) shows that, for all of the yawny vowels, F2 is reduced relative to the normal quality. Also, the [i_y] and [u_y] show little change in the frequency of F1 but F1 is reduced by more than 100 Hz for
both [æ_y] and [ɑ_y]. An overall lengthening of the vocal tract as well as enlarging some tract regions apparently serve to maintain these features.

FIG. 12. Area functions, pseudo-midsagittal profiles, and formant spectra for W1's [u] vowel spoken in the normal, yawny, and twangy qualities. The arrangement and description of this figure is identical to Fig. 4.

The twang quality [Fig. 13(b)] tends toward increasing the distance between F1 and F2, in large part by raising F2. In fact, all of the twang vowels show an increase in the frequency of F2. Furthermore, all vowels except [ɑ_t] show an increase in the frequency of F1.

While the shifting of the formant frequencies for the twang and yawn qualities was similar for both M1 and W1, it should be noted that W1 chose to use a different voice fundamental frequency (Fo) to produce each voice quality (see Table II). In contrast, M1 maintained an Fo of 116 Hz throughout the experiment and never expressed any desire to use different Fo's. W1's use of different Fo's was motivated by achieving a level of comfort with the different qualities; however, it was not clear if this comfort level was associated with biomechanical or perceptual considerations. Perhaps W1 should have been required to maintain a consistent Fo across all productions, thus eliminating the ambiguity. This, however, may have forced her into an uncomfortable and even less natural pattern of vowel production.

C. Mean area functions

In order to condense the area function data, mean area functions were computed for each speaker across the four vowels in each voice quality. They were calculated by finding the average area of each of the 44 vocal tract sections across the four vowels within a specific voice quality (this was performed on the modified area functions). Thus, three mean area functions were created for each speaker. The length intervals Δl were also averaged across vowels so that each mean area function has a corresponding mean length.

Figure 14(a) shows the mean area functions for the male speaker. The observations made in the previous sections are
essentially summarized by this figure. The yawn quality has the largest cross-sectional areas and consequently the largest tract volume. It is also the longest of the three area functions and has the most expanded epi-larynx tube. Conversely, the mean area function for twang has the shortest tract length and the most constricted epi-larynx. In addition, the lip opening is by far the largest of the three qualities. The normal mean area function is quite similar to the twang except that the epi-larynx is wider and the lip termination area is more than 2 cm² smaller.

FIG. 13. Vowel space plot for the female speaker's (W1) vowels with F2 on the y-axis and F1 on the x-axis, (a) yawny (dashed) and normal (solid) voice qualities, (b) twangy (dashed) and normal (solid) voice qualities.

Nearly the same characteristics can be described for the female speaker's mean area functions in Fig. 14(b). The yawn has the longest length and a large volume, at least in the oral cavity, while the twang again has a large lip termination area. Unlike the male, however, the cross-sectional areas within the epi-larynx tube are constricted (<0.5 cm²) and change very little across the three qualities.

Figures 14(c) and (d) summarize the vocal tract characteristics of the yawn and twang qualities by showing the difference between their respective mean area functions and that of the normal quality. In a strict sense these differences are not valid because each mean area function has a different mean length, so the functions cannot be subtracted from each other centimeter by centimeter. However, the area functions can be normalized to a single length (in this case to the mean length across all 12 vowels for each speaker) and then subtracted, giving a difference function that is perhaps useful as an indicator of the global characteristics of each voice quality relative to the normal quality. For example, the difference function for M1's yawn quality [Fig. 14(c), dashed line] shows an increase in cross-sectional area along the entire vocal tract length with the largest increase in the oral cavity. M1's twang difference function [Fig. 14(c), dotted line] oscillates between small positive and negative area values for about the first 8 cm from the glottis (essentially the pharynx), then shows a constrictive effect in the oral cavity out to approximately 15 cm from the glottis, and terminates the vocal tract with an increase of the lip aperture area. For W1, the yawn difference function differs from that of M1 in that a constrictive effect occurs in the lower pharynx (0–7 cm from the glottis) before showing the oral cavity expansion. The difference function for W1's twang quality shows a progressively increasing constrictive effect from 0 to 12.5 cm from the glottis, after which the function becomes positive, indicating an expansion at the lip termination.

Plotted in Figs. 14(e) and (f) are the frequency response functions calculated for each of the male and female mean area functions, respectively. Each spectrum is shown with a frequency range of 0–3 kHz so that the first three formants can be easily seen. There is only a small amount of variation observed in F1, with a 36 Hz range for the male and a 90 Hz range for the female. Relative to the normal quality, both speakers' mean area function shapes reduce the distance between F1 and F2 for the yawn quality and increase the same distance for the twang; because of the small range of variation for F1, it is primarily F2 that is increased or decreased.

Both the difference functions [Figs. 14(c) and (d)] and the frequency response functions [Figs. 14(e) and (f)] confirm and summarize the earlier observations of the individual vowels with regard to spatial variations of the tract shape and formant locations. In both an acoustic sense and an articulatory sense, it might be proposed that the yawn quality is biased toward the vowel [ɑ] since the tendency is to expand the oral cavity and minimize the distance between F1 and F2. Conversely, the twang could be said to be [i]-biased because of the constriction of the oral cavity and widening of lip area as well as a tendency to maximize the distance between F1 and F2. In more traditional phonetic terms (e.g., Ladefoged, 1993) the yawny vowels have a higher degree of backness and lip rounding than the normal quality vowels. The twangy vowels tend toward the opposite with a more fronted and lip spread production. Both qualities show a less systematic variation in the height (F1) dimension.

The acoustic properties of the twang quality (e.g., increased distance between F1 and F2) share some similarity with vowels in languages that use the so-called Advanced Tongue Root or +ATR. Ladefoged and Maddieson (1996, p. 305) show that +ATR vowels in six languages (Akan, Ateso, DhoLuo, Ebira, Igbo, and Ijo) are advanced in the (F2 − F1) or front/back dimension much as were the twang vowels in this study. However, the +ATR vowels also indicated a systematic decrease in F1 frequency (increase in vowel height), which was not always the case for the twang vowels. Furthermore, the physiologic realization of +ATR is apparently a considerable increase in cross-sectional area within the pharynx. On average, the twang vowels do show a slight increase in cross-sectional area within the distance of about 2 cm to 5 cm from the glottis [see Figs. 14(a) and (c)] for the
male but a constrictive effect in this same region for the female [see Figs. 14(b) and (d)]. Thus while acoustic properties of the twang quality are similar to +ATR vowels in various languages, the vocal tract shapes themselves do not show the same degree of similarity.

FIG. 14. Summary of the results in Figs. 4–7 and Figs. 9–12 based on mean area functions, (a) mean area functions of M1's normal, yawny, and twang qualities, (b) mean area functions of W1's normal, yawny, and twang qualities, (c) functions representing the difference between M1's yawny and normal qualities (dashed line) and between M1's twangy and the normal qualities (dotted line), (d) functions representing the difference between W1's yawny and normal qualities (dashed line) and between W1's twangy and the normal qualities (dotted line), (e) frequency response functions computed for each of M1's mean area functions (the F2's for each quality are denoted with the subscripts "n," "y," and "t"), (f) frequency response functions computed for each of W1's mean area functions (the F2's for each quality are denoted with the subscripts "n," "y," and "t").

This paper has described a simple study investigating some connections between voice quality and vocal tract shape for a male and female speaker. Relative to their normal speaking voice, both speakers interpreted the yawny quality to have a reduced distance between F1 and F2, acoustically, and both "reset" their vocal tract shape to realize this voice quality by widening the oral cavity and lengthening the entire vocal tract. The twang quality was interpreted by both speakers to have the acoustic characteristics of an increased distance between F1 and F2, the opposite of those for the yawn quality. Furthermore, the twangy vowels for both speakers were produced by resetting the vocal tract shape to have a shortened vocal tract, a widened lip aperture, and a moderate constriction of the oral cavity.

This study has shown how Laver's (1980) concept of vocal tract "settings" can be realized in terms of "tendencies to maintain a particular constrictive (or expansive) effect" within some region or regions located along the length of the vocal tract and/or to maintain a particular vocal tract length. The tract shape tendencies observed for the yawny and twangy qualities were pervasive throughout the four spoken (and imaged) vowels, suggesting that they create a dominant pattern or "background setting" upon which speech articulation may be superimposed. Such a background setting may serve as the acoustic context in which a person produces their speech.

An obvious remaining question is how well each of the vowels and voice qualities can be identified by listeners. As stated in Sec. II, the production of each vowel within a particular voice quality was verified during data collection only
by the experimenters. A fair perceptual test of the voice qualities and the vowels would include an identification of both recorded and synthetic vowels (generated with obtained area functions) as well as an intelligibility test of connected speech produced with each quality. Such a test was beyond the scope of this paper but should constitute a future study.

ACKNOWLEDGMENTS

The authors would like to thank Steve Baker at the University of Iowa Hospitals and Clinics for his technical expertise in MRI and willingness to do late night scanning as well as the staff of the Division of Physiologic Imaging at the University of Iowa for allowing generous use of equipment and software. In addition, we would like to thank two reviewers for constructive comments on an earlier version of this paper. This study was supported by Grant No. R01 DC02532 from the National Institute on Deafness and Other Communication Disorders.

Abercrombie, D. (1967). Elements of General Phonetics (Edinburgh University Press, Edinburgh).
Baer, T., Gore, J. C., Gracco, L. C., and Nye, P. W. (1991). "Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels," J. Acoust. Soc. Am. 90, 799–828.
Beautemps, D., Badin, P., and Laboissiere, R. (1995). "Deriving vocal-tract area functions from midsagittal profiles and formant frequencies: A new model for vowels and fricative consonants based on experimental data," Speech Commun. 16, 27–47.
Dang, J., and Honda, K. (1997). "Acoustic characteristics of the piriform fossa in models and humans," J. Acoust. Soc. Am. 101, 456–465.
Estill, J., Fujimura, O., Sawada, M., and Beechler, K. (1996). "Temporal perturbation and voice qualities," in Vocal Fold Physiology: Controlling Complexity and Chaos, edited by P. J. Davis and N. H. Fletcher, pp.
Fant, G. (1960). The Acoustic Theory of Speech Production (Mouton, The Hague).
Fant, G., and Pauli, S. (1975). "Spatial characteristics of vocal tract resonance modes," in Proc. Speech Comm. Sem. Vol. 74, Stockholm, Sweden, Aug. 1–3, pp. 121–132.
Fitch, W. T., and Giedd, J. (1999). "Morphology and development of the human vocal tract: A study using magnetic resonance imaging," J. Acoust. Soc. Am. 106, 1511–1522.
Goldstein, U. G. (1980). "An articulatory model for the vocal tracts of growing children," Doctoral dissertation, Department of Electrical Engineering and Computer Science, MIT.
Joos, M. (1948). Acoustic Phonetics, Supplement to Language (Journal of the Linguistic Society of America), Vol. 24, no. 2.
Ladefoged, P. (1993). A Course in Phonetics, 3rd ed. (Harcourt Brace, Fort Worth, TX).
Ladefoged, P., and Broadbent, D. E. (1957). J. Acoust. Soc. Am. 29, 98–104.
Ladefoged, P., and Maddieson, I. (1996). The Sounds of the World's Languages (Blackwell, Oxford, Cambridge, MA).
Laver, J. (1980). The Phonetic Description of Voice Quality (Cambridge University Press, Cambridge, MA).
Mermelstein, P. (1967). "Determination of the vocal-tract shape from measured formant frequencies," J. Acoust. Soc. Am. 41, 1283–1294.
Narayanan, S. S., Alwan, A. A., and Haker, K. (1995). "An articulatory study of fricative consonants using magnetic resonance imaging," J. Acoust. Soc. Am. 98, 1325–1347.
Sondhi, M. M., and Schroeter, J. (1987). "A hybrid time-frequency domain articulatory speech synthesizer," IEEE Trans. Acoust., Speech, Signal Process. ASSP-35, 955–967.
Stevens, K. N., and House, A. S. (1955). "Development of a quantitative description of vowel articulation," J. Acoust. Soc. Am. 27, 484–493.
Story, B. H., Titze, I. R., and Hoffman, E. A. (1996). "Vocal tract area functions from magnetic resonance imaging," J. Acoust. Soc. Am. 100, 537–554.
Story, B. H., Titze, I. R., and Hoffman, E. A. (1998). "Vocal tract area functions for an adult female speaker based on volumetric imaging," J. Acoust. Soc. Am. 104, 471–487.
Titze, I. R., Horii, Y., and Scherer, R. C. (1987). "Some technical considerations in voice perturbation measurements," J. Speech Hear. Res. 30,
Traunmüller, H. (1994). "Conventional, biological and environmental factors in speech communication: A modulation theory," Phonetica 51, 170–183.
Yang, C-S., and Kasuya, H. (1994). "Accurate measurement of vocal tract shapes from magnetic resonance images of child, female, and male subjects," Proc. ICSLP 94, 623–626, Yokohama, Japan.

