4. Investigating the Physiology of Laryngeal Structures


Subject Theoretical Linguistics » Phonetics

DOI: 10.1111/b.9780631214786.1999.00004.x

1 Introduction – basic laryngeal functions

In humans, there are four basic laryngeal functions: airway protection, which is particularly important during
deglutition; effort closure for fixation of the trunk while moving the upper extremities; airway opening for
respiration; and phonation.

The most basic function of the larynx is to protect the airway. This function can be best understood by an
appreciation of its origin determined by primitive needs (Negus, 1949). The most primitive larynx is found in
the bichir lungfish (Polypterus), which lives in rivers that periodically become dry. The primary lung developed as
downgrowths of the pharyngeal pouch in response to the need for oxygen under conditions where the
supply in water is limited. The developing lung needed to be protected from the invasion of
water and food during periods of submersion and, therefore, the primary larynx evolved as a protective
mechanism. In the lungfish, the larynx developed as a simple, circular group of muscle fibers within the
upper end of the trachea, constituting an encircling sphincter band. When this simple sphincter closed, the
lung could be effectively isolated and its closure during deglutition prevented invasion by food or water.

During the course of evolution, the encircling sphincter became a more complicated structure, and in higher
animals like the human, laryngeal sphincteric closure is accomplished by a valvular adduction mechanism at
both false vocal fold and true vocal fold levels.

The sphincteric closure essentially serves as a protective mechanism for the airway but it also serves those
physiologic functions which are dependent on air being trapped at the larynx when accompanied by
increases in or maintenance of intra-thoracic and intra-abdominal pressure. Such functions include
coughing, defecation, micturition, and fixation of the trunk for the stable movement of the upper extremities.

Another important modification during evolution was the development of the laryngeal opening or abductor
mechanism and the cartilaginous framework of the larynx. Thus, the larynx was able to open the airway
when necessary. Basically, the glottis widens during inspiration and narrows during expiration. This
movement may be almost imperceptible during quiet respiration, but it becomes more prominent as the
depth of respiration increases.

Finally, phonation developed as a principal function of the larynx in which the vocal folds are used as a
flutter valve. This type of flutter valve is only seen in vertebrates possessing the respiratory requirements of
an effective bellows. This is possible only in vertebrates which have a diaphragm; that is, the mammals.
Among all mammals, only humans have acquired the potential for the production of meaningful sounds, i.e.,
speech, by using the laryngeal valve as a source of vibration.

Studies of laryngeal function rely heavily on the methods of investigation. In recent years, various kinds of
observation techniques for the assessment and analysis of laryngeal behavior during speech production have
been developed. The following is a brief description of the systems currently in use to assess laryngeal

2.1 Fiberoptic observation and measurement of vocal fold movement

Many techniques have been used for the observation of the larynx. The simplest and most popular method for
otolaryngologists is the indirect mirror technique, but with the conventional laryngeal mirror the larynx
can be observed only while the subject's mouth is kept open, and even then photographs cannot easily be taken.
The rigid tele-endoscope became available later, and laryngeal photography could be readily undertaken.
Figure 4.1 shows a view of the larynx taken during phonation and deep inspiration using a tele-endoscope.

However, there was still a difficulty in the assessment of laryngeal dynamics during speech. In order to find
out what happens during speech or singing in natural circumstances, the flexible fiberscope was devised
during the late 1960s (Sawashima and Hirose, 1968).

The flexible fiberscope basically consists of a hard tip that houses an objective lens and two bundles of
glass fibers: the light guide and the image guide. The light guide conducts the light for illumination of the
field of view from the light source to the object-end of the scope. The image guide is a bundle of aligned or
coherent glass fibers which transmits the image from the objective lens to the eye-piece of the scope.

Figure 4.1 The laryngeal views obtained using a rigid tele-endoscope.

Specific requirements in the design of the fiberscope were: (1) that it have an outside diameter small enough
to pass through the nostril; (2) that it can obtain an image with a resolution good enough for the analysis of
glottal gestures; and (3) that it should be provided with a light source of sufficient brightness (Sawashima,
1977). In recent years, these requirements have generally been satisfied.

Prior to the insertion of the fiberscope, surface anesthesia is applied to the nasal mucosa and to the
epipharyngeal wall. After insertion, the tip of the scope is placed down near the tip of the epiglottis to obtain
a good laryngeal view. Figure 4.2 illustrates the positioning of the fiberscope in an adult male.

Until recently, precise observations of the pattern of vocal fold vibration have generally been made using an
ultra-high-speed movie system. Ultra-high-speed photography can provide good resolution images of the
vibrating vocal folds. However, this system is usually massive and costly, and it is always very time-
consuming to carry out frame-by-frame analysis of the film obtained.

Figure 4.3 Block diagram of the digital imaging system for the analysis of vocal fold vibration.

In the late 1980s, a new method of digitally imaging vocal fold vibration was developed using a solid-state
image sensor attached to a conventional camera system (Hirose, 1988). In this system, a lateral viewing
laryngeal tele-endoscope is attached to a single-lens reflex camera. A MOS-type solid-state image sensor
consisting of 100 × 100 picture elements is attached to the lid of the camera at the position of the film
plate. Under computer control, an image scan is made during the opening of the shutter, and image signals
are stored in an image memory through a high-speed A/D converter. After data storage, the images can be
reproduced and displayed on a monitor CRT screen. By reducing the number of horizontal scan lines, image
data can be obtained at a maximum rate of 2,000 to 4,000 frames per second.

Figure 4.3 shows a block-diagram of the present system, which consists of a laryngeal tele-endoscope, a
single-lens reflex camera, a special purpose image memory, a personal computer and an ordinary video
recorder. The image memory has a capacity of 1 megabyte and a high-speed 8-bit A/D converter. The image
memory samples and stores image data until the memory becomes full. Then the image data are transferred
to the personal computer via a parallel I/O unit. The time required for the transfer is 15 sec. After the
transfer, a slow motion display of the image data can be accomplished under the control of the personal
computer. The images are displayed on a TV monitor and can be recorded on an ordinary video recorder. As
a light source, a pair of 250 W halogen lamps is used.

Data recording is made in the same manner as in still photography of the larynx. The larynx is visualized
through a view finder and the camera shutter is released to start data acquisition. During the shutter
opening of approximately 150 msec, 200 to 400 data frames are stored in the computer memory.
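
As a rough check on these figures, the frame count can be bounded in two ways: by how many frames fit in the image memory, and by how many are scanned while the shutter is open. The sketch below is a back-of-envelope illustration only. It assumes one byte per 8-bit picture element and reads "1 megabyte" as 10^6 bytes; the reduced scan-line counts (50 and 25) and the function names are hypothetical and are not taken from the system described.

```python
def frames_in_memory(memory_bytes, pixels_per_line, scan_lines, bytes_per_pixel=1):
    """Number of whole frames that fit in the image memory."""
    frame_bytes = pixels_per_line * scan_lines * bytes_per_pixel
    return memory_bytes // frame_bytes


def frames_during_shutter(frame_rate_hz, shutter_ms):
    """Number of frames scanned while the shutter is open."""
    return frame_rate_hz * shutter_ms // 1000


MEMORY_BYTES = 1_000_000  # assumption: "1 megabyte" taken as 10^6 bytes

# Full 100-line frame versus two hypothetical reduced scan-line settings.
for lines in (100, 50, 25):
    print(lines, "lines:", frames_in_memory(MEMORY_BYTES, 100, lines), "frames")

# Frames scanned at 2,000 frames per second during a 150 ms shutter opening.
print(frames_during_shutter(2000, 150), "frames during shutter opening")
```

Under these assumptions, halving or quartering the number of scan lines yields a few hundred frames in memory, which is of the same order as the figures quoted above.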

Other physiological data, such as electroglottographic (EGG) signals, can also be recorded simultaneously,

Figure 4.4 An example of laryngeal images recorded using the digital imaging system. Male
subject: sustained phonation of /e/ at a fundamental frequency (f0) of approximately 200 Hz.
Frame rate: 2,000 frames/second.

Figure 4.4 shows the vocal fold vibration of a normal male subject taken at a rate of 2,000 frames per
second. Twenty consecutive frames are displayed from the top-left corner to the bottom-right. The opening
and closing phases of the vocal fold vibration are easily identifiable. In this example, the subject phonated
with a fundamental frequency of 200 Hz, so approximately 2 cycles are displayed in the figure. The two
curves shown below the pictures are the acoustic signal at the top and EGG signal at the bottom.
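
The relation between frame rate, fundamental frequency, and the number of vibratory cycles visible in a sequence of frames is simple arithmetic, sketched below. The function names are illustrative only.

```python
def frames_per_cycle(frame_rate_hz, f0_hz):
    """How many image frames sample one vibratory cycle."""
    return frame_rate_hz / f0_hz


def cycles_displayed(n_frames, frame_rate_hz, f0_hz):
    """How many vibratory cycles are covered by n consecutive frames."""
    return n_frames * f0_hz / frame_rate_hz


# Values from the example: 2,000 frames/second, f0 of 200 Hz, 20 frames shown.
print(frames_per_cycle(2000, 200))      # 10.0 frames per cycle
print(cycles_displayed(20, 2000, 200))  # 2.0 cycles
```

At higher fundamental frequencies, fewer frames fall within each cycle, so the opening and closing phases are sampled more coarsely at the same frame rate.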

A pilot system using a fiberscope has also been developed more recently (Hirose et al., 1988). In this
system, a CCD-type image sensor is used. The light source for the fiberscope system is a 300 W xenon
lamp. The sampling rate of the picture elements is 20 MHz. A frame rate of 2,000 frames per second can be achieved
with 200 × 14 picture elements. This type of system makes it possible to observe vocal fold vibration during
speech samples containing consonantal gestures.

2.3 Laryngeal electromyography

Electromyography (EMG) is a technique for providing graphic information about the time course of the
electrical activity of the muscle fibers that accompanies muscle contraction and subsequent effects,
including the development of tension. (See Stone, Laboratory Techniques for Investigating Speech Articulation.)

Since EMG was established as a scientific discipline, it has been widely used in various fields for studying
muscular function and coordination. In particular, EMG has proved to be useful for research into
kinesiological aspects of human behavior, where the analysis of the parameters of the individual motor unit
action potential may not play an important role. Rather, EMG kinesiology is much more concerned with the
biomechanical analysis of various movements or gestures (Harris, 1981).

The EMG system consists of some sort of probe or electrodes for picking up the action potentials, amplifying
equipment, recording equipment, and ultimately a graphic display, which may have signal processing
facilities. For laryngeal EMG in the study of speech dynamics, so-called hooked-wire electrodes are used, in
which a pair of thin electrically shielded wires are threaded through a needle and inserted in the target
muscles (Hirose, 1985). Intrinsic laryngeal muscles, such as the cricothyroid, thyroarytenoid, and lateral
cricoarytenoid muscles, are reached percutaneously, as are two extrinsic laryngeal muscles, the sternohyoid
and sternothyroid. The posterior cricoarytenoid and interarytenoid muscles are reached perorally
with indirect laryngoscopy using a specially designed curved probe (Hirose, 1976). As shown in Figure 4.5,
the wire-bearing tip of the needle is kept drawn into the shaft of the probe until it is brought close to the
point of insertion, at which time it is pushed out by pulling the trigger of the probe.

It should be emphasized that progress in the strategy of the computer processing of EMG data has led to
better analysis of the temporal pattern of the activity of pertinent laryngeal muscles with reference to speech
signals (Kewley-Port, 1973).

2.4 Photoglottography (transillumination of the glottis)

Photoglottography (PGG) is a technique for recording glottal area variation by measuring the amount of light
passing through the glottis.

In 1960, Sonesson first reported the use of a photo-electric device applied to the normal human subject for
assessing the glottal area variation. In his method, a DC light source was placed against the anterior neck,
while a light-conducting rod was inserted into the hypopharynx through the mouth under topical anesthesia.
A photomultiplier tube was attached to the other end of the rod so that the illuminating light passing
through the glottal aperture was transmitted through the light-conducting rod to the photomultiplier tube.
The output of the tube was displayed on a cathode ray oscilloscope and the record was called a photo-
electric glottogram or photoglottogram. Using this technique, he measured the open period, the opening
phase, and the closing phase of the glottal vibratory cycle for sustained phonation. He claimed that the
results obtained from his method were in good agreement with the results obtained from high-speed
motion picture analysis.

Figure 4.5 Peroral insertion of wire electrodes using a curved probe under indirect laryngoscopy.

Since Sonesson's technique imposed considerable limitations on articulatory movements, further
modifications were made by other investigators. For example, Frøkjær-Jensen (1967) introduced a small
photo-sensor attached to the tip of a thin flexible plastic tube, which was passed through the nasal cavity to the
hypopharynx, thus making transillumination possible during speech articulation.

Sawashima (1968) reversed the positions of light source and photo-sensor relative to the glottis. He used a
fiberscopic illumination as a light source while observing the laryngeal gesture, and picked up the photo-

These modifications have extended the application of the photoglottographic technique to studies on glottal
adjustments as well as on the patterns of vocal fold vibration during speech production. It has been assumed
that the data obtained by this technique provide a good approximation of the glottal area function,
although it is impossible to calibrate the instrument to measure the absolute area of the glottis. Also, it
should be taken into consideration that several sources of artefacts may exist during data assessment. A
shift in the positioning of the instruments relative to the larynx may be a major source of artefacts.
Interruption of the light by the epiglottis during speech utterance should also be carefully monitored during
recording to minimize incorrect interpretation of the obtained results.

2.5 Electroglottography (laryngography)

Electroglottography (EGG) is a technique for registering glottal vibratory movements by measuring changes in
electrical resistance across the neck. In this technique, a pair of plate electrodes are placed on the skin on
both sides of the neck above the thyroid cartilage. A weak high frequency electrical current is applied to the
electrodes, and a small fraction of the electrical current passes through the larynx. The transverse electrical
resistance of the larynx varies depending on the opening and closing of the glottis, and a modification in the
amplitude of the transglottic current occurs in correspondence with the vibratory cycles of the vocal folds.
The amplitude modification of the current is detected, and electrical glottograms are obtained from it.

In a typical model described by Fourcin (1981), each electrode has a guard ring and an inner conductor. One
of the electrodes has a 4 MHz transmitting voltage applied between the center conductor and guard ring.
The other serves as a current pick-up. According to Fourcin, typically about 30 mW is dissipated at the
subject's neck with only microwatts being involved at the level of the vocal folds. Contact between the vocal
folds increases current flow as contact area increases, but movement of the vocal folds without contact,
giving an increase in glottal area, will not necessarily change the current flow. For this reason, Fourcin
claimed that the term ‘glottograph’ is inappropriate, and he proposed that it should be called a ‘laryngograph’.

In making comparisons between electrical and photo-electric glottograms, Frøkjær-Jensen (1968) concluded
that the opening of the glottis seemed to be better represented in photo-electric glottograms, whereas the
closure of the glottis, particularly its vertical contact area, was probably better reflected in EGG.

One of the advantages of EGG is that the procedure is carried out with minimal discomfort for the
subject. As stated above, the EGG record reflects the glottal condition during closure better than during the
open period, and the presence or absence of glottal vibration, as well as the accurate fundamental frequency
can be readily determined. However, since it is difficult to estimate to what extent the glottal condition
contributes to the electrical resistance or impedance variations between the electrodes, a quantitative
interpretation of EGG seems to be less direct than PGG.
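
To illustrate how the fundamental frequency might be read off an EGG record, the sketch below counts positive-going zero crossings and converts their mean spacing into a frequency. It is a minimal illustration, not the procedure of any particular instrument: the sine wave is a synthetic stand-in for an EGG signal, the 10 kHz sampling rate is assumed, and a real recording would first need its slowly varying baseline removed.

```python
import math


def estimate_f0(signal, sample_rate_hz):
    """Estimate f0 from the mean spacing of positive-going zero crossings.

    Returns None when fewer than two crossings are found, which can be
    taken as absence of vocal fold vibration in the analysed stretch.
    """
    crossings = [
        n for n in range(1, len(signal))
        if signal[n - 1] < 0.0 <= signal[n]
    ]
    if len(crossings) < 2:
        return None
    mean_period_samples = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate_hz / mean_period_samples


# Synthetic stand-in for an EGG waveform: 200 Hz sine, 0.1 s at 10 kHz.
SAMPLE_RATE_HZ = 10_000  # assumed sampling rate
egg = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ + 0.3) for n in range(1000)]

print(estimate_f0(egg, SAMPLE_RATE_HZ))          # close to 200 Hz
print(estimate_f0([0.0] * 100, SAMPLE_RATE_HZ))  # None: no vibration detected
```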

3 Laryngeal structures and the control of phonation

The framework of the larynx consists of four different cartilages: the epiglottis, thyroid, cricoid, and
arytenoid cartilages. The thyroid and cricoid cartilages are connected by the cricothyroid joint, while the
arytenoid and cricoid cartilages are connected by the cricoarytenoid joint. The movement of the
cricothyroid joint changes the length of the vocal folds. Movements of the arytenoid cartilage on the
surface of the cricoarytenoid joint contribute to the abduction-adduction of the vocal folds. The main
movement of the cricoarytenoid joint is a rotation (abduction-adduction) of the arytenoid cartilage around
the longitudinal axis of the joint. Other possible movements of the arytenoid are a small degree of sliding
motion along the longitudinal axis of the joint and a rocking motion around a fixed point at the attachment
of the posterior cricoarytenoid ligament (Leden and Moore, 1961). (For further details of the anatomy of
laryngeal structures, see Kahane and Folkins, 1984; Hirano, 1991; and Bless and Abbs, 1983.)

Movements of the cricothyroid and cricoarytenoid joints are controlled by the intrinsic laryngeal muscles.
Elongation and stretching of the vocal folds is achieved by contraction of the cricothyroid muscle (CT).
Movements of the arytenoid cartilage and the resultant abduction-adduction of the vocal folds are controlled
by the abductor and adductor muscles. The posterior cricoarytenoid muscle (PCA) is the only abductor
muscle, while another three–the interarytenoid (INT or IA), lateral cricoarytenoid (LCA), and the
thyroarytenoid (TA) muscle–are the adductor muscles. Contraction of the cricothyroid muscle may also result
in a small degree of glottal abduction. The vocalis muscle (VOC), which is the medial part of the
thyroarytenoid muscle, contributes to the control of the effective mass and stiffness of the vocal folds rather
than to abduction-adduction movements.

The entire larynx is supported by the extrinsic laryngeal muscles and the ligaments, of which suprahyoid and
infrahyoid muscles form the important members. These muscles contribute to the elevation and lowering of
the larynx, which may relate to the pitch control of voice, as well as to articulatory adjustments such as jaw
opening (Erickson et al., 1977).

3.2 Layered structure of the vocal fold

The layered structure of the vocal fold edge described by Hirano (1974) is shown in Figure 4.6. As can be
seen in the figure, the vocal fold consists of the mucosa epithelium, the lamina propria mucosa, and the
vocalis muscle. In the lamina propria, the superficial layer is the loose connective tissue, and the
intermediate and deep layers correspond to the so-called vocal ligament. Based on the concept of this
layered structure, Hirano proposed a structural model of the vocal fold. In his model, the vocal fold basically
consists of the three layers–cover, transition, and body. The cover consists of the epithelium and the
superficial layer of the lamina propria; the transition includes the intermediate and the deep layers; and the
body includes the vocalis muscle. For simplification, the transition can be considered as part of the body so
that the entire structure can be regarded as cover and body.

Figure 4.6 Schematic presentation of the layered structure of the human vocal fold.

This cover-body model proposed by Hirano is quite useful for explaining variation in the mode of vocal fold
vibration with different laryngeal adjustments and with various pathological conditions. Contraction of CT
elongates the vocal fold, and its effective mass decreases. Due to the elongation of the vocal fold, the
stiffness of both cover and body increases. This is the situation of the vocal fold for phonation in the light or
head register. Contraction of VOC, in contrast, shortens the vocal fold, its effective mass being increased. At
the same time, stiffness of the body increases, while that of the cover decreases. Contraction of VOC in
combination with different degrees of contraction of CT usually takes place for phonation in the modal or
chest register. Thus the difference in the mode of vocal fold vibration between the head and the chest
registers can be accounted for by the different conditions of the cover and body of the vocal fold (Hirano,

3.3 Vocal fold vibration during phonation

According to the almost universally accepted myoelastic-aerodynamic theory of vocal fold vibration during
phonation, one cycle of the vibration of the vocal fold is produced as follows (See also Stevens,

(a) The bilateral vocal folds are appropriately approximated towards the midline by the activation of
the adductor laryngeal muscles accompanied by suppression of the abductor muscle.
(b) Air is then forced through the vocal tract from the lungs and the vocal folds are sucked together
by the combined effect of Bernoulli's aerodynamic law and the elasticity of the tissues (See Shadle, T HE
(c) When the vocal folds have been sucked together, the flow of air from the lungs continues but the
flow through the glottis ceases and the subglottal air pressure rises.
(d) When the subglottal air pressure becomes greater than the medial compression of the vocal folds,
the folds are blown apart and a puff of air escapes into the supraglottal space. Consequently, the
subglottal pressure falls and the vocal folds return to their adducted position at the beginning of the
vibratory cycle as a result of their tissue elasticity.
(e) A second cycle starts as a repetition of the first cycle.
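
For reference, the Bernoulli relation invoked in step (b) can be written in its standard textbook form for steady, incompressible, inviscid flow. This idealization and the symbols used are assumptions added here for illustration and are not taken from the chapter.

```latex
p + \tfrac{1}{2}\,\rho\,v^{2} = \text{constant}, \qquad v = \frac{U}{A}
```

Here p is the local air pressure, ρ the air density, v the particle velocity, U the volume flow, and A the cross-sectional area of the passage. For a given flow U, a narrower glottal area A implies a higher velocity v and therefore a lower local pressure p, which is the effect that helps draw the vocal folds together.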

Several preconditions are required for normal phonation. The transglottal pressure (the difference between
the subglottal and supraglottal pressure) and the airflow must be high enough, the glottal width small
enough and the glottal resistance sufficiently low.

4 Laryngeal adjustments for different phonetic conditions

The basic features of laryngeal adjustments for different phonetic conditions can be classified as follows:

(1) abduction vs. adduction of the vocal folds;
(2) constriction of the supraglottal structures;
(3) adjustment of the length, stiffness and thickness of the vocal fold;
(4) elevation and lowering of the entire larynx.

4.1 Abduction vs. adduction of the vocal folds

This type of adjustment is used for the distinction between respiration and phonation, as well as for the
voiced versus voiceless distinction during speech production. For deep inspiration, the vocal folds are fully
abducted by an increase in the activity of PCA and a suppression of the adductor muscles. For quiet
respiration, the extent of the glottal opening is approximately half that for deep inspiration and the vocal
fold position observed in laryngoscopy in quiet respiration is described as the intermediate position. In this
condition, the activities of both the abductor and the adductor muscles are minimal.

The general picture of the glottal condition in the abduction vs. adduction dimension during speech is that
the glottis is closed or nearly closed for voiced sounds including vowels, whereas it is open for voiceless
sounds, the degree of the glottal opening and its timing relative to the articulatory gestures varying with
different phonetic environments.

The principal mechanism underlying abduction vs. adduction of the vocal folds during speech production is
reciprocal activation of the abductor and adductor muscle groups. The reciprocal activity pattern between
the two groups of laryngeal muscles has been revealed by recent EMG studies combined with fiberoptic
observation. In particular, reciprocity between PCA and INT is found to be important for realization of the
voiced-voiceless distinction. The reciprocity between PCA and the adductor muscles has been observed for
different languages, including American English (Hirose and Gay, 1972), Japanese (Hirose and Ushijima,

Figure 4.7 shows an example of averaged EMG curves of the INT and PCA for a pair of test words,
/əp'ʌp/ and /əb'ʌp/, produced by an American English speaker. It can be seen that PCA activity is suppressed
for the voiced portion of the test words, whereas it increases for the production of the intervocalic voiceless
stop /p/ as well as for word-final /p/. On the other hand, INT shows a reciprocal pattern when compared
with that of PCA in that its activity increases for the voiced portion and decreases for the voiceless portion
of the test words.
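
Averaged EMG curves of this kind are commonly obtained by rectifying each repetition of an utterance, aligning the repetitions at a common line-up point, and averaging sample by sample. The sketch below illustrates that general idea only; it is not the processing system used in the studies cited, and the function name, the window parameters, and the toy data are hypothetical.

```python
def ensemble_average(tokens, lineup_indices, pre, post):
    """Average rectified EMG tokens aligned at their line-up points.

    tokens         -- list of sample lists, one per repetition
    lineup_indices -- index of the line-up event within each token
    pre, post      -- samples kept before and after the line-up point
    """
    aligned = []
    for samples, lineup in zip(tokens, lineup_indices):
        start, stop = lineup - pre, lineup + post
        if start < 0 or stop > len(samples):
            continue  # skip tokens too short to cover the window
        # Full-wave rectification: keep the magnitude of each sample.
        aligned.append([abs(x) for x in samples[start:stop]])
    if not aligned:
        raise ValueError("no token covers the requested window")
    n = len(aligned)
    # Average across repetitions at each time point of the window.
    return [sum(column) / n for column in zip(*aligned)]


# Two toy repetitions whose line-up events fall at different sample indices.
tokens = [[0, -2, 4, -2, 0], [1, 0, -2, 6, -2]]
averaged = ensemble_average(tokens, lineup_indices=[2, 3], pre=1, post=2)
print(averaged)  # [2.0, 5.0, 2.0]
```

Because the repetitions are aligned before averaging, activity that is time-locked to the line-up event is preserved, while activity unrelated to it tends to be smoothed out.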

Figure 4.8 shows a typical example of the relationship between the glottal size and the pattern of the
averaged laryngeal EMG activity of PCA and INT for the production of the Japanese test word /ise:/. The
glottal width (GW), measured by means of fiberoptic analysis, increases for the voiceless consonant /s/, for
which PCA activity increases and INT activity is reciprocally suppressed.

Some languages, such as Hindi and Chinese, show a phonemic distinction between aspirated and
unaspirated stops. Previous EMG and fiberoptic studies revealed that the degree and timing of glottal
abduction-adduction gestures are well controlled by coordinated laryngeal muscle activities (Sawashima and
Hirose, 1983). In particular, the degree and timing of PCA activation seem quite important for the distinction
between different phonemic types associated with glottal opening, i.e., arytenoid separation at the vocal
processes observed by a fiberscope.

Figure 4.7 Superimposed averaged EMG curves of INT and PCA for the utterances /əp'ʌp/ (solid
line) and /əb'ʌp/ (dotted line). The line-up point for averaging (zero on the abscissa) indicates
the voice offset of the stressed vowel.

Figure 4.9 shows the relationship between the pattern of PCA activity and the time course of the glottal
width measured at the vocal process, for the three labial stop types showing arytenoid separation: voiceless
aspirated, voiceless unaspirated and voiced aspirated. The curves are lined up at the articulatory release
taken as time 0 on the abscissa, and durations of oral closure and aspiration are also illustrated (Hirose,
1977). The figure shows good agreement not only in degree but also in timing between PCA activity and the
opening gesture of the glottis. Thus, we must fully realize that, in addition to the control of the degree of
glottal abduction vs. adduction, the control of laryngeal timing is also essential in phonetic realization of
different types of consonants. As explicitly discussed by Abramson (1977), various languages of the world
make extensive use of the timing of the valvular action of the larynx relative to supraglottic articulation in
order to distinguish classes of consonants, although certain nonlaryngeal features such as pharyngeal
expansion may also be linked with laryngeal timing.

Figure 4.8 Time curves of the glottal width (GW), the smoothed and integrated EMG curves of the
INT and PCA, and the speech envelope (audio) for the test word /ise:/ produced by a Japanese
subject. The curves are aligned on the same time axis. The vertical line indicates the voice onset
for the vowel /e/.

It should be noted, however, that adjustment of glottal width is only one parameter that determines whether
or not the vocal folds will vibrate during the consonantal interval. In addition, there must be adequate
airflow through the glottis to generate vocal fold vibration, the amount of which will depend
both on subglottal pressure and on the configuration of the supraglottal articulators. Further, the physical
properties of the vocal folds, particularly their stiffness, are important factors that relate to the initiation and
cessation as well as the mode of vocal fold vibration.

In order to clarify the relationship between transglottal pressure difference and the glottal configuration
during the production of voiceless consonants, a physiological experiment was performed in which the sub-
and supraglottal pressure was measured by means of pressure transducer systems and the glottal size was
estimated using the photoglottography technique (Löfqvist and Yoshioka, 1980). The data were obtained at
the offset of the vibration at oral closure of voiceless consonants /s/ and /t/, at the onset of the vibration
after the oral release, and during the maximum glottal opening for each consonant. The transglottal

Figure 4.9 Comparison of the time courses between averaged PCA activity and glottal opening
gesture. All curves are lined up at the oral release.

Figure 4.10 shows the relationship between the ratio of the transglottal pressure to the subglottal pressure
(ΔP/Ps) and the relative size of the glottal width (GW) for word-initial /s/ and word-initial /t/. In this figure,
the 90 per cent range of the distribution is represented by circles for each of the following sets of data: the

Figure 4.10 Pattern of data distribution for word-initial /s/ and /t/ representing the relationship
between the ratio of transglottal pressure (ΔP) to subglottal pressure (Ps) and relative glottal width
(GW). (The largest glottal opening during the consonantal period of [s] was taken as 100%, and
relative glottal width was calculated as a percentage of that value for each token.) In the figure,
the 90% range of distribution is circled for each of the following data sets: voice offset for /s/
and /t/ (sᵢ-off and tᵢ-off) and voice onset after /s/ and /t/ (sᵢ-on and tᵢ-on), respectively. The
symbol ‘pk’ indicates the coordinate for values at the time of maximum glottal opening for
each token.
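
The two quantities plotted in Figure 4.10 follow directly from the definitions given in the text and caption: the transglottal pressure is the difference between subglottal and supraglottal pressure, and the relative glottal width is expressed as a percentage of the largest opening during [s]. The sketch below shows the arithmetic; the numerical values are hypothetical and in arbitrary units, and the function names are illustrative.

```python
def pressure_ratio(subglottal, supraglottal):
    """Ratio of transglottal to subglottal pressure (dP/Ps)."""
    if subglottal <= 0:
        raise ValueError("subglottal pressure must be positive")
    return (subglottal - supraglottal) / subglottal


def relative_glottal_width(width, peak_width_s):
    """Glottal width as a percentage of the peak opening during [s]."""
    return 100.0 * width / peak_width_s


# Hypothetical sample values in arbitrary units (not measured data).
print(pressure_ratio(subglottal=8.0, supraglottal=6.0))       # 0.25
print(relative_glottal_width(width=3.0, peak_width_s=12.0))   # 25.0
```

Expressing the glottal width relative to a per-token reference sidesteps the fact that the photoglottographic signal cannot be calibrated in absolute units.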

It can be seen here that both /s/ and /t/ demonstrate a difference in the physiological conditions for the
cessation and initiation of voicing related to obstruent production. Namely, in both cases, voicing following
the consonantal closure period began with a relatively smaller glottis and a higher ΔP/Ps ratio than
those with which voicing ceased around the implosion of the consonant.

It can also be seen that there is a subtle difference in the patterns of the distribution of data between the
fricative /s/ and stop /t/ in terms of the laryngeal conditions for voice offset. In the case of /s/, the vocal
fold vibration ceases with a relatively wider glottis than for /t/, whereas the ΔP/Ps ratio is comparable. On
the other hand, there is no apparent difference between /s/ and /t/ distribution for the initiation of vocal
fold vibration.

Thus, it appears that there is a hysteresis in the glottal mechanism defined by the initiation and cessation of
oscillation. That is, vocal fold vibration tends to be maintained at the implosion of obstruents despite relatively
unfavorable physiological conditions for oscillation, while vibration does not start after the voiceless period
until more favorable conditions are obtained by a narrowing of the glottis. These more favorable conditions
are associated with an elevation of the transglottal pressure difference, although the reason why the vocal
folds continue to vibrate with a wider glottis for /s/ than for /t/ is still unclear (Hirose and Niimi, 1987).

4.2 Constriction of the supraglottal structures

A typical example of supraglottal laryngeal constriction with the open glottis is observed in whispered
phonation. In whisper, there is arytenoid separation at the vocal process with an adduction of the false vocal
folds taking place with a decrease in the size of the anterior-posterior dimension of the laryngeal cavity. For
this type of laryngeal adjustment, PCA continues to be active, and activation of the thyropharyngeal muscle is also
observed, most likely for the realization of supraglottal constriction (Tsunoda et al., 1994). This particular
gesture for whispering is considered to contribute to the prevention of the vocal fold vibration by the
transglottal airflow, as well as to facilitate the generation of turbulent noise in the laryngeal cavity.

Supraglottal laryngeal constriction with closed glottis is typically observed for glottal stop production. A
similar gesture is often seen for the syllable-final stops in American English (Fujimura and Sawashima,
1971). The gesture prevents the air from the lungs from passing through the glottis. In laryngeal EMG, it has
been observed that LCA appears to show a high degree of activity for this particular gesture together with
activation of TA.

A lesser degree of supraglottal constriction with the closed glottis can be regarded as characterizing the
laryngeal gesture known as ‘laryngealization’. This type of adjustment may be observed for the production of
Korean forced or tense stops and the so-called stød in Danish, where strong activation of VOC has been
reported (Sawashima and Hirose, 1983).

4.3 Adjustment of the length, stiffness and thickness of the vocal folds with respect to pitch

The best example of this type of laryngeal adjustment is control of the pitch of the voice, f0, during
phonation. f0 control at the larynx is considered to be achieved mainly by adjusting the effective mass and
the stiffness of the vocal folds. The main contributor to pitch regulation is CT, while TA also appears to
participate to some extent. The activity of CT increases to raise pitch and decreases to lower pitch. As
mentioned earlier, contraction of CT elongates the vocal folds, resulting in a decrease in the effective
vibrating mass and an increase in the stiffness of both the cover and body of the vocal folds. Contraction of
TA results in a thickening of the vocal folds, their effective mass being increased. The stiffness of the body
increases while that of the cover decreases.
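
As a rough illustration of why lower effective mass and higher stiffness raise f0, the sketch below treats the vibrating tissue as a single mass on a spring, whose natural frequency is sqrt(k/m) / (2π). This lumped model is an assumption made here for illustration only: it is not the model used in the studies cited, it cannot represent the separate behavior of cover and body, and the numerical values are arbitrary rather than measured.

```python
import math


def natural_frequency_hz(stiffness_n_per_m, mass_kg):
    """Natural frequency of an idealized single mass-spring oscillator."""
    if stiffness_n_per_m <= 0 or mass_kg <= 0:
        raise ValueError("stiffness and mass must be positive")
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


# Arbitrary illustrative values, not physiological measurements.
baseline = natural_frequency_hz(80.0, 1.0e-4)
# Hypothetical CT-like change: stiffness up, effective vibrating mass down.
stretched = natural_frequency_hz(160.0, 0.8e-4)
print(stretched > baseline)  # True
```

In this idealization, quadrupling the stiffness at constant mass doubles the frequency, so both changes attributed to CT contraction act in the same direction.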

It has been observed that in the chest or modal register, a rise in pitch is characteristically achieved by
contraction of both CT and TA. The most remarkable difference in muscle control between the chest and
head registers is observed in the activity of TA. In the head register, as compared to the chest register, there
is a marked decrease in TA activity, accompanied by an increase in CT activity. The difference in the muscle
control between the two registers results in a difference in the physical conditions of the cover and body of
the vocal folds, which is reflected in the mode of vocal fold vibration (Hirano et al., 1970).

In the realization of pitch accent in Japanese, different types of tones in tone languages such as Chinese,
and word stress in English and other languages, CT is found to be uniquely related to f0 changes. In
particular, the increase in longitudinal tension and stretch of the vocal folds is obtained by CT activation.
Figure 4.11 compares the curves of averaged CT activity and f0 contours for five test words having different
stress positions. It is evident that, for all words, CT activation occurs slightly ahead of the pitch peak
associated with the stressed syllable.
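
One way such a lead of muscle activity over the pitch peak could be quantified is by finding the time shift at which an EMG envelope best matches the f0 contour. The sketch below is a hypothetical illustration, not the procedure used to produce Figure 4.11; the function name is invented and the two sequences are synthetic, chosen so that the second is simply the first delayed by two frames.

```python
def best_lead_in_frames(emg, f0, max_lag):
    """Return the lag (in frames) that maximizes the cross-correlation.

    A positive result means the EMG pattern occurs earlier than the
    matching f0 pattern, i.e. the muscle activity leads.
    """
    def centered(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    e, f = centered(emg), centered(f0)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, e_value in enumerate(e):
            j = i + lag
            if 0 <= j < len(f):
                score += e_value * f[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic data: the f0 bump is the EMG bump delayed by two frames.
emg_envelope = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
f0_contour = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
print(best_lead_in_frames(emg_envelope, f0_contour, max_lag=4))  # 2
```

Multiplying the returned lag by the frame interval gives the lead in time units; the frame interval itself depends on how the signals were sampled and is not specified here.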

Although the mechanism of pitch elevation seems quite clear, the mechanism of pitch lowering is not so
straightforward. The contribution of the extrinsic laryngeal muscles, such as the sternohyoid, is assumed to be
significant, but their activity often appears to be a response to, rather than the cause of, a change in
conditions: the activity does not occur prior to the physical effects of pitch change.

4.4 Elevation and lowering of the entire larynx

This type of laryngeal adjustment is typically observed in the action of swallowing, as well as during speech
for vocal pitch control and the voiced vs. voiceless distinction. However, the contribution of these movements
to phonetic distinctions still needs to be investigated, except for specific laryngeal adjustments such as
those for ejective and implosive sound production, in which the entire larynx is elevated or lowered, respectively,
and those for generating or maintaining vocal fold vibration while the vocal tract is closed.

5 Current main issues and the direction of future research

The science of speech production is an inherently interdisciplinary endeavor. Thus, in recent years,
multidisciplinary approaches, including physiological, engineering, and linguistic aspects, have attempted to
disclose the fine nature of laryngeal behavior in voice and speech production. For the purpose of facilitating
the exchange of information among different research domains, a series of conferences on vocal fold
physiology has been held since 1981, and the proceedings of the latest conference were published in 1994
(Fujimura, 1994).

Figure 4.11 Comparison of the time courses between the averaged EMG curves of CT and f0
contours for test words having different stress positions.

In the domain of physiological research, simultaneous recordings of multiple parameters are widely
performed. For example, ultra-high-speed observation of the vocal fold vibratory pattern has been made in
combination with precise acoustic measures, together with the assessment of other physiological parameters
such as EGG (Childers et al., 1983). From an engineering standpoint, numerical simulation and modeling of
the voice source based on physiological data have often been reported (Bickley, 1991; Cranen, 1991).

Another important issue is to investigate the nature of pathological voice production. Evaluation of abnormal
voice quality associated with laryngeal diseases has attracted the interest of laryngologists, and the
measurement of many different acoustic parameters has been proposed to quantitatively represent the
degree of voice abnormality (Imaizumi, 1985).

Further, simultaneous recordings of vibratory patterns of the vocal folds and voice signals have led to a
direct comparison between the temporal variation in vocal fold vibration and perturbation of voice, thus
giving a physiological basis for abnormal voice production (Kiritani et al., 1993).
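
As an example of a perturbation measure of this kind, the sketch below computes local jitter, a commonly used index of cycle-to-cycle variation in the glottal period. It is offered only as an illustration: the text does not say which measures the cited studies used, and the period values are invented.

```python
def local_jitter_percent(periods_ms):
    """Mean absolute difference between consecutive periods,
    expressed as a percentage of the mean period."""
    if len(periods_ms) < 2:
        raise ValueError("at least two periods are required")
    differences = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    mean_difference = sum(differences) / len(differences)
    mean_period = sum(periods_ms) / len(periods_ms)
    return 100.0 * mean_difference / mean_period


# Invented period sequences (ms): one perfectly regular, one alternating.
regular = [10.0, 10.0, 10.0, 10.0]
perturbed = [10.0, 10.2, 10.0, 10.2]
print(local_jitter_percent(regular))
print(round(local_jitter_percent(perturbed), 2))
```

A perfectly periodic sequence gives zero, and larger cycle-to-cycle irregularity gives a larger value; an analogous calculation on cycle amplitudes would give a shimmer-type measure.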

As for future research, it seems that basic studies on laryngeal structure and function are still needed. In
particular, we still lack details of neural control in the human larynx, including: 1) efferent nerve cell
distribution in the brainstem, and exact neural pathways to the laryngeal muscles from the central nervous
system, 2) the control of the larynx by the autonomic nervous system, and 3) the cerebellar control of
laryngeal timing. These points need to be investigated in future research.

In addition, further study is needed of the physical properties of the laryngeal framework, for example the
network of blood vessels within the larynx, the surface microstructure and the physical properties of the
laryngeal mucosa, and the vocal fold vibratory patterns in different laryngeal conditions under different
emotional states. All of these should be suitable topics for future basic research.

