Electrophysiology of Larynx


Neurons are cells specialized for the integration and propagation of electrical events. It is
through such electrical activity that neurons communicate with each other as well as with
muscles and other end organs. Therefore, an understanding of basic electrophysiology is
fundamental to appreciating the function and dysfunctions of neurons, neural systems, and
the brain.
Ever since the inception of the field, speech pathologists have relied heavily, almost
exclusively, on their highly trained ears for judgments of speech acceptability. When disorder
was detected, the clinician typically relied upon his own auditory perception for insight into its
An extensive range of technology is available to analyse the voice and the physiology of
It has become an essential part of assessment and monitoring, and is also used for
biofeedback during voice therapy.
Instrumentation now contributes significantly to making diagnoses and to documenting
treatment outcomes (Sataloff et al., 1990).
Instrumentation tends to be expensive but the cost to benefit ratio warrants the investment in
good tools.
A solid understanding of the equipment at hand allows clinicians to modify or
interconnect it in novel and creative ways to solve complex measurement problems.
Today's instruments are often much more complex, more accurate, and more reliable than the
How are the vocal cords and larynx examined?
An examination of the internal structures of the larynx, including the vocal folds, is called
laryngoscopy. There are three principal ways to perform laryngoscopy, reviewed below. Each
of these may be appropriate in certain circumstances, but none of these methods
Methods/instrumentation:
Video Stroboscopy
Vocal Tract Imaging
Acoustic Analysis
Aerodynamic Measures
Aims of the instrumental assessment:
a) To provide objective information regarding the voice, the anatomy/physiology of the
larynx and vocal tract, and respiratory function for phonation.
b) To provide baseline measurements of the voice and vocal function.
c) To monitor treatment progress.
d) To provide outcome measures.
Electromyography (EMG) is a technique for evaluating and recording physiologic properties of
muscles at rest and while contracting.
Electromyography is the only procedure that directly demonstrates muscular activity.
It is particularly useful as one of the methods of clinical examination of vocal fold
paralysis and functional voice disorders.
Mechanism of recording EMG:
A muscle fiber maintains a steady potential across its membrane (inside negative) at rest.
When a nerve impulse arrives at the nerve ending, a chemical transmitter substance,
acetylcholine, is liberated from the nerve ending onto the motor end plate of the muscle.
This induces depolarization of the muscle fiber membrane, producing an action potential.
The action potential is transmitted along the muscle fiber in both directions at a speed of
approximately 4 meters per second, exciting the contractile mechanism of the fiber in its wake.
The muscle fiber begins to contract after an interval of approximately 1 msec.
When an electrode is placed outside the muscle fiber membrane, the action potential can be
recorded.
Since the action potential is very small (on the order of 0.1 to 1 mV), amplification is
required in order to record it.
The graphic display obtained in this way from several muscle fibers is called an
electromyogram.
The procedure used to obtain an electromyogram is called electromyography, and the apparatus
for electromyographic recording is called an electromyograph.
No action potential is recorded in normal resting muscle.
During voluntary contraction of a normal muscle, all the muscle fibers innervated by a single
lower motor neuron act together. The tiny action potentials of the individual muscle fibers
sum to produce a larger action potential.
The electromyograph detects the electrical potentials generated by muscle cells during
contraction and at rest.
The electromyograph essentially consists of an electrode system, an amplifier, a cathode ray
oscilloscope, a loudspeaker and a recording system.
Muscle action potentials are detected by conductors called electrodes, placed in the region of
the electrical disturbance.
At least three important criteria must be met by the electrode:

It must not interfere with or alter normal motor function.
It must be able to move with the muscle without generating spurious signals (movement artifacts).
It must be usable in confined areas such as the oral cavity.
In examining the disorders of the motor units, a needle electrode should be used.
In order to avoid interfering signals from adjacent muscles, a bipolar needle electrode is
preferable to a monopolar concentric needle electrode.
Monopolar needle electrodes simply use the bare tip of a needle as an electrode.
Two monopolar needles are inserted into the muscle to record the electrical activity in the
zone between them.
When the kinesiological pattern of a muscle is examined, hooked-wire electrodes (used to
examine the speech musculature) should be used.
They have the following important advantages:

They offer minimum discomfort to the subject and do not interfere with normal phonation.
They permit considerable localization of the area from which electrical activity is recorded.
They stay fairly well in place regardless of rapid movement of the vocal folds.
Insertion of the electrode into the intrinsic laryngeal muscles:
Earlier, the needle electrode was inserted into most of the laryngeal muscles through the
mouth, but this often made it difficult for the subject to phonate normally.
Hirano et al. (1962) described techniques for inserting a needle electrode into the muscles
through the cervical skin.
The method of insertion into each muscle is as follows:
Cricothyroid muscle: the skin is pierced at a point above the lower edge of the
cricoid cartilage and lateral to the midline.
Lateral cricoarytenoid (LCA) muscle: the needle is inserted through the cricothyroid
space, penetrating the cricothyroid muscle. The needle is directed posteriorly,
laterally and upwards until the LCA muscle is pierced.
Posterior cricoarytenoid muscle: the procedure is similar to that for the LCA, but the
needle is inserted some 5-10 mm deeper.
Vocalis muscle: the needle is inserted into the subglottal space through the
cricothyroid space at the midline. It is easier to insert the needle into the vocal fold
during phonation than during respiration.
Interarytenoid muscle: the needle is inserted into the subglottal cavity through the
cricothyroid space at the midline. The needle is then pushed upward and backward.
Processing & display of EMG:
EMG signals are very small, ranging from about 200 microvolts down to about 1 microvolt, so
amplifier gain is needed to obtain an output of sufficient amplitude for recording.
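As a rough illustration of the gain involved, the sketch below computes the amplification needed to bring the signal levels quoted above up to a recording level. The 1 V target and the function name required_gain are assumptions made for this example, not values from the text.

```python
import math

def required_gain(input_volts, target_output_volts):
    """Return (linear gain, gain in dB) needed to scale input up to target."""
    gain = target_output_volts / input_volts
    return gain, 20.0 * math.log10(gain)

# Assumed target: 1 V at the recording stage (hypothetical value).
TARGET_VOLTS = 1.0

for label, amplitude in (("200 microvolts", 200e-6), ("1 microvolt", 1e-6)):
    gain, gain_db = required_gain(amplitude, TARGET_VOLTS)
    print(f"{label}: gain of about {gain:,.0f} ({gain_db:.0f} dB)")
```

The smaller the signal, the larger the gain, which is why low-noise amplification matters most at the microvolt end of the range.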
The appearance of the electromyogram depends on the volume of muscle seen by the
electrodes and the nature of the task being examined.
The frequency of the spikes of electrical activity can exceed 40 per second, making the
raw electromyogram difficult to interpret during a speech event.
The number of spikes per second is a direct measure of the degree of muscle activation.
EMG signals are rectified and integrated for analysis.
To mitigate this problem, special computer averaging techniques have been developed. These
techniques involve sampling many repetitions of the same utterance.
The EMG data are analysed and stored in a computer.
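The processing chain described above (rectify, integrate, then average over repetitions) can be sketched as follows. This is a minimal illustration on synthetic data, not a clinical implementation: the moving-average form of the integrator, the window length, the 30 repetitions and the synthetic_emg test signal are all assumptions.

```python
import random

def rectify(signal):
    """Full-wave rectification: absolute value of every sample."""
    return [abs(x) for x in signal]

def integrate(signal, window):
    """Moving-average 'integration' over a trailing window of samples."""
    out, running = [], 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            running -= signal[i - window]
        out.append(running / min(i + 1, window))
    return out

def ensemble_average(repetitions):
    """Average time-aligned repetitions sample by sample."""
    count = len(repetitions)
    return [sum(samples) / count for samples in zip(*repetitions)]

def synthetic_emg(rng, n_samples=200):
    """Hypothetical test signal: noise that grows during samples 80-119."""
    return [rng.gauss(0.0, 1.0 if 80 <= i < 120 else 0.1)
            for i in range(n_samples)]

rng = random.Random(0)
envelopes = [integrate(rectify(synthetic_emg(rng)), window=20)
             for _ in range(30)]
average = ensemble_average(envelopes)
print(f"rest level: {average[50]:.2f}, burst level: {average[115]:.2f}")
```

Averaging only helps if the repetitions are aligned to the same point in the utterance, which is why the same utterance is repeated many times.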

EMG is used to diagnose two general categories of disease: Neuropathies and Myopathies.

Neuropathic disease has the following defining EMG characteristics:

An action potential amplitude that is twice normal due to the increased number of fibres per motor
unit because of reinnervation of denervated fibres.
An increase in duration of the action potential

A decrease in the number of motor units in the muscle (as found using motor unit number estimation)

Myopathic disease has these defining EMG characteristics:

A decrease in duration of the action potential

A reduction in the area to amplitude ratio of the action potential

A decrease in the number of motor units in the muscle (in extremely severe cases only)

Directly demonstrates muscular activity.
Can be effectively used in examining patients with functional voice disability.
Can be applied to the study and evaluation of muscle and nerve pathology and
neuromuscular disorders.
Gives information about the degree and extent of paralysis.
Helpful in differentiating vocal fold paralysis.
Useful in determining the side of lateral fixation of vocal fold in case of bilateral vocal fold
Helpful from a prognostic point of view.

Clinical applications of laryngeal EMG:

Laryngeal EMG can be useful in the diagnosis of a variety of disorders affecting the laryngeal
muscles or their innervation. Some of the most common situations in which laryngeal EMG can
be helpful include:
Lower motor neuron disorders

Evaluation of recurrent or superior laryngeal nerve palsy /paralysis

Prognosis of recovery of vocal fold paralysis.

Differentiation of paralysis and arytenoid cartilage fixation.

Malingering and psychogenic dysphonia

Basal ganglia disorder

Laryngeal dystonia and tremors

Myopathic disorders

Neuromuscular junction disorders.

Upper motor neuron disorders.

Human tissue is a moderately good conductor of electricity. It behaves like a resistor to which
Ohm's law applies: the current through a given structure is proportional to the applied
voltage and inversely proportional to the net resistance. If a particular amount of current
flows through a particular structure, the voltage developed across it is proportional to its
resistance.
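A small worked example of Ohm's law may help here. The numbers are hypothetical and chosen only to show the proportionality: with a fixed current, a 1 % fall in resistance produces a 1 % fall in the voltage developed.

```python
def voltage_across(current_amps, resistance_ohms):
    """Ohm's law: V = I * R."""
    return current_amps * resistance_ohms

def current_through(voltage_volts, resistance_ohms):
    """Ohm's law rearranged: I = V / R."""
    return voltage_volts / resistance_ohms

# Hypothetical values: a fixed 1 mA current through a structure whose
# resistance falls from 100 to 99 ohms.
CURRENT_AMPS = 1e-3
for resistance in (100.0, 99.0):
    millivolts = voltage_across(CURRENT_AMPS, resistance) * 1e3
    print(f"R = {resistance:.0f} ohm -> V = {millivolts:.1f} mV")
```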
The electroglottograph (EGG) is a device that enables monitoring of variations in vocal fold
contact by measuring motion-induced variations in the impedance of the neck tissue in the area
of the vocal folds.
This represents one of the few noninvasive methods available for obtaining useful information
about the vibratory patterns of the vocal folds.
Components of an electroglottograph:

1. Electrodes: the electrodes are made of copper, silver or gold. They take the form of rings
or rectangles covering an area ranging from 3 to 9 cm².
A third electrode is often used as a reference for impedance measurements.
It may be designed as a separate electrode or as a ring electrode encircling each
electrode. The electrodes are usually mounted on a flexible band whose length
may be adjusted so that the subject can still speak and breathe comfortably and naturally.

2. The processing and display unit: the received signal is demodulated by a signal
detector circuit. The typical signal-to-noise ratio of the demodulator is about 40 dB. The
demodulated waveform is then A/D converted and stored in a computer.
Suitable additions to the standard configuration of the device include instruments
for measuring the signal strength (e.g. a light-emitting diode) or for measuring
signal symmetry, showing the relation between the signals emitted by the two
electrodes.
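To put the 40 dB signal-to-noise figure in concrete terms, the sketch below converts decibels to amplitude and power ratios. The function names are illustrative only.

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude (voltage) ratio corresponding to a level in dB."""
    return 10.0 ** (db / 20.0)

def db_to_power_ratio(db):
    """Power ratio corresponding to a level in dB."""
    return 10.0 ** (db / 10.0)

def amplitude_ratio_to_db(ratio):
    """Level in dB corresponding to an amplitude (voltage) ratio."""
    return 20.0 * math.log10(ratio)

snr_db = 40.0
print(f"{snr_db:.0f} dB: amplitude ratio {db_to_amplitude_ratio(snr_db):.0f}, "
      f"power ratio {db_to_power_ratio(snr_db):.0f}")
```

A 40 dB ratio therefore means the demodulated signal amplitude is about one hundred times the noise amplitude.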
How does EGG work?
All EGG devices work with a high-frequency sinusoidal signal at a safe level (usually 30 mV).
A signal generator supplies the electrodes with a sinusoidal alternating current whose
frequency usually ranges from 300 kHz to 5 MHz.
This frequency is sufficiently high that the current bypasses the less conductive skin.
The supplied current does not exceed the milliampere range and the voltage is not more than 0.5 V.
The measuring electrodes are attached according to the manufacturer's recommendations.
Some manufacturers advise using electrode paste; others advise against it.
There is no firm rule for the exact location of the electrodes, but in general they should be
lateral to the thyroid cartilage at a level that approximates the position of the vocal folds.
The patient should be positioned, preferably with a head support, so as to minimize
movement during testing.
EGG Wave Form:
Six features of the laryngeal vibratory cycle are described:
Opening time (line a): the time taken by the vocal folds to move from the midline (adducted)
to the abducted position.
Closing time (line c): the time taken by the vocal folds to move from the abducted to the
adducted position.
Open time (line d): the duration for which the vocal folds remain completely apart.
Close time (line b): the time for which the vocal folds remain completely closed. It is the
same as the closed phase.
Open phase (line a + d + c): the duration from the initiation of vocal fold abduction to the
complete closure of the vocal folds (adduction).
Closed phase (line b) = closed time.

EGG waveforms in different types of voice

Glottal closure is marked by a sharp rise of the waveform (a sudden increase in vocal fold
contact), while the opening phase is gradual, with a moderate slope.
Breathy Voice:
The fall phase is longer and the flow cut is more gradual
The glottal pulse is more symmetrical
High Open Quotient
High peak glottal flow
Lower pitch
Whispery voice:
High OQ, but lower than for breathy voice
Pulses more skewed than for breathy voice, but more symmetrical than for normal voice
A high peak glottal flow, but lower than for breathy voice, which implies that H1 is lower in the
source spectrum.

Very high pitch
Rather low glottal peak flow
Often with the glottis slightly open; thus, the effect of turbulent flow is observed in the spectrum
Pulses quite symmetrical.

Tense voice:
Sharp cutoff of the glottal flow, boosting the higher spectral components; very high
skewness of the glottal pulse following a longer rise time
Small Open Quotient
Low frequency components of spectrum attenuated in comparison to the higher components.

Lax Voice:
Comparable to breathy voice
Long rise time of the glottal pulse
EGG jitter: the cycle-to-cycle variability of the fundamental frequency in the EGG signal.
EGG shimmer: the cycle-to-cycle variability of the peak-to-peak amplitude in the EGG signal.
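One common way to turn "cycle-to-cycle variability" into a number is the mean absolute difference between consecutive cycles, expressed as a percentage of the mean. The sketch below uses that definition as an assumption; the text does not say which formula a given EGG system applies, and the per-cycle values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive cycles, as % of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value

# Hypothetical per-cycle measurements taken from an EGG recording.
f0_per_cycle = [200.0, 202.0, 199.0, 201.0, 200.0]      # Hz
amplitude_per_cycle = [1.00, 0.97, 1.02, 0.99, 1.01]    # arbitrary units

jitter = relative_perturbation(f0_per_cycle)
shimmer = relative_perturbation(amplitude_per_cycle)
print(f"EGG jitter: {jitter:.2f} %, EGG shimmer: {shimmer:.2f} %")
```

The same function serves both measures because jitter and shimmer differ only in which per-cycle quantity is tracked.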

Closed Quotient (CQ): the ratio between the closed phase and the complete cycle.
CQ = Tc / (Tc + To), where Tc is the duration of the closed phase and To that of the open phase.
If CQ < 0.4 -> hypoadduction
If CQ > 0.6 -> hyperadduction
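The quotient and the two thresholds above translate directly into a few lines of code. The function names and the example durations are illustrative; the 0.4 and 0.6 cut-offs are the ones given in the text.

```python
def closed_quotient(closed_time, open_time):
    """CQ = Tc / (Tc + To); both durations in the same unit."""
    return closed_time / (closed_time + open_time)

def classify_adduction(cq):
    """Apply the 0.4 / 0.6 thresholds quoted in the text."""
    if cq < 0.4:
        return "hypoadduction"
    if cq > 0.6:
        return "hyperadduction"
    return "within the 0.4-0.6 range"

# Hypothetical cycles: (closed time, open time) in milliseconds.
for closed_ms, open_ms in ((1.5, 3.5), (2.5, 2.5), (3.5, 1.5)):
    cq = closed_quotient(closed_ms, open_ms)
    print(f"Tc = {closed_ms} ms, To = {open_ms} ms -> "
          f"CQ = {cq:.2f} ({classify_adduction(cq)})")
```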

Contact Index (CI):

CI is an indication of the symmetry of the contact phase.

Contact Quotient Perturbation (CQP):

This measure is the cycle-to-cycle variability of the Contact Quotient.

Inverse filtering

Under certain assumptions about the vocal tract, the waveform of the airflow pulses at the
glottis during voiced speech or singing can be obtained by processing the waveform of the
oral volume velocity (volume airflow at the lips) with an analog or digital filter having a
transfer function (frequency response) which is the inverse of that of the vocal tract while
the glottis is closed or almost closed. (Rothenberg, 1973, 1977)
The most significant assumption is that vocal tract can be represented by a hard-walled
tube of possibly non-uniform diameter, which is closed at the end representing the sound
source (at the glottis) and open at the other, radiating end (at the mouth). These
assumptions result in a transfer function with only poles (resonances or formants) and no
zeroes (anti-resonances, such as that introduced by nasalization).
There is also an implicit assumption of airflows and pressures throughout the vocal tract
such that the laws of linear acoustics hold and that there are no significant sources of
acoustic energy within the tract. Under these conditions, the transfer function of an inverse
filter would consist of a series of zeroes or anti-resonances having frequencies and
damping values that match those of the lowest poles or formants of the vocal tract.
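The idea that an anti-resonance can cancel a formant can be sketched with second-order digital filter sections. The sampling rate, formant frequencies and bandwidths below are assumed values, the pulse train is only a crude stand-in for glottal flow, and the mapping from bandwidth to pole radius is the standard textbook approximation rather than anything specified in the text.

```python
import math

def section_terms(formant_hz, bandwidth_hz, sample_rate_hz):
    """Map one formant to (b1, b2) of the polynomial 1 - b1*z^-1 + b2*z^-2."""
    radius = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    angle = 2.0 * math.pi * formant_hz / sample_rate_hz
    return 2.0 * radius * math.cos(angle), radius * radius

def resonator(x, b1, b2):
    """All-pole section (one formant): y[n] = x[n] + b1*y[n-1] - b2*y[n-2]."""
    y = []
    for n, sample in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(sample + b1 * y1 - b2 * y2)
    return y

def anti_resonator(y, b1, b2):
    """All-zero section (the inverse): x[n] = y[n] - b1*y[n-1] + b2*y[n-2]."""
    out = []
    for n, sample in enumerate(y):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        out.append(sample - b1 * y1 + b2 * y2)
    return out

SAMPLE_RATE = 10_000                          # Hz, assumed
FORMANTS = [(500.0, 60.0), (1500.0, 90.0)]    # (frequency, bandwidth), assumed

# Crude stand-in for the glottal source: one pulse every 100 samples.
source = [1.0 if n % 100 == 0 else 0.0 for n in range(400)]

speech = source
for freq, bandwidth in FORMANTS:
    speech = resonator(speech, *section_terms(freq, bandwidth, SAMPLE_RATE))

recovered = speech
for freq, bandwidth in FORMANTS:
    recovered = anti_resonator(recovered, *section_terms(freq, bandwidth, SAMPLE_RATE))

worst_error = max(abs(a - b) for a, b in zip(source, recovered))
print(f"largest reconstruction error: {worst_error:.2e}")
```

The cancellation is essentially exact here only because the inverse filter uses the same frequencies and bandwidths as the model tract; in practice these must be estimated, which is where the error enters.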

Indirect measurements of the glottal waveform involve inverse filtering (IF).

It is assumed that the effects of the vocal tract and lip radiation are cancelled out by the use
of an inverse (whitening) filter.

Inverse filtering (IF) is used only for voiced speech segments.

The main advantage of this method lies in the possibility of scaling results in flow units
(cm³/s) and in its noninvasive nature.

Since the speech signal is assumed to have originated in an all-pole system, this all-zero
filter is simply the inverse of the assumed all-pole system function.

There are two techniques of inverse filtering (Javkin et al., 1987; Ladefoged et al., 1988).
For the first technique, the data are recorded using a reference-quality condenser
microphone with a flat frequency response beginning at a very low frequency (even 0 Hz)
and extending up to 5 or 8 kHz.
The advantage of this technique is its wide frequency response which facilitates a detailed
representation of the glottal flow signal.

Its disadvantage lies in the fact that when this procedure is used, the DC component is not
registered. Thus, the results cannot be calibrated and are relative.
Additionally, the whole recording channel must preserve the phase of the speech signal.
Also, there are very strict conditions on how recordings are to be made (Jackson et al.,
1985; Karlsson, 1988).

For the second technique the airflow is registered through a face mask (Rothenberg,
1973), which allows the recording of a DC flow component and the calibration of the
measurements in physical units.
In this technique the useful frequency response is flat (within ±3 dB) from 0 Hz to about
1000-1500 Hz, which limits the accuracy with which the glottal pulse can be recovered.
Especially information about the abruptness of the vocal fold closure is lost (Hanson,
During recordings the mask must seal the face perfectly, since a leak would seriously affect
the measurement. This also requires that the stimuli be limited to syllables that are
produced with the jaw moderately open.
Moreover, as Ladefoged et al. (1988) point out, this kind of additional apparatus is quite
difficult to use in the field.
In some experiments oral pressure is additionally recorded to derive the interpolated
subglottal pressure (Rothenberg, 1981), which in turn may be used to estimate the size of
the glottal area (Ishizaka & Flanagan, 1972; Titze & Talkin, 1979; Ananthapadmanabha &
Fant, 1982).
However, this is a complicated process that is prone to errors (Fant, 1980; Cranen & Boves,
1987) due to the limitations of the assumed speech production model as well as the
complex methodology.

For the purpose of inverse filtering, the vocal tract is approximated as an acoustic tube of a
given length composed of a number of sections with different section areas.
This is equivalent to modelling the sampled vocal tract transfer function H(z) as a
superposition of a given number of spectral poles.

The sound pressure radiated from the mouth to the surrounding air is proportional to the
time derivative of the lip volume velocity flow (Davis, 1978), which is generally
approximated as a high-pass filter with a spectral slope of +6 dB/octave. In the inverse
filtering technique the frequencies and bandwidths of the poles are estimated by
autoregressive (AR) modelling of the signal. This method is also called linear prediction
(LPC) because a linear combination of the previous samples is used to predict the
next sample.
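A minimal sketch of linear prediction follows. It uses the autocorrelation method with the Levinson-Durbin recursion, which is a swapped-in choice made for brevity and not the covariance variants cited in this text. The two-pole test system, its coefficients and the sample count are hypothetical.

```python
import random

def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] and the final prediction error."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        reflection = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / error
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error

def inverse_filter(x, coeffs):
    """Residual e[n] = x[n] - sum_k coeffs[k-1] * x[n-k] (an all-zero filter)."""
    out = []
    for n in range(len(x)):
        prediction = sum(c * x[n - k]
                         for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - prediction)
    return out

# Hypothetical test signal: white noise passed through a known two-pole system.
TRUE_COEFFS = [1.3, -0.8]
rng = random.Random(1)
signal = []
for n in range(5000):
    past1 = signal[n - 1] if n >= 1 else 0.0
    past2 = signal[n - 2] if n >= 2 else 0.0
    signal.append(rng.gauss(0.0, 1.0)
                  + TRUE_COEFFS[0] * past1 + TRUE_COEFFS[1] * past2)

estimated, _ = levinson_durbin(autocorrelation(signal, 2), order=2)
residual = inverse_filter(signal, estimated)
residual_power = sum(e * e for e in residual) / len(residual)
print("estimated predictor coefficients:", [round(c, 2) for c in estimated])
print(f"residual power: {residual_power:.2f}")
```

The estimated coefficients should land close to the ones used to generate the signal, and the residual approximates the white-noise excitation, which is the sense in which the inverse filter "whitens" its input.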
An inverse filter (1/H(z)) is applied to every pitch period of the speech signal, and the
resulting signal is regarded as an approximation of the source signal. There are two main
strategies for the estimation of the vocal tract transfer function:
a) the pitch-synchronous covariance method (Rabiner & Schafer, 1978) or the circular
correlation method (Paliwal & Rao, 1981), where the parameters of the LPC model are
estimated for a full pitch period;
b) the closed-phase covariance method (Wong et al., 1979), where, as the name suggests,
the covariance method is applied to the closed phase of a pitch period and where p
samples are taken from the previous open phase (p is equal to the model order).
The latter strategy additionally calls for an identification of the closed phase during a pitch
period but is nevertheless judged to be more effective.
Both methods require a marking of the pitch periods, which is usually done by marking the
instants of glottal closing (called CGIs or epochs). Although several methods of epoch
detection based on the processing of the speech waveform have been proposed (Rabiner &
Schafer, 1978; Hess, 1991; Strube, 1974; Ma et al., 1994; Cheng & O'Shaughnessy, 1989;
Childers & Ahn, 1995), the task is not trivial and the results, especially for distorted speech,
are often unsatisfactory. In order to achieve more accuracy, a technique of two-channel
processing is widely used. For two-channel processing, the CGIs as well as the instants of
glottal opening are provided by other means, for example through electroglottography (see
section 8) (Krishnamurthy & Childers, 1986; Pinto et al., 1989).
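As an illustration of how an EGG channel could supply closure instants, the sketch below marks the points where the first difference of the EGG signal rises through a threshold. This thresholding rule is an assumption for the example, not the procedure of the cited papers. It also assumes the EGG is oriented so that increasing values mean increasing vocal fold contact, and the synthetic_egg waveform is hypothetical.

```python
def closure_instants(egg, fraction=0.5):
    """Indices where the EGG first difference rises through a threshold."""
    diff = [0.0] + [b - a for a, b in zip(egg, egg[1:])]
    threshold = fraction * max(diff)
    return [n for n in range(1, len(diff))
            if diff[n] >= threshold and diff[n - 1] < threshold]

def synthetic_egg(period=100, rise=5, cycles=5):
    """Hypothetical EGG: fast rise in contact, slow fall, repeated each cycle."""
    one_cycle = [p / rise if p < rise else 1.0 - (p - rise) / (period - rise)
                 for p in range(period)]
    return one_cycle * cycles

egg = synthetic_egg()
epochs = closure_instants(egg)
periods = [b - a for a, b in zip(epochs, epochs[1:])]
print("closure instants (samples):", epochs)
print("pitch periods (samples):", periods)
```

The spacing between successive instants gives the pitch periods that the inverse-filtering strategies above need as input.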

Historically, the first inverse filters were used interactively: the operator adjusted the
frequencies and bandwidths of the filter until the results satisfied the researcher's
expectations (Miller, 1959; Wong & Markel, 1976).

Assumptions of inverse filtering:

Speech is produced by a linear system in which a source signal is modified by a vocal tract filter.
The system is stationary during an analysis interval.
The glottal pulse spectrum is flat.
The all-pole model of vocal tract characteristics is correct.
The estimates of the bandwidths of spectral poles are correct.

In all proposed models for the production of human speech, an important variable is the
waveform of the airflow, or volume velocity, at the glottis. The glottal volume velocity
waveform provides the link between movements of the vocal folds and the acoustical
results of such movements, in that the glottis acts approximately as a source of volume
velocity. That is, the impedance of the glottis is usually much higher than that of the vocal
tract, and so glottal airflow is controlled mostly (but not entirely) by glottal area and
subglottal pressure, and not by vocal-tract acoustics. This view of voiced speech production
is often referred to as the source-filter model.

A technique for obtaining an estimate of the glottal volume velocity waveform during voiced
speech is the inverse-filtering of either the radiated acoustic waveform, as measured by
a microphone having a good low frequency response, or the volume velocity at the mouth,
as measured by a pneumotachograph at the mouth having a linear response, little speech
distortion, and a response time of under approximately 1/2 ms. A pneumotachograph
having these properties was first described by Rothenberg[1] and termed by him a
circumferentially vented mask or CV mask.

As practiced, inverse-filtering is usually limited to non-nasalized or slightly nasalized
vowels, and the recorded waveform is passed through an inverse-filter having a transfer
characteristic that is the inverse of the transfer characteristic of the supraglottal vocal tract
configuration at that moment. The transfer characteristic of the supraglottal vocal tract is
defined with the input to the vocal tract considered to be the volume velocity at the glottis.
For non-nasalized vowels, assuming a high-impedance volume velocity source at the
glottis, the transfer function of the vocal tract below about 3000 Hz contains a number of
pairs of complex-conjugate poles, more commonly referred to as resonances or formants.
Thus, an inverse-filter would have a pair of complex-conjugate zeroes, more commonly
referred to as an anti-resonance, for every vocal tract formant in the frequency range of

If the input is from a microphone, and not a CV mask or its equivalent, the inverse filter also
must have a pole at zero frequency (an integration operation) to account for the radiation
characteristic that connects volume velocity with acoustic pressure. Inverse filtering the
output of a CV mask retains the level of zero flow[1], while inverse filtering a microphone
signal does not.
Inverse filtering depends on the source-filter model and on a vocal tract filter that is a
linear system; however, the source and filter need not be independent.
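The point about the microphone signal losing the level of zero flow can be shown with a short sketch. It assumes the simplest discrete model, in which radiated pressure is the first difference of the flow and the integrator is a running sum. Both are simplifications, and the flow values are hypothetical.

```python
import math

def radiate(flow):
    """Assumed radiation model: pressure follows the first difference of flow."""
    return [0.0] + [b - a for a, b in zip(flow, flow[1:])]

def integrate(pressure, leak=1.0):
    """Running sum; with leak == 1.0 this is a pole at zero frequency."""
    out, total = [], 0.0
    for sample in pressure:
        total = leak * total + sample
        out.append(total)
    return out

# Hypothetical flows: the same pulsating component with and without a
# steady (DC) leakage flow of 0.3 added.
pulsating = [max(0.0, math.sin(2.0 * math.pi * n / 50.0)) for n in range(200)]
flow_with_dc = [0.3 + value for value in pulsating]

from_plain = integrate(radiate(pulsating))
from_offset = integrate(radiate(flow_with_dc))

difference = max(abs(a - b) for a, b in zip(from_plain, from_offset))
print(f"largest difference between the two recovered flows: {difference:.2e}")
```

Both flows produce the same pressure signal, so integration recovers the pulse shape but not the steady component. The leak parameter is included because a value slightly below 1.0 is often preferred in practice to stop slow drift from accumulating; that choice is not discussed in the text.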

The fact that an inverse filter can yield a very believable waveform having a flat (constant
value) segment at or near zero flow during the glottal closed phase of normal, non-breathy
voicing indicates that these assumptions, and others pertaining to the linearity and
frequency response of the CV-mask and transducer system described below, are generally
warranted. This period immediately following glottal closure is the greatest test of an
inverse filter, since it is during this period that the acoustic energy to be removed is
