Journal of Voice

Vol. 12, No. 2, pp. 143-150

© 1998 Singular Publishing Group, Inc.

A Hardware-Software System for Analysis of Video Images

*Maria Inês Gonçalves and †Rebecca Leonard

*Universidade Camilo Castelo Branco, Centro de Estudos da Voz, and Instituto da Laringe, São Paulo, Brazil
†University of California, Davis, Medical Center, Sacramento, California, U.S.A.

Summary: The purpose of this paper is to describe a software/hardware system for the analysis of digitized video images and a number of applications for which it may be used. The system described includes a Macintosh computer, a frame-grabber board, and Image, a public domain software program available at no cost from the U.S. National Institutes of Health. In our clinic and laboratory, this system is routinely used to make quantitative measurements from videofluoroscopic x-ray images of dynamic swallow studies and studies performed to assess velopharyngeal dysfunction in speech. It can also be used to examine various laryngeal parameters obtained from videotaped endoscopic and stroboscopic examinations. With a videocamera attached to a microscope, the system permits quantitative analysis of tissue characteristics, e.g., thickness of epithelial or connective tissue layers of the vocal folds. The relatively low cost and ease of use of the image analysis system make it a particularly attractive option when quantitative assessment of clinical or research materials in video format is desirable. Key Words: Video image analysis.

Quantitative analyses of videotaped images can be a desirable objective for both researchers and clinicians. In the past, most image measurements were manually performed, and calculations were subject to a large margin of error. New possibilities have been realized with advances in computer hardware and software. Currently available products improve precision, provide shortcuts, save time in analysis, and permit manipulation of images in ways that enhance their quality prior to measurement. At present, powerful digital image-processing techniques are readily available, and in some cases, at prices that are within the reach of even the most economy-conscious budgets.

Accepted for publication December 4, 1996.
Address correspondence and reprint requests to Rebecca Leonard, Ph.D., University of California, Davis, Department of Otolaryngology, 2521 Stockton Blvd., Suite 7200, Sacramento, California, 95817.

SYSTEM DESCRIPTION

One such system, heavily used at our Center, is based on the software program Image, initially developed by Wayne Rasband at the U.S. National Institute of Mental Health. (The Image program can be downloaded from the Internet at the Image Home Page. It can also be obtained by contacting the National Institute of Mental Health in Bethesda, MD, U.S.A.) This software is a public domain image processing and analysis program for Macintosh computers (a PC version which runs on Windows 95 is also available). It requires a computer (from Mac II to contemporary models) with at least 8 MB of RAM, and a monitor
with the capacity to display 8-bit or 16-bit images (256 gray levels). A frame-grabber board is also required to digitize video images. Boards made by Data Translation* or Scion** support the NIMH program, and are available for around $1000. Alternatively, Image supports QuickTime digitizers, such as those built into AV Macs and selected PowerMacs.

With Image and appropriate frame-grabber hardware, a user can acquire, display, edit, enhance, analyze, print, and animate images directly from a videocamera, or from a VCR. Once an image is captured, it can be subjected to a number of enhancements, including contrast and brightness adjustments, smoothing, sharpening, edge detection, and a variety of filtering processes. Digitized images can be rotated, inverted, scaled, and manipulated in several other ways as well. The program can be used to measure areas, path lengths and angles, to average gray values, and to determine center and angle of orientation of defined regions of interest. An additional feature of the program is its capability to perform automated particle analysis. Editing of color and grayscale images (such as that seen with MacPaint), including the option of overriding automatic operations to manually outline, select, and/or measure particular regions of interest, make Image extremely "user-friendly." Any calculations obtained can be printed, exported to text files and spreadsheets, or copied to the "Clipboard" for further manipulation or analysis. The program supports multiple windows which can be simultaneously opened, and eight levels of magnification in which all editing, filtering, and measurement functions can operate.

In order to use Image, a videotaped frame of interest is input from a VCR or camera through the computer's frame-grabber board, digitized, and captured. Alternatively, sequences of images can be collected, with the number of frames per second and the amount of data limited by capabilities of the frame-grabber board and available memory on the computer. A VCR with stop frame or variable playback forward and reverse speeds is useful for identifying selected frames of interest for digitization. A TV monitor used with the computer monitor is helpful for simultaneously visualizing video and digitized images.

Once an image has been captured and enhanced or otherwise manipulated to meet the user's criteria, it can be subjected to analysis. Image contains several tools that facilitate measurement. Tools are similar to those in many draw and paint programs and include a magnifying glass, scrolling tool, selection rectangle, oval or polygon, a freehand drawing tool, line tool, pencil, eraser, paintbrush, and look-up table (LUT). The LUT permits the user to transform each of the 256 possible gray scale pixel values into color, if desirable. Areas chosen for processing are identified using rectangle, oval, or polygon selection tools. Lines are created using the line tool, and can be straight, freehand, or segmented. Any selection can be moved, stretched, added, subtracted, deleted, transferred, saved, or restored. Selection options are also useful to isolate and enhance a particular region of an image without changing other parts of the image.

* (100 Locke Drive, Marlboro, MA 01752)
** (152 West Patrick Street, Frederick, MD 21701)

APPLICATION EXAMPLES

Swallowing

One application of the Image program that we have found extremely useful is the analysis of videotaped fluoroscopic studies of swallowing. At our institution, dynamic swallow studies are performed in adults and children experiencing dysphagia related to head and neck pathology, neuromuscular disease, neurogenic, and other disorders. During these studies, patients are asked to swallow 1 cc, 3 cc, and self-selected amounts of barium, of both liquid and paste consistencies, during videofluoroscopic filming in lateral and anteroposterior views at 30 frames per second. Timing measures can be obtained without digitization, but other quantitative assessments are made possible with Image.

For this purpose, a radiopaque ring of known diameter is placed on the patient's midchin to serve as a referent measurement (Fig. 1), that is, x pixels = y displacement (in mm, cm, or other measurement standard), assuming linearity of images obtained. The line tool is used first to draw a straight line across the diameter of the ring. The number of pixels traversed in this distance is then entered in the calibration window to equal the number of mm of the known diameter of the ring, for example, 32 pixels = 17 mm. All subsequent measurements will be related to this referent until the referent itself is changed. Measures that we routinely collect from the recorded swallow studies include the following:

FIG. 1. Lateral view videofluoroscopic image. Ring on midchin provides measurement referent.

1. Pharyngeal area at rest and at maximum constriction. To obtain these measurements, the videotaped swallow study is searched for a "rest" position and then for the point of maximum pharyngeal constriction during a swallow. At each point, pharyngeal area is carefully traced and calculated. This can be done by outlining the entire area or by putting dots at selected points, which are then automatically connected. A recent study in our laboratory revealed that, in 60 normal control adults, "pharyngeal area," as shown in Fig. 2A, became essentially zero during swallow (Fig. 2B) (1). This is indicative of the critical interaction between the base of the tongue and the pharynx in propelling the bolus into the esophagus. In contrast to the control subject, note the same two points in a patient who underwent oropharyngeal resection for squamous cell carcinoma in Fig. 3, A and B. Obviously, pharyngeal area does not decrease in a normal manner. Admittedly, the "area" measure does not account well for differential function of the right or left side of the pharynx or tongue during swallow, and may lead to an overestimate of the patient's ability to constrict the pharynx. Studies at our institution indicate, however, that in selected patient populations this measurement correlates well with other measures of swallowing efficiency and safety. It is routinely collected on all patients undergoing dynamic swallow studies at our center.

2. Hyoid at rest and at maximum displacement. Elevation of the hyoid and larynx have been well established as critical to airway protection and opening of the upper esophageal sphincter during swallow. Measurements of the hyoid (and larynx) at rest and at point of maximum elevation during swallow provide useful, objective insights into both processes. An example of the measurement technique for hyoid elevation is illustrated in Fig. 4, A-D. The appropriate frames of the hyoid at rest (Fig. 4A) and then maximally elevated (Fig. 4B) are selected and digitized. In each frame, the anterior hyoid is outlined and reference lines are drawn on stable landmarks, i.e., the floor of the nose to the tubercle of the atlas and a straight line projected inferiorly from the floor of nose-tubercle line. When this has been completed, the hyoid and portions of the two reference lines are selected and copied from the rest frame as in Fig. 4C. This selection is then ready to be pasted onto the frame representing maximum elevation (Fig. 4B), with care taken to ensure that the referent landmarks are aligned. Following this pasting, the superior and anterior displacement of the hyoid from rest to its maximal elevation during swallow can be calculated, as can the most direct distance between these two points (illustrated in Fig. 4D). Another convention requires relating absolute distances to vertebral height.

3. Maximum anteroposterior upper esophageal sphincter (UES) opening. This measurement refers to the maximum opening of the UES during swallow, as measured in the anteroposterior dimension. Examples of the UES at rest and maximally open are presented in Fig. 5, A and B. If the UES does not open properly, bolus material may not enter the esophagus in a timely manner or may present a risk to the airway. Both timing and extent of UES opening are routinely calculated on patients undergoing dynamic swallow studies at our institution. As with pharyngeal area, the opening measurement provides information only on the anterior-posterior component of UES opening. Lateral movements cannot be appreciated in the lateral view radiographic images. This limitation notwithstanding, studies in progress in our



FIG. 2. A: Pharyngeal area at rest in normal adult is outlined with tools in Image. B: Pharyngeal area at point of maximum constriction during swallow in normal adult.

FIG. 3. A: Pharyngeal area at rest in adult patient with oropharyngeal resection. B: Pharyngeal area in same patient at point of maximum constriction during swallow. Large area reflects difficulty in tongue-pharynx contact caused by resection.
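
The calibrated area measurement illustrated in Figs. 2 and 3 can be sketched in a few lines of Python. This is an illustration only, not the Image program's own routine: the function names and the traced outline are hypothetical, the referent of 32 pixels = 17 mm is the example value given in the text, and the area of the closed outline is computed with the standard shoelace formula, assuming the same linear scale along both image axes.

```python
def mm_per_pixel(ring_pixels, ring_mm):
    """Scale factor from the chin-ring referent (known diameter in mm)."""
    if ring_pixels <= 0:
        raise ValueError("ring_pixels must be positive")
    return ring_mm / ring_pixels


def polygon_area_pixels(points):
    """Area of a closed outline in square pixels (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def calibrated_area_mm2(points, scale):
    """Convert a traced area to square millimetres (scale is mm per pixel)."""
    return polygon_area_pixels(points) * scale ** 2


# Referent from the text: 32 pixels across a ring of 17 mm diameter.
scale = mm_per_pixel(32, 17)

# Hypothetical traced outline: a 64 x 32 pixel rectangle (34 mm x 17 mm).
outline = [(0, 0), (64, 0), (64, 32), (0, 32)]
area = calibrated_area_mm2(outline, scale)
print(f"{area:.1f} mm^2")
```

Because the scale is squared for areas, an error in measuring the ring diameter affects area values more strongly than it affects linear distances.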

laboratory suggest it is a useful measure in characterizing the nature of swallowing impairment in dysphagic patients.

ARTICULATORY MOVEMENTS

An additional application of Image at our center is in determining range of tongue + jaw motions during selected speech tasks. In a recent study, for example, range of tongue motion across the vowels /i/, /a/, and /u/ in normal speakers and in speakers with glossectomy was investigated. Range of tongue motion is defined here as the total area encompassed by the tongue across the three vowels, as measured from lateral view videofluoroscopy studies. To make this measurement, steady-state portions of subjects' productions of /i/, /a/, and /u/ were first identified on the videotape, captured, and digitized. The tongue was then outlined or traced anteriorly from its insertion in the floor of mouth and posteriorly to the vallecula.


FIG. 4. A: Hyoid at rest in normal adult. Referent lines are added and anterior hyoid is outlined using tools in Image. B: Hyoid at point of maximum elevation during swallow in normal adult. Anterior hyoid is outlined and referent lines are added. C: Portions of anterior hyoid and referent lines in A are selected and copied for pasting onto the image in B. D: Selection of hyoid and referent lines in A is superimposed on image in B, with referent lines aligned. The shortest distance between the two points can be calculated; alternatively, anterior and superior displacement of hyoid, or displacement in terms of vertebral height, can be quantified.
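
Once the rest and maximum-elevation frames have been aligned on the referent lines, the displacement values described for Fig. 4D reduce to simple vector arithmetic. The Python sketch below is a hypothetical illustration rather than the procedure built into Image. It assumes image coordinates with y increasing downward, takes the direction from the tubercle of the atlas to the floor of the nose as "anterior," takes the perpendicular pointing upward in the image as "superior," and reuses the example referent of 32 pixels = 17 mm. All point coordinates are invented for the example.

```python
import math


def hyoid_displacement(rest, peak, tubercle, nose_floor, scale):
    """Return (anterior, superior, direct) hyoid displacement in mm.

    rest, peak: hyoid position at rest and at maximum elevation (pixels),
    both expressed in the same aligned frame.
    tubercle, nose_floor: end points of the reference line (pixels).
    scale: calibration in mm per pixel.
    """
    ax = nose_floor[0] - tubercle[0]
    ay = nose_floor[1] - tubercle[1]
    length = math.hypot(ax, ay)
    if length == 0:
        raise ValueError("reference landmarks must be distinct")
    ax, ay = ax / length, ay / length  # unit vector pointing anteriorly

    # Perpendicular unit vector pointing up in the image (y grows downward).
    # Assumes the reference line is not vertical in the image.
    sx, sy = (ay, -ax) if ax > 0 else (-ay, ax)

    dx, dy = peak[0] - rest[0], peak[1] - rest[1]
    anterior = (dx * ax + dy * ay) * scale
    superior = (dx * sx + dy * sy) * scale
    direct = math.hypot(dx, dy) * scale
    return anterior, superior, direct


# Hypothetical landmark positions in an aligned frame (pixels).
anterior, superior, direct = hyoid_displacement(
    rest=(180, 400), peak=(204, 368),
    tubercle=(100, 200), nose_floor=(300, 200),
    scale=17 / 32,
)
print(f"anterior {anterior:.2f} mm, superior {superior:.2f} mm, direct {direct:.2f} mm")
```

Projecting onto the reference line, rather than onto the image x and y axes, keeps the anterior and superior components meaningful when the head is tilted in the frame.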

An example of this for the vowel /a/ is shown in Fig. 6A. As shown, a straight line was again projected along the floor of nose to the tubercle of the atlas, and a straight line was projected inferiorly from the tubercle. This process is then repeated for the subject's production of /i/ (Fig. 6B). With /i/ completed, the outline of the tongue and portions of the two reference lines were selected, copied (Fig. 6C), and then pasted onto the image of the speaker producing the vowel /a/ (Fig. 6D). This step was then repeated for the image of the speaker producing the vowel /u/. With each superimposition, care was taken to align the referent lines. When the composite picture was completed, the measurements of the total and shared areas of movement of the tongue for the three vowels were made, as illustrated in Fig. 6E and F. As noted, both overall range of tongue motion and the proportion of shared area to total area are being compared


FIG. 5. A: Arrow indicates location of UES at rest (closed) in normal adult. B: Arrow indicates maximum opening of UES during swallow in normal adult.

in control speakers and speakers with glossectomy. Although analyses of these data have not been completed, preliminary findings suggest that speakers may strive to preserve the ratio of shared and independent areas to total area even with extensive oropharyngeal resection.

LARYNGEAL PARAMETERS

Additional uses of Image include relative measurements of a number of laryngeal parameters. It has not been possible to make absolute measurement of laryngeal variables due to the difficulty of locating a known measurement referent for structures of interest. However, relative measures are quite possible, and include extent and degree of closure of the vocal folds, characteristics of anterior and posterior glottal chinks, angles formed by the vocal processes or anterior commissure, length of the true vocal folds associated with frequency changes, and displacement of the vocal fold edges associated with intensity variation. A simple example is presented in Fig. 7, in which the extent of a lesion along the vibratory portion of one vocal fold edge is compared to glottic length. In the example shown, the broad-based lesion occupies about one third of the entire length of the glottis. Repeated measures over time can provide quantitative information about any reduction in the extent of the lesion with various interventions.

Other applications of Image in our setting have ranged from measures of velopharyngeal function during speech to tissue measurements from histology slides input into the computer via a videocamera attached to a microscope, but the system lends itself to any type of video information for which measurement or quantitative analysis is desirable. With a Macintosh (or PC) computer, digitizing board or built-in digitizer, good quality VCR (preferably with stop frame and variable playback rates), and the Image software program from NIMH, the clinician or researcher has a powerful tool. Virtually any clinical or research material that can be prepared in video format can be subjected to a wide range of measurement and analysis techniques. Although many image analysis options are available, the system described here, involving free software (which is continually upgraded) and relatively inexpensive hardware, has proven to be an extremely valuable resource with a wide range of applications.

REFERENCES

1. Kendall K, McKenzie S, Leonard R, Gonçalves M, Walker A. Dynamic videofluoroscopic swallowing parameters in normal adults. Presented at Dysphagia Research Society Meeting, Aspen, CO, October, 1996.



FIG. 6. A: Lateral view videofluoroscopic frame of normal adult producing vowel /a/. Tongue is outlined from anterior floor of mouth to vallecula using tools in Image. Referent lines are added. B: Process is repeated for /i/. C: Tongue shape and portion of referent lines in B are selected and copied for pasting onto image in A. D: Selection in B is pasted onto frame of speaker producing vowel /a/, with care taken to align referent lines. E: Composite of images for the three vowels is completed, with referent lines aligned. A line connects vallecula to the anterior floor of mouth for each tongue shape. These points are then connected to form the inferior border of the composite. F: Area common to all three tongue positions (shared area) is outlined with segmented line. Measurements permit calculation of total, shared, and independent tongue areas for the three vowel productions.
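
The total, shared, and independent areas of Fig. 6 can be expressed as set operations on the pixels enclosed by each aligned tongue outline. The sketch below is an assumption-laden illustration, not the authors' measurement procedure, which used outlines traced in Image. Each region is represented as a set of pixel coordinates, "total" is taken as the union of the three regions, "shared" as their common intersection, and "independent" is assumed here to mean total minus shared. The rectangular regions and the scale of 0.5 mm per pixel are hypothetical.

```python
def filled_rectangle(x0, y0, x1, y1):
    """Hypothetical stand-in for a traced region: pixels in [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def tongue_area_summary(regions, scale):
    """Total, shared, and independent areas (mm^2) for aligned regions.

    regions: list of sets of (x, y) pixel coordinates, one per vowel.
    scale: calibration in mm per pixel.
    """
    total_px = len(set().union(*regions))        # covered by any vowel
    shared_px = len(set.intersection(*regions))  # covered by every vowel
    pixel_area = scale ** 2
    total = total_px * pixel_area
    shared = shared_px * pixel_area
    return {
        "total": total,
        "shared": shared,
        "independent": total - shared,  # assumed definition
        "shared_ratio": shared / total if total else 0.0,
    }


# Three hypothetical, already aligned tongue regions for /a/, /i/, and /u/.
regions = [
    filled_rectangle(0, 0, 10, 10),
    filled_rectangle(5, 0, 15, 10),
    filled_rectangle(5, 0, 10, 20),
]
summary = tongue_area_summary(regions, scale=0.5)
print(summary)
```

Counting pixels sidesteps polygon intersection geometry, at the cost of a resolution limit set by the digitized image.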



FIG. 7. A lesion of the right true vocal fold is shown. Its extent along the vibratory edge of the fold is calculated as a percentage of the
total length of the membranous portion of the fold.
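
Because no absolute referent is available for laryngeal images, a measure such as the one in Fig. 7 is a ratio of two path lengths in pixels, so the unknown scale cancels. The following Python sketch shows that arithmetic under stated assumptions: the function names are hypothetical, and the fold edge and lesion extent are invented segmented lines rather than data from the figure.

```python
import math


def path_length(points):
    """Length of a segmented line through the given points (pixels)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def percent_of_length(part, whole):
    """Length of `part` as a percentage of the length of `whole`."""
    whole_length = path_length(whole)
    if whole_length == 0:
        raise ValueError("reference path has zero length")
    return 100.0 * path_length(part) / whole_length


# Hypothetical traces: membranous fold edge and lesion extent along that edge.
fold_edge = [(0, 0), (30, 40), (60, 80)]   # 100 pixels long
lesion = [(30, 40), (48, 64)]              # 30 pixels long
lesion_percent = percent_of_length(lesion, fold_edge)
print(f"lesion occupies {lesion_percent:.0f}% of fold length")
```

Comparing this percentage across examinations gives a relative index of change, provided the fold is traced between the same landmarks each time.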


