The Dysphonia Severity Index
Floris L. Wuyts
Marc S. De Bodt
University of Antwerp
Antwerp, Belgium

Geert Molenberghs
Limburgs Universitair Centrum
Diepenbeek, Belgium

Marc Remacle
University of Louvain
Yvoir, Belgium

Louis Heylen
University of Antwerp
Antwerp, Belgium

Benoite Millet
Brussels, Belgium

Kristiane Van Lierde
University of Gent
Gent, Belgium

Jan Raes
University of Brussels
Jette, Belgium

Paul H. Van de Heyning
University of Antwerp
Antwerp, Belgium

The vocal quality of a patient is modeled by means of a Dysphonia Severity Index (DSI), which is designed to establish an objective and quantitative correlate of the perceived vocal quality. The DSI is based on the weighted combination of the following selected set of voice measurements: highest frequency (F0-High in Hz), lowest intensity (I-Low in dB), maximum phonation time (MPT in s), and jitter (%). The DSI is derived from a multivariate analysis of 387 subjects with the goal of describing, purely based on objective measures, the perceived voice quality. It is constructed as DSI = 0.13 × MPT + 0.0053 × F0-High – 0.26 × I-Low – 1.18 × Jitter (%) + 12.4. The DSI for perceptually normal voices equals +5 and for severely dysphonic voices –5. The more negative the patient’s index, the worse is his or her vocal quality. As such, the DSI is especially useful to evaluate therapeutic evolution of dysphonic patients. Additionally, there is a high correlation between the DSI and the Voice Handicap Index score.

KEY WORDS: voice quality, voice assessment, acoustic, voice range profile, index

Vocal performance has increasingly gained interest in our society, which is evolving into a service-oriented community. This growing interest has consequently induced a lot of multidisciplinary research concerning voice assessment and therapy with a comprehensive battery of tests focusing on qualitative and quantitative aspects of vocal performance.

The medical diagnosis of vocal fold pathology is mainly based on an endoscopic exam of the vocal folds and upper airway tract. Voice dysfunction on the other hand is assessed by perceptual judgment and objective measures, such as acoustic and aerodynamic characteristics.
However, perceptual evaluation is one of the most controversial topics
in voice research. Review of literature reveals a wide variety of rating
scales (Gelfer, 1988; Hammarberg, 1992; Hirano, 1981; Laver, 1980;
Wendler, Rauhut, & Krüger, 1986; Wilson, 1987; Wirz & Mackenzie Beck,
1995) and reliability data fluctuating from study to study (Bassich &
Ludlow, 1986; Blaustein & Bar, 1983; Kreiman, Gerratt, & Berke, 1994;
Kreiman, Gerratt, Kempster, Erman, & Berke, 1993). So far, there is no
internationally accepted perceptual judgment protocol, but the GRBAS
scale (Grade of hoarseness, R for roughness, B for breathiness, A for
astheny, and S for strain) proposed by the Japan Society of Logopaedics
796 Journal of Speech, Language, and Hearing Research • Vol. 43 • 796–809 • June 2000 • ©American Speech-Language-Hearing Association
1092-4388/00/4303-0796
Control                        68
Incomplete closure             48
Vocal nodules                  30
Reinke’s edema                 30
Chronic laryngitis             27
Excessive muscular tension     26
Paralysis in abduction         23
Tumor                          22
Sulcus glottidis and scar      21
Paralysis in adduction         20
Mucosal cyst                   14
Granuloma                      10
Acute laryngitis                8
Haemorrhage and trauma          7
Spasmodic dysphonia             6
Ventricular phonation           5
Polyps                          4
Psychogenic aphonia             3
Other                          15
Total                         387

Statistics

The correlation between the G score and other variables was calculated with Spearman Rank statistics because the G is a categorical variable. The normality of the variables for the different groups was investigated using the Kolmogorov-Smirnov test. The equality of variances for the four groups (i.e., the patients characterized with G0 to G3) was investigated by means of the generalized linear model ANOVA for all the variables included. The Dysphonia Severity Index (DSI) was constructed to be analogous to Fisher’s discriminant analysis (Fisher & Van Belle, 1993), a standard approach that is used to differentiate two or more populations on the basis of several variables. Given the different subject populations, we set up a rule, based on the measurements of these subjects, whereby a new subject may be correctly assigned to one of the populations. When restricted to two populations and two variables, the
Results

Table 2 lists the averages, standard deviations, and ranges for all investigated variables for the group of 68 healthy subjects and the 319 dysphonic cases.

Table 2.

Variable        Group       Mean     SD      Min     Max      N
Jitter (%)      Healthy     0.73     0.54    0.23    2.97     68
                Dysphonic   2.55     2.33    0.21    14.12    319
Shimmer (%)     Healthy     3.03     1.16    0.91    6.71     68
                Dysphonic   6.39     4.18    1.37    19.87    319
NHR             Healthy     0.121    0.020   0.07    0.16     68
                Dysphonic   0.204    0.151   0.06    1.17     319
F0-High (Hz)    Healthy     794      240     329     1397     68
                Dysphonic   442      213     104     1108     319
F0-Low (Hz)     Healthy     125      35      58      185      68
                Dysphonic   115      39      55      300      319
F0-range        Healthy     669      226     237     1280     68
                Dysphonic   328      203     26      985      319
ST-range        Healthy     32       5       22      44       68
                Dysphonic   22       8       0       49       319
I-High (dB)     Healthy     97       7       85      112      68
                Dysphonic   89       10      56      117      319
I-Low (dB)      Healthy     51       2       43      57       68
                Dysphonic   55       4       44      77       319
I-range (dB)    Healthy     46       7       35      61       68
                Dysphonic   34       10      7       62       319
MPT (s)         Healthy     18.9     6.7     9.0     43.0     68
                Dysphonic   12.4     6.4     1.0     41.0     319
VC (cc)         Healthy     3788     1020    932     6300     68
                Dysphonic   3131     995     400     6300     319
PQ (cc/s)       Healthy     216      68      75      379      68
                Dysphonic   307      172     57      1400     319

The Spearman Rank correlation coefficients ρ between the G and the other variables for 387 cases are listed in Table 3. The ρ² values are additionally reported because these values express the percentage of variability of the data that is explained by the association between the G and the other variables. The significance levels p were for all correlation coefficients smaller than 0.001 except for F0-Low, which did not exhibit a significant relationship with G.

Table 3. Spearman Rank Correlation coefficients (ρ) between G and the acoustic, aerodynamic, and voice range measurements. The ρ² values indicate the percentage of variability of the data that is explained by the association between the G and the other variables.

Variable        ρ        ρ²
Jitter (%)      0.57     0.32
Shimmer (%)     0.42     0.17
NHR             0.39     0.15
F0-High (Hz)    –0.42    0.18
F0-Low (Hz)     –0.05    0.00
F0-range        –0.45    0.20
ST-range        –0.45    0.20
I-High (dB)     –0.39    0.15
I-Low (dB)      0.34     0.12
I-range (dB)    –0.48    0.23
MPT (s)         –0.38    0.14
VC (cc)         –0.20    0.04
PQ (cc/s)       0.25     0.06

Except for I-High (dB) and the vital capacity (cc), none of the variables were normally distributed. This justifies the use of proportional odds logistic regression, which defined the combination of the following variables as indicators of the degree of hoarseness (G) when used in a specific linear combination: F0-High (Hz), I-Low (dB), MPT (s), and Jitter (%). The DSI, being the discriminating rule calculated by the logistic regression, consists of a linear combination of these four variables, where each variable has a different weight. The equation is:

DSI = 0.13 × MPT (s) + 0.0053 × F0-High (Hz) – 0.26 × I-Low (dB) – 1.18 × Jitter (%) + 12.4

This DSI is the weighted combination of variables that reflects best the degree of hoarseness as expressed by the G from the GRBAS scale.

The relation between G and the DSI is represented in Figure 2. The more negative this DSI is for a patient, the worse his or her vocal quality. The more it is positive, the better it is. The initially obtained regression-generated coefficients were post hoc multiplied with a scale factor in order to construct a practical scale where +5 corresponds to the average DSI of the G0 group and –5 to the average DSI of the G3 group. Table 4 represents the DSI value after scaling for the different G scores.

Measurement errors on the individual components of the DSI inevitably give rise to an error on the final outcome measure. We calculated this error on the DSI as 0.64, based on an average standard deviation of 1.6 seconds for MPT, 39 Hz for F0-High, 1.7 dB(A) for I-Low, and 0.3% for jitter.

To estimate the reliability of the DSI, Table 5 represents the classification success of this method. This table shows the agreement between the observed and predicted perceived voice quality as expressed by G. In 50% of the cases a perfect agreement is obtained. When
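
The reported error of 0.64 on the DSI can be reproduced with a short calculation. The sketch below assumes that the four measurement errors are independent and combine in quadrature, each scaled by its weight in the DSI equation. The text does not state the propagation method, so this is one plausible reading and not necessarily the authors' procedure. The names `WEIGHTS`, `MEASUREMENT_SD`, and `dsi_error` are illustrative only.

```python
import math

# Weights of the four DSI components, taken from the DSI equation in the text.
WEIGHTS = {
    "mpt_s": 0.13,
    "f0_high_hz": 0.0053,
    "i_low_db": -0.26,
    "jitter_pct": -1.18,
}

# Average standard deviations of the measurements, as reported in the text.
MEASUREMENT_SD = {
    "mpt_s": 1.6,
    "f0_high_hz": 39.0,
    "i_low_db": 1.7,
    "jitter_pct": 0.3,
}


def dsi_error(weights, sds):
    """Propagate component errors to the DSI.

    Assumption (not stated in the article): the errors are independent,
    so the weighted standard deviations add in quadrature.
    """
    return math.sqrt(sum((weights[name] * sds[name]) ** 2 for name in weights))


error = dsi_error(WEIGHTS, MEASUREMENT_SD)
print(f"Propagated error on the DSI: {error:.2f}")
```

Under this assumption the result rounds to 0.64, which agrees with the value reported above. I-Low and jitter contribute the largest terms to the sum.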
Table 6. Average values (± SE) of the DSI and its components for female and male subjects of the control group.

                   MPT (s)       F0-High (Hz)   I-Low (dB)    Jitter (%)     DSI
Female (N = 43)    16.9 ± 0.7    905 ± 31       51.3 ± 0.2    0.79 ± 0.10    5.22 ± 0.26
Male (N = 25)      22.2 ± 1.7    602 ± 34       50.4 ± 0.5    0.63 ± 0.06    4.7 ± 0.4
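
As an illustration of how the DSI is computed in practice, the following sketch applies the published equation to the group means in Table 6. It assumes that the first four values in each row of Table 6 are MPT, F0-High, I-Low, and jitter, in that order, with the DSI last. The function name `dsi` and its parameter names are illustrative and do not come from the article.

```python
# Weights and intercept as printed in the DSI equation of this article.
W_MPT = 0.13        # per second of maximum phonation time
W_F0_HIGH = 0.0053  # per Hz of highest frequency
W_I_LOW = -0.26     # per dB of lowest intensity
W_JITTER = -1.18    # per percent of jitter
INTERCEPT = 12.4


def dsi(mpt_s, f0_high_hz, i_low_db, jitter_pct):
    """Return the Dysphonia Severity Index for one set of measurements."""
    return (
        W_MPT * mpt_s
        + W_F0_HIGH * f0_high_hz
        + W_I_LOW * i_low_db
        + W_JITTER * jitter_pct
        + INTERCEPT
    )


# Control-group means from Table 6 (column order is an assumption, see above).
female = dsi(mpt_s=16.9, f0_high_hz=905, i_low_db=51.3, jitter_pct=0.79)
male = dsi(mpt_s=22.2, f0_high_hz=602, i_low_db=50.4, jitter_pct=0.63)

print(f"female: {female:.2f}, male: {male:.2f}")
```

With the rounded coefficients and means this yields about 5.12 for the female group and 4.63 for the male group, close to the tabulated 5.22 ± 0.26 and 4.7 ± 0.4. The small differences are presumably due to rounding of the published coefficients and means.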
the nonperiodicity of some pathologic voice samples, as reported by Titze (1995).

The choice of variables in the DSI is entirely determined by the stepwise logistic regression procedure. However, it seems logical that the highest frequency is among the chosen ones. In more than 50% of the dysphonic patients the vocal cords are afflicted with an excess mass (vocal nodules, edema, etc.; see Table 1). This extra mass, usually heterogeneously distributed along the cords, hampers the higher vibratory rates, which is reflected by a decreased F0-High. Likewise the presence of nodules, edema, and so forth increases the glottal resistance such that a greater driving pressure will be necessary to initiate and maintain vocal fold vibration (Colton, 1994). Consequently the lowest intensity will be increased in several dysphonic patients. Similar effects for F0-High and I-Low are found in VRP studies of children with vocal nodules (Heylen et al., 1998). Perturbation measures, such as jitter, are per se intended to assess the degree of irregularity of the vocal cord vibration, within certain limits. It is likely that a perceived dysphonia will result in an increased perturbation measure. The reason for the choice of MPT as a relevant variable included in the DSI may lie in the fact that MPT can be regarded as a phonatory ability measure (Hirano, 1981) that reflects the efficiency of several mechanisms necessary for voice production, such as subglottic pressure, airflow resistance, closure of the vocal folds, and so forth.

Piccirillo et al. (1998) elaborated a concept of a vocal function index, but only the classification between normal and dysphonic was emphasized. Using logistic regression, they found that a weighted combination of estimated subglottic pressure, airflow at lips, vocal efficiency, and maximum phonation time was able to discriminate between healthy and pathologic voices. It is noteworthy that the MPT emerges from both their study and our work as an important variable for the overall assessment of voice quality.

In order to validate their index Piccirillo and coworkers compared it with the GRBAS score for a group of 33 patients with limited vocal dysfunction (Piccirillo, Painter, Fuller, Haiduk, & Fredrickson, 1998). The correlation coefficient they found between their index and the G was 0.58, whereas we found a value of 0.996 for
the correlation between DSI and G. This is because in our approach the DSI is based on the G score itself rather than on the discrimination between normal and pathologic voices. Additionally, the relationship between pathology and dysphonia is not obvious, because a severe pathology does not always strictly imply a severely dysphonic voice, and vice versa. Therefore we have adopted the perceptual rating as a landmark for the classification rule.

The classification table (Table 5) illustrates the efficiency of the applied method. An ideal classification tool would produce values only on the diagonal, which is of course not achieved in practice. When a group of judges is scoring a number of patients, the interobserver agreement is at best “good,” but never excellent, as expressed by the kappa statistic (De Bodt et al., 1997). This means that a certain variation or test-retest error exists in the G score itself. Therefore, it is realistic to expect that in some cases a perceived G1 might as well have been a G2 or vice versa. This is translated by off-diagonal elements in the classification table. It implies that when voices are perceived as being characterized by a specific G value, there is considerable probability that their calculated G, based on the DSI, falls into an adjacent category. This is apparently the case as indicated by Table 5. Moreover, the DSI seems to classify subjects with G1 and G2 better than those with G0 or G3. Still, only 6 out of 387 cases are really misclassified by more than one scale point. Finally, the DSI is not meant as a classification tool.

Fortunately, the effect of sex is included implicitly in the DSI, so that a separate DSI for males and females need not be used. As seen in Table 6, the opposite behavior of F0-High and MPT for both sexes cancels out so that DSIs for both male and female subjects are identical. Indeed, the difference between the average female and male DSI is not significant; in general the error on the DSI is estimated as 0.6.

To assess the clinical impact of the DSI and its ease of use we present some follow-up cases. Figures 4 to 7 illustrate changes in vocal quality with different therapies. For some cases the pathology was still present after therapy, but the vocal function had improved, according
to the otolaryngologist and the patient. In other cases, therapy may have improved one variable, but when other variables became worse, the DSI reflected this overall change.

It takes 10 to 15 min to collect clinical measures (MPT, etc.) from a patient and to calculate the index using the above-mentioned equation. For this calculation a desktop calculator or spreadsheet is sufficient. Additionally, the use of anchor points of –5 and +5 facilitates the clinical use of the DSI. These aspects contribute largely to the ease of use of the DSI. In the Appendix a recommended clinical recording protocol that yields the DSI is described.

Additionally we want to address the content and criterion validity of the DSI. Content validity refers to whether the index measures what it is intended to measure, that is, the degree of dysphonia. The four variables used in the DSI are individually all clear indicators of dysphonia, because their averages are significantly different for patients with vocal pathology as opposed to normal subjects (Wuyts et al., 1996). Additionally, the selection of these variables is based on a statistical stepwise procedure that constructs a rule to classify voices that are characterized by the scores G0 to G3, which in turn represent the degree of dysphonia as perceived by the voice specialist. Considering these facts, it seems reasonable that the DSI meets the criteria of content validity.

Criterion validity refers to the accuracy of the DSI. How does it compare to a “gold standard”? Auditory-perceptual judgments are typically the final arbiter in clinical decision-making and often provide the standards against which instrumental measures are evaluated (Kent, 1996). Inherently, the construction of the DSI is based on such a standard, being the Grade of the widely used perceptual GRBAS scale. To compare the DSI with an external measure, we correlated the DSI with the Voice Handicap Index. The high correlation between both measures adds to the criterion validity of the DSI. Moreover, this high correlation indicates that the DSI reflects not only the vocal quality of the patient but also, to a great extent, the handicap as perceived by the patient.
Improvement of the DSI can most probably be achieved by the use of acoustic variables that are derived from running speech samples rather than from a sustained vowel. Also other types of variables, such as the airflow at lips, subglottic pressure, and so forth, might be included. Other methods, such as neural-network approaches and self-organizing maps, might prove to be superior to multivariate statistical tools, such as logistic regression or discriminant analysis (Callan et al., 1999). Inevitably, again, a multicenter study with several hundred subjects is needed for the development of a new DSI-like outcome measure.

Daily clinical use of the DSI for the past 18 months has shown the authors that the DSI is a practical tool to describe voice quality in a well-balanced way. It plays a valuable part in the global assessment of a dysphonic patient. A model for voice assessment may consist of four components (De Bodt, 1997): laryngeal inspection, perceptual evaluation (e.g., GRBAS), subjective evaluation by the patient him- or herself (e.g., VHI), and the DSI. It is our belief that such a four-component model is sufficient in a clinical setting to assess voices in a scientifically relevant way within a limited amount of time.

The parameters included in the DSI have become quite accessible in most voice clinics throughout the world. The DSI is objective because no perceptual input is required for its calculation. The DSI’s small measurement error (0.6 on 10 points) and the fact that a multicenter database was used for its construction underlie its robustness. These factors, together with the fact that the DSI is based on aerodynamic, voice range, and acoustic measurements, make the DSI a multidimensional, robust, and objective outcome measure for the assessment of vocal quality. It provides the individual voice therapist with an outcome measure for an individual patient, without being biased by time, subjective evaluation, or other factors that influence perceptual ratings. It enables clinicians to place a voice in an absolute way, so that therapy can be discussed and its effectiveness evaluated. Its universal use can enable the comparison
Kreiman, J., Gerratt, B. R., & Berke, G. S. (1994). The multidimensional nature of pathologic vocal quality. Journal of the Acoustical Society of America, 96, 1291–1301.

Kreiman, J., Gerratt, B. R., Kempster, G. B., Erman, A., & Berke, G. S. (1993). Perceptual evaluation of voice quality: Review, tutorial, and a framework for future research. Journal of Speech and Hearing Research, 36, 21–40.

Laver, J. (1980). The phonetic description of voice quality. London: Cambridge University Press.

Lee, P. A. (1980). Normal ages of pubertal events among American males and females. Journal of Adolescent Health Care, 1(1), 26–29.

Piccirillo, J. F., Painter, C., Fuller, D., & Fredrickson, J. M. (1998). Multivariate analysis of objective vocal function. Annals of Otology Rhinology Laryngology, 107(2), 107–112.

Piccirillo, J. F., Painter, C., Fuller, D., Haiduk, A., & Fredrickson, J. M. (1998). Assessment of two objective voice function indices. Annals of Otology Rhinology Laryngology, 107(5, Pt 1), 396–400.

Rabinov, C. R., Kreiman, J., Gerratt, B. R., & Bielamowics, S. (1995). Comparing reliability of perceptual ratings of roughness and acoustic measures of jitter. Journal of Speech and Hearing Research, 38, 26–32.

Revis, J., Giovanni, A., Wuyts, F., & Triglia, J. M. (1997). Comparison of different phonetic materials for perceptive analysis of dysphonia. Revue de Laryngologie Otologie Rhinologie, 118(4), 247–252.

Revis, J., Giovanni, A., Wuyts, F. L., & Triglia, J. M. (1999). Comparison of different voice samples for perceptual analysis. Folia Phoniatrica et Logopedica, 51, 108–116.

Stewart, A. L., & Ware, J. E. (1992). Measuring functioning

Welsh, A. H. (1996). Aspects of statistical inference. New York: Wiley.

Wendler, J., Rauhut, A., & Krüger, H. (1986). Classification of voice qualities. Journal of Phonetics, 14, 483–488.

Wilson, D. K. (1987). Voice problems of children (3rd ed.). Baltimore: Williams & Wilkins.

Wirz, S., & Mackenzie Beck, J. (1995). Assessment of voice quality: The vocal profiles analysis scheme. In S. Wirz (Ed.), Perceptual approaches to communication disorders (pp. 39–55). London: Whurr Publishers.

Wolfe, V., Fitch, J., & Cornell, R. (1995). Acoustic prediction of severity on commonly occurring voice problems. Journal of Speech and Hearing Research, 38, 273–279.

Wuyts, F. L., De Bodt, M. S., Bruckers, L., & Molenberghs, G. (1996). Research work of the Belgian Study Group on Voice Disorders 1996: Results. Acta Oto-Rhino-Laryngologica Belgica, 50, 331–341.

Wuyts, F. L., De Bodt, M. S., & Van de Heyning, P. H. (1999). Is the reliability of a visual analog scale higher than that of an ordinal scale? An experiment with the GRBAS scale for the perceptual evaluation of dysphonia. Journal of Voice, 13, 508–517.

Wuyts, F. L., De Bodt, M. S., Van de Heyning, P. H., & Van Hoof, R. (1998). Perceptual evaluation of voice quality with and without clinical patient information [Abstract]. IALP Conference, Amsterdam.

Received May 13, 1999
Accepted November 22, 1999

Contact author: Floris L. Wuyts, PhD, University Hospital of Antwerp, Department of Otorhinolaryngology and Head and Neck Surgery, Wilrijkstraat 10, B-2650 Edegem, Belgium. Email: wuyts@uia.ua.ac.be