Acoustic & Aerodynamic Comparisons After Voice Training

ARTICLE IN PRESS
Acoustic and Aerodynamic Comparisons of Voice Qualities Produced After Voice Training
*,†Nicholas A. Barone, *Christy L. Ludlow, and †Cari M. Tellis, *Harrisonburg, Virginia, and †Dallas, Pennsylvania
Summary: Characteristics of true vocal fold vibration such as the proportion of closed phase of vibration to open phase, longitudinal tension, and the amount of medial compression are used to define four conditions during Estill Voice Training. However, it is unknown whether trainees achieve these phonatory differences after training. Acoustic and aerodynamic measures were used to determine differences in Slack, Thick, Thin, and Stiff conditions. Twenty-four female speech-language pathology graduate students received training perceiving and producing these four conditions and volunteered to participate 3−5 months later. After a 20-minute refresher training, participants were recorded using the Phonatory Aerodynamic System with electroglottography and Computerized Speech Lab. Four Estill Voice Training experts independently categorized the voice quality productions. Aerodynamic and acoustic measures of productions classified by at least three of four experts as having the intended quality determined if measures differentiated among voice qualities and supported the hypothesized physiological concepts used in training at Bonferroni corrected P ≤ 0.0063. Results showed that Slack had low fundamental frequency (fo), low sound pressure level (SPL), and high vibratory instability; Thick had high subglottal pressure (Psg), high SPL, and high vibratory stability; Stiff had high airflow while Thin had lower Psg than Thick. Seven measures differentiated the four qualities with 88.1% accuracy while only Psg, airflow, and jitter were required to differentiate Thick, Stiff, and Thin with 88.7% accuracy. As acoustic and aerodynamic measures differentiated among voice qualities and supported the theoretical physiological characteristics used in training, they could be used to track accuracy during training.
Key Words: Voice qualities−Voice training−Acoustic measures−Aerodynamic measures−Discriminant function analysis−Classification accuracy.
Abbreviations: fo, fundamental frequency−Hz, Hertz−dB, decibels−Psg, subglottal pressure−SPL, sound pressure level−NHR, noise-to-harmonic ratio−PAS, Phonatory Aerodynamic System−EGG, electroglottograph−CSL, Computerized Speech Lab−MDVP, Multidimensional Voice Program.
Accepted for publication July 15, 2019.
Funding: Support in part for the first author’s research was provided by the home institution during his research training program as a PhD student at James Madison University. These funding sources had no involvement in the study or the development of the article for submission reporting the research.
Conflicts of Interest: The authors declare that there is no conflict of interest.
From the *Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, Virginia; and the †Department of Speech Language Pathology, Misericordia University, Dallas, Pennsylvania.
Address correspondence and reprint requests to Nicholas A. Barone, Communication Science and Disorders Program, Department of Human Services, Curry School of Education and Human Development, University of Virginia, PO Box 400267, Charlottesville, VA 22904. E-mail: Nab5d@virginia.edu
Journal of Voice, Vol. &&, No. &&, pp. &&−&&
0892-1997
© 2019 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
https://doi.org/10.1016/j.jvoice.2019.07.011

INTRODUCTION

The ability to perceive and produce different voice qualities is important for the understanding and treatment of people with voice disorders. Graduate students in speech-language pathology are required to learn to perceptually differentiate between voice qualities and to control their voice production system to produce voice qualities accurately and reliably. Voice qualities are auditory-perceptual phenomena.1−5 The categories of voice quality used for voice training may differ from those used to describe voice disorders. Further, varying types of experience and training can contribute to the reliable classification of various voice qualities.6,7
A common terminology for describing different categories of voice quality is needed. One approach to developing a common terminology is to refer to possible physiological and anatomical mechanisms that might be used during the production of different voice qualities.2,8−18 By referring to hypothesized mechanisms for producing different voice qualities, both perceptual and production accuracy could possibly be reinforced during training. The Estill Voice Training system postulates that different physiological mechanisms are used to produce different voice qualities (37−39). Although such an approach may benefit student training when learning to perceive and produce different voice qualities, it is unknown whether the different voice qualities differ on acoustic and aerodynamic attributes that might support the hypothesized physiological differences among the voice qualities.
The Estill Voice Training system19,20 postulates a framework for defining voices by the component systems used to create different voice qualities at the level of the true vocal folds, and the vocal tract including, but not limited to the true vocal folds: body and cover, false vocal folds, aryepiglottic sphincter, larynx position, and velopharyngeal closure. For this study, the focus was on the manipulations of the true vocal folds that participants were expected to use to produce the different voice qualities. These manipulations refer to vocal fold mass distribution using changes in length and tension properties, that is, conditions of the body and cover, for the production of different perceptual voice qualities.21−23 Imaging studies using magnetic resonance imaging and x-ray computed tomography
during the production of voice qualities have indicated possible differences in vocal fold shapes.24−27 Also, the vertical depth of contact between the vocal folds during vibration has been examined using high-speed stereo-endoscopy28 and high-speed endoscopy with laser measurement systems,29−31 to indicate differences between voice qualities. Four conditions of the true vocal folds body and cover have been described based on concepts of the depth of contact of the folds and longitudinal tension.19,21,23,28,30,32 The four different conditions are Slack (glottal fry), Thick (modal or chest voice), Thin (soft clear voice), and Stiff (breathy voice). For clarity for readers unfamiliar with the Estill model, in this manuscript, the four conditions (Thick, Thin, Stiff, and Slack) will be referred to as voice qualities. This study examined acoustic and aerodynamic measures of these four voice qualities to determine whether there was acoustic and aerodynamic evidence of different vocal fold body and cover conditions for each.
The following explanations are used when teaching perception and production of the four qualities. Thick voice quality or modal (chest) voice is theorized to involve a pliant body and cover with a thick, deep contact of the folds along the full medial edge from the lower to the upper border.19 This deep margin of contact is hypothesized to be due to a lax vocal ligament with thyroarytenoid muscle activation.21,32 Complete closure of the folds with full anterior to posterior vibration is also proposed to occur. Thin voice quality is theorized to involve a less-flexible body and cover with a shallow depth of contact of the folds that does not span the full lower to upper border of the medial edge, resulting in a rounding and thinning of the edge.19 Complete closure of the glottal space is hypothesized with full anterior to posterior vibration. Longitudinal tension is thought to be applied to the vocal fold due to a tight vocal ligament preventing the lower margin of the vocal folds from coming into contact.32
Stiff voice quality condition (falsetto and/or breathy voice) is thought to involve elongated vocal folds with less contact of the vocal folds without full glottal closure, resulting in a breathy voice quality.19 Longitudinal tension is hypothesized to create a thinner margin of vibration21,32 and the arytenoids may be slightly abducted, resulting in a posterior glottal gap. In the Slack condition (glottal fry), the vocal fold body and cover are hypothesized to be loose, resulting in an inconsistent vibratory pattern due to increased vibratory mass at points along the vibratory margin with audible gaps.32 The vocal folds are hypothesized to be short and flaccid with decreased longitudinal tension creating a thicker, more compact vocal fold vibratory margin.
Currently, it is not feasible to verify the hypotheses concerning physiological mechanisms of the vocal folds and cover without invasive high-speed three-dimensional high-resolution imaging of vocal fold vibration during human voice production. Based on the hypothesized physiological differences, however, some acoustic and aerodynamic differences between these four voice qualities might be expected. For example, voice intensity may be highest in Thick, lower in Thin and Stiff, and lowest in Slack. Voice airflow might be highest in Stiff, lower in Thin and Thick, and lowest in Slack; and fo might be lowest in Slack, higher in Thick, higher again in Thin, and highest in Stiff. Because of some hypothesized differences, subglottal pressure may be highest in Slack and Thick and reduced in Thin and Stiff. Because of instability during vibration, the noise-to-harmonic ratio (NHR) would be lowest in Thick and Thin and higher in Slack and Stiff. Similarly, instability of the vibratory period should be greatest in Slack and Stiff and less in Thick and Thin.
This study examined the production of the four voice qualities by graduate speech pathology students trained using the physiological Estill Voice Training concepts taught within their class on voice disorders. Productions of these four qualities were recorded from students, and each production was then independently categorized as one of the four qualities, based on perceptual judgment, by external reviewers who were experts in Estill Voice Training. Those productions that were categorized by at least 75% of the external reviewers as having the same quality as was intended by the students during production were then analyzed using acoustic and aerodynamic measures to examine (1) whether productions of the four qualities differed on acoustic and aerodynamic measures of voice production, (2) whether the four qualities could be discriminated based on their acoustic and aerodynamic measures, and (3) whether the acoustic and aerodynamic differences were supportive of the hypothesized physiological concepts used in training.

METHODS
The protocol and consent were approved by the Internal Review Board of Misericordia University and James Madison University.

Participants
Potential participants were female students who had completed a modified version of the Estill Voice Training vocal quality perception and production training as part of a Voice Disorders course in a speech-language pathology master’s degree curriculum 3−5 months prior to the study. During the voice training, the third author, a certified course instructor in Estill Voice Training with 13 years of experience teaching these concepts, was the professor teaching the course along with the student assistant, the first author in this study. The voice training followed levels 1 and 2 Estill Voice Training method modified over one semester and involved approximately 8 hours of direct instruction in the classroom, additional home practice, and graded (for credit) assignments including one at the end of the semester when students were required to produce the voice qualities and conditions at the completion of the course.
Recruitment for the study took place 3−5 months after the course was completed and grades had been assigned, was independent of any coursework, and was entirely voluntary. Inclusion criteria were a satisfactory grade on the
Voice Disorders course that involved Estill voice quality training. In addition, after informed consent, participants were examined by the first author and criteria included that the participants had a score of 10 or less out of 50 on the Voice Handicap Index,33 and a score of 10 or less out of 100 on the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V)34 on each of the six voice quality features (overall severity, roughness, breathiness, strain, pitch, and loudness) at time of admission to the study. Exclusion criteria were a history of vocal pathology and previous training in voice performance prior to the Voice Disorders course.

Procedures
Voice quality refresher training
A brief 20-minute refresher training on the target voice qualities derived from level 1 Estill Voice Training19 was presented by the first author. Following a review of the four voice quality conditions (Thick, Thin, Stiff, and Slack), the participants practiced each of them while receiving trainer feedback. Participants were instructed to focus on changing only their true vocal fold body and cover configurations while maintaining a neutral or consistent vocal tract configuration. Spectrographic analysis on Voiceprint Plus 6.0 software (Estill Voice International, Pittsburgh, Pennsylvania) was also used as feedback during training, comparing participant production spectrographs to expert production spectrographic examples provided within the Voiceprint software. When participants reported that they were able to produce each target voice quality condition and were judged by the trainer as accurate in their productions of each of the conditions, recording was conducted. If a participant was judged unable, or self-reported they were unable, to adequately produce any of the four conditions, they were excluded from further participation in the study.

Post-training recording
After 20 minutes of refresher training, participants’ productions were digitally recorded in a sound-treated, quiet room with the Phonatory Aerodynamic System (PAS) with electroglottograph (EGG) and the Computerized Speech Lab (CSL) running the Multidimensional Voice Program (MDVP; KayPENTAX, Lincoln Park, New Jersey). The PAS instrumentation included a facemask with a pneumotachometer (airflow head), an intraoral pressure transducer with tubing placed in the mouth, and a condenser microphone, inset within the PAS equipment at a fixed distance (16 cm) from the pneumotachometer and previously calibrated for measurement of sound pressure level (SPL) in dB. The CSL MDVP was used to record the voice for acoustic analysis.
Prior to testing, air volume was calibrated for the pneumotachometer for each participant using a 1-L syringe. Intraoral pressure in centimeters of water was also calibrated as part of the PAS system following manufacturer instructions (KayPENTAX). The EGG transducers were positioned on the skin over the thyroid cartilage, and the adequacy of the EGG signal was examined on the computer display. Participants were instructed on securing the mask around the nose and mouth to provide a good seal and prevent air leakage.
Because Smitheran and Hixon35 noted that different fo values could lead to differences in aerodynamic and acoustic measures, we chose to control the target fo to reduce these effects among the three qualities: Thick, Thin, and Stiff. Estill Training states that Thick, Thin, and Stiff can be produced at any fo,19,20 so we chose to use the same fo target for these productions. However, as Slack is characterized by a low and unstable fo,36 we did not attempt to control the target fo for this quality. As all potential participants were female, a pure tone at 262 Hz (C4) was played as the target fo for a participant to produce on all targets except Slack.35
During testing, the examiner checked that the pneumotachometer mask was sealed on the participant’s face and that the lips were sealed around the intraoral tubing during production of the bilabial plosive /p/. When the participants reported they were ready, they were instructed to repeat /pi/ five times for each of three trials at the target fo (except for Slack) while the examiner visually cued the participant to produce each syllable to ensure an appropriate rate at approximately 88 beats/min.37 This was followed by production of a single /pi/ with the vowel prolonged for 4 seconds with the PAS facemask secured against the face. The effects of the pneumotachometer on the participants’ perception of their production were reduced as much as possible by producing target voice quality conditions with the mask off immediately prior to producing the same target with the mask on through the pneumotachometer. Data were collected in the following order across all subjects: Thick, Thin, Stiff, and then Slack.

Preparation of samples for categorizing by external reviewers
Following data collection, each of the participants’ productions was compiled into .WAV files from the CSL recordings. To determine how accurately participants achieved the target qualities, these were provided to external expert reviewers. The .WAV files consisted of five productions of /pi/ for each voice quality recorded from each participant, uploaded to an online survey (Qualtrics, Provo, Utah). The survey consisted of 119 audio samples in random order; 96 were participant productions (24 participants with four samples each) and 23 were randomly repeated samples to be used for determining intrarater reliability.

External reviewers categorizing of voice qualities
Potential reviewers credentialed with the highest level Estill Voice Training certification were contacted using the Estill International website database of Certified Course Instructors. If they agreed to participate, they were sent a link to a Qualtrics survey and asked to review unlabeled WAV file samples, played using Qualtrics. The reviewers were
instructed to listen to each sample and choose the voice quality condition from the four categories that best matched the sample. Reviewers were able to listen to the samples as often as they wanted before selecting a category. The selections completed by the expert reviewers were downloaded from the Qualtrics survey website and transferred to a spreadsheet for analysis.

Acoustic and aerodynamic measures
Acoustic measures were made using the MDVP software. Measures were made from the 4-second recording of the prolonged /i/ vowel after /p/, and included mean fo in Hz, SPL in dB, jitter percent (percent of cycle-to-cycle variation in fo), shimmer percent (percent of cycle-to-cycle variation in amplitude), and NHR. Using the EGG recordings, the closed phase quotient (proportion of the cycle when the vocal folds were closed) was obtained using the PAS software. This provided a measure of the proportion of the vibratory cycle that the vocal folds were in contact.
From the five repetitions of /pi/, the PAS software was used to automatically derive measures of the estimated peak subglottal pressure (Psg) in centimeters of water (cm H2O) and mean airflow during voicing in liters per second (L/s). The PAS software contained an autothreshold system that only included those productions meeting the software criteria before computing the mean values for SPL, Psg, airflow, and EGG closed quotient within an utterance of five /pi/ repetitions. Mean measures for each production were transferred into a spreadsheet containing the four target voice quality productions of each participant.

Statistical analysis
The data were transferred into SYSTAT version 13 and SPSS version 25 (IBM, Armonk, NY) for statistical analysis. Using the categories identified by each of the external reviewers, inter-reviewer agreement and intra-reviewer agreement were measured using Cohen’s Kappa coefficients.38 To determine which productions were accurate representations of one of the four qualities, only the productions that were categorized by 75% of the external reviewers as having the same voice quality as was intended by the participant were selected for analysis. These productions were used for voice quality comparisons on the acoustic and aerodynamic measures using General Linear Model Analysis of Variance (GLM ANOVA) and discriminant function analyses (DFA). The GLM ANOVAs compared the four qualities on each of the acoustic and aerodynamic measures. To correct for multiple comparisons using eight measures, the required alpha for statistical significance was corrected to 0.05/8 ≈ 0.0063 for quality comparison ANOVAs. If an ANOVA was statistically significant, post hoc paired comparisons among voice qualities were also Bonferroni corrected by using P < 0.0063.
As the examiner played a pure tone at 262 Hz (C4) before the participant produced the vowel or /pi/ repetitions for all voice qualities except Slack, differences in fo could be expected based on methodological differences; therefore, fo was not included as a possible predictor in the DFA examining whether the measures could predict voice quality. A subsequent stepwise backwards DFA examined which measures, including fo, could differentiate among Thin, Thick, and Stiff voice qualities.

RESULTS

Participants
Of the 27 female participants recruited into the study, 24 met all inclusion criteria and were able to produce the four voice qualities. One was excluded due to a history of voice pathology (vocal nodules) and two scored above 10 on the CAPE-V. The participants were between 21 and 35 years of age (mean 24.2 years).

External reviewers
Six persons were identified with the highest level of certification as trainers with Estill Voice Training expertise on the website and were sent invitations to participate. Four out of six expert trainers responded to the survey and agreed to participate. The four external reviewers who completed the survey had an average of 12.5 years of the highest level of certification and professionally used the voice quality conditions daily.

Data sets
For each of the 24 participants, four samples, one for each of the four qualities, were recorded, yielding a total of 96 samples. Of those, the PAS readings were inaccurate on 6 of the 96 recordings, due to mask placement errors, performance errors, or dysfunction of one of the transducers, resulting in 90 accurate items that were classified by the four external reviewers.

Voice quality classifications by external reviewers
Inter-reviewer reliability
Agreement between the four external reviewers was examined to determine the inter-reviewer reliability. Cohen’s Kappa values were computed between classifications by each reviewer pair using all 90 productions. Table 1 shows the Kappa values between reviewer pairs. A Kappa value >0.6 would indicate a good relationship.39 Although all

TABLE 1.
Cohen’s Kappa Values on Inter-Reviewer Agreement on the 90 Productions Classified by Each External Reviewer
           Judge 1   Judge 2   Judge 3
Judge 2    0.808
Judge 3    0.823     0.881
Judge 4    0.821     0.808     0.793
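The pairwise agreement statistic reported in Table 1 can be illustrated with a minimal sketch of Cohen's Kappa, computed as (observed agreement − chance agreement) / (1 − chance agreement). The function name `cohens_kappa` and the two short label lists are hypothetical illustrations; they are not the study's data, and this is not the SYSTAT/SPSS procedure the authors ran.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical classifications of eight samples by two reviewers.
judge_1 = ["Slack", "Thick", "Thin", "Stiff", "Thin", "Thick", "Slack", "Stiff"]
judge_2 = ["Slack", "Thick", "Thin", "Stiff", "Stiff", "Thick", "Slack", "Stiff"]
kappa = cohens_kappa(judge_1, judge_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the reviewers disagree on one of eight samples; Kappa is lower than the raw 87.5% agreement because part of that agreement would be expected by chance.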
pairs had good inter-reviewer agreement, reviewer 4 had somewhat lower Kappa values in relation to the other three reviewers.

Intrajudge reliability
The reviewers differed in their intrareviewer reliability. The Cohen’s Kappa values measuring agreement between repeated classifications on 23 items were 0.762 for judge 1, 0.941 for judge 2, 0.942 for judge 3, and 0.752 for judge 4.

Agreement between external reviewer classifications and intended voice quality of productions by participants
The 90 productions that were used in the Qualtrics surveys and completed by each of the four external reviewers resulted in a total of 360 classifications (4 × 90). As shown in Table 2, of the 360 classifications, 328 (91.1%) matched the voice quality that the speaker intended. Of the 92 classifications of intended Slack productions, all but two were Slack (97.8%), while one classification was Thick and one was Thin. Of the 88 classifications of intended Stiff productions, 82 were Stiff (93.2%) while 6 were Thin (6.8%). Of the 92 classifications of intended Thick productions, 80 were Thick (86.9%), 5 were Stiff (5.4%), and 7 were Thin (7.6%). Of the 88 classifications of intended Thin productions, 76 were Thin (86.4%), 9 were Stiff (10.2%), and 3 were Thick (3.4%). Overall, of the 90 productions, 68 (75.6%) were categorized by all four reviewers as having the speaker’s intended voice quality.
Because of the differences in accuracy of some of the productions when classified by the four external reviewers, it was decided to only use those productions where at least three of the four external reviewers agreed with the voice quality classification intended by the participant when comparing productions on their acoustic and aerodynamic measures. There were 5 items out of 90 (5.6%) where three out of the four reviewers did not agree on the same quality and no quality could be compared with the speaker’s intended category. Of the remaining 85 productions, three reviewers agreed upon the same category as the participant’s intended voice quality with only one exception; one participant’s intended Thin production was classified by three of the four reviewers as Stiff; therefore, this production was removed from the database when using acoustic and aerodynamic measures to differentiate between productions of different voice qualities. Of the remaining 84 productions, all participants had at least one production included for further analysis and all were confirmed to have the intended quality by at least three of the four reviewers.

Differentiation between four voice qualities on acoustic and aerodynamic measures
GLM ANOVAs between voice qualities on acoustic and aerodynamic measures
The remaining 84 productions were used in the comparisons of the four voice qualities on the acoustic and aerodynamic measures. Figure 1 contains boxplots of the distribution of values for each acoustic and aerodynamic measure for the four voice qualities when at least three of the four external reviewers assigned the same voice quality as the quality intended by the participant. Table 3 contains the GLM ANOVA results and the post hoc paired comparisons. Following Bonferroni correction, alpha was set at ≤0.0063 (0.05/8). The only measure that did not show significant differences between qualities was the percent of the cycle that was closed on the EGG; the others all showed significant differences between voice qualities (Table 3). On SPL, Slack had lower SPL than Stiff, Thick, and Thin, while Thick had a higher SPL than Thin. On Psg, Thick had a higher pressure than Stiff, Slack, and Thin. On airflow, Stiff had a higher flow than Slack, Thick, and Thin. On fo, Slack had a lower fo than Stiff, Thick, and Thin. On jitter, shimmer, and NHR, Slack had a higher level than Stiff, Thick, and Thin.

Discriminant function analyses among qualities on acoustic and aerodynamic measures
Before conducting the DFAs, to determine if the acoustic and aerodynamic measures were independent, we examined the Pearson correlation coefficients among measures for the data set as a whole of 84 productions and within each quality. This was particularly a concern as jitter and shimmer were likely to be related and both could also be related to
TABLE 2.
Classifications of the Productions by External Reviewers and Their Relationship With Intended Voice Quality of the Participants
Participants’              Classifications by Judges
Intended Voice Quality     Slack   Stiff   Thick   Thin   Total (% Agreement)
Slack                      90      0       1       1      92 (97.8)
Stiff                      0       82      0       6      88 (93.2)
Thick                      0       5       80      7      92 (86.9)
Thin                       0       9       3       76     88 (86.4)
Total                      90      96      84      90     360 (91.1)
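The agreement percentages in Table 2 follow directly from its counts: each row's diagonal cell divided by the row total, and the summed diagonal divided by all 360 classifications. The sketch below recomputes them; the counts are copied from Table 2, while the names `TABLE_2`, `LABELS`, and `percent_agreement` are illustrative assumptions.

```python
LABELS = ["Slack", "Stiff", "Thick", "Thin"]

# Rows: intended quality. Columns (in LABELS order): reviewer classification.
# Counts copied from Table 2.
TABLE_2 = {
    "Slack": [90, 0, 1, 1],
    "Stiff": [0, 82, 0, 6],
    "Thick": [0, 5, 80, 7],
    "Thin": [0, 9, 3, 76],
}


def percent_agreement(table, labels):
    """Return per-quality and overall percent agreement with intended quality."""
    per_quality = {}
    matched = 0
    total = 0
    for i, label in enumerate(labels):
        row = table[label]
        per_quality[label] = 100.0 * row[i] / sum(row)
        matched += row[i]  # diagonal cell: classification matches intention
        total += sum(row)
    return per_quality, 100.0 * matched / total


per_quality, overall = percent_agreement(TABLE_2, LABELS)
for label in LABELS:
    print(f"{label}: {per_quality[label]:.2f}%")
print(f"Overall: {overall:.2f}%")
```

Thick evaluates to about 86.96%, which the table lists as 86.9; the other values match the table when rounded to one decimal place.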
FIGURE 1. Boxplots showing the median and the quartile distributions of each of the acoustic and aerodynamic measures in the four voice qualities, Stiff (SF), Slack (SK), Thick (TK), and Thin (TN). The measures shown are sound pressure level (SPL) measured in decibels (dB), estimate of subglottal pressure (Psg) in centimeters of water (cm H2O), airflow in liters per second (L/s), EGG in percent closed phase, fundamental frequency (fo) in Hertz (Hz), jitter in percent variability of fo, shimmer in percent dB SPL variability, and noise-to-harmonic ratio (NHR).
NHR. Although the relationship between jitter and shimmer was r = 0.876 (r² = 0.767) for the data set as a whole, the relationship varied among the four qualities, with r = 0.790 for Slack, r = 0.189 for Thick, r = 0.383 for Thin, and r = 0.848 for Stiff, indicating that the relationships varied within the different voice qualities. Similar findings occurred between jitter and NHR, with r = 0.857 (r² = 0.734) for the data set as a whole but varying among voice qualities: r = 0.764 for Slack, r = 0.309 for Thick, r = 0.105 for Thin, and r = 0.741 for Stiff. Therefore, it was decided to include all seven remaining measures in the DFAs, excluding only fo because of methodological differences between testing for Slack and for Thick, Thin, and Stiff, when a target fo was provided.
The DFA differentiating between the four voice qualities on the 84 productions was able to account for 100% of the
TABLE 3.
Results of the General Linear Model Analysis of Variance for Each of the Acoustic and Aerodynamic Measures Using α ≤ 0.0063 for the Quality Comparisons and the Bonferroni Corrected Post Hoc Paired Comparisons
           GLM ANOVA Results     Post Hoc Comparisons (P Values)
Measure    F Ratio    P          Stiff vs Slack   Stiff vs Thick   Stiff vs Thin   Slack vs Thick   Slack vs Thin   Thick vs Thin
SPL        53.154     <0.0005    <0.0005          ns               ns              <0.0005          <0.0005         <0.0005
Psg        26.915     <0.0005    ns               <0.0005          ns              <0.0005          ns              <0.0005
Airflow    55.361     <0.0005    <0.0005          <0.0005          <0.0005         ns               ns              ns
EGG        2.596      ns         -                -                -               -                -               -
fo         228.736    <0.0005    <0.0005          ns               ns              <0.0005          <0.0005         ns
Jitter     25.961     <0.0005    <0.0005          ns               ns              <0.0005          <0.0005         ns
Shimmer    20.306     <0.0005    <0.0005          ns               ns              <0.0005          <0.0005         ns
NHR        73.232     <0.0005    <0.0005          ns               ns              <0.0005          <0.0005         ns
Notes: When comparisons had a probability >0.0063, then the result was designated as nonsignificant (ns). When the initial comparison was “ns,” no post hoc comparisons were conducted, designated by “-.”
Abbreviations: SPL, sound pressure level; Psg, subglottal pressure; EGG, % closed phase on electroglottograph; fo, fundamental frequency; NHR, noise-to-harmonic ratio.
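The significance threshold used in Table 3 is a Bonferroni correction: the family-wise alpha of 0.05 divided by the eight measures compared, which gives 0.00625 (reported as 0.0063). A minimal sketch of that rule follows. The function names and the two example P values are hypothetical, since Table 3 reports only "<0.0005" or "ns" rather than exact probabilities.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def is_significant(p_value, family_alpha=0.05, n_tests=8):
    """True when p_value meets the Bonferroni-corrected threshold."""
    return p_value <= bonferroni_alpha(family_alpha, n_tests)


alpha = bonferroni_alpha(0.05, 8)
print(f"corrected alpha = {alpha:.5f}")

# Hypothetical P values, for illustration only.
print(is_significant(0.0004))  # below the corrected threshold
print(is_significant(0.02))    # would pass at 0.05 but not after correction
```

The second example shows why the correction matters: a result with P = 0.02 would count as significant at the uncorrected 0.05 level but is designated "ns" under the corrected threshold.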
TABLE 4.
Wilks’ Lambda, Canonical Correlations, and Eigenvalues for Canonical Discriminant Functions
Test of Functions   Wilks’ λ   χ²        df   P         rc      rc²     % of Variance   Eigenvalues
1−3                 0.037      255.918   21   <0.0005   0.901   0.812   61.5            4.332
2−3                 0.196      126.199   12   <0.0005   0.818   0.669   28.8            2.028
3                   0.594      40.325    5    <0.0005   0.637   0.406   9.7             0.683
Abbreviations: rc, canonical correlation; rc², effect size.
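Several columns of Table 4 are linked by standard discriminant analysis identities, which can be checked from the eigenvalues alone: rc² = λ/(1 + λ) for each function, the percent of variance is each eigenvalue's share of the eigenvalue sum, and Wilks' lambda for a test of functions k through the last is the product of 1/(1 + λ) over those functions. The sketch below applies these textbook relations to the three eigenvalues in the table; the function name `canonical_stats` is an assumption, and the code does not reproduce the authors' SPSS/SYSTAT analysis or the χ² column.

```python
import math

# Eigenvalues of the three discriminant functions, copied from Table 4.
EIGENVALUES = [4.332, 2.028, 0.683]


def canonical_stats(eigenvalues):
    """Derive rc, rc^2, % of variance, and Wilks' lambda from eigenvalues."""
    total = sum(eigenvalues)
    rows = []
    for k, eig in enumerate(eigenvalues):
        rc_squared = eig / (1.0 + eig)
        # Wilks' lambda for the test of functions k through the last one.
        wilks = math.prod(1.0 / (1.0 + e) for e in eigenvalues[k:])
        rows.append(
            {
                "rc": math.sqrt(rc_squared),
                "rc2": rc_squared,
                "pct_variance": 100.0 * eig / total,
                "wilks_lambda": wilks,
            }
        )
    return rows


stats = canonical_stats(EIGENVALUES)
for i, row in enumerate(stats, start=1):
    print(
        f"function {i}: rc={row['rc']:.3f} rc2={row['rc2']:.3f} "
        f"variance={row['pct_variance']:.1f}% wilks={row['wilks_lambda']:.3f}"
    )
```

Rounded to three decimals, these derived values agree with the rc, rc², % of variance, and Wilks' λ columns of Table 4.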
variance and significantly differentiate between groups using the full model test (functions 1 through 3), partial model test (functions 2 and 3), and the test of function 3 (Table 4). All three functions combined accounted for 100% of dispersion. A large canonical correlation and effect size were found for functions 1 and 2, while function 3, although significant, had only a moderate canonical correlation and effect size.
The standardized discriminant function coefficients and structure coefficients were examined to determine which measurements contributed to group classification. Function 1 significantly discriminated Slack from all other conditions (Figure 2) with NHR and SPL, and to some extent jitter and shimmer, primarily contributing to group differences (Table 5). Function 2 significantly classified Thick from Thin, Stiff, and Slack, and Stiff from Thick, Thin, and Slack (Figure 2), with group differences primarily dependent on airflow (Table 5). Although function 3 only accounted for 9.7% of the variance, it still significantly classified Thin (group centroid = 1.523) from other qualities (Thick centroid = 0.622, Stiff centroid = 0.376, Slack centroid = 0.264), with subglottal pressure and airflow primarily responsible for group differences.
The classification accuracy using these three functions was 95.2% for Thick, 94.4% for Thin, 87% for Slack, and 77.3% for Stiff. Slack was low on function 1 (Figure 2) with high NHR, shimmer, and jitter, and low SPL (Figure 1). Three Slack productions were misclassified, two as Thin and one as Stiff. Stiff was high on both functions 1 and 2 (Figure 2), and was characterized by high airflow. It had the lowest classification accuracy, with three misclassified as Thin, one as Slack, and one as Thick. Thick was high on function 1 and low on function 2, with high SPL, high Psg, and low NHR, jitter, and shimmer. Only one Thick production was misclassified as Stiff. Thin was close to zero on functions 1 and 2, but distinctively low on function 3. It was
FIGURE 2. A plot of the canonical scores of each of the 84 productions based on the discriminant function analysis using seven acoustic
measures to differentiate between Slack, Thick, Thin, and Stiff. Although the model employed three factors only factors 1 and 2 are used to
display the canonical scores for each production.
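The per-quality and overall classification accuracies reported for the four-quality analysis can be checked against a confusion matrix. In the Python sketch below, the off-diagonal counts are the misclassifications reported in the results; the diagonal counts (and therefore the group sizes of 23 Slack, 22 Stiff, 21 Thick, and 18 Thin) are an assumption, inferred from the reported percentages and the total of 84 productions rather than stated in this section.

```python
# Rows: intended quality. Columns: quality assigned by the discriminant functions.
# Off-diagonal counts: misclassifications reported in the results.
# Diagonal counts: ASSUMED, inferred from reported accuracies and 84 productions.
confusion = {
    "Slack": {"Slack": 20, "Stiff": 1, "Thick": 0, "Thin": 2},
    "Stiff": {"Slack": 1, "Stiff": 17, "Thick": 1, "Thin": 3},
    "Thick": {"Slack": 0, "Stiff": 1, "Thick": 20, "Thin": 0},
    "Thin": {"Slack": 0, "Stiff": 0, "Thick": 1, "Thin": 17},
}


def per_class_accuracy(matrix):
    """Percent of each intended quality that was classified correctly."""
    return {q: 100.0 * row[q] / sum(row.values()) for q, row in matrix.items()}


def overall_accuracy(matrix):
    """Percent of all productions that were classified correctly."""
    correct = sum(row[q] for q, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total


for quality, pct in per_class_accuracy(confusion).items():
    print(f"{quality}: {pct:.1f}%")
print(f"Overall: {overall_accuracy(confusion):.1f}%")
```

Under these assumed group sizes the matrix sums to 84 productions, and the computed percentages round to the per-quality figures in the text and the 88.1% overall accuracy given in the Summary.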
TABLE 5.
Standardized Discriminant Function Coefficients (r) and Structure Canonical Coefficients (rs) With Effect Sizes (rs2) for
Seven Acoustic and Aerodynamic Measures for Differentiating Among the Four Voice Qualities: Stiff, Slack, Thick, and
Thin
            Function 1              Function 2              Function 3
Variables   r       rs      rs2     r       rs      rs2     r       rs      rs2
SPL         0.515   0.652   0.425   0.293   0.204   0.042   0.181   0.316   0.100
Psg         0.044   0.279   0.078   0.502   0.387   0.150   0.708   0.734   0.539
Airflow     0.207   0.401   0.161   0.911   0.778   0.605   0.393   0.474   0.225
EGG         0.053   0.103   0.011   0.014   0.157   0.025   0.031   0.039   0.002
Jitter      0.139   0.440   0.194   0.224   0.201   0.040   0.228   0.284   0.081
Shimmer     0.199   0.408   0.166   0.105   0.068   0.005   0.058   0.218   0.048
NHR         0.921   0.778   0.605   0.026   0.078   0.006   0.764   0.409   0.167
Note: The rs2 provides the percentage of variance accounted for in the composite score for each function.
Abbreviations: SPL, sound pressure level; Psg, subglottal pressure; EGG, % closed phase electroglottograph; NHR, noise-to-harmonic ratio.
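As the note to Table 5 indicates, each effect size (rs2) is the square of the corresponding structure coefficient (rs), so the measures that dominate a function can be ranked directly from the rs column. The Python sketch below does this for function 1 using values transcribed from the table; the helper names are illustrative only.

```python
# Structure coefficients (rs) for function 1, transcribed from Table 5.
structure_f1 = {
    "SPL": 0.652,
    "Psg": 0.279,
    "Airflow": 0.401,
    "EGG": 0.103,
    "Jitter": 0.440,
    "Shimmer": 0.408,
    "NHR": 0.778,
}


def effect_sizes(structure):
    """Square each structure coefficient to get its effect size (rs2)."""
    return {measure: rs ** 2 for measure, rs in structure.items()}


def rank_by_effect(structure):
    """Measures ordered from largest to smallest effect size."""
    sizes = effect_sizes(structure)
    return sorted(sizes, key=sizes.get, reverse=True)


for measure in rank_by_effect(structure_f1):
    print(f"{measure}: rs2 = {effect_sizes(structure_f1)[measure]:.3f}")
```

For function 1 the two largest effect sizes belong to NHR and SPL, matching the description of that function in the results.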
low on jitter, shimmer, NHR, SPL, Psg, and airflow with high classification accuracy and only one misclassification as Thick.

A pure tone of 262 Hz (C4) was played as a target pitch immediately prior to the production of three qualities: Thick, Thin, and Stiff; therefore, with no difference in procedures between these three qualities, the measure of fo could be included when acoustic and aerodynamic measures were examined for differentiation among these three qualities on 62 productions (with Slack removed). A stepwise backwards DFA was calculated to identify the optimal measure combination for differentiating between Thick, Thin, and Stiff. The optimal combination was just three measures: Psg, airflow, and jitter for the full model test of function 1 and the test of function 2, both of which were significant at P < 0.0005 (Table 6). A large canonical correlation and effect size were found for function 1 and a moderate canonical correlation and effect size were found for function 2. The two functions accounted for 100% of the dispersion.

For function 1 Stiff was high and Thick was low, and on function 2 Thin was low (Figure 3). Airflow and jitter primarily contributed to group differences on function 1, differentiating Stiff from Thick and Thin (Figure 3 and Table 7). For function 2, Thin was discriminated from Thick and Stiff based on Psg and airflow. The overall classification accuracy was 88.7% correct. Three Stiff productions were misclassified as Thin, while one Thick was misclassified as Thin and two Thin items were misclassified as Thick. The classification accuracy was 86.4% for Stiff, 95.2% for Thick, and 88.9% for Thin. Stiff was distinguished by high airflow and jitter, while Thick had greater subglottal pressure (Figure 1). Thin had low subglottal pressure compared to Thick.

DISCUSSION
The purpose of this study was to determine if the concepts that are used to guide training for the perception and production of four voice qualities in Estill Voice Training are supported by differences in acoustic and aerodynamic measures among productions of these voice qualities. To assure that the voice qualities were accurate, four external reviewers with expert experience in Estill training were asked to review the productions and independently categorize them as one of the four voice qualities. Overall there was high agreement between the intended voice qualities and the categories assigned by the external reviewers. This agreement was greatest for Slack and high for Stiff, with both well over 90% accuracy. Thick and Thin were less accurate, with percent correct in the mid-80% range and some disagreement between these two categories and with Stiff.

The DFA examined the acoustic and aerodynamic measures that characterized each of the four qualities. Seven measures were included; as a target fo was not included during the procedure for Slack while the other three required the participant to match a 262-Hz (C4) tone, fo was
TABLE 6.
Wilks’ Lambda, Canonical Correlations, and Eigenvalues for Canonical Discriminant Functions for Backwards Stepwise
Discriminant Functions Between Stiff, Thick and Thin
Test of Functions   Wilks’ λ   χ2        df   P         rc      rc2     % of Variance   Eigenvalues
1 and 2             0.117      122.525   6    <0.0005   0.887   0.787   81.6            3.689
2                   0.546      34.443    2    <0.0005   0.673   0.453   18.4            0.830
Abbreviations: rc, canonical correlation; rc2, effect size.
FIGURE 3. Canonical score plot shows the functions 1 and 2 canonical scores for Stiff, Thick, and Thin from a stepwise discriminant func-
tion analysis that employed three measures: estimate of subglottal pressure (Psg) in centimeters of water (cm H2O), airflow in liters per sec-
ond (L/s), and Jitter in percent variability of fo.
TABLE 7.
Standardized Discriminant Function Coefficients (r) and Structure Canonical Coefficients (rs) With Effect Sizes (rs2) of Stepwise (Backward) Discriminant Function Analysis Selecting the Best Combination of Measures for Differentiating Among Stiff, Thick, and Thin From Seven Acoustic and Aerodynamic Measures
            Function 1              Function 2
Variables   r       rs      rs2     r       rs      rs2
Psg         0.505   0.303   0.092   0.857   0.945   0.893
Airflow     0.691   0.636   0.404   0.314   0.564   0.318
Jitter      0.595   0.685   0.469   0.088   0.147   0.022

excluded. Two qualities were accurately categorized, 95% for Thick and 94% for Thin by the measures on the initial DFA. Thus, although external reviewers judged participants as less able to produce Thick and Thin, these qualities had few classification errors based on the DFA. The only error for Thick was one misclassification as Stiff, and Thin had only one erroneously classified as Thick.

Thick voice quality, or modal (chest) voice, is theorized to involve a pliant body and cover with a deep contact of the folds along the full medial edge from the lower to the upper border and complete closure of the folds with full anterior to posterior vibration.19 The Thick deep margin of contact is hypothesized to be due to a lax vocal ligament with thyroarytenoid muscle activation.21,32 These characteristics would predict no loss of air between pulses, thus low airflow, a high subglottal pressure, and low NHR, jitter, and shimmer. The results showed that Thick was characterized by high SPL, high subglottal pressure, and low NHR, jitter, and shimmer, which is in agreement with the concepts used in training Thick.

The Thin voice quality is theorized to have a less-flexible body and cover with a shallow depth of contact of the folds that does not span the full lower to upper border of the medial edge.19 Complete closure of the glottal space is hypothesized with full anterior to posterior vibration. Longitudinal tension is thought to be applied to the vocal fold due to a tight vocal ligament preventing the lower margin of the vocal folds from coming into contact.32 These characteristics would predict less SPL and less subglottal pressure than Thick, with similar airflow, all of which were found. The one misclassification for Thin was as Thick. The only distinguishing differences for Thin were the lower SPL and lower Psg than Thick.

Stiff voice quality condition (falsetto and breathy voice) is thought to be due to the elongated vocal folds being stiff, with less contact of the vocal folds and without full glottal closure, resulting in a breathy voice quality.19 Longitudinal tension is hypothesized, creating a thinner margin of vibration.21,32 In Stiff, the arytenoids may be slightly abducted, resulting in a posterior glottal gap. These characteristics would predict high airflow, which was the main distinguishing measure for Stiff. Stiff, however, had the highest number of misclassifications with only 77% classified correctly, with three misclassified as Thin, one misclassified as Slack, and one as Thick. This suggests that Stiff was slightly more difficult to characterize using the acoustic and aerodynamic
measures. Stiff productions, however, were accurately classified by the external reviewers when compared with the intended production classification at 93.2%. Thus, Stiff proved to be easily learned by the participants.

In the Slack condition (glottal fry), it is hypothesized that the vocal fold body and cover are loose, resulting in a very slow vibratory pattern due to increased vibratory mass with audible gaps.32 The vocal folds are hypothesized to be short and flaccid with decreased longitudinal tension, creating a thicker, more compact vocal fold vibratory margin. This would predict a low fo, a high proportion of closed phase on the EGG, and high instability of vibration increasing NHR, jitter, and shimmer. The results found high NHR and high shimmer, low airflow, and low SPL. In addition, Slack was characterized by a low fo.

To improve classification accuracy between Thick, Thin, and Stiff, Slack was removed, fo was included, and a stepwise backwards DFA was conducted. The percent correct classification increased in the stepwise DFA, from 77% to 86.4% for Stiff, while Thick and Thin continued to have 95% and 89% classification accuracy while using Psg, airflow, and jitter. Therefore, fo did not differentiate between Stiff, Thick, and Thin, perhaps because a 262-Hz tone was played before each production of these three voice qualities.

Accordingly, voice training enabled the participants to achieve relatively good accuracy in producing the types of voice qualities and these in turn could be differentiated using acoustic and aerodynamic measures that would be expected based on the theoretical concepts used in the Estill Voice Training System. Ideally, direct measures of vocal fold lengthening, depth of contact from the superior to inferior surface of the medial edge of the vibrating folds, and muscle contraction would provide the most accurate feedback to trainees on their achievement of the physiological parameters intended for each production. Such measures, however, are not feasible during voice production without significant interference using invasive methods such as rigid endoscopy with lasers for measuring vocal fold length accurately, three-dimensional high-speed high-resolution imaging (not yet available), and laryngeal electromyography.

Besides the theoretical concepts being provided to guide trainees, facilitation during training is dependent upon feedback by experts. In addition, with training, trainees may be gaining perceptual accuracy in distinguishing their voice productions of the four qualities. Given that the acoustic and aerodynamic measures used here were found to distinguish between the four voice qualities, perhaps the measures of SPL, Psg, airflow, fo, and NHR could be used along with spectroscopy for monitoring trainees’ production accuracy in addition to perceptual judgments by experts during training.

This study had several limitations. First, only female participants were studied; it is unknown whether male trainees would have similar results after participating in Estill Voice Training. Second, no systematic information was gathered on the amount of training that is needed for trainees to gain accuracy in producing each of the voice qualities. It can be stated, however, that overall participants had good carryover from their initial training (3−5 months prior to testing) as shown by the high accuracy at producing the target sounds as judged by three out of four expert judges.

Third, the trainees were not assessed on their perceptual accuracy in classifying voice quality productions; only the accuracy of their productions was assessed by experts in the Estill Voice Training methods.

Fourth, by using the Phonatory Aerodynamic System with the Smitheran and Hixon35 technique for estimating subglottal pressure (Psg), we were able to examine whether changes in subglottal pressure occurred between the different voice qualities. The results demonstrated that Psg was among the best combination of measures for differentiating among Stiff, Thick, and Thin, as Thick had higher estimated levels of subglottal pressure. However, the use of the /pi/ repetition task with the Phonatory Aerodynamic System was not a usual voicing task and eliminated the ability to image the vocal folds during recording. The need for the pneumotachometer mask and to seal the lips during the /p/ closure to measure intraoral pressure placed some constraints on the participants. In spite of this, however, they were able to produce samples that were judged by the external reviewers as being representative of the four different voice qualities. Because the vocal folds could not be imaged during this task, we were unable to measure vocal fold length, configuration, or opening during vibration to confirm that the intended vocal fold configuration was achieved.

A separate study using rigid stroboscopy for high-speed video recording with parallel lasers for distance calibration into millimeters would be required. However, as rigid stroboscopy requires that the mouth is held open and the tongue protruded, this can alter vocal fold configuration. Other vocal fold imaging techniques such as magnetic resonance imaging and computed tomography require that the subject maintain the same position for many seconds to complete a scan, which is not possible during syllable repetition. Thus, imaging to confirm the vocal fold configuration used for each production presents challenges and would need to be conducted in a separate study.

We did not ask the participants to report the amount of effort they required to produce the different intended voice qualities. This would have been of interest particularly as it was 3−5 months after they had completed their Estill Voice Training.

This study involved four expert listeners for categorizing the productions as representing qualities based on Estill Voice Training and might therefore be considered a first test of this approach. Although a larger number of expert listeners might be optimal, this might increase the interexaminer differences if experts were included with a wide range of experience and skill in Estill Voice Training.

In conclusion, although indirect, the acoustic and aerodynamic measures of the four voice qualities did show support for the physiological concepts used in the Estill Voice Training method. Overall, Slack had a low fo, low SPL, and high vibratory instability. Thick was characterized by high
subglottal pressure, high SPL, and good vibratory stability. Stiff was characterized by high airflow and Thin had lower subglottal pressure than Thick. These measures were able to differentiate among the four voice qualities and might be used for assessing production accuracy during training.

SUPPLEMENTARY DATA
Supplementary data related to this article can be found online at https://doi.org/10.1016/j.jvoice.2019.07.011.

REFERENCES
1. Zraick RI, Kempster GB, Connor NP, et al. Establishing validity of the consensus auditory-perceptual evaluation of voice (CAPE-V). Am J Speech Lang Pathol. 2011;20:14–22.
2. Kreiman J, Gerratt BR. Perceptual assessment of voice quality: past, present, and future. SIG 3 Persp Voice Voice Disord. 2010;20:62–67.
3. Eadie TL, Baylor CR. The effect of perceptual training on inexperienced listeners' judgments of dysphonic voice. J Voice. 2006;20:527–544.
4. Shrivastav R, Sapienza CM, Nandur V. Application of psychometric theory to the measurement of voice quality using rating scales. J Speech Lang Hear Res. 2005;48:323–335.
5. Shrivastav R. Multidimensional scaling of breathy voice quality: individual differences in perception. J Voice. 2006;20:211–222.
6. Kreiman J, Gerratt BR, Precoda K, et al. Individual differences in voice quality perception. J Speech Lang Hear Res. 1992;35:512–520.
7. Kreiman J, Gerratt BR, Precoda K. Listener experience and perception of voice quality. J Speech Lang Hear Res. 1990;33:103–115.
8. Bassich CJ, Ludlow CL. The use of perceptual methods by new clinicians for assessing voice quality. J Speech Hear Disord. 1986;51:125–133.
9. De Bodt MS, Wuyts FL, Van de Heyning PH, et al. Test-retest study of the GRBAS scale: influence of experience and professional background on perceptual rating of voice quality. J Voice. 1997;11:74–80.
10. de Krom G. Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments. J Speech Lang Hear Res. 1995;38:794–811.
11. Fex S. Perceptual evaluation. J Voice. 1992;6:155–158.
12. Gelfer MP. Perceptual attributes of voice: development and use of rating scales. J Voice. 1988;2:320–326.
13. Kempster GB, Kistler DJ, Hillenbrand J. Multidimensional scaling analysis of dysphonia in two speaker groups. J Speech Lang Hear Res. 1991;34:534–543.
14. Kent RD. Hearing and believing: some limits to the auditory-perceptual assessment of speech and voice disorders. Am J Speech Lang Pathol. 1996;5:7–23.
15. Laver JD. Voice quality and indexical information. Int J Lang Commun Disord. 1968;3:43–54.
16. Sonninen A, Hurme P. On the terminology of voice research. J Voice. 1992;6:188–193.
17. Sonninen A. Phoniatric viewpoints on hoarseness. Acta Otolaryngol. 1970;69:68–81.
18. Yumoto E, Sasaki Y, Okamura H. Harmonics-to-noise ratio and psychophysical measurement of the degree of hoarseness. J Speech Lang Hear Res. 1984;27:2–6.
19. Klimek MM, Obert K, Steinhauer K. The Estill Voice Training System, Level One: Compulsory Figures for Voice Control. Estill Voice Training Systems International, LLC; 2005.
20. De Bot K, Lowie W, Verspoor M. A dynamic systems theory approach to second language acquisition. Bilingualism Lang Cognit. 2007;10:7–21.
21. Van den Berg J. Myoelastic-aerodynamic theory of voice production. J Speech Lang Hear Res. 1958;1:227–244.
22. Hirano M. Morphological structure of the vocal cord as a vibrator and its variations. Folia Phoniatr Logop. 1974;26:89–94.
23. Titze IR. On the mechanics of vocal-fold vibration. J Acoust Soc Am. 1976;60:1366–1380.
24. Story BH. An overview of the physiology, physics and modeling of the sound source for vowels. Acoust Sci Technol. 2002;23:195–206.
25. Story BH, Titze IR, Hoffman EA. The relationship of vocal tract shape to three voice qualities. J Acoust Soc Am. 2001;109:1651–1667.
26. Bergan CC, Titze IR, Story B. The perception of two vocal qualities in a synthesized vocal utterance: ring and pressed voice. J Voice. 2004;18:305–317.
27. Samlan RA, Story BH. Relation of structural and vibratory kinematics of the vocal folds to two acoustic measures of breathy voice based on computational modeling. J Speech Lang Hear Res. 2011;54:1267–1283.
28. Sommer DE, Tokuda IT, Peterson SD, et al. Estimation of inferior-superior vocal fold kinematics from high-speed stereo endoscopic data in vivo. J Acoust Soc Am. 2014;136:3290–3300.
29. Semmler M, Kniesburges S, Birk V, et al. 3D reconstruction of human laryngeal dynamics based on endoscopic high-speed recordings. IEEE Trans Med Imaging. 2016;35:1615–1624.
30. Semmler M, Döllinger M, Patel RR, et al. Clinical relevance of endoscopic three-dimensional imaging for quantitative assessment of phonation. Laryngoscope. 2018;128:2367–2374.
31. de Mul FF, George NA, Qiu Q, et al. Depth-kymography of vocal fold vibrations: part II. Simulations and direct comparisons with 3D profile measurements. Phys Med Biol. 2009;54:3955.
32. Titze IR. Principles of Voice Production. Iowa City, IA: National Center for Voice and Speech; 2000.
33. Jacobson BH, Johnson A, Grywalski C, et al. The voice handicap index (VHI): development and validation. Am J Speech Lang Pathol. 1997;6:66–70.
34. Kempster GB, Gerratt BR, Abbott KV, et al. Consensus auditory-perceptual evaluation of voice: development of a standardized clinical protocol. Am J Speech Lang Pathol. 2009;18:124–132.
35. Smitheran JR, Hixon TJ. A clinical method for estimating laryngeal airway resistance during vowel production. J Speech Hear Disord. 1981;46:138–146.
36. Gottliebson RO, Lee L, Weinrich B, et al. Voice problems of future speech-language pathologists. J Voice. 2007;21:699–704.
37. Grillo EU, Verdolini K. Evidence for distinguishing pressed, normal, resonant, and breathy voice qualities by laryngeal resistance and vocal efficiency in vocally trained subjects. J Voice. 2008;22:546–552.
38. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med. 2012;22:276–282.
39. Fleiss JL. Design and Analysis of Clinical Experiments. John Wiley & Sons; 2011.
