TEXT 2. Gama 2024. Auditory Training With Synthesized Voice Anchors - Effects on Rater Agreement.

Auditory Training With Synthesized Voice Anchors: Effects on Rater Agreement
*,†Ana Cristina Côrtes Gama, ‡Sabrina Martins da Mata, §,║Priscila Campos Martins dos Santos, ¶Maurílio Nunes Vieira, #João Pedro Hallack Sansão, and **Roberto da Costa Quinino, *#Minas Gerais, †, and ‡║¶**Belo Horizonte, Brazil, §
ABSTRACT: Purpose. To investigate the effects of auditory training with synthesized voices on intra- and inter-rater agreement of the auditory-perceptual voice analysis of roughness and breathiness.
Methods. This was an experimental study consisting of four auditory training sessions. The sample consisted of twenty raters, students from a Speech-Language Pathology course, who had previous experience with auditory-perceptual assessment. The raters participated in the four training sessions with a seven-day break in between sessions. Each training consisted of three tasks: 1) Pre-training activity: Participants were asked to rate 20 natural voices, normal and dysphonic, from zero to three, according to the parameters of roughness and breathiness; 2) Training activity: Synthesized voice anchor stimuli were presented, and participants were asked to rate them from zero to three. Four stimuli were related to roughness and four to breathiness. Participants heard 20 voice stimuli and were instructed to pair the natural voice with the synthesized anchor stimulus that most resembled it; and 3) Post-training activity: the 20 voices from the pre-training activity were randomized and participants rated the same voices, without prior knowledge that these were repeated. Statistical analysis of data was performed using the AC2 test, to assess the extent of agreement between raters, and the Friedman test to compare the training sessions. A 5% significance level was considered.
Results. For the auditory-perceptual voice analysis of roughness, intra-rater agreement results ranged from 79% to 86% between the first and fourth auditory training session, with improvement in intra-rater agreement from the fourth session forward (P = 0.005). For the analysis of breathiness, results ranged from 88% to 92% between the first and fourth auditory training sessions, with improvement from the fourth session forward (P = 0.036). Inter-rater agreement results for the auditory-perceptual analysis of roughness ranged from 23% to 34%, and from 48% to 61% for breathiness, with no differences regarding training (P = 0.855).
Conclusion. The auditory-perceptual breathiness parameter had a higher AC2 indicator compared to the roughness parameter, suggesting better agreement. The intra-rater agreement showed improvement starting from the fourth auditory training session for the assessment of roughness and breathiness. The auditory training program did not show a positive inter-rater agreement impact.
Key Words: Voice−Voice Quality−Voice Disorders−Dysphonia−Auditory Perception−Voice Training.
Accepted for publication September 16, 2021.
Study conducted at the Department of Speech-language Pathology (Faculty of Medicine) and Department of Statistics (Institute of Exact Sciences), Universidade Federal de Minas Gerais - UFMG - Minas Gerais (MG).
From the *Department of Speech-language Pathology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; †Researcher of Productivity at National Council of Scientific Research, Brazil; ‡Speech-language Pathology, Department of Speech-language Pathology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; §Speech-language Pathology, Post-graduate program (MSc) in speech therapy Sciences; ║Department of Speech-language Pathology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; ¶Department of Electronic Engineering, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; #Department of Technology in Civil Engineering, Computing, Automation, Telematics and Humanities, Federal University of São João Del Rei − UFSJ − Ouro Branco, Minas Gerais, Brazil; and the **Department of Statistics of the Institute of Exact Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
Address correspondence and reprint requests to Ana Cristina Côrtes Gama, Universidade Federal de Minas Gerais, Av. Alfredo Balena, 190/249, Belo Horizonte, Minas Gerais, CEP: 30130-100, Brazil. E-mail: anacgama@medicina.ufmg.br
Journal of Voice, Vol. 38, No. 2, pp. 366−375
0892-1997
© 2021 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
https://doi.org/10.1016/j.jvoice.2021.09.009

INTRODUCTION

Voice is the main component of human communication, and contributes to important speakers' characteristics, such as expressiveness, and psycho-emotional, cultural, and social traits.1 A voice is considered normal when it presents with adequate acoustic characteristics and it is appropriate to the speaker's gender and age, as well as to the communicative context.2,3 Voice is multidimensional in nature, as its production involves functional, biomechanical, aerodynamic, and acoustic mechanisms.4

Dysphonia is typically observed when there is a disturbance in vocal production, with a change in voice quality and/or presence of phonatory effort.3 Dysphonia is a multifactorial human communication disorder.1

In light of the multidimensional aspects of voice production, the assessment of dysphonia must involve the analysis of all aspects related to voice production. As such, the American Speech-Language-Hearing Association (ASHA) developed a protocol for the instrumental assessment of voice, providing guidelines for the diagnostic process of dysphonia.4 This protocol presents and discusses the guiding principles of instrumental voice assessment in clinical speech-language pathology: Laryngeal assessment; acoustic evaluation; and aerodynamic evaluation.4

Subjective measures, such as voice behavior analysis, are also part of the assessment, as well as auditory-perceptual assessment of voice quality and self-assessment voice protocols which investigate voice-related quality of life.3 Auditory-perceptual evaluation is the main assessment tool in
the voice clinic.1 Despite its subjectivity, it is low cost, non-invasive, easy to implement, and accessible to all clinicians.5,6

Result reliability in auditory-perceptual voice assessment remains one of the greatest challenges for speech-language pathologists (SLPs). Three main factors may play a role in the low auditory-perceptual rater-agreement of the human voice: 1) Rater-related aspects (auditory training experience and performance); 2) Type of vocal stimulus assessed (sustained vowel or connected speech); and 3) Type of auditory-perceptual scale used.7

Auditory-perceptual voice analysis is based on the Cognitive Science prototype model, that is, on the use of internal standards.6 In this model, raters classify a voice based on their internal standard; thus, matching the similarity of a vocal parameter with its voice category's reference.6 Therefore, when analyzing a voice, raters compare the new stimulus with their own internal standard, and classify the new vocal stimulus based on this comparison.

Raters' internal standards may develop in several ways. Experience is one aspect that may affect the degree of representativeness of the internal standard. Research shows a positive correlation between the length of time of raters' experience and the reliability of the auditory-perceptual analysis,8,9,10 suggesting that experienced raters have greater agreement in the auditory-perceptual analysis than inexperienced raters. As such, with more experience comes better internal standard constructions for different types of voice quality.

Auditory training is another aspect that affects the representativeness of the internal standards of voice quality. There are several types of strategies used in auditory training programs to develop the raters' internal standards, which include: 1) Presentation of the trained auditory-perceptual parameter concepts; 2) Use of natural or synthesized voice anchor stimuli as external references; and 3) Laryngeal data description of the evaluated voice.10

The literature suggests that the use of external references, rather than the rater's internal standard, can be a way to improve auditory-perceptual analysis reliability.10-13 Research shows that the use of synthesized or natural anchors, as external references, increases intra- and inter-rater agreement,10,12,14,15 and that synthesized voice anchors are more efficient, as they can manipulate a single auditory-perceptual parameter, thus simplifying the auditory judgment of human voice quality.14,15

Auditory training programs with synthesized anchor stimuli show a positive impact on the intra14 and inter-rater12,14 agreement of auditory-perceptual voice analysis. These programs are based on the psychophysical method of category estimation (Intramodal matching procedure),14 where raters must match a voice to an anchor stimulus that most resembles it.

Although the literature presents promising results on the positive effects of auditory training on the reliability of auditory-perceptual voice analysis, more research is still needed to understand the best training format to instruct inexperienced raters, such as undergraduate students or newly graduated SLPs. This is important because of the relevance that auditory-perceptual analysis has in clinical voice management, as it relates to the diagnostic process and treatment of dysphonia.

The purpose of this study is to investigate the intra- and inter-rater agreement of the auditory-perceptual analysis of vocal roughness and breathiness, by analyzing the auditory training results performed with synthesized voice anchor stimuli.

METHODS

This is an experimental study, with intra- and inter-subject comparison. An auditory training program was developed by the researchers to carry out the study. The program consisted of four training sessions in auditory-perceptual assessment with natural voices and synthesized anchor stimuli.

The research was approved by the Research Ethics Committee (COEP) of the Federal University of Minas Gerais (UFMG) under the protocol number CAAE − 37872314.2.0000.5149.

Sample

A sample calculation was performed to determine the number of participants. A sample size of 20 raters was defined, based on the following: 20 observations (voices to be evaluated) and eight variables (roughness and breathiness parameters with the four degrees of deviation: neutral, mild, moderate, and severe), using Fleiss' Kappa test, with a statistical power of 80% and a significance level of 5%.

The sample consisted of 20 raters (3 males, 17 females), speech-language pathology students, with an age range from 21 to 37 years, and an average age of 24 years (SD=3.87). Inclusion criteria were that participants had experience in auditory-perceptual voice analysis and did not present with a self-reported complaint of hearing loss. Experience in auditory-perceptual voice analysis was defined as knowledge of the definitions of rough and breathy voices and experience in evaluating normal and dysphonic voices, with an average time of two hours. This experience was obtained in the undergraduate disciplines of the Speech-language Pathology course. Participants who did not perform all steps involved in the four auditory training sessions were excluded. Twenty-four raters participated in the research, and four were excluded.

Auditory training program

The auditory training program consisted of four sessions, with an average duration of 45 minutes each session. All the raters participated in every session, with an interval of seven days between each one, during a one-month period.

The program's sessions were composed of three activities, organized as follows:

Pre-training activity

This activity consisted of the assessment of 20 natural voices, normal and dysphonic, in which participants evaluated
the roughness parameter (R- perception of irregularity during vocal production) and the breathiness parameter (B- perception of audible air escape during vocal production), and marked the degree of voice deviation on a four-point Likert scale (0 − neutral, 1 − mild, 2 − moderate, 3 − severe) for each of the two evaluated auditory-perceptual parameters.

The pre-training activity consisted only of the evaluation of 20 natural voices; therefore, only human voices were presented at this stage.

Training activity

This step adapted the psychophysical method of category estimation (Intramodal matching procedure).14 Four synthesized anchor stimuli of roughness (R) and four of breathiness (B) were presented for each degree of vocal deviation, ranging from zero to three.

Participants listened to the anchor stimuli and a natural voice stimulus and were instructed to pair the natural voice with the synthesized anchor stimulus that most resembled the voice presented.

Throughout the activity, participants had access to a written definition of the auditory-perceptual parameters they were evaluating. For the R parameter, the concept presented was "perceived irregularity in the voicing source"16 and the B parameter "audible air escape in the voice."16

This step consisted of pairing 20 natural voices (normal and dysphonic from mild to severe deviation degrees) with the synthesized anchor stimuli that most resembled these voices.

Post-training activity

The 20 voices used in the pre-training activity were randomized and the raters judged the same voices again, without prior knowledge that they were repeated, maintaining the same pre-training activity assessment protocol.

Participants were able to listen to the natural voices or synthesized anchor stimuli as often as needed, during all training activities. The same group of voices was used for the pre- and post-training activities (20 natural voices) and, for the training activity, another 20 natural voices and eight synthesized anchor stimuli were presented, for a total of 40 natural voices and eight synthesized stimuli.

In all four training sessions, the evaluators analyzed the same voices, 20 for the pre- and post-training (assessment) activity and 20 natural voices with eight anchor stimuli for the auditory training. The only difference was in the order in which the voices were presented. Participants did not have access to this information throughout the study.

A Multilaser Vibe Stereo Headphone was used to carry out the program's activities. Sessions were carried out individually in a silent environment, with a noise level of less than 50 dB SPL, measured by a RadioShack® sound pressure level meter. All evaluators listened to the voices at a volume they felt was comfortable for judging the voices.

Vocal stimuli selection from pre- and post-training activities

The Voice Clinic of UFMG's "Hospital das Clínicas" (AV/HC-UFMG) provided the voice bank for the sample of voices for the pre- and post-training evaluation. The voice bank was composed of 381 samples of the sustained vowel /a/, from male and female individuals over the age of 18.

The voice recording was performed using a Dell® computer, model Optiplex GX260, with a professional sound card, Direct Sound®, and a Shure® omni-directional, condenser microphone. Subjects were instructed to perform a maximum phonation time task on a sustained vowel /a/, at a comfortable pitch and loudness. Participants performed the emissions standing and the microphone was positioned 10 cm from the mouth, in a diagonal position and with a 90° directional pickup angle.

The vowel /a/ was sustained using a standardized emission time of three seconds. The sustained phonation was recorded in an acoustically treated room with environmental noise below 50 dB SPL, measured using a RadioShack® sound pressure level meter.

Four voice-specialized SLPs analyzed the 381 voice samples. Each SLP had five to 10 years' experience in auditory-perceptual voice assessment. All voices were classified according to the pre-established parameters (R or B) and according to the general degree of voice deviation (0 − neutral, 1 − mild, 2 − moderate, 3 − severe).

For this activity, 20 natural voices were selected, 13 from female subjects and seven from male subjects. Four voices were normal and 16 were dysphonic. For the selection of 20 voices, we considered those in which the four evaluators were in agreement regarding the type of auditory perceptual parameter (normal, R or B) and the degree of voice deviation (0 to 3). Of the 16 voices classified as altered, all had the R and/or B parameters, in addition to other auditory-perceptual parameters such as strain, tremor, or instability, but only the R and B parameters were analyzed.

The type of laryngeal alteration present in each altered voice was not controlled, only the auditory-perceptual parameter (normal, R or B) and the degree of alteration.

Vocal stimuli selection from the training activity

Synthesized anchor stimulus

A parametric model was used as a source (glottic flow) for the construction of normal, rough (R), and breathy (B) synthesized voices, with different degrees of vocal deviation. This model allowed control of the fundamental frequency, jitter, shimmer, and the noise-to-harmonic ratio. A vocal tract modeling a sustained vowel /a/ was used as a filter. The vocal tract was extracted from the natural voice using a linear prediction technique. The vocal stimuli were built by an engineer, for a total of 300 synthesized voices.17

To analyze the degree of naturalness of the synthesized voices, three SLPs were selected with over five years' experience in auditory-perceptual voice assessment. These three raters individually performed the analysis of the naturalness
of the voices (how the listener perceived the voice as a human voice) by using a visual analogue scale (VAS). Subsequently, voices were classified as normal (absence of vocal deviation), rough, or breathy.

Finally, the degree of voice deviation of each of the emissions was measured, also by means of a VAS. The voice deviation results obtained by means of the VAS classification were then converted to a Likert scale (0 to 3) using values suggested by the literature18: 1) For roughness: A score of zero for measures up to 8.5 mm; a score of one from 8.5 to 28.5 mm; a score of two from 28.5 to 59.5 mm; and a score of three for measures over 59.5 mm; and 2) For breathiness: A score of zero for measures up to 8.5 mm; a score of one from 8.5 to 33.5 mm; a score of two from 33.5 to 52.5 mm; and a score of three for measures over 52.5 mm.

Synthesized voices with different degrees of deviation were selected as anchors for each parameter. The voices were classified according to the highest degree of naturalness by at least two raters,19 totaling eight voices, four synthesized voices for the R parameter, and four for the B parameter.

Natural voices

An additional 20 voices from the AV/HC-UFMG database were selected to create a group of natural voices for this activity, using the same evaluation protocol of the pre- and post-training activities, that is, those in which the four raters agreed on the type of auditory perceptual parameter and the degree of deviation were considered.

This group of voices was made up of three normal and 17 dysphonic voices, with different degrees of deviation. There were 14 female voices and six male voices. For the 17 dysphonic voices, the type of laryngeal change was not controlled for.

Statistical analysis

For statistical analysis, a descriptive analysis of the data was first performed and then the Gwet's AC2 test was performed to assess intra- and inter-rater agreement. The intra-rater agreement was measured by comparing the responses of the 20 voices analyzed before and after the auditory training of the 20 judges of the research, and the inter-rater agreement was calculated by comparing the responses of the post-training moment of the 20 raters for each training session. The AC2 test was performed using the R software (version 3.5.1). Friedman's test for multiple comparisons was used to assess intra- and inter-raters' agreement between the four auditory training sessions. The test was performed in the Minitab program (version 19.1.1.0) with a significance level of 5%.

RESULTS

The analysis of Figures 1 and 2 suggests that during the first auditory training session, raters tended to value the presence of vocal quality deviations less, either because they considered these deviations inherent to the normal characteristics of the voice, or because they regarded deviations with a lesser degree of alteration. This aspect was more evident in the R parameter analysis (Figure 1).

In the second session (Figures 3 and 4), after auditory training, the evaluators started to exchange opinions between close pairs of vocal deviation. For parameter R, this was noted between scores zero and one, and for parameter B, it was noted between scores two and three. Such results demonstrate that the raters were refining their voice analysis.

During the third session, there was less auditory training interference in voice analysis, which was more pronounced for parameter B (Figures 5 and 6).

During the fourth session, the evaluation of the R and B parameters presented similar results before and after auditory training (Figures 7 and 8).

Table 1 showed differences in intra-rater agreement between the auditory training sessions, with significant
FIGURE 1. Frequency of responses in the roughness analysis in the first session.

FIGURE 2. Frequency of responses in the breathiness analysis in the first session.
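
As an illustration of the intra-rater agreement analysis described under Statistical analysis, the following Python sketch computes Gwet's AC2 for one rater's pre- and post-training scores on the same voices. It is a minimal sketch, not the authors' implementation: the study computed AC2 in R, and the function name `gwet_ac2`, the use of linear weights, and the example ratings are all assumptions, since the article does not state which weights were applied.

```python
from collections import Counter


def gwet_ac2(pre, post, categories=(0, 1, 2, 3)):
    """Gwet's AC2 for two sets of ordinal ratings of the same items.

    `pre` and `post` hold one rater's scores before and after training.
    """
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and of equal length")
    q = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    # Assumption: linear weights, 1 for identical scores and 0 for the
    # two most distant categories.
    w = [[1 - abs(i - j) / (q - 1) for j in range(q)] for i in range(q)]
    n = len(pre)
    # Observed weighted agreement across the rated voices.
    pa = sum(w[index[a]][index[b]] for a, b in zip(pre, post)) / n
    # Average share of ratings falling in each category.
    counts = Counter(pre) + Counter(post)
    pi = [counts[c] / (2 * n) for c in categories]
    # Chance agreement term of AC2.
    t_w = sum(sum(row) for row in w)
    pe = t_w / (q * (q - 1)) * sum(p * (1 - p) for p in pi)
    return (pa - pe) / (1 - pe)


if __name__ == "__main__":
    # Hypothetical scores for four voices (not study data).
    print(round(gwet_ac2([0, 0, 1, 1], [0, 1, 1, 1]), 4))
```

With these hypothetical scores, one of four voices moves by a single scale step and the coefficient equals 53/61, about 0.87. Identical pre- and post-training scores give 1.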

FIGURE 3. Frequency of responses in the roughness analysis in the second session.

FIGURE 4. Frequency of responses in the breathiness analysis in the second session.
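
The comparison of agreement across the four sessions relied on the Friedman test, run in Minitab. As a hedged illustration, the sketch below computes the basic Friedman chi-square statistic for a table with one row per rater and one column per session. The function names and the example values are hypothetical, no tie correction is applied, and the P-value step is left to a statistics package.

```python
def average_ranks(values):
    """Rank values from 1 to k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = avg
        pos = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic (no tie correction) for an n x k table."""
    n = len(table)
    k = len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        # Rank the sessions within each rater, then add up ranks per session.
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical per-rater agreement values for sessions 1 to 4 (not study data).
    agreement = [
        [0.75, 0.80, 0.82, 0.88],
        [0.70, 0.85, 0.84, 0.90],
        [0.81, 0.79, 0.86, 0.87],
    ]
    # With four sessions the statistic is compared against a chi-square
    # distribution with 3 degrees of freedom (5% critical value: 7.815).
    print(friedman_statistic(agreement))
```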

FIGURE 5. Frequency of responses in the roughness analysis in the third session.
changes between sessions one and four for the R (P = 0.005) and B (P = 0.035) parameters.

Tables 2 and 3 showed no differences in the inter-rater agreement between the four auditory training sessions for both auditory-perceptual parameters.

DISCUSSION

Auditory-perceptual analysis is an essential component of voice assessment. A correct diagnosis with the help of complementary exams provides information to carry out a specific and effective therapeutic approach, directly influencing the patient's prognosis.8,20

The present study showed a progressive impact of training with synthesized anchor emissions on the R and B parameters on auditory-perceptual voice assessment. Benefits included: 1) more accurate assessment; and 2) improvement in intra-rater agreement.

The literature indicates that inexperienced raters tend to be more pathologizing than experienced ones.21 The present study showed that auditory-perceptual training had raters consider roughness and breathiness with a lower degree of alteration. This aspect was more evident after the second auditory-perceptual training session.

Training with external stimuli, that is, external reference standards, tends to positively impact intra- and inter-rater agreement in the rating of voices.7,11,12,15,22 The need to obtain a method to increase intra-rater agreement in the auditory-perceptual voice analysis is increasingly essential for speech-language pathology practice. The results of this research indicated that the auditory-perceptual training with synthesized anchor stimuli improved the intra-rater
FIGURE 6. Frequency of responses in the breathiness analysis in the third session.

FIGURE 7. Frequency of responses in the roughness analysis in the fourth session.
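
The 0 to 3 degrees used throughout the figures correspond, for the synthesized anchors, to VAS measures converted with the cut-offs listed under "Synthesized anchor stimulus". The sketch below shows one way to express that conversion in Python. The names `VAS_CUTOFFS_MM` and `vas_to_likert` are hypothetical, and the handling of a value lying exactly on a cut-off is an assumption, because the text gives each boundary value to both neighbouring scores.

```python
# Upper bounds (mm) of Likert scores 0, 1 and 2, as listed in the Methods.
VAS_CUTOFFS_MM = {
    "roughness": (8.5, 28.5, 59.5),
    "breathiness": (8.5, 33.5, 52.5),
}


def vas_to_likert(vas_mm, parameter):
    """Map a VAS measure in millimetres to a 0-3 degree of deviation."""
    if vas_mm < 0:
        raise ValueError("VAS measure must be non-negative")
    for score, upper in enumerate(VAS_CUTOFFS_MM[parameter]):
        # Assumption: a value exactly on a cut-off takes the lower score.
        if vas_mm <= upper:
            return score
    return 3


if __name__ == "__main__":
    # The same 30 mm mark is moderate for roughness but mild for breathiness.
    print(vas_to_likert(30, "roughness"), vas_to_likert(30, "breathiness"))
```

The example highlights that the two parameters use different cut-offs, so the same VAS mark can map to different degrees of deviation.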

FIGURE 8. Frequency of responses in the breathiness analysis in the fourth session.
agreement for roughness and breathiness, in the last session, the fourth auditory training session. It is believed that the use of synthesized anchors and the shortest time interval between training sessions might have changed the raters' internal standards from the fourth training session forward.

TABLE 1.
Mean Values of Intra-Rater Agreement Between the Auditory Training Sessions for the Roughness and Breathiness Parameters, Through the AC2 Coefficient, and Comparison Between the Four Sessions

Auditory-perceptual    Training sessions               P-value
parameters             1st     2nd     3rd     4th
R                      0.79    0.85    0.85    0.86    0.01
B                      0.88    0.9     0.9     0.92    0.01

R, roughness; B, breathiness; Friedman test for multiple comparisons.

The literature20 shows a study which carried out the auditory-perceptual training of human voices with inexperienced raters. It was observed that the voice parameters of harshness and hoarseness showed high intra-rater agreement during the first assessment. However, during the auditory-perceptual training sessions, these parameters' agreement decreased. This aspect can be explained by the fact that the time interval between training was quite long, which may have interfered in the process of the raters developing the internal standards of the parameters.20 In this research, the time of one week between sessions of auditory-perceptual training showed to be an adequate time to improve intra-evaluator agreement.

This study did not show an increase in inter-rater agreement after auditory-perceptual training with synthesized anchor stimuli. This data differed from two other studies12,14 that used synthesized anchors and observed an
increase in inter-rater agreement after auditory-perceptual training. These two studies used feedback in the training sessions.12,14 Probably, when feedback was provided during training, the differentiation between parameters may have occurred more quickly, thus favoring learning.23 The lack of feedback in the present study's training may explain this finding. Raters' responses were likely aligned within their own internal standards. However, with no provided feedback on whether their answers were right or not, this alignment did not likely occur between the raters.

TABLE 2.
Values of Inter-Rater Agreement Between the Auditory Training Sessions for the Roughness Parameter Through the AC2 Coefficient, and Comparison of the Post-Training Moment Between the Four Sessions

Session    Pre-training    Post-training    P-value
1st        0.23            0.30
2nd        0.30            0.30             0.775
3rd        0.27            0.27
4th        0.34            0.27

Friedman test for multiple comparisons.

TABLE 3.
Values of Inter-Rater Agreement Between the Auditory Training Sessions for the Breathiness Parameter Through the AC2 Coefficient, and Comparison of the Post-Training Moment Between the Four Sessions

Session    Pre-training    Post-training    P-value
1st        0.52            0.61
2nd        0.61            0.55             0.392
3rd        0.51            0.49
4th        0.53            0.48

Friedman test for multiple comparisons.

The literature24 shows that auditory-perceptual learning is more effective when the processes of stimuli and differentiation production are part of the training. The stimuli feed is given by repeated exposure to the stimulus in order to develop internal patterns. Stimulus differentiation allows the listener to learn to identify each voice parameter independently. This differentiation occurs more easily in the learning process when the stimuli are presented separately. This is possible using synthesized anchor emissions, which allow the manipulation of their acoustic parameters as desired or needed, favoring the analysis of voice parameters in isolation.25

Another strategy would be to carry out a separate auditory training for each parameter. Although raters can be more critical in assessing isolated parameters,18 most scales used in clinical vocal practice carry out auditory-perceptual parameter assessment in isolation.

In the present study, despite the use of synthesized anchor stimuli, roughness and breathiness parameters' training were performed concomitantly. This may have been a confounding factor impacting the inter-rater agreement. Factors such as fatigue, attention lapses, and training mistakes may also have impacted auditory-perceptual agreement.8,26,27

The literature10,28 suggests that SLPs' length of experience has a positive impact on inter-rater agreement, suggesting that auditory-perceptual analysis experience tends to standardize the process of voice assessment. Raters experienced in auditory-perceptual voice analysis seem to have more ease in using learning strategies to improve their performance in voice assessment.28 Although the raters in this study had previous exposure to auditory-perceptual analysis, they were still undergraduate SLP students, thus their experience was limited in this type of assessment. This factor, as pointed out in the literature,29 may explain the low inter-rater agreement scores in the initial training sessions, but future research is needed for a better understanding of the low inter-rater agreement during auditory-perceptual training. The inter-rater agreement was considerably lower than the intra-rater agreement in the present study, which corroborated the literature.19,28 Such results suggest that the inter-rater agreement seems to be more difficult to develop, probably because it requires that the internal standards6 of the raters must be matching among them.

The number of training sessions may also influence inter-rater agreement. Some studies showed improved agreement after approximately 10 training sessions.20,30 The present study showed a significant improvement in intra-rater agreement starting from the last session, being more evident for the breathiness parameter. As for the inter-rater agreement, there was no improvement with four training sessions.

New studies need to be developed to assess the impact of this type of training on inter-rater agreement. Suggestions for future research on auditory-perceptual training with the use of synthesized anchor emissions would be to carry out: 1) the parameters' training separately; 2) to provide feedback to the participants during the training; and 3) to increase the number of training sessions, to favor learning and to improve the inter-rater agreement.

Sustained vowel emissions were used in this research. The literature8 suggests that some characteristics are more easily perceived in connected speech and others in sustained vowel production and shows that it would be more reliable to analyze voice breathiness via connected speech. Another study also shows that inter-rater agreement is better with connected speech tasks.21 The concomitant use of sustained vowel and connected speech favors a more complete voice perception and is supported by the literature.11

Employing synthesized voices as an external prototype seems to be a promising auditory-perceptual training tool. The literature shows that synthesized voice anchors are more efficient because they manipulate a single auditory-perceptual parameter, which makes the auditory judgment
of human voice quality simpler.¹⁴,¹⁵ It is important to point out that this research analyzed only the roughness (R) and breathiness (B) parameters, because the synthesized voice technology can produce only these two parameters.¹⁷

A limitation of this study is that participants' hearing was assessed only through self-reported hearing loss.³¹ Future research should consider a baseline central auditory processing evaluation of each rater to control this variable and ensure sample uniformity.

It is also suggested that future studies use two speech tasks during the training, sustained vowel and connected speech, to assess their impact on rater agreement.

CONCLUSION

The auditory-perceptual parameter of breathiness had a higher AC2 indicator than roughness, indicating greater agreement for breathiness. Auditory training with synthesized anchor stimuli improved intra-rater agreement in rating roughness and breathiness, starting from the fourth session. The auditory training did not have a positive impact on inter-rater agreement for either roughness or breathiness.

Future research on perceptual-auditory training using synthesized anchor emissions is needed to evaluate the impact of this type of method on inter-rater agreement by training the parameters separately, providing feedback to the participants, and increasing the number of training sessions.

ACKNOWLEDGMENTS

This work benefited from the support of the Brazilian government. It was carried out by Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil (CNPq no. 309108/2019-5).

REFERENCES

1. Tonon IG, Gomes NR, Teixeira LC, et al. Self-referred personal behavior profile of university professors: association with communicative and vocal self-evaluation. CoDAS. 2020;32:e20180141. https://doi.org/10.1590/2317-1782/20192018141.
2. Lopes LW, Cavalcante DP, Costa PO. Severity of voice disorders: integration of perceptual and acoustic data in dysphonic patients. CoDAS. 2014;26:382–388. https://doi.org/10.1590/2317-1782/20142013033.
3. Lopes LW, Sousa ESS, Silva ACF, et al. Cepstral measures in the assessment of severity of voice disorders. CoDAS. 2019;31:e20180175. https://doi.org/10.1590/2317-1782/20182018175.
4. Patel RR, Awan SN, Barkmeier-Kraemer J, et al. Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. Am J Speech Lang Pathol. 2018;27:887–905. https://doi.org/10.1044/2018_AJSLP-17-0009.
5. Roy N, Barkmeier-Kraemer J, Eadie T, et al. Evidence-based clinical voice assessment: a systematic review. Am J Speech Lang Pathol. 2013;22:212–226. https://doi.org/10.1044/1058-0360(2012/12-0014.
6. Ghio A, Dufour S, Wengler A, et al. Perceptual evaluation of dysphonic voices: can a training protocol lead to the development of perceptual categories? J Voice. 2015;29:304–311. https://doi.org/10.1016/j.jvoice.2014.07.006.
7. Freitas SV, Pestana PM, Almeida V, et al. Audio-perceptual evaluation of Portuguese voice disorders: an inter- and intrajudge reliability study. J Voice. 2014;28:210–215. https://doi.org/10.1016/j.jvoice.2013.08.001.
8. Bele IV. Reliability in perceptual analysis of voice quality. J Voice. 2004;19:555–573. https://doi.org/10.1016/j.jvoice.2004.08.008.
9. Sofranko JL, Prosek RA. The effect of experience on classification of voice quality. J Voice. 2012;26:299–303. https://doi.org/10.1016/j.jvoice.2011.07.003.
10. Oliveira SB, Gama ACC, Chaves AR. Interference of background experience on agreement of perceptive-auditory analysis of neutral and dysphonic voices. Disturbios Com. 2016;28:415–422.
11. Brinca L, Batista AP, Tavares AI, et al. The effect of anchors and training on the reliability of voice quality ratings for different types of speech stimuli. J Voice. 2015;29:776.e7–776.e14. https://doi.org/10.1016/j.jvoice.2015.01.007.
12. Santos PCM, Vieira MN, Sansão JPH, et al. Effect of auditory-perceptual training with natural voice anchors on vocal quality evaluation. J Voice. 2018;33:220–225. https://doi.org/10.1016/j.jvoice.2017.10.020.
13. Gerratt BR, Kreiman J, Antonanzas-Barroso N, et al. Comparing internal and external standards in voice quality judgments. J Speech Lang Hear Res. 1993;36:14–20. https://doi.org/10.1044/jshr.3601.14.
14. Gurlekian JA, Torres HM, Vaccari ME. Comparison of two perceptual methods for the evaluation of vowel perturbation produced by jitter. J Voice. 2016;30:506.e1–506.e8. https://doi.org/10.1016/j.jvoice.2015.05.009.
15. Chan KM, Yiu EM. The effect of anchors and training on the reliability of perceptual voice evaluation. J Speech Lang Hear Res. 2002;45:111–126. https://doi.org/10.1044/1092-4388(2002/009.
16. Kempster GB, Gerratt BR, Verdolini Abbott K, et al. Consensus auditory-perceptual evaluation of voice: development of a standardized clinical protocol. Am J Speech Lang Pathol. 2009;18:124–132.
17. Vieira MN, Sansão JPH, Yehia HC. Measurement of signal-to-noise ratio in dysphonic voices by image processing of spectrograms. Speech Commun. 2014;61–62:17–32. https://doi.org/10.1016/j.specom.2014.04.001.
18. Baravieira PB, Brasolotto AG, Montagnoli AN, et al. Auditory-perceptual evaluation of rough and breathy voices: correspondence between analogical visual and numerical scale. CoDAS. 2016;28:163–167. https://doi.org/10.1590/2317-1782/20162015098.
19. Santos PCM, Vieira MN, Sansão JPH, et al. Effect of synthesized voice anchors on auditory-perceptual voice evaluation. CoDAS. 2021;33:e20190197. https://doi.org/10.1590/2317-1782/20202019197.
20. Silva RSA, Simões-Zenari M, Nemr NK. Impact of auditory training for perceptual assessment of voice executed by undergraduate students in Speech-Language Pathology. J Soc Bras Fonoaudiol. 2012;24:19–25. https://doi.org/10.1590/S2179-64912012000100005.
21. Delvaux V, Pillot-Loiseau C. Perceptual judgment of voice quality in nondysphonic French speakers: effect of task-, speaker- and listener-related variables. J Voice. 2020;34:682–693. https://doi.org/10.1016/j.jvoice.2019.02.2013.
22. Eadie TL, Kapsner-Smith M. The effect of listener experience and anchors on judgments of dysphonia. J Speech Hear Res. 2011;54:430–447. https://doi.org/10.1044/1092-4388(2010/09-0205.
23. Barsties B, Beers M, Cate LT, et al. The effect of visual feedback and training in auditory-perceptual judgment of voice quality. Logoped Phoniatr Vocol. 2017;42:1–8. https://doi.org/10.3109/14015439.2015.1091036.
24. Goldstone RL. Perceptual learning. Ann Rev Psychol. 1998;49:585–612. https://doi.org/10.1146/annurev.psych.49.1.585.
25. Yiu EML, Murdoch B, Hird K, et al. Perception of synthesized voice quality in connected speech by Cantonese speakers. J Acoust Soc Am. 2002;112(3Pt1):1091–1101. https://doi.org/10.1121/1.1500753.
26. Eadie TL, Baylor CR. The effect of perceptual training on inexperienced listeners’ judgments of dysphonic voice. J Voice. 2006;20:527–544. https://doi.org/10.1016/j.jvoice.2005.08.007.
27. Carding PN, Wilson JA, Mackenzie K, et al. Measuring voice outcomes: state of the science review. J Laryngol Otol. 2009;123:823–829. https://doi.org/10.1017/S0022215109005398.
28. Law T, Kim JH, Lee KY, et al. Comparison of rater's reliability on perceptual evaluation of different types of voice sample. J Voice. 2012;26:666.e13–666.e21. https://doi.org/10.1016/j.jvoice.2011.08.003.
29. Englert M, Madazio G, Gielow I, et al. Learning factor influence on the perceptual-auditory analysis. CoDAS. 2018;30:e20170107. https://doi.org/10.1590/2317-1782/20182017107.
30. Kreiman J, Gerratt BR. Comparing two methods for reducing variability in voice quality measurements. J Speech Lang Hear Res. 2011;54:803–812. https://doi.org/10.1044/1092-4388(2010/10-0083.
31. Paiva MAA, Rosa MRD, Gielow I, et al. Auditory skills as a predictor of rater reliability in the evaluation of vocal quality. J Voice. https://doi.org/10.1016/j.jvoice.2019.11.020.
