
Biomedical Signal Processing and Control 59 (2020) 101938

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control


journal homepage: www.elsevier.com/locate/bspc

Acoustic Voice Quality Index and Acoustic Breathiness Index as two examples for strengths and weaknesses of free software in medicine
Lydia Stappenbeck a,1 , Ben Barsties v. Latoszek b,c,∗,1 , Ben Janotte a , Bernhard Lehnert a
a Department of Phoniatrics and Pedaudiology, Clinics of Otolaryngology, Head and Neck Surgery, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany
b Speech-Language Pathology, SRH University of Applied Health Sciences, Düsseldorf, Germany
c Speech-Language Pathology, Dormagen Therapy Centre, Dormagen, Germany

Article info

Article history:
Received 26 August 2019
Received in revised form 28 December 2019
Accepted 8 March 2020
Available online 16 March 2020

Keywords:
Praat
Acoustic Voice Quality Index
Acoustic Breathiness Index
Open source software

Abstract

Objective: The purpose of the study was to explore the extent to which different Praat versions affect the reproducibility of results when computing the Acoustic Voice Quality Index (AVQI) and the Acoustic Breathiness Index (ABI).
Method: Seven Praat versions were selected and categorized into three groups based on hierarchical cluster analysis. The differences/distances, diagnostic accuracy, and concurrent validity were evaluated among the three groups. Group three contained just one Praat version. Two months after its release, this version was found to have a computation bug in smoothed cepstral peak prominence (i.e., an important measure for AVQI and ABI), which a later update removed. For the analyses, a previous database of 218 German voice samples and auditory-perceptual judgment results was used.
Results: The AVQI and ABI results of group 1 and group 2 showed no significant differences (p = 0.53 and p = 0.62, respectively). However, the results of these two groups differed significantly from those of group 3 for both AVQI and ABI (all p < 0.00001). The concurrent validity for AVQI (r = 0.84 to 0.86) and ABI (r = 0.84 to 0.85) was strong in all three groups. The diagnostic accuracy of both indices was also sufficient for groups one and two but low in group three, particularly in sensitivity (AVQI: 23% and ABI: 9%, respectively).
Conclusion: AVQI and ABI are two robust measurements in the evaluation of voice quality. However, caution is warranted when using updates of open source software such as Praat for patient care or research.
© 2020 Elsevier Ltd. All rights reserved.

∗ Corresponding author at: Graf-Adolf-Straße 67, 40210, Düsseldorf, Germany.
E-mail address: benjamin.latoszek@srh.de (B. Barsties v. Latoszek).
1 These authors should be considered joint first author.

https://doi.org/10.1016/j.bspc.2020.101938

1. Introduction

The voice quality of patients with dysphonia can be assessed by traditional auditory-perceptual judgments or by computer analysis of acoustic signals obtained by recording voices using a microphone [1]. Both methods are contained in protocols to evaluate voice production and the outcome of treatment [2–5]. The overall voice quality represents the degree of voice abnormality, and two major subtypes of overall voice quality are roughness and breathiness, which have received wide acceptance [6]. The evaluation of voice quality with acoustic analyses reveals higher reliability and validity results using a combination of several parameters in a model than single acoustic metrics [1]. In general, we can conclude that voice quality is a multidimensional construct which cannot be measured mono-dimensionally, unlike pitch (i.e., measured in Hz) or loudness (i.e., measured in dB) [1].

Recent acoustic models in the evaluation of voice quality are, e.g., the Acoustic Voice Quality Index (AVQI) [7–9] and the Acoustic Breathiness Index (ABI) [10]. These multiparametric indexes objectively quantify the severity of overall voice quality/hoarseness and breathiness in concatenated voice samples of continuous speech (i.e., phonetically balanced text) and sustained phonation (i.e., sustained vowel /a/ of 3 s). AVQI and ABI are implemented in open source scripts, which are analyzed and calculated within the open source software Praat (Paul Boersma and David Weenink; Institute of Phonetic Sciences, University of Amsterdam, The Netherlands: http://www.praat.org/). Praat is a widely used, scriptable computer program intended for the analysis of phonetic components and voice sound files. The Praat algorithm for AVQI and ABI is based on various single acoustic metrics that showed a high concurrent validity level between the indexes and the perceptual evaluation of auditory-perceptual judgements

of various expert panels [7–25]. The important measure of the AVQI algorithm is the smoothed cepstral peak prominence (CPPS) [7,26], and the ABI algorithm has two main relevant markers (i.e., CPPS and the glottal-to-noise excitation (GNE) ratio) [9]. Further investigations of the two indexes revealed acceptable diagnostic precision [7–14,16,8–25,27], robustness to interlanguage phonetic differences [7,10,12–18,20,8–25], independence of the influencing factors of age and gender [25,27,28], and high sensitivity to voice changes across voice therapy [11,16,25,29]. Furthermore, AVQI was further developed to improve the internal consistency of the continuous speech and sustained phonation, increasing the reliability and validity levels of this index [8]. Therefore, two versions of AVQI (i.e., AVQI version two and version three) exist, which differ in length of continuous speech, weighting of the calculation equation, and thresholds in the languages [8,9,12,16,22]. Additionally, AVQI showed, in comparison to the Dysphonia Severity Index, higher validity results in the evaluation of overall voice quality [30]. However, the commercial index Cepstral Spectral Index of Dysphonia seems to be even stronger as compared with AVQI [31]. Further studies about AVQI reported an acceptable level to evaluate the dysphonia classification [32], robustness to room acoustic differences, background noise, and microphone quality [33,34], and a moderate to high correlation with self-perceived questionnaires in voice-disordered patients [34,35]. However, AVQI showed a low correlation with a questionnaire for the assessment of the voice self-concept in neurological and psychiatric medical patients [36]. Finally, AVQI is now partly incorporated in commercial offerings as well (Maryn, Y. Phonanium [https://www.phonanium.com/]).

Regarding ABI, a recent study reported that roughness has no meaningful contribution to the ABI results, which confirmed the independent characteristic of this index to estimate specifically breathiness levels [25].

In relation to the high numbers of investigations of AVQI and ABI, both indexes have a valuable standing in the evaluation of voice quality in research and clinical practice. Praat v. 5.3.57 is the most widely used Praat version for evaluating the validity of these two indices. The software Praat is regularly optimized to keep up quality of usage, but it is unclear whether this dynamic adaptation to user needs can cause unwanted changes in the algorithm of AVQI and ABI. Since Praat v. 5.3.57 was first published on 27 October 2013, more than 87 updates of Praat have been implemented. In addition to Praat v. 5.3.57, other Praat versions (e.g., v. 5.3.55, v. 6.0.06, v. 6.0.21, and v. 6.0.22) were used for validation studies or clinical research which used AVQI and ABI. It remains unclear if updates in Praat may influence previous results and recommendations based on the findings using older Praat versions. Signs of differences in AVQI and ABI results between Praat v. 5.3.57 and v. 6.0.46 were preliminarily discovered in January 2019 in the laboratory by analyzing the same audio recordings. Later, the developers of Praat found a computation bug in CPPS, introduced in Praat v. 6.0.44 (released: 31 December 2018), which might have caused the differences in our preliminary laboratory findings. The bug was removed in v. 6.0.47 (released: 8 February 2019). Free software is often realized not in monolithic blocks but in stacks of different open source programs. Issues in one program can seriously affect the stack's behavior.

The aim of the present study is to investigate the extent to which different Praat versions affect the reproducibility of results when performing AVQI script v. 03.01 and ABI script v. 01.01. These scripts were chosen because (A) the former AVQI script of the second version was superseded by AVQI script v. 03.01 and (B) ABI script v. 01.01 was published before January 2019. Additionally, these two versions of the indexes analyze the same speech material and speech material length, and, thus, the results can mostly be reproduced with others. Finally, both indexes are completely analyzed in Praat and, thus, changes in the program structure might have consequences in the signal processing procedure and interpretation of the AVQI and ABI results. Based on the example described above, it seems necessary to verify if and how reliability issues in Praat affect the reproducibility of AVQI and ABI results.

2. Material and method

2.1. Selection of the Praat versions

Ten papers were included in which either AVQI script v. 03.01 or ABI script v. 01.01 was used in a validation study, and a specific Praat version was mentioned in the papers. Five different Praat versions were reported: 5.3.55, 5.3.57, 6.0.06, 6.0.21, 6.0.22 (Table 1). Two Praat versions were added to this research: (1) the critical v. 6.0.46 as mentioned earlier, and (2) v. 6.0.48, which was the most recent version at the start of the research.

Table 1
Praat versions included in this investigation under consideration of their release date and the study in which the particular version was used. The group column is a result of the hierarchical cluster analysis of the present study.

Version   Released on           Used by                                Group
5.3.55    September 2nd 2013    Kankare et al. [17]                    1
5.3.57    October 27th 2013     Barsties & Maryn [7,8]                 1
                                Barsties v. Latoszek et al. [9]
                                Barsties v. Latoszek & Lehnert [28]
                                Barsties v. Latoszek et al. [21]
                                Englert et al. [37,38]
6.0.06    November 29th 2015    Englert et al. [36]                    1
6.0.21    October 25th 2016     Lee et al. [30]                        2
6.0.22    November 15th 2016    Hernández et al. [20]                  2
6.0.46    January 3rd 2019      no fitting paper found                 3
6.0.48    February 17th 2019    no fitting paper found                 2

2.2. Voice samples

The voice samples were taken from the study by Barsties v. Latoszek et al. [22]. This voice sample database contains 218 German-speaking voice recordings (i.e., continuous speech and sustained vowel /a/). A total of 175 subjects had various organic and nonorganic etiologies and various degrees of dysphonia severity. The number of vocally-healthy subjects was 43 in total. The voice samples were collected in (A) a voice therapist's practice and (B) the Ernst-Moritz-Arndt-University Greifswald in Germany. All recordings were perceptually evaluated by three voice experts judging the overall voice quality/hoarseness and the breathiness level from the GRBAS scale [39]. Table 2 reports further results of the voice quality evaluations based on Barsties v. Latoszek et al. [22]. For further details of the voice samples, rater panel reliability, and results, we refer to the paper of Barsties v. Latoszek et al. [22].

2.3. Statistics

All statistical analyses were conducted based on the results of the AVQI script v. 03.01 and ABI script v. 01.01 from the different Praat versions as mentioned earlier. The outcomes of each AVQI and ABI result were based on the same 218 voice samples to compare the differences between the Praat versions. The data entry was implemented manually in Microsoft Excel 2016 (version 15.11). For statistical computation the program R version 3.4.3 [40] was used. The additional packages ggplot2 [41], pROC [42], reportROC [43], and BlandAltmanLeh [44] were used.

Table 2
Descriptive results of perceived and acoustic voice quality evaluations from the 218 voice samples.

                            G-scale        B-scale        AVQI           ABI
                            Mean   SD      Mean   SD      Mean   SD      Mean   SD
Vocally-healthy subjects    0.32   0.30    0.12   0.22    1.03   0.65    2.29   0.93
Voice-disordered subjects   1.23   0.86    0.94   0.86    3.41   2.76    3.72   2.25

First, the agglomerative hierarchical cluster analysis was used
to find discrete groups of (dis)similarity in a data set represented
by a (dis)similarity matrix based on an algorithmic approach [45].
The Euclidean distance and “complete linkage” were chosen as the
clustering method (i.e., pre-sets of the stats::hclust method in R
were used) [41]. During the hierarchical cluster analysis, the Lance
& Williams dissimilarity update formula was used to constantly
recompute the analysis. Increasing dissimilarities are depicted as
larger heights in the final graphic plot (i.e., dendrogram).
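
The clustering step can be illustrated with a minimal Python sketch. It is not the authors' R code (the study used stats::hclust): the function names and the toy values for three made-up versions are assumptions for illustration only. Each version is represented by its vector of index values, pairwise Euclidean distances are computed, and after each merge the distances to the new cluster are recomputed with the Lance & Williams formula using the complete-linkage coefficients.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    names = list(profiles)
    clusters = {i: (name,) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): euclidean(profiles[names[i]], profiles[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)  # least dissimilar pair of clusters
        i, j = sorted(closest)
        merges.append((clusters[i], clusters[j], dist[closest]))
        for k in clusters:
            if k in (i, j):
                continue
            d_ki = dist[frozenset((k, i))]
            d_kj = dist[frozenset((k, j))]
            # Lance-Williams update with complete-linkage coefficients
            # (alpha_i = alpha_j = 0.5, beta = 0, gamma = 0.5),
            # which is equivalent to max(d_ki, d_kj).
            dist[frozenset((k, next_id))] = (
                0.5 * d_ki + 0.5 * d_kj + 0.5 * abs(d_ki - d_kj)
            )
        # Drop all distances that involve the two merged clusters.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges

# Hypothetical index values of three samples under three made-up versions.
toy = {"vA": [1.0, 2.0, 3.0], "vB": [1.1, 2.0, 3.1], "vC": [5.0, 6.0, 7.0]}
merges = complete_linkage(toy)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

In this toy example the two nearly identical versions merge at a small height and the deviating version joins last at a much larger height, which is the pattern a dendrogram makes visible.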
Second, descriptive statistics were computed on all group dif-
ferences/distances of mean differences, mean absolute differences,
and quartiles of differences. The outcomes were displayed as Bland-
Altman plots. These plots are a statistical method for visually
assessing agreement between two methods of clinical measure-
ment [46].
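
The quantities behind a Bland-Altman plot can be sketched as follows. This is an illustrative sketch only, not the BlandAltmanLeh implementation: the function name and toy values are hypothetical, and the limits of agreement at the mean difference ± 1.96 standard deviations are the conventional choice, assumed here because the text does not state which limits were drawn.

```python
import statistics

def bland_altman(a, b):
    """Per-sample means and differences plus bias and limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return {
        "means": means,            # abscissa of the plot
        "diffs": diffs,            # ordinate of the plot
        "bias": bias,              # mean difference
        "lower": bias - 1.96 * sd,  # assumed conventional limits of agreement
        "upper": bias + 1.96 * sd,
    }

# Hypothetical index values of the same four recordings in two versions.
version_old = [1.0, 2.0, 3.0, 4.0]
version_new = [1.1, 1.9, 3.2, 3.8]
result = bland_altman(version_old, version_new)
print(round(result["bias"], 3), round(result["lower"], 3), round(result["upper"], 3))
```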
Third, the groups of the hierarchical cluster analysis were tested
for systematic differences using a t-test for paired data compar-
ing the means between two related groups [47]. The results were
considered statistically significant at p < 0.05.
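
For orientation, the test statistic of the paired t-test is the mean of the per-sample differences divided by its standard error. The sketch below uses hypothetical names and toy values and computes only the statistic and the degrees of freedom; obtaining the p-value requires the t distribution, which the Python standard library does not provide (a statistics package such as SciPy, with `scipy.stats.ttest_rel`, would be one option).

```python
import math
import statistics

def paired_t(a, b):
    """t statistic and degrees of freedom of a paired-samples t-test."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation
    t = mean_d / (sd_d / math.sqrt(n))      # mean difference / standard error
    return t, n - 1

# Hypothetical index values of the same four recordings in two versions.
t_value, df = paired_t([2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 3.0, 3.0])
print(round(t_value, 3), df)
```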
Fig. 1. Hierarchical cluster analysis of AVQI outcomes of seven Praat versions.

Fourth, the diagnostic accuracy of AVQI and ABI among the Praat
versions was measured with the receiver operating characteristic
(ROC) statistic. The ROC curve was plotted to illustrate the ability
of a predictive model to distinguish between groups, with the true positive
rate (i.e., sensitivity) on the ordinate and the false positive rate
(i.e., 1 − specificity) on the abscissa. The identical criteria for
distinguishing between normal and abnormal overall voice quality or absence
or presence of breathiness were used as described in Barsties v. Latoszek et al.
[22]. The sensitivity and specificity levels of AVQI and ABI of the
different Praat versions were evaluated at the previously determined
thresholds (i.e., AVQI = 1.85 and ABI = 3.42) published by Barsties v.
Latoszek et al. [22]. The discrimination power of AVQI and ABI was
analyzed with the area under the ROC curve (AUC) by assessing the
specific characteristic between normal and abnormal voice quality or
absence or presence of breathiness. An AUC = 0.5 corresponds with
chance-level diagnostic accuracy [48].
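
The difference between a threshold-based measure (sensitivity, specificity) and a rank-based measure (AUC) can be made concrete with a small sketch. It is illustrative only and not the pROC or reportROC implementation. The function names and toy scores are hypothetical, treating a score at or above the threshold as a positive test is an assumption, and the uniform shift of 4 points is a deliberate simplification loosely motivated by the differences listed in Tables 4 and 5.

```python
def sensitivity_specificity(scores, disordered, threshold):
    """Sensitivity and specificity when score >= threshold counts as positive."""
    tp = sum(1 for s, d in zip(scores, disordered) if d and s >= threshold)
    fn = sum(1 for s, d in zip(scores, disordered) if d and s < threshold)
    tn = sum(1 for s, d in zip(scores, disordered) if not d and s < threshold)
    fp = sum(1 for s, d in zip(scores, disordered) if not d and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, disordered):
    """AUC as the share of (disordered, healthy) pairs ranked correctly."""
    pos = [s for s, d in zip(scores, disordered) if d]
    neg = [s for s, d in zip(scores, disordered) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical index values; True marks a perceptually disordered voice.
scores = [0.9, 1.2, 1.7, 2.4, 3.1, 4.0]
disordered = [False, False, True, False, True, True]
shifted = [s - 4.0 for s in scores]  # simplified model of a systematic offset

sens, spec = sensitivity_specificity(scores, disordered, 1.85)
sens_shift, spec_shift = sensitivity_specificity(shifted, disordered, 1.85)
auc_plain, auc_shift = auc(scores, disordered), auc(shifted, disordered)
print(sens, spec, auc_plain)
print(sens_shift, spec_shift, auc_shift)
```

Because the AUC depends only on the ordering of the scores, a systematic offset leaves it unchanged, whereas sensitivity at a fixed threshold can collapse while specificity rises to 100%.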
Fig. 2. Hierarchical cluster analysis of ABI outcomes of seven Praat versions.

Fifth, the criterion-related concurrent validities of both indexes
among the Praat versions were investigated using the Spearman
rank-order correlation coefficient (rs). The coefficients between the
perceptual average judgment of the overall voice quality and AVQI,
and the coefficients between the perceptual average judgment of
breathiness and ABI were evaluated. Interpretation guidelines for
rs were provided by Frey et al. [49].
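
As a reminder of what rs measures, the following sketch computes the Spearman coefficient as the Pearson correlation of average ranks. The function names and toy ratings are hypothetical and serve illustration only; the study itself used R.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical mean perceptual ratings and index values of six recordings.
ratings = [0.0, 0.5, 1.0, 1.0, 2.0, 3.0]
index_values = [1.0, 1.4, 2.9, 2.5, 3.8, 6.0]
rs = spearman(ratings, index_values)
print(round(rs, 3))
```

Since only ranks enter the computation, any order-preserving change of the index values leaves rs unchanged.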

3. Results

3.1. Hierarchical cluster analysis

Figs. 1 and 2 show the dendrograms for AVQI and ABI of the seven different Praat versions. Initially, each Praat version was assigned to its own cluster. Both indexes showed very similar results in their outcomes of six Praat versions, which can be divided into two groups (group 1 and group 2). There were virtually no differences between AVQI and ABI outcomes within Praat v.5.3.55, v.5.3.57, and v.6.0.06 (group 1), and within v.6.0.21, v.6.0.22, and v.6.0.48 (group 2). Group 3 represents the outcomes of the seventh Praat version, v.6.0.46 with the erroneous CPPS computation, which remained its own cluster. The different heights in the dendrograms indicated that differences/distances between group 1 and 2 are much smaller than between them and group 3.

For further analyses, the outcomes of AVQI and ABI were computed with one Praat version out of each group: v.5.3.57 for group 1, v.6.0.48 for group 2, and v.6.0.46 for group 3. The AVQI and ABI results of the Praat versions for group 1 (i.e., v.5.3.55, v.5.3.57, and v.6.0.06) and group 2 (i.e., v.6.0.21, v.6.0.22, and v.6.0.48) were almost identical within their groups. Thus, we can justify the representation of each group by one Praat version for the following analyses.

3.2. Differences among the three groups

Table 3 presents the descriptive differences between group 1 and 2 for AVQI and ABI. Descriptive differences between these two groups and group 3 are listed in Tables 4 and 5. The paired

Table 3
Descriptive statistics of differences between Praat versions group 1 and group 2 (as represented by version 5.3.57 minus 6.0.48).

Descriptive differences

        Minimum   1st quartile   Median   Mean    3rd quartile   Maximum
AVQI    −0.62     −0.06          0.00     −0.01   0.07           0.42
ABI     −3.29     −0.11          −0.02    −0.01   0.09           1.02

Table 4
Descriptive statistics of differences between Praat versions group 1 and group 3 (as represented by version 5.3.57 minus 6.0.46).

Descriptive differences

        Minimum   1st quartile   Median   Mean    3rd quartile   Maximum
AVQI    2.60      3.81           4.32     4.16    4.57           5.02
ABI     1.31      3.67           4.04     3.97    4.36           5.24

Table 5
Descriptive statistics of differences between Praat versions group 2 and group 3 (as represented by version 6.0.48 minus 6.0.46).

Descriptive differences

        Minimum   1st quartile   Median   Mean    3rd quartile   Maximum
AVQI    2.59      3.78           4.32     4.17    4.61           5.07
ABI     2.41      3.67           4.07     3.98    4.38           5.14

Fig. 3. Bland-Altman plot with marginal histogram of AVQI values in Praat version 5.3.57 and version 6.0.48 (group 1 and 2).

sample t-test confirmed the results of the descriptive statistics of Table 3, namely that no significant differences existed between the AVQI and ABI results of group 1 and 2 (p = 0.53 and p = 0.62, respectively). However, the paired sample t-test between these two groups and group 3 yielded significant differences for AVQI and ABI (all p < 0.00001) and confirmed the findings in Tables 4 and 5. The Bland-Altman plot for group 1 and 2 shows that differences are centered about 0.25 for the results of AVQI (Fig. 3). These absolute differences rarely exceed 0.25. Differences between group 1 and 2 are centered about 0.5 for the results of ABI (Fig. 4).

3.3. Diagnostic accuracy

The AUC of AVQI varied slightly among the three groups, between 88.4% and 89.7%. An excellent discriminatory power of AVQI in differentiating between normal and hoarse voices was confirmed in all three groups (see Fig. 5). At the predefined threshold of 1.85 by Barsties v. Latoszek et al. [22], sensitivity and specificity were as follows: group 1 (sensitivity = 72% and specificity = 90%), group 2 (sensitivity = 71% and specificity = 90%), and group 3 (sensitivity = 23% and specificity = 100%).

The AUC of ABI was slightly higher than that of AVQI and also varied only slightly, ranging from 91.5% to 92.8% across the three groups. The three groups also showed an excellent discriminatory power of ABI in differentiating between normal and breathy voices (see Fig. 6). At the predefined threshold of 3.42 by Barsties v. Latoszek et al. [22], sensitivity and specificity were as follows: group 1 (sensitivity = 72% and specificity = 95%), group 2 (sensitivity = 70% and specificity = 95%), and group 3 (sensitivity = 9% and specificity = 100%).

3.4. Criterion-related concurrent validities

The Spearman rank-order coefficients of the expert panel's judgments for overall voice quality and the AVQI results of all groups

Fig. 4. Bland-Altman plot with marginal histogram of ABI values in Praat version 5.3.57 and version 6.0.48 (group 1 and 2).

Fig. 5. Receiver Operator Characteristic curve of AVQI in Praat version group 1 (green line), group 2 (red line) and group 3 (blue line).

Fig. 6. Receiver Operator Characteristic curve of ABI in Praat version group 1 (green line), group 2 (red line) and group 3 (blue line).

ranged from rs = 0.84 to 0.86. Similar results were found for the breathiness ratings and ABI outcomes, for which the Spearman rank-order coefficients ranged from rs = 0.84 to 0.85.

4. Discussion

Open source software is often deployed in stacks of software. Responsibility for correct computation and outcomes does not belong to one company but to the voluntary programmers and the users themselves. In the case of the CPPS bug in Praat v.6.0.44 to v.6.0.46, there is an objective impact on the outcomes of AVQI and ABI. We hypothesized that the aberrant results obtained with Praat v.6.0.46 were erroneous due to a bug that was later fixed, so Praat versions before and after this bug should yield identical or very close to identical results for the example of AVQI and ABI. Our present results have confirmed that this bug had serious implications for the clinical interpretation of AVQI and ABI, although the validity of the measurement was unaffected according to the criterion-related concurrent validities, AUC, and specificity statistics. These three statistics are sufficiently high in all three groups. The bug in CPPS of Praat v.6.0.44 to v.6.0.46 had consequences for the interpretation of the thresholds, as shown by AVQI's and ABI's low sensitivity levels between 9% and 23%. This misinterpretation of the thresholds has very serious consequences in daily clinical use or in the interpretation of research data, although similar effects could also be induced by the conditions/circumstances of the recording which can influence the outcomes of acoustic analysis (e.g., microphone quality, background noise, and acoustic room differences) [1,33]. Nevertheless, the results of the hierarchical cluster analysis showed that in total three groups were established depending on the selected Praat versions but no variation in the AVQI and ABI results could be found relating to the Praat versions. Furthermore, the present study has

shown an additional finding about a slight variance of AVQI and ABI results between group 1 and 2. Although these differences of AVQI and ABI outcomes between group 1 and 2 remained statistically irrelevant, there must be an unresolved bug introduced somewhere between Praat v.6.0.06 and v.6.0.21 because the outcomes of AVQI and ABI can vary by about 0.25 to 0.5 for single cases. This variation can have consequences for the clinical interpretation in individual cases, particularly near the boundary between normal and abnormal voice quality. Thus, it seems as if the AVQI script v. 03.01 and the ABI script v. 01.01 provide stable outcomes despite possible computation bugs within the performing software.

This research exemplifies how an active community of users helps identify prominent problems within open source software so that they can be solved as quickly and thoroughly as possible. It also shows that slight differences can only be detected by a systematic analysis such as the present study. Even though the small margins of AVQI and ABI outcomes between the Praat versions of group 1 and 2 had no statistical influence, there is no guarantee that future updates of Praat will not have an influence. Currently, there is no independent control system available to assess effects such as those disclosed in the present study. Therefore, medical professional users are well advised to check for variations in AVQI and ABI outcomes when using a new update of Praat. For example, a defined set of voice recordings could be pretested on the new Praat version in comparison to a previous Praat version. The option to have recourse to all previously published Praat versions is most helpful to review one's own clinical and research results of AVQI and ABI. For a potential pretest among Praat versions, we uploaded, under consideration of ethical permission and data privacy, natural voice samples for evaluating the outcomes of AVQI and ABI values (http://www2.medizin.uni-greifswald.de/hno/index.php?id=470). Furthermore, we recommend using the Praat version which was used in the validation studies of each language to ensure that the Praat version has no significant influence on the outcome of AVQI and ABI. In summary, caution is warranted for medical professionals using updates in open source software (e.g., Praat) for patient care or research.

5. Conclusion

AVQI and ABI results can be altered by Praat versions. Therefore, it is recommended (a) to use the Praat versions of previous validation studies which tested AVQI and ABI outcomes or (b) to verify with test signals and compare the outputs with outputs of earlier releases for all functions of Praat. These test signals can be the users' own recordings or voice samples from the University of Greifswald with the reports of correct AVQI and ABI results.

Funding

None.

CRediT authorship contribution statement

Lydia Stappenbeck: Methodology, Software, Writing - original draft preparation. Ben Barsties v. Latoszek: Project administration, Methodology, Conceptualization, Writing - reviewing & editing. Ben Janotte: Data curation. Bernhard Lehnert: Formal analyses, Writing and reviewing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] B. Barsties, M. De Bodt, Assessment of voice quality: current state-of-the-art, Auris Nasus Larynx 42 (2015) 183–188, http://dx.doi.org/10.1016/j.anl.2014.11.001.
[2] P.H. Dejonckere, P. Bradley, P. Clemente, G. Cornut, L. Crevier-Buchman, G. Friedrich, P. Van De Heyning, M. Remacle, V. Woisard, Committee on Phoniatrics of the European Laryngological Society (ELS), A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Guideline elaborated by the Committee on Phoniatrics of the European Laryngological Society (ELS), Eur. Arch. Otorhinolaryngol. 258 (2001) 77–82.
[3] P.H. Dejonckere, L. Crevier-Buchman, J.P. Marie, M. Moerman, M. Remacle, V. Woisard, et al., Implementation of the European Laryngological Society (ELS) basic protocol for assessing voice treatment effect, Rev. Laryngol. Otol. Rhinol. 124 (2003) 279–283.
[4] G. Friedrich, P. Dejonckere, The voice evaluation protocol of the European Laryngological Society (ELS) – first results of a multicenter study, Laryngorhinootologie 84 (2005) 744–752.
[5] P. Boominathan, J. Samuel, R. Arunachalam, R. Nagarajan, S. Mahalingam, Multi parametric voice assessment: sri ramachandra university protocol, Otolaryngol. Head Neck Surg. 66 (2014) 246–251, http://dx.doi.org/10.1007/s12070-011-0460-y.
[6] B. Barsties v. Latoszek, Y. Maryn, E. Gerrits, M. De Bodt, A meta-analysis: acoustic measurement of roughness and breathiness, J. Speech Lang. Hear. Res. 61 (2018) 298–323.
[7] Y. Maryn, P. Corthals, P. Van Cauwenberge, N. Roy, M. De Bodt, Toward improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels, J. Voice 24 (2010) 540–555, http://dx.doi.org/10.1016/j.jvoice.2008.
[8] B. Barsties, Y. Maryn, The improvement of internal consistency of the acoustic voice quality index, Am. J. Otolaryngol. 36 (2015) 647–656, http://dx.doi.org/10.1016/j.amjoto.2015.04.012.
[9] B. Barsties, Y. Maryn, External validation of the acoustic voice quality index version 03.01 with extended representativity, Ann. Otol. Rhinol. Laryngol. 125 (2016) 571–583, http://dx.doi.org/10.1177/0003489416636131.
[10] B. Barsties v. Latoszek, Y. Maryn, E. Gerrits, M. De Bodt, The Acoustic Breathiness Index (ABI): a multivariate acoustic model for breathiness, J. Voice 31 (2017) 511.e11–511.e27, http://dx.doi.org/10.1016/j.jvoice.2016.11.017.
[11] Y. Maryn, M. De Bodt, N. Roy, The acoustic voice quality index: toward improved treatment outcomes assessment in voice disorders, J. Commun. Disord. 43 (2010) 161–174, http://dx.doi.org/10.1016/j.jcomdis.2009.12.004.
[12] B. Barsties, Y. Maryn, The acoustic voice quality index. Toward expanded measurement of dysphonia severity in German subjects, HNO 60 (2012) 715–720, http://dx.doi.org/10.1007/s00106-012-2499-9.
[13] V. Reynolds, A. Buckland, J. Bailey, et al., Objective assessment of pediatric voice disorders with the acoustic voice quality index, J. Voice 26 (2012) 672.e1–672.e7, http://dx.doi.org/10.1016/j.jvoice.2012.02.002.
[14] Y. Maryn, M. De Bodt, B. Barsties, et al., The value of the acoustic voice quality index as a measure of dysphonia severity in subjects speaking different languages, Eur. Arch. Otorhinolaryngol. 271 (2014) 1609–1619, http://dx.doi.org/10.1007/s00405-013-2730-7.
[15] Y. Maryn, H.T. Kim, J. Kim, Auditory-perceptual and acoustic methods in measuring dysphonia severity of Korean speech, J. Voice 30 (2016) 587–594, http://dx.doi.org/10.1016/j.jvoice.2015.06.011.
[16] K. Hosokawa, B. Barsties, T. Iwahashi, et al., Validation of the acoustic voice quality index in the Japanese language, J. Voice 31 (2017) 260.e1–260.e9, http://dx.doi.org/10.1016/j.jvoice.2016.05.010.
[17] V. Uloza, T. Petrauskas, E. Padervinskis, et al., Validation of the acoustic voice quality index in the Lithuanian language, J. Voice 31 (2017) 257.e1–257.e11, http://dx.doi.org/10.1016/j.jvoice.2016.06.002.
[18] E. Kankare, B. Barsties v. Latoszek, Y. Maryn, et al., The acoustic voice quality index version 02.02 in Finnish speaking population, Logoped Phon Vocol. (2019), http://dx.doi.org/10.1080/14015439.2018.1556332, In Press.
[19] G.H. Kim, Y.W. Lee, I.H. Bae, H.J. Park, S.G. Wang, S.B. Kwon, Validation of the acoustic voice quality index in the Korean language, J. Voice (2019), http://dx.doi.org/10.1016/j.jvoice.2018.06.007, In Press.
[20] K. Hosokawa, B. Barsties v. Latoszek, T. Iwahashi, M. Iwahashi, S. Iwaki, C. Kato, M. Yoshida, H. Sasai, A. Miyauchi, N. Matsushiro, H. Inohara, M. Ogawa, Y. Maryn, The acoustic voice quality index version 03.01 for the Japanese-speaking population, J. Voice 33 (2019) 125.e1–125.e12, http://dx.doi.org/10.1016/j.jvoice.2017.10.003.
[21] J. Delgado Hernandez, N.M. Leon Gomez, A. Jimenez, et al., Validation of the acoustic voice quality index version 03.01 and the acoustic breathiness index in the Spanish language, Ann. Otol. Rhinol. Laryngol. 127 (2018) 317–326, http://dx.doi.org/10.1177/0003489418761096.

[22] B. Barsties v. Latoszek, B. Lehnert, B. Janotte, Validation of the acoustic voice quality index version 03.01 and acoustic breathiness index in German, J. Voice 34 (2020) 157.e17–157.e25.
[23] T. Pommée, Y. Maryn, C. Finck, D. Morsomme, Validation of the acoustic voice quality index, version 03.01, in French, J. Voice (2018), http://dx.doi.org/10.1016/j.jvoice.2018.12.008, In Press.
[24] M. Englert, B. Barsties v. Latoszek, Y. Maryn, M. Behlau, Validation of the acoustic voice quality index, version 03.01, to the Brazilian Portuguese language, J. Voice (2019), In Press.
[25] K. Hosokawa, B. Barsties v. Latoszek, C.A. Ferrer, T. Iwahashi, M. Iwahashi, S. Iwaki, C. Kato, M. Yoshida, M. Umatani, A. Miyauchi, N. Matsushiro, H. Inohara, M. Ogawa, Y. Maryn, Acoustic breathiness index for the Japanese-speaking population: validation study and exploration of affecting factors, JSLHR 66 (2019) 2617–2631, http://dx.doi.org/10.1044/2019_JSLHR-S-19-0077.
[26] Y. Maryn, D. Weenink, Objective dysphonia measures in the program Praat: smoothed cepstral peak prominence and acoustic voice quality index, J. Voice 29 (2015) 35–43, http://dx.doi.org/10.1016/j.jvoice.2014.06.015.

function index for voice pathology screening, Eur. Arch. Otorhinolaryngol. 276 (2019) 1737–1745, http://dx.doi.org/10.1007/s00405-019-05433-5.
[35] T. Pommée, Y. Maryn, C. Finck, D. Morsomme, The acoustic voice quality index, version 03.01, in French and the voice handicap index, J. Voice (2018), http://dx.doi.org/10.1016/j.jvoice.2018.11.017, In Press.
[36] I. Priss, B. Barsties v. Latoszek, U. Jäger-Priss, B. Lehnert, Questionnaire for the assessment of the voice self-concept in a neurological practice: applicability for the identification of patients with high consultation needs, Nervenarzt 90 (2019) 601–608, http://dx.doi.org/10.1007/s00115-018-0642-x.
[37] M. Englert, L. Lima, A.C. Constantini, B. Barsties v. Latoszek, Y. Maryn, M. Behlau, Acoustic Voice Quality Index – AVQI for brazilian portuguese speakers: analysis of different speech material, Codas 31 (2019), e20180082, http://dx.doi.org/10.1590/2317-1782/20182018082.
[38] M. Englert, L. Lima, M. Behlau, Acoustic voice quality index and acoustic breathiness index: analysis with different speech material in the Brazilian Portuguese, J. Voice (2019), http://dx.doi.org/10.1016/j.jvoice.2019.03.015, In Press.
[27] C. Batthyany, Y. Maryn, I. Trauwaen, E. Caelenberghe, J. van Dinther, A. [39] M. Hirano, Psycho-acoustic evaluation of voice, in: G.E. Arnold, F. Winckel,
Zarowski, F. Wuyts, A case of specificity: how does the acoustic voice quality B.D. Wyke (Eds.), Disorders of Human Communication 5. Clinical Examination
index perform in normophonic subjects? Appl. Sci. 9 (2019) 2527, http://dx. of Voice, Springer Verlag, Vienna, Austria, 1981, pp. 81–84.
doi.org/10.3390/app9122527. [40] R Core Team, R: A Language and Environment for Statistical Computing, R
[28] B. Barsties v. Latoszek, N. Ulozaitė-Stanienė, Y. Maryn, T. Petrauskas, V. Uloza, Foundation for Statistical Computing, Vienna, Austria, 2017, https://www.R-
The influence of gender and age on the acoustic voice quality index and project.org/. (Accessed 15 February 2017).
dysphonia severity index: a normative study, J. Voice 33 (2019) 340–345, [41] H. Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag,
http://dx.doi.org/10.1016/j.jvoice.2017.11.011. New York, 2009.
[29] B. Barsties v. Latoszek, B. Lehnert, Internal validation of the acoustic voice [42] X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, M. Müller,
quality index version 03.01 and acoustic breathiness index, pROC: an open-source package for R and S+ to analyze and compare ROC
Laryngorhinootologie 97 (2018) 630–635, http://dx.doi.org/10.1055/a-0596- curves, BMC Bioinformatics 12 (2011) 77, http://dx.doi.org/10.1186/1471-
7819. 2105-12-77.
[30] V. Uloza, B. Barsties v. Latoszek, N. Ulozaite-Staniene, T. Petrauskas, Y. Maryn, [43] Z. Du, Y. Hao, reportROC: An Easy Way to Report ROCAnalysis. R Package
A comparison of Dysphonia Severity Index and Acoustic Voice Quality Index Version 3.4, 2019, https://CRAN.R-project.org/package=reportROC. (Accessed
measures in differentiating normal and dysphonic voices, Eur. Arch. 29 July 2019).
Otorhinolaryngol. 275 (2018) 949–958, http://dx.doi.org/10.1007/s00405- [44] B. Lehnert, BlandAltmanLeh: (Slightly Extended) Bland-altman Plots. R
018-4903-x. Package Version 0.3.1, 2019, https://CRAN.R-project.org/
[31] J.M. Lee, N. Roy, E. Peterson, R.M. Merrill, Comparison of two multiparameter package=BlandAltmanLeh. (Accessed 29 July 2019).
acoustic indices of dysphonia severity: the acoustic voice quality index and [45] S.C. Johnson, Hierarchical clustering schemes, Psychometrika 32 (1967)
cepstral spectral index of dysphonia, J. Voice 32 (2018) 515.e1–515.e13, 241–254.
http://dx.doi.org/10.1016/j.jvoice.2017.06.012. [46] J.M. Bland, D.G. Altman, Statistical methods for assessing agreement between
[32] B. Barsties v. Latoszek, N. Ulozaitė-Stanienė, T. Petrauskas, V. Uloza, Y. Maryn, two methods of clinical measurement, Lancet 1 (1986) 307–310.
Diagnostic accuracy of dysphonia classification of DSI and AVQI, [47] F.E. Harrell, J.C. Slaughter, Biostatistics for Biomedical Research, 2019, https://
Laryngoscope 129 (2019) 692–698, http://dx.doi.org/10.1002/lary.27350. hbiostat.org/doc/bbr.pdf. (Accessed 30 June 2019).
[33] P. Bottalico, J. Codino, L.C. Cantor-Cutiva, K. Marks, C.J. Nudelman, J. [48] L.G. Portney, M.P. Watkins, Foundations of Clinical Research, Applicaions to
Skeffington, R. Shrivastav, M.C. Jackson-Menaldi, E.J. Hunter, A.D. Rubin, Practice, 2nd ed., Prentice Hall, Upper Saddle River, New Jersey, 2000.
Reproducibility of voice parameters: the effect of room acoustics and [49] L.R. Frey, C.H. Botan, P.G. Friedman, et al., Investigating Communication: An
microphones, J. Voice (2018), http://dx.doi.org/10.1016/j.jvoice.2018.10.016, Introduction to Research Methods, Prentice Hall, Englewood Cliffs, New
In Press. Jersey, 1991, pp. 40.
[34] N. Ulozaite-Staniene, T. Petrauskas, V. Šaferis, V. Uloza, Exploring the
feasibility of the combination of acoustic voice quality index and glottal
