The Effects of Emotional Expression on Vibrato

*Christopher Dromey, *Sharee O. Holmes, J. Arden Hopkin, and *Kristine Tanner, *†Provo, Utah
Summary: Objectives/Hypothesis. The purpose of this study was to investigate the effect of emotional expression
on several acoustic measures of vibrato, including its rate, extent, and steadiness. We hypothesized that singing a passage with emotional content would influence these variables.
Study Design. This study used a within-subjects, repeated-measures design. Singer performance under different conditions was analyzed.
Methods. Ten graduate student singers (eight women, two men) completed a series of tasks including sustained sung
vowels at several pitch and loudness levels, an assigned song that was judged to have relatively neutral emotion, and a
personal selection that included passages of intense emotion. Vowel tokens were extracted from the recordings and averaged for each task. Dependent measures included the mean fundamental frequency (F0), mean intensity, frequency modulation (FM) rate, FM extent, and measures of FM rate and extent variability.
Results. The FM rate and extent were higher and the modulation variability was lower for the more emotional song
than for the sustained vowels. Mean F0 and intensity were higher for the emotional song than for the neutral song.
Conclusions. Singing an emotional passage influences acoustic features of vibrato when compared with isolated, sustained vowels. The wider dynamic and pitch ranges for emotional passages only partly explain vibrato differences between emotional and neutral singing.
Key Words: Vibrato, Singing, Emotion, Vibrato extent, Vibrato rate, Modulation.
Vocal vibrato has been the subject of research for many years.
C.E. Seashore pioneered the use of acoustic measures to
examine vibrato as early as the 1930s. Seashore1 defined
vibrato as "a periodic pulsation, generally involving pitch, intensity, and timbre, which produces a pleasing flexibility,
mellowness, and richness of tone" (p. 623). Vocal vibrato is
understood to be a natural feature of a well-balanced singing
voice,2 contributing to a listener's perception of the performer's
technical and artistic skill. Vibrato is also considered one of the
means a singer may use to express emotion.1,3
Vibrato acoustics and vocal beauty
On the surface it may seem simple to define a specific set of
acoustic and physical measures that characterize a pleasing
voice, but many factors are involved in beautiful singing. One
factor identified in the literature is vibrato. Robison, Bounous,
and Bailey4 compared the vocal ratings from a panel of expert
judges to several acoustic measures of vocal performance.
Singers with the highest ratings of vocal beauty were those
whose vibrato occupied proportionally more of their total
singing time. Other predictors of vocal beauty included cleanness of voice and adequate breath management.
Several acoustic features of vibrato have been studied in
detail since Seashore made his first observations, including
the rate, extent, and periodicity of the vocal modulations.3,5
Vibrato rate is defined as the number of fundamental
frequency (F0) and amplitude pulses per second.1 Frequency
modulation (FM) and amplitude modulation (AM) both
contribute to vibrato rate. In vibrato, AM is largely an epiphenomenon that arises from the resonance-harmonic interaction
(RHI).6,7 The RHI is the interaction between rising and
falling harmonic frequencies (as a result of fluctuating F0)
and the formant peaks in the vocal tract transfer function that
determine the overall intensity of a sound. This interaction
creates an involuntary modulation in amplitude with the
modulation in F0. There is also a laryngeal component of AM
that can be measured through electroglottography,8 although
the target behavior for a singer is believed to be the modulation
of fundamental frequency. Therefore, for the remainder of this
article, FM of vibrato will be the focus of measurement and discussion, and AM will not be considered further.

Accepted for publication June 10, 2014.
From the *Department of Communication Disorders, Brigham Young University,
Provo, Utah; and the †School of Music, Brigham Young University, Provo, Utah.
Address correspondence and reprint requests to Christopher Dromey, Department of
Communication Disorders, Brigham Young University, 133 John Taylor Building, Provo,
UT 84602. E-mail:
Journal of Voice, Vol. -, No. -, pp. 1-12
2014 The Voice Foundation

The rate of vocal vibrato for an individual is not fixed; it can
be modified by the singer with conscious effort. However,
singers have a natural speed of vibrato, and rate can only be
changed modestly with volitional control.9 The average rate
of vibrato has been reported to be within the 5-7 Hz range,1,3,5
with higher or lower rates depending on when the note occurs
within a musical passage,10 or the amount of vocal training.11
Prame's10 research showed that vibrato rate does not always
remain the same throughout a sustained note. Several studies
have also shown that with vocal training, the vibrato characteristics of an individual singer may change.12,13 With training,
inexperienced singers with an unusually fast rate tend to
gradually slow down closer to an average pace, and singers
who begin with a slow rate tend to speed up and move closer
to the average rate over a period of time.13
As defined by Seashore,1 vibrato extent is the distance between the crest and trough of the F0 trace, and is measured in
semitones (ST). The average extent of vibrato is reported to
lie between 0.41 and 1.58 ST.3 Vibrato extent has generally
increased over the course of the past century,12 and it also
increases in individual singers after a significant period of vocal
training.13 In a study of the link between acoustics and listener
ratings, excessive amplitude modulation, delayed onset of
vibrato, and complete absence of vibrato all had negative

effects on perceptual measures of the voice.2,3 A moderate
vibrato rate and extent are also important for professional
singers, and a balance in rate and extent has been identified
as important to vocal beauty.2,4 Diaz and Rothman14 concluded
that extent was an important aspect of vibrato, one that was
reflective of overall vocal quality. They suggested that periodicity of vibrato, or the regularity of the modulation, was also
among the most significant indicators of vocal beauty. Although
much has been learned about the contributions of vibrato rate,
extent, and periodicity to the overall performance and beauty
of singing, much remains to be discovered, including the effects
of emotional expression on these measures of vibrato.
Natural versus simulated emotion
The role of emotions in the human experience has been investigated extensively. It has been suggested that the primary purpose of emotional reaction in any species is to either protect an
organism from impending threats (negative valence emotions
linked to fight or flight), or to increase the chances of both short
and long-term survival of the species (positive valence responses linked to food or mating).15 Researchers differ in their
views as to whether specific emotions such as anger, fear, joy, or
surprise have their own autonomic hallmarks, or whether
emotional arousal is less finely differentiated at a physiologic
level. Kreibig's thorough review of the data from 134 studies
suggests that the body's autonomic response tends to be specific
for a given emotion.16
In theater, the ability to assume the role of a character while
convincingly portraying emotion is a vital skill. An actor is both
himself and a fictional character at once. This duality of an actor
on stage is an essential element of theater. An actor makes what
is artificial seem genuine, and evokes an emotion in the audience that is not necessarily felt by the actor, but by the character.
The skilled simulation or feigning of emotion can be sufficient
to invoke autonomic responses in a theatrical or operatic audience. Baltes, Avram, Miclea, and Miu17 found that experiencing music through listening, watching, and learning the
plot of an opera led to physiologic changes in a viewer. It has
been suggested that emotional responses to music may be
linked to the mirror neuron system, which activates a subset
of motor neurons when an action is observed, in much the
same way those neurons would be active in actually performing
a task. In the case of an artistic expression, an audience member
experiences an emotion that is evoked not by the listener's
direct experience of an event, but by a potentially innate capacity, mediated by mirror neurons, to respond emotionally to a
musical performance.18
Given the capacity of musical performance to arouse an affective response in the listener, it could be speculated that
certain acoustic features might characterize singing that is
more rather than less emotional. This reasoning has led to a
number of studies of the connection between emotion and the
physiology of human phonation.
Emotion and the voice
During his early research on vibrato, Seashore briefly addressed
emotion as a contributor to vibrato characteristics. At that time

ages in many cultures, and that it occurred during emotional
singing, or singing with feeling. Although Seashore1 suggested
an emotional contribution to the emergence of vibrato, there
was no clear evidence at the time that emotion had a direct effect on the characteristics of vibrato.
The respiratory system is the energy source for phonation,
and because it is under autonomic and volitional control, it is
reasonable to anticipate that emotional arousal may influence
its activity, which in turn may have an impact on the voice.
Foulds-Elliott and collaborators19 asked professional opera
singers to sing in two ways. One involved technical singing,
as the artist might use during warm-up or rehearsal, and the
other was emotionally connected singing, or the type of singing
that meaningfully communicates with an audience during a performance. The key respiratory difference was that the emotionally connected singing involved initiating phonation at a higher
lung volume level and using more air. An examination of sound
pressure levels showed that the dynamic range was greater in
the emotional singing and more uniformly loud in the technical
condition. The authors speculated that performers may rely on
autonomic nervous system activation to allow a convincing performance, in much the same way that a photographer elicits a
more natural smile from a subject by telling a joke than by
asking for a smile.19
The work of Klaus Scherer in the study of emotion in
communication has been extensive. He succinctly summed up
the rationale for using acoustics to understand the mechanisms
of emotional expression: "If it is demonstrated that emotion can
be correctly diagnosed from the voice, then clearly the emotions
must differentially affect the vocalization mechanism and, in
consequence, yield demonstrable differences in acoustic
patterning of the resulting sound waves" (pp. 236-237).20 He
acknowledged the ethical challenges associated with invoking
true emotions in a laboratory, and noted that in most research
into the acoustic features of individual emotions, actors have
supplied the samples, raising the concern that the results may
not reflect the features of a truly emotional experience. In discussing singing, Scherer noted that strong emotional involvement appears necessary for a successful performance, but that
we cannot be sure whether these emotions were actually felt
as opposed to skillfully feigned.20
The autonomic nervous system has been suggested as potentially responsible for functional voice disorders, where no
organic pathology can explain the dysphonia. This reasoning
led to a study of laryngeal muscle activation during a task
known to invoke an autonomic response. Helou et al21 had their
volunteers immerse a hand in ice water, and compared the activity levels of several intrinsic laryngeal muscles to cardiovascular indexes of autonomic activity. Along with the anticipated
increases in heart rate and blood pressure, activation of vocal
fold adductors, abductors, and tensors was observed, which
lasted beyond the return to baseline of the cardiovascular measures after the ice water task was over. The authors concluded
that the larynx is sensitive to autonomic nervous system activation, which, in the context of the present study, suggests that vibrato characteristics could be affected by emotion.

A number of studies have been conducted that show the effects of emotion on specific aspects of the voice. Howes et al3
found that judges were able to correctly identify the emotion
of a singer during a short cadenza, which confirmed that the
singers were able to effectively portray the target emotion; however, this still does not explain the effect of the singers' emotion
on the acoustic features of their vibrato. In another study, rate,
pitch height, and loudness were features of both speech and music that helped listeners to decode emotion.22
The work of Sundberg et al23 has shown that the extent of frequency modulation in vibrato may increase for an emotional as
opposed to a neutral performance. The researchers in this study
avoided a paradigm where the singer was asked to sing the same
passage with several contrasting emotions, considering such a
request to be at odds with the intended emotion embedded
into a musical passage by the composer. In contrast, other researchers have attempted to distinguish between specific emotions (tenderness, happiness, sadness, and anger) by having
performers sing the same short phrase using each of these emotions.24 In the latter study, none of the emotions affected the
acoustic measures of vibrato (rate, extent, or steadiness); however, the modulation extent and steadiness increased when the
voice was louder. In another report, Sundberg noted that during
the performance of a song the vibrato rate was higher than in
isolated, sustained tones.25 It could be speculated that the
increased emotional engagement required for a performance,
whether those emotions were real or simulated, could account
for this increase in vibrato rate.
The purpose of the present study was to investigate changes
in several acoustic measures of vibrato as singers performed
songs that were judged to be higher or lower in their emotional
content. We hypothesized that singing a more emotional passage would lead to changes in vibrato rate, extent, and variability. It has been reported that greater emotional arousal is
associated with a higher vibrato rate,2 but it is unclear how
the other variables might be affected. This research could
potentially yield insights into singing physiology and the mechanics of modifying vibrato. This may have value for professionally trained singers in their quest for vocal beauty by
shedding light on the connection between the expression of
emotion and a balanced vibrato.
Ten graduate student singers with high vocal competency ratings from the classical voice program at the School of Music
at Brigham Young University participated in this study. All
singers were rated between 3.0 and 4.05 on a scale of 1.0 to
5.0 for vocal technique. A score of 3.0 is required to pass a junior level recital and 4.0 for a graduate recital. A score of 5.0
represents professional caliber performance. Students who
were already assigned a score of 3.0 or higher in the music program were invited to participate, and no further screening was
conducted as part of the study. The mean age of the singers
was 23.9 years (SD 2.08). Eight were women and two
were men. All participants reported good health and denied a
history of hearing or voice disorders. The experimenters also
listened to each participant's speaking voice and found it to
be perceptually within normal limits. Each participant signed
a consent form that was approved by the Brigham Young University Institutional Review Board.
Recordings were made in a sound-treated studio. A sound level
meter (Extech Instruments [Nashua, NH] 407736) was positioned 50 cm from the microphone to calibrate the audio signal
for vocal intensity. A Neumann (Berlin, Germany) TLM 49
condenser microphone was placed inside a sound isolation
shell, with the pickup pattern of the microphone facing away
from the piano to reduce signal bleed. An Audient (Herriard,
UK) 8024 Analog Recording Console, Grace (Lyons, CO)
Model 201 two channel preamplifier, and ProTools 10 HD2
Recording System (Avid Technology Inc, Burlington, MA)
were used for the recordings at 44.1 kHz.
The singers warmed up their voices before the recording. A
pianist accompanied each singer. They performed the following
tasks in randomized order to minimize the likelihood of a
sequence effect.
Personal selection. Each participant sang a song in classical
style that they had been practicing with their vocal instructor.
The singers were invited at the time of recruitment to the study
to be expressive with the emotion of their selection, which
should have passages of both high and low levels of emotion.
Emotion was not operationally defined for the singers; it was
left to them to identify a passage that they judged to be emotionally expressive. At the time of recording, no further instruction
regarding emotion was given. Thus, the emotions expressed in
the songs were not standardized across participants. Previous
work has reported no changes to vibrato in association with
the targeted expression of specific emotions,24 and for the purpose of the present study, it was decided to have the singers express the emotion intended by the composer, regardless of its
specific character or even valence.
The participants brought a copy of their musical selection
with annotations that they had made to indicate the emotional
intensity of each part of the song. These markings guided segmentation during signal analysis to identify and extract for
analysis only those vowels marked as having the highest
emotion. During statistical analysis the dependent measures
(reflecting vibrato rate, extent, and steadiness) for the other
conditions (assigned song, isolated vowels) were compared
with this emotional passage to evaluate changes that might be
attributable to emotional arousal. The personal selection was
considered to be a performance task, meaning it was most
representative of singing in a concert, because the participant
sang an entire song.
It is important to acknowledge that the researchers did not
attempt to directly influence the singers' emotional state by
any experimental means. Thus, it can be assumed that the performers were feigning the intended emotion required for each
part of their chosen song. The singers were not questioned
following the recording regarding their level of perceived
emotional arousal.
Assigned song. Each singer sang the first 12 measures of
the song "Caro mio ben" by Giordani, which was chosen
for three reasons. First, it was well known to the students in
the program as a beginning level song. Second, this song is
almost always sung as part of a skill development exercise
without encouragement of emotional expression. Finally, it is
relatively slow, with limited dynamic range and pitch range,
and includes several prolonged vowels that would be suitable
for modulation analysis. This short song was sung once at a
self-selected comfortable pitch and loudness. This selection
was considered a performance task (albeit without deliberate
emotional engagement) along with the personal selection,
because the participant sang multiple phrases within the context
of a familiar song.
Sustained vowels. Each participant sang the isolated
vowels //, /u/, and /i/ across pitch and loudness continua.
They sang each vowel at a comfortable pitch, at three different
loudness levels: low, medium, and high. They also sang each
vowel at a comfortable loudness, at three different pitches:
low, medium, and high. The purpose of these tasks was to allow
the measurement of vibrato changes across pitch and loudness
conditions, which might overlap those used in the songs, but
without any emotional component. It was anticipated that passages with the highest emotional content might also be sung
with a wide dynamic range; therefore, it was important to measure vibrato changes that might be attributable to elevated pitch
or loudness of the voice aside from any contribution of
emotional arousal. These tasks were assumed to have the
most neutral emotion, and were also considered isolated phonation tasks, meaning they had no real context and they were the
least representative of a concert performance. Tasks within the
sustained vowel section were also randomized to minimize the
likelihood of an order effect.
Data analysis
Digital sound files from the recording studio were transferred to
a laboratory computer for analysis. The files were first
segmented to extract isolated vowel tokens from the singing
passages as individual 44.1 kHz wav files. These vowel tokens
were opened with Praat acoustic analysis software (version
5.3.03; Paul Boersma and David Weenink, University of
Amsterdam, The Netherlands) to generate an F0 contour, which
was exported as a text file with values reported at 1 ms intervals.
The wav audio files were also analyzed with custom Matlab
software (MathWorks, Natick, MA, 2009) to create a root
mean square (RMS) contour, also at 1 ms sample intervals. Acoustic
measurements were derived from the recordings with a custom
Matlab application, to compute variables reflecting vibrato rate,
extent, and steadiness, as well as means of F0 and intensity.
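
As an illustration of the kind of RMS contour described above, the following Python sketch computes one level value per 1 ms step from a list of audio samples. It is not the authors' Matlab code: the function name, the 20 ms analysis window, and the dB reference (full scale = 1.0) are assumptions made for this example, and a calibration offset from the sound level meter would still be needed to express the values in dB SPL.

```python
import math


def rms_contour_db(samples, sample_rate=44100, hop_s=0.001, window_s=0.020):
    """Return one RMS level (dB re full scale 1.0) per hop across the signal.

    window_s is a hypothetical choice; the source does not state a window length.
    """
    win = int(round(sample_rate * window_s))
    if len(samples) < win:
        return []
    hop = sample_rate * hop_s  # 44.1 samples at 44.1 kHz, so frame starts are rounded
    n_frames = int((len(samples) - win) / hop) + 1
    contour = []
    for k in range(n_frames):
        start = int(round(k * hop))
        frame = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in frame) / win)
        # Floor the level so silent frames do not produce log(0).
        contour.append(20.0 * math.log10(max(rms, 1e-12)))
    return contour


# Synthetic 0.1 s, 220 Hz sine with peak amplitude 0.5 (invented test signal).
sr = 44100
tone = [0.5 * math.sin(2 * math.pi * 220.0 * n / sr) for n in range(sr // 10)]
levels = rms_contour_db(tone, sample_rate=sr)
```

For a steady sine of peak amplitude 0.5 the contour should sit close to 20·log10(0.5/√2), roughly -9 dB, with small ripple because the window does not span a whole number of periods.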
During vowel segmentation, the individual tokens were
trimmed minimally, leaving the longest duration possible for
each vowel; therefore, vowel tokens varied in length across
all tasks. Instances of delayed onset of vibrato were also

analysis when the piano intensity overcame the acoustic shielding and affected the voice recording to the extent that the analysis yielded a visibly contaminated F0 trace.
Personal selection vowel tokens were chosen from the sections of highest emotion, as indicated by the singers' markings
on their music score. The acoustic measures of vibrato (reflecting rate, extent, steadiness, fundamental frequency, and vocal
intensity) from the first 20 high-emotion vowel tokens from
each singer were averaged to create the personal selection
data set. The assigned song data set was created from the
same five vowels for each participant. These vowels were chosen for their length, providing vowel tokens with comparable
duration to those from the personal selection. Pitch and loudness summary values of the dependent measures were generated by averaging data from all //, /u/, and /i/ vowels for
each condition, because initial analysis revealed no differences
between the three places of articulation. Figure 1 illustrates how
the dependent variables were defined.
FM rate. FM rate was measured in Hz, and was calculated
through the use of a peak- and trough-picking algorithm that
identified the temporal location of each FM cycle. The rate
was computed as the inverse of the mean cycle period.
FM extent. FM extent was measured in ST. This was calculated by taking the maximum value (peak) minus the minimum
value (trough), averaged over all cycles.
FM rate coefficient of variation (COV). This variable was
a measure of the regularity of the FM rate. It was computed by
dividing the standard deviation by the mean of the FM period of
a vowel token (coefficient of variation), which was then multiplied by 100 to make the numbers more convenient to interpret.
FM extent coefficient of variation. This variable was a
measure of the regularity of the FM extent. It was computed
by dividing the standard deviation by the mean of the FM extent
for the modulation cycles within a vowel token, which was then
multiplied by 100 to make the numbers more convenient to interpret.
Mean F0. The mean F0 was measured in Hz. This was the
average fundamental frequency during each vowel token.
Mean intensity (dB). The intensity mean was measured in
dB SPL at 50 cm. This variable was calculated as the average
intensity during each vowel token.
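
The vibrato definitions above can be sketched in a few lines of Python. This is a simplified illustration rather than the custom Matlab application used in the study: the three-point peak picker, the choice of peak-to-peak intervals as FM cycles, the pairing of each peak with the trough that follows it, and all function names are assumptions made for this example. The input is an F0 contour in Hz sampled every 1 ms, as exported from the pitch analysis.

```python
import math
from statistics import mean, stdev


def hz_to_semitones(f_hz, ref_hz):
    """Express a frequency in semitones relative to a reference frequency."""
    return 12.0 * math.log2(f_hz / ref_hz)


def find_extrema(contour):
    """Return (index, value, kind) for each local peak or trough, in order.

    Naive three-point comparison; a real F0 trace would need smoothing and
    plateau handling first (assumption: the contour is already clean).
    """
    extrema = []
    for i in range(1, len(contour) - 1):
        prev, cur, nxt = contour[i - 1], contour[i], contour[i + 1]
        if cur > prev and cur > nxt:
            extrema.append((i, cur, "peak"))
        elif cur < prev and cur < nxt:
            extrema.append((i, cur, "trough"))
    return extrema


def vibrato_measures(f0_hz, step_s=0.001):
    """Compute FM rate, FM extent, and their COVs from an F0 contour."""
    ref = mean(f0_hz)
    st = [hz_to_semitones(f, ref) for f in f0_hz]
    extrema = find_extrema(st)
    peaks = [(i, v) for i, v, kind in extrema if kind == "peak"]

    # Assumption: one FM cycle is the interval between consecutive peaks.
    periods = [(b[0] - a[0]) * step_s for a, b in zip(peaks, peaks[1:])]

    # Extent of a cycle: a peak minus the trough that immediately follows it.
    extents = [
        a[1] - b[1]
        for a, b in zip(extrema, extrema[1:])
        if a[2] == "peak" and b[2] == "trough"
    ]
    return {
        "fm_rate_hz": 1.0 / mean(periods),
        "fm_extent_st": mean(extents),
        "fm_rate_cov": 100.0 * stdev(periods) / mean(periods),
        "fm_extent_cov": 100.0 * stdev(extents) / mean(extents),
    }


# Synthetic token: 440 Hz carrier, 5 Hz vibrato, +/-0.5 ST excursion, 2 s long.
token = [
    440.0 * 2 ** (0.5 * math.sin(2 * math.pi * 5.0 * (n * 0.001)) / 12)
    for n in range(2000)
]
result = vibrato_measures(token)
```

On this perfectly regular synthetic token the sketch should return a rate near 5 Hz, a peak-to-trough extent near 1 ST, and both COV values near zero; real vowel tokens would yield nonzero variability.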

Statistical analysis
Univariate repeated-measures analyses of variance (ANOVAs)
were used to evaluate the statistical significance of changes in
the dependent measures across the vocal task conditions. An
initial analysis comparing the vowels //, /u/, and /i/ showed
no significant differences in the dependent measures. Because
the data were comparable for all vowels, they were therefore
averaged for each pitch and loudness level before further analysis. Contrast tests within the ANOVA model compared the task
with the highest level of emotion (the personal selection) with
each of the other tasks.
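
For readers unfamiliar with the repeated-measures computation, a minimal Python sketch of a one-way repeated-measures ANOVA follows. It is not the statistical software used in the study, it omits sphericity checks and P values, and the FM rate values in the example are invented purely to exercise the function.

```python
from statistics import mean


def rm_anova_f(data):
    """One-way repeated-measures ANOVA.

    data[s][c] is the score of singer s in condition c.
    Returns (F, df_conditions, df_error).
    """
    n = len(data)       # number of singers
    k = len(data[0])    # number of conditions (tasks)
    grand = mean(x for row in data for x in row)
    subj_means = [mean(row) for row in data]
    cond_means = [mean(row[c] for row in data) for c in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    ss_conditions = n * sum((m - grand) ** 2 for m in cond_means)
    # What remains after removing singer and condition effects is the error.
    ss_error = ss_total - ss_subjects - ss_conditions

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_conditions / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error


# Invented FM rates (Hz): rows are singers, columns are three tasks.
rates = [
    [5.0, 5.2, 5.6],
    [5.4, 5.5, 5.9],
    [4.8, 5.1, 5.3],
    [5.2, 5.4, 5.8],
]
f_ratio, df1, df2 = rm_anova_f(rates)
```

Because each singer's own mean is subtracted out before the error term is formed, adding a constant to every score of one singer leaves the F-ratio unchanged.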

FIGURE 1. Frequency modulation (FM) and amplitude modulation (AM) traces for an individual vowel token, with software-derived peak
markers which were used to calculate FM rate, FM extent, FM rate coefficient of variation (COV) and FM extent COV. Vertical axis units are
Hz for FM and dB SPL at 50 cm for AM.
Although women's speaking voices are generally about an
octave higher than men's, it was reasoned that it would be
appropriate to analyze the F0 data for both men and women
together because the repeated measures ANOVA essentially
tests each singer against their own performance across the tasks
in the study. Large intersubject variance in a standard ANOVA
would lead to high levels of error variance. But the repeated
measures computation accounts for this variance because the
samples across the conditions are assumed to be correlated
and not independent. The F0 range among a mixed group of
singers will necessarily be larger than when men and women
are considered separately; however, the statistical test still allows significant changes to be identified within singers across
the levels of the independent variable.
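
The reasoning above can be written out as the usual sum-of-squares partition for a one-way repeated-measures design. The notation is an assumption introduced here for illustration: x_{sc} is the score of singer s (of n) on task c (of k), and bars denote means.

```latex
\begin{align*}
SS_{\text{total}} &= \sum_{s=1}^{n}\sum_{c=1}^{k}\left(x_{sc}-\bar{x}\right)^{2}
  = SS_{\text{singers}} + SS_{\text{tasks}} + SS_{\text{error}},\\
SS_{\text{singers}} &= k\sum_{s=1}^{n}\left(\bar{x}_{s\cdot}-\bar{x}\right)^{2},\qquad
SS_{\text{tasks}} = n\sum_{c=1}^{k}\left(\bar{x}_{\cdot c}-\bar{x}\right)^{2},\\
F &= \frac{SS_{\text{tasks}}/(k-1)}{SS_{\text{error}}/\bigl((n-1)(k-1)\bigr)}.
\end{align*}
```

Stable differences between singers, such as the roughly one-octave F0 offset between men and women, fall into the singers term and therefore do not enter the error term in the denominator of F.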
The statistical results are reported in their unaltered form,
without explicit adjustments to minimize the potentially inflated
risk for type I errors when multiple tests are conducted. Certain
kinds of error reduction adjustments (such as Bonferroni) carry

with them the risk of increasing type II errors because they are
overly conservative.26 All P values below 0.05 are reported in
Tables 1 and 2, but the reader is encouraged to critically
evaluate the relative strength of the results for each variable.
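
To make the trade-off concrete, the short Python sketch below compares the familywise type I error rate with and without a Bonferroni adjustment. It assumes the tests are independent, which is unlikely to hold exactly for correlated vibrato measures, and the count of six tests (one per dependent measure) is a hypothetical figure chosen for illustration.

```python
def familywise_error(alpha, m):
    """Chance of at least one type I error across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / m


m = 6  # hypothetical count: one test per dependent measure
unadjusted = familywise_error(0.05, m)       # about 0.26 under independence
per_test = bonferroni_alpha(0.05, m)         # about 0.0083
adjusted = familywise_error(per_test, m)     # just under 0.05
```

The stricter per-test threshold is what makes the adjustment conservative: effects with P values between the adjusted and unadjusted thresholds would no longer be reported as significant.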
Figures 2 and 3 show means and standard deviations for the
dependent variables for the pitch (Figure 2) and loudness
(Figure 3) continua. The assigned song and personal selection
are also presented for comparison with the sustained vowel
tasks. Tables 1 and 2 show the F-ratios and P values for each
of the statistically significant findings for pitch and loudness
continua respectively.
FM rate
There was a significant main effect of vocal task on FM rate of
vibrato. Figures 2 and 3 show an upward trend, with FM rate
increasing from the neutral-emotion tasks to the more
emotional task. According to the contrast analysis, there was
not a significant difference in FM rate between the neutral-emotion song and the high-emotion song. There were, however,
statistically significant increases in FM rate from the sustained
vowel tasks to the personal selection in both the pitch and the
loudness continua.

TABLE 1. Inferential Statistics for the Dependent Measures in the Pitch Continuum, Including Main Effect and Contrast Analyses Against the Personal Selection Task
Measures: FM rate, FM extent, FM rate COV, FM ext COV, Mean F0, Mean dB
Comparisons: Main Effect, Low Pitch, Comf Pitch, High Pitch
Abbreviations: CMB, Caro mio ben; comf, comfortable; ext, extent.

TABLE 2. Inferential Statistics for the Dependent Measures in the Loudness Continuum, Including Main Effect and Contrast Analyses Against the Personal Selection Task
Measures: FM rate, FM extent, FM rate COV, FM ext COV, Mean F0, Mean dB
Comparisons: Main Effect, Low Loud, Comf Loud, High Loud

FM extent
In both pitch and loudness continua, there was a significant
main effect of vocal task on FM extent. Figure 2 shows a general increase from the sustained vowel tasks to both performance tasks: the assigned song and the personal selection.
Figure 3 shows the same pattern for the loudness continuum.

FIGURE 2. Mean and standard deviation of FM rate, FM extent, FM rate COV, FM extent COV, mean F0 and mean dB across all tasks, within the
pitch continuum. CMB, Caro mio ben (neutral) song; PS, personal selection (high-emotion) song.

FIGURE 3. Mean and standard deviation of FM rate, FM extent, FM rate COV, FM extent COV, mean F0 and mean dB across all tasks, within the
loudness continuum. CMB, Caro mio ben (neutral) song; PS, personal selection (high-emotion) song.
A contrast analysis revealed that the FM extent for the personal
selection was significantly higher than the sustained vowel low,
comfortable, and high tasks for both pitch and loudness. The
difference between the assigned song and the personal selection
was not statistically significant.
FM rate COV
A significant main effect of vocal task on FM rate COV was
found for both pitch and loudness continua. A relative decrease
in FM rate variability is notable in Figures 2 and 3. Through the
contrast analysis, it was found that in the pitch continuum, low- and high-pitch tasks were significantly more inconsistent in FM
rate than the personal selection; however, the comfortable pitch
and assigned song tasks yielded no significant difference from
the personal selection. In the loudness continuum, only a main
effect was found, with no significant differences found between
individual tasks.
FM extent COV
The main effect of vocal task on FM extent COV was statistically significant in both the pitch and the loudness continua.

Figure 2 shows a decrease in FM extent COV for the performance tasks. Figure 3 shows a similar pattern, with more stability for the comfortable loudness task than other sustained vowel
tasks. The contrast analysis showed that in the pitch continuum,
low- and high-pitch vowels were significantly more inconsistent than personal selection vowels. For the loudness continuum, only low loudness vowels were significantly more
inconsistent than personal selection vowels.
Mean F0
There was a statistically significant main effect of vocal task on
mean F0. For the pitch continuum, Figure 2 shows a clear increase in F0 from low- to comfortable- and to high-pitch tasks,
as would be anticipated. The assigned song mean F0 was between the low and comfortable pitch sustained vowel mean
F0 values. The personal selection mean F0, however, was between the comfortable and high-pitch means. On the loudness
continuum, Figure 3 shows consistency between all tasks
with the exception of the personal selection, which has an
increased mean F0. The contrast analysis confirmed a statistically significant difference between the personal selection and
all other tasks (sustained low, comfortable, high, and the assigned song) individually for both the pitch continuum and
the loudness continuum.
Mean dB
For both the pitch and loudness continua, there were statistically significant main effects of vocal task on mean dB. A
contrast analysis of the pitch continuum revealed that the low
pitch and comfortable pitch sustained vowel tasks, and the assigned song, had a significantly lower mean dB than the personal selection. The high-pitch task, however, showed no
significant difference in mean dB from the personal selection.
For the loudness continuum, the low loudness and comfortable
loudness conditions, and the assigned song, were significantly
lower in mean dB than the personal selection. The high loudness task showed no significant difference in mean dB
compared with the personal selection.
This study investigated the potential effects of emotional
arousal on the acoustic features of vocal vibrato. On the basis
of previous evidence that emotion can affect speech, song,
and overall voice quality, it was anticipated that singing passages considered to be higher in emotional content might cause
vocal vibrato to change in its rate, extent, and/or steadiness. It is
important to recognize that emotional arousal is not the only
factor that may have led to changes in vibrato in the present
study. Physical or cognitive arousal may play a role in preparing
a singer for performance, and thus in the present study, differences between vibrato in the isolated vowels and the songs
may be attributable to factors other than real or feigned
emotional arousal.
The data revealed that there were significant changes in
vibrato as a function of vocal task; however, the extent to which
the changes were due to the level of emotional arousal remains
unclear. The results include two main trends, which were seen
in Figures 2 and 3. First, there was a general increase in vibrato
FM rate across tasks with presumably increasing emotional
engagement. Second, there was an increase in FM extent
from the isolated sustained vowel tasks to the tasks that
involved songs. Further examination of the individual
acoustic measures led to more detailed speculation about
what may have contributed to these changes.
FM rate
FM rate is a key component of vocal vibrato. Several studies
have shown that the average FM rate of vibrato is approximately 5 to 7 Hz.1,3,5 The average vibrato rate in this study was
in the 5 Hz range, with the slowest vowel tokens around
4 Hz, and the highest reaching approximately 6 Hz. In
Figure 2, FM rate steadily increased with each task across the
pitch continuum, from low pitch to the personal selection.
Because the pattern in FM rate differed from that of mean F0
across the pitch continuum and mean dB across the loudness
continuum, the data suggest that the FM rate of vibrato was
potentially influenced by emotional arousal in the task, rather
than just changing as a consequence of a higher pitched or
louder voice. It is worth noting
that although the difference in FM rate between the assigned
song and personal song did not reach statistical significance,
there was a visible increase in rate for the personal song. This
would be consistent with the report of Ekholm et al2 of an increase in vibrato rate for more emotional singing. The finding
that both of the songs in the present study had a higher vibrato
rate than the isolated vowels is similar to the report from Sundberg.25 It could be inferred from this finding that singing a
meaningful song activates the mechanism underlying vibrato
to a different degree than the production of vowels in isolation.
Although the assigned song was neutral in its intended
emotional content, it could nevertheless be speculated that
singing either song could engage the autonomic nervous system
in such a way as to lead to a slightly faster vibrato.
Titze et al27 have suggested a reflex resonance model of
vibrato, which relies on muscle spindle afference and elevated
feedback gains to generate the muscle tension modulations that
result in vibrato. If the singers experienced increased autonomic
activity for either song when compared with isolated vowels, it
might be that this resulted in higher levels of muscle contraction, as reported in a previous study of autonomic activation.21
This could have influenced the timing and magnitude of the oscillations in the neural circuits responsible for vibrato.
FM extent
The FM extent of vibrato has previously been reported in the
range of 0.41 to 1.58 ST.3 In this study, the average extent
was about 1.5 ST, with a range from approximately 1.1 ST to
2.0 ST. The patterns for FM extent across the pitch and loudness
continua are very similar, allowing the results to be considered
together. Figures 2 and 3 show intriguing patterns of change for
FM extent. First, there is a modest but steady increase in extent
from the low- to the high-pitch and loudness conditions. For the
sustained vowel tasks, the FM extent increased with mean F0,
which was also associated with a dB increase. Second, and
perhaps more significantly, there was a greater difference between the isolated vowels and the song tasks in the extent of
vibrato. The personal selection showed almost no difference
from the assigned selection, which suggested that the level of
intended emotional arousal may not have had much of an effect
on vibrato extent. Instead, the nature of the task had a significant
impact on FM extent. It could be speculated, based on this
finding, that FM extent is not tied to emotional arousal, but
rather increases during performance, in contrast to the sustained, isolated vowel tasks that are not representative of concert performance.
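To make the semitone (ST) values above concrete, the following is a minimal sketch of how an FM extent could be computed from the F0 peaks and troughs of a vowel token. It assumes a peak-to-trough definition of extent and uses made-up F0 values; the function names are hypothetical, and the extraction procedure actually used in this study is not described in this section.

```python
import math


def semitone_interval(f_low_hz, f_high_hz):
    """Interval in semitones (ST) between two frequencies: 12 * log2(ratio)."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def mean_fm_extent_st(peaks_hz, troughs_hz):
    """Average peak-to-trough extent (ST) over paired vibrato cycles.

    Assumption: extent is taken peak to trough; some reports use half
    of this value (deviation around the mean F0) instead.
    """
    pairs = list(zip(peaks_hz, troughs_hz))
    if not pairs:
        raise ValueError("need at least one peak/trough pair")
    total = sum(semitone_interval(trough, peak) for peak, trough in pairs)
    return total / len(pairs)


# Hypothetical F0 peaks and troughs (Hz) for one vowel token; not study data.
peaks = [459.0, 460.5, 458.2]
troughs = [421.0, 420.3, 422.4]
extent = mean_fm_extent_st(peaks, troughs)
print(f"Mean FM extent: {extent:.2f} ST")
```

Expressing extent in semitones rather than hertz makes tokens sung at different pitches comparable, which matters here because mean F0 differed across tasks.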
Sundberg25 reported that FM extent increased slightly with
vocal loudness, but the current data reveal greater increases
for the performance of a song than would be anticipated from
intensity change alone, especially because the vibrato extent
for the neutral song was higher, whereas the dB level was lower,
than for the loudest isolated vowel. Thus, singing a song appears to involve factors that can influence vibrato extent that
are missing in the production of isolated vowels, possibly the
realistic nature of the task, because singers train to perform
songs rather than sustained vowels. Anticipated differences in
the level of emotional arousal between the two songs do not
appear to be influential in this context.
FM rate COV
Vibrato rate steadiness was measured in this study as FM rate
COV. This measure was used as the method of examining the
consistency of the FM rate. In a previous study, vibrato rate
periodicity was described as an important component of vocal
beauty.11 The term periodicity usually includes both rate and
extent measures to assess the overall steadiness of a sound in
comparison to a sine wave. In this study, however, rate steadiness and extent steadiness have been examined individually
in an attempt to more specifically examine the timing and
amplitude components of vocal vibrato. In a previous study, a
steadier vibrato was favored by expert listeners over a less
steady vibrato.4 Therefore, the examination of FM rate COV
could give insights into the way that emotion affects the overall
beauty of vibrato.
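As an illustration of the steadiness measure, the sketch below computes a coefficient of variation (COV) as the standard deviation divided by the mean, expressed as a percentage. It assumes COV is computed over cycle-by-cycle rate estimates within a token and uses the sample standard deviation; both choices, the function name, and the numbers are assumptions for illustration, not the authors' documented procedure.

```python
import statistics


def coefficient_of_variation(values):
    """Return the COV in percent: sample standard deviation / mean * 100."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean is zero; COV is undefined")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical cycle-by-cycle vibrato rates (Hz) for two vowel tokens.
steady_token = [5.4, 5.5, 5.5, 5.6, 5.5]
unsteady_token = [4.6, 5.9, 5.1, 6.2, 4.9]

steady_cov = coefficient_of_variation(steady_token)
unsteady_cov = coefficient_of_variation(unsteady_token)
print(f"steady: {steady_cov:.1f}%  unsteady: {unsteady_cov:.1f}%")
```

Because COV is normalized by the mean, it allows steadiness to be compared across tokens whose average rate differs. The same function could be applied to cycle-by-cycle extent values to obtain an extent COV.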
In this study, the FM rate COV patterns across the pitch and
loudness continua were found to be alike, as seen in Figures 2
and 3. In both graphs, the FM rate COV showed a slight
decrease in unsteadiness with an increase in the pitch and
loudness for the sustained vowel tasks. This decrease was
subtle compared with the substantial decrease in FM rate
COV between the sustained vowels and the assigned song.
The size of this change suggests that FM rate COV is affected
by the performance nature of the task. The FM rate COV for
the personal selection was likewise significantly lower than
for the sustained vowel tasks. It is important to note, however,
that the personal selection was slightly higher in FM rate
COV than the assigned song. Although this difference was
not statistically significant, there is a visible difference
between the songs in Figures 2 and 3. The increase in
unsteadiness from the assigned song to the personal selection
may permit speculation that a higher level of emotional
arousal increases FM rate inconsistency. This may be linked
to a previously identified relationship between fear, or
anxiety, and a quivering voice.28
What is less clear is why sustaining isolated vowels would
result in greater vibrato rate unsteadiness than singing a song.
Although we do not have data on the singers' perceptions during the different tasks, it is possible that their training
targets vocal beauty in performance, whereas no such expectation is present for isolated vowels. Previous work has linked steadier
vibrato to more positive ratings of vocal beauty,4 and singers
may naturally produce a steadier vibrato in association with
performance of a song, regardless of its emotional content.
FM extent COV
FM extent COV is the measure of inconsistency in the width of
the vibrato extent during each vowel token. Inferences about
this vibrato characteristic mirror those of the FM rate COV. A
slight decrease in extent variability was noted between the
low pitch and loudness tasks and the high-pitch and loudness
tasks within the sustained vowels, showing an inverse relationship between pitch/loudness and FM extent COV. The most significant difference was between the isolated vowel tasks and the
performance tasks, with FM extent COV decreasing significantly for the performance tasks. These results, when applied
to real performance, could suggest that the FM extent is more
stable during performance and less stable during vocal tasks
that are not part of a singer's concert performance, such as
warm-up activities. With regard to emotion, FM extent COV
did not appear to be affected by the presumed increased
emotional arousal during the personal selection in the way
that FM rate COV was affected. There was no noticeable difference between the extent COV for the neutral-emotion assigned
song and the high-emotion personal selection.
It is not possible on the basis of the present data to infer
mechanistic differences in the way singers may regulate the
steadiness of FM in its rate as opposed to its extent. Previous
work has suggested a possible trade-off between rate and extent
in vibrato,29 but this does not allow confident conclusions about
the separate contributions of regularity in rate and extent to the
overall steadiness of the modulation.

Mean F0
Mean F0 for the personal selection was higher than for the assigned song in the pitch continuum (Figure 2) and higher than
all other conditions in the loudness continuum (Figure 3).
Because the singers identified sections of the musical score representing high and low emotion, and the experimenter selected
vowels from the sections marked as higher in emotion, the results necessarily reflected a high mean F0 for these high-emotion vowels. This would be consistent with the use of higher
pitch by the composer as one element of emotional expression,
along with other factors, such as loudness, tempo, and the
choice of words in the song.
The assigned song mean F0 was between the low and
comfortable pitch sustained vowel tasks, and the personal selection mean F0 was between the comfortable and high-pitch tasks,
further suggesting the importance of elevated pitch in
emotional expression. The fundamental frequency of each
vowel token was measured to examine the possibility that
mean F0 might be a causal factor for changes in the dependent
measures of vibrato rate, extent, and steadiness. In the graphs
and statistical analyses, the patterns in mean F0 for each task
were compared with patterns in the variables reflecting vibrato
rate, extent, and steadiness for each task. Figure 2 shows the
mean F0 for tasks of the pitch continuum, in which the mean
F0 for sustained vowel tasks was intentionally modified. This
graph also shows the mean F0 of the assigned song and the personal selection. Because the personal selection and the assigned
song were both within the F0 range of the sustained vowel tasks,
it appears unlikely that the increases in the rate, extent, and
steadiness of the modulation with high emotion were simply
a function of mean F0. If mean F0 for the personal selection
or the assigned song had been out of the range of sustained
vowel mean F0 for low- to high-pitch tasks, the impact on these
vibrato indexes might have simply been a product of increasing
fundamental frequency above this range. However, this was not
the case, as mean F0 was within the range that the singers produced during sustained vowel tasks for the pitch continuum.

Mean dB
The mean intensity of the vowel tokens was examined to determine whether changes in the rate, extent, and steadiness of
vibrato might be attributable to a difference in intensity as
opposed to the level of emotional arousal. In Figure 2, the
mean dB for the personal selection was higher than for all conditions except the high-pitch task, including the assigned song.
Thus at first glance, it would appear that the high dB levels associated with emotional expression in the personal selection may
have contributed to changes in the rate and extent of vibrato,
and thus it would be difficult to disentangle the effects of
emotional arousal and vocal loudness. However, a closer examination of the data in Figure 3 reveals that vibrato rate and
extent did not climb stepwise with loudness across the dB continuum, implying that emotional expression in vibrato may rely
on more than a simple increase in vocal intensity, although
highly emotional passages of singing tend to be performed
with a louder voice.
The general trend was for mean dB to follow mean F0: when
there was an increase in mean F0, there was also a comparable
increase in mean dB. This finding is consistent with the physiologic explanation that a higher subglottic pressure is needed to
overcome the increased resistance of stiffer vocal folds during
higher notes.30
Differences between sustained vowels and songs
The purpose of including sustained vowels at several levels of
pitch and loudness was to learn whether these fundamental
adjustments to laryngeal function would have consistent effects
on the rate, extent, and steadiness of vibrato. It was reasoned
that this knowledge would be important to give context to the
interpretation of any vibrato changes when singers sang more
emotionally involved passages of a song. In other words,
because emotional expression in singing can involve increases
in pitch and loudness,24 knowing the effects of these changes in
the absence of emotional engagement may help to isolate or at
least more clearly interpret the effects of emotional expression
on vibrato.
As previously mentioned during the discussion of the individual acoustic measures, there were few differences in vibrato
rate, extent, or steadiness between the two songs. This could be
interpreted to mean that although true emotions can and do
affect speech,20 the expression of emotion by a singer may
not have substantial effects on vibrato, at least in the context
of a recording session in a studio. This finding, however, does
not clarify whether singing while experiencing a genuine
emotion, along with its associated autonomic responses,16
might affect vibrato by means of increased muscle activation,21
especially in the presence of a responsive audience.
Most of the significant differences in the results were between the songs and the isolated vowels, and a number of factors may have contributed to this finding. The linguistic content
of the songs may have contributed a cognitive load to the task
that was not present for isolated vowels. The completion of linguistic tasks concurrently with sentence repetition has previously been reported to influence measures of articulatory
stability.31 It is possible that singing the words of a song likewise
affects vibrato because
neural resources are dedicated to language. Another possible
explanation is that the act of articulating words may alter the
activity of the larynx. Studies of laryngeal-articulatory
coupling32,33 have provided evidence that the vocal tract
subsystems are far from isolated in their function, and
that biomechanical and/or neural linkages may be responsible
for adjustments to one component leading to changes in
another.
Vocal modulation on the surface seems like a relatively simple phenomenon that is brought about by rhythmic adjustments
to the level of cricothyroid muscle activation.35 However, the
complexity of the control circuitry of the lungs and larynx
during phonation means that several sources of neural input
can influence the behavior of the lungs and vocal folds. These
include volitional adjustments to the expiratory muscles and
those that control the position, length, and tension of the folds,
and reflexive responses based on sensory signals from the upper
airway, and also the influence of the autonomic nervous system
in response to emotional arousal. In the act of singing with
emotion, either genuine or feigned, it would be anticipated
that a blend of signals from different components of the central
nervous system would influence the muscles that control phonation. Thus the factors influencing vibrato are complex, making
it difficult to interpret evidence of change in a straightforward
manner.
Limitations of the study and directions for future research
A number of assumptions were made in the design of the present study that limit the strength of the inferences we may
draw from the results. Foremost was the belief that when
singing a passage recognized as emotionally expressive, a
singer would experience autonomic nervous system activation
that would influence the physiology of singing. Previous
work that has examined respiratory behaviors during emotionally engaged singing19 and laryngeal responses to autonomic
nervous system activation21 would support the hypothesis that
for a listener to perceive emotion in a song, there must be features of the sound production that differentiate it from a more
emotionally neutral performance. However, because singers,
like actors, may be highly skilled at feigning emotions, it is
entirely possible that no autonomic changes occurred as singers
performed the personal selection in the present study, even if the
acoustics reflected a convincing portrayal of an emotion.20 This
may be one reason why the acoustic indexes of vibrato did not
differ between the two songs, although mean F0 and mean dB
were significantly higher for the personal selection.
Furthermore, the personal selection song was the only condition in the study under which the singers would be anticipated
to perform with intense emotional arousal. However, singing in
the recording studio as part of an experiment would only poorly
simulate the experience of performing before a large audience.
Thus, the personal selection may have been more representative
of a performance practice session, because the engagement
with a live audience was missing. A further concern about the
personal selection was that the length was greater than for the
assigned song, and the length also differed across singers
because each chose a different song. The length of the song
may have influenced the singers' capacity to maintain a given
intensity of emotional expression, and may thus have affected
the results.
One way to understand links between emotion and singing
more fully would be through an examination of the physiological changes in the singer while performing with emotion
during a live stage event. Relevant measures could include cardiac, electrodermal, or vascular measures such as cardiac interbeat intervals, skin conductance level, diastolic blood pressure,
and mean arterial pressure. These measures have been used in
previous studies to assess physiological changes in individuals
while they listen to emotional operatic music,17 and could
potentially be adapted to assess the emotional arousal of the
singer during performance. This type of study could give a
clearer understanding of whether singers genuinely experience
emotions during a performance, or whether they are instead
highly skilled at simulation, having practiced the emotional
song so many times that they need not experience the actual
emotion during performance to convincingly evoke it in the
audience. A recent study of vocal performance students
revealed that the psychological stress associated with a jury examination led to increases in heart rate, but there were divergent
effects on singing accuracy depending on the training level of
the singer.36
Because the vowels extracted for analysis in the more
emotionally expressive personal selection came from a
different song for each performer, vowel segment durations
were not controlled for during analysis. Previous work has
shown that FM extent can be lower for longer vowels,37 and
also that FM rate can increase toward the end of a longer
vowel.10 Because the FM rate and extent in the present study
were measured as the average across vowel segments of varying
duration, potentially important differences in vibrato rate and
extent for vowels of different length were missed.
The degree to which we can generalize from the present
study was limited because there were only 10 participants:
eight females and two males. In the future, a larger number of
participants with approximately equal representation of men
and women might yield results that are easier to interpret,
particularly with regard to any differences between males and
females in the influence of emotional arousal on vibrato. In
the present study it was difficult to directly compare the personal selections with other tasks because the personal selection
was different for each participant, although all other tasks were
completed in the same way for each singer. A possible solution
in future research would be to have all participants sing the
same high-emotion song.
In this study, high-emotion and neutral-emotion were the
only two categories used to describe the emotion in the singing
tasks. In future studies, the type of emotion could be further
examined in several ways, including comparisons of positive
and negative emotions, or specific emotions such as anger,
fear, pride, sadness, and so on. A more specific method of classifying emotion may lead to a clearer understanding of the
mechanisms by which emotions influence the singing voice,
particularly given Kreibig's report of different physiologic responses for specific emotional states.16
Conclusions
The purpose of this study was to learn whether the intensity of
emotion expressed by a singer during performance would influence the acoustic characteristics of their vibrato. In spite of the
limitations identified previously, the results not only link certain
aspects of vibrato to emotional expression, but also to singing in
performance as opposed to the production of isolated vowels.
Given the improvements in vibrato steadiness for the more
emotional passages, the importance of emotional engagement
in a performance appears worthy of further consideration in
the pursuit of vocal beauty.
Acknowledgments
We express our appreciation to the singers and accompanists
who participated in this study. We are also grateful for the financial support provided by the David O. McKay School of Education at Brigham Young University. This manuscript is based on
the master's thesis research of the second author.
