
SPEAKING FAST AND SLOW: HOW SPEECH RATE OF DIGITAL ASSISTANTS
AFFECTS LIKELIHOOD TO USE
by
BRETT ALAN CHRISTENSON
CHRISTINE RINGLER, COMMITTEE CHAIR

NANCY J. SIRIANNI, COMMITTEE CO-CHAIR
ARTHUR W. ALLAWAY
KRISTY E. REYNOLDS
PETER D. HARMS
A DISSERTATION
Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy
in the Department of Marketing
in the Graduate School of
The University of Alabama
TUSCALOOSA, ALABAMA
2020
Copyright Brett Alan Christenson 2020
ALL RIGHTS RESERVED
ABSTRACT
Digital assistants like Siri and Alexa are adaptable service robots which interact vocally
to deliver services to consumers. During interactions, these digital assistants provide a unique
opportunity for marketers to convey social and emotional information by altering qualities of
their voices, such as speech rate. However, as brands begin to adapt their digital assistant voices,
they have little research to guide them in creating positive consumer responses and avoiding
negative ones. Across seven experiments, this research uncovers and explains how digital assistant
speech rate affects consumer emotions and likelihood to use. Nervousness and risk mediate the
process, while interaction style and personal differences moderate the effects. Experiment 1 shows
that speech rates which are significantly faster or slower than a moderate rate can lower
likelihood to use a digital assistant. Experiments 2A-B then uncover the process behind this effect
and show that feelings of nervousness, as well as perceptions of risk, mediate the relationship
between speech rate and usage intentions. Experiments 3A-B provide managers with applicable
moderators for the effects, while experiments 4A-B provide a complete moderated mediation
model. This work contributes to the sensory marketing literature focused on sound and its impact
upon consumer perceptions and behaviors.
Keywords: sound, voice, sensory marketing, digital assistant, human-computer interaction
DEDICATION
To Tessa, Puma, Paprika, Nala, and Stellan.
LIST OF ABBREVIATIONS AND SYMBOLS
α Coefficient Alpha
b Coefficient Beta
CI Confidence Interval
d Cohen’s d
DA Digital Assistant
DV Dependent Variable
F F-Statistic
HCI Human-Computer Interaction
IV Independent Variable
N Sample Size
η²p Partial Eta Squared
M Mean
% Percentage Sign
PANAS Positive and Negative Affect Schedule
p p-Value
SD Standard Deviation
SE Standard Error
t T-statistic
ACKNOWLEDGEMENTS
Many people helped make this dissertation possible, so these acknowledgements should
not be considered an exhaustive list. First and foremost, I recognize and thank my co-chairs,
Christine Ringler and Nancy Sirianni, who have had a significant impact on my development as
an academic in many ways. They have both been formative in shaping my approach to scholarly
work as well as my career, and this project would not have been possible without their guidance. I
would like to specifically thank Christine for holding me to a high standard throughout the
program. This motivated me to keep working harder and to keep thinking about my projects and
how to improve them, which has made my work much better. I’d also like to specifically thank
Nancy for broadening my interests to other research areas as well as giving me guidance on some
of the unwritten parts of PhDing. Both of you have guided me through this process, and I’m
thankful for it.
Second, I appreciate my committee members: (1) Kristy Reynolds, for being supportive
and open to me asking questions on short notice as well as working with me on other projects,
(2) Buster Allaway, for reminding me from time to time that work can be fun and that humor goes a
long way toward reducing stress, and (3) Peter Harms, who has unknowingly served as my personal
inukshuk for this and many other projects.
Third, I thank the scholars currently conducting research on this topic for their dedication to
the areas of sensory marketing, computer interactions, and consumer behavior, as well as their
willingness to lay the groundwork upon which this project attempts to build.
Fourth, I want to thank my parents, Debbie and Max, who set the example and showed
me the value of what an education can provide for someone and who went out of their way to
ensure I had the best one possible. You’re the ones who started me on this path at Reagan, and
I’m thankful every day for it. Look at where it led!
Last and never least, I want to acknowledge that none of this would be possible without
my wife, Tessa. You’re the smartest, most well-adjusted person I know and I wouldn’t have been
able to do this without you. There aren’t enough pages available to really say how much you
inspire who I am and what I do.
CONTENTS
ABSTRACT
DEDICATION
LIST OF ABBREVIATIONS AND SYMBOLS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
THE MOVE TO VOICE
THEORETICAL FRAMEWORK
OVERVIEW OF STUDIES
EXPERIMENT 1
EXPERIMENT 2A
EXPERIMENT 2B
EXPERIMENT 3A
EXPERIMENT 3B
EXPERIMENT 4A
EXPERIMENT 4B
GENERAL DISCUSSION
DATA COLLECTION INFORMATION
REFERENCES
APPENDIX A: SOUND STIMULI USED IN STUDIES
APPENDIX B: SCRIPTS FOR EXPERIMENTS
APPENDIX C: MEASURES FOR STUDIES
APPENDIX D: MEANS AND SD’S FOR PANAS MEASURES IN STUDY 2A
APPENDIX E: INSTITUTIONAL REVIEW BOARD APPROVAL LETTER
APPENDIX F: TABLES
APPENDIX G: FIGURES
APPENDIX H: HEADINGS LIST
LIST OF TABLES
1. PROCESS Measures for Studies 2A-B
2. PROCESS Measures for Studies 4A-B
LIST OF FIGURES
1. Conceptual Overview of All Studies
2. Experiment 2B – Serial Mediation of Speech Rates on Likelihood to Use DA
3. Experiment 3A – Johnson-Neyman Graph of Moderation of Likelihood to Use DA by Susceptibility to Others’ Emotions
4. Experiment 3B – Likelihood to Use DA Across Interaction Style and Speech Rate
5. Experiment 4A – Likelihood to Use DA Across Interaction Style and Speech Rate
6. Experiment 4B – Likelihood to Use DA Across Interaction Style and Speech Rate
THE MOVE TO VOICE
Over the last few years, digital assistants have increasingly become a central point of
communication between consumers and their connected devices. Digital assistant (DA) use is set
to triple over the next few years, with projections of 1.8 billion users by 2021 and over 8 billion
DA-enabled devices by 2023 (Perez 2019; Yakuel 2018). Tech companies have taken notice and
are betting the future of personal computing will be driven by vocal interaction with these
assistants, which integrate multiple devices together into a connected ecosystem (Routley 2019).
These ecosystems allow a consumer to be connected to their assistant no matter where they go,
and encourage multiple device purchases from a single brand because integration across brands
is not an option. For many of these companies, the first step in attracting consumers into their
ecosystem is adoption of their smartphones and speakers, which are enabled with their branded
DA. The smart speaker market alone is expected to be over $23 billion by the end of 2025,
indicating a large bottom-line impact for brands that invest in attracting consumers early into
their own DA ecosystem rather than a competitor’s (Kumar and Rasal 2018). For these brands, DAs
are a tool that can be used to provide social value to consumers through vocal interaction.
However, little research has been done in the field of marketing to address how the audible cue
of a DA’s voice impacts consumer adoption of these assistants.
In an effort to provide more social value to consumers and increase DA adoption, brands
have begun to adapt their DA voices for major release. Google offers customizable add-ons for
its DA, like the voice of a favorite celebrity (Kraus 2019), and Apple continually updates the
voice of Siri to more closely resemble the speech of a human agent (Pierce 2017). In the summer
of 2019, Amazon updated Alexa to be more adaptable to consumer preferences in terms of the
speed at which she speaks (Amazon 2019). Consumers using Alexa-enabled devices can now
choose from seven speeds: Alexa's standard speaking rate, four faster speaking rates, and two
slower speaking rates. This newest feature is expected to create more social interactions because
it is more realistic and similar to how a human speaks. However, it also raises two questions: what
effect does varying speech rate have on consumers, and how does theory explain these effects?
These are some of the questions this research seeks to answer.
While consumer industry leaders are already adapting their DA’s voice, research to
support these updates is scant and has either been limited to areas other than marketing or has
focused on broadcast advertising like TV and radio. Therefore, the presented studies seek to
contribute to the marketing literature related to sounds, specifically the sound of a voice, by
integrating theory from communications with consumer behavior research. More specifically,
this work investigates how the speech rate of a DA affects consumers and can lower their
likelihood to continue using it. Without knowledge of how a DA’s voice impacts a consumer’s
likelihood to use it, marketers’ updates may be hurting rather than helping their efforts to gain
market share in the early stages of DA growth. Therefore, this work is important to both the
marketing literature and industry.
THEORETICAL FRAMEWORK
In the area of sensory marketing, research on sound shows that meaning in verbal
communication is delivered not only in what we say, but how we say it (Peterson, Cannito, and
Brown 1995). When we speak, what we say is classified as the linguistic content of a message
while the way we speak is classified as paralinguistic content (Apple, Streeter, and Krauss 1979).
Paralinguistic aspects of our speech are made up of four basic adjustable elements: volume,
pitch, timbre, and speech rate.
Elements of Speech
The volume of a voice is indicated by amplitude and is perceived as the loudness or
softness of a sound (Bruner 1990; Krishna 2013). There is evidence to show that the volume of
background sound in restaurants impacts expenditures on food and beverages (Sullivan 2002).
The pitch of a voice corresponds to the frequency of its sound wave and is often perceived on a
spectrum from low to high (Krishna 2013; Lowe, Ringler, and Haws 2018). Manipulations of
pitch have been shown to impact perceptions of product size (Lowe and Haws 2017). The third
property of a voice is its timbre, or the harmonics of the vocal sound wave (Bruner 1990).
Timbre allows a person to discern between two speakers who have identical volume and pitch,
but differ in terms of their harmonic texture. Research on timbre indicates positive effects of
certain timbres on affective responses and recall of advertisements (Oakes and North 2006).
Speech Rate
Of particular importance to the presented experiments, the fourth paralinguistic aspect of
a voice is its speech rate, or the tempo at which a person speaks. A common practice in broadcast
advertising is to speed up, or time compress, advertisements so they fit into a specified time slot
on air. When an ad is time compressed, the audio elements are sped up, creating the effect of
faster speech rate. One of the most well-known effects of speech rate was shown by
Chattopadhyay et al. (2003), who indicated speech rate can interact with pitch. Their results
show a voice with faster-than-normal syllable speed and low pitch can produce more favorable
ad and brand attitudes. Their results for speech rate, however, are conflated with other
paralinguistic qualities, making the inference of effects for only speech rate harder to discern. In
earlier work, MacLachlan and Siegel (1980) produce evidence that participants have better recall
of brands and often prefer content that is sped up. These authors test their hypotheses using a
single study with TV commercials which conflate the senses of vision and audition, making
inferences about speech rate difficult. Moore, Hausknecht, and Thamodaran (1986) dispute the
cognitive recall findings of MacLachlan and Siegel (1980) using manipulations of audio only.
They argue faster commercials reduce a consumer’s time to elaborate on ad information and lead
to mixed results for speech effects. They conclude that when someone hears a fast speech rate,
they use it as a peripheral cue for processing difficulty. Therefore, the focus is less on what is
being said and more on how it is being said. The limited work examining slower speech rates
shows a similar effect to that of faster speech, but is also conflated with other paralinguistic
qualities. For instance, Benki et al. (2011) investigate phone interviewers’ voices and their
impact upon a respondent’s choice to agree, refuse, or defer in taking a survey. Their results
show slower speakers elicit the lowest amount of respondent participation. This single
experiment, however, conflates speech rate with pitch as well as pausing. More recent work by
Charoenruk and Olson (2018) posits listeners find it difficult to hold relevant information about
complex topics in working memory if a speaker talks at a slow rate, producing lower recall.
While this short review provides a basis for the current studies, it also indicates many
opportunities for contribution. Marketing research has given very little attention to vocal
sounds over the past decade, with much of the aforementioned research highlighting only initial
findings (Dahl 2010). Therefore, the current studies seek to expand the sensory marketing
literature on the auditory cue of a voice by investigating the process through which it affects
consumers. More specifically, they examine how the speech rate of a DA’s voice affects consumers’
negative reactions and their likelihood to use it. These effects are explained by social response
theory, discussed next.
Social Response
The communication theory of social response explains human interactions with
computers, technology, and new media (Reeves and Nass 1996). According to the theory, people
tend to interact with computers as if they are social actors, even when they are aware the
machine does not actually possess feelings or motivations (Moon 2000; Nass and Moon 2000;
Nass, Moon, and Green 1997). When people interact with technology exhibiting human-like
characteristics, the response is reflexive and occurs without substantial deliberation (Reeves and
Nass 1996). This phenomenon aligns with research showing people make social attributions and
responses “mindlessly” (Langer 1989; Nass and Moon 2000) and use heuristics to simplify
extensive processing of information (Chaiken 1980; Eagly and Chaiken 1993). When using
computers equipped with voice output, participants have been shown to psychologically orient
themselves towards the voice, imbuing it with distinct personality types (Moon and Nass 1996,
1998; Nass et al. 1995; Nass, Moon, and Green 1997). In effect, human social responses to
objects with humanlike characteristics, like speech, are influenced by their interactions with the
object (Aggarwal and McGill 2007; Epley, Waytz, and Cacioppo 2007). Given this, the voice of
a technology is a social cue that is salient and likely to become the relational target of a social
response. Digital assistants therefore present an ideal context in which to investigate the effects
of the speech rate cue upon social responses, both positive and negative.
Based on social response theory, because vocalization is an evolved human behavior,
DAs which talk to us should be imbued with human traits and consumer responses should be
similar to interactions with other humans. Indeed, some modern conversational interfaces are
now able to interact with us in very humanlike ways when they are combined with artificial
intelligence technologies (McTear, Callejas, and Griol 2016; Ruijten, Terken, and Chandramouli
2018). While a majority of existing self-service technologies, such as ATMs, lack the
capacity to engage consumers socially, DA technologies using voice are uniquely capable of
engaging in meaningful social encounters with humans. The theory of social response provides
an explanatory framework for why these social interactions occur between humans and DAs.
Given our interactions with computers can be similar to those with other people, prior
investigations of emotional and social responses between humans should help identify potential
processes of the effects between humans and robots. Since DAs interact vocally, research on
sounds and how they impact our emotions serves as a fruitful starting point.
Responses to Speech
In a review of the literature on human speech, Murray and Arnott (1993) show that
paralinguistic vocal qualities are associated with basic emotions. Their review reinforced an
earlier proposition by Darwin (1872) that the voice is a sophisticated tool used to indicate an
individual's emotional state. When we communicate verbally, we disclose information about our
biological, psychological, or social status through vocal variations (Kraus 2017; Schwartz and
Pell 2012). For example, high-pitched voices are judged as less truthful, less emphatic, and more
nervous (Apple, Streeter, and Krauss 1979). Additionally, voices which have fast or slow speech
rates are perceived as less benevolent (Smith et al. 1975) and overall less liked (Benki et al.
2011; Murray and Arnott 1993). These studies, however, are limited to measuring consumer
attributions about the person speaking, leaving the effects of a voice upon the listener
unmeasured, a gap addressed in the presented studies.
Overall, the literature indicates that when we hear a sound which is dissonant or
unpleasant, we generally have negative reactions. Social response theory characterizes these
reactions as physiological as well as emotional. When interactions produce positive affective
responses, they facilitate approach behaviors, while negative affective responses inhibit these
behaviors (Fowles 1994). Given this, a consumer who hears a sound that is not pleasing should be
less likely to want to hear it again. This means consumers who interact with a DA that speaks in
a way that is not pleasing, such as too fast or too slow, should be less likely to want to use
it. Stated more formally:
H1: Consumers who use a DA that speaks at a fast or slow rate will be less likely to
want to use it than consumers who use a DA that speaks at a moderate rate.
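H1 predicts an inverted-U pattern: likelihood to use should be lower at both extreme rates than at the moderate rate. One conventional way to test such a prediction in a one-way, three-group design is a planned contrast with weights (1, -2, 1) across the slow, moderate, and fast conditions. The Python sketch below is illustrative only; it assumes a pooled within-group error term (equal variances), the function name `planned_contrast` is hypothetical, and the scores in the usage example are invented rather than data from these experiments.

```python
from math import sqrt
from statistics import fmean, variance


def planned_contrast(groups, weights):
    """Planned contrast for a one-way between-subjects design.

    groups  -- list of score lists, one per condition
    weights -- contrast weights, one per condition, summing to zero
    Returns (psi, t, df): the weighted sum of group means, its t statistic
    based on the pooled within-group mean square error, and the error df.
    """
    if len(groups) != len(weights):
        raise ValueError("need one weight per group")
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    ns = [len(g) for g in groups]
    df = sum(ns) - len(groups)
    # Pooled within-group variance (the ANOVA mean square error)
    mse = sum((n - 1) * variance(g) for n, g in zip(ns, groups)) / df
    psi = sum(w * fmean(g) for w, g in zip(weights, groups))
    se = sqrt(mse * sum(w ** 2 / n for w, n in zip(weights, ns)))
    return psi, psi / se, df


# Invented 7-point likelihood-to-use scores (illustration only)
slow, moderate, fast = [3, 4, 5], [5, 6, 7], [3, 4, 5]
psi, t, df = planned_contrast([slow, moderate, fast], [1, -2, 1])
print(psi, round(t, 3), df)  # -4.0 -2.828 6
```

A negative psi means the two extreme rates score lower on average than the moderate rate, which is the direction H1 predicts.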
Building on the main effect hypothesis, a logical next question is which mechanism explains
the effect. A plausible candidate would be a negatively valenced and highly activated
state, such as nervousness. A common theme in communication research links speech rate with
specific emotional expressions, like speaker nervousness (Laukka et al. 2008; Siegman and
Boyle 1993). Nervousness in a speaker’s voice is indicated by irregular variations in
paralinguistic qualities and is easily perceived by the person who hears it.
Nervousness. The circumplex model of affect describes emotions in terms of two
orthogonal dimensions of valence and activation (Barrett and Russell 1999; Russell 1980).
Nervousness is defined as a highly activated, negative-valenced state associated with feelings of
distress and being jittery (Watson, Clark, and Tellegen 1988). This state matches the description
given by Burke et al. (1989) and is characterized by avoidant behavior (Lowe, Loveland, and
Krishna 2019). Building on the earlier proposition that positive affect facilitates approach
behavior whereas negative affect inhibits similar behavior, nervousness becomes a prime
candidate for the mechanism of the effects in hypothesis 1. Nervousness is also a prime
candidate because it differs from other highly activated and negatively valenced states. For
example, unlike fear, it does not include the perception of physical harm (Tanner, Hunt, and
Eppright 1991), and unlike anger, it does not facilitate approach behaviors (Watson, Clark,
and Tellegen 1988). Given this, two hypotheses follow:
H2: A DA which speaks at a fast or slow rate will produce greater nervousness in the
consumer who hears it compared to when it speaks at a moderate rate.
H3: Feelings of consumer nervousness mediate the relationship between DA speech
rate and consumer likelihood to want to use it, with increases in consumer
nervousness lowering their likelihood to use the digital assistant.
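H3 describes a simple mediation model: speech rate (X) affects nervousness (M), which in turn affects likelihood to use (Y). A common way to estimate such a model is to take the indirect effect as the product a × b of the X→M path and the M→Y path (controlling for X), with a percentile bootstrap confidence interval, as in Hayes’s PROCESS macro. The Python sketch below is a minimal illustration under stated assumptions, not the analysis code used in these experiments: speech rate is collapsed into a single hypothetical dummy (1 = fast or slow, 0 = moderate), the data are simulated with made-up path values, and the helper names are hypothetical. A fuller analysis of three speech rates would use indicator coding for the conditions.

```python
import random
from statistics import fmean


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Return (a, b, a*b) for the models M ~ X and Y ~ X + M."""
    vx, vm, cxm = cov(x, x), cov(m, m), cov(x, m)
    cxy, cmy = cov(x, y), cov(m, y)
    a = cxm / vx  # slope of M on X
    # Partial slope of M when Y is regressed on both X and M
    b = (cmy * vx - cxy * cxm) / (vx * vm - cxm ** 2)
    return a, b, a * b


def bootstrap_ci(x, m, y, reps=1000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        draws.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx])[2])
    draws.sort()
    k = int(reps * alpha / 2)  # number of draws trimmed from each tail
    return draws[k], draws[reps - k - 1]


# Simulated (hypothetical) data: x = 1 for fast or slow speech, 0 for moderate
rng = random.Random(42)
x = [1 if rng.random() < 2 / 3 else 0 for _ in range(400)]
nervous = [1.0 * xi + rng.gauss(0, 1) for xi in x]     # assumed a = 1.0
use = [-0.8 * mi + rng.gauss(0, 1) for mi in nervous]  # assumed b = -0.8
a, b, ab = indirect_effect(x, nervous, use)
ci = bootstrap_ci(x, nervous, use)
print(f"a = {a:.2f}, b = {b:.2f}, indirect = {ab:.2f}, "
      f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Under H3, the indirect effect should be negative with a confidence interval that excludes zero, as it does for these simulated data.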
While understanding how a voice can lead to lower usage intentions is novel, a larger
contribution can be made by providing evidence for the deeper process underlying nervousness
effects. Research on both emotional processing and decision making identifies perceptions of risk
as a viable candidate for eliciting feelings of nervousness (Carpentier et al. 2017). For
example, prospect theory holds that in the face of gains, nervous consumers prefer a
guaranteed option over a risky one (Kahneman and Tversky 2013). Bias towards these decisions
was originally posited as being an aversion to negative outcomes. More recent work indicates
that nervousness stems more from consumer uncertainty about their decisions, defined as
perceptions of risk, and leads to different decisions in nervous individuals (Carpentier et al.
2017). The authors build upon prior work associating risk and nervousness and lay the
groundwork for future exploration (Giorgetta et al. 2012; Maner et al. 2007). However, this work
only goes as far as hypothesizing the association between risk and nervousness rather than
testing a causal order of effects, leaving a theoretical gap the current studies seek to address.
In consumer behavior research, it is widely accepted that perceptions of risk are relevant
for adoption (Cherry and Fraedrich 2002; Gatignon and Robertson 1985). Risk perceptions have
been shown to impact attitudes and behaviors towards e-business functions (de Ruyter, Wetzels,
and Kleijnen 2001), usage of self-checkouts in grocery stores (Anselmsson 2001), and aversion
to ordering prescriptions over the phone (Meuter et al. 2005). Despite the growth in service
technology, gaining consumer acceptance remains a challenge for marketers (Paluch and
Wünderlich 2016). The role risk plays in eliciting nervousness has yet to be explored but is
suggested by this literature. Therefore, an additional hypothesis is tested in some
of the following experiments:
H4: Speech rate negatively influences likelihood to want to use a DA through a serial
mediation process of risk and nervousness. Fast and slow speech rates increase
risk perceptions, increasing consumer nervousness and lowering usage intentions.
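H4 extends the model to a serial chain, speech rate → risk → nervousness → likelihood to use, which corresponds to what Hayes’s PROCESS macro calls Model 6. The specific serial indirect effect is the product a1 × d21 × b2 of three regression paths. The Python sketch below again assumes a hypothetical 0/1 speech-rate dummy and simulated data with made-up path values; it computes only the point estimate, whose uncertainty would normally be assessed with a bootstrap confidence interval. The `ols` and `serial_indirect` helpers are hypothetical and written in plain Python to stay dependency-free.

```python
import random


def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ predictors.

    Solves the normal equations (X'X) beta = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for the few predictors used here.
    """
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def serial_indirect(x, m1, m2, y):
    """Serial indirect effect a1 * d21 * b2 for X -> M1 -> M2 -> Y."""
    a1 = ols(m1, x)[1]         # X -> M1 (risk)
    d21 = ols(m2, x, m1)[2]    # M1 -> M2 (nervousness), controlling for X
    b2 = ols(y, x, m1, m2)[3]  # M2 -> Y (likelihood to use), controlling for X and M1
    return a1 * d21 * b2


# Simulated (hypothetical) data with assumed paths a1 = 1.0, d21 = 0.7, b2 = -0.6
rng = random.Random(7)
x = [1 if rng.random() < 2 / 3 else 0 for _ in range(500)]
risk = [xi + rng.gauss(0, 1) for xi in x]
nervous = [0.7 * ri + rng.gauss(0, 1) for ri in risk]
use = [-0.6 * ni + rng.gauss(0, 1) for ni in nervous]
print(round(serial_indirect(x, risk, nervous, use), 2))
```

With these assumed paths, the true serial effect is 1.0 × 0.7 × -0.6 = -0.42, so the printed estimate should land near that value; a negative serial effect is the pattern H4 predicts.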
With marketers already updating the voices of their branded DAs, providing a moderator
which can be used to avoid unwanted outcomes would be immediately valuable. Given the
market is already being filled with DAs which can alter their speech rate, how could an
interaction with a DA be created which delivers information at a faster or slower speed while
also not causing negative reactions?
Interaction Style. Research on interaction styles in the social and behavioral sciences
classifies interactions as being either monological or dialogical. In monological interactions, one
person speaks for a time and another person listens. This style is characterized by less participant
engagement, or processing of the message (O’Connor and Michaels 2007). Less processing of
the central message suggests peripheral cues should be more salient. This occurs because the
speaker is not focused on the audience’s needs and only commands, coerces, and manipulates
(Johannesen 1996); as a result, the audience is less involved. In contrast, a dialogic, or
back-and-forth, interaction is characterized by greater audience involvement and processing of the
content of a message. As someone becomes more involved, their focus is directed towards the
content communicated, or the words being spoken (Nesari 2015). During dialogical interactions,
what is being spoken is more salient than how it is being spoken.
Since DAs are adaptable, the interaction style they use with us can be changed depending
on the context. If a marketer wishes to create a DA which alters its speech rate, it should also be
designed to interact in such a way that avoids potential negative reactions. The discussion of
monologic versus dialogic interaction styles, and their differing influence upon which aspects of
speech are most salient, falls in line with the work reviewed earlier indicating a voice can draw
attention to either what is being said or how it is being said. This leads to the hypothesis that the
effects of speech rate, a peripheral cue, should be moderated by interaction style, an important
factor in whether someone is focused on the content or delivery of a message. More formally:
H5: The relationship between DA speech rate and likelihood to use it is moderated by
interaction style, with decreases in usage likelihood only occurring during
monologic interactions.
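H5 is a moderation hypothesis. If speech rate is collapsed into a hypothetical extreme (fast or slow) versus moderate dummy, and interaction style is coded monologic versus dialogic, the interaction coefficient in a dummy-coded regression equals a difference of differences among the four cell means. The Python sketch below computes those simple effects directly; the function name is hypothetical, and the scores in the usage example are invented purely to illustrate the pattern H5 predicts, not results from these experiments.

```python
from statistics import fmean


def simple_effects(rows):
    """Speech-rate effects within each interaction style.

    rows -- iterable of (extreme, monologic, likelihood) tuples, where
            extreme = 1 for fast/slow speech, 0 for moderate, and
            monologic = 1 for monologic, 0 for dialogic interaction.
    Returns (effect_monologic, effect_dialogic, interaction_contrast).
    """
    cells = {}
    for extreme, monologic, y in rows:
        cells.setdefault((extreme, monologic), []).append(y)
    mean = {cell: fmean(ys) for cell, ys in cells.items()}
    effect_mono = mean[(1, 1)] - mean[(0, 1)]
    effect_dia = mean[(1, 0)] - mean[(0, 0)]
    # Equals the X*W coefficient in y ~ X + W + X*W with 0/1 coding
    return effect_mono, effect_dia, effect_mono - effect_dia


# Invented scores showing the H5 pattern (illustration only)
rows = [(0, 1, 6), (0, 1, 5), (1, 1, 3), (1, 1, 4),
        (0, 0, 5), (0, 0, 6), (1, 0, 5), (1, 0, 6)]
print(simple_effects(rows))  # (-2.0, 0.0, -2.0)
```

In this invented example, extreme speech rates lower likelihood to use only in the monologic cells, which is the shape of the interaction H5 describes.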
Based upon the discussion showing that elements of sound can cause negative emotional
reactions in consumers (Lowe, Loveland, and Krishna 2019; Lowe, Ringler, and Haws 2018), the
current work seeks to expand our knowledge of how and why this occurs. While speech rate has
been linked to negative reactions like speaker nervousness (Apple, Streeter, and Krauss 1979),
investigations in this area are limited to dichotomous manipulations of speech rate, either
slow or fast (MacLachlan and Siegel 1980; Murray and Arnott 1993). This work is also usually
limited to general measures of affective states, such as positive and negative affect. Therefore,
investigating a single paralinguistic quality and its role in interactions between humans and DAs
is both novel and important for researchers in sensory marketing and human-computer interaction.
It sheds light on specific emotions consumers feel when they hear a voice, beyond general positive
or negative states, and it provides a theoretical process for the effects.
OVERVIEW OF STUDIES
Seven experiments are presented which test five hypotheses. Experiment 1 begins by
showing the main hypothesized effect, that vocal speech rate has a differential effect upon
consumer likelihood to use a digital assistant. Following this, experiments 2A-B investigate the
process of the effects shown. Using two scenarios, these experiments show a negative affective
state, nervousness, mediates the effect between speech rate and likelihood to use the DA. These
experiments also show that self-service technology risk is the mechanism for nervousness and
rule out several alternative explanations. Experiments 3A-B provide managers with two
moderators of the effect, one being personal differences and the other being interaction style.
Lastly, experiments 4A-B show a full moderated mediation model of effects. A conceptual
model of all experiments is provided in figure 1.
-----------------------------------
Insert figure 1 about here
-----------------------------------
EXPERIMENT 1
Experiment 1 examines differences in consumer reactions to digital assistant (DA) voice
types that are either slow, moderate, or fast in their speech rate. These differences are explored
using a scenario-based experiment in which participants use a DA to create a personal budget.
Following from the earlier conceptualization, both faster and slower digital assistant speech rates
are expected to lower a consumer’s likelihood to use the DA. When the DA speaks at a moderate
rate, this decrease should not occur.
Stimuli and Pretests
Female voices were used for the digital assistant based upon a pilot study with 107
participants (Mage = 34.74, SD = 9.92, 59.9% male) on Amazon Mechanical Turk (MTurk). Pilot
study participants were asked to provide information on the smart devices they owned by
indicating if they had a smart phone, tablet, computer or smart speaker. Of these participants,
96% owned a smart phone, 77% owned a laptop or desktop, 51% owned a tablet and 36% owned
a smart speaker. Next, participants were asked to indicate if any of their devices were equipped
with a digital assistant, with 89% of respondents answering “yes”. Following this, participants
were asked how often they use their digital assistant, with 39% saying they used their assistant at
least once in the previous week, and 74% indicating they had used their assistant at least once in
the last month. Important for the current studies, the results indicated 85% of participants had a
device that came preprogrammed with a female voice. An
additional interesting finding was that 93% of participants indicated that, while they had the
option to switch the voice of their assistant, they had not done so.
To ensure that speech rate was successfully operationalized for all studies, audio
recordings of the DA voice were pretested with a second, separate MTurk panel of 100
participants (Mage = 43.87, SD = 13.97, 54.2% male). These participants were played a recording
of the digital assistant voice, being either slow, moderate, or fast speaking, and then asked to rate
their perceptions of the voice using 7-point Likert scales on the paralinguistic qualities of Speed
(1 = Slow; 7 = Fast), Pitch (1 = Low Pitched; 7 = High Pitched), Volume (1 = Quiet; 7 = Loud),
and Timbre (1 = Smooth; 7 = Rough). There were no significant differences in listeners’
perceptions of the voices in terms of volume, pitch, or timbre (Fs < 1). Perceived speed, however,
differed significantly between the slow, moderate, and fast voices (F(2, 97) = 4.32, p < .001,
ηp² = .22). Post hoc comparisons showed that the slow voice (Mslow = 3.18, SD = .92) was
perceived as significantly slower than the moderate voice (Mmoderate = 3.78, SD = .78, t(58) =
3.22, p = .003, d = .434). The fast voice (Mfast = 4.56, SD = .99) was correctly perceived as
significantly faster than both the moderate voice (t(58) = 4.11, p < .001, d = .603) and the slow
voice (t(58) = 6.25, p < .001, d = .887). It is
important to note that all stimuli were created with the help of a professionally trained audio
technician to ensure they were equal in terms of pitch, volume, and timbre, with manipulation of
speech rate being the only change.
Method
Participants and Design. Experiment 1 utilized 766 participants (Mage = 20.25, SD =
1.01, 49.9% male) in a controlled lab environment at a large public university in the U.S.
Participants were compensated with class participation credit. The voice of the digital assistant
(DA) was manipulated to create a one-way design with 3 treatments (voice speech rate: slow,
moderate, fast). Participants were told they would be interacting with a new DA built to help
create a personal budget. Speech rate of the DA’s voice was operationalized by recording a
human female and then manipulating the sound file to create audio recordings for the voice
types. Using Ableton Live 10 software, the recording was first adjusted to a moderate speech
rate, similar to the rate used by Lane and Grosjean (1973), with a syllabic speed of five syllables
per second and interphrase pauses of half a second. Next, the recording was adjusted to create
slow and fast versions of the voice, 20% slower and 20% faster than the moderate version,
respectively.
Vocal stimuli from all studies are provided in appendix A.
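The rate manipulation can be expressed as simple arithmetic. The Python sketch below assumes that "20% slower" and "20% faster" mean scaling the syllable rate by 0.8 and 1.2, and that pauses are time-stretched along with the rest of the recording; the function name `speech_rate_targets` is hypothetical and does not describe the Ableton Live 10 workflow actually used.

```python
"""Hypothetical sketch of speech-rate targets for the three voice conditions.

Assumptions (not stated in the text): "20% slower/faster" scales the syllable
rate by 0.8 / 1.2, and pauses are time-stretched uniformly with the speech.
"""

MODERATE_RATE = 5.0    # syllables per second (moderate condition, from the text)
MODERATE_PAUSE = 0.5   # seconds between phrases (moderate condition, from the text)
RATE_CHANGE = 0.20     # 20% change in each direction


def speech_rate_targets(base_rate: float, base_pause: float, change: float) -> dict:
    """Return syllable rate, pause length, and duration stretch factor per condition."""
    targets = {}
    for label, rate_factor in (("slow", 1 - change), ("moderate", 1.0), ("fast", 1 + change)):
        duration_factor = 1.0 / rate_factor  # >1 lengthens the clip, <1 shortens it
        targets[label] = {
            "syllables_per_sec": base_rate * rate_factor,
            "pause_sec": base_pause * duration_factor,
            "duration_factor": duration_factor,
        }
    return targets


if __name__ == "__main__":
    for label, t in speech_rate_targets(MODERATE_RATE, MODERATE_PAUSE, RATE_CHANGE).items():
        print(f"{label:<9}{t['syllables_per_sec']:.1f} syll/s  "
              f"pause {t['pause_sec']:.3f} s  duration x{t['duration_factor']:.3f}")
```

Under these assumptions the slow voice targets 4 syllables per second and the fast voice 6. If "20%" instead referred to clip duration (factors of 1.2 and 0.8), the resulting rates would be about 4.17 and 6.25 syllables per second.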
Procedure. After agreeing to participate, participants were asked to indicate whether they had
used a DA before and to report their usage rates and preferences. They were then asked to identify
a test sound to ensure the provided lab headphones were working and to pass an attention
check. Following this, participants were given an introduction to the scenario, which indicated
they would be interacting with a new DA built to help create a personal budget. Participants were
randomly assigned to one of the three treatment conditions, hearing either a slow, moderate, or
fast speaking DA. To control for potential effects of volume (Garlin and Owen 2006; Kellaris
and Altsech 1992), the output volume for all of the files was set to an identical amplification and
participants were asked not to change the volume. Additionally, lab assistants checked the
volume level of all participant computers after each lab session to ensure no changes were made.
There were 13 participants who did not correctly identify a test sound and were removed, leaving
753 for the analysis.
While participants heard different speeds of speech, the content of the message was
identical and consisted of the participant listening to an audio clip of the digital assistant
speaking. In the social and behavioral sciences, this style of interaction is labeled monological, or
one-way (Asterhan and Schwarz 2007). In monological interactions, one person speaks
for a time, and another person listens. This interaction style is characterized by less participant
engagement with, or processing of, the message compared to a dialogic, or back-and-forth, style
(Nesari 2015). Because the purpose of experiment 1 is to show the main effect of the peripheral cue
of speech rate, a monological interaction style is used, as it should encourage less processing of the
central message and make peripheral cues more salient. Participants were first greeted by the DA
and then listened as the voice provided information on how and why to create a personal budget.
The entire scenario was one-directional, or monological, with the digital assistant speaking while
the participants listened. After hearing the DA speak, participants filled out measures for the
variables of interest as well as demographic information. The script for this scenario as well as
all other studies is provided in appendix B.
Measures. The dependent variable of likelihood to use the DA was measured using a
single item 7-point Likert scale (“How likely would you be to use this assistant to create a
personal budget?”; 1 = Not at all likely; 7 = Very likely). Following this measure, participants
completed single-item manipulation checks for the audio recordings, taken from Martín-Santana
et al. (2015) (“This voice was 1 = Slow; 7 = Fast”; “This voice was 1 = Quiet; 7 = Loud”; “This
voice was 1 = Low Pitched; 7 = High Pitched”). Participants finished by indicating their gender
and age. Details of measures are provided in appendix C.
Results and Discussion
Speech Rate. The manipulation of speech rate was successful, with participants indicating
significant differences between the slow, moderate, and fast treatments (F(2, 751) = 155.84, p <
.001, ηp² = .29). Participants heard the slow voice (Mslow = 3.18, SD = .98) as significantly
slower than the moderate voice (Mmoderate = 3.78, SD = .88, t(501) = 7.19, p < .001, d = .642).
Participants also heard the fast voice (Mfast = 4.70, SD = 1.04) as significantly faster than the
moderate voice (t(501) = 10.69, p < .001, d = .954). Additionally, participants correctly heard the
slow voice as significantly slower than the fast voice (t(500) = 16.80, p < .001, d = 1.50). There
were no significant differences between the voices in terms of pitch or volume (Fs < 1).
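Effect sizes such as the one reported above can be cross-checked against the F statistic: for a between-subjects effect, partial eta squared equals F·df_effect / (F·df_effect + df_error). The Python sketch below applies this standard identity to the Experiment 1 manipulation check; it is offered as a consistency check, not as the procedure used to produce the reported value.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


if __name__ == "__main__":
    # Experiment 1 speech-rate manipulation check: F(2, 751) = 155.84
    print(f"partial eta squared = {partial_eta_squared(155.84, 2, 751):.3f}")
```

This gives approximately .293, in line with the reported ηp² = .29.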
Likelihood to use the DA. A one-way ANOVA comparing the three speech rate
treatments showed a significant main effect of vocal speech rate upon likelihood to use the DA
(F(2, 751) = 4.13, p = .016, ηp² = .011). Post hoc comparisons showed that participants in the slow
speech rate treatment (Mslow = 3.91, SD = 1.88) were significantly less likely to want to use the DA
than those in the moderate speech rate treatment (Mmoderate = 4.28, SD = 1.84, t(501) = 2.24, p =
.022, d = .198). Likewise, participants in the fast speech rate treatment (Mfast = 3.85, SD = 1.76)
were significantly less likely to want to use the DA than those in the moderate speech rate treatment
(t(501) = 2.69, p = .008, d = .238). There was no significant difference between the slow and fast
speech rate treatments (p = .715). These results support hypothesis 1, that consumers who use a
DA with a fast or slow speech rate will be less likely to want to use the digital assistant than
consumers who interact with a moderate speech rate.
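The reported d values for these contrasts appear consistent with dividing the mean difference by the root mean square of the two group standard deviations, which equals the pooled standard deviation when group sizes are equal. The Python sketch below assumes that formula (the text does not state how d was computed) and applies it to the Experiment 1 likelihood means.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d with an equal-n pooled SD: sqrt((sd_a**2 + sd_b**2) / 2).

    Assumption: groups are roughly equal in size; with unequal n the pooled SD
    should weight each variance by its degrees of freedom instead.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


if __name__ == "__main__":
    moderate_mean, moderate_sd = 4.28, 1.84
    for label, (mean, sd) in {"slow": (3.91, 1.88), "fast": (3.85, 1.76)}.items():
        d = cohens_d(moderate_mean, moderate_sd, mean, sd)
        print(f"moderate vs {label}: d = {d:.3f}")
```

This yields approximately .199 and .239, within rounding of the reported .198 and .238.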
EXPERIMENT 2A
Experiment 2A begins a series of experiments to uncover the process of the hypothesized
effects. Slow and fast DA speech rates are hypothesized to produce lower consumer usage
intentions based upon the prior theory development indicating these speaker speech rates elicit
nervous responses in listeners. Experiment 2B examines this relationship more closely, showing that
increased perceptions of risk act in serial mediation with consumer nervousness to lower intentions
to use DAs.
Participants were asked to listen to a DA provide information on creating a study plan for
undergraduate classes. Following from the earlier theory review, a faster or slower speech rate is
expected to lead to lower likelihood to want to use the digital assistant. Given speech rate’s
established relationship with negative reactions, this effect is hypothesized to occur because
slow or fast speech rates increase participants’ nervousness. Therefore,
experiment 2A tests hypotheses 2 and 3. Hypothesis 2 states that a digital assistant with a fast or slow
speech rate will produce greater felt nervousness in a consumer than one that speaks at a
moderate rate. Hypothesis 3 states consumer nervousness mediates the relationship between
digital assistant speech rate and likelihood to want to use it.
Method
Participants and Design. Experiment 2A utilized 302 participants (Mage = 20.66, SD =
Participants were compensated with class participation credit. The voice of the DA was
manipulated to create a one-way design with 3 treatments (voice speech rate: slow, moderate,
fast), similar to study 1. Participants were told they would be interacting with a new DA built to
help create a personal study plan for their classes. Audio recordings of the voice were created
as in prior experiments and compressed to equal volume, and lab assistants again checked the
computer volume between lab sessions to ensure participants did not change it.
Procedure. After consenting to participate, participants were asked to identify a test
sound to ensure the provided lab headphones were working as well as pass an attention check.
Following this, participants were given an introduction to the scenario and then randomly
assigned to one of the three treatment conditions, hearing either a slow, moderate, or fast
speaking DA. Participants were greeted by the DA and then listened to the voice provide
information on how and why to create a personal study plan. As in experiment 1, the interaction
style was monological, encouraging peripheral processing.
After hearing the DA speak, participants filled out measures for the variables of interest as well
as demographic information. Based upon incorrect responses to the attention check question, 4
participants were removed from the sample, leaving 298 for analysis.
personal study plan?”; 1 = Not at all likely; 7 = Very likely). Following the dependent variable
measure, participants completed the PANAS scales for positive and negative affect (Watson,
Clark, and Tellegen 1988) using 7-point Likert scales (1 = Not at all; 7 = Very Much). Means for
measures are provided in appendix D. The measure for nervousness was created by taking 3 of
the items (jittery, distress, anxious) and averaging them to create a nervousness index (α = .928).
An additional measure of vividness was also taken using a 4-item scale adapted from Peck,
Barger, and Webb (2013) (α = .848). The vividness measure was added to address
alternative explanations for sensory experience. In recent years, experimental
evidence has indicated that people generate mental images when they take in information
through their senses (Djordjevic et al. 2004; Jeannerod 1995; Yoo et al. 2003). This mental
imagery is an important component of experience and can be elicited by sounds (Belardinelli et
al. 2009). Furthermore, consumer behavior studies on sensory effects have shown that imagery is
facilitated by vividness. Therefore, adding a measure of vividness will help to identify the role it
may play in facilitating sensory effects for the sense of sound. Lastly, the same manipulation
check and demographic measures were taken, identical to study 1.
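The nervousness index is the per-participant mean of three PANAS items, and its reliability is reported as Cronbach’s α. A minimal Python/NumPy sketch of both computations follows; the response matrix is hypothetical and only illustrates the calculation.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scale responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


def scale_index(items) -> np.ndarray:
    """Average each respondent's item scores into a single index."""
    return np.asarray(items, dtype=float).mean(axis=1)


if __name__ == "__main__":
    # Hypothetical 7-point responses to three items (e.g., jittery, distress, anxious)
    responses = np.array([
        [1, 2, 1],
        [3, 3, 4],
        [5, 4, 5],
        [2, 2, 3],
        [6, 7, 6],
    ])
    print("nervousness index:", scale_index(responses))
    print("alpha:", round(cronbach_alpha(responses), 3))
```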
Results and Discussion
Speech Rate. The manipulation of speech rate was successful, with participants indicating
significant differences between the slow, moderate, and fast treatments (F(2, 296) = 132.77, p <
.001, ηp² = .474). Participants heard the slow voice (Mslow = 2.21, SD = 1.17) as significantly
slower than the moderate voice (Mmoderate = 3.72, SD = .96, t(196) = 9.97, p < .001, d = 1.41).
Likelihood to use the DA. A one-way ANOVA comparing the three speech rate treatments
showed a significant main effect of vocal speech rate upon likelihood to use the DA (F(2, 296) =
6.34, p = .002, ηp² = .041). Participants in the slow speech rate treatment (Mslow = 3.05, SD =
1.97) were significantly less likely to want to use the DA than those in the moderate speech rate
treatment (Mmoderate = 3.91, SD = 1.80, t(196) = 3.19, p = .001, d = .454). Participants in the fast
speech rate treatment (Mfast = 3.20, SD = 1.67) were also significantly less likely to want to use
the DA than those in the moderate speech rate treatment (t(199) = 2.89, p = .006, d = .407).
Participants in the slow and fast treatments did not differ significantly in likelihood to use the DA
(t(195) = .56, p = .568, d = .081). These results provide additional support for hypothesis 1, that
consumers who use a digital assistant with a fast or slow speech rate will be less likely to want to
use it than those who interact with a moderate speech rate.
Nervousness. A one-way ANOVA showed a significant main effect of vocal speech rate
upon participant nervousness (F(2, 296) = 3.09, p = .046, ηp² = .021). Participants in the slow
speech rate treatment (Mslow = 2.22, SD = 1.52) felt significantly more nervous than those in the
moderate speech rate treatment (Mmoderate = 1.82, SD = .98, t(196) = 2.16, p = .038, d = .311).
Participants in the fast speech rate treatment (Mfast = 2.24, SD = 1.42) also felt significantly more
nervous than those in the moderate speech rate treatment (t(199) = 2.42, p = .027, d = .342).
Nervousness did not differ significantly between the slow and fast treatments (t(195) = .11, p =
.902, d = .013). These results provide initial support for hypothesis 2, that a digital assistant with
a fast or slow speech rate will produce greater felt nervousness in a consumer than one that
speaks at a moderate rate.
Checks for the alternative explanations of vividness as well as generalized positive or
negative affect were conducted next. A one-way ANOVA showed no significant differences
between the slow (Mslow = 3.74, SD = 1.41), moderate (Mmoderate = 4.06, SD = 1.50), and fast
(Mfast = 4.12, SD = 1.46) treatments in terms of vividness (F(2, 295) = 1.87, p = .155, ηp² =
.013). Similarly, a one-way ANOVA with the 10 positive affect items from the PANAS
grouped together (α = .883) showed no significant differences in general positive affective state
between the slow (Mslow = 3.45, SD = 1.23), moderate (Mmoderate = 3.55, SD = 1.16), and fast
(Mfast = 3.55, SD = 1.14) treatments (F(2, 295) = .208, p = .812, ηp² = .001). Likewise, negative
affective state, measured with the 7 negative PANAS items that were not part of the
nervousness measure (α = .801), did not differ significantly between the slow (Mslow =
2.45, SD = .91), moderate (Mmoderate = 2.36, SD = .90), and fast (Mfast = 2.40, SD = .92)
treatments (F(2, 295) = .291, p = .748, ηp² = .002). These results indicate that both vividness as
well as general positive or negative affective state can be ruled out as alternative explanations.
Mediation of Main Effect by Nervousness. A mediation analysis (Model 4, Hayes 2017)
using 5,000 bootstrap samples, with likelihood to use the DA as the dependent variable,
speech rate as the independent variable, and feelings of nervousness as the mediator, was
performed next. Details of the mediation results are reported in table 1. The omnibus test of the
total effect was significant (p = .002), indicating that speech rate affects willingness to use the
digital assistant. Because the independent variable is multicategorical, the results are reported as
relative effects using indicator coding, with the moderate speech rate serving as the comparison
category for the slow and fast treatments, respectively. The 95% bootstrap confidence intervals for
these contrasts show whether the effect of speech rate runs through nervousness. Compared to the
moderate speech rate, both the slow (95% CI [-.1980, -.0022]) and fast (95% CI [-.1961, -.0068])
treatments had negative indirect effects on intentions to use the digital assistant through increased
nervousness, as neither interval contains zero. When the slow and fast speech rate treatments are
compared directly, the interval contains zero (95% CI [-.0949, .0965]), indicating no significant
difference between these groups. The comparisons of slow to moderate and fast to moderate speech
rates provide initial evidence for hypothesis 3, that feelings of nervousness mediate the relationship
between digital assistant speech rate and likelihood to want to use it. When speech rate leads to
increased nervousness, usage intentions are lowered.
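The logic of the relative indirect effects above can be sketched with ordinary least squares and a percentile bootstrap. The Python/NumPy code below is an illustrative re-implementation of the general approach behind PROCESS Model 4 with an indicator-coded, multicategorical X (moderate as the reference category); it is not PROCESS itself, the function names are hypothetical, and the demonstration data are simulated rather than taken from experiment 2A.

```python
import numpy as np


def relative_indirect_effects(x, m, y):
    """Relative indirect effects a_j * b with indicator coding (reference = 'moderate').

    Returns [slow vs moderate, fast vs moderate, slow vs fast].
    """
    d_slow = (x == "slow").astype(float)
    d_fast = (x == "fast").astype(float)
    ones = np.ones(len(y))
    # a-paths: mediator regressed on the two indicator variables
    a = np.linalg.lstsq(np.column_stack([ones, d_slow, d_fast]), m, rcond=None)[0]
    # b-path: outcome regressed on the indicators and the mediator
    coef = np.linalg.lstsq(np.column_stack([ones, d_slow, d_fast, m]), y, rcond=None)[0]
    b = coef[3]
    return np.array([a[1] * b, a[2] * b, (a[1] - a[2]) * b])


def bootstrap_ci(x, m, y, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CIs for the relative indirect effects (rows: lower, upper)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, 3))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        draws[i] = relative_indirect_effects(x[idx], m[idx], y[idx])
    tail = (1 - level) / 2
    return np.quantile(draws, [tail, 1 - tail], axis=0)


if __name__ == "__main__":
    # Simulated data only: slow and fast raise the mediator, the mediator lowers the outcome
    rng = np.random.default_rng(42)
    x = rng.choice(np.array(["slow", "moderate", "fast"]), size=300)
    m = 2.0 + 0.4 * (x == "slow") + 0.4 * (x == "fast") + rng.normal(0, 1, 300)
    y = 4.0 - 0.5 * m + rng.normal(0, 1, 300)
    print("point estimates:", relative_indirect_effects(x, m, y).round(3))
    print("95% CIs:\n", bootstrap_ci(x, m, y, n_boot=1000).round(3))
```

An interval that excludes zero for a contrast is read as a significant relative indirect effect, which is the criterion applied in the text.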
-----------------------------------
Insert table 1 about here
-----------------------------------
Building on this, experiment 2B investigates the process further and shows that user
nervousness is elicited by perceptions of risk associated with using the DA.
EXPERIMENT 2B
Experiment 2B builds on 2A by diving deeper into the relationship between speech rate
and nervousness. Specifically, 2B tests a serial mediation model (Model 6, Hayes 2017) to
uncover the mechanism by which nervousness occurs. Between humans, situations which are
perceived as risky have been shown to induce panic and jitters (Lawrence-Wood 2011) while
perceptions of anxiety and excitement in another person have been shown to be associated with
perceptions of risk (Parkinson and Simons 2009). Given this, a primary candidate mechanism
for eliciting nervousness is the risk associated with using a DA.
Experiment 2B makes a direct contribution to the sensory and human-computer interaction (HCI)
literature by attempting to produce evidence for this mechanism, hypothesized as self-service
technology risk. Therefore, study 2B formally tests hypothesis 4: speech rate negatively
influences likelihood to want to use a DA indirectly through a serial mediation process of self-
service technology risk to nervousness.
Method
Participants and Design. Experiment 2B utilized 481 participants (Mage = 20.25, SD =
Participants were compensated with class participation credit. Similar to earlier experiments, the
design was one-way with 3 treatments (voice speech rate: slow, moderate, fast). Participants
were told they would be interacting with a new DA built to help register for classes. Audio
recordings of the voice were created as in prior studies and compressed to equal volume.
Procedure. Participants were randomly assigned to one of three treatment conditions,
hearing either a slow, moderate, or fast speaking DA. After being greeted, they listened to a
voice provide information on how it could assist them in registering for classes based on their
needs. The interaction style of the scenario was monological, encouraging peripheral focus.
After hearing the DA speak, participants filled out measures for the variables of interest as well
as demographic information. There were 15 participants who did not correctly identify a test
sound and were removed, leaving 466 for the analysis.
Measures. The dependent variable of likelihood to use the DA was measured using a
single-item 7-point Likert scale (“How likely would you be to use this assistant to register for
classes?”; 1 = Not at all likely; 7 = Very likely). The 3-item measure of nervousness was again
taken from the PANAS and averaged to form a nervousness index (α = .854). Following this,
participants also filled out a 3-item measure of self-service technology (SST) risk, taken from
Bauer (1960) and Meuter et al. (2005). This included 3 Likert scale items (“I am unsure if the
digital assistant will perform satisfactorily”, “Overall, using this digital assistant is risky”, “The
digital assistant didn’t sound like it would do this task well”). Items were averaged to create an
SST Risk index measure (α = .824), to be used in the serial mediation model. Lastly, the same
manipulation check and demographic measures were taken, identical to prior studies.
slower than the moderate voice (Mmoderate = 3.80, SD = .99, t(305) = 10.93, p < .001, d = 1.25).
moderate voice (t(302) = 13.83, p < .001, d = 1.56). Additionally, participants correctly heard the
Likelihood to use the DA. There was a significant main effect of speech rate upon
likelihood to use the DA (F(2, 463) = 8.28, p < .001, ηp² = .035). Participants in the slow speech
rate treatment (Mslow = 2.56, SD = 1.62) were significantly less likely to want to use the DA than
those in the moderate speech rate treatment (Mmoderate = 3.35, SD = 1.84, t(305) = 4.00, p = .001,
d = .455). Participants in the fast speech rate treatment (Mfast = 2.97, SD = 1.66) were marginally
less likely to want to use the DA than those in the moderate speech rate treatment (t(302) = 1.91,
p = .051, d = .216). Participants were
not significantly different in terms of likelihood to use the DA between the slow or fast
treatments (t(305) = .16, p = .168, d = .19).
SST Risk. There was a significant main effect of DA speech rate upon participants’
perceptions of self-service technology risk (F(2, 463) = 10.37, p < .001, ηp² = .043). Participants
in the slow speech rate treatment (Mslow = 5.18, SD = 1.25) perceived the DA as significantly
riskier than those in the moderate speech rate treatment (Mmoderate = 4.54, SD = 1.29, t(305) =
4.43, p < .001, d = .503). Participants in the fast speech rate treatment (Mfast = 4.85, SD = 1.17)
also perceived the DA as significantly riskier than those in the moderate speech rate treatment
(t(302) = 2.21, p = .028, d = .251).
Additionally, participants were not significantly different in terms of their perceptions of SST
risk between the slow or fast treatments (t(305) = .06, p = .902, d = .023).
Nervousness. There was a significant main effect of DA vocal speech rate upon
participant nervousness (F(2, 463) = 10.84, p < .001, ηp² = .045). Participants in the slow speech
rate treatment (Mslow = 3.11, SD = 1.65) felt significantly more nervous than those who heard a
moderate DA speech rate (Mmoderate = 2.42, SD = 1.42, t(305) = 3.91, p < .001, d = .443).
Participants in the fast speech rate treatment (Mfast = 3.18, SD = 1.67) also felt significantly more
nervous than those in the moderate speech rate treatment (t(302) = 4.31, p < .001, d = .485). Additionally,
participants were not significantly different in terms of their nervousness between those who
heard a DA speak at a slow versus fast rate (t(305) = .36, p = .899, d = .042).
Serial Mediation Analyses. It was predicted that fast and slow speech rates would convey
risk in using a DA, which makes consumers feel more nervous when they hear the DA speak,
ultimately leading to lower likelihood to want to use the DA (i.e., speech rate → risk →
nervousness → usage likelihood). To test hypothesis 4, a serial mediation analysis (Model 6,
Hayes 2017) with 5,000 bootstrapping samples was conducted that uncovered a negative and
significant indirect effect of the suggested serial mediation pathway. Statistics are provided in
table 1.
Examination of the relative indirect effects when comparing slow to moderate speech shows
that slow speech rate has a negative indirect effect upon usage likelihood (b = -.03, SE = .01;
CI95% = -.05, -.01), serially mediated by SST risk and nervousness. Similarly, when comparing
fast to moderate speech, fast speech rate has a negative indirect effect upon usage likelihood
(b = -.01, SE = .01; CI95% = -.03, -.01), again serially mediated by SST risk and nervousness.
Compared to the moderate speech rate, slow speech had a positive effect on SST risk (b = .65,
SE = .14; CI95% = .37, .92), as did fast speech (b = .31, SE = .14; CI95% = .03, .58). Increased
SST risk in turn had a positive effect on consumer nervousness (b = .36, SE = .06; CI95% = .25,
.48), and increased consumer nervousness had a negative effect upon likelihood to use the DA
(b = -.11, SE = .05; CI95% = -.20, -.02). Effects for all contrast models are shown in figure 2.
The relative indirect effect comparing the slow and fast DA speech rate treatments was small and
negative, but its 95% CI contained zero, indicating a non-significant effect (b = -.01, SE = .004;
CI95% = -.02, .23). Additionally, when the order of mediators was switched (speech rate →
nervousness → SST risk → usage likelihood), the indirect effect of speech rate on likelihood to
use the DA was not significant for either treatment (CI95%slow = -.16, .04; CI95%fast = -.18, .05),
supporting the hypothesized ordering of the mediators.
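In a two-mediator serial model such as Model 6, the relative indirect effect through both mediators is the product of the three path coefficients, a1 × d21 × b2. The short Python sketch below multiplies the rounded coefficients reported above; the function name is hypothetical, and because the inputs are rounded, the products only approximate the reported values.

```python
def serial_indirect_effect(a1: float, d21: float, b2: float) -> float:
    """Indirect effect X -> M1 -> M2 -> Y as the product of the three paths."""
    return a1 * d21 * b2


if __name__ == "__main__":
    D21 = 0.36   # SST risk -> nervousness (reported)
    B2 = -0.11   # nervousness -> likelihood to use (reported)
    for label, a1 in (("slow vs moderate", 0.65), ("fast vs moderate", 0.31)):
        print(f"{label}: {serial_indirect_effect(a1, D21, B2):.3f}")
```

These products (about -.026 and -.012) round to the reported -.03 and -.01.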
-----------------------------------
-----------------------------------
Experiment 2B uncovers the mechanism by which speech rate elicits nervousness in a
consumer. With a serial mediation analysis, it is demonstrated that both slow and fast speech
rates decrease likelihood to use a digital assistant. This decrease is due to greater nervousness
stemming from heightened risk perception. Essentially, when someone interacts with a digital
assistant that speaks slowly or quickly, they perceive increased risk in continuing to use the DA, which
elicits nervousness. When a consumer feels greater nervousness, they are then less likely to want
to continue to use the DA.
EXPERIMENT 3A
Based upon the earlier discussion of HCI literature and perceptions of emotion in a
person’s voice, a possible explanation for differences in consumer response, other than speech
rate, is individual differences in susceptibility to the emotions conveyed by others. Early
theorists in the areas of nonverbal behavior and emotional contagion posited that an empathic
process occurs during communication, with the receiver of a message often taking on
emotional states congruent with those of the sender (Davis 1983; Eisenberg and Miller 1987). This process
facilitates movement of emotions from person to person based upon context as well as individual
differences in susceptibility to the emotional expressions of others. While contagion is difficult
to quantify, adding a measure of susceptibility to emotional expression of others allows for
investigation of the potential impact personal differences have in the hypothesized process.
Method
12.23, 59.1% male) from an online panel (MTurk) in return for a small compensation. Experiment
3A used a 3 (voice speech rate: slow, moderate, fast) × continuous (moderator: personal
susceptibility to others’ emotions) design, with participants being told they would interact with a
new DA built to help purchase music concert tickets. Audio recordings of the voice were created
identical to prior studies.
Procedure. Participants were randomly assigned to one of three treatment conditions,
hearing either a slow, moderate, or fast speaking DA. After being greeted, they heard a voice
provide information on how it could assist them in purchasing concert tickets based on their
preferences. The interaction style of the scenario was monological, encouraging peripheral
focus. After listening to the DA speak, participants filled out measures for variables of interest
and demographic information. There were 23 participants who did not correctly identify a test
sound and were removed, leaving 164 for the analysis.
Measures. The dependent variable of likelihood to use the DA was measured using a
single-item 7-point Likert scale (“How likely would you be to use this assistant to purchase
concert tickets?”; 1 = Not at all likely; 7 = Very likely). Following this, participants filled out a
15-item individual differences scale to quantify personal susceptibility to the emotions of others
(Doherty 1997). These items were averaged to create an index (α = .902). This measure was
taken to test whether personal susceptibility to others’ emotions alters the results already shown.
Lastly, the same manipulation check and demographic measures were taken.
moderate voice (t(107) = 6.28, p < .001, d = 1.19). Additionally, participants correctly heard the
Personal Differences in Susceptibility to Others’ Emotions. A moderation analysis (Model
1, Hayes 2017) using 5,000 bootstrap samples, with likelihood to use the DA as the DV,
speech rate as the IV, and susceptibility to others’ emotions as the moderator, produced a
significant interaction (F(1, 160) = 6.21, p = .014, ηp² = .098). This result indicates that personal
differences in susceptibility to others’ emotions moderate the effect of speech rate upon
usage intentions. To probe the interaction, a Johnson-Neyman analysis (Preacher, Rucker, and
Hayes 2007), which identifies the values of the moderator at which the effect of speech rate is
statistically significant, was conducted next. The results of this analysis indicate that
participants who scored above 4.8 on the scale of personal susceptibility to others’ emotions were
significantly less likely to want to use the DA. For participants below this level, the effect was
non-significant. Results are depicted in figure 3. These results provide initial evidence to support
the notion that personal differences of consumers can moderate the relationship between digital
assistant speech rate and a consumer’s likelihood to use it.
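The Johnson-Neyman technique solves for the moderator values at which a conditional effect crosses the significance threshold. The Python sketch below handles the simplest case: a single focal contrast X (for example, one indicator of slow versus moderate speech) moderated by a continuous W. The study’s multicategorical X would instead call for an omnibus test, and the coefficients, (co)variances, and critical t value in the usage example are hypothetical placeholders that would normally come from the fitted regression.

```python
import math


def johnson_neyman(b1, b3, var_b1, var_b3, cov_b13, t_crit):
    """Moderator values W where the conditional effect b1 + b3*W is exactly significant.

    Solves (b1 + b3*W)**2 = t_crit**2 * (var_b1 + 2*W*cov_b13 + W**2*var_b3),
    a quadratic in W. Returns the real roots in ascending order (possibly none).
    """
    t2 = t_crit ** 2
    qa = b3 ** 2 - t2 * var_b3
    qb = 2 * (b1 * b3 - t2 * cov_b13)
    qc = b1 ** 2 - t2 * var_b1
    if abs(qa) < 1e-12:                 # degenerate case: the equation is linear in W
        return [] if abs(qb) < 1e-12 else [-qc / qb]
    disc = qb ** 2 - 4 * qa * qc
    if disc < 0:                        # the effect never crosses the threshold
        return []
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)])


if __name__ == "__main__":
    # Hypothetical coefficients and (co)variances, for illustration only
    bounds = johnson_neyman(b1=1.0, b3=-0.5, var_b1=0.04, var_b3=0.01,
                            cov_b13=-0.01, t_crit=1.97)
    print("Johnson-Neyman boundaries:", [round(w, 2) for w in bounds])
```

Whether the region of significance lies between or outside the returned boundaries depends on the sign of the quadratic’s leading term; evaluating the conditional effect at one value of W on each side of a boundary settles it.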
This study produces more evidence that slow and fast speech rates can produce negative
reactions. Additionally, it indicates personal susceptibility to others’ emotions moderates the
effect, with participants who are higher in susceptibility having lower intentions to use the DA.
Therefore, managers should be aware that the personal characteristics of consumers may have an
effect on reactions to DA speech rate. However, personal differences are not under the control of
marketers and can only be measured and accounted for. Given this, more managerial value can
be provided by testing a moderator that managers can control. Therefore, the remaining studies
test the moderator of interaction style, which marketing managers can adjust in their design
of digital assistants. The interaction style between digital assistants and consumers gives managers
a tool they can use to avoid unwanted results. Experiment 3B tests this moderator and replicates
some of the results from study 3A while also generalizing them to a different sample.
-----------------------------------
-----------------------------------
EXPERIMENT 3B
The results of the first 4 studies become more managerially useful if the negative effects
can be avoided. Therefore, the remaining experiments test the moderating variable of interaction
style. In experiment 3B, participants were asked to listen to a DA voice and were randomly
assigned to interact in a monological or dialogical scenario. This was carried out by having some
participants listen to information, identical to prior experiments, while having others interact
with the DA in a back-and-forth manner, serving as a proxy for actual interaction with the DA in
the real world. The theoretical basis for a moderation hypothesis lies in the idea that the speech
rate of a voice is a paralinguistic quality and should only be effective at changing consumer
perceptions when it is a salient peripheral cue. When an interaction is designed to be more
dialogic, it encourages focus on central content, and what is being said becomes more salient. In
this style, processing of a peripheral cue will not occur and negative reactions should be avoided.
Experiment 3B directly tests hypothesis 5, that the relationship between speech rate and
likelihood to use a digital assistant is moderated by interaction style, with decreases in usage
likelihood only occurring when the interaction is monological.
Method
Participants were compensated with class participation credit. The voice of the DA was
manipulated to create slow, moderate, and fast treatments, similar to earlier studies. Additionally,
participants were randomly assigned either to only listen to the DA speak or to interact with the
DA in a back-and-forth way. Therefore, experiment 3B uses a 3 (speech rate: slow, moderate, fast)
× 2 (interaction style: monological vs. dialogical) between-subjects design. Participants were told
they would be interacting with a new
DA built to help create a personal budget. Audio recordings of the voice were created similar to
prior studies. The difference between interaction style treatments was that those in the
monological treatments heard one uninterrupted sound file of the DA talking, while those in the
dialogical treatments heard shorter segments of the same recording, split up into multiple files.
Between listening to each segment, participants were asked to focus on the information being
provided by inputting budget items before proceeding. This effectively created a monological
(one-direction) interaction versus a dialogical (back-and-forth) interaction with the DA.
Procedure. Participants were randomly assigned to one of the six treatment conditions.
After being greeted by the DA, they heard the voice provide information on how and why to
create a personal budget. After going through the scenario, participants filled out measures for
the variables of interest, manipulation checks and demographic information. There were 20
participants who did not correctly identify a test sound and were removed, leaving 342 for the
analysis.
Measures. The dependent variable of likelihood to use the DA to create a personal budget
was measured using a single-item 7-point Likert scale (“How likely would you be to use this
assistant to create a personal budget?”; 1 = Not at all likely; 7 = Very likely). Following this
measure, participants completed single-item manipulation checks for the audio recordings, taken
from Martín-Santana et al. (2015) (“This voice was 1 = Slow; 7 = Fast”; “This voice was 1 =
Quiet; 7 = Loud”; “This voice was 1 = Low Pitched; 7 = High Pitched”). Participants finished by
indicating their gender and age. Measures are provided in appendix C.
.001, ƞ2 p = .22). Participants heard the slow voice (Mslow = 3.30, SD = .89) as significantly
Likelihood to use the DA. A two-way ANOVA revealed a significant main effect of DA
speech rate on participants’ likelihood to use it (F(2, 336) = 11.08, p < .001, ƞ2 p = .06).
Participants who heard the DA speak at a moderate rate were significantly more likely to use the
DA (Mmoderate = 4.69, SD = 1.46) than those in both the slow (Mslow = 3.74, SD = 1.81, t(221)
= 4.38, p < .001, d = .589) and the fast (Mfast = 3.80, SD = 1.82, t(229) = 4.13, p <
.001, d = .545) treatments. Differences between the slow and fast DA speech rate
treatments were not significant (t(224) = .287, p = .775, d = .038). The main effect of interaction
style was not significant (F(1, 336) = .221, p = .638).
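The effect sizes reported in this section appear consistent with two standard conversions from test statistics: the widely used approximation d ≈ 2t/√df for two independent groups, and partial eta squared recovered from an F statistic and its degrees of freedom. The Python sketch below is an illustrative check under that assumption only; which software or formula the author actually used is not stated in the text.

```python
import math


def cohens_d_from_t(t: float, df: int) -> float:
    """Approximate Cohen's d for two independent groups from a t statistic.

    Uses the common approximation d = 2t / sqrt(df). For equal group sizes
    the exact value is 2t / sqrt(df + 2), so the two agree closely for the
    sample sizes in these experiments.
    """
    return 2.0 * t / math.sqrt(df)


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


if __name__ == "__main__":
    # Statistics reported for experiment 3B
    print(f"slow vs. moderate: d = {cohens_d_from_t(4.38, 221):.3f}")
    print(f"fast vs. moderate: d = {cohens_d_from_t(4.13, 229):.3f}")
    print(f"speech rate main effect: partial eta^2 = "
          f"{partial_eta_squared(11.08, 2, 336):.2f}")
```

Because the approximation ignores unequal cell sizes, a discrepancy in the third decimal place relative to a reported value is expected.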
Of particular importance to the current experiment, a marginally significant interaction
emerged (F(2, 336) = 2.88, p = .057, ƞ2 p = .018), suggesting moderation. As expected, in the
monological interactions with the DA, participants were significantly less likely to use the DA
when they heard a slow speech rate (Mslow = 3.46, SD = 1.73) than the moderate speech rate
(Mmoderate = 4.94, SD = 1.46, t(113) = 4.96, p < .001, d = .926). Participants were also less likely
to use the DA in the fast speech conditions (Mfast = 3.69, SD = 1.75) than with the moderate
speech rate (t(116) = 4.21, p < .001, d = .779).
Differences between the slow and fast DA speech rate treatments with a monological interaction
were not significant (t(113) = .708, p = .480, d = .131). In the dialogical conditions, where the
interaction with the DA was more focused on the central content, the negative effects were
attenuated, with no significant differences between the treatments (F’s < 1). Means for the
treatments are shown in figure 4. These results provide initial evidence in support of hypothesis
5, that the relationship between digital assistant speech rate and likelihood to use it is moderated
by interaction style: decreases in likelihood to use the DA occur only when the interaction is
monological, while dialogical interactions attenuate the negative effects.
EXPERIMENT 4A
Experiment 4A builds on the first five experiments by testing a complete moderated
mediation model of the hypothesized effects. The experiment also further replicates and
generalizes the results already presented. The model includes interaction style as the moderator
because it is an adaptable variable under the control of marketing managers and DA product
designers, rather than an uncontrollable personal difference.
Method
.90, 50.4% male) in a controlled lab environment at a large public university in the U.S.
Participants were again compensated with class participation credit. Participants were randomly
assigned to hear a slow, moderate, or fast vocal speech rate and also to engage with the DA in
either a monologic or dialogic style. Experiment 4A therefore used a 3 (speech rate: slow,
moderate, fast) x 2 (interaction style: monological vs. dialogical) between-subjects design.
Participants were told they would be interacting with a new digital health agent. Audio
recordings of the voice were created as in earlier experiments.
After being greeted by the DA, participants listened to the voice provide information on how and
why to create a personal health plan. Interaction style was manipulated as in prior
experiments. After going through the scenario, participants filled out measures for the variables
of interest and demographic information. There were 6 participants who did not correctly
identify a test sound and were removed, leaving 343 for the analysis.
Measures. Measures for experiment 4A were mostly identical to those taken in earlier
experiments, with the addition of a manipulation check for interaction style. This measure
consisted of 2 items adapted from Asterhan and Schwarz (2007) (“Would you say this interaction
was more one-directional or interactive?”; “Would you say this interaction was
more informational or more instructional?”). These items were averaged to form an index
(r = .862). The dependent variable of likelihood to use the DA was measured using a
single-item 7-point Likert scale (“How likely would you be to use this assistant to create a
personal health plan?”; 1 = Not at all likely; 7 = Very likely). Nervousness was again measured
using the same 3-item scale (jittery, distressed, and nervous) from earlier studies. These
items were averaged to form a nervousness index (α = .947). Lastly, the same manipulation
checks and demographic measures as in prior studies were taken.
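The reliability statistics above summarize how consistently the items of each index move together: Cronbach’s alpha for the 3-item nervousness scale, and the inter-item correlation for the 2-item interaction style check. A minimal Python sketch of both computations follows, assuming ratings are stored as a respondents x items matrix; the example ratings are hypothetical and are not data from these experiments.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a respondents x items matrix of ratings."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()  # sum of the individual item variances
    total_var = x.sum(axis=1).var(ddof=1)       # variance of the summed scale score
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)


def index_score(items) -> np.ndarray:
    """Average the items for each respondent to form an index score."""
    return np.asarray(items, dtype=float).mean(axis=1)


if __name__ == "__main__":
    # Hypothetical 7-point ratings (jittery, distressed, nervous); NOT study data
    ratings = np.array([
        [1, 2, 1],
        [3, 3, 4],
        [5, 6, 5],
        [2, 2, 3],
        [6, 7, 6],
    ])
    print("alpha =", round(cronbach_alpha(ratings), 3))
    print("nervousness index:", index_score(ratings))
    # For a two-item index, the Pearson correlation between the items is the usual summary
    print("r =", round(np.corrcoef(ratings[:, 0], ratings[:, 1])[0, 1], 3))
```

Averaging rather than summing keeps the index on the original 1 to 7 response scale, which matches how the indices are described in the text.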
Manipulation Checks. The manipulation of speech rate was successful, with participants
indicating significant differences between the slow, moderate, and fast treatments (F(2, 340) =
46.80, p < .001, ƞ2 p = .21). Participants heard the slow voice (Mslow = 3.54, SD = 1.00) as
significantly slower than the moderate voice (Mmoderate = 3.90, SD = .78, t(227) = 3.01, p = .003,
d = .401). Participants also heard the fast voice (Mfast = 4.83, SD = 1.28) as significantly faster
than the moderate voice (t(228) = 6.69, p < .001, d = .873). Additionally, participants correctly
heard the slow voice as significantly slower than the fast voice (t(225) = 8.45, p < .001, d =
1.11). There were no significant differences between the voices in terms of pitch or volume (F’s
< 1). The manipulation of interaction style was also confirmed as successful, with participants in
the monological treatments indicating the interaction was more one-way (Mmonological = 4.41, SD
= 1.69), than those in the back-and-forth, or dialogical, treatments (Mdialogical = 3.83, SD = 1.63,
t(341) = 2.11, p = .001, d = .428).
Likelihood to use the DA. A two-way ANOVA revealed a significant main effect of DA
speech rate on participants’ likelihood to use it (F(2, 337) = 3.47, p = .032, ƞ2 p = .02).
Participants who heard the DA speak at a moderate rate (Mmoderate = 3.90, SD = 1.81) were
significantly more likely to use it than those in the slow speaking treatment (Mslow = 3.30,
SD = 1.74, t(227) = 2.52, p = .012, d = .337) and marginally more likely than those in the fast
speaking treatment (Mfast = 3.46, SD = 1.80, t(228) = 1.84, p = .067, d = .248). Differences
between the slow and fast DA speech rate treatments were not significant (t(225) = .658,
p = .511, d = .084). The main effect of interaction style was not significant (F(1, 337) = .647,
p = .422).
In further support of hypothesis 5, there was a significant interaction between speech rate
and interaction style (F(2, 337) = 3.07, p = .048, ƞ2 p = .019). In the monological interactions
with the DA, participants were significantly less likely to use it when they heard a slow speech
rate (Mslow = 3.00, SD = 1.65) than the moderate speech rate (Mmoderate = 4.15, SD = 1.83,
t(112) = 3.53, p = .001, d = .665). Participants were also less likely to use the DA in the fast
speech conditions (Mfast = 3.29, SD = 1.74) than with the moderate speech rate (t(114) = 2.59,
p = .009, d = .485). Differences between the slow and fast DA speech rate treatments were not
significant (t(112) = .920, p = .360, d = .170). In the dialogical treatments, where the interaction
with the DA was back-and-forth, the negative effects were attenuated, with no significant
differences between the treatments (F’s < 1). Means for the treatments are shown in figure 5.
Nervousness. Speech rate and interaction style had non-significant main effects on felt
nervousness (F’s < 1), but there was a significant interaction (F(2, 337) = 4.65, p = .010, ƞ2 p =
.027). In further support of hypothesis 2, participants in the monological interactions were
significantly more nervous when they heard a slow speech rate (Mslow = 2.83, SD = 1.69)
than the moderate speech rate (Mmoderate = 2.17, SD = 1.35, t(112) = 2.32, p = .022, d = .437).
They were also more nervous in the fast speech conditions (Mfast = 3.20, SD = 1.48) than with
the moderate speech rate (t(114) = 3.92, p < .001, d = .732). Differences between the slow and
fast DA speech rate treatments were not significant (t(112) = 1.23, p = .221, d = .232). In the
dialogical treatments, the negative effects were attenuated, with no significant differences
between the treatments (F’s < 1).
Moderated Mediation. Using PROCESS (Model 8, Hayes 2017) with 5,000
bootstrap samples, a complete moderated mediation analysis was conducted next.
Likelihood to use the DA was the dependent variable, speech rate was the independent variable,
interaction style served as the moderator, and feelings of nervousness served as the mediator.
Details of the mediation results are reported in table 2. Because the independent variable is
multicategorical, the results are reported as contrasts using indicator coding, with the moderate
speech rate serving as the reference category for the slow and fast treatments. An index of
moderated mediation is estimated for each speech rate contrast. Evidence of moderated mediation
in either contrast would support the hypothesis that nervousness mediates the relationship between
speech rate and likelihood to use the DA. Examination of the conditional indirect effects then
shows how the effect is moderated by interaction style.
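The logic of this analysis can be sketched in Python with NumPy. This is a minimal re-implementation under stated assumptions, not the PROCESS macro and not the author’s code: it fits the two OLS equations of a Model-8-style design (interaction style moderates both the speech rate to nervousness path and the direct speech rate to likelihood path), uses indicator coding with moderate as the reference category, and forms 95% percentile bootstrap intervals. The 0/1 coding of W (0 = monological, 1 = dialogical), the function names, and the simulated data are all hypothetical.

```python
import numpy as np


def _ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def model8_effects(rate, w, m, y):
    """Point estimates for a Model-8-style moderated mediation with a
    three-level IV under indicator coding (reference = 'moderate').

    rate: array of 'slow' / 'moderate' / 'fast'
    w:    0/1 moderator (assumed here: 0 = monological, 1 = dialogical)
    m:    mediator (e.g., nervousness index)
    y:    outcome (e.g., likelihood to use)
    """
    d1 = (rate == "slow").astype(float)  # slow vs. moderate indicator
    d2 = (rate == "fast").astype(float)  # fast vs. moderate indicator
    w = w.astype(float)
    # Mediator model: M = a0 + a1*D1 + a2*D2 + a3*W + a4*D1*W + a5*D2*W
    Xm = np.column_stack([np.ones_like(w), d1, d2, w, d1 * w, d2 * w])
    a = _ols(Xm, m)
    # Outcome model: same predictors plus M; b is the coefficient on M
    b = _ols(np.column_stack([Xm, m]), y)[-1]
    effects = {}
    for name, main, inter in (("slow", 1, 4), ("fast", 2, 5)):
        effects[name] = {
            "index": a[inter] * b,           # index of moderated mediation
            "w0": a[main] * b,               # conditional indirect effect at W = 0
            "w1": (a[main] + a[inter]) * b,  # conditional indirect effect at W = 1
        }
    return effects


def percentile_bootstrap(rate, w, m, y, n_boot=5000, seed=0):
    """95% percentile bootstrap intervals for every quantity in model8_effects."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = {"slow": {"index": [], "w0": [], "w1": []},
             "fast": {"index": [], "w0": [], "w1": []}}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        est = model8_effects(rate[idx], w[idx], m[idx], y[idx])
        for contrast, values in est.items():
            for key, value in values.items():
                draws[contrast][key].append(value)
    return {c: {k: tuple(np.percentile(v, [2.5, 97.5])) for k, v in vals.items()}
            for c, vals in draws.items()}


def simulate(n_per_cell=200, seed=1):
    """Hypothetical data (NOT study data): slow and fast speech raise the
    mediator by one point only when W = 0, and the mediator lowers Y."""
    rng = np.random.default_rng(seed)
    rate = np.repeat(np.array(["slow", "moderate", "fast"] * 2), n_per_cell)
    w = np.repeat(np.array([0, 0, 0, 1, 1, 1]), n_per_cell)
    bump = np.where(rate == "moderate", 0.0, 1.0) * (w == 0)
    m = 2.0 + bump + rng.normal(0.0, 0.3, rate.size)
    y = 5.0 - 0.5 * m + rng.normal(0.0, 0.3, rate.size)
    return rate, w, m, y


if __name__ == "__main__":
    rate, w, m, y = simulate()
    print(model8_effects(rate, w, m, y))
    print(percentile_bootstrap(rate, w, m, y, n_boot=1000))
```

With this coding, the index of moderated mediation for each contrast equals the difference between its conditional indirect effects at W = 1 and W = 0, which is why an interval for the index that excludes zero indicates that the indirect effect differs across interaction styles.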
When comparing the moderate speech rate with the slow speech rate, the index of
moderated mediation does not include zero [LLCI = .0082, ULCI = .4629], providing evidence
that nervousness mediates the relationship between speech rate and likelihood to use the DA.
Examination of the confidence intervals for the conditional indirect effects shows that the
negative effect upon likelihood to use the DA occurs only for monological interactions [LLCI =
-.3469, ULCI = -.0211], and not for dialogical ones [LLCI = -.1057, ULCI = .2117]. Interpretation
of the model coefficients shows that during monological interactions, slow speech elicits more
nervousness than moderate speech, leading to lower likelihood to use the DA.
When comparing the moderate speech rate with the fast speech rate, the index of
moderated mediation also does not include zero [LLCI = .0916, ULCI = .5671], providing
evidence that nervousness mediates the relationship between speech rate and likelihood to use
the DA. Examination of the confidence intervals for the conditional indirect effects shows that
the negative effect upon likelihood to use the DA occurs only for monological interactions
[LLCI = -.4486, ULCI = -.1024], and not for dialogical ones [LLCI = -.0971, ULCI = .1953].
Interpretation of the model coefficients shows that during monological interactions, fast speech
elicits more nervousness than moderate speech, leading to lower likelihood to use the DA. A
comparison of the slow and fast DA speech rate treatments produced an index of
moderated mediation that included zero [LLCI = -.1216, ULCI = .3295], indicating there was no
significant difference between the conditional indirect effects of these treatments.
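The slow versus fast comparison can be related algebraically to the two indicator contrasts. As a sketch, assuming indicator coding with D1 = slow, D2 = fast, moderate as the reference category, W = interaction style, and the two regression equations of a Model-8-style analysis (the coefficient labels are illustrative, not PROCESS output names):

```latex
\begin{aligned}
M &= a_0 + a_1 D_1 + a_2 D_2 + a_3 W + a_4 D_1 W + a_5 D_2 W + e_M \\
Y &= c_0 + c_1 D_1 + c_2 D_2 + c_3 W + c_4 D_1 W + c_5 D_2 W + b M + e_Y \\
\omega_j(W) &= (a_j + a_{j+3} W)\, b, \qquad j \in \{1, 2\} \\
\text{index}_j &= a_{j+3}\, b \\
\omega_1(W) - \omega_2(W) &= \big[(a_1 - a_2) + (a_4 - a_5) W\big]\, b
\;\Rightarrow\; \text{index}_{\text{slow vs. fast}} = (a_4 - a_5)\, b
\end{aligned}
```

Re-estimating the model with fast as the reference category yields the same point estimate for the slow versus fast index, because recoding the indicators does not change the fitted model; its confidence interval would come from bootstrapping this difference.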
-----------------------------------
Insert table 2 about here
-----------------------------------
EXPERIMENT 4B
Experiment 4A provided the first test of the full hypothesized model. It showed
that slow and fast speech elicited increased nervousness, which in turn decreased usage
intentions. However, the design of the interaction also moderated the effect, providing guidance
to marketers who wish to create digital assistants that can speak at different rates while
avoiding negative consumer reactions. Experiment 4B completes the empirical
package by replicating the full model of effects shown in experiment 4A in a
different sample population, providing additional support for the hypotheses.
Method
12.79, 48.9% male) from an online panel (Amazon Mechanical Turk) in return for a small
compensation. Participants were randomly assigned to hear a slow, moderate, or fast vocal
speech rate and to engage with the DA in either a monologic or dialogic interaction, as in
experiment 4A. Experiment 4B therefore used a 3 (speech rate: slow, moderate, fast)
x 2 (interaction style: monological vs. dialogical) between-subjects design. Participants were told
they would be interacting with a new digital health agent. Audio recordings of the voice were
created as in earlier studies.
The procedure for experiment 4B was identical to that of experiment 4A. After going through the
scenario, participants filled out measures for the variables of interest and demographic
information. There were 31 participants who did not correctly identify a test sound and were
removed, leaving 307 for the analysis.
Measures. The measures for experiment 4B were identical to those collected in experiment 4A.
Manipulation Checks. The manipulation of speech rate was successful, with participants
indicating significant differences between the slow, moderate, and fast treatments (F(2, 304) =
172.30, p < .001, ƞ2 p = .53). Participants heard the slow voice (Mslow = 3.03, SD = 1.55) as
significantly slower than the moderate voice (Mmoderate = 4.33, SD = .83, t(200) = 7.43, p < .001,
d = 1.05). Participants also heard the fast voice (Mfast = 6.04, SD = 1.00) as significantly faster
than the moderate voice (t(203) = 13.31, p < .001, d = 1.86). Additionally, participants correctly
heard the slow voice as significantly slower than the fast voice (t(203) = 16.57, p < .001, d =
2.31). There were no significant differences between the voices in terms of pitch or volume (F’s
< 1). The manipulation of interaction style was also confirmed as successful, with participants in
the monological treatments indicating the interaction was more one-way (Mmonological = 3.38, SD
= 1.53), than those in the back-and-forth, or dialogical, treatments (Mdialogical = 4.00, SD = 1.64,
t(303) = 3.39, p = .001, d = .389).
Likelihood to use the DA. A two-way ANOVA showed non-significant main effects for
both speech rate and interaction style on likelihood to use the DA (F’s < 1). There was,
however, a significant interaction between the two (F(2, 301) = 4.90, p = .008, ƞ2 p = .03). In
support of hypothesis 5, participants in the monological treatments were significantly less likely
to use the DA when it spoke at a slow rate (Mslow = 4.37, SD = 2.02) than at a moderate rate
(Mmoderate = 5.15, SD = 1.21, t(103) = 2.38, p = .027, d = .462). Participants in the fast DA speech
rate treatments were also significantly less likely to use the DA (Mfast = 4.13, SD = 2.09) than
those in the moderate speech rate treatment (t(103) = 3.06, p = .004, d = .598). Differences
between the slow and fast DA speech rate treatments were not significant (t(104) = .615,
p = .540, d = .121). In the dialogical treatments, where the interaction with the DA was
back-and-forth, the negative effects were attenuated, with no significant differences between the
treatments (F’s < 1). Means are shown in figure 6.
Nervousness. Speech rate and interaction style had non-significant main effects on felt
nervousness (F’s < 1), but there was a significant interaction (F(2, 301) = 5.02, p = .007, ƞ2 p
= .03). In further support of hypothesis 2, participants in the monological interactions were
significantly more nervous when they heard a slow speech rate (Mslow = 2.34, SD = 1.83)
than the moderate speech rate (Mmoderate = 1.50, SD = .57, t(103) = 3.14, p = .005, d = .616).
Participants were also more nervous in the fast speech conditions (Mfast = 2.47, SD = 1.60) than
with the moderate speech rate (t(103) = 4.12, p = .001, d = .806). Differences between the slow
and fast DA speech rate treatments were not significant (t(104) = .394, p = .694, d = .075). In the
dialogical treatments, the negative effects were attenuated, with no significant differences
between the treatments (F’s < 1).
Moderated Mediation. Using PROCESS (Model 8, Hayes 2017) with 5,000
bootstrap samples, a complete moderated mediation analysis was conducted next.
Likelihood to use the DA to create a daily health check was the dependent variable, speech rate
was the independent variable, interaction style served as the moderator, and feelings of
nervousness served as the mediator. Details of the mediation results are reported in table 2.
Because the independent variable is multicategorical, the results are reported as contrasts, as in
the experiment 4A analysis.
When comparing the moderate speech rate with the slow speech rate, the index of
moderated mediation does not include zero [LLCI = .0343, ULCI = .3674], providing evidence
that nervousness mediates the relationship between speech rate and likelihood to use the DA.
Examination of the confidence intervals for the conditional indirect effects shows that the
negative effect upon likelihood to use the DA occurs only for monological interactions
[LLCI = -.2866, ULCI = -.0264], and not for dialogical ones [LLCI = -.0715, ULCI = .1444]. Interpretation of
the model coefficients shows that during monological interactions, slow speech elicits more
nervousness than moderate speech, leading to lower likelihood to use the DA.
When comparing the moderate speech rate with the fast speech rate, the index of
moderated mediation also does not include zero [LLCI = .0414, ULCI = .4204], providing
evidence that nervousness mediates the relationship between speech rate and likelihood to use
the DA. Examination of the confidence intervals for the conditional indirect effects
shows that the negative effect upon likelihood to use the DA occurs only for monological
interactions [LLCI = -.3123, ULCI = -.0331], and not for dialogical ones [LLCI = -.0546, ULCI =
.1705]. Interpretation of the model coefficients shows that during monological interactions, fast
speech elicits more nervousness than moderate speech, leading to lower likelihood to use the
DA. A comparison of the slow and fast DA speech rate treatments produced an
index of moderated mediation that included zero [LLCI = -.1135, ULCI = .2092], indicating
there was not a significant difference between the conditional indirect effects of these treatments.
GENERAL DISCUSSION
This research shows that the voice we hear during social interactions has an impact on what
we feel and ultimately on what we are likely to do. Across seven experiments, evidence is
presented showing that the speed of the voice we hear can elicit both positive and negative
reactions, such as nervousness. Increases in nervousness are then shown to lower the likelihood of
wanting to interact with a voice, tested here in the form of a digital assistant. Prior work in
sensory marketing has focused on positive reactions to pleasing sounds, such as music, or on
other parameters of sound, such as volume or pitch, rather than speed of speech. In the broadcast
advertising literature, outcome measures have consistently been perceptions of the voice itself,
and studies have tested variables that are conflated with volume or visual content. Given this,
the current work clarifies the role that speech rate alone plays in changing consumer intentions.
Furthermore, it shows that the speech rate of a voice, particularly the voice we hear when
interacting with a digital assistant, can produce negative effects.
The opening of this manuscript makes clear that digital assistants are being adopted at a
rapid pace, and their integration with other devices that create branded ecosystems means that
competition for market share in the early stages has a large financial impact for brands. This
rapid adoption of digital assistants that speak to us calls for both practitioners and academics
to investigate how we interpret and react to their voices. These studies expand the sensory
marketing literature related to the audible cue of speech. They do this by investigating the
popular consumer behavior topic of digital assistants and providing theoretical guidance to
explain the results. The results demonstrate that a digital assistant speaking at a slow or fast
rate, rather than a moderate one, can elicit nervousness in consumers, which is facilitated by
perceptions of risk and ultimately lowers likelihood to use the digital assistant. Additionally,
the managerially applicable moderators of interaction style and personal susceptibility to
others’ emotions are shown to mitigate negative outcomes. Differences between speech rates and
consumer reactions are examined in both lab and online environments and across multiple
scenarios.
Theoretical Implications
This research contributes to the body of sensory marketing research in several ways.
First, it demonstrates that an audible cue, the voice of a digital assistant, can negatively affect a
consumer’s emotional and intentional reactions to products. While some prior work has
investigated the sound of voices, its results are limited to main effects in broadcast advertising
contexts and focus on attributions about the speaker instead of the consumer. Furthermore,
these studies often use less nuanced manipulations of slow versus fast speech rate. Second, this
work examines the audible cue of speech rate rather than voice pitch or volume. Speech rate,
like timbre, opens up many avenues for theoretical development moving forward. By examining
an audible cue on its own rather than combined with visuals or other sensory elements, this work
expands the sensory as well as the HCI literature.
Third, this work bridges a gap between the sensory and communications literatures by
applying the theory of social response to explain a sensory effect. This may help marketers who
manage a digital assistant to better understand consumers’ interactions with their service robot
and guide the design of its voice. The results indicate a process of effects, via feelings of
nervousness and risk, which is moderated by interaction style. This provides theoretical
grounding for future studies on other effects of vocal cues. By incorporating theory from the
communications and HCI literatures, a more refined understanding of our interactions with a new
technology of commerce is possible.
Lastly, this work builds upon prior research which used dichotomous manipulations of
speech compression or only examined faster speech. The expanded manipulation of slow,
moderate, and fast speeds of speech provides more nuance to assessments of these effects. The
current manipulation of speed at both 20% slower and faster than average provides something
closer to just-noticeable-difference analysis for established speech rates (Quené 2007). These
equated to 115 words per minute for slow and 160 words per minute for fast speech, providing a
rough guideline for future speech rate research.
Overall, the main intended contribution of this work is in providing a better theoretical
understanding of how the sounds we hear affect what we feel, think, and do. When interacting
with a digital assistant, our interactions may produce both positive and negative outcomes. These
studies explain the process behind these negative outcomes in technology interaction and
offer applicable suggestions for how to avoid them.
Managerial Implications
Given that many modern consumer devices now come equipped with digital assistants,
such as Alexa and Siri, with over half of U.S. consumers owning a smart speaker and 45% of
millennials reporting they use their digital assistant while shopping (Kinsella 2019; Toplin
2018), it has become increasingly important for product managers to understand how consumers
interact with them. The voice of a digital assistant is an audible cue that helps to facilitate
social interactions with consumers. As such, it is also a tool brands are focusing on to
provide social value when consumers interact with their products. However, the marketing
literature offers little explanation of how and why consumers perceive this social value and
interact with different vocal types.
The presented findings suggest that marketing managers who implement digital assistants
that can adjust their vocal speech rate should consider both what the voice says and how
it says it. The speech rate of a digital assistant can affect consumer nervousness, which in turn
affects likelihood to use it. In these studies, when the digital assistant spoke at a slow or fast
rate during a one-directional interaction, nervousness increased and likelihood to use decreased.
If a brand wants to implement a digital assistant that varies its speech rate, the design
of the interaction style with the consumer becomes an important factor to consider. The strategic
use of speech rate as well as interaction style was shown to help avoid negative outcomes under
some circumstances. If an interaction is one-directional, or monologic, slow and fast speech rates
increase nervousness, which in turn lowers intentions to use the digital assistant. However, if
the interaction is designed in a back-and-forth, dialogic style, these negative effects are avoided.
Based upon this, managers could consider designing digital assistants to speak at a
moderate speech rate overall. However, if a voice that speaks at varying speech rates is desired,
then interactions could be designed as dialogical rather than monological.
Limitations and Future Research
These empirical findings are not without limitation and suggest at least three lines of future
inquiry and improvement. First, most of the data come from controlled lab samples; only two
experiments used online panels. An interesting question is whether actual usage of a digital
assistant would change in a real-world environment. To answer this question, future research
could measure the actual usage behavior of people who already use a digital assistant. Field data
from actual devices would improve the dependent measure, increase validity for the hypothesized
moderator, and reinforce the results already shown.
Second, researchers may consider testing alternative explanations for the effects. While
interaction style and susceptibility to the emotions of others were examined here, other
mechanisms such as accessibility-diagnosticity theory (Herr, Kardes, and Kim 1991) may provide
alternative explanations for the observed relationships. For instance, there is some evidence that
when the accessibility of brand-related information increases, consumers are more likely to use
that information as an input for brand evaluations (Li and He 2013; Menon and Raghubir 2003).
Therefore, if speed is important to a brand’s positioning, the speech rate of its digital assistant
may be a more accessible piece of information during interactions and may weigh more heavily in
evaluations of the brand.
Lastly, this work used time compression techniques similar to those of early broadcasting
research on speaker voice. This technique conflates the earlier mentioned subcomponents of
speech rate: syllabic speed and interphase pausation. While results of manipulating one or both
of these subcomponents are mixed, doing so with digital assistants may provide important nuance
to future researchers and would not be difficult with modern digital audio workstations.
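To make the conflation concrete, a minimal Python sketch follows under stated assumptions: an utterance is modeled as alternating speech and pause segments with hypothetical durations and word counts, and words stand in for syllables. Uniform scaling of every segment, which is what time compression does, changes both the overall rate and the articulation rate, whereas a pause-only edit changes the overall rate while leaving articulation untouched. Real time compression operates on the audio signal itself, which this sketch does not model; all names here are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    kind: str        # "speech" or "pause"
    seconds: float   # duration of the segment
    words: int = 0   # words spoken in the segment (0 for pauses)


def words_per_minute(segments: List[Segment]) -> float:
    """Overall speech rate: words over total time, pauses included."""
    total_words = sum(s.words for s in segments)
    total_seconds = sum(s.seconds for s in segments)
    return 60.0 * total_words / total_seconds


def articulation_rate(segments: List[Segment]) -> float:
    """Words per minute counting only speech time (a proxy for syllabic speed)."""
    speech = [s for s in segments if s.kind == "speech"]
    return 60.0 * sum(s.words for s in speech) / sum(s.seconds for s in speech)


def uniform_time_scale(segments: List[Segment], factor: float) -> List[Segment]:
    """Stretch (factor > 1) or compress (factor < 1) every segment equally,
    so articulation and pausing change together."""
    return [Segment(s.kind, s.seconds * factor, s.words) for s in segments]


def pause_only_scale(segments: List[Segment], factor: float) -> List[Segment]:
    """Change only pause durations, leaving articulation untouched."""
    return [Segment(s.kind, s.seconds * factor if s.kind == "pause" else s.seconds, s.words)
            for s in segments]


if __name__ == "__main__":
    # Hypothetical utterance: 25 words over 12 seconds (two speech runs, two pauses)
    utterance = [Segment("speech", 6.0, 15), Segment("pause", 1.0),
                 Segment("speech", 4.0, 10), Segment("pause", 1.0)]
    for label, segs in [("original", utterance),
                        ("uniform x1.25", uniform_time_scale(utterance, 1.25)),
                        ("pauses x2", pause_only_scale(utterance, 2.0))]:
        print(f"{label:>14}: overall {words_per_minute(segs):6.1f} wpm, "
              f"articulation {articulation_rate(segs):6.1f} wpm")
```

In this toy example the uniform stretch lowers both rates by the same proportion, while doubling the pauses lowers only the overall rate, which is the distinction a subcomponent-level manipulation would exploit.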
These studies are among the first known to the author to provide a thorough account of the
process behind sensory effects related to the sound of a voice, as well as reasonable and
actionable tools for managers who are using digital assistants to promote their products. With
the use of digital assistants growing, it is important that brands ensure the voice of their digital
assistant provides emotional and social value to the consumer, similar to a human agent. By
establishing that managing either the speech rate or the interaction style can mitigate the
negative effects of digital assistant vocal cues, this work helps managers align the voice they
desire with the outcomes they expect. The way in which these digital assistants are designed to
speak to us can affect how we feel while interacting with them as well as how much we want to
use them. This work shows that the voice a consumer hears when interacting with a digital
assistant matters for successful consumer adoption. Overall, for people and computers alike, it’s
important to remember that our message to others isn’t conveyed just in what we say, but also in
how fast or slow we say it.
DATA COLLECTION INFORMATION
Experiments 1, 2A, 2B, 3B, and 4A were conducted at the University of Alabama from
Fall 2019 to Spring 2020 by the author, under supervision of the co-chairs and direction of the
committee members. Experiments 3A and 4B were conducted online in Spring 2020 using
Amazon MTurk, under the supervision of the co-chairs. The author analyzed the data and
prepared and wrote the manuscript in its current form.
REFERENCES
Aggarwal, Pankaj, and Ann L. McGill (2007), "Is That Car Smiling at Me? Schema Congruity as
a Basis for Evaluating Anthropomorphized Products," Journal of Consumer Research,
34, 4, 468-479.
Amazon Day One Staff (2019), “Alexa, Speak Slower,” Retrieved from
https://blog.aboutamazon.com/devices/alexa-speak-slower, August 7, 2019.
Anselmsson, Johan (2001), “Customer-Perceived Service Quality and Technology-Based
Self-Service,” Doctoral Dissertation, Lund University, Lund, Sweden: Lund Business Press.
Apple, William, Lynn A. Streeter, and Robert M. Krauss (1979), "Effects of Pitch and Speech
Rate on Personal Attributions," Journal of Personality and Social Psychology, 37, 5, 715.
Asterhan, Christa, and Baruch B. Schwarz (2007), "The Effects of Monological and Dialogical
Argumentation on Concept Learning in Evolutionary Theory," Journal of Educational
Psychology, 99, 3, 626.
Barrett, Lisa Feldman, and James A. Russell (1999), "The Structure of Current Affect:
Controversies and Emerging Consensus," Current Directions in Psychological Science,
8, 1, 10-14.
Bauer, Raymond A. (1960), Consumer Behavior as Risk Taking, Chicago, IL, 384-398.
Belardinelli, M. Olivetti, Massimiliano Palmiero, Carlo Sestieri, Davide Nardo,
Rosalia Di Matteo, Alessandro Londei, Alessandro D’Ausilio, Antonio Ferretti, Cosimo Del Gratta,
and Gian Luca Romani (2009), "An fMRI Investigation on Image Generation in Different
Sensory Modalities: The Influence of Vividness," Acta Psychologica, 132, 2, 190-200.
Benkí, José, Jessica Broome, Frederick Conrad, Robert Groves, and Frauke Kreuter (2011),
“Effects of Speech Rate, Pitch, and Pausing on Survey Participation Decisions,” In
American Association for Public Opinion Research Annual Meeting, Phoenix, AZ.
Bruner, Gordon C. (1990), "Music, Mood, and Marketing," Journal of Marketing, 54, 4, 94-104.
Burke, Michael J., Arthur P. Brief, Jennifer M. George, Loriann Roberson, and Jane Webster
(1989), "Measuring Affect at Work: Confirmatory Analyses of Competing Mood
Structures with Conceptual Linkage to Cortical Regulatory Systems," Journal of
Personality and Social Psychology, 57, 6, 1091.
Chaiken, Shelly (1980), "Heuristic Versus Systematic Information Processing and the Use of
Source Versus Message Cues in Persuasion," Journal of Personality and Social
Psychology, 39, 5, 752-766.
Charoenruk, Nuttirudee, and Kristen Olson (2018), "Do Listeners Perceive Interviewers’
Attributes from their Voices and Do Perceptions Differ by Question Type?," Field
Methods, 30, 4, 312-328.
Charpentier, Caroline J., Jessica Aylward, Jonathan P. Roiser, and Oliver J. Robinson (2017),
"Enhanced Risk Aversion, but Not Loss Aversion, in Unmedicated Pathological
Anxiety," Biological Psychiatry, 81, 12, 1014-1022.
Chattopadhyay, Amitava, Darren W. Dahl, Robin JB Ritchie, and Kimary N. Shahin (2003),
"Hearing Voices: The Impact of Announcer Speech Characteristics on Consumer
Response to Broadcast Advertising," Journal of Consumer Psychology, 13, 3, 198-204.
Cherry, John, and John Fraedrich (2002), "Perceived Risk, Moral Philosophy and Marketing
Ethics: Mediating Influences on Sales Managers' Ethical Decision-Making," Journal of
Business Research, 55, 12, 951-962.
Dahl, Darren W. (2010), “Understanding the Role of Spokesperson Voice in Broadcast Advertising,”
In Sensory Marketing: Research on the Sensuality of Products, ed. Aradhna Krishna, New
York: Routledge, 169–182.
Darwin, Charles (1872), The Expression of the Emotions in Man and Animals, London: Murray, 11.
Davis, Mark H. (1983), "Measuring Individual Differences in Empathy: Evidence for a
Multidimensional Approach," Journal of Personality and Social Psychology, 44, 1, 113.
De Ruyter, Ko, Martin Wetzels, and Mirella Kleijnen (2001), "Customer Adoption of e‐Service:
An Experimental Study," International Journal of Service Industry Management, 5.
Djordjevic, Jelena, Robert J. Zatorre, Michael Petrides, and Marilyn Jones-Gotman (2004), "The
Mind's Nose: Effects of Odor and Visual Imagery on Odor Detection," Psychological
Science, 15, 3, 143-148.
Doherty, R. William (1997), "The Emotional Contagion Scale: A Measure of Individual
Differences," Journal of Nonverbal Behavior, 21, 2, 131-154.
Eagly, Alice H., and Shelly Chaiken (1993), The Psychology of Attitudes, Harcourt Brace
Jovanovich College Publishers.
Eisenberg, Nancy, and Paul A. Miller (1987), "The Relation of Empathy to Prosocial and
Related Behaviors," Psychological Bulletin, 101, 1, 91.
Epley, Nicholas, Adam Waytz, and John T. Cacioppo (2007), "On Seeing Human: A Three-
Factor Theory of Anthropomorphism," Psychological Review, 114, 4, 864.
Fowles, Don C. (1994), "A Motivational Theory of Psychopathology," Nebraska Symposium on
Motivation, 41, 181-238.
Garlin, Francine V., and Katherine Owen (2006), "Setting the Tone with the Tune: A
Meta-Analytic Review of the Effects of Background Music in Retail Settings," Journal of
Business Research, 59, 6, 755-764.
Gatignon, Hubert, and Thomas S. Robertson (1985), "A Propositional Inventory for New
Diffusion Research," Journal of Consumer Research, 11, 4, 849-867.
Giorgetta, Cinzia, Alessandro Grecucci, Sophia Zuanon, Laura Perini, Matteo Balestrieri,
Nicolao Bonini, Alan G. Sanfey, and Paolo Brambilla (2012), "Reduced Risk-Taking
Behavior as a Trait Feature of Anxiety," Emotion, 12, 6, 1373.
Hayes, Andrew F. (2017), Introduction to Mediation, Moderation, and Conditional Process
Analysis: A Regression-Based Approach, Guilford Publications.
Herr, Paul M., Frank R. Kardes, and John Kim (1991), “Effects of Word-of-Mouth and Product-
Attribute Information on Persuasion: An Accessibility-Diagnosticity Perspective,”
Journal of Consumer Research, 17, 4, 454–62.
Jeannerod, Marc (1995), "Mental Imagery in the Motor Context," Neuropsychologia, 33, 11,
1419-1432.
Johannesen, Richard L. (1996), Ethics in Human Communication, 4th ed. Prospect Heights, IL,
Waveland Press.
Kahneman, Daniel, and Amos Tversky (2013), "Prospect Theory: An Analysis of Decision
Under Risk," In Handbook of the Fundamentals of Financial Decision Making, Part I,
99-127.
Kellaris, James J., and Moses B. Altsech (1992), "The Experience of Time as a Function of
Musical Loudness and Gender of Listener," Advances in Consumer Research, Vol. 19,
ed. J. Sherry and B. Sternthal, Provo, UT: Association for Consumer Research, 725-729.
Kinsella, Bret (2019), “45% of Millennials Use Voice Assistants While Shopping According to a
New Study,” Retrieved from https://voicebot.ai/2019/03/20/45-of-millennials-use-voice-
assistants-while-shopping-according-to-a-new-study/, May 21, 2019.
Kraus, Michael W. (2017), "Voice-Only Communication Enhances Empathic Accuracy,"
American Psychologist, 72, 7, 644.
Kraus, Rachel (2019), “John Legend’s Voice on Google Assistant is Finally Here,” Retrieved
from https://mashable.com/article/john-legend-google-assistant-voice-cameo-
launch/?utm_campaign=FEED+BLAST-Mashable+Top+Stories+Daily-
20190404T170000%2B0000&utm_source=newsletter#V1b_olNpbOqY, April 21, 2019.
Krishna, Aradhna (2013), Customer Sense: How the 5 Senses Influence Buying Behavior, New
York: Palgrave Macmillan.
Kumar, Rahul, and Akshay Rasal (2018), “Smart Speaker Market by Intelligent Virtual Assistant,”
Retrieved from https://www.alliedmarketresearch.com/smart-speaker-market, April 22,
2020.
Lane, Harlan, and François Grosjean (1973), "Perception of Reading Rate by Speakers and
Listeners," Journal of Experimental Psychology, 97, 2, 141.
Langer, Ellen J. (1989), Mindfulness, Addison-Wesley/Addison Wesley Longman.
Laukka, Petri, Clas Linnman, Fredrik Åhs, Anna Pissiota, Örjan Frans, Vanda Faria, Åsa
Michelgård, Lieuwe Appel, Mats Fredrikson, and Tomas Furmark (2008), "In a Nervous
Voice: Acoustic Analysis and Perception of Anxiety in Social Phobics’ Speech," Journal
of Nonverbal Behavior, 32, 4, 195.
Lawrence-Wood, Eleanor Ruth (2011), “Trust Me, This Is(n't) Scary!: How Trust Affects Social
Emotional Influence in Threatening Situations,” Flinders University of South Australia,
School of Psychology, 3.
Li, Yan and Hongwei He (2013), “Evaluation of International Brand Alliances: Brand Order and
Consumer Ethnocentrism,” Journal of Business Research, 66, 1, 89–97.
Lowe, Michael L., and Kelly L. Haws (2017), "Sounds Big: The Effects of Acoustic Pitch on
Product Perceptions," Journal of Marketing Research, 54, 2, 331-346.
Lowe, Michael L., Katherine E. Loveland, and Aradhna Krishna (2019), "A Quiet Disquiet:
Anxiety and Risk Avoidance Due to Nonconscious Auditory Priming," Journal of
Consumer Research, 46, 1, 159-179.
Lowe, Michael L., Christine Ringler, and Kelly Haws (2018), "An Overture to Overeating: The
Cross-Modal Effects of Acoustic Pitch on Food Preferences and Serving Behavior,"
Appetite, 123, 128-134.
MacLachlan, James, and Michael H. Siegel (1980), "Reducing the Costs of TV Commercials by
Use of Time Compressions," Journal of Marketing Research, 17, 1, 52-57.
Maner, Jon K., J. Anthony Richey, Kiara Cromer, Mike Mallott, Carl W. Lejuez, Thomas E.
Joiner, and Norman B. Schmidt (2007), "Dispositional Anxiety and Risk-Avoidant
Decision-Making," Personality and Individual Differences, 42, 4, 665-675.
Martín-Santana, Josefa D., Clara Muela-Molina, Eva Reinares-Lara, and Miriam Rodríguez-
Guerra (2015), "Effectiveness of Radio Spokesperson's Gender, Vocal Pitch and Accent
and the Use of Music in Radio Advertising," BRQ Business Research Quarterly, 18, 3,
143-160.
McTear, Michael, Zoraida Callejas, and David Griol (2016), The Conversational Interface:
Talking to Smart Devices, Springer.
Menon, Geeta, and Priya Raghubir (2003), “Ease-of-Retrieval as an Automatic Input in
Judgments: A Mere-Accessibility Framework?,” Journal of Consumer Research, 30, 2,
230–43.
Meuter, Matthew L., Mary Jo Bitner, Amy L. Ostrom, and Stephen W. Brown (2005), "Choosing
Among Alternative Service Delivery Modes: An Investigation of Customer Trial of Self-
Service Technologies," Journal of Marketing, 69, 2, 61-83.
Moon, Youngme (2000), “Intimate Exchanges: Using Computers to Elicit Self-Disclosure From
Consumers,” Journal of Consumer Research, 26, 4, 323–339.
Moon, Youngme, and Clifford Nass (1996), "How “Real” are Computer Personalities?
Psychological Responses to Personality Types in Human-Computer Interaction,"
Communication Research, 23, 6, 651-674.
__________ (1998), "Are Computers Scapegoats? Attributions of Responsibility in Human–
Computer Interactions," International Journal of Human-Computer Studies, 49, 1, 79-94.
Moore, Danny L., Douglas Hausknecht, and Kanchana Thamodaran (1986), "Time Compression,
Response Opportunity, and Persuasion," Journal of Consumer Research, 13, 1, 85-99.
Murray, Iain R., and John L. Arnott (1993), "Toward the Simulation of Emotion in Synthetic
Speech: A Review of the Literature on Human Vocal Emotion," The Journal of the
Acoustical Society of America, 93, 2, 1097-1108.
Nass, Clifford, Youngme Moon, Brian J. Fogg, Byron Reeves, and D. Christopher Dryer (1995),
"Can Computer Personalities be Human Personalities?," International Journal of Human-
Computer Studies, 43, 2, 223-239.
Nass, Clifford, Youngme Moon, and Nancy Green (1997), "Are Machines Gender Neutral?
Gender‐Stereotypic Responses to Computers with Voices," Journal of Applied Social
Psychology, 27, 10, 864-876.
Nass, Clifford, and Youngme Moon (2000), "Machines and Mindlessness: Social Responses to
Computers," Journal of Social Issues, 56, 1, 81-103.
Nesari, Ali Jamali (2015), "Dialogism Versus Monologism: A Bakhtinian Approach to
Teaching," Procedia-Social and Behavioral Sciences, 205, 642-647.
Oakes, Steve, and Adrian C. North (2006), "The Impact of Background Musical Tempo and
Timbre Congruity Upon Ad Content Recall and Affective Response," Applied Cognitive
Psychology: The Official Journal of the Society for Applied Research in Memory and
Cognition, 20, 4, 505-520.
O’Connor, Catherine, and Sarah Michaels (2007), "When is Dialogue ‘Dialogic’?," Human
Development, 50, 5, 275-285.
Paluch, Stefanie, and Nancy V. Wünderlich (2016), "Contrasting Risk Perceptions of
Technology-Based Service Innovations in Inter-Organizational Settings," Journal of
Business Research, 69, 7, 2424-2431.
Parkinson, Brian, and Gwenda Simons (2009), "Affecting Others: Social Appraisal and Emotion
Contagion in Everyday Decision Making," Personality and Social Psychology Bulletin,
35, 8, 1071-1084.
Peck, Joann, Victor A. Barger, and Andrea Webb (2013), "In Search of a Surrogate for Touch:
The Effect of Haptic Imagery on Perceived Ownership," Journal of Consumer
Psychology, 23, 2, 189-196.
Perez, Sarah (2019), “Report: Voice Assistants in Use to Triple to 8 Billion by 2023,” Retrieved
from https://techcrunch.com/2019/02/12/report-voice-assistants-in-use-to-triple-to-8-
billion-by-2023/, April 20, 2019.
Peterson, Robert A., Michael P. Cannito, and Steven P. Brown (1995), "An Exploratory
Investigation of Voice Characteristics and Selling Effectiveness," Journal of Personal
Selling and Sales Management, 15, 1, 1-15.
Pierce, David (2017), “How Apple Finally Made Siri Sound More Human,” Retrieved from
https://www.wired.com/story/how-apple-finally-made-siri-sound-more-human/, April 24,
2019.
Preacher, Kristopher J., Derek D. Rucker, and Andrew F. Hayes (2007), "Addressing Moderated
Mediation Hypotheses: Theory, Methods, and Prescriptions," Multivariate Behavioral
Research, 42, 1, 185-227.
Quené, Hugo (2007), "On the Just Noticeable Difference for Tempo in Speech," Journal of
Phonetics, 35, 3, 353-362.
Reeves, Byron, and Clifford Ivar Nass (1996), The Media Equation: How People Treat
Computers, Television, and New Media Like Real People and Places, Cambridge
University Press.
Routley, Nick (2019), “The Fight for Smart Speaker Market Share,” Retrieved from
https://www.visualcapitalist.com/smart-speaker-market-share-fight/, April 22, 2020.
Ruijten, Peter, Jacques Terken, and Sanjeev Chandramouli (2018), "Enhancing Trust in
Autonomous Vehicles Through Intelligent User Interfaces that Mimic Human Behavior,"
Multimodal Technologies and Interaction, 2, 4, 62.
Russell, James A. (1980), "A Circumplex Model of Affect," Journal of Personality and Social
Psychology, 39, 6, 1161-1178.
Schwartz, Rachel, and Marc D. Pell (2012), "Emotional Speech Processing at the Intersection of
Prosody and Semantics," PloS One, 7, 10.
Siegman, Aron W., and Stephen Boyle (1993), "Voices of Fear and Anxiety and Sadness and
Depression: The Effects of Speech Rate and Loudness on Fear and Anxiety and Sadness
and Depression," Journal of Abnormal Psychology, 102, 3, 430.
Smith, Bruce L., Bruce L. Brown, William J. Strong, and Alvin C. Rencher (1975), "Effects of
Speech Rate on Personality Perception," Language and Speech, 18, 2, 145-152.
Sullivan, Malcolm (2002), "The Impact of Pitch, Volume and Tempo on the Atmospheric Effects
of Music," International Journal of Retail and Distribution Management, 30, 6, 323-330.
Tanner Jr, John F., James B. Hunt, and David R. Eppright (1991), "The Protection Motivation
Model: A Normative Model of Fear Appeals," Journal of Marketing, 55, 3, 36-45.
Toplin, Jaime (2018), “Voice Shopping Grew Threefold During the Holidays,” Retrieved from
https://www.businessinsider.com/amazon-alexa-holiday-voice-shopping-grew-threefold-
2018-12, May 20, 2019.
Watson, David, Lee Anna Clark, and Auke Tellegen (1988), "Development and Validation of
Brief Measures of Positive and Negative Affect: The PANAS Scales," Journal of
Personality and Social Psychology, 54, 6, 1063.
Yakuel, Pini (2018), “Digital Assistant, Help Me Market My Brand,” Retrieved from
https://www.forbes.com/sites/forbescommunicationscouncil/2018/09/11/digital-assistant-
help-me-market-my-brand/#7e3496cc1a54, April 20, 2019.
Yoo, Seung-Schik, Daniel K. Freeman, James J. McCarthy III, and Ferenc A. Jolesz (2003),
"Neural Substrates of Tactile Imagery: A Functional MRI Study," Neuroreport, 14, 4,
581-585.
APPENDIX A: SOUND STIMULI USED IN EXPERIMENTS
Experiment 1/3B: Slow DA Voice (Study 1 and 3 Slow.mp3); Moderate DA Voice (Study 1 and 3 Medium.mp3); Fast DA Voice (Study 1 and 3 Fast.mp3)
Experiment 2A: Slow DA Voice (Study 2 Slow.mp3); Moderate DA Voice (Study 2 Medium.mp3); Fast DA Voice (Study 2 Fast.mp3)
Experiment 2B: Slow DA Voice (Stuy 2C Slow.mp3); Moderate DA Voice (Study 2C Moderate.mp3); Fast DA Voice (Study 2C Fast.mp3)
Experiment 4B: Slow DA Voice; Moderate DA Voice; Fast DA Voice
APPENDIX B: SCRIPTS FOR EXPERIMENTS
Experiments 1/3B: “Your monthly budget should cover your basic living expenses, including housing, utilities, insurance, transportation and groceries. You should also include any subscriptions you pay for, as well as your student loan payments. If you have any other loans – like a car loan – include those as well. Once you've recorded your living expenses and your income, you must decide what to do with the money that's left over. I recommend you put some toward an emergency fund, some toward discretionary purchases like dining out, and some toward retirement or other future savings goals. As your income increases, reevaluate your budget and always raise your savings amount before spending more on discretionary purchases to help keep yourself on track for your financial goals.”
Experiment 2A: “Your daily study routine should cover the basics of your classes for your major, any electives, and ensuring you are prepared for your final assignments, projects, and exams. It should also include taking stock of how you currently feel, as this affects your study routine. Lastly, you should start to take note of your daily schedule, taking time to study a small amount each day in order to be best prepared for the end of the semester, and making adjustments accordingly. For many people, finding time to study can be difficult. It is important that you keep in mind your upcoming schedule and exams, and plan accordingly. Managing your time benefits both studying and helps to manage stress.”
Experiment 2B: “Your classes should be selected towards your major of study. Think about certain credits you may need, classes you have taken in the past, and what extra electives you are interested in. This should include taking stock of your current classes and major, as this may affect what I recommend. Lastly, you should start to take note of your current schedule, budget, and how many credit hours you need, in order to get the best match. For many people, finding time to balance class with social life can be difficult. It is important that you keep in mind your upcoming schedule and plan accordingly. Managing your time and preferences can help find the right classes. I can help you with enrolling in classes next semester. Would you like to do this?”
Experiment 3A: “Your tickets should be selected for a show you really want to see. Think about certain music artists you may like and have listened to in the past. This should include taking stock of your tastes, as this may affect what I recommend. Lastly, you should start to take note of your current playlists and calendar of activities in order to get the best match. For many people, finding time to attend a show can be difficult. It is important that you keep in mind your upcoming schedule and plan accordingly. Managing your time and preferences can help find the right tickets. I can help you with purchasing concert tickets. Would you like to do this?”
Experiment 4A: “Your daily health routine should cover some basics like emotional, physical, and mental well-being. This should include taking stock of how you currently feel. You should also consider how you would describe your mood in general, or most of the time. Lastly, you should start to take note of your daily routine, taking time to check your emotional state and adjust if needed. For many people, stress is a part of daily life that can go unrecognized. It is important that you keep in mind your feelings and experiences as you go through the day. Managing your stress levels benefits your emotional health as well as other areas of your life.”
Experiment 4B: “Your daily nutrition plan should cover some basics like calories, sugar, cholesterol and fat intake. This should include taking stock of how you currently feel as well as any health goals you may have. You should also consider which types of food you enjoy and any allergies that may prevent you from eating certain foods. Lastly, you should start to take note of your daily food routine, taking time to check your physical hunger and emotional state and adjust if needed. For many people, eating healthy is a part of daily life that can be overlooked. It is important that you keep in mind your current health goals and what you eat throughout the day. Managing your stress levels benefits both your physical and emotional health.”
APPENDIX C: MEASURES FOR EXPERIMENTS
Likelihood to Use DA (main dependent variable in all studies)
Source: Adapted from Chattopadhyay et al. (2003)
Items:
1. “How likely would you be to create a personal budget with the digital banker?” (Experiment 1/3B)
2. “How likely would you be to create a personal study plan with the digital assistant?” (Experiment 2A)
3. “How likely would you be to enroll in classes using the digital advisor?” (Experiment 2B)
4. “How likely would you be to purchase concert tickets using the digital assistant?” (Experiment 3A)
5. “How likely would you be to create a personal emotional health check using the digital health coach?” (Experiment 4A)
6. “How likely would you be to create a personal health plan using the digital health coach?” (Experiment 4B)
Reliability and scale: Single-item; 1 = Not Very Likely, 7 = Very Likely

Nervousness (a negative affective state of high activation in which one feels tension following perceived uncertainty or strain)
Source: Watson, Clark, and Tellegen (1988), taken as part of the PANAS
Items:
1. “When interacting, I felt…jittery; distressed; anxious” (Experiments 2A-B, 4A-B)
Reliability and scale: 3-item measure; α2A = .928, α2B = .854, α4A = .947, α4B = .898; 1 = Not at All, 7 = Very Much

Self-Service Technology (SST) Risk (risk arising from unanticipated and uncertain consequences of an unpleasant nature resulting from an interaction)
Source: Bauer (1960), Meuter et al. (2005)
Items (Experiment 2B):
1. “I am unsure if the assistant will perform satisfactorily”
2. “Overall, using this digital assistant is risky”
3. “The digital assistant didn’t sound like it was designed to do this task well”
Reliability: 3-item measure; α = .841

Manipulation Check for Voice Speed
Source: Martín-Santana et al. (2015)
Items:
1. The voice I heard when the digital assistant was speaking was ________
Reliability and scale: Single-item; 1 = Slow, 7 = Fast

Manipulation Check for Interaction Style
Source: Adapted from Asterhan and Schwarz (2007)
Items:
1. Would you say that the interaction with the digital assistant was more passive, or one-directional, or active, and two-directional? (1 = One-directional, 7 = Two-directional)
2. Would you say that the interaction with the digital assistant was more about hearing information or involved instruction? (1 = Hearing Information, 7 = Involved Instruction)
Reliability: Two items; r4A = .862, r4B = .899

Vividness (Experiment 2A)
Source: Peck, Barger, and Webb (2013)
Items:
1. I felt as if the digital assistant was in the device in front of me
2. I could imagine interacting with the digital assistant
3. I felt I could examine the digital assistant/device
4. I was able to imagine using the digital assistant
Reliability and scale: 4-item measure; α = .848; 1 = Strongly Disagree, 7 = Strongly Agree

Susceptibility to Emotions of Others (Experiment 3A)
Source: Doherty (1997)
Items:
1. If someone I’m talking with begins to cry, I get teary-eyed
2. Being with a happy person picks me up when I’m feeling down
3. When someone smiles warmly at me, I smile back and feel warm inside
4. I get filled with sorrow when people talk about the death of their loved ones
5. I clench my jaws and my shoulders get tight when I see angry faces on the news
6. When I look into the eyes of the one I love, my mind is filled with thoughts of romance
7. It irritates me to be around angry people
8. Watching the fearful faces of victims on the news makes me try to imagine how they might be feeling
9. I melt when the one I love holds me close
10. I tense when overhearing an angry quarrel
11. Being around happy people fills my mind with happy thoughts
12. I sense my body responding when the one I love touches me
13. I notice myself getting tense when I’m around people who are stressed out
14. I cry at sad movies
15. Listening to the shrill screams of a terrified child in a dentist’s waiting room makes me feel nervous
Reliability and scale: 15-item measure; α = .902; 1 = Strongly Disagree, 7 = Strongly Agree
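The α values reported above are internal-consistency estimates. As a minimal sketch, assuming they are standard Cronbach's alpha coefficients computed on raw item scores (the appendix does not state which software produced them), α for k items is k/(k - 1) × (1 - sum of item variances / variance of the summed scale). The function name `cronbach_alpha` and the example ratings are hypothetical and are not data from these experiments.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = items.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # sample variance of the summed scale
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


if __name__ == "__main__":
    # Hypothetical 1-7 ratings on three items (e.g., jittery, distressed, anxious)
    ratings = [
        [2, 3, 2],
        [5, 4, 5],
        [1, 2, 1],
        [6, 6, 7],
        [3, 3, 4],
    ]
    print(f"alpha = {cronbach_alpha(ratings):.3f}")
```

Using sample variances (ddof=1) throughout keeps the ratio consistent; because the same denominator appears in the item and total variances, using population variances instead would give the same α.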
APPENDIX D: MEANS AND SD’s FOR PANAS MEASURES IN EXPERIMENT 2A
Experiment 2A: Means and (SDs)
Measure Slow Moderate Fast p
Interested 2.73 (1.65) 2.69 (1.57) 2.64 (1.55) 0.920
Distressed 3.46 (1.87) 4.05 (1.71) 3.76 (1.84) 0.070
Excited 2.90 (1.67) 3.35 (1.59) 3.23 (1.57) 0.126
Upset 2.31 (1.49) 2.26 (1.39) 2.17 (1.45) 0.789
Strong 3.91 (1.82) 3.73 (1.57) 3.72 (1.68) 0.696
Guilty 1.96 (1.22) 1.78 (1.12) 1.89 (1.23) 0.584
Scared 1.74 (1.18) 1.89 (1.32) 1.90 (1.13) 0.593
Hostile 2.27 (1.40) 2.01 (1.33) 2.10 (1.42) 0.414
Enthusiastic 2.93 (1.66) 3.28 (1.61) 3.26 (1.63) 0.237
Proud 3.63 (1.80) 3.45 (1.55) 3.65 (1.63) 0.641
Irritable 3.62 (1.96) 3.29 (1.80) 3.09 (1.82) 0.137
Alert 3.80 (1.80) 3.93 (1.66) 3.92 (1.72) 0.851
Ashamed 1.91 (1.23) 1.83 (1.12) 1.85 (1.28) 0.906
Inspired 3.18 (1.77) 3.45 (1.74) 3.37 (1.75) 0.535
Anxious 2.22 (1.52) 1.82 (1.66) 2.24 (1.42) 0.046
Determined 3.71 (1.87) 4.05 (1.74) 3.99 (1.72) 0.368
Attentive 4.06 (1.77) 4.27 (1.66) 4.17 (1.78) 0.700
Jittery 3.21 (1.82) 2.62 (1.72) 3.19 (1.78) 0.031
Active 3.71 (1.85) 3.30 (1.66) 3.53 (1.61) 0.239
Afraid 1.87 (1.27) 2.00 (1.46) 1.83 (1.24) 0.642
Vividness 3.74 (1.41) 4.05 (1.50) 4.12 (1.46) 0.155
PANAS Positive Affect 3.45 (1.23) 3.55 (1.16) 3.55 (1.14) 0.812
PANAS Negative Affect 2.45 (.91) 2.35 (.90) 2.40 (.92) 0.748
APPENDIX E: INSTITUTIONAL REVIEW BOARD APPROVAL LETTER
APPENDIX F: TABLES
Table 1
Experiments 2A-B – Mediation Results
IV: Speech Rate; DV: Likelihood to Use DA; Mediator: Listener Nervousness
Experiment 2A (PROCESS Model 4)
X1: Slow vs. Moderate: indirect effect = -.0796, 95% CI = -.1980 to -.0022
X2: Fast vs. Moderate: indirect effect = -.0843, 95% CI = -.1961 to -.0068
X3: Slow vs. Fast: indirect effect = .0047, 95% CI = -.0949 to .0965
Experiment 2B (PROCESS Model 6; Mediator 2: SST Risk)
X1: Moderate versus Slow Speech: indirect effect = -.0259, 95% CI = -.0539 to -.0039
X2: Moderate versus Fast Speech: indirect effect = -.0124, 95% CI = -.0320 to -.0020
X3: Slow versus Fast Speech: indirect effect = -.0077, 95% CI = -.0241 to .2311
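Table 1 reports relative indirect effects of a multicategorical speech-rate IV with 95% confidence intervals, the kind of output produced by Hayes's (2017) PROCESS macro. The sketch below is a simplified, hypothetical re-implementation of a Model 4 style analysis, not the PROCESS macro itself, and it does not cover the serial Model 6 used in Experiment 2B. It assumes indicator coding with the moderate speech rate as the reference group, ordinary least squares for both the mediator and outcome equations, and percentile bootstrap intervals. Each relative indirect effect is a × b, where a is a group's effect on the mediator and b is the mediator's effect on the outcome controlling for group. The function names and simulated data are illustrative only.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients; an intercept column is prepended to X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def indirect_effects(x, m, y, groups):
    """a*b indirect effect of each group in `groups` relative to the omitted reference group."""
    D = np.column_stack([(x == g).astype(float) for g in groups])  # indicator (dummy) coding
    a = ols(m, D)[1:]                        # M ~ D: group effects on the mediator
    b = ols(y, np.column_stack([D, m]))[-1]  # Y ~ D + M: mediator effect on the outcome
    return {g: a_g * b for g, a_g in zip(groups, a)}


def bootstrap_indirect(x, m, y, reference, n_boot=5000, seed=0):
    """Point estimate and 95% percentile-bootstrap CI for each relative indirect effect."""
    x, m, y = np.asarray(x), np.asarray(m, float), np.asarray(y, float)
    groups = [g for g in np.unique(x) if g != reference]
    point = indirect_effects(x, m, y, groups)
    draws = {g: np.empty(n_boot) for g in groups}
    rng = np.random.default_rng(seed)
    for i in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # resample cases with replacement
        est = indirect_effects(x[idx], m[idx], y[idx], groups)
        for g in groups:
            draws[g][i] = est[g]
    return {g: (point[g], *np.percentile(draws[g], [2.5, 97.5])) for g in groups}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.repeat(["slow", "moderate", "fast"], 100)                # hypothetical conditions
    m = np.where(x == "moderate", 0.0, 1.0) + rng.normal(size=300)  # simulated nervousness
    y = 5 - 0.8 * m + rng.normal(size=300)                          # simulated likelihood to use
    for group, (est, lo, hi) in bootstrap_indirect(x, m, y, "moderate", n_boot=1000).items():
        print(f"{group} vs. moderate: {est:.3f} [{lo:.3f}, {hi:.3f}]")
```

A pairwise contrast such as Slow versus Fast could be obtained in the same loop by storing the difference between the two relative indirect effects in each bootstrap draw; the groups are fixed from the full sample so that a resample missing a condition still runs.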
Table 2
Experiments 4A-B – Moderated Mediation by Nervousness and Interaction Style
IV: Speech Rate; DV: Likelihood to Use DA; Mediator: Felt Nervousness; Moderator: Interaction Style (PROCESS Model 8)
Values are 95% CIs for the index of moderated mediation and for the conditional indirect effects at each level of the dichotomous moderator.
Experiment 4A
X1: Moderate versus Slow: index of moderated mediation .0082 to .4629; monological -.3469 to -.0211; dialogical -.1057 to .2117
X2: Moderate versus Fast: index of moderated mediation .0916 to .5671; dialogical -.0971 to .1953
X3: Slow versus Fast: index of moderated mediation -.1216 to .3295; monological -.2618 to .0537; dialogical -.1618 to .1624
Experiment 4B
X1: Moderate versus Slow: index of moderated mediation .0343 to .3674; dialogical -.0715 to .1444
X2: Moderate versus Fast: index of moderated mediation .0414 to .4204; dialogical -.0546 to .1705
X3: Slow versus Fast: index of moderated mediation -.1135 to .2092; monological -.1406 to .0842; dialogical -.0944 to .1403
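In PROCESS Model 8 (Hayes 2017), the moderator W enters both the mediator equation and the outcome equation, so the conditional indirect effect of X at W = w is (a1 + a3·w)·b and the index of moderated mediation is a3·b. The sketch below is a minimal, hypothetical illustration of those quantities, assuming a single 0/1 speech-rate contrast (for example, moderate versus slow, analyzed on those two conditions only) and interaction style coded 0 = monological, 1 = dialogical. That coding, the function names, and the simulated data are assumptions; the multicategorical analysis in Table 2 estimates all contrasts jointly.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients; an intercept column is prepended to X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def model8_effects(x, w, m, y):
    """Conditional indirect effects of a 0/1 contrast x, moderated by a 0/1 moderator w."""
    xw = x * w
    a = ols(m, np.column_stack([x, w, xw]))         # M = a0 + a1*X + a2*W + a3*X*W
    b = ols(y, np.column_stack([x, w, xw, m]))[-1]  # Y = c0 + c1*X + c2*W + c3*X*W + b*M
    a1, a3 = a[1], a[3]
    return {
        "indirect_w0": a1 * b,         # indirect effect when W = 0 (assumed monological)
        "indirect_w1": (a1 + a3) * b,  # indirect effect when W = 1 (assumed dialogical)
        "index": a3 * b,               # index of moderated mediation (difference of the two)
    }


def bootstrap_model8(x, w, m, y, n_boot=5000, seed=0):
    """Point estimates with 95% percentile-bootstrap CIs for each quantity above."""
    x, w, m, y = (np.asarray(v, float) for v in (x, w, m, y))
    point = model8_effects(x, w, m, y)
    draws = {k: np.empty(n_boot) for k in point}
    rng = np.random.default_rng(seed)
    for i in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # resample cases with replacement
        est = model8_effects(x[idx], w[idx], m[idx], y[idx])
        for k in point:
            draws[k][i] = est[k]
    return {k: (point[k], *np.percentile(draws[k], [2.5, 97.5])) for k in point}


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 400
    x = np.tile([0.0, 1.0], n // 2)    # hypothetical speech-rate contrast
    w = np.repeat([0.0, 1.0], n // 2)  # hypothetical interaction style
    m = 1.0 * x - 1.0 * x * w + rng.normal(size=n)  # effect on nervousness only when W = 0
    y = 5 - 0.8 * m + rng.normal(size=n)
    for name, (est, lo, hi) in bootstrap_model8(x, w, m, y, n_boot=1000).items():
        print(f"{name}: {est:.3f} [{lo:.3f}, {hi:.3f}]")
```

A CI for the index that excludes zero indicates that the indirect effect differs between the two interaction styles, which is the pattern the table summarizes for the moderate-versus-slow and moderate-versus-fast contrasts.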
APPENDIX G: FIGURES
FIGURE 1
CONCEPTUAL OVERVIEW OF ALL EXPERIMENTS
FIGURE 2
EXPERIMENT 2B – SERIAL MEDIATION OF SPEECH RATES ON LIKELIHOOD TO USE DA
FIGURE 3
EXPERIMENT 3A – JOHNSON-NEYMAN GRAPH OF MODERATION OF LIKELIHOOD TO USE DA BY SUSCEPTIBILITY TO OTHERS’ EMOTIONS
FIGURE 4
EXPERIMENT 3B – LIKELIHOOD TO USE DA ACROSS INTERACTION STYLE AND SPEECH RATE
[Chart: Likelihood to Use DA (0.0 to 6.0) by Interaction Style (Monological, Dialogical), with series for Slow, Moderate, and Fast speech rates]
FIGURE 5
EXPERIMENT 4A – LIKELIHOOD TO USE DA ACROSS INTERACTION STYLE AND SPEECH RATE
[Chart: Likelihood to Use DA (0.0 to 6.0) by Interaction Style, with series for Slow, Moderate, and Fast speech rates]
FIGURE 6
EXPERIMENT 4B – LIKELIHOOD TO USE DA ACROSS INTERACTION STYLE AND SPEECH RATE
[Chart: Likelihood to Use DA (0.0 to 6.0) by Interaction Style, with series for Slow, Moderate, and Fast speech rates]
APPENDIX H: HEADINGS LIST
1) THEORETICAL FRAMEWORK
2) Elements of Speech
2) Speech Rate
2) Social Response
2) Responses to Speech
3) Nervousness
3) Interaction Style
1) OVERVIEW OF STUDIES
1) EXPERIMENT 1
2) Method
3) Participants and Design
3) Stimuli and Pretests
3) Procedure
3) Measures
2) Results and Discussion
3) Speech Rate
3) Likelihood to use the DA
1) EXPERIMENT 2A
2) Method
3) Procedure
3) Measures
3) Speech Rate
3) Nervousness
3) Mediation of Main Effect by Nervousness
1) EXPERIMENT 2B
2) Method
3) Procedure
3) Measures
3) Speech Rate
3) SST Risk
3) Serial Mediation Analyses
1) EXPERIMENT 3A
2) Method
3) Procedure
3) Measures
3) Speech Rate
3) Personal Differences in Susceptibility to Others’ Emotions
1) EXPERIMENT 3B
2) Method
3) Procedure
3) Measures
3) Speech Rate
1) EXPERIMENT 4A
2) Method
3) Procedure
3) Measures
3) Manipulation Checks
3) Nervousness
3) Moderated Mediation
1) EXPERIMENT 4B
2) Method
3) Procedure
3) Measures
3) Manipulation Checks
3) Nervousness
3) Moderated Mediation
1) GENERAL DISCUSSION
2) Theoretical Implications
2) Managerial Implications
2) Limitations and Future Research
1) DATA COLLECTION INFORMATION