
SPEAKING FAST AND SLOW: HOW SPEECH RATE OF DIGITAL ASSISTANTS

AFFECTS LIKELIHOOD TO USE

by

BRETT ALAN CHRISTENSON

CHRISTINE RINGLER, COMMITTEE CHAIR


NANCY J. SIRIANNI, COMMITTEE CO-CHAIR
ARTHUR W. ALLAWAY
KRISTY E. REYNOLDS
PETER D. HARMS

A DISSERTATION

Submitted in partial fulfillment of the requirements


for the degree of Doctor of Philosophy
in the Department of Marketing
in the Graduate School of
The University of Alabama

TUSCALOOSA, ALABAMA

2020
Copyright Brett Alan Christenson 2020
ALL RIGHTS RESERVED
ABSTRACT

Digital assistants like Siri and Alexa are adaptable service robots which interact vocally

to deliver services to consumers. During interactions, these digital assistants provide a unique

opportunity for marketers to convey social and emotional information by altering qualities of

their voices, such as speech rate. However, as brands begin to adapt their digital assistant voices,

they have little research to guide them in creating positive consumer responses and avoiding

negative ones. Across seven experiments, the process of how digital assistant speech rate affects

consumer emotions and likelihood to use is uncovered and explained. Nervousness and risk

facilitate the process, while interaction style and personal differences are shown to moderate the

effects. Experiment 1 begins by showing speech rates which are significantly faster or slower can

negatively impact likelihood to use a digital assistant. Following this, experiments 2A-B uncover

the process of effects and show feelings of nervousness as well as risk mediate the relationship

between speech rate and usage intentions. Experiments 3A-B provide managers with applicable

moderators for the effects while experiments 4A-B provide a complete moderated mediation

model. This work contributes to the sensory marketing literature focused on sound and its impact

upon consumer perceptions and behaviors.

Keywords: sound, voice, sensory marketing, digital assistant, human-computer interaction

DEDICATION

To Tessa, Puma, Paprika, Nala, and Stellan.

LIST OF ABBREVIATIONS AND SYMBOLS

a Coefficient Alpha

b Coefficient Beta

CI Confidence Interval

d Cohen’s d

DA Digital Assistant

DV Dependent Variable

F F-Statistic

HCI Human-Computer Interaction

IV Independent Variable

N Sample Size

ƞp² Partial Eta Squared

M Mean

% Percentage Sign

PANAS Positive and Negative Affect Schedule

p p-Value

SD Standard Deviation

SE Standard Error

t T-statistic

ACKNOWLEDGEMENTS

Many people helped make this dissertation possible, so these acknowledgements should

not be considered an exhaustive list. First and foremost, I recognize and thank my co-chairs,

Christine Ringler and Nancy Sirianni, who have had a significant impact on my development as

an academic in many ways. They have both been formative in shaping my approach to scholarly

work as well as my career and this project would not have been possible without their guidance. I

would like to specifically thank Christine for holding me to a high standard throughout the

program. This motivated me to keep working harder and kept me active in thinking about my

projects and how to improve, which has made my work incredibly better. I’d also like to

specifically thank Nancy for opening my range of interests to other research areas as well as

giving me guidance on some of the unwritten stuff about PhDing. The both of you have guided

me through this process, and I’m thankful for it.

Second, I appreciate my committee members: (1) Kristy Reynolds, for being supportive

and open to me asking questions on short notice as well as working with me on other projects,

(2) Buster Allaway, for reminding me from time to time work can be fun and humor goes a long

way to reducing stress, and (3) Peter Harms, who has unknowingly served as my personal

inukshuk for this and many other projects.

Third, I thank the scholars currently conducting research on this topic, for their

dedication to the area of sensory marketing, computer interactions, and consumer behavior as

well as their willingness to lay the groundwork upon which this project attempts to build.

Fourth, I want to thank my parents, Debbie and Max, who set the example and showed

me the value of what an education can provide for someone and who went out of their way to

ensure I had the best one possible. You’re the ones who started me on this path at Reagan, and

I’m thankful every day for it. Look at where it led!

Last and never least, I want to acknowledge that none of this would be possible without

my wife, Tessa. You’re the smartest, most well-adjusted person I know and I wouldn’t have been

able to do this without you. There aren’t enough pages available to really say how much you

inspire who I am and what I do.

CONTENTS

ABSTRACT

DEDICATION

LIST OF ABBREVIATIONS AND SYMBOLS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

THE MOVE TO VOICE

THEORETICAL FRAMEWORK

OVERVIEW OF STUDIES

EXPERIMENT 1

EXPERIMENT 2A

EXPERIMENT 2B

EXPERIMENT 3A

EXPERIMENT 3B

EXPERIMENT 4A

EXPERIMENT 4B

GENERAL DISCUSSION

DATA COLLECTION INFORMATION

REFERENCES

APPENDIX A: SOUND STIMULI USED IN STUDIES

APPENDIX B: SCRIPTS FOR EXPERIMENTS

APPENDIX C: MEASURES FOR STUDIES

APPENDIX D: MEANS AND SD’S FOR PANAS MEASURES IN STUDY 2A

APPENDIX E: INSTITUTIONAL REVIEW BOARD APPROVAL LETTER

APPENDIX F: TABLES

APPENDIX G: FIGURES

APPENDIX H: HEADINGS LIST

LIST OF TABLES

1. PROCESS Measures for Studies 2A-B

2. PROCESS Measures for Studies 4A-B

LIST OF FIGURES

1. Conceptual Overview of All Studies

2. Experiment 2B – Serial Mediation of Speech Rates on Likelihood to Use DA

3. Experiment 3A – Johnson-Neyman Graph of Moderation of Likelihood to Use DA by Susceptibility to Others Emotions

4. Experiment 3B – Likelihood to Use DA Across Interaction Style and Speech Rate

5. Experiment 4A – Likelihood to Use DA Across Interaction Style and Speech Rate

6. Experiment 4B – Likelihood to Use DA Across Interaction Style and Speech Rate

THE MOVE TO VOICE

Over the last few years, digital assistants have increasingly become a central point of

communication between consumers and their connected devices. Digital assistant (DA) use is set

to triple over the next few years, with projections of 1.8 billion users by 2021 and over 8 billion

DA-enabled devices by 2023 (Perez 2019; Yakuel 2018). Tech companies have taken notice and

are betting the future of personal computing will be driven by vocal interaction with these

assistants, which integrate multiple devices together into a connected ecosystem (Routley 2019).

These ecosystems allow a consumer to be connected to their assistant no matter where they go,

and encourage multiple device purchases from a single brand because integration across brands

is not an option. For many of these companies, the first step in attracting consumers into their

ecosystem is adoption of their smartphones and speakers, which are enabled with their branded

DA. The smart speaker market alone is expected to be over $23 billion by the end of 2025,

indicating a large bottom-line impact for brands who invest in attracting consumers early on into

their DA ecosystem rather than a competitor’s (Kumar and Rasal 2018). For these brands, DAs

are a tool that can be used to provide social value to consumers through vocal interaction.

However, little research has been done in the field of marketing to address how the audible cue

of a DA’s voice impacts consumer adoption of these assistants.

In an effort to provide more social value to consumers and increase DA adoption, brands

have begun to adapt their DA voices for major release. Google has customizable add-ons for

their DA, like the voice of a favorite celebrity (Kraus 2019), and Apple continually updates the

voice of Siri to more closely resemble the speech of a human agent (Pierce 2017). In the summer

of 2019, Amazon updated Alexa to be more adaptable to consumer preferences in terms of the

speed at which she speaks (Amazon 2019). Consumers using Alexa-enabled devices can now

choose from seven speeds – Alexa's standard speaking rate, four faster speaking rates, and two

slower speaking rates. This newest feature is expected to create more social interactions because

it is more realistic and similar to how a human speaks. However, it also raises the question: what

effect does varying speech rate have on consumers and how does theory explain these effects?

These are some of the questions this research seeks to answer.

While consumer industry leaders are already adapting their DAs’ voices, research to

support these updates is scant and has been either limited to areas other than marketing or has

focused on broadcast advertising like TV and radio. Therefore, the presented studies seek to

contribute to the marketing literature related to sounds, specifically the sound of a voice, by

integrating theory from communications with consumer behavior research. More specifically,

this work investigates how the speech rate of a DA affects consumers and lowers their likelihood

to continue using it. Without knowledge of how a DA’s voice impacts a consumer’s likelihood to

use it, marketers’ updates may be hurting rather than helping their efforts to gain market share in

the early stages of DA growth. Therefore, this work is important to both the marketing literature

as well as to industry.

THEORETICAL FRAMEWORK

In the area of sensory marketing, research on sound shows that meaning in verbal

communication is delivered not only in what we say, but how we say it (Peterson, Cannito, and

Brown 1995). When we speak, what we say is classified as the linguistic content of a message

while the way we speak is classified as paralinguistic content (Apple, Streeter, and Krauss 1979).

Paralinguistic aspects of our speech are made up of four basic adjustable elements: volume,

pitch, timbre, and speech rate.

Elements of Speech

The volume of a voice is indicated by amplitude and is perceived as the loudness or

softness of a sound (Bruner 1990; Krishna 2013). There is evidence to show that the volume of

background sound in restaurants impacts expenditures on food and beverages (Sullivan 2002).

Pitch of a voice is made up of the sound wave frequency and is often perceived as being on a

spectrum from low to high (Krishna 2013; Lowe, Ringler, and Haws 2018). Manipulations of

pitch have been shown to impact perceptions of product size (Lowe and Haws 2017). The third

property of a voice is its timbre, or the harmonics of the vocal sound wave (Bruner 1990).

Timbre allows a person to discern between two speakers who have identical volume and pitch,

but differ in terms of their harmonic texture. Research on timbre indicates positive effects of

certain timbres on affective responses and recall of advertisements (Oakes and North 2006).

Speech Rate

Of particular importance to the presented experiments, the fourth paralinguistic aspect of

a voice is its speech rate, or the tempo at which a person speaks. A common practice in broadcast

advertising is to speed up, or time compress, advertisements so they fit into a specified time slot

on air. When an ad is time compressed, the audio elements are sped up, creating the effect of

faster speech rate. One of the most well-known effects of speech rate was shown by

Chattopadhyay et al. (2003), who indicated speech rate can interact with pitch. Their results

show a voice with faster-than-normal syllable speed and low pitch can produce more favorable

ad and brand attitudes. Their results for speech rate, however, are confounded with other

paralinguistic qualities, making the inference of effects for only speech rate harder to discern. In

earlier work, MacLachlan and Siegel (1980) produce evidence that participants have better recall

of brands and often prefer content that is sped up. These authors test their hypotheses using a

single study with TV commercials which confound the senses of vision and audition, making

inferences about speech rate difficult. Moore, Hausknecht, and Thamodaran (1986) dispute the

cognitive recall findings of MacLachlan and Siegel (1980) using manipulations of audio only.

They argue faster commercials reduce a consumer’s time to elaborate on ad information and lead

to mixed results for speech effects. They conclude that when someone hears a fast speech rate,

they use it as a peripheral cue for processing difficulty. Therefore, the focus is less on what is

being said and more on how it is being said. The limited work examining slower speech rates

shows a similar effect to that of faster speech, but is also confounded with other paralinguistic

qualities. For instance, Benki et al. (2011) investigate phone interviewers’ voices and their

impact upon a respondent’s choice to agree, refuse, or defer in taking a survey. Their results

show slower speakers elicit the lowest amount of respondent participation. This single

experiment, however, confounds speech rate with pitch as well as pausing. More recent work by

Charoenruk and Olson (2018) posits listeners find it difficult to hold relevant information about

complex topics in working memory if a speaker talks at a slow rate, producing lower recall.

While this short review provides a basis for the current studies, it also indicates many

opportunities for contribution. There has been very little attention in marketing given to work on

vocal sounds over the past decade, with much of the aforementioned research highlighting only

initial findings (Dahl 2010). Therefore, the current studies seek to expand the sensory marketing

literature on the auditory cue of a voice by investigating the process of its impact on consumers.

More specifically, they examine how the speech rate of a DA’s voice affects consumers’ negative reactions and

their likelihood to use it. These effects are explained by social response theory, discussed next.

Social Response

The communication theory of social response explains human interactions with

computers, technology, and new media (Reeves and Nass 1996). According to the theory, people

tend to interact with computers as if they are social actors, even when they are aware the

machine does not actually possess feelings or motivations (Moon 2000; Nass and Moon 2000;

Nass, Moon, and Green 1997). When people interact with technology exhibiting human-like

characteristics, the response is reflexive and occurs without substantial deliberation (Reeves and

Nass 1996). This phenomenon aligns with research showing people make social attributions and

responses “mindlessly” (Langer 1989; Nass and Moon 2000) and use heuristics to simplify

extensive processing of information (Chaiken 1980; Eagly and Chaiken 1993). When using

computers equipped with voice output, participants have been shown to psychologically orient

themselves towards the voice, imbuing it with distinct personality types (Moon and Nass 1996,

1998; Nass et al. 1995; Nass, Moon, and Green 1997). In effect, human social responses to

objects with humanlike characteristics, like speech, are influenced by their interactions with the

object (Aggarwal and McGill 2007; Epley, Waytz, and Cacioppo 2007). Given this, the voice of

a technology is a social cue that is salient and likely to become the relational target of a social

response. Digital assistants therefore present an ideal context in which to investigate the effects

of the speech rate cue upon social responses, both positive and negative.

Based on social response theory, because vocalization is an evolved human behavior,

DAs which talk to us should be imbued with human traits and consumer responses should be

similar to interactions with other humans. Indeed, some modern conversational interfaces are

now able to interact with us in very humanlike ways when they are combined with artificial

intelligence technologies (McTear, Callejas, and Griol 2016; Ruijten, Terken, and Chandramouli

2018). While a majority of existing self-service technologies, such as ATMs, lack the

capacity to engage consumers socially, DA technologies using voice are uniquely capable of

engaging in meaningful social encounters with humans. The theory of social response provides

an explanatory framework for why these social interactions occur between humans and DAs.

Given our interactions with computers can be similar to those with other people, prior

investigations of emotional and social responses between humans should help identify potential

processes of the effects between humans and robots. Since DAs interact vocally, research on

sounds and how they impact our emotions serves as a fruitful starting point.

Responses to Speech

In a review of the literature on human speech, Murray and Arnott (1993) show that

paralinguistic vocal qualities are associated with basic emotions. Their review reinforced an

earlier proposition by Darwin (1872), that the voice is a sophisticated tool used to indicate an

individual's emotional state. When we communicate verbally, we disclose information about our

biological, psychological, or social status through vocal variations (Kraus 2017; Schwartz and

Pell 2012). For example, high-pitched voices are judged as less truthful, less emphatic, and more

nervous (Apple, Streeter, and Krauss 1979). Additionally, voices which have fast or slow speech

rates are perceived as less benevolent (Smith et al. 1975) and overall less liked (Benki et al.

2011; Murray and Arnott 1993). These studies, however, are limited to measuring consumer

attributions about the person speaking. The effects of a voice upon the listener have yet to be

measured and are addressed in the presented studies.

Overall, the literature indicates that when we hear a sound which is dissonant or

unpleasant, we generally have negative reactions. Social response theory indicates these

reactions as being physiological as well as emotional. When interactions produce positive

affective responses they facilitate approach behaviors, while negative affective responses inhibit

this behavior (Fowles 1994). Given this, a consumer who hears a sound which isn’t pleasing

should be less likely to want to hear it again. This means consumers who interact with a DA that

speaks in a way that is not pleasing, such as too fast or slow, should be less likely to want to use

it. Stated more formally:

H1: Consumers who use a DA that speaks at a fast or slow rate will be less likely to

want to use it than consumers who interact with a DA that speaks at a moderate rate.

Building on the main effect hypothesis, a logical next question is which mechanism

explains the effect. A plausible candidate would be a negatively valenced and highly activated

state, such as nervousness. A common theme in communication research links speech rate with

specific emotional expressions, like speaker nervousness (Laukka et al. 2008; Siegman and

Boyle 1993). Nervousness in a speaker’s voice is indicated by irregular variations in

paralinguistic qualities and is easily perceived by the person who hears it.

Nervousness. The circumplex model of affect describes emotions in terms of two

orthogonal dimensions of valence and activation (Barrett and Russell 1999; Russell 1980).

Nervousness is defined as a highly activated, negative-valenced state associated with feelings of

distress and being jittery (Watson, Clark, and Tellegen 1988). This state matches the description

given by Burke et al. (1989) and is characterized by avoidant behavior (Lowe, Loveland, and

Krishna 2019). Building on the earlier proposition that positive affect facilitates approach

behavior whereas negative affect inhibits similar behavior, nervousness becomes a prime

candidate for the mechanism of the effects in hypothesis 1. Nervousness is also a prime

candidate because it differs from other highly activated and negatively valenced states. For

example, unlike fear, it does not include the perception of physical harm (Tanner, Hunt, and

Eppright 1991), and unlike anger, it does not facilitate approach behaviors (Watson, Clark,

and Tellegen 1988). Given this, two hypotheses follow:

H2: A DA which speaks at a fast or slow rate will produce greater nervousness in the

consumer who hears it compared to when it speaks at a moderate rate.

H3: Feelings of consumer nervousness mediate the relationship between DA speech

rate and consumer likelihood to want to use it, with increases in consumer

nervousness lowering their likelihood to use the digital assistant.
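
As a minimal sketch of the mediation logic in H2 and H3, the relationships can be written as two regression equations. The notation is illustrative and is not taken from the analyses reported later: X is assumed to be an indicator coded 1 for a fast or slow speech rate and 0 for a moderate rate, M is consumer nervousness, and Y is likelihood to use the DA.

```latex
\begin{align}
M &= i_M + aX + e_M \\
Y &= i_Y + c'X + bM + e_Y \\
\text{indirect effect of } X \text{ on } Y &= ab
\end{align}
```

Under this assumed coding, H2 corresponds to a > 0 and H3 to b < 0, so the indirect effect ab is expected to be negative.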

While an understanding of how a voice can lead to lower usage intentions is unique, a

larger contribution can be made by providing evidence for the deeper process of nervousness

effects. Research on both emotional processing as well as decision making indicates perceptions

of risk as a viable candidate for facilitating feelings of nervousness (Carpentier et al. 2017). For

example, prospect theory states that in the face of gains, nervous consumers prefer a

guaranteed option over a risky one (Kahneman and Tversky 2013). Bias towards these decisions

was originally posited as being an aversion to negative outcomes. More recent work indicates

that nervousness stems more from consumer uncertainty about their decisions, defined as

perceptions of risk, and leads to different decisions in nervous individuals (Carpentier et al.

2017). The authors build upon prior work associating risk and nervousness and lay the

groundwork for future exploration (Giorgetta et al. 2012; Maner et al. 2007). However, this work

only goes as far as hypothesizing the association between risk and nervousness rather than

testing a causal order of effects, leaving a theoretical gap the current studies seek to address.

In consumer behavior research, it is widely accepted that perceptions of risk are relevant

for adoption (Cherry and Fraedrich 2002; Gatignon and Robertson 1985). Risk perceptions have

been shown to impact attitudes and behaviors towards e-business functions (de Ruyter, Wetzels,

and Kleijnen 2001), usage of self-checkouts in grocery stores (Anselmsson 2001), and aversion

to ordering prescriptions over the phone (Meuter et al. 2005). Despite the growth in service

technology, gaining consumer acceptance can still remain a challenge for marketers (Paluch and

Wünderlich 2016). The role risk plays in the process of eliciting nervousness has yet to be

explored but is indicated by this literature. Therefore, an additional hypothesis is tested in some

of the following experiments:

H4: Speech rate negatively influences likelihood to want to use a DA through a serial

mediation process of risk and nervousness. Fast and slow speech rates increase

risk perceptions, increasing consumer nervousness and lowering usage intentions.
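
The serial process in H4 can be sketched in the same regression form. The symbols are illustrative assumptions rather than the author's notation: X is an indicator coded 1 for a fast or slow speech rate and 0 for a moderate rate, M_1 is perceived risk, M_2 is nervousness, and Y is likelihood to use the DA.

```latex
\begin{align}
M_1 &= i_1 + a_1 X + e_1 \\
M_2 &= i_2 + a_2 X + d_{21} M_1 + e_2 \\
Y   &= i_Y + c'X + b_1 M_1 + b_2 M_2 + e_Y \\
\text{serial indirect effect} &= a_1 \, d_{21} \, b_2
\end{align}
```

Under this assumed coding, H4 implies a_1 > 0 (fast and slow rates raise risk perceptions), d_{21} > 0 (risk raises nervousness), and b_2 < 0 (nervousness lowers usage intentions), so the serial indirect effect is expected to be negative.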

With marketers already updating the voices of their branded DAs, providing a moderator

which can be used to avoid unwanted outcomes would be immediately valuable. Given the

market is already being filled with DAs which can alter their speech rate, how could an

interaction with a DA be created which delivers information at a faster or slower speed while

also not causing negative reactions?

Interaction Style. Research on interaction styles in the social and behavioral sciences

classifies interactions as being either monological or dialogical. In monological interactions, one

person speaks for a time and another person listens. This style is exemplified by less participant

engagement, or processing of the message (O’Connor and Michaels 2007). Less processing of

the central message indicates peripheral cues should be more salient. This occurs because the

speaker is not focused on the audience’s needs and only commands, coerces, and manipulates

(Johannesen 1996). Therefore, the audience is less involved. Opposite of this, a dialogic, or

back-and-forth interaction is exemplified by greater audience involvement and processing of the

content of a message. As someone becomes more involved, their focus is directed towards the

content communicated, or the words being spoken (Nesari 2015). During dialogical interactions,

what is being spoken is more salient than how it is being spoken.

Since DAs are adaptable, the interaction style they use with us can be changed depending

on the context. If a marketer wishes to create a DA which alters its speech rate, it should also be

designed to interact in such a way that avoids potential negative reactions. The discussion of

monologic versus dialogic interaction styles, and their differing influence upon which aspects of

speech are most salient, falls in line with the work reviewed earlier indicating a voice can draw

attention to either what is being said or how it is being said. This leads to the hypothesis that the

effects of speech rate, a peripheral cue, should be moderated by interaction style, an important

factor in whether someone is focused on the content or delivery of a message. More formally:

H5: The relationship between DA speech rate and likelihood to use it is moderated by

interaction style, with decreases in usage likelihood only occurring during

monologic interactions.
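
One way to express the moderation in H5, offered as a sketch under assumed coding rather than as the model estimated later, is a regression with an interaction term. Here X is an indicator coded 1 for a fast or slow speech rate and 0 for a moderate rate, W is interaction style coded 1 for monologic and 0 for dialogic, and Y is likelihood to use the DA.

```latex
\begin{align}
Y &= i_Y + b_1 X + b_2 W + b_3 XW + e_Y \\
\theta_{X \rightarrow Y}(W) &= b_1 + b_3 W
\end{align}
```

With this coding, the conditional effect of speech rate is b_1 in dialogic interactions and b_1 + b_3 in monologic interactions. H5 predicts that b_1 + b_3 is negative while b_1 does not differ from zero.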

Based upon the discussion showing elements of sound can cause negative emotional

reactions in consumers (Lowe, Loveland, and Krishna 2019; Lowe, Ringler, and Haws 2018), the

current work seeks to expand our knowledge of how and why this occurs. While speech rate has

been linked to negative reactions like speaker nervousness (Apple, Streeter, and Krauss 1979),

investigations in this area are limited to dichotomous manipulations of speech rate, being either

slow or fast (MacLachlan and Siegel 1980; Murray and Arnott 1993). This work is also usually

limited to general measures of affective states, like positive and negative. Therefore,

investigation of a single paralinguistic quality and its role between a human and DA is both

novel and important for researchers in sensory marketing and human-computer interaction. It

shines a light on more specific emotions beyond general positive or negative states consumers

feel when they hear a voice and it provides a theoretical process for effects.

OVERVIEW OF STUDIES

Seven experiments are presented which test five hypotheses. Experiment 1 begins by

showing the main hypothesized effect, that vocal speech rate has a differential effect upon

consumer likelihood to use a digital assistant. Following this, experiments 2A-B investigate the

process of the effects shown. Using two scenarios, these experiments show a negative affective

state, nervousness, mediates the effect between speech rate and likelihood to use the DA. These

experiments also show that self-service technology risk is the mechanism for nervousness and

rule out several alternative explanations. Experiments 3A-B provide managers with two

moderators of the effect, one being personal differences and the other being interaction style.

Lastly, experiments 4A-B show a full moderated mediation model of effects. A conceptual

model of all experiments is provided in figure 1.

-----------------------------------
Insert figure 1 about here
-----------------------------------

EXPERIMENT 1

Experiment 1 examines differences in consumer reactions to digital assistant (DA) voice

types that are either slow, moderate, or fast in their speech rate. These differences are explored

using a scenario-based experiment in which participants use a DA to create a personal budget.

Following from the earlier conceptualization, both faster and slower digital assistant speech rates

are expected to lower a consumer’s likelihood to use the DA. When the DA speaks at a moderate

rate, this decrease should not occur.

Stimuli and Pretests

Female voices were used for the digital assistant based upon a pilot study with 107

participants (Mage = 34.74, SD = 9.92, 59.9% male) on Amazon Mechanical Turk (MTurk). Pilot

study participants were asked to provide information on the smart devices they owned by

indicating if they had a smart phone, tablet, computer or smart speaker. Of these participants,

96% owned a smart phone, 77% owned a laptop or desktop, 51% owned a tablet and 36% owned

a smart speaker. Next, participants were asked to indicate if any of their devices were equipped

with a digital assistant, with 89% of respondents answering “yes”. Following this, participants

were asked how often they use their digital assistant, with 39% saying they used their assistant at

least once in the previous week, and 74% indicating they had used their assistant at least once in

the last month. Important for the current studies, the results indicated 85% of participants had a

device with a female voice and the device came preprogrammed to that female voice. An

additional interesting finding was that 93% of participants indicated that, while they had the

option to switch the voice of their assistant, they had not done so.

To ensure that speech rate was successfully operationalized for all studies, audio

recordings of the DA voice were pretested with a second, and separate, MTurk panel of 100

participants (Mage = 43.87, SD = 13.97, 54.2% male). These participants were played a recording

of the digital assistant voice, being either slow, moderate, or fast speaking, and then asked to rate

their perceptions of the voice using 7-point Likert scales on the paralinguistic qualities of Speed

(1 = Slow; 7 = Fast), Pitch (1 = Low Pitched; 7 = High Pitched), Volume (1 = Quiet; 7 = Loud),

and Timbre (1 = Smooth; 7 = Rough). There were no significant differences in listeners’

perceptions of the voices in terms of volume, pitch, or timbre (F’s < 1). When comparing

perceptions of the speed of the voice, there were significant differences between the slow,

moderate, and fast voices (F(2, 97) = 4.32, p < .001, ƞ2p = .22). Post hoc comparisons showed

that the slow speaking voice (Mslow = 3.18, SD = .92) was perceived as significantly slower

speaking than the moderate voice (Mmoderate = 3.78, SD = .78, t(58) = 3.22, p = .003, d = .434).

The fast speaking voice (Mfast = 4.56, SD = .99) was correctly perceived as being significantly

faster than the moderate voice (t(58) = 4.11, p < .001, d = .603). The fast speaking voice was also

perceived as being significantly faster than the slow voice (t(58) = 6.25, p < .001, d = .887). It is

important to note that all stimuli were created with the help of a professionally trained audio

technician to ensure they were equal in terms of pitch, volume, and timbre, with manipulation of

speech rate being the only change.

Method

Participants and Design. Experiment 1 utilized 766 participants (Mage = 20.25, SD =

1.01, 49.9% male) in a controlled lab environment at a large public university in the U.S.

Participants were compensated with class participation credit. The voice of the digital assistant

(DA) was manipulated to create a one-way design with 3 treatments (voice speech rate: slow,

moderate, fast). Participants were told they would be interacting with a new DA built to help

create a personal budget. Speech rate of the DA’s voice was operationalized by recording a

human female and then manipulating the sound file to create audio recordings for the voice

types. Using Ableton Live 10 software, the recording was adjusted first to reflect a moderate

speech rate, similar to the rate used by Lane and Grosjean (1973), with a syllabic speed of five

syllables per second and an interphrase pause of half a second. Next, the recording was adjusted to

create a slow version and a fast version of the voice, 20% slower and 20% faster, respectively.

Vocal stimuli from all studies are provided in appendix A.
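
As an illustration of the arithmetic behind the manipulation, the following minimal sketch computes the syllable rate implied for each voice. It assumes the 20% adjustment was applied directly to the syllable rate of the moderate recording; the function names and the 100-syllable script length are hypothetical and are not taken from the stimuli.

```python
MODERATE_RATE = 5.0   # syllables per second, as reported for the moderate voice
ADJUSTMENT = 0.20     # assumption: the 20% change scales the syllable rate directly


def speech_rates(moderate=MODERATE_RATE, adjustment=ADJUSTMENT):
    """Return the syllable rate (syllables/second) for each treatment."""
    return {
        "slow": moderate * (1 - adjustment),
        "moderate": moderate,
        "fast": moderate * (1 + adjustment),
    }


def speaking_time(n_syllables, rate):
    """Seconds of pure speech (pauses excluded) for a script of n_syllables."""
    return n_syllables / rate


rates = speech_rates()
for name, rate in rates.items():
    # 100 syllables is a hypothetical script length used only for illustration.
    print(f"{name}: {rate:.1f} syll/s, {speaking_time(100, rate):.1f} s per 100 syllables")
```

Under this assumption the slow and fast voices run at roughly 4 and 6 syllables per second, so the same script takes noticeably longer or shorter to deliver while its content stays fixed.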

Procedure. After agreeing to participate, participants were asked to indicate whether they had used a

DA before and to provide their usage rates and preferences. They were then asked to identify a test

sound to ensure the provided lab headphones were working, and finally passed an attention

check. Following this, participants were given an introduction to the scenario, which indicated

they would be interacting with a new DA built to help create a personal budget. Participants were

randomly assigned to one of the three treatment conditions, hearing either a slow, moderate, or

fast speaking DA. To control for potential effects of volume (Garlin and Owen 2006; Kellaris

and Altsech 1992), the output volume for all of the files was set to an identical amplification and

participants were asked not to change the volume. Additionally, lab assistants checked the

volume level of all participant computers after each lab session to ensure no changes were made.

There were 13 participants who did not correctly identify a test sound and were removed, leaving

753 for the analysis.

While participants heard different speeds of speech, the content of the message was

identical and consisted of the participant listening to an audio clip of the digital assistant

speaking. In social and behavioral sciences, this style of interaction is labeled as monological, or

one-way in style (Asterhan and Schwarz 2007). In monological interactions, one person speaks

for a time, and another person listens. This interaction style is exemplified by less participant

engagement, or processing of the message, compared to a dialogic, or back-and-forth style

(Nesari 2015). As the purpose of experiment 1 is to show the main effect of the peripheral cue of

speech rate, a monologic interaction style is used as it should encourage less processing of the

central message and make peripheral cues more salient. Participants were first greeted by the DA

and then listened as the voice provided information on how and why to create a personal budget.

The entire scenario was one directional, or monological, with the digital assistant speaking while

the participants listened. After hearing the DA speak, participants filled out measures for the

variables of interest as well as demographic information. The script for this scenario as well as

all other studies is provided in appendix B.

Measures. The dependent variable of likelihood to use the DA was measured using a

single item 7-point Likert scale (“How likely would you be to use this assistant to create a

personal budget?”; 1 = Not at all likely; 7 = Very likely). Following this measure, participants

completed single-item manipulation checks for the audio recordings, taken from Martín-Santana

et al. (2015) (“This voice was 1 = Slow; 7 = Fast”; “This voice was 1 = Quiet; 7 = Loud”; “This

voice was 1 = Low Pitched; 7 = High Pitched”). Participants finished by indicating their gender

and age. Details of measures are provided in appendix C.

Results and Discussion

Speech Rate. The manipulation of speech rate was successful, with participants indicating

significant differences between the slow, moderate, and fast treatments (F(2,751) = 155.84, p <

.001, ƞ2p = .29). Participants heard the slow voice (Mslow = 3.18, SD = .98) as significantly

slower than the moderate voice (Mmoderate = 3.78, SD = .88, t(501) = 7.19, p < .001, d = .642).

Participants also heard the fast voice (Mfast = 4.70, SD = 1.04) as significantly faster than the

moderate voice (t(501) = 10.69, p < .001, d = .954). Additionally, participants correctly heard the

slow voice as significantly slower than the fast voice (t(500) = 16.80, p < .001, d = 1.50). There

were no significant differences between the voices in terms of pitch or volume (F’s < 1).

Likelihood to use the DA. A one-way ANOVA comparing the three speech rate

treatments showed a significant main effect of vocal speech rate upon likelihood to use the DA

(F(2, 751) = 4.13, p = .016, ƞ2 p = .011). Post hoc comparisons showed participants in the slower

speech treatments (Mslow = 3.91, SD = 1.88) were significantly less likely to want to use the DA

than in the moderate speech rate treatments (Mmoderate = 4.28, SD = 1.84, t(501) = 2.24, p = .022,

d = .198). Additionally, participants in the fast speech rate treatments (Mfast = 3.85, SD = 1.76)

were significantly less likely to want to use the DA than in the moderate speech rate treatments

(t(501) = 2.69, p = .008, d = .238). There was no significant difference between the slow and fast

speech rate treatments (p = .715). These results support hypothesis 1, that consumers who use a

DA with a fast or slow speech rate will be less likely to want to use the digital assistant than

consumers who interact with a moderate speech rate.
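
The effect sizes reported above can be approximately recovered from the summary statistics. The sketch below is an illustration only: it assumes Cohen's d was computed with a pooled standard deviation under roughly equal group sizes, and that partial eta squared was derived from the F ratio and its degrees of freedom. The function names are hypothetical.

```python
from math import sqrt


def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d using a pooled SD; assumes roughly equal group sizes."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return abs(m1 - m2) / pooled_sd


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Means and SDs as reported for likelihood to use the DA in experiment 1.
d_slow_vs_moderate = cohens_d(3.91, 1.88, 4.28, 1.84)
d_fast_vs_moderate = cohens_d(3.85, 1.76, 4.28, 1.84)
eta_p2 = partial_eta_squared(4.13, 2, 751)

print(round(d_slow_vs_moderate, 3), round(d_fast_vs_moderate, 3), round(eta_p2, 3))
```

Both values of d land near the reported .198 and .238, and the partial eta squared rounds to .011, which suggests the reported effects are small in conventional terms.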

EXPERIMENT 2A

Experiment 2A begins a series of experiments to uncover the process of the hypothesized

effects. Slow and fast DA speech rates are hypothesized to produce lower consumer usage

intentions based upon the prior theory development indicating these speaker speech rates elicit

nervous responses in listeners. Experiment 2B digs deeper into this relationship to show

increased perceptions of risk act in serial mediation with consumer nervousness to lower usage

intentions with DAs.

Participants were asked to listen to a DA provide information on creating a study plan for

undergraduate classes. Following from the earlier theory review, a faster or slower speech rate is

expected to lead to lower likelihood to want to use the digital assistant. Given speech rate’s

existing relationship with negative reactions, this effect is hypothesized as occurring due to

increased nervous reactions in a participant when speech rate is slow or fast. Therefore,

experiment 2A tests hypotheses 2 and 3. Hypothesis 2 states a digital assistant with a fast or slow

speech rate will produce greater felt nervousness in a consumer than one that speaks at a

moderate rate. Hypothesis 3 states consumer nervousness mediates the relationship between

digital assistant speech rate and likelihood to want to use it.

Method

Participants and Design. Experiment 2A utilized 302 participants (Mage = 20.66, SD =

1.10, 73.2% male) in a controlled lab environment at a large public university in the U.S.

Participants were compensated with class participation credit. The voice of the DA was

manipulated to create a one-way design with 3 treatments (voice speech rate: slow, moderate,

fast), similar to study 1. Participants were told they would be interacting with a new DA built to

help create a personal study plan for their classes. Audio recordings of the voice were created

similar to prior experiments and compressed for equal volume, and lab assistants again checked the

computer volume between lab sessions to ensure participants did not change it.

Procedure. After consenting to participate, participants were asked to identify a test

sound to ensure the provided lab headphones were working as well as pass an attention check.

Following this, participants were given an introduction to the scenario and then randomly

assigned to one of the three treatment conditions, hearing either a slow, moderate, or fast

speaking DA. Participants were greeted by the DA and then listened to the voice provide

information on how and why to create a personal study plan. The interaction style of the scenario

was similar to experiment 1, being monological only and thus encouraging peripheral processing.

After hearing the DA speak, participants filled out measures for the variables of interest as well

as demographic information. Based upon incorrect responses to the attention check question, 4

participants were removed from the sample, leaving 298 for analysis.

Measures. The dependent variable of likelihood to use the DA was measured using a

single item 7-point Likert scale (“How likely would you be to use this assistant to create a

personal study plan?”; 1 = Not at all likely; 7 = Very likely). Following the dependent variable

measure, participants completed the PANAS scales for positive and negative affect (Watson,

Clark, and Tellegen 1988) using 7-point Likert scales (1 = Not at all; 7 = Very Much). Means for

measures are provided in appendix D. The measure for nervousness was created by taking 3 of

the items (jittery, distress, anxious) and averaging them to create a nervousness index (α = .928).

An additional measure of vividness was also taken using a 4-item scale adapted from Peck,

Barger, and Webb (2013) (α = .848). The vividness measure was added after consideration of

alternative explanations for sensory experience. For example, in recent years, experimental

evidence has indicated that people generate mental images when they take in information

through their senses (Djordjevic et al. 2004; Jeannerod 1995; Yoo et al. 2003). This mental

imagery is an important component of experience and can be elicited by sounds (Belardinelli et

al. 2009). Furthermore, consumer behavior studies on sensory effects have shown that imagery is

facilitated by vividness. Therefore, adding a measure of vividness will help to identify the role it

may play in facilitating sensory effects for the sense of sound. Lastly, the same manipulation

check and demographic measures were taken, identical to study 1.
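
The index construction described above, averaging the three nervousness items and reporting Cronbach's alpha, can be sketched as follows. This is a minimal illustration under stated assumptions: the response data are invented for the example and do not come from the experiment, and the function names are hypothetical.

```python
from statistics import mean, variance


def composite_index(responses):
    """Average each respondent's item ratings into a single index score."""
    return [mean(items) for items in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a tuple of item ratings."""
    k = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 7-point ratings on (jittery, distressed, anxious) for five respondents.
ratings = [(2, 3, 2), (1, 1, 2), (5, 4, 5), (3, 3, 3), (6, 5, 6)]

nervousness = composite_index(ratings)
alpha = cronbach_alpha(ratings)
print(nervousness, round(alpha, 3))
```

An alpha near 1 indicates the three items move together closely enough to be treated as one construct, which is the justification for averaging them.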

Results and Discussion

Speech Rate. The manipulation of speech rate was successful, with participants indicating

significant differences between the slow, moderate, and fast treatments (F(2, 296) = 132.77, p <

.001, ƞ2 p = .474). Participants heard the slow voice (Mslow = 2.21, SD = 1.17) as significantly

slower than the moderate voice (Mmoderate = 3.72, SD = .96, t(196) = 9.97, p < .001, d = 1.41).

Participants also heard the fast voice (Mfast = 4.63, SD = 1.02) as significantly faster than the

moderate voice (t(199) = 6.48, p < .001, d = .917). Additionally, participants correctly heard the

slow voice as significantly slower than the fast voice (t(195) = 15.48, p < .001, d = 2.20). There

were no significant differences between the voices in terms of pitch or volume (F’s < 1).

Likelihood to use the DA. One-way ANOVA comparing the three speech rate treatments

showed a significant main effect of vocal speech rate upon likelihood to use the DA (F(2, 296) =

6.34, p = .002, ƞ2 p = .041). Participants in the slower speech treatments (Mslow = 3.05, SD =

1.97) were significantly less likely to want to use the DA than the moderate speech rate (Mmoderate

= 3.91, SD = 1.80, t(196) = 3.19, p = .001, d = .454). Participants in the fast speech rate (Mfast =

3.20, SD = 1.67) treatments were significantly less likely to want to use the DA than the

moderate speech rate (t(199) = 2.89, p = .006, d = .407). Additionally, participants were not

significantly different in terms of likelihood to use the DA between the slow or fast treatments

(t(195) = .56, p = .568, d = .081). These results provide additional support for hypothesis 1,

consumers who use a digital assistant with a fast or slow speech rate will be less likely to want to

use it than those who interact with a moderate speech rate.
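
For readers who wish to check the omnibus test, a one-way ANOVA F ratio can be reconstructed from group means and standard deviations. The sketch below assumes group sizes of 97 (slow), 101 (moderate), and 100 (fast); these sizes are inferred from the degrees of freedom of the pairwise t-tests and are not stated in the text. The function name is hypothetical.

```python
def anova_from_summary(groups):
    """One-way ANOVA from summary statistics.

    groups: list of (n, mean, sd) tuples, one per treatment.
    Returns (F, df_between, df_within).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# (n, mean, sd) for likelihood to use the DA; the n values are assumptions.
experiment_2a = [(97, 3.05, 1.97), (101, 3.91, 1.80), (100, 3.20, 1.67)]
f_ratio, df_between, df_within = anova_from_summary(experiment_2a)
print(round(f_ratio, 2), df_between, df_within)
```

The result is approximately 6.4, close to the reported F of 6.34; the small gap is consistent with rounding in the published means and standard deviations and with the assumed group sizes.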

Nervousness. A one-way ANOVA showed a significant main effect of vocal speech rate

upon participant nervousness (F(2, 296) = 3.09, p = .046, ƞ2 p = .021). Participants in the slower

speech treatments (Mslow = 2.22, SD = 1.52) felt significantly more nervous than the moderate

speech rate (Mmoderate = 1.82, SD = .98, t(196) = 2.16, p = .038, d = .311). Participants in the fast

speech rate treatments (Mfast = 2.24, SD = 1.42) felt significantly more nervous than the moderate

speech rate (t(199) = 2.42, p = .027, d = .342). Additionally, participants were not significantly

different in terms of their own nervousness between the slow or fast treatments (t(195) = .11, p =

.902, d = .013). These results provide initial support for hypothesis 2, that a digital assistant with

a fast or slow speech rate will produce greater felt nervousness in a consumer than one that

speaks at a moderate rate.

Checks for the alternative explanations of vividness as well as generalized positive or

negative affect were conducted next. One-way ANOVA showed no significant differences

between the slow (Mslow = 3.74, SD = 1.41), moderate (Mmoderate = 4.06, SD = 1.50), and fast

(Mfast = 4.12, SD = 1.46) treatments in terms of vividness (F(2, 295) = 1.87, p = .155, ƞ2 p =

.013). Additionally, one-way ANOVA with the 10 positive affect items from the PANAS

grouped together (α = .883) showed no significant differences in general positive affective state

between the slow (Mslow = 3.45, SD = 1.23), moderate (Mmoderate = 3.55, SD = 1.16), and fast

(Mfast = 3.55, SD = 1.14) treatments (F(2, 295) =.208, p = .812, ƞ2 p = .001). Differences in

negative affective state, using the 7 negative items from the PANAS that were not part of the

nervousness measure (α = .801), showed no significant differences between the slow (Mslow =

2.45, SD = .91), moderate (Mmoderate = 2.36, SD = .90), and fast (Mfast = 2.40, SD = .92)

treatments (F(2, 295) =.291, p = .748, ƞ2 p = .002). These results indicate that both vividness as

well as general positive or negative affective state can be ruled out as alternative explanations.

Mediation of Main Effect by Nervousness. A mediation analysis (Model 4, Hayes 2017)

using 5,000 bootstrapping samples with likelihood to use the DA as the dependent variable,

speech rate as the independent variable, and feelings of nervousness as the mediator was

performed next. Details of the mediation results are reported in table 1. The omnibus test of the

total effect was significant (p = .002), indicating nervousness mediates the relationship between

speech rate and willingness to use the digital assistant. Because the independent variable is

multicategorical, the results are reported as contrasts, using indicator coding, with the moderate

speech rate being the comparison category to the slow and fast treatments, respectively. While

the omnibus test indicates mediation occurred, examination of the 95% CIs for each contrast

provides more information on the nature of the effect. Examination of the CIs indicates that,

compared to the moderate speech rate, both the slow [CI, -.1980 to -.0022] and

fast [CI, -.1961 to -.0068] treatments had an overall negative effect upon intentions to use the

digital assistant. This effect was mediated by increased nervousness. Examination of the CIs

when comparing the slow versus fast speech rate treatments indicates that there was a non-

significant effect upon intentions to use the digital assistant between these groups [CI, -.0949 to

.0965]. Comparisons of slow to moderate as well as fast to moderate speech rates provide initial

evidence for hypothesis 3, that feelings of nervousness mediate the relationship between digital

assistant speech rate and likelihood to want to use it. When speech rate leads to increased

nervousness, usage intentions are lowered.
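
The logic of the bootstrapped mediation test with an indicator-coded, three-level independent variable can be sketched as follows. This is not the PROCESS macro itself but a minimal stand-in under stated assumptions: the data are simulated, the helper names are hypothetical, a percentile bootstrap is used, and 1,000 resamples are drawn rather than 5,000 to keep the example quick.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def relative_indirect_effects(data):
    """data: (group, mediator, outcome) rows; 'moderate' is the reference group."""
    dummies = [[1.0, float(g == "slow"), float(g == "fast")] for g, _, _ in data]
    m = [row[1] for row in data]
    y = [row[2] for row in data]
    a = ols(dummies, m)                                    # M ~ 1 + slow + fast
    b = ols([d + [mi] for d, mi in zip(dummies, m)], y)    # Y ~ 1 + slow + fast + M
    return a[1] * b[3], a[2] * b[3]                        # a_slow * b, a_fast * b


def bootstrap_ci(data, n_boot=1000, seed=1):
    """95% percentile bootstrap CIs for the two relative indirect effects."""
    rng = random.Random(seed)
    draws = [relative_indirect_effects(rng.choices(data, k=len(data)))
             for _ in range(n_boot)]
    cis = []
    for idx in (0, 1):
        values = sorted(d[idx] for d in draws)
        cis.append((values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]))
    return cis


def simulate(n_per_group=40, seed=0):
    """Simulated data: slow and fast voices raise nervousness, which lowers usage."""
    rng = random.Random(seed)
    rows = []
    for group, shift in (("slow", 1.0), ("moderate", 0.0), ("fast", 1.0)):
        for _ in range(n_per_group):
            nervous = 2.0 + shift + rng.gauss(0, 0.5)
            usage = 5.0 - 0.8 * nervous + rng.gauss(0, 0.5)
            rows.append((group, nervous, usage))
    return rows


data = simulate()
cis = bootstrap_ci(data)
print("slow vs. moderate:", cis[0])
print("fast vs. moderate:", cis[1])
```

A contrast is taken to show mediation when its interval excludes zero, which mirrors how the confidence intervals for the slow and fast contrasts are read above.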

-----------------------------------
Insert table 1 about here
-----------------------------------

Building on this, experiment 2B investigates this process of effects and shows user

nervousness is elicited by perceptions of risk associated with using the DA.

EXPERIMENT 2B

Experiment 2B builds on 2A by diving deeper into the relationship between speech rate

and nervousness. Specifically, 2B tests a serial mediation model (Model 6, Hayes 2017) to

uncover the mechanism by which nervousness occurs. Between humans, situations which are

perceived as risky have been shown to induce panic and jitters (Lawrence-Wood 2011) while

perceptions of anxiety and excitement in another person have been shown to be associated with

perceptions of risk (Parkinson and Simons 2009). Given this, a primary candidate for a

facilitating mechanism for the emotion of nervousness is risk associated with using a DA.

Experiment 2B makes a direct contribution to the sensory and human-computer interaction (HCI)

literature by attempting to produce evidence for this mechanism, hypothesized as self-service

technology risk. Therefore, study 2B formally tests hypothesis 4: speech rate negatively

influences likelihood to want to use a DA indirectly through a serial mediation process of self-

service technology risk to nervousness.

Method

Participants and Design. Experiment 2B utilized 481 participants (Mage = 20.25, SD =

1.48, 49.8% male) in a controlled lab environment at a large public university in the U.S.

Participants were compensated with class participation credit. Similar to earlier experiments, the

design was one-way with 3 treatments (voice speech rate: slow, moderate, fast). Participants

were told they would be interacting with a new DA built to help register for classes. Audio

recordings of the voice were created similar to prior studies and compressed for equal volume.

Procedure. Participants were randomly assigned to one of three treatment conditions,

hearing either a slow, moderate, or fast speaking DA. After being greeted, they listened to a

voice provide information on how it could assist them in registering for classes based on their

needs. The interaction style of the scenario was monological, encouraging peripheral focus.

After hearing the DA speak, participants filled out measures for the variables of interest as well

as demographic information. There were 15 participants who did not correctly identify a test

sound and were removed, leaving 466 for the analysis.

Measures. The dependent variable of likelihood to use the DA was measured using a

single item 7-point Likert scale (“How likely would you be to use this assistant to register for

classes?”; 1 = Not at all likely; 7 = Very likely). The 3-item measure of nervousness was again

taken from the PANAS and averaged to form a nervousness index (α = .854). Following this,

participants also filled out a 3-item measure of self-service technology (SST) risk, taken from

Bauer (1960) and Meuter et al. (2005). This included 3 Likert scale items (“I am unsure if the

digital assistant will perform satisfactorily”, “Overall, using this digital assistant is risky”, “The

digital assistant didn’t sound like it would do this task well”). Items were averaged to create an

SST Risk index measure (α = .824), to be used in the serial mediation model. Lastly, the same

manipulation check and demographic measures were taken, identical to prior studies.

Results and Discussion

Speech Rate. The manipulation of speech rate was successful, with participants indicating

significant differences between the slow, moderate, and fast treatments (F(2, 463) = 291.48, p <

.001, ƞ2 p = .557). Participants heard the slow voice (Mslow = 2.46, SD = 1.14) as significantly

slower than the moderate voice (Mmoderate = 3.80, SD = .99, t(305) = 10.93, p < .001, d = 1.25).

Participants also heard the fast voice (Mfast = 5.55, SD = 1.23) as significantly faster than the

moderate voice (t(302) = 13.83, p < .001, d = 1.56). Additionally, participants correctly heard the

slow voice as significantly slower than the fast voice (t(305) = 22.83, p < .001, d = 2.60). There

were no significant differences between the voices in terms of pitch or volume (F’s < 1).

Likelihood to use the DA. There was a significant main effect of speech rate upon

likelihood to use the DA (F(2, 463) = 8.28, p < .001, ƞ2 p = .035). Participants in the slow speech

treatments (Mslow = 2.56, SD = 1.62) were significantly less likely to want to use the DA than the

moderate speech rate (Mmoderate = 3.35, SD = 1.84, t(305) = 4.00, p = .001, d = .455). Participants

in the fast speech rate (Mfast = 2.97, SD = 1.66) treatments were marginally less likely to want

to use the DA than the moderate speech rate (t(302) = 1.91, p = .051, d = .216). Participants were

not significantly different in terms of likelihood to use the DA between the slow or fast

treatments (t(305) = .16, p = .168, d = .19).

SST Risk. There was a significant main effect of DA speech rate upon participants’

perceptions of self-service technology risk (F(2, 463) = 10.37, p < .001, ƞ2 p = .043). Participants

in the slow speech treatments (Mslow = 5.18, SD = 1.25) felt the DA was significantly more risky

than the moderate speech rate (Mmoderate = 4.54, SD = 1.29, t(305) = 4.43, p < .001, d = .503).

Participants in the fast speech rate (Mfast = 4.85, SD = 1.17) treatments felt the DA was

significantly more risky than the moderate speech rate (t(302) = 2.21, p = .028, d = .251).

Additionally, participants were not significantly different in terms of their perceptions of SST

risk between the slow or fast treatments (t(305) = .06, p = .902, d = .023).

Nervousness. There was a significant main effect of DA vocal speech rate upon

participant nervousness (F(2, 463) = 10.84, p < .001, ƞ2 p = .045). Participants in the slow speech

treatments (Mslow = 3.11, SD = 1.65) felt significantly more nervous than those who heard a

moderate DA speech rate (Mmoderate = 2.42, SD = 1.42, t(305) = 3.91, p < .001, d = .443).

Participants in the fast speech rate (Mfast = 3.18, SD = 1.67) treatments felt significantly more

nervous than the moderate speech rate (t(302) = 4.31, p < .001, d = .485). Additionally,

participants were not significantly different in terms of their nervousness between those who

heard a DA speak at a slow versus fast rate (t(305) = .36, p = .899, d = .042).

Serial Mediation Analyses. It was predicted that fast and slow speech rates would convey

risk in using a DA, which makes consumers feel more nervous when they hear the DA speak,

ultimately leading to lower likelihood to want to use the DA (i.e., speech rate → risk →

nervousness → usage likelihood). To test hypothesis 4, a serial mediation analysis (Model 6,

Hayes 2017) with 5,000 bootstrapping samples was conducted that uncovered a negative and

significant indirect effect of the suggested serial mediation pathway. Statistics are provided in

table 1.

Examination of relative indirect effects when comparing slow to moderate speech shows

that slow speech rate has a negative effect upon usage likelihood (b = -.03, SE = .01; CI95% =

-.05, -.01). These effects are serially mediated by SST Risk and Nervousness. Additionally, when

comparing fast to moderate speech, fast speech rate has a negative effect upon usage likelihood

(b = -.01, SE = .01; CI95% = -.03, -.01). These effects are serially mediated by SST Risk and

Nervousness. When compared to moderate speech rate, slower speech had a positive effect on

SST risk (b = .65, SE = .14; CI95% = .37, .92), and fast speech did as well (b = .31, SE = .14;

CI95% = .03, .58). Increased SST risk was then shown to have a positive effect on consumer

nervousness (b = .36, SE = .06; CI95% = .25, .48). Increased consumer nervousness then had a

negative effect upon likelihood to use the DA (b = -.11, SE = .05; CI95% = -.20, -.02). Effects

for all contrast models are shown in figure 2. Examination of the relative indirect effects when

comparing the slow and fast DA speech rate treatments showed a small negative effect, but the

95% CIs contained zero, indicating a non-significant effect (b = -.01, SE = .004; CI95% = -.02,

.23). Additionally, when the order of mediators was switched (speech rate → nervousness →

SST risk → usage likelihood), the indirect effect of speech rate on likelihood to use the DA was

not significant for either treatment (CI95%slow = -.16, .04; CI95%fast = -.18, .05), supporting the

hypothesized order of causality.
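
In a serial mediation model of this kind, each relative indirect effect is the product of the three path coefficients along the chain. The short sketch below multiplies the coefficients reported above to show how the serial indirect effects arise; the function and variable names are hypothetical.

```python
def serial_indirect(a1, d21, b2):
    """Indirect effect through two mediators in series: X -> M1 -> M2 -> Y."""
    return a1 * d21 * b2


RISK_TO_NERVOUSNESS = 0.36    # effect of SST risk on nervousness (reported b)
NERVOUSNESS_TO_USAGE = -0.11  # effect of nervousness on likelihood to use (reported b)

# Effect of each contrast (vs. moderate speech) on SST risk, as reported.
slow_indirect = serial_indirect(0.65, RISK_TO_NERVOUSNESS, NERVOUSNESS_TO_USAGE)
fast_indirect = serial_indirect(0.31, RISK_TO_NERVOUSNESS, NERVOUSNESS_TO_USAGE)

print(round(slow_indirect, 2), round(fast_indirect, 2))  # prints: -0.03 -0.01
```

The products agree with the reported indirect effects of -.03 for slow speech and -.01 for fast speech. The confidence intervals themselves come from bootstrapping and cannot be recovered from the point estimates alone.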

-----------------------------------
Insert figure 2 about here
-----------------------------------

Experiment 2B uncovers the mechanism by which speech rate elicits nervousness in a

consumer. With a serial mediation analysis, it is demonstrated that both slow and fast speech

rates decrease likelihood to use a digital assistant. This decrease is due to greater nervousness

stemming from heightened risk perception. Essentially, when someone interacts with a digital

assistant that speaks slowly or quickly, they perceive increased risk in continuing to use the DA, which

elicits nervousness. When a consumer feels greater nervousness, they are then less likely to want

to continue to use the DA.

EXPERIMENT 3A

Based upon the earlier discussion of HCI literature and perceptions of emotion in a

person’s voice, a possible explanation for differences in consumer response other than speech

rate could be personal differences in susceptibility to the emotions conveyed by others. Early

theorists in areas of nonverbal behavior and emotional contagion posited that an empathic

process occurs during communication, with the receiver of a message oftentimes taking on

emotional states congruent with the sender’s (Davis 1983; Eisenberg and Miller 1987). This process

facilitates movement of emotions from person to person based upon context as well as individual

differences in susceptibility to the emotional expressions of others. While contagion is difficult

to quantify, adding a measure of susceptibility to emotional expression of others allows for

investigation of the potential impact personal differences have in the hypothesized process.

Method

Participants and Design. Experiment 3A utilized 187 participants (Mage = 37.32, SD =

12.23, 59.1% male) from an online panel (MTurk) in return for small compensation. Experiment

3A was a 3 (voice speech rate: slow, moderate, fast) x Continuous (moderator: personal

susceptibility to others’ emotions) design with participants being told they would interact with a

new DA built to help purchase music concert tickets. Audio recordings of the voice were created

identical to prior studies.

Procedure. Participants were randomly assigned to one of three treatment conditions,

hearing either a slow, moderate, or fast speaking DA. After being greeted, they heard a voice

provide information on how it could assist them in purchasing concert tickets based on their

preferences. The interaction style of the scenario was monological, encouraging peripheral

focus. After listening to the DA speak, participants filled out measures for variables of interest

and demographic information. There were 23 participants who did not correctly identify a test

sound and were removed, leaving 164 for the analysis.

Measures. The dependent variable of likelihood to use the DA was measured using a

single item 7-point Likert scale (“How likely would you be to use this assistant to purchase

concert tickets?”; 1 = Not at all likely; 7 = Very likely). Following this, participants filled out a

15-item individual differences scale to quantify personal susceptibility to the emotions of others

(Doherty 1997). These items were averaged to create an index (α = .902). This measure was

taken to test whether personal susceptibility to others’ emotions alters the results already shown.

Lastly, the same manipulation check and demographic measures were taken.

Results and Discussion

Speech Rate. The manipulation of speech rate was successful, with participants indicating

significant differences between the slow, moderate, and fast treatments (F(2, 161) = 46.24, p <

.001, ƞ2 p = .365). Participants heard the slow voice (Mslow = 3.27, SD = 1.88) as significantly

slower than the moderate voice (Mmoderate = 4.54, SD = .96, t(107) = 4.40, p < .001, d = .849).

Participants also heard the fast voice (Mfast = 5.85, SD = 1.20) as significantly faster than the

moderate voice (t(107) = 6.28, p < .001, d = 1.19). Additionally, participants correctly heard the

slow voice as significantly slower than the fast voice (t(108) = 8.56, p < .001, d = 1.63). There

were no significant differences between the voices in terms of pitch or volume (F’s < 1).

Personal Differences in Susceptibility to Others’ Emotions. A moderation analysis (Model

1, Hayes 2017) using 5,000 bootstrapping samples with likelihood to use the DA as the DV,

speech rate as the IV, and susceptibility to others’ emotions as the moderator produced a

significant interaction (F(1, 160) = 6.21, p = .014, ƞ2 p = .098). This result indicates that personal

differences in susceptibility to others’ emotions moderate the observed effect of speech rate upon

usage intentions. To probe the interaction, Johnson-Neyman analysis (Preacher, Rucker, and

Hayes 2007), which examines the interaction between variables at every level of the moderator,

was conducted next. The results of this analysis indicate that participants who measured above

4.8 on the scale of personal susceptibility to others’ emotions were significantly less likely to

want to use the DA. For participants below this level, results indicate a non-significant effect.

Results are depicted in figure 3. These results provide initial evidence to support the notion that

personal differences of consumers can moderate the relationship between digital assistant speech

rate and a consumer’s likelihood to use it.

This study produces more evidence that slow and fast speech rates can produce negative

reactions. Additionally, it indicates personal susceptibility to others’ emotions moderates the

effect, with participants who are higher in susceptibility having lower intentions to use the DA.

Therefore, managers should be aware that the personal characteristics of consumers may have an

effect on reactions to DA speech rate. However, personal differences are not under the control of

marketers and can only be measured and accounted for. Given this, more managerial value can

be provided if an applicable moderator is tested. Therefore, the remaining studies turn to testing

the moderator of interaction style, which can be adjusted by marketing managers in their design

of digital assistants. Interaction style between digital assistants and consumers gives managers a

tool they can use to avoid unwanted results. Experiment 3B tests this moderator and replicates

some of the results from study 3A while also generalizing them to a different sample.

-----------------------------------
Insert figure 3 about here
-----------------------------------

EXPERIMENT 3B

The results of the first 4 studies become more managerially useful if the negative effects

can be avoided. Therefore, the remaining experiments test the moderating variable of interaction

style. In experiment 3B, participants were asked to listen to a DA voice and were randomly

assigned to interact in a monological or dialogical scenario. This was carried out by having some

participants listen to information, identical to prior experiments, while having others interact

with the DA in a back-and-forth manner, serving as a proxy for actual interaction with the DA in

the real world. The theoretical basis for a moderation hypothesis lies in the idea that the speech

rate of a voice is a paralinguistic quality and should only be effective at changing consumer

perceptions when it is a salient peripheral cue. When an interaction is designed to be more

dialogic, it encourages focus on central content, and what is being said becomes more salient. In

this style, processing of a peripheral cue will not occur and negative reactions should be avoided.

Experiment 3B directly tests hypothesis 5, that the relationship between speech rate and

likelihood to use a digital assistant is moderated by interaction style, with decreases in usage

likelihood only occurring when the interaction is monological.

Method

Participants and Design. Experiment 3B utilized 362 participants (Mage = 20.39, SD =

1.23, 46.5% male) in a controlled lab environment at a large public university in the U.S.

Participants were compensated with class participation credit. The voice of the DA was

manipulated to create slow, moderate, and fast treatments, similar to earlier studies. Additionally,

participants were also randomly assigned to engage with the DA either by only listening to the

DA speak or to interact with the DA in a back-and-forth way. Therefore, the design for

experiment 3B is a 3 (speech rate: slow, moderate, fast) x 2 (interaction style: monological vs.

dialogical) between subjects design. Participants were told they would be interacting with a new

DA built to help create a personal budget. Audio recordings of the voice were created similar to

prior studies. The difference between interaction style treatments was that those in the

monological treatments heard one uninterrupted sound file of the DA talking, while those in the

dialogical treatments heard shorter segments of the same recording, split up into multiple files.

Between listening to each segment, participants were asked to focus on the information being

provided by inputting budget items before proceeding. This effectively created a monological

(one-direction) interaction versus a dialogical (back-and-forth) interaction with the DA.
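
The 3 x 2 assignment described above can be sketched in a few lines. This is a minimal illustration that assumes balanced (round-robin) assignment after shuffling. The text only says assignment was random, so the balancing step, the seed, and the function name `assign_conditions` are hypothetical.

```python
import itertools
import random

SPEECH_RATES = ("slow", "moderate", "fast")
INTERACTION_STYLES = ("monological", "dialogical")


def assign_conditions(participant_ids, seed=None):
    """Randomly assign participants to the six cells with near-equal cell sizes."""
    rng = random.Random(seed)
    cells = list(itertools.product(SPEECH_RATES, INTERACTION_STYLES))
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal the shuffled participants round-robin across the six cells
    return {pid: cells[i % len(cells)] for i, pid in enumerate(ids)}


# 362 participants, as in experiment 3B; the IDs and seed are placeholders
assignments = assign_conditions(range(1, 363), seed=2020)
cell_sizes = {}
for cell in assignments.values():
    cell_sizes[cell] = cell_sizes.get(cell, 0) + 1
for cell, size in sorted(cell_sizes.items()):
    print(cell, size)
```

Simple (unbalanced) random assignment would replace the round-robin step with an independent random draw per participant, at the cost of less even cell sizes.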

Procedure. Participants were randomly assigned to one of the six treatment conditions.

After being greeted by the DA, they heard the voice provide information on how and why to

create a personal budget. After going through the scenario, participants filled out measures for

the variables of interest, manipulation checks and demographic information. There were 20

participants who did not correctly identify a test sound and were removed, leaving 342 for the

analysis.

Measures. The dependent variable of likelihood to use the DA to create a personal budget

was measured using a single item 7-point Likert scale (“How likely would you be to use this

assistant to create a personal budget?”; 1 = Not at all likely; 7 = Very likely). Following this

measure, participants completed single-item manipulation checks for the audio recordings, taken

from Martín-Santana et al. (2015) (“This voice was 1 = Slow; 7 = Fast”; “This voice was 1 =

Quiet; 7 = Loud”; “This voice was 1 = Low Pitched; 7 = High Pitched”). Participants finished by

indicating their gender and age. Measures are provided in appendix C.

Results and Discussion

Speech Rate. The manipulation of speech rate was successful, with participants indicating

significant differences between the slow, moderate, and fast treatments (F(2, 339) = 48.34, p <

.001, ηp² = .22). Participants heard the slow voice (Mslow = 3.30, SD = .89) as significantly

slower than the moderate voice (Mmoderate = 3.88, SD = .91, t(221) = 4.37, p < .001, d = .585).

Participants also heard the fast voice (Mfast = 4.63, SD = 1.21) as significantly faster than the

moderate voice (t(228) = 5.62, p < .001, d = .746). Additionally, participants correctly heard the

slow voice as significantly slower than the fast voice (t(223) = 9.28, p < .001, d = 1.24). There

were no significant differences between the voices in terms of pitch or volume (F’s < 1).

Likelihood to use the DA. Two-way ANOVA revealed a significant main effect of DA

speech rate upon participant likelihood to use it (F(2, 336) = 11.08, p < .001, ηp² = .06).

Participants who heard the DA speak at a moderate rate were significantly more likely to use the

DA (Mmoderate = 4.69, SD = 1.46) than in both the slow speaking (Mslow = 3.74, SD = 1.81, t(221)

= 4.38, p < .001, d = .589) as well as fast speaking (Mfast = 3.80, SD = 1.82, t(229) = 4.13, p <

.001, d = .545) treatments, respectively. Differences between the slow and fast DA speech rate

treatments were not significant (t(224) = .287, p = .775, d = .038). The main effect of interaction

style was non-significant (F(1,336) = .221, p = .638).
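
Partial eta squared can be recovered from a reported F statistic and its degrees of freedom with the identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the values reported for experiment 3B. The function name is illustrative, and the check is a convenience for readers rather than part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (effect, F, df_effect, df_error) as reported for experiment 3B
effects = [
    ("speech rate manipulation check", 48.34, 2, 339),
    ("speech rate main effect on likelihood to use", 11.08, 2, 336),
    ("interaction style main effect", 0.221, 1, 336),
]

for label, f_value, df_effect, df_error in effects:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.3f}")
```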

Of particular importance to the current experiment, a marginally significant interaction

was revealed (F(2, 336) = 2.88, p = .057, ηp² = .018), indicating moderation. The results of the

interaction show that in the monological interactions with the DA, as expected, participants were

significantly less likely to use the DA when they heard a slower speech rate (Mslow = 3.46, SD =

1.73) versus the moderate speech rate (Mmoderate = 4.94, SD = 1.46, t(113) = 4.96, p < .001, d =

.926). Additionally, participants were less likely to use the DA in the fast speech conditions

(Mfast = 3.69, SD = 1.75) versus the moderate speech rate (t(116) = 4.21, p < .001, d = .779).

Differences between the slow and fast DA speech rate treatments with a monological interaction

were not significant (t(113) = .708, p = .480, d = .131). In the dialogical conditions, where the

interaction with the DA was more focused on the central content, the negative results were

moderated, with no significant differences between the treatments (F’s < 1). Means for the

treatments are shown in figure 4. These results provide initial evidence in support of hypothesis

5, that the relationship between digital assistant speech rate and likelihood to use it is moderated

by interaction style. Decreases in likelihood to use the DA only occur when the interaction is

monological, while dialogical interactions attenuate the negative effects.

-----------------------------------
Insert figure 4 about here
-----------------------------------

EXPERIMENT 4A

Experiment 4A builds on the first five experiments by testing a complete moderated

mediation model for the hypothesized effects. The experiment also provides additional

replication and generalization of the results already presented. The model includes interaction

style as a moderator because it is an adaptable variable, under the control of marketing managers

and product designers of the DA, rather than a personal difference that is not controllable.

Method

Participants and Design. Experiment 4A utilized 349 participants (Mage = 20.26, SD =

.90, 50.4% male) in a controlled lab environment at a large public university in the U.S.

Participants were again compensated with class participation credit. Participants were randomly

assigned to hear a slow, moderate, or fast vocal speech rate and also to engage with the DA in

either a monologic or dialogic style. Therefore, the design for experiment 4A is a 3 (speech rate:

slow, moderate, fast) x 2 (interaction style: monological vs. dialogical) between subjects design.

Participants were told they would be interacting with a new digital health agent. Audio

recordings of the voice were created similar to earlier experiments.

Procedure. Participants were randomly assigned to one of the six treatment conditions.

After being greeted by the DA, they listened to the voice provide information on how and why to

create a personal health plan. Manipulation of interaction style was carried out similar to prior

experiments. After going through the scenario, participants filled out measures for the variables

of interest and demographic information. There were 6 participants who did not correctly

identify a test sound and were removed, leaving 343 for the analysis.

Measures. Measures for experiment 4A were mostly identical to those taken in earlier

experiments, with the addition of a manipulation check for the interaction style. This measure

consisted of 2 items adapted from Asterhan and Schwarz (2007) (“Would you say this interaction

was more one-directional or interactive?”; “Would you say this interaction was

more informational or more instructional?”). These items were averaged to form an index

measure (r = .862). The dependent variable of likelihood to use the DA was measured using a

single item 7-point Likert scale (“How likely would you be to use this assistant to create a

personal health plan?”; 1 = Not at all likely; 7 = Very likely). Nervousness was again measured

using the same 3-item scale (jittery, distressed, and nervous) from earlier studies. These

measures were averaged to form a nervousness index (α = .947). Lastly, the same manipulation

check and demographic measures were taken, identical to prior studies.
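
Forming a multi-item index and checking its reliability can be sketched as follows. The responses below are hypothetical 7-point ratings invented for illustration. They are not data from the experiment, and the helper names are assumptions.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of per-item score lists (one score per participant)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def index_scores(items):
    """Average the items for each participant to form the index."""
    return [sum(scores) / len(scores) for scores in zip(*items)]


# Hypothetical responses from six participants to the three nervousness items
jittery = [1, 2, 4, 6, 3, 2]
distressed = [1, 3, 4, 6, 2, 2]
nervous = [2, 2, 5, 7, 3, 1]

items = [jittery, distressed, nervous]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}")
print("nervousness index:", [round(score, 2) for score in index_scores(items)])
```

For the two-item interaction style check, the text reports a correlation (r) rather than alpha, which is the usual choice when an index has only two items.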

Results and Discussion

Manipulation Checks. The manipulation of speech rate was successful, with participants

indicating significant differences between the slow, moderate, and fast treatments (F(2, 340) =

46.80, p < .001, ηp² = .21). Participants heard the slow voice (Mslow = 3.54, SD = 1.00) as

significantly slower than the moderate voice (Mmoderate = 3.90, SD = .78, t(227) = 3.01, p = .003,

d = .401). Participants also heard the fast voice (Mfast = 4.83, SD = 1.28) as significantly faster

than the moderate voice (t(228) = 6.69, p < .001, d = .873). Additionally, participants correctly

heard the slow voice as significantly slower than the fast voice (t(225) = 8.45, p < .001, d =

1.11). There were no significant differences between the voices in terms of pitch or volume (F’s

< 1). The manipulation of interaction style was also confirmed as successful, with participants in

the monological treatments indicating the interaction was more one-way (Mmonological = 4.41, SD

= 1.69), than those in the back-and-forth, or dialogical, treatments (Mdialogical = 3.83, SD = 1.63,

t(341) = 2.11, p = .001, d = .428).

Likelihood to use the DA. Two-way ANOVA revealed a significant main effect of DA

speech rate upon participant likelihood to use it (F(2, 337) = 3.47, p = .032, ηp² = .02).

Participants who heard the DA speak at a moderate rate were significantly more likely to use the

DA (Mmoderate = 3.90, SD = 1.81) than in both the slow speaking (Mslow = 3.30, SD = 1.74, t(227)

= 2.52, p < .001, d = .337) as well as fast speaking (Mfast = 3.46, SD = 1.80, t(228) = 1.84, p <

.001, d = .248) treatments, respectively. Differences between the slow and fast DA speech rate

treatments were not significant (t(225) = .658, p = .511, d = .084). The main effect of interaction

style was not significant (F(1,337) = .647, p = .422).

In further support of hypothesis 5, there was a significant interaction between speech rate

and interaction style (F(2, 337) = 3.07, p = .048, ηp² = .019). In the monological interactions

with the DA, participants were significantly less likely to use it when they heard a slower speech

rate (Mslow = 3.00, SD = 1.65) versus the moderate speech rate (Mmoderate = 4.15, SD = 1.83,

t(112) = 3.53, p = .001, d = .665). Participants were also less likely to use the DA in the fast

speech conditions (Mfast = 3.29, SD = 1.74) versus the moderate speech rate (t(114) = 2.59, p =

.009, d = .485). Differences between the slow and fast DA speech rate treatments were not

significant (t(112) = .920, p = .775, d = .170). In the dialogical treatments, where the interaction

with the DA was more interactive, the negative results were moderated, with no significant

differences between the treatments (F’s < 1). Means for the treatments are shown in figure 5.

-----------------------------------
Insert figure 5 about here
-----------------------------------

Nervousness. Speech rate and interaction style had non-significant main effects upon felt

nervousness (F’s < 1), but there was a significant interaction (F(2, 337) = 4.65, p = .010, ηp² =

.027). In further support of hypothesis 2, participants in the monological interactions were

significantly more nervous when they heard a slower speech rate (Mslow = 2.83, SD = 1.69)

versus the moderate speech rate (Mmoderate = 2.17, SD = 1.35, t(112) = 2.32, p = .022, d = .437).

They were also more nervous in the fast speech conditions (Mfast = 3.20, SD = 1.48) than the

moderate speech rate (t(114) = 3.92, p < .001, d = .732). Differences between the slow and fast

DA speech rate treatments were not significant (t(112) = 1.23, p = .775, d = .232). In the

dialogical treatments, the negative results were moderated, with no significant differences

between the treatments (F’s < 1).

Moderated Mediation. Using PROCESS (Model 8, Hayes 2017) with 5,000

bootstrapping samples, a complete moderated mediation analysis was conducted next.

Likelihood to use the DA was the DV, speech rate was the IV, interaction style served as the

moderating variable and feelings of nervousness was the mediator. Details of the mediation

results are reported in table 2. Because the independent variable is multicategorical, the results

are reported as contrasts, using indicator coding, with the moderate speech rate being the

comparison category to the slow and fast treatments, respectively. An index of moderated

mediation is created for each contrast of speech rates. Evidence of moderated mediation in at least one of the

contrasts would support the hypothesis that nervousness mediates the relationship between

speech rate and likelihood to use the DA. Examination of relative indirect effects helps to

determine how the effect is moderated by interaction style.
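
The logic of the index of moderated mediation and its bootstrap confidence interval can be sketched for a single contrast. This is a simplified illustration, not the PROCESS macro: it assumes one binary contrast (x = 1 for slow, 0 for moderate), a binary moderator (w = 0 monological, 1 dialogical), simulated data with made-up effect sizes, 500 rather than 5,000 resamples, and a small hand-written least-squares solver. All names and parameter values are hypothetical.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [v - factor * w for v, w in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def index_of_moderated_mediation(data):
    """data: (x, w, m, y) tuples. Returns a3 * b."""
    m_values = [m for _, _, m, _ in data]
    y_values = [y for _, _, _, y in data]
    # Mediator model: M = a0 + a1*X + a2*W + a3*X*W
    a = ols([[1.0, x, w, x * w] for x, w, _, _ in data], m_values)
    # Outcome model: Y = c0 + c1*X + c2*W + c3*X*W + b*M
    c = ols([[1.0, x, w, x * w, m] for x, w, m, _ in data], y_values)
    return a[3] * c[4]


def simulate(n_per_cell=50, seed=0):
    """Hypothetical data: nervousness rises only for slow speech in monologues."""
    rng = random.Random(seed)
    data = []
    for x in (0, 1):
        for w in (0, 1):
            for _ in range(n_per_cell):
                m = 2.0 + 1.5 * x * (1 - w) + rng.gauss(0, 1)
                y = 5.0 - 0.6 * m + rng.gauss(0, 1)
                data.append((x, w, m, y))
    return data


def percentile_ci(data, n_boot=500, level=0.95, seed=1):
    """Percentile bootstrap interval for the index of moderated mediation."""
    rng = random.Random(seed)
    estimates = sorted(
        index_of_moderated_mediation(rng.choices(data, k=len(data)))
        for _ in range(n_boot)
    )
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


data = simulate()
point = index_of_moderated_mediation(data)
lo, hi = percentile_ci(data)
print(f"index = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
print("interval excludes zero" if lo > 0 or hi < 0 else "interval includes zero")
```

With this coding, the conditional indirect effect for monological interactions is a1 × b and for dialogical interactions is (a1 + a3) × b, so an interval for a3 × b that excludes zero indicates the indirect effect differs between interaction styles.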

When comparing the moderate speech rate with the slow speech rate, the index of

moderated mediation does not include zero [LLCI = .0082, ULCI = .4629], providing evidence

that nervousness mediates the relationship between speech rate and likelihood to use the DA.

Examination of the confidence intervals between interaction styles shows that the negative effect

upon likelihood to use the DA occurs only for monological interactions [LLCI = -.3469, ULCI =

-.0211], and not dialogical ones [LLCI = -.1057, ULCI = .2117]. Interpretation of the model

coefficients shows that during monological interactions, slow speech elicits more nervousness

than moderate speech, leading to lower likelihood to use the DA.

When comparing the moderate speech rate with the fast speech rate, the index of

moderated mediation also does not include zero [LLCI = .0916, ULCI = .5671], providing

evidence that nervousness mediates the relationship between speech rate and likelihood to use

the DA. Examination of the confidence intervals between interaction styles shows the negative

effect upon likelihood to use the DA occurs only for monological interactions [LLCI = -.4486,

ULCI = -.1024], and not dialogical ones [LLCI = -.0971, ULCI = .1953]. Interpretation of the

model coefficients shows that during monological interactions, fast speech elicits more

nervousness than moderate speech, leading to lower likelihood to use the DA. A comparison of

the slow DA speech treatment with the fast DA speech rate treatments produced an index of

moderated mediation that included zero [LLCI = -.1216, ULCI = .3295], indicating there was no

difference between the conditional indirect effects of these treatments.

-----------------------------------
Insert table 2 about here
-----------------------------------

EXPERIMENT 4B

Experiment 4A provided the first full model of hypotheses examined thus far. It shows

that slow and fast speech elicit increased nervousness, which then leads to decreased usage

intentions. However, the design of the interaction also moderates the effect, providing guidance

to marketers who wish to make digital assistants that can speak at different rates while also

avoiding negative consumer reactions. Experiment 4B contributes to the complete empirical

package by replicating the full model of effects shown in study 4A and expanding the effects to a

different sample population while providing additional support for all hypotheses.

Method

Participants and Design. Experiment 4B utilized 338 participants (Mage = 40.51, SD =

12.79, 48.9% male) from an online panel (Amazon Mechanical Turk) in return for small

compensation. Participants were randomly assigned to hear a slow, moderate, or fast vocal

speech rate and to engage with the DA in either a monologic or dialogic interaction, similar to

experiment 4A. Therefore, the design for experiment 4B is a 3 (speech rate: slow, moderate, fast)

x 2 (interaction style: monological vs. dialogical) between subjects design. Participants were told

they would be interacting with a new digital health agent. Audio recordings of the voice were

created similar to earlier studies.

Procedure. Participants were randomly assigned to one of the six treatment conditions.

The sequence for experiment 4B was identical to the one used in 4A. After going through the

scenario, participants filled out measures for the variables of interest and demographic

information. There were 31 participants who did not correctly identify a test sound and were

removed, leaving 307 for the analysis.

Measures. Measures for study 4B were all the same as what was collected in study 4A.

Results and Discussion

Manipulation Checks. The manipulation of speech rate was successful, with participants

indicating significant differences between the slow, moderate, and fast treatments (F(2, 304) =

172.30, p < .001, ηp² = .53). Participants heard the slow voice (Mslow = 3.03, SD = 1.55) as

significantly slower than the moderate voice (Mmoderate = 4.33, SD = .83, t(200) = 7.43, p < .001,

d = 1.05). Participants also heard the fast voice (Mfast = 6.04, SD = 1.00) as significantly faster

than the moderate voice (t(203) = 13.31, p < .001, d = 1.86). Additionally, participants correctly

heard the slow voice as significantly slower than the fast voice (t(203) = 16.57, p < .001, d =

2.31). There were no significant differences between the voices in terms of pitch or volume (F’s

< 1). The manipulation of interaction style was also confirmed as successful, with participants in

the monological treatments indicating the interaction was more one-way (Mmonological = 3.38, SD

= 1.53), than those in the back-and-forth, or dialogical, treatments (Mdialogical = 4.00, SD = 1.64,

t(303) = 3.39, p = .001, d = .389).

Likelihood to use the DA. Two-way ANOVA showed non-significant main effects for

both speech rate and interaction style upon likelihood to use the DA (F’s < 1). There was,

however, a significant interaction between the two (F(2, 301) = 4.90, p = .008, ηp² = .03). In

support of hypothesis 5, participants in the monological treatments were significantly less likely

to use the DA when it spoke at a slow rate (Mslow = 4.37, SD = 2.02) versus a moderate rate

(Mmoderate = 5.15, SD = 1.21, t(103) = 2.38, p = .027, d = .462). Participants in the fast DA speech

rate treatments were also significantly less likely to use the DA (Mfast = 4.13, SD = 2.09) versus

the moderate speech rate (t(103) = 3.06, p = .004, d = .598). Differences between the slow and

fast DA speech rate treatments were not significant (t(104) = .615, p = .511, d = .121). In the

dialogical treatments, where the interaction with the DA was more interactive, the negative

results were moderated, with no significant differences between the treatments (F’s < 1). Means

are shown in figure 6.

-----------------------------------
Insert figure 6 about here
-----------------------------------

Nervousness. Speech rate and interaction style had non-significant main effects upon felt

nervousness (F’s < 1), but there was a significant interaction (F(2, 301) = 5.02, p = .007,

ηp² = .03). In further support of hypothesis 2, participants in the monological interactions were

significantly more nervous when they heard a slower speech rate (Mslow = 2.34, SD = 1.83)

versus the moderate speech rate (Mmoderate = 1.50, SD = .57, t(103) = 3.14, p = .005, d = .616).

Participants were also more nervous in the fast speech conditions (Mfast = 2.47, SD = 1.60) than

the moderate speech rate (t(103) = 4.12, p = .001, d = .806). Differences between the slow and

fast DA speech rate treatments were not significant (t(104) = .394, p = .775, d = .075). In the

dialogical treatments, the negative results were moderated, with no significant differences

between the treatments (F’s < 1).

Moderated Mediation. Using PROCESS (Model 8, Hayes 2017) with 5,000

bootstrapping samples, a complete moderated mediation analysis was conducted next.

Likelihood to use the DA to create a daily health check was the dependent variable, speech rate

was the independent variable, interaction style served as the moderating variable and feelings of

nervousness was the mediator. Details of the mediation results are reported in table 2. Because

the independent variable is multicategorical, the results are reported as contrasts, similar to

experiment 4A analysis.

When comparing the moderate speech rate with the slow speech rate, the index of

moderated mediation does not include zero [LLCI = .0343, ULCI = .3674], providing evidence

that nervousness mediates the relationship between speech rate and likelihood to use the DA.

Examination of the confidence intervals for the contrast between interaction styles shows that the

negative effect upon likelihood to use the DA occurs only for monological interactions [LLCI =

-.2866, ULCI = -.0264], and not dialogical ones [LLCI = -.0715, ULCI = .1444]. Interpretation of

the model coefficients shows that during monological interactions, slow speech elicits more

nervousness than moderate speech, leading to lower likelihood to use the DA.

When comparing the moderate speech rate with the fast speech rate, the index of

moderated mediation also does not include zero [LLCI = .0414, ULCI = .4204], providing

evidence that nervousness mediates the relationship between speech rate and likelihood to use

the DA. Examination of the confidence intervals for the contrast between interaction styles

shows that the negative effect upon likelihood to use the DA occurs only for monological

interactions [LLCI = -.3123, ULCI = -.0331], and not dialogical ones [LLCI = -.0546, ULCI =

.1705]. Interpretation of the model coefficients shows that during monological interactions, fast

speech elicits more nervousness than moderate speech, leading to lower likelihood to use the

DA. A comparison of the slow DA speech with the fast DA speech rate treatments produced an

index of moderated mediation that included zero [LLCI = -.1135, ULCI = .2092], indicating

there was not a significant difference between the conditional indirect effects of these treatments.

GENERAL DISCUSSION

This research shows the voice we hear during social interactions has an impact on what

we feel and ultimately what we are likely to do. Across seven experiments, evidence is presented

showing the speed of the voice we hear can elicit both positive and negative reactions, such as

nervousness. Increases in nervousness are then shown to lower likelihood to want to interact with

a voice, tested here in the form of a digital assistant. Prior work in the area of sensory marketing

has focused on understanding positive reactions to pleasing sounds, such as music, or on testing

other parameters of sound, such as volume or pitch rather than speed of speech. In the broadcast

advertising literature, outcome measures have consistently been in the form of perceptions

regarding the voice only, and studies have tested variables which are conflated with volume or

visual content. Given this, the current work clarifies the role that speech rate alone plays in

changing consumer intentions. Furthermore, it shows that negative effects may occur based on

the speech rate of a voice, particularly the sound of the voice we hear when interacting with a

digital assistant.

The opening of this manuscript makes clear that digital assistants are being adopted at a

rapid pace, and their integration with other devices that create branded ecosystems means that

competition for market share in the early stages has large financial impact for brands. This rapid

adoption of digital assistants which speak to us calls for both practitioners as well as academics

to investigate and uncover how we interpret and react to their voices. As such, these studies

expand the literature in sensory marketing related to the audible cue of speech. They attempt to

do this by investigating the popular consumer behavior topic of digital assistants and providing

theoretical guidance to explain the results. The results demonstrate the speech rate of a

digital assistant’s voice, being slow, moderate, or fast, can elicit nervousness in consumers,

which is facilitated by perceptions of risk, and ultimately lowers likelihood to want to use the

digital assistant. Additionally, the managerially applicable moderators of interaction style and

personal susceptibility to others’ emotions are shown to mitigate negative outcomes. Differences

between speech rates and consumer reactions are examined in both lab and online environments

and across multiple scenarios.

Theoretical Implications

This research contributes to the body of sensory marketing research in several ways.

First, it demonstrates that an audible cue, the voice of a digital assistant, can negatively affect a

consumer’s emotional and intentional reactions to products. While some prior work has

investigated the sound of voices, the results are limited to main effects in broadcast advertising

contexts which focus on attributions about the speaker instead of the consumer. Furthermore,

these studies often use less nuanced manipulations of slow versus fast speech rate. Second, this

work examines the audible cue of voice speech rate rather than voice pitch or volume. Speech

rate, as well as timbre qualities, open up many avenues for theoretical development moving

forward. By examining the effects of audible cues beyond those that are combined with visuals

or other sensory elements, this work expands the sensory as well as HCI literature.

Third, this work bridges a gap between sensory and communications literature by

applying the theory of social response in order to theoretically explain a sensory effect. This will

hopefully allow marketers who manage a digital assistant to better understand a consumer’s

interactions with their service robot and guide them regarding the design of the voice. The results

indicate a process of effects, via feelings of nervousness and risk, which are moderated by

interaction style. This provides theoretical grounding for future studies on other effects of vocal

cues. By incorporating theory from communications and HCI literature, a more refined

understanding of our interactions with a new technology of commerce is possible.

Lastly, this work builds upon prior research which used dichotomous manipulations of

speech compression or only examined faster speech. The expanded manipulation of slow,

moderate, and fast speeds of speech provides more nuance to assessments of these effects. The

current manipulation of speed at both 20% slower and faster than average provides something

closer to just-noticeable-difference analysis for established speech rates (Quené 2007). These

equated to 115 words per minute for slow and 160 words per minute for fast speech, providing a

rough guideline for future speech rate research.

Overall, the main intended contribution of this work is in providing a better theoretical

understanding of how the sounds we hear affect what we feel, think, and do. When interacting

with a digital assistant, our interactions may produce both positive and negative outcomes. These

studies provide an explanatory process for these negative outcomes in technology interaction and

establish applicable suggestions for how to avoid them.

Managerial Implications

Given that many modern consumer devices now come equipped with digital assistants,

such as Alexa and Siri, with over half of U.S. consumers owning a smart speaker and 45% of

millennials reporting they use their digital assistant while shopping (Kinsella 2019; Toplin

2018), it has become increasingly important for product managers to understand how consumers

interact with them. The voice of a digital assistant is an audible cue which helps to facilitate

social interactions with consumers. As such, it is also a tool brands are focusing on in order to

provide social value when consumers interact with their products. However, not much is known

in the marketing literature to help explain how and why consumers perceive this social value and

interact with different vocal types.

The presented findings suggest marketing managers who implement digital assistants

which can adjust their vocal speech rate should consider both what the voice says as well as how

it says it. Speech rate of a digital assistant can impact consumer nervousness, which then impacts

likelihood to want to use it. When the digital assistant speaks at a slow or fast rate, nervousness

increases and lowers likelihood to use.

If a brand wants to implement a digital assistant which varies its speech rate, the design

of the interaction style with the consumer becomes an important factor to consider. The strategic

use of speech rate as well as interaction style was shown to help avoid negative outcomes under

some circumstances. If an interaction is one-directional, or monologic, slow and fast speech rates

increase nervousness, which in turn lowers intentions to use the digital assistant. However, if the

interaction is designed as a back-and-forth interactive style, negative results are avoided.

Based upon this, managers could consider designing digital assistants to speak at a moderate speech rate overall. However, if a voice that speaks at varying speech rates is desired, then interactions could be designed as dialogical rather than monological.

Limitations and Future Research

These empirical findings are not without limitations and offer at least three lines of future inquiry and improvement. First, most of the data come from controlled lab samples, with only two online panels included. An interesting question is whether actual usage of a digital assistant would change in a real-world environment. To answer this question, the dependent variable could instead be measured as usage behavior among people who already use a digital assistant. Field data from actual devices would improve the dependent measure, increase validity for the hypothesized moderator, and reinforce the results already shown.

Second, researchers may consider testing alternative explanations for the effects. While interaction style and susceptibility to the emotions of others were examined here, other mechanisms such as accessibility-diagnosticity theory (Herr, Kardes, and Kim 1991) may provide alternative explanations for the observed relationships. For instance, there is some evidence that when the accessibility of brand-related information increases, consumers are more likely to use that information as an input for brand evaluations (Li and He 2013; Menon and Raghubir 2003). Therefore, if speed is important to a brand's positioning, the speech rate of its digital assistant may be a more accessible piece of information during interactions, and that positioning could be emphasized through speech rate.

Lastly, this work used time compression techniques similar to those in early broadcasting research on speaker voice. This technique conflates the previously mentioned subcomponents of speech rate: syllabic speed and interphrase pausation. Although prior research reports mixed results from manipulating one or both of these subcomponents, doing so with digital assistants may provide important nuance for future researchers and would not be difficult given modern digital audio workstations.
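The difference between uniform time compression and manipulating a single subcomponent can be illustrated with a small sketch. It is not the procedure used in these experiments: the segment durations and syllable counts are hypothetical, and the names `Segment`, `scale`, `speech_rate`, `articulation_rate`, and `pause_time` are introduced only for illustration. Uniform compression multiplies every duration by the same factor, so syllabic speed and pause length change together; scaling pauses alone changes overall speech rate while leaving syllabic speed untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str          # "speech" or "pause"
    seconds: float     # duration of the segment
    syllables: int = 0 # syllables spoken in the segment (0 for pauses)


def scale(segments, speech_factor=1.0, pause_factor=1.0):
    """Return new segments with durations multiplied by the given factors.

    A factor below 1 shortens the segment (faster); above 1 lengthens it.
    """
    scaled = []
    for seg in segments:
        factor = speech_factor if seg.kind == "speech" else pause_factor
        scaled.append(Segment(seg.kind, seg.seconds * factor, seg.syllables))
    return scaled


def speech_rate(segments):
    """Syllables per second over the whole utterance, pauses included."""
    total_seconds = sum(seg.seconds for seg in segments)
    return sum(seg.syllables for seg in segments) / total_seconds


def articulation_rate(segments):
    """Syllables per second over spoken segments only (syllabic speed)."""
    spoken_seconds = sum(seg.seconds for seg in segments if seg.kind == "speech")
    return sum(seg.syllables for seg in segments) / spoken_seconds


def pause_time(segments):
    """Total seconds of pausing between phrases."""
    return sum(seg.seconds for seg in segments if seg.kind == "pause")


# Hypothetical utterance: two phrases separated by one pause.
utterance = [
    Segment("speech", 2.0, 10),
    Segment("pause", 0.5),
    Segment("speech", 1.5, 8),
]

# Uniform time compression: both subcomponents change together.
compressed = scale(utterance, speech_factor=0.8, pause_factor=0.8)

# Selective manipulation: only the pauses are shortened.
pauses_only = scale(utterance, speech_factor=1.0, pause_factor=0.5)

for label, segs in [("original", utterance),
                    ("compressed", compressed),
                    ("pauses only", pauses_only)]:
    print(label, round(speech_rate(segs), 3),
          round(articulation_rate(segs), 3), round(pause_time(segs), 3))
```

Separating the two factors in this way is what would let a future study attribute an effect to syllabic speed, to pausing, or to both.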

These studies are among the first known to the author to provide a thorough account of the process behind sensory effects related to the sound of a voice, as well as reasonable and actionable tools for managers who use digital assistants to promote their products. With the use of digital assistants growing, it is important that brands ensure the voice of their digital assistant provides emotional and social value to the consumer, much as a human agent would. Because either speech rate or interaction style can mitigate the negative effects of digital assistant vocal cues, managers can better link the voice they desire to the outcome they expect. The way these digital assistants are designed to speak to us can affect how we feel while interacting with them as well as how much we want to use them. This work shows that the voice a consumer hears when interacting with a digital assistant matters for successful consumer adoption. Overall, for people and computers alike, it is important to remember that our message to others is conveyed not just in what we say, but also in how fast or slow we say it.

DATA COLLECTION INFORMATION

Experiments 1, 2A, 2B, 3B, and 4A were conducted at the University of Alabama from Fall 2019 to Spring 2020 by the author, under the supervision of the co-chairs and the direction of the committee members. Experiments 3A and 4B were conducted online in Spring 2020 using Amazon MTurk, under the supervision of the co-chairs. The author prepared, analyzed, and wrote the manuscript in its current form.

REFERENCES

Aggarwal, Pankaj, and Ann L. McGill (2007), "Is That Car Smiling at Me? Schema Congruity as
a Basis for Evaluating Anthropomorphized Products," Journal of Consumer Research,
34, 4, 468-479.

Amazon Day One Staff (2019), “Alexa, Speak Slower,” Retrieved from
https://blog.aboutamazon.com/devices/alexa-speak-slower, August 7, 2019.

Anselmsson, Johan (2001), “Customer-Perceived Service Quality and Technology-Based
Self-Service,” Doctoral Dissertation, Lund University, Lund, Sweden: Lund Business Press.

Apple, William, Lynn A. Streeter, and Robert M. Krauss (1979), "Effects of Pitch and Speech
Rate on Personal Attributions," Journal of Personality and Social Psychology, 37, 5, 715.

Asterhan, Christa, and Baruch B. Schwarz (2007), "The Effects of Monological and Dialogical
Argumentation on Concept Learning in Evolutionary Theory," Journal of Educational
Psychology, 99, 3, 626.

Barrett, Lisa Feldman, and James A. Russell (1999), "The Structure of Current Affect:
Controversies and Emerging Consensus," Current Directions in Psychological Science,
8, 1, 10-14.

Bauer, Raymond A. (1960), Consumer Behavior as Risk Taking, Chicago, IL, 384-398.

Belardinelli, M. Olivetti, Massimiliano Palmiero, Carlo Sestieri, Davide Nardo, Rosalia Di
Matteo, Alessandro Londei, Alessandro D’Ausilio, Antonio Ferretti, Cosimo Del Gratta,
and Gian Luca Romani (2009), "An fMRI Investigation on Image Generation in Different
Sensory Modalities: The Influence of Vividness," Acta Psychologica, 132, 2, 190-200.

Benkí, José, Jessica Broome, Frederick Conrad, Robert Groves, and Frauke Kreuter (2011),
"Effects of Speech Rate, Pitch, and Pausing on Survey Participation Decisions," In
American Association for Public Opinion Research Annual Meeting, Phoenix, AZ.

Bruner, Gordon C. (1990), "Music, Mood, and Marketing," Journal of Marketing, 54, 4, 94-104.

Burke, Michael J., Arthur P. Brief, Jennifer M. George, Loriann Roberson, and Jane Webster
(1989), "Measuring Affect at Work: Confirmatory Analyses of Competing Mood
Structures with Conceptual Linkage to Cortical Regulatory Systems," Journal of
Personality and Social Psychology, 57, 6, 1091.

Charoenruk, Nuttirudee, and Kristen Olson (2018), "Do Listeners Perceive Interviewers’
Attributes from their Voices and Do Perceptions Differ by Question Type?," Field
Methods, 30, 4, 312-328.

Charpentier, Caroline J., Jessica Aylward, Jonathan P. Roiser, and Oliver J. Robinson (2017),
"Enhanced Risk Aversion, but Not Loss Aversion, in Unmedicated Pathological
Anxiety," Biological Psychiatry, 81, 12, 1014-1022.

Chaiken, Shelly (1980), "Heuristic Versus Systematic Information Processing and the Use of
Source Versus Message Cues in Persuasion," Journal of Personality and Social
Psychology, 39, 5, 752.

Chattopadhyay, Amitava, Darren W. Dahl, Robin JB Ritchie, and Kimary N. Shahin (2003),
"Hearing Voices: The Impact of Announcer Speech Characteristics on Consumer
Response to Broadcast Advertising," Journal of Consumer Psychology, 13, 3, 198-204.

Cherry, John, and John Fraedrich (2002), "Perceived Risk, Moral Philosophy and Marketing
Ethics: Mediating Influences on Sales Managers' Ethical Decision-Making," Journal of
Business Research, 55, 12, 951-962.

Dahl, D. W. (2010), “Understanding the Role of Spokesperson Voice in Broadcast Advertising,”
In Sensory Marketing: Research on the Sensuality of Products, ed. A. Krishna, New
York: Routledge, 169–182.

Darwin, Charles (1872), The Expression of the Emotions in Man and Animals, London: Murray, 11.

Davis, Mark H. (1983), "Measuring Individual Differences in Empathy: Evidence for a
Multidimensional Approach," Journal of Personality and Social Psychology, 44, 1, 113.

De Ruyter, Ko, Martin Wetzels, and Mirella Kleijnen (2001), "Customer Adoption of e‐Service:
An Experimental Study," International Journal of Service Industry Management, 5.

Djordjevic, Jelena, Robert J. Zatorre, Michael Petrides, and Marilyn Jones-Gotman (2004), "The
Mind's Nose: Effects of Odor and Visual Imagery on Odor Detection," Psychological
Science, 15, 3, 143-148.

Doherty, R. William (1997), "The Emotional Contagion Scale: A Measure of Individual
Differences," Journal of Nonverbal Behavior, 21, 2, 131-154.

Eagly, Alice H., and Shelly Chaiken (1993), The Psychology of Attitudes, Harcourt Brace
Jovanovich College Publishers.

Eisenberg, Nancy, and Paul A. Miller (1987), "The Relation of Empathy to Prosocial and
Related Behaviors," Psychological Bulletin, 101, 1, 91.

Epley, Nicholas, Adam Waytz, and John T. Cacioppo (2007), "On Seeing Human: A Three-
Factor Theory of Anthropomorphism," Psychological Review, 114, 4, 864.

Fowles, Don C. (1994), "A Motivational Theory of Psychopathology," Nebraska Symposium on
Motivation, 41, 181-238.

Garlin, Francine V., and Katherine Owen (2006), "Setting the Tone with the Tune: A Meta-
Analytic Review of the Effects of Background Music in Retail Settings," Journal of
Business Research, 59, 6, 755-764.

Gatignon, Hubert, and Thomas S. Robertson (1985), "A Propositional Inventory for New
Diffusion Research," Journal of Consumer Research, 11, 4, 849-867.

Giorgetta, Cinzia, Alessandro Grecucci, Sophia Zuanon, Laura Perini, Matteo Balestrieri,
Nicolao Bonini, Alan G. Sanfey, and Paolo Brambilla (2012), "Reduced Risk-Taking
Behavior as a Trait Feature of Anxiety," Emotion, 12, 6, 1373.

Hayes, Andrew F. (2017), Introduction to Mediation, Moderation, and Conditional Process
Analysis: A Regression-Based Approach, Guilford Publications.

Herr, Paul M., Frank R. Kardes, and John Kim (1991), “Effects of Word-of-Mouth and Product-
Attribute Information on Persuasion: An Accessibility-Diagnosticity Perspective,”
Journal of Consumer Research, 17, 4, 454–62.

Jeannerod, Marc (1995), "Mental Imagery in the Motor Context," Neuropsychologia, 33, 11,
1419-1432.

Johannesen, Richard L. (1996), Ethics in Human Communication, 4th ed. Prospect Heights, IL,
Waveland Press.

Kahneman, Daniel, and Amos Tversky (2013), "Prospect Theory: An Analysis of Decision
Under Risk," In Handbook of the Fundamentals of Financial Decision Making, Part I,
99-127.

Kellaris, James J. and Moses B. Altsech (1992), "The Experience of Time as a Function of
Musical Loudness and Gender of Listener," Advances in Consumer Research, Vol. 19,
ed. J. Sherry and B. Sternthal, Provo, UT: Association for Consumer Research, 725-729.

Kinsella, Bret (2019), “45% of Millennials Use Voice Assistants While Shopping According to a
New Study,” Retrieved from
https://voicebot.ai/2019/03/20/45-of-millennials-use-voice-assistants-while-shopping-according-to-a-new-study/,
May 21, 2019.

Kraus, Michael W. (2017), "Voice-Only Communication Enhances Empathic Accuracy,"
American Psychologist, 72, 7, 644.

Kraus, Rachel (2019), “John Legend’s Voice on Google Assistant is Finally Here,” Retrieved from
https://mashable.com/article/john-legend-google-assistant-voice-cameo-launch/?utm_campaign=FEED+BLAST-Mashable+Top+Stories+Daily-20190404T170000%2B0000&utm_source=newsletter#V1b_olNpbOqY,
April 21, 2019.

Krishna, Aradhna (2013), Customer Sense: How the 5 Senses Influence Buying Behavior, New
York: Palgrave Macmillan.

Kumar, Rahul and Akshay Rasal (2018), “Smart Speaker Market by Intelligent Virtual Assistant,”
Retrieved from https://www.alliedmarketresearch.com/smart-speaker-market, April 22,
2020.

Lane, Harlan, and François Grosjean (1973), "Perception of Reading Rate by Speakers and
Listeners," Journal of Experimental Psychology, 97, 2, 141.

Langer, Ellen J. (1989), Mindfulness, Addison-Wesley/Addison Wesley Longman.

Laukka, Petri, Clas Linnman, Fredrik Åhs, Anna Pissiota, Örjan Frans, Vanda Faria, Åsa
Michelgård, Lieuwe Appel, Mats Fredrikson, and Tomas Furmark (2008), "In a Nervous
Voice: Acoustic Analysis and Perception of Anxiety in Social Phobics’ Speech," Journal
of Nonverbal Behavior, 32, 4, 195.

Lawrence-Wood, Eleanor Ruth (2011), “Trust Me, This Is(n't) Scary!: How Trust Affects Social
Emotional Influence in Threatening Situations,” Flinders University of South Australia,
School of Psychology, 3.

Li, Yan and Hongwei He (2013), “Evaluation of International Brand Alliances: Brand Order and
Consumer Ethnocentrism,” Journal of Business Research, 66, 1, 89–97.

Lowe, Michael L., and Kelly L. Haws (2017), "Sounds Big: The Effects of Acoustic Pitch on
Product Perceptions," Journal of Marketing Research, 54, 2, 331-346.

Lowe, Michael L., Katherine E. Loveland, and Aradhna Krishna (2019), "A Quiet Disquiet:
Anxiety and Risk Avoidance Due to Nonconscious Auditory Priming," Journal of
Consumer Research, 46, 1, 159-179.

Lowe, Michael L., Christine Ringler, and Kelly Haws (2018), "An Overture to Overeating: The
Cross-Modal Effects of Acoustic Pitch on Food Preferences and Serving Behavior,"
Appetite, 123, 128-134.

MacLachlan, James, and Michael H. Siegel (1980), "Reducing the Costs of TV Commercials by
Use of Time Compressions," Journal of Marketing Research, 17, 1, 52-57.

Maner, Jon K., J. Anthony Richey, Kiara Cromer, Mike Mallott, Carl W. Lejuez, Thomas E.
Joiner, and Norman B. Schmidt (2007), "Dispositional Anxiety and Risk-Avoidant
Decision-Making," Personality and Individual Differences, 42, 4, 665-675.

Martín-Santana, Josefa D., Clara Muela-Molina, Eva Reinares-Lara, and Miriam Rodríguez-
Guerra (2015), "Effectiveness of Radio Spokesperson's Gender, Vocal Pitch and Accent
and the Use of Music in Radio Advertising," BRQ Business Research Quarterly, 18, 3,
143-160.

McTear, Michael, Zoraida Callejas, and David Griol (2016), The Conversational Interface:
Talking to Smart Devices, Springer.

Menon, Geeta and Priya Raghubir (2003), “Ease-of-Retrieval as an Automatic Input in
Judgments: A Mere-Accessibility Framework?,” Journal of Consumer Research, 30, 2,
230–43.

Meuter, Matthew L., Mary Jo Bitner, Amy L. Ostrom, and Stephen W. Brown (2005), "Choosing
Among Alternative Service Delivery Modes: An Investigation of Customer Trial of Self-
Service Technologies," Journal of Marketing, 69, 2, 61-83.

Moon, Youngme (2000), “Intimate Exchanges: Using Computers to Elicit Self-Disclosure From
Consumers,” Journal of Consumer Research, 26, 4, 323–339.

Moon, Youngme, and Clifford Nass (1996), "How “Real” are Computer Personalities?
Psychological Responses to Personality Types in Human-Computer Interaction,"
Communication Research, 23, 6, 651-674.

__________ (1998), "Are Computers Scapegoats? Attributions of Responsibility in
Human–Computer Interactions," International Journal of Human-Computer Studies, 49, 1, 79-94.

Moore, Danny L., Douglas Hausknecht, and Kanchana Thamodaran (1986), "Time Compression,
Response Opportunity, and Persuasion," Journal of Consumer Research, 13, 1, 85-99.

Murray, Iain R., and John L. Arnott (1993), "Toward the Simulation of Emotion in Synthetic
Speech: A Review of the Literature on Human Vocal Emotion," The Journal of the
Acoustical Society of America, 93, 2, 1097-1108.

Nass, Clifford, Youngme Moon, Brian J. Fogg, Byron Reeves, and D. Christopher Dryer (1995),
"Can Computer Personalities be Human Personalities?," International Journal of Human-
Computer Studies, 43, 2, 223-239.

Nass, Clifford, Youngme Moon, and Nancy Green (1997), "Are Machines Gender Neutral?
Gender‐Stereotypic Responses to Computers with Voices," Journal of Applied Social
Psychology, 27, 10, 864-876.

Nass, Clifford, and Youngme Moon (2000), "Machines and Mindlessness: Social Responses to
Computers," Journal of Social Issues, 56, 1, 81-103.

Nesari, Ali Jamali (2015), "Dialogism Versus Monologism: A Bakhtinian Approach to
Teaching," Procedia-Social and Behavioral Sciences, 205, 642-647.

Oakes, Steve, and Adrian C. North (2006), "The Impact of Background Musical Tempo and
Timbre Congruity Upon Ad Content Recall and Affective Response," Applied Cognitive
Psychology: The Official Journal of the Society for Applied Research in Memory and
Cognition, 20, 4, 505-520.

O’Connor, Catherine, and Sarah Michaels (2007), "When is Dialogue ‘Dialogic’?," Human
Development, 50, 5, 275-285.

Paluch, Stefanie, and Nancy V. Wünderlich (2016), "Contrasting Risk Perceptions of
Technology-Based Service Innovations in Inter-Organizational Settings," Journal of
Business Research, 69, 7, 2424-2431.

Parkinson, Brian, and Gwenda Simons (2009), "Affecting Others: Social Appraisal and Emotion
Contagion in Everyday Decision Making," Personality and Social Psychology Bulletin,
35, 8, 1071-1084.

Peck, Joann, Victor A. Barger, and Andrea Webb (2013), "In Search of a Surrogate for Touch:
The Effect of Haptic Imagery on Perceived Ownership," Journal of Consumer
Psychology, 23, 2, 189-196.

Perez, Sarah (2019), “Report: Voice Assistants in Use to Triple to 8 Billion by 2023,” Retrieved from
https://techcrunch.com/2019/02/12/report-voice-assistants-in-use-to-triple-to-8-billion-by-2023/,
April 20, 2019.

Peterson, Robert A., Michael P. Cannito, and Steven P. Brown (1995), "An Exploratory
Investigation of Voice Characteristics and Selling Effectiveness," Journal of Personal
Selling and Sales Management, 15, 1, 1-15.

Pierce, David (2017), “How Apple Finally Made Siri Sound More Human,” Retrieved from
https://www.wired.com/story/how-apple-finally-made-siri-sound-more-human/, April 24,
2019.

Preacher, Kristopher J., Derek D. Rucker, and Andrew F. Hayes (2007), "Addressing Moderated
Mediation Hypotheses: Theory, Methods, and Prescriptions," Multivariate Behavioral
Research, 42, 1, 185-227.

Quené, Hugo (2007), "On the Just Noticeable Difference for Tempo in Speech," Journal of
Phonetics, 35, 3, 353-362.

Reeves, Byron, and Clifford Ivar Nass (1996), The Media Equation: How People Treat
Computers, Television, and New Media Like Real People and Places, Cambridge
University Press.

Routley, Nick (2019), “The Fight for Smart Speaker Market Share,” Retrieved from
https://www.visualcapitalist.com/smart-speaker-market-share-fight/, April 22, 2020.

Ruijten, Peter, Jacques Terken, and Sanjeev Chandramouli (2018), "Enhancing Trust in
Autonomous Vehicles Through Intelligent User Interfaces that Mimic Human Behavior,"
Multimodal Technologies and Interaction, 2, 4, 62.

Russell, James A. (1980), "A Circumplex Model of Affect," Journal of Personality and Social
Psychology, 39, 6, 1161.

Schwartz, Rachel, and Marc D. Pell (2012), "Emotional Speech Processing at the Intersection of
Prosody and Semantics," PloS One, 7, 10.

Siegman, Aron W., and Stephen Boyle (1993), "Voices of Fear and Anxiety and Sadness and
Depression: The Effects of Speech Rate and Loudness on Fear and Anxiety and Sadness
and Depression," Journal of Abnormal Psychology, 102, 3, 430.

Smith, Bruce L., Bruce L. Brown, William J. Strong, and Alvin C. Rencher (1975), "Effects of
Speech Rate on Personality Perception," Language and Speech, 18, 2, 145-152.

Sullivan, Malcolm (2002), "The Impact of Pitch, Volume and Tempo on the Atmospheric Effects
of Music," International Journal of Retail and Distribution Management, 30, 6, 323-330.

Tanner Jr, John F., James B. Hunt, and David R. Eppright (1991), "The Protection Motivation
Model: A Normative Model of Fear Appeals," Journal of Marketing, 55, 3, 36-45.

Toplin, Jaime (2018), “Voice Shopping Grew Threefold During the Holidays,” Retrieved from
https://www.businessinsider.com/amazon-alexa-holiday-voice-shopping-grew-threefold-2018-12,
May 20, 2019.

Watson, David, Lee Anna Clark, and Auke Tellegen (1988), "Development and Validation of
Brief Measures of Positive and Negative Affect: The PANAS Scales," Journal of
Personality and Social Psychology, 54, 6, 1063.

Yakuel, Pini (2018), “Digital Assistant, Help Me Market My Brand,” Retrieved from
https://www.forbes.com/sites/forbescommunicationscouncil/2018/09/11/digital-assistant-help-me-market-my-brand/#7e3496cc1a54,
April 20, 2019.

Yoo, Seung-Schik, Daniel K. Freeman, James J. McCarthy III, and Ferenc A. Jolesz (2003),
"Neural Substrates of Tactile Imagery: A Functional MRI Study," Neuroreport, 14, 4,
581-585.

APPENDIX A: SOUND STIMULI USED IN EXPERIMENTS

Experiment   Slow DA Voice            Moderate DA Voice          Fast DA Voice
1/3B         Study 1 and 3 Slow.mp3   Study 1 and 3 Medium.mp3   Study 1 and 3 Fast.mp3
2A           Study 2 Slow.mp3         Study 2 Medium.mp3         Study 2 Fast.mp3
2B           Stuy 2C Slow.mp3         Study 2C Moderate.mp3      Study 2C Fast.mp3
3A           Study 2 Slow.mp3         Study 2 Medium.mp3         Study 2 Fast.mp3
4A           Study 4 Slow.mp3         Study 4 Medium.mp3         Study 4 Fast.mp3
4B           Study 5 Slow.mp3         Study 5 Medium.mp3         Study 5 Fast.mp3

APPENDIX B: SCRIPTS FOR EXPERIMENTS

Experiments 1/3B: “Your monthly budget should cover your basic living expenses, including housing, utilities, insurance, transportation and groceries. You should also include any subscriptions you pay for, as well as your student loan payments. If you have any other loans – like a car loan – include those as well. Once you've recorded your living expenses and your income, you must decide what to do with the money that's left over. I recommend you put some toward an emergency fund, some toward discretionary purchases like dining out, and some toward retirement or other future savings goals. As your income increases, reevaluate your budget and always raise your savings amount before spending more on discretionary purchases to help keep yourself on track for your financial goals.”
Experiment 2A: “Your daily study routine should cover the basics of your classes for your major, any electives, and ensuring you are prepared for your final assignments, projects, and exams. It should also include taking stock of how you currently feel, as this affects your study routine. Lastly, you should start to take note of your daily schedule, taking time to study a small amount each day in order to be best prepared for the end of the semester, and making adjustments accordingly. For many people, finding time to study can be difficult. It is important that you keep in mind your upcoming schedule and exams, and plan accordingly. Managing your time benefits both studying and helps to manage stress.”
Experiment 2B: “Your classes should be selected towards your major of study. Think about certain credits you may need, classes you have taken in the past, and what extra electives you are interested in. This should include taking stock of your current classes and major, as this may affect what I recommend. Lastly, you should start to take note of your current schedule, budget, and how many credit hours you need, in order to get the best match. For many people, finding time to balance class with social life can be difficult. It is important that you keep in mind your upcoming schedule and plan accordingly. Managing your time and preferences can help find the right classes. I can help you with enrolling in classes next semester. Would you like to do this?”

Experiment 3A: “Your tickets should be selected for a show you really want to see. Think about certain music artists you may like and have listened to in the past. This should include taking stock of your tastes, as this may affect what I recommend. Lastly, you should start to take note of your current playlists and calendar of activities in order to get the best match. For many people, finding time to attend a show can be difficult. It is important that you keep in mind your upcoming schedule and plan accordingly. Managing your time and preferences can help find the right tickets. I can help you with purchasing concert tickets. Would you like to do this?”
Experiment 4A: “Your daily health routine should cover some basics like emotional, physical, and mental well-being. This should include taking stock of how you currently feel. You should also consider how you would describe your mood in general, or most of the time. Lastly, you should start to take note of your daily routine, taking time to check your emotional state and adjust if needed. For many people, stress is a part of daily life that can go unrecognized. It is important that you keep in mind your feelings and experiences as you go through the day. Managing your stress levels benefits your emotional health as well as other areas of your life.”
Experiment 4B: “Your daily nutrition plan should cover some basics like calories, sugar, cholesterol and fat intake. This should include taking stock of how you currently feel as well as any health goals you may have. You should also consider which types of food you enjoy and any allergies that may prevent you from eating certain foods. Lastly, you should start to take note of your daily food routine, taking time to check your physical hunger and emotional state and adjust if needed. For many people, eating healthy is a part of daily life that can be overlooked. It is important that you keep in mind your current health goals and what you eat throughout the day. Managing your stress levels benefits both your physical and emotional health.”

APPENDIX C: MEASURES FOR EXPERIMENTS

Measure Used Items Reliability


Measure: Likelihood to Use DA (main dependent variable in all studies)
Source: Adapted from Chattopadhyay et al. (2003)
Items:
1. “How likely would you be to create a personal budget with the digital banker?” (Experiment 1/3B)
2. “How likely would you be to create a personal study plan with the digital assistant?” (Experiment 2A)
3. “How likely would you be to enroll in classes using the digital advisor?” (Experiment 2B)
4. “How likely would you be to purchase concert tickets using the digital assistant?” (Experiment 3A)
5. “How likely would you be to create a personal emotional health check using the digital health coach?” (Experiment 4A)
6. “How likely would you be to create a personal health plan using the digital health coach?” (Experiment 4B)
Reliability and scale: Single-item; 1 = Not Very Likely, 7 = Very Likely
Measure: Nervousness (a negative affective state of high activation in which one feels tension following perceived uncertainty or strain)
Source: Watson, Clark, and Tellegen (1988)
Items:
1. “When interacting, I felt…jittery; distressed; anxious” (Experiment 2A-B, 4A-B)
Reliability and scale: Taken as part of the PANAS; 1 = Not at All, 7 = Very Much; 3-item measure; α2A = .928; α2B = .854; α4A = .947; α4B = .898
Measure: Self-Service Technology (SST) Risk (arising from unanticipated and uncertain consequences of an unpleasant nature resulting from an interaction)
Source: Bauer (1960), Meuter et al. (2005)
Items (Experiment 2B):
1. “I am unsure if the assistant will perform satisfactorily”
2. “Overall, using this digital assistant is risky”
3. “The digital assistant didn’t sound like it was designed to do this task well”
Reliability and scale: 3-item measure; α = .841

Measure: Manipulation Check for Voice Speed
Source: Martín-Santana et al. (2015)
Items:
1. The voice I heard when the digital assistant was speaking was ________
Reliability and scale: Single-item; 1 = Slow, 7 = Fast
Measure: Manipulation Check for Interaction Style
Source: Adapted from Asterhan and Schwarz (2007)
Items:
1. Would you say that the interaction with the digital assistant was more passive, or one-directional, or active, and two-directional? (1 = One-directional, 7 = Two-directional)
2. Would you say that the interaction with the digital assistant was more about hearing information or involved instruction? (1 = Hearing Information, 7 = Involved Instruction)
Reliability and scale: Single-item; r4A = .862; r4B = .899
Measure: Vividness – Experiment 2A
Source: Peck, Barger, and Webb (2013)
Items:
1. I was able to imagine using the digital assistant
2. I felt as if the digital assistant was in the device in front of me
3. I could imagine interacting with the digital assistant
4. I felt I could examine the digital assistant/device
Reliability and scale: 4-item measure; α = .848; 1 = Strongly Disagree, 7 = Strongly Agree
Measure: Susceptibility to Emotions of Others – Experiment 3A
Source: Doherty (1997)
Items:
1. If someone I’m talking with begins to cry, I get teary-eyed
2. Being with a happy person picks me up when I’m feeling down
3. When someone smiles warmly at me, I smile back and feel warm inside
4. I get filled with sorrow when people talk about the death of their loved ones
5. I clench my jaws and my shoulders get tight when I see angry faces on the news
6. When I look into the eyes of the one I love, my mind is filled with thoughts of romance
7. It irritates me to be around angry people
8. Watching the fearful faces of victims on the news makes me try to imagine how they might be feeling
9. I melt when the one I love holds me close
10. I tense when overhearing an angry quarrel
11. Being around happy people fills my mind with happy thoughts
12. I sense my body responding when the one I love touches me
13. I notice myself getting tense when I’m around people who are stressed out
14. I cry at sad movies
15. Listening to the shrill screams of a terrified child in a dentist’s waiting room makes me feel nervous
Reliability and scale: 15-item measure; α = .902; 1 = Strongly Disagree, 7 = Strongly Agree
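For readers who want to see how reliability coefficients such as the α values in this appendix are typically obtained, the following is a minimal sketch of Cronbach's alpha. It is not the author's analysis script; the function name `cronbach_alpha` and the example scores are hypothetical, and sample variances are assumed.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    items: one list of scores per item, with respondents in the same order
    in every list. Uses sample variances (n - 1 in the denominator).
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 7-point responses from five participants to three items.
jittery = [2, 5, 3, 6, 1]
distressed = [3, 5, 2, 6, 2]
anxious = [2, 4, 3, 7, 1]

print(round(cronbach_alpha([jittery, distressed, anxious]), 3))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.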

APPENDIX D: MEANS AND SD’s FOR PANAS MEASURES IN EXPERIMENT 2A

Means and (SD's)


Experiment 2A
Slow Moderate Fast p
Interested 2.73 (1.65) 2.69 (1.57) 2.64 (1.55) 0.920
Distressed 3.46 (1.87) 4.05 (1.71) 3.76 (1.84) 0.070
Excited 2.9 (1.67) 3.35 (1.59) 3.23 (1.57) 0.126
Upset 2.31 (1.49) 2.26 (1.39) 2.17 (1.45) 0.789
Strong 3.91 (1.82) 3.73 (1.57) 3.72 (1.68) 0.696
Guilty 1.96 (1.22) 1.78 (1.12) 1.89 (1.23) 0.584
Scared 1.74 (1.18) 1.89 (1.32) 1.90 (1.13) 0.593
Hostile 2.27 (1.40) 2.01 (1.33) 2.10 (1.42) 0.414
Enthusiastic 2.93 (1.66) 3.28 (1.61) 3.26 (1.63) 0.237
Proud 3.63 (1.80) 3.45 (1.55) 3.65 (1.63) 0.641
Irritable 3.62 (1.96) 3.29 (1.80) 3.09 (1.82) 0.137
Alert 3.80 (1.80) 3.93 (1.66) 3.92 (1.72) 0.851
Ashamed 1.91 (1.23) 1.83 (1.12) 1.85 (1.28) 0.906
Inspired 3.18 (1.77) 3.45 (1.74) 3.37 (1.75) 0.535
Anxious 2.22 (1.52) 1.82 (1.66) 2.24 (1.42) 0.046
Determined 3.71 (1.87) 4.05 (1.74) 3.99 (1.72) 0.368
Attentive 4.06 (1.77) 4.27 (1.66) 4.17 (1.78) 0.700
Jittery 3.21 (1.82) 2.62 (1.72) 3.19 (1.78) 0.031
Active 3.71 (1.85) 3.30 (1.66) 3.53 (1.61) 0.239
Afraid 1.87 (1.27) 2.00 (1.46) 1.83 (1.24) 0.642
Vividness 3.74 (1.41) 4.05 (1.50) 4.12 (1.46) 0.155
PANAS Positive Affect 3.45 (1.23) 3.55 (1.16) 3.55 (1.14) 0.812
PANAS Negative Affect 2.45 (.91) 2.35 (.90) 2.40 (.92) 0.748

APPENDIX E: INSTITUTIONAL REVIEW BOARD APPROVAL LETTER

APPENDIX F: TABLES

Table 1

Experiments 2A-B – Mediation Results

All rows: IV = Speech Rate; DV = Likelihood to Use DA; Mediator = Listener Nervousness.

Experiment  PROCESS Model #  Mediator 2  Contrast of Multicategorical IV   Effect   Indirect Effects 95% CI
2A          4                -           X1: Slow vs. Moderate             -.0796   -.0022 to -.1980
2A          4                -           X2: Fast vs. Moderate             -.0843   -.0068 to -.1961
2A          4                -           X3: Slow vs. Fast                  .0047   -.0949 to .0965
2B          6                SST Risk    X1: Moderate versus Slow Speech   -.0259   -.0539 to -.0039
2B          6                SST Risk    X2: Moderate versus Fast Speech   -.0124   -.0320 to -.0020
2B          6                SST Risk    X3: Slow versus Fast Speech       -.0077   -.0241 to .2311
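The indirect effects and 95% confidence intervals in Table 1 come from the PROCESS macro. As a rough illustration of the underlying idea only, the sketch below estimates a single-mediator indirect effect (a × b) for one two-level contrast and builds a percentile bootstrap interval around it. It is a simplification under stated assumptions, not a reimplementation of PROCESS: the data are hypothetical, the functions `indirect_effect` and `bootstrap_ci` are illustrative names, and the serial (two-mediator) and multicategorical cases are not covered.

```python
import random


def _centered_sums(x, m, y):
    """Sums of squares and cross-products around the variable means."""
    n = len(x)
    mean_x, mean_m, mean_y = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    smm = sum((mi - mean_m) ** 2 for mi in m)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    smy = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y))
    return sxx, smm, sxm, sxy, smy


def indirect_effect(x, m, y):
    """Return a * b: a is the slope of M on X, b the slope of Y on M given X."""
    sxx, smm, sxm, sxy, smy = _centered_sums(x, m, y)
    det = sxx * smm - sxm ** 2
    if sxx == 0 or det <= 1e-12 * sxx * smm:
        raise ValueError("X is constant, or X and M are collinear")
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / det
    return a * b


def bootstrap_ci(x, m, y, draws=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < draws:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ValueError:
            continue  # skip degenerate resamples
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * draws)]
    upper = estimates[int((1 + level) / 2 * draws) - 1]
    return lower, upper


# Hypothetical data: x = 0 (moderate) or 1 (fast), m = nervousness, y = likelihood to use.
x = [0] * 6 + [1] * 6
m = [1, 2, 2, 3, 1, 2, 3, 4, 5, 4, 3, 5]
y = [6, 5, 6, 4, 6, 5, 4, 3, 2, 3, 5, 2]

print(indirect_effect(x, m, y), bootstrap_ci(x, m, y, draws=500))
```

As in the table, an interval that excludes zero is read as evidence of mediation for that contrast.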

Table 2

Experiments 4A-B – Moderated Mediation by Nervousness and Interaction Style

All rows: IV = Speech Rate; DV = Likelihood to Use DA; Mediator = Felt Nervousness; Moderator = Interaction Style; PROCESS Model # = 8.

Experiment  Contrast of Multicategorical IV  Index of Moderated Mediation  Contrast of Dichotomous Moderator  Indirect Effects 95% CI
4A          X1: Moderate versus Slow         .0082 to .4629                Monological                        -.3469 to -.0211
4A          X1: Moderate versus Slow         .0082 to .4629                Dialogical                         -.1057 to .2117
4A          X2: Moderate versus Fast         .0916 to .5671                Monological                        -.4486 to -.1024
4A          X2: Moderate versus Fast         .0916 to .5671                Dialogical                         -.0971 to .1953
4A          X3: Slow versus Fast             -.1216 to .3295               Monological                        -.2618 to .0537
4A          X3: Slow versus Fast             -.1216 to .3295               Dialogical                         -.1618 to .1624
4B          X1: Moderate versus Slow         .0343 to .3674                Monological                        -.2866 to -.0264
4B          X1: Moderate versus Slow         .0343 to .3674                Dialogical                         -.0715 to .1444
4B          X2: Moderate versus Fast         .0414 to .4204                Monological                        -.3123 to -.0331
4B          X2: Moderate versus Fast         .0414 to .4204                Dialogical                         -.0546 to .1705
4B          X3: Slow versus Fast             -.1135 to .2092               Monological                        -.1406 to .0842
4B          X3: Slow versus Fast             -.1135 to .2092               Dialogical                         -.0944 to .1403

APPENDIX G: FIGURES

FIGURE 1

CONCEPTUAL OVERVIEW OF ALL EXPERIMENTS

FIGURE 2

EXPERIMENT 2B – SERIAL MEDIATION OF SPEECH RATES ON LIKELIHOOD TO USE DA

FIGURE 3

EXPERIMENT 3A – JOHNSON-NEYMAN GRAPH OF MODERATION OF LIKELIHOOD TO USE DA BY SUSCEPTIBILITY TO OTHERS EMOTIONS

FIGURE 4

EXPERIMENT 3B – LIKELIHOOD TO USE DA ACROSS INTERACTION STYLE AND SPEECH RATE

[Figure: Likelihood to Use DA (y-axis, 0.0 to 6.0) by Interaction Style (x-axis: Monological, Dialogical), with separate series for Slow, Moderate, and Fast speech rates.]

FIGURE 5

EXPERIMENT 4A – LIKELIHOOD TO USE DA ACROSS INTERACTION STYLE AND SPEECH RATE

[Figure: Likelihood to Use DA (y-axis, 0.0 to 6.0) by Interaction Style (x-axis: Monological, Dialogical), with separate series for Slow, Moderate, and Fast speech rates.]

FIGURE 6

EXPERIMENT 4B – LIKELIHOOD TO USE DA ACROSS INTERACTION STYLE AND SPEECH RATE

[Figure: Likelihood to Use DA (y-axis, 0.0 to 6.0) by Interaction Style (x-axis: Monological, Dialogical), with separate series for Slow, Moderate, and Fast speech rates.]

APPENDIX H: HEADINGS LIST

1) THEORETICAL FRAMEWORK

2) Elements of Speech

2) Speech Rate

2) Social Response

2) Responses to Speech

3) Nervousness

3) Interaction Style

1) OVERVIEW OF STUDIES

1) EXPERIMENT 1

2) Method

3) Participants and Design

3) Stimuli and Pretests

3) Procedure

3) Measures

2) Results and Discussion

3) Speech Rate

3) Likelihood to use the DA

1) EXPERIMENT 2A

2) Method

3) Participants and Design

3) Procedure

3) Measures

2) Results and Discussion

3) Speech Rate

3) Likelihood to use the DA

3) Nervousness

3) Mediation of Main Effect by Nervousness

1) EXPERIMENT 2B

2) Method

3) Participants and Design

3) Procedure

3) Measures

2) Results and Discussion

3) Speech Rate

3) Likelihood to use the DA

3) SST Risk

3) Serial Mediation Analyses

1) EXPERIMENT 3A

2) Method

3) Participants and Design

3) Procedure

3) Measures

2) Results and Discussion

3) Speech Rate

3) Likelihood to use the DA

3) Personal Differences in Susceptibility to Others Emotions

1) EXPERIMENT 3B

2) Method

3) Participants and Design

3) Procedure

3) Measures

2) Results and Discussion

3) Speech Rate

3) Likelihood to use the DA

1) EXPERIMENT 4A

2) Method

3) Participants and Design

3) Procedure

3) Measures

2) Results and Discussion

3) Manipulation Checks

3) Nervousness

3) Moderated Mediation

1) EXPERIMENT 4B

2) Method

3) Participants and Design

3) Procedure

3) Measures

2) Results and Discussion

3) Manipulation Checks

3) Nervousness

3) Moderated Mediation

1) GENERAL DISCUSSION

2) Theoretical Implications

2) Managerial Implications

2) Limitations and Future Research

1) DATA COLLECTION INFORMATION
