Accepted Manuscript: 10.1016/j.chb.2018.08.009
Voices in and of the machine: Source orientation toward mobile virtual assistants
Andrea L. Guzman
PII: S0747-5632(18)30384-4
DOI: 10.1016/j.chb.2018.08.009
Reference: CHB 5643
Please cite this article as: Guzman A.L., Voices in and of the machine: Source orientation toward mobile
virtual assistants, Computers in Human Behavior (2018), doi: 10.1016/j.chb.2018.08.009.
ACCEPTED MANUSCRIPT
Voices In and Of the Machine: Source Orientation Toward Mobile Virtual Assistants
Andrea L. Guzman
Abstract
Research regarding source orientation has demonstrated that when interacting with computers,
people direct their communication toward and react toward the technology itself. Users perceive
dimension to those findings with regard to source orientation with voice-based, mobile virtual
assistants enabled by artificial intelligence (AI). In qualitative interviews regarding their
Voice) and their perceptions of interactions with these specific technologies, some participants
describe the agent they can hear but not see as a voice in the mobile phone (assistant as distinct
entity) while others perceive the technology that they command to be the voice of the phone
(assistant as the device). Therefore, congruent with existing research, users of mobile assistants
orient toward a technology, instead of thinking they are interacting with a human, but, in contrast
voice and are designed with various social cues and degrees of intelligence, the locus and nature
Artificial Intelligence
Running Head: SOURCE ORIENTATION & MOBILE ASSISTANTS 2
1. Introduction
increasingly become a fixture in mobile devices: In addition to Siri, there is Google’s name-less
female assistant, Microsoft’s Cortana, and Samsung’s S-Voice. Some of these technologies now
are being integrated into other devices, such as laptops and desktops, while a whole new market
of stand-alone products that serve as smart assistants for the home, such as Amazon Echo, is
(AI) means that people are increasingly speaking with both humans and human-like virtual
agents as they navigate almost every aspect of their lives. How do people conceptualize
technologies that possess voice, along with intelligence and other social attributes, without a
body as communicators and make sense of their communication with these digital interlocutors?
communication (HMC) regarding people’s interactions with virtual assistants for mobile phones
(Author, 2015). This article focuses on the project’s findings regarding an integral aspect of
people’s communication with technology: source orientation, who or what people direct their
attention toward in an exchange between a computer and user (Reeves & Nass, 1998).
Voice-based AI technologies and people’s interactions with them are a timely and
important area of study for communication and technology scholars and designers, and source
agents as mobile phone applications, most of the general public had never had the opportunity
for sustained interaction with an AI technology that could talk and exhibit overt social cues.
News reports of Siri’s launch hailed the technology as the AI of science fiction being made a
reality (e.g., Gross, 2011a). For the first time, people could readily interact with a technology
they identified as AI and could communicate with it in ways that were unlike their interactions
interaction (HCI) have documented how people respond to and behave toward computers (e.g.,
Nass, Steuer, & Tauber, 1994), including with voice-based technologies (e.g., Nass & Brave,
2005), the communicative ability of mobile virtual assistants is more advanced than early car
navigation systems or similar applications around which initial theories of people’s interactions
with computers were developed. Part of the difference between conversational agents that serve
as mobile assistants and their predecessors is the use of natural language processing that
allows users to speak toward and get replies from an application in ways similar to people’s
interactions with other humans (Hearst, 2011). Mobile virtual assistants not only are designed to
be more human-like than most predecessor technologies but also are intended to be a routine part
of people’s daily lives as they are readily located in back pockets and purses. Interactions with
mobile assistants, then, are ongoing and contingent upon users’ routines and how they choose to
communicate with these interlocutors, rather than being limited to exchanges based on a narrow
communication standpoint, mobile virtual assistants represent a step forward in the development
of communicative technology.
Who or what people think they are interacting with – the source of a message – is an
interaction with a fellow human communicator hinges on their conceptualizations of that other
person (e.g., Goffman, 1959). Similarly, researchers have found that people’s orientation toward
technology as a message source influences how they evaluate messages from that technology
(e.g., Nass & Steuer, 1993; Sundar & Nass, 2000, 2001) as well as how they behave toward it
(e.g., Eckles et al., 2009; Shechtman & Horowitz, 2003; Srinivasan & Takayama, 2016). The
conceptualizations of this emerging technological class that can assist scholars in contextualizing
and explaining people’s initial interactions with intelligent interlocutors and inform the design of
the next generation of mobile assistants and conversational agents.
2. Literature Review
2.1 Human-Machine Communication
This research is part of a larger project (Author, 2015) regarding voice-based digital
assistants informed by theories in human-machine communication. HMC is an emerging area of
research within the communication discipline encompassing scholarship within HCI, human-
robot interaction (HRI), and human-agent interaction (HAI) as well as other aspects of scholarly
inquiry into people’s communication with technology (see for example Edwards et al., 2017;
Guzman, Jones, Edwards, Edwards, & Spence, 2016). As such, HMC brings together research
Research in this area is united in its goal of understanding technology as a communicator rather
than restricting its role to that of a mediator, which has been the default conceptualization of
technology within communication theory (see discussions in Gunkel, 2012; Nass & Steuer,
Because the creation of meaning between humans is guided by how people interpret one
interlocutor has been an important part of the study of people’s interactions with computers (e.g.,
Nass et al., 1994), with agents (e.g., Payr & Trappl, 2004), and with robots (e.g.,
Schaefer, Billings, & Hancock, 2012). Researchers have established that people interpret and act
toward computer programs, embodied agents, and robots as independent interlocutors (e.g.,
Hoffmann, Krämer, Lam-chi, & Kopp, 2009; Nass & Steuer, 1993; Straub, Nishio, & Ishiguro,
2010). Furthermore, people often respond to communicative technology socially by applying
rules of communication with humans to interactions with technology (Nass et al., 1994; Reeves
& Nass, 1998). People respond to specific humanlike traits and attributes programmed into
technology including gender (Nass, Moon, & Green, 1997), personality (Moon & Nass, 1996),
and nationality (Tamagawa, Watson, Kuo, MacDonald, & Broadbent, 2011). Giving technology
a voice is a particularly effective way of eliciting social responses (Nass & Brave, 2005).
The voice-based, virtual agents that are the focus of this study are vastly more
technologically advanced and complex than many of the technologies that were the subject of
earlier studies, such as stationary computers with a synthesized voice. Siri and similar agents are
programmed with human-like traits of gender and overt personality, and, equally important, also
are designed to enact a specific social role – that of an assistant – in the functions they perform
and in the messages they send to the user. Mobile virtual assistants combine many of the social
attributes that were once singularly programmed into individual technologies. These agents also
utilize artificial intelligence, making them “smarter” and more responsive to individual users.
Overall mobile assistants are a new type of technology within a familiar device: a voice that
people can hear, but not see, emanating from a mobile phone that was once used primarily for
listening and speaking to human voices. The goal of the larger research project (Author, 2015),
from which this study was developed, was understanding people’s conceptualizations of a new
application (a virtual assistant) incorporated into a familiar piece of hardware (a mobile phone).
An integral question in early HCI research was whether the people exhibiting social
behavior toward technology were mentally attending to the technology as a source or whether
they were projecting their response through the device toward a human they perceived to be
“behind” the machine, such as a programmer (Reeves & Nass, 1998; Sundar & Nass, 2000). The
distinction – whether people perceive technology as source versus medium – was and still is
critical for work within HMC given that people’s source orientation affects their evaluation of
the messages they are receiving (Sundar & Nass, 2001) and their interactions with a technology
(Eckles et al., 2009; Srinivasan & Takayama, 2016). Identifying the source of human interactions
is not always straightforward (Sundar & Nass, 2001) and is particularly complicated when
communication is mediated (Reeves & Nass, 1998). In people’s interactions with technology,
determining the source of a message may be more complex given that people are simultaneously
interacting with multiple human and technological “layers,” that can be perceived as sources (i.e.
technologies people react to and behave toward the technology itself as opposed to a human
programmer or operator (e.g., Hoffmann et al., 2009; Srinivasan & Takayama, 2016; Straub et
al., 2010; Sundar & Nass, 2000). Nass and Steuer (1993) found that people differentiate among
computerized voices, acting toward them as distinct communication partners, with Sundar and
Nass (2000) further establishing that, in direct interactions with technology, people respond to
technology as a message source rather than as a medium passing along a message from a human
programmer. Subsequent research has also found that people consider embodied conversational
agents (Hoffmann et al., 2009) and act toward robots (Straub et al., 2010) as sources. People also
act differently toward a technology sending a message versus a human sending the same
message, indicating that people perceive humans and specific technologies as distinct
interlocutors (Eckles et al., 2009; Shechtman & Horowitz, 2003). In terms of source orientation,
a message exchange between a co-located human and computer is analogous to interpersonal
Research regarding source orientation between humans and computers has primarily
focused on this human-computer equivalent of people’s face-to-face interactions, with the user
and technology co-located (Eckles et al., 2009). Just as human communication can be mediated,
so too can human-machine communication, meaning that one device is sending the message and
a second device is transmitting it. Eckles et al. (2009) found that in human-computer interaction
mediated by a mobile phone, people orient toward the computer where the message was said to
originate. In both co-located and mediated exchanges between humans and computers, people act
toward and respond to digital technologies as distinct actors, distinguishing between a computer
Scholars draw on theory from both interpersonal and mediated communication to explain
why people perceive technology as a distinct interlocutor. Reeves and Nass (1998) explain that,
Similarly, in their interactions with co-located technology, people orient toward what is present
with them in the same space, such as a computer (Sundar & Nass, 2000) or an embodied agent
and robot (Hoffmann et al., 2009; Straub et al., 2010). Eckles et al.’s (2009) finding that people
orient toward a technology that is removed from them, however, demonstrates that other factors
beyond spatial proximity play a role in source orientation with digital interlocutors. Drawing
from theory in mediated communication, Solomon and Wash (2014) propose a connection
between social presence and source orientation. Presence is “the illusion of nonmediation” that
people may experience when engaged in a mediated interaction through a technology or with a
technology (Lombard & Ditton, 1997), with social presence generally defined as the “sense of
being with another” (Biocca, Harms, & Burgoon, 2003, p. 456). The same social cues in
technology design that elicit social responses from people toward machines can evoke user
perceptions of social presence (Lee & Nass, 2005). Solomon and Wash (2014) theorize that
people experiencing social presence with a communication partner are orienting their attention
and act toward technology as independent interlocutors and message sources, limitations within
this research coupled with the unique design of mobile virtual assistants raise questions as to the
degree of applicability of previous findings to voice-based, mobile agents. In existing studies, the
technology was often described to participants as performing some sort of communicative role
(e.g., Sundar & Nass, 2000) or was specifically labelled as a communicator (e.g., Eckles et al.,
2009). With the exception of Eckles et al. (2009), the majority of source orientation research has
people were interacting with technologies in highly defined and limited contexts established by
the researchers.
mobile virtual assistants are complex technologies with multiple technological “layers.” In
addition to their social cues, their vocal abilities, and their intelligence, their exact “location” is
ambiguous. They are represented primarily as a voice emanating from a mobile phone, and using
an agent requires interaction with the voice as well as limited interaction with the physical phone
(e.g., tapping a home button or adjusting the volume). And so, this research provides an
understanding of source orientation when people must interact with two technologies – the
assistant and the phone – that are coupled with one another. In contrast to the communication
context in previous research, this study focused on people’s perceptions and conceptualizations
as developed through a pattern of use determined by the participants themselves. There was no
labelling of the technology other than what was presented to participants by the program itself or
other representations of it (for example, media portrayals of Siri). People’s interactions with
assistants were also open-ended, ranging from asking an assistant to set an alarm to asking it to tell a funny
joke. Overall, this research provides the opportunity to further develop theory regarding source
orientation in interactions in which the source may be ambiguous and interactions with it
3. Method
This study is part of a larger project that focused on people’s conceptualizations of voice-
based, mobile virtual assistants (Author, 2015). As discussed, the design and use of mobile
assistants vary in important ways from predecessor technologies, and these differences warrant
methodological consideration. Unlike previous devices that performed a narrow range of tasks
within highly specific contexts, such as voice-based navigation within vehicles, mobile assistants
are designed to engage in a variety of interactions with users across multiple contexts while also
adapting to individual users. The flexibility of the design of mobile assistants and the greater
agency this gives to individuals in terms of how and when they interact with assistants increases
the potential that people’s conceptualizations of mobile assistants are individualized. Therefore,
the larger project and this study followed a qualitative approach that emphasized people’s
criteria solely determined a priori by the researcher (see Christians & Carey, 1989; Maxwell,
2013 for a discussion of the affordances of qualitative research in this context). Such a user-
focused approach in technology research, as Lee et al. (2014) argue, provides important insight
into user understanding of technology that can be leveraged by researchers and designers to
3.1 Time Period
Data for the study was collected during the first half of 2015. The timing of this study is
important because it took place after most mobile platforms had launched voice-based assistants
but immediately before Amazon Echo, a home assistant, became publicly available. The project
captures a moment in the adoption of these technologies in which many people had the
opportunity for ongoing use in the context of mobile phones. The findings provide a baseline for
understanding source orientation with mobile virtual assistants that can be used to help interpret
future studies of assistants within mobile phones as well as to compare and contrast against
perceptions of these same agents used on other devices, such as Siri on a desktop, as well as
3.2 Participants
The overall project (Author, 2015) consisted of interviews with 64 participants selected
from field sites, with responses from 28 of those participants included in this study. As explained
further in the procedures section, the number of people included in this study is lower because
specific questions regarding source orientation were added later in the research project. In
addition, the larger project included people who did not use mobile virtual assistants, to whom
source orientation does not apply. Participants included in this study ranged in age from 20 to 65
with a mean age of 33.5 (SD = 14.31) with 61 percent identifying as female and 39 percent,
male. Regarding race and ethnicity, 47 percent of participants self-identified as white, 21 percent
as black, 14 percent as Latino, 14 percent as Asian, and 4 percent as Middle Eastern. Mobile
virtual assistants used by participants include Apple’s Siri, Google Voice, and Samsung’s S-
Voice, with some participants having experience with more than one technology. Given the
timing of the study, participants were most familiar with Siri, either through owning an Apple device or
by using it with other people, because Siri had been available the longest.
3.3 Procedure
Field observations and interviews were conducted by the researcher in public places –
such as parks, transit stations, and common areas on university campuses – in a large
Midwestern city. The researcher entered the field sites and observed the technologies people
were using and how they were engaging with the technology. Participants were selected for
interviews based on their technological use and demographic factors to engage with a wide
variety of people with varying experiences with technology. Interviews were audio recorded and
transcribed verbatim. Regarding research ethics, the appropriate Institutional Review Board
approval was obtained. Participants chose an alias to be referred to during the interview and then
were randomly assigned a different alias by the researcher for the reporting of the results.
Consistent with the project’s goal of emphasizing people’s own articulations of their
conceptualizations of mobile virtual agents, the researcher conducted field interviews using the
active interviewing method (Holstein & Gubrium, 1995). In contrast to interview approaches that
stress the agency of the researcher over that of the participant, giving the researcher full control
over what is discussed and how it is discussed, active interviewing approaches the interview as a
meaning-making process between researcher and participant, with the interview designed to be
responsive to the information being shared by the participant (Holstein & Gubrium, 1995). For
this study, the researcher developed a questionnaire based on existing literature regarding
people’s general conceptualizations of the communicative other (e.g., Goffman, 1959) and
aspects of people’s interactions with technology, such as social presence (e.g., Biocca et al.,
2003). The questionnaire also included open-ended items aimed at eliciting additional
In active interviewing, the questionnaire is meant to serve as a guide for the interviews
but can be further revised when necessary to study unanticipated themes or angles that emerge
based on participant answers, and that is the case regarding this study. Specific questions
regarding source orientation were not included in the initial questionnaire because, despite
limitations of existing research, the consistent finding has been that people perceive technology
to be a source (e.g., Eckles et al., 2009; Sundar & Nass, 2000); however, questions on the initial
assistants did address aspects of source orientation. Early in the research, participant answers to
these questions indicated that people were orienting to technology, instead of a human, but may
a question was added to the questionnaire regarding whether people thought of themselves as
interacting with (talking to or listening to) the phone or the assistant (using the assistant’s name).
Questions that probed source orientation from different angles (such as asking people if they
associated the voice they were hearing/talking with the assistant or the phone) were also used to
conducted concurrently until saturation of the theoretical categories was reached (Charmaz,
2014). The researcher used the qualitative analysis program MaxQDA to assist with coding and
analysis. Verbatim interview transcripts were coded using methods outlined in Saldaña (2013),
including structural and in-vivo coding. The researcher analyzed the codes and subcodes applied
to participants’ answers to individual interview questions looking for trends in the data for each
question and across questions. The researcher also compared interviews to one another and
against existing literature, looking for areas of similarity and divergence, in keeping with good
practice for the analysis of qualitative data (see Maxwell, 2013). In addition to analyzing
people’s answers to questions directly regarding source orientation, the researcher also looked at
other aspects of the interview, such as people’s overall descriptions of an assistant or explanation
of how they interact with the assistant, to help contextualize and interpret people’s answers
4. Findings
All participants responded that when speaking or listening to an agent’s voice, they
specific technology they were interacting with varied: 9 participants described themselves as
primarily talking with the mobile phone; 16 participants perceived themselves as primarily
talking with the assistant or another technology in or accessed through the phone; 3 participants
explained that although they mostly thought of themselves as interacting with either the phone or
the assistant, at times their conceptualization of the source switches either while using the same
Some users describe the communication process with mobile virtual assistants as an
exchange of messages between them and the device. Although the voice they are hearing is being
produced by the assistant application, these participants think of the voice as representing the
mobile phone. When they are speaking to the voice, they are giving commands directly to the
phone, listening to a clearly artificial voice speak back to them, interacting with a device on a
limited basis, or are avoiding speaking to the voice because they generally dislike phones.
The last scenario is the case with Sabrina, a university student, who has tried Siri and
similar programs several times but does not use them regularly and has no plans to start doing so.
Part of the reason why she avoids mobile virtual assistants is that Sabrina avoids using the
phone, period. Using an assistant is the same as talking to the phone, for Sabrina. When asked to
describe Siri, Sabrina states:
“Um, I guess it’s like a voice-based interaction with your, whatever media device
you’re using.”
Sabrina also thinks of speaking directly to a phone, particularly around others, as a type of
“I just think it’s weird like if you’re walking around and you talk to your phone.”
In both her description of Siri and her explanation of avoiding Siri, Sabrina makes clear
that she equates interacting with Siri with communicating with the phone.
But equating a virtual assistant with a phone is not an exclusive trait of people who have
little interaction with the technology or prefer not to use phones generally. Both Cindy, a retired
educator, and Dwight, a recent college graduate, use Siri on their iPhones at least once a day for
a variety of tasks, and enjoy doing so. It is because Cindy equates interacting with Siri with
interacting with the phone that she uses the voice-based assistant so often. Cindy describes her
“…I can speak to the phone, and I don’t have to type on my phone. It’s fast.”
When asked directly about what they think they are interacting with, both Cindy and Dwight
identify “the phone.” For Dwight, his perception of Siri as the iPhone is built around the fact that
“So basically you are controlling the computer yourself. You are the motherboard
of it, and it’s just waiting for your command.”
Siri provides a way for Dwight to control his phone and therefore, to Dwight, is the phone. Siri
is not life-like to Dwight, and the other descriptions he provides of Siri are similarly focused on
technological aspects associated with a tool. Cindy talks about Siri in similar terms, and
other participants who think of themselves as interacting with the device provide descriptions of
assistants that are focused on their technical aspects, not their life-like attributes.
It is important to note that people’s understanding of what a mobile virtual assistant is, in
terms of its status as a program, varies. And people’s perceptions of the assistant and its voice as
being that of the phone are not necessarily a function of a misunderstanding regarding the
technology. Some participants are aware that the voice they are interacting with is created by a
program, a piece of software, and not the phone, a piece of hardware. However, their experience
of interacting with the voice leaves them with a different impression of what they are
communicating with. Fred is slightly above middle age and works in a management position. He
uses Google Voice Search on a limited basis to quickly interact with the phone or find
information. When I ask Fred what he is talking to and what is talking back, Fred replies:
“I think that the phone is talking back to me. I realize there’s a voice that’s a
program. I realize that, but it’s the phone talking back to me.”
Fred knows that the voice he is interacting with is technically separate from the phone, but the
process of actually communicating with the agent and its voice during his limited interaction
with the voice leaves him with the impression that he’s talking directly to the phone.
Other participants perceive themselves as engaging with an interlocutor that is distinct
from the device. They experience the assistant as a voice in, or even travelling through, the
machine and think of their device as a medium transmitting the assistant’s voice. Some of the
descriptions of digital interlocutors here are similar to historical accounts of sentient technology
and people’s experiences with ephemeral entities, such as those documented in Sconce’s (2000)
Haunted Media. However, not every person who thinks of a virtual assistant as something
independent from the phone perceives it as sentient. Some participants think of the voice as
being produced by software instead of hardware: the assistant remains a thing, just a different
thing from the device. The degree to which the agent is divorced from the hardware varies
because people still engage with the device while using the agent (e.g., tapping a button or talking
toward it), and the assistant’s voice is transmitted through the device.
Some participants describe the assistant they use as separate from the phone, but as
existing within it. Priscilla is a college student and a relatively new Siri user who relies on the
program to take notes for her or set alarms. Priscilla describes Siri as being
When I ask Priscilla whether Siri is the phone or another entity, Priscilla provides an explanation
“I think it’s like another thing, because usually when you, like say you make a, a
note. It’s just like, ‘This is your note. Would you like me to make it?’ And I say,
‘Yes.’ And it’s kind of like I’m talking to it. So, it’s not like something that you
type out, click enter. It’s kind of like you’re saying, ‘Like, yes. Do that for me.’ It
kind of feels like you’re telling somebody to, like your little assistant to write it
down for you.”
Priscilla’s description parallels that of a human-human exchange between employer and assistant
and reflects Siri’s social and communicative positioning as an assistant (see Guzman, 2017 for a
detailed explanation) and, thus, independent social actor. Although Priscilla conceptualizes Siri
as distinct from the phone, she does not think of it as entirely removed from the device, either. In
Priscilla’s various descriptions of Siri throughout the interview, Priscilla tacks on the adjective
“little” in front of the noun she’s using to identify Siri, e.g., “little person” and “little assistant.”
The context of Priscilla’s interview makes clear that “little” denotes the reduced stature of the
assistant needed to fit into the phone – a type of “Honey, I shrunk the digital assistant” heuristic.
Priscilla isn’t the only participant who discusses the “violation” of physics needed for an
agent – a separate entity – to reside within a device. Lillian, who is in her early 20s, does not use
Siri regularly, and in fact does not see a point in doing so, but is still able to talk about the
program in detail. When asked to assign traits to Siri, Lillian describes the program as “all-
knowing.” When asked why she would use this adjective, Lillian replies:
“Um, well, it’s like a person so to speak that you can talk to that gives you all the
answers or, I guess, all powerful, too, because, then, they can go into the phone.”
Similar to Priscilla, Lillian perceives the agent as an entity that is distinct, but not entirely
removed from the phone. To be clear, Priscilla, Lillian, and others who think of an assistant as a
distinct entity realize that it is a program and make that very clear, including Adam, who is in his
mid-20s. Several times during his interview, Adam refers to the assistant as being like a “crazy
I: Do you think of yourself as interacting with your phone or interacting with Siri?
R: With Siri.
I: And so, at that point, what is Siri in that moment? Is Siri the voice to you, or
Siri a program, or something else?
R: Um, I mean it’s a voice (be)cause she, er, she talks back. But, um, I think like
that’s kind of how I would think of it: like just a crazy lady that lives in my phone
that, you know.
I: So, you’re talking to the crazy lady?
R: Yeah, that I know is a computer, but, or a computer application.
I: So, basically you’re talking to an application that takes the form of a crazy
lady?
R: Yes.
From this exchange with Adam, as well as discussions with others, it is clear that these
descriptions of Siri are metaphorical, and it is through these metaphors and other descriptions
that users explain an assistant that is its own entity but is never far from the phone.
The examples provided so far heavily anthropomorphize mobile virtual assistants, and
many of the users described above discuss their programs as if they are life-like throughout their
interviews. However, not all participants who perceive themselves as interacting directly with
assistants conceptualize their digital interlocutors as humanized entities. Mara’s boyfriend has
Siri on his mobile phone, and the twentysomethings routinely interact with the program. Mara
thinks of herself, and her boyfriend, as talking directly with Siri, describing Siri as being
“…more like a robot, you know. Like the robot’s on the other side of the phone.”
Similarly, Sigurd is an IT executive who has worked extensively with voice-based and AI
technologies. He also is a heavy Siri user and thinks of himself as interacting with Siri, not the
phone. The key difference between users like Sigurd and those like Priscilla or Adam is that
participants like Mara and Sigurd focus on Siri or other agents as technologies. In every
description Sigurd provides of Siri, he refers to Siri as software or a technology that serves as a
While most participants associated the disembodied voice in the machine with a
specific agent, such as Siri, one participant conceptualized it as a different technology accessed
through the phone. Olivia is a graduate student in her early thirties who uses Google Voice
Search. When asked what she is interacting with, the phone or Google Voice, Olivia replies:
“Uh, I think of it as speaking to a database, which would be the Google
Database.”
Olivia’s primary use of Google Voice Search is to search for information by voice instead of
typing a query into the Google search page on the phone. In her conceptualization of what she is
interacting with, Olivia focuses on what she thinks is the origin of the information – Google
itself. It is important to note that Olivia was the only participant to think of herself as interacting
with something other than the device or the agent, and so, her response could be interpreted as an
with Google. That a person could think of the source of the information as neither the agent nor
the phone is not out of the question given that mediated information passes through many
technological layers, any of which may be thought of as a source (Solomon & Wash, 2014).
Overall, some participants think of themselves as communicating with the agent instead of the
device, but the extent to which the agent is tethered to the device varies, as does the agent’s status
as a piece of software or life-like entity. In more limited instances, people may think of
Most study participants clearly delineated between the phone and the assistant as the
source with which they were interacting. These participants articulated the focus of their
attention indirectly through their discussion of communication with the agents and directly when
asked to identify their conversational partner. Some people interviewed answered the questions
with what appeared to be a relative level of confidence manifest as quick, straightforward replies,
as if they didn’t have to think twice. Other people, however, required a moment to think, which
took the form of a vocalized pause, or had to talk through the question. The need for people to think
through their response to the questions being asked is unsurprising given that people’s
interactions with technology are “mindless” (Nass & Moon, 2000). People who interact with
social technologies, particularly regularly, do not take the time to think through what they are
interacting with and how they should interact with it. Some participants in this study reported
that they had never thought about the questions being asked of them regarding source
orientation.
For several participants, discussions regarding drawing the line between the voice of the
machine and the voice in the machine and pinpointing an exact source were particularly difficult.
A former Siri user who recently acquired an Android device, Beecher goes back-and-forth during
the interview as to whether Siri is the phone or the agent. Beecher, who is in his early twenties,
“Oh, yeah, well. Yeah, I don’t think there’s a separation between the phone and
Siri. Like I know that like, you know, she’s computer generated so like she is the
phone. But I, but strangely enough, I never call my phone itself Siri. So I guess I
do kind of, in my mind, like unconsciously separate the phone from Siri, but I
don’t know. I never really thought about it. But, like, just thinking about it, it’s
like, ‘I mean, she’s the phone.’ But then it’s like, ‘Well, why didn’t you call your
phone Siri?’ I don’t know.”
The difficulty for Beecher in trying to articulate an either/or position is that he does not think of
Siri and the phone as separate. And this makes sense from a user standpoint. Siri’s voice comes
from the phone, and the program itself is baked into the iPhone. Then Beecher catches himself:
Siri’s name is the reason why he questions whether Siri is the phone or something else. What is
unspoken in Beecher’s reply is a reality of how many users, even people who conceptualize Siri
as only a technology, interact with Siri: they use its name when making requests. Beecher is
trying to make sense of something he interacts with as if it were human but knows it is not.
Judith, a middle-aged professional who uses Siri on a weekly basis, faces a similar
dilemma. When asked what Judith perceives herself as interacting with, she replies:
R: I mean, I call her by name, but I’m talking to my iPhone.
I: So, you call her by name. So, do you think of yourself as talking to Siri or
talking to your iPhone?
R: I mean, I don’t really think of her as like a person that’s like living somewhere
else and you know, phoning it in. Although, I mean, yes, I can think of her with a
face or whatever, but I don’t really think of her as like a person who lives
somewhere. You know what I mean?
Similar to Beecher, Judith, who anthropomorphizes Siri, is caught up with trying to explain an
interaction with an entity she knows is not human but realizes she has acted toward as if it were
human. Such ontological wrangling is not unusual for people when they realize they have been
acting toward machines as if the technology were a person (Nass & Moon, 2000).
Some people’s perceptions of what they are interacting with change based on how the
communication with the machine is unfolding or is contingent upon the assistant they are
interacting with at a particular moment. Cameran frequently relies on Siri to carry out a limited
range of tasks, such as setting alarms. Similar to several participants, Cameran’s articulation of
Siri bounces between the agent as a technology, “it”, and the agent as a female entity, “her.”
When discussing whether he thinks of himself as interacting with Siri or a phone, Cameran
“I mean, really I do see myself as interacting with my phone, though I will admit
sometimes like, you know, particularly if I’m not thinking about it. If I say, you
know, ‘Siri, do this for me.’ And Siri goes like, ‘Okay, now I’ve done this for
you.’ I’ll like just say, ‘Thank you,’ without sort of really thinking about it. So
clearly I’ve anthropomorphized her a little bit.”
For Cameran, interacting with Siri involves directing his attention to the iPhone, most of the
time. But Cameran also has an awareness that under certain circumstances, namely the exchange
of pleasantries that are completely unnecessary when interacting with a program, that his focus
Because people routinely own multiple devices, each potentially having their own
assistant, several participants reported experiencing or actively using different mobile assistants.
Users described these assistants, including Google Voice Search and Siri, as being distinct from
one another in terms of design and use, and one participant, Naomi, reported that her
conceptualization of source was contingent upon the specific mobile assistant. Naomi routinely
uses Google Voice Search to access information about the weather on her phone. This is the only
way she uses the application. Naomi also has interacted with Siri on her iPad a few times, asking
it what Naomi describes as “weird” questions, such as “What should I wear today?”, in a manner
she compares to Zooey Deschanel in the iPhone commercials. Regarding her conceptualization
Now this is kind of bizarre, I guess, because with my phone I think of myself
talking to a program. With this [iPad] I think of myself as talking to Siri. I don’t
know. Maybe it’s because she has a name. But I don’t know. Once you name
Naomi’s source orientation is not uniform across digital assistants, diverging along with her
Congruent with the source orientation literature (e.g., Eckles et al., 2009; Sundar & Nass,
2000), people interacting with voice-based mobile virtual assistants perceive themselves as
participants’ conceptualizations of the technological source they are interacting with diverge as
to whether the voice with which they are communicating is that of the mobile virtual assistant
(the software) or the mobile device itself (the hardware). In limited instances, the locus of a
message is not permanently fixed but is up for negotiation in users’ minds depending on how the
interaction is unfolding at a given moment or what device they are interacting with. Therefore,
the source to which people are attending when interacting with mobile assistants is not uniform
across mobile assistants as a technological class, among specific assistants or for individual
users. For people who think of themselves as interacting with the “voice of the machine,” the
mobile assistant comes to function primarily as a medium for controlling the phone or extracting
information from it, whereas for people who perceive the “voice in the machine,” the assistant is
the source. The study of source orientation regarding mobile assistants, thus, requires going
beyond determining simply whether a person is orienting to a technology versus a human to
ascertaining which technological “layer” users are attending to and the subsequent effects.
The key question raised by these findings is why variation exists regarding source
orientation toward mobile virtual assistants. Because this study was focused on exploring
people’s conceptualizations of assistants and perceptions of their interactions with them within
an interpretive framework, it is not possible to definitively identify specific factors that resulted in
with interpretations of these explanations based in existing HMC theory can point toward
The variation in social cues incorporated into the design of mobile virtual assistants is
designed with specific human-like traits (gender, personality, nationality, etc.) elicit social
responses from users (e.g., Nass & Brave, 2005; Nass et al., 1997). While mobile assistants are
more advanced than previous technologies that were the focus of source orientation research and
Siri has the most advanced social design. In addition to a female voice, Siri is imbued
with an overt personality (Gross, 2011b), can carry on non-task related interactions, such as
telling jokes, and has distinct identity characteristics (Guzman, 2017). Google Voice Search,
however, lacks many of these social attributes. The majority of people who used Siri perceived
themselves as interacting with a voice in the machine (70 percent of Siri users), while people
using Google Voice Search associated the voice with the device. Olivia was the only Google
Voice Search user to think of herself as interacting with something other than the mobile phone,
and, in her case, the interlocutor was the Google database. A potentially influential social cue, as
stated by some of the participants themselves, was whether the assistant had a name and how the
name was used during interactions. As previously discussed, it was Siri’s name that indicated to
Beecher that Siri may somehow not be one and the same as the iPhone. Naomi also said she
thought of herself as interacting with Siri as an assistant because it had a name, while she
associated the nameless Google Voice Search with the phone. In addition, some Siri users who
perceive themselves as exchanging messages with the phone reported that they forgo using Siri’s
name when making requests of it. An assistant’s name, however, is only one of multiple social
Solomon and Wash’s (2014) suggested connection between social presence and source
orientation may also help to explain differences in source orientation. In the larger
project of which this study is a part, people were asked questions related to social presence.
Some of the participants, such as Fred, who reported that they think of themselves as interacting
with the phone also indicated they did not experience social presence when using the assistant.
Conversely, some participants who oriented to the assistant, such as Priscilla, heavily
anthropomorphized the assistant and reported experiencing social presence with the agent. As a
psychological phenomenon, presence can vary over time (International Society for Presence
Research, 2000). People’s shifting sense of social presence may also provide an explanation for
the changes in source orientation some participants reported with single assistants, such as
Cameran, or across assistants, such as Naomi. However, the connection between source
orientation and social presence is only proposed by Solomon and Wash (2014), not empirically
supported, making it an explanation subject to further study.
with mobile assistants cannot be more fully discussed here because of space considerations.
Factors that warrant additional study include other aspects of technology design and use, such as
the types of messages exchanged between user and assistant and the functions the assistant
performs on behalf of the user. People’s responses to and perceptions of technology also are
contingent upon factors related to the users themselves, including their personality (Lee & Nass,
2005) or cognitive style (Lee, 2010). Equally important, future research also should investigate
the implications of the study’s findings on assistant use, including how differences in source
There are several limitations to this study, including the composition of the sample as
well as the overrepresentation of Siri users within it. Future research should include a
representation of mobile virtual assistant users that is more in line with adoption of the different
agents. The nature of the overarching study provided breadth and depth in terms of people’s
conceptualizations of assistants and perceptions of interactions with them; however, the method
does not allow for the degree of similarity and difference to be measured, and so further efforts
also are needed in this area. Despite these limitations, this study provides new and important
insight into the different ways people attend to mobile virtual assistants.
For scholars and technology designers, there are several important implications to these
theorized, particularly as technologies are simultaneously increasingly designed to be social, to
perform numerous functions, and to operate across multiple contexts. When users have greater
latitude as to how they communicate directly with a technology and how they use it, it may be
more difficult for technology designers to predict source orientation. Regarding the design
of disembodied programs, the hardware used to access these technologies or that is controlled by
these technologies also needs to be taken into account. Additional research focusing on the
limitations of this study and the questions it generated regarding the explanation of people’s
source orientation toward mobile virtual assistants is needed; however, it should not be limited to
virtual assistants alone. Given the divergent nature of people’s perceptions of what they are
attending to with emerging technologies, more work regarding source orientation in HMC
overall is warranted.
6. References
Biocca, F., Harms, C., & Burgoon, J. K. (2003). Toward a more robust theory and measure of
Charmaz, K. (2014). Constructing grounded theory (2nd ed.). Thousand Oaks, CA: SAGE
Publications.
Christians, C., & Carey, J. W. (1989). The logic and aims of qualitative research. In G. Stempel
& B. H. Westley (Eds.), Research methods in mass communication (pp. 354–374). Englewood Cliffs,
N.J.: Prentice-Hall.
Eckles, D., Wightman, D., Carlson, C., Thamrongrattanarit, A., Bastea-Forte, M., & Fogg, B. J.
(2009). Social responses in mobile messaging: Influence strategies, self-disclosure, and
source orientation. In CHI 2009: Studying Cell Phone Use. Boston, MA.
Edwards, A., Edwards, C., Guzman, A. L., Jones, S., Gunkel, D. J., Spence, P. R., … Lewis, S.
Goffman, E. (1959). The presentation of self in everyday life. New York, NY: Anchor Books.
Gross, D. (2011a, October 4). Apple introduces Siri, Web freaks out. Retrieved October 2, 2013,
from http://www.cnn.com
Gross, D. (2011b, October 17). Snide, sassy Siri has plenty to say. Retrieved October 2, 2013,
from http://www.cnn.com/2011/10/18/tech/mobile/siri-answers-iphone-4s/index.html
Guzman, A. L. (2017). Making AI safe for humans: A conversation with Siri. In R. W. Gehl &
M. Bakardjieva (Eds.), Socialbots and Their Friends: Digital Media and the Automation
Guzman, A. L., Jones, S., Edwards, A., Edwards, C., & Spence, P. R. (2016, June).
Communicating with Machines: The Rising Power of Digital Interlocutors in Our Lives.
Presented at the International Communication Association, Fukuoka, Japan.
Hearst, M. A. (2011). “Natural” search user interfaces. Communications of the ACM, 54(11), 60–67.
https://doi.org/10.1145/2018396.2018414
Hoffmann, L., Krämer, N. C., Lam-chi, A., & Kopp, S. (2009). Media equation revisited: Do
users show polite reactions towards an embodied agent? In Z. Ruttkay, M. Kipp, A.
Nijholt, & H. H. Vilhjálmsson (Eds.), Intelligent Virtual Agents (Vol. 5773, pp. 159–
04380-2_19
Holstein, J. A., & Gubrium, J. F. (1995). The active interview (Vol. 37). Thousand Oaks, CA:
SAGE Publications.
International Society for Presence Research. (2000). Presence defined. Retrieved February 18,
Lee, E.-J. (2010). The more humanlike, the better? How speech type and users’ cognitive style
https://doi.org/10.1016/j.chb.2010.01.003
Lee, H. R., Šabanovic, S., & Stolterman, E. (2014). Stay on the boundary: artifact analysis
exploring researcher and user framing of robot design (pp. 1471–1474). Presented at the
Lee, K.-M., & Nass, C. (2005). Social-psychological origins of feelings of presence: Creating
Lombard, M., & Ditton, T. (1997). At the heart of it all: The concept of presence. Journal of
6101.1997.tb00072.x
Maxwell, J. A. (2013). Qualitative research design (3rd ed.). Thousand Oaks, CA: SAGE
Publications.
Moon, Y., & Nass, C. (1996). How “real” are computer personalities?: Psychological responses
651–674. https://doi.org/10.1177/009365096023006002
Nass, C., & Brave, S. (2005). Wired for speech: how voice activates and advances the human-
Nass, C., & Moon, Y. (2000). Machines and mindlessness: Social responses to computers.
Nass, C., Moon, Y., & Green, N. (1997). Are machines gender neutral? Gender-stereotypic
responses to computers with voices. Journal of Applied Social Psychology, 27(10), 864–876.
Nass, C., & Steuer, J. (1993). Voices, boxes, and sources of messages: Computers and social
2958.1993.tb00311.x
Nass, C., Steuer, J., & Tauber, E. R. (1994). Computers are social actors. Conference Companion
Payr, S., & Trappl, R. (Eds.). (2004). Agent culture: Human-agent interaction in a
Reeves, B., & Nass, C. I. (1998). The media equation. Stanford, CA: CSLI Publications.
Schaefer, K. E., Billings, D. R., & Hancock, P. A. (2012). Robots vs. machines: Identifying user
perceptions and classifications (pp. 138–141). IEEE.
https://doi.org/10.1109/CogSIMA.2012.6188366
Sconce, J. (2000). Haunted media: Electronic presence from telegraphy to television. Durham,
N.C.: Duke University Press.
Shechtman, N., & Horowitz, L. M. (2003). Media inequality in conversation: How people
behave differently when interacting with computers and people. In Proceedings of the
SIGCHI conference on Human factors in computing systems (pp. 281–288). ACM.
Solomon, J., & Wash, R. (2014). Human-What interaction? Understanding user source
orientation. In Proceedings of the Human Factors and Ergonomics Society 58th Annual
Srinivasan, V., & Takayama, L. (2016). Help me please: Robot politeness strategies for soliciting
help from humans (pp. 4945–4955). Presented at the #chi4good, ACM - CHI, San Jose,
Straub, I., Nishio, S., & Ishiguro, H. (2010). Incorporated identity in interaction with a
https://doi.org/10.1109/ROMAN.2010.5598695
703. https://doi.org/10.1177/009365000027006001
Sundar, S. S., & Nass, C. (2001). Conceptualizing sources in online news. Journal of
Tamagawa, R., Watson, C. I., Kuo, I. H., MacDonald, B. A., & Broadbent, E. (2011). The effects
Social Robotics, 3(3), 253–262. https://doi.org/10.1007/s12369-011-0100-4
Research Highlights
Article: Voices In and Of the Machine: Source Orientation Toward Mobile Virtual Assistants
1. Voice-based, mobile virtual assistants (i.e. Siri, Google Voice) have complex designs
2. Some users perceive the conversational agent’s voice as representing the phone
3. Other users perceive the conversational agent’s voice as the assistant in the phone
4. The technological layers of mobile assistant design complicate source orientation