
Poster Session I HAI 2017, October 17–20, 2017, Bielefeld, Germany

Dealing with Long Utterances: How to Interrupt the User in a Socially Acceptable Manner?

Katharina Cyra
University of Duisburg-Essen
Essen, Germany
katharina.cyra@uni-due.de

Karola Pitsch
University of Duisburg-Essen
Essen, Germany
karola.pitsch@uni-due.de

ABSTRACT
Based on two contrasting case analyses, we describe practices of handling long utterances, i.e. turn increments produced by users. The strategies to handle long utterances are executed by a human wizard in the context of a speech-based assistive system – and reveal specific characteristics of human-human-interaction, like precise timing, that are not easily implemented in a technical system. The preliminary analysis shows two basic strategies of handling long utterances: 1) to interrupt and 2) to wait-and-see. But the fine-grained analysis of the participants' perspective shows yet another dimension when evaluating forms of handling turn increments: depending on whether user input was taken up and the initiated action is thus continued, even interruptions (at uncommon points in interaction) are socially accepted by users. In contrast, the socially 'safe' strategy to wait-and-see might even cause trouble when the jointly worked on task is not continued by the system.

Author Keywords
Human-Agent-Interaction; Assistive systems; Social robotics; Natural Language.

ACM Classification Keywords
H.1.2 User/Machine Systems: Human factors; H.5.2 User Interfaces: Natural language; H.5.2 User Interfaces: User-centered design.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
HAI '17, October 17–20, 2017, Bielefeld, Germany
© 2017 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-5113-3/17/10…$15.00
https://doi.org/10.1145/3125739.3132586

INTRODUCTION
Dialogue systems both in human-robot- and human-agent-interaction (HRI/HAI) often have to cope with the fact that users tend to produce long utterances when talking naturally to the system. In comparison to short utterances, or longer utterances which are 'well-packaged' into discrete units [8], long 'un-structured' utterances are hard for autonomous systems to deal with. Following the idea of exploring a technical system's possibilities to pro-actively shape the details of user conduct, [16], [10] and [2] have reconstructed some of the circumstances under which such long utterances emerge, and how these might lead to problems of the dialogue component, eventually turning already correctly detected information into false interpretations. Hence, this paper suggests an initial investigation of the following practical problem: How could a technical system adequately deal with long utterances produced by the user?

Image 1: Image of the Embodied Conversational Agent BILLIE and exemplary appointments entered in the calendar

Research from interactional linguistics investigates natural language in Human-Human-Interaction (HHI) and studies show that utterances are not long from the outset but emerge successively step by step [4]. Due to this additive processual production, the units of speech are called (turn) increments. Research shows that turn increments also occur in HAI within specific interactional contexts. Moreover, it could be shown that, and by which means, the system contributes to turn production: when no uptake by the system is perceivable, users tend to produce turn increments [10, 2]. As shown by Cyra and Pitsch [2] for HAI, these turn increments stand for different forms of actions performed by the users, such as continued action (i.e. adding more information to an initial increment), parallel action (i.e. stating information for e.g. a successive appointment even though a previous one is not completed) and separate action (i.e. actions that lie beyond the system's task domain, like commenting on or assessing prior actions). To provide conditions under which a technical system could work best, it might be good to avoid those actions that cause problems for subsequent actions of the system, such as separate actions. But once users successively produce emerging long utterances, an autonomous system needs strategies to handle them.

In some studies, technical or autonomous systems handle turn increments by the strategy of 'barging in' [6], i.e. they


interrupt user speech. This strategy might be perceived as socially problematic or impolite and have a negative impact on (social) acceptance [15]. In contrast, research from HHI suggests that interruptions are quite common under certain conditions: Jefferson [7] shows that in the context of turn taking and overlapping speech of two participants, interruptions occur systematically in specific parts of a turn and are seldom handled as troublesome by participants. Following this, typical points for overlaps and resulting interruptions in HHI are at the beginning or end of an interlocutor's turn [7] and are precisely timed by human interlocutors. Thus, when a technical system is expected to function in a socially acceptable manner, e.g. in collaborative actions such as jointly entering appointments with the user, we might want to carefully design system strategies for turn taking and handling users' long utterances. This is the groundwork for the micro-analysis of two cases of contrary strategies applied by the system, i.e. interrupting to handle a long turn (which might be perceived as socially problematic as stated by [15]) in contrast to a case in which the system (at first) does not interrupt but 'waits' until a long 'turn' is finished.

DATA & METHOD
The analysis is based on a data corpus of interaction between human participants and an assistive system with the task to enter appointments into a virtual calendar presented on a screen with an Embodied Conversational Agent (see Image 1). Overall system development aims at developing an assistive system that might support users with mild cognitive impairments, e.g. mild forms of dementia, in time management and time orientation [17, 1]. The corpus comprises audiovisual data, eye-tracker data, and system log files that report system status changes. Interaction data of 52 participants was gathered (divided evenly in groups of seniors (SEN), persons with mild cognitive impairments (CIM) and a student control group (CTL); for a discussion on possible interactional specificities of the special user groups see [9]). Data processing comprises annotation and transcription of speech following GAT conventions [13] and, where necessary, the annotation of relevant multimodal interactional resources like participants' gaze or posture.

The system was operated by a human wizard, unnoticed by participants. With the Wizard-of-Oz (WOZ) setup [11], the study's underlying aim, with its focus on interactional practices of the user and the system, could be explored without having to deal with misunderstandings caused by the Automatic Speech Recognition (ASR). Furthermore, a WOZ setup makes it possible to inform future system design by analyzing interactional practices and strategies performed by the human wizard [5]. In the given system users had to enter about 10 to 13 appointments. The system was built with different appointment conditions that differed regarding the appointment content (user-initiated appointment entries vs. system-initiated appointment proposals) and anticipated system errors (e.g. misunderstandings, interruptions or repetitions). To operate the system and for uptake of user input the human wizard could choose from various verbal resources, like verbal rephrasing realized by the ECA partially linked to visual representations in the calendar (for details see [2]). Except where interruption conditions apply, wizards could operate the system flexibly regarding timing or overlaps and were instructed to spontaneously decide on uptake and verbal resources provided via the WOZ GUI.

A detailed micro-analysis of interaction allows us to precisely describe interactional practices of participants while applying contrary strategies to deal with long utterances. The analytical method is informed by Conversation Analysis (CA) and is based on a multimodal perspective on interaction [14]. By following this approach, we focus on the sequentiality of interaction and the member's perspective: We describe how interaction evolves step by step and reconstruct how participants subsequently react upon each other's multimodal utterances. Likewise, this approach allows us to carve out what contributes to a user's social acceptance as displayed by participants themselves (in contrast to a researcher's external perspective or pre-defined categories).

The following cases are taken from interactions of two senior participants (SEN) with two contrastive system strategies to handle user utterances. Case 1 shows a strategy of 'wait-and-see' and interruption, case 2 illustrates an 'interruptions-only' strategy.

CASE 1: INTERRUPTION AND UPTAKE OF RELEVANT USER INFORMATION AFTER 'WAIT-AND-SEE'
The first case analysis (Transcript 1) shows how the technical system, i.e. the wizard operating the Embodied Conversational Agent (E), first contributes to the emergence of a long user utterance through its own actions by applying a 'wait-and-see' strategy. Subsequently, the system needs to deal with this situation and applies a strategy of interrupting the participant (P) and makes use of an uptake of input produced previously by the user. In this case the wizard could spontaneously manage interaction without special conditions regarding interruption or uptake.

    01 E: WHAT is the next appointment (.)
    02    that i should enTER for you- (.)
    03    mister LASTname,
    04    (0.228)
    05 P: the next appointment would be
    06    tuesday,
    07    (1.552)
    08 P: oh tuesday was already right?
    09    (1.552)
    10 P: no (rather wen-) (.) WEDNESday,
    11    (1.561)
    12 P: wednesday [DEN-]
    13 E:           [then] WEDNESday;
    14 P: wednesday; (0.448) dentist from eight
    15    to nine o'clock;

Transcript 2: WOZ1-SEN-021 / T5 "Dentist - interrupt" with system (E) interruption in lines 12/13

(a) System contributes to the emergence of a long user utterance by 'wait-and-see': E initiates the appointment entry (l. 01-03). After a short pause P tells a parameter for


[DAY] (l. 05/06: "tuesday") that is followed by a turn vacant pause [12] in which no uptake (indicated by the length of the pause in seconds) by E becomes noticeable (l. 07). The system does not show any uptake for another 1.552 seconds even though the user has stated task-relevant information, so this practice might be interpreted as a 'wait-and-see' practice. Reacting to the system's (missing) conduct, P continues his turn with an increment indirectly addressing E and stating that appointments for Tuesday were finished (l. 08: "oh tuesday was already right?"), which is a separate action that might become problematic as it deals with the assessment of prior user activities [2, 4]. Again in the emerging pause, the system shows no uptake (l. 09), so P continues his turn with a concurrent parameter for [DAY] (l. 10 "WEDNESday") followed by a pause (l. 11) and missing uptake by E. Once again, P produces a turn increment repeating the [DAY] and initiating the appointment [TOPIC] (l. 12). This extract shows how the strategy of wait-and-see and missing uptake by the system (l. 07-11) contributes to the production of a turn increment with a length of about 9.641 seconds and concurrent information on the parameter of [DAY] as shown by [2].

(b) System interrupts user through uptake of user input: While P begins to state [TOPIC] as another turn increment, E phrases the corrected parameter for [DAY] previously named in l. 10 and 12 in overlap with P. In contrast to findings in HHI [7], this overlap is not found at a point that is common for overlaps, like turn beginning or turn end. But from the perspective of handling turn increments the practice is effective, as P immediately stops producing the next increment. E's rephrasing contains information that was stated before by P, which we call an uptake of relevant user input. This extract points at the wizard's ability to pick the right, i.e. corrected, information for [DAY], which in the context of autonomous systems might lead to difficulties.

(c) User treats interruption as non-problematic: After E's interruption in overlap, P confirms [DAY] by repeating the [DAY] (l. 14) and after a short pause incrementally adds information on [TOPIC] and [TIME] (l. 14/15) without further reference to the interruption on a verbal or multimodal level. This extract shows how the interruption at a non-typical point in turn production is accepted by P: After being interrupted there is no hint in the observable conduct of the participant that the interruption might be a problem. So, the system's interruption might not show the precise timing found in HHI, but the system, or rather the wizard, picks up relevant information that helps to continue work on the joint task of appointment entry.

CASE 2: IMMEDIATE INTERRUPTION AND NEGLECT OF RELEVANT USER INFO
The second case (Transcript 2) shows how the technical system, i.e. the wizard operating it, immediately – and precisely timed at turn-beginning – interrupts the participant's turn when she starts answering the request for entering an appointment. In this case we see the condition in which the wizard had to interrupt user speech at turn beginning with a repeated request. In contrast to the first case, the user's input is not taken up by the system. This suggests to the user that her input does not seem to be relevant for the system. Interestingly, this appears to result in interactional trouble.

Transcript 1: WOZ-SEN-022 / T10: "Thursday – interrupt": System repeatedly interrupts user (ll. 04, 06, 09) without taking up information earlier requested and given by the user. Color blocks in transcript indicate gaze areas: pink = gaze at calendar; green = gaze at ECA; blue = gaze at table; Brackets around transcript blocks indicate P's posture and show a major posture shift in l. 09

(a) System interrupts user for the first time: The second case also starts with an appointment initiation by E (l. 01/02). P answers after a short pause with information on [DAY] (l. 04 "thursday") and, after a micro-pause, goes on to specify [TIME] of the appointment (l. 04). At the very same moment E starts to repeat the initial request in overlap with P's phrasing of the time (l. 05), neglecting the user input and simultaneous talk. Following Jefferson [7],


overlaps at turn beginning are common in HHI. This results in P interrupting her turn production. This practice of 'interruption' stands in contrast to the 'wait-and-see' practice described in Case 1: The user's further turn increment production is avoided by interrupting here. Besides, we find a specific gaze pattern described by [2] that can be found when trouble occurs in the task of appointment entry: when planning and stating information about the appointment during E's initial request, P's gaze is directed at the calendar (CAL). Shortly after E's interruption P's gaze switches from CAL to E, which was described by [2] as a practice frequently found when interactional trouble occurs.

(b) System repeatedly interrupts user neglecting user input: After being interrupted and asked for an appointment entry a second time, P again states information for [DAY] (l. 07). Again her turn emerges in overlap with E's continuation of the request (l. 08), which from a timing perspective is placed precisely at turn beginning again. P only states information for [DAY] and then breaks her turn off while E finishes the programmed sentence. This extract first shows how P orients her turn production toward pauses for turn-taking (l. 06). It also shows that E's interruption practice is effective regarding turn length, as P even produces a reduced utterance. But when taking into account the multimodal perspective it becomes clear that P's appointment planning process (which is observable by P's gaze directed at CAL) is again interrupted by E's repeated request. The specific gaze pattern (switching between CAL and E) again highlights emerging interactional trouble. When inspecting E's contribution in this context, the repeated request makes it apparent that the user's input is not treated as relevant for the system.

(c) User treats interruptions as problematic: In the next fragment E repeats the appointment request (l. 09) while P starts to lean backwards and withdraws her gaze from the screen to the table. After a micro-pause P starts another attempt for turn production (l. 10) gazing at CAL, but breaks off as E continues its turn in overlap with P. While observing E audio-visually, P projects E's turn end [3] and produces another turn increment in overlap with E, but breaks off. In l. 13, after E finishes its turn, P eventually states information for [DAY] without interruption by E. In the following turn vacant pause of 3.567 seconds P's gaze switches between CAL and E while there is no noticeable uptake by the system. This missing uptake by E eventually contributes to the production of P's turn increment (l. 15).

This extract highlights how interactional trouble occurs. E interrupts P repeatedly in a very short period of time, but at a common point for overlaps in HHI, and neglects P's contributions for appointment entry as well as her attempts for turn-taking. On the other hand, the user has to deal with a non-interruptible system that produces options for turn-taking via micro-pauses but does not stop its own turn. This points to the key element that causes interactional trouble: The system repeatedly fails to take up relevant information that it requested earlier. Instead of taking up user input as described in Case 1, here the system repeatedly goes on repeating the initial request. This practice is treated as troublesome by the participant, which becomes observable by a major posture shift and a gaze pattern that was described as indicating trouble in earlier research [2]. So, trouble becomes apparent in different modalities of P's conduct: a specific gaze pattern indicating trouble, her noticeable change of posture – from leaning forward and gaze directed at CAL (which is P's predominant position and orientation during the 10-minute interaction with the system) to leaning backwards and withdrawing gaze – and repeated attempts for turn taking at specific points typical for HHI. Hence, in contrast to the first case analysis, which showed that interruptions are acceptable even at uncommon points of turn production, this case analysis highlights the necessity to recognize and acknowledge the interlocutor's contributions, i.e. to treat the user's input as relevant.

CONCLUSION
The initial question was to explore how a system could handle long utterances. Based on research from HRI and HHI, the detailed analysis of two cases of interruptions revealed the circumstances and conditions under which two basic strategies of handling long utterances, i.e. turn increments, were marked as (non-)acceptable by the users. First, the analysis of the strategy of wait-and-see, which is overwhelmingly taken as socially acceptable, shows that to wait and see has the risk of contributing to the production of turn increments. As 'waiting' could be perceived by users as missing uptake, this could eventually become problematic for autonomous systems when users produce turn increments to handle missing uptake. The analysis of the strategy of interrupting user speech, which in contrast is taken as socially problematic, points at diverse conditions and interactional contexts in which interruptions might take place. Analysis shows that interrupting speech per se or precise timing cannot be taken as the criterion for social acceptance, as displayed by the human participants. Interestingly, it could be shown that the key element for acceptance of interruptions to handle turn increments is the proper uptake of user input and its subsequent treatment as relevant, and not a specific interruption strategy and timing.

After this exploratory analysis the next steps are to analyze whether the findings regarding strategies to handle turn increments and their acceptance by users can also be found in the other user groups. Then a corpus analysis of management of turn increments with focus on interruptions and uptake, and a quantification of the described categories to make predictions on the frequency of these strategies, will follow. Eventually this could be the basis for implementing the described strategies into an autonomous system to handle turn increments with specific strategies that bypass timing issues and focus on uptake of (relevant) information.

ACKNOWLEDGEMENTS
This research was supported by the German Federal Ministry of Education and Research (BMBF) in the project KOMPASS. We would like to thank the participants.
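The uptake-focused strategy proposed in the conclusion can be illustrated with a minimal Python sketch. It is not the system used in the study, which was operated by a human wizard; the `Increment` structure, the hand-made slot annotations, the action labels and the function names are all assumptions introduced here for illustration. The idea is only that the system keeps the most recent value the user stated for each parameter, so that a self-correction such as Tuesday to Wednesday in Case 1 is taken up, and that it falls back to repeating the request only when no task-relevant input is available.

```python
from dataclasses import dataclass, field

SEPARATE = "separate"  # hypothetical label for increments outside the task domain


@dataclass
class Increment:
    """One turn increment and the task slots it mentions (hypothetical structure)."""
    text: str
    slots: dict = field(default_factory=dict)  # e.g. {"DAY": "wednesday"}
    action: str = "continued"                  # "continued", "parallel" or "separate"


def latest_slot_values(increments):
    """Keep the most recent value per slot, so a self-correction overrides earlier input."""
    latest = {}
    for inc in increments:
        if inc.action == SEPARATE:
            continue  # comments or assessments are not used for slot filling
        latest.update(inc.slots)
    return latest


def uptake_or_repeat(increments, request):
    """Rephrase what the user has stated if anything is available, else repeat the request."""
    slots = latest_slot_values(increments)
    if slots:
        return "then " + " ".join(slots.values())
    return request


# Case 1, lines 05-12, with slot annotations added by hand for illustration.
case1 = [
    Increment("the next appointment would be tuesday,", {"DAY": "tuesday"}),
    Increment("oh tuesday was already right?", action=SEPARATE),
    Increment("no (rather wen-) (.) WEDNESday,", {"DAY": "wednesday"}),
    Increment("wednesday [DEN-]", {"DAY": "wednesday"}),
]
print(uptake_or_repeat(case1, "what is the next appointment?"))  # prints "then wednesday"
```

Applied to the hand-annotated increments of Case 1, the sketch returns a rephrasing of the corrected [DAY], which mirrors the wizard's "then WEDNESday" in l. 13. How slot values and action types would be recognized automatically is left open here.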


REFERENCES
1. Katharina Cyra, Antje Amrhein and Karola Pitsch. 2016. Fallstudien zur Alltagsrelevanz von Zeit- und Kalenderkonzepten. In Mensch und Computer (MuC) 2016. 1-5.
2. Katharina Cyra and Karola Pitsch. 2017. Dealing with 'Long Turns' Produced by Users of an Assistive System: How Missing Uptake and Recipiency Lead to Turn Increments. Accepted for RO-MAN 2017.
3. Jan-Peter De Ruiter, Holger Mitterer and Nick J. Enfield. 2006. Projecting the end of a speaker's turn: A cognitive cornerstone of conversation. In Language, 82(3), 515-535.
4. Cecilia Ford, Barbara Fox and Sandra A. Thompson. 2002. Constituency and the grammar of turn increments. In The language of turn and sequence, Cecilia Ford, Barbara Fox and Sandra A. Thompson (eds.). Oxford University Press, 14-38.
5. Raphaela Gehle, Karola Pitsch, Timo Dankert and Sebastian Wrede. 2017. How to Open an Interaction Between Robot and Museum Visitor?: Strategies to Establish a Focused Encounter in HRI. In Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, 187-195.
6. Fabrizio Ghigi, Maxine Eskenazi, M. Ines Torres and Sungjin Lee. 2014. Incremental dialog processing in a task-oriented dialog. In Fifteenth Annual Conference of the International Speech Communication Association, 308-312.
7. Gail Jefferson. 1984. Notes on some orderlinesses of overlap onset. Discourse analysis and natural rhetoric, 500: 11-38.
8. Manja Lohse, Britta Wrede and Lars Schillingmann. 2013. Enabling robots to make use of the structure of human actions - a user study employing Acoustic Packaging. In RO-MAN, 2013 IEEE. 490-495.
9. Christiane Opfermann, Karola Pitsch, Ramin Yaghoubzadeh and Stefan Kopp. 2017. The Communicative Activity of 'Making Suggestions' as an Interactional Process: Towards a Dialog Model for HAI. Submitted for HAI 2017. 1-10.
10. Karola Pitsch, Ramin Yaghoubzadeh and Stefan Kopp. 2015. Entering Appointments: Flexibility and the Need for Structure. In Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology GSCL 2015. 140-141.
11. Laurel D. Riek. 2012. Wizard of oz studies in hri: a systematic review and new reporting guidelines. Journal of Human-Robot Interaction, 1(1): 119-136.
12. Reinhold Schmitt. 2004. Die Gesprächspause: Verbale "Auszeiten" aus multi-modaler Perspektive. In Deutsche Sprache 32(1): 56-84.
13. Margret Selting, Peter Auer, Dagmar Barth-Weingarten, Jörg Bergmann, Pia Bergmann, Karin Birkner, Elizabeth Couper-Kuhlen, Arnulf Deppermann, Peter Gilles, Susanne Günthner et al. 2011. A system for transcribing talk-in-interaction: GAT 2. Gesprächsforschung - Online-Zeitschrift zur verbalen Interaktion (12): 1-51.
14. Jack Sidnell and Tanya Stivers (eds.). 2012. The handbook of conversation analysis (Vol. 121). John Wiley & Sons.
15. Mark Ter Maat, Khiet P. Truong and Dirk Heylen. 2011. How agents' turn-taking strategies influence impressions and response behaviors. In Presence: Teleoperators and Virtual Environments, 20(5): 412-430.
16. Anna-Lisa Vollmer, Manuel Mühlig, Jochen J. Steil, Karola Pitsch, Jannik Fritsch, Katharina J. Rohlfing and Britta Wrede. 2014. Robots show us how to teach them: Feedback from robots shapes tutoring behavior during action learning. PloS one, 9(3): e91349.
17. Ramin Yaghoubzadeh and Stefan Kopp. 2016. Towards graceful turn management in human-agent interaction for people with cognitive impairments. In SLPAT 2016. 26-31.
