
CHI 2008 Proceedings · Special Interest Groups April 5-10, 2008 · Florence, Italy

Vocal Interaction
Sri H Kurniawan
Department of Computer Engineering
University of California Santa Cruz
SOE-3, 1156 High Street
Santa Cruz CA 95064, USA
srikur@soe.ucsc.edu

Adam J Sporka
Dipartimento di Ingegneria e Scienza dell’Informazione
Università degli Studi di Trento
Via Sommarive 14
38100 Povo (TN), Italy
adam.sporka@disi.unitn.it

Abstract
Vocal interaction research has slowly been gaining popularity in the mainstream HCI, assistive technology, arts, and game development communities. One main reason for the uptake of this interaction style is its potential to exploit one of the most natural means of expression: human vocalizations, speech, and vocal gestures. This SIG meeting has three purposes: to communicate the results of the CHI 2007 workshop on vocal interaction to interested CHI attendees; to sketch a research agenda on emerging vocal interaction styles and their implications for the design of interactive systems; and to bring together the communities of researchers and practitioners who are addressing this topic.

ACM Classification Keywords
H.1.2 User/Machine Systems: Human factors, Software psychology; H.5.2 User Interfaces: Theory and methods, Voice I/O.

General Terms
Human factors, Theory, Design

Keywords
Speech interaction, vocal interaction, interaction style,
human voice, TTS, ASR
Copyright is held by the author/owner(s).
CHI 2008, April 5 – April 10, 2008, Florence, Italy.
ACM 978-1-60558-012-8/08/04.

Introduction
Increasingly, the quality of communication between interactive systems and their users is determined by


the effectiveness of the exchange of information between these two parties. Users and interactive systems need faster, more natural, and more convenient means of exchanging information. On the user’s side, interactive system technology is constrained by the nature of human functional abilities. The challenge is to design devices and types of dialogue that better fit and exploit the communication-relevant characteristics of humans [7]. Some of these characteristics, e.g. those underlying voice-, gesture- and gaze-based interaction [9], have been discussed extensively in the multimedia and pervasive computing literature. This SIG meeting will look specifically at voice-based interaction.

In the past, voice-based interaction was synonymous with speech-based interaction, and a significant body of literature reports varying degrees of success with speech synthesizers and recognizers. In the late 1980s, two methods of non-verbal output emerged: earcons and auditory icons [12]. More recently, work on spatial sound has also been reported in various domains [3]. The use of non-verbal vocalizations as input was less common in the past and has only recently received more attention. Research on vocal interaction is currently gaining popularity in domains as diverse as entertainment [5], assistive technology [2, 10, 11], and the visual and performing arts [1, 13]. Whilst vocal interaction is a promising interaction style, many issues need to be discussed before it can be successfully integrated into usable and acceptable interactive systems.

Issues
This SIG meeting intends to investigate vocal interaction as an emerging interaction style from several perspectives and with regard to different contexts of use and users. The topic is likely to be of interest to a range of research streams within HCI, including games, assistive technologies, ubiquitous computing, ambient intelligence, multimedia, virtual/augmented reality, and multimodal interaction.

The purpose of this interdisciplinary SIG meeting is to bring together researchers, designers, developers, and user groups with experience, interest, and knowledge in vocal interaction. We expect to generate a discussion on how vocal interaction is positioned among other, more established interaction styles. We also expect to discuss the implications of this interaction style for interactive system design and, more generally, for experience design.

The larger questions and issues we hope to address during the SIG meeting are the following:

• What is the state of the art in vocal interaction research?
• In what contexts of use is vocal interaction more or less appropriate?
• What issues need to be considered when designing, evaluating, using, and learning vocal interaction systems?

In particular, we intend to focus on the following topics:

Advantages and challenges of voice versus other modalities. We will discuss the extent to which vocal interaction is beneficial in cases of physiological or situation-induced disability, such as hands-busy situations [12]. It is also necessary to assess the problems of social acceptability and of possible ambient noise.


Application domains. Vocal interaction has been used to represent a virtual input device in numerous applications. These include:

• assistive systems (emulation of the mouse [2, 10] and keyboard [11]);
• data entry (including non-verbal dictation [8]);
• game control [5];
• music training [8, 13];
• multimodal interaction enhancement;
• speech and language therapy;
• voice care;
• speech disability compensation techniques.

It will also be interesting to discuss the use of vocal interaction in art. There it may serve both as a tool for artistic expression [1] and as a means of controlling interactive installations.

Application platforms. We may discuss the technical requirements for supporting vocal interaction and its use on different platforms, such as desktop PCs, PDAs, or telephony services. Vocal interaction also has potential in new interaction paradigms such as ubiquitous computing and ambient intelligence. However, several issues need to be considered. One is whether the current mobile infrastructure can support vocal interaction. Another is whether current image-capturing equipment and processing are good enough to collect and interpret vocal and multimodal input.

Human factors. It is necessary to identify both the physiological and the psychological limitations of vocal interaction. These include, in particular, fatigue and precision when voice is used for input, and privacy when it is used for output. It is also important to understand cultural and language differences in vocal interaction. We also need to identify possible problems with the cultural interpretation of various vocal features.

Pre and Post SIG Meeting Activities
Summaries of SIG discussions will be made available at http://vocal-input.org/, the website we used for the CHI 2007 workshop. SIG discussions will begin before and continue after the SIG meeting via a wiki devoted to the topic of vocal interaction, available at http://vocal-input.org/wiki.

SIG Meeting Format
• Introduction of the SIG goals and the participants. (10 minutes).
• A short presentation of the discussion results of the CHI 2007 workshop we organized, titled “Striking a C[h]ord: Vocal Interaction in Assistive Technologies, Games, and More”. This is followed by an overview of the upcoming special issue of the Universal Access in the Information Society (UAIS) journal, titled “Vocal Interaction: Beyond Traditional Automatic Speech Recognition”, due to be published in 2008. (10 minutes).
• Small group discussions, grouped either by application domain (e.g. assistive technology, entertainment) or by background (e.g. more technically oriented or more socially oriented). Throughout the discussion sessions, participants will be invited to take notes and record comments, ideas, and issues on Post-it notes. These will be attached to A0-sized boards placed on the walls of the SIG meeting room and will later be posted on the website. (40 minutes).
• Reporting of small group discussion results to the SIG meeting, followed by further discussion if necessary. (20 minutes).


• Discussion of future plans. As the SIG meeting aims to sketch a research agenda on the topic of vocal interaction, the follow-up activities will be geared toward building a research community in this area. We have started this initiative by building a mailing list of the CHI 2007 workshop participants and the contributors to the special issue of the UAIS journal. We will also ask whether there is a need to provide full access to the working papers, tools, and deliverables of related projects on the website. (10 minutes).

Acknowledgment
We would like to thank Constantine Stephanidis for shepherding the submission process of the special issue of the Universal Access in the Information Society journal. We also thank Susumu Harada for co-organizing the CHI 2007 workshop on vocal interaction.

Adam Sporka’s research is currently supported by a Marie Curie Excellence Grant within the project ADAMACH (contract No. 022593).

References
[1] Al-Hashimi, S. Blowtter: A Voice-Controlled Plotter. To appear in Proc. HCI 2006 Engage.
[2] Bilmes, J., Malkin, J., Li, X., Harada, S., Kilanski, K., Kirchhoff, K., Wright, R., Subramanya, A., Landay, J., Dowden, P., Chizeck, H. The Vocal Joystick. In Proc. IEEE Intl Conf. on Acoustics, Speech and Signal Processing, Toulouse, France, IEEE (2006).
[3] Deutscher, M., Hoskinson, R., Takahashi, S., Fels, S. Echology: an interactive spatial sound and video artwork. In Proc. MULTIMEDIA 2005, ACM Press (2005), 937–945.
[4] Goto, M., Itou, K., Kobayashi, T. Speech interface exploiting intentionally-controlled nonverbal speech information. In Proc. UIST 2005, ACM Press (2005), 35–36.
[5] Hamalainen, P., Maki-Patola, T., Pulkki, V., Airas, M. Musical computer games played by singing. In Proc. 7th Intl Conf. on Digital Audio Effects, Naples, Italy (2004), 367–371.
[6] Igarashi, T., Hughes, J.F. Voice as sound: using non-verbal voice input for interactive control. In Proc. UIST 2001, ACM Press (2001), 155–156.
[7] Jacob, R.J.K., Leggett, J.J., Myers, B.A., Pausch, R. Interaction styles and input/output devices. Behaviour & Information Technology 12, 2 (1993), 69–79.
[8] Nakano, T., Goto, M., Ogata, J., Hiraga, Y. Voice drummer: a music notation interface of drum sounds using voice percussion input. In Proc. UIST 2005, ACM Press (2005), 49–50.
[9] Maglio, P., Matlock, T., Campbell, C., Zhai, S., Smith, B.A. Gaze and speech in attentive user interfaces. In Proc. ICMI 2000, ACM Press (2000), 1–7.
[10] Sporka, A.J., Kurniawan, S.H., Slavik, P. Whistling user interface (U3I). In Proc. UI4All 2004, LNCS 3196, Springer-Verlag (2004), 472–478.
[11] Sporka, A.J., Kurniawan, S.H., Slavik, P. Non-speech operated emulation of keyboard. In Clarkson, J., Langdon, P., Robinson, P. (eds.) Designing Accessible Technology, Springer-Verlag (2006), 145–154.
[12] Vilimek, R., Hempel, T. Effects of speech and non-speech sounds on short-term memory and possible implications for in-vehicle use. In Proc. ICAD 2005.
[13] Welch, G.F., Howard, D.M., Rush, C. Real-time visual feedback in the development of vocal pitch accuracy in singing. Psychology of Music 17, 2, 146–157.
