
J Multimodal User Interfaces (2012) 5:211–219

DOI 10.1007/s12193-011-0084-2

ORIGINAL PAPER

An embodied music cognition approach to multilevel interactive sonification

Nuno Diniz · Pieter Coussement · Alexander Deweppe · Michiel Demey · Marc Leman

Received: 17 January 2011 / Accepted: 12 December 2011 / Published online: 11 January 2012
© OpenInterface Association 2011

Abstract In this paper, a new conceptual framework and related implementation for interactive sonification are introduced. The conceptual framework consists of a combination of three components, namely gestalt-based electroacoustic composition techniques (sound), user- and body-centered spatial exploration (body), and corporeal mediation technology (tools), which are brought together within an existing paradigm of embodied music cognition. The implementation of the conceptual framework is based on an iterative process that involves the development of several use cases. Through this methodology, it is possible to investigate new approaches for structuring and interactively exploring multivariable data through sound.

Keywords Interaction · Sonification · Embodiment · Electroacoustic · Framework

Electronic supplementary material The online version of this article (doi:10.1007/s12193-011-0084-2) contains supplementary material, which is available to authorized users.

N. Diniz (✉) · P. Coussement · A. Deweppe · M. Demey · M. Leman
Institute of Psychoacoustics and Electronic Music, Ghent University, Ghent, Belgium
e-mail: nuno.diniz@ugent.be

1 Introduction

Interactive sonification has been proposed for displaying multivariable information through non-speech, sound-based communication. However, the search for efficient methods to address this remains an open issue [1]. For illustrative purposes, we consider a scenario in which a user has to analyze the simultaneous variation of ten stocks in real time and needs to make decisions on the low level (i.e. the individual stock, as in buying or selling) and on a higher level (i.e. the combined behavior of sets of stocks as a reflection of the company's assets). When sound is used as a way to investigate each individual stock, it is clear that the simultaneous sounding of ten sound streams may be rather overwhelming if no internal structure and emerging properties of sound fusion are taken into account. Overwhelming sounds will likely prevent the user from efficiently parsing, analyzing and reacting in real time to the information that is conveyed. However, when electroacoustic composition techniques that deal with combined sounds are incorporated, the data display may become manageable and more accessible. The main issues can thus be stated as follows:

– How can multiple levels of sonification be made both perceivable and meaningful, in such a way that their interrelated nature can be used to best advantage?
– How can sonified information be made available in such a way that it can be manipulated and apprehended in a straightforward manner?

The traditional approach to sonification is based on a one-way flow of information from system to user. As shown in Fig. 1, this approach consists of a model that first performs feature extraction on the data, and then applies a fixed mapping strategy for sonification. Prototype examples of this approach are Auditory Icons [2] and Earcons [3].

The one-way approach generally entails a high degree of abstraction regarding the static data mining processes intervening between raw data and icons. Other techniques are often closer to the data, such as parameter mapping (e.g. audification) [4]. However, these approaches seldom allow the user to address the data from multiple perspectives, such as sound stream grouping and global versus local behavior analysis.
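As a minimal illustration of this one-way model (our sketch, not the implementation of [2–4]; the class name, value ranges and mapping constants are invented), the pipeline of Fig. 1 reduces to a feature extraction step followed by a fixed mapping to a sound parameter:

// Hypothetical sketch of one-way sonification: a fixed mapping from
// an extracted feature to a sound parameter, with no user feedback path.
final class OneWaySonification {
    // Feature extraction: here simply the normalized value of one variable.
    static double extractFeature(double rawValue, double min, double max) {
        return (rawValue - min) / (max - min); // 0.0 .. 1.0
    }

    // Fixed mapping strategy: normalized feature -> pitch in Hz.
    static double mapToPitchHz(double feature) {
        double midi = 48.0 + 24.0 * feature;              // two octaves above C3
        return 440.0 * Math.pow(2.0, (midi - 69.0) / 12.0);
    }

    public static void main(String[] args) {
        double stockPrice = 72.5;                         // one of the ten streams
        double f = extractFeature(stockPrice, 0.0, 100.0);
        System.out.printf("pitch = %.1f Hz%n", mapToPitchHz(f));
        // The mapping is fixed at design time: the listener cannot regroup
        // the streams or change the level at which they are displayed.
    }
}

Because the mapping function is fixed at design time, the listener has no way to regroup the ten streams of the stock scenario or to change the level at which they are displayed; this is precisely the limitation that the two-way approach discussed below addresses.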

Fig. 1 One-way sonification: the model performs feature extraction on the data, and uses a fixed mapping strategy for sonification

Fig. 2 Two-way sonification, or interactive sonification: the model performs feature extraction on the data, but uses a modifiable mapping strategy for sonification based on embodied cognition

Given the limitations of the one-way approach, the question can be raised whether a two-way approach can be developed in which the user is able to explore the data interactively. This means that the user intervenes between model and sonification. Hence, this requires a more flexible approach to data encoding and exploration techniques in order to cope with the emergent possibilities of multivariate data. In fact, the techniques must possess a dynamic nature, so that they can mutually influence and redefine each other. This approach to sonification implies a shared responsibility between preconfigured relations on the one hand, and the user's expertise and intuition on the other.

In this paper, we propose a methodology for the sonification of multivariate data based on the idea of interactive sonification [5]. This involves the display of information on the basis of interaction and scope. Interaction implies that the perceptual viewpoint of the user can be controlled through his movement, while scope implies the possibility of zooming in and out for a discrimination or fusion of sonified levels. As we will discuss in the following section, these concepts are closely linked with electroacoustic gestalt-based composition and theories of musical embodiment.

2 Theoretical background

In the two-way approach to sonification (Fig. 2), the user and the sonification system may affect each other on the basis of interaction. To handle the complexity of such an interaction, it is necessary to further decompose the notion of interactivity into different elements. In what follows, we present a perspective on interactive sonification based on three components, namely (1) gestalt-based electroacoustic sound generation (sound), (2) body-centered spatial exploration (body), and (3) corporeal mediation technology (tools). Together they form the pillars of an integrated approach to sonification, which is based on the embodied music cognition paradigm described in [6].

2.1 Gestalt-based electroacoustic sound generation

Sound generation is here conceived from the viewpoint of gestalt theory. Gestalt theory is strongly determined by principles that affect the meaning of sound streams in such a way that the whole provides a level of information that is different from the sum of its parts. We aim at controlling sonification by exploiting principles of gestalt theory as guidelines for multi-level meaning generation. Such an approach considers both analysis and synthesis, as well as segregation and integration. Analysis and synthesis account for the decomposition of sounds into frequencies and the subsequent integration into pitches and chords [7]. Segregation and integration account for the grouping of time-varying patterns, depending on intervals of time and pitch [8].
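To make the segregation/integration principle concrete, the following sketch (ours; the thresholds are arbitrary and not taken from [7, 8]) groups a sequence of tones into streams using two gestalt-inspired proximity criteria, one in onset time and one in pitch:

// Hypothetical sketch: group tones into auditory streams by proximity in
// onset time and pitch, in the spirit of gestalt-based segregation [8].
import java.util.ArrayList;
import java.util.List;

final class StreamSegregation {
    record Tone(double onsetSec, double pitchMidi) {}

    static List<List<Tone>> segregate(List<Tone> tones,
                                      double maxGapSec, double maxJumpSemitones) {
        List<List<Tone>> streams = new ArrayList<>();
        for (Tone t : tones) {
            List<Tone> best = null;
            for (List<Tone> s : streams) {
                Tone last = s.get(s.size() - 1);
                // Integration: close in time AND in pitch -> same stream.
                if (t.onsetSec() - last.onsetSec() <= maxGapSec
                        && Math.abs(t.pitchMidi() - last.pitchMidi()) <= maxJumpSemitones) {
                    best = s;
                    break;
                }
            }
            if (best == null) {            // Segregation: start a new stream.
                best = new ArrayList<>();
                streams.add(best);
            }
            best.add(t);
        }
        return streams;
    }

    public static void main(String[] args) {
        List<Tone> melody = List.of(new Tone(0.0, 60), new Tone(0.2, 62),
                                    new Tone(0.4, 84), new Tone(0.6, 63));
        // The large jump to MIDI 84 is grouped as a second stream.
        System.out.println(segregate(melody, 0.5, 5.0).size() + " streams");
    }
}

A large pitch jump starts a new stream even when the tones are contiguous in time, which is exactly the kind of emergent grouping a multivariable sonification has to anticipate when several variables sound at once.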
In our view, these concepts need to be implemented using a language that incorporates these elements in a culturally aware context. Our focus therefore turns to the electroacoustic composition domain, with its internal and external contextualization mechanisms [9]. Given its wide flexibility regarding syntactic representation [10], electroacoustic composition theory and practice may help to make a given dataset more accessible and easier to mold according to the user's inspection goals.

The search for successful scope transposition techniques in sound-based communication has always been a central concern in this art form.
Several technical processes concerned with the relationship between singularity and regularity of events [11] are addressed in Sound Object Theory [12], the Concept of Unity [13], and Moment Form or formula-based composition [14], as discussed in [15]. Consequently, they might encapsulate guidelines that can be of service in functional sound-based communication [16]. For example, the dialogue condition imposed on the sound object and the enclosing structure holds a dynamic perspective shift, which reinforces the relationship between the two concepts. As a result, this unifying concept connecting sound object to enclosing structure is taken as a design directive, as in [17], for the manipulation of multiple levels of complexity.

Furthermore, electroacoustic practices also address structure in time. As an example, Stockhausen's take on Information Theory [18] focuses on the behavior of sound objects through time, which is in close affinity with the principles of similarity, opposition and belongingness in gestalt theory [19]. The sequence in which the auditory stimuli are presented is, in Stockhausen's view, crucial for perceiving the musical discourse. Among others, Mikrophonie I is one of the works where the main structural strategy is based on the amount of identity variation of a given gestalt segment or "moment" in relation to the previous ones [20]. Such considerations are an imperative requirement, given the human auditory system's idiosyncrasies (i.e. the precedence of relative over absolute relations in parameter discrimination).

These concepts from electroacoustic practice conform to the flexibility requirements related to two-way sonification. Furthermore, in our approach, it is assumed that the explorer of the data controls these changes through body movement. As an extension of the referenced guidelines of Schaeffer, the concepts of gesture and texture in spectromorphology theory [9] are included as a methodological base concerning sonic attribution, deployment and articulation. In his classification, Smalley addresses the consequences for the musical discourse emerging from the loss of physical character in sound objects. Such a loss undermines the bases upon which the internal relations of sonic elements are perceived and reasoned upon. As such, gesture can be viewed as a wrapping mechanism for scope transposition. It provides a translation device between the physical user and the sonic texture (or spectral identity). Furthermore, it allows the perceptual grouping of individual sound objects. As a result, multivariate data access should include the representation of the variables' progression as an energy-motion trajectory of gesture.

To summarize, the aim is to transpose these compositional strategies to the interactive sonification domain and to apply the relationships between material and structure to the micro and macro sound levels of data presentation. As a result, functional contexts are generated by data-dependent hierarchical levels that still preserve their informational identity and significance.

2.2 Body-centered spatial exploration

Interaction with sound assumes a user dealing with information resources from the viewpoint of his/her own actions [21]. Accordingly, we assume that the user's conception of sonification proceeds from an attribution of action-relevant values to the sonification, in such a way that the information resources may lead to meaningful experiences. Given this action-based viewpoint, it is important to keep in mind that natural ways of processing multi-level information in our environment are often based on the physical distance between the user and the resources. This physical distance defines the scope in which one is dealing with these resources. Therefore, we can say that the action-value attributed to a resource depends on the distance, grounded in actions that imply zooming in or out, and on experience.

For the purpose of sonification, we consider the different spaces that surround the user as subjective body-centered spaces. These define the scope for a sonic exploration of the information resources [22]. Three levels may be considered here. A first level is the pericutaneous space, directly surrounding the body, which is defined as the space that provides fine-grained control over interface technologies. A second level is the peri-personal space, which is defined as the area that immediately surrounds a person and is reachable by the limbs of the subject. A third level is the extra-personal space, which is defined as the space out of the reach of the subject, in reference to the absolute position within an environment. This multilevel spatial awareness is attuned to the hierarchical nature of sound object exploration, in such a way that the spectromorphological gesture contained in the sonification can be transposed from auditory parameters to an exploratory process. By placing the interaction within a context of spatio-temporal representation, it becomes possible to engage in an embodied dialog with the data. The latter is then converted to a human scale, within reach of the user.

In summary, by enabling a configurable location and form representation of the data in space, this methodology invites the user to physically approach the inspection process through a shared space of multilevel interaction [23]. An embodied cognition approach is thus expected to further enable a perceptual link between the data and the semantic high-level representations of the user.
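To suggest how these three scopes could be operationalized in an implementation (a sketch under our own assumptions; the distance thresholds are invented for illustration and do not come from [22]), a tracked position can be classified relative to the user's body:

// Hypothetical sketch: classify a tracked point into body-centered spaces.
// Threshold values are illustrative only.
final class BodyCenteredScope {
    enum Space { PERICUTANEOUS, PERI_PERSONAL, EXTRA_PERSONAL }

    static Space classify(double[] body, double[] point) {
        double dx = point[0] - body[0], dy = point[1] - body[1], dz = point[2] - body[2];
        double d = Math.sqrt(dx * dx + dy * dy + dz * dz); // distance in meters
        if (d < 0.15) return Space.PERICUTANEOUS;  // directly surrounding the body
        if (d < 0.80) return Space.PERI_PERSONAL;  // reachable by the limbs
        return Space.EXTRA_PERSONAL;               // out of reach of the subject
    }

    public static void main(String[] args) {
        System.out.println(classify(new double[] {0, 0, 0}, new double[] {0.5, 0.2, 0}));
        // -> PERI_PERSONAL: the scope in which the sonification would operate
    }
}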
2.3 Corporeal mediation technology

The involvement of the human body in sonification assumes a technology that mediates between body and data. Corporeal mediation technologies are conceived as tools that extend the human body, as a kind of prosthesis, in order to allow the human explorer to mentally access the digital realm in which the data resources are assessed. These tools can provide guidelines for deploying an interface that integrates the variable resolution of the human motor system and sound objects [24]. Our approach is based on expanding the mediating role of the body through interaction with an immersive 3D environment, using virtual entities [25]. The present framework's ongoing implementation [15] is a consequence of the need for generic data sonification tools for both research and applications [4]. It aims at providing a software infrastructure for integrating the base concepts discussed above, while addressing issues concerning portability, flexibility and integrability.

Fig. 3 An overview of the technological implementation
A Java framework has been developed, based on a functional division of the multimodal interaction realm into individual branches around a state representation. In this framework, the concrete implementation of a virtual world, its visual and auditory representations, and the human interfaces can be defined according to the desired performance, access or functional needs of the intended use cases, using external libraries and platforms (e.g. Java 3D, SuperCollider 3, Max/MSP, Ableton Live). An overview of the technological setup is shown in Fig. 3.

Since the framework is intended to provide various solutions depending on particular research and/or application needs, the following technical description is restricted to the main use case discussed in the present paper, namely the dance use case (see Sect. 3.2):

– Optitrack motion capture system — This system is used to capture the 3D position and orientation of IR-reflective marker sets, which are typically attached to the human body. The data is transmitted through the NatNet protocol via the Arena software and converted to OSC by a custom driver.
– Micro-controller — An ICubeX system is used for connecting a bend sensor in order to control the radius of the virtual inspection tool. The data is received in a Max/MSP instance through Bluetooth and sent in the OSC protocol to the Java framework instance.
– Java framework — The Java-based framework constitutes the core of the system, concerning virtual scene state representation, monitoring and behavior triggering. All the above OSC-formatted information is gathered in the virtual core and used to place and configure the virtual objects in the scene. Based on this information, collision detection and additional information (for example, the distance between the virtual inspection tool and an activated sound object, used for sound amplitude and reverb parameterization; a minimal sketch of this computation is given after this list) are calculated and sent in the OSC format to Max/MSP. Also, through the Java 3D implementation, a visual core is responsible for rendering the visual projections and for the file logging of all relevant data that can be used for offline analysis (e.g. position of virtual objects; velocity associated with virtual objects at creation time; distance between virtual objects; time-stamped collision detection and activation of sonification elements, etc.).
– Max/MSP — The Max/MSP environment is responsible for the sound synthesis and spatialization (together with Ableton Live, through Max for Live), based on the data transmitted by the Java framework.
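The following sketch illustrates the collision and parameterization step described in the Java framework item above, under hypothetical class names of our own: a spherical inspection tool activates the sound objects it intersects, and the remaining distance is mapped to amplitude and reverb values which, in the real system, would be sent to Max/MSP over OSC (here they are simply printed):

// Hypothetical sketch of the collision/parameterization step: a spherical
// inspection tool activates the sound objects it intersects, and the
// distance to each activated object drives amplitude and reverb parameters.
final class InspectionCollision {
    record Vec3(double x, double y, double z) {
        double distanceTo(Vec3 o) {
            double dx = x - o.x, dy = y - o.y, dz = z - o.z;
            return Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
    }

    public static void main(String[] args) {
        Vec3 tool = new Vec3(0.0, 1.2, 0.4);   // virtual microphone position
        double radius = 0.5;                   // scope, e.g. set by the bend sensor
        Vec3[] objects = { new Vec3(0.1, 1.0, 0.5), new Vec3(2.0, 1.0, 0.0) };

        for (Vec3 obj : objects) {
            double d = tool.distanceTo(obj);
            if (d <= radius) {                 // collision: the object is activated
                double amplitude = 1.0 - d / radius;   // closer -> louder
                double reverbMix = d / radius;         // farther -> wetter
                // In the real system these values would be sent to Max/MSP via OSC.
                System.out.printf("activated: amp=%.2f reverb=%.2f%n", amplitude, reverbMix);
            }
        }
    }
}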
The above-mentioned portability and flexibility requirements are met through the use of Java technology in the framework's core implementation for state management and monitoring: portability is achieved through multiple-platform support, and flexibility through the incorporation of new functionality and rapid prototyping by means of object-oriented component hierarchization. In the dance use case, an Optitrack system is used because of the specific need for capturing 3D position in a large interaction space. It is completely detached from the framework's scene representation, which processes 3D positions independently of the device through which they are collected. As an example, for a given "laptop-centered" use case, a set of three optical cameras might be sufficient for 3D position tracking, and this adaptation would not have any repercussions on the core's state management and monitoring implementation.

Fig. 4 A user exploring two variables' data. The two object arrays each represent the virtual inspection window of one variable. The triangular-shaped objects represent two virtual inspection tools (virtual microphones) operated by the user

To sum up, sonification is conceived from the viewpoint of an interaction that is based on principles coupling perception and action through gestalt-based awareness and body-centered spatialization. Mediation technology is needed for the coupling of these elements, in order to make sure that body movements can naturally be extended and deployed in the digital environment. As such, interactive sonification based on scope variation can be conceived as a disambiguation asset with respect to the variables' behavior and its sonic correspondence. This is achieved through an active molding of the variables' output and the added value resulting from this interaction.

3 Use cases and implementation

In this section, we present two use cases that have been set up in the context of implementing the approach described above. In our approach, the use cases form a core element within an iterative development cycle, in which the interactive system is tested and refined through cycles of user validation. To that aim, interface affordance, hierarchical sound level generation and the related mapping need to be developed and tested in an integrated way. By applying this integration in combination with a user-based, phased validation strategy, we obtain an early detection of preliminary issues. On this basis, it is possible to steer the development of a system combining the three main components of our embodied approach to sonification, namely sound, body, and tool. As pointed out in [26] concerning electroacoustic music analysis in music information retrieval research, there is a need for user-oriented approaches. In what follows, two use cases and their evaluations are described.

3.1 Prototype use case

In the prototype use case, we tested three hierarchical layers of sonic data [15], using the musical concepts of pitch, interval and chord. The pitch mapped data according to a model of a variable's value. The interval represented a numerical relation between two variables' values. The chord represented the occurrence of a given set of relations among the data. Pitches, intervals and chords were accessed through the use of a virtual inspection window and a virtual inspection tool that allowed an interactive inspection. Sound generation was based on Schaeffer's concept of sound objects and on the structural relationship between these objects. The virtual inspection window consists of a finite set of virtual objects that represent a time frame of the variable's values. When activated through body movement and collision detection, the current value attributed to a given virtual object is fed to the sonification engine. In this use case, the virtual inspection window is represented by an array of several cubes. The sonification process is assigned to these objects and conveys information about the activated set as a whole, stimulating a perceptual interpolation between the set and the individual nodes. The virtual inspection tool functions as a virtual microphone that enables the activation of the inspection window. This approach was inspired by Stockhausen's microphone use in Mikrophonie I, as well as by Schaeffer's view on the recording process in Concrete Music. It provides a perceptual play with the distance between object and microphone, in terms of a sonic realization related to reverberation and amplitude modulation. The virtual microphone that allows sonic scope variation was visualized as a small pyramid, following the user's hand movements and orientation. An explorative example is illustrated in Fig. 4.

Fig. 5 First phase (left), where a user sets a trail by performing dance movements. Here the blue objects represent the trail defined by the movement of the head of the user. Second phase (right), where another user explores the trail. The objects in red represent the objects in collision with the virtual microphone, represented by the grey sphere. The radius of the sphere can be adjusted by the opening/closing of the elbow of the user

3.2 The dance use case

The dance use case was based on the previously described prototype, taking into account the users' observations and comments during its evaluation sessions [15]. This time, the array is scattered in space, enabling users to vary their inspection in space and time even more. An extra bend sensor and wireless ADC allowed a basic signaling of an open or closed posture. This made it possible for the participants to vary the scope of the virtual microphone by changing their posture from an open (global) viewpoint to a closed (detailed) one. Varying the scope of the virtual microphone made it possible to inspect the individual grains of the sound when the user resides in his/her pericutaneous space, while a decrease of the contraction index into the peri-personal space provides access to the macrostructure of the soundscape. The contraction index, defined in [27], is a time-varying value given by the ratio between the area of the body silhouette and the area of its bounding box.
ette. The sound diffusion of the microstructures was directly
ation in the pitch’s increase. For example, if the velocity
linked to the user’s scope variation approach (the inspection
of the original movement is 3/2 of the maximum velocity
vector), using sonification ideas based on Smalley’s concept
(based on physical constrains) at the instant of creation, the
of spatiomorphology. That means that on a macro level, this
pitch would increase a perfect fifth from the previous one.
scope-driven sound diffusion defines the spectromorphology
This strategy is inspired on Stockhausen’s take on Informa-
of the soundscape. The interactive sonification implies the
tion Theory, as discussed in Sect. 2.
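The perfect-fifth example follows directly from interpreting the velocity ratio as a frequency ratio; a sketch of this conversion (our formulation of the mapping described above):

// Hypothetical sketch: velocity ratio interpreted as a frequency ratio and
// converted to a pitch interval in semitones. A ratio of 3/2 yields
// 12 * log2(1.5) = 7.02 semitones, i.e. a perfect fifth.
final class VelocityToInterval {
    static double semitones(double velocityRatio) {
        return 12.0 * Math.log(velocityRatio) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.printf("interval = %.2f semitones%n", semitones(1.5));
    }
}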
3.3 User evaluations and feedback

In line with our user-oriented development approach, feedback on the users' exploration was recorded and analyzed in order to better understand the possibilities of interactive sound scope variation. The necessity of this level of user involvement also stems from the implied wielding strategies [28]. A range of methods, mainly adopted from the field of HCI usability studies [29, 30], was applied for this purpose [31]. Questionnaires, a focus group discussion and video footage of the interaction sessions were collected. In addition, all movement data of the users tracked by the motion capture system, and all events occurring with the virtual objects, were logged to files for further offline analysis.

In the prototype use case, users were able to perceive and discern the information in multiple levels of sonification [15]. Users found the system responsive and its operation intuitive. Their performance of the appointed tasks improved with the use of different levels of sonification (i.e. pitch, interval and chord). The outcome of the evaluations made at the prototype stage (concerning performance, maneuverability, precision, distinguishability and completeness of the visual and sonification output) showed that some of the most important issues raised addressed the interaction with the virtual objects. The need was expressed to dynamically change the morphology of the virtual inspection tool. Other reported problems pertained to the use of a visual aid for a better perception of the user's own movements. All these issues were dealt with in the implementation of the dance use case.

The dance use case was difficult for most users. Observations of the recorded footage show that the users experienced difficulties in reconstructing the trajectory, although they were able to rapidly locate the beginning, the end and the direction of the trajectory through the sonification. A particular problem arose when the movement of the original dancer revisited a previous spatial location: for users exploring the trajectory, it was hard to disentangle the two events occupying the same place. In future implementations, such difficulty might be avoided by increasing the resolution of the virtual inspection tool and/or by suppressing the generation of virtual path objects when velocity is below a given threshold. The latter would link the creation of virtual objects to the user's energy level. A visualization of an exploration is shown in Fig. 6. In this figure, it is clear that the second user (exploring the choreography of the original dancer) can accurately locate the beginning, turning and end points and correctly reconstruct the direction of the defined path.

Fig. 6 Visualization of movement data recorded by the motion capture system for two users. The trail set out by the first user is visualized by the starred markers, where the starting point is in black and the ending point is in light grey. The trajectory indicated with the dotted markers represents the movement of a second user exploring and recreating the first user's movement

Based on Nielsen's guidelines for early-stage usability evaluations [32], preliminary tests were conducted with five users. According to Nielsen, this number is advisable to keep evaluations cost-effective, and it is sufficient to uncover 85 percent of the usability problems reported in early-stage prototype evaluation.
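The 85 percent figure follows from the problem-discovery model of Nielsen and Landauer [32]. If each evaluator independently uncovers a given usability problem with probability λ, the expected share of the N existing problems found by n evaluators is

found(n) = N · (1 − (1 − λ)^n).

With the commonly cited estimate λ ≈ 0.31, five users give 1 − (1 − 0.31)^5 ≈ 0.84, i.e. roughly 85 percent of the problems.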
Alongside the experiment, the participants were asked to fill out a questionnaire. This generic questionnaire encompassed standard user background information as well as a general appreciation of the interface paradigms and technologies used. The questionnaire was divided into two parts. A first group of questions dealt with the evaluation of the sonic and visual content and the relation between the two. Additionally, this part questioned social interaction (if applicable) and personal assessment of the tool operation. The majority of these questions used a six-point Likert scale. By using an even-numbered Likert scale, which eliminates the middle option, the users were forced to make a non-neutral rating of their level of satisfaction. The second part of the questionnaire contained questions concerning demographic information, (art) education, cultural profile and new technology use.

Concerning the users' background for the described use case, the users were predominantly female (n = 4), with an even age distribution from 18 to 60 years old. All of the users had received higher education, and the majority (n = 4) had a background in artistic education for a duration of over five years. The users reported regularly engaging in cultural events (n = 4 for more than monthly activity). Most of the respondents, however, stated that they had little or no experience with new media demonstrations or exhibitions (n = 4). Regarding experience and leisure-time involvement with digital communication devices (e.g. computer, smartphone, . . . ), most respondents reported manifold and daily usage of the described technologies (n = 2 between 1 and 3 hours and n = 2 over 3 hours per day). The users' opinions on the interface were divided. The identification of the sonic output and the precision of sound control were found to be difficult. However, the applicability of sound to actions was classified as rather suitable (n = 3), and the majority positively classified the aesthetic quality of the sonic output (n = 4). Regarding inter-modal relations, the majority classified the relation between visuals and sounds as clear (n = 3), attributing equal importance to both modalities (n = 4). In an overall personal evaluation regarding tool manipulation, all users rated the functional usefulness between good and very good, and stated that their attention was divided between the use of the new technology and the combination of visuals and sound.

In a post-interaction evaluation, participants were asked about their experiences by rating different features of the sonification by means of a questionnaire. Concerning the sounds that were used, the most noteworthy findings were that participants reported being quite able to discern different sounds, although the majority of participants complained about a lack of sonic control. The aesthetic quality of the sounds used and the correlation between sounds and actions were evaluated as good. Participants were able to correlate movements to the visuals more easily than to the sounds. The level of accuracy in their control of the visuals, both functionally and aesthetically, was perceived to be high. Given the combination of sonic and visual feedback in the use case, most of the participants reported the relation between sonic and visual content to be clear and very complementary. Almost all participants said they had paid equal attention to the sounds although, contrary to the reported poor level of sonic control, a minority of participants said they had paid the most attention to the sonic features of their exploration. The participants' personal appreciation of the aesthetics, the perceived quality of interaction, its functionality and the ergonomics was overall quite positive. In a number of focus groups between and after the interaction sessions, users were asked in group about their opinions concerning the operation of the system. Most of the users found that this use case was an apt means of linking visual and auditory stimuli together. Most participants agreed that the visuals were primarily used to establish the trajectories, whereas the sonification was used to gain a sense of direction (i.e. finding the high and low pitches, the beginning, the end and the course of the trajectory), and this was furthermore reported to be relatively easy. The assessment of single objects was said to be too complex. The confusion stemmed from the fact that most participants were not used to deriving 3D virtual positional information from 2D representations projected on screens. Because of that, some of the participants found that they could better explore the direction of the trail using only the sonification. Although the trajectories were said to be clear enough, isolating discrete pitches through manipulation of the scope was reported to be problematic, although the flexibility of the scope device was applauded. A suggestion was made to shift to a sonification of discrete pitches when the scope of the inspection tool was at its largest, because in the multitude of sonified objects some resolution was said to be lost. A final positive remark, recurrently made during the discussion in all the sessions, was the sense of being completely unhindered by the technology used.

4 Discussion and future work

This paper exemplifies how an iterative development cycle, rooted in a user-centered methodology, can serve as a guide towards the development of an interactive sonification methodology that is fully based on the paradigm of embodied music cognition. Throughout the development of the interactive sonification tool, the design choices have been based on an integration of the three main components: a combination of electroacoustic techniques (sound), embodied space (body), and mediation technologies (tools). By considering the outcome of the user tests and the feedback recorded during the presented development stages, an evolution can be reported over the course of the two use cases. The user evaluation and the incorporation of the feedback provide information about usability issues and their rectification in the subsequent development cycle. Starting with a prototype use case that focused on different levels of investigation of the data (using pitch, interval and chord), the dance use case's emphasis was on space occupation and on how interactive scope variation allowed spatial access to the data. Though the focus of the use cases varied between iterations, features of the first use case were incorporated in the second one. This approach led to a more meaningful way of interacting with multiple levels of sound, and ultimately to a better interactive sonification system.

The use cases reflect a work in progress. The feedback and user tests leave room for improvement, and user-requested design decisions made in previous stages have to be re-evaluated. The concept of scope variation, for example, proved to be too difficult to control, given the operational movements intertwined with natural dance gestures. Given the flexible nature of two-way sonification, the unpredictability of a user-inspired development process, the arbitrariness common in mapping strategies, and the reliability issues still confining the technology [33], rapid usability breakthroughs can hardly be expected within a limited number of iterations; only a limited number of issues can be addressed in each one. Nevertheless, as we reshape and refine the system, we are hopeful about future developments. For example, in the near future we hope to re-evaluate the problem at hand, as we envision a more weighted scope variation through improved posture sensing. In this way, users could listen with their torso to the macrostructure, and reach out with an arm (sound chunks) or hand (sound grains), thus gaining control over multiple microphones while listening to different levels of sonification.

5 Conclusion

A new conceptual framework has been presented, combining gestalt-based electroacoustic composition techniques (sound), user- and body-centered spatial exploration (body), and corporeal mediation technology (tools). The paradigm of embodied music cognition mediated the integration of these components into an implementation of a conceptual framework. By means of an iterative process, involving the development of several use cases, we showed that it is possible to investigate new approaches for structuring and interactively exploring multivariable data through non-speech sound communication and auditory scope variation.

References

1. Flowers JH (2005) Thirteen years of reflection on auditory graphing: promises, pitfalls, and potential new directions. In: International conference on auditory display (ICAD), Limerick, Ireland
2. Gaver WW (1986) Auditory icons: using sound in computer interfaces. Hum-Comput Interact 2:167–177
3. Blattner M, Sumikawa D, Greenberg R (1989) Earcons and icons: their structure and common design principles. Hum-Comput Interact 4:11–44
4. Kramer G, Walker B, Bonebright T, Cook P, Flowers JH, Miner N, Neuhoff J, et al (1997) Sonification report: status of the field and research agenda. Available at: http://dev.icad.org/node/400
5. Hermann T, Hunt A (2005) Guest editors' introduction: an introduction to interactive sonification. IEEE Multimed 12:20–24
6. Leman M (2008) Embodied music cognition and mediation technology. MIT Press, Cambridge
7. Langner G (1997) Temporal processing of pitch in the auditory system. J New Music Res 26:116–132
8. Bregman AS (1994) Auditory scene analysis: the perceptual organization of sound. MIT Press, Cambridge
9. Smalley D (1997) Spectro-morphology and structuring processes. In: Emmerson S (ed) The language of electroacoustic music. Macmillan, London, pp 61–96
10. Emmerson S (1989) Composing strategies and pedagogy. Contemp Music Rev 3:133–144
11. Delalande F (2007) Towards an analysis of compositional strategies. Circuit Musiques Contemp 17:11–26
12. Chion M (1983, 2009) Guide to sound objects. Pierre Schaeffer and musical research (trans. John Dack and Christine North). http://www.ears.dmu.ac.uk/
13. Clarke M (1998) Extending contacts: the concept of unity in computer music. Perspect New Music 36:221–246
14. Stockhausen K (1989) Stockhausen on music. Lectures and interviews compiled by Robin Maconie. Marion Boyars, London
15. Diniz N, Deweppe A, Demey M, Leman M (2010) A framework for music-based interactive sonification. In: Proceedings of the 16th international conference on auditory display, Washington, DC, USA, pp 345–351
16. Hermann T (2008) Taxonomy and definitions for sonification and auditory display. In: Proceedings of the 14th international conference on auditory display, Paris
17. Scaletti C (1989) Composing sound objects in Kyma. Perspect New Music 27:42–69
18. Harvey J (1975) The music of Stockhausen: an introduction. University of California Press, Berkeley
19. Chang D, Nesbitt KV (2006) Developing gestalt-based design guidelines for multi-sensory displays. In: Proceedings of the 2005 NICTA-HCSNet multimodal user interaction workshop, vol 57. Australian Computer Society, Sydney, pp 9–16
20. Stockhausen K (1965) Mikrophonie I, für Tamtam, 2 Mikrophone, 2 Filter und Regler. In: Texte zur Musik, vol 3. Verlag M. DuMont Schauberg, Cologne
21. Gibson JJ (1979) The ecological approach to visual perception. Houghton Mifflin, Boston
22. Leman M, Naveda L (2010) Basic gestures as spatiotemporal reference frames for repetitive dance/music patterns in samba and Charleston. Music Percept 28:71–91
23. Kendall GS (1991) Visualization by ear: auditory imagery for scientific visualization and virtual reality. Comput Music J 15:70–73
24. Godøy RI (2004) Gestural imagery in the service of musical imagery. Springer Lecture Notes in Computer Science, pp 99–100
25. Mulder A, Fels S, Mase K (1997) Mapping virtual object manipulation to sound variation. IPSJ SIG Not 97:63–68
26. Zattra L (2005) Analysis and analyses of electroacoustic music. In: Proceedings of the sound and music computing conference, Salerno, Italy
27. Camurri A, Trocca R, Volpe G (2002) Interactive systems design: a KANSEI-based approach. In: Proceedings of the conference on new interfaces for musical expression. National University of Singapore, Singapore, pp 1–8
28. Deweppe A, Diniz N, Coussement P, Lesaffre M, Leman M (2010) Evaluating strategies for embodied music interaction with musical content. In: Proceedings of the 10th international conference on interdisciplinary musicology, Sheffield, United Kingdom
29. Wanderley MM, Orio N (2002) Evaluation of input devices for musical expression: borrowing tools from HCI. Comput Music J 26(3):62–76
30. Kiefer C, Collins N, Fitzpatrick G (2008) HCI methodology for evaluating musical controllers: a case study. In: Proceedings of the conference on new interfaces for musical expression, Genova, Italy, pp 87–90
31. Leman M, Lesaffre M, Nils L, Deweppe A (2010) User-oriented studies in embodied music cognition. Music Sci 14(2/3). Université de Liège, Belgium
32. Nielsen J, Landauer TK (1993) A mathematical model of the finding of usability problems. In: Proceedings of the ACM INTERCHI'93 conference, Amsterdam, The Netherlands, pp 206–213
33. Norman D (2010) Natural user interfaces are not natural. Interactions 17:6–10
