Virtual and Mixed Reality New Trends


Lecture Notes in Computer Science 6773

Commenced Publication in 1973


Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Randall Shumaker (Ed.)

Virtual and Mixed Reality –


New Trends

International Conference, Virtual and Mixed Reality 2011


Held as Part of HCI International 2011
Orlando, FL, USA, July 9-14, 2011
Proceedings, Part I

Volume Editor

Randall Shumaker
University of Central Florida
Institute for Simulation and Training
3100 Technology Parkway and 3280 Progress Drive
Orlando, FL 32826, USA
E-mail: shumaker@ist.ucf.edu

ISSN 0302-9743 e-ISSN 1611-3349


ISBN 978-3-642-22020-3 e-ISBN 978-3-642-22021-0
DOI 10.1007/978-3-642-22021-0
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: Applied for

CR Subject Classification (1998): H.5, H.4, I.3, I.2, C.3, I.4, I.6

LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI

© Springer-Verlag Berlin Heidelberg 2011


This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Foreword

The 14th International Conference on Human–Computer Interaction, HCI International 2011, was held in Orlando, Florida, USA, July 9–14, 2011, jointly
with the Symposium on Human Interface (Japan) 2011, the 9th International
Conference on Engineering Psychology and Cognitive Ergonomics, the 6th In-
ternational Conference on Universal Access in Human–Computer Interaction,
the 4th International Conference on Virtual and Mixed Reality, the 4th Interna-
tional Conference on Internationalization, Design and Global Development, the
4th International Conference on Online Communities and Social Computing, the
6th International Conference on Augmented Cognition, the Third International
Conference on Digital Human Modeling, the Second International Conference
on Human-Centered Design, and the First International Conference on Design,
User Experience, and Usability.
A total of 4,039 individuals from academia, research institutes, industry and
governmental agencies from 67 countries submitted contributions, and 1,318
papers that were judged to be of high scientific quality were included in the
program. These papers address the latest research and development efforts and
highlight the human aspects of design and use of computing systems. The papers
accepted for presentation thoroughly cover the entire field of human–computer
interaction, addressing major advances in knowledge and effective use of com-
puters in a variety of application areas.
This volume, edited by Randall Shumaker, contains papers in the thematic
area of virtual and mixed reality (VMR), addressing the following major topics:
• Augmented reality applications
• Virtual and immersive environments
• Novel interaction devices and techniques in VR
• Human physiology and behaviour in VR environments
The remaining volumes of the HCI International 2011 Proceedings are:
• Volume 1, LNCS 6761, Human–Computer Interaction—Design and Devel-
opment Approaches (Part I), edited by Julie A. Jacko
• Volume 2, LNCS 6762, Human–Computer Interaction—Interaction Tech-
niques and Environments (Part II), edited by Julie A. Jacko
• Volume 3, LNCS 6763, Human–Computer Interaction—Towards Mobile and
Intelligent Interaction Environments (Part III), edited by Julie A. Jacko
• Volume 4, LNCS 6764, Human–Computer Interaction—Users and Applica-
tions (Part IV), edited by Julie A. Jacko
• Volume 5, LNCS 6765, Universal Access in Human–Computer Interaction—
Design for All and eInclusion (Part I), edited by Constantine Stephanidis
• Volume 6, LNCS 6766, Universal Access in Human–Computer Interaction—
Users Diversity (Part II), edited by Constantine Stephanidis

• Volume 7, LNCS 6767, Universal Access in Human–Computer Interaction—
Context Diversity (Part III), edited by Constantine Stephanidis
• Volume 8, LNCS 6768, Universal Access in Human–Computer Interaction—
Applications and Services (Part IV), edited by Constantine Stephanidis
• Volume 9, LNCS 6769, Design, User Experience, and Usability—Theory,
Methods, Tools and Practice (Part I), edited by Aaron Marcus
• Volume 10, LNCS 6770, Design, User Experience, and Usability—
Understanding the User Experience (Part II), edited by Aaron Marcus
• Volume 11, LNCS 6771, Human Interface and the Management of
Information—Design and Interaction (Part I), edited by Michael J. Smith
and Gavriel Salvendy
• Volume 12, LNCS 6772, Human Interface and the Management of
Information—Interacting with Information (Part II), edited by Gavriel Sal-
vendy and Michael J. Smith
• Volume 14, LNCS 6774, Virtual and Mixed Reality—Systems and Applica-
tions (Part II), edited by Randall Shumaker
• Volume 15, LNCS 6775, Internationalization, Design and Global Develop-
ment, edited by P.L. Patrick Rau
• Volume 16, LNCS 6776, Human-Centered Design, edited by Masaaki Kurosu
• Volume 17, LNCS 6777, Digital Human Modeling, edited by Vincent G.
Duffy
• Volume 18, LNCS 6778, Online Communities and Social Computing, edited
by A. Ant Ozok and Panayiotis Zaphiris
• Volume 19, LNCS 6779, Ergonomics and Health Aspects of Work with Com-
puters, edited by Michelle M. Robertson
• Volume 20, LNAI 6780, Foundations of Augmented Cognition: Directing the
Future of Adaptive Systems, edited by Dylan D. Schmorrow and Cali M.
Fidopiastis
• Volume 21, LNAI 6781, Engineering Psychology and Cognitive Ergonomics,
edited by Don Harris
• Volume 22, CCIS 173, HCI International 2011 Posters Proceedings (Part I),
edited by Constantine Stephanidis
• Volume 23, CCIS 174, HCI International 2011 Posters Proceedings (Part II),
edited by Constantine Stephanidis
I would like to thank the Program Chairs and the members of the Pro-
gram Boards of all Thematic Areas, listed herein, for their contribution to the
highest scientific quality and the overall success of the HCI International 2011
Conference.
In addition to the members of the Program Boards, I also wish to thank
the following volunteer external reviewers: Roman Vilimek from Germany, Ra-
malingam Ponnusamy from India, Si Jung “Jun” Kim from the USA, and Ilia
Adami, Iosif Klironomos, Vassilis Kouroumalis, George Margetis, and Stavroula
Ntoa from Greece.

This conference would not have been possible without the continuous support
and advice of the Conference Scientific Advisor, Gavriel Salvendy, as well as the
dedicated work and outstanding efforts of the Communications and Exhibition
Chair and Editor of HCI International News, Abbas Moallem.
I would also like to thank for their contribution toward the organization of
the HCI International 2011 Conference the members of the Human–Computer
Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona,
George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, Maria Bouhli and George
Kapnas.

July 2011 Constantine Stephanidis


Organization

Ergonomics and Health Aspects of Work with Computers


Program Chair: Michelle M. Robertson

Arne Aarås, Norway Brenda Lobb, New Zealand


Pascale Carayon, USA Holger Luczak, Germany
Jason Devereux, UK William S. Marras, USA
Wolfgang Friesdorf, Germany Aura C. Matias, Philippines
Martin Helander, Singapore Matthias Rötting, Germany
Ed Israelski, USA Michelle L. Rogers, USA
Ben-Tzion Karsh, USA Dominique L. Scapin, France
Waldemar Karwowski, USA Lawrence M. Schleifer, USA
Peter Kern, Germany Michael J. Smith, USA
Danuta Koradecka, Poland Naomi Swanson, USA
Nancy Larson, USA Peter Vink, The Netherlands
Kari Lindström, Finland John Wilson, UK

Human Interface and the Management of Information


Program Chair: Michael J. Smith

Hans-Jörg Bullinger, Germany Youngho Rhee, Korea


Alan Chan, Hong Kong Anxo Cereijo Roibás, UK
Shin’ichi Fukuzumi, Japan Katsunori Shimohara, Japan
Jon R. Gunderson, USA Dieter Spath, Germany
Michitaka Hirose, Japan Tsutomu Tabe, Japan
Jhilmil Jain, USA Alvaro D. Taveira, USA
Yasufumi Kume, Japan Kim-Phuong L. Vu, USA
Mark Lehto, USA Tomio Watanabe, Japan
Hirohiko Mori, Japan Sakae Yamamoto, Japan
Fiona Fui-Hoon Nah, USA Hidekazu Yoshikawa, Japan
Shogo Nishida, Japan Li Zheng, P. R. China
Robert Proctor, USA

Human–Computer Interaction
Program Chair: Julie A. Jacko

Sebastiano Bagnara, Italy Gitte Lindgaard, Canada


Sherry Y. Chen, UK Chen Ling, USA
Marvin J. Dainoff, USA Yan Liu, USA
Jianming Dong, USA Chang S. Nam, USA
John Eklund, Australia Celestine A. Ntuen, USA
Xiaowen Fang, USA Philippe Palanque, France
Ayse Gurses, USA P.L. Patrick Rau, P.R. China
Vicki L. Hanson, UK Ling Rothrock, USA
Sheue-Ling Hwang, Taiwan Guangfeng Song, USA
Wonil Hwang, Korea Steffen Staab, Germany
Yong Gu Ji, Korea Wan Chul Yoon, Korea
Steven A. Landry, USA Wenli Zhu, P.R. China

Engineering Psychology and Cognitive Ergonomics


Program Chair: Don Harris

Guy A. Boy, USA Jan M. Noyes, UK


Pietro Carlo Cacciabue, Italy Kjell Ohlsson, Sweden
John Huddlestone, UK Axel Schulte, Germany
Kenji Itoh, Japan Sarah C. Sharples, UK
Hung-Sying Jing, Taiwan Neville A. Stanton, UK
Wen-Chin Li, Taiwan Xianghong Sun, P.R. China
James T. Luxhøj, USA Andrew Thatcher, South Africa
Nicolas Marmaras, Greece Matthew J.W. Thomas, Australia
Sundaram Narayanan, USA Mark Young, UK
Mark A. Neerincx, The Netherlands Rolf Zon, The Netherlands

Universal Access in Human–Computer Interaction


Program Chair: Constantine Stephanidis

Julio Abascal, Spain Michael Fairhurst, UK


Ray Adams, UK Dimitris Grammenos, Greece
Elisabeth André, Germany Andreas Holzinger, Austria
Margherita Antona, Greece Simeon Keates, Denmark
Chieko Asakawa, Japan Georgios Kouroupetroglou, Greece
Christian Bühler, Germany Sri Kurniawan, USA
Jerzy Charytonowicz, Poland Patrick M. Langdon, UK
Pier Luigi Emiliani, Italy Seongil Lee, Korea

Zhengjie Liu, P.R. China Hirotada Ueda, Japan


Klaus Miesenberger, Austria Jean Vanderdonckt, Belgium
Helen Petrie, UK Gregg C. Vanderheiden, USA
Michael Pieper, Germany Gerhard Weber, Germany
Anthony Savidis, Greece Harald Weber, Germany
Andrew Sears, USA Panayiotis Zaphiris, Cyprus
Christian Stary, Austria

Virtual and Mixed Reality


Program Chair: Randall Shumaker

Pat Banerjee, USA David Pratt, UK


Mark Billinghurst, New Zealand Albert “Skip” Rizzo, USA
Charles E. Hughes, USA Lawrence Rosenblum, USA
Simon Julier, UK Jose San Martin, Spain
David Kaber, USA Dieter Schmalstieg, Austria
Hirokazu Kato, Japan Dylan Schmorrow, USA
Robert S. Kennedy, USA Kay Stanney, USA
Young J. Kim, Korea Janet Weisenford, USA
Ben Lawson, USA Mark Wiederhold, USA
Gordon McK Mair, UK

Internationalization, Design and Global Development


Program Chair: P.L. Patrick Rau

Michael L. Best, USA James R. Lewis, USA


Alan Chan, Hong Kong James J.W. Lin, USA
Lin-Lin Chen, Taiwan Rungtai Lin, Taiwan
Andy M. Dearden, UK Zhengjie Liu, P.R. China
Susan M. Dray, USA Aaron Marcus, USA
Henry Been-Lirn Duh, Singapore Allen E. Milewski, USA
Vanessa Evers, The Netherlands Katsuhiko Ogawa, Japan
Paul Fu, USA Oguzhan Ozcan, Turkey
Emilie Gould, USA Girish Prabhu, India
Sung H. Han, Korea Kerstin Röse, Germany
Veikko Ikonen, Finland Supriya Singh, Australia
Toshikazu Kato, Japan Alvin W. Yeo, Malaysia
Esin Kiris, USA Hsiu-Ping Yueh, Taiwan
Apala Lahiri Chavan, India

Online Communities and Social Computing


Program Chairs: A. Ant Ozok, Panayiotis Zaphiris

Chadia N. Abras, USA Anthony F. Norcio, USA


Chee Siang Ang, UK Ulrike Pfeil, UK
Peter Day, UK Elaine M. Raybourn, USA
Fiorella De Cindio, Italy Douglas Schuler, USA
Heidi Feng, USA Gilson Schwartz, Brazil
Anita Komlodi, USA Laura Slaughter, Norway
Piet A.M. Kommers, The Netherlands Sergei Stafeev, Russia
Andrew Laghos, Cyprus Asimina Vasalou, UK
Stefanie Lindstaedt, Austria June Wei, USA
Gabriele Meiselwitz, USA Haibin Zhu, Canada
Hideyuki Nakanishi, Japan

Augmented Cognition
Program Chairs: Dylan D. Schmorrow, Cali M. Fidopiastis

Monique Beaudoin, USA Rob Matthews, Australia


Chris Berka, USA Dennis McBride, USA
Joseph Cohn, USA Eric Muth, USA
Martha E. Crosby, USA Mark A. Neerincx, The Netherlands
Julie Drexler, USA Denise Nicholson, USA
Ivy Estabrooke, USA Banu Onaral, USA
Chris Forsythe, USA Kay Stanney, USA
Wai Tat Fu, USA Roy Stripling, USA
Marc Grootjen, The Netherlands Rob Taylor, UK
Jefferson Grubb, USA Karl van Orden, USA
Santosh Mathan, USA

Digital Human Modeling


Program Chair: Vincent G. Duffy

Karim Abdel-Malek, USA Yaobin Chen, USA


Giuseppe Andreoni, Italy Kathryn Cormican, Ireland
Thomas J. Armstrong, USA Daniel A. DeLaurentis, USA
Norman I. Badler, USA Yingzi Du, USA
Fethi Calisir, Turkey Okan Ersoy, USA
Daniel Carruth, USA Enda Fallon, Ireland
Keith Case, UK Yan Fu, P.R. China
Julie Charland, Canada Afzal Godil, USA

Ravindra Goonetilleke, Hong Kong Ahmet F. Ozok, Turkey


Anand Gramopadhye, USA Srinivas Peeta, USA
Lars Hanson, Sweden Sudhakar Rajulu, USA
Pheng Ann Heng, Hong Kong Matthias Rötting, Germany
Bo Hoege, Germany Matthew Reed, USA
Hongwei Hsiao, USA Johan Stahre, Sweden
Tianzi Jiang, P.R. China Mao-Jiun Wang, Taiwan
Nan Kong, USA Xuguang Wang, France
Steven A. Landry, USA Jingzhou (James) Yang, USA
Kang Li, USA Gulcin Yucel, Turkey
Zhizhong Li, P.R. China Tingshao Zhu, P.R. China
Tim Marler, USA

Human-Centered Design
Program Chair: Masaaki Kurosu

Julio Abascal, Spain Zhengjie Liu, P.R. China


Simone Barbosa, Brazil Loïc Martínez-Normand, Spain
Tomas Berns, Sweden Monique Noirhomme-Fraiture,
Nigel Bevan, UK Belgium
Torkil Clemmensen, Denmark Philippe Palanque, France
Susan M. Dray, USA Annelise Mark Pejtersen, Denmark
Vanessa Evers, The Netherlands Kerstin Röse, Germany
Xiaolan Fu, P.R. China Dominique L. Scapin, France
Yasuhiro Horibe, Japan Haruhiko Urokohara, Japan
Jason Huang, P.R. China Gerrit C. van der Veer,
Minna Isomursu, Finland The Netherlands
Timo Jokela, Finland Janet Wesson, South Africa
Mitsuhiko Karashima, Japan Toshiki Yamaoka, Japan
Tadashi Kobayashi, Japan Kazuhiko Yamazaki, Japan
Seongil Lee, Korea Silvia Zimmermann, Switzerland
Kee Yong Lim, Singapore

Design, User Experience, and Usability


Program Chair: Aaron Marcus

Ronald Baecker, Canada Ana Boa-Ventura, USA


Barbara Ballard, USA Lorenzo Cantoni, Switzerland
Konrad Baumann, Austria Sameer Chavan, Korea
Arne Berger, Germany Wei Ding, USA
Randolph Bias, USA Maximilian Eibl, Germany
Jamie Blustein, Canada Zelda Harrison, USA

Rüdiger Heimgärtner, Germany Christine Ronnewinkel, Germany


Brigitte Herrmann, Germany Elizabeth Rosenzweig, USA
Sabine Kabel-Eckes, USA Paul Sherman, USA
Kaleem Khan, Canada Ben Shneiderman, USA
Jonathan Kies, USA Christian Sturm, Germany
Jon Kolko, USA Brian Sullivan, USA
Helga Letowt-Vorbek, South Africa Jaakko Villa, Finland
James Lin, USA Michele Visciola, Italy
Frazer McKimm, Ireland Susan Weinschenk, USA
Michael Renner, Switzerland
HCI International 2013

The 15th International Conference on Human–Computer Interaction, HCI International 2013, will be held jointly with the affiliated conferences in the summer
of 2013. It will cover a broad spectrum of themes related to human–computer
interaction (HCI), including theoretical issues, methods, tools, processes and
case studies in HCI design, as well as novel interaction techniques, interfaces
and applications. The proceedings will be published by Springer. More infor-
mation about the topics, as well as the venue and dates of the conference,
will be announced through the HCI International Conference series website:
http://www.hci-international.org/

General Chair
Professor Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
Email: cs@ics.forth.gr
Table of Contents – Part I

Part I: Augmented Reality Applications


AR Based Environment for Exposure Therapy to Mottephobia . . . . . . . . . 3
Andrea F. Abate, Michele Nappi, and Stefano Ricciardi

Designing Augmented Reality Tangible Interfaces for Kindergarten


Children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Pedro Campos and Sofia Pessanha

lMAR: Highly Parallel Architecture for Markerless Augmented Reality


in Aircraft Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Andrea Caponio, Mauricio Hincapié, and Eduardo González Mendivil

5-Finger Exoskeleton for Assembly Training in Augmented Reality . . . . . 30


Siam Charoenseang and Sarut Panjan

Remote Context Monitoring of Actions and Behaviors in a Location


through 3D Visualization in Real-Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
John Conomikes, Zachary Pacheco, Salvador Barrera,
Juan Antonio Cantu, Lucy Beatriz Gomez, Christian de los Reyes,
Juan Manuel Mendez-Villarreal, Takeo Shime, Yuki Kamiya,
Hedeki Kawai, Kazuo Kunieda, and Keiji Yamada

Spatial Clearance Verification Using 3D Laser Range Scanner and


Augmented Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Hirotake Ishii, Shuhei Aoyama, Yoshihito Ono, Weida Yan,
Hiroshi Shimoda, and Masanori Izumi

Development of Mobile AR Tour Application for the National Palace


Museum of Korea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Jae-Beom Kim and Changhoon Park

A Vision-Based Mobile Augmented Reality System for Baseball


Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Seong-Oh Lee, Sang Chul Ahn, Jae-In Hwang, and Hyoung-Gon Kim

Social Augmented Reality for Sensor Visualization in Ubiquitous


Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Youngho Lee, Jongmyung Choi, Sehwan Kim, Seunghun Lee, and
Say Jang

Digital Diorama: AR Exhibition System to Convey Background


Information for Museums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Takuji Narumi, Oribe Hayashi, Kazuhiro Kasada,
Mitsuhiko Yamazaki, Tomohiro Tanikawa, and
Michitaka Hirose
Augmented Reality: An Advantageous Option for Complex Training
and Maintenance Operations in Aeronautic Related Processes . . . . . . . . . 87
Horacio Rios, Mauricio Hincapié, Andrea Caponio,
Emilio Mercado, and Eduardo González Mendívil
Enhancing Marker-Based AR Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Jonghoon Seo, Jinwook Shim, Ji Hye Choi, James Park, and
Tack-don Han
MSL AR Toolkit: AR Authoring Tool with Interactive Features . . . . . . . . 105
Jinwook Shim, Jonghoon Seo, and Tack-don Han
Camera-Based In-situ 3D Modeling Techniques for AR Diorama in
Ubiquitous Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Atsushi Umakatsu, Hiroyuki Yasuhara, Tomohiro Mashita,
Kiyoshi Kiyokawa, and Haruo Takemura
Design Criteria for AR-Based Training of Maintenance and Assembly
Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Sabine Webel, Ulrich Bockholt, and Jens Keil

Part II: Virtual and Immersive Environments


Object Selection in Virtual Environments Performance, Usability and
Interaction with Spatial Abilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Andreas Baier, David Wittmann, and Martin Ende
Effects of Menu Orientation on Pointing Behavior in Virtual
Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Nguyen-Thong Dang and Daniel Mestre
Some Evidences of the Impact of Environment’s Design Features in
Routes Selection in Virtual Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Emília Duarte, Elisângela Vilar, Francisco Rebelo, Júlia Teles, and
Ana Almeida
Evaluating Human-Robot Interaction during a Manipulation
Experiment Conducted in Immersive Virtual Reality . . . . . . . . . . . . . . . . . 164
Mihai Duguleana, Florin Grigorie Barbuceanu, and Gheorghe Mogan
3-D Sound Reproduction System for Immersive Environments Based
on the Boundary Surface Control Principle . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Seigo Enomoto, Yusuke Ikeda, Shiro Ise, and Satoshi Nakamura

Workspace-Driven, Blended Orbital Viewing in Immersive


Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Scott Frees and David Lancellotti

Irradiating Heat in Virtual Environments: Algorithm and


Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Marco Gaudina, Andrea Brogni, and Darwin Caldwell

Providing Immersive Virtual Experience with First-person Perspective


Omnidirectional Movies and Three Dimensional Sound Field . . . . . . . . . . 204
Kazuaki Kondo, Yasuhiro Mukaigawa, Yusuke Ikeda,
Seigo Enomoto, Shiro Ise, Satoshi Nakamura, and
Yasushi Yagi

Intercepting Virtual Ball in Immersive Virtual Environment . . . . . . . . . . . 214


Massimiliano Valente, Davide Sobrero, Andrea Brogni, and
Darwin Caldwell

Part III: Novel Interaction Devices and Techniques


in VR
Concave-Convex Surface Perception by Visuo-vestibular Stimuli for
Five-Senses Theater . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Tomohiro Amemiya, Koichi Hirota, and Yasushi Ikei

Touching Sharp Virtual Objects Produces a Haptic Illusion . . . . . . . . . . . 234


Andrea Brogni, Darwin G. Caldwell, and Mel Slater

Whole Body Interaction Using the Grounded Bar Interface . . . . . . . . . . . . 243


Bong-gyu Jang, Hyunseok Yang, and Gerard J. Kim

Digital Display Case Using Non-contact Head Tracking . . . . . . . . . . . . . . . 250


Takashi Kajinami, Takuji Narumi, Tomohiro Tanikawa, and
Michitaka Hirose

Meta Cookie+: An Illusion-Based Gustatory Display . . . . . . . . . . . . . . . . . 260


Takuji Narumi, Shinya Nishizaka, Takashi Kajinami,
Tomohiro Tanikawa, and Michitaka Hirose

LIS3D: Low-Cost 6DOF Laser Interaction for Outdoor Mixed Reality . . . 270
Pedro Santos, Hendrik Schmedt, Bernd Amend, Philip Hammer,
Ronny Giera, Elke Hergenröther, and André Stork

Olfactory Display Using Visual Feedback Based on Olfactory Sensory


Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Tomohiro Tanikawa, Aiko Nambu, Takuji Narumi,
Kunihiro Nishimura, and Michitaka Hirose

Towards Noninvasive Brain-Computer Interfaces during Standing for


VR Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
Hideaki Touyama

Part IV: Human Physiology and Behaviour in VR


Environments

Stereoscopic Vision Induced by Parallax Images on HMD and its


Influence on Visual Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
Satoshi Hasegawa, Akira Hasegawa, Masako Omori, Hiromu Ishio,
Hiroki Takada, and Masaru Miyao

Comparison of Accommodation and Convergence by Simultaneous


Measurements during 2D and 3D Vision Gaze . . . . . . . . . . . . . . . . . . . . . . . 306
Hiroki Hori, Tomoki Shiomi, Tetsuya Kanda, Akira Hasegawa,
Hiromu Ishio, Yasuyuki Matsuura, Masako Omori, Hiroki Takada,
Satoshi Hasegawa, and Masaru Miyao

Tracking the UFO’s Paths: Using Eye-Tracking for the Evaluation of


Serious Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
Michael D. Kickmeier-Rust, Eva Hillemann, and Dietrich Albert

The Online Gait Measurement for Characteristic Gait Animation


Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
Yasushi Makihara, Mayu Okumura, Yasushi Yagi, and
Shigeo Morishima

Measuring and Modeling of Multi-layered Subsurface Scattering for


Human Skin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
Tomohiro Mashita, Yasuhiro Mukaigawa, and Yasushi Yagi

An Indirect Measure of the Implicit Level of Presence in Virtual


Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Steven Nunnally and Durell Bouchard

Effect of Weak Hyperopia on Stereoscopic Vision . . . . . . . . . . . . . . . . . . . . . 354


Masako Omori, Asei Sugiyama, Hiroki Hori, Tomoki Shiomi,
Tetsuya Kanda, Akira Hasegawa, Hiromu Ishio, Hiroki Takada,
Satoshi Hasegawa, and Masaru Miyao

Simultaneous Measurement of Lens Accommodation and Convergence


to Real Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
Tomoki Shiomi, Hiromu Ishio, Hiroki Hori, Hiroki Takada,
Masako Omori, Satoshi Hasegawa, Shohei Matsunuma,
Akira Hasegawa, Tetsuya Kanda, and Masaru Miyao

Comparison in Degree of the Motion Sickness Induced by a 3-D Movie


on an LCD and an HMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Hiroki Takada, Yasuyuki Matsuura, Masumi Takada, and
Masaru Miyao

Evaluation of Human Performance Using Two Types of Navigation


Interfaces in Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Luís Teixeira, Emília Duarte, Júlia Teles, and Francisco Rebelo

Use of Neurophysiological Metrics within a Real and Virtual Perceptual


Skills Task to Determine Optimal Simulation Fidelity Requirements . . . . 387
Jack Vice, Anna Skinner, Chris Berka, Lauren Reinerman-Jones,
Daniel Barber, Nicholas Pojman, Veasna Tan, Marc Sebrechts, and
Corinna Lathan

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401


Table of Contents – Part II

Part I: VR in Education, Training and Health


Serious Games for Psychological Health Education . . . . . . . . . . . . . . . . . . . 3
Anya Andrews

Mixed Reality as a Means to Strengthen Post-stroke Rehabilitation . . . . 11


Ines Di Loreto, Liesjet Van Dokkum, Abdelkader Gouaich, and
Isabelle Laffont

A Virtual Experiment Platform for Mechanism Motion Cognitive


Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Xiumin Fan, Xi Zhang, Huangchong Cheng, Yanjun Ma, and
Qichang He

Mechatronic Prototype for Rigid Endoscopy Simulation . . . . . . . . . . . . . . . 30


Byron Pérez-Gutiérrez, Camilo Ariza-Zambrano, and
Juan Camilo Hernández

Patterns of Gaming Preferences and Serious Game Effectiveness . . . . . . . 37


Katelyn Procci, James Bohnsack, and Clint Bowers

Serious Games for the Therapy of the Posttraumatic Stress Disorder of


Children and Adolescents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Rafael Radkowski, Wilfried Huck, Gitta Domik, and
Martin Holtmann

Virtual Reality as Knowledge Enhancement Tool for Musculoskeletal


Pathology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Sophia Sakellariou, Vassilis Charissis, Stephen Grant,
Janice Turner, Dianne Kelly, and Chistodoulos Christomanos

Study of Optimal Behavior in Complex Virtual Training Systems . . . . . . 64


Jose San Martin

Farming Education: A Case for Social Games in Learning . . . . . . . . . . . . . 73


Peter Smith and Alicia Sanchez

Sample Size Estimation for Statistical Comparative Test of Training by


Using Augmented Reality via Theoretical Formula and OCC Graphs:
Aeronautical Case of a Component Assemblage . . . . . . . . . . . . . . . . . . . . . . 80
Fernando Suárez-Warden, Yocelin Cervantes-Gloria, and
Eduardo González-Mendívil

Enhancing English Learning Website Content and User Interface


Functions Using Integrated Quality Assessment . . . . . . . . . . . . . . . . . . . . . . 90
Dylan Sung
The Influence of Virtual World Interactions toward Driving Real World
Behaviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Hari Thiruvengada, Paul Derby, Wendy Foslien, John Beane, and
Anand Tharanathan
Interactive Performance: Dramatic Improvisation in a Mixed Reality
Environment for Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Jeff Wirth, Anne E. Norris, Dan Mapes, Kenneth E. Ingraham, and
J. Michael Moshell
Emotions and Telerehabilitation: Pilot Clinical Trials for Virtual
Telerehabilitation Application Using Haptic Device and Its Impact on
Post Stroke Patients’ Mood and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 119
Shih-Ching Yeh, Margaret McLaughlin, Yujung Nam, Scott Sanders,
Chienyen Chang, Bonnie Kennedy, Sheryl Flynn, Belinda Lange,
Lei Li, Shu-ya Chen, Maureen Whitford, Carolee Winstein,
Younbo Jung, and Albert Rizzo
An Interactive Multimedia System for Parkinson’s Patient
Rehabilitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Wenhui Yu, Catherine Vuong, and Todd Ingalls

Part II: VR for Culture and Entertainment


VClav 2.0 – System for Playing 3D Virtual Copy of a Historical
Clavichord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Krzysztof Gardo and Ewa Lukasik
A System for Creating the Content for a Multi-sensory Theater . . . . . . . . 151
Koichi Hirota, Seichiro Ebisawa, Tomohiro Amemiya, and
Yasushi Ikei
Wearable Display System for Handing Down Intangible Cultural
Heritage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Atsushi Hiyama, Yusuke Doyama, Mariko Miyashita, Eikan Ebuchi,
Masazumi Seki, and Michitaka Hirose
Stroke-Based Semi-automatic Region of Interest Detection Algorithm
for In-Situ Painting Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Youngkyoon Jang and Woontack Woo
Personalized Voice Assignment Techniques for Synchronized Scenario
Speech Output in Entertainment Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Shin-ichi Kawamoto, Tatsuo Yotsukura, Satoshi Nakamura, and
Shigeo Morishima

Instant Movie Casting with Personality: Dive Into the Movie System . . . 187
Shigeo Morishima, Yasushi Yagi, and Satoshi Nakamura

A Realtime and Direct-Touch Interaction System for the 3D Cultural


Artifact Exhibition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Wataru Wakita, Katsuhito Akahane, Masaharu Isshiki, and
Hiromi T. Tanaka

Digital Display Case: A Study on the Realization of a Virtual


Transportation System for a Museum Collection . . . . . . . . . . . . . . . . . . . . . 206
Takafumi Watanabe, Kenji Inose, Makoto Ando, Takashi Kajinami,
Takuji Narumi, Tomohiro Tanikawa, and Michitaka Hirose

Part III: Virtual Humans and Avatars

Integrating Multi-agents in a 3D Serious Game Aimed at Cognitive


Stimulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
Priscilla F. de Abreu, Luis Alfredo V. de Carvalho,
Vera Maria B. Werneck, and
Rosa Maria E. Moreira da Costa

Automatic 3-D Facial Fitting Technique for a Second Life Avatar . . . . . . 227
Hiroshi Dohi and Mitsuru Ishizuka

Reflected in a Liquid Crystal Display: Personalization and the Use of


Avatars in Serious Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Shan Lakhmani and Clint Bowers

Leveraging Unencumbered Full Body Control of Animated Virtual


Characters for Game-Based Rehabilitation . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Belinda Lange, Evan A. Suma, Brad Newman, Thai Phan,
Chien-Yen Chang, Albert Rizzo, and Mark Bolas

Interactive Exhibition with Ambience Using Video Avatar and


Animation on Huge Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Hasup Lee, Yoshisuke Tateyama, Tetsuro Ogi, Teiichi Nishioka,
Takuro Kayahara, and Kenichi Shinoda

Realistic Facial Animation by Automatic Individual Head Modeling


and Facial Muscle Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Akinobu Maejima, Hiroyuki Kubo, and Shigeo Morishima

Geppetto: An Environment for the Efficient Control And Transmission


of Digital Puppetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
Daniel P. Mapes, Peter Tonner, and Charles E. Hughes

Body Buddies: Social Signaling through Puppeteering . . . . . . . . . . . . . . . . 279


Magy Seif El-Nasr, Katherine Isbister, Jeffery Ventrella,
Bardia Aghabeigi, Chelsea Hash, Mona Erfani,
Jacquelyn Morie, and Leslie Bishko

Why Can’t a Virtual Character Be More Like a Human:


A Mixed-Initiative Approach to Believable Agents . . . . . . . . . . . . . . . . . . . . 289
Jichen Zhu, J. Michael Moshell, Santiago Ontañón,
Elena Erbiceanu, and Charles E. Hughes

Part IV: Developing Virtual and Mixed Environments

Collaborative Mixed-Reality Platform for the Design Assessment of


Cars Interior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Giandomenico Caruso, Samuele Polistina, Monica Bordegoni, and
Marcello Aliverti

Active Location Tracking for Projected Reality Using Wiimotes . . . . . . . . 309


Siam Charoenseang and Nemin Suksen

Fast Prototyping of Virtual Replica of Real Products . . . . . . . . . . . . . . . . . 318


Francesco Ferrise and Monica Bordegoni

Effectiveness of a Tactile Display for Providing Orientation Information


of 3d-patterned Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
Nadia Garcia-Hernandez, Ioannis Sarakoglou,
Nikos Tsagarakis, and Darwin Caldwell

ClearSpace: Mixed Reality Virtual Teamrooms . . . . . . . . . . . . . . . . . . . . . . 333


Alex Hill, Matthew Bonner, and Blair MacIntyre

Mesh Deformations in X3D via CUDA with Freeform Deformation


Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Yvonne Jung, Holger Graf, Johannes Behr, and Arjan Kuijper

Visualization and Management of u-Contents for Ubiquitous VR . . . . . . . 352


Kiyoung Kim, Jonghyun Han, Changgu Kang, and Woontack Woo

Semi Autonomous Camera Control in Dynamic Virtual


Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
Marcel Klomann and Jan-Torsten Milde

Panoramic Image-Based Navigation for Smart-Phone in Indoor


Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
Van Vinh Nguyen, Jin Guk Kim, and Jong Weon Lee

Foundation of a New Digital Ecosystem for u-Content: Needs,


Definition, and Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Yoosoo Oh, Sébastien Duval, Sehwan Kim, Hyoseok Yoon,
Taejin Ha, and Woontack Woo

Semantic Web-Techniques and Software Agents for the Automatic


Integration of Virtual Prototypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Rafael Radkowski and Florian Weidemann

Virtual Factory Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397


Marco Sacco, Giovanni Dal Maso, Ferdinando Milella,
Paolo Pedrazzoli, Diego Rovere, and Walter Terkaj

FiveStar: Ultra-Realistic Space Experience System . . . . . . . . . . . . . . . . . . . 407


Masahiro Urano, Yasushi Ikei, Koichi Hirota, and
Tomohiro Amemiya

Synchronous vs. Asynchronous Control for Large Robot Teams . . . . . . . . 415


Huadong Wang, Andreas Kolling, Nathan Brooks,
Michael Lewis, and Katia Sycara

Acceleration of Massive Particle Data Visualization Based on GPU . . . . . 425


Hyun-Rok Yang, Kyung-Kyu Kang, and Dongho Kim

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433


AR Based Environment for Exposure Therapy to
Mottephobia

Andrea F. Abate, Michele Nappi, and Stefano Ricciardi

Virtual Reality Laboratory – University of Salerno, 84084, Fisciano (SA), Italy
{abate,mnappi,sricciardi}@unisa.it

Abstract. Mottephobia is an anxiety disorder revolving around an extreme,
persistent and irrational fear of moths and butterflies leading sufferers to panic
attacks. This study presents an ARET (Augmented Reality Exposure Therapy)
environment aimed to reduce mottephobia symptoms by progressive
desensitization. The architecture described is designed to provide a greater and
deeper level of interaction between the sufferer and the object of its fears. To
this aim the system exploits an inertial ultrasonic-based tracking system to
capture the user’s head and wrists positions/orientations within the virtual
therapy room, while a couple of instrumented gloves capture fingers’ motion. A
parametric moth behavioral engine allows the expert monitoring the therapy
session to control many aspects of the virtual insects augmenting the real scene
as well as their interaction with the sufferer.

Keywords: Augmented reality, exposure therapy, mottephobia.

1 Introduction
Mottephobia is the term used to describe the intense fear of moths and, more
generally, of butterflies. According to psychologists’ classification of phobias, which
distinguish between agoraphobia, social phobia and specific phobia, mottephobia falls
within the last category and represents an animal phobia, an anxiety disorder which is
not uncommon though not so well-known as arachnophobia. In severe cases, panic
attacks are triggered in mottephobia sufferers if they simply view a picture or even
think of a moth. Consequently, many of these persons will completely avoid
situations where butterflies or moths may be present. If they see one, they often
follow it with close scrutiny to make sure it does not come anywhere near them.
Sometimes the fear is caused by a split second of panic during exposure to the
animal. This wires the brain to respond similarly to future stimuli with symptoms
such as fast heartbeat, sweating, dry mouth and elevated stress and anxiety levels. In
general, the most common treatment for phobias is exposure therapy, or systematic
desensitization. This involves gradually being exposed to the phobic object or
situation in a safe and controlled way. For example, a mottephobic subject might start
out by looking at cartoon drawings of butterflies. When they reach a point where the
images no longer trigger the phobic response, they may move on to photographs, and

so on. Therapy is a slow process, but can have lasting effects. In the last decade the
systematic desensitization treatment has been approached by means of virtual reality
based environments and more recently by augmented reality techniques where in-vivo
exposure is difficult to manage. In this case the contact between the sufferer and the
source of its fear is performed via a virtual replica of it which can be visualized on a
screen or through a head-up display and may even enable a simulated interaction.
This study presents a novel augmented reality based environment for exposure
therapy to mottephobia. The final goal is to match the emotional impact experienced
during the exposure to real moths while providing therapists with a level of control of
virtual moths’ behavior which would be impossible in vivo.
The rest of this paper is organized as follows. Related works and their comparison
with the proposed approach are presented in section 2, while the system’s
architecture is described in detail in section 3. The experiments conducted and their
results are presented in section 4, while conclusions are drawn in section 5.

2 Related Works and Proposed Approach


In the last decade the systematic desensitization treatment has been approached by
means of virtual reality based environments and more recently by augmented reality
techniques where in-vivo exposure is difficult to manage. Virtual Reality based
Exposure Therapy (VRET) has proved to be an effective strategy for phobias
treatment since the original study by Carlin et al. in 1997 [1], which first reported
on the efficacy of virtual exposure to spiders, opening the way to further research
in this line [2, 3]. More recently, augmented reality has also been proposed to allow
the sufferer to see the real environment around him/her instead of a virtual one
while displaying the virtual contents co-registered to the user’s field of view as if they
were really present there, possibly resulting in more convincing stimuli for the
therapy (ARET). This objective has been approached by means of (visible and
invisible) marker based techniques [4, 5] using both video-based and optical-based
see-through head mounted displays [6]. The aforementioned marker-based approach
involves some limitations: on the one hand, the operative volume is restricted to a
fraction of the environment (typically the desktop where the marker is located),
possibly constraining the user’s head movements so as not to lose the marker and, therefore, the
co-registration between real and virtual. On the other hand, the choice of the marker’s
location (either visible or not) is limited by lighting and orientation constraints related
to pattern detection/recognition issues which may reduce the range of the experience.
This design may still be valid when interacting with non-flying creatures (like
spiders or cockroaches), especially considering the low cost of optical tracking, but
it is very limiting when simulating flying insects’ behavior, which involves much
larger spaces. Furthermore, in most proposals the virtual insects do not react to the user’s
hand actions, i.e., they perform their pre-built animation(s) independently of where
exactly the hands and fingers are, reacting at most to actions like pressing a key
to crush the insects.
In this paper, the proposed mottephobia ARET environment addresses the
aforementioned limitations by exploiting a head/wrist inertial tracking system,
instrumented gloves and a parametric moth behavior approach to enable a greater and
deeper level of interaction between the sufferer and the object of its fears.

3 System’s Architecture
The overall system’s architecture is schematically depicted in Fig. 1. The main
components are the Moth Behavioral Engine, which controls both the appearance and
the dynamic behavior of the virtual moths represented in the dedicated 3D Dataset
throughout the simulation, the Interaction Engine, which manages the sufferer–moth
interaction exploiting hand gesture capture and wrist tracking, and the AR Engine, in
charge of scene augmentation (based on head tracking) and stereoscopic rendering via
the see-through head mounted display, which also provides audio stimuli generated on
a positional basis.

Fig. 1. Schematic view of the proposed system

As the main objective was a believable hand-moth interaction, wireless instrumented
gloves and ultrasonic tracking devices have been used. An instrumented glove, indeed,
enables a reliable gesture capture, as each single finger has individual sensors which are
unaffected by the other fingers.

In this case, left and right hand gesture acquisition is performed via a pair of
wireless 5DT Dataglove 14 Ultra devices, featuring fourteen channels for finger flexion and
abduction measurement, each with 12 bits of sampling resolution. As datagloves do
not provide any spatial information, the system relies on an inertial ultrasonic-based
tracking system (Intersense IS 900 VET) with six degrees of freedom to detect the head
and wrist positions in 3D space and their rotation around the yaw, pitch and roll axes. Among
the advantages of this setup are the wide capture volume (with respect to video-based
solutions requiring the user to be positioned in a precise spot within the camera’s field of
view), an accuracy in the range of millimeters for distance measurements and of
tenths of a degree for angular measurements, and a high sampling rate suited to
accurately capturing fast movements. A preprocessing step applied to each of the six channels
(for each hand) filters capture noise by means of a high-frequency cut and a temporal
average of the sampled values. The left and right hand data streams are output to the
Interaction Engine, while head tracking is sent to the AR Engine for virtual-to-real
co-registration.
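
The paper specifies only that each tracker channel undergoes a high-frequency cut and a temporal average; the following Python sketch, with an assumed smoothing factor and window size, merely illustrates how such a per-channel filter could be structured (it is not the authors' actual implementation).

from collections import deque

class ChannelFilter:
    # Per-channel noise filter: exponential low-pass ("high-frequency cut")
    # followed by a temporal average over the last few samples.
    # alpha and window are illustrative assumptions, not values from the paper.
    def __init__(self, alpha=0.3, window=5):
        self.alpha = alpha              # low-pass smoothing factor in (0, 1]
        self.history = deque(maxlen=window)
        self.state = None

    def update(self, raw_sample):
        # Exponential low-pass attenuates high-frequency capture jitter.
        if self.state is None:
            self.state = raw_sample
        else:
            self.state = self.alpha * raw_sample + (1.0 - self.alpha) * self.state
        # Temporal average of the filtered samples.
        self.history.append(self.state)
        return sum(self.history) / len(self.history)

# One filter per tracked channel (x, y, z, yaw, pitch, roll) for each hand.
filters = {ch: ChannelFilter() for ch in ("x", "y", "z", "yaw", "pitch", "roll")}
raw = {"x": 0.12, "y": 1.05, "z": 0.80, "yaw": 3.0, "pitch": -1.0, "roll": 0.5}
smoothed = {ch: filters[ch].update(value) for ch, value in raw.items()}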
The Moth Behavioral Engine allows the therapist to control many parameters of
the simulated exposure (see Fig. 2). Both behavioral and interaction parameters can
be adjusted interactively during the exposure session, allowing the therapist to modify
the simulation on-the-fly, if required. These parameters include the “number”, the
“size”, the maximum amount of “size variation” (with respect to a pseudo-random
distribution) and the type of flying creatures to be visualized among those available in
a previously built 3D dataset.

Fig. 2. The GUI screen including the main simulation parameters
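
To make the parameter set concrete, the sketch below groups the therapist-adjustable values named in the text into a single Python configuration object; the data structure, field names and default values are illustrative assumptions rather than the actual GUI bindings of the system.

from dataclasses import dataclass

@dataclass
class ExposureParameters:
    # Therapist-adjustable simulation parameters (defaults are assumptions).
    moth_count: int = 5          # "number" of flying creatures
    moth_size: float = 1.0       # base "size" scale factor
    size_variation: float = 0.2  # maximum pseudo-random deviation from the base size
    species: str = "moth"        # creature type chosen from the 3D dataset
    moth_speed: float = 1.0      # per-particle flying-pattern speed
    swarm_speed: float = 0.5     # emitter-to-target traversal speed
    aggressiveness: float = 0.0  # attraction of the swarm path towards the sufferer
    user_avoidance: float = 1.5  # radius (m) of the sufferer-centered exclusion sphere
    direct_contact: int = 0      # number of insects allowed to settle on each hand

# Parameters can be adjusted on the fly during a session, for example:
params = ExposureParameters()
params.aggressiveness = 0.4
params.user_avoidance = 0.8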

Actually, this engine is based on a parametric particle system which controls the
virtual moths as instances of a reference geometry (a polygonal model). The dynamics
of the particles (i.e. the moths’ motion) are controlled at two different levels: the particle
level and the swarm level. At the particle level the motion of the single moth is
controlled through a seamlessly loopable spline based animation defining the
particular flying pattern. The “moth speed” parameter, multiplied by a random
variation value, affects the time required to complete the pattern. At the swarm level
the motion of the whole swarm is controlled through an emitter and a target which can
be interactively selected among predefined locations in the 3D model of the virtual
therapy environment. More than one swarm may be active at the same time allowing
the moths to originate from different locations and thus providing a less repetitive and
more unexpected experience. The “swarm speed” parameter affects the time required
to complete the emitter-target path. Two other swarm-level parameters, namely
“aggressiveness” and “user avoidance”, respectively affect the swarm’s dynamic
behavior by attracting the swarm path towards the sufferer’s position and by defining
the radius of the sufferer-centered sphere which the moths cannot enter.
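
As a rough, conceptual approximation of this two-level control (the real engine is a Quest3D particle system), the Python sketch below advances a swarm centre, starting at its emitter, towards its target, biases the path towards the sufferer according to the “aggressiveness” parameter, and keeps individual moths outside the “user avoidance” sphere; the vector helpers and data layout are assumptions, and the parameter object is the ExposureParameters sketch given earlier.

import math

def v_add(a, b): return [a[i] + b[i] for i in range(3)]
def v_sub(a, b): return [a[i] - b[i] for i in range(3)]
def v_scale(a, s): return [c * s for c in a]
def v_len(a): return math.sqrt(sum(c * c for c in a))

def update_swarm(moths, swarm_center, target, user_pos, params, dt):
    # Swarm level: move the swarm centre towards the target, biased towards the user.
    direction = v_add(v_sub(target, swarm_center),
                      v_scale(v_sub(user_pos, swarm_center), params.aggressiveness))
    length = v_len(direction)
    if length > 1e-6:
        direction = v_scale(direction, 1.0 / length)
    swarm_center = v_add(swarm_center, v_scale(direction, params.swarm_speed * dt))

    # Particle level: each moth follows its own looping pattern offset,
    # then is pushed outside the sufferer-centered avoidance sphere.
    for moth in moths:
        moth["pos"] = v_add(swarm_center, moth["pattern_offset"])
        away = v_sub(moth["pos"], user_pos)
        dist = v_len(away)
        if 1e-6 < dist < params.user_avoidance:
            push = v_scale(away, (params.user_avoidance - dist) / dist)
            moth["pos"] = v_add(moth["pos"], push)
    return swarm_center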
The Interaction Engine exploits the user’s tracking data to enable realistic hand-
moth interaction. Indeed, not only the approximate hand location, but also each
finger’s position can be computed based on wrist tracking and forward
kinematics applied to the flexion/abduction data captured by the instrumented gloves.
By this design, as the user shakes his or her hands the butterflies may react by avoiding the
collision and flying away according to their motion pattern, while in a more advanced
stage of the therapy a direct contact with the insects is possible by allowing the insect
to settle on the hand surface. In this regard, it has to be remarked that for the first
interaction modality the instrumented gloves could be omitted (thus reducing the
hardware required and the equipment to be worn), while for the other two “direct-
contact” modalities they are strictly necessary. During “direct contact”, one or more
virtual insects (according to the “direct contact” parameter) may settle on each hand
in spots randomly selected among a pre-defined set of swarm targets (e.g. the palm,
the index finger or the back of the hand). Again, the purpose of this randomness is
to prevent the sufferer from expecting a contact to happen always in the same way.
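
The paper does not give the kinematic model; the simplified planar chain below only illustrates the idea of composing the tracked wrist pose with per-joint flexion angles to estimate a fingertip position for collision tests. Segment lengths, the contact radius and the mapping from glove readings to angles are assumptions.

import math

def finger_tip_position(wrist_pos, wrist_yaw, flexion_angles,
                        segment_lengths=(0.05, 0.03, 0.02)):
    # Very simplified planar forward kinematics: the tracked wrist gives the
    # base frame, each joint adds its flexion angle, and segments are chained.
    x, y, z = wrist_pos
    angle = wrist_yaw
    for flexion, length in zip(flexion_angles, segment_lengths):
        angle += flexion                  # accumulate joint rotation
        x += length * math.cos(angle)     # advance along the phalanx
        y += length * math.sin(angle)
    return (x, y, z)                      # z kept constant in this planar sketch

def touches(tip, moth_pos, radius=0.02):
    # Simple sphere test used to decide whether a moth reacts or settles.
    return math.dist(tip, moth_pos) < radius

# Example: index finger slightly bent, wrist tracked at (0.2, 1.1, 0.4).
tip = finger_tip_position((0.2, 1.1, 0.4), wrist_yaw=0.1,
                          flexion_angles=(0.3, 0.4, 0.2))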
The 3D dataset contains medium- to low-detail polygonal models of
moths/butterflies, realistically textured and animated. These models are transformed
and rendered by the visualization engine, also responsible for AR related real time
transformations and for the stereo rendering of 3D content. The engine is built on the
DirectX based Quest3D graphics toolkit (see Fig. 3), which enables dynamic
simulation by means of the Newton Dynamics API or even via the Open Dynamics
Engine (OpenDE, a.k.a. ODE) open-source library. To generate the AR experience,
the visualization engine exploits the user’s head position and orientation to transform the
virtual content as seen from the user’s point of view, coherently with a 3D model of the
surrounding environment, a crucial task referred to as 3D registration. Any AR
environment requires a precise registration of real and virtual objects, i.e. the objects
in the real and virtual world must be properly aligned with respect to each other, or
the illusion that the two worlds coexist will be compromised. Therefore, at runtime
two rendering cameras (one for each eye) are built, matching the exact
position/orientation of the user’s eyes and transforming each vertex of each virtual object to
be displayed onto the real scene accordingly.
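
As a minimal illustration of this per-eye registration step (the real engine builds full view and projection matrices inside Quest3D), the sketch below derives the two camera positions by offsetting the tracked head position by half an assumed interpupillary distance along the head’s right vector.

import math

def eye_positions(head_pos, head_yaw, ipd=0.064):
    # Offset the tracked head position by +/- half the interpupillary distance
    # along the head's right axis (yaw-only simplification; ipd is assumed).
    right = (math.cos(head_yaw), 0.0, -math.sin(head_yaw))
    half = ipd / 2.0
    left_eye = tuple(h - half * r for h, r in zip(head_pos, right))
    right_eye = tuple(h + half * r for h, r in zip(head_pos, right))
    return left_eye, right_eye

# Each frame one rendering camera is placed at each eye and oriented with the
# head, so that the virtual moths stay registered to the real scene.
left_cam, right_cam = eye_positions((0.0, 1.7, 0.0), head_yaw=math.radians(15))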

Fig. 3. A fragment of Quest3D graph-based programming environment for finger-moth collision detection

Two renderings (left and right) are then calculated and coherently displayed
through an optical see-through Head Mounted Display, which works by placing
optical combiners in front of the user's eyes (see Fig. 4). These combiners are partially
transmissive, so that the user can look directly through them to see the real world. The
combiners are also partially reflective, so that the user sees virtual images bounced off
the combiners from head-mounted LCD monitors. The rendering engine has been
tailored to optical see-through HMDs, but it could be adapted to video see-through
displays. If needed, a selective culling of a virtual object may be performed when
it is partially or totally behind a real object, but in many cases this technique (and the
overhead required to accurately model the real environment) may not be necessary.
To further stimulate the user’s emotional reactions, audio samples mimicking the
sound of moths’ flapping wings, diffused through the headphones integrated in the
HMD, are exploited to amplify the sensation of presence of the virtual insects
according to their size, number and distance from the sufferer. The flapping-wing
audio samples are short looping samples whose duration is in sync with the actual
flapping animation cycle to achieve audio-visual coherence.
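
A possible way to realize this audio behavior is sketched below in Python: the loop length is tied to the wing-flap animation cycle, while the playback gain grows with moth size and number and falls off with distance to the sufferer. The inverse-distance gain model and constants are assumptions, not the authors' mixing formula.

def flapping_audio(moths, user_pos, animation_cycle_s, base_gain=0.2):
    # Returns (loop_duration, gain): the loop duration matches the wing-flap
    # animation cycle, and the gain depends on moth size, number and distance.
    def distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    gain = 0.0
    for moth in moths:
        d = max(distance(moth["pos"], user_pos), 0.3)   # clamp very close distances
        gain += base_gain * moth["size"] / d            # assumed 1/d falloff
    return animation_cycle_s, min(gain, 1.0)            # keep sync, cap the gain

duration, gain = flapping_audio(
    [{"pos": (1.0, 1.5, 0.5), "size": 1.2}, {"pos": (2.5, 1.6, -0.3), "size": 0.8}],
    user_pos=(0.0, 1.7, 0.0), animation_cycle_s=0.18)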

4 Experiments
We are still in the process of performing a quantitative study to measure the response
of mottephobia sufferers to this approach to exposure therapy. So far, we have carried
out some preliminary qualitative evaluations on the system described above, to gather
first impressions about its potential efficacy from experts in exposure therapy and
from their patients. These experiments involved five mottephobic subjects showing
various levels of symptom severity and three exposure therapy specialists. The
test bed hardware included a dual quad-core Intel Xeon workstation equipped with an
Nvidia Quadro 5600 graphics board with 1.5 GB of VRAM in the role of
simulation server and control interface. The HMD adopted is a Cybermind Visette Pro
with see-through option. The virtual therapy room has an area of about 40 m², of
which 15 m² fall within the capture volume of the tracking system, providing a
reasonable space for moving around and interacting (see Fig. 5).
Each of the 5 participants has been exposed to moths/butterflies augmenting the real
scene over the course of 8 ARET sessions featuring a progressively closer level of
interaction, while the experts were invited to control the simulation’s parameters after
a brief training. After each session the participants have been asked to answer a
questionnaire developed to measure six subjective aspects of the simulated experience
by assigning a score in the integer range 1-10 (the higher the better) to: (A) Realism of
Simulated Experience; (B) Visual Realism of Virtual Moths; (C) Realism of Moth
Behavior; (D) Realism of Hand-Moth Interaction; (E) Emotional Impact of Audio
Stimuli; (F) Maximum Fear Level Experienced. Additionally, the therapists were
asked to provide feedback on two qualitative aspects of the ARET control interface:
(G) Accuracy of Control; (H) Range of Control. As shown in Table 1, while the
evaluations provided are subjective and the number of users involved in these first
trials is very small, the overall results seem to confirm that many of the factors
triggering panic attacks in mottephobic subjects, such as the sudden appearance of
insects from behind or above, the moths’ erratic flying patterns, the sound of flapping
wings or simply the insects’ visual aspect, are credibly reproduced by the proposed
AR environment.

Fig. 4. See-Through HMD, datagloves and head/wrists wireless trackers worn during testing

Table 1. A summary of the scores provided by the users of the proposed ARET system

Features                                  Min.  Avg.  Max.
(A) Realism of Simulated Experience        7    7.9    9
(B) Visual Realism of Virtual Moths        8    9.1   10
(C) Realism of Moth Behavior               6    6.8    8
(D) Realism of Hand-Moth Interaction       6    7.5    9
(E) Emotional Impact of Audio Stimuli      8    8.2    9
(F) Maximum Fear Level Experienced         8    8.8   10
(G) Accuracy of Control                    7    7.5    8
(H) Range of Control                       8    9.0   10

Fig. 5. The room for virtual exposure therapy, augmented with interacting butterflies

For their part, the exposure therapy experts involved were favourably impressed
by the level of control available over the virtual simulation.
However, only a quantitative analysis conducted on a much larger number of
subjects can objectively assess the efficacy of this ARET environment. In this
regard, the evaluation we are carrying out is based on a modified version of the “fear
of spiders” questionnaire originally proposed by Szymanski and O’Donoghue [7] as,
to the best of our knowledge, there is no specific work of this kind for mottephobia.

5 Conclusions
In this paper, we presented an AR based environment for exposure therapy of
mottephobia. The proposed architecture exploits an inertial tracking system,
instrumented gloves and parametric behavioral/interaction engines to provide the user
with a more believable and emotionally involving interaction experience, improving at the
same time the range and the accuracy of the user-system interaction.
To this aim, we performed a first qualitative evaluation involving ET experts and a
group of mottephobia sufferers asked to respond to a questionnaire. So far the first
qualitative reports confirm the potential of the proposed system for mottephobia
treatment, while, according to the therapists involved, other kinds of anxiety disorders
could be favorably treated as well.
We are currently working on completing the aforementioned quantitative study to
assess the system’s effectiveness in reducing mottephobia symptoms as well as to
compare this proposal with both marker-based ARET and VRET approaches. As
currently the system is able to display only one type of moth/butterfly for a single
session, we are also working to remove this limitation. Additionally we are
developing a new version of the AR engine specific for video see-through HMDs.

References
1. Bouchard, S., Côté, S., St-Jacques, J., Robillard, G., Renaud, P.: Effectiveness of virtual
reality exposure in the treatment of arachnophobia using 3D games. Technology and Health
Care 14(1), 19–27 (2006)
2. Carlin, A., Hoffman, H.Y., Weghorst, S.: Virtual reality and tactile augmentation in the
treatment of spider phobia: a case study. Behaviour Research and Therapy 35(2), 153–158
(1997)
3. Bouchard, S., Côté, S., Richards, C.S.: Virtual reality applications for exposure. In:
Richards, C.S. (ed.) Handbook of Exposure, ch. 11 (in press)
4. Botella, C., Juan, M.C., Baños, R.M., Alcañiz, M., Guillen, V., Rey, B.: Mixing realities?
An Application of Augmented Reality for the Treatment of Cockroach phobia.
Cyberpsychology & Behavior 8, 162–171 (2005)
5. Juan, M.C., Joele, D., Baños, R., Botella, C., Alcañiz, M., Van Der Mast, C.: A Markerless
Augmented Reality System for the treatment of phobia to small animals. In: Presence
Conference, Cleveland, USA (2006)
6. Juan, M.C., Alcañiz, M., Calatrava, J., Zaragozá, I., Baños, R.M., Botella, C.: An Optical
See-Through Augmented Reality System for the Treatment of Phobia to Small Animals. In:
Shumaker, R. (ed.) HCII 2007 and ICVR 2007. LNCS, vol. 4563, pp. 651–659. Springer,
Heidelberg (2007)
7. Szymanski, J., O’Donoghue, W.: Fear of spiders questionnaire. J. Behav. Ther. Exp.
Psychiatry 26(1), 31–34 (1995)
Designing Augmented Reality Tangible Interfaces for
Kindergarten Children

Pedro Campos1,2 and Sofia Pessanha1


1 University of Madeira and Madeira Interactive Technologies Institute
Campus Universitário da Penteada, 9000-390 Funchal, Portugal
2 VIMMI Group, Visualization and Intelligent Multimodal Interfaces, INESC-ID
R. Alves Redol 9, 1000-029 Lisboa, Portugal
pcampos@uma.pt, sofia.pessanha@gmail.com

Abstract. Using games based on novel interaction paradigms for teaching
children is becoming increasingly popular, because children are moving towards
a new level of interaction with technology and there is a need to bring them
closer to educational content through the use of novel, attractive technologies.
Instead of developing a computer program using traditional input techniques
(mouse and keyboard), this research presents a novel user interface for learning
kindergarten subjects. The motivation is essentially to bring something from the
real world and couple it with virtual reality elements, accomplishing the
interaction using our own hands. It is a symbiosis of traditional cardboard games
with digital technology. The rationale for our approach is simple. Papert (1996)
notes that "learning is more effective when the apprentice voluntarily engages in
the process". Motivating the learners is therefore a crucial factor to increase the
possibility of action and discovery, which in turn increases the capacity of what
some researchers call learning to learn. In this sense, the novel constructionist
learning paradigm aims to adapt and prepare tomorrow's schools for the constant
challenges faced by a society which is currently embracing an accelerating pace
of profound changes. Augmented reality (Shelton and Hedley, 2002) and tangible
user interfaces (Sharlin et al., 2004) fit nicely as a support method for this kind
of learning paradigm.

Keywords: Augmented reality, Interactive learning systems, Tangible Interfaces.

1 Introduction
Using games as a way to better educate children is becoming increasingly popular
because children are moving towards a new level of interaction with technology and
there is a need to bring them closer to educational content. This can be done
through the use of novel, more attractive technologies.
The power of digital games as educational tools is, however, well understood.
Games can be used successfully for teaching science and engineering better than
lectures [1]; Mayo and colleagues even argued they could be the "cure for a
numbing 200-person class" [1]. Games can also be used to teach a number of very
different subjects to children of all ages. For instance, Gibson describes a game aimed at
teaching programming to pre-teen school children [2]. Bellotti and colleagues [5]
describe an educational game built with a state-of-the-art commercial game development
approach, enriching the environment with instances of purpose-developed educational
modules. The research goal of these approaches is essentially to exploit the
potential of computers and reach a demographic that is traditionally averse to
learning.
On a more specific line, there is also interesting research on using Augmented
Reality (AR) games in the classroom. From high-school mathematics and geometry
[3] to interactive solar systems targeted at middle school science students [4], the
range of applications is relatively broad.
However, there is a clear lack of solutions and studies regarding the application of
these technologies with kindergarten children, who are aged 3-5 years old and
therefore have different learning objectives.
In this paper, we present a tangible user interface for an augmented reality game
specifically targeted at promoting collaborative learning in kindergarten. The game's
design involved HCI researchers (the authors), kindergarten teachers and 3D
designers. We evaluated the system over several days in two different local schools,
recording the children's reactions, behaviors and answers to a survey we also
conducted.
Instead of developing a computer program using traditional input techniques
(mouse and keyboard), this research presents a novel user interface for learning
kindergarten subjects. The motivation is essentially to bring something from the real
world and couple it with virtual reality elements, accomplishing the interaction
using our own hands; thus, children do not need previous experience with
computers in order to use this system. The interface is, essentially, a symbiosis of
traditional cardboard games with digital technology.

2 Related Work
Technology today provides exciting new possibilities for bringing children closer to
digital content. There are numerous areas where Augmented Reality (AR) can be applied,
ranging from more serious areas to entertainment and fun. Thus, the process of
viewing and manipulating virtual objects in a real environment can be found in many
applications, especially in education and training, which are very promising
application areas, since it is often necessary to use resources enabling a better view of the
object under study. Other applications include the creation of collaborative
environments in AR, which consist of multi-user systems with simultaneous access
where each user views and interacts with real and virtual elements from their own point
of view.
Given the scope of our work, we divide the review of the literature into two broad
aspects: the use of augmented reality technology in the classroom, and approaches
targeted at promoting collaboration in the classroom by means of novel technology –
not necessarily based in augmented reality.
The use of augmented reality systems in educational settings, per se, is not novel.
Shelton and Hedley [6] describe a research project in which they used augmented
reality to help teach undergraduate geography students about earth-sun relationships.
They examined over thirty students who participated in an augmented reality exercise
containing models designed to teach concepts of rotation/revolution, solstice/equinox,
and seasonal variation of light and temperature, and found a significant overall
improvement in student understanding after the augmented reality exercise, as well as
a reduction in student misunderstandings.
Some other important conclusions about this system were that AR interfaces do not
merely change the delivery mechanism of instructional content: They may
fundamentally change the way that content is understood, through a unique
combination of visual and sensory information that results in a powerful cognitive and
learning experience [6].
Simulations in virtual environments are becoming an important research tool for
educators [9]. Augmented reality, in particular, has been used to teach physical
models in chemistry education [10]. Schrier evaluated the perceptions regarding these
two representations in learning about amino acids. The results showed that some
students enjoyed manipulating AR models by rotating the markers to observe
different orientations of the virtual objects [10].
Construct3D [9] is a three-dimensional geometric construction tool specifically
designed for mathematics and geometry education. In order to support various
teacher-student interaction scenarios, flexible methods were implemented for context
and user dependent rendering of parts of the construction. Together with hybrid
hardware setups they allowed the use of Construct3D in classrooms and provided a
test bed for future evaluations. Construct3D is easy to learn, encourages
experimentation with geometric constructions, and improves spatial skills [9].
The wide range of AR educational applications also extends to physics. Duarte et al.
[11] use AR to dynamically present information associated with the changing scenery
in the real world. In this case, the authors perform an experiment in the
field of physics to display information that varies in time, such as velocity and
acceleration, which can be estimated and displayed in real time.
The visualization of real and estimated data during the experiment, along with the
use of AR techniques, proved to be quite efficient, since the experiments could be
more detailed and interesting, thus promoting the cognitive mechanisms of learning.
Promoting collaborative behaviors is crucial in the kindergarten educational
context. Therefore, we briefly analyze approaches that use technology as a way to
achieve higher levels of collaboration in the classroom.
Children communicate and learn through play and exploration [16]. Through social
interaction and imitating one another, children acquire new skills and learn to
collaborate with others. This is also true when children work with computers.
Using traditional mouse-based computers, and even taking into consideration that
two or more children may collaborate verbally, only one child at a time has control
of the computer. The recognition that group work around a single display is desirable
has led to the development of software and hardware that is designed specifically to
support this. The effect of giving each user an input device, even if only one could be
active at a time, was then examined, and significant learning improvements were
found [17].
Stewart et al. [18] observed that children with access to multiple input devices
seemed to enjoy an enhanced experience, with the researchers observing increased
incidences of student-student interaction and student-teacher interaction as well as
changing the character of the collaborative interaction. The children also seemed to
enjoy their experience more, compared with earlier observations of them using similar
software on standard systems.
There are also studies about the design of user interfaces for collaboration between
children [14]. Some results present systems which effectively supported collaboration
and interactivity that children enjoyed, and were engaged in the play [14].
Kannetis and Potamianos [13] investigated how fantasy, curiosity, and
challenge contribute to the user experience in multimodal dialogue computer games
for preschool children, which is particularly relevant for our research. They found
that fantasy and curiosity are correlated with children's entertainment, while the level
of difficulty seems to depend on each child's individual preferences and capabilities
[13]. One issue we took into account when designing our AR game for kindergarten
was that preschoolers become more engaged when multimodal interfaces are speech
enabled and contain curiosity elements. We specifically introduced this element in our
design, and confirmed the results described in [13].

3 An Augmented Reality Tangible Interface for Kindergarten


As with any game, the solution space dimension was very high, so we collaboratively
designed the game with kindergarten teachers, focusing on a biodiversity theme, using
traditional book-based activities as a starting point.
The developed system was based on a wooden board containing nine divisions
where children can freely place the game’s pieces. The pieces are essentially based on
augmented reality markers. Several (experienced) kindergarten teachers provided us
with a learning objective and actively participated in the entire game's design. For
instance, they listed a series of requirements that any game or educational tool should
comply with when dealing with kindergarten children, who can be aged from 3 to 5
years old and therefore have different teaching and caring needs compared with
older children or other types of users. Among the most important requirements were:
• Promote respectful collaborative behaviors like giving turns to friends, pointing out
mistakes and offering corrections;
• Promote learning of the given subject;
• Promote a constructivist approach, where children learn by doing and by constructing
solutions;
• The previous requirement also implied that the physical material of the tangible
interface had to be resistant and adequate for manipulation by a group of children.
In our case, the learning objective was the study of animals and the environments
they live in (sea, rivers, land and air). Each division of the game board contains a
printed image of a given environment.
Given the manipulative nature of such a game, the game's pieces had to be made
from a special material particularly suited for children: flexible but robust. Each of
the game's pieces displays a 3D animal that can be manipulated, as in a regular
augmented reality setting. The board also contains a fixed camera, which processes
the real-time video information. Figure 1 illustrates the overall setting of the
system, which can be connected to any kind of computer and display. In the figure,
we show the system connected to a laptop, but during classroom evaluation we used a
projector, to facilitate collaborative learning.
The goal of the game is to place all the markers (game board pieces representing
animals) in the correct slot of the board. We only give feedback about the correctness of
the placement of pieces in the end, when the player places a special marker that is used
for that purpose, i.e. a “show me the results” marker. Two different versions of the game
were developed, to assess the impact of the feedback’s immediacy on the children’s
levels of collaboration: a version where feedback can be freely given at any time
(whenever children place the special marker to see the results, as shown in Figure 2); and
a version where feedback is only given at the end of the game, i.e. when all the pieces
have been placed in the board (again, by placing the special marker).
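A minimal sketch of this checking rule is given below, assuming a simplified representation in which a placement is just a board slot index paired with a piece; the type names, the sample data and the nine-slot layout are our own illustration of the game logic described above, not the actual implementation.

#include <array>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification: a placement is "board slot index -> piece".
enum class Environment { Sea, River, Land, Air };

struct Piece {
    std::string animal;     // e.g. "dolphin"
    Environment habitat;    // environment the animal actually lives in
};

// Nine board divisions, each printed with one environment image.
using Board = std::array<Environment, 9>;

// Evaluate the current placements; in the "immediate feedback" version this
// runs whenever the special "show me the results" marker is detected, in the
// other version only once all nine pieces are on the board.
std::vector<std::pair<std::string, bool>> evaluateBoard(
        const Board& board, const std::map<int, Piece>& placements) {
    std::vector<std::pair<std::string, bool>> results;
    for (const auto& entry : placements) {
        bool correct = board[entry.first] == entry.second.habitat;  // green or red outline
        results.push_back({entry.second.animal, correct});
    }
    return results;
}

int main() {
    Board board = {Environment::Sea, Environment::River, Environment::Land,
                   Environment::Air, Environment::Sea, Environment::River,
                   Environment::Land, Environment::Air, Environment::Sea};
    std::map<int, Piece> placements = {{0, {"dolphin", Environment::Sea}},
                                       {3, {"eagle", Environment::Land}}};
    for (const auto& r : evaluateBoard(board, placements))
        std::printf("%s: %s\n", r.first.c_str(), r.second ? "correct" : "wrong");
    return 0;
}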

Fig. 1. The developed system, when used in a LCD display configuration

Figure 2 shows a screenshot of what children see displayed on the screen. The
markers display 3D animals, which can be freely manipulated. The animals that are
correctly placed have a green outline; incorrectly placed animals show a red outline.
Following the teachers' suggestions, we also added audio feedback, with pre-recorded
sentences like "That's not right, try it again!" This encouraged the children, especially
when positive reinforcement was given in the form of an applause sound.
The game also features a detailed logging mechanism with all actions recorded
with timestamps. This was developed as an aid to evaluating the effects on
collaboration levels. The system logs the completion times of each game, the number
of incorrectly placed markers, the number of feedback requests (which can be
considered the number of attempts to reach a solution), and other variables.
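A minimal sketch of such a logger is shown below; the file name, the CSV layout and the event names are illustrative assumptions, not the format actually used by the system.

#include <chrono>
#include <fstream>
#include <string>

// Minimal sketch of a timestamped action log.
class GameLogger {
public:
    explicit GameLogger(const std::string& path) : out(path, std::ios::app) {}

    void log(const std::string& event, const std::string& detail = "") {
        using namespace std::chrono;
        auto ms = duration_cast<milliseconds>(
                      system_clock::now().time_since_epoch()).count();
        out << ms << ',' << event << ',' << detail << '\n';  // timestamp,event,detail
    }

private:
    std::ofstream out;
};

int main() {
    GameLogger logger("game_log.csv");
    logger.log("game_started");
    logger.log("piece_placed", "slot=3;animal=eagle;correct=0");
    logger.log("feedback_request");   // counted as one attempt to reach a solution
    logger.log("game_completed", "duration_s=214;wrong_placements=5");
    return 0;
}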

Fig. 2. The game’s screen, showing feedback as a red or green border around the animals

4 Discussion
The results obtained so far indicate that using our augmented reality system is a
positive step towards the goal of reducing the distance between children and
knowledge, by learning through play.
The system has a very positive impact on whole-class collaboration. This is
much harder than it seems, since kindergarten children have very short attention spans.
They get distracted very often, and they have trouble collaborating in an orderly
manner. An important contribution of this paper, in terms of design issues that
promote collaboration, is the importance of providing immediate feedback in virtual
reality games such as the one we have developed. It is crucial that designers targeting
kindergarten children are capable of exploiting the innate curiosity of these young users
in order to achieve good levels of collaborative interaction.
Motivation, enjoyment and curiosity are important ingredients for any kind of
educational game, but they are even more important when it comes to kindergarten
user interfaces. Interaction with tangible board pieces (the AR markers) may be well
suited to very young children because of their physicality, but this alone may not be
sufficient to achieve good levels of motivation and collaboration.

5 Conclusions
Augmented reality technology and tangible interfaces are well accepted by today's
kindergarten children and by their teachers as well. Large projection screens and a
good blend of the physical game pieces with their virtual counterparts can prove
effective in increasing motivation and collaboration levels among children. On the
learning side, we also observed that by playing the game the children's number of
wrong answers decreased, which suggests the game could help kindergarten children
learn simple concepts.
Since kindergarten children lose the focus of their attention frequently, especially
with a game, we feared that the game could harm the learning process. These results
suggest that the game did not harm that process, since the next day's
posttest results showed a positive improvement. According to the teachers' feedback,
the game looks like a promising way to complement traditional teaching methods.
Regarding motivation, we observed high levels of motivation while children played
the game: most children were clearly motivated and, for example, never gave up the
game until they found the solution. Curiosity was another driving factor towards
motivation. Children wanted to see all the 3D animals, but for that to happen they had
to wait until all markers were placed. In terms of maintaining motivation, this was a
crucial design issue.
This research focused on promoting collaboration. We analyzed several
variables such as the number of collaborative comments made by children, the number
of constructive collaborative corrections made by children (including pointing gestures)
and the number of attempts made until reaching a solution. Results suggest that
immediate feedback played an important role, increasing the number of collaborative
behaviors and interactions among kindergarten children.
We also studied the impact of display size, but the results showed that the differences
were not significant, although by observation, and also according to the teachers'
feedback, the larger display seemed to promote collaboration better than the
smaller display. Future work should consist of expanding the experiment in order to
better assess the role played by display size in collaboration levels. Future work
will also include more tests in different schools, as well as investigating other
features and design issues that could positively influence collaboration in kindergarten.

References
1. Mayo, M.J.: Games for science and engineering education. Communications of the
ACM 50(7), 30–35 (2007)
2. Gibson, J.P.: A noughts and crosses Java applet to teach programming to primary school
children. In: Proceedings of the 2nd International Conference on Principles and Practice of
Programming in Java, PPPJ, vol. 42, pp. 85–88. Computer Science Press, New York
(2003)
3. Kaufmann, H., Schmalstieg, D.: Mathematics and geometry education with collaborative
augmented reality. In: ACM SIGGRAPH 2002 Conference Abstracts and Applications, pp.
37–41. ACM, New York (2002)
4. Medicherla, P.S., Chang, G., Morreale, P.: Visualization for increased understanding and
learning using augmented reality. In: Proceedings of the International Conference on
Multimedia Information Retrieval, MIR 2010, pp. 441–444. ACM, New York (2010)
5. Bellotti, F., Berta, R., Gloria, A.D., Primavera, L.: Enhancing the educational value of
video games. Computers in Entertainment 7(2), 1–18 (2009)
6. Shelton, B., Hedley, N.: Using Augmented Reality for Teaching Earth-Sun Relationships
to Undergraduate Geography Students. In: The First IEEE International Augmented
Reality Toolkit Workshop, Darmstadt, Germany (September 2002), IEEE Catalog
Number: 02EX632 ISBN: 0-7803-7680-3
7. Papert, S.: The Connected Family: Bridging the Digital Generation Gap. Longstreet Press,
Atlanta (1996)
8. Sharlin, E., Watson, B., Kitamura, Y., Kishino, F., Itoh, Y.: On tangible user interfaces,
humans and spatiality. Personal Ubiquitous Computing 8(5), 338–346 (2004)
9. Tettegah, S., Taylor, K., Whang, E., Meistninkas, S., Chamot, R.: Can virtual reality
simulations be used as a research tool to study empathy, problems solving and perspective
taking of educators?: theory, method and application. International Conference on
Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2006 Educators
Program, Article No. 35 (2006)
10. Schrier, K.: Using augmented reality games to teach 21st century skills. In: International
Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH 2006
Educators Program (2006)
11. Duarte, M., Cardoso, A., Lamounier Jr., E.: Using Augmented Reality for Teaching
Physics. In: WRA 2005 - II Workshop on Augmented Reality, pp. 1–4 (2005)
12. Kerawalla, L., Luckin, R., Seljeflot, S., Woolard, A.: Making it real: exploring the
potential of augmented reality for teaching primary school science. Virtual Reality 10(3-4),
163–174 (2006)
13. Kannetis, T., Potamianos, A.: Towards adapting fantasy, curiosity and challenge in
multimodal dialogue systems for preschoolers. In: Proceedings of the 2009 International
Conference on Multimodal Interfaces, ICMI-MLMI 2009, pp. 39–46. ACM, New York
(2009)
14. Africano, D., Berg, S., Lindbergh, K., Lundholm, P., Nilbrink, F., Persson, A.: Designing
tangible interfaces for children’s collaboration. In: CHI 2004 Extended Abstracts on
Human Factors in Computing Systems, CHI 2004, pp. 853–868. ACM, New York (2004)
15. Brosterman, N.: Inventing Kindergarten. Harry N. Adams Inc. (1997)
16. Sutton-Smith, B.: Toys as culture. Gardner Press, New York (1986)
17. Inkpen, K.M., Booth, K.S., Klawe, M., McGrenere, J.: The Effect of Turn-Taking
Protocols on Children’s Learning in Mouse- Driven Collaborative Environments. In:
Proceedings of Graphics Interface (GI 97), pp. 138–145. Canadian Information Processing
Society (1997)
18. Stewart, J., Raybourn, E.M., Bederson, B., Druin, A.: When two hands are better than one:
Enhancing collaboration using single display groupware. In: Proceedings of Extended
Abstracts of Human Factors in Computing Systems, CHI 1998 (1998)
19. Hsieh, M.-C., Lee, J.-S.: AR Marker Capacity Increasing for Kindergarten English
Learning. National University of Tainan, Hong Kong (2008)
20. Self-Reference (2008)
lMAR: Highly Parallel Architecture for
Markerless Augmented Reality in Aircraft
Maintenance

Andrea Caponio, Mauricio Hincapié, and Eduardo González Mendivil

Instituto Tecnológico y de Estudios Superiores de Monterrey,
Ave. Eugenio Garza Sada 2501 Sur Col. Tecnológico C.P. 64849 — Monterrey, Nuevo León, Mexico
andrea.caponio@yahoo.com, maurhin@gmail.com, egm@itesm.mx

Abstract. A novel architecture for real-time marker-less augmented reality is
introduced. The proposed framework consists of several steps: first, the image
taken from a video feed is analyzed and corner points are extracted, labeled,
filtered and tracked along subsequent pictures. Then an object recognition
algorithm is executed and objects in the scene are recognized. Eventually, the
position and pose of the objects are given. The processing steps rely only on
state-of-the-art image processing algorithms and on smart analysis of their
output. To guarantee real-time performance, the use of modern highly parallel
graphics processing units is anticipated and the architecture is designed to
exploit heavy parallelization.

Keywords: Augmented Reality, Parallel Computing, CUDA, Image


Processing, Object Recognition, Machine Vision.

1 Introduction
In recent times augmented reality (AR) systems have been developed for sev-
eral applications and several fields. In order to augment user’s experience, AR
systems blend image of actual objects, coming for instance from a camera video
feed, with virtual objects which offer new important information. Therefore, AR
systems need to recognize some object in a real scene: this is normally done
by placing a particular marker on those specific objects. Markers are easy to
recognize and AR systems based on this method are already widely used, as
shown in section 2. However marker based systems are invasive, rigid and time
consuming. To overcome these difficulties, marker-less AR has been proposed:
avoiding markers leads to a much more effective AR experience but, on the other
hand, requires the implementation of several image processing or sensor fusion
techniques, resulting in more complex algorithms and in higher computational
demands that risk to compromise user’s experience.
In this article we present the design of lMAR (library for Marker-less Aug-
mented Reality), a parallel architecture for marker-less AR, whose purpose is to
provide developers with a software tool able to recognize one or more specific ob-
jects in a video feed and to calculate their pose and position with respect to the
camera reference frame. To counterbalance algorithm complexity, the lMAR design
fully exploits parallel computing, now available at low cost thanks to modern
CPUs and GPUs. This way the proposed system will be able to use very complex
and computationally intensive algorithms for image processing, while still delivering
real-time performance and avoiding low frame rates and video stuttering.
This article is structured as follows: Section 2 presents state-of-the-art AR solutions,
along with the most important application fields. Section 3 presents the proposed
architecture in detail. Section 4 describes how lMAR guarantees real-time
performance. Section 5 closes the article, offering some conclusions and detailing
future work.

2 Related Work
AR has become really popular in the last 20 years and is currently used in many
fields such as training, product development, maintenance, medicine and multimedia.
In AR systems it is quite common to use printed markers to successfully
blend actual reality with virtual information. In fact, algorithms based on this
kind of set-up have been used for many years and are not computationally
demanding, so they can deliver a satisfying AR experience to the final user. On
the other hand, even if marker based AR applications have proved to be practical
and deliver good performance, the presence of markers can be problematic in
several situations: e.g., when we have to deal with objects of different sizes, when
the markers must be positioned in locations that are difficult to access, or when we
have to work with unfriendly environmental conditions. Moreover, maintenance
and training are among the principal research topics nowadays, as there is a clear
interest from industry in developing working applications, opening the opportunity
for a global establishment of AR as a tool for speeding up the maintenance of complex
systems and the training of complex procedures.

2.1 Marker Based AR Solutions


In [6], Kim and Dey propose an AR based solution for training purposes: a
video see-through AR interface is integrated into three prototype 3D applications
regarding engineering systems, geospace, and multimedia. Two sample cases
making use of marker tags are presented: (a) an AR-interfaced 3D CAE
(Computer-Aided Engineering) simulation test-bed, and (b) a haptically-
enhanced broadcasting test-bed for AR-based 3D media production. In the 3D
CAE simulation a marker is used to display a model and the interaction with
the model is done by means of keyboard and markers, as both trigger certain
activities.
In [11] Uva et al. integrate AR technology in a product development process
using real technical drawings as a tangible interface for design review. The pro-
posed framework, called ADRON (Augmented Design Review Over Network),
provides augmented technical drawings, interactive FEM simulation, multi-modal
annotation and chat tools, web content integration and collaborative client/server
architecture. Technical drawings are printed along with hexadecimal markers
which allow the system to display information such as 3D models and FEM analyses.
The authors' framework is meant to use common hardware instead of expensive and
complex virtual or augmented facilities, and the interface is designed specifically
for users with little or no augmented reality expertise.
Haritos and Macchiarella in [3] apply AR to training for maintenance in the
aeronautical field by developing a mobile augmented reality system which makes
use of markers applied to different parts of the aircraft in order to help technicians
with the task of inspecting the propeller mounting bolts and safety wire for signs
of looseness on Cessna 172S airplanes.

2.2 Marker-less AR Solutions


Paloc et al. develop in [10] a marker-less AR system for enhanced visualization of
the liver involving minimal annoyance for both the surgeon and the patient. The
ultimate application of the system is to assist the surgeon in oncological liver
surgery. The Computer Aided Surgery (CAS) platform consists of two function
blocks: a medical image analysis tool used in the preoperative stage, and an
AR system providing real time enhanced visualization of the patient and its
internal anatomy. In the operating theater, the AR system merges the resulting
3D anatomical representation onto the surgeon’s view of the real patient. Medical
image analysis software is applied to the automatic segmentation of the liver
parenchyma in axial MRI volumes of several abdominal datasets. The three-
dimensional liver representations resulting from the above segmentations were
used to perform in house testing of the proposed AR system. The virtual liver
was successfully aligned to the reflective markers and displayed accurately on
the auto-stereoscopic monitor.
Another project involving the marker-less approach is the Archeoguide by
Vlahakis et al. [13]. The Archeoguide system provides access to a huge amount
of information in cultural heritage sites in a compelling and user-friendly way,
through the development of a system based on advanced IT techniques which in-
cludes augmented reality, 3D visualization, mobile computing, and multi-modal
interaction. Users are provided with a see-through Head-Mounted Display
(HMD), earphones and mobile computing equipment.
Henderson and Feiner designed, implemented and tested a prototype of an
augmented reality application to support military mechanics conducting routine
maintenance tasks inside an armored vehicle turret [5]. Researchers created a
marker-less application for maintenance processes and designed the hardware
configuration and components to guarantee good performance of the application.
The purpose of the project was to create a totally immersive application to
both improve maintenance time and diminish the risk of injury, due to highly
repetitive procedures.

3 lMAR: Overview of the Proposed Solution


In the previous sections we have underlined how the presence of markers can
seriously hamper the integration of AR in several fields. This is particularly true
in maintenance, where we need to identify several objects in particularly difficult
environments. In fact, as said before, markers cannot be used when the size range
of the objects we want to identify is really wide, as when we have to recognize
both big and small objects, when the absolute size of the objects to identify is
too small or too big, or when it is simply not possible to properly set up the
scene with the needed tags. In these scenarios a marker-less AR approach is
more advisable.
While marker based AR systems rely on the presence of tags for object
identification, marker-less AR depends on modern computer vision techniques
which are usually computationally demanding, thus risking delivering a stuttering
and inaccurate AR experience. In order to minimize this risk we designed
lMAR, a software architecture meant to execute object recognition with real-time
performance even in complex situations.

3.1 Working Principles


The purpose of lMAR is to provide developers of marker-less AR with software tools
for recognizing several specific objects present in a scene. The main idea is that
we need to analyze a camera feed to find out which objects are present and in
which specific pose and position they appear. The objects do not need to lie on
a specific plane, nor do they need to satisfy specific conditions such as planarity.
However, objects should show enough specific points to allow their recognition,
so extremely flat monochromatic objects or highly reflective objects are not
considered at the moment.
lMAR is conceived to recognize objects only after a training phase. Once
trained, lMAR functions will be able to analyze a video feed and return the
number of recognized objects and, for each of them, an identification and a
homography matrix [4] representing the object's pose and scale. We can then
distinguish two main functioning modes: a training mode and a working mode.
During training mode the system learns, one by one, all the objects it will need
to recognize; in working mode lMAR functions analyze a video feed to identify
objects of interest and output their position and pose with respect to the camera
frame.

3.2 Training Mode


In order to successfully recognize an object, a marker-less AR software must,
first of all, learn what this object looks like. Fig. 1 shows how lMAR performs
this step: first, an image I of the object obj is given to the system. The image
is processed by a feature point extraction algorithm (FEA) and the list X of the
object's feature points is used to populate a database which associates X with the
unique object name obj. As an object's appearance can change dramatically with its
position, the training stage should process several images of the same object, so
that it can be seen and recognized from several perspectives. The database of
objects is created by repeating this procedure with all the objects we want to
recognize.

Fig. 1. Block diagram of lMAR training mode

It is worth pointing out that the training stage does not need to be fast, as
it can also be done off-line using recorded videos or static images. Thus the
algorithms used at this stage do not need to be particularly fast, and more attention
can be given to accurately populating the database.
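The following sketch illustrates this training step. OpenCV's SIFT detector stands in for the FEA (the paper ultimately selects DSIFT from VLFeat, see Section 3.3) and the image file names and object name are hypothetical; it is an illustration of the database-population idea, not lMAR's actual code.

#include <map>
#include <string>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>

// Object database: unique object name -> descriptors of its feature points X,
// accumulated over several training views (off-line, so speed is not critical).
using ObjectDB = std::map<std::string, cv::Mat>;

// Add one training image I of object `obj` to the database.
void trainObject(ObjectDB& db, const std::string& obj, const cv::Mat& image) {
    auto fea = cv::SIFT::create();                 // stand-in for the FEA
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    fea->detectAndCompute(image, cv::noArray(), keypoints, descriptors);
    db[obj].push_back(descriptors);                // append rows: one descriptor per feature point
}

int main() {
    ObjectDB db;
    // Hypothetical training views of one object seen from several perspectives.
    for (const auto& path : {"wrench_view1.png", "wrench_view2.png"}) {
        cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
        if (!img.empty()) trainObject(db, "wrench", img);
    }
    return 0;
}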

3.3 Working Mode

The working mode of lMAR is shown in Fig. 2, where we can distinguish three main
stages: an Image Processing Stage, a Data Analysis Stage and an Output Generation
Stage. The variable names in Fig. 2 are described in Table 1. Blocks of different
colors are independent from each other and can be executed in parallel. On the
contrary, blocks of the same color must be executed in sequence. This strategy
is suggested in [7], and allows a multi-threaded approach which helps to speed up
the AR application.

Fig. 2. Block diagram of lMAR working mode

Image Processing Stage identifies feature points in the current view and
computes the relative motion between subsequent frames of the video feed. This is
done by means of two different algorithms: a feature extraction and a feature
tracking algorithm (FEA and FTA).
The FEA is needed to analyze the current scene and locate particular
points, called corners, which are known to be invariant to several geometric
transformations. Some of the detected points belong to objects of interest, and by
analyzing them we will eventually recognize the objects.
The FEA is fundamental for good performance of the AR system: it must be
accurate and point out good features in the current scene to allow their successful
tracking. On the other hand it must be fast, allowing a high frame rate for the
whole application. In the past many algorithms have been developed for feature
Table 1. Legend of variables in figure 2

Variable Name   Variable Meaning
I(ts)           Input image from the video feed when FEA is run.
I(tof)          Input image from the video feed when FTA is run.
I(tof − 1)      Image previously processed by the FTA.
Xs              Vector of feature points identified by FEA.
Xof             Vector of feature points identified by FTA.
Vof             Vector of velocities of points identified by FTA.
Xls             Vector of feature points identified by FEA, after labeling.
Xlof            Vector of feature points identified by FTA, after labeling.
DB              Database of objects from the previous training.
X               Vector of filtered feature points.
OBJ             List of objects recognized in the scene.
OBJ(t − 1)      List of objects recognized in the scene at the previous iteration.
Xobj            Vector of feature points belonging to recognized objects.
Xobj(t − 1)     Vector of feature points belonging to recognized objects at the previous iteration.
H               Homography matrices representing the pose of each identified object.
P               Matrix indicating the position in the scene of each identified object.

extractions and corner detection; the most promising among them are the SIFT
algorithm by Lowe [8], the SURF algorithm by Bay et al. [2] and the more recent
DSIFT algorithm by Vedaldi and Fulkerson [12]. To compare the performance
of these algorithms, we ran a preliminary study which is summarized in Table 2.
In this study, the three algorithms were evaluated by checking the quality and
number of matches found among images from the Oxford Affine Covariant Regions
Dataset [9]. Each algorithm received a score between 1 and 3 for several
transformations. Finally, a value was assigned to execution speed. A brief look at
Table 2 shows that, while all three algorithms perform well in every situation, SIFT
outperforms the others. However, SIFT is also the slowest algorithm and would not
guarantee a high execution rate. DSIFT, on the other hand, offers nearly as good
performance while running at a considerably higher rate, and therefore qualifies as
the best possible FEA.

Table 2. Preliminary comparison between DSIFT, SIFT, and SURF algorithms

Criterion               DSIFT  SIFT  SURF
Affine Transformation   2      3     1
Blurring                3      3     3
Compression Artifacts   3      3     3
Rotation                3      3     3
Zoom                    3      3     2
Speed of Execution      3      1     2

However, no matter which algorithm we choose, FEA will not be fast enough
to guarantee an extremely reactive AR system. To improve overall performance
we introduce a FTA, whose purpose is to track the features in the scene as the
image changes in time. This approach was first proposed in [7], where the Opti-
cal Flow (OF) algorithm was used. OF is a common algorithm for analyzing two
subsequent pictures of a same video and calculate the overall displacement be-
tween them. OF output can be used to extrapolate overall image movement, thus
considerably simplifies matching of feature points between subsequent frames of
the video feed.
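As an illustration of the FTA step, the sketch below tracks a point list between two frames with OpenCV's pyramidal Lucas-Kanade optical flow; the function and variable names loosely mirror the notation of Table 1, but the code is our own simplification, not part of lMAR.

#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>

// One FTA step: track feature points from the previous frame I(tof-1) into the
// current frame I(tof) with pyramidal Lucas-Kanade optical flow, and derive a
// per-point velocity vector Vof. Points that fail to track are dropped.
void trackFeatures(const cv::Mat& prevGray, const cv::Mat& currGray,
                   std::vector<cv::Point2f>& points,      // in: Xof(t-1), out: Xof(t)
                   std::vector<cv::Point2f>& velocities,  // out: Vof (pixels per frame)
                   float dt = 1.0f) {
    velocities.clear();
    if (points.empty()) return;                 // nothing to track yet

    std::vector<cv::Point2f> next;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, points, next, status, err);

    std::vector<cv::Point2f> kept;
    for (size_t i = 0; i < points.size(); ++i) {
        if (!status[i]) continue;               // lost track of this corner
        kept.push_back(next[i]);
        velocities.push_back((next[i] - points[i]) / dt);
    }
    points = kept;
}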

Data Analysis Stage is needed to process the information given by image
processing. With reference to Fig. 2, the Label Features operation is needed to
find corresponding points between the FEA and FTA outputs, so that points given
by the two algorithms can be associated. After this, a Filter Features operation
takes place in order to choose the more robust feature points. The main idea is that
when Xls and Xlof are received as input, the filter compares them, giving more
importance to those points which confirm each other. Moreover, the input points are
compared with those which were previously recognized as object points: objects
that were pictured in the previous frame are likely to still be there. This
step generates X, a list of feature points which is likely to be more stable
than Xls and Xlof alone. Finally, the Find Objects block finds good matches
between points seen in the image and objects present in DB. Starting from the
list of filtered feature points, X, and taking into consideration the list of objects
recognized at the previous iteration, OBJ(t − 1), the algorithm searches for
the best matches between groups of points and the object database. Eventually,
the list of recognized objects OBJ and the list of feature points belonging to
them, Xobj, are given as outputs.
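One possible shape of the Find Objects block is sketched below using OpenCV descriptor matching with Lowe's ratio test; the matcher choice and the thresholds are illustrative assumptions, not the matching strategy actually selected for lMAR.

#include <map>
#include <string>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>

// Match the descriptors of the filtered scene points X against every object in
// DB and keep objects with enough good matches (ratio test, thresholds illustrative).
std::vector<std::string> findObjects(const cv::Mat& sceneDescriptors,
                                     const std::map<std::string, cv::Mat>& db,
                                     int minGoodMatches = 15) {
    std::vector<std::string> recognized;
    if (sceneDescriptors.empty()) return recognized;

    cv::BFMatcher matcher(cv::NORM_L2);
    for (const auto& entry : db) {
        std::vector<std::vector<cv::DMatch>> knn;
        matcher.knnMatch(sceneDescriptors, entry.second, knn, 2);
        int good = 0;
        for (const auto& m : knn)
            if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance) ++good;
        if (good >= minGoodMatches) recognized.push_back(entry.first);
    }
    return recognized;
}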

Output Generation Stage consists just of the Calculate Homographies
and Position block, which is the last one in the process and calculates, from the lists
OBJ and Xobj, the pose and position of each object with respect to the current
camera frame. This information is expressed by a matrix of homographies H.
Outputting H and OBJ allows the rest of the AR system to understand which
objects are present in the scene, where they are and how they are positioned.
The AR system will use this information to augment the current scene, e.g.
drawing 3D models of recognized objects on a screen, superimposing the virtual
objects on the real ones.
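The homography estimation itself can be sketched as follows with OpenCV's RANSAC-based cv::findHomography; this illustrates the output-generation idea with simplified inputs and is not lMAR's implementation.

#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

// Estimate the homography H mapping an object's reference (training) feature
// points to the matched points Xobj observed in the current frame; RANSAC
// rejects outlier correspondences.
cv::Mat estimatePose(const std::vector<cv::Point2f>& referencePts,
                     const std::vector<cv::Point2f>& observedPts) {
    if (referencePts.size() < 4 || referencePts.size() != observedPts.size())
        return cv::Mat();               // a homography needs at least 4 correspondences
    return cv::findHomography(referencePts, observedPts, cv::RANSAC, 3.0);
}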

4 Strategies to Improve Performances


We have already stressed how important it is for AR systems to run smoothly
and to provide users with a stutter-less experience. The lMAR design takes this need
into account in several ways. First of all, it puts the FEA and the FTA side by side
as redundant algorithms: this way, when the FEA performs poorly and cannot
recognize enough feature points, or when it performs too slowly, the FTA can be used
to extrapolate the objects' positions. Since the FTA is much faster than the FEA,
lMAR is guaranteed to run at an overall higher frame rate than the one it would be
constrained to by the FEA alone.
As a second speed-up strategy, we designed lMAR as a multi-threaded
solution, as suggested in [7]. Therefore, all operations independent from each
other can be run in parallel, as different threads. This is clearly shown in Fig.
2, where different colors represent different threads. In particular we can notice
that the FEA and the FTA are independent from each other and from the rest of
the recognition steps. In fact, while the data analysis stage processes data coming
from the image processing stage, both the FEA and the FTA keep working as
different threads, providing new data for the next iterations.
A third way to improve performance regards the actual implementation of
the FEA, which will be done by means of the DSIFT algorithm [12]. As shown
in Table 2, DSIFT is an excellent compromise between the quality of the feature
extraction process and speed of execution.
As a fourth and final strategy to speed up the AR system, lMAR is designed
to fully exploit parallel computing, now available at low cost thanks to modern
GPUs. More specifically, the lMAR implementation will be done through the CUDA
parallel computing architecture [1], which delivers the performance of NVIDIA's
highly parallel graphics processor technology to general purpose GPU computing,
allowing us to reach dramatic speedups in the proposed application. To
take advantage of modern GPU hardware capabilities, all lMAR functions are
designed as parallel functions and will be implemented through CUDA.
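The thread layout of the second strategy can be sketched as follows with standard C++ threads; the loop bodies and timings are placeholders (the real threads would run DSIFT, optical flow and the data-analysis chain, with the heavy per-frame work offloaded to the GPU through CUDA).

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// The slow FEA and the fast FTA run as independent threads, each publishing its
// latest point list; the data-analysis stage (here, the main thread) consumes
// whatever is newest. Real synchronization would be richer than a single mutex.
struct Point2D { float x, y; };

struct SharedState {
    std::mutex mtx;
    std::vector<Point2D> Xs;     // latest FEA output
    std::vector<Point2D> Xof;    // latest FTA output
    std::atomic<bool> running{true};
};

int main() {
    SharedState s;

    std::thread fea([&] {                        // feature extraction thread (slow)
        while (s.running) {
            std::vector<Point2D> pts = /* run FEA on the newest frame */ {};
            { std::lock_guard<std::mutex> lk(s.mtx); s.Xs = pts; }
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    });

    std::thread fta([&] {                        // feature tracking thread (fast)
        while (s.running) {
            std::vector<Point2D> pts = /* run optical flow on the newest frame */ {};
            { std::lock_guard<std::mutex> lk(s.mtx); s.Xof = pts; }
            std::this_thread::sleep_for(std::chrono::milliseconds(15));
        }
    });

    for (int frame = 0; frame < 100; ++frame) {  // data analysis + output stage
        std::vector<Point2D> xs, xof;
        { std::lock_guard<std::mutex> lk(s.mtx); xs = s.Xs; xof = s.Xof; }
        // label, filter, find objects, compute homographies (omitted)
        std::this_thread::sleep_for(std::chrono::milliseconds(33));
    }

    s.running = false;
    fea.join();
    fta.join();
    return 0;
}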

5 Conclusions
Nowadays AR is becoming increasingly important, especially in training and
maintenance. Many AR systems make use of special markers and tags to set up
the virtual environment and to recognize real world objects. This makes AR
systems useless or difficult to set up in many situations. To overcome this
difficulty, marker-less AR systems have been proposed and researchers have lately
dedicated a great amount of resources to their development. However, to the authors'
knowledge, no marker-less AR system is yet able to recognize several objects in
the same scene while relying only on the analysis of the video feed of the scene.
lMAR was designed to fill this gap and to provide a software instrument for
marker-less AR system development.
In this article the general design of lMAR was presented. lMAR has been
conceived to offer state-of-the-art image processing and object recognition algorithms
and, thanks to its highly parallel implementation, it will exploit the most recent
hardware advances in GPUs, guaranteeing real-time, stutter-less performance.
This will allow developers to offer an extremely satisfying AR experience,
particularly for maintenance and training applications, where many different objects
must be recognized.
In the future, the described system and the required algorithms will be
developed as a C++/CUDA library, thus providing developers with a high-performance
tool for realizing marker-less AR software. After this step, we will use the lMAR
library to build a marker-less AR environment to support training and maintenance
in the aeronautical field.

Acknowledgments. The authors would like to thank A.DI.S.U. Puglia for the
financial support of Dr. Andrea Caponio, according to regional council resolution
n. 2288/2009.

References
1. NVIDIA CUDA Compute Unified Device Architecture - Programming Guide
(2010), http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/
docs/CUDA_C_Programming_Guide.pdf
2. Bay, H., Esse, A., Tuytelaars, T., Gool, L.V.: Surf: Speeded up robust features. In:
9th European Conference on Computer Vision (May 2006)
3. Haritos, T., Macchiarella, N.: A mobile application of augmented reality for
aerospace maintenance training. In: The 24th Digital Avionics Systems Confer-
ence, DASC 2005 (2005)
4. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd
edn. Cambridge University Press, Cambridge (2004); ISBN: 0521540518
5. Henderson, S., Feiner, S.: Evaluating the benefits of augmented reality for task
localization in maintenance of an armored personnel carrier turret. In: 8th IEEE
International Symposium on Mixed and Augmented Reality, ISMAR 2009, pp.
135–144 (2009)
6. Kim, S., Dey, A.K.: Ar interfacing with prototype 3d applications based on user-
centered interactivity. Comput. Aided Des. 42, 373–386 (2010)
7. Lee, T., Hollerer, T.: Multithreaded hybrid feature tracking for markerless aug-
mented reality. IEEE Transactions on Visualization and Computer Graphics 15(3),
355–368 (2009)
8. Lowe, D.: Object recognition from local scale-invariant features. In: The Proceed-
ings of the Seventh IEEE International Conference on Computer Vision (1999)
9. Oxford Visual Geometry Research Group: Oxford affine covariant regions dataset,
http://www.robots.ox.ac.uk/~vgg/data/data-aff.html
10. Paloc, C., Carrasco, E., Macia, I., Gomez, R., Barandiaran, I., Jimenez, J., Rueda,
O., Ortiz de Urbina, J., Valdivieso, A., Sakas, G.: Computer-aided surgery based
on auto-stereoscopic augmented reality. In: Proceedings of Eighth International
Conference on Information Visualisation, IV 2004, pp. 189–193 (2004)
11. Uva, A.E., Cristiano, S., Fiorentino, M., Monno, G.: Distributed design review
using tangible augmented technical drawings. Comput. Aided Des. 42, 364–372
(2010)
12. Vedaldi, A., Fulkerson, B.: VLFeat: An open and portable library of computer
vision algorithms (2008), http://www.vlfeat.org/
13. Vlahakis, V., Ioannidis, M., Karigiannis, J., Tsotros, M., Gounaris, M., Stricker,
D., Gleue, T., Daehne, P., Almeida, L.: Archeoguide: an augmented reality guide
for archaeological sites. IEEE Computer Graphics and Applications 22(5), 52–60
(2002)
5-Finger Exoskeleton for Assembly Training in
Augmented Reality

Siam Charoenseang and Sarut Panjan

Institute of Field Robotics, King Mongkut's University of Technology Thonburi,
126 Pracha-u-thit, Bangmod, Tungkru, Bangkok, Thailand 10140
siam@fibo.kmutt.ac.th, sa_panjan@hotmail.com

Abstract. This paper proposes an augmented reality based exoskeleton for
virtual object assembly training. The proposed hand exoskeleton consists of 9
DOF joints which can provide force feedback to all 5 fingers at the same time.
The device has the ability to simulate the shape, size, and weight of virtual objects.
In this augmented reality system, the user can assemble virtual objects in a real
workspace which is superimposed with computer graphics information. During
virtual object assembly training, the user receives force feedback which is
synchronized with the physics simulation. Since the proposed system provides
both visual and kinesthetic senses, it will help users to improve their assembly
skills effectively.

Keywords: Exoskeleton Device, Augmented Reality, Force Feedback.

1 Introduction

In general, object assembly training requires several resources such as materials,
equipment, and trainers. Simulation is one training solution which can save
costs, time, and damage incurred during training. However, most simulators do
not provide sufficient realism and sensory feedback. Hence, this paper proposes an
augmented reality based exoskeleton for virtual object assembly training. This system
can provide more realism and senses such as vision and haptics during operation.
Objects for the assembly task are simulated in the form of computer graphics
superimposed on the real environment. Furthermore, the system provides force
feedback while the trainee assembles virtual objects.
Force feedback technology can generally be categorized into 2 styles: wearable
and non-wearable. Wearable force feedback devices are usually in the form of hand,
arm, and whole body exoskeletons. Non-wearable force feedback devices are usually
in the form of force feedback styluses, joysticks, and small robot arms. The Immersion
CyberGrasp mounts a force feedback device and a 3D tracking device on the
Immersion CyberGlove [1]. It uses cables to transfer power from the motors to
the exoskeleton device. This device is lightweight and its motors are mounted
separately on its base. Koyama, T. proposed a hand exoskeleton for generating force
feedback [2]. This device uses passive actuators (clutches) to simulate
smooth force feedback. Bouzit, M. implemented small active pneumatic actuators
for generating force feedback in a hand exoskeleton [3]. This exoskeleton has a
small size and light weight. Ganesh Sankaranarayanan and Suzanne Weghorst
proposed an augmented reality system with force feedback for teaching chemistry
and molecular biology [4]. This system simulates the geometry and flexibility of
organic compounds and then uses a Phantom haptic device to create force feedback.
Matt Adcock, Matthew Hutchins, and Chris Gunn used augmented reality with
force feedback for designing, advising, and surveying among users [5]. This device
uses pneumatic actuators to create force feedback.
None of the previously described non-wearable haptic devices, which are combined
with augmented reality to generate force feedback to the user, can simulate force
feedback at each joint of the hand. Hence, this paper proposes an augmented reality
system using a wearable hand exoskeleton to generate force feedback to the user
during a virtual assembly task.

2 System Overview

Figure 1 shows the configuration of the proposed system. The system consists of a
hand exoskeleton device which is used to generate force feedback to the user. The
exoskeleton also sends finger angles to, and receives braking angles from, the main
computer. Markers are used to track the positions and orientations of the virtual
objects and the user's hand. A video camera is used to capture video images of the
real environment. The camera is mounted on the LCD Glasses Display, which is used
to show the graphics from the user's point of view. Graphics are updated by the
physics engine using the Bullet software library [6].

(Diagram blocks: User, Video Camera, LCD Glasses Display, Main Computer, Exoskeleton, Marker)

Fig. 1. System Overview

3 System Components
The system consists of two main components: hardware and software. The
hardware includes an exoskeleton device with its controller, an LCD
Glasses Display, a video camera, force sensors, and markers. The software includes
the graphics manager and the vision manager.

(Diagram panels: Hardware, Software)


Fig. 2. System Components

As shown in Figure 2, the system receives video images from the video camera,
performs image processing to find the targets' positions and orientations, and
generates computer graphics superimposed on the video image. It also sends force
feedback, in the form of braking angles, to all motors on the exoskeleton device.

3.1 Exoskeleton Device

The exoskeleton device is used to generate force feedback to the user. It receives
commands from the main computer through the exoskeleton controller for controlling
its motors. The controller also receives sensed forces from the strain gages for
adjusting the tension of the cables.
In typical object manipulation by hand, finger no. 1 can rotate about the X and Z
axes, while fingers no. 2-5 can rotate only about the Z axis. The last joint of fingers
no. 2-5 cannot be controlled independently: its rotation depends on the rotation of the
previous joint. Hence, the mechanical structure of the proposed exoskeleton device is
designed so that the exoskeletons of fingers no. 2-5 can generate 2-DOF force feedback
at the first joint and at the fingertip. To simplify the mechanical structure of the
finger no. 1 exoskeleton, it generates only 1-DOF force feedback at the fingertip.
The computer graphics of virtual finger no. 1 and fingers no. 2-5 are designed with
2 DOFs and 3 DOFs, respectively, as shown in Figure 3. In addition, the movements
of the virtual fingers are updated to correspond to the real fingers. The physics
engine uses forward kinematics, Equation 1, to calculate the position and orientation
of each finger from the D-H parameters shown in Tables 1 and 2 [7]. Since the last
joints of all fingers always move relative to the middle joints, the inverse kinematics
can be calculated by reducing the 2-DOF configuration of finger no. 1 to a 1-DOF
configuration and the 3-DOF configuration of fingers no. 2-5 to a 2-DOF
configuration, as shown in Figure 4.

Fig. 3. Frames and axes of hand

Table 1. Finger no.1's DH-parameters    Table 2. Finger no.2-5's DH-parameters
(The tables list the Denavit-Hartenberg link parameters α, a, d and θ for each joint of finger no. 1 and of fingers no. 2-5, respectively.)
{}^{i-1}_{\;i}T = \begin{bmatrix} \cos\theta_i & -\sin\theta_i & 0 & a_{i-1} \\ \sin\theta_i\cos\alpha_{i-1} & \cos\theta_i\cos\alpha_{i-1} & -\sin\alpha_{i-1} & -d_i\sin\alpha_{i-1} \\ \sin\theta_i\sin\alpha_{i-1} & \cos\theta_i\sin\alpha_{i-1} & \cos\alpha_{i-1} & d_i\cos\alpha_{i-1} \\ 0 & 0 & 0 & 1 \end{bmatrix}    (1)

Fig. 4. Plane geometry associated with a finger

Equation 2 is used to calculate the distance from the fingertip to the base. Equation 3
is used to calculate the rotation angle of the fingertip with respect to the first joint.
The first angle, between the base joint and the middle joint, can be obtained using
Equations 4-5. Equation 6 is used to find the second angle, between the middle joint
and the fingertip. Inverse kinematics is used to calculate the braking angles when a
collision occurs.

M = \sqrt{x^2 + y^2}    (2)

\alpha_1 = \tan^{-1}(y / x)    (3)

l_a = \sqrt{l_2^2 + l_3^2}    (4)

\theta_1 = \alpha_1 - \cos^{-1}\left[\frac{l_a^2 - l_1^2 - M^2}{-2\,l_1 M}\right]    (5)

\theta_2 = 180^\circ - \cos^{-1}\left[\frac{M^2 - l_a^2 - l_1^2}{-2\,l_1 l_a}\right]    (6)

A strain gage is mounted on each joint of the exoskeleton as shown in Figure 5-a. The
strain gages measure the force acting on each joint of the exoskeleton. Nine digital
servo motors in the exoskeleton device transfer force to the user by adjusting the cable
tension. Each digital servo motor, with a maximum torque of 8 kg·cm, is installed on
its own base separately from the exoskeleton device as shown in Figure 6. The
overview of the exoskeleton system is shown in Figure 7.











Fig. 5. (a) Strain gages mounted on the exoskeleton (b) Close-up view


Fig. 6. Servo motors on exoskeleton’s base

Fig. 7. Overview of exoskeleton system

The exoskeleton controller's MCU, an STM32 microcontroller based on the ARM
Cortex-M3 core, receives 12-bit A/D data from each strain gage and controls all
motors. It communicates with the computer via a serial port at 115,200 bps and
interfaces with the motors via rx/tx pins. The main control loop is programmed to
receive the force data from the 9 strain gages and the braking angles for the motors
from the main computer. If the value of a strain gage is less than zero, the
corresponding motor pulls the cable to adjust its tension. If the value of a strain gage
is greater than zero and the motor angle is less than the braking angle, the motor
releases the cable. If the motor angle is greater than the braking angle, the motor
holds its position. The exoskeleton controller also returns the motor angles to the
main computer for updating the graphics.
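The per-motor control rule described above can be summarized as follows. This is only an illustrative C++ sketch of the rule (the actual firmware runs on the STM32 MCU and is not shown in the paper); the type and function names are placeholders.

```cpp
#include <array>

enum class CableCommand { Pull, Release, Hold };

// One control step for the nine motors, following the rule in the text:
// negative gage value -> pull to keep the cable taut; positive value while the
// motor is below its braking angle -> release; otherwise hold to resist motion.
std::array<CableCommand, 9> controlStep(const std::array<double, 9>& gageForce,
                                        const std::array<double, 9>& motorAngle,
                                        const std::array<double, 9>& brakingAngle) {
    std::array<CableCommand, 9> cmd{};
    for (std::size_t i = 0; i < cmd.size(); ++i) {
        if (gageForce[i] < 0.0)
            cmd[i] = CableCommand::Pull;
        else if (motorAngle[i] < brakingAngle[i])
            cmd[i] = CableCommand::Release;
        else
            cmd[i] = CableCommand::Hold;   // generates the force feedback
    }
    return cmd;
}
```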

3.2 Vision Manager

A Logitech 2 MP Portable Webcam C905 [8] is used to capture the video image and
send it to the main computer. The camera is mounted on the LCD Glasses Display in
order to align the camera's view with the user's view. The video capture resolution is
640x480 pixels and the graphics refresh rate is 30 frames per second, as shown in
Figure 8. The vision manager applies the ARToolKit software library [9] to locate
markers and sends each marker's position and orientation to the graphics manager.
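As a rough illustration of how a vision manager can use ARToolKit, the sketch below shows the library's classic detection and pose-estimation calls (arDetectMarker and arGetTransMat). It assumes the camera parameters and the marker pattern have already been initialized elsewhere (e.g., with arLoadPatt); the threshold value and variable names are placeholders, and this is not the paper's actual code.

```cpp
#include <AR/ar.h>

// Sketch: locate one known marker in a captured frame and return its 3x4 pose.
// 'pattId' is a pattern id previously loaded with arLoadPatt(); 'image' is the
// camera frame in the pixel format configured for ARToolKit.
bool locateMarker(ARUint8* image, int pattId, double pattWidthMm, double trans[3][4]) {
    ARMarkerInfo* markerInfo = 0;
    int markerNum = 0;
    const int threshold = 100;                    // binarization threshold (placeholder)
    double center[2] = {0.0, 0.0};

    if (arDetectMarker(image, threshold, &markerInfo, &markerNum) < 0)
        return false;                             // detection failed

    int best = -1;
    for (int i = 0; i < markerNum; ++i) {         // keep the most confident match
        if (markerInfo[i].id == pattId &&
            (best < 0 || markerInfo[i].cf > markerInfo[best].cf))
            best = i;
    }
    if (best < 0)
        return false;                             // marker not visible in this frame

    arGetTransMat(&markerInfo[best], center, pattWidthMm, trans);
    return true;                                  // 'trans' maps marker to camera coords
}
```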

Fig. 8. Video Display

3.3 Graphics Manager

The graphics manager is responsible for rendering the virtual objects and the virtual
hand on a marker using OpenGL, as shown in Figure 9 (a) and (b). The Bullet physics
engine included in the graphics manager detects collisions and calculates the reaction
forces resulting from the virtual hand's manipulation. The virtual hand is a
VRML-based model with separate link models. The angle of each finger read from
the exoskeleton device is sent to the graphics manager via serial communication. The
position and orientation of each finger model are calculated with the forward
kinematics explained in Section 3.1 and are used to update the virtual hand's position
and orientation in the physics simulation.

[Figure labels: Hole, Peg, Hand]

Fig. 9. (a) Virtual objects in physics simulation (b) Virtual hand in physics simulation

4 Experimental Results

4.1 Sensor Data Map

This experiment explores the relationship between the applied force and the A/D data.
First, the strain gages are fixed on a piece of clear acrylic. Forces in the range of
0-80 N are then applied to the tip of the acrylic as shown in Figure 5-b. The
experimental results of force and corresponding A/D data are plotted in Figure 10.


Fig. 10. Data mapping between force and A/D data

In Figure 10, the horizontal axis represents the force applied to the strain gage and the
vertical axis represents the A/D data read from the exoskeleton controller. The results
show that the strain gage output varies linearly with the applied force.
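Because the response is linear, the raw A/D counts can be mapped to forces with an ordinary least-squares line fit. The sketch below only illustrates this idea; the calibration pairs in main() are made-up numbers, not the measured data.

```cpp
#include <cstdio>
#include <vector>

// Fit force = a * adc + b by ordinary least squares from calibration pairs.
struct LinearMap { double a, b; double toForce(double adc) const { return a * adc + b; } };

LinearMap fitCalibration(const std::vector<double>& adc, const std::vector<double>& forceN) {
    double n = static_cast<double>(adc.size()), sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < adc.size(); ++i) {
        sx += adc[i]; sy += forceN[i];
        sxx += adc[i] * adc[i]; sxy += adc[i] * forceN[i];
    }
    double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    double b = (sy - a * sx) / n;
    return {a, b};
}

int main() {
    // Hypothetical calibration pairs (A/D counts, applied force in newtons).
    std::vector<double> adc   = {120, 480, 950, 1400, 1880};
    std::vector<double> force = {0,   20,  40,  60,   80};
    LinearMap m = fitCalibration(adc, force);
    std::printf("force(1000 counts) = %.1f N\n", m.toForce(1000));
    return 0;
}
```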

4.2 Maximum Force Feedback

This experiment explores the maximum force feedback provided by the exoskeleton
device. First, the user wears the exoskeleton and performs grasping while the motors
are set to hold their original positions. The exoskeleton controller then queries the
maximum force from the strain gages.


Fig. 11. Maximum force feedback from motors

In Figure 11, the horizontal axis represents the motor IDs and the vertical axis
represents the force exerted on each joint. The results show that the exoskeleton
device can generate maximum force feedback of up to 50 N.

4.3 Virtual Assembly Task

This experiment tests the virtual object assembly task. In this experiment, the user is
allowed to use the proposed exoskeleton device to manipulate virtual objects in the
real environment. The goal of this virtual assembly task is to put virtual pegs into
holes with force feedback. All virtual objects with physics simulation are augmented
on the real markers as shown in Figure 12-a. The user receives force feedback while
he/she manipulates the virtual objects as shown in Figure 12-b.

(a) Before grasping virtual object (b) Grasping virtual object (c) Virtual object in the hole

Fig. 12. Virtual assembly task with force feedback

Figure 12-c shows one virtual peg assembled into the hole. This operation can be
applied to train the user in more complex assembly tasks with augmented information.
Furthermore, the graphics refresh rate is about 25 frames per second.

5 Conclusions and Future Works


This research proposed an augmented reality system with force feedback for virtual
object assembly tasks. An exoskeleton device was designed and built to generate
9-DOF force feedback to the user's hand. It can generate maximum forces of up to
5 N for each finger. Virtual objects in the physics simulation can be superimposed on
the tracked real markers. The graphics refresh rate is about 25 frames per second.
Several kinds of assembly training can be conducted with the proposed system. In the
training, the user can use the hand exoskeleton to manipulate virtual objects with
force feedback in the real environment. This provides more realism and improves
training performance.
Future work on this research will cover virtual soft object manipulation, an
enhanced graphical user interface, and a markerless augmented reality implementation.

Acknowledgments. This research work is financially supported by the National


Science and Technology Development Agency, Thailand.

References
1. Zhou, Z., Wan, H., Gao, S., Peng, Q.: A realistic force rendering algorithm for CyberGrasp,
p. 6. IEEE, Los Alamitos (2006)
2. Koyama, T., Yamano, I., Takemura, K., Maeno, T.: Multi-fingered exoskeleton haptic
device using passive force feedback for dexterous teleoperation. 3, 2905–2910 (2002)
3. Monroy, M., Oyarzabal, M., Ferre, M., Campos, A., Barrio, J.: MasterFinger: Multi-finger
Haptic Interface for Collaborative Environments. Haptics: Perception, Devices and
Scenarios, 411–419 (2008)
4. Sankaranarayanan, G., Weghorst, S., Sanner, M., Gillet, A., Olson, A.: Role of haptics in
teaching structural molecular biology (2003)
5. Adcock, M., Hutchins, M., Gunn, C.: Augmented reality haptics: Using ARToolKit for
display of haptic applications, pp. 1–2. IEEE, Los Alamitos (2004)
6. Coumans, E.: Bullet 2.76 Physics SDK Manual (2010),
http://www.bulletphysics.com
7. Craig, J.J.: Introduction to robotics: mechanics and control (1986)
8. Logitech Portable Webcam C905, http://www.logitech.com/en-us/webcam-
communications/webcams/devices/6600
9. ARToolKit Library (2002),
http://www.hitl.washington.edu/artoolkit/download/
Remote Context Monitoring of Actions and Behaviors in
a Location through 3D Visualization in Real-Time

John Conomikes1, Zachary Pacheco1, Salvador Barrera2, Juan Antonio Cantu2,


Lucy Beatriz Gomez2, Christian de los Reyes2, Juan Manuel Mendez-Villarreal2
Takeo Shime3, Yuki Kamiya3, Hedeki Kawai3,
Kazuo Kunieda3, and Keiji Yamada3
1
Carnegie Mellon University, Entertainment Technology Center (ETC),
800 Technology Drive, Pittsburgh, PA, 15219, USA
2
Universidad de Monterrey (UDEM), Engineering and Technology Division,
Av. Morones Prieto 4500 Pte. San Pedro Garza Garcia, C.P. 66238, N.L. Mexico
3
NEC C&C Innovation Research Laboratories, 8916-47,
Takayama-Cho, Ikoma, Nara 630-0101, Japan
{JohnConomikes,zakpacheco}@gmail.com,
{sbarrea1,jcantaya,lgomez20,xpiotiav,jmndezvi}@udem.net,
t-shime@ce.jp.nec.com, y-kamiya@fn.jp.nec.com,
h-kawai@ab.jp.nec.com, k-kunieda@ak.jp.nec.com,
kg-yamada@cp.jp.nec.com

Abstract. The goal of this project is to take huge amounts of data, not parseable
by a single person, and present it in an interactive 3D recreation of the
events that the sensors detected, using a 3D rendering engine known as
Panda3D. "Remote Context Monitoring of Actions and Behavior in a Location
Through the Usage of 3D Visualization in Real-time" is a software application
designed to read large amounts of data from a database and use that data to
recreate the context in which the events occurred, to improve understanding of the
data.

Keywords: 3D, Visualization, Remote, Monitoring, Panda3D, Real-Time.

1 Introduction
This prototype is the result of a long project developed at the Entertainment
Technology Center, where work was done in conjunction with NEC and the
Universidad de Monterrey.
While there is a lot of work in this field, one of the unique angles of this project is
the type of data it is designed to build the recreation from. This data comes from NEC's
LifeLog system, which tracks a wide variety of detailed information on what each
employee in the monitored space does daily, on a second-to-second basis.
Additionally, the data can be viewed from anywhere in the world, not just the
monitored laboratory.

R. Shumaker (Ed.): Virtual and Mixed Reality, Part I, HCII 2011, LNCS 6773, pp. 40–44, 2011.
© Springer-Verlag Berlin Heidelberg 2011

Fig. 1. Initial 3D shaded model for the Southern laboratory

2 Methodology
One of the requirements for this project is the ability to view the current state of the
office, i.e., to keep up with the sensor data in real time.
Due to the large amount of data that must be parsed, a rolling parsing system had
to be implemented in which only a portion of the data is parsed and applied each
frame, rather than all of it within a single frame every second. This is done because
the number of frames per second must be kept above 20 in order to maintain a smooth
appearance, which leaves only about 50 ms of parsing time per frame, minus the
overhead of rendering the 3D environment.
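The rolling-parsing idea, spending at most a fixed time slice of each frame on parsing and carrying the remainder over to later frames, is independent of the rendering engine. The actual system is written in Python on Panda3D; the C++ sketch below only illustrates the budgeting technique, and the per-record parser is a hypothetical placeholder.

```cpp
#include <chrono>
#include <deque>
#include <string>

// Parse queued sensor records for at most 'budget' per frame so the frame
// rate stays above ~20 fps; unparsed records roll over to the next frame.
template <typename ParseFn>
void parseWithBudget(std::deque<std::string>& pending, ParseFn parseOneRecord,
                     std::chrono::milliseconds budget = std::chrono::milliseconds(15)) {
    const auto start = std::chrono::steady_clock::now();
    while (!pending.empty() &&
           std::chrono::steady_clock::now() - start < budget) {
        parseOneRecord(pending.front());   // hypothetical per-record parser
        pending.pop_front();
    }
    // Anything still in 'pending' is handled on a later frame.
}
```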

Fig. 2. Initial UI design



As the sensors only report data at most once per second, this approach keeps the
displayed data real-time without sacrificing frame rate. Originally we considered using
threading to help alleviate this problem; however, the 3D rendering engine used
(Panda3D) has very limited inherent support for threading, so this was not possible.
Another problem that was tackled was the user interface, as the people using
this tool may not be advanced computer users and there is a large amount of data
available to analyze.
We went through a large number of different designs (see Figure 2 above for an
example of one of the earlier user interface designs) before settling on the latest
one, which combines ease of use (similar to Office 2007 [1] style tabbed buttons) with
a large amount of freedom to show and hide data as needed.
See Figure 3 below for the final user interface design of the software.

Fig. 3. Final UI design

3 System Architecture
Our entire system is built on NEC's LifeLog system which is responsible for gathering
the large amount of data that is needed for the software to operate. See Figure 4
below for a view of the ceiling with installed sensors.

Fig. 4. Ceiling of the South Laboratory with installed sensors

Employee location is detected through the use of IR emitters on employees and


receivers mounted on the ceiling, though approximately 25% of all location data is
"8022" which is the code for a person who is not detected by any IR receiver on the
premises.

Ambient sound level data is collected by over 90 microphones installed in the


ceiling. There are also over 30 cameras (like the one shown in Figure 5 below) in
place on the ceiling to provide up to 10 images per second.

Fig. 5. Close up of one of the many cameras installed in the ceiling

All e-mails sent to or from monitored employees are also stored, though
addressees that are not monitored are recorded only as “Company Employee" or
"Recipient Outside Company".
Additionally, extensive information is pulled from the computer operations of each
monitored employee: statistics such as key presses, mouse clicks and mouse
movements in the past second, the currently active process running on the computer,
the most recently accessed file, and even all of the processes currently running in the
background. The system also logs all of the employee's internet access, though this
last piece of information can be disabled by the employee.
Finally, each employee carries a wireless button that records when it was pressed
and, if pressed for more than one second, also reports the duration of the press.
Also, while not related to people, 16 RFID readers are used to track the location of
resources (e.g., books, laptops) that have RFID tags on them as they move around
the office. The system also tracks which employee is using each particular resource.
The flow of information is quite simple: the LifeLog system polls the sensors for
their latest information, timestamps it, outputs it to a simplified YAML [2] format and
stores it on a server. Our program then connects to the server, requests the files
required for the time span the user wishes to view, loads the needed information into
memory as Python data structures and displays the recreated events to the user.
Due to security restrictions at NEC, the data is only accessible locally or through a
Virtual Private Network (VPN) connection. However, since the only remote action
that is being performed with the software is reading data from the server, with less
strict security measures, the software can function anywhere without the need for any
special access permissions.

4 Experimental Results
In testing the software it was found that starting it up takes approximately one
minute per hour of data the user wishes to view. This is because the user needs to be
able to jump to any point in the data, and the only way this could be done seamlessly
while playing the data back is to load all of the needed data up front. After this load
time, however, the user can easily jump to any point in time within the loaded data,
in addition to being able to view the most recent data. The load time can be reduced
by direct, local access to the server, or lengthened by a slow internet connection.

5 Comments and Conclusion


While the system uses a large concentration of sensors in a small area and is
generally very invasive, it also offers many promising opportunities for future
research to improve both the technology and the software. Although it is not ready for
industry yet, with the inclusion of other research and further improvement of the
current software this seems to be a promising technology and may prove to be the
next big step in combining multiple different information-gathering technologies.

References

[1] Ebara, Y., Watashiba, Y., Koyamada, K., Sakai, K., Doi, A.: Remote Visualization Using
Resource Monitoring Technique for Volume Rendering of Large Datasets. In: 2004
Symposium on Applications and the Internet (SAINT 2004), p. 309 (2004)
[2] Hibbard, B.: Visad: connecting people to computations and people to people. SIGGRAPH
Computer Graphics 32(3), 10–12 (1998)
Spatial Clearance Verification Using
3D Laser Range Scanner and Augmented Reality

Hirotake Ishii1, Shuhei Aoyama1, Yoshihito Ono1, Weida Yan1,


Hiroshi Shimoda1, and Masanori Izumi2
1
Graduate School of Energy Science, Kyoto University,
Yoshida Monmachi, Sakyo-ku, Kyoto-shi, 606-8501 Kyoto, Japan
2
Fugen Decommissioning Engineering Center, Japan Atomic Energy Agency,
Myojin-cho, Tsuruga-shi, 914-8510 Fukui, Japan
{hirotake,aoyama,ono,yanweida,shimoda}@ei.energy.kyoto-u.ac.jp,
izumi.masanori@jaea.go.jp

Abstract. A spatial clearance verification system for supporting nuclear power


plant dismantling work was developed and evaluated by a subjective evaluation.
The system employs a three-dimensional laser range scanner to obtain three-
dimensional surface models of work environment and dismantling targets. The
system also employs Augmented Reality to allow field workers to perform
simulation of transportation and temporal placement of dismantling targets
using the obtained models to verify spatial clearance in actual work
environments. The developed system was evaluated by field workers. The
results show that the system is acceptable and useful to confirm that
dismantling targets can be transported through narrow passages and can be
placed in limited temporal workspaces. It was also found that the extension of
the system is desirable to make it possible for multiple workers to use the
system simultaneously to share the image of the dismantling work.

Keywords: Augmented Reality, Laser Range Scanner, Nuclear Power Plants,


Decommissioning, Spatial Clearance Verification.

1 Introduction
After the service period of a nuclear power plant terminates, the nuclear power plant
must be decommissioned. Because some parts of a nuclear power plant remain
radioactive, the procedure of its decommissioning differs from that of general
industrial plants. Each part of the nuclear power plant must be dismantled one by one
by following a dismantling plan made in advance. In some cases, it is desirable to
dismantle large plant components into small pieces at a location different from their
original one; the components are removed from their bases and transported to
appropriate workspaces. However, nuclear power plants are not designed to be easily
dismantled. Passages are very narrow and workspace is not large enough. Large
components may collide with passages and workspace during transportation and
placement. Moreover, dismantled components need to be stored at a temporal space

R. Shumaker (Ed.): Virtual and Mixed Reality, Part I, HCII 2011, LNCS 6773, pp. 45–54, 2011.
© Springer-Verlag Berlin Heidelberg 2011

for a certain period before they are transported out of the plant, because their
radioactivity level must be checked. The space for this temporal storage is also not
large enough. Therefore, it is necessary to verify, before performing the dismantling
work, that the dismantled components can be transported through the narrow passages
and placed in the limited space. But the verification is not easy because there are
various components in nuclear power plants and their shapes differ considerably.
In this study, to make it easy for field workers to perform the verification, a spatial
clearance verification system was developed and evaluated by a subjective evaluation.
The system employs a three dimensional (3D) laser range scanner to obtain 3D
surface point clouds of work environment and dismantling targets, and then builds
polygon models. Augmented Reality (AR) technology is also employed to allow field
workers to perform transportation and temporal placement simulation intuitively
using the obtained models to verify spatial clearance between the work environment
and the dismantling targets in actual work environments. The developed system was
used along with a scenario by field workers who are working for dismantling a
nuclear power plant and an interview and questionnaire survey were conducted to
confirm whether the system is effective or not, how acceptable the system is, or what
problems arise in practical use.

2 Related Work

Various studies have been conducted to apply AR to maintenance tasks in nuclear


power plants [1]. In [2], a mobile AR system is investigated as an alternative to paper-
based systems to retrieve maintenance procedure from online servers. In [3], a mobile
AR system to support maintenance task of a power distribution panel is proposed. The
authors have proposed some AR systems to support workers in nuclear power plants
[4][5][6]. In [4], an AR support system for water system isolation task is proposed
and evaluated. In [5], AR technology is used to support field workers to refer cutting
line of dismantling target and record the work progress. In [6], field workers are
supported to make a plan of preparation for dismantling work by deciding how to
layout scaffolding and greenhouses. In this study, the authors focus on a spatial
clearance verification task as a new support target in which real time interaction
between virtual objects and real environment need to be realized.

3 Spatial Clearance Verification System

3.1 Basic Design

The most crucial requirement for spatial clearance verification is to make it possible to
perform the verification using accurate 3D models of the work environment and the
dismantling targets. The 3D models are used to detect collisions between the work
environment and the dismantling targets. One possible way to obtain the 3D models is
to use the existing CAD data made when the plant was designed. But such CAD data
usually include only large components and have not been updated since they were
made; they do not represent the current status of the plant properly. Therefore, the
authors decided to employ a 3D laser range scanner to make 3D models of the work
environment and the dismantling targets.
Concerning the interface for performing the verification, one possible approach is to
develop a GUI application with which users can manipulate 3D models in a virtual
environment. But such an interface may be difficult to use because it is necessary to
indicate the 3D position and orientation of the dismantling targets. Moreover, it is
difficult to obtain a concrete image of the spatial relation between the work
environment and the dismantling targets. In this study, therefore, the authors aimed at
developing an AR-based application that can be used in the actual work environment.
The transportation path and layout of the dismantling target can be investigated
intuitively by manipulating real objects, and the users can confirm in an intuitive way
which parts of the work environment and the dismantling targets collide with each
other.
The whole system can be divided into two subsystems: the Modeling Subsystem
and the Verification Subsystem.

3.2 Modeling Subsystem

The Modeling Subsystem is used to build 3D surface polygon models of the work
environment and the dismantling targets. These models are used to detect collisions
while the Verification Subsystem is in use. The accuracy of the models does not need
to be on the order of millimeters, but it should be better than on the order of meters.
Since it is not clear exactly how accurate the models should be, the authors tried to
keep the total cost of the system reasonably low and then to make the models as
accurate as possible with the available hardware. Further study is necessary to reveal
the required accuracy of the models used for the spatial verification.
The Modeling Subsystem consists of a laser range scanner, a motion base and a
color camera to obtain 3D point clouds of the work environment and the dismantling
targets, and software to make 3D polygon models from the obtained point clouds, as
shown in Figure 1. The hardware specifications are shown in Table 1. The laser range
scanner employed in this study is a line scanner and can obtain 3D positions of the
surrounding environment within a 2D plane. Therefore, the scanner is mounted on a
motion base, which rotates the scanner to obtain point clouds of the whole
surrounding environment. The color camera is used to capture visual images. The
position and orientation of the camera when the images are captured are also
recorded.
The obtained point clouds are expressed in a local coordinate system whose origin is
the intersection of the rotational axes of the motion base at the time of measurement.
But the point clouds need to be expressed in a world coordinate system when they are
used for the spatial verification. In this study, the authors employed the camera
tracking technique proposed in [7]. Multiple markers are pasted in the work
environment and their positions and orientations in the world coordinate system are
measured in advance. By capturing these markers with the color camera, the position
and orientation of the camera are estimated, and the obtained point clouds are then
transformed into the world coordinate system.

[Figure components: Laser Range Scanner, Color Camera, Motion Base, Interface for making polygon models]

Fig. 1. Configuration of Modeling Subsystem

Table 1. Hardware specifications for Modeling Subsystem

Laser range scanner: Vendor: SICK Inc.; Model: LMS100-10000; Scan angle: 270 deg.; Angular res.: 0.25 deg.; Max. error: 40 mm
Motion base: Vendor: FLIR Systems Inc.; Model: PTU-D46-70; Angular res.: 0.013 deg.
Camera: Vendor: Point Grey Research Inc.; Model: CMLN-13S2C-CS; Resolution: 1280×960; Focal length: 4.15 mm

Another problem is that a single point cloud does not include enough points of the
work environment and the dismantling targets; only one side of them can be measured
at once. It is necessary to use the scanner at multiple positions to obtain the whole
surface of the work environment and the dismantling targets, and the obtained point
clouds then need to be combined into one point cloud. One possible solution is to use
the camera tracking again. If the camera can capture the markers at all measuring
positions, the point clouds can be combined without any additional operation because
all of them are already expressed in the world coordinate system. But in some cases it
is difficult to capture the markers. In this study, the authors tried to use the ICP
(Iterative Closest Point) algorithm to transform one point cloud so that it matches
another point cloud that has already been transformed into the world coordinate
system. In our case, however, the ICP algorithm cannot be used directly because the
point clouds obtained in nuclear power plants include much noise and two point
clouds do not always have a large enough part of the environment in common.
Therefore, a GUI application was developed to set an initial transform of the target
point cloud by hand, and then the two point clouds are combined with the following
algorithm; a code sketch of this procedure is given after the list. (It is assumed that
Cloud1 is already transformed into the world coordinate system. The goal is to
transform Cloud2 into the world coordinate system.)
Step 1. Smooth Cloud2 to remove random measurement error.
Step 2. Place a sphere with a radius of 200 cm at a random position inside Cloud2 and clip the
points that are inside the sphere.
Step 3. Perform the ICP algorithm to adjust the clipped points to Cloud1 and obtain its
transformation matrix.
Step 4. Apply the transformation matrix to all the points of Cloud2.
Step 5. Count the number of points of Cloud2 whose distance from the nearest point of
Cloud1 is less than 5 cm.
Step 6. Repeat Step 2 to Step 5 ten times and choose the transformation matrix for which
the number of points in Step 5 is largest.
Step 7. Apply the chosen transformation matrix to all the points of Cloud2.
Step 8. Repeat Step 2 to Step 7 until the number of points in Step 5 no longer increases.
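The sketch below paraphrases Steps 2-8 as a scored, randomized refinement loop. It is an illustration only, not the authors' implementation: the point and matrix types are simplified, Step 1 (smoothing) is assumed to have been done already, and the icp, applyTransform, clipRandomSphere and countClose routines are assumed to be supplied externally.

```cpp
#include <functional>
#include <vector>

struct Point3 { double x, y, z; };
struct Transform { double m[4][4]; };        // simplified 4x4 homogeneous matrix
using Cloud = std::vector<Point3>;

// Sketch of Steps 2-8; the callable parameters are placeholders, not routines
// from the paper: icp aligns a clip to Cloud1, applyTransform transforms a
// cloud, clipRandomSphere clips a randomly placed sphere, countClose counts
// points within a distance of the other cloud.
Cloud alignCloud2ToCloud1(
    const Cloud& cloud1, Cloud cloud2,
    std::function<Transform(const Cloud&, const Cloud&)> icp,
    std::function<Cloud(const Transform&, const Cloud&)> applyTransform,
    std::function<Cloud(const Cloud&, double /*radius cm*/)> clipRandomSphere,
    std::function<int(const Cloud&, const Cloud&, double /*dist cm*/)> countClose) {
    int bestOverall = -1;
    for (;;) {                                            // Step 8: repeat until no gain
        Transform bestT{}; int bestScore = -1;
        for (int trial = 0; trial < 10; ++trial) {        // Step 6: ten random trials
            Cloud clipped = clipRandomSphere(cloud2, 200.0);   // Step 2
            Transform t = icp(clipped, cloud1);                // Step 3
            Cloud moved = applyTransform(t, cloud2);           // Step 4
            int score = countClose(moved, cloud1, 5.0);        // Step 5
            if (score > bestScore) { bestScore = score; bestT = t; }
        }
        if (bestScore <= bestOverall) break;              // Step 8: stop when no increase
        bestOverall = bestScore;
        cloud2 = applyTransform(bestT, cloud2);           // Step 7: keep the best transform
    }
    return cloud2;   // Cloud2 now expressed (approximately) in Cloud1's coordinate system
}
```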

After applying the above algorithm, an area that contains the necessary points is set
by hand. Then the clipped point cloud is converted into a polygon model with the
quadric clustering algorithm [8].
Concerning the polygon model of a dismantling target, it is necessary to add a
texture to increase its visibility. In this study, the texture is automatically generated
from the images captured while obtaining the point clouds of the dismantling target.
Figure 2 shows example polygon models made with the Modeling Subsystem.

[Figure labels: Work environment (partially extracted for better visibility); Dismantling target (with texture / without texture)]

Fig. 2. Polygon models obtained with Modeling Subsystem

3.3 Verification Subsystem

The Verification Subsystem is used to conduct simulations of transportation and


placement of dismantling targets in actual work environments intuitively using
Augmented Reality technology. The most significant feature of the Verification
Subsystem is a function to detect collisions between virtual dismantling targets and
real work environment.
Figure 3 shows a conceptual image of the Verification Subsystem. The system
consists of a marker cube, a tablet PC, a camera and environmental markers. The
marker cube is used to indicate the 3D position and orientation of a virtual dismantling
target. The tablet PC is mounted on a tripod and a dolly, which enables users to move
the system easily. Six markers are pasted on the marker cube and are used to measure
the relative position and orientation between the marker cube and the camera. The
environmental markers pasted in the work environment are used to measure the
position and orientation of the camera relative to the work environment. For both the
marker cube and the environmental markers, the markers proposed in [7] are used.
The system is supposed to be used by two workers: a cube operator and a system
operator. When the camera captures the marker cube and the environmental markers,
the 3D model of the dismantling target made with the Modeling Subsystem is
superimposed on the camera image based on the current position and orientation of
the marker cube. When the cube operator moves the marker cube, the superimposed
model follows its movement. When the virtual dismantling target collides with the
work environment, the collided position is visualized as shown in Figure 4. The
yellow area shows the collided part of the virtual dismantling target and the red area
shows the collided part of the work environment. (In the initial state, the 3D model of
the work environment is invisible and the user sees only the camera image. When a
collision occurs, only the polygon nearest to the collided position is made visible and
its color is changed to red.)
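One common way to obtain the collided positions with the Bullet Physics Library (used by the system, as noted later in this section) is to walk the contact manifolds after a collision-detection or simulation step. The snippet below is a generic illustration of that pattern rather than the authors' code; how the returned contact points are mapped onto the yellow and red highlighting is left as an assumption.

```cpp
#include <btBulletCollisionCommon.h>
#include <vector>

// Collect world-space contact points between the virtual dismantling target and
// the rest of the scene after the collision-detection pass has been performed.
std::vector<btVector3> collectContacts(btCollisionWorld* world,
                                       const btCollisionObject* target) {
    std::vector<btVector3> points;
    btDispatcher* dispatcher = world->getDispatcher();
    for (int i = 0; i < dispatcher->getNumManifolds(); ++i) {
        btPersistentManifold* manifold = dispatcher->getManifoldByIndexInternal(i);
        const btCollisionObject* a =
            static_cast<const btCollisionObject*>(manifold->getBody0());
        const btCollisionObject* b =
            static_cast<const btCollisionObject*>(manifold->getBody1());
        if (a != target && b != target)
            continue;                                   // not a collision with the target
        for (int j = 0; j < manifold->getNumContacts(); ++j) {
            const btManifoldPoint& pt = manifold->getContactPoint(j);
            if (pt.getDistance() < 0.0f)                // actually penetrating
                points.push_back(pt.getPositionWorldOnB());
        }
    }
    return points;  // these positions can drive the yellow/red polygon highlighting
}
```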

Table 2 shows the hardware specifications used in the Verification Subsystem. To
capture wide-angle images of the work environment, a lens with a short focal length is
used. This makes it necessary to use large markers (41 cm × 41 cm) to keep the
tracking of the camera and the marker cube accurate and stable.

[Figure 3 labels: Environmental Markers, Cube Operator, Marker Cube, Camera, Tablet PC, Tripod and dolly, System Operator. Figure 4 labels: "This part collides with the dismantling target", "Superimposed image on Tablet PC", "Dismantling target model is superimposed"]

Fig. 3. Conceptual image of Verification Subsystem    Fig. 4. Visualization of collided part

Table 2. Hardware specifications for Verification Subsystem

Tablet PC: Vendor: Panasonic Corp.; Model: CF-C1AEAADR; CPU: Core i5-520M; GPU: Intel HD Graphics; Memory: 1 GB
Camera: Vendor: Point Grey Research Inc.; Model: CMLN-13S2C-CS; Resolution: 1280×960; Focal length: 3.12 mm

Fig. 5. Interface for verification

By using the marker cube, the position and orientation of the virtual dismantling
target are expected to be changed intuitively. But there may be cases in which it is
difficult to move the virtual dismantling target with the marker cube alone, for
example when the intended position is too high or a very small adjustment is
necessary. Therefore, in this study, a GUI is also implemented as shown in Figure 5.
The system operator can change the position and orientation of the virtual dismantling
target by using the buttons and can also drag the virtual dismantling target with a
stylus pen. In addition, the following functions are implemented.
1. A function to record the 3D position and orientation of the virtual dismantling
target. The superimposed image is also recorded simultaneously.
2. A function to make the virtual dismantling target invisible.
3. A function to reset all the indication of the collided part. (The color of the virtual
dismantling target is set to its original color and the model of the work
environment is made invisible.)

The application was developed on the Windows 7 operating system (Microsoft Corp.)
using Visual C++ 2008 (Microsoft Corp.). OpenGL, the Visualization Toolkit library
[9] and the Bullet Physics Library [10] were used to render the 3D models, implement
the ICP algorithm and perform collision detection, respectively.

4 Evaluation

4.1 Objective

It is expected that field workers can simulate the transportation and placement of
dismantling targets using the proposed system. However, it remains unknown how
acceptable the system is to actual field workers and what problems arise in practical
use. An evaluation experiment was conducted to answer these questions.
In this evaluation, the authors mainly focused on the Verification Subsystem,
because a pre-evaluation showed that combining multiple point clouds by hand using
the Modeling Subsystem is difficult for novice users. The Modeling Subsystem will be
improved and evaluated in future work.

4.2 Method

Before the evaluation, the experimenters pasted environmental markers and measured
their position and orientation relative to the work environment using Marker
Automatic Measurement System [11]. The experimenters demonstrated how to use
the Modeling Subsystem and the Verification Subsystem for about 10 minutes each.
Then four evaluators used the Modeling Subsystem and the Verification Subsystem
with the assumption that one plant component will be dismantled. The evaluators used
the Modeling Subsystem only to obtain point clouds and did not try to combine the
point clouds into one point cloud. The polygon models used with the Verification
Subsystem were prepared in advance by the experimenters. Each evaluator played
only a role of the system operator. The experimenter played a role of the cube
operator. After using the system, the evaluators answered questionnaire, then an
interview and a group discussion were conducted.
The dismantling target was assumed to be a water purification tank as shown in the
right hand side of Figure 3. The evaluators were asked to use the Verification
Subsystem under the assumption that the tank will be removed from its base, placed
temporarily at the near space, and then transported through a narrow passage.
Of the four evaluators, three (Evaluators A, B and C) were staff members at the Fugen
Decommissioning Engineering Center. One (Evaluator D) was a human interface
expert working at a university.

4.3 Questionnaire and Results

The questionnaire includes 36 items for system function and usability as shown in
Table 3. Evaluators answer each question as 1 – 5 (1. completely disagree; 2.
disagree; 3. fair; 4. agree; 5. completely agree). In addition, free description is added
to the end of the questionnaire. Respondents describe other problems and points to be
improved.

Each evaluator used the system for about 40 minutes. Table 3 presents the results
of the questionnaire. Table 4 presents answers of the free description, interview and
group discussion.

Table 3. Questionnaire results

Evaluator
Questionnaire
A B C D
Q1 Is it easy to set up the system? 5 4 5 5
Q2 Is it easy to remove the system? 5 4 5 5
Q3 The situation of temporal placement becomes easy to be understood by 5 4 4 5
superimposing the dismantling target over the camera view.
Q4 The situation of transportation becomes easy to be understood by 5 5 4 5
superimposing the dismantling target over the camera view.
Q5 It is easy to recognize the collided position on the dismantling target 4 2 5 4
by making the collided position yellow.
Q6 It is easy to recognize the collided position in the work environment 5 4 5 5
by making the collided position red.
Q7 It is effective to make it possible to change the position and orientation 4 2 4 5
of dismantling target by moving the marker cube.
Q8 It is easy to translate the dismantling target by using the marker cube. 4 2 3 5
Q9 It is easy to rotate the dismantling target by using the marker cube. 2 4 3 5
Q10 It is effective to translate the dismantling target using a stylus pen. 5 4 5 5
Q11 It is effective to rotate the dismantling target using a stylus pen. 5 4 5 5
Q12 It is easy to translate the dismantling target using a stylus pen. 5 4 5 5
Q13 It is easy to rotate the dismantling target using a stylus pen. 3 5 5 3
Q14 It is easy to operate the system using a stylus pen. 5 3 3 4
Q15 It is effective to translate dismantling target using the buttons. 5 3 4 5
Q16 It is easy to translate dismantling target using the buttons. 5 4 5 5
Q17 It is effective to set the position and orientation of dismantling target at 4 5 5 5
its initial position using the button.
Q18 It is effective to record the position and orientation of dismantling 5 5 5 5
target.
Q19 It is easy to record the position and orientation of dismantling target. 5 5 5 5
Q20 It is effective to refer the recorded position and orientation of 5 5 5 5
dismantling target visually.
Q21 It is easy to refer the recorded position and orientation of dismantling 5 5 5 4
target visually.
Q22 It is effective to choose the recorded capture images using the buttons. 5 5 5 5
Q23 It is easy to choose the recorded capture images using the buttons. 5 5 5 5
Q24 The function is effective to make dismantling target invisible. 5 5 5 5
Q25 The function is effective to reset the color of dismantling target. 5 5 5 5
Q26 The size of the area to display the camera image is adequate. 5 5 4 5
Q27 The size of the PC display is adequate. 5 5 4 5
Q28 The size of the system is adequate and it is easy to carry in. 5 4 4 5
Q29 The size of the buttons is adequate. 5 5 3 5
Q30 The system can be used easily even if it is the first use. 4 4 4 4
Q31 The system response is quick enough. 5 4 5 5
Q32 It is easy to rotate the system to change your viewpoint. 5 4 4 5
Q33 It is easy to move the system to change your viewpoint. 4 5 4 5
Q34 It is effective to make dismantling target models by measuring with 5 4 5 5
the system and use them for the verification.
Q35 It is effective to verify temporal placement and transportation work by 5 4 5 5
referring dismantling target model at actual work environment.
Q36 I could use the system without feeling stress. 3 3 5 4

Table 4. Free description and interview results (Partially extracted)

Evaluator A
A1 It is difficult to tell the cube operator how to move the marker cube only by gesture.
A2 It is difficult to conduct detail operations using the stylus pen especially for the model rotation.
A3 The models should be more stable when the camera does not move.
Evaluator B
B1 It is a little difficult to notice the change of the color. It may be better to change only the color
of the work environment.
B2 The marker cube is not necessary if the same operation can be done with the buttons.
B3 It is better if the virtual model follows the marker cube more quickly.
Evaluator C
C1 The size of the marker cube should be smaller.
C2 Sometimes it was difficult to see the display because of the reflection of the light.
C3 It is better if it is possible to change the amount of model movements by the button operation.
Evaluator D
D1 Using the marker cube is intuitive.
D2 The system is useful to confirm that dismantling targets can be transported through passages.
D3 Changing the color of the dismantling target is useful to decide which part of the dismantling
target should be cut to be transported through a narrow passage.
D4 The system will be more useful if multiple workers can use it simultaneously. This
extension will enable us to check what other workers will see from their positions.

4.4 Discussion

As shown in Table 3, all evaluators gave positive responses to almost all


questionnaire items. But for several items, some evaluators gave negative responses.
Evaluator B gave a negative response to Q5. For Q5, he also gave a comment B1 as in
Table 4. The authors decided to change the colors of both dismantling target and work
environment because it will give more information to the workers. In fact, Evaluator
D gave a comment D3 that is a positive response to changing the color of the
dismantling target. Therefore, it will be better to add a function to enable and disable
the color of dismantling target and work environment separately. Evaluator B gave
negative responses to Q7 and Q8. He also gave a comment B2. On the other hand,
Evaluator D gave a positive comment D1 to the marker cube. The possible cause of
this difference is that Evaluator B is much younger than Evaluator D and very
familiar with computers. Evaluator B is good at using GUI therefore he may think that
the marker cube is not necessary. Evaluator A gave a negative response to Q9. He
also gave a comment A1. It was difficult to give orders by voice because the work
environment is very noisy. Therefore, the evaluators must give orders to the cube
operator by gestures. But the authors did not teach the evaluators anything about
which gesture should be used to give orders to the cube operator. A set of standard
gestures should be designed and shared between the cube operator and the system
operator in advance.
Evaluator D gave an interesting comment, D4. It is easy to enable multiple workers to
use the system by introducing additional hardware and exchanging information via a
wireless network. This extension will enable workers to share the image of the work,
which is very important for increasing safety and efficiency.

5 Summary and Future Works


In this study, a spatial verification support system using a 3D laser range scanner and
Augmented Reality was developed and evaluated by a subjective evaluation. The
results show that the system is basically acceptable and useful for the spatial
verification. Artificial marker based tracking was employed in this study, because the
authors intended to prioritize stability and accuracy rather than practicability. For
practical use, it is necessary to decrease the number of markers and make it possible
for workers to move more freely. Another problem is that there are cases in which the
scanner cannot be used to make surface models of dismantling targets, for example
when the target is at a high location or is obstructed by other components. One
possible solution is to employ a modeling method that uses only small cameras. One
promising extension of the system is to make it possible for multiple workers to use it
simultaneously. This extension will enable workers to share the image of the
dismantling work, which is very important for increasing the safety and efficiency of
the dismantling work.

Acknowledgments. This work was partially supported by KAKENHI (No. 22700122).

References
1. Ishii, H.: Augmented Reality: Fundamentals and Nuclear Related Applications.
International Journal of Nuclear Safety and Simulation 1(4), 316–327 (2010)
2. Dutoit, H., Creighton, O., Klinker, G., Kobylinski, R., Vilsmeier, C., Bruegge, B.:
Architectural issues in mobile augmented reality systems: a prototyping case study. In:
Software Engineering Conference, pp. 341–344 (2001)
3. Nakagawa, T., Sano, T., Nakatani, Y.: Plant Maintenance Support System by Augmented
Reality. In: IEEE International Conference on Systems, Man, and Cybernetics, vol. 1, pp.
768–773 (1999)
4. Shimoda, H., Ishii, H., Yamazaki, Y., Yoshikawa, H.: An Experimental Comparison and
Evaluation of AR Information Presentation Devices for a NPP Maintenance Support
System. In: 11th International Conference on Human-Computer Interaction (2005)
5. Ishii, H., Shimoda, H., Nakai, T., Izumi, M., Bian, Z., Morishita, Y.: Proposal and
Evaluation of a Supporting Method for NPP Decommissioning Work by Augmented
Reality. In: 12th World Multi-Conference on Systemics, Cybernetics, vol. 6, pp. 157–162
(2008)
6. Ishii, H., Oshita, S., Yan, W., Shimoda, H., Izumi, M.: Development and evaluation of a
dismantling planning support system based on augmented reality technology. In: 3rd
International Symposium on Symbiotic Nuclear Power Systems for 21st Century (2010)
7. Ishii, H., Yan, W., Yang, S., Shimoda, H., Izumi, M.: Wide Area Tracking Method for
Augmented Reality Supporting Nuclear Power Plant Maintenance Work. International
Journal of Nuclear Safety and Simulation 1(1), 45–51 (2010)
8. Lindstrom, P.: Out-of-core simplification of large polygonal models. In: 27th Annual
Conference on Computer Graphics and Interactive Techniques, pp. 259–262 (2000)
9. Visualization Tool Kit, http://www.vtk.org/
10. Bullet Physics Library, http://bulletphysics.org/
11. Yan, W., Yang, S., Ishii, H., Shimoda, H., Izumi, M.: Development and Experimental
Evaluation of an Automatic Marker Registration System for Tracking of Augmented
Reality. International Journal of Nuclear Safety and Simulation 1(1), 52–62 (2010)
Development of Mobile AR Tour Application for the
National Palace Museum of Korea

Jae-Beom Kim and Changhoon Park

Dept. of Game Engineering, Hoseo University,


165 Sechul-ri, Baebang-myun, Asan,
Chungnam 336-795, Korea
re.ho.mik@gmail.com, chpark@hoseo.edu

Abstract. We present a mobile augmented reality tour (MART) application that
provides an intuitive interface for tourists, and context-awareness is used to realize a
smart guide. In this paper, we discuss practical ways of recognizing the context
correctly while overcoming the limitations of the sensors. First, semi-automatic
context recognition is proposed to explore a context ontology based on user
experience. Second, multiple-sensor context-awareness makes it possible to construct
the context ontology by using multiple sensors. We also introduce the iPhone tour
application for the National Palace Museum of Korea.

Keywords: Mobile, Augmented Reality, Tour, Semi-automatic context


recognition, Multi-sensor context-awareness.

1 Introduction

We introduce an ongoing project to develop a mobile AR tour application for the
National Palace Museum of Korea running on the iPhone. Every exhibit in the
museum has its own name and history. For a richer experience, this application is
based on augmented reality to make that content available to tourists interacting with
exhibits by enhancing their current perception of reality. Moreover, we also support
in-situ AR content authoring so that visitors can share their experiences of the
exhibits.
When the visitor sees an exhibit through the iPhone's camera, information relevant
to the captured real image is provided. To achieve this, the tour application is
developed on a client-server architecture. The client sends a query image to a remote
server for the recognition process, which extracts visual features from the image and
performs image matching against a large database of reference images by using the
SIFT (Scale-Invariant Feature Transform) algorithm. Once the matching image is
found, the client renders and overlays computer-generated virtual elements about the
objects in it. The client then continuously tracks the viewing pose relative to the real
object for image registration. The compass and gyroscope sensors of the iPhone 4 are
used for tracking.

R. Shumaker (Ed.): Virtual and Mixed Reality, Part I, HCII 2011, LNCS 6773, pp. 55–60, 2011.
© Springer-Verlag Berlin Heidelberg 2011

Fig. 1. Overview of the Mobile Augmented Reality Tour (MART) Application

2 Context-Awareness for MART


We have been researching mobile augmented reality tour (MART) applications that
provide an intuitive interface to the visitor, and context-awareness is used to support a
smart tour guide. In this paper, we discuss practical ways of recognizing the context
correctly while overcoming the limitations of the sensors. This approach is
implemented in the iPhone tour application for the National Palace Museum of Korea.

Table 1. Three key steps for context-awareness

1. Context Recommendation (automatic): Input: Name of the sensor data; Output: Candidate contexts
2. Context Exploration (manual): Input: User input; Output: Best matching context
3. Resources Offer (manual): Input: User input; Output: Multimedia, 3D model, other applications ...

The first step is to recommend candidate contexts by using the name of the data
captured from the sensor. This name represents the identification and characteristics
of the sensor data. For example, it can be retrieved from GPS coordinates with the
help of the Google Places API, which returns information about a "place". In the
second step, the user can find the context that best matches the situation. Because of
the limitations of the sensors, it is difficult to recognize all contexts by using the
sensors alone, so the user is allowed to explore the ontology-based contexts manually.
The third step provides the list of resources available for the specific context.

2.1 Semi-automatic Recognition of the Context

This paper proposes an efficient way of exploring the context ontology based on user
experience. We use past experience to minimize the cost of the context exploration in
the second step mentioned in the previous section. Interesting contexts receive a
higher reference count, which stands for the user's visiting frequency, and these
contexts are more likely to appear at the top during exploration.
For example, the "public place" context cannot be provided directly by the GPS
sensor. Instead, the user can find the "public place" context in the ontology above the
"Museum" context. Likewise, if there is no indoor location service, the "ticket booth"
context cannot be provided directly by the sensor alone, but the user can find it below
the "Museum" context. In the end, the context ontology includes contexts whether or
not they can be derived from a sensor. This semi-automatic approach makes it
possible to provide appropriate contexts to the user quickly while overcoming the
limitations of the sensors.
To apply the experience, the context ontology records how many times each
context has been referenced by the user, and the order in which contexts are displayed
depends on this value. In addition, the experience of friends is also considered, with a
different weight, in the calculation of the interest score. This experience-based
approach is expected not only to reduce the cost of context exploration but also to
support the sharing of experience.
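A minimal sketch of this experience-based ordering: each context node keeps a visit counter for the user and another for friends, and the display order sorts candidates by a weighted interest score. The structure names and the 0.3 friend weight are assumptions chosen for illustration, not values from the paper.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ContextNode {
    std::string name;
    int ownVisits = 0;      // how often this user selected the context
    int friendVisits = 0;   // how often friends selected it
};

// Interest score: own experience counts fully, friends' experience with a
// smaller weight (0.3 is an arbitrary illustrative value).
double interest(const ContextNode& c) { return c.ownVisits + 0.3 * c.friendVisits; }

// Order candidate contexts so the most "interesting" ones appear first.
void orderForDisplay(std::vector<ContextNode>& candidates) {
    std::sort(candidates.begin(), candidates.end(),
              [](const ContextNode& a, const ContextNode& b) {
                  return interest(a) > interest(b);
              });
}
```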

2.2 Multiple Sensors Context-Awareness

We propose a way of constructing the context ontology so that more concrete contexts
can be defined by using multiple sensors. To achieve this, each context is related to at
most one kind of sensor. If two sensors are used for context recognition, we can look
for two contexts in the ontology such that there is a path between them.
For example, the visitor can take a picture together with the current location by
using the camera and the GPS sensor. Then, we can obtain the name of the captured
image and the name of the location, and several contexts can be found by using these
names. We provide a combined context only if there is a path between the two
contexts. This implies a sensor hierarchy: a low-level context is affected by a
high-level context, and a high-level context constrains its lower-level contexts. In this
way, we can define concrete contexts by combining multiple sensors on the context
ontology.
We can find the name of an object by using the camera. If more than one context is
derived from this name, it means that the same kind of object exists in several places.
Then we can restrict the scope by using GPS, by attaching the context derived from
the camera below the context derived from GPS.
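The combination rule can be sketched as a reachability test on the ontology: a combined context is offered only when the context derived from one sensor lies on a path below the context derived from the other. The parent-map representation below is an assumption made for illustration.

```cpp
#include <map>
#include <string>

// Ontology stored as child -> parent links (each context has one parent here).
using Ontology = std::map<std::string, std::string>;

// True if 'high' is an ancestor of 'low', i.e. there is a path from the
// low-level context up to the high-level one.
bool hasPath(const Ontology& parent, std::string low, const std::string& high) {
    while (true) {
        if (low == high) return true;
        auto it = parent.find(low);
        if (it == parent.end()) return false;   // reached the root without meeting 'high'
        low = it->second;
    }
}

// Offer the camera-derived context only when it sits below the GPS-derived
// context (e.g. "ticket booth" below "Museum"); otherwise reject the match.
bool offerCombinedContext(const Ontology& parent,
                          const std::string& cameraContext,
                          const std::string& gpsContext) {
    return hasPath(parent, cameraContext, gpsContext);
}
```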

3 Mobile AR Tour Application

In this section, we introduce the key implementation methods of the iPhone tour
application for the National Palace Museum of Korea. The AR media player makes
content available to tourists interacting with exhibits by enhancing their current
perception of reality, and in-situ authoring and commenting let visitors create AR
content on the spot to share their experiences of the exhibits.

3.1 AR Media Player

The client consists of three layers: the live camera view, 3D graphics rendering and
touch input. First, we make a layer to display the video preview coming from the
camera. The second layer holds the rendered image of the virtual world, in which an
interactive virtual character explains what the camera is seeing. We ported
OpenSceneGraph to the iPhone for real-time 3D rendering; OpenSceneGraph is based
on the scene graph concept and provides high-performance rendering, loaders for
multiple file types and so on. This layer clears the color buffer with the alpha set to 0
so that the 3D scene is drawn on top of the camera view layer and the camera view
layer remains visible in the background. The third layer is provided for the GUI.
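The key to stacking the 3D layer over the live camera view is to clear the GL color buffer with zero alpha so that the camera layer underneath remains visible wherever no geometry is drawn. The fragment below illustrates the idea with plain OpenGL ES calls; on iOS the GL layer itself must also be configured as non-opaque, which is assumed to be done elsewhere.

```cpp
#include <OpenGLES/ES1/gl.h>   // OpenGL ES 1.x header on iOS

// Called at the start of each frame of the 3D layer: clear to fully
// transparent black so the camera-preview layer behind it shows through
// wherever no 3D geometry is drawn.
void beginTransparentFrame() {
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);                   // alpha = 0
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // ... the OpenSceneGraph scene (virtual character, exhibit overlays)
    //     is rendered after this clear.
}
```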

Fig. 3. AR media player consisting of three layers: live camera view, 3D rendering and GUI

3.2 Image Query

The tour application sends a query image automatically without user input. If the live
captured image on the screen is identified by the server, a green wireframe rectangle
is displayed as in the figure below. This approach is very intuitive and natural, but the
network cost should be considered. To reduce bandwidth usage, we change the size
and resolution of the query image. In addition, the client uses the acceleration and
compass sensors to decide when is the best time to send a query image: the movement
of the iPhone makes it possible to detect whether the user is focusing attention on a
particular exhibit, so we can control the frequency of sending query images.
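A simple way to implement this attention heuristic is to send a query only after the device has been nearly still for a short period. The sketch below illustrates such a policy with made-up thresholds; it is not the application's actual logic.

```cpp
#include <cmath>

// Simple attention heuristic: send at most one image query per "pause",
// where a pause means the accelerometer reading has stayed nearly constant
// for about one second. Threshold values are illustrative only.
class QueryThrottle {
public:
    // accelDelta: change in accelerometer magnitude since the last update (g).
    // dtSeconds:  time since the last update. Returns true when a query should be sent.
    bool update(double accelDelta, double dtSeconds) {
        if (std::fabs(accelDelta) > 0.05) {   // device is moving: user not focused yet
            stillTime_ = 0.0;
            sentForThisPause_ = false;
            return false;
        }
        stillTime_ += dtSeconds;
        if (stillTime_ >= 1.0 && !sentForThisPause_) {
            sentForThisPause_ = true;         // one query per steady period
            return true;
        }
        return false;
    }
private:
    double stillTime_ = 0.0;
    bool sentForThisPause_ = false;
};
```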

Fig. 4. Image query running on the iPhone 4 without user input

3.3 In-situ AR Authoring

The client provides an interface for in-situ authoring of AR content on the iPhone, as
shown in the figure below. This interface enables visitors to create their own content
for a specific exhibit on the spot, and this content can be shared with others who are
interested in the same exhibit. We aim to provide an efficient and easy way of in-situ
authoring while overcoming the limitations of mobile devices.

Fig. 5. In-situ AR authoring and commenting on iPhone

4 Conclusion
In this paper, we presented practical ways of recognizing the context correctly while
overcoming the limitations of the sensors. Semi-automatic recognition of the context
was proposed to reduce the cost of context exploration and to support the sharing of
experience. We also introduced multiple-sensor context-awareness to define more
concrete contexts. Promising results were demonstrated in the iPhone tour application
for the National Palace Museum of Korea.

Acknowledgement. “This research was supported by the Academic Research fund of


Hoseo University in 2009” (20090082)

References
1. Park, D.J., Hwang, S.H., Kim, A.R., Chang, B.M.: A Context-Aware Smart Tourist Guide
Application for an Old Palace. In: Proceedings of ICCIT (2007)
2. Adomavicius, G., Tuzhilin, A.: Context-Aware Recommender Systems. Technical Report,
http://ids.csom.umn.edu
3. Seo, B.K., Kim, K., Park, J., Park, J.I.: A tracking framework for augmented reality tours on
cultural heritage sites. In: Proceedings of VRCAI (2010)
4. Riboni, D., Bettini, C.: Context-Aware Activity Recognition through a Combination of
Ontological and Statistical Reasoning. In: Zhang, D., Portmann, M., Tan, A.-H., Indulska, J.
(eds.) UIC 2009. LNCS, vol. 5585, pp. 39–53. Springer, Heidelberg (2009)
5. Gellersen, H., Schmidt, A., Beigl, M.: Multi-Sensor Context-Awareness in Mobile Devices
and Smart Artefacts. Mobile Networks and Applications 7(5), 341–351 (2002)
6. Lim, B.: Improving trust in context-aware applications with intelligibility. In: Proceedings
of Ubicomp, pp. 477–480 (2010)
A Vision-Based Mobile Augmented Reality System for
Baseball Games

Seong-Oh Lee, Sang Chul Ahn, Jae-In Hwang, and Hyoung-Gon Kim

Imaging Media Research Center, Korea Institute of Science and Technology, Seoul, Korea
{solee,asc,hji,hgk}@imrc.kist.re.kr

Abstract. In this paper we propose a new mobile augmented-reality system that
addresses users' needs when viewing baseball games with enhanced content. The
overall goal of the system is to augment meaningful information at each player's
position on a mobile device display. To this end, the system takes two main steps:
homography estimation and automatic player detection. The system is based on still
images taken with a mobile phone. It can handle images taken from different angles,
with large variations in the size and pose of players and the playground, and under
different lighting conditions. We have implemented the system on a mobile platform.
All steps are processed within two seconds.

Keywords: Mobile augmented reality, baseball game, still image, homography, human detection, computer vision.

1 Introduction
A spectator sport is a sport characterized by the presence of spectators, or watchers,
at its matches, and watching such sports becomes more enjoyable when additional
information can be provided. How about applying a mobile augmented-reality system
(MARS) to spectator sports? Augmented reality is widely used for sports such as
football, soccer, and swimming, but not for baseball. Therefore, we focus on baseball
games. Hurwitz and co-workers proposed a conceptual MARS that targets baseball
games [1]; however, no implementation methods have been presented.
The previous research literature includes several papers on augmented reality (AR)
technology for sports entertainment. Demiris et al. used computer vision techniques to
create a mixed-reality view of an athlete's attempt [2]. Inamoto et al. focused on
generating virtual scenes using multiple synchronous video sequences of a given
sports game [3]. Some researchers tried to synthesize virtual sports scenes from TV
broadcast video [4], [5]; these systems, however, were not designed for real-time
broadcasting. Han et al. built a real-time AR system for court-net sports such as
tennis [6]. Most of the previous work was applied to TV broadcasting of sports.
There have been no AR systems for on-site sports entertainment.
In this paper we propose a MARS for baseball games in stadium environments.
The overall goal of the system is to augment meaningful information with each player


on a captured playfield image during a game. This information includes the name,
team, position, and statistics of players and games, which are available via the Web
and a local information server installed in the stadium. Our system is currently based
on still images of playfields taken with a mobile phone. The images can be taken from
different angles, with large variations in the size and pose of players and the
playground, and under different lighting conditions.
The rest of this paper is structured as follows. Section 2 gives an overview and
detailed description of the proposed system. The experimental results on the baseball
field images are provided in Section 3, and Section 4 concludes the paper.

2 The Proposed System


Figure 1 shows the architecture of the proposed system. The processing starts with
capturing a still image on a mobile device. We use a still image for two reasons. First,
users may have difficulty holding a mobile device steady for a long time while
interacting with augmented content on live video frames. Second, a still image has a
higher resolution than a frame of a video sequence; in general, users take pictures from
a long distance during a baseball game, and detecting players requires sufficient image
resolution. The captured still image is then analyzed to estimate a homography
between a playfield template and the imaged playfield and to detect the location of
each player. If the analysis succeeds, the game contents are retrieved from the
information server. A user can touch a player of interest on the mobile phone screen.
The best candidate for the corresponding player is then found by a simple method that
combines the detected player locations and the game information with some boundary
constraints. Finally, the team and name of the player are augmented above the touched
player. Detailed information is displayed on a new screen when the user touches the
player's name; we use a new screen because the phone screen is too small to display
all the information on the field image. Figure 5 shows an example of the results.

Fig. 1. The proposed system architecture

2.1 Planar Homography Estimation

Homography estimation is one of the main techniques underlying AR. In general,
there are two approaches to estimating a homography from a single image. The first is
a marker-based approach that uses image patterns specially designed for homography
estimation. The second is a markerless approach that does not use such patterns but is
restricted to natural images containing distinctive local patterns. Baseball playfield
images, however, consist of formalized geometric primitives whose local patterns are
hard to distinguish between an input frame and the reference frame. Therefore, we
propose a baseball playfield registration method that matches the playfield shape,
which consists of geometric primitives. The method is divided into three steps. First,
contours are extracted from the edges between the dominant color (e.g., grass) and
other colors. Second, geometric primitives such as lines and ellipses are estimated
using parameter estimation methods. Third, the homography is estimated by matching
those geometric primitives.

Edge Contours Extraction. Unlike other sports playfields, which have well-defined
white line structures, a baseball field is dominated by two colors, grass and soil [7].
The infield edge pixels define most of the shape structure; the foul lines are marked
with white lines, but these alone are not enough to estimate the projective
transformation. To detect the edge pixels, a grass-soil playfield segmentation approach
was considered [8]. However, an empirical analysis of the colors shows that grass
pixels have a dominant green component in RGB color space. By classifying a pixel as
grass when its green component is larger than its red component, we obtain a reliable
pixel classification result, as shown in Figure 2(b). Noise is then removed by median
filtering. Note that we do not filter out background areas, such as the sky and
spectators, because the homography estimation step removes these areas
automatically.
After pixel classification, an edge detection algorithm is applied. There are many
methods in the literature for detecting edges in an image; here, a simple method that
detects pixels of the grass area adjacent to other areas is used. We label a pixel as an
edge if its 3x3 neighborhood contains both grass and non-grass pixels. The detected
edges are shown in Figure 2(c). Finally, edge pixels are linked into lists of sequential
edge points, one list per edge contour, to preserve connectivity. Small segments and
holes are removed by discarding contours shorter than 50 pixels.
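A compact OpenCV rendering of this classification and edge-labelling step might look as follows; the function name is ours, the G > R rule and the 3x3 mixed-window test follow the description above, and the detected edge pixels would subsequently be linked into contours (for example with cv::findContours) and contours shorter than 50 points discarded.

```cpp
#include <opencv2/opencv.hpp>

// Classify grass pixels (green component larger than red) and mark as edge
// those pixels whose 3x3 neighborhood contains both grass and non-grass.
cv::Mat detectPlayfieldEdges(const cv::Mat& bgr)
{
    cv::Mat mask(bgr.size(), CV_8U);
    for (int y = 0; y < bgr.rows; ++y)
        for (int x = 0; x < bgr.cols; ++x) {
            const cv::Vec3b& p = bgr.at<cv::Vec3b>(y, x);    // B, G, R
            mask.at<uchar>(y, x) = (p[1] > p[2]) ? 255 : 0;  // grass if G > R
        }
    cv::medianBlur(mask, mask, 5);                           // remove classification noise

    cv::Mat edges = cv::Mat::zeros(bgr.size(), CV_8U);
    for (int y = 1; y < mask.rows - 1; ++y)
        for (int x = 1; x < mask.cols - 1; ++x) {
            double lo, hi;
            cv::minMaxLoc(mask(cv::Rect(x - 1, y - 1, 3, 3)), &lo, &hi);
            if (lo == 0 && hi == 255)                        // mixed window -> edge pixel
                edges.at<uchar>(y, x) = 255;
        }
    return edges;
}
```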

Geometric Primitives Estimation. The infield structure of a baseball field consists
of two types of shape: lines and ellipses. Starting from the detected edge contours,
line and ellipse parameters are extracted. The estimation methods are briefly
described as follows.
A line segmentation method, obtained by slightly modifying Peter Kovesi's
implementation [9], is used to form straight-line segments from an edge contour. The
start and end positions of each line segment are determined, and the line parameters
are further refined with a least-squares line fitting algorithm. Finally, nearby line
segments with similar parameters are joined. The final line segmentation results are
shown in Figure 2(d).
There are two possible ellipses in a baseball field image: the pitcher's mound and the
home plate area. The elliptical shape of the home plate area is generally hard to detect
because it is not separated into a single edge contour. Therefore, our system treats the
pitcher's mound as the best detectable ellipse in the playfield. A direct least-squares
ellipse fitting algorithm [10] is applied to each edge contour for ellipse parameter
estimation. The pitcher's mound is then identified, via an ellipse fitness function, as
the ellipse whose fitting error is minimal and smaller than a pre-defined threshold.
Finally, the estimated ellipse is verified by fine matching based on the sum of squared
differences (SSD). Note that we assume the observed image contains the pitcher's
mound. The final detected ellipse is shown in red in Figure 2(d) (the figure is best
viewed in color).
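As an illustration only, the mound search can be approximated with OpenCV's least-squares ellipse fit applied to every linked contour; the mean algebraic residual used below is a simplified stand-in for the fitness function and SSD verification described above, and the thresholds are placeholders.

```cpp
#include <cmath>
#include <opencv2/opencv.hpp>
#include <vector>

// Fit an ellipse to each edge contour and keep the one with the smallest
// mean algebraic residual below a threshold (candidate pitcher's mound).
bool findPitchersMound(const std::vector<std::vector<cv::Point>>& contours,
                       cv::RotatedRect& best, double maxResidual = 0.05)
{
    double bestErr = maxResidual;
    bool found = false;
    for (const auto& c : contours) {
        if (c.size() < 20) continue;                   // too short for a stable fit
        cv::RotatedRect e = cv::fitEllipse(c);
        double a = e.size.width * 0.5, b = e.size.height * 0.5;
        double th = e.angle * CV_PI / 180.0, err = 0.0;
        for (const cv::Point& p : c) {
            double dx = p.x - e.center.x, dy = p.y - e.center.y;
            double u =  dx * std::cos(th) + dy * std::sin(th);  // ellipse frame
            double v = -dx * std::sin(th) + dy * std::cos(th);
            err += std::fabs((u * u) / (a * a) + (v * v) / (b * b) - 1.0);
        }
        err /= static_cast<double>(c.size());          // mean algebraic residual
        if (err < bestErr) { bestErr = err; best = e; found = true; }
    }
    return found;
}
```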

Fig. 2. Contours extraction and geometric primitives estimation: input image (a), classified
grass pixels (white) (b), detected edges (c), detected lines (yellow) and an ellipse (red) (d)

Homography Estimation. A diamond shape consisting of four line segments is
located inside the infield, and a circle, the pitcher's mound, lies at the very center of
the diamond. Outside the diamond, two foul lines are located. Hence, we define a
playfield model composed of six line segments and a circle, shown in Figure 3 (left).
Homography estimation is then a matter of finding correspondences between the
model and the set of shapes extracted from the observed image. Our solution uses
four line correspondences formed by the two sets of parallel lines in the diamond
shape of the playfield. The transformation matrix is determined directly using the
normalized direct linear transformation [11].

Fig. 3. The defined playfield model (6 lines and a circle) (left) and the geometrical constraints
(green: selected lines, red: removed lines) (right)

Searching for the best correspondence requires a combinatorial search that can be
computationally complex, so we apply geometrical constraints. Since no shape
correspondences are known except the pitcher's mound, metric and scale properties
are recovered roughly from the relationship between a circle and an ellipse, without
considering perspective parameters. All extracted line segments are then sorted in
counter-clockwise order by the absolute angle of the line joining the center of the
ellipse and the center of each segment, and segments whose distance from the center
of the ellipse exceeds a pre-defined length are removed. We also apply some minor
constraints using techniques similar to those proposed in the literature [7]. These
constraints resolve the image reflection ambiguity and significantly reduce the search
space, as shown in Figure 3, where many lines are removed by the geometrical
constraints. For each configuration that satisfies these constraints, we compute the
transformation matrix and the complete model matching error as described in [7]. The
transformation matrix that gives the minimum error is selected as the best
transformation. Finally, the estimated homography is verified by fine matching based
on SSD. Figure 4 shows the transformed playfield model drawn over an input image
using the estimated homography.
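For illustration, once a candidate assignment of the four diamond lines has been chosen, intersecting adjacent lines yields the four infield corners, and the model-to-image transformation can be obtained with a standard four-point DLT solver. The sketch below uses OpenCV's solver as a stand-in for the normalized DLT over line correspondences described above; the model coordinates (a square diamond with 27.43 m base paths) are our own placeholder values.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// imageCorners: the four infield corners in the image, obtained by
// intersecting the candidate diamond lines, ordered to match the model.
cv::Mat estimatePlayfieldHomography(const std::vector<cv::Point2f>& imageCorners)
{
    // Placeholder model coordinates in meters (90 ft base paths).
    std::vector<cv::Point2f> modelCorners = {
        {0.0f, 0.0f}, {27.43f, 0.0f}, {27.43f, 27.43f}, {0.0f, 27.43f}
    };
    // Exactly four correspondences: the minimal DLT case.
    return cv::getPerspectiveTransform(modelCorners, imageCorners);
}
```

Each candidate configuration can then be scored by reprojecting the full model with its homography and accumulating the matching error, keeping the transformation with the minimum error as described above.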

2.2 Player Detection

For automatic player detection in our framework, we use AdaBoost learning based on
histograms of oriented gradients, which gives a reasonably good detection rate and
fast search speed [13]. Dalal and Triggs showed experimentally that grids of
Histogram of Oriented Gradients (HOG) descriptors significantly outperform existing
feature sets for human detection [14]. AdaBoost is used as a feature selection
algorithm to automatically select a small set of discriminative HOG features with
orientation information, in order to achieve robust detection. More details of the
employed approach can be found in [13].
This approach is designed to use a large set of blocks that vary in size, location, and
aspect ratio, so players of varying size can be detected in images. Knowing the search
block size improves the detection accuracy and reduces the search time. In the
proposed system, the search block size is calculated using the average height of
Korean baseball players (182.9 cm) [12]. First, the camera pose is estimated using a
robust pose estimation technique for planar targets [15]; the input parameters are the
four corresponding lines used to estimate the homography. Next, the search block size
at each pixel location is calculated approximately from the given camera pose and
average height. In a baseball game, most players of interest are inside the baseball
field, so detected players outside the field are not considered. An example of the
player detection results is shown in Figure 4.
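A rough sketch of the detection step is given below. It uses OpenCV's stock HOG pedestrian detector as a stand-in for the cascade-of-HOGs detector of [13], and it enforces the playfield constraint by testing the foot point of each detection against the projected field polygon; the detector parameters are illustrative only.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Detect people and keep only those whose foot point lies inside the
// playfield polygon obtained from the estimated homography.
std::vector<cv::Rect> detectPlayers(const cv::Mat& image,
                                    const std::vector<cv::Point>& fieldPolygon)
{
    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

    std::vector<cv::Rect> found, players;
    hog.detectMultiScale(image, found, 0.0, cv::Size(8, 8), cv::Size(), 1.05, 2);

    for (const cv::Rect& r : found) {
        // Bottom-center of the bounding box approximates the ground position.
        cv::Point2f foot(r.x + r.width * 0.5f, static_cast<float>(r.y + r.height));
        if (cv::pointPolygonTest(fieldPolygon, foot, false) >= 0)
            players.push_back(r);   // inside (or on) the field boundary
    }
    return players;
}
```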

Fig. 4. Homography estimation and player detection: a transformed playfield model (left),
detected players (green box) and a player outside the playfield (red box) (right)

3 Experimental Results

We have tested the proposed algorithm on a PC with an Intel 2.67 GHz Core i7 CPU,
using photos taken with an Apple iPhone 3GS and an iPhone 4. The pictures were
taken at the Jamsil and Mokdong baseball stadiums in Seoul, Korea. Images were
resized to two resolutions, 640 x 480 and 960 x 720, used to estimate the homography
and to detect players (including outfielders), respectively. Homography estimation
always took between 50 and 100 ms. Detecting all players takes much longer, but
there is no need to search all pixels inside the baseball field, because only a player of
interest is searched for within the small region selected by the user.
We also implemented the system on a mobile platform (Apple iPhone 4); all steps
were processed within two seconds. The information server manages the contexts of
baseball games held in Korea, and the mobile device connects to it via a wireless
network after the image processing step. The information server does not provide the
exact location of each player, so we roughly matched each detected player with the
given information by inference based on the team, the position, and the detected
location. Figure 5 shows the results of the system after touching a player of interest on
the mobile phone screen.
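One simple way to carry out this inference, sketched below under our own assumptions, is to project each roster entry's nominal fielding spot into the image with the estimated homography and assign the touched detection to the nearest projected spot; the `RosterEntry` structure and the nominal field coordinates are illustrative placeholders, not the information server's actual schema.

```cpp
#include <cmath>
#include <cstddef>
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

// Hypothetical roster entry as it might come from the information server.
struct RosterEntry {
    std::string name;
    std::string position;   // e.g. "SS", "CF"
    cv::Point2f fieldPos;   // nominal fielding spot in model coordinates
};

// Return the index of the roster entry whose nominal spot, projected through
// the model-to-image homography H, is closest to the detection's foot point.
int matchDetectionToRoster(const cv::Rect& detection, const cv::Mat& H,
                           const std::vector<RosterEntry>& roster)
{
    cv::Point2f foot(detection.x + detection.width * 0.5f,
                     static_cast<float>(detection.y + detection.height));
    int bestIdx = -1;
    double bestDist = 1e9;
    for (std::size_t i = 0; i < roster.size(); ++i) {
        std::vector<cv::Point2f> in{roster[i].fieldPos}, out;
        cv::perspectiveTransform(in, out, H);          // project spot into the image
        double d = std::hypot(out[0].x - foot.x, out[0].y - foot.y);
        if (d < bestDist) { bestDist = d; bestIdx = static_cast<int>(i); }
    }
    return bestIdx;   // most plausible roster entry (or -1 if roster is empty)
}
```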

Fig. 5. The implemented system on a mobile platform: the upper screen displays the team, name
(over the player), and position (upper right) in Korean text after the user touches a player of
interest, and the lower screen displays the detailed information about the player

4 Conclusion and Future Work


We have described a vision-based augmented reality system that displays
supplementary information about players on a mobile device during a baseball game.
Since homography estimation plays an important role in this system, we proposed a
new estimation method suited to a baseball field. For player detection, we employ a
fast and robust algorithm based on AdaBoost learning that gives a reasonably good
detection rate and search speed. However, detection sometimes fails, and further
improvement of the detection rate remains future work. We have successfully
implemented the system on a mobile platform and tested it in two different stadiums.
Our current system does not cover every baseball stadium, because the proposed
pixel classification algorithm assumes a playfield consisting of grass and soil.
However, there are various types of playfield in the world; for example, some
stadiums have a playfield with white lines painted on a green field. Therefore, our
next goal is to develop a system that handles these various types of playfield.

References
1. Hurwitz, A., Jeffs, A.: EYEPLY: Baseball proof of concept - Mobile augmentation for
entertainment and shopping venues. In: IEEE International Symposium on ISMAR-AMH
2009, pp. 55–56 (2009)
2. Demiris, A.M., Garcia, C., Malerczyk, C., Klein, K., Walczak, K., Kerbiriou, P., Bouville,
C., Traka, M., Reusens, E., Boyle, E., Wingbermuhle, J., Ioannidis, N.: Sprinting Along
with the Olympic Champions: Personalized, Interactive Broadcasting using Mixed Reality
Techniques and MPEG-4. In: Proc. of BIS 2002, Business Information Systems (2002)
3. Inamoto, N., Saito, H.: Free viewpoint video synthesis and presentation of sporting events
for mixed reality entertainment. In: Proc. of ACM ACE, vol. 74, pp. 42–50 (2004)
4. Matsui, K., Iwase, M., Agata, M., Tanaka, T., Ohnishi, N.: Soccer image sequence
computed by a virtual camera. In: Proc. of CVPR, pp. 860–865 (1998)
5. Kammann, T.D.: Interactive Augmented Reality in Digital Broadcasting Environments.
Diploma Thesis, University of Koblenz-Landau (2005)
6. Han, J., Farin, D., de With, P.H.N.: A Real-Time Augmented-Reality System for Sports
Broadcast Video Enhancement. In: Proc. of ACM Multimedia, pp. 337–340 (2007)
7. Farin, D., Han, J., de With, P.: Fast Camera Calibration for the Analysis of Sport
Sequences. In: IEEE Int. Conf. Multimedia Expo (ICME 2005), pp. 482–485 (2005)
8. Kuo, C.-M., Hung, M.-H., Hsieh, C.-H.: Baseball Playfield Segmentation Using Adaptive
Gaussian Mixture Models. In: International Conference on Innovative Computing,
Information and Control, pp. 360–363 (2008)
9. Nguyen, T.M., Ahuja, S., Wu, Q.M.: A real-time ellipse detection based on edge grouping.
In: Proc. of the IEEE International Conference on Systems, Man and Cybernetics, pp.
3280–3286 (2009)
10. Halir, R., Flusser, J.: Numerically stable direct least squares fitting of ellipses. In: 6th
International Conference on Computer Graphics and Visualization (1998)
11. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn.
Cambridge University Press, Cambridge (2004)
12. Korea Baseball Organization: Guide Book (2010),
http://www.koreabaseball.com
13. Zhu, Q., Avidan, S., Yeh, M.C., Cheng, K.T.: Fast Human Detection Using a Cascade of
Histograms of Oriented Gradients. In: IEEE Conf. on CVPR, pp. 1491–1498. IEEE
Computer Society Press, Los Alamitos (2006)
14. Dalal, N., Triggs, B.: Histogram of Oriented Gradients for Human Detection. In: IEEE
Conf. on Computer Vision and Pattern Recognition, CVPR, vol. 2, pp. 886–893 (2005)
15. Schweighofer, G., Pinz, A.: Robust Pose Estimation from a Planar Target. IEEE Trans. on
Pattern Analysis and Machine Intelligence 28, 2024–2030 (2005)
Social Augmented Reality for Sensor Visualization in
Ubiquitous Virtual Reality*

Youngho Lee1, Jongmyung Choi1, Sehwan Kim2, Seunghun Lee3, and Say Jang4
1 Mokpo National University, Jeonnam, Korea
{youngho,jmchoi}@mokpo.ac.kr
2 WorldViz, Santa Barbara, CA 93101, USA
kim@worldviz.com
3 Korea Aerospace Research Institute, Korea
slee@kari.re.kr
4 Samsung Electronics Co., Ltd., Korea
say.jang@samsung.com

Abstract. There have been several research activities on data visualization
exploiting augmented reality technologies. However, most research focuses on
tracking and visualization themselves and does not discuss social communities in
combination with augmented reality. In this paper, we propose a social augmented
reality architecture that selectively visualizes sensor information based on the user's
social network community. We present three scenarios covering information from
sensors embedded in mobile devices, from sensors in the environment, and from the
social community. We expect the proposed architecture to play a crucial role in
selectively visualizing data from thousands of sensors according to the user's social
network community.

Keywords: Ubiquitous virtual reality, context-awareness, augmented reality, social community.

1 Introduction

Recently, the computing paradigm has been trending toward technologies, including
ubiquitous virtual reality, social community analysis, and augmented reality, that
combine the real world and the virtual world [1,2]. A smart object is a hidden
intelligent object that recognizes the user's presence and provides services for
immediate needs. With smart objects, users can interact with a whole environment
while expecting highly intelligent responses. With this change in computing
paradigms, the mobile devices envisioned in Mark Weiser's paper have become
commercialized in our daily lives [11]. In particular, mobile devices are not only small
devices for voice communication between people but also user interfaces for
accessing social communities [10].

* This paper was supported by Research Funds of Mokpo National University in 2010.


There are several research activities on visualizing sensor data using augmented
reality technology. Gunnarsson et al. developed a prototype system for visual
inspection of hidden structures using a mobile phone and a wireless ZigBee sensor
network [3]. Claros et al., Goldsmith et al., and Yazar et al. demonstrated AR
interfaces for visualizing wireless sensor information in a room [4-6]. Although
previous research shows that augmented reality is well suited to visualizing sensor
data, it did not discuss how to visualize rich data. In real applications, sensors are
installed in large-scale environments such as bridges, mountains, or, in some cases,
entire cities, so it is very hard to visualize the sensor data exactly as the user wants.
In this paper, we propose a social augmented reality architecture that selectively
visualizes sensor information based on the user's social network community. Three
possible scenarios are presented to guide the design of the architecture, covering the
visualization of information from sensors embedded in mobile devices, from sensors
in the environment, and from the social community. The architecture is based on the
Context-aware Cognitive Agent Architecture for real-time and intelligent responses of
user interfaces [8]. It enables users to interact with sensor data through an augmented
reality user interface with various levels of intelligence by exploiting social
community analysis.
This paper is organized as follows. In Section 2, we briefly introduce related work
in ubiquitous virtual reality. Service scenarios of social AR for sensor visualization
are presented in Section 3, and the Context-aware Cognitive Agent Architecture for
social AR is described in Section 4. Section 5 concludes the paper.

2 Related Works

2.1 Smart Objects and Social Community in Ubiquitous Virtual Reality

Ubiquitous Virtual Reality (Ubiquitous VR) has been researched as a way to apply the
concept of virtual reality to ubiquitous computing environments (the real world) [9].
Lee et al. presented three key characteristics of Ubiquitous VR based on reality,
context, and human activity [2]. The reality-virtuality continuum was introduced by
Milgram; according to his definition, the real world is 'any environment consisting
solely of real objects, and includes whatever might be observed when viewing a
real-world scene either directly in person'. Context is defined as 'any information that
can be used to characterize the situation of an entity, where an entity can be a person,
place, or physical or computational object'. Context can be represented on a
static-dynamic continuum: context is called static if it describes information such as a
user profile, and dynamic if it describes wisdom obtained by intelligent analysis.
Human activity can be classified into personal, group, community, and social activity
and represented on a personal-social continuum. Ubiquitous VR supports human
social connections with the highest-level user context (wisdom) in mixed reality.
A smart object is a hidden intelligent object that recognizes the user's presence and
provides information for their immediate needs using its sensors and processor. It is
assumed that the things necessary for daily life have embedded microprocessors and
are connected over wired/wireless networks, and that user interfaces control
environmental conditions and support user interaction in a natural and personal way.
Fig. 1 shows the idea of combining three major research areas: augmented reality,
social community analysis, and smart objects.

Fig. 1. Augmented Reality, Smart objects, and Social Community

2.2 Context-Aware Cognitive Agent Architecture for Ambient User Interfaces

A cognitive agent architecture for virtual and smart environments was proposed for
realizing seamless interaction in ubiquitous virtual reality [8]. It is a cognitively
motivated, vertically layered, two-pass agent architecture for realizing the
responsiveness, reactivity, and pro-activeness of smart objects, smart environments,
virtual characters, and virtual place controllers. Direct responsiveness is bounded by
the time frame of visual continuity (about 40 ms). Immediate reaction is requested by
a user's command and may take more than 40 ms, up to about a second. Pro-activity
concerns scheduled events and can take any amount of time: five seconds, a minute,
or a day.

Fig. 2. Context-aware cognitive agent architecture for ambient user interfaces in ubiquitous
virtual reality [2]

The Context-aware Cognitive Agent Architecture (CCAA) is designed for real-time
and intelligent responses of ambient user interfaces, based on the context-aware agent
architecture in Ubiquitous VR [2]. Its three layers are the AR (augmented reality)
layer, the CA (context-aware) layer, and the AI layer. This architecture enables
ambient smart objects to interact with users with various levels of intelligence by
exploiting context and AI techniques.

3 Service Scenarios of Social AR for Sensor Visualization

3.1 Service Scenarios

In this section, we gather service scenarios for social sensor AR systems and elicit
some functional and non-functional requirements. Social sensor AR systems are
complex systems that combine social network concepts, sensor networks, and
augmented reality. The idea stems from the fact that too much information raises
visualization problems: people would rather see information that comes from their
community or that is selected based on their social community. The information can
come from sensors in the environment or from social network services such as
Facebook or Twitter.

Fig. 3. Process of social augmented reality

We can think of several service scenarios for a social AR system for sensor
visualization. There are two cases: sensors embedded in mobile AR devices and
sensors located in the environment. The first scenario concerns outdoor activity and a
health-related service. Asthma is a chronic and critical illness that is closely related to
pollen, which is hard to see with the naked eye. Here is the scenario.
• Service Scenario 1.
Kelly wants to take a walk to a park near her home with her daughter, Jane.
However, whenever she goes out to the park, she worries about Jane having an asthma
attack caused by pollen. She checks the pollen count on the Internet before going out,
but the information is not very accurate because it covers a broad area, not a specific
place such as the park. Now she can see the pollen count via the sensor AR system
before going to the park. She also finds it very easy to explain to her daughter why
Jane cannot go to the park when the pollen count is high, by showing the pollen
monster images on the system. Afterwards, she shares the pollen information with her
asthma community on Facebook so that other members can check it before going out
to the park.

The second scenario concerns indoor activity and an education-related service. The
library is equipped with an RFID tracking system for locating books, and a wireless
network is available.

• Service Scenario 2
Hyo, who lives in the city, is going to the library to read books. There are thousands of
books, including adventure, drama, fantasy, history, and so on. The books are
managed by an RFID management system, so whenever people move books, the
system recognizes each book's new location automatically. Let's assume that she is a
new member of a science fiction circle but does not know what to read. While she
looks around the library, the social AR user interface shows her memos written by her
friends in the circle and information about who has read each book. It also
recommends books to her. That information is very useful for selecting books to read.

The third scenario takes place at a conference site and involves getting information
from a social network. The conference site can be outdoor or indoor as long as a
proper location tracking system is available.
• Service Scenario 3
Hun is attending an international conference for his research and business, so he is
looking for people interested in similar research topics. First, he opens his profile,
which includes his research topics, paper list, and contact information. Privacy
settings are important to prevent this information from being disclosed unwillingly.
Hun can now see roughly where such people are located at the site, together with their
information.

4 Context-Aware Cognitive Agent Architecture for Social AR in Ubiquitous Virtual Reality

We extend the Context-aware Cognitive Agent Architecture [8] to cover the scenarios
above by adding a fourth layer, the community layer, to the original architecture.
Fig. 4 shows the four layers. The Social Network Construction module receives
context (processed information) from the lower layers and constructs the user's social
network. The Social Network Analysis module reduces and optimizes the network
according to the current user's needs.
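To make the role of the community layer concrete, the following sketch shows the kind of filtering it could hand to the AR layer: keeping only the sensor readings owned by, or shared with, members of the user's analyzed community. All types and names here are our own illustrations; the architecture itself does not prescribe an implementation.

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative sensor reading with the id of the member who shared it.
struct SensorReading {
    std::string sensorId;
    std::string ownerId;
    double value;
};

// Keep only readings whose owner belongs to the reduced community produced
// by the Social Network Analysis module; the result goes to the AR layer.
std::vector<SensorReading> filterByCommunity(
    const std::vector<SensorReading>& readings,
    const std::set<std::string>& communityMembers)
{
    std::vector<SensorReading> visible;
    for (const SensorReading& r : readings)
        if (communityMembers.count(r.ownerId) > 0)
            visible.push_back(r);
    return visible;
}
```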

Fig. 4. Context-aware Cognitive Agent Architecture for Social AR



5 Conclusion and Future Works


In this paper, we proposed a social augmented reality architecture that selectively
visualizes sensor information based on the user's social network community. Several
scenarios were suggested to identify the necessary functions. Our work is still an
ongoing project, and we expect the proposed architecture to be improved for future
applications.

References
1. Lee, Y., Oh, S., Shin, C., Woo, W.: Recent Trends in Ubiquitous Virtual Reality. In:
International Symposium on Ubiquitous Virtual Reality, pp. 33–36 (2008)
2. Lee, Y., Oh, S., Shin, C., Woo, W.: Ubiquitous Virtual Reality and Its Key Dimension. In:
International Workshop on Ubiquitous Virtual Reality, pp. 5–8 (2009)
3. Gunnarsson, A., Rauhala, M., Henrysson, A., Ynnerman, A.: Visualization of sensor data
using mobile phone augmented reality. In: 5th IEEE and ACM International Symposium
on Mixed and Augmented Reality (ISMAR 2006), pp. 233–234. IEEE Computer Society,
Washington, DC (2006)
4. Claros, D., Haro, M., Domínguez, M., Trazegnies, C., Urdiales, C., Hernández, F.:
Augmented Reality Visualization Interface for Biometric Wireless Sensor Networks. In:
Sandoval, F., Prieto, A.G., Cabestany, J., Graña, M. (eds.) IWANN 2007. LNCS,
vol. 4507, pp. 1074–1081. Springer, Heidelberg (2007)
5. Goldsmith, D., Liarokapis, F., Malone, G., Kemp, J.: Augmented Reality Environmental
Monitoring Using Wireless Sensor Networks. In: 12th International Conference
Information Visualisation, pp. 539–544
6. Yazar, D., Tsiftes, N., Osterlind, F., Finne, N., Eriksson, J., Dunkels, A.: Augmenting
reality with IP-based sensor networks. In: 9th ACM/IEEE International Conference on
Information Processing in Sensor Networks (IPSN 2010), pp. 440–441 (2010)
7. Dow, S., Mehta, M., Lausier, A., MacIntyre, B., Mateas, M.: Initial lessons from AR
Façade, an interactive augmented reality drama. In: ACM SIGCHI International
Conference on Advances in Computer Entertainment Technology, June 14-16 (2006)
8. Lee, Y., Schmidtke, H.R., Woo, W.: Realizing Seamless Interaction: a Cognitive Agent
Architecture for Virtual and Smart Environments. In: International Symposium on
Ubiquitous Virtual Reality, pp. 5–6 (2007)
9. Kim, S., Lee, Y., Woo, W.: How to Realize Ubiquitous VR? In: Pervasive: TSI Workshop,
pp. 493–504 (2006)
10. Choi, J., Moon, J.: MyGuide: A Mobile Context-Aware Exhibit Guide System. In:
Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds.) ICCSA
2008, Part II. LNCS, vol. 5073, pp. 348–359. Springer, Heidelberg (2008)
11. Weiser, M.: The Computer for the Twenty-First Century. Scientific American, 94–104
(September 1991)
Digital Diorama: AR Exhibition System to Convey
Background Information for Museums

Takuji Narumi1, Oribe Hayashi2, Kazuhiro Kasada2, Mitsuhiko Yamazaki2, Tomohiro Tanikawa2, and Michitaka Hirose2
1 Graduate School of Engineering, The University of Tokyo / JSPS, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
2 Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
{narumi,olive,kasada,myama,tani,hirose}@cyber.t.u-tokyo.ac.jp

Abstract. In this paper, we propose an MR museum exhibition system, the
"Digital Diorama" system, to convey background information intuitively. The
system aims to offer more than the function of existing dioramas in museum
exhibitions by using mixed reality technology. It superimposes a computer-generated
diorama scene, reconstructed from related image/video materials, onto real exhibits.
First, we implement and evaluate methods for estimating the locations at which past
photos and movies were taken. Then, we implement and install two types of prototype
system at the estimated positions to superimpose virtual scenes onto a real exhibit in
the Railway Museum. By looking into the eyehole-type device of the proposed
system, visitors can feel as if they travel through time around the exhibited steam
locomotive and understand the historical differences between its current and previous
appearance.

Keywords: Mixed Reality, Museum Exhibition, Digital Museum.

1 Introduction
In every museum, a great deal of informational material about the exhibits has
been preserved as texts, pictures, videos, 3D models, and so on. Curators have tried to
convey this large amount of information to visitors by using instruction boards or
audio/visual guidance devices within their exhibitions. However, such conventional
information assistance methodologies cannot convey vivid background information
about the target exhibits, for example, the state of society at that time or a scene
showing how the object was used.
Meanwhile, with the rapid growth of information technology, mixed reality
technologies have developed and become popular over the last decade. Today, we can
present a high-quality virtual experience in real time and in real environments by
using next-generation display and interaction systems: auto-stereoscopic 3D displays,
gesture input devices, markerless tracking systems, and so on. Museums are therefore
very interested in introducing these technologies to convey the rich background
information about their exhibits, and there are several research projects on this kind of
digitally enhanced exhibition system.


In this paper, we introduce an MR system that superimposes a virtual environment
onto real exhibits: the Digital Diorama system. In a museum exhibition, a diorama is a
technique for showing usage scenes and situations of the exhibits by building a set
or painting a background image, much like a film set. The Digital Diorama system
aims to offer more than the function of existing dioramas in museum exhibitions. In
particular, our proposed system superimposes a computer-generated diorama scene on
an exhibit by using an HMD, a projector, etc. With this approach, the system can
present vivid scenes or situations to visitors together with the real exhibits: how the
object was used, how it was made, and how it moved. Based on this concept, we built
a prototype system that superimposes a virtual environment reconstructed from
related photographs/videos onto the real exhibit (Fig. 1).
The Digital Diorama system consists of two main component technologies. One is a
methodology for deriving the location at which the target image/video material was
taken: in order to integrate a virtual scene built from related photos or movies with the
exhibit, we propose an interactive method for estimating the relative position from
which the source photos or movies were taken. The other is a methodology for
superimposing the scene and the exhibit: by placing an eyehole-type device consisting
of an HMD and a webcam at the estimated position and presenting the superimposed
scene and exhibit, the user can experience a past historical scene intuitively. This
paper describes the implementation and evaluation of the Digital Diorama system,
mainly focusing on conveying historical scenes of museum exhibits at the Railway
Museum, Japan.

Fig. 1. Concept of Digital Diorama system

2 Related Works
There are some research projects on exhibition systems with cutting-edge display
technologies. For example, the Virtual Showcase proposed by Oliver Bimber et al. is a
storytelling exhibition system [1]. However, few of these research projects have
actually been introduced into museums, because they do not follow the conventions of
museum exhibition methods and conventional curators cannot see how to utilize them.
Such research projects aim to construct systems similar to the display cases or panels
already used in museums so that curators can utilize them naturally. Furthermore,
these systems are intended to show small exhibits.

2.1 Constructing Digital Data and AR Exhibit for Museum

Even though the conceptual idea of a digital system for small exhibits is easy to come
up with, conveying background information about large exhibits in museums is
complex and confusing. Large exhibits look very different depending on where we
stand, and exhibits may be far apart from each other. For these reasons, treating large
exhibits with a digital diorama is difficult.
We therefore considered how to convey background information. If there is a flat
white wall, a projector can be used to show the background, and there are many
research projects on projector-based AR techniques. For example, O. Bimber et al.
used a projector in a historical museum to explain and interact with pictorial
artworks [2]. This research can show information about the artwork, but the system
requires a diffuse projection screen when a large area must be covered. Handheld
projectors are useful for presenting individual information to each person. Yoshida et
al. proposed a handheld projector interaction system [3]; they estimate position and
posture relative to the wall based on the axis shift between the lens centers of the
projector and camera. Since the camera has to acquire a reference image, this system
is constrained by the optical environment.
Moreover, in many cases, such as spatially large museums, there may be no wall
onto which to project data, or it may be too bright to see the projected information.
We cannot present the circumstances or background of the exhibits this way because
there are no projection screens behind the exhibit. Therefore, we decided to use an
HMD and a camera for our Digital Diorama system.
An example of applying VR technology to a particular place with special features is
the "Open Air Exhibition" [4], which was presented at Japan Expo 2005. The "Open
Air Exhibition" is a prototype that creates an outdoor exhibition space using wearable
computers worn by visitors; it does not require a pavilion for the exhibition. There are
also research projects that recreate the past at cultural heritage sites. For example,
Ikeuchi et al. reconstructed Kawaradera in the Asuka area, a Buddhist temple in Japan
built in the seventh century, with AR technology [5]. Papagiannakis et al. reproduced
people's lives at the time before Pompeii was buried by volcanic ash [6]. Furthermore,
the Archeoguide system [7] provides not only augmented reality reconstructions of
ancient ruins but also on-site help, based on the user's position and orientation at the
cultural site.
Although these research projects consider the characteristics of the place, their
subjects are so old that only CG can be used to restore the exhibit. We decided instead
to use photographs/movies that show the exhibit in former days. Image-based
rendering (IBR) [8] is very useful for reconstructing past scenery from photos. If
enough pictures of the target exhibits are available, manual or automatic IBR
processes can be used [9-11]. These techniques use feature points to reconstruct
buildings, not only their outdoor appearance but also their interiors. They are
extremely useful for preserving the appearance of exhibits or scenery that exist today,
but they cannot reproduce a past appearance of which very few or even no photos
remain. For the same reason, Cyber City Walk [12] and Google Street View [13],
which aim to construct photorealistic virtual cities, are of limited use for our purpose,
although their accumulated data is very important and will become useful for our
research as time passes and the appearance of cities changes.

2.2 Estimation and Superimpose Methods

There is research on superimposing background images, such as depth keying [14].
That system uses chroma keying, a very orthodox way of superimposing a
background; the drawback of this method is that it requires a blue sheet behind the
exhibits.
When superimposing photos/movies onto a video image, a natural connection
between them is also important. If there is a small gap between the target
photos/movies and the video image, or in the estimate of the point where the photo
was taken, a simple blending method is not enough, because significant blurring
occurs as long as the capture positions are not identical. Poisson image editing [15] is
a way to generate a natural intermediate frame; with this method, a rough estimate is
sufficient for the digital diorama.
The Digital Diorama system requires estimating the point at which a photo was
taken. The eight-point algorithm [16] enables us to estimate the relative position from
the photo. When superimposing CG onto a video image, markerless tracking such as
PTAM [17] can be used. If the relative orientation with respect to the real world can
be detected, a virtual object can be superimposed onto the video image.

3 Concept and Implementation


In a museum exhibition, the diorama is a technique for conveying the usage scenes
and situations of the exhibits to visitors by constructing a three-dimensional full-size
or miniature model, or by painting a background image of the exhibits like a film set.
The Digital Diorama aims to realize the same function using mixed reality technology.
In particular, our proposed system superimposes a computer-generated diorama
scene on an exhibit by using an HMD, a projector, etc. With this approach, the system
can present vivid scenes or situations to visitors together with the real exhibits: how
they were used, how they were made, and how they moved. Based on this concept, we
implemented a prototype system that superimposes a virtual environment
reconstructed from related photographs and videos onto the real exhibit.

3.1 The Method of Locating the Point in Which the Photo Was Taken

In order to connect old photographs or videos with the exhibit, we propose a system
for estimating the relative position from which the photos or videos were taken. We
call an old photograph or video a "target image," meaning that the system aims to
guide the user to the position of the target image. This matching system consists of a
mobile PC and a webcam. First, the system acquires a current image of a target exhibit
from the webcam and compares it with the target image materials based on feature
points. Second, the system estimates the relative position from which the target image
was taken and guides the user to that position (Fig. 2). By repeating this procedure,
the system and user pinpoint the accurate position.
Fig. 2. Work flow of the estimation

Fig. 3. Coordinate frame used in the estimation

In this system, the user assigns three feature points in both the target image and the
current view. It is difficult to identify the same point in each image automatically,
because the past and current situations differ considerably. After this step, the system
tracks the assigned feature points in the current image and continuously outputs
directions: yaw, pitch, right/left, back/forward, and roll. The user moves according to
the directions presented by the system and can thus reach the past camera position. We
use the Lucas-Kanade algorithm [18] to track the three feature points in the video
view; with this tracking method, we can estimate the relative position with respect to
the exhibit continuously. It is important to estimate the relative position from only a
few feature points in order to keep the user's workload low. To estimate the relative
position from such limited information, the system makes the following assumptions.
− The height of the camera position is the same in the past and present images.
− The camera parameters are the same for the past and present cameras.
− The three feature points form a right angle in the real world.
− The object is upright.
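A minimal sketch of the point-tracking step with OpenCV's pyramidal Lucas-Kanade implementation is shown below; the three points are assumed to have been assigned by the user on an earlier frame, and the window size and pyramid depth are illustrative defaults rather than values from the paper.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Track the three user-assigned feature points from the previous frame to
// the current frame; returns false if any point is lost.
bool trackFeaturePoints(const cv::Mat& prevGray, const cv::Mat& currGray,
                        std::vector<cv::Point2f>& points /* in: prev, out: curr */)
{
    std::vector<cv::Point2f> next;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, points, next, status, err,
                             cv::Size(21, 21), 3);   // window size, pyramid levels
    for (uchar ok : status)
        if (!ok) return false;                       // a point was lost
    points = next;                                   // updated positions drive the guidance
    return true;
}
```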
The system outputs yaw and pitch instructions so that the average coordinates of the
past and present images coincide. It determines right/left instructions by using the
angle of the three feature points. The frame of reference is shown in Fig. 3. The eye
directions of the target image and the current image lie in the XZ plane, and both look
toward the origin. The three feature points lie in the XY plane. The angle between the
past eye direction and the Z axis is θtarget, and A is the camera parameter matrix. The
distance from the origin to the past viewpoint is dtarget, and the distance from the
origin to the current viewpoint is dcurrent. If the absolute coordinates of a feature point
are (X, Y, 0) and its coordinates in the image plane are (u', v', s), their relationship is
given by Equations 1 and 2.

(1)

(2)

The system determines θtarget and the absolute coordinates of the three feature points
by using these equations and the assumptions. Second, it calculates θcurrent, the angle
between the current eye direction and the Z axis, so that the angles formed by the
three feature points in the current image plane do not contradict the absolute
coordinates; dtarget is also assumed in this calculation. The distance Dright to move
right is determined by Equation 3.
(3)
When the system outputs back/forward instructions, it uses the difference in scale
between the images. lt1 and lt2 are the lengths of AB and BC in the image plane of the
target picture, and lc1 and lc2 are the lengths of AB and BC in the image plane of the
current picture. The distance Dforward to move forward is determined by Equation 4.

(4)

The system outputs roll instructions so that the slopes of AB and BC in the two image
planes coincide.
We ran a simulation to verify that the camera position estimation method is valid.
The unit of length in this simulation is meters. First, the past viewpoint was set to
dtarget = 50, θtarget = π/6, and the absolute coordinates were A(-10, 10, 0), B(10, 10, 0),
C(10, -10, 0). We shifted the current viewpoint over dcurrent = 40, 50, 60 and
-π/6 < θcurrent < π/3. Both past and current viewpoints were assumed to look at the
origin. In each simulation run, the current viewpoint moved according to the right/left
and back/forward directions given by the system and finally reached the point that the
system estimated to be the past camera position. The result is shown in Fig. 4. The
average error, i.e., the distance between the past viewpoint and the result of the
guidance, was 0.76 m; relative to dtarget, this error is about 1.5%.
Next, we ran another simulation in which the absolute coordinates of the feature
points were changed. In order to vary the angle formed by the feature points, the
absolute coordinates of A and B were changed as follows: (A(-10, 15, 0), B(10, 5, 0)),
(A(-10, 14, 0), B(10, 6, 0)), …, (A(-10, 6, 0), B(10, 14, 0)), (A(-10, 5, 0), B(10, 15, 0)).
The past viewpoint was dtarget = 50, θtarget = π/6, and the current viewpoint was shifted
over dcurrent = 50, -π/6 < θcurrent < π/3. The relationship between the angle of the feature
points in the absolute coordinate system and the estimation error is shown in Fig. 5.
According to this result, if the angle of the feature points is between 75 and 100
degrees, the error relative to dtarget is less than 5%, which means that the system can
estimate the camera position when the angle of the feature points is in this range.

Fig. 4. The result of guidance in a simulation

Fig. 5. The relationship between the angle and the errors

We then carried out an experiment estimating the camera positions of 26 past photos
in the Railway Museum. The photos show the steam locomotive C57. We succeeded
in estimating the camera positions of 22 of the past photos. The failures in three
photos were caused by occlusion from other exhibits, and the failure in one photo was
due to an incorrect assumption. Estimating the camera position of a past photo took
4 minutes 14 seconds on average. These results are shown in Fig. 6.

Fig. 6. Matching result between image materials and the position around the steam locomotive exhibit

Fig. 7. Prototype for superimposing omnidirectional video

3.2 Implementation of the Digital Diorama System

To present the real exhibits superimposed on the virtually reconstructed scene, we
installed two types of prototype display device at the estimated positions in front of
the exhibits at the Railway Museum (Fig. 7). The outline of the system that merges a
past event scene with the current scene of an exhibit is shown in Fig. 8.

Fig. 8. System chart of the Digital Diorama

The first prototype system consists of a webcam and an HMD for presenting the
integrated diorama scene using ordinary photos/movies taken in the past. This system
was fixed at the estimated camera position and attitude determined from the target
photo/movie and the exhibit using the method described in Section 3.1. A visitor can
look into this system as if looking through an observation window, and can observe
the atmosphere of the old days through this window.
The other prototype system is built for omnidirectional and panoramic movies. For a
more immersive experience, a wide field of view and interaction such as looking
around are effective. Furthermore, wide-angle lens cameras and mosaicing algorithms
are now popular, and the number of such images accumulated in databases around the
world is increasing, so generating or capturing omnidirectional/panoramic scenes is
easier than before. To let visitors experience such omnidirectional scenes, we
constructed this system to resemble a binocular telescope located on an observation
deck. The display part of this system is the same as that of the first prototype. The
system has a 2-DOF rotating mechanism, and the user can choose his/her view
direction by moving the display part freely (Fig. 7). According to the rotation angles
reported by the sensors of the rotating mechanism, the system presents the proper
view of the generated scene in that direction. By looking into this system, a visitor can
look around the composed omnidirectional scene through the binocular telescope
interface.
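The mapping from the mechanism's angles to a view is straightforward when the generated scene is stored as an equirectangular panorama; the sketch below extracts a sub-view around the reported yaw and pitch, with the field-of-view values and the simple crop (no perspective re-projection, no horizontal wrap-around) being our own simplifications.

```cpp
#include <algorithm>
#include <opencv2/opencv.hpp>

// Extract the portion of an equirectangular panorama centered on the yaw and
// pitch reported by the 2-DOF rotating mechanism.
cv::Mat panoramaView(const cv::Mat& pano, double yawDeg, double pitchDeg,
                     double hFovDeg = 60.0, double vFovDeg = 45.0)
{
    int w  = static_cast<int>(pano.cols * hFovDeg / 360.0);
    int h  = static_cast<int>(pano.rows * vFovDeg / 180.0);
    int cx = static_cast<int>((yawDeg + 180.0) / 360.0 * pano.cols);  // yaw in [-180,180)
    int cy = static_cast<int>((90.0 - pitchDeg) / 180.0 * pano.rows); // pitch in [-90,90]
    int x  = std::max(0, std::min(pano.cols - w, cx - w / 2));
    int y  = std::max(0, std::min(pano.rows - h, cy - h / 2));
    return pano(cv::Rect(x, y, w, h)).clone();
}
```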
For this system, we propose two superimposing methods described below.

Method A: Simple Alpha Blending (Fig. 9 top). First, the user watches the live video
image of the current exhibit through the eyehole-type device. Then a picture showing
both the background and the exhibit is superimposed by alpha blending. At the end,
the user watches the video image taken when the exhibit was in use.

Method B: Stencil Superimposing (Fig. 9 bottom). The user first watches the current
exhibit. Then the background of the picture, excluding the past figure of the exhibit, is
superimposed onto the video image by alpha blending. Finally, the exhibit area is also
replaced by the figure of the exhibit in the video images.
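Both methods come down to per-pixel compositing of the aligned frames. The sketch below is a simplified reading of the two procedures, assuming the archival frame has already been warped into registration with the live view and that the exhibit region is given by a pre-made stencil mask; it is not the authors' implementation.

```cpp
#include <opencv2/opencv.hpp>

// Method A: blend the whole archival frame over the live camera frame.
cv::Mat blendMethodA(const cv::Mat& live, const cv::Mat& archival, double alpha)
{
    cv::Mat out;
    cv::addWeighted(archival, alpha, live, 1.0 - alpha, 0.0, out);
    return out;
}

// Method B: blend only the background; the exhibit region (255 in the CV_8U
// stencil mask) keeps the live exhibit until the final step, when it is
// replaced by the exhibit's appearance in the archival footage.
cv::Mat blendMethodB(const cv::Mat& live, const cv::Mat& archival,
                     const cv::Mat& exhibitMask, double alpha, bool finalStep)
{
    cv::Mat out;
    cv::addWeighted(archival, alpha, live, 1.0 - alpha, 0.0, out);  // background blend
    if (finalStep)
        archival.copyTo(out, exhibitMask);   // replace the exhibit region as well
    else
        live.copyTo(out, exhibitMask);       // keep the live exhibit visible
    return out;
}
```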

Fig. 9. Superimposing a picture onto the video image

To evaluate the effectiveness of the Digital Diorama system, we asked seven subjects
to experience the generated diorama scene through these two prototype systems in the
Railway Museum. We installed the prototype systems at the estimated camera
positions, and the subjects watched movies that connect the present C57 with past
images of the C57. We then administered a questionnaire after they had experienced
our exhibition system. The results of the questionnaire are shown in Fig. 10.
The individual elements of the system, namely the validity of the camera positioning,
smooth blending, and video stability, received high evaluations, which means that the
system can connect past images with present exhibits naturally. The questionnaire
results also reveal problems with our display device: the subjects said that the
resolution (800 x 600) was not high and that they felt some distortion when looking
through the HMD. This problem can be solved by improving the display devices.

The question of whether the subjects could imagine the past situation received a
medium evaluation; some subjects said that with more content they could imagine the
past situation more vividly. The question of whether the subjects became absorbed in
the movies on the HMD also received a medium evaluation.
Moreover, many subjects said that the system was good for learning the
background context of the exhibits. A large proportion of the users preferred
Method B, whose superimposition helped them understand the background
information. On the other hand, with Method A a pseudo-motion effect occurred in
which the exhibit appeared to move: the exhibit's driving wheel, which did not
actually turn, looked as if it were turning when a movie of the running exhibit was
superimposed. For both systems, the subjects said that alpha blending with a
geometrically well-aligned image was very impressive, and most of them replied that
the system helped them understand the background information about the exhibits
more effectively than just looking at pictures of the exhibit. Users also said that the
voices and sounds in the movies are very important for understanding the scenery of
the years when the exhibit was in use.

Fig. 10. The result of user feedback (Left: Technical section, Right: Impression). Error bars
indicate variance and the yellow points indicate the answers from a curator.

4 Conclusion and Future Works


In this paper, we proposed the Digital Diorama system to convey background
information intuitively. The system superimposes real exhibits on a computer-generated
diorama scene reconstructed from related image/video materials. Our proposed system
is divided into two procedures. In order to switch between and superimpose real
exhibits and past photos seamlessly, we implemented a sub-system for estimating the
camera positions from which the photos were taken. We then implemented and
installed two types of prototype system at the estimated positions to superimpose
virtual scenes and the real exhibit in the Railway Museum. Seven subjects, including a
museum curator, experienced the constructed museum exhibition system and
answered our questionnaire. According to the results, the camera position estimation
and seamless image connection received good evaluations, and many subjects said
that our system was good for learning the contexts of the exhibits.
In future work, we will refine our prototype system to provide higher-quality
realistic scenes more effectively by using projector-based AR, 3D acoustic devices,
markerless tracking, etc. By reconstructing a 3D virtual environment [12] from
different types of image/video materials, the user will be able to move around the
exhibition slightly while experiencing a past event scene or time-tripping between the
current and past ages. Also, by accounting for lighting conditions [19] and
occlusion [20] in the superimposing process, the system can provide a more realistic
diorama scene to the visitors. Finally, we aim to introduce these systems for practical
use in the museum.

Acknowledgements. This research is partly supported by “Mixed Realty Digital


Museum” project of MEXT of Japan. The authors would like to thank all the members
of our project. Especially, Torahiko Kasai and Kunio Aoki, the Railway Museum.

References
1. Bimber, O., Encarnacao, L.M., Schmalstieg, D.: The virtual showcase as a new platform
for augmented reality digital storytelling. In: Proc. of the Workshop on Virtual
Environments 2003, vol. 39, pp. 87–95 (2003)
2. Bimber, O., Coriand, F., Kleppe, A., Bruns, E., Zollmann, S., Langlotz, T.: Superimposing
pictorial artwork with projected imagery. In: ACM SIGGRAPH 2005 Courses, p. 6 (2005)
3. Yoshida, T., Nii, H., Kawakami, N., Tachi, S.: Twincle: Interface for using handheld
projectors to interact with physical surfaces. In: ACM SIGGRAPH 2009 Emerging
Technologies (2009)
4. Ueoka, R., Hirose, M., Kuma, K., Sone, M., Kohiyama, K., Kawamura, T., Hiroto, K.:
Wearable computer application for open air exhibition in expo 2005. In: Proc. of the 2nd
IEEE Pacific Rim Conference on Multimedia, pp. 8–15 (2001)
5. Kakuta, T., Oishi, T., Ikeuchi, K.: Virtual kawaradera: Fast shadow texture for augmented
reality. In: Proc. of Intl. Society on Virtual Systems and MultiMedia 2004, pp. 141–150
(2004)
6. Papagiannakis, G., Schertenleib, S., O’Kennedy, B., Arevalo-Poizat, M., Magnenat-
Thalmann, N., Stoddart, A., Thalmann, D.: Mixing virtual and real scenes in the site of
ancient pompeii. Computer Animation & Virtual Worlds 16, 11–24 (2005)
7. Vlahakis, V., Karigiannis, J., Tsotros, M., Gounaris, M., Almeida, L., Stricker, D., Gleue,
T., Christou, I.T., Carlucci, R., Ioannidis, N.: Archeoguide: first results of an augmented
reality, mobile computing system in cultural heritage sites. In: Proc. of the 2001
Conference on Virtual Reality, Archeology and Cultural Heritage, pp. 131–140 (2001)
8. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from
photographs: a hybrid geometry- and image-based approach. In: Proc. of the 23rd Annual
Conference on Computer Graphics and Interactive Techniques, pp. 11–20 (1996)
9. Aoki, T., Tanikawa, T., Hirose, M.: Virtual 3D world construction by inter-connecting
photograph-based 3D models. In: Proc. of IEEE Virtual Reality 2008, pp. 243–244 (2008)
10. Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Reconstructing building interiors from
images. In: Twelfth IEEE Intl. Conference on Computer Vision, ICCV 2009 (2009)
11. Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: Exploring photo collections in 3D.
In: Proc. of SIGGRAPH 2006, pp. 835–846 (2006)
12. Hirose, M., Watanabe, S., Endo, T.: Generation of wide-range virtual spaces using
photographic images. In: Proc. of 4th IEEE Virtual Reality Annual Intl. Symposium, pp.
234–241 (1998)
13. Google Street View, http://maps.google.com/
14. Gvili, R., Kaplan, A., Ofek, E., Yahav, G.: Depth keying. In: SPIE, pp. 564–574 (2003)
15. Pérez, P., Gangnet, M., Blake, A.: Poisson image editing. ACM Trans. Graph. 22(3), 313–
318 (2003)

16. Hartley, R.I.: In defense of the eight-point algorithm. IEEE Transactions on Pattern
Analysis and Machine Intelligence 19, 580–593 (1997)
17. Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: Proc.
of ISMAR 2007, pp. 1–10 (2007)
18. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to
stereo vision. In: Proceedings of the 1981 DARPA Image Understanding Workshop, pp.
121–130 (1981)
19. Kakuta, T., Oishi, T., Ikeuchi, K.: Shading and shadowing of architecture in mixed reality.
In: Proceedings of ISMAR 2005, pp. 200–201 (2005)
20. Kiyokawa, K., Billinghurst, M., Campbell, B., Woods, E.: An occlusion-capable optical
see-through head mount display for supporting co-located collaboration. In: Proc. of
ISMAR 2003, pp. 133–141 (2003)
Augmented Reality: An Advantageous Option for
Complex Training and Maintenance Operations in
Aeronautic Related Processes

Horacio Rios, Mauricio Hincapié, Andrea Caponio,


Emilio Mercado, and Eduardo González Mendívil

Instituto Tecnológico y de Estudios Superiores de Monterrey, Ave. Eugenio Garza Sada 2501
Sur Col. Tecnologico C.P. 64849 — Monterrey, Nuevo León, México
{hrios1987,maurhin}@gmail.com, Andrea.caponio@yahoo.com,
emelio_45@hotmail.com,
egm@itesm.mx

Abstract. The purpose of this article is to compare three different methodologies for
the transfer of knowledge of complex maintenance and training operations in
aeronautical processes. The first is the Traditional Teaching Technique, which uses
manuals and printed instructions to perform an assembly task; the second is the use of
audiovisual tools to give more information to operators; and the third is an Augmented
Reality (AR) application that achieves the same goal by enhancing the real environment
with virtual content. We developed an AR application that runs on a regular laptop with
stable results and provides useful information to the user during the 4 hours of training;
a basic statistical analysis was also performed to compare the results of our AR
application.

Keywords: Augmented Reality, Maintenance, Training, Aeronautic Field.

1 Introduction
During the past 20 years there has been a technological revolution like no other: the
continuous advances in information technology are changing our daily lives in an
unprecedented way. Currently, aviation industries are looking for new technologies that
achieve repeatable results in learning processes used to train people for repair and
maintenance operations, and the driving factor is the cost of implementing such tools.
The priority on a low-cost/high-efficiency goal stems from the 2009 recession, which is
highlighted as a great concern since the Aircraft MRO (Maintenance, Repair & Overhaul)
Market Forecast from OAG [12] estimated that the growth rate in this part of the industry
will be just 2.3%, compared to 6% per annum in previous years. In the present work we
analyze a solution for improving the cognitive process of a maintenance operator in the
aviation business by using Augmented Reality technology in the assembly of a training
kit for the RV-10 airplane, which is commercially sold for practicing the assembly
operations of this aircraft.


This paper begins by examining related work and applications of augmented reality
systems and how they performed in controlled environments and in situ prototypes. In
the next section we present a detailed description of the components of a marker-based
AR application and how they relate to each other in our case study. In the following
segment the training tool kit for the RV-10 airplane is presented as the case study, and
the results are offered for the 4-hour assembly training of the RV kit using a low-cost
system created to guide the operator through every operation, including preparing tools,
removing edges, deburring holes, riveting and drilling components. We then compare the
results against other technologies tested in our research group: the Traditional Teaching
Technique with printed instructions (TTT) and the Multimedia Design Guide (MMDG).
The latter enhances the content presented to the student with audiovisual tools in order
to achieve better learning and a natural cognitive process.

2 Augmented Reality Background


Augmented Reality (AR) has become the center of attention for many industries with
assembly or maintenance processes because of the feasibility of AR applications. In the
next section we present the state of the art of AR applications in maintenance, as well
as the different prototypes and technologies used to create them.

2.1 Maintenance Applications of AR Systems

In 2004 Zenati and colleagues [15] described a prototype maintenance system based on
distributed augmented reality in an industrial context. This research group achieved good
results with virtual overlays by using an optical tracking system and calibration
procedures based on computer vision techniques, in which the user's job was to align the
perceived virtual marker in the scene with a physical target; a head mounted display was
used to perform the maintenance task. In other work they presented UML notation as the
option of choice for the ergonomic and software design process of AR applications
through the V-model, providing the operator with details that cannot be accessed during
a maintenance task.
At Columbia University, on the other hand, Henderson and Feiner [8] presented the
design, implementation, and user testing of a prototype augmented reality application for
United States Marine Corps (USMC) mechanics operating inside an armored vehicle
turret. The prototype uses a tracked head-worn display to augment a mechanic's natural
view with text, labels, arrows, and animated sequences designed to facilitate task
comprehension and execution. The researchers use 10 tracking cameras to achieve ample
coverage of the position of the user's head without being too concerned with the
disadvantages of adding a large number of cameras to the turret. Important findings of
the research were that the augmented reality condition allowed mechanics to locate tasks
more quickly than when using an improved version of currently employed methods, and
in some instances resulted in less overall head movement. In [9], the same authors tested
opportunistic controls (OC) as a user interaction technique. This is a tangible user
interface that leverages

the properties of an object to determine how it can be used; this prototype implementation
supported faster completion times than those of the baseline.
In the case study presented by Wang [14], the author proposes AR applications for
equipment maintenance. The system is based on the projection and detection of infra-red
markers to replace the air filter and service the gear-box of a milling machine. Later, in
an interesting approach, Abate and colleagues [1] provided the ASSYST framework,
which aims to enhance existing Collaborative Working Environments by developing a
maintenance paradigm based on the integration of augmented reality technologies and a
tele-assistance network architecture, enabling support for radar system operators during
system failure detection, diagnosis and fixing. The authors use an approach in which
support for maintenance procedures is provided by means of a virtual assistant visualized
in the operational field and able to communicate with the human operator through an
interaction paradigm close to the way in which humans are used to collaborating,
improving on the results that a screen-based interface can offer.
A system involving markerless CAD-based tracking was developed by Platonov and
colleagues [13]. The application was based on an off-the-shelf notebook and a wireless
mobile setup consisting of a monocular wide-angle video camera and an analog video
transmission system; it used well-known concepts and algorithms for the AR application
and was tested for the maintenance of BMW 7 Series engines, resulting in stable
behavior.
Another example is the research done at Embry-Riddle Aeronautical University [6],
which analyzed the use of an augmented reality system as a training medium for novice
aviation maintenance trainees. An interesting finding of their work was that an AR
system could reduce the cost of training and retraining personnel by complementing
human information processing and assisting with the performance of job tasks.
In the work presented by Macchiarella and colleagues [10], AR was used to develop an
augmented-scenes framework that can complement human information processing, a
complement that can manifest as training efficiency applicable to a wide variety of flight
and maintenance tasks. During the investigation the authors determined that AR-based
learning affects long-term memory by reducing the amount of information forgotten over
a seven-day interval between an immediate-recall test and a long-term-retention recall
test. However, the authors make a very important point in suggesting that further research
is necessary to isolate the human variability associated with cognition, learning and the
application of AR-based technologies as a training and learning paradigm for the
aerospace industry.
Another type of AR system application is shown in the SEAR (speech-enabled AR)
framework, which uses flexible and scalable vision-based localization techniques to
offer maintenance technicians a seamless multimodal user interface. The user interface
juxtaposes a graphical AR view with a context-sensitive speech dialogue; this work was
done by Goose et al. [5]. De Crescenzio and their research group [2] also implemented
an AR system that uses SURF (Speeded-Up Robust Features) and processes subsampled
320x230 video at approximately 10 fps, trading off accuracy to allow fluid rendering of
augmented graphical content. They tested the system in the daily inspections of the
Cessna C.172P, an airplane that flight

schools often use. As shown, many research groups are advancing the state of the art of
Augmented Reality technology, but there is still much work to do, and the biggest
concern of industry is the cost that such applications can have in real maintenance
operations and uncontrolled work environments. In the following sections we explain
our case study of a four-hour maintenance operation using low-cost technology.

2.2 AR Components

As shown previously, there are many types of augmented reality applications with
different content. This section presents the components that AR applications have in
common; even though some of them may be omitted in a particular application, these
are the most common parts taken into account when developing an AR prototype. The
four main components are the display, the tracker, the content and the marker. Each
element has a fundamental role in augmented reality applications, and each is described
below; a minimal sketch of how they fit together follows the list.
• Display. The principal sense stimulated in an AR application is sight. This means
that the content must be shown on a display, and accordingly the application may
change along with the human-machine interaction. It is therefore very important to
have a display that suits the information to be transmitted.
• AR tracking. In order to accurately overlay the content on the real environment, it
is essential to identify the environment and thereby update the content's location.
That is the main purpose of the tracking software and tools in an AR application: to
determine where the marker is; depending on its position, the content may or may
not change. This change is measured with respect to a reference point, which is
normally called a marker.
• AR content. The information that can be displayed in an AR application is the same
content that can be displayed on a computer, such as videos, music, 3D models and
animations. The content is subject to the author's needs and objectives.
• Marker. The marker is the reference point used to track changes in the environment
and can therefore be used as a reference to deploy content in an application. There
are many marker types and technologies; one of the most commonly used is the
geometric marker. One advantage of this marker is that it is easy to use and its data
processing requirements are affordable on common hardware (e.g., a home desktop).
In markerless applications, the object itself is used as the marker to deploy the AR
content.
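
To make the interplay of these four components concrete, the following minimal sketch (not part of the original work) uses OpenCV's ArUco module as a stand-in marker tracker; the camera intrinsics, the marker dictionary and the pre-4.7 cv2.aruco API are assumptions made purely for illustration.

# Minimal sketch of how the four AR components interact, using OpenCV's ArUco
# module (pre-4.7 cv2.aruco API assumed) as a stand-in marker tracker; this is
# not the system described in this paper.
import cv2
import numpy as np

# Assumed camera intrinsics; in practice they come from a calibration step.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
marker_length = 0.05                       # marker side length in metres

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters_create()

cap = cv2.VideoCapture(0)                  # the camera feeding the display
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Tracking: find markers and estimate their pose relative to the camera.
    corners, ids, _ = cv2.aruco.detectMarkers(frame, dictionary, parameters=params)
    if ids is not None:
        rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
            corners, marker_length, camera_matrix, dist_coeffs)
        # Content: a coordinate-axis gizmo stands in for the 3D model, video
        # or text that a real application would render at the marker pose.
        for rvec, tvec in zip(rvecs, tvecs):
            cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs,
                              rvec, tvec, marker_length * 0.5)
    # Display: the composited view shown to the user.
    cv2.imshow("AR display", frame)
    if cv2.waitKey(1) == 27:               # Esc quits
        break
cap.release()
cv2.destroyAllWindows()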

3 Design of Experiment

3.1 Methodology

The methodology followed during the course of the investigation is shown in Figure 1.
The first and very important step is to perform proper research on the state of the art of
the technology available for Augmented Reality. We then selected the case study, and
from this point on all decisions take the selected case study into consideration.

The software selection includes the programs required to create the 3D models and
animations; the software that will receive all the created objects, compile the application
script and run the application as such must also be considered. The next step is to create
a storyboard in which the different steps are defined, with a brief description of each step,
the animation and the screen layout. Using the storyboard as a guide, the models and
animations are created, and concurrently the program script is written to test functions
and models. When the application prototype is done, a test is performed to check all steps
and make any corrections needed to obtain the final version of the application. The
definition of the experimentation can start right after the case study is selected and can
proceed simultaneously with the application development. In this part the type of
experiment is defined, as well as the evaluation parameters, the number of experiments,
the location and the testing profile. The last step in the process is gathering all the data
for proper analysis, and finally we must reach conclusions from the experiments.

Fig. 1. Methodology followed to create the AR application for the RV-10 training kit
The complete integration of our system is what yields trustworthy results. This
integration may be seen as a loop of harmonization between the human part of the
system, the station, the complete set of components and the tools that are used to
complete the task. If any component of the loop breaks the equilibrium for any reason,
the AR system will be affected and the task may not be completed successfully; for
example, a person who is not willing to work with the AR technology could lead to
longer assembly times or the destruction of the system, and a bad setup of the work
environment while using AR will generate stress in the operator when he tries to fit the
technology into the wrong medium. The loop of interaction of our AR application is
shown in Figure 2.

Fig. 2. Interaction loop between the components of the AR application for the RV-10 training kit

3.2 Experiment Description

The interaction between the elements shown in Figure 2 is what defines the reliability
of our data. We describe here all the components of our experiment. Human
participation: the test subject is a person with an engineering background; we do not
impose age as a restriction. The AR station is made up of an Acer Aspire 3680 computer
with an Intel Celeron M processor, 1.5 GB of DDR2 RAM and an Intel Graphics Media
Accelerator 950. The laptop was chosen with the portability of the

system in mind. The camera used is a Micro Innovation Basic Camera model IC50C.
The markers have a black-and-white design to be recognized by the software; the
geometric design is printed on a white sheet of regular paper with a laser printer. The
software used to create the application is the result of a joint collaboration between UPV
in Valencia, Spain, and Tecnológico de Monterrey (ITESM at Monterrey) in México.
The script was written in the C-lite language and runs under the Gamestudio A7
compiler. The application can run as a self-executable archive, allowing it to run on
different computers.

Fig. 3. RV-10 AR experiment: (left) drilling operation, (right) riveting operation
The RV training kit was assembled with all the parts and accessories (rivets). It is bought
directly from the supplier, Van's Aircraft. The kit is sold for training in the basic
operations performed during the assembly of an aircraft and consists of 12 aluminum
parts; the kit is shown in Figure 3 while the drilling and riveting operations are being
performed.
The tools needed to perform the assembly correctly are the following: rivet gun, hand
drill, rivet cutter, priming machine, C-clamps, metal cutting snips and deburring tool. An
important remark is that the assembly must be done indoors with good illumination.
The experiment consisted of one test subject at a time performing the assembly. The
person was introduced to the background of the instruction kit, and the different tools
were explained along with safety instructions. Then an introduction to the software was
given in which the basic instructions were explained; for example, the operators were
trained to keep the markers unobstructed if they wanted to play the AR animation, and
the basic commands with mouse and marker were shown. Next, the assembly kit was
presented and the test subject was allowed to start the work. The trainer answered any
questions regarding the AR application, but not about the RV assembly kit or the way it
is supposed to be joined.

Fig. 4. (Left) Display of the AR application for RV training kit, (Right) Parts of the kit that
form the assembly

The AR application for the RV kit is formed by 12 steps, 11 of which are manual
operations that include preparing the tools, removing edges, deburring holes, riveting
and drilling components. The list of operations for each block is displayed at the side of
the figure with a brief description of each operation.

4 Experiment Results
According to design of experiments (DOE) theory, any process can be divided into
certain parts in order to study and understand their impact on the process. A process has
inputs which are transformed into outputs by a configuration of parameters. The
parameters that can be modified in order to change the output of the process are called
factors. The factors should be parametric in order to measure the changes made and thus
the difference in the output, according to Guerra [11]. The factors can take different
configurations or values within the process to change the output; these configurations
are called levels, and each factor can have different levels which are tested during the
experiment.
In the case of the experiment performed, and according to design of experiments theory,
the experiment can be described as follows. There is one factor evaluated during the
experiment: the method used to transfer the knowledge. The configurations available for
this factor are the Traditional Teaching Technique (TTT), the Multimedia Design Guide
method by Clark [3] (MMDG: the use of audio and video to enhance learning) and the
AR method; these are therefore the levels. This means the process has one factor with 3
levels. The outputs of the process measured during the experiments were the time, the
errors and the questions; these are the 3 quantifiable parameters. Another aspect of the
experiment is where it physically took place: the location was the Manufacturing Lab,
specifically the designated work area for the RV Project, because it has enough space
and provides good conditions to perform the experiment.

Fig. 5. RV-10 toolkit assembly process flow diagram
The test subjects, in this case the users, are a total of 7 male engineering students. None
of the students had any prior knowledge of the assembly, the process or the tools used
for the instruction kit. The users were all scheduled individually for the experiments;
they worked under a supervisor who introduced them to the augmented reality instruction
kit and its interface. The supervisor explains to the user the kit's control commands and
the different types of content (text, video, 3D) displayed, and introduces all the tools the
user will use during the assembly.
Once the time starts running, the users begin with the adaptation process for the
instruction kit interface (this time is included in the total assembly time) and then proceed
with all the steps in the guide. The supervisor is with the user at all times and is allowed
to answer any question asked. The supervisor helps the user in certain operations of the
guide because those operations must be done by 2 people; nonetheless, the supervisor is
an objective assistant and helps the user according to the user's instructions.

The supervisor is in charge of gathering data for the questionnaire; the time, the number
of errors and the number of questions are collected by the supervisor during the
assembly. Afterwards, when the user finishes the assembly, the second part of the
questionnaire is completed with 4 open questions. The results of the experiment are
presented in two categories, the first quantitative and the second qualitative. The
quantitative data are grouped in Table 1. The three measured parameters were: the time
the person took to complete the assembly, which is intended to establish whether
augmented reality improves the user's understanding of the assembly and thus allows it
to be done faster; the errors found in the assembly, ranging from the misplacement of a
part to bad riveting or misalignment; and the questions asked by the user that could not
be answered with the AR guide.

Table 1. Results from the AR method

Experiment Time (min) Error (qty) Questions (qty)


1 248 3 4
2 210 2 3
3 236 2 3
4 284 2 5
5 197 0 1
6 166 0 2
Total average 223.5 1.5 3

The analysis of the data extracted from the experiment began with a goodness-of-fit test
to check whether the sample follows a normal distribution. The Kolmogorov-Smirnov
normality test with a confidence level of 95% was used to validate the sample; the test
was done with the Minitab data analysis tools. These tests were run for the 3 quantitative
parameters: time, errors and questions. The results show that the parameters follow a
normal distribution except for the error parameter.
We compare our data against the TTT and MMDG methods that were tested before the
AR application by M. Guerra in [11]. The results shown in Figure 6 are normalized to 1
with respect to the maximum value of each parameter. The time recorded to complete
the assignment was 223.5 minutes for AR and 218 minutes for MMDG, a difference of
4.5 minutes that falls within the confidence interval obtained from the t-test on the sample
data (95% for this parameter); therefore it can be assumed that both methods lead to
similar assembly times, but the data show a difference from the TTT method with 243
minutes. From the error comparison we conclude that AR represents the option of choice
in aeronautical assembly, with only 1.5 errors on average against 2 and 6 errors for TTT
and MMDG. The number of questions was the parameter that showed the biggest
difference: the questions asked by users with the AR method (3 questions) were 2.3 times
fewer than with the MMDG method (7 questions), which indicates that users have more
information available and thus less need for external assistance, a considerable benefit
for a 223-minute assignment. For the TTT method we got 5 times more questions
compared to the AR method, which is a reflection of the understanding of the assembly
and better performance of the steps.
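
As an illustration of the kind of analysis described above, the following sketch reproduces the normality check and the confidence-interval computation in SciPy instead of Minitab. It uses only the AR assembly times from Table 1, since the raw MMDG and TTT samples were collected in [11] and are not reproduced here; note also that fitting the normal parameters from the sample makes the KS p-value approximate.

# Sketch of the analysis described above, using SciPy in place of Minitab.
# Only the AR assembly times from Table 1 are used; the raw MMDG/TTT samples
# come from [11] and are not reproduced in this paper, so no two-sample test
# is run here.
import numpy as np
from scipy import stats

ar_times = np.array([248, 210, 236, 284, 197, 166], dtype=float)  # minutes

# Kolmogorov-Smirnov test against a normal distribution fitted to the sample
# (fitting the parameters from the data makes the p-value approximate).
z = (ar_times - ar_times.mean()) / ar_times.std(ddof=1)
ks_stat, p_value = stats.kstest(z, "norm")
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")

# 95% confidence interval for the mean assembly time (t distribution).
ci = stats.t.interval(0.95, df=len(ar_times) - 1,
                      loc=ar_times.mean(), scale=stats.sem(ar_times))
print(f"mean = {ar_times.mean():.1f} min, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
# The paper's argument compares this interval with the MMDG mean (218 min)
# and the TTT mean (243 min).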

It is important to emphasize that the statistical study is still in progress and will be
presented in future work, but the results obtained so far show an excellent tendency for
the AR technology when compared to the other teaching techniques. We are in the
process of obtaining the sample needed to reach a 95% confidence interval for all the
parameters; the data presented in this work will help to determine that number because
they give us a perspective on the variation of the assembly process under real conditions.

Fig. 6. Comparison of the results to complete the assembly between the AR, MMDG and TTT methods
For the qualitative results we selected two main topics to address: the impact of
augmented reality on the motivation factor and the ease-of-use factor. These topics were
evaluated with open questions to the users in a questionnaire. In most cases the users
were initially motivated by the new technology; they were eager to use it and thus, in the
first minutes of the experiment, tended to focus on how to use the Augmented Reality
rather than on the experiment per se. The users' initial motivation was in some cases
overcome by the length of the assembly, especially when the assembly took more than
4 hours. All users considered augmented reality a tool that made it easier to understand
the instruction kit, but they also commented that the control and display tools could be
improved and changed to mouse control instead of marker control.

5 Conclusions
We developed an AR system for aeronautic maintenance and training that uses a regular
laptop computer, to show that this technology can be used without high-capacity video
cards and to demonstrate its feasibility for aircraft training.
During the experiments, the users of the AR application had a better appreciation of
spatial and depth concepts; this means that during the assembly, when the initial steps
involved smaller parts, it was easier for them to understand and perform the correct step.
In comparison, with the multimedia method the pictures are in some cases insufficient
to completely explain a step.
Users have a period of assimilation of the new technology. The interface is easy to
understand, yet it needs improvement to completely fulfill user needs, especially
regarding the control of the 3D models and animations. When manipulating the interface
controls, the users preferred the mouse over the markers on every occasion. We conclude
that users perform the steps with a higher degree of confidence: although they are not
always right, they are able to generate a complete picture/idea in their mind and replicate
it physically. On the other hand, using MMDG the users still have doubts about the
procedure and/or parts, and thus perform with a degree of doubt that lowers their
confidence in the assembly done.
As shown, the number of operators chosen as test subjects yielded good results, but the
intention of this paper is to present the progress of the work done so far and the tendency
of the results when compared to the Traditional Teaching Technique during a 4-hour
maintenance task. Now that we have data from real work

conditions and the points of view of people with different backgrounds, we are able to
estimate the sample size needed to present all the parameters with a 95% confidence
interval and to demonstrate that our sample is representative of the MRO population in
the aviation industry; these results will be presented in future work.

References
1. Abate, A., Nappi, M., Loia, V., Ricciardi, S., Boccola, E.: ASSYST: Avatar baSed
SYStem maintenance. In: Radar Conference (2008)
2. De Crescenzio, F., Fantini, M., Persiani, F., Stefano, L.D., Azzari, P., Salti, S.: Augmented
Reality for Aircraft Maintenance Training and Operations Support
3. Clark, W.: Using multimedia and cooperative learning in and out of class. In: Frontiers in
Education Conference, Worcester Polytechnic Institute, MA
4. Ghadirian, P., Bishop, I.D.: Integration of augmented reality and GIS: A new approach to
realistic landscape visualisation. Landscape and Urban Planning 86, 226–232 (2008)
5. Goose, S., Sudarsky, S., Zhang, X., Navab, N.: Speech-enabled augmented reality
supporting mobile industrial maintenance. Pervasive Computing, 65–70 (2003)
6. Haritos, T., Macchiarella, N.: A mobile application of augmented reality for aerospace
maintenance training. In: Digital Avionics Systems Conference DASC (2005)
7. Henderson, S.J., Feiner, S.: Evaluating the benefits of augmented reality for task
localization in maintenance of an armored personnel carrier turret. In: 8th IEEE
International Symposium on Mixed and Augmented Reality, ISMAR 2009, pp. 135–144 (2009)
8. Henderson, S., Feiner, S.: Exploring the Benefits of Augmented Reality Documentation for
Maintenance and Repair. IEEE Transactions on Visualization and Computer Graphics X
(2011)
9. Henderson, S., Feiner, S.: Opportunistic Tangible User Interfaces for Augmented Reality.
IEEE Transactions on Visualization and Computer Graphics (2010)
10. Macchiarella, N.D.: Effectiveness of video-based augmented reality as a learning paradigm
for aerospace maintenance training (2004)
11. Moreno, G., Miguel, A.: Applying Knowledge Management and Using Multimedia for
Developing Aircraft Equipment (Master degree thesis, ITESM) (2008)
12. OAG Aviation. June 11 (2009), http://www.oagaviation.com/News/
Press-Room/Air-Transport-Recession-Results-in-3-Years-of-
Lost-MRO-Market-Growth (accessed 2011)
13. Platonov, J., Heibel, H., Meier, P., Grollmann, B.: A mobile markerless AR system for
maintenance and repair. In: Mixed and Augmented Reality ISMAR 2006, pp. 105–108
(2006)
14. Wang, H.: Distributed Augmented Reality for visualization collaborative construction task
(2008)
15. Zenati, N., Zerhouni, N., Achour, K.: Assistance to maintenance in industrial process using
an augmented reality system. Industrial Technology 2, 848–852 (2004)
Enhancing Marker-Based AR Technology

Jonghoon Seo, Jinwook Shim, Ji Hye Choi, James Park, and Tack-don Han

134, Sinchon-dong, Seodaemun-gu, Seoul, Korea


{jonghoon.seo,jin99foryou,asellachoi,
james.park,hantack}@msl.yonsei.ac.kr

Abstract. In this paper, we propose a method that solves both the jittering and the
occlusion problems, which are the biggest issues in marker-based augmented reality
technology. Because we refine the pose estimation by using the multiple keypoints that
exist in cell-based markers, we can estimate a pose that is robust against jittering.
Additionally, we solve the occlusion problem by applying tracking technology.

Keywords: Marker-based AR, Augmented Reality, Tracking.

1 Introduction
Augmented reality (AR) technology combines digital information with the real
environment and supplies information that is missing in the real world [1]. Unlike virtual
reality (VR) technology, which substitutes the real environment with computer-generated
graphics, AR technology annotates the real environment [2].
In order to implement augmented reality, many component technologies are needed,
such as detection, registration and tracking. For practical implementations, registration
and tracking technology must be researched. Marker-based tracking, which uses visual
fiducial markers in the real world, offers more robust registration and tracking quality.
Although this technology places visually obtrusive markers in the real world, it can offer
more precise and faster tracking, so it is used in various commercial AR applications.
Conventional marker-based tracking, however, has used a minimum number of features
to estimate the camera pose, so it is fragile under noise. Also, it performs detection in
every frame without tracking technology, so it cannot augment the scene when detection
fails. In this paper, to overcome these marker-inherent problems, we propose a method
that hybridizes marker-based tracking with marker-less tracking technology. In marker-less
tracking, multiple features are used to estimate the camera pose, so it provides more
robust estimation than marker-based tracking. Marker-less tracking also adopts tracking
technology, so it can continue augmenting even under detection-failure conditions. In
this paper, we adopt these marker-less tracking techniques in marker-based tracking: we
use multiple feature points in marker-based tracking to provide more robustness against
noise, and we implement a feature-tracking method to avoid the detection-failure
condition.


In Chapter 2, the related existing marker-based AR technology is described. We explain
the proposed method in Chapter 3: the method that solves the jittering problem in
Section 3.1 and the method that solves the occlusion problem in Section 3.2. Chapters 4
and 5 present the results obtained through experiments using our method and the
conclusion, respectively.

2 Related Works
Marker-based augmented reality has been researched actively because of its advantages
in accuracy and speed. Markers were developed in various shapes, such as circles or
LEDs, but finally settled on the square-type marker, which has been studied by many
researchers since Matrix [3] and ARToolKit [4] were developed. These systems recognize
the contour of the square and identify it as a marker if the pattern inside is acceptable.
They augment a 3D object by estimating the 3D pose from the perspective distortion of
the four vertices. Among these markers, ARToolKit [4], invented at HITLab, is the most
famous in the HCI field. ARTag [5] was developed to improve ARToolKit's recognition
performance. In addition, TU Graz developed the Unobtrusive Marker [6], mitigating the
unrealistic appearance of markers, and GIST developed the Simple Frame Marker [7] to
increase marker readability. However, these studies aim to improve the performance of
the marker itself.

Fig. 1. Various AR Marker Systems

However, the problems of marker-based augmented reality still exist even with these
systems.

Marker-based augmented reality estimates the 3D pose using a specific, limited number
of points, so the pose changes with video noise, and jittering problems exist because of
that. To solve this problem, some approaches have applied signal processing theory [8].
Nevertheless, these methods suffer from a large amount of computation and unnatural
movement, since they use history information to reduce the effect of noise. Additionally,
without any tracking process there is the instability problem [9], in which the augmented
object disappears even though the marker exists, because the code area is computed only
for regions where a marker is detected and detection can fail. This is considered a serious
problem for interaction when the marker is covered by a hand.
In marker-less tracking, these problems do not exist. Since marker-less tracking
technology uses a large number of feature points for pose prediction and matching, the
noise effect decreases through averaging [10]. In addition, a tracking process is applied
to compensate for the slow speed of the matching step, and tracking can be maintained
robustly in detection-failure situations such as occlusion thanks to the many feature
points and the tracking technology.

3 Proposed Method
We propose a robust pose estimation method which uses multiple keypoints and an
occlusion-reduction method which uses feature-tracking.

3.1 Jittering Reduction Method

Augmented reality analyzes the video, calculates the pose and location of the camera in
the world coordinate system, and moves the object to be augmented by moving the
graphics camera accordingly. To do this, it extracts feature points from the video and
estimates the camera's location and pose based on these feature points. As mentioned
above, previous marker tracking technology predicts the pose and location of the camera
using only the four outermost vertices of the marker. Because of video noise, the
prediction result can fluctuate, and this causes the jittering of the augmented object. The
proposed method reduces jittering through an averaging effect by using more feature
points, hybridizing with marker-less tracking technology. In particular, there are many
good points to select as features, since each cell of markers designed with digital cells
(i.e., ARTag, ARToolKit Plus, etc.) has a clear boundary. The proposed method estimates
the pose by using the cell boundaries as feature points.

Fig. 2. Feature points used to estimate the camera pose. Black discs represent conventional
keypoints, and white dots are the additional keypoints we adopt.

However, to estimate the pose with additional points, the marker coordinates of the
additional feature points have to be calculated. For this, the following steps are
performed.

Fig. 3. Multiple-keypoint based robust pose estimation process

First, corner detection is performed on the area recognized as the marker. In this paper,
we use the method of [11]. With this method, we can find the points with strong
cornerness in the image.

(1)

Additionally, the image coordinates of the ideal corners are calculated in the area
recognized as the marker. Ideal corners are points located on the cell boundaries of the
marker; they can appear at a corner, on an edge or in the inner area, depending on the
marker ID. The marker coordinates of the ideal corners are already known. By
transforming the ideal corners according to the detected marker area, we obtain the ideal
corners transformed by the pose.

(2)

By matching the detected corners (Pc) and the ideal corners (Pi) calculated with this
method, we obtain the filtered keypoints (Pf). These filtered keypoints have strong
cornerness in the image, and the marker coordinates to which they map are known.

(3)

Since both the image coordinates and the marker coordinates of these feature points are
known, they can be used for pose estimation. Using these multiple points, the system can
estimate the pose from all of the extracted pose-estimation feature points.
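
The following rough sketch illustrates this multiple-keypoint idea with OpenCV as a stand-in for our implementation: strong corners are detected in the marker region, matched to the projected ideal cell-boundary corners, and the pose is estimated from all matched correspondences with a PnP solver. The function name, matching radius and corner-detector settings are assumptions for illustration only; using many correspondences is what produces the averaging effect described above.

# Rough sketch of the multiple-keypoint idea using OpenCV as a stand-in for
# the authors' implementation.
import cv2
import numpy as np

def refine_pose(gray, ideal_marker_pts, ideal_image_pts,
                camera_matrix, dist_coeffs, match_radius=4.0):
    """ideal_marker_pts: Nx3 cell-boundary corners in marker coordinates.
    ideal_image_pts: Nx2 positions of the same corners predicted from the
    initial four-vertex pose. Returns (rvec, tvec) or None."""
    # Detect points with strong cornerness inside the marker region.
    detected = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=3)
    if detected is None:
        return None
    detected = detected.reshape(-1, 2)

    obj_pts, img_pts = [], []
    for marker_pt, predicted in zip(ideal_marker_pts, ideal_image_pts):
        # Filtered keypoints: keep the detected corner closest to each ideal
        # corner, provided it falls within a small matching radius.
        dist = np.linalg.norm(detected - predicted, axis=1)
        j = int(np.argmin(dist))
        if dist[j] < match_radius:
            obj_pts.append(marker_pt)
            img_pts.append(detected[j])

    if len(obj_pts) < 4:
        return None   # fall back to the conventional four-vertex pose
    ok, rvec, tvec = cv2.solvePnP(np.asarray(obj_pts, dtype=np.float32),
                                  np.asarray(img_pts, dtype=np.float32),
                                  camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None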

3.2 Robust Marker Tracking Method

Previous marker-based augmented reality augments an object by finding the marker with
standard detection methods. These methods have the disadvantage that the object
disappears on detection failure, and failure can occur for several reasons, e.g., occlusion,
shadow, lighting change, etc. We solve this problem by adopting feature tracking
technology. The augmented reality toolkit operates on the input image, detects the
marker, and keeps the detected feature points as keypoints. When a detection failure
occurs during tracking, the KLT feature tracking algorithm is run on the keypoints kept
from before, and the pose is estimated from the tracked keypoints. Because this process
is performed after a marker was detected in a previous frame, the system already knows
the marker's ID and needs only the marker's feature points, so the augmentation can
continue to work. Figure 4 shows the previous method (left) and our method, which keeps
tracking when failure occurs (right).

Fig. 4. Feature tracking process to overcome detection failure. The left comparison is the same as
the conventional method, and the right comparison shows the tracking method when failure occurs.
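
A hedged sketch of this fallback follows, again using OpenCV rather than our toolkit: if detection fails in the current frame, the keypoints kept from the last successful detection are tracked with pyramidal Lucas-Kanade optical flow and the pose is estimated from the surviving points. The detect_fn callback and the bookkeeping of previous points are assumptions made for illustration.

# Hedged sketch of the detection-failure fallback: if the marker detector fails,
# KLT-track last frame's keypoints and estimate the pose from the survivors.
import cv2
import numpy as np

def estimate_pose(prev_gray, gray, prev_img_pts, prev_obj_pts,
                  camera_matrix, dist_coeffs, detect_fn):
    """detect_fn(gray) -> (img_pts Nx2, obj_pts Nx3) or None on failure.
    Returns (rvec, tvec, img_pts, obj_pts) or None."""
    detection = detect_fn(gray)
    if detection is not None:
        img_pts, obj_pts = detection                # normal path: detected
    elif prev_img_pts is None:
        return None                                 # nothing to track from
    else:
        # Fallback: KLT-track keypoints kept from the last successful detection.
        prev_pts = prev_img_pts.reshape(-1, 1, 2).astype(np.float32)
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
            prev_gray, gray, prev_pts, None, winSize=(21, 21), maxLevel=3)
        good = status.ravel() == 1
        if good.sum() < 4:
            return None                             # too few points for a pose
        img_pts = next_pts.reshape(-1, 2)[good]
        obj_pts = prev_obj_pts[good]

    ok, rvec, tvec = cv2.solvePnP(obj_pts.astype(np.float32),
                                  img_pts.astype(np.float32),
                                  camera_matrix, dist_coeffs)
    return (rvec, tvec, img_pts, obj_pts) if ok else None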

4 Experiments
To demonstrate the jitter reduction, we measured the pose error over 100 frames while
the marker and the camera were kept still. ARToolKit adopts no jitter reduction method,
while ARToolKit Plus uses a more robust pose estimation algorithm.

Table 1. Average error of pose in 100 still frames

ARToolKit ARToolKit Plus Proposed Method


3.24 2.76 0.98

ARToolKit shows the largest error over the 100 frames, and ARToolKit Plus produces
less error. Both, however, are still larger than the proposed method because they are
affected by image noise. The proposed method still suffers from the effect of noise, but
the amount is smaller than in the others.
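
The exact error metric is not spelled out here; one plausible way to quantify jitter for a static marker and camera, sketched below as an assumption rather than our actual measurement code, is the mean deviation of the estimated translation from its average over the still frames.

# One plausible jitter measure for a still scene: mean deviation of the
# estimated translation from its average over the recorded frames.
import numpy as np

def mean_pose_jitter(tvecs):
    """tvecs: array of shape (num_frames, 3) with one translation per frame."""
    tvecs = np.asarray(tvecs, dtype=float)
    centre = tvecs.mean(axis=0)          # reference pose for a still scene
    return float(np.linalg.norm(tvecs - centre, axis=1).mean())

# Usage: collect one tvec per frame from each tracker over 100 still frames,
# then compare mean_pose_jitter() across ARToolKit, ARToolKit Plus and the
# proposed method.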
Also, to show that the occlusion problem is overcome, we performed the experiment
from [9]. In [9], the cases in which marker-based augmented reality fails to augment
under occlusion are called corner cases and are defined as in Figure 5. ARToolKit fails
even when only one edge is occluded, and ARTag fails when two edges are occluded.

Fig. 5. Occlusion corner cases [9]. When edges are occluded, detection fails.

We applied the proposed method to those corner cases.

Fig. 6. Result of the proposed method. It is robust to occlusion.



When we applied the proposed method, the marker was tracked robustly even when both
edges were covered. However, since the method tracks feature points, the augmented
object can appear distorted when the tracked feature points move. This has to be solved
in future work.

Fig. 7. Future work. When a tracked point moves, the augmented object is distorted.

5 Conclusion
In this paper, we have proposed two methods to overcome problems inherent to marker-based
AR, i.e., jittering and occlusion instability. When marker-based AR is developed
into commercial applications, these problems are serious. We have solved them by
hybridizing with marker-less tracking technology: we implemented a multiple-point
based pose estimation method to reduce jitter, and adopted a feature tracking method to
overcome detection failure. Under ordinary conditions they work well, but under some
special conditions they still need to be improved.

Acknowledgement. This work (2010-0027654) was supported by Mid-career


Researcher Program through NRF grant funded by the MEST.

References
1. Azuma, R.T.: A survey of augmented reality. Presence: Teleoperators and Virtual
Environment 6(4), 355–385 (1997)
2. Milgram, P., Takemura, H., Utsumi, A., Kishino, F.: Augmented Reality: A class of
displays on the reality-virtuality continuum. In: Proceedings of SPIE Telemanipulator and
Telepresence Technologies, vol. 2351, pp. 282–292 (1995)
3. Rekimoto, J.: Matrix: A Realtime Object Identification and Registration Method for
Augmented Reality. In: Proceeding of APCHI 1998, pp. 63–68 (1998)
4. Kato, H., Billinghurst, M.: Marker Tracking and HMD Calibration for a Video based
Augmented Reality Conferencing System. In: Proceeding of IWAR 1999, pp. 85–94
(1999)
5. Fiala, M.: ARTag, a fiducial marker system using digital techniques. In: Proceedings of
Computer Vision and Pattern Recognition, vol. 2, pp. 590–596 (2005)
6. Wagner, D., Langlotz, T., Schmalstieg, D.: Robust and unobtrusive marker tracking on
mobile phones. In: Proceedings of ISMAR 2008, pp. 121–124 (2008)
7. Kim, H., Woo, W.: Simple Frame Marker for Image and Character Recognition. In:
Proceedings of ISUVR 2008, pp. 43–46 (2008)

8. Rubio, M., Quintana, A., Pérez-Rosés, H., Quirós, R., Camahort, E.: Jittering Reduction in
Marker-Based Augmented Reality Systems. In: Gavrilova, M.L., Gervasi, O., Kumar, V.,
Tan, C.J.K., Taniar, D., Laganá, A., Mun, Y., Choo, H. (eds.) ICCSA 2006. LNCS,
vol. 3980, pp. 510–517. Springer, Heidelberg (2006)
9. Lee, S.-W., Kim, D.-C., Kim, D.-Y., Han, T.-D.: Tag Detection Algorithm for Improving a
Instability Problem of an Augmented Reality. In: Proceedings of ISMAR 2006, pp. 257–
258 (2006)
10. Wagner, D., Schmalstieg, D., Bischof, H.: Multiple Target Detection and Tracking with
Guaranteed Framerates on Mobile Phones. In: Proceedings of ISMAR 2009, pp. 57–64
(2009)
11. Harris, C., Stephens, M.: A Combined Corner and Edge Detector. In: Fourth Alvey Vision
Conference, Manchester, UK, pp. 147–151 (1988)
MSL_AR Toolkit: AR Authoring Tool with Interactive
Features

Jinwook Shim, Jonghoon Seo, and Tack-don Han

Dept. of Computer Science, Yonsei University,


134, Sinchon-dong, Seodaemun-gu, Seoul, Korea
{jin99foryou,jonghoon.seo,hantack}@msl.yonsei.ac.kr

Abstract. We describe an authoring tool for Augmented Reality (AR) content. In recent
years a number of frameworks have been proposed for developing Augmented Reality
(AR) applications. This paper describes an authoring tool for AR applications with
interactive features. We developed an AR authoring tool whose interactive features allow
educational service projects to be carried out and actively participated in, supporting a
participatory education service. In this paper, we describe the MSL_AR authoring tool
process and two kinds of interactive features.

Keywords: Augmented Reality, Authoring, interaction.

1 Introduction

Continuous development in IT technology has been the main driving force of
advancement and change throughout society. The field of education, which requires
high-quality educational content services, is no exception. The educational environment
of the future is expected to bring learners to a participatory education service in which
they actively join the service to express their individual creativity, moving away from
the existing unidirectional educational service in which users only watch and listen to
the given content passively. Without a realistic interface for wide participation, however,
today's educational environment finds it hard to break through the limitations of the
existing educational pattern.
This paper focuses on a new kind of educational environment for user participation,
built on underlying Augmented Reality technologies that grant both educators and
learners easy participation. We suggest a new type of participatory educational
environment using Augmented Reality (AR) so that both educators and learners can
participate easily.
Using Augmented Reality (AR), we can provide guidelines for complicated sequences
or for the steps of dangerous experiments by supplying virtual information about
situations that are hard to observe and recognize. In addition, we provide an educational
effect by performing tangible experiments using interaction technology. We propose an
Augmented Reality (AR) tool that provides two kinds of interactive features.


2 Related Work

AR authoring tools currently have a short history and lack widespread application
compared with tools such as 3ds Max and Maya; they have yet to have a major technical
ripple effect [1].

Table 1. Types of authoring tools [1]

             Programmers             Non-programmers
Low level    ARToolKit, arTag[4]     DART, ComposAR
High level   Studierstube, osgART    AMIRE, MARS

ARToolKit [2] is an open-source library which provides computer-vision-based tracking
of black square markers. However, developing an AR application with ARToolKit
requires further code for 3D model loading, interaction techniques, and other utility
functions. This authoring tool has a very simple structure and decoding algorithm, but it
must load a marker file and correlate it for every marker to be detected. ARToolKit
requires the developer to have C/C++ skills and to link with graphics and utility
libraries [2][3].
osgART is mainly based on the OpenSceneGraph framework, delivering a large
pre-existing choice of multimedia content as well as the ability to import from
professional design tools (e.g., Maya, 3D Studio Max). The tracking is mainly based on
computer vision, using ARToolKit (extended to also support inertial tracking) [5][6][7].
One of the first AR authoring tools to support interactivity is DART [9], the Designer's
ARToolKit, which is a plug-in for the popular Macromedia Director software. The main
aim of DART is to support application designers. DART is built to allow
non-programmers to create AR experiences using the low-level AR services provided by
the Director Xtras, and to integrate with existing Director behaviours and concepts.
DART supports both visual programming and a scripting interface [8][9].
ComposAR is a PC application that allows users to easily create AR scenes. It is based
on osgART [5], and it is also a test bed for the use of scripting environments for
OpenSceneGraph, ARToolKit and wxWidgets [10][11].
AMIRE is an authoring tool for the efficient creation and modification of augmented
reality applications. The AMIRE framework provides an interface to load and replace a
library at runtime and uses visual programming techniques to interactively develop AR
applications. AMIRE is designed to allow content experts to easily build applications
without detailed knowledge of the underlying base technologies [12][13].

3 MSL_AR Toolkit

In this section we describe the prototype MSL_AR toolkit process. Our goal is to develop
a low-level tool that allows programmers to build AR content with interactive features.
Through a GUI, users can modify the configure file needed for interaction with the AR
content.

Fig. 1. MSL_AR toolkit authoring tool process

Figure 1 is a diagram of the overall MSL_AR toolkit authoring process. The first step is
to input, through the user GUI, the information needed to create a simple piece of AR
content. The main frame of the MSL_AR toolkit then creates AR content from the
entered information. The AR content produced by the MSL_AR toolkit works with the
interactive feature specified by the user.

3.1 MSL_AR Toolkit Process

The MSL_AR toolkit authoring tool requires coding and scripting skills and
programming knowledge. The authoring tool basically provides the main functions of
the engine and a DLL library. The configure file determines the markers' IDs, the
interaction methods, and so on. The developed authoring tool provides interaction based
on marker occlusion and marker merge methods. Figure 2 shows a flowchart of the
MSL_AR toolkit authoring tool.
The user sets up the configure file through the interface. Then, when the user runs the
MSL_AR toolkit, it creates the AR content by applying the configure file in the
preprocessor stage. When the process is active, it finds the marker in the video input from
the camera and augments the object above the marker. It interacts through the interactive
feature using the information that the user entered.

Fig. 2. MSL_AR toolkit flowchart

Figure 3 shows the main structure of the MSL_AR toolkit authoring tool. The "1class"
part of Figure 3 is the basic class used to create AR content, and the "4 tracking" part is
the detection and pose estimation module of the AR process. The most important parts
of the MSL_AR toolkit's main structure are the "2initialize" and "5return" parts:
"2initialize" applies, inside the process, the configure file that the user entered through
the interface, and "5return" presents the AR content with the interactive feature.

Fig. 3. MSL_AR toolkit authoring tool Main Structure



3.2 Configure Setting

Figure 4 shows the GUI with which, in the MSL_AR toolkit, users can register the object
connected to a marker and the method by which they can interact with the marker.

Fig. 4. MSL_AR toolkit configure set up GUI

The first item of Figure 4 designates the marker ID. The second item designates the
interactive feature. The third item designates the 3D object connected to the marker. The
fourth item removes or adds the IDs of the markers that will be used in the content.
Through this interface, the user can set up the AR content interaction comfortably.
Figure 5 shows the content of the MSL_AR toolkit configure file that users can enter
through the interface.

#the number of marker


2
#the number of interaction patterns to be recognized
2
#marker 1
1
40.0
0.0 0.0
1.0000 0.0000 0.0000 0.0000
0.0000 1.0000 0.0000 0.0000
0.0000 0.0000 1.0000 0.0000
#marker 2
2
40.0
0.0 0.0
1.0000 0.0000 0.0000 50.0000
0.0000 1.0000 0.0000 0.0000
0.0000 0.0000 1.0000 0.0000
#select the method to be used in interactions
1 occlusion
#if interaction result object
Data/patt.H20
80.0
0.0 0.0 

Fig. 5. Sample of MSL_AR toolkit configure file



First of all, "the number of marker" is the number of markers the user will use. The
second field, "the number of interaction patterns to be recognized", is the number of
markers that will take part in the interaction. The third, "marker 1" and "marker 2", gives
each marker's ID, size and relative coordinates. The fourth, "select the method to be used
in interactions", names the interactive feature that will be used. The fifth, "if interaction
result object", gives the information about the object that appears when the interaction
between markers occurs.
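
As an illustration of how a configure file in the layout of Figure 5 could be read, the following sketch parses it into a simple structure; the Marker class and the field names are assumptions made for this example and do not reflect the toolkit's actual loader.

# Illustrative sketch of reading a configure file laid out as in Fig. 5.
from dataclasses import dataclass
from typing import List

@dataclass
class Marker:
    marker_id: int
    size: float
    center: List[float]
    transform: List[List[float]]          # three rows of the 3x4 transform

def parse_config(path):
    # Keep only non-empty lines that are not '#' section labels.
    with open(path, encoding="utf-8") as f:
        lines = [ln.strip() for ln in f
                 if ln.strip() and not ln.lstrip().startswith("#")]
    it = iter(lines)
    num_markers = int(next(it))
    num_interaction_patterns = int(next(it))
    markers = []
    for _ in range(num_markers):
        marker_id = int(next(it))
        size = float(next(it))
        center = [float(v) for v in next(it).split()]
        transform = [[float(v) for v in next(it).split()] for _ in range(3)]
        markers.append(Marker(marker_id, size, center, transform))
    method = next(it).split()[-1]          # e.g. "1 occlusion" -> "occlusion"
    result_pattern = next(it)              # e.g. "Data/patt.H20"
    result_size = float(next(it))
    result_center = [float(v) for v in next(it).split()]
    return {"markers": markers,
            "interaction_patterns": num_interaction_patterns,
            "method": method,
            "result": (result_pattern, result_size, result_center)}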

3.3 Interactive Features

The MSL_AR toolkit authoring tool provides two kinds of interactive features. One is an
occlusion method that triggers when a marker is covered, and the other is a merge method
that triggers when two markers are located close to each other.

Occlusion. Figure 6 shows the occlusion interactive feature of the MSL_AR toolkit
authoring tool when a marker is covered by the user's finger. The left image of Figure 6
shows the two markers registered by the user. The right image shows the occlusion
interactive feature of the MSL_AR toolkit: when one marker is covered by the user's
finger, the other marker augments the square object as the result of the interaction.

Fig. 6. Occlusion Interactive feature

The user sets the second GUI item of Figure 4 to "occlusion" and selects, in the third item, the marker that will augment the object. The fourth GUI item specifies the object that will be augmented above the marker.
After the GUI input is finished, these data are written into the "#select the method to be used in interactions" and "#if interaction result object" entries of the configure file in Figure 5.
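The paper does not detail how occlusion is detected; one common marker-based heuristic, sketched below purely as an illustration, is to treat a registered marker as occluded when it stops being detected while at least one other registered marker is still visible in the frame.

def detect_occlusion(detected_ids, registered_ids, previously_visible):
    """Return the IDs of registered markers that were visible before but are
    no longer detected while at least one other registered marker still is.

    detected_ids: set of marker IDs found in the current frame
    registered_ids: set of marker IDs listed in the configure file
    previously_visible: set of marker IDs detected in the previous frame
    """
    if not (detected_ids & registered_ids):
        return set()   # nothing visible at all: camera moved away, not occlusion
    return (registered_ids & previously_visible) - detected_ids

# Usage: if marker 1 disappears while marker 2 stays visible, augment the
# interaction-result object (e.g. the square in Figure 6) on marker 2.
occluded = detect_occlusion({2}, {1, 2}, {1, 2})
if occluded:
    print("occlusion interaction triggered by markers:", occluded)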

Merge. Figure 7 shows the merge interactive feature of the MSL_AR toolkit authoring tool when two markers are placed close together. The right image shows the result of the merge interactive feature: when the two markers are merged, the "Oxygen" marker augments the H2O molecule object as the result of the interaction.

Fig. 7. Merge Interactive feature

The user sets the second GUI item of Figure 4 to "merge"; the remaining items are selected in the same way as above.
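Likewise, how "located closely" is decided is not spelled out; a straightforward test, sketched below under that assumption, compares the distance between the two markers' estimated positions with a threshold on the order of the marker size.

import math

def markers_merged(pos_a, pos_b, threshold):
    """Return True when two tracked markers are close enough to trigger the
    merge interaction. pos_a/pos_b are (x, y, z) marker centres in camera
    coordinates; threshold is a distance in the same units (e.g. millimetres).
    """
    return math.dist(pos_a, pos_b) < threshold

# Usage: with 40 mm markers as in Figure 5, a threshold of roughly one marker
# width could count as "close".
if markers_merged((0.0, 0.0, 300.0), (35.0, 0.0, 300.0), threshold=40.0):
    print("merge interaction triggered: augment the H2O molecule object")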

4 Conclusion
A limitation of existing Augmented Reality authoring tools is that they provide only fixed, restricted content, so no additional information can be added. Augmented Reality can offer additional virtual information that is otherwise difficult to observe, recognize the current situation, and provide guidance on experimental procedures, their order, and so on. Through its interaction technology, the MSL_AR toolkit increases educational effectiveness and enables effective, tangible marker-based experiments.
In future work, an improved GUI will allow non-programmers to produce content even more easily, and a wider variety of location-based marker interaction methods will be studied. To make the production of AR content more active and efficient, we also plan to study additional interactive features, such as a keyboard listener, that will allow the phases of the content to be controlled.

Acknowledgement. This work (2010-0027654) was supported by the Mid-career Researcher Program through an NRF grant funded by the MEST.

References
1. Wang, Y., Langlotz, T., Billinghurst, M., Bell, T.: An Authoring Tool for Mobile Phone AR
Environments. In: NZCSRSC 2009 (2009)
2. Azuma, R.: A Survey of Augmented Reality. Presence: Teleoperators and Virtual
Environments 6(4), 355–385 (1997)
3. http://www.hitl.washington.edu/artoolkit/
4. Fiala, M.: ARTag, a fiducial marker system using digital techniques. In: IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 592,
pp. 590–596 (2005)
5. http://www.cc.gatech.edu/ael/resources/osgar.html
6. http://www.artoolworks.com/community/osgart/
7. Grasset, R., Looser, J., Billinghurst, M.: OSGARToolKit: tangible + transitional 3D
collaborative mixed reality framework. In: Proceedings of the 2005 International
Conference on Augmented Tele-Existence, ICAT 2005, Christchurch, New Zealand,
December 05-08, vol. 157, pp. 257–258. ACM, New York (2005)

8. http://www.gvu.gatech.edu/dart/
9. MacIntyre, B., Gandy, M., Dow, S., Bolter, J.D.: DART: a toolkit for rapid design
exploration of augmented reality experiences. In: Marks, J. (ed.) SIGGRAPH 2005, ACM
SIGGRAPH 2005 Papers, Los Angeles, California, July 31-August 04, pp. 932–932.
ACM, New York (2005)
10. http://www.hitlabnz.org/wiki/ComposAR
11. Dongpyo, H., Looser, J., Seichter, H., Billinghurst, M., Woontack, W.: A Sensor-Based
Interaction for Ubiquitous Virtual Reality Systems. In: International Symposium on
Ubiquitous Virtual Reality, ISUVR 2008, pp. 75–78 (2008)
12. http://www.amire.net/
13. Grimm, P., Haller, M., Paelke, V., Reinhold, S., Reimann, C., Zauner, R.: AMIRE -
authoring mixed reality. In: The First IEEE International Workshop Augmented Reality
Toolkit, p. 2 (2002)
Camera-Based In-situ 3D Modeling Techniques
for AR Diorama in Ubiquitous Virtual Reality

Atsushi Umakatsu1, Hiroyuki Yasuhara1, Tomohiro Mashita1,2, Kiyoshi Kiyokawa1,2, and Haruo Takemura1,2
1 Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
2 Cybermedia Center, Osaka University, 1-32 Machikaneyama, Toyonaka, Osaka 560-0043, Japan
{mashita,kiyo,takemura}@ime.cmc.osaka-u.ac.jp

Abstract. We have been studying an in-situ 3D modeling and author-


ing system, AR Diorama. In the AR Diorama system, a user is able to
reconstruct a 3D model of a real object of concern and describe behaviors
of the model by stroke input. In this article, we will introduce two ongo-
ing studies on interactive 3D reconstruction techniques. First technique
is feature-based. Natural feature points are first extracted and tracked.
A convex hull is then obtained from the feature points based on Delau-
nay tetrahedralisation. The polygon mesh is carved to approximate the
target object based on a feature-point visibility test. The second technique is
region-based. Foreground and background color distribution models are
first estimated to extract an object region. Then a 3D model of the target
object is reconstructed by silhouette carving. Experimental results show
that the two techniques can reconstruct a better 3D model interactively
compared with our previous system.

Keywords: AR authoring, AR Diorama, 3D reconstruction.

1 Introduction

We have been studying an in-situ 3D modeling and authoring system, AR Dio-


rama [1]. In the AR Diorama system, a user is able to reconstruct a 3D model
of a real object of concern and describe behaviors of the model by stroke input.
Being able to combine real, virtual and virtualized objects, AR Diorama has a
variety of applications including city planning, disaster planning, interior design,
and entertainment. We target smart phones and tablet computers with a touch
screen and a camera as a platform of a future AR Diorama system.
Most augmented reality (AR) systems to date can play only AR contents that
have been prepared in advance of usage. Some AR systems provide in-situ au-
thoring functionalities [2]. However, it is still difficult to handle real objects as
part of AR contents on demand. For our purpose, online in-situ 3D reconstruction
is necessary. There exist a variety of hardware devices for acquiring a 3D model of
a real object in a short time such as real-time 2D rangefinders. However, a special


hardware device is not desired in our scenario. In addition, acquiring geometry


of an entire scene is not enough. In AR Diorama, we would like to reconstruct
only an object of interest. Introducing minimal human intervention is a reasonable approach to this segmentation problem, since AR Diorama inherently involves human–computer interaction. On the other hand, single-camera interactive 3D reconstruction techniques have recently been studied intensively in the literature on augmented reality (AR), mixed reality (MR) and ubiquitous
virtual reality (UVR) [3,4,5,6]. These techniques only require a single standard
camera to extract a target model geometry.

2 AR Diorama
Figure 1 shows an overview of the AR Diorama system architecture [1]. In the
following, its user interaction techniques and a 3D reconstruction algorithm are
briefly explained.

[Figure 1 diagram labels: Camera, SLAM, Feature Points, Camera Pose, Mouse, Stroke Input, Stroke Positioning / Editing, Reconstruction, Virtual Scene, AR Diorama; modules: User Interface Module, Scene Editing Module]

Fig. 1. AR Diorama system architecture [1]

2.1 User Interaction


AR Diorama supports a few simple stroke input-based interaction techniques to
reconstruct and edit the virtual scene. First, a user needs to specify a stage, on
which all virtual objects are placed, by circling the area of interest on screen.
Then a stage is automatically created based on 3D positions of feature points
in the area. Then the user is able to reconstruct a real object by again sim-
ply circling it. The polygon mesh of the reconstructed model is composed of
feature points in the circle and the input image as texture. The reconstruction
algorithm is described in more detail below. The reconstructed object is over-
laid onto the original real object for later interaction. As the polygon mesh
has only surfaces that are visible in the input image, the user will need to
see the object from different angles and circle it again to acquire a more com-
plete model. The reconstructed model can be saved to a file and loaded for
reuse.

Fig. 2. Stroke-based scene editing in AR Diorama [1]

Once the model creation is done, the user is able to translate, rotate and
duplicate the model. Translation is performed by simply drawing a path from a
model to the destination. Rotation is performed by drawing an arc whose center
is on a model. Figure 2 shows an example of translation operation.

2.2 Texture-Based Reconstruction


In the AR Diorama system, a texture-based reconstruction approach has been
used [1]. Natural feature points in the object region are first extracted and
tracked using the open-source PTAM (parallel tracking and mapping) library
[7]. A polygon mesh is created from 3D positions of the feature points by using
2D Delaunay triangulation calculated from the corresponding camera position.
When the next polygon mesh is created from a different viewpoint, they are
merged into a single polygon mesh. At this time, a texture-based surface visibil-
ity test is conducted and a false surface will be removed. That is, if a similarity
between an apperance of a surface in a new mesh from the corresponding view-
point and its corresponding appearance of a surface in the current mesh rendered
from the same viewpoint using a transformation matrix between the two view-
points is lower than a threshold, that surface is considered false and removed.
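As a rough illustration only (the paper does not state the exact similarity measure), such an appearance comparison could be a normalized cross-correlation between the two rendered surface patches:

import numpy as np

def appearance_similarity(patch_new, patch_current):
    """Normalized cross-correlation between two grayscale image patches of the
    same surface rendered from the same viewpoint (values in [-1, 1])."""
    a = patch_new.astype(float).ravel()
    b = patch_current.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def is_false_surface(patch_new, patch_current, threshold=0.5):
    """A surface is considered false when the similarity between the new and
    the current mesh falls below a threshold (the value here is illustrative)."""
    return appearance_similarity(patch_new, patch_current) < threshold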
This approach is easy to implement; however, the reconstruction accuracy is not satisfactory, mainly due to the limitations of 2D Delaunay triangulation. Examples of reconstructed models can be found in the middle column of
Figure 3. To improve the model accuracy, we have implemented two different
in-situ 3D reconstruction techniques inspired by recent related work, which we
will report in the next sections.

3 Feature-Based Reconstruction
The first 3D reconstruction technique we have newly implemented is a feature-based one inspired by ProFORMA, proposed by Pan et al. [4]. In the following, its
implementation details and some reconstruction results are described in order.

3.1 Implementation
Natural feature points in the scene are first extracted and tracked using the open-
source PTAM (parallel tracking and mapping) library [7]. PTAM's internal outlier count variable is used to exclude unreliable feature points.

A Delaunay tetrahedralisation of the feature points is obtained using the CGAL


library. At this stage, a surface mesh of the obtained polygon is a convex hull of
the feature points, and false tetrahedrons that do not exist in the target object
need to be removed. While tracking, each triangle surface is examined and its
corresponding tetrahedron will be removed if any feature point that should be
behind the surface is visible. This carving process is expressed by the following
equations.
$$P_{\mathrm{exist}}(T_i \mid v) \;=\; \prod_{v} P(T_i \mid R_{j,k}) \;=\; \prod_{v} \bigl(1 - \mathrm{Intersect}(T_i, R_{j,k})\bigr) \qquad (1)$$

$$\mathrm{Intersect}(T_i, R_{j,k}) \;=\; \begin{cases} 1 & \text{if } R_{j,k} \text{ intersects } T_i \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
Ti denotes the ith triangle in the model, j denotes a keyframe id for reconstruc-
tion, k denotes a feature point id, Rj,k denotes a ray in the jth keyframe from
a camera position to the kth feature point, v denotes all combinations of (j, k)
where the kth feature point is visible in the jth keyframe. However, due to noise in the feature point positions, this test may wrongly remove tetrahedrons that do exist in the target object. To cope with this problem, we have implemented a
probabilistic carving algorithm found in ProFORMA [4].
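A minimal sketch of the visibility test behind Eqs. (1) and (2) is given below; the ray/triangle test uses the Möller–Trumbore method, and the hard 0/1 removal shown here stands in for the probabilistic carving that is actually used.

import numpy as np

def ray_segment_hits_triangle(cam, feat, tri, eps=1e-9):
    """Möller–Trumbore test for the segment from camera centre `cam` to the
    visible feature point `feat` (the ray R_{j,k} in Eq. 2). Returns True when
    the segment passes through triangle `tri` strictly before the feature point."""
    v0, v1, v2 = (np.asarray(v, dtype=float) for v in tri)
    d = np.asarray(feat, dtype=float) - np.asarray(cam, dtype=float)
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(d, e2)
    det = e1 @ p
    if abs(det) < eps:
        return False                       # segment parallel to the triangle
    inv = 1.0 / det
    s = np.asarray(cam, dtype=float) - v0
    u = (s @ p) * inv
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = (d @ q) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    t = (e2 @ q) * inv
    return eps < t < 1.0 - eps             # hit between camera and feature point

def triangle_exists(tri, observations):
    """Hard (non-probabilistic) version of Eq. (1): the triangle survives only
    if no camera-to-visible-feature segment passes through it.
    observations: iterable of (camera_centre, feature_point) pairs."""
    return all(not ray_segment_hits_triangle(cam, feat, tri)
               for cam, feat in observations)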
After carving, texture from keyframes, which are stored automatically during tracking, is mapped onto the polygon surfaces. A keyframe is added when the
camera pose is different from any other camera poses associated with existing
keyframes. As the camera moves around the object, a textured polygon model
that approximates the target object is acquired.
In ProFORMA, feature points on the target object are easily identified because the camera is fixed. In our system, a user can move the camera freely,
so segmentation of the target region from the background is not trivial. As a
solution, the user roughly draws round an object of concern on screen in the
beginning of model creation, to specify a region to reconstruct.

3.2 Results
Two convex objects (a Rubik's Cube and an aluminum can) and a concave object (an L-shaped snack box) were reconstructed by the implemented feature-based reconstruction technique, and compared against those reconstructed by the pre-
vious technique [1]. A desktop PC (AMD Athlon 64 X2 Dual Core 3800+, 4GB
RAM) and a handheld camera (Point Grey Research, Flea3, 648×488@60fps)
were used in the system.
Figure 3 shows the results for a Rubik's Cube and an aluminum can. Virtual
models reconstructed by the previous technique have many cracks and texture
discontinuities compared with the new technique.
Figure 4 shows the results for the L-shaped snack box. The reconstructed object's shape approximates that of the target object better after carving; however, some tetrahedrons wrongly remain, probably due to insufficient parameter tuning of the probabilistic carving. Another conceivable reason is tracking accuracy. In our

(a) Rubik's Cube. (left) original, (middle) old technique, (right) new technique

(b) Aluminum can. (left) original, (middle) old technique, (right) new technique

Fig. 3. Results for convex objects

(a) Snack box. (left) original, (middle, right) before carving

(b) Snack box. after carving

Fig. 4. Results for a concave object

system, the position accuracy of feature points relies on the PTAM library, whereas a
dedicated, robust drift-free tracking method is used in ProFORMA.

4 Region-Based Reconstruction
A feature-based approach relies on texture on the object surface, and is thus not appropriate for texture-less and/or curved objects. The second technique is a

silhouette-based approach inspired by an interactive modeling method proposed


by Bastian et al [5]. In the following, its implementation details and some
reconstruction results are described in order.

4.1 Implementation
Natural feature points in the scene are first extracted and tracked again using the
PTAM library. Then a user draws a stroke on screen to specify a target object
to reconstruct. The stroke is used to build a set of foreground and background
color distributions in the form of a Gaussian Mixture Model, and the image is
segmented into the two types of pixels using graph-cuts [8](initial segmentation).
After initial segmentation, the target object region is automatically extracted
and tracked (dynamic segmentation) using again graph-cuts. In dynamic seg-
mentation, a distance field converted from a binarized foreground image in the
previous frame is used for robust estimation. In addition, stroke input based
interaction techniques, Inclusion brush and Exclusion brush, are provided to
manually correct the silhouette.
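Only to illustrate the general idea of the distance-field term (a soft foreground prior that decays with distance from the previous frame's silhouette), a sketch using SciPy's Euclidean distance transform is given below; the falloff value and the way the prior is combined with the color models are assumptions, not the authors' implementation.

import numpy as np
from scipy.ndimage import distance_transform_edt

def foreground_prior(prev_mask, falloff=15.0):
    """Turn the binarized foreground mask of the previous frame into a soft
    prior for the current frame's graph-cut segmentation.

    prev_mask: 2D boolean array (True = foreground in the previous frame).
    falloff:   distance in pixels over which the prior decays outside the
               previous silhouette (illustrative value).
    Returns an array in [0, 1]: 1 inside the old silhouette, decreasing as
    pixels lie farther outside it.
    """
    dist_outside = distance_transform_edt(~prev_mask)   # distance to the old silhouette
    prior = np.exp(-dist_outside / falloff)
    prior[prev_mask] = 1.0
    return prior

# Usage: the prior can enter the graph-cut energy as an additional unary term
# alongside the foreground/background color likelihoods.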
After a silhouette of the target object is extracted, a 3D model approximating
the target object is progressively reconstructed by silhouette carving. In silhou-
ette carving, a voxel space is initially set around the object, and the 3D volume
approximating the target object is iteratively carved by testing the projection
of each voxel against the silhouette. This process is expressed by the following
equation. vti denotes the ith voxel in frame t (v0i = 1.0), P t denotes a projection
matrix, W (·) denotes a transformation from the world coordinate to the camera
coordinate.

vti = vt−1
i
f (Itα (P t W (v i ))) (3)

Normally a voxel remains empty once removed. To cope with PTAM's unstable camera pose estimation, a voting scheme is introduced: more votes than a threshold are required before a voxel is finally removed. From the remaining voxel set, a polygon mesh is created using a Marching Cubes algorithm. Each surface of the polygon mesh is then textured using the keyframe whose camera pose makes the smallest angle with the surface normal.
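A compact sketch of silhouette carving with such a voting scheme is shown below; the voxel grid, the projection callback and the vote threshold are illustrative and not taken from the authors' code.

import numpy as np

def carve_frame(votes, voxel_centres, silhouette, project, vote_threshold=3):
    """One silhouette-carving update with voting.

    votes:          int array, one counter per voxel (starts at zero).
    voxel_centres:  (N, 3) world-space voxel centres.
    silhouette:     2D boolean image, True inside the segmented object region.
    project:        callable mapping a world point to integer pixel (u, v),
                    i.e. the P^t W(.) step of Eq. (3); assumed to be given.
    Returns a boolean keep-mask: a voxel is removed only after it has projected
    outside the silhouette in more than `vote_threshold` frames.
    """
    h, w = silhouette.shape
    for i, centre in enumerate(voxel_centres):
        u, v = project(centre)
        outside = not (0 <= v < h and 0 <= u < w and silhouette[v, u])
        if outside:
            votes[i] += 1
    return votes <= vote_threshold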

4.2 Results
Using the same hardware devices as the feature-based reconstruction, it takes
about 2.5 seconds from image capturing to rendering the updated textured ob-
ject. However, the rendering and interaction performance is kept around 10
frames per second, thanks to a CPU-based, yet multi-threaded implementation.
In the following, results of main steps of reconstruction as well as a few final
reconstructed models are shown.
Figure 5 shows a segmentation result in a frame, a binarized image of the tar-
get object region, and the corresponding distance field. Foreground probability
decreases rapidly near the silhouette. Figure 6 and Figure 7 show an example

(a) Segmentation result (b) Binarized image in the previous frame (c) Distance field

Fig. 5. Probability distribution in dynamic segmentation

(a) before (b) Inclusion brush in use (c) after

Fig. 6. Inclusion brush

(a) before (b) Exclusion brush in use (c) after

Fig. 7. Exclusion brush

usage of Inclusion and Exclusion brushes, respectively. A user is able to add


(remove) an area to (from) the foreground region interactively. Figure 8 shows
a series of voxel data generated by silhouette carving in order of time (left to
right). As the number of keyframes increases, the volume shape is refined to
approximate the target object. Voxel color indicates texture id.
Figure 9 shows a reconstructed plushie (c) and some keyframes used (a, b).
Textures are mapped onto the model correctly, though some discontinuities
appear. This is mainly due to brightness differences in textures mapped onto
adjacent surfaces.
Figure 10 shows a reconstructed paper palace (c) and some keyframes used (a,
b). In this case, a concave part is not reconstructed well as indicated in a green
circle in Figure 10(c). This is a typical limitation of a simple silhouette carving

(a) Keyframes

(b) Voxel data (color indicates texture id)

Fig. 8. Silhouette carving

(a) Keyframe 1 (b) Keyframe 2 (c) Result

Fig. 9. Reconstruction of a plushie

(a) Keyframe 1 (b) Keyframe 2 (c) Result

Fig. 10. Reconstruction of a paper palace

algorithm. To tackle this, we will need to introduce photometric constraints or


combine with a feature-based approach.

(a) Keyframe 1 (b) Result

Fig. 11. Reconstruction of an apple

Figure 11 shows a reconstructed apple (b) and a keyframe used (a). In this case
no feature points were found in the foreground region, and neither our previous nor our new feature-based technique worked. A region-based technique is suitable for reconstructing such a feature-less object as long as its color distribution differs from that of the background.

5 Conclusion
In this study, we have introduced two types of implementations and results of
3D reconstruction techniques for our AR Diorama system, inspired by the recent
advancements in this field [4,5].
The feature-based technique implemented has been proven to produce a bet-
ter reconstructed model compared with our previous technique. Advantages of
feature-based approaches over region-based approaches include that they can
reconstruct concave objects, objects whose color distribution is similar to that
of the background, and potentially non-rigid objects, and that users need not shoot an object from many directions. However, with our current implemen-
tation, it was found that some nonexistent surfaces sometimes remain probably
due to insufficient parameter tuning and inaccurate feature tracking.
The region-based technique has also been proven to produce a better recon-
structed model compared with our previous technique. Advantages of region-
based approaches include that they can reconstruct feature-less objects such as
plastic toys and fruits. As far as the color distribution of the target object is
different from that of the background, region-based approaches will succeed in
reconstruction. However, they cannot handle concave objects well by its nature.
In the future, we will continue to improve the reconstruction quality by combining feature-based and region-based approaches, extend the stroke-based interaction techniques [9,10,11,12], and develop an easy-to-use, multi-purpose AR Diorama system.

References
1. Tateishi, T., Mashita, T., Kiyokawa, K., Takemura, H.: A 3D Reconstruction Sys-
tem using a Single Camera and Pen-Input for AR Content Authoring. In: Proc. of
Human Interface Symposium 2009, vol. 0173 (2009) (in Japanese)

2. Lee, G.A., Nelles, C., Billinghurst, M., Kim, G.J.: Immersive Authoring of Tan-
gible Augmented Reality Applications. In: Proc. of the 3rd IEEE International
Symposium on Mixed and Augmented Reality, pp. 172–181 (2004)
3. Fudono, K., Sato, T., Yokoya, N.: Interactive 3-D Modeling System with Capturing
Support Interface Using a Hand-held Video Camera. Transaction of the Virtual
Reality Society of Japan 10(4), 599–608 (2005) (in Japanese)
4. Pan, Q., Reitmayr, G., Drummond, T.: ProFORMA: Probabilistic Feature-based
On-line Rapid Model Acquisition. In: Proc. of the 20th British Machine Vision
Conference (2009)
5. Bastian, J., Ward, B., Hill, R., Hengel, A., Dick, A.: Interactive Modelling for AR
Applications. In: Proc. of the 9th IEEE and ACM International Symposium on
Mixed and Augmented Reality, pp. 199–205 (2010)
6. Hengel, A., Dick, A., Thormählen, T., Ward, B., Torr, P.H.S.: VideoTrace: Rapid
Interactive Scene Modelling from Video. ACM Transactions on Graphics 26(3),
Article 86 (2007)
7. Klein, G., Murray, D.: Parallel Tracking and Mapping for Small AR Workspaces.
In: Proc. of the 6th IEEE and ACM International Symposium on Mixed and Aug-
mented Reality, pp. 1–10 (2007)
8. Boykov, Y., Kolmogorov, V.: An Experimental Comparison of Min-cut/max-flow
Algorithms for Energy Minimization in Vision. IEEE Transactions on Pattern
Analysis and Machine Intelligence 26(9), 1124–1137 (2004)
9. Thorne, M., Burke, D., van de Panne, M.: Motion Doodles: An Interface for Sketch-
ing Character Motion. ACM Transactions on Graphics 23(3), 424–431 (2004)
10. Cohen, J.M., Hughes, J.F., Zeleznik, R.C.: Harold: a world made of drawings. In:
Proc. of the 1st International Symposium on Non-photorealistic Animation and
Rendering, pp. 83–90 (2000)
11. Bergig, O., Hagbi, N., El-Sana, J., Billinghurst, M.: In-place 3D Sketching for
Authoring and Augmenting Mechanical Systems. In: Proc. of the 8th IEEE Inter-
national Symposium on Mixed and Augmented Reality, pp. 87–94 (2009)
12. Popovic, J., Seitz, S.M., Erdmann, M., Popovic, Z., Witkin, A.: Interactive Ma-
nipulation of Rigid Body Simulations. In: Proc. of the 27th Annual Conference on
Computer Graphics and Interactive Techniques, pp. 209–217 (2000)
Design Criteria for AR-Based Training of Maintenance
and Assembly Tasks

Sabine Webel, Ulrich Bockholt, and Jens Keil

Fraunhofer Institute for Computer Graphics Research IGD


Fraunhoferstr. 5, 64283 Darmstadt, Germany
{sabine.webel,ulrich.bockholt,jens.keil}@igd.fraunhofer.de

Abstract. As the complexity of maintenance tasks can be enormous, the


efficient training of technicians in performing those tasks becomes increasingly
important. Maintenance training is a classical application field of Augmented
Reality explored by different research groups. Mostly technical aspects (e.g., tracking, 3D augmentations) have been the focus of this research field. In our
paper we present results of interdisciplinary research based on the fusion of
cognitive science, psychology and computer science. We focus on analyzing the
improvement of AR-based training of maintenance skills by addressing also the
necessary cognitive skills. Our aim is to find criteria for the design of AR-based
maintenance training systems. A preliminary evaluation of the proposed design
strategies has been conducted by expert trainers from industry.

Keywords: Augmented Reality, training, skill acquisition, training system,


industrial applications.

1 Introduction
As the complexity of maintenance and assembly tasks can be enormous, the training
of the technician to acquire the necessary skills to perform those tasks efficiently is a
challenging point. A good guidance of the user through the training task is one of the
key features to improve the efficiency of training. Traditional training programs are
often expensive in terms of effort and costs, and rather inefficient, since the training is
highly theoretical. Due to the complexity of maintenance tasks, it is not enough to
teach the execution of these tasks, but rather to train the underlying skills. Speed,
efficiency, and transferability of training are three major demands which skill training
systems should meet. In order to train the maintenance skills, the trainee’s practical
performance of the training tasks is vitally important.
From previous research it can be derived, that Augmented Reality (AR) is a
powerful technology to support training in particular in the context of industrial
service procedures. Instructions on how to assemble/disassemble a machine can
directly be linked to the machines to be operated. Various approaches exist, in which
the trainee is guided step-by-step through the maintenance task. Mostly technical aspects (tracking, visualization, etc.) have been the focus of this research field. Furthermore, those systems function as guiding systems rather than as training systems.
A potential danger of Augmented Reality applications is that users become dependent


on Augmented Reality features, and as a result they might not be able to perform the
task, when those features are not available or when the technology fails. That is to
say, an AR-based training system must clearly differ from an AR-based guiding
system; it must really train the user instead of only guiding him through the task. This
can be only achieved by involving cognitive aspects in the training.
Industrial maintenance and assembly can be considered as a collection of complex
tasks. In most cases, these tasks involve the knowledge of specific procedures and
techniques for each machine. Each technique and procedure requires cognitive
memory and knowledge of the way the task should be performed as well as fine motor
"knowledge" about the precise movements and forces that should be applied. Hence,
the skill, which is responsible for a fast and robust acquisition of maintenance
procedures, is a complex skill. In this context, procedural skills can be considered as
the most important skill in industrial maintenance tasks. Procedural skills are the
ability to follow repeated a set of actions step-by-step in order to achieve a specified
goal. It is based on getting a good representation of a task organization: What
appropriate actions should be done, when to do them and how to do them.
Within a cooperation of engineering and perceptual scientists we explored the
training of industrial maintenance. Here we focused on training of procedural skills.
By analyzing the use of Augmented Reality technologies for enhancing the training of
procedural skills, we aim for finding design criteria for developing efficient AR-based
maintenance training systems. Therefore, a sample training application has been
developed. We present preliminary results of the evaluation conducted by
maintenance trainers from industry.

2 Related Work

As the complexity of maintenance and assembly procedures can be enormous, the


training of operators to perform those tasks efficiently has been the focus of many research groups. Numerous studies have presented the potential of Augmented Reality based
training systems and its use in guidance applications for maintenance tasks. One of the
first approaches is using Augmented Reality for a photocopier maintenance task [1].
The visualization is realized using wireframe graphics and a monochrome monoscopic
HMD. The tracking of objects and the user's head is provided by ultrasonic trackers.
The main objective is to extend an existing two dimensional automated instruction
generation system to an augmented environment. Hence, only simple graphics are
superimposed instead of complicated 3D models and animations.
Reiners et al. [2] introduce an Augmented Reality demonstrator for training a
doorlock assembly task. The system uses CAD data directly taken from the
construction/production database as well as 3D-animation and instruction data
prepared within a Virtual Prototyping planning session, to facilitate the integration of
the system into existing infrastructures. For the tracking they designed an optical
tracking system using low cost passive markers. A Head Mounted Display functions
as display device.
Schwald et al. describe an AR system for training and assistance in the industrial
maintenance context [3], which guides the user step-by-step through training and

maintenance tasks. Magnetic and infrared optical tracking techniques are combined to
obtain a fast evaluation of the user's position in the whole set-up and a correct
projection for the overlays of virtual information in the user's view. The user is
equipped with a lightweight helmet, which integrates an optical see-through HMD, a
microphone, headphones, and a 3D-positioning sensor. The headphones offer the user
the possibility to get audio information on the procedures to achieve. Via the
microphone the user can easily interact with the system by using speech recognition.
The 3D-positioning sensor is used to determine the position of the objects of interest
in 3D space in relation to the user’s position. That way, 3D augmentations are directly
superimposed with their real counterparts, whereby the parts of interest are
highlighted. Besides, also information about how to interact with the counterparts can
be visualized. The paper discusses the usage of the system, the user equipment, the
tracking and the display of virtual information.
In [4], a spatial AR system for industrial CNC-machines is presented that provides real-time 3D visual feedback by using a transparent holographic element instead of user-worn equipment (e.g., an HMD). Thus, the system can simultaneously provide bright
imagery and clear visibility of the tool and work piece. To improve the user's
understanding of the machine operations, visualizations from process data are
overlaid over the tools and work pieces, while the user can still see the real machinery
in the workspace, and also information on occluded tools is provided. The system,
consisting of software and hardware, requires minimal modifications to the existing
machine. The projectors need only to be calibrated once in a manual calibration
process.
An Augmented Reality application for training and assisting in maintaining
equipment is presented in [5]. Overlaid textual annotations, frames and pointing
arrows provide information about machine parts of interest. That way, the user's
understanding of the basic structure of the maintenance task and object is improved.
A key component of the system is a binocular video see-through HMD, that the user
is wearing. The tracking of the position and orientation of equipment is implemented
using ARToolKit [6].
The work of Franklin [7] focuses on the application of Augmented Reality in the
training domain. The test-bed is realized in the context of Forward Air Controller
training. Using the system, the Forward Air Controller (trainee) can hear and visualize
a synthetic aircraft and he can communicate with the simulated pilot via voice. Thus,
the trainee can guide the pilot onto the correct target. The system can provide
synthetic air asset stimulus and can support the generation of synthetic ground based
entities. Positions and behavior of these entities can be adapted to the needs of the
scenario. The author concluded that the impact of Augmented Reality for training
depends on the specific requirements of the end user and in particular on the realism
of the stimulation required. According to the author, this is influenced by the means
of the required stimulation, the criticality on how the synthetic stimulation is used, the
dynamism and complexity of the training environment and the availability of a
common synthetic environment.

3 Training of Procedural Skills


As mentioned before, procedural skills are the ability to repeatedly follow a set of actions step-by-step in order to achieve a specified goal, and they reflect the operator's ability to
obtain a good representation of task organization. This skill is needed in the
performance of complex tasks as well as simple tasks. Procedural skills are based on
two main components: procedural knowledge and procedural memory. Procedural
knowledge enables a person to reproduce trained behavior. It is defined as the
knowledge about how and when (i.e. in which order) to execute a sequence of
procedures required to accomplish a particular task [8]. Procedural knowledge is
stored in the procedural memory, which enables persons to preserve the learned
connection between stimuli and responses and to respond adaptively to the
environment [8]. Generally speaking, procedural skills develop gradually over several
sessions of practice (e.g. [9]) and are based on getting a good internal representation
of a task organization. Therefore, the training of procedural skills should address the
development of a good internal representation of the task and the execution of the
single steps in the right order in early training phases.

3.1 Enhancement of Mental Model Building

It has been explored that the performance of a learner of a procedural skill becomes
more accurate, faster, and more flexible when he is provided with elaborated
knowledge (e.g. [10],[11]). This means that the learner’s performance increases when
how-it-works knowledge (“context procedures”) is provided in addition to the how-
to-do-it knowledge (“list procedures”) (e.g. [10]). According to Taatgen et al., when
elaborated knowledge is given, the learner is able to extract representations of the
system and the task, which are closer to his internal representation, and as a result
performance improved [10]. This internal, psychological representation of the device
to interact with can be defined as mental model [12]. In order to support the trainee’s
mental model building process, the features of the task which are most important for
developing a good internal representation must be presented to the trainee. It has been
suggested, that "the mental model of a device is formed largely by interpreting its
perceived actions and its visible structure" [13]. The mental model building is mainly
influenced by two factors: the actions of the system (i.e. the task and the involved
device) and its visible structure.
Transferring this into the context of procedural skill training, two aspects seem to
be important for supporting the building of a good mental model: One is providing an
abstract representation of the system, which builds a better understanding of how it
works. The other one is providing the visual representation of the system, which will
strengthen the internal visual image. It has been found, that people think of assemblies
as a hierarchy of parts, where parts are grouped by different functions (e.g. the legs of
a chair) [11]. Hence, the hypothesis is that the displayed sub-part of the assembly task
should include both the condition of the device before the current step (or rather the
logical group of steps to which the current step belongs) and the condition after. This
hypothesis is based on the work of Taatgen et al. [10], in which it is shown that

instructions which state pre- and post-conditions yield better performance than
instructions which do not. Reviewing this it can be concluded, that the user’s mental
model building process can be improved by using visualization elements providing
context information.

4 Design Strategies
It has been shown that guided experience is good for learning, but an active
exploration of the task has to be assured as well (e.g. [14],[15]). A too strong
guidance of the trainee during training impedes an active task exploration and harms
the learning process. Active exploration naturally occurs when transferring the
information about the task during training is accompanied with some difficulties,
forcing the trainee to independently explore the task. If such difficulties are reduced
(e.g. by showing the user in detail how to solve the problem), active exploration may
not take place. Strong visual guidance tools impede active exploration, because they
guide the trainee in specific actions and thus inhibit the trainee’s active exploratory
responses [16]. This can be illustrated using the example of a car driver guided by a
route guidance system: this driver typically has less orientation than a driver who explores the way with the help of maps and street signs. Also, reproducing the way,
when he has to drive it again, is more difficult for the driver who used the route
guidance system. From all this it can be concluded that the training system should
include visual elements that allow for reducing the level of provided information.
Furthermore, it should contain elements, which guide the trainee through the training
task by improving the trainee’s comprehension and internalization of the task, while
active exploration is not inhibited.

4.1 Adaptive Visual Aids

An important issue when designing AR-based training systems is how much


information should be visualized in the different training phases. A basic
understanding of how much information the trainee needs during learning can be
obtained by observing people while they study. Examining the learning behavior of a student
studying procedural processes using textbooks or written notations, the following
characteristics can be observed: First of all, for each step the student marks a couple
of words, a sentence or an excerpt in the running text and writes annotations at the
side margin. He studies the process by going repeatedly through this learning
material. In the first cycles, the student reads the marked text and the accordant
annotations to catch information about the single steps and to put them in order. With
the increasing number of performed studying cycles the information that he needs to
decide and reproduce the single steps of the procedure decreases. When he starts
studying he needs more detailed information about the single steps, because the
learning of the single steps is in focus. With the growing development of an
understanding of the single steps, the learning of how the steps fit together (i.e. of the
procedure) comes increasingly to the fore.

Fig. 1. Adaptive Visual Aid in the training application: a pulsing yellow circle (pointer)
highlights the area of interest; the detailed instruction is given on the plane (content object)

Transferring this observation into the context of training, the mapping of the
visualized information level to the different training phases can be hypothesized as
follows: In early phases, a clear and detailed instruction about the current step should
be provided in order to train the trainee in understanding and performing the single
steps. This can be realized by using adaptive visual aids (AVA) consisting of overlaid
3D objects (pointer) and/or multimedia instructions (content) that is displayed on user
demand (see Fig. 1). Alternatively, the pointer can act as object/area highlight while
the content provides the detailed multimedia instruction. During the training the level
of presented information should be gradually reduced (e.g. only 3D animation, then
only area highlight with some buzzwords or a picture, then only area highlight, etc.).
Hence, both AVA pointer and AVA content object can provide a variable amount of
information. The pointer consists of at least one virtual object overlaid on the camera
image (like traditional Augmented Reality overlays). Hence, it presents also the
spatial component of the information. The pointer object can contain for example
complete 3D animations, 3D models, or highlighting geometries (e.g. pulsing circle).
The AVA content object consists of a view-aligned virtual 2D plane and different
multimedia data visualized on that plane. Thus, it can provide multimedia information
that is clearly recognizable for the user. The data displayed on the plane can contain
text, images, videos and 3D scenes rendered in a 2D image on the plane, or any
combination of those elements. That is, it can contain detailed instructions (e.g. a text
description and a video showing an expert performing the task) or just a hint (e.g. a
picture of the tool needed to perform the task).
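Purely as an illustration of this gradual reduction (the concrete levels and names below are assumptions, not the implemented system), the AVA detail level shown for a step could be taken from an ordered list that is stepped down with each repetition of the task:

# Ordered from most to least detailed; the concrete levels are illustrative.
AVA_LEVELS = [
    "text + expert video + 3D animation",   # early training phase
    "3D animation only",
    "area highlight + buzzwords or picture",
    "area highlight only",
    "no visual aid",
]

def ava_level_for(repetition, levels=AVA_LEVELS):
    """Pick the adaptive visual aid detail level for the given number of
    completed repetitions of the training task."""
    return levels[min(repetition, len(levels) - 1)]

# Usage: the first run shows full instructions, later runs successively less.
for rep in range(6):
    print(rep, "->", ava_level_for(rep))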

4.2 Structure and Progress Information

Since providing abstract, structural information about the task can improve the
trainee’s mental model building process, and hence the acquisition of procedural
skills (see chapter 3.1), visual elements displaying information about the structure of
the training task should be included in the training system. Not only the structure of

the task, but also the relation between the current status and the structure is important.
That is, the position of the current state in the whole structure should be visualized as
well. Thus, the trainee gets an overview of the training task and can arrange the
current step in the structure of the task and use this information to refine his internal
representation of the task. One possibility to visualize structural information is the use
of progress-bars. Progress-bars provide an abstract overview of the trainee’s current
status in relation to the whole task (see Fig. 2).

Fig. 2. An extended progressbar showing the user’s progress inside the task and inside the
mental groups (each part of the bar corresponds to a mental group of steps)

4.3 Device Display

As mentioned in chapter 3.1, the presentation of context information, such as logical


units of sub-tasks, and the display of the device to maintain can support the trainee’s
mental model building. Moreover, the presentation of only relevant sub-parts of the
device and the visualization of the pre- and post-conditions can further enhance the
development of a good internal representation.
Based on these findings, the use of a Device Display is suggested. The Device
Display is a visual element that provides information about successive steps, or rather
sub-tasks, belonging to a logical group. That is, it provides information about a good
mental model of the task. This can support the user in developing his internal
representation of the task. The provided information includes also the condition of the
device before the current step and afterwards. Thus, using the Device Display, the
user can recognize a sub-goal of the task he has to perform. This can help him to
understand "what" he has to do, and hence to deduce the next step to perform. In fact,
the presentation of sub-goals actually forces the trainee to deduce the next step
without using a more direct visual guidance.
The visualization of the Device Display is similar to the visualization of the AVA
content (see Fig. 3, left). It consists of a view-aligned 2D plane and multimedia
objects rendered on the top of this plane, which can be faded in/out on user demand.
The objects displayed on the plane provide information about the grouped sub-tasks
(i.e. mental group) and the condition of the device before and after the mental group.
A text describes the objective of the mental group in a few words. For example, if the
mental group comprises all steps for removing a machine cover, the text "Remove the
cover of the valve" is displayed. Additionally, either a video of an expert’s
performance of the grouped sub-tasks, or a 3D animation presenting the sub-tasks
including the device conditions is shown in the Device Display. Thus, for each mental
group the best representation can be chosen. Also a progress-bar is displayed, that
shows the user’s progress inside the mental group. That way, supplemental
information about the structure of the task, or rather of the mental model, is presented,
which can further support the user's mental model building process.

Fig. 3. Left: a “Device Display” at the left side of the window shows a video about a
mental group of steps; Right: a vibrotactile bracelet developed by DLR (German Aerospace
Center)

4.4 Haptic (Vibrotactile) Hints

The potential of vibrotactile feedback for spatial guidance and attention direction
has been demonstrated in various works (e.g. [17]). Usually a lot of visual
information has to be processed in complex working scenarios. In contrast, the
tactile channel is less overloaded. Furthermore, vibrotactile feedback is a quite
intuitive feedback, as the stimuli are directly mapped to body coordinates. Since it
provides a soft guidance that "channels" the user to the designated target instead of
directly manipulating his movements, it does not prevent the active exploration of
the task. Thus, the mental model building process can be supported. Vibrotactile
hints can be given by using simple devices like the vibrotactile bracelet shown in
Fig. 3 (right). The bracelet developed by DLR is equipped with six vibration
actuators which are placed at equal distance from each other inside the bracelet and
hence also around the user's arm. The intensity of each actuator can be controlled
individually. That way, various sensations can be generated, such as sensations
indicating rotational or translational movements.
Such vibrotactile feedback should be used to give the trainee additional motion
hints during the task training, such as rotational or translational movement cues, and
to guide the trainee to specific targets. For example, if the trainee needs to rotate his
arm for performing a sub-task, the rotational direction (cw or ccw) may be difficult
to recognize in a video showing an expert performing the sub-task. Receiving the
same information using a vibrotactile bracelet, the trainee can easier identify the
rotational direction. Also translational movements can be conveyed. Apart from
that, vibrotactile feedback can also be used for presenting error feedback, such as
communicating whether the right action is performed (e.g. the right tool is grasped).
This can prevent the user from performing errors at an early stage. In addition,
vibrotactile hints can be used to provide slight instructions by directing the trainee’s
attention to a body part.
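A rotational cue on such a six-actuator bracelet could, for example, be produced by sweeping the active actuator around the wrist, clockwise or counter-clockwise; the sketch below only illustrates that idea and does not reflect DLR's actual controller (the set_intensity callback driving the hardware is assumed to be provided).

import time

NUM_ACTUATORS = 6   # evenly spaced around the bracelet

def rotation_cue(set_intensity, clockwise=True, cycles=2, step_s=0.1, level=0.8):
    """Sweep a single active actuator around the wrist to convey a rotational
    movement cue (cw or ccw).

    set_intensity: callable(actuator_index, intensity in [0, 1]) driving the
                   hardware; assumed to be provided by the bracelet API.
    """
    order = range(NUM_ACTUATORS) if clockwise else reversed(range(NUM_ACTUATORS))
    for idx in list(order) * cycles:
        for a in range(NUM_ACTUATORS):
            set_intensity(a, level if a == idx else 0.0)
        time.sleep(step_s)
    for a in range(NUM_ACTUATORS):
        set_intensity(a, 0.0)            # switch all actuators off at the end

# Usage with a dummy backend that ignores the commands:
rotation_cue(lambda a, v: None, clockwise=False, cycles=1, step_s=0.0)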

5 Preliminary Tests and Conclusion

A preliminary evaluation has been conducted by four expert trainers from the food
packaging industry (Sidel1). The training task is the assembly of a valve. The
implemented training application consists of 32 steps, showing the sub-tasks which
are necessary to assemble the valve. Haptic hints indicating rotational and
translational movements of the user’s right wrist have been implemented and
provided using the vibrotactile bracelet described above. The trainers performed the
training task using the realized AR training platform. Afterwards they filled out a
questionnaire about the usability and functionality of the training system and the
design strategies. Table 1 shows an extract of this questionnaire.

Table 1. Extract of the preliminary evaluation questionnaire

Question | Scale | T1 | T2 | T3 | T4 | AVG
The information provided by the platform via displayed information was enough to understand the task. | 1-7 | 4 | 5 | 6 | 6 | 5.25
The visualization of the different operations was enough for learning the task. | 1-7 | 5 | 5 | 6 | 6 | 5.5
Is there any critical information of the task missing? | N/A | no | no | no | no | –
Please rate the general visualization utilities: spatial information, step information, captions, etc. | 1-10 | 6 | 7 | 8 | 8 | 7.25
Please rate the overview strategy. | 1-10 | 10 | 7 | 8 | 8 | 8.25
Please rate the spatial pointer strategy (AVA pointer). | 1-10 | 6 | 4 | 8 | 7 | 6.25
Please rate the content aids display strategy (AVA content, Device Display). | 1-10 | 7 | 8 | 10 | 7 | 8
Please rate the context aids strategy (progress bars). | 1-10 | 6 | 7 | 6 | 8 | 6.75
Please rate the haptic hints strategy. | 1-10 | 6 | 3 | 2 | 8 | 4.75
Please rate the playback/trainer-trainee based strategy. | 1-10 | 6 | 8 | 10 | 8 | 8
From the functionality point of view, how do you rate the platform overall? | 1-10 | 6 | 6 | 9 | 8 | 7.25
What percentage of the task do you consider that you have learnt? | % | 90% | 70% | 80% | 10% | 62.50%
What grade would you give to the AR platform as a learning system? | 1-10 | 8 | 7 | 8 | 5 | 7

We conclude from this that the proposed design strategies, namely the use of
Adaptive Visual Aids (AVAs), the provision of structure and progress information,
the visualization of a Device Display and the integration of haptic hints, have a great
potential for improving training of maintenance and assembly skills. The perception
of the implemented haptic hints indicating movements turned out to be potentially
valuable, but we have to refine the realization of the hints (i.e. the controlling of the
vibration stimuli) in order to produce clear indications of the movements the trainee

1 Sidel is one of the world's leaders of solutions for packaging liquid foods (http://www.sidel.com/).

has to perform. In our future work the training platform will be optimized according
to the results of the preliminary tests (i.e. improvement of haptic hints, provision of
error feedback) and evaluated by technicians working at Sidel.

References
1. Feiner, S., Macintyre, B., Seligmann, D.: Knowledge-based Augmented Reality. Commun.
ACM 36(7), 53–62 (1993)
2. Reiners, D., Stricker, D., Klinker, G., Müller, S.: Augmented Reality for construction
tasks: Doorlock assembly. In: Proc. IEEE and ACM IWAR 1998: 1st Int. Workshop on
Augmented Reality, pp. 31–46 (1998)
3. Schwald, B., Laval, B.D., Sa, T.O., Guynemer, R.: An Augmented Reality system for
training and assistance to maintenance in the industrial context. In: 11th Int. Conf. in
Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, Czech
Republic, pp. 425–432 (2003)
4. Olwal, A., Gustafsson, J., Lindfors, C.: Spatial Augmented Reality on industrial CNC-
machines. In: SPIE Conference Series (2008)
5. Ke, C., Kang, B., Chen, D., Li, X.: An Augmented Reality-Based Application for
Equipment Maintenance. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS,
vol. 3784, pp. 836–841. Springer, Heidelberg (2005)
6. Kato, H., Billinghurst, M.: Marker tracking and HMD calibration for a video-based
Augmented Reality conferencing system. In: Proc. IEEE and ACM IWAR 1999, pp. 85–
94. IEEE Computer Society, Los Alamitos (1999)
7. Franklin, M.: The lessons learned in the application of Augmented Reality. In: Virtual
Media for Military Applications, pp. 30/1–30/8 (2006)
8. Tulving, E.: How many memory systems are there? American Psychologist 40(4), 385–
398 (1985)
9. Gupta, P., Cohen, N.J.: Theoretical and computational analysis of skill learning, repetition
priming, and procedural memory. Psychological Review 109(2), 401–448 (2002)
10. Taatgen, N.A., Huss, D., Dickinson, D., Anderson, J.R.: The acquisition of robust and
flexible cognitive skills. Journal of Experimental Psychology: General 137(3), 548–565
(2008)
11. Agrawala, M., Phan, D., Heiser, J., Haymaker, J., Klingner, J., Hanrahan, P., Tversky, B.:
Designing effective step-by-step assembly instructions. ACM Trans. Graph 22(3), 828–
837 (2003)
12. Cañas, J.J., Antolí, A., Quesada, J.F.: The role of working memory on measuring mental
models of physical systems. Psicológica 22, 25–42 (2001)
13. Norman, D.: Some observations on mental models. In: Mental Models, pp. 7–14.
Lawrence Erlbaum Associates, Mahwah (1983)
14. Mayer, R.E.: Multimedia Learning. Cambridge University Press, Cambridge (2001)
15. Wickens, C.D.: Multiple Resources and Mental Workload. Human Factors 50(3), 449–455
(2008)
16. Gavish, N., Yechiam, E.: The disadvantageous but appealing use of visual guidance in
procedural skills training. In: Proc. AHFE (2010)
17. Weber, B., Schätzle, S., Hulin, T., Preusche, C., Deml, B.: Evaluation of a vibrotactile
feedback device for spatial guidance. In: IEEE- World Haptics Conference (2011)
Object Selection in Virtual Environments
Performance, Usability and Interaction with Spatial
Abilities

Andreas Baier1, David Wittmann2, and Martin Ende2


1 University of Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany
2 Cassidian Air Systems, Rechlinerstraße 1, 85077 Manching, Germany
andreas.baier@psychologie.uni-regensburg.de

Abstract. We investigate the influence of users’ spatial orientation and space


relations ability on performance with six different interaction methods for
object selection in virtual environments. Three interaction methods are operated
with a mouse, three with a data glove. Results show that mouse based inter-
action methods perform better compared to data glove based methods. Usability
ratings reinforce these findings. However, performance with the mouse based
methods appears to be independent from users’ spatial abilities, whereas data
glove based methods are not.

Keywords: Object selection, interaction method, virtual environment, input


device, performance, usability, spatial ability.

1 Introduction
In order to interact with virtual environments, the user must have the possibility to
carry out object selection tasks, utilizing convenient interaction methods [2]. These
interaction methods require particular user actions which in turn necessitate particular
user abilities. Knowledge about these actions and abilities facilitates to make a
decision pro or contra a particular interaction method. In this study six interaction
methods based on two input devices, mouse and data glove, are developed and
evaluated. The data glove typically is associated with virtual environments and as six
degrees of freedom input device it allows for a variety of different interaction
techniques such as gesture recognition or direct object selection. However, for a wide
variety of gestures a motion tracking system is mandatory. Furthermore, the glove
usually does not fit well to every person and system calibration must be conducted.
The fact that the user has to wear the device can be seen as another drawback. The
mouse in comparison has been designed for traditional 2D desktop applications but
with appropriate mappings it also works in 3D applications. It requires a less
complicated hardware setup, has not to be worn, and is suitable to basically every
user. Since it is one of the most widely used devices a high degree of user
familiarization can be assumed and, because it is placed on a 2D surface, less physical
effort is to be expected. In addition, as its motion is only associated with three degrees
of freedom, the assumption can be made that spatial abilities may have a lower impact


on task performance. In order to verify these assumptions, an assessment is carried


out. Evaluation criteria are performance, usability [3] and interrelation between users’
spatial abilities and performance. Spatial abilities are measured with the Spatial
Orientation Test [4] and the Space Relations Subtest [1].

2 Experimental Setup and Procedure

2.1 Participants

The study has been conducted with 22 male and two female subjects with an average
age of 25 years.

2.2 Hardware

In order to facilitate three-dimensional perception, a stereoscopic rear projection screen (246 cm screen diagonal) was used in combination with polarized glasses to present the object selection task scenery. The participants were seated in front of the projection screen at a distance of 125 cm between head and screen. The chair was equipped with fixed supports for both arms. The left support was fitted with a keyboard; the right one served as a board for the mouse and as the starting point for object selection with the data glove. The glove required an optical tracking system realized by six cameras equipped with infrared filters and appropriate light sources. The glove itself carried eight LEDs, one each at the tip of the thumb, index and middle finger and five on the back of the hand.

2.3 Software and Procedures

The basic object selection task scenery was the same for all interaction methods and
participants. Nine bullets were presented (Fig. 1), eight blue and one magenta bullet,
which was the target object.

Fig. 1. Exemplary object selection task scenery (target object in magenta)

The selection process was always two-staged and required a nomination and a confirmation of the target bullet. Nomination of a bullet changed its color from blue or magenta to white; confirmation changed it to green in case of a correct selection or to red otherwise. Task difficulty was varied by the two factors bullet size (1.4 and 0.6 cm diameter) and object distance (25 and 65 cm from the user's hand), resulting in four difficulty levels: large and near (A), large and far (B), small and near (C), and small and far (D).

By means of a training session, all participants were made familiar with the interaction methods before the trials began, and all participants accomplished the full set of object selection tasks. Thus, each participant conducted four trials with each interaction method, leading to 24 trials in total. The presentation sequence as well as the levels of task difficulty were counterbalanced in order to avoid sequence effects. The target object had to be selected as quickly and precisely as possible. The participants started the evaluation of an interaction method manually by pressing the F8 key on the keyboard. This also started the measurement of the selection time and the registration of errors. The measurement stopped when the target object was correctly selected. Selection of a false object caused the registration of an error. The measurement of time and registration of further errors continued until the correct selection was made. The six interaction methods differed in terms of the nomination and confirmation processes, leading to three groups.
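The timing and error-registration scheme just described can be summarized in a few lines of code. The sketch below is purely illustrative (it is not the software used in the study, and all names are hypothetical): timing starts when the trial is armed, every false selection increments an error counter, and timing stops only when the correct object is selected.

```python
# Minimal sketch of the trial logging scheme described above (hypothetical code,
# not the software used in the study).
import time

class TrialLog:
    def __init__(self, target_id):
        self.target_id = target_id
        self.errors = 0
        self.t_start = None
        self.selection_time = None

    def start(self):
        # Timing starts when the participant arms the trial (here: the F8 key press).
        self.t_start = time.perf_counter()

    def register_selection(self, object_id):
        if object_id == self.target_id:
            # Correct selection: stop the measurement.
            self.selection_time = time.perf_counter() - self.t_start
            return True
        # False selection: count an error and keep measuring.
        self.errors += 1
        return False
```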

Group 1 - Data Glove Based Interaction Methods

Direct: Nomination required the user to point at the object with the tip of the index finger. In the virtual scenery the fingertips were indicated by grey colored pellets; the index finger was additionally marked with a white cone (Fig. 3a). For confirmation, the fingertips of thumb and middle finger had to be brought together.
Ray: A virtual ray originated at the tip of the index finger. The ray had to be pointed
onto an object in order to accomplish its nomination. Confirmation was conducted as
with Direct.

Group 2 - Mouse Based Interaction Methods

Plane: Nomination of an object was achieved by horizontal adjustment of the grey colored selection cross (Fig. 2a) with the mouse and allocation of the vertical plane using the mouse wheel. Confirmation of the nomination was carried out by pressing the F8 key on the keyboard.
Cylinder: The cursor was modeled as a two-dimensional circle on the underlying surface. It had to be positioned right under an object (Fig. 2b). When the edge of the circle passed through the center of an object's projection on the underlying surface, the cursor shape changed from a circle to a semitransparent cylinder (Fig. 2c), nominating the accordant bullet. Confirmation was conducted as with the Plane.


Fig. 2. Plane (2a), Cylinder without (2b) and with nomination (2c)

Group 3 – List Based Interaction Methods

A common feature of these interaction methods was the use of a selection list (Fig. 3).
Object selection did not happen by nomination of an object itself, but by nomination
of its representation from a selection list. Each row represented one bullet, displayed
by a circular symbol of the appropriate color. The top row presented the left bullet,
the bottom row the right one. The lists were the same for both methods. The sizes of
the bullets in the scenery as well as their distances from the user varied as in the other
interaction methods.


Fig. 3. Hand (3a) and Mouse Operated List (3b)

Hand Operated List: Object nomination required the user to point at the accordant row with the tip of the index finger. The grey row indicated the current position in the list (Fig. 3a). The color of the corresponding bullet changed to white accordingly. In order to confirm the nominated row, and thereby select the associated bullet, the F8 key on the keyboard had to be pressed.
Mouse Operated List: Object nomination was carried out by scrolling through the list (Fig. 3b) using the mouse wheel; confirmation was done as with the Hand Operated List.

2.4 Usability Measurement

After the evaluation of each interaction method, all participants were presented with a standardized usability questionnaire according to ISO 9241-9:2002 [3].

2.5 Spatial Ability Measurement

In order to determine the participants' space relations ability (the ability to think in three dimensions), the Space Relations Subtest of the Differential Aptitude Test was applied [1]. For the measurement of spatial orientation ability (the ability to take over different perspectives and to orient oneself in space), the Spatial Orientation Test was adopted [4].

3 Results
3.1 Performance Measurement

Both selection time and error rate of the hand based methods Direct and Ray increase considerably with rising task difficulty. In case of Direct, the performance decrease between low and high task difficulty amounts to about 100 % (3.2 s) and in case of Ray to about 180 % (8.6 s). Direct shows superior performance compared to Ray regarding selection time and error rate. The difference in selection time accounts for 2.8 seconds on average, considering an error rate of 0.5 errors per selection. The selection times of the mouse based methods Plane and Cylinder rise by about 50 % and 90 % respectively with increasing task difficulty, whereas the error rates are very low and exhibit no noticeable difference between the two interaction methods. Concerning selection time, Cylinder performs about 2.3 seconds faster across all difficulty levels. Both Hand and Mouse Operated List show comparably low error rates with no significant difference between the two methods. The gain in selection time with the Mouse compared to the Hand Operated List averages about 30 % (0.8 s). A comparison of all interaction methods shows that the best performance is achieved by the Hand and Mouse Operated List. Cylinder and Plane show moderate performance; with regard to error rate their performance is comparable to the list based methods. Data glove based methods show relatively long selection times and high error rates. Ranking the methods in descending order of performance yields the following result: 1. Mouse Operated List, 2. Hand Operated List, 3. Cylinder, 4. Plane, 5. Direct and 6. Ray. Overall, mouse based methods show better performance than data glove based methods. Table 1 summarizes the results of the performance measurements.

Table 1. Results of performance measurements. Selection time, error rate and standard
deviation (SD) in parentheses. A-D indicate the levels of difficulty (A is the easiest, D the most
difficult task condition).

selection time (SD) [s] error rate (SD) [count]

Difficulty A B C D A B C D

Direct 3.2 (0.4) 3.6 (0.4) 6.5 (1.2) 6.4 (0.7) 0.3 (0.1) 0.2 (0.1) 0.9 (0.3) 0.8 (0.2)

Ray 4.7 (0.6) 6.1 (1.0) 7.1 (1.1) 13.3 (2.2) 0.7 (0.2) 0.9 (0.2) 0.7 (0.1) 1.7 (0.3)

List Hand 2.4 (0.3) 0.1 (0.1)

List Mouse 1.6 (0.1) 0.0

Plane 4.6 (0.2) 5.0 (0.4) 5.9 (0.3) 6.8 (0.5) 0.1 (0.1) 0.0 0.1 (0.1) 0.1 (0.1)

Cylinder 2.4 (0.2) 2.8 (0.3) 3.3 (0.2) 4.6 (0.7) 0.0 0.0 0.0 0.0

3.2 Usability Measurement


Evaluation comprised the following items: expenditure of energy required to
accomplish the task (a), constancy (b), required effort while carrying out the task (c),
accuracy (d), speed (e), overall contentment (f) and utilization (g).

Table 2. Results of usability evaluation. The table shows mean values (1 is worst, 7 is best).
Standard deviations in parentheses.

Item List Mouse List Hand Cylinder Plane Direct Ray

a 6.7 (0.5) 6.4 (0.9) 6.0 (0.6) 6.0 (1.1) 5.8 (1.0) 5.8 (1.3)

b 6.8 (0.4) 6.4 (0.8) 6.0 (1.1) 6.0 (0.8) 5.4 (1.2) 4.6 (1.4)

c 6.7 (0.6) 6.3 (0.7) 6.0 (1.1) 5.7 (1.2) 5.0 (1.2) 4.2 (1.5)

d 6.7 (0.7) 6.0 (1.1) 5.5 (1.3) 5.7 (1.2) 4.5 (1.6) 3.2 (1.5)

e 6.7 (0.5) 6.3 (0.8) 5.6 (1.3) 5.7 (1.0) 5.2 (1.4) 4.3 (1.3)

f 6.6 (0.6) 6.3 (0.8) 5.6 (1.3) 4.7 (1.1) 5.0 (1.6) 3.9 (1.6)

g 6.5 (0.8) 6.2 (0.9) 6.0 (1.1) 5.2 (1.2) 5.5 (1.3) 4.3 (1.3)

Total 6.7 (0.6) 6.3 (0.9) 5.8 (1.1) 5.6 (1.1) 5.2 (1.3) 4.3 (1.4)

The overall rating indicates a precedence of the Mouse and Hand Operated List. Cylinder and Plane achieve a moderate rating, and Direct and Ray feature the lowest values. This order complies with the performance order. The average evaluations of accuracy (d) and speed (e) reflect the actual performance quite well, except for Plane and Cylinder: both received comparable usability judgments for accuracy and speed, whereas their actual performance regarding selection time differs appreciably. Concerning overall contentment (f) and utilization (g), Direct ranks higher than Plane (Table 2).

3.3 Spatial Ability Measurement

The examination of the influence of space relations and spatial orientation ability on performance shows a significant correlation between spatial orientation ability and both selection time and error rate with data glove based interaction methods (r = -.18, p < .01 and r = -.16, p < .05, respectively). For mouse and list based interaction methods there is no significant correlation. Furthermore, results reveal a significant correlation between spatial orientation ability and selection time with the Hand Operated List (r = -.21, p < .05). In case of the Mouse Operated List, there is no significant correlation. To identify the actual differences in performance between users with high and low spatial orientation ability, two groups are formed by median split. With the data glove based interaction methods (Group 1), the users with high spatial orientation ability perform on average 18 % (1.0 s) faster and make 15 % (0.11) fewer errors than the users with low spatial orientation ability. At high task difficulty the difference in selection time amounts to 50 % (8.0 s vs. 12.0 s) and the error rate difference is 7 % (1.19 vs. 1.27). Under low task difficulty conditions no difference in selection time was determined, whereas the error rate difference amounts to 64 % (0.39 vs. 0.64). An analysis of the performance differences within the list based interaction methods (Group 3) shows that the users with high spatial orientation ability need on average 14 % (0.3 s) less time to select the target object with the Hand Operated List than the users with low spatial orientation ability. For the Mouse Operated List no differences can be found.
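For readers who want to reproduce this kind of analysis, the sketch below illustrates the two steps reported above - a Pearson correlation between spatial orientation score and selection time, and a median split into low and high ability groups. It is not the authors' analysis code; the variable names and values are invented.

```python
# Illustrative sketch of the analysis above: Pearson correlation between spatial
# orientation score and selection time, and a median split into ability groups.
# Not the authors' code; values are invented.
import numpy as np
from scipy.stats import pearsonr

orientation_score = np.array([168, 172, 151, 174, 96, 170, 166, 173])
selection_time_s  = np.array([5.1, 4.8, 7.9, 4.5, 9.6, 4.9, 6.2, 4.6])

r, p = pearsonr(orientation_score, selection_time_s)
print(f"r = {r:.2f}, p = {p:.3f}")

median = np.median(orientation_score)                  # median split
high = selection_time_s[orientation_score > median]
low  = selection_time_s[orientation_score <= median]
print(f"mean selection time: high ability {high.mean():.1f} s, low ability {low.mean():.1f} s")
```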

Table 3. Results of the space relations and spatial orientation test scores and high and low
spatial orientation ability group values after median split.

                                         M      SD    Range     N
Space Relations Test Score (0-100)       85.4    8.6   66-98    24
Spatial Orientation Test Score (0-180)  162     16.7   96-174   24
Spatial Orientation Ability Group
  Low                                   151.8   20.6   96-166   12
  High                                  170.2    2.0  168-174   12

4 Discussions
4.1 Performance

The results of the performance measures suggest considerable differences between the evaluated interaction methods. The superiority of the list based interaction methods could be due to the unvaried levels of task difficulty. The advantage of the Mouse compared to the Hand Operated List presumably originates from differences in the nomination process, which requires considerable hand movement when using the data glove, whereas using the scroll wheel of the mouse involves no major hand movement. The advantage of the Cylinder compared to the Plane regarding selection time probably arises from the automated adjustment of the cylinder height. The worse selection time performance of the data glove based interaction methods certainly arises from the high error rates that go along with these methods, which in turn may originate from the confirmation mode - bringing together the tips of the middle finger and the thumb. This finger movement seems to affect the position and the direction of the index finger, thus increasing the probability of an error. In contrast, with the mouse based interaction methods nomination is carried out with the right hand and confirmation with the left hand. Generally, the higher level of user familiarity with a mouse compared to a data glove tends to amplify the difference in performance.

4.2 Usability

Furthermore, the subjective ratings of the expenditure of energy required to accomplish the task, constancy and required effort while carrying out the task, as well as overall contentment and utilization, correlate with the performance of the interaction methods. The superior evaluation of the Plane compared to the Cylinder regarding accuracy and speed may be a consequence of the missing necessity to adjust the height of the cursor while directing it to the target object. Positioning the cursor under the object implies a trial-and-error procedure, which is only finished when the cursor changes into a cylinder. The superior rating of Direct compared to Plane regarding overall contentment and utilization presumably originates from the relatively simple operation of Direct in contrast to the complex nomination process with Plane, which requires combined mouse movement and scroll wheel operation.

4.3 Spatial Ability

The correlation between spatial orientation ability and performance with different data glove based interaction methods at first indicates considerable demands of these interaction methods on the user's spatial orientation ability. The conclusion that these demands arise from the use of the data glove is not valid due to the lack of comparability between the tested methods. However, the significant interrelation between spatial orientation ability and selection time with the Hand but not the Mouse Operated List supports this assumption, because these methods are indeed comparable. Taking into account that the statistical spread of the spatial orientation ability values was relatively small, a stronger influence of spatial orientation ability on performance can be expected for a more heterogeneous user group. It should also be taken into account that the participants were almost exclusively male. The transferability to a wide range of users may therefore be limited. For more universally applicable results, gender effects have to be investigated.

4.4 Design Recommendations

Results of performance and usability measurements generally suggest mouse based interaction methods for accomplishing object selection tasks in virtual environments. Which mouse based interaction method should finally be used strongly depends on the task to be accomplished. The Mouse Operated List is most suitable when the number of objects to be selected or their size is relatively small and when cases of occlusion appear frequently. Its use is suboptimal when a particular object has to be selected from a large number of objects, due to the increasing number of rows within the list and the corresponding rise in search time. Alternatively, the Cylinder can be suggested, although occlusion or superimposed assemblies of objects can also cause problems here. To partly avoid these problems, the Plane, which received relatively positive usability ratings, seems to be suitable as well. Another approach could be the combination of the Cylinder and the Mouse Operated List, which could solve the occlusion and superimposition problem. The independence of mouse based interaction methods from the user's spatial orientation ability argues for these methods, especially in cases where relatively heterogeneous user groups are to be expected and high performance is important. Data glove based interaction methods can generally not be recommended due to their relatively poor performance. If their use is considered, Direct is the method of choice due to its comparatively positive usability rating. Especially for easy and very easy object selection tasks this method should be convenient. In order to facilitate task fulfillment, a combination of Direct and Hand Operated List could be implemented, utilizing the list particularly to select small objects. Generally, the adoption of the Ray is not recommended, but it could be considered when the objects to be selected are at a distance that cannot be reached physically. When considering the usage of a data glove based interaction method, the significant influence of the user's spatial orientation ability on performance should be taken into account. Especially in operational areas where high performance and safety have to be ensured, the use of the data glove calls for aptitude tests examining the spatial orientation ability of potential users.

4.5 Outlook

To enhance the interaction, improvements in the design of individual methods are necessary, and combinations of interaction methods should be developed and assessed. Further analysis of the relationship between spatial orientation ability and performance with improved or other interaction methods and tasks should support the selection of a suitable interaction method and give important guidance on which criteria should be evaluated in aptitude assessments. Gender differences should also be examined systematically. Furthermore, the investigation of learning effects appears to be important, considering the novelty of the virtual environment and the analyzed interaction methods, and the high level of familiarity with commonly used interaction devices such as mouse or keyboard.

Acknowledgements. We thank the participants for taking part in the experiment and
H. Neujahr, Prof. Dr. A. Zimmer, Dr. P. Sandl and C. Vernaleken for their inspiring
comments.

References
1. Bennett, G.K., Seashore, H.G., Wesman, A.G.: Differential Aptitude Tests (Space Relations
Subtest). The Psychological Corporation, New York (1990)
2. Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Pearson Education, Inc., Boston (2005)
3. Ergonomic requirements for office work with visual display terminals (VDTs) - Part 9:
Requirements for non-keyboard input devices (ISO 9241-9:2002)
4. Hegarty, M., Kozhevnikov, M., Waller, D.: Perspective Taking / Spatial Orientation Test.
University of California, Santa Barbara (2008), Downloaded on April 16, 2010,
http://www.spatiallearning.org
Effects of Menu Orientation on Pointing Behavior in
Virtual Environments

Nguyen-Thong Dang and Daniel Mestre

Institute of Movement Sciences, CNRS and University of Aix Marseille II
163 avenue de Luminy, CP 910, 13288 Marseille cedex 9, France
{thong.dang,daniel.mestre}@univmed.fr

Abstract. The present study investigated the effect of menu orientation on user performance in a menu item selection task in virtual environments. An ISO 9241-9-based multi-tapping task was used to evaluate subjects' performance. We focused on a local interaction task in a mixed reality context where the subject's hand directly interacted with 3D graphical menu items. We evaluated the pointing performance of subjects across three levels of inclination: a vertical menu, a 45°-tilted menu and a horizontal menu. Both quantitative data (movement time, errors) and qualitative data were collected in the evaluation. The results showed that a horizontal orientation of the menu resulted in decreased performance (in terms of movement time and error rate), as compared to the two other conditions. Post-hoc feedback from participants, collected with a questionnaire, confirmed this difference. This research might contribute to guidelines for the design of 3D menus in a virtual environment.

Keywords: floating menu, menu orientation, local interaction, pointing, evaluation, virtual environments.

1 Introduction
Graphical menus are frequently used for system control, one of the basic tasks in
Virtual Environments (VEs). Typically, a command is issued to perform a particular
function, to change the mode of interaction, or to change the system state [2].
Research on graphical menus in VEs currently focuses on the design of menu
characteristics, among them menu appearance and structure, menu placement, menu
invocation and availability, etc. [4]. Various menu systems for VEs have been
proposed in the literature; however, a standard graphical menu system for VEs has yet to emerge.
Among different approaches for menu system design, adopting two-dimensional
(2D) graphical menus in VEs presents many advantages. This approach, bringing the
commonly used menu concept from 2D user interfaces to VEs, might benefit from
well-established practices in 2D menu design. However, whereas traditional 2D
menus are always constrained to a fixed vertical plane surface, graphical menus in a
3D spatial context, such as VEs, can be positioned and oriented in many different
ways. The menu placement in an immersive environment can be world-referenced,


object-referenced, body-referenced, device-referenced, etc. [2][4]. In addition, a menu system can be oriented at different angles with respect to the user's viewpoint. This might also be one of the reasons why the Virtual Reality (VR) and 3D User Interfaces research community agrees on the term "floating menu" to refer to a 3D menu system in VEs. The main issue for menu placement in VEs is to define an optimal combination of menu position and menu orientation that facilitates interaction with menu items.
The present paper tackles issues of floating menu placement in VEs, in particular menu orientation. We conducted a study investigating the effect of menu orientation on user performance in a menu item selection task in VEs. We focused on a local interaction task (pointing to menu items). More specifically, we hypothesized that the inclination of a floating menu would affect users' perception of the menu items' distance and consequently the pointing action. We thus hypothesized that differences in menu orientation would lead to differences in terms of pointing time and errors. Determining whether menu orientation has any effect on users' performance might be helpful for developers in choosing the placement of menus in VEs, especially in the context of human scale virtual environments where virtual objects (in our case menu systems) are usually within reach of the users' hands. We did not address "at-a-distance" interaction or distant pointing, in which a ray-casting technique is usually used to point to a distant menu system. In the present study, the subject's hand directly interacted with menu items.

2 Related Work
The benefit of graphical menus for interaction in a virtual environment was first shown in a study conducted by Jacoby and Ellis [10]. Since then, various studies on
the design and evaluation of graphical menus in VEs have been proposed, leading to a
collection of more than thirty existing menu systems at present [4]. A comprehensive
review of those menu systems can be found in a survey conducted by Dachselt and
Hübner [4]. However, most of the previous studies focused on the menu's appearance and structure, rather than on the issues of menu placement. Numerous studies in the literature address the arrangement of menu items, which varies from a planar layout (for example, Kim et al. [11]; Bowman and Wingrave [3]; Ni et al. [12], to name a few), to a ring layout (e.g., the Spin menu proposed by Gerber and Bechmann [7]), to a 3D layout such as the Command and Control Cube (Grosjean et al. [8]), etc. Only a few studies focused on the issues of menu placement in a virtual environment. However, those studies mostly worked on the spatial reference frame of the menu system, rather than on menu orientation. Since the menu orientation is also dependent on the reference frame in certain cases (handheld menus, for example), it is worth briefly introducing here some typical studies regarding the reference frame of menu systems in VEs.
In 2000, Kim et al. [11] conducted a study involving three factors: menu presentation (the way menu elements are disposed on menus), input device and menu reference frame in a Head-Mounted Display (HMD). The two menu reference frames were: world-reference (where the menu is fixed at a position in the scene, independent of the user's viewpoint) and viewer-reference (where menu position and orientation were updated so as to remain unchanged in relation to the user's viewpoint). The menu was placed at a distance and the selection of menu items could be done using a ray-casting technique. However, the study focused more on comparing interaction modalities (gesture versus tracked device) than on the issue of the reference frame itself. Besides, no analysis of menu presentation and reference frame was provided. It is thus difficult to draw any conclusion regarding the effect of different reference frames from this study.
Another study conducted by Bernatchez and Robert [1] compared 5 spatial frames
of reference, among them world-reference and 4 types of body-reference in a HMD.
The four configurations of body-reference were (1) the menu follows the user in
position only (2) the menu follows the user’s position and turns to remain facing the
user, (3) the menu is attached to the user's non-dominant hand and (4) the menu follows the user's gaze direction. The menu was placed within the subject's arm's reach (i.e., local interaction) and subjects controlled their hand's avatar to interact with menu elements. This study showed that users performed the experimental task (a slider control) best with body-reference frame (2).
It is important to note that most previous studies about the placement of floating
menus (including the previous two studies) have been conducted using a HMD,
which is different from a mixed-reality context, like a rear-projected VR system such
as CAVE or a workbench. In those VR systems, subjects can see their own body and
use their body to interact with different elements of the virtual scene. Recently, a
study conducted by Das and Borst [5] investigated menu manipulation performance in
a rear-projected VR system, in different layouts, with different menu placements and
in a distant pointing context. The study showed that contextual pop-up menus
increased performance, as compared to fixed location menus.
An interesting point is that, in previous studies, the effects of the orientation of the
floating menu relative to the user were not taken into consideration. In some studies
[5][11], the menu was placed vertically. In other studies [1][13], the floating menu was tilted at a certain angle without any details about the choice of menu orientation. From the literature, we were not able to find answers to our research problem, which involves both orientation of the floating menu and local interaction in a mixed
reality context in an immersive environment. The experiment we conducted in this
study was designed to help us understand the effect of orientation of the floating menu
in a mixed reality context where the user’s hand directly touches virtual menu items.

3 Evaluation

3.1 Methodology

We adopted the methodology presented in Part 9 of the ISO 9241 standard for
non-keyboard input devices [9]. Specifically, we undertook a user study involving
two-dimensional serial target selection. Performance was quantified by the throughput
index (in bits per second (bps)) whose calculation is based on Fitts' law [6] and
requires the measurement of effective index of difficulty (IDe) (cf. Formula (2)) and
the average movement time (MT) (cf. Formula (1)).

Throughput (TP) = IDe / MT                                                      (1)
Movement time is the mean trial duration over a series of target selection tasks. The
effective index of difficulty is calculated based on the effective target width (We) and
distance (D) of the target selected. SDx is the standard deviation of the over/
under-shoot projected onto the task axis for a given condition.

IDe = log2(D / We + 1), where We = 4.133 × SDx                                  (2)
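As an illustration of formulas (1) and (2), the following sketch computes the throughput for one experimental condition from logged trial data. It is not the software used in the study; the function and variable names are hypothetical and the numbers are invented.

```python
# Illustrative sketch of formulas (1) and (2): ISO 9241-9 style throughput for one
# condition. Not the study's software; names and numbers are hypothetical.
import numpy as np

def throughput(endpoints_on_axis, target_distance, movement_times):
    """endpoints_on_axis: signed over/under-shoot of each selection projected
    onto the task axis (m); target_distance: nominal D (m); movement_times:
    trial durations (s)."""
    sd_x = np.std(endpoints_on_axis, ddof=1)   # SDx of the projected endpoints
    we = 4.133 * sd_x                          # effective target width, formula (2)
    ide = np.log2(target_distance / we + 1.0)  # effective index of difficulty (bits)
    mt = np.mean(movement_times)               # mean movement time (s)
    return ide / mt                            # throughput (bps), formula (1)

# Invented example data for a single D = 0.24 m condition
print(throughput([0.004, -0.006, 0.002, 0.005, -0.003], 0.24,
                 [0.61, 0.58, 0.64, 0.60, 0.62]))
```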

3.2 Participants

Seven subjects (age range from 22 to 39 years old, all right-handed) participated in
the evaluation, after being tested for normal vision and correct stereoscopic
perception. Six had little to no experience with 3D stereo vision in a VR system.

3.3 Procedure

The interpupillary distance of each subject was first measured at the beginning of the
experiment. A paper with written instructions was then provided to the subject.
Subjects were allowed to ask questions and request additional explanations only before the
beginning of the test. After that, a calibration of a device tracking user fingers’ 3D
position was carried out for each subject. Then, every subject was invited to stand in
the centre of the CAVE system. Training trials were prepared in order to let subjects
become acquainted with the 3D scenes and task in each condition. Afterwards, the
real task began.
After the experiment, each subject was requested to answer some questions on a
seven-point Likert scale. The questionnaire aimed at gathering subjective information
regarding: ease of the experimental task, enjoyment, effectiveness, and frustration in
relation to each experimental condition. Overall, each experimental session (including
calibration phase) lasted for approximately 1 hour.

3.4 Apparatus and Task

Subjects were presented with 9 circular targets, arranged in a circle on a virtual planar
surface, projected in a 4-sided CAVE-like setup at the Mediterranean Virtual Reality
Center (CRVM)1. The floating menu was positioned at the centre of the CAVE. The
height of the virtual surface was adjusted according to the subject’s height to avoid
fatigue of the subject’s arm. Subjects were free to choose their position so that they
felt comfortable with the pointing task.

1 www.realite-virtuelle.univmed.fr

Fig. 1. Experimental task

Figure 1 illustrates the experimental target selection task. Subjects wore the A.R.T.
Fingertracking device and used their index fingertip to touch the target (cf.
Figure 1(b)). The order of presentation of the 9 targets was predefined as in Figure
1(a). Targets were highlighted in red (except target 1, which was highlighted in black,
allowing the subject to rest) one at a time; subjects were asked to point to the
highlighted target as quickly and accurately as possible using their index fingertip.
Making a selection (whether a hit or a miss) ended the current trial.
Stereoscopic viewing was obtained using Infitec® technology. Real-time tracking
of the subject's viewpoint and fingers was obtained using an ART® system. Virtools
® software was used to build and control virtual scenarios, for experimental control
and data recording.

3.5 Experimental Design

Four factors were taken into account in the study:

• Target width (W): 0.024 m, 0.036 m
• Target distance (D): 0.12 m, 0.24 m, 0.4 m
• Inclination of the floating menu: vertical (0°), 45°, horizontal (90°) (cf. Fig. 2)
• Block: 1, 2, 3, 4, 5, 6

Fig. 2. Three levels of inclination of the floating menu



The combination of two values of target width and three values of target distance defined six IDs (indices of difficulty) as follows: 2.12 (D=0.12, W=0.036), 2.58 (D=0.12, W=0.024), 2.94 (D=0.24, W=0.036), 3.46 (D=0.24, W=0.024), 3.60 (D=0.4, W=0.036), 4.14 (D=0.4, W=0.024). In total, there were 6048 recorded trials (7 subjects × 6 IDs × 3 inclinations × 6 blocks × 8 trials per ID). The dependent variables were movement time (s), error rate (percent), and throughput (bps). Results were analyzed with repeated measures ANOVAs.
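As an illustration only (the paper does not state which statistical software was used), a repeated measures ANOVA on movement time could be run as follows in Python with statsmodels; the file and column names are assumptions.

```python
# Illustrative repeated measures ANOVA on movement time with statsmodels.
# File name and column names are assumptions, not taken from the paper.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Assumed long format: one row per trial with columns
# subject, inclination (0/45/90), block (1-6), movement_time_ms
trials = pd.read_csv("trials.csv")

# AnovaRM expects one value per subject and cell, so average over repetitions first.
cell_means = (trials
              .groupby(["subject", "inclination", "block"], as_index=False)
              ["movement_time_ms"].mean())

result = AnovaRM(data=cell_means, depvar="movement_time_ms",
                 subject="subject", within=["inclination", "block"]).fit()
print(result.anova_table)
```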

4 Results and Discussion

4.1 Blocks

Movement times for the six trial blocks were respectively 653 ms (Standard Error (SE) = 71), 620 ms (SE=71), 627 ms (SE=68), 627 ms (SE=73), 620 ms (SE=65) and 604 ms (SE=62). The differences were not significant across the six blocks [F(5, 642) = 1.090, p = 0.6364]. Error rates corresponding to the six blocks were respectively 9.93% (SE=5.12%), 7.6% (SE=4.32%), 8.4% (SE=4.32%), 7.31% (SE=4.21%), 7.6% (SE=4.45%) and 7.6% (SE=4.32%). The differences in error rate were also not significant [F(5, 642) = 0.992, p = 0.422]. Thus, there was no learning effect across blocks, indicating that subjects easily adapted to the experimental task and the input device.


Fig. 3. Movement time and error rate across the six trial blocks; vertical bars show standard errors

4.2 Movement Time and Error Rate

Average movement time was 632 ms (SE=75), 599 ms (SE=52) and 644 ms (SE=75) for the 0°, 45° and 90° inclinations of the floating menu respectively. The difference was significant (F(2, 642) = 4.673, p = 0.01). Post-hoc comparisons using the Tukey test revealed significant differences in movement time between the 45° and 90° conditions (p < 0.01). There was no difference between the 45° and 0° conditions.
Average error rates corresponding to the three levels of inclination (0°, 45° and 90°) were respectively 6.2% (SE=3.56%), 6.12% (SE=3.57%) and 11.86% (SE=5.59%). The difference was significant [F(2, 642) = 22.039, p < 0.001]. Post-hoc comparisons (Tukey test) revealed a significant difference in error rate between the 45° and 90° conditions (p < 0.001) and between the 0° and 90° conditions (p < 0.001). There was no difference between the 45° and 0° conditions (p = 0.997).


Fig. 4. Movement time and Error rate (vertical bars show standard errors) corresponding to the
three levels of inclination of the floating menu

4.3 Throughput

The difference in throughput, which incorporates both speed and accuracy, was also significant (F(2, 642) = 6.807, p = 0.001). The average throughput was 5.46 bps (SE=0.62), 5.48 bps (SE=0.52) and 5.01 bps (SE=0.64) respectively for the 0°, 45° and 90° conditions. Post-hoc comparisons using the Tukey test revealed significant differences in throughput between the 45° and 90° conditions (p = 0.003), and between the 0° and 90° conditions (p = 0.005). There was no difference between the 45° and 0° conditions (p = 0.991).


Fig. 5. Average throughput value (vertical bars show standard errors) as a function of the three
levels of inclination of the floating menu

4.4 Qualitative Results

As stated before, after the experimental session subjects were asked to fill in a short questionnaire addressing the following aspects: ease of the task, enjoyment, frustration and effectiveness with respect to the three levels of inclination of the floating menu. Values reported in the histograms (cf. Fig. 6) are medians. Except for frustration, where a high score indicated a high level of frustration, high scores on the other items corresponded to positive feedback. Regarding the ease of the task, the median value for the 0° condition was 5 (1st quartile = 5; 3rd quartile = 6), for the 45° condition 6 (5; 6) and for the 90° condition 3 (3; 5). Median values for the 0°, 45° and 90° conditions in terms of enjoyment were respectively 5 (5; 6), 5 (5; 6), and 3 (2; 3). Regarding frustration, median values were 2 (2; 3), 2 (2; 3), and 5 (5; 5) respectively for the 0°, 45° and 90° conditions. Finally, as for the effectiveness of the different inclinations in supporting the pointing task, the median value for the 0° condition was 5 (3; 5), for the 45° condition 6 (4; 7), and for the 90° condition 3 (2; 4).


Fig. 6. Results from the questionnaire (on a 0-7 Likert scale)

Additional analyses were performed on the questionnaire data using the Friedman test. There was a statistically significant difference in subjects' feedback regarding the ease of the task [χ2(2) = 6.348, p = 0.042], the enjoyment [χ2(2) = 11.385, p = 0.003], the level of frustration [χ2(2) = 10.300, p = 0.006] and the effectiveness [χ2(2) = 8.400, p = 0.015]. Post-hoc comparisons using the Wilcoxon signed ranks test revealed significant differences between the 45° and 90° conditions in subjects' feedback regarding the ease of the task [Z = -2.132, p = 0.033], the enjoyment [Z = -2.414, p = 0.016], the frustration [Z = -2.220, p = 0.026], and the effectiveness [Z = -2.070, p = 0.042]. There were also significant differences between the 0° and 90° conditions in subjects' feedback regarding the enjoyment [Z = -2.410, p = 0.016], the frustration [Z = -2.041, p = 0.041], and the effectiveness [Z = -2.032, p = 0.038]. Overall, the 0° and 45° conditions received positive scores while the 90° condition received negative feedback from subjects. This result is in line with the quantitative results regarding the subjects' performance reported above.
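The non-parametric analysis described in this subsection can be reproduced along the following lines. The sketch is illustrative only: it assumes scipy, and the Likert ratings are invented rather than taken from the study.

```python
# Illustrative non-parametric analysis: Friedman test across the three inclinations
# and a Wilcoxon signed-rank post-hoc comparison. Ratings below are invented.
from scipy.stats import friedmanchisquare, wilcoxon

# One "ease of the task" rating per subject and inclination (7 subjects, made-up values)
ease_0  = [5, 5, 6, 5, 6, 5, 4]
ease_45 = [6, 6, 6, 5, 7, 6, 5]
ease_90 = [3, 4, 3, 5, 3, 3, 4]

chi2_stat, p = friedmanchisquare(ease_0, ease_45, ease_90)
print(f"Friedman: chi2(2) = {chi2_stat:.3f}, p = {p:.3f}")

stat, p_posthoc = wilcoxon(ease_45, ease_90)   # post-hoc: 45 deg vs. 90 deg
print(f"Wilcoxon 45 vs 90: W = {stat:.1f}, p = {p_posthoc:.3f}")
```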

5 Conclusion
The focus of this evaluation was on the effect of menu orientation in a local pointing
task. The vertical menu plane (inclination of 0°) and 45°-tilted menu plane resulted in
better performance (shorter pointing time and lower error rate), as compared to the
horizontal floating menu (i.e., an inclination of 90°). Even though menu items in the three inclination conditions used in the present study were within reach of the subjects' hand, inclination seemed to affect users' pointing to menu items. We suggest that a horizontal menu orientation should be avoided, since this configuration potentially leads to difficulties in judging the position of menu items and subsequently in pointing to those targets.

Acknowledgement. The authors wish to thank Jean-Marie Pergandi, Pierre Mallet, Vincent Perrot at CRVM and all the participants of the evaluation. This work was carried out in the framework of the VIRTU'ART project, sponsored by the Pole PEGASE, funded by the PACA region and the French DGCIS.

References
1. Bernatchez, M., Robert, J.-M.: Impact of Spatial Reference Frames on Human
Performance in Virtual Reality User Interfaces. Journal of Multimedia 3(5), 19–32 (2008)
2. Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Addison-Wesley, Reading (2004)
3. Bowman, D.A., Wingrave, C.A.: Design and evaluation of menu systems for immersive
virtual environments. In: Proceedings of IEEE Virtual Reality 2001, pp. 149–156 (2001)
4. Dachselt, R., Hübner, A.: Virtual Environments: Three-dimensional menus: A survey and
taxonomy. Comput. Graph. 31(1), 53–65 (2007)
5. Das, K., Borst, C.W.: An Evaluation of Menu Properties and Pointing Techniques in a
Projection-Based VR Environment. In: IEEE 3D User Interfaces (3DUI), pp. 47–50 (2010)
6. Fitts, P.M.: The information capacity of the human motor system in controlling the
amplitude of movement. J. Exp. Psychology 47, 381–391 (1954)
7. Gerber, D., Bechmann, D.: The Spin Menu: A Menu System for Virtual Environments. In:
Proceedings of the 2005 IEEE Conference 2005 on Virtual Reality (VR 2005), pp. 271–
272. IEEE Computer Society, Washington, DC (2005)
8. Grosjean, J., Burkhardt, J., Coquillart, S., Richard, P.: Evaluation of the Command and
Control Cube. In: IEEE International Conference on Multimodal Interfaces, pp. 473–478
(2002)
9. ISO, Ergonomic requirements for office work with visual display terminals (VDTs) – Part
9: Requirements for non-keyboard input device. International Organization for
Standardization (2000)
10. Jacoby, R., Ellis, S.: Using Virtual Menus in a Virtual Environment. In: Proceedings of
SPIE: Visual Data Interpretation, pp. 39–48 (1992)

11. Kim, N., Kim, G.J., Park, C.-M., Lee, I., Lim, S.H.: Multimodal Menu Presentation and
Selection in Immersive Virtual Environments. In: Proceedings of the IEEE Virtual Reality
2000 Conference (VR 2000). IEEE Computer Society, Washington, DC (2000)
12. Ni, T., McMahan, R.P., Bowman, D.A.: rapMenu: Remote Menu Selection Using
Freehand Gestural Input. In: 3DUI 2008: IEEE Symposium on 3D User Interfaces, pp. 55–
58 (2008)
13. Wloka, M.M., Greenfield, E.: The Virtual Tricorder: A Uniform Interface for Virtual
Reality. In: UIST 1995: Proceedings of the 8th Annual ACM Symposium on User
Interface and Software Technology, pp. 39–40. ACM Press, New York (1995)
Some Evidences of the Impact of Environment’s Design
Features in Routes Selection in Virtual Environments

Emília Duarte1, Elisângela Vilar2, Francisco Rebelo2, Júlia Teles3, and Ana Almeida1

1 UNIDCOM/IADE – Superior School of Design, Av. D. Carlos I, no. 4, 1200-649 Lisbon, Portugal
2 Ergonomics Laboratory and 3 Mathematics Unit, FMH/Technical University of Lisbon, Estrada da Costa, 1499-002 Cruz Quebrada, Dafundo, Portugal
emilia.duarte@iade.pt, {elivilar,frebelo,jteles}@fmh.utl.pt, sofialmeida@gmail.com

Abstract. This paper reports results from a research project investigating users' navigation in a Virtual Environment (VE), using immersive Virtual Reality. The experiment was conducted to study the extent to which certain features of the environment (i.e., colors, windows, furniture, signage, corridors' width) may affect the way users select paths within a VE. Thirty university students participated in this study. They were requested to traverse a VE, as fast as possible and without pausing, until they reached the end. During the travel they had to make choices regarding the paths. The results confirmed that the window, corridor width, and exit sign factors are route predictors to the extent that they influence path selection. The remaining factors did not significantly influence the decisions. These findings may have implications for the design of environments to enhance wayfinding.

Keywords: Virtual Reality; Wayfinding; paths selection; environmental features.

1 Introduction
This paper reports results from a research project investigating users’ navigation in a
Virtual Environment (VE), using immersive Virtual Reality (VR). The experiment
was conducted to study the extent that certain features of the environment (i.e., colors,
windows, furniture and corridors’ width) and the signage presence (i.e., Exit sign)
may affect the way users select paths within a VE.
In general, studies in the area of wayfinding have two main focuses: internal information (cognitive representation) and external information (environmental features), conceptualized by Norman [1] as "knowledge in the head" and "knowledge in the world", both being essential for people's daily functioning. When interacting with a building for the first time, people rely on the external information (knowledge in the world), which can complement their internal information (knowledge in the head) in order to be successful in their orientation and navigation through the new environment. Based on these concepts, Conroy [2]


suggests that for wayfinding the external information is presented in many forms and at different cognition levels. At a lower level of awareness, this information can be considered implicit in the overall configuration and structure of the environment. In contrast, at a higher level of awareness, the external knowledge is explicit in the form of, for instance, signage. Route selection is a fundamental stage of the wayfinding process, together with orientation and the recognition of the destination [3]. However, few studies have addressed the environmental features (external information) that may influence the decision about which path to follow, and those that exist are usually based on signage [4, 5]. Wayfinding studies have shown the importance of several cues in the navigational process, mainly in unfamiliar places, such as landmarks, area differentiation (e.g., color, textures), and the overall floor plan [6, 7]. Furthermore, any factor that can cognitively facilitate wayfinding should be considered.
As suggested by previous studies [2, 8, 9], VR offers several potential advantages
over traditional means (e.g., observation, paper/pencil tests) of assessing people’s
wayfinding behavior. For example, problems related to the manipulation and control of variables, data collection, as well as ecological validity can be overcome. Thus, this study was designed to examine the influence of several features of the environment, at a lower level of awareness, on path selection, in order to determine whether such variables can be considered route predictors. Participants were asked to traverse a novel route containing several decision points at which they had to select a path from two alternatives.

2 Method

2.1 Sample

Thirty university students, 14 females and 16 males, aged 19 to 36 years old (mean age = 23.30, SD = 4.48), participated in this within-subjects study, and each one made 36 trials (6 choices per factor). They had no previous experience with navigation in VEs. Participants had normal or corrected-to-normal vision and no color vision deficiencies. They reported no physical or mental conditions that would prevent them from participating in a VR simulation.

2.2 Apparatus

During both training and testing stages, participants were seated and viewed the VE at
a resolution of 800 × 600 pixels, at 32 bits, with a FOV 30°H, 18°V and 35°D through
a Head-Mounted Display (HMD) from Sony®, model PLM-S700E. The participants’
viewpoint was egocentric. Participants were free to look around the VE since a
magnetic motion tracker from Ascension-Tech®, model Flock of Birds®, monitored
their head movements. A joystick from Thrustmaster® was used as a navigation
device. The speed of movement gradually increased from standstill to an average walking pace (1.2 m/s) up to a maximum speed of around 2.5 m/s. Wireless headphones from Sony®, model MDR-RF800RK, allowed participants to listen to instrumental ambient music (i.e., elevator music). The simulation ran on a Windows® graphics workstation equipped with an NVIDIA® Quadro FX4600 graphics card. An external monitor displayed the same image of the VE that was being shown to the participants. Hence, the researcher could watch the interaction, inside and outside the VE, and at the same time take notes and manage the operation of the ErgoVR system.
The ErgoVR system [10], developed in the Ergonomics Laboratory, of the
FMH-Technical University of Lisbon, allowed not only the display of the VE but also
the automatic collection of data such as the duration of the simulation, distance and
path taken by the participants.

2.3 Virtual Environment

The VE comprised a sequence of 36 modules, each composed of two parallel corridors measuring 6 m long by 2 m wide (with the exception of a special narrow corridor which was 1.2 m wide), which started and ended in distribution halls (junctions). The initial hall contained the decision point where the participants needed to make a choice regarding travelling left or right. A short corridor, measuring 2.5 × 2 m, connects the modules. Thus, the layout of the VE takes the shape of a chain of 36 "O"-shaped modules linked by short connection corridors. Fig. 1 depicts a section of the VE.

Fig. 1. A section of the VE’s floor plan, showing the 1st and 2nd decision points and the
alternative corridors

The interior surfaces of the VE were textured and illuminated to resemble a common interior passageway space. The modules were numbered to facilitate the participants' localization at any time of the simulation, as well as to keep them informed about their progression along the VE, although they were not aware of the total number of modules. The outside image displayed in the window, as well as the exit sign, were bitmaps. The chairs, present in the module with furniture, were 3D objects.
The base structure of the VE was designed using AutoCAD® 2009, and then it was
imported into 3D Studio Max® 2009 (both from Autodesk, Inc.). The VE was then
exported, using a free plug-in called OgreMax (v. 1.6.23), to be used by the ErgoVR.

2.4 Experiment Design

The experiment was divided in two stages: training followed by testing. The
procedures for each stage are described in the next topic.

For the testing stage, the study used a within-subjects design comprising 36 trials (one per junction), which resulted from the combination of six factors presented six times each, with their two alternative categories positioned interchangeably on the left/right corridors, following a sequence defined according to a Latin square. The factors and their alternative categories were: (1) color: yellow/blue; (2) window: no-window/window; (3) width: narrow (1.20 m)/large (2.00 m) corridor; (4) furniture: no-chairs/chairs; (6) signage: no-exit sign/exit sign; plus a neutral condition (5): empty corridors, equally dimensioned and white colored. Fig. 2 shows screenshots of the corridors, taken at the junctions/decision points, for each factor, with their alternative categories.

Fig. 2. Screenshots of the corridors, taken at the decision point, for each factor (1 - color;
2 - corridor’s width; 3 - window; 4 - furniture; 5 - neutral; 6 - signage)

With the exception of the neutral condition (5), the other factors were hypothesized to have an impact on people's wayfinding behavior and, therefore, to be considered predictors of route selection. Signage was added to the environmental factors since its influence on navigation/wayfinding is well known and it would thus provide a ceiling measure against which to compare the effect of the other factors.
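The counterbalancing mentioned above relied on a Latin square to alternate the left/right placement of the factor categories. As a generic illustration of this technique only (the study's actual 36-junction sequence is not reproduced here), the sketch below builds a balanced Latin square for the six factors.

```python
# Generic balanced Latin square generator, shown only to illustrate the kind of
# counterbalancing described above; it is not the study's actual 36-junction sequence.
def balanced_latin_square(items):
    n = len(items)                 # construction below assumes an even n (here n = 6)
    offsets, lo, hi = [], 0, n - 1
    for i in range(n):             # zig-zag pattern 0, n-1, 1, n-2, 2, ...
        if i % 2 == 0:
            offsets.append(lo)
            lo += 1
        else:
            offsets.append(hi)
            hi -= 1
    # Each subsequent row shifts the first row by one, preserving carryover balance.
    return [[items[(o + r) % n] for o in offsets] for r in range(n)]

factors = ["color", "window", "width", "furniture", "neutral", "signage"]
for row in balanced_latin_square(factors):
    print(row)
```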

2.5 Procedure

All participants were tested individually, and the entire experiment lasted
approximately 20 minutes. Before starting the training stage, a brief explanation about
the study and an introduction to the equipment were given to the participants. The
Ishihara Test [11] was used to detect color vision deficiencies. Next, participants were
asked to sign an Informed Consent Form and to complete a demographic
questionnaire. This questionnaire asked for information regarding age, gender,
participants’ use of computers and prior experience with VR.

At the training stage, participants familiarized themselves with the VR equipment for 3-5 minutes by exploring a VE designed only for training purposes. The goal of this stage was to get the participants acquainted with the setup and to make a preliminary check for any initial indications of simulator sickness. The training VE consisted of two rooms of 65 m2 (5 × 13 m), without windows, connected by a door and containing a number of obstacles (e.g., narrow corridors, pillars, etc.), requiring some skill to be circumvented. There was no time pressure during the training stage. Subjects were encouraged to freely explore the VE and to make several turns around the pillar as smoothly and accurately as possible.
At the testing stage, verbal instructions were given to the participants requesting them to travel along the VE until reaching the end, as fast as possible and without pausing. Turning back was not allowed at any point of the way, but participants could request to stop the procedure at any time. During the testing stage, which required about 10 minutes, subjects made 36 decisions (trials) regarding the paths (left or right), in the same order for all of them. On reaching the end of the route, participants were shown the message "End. Thank you" displayed on the wall. Finally, after the testing stage, subjects were asked to fill in a post-hoc questionnaire to obtain their subjective feedback and preferences regarding the factors (the questionnaire data are not included in this paper).

3 Results

3.1 Decision-Making at Junctions

The ErgoVR software automatically registered the paths taken by the participants. The
main dependent variables were the decisions regarding the factors at junctions, which
were dichotomous variables taking the values 1 or 0 depending on the choices that were
made by the participants regarding the factors’ categories. All 30 participants took 36
decisions, thus providing six values for each factor. Considering all the participants, the
percentages of choices for each category, by factor, are shown in Table 1.

Table 1. The percentages of choices made by the participants in 36 trials (6 per factor)

Color Window Corridor’s width


Blue Yellow Window No-window Narrow Large
41.7% 58.3% 85.0% 15.0% 26.1% 73.9%
Furniture Signage Neutral
Chair No-chair Sign No-sign Right Left
62.2% 37.8% 83.9% 16.1% 40.6% 59.4%.

Binomial tests with Bonferroni adjustments were used to evaluate whether the abovementioned factors significantly affected route selection and, therefore, could be considered predictors of route selection. The significant values (Adj. Sig.) are presented in bold type. The outputs of the Binomial tests with Bonferroni adjustments, for each factor, are presented in Table 2 through Table 7.
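The per-junction analysis described above can be illustrated with a short sketch: an exact two-sided binomial test against a 0.5 proportion, with a Bonferroni correction for the six junctions of each factor. This is not the authors' code; scipy is assumed, and the counts in the example correspond to the first window junction in Table 3.

```python
# Illustrative per-junction analysis: exact two-sided binomial test against 0.5,
# Bonferroni-adjusted for the six junctions of a factor (scipy assumed).
from scipy.stats import binomtest

def junction_test(n_choice_a, n_choice_b, n_comparisons=6):
    n = n_choice_a + n_choice_b
    p = binomtest(n_choice_a, n, p=0.5, alternative="two-sided").pvalue
    return p, min(1.0, p * n_comparisons)      # raw and Bonferroni-adjusted p-value

# Example: first window junction, 30 participants chose the window corridor, 0 did not
raw_p, adj_p = junction_test(30, 0)
print(f"p = {raw_p:.4f}, adjusted p = {adj_p:.4f}")
```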

Table 2. The outputs of Binomial test for factor color

Choice Category N Observed Test Sig. Adj.Sig.


proportion proportion
1 yellow | blue 11 | 19 0.37 | 0.63 0.5 0.200 1
2 yellow | blue 22 | 8 0.73 | 0.27 0.5 0.016 0.096
3 yellow | blue 22 | 8 0.73 | 0.27 0.5 0.016 0.096
4 yellow | blue 13 | 17 0.43 | 0.57 0.5 0.585 1
5 yellow | blue 17 | 13 0.57 | 0.43 0.5 0.585 1
6 yellow | blue 20 | 10 0.67 | 0.33 0.5 0.099 0.594

Table 3. The outputs of Binomial test for factor window

Choice Category N Observed Test Sig. Adj.Sig.


proportion proportion
1 window | no window 30 | 0 1.00 | 0.00 0.5 <0.001 <0.001
2 window | no window 22 | 8 0.73 | 0.27 0.5 0.016 0.096
3 window | no window 21 | 9 0.70 | 0.30 0.5 0.043 0.258
4 window | no window 29 | 1 0.97 | 0.03 0.5 <0.001 <0.001
5 window | no window 27 | 3 0.90 | 0.10 0.5 <0.001 <0.001
6 window | no window 24 | 6 0.80 | 0.20 0.5 0.001 0.006

Table 4. The outputs of Binomial test for factor corridor’s width

Choice Category N Observed Test Sig. Adj.Sig.


proportion proportion
1 narrow | large 10 | 20 0.33 | 0.67 0.5 0.099 0.594
2 narrow | large 4 | 26 0.13 | 0.87 0.5 <0.001 <0.001
3 narrow | large 7 | 23 0.23 | 0.77 0.5 0.005 0.030
4 narrow | large 7 | 23 0.23 | 0.77 0.5 0.005 0.030
5 narrow | large 11 | 19 0.37 | 0.63 0.5 0.200 1
6 narrow | large 8 | 22 0.27 | 0.73 0.5 0.016 0.096

Table 5. The outputs of Binomial test for factor furniture

Choice Category N Observed Test Sig. Adj.Sig.


proportion proportion
1 chair | no chair 23 | 7 0.77 | 0.23 0.5 0.005 0.030
2 chair | no chair 16 | 14 0.53 | 0.47 0.5 0.856 1
3 chair | no chair 19 | 11 0.63 | 0.37 0.5 0.200 1
4 chair | no chair 22 | 8 0.73 | 0.27 0.5 0.016 0.096
5 chair | no chair 18 | 12 0.60 | 0.40 0.5 0.362 1
6 chair | no chair 14 | 16 0.47 | 0.53 0.5 0.856 1

Table 6. The outputs of Binomial test for factor signage

Choice Category N Observed Test Sig. Adj.Sig.


proportion proportion
1 sign | no sign 26 | 4 0.87 | 0.13 0.5 <0.001 <0.001
2 sign | no sign 25 | 5 0.83 | 0.17 0.5 <0.001 <0.001
3 sign | no sign 22 | 8 0.73 | 0.27 0.5 0.016 0.096
4 sign | no sign 30 | 0 1.00 | 0.00 0.5 <0.001 <0.001
5 sign | no sign 23 | 7 0.77 | 0.23 0.5 0.005 0.030
6 sign | no sign 25 | 5 0.83 | 0.17 0.5 <0.001 <0.001

Table 7. The outputs of the Binomial test for the neutral factor (left vs. right)

Choice   Category   N   Observed proportion   Test proportion   Sig.    Adj. Sig.
1 left | right 15 | 15 0.50 | 0.50 0.5 1.0 1
2 left | right 17 | 13 0.57 | 0.43 0.5 0.585 1
3 left | right 21 | 9 0.70 | 0.30 0.5 0.043 0.258
4 left | right 18 | 12 0.60 | 0.40 0.5 0.362 1
5 left | right 20 | 10 0.67 | 0.33 0.5 0.099 0.594
6 left | right 16 | 14 0.53 | 0.47 0.5 0.856 1

3.2 Consistency of Decision-Making at Junctions

The participants' behavior for each factor, regarding the choices made during the six
trials, was classified into one of three mutually exclusive categories of consistency:
inconsistent (when two or more choices differ from the remaining ones), curious (when
only one choice differs), and consistent (when all choices are equal). Fig. 3 shows the
number of participants in each behavioral category of consistency, by factor.

Fig. 3. The participants' behavior regarding the choices made during the travel
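This classification rule can be stated compactly in code. The MATLAB sketch below (with an assumed function name; it is not the authors' implementation) maps one participant's six binary choices for a factor to the three categories:

function label = classifyConsistency(choices)
% choices: 1-by-6 vector of 0/1 choices made by one participant for one factor
    minority = min(sum(choices == 1), sum(choices == 0));  % size of the smaller group
    if minority == 0
        label = 'consistent';     % all six choices are identical
    elseif minority == 1
        label = 'curious';        % exactly one choice differs
    else
        label = 'inconsistent';   % two or more choices differ
    end
end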

For each factor, the choices made by those classified as consistent were analyzed to
find their object of loyalty, i.e., the category they always selected (see Table 8).

Table 8. The participants' object of loyalty

              Color              Window                 Width
              Blue     Yellow    Window    No window    Narrow    Large
Consistent    80.0%    20.0%     100.0%    0.0%         8.3%      91.7%

              Furniture          Signage                Neutral
              Chair    No-chair  Sign      No-sign      Right     Left
Consistent    90.0%    10.0%     100.0%    0.0%         66.7%     33.3%

A Cochran Q test was used to examine whether the probability of the participants being
consistent in their choices is equal across all factors. For this test, two
behavioral categories were computed: consistent and inconsistent, with the latter
including the curious behavior. The results suggest that the probability of participants
being consistent is not equal for all factors (Q(2) = 6.186, p = 0.006, N = 30).
Pairwise comparisons reveal that this difference is due to the window and color factors
(p = 0.020).
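For reference, the Cochran Q statistic used here can be computed directly from the participants-by-factors matrix of consistency indicators. The MATLAB sketch below is a generic implementation of the textbook formula (variable names are ours, and the p-value line assumes the Statistics Toolbox chi2cdf function); it is not the authors' SPSS procedure.

function [Q, p] = cochranQ(X)
% X: binary matrix (rows = participants, columns = factors),
%    with 1 = consistent and 0 = inconsistent (curious counted as inconsistent)
    k = size(X, 2);          % number of factors compared
    C = sum(X, 1);           % column (factor) totals
    R = sum(X, 2);           % row (participant) totals
    N = sum(C);              % grand total of "consistent" indicators
    Q = (k - 1) * (k * sum(C.^2) - N^2) / (k * N - sum(R.^2));
    p = 1 - chi2cdf(Q, k - 1);   % chi-square approximation with k-1 degrees of freedom
end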

3.3 Left or Right Bias

Since some participants were not consistent in their choices regarding the factors'
categories, it was hypothesized that such inconsistency could be due to a left/right
bias (i.e., a preference for the left or right side). Thus, data from the participants
classified as inconsistent were analyzed in order to check whether they were, instead,
consistent regarding the side. For the neutral factor, all participants were taken into
consideration, independently of their behavioral classification.
Results revealed, from a group of 78 cases of inconsistent behavior regarding the
factor choices, a total of 13 cases (16.6%) of consistency regarding side choices. Among
these cases, only 3 always selected the left corridor (Color: n = 5, all right;
Window: n = 3, all right; Width: n = 1, left; Furniture: n = 4, all right;
Signage: n = 0; Neutral: n = 6, left = 2, right = 4). The number of cases of curious
behavior regarding side was 18 (Color: n = 4, all right; Window: n = 0; Width: n = 4,
left = 3, right = 1; Furniture: n = 4, all right; Signage: n = 4, left = 2, right = 2;
Neutral: n = 4, all right).

4 Conclusions
This study had the objective of determining the effect of factors such as color, window,
width, furniture, and signage on path selection in a VE and, therefore, of determining
whether such factors could be considered predictors of route selection. Participants
were requested to traverse a VE and to make 36 decisions during the travel. For each
factor they were confronted, six times, with two alternative, parallel corridors
containing different, sometimes opposed (e.g., window vs. no window), categories of the
mentioned factors.
Examination of the statistics reveals that the majority of the participants preferred
large corridors, corridors painted yellow, corridors with a window, corridors with
chairs and, finally, corridors marked with an exit sign. Additionally, when facing
similar (neutral) corridors, the participants mostly preferred the left side. However,
despite the differences found, the results of the Binomial tests with Bonferroni
adjustments suggest that, taking at least three significant differences in six trials as
the criterion for identifying a factor as a route predictor, only window, width and
signage can be considered route predictors. We also highlight that a window seems to be
almost as important a factor as the sign. The results did not reveal the color factor as
a route predictor; furthermore, the outcomes attained for this factor were similar to
those achieved by the neutral factor. Two main reasons can explain this outcome. First,
it might be due to the fact that two colors were in dispute; if participants had to
choose between an achromatic corridor (i.e., white or gray) and a colored one, the
results might be different. Second, the small difference may be due to the particular
colors chosen.
It is also clear that the participants were not always consistent in their preference
for a factor's category. The Cochran test suggests that the percentage of consistent
individuals is not equal for all factors. The difference was between the window and
color factors: the factor with the highest percentage of consistency was the window and
the factor with the lowest percentage was color. Signage, which was expected to be the
most influential factor, ranked below the window in terms of consistency. This outcome
reinforces the importance of the window as a route predictor, in contrast to the
irrelevance of the color factor.
Regarding the left/right bias, in the neutral condition the majority of the
participants selected the left corridor, but the difference was not statistically
significant. However, those participants who were not consistent in the choices made
for the factors revealed a preference for the right side.

Further analysis could give more insight into route decisions, such as the time taken
to decide at each decision-making point and the participants' handedness. Additionally,
the participants' reports, gathered with the post-hoc questionnaire, can help to explain
the strategies they adopted when making choices about a specific factor and/or when
completing the route. VR, with the setup used, has proved to be an adequate tool for
this kind of study. Nevertheless, other VR display techniques, such as stereoscopic
devices with a larger field of view, could benefit this kind of research. The findings
of this study may have implications for the design of environments that enhance
wayfinding.

Evaluating Human-Robot Interaction during a
Manipulation Experiment Conducted in Immersive
Virtual Reality

Mihai Duguleana, Florin Grigorie Barbuceanu, and Gheorghe Mogan

Transylvania University of Brasov, Product Design and Robotics Department,


Bulevardul Eroilor, nr. 29, Brasov, Romania
{mihai.duguleana,florin.barbuceanu,mogan}@unitbv.ro

Abstract. This paper presents the main highlights of a Human-Robot


Interaction (HRI) study conducted during a manipulation experiment performed
in Cave Automatic Virtual Environment (CAVE). Our aim is to assess whether
using immersive Virtual Reality (VR) for testing material handling scenarios
that assume collaboration between robots and humans is a practical alternative
to similar real-life applications. We focus on measuring variables identified as
conclusive for the purpose of this study (such as the percentage of tasks
successfully completed, the average time to complete a task, the relative distance
and motion estimates, presence, and relative contact errors) during different
manipulation scenarios. We present the experimental setup, the HRI
questionnaire and the results analysis. We conclude by listing further research
issues.

Keywords: human-robot interaction, immersive virtual reality, CAVE,


presence, manipulation.

1 Introduction
One of the most important goals of robotics researchers is achieving a natural
interaction between humans and robots. For attaining this, a great deal of effort is
spent in designing and constructing experiments involving both robots and human
operators. Using real physical structures in everyday experiments implies time
consuming testing activities. Considering VR has lately emerged as a prolific
approach to solving several issues scientists encounter during their work, some recent
robotics studies propose the usage of immersive VR as a viable alternative to classic
experiments.
But what are the advantages gained by using simulated robots instead of real ones?
Modeling robot tasks in VR solves hardware troubleshooting problems. From the
programmer's point of view, when implementing, for example, a new feature for real
robots, a lot of time is spent setting up collateral systems; this time can be saved by
using simulation software [1]. VR also solves uniqueness problems. Most research laboratories
have one or two study platforms which have to be shared between several researchers.
Using simulation eliminates the problem of concurrent use [2]. Simulation lowers the


entry barrier for young scientists and improves the education process [3]. Using only a
personal computer, inexperienced robotics researchers can develop complex applications
in which they can program the physical constraints of virtual objects, obtaining results
close to reality [4].
When developing an experiment that involves both robots and humans, aside from
solving trivial problems, one must also handle aspects concerning HRI. HRI has been
defined as the process of understanding and shaping the interaction between humans
and robots. According to some recent studies [5], HRI has five primary attributes: level of
autonomy, nature of information exchange, the structure of human and robot teams
involved in interaction, the human/robot training process and the task shaping
process. Assessing HRI translates into a methodical “measurement” of its 5 attributes
[6]. As humans use their hands all the time, the transfer of objects between humans
and robots is one of the fundamental forms of HRI that integrates these attributes.
Thus, analyzing a manipulation scenario is a straightforward way to assess fundamental HRI
particularities [7]. Some scientists have identified a set of generic HRI metrics
(intervention response time, judgment of motion, situation awareness and others) and
a set of specialized HRI metrics for manipulation experiments (degree of mental
computation, contact errors and others) [8].
In this paper, we are focusing on conducting a manipulation experiment within
CAVE immersive VR environment. As it is not yet clear whether using virtual robots
has the same impact on HRI as using real equipment, we propose assessing the
difference between using real robots in real physical scenarios versus using virtual
robots in virtual scenarios, based on previous work in presence-measuring [9; 10]. In
order to validate the experiment setup, a questionnaire which targets both types of
HRI metrics was designed and applied to several subjects. The variables identified as
conclusive for the purpose of this study are: the percentage of tasks successfully
completed, the average time to complete a task, the average time to complete all tasks,
relative distance and motion estimate for VR tests and relative contact errors. In order
to measure presence, some of these variables are also assessed during tests with the
real robot.

2 Contribution to Technological Innovation

Although there have been several attempts to clearly determine the nature of HRI
within VR, most of the literature focuses on scenarios built upon non-immersive
simulation software. Most work in presence-measuring uses non-immersive VR as
comparison term. Furthermore, most of the studies focus on measuring the “sociable”
part of robots as seen by human operator, rather than measuring the performance
attained by direct collaboration between humans and robot, as in the case of a hand-
to-hand manipulation scenario [9; 10].
Although the presented manipulation experiment is intended to be a proof-of-concept
(as several equipment, administrative, and implementation issues need to be
solved before using the results presented in this paper), the computed questionnaire data
shows that the proposed approach is suitable to be extended to generic research in
HRI and robotic testing.

2.1 Designing the Virtual Robot and Working Environment

Nowadays, scientists have at their disposal several pieces of software that can help
them to achieve a satisfying simulation of their scenarios. Starting from modeling and
ending with the simulation itself, one can use various programs such as SolidWorks
or CATIA for CAD design, low level file formats built for 3D applications such as
VRML/X3D or COLLADA, for animating their designs, or more focused robot
simulators like Player Project, Webots or Microsoft Robotics Developer Studio.
For our experiment, we modeled the PowerCube robot arm using CATIA. The resulting CAD
model was exported as meshes into the C++/OGRE (see Fig. 1) and XVR programming
environments, as these offer the stereo vision capabilities needed to include the arm
model in the CAVE.

Fig. 1. PowerCube arm exported from CATIA in C++/OGRE

The collision detection of the arm with itself and with objects in the virtual
world is handled by MATLAB, within the arm-control algorithm. In the real world,
working environments contain irregularly shaped obstacles, which may vary in size and
location with respect to the arm position. For simplicity, we have defined three
classes of obstacles which may be added to the virtual workspace:
- Spheres: (x,y,z, sphere radius).
- Parallelograms: (x,y,z, length, width, height).
- Cylinders: (x,y,z, radius, height).
The collision detection is implemented using a sphere-covering technique (see Fig. 2).
The world is represented as a set of centers and radii that model each object as a set
of spheres. During arm movement, the world is checked to verify that no collisions
occur between spheres (i.e., the distance between the centers of any two spheres
belonging to different sets is greater than the sum of their radii). The clustering
into spheres is done using the k-means clustering algorithm [11], and the number of
clusters is chosen based on the resolution of the world.
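A minimal MATLAB sketch of this sphere-covering collision test is given below. It is illustrative only: the function and variable names are ours, the arm and obstacle are assumed to be available as sampled surface points, and kmeans refers to the Statistics Toolbox clustering routine.

function collided = sphereCoverCollision(armPts, obsPts, nClusters)
% armPts, obsPts: 3-by-K matrices of points sampled from the arm and the obstacle
% nClusters: number of covering spheres per object (the "resolution" of the world)
    [cA, rA] = coverWithSpheres(armPts, nClusters);
    [cO, rO] = coverWithSpheres(obsPts, nClusters);
    collided = false;
    for i = 1:size(cA, 1)
        for j = 1:size(cO, 1)
            % collision if the center distance does not exceed the radius sum
            if norm(cA(i,:) - cO(j,:)) <= rA(i) + rO(j)
                collided = true;
                return;
            end
        end
    end
end

function [centers, radii] = coverWithSpheres(pts, nClusters)
% k-means clustering of the sampled points; each cluster becomes one covering
% sphere whose radius is the largest center-to-point distance in the cluster.
    [idx, centers] = kmeans(pts', nClusters);
    radii = zeros(nClusters, 1);
    for k = 1:nClusters
        member = pts(:, idx == k)';                                % points of cluster k
        d = sqrt(sum(bsxfun(@minus, member, centers(k,:)).^2, 2));
        radii(k) = max([d; 0]);                                    % 0 if the cluster is empty
    end
end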

Fig. 2. (a) A configuration space with four obstacles; (b) its equivalent after applying the
sphere-covering function

2.2 PowerCube Control

Manipulation assumes solving inverse kinematics and planning the motion for the
robot arm used in our experiments. In a previous study [12], an arm controller was
developed based on a double neural network path planner. Using reinforcement
learning, the control system solves the task of obstacle avoidance of rigid
manipulators, such as PowerCube. This approach is used for the motion planning part
of the virtual and the real robotic arm from our study. According to algorithm
performance specifications, the average time for reaching one target in an obstacle-free
space is 13.613 seconds, while the average time for reaching one target in a space
with one obstacle (a cylindrical bar located at Cartesian coordinates (5; 15; -40)) is
21.199 seconds.
The proposed control system is built in MATLAB. In order to achieve a stand-alone
application, a link between MATLAB and C++ is needed (see Fig. 3). Unfortunately,
creating a shared C++ library using the MATLAB Compiler is not a valid solution, as
custom neural network directives cannot be deployed. Using the MATLAB Engine to call
.m files directly is also not suitable for larger projects. In the end, a less
conservative method was chosen: TCP/IP client-server communication. The MATLAB sender
transmits trajectory details (the joint-angle vector sampled at discrete time steps) to
the C++ receiver.
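The MATLAB side of such a link can be as simple as the sketch below. This is an illustration under assumptions, not the authors' code: it assumes the Instrument Control Toolbox tcpip interface, and the host address, port number, and packet layout (raw doubles) are placeholders.

host = '127.0.0.1';                 % address of the C++/OGRE receiver (assumed)
port = 30000;                       % port agreed with the receiver (assumed)
conn = tcpip(host, port);           % client socket toward the receiver
fopen(conn);

jointAngles = [0 90 0 -90 0 0] * pi/180;   % one trajectory sample, in radians
fwrite(conn, jointAngles, 'double');       % stream the angle vector to C++

fclose(conn);
delete(conn);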

2.3 Experiment Design

The experiment is split into two parts, one handling tests in VR and the other handling
tests in the real environment. The VR tests have the following prerequisites:
- A virtual target object (a bottle) is attached to the position returned by the hand
trackers (see Fig. 4).

Fig. 3. The correspondence between MATLAB and C++/OGRE programming environments,


from XOY and XOZ perspectives

- Human subjects are fitted with passive optical markers on their head (for CAVE head
tracking) and on their hands (for determining the target position).
- The subjects are asked to perform three tasks consecutively in three different
scenarios, briefly described before commencing the experiment.
- When the robotic arm reaches the target's Cartesian coordinates (within a small error
of 0.5 cm), it is assumed to automatically grasp the bottle.
- In all tests, the modeled robot arm starts with the angle configuration
(0; 90; 0; -90; 0; 0), which corresponds to the position presented in Fig. 3.

Scenario 1. In the first VR scenario, the subject is asked to place the virtual object
that is automatically attached to his hand in a designated position. The workspace
contains the PowerCube robot arm, an obstacle shaped as a cylindrical bar with
parameters (5, 15, -40, 3, 80) (see Fig. 3) and the target object, a 0.5 l bottle. The
manipulator waits until the subject correctly places the bottle and then, using the
MATLAB motion-planning algorithm, which generates an obstacle-free trajectory to the
designated position, moves towards the target. After reaching the bottle, it
automatically grasps it and moves it to a second designated position (see Fig. 5).

Fig. 4. Passive optical markers mounted on subject’s hand

Fig. 5. Scenario 1 of the HRI experiment in CAVE

Scenario 2. In the second VR scenario, the subject is asked to freely move the virtual
object automatically attached to his hand within the workspace, which is, in this case,
obstacle free. Using the motion planning algorithm from MATLAB, the manipulator
dynamically follows the Cartesian coordinates of the bottle. Subjects are allowed to
move the target for a maximum of 30 seconds. After this timeframe, if the robotic arm
has not yet reached it, the bottle keeps its last Cartesian coordinates fixed. After
reaching the bottle, the arm automatically grasps it and moves it to a designated
position.

Scenario 3. The third VR scenario assumes the arm has the target object in its
gripper. The role of the subject is to reach the bottle, and then place it at a designated
position. When the Cartesian coordinates returned by the passive optical markers
placed on the subject's hand reach the bottle, with the same small error of 0.5 cm, the
virtual bottle is automatically attached to the subject's hand.
In order to measure presence, a scenario similar to scenario 1 was tested in the real
environment. Subjects are asked to place a bottle in the arm’s gripper. PowerCube is
then controlled to place the bottle in a designated position (see Fig. 6).

Fig. 6. Real-environment scenario: the robot receives a plastic bottle from the subject and
places it on a chair

2.4 HRI Questionnaire Design

The HRI questionnaire was developed to gain information about subjects’ perception
when interacting with the real and the virtual PowerCube manipulators. All questions
may be answered with a grade from 1 to 10.
The proposed HRI questionnaire contains 12 questions divided into 3 parts:
- The biographical information section contains data related to the technological
background of the study participants. Using questions from this section, we are
trying to categorize subjects by their bio-attributes (age, sex), the frequency of
computer use and their experience with VR technologies and robotics. Some
examples of questions addressed here are: „How often do you use computers?”,
„How frequent are you interacting with robotic systems?”, „How familiarized are
you with VR?”
- The specific information section refers to the particularities subjects encounter
during the manipulation experiment. In order to measure the relative motion of the

arm, this section includes questions such as „How fast do you think the robot arm
was moving?”. The relative distance and relative contact errors are measured using
questions such as „How exact was the robot arm in picking/grasping/placing the
target?”. Overall impressions are measured by questions such as „How often did
you feel that PowerCube (both virtual and real) was interacting with you?” and
„How satisfying do you find the control algorithm of PowerCube?”.

- The presence-measuring section contains questions that try to assess the difference
between using a real PowerCube versus using a simulated one. Considering other
studies in presence, we have settled to measure 2 types of presence: presence as
perceptual realism and presence as immersion [9]. In order to measure presence as
perceptual realism, we asked „How real did the overall VR experience feel, when
compared with the real equipment?”. Presence as immersion was measured by
questions such as „How engaging was the VR interaction?”. We have also addressed
some open-ended questions at the end of this section, such as „What was missing
from virtual PowerCube that would make it seem more appropriate to the real
PowerCube?”.

2.5 Experiment Trials

22 students, 4 PhD. students and 3 persons from the university administrative staff
participated as subjects in this experiment. The experiment took an average of 20
minutes per subject, and answering the HRI questionnaire took an average of 5
minutes per subject. The results are centralized in Table 1.
Other variables that were measured during our experiment are the percentage of
tasks successfully completed – 99.1379%, the average time to complete a scenario – 4
minutes and 8 seconds, and the average time to complete all 4 scenarios – 16 minutes
and 32 seconds. The open question received suggestions such as paying better
attention to environment details, object textures, and lighting conditions.
Some of the subjects inquired about the possibility of integrating haptic devices that
could enhance the realism of the simulation.

2.6 Discussion of Results

Overall, the centralized results from the HRI questionnaire allow us to conclude that
the robot's presence affects HRI. The result of question 10 (7.7586 on a 1-to-10 scale)
shows that using immersive VR is an effective way of simulating robotic scenarios.
However, as reported in other studies [10], subjects gave the physically present robot
more personal space than its virtual counterpart. Most of our subjects enjoyed
interacting with both the real and the virtual robot; on average, they rated the
experience of interacting with the virtual PowerCube at 8.5862 on a 1-to-10 scale. The
nature of the arm (fully mechanical, not anthropomorphic) led the subjects to rate
question 8 at only 5.1724 on a 1-to-10 scale. However, the arm's control algorithm
seems to be fairly satisfying (7.6896), as it is very accurate (9.1379); its main
reported drawback is its low reaching speed (6.3103).

Table 1. Centralized data from HRI questionnaire

Section                   Question                                                     Answer

Biographical Information  1. Age?                                                      21 years - 14; 22 years - 6; 23 years - 2;
                                                                                       25 years - 1; 26 years - 3; 39 years - 1;
                                                                                       40 years - 1; 44 years - 1
                          2. Sex?                                                      M - 62%; F - 38%
                          3. How often do you use computers?                           7.7931 (1 - never used; 10 - every day)
                          4. How frequent are you interacting with robotic systems?    5.3448 (1 - never interacted; 10 - every day)
                          5. How familiarized are you with VR technologies?            5.9655 (1 - never heard; 10 - very familiarized)
Specific Information      6. How fast do you think the robot arm was moving?           6.3103 (1 - very slow; 10 - very fast)
                          7. How exact was the robot arm in picking/grasping/placing   9.1379 (1 - completely inexact; 10 - perfectly accurate)
                             the target?
                          8. How often did you feel that PowerCube (both virtual and   5.1724 (1 - never; 10 - all the time)
                             real) was interacting with you?
                          9. How satisfying do you find the control algorithm of       7.6896 (1 - completely unsatisfying; 10 - completely satisfying)
                             PowerCube?
Presence Measuring        10. How real did the overall VR experience feel, when        7.7586 (1 - completely unrealistic; 10 - perfectly real)
                              compared with the real equipment?
                          11. How engaging was the VR interaction?                     8.5862 (1 - not engaging; 10 - very engaging)
                          12. What was missing from virtual PowerCube that would       -
                              make it seem more appropriate to the real PowerCube?

3 Conclusions
Testing real-life manipulation scenarios with PowerCube (and other robotic
manipulators) imposes additional work in solving safety issues, foreseeing and handling
possible hardware and software malfunctions, and preparing additional equipment for
possible injuries. The proposed virtual solution eliminates all these problems.
Although the presented virtual model is close to the real-life robot, there are still
some issues that need to be handled. The real robot has wires between each link, which
have not been integrated into the simulated model. Another issue is the inconsistency
between the real environment and the simulated one. Careful measures were taken in
order to obtain a good virtual replica of the real setup; however, because the
measuring process is inexact, the simulated arm differs slightly in some dimensions, as
does the simulated working environment, which also had to be modified to include the
robot body, chairs and the ground level as obstacles.
According to the discussion of results, the information computed from the HRI
questionnaire shows that immersive VR is a good alternative to classical robot testing.

Acknowledgments. This work was supported by CNCSIS –UEFISCSU, project


number PNII – IDEI 775/2008.

References
1. Johns, K., Taylor, T.: Professional Microsoft Robotics Developer Studio. Wrox Press,
Indianapolis (2008)
2. Haton, B., Mogan, G.: Enhanced Ergonomics and Virtual Reality Applied to Industrial
Robot Programming. Scientific Bulletin of Politehnica University of Timisoara, Timisoara,
Romania (2008)
3. Morgan, S.: Programming Microsoft® Robotics Studio. Microsoft Press, Washington
(2008)
4. Duguleana, M., Barbuceanu, F.: Designing of Virtual Reality Environments for Mobile
Robots Programming. Journal of Solid State Phenomena 166-167, 185–190 (2010)
5. Goodrich, M.A., Schultz, A.C.: Human–Robot Interaction: A Survey. Foundations and
Trends in Human–Computer Interaction 1(3), 203–275 (2007)
6. Walters, M.L., et al.: Practical and Methodological Challenges in Designing and
Conducting Human-Robot Interaction Studies. In: Proceedings of AISB 2005 Symposium
on Robot Companions Hard Problems and Open Challenges in Human-Robot Interaction,
pp. 110–120 (2005)
7. Edsinger, A., Kemp, C.: Human-robot interaction for cooperative manipulation: Handing
objects to one another. In: Proceedings of the IEEE International Workshop on Robot and
Human Interactive Communication, ROMAN (2007)
8. Steinfeld, A., et al.: Common Metrics for Human-Robot Interaction. In: Proceedings of the
1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, pp. 33–40 (2006)
9. Lombard, M., et al.: Measuring presence: a literature-based approach to the development
of a standardized paper-and-pencil instrument. In: The 3rd International Workshop on
Presence, Delft, The Netherlands (2000)
10. Bainbridge, W.A., et al.: The effect of presence on human-robot interaction. In:
Proceedings of the 17th IEEE International Symposium on Robot and Human Interactive
Communication, pp. 701–706 (2008)
11. Vajta, L., Juhasz, T.: The Role of 3D Simulation in the Advanced Robotic Design, Test
and Control, Cutting Edge Robotics, pp. 47–60 (2005)
12. Duguleana, M.: Robot Manipulation in a Virtual Industrial Environment. International
Master Thesis on Virtual Environments. Scuola Superiore Sant'Anna, Pisa, Italy (2010)
3-D Sound Reproduction System for Immersive
Environments Based on the Boundary Surface
Control Principle

Seigo Enomoto1 , Yusuke Ikeda1 , Shiro Ise2 , and Satoshi Nakamura1


1
Spoken Language Communication Group, National Institute of Information and
Communications Technology, 3-5 Hikaridai, Keihanna Science City, 619-0289, Japan
2
Graduate School of Engineering, Department of Architecture and Architectural
Engineering, Kyoto University, C1-4-386 Kyotodaigaku-katsura, Nishikyo-ku, Kyoto,
615-8540, Japan
{seigo.enomoto,yusuke.ikeda,satoshi.nakamura}@nict.go.jp,
ise@archi.kyoto-u.ac.jp

Abstract. We constructed a 3-D sound reproduction system containing


a 62-channel loudspeaker array and 70-channel microphone array based
on the boundary surface control principle (BoSC). The microphone array
can record the volume of the 3-D sound field and the loudspeaker array
can accurately recreate it in other locations. Using these systems, we
realized immersive acoustic environments similar to cinema or television
sound spaces. We also recorded real 3-D acoustic environments, such as
an orchestra performance and forest sounds, by using the microphone ar-
ray. Recreated sound fields were evaluated by demonstration experiments
using the 3-D sound field. Subjective assessments of 390 subjects confirm
that these systems can achieve high presence for 3-D sound reproduction
and provide the listener with deep immersion.

Keywords: Boundary surface control principle, Immersive environments,


Virtual reality, Stereophony, Surround sound.

1 Introduction

Stereophony is one of the primary factors in improving the sense of immersion


in movies or television. Recent years have seen the emergence of surround sound
systems with 5.1-channel or greater loudspeakers beyond movie theaters and into
the home. Surround sound listeners can achieve a feeling as if they are in the
actual places to which they are listening. Traditional surround systems, however,
cannot reconstruct sound wavefronts that radiate in the actual environment. If
a system can reconstruct the wavefront, it can recreate a fully immersive, rather
than surrounding, environment. Since the hearing information can be obtained
from all directions, the importance of this information is particularly increased
in applications where many people in distance places communicate with each
other.


The Kirchhoff-Helmholtz integral equation (KHIE) is the theoretical basis


of 3-D sound reproduction systems to record and reproduce sound fields, and
this sort of reproduction has a long history. In the early 1930s, Fletcher and
colleagues reported that ideal sound field recording and reproduction could
be achieved by using numerous microphones and loudspeakers with acoustical
transparency[3,4]. Steinberg et al. also confirmed in subjective experiments that
stereophonic sound could be represented using only three loudspeakers[13]. A
three-loudspeaker system cannot reconstruct a wavefront but can serve as a
basis for surround sound.
Sound field or 3-D sound reproduction is, however, more attractive. To reconstruct
wavefronts correctly, as compared with conventional surround systems, many
technologies have been developed and their theoretical bases have been studied.
In 1993, Berkhout et al. proposed Wave Field Synthesis (WFS) [1] as a 3-D
sound field reproduction method based on the KHIE, and IOSONO [8] is a commercial
application based on WFS. The theoretical basis of WFS is, however, the Rayleigh
integral equation; the KHIE is not used directly, and an infinite plane boundary is
assumed explicitly in WFS. Moreover, since a finite-length loudspeaker array is used in
practical WFS, it is difficult to reproduce a 3-D sound field without artifacts due to
truncation. In addition, practical loudspeakers must be placed on the boundary instead
of ideal monopole sources; approximating an ideal source by means of a loudspeaker
causes other artifacts, especially in the higher frequency range, owing to the
loudspeaker's own properties. Ambisonics [14] is another 3-D sound reproduction
approach based on the wave equation. Ambisonics-based systems require higher-order
spherical harmonics to accurately reproduce 3-D sound fields, and such systems have to
date been difficult to construct. It is also difficult to extend the so-called sweet
spot in Ambisonics.
In contrast, Ise in 1997 proposed the boundary surface control principle (BoSC)
[6,7]. By integrating the KHIE and inverse system, the BoSC can accurately re-
produce a 3-D sound field surrounded by a closed surface. According to the BoSC,
it is not necessary to place ideal monopole and dipole sources on the boundary.
Therefore an approximation of such sources is also not required. There is also no
restriction on the loudspeaker positions; the loudspeakers can be located anywhere in
the region exterior to the boundary.
Consequently, in our research we constructed a 3-D sound reproduction system
that contains a 70-channel microphone array and a 62-channel loudspeaker array.
In this manuscript, we describe the results of subjective assessment to confirm the
“presence” recreated by the BoSC-based 3-D sound field reproduction system.

2 Boundary Surface Control Principle

The BoSC is a theory for reproducing a 3-D sound field, and has also been applied to
active noise control [10] and to steering the directionality of a loudspeaker
array [2]. This section describes the BoSC as applied to 3-D sound reproduction.
Fig. 1 illustrates 3-D sound field reproduction based on the BoSC. The

Fig. 1. 3-D sound field reproduction based on the BoSC. The left figure shows the primary
sound field to be recorded, e.g., a concert hall; the right figure shows the secondary sound
field where the recorded primary sound field is reproduced.

figure on the left shows the primary sound field to be recorded; e.g., a concert
hall. On the right is the secondary sound field where the recorded primary sound
field is reproduced; i.e., a listening room. The 3-D sound reproduction system
based on the KHIE aims to record the sound field V bounded by surface S and reproduce
it in V' bounded by S'. Using the KHIE, the complex sound pressures p(s) and p(s'),
where s ∈ V and s' ∈ V' are the evaluation points, are given by

\[
p(s) = \int_{S} \left\{ -j\omega\rho_{0}\, G(r|s)\, v_{n}(r) - p(r)\, \frac{\partial G(r|s)}{\partial n} \right\} \mathrm{d}S , \tag{1}
\]

\[
p(s') = \int_{S'} \left\{ -j\omega\rho_{0}\, G(r'|s')\, v_{n}(r') - p(r')\, \frac{\partial G(r'|s')}{\partial n'} \right\} \mathrm{d}S' , \tag{2}
\]

where ω is the angular frequency, ρ0 is the density of the medium, and p(r) and v_n(r)
are the sound pressure and the normal-outward particle velocity on the boundary,
respectively, and

\[
G(r|s) = \frac{e^{-j\frac{\omega}{c}|r-s|}}{4\pi|r-s|}
\]

is the free-space Green's function [11,15]. For notational simplicity, the angular
frequency ω is omitted in the equations. The free-space Green's function G(r|s) is
explicitly defined by r and s; therefore, if the shape of boundary S is identical to
that of S', the free-space Green's function G(r|s) is identical to G(r'|s').
Consequently, if p(r') and v_n(r') are equal to p(r) and v_n(r), then p(s') is equal
to p(s).
To equalize p(r') and v_n(r') with p(r) and v_n(r), respectively, the BoSC system
employs secondary loudspeakers. The output signals of the loudspeakers are determined
by convolving the recorded sound signals with the inverse system. The inverse system is
computed to equalize the room transfer functions between each loudspeaker and
microphone.

3 3-D Sound Reproduction System


3.1 BoSC in Practice: Reproducing a 3-D Sound Field from Recorded Sound Pressure
on the Boundary
The 3-D sound reproduction system based on the KHIE theoretically requires
measurements of the particle velocity on the boundary. It is well known that the
particle velocity can be estimated from the sound pressure measured at two points
straddling the boundary [11]. In this case, however, twice the number of
recording/control points is required, which entails a huge computational cost.
Therefore we constructed a 3-D sound reproduction system that records and reproduces
only the sound pressure on the boundary. To this end, the Dirichlet Green's function
GD(r|s) can be used [15]: substituting GD(r|s) into equations (1) and (2), the first
term on the right-hand side of these equations is eliminated. Note that it is difficult
to derive the exact value of the Dirichlet Green's function GD(r|s), but this is not
required in the BoSC system. The BoSC can assume that GD(r|s) and GD(r'|s') are
identical if the shape of boundary S is identical to that of S'. Therefore, if the
sound pressure is recorded on boundary S and reproduced on boundary S', p(s) is also
reproduced in volume V'. Note, however, that 3-D sound fields at the natural
frequencies of the closed surface cannot be reproduced under the Dirichlet boundary
condition.

3.2 Microphone Array for 3-D Sound Field Recording


For boundaries S and S' depicted in Fig. 1, we assumed that the microphones should be
distributed at regular intervals. To construct a microphone array of this shape, we
designed it based on the C80 fullerene structure. The constructed array is shown in
Fig. 2; its diameter is around 46 cm. Omni-directional microphones (DPA 4060-BM) are
installed on the nodes of the fullerene. The ten nodes located at the bottom of the
fullerene are left without microphones so that the head of a Head And Torso Simulator
(HATS) or of a subject can be inserted (Fig. 2); therefore, there are 70 microphone
nodes. Since the maximum and minimum intervals between microphones are around 16 cm
and 8 cm, respectively, the system can reproduce a frequency range up to about 2 kHz.
The system, however, aims to create immersive environments and to let people feel the
presence of other people or places. We therefore did not limit the frequency range to
below 2 kHz in the demonstration experiment; in the experiment we also aimed to
evaluate 3-D sound fields containing frequency content above 2 kHz.
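As a rough, back-of-the-envelope check of the quoted limit (our own estimate, not a figure taken from the system's specification), the spatial sampling condition requires the microphone spacing d to be no larger than half a wavelength, so the upper usable frequency is approximately

\[
f_{\max} \approx \frac{c}{2d} = \frac{343\ \mathrm{m/s}}{2 \times 0.08\ \mathrm{m}} \approx 2.1\ \mathrm{kHz}
\]

when the smaller (about 8 cm) spacing is used; the larger 16 cm spacing would correspond to roughly 1 kHz.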

3.3 Loudspeaker Array and Sound Reproduction Room


As the secondary sound sources depicted in Fig. 1, we designed a loudspeaker array with
a dome structure consisting of four wooden layers supported by four wooden columns.
Six, 16, 24, and 16 full-range loudspeakers (Fostex FE83E) were installed on the
respective layers. We presumed that the height

Fig. 2. BoSC-based 3-D sound reproduction system: 70-channel microphone array in which
omni-directional microphones are installed on every node. (a) The 70-channel microphone
array; (b) an omni-directional microphone installed on a node.

of the third layer is almost the same as that of the center of the microphone array. To
compensate for the limited low-frequency response of the full-range loudspeakers, two
subwoofer loudspeakers (Fostex FW108N) were installed on each wooden column; however,
only the 62 full-range loudspeakers were employed in the design of the inverse system.
Although the minimum resonance frequency of a full-range loudspeaker is 127 Hz, the
3-D sound reproduction system we constructed can reproduce sound from 80 Hz upward by
using an appropriate inverse system. We therefore employed the subwoofer loudspeakers
only for the frequency range below 80 Hz. The loudspeaker array is shown in Fig. 3. It
was constructed in a soundproofed room (Yamaha Woodybox; sound insulation level
Dr = 30; inside dimensions: 1,203 mm × 1,646 mm × 2,164 mm) to reduce disturbance from
background noise. To reduce the reverberation in the soundproofed room, we attached
sound-absorbing sponge to each interior wall.

3.4 Design of the Inverse System

In Fig. 1, we presume that X(ω) is the vector of complex pressures recorded on boundary
S in the primary sound field, Y(ω) is the vector of signals radiated by the loudspeakers
in the secondary sound field, and [G(ω)] is the impedance matrix of transfer functions
between each loudspeaker and microphone pair. The complex pressure Z(ω) measured on
boundary S' in the secondary sound field then satisfies Equation (3):

\[
Z(\omega) = [G(\omega)]\, Y(\omega) . \tag{3}
\]

Therefore, for Z(ω) = X(ω) to hold, Equation (4) is required:

\[
Y(\omega) = [G(\omega)]^{+} X(\omega) , \tag{4}
\]



Fig. 3. BoSC-based 3-D sound reproduction system: 70-channel loudspeaker array in which
62 full-range loudspeakers and eight subwoofer loudspeakers are installed. Only the 62
full-range loudspeakers are used to render the wavefront. (a) Loudspeaker array
constructed in the soundproofed room; (b) dome structure of the loudspeaker array.

where [·]+ represents the pseudo-inverse matrix. In addition,

\[
X(\omega) = [X_1(\omega), \cdots, X_N(\omega)]^{T}, \quad
Y(\omega) = [Y_1(\omega), \cdots, Y_M(\omega)]^{T}, \quad
Z(\omega) = [Z_1(\omega), \cdots, Z_N(\omega)]^{T},
\]
\[
[G(\omega)] =
\begin{bmatrix}
G_{11}(\omega) & \cdots & G_{1M}(\omega) \\
\vdots & \ddots & \vdots \\
G_{N1}(\omega) & \cdots & G_{NM}(\omega)
\end{bmatrix}, \tag{5}
\]

where [·]T represents the transpose; the number of microphones is N = 70 and the number
of loudspeakers is M = 62 in this manuscript. Therefore, by using the left inverse
matrix, [G(ω)]+ can be given as

\[
[G(\omega)]^{+} = \left( [G(\omega)]^{\dagger} [G(\omega)] + \beta(\omega)\, I_M \right)^{-1} [G(\omega)]^{\dagger} , \tag{6}
\]

where [·]† represents the conjugate transpose, β(ω) is the regularization parameter,
and I_M is the identity matrix of order M. An appropriate regularization parameter can
reduce instabilities in the inversion of ([G(ω)]†[G(ω)]). We determined the parameters
heuristically in each octave frequency band, with band centers at 125 Hz, 250 Hz,
500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz, and 16 kHz.
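Equation (6) translates directly into a few lines of numerical code. The sketch below is a MATLAB illustration of the regularized left inverse at a single frequency bin; the function name, the variable names, and the per-bin usage are our assumptions, not the authors' implementation.

function Ginv = regularizedLeftInverse(G, beta)
% G:    N-by-M matrix of measured transfer functions at one frequency bin
%       (N = 70 microphones, M = 62 loudspeakers)
% beta: regularization parameter chosen for the octave band of this bin
    M = size(G, 2);
    Ginv = (G' * G + beta * eye(M)) \ G';   % Eq. (6); G' is the conjugate transpose
end

% The loudspeaker driving signals at this bin then follow Eq. (4):
%   Y = Ginv * X;   % X: 70-by-1 recorded pressures, Y: 62-by-1 loudspeaker signals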
The transfer impedance matrix [G(ω)] is measured experimentally in advance by using a
2^17-point swept-sine signal. The inverse matrix therefore contains only the inversion
of the measured transfer functions and is not the exact inverse under the actual
reproduction conditions. To determine [G(ω)]+ so that it can compensate for
fluctuations and time variations of the transfer functions, adaptive signal processing
can be employed. Many adaptive signal processing methods for MIMO inverse systems have
been proposed and applied in WFS systems to compensate for reverberation in a listening
room [9,12,5]. Almost all of these algorithms could be applied to the BoSC system.
However, such adaptive algorithms

Fig. 4. Recording of 3-D sound field data: (a) orchestra, with two microphone arrays
located in the auditorium and in front of the conductor during a performance of
Beethoven's Symphony No. 7; (b) forest sounds, which contain an airplane, a hummingbird,
singing insects, footsteps, conversational voices, and so on.

with 62 inputs and 70 outputs require huge computational complexity. We therefore
assumed that the instabilities caused by fluctuations and time variations of the
transfer impedance matrix can be sufficiently reduced by using appropriate
regularization parameters.

4 Demonstration Experiments
4.1 3-D Sound Field Recording
To demonstrate the performance of the 3-D reproduction system based on BoSC,
we recorded real 3-D sound environments. The orchestra and forest sounds were
reproduced in the subjective assessments described in the next section. The
recording environments are shown in Fig. 4. The recordings of 3-D sound sources
were carried out with 48 kHz sampling frequency and 24-bit quantization bit
depth. In the recording of the orchestra, we employed two microphone arrays;
one located in the auditorium and another in front of the conductor. To provide
supplementary visual information in the demonstration, we also carried out video
recording at the same positions as the 3-D sound field recording.

4.2 Subjective Assessment


The demonstration experiments were conducted to evaluate the performance of
the BoSC-based sound reproduction system. The reproduced sound fields are
listed in Table 1. Three kinds of nature sounds (A), recorded in forests, and an
orchestra performance (B and C) were employed in the demonstration. The orchestra
performance is the same in (B) and (C). The demonstration was limited to around five
minutes. The recorded video was shown on an LCD monitor while the orchestra

Table 1. Reproduced 3-D sound field data

                        Feature sound / Location             Time [sec.]
Forest sound     A      airplane                             70.0
                        conversing voices                    37.5
                        footsteps                            22.5
Orchestra        B      in auditorium                        67.0
                 C      in front of conductor (stage)        67.0

Table 2. Questionnaire entries for the demonstration of the 3-D sound field reproduc-
tion

Comprehensive feeling for the reproduced 3-D sound field


A. Did you feel as if you were in a forest?
1. Very poor 2. Poor 3. Average 4. Good 5. Very good
B. Did you feel as if you were in the auditorium?
1. Very poor 2. Poor 3. Average 4. Good 5. Very good
C. Did you feel as if you were in front of the conductor?
1. Very poor 2. Poor 3. Average 4. Good 5. Very good
What was the most impressive sound? (description)

Table 3. Averaged scores for each age group and for all subjects (1. very poor, 2. poor,
3. average, 4. good, and 5. very good)

                   up to 9 y/o   10's    20's    30's    40's    50 y/o and over   Total
A. Forest sound    4.54          4.69    4.48    4.66    4.55    4.10              4.55
B. Auditorium      4.50          4.53    4.29    4.24    4.36    4.28              4.37
C. Stage           4.26          4.43    4.59    4.57    4.68    4.24              4.50

performance was being reproduced. The subjective assessment was conducted with the
audience of the 3-D sound reproduction system, and the questionnaire entries are listed
in Table 2. For the orchestra performance (B and C), the subwoofer loudspeakers were
employed to reinforce the lower frequency range; each subwoofer loudspeaker was
assigned to radiate sound signals obtained from the C80 microphone array by
delay-and-sum processing.
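As an aside, delay-and-sum combining of the array signals can be sketched in a few lines of MATLAB. The code below is an illustration only; the signal layout, the steering point, and the absence of the sub-80 Hz low-pass stage are our simplifications, not the authors' processing chain.

function y = delayAndSum(micSig, micPos, steerPos, fs, c)
% micSig:   L-by-70 matrix of microphone signals (one column per microphone)
% micPos:   70-by-3 microphone positions [m];  steerPos: 1-by-3 steering point [m]
% fs:       sampling rate [Hz];  c: speed of sound [m/s]
    nMics  = size(micSig, 2);
    dist   = sqrt(sum(bsxfun(@minus, micPos, steerPos).^2, 2));  % mic-to-point distances
    delays = round((max(dist) - dist) / c * fs);                 % align the arrivals (samples)
    y = zeros(size(micSig, 1), 1);
    for m = 1:nMics
        y = y + [zeros(delays(m), 1); micSig(1:end - delays(m), m)];
    end
    y = y / nMics;                                               % average the aligned signals
end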

4.3 Experimental Results and Discussions


In the demonstration, we obtained questionnaire responses from 390 subjects.
The results of the subjective assessment for each age group are shown in Fig. 5 and the
averaged scores in Table 3. Fig. 5(a) shows that almost all subjects of all ages felt
as if they were in nature. This confirms that the BoSC-based sound reproduction system
reproduced a 3-D sound field and can create immersive environments.
Figs. 5(b) and (c) also show that many subjects rated the reproduced sound
field as “good” or “very good.” The system therefore can be said to yield a

Fig. 5. Experimental results: subjective assessments for (a) forest sounds, (b) auditorium,
and (c) stage. Each panel plots the number of subjects in each age group giving each
rating, from very poor to very good.

sense of presence similar to being in the concert hall or on the stage. The scores for
B, however, were lower than those for A or C, especially among subjects in their 20s to
40s, since mismatches between the reproduced sound field and the video information, due
to the monitor's size and position, caused odd sensations. On the other hand, because
only the conductor was shown on the monitor, the subjects did not experience such
sensations in C. In addition, subjects with experience playing instruments gave more
"very good" scores for C than for B.

5 Conclusions
We constructed a 3-D sound reproduction system based on the BoSC, consisting
of a 70-channel microphone array designed from a C80 fullerene structure, and
a loudspeaker array in which 62 full-range loudspeakers and eight subwoofer
loudspeakers were installed.

To evaluate the performance of the BoSC-based system, we conducted a subjective
evaluation through the demonstrations. The assessment results confirm that the system
can provide immersive environments and convey the presence of other people or places.
However, we did not limit the frequency range of the reproduced sound field, although
it is theoretically limited to about 2 kHz; the accuracy of the reproduced sound fields
should be physically evaluated in the future.

Acknowledgments
Part of this study was supported by the Special Coordination Funds for Pro-
moting Science and Technology of the Ministry of Education, Culture, Sports,
Science and Technology of Japan, and the Strategic Information and Commu-
nications R&D Promotion Program commissioned by the Ministry of Internal
Affairs and Communications of Japan.

References
1. Berkhout, A.J., de Vries, D., Vogel, P.: Acoustic control by wave field synthesis.
Journal of the Acoustical Society of America 93(5), 2764–2778 (1993)
2. Enomoto, S., Ise, S.: A proposal of the directional speaker system based on
the boundary surface control principle. Electronics and Communications in
Japan 88(2), 1–9 (2005)
3. Fletcher, H.: Auditory perspective - basic requirement. In: Symposium on Wire
Transmission of Symphony Music and its Reproduction in Auditory Perspective,
vol. 53, pp. 9–11 (1934)
4. Fletcher, H.: The stereophonic sound film system - general theory. The J. Acoust.
Soc. Am. 13(2), 89–99 (1941)
5. Gauthier, P.A., Berry, A.: Adaptive wave field synthesis with independent radiation
mode control for active sound field reproduction: Theory. Journal of the Acoustical
Society of America 119(5), 2721–2737 (2006)
6. Ise, S.: A principle of sound field control based on the Kirchhoff-Helmholtz integral
equation and the inverse system theory. Journal of the Acoustical Society of
Japan 53(9), 706–713 (1997) (in Japanese)
7. Ise, S.: A principle of sound field control based on the Kirchhoff-Helmholtz integral
equation and the theory of inverse systems. Acustica 85, 78–87 (1999)
8. Lee, N.: Iosono. ACM Computers in Entertainment (CIE) 2(3), 3–3 (2004)
9. Lopez, J., Gonzalez, A., Fuster, L.: Room compensation in wave field synthesis
by means of multichannel inversion. In: IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics (2005)
10. Nakashima, T., Ise, S.: A theoretical study of the descretization of the boundary
surface in the boundary surface control principle. Acoustical Science and Technol-
ogy 27(4), 199–205 (2006)
11. Nelson, P.A., Elliot, S.J.: Active Control of Sound. Academic Press, San Diego
(1992)

12. Spors, S., Buchner, H., Rabenstein, R., Herbordt, W.: Active listening room com-
pensation for massive multichannel sound reproduction systems using wave-domain
adaptive filtering. Journal of the Acoustical Society of America 122(1), 354–369
(2007)
13. Steinberg, J., Snow, W.: Auditory perspective - physical factors. In: Symposium
on Wire Transmission of Symphony Music and its Reproduction in Auditory Per-
spective, vol. 53, pp. 12–17 (1934)
14. Ward, D.B., Abhayapala, T.D.: Reproduction of a plane-wave sound field using an
array of loudspeakers. IEEE Transactions on Speech and Audio Processing 9(6),
697–707 (2001)
15. Williams, E.G.: Fourier Acoustics: Sound Radiation and Nearfield Acoustical
Holography. Academic Press, London (1999)
Workspace-Driven, Blended Orbital Viewing in
Immersive Environments

Scott Frees and David Lancellotti

Ramapo College of New Jersey


505 Ramapo Valley Road
Mahwah, NJ 07430
{sfrees,dlancell}@ramapo.edu

Abstract. We present several additions to orbital viewing in immersive virtual


environments, including a method of blending standard and orbital viewing to
allow smoother transitions between modes and more flexibility when working
in larger workspaces. Based on pilot studies, we present methods of allowing
users to manipulate objects while using orbital viewing in a more natural way.
Also presented is an implementation of workspace recognition, where the
application automatically detects areas of interest and offers to invoke orbital
viewing as the user approaches.

Keywords: Immersive Virtual Environments, Context-Sensitive Interaction,


3DUI, interaction techniques.

1 Introduction
One of the key benefits of most immersive virtual environment configurations is the
ability to control the viewpoint using natural head motion. When wearing a tracked,
head-mounted display, users can control the direction of their gaze within the virtual
world by turning their head the same way they do in the real world. This advantage
typically holds in other configurations as well. We refer to this viewpoint control as
egocentric viewing, as the axis of rotation of the viewpoint is centered atop the user’s
physical head.
There are, however, situations where egocentric view control may not be optimal.
In order to view an object from different perspectives, users are normally forced to
physically walk around the area. Depending on the hardware configuration (wired
trackers, etc.), this can be cumbersome. One alternative is orbital viewing [5], where
the user’s gaze is fixed towards the area of interest. Head movements and rotations
are mapped such that the user’s viewpoint orbits around the target location, constantly
being reoriented such that the target is in view. Physical head rotations to the left
cause the virtual viewpoint to swing out to the right, such that the user is looking at
the right side of the object. Looking down has the effect of moving the viewpoint up
above such that the user is looking down at the object.
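The mapping just described can be written compactly. The following MATLAB sketch is our illustration of the idea (function and variable names are assumptions, not the authors' implementation): the viewpoint is placed on a sphere around the point of interest, on the side opposite the tracked gaze direction, so that the gaze always lands on the target.

function [eyePos, viewDir] = orbitalView(headDir, center, radius)
% headDir: gaze direction reported by the head tracker (3-vector, world frame)
% center:  3-D point of interest;  radius: orbit radius
    viewDir = headDir(:)' / norm(headDir);    % keep the user's gaze direction
    eyePos  = center(:)' - radius * viewDir;  % orbit to the side opposite the gaze
end

% Example (Z up): looking straight down places the viewpoint above the target,
% looking down at it, as described above.
%   [eye, dir] = orbitalView([0 0 -1], [2 1 0], 1.5);   % eye = [2 1 1.5], dir = [0 0 -1]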
Orbital viewing has its own disadvantages. Firstly, it is quite ineffective if the user
wishes to survey the world; it locks the gaze direction such that the user is always
looking towards the same 3D location in the world. Switching between orbital and


egocentric viewing can also be disruptive and confusing. Orbital viewing also presents
interaction challenges when manipulating objects, as the act of orbiting while
manipulating an object can affect the user's ability to control the object's position
accurately. This paper presents additions to orbital viewing that allow it to be more
effective in general interactive environments.
Finally, we describe how orbital viewing can be integrated into an application by
linking it to known areas of interest within the virtual world – which we call
“hotspots”. We present an outline of how we automatically infer the existence of such
locations based on user behavior.

2 Related Work

Koller et al. [5] first presented orbital viewing as an alternative method for viewing
objects, in particular a WIM [7]. Several of orbital viewing’s limitations were
addressed in this work, such as the possibility of occlusion, importance of controlling
the radius of orbit, and the possibility of increased disorientation when transitioning
into / out of orbital viewing. We do not address the occlusion problem in our work,
but offer implementations that relate to radius control and disorientation.
Many alternatives to egocentric view control have been presented in the literature
using a wide range of approaches for both immersive environments [2, 7, 9] and
desktop applications [8]. Our work is not aimed at comparing orbital viewing with
other alternative techniques however; it is focused on determining and improving
orbital viewing’s effectiveness in interactive systems in general.
Much of our work is aimed at developing viewpoint control techniques that
effectively handle changing workspaces. The user’s workspace can be described as
the location and size of the area the user is currently most interested in. The size and
location of the workspace can greatly influence an interaction technique’s
effectiveness, described in detail in [1]. Our techniques for implicitly recognizing
workspace are based in part on the concept of focus and nimbus, presented by
Greenhalgh and Benford [3]. In addition, our concept of modeling workspaces as
artifacts within the environment in which users can invoke orbital viewing is similar
to the idea of landmarks introduced by Pierce and Pausch [6], where landmarks were
anchor points for navigation techniques in large-scale workspaces.

3 Identifying Problems with Orbital Viewing

This research started as an investigation into whether or not orbital viewing could aid
in manipulation tasks requiring the user to view objects from multiple perspectives. In
our user studies, participants were presented with a virtual object (as shown in
Fig. 1) that fits within a translucent object of the same shape (only slightly larger).
The user was asked to position the object to fit within the target as many times as
possible within a 3-minute trial. This task was selected because it would require the
user to view the object from all directions in order to make the fit.

Fig. 1. User manipulating the object such that it will match the target's (blue translucent object)
orientation

We conducted this experiment with 24 participants. Each participant underwent two training trials using egocentric viewing and two training trials with standard
orbital viewing – where the orbital viewing configuration was centered on the target
object. They completed two recorded trials with egocentric and orbital viewing in a
randomized order.
ANOVA analysis of completions per trial showed no statistically significant effect
for view control type (p = 0.143). User feedback, however, provided us with some
clear areas where orbital viewing could be improved. Survey data also indicated that
while orbital viewing did not help performance on the task, many participants found it
more comfortable (as opposed to walking around the object).
Our first observation was that orbital viewing could make it difficult to control
objects precisely. As the head rotates, the user’s virtual viewpoint and avatar orbit
around a center point. This becomes a problem when the user is holding a virtual
object (perhaps with a hand-held stylus). If the physical relationship between the hand
and head are preserved, head movement results in hand movement – and thus the
virtual object will move. In Section 4.1 we present a modification that temporarily
breaks the relationship between the viewpoint and hand to improve manipulation.
Another area where orbital viewing caused problems was at the beginning of trials,
where if the trial was to use orbital viewing the user was abruptly switched into that
mode when timing began. We observed confusion and awkwardness during these
transitions. Furthermore, our overall goal is to take advantage of orbital viewing in
general virtual environment applications if/when the user’s workspace becomes small.
If the user is to switch between egocentric and orbital viewing as their workspace size
changes, the disorientation during transitions needed to be resolved. In Section 4.2,
we discuss our blended view control approach that works to reduce this problem.

4 Implementation of Viewing Techniques


The implementation of egocentric viewing is straightforward – a tracking device of
some kind provides position and orientation information for the user’s head. The
viewpoint, or virtual camera, rotates and moves in a one-to-one correspondence with

the tracker - commonly implemented by making the viewpoint a child object of the tracker representation in the scene graph (perhaps with offsets to account for eye position). An alternative way of describing viewpoint control in 3D graphics is the specification of two 3D points, eye and look. In this scenario, the orientation of the viewpoint or virtual camera is constrained to point at the "look" point. The look and eye points are rigidly attached to each other (at some fixed distance). Rotations of the viewpoint (driven by the tracker) result in translations and rotations of the look point. In effect, in egocentric viewing, the look point orbits the eye point (or viewpoint), as depicted in Fig. 2 (A). In our system, we used a Virtual Research 1280 HMD and a Polhemus FASTRAK system. The application and interaction techniques were implemented using the SVE [4] and CDI toolkits [1].
When orbital viewing is active, the situation reverses – instead of the tracker being mapped to rotations of the eye point, head rotations are mapped to the look point. The look point's position is fixed, and the viewpoint orbits the look point. This is implemented by detaching the viewpoint from the tracker and making it a child of the look point. Rotational information (not position) is mapped from the head tracker to the look point. The result is that head movements cause the viewpoint to orbit around the look point, as shown in Fig. 2 (B).

Fig. 2. (A) Egocentric viewing; (B) Orbital viewing
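As a concrete illustration of this eye/look formulation, the following C++ sketch shows how head-tracker rotations could drive the two modes. It is only a sketch under assumed types (Vec3, Quat), an assumed -Z forward axis, and invented function names; it is not taken from the SVE/CDI implementation used by the authors.

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };   // unit quaternion reported by the head tracker

// Rotate vector v by unit quaternion q: v' = v + 2s(u x v) + 2u x (u x v).
Vec3 rotate(const Quat& q, const Vec3& v) {
    Vec3 u{q.x, q.y, q.z};
    float s = q.w;
    Vec3 uxv{u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
    return { v.x + 2.0f * (s * uxv.x + u.y * uxv.z - u.z * uxv.y),
             v.y + 2.0f * (s * uxv.y + u.z * uxv.x - u.x * uxv.z),
             v.z + 2.0f * (s * uxv.z + u.x * uxv.y - u.y * uxv.x) };
}

struct ViewState {
    Vec3  eye;     // viewpoint position
    Vec3  look;    // point the virtual camera is constrained to face
    float radius;  // fixed distance between eye and look
};

// Egocentric viewing: the tracker drives the eye; the look point orbits the eye.
void updateEgocentric(ViewState& s, const Vec3& headPos, const Quat& headRot) {
    s.eye = headPos;
    Vec3 fwd = rotate(headRot, Vec3{0.0f, 0.0f, -1.0f});   // assumed forward axis
    s.look = { s.eye.x + fwd.x * s.radius,
               s.eye.y + fwd.y * s.radius,
               s.eye.z + fwd.z * s.radius };
}

// Orbital viewing: the look point is pinned; the same rotation swings the eye around it.
void updateOrbital(ViewState& s, const Quat& headRot) {
    Vec3 back = rotate(headRot, Vec3{0.0f, 0.0f, 1.0f});   // opposite of the forward axis
    s.eye = { s.look.x + back.x * s.radius,
              s.look.y + back.y * s.radius,
              s.look.z + back.z * s.radius };
    // The camera is then oriented so that it faces s.look.
}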

In an interactive system, the user must be given a method to control the radius of the orbit. In our implementation, the user points the stylus in the general direction they wish to move and holds the button down. The vector representing the stylus direction is projected onto a vector connecting the eye and look points. If the vector projects away from the look point, the radius is increased, and vice versa.

4.1 Object Manipulation While Using Orbital Viewing

In orbital viewing, head rotation results in viewpoint rotation and translation (the viewpoint orbits the workspace). This presents a question: should the viewpoint and virtual hand move together, as a unit, or should the virtual hand stay fixed while only the viewpoint moves? Both choices could create confusion; however, the former creates real interaction difficulties since the virtual hand may move while the physical hand remains still.

Our implementation allows the hand to orbit with the viewpoint only when not
interacting with an object. This means that if the user’s hand is within view when
rotating the head, the virtual hand appears still, as it moves proportionately with the
viewpoint, as depicted in Fig. 3 (A). This configuration is implemented by moving the
virtual hand in the scene graph such that it is a child of the viewpoint. Its local
position with respect to the viewpoint is recalculated each time the stylus tracker
moves such that it is equal to the distance between the stylus and HMD trackers.


Fig. 3. (A) Virtual hand moves with viewpoint. (B) Virtual hand stays in original (physical)
position, viewpoint moves without it.

In our system the user’s virtual hand is tracked by a hand-held stylus, which has a
single button. The user can translate and rotate an object by intersecting their virtual
hand with it and holding a stylus button down. While manipulating an object, we
switch to the setup depicted in Fig. 3 (B). When holding an object the virtual hand
“sticks” in its original position - the cursor/virtual hand still responds to physical
hand movement, but it does not orbit with the viewpoint. This creates an
inconsistency between the physical hand/head and the virtual hand/viewpoint. Once
the user releases the object, the virtual hand “snaps” back to where it would have
been if it had orbited with the viewpoint, which restores the user to a workable state
going forward.
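The hold/release policy described above might be structured as in the following sketch; the Node type and its scene-graph methods are simplified placeholders rather than the SVE/CDI calls actually used.

struct Vec3 { float x, y, z; };

struct Node {                                   // minimal stand-in for a scene-graph node
    Node* parent = nullptr;
    Vec3  local{0.0f, 0.0f, 0.0f};
    void setParent(Node* p)              { parent = p; }
    void setLocalPosition(const Vec3& v) { local = v; }
    void setWorldPosition(const Vec3& v) { local = v; }  // simplified: ignores parent transform
};

void updateVirtualHand(Node& hand, Node& viewpoint, Node& world,
                       const Vec3& stylusMinusHmdOffset,   // physical stylus minus HMD tracker
                       const Vec3& stylusWorldPos, bool holdingObject) {
    if (!holdingObject) {
        // Idle: the hand is a child of the viewpoint, so it orbits with it and
        // appears still to the user during head rotation.
        hand.setParent(&viewpoint);
        hand.setLocalPosition(stylusMinusHmdOffset);
    } else {
        // Manipulating: the hand "sticks" in world space and follows only the physical
        // hand; re-parenting under the viewpoint on release produces the "snap" back.
        hand.setParent(&world);
        hand.setWorldPosition(stylusWorldPos);
    }
}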
At first glance, this technique may seem confusing, as one would think it would be
distracting to have your viewpoint potentially move large distances while the hand
stays stationary. In our experience, however, it has proven effective. When users are
manipulating objects, their attention is focused on their hand and the object and they
do not expect their virtual hand to move unless the physical hand does. When head
rotations cause their viewpoint to orbit the object of interest, the experience mimics
real world experiences of holding an object in hand and “peering” around it to see it
from another perspective. During user trials, participants needed little explanation of
the technique, they adapted to it without difficulty. We suspect preserving the
mapping of physical hand movements (or absence of) to the virtual cursor is far more
important than preserving the relationship between the viewpoint and virtual hand,
however we would like to investigate this more directly in future trials.

4.2 Blended Orbital Viewing

Switching between egocentric and orbital viewing can be disorienting, and oftentimes being in complete orbital viewing mode may not be optimal (the user may be
over-constrained). We have implemented a mechanism for blending the two viewing
techniques together seamlessly, such that the user can be experiencing view control
anywhere along the continuum between egocentric and orbital viewing.
We describe both techniques by maintaining two 3D points – look and eye – rigidly
attached to each other. In egocentric viewing, the head tracker directly rotates the eye
(viewpoint), where in orbital viewing it maps to the look point. Rather than pick a
configuration, we instead keep track of both sets of points - giving us eyeego and
lookego for egocentric and eyeorb and lookorb for orbital viewing using four hidden
objects inserted into the scene graph.
The actual look and eye points (and thus the viewpoint’s orientation) are defined
by a view control variable, VC. At VC = 0, the user is in full egocentric viewing and
the viewpoint is directly mapped to lookego and eyeego. When VC = 1, the user is
engaged in orbital viewing – the viewpoint is mapped to lookorb and eyeorb. For values
between 0 and 1, the eye (viewpoint) and look positions are linear interpolations of
their corresponding egocentric and orbital pairs. Once the interpolated look and eye
points are known, the actual virtual camera is repositioned and oriented to adhere to
the configuration depicted in Fig. 4. A similar interpolation procedure is performed to
position the virtual hand and cursor.
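A sketch of the blending computation is shown below; it assumes the four hidden points are maintained elsewhere and only performs the linear interpolation by VC, plus the gradual ramp used for the transitions described next.

struct Vec3 { float x, y, z; };

Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}

// VC = 0 -> pure egocentric viewing, VC = 1 -> pure orbital viewing.
void blendView(const Vec3& eyeEgo, const Vec3& lookEgo,
               const Vec3& eyeOrb, const Vec3& lookOrb,
               float VC, Vec3& eyeOut, Vec3& lookOut) {
    eyeOut  = lerp(eyeEgo,  eyeOrb,  VC);
    lookOut = lerp(lookEgo, lookOrb, VC);
    // The camera is placed at eyeOut and aimed at lookOut; the virtual hand and
    // cursor positions are interpolated in the same way.
}

// VC is never changed discretely; it is ramped toward its target over roughly 3 seconds.
float rampVC(float VC, float targetVC, float dt, float transitionSeconds = 3.0f) {
    float step = dt / transitionSeconds;
    if (VC < targetVC) { VC += step; if (VC > targetVC) VC = targetVC; }
    else               { VC -= step; if (VC < targetVC) VC = targetVC; }
    return VC;
}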

Blending on Transitions. The most straightforward use of blended orbital viewing is during transitions between full egocentric and orbital viewing. When the workspace
becomes small (either determined explicitly or through observation) and the interface
invokes orbital viewing, the abrupt change in viewpoint mapping can become
extremely distracting. Worse yet, abruptly transitioning from orbital viewing into
normal egocentric viewing can leave the user looking in a completely different
direction than they were a moment before (without actually physically moving
their head).
To alleviate these effects, we never move the VC value discretely; rather we
gradually adjust it over a period of three seconds. The actual time interval is
somewhat arbitrary, but it represented a good compromise derived from several user
trials. Shorter transition intervals did little to alleviate disorientation, and significantly
longer intervals tended to interfere with the task at hand. We fully expect different
users might respond better to different interval values.

Blending Based on Workspace Size. Often a user might be particularly interested in a region of the world containing several objects. While this workspace may be
relatively small, it could be large enough that locking into a fixed 3D location at its
center using pure orbital viewing could be a hindrance to the user. By selecting a VC
value between 0 and 1, the user benefits from orbital viewing (i.e. head movements
cause them to orbit the center point, giving them a better perspective) while still
allowing them to view a larger field of view (i.e. the look point moves within the
workspace). This is depicted in Fig. 5, where the VC value is 0.75.

Fig. 4. Example of blended orbital viewing with VC = 0.5

Fig. 5. With VC = 0.75, when the user rotates the physical head right, most of the rotation results in an orbit around to the right side (due to the orbital component), however the look point also moves (due to the egocentric component)

This technique could also be useful when the user gradually becomes more focused on a specific area. In the beginning of the sequence they may be surveying the world, using egocentric viewing. Over time, the user may start to focus on a set of objects, perhaps manipulating them. As they become more focused on a smaller space, the VC can gradually be increased towards orbital viewing.

5 Workspace Recognition
The user's current workspace is part of the contextual information an application can utilize when deciding the interaction technique to deliver. When working within a small, confined area, a viewing technique such as orbital viewing may be beneficial – while when working in a larger area it could be too limiting. To make decisions based on workspace, we require a way to define it quantitatively.
Our model defines three size ranges for the workspace: small, medium, and large; supporting different view control techniques for each range. We base view control decisions on the ranges instead of physical dimensions directly in order to allow the thresholds to be varied easily between applications or users. Current workspace can

be determined in a variety of ways – ranging from asking the user to explicitly indicate the volume to inferring it by user activity. We use a combination of explicit
and implicit workspace recognition. Potential workspaces in the world are identified
through observation – which we refer to as “hotspots”. Users must explicitly “attach”
to the hotspot before orbital viewing is invoked.
Workspace is inferred by recording where the user has been “looking” and
interacting. We divide the virtual world into a three-dimensional grid – with each
location assigned a numeric score. As the user’s viewpoint moves, we increment the
score of each location within the field of view. Interaction with an object near the
location also increases its score. Periodically the system reduces the score of all grid
locations by a small amount. Over time, heavily active locations achieve high enough
scores to be promoted to “hotspots” – which become permanent artifacts within the
world and anchor points for orbital viewing. A hotspot has a precise location. Its size
is recorded as “small” if there are no hotspots immediately adjacent to it. If there is a
hotspot at an adjacent grid location, its size is set to “medium”.
Hotspots are implemented such that the required score, the frequency and magnitude of increments, and the granularity of the 3D grid points can be easily varied; we have not yet done formal studies to determine an optimal configuration (if one even exists).
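A possible implementation of this grid scoring is sketched below; the cell size, decay rate, and promotion threshold are illustrative placeholders, since the paper leaves these parameters configurable.

#include <cstdint>
#include <functional>
#include <unordered_map>

struct CellKey {
    int x, y, z;
    bool operator==(const CellKey& o) const { return x == o.x && y == o.y && z == o.z; }
};
struct CellHash {
    std::size_t operator()(const CellKey& k) const {
        return std::hash<int64_t>()(((int64_t)k.x * 73856093) ^
                                    ((int64_t)k.y * 19349663) ^
                                    ((int64_t)k.z * 83492791));
    }
};

class WorkspaceGrid {
public:
    explicit WorkspaceGrid(float cellSize) : cellSize_(cellSize) {}

    CellKey cellAt(float x, float y, float z) const {      // truncation is fine for a sketch
        return { (int)(x / cellSize_), (int)(y / cellSize_), (int)(z / cellSize_) };
    }
    // Called for cells inside the field of view, and again for cells near manipulated objects.
    void addScore(const CellKey& c, float amount) { scores_[c] += amount; }

    // Periodic decay; cells that stay active long enough are promoted to permanent hotspots.
    void decayAndPromote(float decay, float hotspotThreshold) {
        for (auto& kv : scores_) {
            kv.second = (kv.second > decay) ? kv.second - decay : 0.0f;
            if (kv.second > hotspotThreshold) hotspots_[kv.first] = true;
        }
    }
    bool isHotspot(const CellKey& c) const { return hotspots_.count(c) > 0; }

    // A hotspot with an adjacent hotspot is treated as part of a "medium" workspace.
    bool hasAdjacentHotspot(const CellKey& c) const {
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    if (dx == 0 && dy == 0 && dz == 0) continue;
                    if (isHotspot({c.x + dx, c.y + dy, c.z + dz})) return true;
                }
        return false;
    }
private:
    float cellSize_;
    std::unordered_map<CellKey, float, CellHash> scores_;
    std::unordered_map<CellKey, bool, CellHash>  hotspots_;
};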
In addition to dynamic hotspots, we also support predefined hotspot locations that
can be placed within the world. Regardless of the type of hotspot, the user interface
associated with it is the same. As a user approaches a hotspot, they are given a textual
indication on the top of their HMD’s view suggesting that orbital viewing could be
used. If they wish to invoke orbital viewing, they simply touch their stylus to their
head and click its button. The user is then transitioned from a VC = 0 (egocentric
viewing) up to the maximum VC associated with the hotspot. Typically, we choose
VC = 1 for small (standalone) hotspot locations and VC = 0.75 for hotspots with
proximate neighbors (part of medium sized workspaces). The VC is adjusted gradually as described in Section 4.2. To exit orbital viewing, the user touches their
stylus to their head and clicks – which gradually reduces the VC back to 0.

6 Conclusions and Future Work


We have created several features that can be implemented with orbital viewing that we have seen reduce confusion and disorientation while allowing users to manipulate objects in a more natural manner. In follow-up experiments, post-trial user feedback has
improved. We believe the system for recognizing workspaces, or hotspots, in the
virtual world is a solid stepping stone in automatically recognizing areas of interest,
and that using blended orbital viewing to ease transitions is a valuable addition to the
view control interface.
Our greatest challenge has been quantifying performance results. The user
experiment described in Section 3 was specifically designed to be difficult if users did
not try to view the object from various angles. Our expectation was that if orbital
viewing really helped the user to do this, participants would attain more completions
on those trials. Even with our improvements however, the actual manipulation task is
hard enough that it seems to outweigh the effects of the view control method. In
subsequent experiments with the same design, view control technique still has not

appeared significant. Reducing difficulty (making the target larger such that it is
easier to fit the control object within it) tends to reduce the importance of the viewing
technique. We have considered experiments that only require the user to view an
object or area from multiple directions (without needing to interact), however we feel
this would be a trivial comparison – as obviously rotating one’s head would be faster
than walking around to the other side of the object. We are currently investigating
alternative methods of evaluating the effect orbital viewing has on general interaction.

Acknowledgements. This work was funded by the National Science Foundation, grant number IIS-0914976.

References
1. Frees, S.: Context-Driven Interaction in Immersive Virtual Environments. Virtual
Reality 14, 277–290 (2010)
2. Fukatsu, S., Kitamura, Y., Toshihiro, M., Kishino, F.: Intuitive control of “birds eye”
overview images for navigation in an enormous virtual environment. In: Proceedings of the
ACM Symposium on Virtual Reality Software and Technology, pp. 67–76 (1998)
3. Greenhalgh, C., Benford, S.: Massive: A collaborative virtual environment for
teleconferencing. ACM Transactions on Computer-Human Interaction 2(3), 239–261 (1995)
4. Kessler, G.D., Bowman, D.A., Hodges, L.F.: The Simple Virtual Environment Library, an Extensible Framework for Building VE Applications. Presence: Teleoperators and Virtual
Environments 9(2), 187–208 (2000)
5. Koller, D., Mine, M., Hudson, S.: Head-Tracked Orbital Viewing: An Interaction Technique
for Immersive Virtual Environments. In: Proceedings of the ACM Symposium on User
Interface Software and Technology, pp. 81–82 (1996)
6. Pierce, J., Pausch, R.: Navigation with Place Representations and Visible Landmarks. In:
Proceedings of IEEE Virtual Reality, pp. 173–180 (2004)
7. Stoakley, R., Conway, M., Pausch, R.: Virtual Reality on a WIM: interactive worlds in
miniature. In: Proceedings of CHI 1995, pp. 265–272 (1995)
8. Tan, D.S., Robertson, G.G., Czerwinski, M.: Exploring 3D Navigation: Combining Speed-
coupled Flying with Orbiting. In: CHI 2001 Conference on Human Factors in Computing
Systems, Seattle, WA (2001)
9. Tanriverdi, V., Jacob, R.: Interacting with Eye Movements in Virtual Environments. In:
Proceedings of SIGCHI on Human Factors in Computing Systems, pp. 265–272 (2000)
Irradiating Heat in Virtual Environments: Algorithm
and Implementation

Marco Gaudina, Andrea Brogni, and Darwin Caldwell

Advanced Robotics Dept. - Istituto Italiano di Tecnologia, Genoa, Italy


{marco.gaudina,andrea.brogni,darwin.caldwell}@iit.it

Abstract. Human-computer interactive systems have focused mostly on graphical rendering, implementation of haptic feedback, or delivery of auditory information. Human senses are not limited to this information, and other physical characteristics, like thermal sensation, are under research and development. In Virtual Reality, few algorithms and implementations have been developed to simulate the thermal characteristics of the environment, even though this physical characteristic can be used to dramatically improve the overall realism. Our approach is to establish a preliminary way of modelling an irradiating thermal environment, taking into account the physical characteristics of the heat source. We defined an algorithm in which the irradiating heat surface is analysed for its physical characteristics, material, and orientation with respect to a point of interest. To test the algorithm's consistency, some experiments were carried out and the results analysed. We implemented the algorithm in a basic virtual reality application using a simple, low-cost thermo-feedback device to allow the user to perceive the temperature in the 3D space of the environment.

Keywords: Virtual Reality, Thermal Characteristic, Haptic, Physiology.

1 Introduction
In the last decade, human-computer interaction has dramatically evolved due to the large expansion of technology. Some fields, like Virtual Reality, now have a new important role. In the challenge to take the user's experience to a different level of interaction, a Virtual Environment is a suitable platform that allows the user to feel more comfortable with the application. In a Virtual Environment the user can freely move around, perceive object depth via 3D glasses, touch objects using haptic interfaces, and listen to sounds. Many works presented in [8] demonstrate the effort that has been put in recent years into achieving a good level of haptic interaction in virtual environments. Softness, surface, and friction are examples of aspects that have been analysed so far, and these have greatly improved how a user can perceive him/herself as the protagonist of a virtual scene. Despite these improvements, one big topic needs more attention: thermal interaction. When we move around in an environment we feel different thermal characteristics; cutaneous skin stimuli are generated and our perception is altered, giving the nervous system important information about the surrounding area. In some cases, for example for blind people, thermal characteristics could help users where visual information is missing.


In [6], temperature changes during contact have been used to assist in identifying and discriminating objects in the absence of vision. Benali-Khoudja et al. in [1] showed a way the user can
feel the thermal characteristic of a touched object. Other researchers in [5] focused their
attention on heat transfer between a finger and a touched object, considering different
materials and blood flux. Human skin can perceive the irradiating heat that an object is
generating before touching it and therefore, with this information, the user can modify
the action that he/she is performing. If for example we want to move in the direction of
a switched-on lamp, we can perceive the heat the lamp is generating before touching it. Lamps, ovens, engines, electronic devices, or in general every system with a temperature different from absolute zero, generate irradiating heat.
In this work we present a novel thermal algorithm to represent heat irradiating
sources against a moving interest point. We implemented the algorithm in a virtual real-
ity environment with a simple and low cost thermal device. Two different experiments
were carried out to analyse how a user behaves reaching a hot object and discovering
which is the hot surface of a die with two different side sizes. Finally, we developed a
basic virtual application with multiple heat sources differentiating surface temperature
and material, introducing a basic 3D element representing a lamp.

2 Thermal Modeling

2.1 Irradiating Heat Exchange Algorithm

We are interested in finding the final temperature of an interest point under the influence of different irradiating heat sources inside a virtual environment. The aim of the algorithm is to take into consideration factors like the physical surface of the object, its surface temperature, and its material. At this stage we make a preliminary assumption: there is no air flux, either natural, since we consider the ambient as closed, or forced; this is because at this stage we do not want to consider convection effects. We consider each examined object as a grey body, like human skin, with a homogeneous surface material, without taking into account the color of the object itself. To be rigorous, we should consider that the heat exchange between two grey bodies is continuous and becomes unimportant after a while. This is because the radiation decreases, bouncing between the two bodies with a reflection coefficient that for a grey body is less than 1, as explained in [10]. As in acoustic studies, we should consider the case as a transitory stage and study all of the possible energy exchanges step by step; in thermodynamics, making a final balance of every exchange does not introduce a relevant error, because such energy exchanges take place very rapidly. Therefore we consider the system at its steady state. In this way we can avoid external influencing effects and concentrate our attention on the irradiating effect of an object as we get closer to it. Taking into account the base formula of irradiating heat described in [2], we express the irradiating heat quantity generated by each heat source as:

q_i = \sigma A_i (T_{skin}^4 - T_i^4)    (1)
where qi is the irradiated heat quantity generated by an object, σ is the Stefan-Boltzmann constant, Ai the exposed surface of the irradiating object, Tskin the initial temperature of

the user’s skin and Ti the surface temperature of the irradiating object. The user’s skin
will absorb only a part of the irradiated heat, because of its grey body characteristic. We
can make a parallelism with the Coulomb attraction law of two electrical charges and
therefore assume that the quantity is inversely proportional to the distance of the desired
point with the irradiating surface. We therefore introduce the reflection coefficient and
distance:
q_i = \frac{\sigma A_i (T_{skin}^4 - T_i^4)}{2 \pi d^2 \left( \frac{1}{a_{Material}} + \frac{1}{a_{Skin}} - 1 \right)}    (2)
where d is the distance between the interest point and the heat source, aMaterial is the absorption coefficient of the object material, and aSkin is the absorption coefficient of the user's skin; the sum of their reciprocals minus 1 constitutes the reflection coefficient. From equation 2 we can say that each heat source can be summarised as a function of the distance to the interest point, the surface of the object, the surface temperature, and the material of the heat source. At this point we can say that the total heat quantity of all contributions on the user's skin is:
on the user’s skin is:
n
Q = ∑ qi (di , Ti ) (3)
i=1

If we consider the heat quantity perceived by the user's skin, we have to introduce the thermal capacity of the skin itself and the mass of the user's finger:

Q = c\, m\, \Delta T    (4)
where c is the thermal capacity of the user's skin and m is the mass of the finger. Writing equation 4 in terms of temperature, we may express the temperature over the skin as:

T_{desired} = \frac{Q}{c\,m} - T_{skin}    (5)
At every instant, equation 5 gives us the final temperature of a desired point over the user's skin. This allows us to model a virtual environment where objects can have irradiating thermal characteristics, which could take the user to a new level of interaction with the surroundings and allows more involving experiences.

2.2 Position and Heat Source Orientation

Each heat source will start influencing the desired point at a certain distance and will stop its contribution outside what we define as the influence area, shown in Figure 1.
Another important aspect of heat sources is how the considered irradiating objects are positioned with respect to the desired interest point. Assuming all of the heat sources are diffuse reflectors, as most grey bodies are, we are interested in the angle between the interest point and the physically exposed irradiating part of the heat source. For instance, a desired point is influenced by a spherical irradiating heat source wherever the interest point is positioned with respect to the sphere; on the other hand, if we consider a cube and assign a thermal irradiating characteristic to only one face, the interest point will be influenced only within a certain angle between the irradiating side

of the object and the interest point. More precisely, following Lambert's cosine law, we introduce the angle α between the centre of the interest point and the normal to the irradiating side of the irradiating object. We can therefore include this consideration in equation 2 as follows:
q_i = \begin{cases} \dfrac{\sigma A_i (T_{skin}^4 - T_i^4)}{2 \pi d^2 \left( \frac{1}{a_{Material}} + \frac{1}{a_{Skin}} - 1 \right)} \cos\alpha, & \text{if } lower_{limit} < \alpha < upper_{limit} \\ 0, & \text{otherwise} \end{cases}    (6)

where lowerlimit and upperlimit represent the maximum irradiating angle range of the surface. In Figure 1 it is clearly visible that this depends on the surface conformation of the irradiating side of the object considered. If, for example, we are behind the irradiating direction, we will not perceive the influence of that particular object. In this way we implement an attenuation of the heat flux as the interest point moves away from the normal of the considered surface. To avoid further complications, aspects that at this stage we choose not to consider are the presence of other objects between the irradiating object and the interest point, and the influence heat sources have on one another.

Fig. 1. Interest point receives influences if it is in the range of the irradiating side. Therefore the
influence qi is not equal to 0.
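The angular gating of equation 6 can then be layered on top of the per-source quantity from equation 2, roughly as in the sketch below; the helper for computing α from unit vectors is an added convenience, and the limit parameters are assumed inputs.

#include <cmath>

// Equation 6: attenuate a per-source contribution q_i by Lambert's cosine law when the
// angle alpha lies inside the surface's irradiating range, and drop it otherwise.
double orientedQuantity(double qi, double alpha, double lowerLimit, double upperLimit) {
    if (alpha > lowerLimit && alpha < upperLimit)
        return qi * std::cos(alpha);
    return 0.0;                       // the interest point is outside the irradiating range
}

// Alpha from the surface normal and the direction toward the interest point (unit vectors).
double angleToInterestPoint(double nx, double ny, double nz,
                            double dx, double dy, double dz) {
    double c = nx * dx + ny * dy + nz * dz;
    if (c > 1.0) c = 1.0; else if (c < -1.0) c = -1.0;
    return std::acos(c);
}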

2.3 Materials

Different materials exchange different quantities of energy with the environment. We can perceive a temperature difference from an object of one material at a greater distance than from an object of another material, even though the temperature is the same for both. This is due to the different emissive power of each material, which in equation 2 we have called the reflection coefficients. Table 1 shows some coefficients which have been used.

Table 1. List of emissivity coefficients of some materials

Material Coefficient
Plastic 0.91
Wood 0.91
Iron 0.0014
Copper 0.03
Glass 0.89

After this consideration, it is clear that, at the same distance, an object made of a low-emissivity material can be perceived less strongly than one made of another material, even if its temperature is significantly higher.

2.4 Virtual Environment


To test the irradiating algorithm, we prepared a virtual environment setup. VR pro-
jection is obtained with two Christie Mirage S+ 4000 projectors, synchronised with
StereoGraphics CrystalEyes active shutter glasses. We use a 4x2 m2 powerwall, and an
Intersense IS-900 inertial-ultrasonic motion tracking system to sensorize the area in
front of the screen; in this way the user’s head is always tracked. Finger tracking is also
achieved with a set of 12 Optitrack FLEX:100 infrared cameras; since just one passive
marker is set on the finger, this is not a critical task. The main 3D application in Figure 2 was developed with VRMedia XVR1, which handles graphics, scene behavior, and input/output data handling. Devices and software exchange data with the main application through XVR internal modules written in C++.

2.5 Hardware Device


The aim of this work is to find the temperature at an interest point due to an irradiating heat source and to give the user a thermal perception. To do this we can use cutaneous thermal devices, which are quite common and have been used for the last twenty years [3]; most of them are based on the Peltier effect to give the user the thermal feedback. The problem of heat dissipation is to guarantee a good cooling-down phase: this means that cumbersome heat sinks need to be used. In [4] Yang et al. developed an interesting device to give surface discrimination and thermal feedback at the same point. The technique used to dissipate heat was liquid cooling, which allows fast temperature variations. We decided to follow a different strategy, developing a simple and low cost device, wearable but at the same time with a temperature variation rate that is limited compared to other solutions. The device is composed of a Peltier cell that generates the warm/cold sensation. This thermo cell is attached to a small piece of copper that allows an analog thermal sensor to close the loop around the generated temperature. The upper side of the piece of copper is placed in contact with the user's skin to generate the thermal sensation. An LM3S1968 evaluation board by Luminary Micro2 generates PWM signals to control, via a PID controller, the H-bridge connected to the thermo cell, driving the current in both directions. This allows the temperature to be controlled both up and down. Two 10x10x10 mm heat sinks and a DC fan give us the possibility to cool down the hot face of the Peltier cell faster. The system in Figure 2 is capable of increasing temperature at 10.5 ◦C/sec and of cooling down at 4.5 ◦C/sec. The system can perform temperature variations in the range 15 ◦C − 75 ◦C.
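Purely as an illustration of the control scheme described above (a PID loop closing the temperature feedback and producing a signed PWM duty for the H-bridge), a firmware-style sketch might look like the following; the gains, the clamp, and the names are invented for the example and do not come from the authors' board code.

// Signed duty in [-1, 1]: magnitude sets the PWM width, sign sets the H-bridge direction.
struct PidController {
    double kp, ki, kd;          // gains, assumed to be tuned per device
    double integral  = 0.0;
    double prevError = 0.0;

    double update(double targetTempC, double measuredTempC, double dt) {
        double error      = targetTempC - measuredTempC;
        integral         += error * dt;
        double derivative = (error - prevError) / dt;
        prevError         = error;
        double duty = kp * error + ki * integral + kd * derivative;
        if (duty >  1.0) duty =  1.0;
        if (duty < -1.0) duty = -1.0;   // negative duty reverses the current (cooling)
        return duty;
    }
};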

3 Experiments and Data Analysis


To understand in a more analytical way the algorithm implementation, we carried out
two different experiments to test the algorithm and the hardware proposed.
1 http://www.vrmedia.it/
2 http://www.luminarymicro.com

Fig. 2. On the left the testing thermal device. On the right the Virtual Environment setup and heat
sources representation.

Ten dexterous participants, with a mean age of 27.6±4 years, were selected, with no known thermal diseases and with low usage of their hands during their normal daily activities. The experimental setup is as previously described: the finger is tracked by the Optitrack cameras and a custom actuator generates the required temperature variation.

3.1 Reaching a High Temperature Object


In the first experimental session we asked the user to start 1.5 meters away from the
screen and slowly move with his finger toward a 3D projected sphere, stopping when
feeling thermal pain. The sphere, with a radius of 0.1 m, could have one of three different temperatures, 318 K (45 ◦C), 338 K (65 ◦C), or 378 K (105 ◦C), randomly distributed among the ten participants, each having three trials. Before each trial, the user was asked to wait for 10 seconds while the thermal actuator was cooled down to avoid thermal adaptation.
We wanted to analyse if this irradiating algorithm could be used as an alarm or not.
The first temperature threshold, 318 K (45 ◦C), is the limit observed for human thermal pain [7]. The other two temperature thresholds were chosen to represent an object that is really hot and that the user should not touch. From the results shown in Figure 3 and
Figure 4 we can see that, as expected, the minimal distance between the tracked finger
and the center of the sphere is inversely proportional to the object temperature. In the
318 K (45 ◦C) case most of the users touched the object before feeling any disturbance.

Fig. 3. Graph of the minimal distance between the tracked finger and the hot sphere

Fig. 4. Top view of the sphere object and the tracked position for each temperature threshold

With 338 K (65 ◦C) and 378 K (105 ◦C), the user feels the irradiated energy before touching the sphere, and the heat felt is inversely proportional to the distance to the sphere.

3.2 Discovering a Heat Source

The second experiment consists of discovering which face of a 3D die is irradiating heat. Every face was numbered like a real die, as explained in Figure 6. The experiment was divided into two phases to better understand the limits we can encounter when modelling a virtual environment with regard to object dimensions. In the first part of the experiment the die side was 0.2 m and in the second 0.1 m. For each participant the total number of trials was 6, and the irradiating face had been randomly assigned in advance for each trial. The correctness percentage of the answers differs between the two dice. As expected, it is easier to discover which is the irradiating heat source with the big die than with the smaller one. In Figure 5 we can observe the correctness percentage for the two dice.

Fig. 5. Charts of the correctness percentage of the dice with two different side sizes with orienta-
tion as in Figure 6

Analysing the average time spent on each face in Table 2, we discovered that for a smaller object the discovery phase is faster. This is probably due to greater flexibility in the user's movements around small objects, but it corresponds to a lower percentage of correctness.

Table 2. Comparison of the average time spent by the user over each face of the two dice

                                  Face 1   Face 2   Face 3   Face 4   Face 5   Face 6
Big Dice (sec)                     10.05    11.92    88.74    83.21    51.9     60.6
Small Dice (sec)                   88.7     72.71   119.5     71.16    55.09    80.3
Correct Answer Big Dice (sec)      67.79    76.56    76.07    69.84    41.97    56.09
Correct Answer Small Dice (sec)    81.33    15.82    27.21    62.27    30.27    40.33

Fig. 6. The die orientation with respect to the user's point of view and the numbering of the faces

The results obtained underline that the irradiating algorithm works as expected. Users avoid touching objects with high temperatures, and they can discover which is the heat-generating side of a simple cube. This justifies a deeper analysis and a wider, more comprehensive study, because it could mean that thermal feedback could be used in operations like environment mapping, or to describe a virtual environment in a more realistic way.

4 Testing Application
The implementation in Figure 2 consists of the placement of up to six irradiating spheres with different temperature values in the range 10 ◦C − 75 ◦C. Spheres were chosen in order to keep the system as simple as possible at this stage. The user can freely move around the virtual scene and feel the temperature on the skin of the finger. Table 3 shows the parameters used for the application.

Table 3. List of parameters used in the testing application

Parameter                              Value
Stefan-Boltzmann constant              5.67 × 10^-8 W m^-2 K^-4
Skin Thermal Capacity                  418.6 J/(kg K)
Finger Mass                            0.01 kg
Skin Thermal Emissivity coefficient    0.85
Sphere Radius                          0.05 m
Skin Temperature                       310 K

These are commonly used parameters in thermodynamics, and the finger mass is derived from the average weight of a hand [9]. The implementation suggests that the algorithm works well with this kind of object, producing different temperatures according to the control variables: surface temperature, surface exposure, distance, and material. We tried different combinations of these variables and observed that the implemented algorithm behaves as expected.

Fig. 7. On the left a real case study of the measurement of temperature values of a real lamp, on
the right a virtual representation of a bulb lamp

The algorithm takes into account hot and cold surfaces. If the user gets within the influence area of an object with a temperature higher than his skin temperature, the user will feel an increase in temperature. In addition, if he gets closer to a lower-temperature irradiating object he could feel a temperature decrease. This is due to the heat quantity exchanged by the object with respect to the user's skin; we know indeed that heat flows from the hot body to the cold body.
To test a real case, we have implemented a 40 W filament bulb lamp. Rather than considering the temperature of the filament itself (we can assume it is around 2000 K), we take into account the external temperature of the glass bulb. In this case we assumed that the temperature is around 423 K (150 ◦C). The expected behaviour was the same as when interacting with a sphere. The virtual representation of the lamp is made with an Autodesk 3D Studio3 mesh imported into XVR. The radius value is the same as that of the previously used sphere and the material used in this case is glass. In Figure 7, the virtual model is compared with a real 40 W bulb lamp and the output value of a thermal sensor is compared to the output values of the algorithm.

5 Conclusion

With this work we defined and tested an algorithm to model the thermal characteristics of irradiating objects for virtual environments. Physical characteristics such as the surface and material of the considered irradiating object have been taken into account. Custom electronics have been created to test the overall system. Two different concepts, reaching and discovering, have been successfully studied using the algorithm. The results show that users do not touch hot objects when feeling a temperature disturbance. They could also identify the correct irradiating face of a die with a high percentage of correctness. We then implemented the algorithm in a basic virtual application. Future work is in the direction of improving the algorithm, implementing other characteristics to better represent the heat exchange between heat sources and an interest point. Natural and forced heat convection are important topics to be implemented in order to create much more realistic environments. Regarding the custom electronics used, they need to be improved in
3 http://www.autodesk.com

response speed, current consumption, and physical dimensions, and to be integrated into a multimodal interaction system. The proposed work could be the basis of psychophys-
ical studies about interaction between humans and their environment, where thermal
characteristics could help the user better understand the environment in which he/she is
operating.

References
1. Benali-Khoudja, M., Hafez, M., Alexandre, J.M., Benachour, J., Kheddar, A.: Thermal
feedback model for virtual reality. In: International Symposium on Micromechatronics and
Human Science. IEEE, Los Alamitos (2003)
2. Bonacina, C., Cavallini, A., Mattarolo, L.: Trasmissione del calore. CLEUP, Via delle
Fontane 44 r., Genova, Italy (1989)
3. Caldwell, D., Gosney, C.: Enhanced tactile feedback (tele-taction) using a multi-functional
sensory system. In: Robotics and Automation Conference. IEEE, Los Alamitos (1993)
4. Yang, G.-H., Ki-Uk Kyung, M.S., Kwon, D.S.: Development of quantitative tactile display
device to provide both pin-array-type tactile feedback and thermal feedback. In: Second Joint
EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and
Teleoperator Systems. IEEE, Los Alamitos (2007)
5. Guiatni, M., Kheddar, A.: Theoretical and experimental study of a heat transfer model for
thermal feedback in virtual environments. In: International Conference on Intelligent Robots
and Systems. IEEE, Los Alamitos (2008)
6. Ho, H.-N., Jones, L.: Contribution of thermal cues to material discrimination and localiza-
tion. Percept Psychophys 68, 118–128 (2006)
7. Kandel, E., Schwartz, J., Jessell, T.: Principles of Neural Science. McGraw Hill, New York
(2000)
8. Lin, M.C., Otaduy, M.A.: Haptic Rendering: Foundations, Algorithms and Applications. A.K.
Peters, Ltd., Wellesly, Massachusetts (2008)
9. Winter, D.A.: Biomechanics and motor control of human movement, 3rd edn. John Wiley
and Sons, Chichester (2004) (incorporated)
10. Yunus, C., Boles, M.A.: Thermodynamics: An Engineering Approach Sixth Edition (SI
Units). McGraw-Hill Higher Education, New York (2009)
Providing Immersive Virtual Experience with
First-Person Perspective Omnidirectional
Movies and Three Dimensional Sound Field

Kazuaki Kondo1, Yasuhiro Mukaigawa2, Yusuke Ikeda3, Seigo Enomoto3, Shiro Ise4, Satoshi Nakamura3, and Yasushi Yagi2
1 Academic Center for Computing and Media Studies, Kyoto University, Yoshida honmachi, Sakyo-ku, Kyoto, Japan
2 The Institute of Scientific and Industrial Research, Osaka University, 8–1 Mihogaoka, Ibaraki-shi, Osaka, Japan
3 Spoken Language Communication Group, National Institute of Information and Communications Technology, 3–5 Hikaridai, Keihanna Science City, Japan
4 Graduate School of Engineering, Department of Architecture and Architectural Engineering, Kyoto University, Kyotodaigaku-katsura, Nishikyo-ku, Kyoto, Japan
kondo@ccm.media.kyoto-u.ac.jp, {mukaigaw,yagi}@am.sanken.osaka-u.ac.jp,
{yusuke.ikeda,seigo.enomoto,satoshi.nakamura}@nict.go.jp,
ise@archi.kyoto-u.ac.jp

Abstract. Providing a highly immersive feeling to audiences has advanced along with the progress of video and acoustic media technologies. In our proposal, we record and reproduce omnidirectional movies captured from the perspective of an actor together with the three dimensional sound field around him, and try to provide a more impressive feeling. We propose a sequence of techniques to achieve this, including recording equipment, video and acoustic processing, and a presentation system. The effectiveness of and demand for our system have been demonstrated through evaluation experiments with ordinary people.

Keywords: First-person Perspective, Omnidirectional Vision, Three Dimensional Sound Reproduction, Boundary Surface Control Principle.

1 Introduction

Highly realistic scene reproduction provides audiences with rich virtual experiences that can be used for sensory simulators and multimedia entertainment. For example, current cinemas employ advanced capturing methods, video processing, audio filtering, and presentation systems to give audiences the immersive feeling of being in the target scene. The most important issue for providing such a feeling is to capture and present target scenes as they were. In this paper, we focus on the following three functions for that purpose.


Preserving observation perspective: observation perspective can be categorized into third-party perspective and first-person perspective. The former corresponds to objectively capturing a scene, which can effectively convey the structure of the scene and the story line. The latter perspective can be captured by a recording device placed at a character's position, which is good at providing an immersive feeling. Examples are attaching a compact video camera to one's head, or actors/actresses performing as if a video camera were a person.

Preserving wide range (omnidirectional) visual feeling: A wide range video boosts the realistic feeling. A panoramic video on a wide screen is a typical approach, but it is not always enough, because it does not consider temporal changes and individual differences in the audience's observation direction. We focus on capturing and displaying omnidirectional videos in order to adapt to these situations. Although reconstruction of a 3D scene is also an effective approach, here we treat only the omnidirectional property instead of the combination of the two.

Preserving 3D acoustic feeling: Audio reality strongly depends on the directions and distances from which sounds come. Thus it is important to reproduce the 3D sound field, including the positions of sound sources. Usual approaches use stereo or 5.1 ch systems, but these reproduce the sound field correctly only for a specific position and direction. We focus on relaxing those listening limitations, just as we relax the viewing limitations discussed for the visual feeling.

Although these functions have been individually addressed in conventional approaches, we have not found any total system that covers all of them. In this paper, we design a special recording device, discuss media processing, and develop a presentation system in order to satisfy the three functions.

2 Recording System
2.1 Wearable Omnidirectional Camera
We here assume three requirements that a video recording device should satisfy.
- It can capture high resolution and uniform omnidirectional videos.
- Its optical center and the viewpoint of the wearer are at the same position.
- It is easy to wear and allows the wearer to act for a sufficiently long time.
Unfortunately, conventional approaches to capturing outdoor scenes as omnidirectional videos [5,7] do not satisfy all of the above requirements. They did not consider capturing a scene from a character's viewpoint, and only approximate the viewpoint matching with omnidirectional cameras mounted on the head. Furthermore, the need for additional equipment for recording and power supply violates the third requirement. A wearable omnidirectional camera has been proposed [9] for life-log recording, but it also has the viewpoint mismatch and, additionally, a low resolution problem. For these reasons, we previously proposed a special wearable camera system named FIPPO [10]. FIPPO is constructed from four optical units, each consisting of a handy type video camera and curved and flat mirrors (Fig. 1(a)). It captures omnidirectional videos from a first-person perspective without any additional equipment or wired power supply. The following description briefly explains the design of a single optical unit of FIPPO.

We start at an objective projection defined by correspondences between pixels on the image plane and rays running in the scene. Considering uniform resolu-
tion of the panoramic scene whose FOV are [θmin , θmax ] along azimuth angle
and [tanφmin , tanφmax ] along elevation, respectively, the objective projection is
formulated as
V_s(u, v) = \begin{bmatrix} \tan\left( \frac{u}{U} (\theta_{max} - \theta_{min}) + \theta_{min} \right) \\ \frac{v}{V} (\tan\phi_{max} - \tan\phi_{min}) + \tan\phi_{min} \\ 1 \end{bmatrix}    (1)
in the world coordinate system. (u, v) is the position of a pixel on the image
plane, whose size is U × V. It also determines a corresponding camera projection Vc(u, v) = [u, v, −f]^t with focal length f. Vs and Vc should be related by the target curved mirror and its reflection; the objective normal vector field Nd(u, v)
bisects the angle consisting of Vs and Vc . It is obtained by
 
N_d = N\left[ \frac{1}{2} \left( N[V_s] + N[R V_c] \right) \right]    (2)

with a vector normalizing operator N[x] = \frac{x}{\|x\|}, and external parameters of the
camera P = [R t]. The mirror shape is formed so that its normal field is equal
to Nd. We use the linear algorithm [6], which expresses the mirror shape S(u, v) as the cross products of four-degree spline curves. S(u, v) is formulated as

S(u, v) = R V_c(u, v) \sum_{i,j} C_{ij} f_i(u) g_j(v) + t    (3)

where Cij and fi (u), gj (v) are control points on the spline curves and four-
degree spline bases, respectively. We obtain an optimal shape by solving the linear equations in C_{ij} that are stacks of \frac{\partial S}{\partial u} \cdot N_d = \frac{\partial S}{\partial v} \cdot N_d = 0, because the optimal shape should be perpendicular to the desired normal vector field N_d.
Although this algorithm certainly minimizes errors on the normal vector field,
it tends to form a bumpy surface. So we apply a smoothing procedure to the
shape formed by this algorithm.
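The first two steps of this design loop, the objective scene vector of Eq. (1) and the desired normal of Eq. (2), could be computed per pixel as in the sketch below; the camera extrinsics and FOV bounds are assumed inputs and the helper names are ours.

#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 normalize(const Vec3& v) {
    double n = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    return { v[0] / n, v[1] / n, v[2] / n };
}

// Equation 1: objective scene vector for pixel (u, v) of a U x V image, uniform in
// azimuth theta and in tan(phi) along elevation.
Vec3 sceneVector(double u, double v, double U, double V,
                 double thetaMin, double thetaMax, double tanPhiMin, double tanPhiMax) {
    return { std::tan(u / U * (thetaMax - thetaMin) + thetaMin),
             v / V * (tanPhiMax - tanPhiMin) + tanPhiMin,
             1.0 };
}

// Equation 2: desired mirror normal bisecting the scene vector Vs and the rotated
// camera vector R*Vc, where Vc = (u, v, -f)^t.
Vec3 desiredNormal(const Vec3& Vs, const Vec3& RVc) {
    Vec3 a = normalize(Vs);
    Vec3 b = normalize(RVc);
    return normalize({ 0.5 * (a[0] + b[0]), 0.5 * (a[1] + b[1]), 0.5 * (a[2] + b[2]) });
}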
The obtained mirror approximates the objective projection. Thus, we check
the degree of the approximation. It is evaluated by sufficiency: how much it covers the required FOV; redundancy: how much it covers outside the FOV; and uniformity: how uniformly it distributes the image. If the approximation
is sufficient, the design advances to the next step. If not, we adjust the camera
parameters to reduce projection errors and return to Eq. (2). Aberrations of the
designed optics need to be also checked because the mirror design algorithm does
not consider image focusing. The amount of aberration can be estimated with a
spot diagram, which is a spread image of a target object on the image plane. We
can construct spot diagrams by tracing the rays that go through an aperture of
the lens unit. If the aberrations appear to prevent image focusing, we adjust the
camera parameters to reduce the aberration and return to the first step. The
design process continues until the aberrations are acceptably small.

Fig. 1. Overview of the recording system: (a) FIPPO; (b) microphone array and a recorder

2.2 Microphone Array

Methods which have been used for recording and reproducing a first-person perspective sound field include the head and torso simulator (HATS) and recording with microphones worn on the listener's ears. However, in these methods, it is impossible for the listener to freely move his head, because the sound signal is reproduced at only two points around the ears. In this paper, we used a sound reproduction system based on the boundary surface control (BoSC) principle so that it gives the listener an experience of the sound field from the first person's perspective together with omnidirectional movies.
The original BoSC system [8] has a 70-channel microphone array. With it, it is difficult to make microphone-array-aided recordings of the first-person perspective sound field accompanied by free body movements. Therefore, we simplified the system by reducing the number of channels. The recording system has eight omnidirectional microphones which are installed horizontally around the head of the wearer. It is recommended that the height of the microphones be at the same level as the person's ears. The microphones are installed slightly over the top of the head in order to keep them away from the mirrors of FIPPO (Fig. 1(b)). The system is small enough and allows the person to move freely while wearing it. One of the factors contributing to the small size of the system is that the signal is recorded on a handheld PC through a small bus-powered USB A/D converter.

3 Media Processing for Making Contents Movie


3.1 Image Processing for Omnidirectional Panorama
Correcting Image Warping. Images captured by FIPPO still have some geometric warps, despite the uniform projection being configured as the objective one. Calibrating the distorted projections produced by the entire optical system, including the curved mirrors, allows the images to be corrected. The calibrations were conducted with a particular scene construction in order to associate image pixels with rays in the world. FIPPO, placed in front of a wide flat panel monitor, captures coded patterns that give correspondences between each pixel on the image plane and each 2D position on the monitor.


Fig. 2. A result of image unwarping (panoramic). (a)-(d) Input images for each direction: left, front, right, and back. (e) Unwarped and mosaiced image.

Measurements for planes at several depths are necessary for the pixel-ray correspondence. When mea-
surements are taken at two depths whose distances d are given, the pixel-ray
correspondences can be formulated by
ray(u, v) = \frac{p_1 - p_2}{d} = \begin{bmatrix} x_1(u, v) - x_2(u, v) \\ y_1(u, v) - y_2(u, v) \\ d \end{bmatrix}    (4)
where ray(u, v) = [rx , ry , rz ]t , and pi = [xi , yi ]t denote a ray in the world corre-
sponding to a point(u, v) on the image plane, and a 2D position on the monitor
plane, respectively. Figure 2 shows an example of correcting image warping based
on the calibration results. Eq. (4) says that directions of rays are determined,
but not their positions. Thus note that the calibration does not work well for near scenes, because FIPPO is designed to be approximated as a single viewpoint optical system.
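A small routine following Eq. (4) might look like the following; it takes the two monitor positions measured for one pixel at planes separated by the known distance d and returns a normalized ray direction (the function name is ours).

#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Equation 4: from the 2D monitor positions p1 = (x1, y1), p2 = (x2, y2) recovered for the
// same pixel at two calibration depths separated by d, obtain the ray direction for that pixel.
Vec3 pixelRay(double x1, double y1, double x2, double y2, double d) {
    Vec3 ray = { x1 - x2, y1 - y2, d };              // displacement across the depth offset
    double n = std::sqrt(ray[0] * ray[0] + ray[1] * ray[1] + ray[2] * ray[2]);
    return { ray[0] / n, ray[1] / n, ray[2] / n };   // only the direction is determined
}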

Correcting Color Space. It is also necessary to correct chromatic differences that are mainly attributable to individual differences between the cameras. We solved this problem by transforming color spaces under the assumption of an affine transformation between them. This sufficiently approximates the relationship between the color spaces produced by the same model of cameras used in FIPPO. The affine transform is given by
\begin{bmatrix} R_m \\ G_m \\ B_m \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} R_n \\ G_n \\ B_n \\ 1 \end{bmatrix}    (5)
where [Rk , Gk , Bk ]t and pij are RGB colors of the same object on the k-th cam-
era and coefficients of the affine transformation, respectively. Since Eq. (5) forms
three linear equations for one color correspondence, at least four color correspon-
dences are necessary to determine twelve unknowns in pij . Figure 3 shows the

Fig. 3. Chromatic correction. (a) Color checkers captured by different cameras. (b) Mosaiced images without the correction. (c) The same images with the correction.

Figure 3 shows the results of the chromatic correction. The images in the figure show a neighborhood of the image mosaic. The vertical line at the horizontal center corresponds to the border between two contiguous images. Blue components that were relatively strong take on natural coloring after the correction.
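Estimating the affine coefficients amounts to a small least-squares problem. The sketch below assumes that corresponding RGB triples have already been collected (for example from a color checker); the function names are illustrative and not the authors' code.

import numpy as np

def fit_affine_color_transform(colors_n, colors_m):
    # Fit the 3x4 matrix P of Eq. (5) mapping camera-n RGB to camera-m RGB.
    # colors_n, colors_m : (K, 3) arrays of corresponding colors, with K >= 4.
    K = colors_n.shape[0]
    A = np.hstack([colors_n, np.ones((K, 1))])        # homogeneous coordinates, (K, 4)
    P_t, *_ = np.linalg.lstsq(A, colors_m, rcond=None)
    return P_t.T                                      # (3, 4) affine matrix

def apply_affine_color_transform(P, image_n):
    # Map an (H, W, 3) image from camera n into the color space of camera m.
    h, w, _ = image_n.shape
    homog = np.concatenate([image_n.astype(float), np.ones((h, w, 1))], axis=-1)
    return homog @ P.T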

3.2 Reconstructing 3D Sound Field


Boundary Surface Control Principle. It follows from the Kirchhoff-Helmholtz integral equation that controlling the sound pressures and sound pressure gradients on the boundary of a region controls the sound pressure everywhere inside that boundary. The boundary surface control principle removes the requirement of ideal sound sources and the restriction to a free sound field by combining the Kirchhoff-Helmholtz integral equation with a multi-channel inverse system [2]. When the principle is applied to a 3D sound reproduction system, microphones are set at arbitrarily chosen points within the 3D sound field, and by reproducing the sound pressures recorded at those points in a different location, it becomes possible to accurately reproduce the sound field of the area enclosed by the microphones. In this respect it differs from common transaural and binaural systems. In the BoSC system, a listener can move the body freely while listening to a sound field that is consistent with the original one.

Design Method of Inverse System. Here, the loudspeakers that control the sound pressures and the points at which the sound pressure is controlled are referred to as “secondary sound sources” and “control points,” respectively. The numbers of secondary sources and control points are denoted by M and N, respectively. The frequency transfer characteristic between the i-th secondary source and the j-th control point is denoted by G_ji(ω). The signal recorded in the primary sound field, the output signal of a secondary source, and the signal measured at a control point are denoted by X_j(ω), Y_i(ω), and Z_j(ω), respectively. The relationship between the inputs and outputs of the sound reproduction system is as follows:

Z(ω) = [G(ω)]Y(ω) = [G(ω)][H(ω)]X(ω) (6)

where X(ω) = [X_1(ω), ..., X_N(ω)]^T, Y(ω) = [Y_1(ω), ..., Y_M(ω)]^T, Z(ω) = [Z_1(ω), ..., Z_N(ω)]^T,

$$[G(\omega)] = \begin{bmatrix} G_{11}(\omega) & \cdots & G_{1M}(\omega) \\ \vdots & \ddots & \vdots \\ G_{N1}(\omega) & \cdots & G_{NM}(\omega) \end{bmatrix} \quad \text{and} \quad [H(\omega)] = \begin{bmatrix} H_{11}(\omega) & \cdots & H_{1N}(\omega) \\ \vdots & \ddots & \vdots \\ H_{M1}(\omega) & \cdots & H_{MN}(\omega) \end{bmatrix}.$$

The purpose of the inverse filter design in the sound reproduction system is to find the inverse filter [H(ω)] of [G(ω)]. When a small error in X(ω) or a variation of the system transfer function [G(ω)] strongly affects the value of Z(ω), the inverse filter [H(ω)] becomes unstable. We therefore designed the inverse filter using a regularization whose parameter can be varied continuously to ease this instability.
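A common way to realize such a regularized inverse is a Tikhonov-style pseudo-inverse computed per frequency bin. The sketch below is our illustration of that idea; the paper does not give the exact regularization formula, so the expression and names here should not be read as the authors' implementation.

import numpy as np

def regularized_inverse_filter(G, beta):
    # G    : complex array of shape (F, N, M); G[f] maps M loudspeaker signals
    #        to N control-point pressures at frequency bin f.
    # beta : regularization parameter that eases the instability of the inversion.
    # Returns H of shape (F, M, N), a damped pseudo-inverse of G at each bin.
    F, N, M = G.shape
    H = np.empty((F, M, N), dtype=complex)
    eye = np.eye(M)
    for f in range(F):
        Gf = G[f]
        Gh = Gf.conj().T
        # (G^H G + beta I)^(-1) G^H damps ill-conditioned frequency bins
        H[f] = np.linalg.solve(Gh @ Gf + beta * eye, Gh)
    return H

A larger beta makes the filters more robust to measurement errors and transfer-function variations at the cost of reproduction accuracy, which is the trade-off the continuously adjustable parameter is meant to control.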

4 Presentation System: Omnidirectional Theater


Omnidirectional movies should be displayed all around the viewers with a wide FOV in order to provide an immersive feeling. Researchers have proposed omnidirectional display systems for this purpose, which can be categorized into personal equipment [3] and dome- or room-type systems for multiple persons [1]. We developed the latter type of omnidirectional theater to emphasize that multiple audience members share the same feeling. The theater consists of four projectors and four 3 m × 2 m flat screens standing like the walls of a square room (Fig. 4).
Eight loudspeakers surrounding a listener reproduce a recorded sound field based on the BoSC principle. Two loudspeakers are set behind each screen, which is acoustically perforated. We measured the impulse responses between each loudspeaker and a microphone array set inside the theater with the same arrangement as the microphone array used for recording, and calculated inverse filters with a length of 4096 points. In order to simplify the calculation of the inverse system, acoustic panels and carpets are installed on the ceiling and the floor of the theater, respectively. The sound field inside the microphone array is reproduced by driving the loudspeakers with the recorded signals convolved with the calculated inverse filters. The sound field of a larger region is expected to be reproduced so that it is the same as the original sound field [4]. It is also expected that the sound field is reproduced more accurately because the inverse filter compensates not only the

Fig. 4. Omnidirectional theater. (a) Omnidirectional video display system. (b) 3D sound reproduction system. (c) Interior of the theater.

Table 1. Contents of the questionnaire

Questions about realistic sensation (five-grade scales):
A. Did you get an immersive feeling, as if you were in the scene?
   1. Not at all  2. Not much  3. As usual  4. Fairly  5. Much
B. How much reality did you feel compared with a single front movie?
   1. Not at all  2. Not much  3. As usual  4. Fairly  5. Much
C. How was the image quality?
   1. Bad  2. Not good  3. Normal  4. Good  5. Great
D. How much reality did you feel compared with stereo sound?
   1. Not at all  2. Not much  3. As usual  4. Fairly  5. Much
E. How was the audio quality?
   1. Bad  2. Not good  3. Normal  4. Good  5. Great
Question with a free-form space:
F. What additional features are necessary for the current system?

attenuation caused by the sound screens but also the acoustic characteristics of the theater.
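The reproduction stage described above (convolving the recorded channels with the inverse filters and summing per loudspeaker) can be sketched as follows; the array shapes and function names are illustrative assumptions, not the authors' code.

import numpy as np
from scipy.signal import fftconvolve

def render_loudspeaker_signals(recorded, inverse_filters):
    # recorded        : (N, L) array, N recorded microphone channels of length L.
    # inverse_filters : (M, N, K) array of time-domain inverse filters of length K
    #                   (e.g. K = 4096 as in the text) from channel j to speaker i.
    # Returns an (M, L + K - 1) array of loudspeaker drive signals.
    N, L = recorded.shape
    M, _, K = inverse_filters.shape
    out = np.zeros((M, L + K - 1))
    for i in range(M):
        for j in range(N):
            out[i] += fftconvolve(inverse_filters[i, j], recorded[j])
    return out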

5 Experiment
5.1 Configurations
We validated our scene reproduction proposal from the viewpoint of immersive feeling through a user experiment. The presented first-person perspective contents were (1) daily scenes in a park, including fall foliage, water currents, and other persons, and (2) basketball game scenes such as the one shown in Fig. 2.
The experiment was conducted for groups of 3-5 persons at an outreach event held at the National Museum of Emerging Science and Innovation in Tokyo. Each group experienced the 3-minute video content consisting of the scenes described above. After that, we conducted a questionnaire whose items are listed in Table 1. It comprises five-grade scale questions and a question with a free-form answer space. The former are related to the realistic sensation that the viewers felt; the latter is intended to obtain opinions about the demands on first-person perspective omnidirectional movies and the issues that should be improved. We obtained about 750 valid responses from more than 1,100 subjects who experienced our system over three days. Since the subjects covered a wide age range and included groups such as couples, friends, and families, we can expect general and objective evaluations.

5.2 Results and Discussions


From the results shown in Fig. 5(a) we can see that most subjects felt highly realistic sensations, which demonstrates the effectiveness of first-person perspective omnidirectional movies. Unfortunately, the image quality received a low score.

Fig. 5. (a) Results of the five-grade scale questions (mean scores ranged from 2.49 to 3.97); 1 and 5 denote the lowest and highest scores, respectively. (b) Representative answers written in the free-form spaces: quality of the media (improve image quality, i.e., resolution and contrast); video quaking (feeling slightly sick, nauseous, or dizzy; film without head swinging); reproduction of spatial sense (desire for a correct sense of depth; recognizing the shapes of objects and scenes); construction of the screen (a cylindrical screen, a dome, eight flat screens, an additional overhead screen).

One reason is the optical construction of FIPPO. Since rays from a scene are reflected multiple times before being projected onto the image planes, the light quantity decreases with each reflection, resulting in low-quality images. The mirrors used in the prototype FIPPO are covered with a material of low reflectivity; this problem can be solved by using a highly reflective material. The other reason is reduced image contrast at the display stage, which has several causes, such as the output contrast of the projectors and inter-reflections between the screens.
The representative opinions written in the free-form space are listed in Fig. 5(b). Video quaking, i.e., image shakes and blurs caused by rapid ego-motion, was pointed out as a problem that should be solved. Some subjects said that they felt nauseous or dizzy. In a sense, our system truly reproduces a first-person perspective, including head swing, but this is actually undesirable when displayed to static viewers. Recording the head state with a gyro sensor, or ego-motion estimation algorithms that use the horizontal cyclic property of omnidirectional movies, will help with video stabilization. Some subjects said that a cylindrical screen should be used instead of the four flat screens, which give an incorrect sense of depth. A more fundamental approach is needed to provide a spatial sense, which is not considered in the proposed method: omnidirectional scenes would have to be spatially reconstructed, which requires special capturing equipment, and a three-dimensional display all around the viewers is also a challenging issue.

6 Conclusion
In this paper, we proposed a virtual experience system that provides a highly realistic feeling to audiences with omnidirectional videos and 3D sound captured from a first-person perspective. The content data are captured by specially designed wearable equipment consisting of catadioptric imaging systems and a microphone array. Audio and visual media processing for providing a highly realistic feeling and a presentation system were also discussed. The performance of the system was evaluated through the experiences of more than 1,000 ordinary visitors.

Along with the expected results regarding a highly realistic and immersive feeling, we identified several problems, such as video quaking, media quality, and the sense of depth given by the video. These problems are now being addressed by other proposals, and combining our system with them will provide more attractive virtual experiences.

Acknowledgment
This work was supported by the Special Coordination Funds for Promoting Science and Technology of the Ministry of Education, Culture, Sports, Science and Technology. The authors wish to thank SANYO Electric Co., Ltd. for providing the specially modified portable video cameras.

References
1. Cruz-Neira, C., Sandin, D.J., DeFanti, T.A.: Surround-Screen Projection-Based Virtual Reality: The Design and Implementation of the CAVE. In: Proc. of Int. Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH 1993), pp. 135-142 (1993)
2. Ise, S.: A principle of active control of sound based on the Kirchhoff-Helmholtz
integral equation and the inverse system theory. The Journal of Acoustical Society
of Japan 53(9), 706–713 (1997)
3. Hashimoto, W., Iwata, H.: Ensphered vision: Spherical immersive display using
convex mirror. Trans. of the Virtual Reality Society of Japan 4(3), 479–486 (1999)
4. Kaminuma, A., Ise, S., Shikano, K.: Sound reproduction-system design considering
head movement. Trans. of the Virtual Reality Society of Japan 5(3), 957–964 (2000)
(in Japanese)
5. Yamazawa, K., Takemura, H., Yokoya, N.: Telepresence system with an omnidirec-
tional HD camera. In: Proc. of Fifth Asian Conference on Computer Vision (ACCV
2002), vol. II, pp. 533–538 (2002)
6. Swaminathan, R., Nayar, S.K., Grossberg, M.D.: Designing of Mirrors for cata-
dioptric systems that minimize image error. In: Proc. of IEEE Workshop on Om-
nidirectional Vision, OMNIVIS (2004)
7. Ikeda, S., Sato, T., Kanbara, M., Yokoya, N.: Immersive telepresence system with
a locomotion interface using high-resolution omnidirectional videos. In: Proc. of
IAPR Conf. on Machine Vision Applications (MVA), pp. 602–605 (2005)
8. Enomoto, S., Ikeda, Y., Ise, S., Nakamura, S.: Three-dimensional sound field re-
production and recording system based on the boundary surface control principle.
In: The 14th Int. Conf. on Auditory Display, pp. o 16 (2008)
9. Azuma, H., Mukaigawa, Y., Yagi, Y.: Spatio-Temporal Lifelog Using a Wearable Compound Omnidirectional Sensor. In: Proc. of the Eighth Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras, OMNIVIS 2008 (2008)
10. Kondo, K., Mukaigawa, Y., Yagi, Y.: Wearable Imaging System for Capturing
Omnidirectional Movies from a First-person Perspective. In: Proc. of The 16th
ACM Symposium on Virtual Reality Software and Technology, VRST 2009 (2009)
Intercepting Virtual Ball in Immersive Virtual
Environment

Massimiliano Valente, Davide Sobrero, Andrea Brogni, and Darwin Caldwell

Advanced Robotics Dept. - Istituto Italiano di Tecnologia, Genoa, Italy


{massimiliano.valente,davide.sobrero,andrea.brogni,
darwin.caldwell}@iit.it

Abstract. Catching a flying ball is a difficult task that requires sensory systems
to calculate the precise trajectory of the ball to predict its movement, and the
motor systems to drive the hand in the right place at the right time.
In this paper we have analyzed the human performance in an intercepting task
performed in an immersive virtual environment and the possible improvement of
the performance by adding some feedback.
Virtual balls were launched from a distance of 11 m with 12 trajectories. The volunteers were equipped only with shutter glasses and one marker on the back of the hand, to avoid any constriction of natural movements. We ran the experiment in a natural scene, either without feedback or with acoustic feedback reporting a correct intercept. The analysis of performance shows a significant increase of successful trials in the feedback condition. The experimental results are better than those of similar experiments described in the literature, but performance is still lower than the results obtained in the real world.

Keywords: Virtual Reality, Ecological Validity, Interceptive Action.

1 Introduction

In real life, interaction with different objects is driven by a set of sensory stimuli, first
of all visual and haptic ones: if we lose some of them our performances decrease. When
we want to interact with a moving object, we need to know its physical characteristics
and its space displacement so that our movements toward the target are well directed.
In particular, catching a flying ball is a difficult task that requires our sensory system
to calculate the precise trajectory of the ball to predict its movement, and our motor
system to drive the hand in the right place at the right time.
Such a set of characteristics makes this task suitable for a good evaluation of human performance in terms of precision, immersion, and adaptation in virtual reality.
The aim of our study is to design a more natural way of interaction and, thus, to evaluate whether performance increases and the goal-directed movement is more accurate, and overall to evaluate human reactions in a virtual environment.
In our experiment, we expect that reaching for a flying ball in a natural way produces better performance with respect to the same task performed with hand-held devices. Thus, besides the performance results, we have also evaluated the characteristics of the hand movement, to verify the similarity of the trajectory with respect to the same task in a real environment.


The remainder of the paper is organized as follows: in Sec. 2 we outline some relevant related work in the literature; in Sec. 3 we describe the design of the experiment; in Sec. 4 we present the results obtained from the subjects performing the experiment; and in Sec. 5 we summarize the most relevant results and discuss some possible future work.

2 Related Works
Spatial perception is an important variable in our task; we must consider the potential causes of differences in perception between the real world and a virtual environment, such as graphic characteristics and the differences between natural vision and perception in a virtual environment: for more details, see Murgia and Sharkey [10]. In particular, distance perception is influenced by these perceptive differences. Several previous works report that an object's distance from the observer is underestimated more in a virtual environment than in a real one [1,14].
The literature on reaching for balls in the physical world is divided into three research areas: the outfielder problem, estimation of reaching, and catching fly balls.
The outfielder problem concerns the movement toward the position where the ball will land [5,9]; the outfielder problem was brought into a virtual environment by Fink et al. using an HMD [7]. Experiments on estimation of reaching concern indicating ball positions before or after the ball's passage [12]. Studies of catching fly balls analyze trajectory perception under binocular and monocular conditions [8], performance at several velocities [13], and performance with trajectories at several angles [4].
The literature on catching or intercepting balls in immersive virtual reality focuses on whole-body movement analysis of sports performance, both with HMDs [7] and in CAVEs [2,3,6].
Zaal and Michaels [15] studied the judgment of trajectories and intercepting movements for an interception task in a CAVE. They also analyzed interception performance, but their results show that the volunteers intercepted only 15% of the balls. They designed their experiment without any environment, and they captured the hand movements with a wand that the volunteers held. They used only throws aimed at the subjects, without different angles of approach.

3 The Experiment
We planned our experiment starting from a typical training task for baseball players: catching a ball thrown by an automatic system. We designed a simple virtual environment where the subject can perform a similar repetitive task.

3.1 System Setup

We carried out our experiment in an immersive virtual environment system: the room was equipped with two stereoscopic projectors (Christie Mirage S+3K) that display a wide scene on a screen 4 m long and 2 m high. The system is integrated with the IS900 from Intersense, a 6-DOF system for wide-area tracking, which we used to track the position of the subject's head.

We also used the Optitrack FLEX:V100r2 motion capture system, with infrared cameras, to track the subject's hand through a passive marker fixed on the back of his right hand. The graphical aspect of the experiment was implemented using the XVR1 framework for developing virtual reality applications. In addition, we used a physics simulation engine (PhysX2) to calculate the ball trajectory in real time as realistically as possible.

3.2 Procedure and Design


The virtual environment was composed of green grass with a wooden board 11 m away from the observer. On the board we placed three black points, 2 m apart, to indicate the different sources of the throws. A picture of the starting setup of the experiment is shown in Fig. 1.

Fig. 1. The initial setup: the ball is in front of the subject

At the beginning of the experiment, the volunteer stood in front of the screen with his feet on a blue line placed one meter away from the screen. A virtual softball, 10 cm in diameter, was placed between the screen and the subject, at the subject's chest height, to make the actual size of the ball clear.
During the experimental sessions, the subjects were asked to try to intercept the virtual softball with the right hand. In every trial, the ball was thrown with a constant initial speed of about 10 m/s from one of the three black points, chosen randomly. The ball could have three different elevations from the floor (high 38.5°, central 37°, and low 35.5°) and four different azimuth angles with respect to the subject, as listed in Table 1. We designed the trajectories through these parameters to obtain four classes of arrival area: L-balls for balls arriving on the left of the subject, C-balls for balls arriving in
1 www.vrmedia.it
2 www.nvidia.com

Table 1. Different azimuth angles for the trajectories

                 Azimuth
               L-balls   C-balls   R-balls   RR-balls
Left hole        9.3°     10.3°     11.3°     12.3°
Central hole    −1.0°      0.0°      1.0°      2.0°
Right hole     −11.3°    −10.3°     −9.3°     −8.3°

front of the subject, R-balls for balls arriving on the right, and RR-balls for the ones arriving on the extreme right, still reachable by the subject's hand. In Fig. 2 we show the idealized target positions.
With some volunteers we also tried trajectories arriving at the extreme left, but these throws turned out to be hard to intercept because the balls were out of range of the right hand.
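For illustration, the initial velocity vector of a throw can be derived from the speed, elevation, and azimuth parameters described above. The following sketch assumes a particular coordinate frame (stated in the comments), which is our choice and not necessarily the one used in the authors' application.

import math

def launch_velocity(speed, elevation_deg, azimuth_deg):
    # Assumed frame: +z points from the board toward the subject, +y is up,
    # +x is toward the subject's left.
    # speed         : initial ball speed, about 10 m/s in the experiment.
    # elevation_deg : elevation above the floor (35.5, 37, or 38.5 degrees).
    # azimuth_deg   : horizontal angle with respect to the subject (Table 1).
    el = math.radians(elevation_deg)
    az = math.radians(azimuth_deg)
    horizontal = speed * math.cos(el)
    return (horizontal * math.sin(az),   # lateral component
            speed * math.sin(el),        # vertical component
            horizontal * math.cos(az))   # component toward the subject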

Fig. 2. Idealized target positions are indicated by balls. The arrow shows the hand in the start position and the marker on the back of the hand.

The subject was asked to stand in a fixed position in front of the screen, with his feet on the blue line and his right hand on his navel, before the ball started, and to move the right hand to intercept the ball as quickly as possible once the ball started to move. To reach balls far from him, he could move his body, but not his feet.
The experiment was made up of five sessions: one training session with ten throws and four experimental sessions of ninety throws each, for a total of 370 trials per subject.

3.3 Participants

We performed the experiment with twenty volunteers (thirteen men and seven women) between 25 and 32 years of age, with heights between 1.65 m and 1.80 m. All the participants reported normal or corrected-to-normal vision. All the subjects were right-handed, as checked with the Edinburgh Handedness Inventory [11].
The volunteers reported no or little previous virtual reality experience. Five people reported experience in ball games such as basketball, volleyball, or tennis, and all the men reported experience in soccer (but not as goalkeepers).
The volunteers participated in this experiment after giving informed consent.
We divided the volunteers randomly into two groups of ten people. The first group did not receive any kind of feedback, visual or otherwise, for correctly intercepting the ball during the experiment (NoFeedback Group). The second group received sound feedback during the experiment, one sound for a correct catch and one for a miss (Feedback Group). Moreover, at the end of every session the subjects of this group were informed of their percentage of success.

4 Results

We recorded data from the head sensor for the head movements, the position of the hand during the ball throw, and the position of the ball during its flight.
We examined the subjects' catching behavior and their degree of immersion in the virtual environment. We analyzed the percentage of successful catches and compared the results of the NoFeedback Group with those of the Feedback Group.
In addition, we analyzed the hand movement characteristics (peak velocity, latency before the start of the movement, and time to catch) to verify whether our results are comparable with the same data found in real-world experiments reported in the literature.

4.1 Catching Performance

To assess the success of the catches, we calculated at each time step the distance between the marker placed on the subject's hand and the center of the virtual ball, while the ball was in front of the hand and not behind it. We considered a catch successful (a hit) when this distance was less than 10 cm. We calculated the global hit probability for both groups of subjects, the hit probability for all trajectories, and the hit probability for the 12 intercept areas. A summary of the performance for all intercept areas can be found in Table 2.
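As a minimal sketch of this hit criterion (the array layout and the front-of-hand test are our assumptions for illustration, not the authors' code):

import numpy as np

def detect_hit(hand_pos, ball_pos, ball_dir, threshold=0.10):
    # hand_pos : (T, 3) marker positions on the back of the hand, in meters.
    # ball_pos : (T, 3) positions of the center of the virtual ball.
    # ball_dir : (T, 3) unit vectors of the ball's flight direction, used here
    #            to check that the ball has not yet passed behind the hand.
    # Returns True if the hand came within `threshold` (10 cm) of the ball center
    # while the ball was still in front of the hand.
    rel = ball_pos - hand_pos
    dist = np.linalg.norm(rel, axis=1)
    in_front = np.einsum('ij,ij->i', rel, ball_dir) < 0   # ball has not passed the hand
    return bool(np.any((dist < threshold) & in_front))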
We can see that the performance without acoustic feedback is significantly lower than the performance helped by feedback. The average performance of the NoFeedback Group was 0.53 with a standard error of 0.014, while the performance of the Feedback Group was 0.61 with a standard error of 0.012 (p < 0.001).
The graphs in Fig. 3 show the mean and standard error for all types of throw. We can observe that the performance of the Feedback Group is always better than that of the NoFeedback Group for throws from the left hole (hit probability: 0.61 vs. 0.50) and the central

Table 2. Results (hit probability by elevation, group, and arrival area)

                       L               C               R               RR
                  Mean   SEM      Mean   SEM      Mean   SEM      Mean   SEM
High     Fb       0.557  0.045    0.637  0.041    0.651  0.035    0.581  0.053
         NoFb     0.488  0.044    0.599  0.047    0.568  0.049    0.489  0.039
Central  Fb       0.657  0.038    0.740  0.030    0.742  0.029    0.612  0.041
         NoFb     0.567  0.061    0.632  0.039    0.644  0.036    0.567  0.050
Low      Fb       0.443  0.041    0.586  0.034    0.569  0.036    0.520  0.036
         NoFb     0.392  0.055    0.409  0.058    0.536  0.045    0.468  0.044

hole (hit probability: 0.64 vs. 0.53). This difference is not present for throws from the right hole, where the NoFeedback Group's performance reaches that of the Feedback Group (hit probability: 0.57 vs. 0.57).

Fig. 3. Hit success percentage of all throws. (A) Throws from the left hole. (B) Throws from the central hole. (C) Throws from the right hole.

An ANOVA shows a significant effect of the elevation factor in both groups. For the NoFeedback Group the effect is significant between 38.5° and 35.5° (p = 0.017) and between 37° and 35.5° (p < 0.001). The difference is not significant between 38.5° and 37° (p = 0.08).

Fig. 4. Hit probability for different elevations

For the Feedback Group the effect is significant for all elevations (38.5° vs. 37°: p = 0.001; 38.5° vs. 35.5°: p = 0.003; 37° vs. 35.5°: p < 0.001).
The effect of feedback is shown in Fig. 4: this effect is significant for all elevations, always with p < 0.001.
The performance analysis reveals a good interpretation of the ball trajectories, but the interception performance is worse than the results obtained in a real environment, which report around 90% hits [8,13] versus 74% for our best mean performance. However, our hit probabilities are better than the performance reported by Zaal and Michaels [15] for the same type of throws in a virtual environment.

4.2 Hand Movement


Latency. We calculated the latencies from the instant at which the ball starts to the moment when the movement speed is about 5 m/s. The mean latency is 417 ms; this result is consistent with data acquired in the real task [8].
There are some differences in latency between the different types of throws. Latency times are inversely proportional to the distance between the start position and the probable impact point, and they are correlated with the left or right side of the catch. Post hoc analysis shows significant differences between the latencies for L-balls and R-balls. The mean latencies grouped by intercept area are shown in Fig. 5a.

Maximal Velocity Analysis. The mean maximal velocities for the various trajectories (Fig. 5b) mirror the latency graph: a shorter latency corresponds to a higher movement speed toward the left side.
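These kinematic measures can be derived from the tracked marker data along the lines of the following sketch; the speed threshold is taken from the text, while the array layout and function name are assumptions for illustration rather than the authors' implementation.

import numpy as np

def latency_and_peak_velocity(t, hand_pos, t_ball_start, speed_threshold=5.0):
    # t            : (T,) timestamps in seconds.
    # hand_pos     : (T, 3) hand marker positions in meters.
    # t_ball_start : time at which the ball was launched.
    # speed_threshold : speed (m/s) above which the hand counts as moving,
    #                   following the criterion described in the text.
    vel = np.gradient(hand_pos, t, axis=0)        # finite-difference velocity
    speed = np.linalg.norm(vel, axis=1)
    moving = np.where((t >= t_ball_start) & (speed >= speed_threshold))[0]
    latency = t[moving[0]] - t_ball_start if moving.size else float('nan')
    peak = speed[t >= t_ball_start].max()         # maximal hand speed after launch
    return latency, peak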

Heuristic Observations. We made two particular observations on the subjects' behavior during task execution.

Fig. 5. (A) Mean latencies of hand movement by area of intercept. (B) Maximal velocity of the hand by area of intercept.

We observed that many volunteers made a grasping movement of the hand on the virtual balls during the experiment. This behavior lasted about two or three sessions (including the training session); afterwards only a few people continued to close the hand.
Moreover, according to the instructions, the subjects had to keep their feet still during the experimental sessions, but they were given no instructions for the rest of the body. A few volunteers started immediately to move the body to reach the balls; most people started to move the body only after two or three sessions.

5 Conclusions
The experiment gave us positive results with respect to previous experiences: the possibility of performing a natural reaching movement without using any hand-held device, together with the ecological setup, raised the performance of the subjects beyond the results reported in the literature, in spite of the larger number of trajectories.

We have shown that giving acoustic feedback in real time increases performance, allowing a correction of eye-hand coordination and, probably, providing a reward element that is absent in the condition without feedback.
The fact that for throws from the right hole the feedback does not give any aid is probably due to the clearer visibility of these types of trajectories.
Nevertheless, we did not reach the performance expected in the real case: this may be caused by the perceptive problems mentioned in Sec. 2, or by the relative scarcity of the feedback provided.
We are working on different types of feedback and on combinations of more than one feedback at the same time, to verify whether the performance can be further improved.

References
1. Armbrüster, C., Wolter, M., Kuhlen, T., Spijkers, W., Fimm, B.: Depth perception in virtual reality: Distance estimations in peri- and extrapersonal space. CyberPsychology & Behavior 11(1), 9-15 (2008)
2. Bideau, B., Kulpa, R., Vignais, N., Brault, S., Multon, F., Craig, C.: Using virtual reality to
analyze sports performance. IEEE Computer Graphics and Applications 30(2), 14–21 (2010)
3. Bideau, B., Multon, F., Kulpa, R., Fradet, L., Arnaldi, B., Delamarche, P.: Using virtual
reality to analyze links between handball thrower kinematics and goalkeeper’s reactions.
Neuroscience Letters 372(1-2), 119–122 (2004)
4. Bockemühl, T., Troje, N.F., Dürr, V.: Inter-joint coupling and joint angle synergies of human catching movements. Human Movement Science 29(1), 73-93 (2010)
5. Chapman, S.: Catching a baseball. American Journal of Physics 36(10), 868–870 (1968)
6. Craig, C.M., Goulon, C., Berton, E., Rao, G., Fernandez, L., Bootsma, R.J.: Optic variables
used to judge future ball arrival position in expert and novice soccer players. Attention, Per-
ception, & Psychophysics 71(3), 515–522 (2009)
7. Fink, P.W., Foo, P.S., Warren, W.H.: Catching fly balls in virtual reality: A critical test of the
outfielder problem. Journal of Vision 9(13), 1–8 (2009)
8. Mazyn, L.I.N., Lenoir, M., Montagne, G., Savelsbergh, G.J.P.: The contribution of stereo
vision to one-handed catching. Experimental Brain Research 157(3), 383–390 (2004)
9. McLeod, P., Dienes, Z.: Do fielders know where to go to catch the ball or only how to get
there? Journal of Experimental Psychology: Human Perception and Performance 22(3), 531–
543 (1996)
10. Murgia, A., Sharkey, P.M.: Estimation of distances in virtual environments using size con-
stancy. The International Journal of Virtual Reality 8(1), 67–74 (2009)
11. Oldfield, R.C.: The assessment and analysis of handedness: The edinburgh inventory. Neu-
ropsychologia 9(1), 97–113 (1971)
12. Peper, L., Bootsma, R.J., Mestre, D.R., Bakker, F.C.: Catching balls: How to get the hand to
the right place at the right time. Journal of Experimental Psychology: Human Perception and
Performance 20(3), 591–612 (1994)
13. Tijtgat, P., Bennett, S., Savelsbergh, G., De Clercq, D., Lenoir, M.: Advance knowledge
effects on kinematics of one-handed catching. Experimental Brain Research 201(4), 875–
884 (2010), 10.1007/s00221-009-2102-0
14. Wann, J.P., Rushton, S., Mon-Williams, M.: Natural problems for stereoscopic depth percep-
tion in virtual environments. Vision Research 35(19), 2731–2736 (1995)
15. Zaal, F.T.J.M., Michaels, C.F.: The information for catching fly balls: Judging and inter-
cepting virtual balls in a cave. Journal of Experimental Psychology: Human Perception and
Performance 29(3), 537–555 (2003)
Concave-Convex Surface Perception by Visuo-vestibular
Stimuli for Five-Senses Theater

Tomohiro Amemiya1, Koichi Hirota2, and Yasushi Ikei3


1
NTT Communication Science Laboratories,
3-1 Morinosato Wakamiya, Atsugi-shi, Kanagawa 243-0198 Japan
2
Graduate School of Frontier Science, The University of Tokyo,
5-1-5 Kashiwanoha, Kashiwano-shi, Chiba 277-8563 Japan
3
Graduate School of System Design, Tokyo Metropolitan University,
6-6 Asahigaoka, Hino-shi, Tokyo 191-0065 Japan
amemiya@ieee.org, hirota@media.k.u-tokyo.ac.jp,
ikei@tmit.ac.jp

Abstract. The paper describes a pilot study of perceptual interactions among visual, vestibular, and tactile stimulations for enhancing the sense of presence
and naturalness for ultra-realistic sensations. In this study, we focused on
understanding the temporally and spatially optimized combination of visuo-
tactile-vestibular stimuli that would create concave-convex surface sensations.
We developed an experimental system to present synchronized visuo-vestibular
stimulation and evaluated the influence of various combinations of visual and
vestibular stimuli on the shape perception by body motion. The experimental
results urge us to add a tactile sensation to facilitate ultra-realistic
communication by changing the contact area between the human body and
motion chair.

Keywords: vestibular stimulation, ultra realistic, multimodal, tactile.

1 Introduction

With the progress in video technology and the recent spread of video presentation equipment, we can watch stereoscopic movies and large-screen high-definition videos not only in large amusement facilities but also in our private living rooms. The next step for enhancing the presence of audiovisual contents will be to add other sensory information, such as tactile, haptic, olfactory, or vestibular information. After SENSORAMA, a pioneering system in multisensory theater, a number of similar attractions have been developed for large amusement facilities. In order for a new technology to make its way into our living rooms, it is important to establish a methodology with the aim of not only faithfully reproducing the physical information, but also of optimizing it for human perception. If the sensory stimuli can be fully optimized, it is expected that a highly effective system can be developed with inexpensive, simple, and small equipment.


The authors have proposed Five-Senses Theater [1-3] to generate “ultra-realistic” sensations [4]. The “theater” we envision here would be widely available in living
rooms as “home theater” and offer an interactive framework rather than just a way to
experience contents. In this paper, we focus on motion sensation, which is one aspect
of Five-Senses Theater. We developed an experimental system to integrate visual and
vestibular sensory information and conducted a pilot study to investigate how to
effectively generate vestibular sensation with visual stimuli. We also present a tactile-
integrated prototype, which we plan to use in psychophysical experiments on
multisensory integration.

2 System Design

Sensory inputs involved in self-motion sensation are mainly visual, vestibular, and
somatosensory signals. In chair-like vehicles, such as driving simulators or theater
seats, we generally detect velocity information using visual cues and detect
acceleration and angular acceleration information using mechanical cues (vestibular
and tactile sensations), respectively.
A stationary observer often feels subjective movement of the body when viewing a
visual motion simulating a retinal optical flow generated by body movement. This
phenomenon is called vection [5-8].
Acceleration and angular acceleration are sensed by the otolith organs and
semicircular canals. These organs can be stimulated by mechanical (e.g., motion chair
[9,10]), electrical (e.g., galvanic vestibular stimulation [11]), and thermal means (e.g.,
caloric tests). Electrical stimulation can be achieved with a less expensive configuration than the others. However, it affects the anteroposterior and lateral directions differently, and there have been no reports that it affects the vertical direction. In addition, its effect is changed by the electrical impedance of the skin. In thermal stimulation, cold water is poured directly into the ear, which is not a suitable experimental stimulus for use with computer systems.
In this study, we chose a motorized motion chair (Kawada Industries, Inc.; Joy Chair-R1) to stimulate the vestibular system through the haptic modality. The motion chair
has two degrees of freedom (DOF) in roll and pitch rotations. To reproduce exact
physical information, a motion chair needs six degrees of freedom [12]. However,
such motion chairs tend to be expensive and large-scale. We constructed an
experimental system using a simple 2-DOF motion chair as an approximate
representation since size and cost are constrained in home use.
Figure 1 shows the configuration of the experimental system for generating visual
stimuli and controlling the motion chair. The motion chair and the visual stimulus are
controlled by different computers on a network with distributed processing, coded by
Matlab (The MathWorks, Inc.), Cogent Graphics Toolbox, and Psychophysics
Toolbox. Synchronization of the stimuli was performed over the network. Position
control was adopted to drive the motion chair. A voltage proportional to the desired
angle is applied by a microprocessor (Microchip Inc.; PIC18F252) and a 10-bit D/A
converter (MAXIM; MAX5141). A visual stimulus is presented on a 100-inch screen by a projector placed on the floor (NEC; WT600J).

Fig. 1. System configuration (screen, projector, two computers, control box, JoyChair-R1, and numeric keyboard)

Fig. 2. Perceptual threshold at which the participant could not notice the vibration noise of the motion chair, plotted as inclination angle [deg] (0.0-14.0) versus frequency [Hz] (0.0-2.0)

We need not only to drive the chair within its maximum rotation velocity but also to know the perceptual threshold of the vibration noise. Figure 2 shows the perceptual threshold for smooth motion. The threshold was determined by asking a naïve participant (a 24-year-old male) to control the level of the amplitude and frequency and to alter them until the vibration was not detectable (i.e., the method of adjustment). In addition, ten naïve male participants (described later) reported that they did not feel vibration noise under these criteria. The results show that the vibration noise could be perceptually ignored for the combinations of amplitude and frequency that were
below the line in Fig. 2. In the following experiment, we chose experimental parameters for driving the motion chair that meet this criterion for perceptual unawareness of the vibration noise.
We measured the delay between the onset of the visual stimuli and that of the motion-chair stimuli in advance. The delay can be reduced to one video frame (33 ms) by adjusting the onset of the visual stimuli. The computers on the network synchronize the timing between the visual and motion-chair stimuli.

3 User Study

Ten male participants, aged 19–33 years, participated in the experiments. We decided
to use males only because it has been reported that women experience motion
sickness more often than men [13,14]. None of the participants had any recollection of ever experiencing motion sickness, and all had normal or corrected-to-normal vision.
They had no known abnormalities of their vestibular and tactile sensory systems.
Informed consent was obtained from the naïve participants before the experiment
started. Recruitment of the participants and the experimental procedures were
approved by the NTT Communication Science Laboratories Research Ethics
Committee, and the procedures were conducted in accordance with the Declaration of
Helsinki.
Visual stimuli generated by Matlab with Cogent Graphics Toolbox were radial
expansions of 700 random dots. The distance between the participant and the screen
was 1.72 m. The size of each dot was 81.28 mm. The resolution was 1024×768 (XGA). Participants wore earmuffs (Peltor Optime II Ear Defenders; 3M, Minnesota, USA) to mask the sound of the motion chair.
In each trial, a stimulus was randomly selected from experimental conditions.
Subjects were seated in the motion chair with their body secured with a belt. They
were instructed to keep their heads on the headrest of the chair. Figure 3 shows the
experimental procedure. Subjects were instructed to watch the fixation point on the
screen during the trial. After five seconds, the stimuli were presented for 20 seconds.
The experimental task was to respond whether the shape they overran was a bump
(convex upward), a hole (concave) or a flat surface (plane) by pressing a key of a
numeric keyboard. Buttons were labelled ‘bump’, ‘hole’ and ‘flat’. No feedback was
given during the experiment. Data from a seven-point scale of motion sickness (1. not
at all; 4. neither agree nor disagree; 7. very much) were also collected. Three visual
conditions (Bump/Hole/Flat) × 3 motion-chair conditions (Bump/Hole/Flat) × 3
velocity conditions (20, 30, 40 m/s) × 10 trials (a total of 270 trials) were conducted.
Subjects had 15-minute breaks after every 28 trials, but could rest at any time. A
typical experiment lasted about three hours and thirty minutes. The translational velocity of the motion chair was expressed as the velocity of the optical flow. The shape was expressed by tilting the chair forwards and backwards, i.e., by modifying the pitch rotation, which corresponded to the tangential angle of the surface.

Fig. 3. Experimental procedure: a 5 s fixation period, a 20 s stimulus, the response (bump, hole, or flat; 3-alternative forced choice) together with a motion sickness rating (7-point scale), and a 5 s interval.

The vertical velocity of the optical flow was determined by a combination of the translational motion and the pitch rotation derived from the profile of the shape. The profile of the shape (y = f(x)) was Gaussian:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} \qquad (1)$$
where σ = 1.1, since the maximum tilt of the motion chair,
$$\theta = \arctan\left\{\left.\frac{d}{dx}f(x)\right|_{x=\mu\pm\sigma}\right\} \qquad (2)$$
was set to 13.5 degrees, given the limit of the motion chair's angle. Ten seconds after the start, the height was at its maximum (i.e., x = μ). The translational velocity was calculated by v = dx/dt.
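The tilt command can be computed directly from this profile. The following is a minimal sketch assuming the normalization of Eq. (1); for a hole the sign of the profile would simply be inverted, and the function name is ours.

import numpy as np

def surface_profile_and_tilt(x, mu=0.0, sigma=1.1):
    # x : positions along the direction of travel (x = v*t, with v = dx/dt).
    # Returns (f, theta_deg): the Gaussian height profile of Eq. (1) and the
    # pitch angle in degrees that the motion chair takes at each position (Eq. (2)).
    f = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    dfdx = -(x - mu) / sigma ** 2 * f              # slope of the profile
    theta_deg = np.degrees(np.arctan(dfdx))        # tangential angle
    return f, theta_deg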
The results of shape perception by visuo-vestibular stimulation are shown in Fig. 4.
Experimental results show that shape perception was greatly affected by vestibular
stimulation. The results suggest that the tilt of the chair, 13.5 degrees, was large
enough to judge the shape independent of visual stimuli. Reducing the angular
amplitude of the chair motion or weakening the effect of the vestibular sensory
stimuli (e.g., adopting around 2.2 degrees of tilt perception threshold [15] or slower
angular acceleration) can be expected to increase the effect of visual stimuli.
In contrast, it seems the visual stimuli should be redesigned to augment the effect.
When the motion chair stimulus was a flat surface and the visual stimulus was not a
flat surface, the responses were almost evenly split among the three surfaces. This

indicates that it is difficult to perceive the sensation of non-flat surfaces only from
visual stimuli.
The velocities of optical flow used in the experiment did not greatly affect shape perception. The subjective motion-sickness ratings from all subjects were not larger than 2, which means that the experimental stimuli did not cause motion sickness.

Fig. 4. Surface classification probabilities (probability of classifying the surface as a bump, as a hole, or as flat) for every combination of visual condition (bump/hole/flat), motion-chair condition (bump/hole/flat), and velocity condition (20, 30, 40 m/s). Subjects mainly used the information from the tilt of the motion chair to identify the shape.

4 Enhancing Tactile Stimulation


Integration of visuo-vestibular stimuli with tactile stimuli is expected to enhance the
perception of self-motion. When we drive a car and accelerate it, the body is pressed
to the seat. If our body is pressed to the seat more strongly, we will perceive a
stronger subjective motion against the direction of pressure. In our motion chair
system so far, we have not yet implemented the pressure stimulation, which would
simulate acceleration or deceleration of body motion.
Figure 5 shows the design of the tactile stimulator for changing the pressure
between a seat and human body. The tactile stimulator is composed of voice-coil
motors with a pin-array and plates with holes. Figure 6 shows a layout drawing of the
tactile stimulator on the seat of a motion chair.
We expected that when the voice-coil motor with the pin-array vibrates at lower
frequencies, such as sub-hertz, a pressure sensation will be induced rather than
vibration sensation, because Merkel disks, which convey pressure information, are the
most sensitive of the four main types of mechanoreceptors to vibrations at low
frequencies.

Fig. 5. Schematic design of tactile stimulator with voice-coil motor and pin-array

Fig. 6. Layout drawing of the tactile stimulator on the seat of a motion chair

Figure 7 is a photograph of a prototype of the tactile stimulator. The pin-arrays of the tactile stimulator are made of ABS resin. Four sets of voice-coil motors were connected as a unit. To measure the pressure between the seat and the human body, each unit was mounted on pairs of strain gauges from a Wii Balance Board (Nintendo, Inc.). The tactile stimulators were driven by a computer with a D/A board (DA12-8(PCI), CONTEC Co., Ltd.) and a custom-made circuit (including an amplifier).

Fig. 7. Prototype of tactile stimulator to generate the sensation of being pressed to a seat

5 Conclusion

In this paper, we reported a pilot study of presenting visuo-vestibular stimulation to generate convex or concave surface perception. The results of the pilot study indicate
that we should redesign the effective stimulus combination of the visuo-vestibular
stimuli. After that, we will conduct a further experiment with different parameters in
an attempt to augment the effect of visual stimuli. We are also planning to conduct an
experiment with the tactile stimulator integrated with the visuo-vestibular system to
better understand the effectiveness of these stimuli.

Acknowledgement. This research was supported by the National Institute of Information and Communication Technology (NICT). We thank Dr. Takeharu Seno
for his valuable comments on visually induced self-motion perception, and Mr.
Shohei Komukai for his contribution to building the experiment setup.

References
1. Ikei, Y., Urano, M., Hirota, K., Amemiya, T.: FiveStar: Ultra-realistic Space Experience System. In: Proc. of HCI International 2011 (2010) (to appear)
2. Yoshioka, T., Nishimura, K., Yamamoto, W., Saito, T., Ikei, Y., Hirota, K., Amemiya, T.:
Development of Basic Techniques for Five Senses Theater - Multiple Modality Display for
Ultra Realistic Experience. In: Proc. of ASIAGRAPH in Shanghai, pp. 89–94 (2010)
3. Ishigaki, K., Kamo, Y., Takemoto, S., Saitou, T., Nishimura, K., Yoshioka, T.,
Yamaguchi, T., Yamamoto, W., Ikei, Y., Hirota, K., Amemiya, T.: Ultra-Realistic
Experience in Haptics and Memory. In: Proc. of ASIAGRAPH 2009 in Tokyo, p. 142
(2009)
4. Enami, K.: Research on ultra-realistic communications. In: Proc. of SPIE, vol. 7329, p.
732902 (2009)
5. Fischer, M.H., Kornmuller, A.E.: Optokinetisch ausgelöste Bewegungswahrnehmung und
optokinetischer Nystagmus. Journal of Psychological Neurology 41, 273–308 (1930)
6. Duijnhouwer, J., Beintema, J.A., van den Berg, A.V., van Wezel, R.J.: An illusory
transformation of optic flow fields without local motion interactions. Vision
Research 46(4), 439–443 (2006)
7. Warren Jr., W.H., Hannon, D.J.: Direction of self-motion is perceived from optical flow.
Nature 336, 162–163 (1988)
8. Seno, T., Ito, H., Sunaga, S., Nakamura, S.: Temporonasal motion projected on the nasal
retina underlies expansion-contraction asymmetry in vection. Vision Research 50, 1131–
1139 (2010)
9. Amemiya, T., Hirota, K., Ikei, Y.: Development of Preliminary System for Presenting
Visuo-vestibular Sensations for Five Senses Theater. In: Proc. of ASIAGRAPH in Tokyo,
vol. 4(2), pp. 19–23 (2010)
10. Huang, C.-H., Yen, J.-Y., Ouhyoung, M.: The design of a low cost motion chair for video
games and MPEG video playback. IEEE Transactions on Consumer Electronics 42(4),
991–997 (1996)
11. Maeda, T., Ando, H., Amemiya, T., Nagaya, N., Sugimoto, M., Inami, M.: Shaking the
World: Galvanic Vestibular Stimulation as a Novel Sensation Interface. In: Proc. of ACM
SIGGRAPH 2005 Emerging Technologies, p. 17 (2005)
12. Lebret, G., Liu, K., Lewis, F.L.: Dynamic analysis and control of a stewart platform
manipulator. Journal of Robotic Systems 10(5), 629–655 (1993)
13. Lentz, J.M., Collins, W.E.: Motion Sickness Susceptibility and Related Behavioral
Characteristics in Men and Women. Aviation, Space, & Environmental Medicine 48(4),
316–322 (1977)
14. Sharma, K., Aparna: Prevalence and Correlates of Susceptibility to Motion Sickness. Acta
Geneticae Medicae et Gemellologiae 46(2), 105–121 (1997)
15. Guedry, F.: Psychophysics of vestibular sensation. In: Kornhuber, H.H. (ed.) Handbook
of Sensory Physiology, vol. VI/2. Springer, Heidelberg (1974)
Touching Sharp Virtual Objects Produces a Haptic
Illusion

Andrea Brogni1,2, Darwin G. Caldwell1 , and Mel Slater3,4


1 Advanced Robotics Dept. - Istituto Italiano di Tecnologia, Genoa, Italy
{andrea.brogni,darwin.caldwell}@iit.it
2 Universidad Politécnica de Cataluña, Barcelona, Spain
3 ICREA-Universidad de Barcelona, Barcelona, Spain
4 Computer Science Dept. - University College London, London, UK

melslater@ub.edu

Abstract. Top down perceptual processing implies that much of what we per-
ceive is based on prior knowledge and expectation. It has been argued that such
processing is why Virtual Reality works at all - the brain filling in missing infor-
mation based on expectation. We investigated this with respect to touch. Seven-
teen participants were asked to touch different objects seen in a Virtual Reality
system. Although no haptic feedback was provided, questionnaire results show
that sharpness was experienced when touching a virtual cone and scissors, but
not when touching a virtual sphere. Skin conductance responses separate out the
sphere as different to the remaining objects. Such exploitation of expectation-
based illusory sensory feedback could be useful in the design of plausible virtual
environments.

Keywords: Virtual Reality, Human Reaction, Physiology, Haptic Illusion.

1 Introduction
It was argued by the late Professor Lawrence Stark that ’virtual reality works because re-
ality is virtual’ [12]. The meaning of this is that our perceptual system makes inferences
about the world based on relatively small samples of the surrounding environment, and
uses top down prior expectations to automatically fill in missing information. A scene
displayed in virtual reality typically provides a very small sample of what it is supposed
to be portraying at every level - in terms of geometric, illumination, behavioral, audi-
tory, and especially haptic sensory data, and within each of these with low resolution,
low fidelity, a visually small field of view - a typically huge amount of sensory informa-
tion that is missing compared to what would be available in physical reality. Yet there
is a lot of evidence that people tend to respond realistically to situations and events in
virtual reality in spite of this paucity of sensory data [9] [11].
There is also strong evidence that the brain in processing sensory data relies strongly
on multisensory correlations. By manipulating these multisensory correlations it is pos-
sible to produce in a person bizarre illusions of changes to their body - a rubber or
virtual arm replacing their real arm [2] [10], the Pinocchio Illusion (the feeling that
one’s nose is growing longer) [6], the shrinking waist illusion [5], out-of-the-body ex-
periences [7] [4] and even the illusion that a manikin body is one’s own body [8]. Each


of these relies on synchronous visual-tactile or visual-motor correlations, or in the case of the Pinocchio and shrinking waist illusions, correlation between the feeling of touch
on a body part (the nose, the waist) while proprioception indicates that the hand doing
the touching is also moving. The brain solves this contradiction through the inference
that the body itself is changing.
In this paper we investigate a simple setup that produces the illusion of touch when
there is none. It also relies on multisensory correlation and top-down knowledge. In
the case of the rubber hand illusion the subject sees, for example, a brush touching a
rubber hand placed in a plausible position on the table at which he or she is seated, and
feels the corresponding touch synchronously on the hidden real hand. The visual-tactile
correlation produces after a few seconds of such stimulation in most people, the strong
feeling that the rubber hand is their hand, the touch is felt on the rubber hand, and this is
demonstrated not only subjectively through a questionnaire, but behaviorally. The sub-
ject will blindly point towards the rubber hand when asked to indicate where the hand
is felt to be located. If the rubber hand is threatened then this will generate physiolo-
gical arousal that would be appropriate for a pain response [2]. When the visual-tactile
sensations are not synchronous then the illusion typically does not occur. This shows
that the connection between the visual and haptic modalities is very strong and that one
can influence the other under specific conditions, even if vision usually dominates.
In our experiment the seen hand is the person’s real hand, and the visual touch with
a virtual object is also visually on the hand. What is missing is the real touch sensation,
and the main question is to what extent the brain will fill in this missing information
and provide a feeling of touch. Can we really feel a sensation on our palm when we approach a virtual object? Do we perceive a different sensation if we have a sharp object or a smooth one? This would open new opportunities for some applications, where pure haptic feedback is not so crucial and simple virtual sensations could be enough to increase the sense of presence.

2 The Experiment
2.1 Introduction
The study we carried out concerned the different perceptions we may have when approaching objects of different shapes. The experiment covered the simple act of approaching and touching an object. Moving the hand and reaching for an object with the palm of the hand was the task for the subjects, in a very simple virtual world. The main task for the volunteers was to stand in a very simple environment, consisting of just a grid placed on the floor, and to wait for an object to appear. They were asked to approach the object at the clearly visible red spot and to lift their arm and "touch" it with the palm, where we have a high density of receptors. Skin conductance was recorded during the experiment, to analyze physiological responses while the volunteer was "touching" the different objects.
In real life we perceive objects of different shapes in different ways, due to our previous experience, and our sensations and reactions differ according to the object. The hypothesis is that similar reactions could happen in a completely virtual
environment, where no real haptic feedback is provided. Approaching a virtual sharp


object should be different from approaching a virtual sphere or another smooth object,
showing a similarity with the real situation.

2.2 Equipment

The study was carried out at the Advanced Robotics Lab, at the Istituto Italiano di
Tecnologia, Genoa, Italy. Participants were placed inside a Powerwall system, where
the virtual environment was projected onto a 4 × 2 m screen by two Christie Mirage S+
4000 projectors, synchronised with StereoGraphics CrystalEyes active shutter glasses.
The head of the participant was tracked with 6 dof by an Intersense IS900 inertial-
ultrasonic motion tracking system. An Optitrack FLEX:100 system composed of 12
infrared cameras was used for tracking the hand position using a single passive marker.
We used a Sun workstation (dual-core AMD Opteron 2218 processors, 2.60 GHz, 2.50
GB RAM) with Windows XP Professional and an NVIDIA Quadro FX 4600 video card. A
snapshot of a volunteer during the experiment is shown in Figure 1.
The device for physiological monitoring was the Nexus-4 by MindMedia1. This
was used to record skin conductance at a sampling rate of 128 Hz. The main application
was developed in C++, using XVR2 as the graphics library and VRPN3 as the network library
for the connection with the physiological monitoring and the tracking system.
The participants were able to move freely in the VE, as in the real environment, because the
virtual space was aligned with the real walls of the VR system and the grid on the floor
was at the same level as the real floor of the room. The frame rate of the simulation
was around 100 Hz.

2.3 Procedures and Scenario

Seventeen individuals were recruited for the experiment, ten males and seven females.
After initial greetings, the participants were asked to complete a consent form after
reading an information sheet. The participants were then asked to answer a question-
naire containing a series of demographic questions.
The participant was introduced to the Powerwall and fitted with various devices:
tracking sensors, shutter glasses and physiological sensors. A passive marker was placed
on the back of the hand to keep track of the palm position. Initially the participant was
asked to stand still for 30 seconds in the dark in order to record a baseline for phy-
siological signals in a relaxed and non-active state. After the dark baseline session, the
VE appeared on the projection screen, and another set of baseline measurements were
recorded for another 30 seconds, but with the virtual environment displayed (a simple
grid on the floor).
After this period, the actual experiment started, and seven objects were shown in
random order. These were a cone, a cube, a cylinder, a pyramid, scissors, a sphere and
a vase (Figures 2 and 3). These objects varied in their degree of sharpness from none
at all (a sphere) to very pointy (a cone).

1 http://www.mindmedia.nl/english/index.php
2 http://www.vrmedia.it/Xvr.htm
3 http://www.cs.unc.edu/Research/vrpn/

Fig. 1. A Volunteer during the experiment

Fig. 2. Snapshots of the geometrical virtual objects

The participants were asked to touch the virtual
objects at a point indicated by a red spot, using the palm of their right hand. However,
the vase was slightly different: there was no red spot on it, and participants were told
that they could touch it wherever they wanted, even in a different place each time.
Each object was displayed in succession for about 15 seconds, with a 15-second pause
between the presentations of successive objects.
Each participant saw these objects in a different random order, but the first and
the last in the sequence were never the vase or the scissors, because those objects
broke the geometrical sequence and we wanted them to appear in the middle.
Figure 1 shows a snapshot of a participant during the experiment.

Fig. 3. Snapshots of virtual scissors and vase

2.4 Recording Subjective Responses


A questionnaire was administered at the end of the experiment consisting of three ques-
tions on a 7-point Likert4 scale. The order of questions was randomized for each sub-
ject, and each question was answered with respect to each object; again, the order of
the objects was randomized for each subject. These questions were:

– (Q1) Did it sometimes occur to you that you wanted to pick up or use the object in
some way?
– (Q2) Did it sometimes occur to you that the object was really there?
– (Q3) When your hand touched the [object name] did you at any time feel any of the
following sensations? (hot, humid, unpleasant, sharp, soft, painful, cold, smooth).

Each question was answered on the scale 1 to 7 where 1 meant ‘not at all’ and 7 ‘very
often’. With respect to Q3, this scale was used in relation to each of the listed properties.

2.5 Physiological Responses


The physiological measure recorded during this experiment was skin conductance [3]
recorded at 128 Hz by the Nexus device. Superficial electrodes were placed on the
palmar areas of the index and ring fingers of the non-dominant hand. This measures
changes in arousal through changes in skin conductance caused by sweat levels.
In particular, Electrodermal Activity (EDA) was recorded during the virtual experi-
ence. EDA is based on the sweat gland activity on the human skin. When the level of
arousal increases, the glands produce more sweat, changing the resistance, and hence the
conductance, of the skin [1], [13]. Our signal was the skin conductance (SC),
expressed in micro-siemens, which normally decreases during relaxation. The raw data
coming from the device were converted into micro-siemens values, and the signal was then
smoothed using the MATLAB wavedec function to obtain a cleaner signal for
skin conductance level (SCL) detection.
Following [4], we have measured the SCL at the start of any touching event, and
then the maximum reached during the immediately following 6 seconds. We allowed 5
seconds for the response, plus 1 second of leeway to account for not knowing the exact
moment at which the participant's hand intersected the object. The greater this amplitude
change, the greater the level of arousal.

4 http://en.wikipedia.org/wiki/Likert_scale
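As an illustration of this processing chain, the sketch below (not the authors' code) reproduces the two steps just described: wavelet-based smoothing of the raw trace, with PyWavelets standing in for the MATLAB wavedec call, and the per-touch amplitude measure over the 6-second window. The wavelet family, the decomposition level and the variable names are assumptions made for the example.

```python
# Minimal sketch of the SCL processing described above (illustrative only).
import numpy as np
import pywt

FS = 128  # Hz, the sampling rate reported for the Nexus device

def smooth_scl(raw_microsiemens, wavelet="db4", level=5):
    """Keep only the coarse approximation to obtain a smooth SCL trace."""
    coeffs = pywt.wavedec(raw_microsiemens, wavelet, level=level)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]  # drop the detail bands
    smoothed = pywt.waverec(coeffs, wavelet)
    return smoothed[: len(raw_microsiemens)]

def touch_amplitude(scl, touch_sample, window_s=6.0):
    """Amplitude change: SCL at touch onset vs. the maximum in the next 6 s."""
    end = min(len(scl), touch_sample + int(window_s * FS))
    return float(scl[touch_sample:end].max() - scl[touch_sample])
```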

2.6 Analysis and Results

Questionnaire Analysis. We first consider the internal consistency of the responses


within each object. In this paper we only pay attention to the scores on the felt properties
of the objects as elicited by Q3. We expect that the sharp objects should be reported as
sharp and smooth objects as smooth. We used a Kruskal-Wallis non-parametric
one-way analysis of variance, since the questionnaire responses are ordinal.
The results are consistent with expectation. For the cone, cube and pyramid the ‘sharp’
response was significantly higher than all of the rest (P = 7 × 10^-8, 0.0004, and 1.3 × 10^-5,
respectively, for the tests of equality of all medians). For the scissors the medians were
not all equal (P = 3.6 × 10^-6), with ‘sharp’ having the highest median, but also ‘painful’
and ‘unpleasant’ were not significantly different from ‘sharp’ (using a multiple contrast
analysis based on the ANOVA at an overall 5% level).
There was no significant difference between the properties for cylinder (P = 0.18).
For the sphere ‘smooth’ is significantly higher (P = 1.1 × 10^-9) than the remaining prop-
erties, and for the vase ‘humid’ and ‘smooth’ were significantly higher than the remain-
ing properties (P = 0.0004) but this difference was caused by very few differences in
the scores (most of the scores were 1).
A more interesting analysis concerns the levels of reported sharpness across the
objects. Considering the ‘sharp’ question only, there is a significant difference between
the medians of the 7 objects (P = 5.6 × 10^-7). Using a multiple contrast analysis with
overall significance level 0.05 we find that the level of reported sharpness for the cone
is significantly higher than for cylinder, sphere and vase. Scissors and pyramid are both
significantly higher than sphere and vase. Sphere and vase are significantly lower than
cone, scissors and pyramid.
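For readers wishing to reproduce this kind of ordinal analysis, a minimal sketch using SciPy's Kruskal-Wallis test is given below. It is not the authors' code (they evidently worked in MATLAB), and the data layout is an assumption; the study's actual scores are not reproduced.

```python
# Illustrative Kruskal-Wallis comparisons on the 1-7 questionnaire scores.
from scipy.stats import kruskal

def medians_differ(ratings_by_group):
    """ratings_by_group: dict mapping a label (an object, or a property such as
    'sharp') to the list of 1-7 scores given by the 17 participants."""
    h_stat, p_value = kruskal(*ratings_by_group.values())
    return h_stat, p_value

# Called with one group per object (scores for 'sharp' only), this corresponds
# to the between-object comparison above; called with one group per property
# within a single object, it corresponds to the per-object analysis.
```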
The questionnaire analysis is interesting, but is not sufficient, since we cannot be
sure that people are not just reporting what they think might be expected of them rather
than an objective response. For this we turn to the skin conductance analysis.

Skin Conductance Analysis. Analysis of variance of the skin conductance amplitudes


revealed no differences in the mean values between the different objects. However, we
consider the matrix of skin conductance amplitudes, with one column per object (7)
and one row per participant (17), and from this construct a distance matrix between the
objects using Euclidean distance. Using this we carried out a cluster analysis (using the MATLAB
linkage function). We allowed for 2, 3 or 4 clusters amongst the 7 objects. The results
are shown in Table 1.
It is clear that the sphere stands out as a cluster on its own. We know a priori that
there were 3 types of object (those with sharp edges, the sphere, and the vase which was
quite different from the others). The cluster analysis always correctly distinguishes the
sphere from the rest.

Table 1. List of the clusters allowed by the analysis

Clusters allowed   Cluster 1           Cluster 2                     Cluster 3                     Cluster 4
2                  all except sphere   sphere
3                  cone, cube, vase    cylinder, scissors, pyramid   sphere
4                  cone, vase          cube                          cylinder, scissors, pyramid   sphere
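A minimal sketch of this clustering step is shown below, with SciPy's linkage and fcluster standing in for the MATLAB linkage function; the linkage method (single linkage, SciPy's default) is an assumption, since the paper does not state which was used.

```python
# Illustrative hierarchical clustering of the objects from the amplitude matrix.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_objects(amplitudes, object_names, cluster_counts=(2, 3, 4)):
    """amplitudes: array-like of shape (7 objects, 17 participants)."""
    dists = pdist(np.asarray(amplitudes, dtype=float), metric="euclidean")
    tree = linkage(dists)  # single linkage by default (assumed)
    groupings = {}
    for k in cluster_counts:
        labels = fcluster(tree, t=k, criterion="maxclust")
        groupings[k] = {name: int(lab) for name, lab in zip(object_names, labels)}
    return groupings
```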

2.7 Discussion
In the rubber hand illusion there is synchronous visual and tactile sensory data, and the
brain infers that the rubber hand must be the real hand. Our experiment may be con-
sidered as an ‘inverse’ virtual hand illusion experiment - here the hand is the person’s
seen real hand, and this hand is seen to touch a virtual object. It is as if there were an
equation: [this is my hand] = [the rubber hand is seen to be touched] + [touch is felt]
(with synchronous visual-tactile stimulation).
In our case the left hand side is definitely true (it is their real hand), and also the first
term on the right hand side is true (the hand is seen to be touching virtual objects). The
‘unknown’ value here is the felt touch, which may then be generated automatically
by the perceptual system.
Although the absolute values of the skin conductance amplitudes revealed no differ-
ences between the means of the different object types, it is nevertheless the case that the
cluster analysis can distinguish the one object that is clearly smooth and different from
the remaining ones. Remember that this is based solely on physiological responses,
which in themselves seemingly have no connection with the shapes of virtual geomet-
ric objects. However, it appears that the relationships between the degrees of arousal
can distinguish between the different types of objects.
The result gives an indication of how the brain may deal with a completely new
situation and assist us. Based on its database of previous experiences, the brain
generates very small but relevant sensations related to the sense of touch, even
without real feedback from the environment. This "haptic illusion" seems to be
stronger for sharp shapes than for smooth ones, perhaps because sharp shapes are
associated with something dangerous that could hurt us, so that a subconscious
mechanism overwrites the actual perception with something fake, but useful as an alarm.
Positive feedback also came from the quick, informal chats held after the ex-
periments. People reported comments such as "How did you do it? I was feeling the material!",
"The sphere was soft and smooth, and I felt my hand falling down when it disappeared", and
"The blue side of the objects was cold", but of course we also had negative impressions such as
"I didn't get any sensation, apart from a bit of discomfort when I approached the
scissors facing me with the sharp end".

3 Conclusions

Our study has demonstrated that, subjectively, people distinguish between the smooth-
ness/sharpness properties of different types of objects. There is also some evidence
that this occurs at a physiological level. It is our view that haptics remains the great un-
solved problem of virtual reality. It is true that there are specific haptic devices that can
give specific types of haptic feedback under very constrained circumstances. However,
there is no generalized haptics in the sense that contingent collisions with virtual ob-
jects on any part of the body can generate tactile responses. We speculate that reliance
on the perceptual properties of the brain can be employed in order, in the long run, to
solve this problem.

4 Future Works

The main problem with the results of our study was that we could not distinguish
between the visual influence and the illusory haptic influence. Just seeing the cone or
scissors might provoke a response, which could be the cause of the GSR effect for the
cone without any touching. The next step in this research will be to carry out a new small
study, in which we ask the volunteers to move their hand close to the marked point but
not actually touch it, and compare this with moving the hand onto the sharp/smooth point and
intersecting it. This should help us to discriminate the purely visual effect from the
illusory haptic one.

Acknowledgment. This work was supported by the Spanish Ministry of Education


and Science, Accion Complementaria, TIN2006-27666-E, and the EU-FET project IM-
MERSENCE (IST-2006-027141). The study was approved by the Ethical Committee of
the Azienda Sanitaria Locale 3 in Genoa, Italy. The studies were carried out at the Ad-
vanced Robotics Lab, at the Istituto Italiano di Tecnologia, Genoa, Italy.

References
1. Andreassi, J.J.: Psychophysiology: Human Behavior and Physiological Response, 4th edn.
Lawrence Erlbaum Associates, London (2000)
2. Armel, K., Ramachandran, V.: Projecting sensations to external objects: evidence from skin
conductance response. Proceedings of the Royal Society, B, Biological Sciences 270, 1499–
1506 (2003)
3. Boucsein, W.: Electrodermal Activity. Plenum Press, New York (1992)
4. Ehrsson, H.H.: The experimental induction of out-of-body experiences. Science 317(5841)
(August 2007)
5. Ehrsson, H.H., Kito, T., Sadato, N., Passingham, R.E., Naito, E.: Neural substrate of body
size: Illusory feeling of shrinking of the waist. PLoS Biol. 3(12) (November 2005)
6. Lackner, J.R.: Some proprioceptive influences on the perceptual representation of body shape
and orientation. Brain 111(2), 281–297 (1988)
7. Lenggenhager, B., Tadi, T., Metzinger, T., Blanke, O.: Video ergo sum: Manipulating bodily
self-consciousness. Science 317(5841), 1096–1099 (2007)

8. Petkova, V.I., Ehrsson, H.H.: If I were you: Perceptual illusion of body swapping. PLoS
ONE 3(12) (2008)
9. Sanchez-Vives, M.V., Slater, M.: From presence to consciousness through virtual reality.
Nature Reviews Neuroscience 6(4), 332–339 (2005)
10. Slater, M., Marcos, D.P., Ehrsson, H.H., Sanchez-Vives, M.V.: Towards a digital body: The
virtual arm illusion. Frontiers in Human Neuroscience 2(6) (March 2008)
11. Slater, M.: Place illusion and plausibility can lead to realistic behaviour in immersive
virtual environments. Philosophical Transactions of the Royal Society B: Biological Sci-
ences 364(1535), 3549–3557 (2009)
12. Stark, L.W.: How virtual reality works! the illusions of vision in real and virtual envi-
ronments. In: Proc SPIE: Symposium on Electronic Imaging: Science and Technology,
vol. 2411, pp. 5–10 (February 1995)
13. Stern, R.M., Ray, W.J., Quigley, K.S.: Psychophysiological Recording, 2nd edn. Oxford Uni-
versity Press, Oxford (2001)
Whole Body Interaction Using
the Grounded Bar Interface

Bong-gyu Jang, Hyunseok Yang, and Gerard J. Kim

Digital Experience Laboratory


Korea University, Seoul, Korea
gjkim@korea.ac.kr

Abstract. Whole body interaction is an important element in promoting the


level of presence and immersion in virtual reality systems. In this paper, we
investigate the effect of “grounding” the interaction device to take advantage of
the significant passive reaction force feedback sensed throughout the body, and
thus in effect realizing the whole body interaction without complicated sensing
and feedback apparatus. An experiment was conducted to assess the task
performance and level of presence/immersion, as compared to a keyboard input
method, using a maze navigation task. The results showed that the G-Bar
induced significantly higher presence, while task performance (maze
completion time and number of wall collisions) was on par with the already
familiar keyboard interface. The keyboard users instead had to adjust and learn
how to navigate faster and avoid colliding with the walls over time, indicating that
the whole body interaction contributed to a better perception of the immediate
space. Thus, considering the learning rate and the relative unfamiliarity of the G-
Bar, with sufficient training the G-Bar could accomplish both high
presence/immersion and good task performance for users.

Keywords: Whole-body interaction, Presence, Immersion, Task performance,


Isometric interaction.

1 Introduction
One of the defining characteristics of virtual reality is the provision of “presence” [1],
the feeling of being contained in the content. Many studies have identified the
elements that contribute to enhancing the level of presence [1][8], and one such element
is the use of “whole body” interaction whose strategy is to leverage on as many
sensory and motor organs as possible [2].
In this paper, we introduce an interface called the “G-Bar” (Grounded Bar), a two-
handed isometric device that is fixed to the ground (grounded) for a variety of
interactive tasks including navigation, and object selection and manipulation. The
interface is “whole body” because it is basically operated with two hands, and since it
is grounded, it also indirectly involves the interactions through the legs and body parts
in between (see Figure 1). Since the user also needs to move one’s head/neck in order
to view and scan the environment visually, virtually all parts of the body become
active. Moreover, since the device is isometric and senses the user’s pressure input,


the user can express dynamic interactions more naturally [3][4][5]. In addition, we
formally evaluate and validate the projected merits of the whole body interaction
induced by a “grounded” device such as the G-Bar.

Fig. 1. The G-Bar in usage for navigating and selecting objects in a virtual environment. The
reaction force resulting from the two-hand interaction with the grounded device propagates
throughout the body (right). More detailed view of the G-Bar prototype (left).

2 Related Work
Employing whole body interaction is an effective method to enhance the immersive
quality of interactive contents [6][7][8]. However, whole body interaction does not
necessarily require separate sensing and feedback mechanisms for the body parts
involved. Through clever interaction and interface design, whole body interaction can
be induced through gesture latitude and minimal sensing. For instance, the arcade
game “Dance Dance Revolution” [9] utilizes a very simple foot switch pad, but the
interaction is designed to induce the use of whole body (similarly for Nintendo Wii
based games [10]).
This concept is somewhat related to that of the “passive” haptics, an inexpensive
and creative way to use natural reaction force feedback (e.g. tangible props).
However, props are usually lightweight and fragile, limiting the amount of force users
can apply. Meehan et al. [11] demonstrated the utility of passive haptics
with a grounded prop (a ledge), which significantly enhanced the level of presence
in their virtual cliff environment. A large reaction force is more likely to propagate
throughout and stimulate the whole body.
Isometric input is also known to increase interaction realism by allowing dynamic
expression through the input [3][4][5]. The G-Bar combines all of these
elements in the hope of creating effective interaction and a compelling user experience.
On the other hand, the effect of whole body interaction on task performance is
unclear. For one, the relationship between presence/immersion and task performance
is generally viewed as being task dependent [1][8].

3 G-Bar
G-Bar is implemented by installing low cost pressure sensors and vibrating motors on
a bar handle. Four pressure sensors (two at each end of the bar) realize the isometric
input, and the four vibrating motors (laid out at regular intervals along the bar) give
directional feedback cues in addition to the natural reaction force.
Thus G-Bar is particularly appropriate for tasks that involve frequent and dynamic
contact with the environment or interaction object. A typical example might be
navigating and pushing through a jostling crowd, or riding and directly controlling a
vehicle such as a cart, motorcycle or hang-glider. In fact, the “bar” resembles the
control handles used for some of these vehicles (e.g. motorcycle handlebars), and such
metaphors can be even more helpful.
While object selection and manipulation might not really involve whole body
interaction in real life, by exaggerating the extent of the body parts involved,
we posit that it could be one way to maximize the virtual experience. Figure 2 shows
how one might be able to navigate through the virtual space by combinations of simple
two handed isometric push (forward) and pull (backward) actions. In the selection mode,
the same interaction technique can be used for controlling the virtual ray/cone, then
applying the grasping action for final selection (right hand grasp) and undo (left hand
grasp). Once the object is selected, it can be rotated and moved in a similar fashion as
well. Despite the seemingly natural interaction metaphors, the sheer unfamiliarity will
require some amount of learning from users.
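To make the isometric mapping concrete, the following is a hypothetical sketch of how the four pressure readings could be turned into navigation commands: the net push/pull over both hands drives forward or backward motion, and the left/right imbalance drives turning. The sensor layout (one push and one pull sensor per end), the gains and the dead zone are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical mapping from G-Bar pressure readings to navigation commands.
def gbar_to_velocity(left_push, left_pull, right_push, right_pull,
                     gain_v=0.02, gain_w=0.01, dead_zone=5.0):
    """Return (forward_speed, turn_rate) from the four raw pressure values."""
    forward = (left_push + right_push) - (left_pull + right_pull)   # push vs. pull
    turn = (right_push - right_pull) - (left_push - left_pull)      # left/right imbalance
    if abs(forward) < dead_zone:   # ignore small unintended pressure
        forward = 0.0
    if abs(turn) < dead_zone:
        turn = 0.0
    return gain_v * forward, gain_w * turn
```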

Fig. 2. Navigation, and object selection and manipulation through combinations of push, pull,
twist and grab actions with the G-Bar

4 Experiment
To assess the effectiveness of the proposed interaction technique, we have carried out
an experiment comparing the G-Bar interface to a non-grounded interface, namely a

keyboard input. The user was asked to navigate a fixed path in a virtual maze with a
cart (in first person viewpoint) and the task performance, level of presence/immersion
and general usability were measured. Our hypothesis was that the use of G-Bar
would result in significantly enhanced user experience (e.g. high presence and
immersion), but might not produce good task performance or high usability without
sufficient training.

4.1 Experiment Design

The experiment was designed as a one factor (two level) repeated measure (within
subject), the sole factor being the type of interface employed (G-Bar vs.
keyboard). The subject was asked to navigate a fixed path in a maze-like environment
using the two interfaces (presented in a balanced order). As shown in Figure 3, the
virtual maze was composed of brick walls with the directional path marked for user
convenience. The subject was asked to follow and navigate the path as fast as possible
while avoiding collisions with the walls as much as possible. The width of the path was
set (after an initial pilot test) so that the task was not too easy, especially when making
turns. The test environment was staged as the user pushing a cart (seen in the first
person viewpoint) with a large box in it occluding the front end so that the user had to
get the “feel” for the extent of the cart. This “feel” would be important in avoiding
collision with the walls and also a quality that was thought to be better acquired with
a whole body interaction and thus would produce higher task performance (at least
eventually).

Fig. 3. A snapshot of the virtual maze used in the comparative experiment (left). A subject
navigating the virtual maze using the interface set up. The G-Bar is installed on a heavy
treadmill for grounding. The keyboard interface was also placed on the same place where the
G-Bar was installed (right).

The measured dependent variables were the task completion time, accuracy (e.g.
number of collisions), subjective presence/immersion score (collected through a survey)
and other general usability measures (we omit the content of the survey for lack of space).

4.2 Experiment Process

A total of 16 subjects participated in the experiment (13 males, 3 females). The average
age of the subjects was 27.18; most were college undergraduate or graduate students
recruited on campus, all with previous experience with keyboard/mouse-based computer
interfaces and with using supermarket carts. They were given proper compensation for their
participation.
The subject was first given some training until he or she was sufficiently
familiarized with the G-Bar. However, due to prior familiarity with the keyboard
interface, the competency of G-Bar usage did not match that of the keyboard. Then,
the subject navigated the virtual maze using the two interfaces, presented in a balanced
order. The subject tried the maze three times; the same maze was used across trials
to measure the learning effect of the interface itself. The learning effect (of the maze)
or bias due to using the same maze over the trials or over the different treatments was
deemed minimal because the task was simply to follow the marked path (rather than
finding an open path).
Since the G-Bar is a grounded interface, it was attached to the front horizontal bar of a
(heavy) treadmill (see Figure 3; the moving belt of the treadmill at the bottom was not
used). To keep the environmental conditions equal, when the keyboard was used it was
placed in the same position as the G-Bar.
The quantitative dependent variables were captured automatically through the test
program software and the presence/immersion and usability survey was taken after
the subject carried out both treatments.

5 Experiment Results

Figure 4 shows the experiment results with the task performance (task completion
time and no. of wall collisions) over three trials. Contrary to our expectation, despite
the relative familiarity of the keyboard interface, the users performed generally better
with the G-Bar interface (even though statistically significant differences were not
observed). Moreover, the keyboard interface showed more learning effect than the
G-Bar. Therefore, we can conclude that the G-Bar (or whole body interaction) was a
more suitable interface (e.g. affording better depth/spatial perception) from the start,
whereas the keyboard users were forced to adapt and learn over time how to avoid
collisions and navigate better.
Figure 5 shows the experiment results for the qualitative survey responses (all
measured on a 7-point Likert scale). It is very interesting that, in contrast to the
task performance results, the users still felt the keyboard interface was much easier. The
perceived extent of whole body usage, force feedback (note that there was no
explicit force feedback) and immersion (presence) was much higher with the G-Bar
(with statistical significance). Again, we posit that such factors affected depth and
spatial perception, contributing to the relatively high task performance even though the users
were not fully trained to use the G-Bar.

Fig. 4. Experiment results (time of completion and no. of collisions between the keyboard and
G-Bar)

Fig. 5. Experiment results (survey questions: ease of use, extent of whole body interaction, the
level of immersion, and extent of the perceived force feedback)

6 Conclusion
In this paper, we presented the G-Bar, a low cost, two handed isometric whole body
interface for interacting in the virtual space. The use of two hands in combination with
the passive reactive feedback proved to be a contributing factor to enhanced presence.
In addition, despite the novelty of the technique, after minimal training the users were
able to achieve a level of task performance comparable to that of the familiar non-grounded
device as well. While G-Bar may not be appropriate for all types of virtual tasks (e.g.
for interacting with fast moving light objects with relatively little reaction force, or for
tasks that are more natural with one hand), the study shows the effectiveness of
grounding the interaction device and leveraging on the naturally induced whole body
experience. We believe that in combination with multimodal feedback (e.g. vibration
feedback and visual simulation), the virtual experience can be further enriched.
Acknowledgements. This research was supported in part by the Strategic Technology
Lab. Program (Multimodal Entertainment Platform area) and the Core Industrial
Tech. Development Program (Digital Textile based Around Body Computing area) of
the Korea Ministry of Knowledge Economy (MKE).

References
1. Kim, G.J.: Designing Virtual Reality Systems: A Structured Approach. Springer,
Heidelberg (2005)
2. Buxton, W.: There’s More to Interaction than Meets the Eye: Some Issues in Manual
Input. In: Norman, D.A., Draper, S.W. (eds.) User Centered System Design: New
Perspectives on Human-Computer Interaction, pp. 319–337. Lawrence Erlbaum
Associates, Mahwah (1986)
3. Lecuyer, A., Coquillart, S., Kheddar, A.: Pseudo-Haptic Feedback: Can Isometric Input
Devices Simulate Force Feedback. In: Proceedings of the IEEE Virtual Reality
Conference, pp. 83–89 (2000)
4. Zhai, S.: Investigation of Feel for 6DOF inputs: Isometric and Elastic rate control for
manipulation in 3D environments. In: Proceedings of the Human Factors and Ergonomics
Society (1993)
5. Zhai, S.: User Performance in Relation to 3D Input Device Design. Computer
Graphics 32(4) (1998)
6. Boulic, R., Maupu, D., Peinado, M., Raunhardt, D.: Spatial Awareness in Full-Body
Immersive Interactions: Where Do We Stand? In: Boulic, R., Chrysanthou, Y., Komura, T.
(eds.) MIG 2010. LNCS, vol. 6459, pp. 59–69. Springer, Heidelberg (2010)
7. Benyon, D., Smyth, M., Helgason, I. (eds.): Presence for Everyone: A Short Guide to
Presence Research. The Centre for Interaction Design, Edinburgh Napier University, UK
(2009)
8. Peterson, B.: The Influence of Whole-Body Interaction on Wayfinding in Virtual Reality.
PhD Thesis, University of Washington (1998)
9. Konami Digital Entertainment, Inc. Dance Dance Revolution (2010),
http://www.konami.com/ddr/
10. Nintendo, Inc. Wii, http://wii.com/
11. Meehan, M., Whitton, M., Razzaque, S., Zimmon, P., Insko, B., Combe, G., Lok, B.,
Scheuermann, T., Naik, S., Jerald, J., Harris, M., Antley, A., Brooks, F.: Physiological Reaction
and Presence in Stressful Virtual Environments. In: Proc. of ACM SIGGRAPH (2002)
Digital Display Case Using Non-contact Head Tracking

Takashi Kajinami1, Takuji Narumi2, Tomohiro Tanikawa1, and Michitaka Hirose1


1
Graduate School of Information Science and Technology, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan
2
Graduate School of Engineering, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan
{kaji,narumi,tani,hirose}@cyber.t.u-tokyo.ac.jp

Abstract. In our research, we aim to construct the Digital Display Case system,
which enables museum exhibitions with virtual exhibits rendered by computer
graphics technology, in order to convey background information about exhibits
effectively. In this paper, we consider more practical use in museums and
construct a version of the system based on head tracking, which does not require any
special devices to be worn by users. We use a camera and a range camera to detect and track
the user's face, and compute the images on the displays so that users can appreciate virtual
exhibits as if the exhibits were really inside the virtual case.

Keywords: Digital Display Case, Digital Museum, Computer Graphics, Virtual
Reality.

1 Introduction

In our research, we aim to construct the Digital Display Case system, which enables
museums to hold exhibitions with virtual exhibits rendered using computer graphics
technology, in order to convey background information about the exhibits effectively [1].
Recently, museums have become very interested in introducing digital technologies
into their exhibitions in order to convey more of the background information about their
exhibits. Every exhibit has a great deal of background information, for example when or
where it was made, what kind of culture it belongs to, and so on. However, museums have
problems in conveying this information, because they cannot modify the exhibit itself,
which must be preserved. They have conventionally used panels (Fig. 1), but this is not
a very effective way of helping visitors connect the exhibit itself with the information on the
panel, because the two are often placed apart. Thus, a digital exhibition system is needed to
present the background information in a manner more closely tied to the exhibit, without
harming the exhibits.
Therefore, in our research we aim to construct an interactive exhibition system that
conveys the background information about exhibits more effectively and is designed
based on the conventions of traditional museum exhibitions. In this paper, we
consider more practical use in museums and construct a version of the system based on
head tracking, which does not require any special devices to be worn by users. We use a
camera and a range camera to detect and track the user's face, and compute the images on
the displays so that users can appreciate virtual exhibits as if the exhibits were really inside
the virtual case.

Fig. 1. Conventional Exhibition in Museum

2 Related Works

2.1 Digital Devices for Museums

Although some digital devices, such as information kiosks or videos about exhibits, have
already been introduced into museums, most of them are placed outside the exhibition rooms.
This is because the curators who design exhibitions do not know how to use
them effectively, whereas they know much about conventional exhibition devices. We have
to take this know-how into account when introducing mixed reality technologies into museums.
The most popular digital system is the theater system, which some museums have already
introduced. Several studies have been conducted on gallery talks in such theaters [2].
These systems can present highly realistic images about the theme of the
exhibition. However, it is difficult to introduce the system into exhibition rooms, and a
major problem is that the connection between the content in the theater and
the exhibits in the room is lost.
There has also been research on using digital technologies for gallery talks in
exhibition rooms. The gallery talk is a conventional way for museums to convey
background information about exhibits to their visitors: an oral explanation of the
exhibits given by a commentator.
However, it is difficult to hold gallery talks frequently or individually because of
manpower shortages. Some digital devices have been developed to solve this problem. The
gallery talk robot [3][4] is one solution, which realizes a gallery talk given by a remote
person. This reduces the geographic restrictions on commentators and makes it
easier to hold gallery talks. However, there is the problem of how the robot should move in
exhibition rooms where people are also walking. We have to ensure that the robot does not
knock against people or disturb their movement.

Mobile devices are also used to convey information about exhibits. Hiyama
et al. [5] present this type of museum guiding system. They use a mobile device with
position tracking based on infrared signals, and show visitors information based on
this position data. This enables museums to provide a structured explanation of the entire
exhibition room. However, it is difficult to install the positioning devices in all
exhibition rooms, and this is a high barrier to introduction.
In addition, there is some work on digital exhibition devices for museums.
Research has been conducted on exhibition systems using HMDs [6].
However, wearable systems such as HMDs pose a big problem when we introduce
them into permanent exhibitions, because it is difficult for museums to manage them.
On the other hand, some installed devices for museums have also been presented. We
can cite the Virtual Showcase [7] as an example, which overlays images on the exhibit
with a half mirror and allows multiple users to observe and interact with augmented
content in the display. It can explain the background information using the real exhibit,
but at the same time its exhibitions are somewhat constrained because it cannot move the
exhibit.

2.2 Display Device of 3D Model Data

There is also other related work, especially on display devices for 3D model data
and interaction with them.
Many studies have been conducted on 3D displays, and today we can easily obtain a
glasses-based 3D display system with a display of conventional shape.
Here we focus on volumetric displays, considering the shape of current free-standing
display cases. The Seelinder [8] is a 3D display based on the rotation of a
cylindrical parallax barrier and an LED light source array. This system allows
multiple viewers to see, from any position, the appropriate images corresponding to their
positions, but it can show the appropriate images only for horizontal motion and
does not support vertical motion. On the other hand, there are some displays [9][10]
which can show the appropriate images for vertical motion using the two-axis rotation
of a mirror. However, they can only display small images with low resolution and low
contrast.
There are also some studies on systems consisting of a 3D display and
interaction devices. We can take MEDIA3 [11] or
gCubik [12] as examples. However, for story-telling in museums we
need more complex interaction than these systems realize.

3 Digital Display Case

In a previous paper [1], we constructed a prototype of the Digital Display Case (Fig. 2),
which realizes an exhibition using computer graphics (Fig. 3). With this prototype
we considered how to convey background information about exhibits, categorized into
synchronicity and diachronicity (Fig. 4), and created several exhibitions to convey it.

Fig. 2. Prototype of Digital Display Case

Fig. 3. Exhibition of virtual exhibits

Fig. 4. Exhibition to Convey Diachronicity

That prototype was sufficient to indicate the effectiveness of the Digital Display Case in
museums. However, it had relatively low compatibility with conventional display
cases, which is not suitable when the system is placed in an exhibition room. It also
required a Polhemus sensor to be worn by the user, which becomes a problem when we
want many museum visitors to experience the system.
Therefore, in this paper we aim to construct a display system for virtual exhibits
using CG that is more compatible with conventional display cases and that allows
museum visitors to appreciate virtual exhibits more easily.

3.1 Implementation of the System

We constructed the system shown in Fig. 5. In this system, we use 40-inch 3D
televisions as displays, arranging three of them into a box shape like a conventional
display case. In the previous prototype, we used a Polhemus sensor to measure the user's
viewpoint. In this system, we instead use an attached camera and range camera, and
measure the viewpoint without placing any special devices on the users.
The system is composed of two subsystems: one to detect and track the user's head and
the other to render the images on the displays. Fig. 6 shows the dataflow of the whole
system.

Fig. 5. Digital Display Case more compatible with conventional display cases

Fig. 6. Dataflow of the system



Fig. 7. Kinects placed at the top

Fig. 8. Capture image and depth around the system

3.2 Motion Parallax with Non-contact Tracking of User's Head

For tracking the user's head, we use the Kinect [13], which has a camera and a range camera
as sensors. We place three Kinects at the top of the system (Fig. 7), capture images, and
measure depth around the system (Fig. 8). We measure the position of the viewpoint based on
these data.
Fig. 9 shows how the user's viewpoint is measured. First, we use the depth image to
detect a user near the system and extract the user's area from the captured image. Within
this area, we detect the user's face and calculate its position in the image. Then we take
the average depth around that position and calculate the position of the viewpoint.
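The sketch below illustrates this per-Kinect pipeline under assumed parameters; the face detector (a stock OpenCV Haar cascade), the depth cut-off used to isolate a nearby user, the camera intrinsics and the use of a median as the "average depth" are all stand-ins, since the paper does not specify them.

```python
# Illustrative per-camera viewpoint estimation (assumptions noted above).
import cv2
import numpy as np

FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5     # assumed Kinect-like intrinsics
MAX_RANGE_MM = 2500                             # assumed "user near the case" cut-off
FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def viewpoint_from_frame(color_bgr, depth_mm):
    """Return the (x, y, z) viewpoint in camera space, or None if no face is found.
    The depth image is assumed to be registered to the color image."""
    user_mask = (depth_mm > 0) & (depth_mm < MAX_RANGE_MM)   # area of a nearby user
    masked = color_bgr.copy()
    masked[~user_mask] = 0                                   # keep only that area
    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
    faces = FACE_CASCADE.detectMultiScale(gray, 1.2, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])       # largest detected face
    valid = depth_mm[y:y + h, x:x + w]
    valid = valid[valid > 0]
    if valid.size == 0:
        return None
    z = float(np.median(valid))                              # robust depth around the face
    u, v = x + w / 2.0, y + h / 2.0
    return ((u - CX) * z / FX, (v - CY) * z / FY, z)         # back-project to 3D
```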

Fig. 9. Dataflow to get the position of view

The system then gathers the data from the three Kinects and selects the user nearest
to the system. To avoid confusion when two or more people are at the same
distance, priority is given to the candidate nearest to the position detected in the previous
detection.
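A small sketch of this selection rule follows; the tie threshold and the convention that viewpoints are expressed in a frame centred on the display case are assumptions for illustration.

```python
# Illustrative nearest-user selection with priority to the previously tracked user.
import math

def select_user(candidates, previous_position, tie_threshold_mm=150.0):
    """candidates: list of (x, y, z) viewpoints, with the case at the origin."""
    if not candidates:
        return previous_position            # nothing detected this frame
    def dist_to_case(p):
        return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    best = min(dist_to_case(p) for p in candidates)
    tied = [p for p in candidates if dist_to_case(p) - best < tie_threshold_mm]
    if previous_position is None or len(tied) == 1:
        return tied[0]
    return min(tied, key=lambda p: math.dist(p, previous_position))
```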

3.3 Discussion

Fig. 10 shows how the system works. The head tracking process works
effectively and realizes motion parallax with respect to the virtual display case,
without any sensors on the user. This is more suitable for museum exhibitions, because
visitors can appreciate virtual exhibits in the same way that we appreciate real exhibits,
without putting any sensors on them.

Fig. 10. Motion parallax without any sensor on a user

The system measures the viewpoint at 15 fps and enables users to move
through almost 180 degrees around the system while appreciating the virtual exhibit in it.
It can also select the appropriate user when several users are detected, so that this user
can continue his or her appreciation.
The speed of the head tracking is sufficient when the user walks around the system.
However, when the user moves very fast, the large gap between frames reduces the
smoothness of the motion parallax. Failures in detection also reduce this smoothness.
We therefore have to improve the processing speed of the head tracking and interpolate the
movement between detection frames. To do this, we plan to introduce
object tracking based on computer vision and to use a hybrid algorithm composed of
detection and tracking, to realize more effective head tracking processing.
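Purely as an illustration of how the gaps between 15 fps detections might be complemented (this is not the authors' planned hybrid detection-and-tracking algorithm), a simple exponential smoothing of the estimated viewpoint, holding the last estimate when detection fails, could look like the following; the smoothing factor is an assumed value.

```python
# Illustrative smoothing of the detected viewpoint between frames.
class ViewpointFilter:
    def __init__(self, alpha=0.4):
        self.alpha = alpha            # assumed smoothing factor
        self.state = None             # last smoothed (x, y, z)

    def update(self, detection):
        """detection: (x, y, z) from the tracker, or None when the face was not found."""
        if detection is None:
            return self.state         # hold the previous estimate on failure
        if self.state is None:
            self.state = tuple(detection)
        else:
            self.state = tuple(self.alpha * d + (1 - self.alpha) * s
                               for d, s in zip(detection, self.state))
        return self.state
```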
Although the system can track the user's head over a range sufficient to let the user move
around and appreciate the virtual exhibit, the virtual exhibits cannot be appreciated
from very close up or from below, because the face then goes out of the range the Kinects
can capture. To avoid this, we have to consider further the number and placement of the
Kinects based on users' behavior during their appreciation.
Selection of the appropriate user usually works well even when several faces or users are
captured. However, the system becomes confused when many people stand at the same
distance from the system. To solve this problem, we have to use more intelligent user
detection based on the depth data captured by the range cameras. We are also planning to
introduce some way of indicating who has priority in the appreciation, for
example a spotlight on that user, to avoid confusion among users about who currently
holds the priority in the head tracking.

4 Conclusion
In this paper, we constructed a Digital Display Case system that realizes museum
exhibitions with virtual exhibits rendered by computer graphics and that is designed to be
more compatible with conventional display cases. We constructed a vertically long system
composed of large displays. We also constructed a head tracking system using Kinects, which
realizes the appreciation of virtual exhibits from any point around the system without placing
any special devices or sensors on users, and which can be used more easily than the previous
prototype.
As future work, we have to improve the head tracking process as described
in Section 3.3 to realize more natural motion parallax. In addition, we are
now planning to introduce interaction into our system through the detection of users'
gestures with the Kinect, which we currently use only for measuring their position of
view.

Acknowledgments. This research is partly supported by the "Mixed Reality Digital
Museum" project of MEXT, Japan. The authors would like to thank all the members
of our project, especially Makoto Ando and Takafumi Watanabe from Toppan
Printing.

References
1. Kajinami, T., Hayashi, O., Narumi, T., Tanikawa, T., Hirose, M.: Digital Display Case:
Museum exhibition system to convey background information about exhibits. In:
Proceedings of Virtual Systems and Multimedia (VSMM) 2010, pp. 230–233 (October
2010)
2. Tanikawa, T., Ando, M., Yoshida, K., Kuzuoka, H., Hirose, M.: Virtual gallery talk in
museum exhibition. In: Proceedings of ICAT 2004, pp. 369–376 (2004)
3. Kuzuoka, H., Yamazaki, K., Yamazaki, A., Kosaka, J., Suga, Y., Heath, C.: Dual ecologies
of robot as communication media: thoughts on coordinating orientations and projectability.
In: CHI 2004: Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, pp. 183–190. ACM, New York (2004)
4. Kuzuoka, H., Kawaguchi, I.: Study on museum guide robot that draws visitors’ attentions.
In: Proceedings of ASIAGRAPH 2009 (October 2009)
5. Hiyama, A., Yamashita, J., Kuzuoka, H., Hirota, K., Hirose, M.: Position tracking using
infra-red signals for museum guiding system. In: Murakami, H., Nakashima, H., Tokuda,
H., Yasumura, M. (eds.) UCS 2004. LNCS, vol. 3598, pp. 49–61. Springer, Heidelberg
(2005)
6. Kondo, T., Manabe, M., Arita-Kikutani, H., Mishima, Y.: Practical uses of mixed reality
exhibition at the national museum of nature and science in Tokyo. In: Joint Virtual Reality
Conference of EGVE - ICAT -EuroVR (December 2009)
7. Bimber, O., Encarnacao, L.M., Schmalstieg, D.: The virtual showcase as a new platform
for augmented reality digital storytelling. In: Proceedings of the Workshop on Virtual
Environments 2003, vol. 39, pp. 87–95 (August 2003)
8. Yendo, T., Kawakami, N., Tachi, S.: Seelinder: The cylindrical lightfield display. In:
SIGGRAPH 2005 E-tech (2005)

9. Doyama, Y., Tanikawa, T., Tagawa, K., Hirota, K., Hirose, M.: Cagra: Occlusion-capable
automultiscopic 3d display with spherical coverage. In: Proceedings of ICAT 2008, pp.
36–42 (2008)
10. Jones, A., McDowall, I., Yamada, H., Bolas, M., Debevec, P.: Rendering for an interactive
360deg light field display. In: ACM SIGGRAPH (2007)
11. Kawakami, N., Inami, M., Maeda, T., Tachi, S.: Proposal for the object-oriented display:
The design and implementation of the media3. In: Proceedings of ICAT 1997, pp. 57–62
(1997)
12. Lopez-Gulliver, R., Yoshida, S., Yano, S., Inoue, N.: gCubik: A cubic autostereoscopic display
for multiuser interaction - grasp and groupshare virtual images. In: ACM SIGGRAPH
2008 Poster (2008)
13. Kinect, http://www.xbox.com/en-US/Kinect
Meta Cookie+: An Illusion-Based Gustatory Display

Takuji Narumi1, Shinya Nishizaka2, Takashi Kajinami2,


Tomohiro Tanikawa2, and Michitaka Hirose2
1
Graduate School of Engineering, The University of Tokyo / JSPS
7-3-1 Hongo Bunkyo-ku, Tokyo Japan
2
Graduate School of Information Science and Technology, The University of Tokyo
7-3-1 Hongo Bunkyo-ku, Tokyo Japan
narumi@cyber.t.u-tokyo.ac.jp,
{nshinya,kaji,tani,hirose}@cyber.t.u-tokyo.ac.jp

Abstract. In this paper, we propose the illusion-based "pseudo-gustation"
method to change the perceived taste of a food while it is being eaten by changing its
appearance and scent with augmented reality technology. We aim at utilizing the
influence between modalities for realizing a "pseudo-gustatory" system that
enables the user to experience various tastes without changing the chemical
composition of foods. Based on this concept, we built a "Meta Cookie+" system
to change the perceived taste of a cookie by overlaying visual and olfactory
information onto a real cookie. We performed an experiment that investigates
how people experience the flavor of a plain cookie by using our system. The
result suggests that our system can change the perceived taste based on the
effect of the cross-modal interaction of vision, olfaction and gustation.

Keywords: Illusion-based Virtual Reality, Gustatory Display, Pseudo-


gustation, Cross-modal Integration, Augmented Reality.

1 Introduction
Because it has recently become easy to manipulate various kinds of multimodal
information by using a computer, many research projects have used computer-
generated virtual reality for studying the input and output of haptic and olfactory
information in order to realize more realistic applications [1]. However,
few of these studies have dealt with gustatory information, and there have been few
display systems presenting gustatory information [2, 3].
This scarcity of research on gustatory information has several reasons. One
reason is that taste sensation is based on chemical signals, whose functions have not
yet been fully understood. Another reason is that taste sensation is affected by other
factors such as vision, olfaction, thermal sensation, and memory. Thus, as described
above, the complexity of the cognitive mechanism of gustatory sensation makes it
difficult to build a gustatory display that is able to present a wide variety of
gustatory information.
Our hypothesis is that the complexity of the gustatory system can be applied to the
realization of a pseudo-gustatory display that presents the desired flavors by means of

R. Shumaker (Ed.): Virtual and Mixed Reality, Part I, HCII 2011, LNCS 6773, pp. 260–269, 2011.
© Springer-Verlag Berlin Heidelberg 2011

a perceptual illusion. The cases we have in mind are ones in which what is sensed with
one modality affects what is experienced in another. For example, the ventriloquist
effect involves an illusory experience of the location of a sound that is produced by
the sound's apparent visible source. The effect is neither inferential nor cognitive, but
results from cross-modal perceptual interactions. Cross-modal interactions, however,
are not limited to vision's impact upon the experience through other sense modalities.
By using this illusionary effect, we may induce people to experience different flavors
when they taste the same chemical substance. Therefore, in order to realize a novel
gustatory display system, we aim to establish a method for eliciting and utilizing a
cross-modal interaction.
In this paper, we propose a method to change the perceived taste of a cookie when
it is being eaten based on an illusion evoked by changing its appearance and scent by
employing augmented reality technology. We then report a "Meta Cookie" system,
which implements the proposed method, and the results of an experiment that
investigates how people experience the flavor of a plain cookie by using our system.

2 Cross-Modal Interactions Underlying the Perception of Flavor

Fundamental tastes are considered the basis for the presentation of various tastes
similar to how basic colors such as RGB are used as the basis for visual systems.
According to physiological definitions, taste has the status of a minor sense, as the
channel of only a limited number of sensations: sweetness, sourness, bitterness,
saltiness, and umami [4].
What is commonly called taste signifies a perceptual experience that involves the
integration of various sensations. When we use the common word "flavor" in place of
taste, then, we are again referring to what is a quite multi-faceted sensation. In fact,
the International Standards Organization has defined flavor as a complex combination
of the olfactory, gustatory, and trigeminal sensations perceived during tasting [5].
Auvray et al. reviewed the literature on multisensory interactions underlying the
perception of flavor; they concluded that flavor is not defined
as a separate sensory modality but as a perceptual modality that is unified by the act
of eating, and that the term should be used to describe the combination of taste, smell,
touch, visual cues, auditory cues, and the trigeminal system [6]. These definitions
suggest that it is possible to change the flavor that people experience from foods by
changing the feedback they receive through modalities other than the sense of taste.
While it is difficult to present various tastes through a change in chemical substances,
it is possible to induce people to experience various flavors without changing the
chemical ingredients but by changing only the other sensory information that they
experience.
Of all the senses, smell is the most closely related to our perception of taste.
This relationship between gustatory and olfactory sensations is commonly
known, as illustrated by our pinching our nostrils when we eat food that we find
displeasing. Indeed, it has been reported that most of what people commonly think of
as the taste of food actually originates from the nose [7]. Furthermore, another set of
studies on taste enhancement has provided strong support for the ability of odors

to modify taste qualities [8]. These studies indicate the possibility of changing the
flavor that people experience with foods by changing the scent.
Conversely, under many conditions, it is well known that humans have a robust
tendency to rely upon vision more than other senses. Several studies have explored
the effect of visual stimuli on our perception of flavor. For instance, taste and flavor
intensity have been shown to increase as the color level in a solution increases [9].
However, Spence et al. state that the empirical evidence regarding the role that food
coloring plays in the perception of the intensity of a particular flavor or taste
(as reported by many researchers over the last 50 years) is rather
ambiguous, even though food coloring most certainly influences people's flavor
identification responses [10]. Their survey suggests the possibility of changing the
flavor identification by changing the appearance of food.
Therefore, our research focuses on the technological application of the influence of
appearance and scent on flavor perception. We propose a method to change the
perceived taste of food by changing its appearance and scent. In this study, we use
cookies as an example application. This is because cookies have a wide variety of
appearances, scents, and tastes, while at the same time almost all cookies are similar
in texture and shape. Thus, we have developed a system to overlay the appearance
and scent of a flavored cookie on a plain cookie to let users experience eating a
flavored cookie although they are just eating a plain cookie.

3 Pseudo-gustatory Display: MetaCookie+

We developed a system, which we have named "MetaCookie+," to change the


perceived taste of a cookie by overlaying visual and olfactory information onto a real
cookie with a special AR marker pattern.

3.1 System Overview

"MetaCookie+" (Fig. 1) comprises four components: a marker-pattern-printed plain


cookie, a cookie detection unit based on an Edible Marker System, a visual
information overlay unit, and an olfactory display. Fig. 2 illustrates the system
configuration of "MetaCookie+." Each component is discussed in more detail in the
following sections.
In this system, a user wears a head-mounted visual and olfactory display system.
The cookie detection unit detects the marker-pattern-printed cookie and calculates the
state (6DOF coordinate/occlusion/division/distance between the cookie and the nose
of a user) of the cookie in real time. Based on the calculated state, an image of a
flavored cookie is overlaid onto the cookie. Moreover, the olfactory display generates
the scent of a flavored cookie with an intensity that is determined based on the
calculated distance between the cookie and the nose of the user. The user can choose
the type of cookie that s/he wants to eat from multiple options. The appearance and scent of
the selected flavored cookie are then overlaid onto the plain cookie.

Fig. 1. MetaCookie+

Fig. 2. System configuration of "MetaCookie+"

3.2 Edible Marker System

For taste augmentation, interaction with food is necessary. Because the food will be
eaten or divided, a method to detect occlusion and division is required.
However, conventional object-detection methods assume that a target object is
continuous. When the object is divided into pieces, tracking will fail because the
feature points of only a single piece are recognized as the target object whereas other
feature points are regarded as outliers. Despite the importance of division as one of
the state changes of an object, it has not been studied from the viewpoint of object
detection for AR applications.
Therefore, we proposed the "Edible Marker" system, which not only estimates the
6DOF coordinate of the AR marker, but also detects its occlusion and division. We
then applied this system to "MetaCookie+." Fig. 3 shows the processing steps of the
occlusion- and division-detectable "Edible Marker" system. The Edible Marker
System estimates the 6DOF coordinate, occlusion, and division of a marker in three
steps: Marker Detection, Background Subtraction and Superimposition.
In step 1 (Marker Detection), the natural feature points are detected from the
captured image and the marker position is extracted from an estimated homography
matrix. Subsequently, the projected image of the marker area in the captured image
can be obtained. For implementation, we used Fern [11] as the natural feature
descriptor and classifier.
In a conventional planar-object detection method, a homography matrix is
estimated from the correspondence of the feature points between a prepared template
image and an image captured by the user's camera. An accurate homography matrix is
estimated by calculating its elements using the least squares method after outlier
elimination. If we simply apply the conventional method to a divided planar object,
only one out of all the pieces of the object is detected as an inlier; therefore, the other
pieces cannot be detected as parts of the object. To detect all pieces of the divided
object, another method is required. The proposed method detects the pieces of the
divided object by iteratively applying PROSAC [12]. A database of the target object's
feature points is prepared in advance. In each estimation process, the inlier points are
deleted from the database. Next, estimation is performed using the updated database.
Subsequently, this method detects the inliers for each piece of the target object. The
iteration stops when the homography matrix calculation fails. After these processes,
the projected image of each piece in the captured image is obtained.
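As a concrete illustration of the iterative estimation described above, the following minimal Python sketch detects the pieces of a divided planar marker by repeatedly estimating a homography and deleting the inliers from the match set. It assumes OpenCV, uses ORB features in place of the Ferns classifier, and uses RANSAC (cv2.findHomography) as a stand-in for the PROSAC step, so it is a sketch of the idea rather than the authors' implementation.

    # Iterative homography estimation: one homography per detected piece.
    # Assumption: OpenCV; ORB stands in for Ferns, RANSAC stands in for PROSAC.
    import cv2
    import numpy as np

    def detect_pieces(template_img, frame_img, min_matches=10):
        orb = cv2.ORB_create()
        kp_t, des_t = orb.detectAndCompute(template_img, None)
        kp_f, des_f = orb.detectAndCompute(frame_img, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = list(matcher.match(des_t, des_f))

        homographies = []
        while len(matches) >= min_matches:
            src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
            dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
            H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
            if H is None or int(mask.sum()) < min_matches:
                break                      # homography estimation failed: stop iterating
            homographies.append(H)         # one homography per detected piece
            # delete the inlier correspondences and re-estimate on the remainder
            matches = [m for m, inlier in zip(matches, mask.ravel()) if not inlier]
        return homographies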
In step 2 (Background Subtraction), a difference image is obtained by background
subtraction. We implemented background subtraction based on the method shown in
[13]. The image to the left of the middle panel in Fig. 3 is the difference between the
template image and the projected image. Combining the temporary result and the
mask image for superimposition, the final result is obtained.
In step 3 (Superimposition), the image to the left of the bottom panel in Fig. 3 is
overlaid using the final result obtained in step 2. The result of the superimposition is
shown in the image to the right.
Fig. 4 shows that the cookie can still be recognized by the proposed object-detection
system even if it is eaten, partially occluded, or divided. Furthermore, the area on which
the image should be overlaid can be detected by the background subtraction method. The
marker can cope with occlusion or division affecting more than half of the entire cookie.

Fig. 3. Processing steps of the occlusion-and-division-detectable "Edible Marker" system and
realistic superimposition based on the detection

3.3 Pattern Printed Cookie

We made a plain cookie detectable by a camera for this system by printing a pattern
on it with a food printer. We use MasterMind's food printer "Couver" [14], which
is a commercial off-the-shelf product. The printer jets colored edible ink to create
a printed image on a flat surface of food.

3.4 Cookie Detection and Overlaying an Appearance

The Cookie Detection unit based on the Edible Marker system can obtain the 6DOF
coordinate of the cookie and the distance between the cookie and the camera. In the
Cookie Detection phase, two cameras are used in parallel. We used two Logicool
Webcam Pro 9000 cameras (angle of view: 76°) in this implementation. The layout of
the cameras, a head-mounted display (HMD), and an olfactory display is shown in
Fig. 4. The range of the cameras is also illustrated in this figure. The two cameras are
positioned to eliminate blind spots between the user's hands and mouth, in order to
track the cookie from the time at which a user holds it to the time at which s/he puts it
in her/his mouth.

Fig. 4. Layout of two cameras, a head-mounted display and an olfactory display

Camera 1 in Fig. 6 is for overlaying the appearance of another cookie on the
marked cookie and for deciding the strength of the produced smell. The relationship of
the distance between the cookie and the camera to the strength of the produced smell
is discussed below. This camera and an HMD are used for video see-through. The
HMD displays an image of several types of cookies on the pattern-printed cookie
based on the estimated position and detected occlusion/division. This visual effect
allows users to experience eating a selected cookie while merely eating a plain
cookie.
Another camera (Camera 2 in Fig. 6) is positioned in front of the user's nose and
oriented downward in order to detect when the user eats a cookie, because the area
near the user's mouth is outside the first camera's field of view. When the second
camera detects a cookie in front of the user's mouth (within 15 cm from the camera),
the system recognizes that the user is about to put a cookie in her/his mouth.

3.5 Olfactory Display

We use an air-pump-type head-mounted olfactory display (Fig. 5) to produce the
scent of the selected cookie. The olfactory display comprises six air pumps, a
controller, and scented filters. One pump sends fresh air and five pumps send
scented air. Each pump for scented air is connected to a scent filter filled with
aromatic chemicals, so the display can eject fresh air and five types of scented air.
The scent filters add scents to the air from the pumps, and the scented air is ejected
near the user's nose. The strength of these scents can be adjusted to 127 different
levels. By mixing fresh air and scented air, the olfactory display generates an odor
at an arbitrary level while keeping the total air volume constant, so users are unable
to feel any change in air volume when the strength of the generated odor changes.
The controller drives the air pumps according to the position of the pattern-printed
plain cookie: the nearer the marked cookie is to the user's nose, the stronger the scent
ejected from the olfactory display. The response time for generating an arbitrary odor
is less than 50 ms, which is quick enough for users to experience the change of smell
in synchronization with the change of visual information.

Fig. 5. Air-pump type head-mounted olfactory display

"MetaCookie+" generates two patterns of olfactory stimuli for simulating


orthonasal and retronasal olfaction. One pattern simulates orthonasal olfaction and
functions after the user holds the pattern-printed cookie and before s/he brings it near
her/his mouth. In this pattern, the controller drives the air pumps according to the
position of the pattern-printed plain cookie. The nearer the pattern-printed cookie is to
the user's nose, the stronger the scent ejected from the olfactory display. The olfactory
display is activated when the cookie is detected within 50 cm from camera 1. The
value of 50 cm is determined based on the average distance between the cameras and
a 70 cm-high desk along the line of sight when the user sits on a chair in front of the
desk. The strength of the smell produced by the olfactory display is zero when the
distance is 50 cm and strongest when the distance is 0 cm. The output is controlled
linearly within 50 cm from camera 1.
Another pattern simulates retronasal olfaction and functions after the system
recognizes that the user is about to put a cookie in her/his mouth with camera 2. When
camera 2 detects a cookie in front of the user's mouth, the system produces the
strongest smell from the olfactory display for 30 s. We determined the period to be
longer than the time to finish eating a bite of the cookie.
This olfactory information evokes a cross-modal effect between olfaction and
gustation, and enables users to feel that they are eating a flavored cookie although
they are just eating a plain cookie.
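The control logic of the two stimulation patterns can be summarized in the following minimal Python sketch. The constants follow the text (127 intensity levels, linear control within 50 cm of camera 1, and a 30 s strongest-scent burst once camera 2 detects the cookie within 15 cm); drive_pumps() is a hypothetical hardware call, so this is an illustration rather than the authors' code.

    # Sketch of the two olfactory stimulation patterns described above.
    import time

    MAX_LEVEL = 127          # scent intensity levels of the olfactory display
    ORTHO_RANGE_CM = 50.0    # activation range in front of camera 1
    RETRO_TRIGGER_CM = 15.0  # mouth-proximity threshold for camera 2
    RETRO_DURATION_S = 30.0  # duration of the strongest-scent burst

    def orthonasal_level(distance_cm: float) -> int:
        """Linear mapping: level 0 at 50 cm, strongest (127) at 0 cm."""
        if distance_cm >= ORTHO_RANGE_CM:
            return 0
        return round(MAX_LEVEL * (1.0 - distance_cm / ORTHO_RANGE_CM))

    def retronasal_burst(drive_pumps) -> None:
        """Eject the strongest scent for 30 s after an eating event is detected."""
        drive_pumps(MAX_LEVEL)
        time.sleep(RETRO_DURATION_S)
        drive_pumps(0)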

4 Evaluation
In order to evaluate the effectiveness of our proposed method for inducing people to
experience various flavors, we conducted an experiment to investigate how people
experience flavor in a cookie by using the "Meta Cookie" system. The purpose of this
experiment was to examine the cross-modal effect of visual stimuli and olfactory
stimuli on gustation. We investigated how the participants would perceive and
identify the taste of the cookie under three conditions: visual augmentation only,
olfactory augmentation only, and combined visual and olfactory augmentation. We
prepared two types of appearances and scents of commercially available cookies:
chocolate and tea. We examined how the participants experience and identify the
taste of a plain cookie with these appearances and scents overlaid using our system.

4.1 Experimental Protocol

The combinations of scent and appearance used in the experiment to represent
flavored cookies under each experimental condition are illustrated in Fig. 6. There
are 7 combinations: without augmentation, visual augmentation (chocolate), visual
augmentation (tea), olfactory augmentation (chocolate), olfactory augmentation (tea),
visual and olfactory augmentation (chocolate), and visual and olfactory augmentation
(tea). We captured images of a chocolate cookie and a tea cookie, and used these to
overlay onto real cookies. The experiment was conducted with 15 participants. The
participants had never received training in the anatomy of tastes, and we did not
inform them that our system aimed at changing the perceived taste until they had
finished the experiment.

Fig. 6. Experimental conditions

After subjects had eaten the plain cookie and a cookie augmented in one of the
seven experimental conditions, they were asked to compare the latter with the plain
cookie and to plot their experience of the taste on plotting paper. The plotting
paper had two scales from -4 to 4: one for sweetness and one for bitterness. We
defined the origin (0) of the scale as the taste of the plain cookie. Moreover, they were
asked to write down the taste they identified from the cookie. Subjects repeated these
steps 7 times. To eliminate any effect of the order in which the cookies were eaten, the
order was randomly assigned by the experimenters. In addition, subjects drank water
in the intervals between eating the cookies.

4.2 Result

Fig. 7 illustrates the results of this experiment. When the participants ate the
olfactory-augmented cookie, they experienced a change in the cookie's taste in 80% of
the trials. Moreover, when the participants ate a cookie with visual stimuli (chocolate)
and olfactory stimuli (chocolate), they identified it as a chocolate cookie in 67% of the
trials, and when they ate a cookie with visual stimuli (tea) and olfactory stimuli (tea),
they identified it as a tea cookie in 80% of the trials. In contrast, when the participants
ate a cookie with only olfactory stimuli (chocolate), they identified it as a chocolate
cookie in 47% of the trials, and with only olfactory stimuli (tea), they identified it as a
tea cookie in 67% of the trials. The identification rates under the olfactory-only
conditions are thus lower than those under the combined visual and olfactory
conditions.

Fig. 7. The cross-modal effect of visual stimuli, olfactory stimuli and visual & olfactory stimuli
on perception and identification of taste

4.3 Discussion

The results suggest that olfactory stimuli play an important role in the perception of
taste, while also suggesting that olfactory stimuli alone cannot sufficiently change the
identification of the taste without the help of visual stimuli. Together, these findings
indicate that cross-modal integration among vision, olfaction, and gustation plays an
important role in a pseudo-gustatory system, and that our system can change the
perceived taste and let users experience various flavors, without changing the
chemical composition, by changing only the visual and olfactory information.

5 Conclusion
In this study, we proposed a "Pseudo-gustation" method to change the perceived taste
of a cookie while it is being eaten by changing its appearance and scent with
augmented reality technology. We built a "Meta Cookie" system based on the effect
of the cross-modal integration of vision, olfaction, and gustation as an implementation
of the proposed method. We performed an experiment to investigate how people
experience the flavor of a plain cookie when using our system. The results of the
experiment suggest that our system can change the perceived taste.
Because our system can shift the flavor of nutritionally controlled foods from
distasteful or tasteless to tasty or desired, we believe that it can be used for food
prepared in hospitals and in diet food applications. Moreover, we believe we can build
an expressive gustatory display system by combining this pseudo-gustation method
based on cross-modal integration and methods for synthesizing a rough taste from
fundamental taste substances. By doing so, we can realize a gustatory display, which
is able to display a wide variety of tastes.

Acknowledgement. This research was partially supported by MEXT, Grant-in-Aid
for Young Scientists (A), 21680011, 2009.

References
1. Nakamoto, T., Minh, H.P.D.: Improvement of olfactory display using solenoid valves. In:
Proc. of IEEE VR 2007, pp. 179–186 (2007)
2. Iwata, H., Yano, H., Uemura, T., Moriya, T.: Food Simulator: A Haptic Interface for
Biting. In: Proc. of IEEE VR 2004, pp. 51–57 (2004)
3. Maynes-Aminzade, D.: Edible Bits: Seamless Interfaces between People, Data and Food.
In: ACM CHI 2005 Extended Abstracts, pp. 2207–2210 (2005)
4. Delwiche, J.: The impact of perceptual interactions on perceived flavor. Food Qual.
Prefer. 15, 137–146 (2004)
5. Chandrashekar, J., Hoon, M.A., Ryba, N.K., Zuker, C.S.: The receptors and cells for
mammalian taste. Nature 444, 288–294 (2006)
6. Auvray, M., Spence, C.: The multisensory perception of flavor. Consciousness and
Cognition 17, 1016–1031 (2008)
7. Rozin, P.: Taste-smell confusion and the duality of the olfactory sense. Perception and
Psychophysics 31, 397–401 (1982)
8. Stevenson, R.J., Prescott, J., Boakes, R.A.: Confusing Tastes and Smells: How Odours can
Influence the Perception of Sweet and Sour Tastes. Chem. Senses 24(6), 627–635 (1999)
9. Zampini, M., Wantling, E., Phillips, N., Spence, C.: Multisensory flavor perception:
Assessing the influence of fruit acids and color cues on the perception of fruit-flavored
beverages. Food Quality & Preference 18, 335–343 (2008)
10. Spence, C., Levitan, C., Shankar, M., Zampini, M.: Does Food Color Influence Taste and
Flavor Perception in Humans? Chemosensory Perception 3(1), 68–84 (2010)
11. Ozuysal, M., Calonder, M., Lepetit, V., Fua, P.: Fast keypoint recognition using random
ferns. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 448–461 (2010)
12. Chum, O., Matas, J.: Matching with prosac-progressive sample consensus. In: Proc. of
CVPR 2005, vol. 1, pp. 220–226. IEEE, Los Alamitos (2005)
13. Li, L., Huang, W., Gu, I., Tian, Q.: Foreground object detection from videos containing
complex background. In: Proceedings of the Eleventh ACM International Conference on
Multimedia, p. 10. ACM, New York (2003)
14. Master Mind, Food Printer "Couver",
http://www.begin.co.jp/goods/11_plotter/main_01.htm
LIS3D: Low-Cost 6DOF Laser Interaction for
Outdoor Mixed Reality

Pedro Santos1, Hendrik Schmedt1, Bernd Amend1, Philip Hammer2, Ronny Giera3,
Elke Hergenröther4, and André Stork5
1 Fraunhofer-IGD, A2, Germany
2 Deck13 Interactive GmbH, Germany
3 Semantis Information Builders GmbH, Germany
4 University of Applied Sciences Darmstadt, Germany
5 Technical University of Darmstadt, Germany
{Pedro.Santos,Hendrik.Schmedt,Bernd.Amend}@igd.fhg.de,
Philip.Hammer@deck13.com,
rgiera@semantis-ib.de, e.hergenroether@fbi.h-da.de,
Andre.Stork@igd.fhg.de

Abstract. This paper introduces a new low-cost, laser-based 6DOF interaction
technology for outdoor mixed reality applications. It can be used in a variety of
outdoor mixed reality scenarios for making 3D annotations or correctly placing
3D virtual content anywhere in the real world. In addition, it can also be used
with virtual back-projection displays for scene navigation purposes. Applications
can range from design review in the architecture domain to cultural heritage
experiences on location. Previous laser-based interaction techniques only yielded
2D or 3D intersection coordinates of the laser beam with a real world object. The
main contribution of our solution is that we are able to reconstruct the full pose
of an area targeted by our laser device in relation to the user. In practice, this
means that our device can be used to navigate any scene in 6DOF. Moreover, we
can place any virtual object or any 3D annotation anywhere in a scene, so it
correctly matches the user’s perspective.

1 Introduction
Why should research be conducted in the area of mixed reality technologies, and
what is their benefit for human-machine interaction? The main reason is that a
human being will best interact with a machine in his known and familiar environment
using objects that are also known and familiar to him.
The ultimate goal of mixed reality applications is therefore to make the transition
between virtual and real content appear seamless to the user and interaction easy to
handle.
However, to reach that ultimate goal many supporting technologies need to either
be invented from scratch or further be developed and enhanced, many of which are
vision based, because no other human sense offers so much available bandwidth for
information transfer.


The applications of mixed reality technologies are wide-spread. In the domain of
industrial applications, mixed reality technologies are used more and more along the
production chain. Prominent examples feature the design and modelling stages of new
prototypes, but also training and maintenance scenarios on the final products.
Prototypes built in product design stages are increasingly virtual and no longer
physical mock-ups until the final design is produced. Mixed reality technologies
allow for seamless visualization of models in real environments under correct lighting
conditions.
In the automotive industry technologies such as autonomous vision systems for
robots lay the foundations for advanced driver assistance systems which super-impose
context and back-up path planning in the driver’s field of view.
Using mixed reality for customer care concerning household appliances greatly
reduces the need for an expert on location and increases safe handling of simple
repairs or parts replacement by the customers themselves.
In job-training scenarios mixed reality applications can make learning schedules
more flexible, while guaranteeing constant levels of quality education.
The cultural heritage domain benefits from mixed reality technologies that enable
3D reconstruction of an ancient piece of art, based on its fragments found at an
archaeological site. In many cases even whole buildings or premises can be visualized
in mixed reality on location in the way they would have been many years ago.
Tourism benefits from mixed reality that enables visitors to explore a city without
previous knowledge of its topography, with points of interest super-imposed in their
view, while pose estimation technologies are used for navigation.
In conclusion, mixed reality and its supporting technologies bring benefit to a wide
range of domains simplifying human-machine interaction. The user’s attention is no
longer diverted to other sorts of input or output devices when virtual content is
directly super-imposed in his view. In addition the visual quality of virtual content has
improved so much it seamlessly blends in with reality. Mixed reality makes complex
tasks easier to handle.
To correctly visualize a mixed reality scene requires good pose estimation
technology, able to calculate the 6DOF position and orientation of the user or his
display, in case of using a head mounted see-through device. Knowing his pose
allows virtual content to accurately be super-imposed in his field-of-view.
But what if the user wants to directly interact with the scene and place a 3D virtual
object of his own next to a real building on a town square? What if the user wants to
add a 3D annotation to a monument or a location of special interest? What if he wants
to move virtual content around a real scene when designing his new home?
Our new low-cost, laser-based 6DOF interaction technology offers a very simple
way of answering those needs.
Instead of using a single laser-pointer, we project a specific laser pattern on any flat
surface and dynamically reconstruct its pose from its given projection on that surface.

2 Setup and Calibration

The basic idea is to track a projected pattern consisting of five laser points on a planar
surface and infer the pattern’s pose from a camera view on the pattern. The system is
intended to be used in mixed reality where the user wears a camera attached to his
head-mounted display and projects a pattern onto any planar surface of his choice.
To develop and test this idea we have used a stereo back-projection system first,
featuring a webcam capturing projected laser patterns on the screen, and we have built
a low-cost interaction device out of five small, off-the-shelf laser pointers, which
project a cross-hair pattern on any target area (Figure 1).

Fig. 1. Projected pattern

Later, the goal will be to test it outdoors in mixed reality environments, allowing a
user to interact with reality and place, move or modify virtual content anywhere on-
the-fly while looking at it with a laser pattern generator connected to his head-
mounted display, letting the mixed reality rendering system correctly super-impose
the virtual content in his view.
Screen Setup. The preliminary test environment for the device and its corresponding
algorithms is composed of a back-projection stereo wall with two projectors for
passive stereo projection and a webcam with a daylight-blocking filter to better
resolve the red laser beams when a pattern is projected on the projection screen,
resulting in an outside-in tracking setup for the interaction device (Figure 2).

Fig. 2. Outside-in tracking setup and camera view of projected patterns

Interaction Device. The interaction device consists of five laser pointers which are
mounted on an aluminium chassis, so that each pointer can individually be aligned.
Having flexible mount points was useful to determine the best possible projection
pyramid for the laser lights depending on the distance to the projection target which in
LIS3D: Low-Cost 6DOF Laser Interaction for Outdoor Mixed Reality 273

Fig. 3. Laser Interaction Device

our case was the back-projection display. We have used 5 mW Lasers with a
wavelength of 635mm-670mm. All laser pointers are fed by a single battery pack and
are switched on and off simultaneously (Figure 3).

Calibration of the Setup. To properly use the interaction device we calibrated both
the camera that films and tracks the projection of the laser pattern on a surface (the
back-projection screen) and the laser-set itself.
Concerning the camera calibration we applied a radial and perspective distortion
correction.
For the radial correction we use algorithms as stated in [1][2] and implemented in
OpenCV, which take into account that real lenses also have a small tangential
distortion. A checkered calibration pattern is used for that purpose.
The perspective correction is needed, because the camera is not perpendicular to
the back-projection screen. Therefore recorded video footage of any pattern projected
on top of the screen would always feature an additional distortion due to the camera
angle it is taken from.
To compensate for this in our calculation, we project the checkered pattern previously
used for radial distortion correction on the screen and identify its four corners, which are
then matched against the camera image plane to define a homography from the distorted
input to the perspective-corrected picture we want to use for tracking, where the edges of
the calibration pattern match the edges of the camera image plane (Figure 4).
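The perspective correction can be illustrated with the following minimal OpenCV sketch, which maps the four marked corners of the projected calibration pattern to the corners of the camera image plane; the corner ordering and coordinates are assumptions, and this is not the authors' code.

    # Homography from the 4 marked pattern corners to the image corners.
    import cv2
    import numpy as np

    def perspective_correction(frame, marked_corners):
        """marked_corners: 4 (x, y) points in the order TL, TR, BR, BL."""
        h, w = frame.shape[:2]
        src = np.float32(marked_corners)
        dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
        H = cv2.getPerspectiveTransform(src, dst)          # homography from 4 correspondences
        return cv2.warpPerspective(frame, H, (w, h)), H    # corrected image and the homography

In the tracking pipeline described later, the same homography can also be applied to individual detected points (for example with cv2.perspectiveTransform) instead of warping the whole frame, which keeps the per-point correction fast.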
In practical terms, calibration of the back-projection setup including the tracking
camera is done by interactively marking the edges of the pattern to compute the radial
and the perspective distortion.

Fig. 4. Without and with perspective distortion correction

Calibration of the Laser-set. To calibrate the laser interaction device we have to
align all five lasers accordingly. This is achieved by a small tool that projects the
target pattern on the back-projection screen, so the laser lights have to match it while
the laser-set is mounted perpendicular to the screen. A configuration file is generated
which contains the established projection pyramid of the laser-set. This step is needed
to properly interpret the results of the laser tracking and to be able to identify
absolute values for position and orientation in relation to the target screen.

3 Laser Tracking
Once the setup is calibrated, the goal is to track the projected laser pattern
consisting of five points while avoiding or resolving ambiguities, in particular when
more than one interaction device is used, which means that projected laser
points need to be associated with the corresponding patterns generated by the
respective devices.
Our approach to solve this tracking problem consists of the following tracking
pipeline:
• Point recognition
• Point tracking
• Line recognition
• Pattern recognition
• Pose reconstruction
Point Recognition. The camera has a daylight blocking filter to enhance the effect of
the laser pointers. The first step of the pipeline is point acquisition. For this purpose
we initially converted incoming video footage to greyscale and compensated radial
and perspective distortion. However, the two latter operations took around 60 ms per
frame for a camera resolution of 1280x1024. Therefore we first identified relevant
feature points in a frame and then compensated for radial and perspective distortion
for those points only, so this takes less than 1 ms per point and we were able to
process many more feature points in real time. To identify a feature point in a frame,
we use a contour finder [2][4][5] and search for ellipsoids that fit matching criteria
with respect to their minimum and maximum sizes as well as the ratio of their
major and minor axes (Figure 4). We have to impose constraints because despite using
laser light, the contours of a projected laser beam on the screen are not a well-defined
ellipse, but only an approximation of it. Moreover, since the tracking camera is
behind the screen together with the stereo projectors, we also have to filter out the
projectors’ bright spots in the centre of the projection using an infrared filter. As an
alternative to the described approach we also implemented a method using adjacency
lists on connected points.
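The contour-based point recognition can be sketched as follows in Python with OpenCV 4; the threshold and the size/ratio limits are illustrative assumptions rather than the values used by the authors.

    # Find laser spots: contours whose fitted ellipse passes size/ratio criteria.
    import cv2

    def recognize_laser_points(gray_frame, min_axis=2, max_axis=30, max_ratio=3.0):
        _, binary = cv2.threshold(gray_frame, 200, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        points = []
        for c in contours:
            if len(c) < 5:                      # fitEllipse needs at least 5 contour points
                continue
            (cx, cy), axes, _ = cv2.fitEllipse(c)
            minor, major = sorted(axes)
            if minor < min_axis or major > max_axis:
                continue                        # reject blobs outside the size range
            if minor == 0 or major / minor > max_ratio:
                continue                        # reject elongated shapes (not laser spots)
            points.append((cx, cy))             # centre of an accepted laser spot
        return points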

Fig. 4. Two laser projections and the recognized points

Point Tracking. Point recognition outputs a list of detected feature points. To track
points from one frame to another we analyze the position, speed and direction of a
laser point. Those criteria are dependent on camera frame-rate, covered area, camera
resolution and the number of points tracked. For each feature point pair in one frame
we calculate a rating stating how similar those points are to each other. We do the
same for the subsequent frame and can then match the pairs to each other to identify
the previous and subsequent position of the same point. The similarity rating per point
is calculated based on the current position, the previous and current direction and the
speed and results in a number between 0 and 1.
For each new point we calculate its similarity rating to all points of the previous
frame.

For each point we take the best and second-best similarity rating and check whether
the best rating is sufficiently better than the second-best one; the "nearest
neighbourhood" factor specifies how much better the first choice must be than the
second-best choice. If we do not have a match, then we have a new point
(Figure 4).
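Since the rating formula itself did not survive extraction, the following Python sketch only illustrates the matching scheme under assumptions: a simple position/speed rating normalized to [0, 1] and the nearest-neighbourhood ratio test that decides between a match and a new point.

    # Frame-to-frame point matching with a ratio test (illustrative rating).
    import math

    def rating(new_pt, old_track, max_dist=100.0, max_speed=100.0):
        (nx, ny), (ox, oy) = new_pt, old_track["pos"]
        dist = math.hypot(nx - ox, ny - oy)
        pos_score = max(0.0, 1.0 - dist / max_dist)
        speed_score = max(0.0, 1.0 - abs(dist - old_track["speed"]) / max_speed)
        return 0.5 * pos_score + 0.5 * speed_score      # rating in [0, 1]

    def match_points(new_points, old_tracks, nn_factor=1.5, min_rating=0.2):
        matches = {}
        for i, p in enumerate(new_points):
            scores = sorted((rating(p, t), j) for j, t in enumerate(old_tracks))[::-1]
            if not scores:
                continue
            best = scores[0]
            second = scores[1] if len(scores) > 1 else (0.0, None)
            # accept only if clearly better than the runner-up ("nearest neighbourhood")
            if best[0] >= min_rating and best[0] >= nn_factor * second[0]:
                matches[i] = best[1]
            # otherwise point i is considered a new point
        return matches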

Line Recognition. To detect the laser pattern we now use Hough Transforms to
identify three points on a line. The difference from the common use of the Hough
Transform is that here we do not transform points of a line detection into an
accumulator, but only the already recognized points in the current frame [3],[6],[7].
We build a list of
potential candidates for lines (accumulator cells with more than k hits). Cells with 3
hits allow for a single line. Cells with 4 hits allow for 4 lines and cells with 5 hits
allow for 8 lines. Already recognized lines in a previous frame are found in a
subsequent frame if all previous points correspond to one of the new potential lines.
In that case the previous line is used.
Figure 5 shows the result of three projected laser-set patterns with three interaction
devices.
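The voting scheme can be sketched as follows; the quantization steps and the threshold k are illustrative assumptions. Each recognized point votes for quantized (theta, rho) line cells, and cells supported by at least k distinct points become line candidates.

    # Hough-style voting over already recognized points only.
    import math
    from collections import defaultdict

    def find_lines(points, k=3, theta_step=math.pi / 90, rho_step=5.0):
        accumulator = defaultdict(set)
        for i, (x, y) in enumerate(points):
            theta = 0.0
            while theta < math.pi:
                rho = x * math.cos(theta) + y * math.sin(theta)
                cell = (round(theta / theta_step), round(rho / rho_step))
                accumulator[cell].add(i)        # point i supports this line cell
                theta += theta_step
        # keep cells supported by at least k distinct points (candidate lines)
        return [pts for pts in accumulator.values() if len(pts) >= k]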

Fig. 5. Three projected laser-set patterns and possible line combinations

Pattern Recognition. To identify the proposed cross patterns we use a Greedy
algorithm which tries to find a generic solution to a problem by iteratively finding
local optima. For this purpose we treat the set of lines as an undirected graph
connecting the corresponding points and build a co-existence matrix which, for each
pair of points, stores their ability to co-exist. The solution of the problem is
represented by the maximum clique, meaning the biggest fully connected sub-graph
(Figure 6).
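A minimal sketch of such a greedy clique search on the co-existence matrix is shown below; it illustrates the general idea and is not the authors' implementation.

    # Greedy maximum-clique search: coexist[i][j] is True when points i and j
    # can belong to the same pattern (e.g. they share a recognized line).
    def greedy_max_clique(coexist):
        n = len(coexist)
        best = []
        for start in range(n):                  # try each point as a seed
            clique = [start]
            for cand in range(n):
                if cand == start:
                    continue
                # add cand only if it co-exists with every member so far (local optimum)
                if all(coexist[cand][m] for m in clique):
                    clique.append(cand)
            if len(clique) > len(best):
                best = clique
        return best                              # indices of the largest clique found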

Fig. 6. All possible patterns and final result

Pose Recognition. Finally the pose of each recognized pattern is reconstructed from
the delta transformation between previous and current frame. Each transformation is
around the pattern’s axis which in our case is the center point. Therefore rotations and
translations are applied around that point to alter the pose. A 2D translation can easily
be calculated by observing the shift of the center point from the previous to the current
frame. A 2D rotation can be calculated from the rotation around the center between the previous and
current frame. Yet to reconstruct the 3D pose in space we use an algorithm of our own
which we cannot disclose at this stage [11] or alternatively a variant of the algorithm
proposed by [2],[3] which iteratively computes the pose based on the knowledge of
the points and dimension of the pattern.

4 Results
We have successfully demonstrated the feasibility of a 6DOF laser-based interaction
technique requiring 20 ms per interaction pattern for pose reconstruction on regular
Intel Core 2 Duo 2.4 GHz hardware.
We have validated our results in outside-in and inside-out tracking scenarios.
The usage of a pattern-based laser interaction greatly simplifies human computer
interaction in mixed reality environments for the purpose of 3D annotations or virtual
content modification (Figure 7).

Fig. 7. Test applications being controlled by Laser interaction

References
1. Zhang, Z.: Flexible Camera Calibration By Viewing a Plane From Unknown Orientations.
IEEE, Los Alamitos (1999)
2. Santos, P., Stork, A., Buaes, A., Pereira, C.E., Jorge, J.: A Real-time Low-cost Marker-
based Multiple Camera Tracking Solution for Virtual Reality Applications. Journal of
Real-Time Image Processing 5(2), 121–128 (2010); First published as Online First,
November 11 (2009), DOI 10.1007/s11554-009-0138-9
3. Santos, P., Stork, A., Buaes, A., Jorge, J.: PTrack: Introducing a Novel Iterative Geometric
Pose Estimation for a Marker-based Single Camera Tracking System. In: Fröhlich, B.,
Bowman, D., Iwata, H. (eds.) Proceedings of Institute of Electrical and Electronics
Engineers (IEEE): IEEE Virtual Reality 2006, pp. 143–150. IEEE Computer Society, Los
Alamitos (2006)
4. Sukthankar, R., Stockton, R., Mullin, M.: Self-Calibrating Camera-Assisted Presentation
Interface. In: Proceedings of International Conference on Automation, Control, Robotics
and Computer Vision (2000)
5. Kurz, D., Hantsch, F., Grobe, M., Schiewe, A., Bimber, O.: Laser Pointer Tracking in
Projector-Augmented Architectural Environments. In: Proceedings of the 6th IEEE and
ACM International Symposium on Mixed and Augmented Reality, November 13-16, pp.
1–8. IEEE Computer Society, Washington, DC (2007),
http://dx.doi.org/10.1109/ISMAR.2007.4538820
6. Kim, N.W., Lee, S.J., Lee, B.G., Lee, J.J.: Vision based laser pointer interaction for
flexible screens. In: Jacko, J.A. (ed.) Proceedings of the 12th International Conference on
Human-Computer Interaction: Interaction Platforms and Techniques, Beijing, China.
LNCS, pp. 845–853. Springer, Heidelberg (2007)
7. Zhang, L., Shi, Y., Chen, B.: NALP: Navigating Assistant for Large Display Presentation
Using Laser Pointer. In: First International Conference on Advances in Computer-Human
Interaction, February 10-15, pp. 39–44 (2008), doi:10.1109/ACHI.2008.54
8. Santos, P., Schmedt, H., Hohmann, S., Stork, A.: The Hybrid Outdoor Tracking Extension
for the Daylight Blocker Display. In: Inakage, M. (ed.) ACM SIGGRAPH: Siggraph Asia
2009. Full Conference DVD-ROM, p. 1. ACM Press, New York (2009)
9. Santos, P., Gierlinger, T., Machui, O., Stork, A.: The Daylight Blocking Optical Stereo
See-through HMD. In: Proceedings: Immersive Projection Technologies / Emerging
Display Technologies Workshop, IPT/EDT 2008, p. 4. ACM, New York (2008)
10. Santos, P., Acri, D., Gierlinger, T., Schmedt, H.: Supporting Outdoor Mixed Reality
Applications for Architecture and Cultural Heritage. In: Khan, A. (ed.) The Society for
Modeling and Simulation International: 2010 Proceedings of the Symposium on
Simulation for Architecture and Urban Design, pp. 129–136 (2010)
11. Amend, B., Giera, R., Hammer, P., Schmedt, H.: LIS3D Laser Interaction System 3D,
System Development project internal report, Dept. Computer Science, University of
Applied Sciences Darmstadt, Germany (2008)
Olfactory Display Using Visual Feedback
Based on Olfactory Sensory Map

Tomohiro Tanikawa1, Aiko Nambu1,2, Takuji Narumi1,
Kunihiro Nishimura1, and Michitaka Hirose1
1 The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan
2 Japan Society for the Promotion of Science, 6 Ichibancho, Chiyoda-ku, Tokyo, 102-8471 Japan
{tani,aikonmb,narumi,kuni,hirose}@cyber.t.u-tokyo.ac.jp

Abstract. Olfactory sensation is based on chemical signals, whereas visual and
auditory sensations are based on physical signals. Therefore, existing olfactory
displays can only present a set of scents prepared beforehand, because a set of
"primary odors" has not been found. In our study, we focus on the development
of an olfactory display using cross modality which can represent more patterns
of scents than the patterns of scents prepared. We construct an olfactory sensory
map by asking subjects to smell various aroma chemicals and evaluate their
similarity. Based on the map, we selected a few aroma chemicals and
implemented a visual and olfactory display. We succeeded in generating various
smell sensations from only a few aromas, and found that aromas can be
substituted by pictures, with nearer aromas being drawn more strongly by the
pictures. Thus, we can reduce the number of aromas in olfactory displays using
the olfactory map.

Keywords: Olfactory display, Multimodal interface, Cross modality, Virtual
Reality.

1 Introduction
Research on olfactory displays is evolving into a medium of VR, as has research on
visual and auditory displays. However, there are some bottlenecks in olfactory
information presentation. Visual, auditory and haptic senses come from physical
signals, whereas olfactory and gustatory senses come from chemical signals.
Therefore, research on olfactory and gustatory information is not as advanced as
research on visual, auditory and haptic information.
Olfaction is the least understood of the five senses, and even the mechanism of
reception and recognition of smell substances is unknown. Thus, "primary odors",
which could represent all types of scents, are not established. This means that a policy
for mixing and presenting smell substances does not exist, and it is therefore difficult
to present various scents using olfactory displays. In addition, olfaction is more
unstable and variable than vision and audition. It is known that we can identify the
scents of daily materials only about fifty percent of the time; for example, only half of
people can answer "apple" when they sniff apples [1,2].


1.1 Olfactory Display

Olfactory displays can produce a highly realistic sensation that cannot be given by
vision or audition. We illustrate olfactory displays developed so far with the following
two examples.
"Let's cook curry" by Nakamoto et al. [3] is an olfactory display with interactive
aroma content, "a cooking game with smells". It presents smells of curry, meat,
onion and so on under the player's control. The "Wearable olfactory display" by
Yamada et al. [4] generates a pseudo olfactory field by changing the concentration of
several kinds of aroma chemicals using position information.
However, in each of these preceding studies, the displays produce only combinations
of prepared element odors (selected aroma chemicals), and therefore cannot represent
smells which do not belong to the element odors. This means that conventional
olfactory displays are limited in representing various smells.
In order to implement practicable olfactory displays which can produce a wider
variety of smells than before, it is necessary to reduce the number of element odors
and to produce the feeling of a wide range of smells from those few element odors.
We focused on the instability and variability of olfaction. Within this band of olfactory
fluctuation, there is a possibility that we can make people feel a smell different from
the presented smell by using certain techniques when a given smell material is
presented. If we are able to make people feel smells other than the actual element
odors, we can treat the element odors like "primary odors" and generate various
olfactory experiences from them.

2 Concept

2.1 Drawing Effect to Olfaction by Visual Stimuli

Olfaction has more ambiguity than vision or audition. For example, it is difficult to
identify the names of flowers or foods by scent alone, unlike by visual images [2].
Thus, olfaction is easily affected by knowledge of the smell and by other sensations [5][6].
In addition, olfactory sensation interacts with various other senses, especially
visual sensation. That is, it is thought that olfaction can be used for information
presentation more effectively through interaction with cues from senses other than
olfaction [7][8][9].
In this paper, we tried to generate various "pseudo olfactory experiences" by using
the cross-modal effect between vision and olfaction. When a visual stimulus which
contradicts the presented olfactory stimulus is shown, the visual stimulus influences
olfaction. By presenting an image conflicting with the produced smell, it is possible to
produce an olfactory sensation corresponding not to the smell actually generated but
to the visual information. We defined this cross-modal effect between vision and
olfaction as the "drawing effect" on olfaction by vision.
This drawing effect allows visual images to give a pseudo olfactory sensation that is
not actually presented. For example, Nambu et al. [10] suggest the possibility of
producing the scent of melons from the aroma of lemons, which is unrelated to the
scent of melons, by showing a picture of melons. In this case, the picture of melons
draws the aroma of lemons toward the pseudo scent of melons. To apply the drawing
effect in an olfactory display, we need an index of which conditions intensify the
drawing effect.

Fig. 1. Concept of Visual-olfactory Display

2.2 Olfactory Display Using Sensory Maps

It is thought that the strength of the drawing effect depends on the kind of scent
presented. That is, we expect the drawing effect to arise easily between smells with a
high degree of similarity, while it is not generated easily between scents with a low
degree of similarity. We therefore propose to use the degree of similarity of the smell,
or the distance between scents, as an index of the drawing effect on smell.
Methods for evaluating the degree of similarity of scents include methods based on
linguistic similarity and on the similarity of chemical characteristics [11]. However,
there is not yet a fundamental study on the degree of similarity of scents based on
human olfactory sensation. We therefore tried to construct a new olfactory map based
on smell evaluation, which approximates olfactory sensation more closely than past
olfactory maps.
The distance between scents can be evaluated more accurately and easily than
before by building the olfactory map on olfactory sensation itself. It then becomes
possible to exploit the drawing effect on olfaction by vision more efficiently if we can
prove that the closer the content of the olfactory source is to the content of the target
picture, the stronger the drawing effect.

3 Construction of an Olfactory Map

It is difficult to extract a common part of multiple people's sense of smell and turn it
into a common map, because olfaction has more individual variation than other
senses [12]. Two kinds of approaches can be considered for making an olfactory map.
The first approach is making a personal olfactory map. This requires measuring
distances, making the map, and preparing appropriate aromas for each user, but an
olfactory display completely suited to each user can be achieved.
The second approach is evaluating the distance between scents with many people,
extracting a common part of olfaction that is not influenced by individual variation,
and making the map. The advantage of this approach is that one map gives general
data on the distance between smells. Therefore, as long as the individual variation of
the results is not extremely large, the common olfactory map approach is more
advantageous for developing an olfactory display suitable for practical use.
Two points must be considered to make a common olfactory map using the sense
of smell: ensuring the reliability of the smell evaluation, and applicability to multiple
people despite individual variation in olfaction. We discuss a method for making an
olfactory map that satisfies these points and the evaluation of the map.

3.1 Method for Constructing Olfactory Maps

The procedure for constructing an olfactory map is as follows. First, we prepared 18
kinds of fruit-flavored aroma chemicals: lemons, oranges, strawberries, melons,
bananas, grapefruits, yuzu (Chinese lemons), grapes, peaches, pineapples, lychees,
guavas, mangoes, apples, green apples, kiwis, apricots, and plums. We confined the
aroma chemicals to fruit flavors because the similarity between two aromas in the
same category (fruits, flowers, dishes, etc.) can be compared more easily than between
two aromas in different categories. Then, we soaked test papers in each aroma
chemical and used them as smell samples.
Seven subjects evaluated the degree of similarity of pairs of smell samples on a
five-point scale. For example, if a subject feels that the scent of oranges and the scent
of melons are very similar, the similarity is "4". We then calculated the smell distance
between two smell samples as "5 minus similarity". The correspondence between
similarity and distance is given in Table 1. This trial was done for all combinations of
the 18 kinds of smell samples.

Table 1. Relationship between score of similarity and distance of smell samples

Score of Similarity   Mention of questionnaire   Score of Distance
1                     Different                  4
2                     Less similar               3
3                     Fairly similar             2
4                     Very similar               1
5                     Hard to tell apart         0

The distances among the 18 kinds of smell samples are represented as an 18 x 18
square matrix. We analyzed the distance matrix by Isometric Multidimensional
Scaling (isoMDS) and mapped the result onto a two-dimensional olfactory map.
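The mapping step can be sketched as follows in Python; scikit-learn's nonmetric MDS is used here as a stand-in for the isoMDS routine, and the input is assumed to be the averaged, symmetric similarity matrix. This is an illustration of the procedure, not the authors' code.

    # Similarity ratings (1-5) -> distances (5 - similarity) -> 2D map via MDS.
    import numpy as np
    from sklearn.manifold import MDS

    def olfactory_map(similarity, seed=0):
        """similarity: symmetric (n x n) array of mean ratings, diagonal = 5."""
        distance = 5.0 - np.asarray(similarity, dtype=float)   # distance = 5 - similarity
        np.fill_diagonal(distance, 0.0)
        mds = MDS(n_components=2, metric=False,
                  dissimilarity="precomputed", random_state=seed)
        return mds.fit_transform(distance)                     # (n x 2) map coordinates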

3.2 Evaluation of Reliability

The reliability of the smell evaluation should be ensured because the human sense of
smell is unstable. In order to check whether people identify the same aromas as the
same scents, we prepared pairs of test papers soaked in the same aroma chemical and
asked five subjects to evaluate the degree of similarity between the same aromas. We
ran the experiment for two kinds of aromas: lemon and lychee. Then, we compared
the results of comparing the same aromas with the average degree of similarity over
all combinations of the 18 kinds of smells.

Fig. 2. The degree of similarity between the same aromas. The mean value of degree of
similarity between same lemon aromas is 4.6 and the one between lychee aromas is 4.2. Both
results differ significantly from the mean value over the entire set of 18 aroma chemicals
(2.07) (p<0.01).

The similarity measurement based on the sense of smell is therefore reliable,
because the mean value of the degree of similarity between the same aromas was
remarkably high compared with the overall mean value.

3.3 Making the Common Olfactory Map

Our policy is that a common olfactory map can be constructed by integrating and
extracting a common part from the results of multiple people's olfactory maps. In this
section, we constructed a common olfactory map by averaging the results of the 7
subjects' olfactory similarity evaluations.
We tried two ways of averaging the subjects' results: the simple average and the
binarized average of similarity values. The simple average method calculates the
average smell distance between each pair of aromas and maps the aroma chemicals by
isoMDS. The binarized average method binarizes the smell distances before
calculating the arithmetic average and mapping the results, in order to prevent the map
from being blurred by fluctuations in the subjects' answers. First we set a proper
threshold on the similarity value (from 0 to 5) and assigned a smell distance of 0 if the
similarity value is above the threshold and a distance of 1 if it is below. We set the
threshold between 3 and 4 because the similarity of two identical aromas was no less
than 4 in 9 out of 10 trials in the evaluation of Section 3.2.
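A minimal sketch of the binarized average is shown below; the threshold of 3.5 is an assumption within the 3-4 range stated above, and the code is an illustration rather than the authors' implementation. Per-subject similarity scores are binarized into distances of 0 or 1 and then averaged across subjects before being fed to the MDS step shown earlier.

    # Binarized average of per-subject similarity ratings.
    import numpy as np

    def binarized_average(ratings, threshold=3.5):
        """ratings: array of shape (subjects, n, n) with similarity scores 1-5."""
        ratings = np.asarray(ratings, dtype=float)
        binary_dist = np.where(ratings >= threshold, 0.0, 1.0)  # similar -> distance 0
        dist = binary_dist.mean(axis=0)                          # average over subjects
        np.fill_diagonal(dist, 0.0)
        return dist        # distance matrix for the MDS mapping step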

Fig. 3. (A) simple average (left), (B) binarized average (right)

On the simple average map (Fig. 3A), the scent "yuzu" was mapped between
"lemon" and "lemon2" (the same as lemon), and "guava" was mapped between
"lychee" and "lychee2" (the same as lychee). This means that the distance between
different aromas is smaller than the distance between identical aromas. In contrast, on
the binarized average map (Fig. 3B), "lemon2" is closest to "lemon" among all aromas,
and "lychee2" is closest to "lychee".
The binarized average method suppresses blurring caused by fluctuation within and
between subjects better than the simple average, and is therefore more suitable for
olfactory map generation. Furthermore, the generated olfactory map makes it possible
to categorize aroma chemicals based on the rough character of each smell, such as
citrus fruits, apples, etc. There is thus the possibility of implementing an olfactory
display in which a small number of representative aroma chemicals can present
various smells, by selecting representative aromas from each category.

4 Olfactory Display Using Olfactory Maps

If we prove that "the more similar the content of the picture and the content of the
aroma chemical are, the more likely the drawing effect is to happen", we can
implement a brand-new olfactory display which can render more kinds of smells, over
a wider range, than the number and range of the few prepared aroma chemicals.
In this chapter, we describe the prototype of visual-olfactory display system and
the experiments to evaluate the effect of the smell distance on the map to the drawing
effect.

4.1 Implementation of Visual-Olfactory Display

The visual-olfactory system consists of an olfactory display and a notebook PC for
showing pictures and control (Fig. 4).
The olfactory display consists of the scent generator, the controller, the showing
interface, and a PC monitor. The scent generator has four air pumps, each connected
to a scent filter filled with aroma chemicals. The controller drives the air pumps in the
scent generator according to commands from the PC. The scent filters add scents to
the air from the pumps, and the showing interface then ejects the air near the user's nose.

Fig. 4. Prototype system of visual-olfactory display

4.2 Evaluation of the Visual-Olfactory Display

We conducted experiments to evaluate the visual-olfactory display with 7 subjects.
These 7 subjects are different from the subjects of the evaluation in chapter 3, in order
to prove the validity of the olfactory map for people who did not participate in
building it. We showed them a picture from the 18 kinds of fruit pictures and an aroma
from the 4 kinds of element aromas. The pictures of fruits correspond one-to-one to
the 18 aroma flavors used in chapter 3. We then asked them, "What kind of smell do
you feel when sniffing the olfactory display?" We conducted the experiment in a
well-ventilated large room to avoid mixing different aromas and olfactory adaptation.
The four kinds of element aromas were selected from the 18 kinds of fruit used in
chapter 3. First, we categorized the 18 kinds of aromas into four groups by features of
their scents (Fig. 5). Then we selected one scent from each group (apple, peach, lemon,
lychee) so as to minimize the distance between the key aroma and the other aromas in
the same category. Each picture was shown with the nearest key aroma.
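The selection of one key aroma per category can be viewed as choosing the medoid of each group on the map distances; the following Python fragment is an illustration under that assumption, not the authors' code.

    # Pick, per group, the aroma minimizing the summed distance to its group.
    import numpy as np

    def select_key_aromas(dist, groups):
        """dist: (n x n) distance matrix; groups: dict name -> list of indices."""
        keys = {}
        for name, members in groups.items():
            sub = dist[np.ix_(members, members)]         # distances within the group
            total = sub.sum(axis=1)                       # summed distance per candidate
            keys[name] = members[int(np.argmin(total))]   # medoid of the group
        return keys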
If subjects answer that they feel the smell corresponding to the shown picture when
the content of the picture and the content of the aroma are different, the drawing
effect is considered to have occurred. Thus, we used the rate of answering the smell of
the shown picture as an index of the drawing effect.
In order to prove that "the more similar the content of the picture and the content of
the aroma chemical are, the more likely the drawing effect is to happen", we also
conducted another experiment to evaluate the drawing effect between the picture and
another aroma which is not the closest to the content of the picture. We asked subjects
to answer what smell they felt when we showed them a picture and the aroma second
closest to the picture. The trials were done for 9 kinds of pictures. We compared the
drawing effect between trials using the closest aroma and trials using the second-
closest aroma.

Fig. 5. Categorization of aroma chemicals

Each subject answered with the scent of the picture in an average of 13 of the 27
trials (36%) per person in which a picture and an aroma were shown. This is
statistically significantly higher than the rate of answering the scent of the aroma
itself, 11% (p<0.01). The free-form answers over the 27 trials per person covered an
average of 13 kinds of smells, although we used only four kinds of aromas.
Moreover, we compared the rate of answering that the smell matched the content
of the picture according to the olfactory distance on the map. The rate was 44% when
the picture and the aroma were close, versus 27% when they were distant. This means
that a close pair helped subjects to answer with the smell suggested by the picture to a
statistically significant degree (p<0.01) (Fig. 6).

Fig. 6. Comparison by Distance



5 Conclusion

There were many more answers that the smell corresponded to the target image than
answers that it corresponded to the aroma chemical actually presented. Moreover, the
free answers included many kinds of smells. These results confirm that we can
generate several times more kinds of pseudo smells than the number of prepared
aroma chemicals.
In addition, the fact that a similar pairing of picture and aroma increases the rate of
the drawing effect of the picture on the aroma supports the hypothesis that the closer
the picture and the aroma are on the olfactory map, the stronger the drawing effect.
Besides the smell distance itself, the positional relationship of smells on the map can
serve as a criterion for selecting element odors.
However, the rate of answering with the smell of the target picture was 44%, which
is not high enough to use the drawing effect in practical olfactory displays. This is
attributed to the difficulty of identifying the picture and naming it in a free answer.
For example, guavas are not popular among Japanese subjects, so they are thought to
be unable to recall the name of guavas from the visual cue alone, and it is difficult to
tell pictures of lemons and grapefruits apart.
We showed that it is possible to construct an olfactory map based on olfactory
sensation from the sensory evaluation of smell similarity by two or more people. The
common olfactory map, suitable for people with various olfactory sensation patterns,
can be used in the same way as a language-based olfactory map. Using the common
olfactory map, we achieved an olfactory display presenting various smells virtually
from a few aroma chemicals. It becomes possible to achieve olfactory virtual reality
with a simpler system by reducing the number of aroma sources. Moreover, it should
be possible to make olfactory maps for other kinds of smells, such as flowers or
dishes, as long as the smells have visual cues corresponding to the olfactory cues.
The technique of changing the smell sensation by visual stimuli, without changing the
aroma chemicals themselves, makes it possible to achieve high-quality olfactory VR
more easily.

References
1. Cain, W.S.: To know with the nose: Keys to odor identification. Science 203, 467–470
(1979)
2. Sugiyama, H., Kanamura, S.A., Kikuchi, T.: Are olfactory images sensory in nature?
Perception 35, 1699–1708 (2006)
3. Nakamoto, T., Otaguro, S., Kinoshita, M., Nagahama, M., Ohinishi, K., Ishida, T.:
Cooking Up an Interactive Olfactory Game Display. IEEE Computer Graphics and
Applications 28(1), 75–78 (2008)
4. Yamada, T., Yokoyama, S., Tanikawa, T., Hirota, K., Hirose, M.: Wearable Olfactory
Display: Using Odor in Outdoor Environment. In: Proceedings IEEE VR 2006, pp. 199–
206 (2006)
5. Herz, R.S., von Clef, J.: The influence of verbal labeling on the perception of odors:
evidence for olfactory illusion? Perception 30, 381–391 (2001)
6. Gottfried, J., Dolan, R.: The Nose Smells What the Eye Sees: Crossmodal Visual
Facilitation of Human Olfactory Perception. Neuron 39(2), 375–386 (2003)
7. Zellner, D.A., Kautz, M.A.: Color affects perceived odor intensity. Journal of
Experimental Psychology: Human Perception and Performance 16, 391–397 (1990)
8. Grigor, J., Van Toller, S., Behan, J., Richardson, A.: The effect of odour priming on long
latency visual evoked potentials of matching and mismatching objects. Chemical
Senses 24, 137–144 (1999)
9. Sakai, N., Imada, S., Saito, S., Kobayakawa, T., Deguchi, Y.: The Effect of Visual Images
on Perception of Odors. Chemical Senses 30(suppl. 1) (2005)
10. Nambu, A., Narumi, T., Nishimura, K., Tanikawa, T., Hirose, M.: A Study of Providing
Colors to Change Olfactory Perception - Using ”flavor of color”. In: ASIAGRAPH in
Tokyo 2008, vol. 2(2), pp. 265–268 (2008)
11. Bensafi, M., Rouby, C.: Individual Differences in Odor Imaging Ability Reflect
Differences in Olfactory and Emotional Perception. Chemical Senses 32, 237–244 (2007)
12. Lawless, H.T.: Exploration of fragrance categories and ambiguous odors using
multidimensional scaling and cluster analysis. Chemical Senses (1989)
13. Smith, T.F., Waterman, M.S.: Identification of Common Molecular Subsequences. J. Mol.
Biol. 147, 195–197 (1981)
Towards Noninvasive Brain-Computer Interfaces
during Standing for VR Interactions

Hideaki Touyama

Toyama Prefectural University


5180 Kurokawa, Imizu, Toyama 939-0398, Japan
touyama@pu-toyama.ac.jp

Abstract. In this study, we propose a portable Brain-Computer Interface (BCI) aiming to realize novel interaction with VR objects while standing. The ElectroEncephaloGram (EEG) was recorded under two experimental conditions: I) while the subject was sitting at rest and II) while the subject performed simulated walking in an indoor environment. In both conditions, the Steady-State Visual Evoked Potential (SSVEP) was successfully detected using computer-generated visual stimuli. This result suggests that EEG signals recorded with portable BCI systems can provide a useful interface for VR interactions while standing in indoor environments such as immersive virtual spaces.

Keywords: Brain-Computer Interface (BCI), Electroencephalogram (EEG), Steady-State Visual Evoked Potential (SSVEP), standing, immersive virtual environment.

1 Introduction

Brain-Computer Interfaces (BCIs) are communication channels through which a computer or machine can be operated using only human brain activity [1]. In recent years, useful applications using virtual reality technology have been demonstrated, and the feasibility of BCIs in immersive virtual environments has been shown [2]-[4].
The steady-state visual evoked potential (SSVEP) can be used to determine the user's eye-gaze direction [5]. For example, Cheng et al. investigated a virtual phone [6], and Trejo et al. developed a realistic demonstration to control a moving map display [7]. However, most previous work on SSVEP-based BCI applications has been performed on a standard computer monitor. In an immersive virtual environment, the author demonstrated real-time control of a 3D object based on the SSVEP according to the user's eye-gaze direction [8].
One of the problems in BCI applications is user motivation. BCIs have usually been operated by a user sitting in a chair in order to avoid additional artifacts from muscle activity, which prevents the user from using the system for a long time. Therefore, aiming at BCI applications in immersive virtual environments, the author investigated the feasibility of virtual reality (VR) interactions with the subject in physically moving conditions.


This paper is organized as follows. In Section 2, the experimental settings are explained. In Section 3, the results of our experiments are shown. Discussion and conclusions follow in the subsequent sections.

2 Experiments
A healthy male participated in the experiments as the subject. He was naive to EEG experiments.
In Experiment I, the subject sat comfortably in an armchair facing computer-generated visual stimuli on a monitor. Two flickering visual stimuli were present in the visual field, as shown in Figure 1. In Experiment II, the subject was instructed to perform simulated walking during the measurements. Here, the position of the visual stimuli was adjusted to the height of the subject's eyes.

Fig. 1. Flickering visual stimuli (left: 4 Hz, right: 6 Hz)

Fig. 2. The experimental setup: an EEG electrode, flickering visual stimuli, a portable amplifier worn by the subject during mimic walking, and a laptop PC for data acquisition and signal processing

During both experiments, scalp electrodes were applied to perform the EEG recordings. In this study, one-channel EEG signals were recorded from Oz according to the international 10/20 system [9]. The body-earth and reference electrodes were placed on the forehead and the left ear lobe, respectively. The analogue EEG signals were amplified with a compact, portable multi-channel bio-signal amplifier (Polymate II (AP216), TEAC Corp., Japan). The subject wore a small bag on his belt containing the amplifier. The amplified signals were sampled at 200 Hz, and the digitized EEG data were stored on a laptop computer (placed apart from the subject in this study). The experimental setup is shown in Figure 2.
Experiments I and II were performed alternately with rest intervals of about 1 minute. In Experiment I, one session was for gazing at the left (4 Hz flickering) visual stimulus and the following session was for the right (6 Hz) one; the same applied to Experiment II. One session consisted of 30 sec of EEG measurement.

3 Results
In order to extract the SSVEP features induced by the flickering stimuli, we applied frequency analysis to the collected EEG data. Figure 3 shows the power spectral density obtained by Fast Fourier Transform (FFT) analysis in the simulated walking condition. Clear SSVEPs (fundamental and harmonic components) were observed at the frequencies corresponding to the flickering stimuli.

Fig. 3. The results of frequency analysis (Upper: gazing at the 4 Hz flickering stimulus, Lower: gazing at the 6 Hz one); power spectral density (dB) plotted against frequency (Hz) from 0 to 20 Hz
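The following minimal Python sketch, not the authors' code, illustrates the kind of frequency analysis described above: estimating the power spectral density of a one-channel EEG trace sampled at 200 Hz and reading off the power near the 4 Hz and 6 Hz flicker frequencies and their harmonics; the synthetic signal and the use of Welch's method are assumptions for illustration.

# A minimal sketch (not the authors' code) of the frequency analysis described above:
# the power spectral density of a one-channel EEG recording sampled at 200 Hz is
# estimated and the peaks at the 4 Hz / 6 Hz flicker frequencies (and harmonics)
# are inspected. A synthetic signal stands in for the recorded data.
import numpy as np
from scipy.signal import welch

fs = 200.0                      # sampling rate used in the experiments (Hz)
t = np.arange(0, 30.0, 1 / fs)  # one 30 s session

# Synthetic "Oz" trace: a weak 4 Hz SSVEP plus its harmonic, buried in noise.
eeg = 0.5 * np.sin(2 * np.pi * 4 * t) + 0.2 * np.sin(2 * np.pi * 8 * t) \
      + np.random.randn(t.size)

# Welch's method gives a smoothed power spectral density estimate.
freqs, psd = welch(eeg, fs=fs, nperseg=int(4 * fs))   # 4 s windows -> 0.25 Hz resolution

for f0 in (4.0, 6.0, 8.0, 12.0):                      # fundamentals and harmonics
    idx = np.argmin(np.abs(freqs - f0))
    print(f"{f0:4.1f} Hz : {10 * np.log10(psd[idx]):6.2f} dB")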

Furthermore, to assess the quality of the EEG signals, we investigated single-shot data. EEG segments of 3 sec were taken from all collected sessions, and FFT analysis was applied to each segment to obtain feature vectors. A pattern recognition algorithm was then applied to classify the two brain states (EEG signals under the 4 Hz and 6 Hz flickering conditions). The feature dimensionality was reduced using Principal Component Analysis, followed by classification with Linear Discriminant Analysis. To estimate classification performance, a leave-one-out method was adopted, in which one sample was used for testing and the others for training.
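A minimal sketch of this single-shot pipeline, assuming synthetic 3 s segments in place of the recorded data and a hypothetical number of principal components, could look as follows in Python; it is meant only to make the FFT-PCA-LDA-leave-one-out chain concrete, not to reproduce the authors' implementation.

# A minimal sketch (not the authors' code) of the single-shot classification pipeline
# described above: FFT features from 3 s EEG segments, PCA for dimensionality
# reduction, LDA for classification, and leave-one-out cross-validation.
# Segment generation and the number of PCA components are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneOut, cross_val_score

fs = 200                                   # sampling rate (Hz)
rng = np.random.default_rng(0)

def fft_features(segment):
    """Amplitude spectrum of a 3 s segment, restricted to 0-20 Hz."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(segment.size, 1 / fs)
    return spectrum[freqs <= 20.0]

# Synthetic stand-ins for the 3 s segments of the two gazing conditions (4 Hz vs 6 Hz).
def fake_segment(f0):
    t = np.arange(0, 3.0, 1 / fs)
    return np.sin(2 * np.pi * f0 * t) + rng.normal(size=t.size)

X = np.array([fft_features(fake_segment(f)) for f in [4] * 10 + [6] * 10])
y = np.array([0] * 10 + [1] * 10)          # 0 = 4 Hz condition, 1 = 6 Hz condition

clf = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"leave-one-out accuracy: {scores.mean():.1%}")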
The pattern recognition accuracy was 88.3% for Experiment I (sitting condition), and similar performance was found for Experiment II (simulated walking condition). Thus, in both conditions, the subject's eye-gaze direction could be decoded with more than 85% accuracy.

4 Discussions
The results of the SSVEP classification show the feasibility of BCI applications for VR interactions while standing or physically moving. The author's group tested the BCI performance by developing an online CG control system; in a preliminary study, cursor control was possible even while the subjects were standing. In immersive virtual environments users usually interact with virtual objects while standing, and thus the result in this paper encourages the development of BCI applications for CAVE-like display systems [10].
One of the problems in BCI applications has been the posture of the user. BCIs have been operated by users sitting in a chair in order to avoid additional artifacts from muscle activity, which has made users less motivated and unable to use the system for a long time. In our studies, the subjects did not report lower motivation in the standing condition. Note that the author's group has reported that, in an ambulatory context, the P300 evoked potential could be detected using auditory stimuli indoors [11] and even in an outdoor environment [12].

5 Conclusions
In this study, we proposed a portable BCI aiming to realize novel interaction with VR objects while standing. The EEG was recorded under simulated walking conditions in an indoor environment, and the SSVEP was successfully detected using computer-generated visual stimuli. This result suggests that EEG signals recorded with portable BCI systems can provide a useful interface for VR interactions while standing in indoor environments such as CAVE-like systems.
Acknowledgment. This work is partly supported by the Telecommunication
Advancement Foundation.

References
1. Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Brain
computer interfaces for communication and control. Clinical Neurophysiology 113(6),
767–791 (2002)
2. Bayliss, J.D.: The use of the evoked potential P3 component for control in a virtual apartment. IEEE Transactions on Neural Systems and Rehabilitation Engineering 11(2) (2003)

3. Pfurtscheller, G., Leeb, R., Keinrath, C., Friedman, D., Neuper, C., Guger, C., Slater, M.:
Walking from thought. Brain Research 1071, 145–152 (2006)
4. Fujisawa, J., Touyama, H., Hirose, M.: EEG-based navigation of immersing virtual
environments using common spatial patterns. In: Proc. of IEEE Virtual Reality Conference
(2008) (to appear)
5. Middendorf, M., McMillan, G., Calhoun, G., Jones, K.S.: Brain-Computer Interfaces
Based on the Steady-State Visual-Evoked Response. IEEE Transactions on Rehabilitation
Engineering 8(2), 211–214 (2000)
6. Cheng, M., Gao, X., Gao, S., Xu, D.: Design and Implementation of a Brain-Computer
Interface With High Transfer Rates. IEEE Transactions on Biomedical Engineering 49(10),
1181–1186 (2002)
7. Trejo, L.J., Rosipal, R., Matthews, B.: Brain-computer interfaces for 1-D and 2-D cursor
control: designs using volitional control of the EEG spectrum or steady-state visual evoked
potentials. IEEE Trans. Neural. Syst. Rehabil. Eng. 14(2), 225–229 (2006)
8. Touyama, H., Hirose, M.: Steady-State VEPs in CAVE for Walking Around the Virtual
World. In: Stephanidis, C. (ed.) UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 715–717.
Springer, Heidelberg (2007)
9. Jasper, H.H.: The ten-twenty electrode system of the international federation.
Electroenceph. Clin. Neurophysiol. 10, 370–375 (1958)
10. Cruz-Neira, C., Sandin, D.J., DeFanti, T.A.: Surround-screen projection-based virtual
reality: The design and implementation of the CAVE. In: Proc. ACM SIGGRAPH 1993,
pp. 135–142 (1993)
11. Lotte, F., Fujisawa, J., Touyama, H., Ito, R., Hirose, M., Lécuyer, A.: Towards Ambulatory
Brain-Computer Interfaces: A Pilot Study with P300 Signals. In: 5th Advances in Computer
Entertainment Technology Conference (ACE), pp. 336–339 (2009)
12. Maeda, K., Touyama, H.: in preparation (2011)
Stereoscopic Vision Induced by Parallax Images on HMD
and Its Influence on Visual Functions

Satoshi Hasegawa1, Akira Hasegawa1,2, Masako Omori3, Hiromu Ishio2, Hiroki Takada4, and Masaru Miyao2
1 Nagoya Bunri University, 365 Maeda, Inazawa, Aichi, Japan
hasegawa@nagoya-bunri.ac.jp
2 Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
3 Kobe Women's University, Suma, Kobe, Japan
4 University of Fukui, Bunkyo, Fukui, Japan

Abstract. Visual function of lens accommodation was measured while subjects used stereoscopic vision in a head mounted display (HMD). Eyesight with
stereoscopic Landolt ring images displayed on HMD was also studied. In
addition, the recognized size of virtual stereoscopic images was estimated using
the HMD. Accommodation to virtual objects was seen when subjects viewed
stereoscopic images of 3D computer graphics, but not when the images were
displayed without appropriate binocular parallax. This suggests that
stereoscopic moving images on HMD induced the visual accommodation.
Accommodation should be adjusted to the position of virtual stereoscopic
images induced by parallax. The difference in the distances of the focused
display and stereoscopic image may cause visual load. However, an experiment
showed that Landolt rings of almost the same size were distinguished regardless
of virtual distance of 3D images if the parallax was not larger than the fusional
upper limit. On the other hand, congruent figures that were simply shifted to cause
parallax were seen to be larger as the distance to the virtual image became
longer. The results of this study suggest that stereoscopic moving images on
HMD induced the visual accommodation by expansion and contraction of the
ciliary muscle, which was synchronized with convergence. Appropriate parallax
of stereoscopic vision should not reduce the visibility of stereoscopic virtual
objects. The recognized size of the stereoscopic images was influenced by the
distance of the virtual image from display.

Keywords: 3-D Vision, Lens Accommodation, Eyesight, Landolt ring, Size Constancy.

1 Introduction
Stereoscopic (3D) vision technology using binocular parallax images has become popular and is used for movies, television, cameras, and mobile displays. 3D vision enables the display of realistic and exciting images carrying stereoscopic spatial information. However, 3D viewing may cause asthenopia more often than watching 2D images or natural vision. The influence of 3D viewing on visual functions should therefore be studied: understanding the mechanisms of recognition and the effects of 3D vision is necessary to make safe and natural 3D images.
Three experiments were conducted in order to study the effects of 3D vision on visual functions. First, we measured the recognized size of a figure displayed stereoscopically (Experiment 1). Figures of the same size can be perceived as if their sizes were different (Fig. 1) because of size constancy. How are the sizes of a stereoscopic figure recognized?

Fig. 1. Example of illusion by size constancy. The right sphere looks as if it is larger than the
left one, although the sizes are the same.

Another experiment (Experiment 2) measured binocular visual acuity while viewing 3D Landolt rings (Fig. 2). The focus is not fixed on the surface of the display but moves near and far synchronously with the movement of the 3D images being viewed, as we previously reported [1-5]. Accommodation agrees with the convergence fusion image, which is at a position different from that of the display [6]. Does the visibility of the stereoscopic image deteriorate because of this lack of focus on the display?

Fig. 2. Landolt ring and visual acuity measurement

In Experiment 3, lens accommodation was measured while subjects watched 3D images. Ordinary 3D (cross-point camera images) and Power3D™ (Olympus Visual Communications Co., Ltd.) were used for this experiment, and a method to produce natural 3D vision is suggested.
Details of these experiments are described below.

2 Methods
2.1 Method of Experiment 1: Recognized Size Estimation
The HMD (Vuzix Corp. iWear AV230XL+, 320×240 pixels) displayed a 44-inch virtual screen at a viewing distance of approximately 300 cm (270 cm). The center circle of three circles was shifted horizontally without size change to create parallax for 9 different 3D virtual distances of 100, 150, 200, 250, 300 (2D), 350, 400, 450, and 500 cm from the eye to the fusion image (Fig. 3a). Subjects viewed these images (Fig. 3b) and recorded the recognized size with a pencil on the paper sheet shown in Fig. 3c.

Fig. 3. Experiment 1: Recognized size estimation. (a) Examples of 3D images displayed on the HMD (left-eye and right-eye images: plain 2D at a fusion distance of 300 cm, pop toward at 100 cm, pop away at 500 cm); (b) viewing 3D images on the HMD; (c) recognized size recording sheet.
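The horizontal shift that produces a given fusion distance follows from simple similar-triangle geometry. The sketch below computes the on-screen separation for the nine virtual distances used here, assuming an interpupillary distance of 6.3 cm and a 4:3 aspect ratio for the 44-inch virtual screen; neither value is stated in the paper, so the numbers are illustrative only.

# A minimal sketch (not from the paper) of the parallax geometry used in Experiments 1
# and 2: given the distance to the virtual screen and a desired fusion distance, the
# required horizontal on-screen separation between the left- and right-eye images
# follows from similar triangles. The interpupillary distance (6.3 cm) and the 4:3
# aspect ratio of the 44-inch virtual screen are assumptions, not values from the paper.
IPD_CM = 6.3                     # assumed interpupillary distance
SCREEN_DIST_CM = 300             # virtual screen distance reported for the HMD
SCREEN_W_CM = 44 * 2.54 * 0.8    # 44-inch diagonal, 4:3 -> width = 0.8 * diagonal
PX_PER_CM = 320 / SCREEN_W_CM    # 320-pixel-wide display

def on_screen_separation_cm(fusion_dist_cm: float) -> float:
    """Signed separation: positive = crossed (pop toward), negative = uncrossed (pop away)."""
    return IPD_CM * (SCREEN_DIST_CM / fusion_dist_cm - 1.0)

for d in (100, 150, 200, 250, 300, 350, 400, 450, 500):
    sep = on_screen_separation_cm(d)
    print(f"fusion {d:3d} cm : separation {sep:+6.2f} cm ({sep * PX_PER_CM:+6.1f} px)")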

2.2 Method of Experiment 2: Visual Acuity for 3D Landolt Ring


An HMD (the same apparatus as in Experiment 1) was used. Still parallax images in a side-by-side format were prepared to display Landolt rings of 12 sizes (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.2, 1.5 and 2.0 in visual acuity at 300 cm). The Landolt rings were shifted horizontally without size change to create parallax for 9 different stereoscopic virtual distances of 100, 150, 200, 250, 300 (2D), 350, 400, 450 and 500 cm to the fusion image (Fig. 4). Subjects wearing the HMD first adjusted the focus of both the left and right eyepieces using the dial on the HMD while viewing the 300 cm images that had no parallax. They then watched stereoscopic images at the 9 distances without changing the focus dial positions. The smallest Landolt ring size resolved by the subjects was recorded as a value of 0.2-2.0 (the eyesight value shown in Fig. 2). Eighteen subjects (24.6 ± 7.9 years) with uncorrected vision or wearing contact lenses or glasses were studied, only for conditions in which they could fuse the stereoscopic images.

Fig. 4. Examples of 3D images used in Experiment 2 (visual acuity for 3D): (a) plain 2D, (b) pop toward, (c) pop away; left-eye and right-eye images, with fusion distance given in cm.

2.3 Method of Experiment 3: Accommodation Measurement


The image used in Experiment 3 was a moving 3D-CG sphere displayed stereoscopically. The sphere moved virtually in a reciprocating motion toward and away from the observer with a cycle of 10 seconds (Fig. 5).

Fig. 5. Moving 3D image used in Experiment 3 (the sphere moves from far to near and back over a 10 sec cycle).

Moving images (Fig. 5) were prepared in four types: 2D (Fig. 6a), Pseudo 3D (Fig. 6b), Cross Point 3D (Fig. 6c with Fig. 7a), and POWER3D™ (Fig. 6c with Fig. 7b).
A modified version of an original apparatus [3] to measure lens accommodation was used in Experiment 3 (Fig. 8). Accommodation was measured for 40 seconds under natural viewing conditions with binocular vision while a 3D image (Fig. 5) moved virtually toward and away from the subject on an HMD (Fig. 8). For the accommodation measurements, the visual distance from the HMD to the subjects' eyes was 3 cm.

Fig. 6. Three parallax modes used in Experiment 3: (a) 2D (no parallax), (b) Pseudo 3D (fixed parallax), (c) 3D (stereoscopic); left-eye and right-eye images at far, middle, and near positions.

Fig. 7. Two 3D photography modes used in Experiment 3: (a) Cross Point 3D (a single cross point and virtual screen for near and far objects) and (b) POWER3D™ (multiple virtual cameras and screens whose near and far views are combined).

The refractive index of the right lens was measured with an accommodo-refractometer (Nidek AR-1100) while the subjects gazed at the presented image with both eyes via a small mirror. The HMD (Vuzix Corp. iWear AV920, 640×480 pixels) was positioned so that it appeared in the upper portion of a dichroic mirror placed in front of the subject's eyes (Fig. 8), and the 3D image was observed through the mirror. The stereoscopic image displayed in the HMD could be observed with natural binocular vision through reflection in the dichroic mirror, while refraction was measured at the same time by transmitted infrared rays.

Fig. 8. Lens accommodation measurement while watching a 3D movie on the HMD: the HMD is viewed via a dichroic mirror in front of the eyes while the accommodo-refractometer (Nidek AR-1100) measures through the mirror; subjects gazed with both eyes.

The subjects were instructed to gaze at the center of the sphere with both eyes. All subjects viewed four types of images: 2D, Pseudo 3D, Cross Point 3D and POWER3D™ (Fig. 6, Fig. 7). While both eyes were gazing at the stereoscopic image, the lens accommodation of the right eye was measured and recorded.

3 Results
3.1 Result of Experiment 1: Recognized Size Estimation
Fig. 9 shows the results of Experiment 1. Four subjects recorded the recognized size of the 3D circle on the sheet shown in Fig. 3c. The ratio to the size of the 2D condition (fusion distance: 300 cm) was plotted, except when subjects could not fuse the 3D images. In addition, the theoretical line mentioned below is shown in the same graph.

Fig. 9. Result of Experiment 1: recognized size of the circle (%, relative to the 2D condition) for Subjects A-D and the theoretical line, plotted against the distance from the eyes to the fusion image of the circle (cm).



Fig. 10. Result of Experiment 2: (a) lower limit of recognition; (b) rate of subjects able and unable to fuse.

3.2 Result of Experiment 2: Visual Acuity for 3D Landolt Ring

The results of Experiment 2 are shown in Fig. 10. Fig. 10a shows the smallest Landolt ring size, expressed as the visual acuity value at a distance of 300 cm, averaged over 15 subjects (excluding 3 who could not view fusion images for any parallax). In this graph, ● shows the average of the visual acuity points (the eyesight value shown in Fig. 2) with non-fusion cases counted as 0.0, and △ shows the average over only the cases of successfully fused viewing.
The number and percentage of subjects who could and could not view fusion images are shown in Fig. 10b. Many subjects exceeded the fusional upper limit at the 100 cm and 150 cm distances.

Fig. 11. Result of Experiment 3: accommodation (diopters) over time (sec) for the four modes (2D, Pseudo 3D, Cross Point 3D, POWER3D). (a) Accommodation of Subject E; (b) accommodation of Subject F.

The fusional limit differs depending on individual variation and the characteristics of the HMD. However, Fig. 10a shows that almost the same size of Landolt ring (△) was distinguished regardless of the virtual distance of the 3D images, as long as the parallax was not larger than the fusional upper limit for each subject.

3.3 Result of Experiment 3: Accommodation Measurement


The presented image was a 3D-CG sphere that moved in a reciprocating motion toward and away from the observer with a cycle of 10 sec (Fig. 5). The subjects gazed at the sphere, and accommodation was measured for 40 seconds (Fig. 8). The results for 2D, Pseudo 3D, Cross Point 3D and POWER3D (Fig. 6, Fig. 7) are shown in Fig. 11 for two subjects.
Figure 11(a) shows the results for subject E (age: 24, male), and (b) those for subject F (age: 39, female). Large-amplitude accommodation synchronized with convergence was seen only in the two 3D modes, Cross Point 3D and POWER3D, and in neither 2D nor Pseudo 3D, although individual differences among subjects were large. POWER3D induced larger-amplitude accommodation than Cross Point 3D in both subjects.
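One way to quantify the accommodation amplitude that is synchronized with the 10 s stimulus cycle is to project the 40 s trace onto a 0.1 Hz sinusoid, as in the minimal sketch below; the sampling rate and the synthetic diopter trace are assumptions, and this is not the analysis actually performed by the authors.

# A minimal sketch (not the authors' analysis) of one way to quantify the amplitude of
# accommodation that is synchronized with the 10 s stimulus cycle: project the 40 s
# accommodation trace onto a 0.1 Hz sinusoid. The 5 Hz sampling rate and the synthetic
# trace are assumptions for illustration.
import numpy as np

fs = 5.0                                  # assumed sampling rate of the trace (Hz)
t = np.arange(0, 40.0, 1 / fs)
f_stim = 0.1                              # 10 s reciprocating cycle

# Synthetic diopter trace: baseline 1.5 D plus a 0.7 D oscillation at the stimulus cycle.
diopters = 1.5 + 0.7 * np.sin(2 * np.pi * f_stim * t) + 0.1 * np.random.randn(t.size)

# Least-squares amplitude of the 0.1 Hz component (equivalent to a single-bin DFT).
x = diopters - diopters.mean()
c = 2 / t.size * np.dot(x, np.cos(2 * np.pi * f_stim * t))
s = 2 / t.size * np.dot(x, np.sin(2 * np.pi * f_stim * t))
print(f"accommodation amplitude at 0.1 Hz: {np.hypot(c, s):.2f} D")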

4 Discussion
Stereoscopic images induced the illusion that pop-away figures were recognized as larger than their size on the display screen (Fig. 9), although the ratio saturated for all but one subject in Fig. 9. The theoretical line shown in Fig. 9 is the size ratio calculated according to the principle shown in Fig. 12. The saturation of the expansion of the pop-away image size might be caused by size constancy (Fig. 1).
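Under the reading that the retinal image size is fixed by the on-screen size while the figure is localized at the fusion distance, the theoretical ratio is simply the fusion distance divided by the screen distance. The short sketch below tabulates this prediction for the distances used in Experiment 1; it reflects this interpretation of the principle in Fig. 12, not necessarily the authors' exact calculation.

# A quick illustration of the theoretical size ratio plotted in Fig. 9, under the reading
# that the retinal image is fixed by the on-screen size while the figure is localized at
# the fusion distance, so perceived size scales linearly with that distance.
SCREEN_DIST_CM = 300   # 2D reference distance

for fusion_cm in (100, 150, 200, 250, 300, 350, 400, 450, 500):
    ratio = fusion_cm / SCREEN_DIST_CM      # predicted recognized size relative to 2D
    print(f"fusion {fusion_cm:3d} cm -> theoretical size ratio {ratio * 100:5.1f} %")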
The reason why the pop-toward image was recognized as smaller and the pop-away image as larger (Experiment 1, Fig. 9) may be that the images on the subjects' retinas matched the screen images, while the recognized distances were at the fusion point of the parallax images (Fig. 12). Hori, Miyao et al. [6] have reported that accommodation agrees with the fusion distance while watching 3D images. Some scholars have claimed that 3D images might appear unfocused because accommodation is on the pop-toward/away images even though the images are displayed on the screen. However, the result of Experiment 2 (Fig. 10) showed that eyesight did not deteriorate regardless of the pop-toward/away distance, as long as subjects could successfully achieve fusion within the depth of field.

Fig. 12. The principle of expansion and contraction of recognized 3D images (eyes, screen, pop-toward image in front of the screen, pop-away image behind it).

Accommodation was induced by the movement of the stereoscopic image in the 3D modes on the HMD (Fig. 11), and the amplitude was larger with POWER3D than with ordinary Cross Point 3D. One reason why some people perceive artificiality in stereoscopic viewing is the fixed camera angles used in photography (Fig. 7). POWER3D induced larger accommodation without fusion failure and may therefore be more natural for 3D viewers than the conventional Cross Point 3D method (Experiment 3, Fig. 11).

Acknowledgement. This study was partly supported by Olympus Visual Communications Corporation (OVC), Japan. Parts of the experiments shown in this paper were carried out with the help of Mr. H. Saito and Mr. H. Kishimoto, students of Nagoya Bunri University.

References
1. Hasegawa, S., Omori, M., Watanabe, T., Fujikake, K., Miyao, M.: Lens Accommodation to
the Stereoscopic Vision on HMD. In: Shumaker, R. (ed.) VMR 2009. LNCS, vol. 5622, pp.
439–444. Springer, Heidelberg (2009)
2. Omori, M., Hasegawa, S., Watanabe, T., Fujikake, K., Miyao, M.: Comparison of
measurement of accommodation between LCD and CRT at the stereoscopic vision gaze. In:
Shumaker, R. (ed.) VMR 2009. LNCS, vol. 5622, pp. 90–96. Springer, Heidelberg (2009)
3. Miyao, M., Otake, Y., Ishihara, S.: A newly developed device to measure objective
amplitude of accommodation and papillary response in both binocular and natural viewing
conditions. Jpn. J. Ind. Health 34, 148–149 (1992)
4. Miyao, M., Ishihara, S., Saito, S., Kondo, T., Sakakibara, H., Toyoshima, H.: Visual
accommodation and subject performance during a stereographic object task using liquid
crystal shutters. Ergonomics 39(11), 1294–1309 (1996)
5. Omori, M., Hasegawa, S., Ishigaki, H., Watanabe, T., Miyao, M., Tahara, H.:
Accommodative load for stereoscopic displays. In: Proc. SPIE, vol. 5664, p. 64 (2005)
6. Hori, H., Shiomi, T., Kanda, T., Hasegawa, A., Ishio, H., Matsuura, Y., Omori, M., Takada,
H., Hasegawa, H., Miyao, M.: Comparison of accommodation and convergence by
simultaneous measurements during 2D and 3D vision gaze. In: HCII 2011, in this Proc.
(2011)
Comparison of Accommodation and Convergence
by Simultaneous Measurements
during 2D and 3D Vision Gaze

Hiroki Hori1, Tomoki Shiomi1, Tetsuya Kanda1, Akira Hasegawa1, Hiromu Ishio1, Yasuyuki Matsuura1, Masako Omori2, Hiroki Takada3, Satoshi Hasegawa4, and Masaru Miyao1
1 Nagoya University, Japan
2 Kobe Women's University, Japan
3 Fukui University, Japan
4 Nagoya Bunri University, Japan
hiroki@miyao.i.is.nagoya-u.ac.jp

Abstract. Accommodation and convergence were measured simultaneously while subjects viewed 2D and 3D images. The aim was to compare fixation distances between accommodation and convergence in young subjects while they viewed 2D and 3D images. Measurements were made using an original machine that combined the WAM-5500 and EMR-9, and the 2D and 3D images were presented using a liquid crystal shutter system. The results showed that the diopter values of both accommodation and convergence changed periodically when subjects viewed 3D images. The mean values of accommodation and convergence among the 6 subjects were almost equal when viewing the 2D and 3D images, respectively. These findings suggest that the ocular functions when viewing 3D images are very similar to those during natural viewing. When subjects are young, accommodative power while viewing 3D images is similar to the distance of convergence, and the two focusing distances are synchronized with each other.

Keywords: Stereoscopic Vision, Simultaneous Measurement, Accommodation and Convergence, Visual Fatigue.

1 Introduction
Stereoscopic vision technology has been developing rapidly in recent years. Today, stereoscopic vision is no longer restricted to movie theaters: home appliance makers have started to sell 3D TVs and 3D cameras, mobile devices such as cellular phones and portable video game machines have been converted to 3D, and the general public has become increasingly comfortable with stereoscopic vision.
Various stereoscopic display methods have been proposed. The most general approach is to present images with binocular disparity. For 3D TVs, the following two methods are mainly used. One is the polarized display system, which presents two different images with binocular disparity to the right and left eyes using polarized filters. The other is the frame sequential system, which presents two different images with binocular disparity by time-sharing using liquid crystal shutters. Special 3D glasses are needed to watch 3D images with these two methods. In contrast, the following two methods are mainly used for mobile devices such as cellular phones. One is the parallax barrier system, which separates the two images presented to the right and left eyes with a parallax barrier on the display. The other is the lenticular system, which separates the two images and presents them to the left and right eyes using a semi-cylindrical (hog-backed) lens called a lenticular lens. Unlike the polarized display and frame sequential systems, these two methods do not require special 3D glasses to watch 3D images. Another method is the HMD (Head Mounted Display) system, which separates and presents the images in a glasses-type display.
However, despite the progress in this stereoscopic vision technology, the effects on
the human body from continuously watching 3D images (such as visual fatigue and
motion sickness) have not been elucidated. Lens accommodation (Fig. 1) and
binocular convergence (Fig. 2) may provide clues for understanding the causes of
various symptoms.
It is generally explained to the public that, "During stereoscopic vision,
accommodation and convergence are mismatched and this is the main reason for the
visual fatigue caused by stereoscopic vision" [1-4]. During natural vision, lens
accommodation is consistent with convergence. During stereoscopic vision, while
accommodation is fixed on the display that shows the 3D image, convergence of left
and right eyes crosses at the location of the stereoimage. According to the findings
presented in our previous reports, however, such explanations are mistaken [5-7].
However, our research has not yet been widely recognized. This may be because the
experimental evidence obtained in our previous studies, where we did not measure
accommodation and convergence simultaneously, was not strong enough to convince
people. We therefore developed a new device that can simultaneously measure
accommodation and convergence.
In this paper, we report experimental results obtained using this device, in order to
compare the fixation distances during viewing of 2D and 3D images.

Fig. 1. Lens Accommodation



Fig. 2. Convergence

2 Method
The subjects in this study were 6 healthy young students in their twenties (2 with uncorrected vision, 4 who used soft contact lenses). The aim was to compare fixation distances between accommodation and convergence in young subjects while they viewed 2D and 3D images. We obtained informed consent from all subjects, and approval for the study from the Ethical Review Board of the Graduate School of Information Science at Nagoya University.
The details of the experimental setup were as follows. We set an LCD monitor 1 m in front of the subjects and presented 2D or 3D images in which a spherical object moved forward and back with a cycle of 10 seconds (Fig. 3). In theory, the spherical object appears as a 3D image at 1 m (i.e., the location of the LCD monitor) and moves toward the subjects to a distance of 0.35 m in front of them. We asked them to gaze at the center of the spherical object for 40 seconds and measured their lens accommodation and convergence distance during that time. The 3D and 2D images were presented using a liquid crystal shutter system. Measurements were made three times each. For the measurements, we built an original machine by combining the WAM-5500 and EMR-9.

Fig. 3. Spherical Object Movies (Power 3D™ : Olympus Visual Communications, Corp.)

The WAM-5500 is an auto refractometer (Grand Seiko Co., Ltd.) that can measure accommodative power with both eyes open under natural conditions (Fig. 4). It enables continuous recording at a rate of 5 Hz for reliable and accurate measurement of accommodation. The WAM-5500 has two measurement modes, a static mode and a dynamic mode; we used the dynamic mode. The instrument was connected to a PC running the WCS-1 software via an RS-232 cable, with the WAM-5500 set to Hi-Speed (continuous recording) mode. During dynamic data collection, the WAM-5500 joystick button is simply depressed once to start recording and once to stop at the end of the desired time frame.
The EMR-9 is an eye mark recorder (NAC Image Technology Inc.) that can measure convergence distance (Fig. 5) using the pupillary/corneal reflex method. Its specifications are an eye movement resolution of 0.1 degrees, a measurement range of 40 degrees, and a measurement rate of 60 Hz. Small optical devices, 10 mm wide and 30 mm long, for infrared irradiation and measurement are supported by a bar attached to a cap mounted on the subject's face.

Fig. 4 - 5. Auto Refractometer WAM-5500 (Fig.4: left) and Eye Mark Recorder EMR-9 (Fig.5:
right)
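Comparing the two instruments requires putting their outputs on a common scale and timeline: the EMR-9 vergence angle must be converted to a convergence distance (and hence diopters), and the 60 Hz gaze stream has to be aligned with the 5 Hz accommodation stream. The sketch below shows one plausible way to do this, assuming an interpupillary distance of 6.3 cm and linear interpolation; neither detail is specified in the paper.

# A minimal sketch (not the authors' procedure) of two steps needed to compare the
# instruments: converting the EMR-9 vergence angle to diopters via an assumed
# interpupillary distance, and resampling the 60 Hz convergence stream onto the 5 Hz
# accommodation timestamps by linear interpolation.
import numpy as np

IPD_M = 0.063  # assumed interpupillary distance (m); not reported in the paper

def vergence_deg_to_diopters(angle_deg):
    """Convergence distance d satisfies tan(angle/2) = (IPD/2)/d; diopters = 1/d."""
    angle_rad = np.radians(np.asarray(angle_deg, dtype=float))
    distance_m = (IPD_M / 2.0) / np.tan(angle_rad / 2.0)
    return 1.0 / distance_m

# Illustrative streams: 40 s of accommodation at 5 Hz and vergence angle at 60 Hz.
t_acc = np.arange(0, 40, 1 / 5.0)
t_ver = np.arange(0, 40, 1 / 60.0)
accommodation_D = 1.7 + 0.7 * np.sin(2 * np.pi * 0.1 * t_acc)           # diopters
distance_ver_m = 1.0 / (1.7 + 0.7 * np.sin(2 * np.pi * 0.1 * t_ver))    # metres
vergence_deg = 2 * np.degrees(np.arctan((IPD_M / 2) / distance_ver_m))

convergence_D = vergence_deg_to_diopters(vergence_deg)
convergence_on_acc_grid = np.interp(t_acc, t_ver, convergence_D)        # 60 Hz -> 5 Hz grid

print(f"mean accommodation: {accommodation_D.mean():.2f} D, "
      f"mean convergence: {convergence_on_acc_grid.mean():.2f} D")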

We used a liquid crystal shutter system combined with the respective binocular
vision systems for 2D and 3D (Fig. 6). The experimental environment is shown in
Fig. 7 and Table 1. Here, we note that brightness (cd/m2) is a value measured through
the liquid crystal shutter, and that the size of the spherical objects (deg) is not equal
because the binocular vision systems for 2D and 3D have different display sizes.
The images we used in the experiment are from Power 3D™ (Olympus Visual
Communications, Corp.). This is an image creation technique to combine near and far
views in a virtual space. It has multiple sets of virtual displays, the position of which
can be adjusted. When subjects view a close target (crossed view), far view cannot be
fused. When they see a far view, the close target (crossed view) is split and two
targets are seen. Therefore, Power 3D presents an image that is extremely similar to
natural vision.

Fig. 6. Uniting WAM-5500 (Fig.4) and EMR-9 (Fig.5)

Fig. 7. Experimental Environment

Table 1. Experimental Environment

Brightness of Spherical Object (cd/m2):  Far 3.6,  Near 3
Illuminance (lx):                        126
Size of Spherical Object (deg):          Far 0.2,  Near 7.7

3 Results

The measurements for the 6 subjects showed roughly similar results. For 3D vision,
results for Subjects A and B are shown in Fig. 8 and Fig. 9 as examples. When
Subject A (23 years old, male, soft contact lenses) viewed the 3D image (Fig. 8),
accommodation changed between about 1.0 Diopter (100 cm) and 2.5 Diopters (40
cm), while convergence changed between about 1.0 Diopter (100 cm) and 2.7
Diopters (37 cm). The changes in the respective diopter values have almost the same
amplitude and are in phase, fluctuating synchronously with a cycle of 10 seconds
corresponding to that of the 3D image movement.
Similarly, when Subject B (29 years old, male, soft contact lenses) viewed the 3D
image (Fig. 9), both accommodation and convergence changed in almost the same
way between about 0.8 Diopters (125 cm) and 2.0 Diopters (50 cm). The changes in
the respective diopter values have almost the same amplitude and are in phase,
fluctuating synchronously with a cycle of 10 seconds corresponding to that of the 3D
image movement.
For 2D vision, the results for Subject A are shown in Fig. 10 as an example. As
stated above (Fig. 8), when he viewed the 3D image, his accommodation and
convergence changed between about 1.0 Diopter (100 cm) and 2.5 Diopters (40 cm).
They had almost the same amplitude and were in phase, fluctuating synchronously
with a cycle of 10 seconds corresponding to that of the 3D image movement. In
contrast, when viewing the 2D image (Fig. 10), the diopter values for both
accommodation and convergence were almost constant at around 1 Diopter (1 m).

Fig. 8. Subject A (3D image)



Fig. 9. Subject B (3D image)

Fig. 10. Subject A (2D image)

Finally, Table 2 shows the mean values of accommodation and convergence for the 6 subjects when they viewed the 2D and 3D images. The mean values of accommodation and convergence when viewing the 2D image were both 0.96 Diopters; the difference was negligible. When viewing the 3D image, the values of accommodation and convergence were 1.29 Diopters and 1.32 Diopters, respectively; the difference of about 0.03 Diopters is also negligible. Therefore, we can say that there is little quantitative difference between accommodation and convergence when viewing either the 2D or the 3D images.
In this experiment, there were also a few subjects who could recognize the stereoscopic view but complained that it was not easy to see with stereoscopic vision at the point where the 3D image was closest.

Table 2. Mean value of accommodation and convergence

      Accommodation       Convergence         Difference
2D    0.96 D (104.2 cm)   0.96 D (104.2 cm)   0 D (0 cm)
3D    1.29 D (77.5 cm)    1.32 D (75.8 cm)    0.03 D (1.7 cm)
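The centimetre values in Table 2 follow directly from the definition of the diopter as the reciprocal of the viewing distance in metres, as the short check below illustrates (this conversion is standard, not specific to this study).

# The distances in Table 2 follow from the definition of the diopter as the reciprocal
# of distance in metres (e.g., 0.96 D -> 104.2 cm, 1.29 D -> 77.5 cm). A quick check:
for d in (0.96, 1.29, 1.32):
    print(f"{d:.2f} D = {100.0 / d:.1f} cm")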

4 Discussion
In this experiment, we simultaneously measured accommodation and convergence
while subjects viewed 2D and 3D images for comparison, since it is said that
accommodation and convergence are mismatched during stereoscopic vision. Wann et
al. (1995) said that within a VR system the eyes must maintain accommodation on the
fixed LCD screens, despite the presence of disparity cues that necessitate convergence
eye movements in the virtual scene [1]. Moreover, HONG et al. (2010) said that the
natural coupling of eye accommodation and convergence in viewing a real-world
scene is broken in stereoscopic displays [4].
From the results in Fig. 8 and Fig. 9, we see that when young subjects are viewing
3D images, accommodative power is consistent with the distance of convergence with
the liquid crystal shutter systems, and that the values of focusing distances are
synchronized with each other.
In addition, the results in Fig. 10 and Table 2 suggest that the ocular functions
when viewing 3D images are very close to those during natural viewing. In general, it
is said that there is a slight difference between accommodation and convergence even
during natural viewing, with accommodation focused on a position slightly farther
than that of real objects and convergence focused on the position of the real objects.
This is said to originate in the fact that the target remains visible even if the focus is not exactly accurate, because of the depth of field [8]. In our 3D vision experiments, the mean
values of accommodation and convergence were found to be 1.29 Diopters and 1.32
Diopters, respectively. This means that accommodation focuses on a position slightly
farther than that of convergence by about 0.03 Diopters. Hence, our findings suggest
that eye movement when viewing 3D images is similar to that during natural viewing.
In the light of the above, the conventional theory stating that within a VR system our
eyes must maintain accommodation on the fixed LCD screen may need to be
corrected. We can also say that the kind of results presented herein could be obtained
because the 3D images used in the experiments were produced not by conventional
means but with Power 3D, whose images are extremely close to natural viewing.
Therefore, we consider that, as long as 3D images are made using a proper method, accommodation and convergence should almost always coincide, even for an image that projects out significantly, and that we can view such images more easily and naturally. Conventional 3D and Power 3D on an HMD have been compared in previous experiments using Power 3D [6-7], and these works also found Power 3D to be superior to conventional 3D.

5 Conclusion
In this experimental investigation, we simultaneously measured accommodation and
convergence while subjects viewed 2D and 3D images for comparison. The results
suggest that the difference in eye movement for accommodation and convergence is
equally small when viewing 2D and 3D images. This suggests that the difference
between accommodation and convergence is probably not the main reason for visual
fatigue, motion sickness, and other problems. The number of subjects in this
experiment was only 6, which may still be too small for our findings to be completely
convincing. In the near future, we would like to repeat this study with a larger number
of subjects. We would also like to simultaneously measure and compare both
accommodation and convergence in subjects viewing real objects (natural vision) and
3D images (stereoscopic vision) of those objects made with 3D cameras.

References
1. Wann, J.P., Rushton, S., Mon-Williams, M.: Natural Problems for Stereoscopic Depth
Perception in Virtual Environments. Vision Res. 35(19), 2731–2736 (1995)
2. Simon, J.W., Kurt, A., Marc, O.E., Martin, S.B.: Focus Cues Affect Perceived Depth.
Journal of Vision 5, 834–862 (2005)
3. David, M.H., Ahna, R.G., Kurt, A., Martin, S.B.: Vergence-accommodation Conflicts
Hinder Visual Performance and Cause Visual Fatigue. Journal of Vision 8(33), 1–30 (2008)
4. Hong, H., Sheng, L.: Correct Focus Cues in Stereoscopic Displays Improve 3D Depth
Perception. SPIE, Newsroom (2010)
5. Miyao, M., Ishihara, S., Saito, S., Kondo, T., Sakakibara, H., Toyoshima, H.: Visual
Accommodation and Subject Performance during a Stereographic Object Task Using Liquid
Crystal Shutters. Ergonomics 39(11), 1294–1309 (1996)
6. Miyao, M., Hasegawa, S., Omori, M., Takada, H., Fujikake, K., Watanabe, T., Ichikawa, T.:
Lens Accommodation in Response to 3D Images on an HMD. In: IWUVR (2009)
7. Hasegawa, S., Omori, M., Watanabe, T., Fujikake, K., Miyao, M.: Lens Accommodation to
the Stereoscopic Vision on HMD. In: Shumaker, R. (ed.) VMR 2009. LNCS, vol. 5622, pp.
439–444. Springer, Heidelberg (2009)
8. Miyao, M., Otake, Y., Ishihara, S., Kashiwamata, M., Kondo, T., Sakakibara, H., Yamada,
S.: An Experimental Study on the Objective Measurement of Accommodation Amplitude
under Binocular and Natural Viewing Conditions. Tohoku, Exp., Med. 170, 93–102 (1993)
Tracking the UFO’s Paths: Using Eye-Tracking
for the Evaluation of Serious Games

Michael D. Kickmeier-Rust, Eva Hillemann, and Dietrich Albert

Graz University of Technology, Brueckenkopfgasse 1/6, 8020 Graz, Austria
{michael.kickmeier-rust,eva.hillemann,dietrich.albert}@tugraz.at

Abstract. Computer games are undoubtedly an enormously successful genre. Over the past years, a continuously growing community of researchers and practitioners has made the idea of using the potential of computer games for serious, primarily educational purposes equally popular. However, the present hype over serious games is not reflected in sound evidence for the effectiveness and efficiency of such games, and indicators for the quality of learner-game interaction are also lacking. In this paper we look into those questions, investigating a geography learning game prototype. A strong focus of the investigation was on relating the assessed variables with gaze data, in particular gaze paths and interaction strategies in specific game situations. The results show that there are distinct gender differences in the interaction style with different game elements, depending on the demands on spatial abilities (navigating the three-dimensional space versus controlling the more two-dimensional features of the game), as well as distinct differences between high and low performers.

Keywords: Game-based learning, serious games, learning performance, eye tracking.

1 Introduction
Over the past years, digital educational games (DEGs) have come into the focus of educational research and development. Several commercial online platforms distribute (semi-)educational mini games (e.g., www.funbrain.com or www.primarygames.com) for young children, Nintendo DS' "Dr Kawashima's Brain Training: How Old Is Your Brain?" is a best seller, and a growing number of companies concentrate on educational simulation and game software. Today's spectrum of learning games is broad, ranging from the use of commercial off-the-shelf (COTS) games in educational settings to specifically designed, curriculum-related learning games. A classification on the basis of the psycho-pedagogical and technical level of games is proposed in [1, 2, 3].
The number of initiatives and projects in this area is as rich as the number of games itself. Historically, among the founders of the recent hype over game-based learning are doubtlessly Mark Prensky and Jim Gee. Mark Prensky published his groundbreaking book "Digital Game-based Learning" in 2001 [4]. His idea of game-based learning focuses on the concept of digital natives. He argues that the omnipresence of "twitch speed" media such as MTV and computer games has emphasized specific cognitive aspects and de-emphasized others, which in turn has changed the educational demands of this generation. Jim Gee focused on learning principles in video games and how these principles can be applied to K-12 education [5, 6]. His thematic origin is the idea that (well-designed) computer games are very good at challenging players, at keeping them engaged, and at teaching them how to play. On this basis he identified several principles for successful (learning) game design (e.g., learners must be enabled to be active agents or producers and not just passive recipients or consumers; Gee, 2005, p. 6, [7]). Another pioneer was, for example, David Shaffer in the context of applying regular entertainment computer games (COTS) in education [8].
In the USA, a strong focus of research and development has a military background (mirrored, for example, by governmental institutions such as the Department of Defence Game Developers' Community; www.dodgamecommunity.com), resulting in famous games such as America's Army. In Europe, a more civil approach is widely pursued, strongly driven by the European Commission. Leading-edge projects are, for example, ELEKTRA (www.elektra-project.org), 80Days (www.eightdays.eu), TARGET (www.reachyourtarget.org), mGBL (www.mg-bl.com), Engagelearning (www.engagelearning.eu), LUDUS (www.ludus-project.eu), or the special interest group SIG-GLUE (www.sig-glue.net).
A noteworthy initiative comes from the Network of Excellence GALA (Games and Learning Alliance; www.galanoe.eu), an alliance where Europe's most important players in the serious games sector attempt to streamline the fragmented field and to increase its scientific and economic impact.
The recent hype over game-based learning is based on the natural, almost self-evident link between pedagogical and didactic guidelines and theories and the characteristics of modern computer games [9]. Just as one example, the rich virtual environments of immersive games very naturally provide a meaningful and plausible context for learning and therefore support deeper learning processes. Evidently, computer games have the potential to make knowledge and skills a desirable and valuable asset, and they have the potential to make learning a meaningful and important task. By this means, there exists the justifiable hope of also reaching those learners who are not necessarily keen on learning and who can hardly be reached with other educational measures. In this sense, serious games can be way more than just "chocolate covered broccoli" [10]. Recent research even argues that the immersion and gaming experience impacts the neurotransmitter systems and thus alters cognitive functions and learning capacity [11].
Despite the great many advantages of serious games and despite the large amount of related research and development activities, as outlined above, educational computer games, specifically what we termed competitive games, have not become a serious business case so far. This impression coincides with the view of many researchers in the field of game-based learning, who argue that those games are most often still in their infancy from a scientific and pedagogical perspective (e.g., [12], [13]). Major challenges for research, design, and development are seen, for example, in finding an appropriate balance between gaming and learning activities [3] or between the challenges posed by the game and the abilities of the learner (e.g., [14]). One of the most important challenges for research concerns the core strength of games, which can be summarized as their enormous intrinsic motivational potential. On the one hand, maintaining a high level of motivation requires an intelligent and continuous real-time adaptation of the game to the individual learner, for example, a continuous balancing of challenge and ability and of problems and learning progress. Essentially, this corresponds to the concept of flow – a highly immersed experience in which a person is engaged in a mental and/or physical activity to a level where this person loses track of time and the outside world and where performance in this activity is optimal [15].

2 Evaluating Serious Games


An ultimate challenge in the context of serious games, however, is a scientifically sound formative and summative assessment and evaluation. On the one hand the gaming aspect must be addressed, on the other hand the learning aspects. Both, unfortunately, must be considered more than the sum of their components and, more importantly, raise in part contradicting demands on the metrics of "good" and "educational" games.
When considering the criteria of "good" software in general and conventional learning/teaching software in particular, these are heavily dominated by the idea of performance with the software (in terms of effectivity and efficiency); the ISO standards, for example, promote this kind of view. Accordingly, the metrics and heuristics also focus on performance aspects (e.g., error prevention, minimization of task time, minimization of cognitive load, and so on; cf. [16]). In contrast, criteria of "good" games focus on aspects of fun, immersion, pleasure, or entertainment. Heuristics concern, for example, the visual quality, the quality of the story, fairness, curiosity, and also a certain level of challenge (in the sense of task load or cognitive load); Atari founder Nolan Bushnell states in a famous quote, "a good game is easy to learn but hard to master".
Evaluating serious games must consider both performance aspects and recreational aspects. On the one hand, such a game is supposed to accomplish a specific goal, that is, teaching a defined set of knowledge/skills/competencies. This goes along with context conditions such as the justifiability of development and usage costs, comparability with conventional learning material in terms of effectivity and efficiency, or societal concerns. On the other hand, the great advantages of the medium game only exist if the game character is in the foreground, that is, being immersed, having fun, experiencing some sort of flow.
In the past, several approaches to measuring serious games were published. De Freitas and Oliver [17] proposed an evaluation framework which focuses on (i) the application context of a serious game, (ii) learner characteristics, (iii) didactical/pedagogical aspects, and (iv) the concept of "diegesis", the extent and quality of the game story's world. Besides such general frameworks, more specific scales were also introduced. An example is the eGameFlow approach [12], which attempts to measure flow experience in educational games by criteria such as challenge, amount of concentration, level of feedback, and so forth. A more recent and complete approach to gauging "good" serious games comes from [18]. The EVADEG framework concentrates on the aspects of (i) learning performance, (ii) gaming experience, (iii) game usability, and (iv) the evaluation of adaptive features.

The latter is a highly important yet oftentimes neglected factor. In the tradition of adaptive, intelligent tutoring systems, some modern serious games adapt autonomously to the needs, preferences, abilities, goals, and individual progress of the players [14]. As emphasized by Weibelzahl [19], a scientifically correct evaluation of adaptive systems is difficult because, in essence, personalization and adaptivity mean that each user/player potentially receives different information, in different ways, in a different sequence, and in a different format.
The introduced frameworks and approaches offer a valuable basis for evaluating serious games. A key problem, however, is how to measure all those aspects in a possibly unobtrusive, reliable, valid, and methodologically correct way. In the present paper, we take up the ideas of EVADEG and introduce eye tracking as a means of studying (i) the usability of a game, (ii) the extent of learner satisfaction, and, most importantly, (iii) the learning efficacy. More concretely, we utilized these considerations and methods to evaluate a learning game prototype which was developed in the course of a European research project (80Days).

2.1 Eye Tracking

Observing eye movements has a long tradition in psychology in general and in the field of HCI/usability in particular. El-Nasr and Yan [20] describe how perceptive (i.e., bottom-up) and cognitive (i.e., top-down) processes interplay in the context of 3D videogames. Accordingly, while the saliency of objects can grab players' attention, goal-orientation (top-down) in games is more effective for attracting attention. The big challenge for the authors was to develop a new methodology to analyze eye tracking data in a complex 3D environment, which differed considerably from the stimuli used in eye-tracking experiments conducted until then [21].
Basic variables in eye tracking studies are fixations (processing of attended information with stationary eyes) and saccades (quick eye movements occurring between fixations without information processing) [22]. The sequence of fixations establishes scan paths through a visual field. Although such eye tracking measures are commonly used, their interpretation remains malleable. To give an example, an important indicator for the depth of processing is fixation duration: the longer a stimulus is attended, the higher the cognitive load and the deeper the processing. In immersive games such a relationship may not be quite as stable and valid as it might be for other visual processing tasks. In the work of Jennett [23], for example, who investigated immersion in a game using eye tracking, a decrease of fixations per second was found in the immersive condition as compared to an increase in a non-immersive control condition. Jennett argued that in an immersive game the attention of the players becomes more focused on game-related visual components and therefore less "vulnerable" to distracting stimuli.

3 An Eye Tracking Study on Learning and Gaming

3.1 The 80Days Game Prototype

The investigated game prototype was developed in the context of the European 80Days project (www.eighytdays.eu). The game teaches geography to a target audience of 12 to 14 year olds and follows European curricula in geography. In concrete terms, an adventure game was realized in which the learner takes the role of an Earth kid at the age of 14. The game starts when a UFO lands in the backyard and an alien named Feon contacts the player. Feon is an alien scout who has to collect information about Earth. The player wants to have fun by flying a UFO and, in the story, pretends to be an expert on the planet Earth. He or she assists the alien in exploring the planet and creating a report about the Earth and its geographical features. This is accomplished by flying to different destinations on Earth, exploring them, and collecting and acquiring geographical knowledge. The goal is to send the Earth report, as a sort of travelogue about Earth, to Feon's mother ship. In the course of the game, the player discloses the aliens' real intention – preparing the conquest of the Earth – and reveals the "real" goal of the game: the player has to save the planet, and the only way to do so is to draw the right conclusion from the traitorous Earth report. Therefore the game play has two main goals: (1) to help the alien complete the geographical Earth report, and (2) to save the planet, which is revealed in the course of the story when the player realizes the true intention of the alien. Figure 1 gives some impressions of the game. Details are given in [14].

Fig. 1. Screenshots of the 80Days demonstrator game; an action adventure – on the basis of an
Alien story – to learn geography according to European curricula

3.2 Study Design

The study presented in this paper is only one of a long sequence of experiments in
several European countries to evaluate the demonstrator game and to conduct in-depth
research on the relationships and mechanisms in the context of using computer games
for learning. Due to the vast complexity of this research battery, we must focus on a
rather concise snapshot of this work only. The present results are based on data of 9
Austrian children, 4 girls and 5 boys. The participants’ ages ranged between 11 and 16
years, with an average of 13 years (SD = 1.61).

3.3 Material and Apparatus

To record gaze information, the Tobii 1750 eye tracker was used, a device that works
with infrared cameras and therefore enables a fully unobtrusive recording of gaze
information (Figure 2a). For the pre and post assessments of knowledge, we utilized a
paper-pencil knowledge test; in addition, motivational, usability-related, and
attention-related scales were administered. For the analyses, we selected three individual
scenes: flying to Budapest, an instructive cockpit scene in Budapest, and the terraforming
simulation (cf. Figure 2b).

Fig. 2. Panel a (left) shows an image of the eye tracking setup. Panel b (right) shows a screenshot of
the game’s terraforming simulation. The colored rectangles indicate predefined areas of interest
(AOI) for gaze data analyses.

4 Results

4.1 Learning Performance

The average score of the knowledge test prior to the gaming session was 32.33
(SD=9.45) and that of the posttest 39.00 (SD =10.22). The difference is statistically
significant (T=-3.814, df=8, p=0.005). For girls the average score was 26.25 (SD=11.09)
and 31.25 (SD =10.63) respectively. There is no significant difference between pre and
posttests. For boys the average score in the pretest was 37.20 (SD=4.44) and that for the
posttest 45.20 (SD=4.02), which is a significant difference between these two test scores
(T=-6.136, df=4, p=0.004). The results are illustrated in Figure 3.
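As a hedged illustration of the paired comparison reported above (not the original analysis script, and with placeholder scores), such a pre/post test can be computed with SciPy as follows.

# Sketch: paired t-test for pre/post knowledge scores (placeholder values).
from scipy import stats

pre  = [32, 25, 41, 28, 36, 30, 39, 27, 33]   # hypothetical pretest scores
post = [39, 30, 48, 35, 44, 37, 45, 34, 40]   # hypothetical posttest scores

t, p = stats.ttest_rel(pre, post)              # paired (dependent) samples t-test
print(f"T={t:.3f}, df={len(pre) - 1}, p={p:.3f}")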

4.2 Eye Movements

For this investigation it was important to obtain information on how much time
participants spent in the three different situations while playing the game and on which
parts their eyes fixated. To this end, the relative fixation numbers, the duration of each
situation as well as the total duration, and the saccade lengths were analyzed.

Fig. 3. Results of the knowledge tests before and after playing the learning game

These results imply that females spent more time playing the game. Females’
total duration is M=1018.86 (SD=102.58) in contrast to males’ total duration of
M=868.80 (SD=280.72). In particular, in the simulation situation females spent
more time (M=913.20, SD=142.28) than males (M=726.41, SD=278.41).
A key question of this investigation is whether high and low performers in terms
of learning have distinct gaze patterns/scan paths. Participants who learned more
spent more time playing the game (M=940.10, SD=23.52) in contrast to
participants who learned less (M=781.79, SD=344.46). These results were
present in the different situations as well. For participants who learned more, the
duration of the flying situation is higher (M=100.09, SD=79.33) than for the other
group (M=59.71, SD=31.75). The duration of the instruction situation is M=70.97
(SD=29.33) for high learning effectiveness and M=45.53 (SD=20.28) for low learning
effectiveness. In the simulation situation the duration is M=769.04 (SD=77.21) for
better learners and M=676.55 (SD=362.57) for persons who learned less. Regarding
the fixation number/sec for the three different situations, participants who learned
more have smaller values than participants who learned less. For better learners, the
total fixation number/sec is M=0.72 (SD=0.04); in the flying situation the
fixation number/sec is M=2.08 (SD=0.41), in the instruction situation M=1.94
(SD=0.85), and in the simulation situation M=2.22 (SD=0.34). For the other group
the total fixation number/sec is M=0.77 (SD=0.12), the flying fixation number/sec is
M=2.40 (SD=0.33), the instruction fixation number/sec is M=2.22 (SD=0.50), and the
simulation fixation number/sec is M=4.48 (SD=4.14). Regarding the first situation,
participants who learned more have a longer average fixation length (M=0.45,
SD=0.10) than persons who learned less (M=0.40, SD=0.04). Regarding the
second situation, better learners also have a longer average fixation length (M=0.57,
SD=0.37) than the others, whose average fixation length is M=0.42 (SD=0.06).
Regarding the third situation, a longer average fixation length (M=0.45, SD=0.06) is
found for good learners; participants who learned less have an average fixation
length of M=0.35 (SD=0.21) in the simulation situation. In the flying situation the
saccade lengths are higher for participants who showed a higher learning
effectiveness (M=73.44, SD=29.64) in contrast to the other group (M=53.05,
SD=27.13). In the second and third situations both groups have nearly the same
average saccade lengths.
Although the descriptive data show some differences, MANOVA results showed
no significant differences for gender and attention with respect to duration of playing,
fixation rate, and saccade lengths. Only learning effectiveness has a significant effect on the
duration of the simulation situation (F(1)=186.652, p=0.047). Regarding the
fixation lengths in the different situations, significant differences could be found,
especially in the simulation part. On the one hand, gender has a significant effect on the
fixation length (F(1)=195.77, p=0.045); on the other hand, learning
effectiveness has a significant effect (F(1)=372.982, p=0.033).

4.3 Areas of Interest

Areas of Interest (AOIs) are particular display elements predefined by the
researcher. AOI analysis is used to quantify gaze data within a defined region of the
visual stimulus. The number of fixations on such a particular display element should
reflect the importance of that element: more important display elements will be
fixated more frequently.
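A minimal sketch of such an AOI analysis is given below; the rectangular AOI format, the fixation records, and the coordinates are illustrative assumptions rather than the toolkit actually used with the Tobii data.

# Sketch: proportion of fixation time spent inside predefined AOIs.
# An AOI is assumed to be an axis-aligned rectangle (x_min, y_min, x_max, y_max).

def aoi_shares(fixations, aois):
    """fixations: list of (x, y, duration); aois: dict name -> rectangle."""
    total = sum(d for _, _, d in fixations) or 1.0
    shares = {}
    for name, (x0, y0, x1, y1) in aois.items():
        inside = sum(d for x, y, d in fixations if x0 <= x <= x1 and y0 <= y <= y1)
        shares[name] = 100.0 * inside / total   # percentage of gaze time on the AOI
    return shares

aois = {"AOI6": (300, 200, 500, 350)}           # hypothetical screen coordinates
fixations = [(320, 250, 0.4), (100, 80, 0.3), (480, 340, 0.5)]
print(aoi_shares(fixations, aois))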
When aiming at evaluating serious games, such information is crucially important
since it provides very clear indications of which regions on the screen are attended
(sufficiently) and, therefore, whether all instructional aspects are attended.
The most distinct result of these analyses is that players with high learning
performance have in total a larger number of fixations (Figure 4). Taking the prototypical
example of AOI 6, which was the most often attended one, children who learned more
spent 38.67% (SD=0.65) of their fixations on it and children who were less good
learners spent 32.23% (SD=4.32) on it (Figure 4). Persons with a higher attention
value spent 36.24% (SD=2.77) on AOI 6 in contrast to persons with lower attention,
who spent more time, 38.53% (SD=0.86), on it.
Similarly, we found that high performers showed significantly longer saccades in
the flying scene of the game. This is an indication that children who learnt
well in general exhibited a more “calm” and smooth distribution of fixations on the
screen.

Fig. 4. Average fixation duration and saccade length



5 Summary
The empirical results indicate that children can benefit from playing computer
games for learning purposes. The most distinct finding presented here is that
extreme groups such as high and low performers exhibit different visual patterns:
the good learners scan the visual field evenly, with longer saccades, and attend
relevant areas on the screen more frequently and in a more stable fashion.
The results of our investigation also show that there are distinct gender differences
in the interaction style with different game elements, depending on the demands on
spatial abilities (navigating in the three-dimensional spaces versus controlling rather
two-dimensional features of the game) as well as distinct differences between high
and low performers in terms of learning. In addition to the comparisons on the level
of participants, on the basis of gaze density maps aggregated from the date in
combination with qualitative interviews with the subjects, we identified design
recommendations for further improvements of the game prototype in particular as
well as games in general.
Finally, our study showed that using eye tracking can be successfully applied to
measure critical aspects with regard to the quality of serious games.

Acknowledgements. The research and development introduced in this work is funded
by the European Commission under the seventh framework programme in the ICT
research priority, contract number 215918 (80Days, www.eightydays.eu).

References
1. Kickmeier-Rust, M.D.: Talking digital educational games. In: Kickmeier-Rust, M.D. (ed.)
Proceedings of the 1st International Open Workshop on Intelligent Personalization and
Adaptation in Digital Educational Games, Graz, Austria, October 14, pp. 55–66 (2009a)
2. de Freitas, S.: Learning in immersive worlds. A review of game-based learning (2006),
http://www.jisc.ac.uk/media/documents/programmes/
elearning_innovation/gaming%20report_v3.3.pdf
(retrieved August 28, 2007)
3. Van Eck, R.: Digital game-based learning. It’s not just the digital natives who are restless.
Educause Review, 17–30 (March/April 2006)
4. Prensky, M.: Digital game-based learning. McGraw-Hill, New York (2001)
5. Gee, J.P.: What video games have to teach us about learning and literacy. Palgrave
Macmillan, New York (2003)
6. Gee, J.P.: What video games have to teach us about learning and literacy (2nd revised and
updated edn.). Palgrave Macmillan, New York (2008)
7. Gee, J.P.: Learning by design: Good video games as learning machines. E-Learning and
Digital Media 2(1), 5–16 (2005)
8. Shaffer, D.W.: How computer games help children learn. Palgrave Macmillan, New York
(2006)
9. Kickmeier-Rust, M.D., Mattheiss, E., Steiner, C.M., Albert, D.: A psycho-pedagogical
framework for multi-adaptive educational games. International Journal on
Game-Based Learning 1(1), 45–58 (in press)
10. Habgood, J.: Wii don’t do edutainment. In: Proceedings of Game-based Learning 2009,
London, UK, March 19-20 (2009)
11. Demetriou, S.: Motivation in computer games: The impact of reward uncertainty on
learning. In: Gómez Chova, L., Martí Belenguer, D., Candel Torres, I. (eds.) Proceedings
of Edulearn 2010, Barcelona, Spain, July 5-7 (2010)
12. Fu, F.-L., Su, R.-C., Yu, S.-C.: EgameFlow: A scale to measure learners’ enjoyment of e-
learning games. Computers & Education 52(1), 101–112 (2009)
13. Oblinger, D.: Games and learning. Educause Quarterly Magazine 29(3), 5–7 (2006)
14. Kickmeier-Rust, M.D., Albert, D.: Micro adaptivity: Protecting immersion in didactically
adaptive digital educational games. Journal of Computer Assisted Learning 26, 95–105
(2010)
15. Csikszentmihalyi, M.: Flow: The psychology of optimal experience. Harper and Row,
New York (1990)
16. Nielsen, J.: Heuristic evaluation. In: Nielsen, J., Mack, R.L. (eds.) Usability Inspection
Methods. John Wiley & Sons, New York (1994)
17. de Freitas, S., Oliver, M.: How can exploratory learning with games and simulations
within the curriculum be most effectively evaluated? Computers and Education Special
Issue on Gaming 46, 249–264 (2006)
18. Law, E.L.-C., Kickmeier-Rust, M.D., Albert, D., Holzinger, A.: Challenges in the
development and evaluation of immersive digital educational games. In: Holzinger, A.
(ed.) USAB 2008. LNCS, vol. 5298, pp. 19–30. Springer, Heidelberg (2008)
19. Weibelzahl, S., Lippitsch, S., Weber, G.: Advantages, opportunities, and limits of
empirical evaluations: Evaluating adaptive systems. Künstliche Intelligenz 3(2), 17–20
(2002)
20. El-Nasr, M.S., Yan, S.: Visual Attention in 3D Video Games. In: Proceedings of ACE
2006. Hollywood, California (2006)
21. Law, E.L.-C., Kickmeier-Rust, M., Albert, D., Holzinger, A.: Challenges in the
development and evaluation of immersive digital educational games. In: Holzinger, A.
(ed.) USAB 2008. LNCS, vol. 5298, pp. 19–30. Springer, Heidelberg (2008)
22. Land, M.F.: Eye Movements and the Control of Actions in Everyday Life. Progress in
Retinal and Eye Research 25, 296–324 (2006)
23. Jennett, C., Cox, A.L., Cairns, P., Dhoparee, S., Epps, A., Tijs, T., Walton, A.: Measuring
and Defining the Experience of Immersion in Games. International Journal of Human
Computer Studies 66(9), 641–661 (2008)
The Online Gait Measurement for Characteristic Gait
Animation Synthesis

Yasushi Makihara1, Mayu Okumura1, Yasushi Yagi1, and Shigeo Morishima2


1 The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka,
Ibaraki, Osaka 567-0047, Japan
2 Department of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku,
Tokyo 169-8555, Japan
{makihara,okumura,yagi}@am.sanken.osaka-u.ac.jp,
morishima@mlab.phys.waseda.ac.jp

Abstract. This paper presents a method to measure online the gait features from
the gait silhouette images and to synthesize characteristic gait animation for an
audience-participant digital entertainment. First, both static and dynamic gait
features are extracted from the silhouette images captured by an online gait
measurement system. Then, key motion data for various gaits are captured and
new motion data are synthesized by blending the key motion data. Finally, the blend
ratios of the key motion data are estimated to minimize the gait feature errors
between the blended model and the online measurement. In experiments, the
effectiveness of the gait feature extraction was confirmed using 100 subjects
from the OU-ISIR Gait Database, and characteristic gait animations were created
based on the measured gait features.

1 Introduction
Recently, audience-participant digital entertainment has gained more attention, where
the individual features of participants or users are reflected in computer games and
Computer Graphics (CG) cinemas. At EXPO 2005 AICHI JAPAN [1], the Future Cast
System (FCS) [2] in the Mitsui-Toshiba pavilion [3] was presented as one of the
large-scale audience-participant digital entertainments. The system captures an
audience member’s facial shape and texture online, and a CG character’s face in the digital
cinema is replaced by the captured face. In addition, as an evolutional
version of the FCS, the Dive Into the Movie (DIM) project [4] tries to reflect not only the
individual features of face shape and texture but also those of voice, facial expression,
facial skin, body type, and gait. Among these, body type and gait are expected to
attract more attention from the audience because they are reflected in the whole body of
the CG character.
For the purpose of gait measurement, acceleration sensors[5][6][7][8] and Motion
Capture (MoCap) systems[9] have been widely used. These systems are, however,
unsuitable for an online gait measurement system because it takes much time for the
audience to wear the acceleration sensors or to attach the MoCap markers.


In the computer vision-based gait analysis area, both model-based
approaches [10][11] and appearance-based approaches [12][13] have been proposed,
which can measure gait features without any wearable sensors or attached markers.
In the model-based methods, a human body is expressed as articulated links or
generic cylinders and is fit to the captured image to obtain both static features like
link length and dynamic features like joint angles separately. Although these features
can be used for the gait feature measurement, the method is unsuitable for online
measurement because of its high computational cost and the difficulties of model fitting.
The appearance-based methods extract gait features directly from the captured
images without troublesome model fitting. In general, the extracted features are composites
of both static and dynamic components. Although the composite features are still
useful for gait-based person identification [10][11][12][13], they are unsuitable for
separate measurement of the static and dynamic components.
Therefore, we propose a method for the online measurement of intuitive static and
dynamic gait features from silhouette sequences, and also a method for
characteristic gait animation synthesis in the digital cinema. Side-view and front-view
cameras capture image sequences of the target subject’s straight walk, and a Gait Silhouette
Volume (GSV) is constructed via silhouette extraction, silhouette size normalization,
and registration. Then, both the static and dynamic components are measured
separately from the GSV. Because the proposed method extracts the gait features
directly from the silhouette sequence without model fitting, its computational cost is
much lower than that of the model-based methods, which enables online measurement
of the audience’s gait.

2 Gait Feature Measurement

2.1 GSV Construction

The first step in gait feature measurement is the construction of a spatiotemporal Gait
Silhouette Volume (GSV). First, gait silhouettes are extracted by background
subtraction and a silhouette image is defined by a binary image whose pixel value is 1
if it is inside the silhouette and is 0 otherwise. Second, the height and center values of
the silhouette region are computed for each frame. Third, the silhouette is scaled so
that the height is a pre-determined size, whilst maintaining the aspect ratio. In this
paper, the size is set to a height of hg = 60 pixels and a width of wg = 40 pixels.
Fourth, each silhouette is registered such that its center corresponds to the image
center. Finally, a spatio-temporal GSV is produced by stacking the silhouettes on the
temporal axis. Let f(x, y, n) be a silhouette value of the GSV at position (x, y) of the
nth frame.
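The following sketch outlines this GSV construction with OpenCV and NumPy; the simple background model, threshold, and centering details are assumptions, since they are not specified here.

# Sketch: building a Gait Silhouette Volume (GSV) from a frame sequence.
import cv2
import numpy as np

HG, WG = 60, 40   # normalized silhouette size used in the paper (height x width)

def silhouette(frame, background, thresh=30):
    """Binary silhouette by simple background subtraction (assumed method)."""
    diff = cv2.absdiff(frame, background)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 1, cv2.THRESH_BINARY)
    return mask

def normalize(mask):
    """Scale the silhouette to height HG keeping aspect ratio, center it in a WG x HG image."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return np.zeros((HG, WG), np.uint8)
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    scale = HG / crop.shape[0]
    resized = cv2.resize(crop, (max(1, int(crop.shape[1] * scale)), HG),
                         interpolation=cv2.INTER_NEAREST)
    out = np.zeros((HG, WG), np.uint8)
    x0 = max(0, (WG - resized.shape[1]) // 2)
    w = min(WG, resized.shape[1])
    out[:, x0:x0 + w] = resized[:, :w]
    return out

def build_gsv(frames, background):
    """Stack normalized silhouettes along the temporal axis: f(x, y, n)."""
    return np.stack([normalize(silhouette(f, background)) for f in frames], axis=-1)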

2.2 Estimation of Key Gait Phases

The next step is the estimation of two types of key gait phases: the Single Support Phase
(SSP) and the Double Support Phase (DSP), where the subject's legs and arms are the closest
together and the most spread apart, respectively. The SSP and DSP are estimated as the local
minimum and maximum of the second-order moment around the central vertical axis
of the GSV in the half gait cycle. The second-order moment at the n-th frame within the
vertical range of [ht, hb] is defined as

(1)

where xc is the horizontal center of the GSV. Because the SSP and DSP occur once per
half gait cycle, alternately, those at the i-th half gait cycle are obtained as follows:

(2)

where the initial values are set to zero, and the gait period gp is detected by
maximizing the normalized autocorrelation of the GSV along the temporal axis [14]. In
the following section, NSSP and NDSP represent the number of the SSP and DSP
respectively. Note that SSP and DSP focused on arms, legs, and entire body are
computed by defining the vertical range [ht, hb] appropriately. In this paper, the ranges
of arms and legs are defined as [0.33hg, 0.55hg] and [0.55hg, hg] respectively. Figure 2
shows the result of the estimation of SSP and DSP.
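Since Eqs. (1) and (2) are not reproduced in this text, the sketch below follows only the verbal description: a second-order moment around the central vertical axis per frame, and the gait period from the normalized autocorrelation along the temporal axis. The exact normalization, the search range for the period, and the per-half-cycle min/max search are assumptions.

# Sketch: second-order moment per frame and gait period from a GSV f(x, y, n).
import numpy as np

def second_order_moment(gsv, ht, hb):
    """Moment around the horizontal center xc for each frame, vertical band [ht, hb)."""
    h, w, n = gsv.shape
    xc = (w - 1) / 2.0
    x = np.arange(w) - xc
    band = gsv[ht:hb, :, :].astype(float)                 # restrict to the chosen body part
    return np.einsum('yxn,x->n', band, x ** 2)            # sum of f(x,y,n) * (x - xc)^2

def gait_period(gsv, min_p=15, max_p=60):
    """Gait period by maximizing normalized autocorrelation along the temporal axis."""
    v = gsv.reshape(-1, gsv.shape[-1]).astype(float)
    best_p, best_c = min_p, -1.0
    for p in range(min_p, min(max_p, v.shape[1] - 1)):
        a, b = v[:, :-p], v[:, p:]
        c = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if c > best_c:
            best_p, best_c = p, c
    return best_p

# SSP / DSP frames are then local minima / maxima of the moment within each half cycle.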

Fig. 1. Example of GSV

Fig. 2. SSP (left) and DSP (right) Fig. 3. Measurement of the waist size

2.3 Measurement of the Static Feature

Static features comprise the height and the waist size in width and bulge. When the static
features are measured, the GSV at the SSP is used to reduce the influence of the arm swing
and the leg bend. First, the numbers of silhouette pixels at height y of the side and
front GSVs are defined as the width w(y) and the bulge b(y), respectively, as shown in
Fig. 3. Then, their averages WGSV and BGSV within the waist ranges of heights [yWt,
yWb] and [yBt, yBb] are calculated respectively.

(3)

(4)

(5)

Then, given the height Hp on the original-size image, the waist size in width Wp
and bulge Bp on the image are computed as

(6)

Finally, given the distance l from the camera to the subject and the focal length f of
the camera, the real height H and the waist size in the width W and the bulge B are

(7)
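Equations (6) and (7) are not reproduced here, so the following sketch assumes a standard pinhole relation (real size ≈ pixel size × distance / focal length in pixels) with made-up values; the paper's exact formulation may differ.

# Sketch: converting pixel measurements to real-world sizes with a pinhole model.
# Assumption: real = pixels * distance / focal_length, with the focal length in pixels.

def to_real_size(pixels, distance_m, focal_px):
    return pixels * distance_m / focal_px

Hp, Wp, Bp = 410.0, 95.0, 60.0     # hypothetical height/width/bulge in pixels
l, f = 4.0, 1000.0                 # hypothetical camera distance [m] and focal length [px]
H, W, B = (to_real_size(v, l, f) for v in (Hp, Wp, Bp))
print(f"H={H:.2f} m, W={W:.2f} m, B={B:.2f} m")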

For the statistical analysis of the static features, 100 subjects in the OU-ISIR Gait
Database [15] were chosen at random, and the front- and left-side-view cameras were used
for the measurement. Figure 4(a) shows the relation between the measured height and the
range of heights reported in the questionnaire. The measured heights almost all lie within the
questionnaire range. The measured heights of some short subjects (children)
are, however, outside the lower bound of the questionnaire result. These errors may result
from self-enumeration errors due to the rapid growth rate of the children.

Fig. 4. The measurements and questionnaire results



Figure 4(b) shows the relation between the measured waist size in bulge and the
weight reported in the questionnaire, and it indicates that the waist size correlates with
the weight to some extent. For example, the waist sizes of the light subject A and the heavy
subject B are small and large, respectively, as shown in Fig. 4(b). As an exceptional
example, the waist size of the light subject C is overestimated because he/she
wears a down jacket.

Fig. 5. Arm swing areas. In (a), front and back lines are depicted as red and blue lines,
respectively. In (b), front and back arm swing areas are painted in red and blue, respectively.

2.4 Measurement of the Dynamic Feature

Step: Side-view silhouettes are used for step estimation. First, the walking speed v is
computed from the distance between the silhouette positions at the first and the last frame
and the elapsed time. Then, the average step length is computed by multiplying the
walking speed v by the half gait cycle gp/2.
Arm swing: The side-view GSV from an SSP to the next DSP is used for the arm swing
measurement. First, the body front and back boundary lines (let them be lf and lb,
respectively) are extracted from the gait silhouette image at the SSP, and then the front and
back arm swing candidate areas RAf and RAb are set, respectively, as shown in Fig. 5(a).
Next, the silhouette sweep image Fi(x, y) is calculated for the i-th interval from the SSP to
the next DSP as

(8)

(9)

where sign(·) is the sign function. Finally, the front and back arm swing areas are computed
as the areas of the swept pixels of Fi(x, y) in RAf and RAb, respectively.

(10)

(11)
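A compact sketch of this sweep-image computation is given below, following the verbal description of Eqs. (8)-(11); the exact definitions of the candidate masks and the sign-based sweep are assumptions.

# Sketch: arm swing areas from a side-view GSV, following the verbal description
# of Eqs. (8)-(11) (the exact definitions are not reproduced in this text).
import numpy as np

def sweep_image(gsv, n_ssp, n_dsp):
    """Pixels swept by the silhouette between an SSP frame and the next DSP frame."""
    clip = gsv[:, :, n_ssp:n_dsp + 1]
    return (clip.sum(axis=2) > 0).astype(np.uint8)    # sign of the temporal sum

def arm_swing_areas(gsv, n_ssp, n_dsp, mask_front, mask_back):
    """Swept-pixel counts inside the front/back candidate regions R_Af and R_Ab."""
    F = sweep_image(gsv, n_ssp, n_dsp)
    return int((F * mask_front).sum()), int((F * mask_back).sum())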

For the statistical analysis of the arm swing, the same 100 subjects as in the static feature
analysis are used. Figure 6 shows the result of the measured front arm swing area.
We can see that the measured arm swings are widely distributed; that is, they are useful
cues for synthesizing characteristic gait animation. For example, the arm swings of subjects
A (small), B (middle), and C (large) can be confirmed in the corresponding gait
silhouette images at the DSP on the graph. In addition, though the asymmetry of the arm
swing is generally not so pronounced, some subjects exhibit asymmetry, such as Subject D.
Stoop: We propose two methods of stoop measurement: slope-based and curvature-
based methods. In both methods, a side-view gait silhouette image at an SSP is
used to reduce the influence of the arm swing, and the back contour is extracted from
the image. Then, the slope of the back line is computed by fitting the line lb to the
back contour, and the curvature is obtained as the maximum k-curvature of the back
contour (k is set to 8 empirically).
For the statistical analysis of the stoop, the same 100 subjects are used. Figure 7 shows
the result of the measured stoop. By measuring both the slope and the curvature of
the back contour, various kinds of stoop are measured, such as large slope and
small curvature (e.g., Subject A), large slope and large curvature (e.g., Subject B), and
small slope and large curvature (e.g., Subject C).

Fig. 6. Distribution of front arm swing areas and its asymmetry
Fig. 7. Distribution of stoop with slope and curvature

3 Gait Animation Synthesis

3.1 Motion Blending

Basically, a new gait animation is synthesized by blending a small number of motion
data called key motions in the same way as [16]. First, a subject was asked to walk on
a treadmill at a speed of 4 km/h and the 3D motion data was captured using the
motion capture system “Vicon” [9]. The motion data is composed of n walking styles
with variations in terms of step width, arm swing, and stoop. Second, two steps of the
sequence are clipped from the whole sequence to produce the key motions M = {mi}.
Third, because all the motion data M need to be synchronized before blending, the
synchronized motions S = {si} are generated by using time warping [16]. Finally, a
blended motion is synthesized as

(12)

where αi is a blend ratio for i-th key motion data and a set of blend ratios is denoted
by α = {αi}. The remaining issue is how to estimate the blend ratios based on the
appearance-based gait features measured in the proposed framework and it is
described in the following section.

3.2 Blend Ratio Estimation

First, a texture-less CG image sequence for each key motion is rendered as shown in
Fig. 8. Second, the m-dimensional appearance-based gait feature vi for the i-th key motion
data si is measured in the same way as described before. Then, the gait
feature vector of the blended motion data is assumed to be approximated as a weighted
linear sum of those of the key motions vi

(13)

Then, the blend ratio α is estimated so as to minimize the errors between the gait
features of the blended model and the online measured gait features v (called the
input vector below). The minimization problem is formulated as the following convex
quadratic programming problem.

(14)

The above minimization problem is solved with the active set method.
Moreover, when the number of key motion data n is larger than the dimension
of the gait features m, the solution to Eq. (14) is indeterminate. On the other hand, Eq.
(13) is just an approximate expression because the mapping from the motion data
domain to the appearance-based gait feature domain is generally nonlinear. Therefore,
it is desirable to choose gait features nearer to the input feature.
Thus, another cost function is defined as the inner product of the blend ratio α
and the cost weight vector w of the Euclidean distances from the features of each key
motion to the input feature, defined as w = [||v1 - v||, ..., ||vn - v||]T. Given one of
the solutions to Eq. (14) as αCQP and the resultant blended model vα = V αCQP, the
minimization problem can be written as a linear programming problem.

(15)

Finally, this linear programming is solved with the simplex method.
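The following sketch reproduces this two-stage estimation with generic SciPy solvers in place of the active set and simplex implementations. The nonnegativity and sum-to-one constraints on the blend ratios are assumptions inferred from Table 2, and the example matrix uses only three of the key motions from Table 1 for illustration.

# Sketch: two-stage blend-ratio estimation with SciPy solvers instead of the
# paper's active-set and simplex implementations. Assumptions: blend ratios are
# nonnegative and sum to one; V holds one key-motion feature vector per column
# and v is the measured (z-normalized) input feature vector.
import numpy as np
from scipy.optimize import minimize, linprog

def estimate_blend_ratios(V, v):
    m, n = V.shape
    # Stage 1 (Eq. 14): convex quadratic program  min ||V a - v||^2,  a >= 0, sum(a) = 1.
    obj = lambda a: np.sum((V @ a - v) ** 2)
    cons = [{'type': 'eq', 'fun': lambda a: a.sum() - 1.0}]
    res = minimize(obj, np.full(n, 1.0 / n), method='SLSQP',
                   bounds=[(0.0, None)] * n, constraints=cons)
    a_cqp = res.x
    v_alpha = V @ a_cqp
    # Stage 2 (Eq. 15): among ratios reproducing v_alpha, prefer key motions whose
    # features are close to the input (linear program with cost w_i = ||v_i - v||).
    w = np.linalg.norm(V - v[:, None], axis=0)
    A_eq = np.vstack([V, np.ones((1, n))])
    b_eq = np.append(v_alpha, 1.0)
    lp = linprog(c=w, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, None)] * n)
    return lp.x if lp.success else a_cqp

# Example: three key motions (Arm swing L, Arm swing S, Step L) from Table 1,
# with Subject A's measured features as the input vector.
V = np.array([[3.50, -1.00, -0.34], [0.51, -1.44, 3.00], [-0.55, -0.63, -1.27]])
print(estimate_blend_ratios(V, np.array([3.40, 0.34, 0.23])))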



3.3 Experimental Results

In this experiment, three gait features, arm swing, step, and stoop, are reflected in the
CG characters, and seven key motion data and three test subjects are used, as shown in
Table 1. Note that the gait features are z-normalized so as to adjust the scales
among them. The resulting blend ratios are shown in Table 2. For example, Subject A,
with a large arm swing, gains the largest blend ratio for Arm Swing L, and this results in
a synthesized silhouette of the blended model with a large arm swing (Fig. 8). The
gait features of Subject B (large step and small arm swing) and Subject C (large stoop and
large arm swing) are also successfully reflected, both in terms of the blend ratios and of
the synthesized silhouettes, as shown in Table 2 and Fig. 8.
Moreover, the gait animation synthesis in conjunction with texture mapping was
realized in the audience-participant digital movie, as shown in Fig. 9, and it plays a
certain role in identifying the audience member in the digital movie.

Table 1. Gait features for key motions and inputs

Key motion    Arm swing   Step    Stoop
Arm swing L    3.50        0.51   -0.55
Arm swing S   -1.00       -1.44   -0.63
Step L        -0.34        3.00   -1.27
Step S        -0.30       -2.30    1.12
Stoop          1.11       -0.61    2.50
Recurvature    1.48       -0.68   -3.00
Average        0.30       -1.02   -1.49

Input         Arm swing   Step    Stoop
A              3.40        0.34    0.23
B             -0.84        1.13   -1.23
C              1.83        0.52    2.46

Fig. 8. The synthetic result



Table 2. The blending ratio of the key motion data

Key motion / Input   A      B      C
Arm swing L          0.83   0      0.16
Arm swing S          0      0.42   0
Step L               0      0.58   0.04
Step S               0      0      0
Stoop                0.17   0      0.80
Recurvature          0      0      0
Average              0      0      0

Fig. 9. Screen shot in digital movie

4 Conclusion
This paper has presented a method to measure online the static and dynamic gait features
separately from gait silhouette images. A sufficient distribution of the gait
features was observed in the statistical analysis with a large-scale gait database.
Moreover, a method of characteristic gait animation synthesis was proposed, with
blend ratio estimation for the key motion data. The experimental results show that gait
features like arm swing, step, and stoop are effectively reflected in the synthesized
blended model.

Acknowledgement. This work is supported by the Special Coordination Funds for
Promoting Science and Technology of Ministry of Education, Culture, Sports,
Science and Technology.

References
1. EXPO 2005 AICHI JAPAN, http://www.expo2005.or.jp/en/
2. Morishima, S., Maejima, A., Wemler, S., Machida, T., Takebayashi, M.: Future cast
system. In: ACM SIGGRAPH 2005 Sketches, SIGGRAPH 2005. ACM, New York (2005)
3. MITSUI-TOSHIBA Pavilion,
http://www.expo2005.or.jp/en/venue/pavilionprivateg.html
4. Morishima, S.: Yasushi Yagi, S.N.: Instant movie casting with personality: Dive into the
movie system. In: Proc. of Invited Workshop on Vision Based Human Modeling and
Synthesis in Motion and Expression, Xian, China, pp. 1–10 (September 2009)
5. Gafurov, D., Helkala, K., Sondrol, T.: Biometric gait authentication using accelerometer
sensor. Journal of Computer 1(7), 51–59 (2006)
6. Gafurov, D., Snekkenes, E., Bours, P.: Improved gait recognition performance using cycle
matching. In: 2010 IEEE 24th Int. Conf. on Advanced Information Networking and
Applications Workshops (WAINA), pp. 836–841 (2010)
7. Rong, L., Jianzhong, Z., Ming, L., Xiangfeng, H.: A wearable acceleration sensor system
for gait recognition. In: 2nd IEEE Conf. on Industrial Electronics and Applications, pp.
2654–2659 (2007)
8. Rong, L., Zhiguo, D., Jianzhong, Z., Ming, L.: Identification of individual walking patterns
using gait acceleration. In: The 1st Int. Conf. on Bioinformatics and Biomedical
Engineering, pp. 543–546 (2007)
9. Motion Capture Systems Vicon, http://www.crescentvideo.co.jp/vicon/d
10. Yam, C., Nixon, M., Carter, J.: Extended model based automatic gait recognition of
walking and running. In: Proc. of the 3rd Int. Conf. on Audio and Video-based Person
Authentication, Halmstad, Sweden, pp. 278–283 (June 2001)
11. Cuntoor, N., Kale, A., Chellappa, R.: Combining multiple evidences for gait recognition.
In: Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 33–36
(2003)
12. Sarkar, S., Phillips, J., Liu, Z., Vega, I., Grother, P., Bowyer, K.: The humanid gait
challenge problem: Data sets, performance, and analysis. Trans. of Pattern Analysis and
Machine Intelligence 27(2), 162–177 (2005)
13. Han, J., Bhanu, B.: Individual recognition using gait energy image. Trans. on Pattern
Analysis and Machine Intelligence 28(2), 316–322 (2006)
14. Makihara, Y., Sagawa, R., Mukaigawa, Y., Echigo, T., Yagi, Y.: Gait recognition using a
view transformation model in the frequency domain. In: Leonardis, A., Bischof, H., Pinz,
A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 151–163. Springer, Heidelberg (2006)
15. OU-ISIR Gait Database,
http://www.am.sanken.osaka-u.ac.jp/gaitdb/index.html
16. Kovar, L., Gleicher, M.: Flexible automatic motion blending with registration curves. In:
ACM SIGGRAPH 2003, pp. 214–224 (2003)
Measuring and Modeling of Multi-layered
Subsurface Scattering for Human Skin

Tomohiro Mashita1 , Yasuhiro Mukaigawa2, and Yasushi Yagi2


1 Cybermedia Center Toyonaka Educational Research Center, Osaka University,
1-32 Machikaneyama, Toyonaka, Osaka 560-0043, Japan
2 The Institute of Scientific and Industrial Research, Osaka University,
8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
mashita@ime.cmc.osaka-u.ac.jp, {mukaigaw,yagi}@am.sanken.osaka-u.ac.jp

Abstract. This paper introduces a Multi-Layered Subsurface Scattering
(MLSSS) model to reproduce an existing human's skin in a virtual space.
The MLSSS model consists of a three-dimensional layer structure, with
each layer an aggregation of simple scattering particles. The MLSSS
model expresses a directionally dependent and inhomogeneous radiance
distribution. We constructed a measurement system consisting of four
projectors and one camera. The parameters of the MLSSS model were estimated
using the measurement system and geometric and photometric analysis.
Finally, we evaluated our method by comparing rendered images and real
images.

1 Introduction
Dive into the Movie[1] is a system which, by scanning personal features such as
the face, body shape, gait motion and so on, enables the members of an audience
to appear in the movie as human characters. Technologies to reproduce a person
in virtual space are important for these types of systems. In particular, good
reproducibility of the skin is necessary for the further expression of personal
features because it includes several aspects such as transparency, fineness, color,
wrinkles, hairs, and so on.
Simulation of subsurface scattering is important for increasing the quality
of the reproduced skin because some of the characteristics of the skin are the
result of optical behavior under the skin’s surface. Subsurface scattering is a
phenomenon where light incident on a translucent material is reflected multiple
times in the material and radiated to a point other than the incident point. If
the transparency and inner components of the skin are expressed by simulating
subsurface scattering, the expression of personal features will be improved.
Measurement and simulation of subsurface scattering are challenging prob-
lems. The Monte Carlo simulation method as typified by MCML [2] is one
approach for subsurface scattering. This approach aims to simulate photon be-
havior based on physics and requires extravagant computational resources. When
expressing complex media like human skin, it is also difficult to measure the pa-
rameters. Diffusion approximation [3] enables effective simulation of dense and


homogeneous media. However, there are limitations in the expression for repro-
ducing human skin because this approximation ignores the direction of incident
and outgoing light and assumes a homogeneous medium. Jensen et al. covered
expressiveness using a combination of the single-scattering and dipole diffusion
model [4]. Tariq et al. [5] expressed the inhomogeneity as Spatially Varying Sub-
surface Scattering. They suggested that an inhomogeneous rendering method is
important for the reproduction of real skin.
An expression of inhomogeneous scattering media and scattering dependent
on the incoming and outgoing light direction is necessary to achieve a more
expressive reproduction of real skin. The parameters of the scattering model must
be measurable to reproduce real skin. In this paper we propose a Multi-Layered
Subsurface Scattering model to achieve this requirement. We experimentally
evaluated our system by comparing real images and images synthesized using
the estimated parameters of the subsurface scattering.

Contribution
– The proposed model expresses a subsurface scattering model which is de-
pendent on the direction of incoming and outgoing light. This directional
dependency is achieved by the combination of simple scattering and a lay-
ered structure.
– The proposed model expresses an inhomogeneous scattering medium
which is necessary for the reproduction of human skin including personal
features.

Related Work
Rendering Method of Sub-Surface Scattering. Jensen proposed photon
mapping [6] which traced individual photons for simulating volumetric subsur-
face scattering. Later, Jensen et al. proposed a dipole approximation [4] based on
a diffusion approximation. Dipole approximation has been improved to a multi-
pole model[7]. Ghosh et al.[8] proposed a practical method for modeling layered
facial reflectance consisting of specular reflectance, single scattering and shallow
and deep subsurface scattering. These methods were able to provide positive re-
sults for rendering photorealistic skin. The above methods are based on diffusion
approximation. However, the importance of directional dependency is discussed.
Donner et al. showed the disadvantage of diffusion approximation and proposed
a spatially- and directionally-dependent model [9]. Mukaigawa et al. analyzed
the anisotropic distribution of the lower-order scattering in 2D homogeneous
scattering media [10].

Method for Measuring Subsurface Scattering. Various methods have
been proposed for measuring subsurface scattering in translucent objects directly
using a variety of lighting devices such as a point light source [11], a laser
beam[4,12], a projector[5], and a fiber optic spectrometer[13].

Fig. 1. Skin structure (hair, skin surface lipid film, fine wrinkles, blood vessels, corneum,
melanocytes, and collagen fibers within the epidermis, dermis, and hypodermis layers)


Fig. 2. SRD of a point illuminated hand

2 Expressing Spatial Radiance Distribution by Multi-Layered Subsurface Scattering
2.1 Spatial Radiance Distribution
Human skin has a complicated multi-layered structure and each layer consists
of many components which have their own optical properties. The main compo-
nents and skin structure are shown in Fig. 1. This complicated structure makes
it difficult to render a photorealistic human face or skin and to measure the
optical properties of the components of human skin. The structure of human
skin is inhomogeneous in three dimensions. This inhomogeneity is reflected in
the optical behavior of incident light. The inhomogeneity of optical behavior is
observed as a Spatial Radiance Distribution (SRD) in the surface of the human
skin.
Figure 2 shows the SRD in a hand by point illumination using a green laser
pointer. Figure 2 left shows the experimental condition with general light. Fig-
ure 2 right shows the SRD with only the light of a laser pointer. In the upper
row, laser light is incoming from the top of the hand. In the lower row, laser light
is incoming from the left side of the hand. Obviously, the SRD in human skin is
not homogeneous and depends on the direction of incoming light. We focus on
the SRD and this paper describes the recreation of the SRD.

2.2 Concept of a Multi-Layered Subsurface Scattering Model


A simulation model to express the inhomogeneous and directionally dependent
SRD is necessary for reproducing human skin. To reproduce a complicated SRD,
an approximation model of the three-dimensional and inhomogeneous optical
behavior is required. Furthermore, the parameters of the approximation model
must be estimable. The important factor is not the number of parameters but
the number of estimable parameters.
We propose the Multi-Layered Subsurface Scattering (MLSSS) model. The
MLSSS model satisfies the above conditions when using a combination of the
multi-layered structure and a simple scattering model. The three-dimensional
optical behavior is approximated by the layer structure. The observed SRD is
expressed by a multiplication of the scattering in each layer. We define the
scattering in each layer as simple isotropic scattering. Inhomogeneity of the SRD
is expressed by each scattering particle having its own variables.

Fig. 3. Concept of MLSSS model

2.3 Details of the Multi-Layered Subsurface Scattering Model


The concept of this model is shown in Fig. 3. Figure 3 right shows incident light
scattered by a translucent material, where the horizontal lines represent layers
of the MLSSS model and arrows represent incident and outgoing light. The
graph in the upper left shows the observed asymmetric SRD using the camera.
The MLSSS model assumes that the asymmetric SRD is a mixture of simple
distributions in each layer as shown in Fig. 3 left. The MLSSS model expresses
a SRD that depends on the direction of the incident light and the viewpoint of
the observer.
The Bidirectional Scattering Surface Reflectance Distribution Function (BSS-
RDF) S expresses the relationship between incident radiance LI and outgoing
radiance LO with incident and outgoing directions ωI , ω O and points xI , xO
as given in the following equation:
 
Lo(xO, ωO) = ∫Ω ∫A S(xI, ωI; xO, ωO) LI(xI, ωI) (n(xI) · ωI) dωI dA(xI),   (1)

where Ω is a sphere, A is the illuminated area, n(xI) is the normal vector at the point
xI, and ωI, ωO, and n(xI) are unit vectors. If an isotropic SRD is assumed,
the direction of incoming and outgoing light can be ignored. The MLSSS model
proposed in this paper is an approximation of the BSSRDF but does not ignore
the incoming and outgoing direction.
We assume incident radiance LI (xI , ω I ) at the point xI from the angle ω I .
In the MLSSS model, the observed point in each layer xO,l shifts in proportion
to the depth of the layer dl , where l is layer number. The observed outgoing
radiance LO is expressed as

Lo (xO , ωO ) = Σl LO,l (xO,l ), (2)

where
xO,l = xO + (dl / (n · ωO)) ωO.   (3)
Thus the dependence on the outgoing direction of the MLSSS is defined.
The SRD in each layer is assumed to be an isotropic distribution and inde-
pendent of the direction of incident light similar to the diffusion approximation.
The BSSRDF function in each layer is expressed as D(xO , xI ), where xI is the
incident point. Thus the relationship between the incident and outgoing radiance
in each layer is
LO,l = D(xO,l , xI,l )(n(xI ), ω I )LI,l , (4)
where ω I is the direction of the incoming light, n(xI ) is the normal vector at
xI , LI,l is a part of the incident light scattered in layer l.
The incident point in each layer also shifts in proportion to the depth of
the layer dl . The incident point in a layer xI,l is obtained from the following
equation:
xI,l = xI + (dl / (n · ωI)) ωI.   (5)
LI,l is obtained by
LI,l = (wl / (wa + Σl wl)) LI,   (6)
where wl is the weight of light scattered in layer l and wa is the weight of
absorbed light.
Finally, the relationship between the outgoing radiance LO (xO , ω O ) and the
incident radiance LI (xI , ω) is expressed as

LO (xO , ω O ) = Σl D(xO,l , xI,l )(n(xI ), ω I )LI (xI , ω). (7)

The case of general lighting is expressed as

LO(xO, ωO) = ∫Ω ∫A Σl D(xO,l, xI,l) (n(xI) · ωI) LI(xI, ω) dω dA(xI).   (8)

The BSSRDF function in the MLSSS model is

S(xI, ωI; xO, ωO) = Σl D(xO,l, xI,l) (n(xI) · ωI) wl / (wa + Σl wl).   (9)
This BSSRDF function retains the dependence on direction and consists of
isotropic scattering, layer distance, and weight parameters.
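As a hedged illustration of Eqs. (2)-(9), the sketch below evaluates the outgoing radiance for a single incident ray on a locally flat patch, using an isotropic 2D Gaussian for the per-layer distribution D (Gaussians are used later for the parameter estimation). The layer depths, weights, sigmas, and the flat-patch geometry are illustrative assumptions.

# Sketch: evaluating the MLSSS outgoing radiance for one incident ray on a locally
# flat patch. D is an isotropic 2D Gaussian per layer; the layer parameters and the
# flat geometry are assumptions for illustration only.
import numpy as np

def gaussian_D(xo, xi, sigma):
    r2 = np.sum((xo - xi) ** 2)
    return np.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)

def mlsss_radiance(xo, xi, wo, wi, n, L_in, layers, w_absorb):
    """layers: list of (depth d_l, weight w_l, sigma_l); wo/wi/n are unit vectors."""
    w_total = w_absorb + sum(w for _, w, _ in layers)
    cos_i = float(np.dot(n, wi))
    L_out = 0.0
    for d, w, sigma in layers:
        xo_l = xo + wo * d / max(np.dot(n, wo), 1e-6)   # Eq. (3): shifted exit point
        xi_l = xi + wi * d / max(cos_i, 1e-6)           # Eq. (5): shifted entry point
        L_out += gaussian_D(xo_l, xi_l, sigma) * cos_i * L_in * (w / w_total)
    return L_out

# Example: three layers at increasing depth, light entering a short distance from the exit point.
layers = [(0.0, 0.5, 0.5), (0.8, 0.3, 1.5), (2.0, 0.1, 3.0)]
n = np.array([0.0, 0.0, 1.0])
wi = np.array([0.3, 0.0, 0.95]); wi /= np.linalg.norm(wi)
wo = np.array([-0.2, 0.1, 0.97]); wo /= np.linalg.norm(wo)
print(mlsss_radiance(np.zeros(3), np.array([2.0, 0.0, 0.0]), wo, wi, n, 1.0, layers, 0.1))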

Fig. 4. MLSSS measurement system ((a) hardware and (b) configuration of camera and
projectors) and samples of captured images ((c) structured light, (d) high frequency,
(e) line sweeping)

Fig. 5. Schematic diagram of parameter estimation (observed intensity decomposed into Gaussians)

3 Measuring System
3.1 Hardware
We have constructed an apparatus for capturing facial images with pattern
projection and a system for parameter estimation by geometric and photometric
analysis.
The measurement system is shown in Fig. 4(a) and (b). It
consists of 4 LCD projectors and a camera. The inside of the measurement
system is a black box with a hole for inserting a face and 4 holes for the
projectors. The size of the box is 90 cm × 90 cm × 90 cm. The camera is a Lw160c
(Lumenera) with 1392 × 1042 pixels and 12 bits of valid data per pixel. The
projectors are EMP-X5 (EPSON) with 1024 × 768 pixels and 2200 lm. The positions
of the camera and projectors are shown in Fig. 4(b).

3.2 Geometric and Photometric Analysis


We describe the parameter estimation for the MLSSS model. The geometric pa-
rameters estimated are ω O , ωI , n, and the shape of a face. The photometric
parameters estimated are the σ and wl included in the scattering particles, because
we use a Gaussian distribution D(·). Figure 6 shows the flow of the geometric
and photometric analysis.

Fig. 6. Geometric and photometric analysis (from calibration and input images via coded
structured light, direct component separation, and four-light-source photometric stereo to the
surface geometry, normals, separated radiance distributions, and per-layer distribution parameters)
A structured light, slit pattern and high-frequency pattern are projected from
each projector. Figures 4(c), (d), and (e) show samples of captured images with
projected patterns. The shape of the face is reconstructed using the coded struc-
tured light projection method[14]. The captured images with high-frequency
patterns are used for direct component separation [15]. The separated direct
components are used for the normal vector estimation using the four light source
photometric stereo method [16].
We extracted a one dimensional SRD from this geometric information and the
slit line projected images. We took this one dimensional radiance distribution as
a distribution mixture of the simple radiance distributions in each layer. Figure 5
shows a schematic diagram of the photometric parameter estimations. Scattering
parameters are estimated by an EM algorithm.
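The sketch below shows a minimal weighted EM loop that decomposes a 1D radiance profile into a small Gaussian mixture. It is only a stand-in for the decomposition step: the actual estimation ties each component to a layer and uses the measured profiles, whereas the data and component count here are synthetic.

# Sketch: decomposing a 1D radiance profile into a Gaussian mixture with a small
# weighted EM loop (the profile intensities act as sample weights).
import numpy as np

def em_gaussian_mixture(x, intensity, k=3, iters=100):
    """Fit k Gaussians (weight, mean, sigma) to a nonnegative 1D profile."""
    w = intensity / intensity.sum()
    pi = np.full(k, 1.0 / k)
    mu = np.linspace(x.min(), x.max(), k)
    sig = np.full(k, (x.max() - x.min()) / (4.0 * k))
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample position.
        pdf = np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
        r = pi * pdf
        r /= r.sum(axis=1, keepdims=True) + 1e-12
        # M-step: weighted parameter updates.
        nk = (w[:, None] * r).sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (w[:, None] * r * x[:, None]).sum(axis=0) / nk
        sig = np.sqrt((w[:, None] * r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return pi, mu, sig

# Example: a synthetic asymmetric profile decomposed into 3 components.
x = np.linspace(-10, 10, 200)
profile = np.exp(-x ** 2 / 2) + 0.4 * np.exp(-(x - 3) ** 2 / 8)
print(em_gaussian_mixture(x, profile, k=3))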

4 Experiments

We conducted experiments to evaluate our model and measuring system by
rendering. The conditions of rendering are as follows. The camera position was
fixed and the same as when the image was captured. The light position was not
the same as when the image was captured. To evaluate the MLSSS model, we
used captured images with three light sources for parameter estimation, and the
remaining unused images were used for comparison with the rendered images.
We rendered the image of a face with the camera and light position identical
to when the image was captured. The mixture distribution was decomposed by
an EM algorithm with 20 mixtures.

Fig. 7. Synthesized image with stripe projection ((a) MLSSS, (b) Direct, (c) MLSSS + Direct,
(d) Real)

Fig. 8. A profile of Fig. 7 in the red plane at y = 150 (curves: Real R, Synth R, MLSSS R,
Direct R)

4.1 Evaluation of Anisotropic Scattering

We rendered an image with a stripe light to evaluate the expression of anisotropic
scattering. Figure 7(a) shows the rendered subsurface scattering. Figure 7(b)
shows the separated direct component of a real image. Figure 7(c) shows the
sum of Fig. 7 (a) and Fig. 7(b). Figure 7 (d) shows a real image with the same
lighting conditions as the rendered image. We can compare Fig. 7(c) to Fig. 7(d)
and it is obvious that the rendered subsurface scattering using the MLSSS model
enables a reproduction of the directionally dependent subsurface scattering.
Fig. 9. Rendered image with point light source

The red color’s profile from left to right of the image is shown in Fig. 8.
In Fig. 8, the position of the light source is on the right side. The pixels with
low values are not illuminated. We can see the effect of subsurface scattering
from the boundaries between the illuminated and non-illuminated area. There
are differences in the illuminated area’s intensity between the Real and Synthe-
sized images. We think that the parameters of subsurface scattering include the
influence of the position of the light source.

4.2 Evaluation of Inhomogeneity


We rendered an image of subsurface scattering with a point light source to evalu-
ate the expression of inhomogeneity. Figure 9 shows the result of rendering with
a point light source. Figure 9(a) is a synthesized indirect component using the
MLSSS model. Figure 9(b) is the direct component of the real image. Figure 9(c)
is Fig. 9(a) + Fig. 9(b). Figure 9(d) is rendered using the same parameters for all
the scattering particles in each layer. The parameters for Fig. 9(d) are the means
of the respective parameters. Figure 9(e) is a real image with the same light source for
comparison. We can see the blood vessels, facial hair roots, and inhomogeneous
redness in Fig. 9(a) and Fig. 9(c). However, we cannot see these components of
the inside of the skin in Fig. 9(d). It is obvious that the inhomogeneity of
subsurface scattering is important for the reality of rendered skin. However, the
direct component is also important to express the texture of real skin.

5 Conclusions
In this paper, we proposed the MLSSS model. This model expresses the SRD
of the surface of translucent material depending on the direction of the incident
and outgoing light. The MLSSS model has pixel-level variation of the scattering
parameters. We constructed a measurement system consisting of a camera and
four projectors. The parameters of the MLSSS model are estimated from the
captured images by geometric and photometric analysis. In the experiments, images
of subsurface scattering were rendered and compared with real images. Future
work includes investigating other scattering models and layer structures.

References
1. Morishima, S.: Dive into the Movie -Audience-driven Immersive Experience in the
Story. IEICE Trans. Information and Systems E91-D(6), 1594–1603 (2008)
2. Wang, L., Jacques, S.L., Zheng, L.: Mcml–monte carlo modeling of light transport
in multi-layered tissues. Computer Methods and Programs in Biomedicine 47(2),
131–146 (1995)
3. Stam, J.: Multiple scattering as a Diffusion Process. In: Eurographics Workshop
in Rendering Techniques 1995, pp. 51–58 (1995)
4. Jensen, H.W., Marschner, S.R., Levoy, M., Hanrahan, P.: A practical model for
subsurface light transport. In: SIGGRAPH 2001, pp. 511–518 (2001)
5. Tariq, S., Gardner, A., Llamas, I., Jones, A., Debevec, P., Turk, G.: Efficient esti-
mation of spatially varying subsurface scattering parameters. In: VMV 2006, pp.
165–174 (2006)
6. Jensen, H.W.: Realistic Image Synthesis using Photon Mapping. AK Peters, Welles-
ley (2001)
7. Donner, C., Jensen, H.W.: Light diffusion in multi-layered translucent materials.
In: SIGGRAPH 2005, pp. 1032–1039 (2005)
8. Ghosh, A., Hawkins, T., Peers, P., Frederiksen, S., Debevec, P.E.: Practical model-
ing and acquisition of layered facial reflectance. In: SIGGRAPH Asia 2008 (2008)
9. Donner, C., Lawrence, J., Ramamoothi, R., Hachisuka, T., Jensen, H.W., Nayar,
S.K.: An Empirical BSSRDF Model. In: SIGGRAPH 2009 (2009)
10. Mukaigawa, Y., Yagi, Y., Raskar, R.: Analysis of Light Transport in Scattering
Media. In: CVPR (2010)
11. Mukaigawa, Y., Suzuki, K., Yagi, Y.: Analysis of subsurface scattering under
generic illumination. In: ICPR 2008 (2008)
12. Goesele, M., Lensch, H.P.A., Lang, J., Fuchs, C., Seidel, H.P.: Disco - acquisition
of translucent objects. In: SIGGRAPH 2004, pp. 835–844 (2004)
13. Weyrich, T., Matusik, W., Pfister, H., Bickel, B., Donner, C., Tu, C., McAndless,
J., Lee, J., Ngan, A., Jensen, H.W., Gross, M.: Analysis of human faces using a
measurement-based skin reflectance model. In: SIGGRAPH 2006, pp. 1013–1024
(2006)
14. Inokuchi, S., Sato, K., Matsuda, F.: Range imaging system for 3-D object recog-
nition. In: ICPR, pp. 806–808 (1984)
15. Nayar, S.K., Krishnan, G., Grossberg, M.D., Raskar, R.: Fast separation of di-
rect and global components of a scene using high frequency illumination. In: SIG-
GRAPH 2006, pp. 935–944 (2006)
16. Barsky, S., Petrou, M.: The 4-source photometric stereo technique for three-
dimensional surfaces in the presence of highlights and shadows. IEEE Transactions
on Pattern Analysis and Machine Intelligence 25(10), 1239–1252 (2003)
An Indirect Measure of the Implicit Level of Presence
in Virtual Environments

Steven Nunnally1 and Durell Bouchard2


1 University of Pittsburgh School of Information Science
smn34@pitt.edu
2 Roanoke College Department of Math, Computer Science, and Physics
bouchard@roanoke.edu

Abstract. Virtual Environments (VEs) are a common occurrence for many
computer users. Considering their spreading usage and rapid development, it is
ever more important to develop methods that capture and measure key aspects
of a VE, like presence. One of the main problems with measuring the level of
presence in VEs is that users may not be consciously aware of its effect.
This is a problem especially for direct measures that rely on questionnaires and
only measure the perceived level of presence explicitly. In this paper we
develop and validate an indirect measure of the implicit level of presence of
users, based on the physical reaction of users to events in the VE. The addition
of an implicit measure will enable us to evaluate and compare VEs more
effectively, especially with regard to their main function as immersive
environments. Our approach is practical, cost-effective and delivers reliable
results.

Keywords: Virtual Environments, Presence, Indirect Implicit Measure.

1 Introduction
VEs are becoming important to many domains as technology advances. Their immersive
quality allows many different users to create situations that are either unavailable or
impractical to create in real life. The uses of VEs range from simple spatial tasks, such as
showing buildings to prospective buyers before a project is started, to creating the
same emotional and physical reactions as a high-risk situation without the danger.
Military, rescue, and medical personnel especially can utilize VEs to improve their
skills without exposing themselves to risky situations while reducing cost at the same
time, an important application since these personnel will soon be expected to act with
precision in the worst of scenarios. Another important use of this emerging technology is
treating psychological disorders. VEs are being used to confront an individual's worst
phobia or to create a reaction, to help treat disorders like Post-Traumatic Stress.
Unfortunately, this research is hindered because the tools necessary to
measure the different aspects of VEs are limited. The key to all of the applications
listed above is the immersive quality that VEs possess. The term for this quality is
presence, which is the user's perceived level of authenticity of the displayed
environment. The above applications must create the emotional and physical reactions to the presented situation in order to accomplish their goal of increasing efficiency
when the real situation occurs [5]. Direct measures, like questionnaires, have been
used in the past to measure the direct effect of the environment on the user.
Researchers have worked to validate numerous questionnaires that not only support
evidence that one VE has a higher level of presence than another, but can often give
evidence to determine the specific trait that increases the level of presence [6]. These
direct measures help researchers compare different features of different VEs to find
which enables the greatest direct increase of presence on the user. However, some
features may not consciously affect the user, making a direct measure insufficient.
To advance research in this field, we must also be able to determine the indirect
effects that the VEs have on users by measuring the implicit levels of presence as well
as the explicit levels. In related work in Psychology such implicit measures have
proven to be better predictors of behavior than direct and explicit measures, which
could prove more important in an application where the emotional and physical
reaction is the key to success [4]. Currently, no indirect measurement of the implicit
level of presence is readily available for researchers.
There are a few earlier studies which have attempted to find an indirect measure of
the implicit level of presence. One of these studies attempted to use a method they
named behavioral realism, which measures a reaction to an event within the
environment. One attempt was with postural responses [2]. Freeman et al. tried this
using a video of a car racing around a race track while measuring the subject’s
response to the hairpin turns and administered a questionnaire so that the sway data could be compared with the self-reported presence data. They repeated this with several presence-altering features, including stereoscopic vs. monoscopic video and different screen sizes [1,2]. They concluded that their data showed weak support for the use of this
behavioral realism measurement in evaluating the VE features. The indirect measure
did not correlate with the questionnaire.
Their conclusion rejected the hypothesis because the design required the direct and the indirect measure to correlate; yet since the two measures capture presence in different ways, any such correlation would be largely coincidental [3]. In addition, there was not a large enough difference between the features used to increase presence, given the number of participants. Tan showed that participants performed tasks much better on larger displays than on smaller ones, even when the visual angle was the same, supporting the idea that different fields of view (FOVs) should affect presence [5].
The experiment detailed in this paper uses the CAVE at Roanoke College to
rework the screen size experiment of Freeman. This CAVE encapsulates 170 degrees
of the user’s FOV for the larger display, as compared to Freeman’s 50 degrees.
Freeman also used a passive activity in the attempt to measure presence. The level at
which the user is involved in the VE can also greatly affect presence [6]. This
measurement uses an active response to measure presence and might only show
results with an active environment. Finally, the hypothesis is restated here: this same measurement can be used to measure presence because of the higher level of involvement and the greater difference between the features of the VEs. This will be supported by showing that the measurement rejects the null hypothesis that the CAVE (immersive condition) provides a lesser or equal level of presence than the desktop display (non-immersive condition). Further, the measurement will show greater reliability than the explicit measure of presence, with more consistent results and greater confidence.

2 Experimental Evaluation

2.1 Procedure

The design listed here is meant to correct the problems in the experiment described
above. This experiment is a one-to-one comparison, making it more direct and less complicated. Every subject was tested in both the non-immersive and the immersive condition to set up a within-subject comparison; half started with the non-immersive condition.
Participants actively navigated a virtual racetrack using a steering wheel and foot
pedals to drive the racecar. The steering wheel was fixed so that it could not be adjusted and would not accidentally move during the test, which would have created false movement in the postural sway data. Participants were allowed to adjust their chair at the beginning of
the experiment for comfort, but were then asked to keep the chair’s position fixed so
that the data would be taken from the same distance and the FOV would be
comparable between conditions.
Participants wore suspenders and a head band that had infrared lights attached.
With these lights, a Wiimote was used to track the person’s head and shoulder
position to determine their sway. The character’s physical location and direction in
the virtual environment was logged about every 30 milliseconds as was the position of
the 4 infrared lights. The participants were not told about the indirect measure of the implicit level of presence, so as not to bias the results.
The participant was then given a calibration sequence. First the steering wheel was
turned in both directions with as little head and body movement as possible. Then the
user turned both the steering wheel and their head simultaneously in both directions.
This sequence was later used to minimize the effects of the head turning and shoulder movement needed to complete the task, so that the measurement captured only postural sway reflecting a higher implicit level of presence rather than a difference between the screen sizes, since users are more likely to turn their heads in the immersive condition.
The racetrack had many different types of curves and turns to force the participant
to take turns at different speeds. The participants were allowed a short training period,
which was the same course in the opposite direction. The subject could get used to the
environment and the control aspects of the experiment during this training period. At
this time the participant would fill out the questionnaire to get used to the questions,
so knowledge of the questionnaire did not bias the results of the second direct
measurement. The course was a circuit, so that the participants would not finish
before the trial time expired. The participants were asked to stay on the track and
warned that bumps and hills off the track were designed to slow them down.
After each trial the subjects were given a presence questionnaire to get the direct
measure of presence of the condition (shown in Fig. 1). There were four questions. The
first three were used to directly measure their explicit level of presence, while the fourth
was used to determine whether or not the subject should continue with the experiment.
Fig. 1. The questionnaire used for this experiment

Twenty-two Roanoke College undergraduate students, eighteen male, volunteered for this experiment. A $25 prize was awarded to the participant with the fastest lap time as an incentive to increase involvement, adding to the presence of both VEs in the experiment.

2.2 Apparatus

Most of the simulation is handled by Epic Games’ Unreal Tournament Game Engine
(2003). Epic Games created this engine for a first-person shooter video game, but the
weapons and crosshairs were made invisible for the purposes of this experiment. The
racetrack was developed using Epic Games’ Unreal Editor. The movements were
controlled using Gamebots, a command-based language that passes messages between the user and the game character. This allows a program to control the character's movement so that the character moves like a car, with acceleration and deceleration depending on the pedal positions. The program could also log the input device variables for analysis.
The input device was a Logitech steering wheel and foot pedals. A program calculated the speed and rotational direction based on the previous state of the character and the current state of the input device. Just like a normal car, the
accelerator pedal would accelerate the car faster the further the pedal was pushed.
The car would decelerate if no pedals were pushed and decelerate faster if the brake
was pushed.
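As a rough illustration (not the authors' code) of the control loop just described, the per-tick speed and heading update could be sketched in Python as follows; the constants and the function name update_car are assumptions introduced here for clarity.

# Minimal sketch of the pedal-and-wheel update described above.
# All constants are illustrative assumptions, not values from the paper.
MAX_ACCEL = 8.0     # acceleration at full throttle (assumed)
COAST_DECEL = 2.0   # deceleration when no pedal is pressed (assumed)
BRAKE_DECEL = 12.0  # deceleration at full brake (assumed)
MAX_SPEED = 40.0    # speed cap (assumed)
STEER_RATE = 90.0   # heading change in degrees per second at full lock (assumed)

def update_car(speed, heading, accel_pedal, brake_pedal, wheel, dt):
    """Advance the car state by one tick of dt seconds.
    accel_pedal and brake_pedal lie in [0, 1]; wheel lies in [-1, 1]."""
    if brake_pedal > 0.0:
        speed -= BRAKE_DECEL * brake_pedal * dt   # brake decelerates fastest
    elif accel_pedal > 0.0:
        speed += MAX_ACCEL * accel_pedal * dt     # deeper pedal, faster acceleration
    else:
        speed -= COAST_DECEL * dt                 # coast down when no pedal is pressed
    speed = max(0.0, min(MAX_SPEED, speed))
    heading += STEER_RATE * wheel * dt            # steering turns the character
    return speed, heading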
The participants' motion was recorded using two Wii sensor bars, which provided the infrared lights mentioned in the previous section. They were attached to the participant with
Velcro, using suspenders to trace the shoulders and a head band to trace the head
(shown in Fig. 2). The lights were recorded with a Wiimote, which passed the
information in pixel coordinates to a program which recorded the coordinates of the 4
infrared lights. These coordinates were used to measure the subject’s position.
Fig. 2. Apparatus used to measure the user's body position. The user is also using the steering wheel and foot pedals, and the screen used for the non-immersive condition is shown.

The small-screen condition used a standard 18-inch CRT display, which fills an FOV of about 28 degrees. The large screen was displayed in the CAVE, which fills an FOV of about 170 degrees. The CAVE uses 4 Epson projectors to project the image on three wall-sized screens. The middle screen is 12 ft. wide by 8 ft. tall, and the other two screens act as wings that are each 6 ft. wide by 8 ft. tall. The wings wrap around the user slightly to reach the 170-degree FOV.

3 Analysis
The analysis for the indirect measure begins with two sets of data: the character’s
rotational position and the four infrared light coordinates. Both have synchronized
timestamps for all values. First, the measurement must be derived from the raw data.
The recorded character's rotational position is not useful for this measurement, but the
difference between the rotational values represents the position of the steering wheel,
which is used to determine when the event of turning the vehicle occurs and to what
extent. Next, some interpolation must be used for the data. The infrared lights were
recorded as a pixel coordinate, but whenever one pixel was not recorded the x and y
position of that light was recorded as a zero. All of these zeros were replaced using
interpolation from the first point before the light was dropped to the next available
point. Next, the timestamps of both must match to allow direct comparison between
the two sets of data. The steering wheel data started and ended with each trial and were therefore used as the base data for comparison. The infrared coordinates
were thus taken as a weighted average for each of the timestamps used alongside the
steering wheel position. This gives an interpolated position of the subject’s position
for each record of the steering wheel position.
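A minimal sketch of these two preprocessing steps, dropped-sample interpolation and alignment to the steering-wheel timestamps, is given below. It uses plain linear interpolation as one reading of the weighted average described above, and the function names are hypothetical.

import numpy as np

def fill_dropped(coords):
    """Replace dropped IR samples (recorded as 0) by interpolating between
    the nearest valid samples before and after each drop, per column."""
    coords = np.asarray(coords, dtype=float).copy()
    idx = np.arange(len(coords))
    for c in range(coords.shape[1]):
        col = coords[:, c]
        valid = col != 0
        if valid.any():
            col[~valid] = np.interp(idx[~valid], idx[valid], col[valid])
    return coords

def align_to_steering(steer_t, ir_t, ir_coords):
    """Interpolate the IR-light coordinates onto the steering-wheel timestamps
    so the two data sets can be compared record by record."""
    ir_coords = np.asarray(ir_coords, dtype=float)
    return np.column_stack([np.interp(steer_t, ir_t, ir_coords[:, c])
                            for c in range(ir_coords.shape[1])])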
The steering wheel data range from some value -n to +n, with the center position of the steering wheel at zero. The subjects' physical position must be expressed in the same form so that the two signals are comparable. The method therefore takes the average horizontal position (x value) of all four infrared lights whenever the steering wheel data are within the central 10% of their range between minimum and maximum, and averages those samples to find the subject's resting location. It is assumed that if the steering wheel is near the center, then the vehicle is traveling in a nearly straight line and no centripetal acceleration is felt by the driver. Each value of the averaged horizontal position is then changed so that it represents the difference from the resting point, and therefore ranges from some negative number to some positive number with the resting position near zero, much like the steering wheel data. Both sets of data are then normalized to the range -1 to 1 using their minimum and maximum points.
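Under the same assumptions, the resting-point and normalization steps could be written roughly as follows; ir_x is the per-sample mean horizontal position of the four lights, and scaling by the maximum absolute value is one possible reading of the normalization described above.

import numpy as np

def normalize_signals(steer, ir_x):
    """steer: steering samples (centre of the wheel = 0); ir_x: mean horizontal
    position of the four IR lights on the same timestamps. Returns both signals
    scaled to [-1, 1], with ir_x expressed as an offset from the resting point."""
    steer = np.asarray(steer, dtype=float)
    ir_x = np.asarray(ir_x, dtype=float)
    # Resting point: average body position whenever the wheel lies inside the
    # central 10% of its min-max range (car travelling roughly straight).
    lo, hi = steer.min(), steer.max()
    centre = np.abs(steer - (lo + hi) / 2.0) <= 0.05 * (hi - lo)
    rest = ir_x[centre].mean()
    sway = ir_x - rest                               # offset from the resting point
    return steer / np.max(np.abs(steer)), sway / np.max(np.abs(sway))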

Fig. 3. A graphical representation of the data before the correlation is calculated. The lines represent the steering wheel position and the participant's position relative to their resting point.

Then, the calibration taken during the experiment was used to remove all motion not related to postural sway. The non-immersive condition used only the steering wheel calibration, since the screen is not large enough to require head turning. The immersive condition used the other calibration sequence described in the procedure. From the calibration data, the subject's horizontal position change per percentage of steering wheel turn was computed. This value is applied to the horizontal position at every timestamp, according to how far the wheel is turned and which condition is being tested. To find the degree of correlation (shown in Fig. 3) between the two sets of data, the number with the smaller absolute value is divided by the number with the larger absolute value at each timestamp. This value is then averaged over the trial to give the presence value of that condition. The result lies between -1 and 1: -1 indicates completely negative correlation (an unlikely outcome with this measurement), zero indicates no correlation and thus no implicit level of presence, and 1 indicates completely positive correlation and thus a perfect implicit level of presence.
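A sketch of this final scoring step follows. A single calib_gain coefficient stands in for the calibration sequence (the horizontal shift produced per unit of wheel turn by task-related head and shoulder motion); treating the correction as a subtraction of that term, and the per-sample signed ratio shown here, are assumptions about details the text leaves open.

import numpy as np

def presence_score(steer_n, sway_n, calib_gain=0.0):
    """Average per-sample agreement between the normalized steering and sway
    signals; the result lies between -1 and 1 as described in the text."""
    steer_n = np.asarray(steer_n, dtype=float)
    sway_c = np.asarray(sway_n, dtype=float) - calib_gain * steer_n  # remove task-driven motion
    ratios = np.zeros(len(steer_n))
    for i, (a, b) in enumerate(zip(steer_n, sway_c)):
        if a == 0.0 and b == 0.0:
            ratios[i] = 0.0                  # no turn and no sway: neutral sample
        elif abs(a) <= abs(b):
            ratios[i] = a / b                # smaller magnitude over larger, sign kept
        else:
            ratios[i] = b / a
    return float(ratios.mean())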
The questionnaire data were simple to analyze. The first three answers are each a value between 0 and 100. These values are averaged, since the questions ask about presence in different ways; averaging a larger number of answers should reduce deviation and give better accuracy for the VE's presence value. A higher score indicates greater presence.

4 Results
The immersive condition produced a 0.1903 average correlation, whereas the non-immersive condition had only a 0.0338 average (as shown in Fig. 4). Twenty-one of the 24
participants' values confirmed that the CAVE is more immersive, with a presence
value greater for the immersive condition. The difference between the values for the
conditions was significant for this measurement (p < 0.00001).

Fig. 4. A graph of the results for the indirect measure of the implicit level of presence

The direct measure of the explicit level of presence had an average value of
66.5909 for the immersive condition and 58.7121 for the non-immersive condition (as
shown in Fig. 5). Only 15 of the 24 participants' values confirmed that the CAVE is
more immersive, but the difference was significant for this measurement (p < 0.03).

Fig. 5. A graph of the results for the direct measure of the explicit level of presence

5 Conclusions
The experiment produced evidence supporting the use of the indirect measure of the
implicit level of presence for research with VEs. Furthermore, it does so more reliably and with a higher confidence level than the direct measure of the explicit level of presence used in this study. This shows that the direct measurement of the explicit level of presence fails to capture important aspects of a VE that relate to subconscious processes, which can have a great impact on behavior. The new measurement adds a powerful tool to VE researchers' arsenal for advancing the technology, and it offers insight into the differences between implicit and explicit levels of presence as captured by indirect and direct measures.

6 Future Work
Our results motivate further work in which our measure can be used to predict the
behavior of participants, particularly for events that aim to induce emotional response
or reflexes. The validated measurement should be tested against different
questionnaires to see if this is comparable to other questionnaires, and discover the
advantage of using one over another. This experiment used a very simple
questionnaire from Freeman et al.’s experiments, but more complex questionnaires
exist.
Making this research more usable with different behavioral events should also be
considered and tested. Our measurement only works with tasks that can make use of
centripetal acceleration, like the driving task presented here. This is limiting because
it may not be suitable for all researchers to test their presence-enhancing features in a driving VE. This is one example of how such a specific indirect measurement can work; for other, possibly more generic tasks, new implicit measures should be sought. Also, the method of analyzing the data uses the minimums and maximums within the data sets for normalization, which could minimize bias between subjects if differences in users' movement stem from predetermined movement extremes. Additional experiments could confirm or refute this idea, which would aid research efforts by making comparisons between experiments easier.
If these steps are achieved, then indirect measures of presence can be used alongside direct measures to discover differences between VEs that it has not yet been possible to detect. Testing could begin to decide whether different senses, such as sound or smell, have a significant effect on presence. A cost-benefit analysis could be completed so that each feature of a VE can be examined for its price and the added presence value it achieves. This could help schools and companies that may currently have doubts by showing exactly what presence value they will obtain for a certain price tag. Schools, medical facilities, or the military could then consider whether it is affordable to introduce such an environment for new training or exploratory purposes.

Acknowledgments. At Roanoke College I would like to thank Dr. Bouchard and Dr.
Childers for helping me with the project. At the University of Pittsburgh I would like
to thank Dr. Lewis and Dr. Kolling for supporting the project. Special thanks to Dr.
Hughes for starting my interest in this project.

References
1. Freeman, J., Avons, S.E., et al.: Effect of Stereoscopic Presentation, Image Motion, and
Screen Size on Subjective and Objective Corroborative Measures of Presence.
Presence 10(3), 298–311 (2001)
2. Freeman, J., Avons, S.E., et al.: Using Behavioural Realism to Estimate Presence: A Study
of the Utility of Postural Responses to Motion-Stimuli. Presence 9(2), 149–164 (2000)
3. Hofmann, W., Gawronski, B., Gschwendner, T., Le, H., Schmitt, M.: A Meta-Analysis on
the Correlations Between the Implicit Association Test and Explicit Self-Report Measures.
University of Trier, Germany (2004) (unpublished manuscript)
4. De Houwer, J.: What Are Implicit Measures and Why Are We Using Them. In: The
Handbook of Implicit Cognition and Addiction, pp. 11–28. Sage Publishers, Thousand Oaks
(2006)
5. Tan, D.S., Gergle, D., et al.: Physically Large Displays Improve Performance on Spatial
Tasks. ACM Transactions on Computer-Human Interaction 13(1), 71–99 (2006)
6. Witmer, R.G., Singer, M.J.: Measuring Presence in Virtual Environments: A Presence
Questionnaire. Presence 7(3), 225–240 (1998)
Effect of Weak Hyperopia on Stereoscopic Vision

Masako Omori1, Asei Sugiyama2, Hiroki Hori3, Tomoki Shiomi3, Tetsuya Kanda3, Akira Hasegawa3, Hiromu Ishio4, Hiroki Takada5, Satoshi Hasegawa6, and Masaru Miyao4

1 Faculty of Home Economics, Kobe Women's University, 2-1 Aoyama, Higashisuma, Suma-ku, Kobe-city 654-8585, Japan
2 Department of Information Engineering, School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
3 Department of Information Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
4 Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
5 Graduate School of Engineering, Human and Artificial Intelligent Systems, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan
6 Department of Information and Media Studies, Nagoya Bunri University, 365 Maeda, Inazawa-cho, Inazawa-city, Aichi 492-8520, Japan
masako@suma.kobe-wu.ac.jp

Abstract. Convergence, accommodation and pupil diameter were measured simultaneously while subjects were watching 3D images. The subjects were
middle-aged and had weak hyperopia. WAM-5500 and EMR-9 were combined
to make an original apparatus for the measurements. It was confirmed that
accommodation and pupil diameter changed synchronously with convergence.
These findings suggest that with naked vision the pupil is constricted and the
depth of field deepened, acting like a compensation system for weak
accommodation power. This suggests that people in middle age can view 3D
images more easily if positive (convex lens) correction is made.

Keywords: convergence, accommodation, pupil diameter, middle age and 3D image.

1 Introduction
Recently, a wide variety of content that makes use of 3D displays and stereoscopic
images is being developed. Previous studies reported effects on visual functions after
using HMDs. Sheehy and Wilkinson (1989) [1] observed binocular deficits in pilots
following the use of night vision goggles, which have similarities to the HMDs used
for VR systems. Mon-Williams et al (1993) [2] showed that physiological changes in
the visual system occur after periods of exposure of around 20 min. Howarth (1999)
[3] discussed the oculomotor change which might be expected to occur during
immersion in a virtual environment whilst wearing an HMD.


Meanwhile, effects such as visual fatigue and motion sickness from continuously
watching 3D images and the influence of binocular vision on human visual function
remain insufficiently understood. Various studies have been performed on the
influence of stereoscopic images on visual function [4] [5] [6]. Most prior studies
discussed the effects of visual image quality and extent of physical stress. These
studies have employed bioinstrumentation or surveys of subjective symptoms [7]. To
find ways to alleviate visual fatigue and motion sickness from watching 3D movies, further studies are needed.
Under natural viewing conditions the depth of convergence and accommodation
agree in young subjects. However, when viewing a stereoscopic image using binocular
parallax, it has been thought that convergence moves with the position of the reproduced
stereoscopic image, while accommodation remains fixed at the image display, resulting
in contradictory depth information between convergence and accommodation, called
discordance, in the visual system [8]. With the aim of qualitatively improving
stereographic image systems, measurements under stereoscopic viewing conditions are
needed. However, from objective measurements of the accommodation system, Miyao et
al. [9] confirmed that there is a fluctuating link between accommodation and
convergence in younger subjects during normal accommodation.
For middle-aged and elderly people who gaze at forward and backward movement for a long time, it is said that a slight discordance exists even during natural viewing: accommodation is focused on a position slightly farther than that of real objects, while convergence is focused on the position of the real objects. However, we obtained results indicating that discordance
between accommodation and convergence does not occur in younger subjects gazing at a
stereoscopic view for a given short time.
Weale (1975) [10] reported that the deterioration of near visual acuity in healthy people accelerates after 45 years of age. We found a similar tendency in near vision in this experiment [11]. As with presbyopia, cataract cloudiness gradually becomes more severe after middle age. Sun and Stark [12] also reported that middle-aged subjects
have low accommodative power, that their vision should be properly corrected for
VDT use, and that more care should be taken to assure they have appropriate displays
than for their younger counterparts.
In fact, it may be possible for middle-aged and elderly people with weak hyperopia
to supplement accommodative power when they are watching 3D images by deepening the depth of field through triggered pupil contraction. However, this pupil contraction due to the near reaction makes it a little harder to see because less light enters the eye. The
possibility is therefore suggested that pupil contraction is alleviated with correction by
soft contact lenses.
The purpose of this experiment was to investigate pupil expansion by simultaneously
measuring accommodation, convergence and pupil diameter.

2 Methods

2.1 Accommodative and Convergence Measurement and Stimulus

In this experiment, visual function was tested using a custom-made apparatus. We combined a WAM-5500 auto refractometer and EMR-9 eye mark recorder to make an
original machine for the measurements. The WAM-5500 auto refractometer (Grand
Seiko Co., Ltd.) can measure accommodation power with both eyes opened under
natural conditions, and the EMR-9 eye mark recorder (Nac Image Technology, Ltd)
can measure the convergence distance. 3D images were presented using a liquid
crystal shutter system.
In this experiment, the 3D image was shown on a display set 60 cm in front of the subjects. The distance between the subjects' eyes and the target on the screen was 60
cm (1.00/0.6 = 1.67 diopters (D)) (Note: diopter (D) = 1/distance (m); MA (meter
angle) = 1/distance (m)). The scene for measurements and the measurement
equipment are shown in Fig. 1. Convergence, accommodation and pupil diameter
were measured simultaneously while subjects were watching 3D images.

Fig. 1. Experimental Environment

2.2 Experiment Procedure

The subjects were three healthy middle-aged people (37, 42, and 45 years old) with
normal uncorrected vision, one healthy younger person (25 years old) with correction
using soft contact lenses and one healthy elderly person (59 years old) with normal
uncorrected vision. The subjects were instructed to gaze at the center of the sphere; the gaze time was set at 40 seconds. All subjects had a subjective feeling of
stereoscopic vision. While both eyes were gazing at the stereoscopic image, the lens
accommodation of the right eye was measured and recorded. Informed consent was
obtained from all subjects and approval was received from the Ethical Review Board
of the Graduate School of Information Science at Nagoya University.
The concept of stereoscopic vision is generally explained to the public as follows:
During natural vision, lens accommodation (Fig. 2) coincides with lens convergence
(Fig. 3). Gaze time was 40 seconds, and the accommodation of the right eye was
measured and recorded while the subjects gazed at the stereoscopic image with both
eyes. The sphere moved virtually in a reciprocating motion range of 20 cm to 60 cm
in front of the observer with a cycle of 10 seconds (Fig. 4). They gazed at the
open-field stereoscopic target under binocular and natural viewing conditions.
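For reference, the vergence and accommodation demand implied by this stimulus follows from the diopter definition given above (D = 1/distance in metres). The sketch below assumes a sinusoidal motion profile purely for illustration, since the exact waveform of the reciprocating movement is not specified.

import math

def target_demand_d(t_s, near_m=0.2, far_m=0.6, period_s=10.0):
    """Demand in diopters (1 / distance in metres) for a target oscillating
    between near_m and far_m with the given period. A sinusoidal profile is
    assumed here only for illustration."""
    mid = (near_m + far_m) / 2.0
    amp = (far_m - near_m) / 2.0
    distance_m = mid + amp * math.cos(2.0 * math.pi * t_s / period_s)
    return 1.0 / distance_m

# At the farthest point (0.6 m) the demand is about 1.67 D; at the nearest (0.2 m) it is 5 D.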
Fig. 2. Lens Accommodation

Fig. 3. Convergence

Fig. 4. Spherical Object Movies (Power 3D™ : Olympus Visual Communications, Corp.)

Measurements were made three times under two conditions: (1) with the subjects
using uncorrected vision and (2) with subjects using soft contact lenses (+1.0 D).
Subjects used their naked eyes or wore soft contact lenses, and their refraction was
corrected to within ±0.25 diopter. (“Diopter” is the refractive index of a lens and an
index of accommodation power. It is the inverse of meters, for example, 0 stands for
infinity, 0.5 stands for 2 m, 1 stands for 1 m, 1.5 stands for 0.67 m, 2 stands for 0.5 m,
and 2.5 stands for 0.4 m). Middle-aged and elderly subjects with normal uncorrected vision also wore soft contact lenses (+1.0 D).
The experiment was conducted according to the following procedures (Table 1).
Subjects’ accommodation and convergence were measured as they gazed in binocular
vision at a sphere presented in front of them. The illuminance of the experimental
environment was about 36.1 (lx), and the brightness of the sphere in this environment
was 5.8 (cd/m2).

Table 1. Experimental Environment

Brightness of spherical object (cd/m²): 5.8
Illuminance (lx): 36.1
Size of spherical object (deg): Far 0.33, Near 12

3 Results
Convergence, accommodation and pupil diameter were measured simultaneously
while subjects were watching 3D images. The following results were obtained in
experiments in which subjects were measured with naked eyes or while wearing soft
contact lenses, with their refraction corrected to within ±0.25 diopters (Figure 5, 6, 7).
Figure 5 shows the results for Subject A (25 years of age), Figure 6 shows the
results for Subject B (45 years of age) and Figure 7 shows results for Subject C (59
years of age). Figures 8 and 9 show the results for subjects who wore soft contact
lenses for near sight. Figure 8 shows measurement results for Subject B (45 years of
age), and Figure 9 shows the results for Subject C (59 years of age). These Figures
show accommodation and convergence with diopters on the left side vertical axis. The
right vertical axis shows pupil diameter. Table 2 shows the average pupil diameter for
middle-aged Subject B and elderly Subject C.
Subjects’ convergence was found to change between about one diopter (1 m) and
five diopters (20 cm) regardless of whether they were wearing the soft contact lenses.
The diopter value also fluctuated with a cycle of 10 seconds. In addition, we
confirmed that the accommodation and pupil diameter changed synchronously with
convergence. Thus, the pupil diameter became small and accommodation power
became large when the convergence distance became small.
The accommodation amplitude every 10 seconds was from 2D to 2.5D with the
naked eye (Figure 5), Figure 6 shows 0.5D, Figure 7 shows from 0.5D to 0.8D, and
when the subjects were wearing soft contact lenses (+1.0 D) for mild presbyopia,
Figure 8 shows from 0.5D to 1D, and Figure 9 shows from 0.5D to 1.5D.
From the results in Table 2, it is seen that the average pupil diameter with naked
eyes was larger than with corrected soft contact lenses. The dilation in the diameter
was 0.4 mm for Subject B and 0.2 mm for Subject C.
Table 2. Average pupil diameter with uncorrected vision and with soft contact lenses (+1.0 D)

Average pupil diameter, uncorrected vision: middle-aged Subject B 3.69 mm; elderly Subject C 2.02 mm
Average pupil diameter, soft contact lenses (+1.0 D): middle-aged Subject B 4.05 mm; elderly Subject C 2.20 mm

[Figure: traces of convergence, accommodation, pupil diameter (mm), and the distance movement of the 3D image over time]
Fig. 5. Subject A (25 years of age) wore soft contact lenses for near sight

[Figures: traces of convergence, accommodation, pupil diameter (mm), and the distance movement of the 3D image over time]
Fig. 6. Subject B (45 years of age) with naked eyes
Fig. 7. Subject C (59 years of age) wore soft contact lenses for near sight
[Figures: traces of convergence, accommodation, pupil diameter (mm), and the distance movement of the 3D image over time]
Fig. 8. Subject B (45 years of age) wore soft contact lenses for near sight
Fig. 9. Subject C (59 years of age) wore soft contact lenses for near sight

4 Discussions
It was shown that the focus moved to a distant point as the visual target virtually moved away from the subject. The change occurred at a constant cycle of 10 seconds, synchronously with the movement of the 3D image. By measuring the accommodation movement in response to the near and far movement of the 3D image, the response range was shown to be about 1 D to 5 D (1 m to 0.2 m). These results were consistent with the distance movement of the 3D image (0.6–0.2 m). Thus, we were able to obtain measurements as subjects watched the 3D image with both eyes. Figure 5 shows large movements in both accommodation and convergence.
Wann et al. [13] stated that within a virtual reality system, the eyes of a subject must
maintain accommodation at the fixed LCD screen, despite the presence of disparity
cues that necessitate convergence eye movements to capture the virtual scene.
Moreover, Hong et al. [14] stated that the natural coupling of eye accommodation and
convergence while viewing a real-world scene is broken when viewing stereoscopic
displays.
In the 3D image with the liquid crystal shutter system, the results of this study
differed from those of a previous study in which accommodation was fixed on the
LCD. Meanwhile, the change in accommodation was smaller than the large
movement seen in convergence. These results show the influence of aging in the
deterioration of accommodation. However, accommodation in Subject B and Subject
C was fixed behind the display. Accommodation was not fixed on the display in any
case. This also did not match the previous study.
At the closest distance, the difference between accommodation and
convergence was about 4D in Figure 6, about 5-6D in Figure 7, about 2-3D in Figure
8 and about 4-5D in Figure 9. In Figures 5–7, the accommodation change gradually
becomes smaller and more irregular, and the values become closer to 0. This is related
to a lack of accommodation power due to presbyopia. In Figures 6–7, the pupil
diameter becomes smaller in synchronization with the near vision effort of
convergence. It is suggested that a near response occurs with 3D images similar to
that with real objects. It is reported that the near response occurs gradually from 0.3 m (3.3 D), and that pupil contraction reaches its maximum, with rapid contraction, at 0.2 m (5 D). Figures 6–8 show similar results. However, contraction of pupil diameter is not
seen in Figure 5.
The above suggests that the reason middle-aged subjects are able to view 3D images stereoscopically is that their accommodation power is supplemented by a deepened depth of field resulting from pupil contraction. Thus, it is
thought that with contraction of pupil diameter, images with left-right parallax can be
perceived. On the other hand, pupil contraction implies that a decreased amount of
light enters the retina. Therefore, it is suggested that elderly people perceive things as
being darker than younger people do.
In this study, rapid changes in pupil diameter were seen from 5 seconds before the start of measurements (Figures 5–8). A light reaction may have occurred because of the rapid change in the display at the presentation of the 3D images. It is reported that the amount of pupil contraction from the light reaction becomes progressively smaller with age, and the present experimental results showed the same tendency. It takes about 1 second for the changes from the light reaction to subside. Therefore, in this experiment, the average pupil diameter from 10 s to 20 s, when no such influence is seen, was compared.
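The comparison just described amounts to averaging the pupil trace over a fixed window; a trivial helper (names and array layout assumed) might look like this.

import numpy as np

def mean_pupil_in_window(t_s, pupil_mm, start_s=10.0, end_s=20.0):
    """Mean pupil diameter over the 10-20 s window, where the transient light
    reaction at stimulus onset has already died away."""
    t_s = np.asarray(t_s, dtype=float)
    pupil_mm = np.asarray(pupil_mm, dtype=float)
    mask = (t_s >= start_s) & (t_s <= end_s)
    return float(pupil_mm[mask].mean())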
In middle-aged Subject B and elderly Subject C, pupil diameter became about 10% larger when they wore soft contact lenses for near sight than with uncorrected vision. Thus, it is suggested that pupil contraction is reduced when near sight is compensated by soft contact lenses. In particular, Figures 8 and 9 show that accommodation follows convergence. This suggests that people in middle age
can view 3D images more easily if positive (convex lens) correction is made.

5 Conclusions
In this study we used 3D images with a virtual stereoscopic view. The influences of
age and visual functions on stereoscopic recognition were analyzed. We may
summarize the present experiment as follows.
1. Accommodation and convergence changes occurred at a constant cycle of 10 seconds, synchronously with the movement of the 3D image.
2. In the middle-aged subject and elderly subject, accommodation showed less
change than convergence.
3. The pupil diameter of the middle-aged subject and elderly subject contracted in
synchronization with near vision effort of convergence.
4. Discordance of accommodation and convergence was alleviated with near sight
correction with soft contact lenses. Contraction of pupil diameter was also
alleviated.
These findings suggest that with naked vision the pupil is constricted and the depth
of field is deepened, acting like a compensation system for weak accommodation
power. When doing visual near work, a person’s ciliary muscle of accommodation
constantly changes the focal depth of the lens of the eye to obtain a sharp image.
Thus, when the viewing distance is short, the ciliary muscle must continually contract
for accommodation and convergence. In contrast, when attention is allowed to wander
over distant objects, the eyes are focused on infinity and ciliary muscles remain
relaxed (Kroemer & Grandjean, 1997) [15]. Consequently, it is thought that easing the
strain of the ciliary muscle due to prolonged near work may prevent accommodative
asthenopia.
In addition, this study suggests that pupil contraction is reduced when near sight is compensated by soft contact lenses. This means that people in
middle age can view 3D images more easily if positive (convex lens) correction is
made.

References
1. Sheehy, J.B., Wilkinson, M.: Depth perception after prolonged usage of night vision
goggles. Aviat. Space Environ. Med. 60, 573–579 (1989)
2. Mon-Williams, M., Wann, J.P., Rushton, S.: Binocular vision in a virtual world: visual
deficits following the wearing of a head-mounted display. Ophthalmic Physiol. Opt. 13(4),
387–391 (1993)
3. Howarth, P.A.: Oculomotor changes within virtual environments. Appl. Ergon. 30, 59–67
(1999)
4. Heron, G., Charman, W.N., Schor, C.M.: Age changes in the interactions between the
accommodation and vergence systems. Optometry & Vision Science 78(10), 754–762 (2001)
5. Schor, C.: Fixation of disparity: a steady state error of disparity-induced vergence.
American Journal of Optometry & Physiological Optics 57(9), 618–631 (1980)
6. Rosenfield, M., Ciuffreda, K.J., Gilmartin, B.: Factors influencing accommodative
adaptation. Optometry & Vision Science 69(4), 270–275 (1992)
7. Iwasaki, T., Akiya, S., Inoue, T., Noro, K.: Surmised state of accommodation to
stereoscopic three-dimensional images with binocular disparity. Ergonomics 39(11),
1268–1272 (1996)
8. Hoffman, D.M., Girshick, A.R., Akeley, K., Banks, M.S.: Vergence-accommodation
conflicts hinder visual performance and cause visual fatigue. Journal of Vision 8(3), 33.1–
30 (2008)
9. Miyao, M., Ishihara, S.Y., Saito, S., Kondo, T.A., Sakakibara, H., Toyoshima, H.: Visual
accommodation and subject performance during a stereographic object task using liquid
crystal shutters. Ergonomics 39(11), 1294–1309 (1996)
10. Weale, R.A.: Senile changes in visual acuity. Transactions of the Ophthalmological
Societies of the United Kingdom 95(1), 36–38 (1975)
11. Omori, M., Watanabe, T., Takai, J., Takada, H., Miyao, M.: An attempt at preventing
asthenopia among VDT workers. International J. Occupational Safety and Ergonomics 9(4),
453–462 (2003)
12. Sun, F., Stark, L.: Static and dynamic changes in accommodation with age. In:
Presbyopia: Recent Research and Reviews from the 3rd International Symposium, pp.
258–263. Professional Press Books/Fairchild Publications, New York (1987)
13. Wann, J.P., Rushton, S., Mon-Williams, M.: Natural problems for stereoscopic depth
perception in virtual environments. Vision Res. 35(19), 2731–2736 (1995)
14. Hong, H., Sheng, L.: Correct focus cues in stereoscopic displays improve 3D depth
perception. SPIE, Newsroom (2010)
15. Kroemer, K.H.E., Grandjean, E.: Fitting the Task to the Human, 5th edn. Taylor & Francis,
London (1997)
Simultaneous Measurement of Lens Accommodation
and Convergence to Real Objects

Tomoki Shiomi1, Hiromu Ishio1, Hiroki Hori1, Hiroki Takada2, Masako Omori3, Satoshi Hasegawa4, Shohei Matsunuma5, Akira Hasegawa1, Tetsuya Kanda1, and Masaru Miyao1

1 Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
2 Graduate School of Engineering, Fukui University, 3-9-1 Bunkyo, Fukui 910-8507, Japan
3 Faculty of Home Economics, Kobe Women's University, 2-1 Aoyama, Higashisuma, Suma-ku, Kobe 654-8585, Japan
4 Department of Information and Media Studies, Nagoya Bunri University, 365 Maeda, Inazawa-cho, Inazawa, Nagoya 492-8620, Japan
5 Nagoya Industrial Science Research Institute, 2-10-19 Sakae, Naka-ku, Nagoya 460-0008, Japan
shiomi.tomoki@a.mbox.nagoya-u.ac.jp

Abstract. Human beings can perceive that objects are three-dimensional (3D)
as a result of simultaneous lens accommodation and convergence on objects,
which is possible because humans can see so that parallax occurs with the right
and left eye. Virtual images are perceived via the same mechanism, but the
influence of binocular vision on human visual function is insufficiently
understood. In this study, we developed a method to simultaneously measure
accommodation and convergence in order to provide further support for our
previous research findings. We also measured accommodation and convergence
in natural vision to confirm that these measurements are correct. As a result, we
found that both accommodation and convergence were consistent with the
distance from the subject to the object. Therefore, it can be said that the present
measurement method is an effective technique for the measurement of visual
function, and that even during stereoscopic vision correct values can be
obtained.

Keywords: simultaneous measurement, eye movement, accommodation and convergence, natural vision.

1 Introduction
Recently, 3-dimensional images have been spreading rapidly, with many opportunities
for the general population to come in contact with them, such as in 3D films and 3D
televisions. Manufacturers of electric appliances, aiming at market expansion, are strengthening their product lines with 3D-related digital devices.
Despite this increase in 3D products and the many studies that have been done on
binocular vision, the influence of binocular vision on human visual function remains


insufficiently understood [1, 2, 3, 4]. In considering the safety of viewing virtual 3-dimensional objects, investigations of the influence of stereoscopic vision on the human body are important.
Though various symptoms, such as eye fatigue and solid intoxication (3D sickness), are often seen when humans view 3-dimensional images continuously, neither solid intoxication nor eye fatigue is a symptom seen in the conditions in which we usually live, so-called natural vision. One of the reasons often given for this is that lens
accommodation (Fig.1) and convergence (Fig.2) are inconsistent.

Fig. 1. Principle of lens accommodation
Fig. 2. Principle of convergence

Accommodation is a reaction that changes refractive power by changing the curvature of the lens with the action of the musculus ciliaris of the eye and the
elasticity of the lens, so that an image of the external world is focused on the retina.
Convergence is a movement in which both eyes rotate inward, concentrating the eyes on one point in front of the viewer. There is a relationship between
accommodation and convergence, and this is one factor that enables humans to see
one object with both eyes. When an image is captured differently with right and left
eyes (parallax), convergence is caused. At the same time, focus on the object is
achieved by accommodation. Binocular vision using such mechanisms is the main
method of presenting 3-dimensional images, and many improvements have been
made [5, 6]. In explaining the inconsistencies above, it is said that accommodation is
always fixed on the screen where the image is displayed, while convergence intersects
at the position of the stereo images. As a result, eye fatigue, solid intoxication, and
other symptoms occur.
However, we obtained results that indicate inconsistency between accommodation and
convergence does not occur [7]. Even so, it is still often explained that inconsistency is a
cause of eye symptoms. One reason is that we could not simultaneously measure
accommodation and convergence in our previous study, and the proof for the results was
insufficient. To resolve this inconsistency, it was thought that measuring accommodation
and convergence simultaneously was needed. We therefore developed a method to
simultaneously measure accommodation and convergence.
Comparison with measurements of natural vision is essential in investigating
stereoscopic vision. For such comparisons, it is first necessary to make sure that the
measurements of natural vision are accurate. We therefore focused on whether we could accurately measure natural vision, and we report the results of those
measurements.

2 Method
The experiment was done with six healthy young males (age: 20~37). Subjects were
given a full explanation of the experiment in advance, and consent was obtained.
Subjects used their naked eyes or wore soft contact lenses (one person with
uncorrected vision, 5 who wore soft contact lenses), and their refraction was corrected
to within ±0.25 diopter. (“Diopter” is the refractive index of lens. It is an index of
accommodation power. It is the inverse of meters, for example, 0 stands for infinity,
0.5 stands for 2 m, 1 stands for 1 m, 1.5 stands for 0.67 m, 2 stands for 0.5 m, and 2.5
stands for 0.4 m). Devices used in this experiment were an auto ref/keratometer,
WAM-5500 (Grand Seiko Co. Ltd., Hiroshima, Japan) and an eye mark recorder,
EMR-9 (NAC Image Technology Inc., Tokyo, Japan).

2.1 WAM-5500

The WAM-5500 (Fig. 3) provides an open binocular field of view while a subject is
looking at a distant fixation target, and has two measurement modes, static mode and
dynamic mode. We used the dynamic mode in this experiment. The accuracy of the
WAM-5500 in measuring refraction in the dynamic mode of operation was evaluated
using the manufacturer’s supplied model eye (of power -4.50 D). The WAM-5500 set
to Hi-Speed (continuous recording) mode was connected to a PC running the WCS-1
software via an RS-232 cable that allows refractive data collection at a temporal
resolution of 5 Hz. No special operation was needed during dynamic data collection.
It was necessary to depress the WAM-5500 joystick button once to start and again to
stop recording at the beginning and end of the desired time frame, respectively.
The software records dynamic results, including time (in seconds) of each reading
for pupil size and MSE (mean spherical equivalent) refraction in the form of an Excel
Comma Separated Values (CSV) file [8, 9].
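As an illustration only, a dynamic-mode export of this kind could be read back for analysis as sketched below; the column names are assumptions made for the sketch, not the documented WCS-1 format.

import csv

def load_wam_csv(path):
    """Read a WAM-5500 dynamic-mode CSV export into three lists:
    time (s), pupil size (mm) and MSE refraction (D).
    The column names used here are illustrative assumptions."""
    t, pupil, mse = [], [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            t.append(float(row["Time(s)"]))        # assumed column name
            pupil.append(float(row["PupilSize"]))  # assumed column name
            mse.append(float(row["MSE"]))          # assumed column name
    return t, pupil, mse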

Fig. 3. Auto ref/keratometer WAM-5500 (Grand Seiko Co. Ltd., Hiroshima, Japan)
2.2 EMR-9

The EMR-9 (Fig. 4) measured eye movement using the pupillary/corneal reflex
method. The horizontal measurement range was 40 degrees, the vertical range was 20
degrees, and the measurement rate was 60 Hz. This consisted of two video cameras
fixed to the left and right sides of the face, plus another camera (field-shooting unit)
fixed to the top of the forehead. Infrared light sources were positioned in front of each
lower eyelid. The side cameras recorded infrared light reflected from the cornea of
each eye while the camera on top of the forehead recorded pictures shown on the
screen. After a camera controller superimposed these three recordings with a 0.01 s electronic timer, the combined recording was stored on an SD card. Movement of
more than 1 degree with a duration greater than 0.1 s was scored as an eye movement.
A gaze point was defined by a gaze time exceeding 0.1 s. This technique enabled us
to determine eye fixation points. The wavelength of the infrared light was 850 nm.
After data were preserved on an SD card, they were read into a personal computer
[10,11].
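The scoring rule quoted above (movement of more than 1 degree lasting over 0.1 s counts as an eye movement; a gaze point requires a gaze time exceeding 0.1 s) can be approximated by a simple dispersion grouping of the 60 Hz samples. The sketch below is a simplified reading of that rule, not NAC's algorithm.

import numpy as np

def gaze_points(gaze_deg, rate_hz=60.0, disp_thresh_deg=1.0, min_dur_s=0.1):
    """gaze_deg: (N, 2) array of horizontal/vertical gaze angles in degrees.
    Consecutive samples staying within disp_thresh_deg of the running segment
    centre are grouped; segments lasting at least min_dur_s seconds are
    returned as (onset_s, duration_s, (x_deg, y_deg)) gaze points."""
    gaze_deg = np.asarray(gaze_deg, dtype=float)
    min_samples = int(np.ceil(min_dur_s * rate_hz))
    points, start = [], 0
    for i in range(1, len(gaze_deg) + 1):
        centre = gaze_deg[start:i].mean(axis=0)
        ended = i == len(gaze_deg) or np.linalg.norm(gaze_deg[i] - centre) > disp_thresh_deg
        if ended:
            if i - start >= min_samples:
                points.append((start / rate_hz, (i - start) / rate_hz, tuple(centre)))
            start = i
    return points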

Fig. 4. EMR-9 (NAC Image Technology Inc., Tokyo, Japan)

2.3 Experiment

These two devices were combined, and we simultaneously measured focus distances
of accommodation and convergence when subjects were gazing at objects (Fig. 5).
The experiment was conducted according to the following procedures. Subjects’
accommodation and convergence were measured as they gazed in binocular vision at
an object (tennis ball: diameter 7 cm) presented in front of them. The object moved in
a range of 0.5 m to 1 m, with a cycle of 10 seconds. Measurements were made four
times every 40 seconds. The illuminance of the experimental environment was about
103 (lx), and the brightness of the object in this environment was 46.9 (cd/m2).
Fig. 5. Pattern diagram of measurements

3 Results
In this study, we simultaneously measured subjects’ accommodation and convergence
while they were gazing at an object in binocular vision. The results of these
measurements were comparable for all subjects.
The results of the experiment for two subjects are shown as typical examples
(Fig. 6, Fig. 7). In Fig. 6 and 7, “accommodation” stands for focal length of lens
accommodation, and “convergence” stands for convergence focal length. These
figures show that the accommodation and convergence of both subject A and B
changed in agreement. Moreover, the change in the diopter value occurred with a
cycle of about ten seconds. Maximum diopter values of accommodation and
convergence of A and B were both about 2D, which is equal to 0.5 m. This was
consistent with the distance from the subject to the object. On the other hand, their
minimum values were accommodation distance of 1 D, equal to 1 m, and convergence
distance of 0.7 D, equal to 1.43 m. Convergence was consistent with the distance to
the object, but accommodation was focused a little beyond the object (about 0.3 D).

Fig. 6. Example of measurement: subject A



Fig. 7. Example of measurement: subject B

4 Discussion
In this experiment, we used the WAM-5500 and the EMR-9. A previous study examined the performance of the WAM-5500 and, by comparing its readings with subjective findings over the range from -6.38 to +4.88 D, reported a measurement accuracy of -0.01 D ± 0.38 D [9]. Eyestrain and transient myopia have also been investigated using the WAM-5500 [12, 13]. Experiments examining the accuracy of the DCC (dynamic cross cylinder) test have also been conducted; significant differences between test values and measured data were found, and the reliability of the DCC was questioned [14].
Queirós et al. [8] used the WAM-5500 to investigate the influence of fogging lenses and cycloplegia on automatic refraction. With respect to the eye mark recorder, Egami et al. [11] showed several kinds of pictures to investigate age-related differences in fatigue and learning effects. Sakaki [10] attempted to estimate the intention of users' arm motion from gaze data obtained with an eye mark recorder and used this for the proactive motion of an upper-extremity support robot. In addition, Nakashima et al. [15] examined the possibility of early diagnosis of dementia from senior citizens' eye movements with the eye mark recorder. The eye mark recorder has thus been used in
various types of research. As mentioned above, much research has investigated the
performance and characteristics of these instruments, and experiments using them
have been conducted. In this experiment, we measured the accommodation distance
and the convergence distance while subjects watched an object. We calculated
convergence distance based on coordinated data for both eyes from pupil distance.
Our results showed that subjects' accommodation and convergence changed between a near and a far position while they were gazing at the
object. Moreover, these changes occurred at a constant cycle, tuned to the movement
of the object. Therefore, subjects viewed the object with binocular vision, and we
could measure the results. Accommodation weakened by about 0.3 D when the object was at its furthest position, at 1 m. This indicates that the lens may not accommodate strictly, with a lag of about 0.4 D, nearly in agreement with our previous
findings [16]. While convergence was almost consistent with the distance from the
subject to the object, accommodation was often located a little beyond the object. This
is thought to originate from the fact that the target is still seen even if focus is not exact, because of the depth of field. These measurements were done in healthy young males, and in this case it can be said that accommodation and convergence were consistent with the distance to the object when the subjects were gazing at it. Further investigation is needed to see whether the same results will be obtained in different conditions, such as when the subjects are women, not emmetropic, or older. In
conclusion, it was possible to simultaneously measure both accommodation and
convergence when subjects were gazing at an object. It can be said that the present
measurement method is an effective technique for the measurement of visual
function, and that correct values can be obtained even during stereoscopic vision.
Additionally, in future studies, higher quality evaluation of 3-dimensional images will
be possible by comparing subjects when they are viewing a 3-dimensional image and
when they are viewing the actual object.
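As a purely geometric illustration of the convergence-distance computation mentioned in the Discussion (from the binocular gaze data and the pupil distance), the distance to the fixation point can be estimated from the vergence angle under a symmetric-vergence assumption; this is a sketch, not the procedure actually applied to the EMR-9 output.

import math

def convergence_distance_m(ipd_m, left_in_deg, right_in_deg):
    """Distance (m) to the binocular fixation point, given the interpupillary
    distance and the inward (nasal) horizontal rotation of each eye in degrees.
    Symmetric vergence on the midline is assumed; illustrative only."""
    vergence_rad = math.radians(left_in_deg + right_in_deg)
    if vergence_rad <= 0.0:
        return math.inf                 # parallel or diverging lines of sight
    return (ipd_m / 2.0) / math.tan(vergence_rad / 2.0)

# Example: a 63 mm pupil distance with each eye rotated 3 degrees inward
# corresponds to a fixation distance of about 0.6 m (roughly 1.67 D).
print(round(convergence_distance_m(0.063, 3.0, 3.0), 2))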

References
1. Donders, F.C.: On the Anomalies of Accommodation and Refraction of the Eye. New Sydenham Soc., London (1864); reprinted by Milford House, Boston (1972)
2. Fincham, E.F.: The mechanism of Accommodation. Br. J. Ophthalmol. 21, monograph
supp. 8 (1937)
3. Krishman, V.V., Shirachi, D., Stark, L.: Dynamic measures of vergence accommodation.
Am. J. Optom. Physiol. Opt. 54, 470–473 (1977)
4. Ukai, K., Tanemoto, Y., Ishikawa, S.: Direct recording of accommodative response versus
accommodative stimulus. In: Breinin, G.M., Siegel, I.M. (eds.) Advances in Diagnostic
Visual Optic, pp. 61–68. Springer, Berlin (1983)
5. Cho, A., Iwasaki, T., Noro, K.: A study on visual characteristics binocular 3-D images.
Ergonomics 39(11), 1285–1293 (1996)
6. Sierra, R., et al.: Improving 3D Imagery with Variable Convergence and Focus
Accommodation for the Remote Assessment of Fruit Quality. In: SICE-ICASE
International Joint Conference 2006, pp. 3553–3558 (2006)
7. Miyao, M., et al.: Visual accommodation and subject performance during a stereographic
object task using liquid crystal shutters. Ergonomics 39(11), 1294–1309 (1996)
8. Queirós, A., González-Méijome, J., Jorge, J.: Influence of fogging lenses and cycloplegia
on open-field automatic refraction. Ophthal. Physiol. Opt. 28, 387–392 (2008)
9. Sheppard, A.L., Davies, L.N.: Clinical evaluation of the Grand Seiko Auto
Ref/Keratometer WAM-5500. Ophthal. Physiol. Opt. 30, 143–151 (2010)
10. Sakaki, T.: Estimation of Intention of User Arm Motion for the Proactive Motion of Upper
Extremity Supporting Robot. In: 2009 IEEE 11th International Conference on
Rehabilitation Robotics, Kyoto International Conference Center, Japan, June 23-26 (2009)
11. Egami, C., Morita, K., Ohya, T., Ishii, Y., Yamashita, Y., Matsuishi, T.: Developmental
characteristics of visual cognitive function during childhood according to exploratory eye
movements. Brain & Development 31, 750–757 (2009)
12. Tosha, C., Borsting, E., Ridder, W.H., Chase, C.: Accommodation response and visual
discomfort. Ophthal. Physiol. Opt. 29, 625–633 (2009)
13. Borsting, E., Tosha, C., Chase, C., Ridder, W.H.: Measuring Near-Induced Transient
Myopia in College Students with Visual Discomfort. American Academy of
Optometry 87(10) (2010)
14. Benzoni, J.A., Collier, J.D., McHugh, K., Rosenfield, M., Portello, J.K.: Does the dynamic
cross cylinder test measure the accommodative response accurately? Optometry 80, 630–
634 (2009)
15. Nakashima, Y., Morita, K., Ishii, Y., Shouji, Y., Uchimura, N.: Characteristics of
exploratory eye movements in elderly people: possibility of early diagnosis of dementia.
Psychogeriatrics 10, 124–130 (2010)
16. Miyao, M., Otake, Y., Ishihara, S., Kashiwamata, M., Kondo, T., Sakakibara, H., Yamada,
S.: An Experimental Study on the Objective Measurement of Accommodative Amplitude
under Binocular and Natural Viewing Conditions. Tohoku J. Exp. Med. 170, 93–102
(1993)
Comparison in Degree of the Motion Sickness Induced by
a 3-D Movie on an LCD and an HMD

Hiroki Takada1,2, Yasuyuki Matsuura3, Masumi Takada2, and Masaru Miyao3


1 Graduate School of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan
2 Aichi Medical University, 21 Iwasaku Karimata, Nagakute, Aichi 480-1195, Japan
3 Nagoya University, Furo-cho, Chikusa-Ku, Nagoya 464-8601, Japan
takada@u-fukui.ac.jp

Abstract. Three-dimensional (3D) television sets are already on the market
and are becoming increasingly popular among consumers. Watching
stereoscopic 3D movies, though, can produce certain adverse effects such as
asthenopia and motion sickness. Visually induced motion sickness (VIMS)
is considered to be caused by an increase in visual-vestibular sensory
conflict while viewing stereoscopic images. VIMS can be analyzed both
psychologically and physiologically. According to our findings reported at
the last HCI International conference, VIMS could be detected with the total
locus length and sparse density, which were used as analytical indices of
stabilograms. In the present study, we aim to analyze the severity of motion
sickness induced by viewing conventional 3D movies on a liquid crystal
display (LCD) compared to that induced by viewing these movies on a
head-mounted display (HMD). We quantitatively measured the body sway
in a resting state and during exposure to a conventional 3D movie on an
LCD and HMD. Subjects maintained the Romberg posture during the
recording of stabilograms at a sampling frequency of 20 Hz. The simulator
sickness questionnaire (SSQ) was completed before and immediately after
exposure. Statistical analyses were applied to the SSQ subscores and to the
abovementioned indices (total locus length and sparse density) for the
stabilograms. Friedman tests showed the main effects in the indices for the
stabilograms. Multiple comparisons revealed that viewing the 3D movie on
the HMD significantly affected the body sway, despite a large visual
distance.

Keywords: visually induced motion sickness, stabilometry, sparse density,
liquid crystal displays (LCDs), head-mounted displays (HMDs).

1 Introduction
The human standing posture is maintained by the body’s balance function, which is an
involuntary physiological adjustment mechanism called the “righting reflex” [1]. This
righting reflex, which is centered in the nucleus ruber, is essential to maintain the
standing posture when locomotion is absent. The body’s balance function utilizes
sensory signals such as visual, auditory, and vestibular inputs, as well as proprioceptive
inputs from the skin, muscles, and joints [2]. The evaluation of this function is
indispensable for diagnosing equilibrium disturbances like cerebellar degenerations,
basal ganglia disorders, or Parkinson’s disease [3].
Stabilometry has been employed for a qualitative and quantitative evaluation of
this equilibrium function. A projection of a subject’s center of gravity onto a detection
stand is measured as an average of the center of pressure (COP) of both feet. The
COP is traced for each time step, and the time series of the projections is traced on an
x-y plane. By connecting the temporally vicinal points, a stabilogram is created, as
shown in Fig. 1. Several parameters are widely used in clinical studies to quantify the
degree of instability in the standing posture: for instance, the area of sway (A), total
locus length (L), and locus length per unit area (L/A). It has been revealed that the last
parameter is particularly related to the fine variations involved in posture control [1].
Thus, the L/A index is regarded as a gauge for evaluating the function of
proprioceptive control of standing in human beings. However, it is difficult to
clinically diagnose disorders of the balance function and identify the decline in
equilibrium function by utilizing the abovementioned indices and measuring patterns
in a stabilogram. Large interindividual differences might make it difficult to
understand the results of such a comparison.
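For illustration only (this is not the authors' software), the clinical indices above can be computed from a sampled COP trajectory roughly as in the sketch below; the use of the convex hull for the area of sway and the synthetic 20-Hz example data are assumptions of this sketch.

import numpy as np
from scipy.spatial import ConvexHull

def stabilogram_indices(x, y):
    # x, y: medial-lateral and anterior-posterior COP samples (e.g., in cm)
    pts = np.column_stack([x, y])
    steps = np.diff(pts, axis=0)
    L = np.sum(np.hypot(steps[:, 0], steps[:, 1]))   # total locus length
    A = ConvexHull(pts).volume                       # in 2-D, .volume is the enclosed area
    return L, A, L / A                               # L, A, and L/A

# Example with synthetic data (60 s recorded at 20 Hz):
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0.0, 0.05, 1200))
y = np.cumsum(rng.normal(0.0, 0.05, 1200))
L, A, L_over_A = stabilogram_indices(x, y)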
Mathematically, the sway in the COP is described by a stochastic process [4]–[6].
We examined the adequacy of using a stochastic differential equation (SDE) and
investigated the most adequate equation for our research. G(x), the distribution of the
observed point x, is related in the following manner to V(x), the (temporally
averaged) potential function, in the SDE, which has been considered to be a
mathematical model of sway:

V(\vec{x}) = -\frac{1}{2}\,\ln G(\vec{x}) + \mathrm{const.}    (1)
The nonlinear property of SDEs is important [7]. There are several minimal points
of potential. In the vicinity of these points, local stable movement with a high-
frequency component can be generated as a numerical solution to the SDE. We can
therefore expect a high density of observed COP in this area on the stabilogram.
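As a minimal illustration of Eq. (1), the time-averaged potential can be estimated from the empirical distribution of COP positions; the two-dimensional histogram, its bin count, and the omission of the additive constant are assumptions of this sketch.

import numpy as np

def estimate_potential(x, y, bins=20):
    # Empirical distribution G of the observed COP positions
    G, xedges, yedges = np.histogram2d(x, y, bins=bins, density=True)
    V = np.full_like(G, np.nan)
    occupied = G > 0
    V[occupied] = -0.5 * np.log(G[occupied])  # Eq. (1), additive constant omitted
    return V, xedges, yedges

# Bins with a high density of COP samples map to low values of V,
# i.e., to the minimal points of the potential discussed above.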
The analysis of stabilograms is useful not only for medical diagnoses but also for
achieving upright standing control in two-legged robots and preventing falls in elderly
people [8]. Recent studies have suggested that maintaining postural stability is one of
the major goals of animals, [9] and that they experience sickness symptoms in
circumstances wherein they have not acquired strategies to maintain their balance
[10]. Although the most widely known theory of motion sickness is based on the
concept of sensory conflict [10]–[12], Riccio and Stoffregen [10] argued that motion
sickness is instead caused by postural instability. Stoffregen and Smart (1999) reported
that the onset of motion sickness may be preceded by significant increases in postural
sway [13].
The equilibrium function in humans deteriorates when viewing 3-dimensional (3D)
movies [14]. It has been considered that this visually induced motion sickness (VIMS)
is caused by a disagreement between vergence and visual accommodation while
viewing 3D images [15]. Thus, stereoscopic images have been devised to reduce this
disagreement [16]–[17].
VIMS can be measured by psychological and physiological methods, and the
simulator sickness questionnaire (SSQ) is a well-known psychological method for
measuring the extent of motion sickness [18]. The SSQ is used in this study to verify
the occurrence of VIMS. The following parameters of autonomic nervous activity are
appropriate for the physiological method: heart rate variability, blood pressure,
electrogastrography, and galvanic skin reaction [19]–[21]. It has been reported that a
wide stance (with the midlines of the heels 17 or 30 cm apart) significantly increases
the total locus length in the stabilograms of individuals with high SSQ scores, while
the length for individuals with low scores is less affected by such a stance [22]. At the
last HCI International conference, we reported that VIMS could be detected by the
total locus length and the sparse density, which were used as analytical indices
of stabilograms [23].
The objective of the present study is to compare the degree of motion sickness
induced by viewing a conventional 3D movie on a liquid crystal display (LCD)
with that from viewing a 3D movie on a head-mounted display (HMD). We
quantitatively measured body sway during the resting state, exposure to a 3D
movie on an LCD, and that on an HMD.

2 Material and Methods


Ten healthy subjects (age, 23.6 ± 2.2 years) voluntarily participated in the study. All
of them were Japanese and lived in Nagoya and its surrounding areas. They provided
informed consent prior to participation. The following subjects were excluded from
the study: subjects working night shifts, those dependent on alcohol, those who
consumed alcohol and caffeine-containing beverages after waking up and less than 2
h after meals, those using prescribed drugs, and those who may have had any
otorhinolaryngologic or neurological disease in the past (except for conductive
hearing impairment, which is commonly found in the elderly). In addition, the
subjects had to have experienced motion sickness at some time during their lives.
We ensured that the body sway was not affected by environmental conditions.
Using an air conditioner, we adjusted the temperature to 25 °C in the exercise room,
which was kept dark. All the subjects were tested in this room from 10 a.m. to 5 p.m.
Three kinds of stimuli were presented in random order: (I) a static circle with a
diameter of 3 cm (resting state); (II) a conventional 3D movie that showed a sphere
approaching and moving away from the subjects, irregularly; and (III) the same
motion picture as shown in (II). Stimuli (I) and (II) were presented on an LCD
monitor (S1911- SABK, NANAO Co., Ltd.). The distance between the LCD and the
subjects was 57 cm. On the other hand, the subjects wore an HMD (iWear AV920;
Vuzix Co. Ltd.) during exposure to the last movie (III). This wearable display is
equivalent to a 62-inch screen viewed at a distance of 2.7 m.
The subjects stood without moving on the detection stand of a stabilometer
(G5500; Anima Co. Ltd.) in the Romberg posture, with their feet together for 1 min
before the sway was recorded. Each sway of the COP was then recorded at a sampling
frequency of 20 Hz; the subjects were instructed to maintain the Romberg posture for
the first 60 s. The subjects viewed one of the stimuli, that is, (I), (II), or (III), from the
beginning till the end. They filled out an SSQ before and after the test.
We calculated several indices that are commonly used in the clinical field [24] for
stabilograms, including the “area of sway,” “total locus length,” and “total locus
length per unit area.” In addition, new quantification indices that were termed SPD S2,
S3, and total locus length of chain [25] were also estimated.

3 Results
The SSQ results are shown in Table 1 and include the nausea (N), oculomotor
discomfort (OD), and disorientation (D) subscale scores, along with the total score
(TS) of the SSQ. No statistical differences were seen in these scores among the
stimuli presented to the subjects.

Fig. 1. Typical stabilograms observed when subjects viewed the static circle (a), the conventional
3D movie on the LCD (b), and the same 3D movie on the HMD (c).

Table 1. Subscales of SSQ after exposure to 3D movies

Subscale    Movie (II) (LCD)    Movie (III) (HMD)
N           14.3 ± 4.8          11.4 ± 3.7
OD          16.7 ± 4.0          18.2 ± 4.1
D           22.3 ± 9.3          23.7 ± 8.8
TS          19.8 ± 5.8          19.8 ± 5.3
Fig. 2. Typical results of Nemenyi tests for the following indicators: total locus length (a) and SPD (b) (**p < 0.01, *p < 0.05, †p < 0.1)

However, there were increases in the scores after exposure to the conventional 3D
movies. Although there were large individual differences, sickness symptoms seemed
to appear more often with the 3D movies.
Typical stabilograms are shown in Fig. 1. In these figures, the vertical axis shows
the anterior and posterior movements of the COP, and the horizontal axis shows the
right and left movements of the COP. The sway amplitudes that were observed
during exposure to the movies (Fig. 1b–1c) tended to be larger than those of the
control sway (Fig. 1a). Although a high COP density was observed in the stabilograms
(Fig. 1a), this density decreased during exposure to the movies (Fig. 1b–1c).
According to the Friedman test, the main effects were seen in the indices of the
stabilograms, except for the chain (p < 0.01). Nemenyi tests were employed as a post-
hoc procedure after the Friedman test (Fig. 2). Five of the six indices were enhanced
significantly by exposure to the 3D movie on the HMD (p < 0.05). Except for the total
locus length, there was no significant difference between the values of the indices
measured during the resting state and exposure to the 3D movie on the LCD (p < 0.05).
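As an illustration of this analysis pipeline (not the authors' code), the Friedman test and Nemenyi post-hoc comparisons could be run as sketched below; the availability of the scikit-posthocs package and the numerical values are assumptions, with one stabilogram index per column layout.

import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # assumed to be installed; provides Nemenyi post-hoc tests

# Rows = subjects, columns = conditions (I) rest, (II) LCD, (III) HMD;
# the numbers are illustrative placeholders, not the measured index values.
scores = np.array([
    [52.1, 63.4, 78.9],
    [47.8, 59.2, 70.3],
    [55.0, 61.7, 82.5],
    [49.6, 58.8, 74.1],
    [51.3, 60.9, 69.7],
])

stat, p = friedmanchisquare(*scores.T)            # main effect across the three conditions
pairwise_p = sp.posthoc_nemenyi_friedman(scores)  # condition-by-condition p-values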

4 Discussion
A theory has been proposed to obtain SDEs as a mathematical model of body sway on
the basis of stabilograms.
We questioned whether the random force vanished from the mathematical model of
the body sway. Using our Double-Wayland algorithm [26]–[27], we evaluated the
degree of visible determinism in the dynamics of the COP sway. Representative
results of the Double-Wayland algorithm are shown in Fig. 3. We calculated
translation errors Etrans derived from the time series x (Fig. 3a, 3c). The translation
errors Etrans’ were also derived from their temporal differences (differenced time
series). Regardless of whether a subject was exposed to the 3D movie on the HMD
(III), the Etrans’ was approximately 1.
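The translation error underlying this analysis can be sketched as follows; this is a simplified illustration rather than the authors' exact implementation, and the embedding delay, the number of reference points and neighbours, and the use of the median are choices of this sketch. The Double-Wayland variant simply applies the same estimator to the differenced series as well.

import numpy as np

def translation_error(series, dim, tau=1, n_ref=50, n_nbr=4, seed=0):
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    # Delay embedding: v(t) = (s(t), s(t+tau), ..., s(t+(dim-1)*tau))
    n_vec = len(series) - (dim - 1) * tau - 1
    emb = np.column_stack([series[i * tau: i * tau + n_vec + 1] for i in range(dim)])
    errors = []
    for t in rng.choice(n_vec, size=min(n_ref, n_vec), replace=False):
        d = np.linalg.norm(emb[:n_vec] - emb[t], axis=1)
        nbrs = np.argsort(d)[: n_nbr + 1]      # the point itself plus its nearest neighbours
        trans = emb[nbrs + 1] - emb[nbrs]      # one-step translation vectors
        mean_trans = trans.mean(axis=0)
        errors.append(np.mean(np.linalg.norm(trans - mean_trans, axis=1))
                      / np.linalg.norm(mean_trans))
    return np.median(errors)

# Double-Wayland: evaluate the raw series and its first difference.
# e_raw  = [translation_error(x, m) for m in range(1, 11)]
# e_diff = [translation_error(np.diff(x), m) for m in range(1, 11)]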
[Fig. 3, panels (a)–(d): translation error (Etrans) versus dimension of the embedding space, plotted for the time series and the differenced time series]

Fig. 3. Mean translation error for each embedding space. The translation errors were estimated from
the stabilograms that were observed when subjects viewed a static circle (a)–(b) and conventional
3D movie on the HMD (c)–(d). We derived the values from the time series y (b), (d).

The translation errors in each embedding space were not significantly different
from those derived from time series x and y. Thus, Etrans > 0.5 was obtained using the
Wayland algorithm, which implies that the time series could be generated by a
stochastic process in accordance with a previous standard [28]. This 0.5 threshold is
half of the translation error resulting from a random walk. Body sway has previously
been described by stochastic processes [4]–[7], which were shown using the Double-
Wayland algorithm [29]. Moreover, 0.8 < Etrans’ < 1 exceeded the translation errors
Etrans estimated by the Wayland algorithm, as shown in Fig. 3b. However, the
translation errors estimated by the Wayland algorithm were similar to those obtained
from the temporal differences, except for the case in Fig. 3b, which agrees with the
abovementioned explanation of the dynamics for controlling a standing posture. The
exposure to 3D movies would not change the dynamics into a deterministic one.
Mechanical variations were not observed in the locomotion of the COP. We assumed
that the COP was controlled by a stationary process, and the sway during exposure to
the static control image (I) could be compared with that when the subject viewed the
conventional 3D movie on the HMD. The indices for the stabilograms might reflect
the coefficients in stochastic processes, although no significant difference in
translation error was seen in a comparison of the stabilograms measured during
exposure to (I) and (III). Regarding the system to control our standing posture during
exposure to the 3D movie on the LCD (II), similar results were obtained.
The anterior-posterior direction y was considered to be independent of the
medial-lateral direction x [30]. SDEs on the Euclidean space E2 (x, y)

\frac{\partial x}{\partial t} = -\frac{\partial}{\partial x} U_x(x) + w_x(t)    (2)

\frac{\partial y}{\partial t} = -\frac{\partial}{\partial y} U_y(y) + w_y(t)    (3)

have been proposed as mathematical models for generating stabilograms [4]–[7].


Pseudorandom numbers were generated by the white noise terms wx(t) and wy(t).
Constructing nonlinear SDEs from the stabilograms (Fig. 1) in accordance with Eq.
(1) revealed that their temporally averaged potential functions, Ux and Uy, have plural
minimal points, and fluctuations can be observed in the neighborhood of these points
[7]. The variance in the stabilogram depends on the form of the potential function in
the SDE; therefore, the SPD is regarded as an index for its measurement.
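For illustration, a numerical realization of this model can be generated with an Euler-type scheme as sketched below; the double-well form of the potential, the noise level, and the step size are illustrative assumptions, not the potential functions fitted in the study.

import numpy as np

def simulate_sway(n_steps=1200, dt=0.05, noise=0.4, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    y = np.zeros(n_steps)
    for k in range(1, n_steps):
        # -dU/dx for U(x) = (x^2 - 1)^2, which has two stable minimal points
        drift_x = -4.0 * x[k - 1] * (x[k - 1] ** 2 - 1.0)
        drift_y = -4.0 * y[k - 1] * (y[k - 1] ** 2 - 1.0)
        x[k] = x[k - 1] + drift_x * dt + noise * np.sqrt(dt) * rng.normal()
        y[k] = y[k - 1] + drift_y * dt + noise * np.sqrt(dt) * rng.normal()
    return x, y  # a synthetic stabilogram fluctuating around the potential minima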
Regardless of the display on which the 3D movies were presented, multiple
comparisons indicated that the total locus length during exposure to the stereoscopic
movies was significantly larger than that during the resting state (Fig. 2a). As shown
in Fig. 1b and 1c, obvious changes in the form and coefficients of the potential
function (1) occur. Structural changes might occur in the time-averaged potential
function (1) with exposure to stereoscopic images, which are assumed to reflect
the sway in the center of gravity. We considered that the decrease in the gradient of
the potential increased the total locus length of the stabilograms during exposure to
the stereoscopic movies. The standing posture becomes unstable because of the
effects of the stereoscopic movies.
Most of the indices during exposure to the 3D movie on the HMD were
significantly greater than those in the resting state, although there was no significant
difference between the indices of the stabilograms during the resting state and those
during exposure to the 3D movie on the LCD (Fig. 2). In this study, the apparent size
of the LCD was greater than that of the HMD. Despite the size and visual distance,
the 3D movie on the HMD affected the subject’s equilibrium function. Hence, by
using the indicators involved in the stabilograms, we noted postural instability during
exposure to the conventional stereoscopic images on the HMD. The next step will
involve an investigation with the goal of proposing guidelines for the safe viewing of
3D movies on HMDs.

References
1. Okawa, T., Tokita, T., Shibata, Y., Ogawa, T., Miyata, H.: Stabilometry - Significance of
locus length per unit area (L/A) in patients with equilibrium disturbances. Equilibrium
Res. 55(3), 283–293 (1995)
2. Kaga, K., Memaino, K.: Structure of vertigo. Kanehara, Tokyo 23-26, 95–100 (1992)
3. Okawa, T., Tokita, T., Shibata, Y., Ogawa, T., Miyata, H.: Stabilometry - Significance of
locus length per unit area (L/A). Equilibrium Res. 54(3), 296–306 (1996)
4. Collins, J.J., De Luca, C.J.: Open-loop and closed-loop control of posture: A random-walk
analysis of center of pressure trajectories. Exp. Brain Res. 95, 308–318 (1993)
5. van Emmerik, R.E.A., Sprague, R.L., Newell, K.M.: Assessment of sway dynamics in
tardive dyskinesia and developmental disability: Sway profile orientation and stereotypy.
Movement Disorders 8, 305–314 (1993)
6. Newell, K.M., Slobounov, S.M., Slobounova, E.S., Molenaar, P.C.: Stochastic processes in
postural center-of-pressure profiles. Exp. Brain Res. 113, 158–164 (1997)
7. Takada, H., Kitaoka, Y., Shimizu, Y.: Mathematical index and model in stabilometry.
Forma 16(1), 17–46 (2001)
8. Fujiwara, K., Toyama, H.: Analysis of dynamic balance and its training effect - Focusing
on fall problem of elder persons. Bulletin of the Physical Fitness Research Institute 83,
123–134 (1993)
9. Stoffregen, T.A., Hettinger, L.J., Haas, M.W., Roe, M.M., Smart, L.J.: Postural instability
and motion sickness in a fixed-base flight simulator. Human Factors 42, 458–469 (2000)
10. Riccio, G.E., Stoffregen, T.A.: An ecological theory of motion sickness and postural
instability. Ecological Psychology 3(3), 195–240 (1991)
11. Oman, C.: A heuristic mathematical model for the dynamics of sensory conflict and
motion sickness. Acta Otolaryngologica Supplement 392, 1–44 (1982)
12. Reason, J.: Motion sickness adaptation: A neural mismatch model. J. Royal Soc. Med. 71,
819–829 (1978)
13. Stoffregen, T.A., Smart, L.J., Bardy, B.J., Pagulayan, R.J.: Postural stabilization of
looking. Journal of Experimental Psychology. Human Perception and Performance 25,
1641–1658 (1999)
14. Takada, H., Fujikake, K., Miyao, M., Matsuura, Y.: Indices to detect visually induced
motion sickness using stabilometry. In: Proc. VIMS 2007, pp. 178–183 (2007)
15. Hatada, T.: Nikkei electronics, vol. 444, pp. 205–223 (1988)
16. Yasui, R., Matsuda, I., Kakeya, H.: Combining volumetric edge display and multiview
display for expression of natural 3D images. In: Proc. SPIE, vol. 6055, pp. 0Y1–0Y9
(2006)
17. Kakeya, H.: MOEVision: Simple multiview display with clear floating image. In: Proc.
SPIE, vol. 6490, p. 64900J (2007)
18. Kennedy, R.S., Lane, N.E., Berbaum, K.S., Lilienthal, M.G.: A simulator sickness
questionnaire (SSQ): A new method for quantifying simulator sickness. International J.
Aviation Psychology 3, 203–220 (1993)
19. Holmes, S.R., Griffin, M.J.: Correlation between heart rate and the severity of motion
sickness caused by optokinetic stimulation. J. Psychophysiology 15, 35–42 (2001)
20. Himi, N., Koga, T., Nakamura, E., Kobashi, M., Yamane, M., Tsujioka, K.: Differences in
autonomic responses between subjects with and without nausea while watching an
irregularly oscillating video. Autonomic Neuroscience. Basic and Clinical 116, 46–53
(2004)
21. Yokota, Y., Aoki, M., Mizuta, K.: Motion sickness susceptibility associated with visually
induced postural instability and cardiac autonomic responses in healthy subjects. Acta
Otolaryngologia 125, 280–285 (2005)
22. Scibora, L.M., Villard, S., Bardy, B., Stoffregen, T.A.: Wider stance reduces body sway
and motion sickness. In: Proc. VIMS 2007, pp. 18–23 (2007)
23. Fujikake, K., Miyao, M., Watanabe, T., Hasegawa, S., Omori, M., Takada, H.: Evaluation
of body sway and the relevant dynamics while viewing a three-dimensional movie on a
head-mounted display by using stabilograms. In: Shumaker, R. (ed.) VMR 2009. LNCS,
vol. 5622, pp. 41–50. Springer, Heidelberg (2009)
24. Suzuki, J., Matsunaga, T., Tokumatsu, K., Taguchi, K., Watanabe, Y.: Q&A and a manual
in stabilometry. Equilibrium Res. 55(1), 64–77 (1996)
25. Takada, H., Kitaoka, Y., Ichikawa, S., Miyao, M.: Physical meaning on geometrical index
for stabilometry. Equilibrium Res. 62(3), 168–180 (2003)
26. Wayland, R., Bromley, D., Pickett, D., Passamante, A.: Recognizing determinism in a time
series. Phys. Rev. Lett. 70, 580–582 (1993)
27. Takada, H., Morimoto, T., Tsunashima, H., Yamazaki, T., Hoshina, H., Miyao, M.:
Applications of Double-Wayland algorithm to detect anomalous signals. FORMA 21(2),
159–167 (2006)
28. Matsumoto, T., Tokunaga, R., Miyano, T., Tokuda, I.: Chaos and time series, Baihukan,
Tokyo, pp. 49–64 (2002) (in Japanese)
29. Takada, H., Shimizu, Y., Hoshina, H., Shiozawa, Y.: Wayland tests for differenced time
series could evaluate degrees of visible determinism. Bulletin of Society for Science on
Form 17(3), 301–310 (2005)
30. Goldie, P.A., Bach, T.M., Evans, O.M.: Force platform measures for evaluating postural
control: Reliability and validity. Arch. Phys. Med. Rehabil. 70, 510–517 (1989)
Evaluation of Human Performance Using Two Types
of Navigation Interfaces in Virtual Reality

Luís Teixeira1, Emília Duarte2, Júlia Teles3, and Francisco Rebelo1


1 Ergonomics Laboratory, FMH/Technical University of Lisbon, Estrada da Costa, 1499-002 Cruz Quebrada - Dafundo, Portugal
2 UNIDCOM/IADE – Superior School of Design, Av. D. Carlos I, no. 4, 1200-649 Lisbon, Portugal
3 Mathematics Unit, FMH/Technical University of Lisbon, Estrada da Costa, 1499-002 Cruz Quebrada - Dafundo, Portugal
{lmteixeira,jteles,frebelo}@fmh.utl.pt, emilia.duarte@iade.pt

Abstract. Most Virtual Reality studies use a hand-centric device as a
navigation interface. Since this can be a problem when the user is also
required to manipulate objects, and can even distract a participant from other
tasks if he has to "think" about how to move, a more natural, leg-centric
interface seems more appropriate. This study compares human performance
variables (distance travelled, time spent and task success) when using a
hand-centric device (Joystick) and a leg-centric interface (Nintendo Wii
Balance Board) while interacting in a Virtual Environment in a search task.
Forty university students (equally distributed in gender and number by
experimental conditions) participated in this study. Results show that
participants were more efficient when performing navigation tasks using the
Joystick than with the Balance Board. However, there were no significant
differences in task success.

Keywords: Virtual Reality, Navigation interfaces, Human performance.

1 Introduction
The most common navigation interfaces used in Virtual Reality (VR) are
hand-centric. This may pose a problem when, besides navigating in the Virtual
Environment (VE), the user is also required to interact further with it, for example,
to manipulate objects. It can also be a problem if hand-centric navigation distracts
the participant from other tasks in the VE, i.e., when the participant has to
"think" about how to move. In addition, since a hand-centric interface does not
reproduce a natural navigation movement, it might not allow performance similar to
that of other types of interface, such as one that uses leg or foot movement to
represent motion.
Because of the abovementioned limitations, navigation interfaces are an important
issue for VR and, as such, several attempts at creating new types of interface have
been made (e.g. [1], [2], [3]). Slater, Usoh et al. [1] used a walk-in-place technique to
navigate in the VE. Peterson, Wells et al. [2] presented a body-controlled interface
called the Virtual Motion Controller that uses the body to generate motion commands.
Beckhaus, Blom and Haringer [3] proposed two different types of interface for
navigation in VEs. One is based on a dance pad usually used for dance games. The
other proposed interface is a chair-based interface.
Another possible solution is to use already existing interfaces
(e.g., interfaces used for game consoles) in a new role, as navigation interfaces
for VR.
In recent years, new game interfaces have been created that are more
entertaining and involving than previous ones (for example, the Nintendo Wiimote
compared to a more traditional gamepad). This new type of interface
can be adapted and used for VR navigation or for some other kind of interaction,
giving it a new meaning. Although most of these new interfaces are hand-centric,
there is one interface, the Nintendo® Wii Balance Board [4] (Balance Board
hereafter), that uses the weight of the person and can be controlled with the
lower body of the user, making it closer to a natural navigation interface.
Hilsendeger, Brandauer, Tolksdorf and Fröhlich [5] used the Balance Board as a
navigational interface for VEs. Their solution defines a user vector of movement,
based on the pressure applied to each of the four sensors, which is translated into
movement within the VE. They presented two forms of navigation in the VE: direct
control of speed and an acceleration mode. Direct control of speed uses the leaning of
the participant on the platform to create the desired movement and speed in the VE.
The acceleration mode only requires that the participant lean in the direction they
intend to go, which creates the desired acceleration for the movement; after that, the
participant can stand still and the velocity remains constant.
However, few studies compare users' performance in a VE when using a
leg-centric versus a hand-centric navigation interface. Thus, the main objective of
this study is to compare two types of navigational interface (Balance Board and
Joystick) in VEs using performance variables such as time spent, distance travelled and
task success. It was hypothesized that individuals who use the Balance Board as a
navigation interface have better performance in the search task.

2 Method

2.1 Study’s Design and Protocol

To test the formulated hypothesis, an experimental study was developed with two
conditions, Balance Board and Joystick. These experimental conditions were
evaluated with a search task in a VE using the following performance
variables: time spent, distance travelled and task success.
The study used a between-subjects design. The selection criteria only allowed
participants who were university students (between 18 and 35 years old), were fluent
in the Portuguese language, had no color vision deficiencies (tested by the Ishihara
Test [6]), did not need to wear glasses (corrective lenses were allowed), since the
Head-Mounted Display did not allow the use of glasses, and reported being in good
physical and mental health.
The experimental session was divided into four stages: (1) signing of the consent form
and introduction to the study; (2) training; (3) simulation; and (4) open-ended
questions.
(1) At the beginning of the session, participants signed a consent form and were
advised that they could end the experiment at any time. In this part of the
experimental session they were also introduced to the study and to the equipment,
so as to learn how they would use it to interact with the simulation. Participants were
told that we were testing new VR software that automatically captures human
interaction data. This was done to reduce the possibility of participants deliberately
trying to perform better with a specific navigation interface.
(2) Participants were placed in a training VE to familiarize themselves with the equipment.
The environment contained a small room with a pillar in the center of it and a
connection to a zigzag type of corridor. Participants were told that they could explore
the area freely until they felt able to control the navigational interface. After that, the
researcher asked them to specifically go around the pillar in both directions and go to
the end of the corridor. If the participant could achieve these small goals without
difficulties, the researcher would consider that the participant was able to do the
simulation.
(3) The scenario was an end-of-day routine check in which the participants' main
task was to push six buttons in the VE. Messages on boards in the VE directed the
participant to the buttons. They were also told that the first
instruction was in the “Meeting Room”. The total number of buttons was omitted in
the instructions. The simulation ended after 20 minutes (the researcher would stop the
simulation if the participant seemed lost in the environment) or if the participant
reached a specific end point in the VE after activating a certain trigger.
(4) After the simulation, participants were interviewed (open-ended questions) about the
difficulties that they experienced while immersed and about their overall opinion
regarding the interaction quality.

2.2 Sample

Forty university students (20 males and 20 females) participated and were equally
distributed in gender and number across the two experimental conditions. The
participants declared that they had not used the Balance Board before.
For the Joystick condition, participants were between 19 and 34 years old
(mean = 22.9, SD = 3.32), and for the Balance Board condition, they were between 18
and 29 years old (mean = 21.10, SD = 3.11).

2.3 Virtual Environment

The VE was designed with the aim of promoting the immersion of the participants
and of creating a more natural interaction with the navigational interfaces under
study. As such, the VE was an office building containing four symmetrical rooms
(meeting room, laboratory, cafeteria and warehouse), each measuring 12 by 12 meters.
The rooms were separated by two perpendicular axes of corridors and surrounded
by another corridor, each corridor being 2 meters wide.
There were six buttons placed on the walls distributed in the VE. The participants
were directed to each button through messages with instructions placed on boards in
each room. An orientation signage system was also designed in order to help
participants find the respective rooms. These signs were wall-mounted directional
signs, in panels, with pictorials, arrows and verbal information.
The VE was modeled in Autodesk 3dsMax v2009 and exported through the plugin
Ogremax v1.6.23 and presented by the ErgoVR software.

2.4 Equipment

The equipment used for this study in both experimental conditions was: (a) two
magnetic motion trackers from Ascension-Tech®, model Flock of Birds, with 6DOF,
used for the motion detection of the head and arm; (b) Head-Mounted Display from
Sony®, model PLM-S700E; (c) Wireless headphones from Sony®, model MDR-
RF800RK; (d) Graphics Workstation with an Intel® i7 processor, 8 Gigabytes of
RAM and a nVIDIA® QuadroFX4600.
For the Balance Board condition, a Nintendo® Wii Balance Board was used as the
navigation interface, and for the Joystick condition a Thrustmaster® USB Joystick
was used.

2.5 Navigation

The Balance Board (see Fig. 1) has four pressure sensors in its corners that are used to
measure the user’s center of balance and weight. The center of balance is the
projection of the center of mass onto the Balance Board platform. That projection can
be used as a reference for the corresponding movement in the VE.

Fig. 1. Images of the Nintendo Wii Balance Board. Seen from above on the Left and seen from
below on the Right (images from Nintendo®’s official website and the manual).

The navigation solution used in this study for the Balance Board is similar to the
direct control of speed described by Hilsendeger, Brandauer, Tolksdorf and
Fröhlich [5]. That is, navigation is performed by leaning on the platform, i.e., by
applying more pressure on different areas of it. If the participant wants to move
forward (or backward) in the VE, he/she just needs to apply more pressure on the
forward (or backward) sensors of the platform. If the participant applies more
pressure on the left or the right sensors of the platform, the virtual body rotates
about its own axis. Combining forward or backward leaning with left or right leaning
has the expected result: the virtual body moves forward or backward while rotating
left or right.
The same leaning movement principle was used for the navigation regarding the
Joystick.
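A minimal sketch of such a direct-speed mapping is shown below; it is not the ErgoVR implementation, and the sensor names, gains, and dead zone are assumptions made for this illustration.

def board_to_motion(front_left, front_right, rear_left, rear_right,
                    speed_gain=1.0, turn_gain=1.0, dead_zone=0.05):
    # Map the four corner loads to (translation speed, rotation rate) in the VE
    total = front_left + front_right + rear_left + rear_right
    if total <= 0.0:
        return 0.0, 0.0
    forward = ((front_left + front_right) - (rear_left + rear_right)) / total
    sideways = ((front_right + rear_right) - (front_left + rear_left)) / total
    # Ignore small, unintentional leans
    forward = 0.0 if abs(forward) < dead_zone else forward
    sideways = 0.0 if abs(sideways) < dead_zone else sideways
    return speed_gain * forward, turn_gain * sideways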

3 Results

The variables automatically collected by the ErgoVR software [7] were Time spent,
Distance travelled and Task success. Time spent is the time, in seconds, and Distance
travelled is the distance, in meters, from the start of the simulation to the moment
when the participant reached the trigger (or decided to stop the simulation). Task
success is given by the number of pressed buttons at the end of the simulation.
Due to the violation of normality assumptions, the Mann-Whitney test was used to
compare the two conditions (Joystick and Balance Board interfaces) concerning
the performance variables (total distance travelled, time spent in the simulation and
the success rate of the search task). The statistical analysis, performed in IBM®
SPSS® Statistics v19, was conducted at a 5% significance level.
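For illustration, the same comparison can be reproduced with SciPy as sketched below; the example arrays are placeholders, not the study data.

from scipy.stats import mannwhitneyu

# Placeholder values in seconds for one performance variable (time spent)
time_joystick = [310.2, 395.7, 402.4, 288.9, 512.0]
time_balance_board = [590.5, 620.1, 480.3, 705.8, 566.2]

u_stat, p_value = mannwhitneyu(time_joystick, time_balance_board,
                               alternative='two-sided')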
Results show that the time spent (see Fig. 2) by the Joystick users in the simulation
(mdn = 382.25 s) was significantly lower than for the Balance Board users
(mdn = 605.05 s), with U = 82.0, z = -3.192 and p < 0.001.

Fig. 2. Boxplot regarding Time Spent, in seconds, by experimental condition

Regarding the distance travelled (see Fig. 3), Joystick users (mdn = 265.46 m)
travelled significantly less distance than Balance Board users (mdn = 344.32 m), with
U = 95.0, z = -2.840 and p = 0.002.
Fig. 3. Boxplot regarding Distance travelled, in meters, by experimental condition

For the task success (see Fig. 4), i.e., number of pressed buttons, there were no
significant differences between the Joystick users (mdn = 4) and the Balance Board
users (mdn = 3.5), with U = 153.5, z = -1.294 and p = 0.199.

Fig. 4. Boxplot regarding Task success by experimental condition

4 Discussion and Conclusion


To verify that users’ that use the Balance Board as a navigation interface have better
performance in a search task, it was made a comparison between two types of
navigation interfaces (Balance Board and Joystick) according to performance
variables (time spent, distance travelled and task success).
The results show that the hypothesis was not verified because participants were
more efficient concerning some performance variables, namely time spent and
distance, when performing navigation tasks using the Joystick than with the Balance
Board, but the results also show that there were no significant differences in the task
success (number of pressed buttons). The higher times and distances when using the
Balance Board can be connected with a somewhat not-natural rotational movement,
but that did not affected the task success.
Based on informal opinions gathered from the participants at the end of the test,
higher enthusiasm was noticed among those who interacted with the Balance Board;
yet this interface also received the harshest criticism, mainly because in some
situations the participants seemed frustrated with the navigation. The somewhat
unnatural rotational movement might explain the differences found.
The Balance Board results could also have been affected by a shorter adaptation
time to the navigational interface.
Based on the results gathered, we see at least three paths for future work.
The first would be the improvement of the navigational control with the Balance
Board, especially the rotational movement, which is not very natural and was the
aspect most criticized by the participants. Participants stated that although, after a
while in the VE, they did not have to "think" to make the forward or backward
movement, when they had to change direction, especially in small spaces or when
facing a wall, they always had to "think" to make the appropriate movement. This
could be the cause of the higher times and distances needed to perform the requested
tasks.
The second path is that, even with these higher values of time and distance and
some difficulties with the rotational movement, the participants' sense of presence
and immersion might be higher than with the Joystick, and as such should be
investigated.
The third would be to create a less subjective training stage, in which different
criteria, analyzed automatically, could be defined to determine whether the participant
controls the navigational interface well enough before proceeding to the simulation
stage.
The present research provides evidence that interfaces used for games can be a
viable option for VR-based studies when performance variables such as time spent
and distance travelled are not critical for the study, but completion of the search
task is.

References
1. Slater, M., Usoh, M., Steed, A.: Taking steps: the influence of a walking technique on
presence in virtual reality. ACM Trans. Comput.-Hum. Interact. 2(3), 201–219 (1995)
2. Peterson, B., Wells, M., Furness III, T.A., Hunt, E.: The Effects of the Interface on
Navigation in Virtual Environments. In: Proceedings of Human Factors and Ergonomics
Society 1998 Annual Meeting, pp. 1496–1505 (1998)
3. Beckhaus, S., Blom, K.J., Haringer, M.: Intuitive, Hands-free Travel Interfaces for Virtual
Environments. In: New Directions in 3D User Interfaces Workshop of IEEE VR 2005, pp.
57–60. Shaker Verlag, Ithaca (2005)
4. Nintendo Wii Balance Board official website,
http://www.nintendo.com/wii/console/accessories/balanceboard
5. Hilsendeger, A., Brandauer, S., Tolksdorf, J., Fröhlich, C.: Navigation in Virtual Reality
with the Wii Balance Board. In: 6th Workshop on Virtual and Augmented Reality (2009)
6. Ishihara, S.: Test for Colour-Blindness, 38th edn. Kanehara & Co., Ltd., Tokyo (1988)
7. Teixeira, L., Rebelo, F., Filgueiras, E.: Human interaction data acquisition software for
virtual reality: A user-centered design approach. In: Kaber, D.B., Boy, G. (eds.) Advances
in Cognitive Ergonomics, pp. 793–801. CRC Press, Boca Raton (2010)
Use of Neurophysiological Metrics within a Real and
Virtual Perceptual Skills Task to Determine Optimal
Simulation Fidelity Requirements

Jack Vice1, Anna Skinner1, Chris Berka3, Lauren Reinerman-Jones2, Daniel Barber2, Nicholas Pojman3, Veasna Tan3, Marc Sebrechts4, and Corinna Lathan1

1 AnthroTronix, Inc., 8737 Colesville Rd., L203, Silver Spring, MD 20910, USA
2 Institute for Simulation and Training, University of Central Florida, 3100 Technology Parkway, Orlando, FL 32826, USA
3 Advanced Brain Monitoring, Inc., 2237 Faraday Ave., Ste 100, Carlsbad, CA 92008, USA
4 The Catholic University of America, Department of Psychology, 4001 Harewood Rd., NE, Washington, DC 20064, USA
{askinner,jvice,clathan}@atinc.com,
{lreinerm,dbarber}@ist.ucf.edu,
{chris,npojman,vtan}@b-alert.com, sebrechts@cua.edu

Abstract. The military is increasingly looking to virtual environment (VE)
developers and cognitive scientists to provide virtual training platforms to
support optimal training effectiveness within significant time and cost
constraints. However, current methods for determining the most effective levels
of fidelity in these environments are limited. Neurophysiological metrics may
provide a means for objectively assessing the impact of fidelity variations on
training. The current experiment compared neurophysiological and performance
data for a real-world perceptual discrimination task as well as a similarly-
structured VE training task under systematically varied fidelity conditions.
Visual discrimination and classification was required between two militarily-
relevant (M-16 and AK-47 rifle), and one neutral (umbrella) stimuli, viewed
through a real and virtual Night Vision Device. Significant differences were
found for task condition (real world versus virtual, as well as visual stimulus
parameters within each condition), within both the performance and
physiological data.

1 Introduction
The military is increasingly looking to VE developers and cognitive scientists to
provide virtual training platforms to support optimal training effectiveness within
significant time and cost constraints. However, validation of these environments and
scenarios is largely limited to subjective reviews by warfighter subject matter experts
(SMEs), who may not be fully aware of, or able to articulate, the cues they rely on
during situation assessments and decision-making. Warfighters are trained to use a
decision process referred to as the OODA loop: Observe, Orient, Decide, Act. In
order to successfully execute appropriate actions, it is necessary to observe the
environment, using appropriate cues to develop accurate situational awareness and
orient to contextual and circumstantial factors, before a decision can be made and
acted upon. “Intuitive” decision-making relies on this process, but at a pace that is
too rapid to be decomposed and assessed effectively using standard methods of
during- and after-action review. The United States Marine Corps has recently
developed a training program, known as Combat Hunter, which emphasizes
observation skills in order to increase battlefield situational awareness and produce
proactive small-unit leaders that possess a bias for action (Marine Corps Interim
Publication, 2011).
Although extensive theoretical and empirical research has been conducted
examining the transfer of training from VEs to real world tasks (e.g., Lathan, Tracey,
Sebrechts, Clawson, & Higgins 2002; Sebrechts, Lathan, Clawson, Miller, &
Trepagnier, 2003), objective metrics of transfer are limited and there is currently a
lack of understanding of the scientific principles underlying the optimal interaction
requirements a synthetic environment should satisfy to ensure effective training.
Existing methods of transfer assessment are for the most part limited to indirect,
performance-based comparisons and subjective assessments, as well as assessments
of the degree to which aspects of the simulator match the real world task environment.
The method of fidelity maximization assumes that increased fidelity relates to
increased transfer; however, in some cases, lower-fidelity simulators have been
shown to provide effective training as compared to more expensive and complex
high-fidelity simulators, and while the approach of matching core components and
logical structure is promising, methods of determining which aspects of fidelity are
most critical to training transfer for a given task are limited. Performance-based
assessments are typically compared before and after design iterations in which
multiple fidelity improvements have been implemented, making it difficult or
impossible to identify which fidelity improvements correlate to improved training.
Thus, a need exists for more objective and efficient methods of identifying optimal
fidelity and interaction characteristics of virtual simulations for military training.
In 2007, a potential method for determining training simulation component fidelity
requirements was proposed by Vice, Lathan, Lockerd, and Hitt. By
comparing physiological response and behavior between real and VE training stimuli,
Vice et al. hypothesized that such a comparison could potentially inform which types
of fidelity will have the highest impact on transfer of training. Skinner et al. (2010)
expanded on this approach in the context of high-risk military training.
Physiologically-based assessment metrics, such as eye-tracking and
electroencephalogram (EEG) have been shown to provide reliable measures of
cognitive workload (e.g., Berka et al., 2004) and attention allocation (Carroll, Fuchs,
Hale, Dargue, & Buck, 2010), as well as cognitive processing changes due to fidelity
and stimulus variations within virtual training environments (Crosby & Ikehara, 2006;
Skinner, Vice, Lathan, Fidopiastis, Berka, & Sebrechts, 2009; Skinner, Sebrechts,
Fidopiastis, Berka, Vice, & Lathan, 2010; Skinner, Berka, O’Hara-Long, & Sebrechts,
2010).
Previous related research has demonstrated that event-related potentials (ERP’s)
are sensitive to even slight variations in virtual task environment fidelity, even in
cases in which task performance does not significantly differ. A pilot study was
conducted (Skinner et al., 2009) in which variations in the fidelity of the stimuli (high
versus low polygon count) in a visual search/identification task did not result in
performance changes; however, consistent and distinguishable differences were
detected in ERP early and late components. The results of a second study (Skinner, et
al., 2010) demonstrated that ERPs varied across four classes of vehicles and were
sensitive to changes in the fidelity of the vehicles within the simulated task
environment. While performance, measured by accuracy and reaction times,
distinguished between the various stimulus resolution levels and between classes of
vehicles, the ERPs further highlighted interactions between resolution and class of
vehicle, revealing subtle but critical aspects affecting the perceptual discrimination for
the vehicles within the training environment. The objective of the current study was to
collect physiological and performance data for participants completing a real world
perceptual skills task, as well as a similarly-structured VE training task in varied
fidelity conditions, and to compare the data sets in an effort to identify the impact of
the various task conditions on both behavioral and neurophysiological metrics.

2 Method
Within the current study, visual discrimination and classification was required
between 3 stimuli: positive (M-16), negative (AK-47), and neutral (umbrella) viewed
through a real or virtual AN/PVS-14 Night Vision Device (NVD). The stimuli were
partially occluded; only 6 inches of the front portion of the stimuli were visible,
sticking out from a hallway, 20 feet from the seated observer. Stimuli were
perpendicular to the hallway wall, and parallel to the ground. The real world (RW)
conditions used a hallway constructed from foam board; the layout of the hallway is
shown in Figure 1. Within the VE condition, a virtual hallway and virtual target
objects were developed that were matched to RW task conditions and viewed through
a virtual NVD model. This task design also reduced confounding variables such as
field of view (FOV) in both the RW and VE conditions.
Within the RW conditions, the FOV of the observer was restricted by the NVD.
Participants were seated with their dominant eye up to the eyecup of the NVD, which
was mounted on a tripod, and their non-dominant eye was covered by a patch. Within
the VE conditions, subjects were seated 15 inches from a flat screen 19-inch monitor
on which stimuli were displayed, with their dominant eye up to an NVD eyecup and a
plastic tube designed to match the FOV in the RW task environment, which
was mounted on a tripod, and their non-dominant eye was covered by a patch. A
shutter mechanism was used to show or hide the visual stimuli in both conditions, and
was synced to an open source data logging and visualization tool to fuse data from the
physiological sensors and the task environment. Stimulus viewing time was 3
seconds with an interstimulus interval (ISI) of 7 seconds to allow enough time to
swap the stimuli in the RW condition.
Two task conditions were completed by all participants within the RW setting:
ambient light conditions (RW ambient) and with infrared lighting (RW IR). The
order of stimulus presentation was randomized by a computer program. A
photograph of the RW task environment, taken through the NVD, and a screen shot of
the VE are shown in Figures 1 and 2, respectively.
Fig. 1. Real world AK-47 stimulus
Fig. 2. Virtual Environment M-16

Based on previous studies and the specific VE task characteristics, two fidelity
components (resolution and color depth) were identified that were expected to reflect
the greatest impact on performance and neurophysiological response; these were
selected to be systematically varied to assess concomitant physiological changes. All
other fidelity components were kept constant at a standard, default level during the
experiments so as not to impact results. Three fidelity configurations were used in the
VE condition: Low Resolution/High Color Depth (LoHi), High Resolution/Low Color
Depth (HiLo), and High Resolution/High Color Depth (HiHi). The VE task scenarios
were designed as closely to the RW scenes as possible by using pictures taken
(without magnification) from the perspective of a participant in the RW task
condition. Lighting within the VE was designed to match lighting conditions for RW
Ambient lighting condition.
A total of 40 participants were recruited for this experiment. Pilot testing was
conducted with 5 participants, and 35 participated in the formal study.
Approximately half of the participants started in the RW task environment, and half
started in the VE task condition. The order of conditions within RW and VE was
randomized in a block design. A total of 25 trials were presented for each of the 9
unique images (3 stimuli x 3 fidelity conditions) in the VE condition. A total of 25
trials were also presented for each of the 6 unique images (3 stimuli x 2 fidelity
conditions) in the RW condition. The order of stimulus presentation was randomized
for each subject, and the order of conditions was balanced across subjects. Data
collected included accuracy, reaction time (RT), and EEG using a 9-channel EEG cap.

3 Results

3.1 Performance Data

Performance data were assessed in terms of both accuracy (percent correct) and
reaction time. Effects were assessed both across and within task conditions (RW and
VE) for each stimulus type using repeated measures analysis of variance (ANOVA).
Thirty-four participants completed the experiment in total. Based on a screening
criterion to eliminate performers who were not able to perform above chance, those
with accuracy scores below 33% for any condition were removed; thus, 23
participants were included in the performance and physiological analyses. The
Greenhouse-Geisser adjustment in SPSS was used to correct for violations of
sphericity.
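The analyses in the text were run in SPSS; an equivalent sketch in Python is shown below, in which the long-format file and column names are assumptions and the Greenhouse-Geisser correction is not computed by AnovaRM (it would need to be applied separately).

import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv('performance_long_format.csv')  # hypothetical file: one row per
                                                 # subject x stimulus x fidelity cell

# Screening criterion: drop participants below chance (33%) in any condition
worst = df.groupby('subject')['percent_correct'].min()
df = df[df['subject'].isin(worst[worst >= 33.0].index)]

res = AnovaRM(df, depvar='percent_correct', subject='subject',
              within=['stimulus', 'fidelity']).fit()
print(res.anova_table)  # uncorrected F tests; apply a sphericity correction separately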
A 3 x 5 (stimulus x fidelity) repeated measures ANOVA for percent correct
showed a main effect for stimulus (F(1.77,39.03) = 4.25, p = .025), such that
participants correctly identified the AK47 (M = 98.2%) more often than both the
umbrella (M = 96.4%, p = .038) and the M16 (M = 96.1%, p = .003), with no
significant difference between performance on the umbrella and M16. The main
effect for fidelity, as well as the stimulus x fidelity interaction, was not statistically
significant (p > .05).
A 3 x 5 (stimulus x fidelity) repeated measures ANOVA for response time (RT) for
correct responses found a significant main effect for stimulus (F(1.36,29.87) = 37.94,
p < .001), such that RTs were faster for the AK47 (M = 1.289s) than either the
umbrella (M = 1.356s, p = .005) or the M16 (M = 1.602s, p < .001), with the umbrella
also faster than the M16 (p < .001). Thus, no speed accuracy trade-offs are evident;
based on these results the AK-47 appears to have been the easiest stimulus to identify
across all fidelity conditions, followed by the umbrella; the M-16 appears to have
been the most difficult stimulus to identify. A significant fidelity main effect was also
found (F(1.89,41.56) = 16.00, p < .001), such that response time in the RW IR
condition (M = 1.702s) was slower than all other fidelity conditions (RW Ambient: M
= 1.467s, VE LoHi: M = 1.313s, VE HiLo: M = 1.318s, VE HiHi: M = 1.278s; p <
.001 in all cases), and RTs were faster in the VE HiHi fidelity than the RW Ambient
condition (p < .001). No significant differences were found in the post-hoc
comparison of any other fidelity conditions. Finally, the interaction between stimulus
and fidelity was found to be significant (F(4.44,97.69) = 2.86, p = .023). The effect
is driven by the fact that participants responded slower to the umbrella in the RW
Ambient condition (M = 1.420s) than in the VE HiHi condition (M = 1.178s, p <
.001). Reaction time for the M16 was also slower in the RW Ambient fidelity
(M = 1.690s) than the VE LoHi (M = 1.471s, p = .016), VE HiLo (M = 1.492s, p =
.011) and VE HiHi (M = 1.453s, p < .001). Reaction time for the AK47 exhibited no
significant difference between the RW IR fidelity and any of the VE fidelities (p > .05
in each case).

Fig. 3. Mean RTs for correct trials for each stimulus by fidelity condition
3.2 Neurophysiological Data

Single trial ERP waveforms that included artifacts such as eyeblinks or excessive
muscle activity were removed on a trial-by-trial basis using the B-Alert automated
software. Additionally, trials with data points exceeding plus or minus 70 µV were
filtered and removed before averages were combined for the grand mean analysis
across all 23 participants. The ERP waveforms were time locked to the presentation
of the testbed stimuli and ERPs were plotted for the two seconds post-stimulus
presentation leading up to the response. Figure 4 highlights the ERP components of
interest for a set of sample ERP waveforms.

Fig. 4. ERP waveform after stimulus presentation over a 2 second window

Based on previous research findings, indicating relevance to the current task, the
following ERP waveform components were examined: N1, P2, and the late positivity
(500-1200ms). Analysis of these components examined the effects of fidelity
condition and stimulus type at various electrode sites. Initial analyses have focused
on the three midline electrode sites (Fz, Cz, PO), providing indications of the impact
of fidelity and stimulus variations at the frontal, central, and parietal/occipital regions
of the brain. Figure 5 displays the grand mean ERP waveforms for each of the
fidelity conditions by stimulus type at the three midline sites.
The various conditions (VE and RW) are clearly differentiated across the
waveforms by stimulus type and electrode site. The RW conditions display noticeably
lower amplitude positive waveform components (P2 and late positivity) than the VE
conditions across all sites and stimuli, as well as less pronounced negative (N1)
components for all stimuli at the Fz and Cz sites.
These waveforms were further examined for statistically significant effects of
fidelity condition and stimulus type within the N1, P2, and Late Positivity
components for the following comparisons: VE HiHi compared to both RW
conditions (Ambient and IR), as well as VE LoHi and VE HiLo compared to the RW
Ambient Condition. The comparison of VE HiHi to the RW conditions was
conducted to examine the relationship of the maximal fidelity condition to the RW
transfer task conditions. The comparison of VE LoHi and VE HiLo to RW Ambient
sought to identify which fidelity trade-off resulted in physiological responses that
mapped more closely to the RW task under standard (ambient lighting) conditions.

Fig. 5. The three VE and two RW conditions for each stimulus at the Fz, Cz, and PO sites

N1 Amplitude. The window used for the analysis of the N1 peak amplitude ranged
from 40ms to 175ms after the initial onset of the stimulus. The N1 was assessed for
maximized fidelity (VE HiHi) compared to both RW conditions (Ambient and IR),
revealing a main effect for EEG channel (Figure 6), as well as a significant interaction
effect for fidelity by channel (Figure 7) in which the HiHi condition elicited a
significantly larger N1 component at the Cz electrode site.

Table 1. VE HiHi, RW Ambient, and RW IR N1 Statistical Analysis

Source DF F p
Channel 2 23.67 <.0001
Fidelity*Channel 4 6.43 0.0001

Fig. 6. N1 Main effect for Channel
Fig. 7. N1 Fidelity x Channel interaction

The N1 Amplitude was also assessed for the comparison of fidelity trade-offs
(VE LoHi and VE HiLo) to the RW Ambient condition. As shown in Table 2, a
significant main effect was found for fidelity and channel, and significant interactions
were found for fidelity by stimulus and for fidelity, channel, and stimulus. The
significant interactions are shown in Figures 8, 9, and 10.

Table 2. VE LoHi, VE HiLo, and RW Ambient N1 Statistical Analysis

Source DF F p
Fidelity 2 5.24 0.0091
Channel 2 19.95 <.0001
Fidelity*Channel 4 3.47 0.0111
Fidelity*Stimulus 4 4.08 0.0044
Fidelity*Channel*Stimulus 8 2.94 0.0041

Fig. 8. N1 Fidelity x Channel
Fig. 9. N1 Fidelity x Stimulus

Fig. 10. VE LoHi, VE HiLo, and RW Ambient N1 Fidelity x Channel x Stimulus
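The statistical tables in this section (Tables 1 through 6) summarize within-subjects factorial effects of fidelity, channel, and stimulus on the component amplitudes. A minimal sketch of one way to run such a model is given here; the data layout, column names, and choice of statsmodels are assumptions rather than the authors' actual analysis software.

# Illustrative fidelity x channel x stimulus analysis of the N1 amplitude;
# the long-format file and column names are hypothetical.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

n1 = pd.read_csv("n1_amplitudes.csv")
# columns: participant, fidelity (VE_LoHi / VE_HiLo / RW_Ambient),
#          channel (Fz / Cz / PO), stimulus (AK47 / M16 / umbrella), amplitude

aov = AnovaRM(n1, depvar="amplitude", subject="participant",
              within=["fidelity", "channel", "stimulus"]).fit()
print(aov)   # main effects and interactions; compare with the significant sources in Table 2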

P2 Amplitude. The window used for the analysis of the P2 amplitude was between
100ms and 312ms. The P2 was assessed for maximized fidelity (VE HiHi) compared to
both RW conditions (Ambient and IR). As shown in Table 3, this analysis revealed
highly significant main effects for fidelity (Figure 11) and for channel (Figure 12),
with the VE condition eliciting a significantly higher P2 than both RW conditions,
and the Fz electrode site demonstrating lower P2 effects than Cz and PO. This
increased P2 within the VE, compared to the RW conditions, likely reflects the
additional processing required for features within the VE.

Table 3. VE HiHi, RW Ambient, and RW IR P2 Statistical Analysis

Source DF F p
Fidelity 2 32.06 <.0001
Channel 2 11.91 <.0001

Fig. 11. P2 Main effect for Fidelity
Fig. 12. P2 Main effect for Channel

The P2 amplitude was also assessed for the comparison of fidelity trade-offs (VE
LoHi and VE HiLo) to the RW Ambient condition. As shown in Table 4, significant
main effects were found for fidelity and channel (Figure 13), with the Ambient
condition and the Fz electrode site generating the smallest P2 components, and a
significant interaction was found for fidelity by stimulus (Figure 14). The HiLo
condition elicited the highest P2 peak for the AK-47, but a noticeably lower P2 peak
for the umbrella. This may indicate that a salient or critical feature of the AK-47 is
degraded when the color depth is reduced, but that lower color depth may actually
require less feature processing for the umbrella.

Table 4. VE LoHi, VE HiLo, and RW Ambient P2 Statistical Analysis

Source DF F p
Fidelity 2 16.82 <.0001
Channel 2 12.12 <.0001
Fidelity*Stimulus 4 3.15 0.0181

Fig. 13. P2 Main effect for Channel
Fig. 14. P2 Fidelity x Stimulus Interaction

Late Positivity. The Late Positive Component, ranging from 500ms to 1200ms after the
presentation of the stimuli, was assessed for maximized fidelity (VE HiHi) compared
to both RW conditions (Ambient and IR). This analysis revealed main effects for
fidelity condition and for EEG channel (Figure 15), as well as a significant interaction
for fidelity by stimulus. The significant interaction is shown in Figure 16, in which
the RW Ambient condition displays a positive late component for the M-16 and AK-
47, but a negative late component for the umbrella.

Table 5. VE HiHi, RW Ambient, and RW IR Late Positivity Statistical Analysis

Source DF F p
Fidelity 2 3.81 0.0297
Channel 2 38.4 <.0001
Fidelity*Stimulus 4 2.86 0.0281

Fig. 15. Main effect for Channel
Fig. 16. Late Positivity Fidelity x Stimulus

The Late Positive Component was also assessed for the comparison of fidelity
trade-offs (VE LoHi and VE HiLo) to the RW Ambient condition. As shown in Table
6, a significant main effect was found for EEG channel (Figure 17), and significant
interaction effects were found for fidelity by stimulus, as well as fidelity, channel, and
stimulus. The significant interactions are shown in Figures 18 and 19.

Table 6. VE LoHi, VE HiLo, and RW Ambient Late Positivity Statistical Analysis

Source DF F p
Channel 2 31.84 <.0001
Fidelity*Stimulus 4 3.22 0.02
Fidelity*Channel*Stimulus 8 3.06 0.003

Fig. 17. Main effect for Channel
Fig. 18. Late Positivity Fidelity x Stimulus

Fig. 19. Late Positivity Fidelity x Channel x Stimulus

4 Discussion
The goal of this study was to identify the VE fidelity configurations that provided a
perceptual experience that most closely mimicked the RW task and to relate the
neurophysiological data results to the performance results in an effort to better
understand the relationship between task performance and neurophysiological
response within a perceptual skills task. We expected to observe degraded
performance and distinctive differentiation between physiological signatures in
association with degraded fidelity.
Within the performance data, both accuracy and response times indicated a main
effect for stimulus type in which the AK-47 was the easiest stimulus to identify,
followed by the umbrella, with the M16 being the most challenging stimulus to
identify. An effect for fidelity condition was also found, indicating that RTs for the
RW IR condition were significantly slower than all other fidelity conditions, followed
by RW Ambient, VE LoHi, and VE HiLo, with the VE HiHi condition demonstrating
the fastest RTs. The faster reaction times within the VE conditions could be
attributed to the fact that simulated stimuli contain fewer visual details and features to
be processed. Faster reaction times within the highest fidelity condition are likely due
to increased distinguishability between salient features.
Within the neurophysiological data, significant effects were found for stimulus
type and fidelity condition, as well as EEG electrode channel/site along the midline of
the brain for three components of the ERP waveform: N1, P2, and the Late Positivity.
The RW ERPs were distinct from the VE ERPs, with the VE conditions eliciting
higher amplitude ERP waveform components consistent with increased processing of
pop-out visual features and object recognition. Thus, although the performance data
suggest that the VE conditions were easier, higher levels of processing were occurring
in the brain within those conditions.
Comparisons of the maximal VE condition (HiHi) to both the RW Ambient and
RW IR conditions were conducted in order to further explore differentiation between
VE and RW neurophysiological response. Of particular interest was the finding that
for the Late Positivity, the ERP waveforms were closely matched for the two
weapons and were distinct from the umbrella waveforms, despite the fact that the
performance data demonstrated more similarity in accuracy and reaction times for the
AK-47 and the umbrella than for the AK-47 and the M-16. Thus, the neurophysiological
processing of the weapons may be more closely matched, despite larger differences in
response times.
Additionally, the VE trade-off conditions (LoHi and HiLo) were compared to the
RW Ambient condition in order to identify the optimal fidelity trade-off in the event
that the maximized (HiHi) condition could not be implemented due to development
limitations. Significant interactions revealed that the optimal fidelity trade-off
condition varied based on the stimulus. For example, the HiLo condition elicited the
highest P2 peak for the AK-47, but a noticeably lower P2 peak for the umbrella. This
may indicate that a salient or critical feature of the AK-47 is degraded when the color
depth is reduced, but that lower color depth may actually require less feature
processing for the umbrella.
The distinctive ERP signatures offer a method to characterize objects within
military training scenarios that require higher resolution for effective training, as
well as those that could be easily recognized at lower resolutions, thus saving
developers time and money by highlighting the most efficient requirements to achieve
training efficacy. ERPs can be measured unobtrusively during training, allowing
developers to access a metric that could be used to guide scenario development
without requiring repeated transfer of training assessments and without relying solely
on performance or subjective responses. This novel approach could potentially be
used to determine which aspects of VE fidelity will have the highest impact on
transfer of training with the lowest development costs for a variety of simulated task
environments.
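As a purely illustrative example of how such ERP signatures might be turned into a development heuristic, the sketch below scores each object by the dissimilarity between its VE and RW grand-mean waveforms; the metric and data structures are not taken from the paper and represent only one plausible realization.

# Hypothetical heuristic: rank objects by how far a VE condition's grand-mean ERP
# departs from the RW reference, flagging those that may need higher fidelity.
import numpy as np

def erp_mismatch(ve_waveform, rw_waveform):
    """Root-mean-square difference between two equal-length grand-mean ERPs (in uV)."""
    ve, rw = np.asarray(ve_waveform, dtype=float), np.asarray(rw_waveform, dtype=float)
    return float(np.sqrt(np.mean((ve - rw) ** 2)))

def rank_fidelity_needs(ve_erps, rw_erps):
    """ve_erps / rw_erps map object name -> 1-D grand-mean waveform (same length)."""
    scores = {obj: erp_mismatch(ve_erps[obj], rw_erps[obj]) for obj in ve_erps}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Objects at the top of the ranking (largest VE-RW mismatch) would be candidates for
# higher-resolution assets; those at the bottom could be rendered more cheaply.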
These findings will be leveraged under an ongoing research effort to assess the
impact of fidelity variations on performance and neurophysiological response within a
VE-based perceptual skills training task to further examine the technical feasibility of
utilizing neurophysiological measures to assess fidelity design requirements in order
to maximize cost-benefit tradeoffs and transfer of training.

Author Index

Abate, Andrea F. I-3 Cheng, Huangchong II-20


Aghabeigi, Bardia II-279 Choi, Ji Hye I-97
Ahn, Sang Chul I-61 Choi, Jongmyung I-69
Akahane, Katsuhito II-197 Christomanos, Chistodoulos II-54
Albert, Dietrich I-315 Conomikes, John I-40
Aliverti, Marcello II-299
Almeida, Ana I-154 da Costa, Rosa Maria E. Moreira II-217
Amemiya, Tomohiro I-225, II-151, Dal Maso, Giovanni II-397
II-407 Dang, Nguyen-Thong I-144
Amend, Bernd I-270 de Abreu, Priscilla F. II-217
Ando, Makoto II-206 de Carvalho, Luis Alfredo V. II-217
Andrews, Anya II-3 de los Reyes, Christian I-40
Aoyama, Shuhei I-45 Derby, Paul II-100
Ariza-Zambrano, Camilo II-30 Di Loreto, Ines II-11
Dohi, Hiroshi II-227
Baier, Andreas I-135 Domik, Gitta II-44
Barber, Daniel I-387 Doyama, Yusuke II-158
Barbuceanu, Florin Grigorie I-164 Duarte, Emı́lia I-154, I-380
Barrera, Salvador I-40 Duguleana, Mihai I-164
Beane, John II-100 Duval, Sébastien II-377
Behr, Johannes II-343
Berka, Chris I-387 Ebisawa, Seichiro II-151
Bishko, Leslie II-279 Ebuchi, Eikan II-158
Bockholt, Ulrich I-123 Ende, Martin I-135
Bohnsack, James II-37 Enomoto, Seigo I-174, I-204
Bolas, Mark II-243 Erbiceanu, Elena II-289
Bonner, Matthew II-333 Erfani, Mona II-279
Bordegoni, Monica II-299, II-318
Bouchard, Durell I-345 Fan, Xiumin II-20
Bowers, Clint II-37, II-237 Ferrise, Francesco II-318
Brogni, Andrea I-194, I-214, I-234 Flynn, Sheryl II-119
Brooks, Nathan II-415 Foslien, Wendy II-100
Frees, Scott I-185
Caldwell, Darwin G. I-194, I-214,
I-234, II-327 Garcia-Hernandez, Nadia II-327
Campos, Pedro I-12 Gardo, Krzysztof II-141
Cantu, Juan Antonio I-40 Gaudina, Marco I-194
Caponio, Andrea I-20, I-87 Giera, Ronny I-270
Caruso, Giandomenico II-299 Gomez, Lucy Beatriz I-40
Cervantes-Gloria, Yocelin II-80 González Mendı́vil, Eduardo I-20, I-87,
Chang, Chien-Yen II-119, II-243 II-80
Charissis, Vassilis II-54 Gouaich, Abdelkader II-11
Charoenseang, Siam I-30, II-309 Graf, Holger II-343
Chen, Shu-ya II-119 Grant, Stephen II-54

Ha, Taejin II-377 Kasada, Kazuhiro I-76


Hammer, Philip I-270 Kawai, Hedeki I-40
Han, Jonghyun II-352 Kawamoto, Shin-ichi II-177
Han, Tack-don I-97, I-105 Kayahara, Takuro II-253
Hasegawa, Akira I-297, I-306, Keil, Jens I-123
I-354, I-363 Kelly, Dianne II-54
Hasegawa, Satoshi I-297, I-306, Kennedy, Bonnie II-119
I-354, I-363 Kickmeier-Rust, Michael D. I-315
Hash, Chelsea II-279 Kim, Dongho II-425
Hayashi, Oribe I-76 Kim, Gerard J. I-243
He, Qichang II-20 Kim, Hyoung-Gon I-61
Hergenröther, Elke I-270 Kim, Jae-Beom I-55
Hernández, Juan Camilo II-30 Kim, Jin Guk II-370
Hill, Alex II-333 Kim, Kiyoung II-352
Hillemann, Eva I-315 Kim, Sehwan I-69, II-377
Hincapié, Mauricio I-20, I-87 Kiyokawa, Kiyoshi I-113
Hirose, Michitaka I-76, I-250, I-260, Klomann, Marcel II-362
I-280, II-158, II-206 Kolling, Andreas II-415
Hirota, Koichi I-225, II-151, II-407 Kondo, Kazuaki I-204
Hiyama, Atsushi II-158 Kubo, Hiroyuki II-260
Holtmann, Martin II-44 Kuijper, Arjan II-343
Hori, Hiroki I-306, I-354, I-363 Kunieda, Kazuo I-40
Huck, Wilfried II-44
Hughes, Charles E. II-270, II-289 Laffont, Isabelle II-11
Hwang, Jae-In I-61 Lakhmani, Shan II-237
Lancellotti, David I-185
Ikeda, Yusuke I-174, I-204 Lange, Belinda II-119, II-243
Ikei, Yasushi I-225, II-151, II-407 Lathan, Corinna I-387
Ingalls, Todd II-129 Lee, Hasup II-253
Ingraham, Kenneth E. II-110 Lee, Jong Weon II-370
Inose, Kenji II-206 Lee, Seong-Oh I-61
Isbister, Katherine II-279 Lee, Seunghun I-69
Ise, Shiro I-174, I-204 Lee, Youngho I-69
Ishii, Hirotake I-45 Lewis, Michael II-415
Ishio, Hiromu I-297, I-306, I-354, I-363 Li, Lei II-119
Ishizuka, Mitsuru II-227 Lukasik, Ewa II-141
Isshiki, Masaharu II-197
Izumi, Masanori I-45 Ma, Yanjun II-20
MacIntyre, Blair II-333
Jang, Bong-gyu I-243 Maejima, Akinobu II-260
Jang, Say I-69 Makihara, Yasushi I-325
Jang, Youngkyoon II-167 Mapes, Dan II-110
Jung, Younbo II-119 Mapes, Daniel P. II-270
Jung, Yvonne II-343 Mashita, Tomohiro I-113, I-335
Matsunuma, Shohei I-363
Kajinami, Takashi I-250, I-260, II-206 Matsuura, Yasuyuki I-306, I-371
Kamiya, Yuki I-40 McLaughlin, Margaret II-119
Kanda, Tetsuya I-306, I-354, I-363 Mendez-Villarreal, Juan Manuel I-40
Kang, Changgu II-352 Mercado, Emilio I-87
Kang, Kyung-Kyu II-425 Mestre, Daniel I-144

Milde, Jan-Torsten II-362 Rizzo, Albert II-119, II-243


Milella, Ferdinando II-397 Rovere, Diego II-397
Miyao, Masaru I-297, I-306,
I-354, I-363, I-371 Sacco, Marco II-397
Miyashita, Mariko II-158 Sakellariou, Sophia II-54
Mogan, Gheorghe I-164 Sanchez, Alicia II-73
Morie, Jacquelyn II-279 Sanders, Scott II-119
Morishima, Shigeo I-325, II-177, San Martin, Jose II-64
II-187, II-260 Santos, Pedro I-270
Moshell, J. Michael II-110, II-289 Sarakoglou, Ioannis II-327
Mukaigawa, Yasuhiro I-204, I-335 Schmedt, Hendrik I-270
Sebrechts, Marc I-387
Nakamura, Satoshi I-174, I-204, Seif El-Nasr, Magy II-279
II-177, II-187 Seki, Masazumi II-158
Nam, Yujung II-119 Seo, Jonghoon I-97, I-105
Nambu, Aiko I-280 Shim, Jinwook I-97, I-105
Nappi, Michele I-3 Shime, Takeo I-40
Narumi, Takuji I-76, I-250, I-260, Shimoda, Hiroshi I-45
I-280, II-206 Shinoda, Kenichi II-253
Newman, Brad II-243 Shiomi, Tomoki I-306, I-354, I-363
Nguyen, Van Vinh II-370 Skinner, Anna I-387
Nishimura, Kunihiro I-280 Slater, Mel I-234
Nishioka, Teiichi II-253 Smith, Peter II-73
Nishizaka, Shinya I-260 Sobrero, Davide I-214
Norris, Anne E. II-110 Stork, André I-270
Nunnally, Steven I-345 Suárez-Warden, Fernando II-80
Sugiyama, Asei I-354
Ogi, Tetsuro II-253
Suksen, Nemin II-309
Oh, Yoosoo II-377
Suma, Evan A. II-243
Okumura, Mayu I-325
Sung, Dylan II-90
Omori, Masako I-297, I-306, I-354, I-363
Sycara, Katia II-415
Ono, Yoshihito I-45
Ontañón, Santiago II-289
Takada, Hiroki I-297, I-306, I-354,
Pacheco, Zachary I-40 I-363, I-371
Panjan, Sarut I-30 Takada, Masumi I-371
Park, Changhoon I-55 Takemura, Haruo I-113
Park, James I-97 Tan, Veasna I-387
Pedrazzoli, Paolo II-397 Tanaka, Hiromi T. II-197
Pérez-Gutiérrez, Byron II-30 Tanikawa, Tomohiro I-76, I-250,
Pessanha, Sofia I-12 I-260, I-280, II-206
Phan, Thai II-243 Tateyama, Yoshisuke II-253
Pojman, Nicholas I-387 Teixeira, Luı́s I-380
Polistina, Samuele II-299 Teles, Júlia I-154, I-380
Procci, Katelyn II-37 Terkaj, Walter II-397
Tharanathan, Anand II-100
Radkowski, Rafael II-44, II-387 Thiruvengada, Hari II-100
Rebelo, Francisco I-154, I-380 Tonner, Peter II-270
Reinerman-Jones, Lauren I-387 Touyama, Hideaki I-290
Ricciardi, Stefano I-3 Tsagarakis, Nikos II-327
Rios, Horacio I-87 Turner, Janice II-54

Umakatsu, Atsushi I-113 Wirth, Jeff II-110


Urano, Masahiro II-407 Wittmann, David I-135
Woo, Woontack II-167, II-352, II-377
Valente, Massimiliano I-214
Van Dokkum, Liesjet II-11 Yagi, Yasushi I-204, I-325, I-335, II-187
Ventrella, Jeffery II-279 Yamada, Keiji I-40
Vice, Jack I-387 Yamazaki, Mitsuhiko I-76
Vilar, Elisângela I-154 Yan, Weida I-45
Vuong, Catherine II-129 Yang, Hyun-Rok II-425
Yang, Hyunseok I-243
Wakita, Wataru II-197 Yasuhara, Hiroyuki I-113
Wang, Huadong II-415 Yeh, Shih-Ching II-119
Watanabe, Takafumi II-206 Yoon, Hyoseok II-377
Webel, Sabine I-123 Yotsukura, Tatsuo II-177
Weidemann, Florian II-387 Yu, Wenhui II-129
Werneck, Vera Maria B. II-217
Whitford, Maureen II-119 Zhang, Xi II-20
Winstein, Carolee II-119 Zhu, Jichen II-289
