Augmented Reality Using Wii Mote
A project report submitted in partial fulfillment of the requirements for the degree of
Bachelor of Engineering
by
Adil Khan, Huzefa Saifee and Roshan Shetty

Guide:
Prof. Rajan S. Deshmukh

University of Mumbai
2014-2015
CERTIFICATE

This is to certify that the project report entitled "Augmented Reality Using Wii Mote" is a bonafide work of Adil Khan (49), Huzefa Saifee and Roshan Shetty, submitted to the University of Mumbai in partial fulfillment of the requirement for the award of the degree of Bachelor of Computer Engineering.

Guide                    Head of Department                    Principal

Project Report Approval

This project report entitled "Augmented Reality Using Wii Mote" by Adil Khan, Huzefa Saifee and Roshan Shetty is approved for the degree of Bachelor of Computer Engineering.

Examiners
1. ---------------------------------------------
2. ---------------------------------------------

Guide
---------------------------------------------
Declaration
I declare that this written submission represents my ideas in my own words and
where others' ideas or words have been included, I have adequately cited and
referenced the original sources. I also declare that I have adhered to all
principles of academic honesty and integrity and have not misrepresented or
fabricated or falsified any idea/data/fact/source in my submission. I understand
that any violation of the above will be cause for disciplinary action by the
Institute and can also evoke penal action from the sources which have thus not
been properly cited or from whom proper permission has not been taken when
needed.
-----------------------------------------
Adil Khan

-----------------------------------------
Huzefa Saifee

-----------------------------------------
Roshan Shetty
ABSTRACT
The goal of this project is to create a customizable three-dimensional virtual reality display on a system available to any non-technical user. This system uses the infrared camera component of a standard Nintendo Wiimote to track a user's head motion left, right, forward, backward, up and down. The virtual display is a customizable image projected onto a screen or simply shown on a computer or TV monitor. To make the two-dimensional image appear three-dimensional, the image on the display continually changes according to the position of the user's head. As the user moves their head to the left and right, portions of the displayed image are shifted to the right and left respectively, by amounts that depend on their depth location in the image. Likewise, as the user moves their head closer to or further from the display, portions of the image are enlarged or reduced, again depending on their depth location. In this way the image is continually redrawn to match the user's perspective, or vantage point. The user can select a number of images of their choosing and place them in the virtual display, assigning each a position in space based on x, y, z coordinates and a relative scale. The system creates a continually augmented display according to the motion and location of the user's head, producing a three-dimensional presentation on a two-dimensional screen.
Index

Sr. No   Title
1.       Introduction
2.       Review of Literature
         2.1. Paper 1
         2.2. Paper 2
3.       Proposed Methodology
         3.1. Section
         3.1.1. Subsection
4.       Conclusion
5.       References
         Appendix
         Acknowledgement
List of Figures

Sr. No   Title
1.1.
1.2.     The setup used to perform head tracking with the Wiimote and IR LEDs.
1.3.     The change in the visible field when the user moves to a different angle relative to (a) a window and (b) a monitor.
1.4.     The change in the visible field when the user moves closer to (a) a window and (b) a monitor.
3.1.
3.2.
List of Tables

Sr. No   Title
1.1.     Table 1
Chapter 1
Introduction
The goal of this project is to take a three-dimensional virtual reality space and make it customizable by the user. This virtual reality space consists of a two-dimensional image that is continually redrawn to rearrange the size and location of individual images depending on the user's changing perspective of the scene. The user can take a background image and insert a variety of different images at different locations in the foreground to create their own virtual display. Each image placed in the foreground has its own x, y, z coordinates in the three-dimensional space and a specific size relative to all other objects contained in the space. Every aspect of an image's location, and even the images themselves, is determined by the user in a closed-loop system, so that values may be changed without the need to recompile and restart the system. The three-dimensional character of the space is conveyed through head tracking with a Wiimote. The user wears a specially designed hat or set of glasses with infrared LEDs embedded on the front, facing the screen. A Wiimote is placed under or above the screen facing the user, so that the Wiimote's infrared camera can determine the location and movements of the LEDs. As the user moves their head and the LEDs change location, the information is sent to the computer via a Bluetooth connection. The computer then parses the data from the Wiimote to determine the changing orientation of the user's head. The system uses the head-tracking information to perform an appropriate matrix transformation on the three-dimensional virtual space. As the user moves their head to the left, the images are moved to the right by different amounts based on their specified depth in the space. If the user moves closer to or farther from the screen, the images are scaled larger or smaller, again by amounts determined by their depth in the space, to simulate a three-dimensional environment. A communication block diagram for the overall system is shown below in Figure 1.
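The depth-dependent shifting described above can be sketched in a few lines. The linear attenuation model and all names here are illustrative assumptions, not the project's actual code:

```python
# Sketch of depth-dependent parallax: objects farther from the viewer
# (larger z) shift less on screen than near objects when the head moves.

def parallax_offset(head_x, depth_z, scale=100.0):
    """Horizontal screen shift for an object at depth_z when the
    viewer's head is displaced head_x from the screen centre."""
    # The shift opposes the head motion and is attenuated by depth.
    return -head_x * scale / (scale + depth_z)

# A near object moves more than a far one for the same head motion.
near = parallax_offset(head_x=10.0, depth_z=0.0)    # -10.0
far = parallax_offset(head_x=10.0, depth_z=900.0)   # -1.0
```

Scaling with distance from the screen would follow the same pattern, with the factor applied to object size instead of horizontal position.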
In this report we discuss the background of the project and why it was deemed worth investing time in, the specific requirements of the end product, our design, construction and testing of the system, and finally what we gained from working on this project and our recommendations for future development of the system.
Figure 1.2: The setup used to perform head tracking with the Wiimote and IR LEDs.
In line with the growth of the game industry, the demand for realism in modern computer games increases as well. Realism is often raised through improved graphics, artificial intelligence, sound effects and the like. Player interaction devices, however, are mostly limited to the mouse, keyboard and conventional game controllers. The possibilities have improved somewhat with the introduction of the Wii, Nintendo's newest game console, introduced in 2006. Now players can interact more directly and naturally with games, achieving higher realism. This is done through the Wiimote controller which, among other things, features motion sensing. However, critical areas are still left behind. There is, for example, poor access to equipment allowing the player to change the view perspective by moving her head. This can indeed already be achieved using available virtual reality (VR) equipment, but for most users it is not within an acceptable price range.
It turns out that equipment allowing this kind of interaction does not necessarily have to be expensive. It can be achieved using an infrared (IR) camera and some light-emitting diodes (LEDs). By placing the camera by the monitor and the LEDs by the head, the placement of the LEDs can be found, thereby allowing us to determine the user's movement and change the field of view accordingly. This yields the illusion that the monitor is a physical window into another room instead of just a static photo. This can dramatically increase the realism of 3D games in particular.
Johnny Lee [14] from Carnegie Mellon University has realized the task using the IR camera on the Wiimote, with a simple setup illustrated in Figure 1. Normally, registration of images is a difficult job and forms an entire research field, but by taking advantage of the built-in IR camera and image processor of the Wiimote, it becomes quite simple to detect and track numerous positions. Furthermore, the Bluetooth support and low price of the controller make it easily accessible.
Based on these arguments, we feel that the idea presented by Johnny Lee calls for further investigation. In this paper we will therefore try to mimic his work and determine the field of view in a 3D game-like world from the available data. Through a user study we furthermore wish to evaluate whether our solution is suited for interaction in a 3D world.
To keep the focus of the project on the subject at hand, we assume that the reader is familiar with the process of setting up a camera in a 3D scene. Thus terms like the up vector and projection matrix will not be explained in this paper. A thorough introduction to the subject can be found in [4].
Included with the paper is a CD-ROM containing a digital copy of the paper, the source code, application program interface (API) documentation and a video illustrating our solution in action.
Wiimote
In November 2006, Nintendo released its fifth home videogame console, the Nintendo Wii. The company's previous game console, the GameCube, hadn't fared well in terms of market share against the much higher-powered alternatives released by its competitors, Microsoft and Sony. At first the Wii also seemed significantly underpowered relative to its competitors. However, one year later it became the market leader of its console generation, selling over 20 million units worldwide. This success is largely attributable to the innovative interactive technology and game-play capabilities introduced by the console's game controller, the Wii remote, shown in Figure 1. The Nintendo Wii remote, or Wiimote, is a handheld device resembling a television remote, but in addition to buttons, it contains a 3-axis accelerometer, a high-resolution high-speed IR camera, a speaker, a vibration motor, and wireless Bluetooth connectivity. This technology makes the Wii remote one of the most sophisticated PC-compatible input devices available today; together with the game console's market success, it is also one of the most common. At a suggested retail price of US$40, the Wii remote is an impressively cost-effective and capable platform for exploring interaction research. Software applications developed for it have the additional advantage of being readily usable by millions of individuals around the world who already own the hardware. I've recently begun using Internet video tutorials to demonstrate interaction techniques supported or enabled by the Wii remote. In just a few weeks, these tutorials have received over six million unique views and generated over 700,000 software downloads. In this article, I will talk about the Wii remote's technology, cover what's involved in developing custom applications, describe intended and unintended interaction techniques, and outline additional uses of the device.
The Wiimote is the controller for the Wii. In contrast to most other controllers for game consoles, its input methods are not just buttons and analogue sticks, but also an IR camera and motion sensing. In this section we describe these features. The technical specification is based on .
The appearance of the Wiimote is designed much like a TV remote, as seen in the figure below. It has 12 buttons, of which 11 are on the top and one is located underneath. The Wiimote has 16 kilobytes of EEPROM, of which one part is freely accessible and another is reserved. The Wiimote also houses a speaker, and the top has holes through which the sound can come out. Four blue LEDs are on the top. When the Wiimote connects to a Wii, the LEDs first indicate the battery level and afterwards which number the Wiimote is connected as. A small rotating motor placed in the Wiimote can make it vibrate. A plug is located in the end of the Wiimote, through which attachments can be connected. A couple of peripherals have been released.
However, the most used is the Nunchuck, a device with two buttons, an analogue stick and motion sensing like the Wiimote itself. The Wiimote can detect motion in all 3 dimensions. This is achieved through accelerometers inside. The front of the Wiimote has a small black area like a normal TV remote. The difference is that a TV remote has an IR LED behind it, whereas the Wiimote has an IR camera. By placing a so-called sensor bar (essentially IR LEDs powered by the Wii) below the TV, the Wiimote can be used as a pointing device for the Wii. As can be seen in Figure 3, the LEDs in the sensor bar are located with space between them. This makes it possible to calculate the relative position of the Wiimote with regard to the sensor bar. When using the Wiimote as a pointing device for the Wii, one doesn't actually point at things on the TV, but points relative to the TV. To minimize the amount of data transferred from the Wiimote to the Wii, the Wiimote does not actually transfer every image it takes. Instead it calculates the position of each point and sends the x- and y-coordinate of each of them together. It can also send other information, such as a rough size of the points. Note that the LEDs seen in Figure 3 are detected as two separate points and not 10, because they are located so close to each other. The Wiimote can register up to four separate points at a time.
The Wiimote does not just take an image, analyze it and send the resulting positions to the Wii; it also keeps track of the points. Every time a point is detected it is assigned a number from one to four. If, for instance, two points are detected and the one marked as 1 moves out of range, the other point is still marked as 2. If the first point then moves within range again it is marked as 1, but the Wiimote of course cannot tell whether it is the same LED as before or a new one.
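The slot behaviour described above can be modelled roughly as follows. This is an illustrative simulation of the numbering rules, not the Wiimote's firmware:

```python
# Model of the four tracking slots: a visible point keeps its slot;
# a newly seen point takes the lowest free slot, so a re-appearing LED
# may be renumbered. LED ids here exist only in this simulation.

def assign_slots(slots, visible_ids):
    """slots: list of 4 entries, each an LED id or None."""
    # Drop points that went out of range.
    slots = [s if s in visible_ids else None for s in slots]
    # Give any unassigned visible point the lowest free slot.
    for led in visible_ids:
        if led not in slots:
            slots[slots.index(None)] = led
    return slots

slots = assign_slots([None] * 4, ["A", "B"])  # ['A', 'B', None, None]
slots = assign_slots(slots, ["B"])            # [None, 'B', None, None]
slots = assign_slots(slots, ["A", "B"])       # 'A' re-enters in slot 1
```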
buffer of size 1. This may result in a few lost packages at times, but it is not critical in our application, since all packages contain the absolute position of the IR points, as discussed in section 5.2.2. Another thing that should be noted is that the status of the rumble engine is sent as the least significant bit in every request report. It is therefore important to make sure that this bit is 0 at all times, since we never use the rumble feature.
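Keeping the rumble bit cleared can be done with a simple mask. The two-byte [report id, flags] framing used here is an assumption for illustration:

```python
# The rumble state rides in the least significant bit of the byte
# following the report id; forcing it to 0 avoids accidentally
# switching the motor on when sending other requests.

def clear_rumble_bit(report: bytes) -> bytes:
    """Force the rumble flag (LSB of the first payload byte) to 0."""
    payload = bytearray(report)
    payload[1] &= 0xFE  # byte 0 = report id, byte 1 = flags (assumed)
    return bytes(payload)

cleaned = clear_rumble_bit(b"\x11\x01")  # -> b'\x11\x00'
```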
Interpretation of Data
When the position of the points is known, the user's position can be calculated. The distance between the two LEDs is inversely proportional to the distance between the user and the screen. However, instead of just using the distance directly, Lee takes the size of the monitor into consideration. This is done by allowing the user to specify the size of her monitor. If the information is not given, a default value is used.
As can be seen in Figure 6, the distance from the user to the screen is inversely proportional to the tangent of half the viewing angle measured between the LEDs. To calculate this distance, the distance between the two LEDs is scaled with the size of the monitor and used to derive the relative distance between the user and the monitor. This distance is then used to calculate the position of the user relative to the monitor: by applying the sine function to the angle that corresponds to the distance between the LEDs on the x-axis, the x-coordinate in camera space is obtained. This method assumes that the user is centred on the x-axis when directly in front of the Wiimote. The y-coordinate in camera space is calculated much the same way, but this coordinate is not assumed to be centred directly in front of the Wiimote. To take this into consideration an offset is used. This offset can also be changed by the user to fit the setup used. In the above calculations we assume the position is centred around the origin. Since the Wiimote returns values ranging from 0 to 1023 and 0 to 767 for x and y respectively, we must subtract 512 from x and 384 from y.
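The geometry above can be sketched as follows. The camera's roughly 45-degree horizontal field of view and the LED spacing are assumed constants, since the report does not state exact values:

```python
import math

# Hedged sketch of head-position recovery from two IR points.
CAM_W, CAM_H = 1024, 768
RAD_PER_PX = math.radians(45.0) / CAM_W   # assumed ~45 deg horizontal FOV
LED_SPACING_MM = 150.0                    # assumed gap between the LEDs

def head_position(p1, p2, y_offset=0.0):
    """p1, p2: (x, y) pixel coordinates of the two IR points."""
    # Centre the camera coordinates around the origin.
    x1, y1 = p1[0] - CAM_W / 2, p1[1] - CAM_H / 2
    x2, y2 = p2[0] - CAM_W / 2, p2[1] - CAM_H / 2
    # Distance is inversely proportional to the tangent of half the
    # angle subtended by the LED pair.
    half_angle = RAD_PER_PX * math.hypot(x1 - x2, y1 - y2) / 2
    dist = (LED_SPACING_MM / 2) / math.tan(half_angle)
    # The angle to the midpoint gives the lateral/vertical position.
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    head_x = math.sin(RAD_PER_PX * mx) * dist
    head_y = math.sin(RAD_PER_PX * my) * dist + y_offset
    return head_x, head_y, dist

# Points symmetric about the image centre -> head straight ahead.
x, y, z = head_position((462, 384), (562, 384))
```

Moving the LEDs apart in the image (the user approaching the camera) makes the computed distance shrink, as the text describes.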
scaled with the front clipping plane. The boundaries are used to calculate the projection matrix.
It is noted that the frustum given by this matrix is not necessarily the same to the left and right of the camera. However, this is needed because the plane closest to the user has to be locked to the monitor to achieve the effect of looking through a window. Calculating these two matrices finishes the interpretation of the data, since the world matrix needs no modification.
The visualization is not particularly interesting. After the matrices needed for the transformation are calculated, the graphics are produced as in most other 3D applications. This means that objects are shown by transforming them from 3D space into 2D space using the mentioned matrices. After this, they are rendered to the monitor.
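The off-axis frustum discussed above can be sketched in the style of OpenGL's glFrustum. This is the standard construction, not necessarily the report's exact matrix:

```python
# Off-axis (asymmetric) perspective matrix: the near plane is pinned to
# the monitor edges, which move with the tracked head, so in general
# l != -r. Rows follow the standard glFrustum layout.

def frustum(l, r, b, t, n, f):
    """4x4 perspective matrix for the near-plane window
    [l, r] x [b, t] at distance n, with far plane at f."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]

# Head shifted off-centre -> skewed window onto the scene (l != -r).
m = frustum(l=-0.3, r=0.5, b=-0.3, t=0.3, n=0.1, f=100.0)
```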
Chapter 2
Review of Literature
Paper 1: "Experience in the Design and Development of a Game Based on Head-Tracking Input"
Authors: Jeffrey Yim, Eric Qiu, T.C. Nicholas Graham
Published: IACSS 2008, Calgary, 3rd November 2008.
This paper shows that tracking technologies, such as eye and head tracking, provide novel techniques for interacting with video games. For instance, players can shoot with their eyes in a first-person shooter using gaze-based input. Head-tracking systems allow players to look around a virtual cockpit by simply moving their head. However, tracking systems are typically based on expensive specialized equipment. The prohibitive costs of such systems have motivated the creation of low-cost head-tracking solutions using simple web cameras and infrared light detection. In this paper, we describe our experience developing a simple shooting game that incorporates such low-cost head-tracking technology. There have in recent years been significant advances in novel techniques for interacting with video games. One interesting direction involves the creation of more immersive ways of providing input to games, where players' natural movements translate into in-game actions. Perhaps the most well-known of these are gesture-based interactions using a Wii Remote and movement-based interaction using a dance pad or Wii Balance Board. A very different style of approach has been the use of passive input devices that capture players' focus of attention and use it as a direct source of input to games. An example is eye-gaze control of video games, where, for example, a player of a first-person shooter can aim his gun simply by looking at the desired target. In this approach, players do not control the game through a physical input device; they simply look. Gaze-based input requires expensive equipment (e.g., a $28,000 Tobii eye tracker), and therefore cheaper approaches have been developed, such as head tracking. Unlike eye trackers, head-tracking systems cannot directly determine eye gaze. Instead, player attention is approximated by capturing head position and orientation. In this manner, players can freely look around their cockpit in a flight simulator game. Head trackers, however, are not just low-fidelity substitutes for eye trackers. For instance, the act of dodging projectiles in a shooting game is naturally captured by lateral head movements.
Chapter 3
Proposed Methodology
In this section we first give a general analysis of the problem we are trying to solve. This is followed by an analysis of Johnny Lee's solution and the methods used. Finally, this results in a description of the solution we will use to achieve our goal.
3.1
frame, this changes. Now, what you see through the frame depends on the angle and distance from which the frame is viewed, exactly as when looking through a window. This effect is what we wish to achieve on the monitor. Depending on the position of the user, the visible content changes to reflect her movement.
3.1.1
Movement scenarios
The movements that the user can make can be separated into two general categories: changes in the angle between the front of the monitor and the user, and changes in the distance between the monitor and the user. The change of angle is the effect of the user moving around at a constant distance from the centre of the screen (i.e. the position of the Wiimote). When this occurs, some elements should vanish while others should become visible. Again, using a window as an example: if you stand in front of a window and move to the left, you can see more to the right on the other side of the window, and vice versa. The situation is illustrated from above in Figure 3.3(a).
Figure 3.3: The change in the visible field when the user moves to a different angle relative to
(a) a window and (b) a monitor.
As a contrast, the normal scenario when using a monitor is shown in Figure 3.3(b), where it is obvious that the angle of view has no influence and thus seems unrealistic. The same principles also apply when lowering or raising your head; you can see more of the sky through a window when crouching than when standing. The other category, change of distance, is a different effect, as the distance between the monitor and the user determines how much of an image should be visible overall. Returning to the window example: the closer one is to a window, the more one can see outside the window in all directions. The situation is illustrated in Figure 3.4.
Figure 3.4: The change in the visible field when the user moves closer to (a) a window and
(b) a monitor.
As one moves closer to the window, more of the outside becomes visible. As before, Figure 3.4(b) is provided as a contrast, showing how distance has no effect when using a traditional monitor.
To enable the kind of interaction described above, the computer needs to know the position of the user's head relative to the monitor. This problem is known as head tracking. As mentioned, head tracking in general is difficult, since the process of finding areas of interest and registering several images is complicated. However, due to the IR camera and onboard image processor of the Wiimote, the task is significantly simplified. As described above, the data returned from the Wiimote contains absolute coordinates of up to four points and, optionally, size estimates. Since little information can be obtained when only one point is used, as the size estimate is too coarse, two points are required. This is also the number of points recognized when using the sensor bar. This approach is the foundation of Johnny Lee's solution as well as of ours.
Combined, the x- and y-coordinates and the distance from the user to the screen give the camera position, camera target and camera up vector. From these three vectors, the three axes of the coordinate system to be used can be determined. Objects are then shown by transforming them from 3D space into 2D space using the resulting matrices, after which they are rendered to the monitor.
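One plausible sketch of deriving those axes is the standard look-at construction; the report does not show its exact formulas, so the details below are an assumption:

```python
import math

# Derive the camera's orthonormal axes from position, target and up.

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def camera_axes(position, target, up):
    """Return the right, up and forward axes of the view frame."""
    forward = normalize(tuple(t - p for t, p in zip(target, position)))
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)   # re-orthogonalized up vector
    return right, true_up, forward

r, u, f = camera_axes((0, 0, 5), (0, 0, 0), (0, 1, 0))
# Looking down -z from (0, 0, 5): forward (0, 0, -1), right (1, 0, 0).
```

These three axes, together with the camera position, fill the rows of the view matrix used before the projection.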
Chapter 4
Conclusions
At rest, the Wii Remote's accelerometer detects gravity's pull and can thus be used as an inclinometer. It provides a good estimate of the pitch and roll (together referred to here as inclination) of the Remote, leaving only yaw (azimuth) unconstrained. Using this constraint, we develop a technique to register an object of known geometry to an image with only three or four points. This technique is then applied to create a self-calibrating six-degree-of-freedom input device.
Instead of attempting to intelligently determine correspondences, the small number of unknowns allows for a guess-and-check approach. Each possible mapping of markers on the tracked object (which we call an artifact) to observed points is attempted, and the resulting transformation is checked for consistency with the observed points and inclinations.
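The guess-and-check idea can be illustrated with a brute-force match over point orderings; the simple 2D residual used here is a stand-in for the full consistency check against the recovered transformation:

```python
from itertools import permutations

# Try every assignment of known marker positions to observed points and
# keep the ordering with the smallest residual error.

def match_markers(markers, observed):
    """Return the marker ordering that best explains the observations."""
    def residual(order):
        return sum((mx - ox) ** 2 + (my - oy) ** 2
                   for (mx, my), (ox, oy) in zip(order, observed))
    return min(permutations(markers), key=residual)

markers = [(0, 0), (10, 0), (0, 10)]
observed = [(9.8, 0.1), (0.2, 9.9), (0.1, -0.2)]
best = match_markers(markers, observed)
# best pairs each observation with its nearest marker:
# ((10, 0), (0, 10), (0, 0))
```

With at most four markers there are only 4! = 24 orderings, so exhaustive search is cheap, which is what makes the guess-and-check approach practical here.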
The purpose of this project was to implement a solution for performing head tracking using the Wiimote. The goal of the head tracking was to allow users to interact with a 3D world and give much of the same feeling as looking through a window.
Appendix
Chapter 5
References
[1] Jeffrey Yim, Eric Qiu, T.C. Nicholas Graham. "Experience in the Design and Development of a Game Based on Head-Tracking Input". IACSS 2008, Calgary, 3rd November 2008.
[2] Kevin Hejn, Jens Peter Rosenkvist. "Head tracking using a Wiimote". COGAIN, 28th March 2008.
[3] Sreeram Sreedharan, Edmund S. Zurita, and Beryl Plimmer. "3D input for 3D worlds". In Proceedings of the 19th Australasian conference of the Computer-Human Interaction Special Interest Group (CHISIG) of Australia on Computer-human interaction: design: activities, artifacts and environments, pages 227-230. ACM, 2007.
[4] Akihiko Shirai, Erik Geslin, and Simon Richir. "WiiMedia: motion analysis methods and applications using a consumer video game controller". In Sandbox '07: Proceedings of the 2007 ACM SIGGRAPH symposium on Video games, pages 133-140. ACM, 2007.
[5] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge Univ. Press, 2003.
[6] R. Raskar, G. Welch, and K.-L. Low. "Shader Lamps: Animating Real Objects with Image-Based Illumination". Proc. Eurographics Workshop on Rendering, Springer, 2001, pp. 89-102.
[7] James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes. Computer Graphics: Principles and Practice. Addison-Wesley, 2nd edition, 1996.
[8] Ralph L. Rosnow and Robert Rosenthal. Beginning Behavioral Research.
Acknowledgements
I am profoundly grateful to Prof. Rajan Deshmukh for his expert guidance and continuous encouragement throughout the project, and for seeing that it reached its target.
I would like to express my deepest appreciation towards Dr. Varsha Shah, Principal, RCOE, Mumbai, and Prof. Dinesh Deore, Head of the Computer Engineering Department, whose invaluable guidance supported me in this project.
Lastly, I must express my sincere heartfelt gratitude to all the staff members of the Computer Engineering Department who helped us directly or indirectly during this course of work.