Proposal 1

Projected Interactive Display
for Public Spaces
--------------------
A Thesis Proposal
Presented to the Faculty of the
Department of Electronics and Communications Engineering
College of Engineering, De La Salle University
--------------------
In Partial Fulfillment of
The Requirements for the Degree of
Bachelor of Science in Electronics and Communications Engineering
--------------------
by
Arcellana, Anthony A.
Ching, Warren S.
Guevara, Ram Christopher M.
Santos, Marvin S.
So, Jonathan N.
June 2006
1. Introduction
1.1. Background of the Study
Human-computer interaction (HCI) is the study of the interaction between the
users and the computers. The basic goal of HCI is to improve the interaction between
users and computers by making the computers more user-friendly and accessible to
users. HCI in the large is an interdisciplinary area. It is emerging as a specialty
concern within several disciplines, each with different emphases: computer science,
psychology, sociology and industrial design (Hewett et. al., 1996). The ultimate goal
of HCI is to design systems that would minimize the barrier between the human’s
cognitive model of what they want to accomplish and the computer’s understanding
of the user’s task.
The thesis applies a new way to interact with sources of information using an
interactive projected display. For a long time the ubiquitous mouse and keyboard has
been used to control a graphical display. With the advent of increased processing
power and technology, there has been great interest from the academic and
commercial sector in developing new and innovative human computer interfaces in
the past decades. (Myers et. al., 1996). Recently advances and research in human
computer interaction (HCI) has paved the way for techniques such as vision, sound,
speech recognition, and context-aware devices that allow for a much richer,
multimodal interaction between man and machine. (Turk, 1998; Porta, 2002). This
type of recent research moves away from traditional input devices which are
essentially blind into the so called Perceptual User Interfaces (PUI). PUI are
interfaces that emulate the natural capabilities of humans to sense, perceive, and
reason. It models human-computer interaction after human-human interaction. Some
of the advantages of PUIs are as follows: (1) it reduces the dependence on being in
proximity that is required by keyboards and mouse systems, (2) it makes use of
communication techniques found natural in humans, making the interface easy to
use,(3) it allows interfaces to be built for a wider range of users and tasks, (4) it
creates interfaces that are user-centered and not device centered, and (5) it has design
emphasis on being a transparent and unobtrusive interface. (Turk, 1998).
What is interesting in this line of research is the development of natural and
intuitive interface methods that make use of body language. A subset of PUI is Vision
Based Interfaces (VBI) which focuses on the visual awareness of computers to the
people using them. Here computer vision algorithms are used to locate and identify
individuals, track human body motions, model the head, and face, track facial
features, interpret human motion and actions. (Porta, 2002) A certain class of this
research falls under bare hand human-computer interaction which this study is about.
Bare hand human interaction uses as a basis of input, the actions and gestures of the
human hands alone without the use of attached devices.
1.2. Statement of the Problem
Information-rich interactive viewing modules are usually implemented as
computer based kiosks. However placing computer peripherals such as touch-screens
and mouse and keyboard controlled computers in a public area would require
significant space and have maintenance concerns on the physical hardware being used
by the common public. Using a projected display and a camera based input device,
would eliminate the hardware problems associated with the space and maintenance. It
also attracts people since projected displays are new and novel.
1.3. Objectives
1.3.1.General Objectives
The general objective of the thesis is to create an interactive projected
display system using a projector and a camera. The projector would display the
interactive content and the user would use his hand to select objects in the
projected display. Computer vision is used to detect and track the hand and
generate the proper response.
1.3.2.Specific Objectives
1.3.2.1. To use a DLP or LCD projector for the display
1.3.2.2. To use a PC camera as the basis of user input
1.3.2.3. To use a PC to implement algorithms to detect hand action as seen
from the camera
1.3.2.4. To use a PC to host the information-rich content
1.3.2.5. To create an interactive DLSU campus map as a demo application
1.4. Scope and Delimitation
1.4.1.Scope of the Study
1.4.1.1. The proponents will create a real time interactive projected display
using a projector and camera.
1.4.1.2. The proponents will use development tools for image/video
processing and computer vision to program the PC. Algorithms for
hand detection and tracking will be implemented using these tools.

1.4.1.3. A demo application of the system will be implemented as an
interactive campus map of the school.
1.4.1.4. Only the posture of a pointing hand will be recognize as an input.
Other visual cues to the camera will not be recognized.
1.4.2.Delimitation of the Study
1.4.2.1. The display will be projected in a clean white wall.
1.4.2.2. The projector and the camera set-up will be fixed in such a way that
blocking the projector is not a problem.
1.4.2.3. Trapezoidal distortion which results from projecting from an angle
will be manually compensated if present.
1.4.2.4. Lighting conditions will be controlled to not overpower the projector.
1.4.2.5. The system will be designed to handle only a single user. In the
presence of multiple users, the system would respond to the first user
triggering an event.
1.5. Significance of the Study
The study applies a new way of presenting information using projected
displays and allows the user to interact with it. A projected display conserves space as
the system is ceiling mounted and there is no hardware that directly involves the user.
Using only the hands of the user as an input, the system is intuitive and natural-- key
criteria for effective interfaces. It presents an alternative to computer based modules
where space can be a problem.

Currently there is a high cost of acquiring and maintaining a projector. But it
is still viable when maintaining an information center is deemed to be important. The
system can be comparable to large screen displays that are used in malls and such.
Since the system is also a novel way of presenting information. It can be used to
make interactive advertisements that are very attracting to consumers. The display
can transform from an inviting advertisement into detailed product information. With
this said, the cost of the operation of the projector can possibly be justified with the
revenue generated from effective advertising.
The study is an endeavor towards the development of natural interfaces. The
use of a projector and camera provides a means of producing an augmented reality
that is natural-- requiring no special goggles or gloves that the user has to wear. In
public spaces where information is very valuable, a system that can provide an added
dimension to reality is very advantageous and the use of nothing but the hands means
the user can instantly tap the content of the projected interface. Computer vision
provides the implementation of a perceptual user interface and the projection provide
the means of creating an augmented reality. Further developments in these areas
means the presence of computers can be present in everyday life without being
perceived as such. With PUI there is no need for physical interface hardware, only the
use of natural interaction skills present in every human is needed.
1.6. Description of the Project
The system is comprised of 3 main components; (1) the PC which houses the
information and control content, (2) the projector which displays the information, and
(3) the PC camera which is the input of the system. Development of the study would
be heavily invested in the software programming of the PC. The functions of the PC
would be the following: detection of the position and action of the hands of the user
relative to the screen, generating a response from a specific action, hosting the
information rich content. Techniques of image/video processing and machine vision
will be used to facilitate the first two functions of the PC.
As a demo application an interactive map of the school is used. The projector
will project the campus directory of De La Salle University Manila. The camera will
capture the images needed and will upload to the computer. The user will then pick on
which building he/she would like to explore using his/her hand as the pointing tool.
Once the user has chosen a building, a menu will appear that will give information
about the building. Information includes brief history, floor plans, facilities, faculties,
etc. Once the user is finished exploring the building, he/she can touch the back button
to select another building in the campus. The cycle will just continue until the user is
satisfied.
1.7. Methodology
Development of the study would be heavily invested in the software
programming of the PC. We the researchers must spend time in acquiring skills in
programming for the implementation of the research. Research about video capture
and processing is greatly needed for the operation of the system. Quick
familiarization and efficiency with the libraries and tools for computer vision is
necessary for timely progress in the study.
The proponents of the research must first obtain the hardware that is needed
for the achievement of the study, materials such as the camera that will capture the
input, the projector which will give the output display (projected display) and the PC
which the system will be based. The appropriate specifications of the camera and the
projector will be carefully looked at to get the precise requirements. After which the
system has its working prototype, testing and making the necessary adjustments will
be needed upon detecting and fixing problems in the system.
Seeking advice from different people may be necessary for speedy progress of
the study. Advice in programming will be very helpful, since the implementation is
PC based. Additionally advice from the panel, adviser, and other people about the
interface will be helpful in removing biases the proponents may have in the system.
1.8. Gantt Chart
1.9. Estimated Cost
Projector P 50,000
PC Camera P 1500-2000
Open Source SDK Free
Development / Prototype Computer Available
Miscellaneous P5000
Estimated budget P 57000

2. Review of Related Literature
2.1. PC Camera
PC Camera, popularly known as web camera or webcam, is a real time camera
widely used for video conferencing via the Internet. Acquired images from this device
were uploaded in a web server hence making it accessible using the world wide web,
instant messaging, or a PC video calling application. Over the years, several
applications were developed including in the field of astrophotography, traffic
monitoring, and weather monitoring. Web cameras typically includes a lens, an image
sensor, and some support electronics. Image sensors can be a CMOS or CCD, the
former being the dominant for low-cost cameras. Typically, consumer webcams offers
a resolution in the VGA region having a rate of around 25 frames per second. Various
lens were also available, the most being a plastic lens that can be screwed in and out
to manually control the camera focus. Support electronics is present to read the image
from the sensor and transmit it to the host computer.
2.2. Projectors
Projectors are classified into two technologies, DLP (Digital Light Processing)
and LCD (Liquid Crystal Display). This refers to the internal mechanisms that the
projector uses to compose the image (Projectorpoint).
2.2.1.DLP
DLP technology uses an optical semiconductor known as the Digital
Micromirror Device, or DMD chip to recreate the source material. Below is an
illustration of how it works (Projectorpoint).

2.2.1.1. Advantages of DLP projectors
There are advantages of DLP projectors over the LCD projectors.
First, there is less ‘chicken wire’ or ‘screen door’ effect on DLP because pixels
in DLP are much closer together. Another advantage is that it has higher
contrast compared to LCD. DLP projectors are much portable for it only
requires fewer components and finally, claims had shown that DLP projectors
last longer than LCD (Projectorpoint).
2.2.1.2. Disadvantages of DLP projectors
Certainly, DLP projectors also have disadvantages to consider. It has
less color saturation. The ‘rainbow effect’ is appearing when looking from one
side of the screen to the other, or when looking away from the projected image
to an off-screen object and sometimes ‘Halo effect’ appears (Projectorpoint).
2.2.2.LCD
LCD projectors contain three separate LCD glass panels, one for red,
green, and blue components of the image signal being transferred to the projector.
As the light passes through the LCD panels, individual pixels can be opened to
allow light to pass or closed to block the light. This activity modulates the light
and produces the image that is projected onto the screen (Projectorpoint).
2.2.2.1. Advantages of LCD projectors
Advantages of LCD projectors over the DLP projectors include: It is
more ‘light efficient’ than DLP. It produces more saturated colors making it
seem brighter than a DLP projector. It produces sharper image
(Projectorpoint).
2.2.2.2. Disadvantages of LCD projectors
Disadvantages of LCD projectors over DLP projectors are: It produces
‘chicken wire’ effect causing the image to look more pixellated. LCD
projectors are more bulky because there are more internal components. Dead
pixels, which are pixels that are permanently on or permanently off, appear
which can be irritating to see. LCD panels can fail, and are very expensive to
replace (Projectorpoint).
2.3. Similar Researches
2.3.1.Bare-Hand Human –Computer Interface
Human-computer interaction describes the interaction between the user
and the machine. Devices such as keyboard, mouse, joystick, electronic pens and
remote controls were commonly used as the means for human-computer
interaction. Real-time barehanded interaction is the controlling of computer
system without any device or wires attached to the user. The position of the
fingers and the hand is to be used to control the applications (Hardenburg, 2001).
2.3.1.1. Applications
Bare-hand computer interaction is more practical than traditional input
devices. A good example is during a presentation, the presenter may use hand
gestures for selecting slides therefore minimizing the delay or pauses caused
by moving back and fourth to the computer system to click for the slide.
Perceptual interface allows systems to be integrated in small areas and allows
users to operate at a certain distance. Direct manipulation of virtual objects
using fingers is made possible with this system. Also, with this system,
indestructible interface could be built by mounting the projector and camera
high enough for the user not to access or touch it. With these, the system will
be less prone to damage caused by the users (Hardenburg, 2001).
2.3.1.2. Functional Requirements
Functional requirement includes the services for a vision-based
computer interaction system. The three essential services needed in the
implementation of the aforementioned system are detection, identification and
tracking. Detection determines the presence and position of the objects
acquired. The output of detection could be used for controlling applications.
Identification service recognizes if the object present in the scene is within the
given class of objects. Some of the identification tasks were the identification
of certain hand posture and number of fingers visible. Tracking service is
required to be able to tell which object moved between two frames since the
identified objects will not rest in the same position over the time (Hardenburg,
2001).
2.3.1.3. Non-Functional Requirements
Non-functional requirements describe the minimum quality expected
from a service. The qualities to be monitored and maintained are latency,
resolution and stability. Latency is defined as the lag between the user’s action
and the response of the system. Eventually there is no system without latency
therefore the acceptable latency of the system is of given importance since the
application requires real-time interaction. Minimum input resolution is
important in the detection and identification processes. It is difficult to

identify fingers with a resolution width below six pixels. Tracking service is
said to be stable as long as the tracking object does not move and as long as
the measured position does not change (Hardenburg, 2001).
2.3.2.Dynamically Reconfigurable Vision-Based User Interfaces
Vision-based user interfaces (VB-UI) are an emerging area of user interface
technology where a user’s intentional gestures are detected via camera, interpreted
and used to control an application. The paper describes a system where the
application sends the vision system a description of the user interface as a
configuration of widgets. Based on this, the vision system assembles a set of
image processing components that implement the interface, sharing computational
resources when possible. The parameters of the surfaces where the interface can
be realized are defined and stored independently of any particular interface. These
include the size, location and perspective distortion within the image and
characteristics of the physical environment around that surface, such as the user’s
likely position while interacting with it.
The framework presented in this paper should be seen as a way that vision
based applications can easily adapt to different environments. Moreover, the
proposed vision-system architecture is very appropriate for the increasingly
common situations where the interface surface is not static (Kjeldsen, 2003.).
2.3.2.1. Basic Elements
A VB-UI is composed of configurations, widgets, and surfaces.
Configurations are a set of individual interaction dialogs. It specifies a
boundary area that defines the configuration coordinate system. The boundary
is used during the process of mapping a configuration onto a particular
surface. Each configuration is a collection of interactive widgets. A widget
provides an elemental user interaction, such as detecting a touch or tracking a
fingertip. It generates events back to the controlling application where they are
mapped to control actions such as triggering an event or establishing a value
of a parameter. A surface is essentially the camera’s view of a plane in 3D
space. It is able to define the spatial layout of widgets with respect to each
other and the world but it should not be concerned with the details of the
recognition process (Kjeldsen, 2003).
2.3.2.2. Architecture
In this system, each widget is represented internally as a tree of
components. Each component performs one step in the widget’s operation.
There are components for finding the moving pixels in an image (Motion
Detection), finding and tracking fingertips in the motion data (Fingertip
Tracking), looking for touch-like motions in the fingertip paths (Touch Motion
Detection), generating the touch event for the application (Event Generation),
storing the region of application space where this widget resides (Image
Region Definition), and managing the transformation between application
space and the image (Surface Transformation) (Kjeldsen, 2003).
The figure below shows the component tree of a “touch button” and the “tracking area.”
2.3.2.3. Example Applications
One experimental application developed that used the dynamically
reconfigurable vision system is the Everywhere Display Projector (ED). This
provides information access in retail spaces. The Product Finder Application
is another example. Its goal is to allow customer to look up products in a store
directory, and then guide him/her to where the product is (Kjeldsen, 2003.).
2.3.3.Computer Vision-Based Gesture Recognition for an Augmented Reality
Interface
Current researchers are discerning the realization of taking out computers
in other places than in our desktops while eyeing everywhere computation as one
of their objectives. The idea of wearable computers to enhance human visual
sensors by augmenting image generated information on a visual input is one of

these issues. One of the main proponents of the research is Gesture-Recognition
as the input such as pointing and clicking of a finger. It has been classified that
gesture recognition has two steps: 1.) capturing the motion of the user input and
2.) Classify the gesture to its predefined gesture classes. Capturing is either
performed by glove–based or optical-based system. Optical-based gesture
recognition comprise of model-based and appearance-based category. In a model-
based system a geometric model of the hand is created where it is matched to the
image data to define the state of the hand. While in appearance-based system the
recognition is based on a pixel representation learned from training images.
Because both approaches require a lot of computational complexity which is not
desirable for Augmented Reality (AR) systems it requires enhancements like
markers and infrared lightings. Gesture recognition will be introduced and the
main topic in this paper in order to make useful interface, as well as having a low
computational complexity. Outline of the paper will be done to show how the
research is implemented (Moeslund T., 2004).
2.3.3.1. Defining the Gestures
Two primary gestures are introduced, pointing and clicking gesture of
the hand. Consideration of minimum requirements to control the application is
done also it include other easy to remember gestures that will help in short-cut
commands to be able to avoid numerous pop-up menus (Moeslund T., 2004).

2.3.3.2. Segmentation
Task of the segmentation will be for the recognition and detection of
the placeholder objects and pointers where the visual output of the system will
be projected as well as hands in the 2d image captured. In order to achieve
invariance to changing size and form of objects to be detected the research
used colour pixel-based approach to segment spots of similar colour image.
Problems like lighting settings, changing illumination and skin colour
detection is discussed and was given solutions to (Moeslund T., 2004).
2.3.3.3. Gesture Recognition
A basic approach is done to solve this problem, by counting the
number of fingers. Hand and fingers can be approximated by a circle and a
number of rectangles, where it equates to the number of the finger that is
projected. Polar transformation around the centre of the hand and count the
number of fingers (rectangles) present in each radius. The algorithm does not
contain any information regarding the relative distances between two fingers,
because it makes the system more general, and secondly because different
users tend to have different preferences in the shape and size of their hands
(Moeslund T., 2004).

2.3.3.4. System Performance
Gesture-recognition has been implemented as part of the computer
vision system of a computer vision system of an AR multi-user application.
The low level segmentation (section 3) can robustly segment 7 different
colours from the background (skin colour and 6 colours for PHO and
pointers), given there are no big changes in the illumination colour (Moeslund
T., 2004).
Segmentation Result
2.3.4.A Design Tool for Camera-Based Interaction
Constructing a camera-based interface can be difficult for most
programmers and would require a better understanding of machine algorithms that
are involve. Basically a camera-based interface is that a camera will serve as the
sensor/eyes of the system regarding with your input. The goal is to make the
system interactive while not wearing any other special devices to detect the input
rather than having other traditional inputs like keyboard etc. This makes
computing set in the environment rather than in our desktops. Problem lies in the
designing of a camera-based system, the programming and the mathematics part

is complicated that ordinary programmers do not have the skill for it especially
when we are considering bare-hand inputs. The main item to be considered in a
camera-based interaction is a classifier that takes an image and identifies pixels
that is considered. Acquiring skills in building a classifier is greatly needed to
pursue the idea (Fails, J.A., 2003).
Crayons is one of the tools to make a classifier which can be exported in a
form that can be read by java. Crayons help User Interface (UI) designers to make
the camera-based interface even without detailed knowledge on image processing.
But its features are unable to distinguish shapes and object orientation but do well
in object-detection and hand and object tracking (Fails, J.A., 2003).
Classifier Design Process
The function of the Crayons is to create a classifier with ease. Crayons
receive images and then after the user gives its input a classifier is created then a
feedback is displayed (Fails, J.A., 2003).
2.3.4.1. User Interfaces
There are four pieces of information that a designer must consider and
operate in designing a classifier interface which are: 1.) set of classes to be
recognized, 2.) Set of training images to be used, 3.) classification of pixels as
defined by the programmer and 4.) the classifier’s current classification of the
pixels (Fails, J.A., 2003).
2.3.4.2. Crayons Classifier

Automating the classifier creation is the main function of the crayon
tool. It is required to extract features and generate classifiers as quickly as
possible. Current Crayons prototype has about 175 features per pixel (Fails,
J.A., 2003).
Lastly to accomplish the application a machine learning algorithm that
can handle a large number of examples with a large number of features is
required (Fails, J.A., 2003).
2.3.5.Using Marking Menus to Develop Command Sets for Computer Vision
Based Hand Gesture Interfaces
The use of hand gestures for interaction, in an approach based on
computer vision. The purpose is to study if marking menus, with practice, could
support the development of autonomous command sets for gestural interaction.
Some early problems are reported, mainly concerning with user fatigue and
precision of gestures (Lenman, S., 2002).
Remote control of electronic appliances in a home environment, such as
TV sets and DVD players, has been chosen as a starting point. Normally it
requires the use of a number of devices, and there are clear benefits to an
appliance-free approach. They only implemented a first prototype for exploring
pie- and marking menus for gesture-based interaction (Lenman, S., 2002).
2.3.5.1. Perceptive and Multimodal User Interfaces
Perceptive User Interfaces (PUI) strives for automatic recognition of
natural, human gestures integrated with other human expressions, such as
body movements, gaze, facial expression, and speech. The second approach to
gestural interfaces will be the Multimodal User Interfaces (MUI), where hand
poses and specific gestures are used as commands in a command language. In
this approach, gestures are either a replacement for other interaction tools,
such as remote controls, mouse, or other interaction devices. The gestures
need not be natural gestures but could be developed for the situation, or based
on a standard sign language.
There is a growing interest in designing multimodal interfaces that
incorporate vision-based technologies. It contrasts the passive mode of PUI
with the active input mode addressed here. It claims that although passive
modes may be less obtrusive, active modes generally are more reliable
indicators of user intent, and not as prone to error.
The design space for such commands can be characterized along three
dimensions: Cognitive aspects, Articulatory aspects, and Technological
aspects.
Cognitive aspects refer to how easy commands are to learn and to remember.
It is often claimed that gestural command sets should be natural and intuitive,
meaning that they should inherently make sense to the user.
Articulatory aspects refer to how easy gestures are to perform, and how tiring
they are for the user. Gestures involving complicated hand or finger poses
should be avoided, because they are difficult to articulate.
Technological aspects refer to the fact that in order to be appropriate for
practical use, and not only in visionary scenarios and controlled laboratory
situations, a command set for gestural interaction based on computer vision
must take into account the state-of-the art of technology (Lenman, S., 2002).
2.3.5.2. Current Work
The point of departure for the current work is cognitive, leaving
articulatory aspects aside at the moment. A command language based on a
menu structure has the cognitive advantage that the commands can be
recognized rather than recalled. Traditional menu based interaction is not
attractive in a gesture-based scenario. Pie- and marking menus might provide
a foundation for developing directness and autonomous gestural command
sets (Lenman, S., 2002).
Pie menus are pop-up menus with the alternatives arranged radially.
Because the gesture to select an item is directional, users can learn to make
selections without looking at the menu. The direction of the gesture is
sufficient to recognize the selection. If the user hesitates at some point in the
interaction, the underlying menus can be popped up, always giving the
opportunity to get feedback about the current selection.
Hierarchic marking menus are a development of pie menus that allow
more complex choices by the use of sub-menus. The shape of the gesture
(mark) with its movements and turns can be recognized as a selection, instead
of the sequence of distinct choices between alternatives.
The gestures in the command set would consist of a start pose, a
trajectory defined by menu organization for each possible selection and, lastly
a selection pose. Gestures ending in any other way than with the selection
pose would be discarded (Lenman, S., 2002).
2.3.5.3. A Prototype for Hand Gesture Interaction
Here remote control appliances in a domestic environment were
chosen as the first application. So far, the only designed hierarchic menu
system is for controlling some functions of a TV, a CD player, and a lamp
(Lenman, S., 2002).
The hand was chosen as a view-based representation which includes
both color and shape cues. The system tracks and recognizes the hand poses
based on a combination of multi-scale color feature detection, view-based
hierarchical hand models and particle filtering. The hand poses are represented
in terms of hierarchies of color image features at different scales, with
qualitative interrelations in terms of scale, position and orientation. These
hierarchical models capture the coarse shape of the hand poses. In each image,
detection of multi-scale color features is performed.
The particle filtering allows for the evaluation of multiple hypotheses
about the hand position, state, orientation and scale, and a possibility measure
determines what hypothesis to choose. To improve the performance of the
system, a prior on skin color is included in the particle filtering step. In fig. 1,
yellow (white) ellipses show detected multi-scale features in a complex scene
and the correctly detected and recognized hand pose is superimposed in red
(gray).
Detected multi-scale features and the recognized hand pose superimposed in
an image of a complex scene
There is a large number of works on real-time hand pose recognition in
the computer vision literature. Some of the most related in this approach is by
using normalized correlation of template images of hands for hand pose
recognition. Though efficient, this technique can be expected to be more
sensitive to different users, deformations of the pose and changes in view,
scale, and background.
However, the performance was far from real-time. The approach
closest was representing the poses as elastic graphs with local jets of Gabor
filters computed at each vertex. In order to maximize speed and accuracy in
the prototype, gesture recognition is currently tuned to work against a uniform
background within a limited area, approximately 0.5 by 0,65m in size, at a
distance of approximately 3m from the camera, and under relatively fixed
lighting conditions (Lenman, S., 2002).

The demo space at CID
2.4. Similar Product
An Interactive Whiteboard (IW) is a projector-screen, except that the screen is
either touch sensitive or can respond to a special ‘pen.’ This means that the projector-
screen can be used to interact with the projected user image. This provides a more
intuitive way to interact rather than using input devices such as the mouse/keyboard
for navigation of the computer screen being projected. There are two basic functions
of an IW, writing on the board and acting as a mouse. All common IWs have
character-recognition and can convert scrawls into text-boxes.
There are two market leaders in IWs. They are Promethean ActivBoard and
SmartBoard. Promethean has its own presentation system, web browser, and its own
file system. SmartBoard uses the computer’s native browser. Promethean uses stylus
pen to interact with the board while the SmartBoard are touched to operate. The
reason to prefer one to the other will depend on its applications.
There are some issues regarding IWs. One of which is that it requires a
computer with an IW software installed. The need for a software makes it awkward to
use an IW with individual laptops. Another issue is that all IWs used were “front-lit”
meaning that the user’s shadow will be thrown across the screen. Backlit IWs
currently are very expensive. Lastly, although IWs have both character-recognition
and an onscreen keyboard, it is not a good technology for typing. The user can easily
go back to the computer keyboard when he/she needs to do a lot of typing. (Stowell,
2003)
2.5. Computer Vision and Image Processing Development Tools
2.5.1.Open CV
OpenCV which stands for Open Computer Vision is an open source library
developed by Intel. This library is cross-platform which runs both on Windows
and Linux and mainly focuses on real-time image processing. This library is
intended for use, incorporation and modification by researchers, commercial
software developers, government and camera vendors as reflected in the license
(Open Source Computer Vision Library).
2.5.2.Microsoft Vision SDK
Microsoft Vision SDK is a library for writing programs to perform
manipulation and analysis on computers running on Microsoft Windows operating
systems. The library was developed to support researchers and developers of
advanced applications, including real-time image processing applications.
Microsoft Vision SDK is a C++ library of object definitions, related software, and
documentation for use with Microsoft Visual C++. It is a low-level binary,
intended to provide a strong programming foundation for research and application
development. It includes classes and functions for working with images but it
does not include image processing predefined functions (Intel, n.d.).

References:
DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006, from
http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm.
Fails, J.A., Olsen, D. (2003). A Design Tool for Camera-Based Interaction. Bringham
University, Utah. Retrieved from
http://icie.cs.byu.edu/Papers/CameraBaseInteraction.pdf
Hardenberg, C., Bérard, F., (2001). Bare-hand human-computer interaction. Orlando, FL

USA. Retrieved from
Hewett, et. al. (1996) Chapter 2: Human Computer Interaction. ACM SIGCHI Curricila
for Human Computer Interaction. Available:
http://sigchi.org/cdg/cdg2.html#2_3 retrieved June 2, 2006.
Intel, (n.d.). Open source computer vision library. Retrieved June 4, 2006 from
http://www.intel.com/technology/computing/opencv/index.htm.
Kjeldsen, R., Levas, A., & Pinhanez, C. (2003). Dynamically Reconfigurable Vision-
Based User Interface. Retrieved from
http://www.research.ibm.com/ed/publications/icvs03.pdf
Lenman, S., Bretzner, L., Thuresson B., (2002, October). Using Marking Menus to
Develop Command Sets for Computer Vision Based Hand Gesture Interfaces.
Retrieved from http://delivery.acm.org/10.1145/580000/572055/p239-
lenman.pdf?key1=572055&key2=1405429411&coll=GUIDE&dl=ACM&CFID=
77345099&CFTOKEN=54215790
Moeslund T., Liu Y., Storring M., (2004, September). Computer Vision-Based Gesture
Recognition for an Augmented Reality Interface. Marbella, Spain. Retrieved from
http://www.cs.sfu.ca/~mori/courses/cmpt882/papers/augreality.pdf
Myers B., et. al. (1996) Strategic Directions in Human-Computer Interaction. ACM
Computing Surveys Vol.28 No.4
Porta, M. (2002) Vision-based user interfaces:methods and applications. International

Journal of Human Computer Studies. Elsevier Science
Stowell, D. (May, 2003). Interactive Witeboard. Retrieved June 1, 2006 from

http://www.ucl.ac.uk/is/fiso/lifesciences/whiteboard.
The Microsoft Vision SDK. (2000, May). Retrieved June 4, 2006 from
http://robotics.dem.uc.pt/norberto/nicola/visSdk.pdf
Turk, M. (1998). Moving from GUIs to PUIs. Symposium on Intelligent Information
Media. Microsoft Research Technical Report MSR-TR-98-69
Webcam. (n.d.). Wikipedia. Retrieved June 03, 2006, from Answers.com Web site:
http://www.answers.com/topic/web-cam.

Proposal 1

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Proposal 1

Uploaded by

Copyright:

Available Formats

Projected Interactive Display

for Public Spaces

1.1. Background of the Study

Human-computer interaction (HCI) is the study of the interaction between the

users. HCI in the large is an interdisciplinary area. It is emerging as a specialty

of the user’s task.

commercial sector in developing new and innovative human computer interfaces in

communication techniques found natural in humans, making the interface easy to

emphasis on being a transparent and unobtrusive interface. (Turk, 1998).

What is interesting in this line of research is the development of natural and

human hands alone without the use of attached devices.

1.2. Statement of the Problem

Information-rich interactive viewing modules are usually implemented as

computer based kiosks. However placing computer peripherals such as touch-screens

The general objective of the thesis is to create an interactive projected

generate the proper response.

1.3.2.1. To use a DLP or LCD projector for the display

1.3.2.2. To use a PC camera as the basis of user input

1.3.2.3. To use a PC to implement algorithms to detect hand action as seen

from the camera

1.3.2.4. To use a PC to host the information-rich content

1.3.2.5. To create an interactive DLSU campus map as a demo application

1.4. Scope and Delimitation

1.4.1.Scope of the Study

using a projector and camera.

1.4.1.2. The proponents will use development tools for image/video

processing and computer vision to program the PC. Algorithms for

hand detection and tracking will be implemented using these tools.

interactive campus map of the school.

1.4.1.4. Only the posture of a pointing hand will be recognize as an input.

Other visual cues to the camera will not be recognized.

1.4.2.Delimitation of the Study

1.4.2.1. The display will be projected in a clean white wall.

blocking the projector is not a problem.

1.4.2.3. Trapezoidal distortion which results from projecting from an angle

will be manually compensated if present.

1.4.2.4. Lighting conditions will be controlled to not overpower the projector.

1.5. Significance of the Study

The study applies a new way of presenting information using projected

criteria for effective interfaces. It presents an alternative to computer based modules

where space can be a problem.

is still viable when maintaining an information center is deemed to be important. The

revenue generated from effective advertising.

The study is an endeavor towards the development of natural interfaces. The

use of a projector and camera provides a means of producing an augmented reality

the means of creating an augmented reality. Further developments in these areas

use of natural interaction skills present in every human is needed.

1.6. Description of the Project

information rich content. Techniques of image/video processing and machine vision

will be used to facilitate the first two functions of the PC.

As a demo application an interactive map of the school is used. The projector

Development of the study would be heavily invested in the software

necessary for timely progress in the study.

be needed upon detecting and fixing problems in the system.

1.8. Gantt Chart

1.9. Estimated Cost

Open Source SDK Free

Development / Prototype Computer Available

Estimated budget P 57000

PC Camera, popularly known as web camera or webcam, is a real time camera

instant messaging, or a PC video calling application. Over the years, several

applications were developed including in the field of astrophotography, traffic

from the sensor and transmit it to the host computer.

projector uses to compose the image (Projectorpoint).