Proposal 1
--------------------
A Thesis Proposal
Presented to the Faculty of the
Department of Electronics and Communications Engineering
College of Engineering, De La Salle University
--------------------
In Partial Fulfillment of
The Requirements for the Degree of
Bachelor of Science in Electronics and Communications Engineering
--------------------
by
Arcellana, Anthony A.
Ching, Warren S.
Guevara, Ram Christopher M.
Santos, Marvin S.
So, Jonathan N.
June 2006
1. Introduction
Human-computer interaction (HCI) is the study of the interaction between users and computers. The basic goal of HCI is to improve this interaction by making computers more user-friendly and accessible to users. HCI is a concern within several disciplines, each with a different emphasis: computer science, psychology, sociology, and industrial design (Hewett et al., 1996). The ultimate goal of HCI is to design systems that minimize the barrier between the human's cognitive model of what they want to accomplish and the computer's understanding of the user's task.
The thesis applies a new way to interact with sources of information using an
interactive projected display. For a long time, the ubiquitous mouse and keyboard have been used to control a graphical display. With the advent of increased processing power and technology, there has been great interest from the academic and commercial research communities in new interaction techniques over the past decades (Myers et al., 1996). Recent advances and research in human-computer interaction (HCI) have paved the way for techniques such as vision, sound, speech recognition, and context-aware devices that allow for a much richer, multimodal interaction between man and machine (Turk, 1998; Porta, 2002). This line of research moves away from traditional input devices, which are essentially blind, toward the so-called Perceptual User Interfaces (PUIs). PUIs are interfaces that emulate the natural capabilities of humans to sense, perceive, and reason; they model human-computer interaction after human-human interaction. Some of the advantages of PUIs are as follows: (1) they reduce the dependence on proximity that keyboard and mouse systems require, (2) they make use of communication skills users already have, easing learning and use, (3) they allow interfaces to be built for a wider range of users and tasks, (4) they create interfaces that are user-centered and not device-centered, and (5) they enable the design of intuitive interface methods that make use of body language. A subset of PUIs is Vision-Based Interfaces (VBIs), which focus on making computers visually aware of the people using them. Here, computer vision algorithms are used to locate and identify individuals, track human body motions, model the head and face, track facial features, and interpret human motion and actions (Porta, 2002). A certain class of this research falls under bare-hand human-computer interaction, which is what this study is about. Bare-hand interaction uses as its basis of input the actions and gestures of the user's bare hands. Mouse- and keyboard-controlled computers in a public area would require significant space and raise maintenance concerns, since the physical hardware is handled by the general public. Using a projected display and a camera-based input device would eliminate the hardware problems associated with space and maintenance. It also attracts people, since projected displays are new and novel.
1.3. Objectives
1.3.1. General Objectives
The general objective of this study is to create a real-time interactive projected display system using a projector and a camera. The projector will display the interactive content, and the user will use his hand to select objects in the projected display. Computer vision is used to detect and track the hand and to interpret its actions as input.
1.3.2.Specific Objectives
1.4.1.1. The proponents will create a real time interactive projected display
1.4.2.2. The projector and the camera set-up will be fixed in such a way that
1.4.2.5. The system will be designed to handle only a single user. In the
presence of multiple users, the system would respond to the first user
triggering an event.
The system projects interactive displays and allows the user to interact with them. A projected display conserves space, as the system is ceiling-mounted and there is no hardware that the user directly handles. Using only the hands of the user as input, the system is intuitive and natural, key qualities of a perceptual interface. The size of the projected display is comparable to the large-screen displays used in malls and similar venues. Since the system is also a novel way of presenting information, it can be used to make interactive advertisements that are very attractive to consumers. The display can transform from an inviting advertisement into detailed product information. With this said, the cost of operating the projector can possibly be justified by the value of the interaction it provides. The interaction is natural, requiring no special goggles or gloves that the user has to wear. In public spaces where information is very valuable, a system that can provide an added dimension to reality is very advantageous, and the use of nothing but the hands means the user can instantly tap the content of the projected interface. Computer vision provides the implementation of a perceptual user interface, and the projected display means that computers can be present in everyday life without being perceived as such. With a PUI there is no need for physical interface hardware, only the user's bare hands.
The system is composed of three main components: (1) the PC, which houses the information and control content, (2) the projector, which displays the information, and (3) the PC camera, which is the input of the system. Development of the study will be heavily invested in the software programming of the PC. The functions of the PC will be the following: detecting the position and action of the hands of the user relative to the screen, generating a response from a specific action, and hosting the information content. As a demonstration, the system
will project the campus directory of De La Salle University Manila. The camera will
capture the images needed and feed them to the computer. The user will then pick which building he/she would like to explore, using his/her hand as the pointing tool.
Once the user has chosen a building, a menu will appear that will give information
about the building. Information includes a brief history, floor plans, facilities, faculty, etc.
etc. Once the user is finished exploring the building, he/she can touch the back button
to select another building in the campus. The cycle will just continue until the user is
satisfied.
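To make this flow concrete, the following minimal sketch (ours, not the proponents' code) shows the camera-to-selection loop in Python with OpenCV. The building hotspots and the detect_fingertip() helper are hypothetical placeholders for the actual vision step.

```python
# Minimal sketch of the capture -> detect -> hit-test -> respond loop.
# HOTSPOTS and detect_fingertip() are illustrative placeholders only.
import cv2

# Hypothetical hotspot table: building name -> (x, y, w, h) rectangle
# in display coordinates.
HOTSPOTS = {
    "Building A": (50, 60, 200, 120),
    "Building B": (300, 60, 200, 120),
}

def detect_fingertip(frame):
    """Placeholder for the vision step: a real detector would return the
    fingertip position in display coordinates, or None if no hand is seen."""
    return None

cap = cv2.VideoCapture(0)              # the PC camera (system input)
while True:
    ok, frame = cap.read()             # grab one image from the camera
    if not ok:
        break
    tip = detect_fingertip(frame)
    if tip is not None:
        x, y = tip
        for name, (bx, by, bw, bh) in HOTSPOTS.items():
            if bx <= x < bx + bw and by <= y < by + bh:
                print("Selected:", name)   # would switch the projected menu
    if cv2.waitKey(30) == 27:          # Esc ends the demo loop
        break
cap.release()
```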
1.7. Methodology
The bulk of the development effort lies in the programming of the PC. The researchers must spend time acquiring programming skills for the implementation of the research. Research on video capture and processing is greatly needed for the operation of the system. Quick familiarization and efficiency with the libraries and tools for computer vision are also essential.
The proponents of the research must first obtain the hardware needed for the study: the camera that will capture the input, the projector that will give the output (the projected display), and the PC on which the system will be based. The appropriate specifications of the camera and the projector will be carefully examined to meet the precise requirements. After the system has a working prototype, testing and making the necessary adjustments will follow.
Seeking advice from different people may be necessary for speedy progress of
the study. Advice in programming will be very helpful, since the implementation is
PC-based. Additionally, advice from the panel, adviser, and other people about the
interface will be helpful in removing biases the proponents may have in the system.
Projector: P50,000
PC Camera: P1,500-2,000
Miscellaneous: P5,000
2.1. PC Camera
A PC camera, or webcam, is a low-cost video camera widely used for video conferencing via the Internet. Acquired images from this device were uploaded to a web server, making them accessible over the World Wide Web for applications such as remote monitoring and weather monitoring. Web cameras typically include a lens, an image sensor, and some support electronics. The image sensor can be a CMOS or a CCD, the former being dominant in low-cost cameras. Typically, consumer webcams offer a resolution in the VGA region at a rate of around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to manually control the camera focus. Support electronics are present to read the image from the sensor and transmit it to the host computer (Webcam, n.d.).
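As an illustration of how such a camera is driven in software, here is a minimal frame-grabbing sketch using OpenCV's Python bindings (a modern stand-in for the capture libraries of the period):

```python
# Grab one VGA frame from the first attached consumer webcam.
import cv2

cap = cv2.VideoCapture(0)                 # first attached PC camera
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)    # VGA, typical for consumer webcams
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

ok, frame = cap.read()                    # one BGR image from the sensor
if ok:
    print("captured", frame.shape)        # e.g. (480, 640, 3)
cap.release()
```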
2.2. Projectors
Projectors are classified into two technologies: DLP (Digital Light Processing) and LCD (Liquid Crystal Display). This refers to the internal mechanism that the projector uses to create the image.
2.2.1. DLP
First, there is less 'chicken wire' (or 'screen door') effect on DLP because the pixels are much closer together. Another advantage is higher contrast compared to LCD. DLP projectors are also more portable, since they require fewer components. Among the reported drawbacks, DLP projectors have less color saturation, and a 'rainbow effect' can appear when looking from one side of the screen to the other, or when looking away from the projected image (Projectorpoint).
2.2.2.LCD
LCD projectors contain three separate LCD glass panels, one each for the red, green, and blue components of the image signal being sent to the projector.
As the light passes through the LCD panels, individual pixels can be opened to
allow light to pass or closed to block the light. This activity modulates the light
and produces the image that is projected onto the screen (Projectorpoint).
2.2.2.1. Advantages of LCD projectors
LCD projectors are generally more 'light efficient' than DLP. They produce more saturated colors, making the projected image appear richer (Projectorpoint).
2.2.2.2. Disadvantages of LCD projectors
LCD projectors show a more visible 'chicken wire' effect, causing the image to look more pixelated. They are also bulkier because there are more internal components. Dead pixels, which are pixels that are permanently on or permanently off, can appear and are irritating to see. Finally, LCD panels can fail, and they are very expensive to replace (Projectorpoint).
Devices such as the keyboard, mouse, joystick, and electronic pen have traditionally mediated between the user and the machine. Hardenburg (2001) describes a bare-hand system that works without any device or wires attached to the user. The position of the fingers and the hand is used to control the applications (Hardenburg, 2001).
2.3.1.1. Applications
Bare-hand interaction can take the place of conventional input devices. A good example is during a presentation: the presenter may use hand gestures to select slides, minimizing the delays or pauses caused by moving back and forth to the computer to click for the next slide. Pointing and selecting using the fingers is made possible with this system. Also, with this system, a practically indestructible interface can be built by mounting the projector and camera high enough that the user cannot access or touch them. With this, the system hardware stays safe from tampering.
An identification service recognizes whether an object present in the scene belongs to a given class of objects; among the identification tasks is the identification of fingers and hands. A tracking service is also required to tell which object moved between two frames, since the identified objects will not rest in the same position over time (Hardenburg, 2001).
Important quality measures for such a system are latency, resolution, and stability. Latency is defined as the lag between the user's action and the response of the system. No system is entirely free of latency; therefore, the acceptable latency of the system is of particular importance. Tracking is said to be stable as long as the reported position does not change while the tracked object does not move (Hardenburg, 2001).
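One common way to trade latency against stability, shown here only as an illustrative sketch (the smoothing constant alpha is our assumption, not a value from the paper), is to filter the tracked position with an exponential moving average:

```python
# Exponential moving average on a tracked fingertip position.
# alpha near 1 follows the hand quickly (low latency, more jitter);
# alpha near 0 damps sensor noise (stable, but laggier).
def make_smoother(alpha=0.5):
    state = {"pos": None}
    def smooth(measured):                 # measured = (x, y) from the tracker
        if state["pos"] is None:
            state["pos"] = measured
        else:
            px, py = state["pos"]
            mx, my = measured
            state["pos"] = (alpha * mx + (1 - alpha) * px,
                            alpha * my + (1 - alpha) * py)
        return state["pos"]
    return smooth

smooth = make_smoother(alpha=0.4)
for raw in [(100, 100), (103, 98), (180, 150)]:   # noisy, then a real jump
    print(smooth(raw))
```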
Kjeldsen et al. (2003) describe a technology where a user's intentional gestures are detected via camera, interpreted, and used to control an application. The paper describes a system where interface components share computational resources when possible. The parameters of the surfaces where the interface can be realized are defined and stored independently of any particular interface. These include the size, location, and perspective distortion of the surface within the camera image, as well as characteristics of the physical environment around that surface, such as the user's likely position. The framework presented in the paper should be seen as a way for vision-based interfaces to cope with common situations where the interface surface is not static (Kjeldsen, 2003).
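A minimal sketch of how such surface parameters might be stored and used follows; it assumes the four corners of the surface in the camera image are known from calibration (the coordinates below are made up) and uses a homography to map fingertip positions into an upright configuration coordinate system:

```python
# Undo the perspective distortion of an interface surface with OpenCV.
import cv2
import numpy as np

# Assumed calibration: corners of the projected surface as seen by the camera.
cam_corners = np.float32([[112, 80], [530, 95], [545, 400], [98, 385]])
# Target: a 640x480 upright configuration coordinate system.
cfg_corners = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])

H = cv2.getPerspectiveTransform(cam_corners, cfg_corners)

# Map one fingertip found at (300, 240) in the camera image into
# configuration coordinates, where widget hit-testing becomes simple.
tip = np.float32([[[300, 240]]])
print(cv2.perspectiveTransform(tip, H))
```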
Each configuration has a boundary area that defines the configuration coordinate system. The boundary is used during the process of mapping a configuration onto a particular surface. Interaction is tracked through the fingertip, and the system generates events back to the controlling application, where they are interpreted in application space. The application is able to define the spatial layout of widgets with respect to each other and the world, but it should not be concerned with the details of the underlying vision processing.
2.3.2.2. Architecture
There are components for finding the moving pixels in an image (Motion Tracking), looking for touch-like motions in the fingertip paths (Touch Motion Detection), generating the touch event for the application (Event Generation), and storing the region of application space where the widget resides (Image Region). The original paper illustrates this with the component tree of a 'touch button' and a 'tracking area.'
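The sketch below is our reading of how such a widget could be wired together, not Kjeldsen's code; the class name and callback are hypothetical:

```python
# A "touch button": it owns a region of application space, watches fingertip
# paths for a touch-like event, and calls back into the application.
class TouchButton:
    def __init__(self, rect, on_touch):
        self.rect = rect            # (x, y, w, h): image region of the widget
        self.on_touch = on_touch    # event generation: application callback

    def contains(self, x, y):
        bx, by, bw, bh = self.rect
        return bx <= x < bx + bw and by <= y < by + bh

    def feed(self, tip, touching):
        """tip: (x, y) fingertip position; touching: True when the touch
        motion detector judged the path to be a touch-like motion."""
        if touching and tip is not None and self.contains(*tip):
            self.on_touch(self)     # generate the touch event

btn = TouchButton((10, 10, 100, 40), lambda b: print("button touched"))
btn.feed((42, 25), touching=True)   # fires the callback
```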
2.3.2.3. Example Applications
One example application lets a shopper find a product in a store directory and then guides him/her to where the product is (Kjeldsen, 2003).
2.3.3. Computer Vision-Based Gesture Recognition for an Augmented Reality Interface
Moeslund et al. (2004) consider computing situated in the environment rather than in our desktops, with everyday actions serving as the input, such as pointing and clicking with a finger. Gesture recognition is divided into two steps: (1) capturing the motion of the user input and (2) classifying the gesture into its predefined gesture class. Capturing is either model-based or appearance-based. In a model-based system, a geometric model of the hand is created and matched to the image data to determine the state of the hand, while an appearance-based system works directly from the appearance of the hand in the image, without markers or infrared lighting. Gesture recognition is the main topic of the paper, the aim being a useful interface with low computational complexity; the paper also includes other easy-to-remember gestures that serve as shortcuts.
The system tracks placeholder objects (PHOs) and pointers onto which the visual output of the system is projected. A polar transformation is performed around the centre of the hand, counting the number of fingers (rectangles) present at each radius. The algorithm does not use any information regarding the relative distances between two fingers, first because this makes the system more general, and secondly because different users tend to have different preferences in the shape and size of their hand gestures. The segmentation separates the relevant colours from the background (skin colour and six colours for the PHOs and pointers), given that there are no big changes in the illumination colour (Moeslund et al., 2004).
Figure: Segmentation result.
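A toy sketch of the polar counting idea, under our own simplifying assumptions (a clean binary hand mask and a known hand centre), might look like this:

```python
# Walk circles of increasing radius around the hand centre and count how
# many connected runs of hand pixels (finger cross-sections) each crosses.
import numpy as np

def fingers_per_radius(mask, centre, radii, samples=360):
    """mask: binary hand image (1 = hand); centre: (cx, cy)."""
    cx, cy = centre
    angles = np.linspace(0, 2 * np.pi, samples, endpoint=False)
    counts = []
    for r in radii:
        xs = (cx + r * np.cos(angles)).astype(int)
        ys = (cy + r * np.sin(angles)).astype(int)
        inside = (xs >= 0) & (xs < mask.shape[1]) & (ys >= 0) & (ys < mask.shape[0])
        ring = np.zeros(samples, dtype=int)
        ring[inside] = mask[ys[inside], xs[inside]]
        # count 0 -> 1 transitions around the circle = number of finger runs
        counts.append(int(np.sum((ring == 1) & (np.roll(ring, 1) == 0))))
    return counts

# Toy mask: a 200x200 image with two "fingers" sticking out of a palm blob.
mask = np.zeros((200, 200), dtype=int)
mask[80:120, 80:120] = 1            # palm
mask[40:80, 90:95] = 1              # finger 1
mask[40:80, 105:110] = 1            # finger 2
print(fingers_per_radius(mask, (100, 100), radii=[30, 50]))   # [2, 2]
```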
Fails and Olsen (2003) discuss camera-based interfaces and the design problems they involve. Basically, in a camera-based interface, a camera serves as the sensor, the 'eyes' of the system with regard to the user's input. The goal is to make the system interactive without the user wearing any special device to provide the input, rather than relying on traditional inputs like the keyboard. This places computing in the environment rather than in our desktops. The problem lies in building the image classifiers such interfaces need; the authors' Crayons tool lets a designer train a classifier interactively and export it in a form that can be read by Java. Crayons helps User Interface (UI) designers make classifiers without manual coding. Its per-pixel features are unable to distinguish shapes and object orientation but do well with colour and texture. The tool receives images, and after the user paints example pixels, a classifier is created and its output is displayed. There are four pieces of information the designer must keep track of, among them elements defined by the programmer and (4) the classifier's current classification of the pixels, so that iteration is as fast as possible. The current Crayons prototype has about 175 features per pixel (Fails & Olsen, 2003).
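To illustrate the paint-then-train loop, here is a minimal sketch under stated assumptions: only three colour features per pixel instead of the prototype's roughly 175, and scikit-learn's decision tree standing in for the paper's classifier:

```python
# Train a per-pixel classifier from a handful of "painted" example pixels,
# then classify every pixel of a new image.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Painted training pixels: (R, G, B) features and the designer's label.
X = np.array([[205, 150, 125], [190, 140, 120],   # skin-ish samples
              [30, 90, 200], [240, 240, 240]])    # background samples
y = np.array([1, 1, 0, 0])

clf = DecisionTreeClassifier().fit(X, y)

# Classify every pixel of a tiny 2x2 "image".
img = np.array([[[200, 145, 120], [35, 95, 210]],
                [[250, 250, 250], [195, 150, 130]]], dtype=float)
labels = clf.predict(img.reshape(-1, 3)).reshape(2, 2)
print(labels)       # 1 where the tree thinks "skin"
```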
Lenman et al. (2002) explore the use of marking menus for gesture-based interfaces based on computer vision. The purpose is to study whether marking menus, with practice, could support fast gesture-based interaction. Some early problems are reported, mainly concerning user fatigue and the limitations of the recognition technology. Remote control of home appliances, such as TV sets and DVD players, has been chosen as a starting point. Normally it requires the use of a number of devices, and there are clear benefits to an alternative; the paper explores pie and marking menus for gesture-based interaction (Lenman et al., 2002).
One approach treats gestures as part of perceptual interfaces that interpret body movements, gaze, facial expression, and speech. The second approach to gestural interfaces is Multimodal User Interfaces (MUIs), where hand gestures complement other input modes. In this approach, gestures are either a replacement for other interaction tools or an added mode; they need not be natural gestures but could be developed for the situation, or based on a designed command set, which is consistent with the active input mode addressed here. The paper claims that although passive modes may be less obtrusive, active modes generally are more reliable indicators of the user's intention (Lenman et al., 2002).
The design space for such command sets can be characterized along three aspects. Cognitive aspects refer to how easy commands are to learn and to remember; it is often claimed that gestural command sets should be natural and intuitive. Articulatory aspects refer to how easy gestures are to perform, and how tiring they are for the user; gestures involving complicated hand or finger poses are harder to articulate. Technological aspects acknowledge that, for practical use, and not only in visionary scenarios and controlled laboratory situations, a command set for gestural interaction based on computer vision must take into account the state of the art of the technology (Lenman et al., 2002).
A hierarchical menu structure has the cognitive advantage that the commands can be recalled with the support of the menus. Pie menus are pop-up menus with the alternatives arranged radially. Because the gesture to select an item is directional, users can learn to make selections without looking at the menu itself; the direction of the movement is sufficient to recognize the selection. If the user hesitates at some point in the interaction, the underlying menus can be popped up, always giving the novice guidance. Marking menus extend pie menus to more complex choices by the use of sub-menus. The shape of the gesture (mark), with its movements and turns, can be recognized as a selection instead of traversing the menu hierarchy step by step. A complete selection gesture thus consists of the trajectory defined by the menu organization for each possible selection and, lastly, a selection pose. Gestures ending in any other way than with the selection pose are not interpreted as commands.
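A minimal sketch of the directional-selection step follows; the eight-command set and the distance threshold are our illustrations, not Lenman's design:

```python
# Reduce a stroke from the pop-up point to an angle and snap it to one of
# eight sectors, so a practiced user can select without reading labels.
import math

MENU = ["play", "stop", "next", "prev",
        "vol+", "vol-", "menu", "back"]     # illustrative commands only

def pick(origin, end, items=MENU):
    dx, dy = end[0] - origin[0], end[1] - origin[1]
    if dx * dx + dy * dy < 20 * 20:          # stroke too short: no selection
        return None
    angle = math.degrees(math.atan2(-dy, dx)) % 360   # screen y grows down
    sector = int(((angle + 22.5) % 360) // 45)        # snap to 45-degree bins
    return items[sector]

print(pick((100, 100), (160, 100)))   # stroke to the right -> "play"
print(pick((100, 100), (100, 40)))    # stroke upward       -> "next"
```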
Remote control of home appliances was chosen as the first application; so far, only one hierarchic menu has been designed. The hand tracker uses both color and shape cues. The system tracks and recognizes the hand poses using hierarchical hand models and particle filtering. The hand poses are represented by multi-scale features, and the hierarchical models capture the coarse shape of the hand poses. In each image, the tracker estimates the hand position, state, orientation, and scale, together with a possibility measure for the pose. To make the system more robust, a prior on skin color is included in the particle filtering step. In Fig. 1, the correctly detected and recognized hand pose is superimposed in red (gray).
Figure 1: Detected multi-scale features and the recognized hand pose superimposed in red (gray).
Hand pose recognition has been studied extensively in the computer vision literature. Among the approaches most closely related to this one is work representing the poses as elastic graphs with local jets of Gabor filters (Lenman et al., 2002).
2.4. Interactive Whiteboards
An interactive whiteboard (IW) is a large display that is either touch sensitive or can respond to a special 'pen.' This means that the screen itself can be used to interact with the projected computer image, which provides a more intuitive way to interact than using input devices such as the mouse and keyboard to navigate the projected computer screen. There are two basic functions of an IW: writing on the board and acting as a mouse. All common IWs have these two functions.
There are two market leaders in IWs: the Promethean ActivBoard and the SmartBoard. Promethean has its own presentation system, web browser, and file system, while SmartBoard uses the computer's native browser. Promethean uses a stylus pen to interact with the board, while the SmartBoard is operated by touch.
There are some issues regarding IWs. One is that they require a computer with IW software installed; the need for software makes it awkward to use an IW with individual laptops. Another issue is that all the IWs considered were 'front-lit,' meaning that the user's shadow is thrown across the screen; backlit IWs are currently very expensive. Lastly, although IWs have both character recognition and an onscreen keyboard, they are not a good technology for typing, and the user can easily go back to the computer keyboard when he/she needs to do a lot of typing (Stowell, 2003).
2.5.1. OpenCV
OpenCV (Open Source Computer Vision) is an open source library originally developed by Intel. It runs on Windows and Linux and mainly focuses on real-time image processing. The library is free for both academic and commercial use (Intel, n.d.).
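As a small example of the kind of real-time processing OpenCV supports, the sketch below thresholds skin-like pixels and finds the largest blob, a common first step in bare-hand tracking; the HSV range is an assumption that real systems tune to their lighting:

```python
# Threshold skin-like pixels in HSV space and locate the largest contour.
import cv2
import numpy as np

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in camera frame
frame[200:300, 250:400] = (120, 150, 210)         # BGR skin-like patch

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# Assumed skin range in HSV; tune per camera and lighting set-up.
mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    hand = max(contours, key=cv2.contourArea)     # assume biggest blob = hand
    x, y, w, h = cv2.boundingRect(hand)
    print("hand candidate at", (x, y, w, h))
```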
2.5.2. Microsoft Vision SDK
The Microsoft Vision SDK is a C++ library of object definitions, related software, and documentation for use in computer vision research and application development. It includes classes and functions for working with images, but it is not an image processing library (The Microsoft Vision SDK, 2000).
DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006, from
http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm.
Fails, J.A., & Olsen, D. (2003). A Design Tool for Camera-Based Interaction. Brigham Young
University, Utah. Retrieved from
http://icie.cs.byu.edu/Papers/CameraBaseInteraction.pdf
Hewett, T., et al. (1996). Chapter 2: Human Computer Interaction. ACM SIGCHI Curricula
for Human Computer Interaction. Retrieved June 2, 2006, from
http://sigchi.org/cdg/cdg2.html#2_3
Intel, (n.d.). Open source computer vision library. Retrieved June 4, 2006 from
http://www.intel.com/technology/computing/opencv/index.htm.
Kjeldsen, R., Levas, A., & Pinhanez, C. (2003). Dynamically Reconfigurable Vision-
Based User Interface. Retrieved from
http://www.research.ibm.com/ed/publications/icvs03.pdf
Lenman, S., Bretzner, L., & Thuresson, B. (2002, October). Using Marking Menus to
Develop Command Sets for Computer Vision Based Hand Gesture Interfaces.
Retrieved from http://delivery.acm.org/10.1145/580000/572055/p239-lenman.pdf
Moeslund, T., Liu, Y., & Storring, M. (2004, September). Computer Vision-Based Gesture
Recognition for an Augmented Reality Interface. Marbella, Spain. Retrieved from
http://www.cs.sfu.ca/~mori/courses/cmpt882/papers/augreality.pdf
Myers, B., et al. (1996). Strategic Directions in Human-Computer Interaction. ACM
Computing Surveys, 28(4).
The Microsoft Vision SDK. (2000, May). Retrieved June 4, 2006 from
http://robotics.dem.uc.pt/norberto/nicola/visSdk.pdf
Turk, M. (1998). Moving from GUIs to PUIs. Symposium on Intelligent Information
Media. Microsoft Research Technical Report MSR-TR-98-69
Webcam. (n.d.). Wikipedia. Retrieved June 03, 2006, from Answers.com Web site:
http://www.answers.com/topic/web-cam.