
4. DESIGN CONSIDERATIONS

[Block diagram: the computer vision blocks (Image Acquisition, Preliminary Image Processing, Fingertip Finding, Clustering, Finger Rules Checker / Triggering) feed the application blocks (Calibration, Mouse Pointer, Mouse Click, GUI).]

Figure 4-1. Overview of the Program Blocks

4.1 Image Acquisition


The images are acquired using a Logitech Creative IFX webcam, which is capable of capturing 640x480 video at 30 frames per second (fps). For the program, however, we limit the frame size to 320x240, and the camera is not operated at a constant frame rate: the program grabs a new frame upon completion of its calculations, whose duration varies.

A problem encountered with the camera is its automatic gain control. The gain control is unsuitable for our purposes since the camera keeps changing its overall exposure and gain levels, making the background models unstable. At other times, certain placements relative to the projector make the camera run at very high exposure levels, giving an overexposed video with almost no contrast. The following images illustrate the effect of exposure on the acquired image.

Figure 4-2. Resultant image from different exposures: 1/20s (upper left), 1/40s (upper right), 1/60s (lower left), 1/80s (lower right)

There seems to be no programming interface to the driver settings that would let us disable this camera function from within our program. As a workaround, the proponents manually adjust the driver settings from the camera's bundled application. Under most lighting conditions, an exposure of 1/80s provides good contrast.

Because the webcam has a fixed lens and lacks optical zoom, the prototype is limited to a distance of only up to about 3.5m from the projection. Setting the camera farther away gives poorer-quality images and makes the detection process more difficult.

4.2 Preliminary Image Processing

The proponents made use of preliminary image processing to simplify the input image, readying it for the fingertip finding process. The flow diagram below shows the group's implementation of low-level image processing to prepare the input frame for fingertip detection.

[Flow diagram of the preliminary image processing stages: Colored Image Acquisition → Grayscale Conversion → Image Differencing against an Adaptive Reference (with Reference Modification) → Thresholding → Size Filter → Fingertip Detection.]

Figure 4-3. Preliminary Image Processing Block

4.2.1 Grayscale Conversion

Grayscale conversion of the acquired colored image is the initial step of the preliminary image processing. Converting the colored frame to a grayscale image is required by the succeeding preliminary image processes, such as image differencing and background subtraction.

Colors in an image may be converted to a shade of gray by

calculating the effective brightness or luminance of the color and using

this value to create a shade of gray that matches the desired brightness.

The effective luminance of a pixel is calculated with the

following formula:

Y = 0.299*R + 0.587*G + 0.114*B

OpenCV includes a function that performs the conversion of color images to grayscale. It is called as:

cvCvtColor(src_img, dest_img, CV_BGR2GRAY)
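A minimal sketch of this step, assuming the OpenCV 1.x C API (the image and capture names here are hypothetical):

IplImage *frame = cvQueryFrame(capture);   /* acquired BGR frame from the webcam */
IplImage *gray  = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
cvCvtColor(frame, gray, CV_BGR2GRAY);      /* Y = 0.299R + 0.587G + 0.114B */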

4.2.2 Image Differencing and Background Subtraction
Image differencing is used to determine the change between images. It extracts a moving foreground from a static background by comparing the gray values of successive frames. As discussed earlier in the theoretical considerations, image differencing suffers from noise introduced by subtle changes in illumination due to environmental changes, and from internal noise caused by the digitization process of the camera. The noise introduced can be alleviated using a size filter, which will be discussed in section 4.2.6. The major drawback, however, is the requirement for motion. In our application, once the hand rests it disappears from the difference frame since there is no motion. Slow motions also cause hollow areas to appear, since the hand has a generally uniform skin color. The proponents instead use a modified background subtraction to resolve the drawbacks of differencing successive image frames.

Background subtraction uses a reference frame instead of the previous frame, which gives much cleaner segmentation of any object not found in the reference background. A drawback of this method is that it is not suitable for changing backgrounds, and our application is an interactive display system and hence has changing content. What is needed, therefore, is an adaptive background model, which is discussed in the succeeding section.

4.2.3 Adaptive Reference

A popular adaptive background model is the running average, implemented in OpenCV as the function cvRunningAvg. The running average algorithm uses an accumulator that combines weighted inputs of the existing reference and the current frame. In equation form it is represented as:

R(t) = (1 − α)R(t−1) + αI(t)

where R(t) is the new reference at time t,
R(t−1) is the previous reference,
I(t) is the current input image, and
α is the weight, with a value from 0 to 1.
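A minimal sketch of this update, assuming the OpenCV 1.x C API with a 32-bit floating-point accumulator (the image names are hypothetical):

IplImage *ref = cvCreateImage(cvGetSize(gray), IPL_DEPTH_32F, 1);
cvConvert(gray, ref);            /* seed the reference with the first frame */
/* then, once per frame: */
cvRunningAvg(gray, ref, 0.02);   /* R(t) = (1 - a)*R(t-1) + a*I(t), a = 0.02 */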

The value of α determines how fast the reference image updates with respect to the current frame. The higher the α, the higher the weight of the current image and hence the faster the reference updates. With the right value of α the background can be updated properly, making background subtraction a usable process in the proponents' study. It is an observation that the hand does not rest for too long in a single area. The right value of α depends on the speed of the machine in use; on a 1.7 GHz laptop, our application uses α in the range of 0.01 to 0.02. Experiments on the determination of alpha, using the said hardware, yield the following table:

α (alpha)    Time to absorb as reference (s)
0.05         2.15
0.04         3.11
0.03         3.82
0.02         5.85
0.01         10.68
0.005        Beyond limit of algorithm; does not update

Table 4-1. Alpha and resulting time values

The time stated is the time it takes for a foreground object to be completely absorbed into the background. Because of the finite number representation (8 bits, or 256 levels), alpha has a threshold limit below which the weight is too small: the resulting calculation is floored and no update happens. This is evident in the last entry of the table. The figures below show the effect of the running average on the thresholded difference image:

Figure 4-4. At alpha = 0.05. Timestamps: 0.83s and 2.04s

Figure 4-5. At alpha = 0.01. Timestamps: 3.3s, 7s, and 10.5s

It can be seen that as time passes, the hand is slowly absorbed into the background reference, as indicated by the shrinking thresholded difference. A higher α gives faster updating and a lower α slower updating, as seen in the examples above.

In making use of an adaptive background, the group found a problem that causes an undesirable situation. When the hand rests for too long, it gets absorbed into the background. Once the hand moves again, its original position does not disappear completely from the background, and this causes two hands to show up: the current hand and the ghost of its resting position.

Figure 4-6. Ghosting
The ghost results from the previous hand position slowly being replaced by the now visible background. Because of this ghosting effect, the algorithm needed to be modified, and the proponents followed suggestions in the related literature, particularly Hardenbergh and Berard [5], to modify the running average.

4.2.4 Adaptive Reference Modification

Hardenbergh and Berard suggested that lighter pixels must be updated immediately, on the assumption that the projected image is always of high intensity and that the background is lighter than the foreground (i.e., the user). In this case the reference image is updated continuously regardless of the prescribed weight (α).

The group implemented an additional update process that checks whether a particular pixel is lighter than a certain value:

R(t)(x, y) = I(t)(x, y)   if I(t)(x, y) ≥ background_threshold

With this additional updating technique the lighter areas are updated immediately, removing the "ghost" of the previous hand. The proponents used a fixed background threshold and set it to 200. With a white projection the intensity is typically 240-255 in a brightly lit room and 230-240 with the projection as the main light source (room lights off). The proponents chose 200 since it allows more light-colored objects to be part of the background. A light-skinned person has a pixel intensity of around 160 and a dark-skinned person around 130. The system assumes that pixels with intensity greater than or equal to 200 are part of the background and must be updated instantaneously.
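A minimal sketch of this override, assuming OpenCV 1.x primitives (the image names are hypothetical; mask and ref8u are pre-allocated 8-bit images, ref8u holding the reference): a mask of bright pixels is built and those pixels are copied straight into the reference, bypassing the running average.

cvThreshold(gray, mask, 199, 255, CV_THRESH_BINARY);  /* mask = pixels with I >= 200 */
cvCopy(gray, ref8u, mask);                            /* R(x,y) = I(x,y) where mask is set */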

4.2.5 Image Thresholding

Thresholding simplifies the image for easier processing in the later stages. It reduces the 256-level grayscale image to a two-level binary image. The proponents found that a threshold of 15-20% of the maximum level properly segments the hand. The acceptable threshold values were established while operating the projector in typical room (fluorescent) light. The experiments in chapter 5 detail the effects of thresholding (see 5-1). The following figure illustrates the effect of different thresholds on the differenced image.

Figure 4-7. Thresholding Results. (Left) 10%, (Center) 35%, (Right) 50%
All pixels equal to or exceeding the threshold are set to 255, while those below are set to zero. This gives a binary image containing only the non-background objects.
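A minimal sketch of the differencing and thresholding steps, assuming 8-bit images and the OpenCV 1.x C API (names hypothetical; ref8u is the reference converted back to 8-bit); 15% of 255 is roughly 38:

cvAbsDiff(gray, ref8u, diff);                          /* |I - R| */
cvThreshold(diff, binary, 38, 255, CV_THRESH_BINARY);  /* two-level image */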

4.2.6 Size Filter

As discussed earlier, noise introduced by sudden changes in illumination can be removed using a size filter. The size filter removes blobs whose areas are smaller than a set value; the size threshold is currently set to 190 pixels of area. For perspective, the hand covers around 350 pixels of area, but it is possible that only the fingers are exposed, as when operating near the edges of the projection screen. A 190-pixel threshold is used to allow for these types of situations. Additionally, holes found within the blobs are filled. The size filter thus removes small noise and further simplifies the image with its filling function. The size filter in use is an OpenCV port of the Matlab function bwareaopen, available in the OpenCV forums [27].

The size filter is implemented from the contour finding and area calculation functions of OpenCV. The function cvFindContours is used to find all contours in the image. Then cvContourArea is used to get the area of each contour; if it is found to be smaller than the threshold, that particular contour is redrawn with cvDrawContours set to black, removing it from the image. Holes within the remaining blobs are filled by redrawing them with cvDrawContours set to white.
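A minimal sketch of this filter, assuming the OpenCV 1.x C API (the image name and the retrieval mode are assumptions; cvFindContours modifies its input, which is acceptable here since the image is redrawn anyway):

CvMemStorage *storage = cvCreateMemStorage(0);
CvSeq *contours = 0;
cvFindContours(binary, storage, &contours, sizeof(CvContour),
               CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));
for (CvSeq *c = contours; c != 0; c = c->h_next) {
    double area = fabs(cvContourArea(c, CV_WHOLE_SEQ));
    /* blobs under 190 px are painted black; kept blobs are drawn filled,
       which also paints over any holes inside them */
    CvScalar color = (area < 190) ? cvScalarAll(0) : cvScalarAll(255);
    cvDrawContours(binary, c, color, color, 0, CV_FILLED, 8, cvPoint(0, 0));
}
cvReleaseMemStorage(&storage);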

4.3 Fingertip Detection

4.3.1 Algorithm Details

A simple fingertip finding algorithm was introduced by Hardenbergh and Berard [5] in their paper Bare Hand Computer Interaction, which uses as its basis a simple searching square and an inscribed circle.

Figure 4-8. (left) Searching model, (right) on a fingertip

Looking at the search square, for a fingertip the inner circle should be composed of filled pixels. Outside the circle, the search square contains two chains: a long chain of unfilled pixels and a shorter chain of filled pixels. The search algorithm is based on these features. Hardenbergh and Berard defined the parameters used by the search algorithm as follows:

Diameter of finger – In their work this value was found to lie between 5 and 10 pixels. In our implementation, however, the value can be pushed to 2 pixels at the extreme range of the camera. Section 5.5 of chapter 5 contains experiments that define the usable finger diameter versus distance. The diameter is fairly invariant since the distance between the projection and the camera is fixed, and since the user's fingers are expected to be close to the projected surface, which more or less limits the changes in diameter.

Search square diameter – this should be at least two pixels wider than the

finger diameter.

Minimum number of filled pixels along the search square – the minimum number of pixels that can be considered a finger. Hardenbergh defined this as the finger diameter, which is obvious from the figure.

Maximum number of filled pixels along the search square – the maximum number of pixels that can be considered a finger. Hardenbergh defines this as twice the finger diameter.

The minimum and maximum numbers define the range of object sizes, around the size of a finger, that the algorithm should identify. Chapter 5, section 5.6 investigates different values for these parameters.

The algorithm is illustrated by the flow chart on the following page. The flow chart is written in an eliminative manner: the point that is not eliminated by any of the tests is memorized as the finger location. A yes in a decision box means the point is eliminated.

Figure 4-9. Fingertip Finding Algorithm
The searching square scans the region of interest and subjects each search region to the tests in the flowchart. The region of interest (ROI) is defined as the pixel areas that are filled (white) as a result of the thresholding operation and that lie within the projection. Pixel counts within the circle and along the square are performed. Each decision box is a test of whether the region is indeed a fingertip. These tests are outlined below:

Decision Box 1. Is the number of filled pixels inside the circle < the expected circle area?

There has to be a sufficient number of filled pixels in the close neighborhood of the position (x, y), representing a circle. In the model, the inscribed circle should be filled. If the count is less than expected, the point is eliminated.

Decision Boxes 2 and 3. Is the number of filled pixels < the maximum? Is the number of filled pixels > the minimum?

There has to be the right number of filled pixels, or chain, along the described square around (x, y). In the model, the filled chain (the part of the square coinciding with filled pixels) should have a length of at least the finger diameter. It is allowed to exceed the diameter within tolerances limited by the maximum pixel length. If the count is not within the minimum and maximum values, the point is eliminated.

Decision Box 4. Is the number of connected filled pixels < the number of total filled pixels?

The filled pixels along the square have to be connected in one chain. The number of connected filled pixels is the length of the longest continuous chain, compared against the total filled pixels, which is the sum of all chains (broken or continuous). If the connected count is less than the total count, the point is eliminated.

If the particular point in query matches the description of the model as implemented through these tests, that point is labeled as a fingertip location. The following figure illustrates possible scenarios of the fingertip finding process.

Figure 4-10. A hypothetical sample of the fingertip finding process

It must be noted that the algorithm scans the entire ROI, which is the entire area where there are white pixels; the sample above shows only particular cases to exemplify the algorithm at work. Looking at the rejected points, the rightmost reject fails at decision box 1 of the flow chart: its circle is not filled, so there is no need to run the square test. The center reject has its circle filled but fails decision box 2, having more than the maximum filled pixels; the same analysis applies to the thumb. The small-finger reject fails the test of decision box 4: its chain is broken, as indicated by the two red lines.

The next section discusses the programming considerations of the group's implementation of the algorithm. As expected, tolerances had to be introduced, allowing for error margins that make the algorithm perform better in real-world conditions.

4.3.2 Implementation Details

4.3.2.1 Image Access

The group's implementation of the above algorithm required direct pixel access in OpenCV. Accessing individual pixels is slow and processing-intensive compared to calling built-in OpenCV functions, which are optimized by Intel to run comfortably in real time; this is where the bulk of the processing time goes.

Two methods were used in the program, both prominent in introductory tutorials [28]. One is the direct access method and the other is the use of a C++ typedef wrapper. The direct access method is the fastest but is error prone. To access a pixel with this method the following call is performed:

Pixel_val = ((uchar *)(img->imageData + y*img->widthStep))[x];

Pixel_val is an unsigned char from 0 to 255, read directly from the datatype IplImage (derived from the Intel Image Processing Library), whose label is img. The variables x and y are the x and y components of the point in query.

The other method used is a C++ wrapper, which is basically a redefinition of the above call into variables that are more accessible. This is less efficient than the direct access method, but it allows for easier calling and offers better readability. The redefinitions are as follows:

template<class T> class Image
{
private:
    IplImage* imgp;
public:
    Image(IplImage* img=0) {imgp=img;}
    ~Image(){imgp=0;}
    void operator=(IplImage* img) {imgp=img;}
    inline T* operator[](const int rowIndx) {
        return ((T *)(imgp->imageData + rowIndx*imgp->widthStep));}
};

typedef Image<unsigned char> BwImage;

Its usage is exemplified below:

IplImage* img;     // Given an image of type IplImage* named img
BwImage imgA(img); // Define wrapper imgA on img
imgA[y][x] = 111;  // Access pixel (write 111)

With the wrapper, access is simplified to the call imgA[y][x], where, again, x and y are the coordinates of the point of interest. The direct access method is used in simpler parts of our code, where reading from binary images is performed. The wrapper method, with its simplicity and readability, is used in the more intricate parts of our code.

4.3.2.2 Region of Interest

Defining a region of interest for the searching algorithm reduces pixel accesses and tests, thus decreasing computational load. The proponents found that the projection is always smaller than the input image and is typically trapezoidal in shape. The region of interest is defined by that trapezoid, using calculations from the calibration block discussed in section 4.6. Furthermore, the input to the fingertip finding block has already been processed with the image processing techniques above, particularly thresholding. This allows us to further reduce the ROI within the trapezoid by considering only pixels that are white (255).

The ROI is implemented in code by setting the limits of the nested for loops to the lines that compose the trapezoid. A test of whether a particular point is empty is done first; if it is empty, the algorithm skips it and goes to the next point, as in the sketch below.
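A minimal sketch of this scan, assuming the trapezoid's slanted left and right edges are stored as line equations y = mx + b with nonzero slope (all names here are hypothetical):

for (int y = y_top; y < y_bottom; y++) {
    int x_left  = (int)((y - b_left)  / m_left);   /* left edge at this row */
    int x_right = (int)((y - b_right) / m_right);  /* right edge at this row */
    for (int x = x_left; x < x_right; x++) {
        uchar v = ((uchar *)(thresholded_filt->imageData
                             + y*thresholded_filt->widthStep))[x];
        if (v == 0) continue;   /* empty pixel: skip the fingertip tests */
        /* ... run the circle and square tests on (x, y) ... */
    }
}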

4.3.2.3 Inscribed Circle Test

Given the finger diameter, the finger radius, coded as int frad, can be derived, and the expected circle area can be calculated as πr². This theoretical area is compared against what the program actually finds. With the way the program is coded, the finger diameter must be an even number so that the resulting radius is an integer, since the radius is used as a FOR-loop bound. To count the pixels in the circle defined by the radius frad, the program uses the following loop:

for ( c_i = i - frad; c_i < i + frad; c_i++ )
{
    for ( c_j = j - frad; c_j < j + frad; c_j++ )
    {
        scan_area_check = (i - c_i)*(i - c_i) + (j - c_j)*(j - c_j);
        if (scan_area_check > frad*frad) continue;  // outside the inscribed circle

        if ( ((uchar *)(thresholded_filt->imageData
                        + c_j*thresholded_filt->widthStep))[c_i] != 0 )
        { disc_pixel_count++; }                     // filled pixel inside the circle
    }
}

Given a point pt(i, j), two loops with limits based on the finger radius are run: the x and y component loops have lower limits of −frad and upper limits of +frad about the point. The two FOR loops alone would scan a rectangular area, so an additional step makes sure that only a circular region is scanned, skipping the areas that are not part of the circle. The variable scan_area_check, together with the if statement, checks whether a particular point is within the radius of the circle. This check is motivated by the equation of the circle, r² = x² + y²: if the sum of (x component)² and (y component)² exceeds (finger radius)², the point is outside the circle and is not processed. This method may seem indirect but is preferred over a direct approach where the FOR-loop limits are defined by solving the circle equation, since that approach involves calculating the square root of a sum of squares, which was found to be very slow and unsuitable for real-time operation. Squares are calculated in the program as a value multiplied by itself instead of using the power functions of the math.h C library.

For points inside the circle with a value not equal to 0 (meaning filled, since the image is binary), an accumulator variable called disc_pixel_count is incremented. At the end of the nested x and y loops, disc_pixel_count holds the number of pixels inside the circle.

This count is then compared to the expected area: if disc_pixel_count ≥ expected area − error margin, the circle is considered filled. An error margin must be introduced since the expected area is a continuous value being compared to a discrete pixel count. The following table illustrates the difference between the theoretical and actual areas of circles of different radii, calculated with a program using the algorithm just discussed.

Radius    Theoretical (continuous)    Actual (discrete count)
5         78.54                       79
4         50.27                       47
3         28.27                       27
2         12.57                       11
1         3.14                        3

Table 4-2. Comparison of theoretical and actual circle areas

It can be seen that the discrepancy is typically 1 and at most 3. The error margin for the circle test is currently set to 3 pixels, a safe number that allows increased tolerance in the search criteria.

4.3.2.4 Square Tests

For points that pass the previous criteria, the square-based tests are then performed. The following diagram illustrates how the test is performed:

Figure 4-11. Square Test Implementation

Given the point Pt(x, y), the square corners are calculated with reference to the square dimensions specified earlier in the algorithm explanation. For the program, the proponents use a square dimension 4-6 pixels wider than the fingertip diameter. The variable square_dist is half the square width, so the square corners are the points (x − square_dist, y − square_dist), (x + square_dist, y − square_dist), (x − square_dist, y + square_dist), and (x + square_dist, y + square_dist). In the code, four loops are created, going in each of the numbered directions to scan the square, as in the sketch below.
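A minimal sketch of this perimeter scan (all names hypothetical; chain runs that wrap around the starting corner are not handled here):

int filled = 0, run = 0, longest = 0;
int px = x - square_dist, py = y - square_dist;       /* start at a corner */
int dx[4] = { 1, 0, -1, 0 }, dy[4] = { 0, 1, 0, -1 }; /* the four sides */
for (int side = 0; side < 4; side++)
    for (int step = 0; step < 2*square_dist; step++) {
        uchar v = ((uchar *)(thresholded_filt->imageData
                             + py*thresholded_filt->widthStep))[px];
        if (v != 0) { filled++; run++; if (run > longest) longest = run; }
        else        { run = 0; }
        px += dx[side]; py += dy[side];
    }
/* criteria: min <= filled <= max, and longest >= filled - error margin */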

During the scan, the filled (white) pixels that the walk passes through are counted, along with the length of the longest chain (run of connected filled pixels) achieved. If the filled pixel count is less than the minimum or exceeds the maximum, the point does not satisfy the criteria and is not a fingertip location. An additional test makes sure the filled pixels are connected along one chain: if the longest connected run < filled pixel count − error margin, the pixels are not connected in a single long chain and the point does not satisfy the criteria. Again, an error margin is introduced to allow tolerances in the criteria. The algorithm at work in our program is illustrated below:

Figure 4-12. Fingertip Finding Results

All fingers have completely filled inscribed circles. The unfilled chains of the squares are marked in blue while the filled chains are marked in red. The filled chains satisfy the criterion of being within the minimum and maximum lengths. The error margin allows tolerances, exemplified by the middle finger and the pinky.

4.4 Clustering/Grouping
The positives found by the finger finding algorithm are often located on adjacent pixels; that is, a fingertip can have, around its center, adjacent pixels that are also flagged as fingertips. This is due to the tolerances allowed by the algorithm: a single fingertip location can produce two or more matches.

A clustering or sorting algorithm is implemented to determine which matches belong to a particular finger. The algorithm is based on a minimum distance between found positives for one to be marked as a new fingertip. If the distance between two matches is less than the set distance, they are on the same finger; otherwise, they belong to different fingers.

Figure 4-13. Clustering / Grouping Algorithm Flowchart

The flowchart above illustrates the clustering/grouping algorithm. The raw matches provided by the fingertip finding module are stored in an array. A new array is created whose contents are the distances of the points with respect to the first point in the raw matches array. A check then separates those within the minimum distance from those exceeding it. All raw matches with distances less than the minimum are merged into a single fingertip location, with the first match as the reference. All raw matches that exceed the minimum distance are stored in a buffer, which later becomes the new raw matches array. This continues until all raw matches are assigned to a finger location.
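A minimal sketch of this grouping pass (names hypothetical; raw is a std::vector of the raw matches, min_dist the set minimum distance), using the same reference-and-remainder scheme as the flowchart:

std::vector<CvPoint> fingers;
while (!raw.empty()) {
    CvPoint ref = raw[0];                  /* first match is the reference */
    std::vector<CvPoint> rest;
    for (size_t k = 1; k < raw.size(); k++) {
        int dx = raw[k].x - ref.x, dy = raw[k].y - ref.y;
        if (dx*dx + dy*dy >= min_dist*min_dist)
            rest.push_back(raw[k]);        /* too far: belongs to another finger */
    }
    fingers.push_back(ref);                /* ref represents this finger */
    raw.swap(rest);                        /* re-cluster the remainder */
}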

4.5 Triggering / Rules Check


The group decided to use a delay-based approach to signal a mouse clicking event: the user simply holds his or her hand over the area of interest to signify a click. A problem occurs in triggering because a fingertip cannot settle on an exact pixel location over several frames; there is a tendency for the fingertip to be detected at adjacent pixels. This is a limitation of the accuracy of the fingertip finding process. To solve this, the proponents set a margin of error so that the system can still trigger even if the located fingertip moves slightly, whether from program instability or from the user. The delay is counted as the number of frames during which the fingertip stays in place. The group originally targeted a 0.5s to 1s delay to click, which translates to 7 to 14 frames (using the average processing time gathered from experimentation; see 5.9.2) during which the fingertip must be detected at the same location. However, user feedback showed that users want the minimum possible delay in clicking: the moment the finger rests, a click should be sent. The fastest frame-count threshold the group could use is 3, which is the fastest triggering speed without accidental clicks. Ideally 3 frames trigger in far less than half a second, but instabilities may cause the 3 consecutive frames to take longer to accumulate. Average values for the clicking delay relative to the frame threshold are gathered in the experiments in section 5.8 of chapter 5.
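A minimal sketch of the delay trigger (all names hypothetical; tip is the single detected fingertip for this frame): a click fires once the fingertip stays within the pixel margin for click_frames consecutive frames.

static CvPoint last = cvPoint(-1, -1);
static int still_frames = 0;
if (abs(tip.x - last.x) <= margin && abs(tip.y - last.y) <= margin) {
    if (++still_frames >= click_frames) {  /* e.g. click_frames = 3 */
        MouseMove(xs, ys);                 /* calibrated coordinates, see 4.6-4.7 */
        LeftClick();
        still_frames = 0;
        wait_counter = wait_update_time;   /* let the reference re-settle */
    }
} else {
    still_frames = 0;                      /* finger moved: restart the count */
}
last = tip;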

[Flowchart: after preliminary image processing, the system either waits (incrementing a background-update counter) or runs the fingertip finding process. If exactly one fingertip is found and its location matches the previous frame's, a counter is incremented; when the counter reaches the defined click speed, the mouse is moved to the finger location, a click is triggered, and the wait update time is set so the background can absorb the new content. Any failed check resets the counter.]

Figure 4-14. Rules Check and Triggering Algorithm Flowchart

The above diagram shows the flow of the mouse clicking algorithm; the algorithm itself is in the shaded zone, in contrast to the other modules of the program. The system will not trigger if the preceding fingertip finding process locates more than one finger; in that case the mouse clicking algorithm does nothing and control continues. A problem after triggering occurs when the mouse clicks and the content, and hence the background, changes. Because it takes time for the background reference to settle, false detections can occur from the arbitrary shapes produced in the difference layer as the reference updates. The proponents therefore decided to temporarily disable the fingertip search algorithm after a click is sent, giving the background reference time to update to the new content. After a fixed number of frames the fingertip finding algorithm reactivates.

4.6 Calibration
Since the camera does not share the same optical path as the projection, distortions are introduced into the video. Most notable is the trapezoidal effect of the off-center projection.

Figure 4-15. Camera Content Visualization

The outer rectangle is what is seen by the camera. The trapezoid drawn with the solid line is the projection visible from the camera. The inner rectangle is the effective screen that gets transformed into Windows coordinates.

As part of the calibration process, the trapezoid is defined by hand: the user is asked to click on the upper left, lower left, upper right, and lower right corners of the visible projection. In the case of uneven Y values for either the upper or lower sections (i.e., different y values for the upper left and upper right corners), the program overwrites the values, prioritizing the inner coordinates. Afterwards, the calibration calculations performed by the program are a two-part process: one part compensates for the trapezoid and the other transforms the coordinates from OpenCV into what Windows can take. The steps are outlined below:

1. Get the finger location (OpenCV coordinates).

2. Place the location relative to the trapezoid.

3. Scale that location down to the relative rectangular region.

4. Transform the scaled location to Windows-based coordinates.

Figure 4-16. Calibration Considerations
OpenCV takes as its origin the lower left corner of the video. The location of the found finger is in OpenCV coordinates and is compared relative to the trapezoidal projection. Knowing the corners of the trapezoid, the line equations representing its slanted sides can be defined: the upper-left and lower-left corners define the line labeled xlower, while the upper-right and lower-right corners define the xupper line. These can be put into the line-equation form y = mx + b. Given the current coordinates (Xi, Yi), the corresponding x on the xlower line is calculated. The expression Xi − Xlower gives the distance of the found point relative to the trapezoidal projection. The next step is to scale that distance to fit in the rectangle defined by the dotted red lines, as the equation below illustrates:

Xscaled = (Xi − Xlower) × (Upperright.X − Upperleft.X) / (Xupper − Xlower)

Where: Xi, Yi is the point being transformed
Xlower is the x value of the left line given Yi
Xupper is the x value of the right line given Yi
Upperright.X is the x component of the upper right corner coordinate
Upperleft.X is the x component of the upper left corner coordinate

The Y values are more easily transformed since uneven Y values have already been overwritten to be level. A major consideration for the transformation of the Y component is that Windows uses a top-left origin instead of OpenCV's bottom-left origin: the y component grows from the top in Windows and from the bottom in OpenCV. Again, the OpenCV coordinates are referred against the projection; conforming to the Windows origin, they are compared to the top line of the visible projection. The equation below is the implemented transformation:

Yscaled = Yupper − Yi

Where: Yi is the y component of the point being transformed
Yupper is the lower (inner) Y value between the upper left and upper right corners

X and Y coordinates that have been scaled are relative to the red rectangle inside the trapezoid. They must then be placed on the Windows screen, which requires parameters such as the effective height and width. The effective height is simply the height of the rectangle, or quantitatively Yupper − Ylower. The effective width is the width of the red rectangle, or quantitatively Upperright.X − Upperleft.X. Because of calibration, the resolution offered by our system is less than 320x240 and is defined by the effective screen height and width.
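A minimal sketch of the two-part transform (all names hypothetical), following steps 1-4 above: offset against the trapezoid's left edge, scale into the effective rectangle, then flip Y for Windows' top-left origin.

double x_left  = (yi - b_left)  / m_left;   /* xlower at row yi, from y = mx + b */
double x_right = (yi - b_right) / m_right;  /* xupper at row yi */
double x_scaled = (xi - x_left) * (upperright_x - upperleft_x)
                  / (x_right - x_left);
double y_scaled = y_upper - yi;             /* Windows y grows downward */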

4.7 Windows API

The proponents used WinAPI, the Windows Application Programming Interface that is part of the Platform SDK, to generate the functions for moving the mouse pointer and simulating the left-click event. It serves as the driver that communicates with the system inputs. The SendInput function, initialized as a mouse input, is used both to control the mouse pointer and to simulate a left click. The arguments for moving the mouse are the calibrated x and y coordinates that the calibration step provides. Snippets of the functions used are explained below:

void MouseMove (int x, int y)
{
    double fx = x*(65535.0f/fScreenWidth);
    double fy = y*(65535.0f/fScreenHeight);
    INPUT Input;
    Input.type = INPUT_MOUSE;
    Input.mi.dwFlags = MOUSEEVENTF_MOVE|MOUSEEVENTF_ABSOLUTE;
    Input.mi.dx = fx;
    Input.mi.dy = fy;
    SendInput(1,&Input,sizeof(INPUT));
}

void LeftClick ()
{
    INPUT Input;
    Input.type = INPUT_MOUSE;
    Input.mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
    SendInput(1,&Input,sizeof(INPUT));

    INPUT Input1;
    Input1.type = INPUT_MOUSE;
    Input1.mi.dwFlags = MOUSEEVENTF_LEFTUP;
    SendInput(1,&Input1,sizeof(INPUT));
}

INPUT Input declares the variable Input as an INPUT structure. Input.type specifies that the type of input event is a mouse input. Input.mi.dwFlags sets the bit flag to move; MOUSEEVENTF_ABSOLUTE is also set to specify that the dx and dy values are normalized absolute coordinates. The values fx and fy carry the scaling factors 65535/fScreenWidth and 65535/fScreenHeight because the MOUSEEVENTF_ABSOLUTE flag makes dx and dy contain normalized absolute coordinates between 0 and 65,535. The values fScreenWidth and fScreenHeight come from the effective resolution calculated by the calibration module of the program, which is less than 320x240. The SendInput function synthesizes keystrokes, mouse motions, and button clicks; in this case, it synthesizes mouse motions. The first parameter of the function is nInputs, the number of structures in the pInputs array. Next is pInputs, a pointer to an array of INPUT structures. The last is cbSize, which specifies the size in bytes of an INPUT structure.

The flag MOUSEEVENTF_LEFTDOWN specifies that the left button was pressed, and MOUSEEVENTF_LEFTUP that it was released, so simulating a left click is simply a matter of setting these flags and using SendInput to synthesize the button events, as in the code shown above.

4.8 GUI Design


Certain limitations on the GUI have to be considered due to the nature of the vision system being applied. The use of a background subtraction technique means there should be sufficient contrast (difference in intensity) between the hand and the general background; this restricts us to colors that are very unlike skin color. Additionally, the GUI must be as static as possible to prevent animations from being falsely detected as fingers. This may affect the aesthetic value of the GUI, so there is a tradeoff between stability and presentability; which takes priority is up to the designer and depends on the application. Because the projector has a native resolution of 800x600 and the program has a resolution of less than 320x240, a loss of fine mouse control must be considered; the GUI should therefore use buttons as large as aesthetics allow. Sudden losses and reappearances of tracking can be distracting to the user, as they cause the cursor to jump. Making the cursor invisible minimizes this trouble and does not, in general, affect the application.

Figure 4-17. Main Menu Frame

The image above shows that light colors were applied to the building blocks; as said earlier, sufficient contrast is needed for the background subtraction to be effective. Second, GUI elements at the extreme edges were avoided to prevent the fingertip from being cut off by the size filter. Lastly, certain buildings were made larger (not to the same scale) in order to minimize issues with the program's resolution.

Figure 4-18. Building Menu Frame

In this figure, it can be observed that the parameters mentioned were strictly applied. First, the color selection was followed: variations of light green were chosen for the buttons' background and text colors. The buttons were made noticeably large. The presentation was made predominantly static so that the probability of false detections is minimized; some animations are used in transitions (between content), during which fingertip detection is not necessary.
