Ch4 FINAL
DESIGN CONSIDERATIONS

[System block diagram. Computer Vision: Image Acquisition → Preliminary Image Processing → Fingertip Finding → Finger Rules / Clustering → Triggering / Checker. Application: Calibration → Mouse Pointer → Mouse Click → GUI]
capable of capturing 640x480 video at 30 frames per second (fps). For the program, however, we limit the frame size to 320x240, and it is not operated at a constant fps: the program grabs a new frame upon completion of its processing cycle.
A problem encountered with the camera is its automatic gain control. The gain control is not suitable for our purposes since the camera invariably changes overall exposure levels and gains, making the background models unstable. At other times, certain placements relative to the projector make the camera run at very high exposure levels, giving an overbright video with almost no detail in the acquired image.
Figure 4-2. Resultant image from different exposures: 1/20s (upper left), 1/40s
It was not possible to disable this camera function from within our program. As a workaround, the proponents manually adjust the driver settings through the camera's bundled software to obtain a stable exposure and contrast.
Because the webcam has a fixed lens and lacks optical zoom, the camera placement determines the image detail. Setting the camera farther away gives poorer-quality images and makes detection less reliable.
input image, readying it for the fingertip finding process. The flow diagram is shown in Figure 4-3.

[Image Differencing → Thresholding → Size Filter → Fingertip Detection]

Figure 4-3. Preliminary Image Processing Block
background subtraction.
this value to create a shade of gray that matches the desired brightness.
following formula:
Image differencing is used to determine the change between the current frame and the reference background. Slow or overlapping motions can cause hollow areas to appear, due to the hand having a generally uniform color. A drawback of this method is that it is not suitable for changing backgrounds.
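As a minimal sketch of the differencing step, assuming the frames are plain 8-bit grayscale buffers (the function name and buffer layout are illustrative, not the project's actual code):

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Per-pixel absolute difference between the current frame and the
// reference background; both are 8-bit grayscale buffers of equal size.
std::vector<unsigned char> absDifference(const std::vector<unsigned char>& frame,
                                         const std::vector<unsigned char>& reference)
{
    std::vector<unsigned char> diff(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i)
        diff[i] = (unsigned char)std::abs((int)frame[i] - (int)reference[i]);
    return diff;
}
```

Pixels where the hand differs strongly from the reference survive the later thresholding step; pixels where the hand happens to match the background do not, which is the source of the hollow areas mentioned above.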
4.2.3 Adaptive Reference
The running-average reference can be represented as:

R(t) = (1 − α)·R(t−1) + α·I(t)

where R(t) is the new reference at time t, I(t) is the current input frame, and α is the update weight with respect to the current frame. The higher the α, the higher the weight of the current image and hence the faster the reference updates. The value of α is chosen from the observation that the hand does not rest for too long in a single area. On the 1.7 GHz laptop, our application uses α in the range of 0.01 to 0.02.
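The update rule can be sketched as follows. The reference is kept in float so that small α values are not floored away (the names are illustrative, not the project's code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One running-average step over a whole frame:
// R(t) = (1 - alpha) * R(t-1) + alpha * I(t)
void updateReference(std::vector<float>& reference,
                     const std::vector<unsigned char>& frame,
                     float alpha)
{
    for (std::size_t i = 0; i < reference.size(); ++i)
        reference[i] = (1.0f - alpha) * reference[i] + alpha * frame[i];
}

// Helper: the reference value of one pixel after `steps` updates with a
// constant input, useful for estimating absorption times like Table 4-1.
float referenceAfter(float r0, unsigned char pixel, float alpha, int steps)
{
    float r = r0;
    for (int k = 0; k < steps; ++k)
        r = (1.0f - alpha) * r + alpha * pixel;
    return r;
}
```

If the reference were stored as 8-bit integers instead, an update smaller than one gray level would be truncated to zero; that is the flooring limit shown in the last row of Table 4-1.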
α (alpha)   Time to absorb as reference (s)
0.05        2.15
0.04        3.11
0.03        3.82
0.02        5.85
0.01        10.68
0.005       Beyond limit of algorithm; does not update

Table 4-1. Alpha and resulting time values
There is a lower limit where the weight is too small and the resulting calculation is floored down, hence no update happens. This is evident in the last entry of the table. The figures below show the effect of the running average on the reference image.
Figure 4-4. At alpha = 0.05. Timestamps: 0.83s and 2.04s
Figure 4-5. At alpha = 0.01. Timestamps: 3.3s, 7s, and 10.5s
It can be seen that as time passes, the hand is slowly getting absorbed into the background. The adaptive reference, however, has problems that cause undesirable situations. The problem arises when the hand rests for too long and gets absorbed into the background. Once the hand moves again, the hand's original position does not disappear immediately.
Figure 4-6. Ghosting
The ghost results from the previous hand position slowly being replaced by the now-visible background. With the effect of ghosting, false fingertip detections become possible. The countermeasure exploits the fact that the projection is always of high intensity and that the background is lighter than the foreground (i.e., the user): for such bright pixels the reference image is updated at a faster rate. With this additional updating technique the lighter areas get updated quickly and the ghost fades.
To decide which pixels count as background, the proponents used a fixed background threshold and set it to 200. With a white projection the intensity is typically 240-255 for a brightly lit room and 230-240 with the projection as the main light source (room lights off). The proponents chose 200 since it allows more light-colored objects to be segmented correctly; a light-skinned person registers at around 160 and a dark-skinned person at around 130. The system will assume pixels with intensity greater than or equal to 200 are part of the background.
Thresholding simplifies the image for the latter stages. It reduces the 256-level grayscale image to a two-level, binary image. Choosing a value below the projection's max level gives the right amount of threshold to properly segment the image.
All pixels equal to or exceeding the threshold are set to 255, while those below are set to zero. This gives a binary image containing only the non-background objects.
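The operation can be sketched as (binarize is an illustrative name):

```cpp
#include <cassert>
#include <vector>

// Binary thresholding: pixels at or above the threshold become 255
// (foreground candidate), everything else becomes 0.
std::vector<unsigned char> binarize(const std::vector<unsigned char>& img,
                                    unsigned char threshold)
{
    std::vector<unsigned char> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i)
        out[i] = (img[i] >= threshold) ? (unsigned char)255 : (unsigned char)0;
    return out;
}
```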
Spurious blobs caused by noise and small changes in illumination can be removed using a size filter. The size filter removes blobs whose areas are smaller than a set value; the size threshold is currently set to 190 pixels (area). For perspective, the hand has around 350 pixels of area; however, it is possible that only the fingers are exposed, so the threshold is kept below that. Holes found within the blobs are filled. The size filter thus removes small noise and further simplifies the image with its filling function. The OpenCV function cvFindContours is used to find all contours in the image. Then cvContourArea is used to get the areas of the contours, and any contour found to be smaller than the threshold is painted over, removing it from the image. If there are holes within the blobs, they are filled as well.
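A sketch of the size filter as a flood-fill connected-components pass over the binary image; the actual program uses the OpenCV contour functions, and the hole-filling step is omitted here for brevity:

```cpp
#include <cassert>
#include <vector>

// Size filter on a binary image stored row-major as 0/255 bytes:
// blobs (4-connected components of 255s) with area below minArea are erased.
void sizeFilter(std::vector<unsigned char>& img, int width, int height, int minArea)
{
    std::vector<char> visited(img.size(), 0);
    std::vector<int> stack, blob;
    for (int start = 0; start < (int)img.size(); ++start) {
        if (img[start] != 255 || visited[start]) continue;
        blob.clear();                       // flood fill to collect one blob
        stack.assign(1, start);
        visited[start] = 1;
        while (!stack.empty()) {
            int p = stack.back(); stack.pop_back();
            blob.push_back(p);
            int x = p % width, y = p / width;
            const int nx[4] = { x - 1, x + 1, x, x };
            const int ny[4] = { y, y, y - 1, y + 1 };
            for (int k = 0; k < 4; ++k) {
                if (nx[k] < 0 || nx[k] >= width || ny[k] < 0 || ny[k] >= height)
                    continue;
                int q = ny[k] * width + nx[k];
                if (img[q] == 255 && !visited[q]) { visited[q] = 1; stack.push_back(q); }
            }
        }
        if ((int)blob.size() < minArea)     // blob too small: treat as noise
            for (std::size_t k = 0; k < blob.size(); ++k) img[blob[k]] = 0;
    }
}

// Demonstration on a tiny frame: how many white pixels survive filtering.
int survivingPixels(int minArea)
{
    // 4x3 test frame: a 4-pixel blob on the left, an isolated pixel on the right
    const unsigned char data[12] = {
        255, 255, 0,   0,
        255, 255, 0, 255,
          0,   0, 0,   0 };
    std::vector<unsigned char> img(data, data + 12);
    sizeFilter(img, 4, 3, minArea);
    int count = 0;
    for (int i = 0; i < 12; ++i)
        if (img[i] == 255) ++count;
    return count;
}
```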
The fingertip is modeled as a search square with an inscribed circle. Looking at the search square, for a fingertip the inner circle should be composed of filled pixels. Outside the circle, the search square contains two chains: one a long chain of unfilled pixels and the other a shorter chain of filled pixels. The search algorithm is based on these features.
The search parameters are as follows:

Diameter of finger – in their work it was found that this value lies within around 5 pixels. The diameter is fairly invariant since the distance between the projection and camera is fixed; supposedly the user's fingers should be close to the projection surface.

Search square diameter – this should be at least two pixels wider than the finger diameter.

Minimum and maximum number of filled pixels along the search square – the shortest and longest filled chains accepted along the square.
The minimum and maximum numbers define the range of sizes for which an object queried by the algorithm is accepted as being around the size of a finger. The algorithm is illustrated by the flow chart on the following page. The flow chart is written in an eliminative manner: a point that is not eliminated by any of the tests is memorized as the finger location, while a yes in any decision box eliminates the point under test.
Figure 4-9. Fingertip Finding Algorithm
The searching square scans the region of interest and subjects each search region to the tests in the flowchart. The region of interest (ROI) is defined as the pixel areas that are filled (white) as a result of the thresholding operation and that lie within the projection. Pixel counts within the circle and along the square are performed. Each of the decision boxes is a test of whether the region is indeed a fingertip. These tests are outlined below:

Decision Box 1. Is the number of filled pixels inside the circle < expected circle area? For a fingertip, the inscribed circle should be filled; if the count is less than the expected area, the point is eliminated.
Decision Boxes 2 and 3. There has to be the right number of filled pixels, or chain, along the described square around (x, y). In the model, the filled chain (the part of the square coinciding with filled pixels) should have a length of at least the minimum and is limited by the maximum pixel length. If the results are not within this range, the point is eliminated.

Decision Box 4. Is the number of connected filled pixels < number of total filled pixels? The filled pixels along the square have to be connected in one chain. The longest connected chain is compared against the total filled pixels, which is the sum of all chains. If a point matches the description of the model implemented through the tests, then that point is memorized as a fingertip.
Figure 4-10. A hypothetical sample of the fingertip finding process
It must be noted that the algorithm scans the entire ROI, which is the entire area where there are white pixels; the sample above shows only particular cases to exemplify the algorithm at work. Looking at the rejected points, the rightmost reject fails decision box 1 of the flow chart: its circle is not filled, and there is no need to do the square test. The center reject has its circle filled but fails decision box 2, having more than the maximum pixels required; the same analysis goes for the thumb. The small-finger reject fails the test of decision box 4: its chain is broken. Error margins can be introduced to make the algorithm perform reliably despite such imperfections.
Scanning the ROI involves a very large number of pixel accesses and requires more processing power; this is where the bulk of the processing time is spent. Two methods of pixel access were used in the program: the direct access method and a C++ typedef wrapper. The direct access method is the fastest but is error-prone. To access a pixel in this manner, the image data is indexed directly through the image structure, whose label is img; the variables x and y are the x and y components of the point of interest. The typedef wrapper is a redefinition of the above call into variables that are more accessible. This is less efficient compared to the direct access method, but it allows for simpler code, as follows:
imgA[y][x] = 111; //Access pixel (write as 111)
A pixel is then accessed as [y][x], where, again, [x] and [y] are the x and y coordinates of the point of interest. The direct access method is used in simpler parts of our code, while the wrapper, with its simplicity and readability, is used in more intricate parts of our code.
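The two styles can be illustrated over a minimal stand-in for IplImage (the struct and helpers here are illustrative; the real program indexes img->imageData using img->widthStep):

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for an IplImage-like buffer: rows may be padded, so a
// row is widthStep bytes even if only width pixels are meaningful.
struct Image {
    int width, height, widthStep;
    std::vector<unsigned char> imageData;
    Image(int w, int h, int step)
        : width(w), height(h), widthStep(step), imageData(step * h, 0) {}
};

// Direct access: fast, but the pointer arithmetic is easy to get wrong.
unsigned char& pixelDirect(Image& img, int x, int y)
{
    return img.imageData[y * img.widthStep + x];
}

// Wrapper style: exposes rows so code can simply write rows[y][x].
struct RowView {
    Image& img;
    unsigned char* operator[](int y) { return &img.imageData[y * img.widthStep]; }
};

// Demonstration: a wrapper write is visible through a direct read.
int roundTrip(int x, int y)
{
    Image img(320, 240, 324);   // 4 bytes of row padding, to show widthStep
    RowView rows = { img };
    rows[y][x] = 111;           // the text's example: imgA[y][x] = 111;
    return pixelDirect(img, x, y);
}
```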
Reducing the region of interest reduces the pixel accesses and tests, thus decreasing computational load. The proponents found that the projection always occupies less than the full input image, its extent being known from the calibration described in section 4.6. Furthermore, the input to the fingertip finding block has already been thresholded. This allows us to further reduce the ROI to within the trapezoid by limiting the nested for loops to the lines that compose the trapezoid.
4.3.2.3 Inscribed Circle Test
Given the finger diameter, the finger radius, coded as int frad, can be calculated, and the expected circle area follows as πr². This is the theoretical area against which the count found by the program is compared. With the way the program is coded, the finger diameter must be an even number, owing to the requirement that the radius be an integer for use in the FOR loop. In the program, the pixels in the circle defined by the radius frad around a point pt(i, j) are found with the following loops:
for ( c_i = i - frad ; c_i < i + frad ; c_i++ )
    for ( c_j = j - frad ; c_j < j + frad ; c_j++ )
        /* test pixel (c_i, c_j) here */ ;
Given a point pt(i, j), two loops with limits based on the finger radius are set up; the x and y component loops run from the lower limit −frad to the upper limit +frad about the point. It can be seen that the two FOR loops alone will scan a rectangular area. An additional step makes sure that only a circular region is scanned, skipping areas that are not part of the circle. The variable scan_area_check, together with the if statement, checks whether a particular point is within the radius: if the sum of the squared offsets is greater than the (finger radius)², the point is outside the circle and is therefore not processed. This method may seem indirect but is preferred over a direct approach where the FOR loop limits are defined by solving the circle equation, which requires taking the square root of the sum of the squares. This was found to be very slow and unsuitable for real-time use, owing to the square-root calls into the math library.
For points inside the circle with a value not equal to 0 (meaning filled), a count is accumulated. The table below illustrates the difference between the theoretical and actual areas of circles with different radii; these were calculated with a program using the loops above.

Table 4-2. Comparison of theoretical and actual circle areas

The difference is at most 3, and the error margin for the circle test is currently set to 3 pixels. Points whose filled-pixel count falls short of the expected area minus this margin fail the criteria.
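The comparison of Table 4-2 can be reproduced by a small program that mirrors the FOR-loop limits and the scan_area_check condition described above (function names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Count the lattice points scanned by the circle test: offsets in
// [-frad, frad) with dx^2 + dy^2 <= frad^2, mirroring the FOR-loop
// limits and the scan_area_check condition.
int actualCircleArea(int frad)
{
    int count = 0;
    for (int dy = -frad; dy < frad; ++dy)
        for (int dx = -frad; dx < frad; ++dx)
            if (dx * dx + dy * dy <= frad * frad)
                ++count;
    return count;
}

// Theoretical area pi * r^2 for comparison.
double theoreticalCircleArea(int frad)
{
    return 3.14159265358979 * frad * frad;
}
```

For example, a radius of 5 scans 79 pixels against a theoretical area of about 78.5, well inside the 3-pixel error margin.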
For points that pass the previous criteria, the square-based tests are performed.
Figure 4-11. Square Test Implementation
Given the point Pt(x, y), the square corners are calculated with fixed offsets from the center, as shown in the figure above. For the program, the proponents use a square dimension 4-6 pixels wider than the fingertip diameter. The variable square_dist is half the square width; therefore the square corners are defined by the points (x ± square_dist, y ± square_dist). In the code, four loops are created, going in each of the numbered directions, to scan the square.
During the scan, the filled (white) pixels that it passes through are counted, and the scan records the filled-pixel count it achieved. If the filled pixel count is less than the minimum or exceeds the maximum, then the point does not satisfy the criteria and is eliminated. It is then checked whether the filled pixels are connected along one chain: if the longest connected run of filled pixels < filled pixel count − error margin, then the points are not connected in a long chain and the point does not satisfy the criteria. Again, an error margin is applied. In the figure, the unfilled chains of the squares are marked blue while the filled chains of the squares are marked red. The filled chains satisfy the criteria of being within the minimum and maximum lengths. The error margin allows tolerances, exemplified by the middle finger and the pinky.
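The square-based tests can be sketched over the ring of perimeter pixels collected by the four scan loops; the ring representation, the names, and the circular-wrap handling are assumptions of this sketch:

```cpp
#include <cassert>
#include <vector>

// Given the square's perimeter pixels in scan order (true = filled),
// return the length of the longest connected run, treating the ring as
// circular since the scan's start point is arbitrary.
int longestFilledChain(const std::vector<bool>& ring)
{
    int n = (int)ring.size();
    int best = 0, run = 0;
    for (int i = 0; i < 2 * n && best < n; ++i) {  // doubled pass handles wrap-around
        if (ring[i % n]) { ++run; if (run > best) best = run; }
        else run = 0;
    }
    return best;
}

int filledCount(const std::vector<bool>& ring)
{
    int c = 0;
    for (std::size_t i = 0; i < ring.size(); ++i)
        if (ring[i]) ++c;
    return c;
}

// Decision boxes 2-4 of the flowchart, sketched: the filled pixels along
// the square must lie within [minFill, maxFill] and form one chain
// (longest run >= total filled minus an error margin).
bool passesSquareTest(const std::vector<bool>& ring,
                      int minFill, int maxFill, int errorMargin)
{
    int total = filledCount(ring);
    if (total < minFill || total > maxFill) return false;     // boxes 2 and 3
    return longestFilledChain(ring) >= total - errorMargin;   // box 4
}
```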
4.4 Clustering/Grouping
The positives found by the finger-finding algorithm are often found on adjacent pixels; that is, around a fingertip's center, adjacent pixels can also pass all the tests. Matches are therefore grouped by distance: if the distance between two matches is less than the set distance, they lie on the same finger; if not, they belong to different fingers.
Figure 4-13. Clustering / Grouping Algorithm Flowchart
A new array is created whose contents are the distances of the points with respect to the first point in the raw-matches array. Then a check is made to separate those within the minimum distance from those exceeding it. All raw matches with distances less than the minimum are stored as one fingertip location, with the first match as reference. All raw matches that exceed the minimum distance are stored in a buffer, which later becomes the new raw-matches array. This repeats until no raw matches remain.
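The grouping pass can be sketched as follows (the Point struct and function name are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point { int x, y; };

// Distance-based grouping of raw matches: the first raw match seeds a
// cluster; every match closer than minDist collapses into it, the rest
// become the new raw-match list, and the process repeats until empty.
std::vector<Point> clusterMatches(std::vector<Point> raw, double minDist)
{
    std::vector<Point> fingertips;
    while (!raw.empty()) {
        Point ref = raw[0];            // first match is the cluster reference
        std::vector<Point> remaining;
        for (std::size_t i = 1; i < raw.size(); ++i) {
            double dx = raw[i].x - ref.x, dy = raw[i].y - ref.y;
            if (std::sqrt(dx * dx + dy * dy) >= minDist)
                remaining.push_back(raw[i]);   // different finger: keep for next pass
            // matches closer than minDist are absorbed into ref
        }
        fingertips.push_back(ref);
        raw.swap(remaining);
    }
    return fingertips;
}
```

For example, three raw matches at (10,10), (11,10), and (50,50) collapse into two fingertips when the minimum distance is 5 pixels.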
Holding the fingertip still triggers the clicking event: the user just has to hold his hand in the area of interest. Because there is a tendency for the fingertip to be detected at adjacent pixels, exact coordinate equality across frames cannot be required. To solve this problem the proponents set a margin of error: if the located fingertip moves slightly, due to instability in the program or in the user, the system still considers it stationary, and it counts the frames during which the fingertip stays. The group was originally targeting a 0.5 s to 1 s delay to click, which translates to 7-14 frames (using average processing time gathered from experimentation; please see 5.9.2) during which the fingertip should be detected at the same location. However, user feedback shows that users want the minimum delay in clicking: the moment the finger rests, a click should be sent. The fastest frame-count threshold that the group can use is 3; this is the fastest triggering speed without accidental clicks. Ideally 3 frames trigger in far less than half a second, but instabilities may cause the 3 consecutive frames to take longer to accumulate. Average values for clicking delay relative to the frame count were gathered experimentally.
[Flowchart: after Preliminary Image Processing, if the wait flag is set (Wait = 1), Counter2 is incremented each frame until it equals the wait update time and is then reset; otherwise the Fingertip Finding Process runs. If exactly one fingertip is found and its location matches the previous frame's, a counter is incremented; when the counter equals the defined click speed, the mouse is moved to the finger location and a click is triggered, after which the wait update time is set so the background reference can settle before searching resumes.]
The above diagram shows the flow of the mouse-clicking algorithm. The system will not trigger if the previous process, fingertip finding, locates more than one finger; if more than one finger is detected, the mouse-clicking algorithm does nothing and continues. A problem after triggering occurs when the mouse clicks and the content, and hence the background, changes. Because it takes time before the background reference settles, there is a chance of false detections from the arbitrary shapes produced during the transition. The solution is to temporarily disable the fingertip search algorithm after the click is sent, allowing the background reference time to update to the new content. After a fixed wait period, searching resumes.
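The per-frame triggering logic of the flowchart can be sketched as a small state machine; all names, and the use of a pixel margin for "same location", are illustrative reconstructions rather than the project's code:

```cpp
#include <cassert>
#include <cstdlib>

// Dwell-click state: a click fires after the fingertip is seen at (nearly)
// the same spot for clickSpeed consecutive frames; after a click, detection
// pauses for waitFrames so the background reference can absorb the new
// screen content.
struct ClickTrigger {
    int counter, waitCounter, clickSpeed, waitFrames, margin;
    int lastX, lastY;
    ClickTrigger(int speed, int wait, int m)
        : counter(0), waitCounter(0), clickSpeed(speed),
          waitFrames(wait), margin(m), lastX(-1), lastY(-1) {}

    // Called once per frame; returns true when a click should be sent.
    bool update(int fingertips, int x, int y) {
        if (waitCounter > 0) { --waitCounter; return false; }  // background settling
        if (fingertips != 1) { counter = 0; return false; }    // need exactly one finger
        bool same = std::abs(x - lastX) <= margin && std::abs(y - lastY) <= margin;
        counter = same ? counter + 1 : 1;
        lastX = x; lastY = y;
        if (counter >= clickSpeed) {
            counter = 0;
            waitCounter = waitFrames;   // pause the search after the click
            return true;
        }
        return false;
    }
};

// Helper: feed a fixed fingertip position for n frames and return the
// 1-based frame at which the first click fires, or -1 if none.
int framesUntilClick(int n, int clickSpeed)
{
    ClickTrigger t(clickSpeed, 10, 2);
    for (int f = 1; f <= n; ++f)
        if (t.update(1, 100, 100)) return f;
    return -1;
}
```

With a click speed of 3, a steady fingertip fires on the third consecutive frame, matching the fastest threshold described above.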
4.6 Calibration
Since the camera does not share the same optical path as the projection, distortions are introduced into the video, most particularly a trapezoidal (keystone) distortion of the projected area.
The outer rectangle is what is seen by the camera. The trapezoid drawn by the solid line is the projection visible to the camera. The inner rectangle is the working area to which coordinates are mapped. During calibration the user is asked to click on the upper-left, lower-left, upper-right, and lower-right corners of the visible projection. In the case of uneven Y values for either the upper or lower sections (i.e., different y values for upper left and right), the inner value is used. The calibration serves two purposes: one is to compensate for the trapezoid, and the next is to transform the coordinates from OpenCV into what Windows can take. The steps are outlined below:
OpenCV takes as its origin the lower-left of the video. The location of the found finger will be based on OpenCV coordinates, and that must be transformed. Using the clicked corners of the trapezoid, the line equations representing the slanted sides of the trapezoid can be defined. The points corresponding to the upper- and lower-left corners define the line labeled xlower, while the points corresponding to the upper- and lower-right corners define the upper x line. These can be put into the line-equation form y = mx + b. For a found point (Xi, Yi), the x value on the xlower line at Yi is calculated. The expression Xi − xlower gives the distance of the found point relative to the trapezoidal projection. The next step is to scale that distance to fit the rectangle defined by the dotted red lines.
The Y values are more easily transformed, since uneven Y values are already handled by taking the inner value. The remaining difference is that Windows uses a top-left origin instead of OpenCV's bottom-left: the y component originates at the top for Windows and at the bottom for OpenCV. Again the OpenCV coordinates are referred against the projection; to conform to the Windows origin, the point is compared against the top line of the visible projection. The equation below is the implemented transformation.
Yscaled = Yupper − Yi

where Yi is the y component of the point being transformed, and Yupper is the lower (inner) Y between the upper-left and upper-right corners.
The X and Y coordinates that have been scaled are relative to the red rectangle inside the trapezoid. The point now has to be placed on the Windows screen, which requires parameters such as the effective height and width. The effective resolution of the system is less than 320x240 and is defined by the effective screen height and width.
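The two transformations can be sketched as follows. The linear interpolation along the slanted sides stands in for the y = mx + b line equations described above; the corner representation and names are illustrative:

```cpp
#include <cassert>
#include <algorithm>

struct Corner { double x, y; };

// Keystone-compensating X transform: the left corners define the xlower
// line and the right corners the upper x line; the found point's offset
// from the left line is rescaled to the effective width.
double calibrateX(Corner upLeft, Corner lowLeft, Corner upRight, Corner lowRight,
                  double xi, double yi, double effectiveWidth)
{
    // x on each slanted side at height yi (interpolation between corners)
    double tL = (yi - lowLeft.y) / (upLeft.y - lowLeft.y);
    double xLower = lowLeft.x + tL * (upLeft.x - lowLeft.x);
    double tR = (yi - lowRight.y) / (upRight.y - lowRight.y);
    double xUpper = lowRight.x + tR * (upRight.x - lowRight.x);
    return (xi - xLower) * effectiveWidth / (xUpper - xLower);
}

// Y transform: OpenCV's origin is bottom-left, Windows' is top-left, so
// the point is measured down from the inner (lower) of the upper corners.
double calibrateY(Corner upLeft, Corner upRight, double yi)
{
    double yUpper = std::min(upLeft.y, upRight.y);  // inner Y of the upper corners
    return yUpper - yi;
}
```

For an undistorted (rectangular) projection the transform reduces to a plain shift and scale, which gives an easy sanity check.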
The proponents used the User Interface part of the Platform SDK to generate the functions for moving the mouse pointer and simulating the left-click event. It serves as the driver that injects input into Windows. SendInput, the Platform SDK function for synthesizing mouse input, is used for the control of the mouse pointer and for the simulation of a left click. The arguments for moving the mouse are the calibrated x and y coordinates that the calibration step provides. Snippets of the functions used are explained below:
void LeftClick ()
{
    INPUT Input = {0};                        // zero-initialize the structure
    Input.type = INPUT_MOUSE;                 // mouse input event
    Input.mi.dwFlags = MOUSEEVENTF_LEFTDOWN;  // press the left button
    SendInput(1, &Input, sizeof(INPUT));

    INPUT Input1 = {0};
    Input1.type = INPUT_MOUSE;
    Input1.mi.dwFlags = MOUSEEVENTF_LEFTUP;   // release the left button
    SendInput(1, &Input1, sizeof(INPUT));
}
Input.type = INPUT_MOUSE specifies that the input event is a mouse input, and Input.mi.dwFlags sets the action to perform. For absolute pointer movement, the dx and dy values are normalized absolute coordinates; the screen-width and screen-height values used for the normalization come from the effective resolution dimensions calculated by the calibration step. The SendInput function synthesizes keystrokes, mouse motions, and button clicks; in this case it synthesizes mouse motions. Its first parameter, nInputs, is the number of structures in the pInputs array; next is pInputs, a pointer to the array of INPUT structures to be injected. The flag MOUSEEVENTF_LEFTDOWN specifies that the left button was pressed, and MOUSEEVENTF_LEFTUP specifies that the left button was released, so simulating a left click is simply a matter of setting these flags and using SendInput to synthesize the button events, as in the code shown above.
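For pointer movement with MOUSEEVENTF_ABSOLUTE, dx and dy must be normalized to the 0-65535 range across the screen. One common mapping is sketched below; the exact rounding is an implementation choice and may differ from the project's code:

```cpp
#include <cassert>

// Map a pixel coordinate to the 0..65535 normalized absolute range used
// by SendInput when MOUSEEVENTF_ABSOLUTE is set.
long toNormalizedAbsolute(long pixel, long screenDim)
{
    return pixel * 65535 / (screenDim - 1);
}
```

The result would be assigned to Input.mi.dx and Input.mi.dy before calling SendInput with MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE.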
The GUI design is constrained by the vision system being applied. The use of a background-subtraction technique requires sufficient contrast between the hand and the general background, which restricts us to colors that are very unlike skin color. Additionally, the GUI must be as static as possible to prevent animations from being falsely detected as fingers. This may, however, affect the aesthetic value of the GUI, so there is a tradeoff between stability and aesthetics.
Since the program operates at a resolution of less than 320x240, a loss of fine-grained mouse control must be considered; therefore the GUI should have large controls. Showing the raw tracking might be distracting to the user, as it would cause the cursor to jump; making the cursor invisible minimizes user troubles and does not affect operation.
The image above shows that light colors were used for the building blocks. As said earlier, sufficient contrast is needed in order for the hand to be segmented reliably. Thin edges were avoided, to prevent the fingertip from being cut off by the size filter. Lastly, certain buildings were made larger (not to the same scale) in order to be easier targets.
Figure 4-18. Building Menu Frame
The same principles were strictly applied here. First, color selection was followed: variations of the color "light green" were chosen for the buttons' background and text color. Then the buttons were made obviously large.