
4. DESIGN CONSIDERATIONS

[Block diagram: the computer vision blocks (Image Acquisition, Preliminary Image Processing, Fingertip Finding, Clustering, Finger Rules Checker / Triggering) feed the application blocks (Calibration, Mouse Pointer, Mouse Click, GUI).]

Figure 4-1. Overview of the Program Blocks

4.1 Image Acquisition


The images are acquired using a Logitech Creative IFX webcam, which is capable of capturing 640x480 video at 30 frames per second (fps). For the program, however, we limit the frame size to 320x240, and the camera is not operated at a constant frame rate: the program grabs a new frame upon completion of its calculations, whose duration varies.

A problem encountered with the camera is its automatic gain control. The gain control is unsuitable for our purposes since the camera keeps changing its overall exposure and gain levels, making the background models unstable. At other times, certain placements relative to the projector make the camera run at very high exposure levels, giving an overexposed video with almost no contrast. The following images illustrate the effect of exposure on the acquired image.

Figure 4-2. Resultant image from different exposures: 1/20s (upper left), 1/40s (upper right), 1/60s (lower left), 1/80s (lower right)

There seems to be no programming interface to the driver settings that would let us disable this camera function from within our program. As a workaround, the proponents manually adjust the driver settings from the camera's bundled application. Under most lighting conditions, an exposure of 1/80s provides good contrast.

Because the webcam has a fixed lens and lacks optical zoom, the prototype is limited to a distance of only up to about 3.5m from the projection. Setting the camera farther away gives poorer-quality images and makes the detection process more difficult.

4.2 Preliminary Image Processing

The proponents made use of preliminary image processing to simplify the input image, readying it for the fingertip finding process. The flow diagram below shows the group's implementation of low-level image processing to prepare the input frame for fingertip detection.

[Flow diagram of the preliminary image processing stages: Colored Image Acquisition → Grayscale Conversion → Image Differencing against an Adaptive Reference (with Reference Modification) → Thresholding → Size Filter → Fingertip Detection.]

Figure 4-3. Preliminary Image Processing Block

4.2.1 Grayscale Conversion

Grayscale conversion of the acquired colored image is the initial step of the preliminary image processing. Converting the colored frame to a grayscale image is required by the succeeding preliminary image processes, such as image differencing and background subtraction.

Colors in an image may be converted to a shade of gray by

calculating the effective brightness or luminance of the color and using

this value to create a shade of gray that matches the desired brightness.

The effective luminance of a pixel is calculated with the

following formula:

Y = 0.299*R + 0.587*G + 0.114*B

OpenCV includes a function that performs the conversion of color images to grayscale. It is called as:

cvCvtColor(src_img, dest_img, CV_BGR2GRAY)
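A minimal sketch of this step, assuming the OpenCV 1.x C API (the image and capture names here are hypothetical):

IplImage *frame = cvQueryFrame(capture);   /* acquired BGR frame from the webcam */
IplImage *gray  = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
cvCvtColor(frame, gray, CV_BGR2GRAY);      /* Y = 0.299R + 0.587G + 0.114B */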

4.2.2 Image Differencing and Background Subtraction
Image differencing is used to determine the change between images. It extracts a moving foreground from a static background by comparing the gray values of successive frames. As discussed earlier in the theoretical considerations, image differencing suffers from noise introduced by subtle changes in illumination due to environmental changes, and from internal noise caused by the digitization process of the camera. The noise introduced can be alleviated using a size filter, which will be discussed in section 4.2.6. The major drawback, however, is the requirement for motion. In our application, once the hand rests it disappears from the difference frame since there is no motion. Slow motions also cause hollow areas to appear, since the hand has a generally uniform skin color. The proponents instead use a modified background subtraction to resolve the drawbacks of differencing successive image frames.

Background subtraction uses a reference frame instead of the previous frame, which gives much cleaner segmentation of any object not found in the reference background. A drawback of this method is that it is not suitable for changing backgrounds, and our application is an interactive display system and hence has changing content. What is needed, therefore, is an adaptive background model, which is discussed in the succeeding section.

4.2.3 Adaptive Reference

A popular adaptive background model is the running average, implemented in OpenCV as the function cvRunningAvg. The running average algorithm uses an accumulator that combines weighted inputs of the existing reference and the current frame. In equation form it is represented as:

R(t) = (1 − α)R(t−1) + αI(t)

where R(t) is the new reference at time t,
R(t−1) is the previous reference,
I(t) is the current input image, and
α is the weight, with a value from 0 to 1.
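A minimal sketch of this update, assuming the OpenCV 1.x C API with a 32-bit floating-point accumulator (the image names are hypothetical):

IplImage *ref = cvCreateImage(cvGetSize(gray), IPL_DEPTH_32F, 1);
cvConvert(gray, ref);            /* seed the reference with the first frame */
/* then, once per frame: */
cvRunningAvg(gray, ref, 0.02);   /* R(t) = (1 - a)*R(t-1) + a*I(t), a = 0.02 */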

The value of α determines how fast the reference image updates with respect to the current frame. The higher the α, the higher the weight of the current image and hence the faster the reference updates. With the right value of α the background can be updated properly, making background subtraction a usable process in the proponents' study. It is an observation that the hand does not rest for too long in a single area. The right value of α depends on the speed of the machine in use; on a 1.7 GHz laptop, our application uses α in the range of 0.01 to 0.02. Experiments on the determination of alpha, using the said hardware, yield the following table:

α (alpha)    Time to absorb as reference (s)
0.05         2.15
0.04         3.11
0.03         3.82
0.02         5.85
0.01         10.68
0.005        Beyond limit of algorithm; does not update

Table 4-1. Alpha and resulting time values

The time stated is the time it takes for a foreground object to be completely absorbed into the background. Because of the finite number representation (8 bits, or 256 levels), alpha has a threshold limit below which the weight is too small: the resulting calculation is floored and no update happens. This is evident in the last entry of the table. The figures below show the effect of the running average on the thresholded difference image:

Figure 4-4. At alpha = 0.05. Timestamps: 0.83s and 2.04s

Figure 4-5. At alpha = 0.01. Timestamps: 3.3s, 7s, and 10.5s

It can be seen that as time passes, the hand is slowly absorbed into the background reference, as indicated by the shrinking thresholded difference. A higher α gives faster updating and a lower α slower updating, as seen in the examples above.

In making use of an adaptive background, the group found a problem that causes an undesirable situation. When the hand rests for too long, it gets absorbed into the background. Once the hand moves again, its original position does not disappear completely from the background, and this causes two hands to show up: the current hand and the ghost of its resting position.

Figure 4-6. Ghosting
The ghost results from the previous hand position slowly being replaced by the now visible background. Because of this ghosting effect, the algorithm needed to be modified, and the proponents followed suggestions in the related literature, particularly Hardenbergh and Berard [5], to modify the running average.

4.2.4 Adaptive Reference Modification

Hardenbergh and Berard suggested that lighter pixels must be updated immediately, on the assumption that the projected image is always of high intensity and that the background is lighter than the foreground (i.e., the user). In this case the reference image is updated continuously regardless of the prescribed weight (α).

The group implemented an additional update process that checks whether a particular pixel is lighter than a certain value:

R(t)(x, y) = I(t)(x, y)   if I(t)(x, y) ≥ background_threshold

With this additional updating technique the lighter areas are updated immediately, removing the "ghost" of the previous hand. The proponents used a fixed background threshold and set it to 200. With a white projection the intensity is typically 240-255 in a brightly lit room and 230-240 with the projection as the main light source (room lights off). The proponents chose 200 since it allows more light-colored objects to be part of the background. A light-skinned person has a pixel intensity of around 160 and a dark-skinned person around 130. The system assumes that pixels with intensity greater than or equal to 200 are part of the background and must be updated instantaneously.
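A minimal sketch of this override, assuming OpenCV 1.x primitives (the image names are hypothetical; mask and ref8u are pre-allocated 8-bit images, ref8u holding the reference): a mask of bright pixels is built and those pixels are copied straight into the reference, bypassing the running average.

cvThreshold(gray, mask, 199, 255, CV_THRESH_BINARY);  /* mask = pixels with I >= 200 */
cvCopy(gray, ref8u, mask);                            /* R(x,y) = I(x,y) where mask is set */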

4.2.5 Image Thresholding

Thresholding simplifies the image for easier processing in the later stages. It reduces the 256-level grayscale image to a two-level binary image. The proponents found that a threshold of 15-20% of the maximum level properly segments the hand. The acceptable threshold values were established while operating the projector in typical room (fluorescent) light. The experiments in chapter 5 detail the effects of thresholding (see 5-1). The following figure illustrates the effect of different thresholds on the differenced image.

Figure 4-7. Thresholding Results. (Left) 10%, (Center) 35%, (Right) 50%
All pixels equal to or exceeding the threshold are set to 255, while those below are set to zero. This gives a binary image containing only the non-background objects.
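A minimal sketch of the differencing and thresholding steps, assuming 8-bit images and the OpenCV 1.x C API (names hypothetical; ref8u is the reference converted back to 8-bit); 15% of 255 is roughly 38:

cvAbsDiff(gray, ref8u, diff);                          /* |I - R| */
cvThreshold(diff, binary, 38, 255, CV_THRESH_BINARY);  /* two-level image */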

4.2.6 Size Filter

As discussed earlier, noise introduced by sudden changes in illumination can be removed using a size filter. The size filter removes blobs whose areas are smaller than a set value; the size threshold is currently set to 190 pixels of area. For perspective, the hand covers around 350 pixels of area, but it is possible that only the fingers are exposed, as when operating near the edges of the projection screen. A 190-pixel threshold is used to allow for these types of situations. Additionally, holes found within the blobs are filled. The size filter thus removes small noise and further simplifies the image with its filling function. The size filter in use is an OpenCV port of the Matlab function bwareaopen, available in the OpenCV forums [27].

The size filter is implemented from the contour finding and area calculation functions of OpenCV. The function cvFindContours is used to find all contours in the image. Then cvContourArea is used to get the area of each contour; if it is found to be smaller than the threshold, that particular contour is redrawn with cvDrawContours set to black, removing it from the image. Holes within the remaining blobs are filled by redrawing them with cvDrawContours set to white.
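A minimal sketch of this filter, assuming the OpenCV 1.x C API (the image name and the retrieval mode are assumptions; cvFindContours modifies its input, which is acceptable here since the image is redrawn anyway):

CvMemStorage *storage = cvCreateMemStorage(0);
CvSeq *contours = 0;
cvFindContours(binary, storage, &contours, sizeof(CvContour),
               CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));
for (CvSeq *c = contours; c != 0; c = c->h_next) {
    double area = fabs(cvContourArea(c, CV_WHOLE_SEQ));
    /* blobs under 190 px are painted black; kept blobs are drawn filled,
       which also paints over any holes inside them */
    CvScalar color = (area < 190) ? cvScalarAll(0) : cvScalarAll(255);
    cvDrawContours(binary, c, color, color, 0, CV_FILLED, 8, cvPoint(0, 0));
}
cvReleaseMemStorage(&storage);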

4.3 Fingertip Detection

4.3.1 Algorithm Details

A simple fingertip finding algorithm was introduced by Hardenbergh and Berard [5] in their paper Bare Hand Computer Interaction, which uses as its basis a simple searching square and an inscribed circle.

Figure 4-8. (left) Searching model, (right) on a fingertip

Looking at the search square, for a fingertip the inner circle should be composed of filled pixels. Outside the circle, the search square contains two chains: a long chain of unfilled pixels and a shorter chain of filled pixels. The search algorithm is based on these features. Hardenbergh and Berard defined the parameters used by the search algorithm as follows:

Diameter of finger – In their work this value was found to lie between 5 and 10 pixels. In our implementation, however, the value can be pushed to 2 pixels at the extreme range of the camera. Section 5.5 of chapter 5 contains experiments that define the usable finger diameter versus distance. The diameter is fairly invariant since the distance between the projection and the camera is fixed, and since the user's fingers are expected to be close to the projected surface, which more or less limits the changes in diameter.

Search square diameter – this should be at least two pixels wider than the

finger diameter.

Minimum number of filled pixels along the search square – the minimum number of pixels that can be considered a finger. Hardenbergh defined this as the finger diameter, which is obvious from the figure.

Maximum number of filled pixels along the search square – the maximum number of pixels that can be considered a finger. Hardenbergh defines this as twice the finger diameter.

The minimum and maximum numbers define the range of object sizes, around the size of a finger, that the algorithm should identify. Chapter 5, section 5.6 investigates different values for these parameters.

The algorithm is illustrated by the flow chart on the following page. The flow chart is written in an eliminative manner: the point that is not eliminated by any of the tests is memorized as the finger location. A yes in a decision box means the point is eliminated.

Figure 4-9. Fingertip Finding Algorithm
The searching square scans the region of interest and subjects each search region to the tests in the flowchart. The region of interest (ROI) is defined as the pixel areas that are filled (white) as a result of the thresholding operation and that lie within the projection. Pixel counts within the circle and along the square are performed. Each decision box is a test of whether the region is indeed a fingertip. These tests are outlined below:

Decision Box 1. Is the number of filled pixels inside the circle < the expected circle area?

There has to be a sufficient number of filled pixels in the close neighborhood of the position (x, y), representing a circle. In the model, the inscribed circle should be filled. If the count is less than expected, the point is eliminated.

Decision Boxes 2 and 3. Is the number of filled pixels < the maximum? Is the number of filled pixels > the minimum?

There has to be the right number of filled pixels, or chain, along the described square around (x, y). In the model, the filled chain (the part of the square coinciding with filled pixels) should have a length of at least the finger diameter. It is allowed to exceed the diameter within tolerances limited by the maximum pixel length. If the count is not within the minimum and maximum values, the point is eliminated.

Decision Box 4. Is the number of connected filled pixels < the number of total filled pixels?

The filled pixels along the square have to be connected in one chain. The number of connected filled pixels is the length of the longest continuous chain, compared against the total filled pixels, which is the sum of all chains (broken or continuous). If the connected count is less than the total count, the point is eliminated.

If the particular point in query matches the description of the model as implemented through these tests, that point is labeled as a fingertip location. The following figure illustrates possible scenarios of the fingertip finding process.

Figure 4-10. A hypothetical sample of the fingertip finding process

It must be noted that the algorithm scans the entire ROI, which is the entire area where there are white pixels; the sample above shows only particular cases to exemplify the algorithm at work. Looking at the rejected points, the rightmost reject fails at decision box 1 of the flow chart: its circle is not filled, so there is no need to run the square test. The center reject has its circle filled but fails decision box 2, having more than the maximum filled pixels; the same analysis applies to the thumb. The small-finger reject fails the test of decision box 4: its chain is broken, as indicated by the two red lines.

The next section discusses the programming considerations of the group's implementation of the algorithm. As expected, tolerances had to be introduced, allowing for error margins that make the algorithm perform better in real-world conditions.

4.3.2 Implementation Details

4.3.2.1 Image Access

The group's implementation of the above algorithm required direct pixel access in OpenCV. Accessing individual pixels is slow and processing-intensive compared to calling built-in OpenCV functions, which are optimized by Intel to run comfortably in real time; this is where the bulk of the processing time goes.

Two methods were used in the program, both prominent in introductory tutorials [28]. One is the direct access method and the other is the use of a C++ typedef wrapper. The direct access method is the fastest but is error prone. To access a pixel with this method the following call is performed:

Pixel_val = ((uchar *)(img->imageData + y*img->widthStep))[x];

Pixel_val is an unsigned char from 0 to 255, read directly from the datatype IplImage (derived from the Intel Image Processing Library), whose label is img. The variables x and y are the x and y components of the point in query.

The other method used is a C++ wrapper, which is basically a redefinition of the above call into variables that are more accessible. This is less efficient than the direct access method, but it allows for easier calling and offers better readability. The redefinitions are as follows:

template<class T> class Image
{
private:
    IplImage* imgp;
public:
    Image(IplImage* img=0) {imgp=img;}
    ~Image(){imgp=0;}
    void operator=(IplImage* img) {imgp=img;}
    inline T* operator[](const int rowIndx) {
        return ((T *)(imgp->imageData + rowIndx*imgp->widthStep));}
};

typedef Image<unsigned char> BwImage;

Its usage is exemplified below:

IplImage* img;     // Given an image of type IplImage* named img
BwImage imgA(img); // Define wrapper imgA on img
imgA[y][x] = 111;  // Access pixel (write 111)

With the wrapper, access is simplified to the call imgA[y][x], where, again, x and y are the coordinates of the point of interest. The direct access method is used in simpler parts of our code, where reading from binary images is performed. The wrapper method, with its simplicity and readability, is used in the more intricate parts of our code.

4.3.2.2 Region of Interest

Defining a region of interest for the searching algorithm reduces pixel accesses and tests, thus decreasing computational load. The proponents found that the projection is always smaller than the input image and is typically trapezoidal in shape. The region of interest is defined by that trapezoid, using calculations from the calibration block discussed in section 4.6. Furthermore, the input to the fingertip finding block has already been processed with the image processing techniques above, particularly thresholding. This allows us to further reduce the ROI within the trapezoid by considering only pixels that are white (255).

The ROI is implemented in code by setting the limits of the nested for loops to the lines that compose the trapezoid. A test of whether a particular point is empty is done first; if it is empty, the algorithm skips it and goes to the next point, as in the sketch below.
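A minimal sketch of this scan, assuming the trapezoid's slanted left and right edges are stored as line equations y = mx + b with nonzero slope (all names here are hypothetical):

for (int y = y_top; y < y_bottom; y++) {
    int x_left  = (int)((y - b_left)  / m_left);   /* left edge at this row */
    int x_right = (int)((y - b_right) / m_right);  /* right edge at this row */
    for (int x = x_left; x < x_right; x++) {
        uchar v = ((uchar *)(thresholded_filt->imageData
                             + y*thresholded_filt->widthStep))[x];
        if (v == 0) continue;   /* empty pixel: skip the fingertip tests */
        /* ... run the circle and square tests on (x, y) ... */
    }
}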

4.3.2.3 Inscribed Circle Test

Given the finger diameter, the finger radius, coded as int frad, can be derived, and the expected circle area can be calculated as πr². This theoretical area is compared against what the program actually finds. With the way the program is coded, the finger diameter must be an even number so that the resulting radius is an integer, since the radius is used as a FOR-loop bound. To count the pixels in the circle defined by the radius frad, the program uses the following loop:

for ( c_i = i - frad; c_i < i + frad; c_i++ )
{
    for ( c_j = j - frad; c_j < j + frad; c_j++ )
    {
        scan_area_check = (i - c_i)*(i - c_i) + (j - c_j)*(j - c_j);
        if (scan_area_check > frad*frad) continue;  // outside the inscribed circle

        if ( ((uchar *)(thresholded_filt->imageData
                        + c_j*thresholded_filt->widthStep))[c_i] != 0 )
        { disc_pixel_count++; }                     // filled pixel inside the circle
    }
}

Given a point pt(i, j), two loops with limits based on the finger radius are run: the x and y component loops have lower limits of −frad and upper limits of +frad about the point. The two FOR loops alone would scan a rectangular area, so an additional step makes sure that only a circular region is scanned, skipping the areas that are not part of the circle. The variable scan_area_check, together with the if statement, checks whether a particular point is within the radius of the circle. This check is motivated by the equation of the circle, r² = x² + y²: if the sum of (x component)² and (y component)² exceeds (finger radius)², the point is outside the circle and is not processed. This method may seem indirect but is preferred over a direct approach where the FOR-loop limits are defined by solving the circle equation, since that approach involves calculating the square root of a sum of squares, which was found to be very slow and unsuitable for real-time operation. Squares are calculated in the program as a value multiplied by itself instead of using the power functions of the math.h C library.

For points inside the circle with a value not equal to 0 (meaning filled, since the image is binary), an accumulator variable called disc_pixel_count is incremented. At the end of the nested x and y loops, disc_pixel_count holds the number of pixels inside the circle.

This count is then compared to the expected area: if disc_pixel_count ≥ expected area − error margin, the circle is considered filled. An error margin must be introduced since the expected area is a continuous value being compared to a discrete pixel count. The following table illustrates the difference between the theoretical and actual areas of circles of different radii, calculated with a program using the algorithm just discussed.

Radius    Theoretical (continuous)    Actual (discrete count)
5         78.54                       79
4         50.27                       47
3         28.27                       27
2         12.57                       11
1         3.14                        3

Table 4-2. Comparison of theoretical and actual circle areas

It can be seen that the discrepancy is typically 1 and at most 3. The error margin for the circle test is currently set to 3 pixels, a safe number that allows increased tolerance in the search criteria.

4.3.2.4 Square Tests

For points that pass the previous criteria, the square-based tests are then performed. The following diagram illustrates how the test is performed:

Figure 4-11. Square Test Implementation

Given the point Pt(x, y), the square corners are calculated with reference to the square dimensions specified earlier in the algorithm explanation. For the program, the proponents use a square dimension 4-6 pixels wider than the fingertip diameter. The variable square_dist is half the square width, so the square corners are the points (x − square_dist, y − square_dist), (x + square_dist, y − square_dist), (x − square_dist, y + square_dist), and (x + square_dist, y + square_dist). In the code, four loops are created, going in each of the numbered directions to scan the square, as in the sketch below.
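A minimal sketch of this perimeter scan (all names hypothetical; chain runs that wrap around the starting corner are not handled here):

int filled = 0, run = 0, longest = 0;
int px = x - square_dist, py = y - square_dist;       /* start at a corner */
int dx[4] = { 1, 0, -1, 0 }, dy[4] = { 0, 1, 0, -1 }; /* the four sides */
for (int side = 0; side < 4; side++)
    for (int step = 0; step < 2*square_dist; step++) {
        uchar v = ((uchar *)(thresholded_filt->imageData
                             + py*thresholded_filt->widthStep))[px];
        if (v != 0) { filled++; run++; if (run > longest) longest = run; }
        else        { run = 0; }
        px += dx[side]; py += dy[side];
    }
/* criteria: min <= filled <= max, and longest >= filled - error margin */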

During the scan, the filled (white) pixels that the walk passes through are counted, along with the length of the longest chain (run of connected filled pixels) achieved. If the filled pixel count is less than the minimum or exceeds the maximum, the point does not satisfy the criteria and is not a fingertip location. An additional test makes sure the filled pixels are connected along one chain: if the longest connected run < filled pixel count − error margin, the pixels are not connected in a single long chain and the point does not satisfy the criteria. Again, an error margin is introduced to allow tolerances in the criteria. The algorithm at work in our program is illustrated below:

Figure 4-12. Fingertip Finding Results

All fingers have completely filled inscribed circles. The unfilled chains of the squares are marked in blue while the filled chains are marked in red. The filled chains satisfy the criterion of being within the minimum and maximum lengths. The error margin allows tolerances, exemplified by the middle finger and the pinky.

4.4 Clustering/Grouping
The positives found by the finger finding algorithm are often located on adjacent pixels; that is, a fingertip can have, around its center, adjacent pixels that are also flagged as fingertips. This is due to the tolerances allowed by the algorithm: a single fingertip location can produce two or more matches.

A clustering or sorting algorithm is implemented to determine which matches belong to a particular finger. The algorithm is based on a minimum distance between found positives for one to be marked as a new fingertip. If the distance between two matches is less than the set distance, they are on the same finger; otherwise, they belong to different fingers.

Figure 4-13. Clustering / Grouping Algorithm Flowchart

The flowchart above illustrates the clustering/grouping algorithm. The raw matches provided by the fingertip finding module are stored in an array. A new array is created whose contents are the distances of the points with respect to the first point in the raw matches array. A check then separates those within the minimum distance from those exceeding it. All raw matches with distances less than the minimum are merged into a single fingertip location, with the first match as the reference. All raw matches that exceed the minimum distance are stored in a buffer, which later becomes the new raw matches array. This continues until all raw matches are assigned to a finger location.
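A minimal sketch of this grouping pass (names hypothetical; raw is a std::vector of the raw matches, min_dist the set minimum distance), using the same reference-and-remainder scheme as the flowchart:

std::vector<CvPoint> fingers;
while (!raw.empty()) {
    CvPoint ref = raw[0];                  /* first match is the reference */
    std::vector<CvPoint> rest;
    for (size_t k = 1; k < raw.size(); k++) {
        int dx = raw[k].x - ref.x, dy = raw[k].y - ref.y;
        if (dx*dx + dy*dy >= min_dist*min_dist)
            rest.push_back(raw[k]);        /* too far: belongs to another finger */
    }
    fingers.push_back(ref);                /* ref represents this finger */
    raw.swap(rest);                        /* re-cluster the remainder */
}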

4.5 Triggering / Rules Check


The group decided to use a delay-based approach to signal a mouse clicking event: the user simply holds his or her hand over the area of interest to signify a click. A problem occurs in triggering because a fingertip cannot settle on an exact pixel location over several frames; there is a tendency for the fingertip to be detected at adjacent pixels. This is a limitation of the accuracy of the fingertip finding process. To solve this, the proponents set a margin of error so that the system can still trigger even if the located fingertip moves slightly, whether from program instability or from the user. The delay is counted as the number of frames during which the fingertip stays in place. The group originally targeted a 0.5s to 1s delay to click, which translates to 7 to 14 frames (using the average processing time gathered from experimentation; see 5.9.2) during which the fingertip must be detected at the same location. However, user feedback showed that users want the minimum possible delay in clicking: the moment the finger rests, a click should be sent. The fastest frame-count threshold the group could use is 3, which is the fastest triggering speed without accidental clicks. Ideally 3 frames trigger in far less than half a second, but instabilities may cause the 3 consecutive frames to take longer to accumulate. Average values for the clicking delay relative to the frame threshold are gathered in the experiments in section 5.8 of chapter 5.
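A minimal sketch of the delay trigger (all names hypothetical; tip is the single detected fingertip for this frame): a click fires once the fingertip stays within the pixel margin for click_frames consecutive frames.

static CvPoint last = cvPoint(-1, -1);
static int still_frames = 0;
if (abs(tip.x - last.x) <= margin && abs(tip.y - last.y) <= margin) {
    if (++still_frames >= click_frames) {  /* e.g. click_frames = 3 */
        MouseMove(xs, ys);                 /* calibrated coordinates, see 4.6-4.7 */
        LeftClick();
        still_frames = 0;
        wait_counter = wait_update_time;   /* let the reference re-settle */
    }
} else {
    still_frames = 0;                      /* finger moved: restart the count */
}
last = tip;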

[Flowchart: after preliminary image processing, the system either waits (incrementing a background-update counter) or runs the fingertip finding process. If exactly one fingertip is found and its location matches the previous frame's, a counter is incremented; when the counter reaches the defined click speed, the mouse is moved to the finger location, a click is triggered, and the wait update time is set so the background can absorb the new content. Any failed check resets the counter.]

Figure 4-14. Rules Check and Triggering Algorithm Flowchart

The above diagram shows the flow of the mouse clicking algorithm; the algorithm itself is in the shaded zone, in contrast to the other modules of the program. The system will not trigger if the preceding fingertip finding process locates more than one finger; in that case the mouse clicking algorithm does nothing and control continues. A problem after triggering occurs when the mouse clicks and the content, and hence the background, changes. Because it takes time for the background reference to settle, false detections can occur from the arbitrary shapes produced in the difference layer as the reference updates. The proponents therefore decided to temporarily disable the fingertip search algorithm after a click is sent, giving the background reference time to update to the new content. After a fixed number of frames the fingertip finding algorithm reactivates.

4.6 Calibration
Since the camera does not share the same optical path as the projection, distortions are introduced into the video. Most notable is the trapezoidal effect of the off-center projection.

Figure 4-15. Camera Content Visualization

The outer rectangle is what is seen by the camera. The trapezoid drawn with the solid line is the projection visible from the camera. The inner rectangle is the effective screen that gets transformed into Windows coordinates.

As part of the calibration process, the trapezoid is defined by hand: the user is asked to click on the upper left, lower left, upper right, and lower right corners of the visible projection. In the case of uneven Y values for either the upper or lower sections (i.e., different y values for the upper left and upper right corners), the program overwrites the values, prioritizing the inner coordinates. Afterwards, the calibration calculations performed by the program are a two-part process: one part compensates for the trapezoid and the other transforms the coordinates from OpenCV into what Windows can take. The steps are outlined below:

1. Get the finger location (OpenCV coordinates).

2. Place the location relative to the trapezoid.

3. Scale that location down to the relative rectangular region.

4. Transform the scaled location to Windows-based coordinates.

Figure 4-16. Calibration Considerations
OpenCV takes as its origin the lower left corner of the video. The location of the found finger is in OpenCV coordinates and is compared relative to the trapezoidal projection. Knowing the corners of the trapezoid, the line equations representing its slanted sides can be defined: the upper-left and lower-left corners define the line labeled xlower, while the upper-right and lower-right corners define the xupper line. These can be put into the line-equation form y = mx + b. Given the current coordinates (Xi, Yi), the corresponding x on the xlower line is calculated. The expression Xi − Xlower gives the distance of the found point relative to the trapezoidal projection. The next step is to scale that distance to fit in the rectangle defined by the dotted red lines, as the equation below illustrates:

Xscaled = (Xi − Xlower) × (Upperright.X − Upperleft.X) / (Xupper − Xlower)

Where: Xi, Yi is the point being transformed
Xlower is the x value of the left line given Yi
Xupper is the x value of the right line given Yi
Upperright.X is the x component of the upper right corner coordinate
Upperleft.X is the x component of the upper left corner coordinate

The Y values are more easily transformed since uneven Y values have already been overwritten to be level. A major consideration for the transformation of the Y component is that Windows uses a top-left origin instead of OpenCV's bottom-left origin: the y component grows from the top in Windows and from the bottom in OpenCV. Again, the OpenCV coordinates are referred against the projection; conforming to the Windows origin, they are compared to the top line of the visible projection. The equation below is the implemented transformation:

Yscaled = Yupper − Yi

Where: Yi is the y component of the point being transformed
Yupper is the lower (inner) Y value between the upper left and upper right corners

X and Y coordinates that have been scaled are relative to the red rectangle inside the trapezoid. They must then be placed on the Windows screen, which requires parameters such as the effective height and width. The effective height is simply the height of the rectangle, or quantitatively Yupper − Ylower. The effective width is the width of the red rectangle, or quantitatively Upperright.X − Upperleft.X. Because of calibration, the resolution offered by our system is less than 320x240 and is defined by the effective screen height and width.
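A minimal sketch of the two-part transform (all names hypothetical), following steps 1-4 above: offset against the trapezoid's left edge, scale into the effective rectangle, then flip Y for Windows' top-left origin.

double x_left  = (yi - b_left)  / m_left;   /* xlower at row yi, from y = mx + b */
double x_right = (yi - b_right) / m_right;  /* xupper at row yi */
double x_scaled = (xi - x_left) * (upperright_x - upperleft_x)
                  / (x_right - x_left);
double y_scaled = y_upper - yi;             /* Windows y grows downward */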

4.7 Windows API

The proponents used WinAPI, the Windows Application Programming Interface that is part of the Platform SDK, to generate the functions for moving the mouse pointer and simulating the left-click event. It serves as the driver that communicates with the system inputs. The SendInput function, initialized as a mouse input, is used both to control the mouse pointer and to simulate a left click. The arguments for moving the mouse are the calibrated x and y coordinates that the calibration step provides. Snippets of the functions used are explained below:

void MouseMove (int x, int y)
{
    double fx = x*(65535.0f/fScreenWidth);
    double fy = y*(65535.0f/fScreenHeight);
    INPUT Input;
    Input.type = INPUT_MOUSE;
    Input.mi.dwFlags = MOUSEEVENTF_MOVE|MOUSEEVENTF_ABSOLUTE;
    Input.mi.dx = fx;
    Input.mi.dy = fy;
    SendInput(1,&Input,sizeof(INPUT));
}

void LeftClick ()
{
    INPUT Input;
    Input.type = INPUT_MOUSE;
    Input.mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
    SendInput(1,&Input,sizeof(INPUT));

    INPUT Input1;
    Input1.type = INPUT_MOUSE;
    Input1.mi.dwFlags = MOUSEEVENTF_LEFTUP;
    SendInput(1,&Input1,sizeof(INPUT));
}

INPUT Input declares the variable Input as an INPUT structure. Input.type specifies that the type of input event is a mouse input. Input.mi.dwFlags sets the bit flag to move; MOUSEEVENTF_ABSOLUTE is also set to specify that the dx and dy values are normalized absolute coordinates. The values fx and fy carry the scaling factors 65535/fScreenWidth and 65535/fScreenHeight because the MOUSEEVENTF_ABSOLUTE flag makes dx and dy contain normalized absolute coordinates between 0 and 65,535. The values fScreenWidth and fScreenHeight come from the effective resolution calculated by the calibration module of the program, which is less than 320x240. The SendInput function synthesizes keystrokes, mouse motions, and button clicks; in this case, it synthesizes mouse motions. The first parameter of the function is nInputs, the number of structures in the pInputs array. Next is pInputs, a pointer to an array of INPUT structures. The last is cbSize, which specifies the size in bytes of an INPUT structure.

The flag MOUSEEVENTF_LEFTDOWN specifies that the left button was pressed, and MOUSEEVENTF_LEFTUP that it was released, so simulating a left click is simply a matter of setting these flags and using SendInput to synthesize the button events, as in the code shown above.

4.8 GUI Design


Certain limitations on the GUI have to be considered due to the nature of the vision system being applied. The use of a background subtraction technique means there should be sufficient contrast (difference in intensity) between the hand and the general background; this restricts us to colors that are very unlike skin color. Additionally, the GUI must be as static as possible to prevent animations from being falsely detected as fingers. This may affect the aesthetic value of the GUI, so there is a tradeoff between stability and presentability; which takes priority is up to the designer and depends on the application. Because the projector has a native resolution of 800x600 and the program has a resolution of less than 320x240, a loss of fine mouse control must be considered; the GUI should therefore use buttons as large as aesthetics allow. Sudden losses and reappearances of tracking can be distracting to the user, as they cause the cursor to jump. Making the cursor invisible minimizes this trouble and does not, in general, affect the application.

Figure 4-17. Main Menu Frame

The image above shows that light colors were applied to the building blocks; as said earlier, sufficient contrast is needed for the background subtraction to be effective. Second, GUI elements at the extreme edges were avoided to prevent the fingertip from being cut off by the size filter. Lastly, certain buildings were made larger (not to the same scale) in order to minimize issues with the program's resolution.

Figure 4-18. Building Menu Frame

In this figure, it can be observed that the parameters mentioned were strictly applied. First, the color selection was followed: variations of light green were chosen for the buttons' background and text colors. The buttons were made noticeably large. The presentation was made predominantly static so that the probability of false detections is minimized; some animations are used in transitions (between content), during which fingertip detection is not necessary.
