Professional Documents
Culture Documents
Computer Vision Based Sign Language Recognition For Numbers: Kth@ece - Cmu.edu Jwng@andrew - Cmu.edu Shirlene@cmu - Edu
Computer Vision Based Sign Language Recognition For Numbers: Kth@ece - Cmu.edu Jwng@andrew - Cmu.edu Shirlene@cmu - Edu
1
top to bottom again, this time from right to left. The vector, and passing through the COM. The COM is
first black pixel that we encounter is then set as the then recomputed and adjusted to be the middle point
“right” side of the hand. Next, we do a horizontal of the cut-off line. From there, we can determine the
scan from left to right starting from the top of the starting and the ending points (left most black pixel
image, within the vertical boundaries previously intersecting the cut-off line and the rightmost black
defined. The first black pixel that we encounter is pixel intersecting the cut-off line) for our boundary
then set as the “top” of the hand. We assume the contour generation.
hand extends from the bottommost part of the image, Draw line
perpendicular
thus there’s no cropping of the image to locate the Recompute to orientation
“bottom” part of the hand. Center of Mass vector and
crosses COM
COM
Draw vector to
obtain image
orientation
Figure 2: Average
Locating the hand from the orientation vector
binary image.
Figure 3: Determining the orientation of the hand using vectors
2
5. Boundary Contour
The cut-off points obtained from the orientation
determination process are used as start point and end
point of an edge detection heuristic. We cycle the
Figure 5: The skeleton is a line of points that are no nearer to points along the edge of the binary image, while
one point than another
saving them in that sequence, and at the same time
computing the Euclidean distance between that point
To see how this works, imagine that the and the COM.
foreground regions in the input binary image are
made of some uniform slow-burning material. Light
fires simultaneously at all points along the boundary Figure 9:
of this region and watch the fire move into the The distance from
interior. At points where the fire traveling from two each pixel to the
COM is then plotted.
different boundaries meets itself, the fire will
extinguish itself and the points at which this happens
form the so called `quench line'. This line is the
skeleton. Under this definition it is clear that thinning
also produces a sort of skeleton. The peaks (maximum) are the furthest point from the
COM, that is, they represent the positions of the tip
of the fingers.
4.3 Thinning
Figure 7:
Structuring elements used for thinning
3
Peaks
Threshold to be classified as
To implement the thinning method, we also used
finger:
<20% encoded as 0
MATLAB’s morphing function.
>20% encoded as 1
6. Equidistance Skeleton
Figure 13:
The window is then placed at the
There are two ways of skeletonization the hand starting point (bottom most) and scans
through the skeleton. Each branch
structure. We used both the equidistance skeleton and point is recorded and marked.
the thinning method for our project.
To implement the equidistance skeleton method,
we used MATLAB’s morphing function.
We classify each pixel as we move along the
MATLAB’s morphing function requires us to invert
skeleton using identifiers such as “endpoints”,
the colors of the binary image. Morphing the binary
“branch” and “normal points”. “Endpoints” is
image of the hand, we obtain the skeletal
classified as the pixel at which there are no other
representation of the hand.
valid neighboring pixels. “Branches” are classified as
the pixel at which it has more than one valid
neighboring pixel. This pixel is then marked so that
after reaching an endpoint of one of the branching
Inverted pathways, our window tracking would return to the
branch point and move along the next valid pathway
from that particular pixel.
4
analyze the arrays by comparing the distance of each structuring element is set to background. Otherwise it
pathway we traced. is left unchanged.
DETECTED: 5
6.3. Results for Equidistance Skeleton Those are outputs for the thinning algorithm showing
number four and eight respectively.
We ran our program with 5 sets of hands of
different color, orientation and size. We also ran a
video clip of a hand signing an average of 10 7.2. Classification for Thinning
numbers over a period of time with our program.
Correct matching between the original number and For the thinning implementation, the classification
the detected number was less than 50%. It was an stage is almost similar to that of the equidistance
approximate average of ~40% correctness. skeleton. The same process of tracing through the
skeleton is also needed. However the algorithm is
We attribute the low performance of this less complex since we do not need to take care of
algorithm to the many extra branching paths in the split ends as perceived from the skeleton image.
skeleton. In the future, we believe the performance of
this algorithm can be improved by applying an extra
skeleton “cleaning” step to remove all the small
branching that were extraneous. Due to time (a) (b)
Figure 15: (a) Split end at the endpoint of the branch of a skeleton,
constraints, we did not have time to implement this (b) endpoint of a branch of from thinning
extra feature into the program to enhance its
performance. Basically, we calculate the length of each branch,
getting rid of insignificant branches which have
length shorter than a given threshold. Based on the
7. Thinning longest branch, we calculate how many short
branches and how many long branches. Long
To implement thinning, first, translate the origin branches would represent stretched fingers while
of the structuring element (middle) to each possible short branches represent folded fingers. I would then
pixel position in the image. If foreground and code them in terms of binary values as 1 for stretched
background pixels in the structuring element exactly fingers and 0 for folded fingers.
match foreground and background pixels in the
image, the image pixel underneath the origin of the
5
time because the morphing, calculation and
Longest branch
classifications took time to complete for each frame
we captured.
Short branch
6
Mellon University), Research Division (Komatsu Ltd.) and
Univ. of Electro-Communications.
Jeremy Ng is a Junior in
Electrical and Computer
Engineering at Carnegie Mellon
University. He is very
interested in computer vision,
image processing and computer
graphics. He is an international
student from Malaysia.