WSEAS Transactions on Systems
LIVIU VLADUTU
School of Computing
Dublin City University
Glasnevin, Dublin 9, DCU
IRELAND
lvladutu@computing.dcu.ie http://www.computing.dcu.ie
Abstract: The recognition of human activities from video sequences is currently one of the most active areas of research because of its many applications in video surveillance, multimedia communications, medical diagnosis, forensic research and sign language recognition. This paper describes a new method designed to precisely identify human gestures for Sign Language recognition. The system is to be developed and implemented on a standard personal computer (PC) connected to a colour video camera. The present paper tackles the problem of shape recognition for deformable objects, like human hands, using modern classification techniques derived from artificial intelligence.
The ART coefficients F_nm of an image f(ρ, θ) are defined as:

F_nm = ∫∫ V*_nm(ρ, θ) f(ρ, θ) ρ dρ dθ (over the unit disk)

where (ρ, θ) are polar coordinates and V_nm(ρ, θ) is the ART basis function of order n and m. The basis functions are separable along the angular and radial directions, and are defined as follows:

V_nm(ρ, θ) = (1 / 2π) exp(jmθ) R_n(ρ)    (1)

R_n(ρ) = 1, if n = 0
R_n(ρ) = 2 cos(πnρ), if n ≠ 0    (2)

Figure 1: Example of an image from a real signer (above) and the equivalent one from a Poser-generated video (below)
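As a sketch of how equations (1)-(2) can be evaluated numerically, the snippet below samples the unit disk on the pixel grid and accumulates the coefficients with a crude Riemann sum. This is an illustration only, not the MPEG-7 reference implementation; the function names, the grid sampling, and the default orders are my own choices (the MPEG-7 standard typically uses n < 3 and m < 12).

```python
import numpy as np

def art_basis(n, m, rho, theta):
    """ART basis V_nm(rho, theta) = (1/2pi) * exp(j*m*theta) * R_n(rho),
    with R_n(rho) = 1 for n == 0 and 2*cos(pi*n*rho) otherwise (eqs. 1-2)."""
    radial = np.ones_like(rho) if n == 0 else 2.0 * np.cos(np.pi * n * rho)
    return (1.0 / (2.0 * np.pi)) * np.exp(1j * m * theta) * radial

def art_coefficients(image, n_max=3, m_max=12):
    """Approximate the ART coefficients F_nm of a 2D intensity image by
    sampling the unit disk on the pixel grid (Riemann-sum approximation)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # map pixel coordinates onto the unit disk centred on the image
    x = (xs - (w - 1) / 2.0) / ((w - 1) / 2.0)
    y = (ys - (h - 1) / 2.0) / ((h - 1) / 2.0)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    coeffs = {}
    for n in range(n_max):
        for m in range(m_max):
            basis = art_basis(n, m, rho, theta)
            # F_nm = sum of conj(V_nm) * f over the unit disk samples
            coeffs[(n, m)] = np.sum(np.conj(basis[inside]) * image[inside])
    return coeffs
```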
Two MPEG-7 visual descriptors are used in the experiments. The Colour Layout descriptor (CLD) provides information about the spatial colour distribution within images. After an image is divided into 64 blocks, this descriptor is extracted from each of the blocks based on the Discrete Cosine Transform. We can evaluate the distance between two CLD vectors, using the luminance and the 2 chrominance channels, with the formula:

S_CLD(Q, I) = sqrt( Σ_i w_Y,i (Y_Q,i − Y_I,i)² )
            + sqrt( Σ_i w_Cb,i (Cb_Q,i − Cb_I,i)² )
            + sqrt( Σ_i w_Cr,i (Cr_Q,i − Cr_I,i)² )

where w_i represents the weight associated with coefficient i. There are 12 coefficients extracted for the colour layout descriptor (6 for Y, and 3 each for Cb and Cr). A more detailed description can be found in the MPEG-7 ISO schema files, see [7] and: http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-7 schema files/.

The region-based shape descriptor belongs to the broad class of shape analysis techniques based on moments. It uses a complex 2D Angular Radial Transformation (ART), defined on a unit disk in polar coordinates. The ART coefficients are computed from the basis functions given in equations (1) and (2).

The default region-based shape descriptor has 140 bits. It uses 35 coefficients (n=10, m=10) quantized to 4 bits per coefficient. I have used all 35 resulting coefficients in the object description. The region-based shape descriptor expresses the pixel distribution within a 2D object region; it can describe complex objects consisting of multiple disconnected regions as well as simple objects with or without holes (Figure 2). Some important features of this descriptor are:

● it gives a compact and efficient way of describing the properties of multiple disjoint regions simultaneously;

● it can cope with errors in segmentation where an object is split into disconnected sub-regions, provided that the information about which sub-regions contribute to the object is available and used during the descriptor extraction;

● the descriptor is robust to segmentation noise.

Also, the classification with this descriptor outperforms the results obtained with other descriptors, like SIFT, described in [30] and [32]. The information from the CLD and RS (Region Shape) descriptors is stored in XML (Extensible Markup Language), a metadata interchange format, which is further processed by the classifier.
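The CLD distance formula above can be sketched directly in code. This is a minimal illustration; the dictionary layout and the explicit weight arguments are assumptions of mine, not part of any MPEG-7 API.

```python
import numpy as np

def cld_distance(q, i, wy, wcb, wcr):
    """Distance between two Colour Layout descriptors.
    q and i are dicts with 'Y' (6 coefficients) and 'Cb'/'Cr' (3 each);
    wy, wcb, wcr are the per-coefficient weights from the formula above."""
    dy  = np.sqrt(np.sum(wy  * (np.asarray(q['Y'])  - np.asarray(i['Y']))  ** 2))
    dcb = np.sqrt(np.sum(wcb * (np.asarray(q['Cb']) - np.asarray(i['Cb'])) ** 2))
    dcr = np.sqrt(np.sum(wcr * (np.asarray(q['Cr']) - np.asarray(i['Cr'])) ** 2))
    # the three channel distances are summed, as in S_CLD(Q, I)
    return dy + dcb + dcr
```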
The feature space for shape retrieval and classification consisted of up to 47 coefficients (12 from the CLD and 35 from the ART descriptors), and these were passed on to the classifier after a selection scheme based on Fuzzy C-Means.

In this selection process, the images from the gesture's whole video stream are represented in the Principal Components (PC) space [3]. The representation in PC-space has previously revealed to us very interesting aspects of the motion's dynamics [10][16].

Figure 3: A representation of a video-gesture 'a' in the 3-dimensional PCA-space

In Figure 3, there are 3 clusters and their centers (represented as red circles), and we selected the frames (RFrames) from the middle cluster (the one on the left-hand side of the picture). The manifold is a closed one, since the signer's hands start and end in the same position, as emphasized in Figure 4.

A simple algorithm was chosen (see also [8]): for example, if a logical video segment is v10–100 and the RFrame set from the whole video is {v1, v40, v75, v120, ...}, then {v40, v75} can be used to visually represent the segment.

In order to give a clear understanding of how a gesture's RFrames are selected, Figure 4 shows a relatively smooth transition from the neutral phase (where the signer keeps the hands down) to the active phase (the region in green in the middle) and back to the neutral position.

Figure 4: The figure shows how the images in a video stream can be clustered into 3 simple regions; therefore Fuzzy C-Means is applicable

Since we are not dealing with crisp transitions (i.e. an image may belong to 2 subsets), I considered it mandatory to use Fuzzy Clustering. The basics of the algorithm, called Fuzzy C-Means (FCM), were introduced by Dunn [4] and improved by Bezdek [5], and it is a classic fuzzy clustering algorithm. The cluster centers are updated as:

c_i = Σ_{j=1..N} (u_ij)^m x_j / Σ_{j=1..N} (u_ij)^m    (5)

In the equation above, m ∈ [1, ∞) is the fuzzifier (m = 2 in this case). Therefore, in the end the algorithm looks like the pseudo-code description depicted in Table 1.

Table 1: Pseudo-code of the FCM algorithm

    Fix the number of clusters C;
    Fix the fuzzifier m;
    Do {
        Update memberships using equation (4)
        Update centers using equation (5)
    } Until (centers stabilize)
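The loop in Table 1 can be sketched in Python as below. This is an illustrative implementation assuming Euclidean distances; since equation (4) is not reproduced in this section, the membership update in the comment is the standard FCM formula, u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m−1)).

```python
import numpy as np

def fuzzy_c_means(X, C=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """FCM: alternate the membership update (equation 4) and the centre
    update (equation 5) until the centres stabilize.
    X is an (N, d) array of frame feature vectors (e.g. PCA coordinates)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # random initial memberships, each column normalised over the C clusters
    U = rng.random((C, N))
    U /= U.sum(axis=0, keepdims=True)
    centers = None
    for _ in range(max_iter):
        Um = U ** m
        # equation (5): c_i = sum_j u_ij^m x_j / sum_j u_ij^m
        new_centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if centers is not None and np.max(np.abs(new_centers - centers)) < tol:
            centers = new_centers
            break
        centers = new_centers
        # standard FCM membership update: u_ij = 1 / sum_k (d_ij/d_kj)^(2/(m-1))
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero at a centre
        U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)),
                         axis=1)
    return centers, U
```

With C = 3, the cluster with the highest memberships in the active phase supplies the RFrames.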
2.4 The proposed classifier

The number of training examples is denoted by l. In our case l was always 280 (10 frames for each of the 28 one-handed gestures of ISL). ξ is a vector of l variables, where each component ξ_i corresponds to a training example (x_i, y_i). x_i represents the feature vector, which is formed by either 35 coefficients (only the Region-shape coefficients) or 47 coefficients, corresponding to both the Region-shape (RS) and the CLD descriptors.

The idea can be expressed in a formal way: the goal is to

minimize (1/2) wᵀw + C Σ_{i=1..l} ξ_i    (9)

Besides the regular kernels given below, the Laplacian kernel was also considered:

● the Laplacian kernel: K(x, x_k) = exp{ −‖x − x_k‖ / σ² }

as expressed in an excellent new reference: see [35].
A fast Windows implementation (.dll's) of the extraction of the video descriptors was chosen ([15]) that can be included in our final real-time Sign-Language understanding system. Although there are many available implementations in several programming languages (like Matlab, C++, Java, Lisp and so on), I have used a Java version ([13]) of an implementation of the Support Vector Machine based on the optimization algorithm of SVMlight, as described in [14]. mySVM can be used for pattern recognition, regression and distribution estimation. In order to cope with the relatively small number of examples, a cross-validation (see [17]) with a factor of 25 was chosen.

Figure 5: Classification performance of SVM with some of the most common kernels (polynomial, dot, radial, Anova) on the RS1, RS1 & CLD1 and RS2 experiments
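The 25-fold cross-validation setup can be sketched as follows. mySVM itself is a Java tool, so a simple nearest-centroid rule stands in for the SVM here purely to keep the sketch self-contained and runnable; the fold logic, not the classifier, is the point of the example, and all names are my own.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Split n example indices into k roughly equal, shuffled folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_validate(X, y, k=25, seed=0):
    """k-fold cross-validation loop (k=25 as in the text); a nearest-centroid
    rule stands in for the SVM. Returns the mean classification error."""
    folds = kfold_indices(len(X), k, seed)
    errors = []
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        # "fit": one centroid per class on the training portion
        classes = np.unique(y[train])
        centroids = np.array([X[train][y[train] == c].mean(axis=0)
                              for c in classes])
        # "predict": the closest centroid wins
        d = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
        pred = classes[np.argmin(d, axis=1)]
        errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))
```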
Several types of kernels were tested (neural, polynomial, Anova, Epanechnikov, Gaussian-combination, multiquadric, and radial-basis-function based) but our experience [9][24] was once more confirmed: the SVM classifiers based on polynomial kernels are (most probably) the best for classification problems. Therefore, all the results expressed in Table 2 correspond to polynomial-kernel supervised learning (the kernel with the smallest overall classification error for the 3 experiments described in Table 2). The graph in Figure 5 shows the classification errors (on the ordinate) for 4 of the most commonly used kernels:
- Anova;
- polynomial;
- radial;
- dot.
The SVMs with the regular kernels above are expressed by the following equations:

● linear SVM: K(x, x_k) = x_kᵀ x

● the polynomial SVM of degree d: K(x, x_k) = (a x_kᵀ x + 1)^d

● the RBF (Radial Basis-Function) SVM: K(x, x_k) = exp{ −‖x − x_k‖² / σ² }

3 Experimental results

The experience acquired in the group has shown that there are many factors that can influence the quality of image understanding, like: the differences between the signers' clothing, between the lighting sources, between the skin of the humans, or motion blur. Therefore, the first step was to compare the classification performance of our algorithm for two classes of input data: only 35 coefficients (of the RS descriptor), or 47 coefficients (of the RS and CLD descriptors). The performance vector (overall) resulting from the confusion matrix is presented in Table 2, and it shows that by adding the 12 extra coefficients corresponding to the CLD, the classification error is only slightly increased (by 0.35%). That will allow us to gather more information in our training and testing database (more real and virtual signers) and to quantify the differences enumerated above in only a few coefficients at virtually no expense. The second step of our experiment used the same number of images but, for each letter/sign expressed, we used ten static images and ten images extracted from the Poser-generated video stream corresponding to the same sign, as in Figure 1. The performance is given in the third line of Table 2.
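The kernel equations listed in the previous section translate directly into code; the numpy sketch below mirrors them one for one (the parameter defaults a, d and σ are illustrative choices, not values from the experiments).

```python
import numpy as np

def linear_kernel(x, xk):
    # linear SVM: K(x, x_k) = x_k^T x
    return xk @ x

def polynomial_kernel(x, xk, a=1.0, d=3):
    # polynomial SVM of degree d: K(x, x_k) = (a * x_k^T x + 1)^d
    return (a * (xk @ x) + 1.0) ** d

def rbf_kernel(x, xk, sigma=1.0):
    # RBF SVM: K(x, x_k) = exp(-||x - x_k||^2 / sigma^2)
    return np.exp(-np.sum((x - xk) ** 2) / sigma ** 2)

def laplacian_kernel(x, xk, sigma=1.0):
    # Laplacian kernel, following the form in section 2.4:
    # K(x, x_k) = exp(-||x - x_k|| / sigma^2)
    return np.exp(-np.linalg.norm(x - xk) / sigma ** 2)
```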
Implementations were built around Matlab (MathWorks ©), a powerful matrix-based software package with many toolboxes, which was used as a 'wrapper'.

Table 2: Generalization performance of the mySVM classifier with polynomial kernels

Current experiment | Performance vector | Description of the experiment
RS1                | 6.03%              | Region-shape coefficients only
RS1 & CLD1         | 6.39%              | Region-shape and CLD coefficients, real images
RS2                | 7.64%              | Region-shape coefficients for real and virtual signers

Acknowledgements:
The research was supported by the Science Foundation of Ireland (SFI), to whom I am deeply thankful, but the author is also thankful to:
- Lecturer Alistair Sutherland, Dr. George Awad and Dr. Junwei (Jeff) Han for the SST (skin segmentation and tracking) contribution;
- Dr. Sara Morrisey and Tommy Coogan (all from Dublin City University) for the database of synthetic video streams generated in Poser and, respectively, for the camera-collected video streams.
4 Conclusion

The results explained in the previous sections show that a limited number of gestures (executed by a human or a robot) can be learned and understood by combining the shape recognition detailed in the current work (hand shapes playing the role of letters in an alphabet) with an understanding of the gesture dynamics represented in the feature space (like PCA). In this latter approach, the images are represented in the PCA-space and the gestures are represented on a nonlinear manifold. The fast procedure exposed here, which takes approximately 10 milliseconds for classification (on average) and approximately 1 second for the VDE feature extraction on a PC (with a dual 2.4 GHz processor), is considered to be a good choice for other researchers in the related fields enumerated in the Introduction section.

Future envisaged work involves:
- background modelling at the beginning of the video-frame analysis [29];
- eventually tackling occlusion problems with the help of a-priori defined 3D models of the involved body parts (head and hands), using some of the available software environments, either:
● ITK/VTK (http://www.vtk.org/, [31] and http://www.itk.org/);
● the Computational Geometry Algorithms Library (http://www.cgal.org/) or related, see also [33], [36].

References:
[1] J. Han, G. Awad, A. Sutherland and H. Wu, Automatic Skin Segmentation for Gesture Recognition Combining Region and Support Vector Machine Active Learning, Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition, 2006, pp. 237-242.
[2] G.M. Awad, A Framework for Sign-Language Recognition using Support Vector Machines and Active Learning for Skin Segmentation and Boosted Temporal Sub-units, Dublin City University, Ireland, 2007 (PhD Thesis).
[3] I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, 2002.
[4] J.C. Dunn, A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-separated Clusters, Cybernetics and Systems: An International Journal, Vol. 3, Issue 3, 1973, pp. 32-57.
[5] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[6] J. Kovac, P. Peer and F. Solina, Human Skin Colour Clustering for Face Detection, Proceedings of EUROCON 2003, Turku, Finland, pp. 144-148.
[7] Shih-Fu Chang, T. Sikora and A. Puri, Overview of the MPEG-7 Standard, IEEE Transactions on Circuits and Systems for Video Technology, Volume 11, No. 6, June 2001, pp. 688-695.
[8] A. Joshi, S. Auephanwiriyakul and R. Krishnapuram, On Fuzzy Clustering and Content Based Access to Video Databases, Proceedings of the Workshop on Research Issues in Data Engineering, 1998, pp. 42-47.
[9] S. Papadimitriou, S. Mavroudi, L. Vladutu and A. Bezerianos, Ischemia Detection with a Self-Organizing Map Supplemented by Supervised Learning, IEEE Trans. on Neural Networks, Volume 12, Issue 3, pp. 503-515.
[10] W. Hai and A. Sutherland, Irish Sign Language Recognition using Hierarchical PCA, Irish Machine Vision and Image Processing Conference (IMVIP 2001), National University of Ireland, Maynooth, 5-7 September 2001.
[11] V. Vapnik, The Nature of Statistical Learning Theory, 2nd Edition, Springer Verlag, 2000.
[12] The National Association for Deaf People, Ireland, The Standard Dictionary of Irish Sign Language, (CD/DVD), by microBooks Ltd., 2006.
[13] Open Source Data Mining with the Java software, RapidMiner, http://rapid-i.com/content/blogcategory/38/69
[14] T. Joachims, Making Large-Scale SVM Learning Practical, Advances in Kernel Methods, chapter 11, MIT Press, 1999.
[15] G. Tolias, Visual Descriptors Applications, Semantic Multimedia Analysis Group, NTUA, Athens, Greece, http://image.ntua.gr/smag/tools/vde/
[16] L. Vladutu, A. Sutherland, Gesture analysis of deaf people language using nonlinear manifolds analysis, SFI Conference, Dublin, Ireland, July 2007, presented by A. Sutherland.
[17] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, Proceedings of the 14th International Joint Conference on Artificial Intelligence, 2(12), Morgan Kaufmann, San Mateo, 1995, pp. 1137-1143.
[18] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, 1998.
[19] C. Cortes and V. Vapnik, Support Vector Networks, Machine Learning, Volume 20, Number 3, September 1995, pp. 273-297.
[20] X. L. Xie, G. Beni, A Validity Measure for Fuzzy Clustering, IEEE Trans. Pattern Analysis and Machine Intelligence, Volume 13, Number 8, 1991, pp. 841-847.
[21] W. T. Freeman and C. Weisman, Television Control by Hand Gestures, International Workshop on Automatic Face and Gesture Recognition, IEEE Computer Society, Zurich, Switzerland, June 1995, pp. 179-183.
[22] C. Graetzel, S. Grange, T. Fong and C. Baur, A non-contact mouse for Surgeon-Computer Interaction, Technology and Health Care, IOS Press, Volume 12, Number 3, 2004, pp. 245-257.
[23] J. Carreira and P. Peixoto, A Vision Based Interface for Local Collaborative Music Synthesis, Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition, FGR 2006, pp. 591-596.
[24] L. Vladutu, Computational Intelligence Methods on Biomedical Signal Analysis, VDM-Verlag Publishing House, 2009.
[25] J. Krumm, S. Shafer and A. Wilson, How a Smart Environment Can Use Perception, Workshop on Sensing and Perception (part of ACM UbiComp 2001), September 2001.
[26] Y. Wu and T. Huang, Vision-Based Gesture Recognition: A Review, Lecture Notes in Computer Science, Springer Verlag, Volume 1739, 1999, pp. 103-115.
[27] A. Corradini and H.-M. Gross, Camera-Based Gesture Recognition for Robot Control, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), Como, Italy, 2000, pp. IV 133-138.
[28] T. Starner, J. Auxier, D. Ashbrook and M. Gandy, The Gesture Pendant: A Self-Illuminating, Wearable, Infrared Computer Vision System for Home Automation Control and Medical Monitoring, Proceedings of the 4th IEEE International Symposium on Wearable Computers, ISWC 2000, pp. 87-94.
[29] D. Gutchess, M. Trajkovics, E. Cohen-Solal, D. Lyons and A.K. Jain, A background model initialization algorithm for video surveillance, Proceedings of the 8th IEEE International Conference on Computer Vision, ICCV 2001, Volume 1, July 2001, pp. 733-740.
[30] D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, Volume 60, Number 2, November 2004, pp. 91-110.
[31] B. Preim, D. Bartz, Visualization in Medicine: Theory, Algorithms and Applications, The Morgan Kaufmann Series in Computer Graphics, July 2007.
[32] D. G. Lowe, Object Recognition from Scale-Invariant Features, IEEE 7th International Conference on Computer Vision (ICCV '99), Volume 2, 1999, pp. 1150-1156.
[33] A. Fabri, G.-J. Giezeman, L. Kettner, S. Schirra, On the Design of CGAL, a computational geometry algorithms library, Software - Practice and Experience, Volume 30, Issue 11, pp. 1167-1202.
[34] M. Bober, MPEG-7 Visual Shape Descriptors, IEEE Transactions on Circuits and Systems for Video Technology, Volume 11, Number 6, June 2001, pp. 716-719.
[35] S. Amiri, D. von Rosen, S. Zwanzig, The SVM Approach for Box-Jenkins Models, REVSTAT Statistical Journal, Volume 7, Number 1, April 2009, pp. 23-26.
[36] J. E. Goodman and J. O'Rourke, Handbook of Discrete and Computational Geometry, 2nd Edition, Chapman & Hall/CRC, 2004.