Sign Language From Spotting Fingerspelled Words

SIGN LANGUAGE FROM SPOTTING FINGERSPELLED WORDS

SUBMITTED BY
PRASANNA KUMAR. M [1031310716]
KALYAN AKULA [1031310686]
INTRODUCTION

Sign language is an important tool for the deaf and hard of hearing to communicate. However, signs alone cannot cover all existing words, since a large number of new words continue to be coined. To express a word that is not yet defined in a sign language, sign language users spell it out using a finger alphabet. Figure 1 shows the Japanese finger alphabet.

Because of this, there is strong demand for a method that highlights, or spots, these kinds of unfamiliar words in sign language videos and displays them on an auxiliary screen to help interpreters and the audience follow a talk.

Developing a spotting method is challenging because signed speech is composed of a mixture of signs and sets of finger letters or characters. Spotting specific words in a speech video essentially consists of detecting and recognizing particular actions involving hand shape in a given video. Therefore, the key to developing a practical spotting method is to establish a method for recognizing moving hand shapes. Many hand shape recognition methods have been proposed [8, 10, 14, 13, 9].
Figure 1: Japanese finger alphabet (kana chart: columns a, ka, sa, ta, na, ha, ma, ya, ra, wa; rows a, i, u, e, o).

An approach based on the kernel orthogonal mutual subspace method (KOMSM) successfully classified 41 kana from the Japanese finger alphabet with high accuracy by using a set of multiple sequential images from a video. Here, however, our aim is recognition of sets of multiple finger characters. Thus, in our setting, there may be pairs of words that are composed of similar finger shapes but expressed in different temporal orders. This suggests that we need to take temporal continuity into account when spotting those pairs.
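
For context, subspace-based methods such as KOMSM compare image sets via the canonical (principal) angles between the subspaces spanned by their feature vectors. A minimal linear sketch of that similarity follows; KOMSM additionally applies a kernel map and orthogonalization, which are omitted here, so this is an illustration of the underlying idea rather than the paper's exact method:

    import numpy as np

    def subspace_basis(X, dim):
        """Orthonormal basis of the span of the columns of X (one
        feature vector per column), keeping the leading `dim` components."""
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        return U[:, :dim]

    def subspace_similarity(X, Y, dim=5):
        """Mean squared cosine of the canonical angles between the
        subspaces spanned by image sets X and Y."""
        U = subspace_basis(X, dim)
        V = subspace_basis(Y, dim)
        # Singular values of U^T V are the cosines of the canonical angles.
        cosines = np.linalg.svd(U.T @ V, compute_uv=False)
        return float(np.mean(cosines ** 2))

Because this similarity is computed between whole subspaces, it ignores frame order, which is exactly why temporal regularization is needed for word-level spotting.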
ABSTRACT

➢ This application helps deaf and dumb people communicate with the rest of the world using sign language. Communication plays an important role for human beings and is treated as a life skill.
➢ Keeping this in mind, we present this project, which mainly focuses on aiding speech-impaired and paralyzed patients. Our work helps improve communication with the deaf and dumb.
➢ Speech-to-sign technology and VRS enable audible language translation on smartphones with signing. The application provides this feature on the mobile without dialing a number, using a technology that translates spoken and written words into sign language with video.
➢ Interaction between sighted people and blind people is difficult because of communication problems. There are many applications available in the market to help blind people interact with the world; voice-based email and chat systems, for example, let the blind communicate with each other.
ABSTRACT (CONT.)

➢ This helps blind people interact with others. The application includes voice-based, text-based, and video-based interaction approaches. Video chat technology continues to improve and may one day be the preferred means of mobile communication among the deaf.
➢ Existing technologies have not been combined to solve the problem of mobile sign language translation in daily life. Deaf people could gesture sign language into a smartphone using VRS, which would produce audible and textual output.
➢ Mobile gesture recognition might enable the deaf to converse with the hearing, remotely and intermediated by a video interpreter. A video interpreter is responsible for helping deaf or hearing-impaired individuals understand what is being said in a variety of situations.
➢ The main feature of this work is that it can be used to learn sign language and to provide sign language translation of video for people with hearing impairment.
EXISTING SYSTEM

● SMS and MMS enable users to communicate with both deaf and hearing parties.

● Human interpreters are used for communication between hearing and deaf persons.

● Face-to-face communication.

● Software translates signs into text (and voice), and the hearing person reads (and hears) it. The hearing person speaks into a microphone; the software translates the voice into text, and the deaf person reads it.
Limitations of the Existing System:

It can be used only between the caller and the callee.

For communication between a deaf and a hearing person, we must dial the number.

For daily activities involving normal face-to-face communication, we cannot use this application.
PROPOSED SYSTEM

● It is a mobile phone application that converts everything we say into a high-pitched voice and fetches the required video from the server.
● The main part of this system, communication between deaf users, is implemented using ASL videos from the server.
● The proposed system paves the way for a deaf person to easily interact with a hearing person from anywhere. The system also supports automatic translation, automatic speech recognition, and speech-to-sign and sign-to-speech transmission.
● The technologies used in this system are divided into two main parts: hardware and software. The hardware consists of a phone and a speaker; the software uses Outfit7 and a Video Relay Service (VRS). They are brought together and integrated as a system.
● It can be used without dialing the receiver's number, since the receiver is a registered user.
Finger alphabet spotting algorithm

1: Capture the image sequence.
2: Extract the hand region of the signer from the whole-body image by using the Kinect for Windows v2 SDK.
3: Resize the clipped images to 32 × 32 and vectorize them into 1024-dimensional raw image vectors.
4: Normalize each image vector to have unit ℓ2 norm.
5: Perform kernel whitening and extract feature vectors from the set of images.
6: Calculate the similarities to the pre-specified words by TRCCA-mean or TRCCA-wmean.
7: Classify the signs into the word whose subspace has the highest similarity.

Figure: Example of extracting the hand region.
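
Steps 3 and 4 amount to simple image preprocessing. A minimal sketch, assuming the hand region has already been segmented into a grayscale image (the actual pipeline uses the Kinect SDK for segmentation; OpenCV is used here only for resizing):

    import numpy as np
    import cv2

    def preprocess_hand_image(hand_img):
        """Resize a segmented hand image to 32 x 32, vectorize it into a
        1024-dimensional raw image vector, and normalize to unit l2 norm."""
        resized = cv2.resize(hand_img, (32, 32)).astype(np.float64)
        vec = resized.reshape(-1)          # 32 * 32 = 1024 dimensions
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def preprocess_sequence(frames):
        """Preprocess every frame; the result has one column per frame."""
        return np.stack([preprocess_hand_image(f) for f in frames], axis=1)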

Kernel whitening and TRCCA have parameters such as the length of the time gap s and the dimension of the mapped space. For the parameter s, our two proposed methods, TRCCA-mean and TRCCA-wmean, do not use one fixed value but instead require a set of different s values.
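
For step 5 of the algorithm, kernel whitening can be sketched as whitening performed in the feature space induced by a kernel: run kernel PCA on the training set and rescale each component to unit variance. The RBF kernel, the target dimension, and the centering details below are illustrative assumptions of this sketch, not the paper's exact settings:

    import numpy as np

    def rbf_gram(A, B, gamma=1.0):
        """RBF kernel Gram matrix between column-vector sets A and B."""
        sq = (np.sum(A ** 2, axis=0)[:, None]
              + np.sum(B ** 2, axis=0)[None, :] - 2.0 * A.T @ B)
        return np.exp(-gamma * sq)

    def fit_kernel_whitening(X_train, out_dim=50, gamma=1.0):
        """Kernel PCA whitening: returns a function that maps new column
        vectors to whitened out_dim-dimensional feature vectors."""
        n = X_train.shape[1]
        K = rbf_gram(X_train, X_train, gamma)
        J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
        Kc = J @ K @ J                               # centered Gram matrix
        evals, evecs = np.linalg.eigh(Kc)            # ascending order
        idx = np.argsort(evals)[::-1][:out_dim]      # leading components
        lam, A = evals[idx], evecs[:, idx]
        # Rescale so each component has unit variance over the training set.
        W = A * np.sqrt(n) / np.maximum(lam, 1e-10)

        def transform(Y):
            Ky = rbf_gram(X_train, Y, gamma)
            Kyc = J @ (Ky - K @ np.ones((n, 1)) / n)  # center w.r.t. training
            return W.T @ Kyc

        return transform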
Flowchart of the proposed word spotting method:

Learning phase: capture image sequence → segment hand region → feature extraction → calculate kernel whitening matrix O → reference feature sequences X1, X2, ..., XC representing each class.

Test phase: capture image sequence → segment hand region → feature extraction → apply the kernel whitening transform → input feature sequence Y → calculate similarity Sim_{c,s} by TRCCA → perform weighted averaging of similarities → spot the fingerspelled word.
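
The final two boxes of the test phase (similarity averaging and spotting) can be sketched as follows. Here `trcca_similarity` is a hypothetical placeholder for the TRCCA similarity Sim_{c,s}, whose full computation is not reproduced; the uniform weights (TRCCA-mean) and externally supplied weights (TRCCA-wmean), and the threshold default, are illustrative assumptions:

    import numpy as np

    def trcca_similarity(Y, X_c, s):
        """Hypothetical placeholder for the TRCCA similarity Sim_{c,s}
        between input sequence Y and class reference X_c with time gap s."""
        raise NotImplementedError

    def spot_word(Y, references, s_values, weights=None, threshold=0.5):
        """Pick the class with the highest (weighted) average similarity
        over a set of time gaps; reject if it stays below the threshold."""
        if weights is None:                 # TRCCA-mean: uniform weights
            weights = np.ones(len(s_values)) / len(s_values)
        scores = {}
        for c, X_c in references.items():
            sims = np.array([trcca_similarity(Y, X_c, s) for s in s_values])
            scores[c] = float(np.dot(weights, sims))   # TRCCA-wmean
        best = max(scores, key=scores.get)
        # Return None for "none of the pre-specified words".
        return best if scores[best] >= threshold else None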

Figure 5: Experimental setup. (a) Layout (dominant hand; distances of 75 cm and 140 cm). (b) Hand region of interest (a = 13 cm).

Parameter candidates are determined by preliminary small-scale experiments.
Advantages:

● The majority of people can afford a smartphone, since its cost keeps decreasing due to the availability of many brands at low prices.

● It helps other people understand deaf and dumb people with ease.

● It reduces the communication gap between deaf and dumb people and the rest of the world.

● For ASL users, VRS conversations flow much more smoothly, naturally, and quickly than communicating by typing.
SYSTEM SPECIFICATION

Hardware Requirements:

Processor   : Pentium P4
Motherboard : Genuine Intel
RAM         : Min 1 GB
Hard Disk   : 80 GB
SYSTEM REQUIREMENTS

Software Requirements:

Operating System : Windows XP
Technology Used  : Android
IDE              : Eclipse
Emulator         : AVD (Android Virtual Device)
Evaluation measures and experimental results

To assess the effect of temporal regularization and similarity averaging using different time gap parameters, we performed classification experiments with different time gap parameters in the TRCCA-based methods. We compared the performance of CCA, TRCCA with the best time gap parameter s (denoted by TRCCA-best), TRCCA-mean, and TRCCA-wmean. We note that TRCCA-best is not available in real situations. We also performed experiments to see the effect of kernelization and orthogonalization. The methods included for comparison are KOMSM, kernel orthogonal CCA (KOCCA), KOTRCCA with the best time gap parameter (KOTRCCA-best), and the kernelized and orthogonalized versions of our proposed methods, KOTRCCA-mean and KOTRCCA-wmean.

We constructed subspaces corresponding to the eight words expressed by fingerspelling. The test data are composed of both frames for one of the specified eight words in the eight classes and frames for none of the pre-specified words. That is, the test data are composed of frames with and without class labels.

To evaluate the performance of multiclass classification methods when there are "none of the pre-specified classes" data, we introduce three performance measures. The first measure is the hit rate (HR), defined as the ratio of the number of frames for which the similarity to the correct class subspace is the maximum among all similarities to subspaces, to the number of frames with class labels. The second measure is the correct acceptance rate (CAR), defined as the ratio of the number of frames where the similarity to the true class subspace is both higher than some threshold and higher than all similarities to the other classes, to the number of frames with class labels. The last measure is the correct rejection rate (CRR), computed only on non-finger-alphabet frames without any class label in an input video; it is defined as the ratio of the number of frames where the maximum similarity among all classes is less than the threshold, to the number of non-finger-alphabet frames.
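
These three measures translate directly into code. A minimal sketch, assuming per-frame similarity scores are collected into a (frames × classes) array and that frames belonging to none of the pre-specified words are labeled -1 (the array shapes and the -1 convention are assumptions of this sketch):

    import numpy as np

    def evaluate(sims, labels, threshold):
        """Compute HR, CAR and CRR from per-frame similarities.

        sims:   (n_frames, n_classes) similarity matrix
        labels: (n_frames,) true class index, or -1 for frames belonging
                to none of the pre-specified words
        """
        labeled = labels >= 0
        pred = np.argmax(sims, axis=1)
        max_sim = np.max(sims, axis=1)
        # HR: the correct class attains the maximum similarity.
        hr = np.mean(pred[labeled] == labels[labeled])
        # CAR: additionally, the winning similarity exceeds the threshold.
        car = np.mean((pred[labeled] == labels[labeled])
                      & (max_sim[labeled] > threshold))
        # CRR: on unlabeled frames, all similarities stay below the threshold.
        crr = np.mean(max_sim[~labeled] < threshold)
        return hr, car, crr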
Conclusions

We proposed a method for spotting specific words expressed by fingerspelling in sign language videos. In our experiments, we assumed a practical situation in which the viewer needs the support of a spotting system. Our experimental results show that incorporating multitemporal smoothness regularization and kernel whitening into TRCCA is effective for spotting gestures. The experiments also show that averaging the similarities obtained with different regularization time gaps is effective for stably spotting spelled-out words of various lengths.

The spotting performance achieved by our proposed method is insufficient for practical systems, and performance improvements are needed in order to construct a robust spotting system. Because of the difficulty of obtaining sign language and finger alphabet video sequences signed by expert users of Japanese Sign Language, the performances reported in this paper were one-time experiments. An important task for the future is to collect more real-world datasets for evaluating the proposed method. Although we showed experimentally that time gap parameter selection can be partially avoided by using an averaging approach, there remain other parameters, and automatic parameter tuning methods need to be investigated in future work.
References

[1] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
[2] K. Fukui and O. Yamaguchi. The kernel orthogonal mutual subspace method and its application to 3D object recognition. In Asian Conference on Computer Vision, pages 467–476, 2007.
[3] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004.
[4] T. Kawahara, M. Nishiyama, T. Kozakaya, and O. Yamaguchi. Face recognition based on whitening transformation of distribution of subspaces. In ACCV Workshops, Subspace, pages 97–103, 2007.
[5] T. Kobayashi. S3CCA: Smoothly structured sparse CCA for partial pattern matching. In International Conference on Pattern Recognition, pages 1981–1986, 2014.
[6] Y. Kodama, T. Kobayashi, N. Otsu, and K. Fukui. Partial matching method using spatio-temporal regularized canonical correlation analysis. Technical Report of IEICE, PRMU, 110(414):99–104, 2011. (In Japanese.)
[7] M. Kondo, N. Kato, K. Fukui, and A. Okazaki. Development of a fingerspelling training system for both static and dynamic fingerspelling using depth images. In Human Interface Symposium, pages 643–648, 2014. (In Japanese.)
[8] S. Liwicki and M. Everingham. Automatic recognition of fingerspelled words in British Sign Language. In CVPR Workshops, Human Communicative Behavior Analysis, pages 50–57, 2009.
[9] Y. Ohkawa and K. Fukui. Hand-shape recognition using the distributions of multi-viewpoint image sets. IEICE Transactions on Information and Systems, E95-D(6):1619–1627, 2012.
[10] S. Ricco and C. Tomasi. Fingerspelling recognition through classification of letter-to-letter transitions. In Asian Conference on Computer Vision, pages 214–225, 2009.
[11] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
