
Maha College of Engineering

Attur Main Road, Minnampalli, Salem-106

__________________________________________________________

EMBEDDED SYSTEM
(CELLPHONE FOR BLIND AND DEAF)
___________________________________________________________

Name : Thanmanam.P, Harikrishnan.P
Dept. : Information Technology
E-Mail id : p.thanmanam@gmail.com
Contact Number : [illegible in source]
Year/Sem : [illegible in source]

If the conditions of life change and the form undergo modification, uniformity of character can be given to the modified offspring, solely by natural selection preserving similar favorable variations. -Charles Darwin

ABSTRACT

The growth of wireless networks in the past few years can be attributed to rising demand for wireless services such as data, voice and video, and to the development of new wireless standards. Rapid change in wireless technology has made cell phones reach every corner of society. The cell phone and wireless technology could in every way be extended to support an altogether different echelon of users. Users deprived of hearing, speech or vision could be helped to use the services of the cell phone as well. This paper explores the possibility of integrating the applications of Speech to Animation (STA) and voice recognition to help blind, deaf and dumb users. Speech to animation technology converts human speech to animated film in real time. The core of this technology is a real time speech recognition engine which converts human voice to phonemes. Ideally, then, this technology involves generating the animated images based on the phonemes of the received speech content in real time.

KEYWORDS: Speech to animation technology, voice recognition, phonemes, speech recognition engine.

ARTICLE OUTLINE

Introduction
Speech to Animation
Major speech related facial motions
Lower lip behaviour
Upper lip behaviour
Virtual face manipulation
Viseme Representation
Video to speech
A blind person calling a deaf, dumb person
Advantages and challenges
Conclusions
References

INTRODUCTION

The cell phone has graduated from a device providing basic services to one accommodating a multitude of applications that can be added according to the needs of the people. A typical cell phone for the deaf and dumb user shall have the STA application in it, which can play the animated face on the screen of the phone. It shall also have a Video to Speech application and a camera to record the facial (especially lip) movements of the user. An alerter similar to a vibrator can be used to notify the user of a new call. The blind user needs voice recognition software that takes voice commands from the blind user to make a call to a destination.

Speech is produced by the movements of specific organs or articulators of the vocal tract. Each distinct speech sound is related to a characteristic positioning of the articulators, and some of their movements are wholly or partially visible on the speaker's face, especially in the region around the mouth, which comprises the upper end of the vocal tract. The simulation of such visible movements during speech production is the key to the generation of realistic speech synchronized 3D facial animation.

Phonemes are the building blocks of human speech. They are the smallest speech sound units in a language that are capable of conveying a distinction in meaning, such as the "p" in "pie" in contrast with the "b" in "bye". When the sound of a phoneme changes, the word changes as well. The term viseme, shorthand for visual phoneme, is a central concept of speech synchronized facial animation. Visemes can be understood as the recognizable visual motor patterns usually common to two or more phonemes. Visemes can be distinguished based on measurements of the movements displayed on the face during speech production, from which geometric models for the visemes are derived.

SPEECH TO ANIMATION

Speech to animation technology converts human speech to animated film in real time. Here a camera captures the lip movements of the blind user and his voice signals are converted to phonemes. These phonemes are converted to visemes and an animated face opens up at the receiving end. The deaf and dumb user at the receiving end can interpret the information by lip reading from the animated face.

Audio is probably the most natural modality to recognize speech content and a valuable source to identify a speaker [1]. Video also contains important biometric face/lip information, which includes texture and lip motion information that is correlated with the audio. State-of-the-art speech recognition systems have been jointly using lip information with audio. For speech recognition, it is usually sufficient to extract the principal components of the lip information and to match the mouth openings/closings with the phonemes of speech. The fusion of the audio modality with the best lip movements will yield good speech recognition engines in real time.
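The phoneme-to-viseme step at the core of Speech to Animation can be illustrated with a small lookup. The following is a minimal sketch, not the authors' implementation: the ARPAbet-style phoneme symbols, the viseme labels and the table `PHONEME_TO_VISEME` are assumptions chosen only to show that several phonemes usually share one viseme.

```python
# Illustrative many-to-one table. The phoneme symbols and viseme labels are
# assumptions for demonstration; they are not taken from the paper.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open", "ay": "open",
    "uw": "rounded", "ow": "rounded",
    "iy": "spread", "ih": "spread",
}


def phonemes_to_visemes(phonemes, default="neutral"):
    """Return one viseme label per phoneme; unknown phonemes map to `default`."""
    return [PHONEME_TO_VISEME.get(ph.lower(), default) for ph in phonemes]


if __name__ == "__main__":
    # "pie" and "bye" differ in their first phoneme but look alike on the lips.
    print(phonemes_to_visemes(["p", "ay"]))
    print(phonemes_to_visemes(["b", "ay"]))
```

Because the mapping is many-to-one, the reverse direction (viseme to phoneme, as needed for Video to Speech) is ambiguous and would need additional context to resolve.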

To capture the geometry information during speech production, fiduciary points were marked with white paint on the face of the speaker, as shown in the figure. It shows the location and labelling of the four fiduciary points, P1, P2, P3 and P4, for which 3D trajectories were measured and analyzed. The figure also shows the orientation of the Cartesian coordinate system used consistently throughout the text.

A speech processing system converts a lexical presentation to a string of phonemes, and a computer animation system, given visual data, generates intermediate frames to provide a smooth transition from one visual image to the next. The timing data is used to correlate the phonemic string with the visual images to produce accurately timed sequences of visually correlated events. Utilizing a real time random access audio/visual synthesizer (RAVE) together with an associated special purpose audio/visual modeling language (RAVEL), synthesized actors (synactors) representing real or imaginary people or animated characters can be simulated and programmed to perform actions.

MAJOR SPEECH RELATED FACIAL MOVEMENTS

For controlling the dynamic behavior of a 3D synthetic face, the speech related movements are broken down into three components: a rigid body component associated with the movement of the mandible, bounded by the behavior of the temporomandibular joint (TMJ), and two non-rigid components associated with movements of the upper and lower lips. In the following paragraphs, we derive behavioral models for the temporomandibular joint and the lower lip based on the information provided by the trajectories of the fiduciary points. The upper lip behavior is directly given by the trajectory of P1.

Considering the conventions in the above figure and the measured coordinates y4(t) and z4(t) of the fiduciary point P4 located on the chin, the TMJ angle of rotation can be expressed by

[equation missing in source]

where cy(t0) and cz(t0) are the components on the midsagittal plane of the TMJ center at rest position. The presumed location of the TMJ center of our speaker was also measured from the video. The translations ty(t) and tz(t) of the TMJ center within the midsagittal plane can be expressed by

[equation missing in source]

The components of the vector from the TMJ center to P4 in the midsagittal plane (directions Y and Z, respectively) are

[equation missing in source]

The radius r4, the modulus of this vector, is a constant defined by the size of the mandible. It is possible to estimate r4 in the position of rest by

[equation missing in source]

LOWER LIP BEHAVIOUR

The lower lip behavior is related to the voluntary movement of the lower lip tissue (other than that caused by the mandible) necessary to produce specific speech gestures, such as lip protrusion. Lower lip behavior describes this kind of voluntary movement involved in the trajectory of the fiduciary point P3. Non-rigid body deformation can be derived from this behavior to control the animation of the lower region of the synthetic face around the mouth.

Given the trajectory of the fiduciary point P3 and the movement of the lower lip due to that of the TMJ, the lower lip behavior can be expressed by

[equation missing in source]

The distance r3 between P3 and the TMJ center can be calculated from the coordinates of P3 and the location of the center of the TMJ at rest by

[equation missing in source]

As the lower lip experiences the same variation of angle as P4 due to the rotation of the TMJ, the movement of the lower lip due to the TMJ can be expressed by

[equation missing in source]
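The decomposition described here, separating the voluntary lower lip movement from the movement imposed by the mandible, can be sketched in code. This is a hedged illustration under stated assumptions rather than the authors' equations: points are (y, z) pairs in the midsagittal plane, the TMJ angle `theta` and translation `(ty, tz)` are taken as already known, the rotation sign convention is chosen arbitrarily, and all function names are hypothetical.

```python
import math


def rest_radius(point_rest, center_rest):
    """Distance between a fiduciary point and the TMJ center at rest (r3 or r4)."""
    return math.hypot(point_rest[0] - center_rest[0],
                      point_rest[1] - center_rest[1])


def rotate_about(point, center, theta):
    """Rotate a (y, z) point about `center` by `theta` radians (assumed sign convention)."""
    dy, dz = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(theta), math.sin(theta)
    return (center[0] + c * dy - s * dz, center[1] + s * dy + c * dz)


def tmj_induced_position(p3_rest, center_rest, theta, translation):
    """Position P3 would take if it only followed the mandible (rotation, then translation)."""
    y, z = rotate_about(p3_rest, center_rest, theta)
    return (y + translation[0], z + translation[1])


def lower_lip_behavior(p3_measured, p3_rest, center_rest, theta, translation):
    """Voluntary component: measured P3 minus its TMJ-induced position."""
    y, z = tmj_induced_position(p3_rest, center_rest, theta, translation)
    return (p3_measured[0] - y, p3_measured[1] - z)
```

Because P3 is rotated by the same angle as P4, its distance to the TMJ center stays equal to the rest value r3 before the translation is added, which is the property the text relies on.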

These figures show that the analysis of the lower lip behavior correctly indicates the occurrence of lower lip protrusion. Protrusion occurs when the solid curve in the figures is to the left of the dashed one. When it is to the right, the lower lip is retracted, as in mouth spreading.

UPPER LIP BEHAVIOUR

As with lower lip behavior, upper lip behavior is associated with the voluntary movement of lip tissue. However, in contrast to what happens with the lower lip, the behavior of the upper lip associated with the fiduciary point P1 is not entangled with TMJ movement. Thus, upper lip behavior can be determined directly from the trajectory of the fiduciary point P1. Non-rigid body deformation is derived from this behavior to control the animation of the region above the mouth of the synthetic face.

VIRTUAL FACE MANIPULATION

To reproduce the movements of the mandible in the synthetic face, the rigid body transformations of rotation and translation described by TMJ behavior are applied to the polygonal vertices of the geometric model. The transformed vertices are those within the region of the face alongside the mandibular bone: more precisely, the region below and including the lower lip, and the lateral side of the face below an imaginary plane defined by the TMJ rotation axis and the corners of the mouth. The lower bound of the mandibular region is defined by the neck. The white dots shown in the figure below are the vertices of the synthetic face submitted to these transformations. The axis of rotation defined by the center of the TMJ is presented in the figure as a white line in front of the ear.

The behavior of the upper and lower lips is mapped onto the synthetic face on the basis of three main considerations. First, the points on the geometric model corresponding to the fiduciary points must closely mimic the behaviors described by the measurements. Second, during speech production, the skin tissue around the mouth, including the lips, suffers deformations primarily attributed to the sphincteral behavior of the Orbicularis Oris muscle. Third, other muscles which also influence the movement of the skin around the mouth are distributed asymmetrically with respect to the horizontal plane (with different groups of muscles inserted into the skin of the upper and lower regions of the mouth). Based on these considerations, a geometric model was derived to express the visible characteristics of the skin around the mouth during speech production.
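Applying the TMJ rigid body transformation to the mandibular vertices can be sketched as follows. This is a minimal illustration under assumptions that are not in the paper: vertices are (x, y, z) tuples, the TMJ rotation axis is taken parallel to the x axis (perpendicular to the midsagittal y-z plane), membership in the mandibular region is supplied as a precomputed list of flags, and the function name `transform_mandible` is hypothetical.

```python
import math


def transform_mandible(vertices, in_mandible, tmj_center, theta, ty, tz):
    """Rotate flagged vertices about the TMJ axis and translate them.

    vertices    -- list of (x, y, z) tuples of the face mesh
    in_mandible -- list of booleans, True where the vertex is in the mandibular region
    tmj_center  -- (cy, cz) of the TMJ center in the midsagittal plane
    theta       -- TMJ rotation angle in radians (sign convention assumed)
    ty, tz      -- TMJ translation within the midsagittal plane
    """
    cy, cz = tmj_center
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    moved = []
    for (x, y, z), moves in zip(vertices, in_mandible):
        if moves:
            dy, dz = y - cy, z - cz
            # Rigid rotation about the axis through the TMJ center, then translation.
            y = cy + cos_t * dy - sin_t * dz + ty
            z = cz + sin_t * dy + cos_t * dz + tz
        moved.append((x, y, z))
    return moved
```

Vertices outside the mandibular region are returned unchanged; the non-rigid lip deformations described in the text would be applied as a separate step.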

VISEME REPRESENTATION

The visual representation of the phonemes, or visemes, is a crucial aspect of speech synchronized realistic facial animation. The recognizable characteristics of a viseme, however, can change significantly due to the effects of coarticulation. Traditionally, these effects have been modelled by fusion functions that blend phonetic context independent "pure" visemes. Context-independent visemes, however, are visemes that represent an idealization of expected speech postures. This idealization assumes that the speech segment, and therefore its viseme, is produced in an isolated form, without suffering interference from the articulatory movements of a nearby segment. The characterization of a "pure" viseme and of its blending involves an intricate process and requires the gathering and analysis of extensive articulatory data.

VIDEO TO SPEECH

This application basically does the reverse of the Speech to Animation application. It shall convert live video streams to words, which shall be converted to phonemes and thereby to speech streams that can be relayed over the network. The images of the deaf and dumb user are converted to visemes, which are then converted to phonemes and relayed to the blind user as voice signals.

A TYPICAL HANDSET - A BLIND USER CALLS A DEAF, DUMB USER

A typical cell phone for this different echelon of users will have a camera to capture the facial expressions, especially the lip movements, of the blind user, which are then converted to visemes using the viseme representation techniques discussed here. Later, geometric models are derived for these visemes and an animated face is created using synchronized facial recognition. The audio signals of the blind speaker are used with the lip movements to generate visemes. Such a fusion is called multi-modal fusion and the system is said to be a multimodal speech recognition system. But for speech recognition, it is usually sufficient to extract the principal components of the lip information and to match the mouth openings/closings with the phonemes of speech. The deaf, dumb user at the receiving end receives the animated images of the transmitted information. The camera in his cell phone captures his lip movements and converts them to voice signals by reversing the process, thus enabling the blind user to hear the voice messages.

A SAMPLE PHONE CALL: BLIND USER (X) TO DEAF, DUMB USER (Y)

X issues a voice command to call Y.
Y receives a vibrating alert.
Upon accepting the call, the animated face opens up in Y's cell phone. The camera in Y's phone is ready as well.
X's response (speech) lands in Y and is converted to an animation stream which is played by the face in Y.
Y's response is captured by the camera and is streamed to the video to speech converter, which is then transmitted to X.
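The sample call between X and Y can be walked through as a simple sequence of events. The sketch below is only an illustration of the flow: the function `sample_call` and its event strings are hypothetical, and the two converters are passed in as plain callables because the paper does not specify their interfaces.

```python
def sample_call(speech_from_x, lip_video_from_y, speech_to_animation, video_to_speech):
    """Return the ordered events of one exchange between blind user X and deaf, dumb user Y.

    speech_to_animation -- callable standing in for the STA application on Y's handset
    video_to_speech     -- callable standing in for the Video to Speech application
    """
    events = [
        "X: voice command 'call Y'",
        "Y: vibrating alert",
        "Y: call accepted, animated face and camera ready",
    ]
    # X's speech arrives at Y and is shown as an animated face for lip reading.
    animation = speech_to_animation(speech_from_x)
    events.append(f"Y: face plays {animation}")
    # Y's lip movements are captured and relayed to X as voice.
    voice = video_to_speech(lip_video_from_y)
    events.append(f"X: hears {voice}")
    return events


if __name__ == "__main__":
    # Stub converters, for illustration only.
    for event in sample_call("hello", "lip-frames",
                             lambda speech: f"visemes({speech})",
                             lambda video: f"speech({video})"):
        print(event)
```

Keeping the converters as parameters reflects that the two directions are independent applications on the handset: Speech to Animation runs toward Y, and Video to Speech runs toward X.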

ADVANTAGES AND CHALLENGES

This recommendation is quite effective because of many factors. The main one is the availability of higher end technology that already supports these (STA) applications. Most of the applications are language independent, or in other words multilingual. Some applications provide more authentic facial movement replication than a normal eye could translate, thus rendering the application more effective. Above all, lip reading is the most effective way of communication used by the deaf and dumb. The blind person could use a less complicated gadget, or even one close to the normal gadget.

On the flip side, not all deaf and dumb users can lip read. A person deaf and dumb by birth is less likely to possess the skill of lip reading. Ideally, the user should be trained in lip reading. The video to speech converter application using phonemes is still not common. The characterization of a "pure" viseme and of its blending involves an intricate process. Moreover, the implementation of this model using poorly defined parameters can easily produce deceptive results.

CONCLUSIONS

This recommendation is highly effective in many ways. The purpose of science and technology, especially wireless communication, is that it should reach the masses. As communication engineers, it is our forthright duty to extend the usage of wireless technology to the impaired as well. The technologies mentioned in this paper already exist, and it would be very fruitful if the suggested plan is implemented. It would help this different sector of the society keep abreast of the developments in science and technology.

"Any sufficiently advanced technology is indistinguishable from magic." -Arthur C. Clarke

"For man holds in his mortal hands the power to abolish all forms of human poverty, and all forms of human life." -John Fitzgerald Kennedy

REFERENCES

[1] C. Pelachaud, N.I. Badler and M. Steedman, Generating facial expressions for speech, Cognitive Science.
[2] P.L. Jackson, The theoretical minimal unit for visual speech perception: Visemes and coarticulation.
[3] A.-P. Benguerel and M.K. Pichora-Fuller, Coarticulation effects in lipreading, Journal of Speech and Hearing Research.
[4] N.P. Erber, Auditory, visual, and auditory-visual recognition of consonants by children with normal and impaired hearing, Journal of Speech and Hearing Disorders.
[5] I. Matthews, T. Cootes, J. Bangham, S. Cox and R. Harvey, Extraction of visual features for lipreading.
[6] G. Potamianos, C. Neti, G. Gravier, A. Garg, A. Senior, Recent advances in the automatic recognition of audio-visual speech.
