
Tạp chí Khoa học & Công nghệ (Journal of Science & Technology) - No. 1(45), Vol. 2, 2008

BAYES' RULE AND ITS APPLICATION TO SOLVING RECOGNITION PROBLEMS
Tu Trung Hieu (Thuyloi University, Đại học Thủy lợi)

1. Bayes' rule


Intuitively, if we find an image E that resembles a symbol H we already know, we conclude
that E is an image of H. But when E could plausibly resemble either H1 or H2, we need
additional information. One example is how often H1 and H2 occur: we choose whichever symbol
is more frequent. Another is to look at the images neighboring E and decide whether H1 or H2
fits them better. This is exactly what Bayes states in his rule.
$$
p(H \mid E) = \frac{p(E \mid H)\, p(H)}{p(E)}
$$
Thus the probability that hypothesis H is correct given evidence E, the quantity p(H|E),
depends on how well E matches H, the quantity p(E|H), on how often H occurs, the quantity
p(H), and on the nature of E itself, the quantity p(E). To choose the best hypothesis for a
given E, we pick the H* with the highest p(H*|E). This is the same as picking the largest
p(E|H).p(H), because p(E) is fixed for a given E.
$$
H^{*} = \arg\max_{H_k} p(H_k \mid E)
      = \arg\max_{H_k} \frac{p(E \mid H_k)\, p(H_k)}{p(E)}
      = \arg\max_{H_k} p(E \mid H_k)\, p(H_k)
$$

Take voice dialing as an example. The user speaks a sound segment A, and the machine must
find the name N* that best matches the sound it has just received. Assume the machine stores
the names N1, N2, ..., NK in its phone book. It hypothesizes that N1 could be A, that N2 could
be A, and so on, so it must evaluate every hypothesis, that is, all of the following quantities
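$$
\begin{aligned}
p(N_1 \mid A) &\propto p(A \mid N_1)\, p(N_1) = \mathrm{equal}(N_1, A) \cdot \mathrm{freq}(N_1) \\
p(N_2 \mid A) &\propto p(A \mid N_2)\, p(N_2) = \mathrm{equal}(N_2, A) \cdot \mathrm{freq}(N_2) \\
&\;\;\vdots \\
p(N_K \mid A) &\propto p(A \mid N_K)\, p(N_K) = \mathrm{equal}(N_K, A) \cdot \mathrm{freq}(N_K)
\end{aligned}
$$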
Here equal(Nk, A) measures the similarity between Nk and A. The more Nk resembles A, the
closer this measure gets to 1; the more Nk differs from A, the closer it gets to 0. The
machine then selects the Nk with the largest p(Nk | A). When all names are equally likely to
occur, that is, when all p(Nk) are equal, the probability that Nk is A reduces to how well Nk
matches A. This is a special case of Bayes' rule in which the frequencies of the hypotheses
contribute nothing to recognition.
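A small worked example may help show how the frequency term can change the decision. The numbers below are hypothetical and chosen only for illustration; they do not come from the paper.

```latex
\begin{aligned}
\mathrm{equal}(N_1, A) \cdot \mathrm{freq}(N_1) &= 0.6 \times 0.1 = 0.06 \\
\mathrm{equal}(N_2, A) \cdot \mathrm{freq}(N_2) &= 0.5 \times 0.3 = 0.15
\end{aligned}
```

Under these assumed values N2 is selected, even though N1 matches the sound slightly better, because N2 is dialed three times as often. If both names had the same frequency, the comparison would reduce to 0.6 versus 0.5 and N1 would be selected.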
2. Recognizing a single symbol
In recognition, a symbol usually denotes an independent unit that can be fed into
comparisons to produce a recognition result. In speech recognition, a symbol usually
corresponds to a syllable. In handwriting recognition, a symbol can be a single character,
if the word can be split into characters, or a handwritten word made of several connected
characters.
To recognize a single symbol we need a dictionary D of recognition templates. This dictionary
is built during training. We assume that D is enumerable, meaning that it supports the
operator size(D) for the size of the dictionary and item(k, D) for the k-th template in D.
The recognition procedure is then as follows
Step 1) Initialize kmax = -1; pmax = 0;
Step 2) For each item(k, D) in the dictionary, compute the quantity pk
pk = equal(item(k, D), V) * freq( item(k, D) );
Step 3) Update kmax and pmax if pk is greater than pmax
Step 4) Return the value of kmax that was found
This search procedure returns -1 when the dictionary is empty; otherwise it returns a kmax
between 0 and size(D)-1 that identifies the most probable template. If we set a threshold for
recognition, the procedure also returns -1 when pmax is smaller than that threshold.
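The four steps above can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: the names `recognize` and `threshold`, and the toy templates at the end, are assumptions made for illustration. The functions `equal` and `freq` are passed in, since the paper leaves their definition to the template type.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")  # template type stored in the dictionary
V = TypeVar("V")  # type of the object being recognized


def recognize(
    dictionary: Sequence[T],
    observation: V,
    equal: Callable[[T, V], float],
    freq: Callable[[T], float],
    threshold: float = 0.0,  # hypothetical name for the rejection threshold
) -> int:
    """Return the index of the best template, or -1 if none qualifies."""
    k_max, p_max = -1, 0.0                        # step 1
    for k, item in enumerate(dictionary):         # step 2
        p_k = equal(item, observation) * freq(item)
        if p_k > p_max:                           # step 3
            k_max, p_max = k, p_k
    if p_max < threshold:                         # optional rejection
        return -1
    return k_max                                  # step 4


# Toy data (invented): each template is a (center, frequency) pair and the
# similarity decays exponentially with the distance to the center.
templates = [(0.0, 0.2), (1.0, 0.5), (5.0, 0.3)]
sim = lambda m, v: math.exp(-abs(m[0] - v))
prior = lambda m: m[1]
```

With these toy values, `recognize(templates, 0.9, sim, prior)` selects index 1, an empty dictionary yields -1, and a high threshold such as 0.9 rejects every template.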
In this recognition method the dictionary D holds many elements, and we use the expression
item(k, D) to fetch the k-th one. Each element is a template (model), and recognition
essentially compares the object V to be recognized against the templates in the dictionary.
From a programming point of view, a template is any data structure that supports the two
operators equal and freq described above. Below we introduce some basic elements that can
serve as templates.
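The simplest form of template is M = (μ, σ, ρ), where μ is a vector called the center of the
template, σ is a positive real number giving the radius of the template, and ρ gives the
probability that the template occurs. We can therefore define the function equal as follows

$$
\mathrm{equal}(V, M) = \exp\left(-\frac{(V - \mu)^2}{2\sigma^2}\right)
\quad \text{and} \quad \mathrm{freq}(M) = \rho
$$

Training this template means computing the three parameters μ, σ, ρ from the corresponding
training data. These are ordinary statistical operations: μ is the mean of the training
samples, σ is the largest distance between μ and the samples, and ρ is the number of samples
centered on μ divided by the total number of samples.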
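The training and matching of this simple template can be sketched as below. The class name `Template`, the field names `mu`, `sigma`, `rho`, the helper `distance`, and the use of Euclidean distance are assumptions for illustration; the paper only states which statistics are computed.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Template:
    mu: List[float]  # center: mean of the training samples
    sigma: float     # radius: largest distance from the center to a sample
    rho: float       # frequency: this template's share of all samples


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(samples: Sequence[Sequence[float]], total_samples: int) -> Template:
    """Estimate (mu, sigma, rho) from the samples of one template."""
    n, dim = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(dim)]
    sigma = max(distance(mu, s) for s in samples)
    return Template(mu, sigma, n / total_samples)


def equal(v: Sequence[float], m: Template) -> float:
    """Similarity in (0, 1]: 1 at the center, approaching 0 far away."""
    d2 = distance(v, m.mu) ** 2
    if m.sigma == 0.0:  # degenerate template: all samples identical
        return 1.0 if d2 == 0.0 else 0.0
    return math.exp(-d2 / (2.0 * m.sigma ** 2))


def freq(m: Template) -> float:
    return m.rho
```

For example, training on the two samples `[0, 0]` and `[2, 0]` out of four samples in total gives a center of `[1, 0]`, a radius of 1, and a frequency of 0.5.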
The statistical HMM is also commonly used as a recognition element. An HMM usually has three
parameters λ = (A, B, π), described in [3, 2, 4]. We can compute the quantity
equal(V, λ) = p(V|λ) with the evaluation algorithm, and we can store the statistical
information p(λ) as in the previous case. Training is carried out with the Baum-Welch
algorithm.
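One standard way to compute p(V|λ) is the forward algorithm described in Rabiner's tutorial [3]. The sketch below assumes a discrete-observation HMM in which `A[i][j]` is the transition probability from state i to state j, `B[i][o]` is the probability that state i emits symbol o, and `pi[i]` is the initial probability of state i. The function name and the list-of-lists representation are choices made here, not taken from the paper.

```python
from typing import List, Sequence


def forward_likelihood(
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    pi: Sequence[float],
    observations: Sequence[int],
) -> float:
    """Return p(V | lambda) for lambda = (A, B, pi) by the forward recursion."""
    n_states = len(pi)
    if not observations:
        return 1.0  # the empty sequence is certain
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha: List[float] = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: sum over the final states
    return sum(alpha)
```

This version multiplies raw probabilities, so it can underflow on long sequences; practical systems usually rescale alpha at each step or work with logarithms.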
3. Recognizing discrete symbol sequences

A symbol sequence usually denotes an ordered series of symbols joined one after another, for
example a sequence of spoken syllables, a series of words written on one line, or a series
of consecutive images in a film clip.


A discrete symbol sequence (connected symbol sequence) is a symbol sequence in which the
symbols are separated by distinguishable gaps. In recognition, a gap is itself a symbol,
usually a region of the signal that carries no energy. Because a discrete symbol sequence
can be split into isolated symbols, recognizing it reduces to recognizing single symbols.
Nevertheless, let us look at the recognition algorithm, which has the following steps
Step 1) Split the symbol sequence into separate symbols
Step 2) Apply the single-symbol recognition algorithm to find candidates for each symbol;
each symbol gets a set of candidates with the highest hypothesis probabilities.
Step 3) Use context or language information to select the sentence that is most likely
to occur.
Consider the simplest example, recognizing the sequence of handwritten symbols below, to
make the algorithm for discrete symbols clear. In this example we split the line of writing
into three symbols and recognize three corresponding sets of words. Choosing a sentence from
the three word sets requires language information, or more specifically the frequency with
which a sentence occurs. We will find that "Tôi đi chơi" or "Tôi đi chợ" is very likely, but
we will not find "Tôi đi chết" or "Tra đi chợ", because these sentences occur very rarely or
never in Vietnamese.

Symbol 1    Symbol 2    Symbol 3
Tôi         đi          chơi
Thi         ti          ch
Ta          si          ch
Tra                     chết
Language information is usually stored in one of two common forms: a language model, or a
grammar together with its equivalent formalisms. A language model [2, 5, 6] is a statistical
tool that computes the probability of a sentence in the language. Common sentences have a
high frequency, while ungrammatical or rare sentences have a probability close to zero. A
language model captures the syntactic, semantic, and pragmatic regularities of the language
in statistical form. A grammar [7, 8, 11] and its equivalent forms capture the syntax of the
language. A grammar consists of exact rules for combining symbols and cannot be generated
automatically the way statistical regularities can, so grammars that reflect the language
information have to be compiled by hand.
A language model is usually stored as a bigram model, in which each word has a probability
p(W) of starting a sentence and a probability p(Wnext | Wprev) of following a given word.
The sentence above is then scored as follows, assuming three symbols Ttri, dsi, chci that
correspond to the three unknown images. We compute quantities such as the ones below and
choose the sentence with the highest probability. For example, we compute
equal(Tôi, Ttri) . equal(đi, dsi) . equal(chơi, chci) . p(Tôi) . p(đi | Tôi) . p(chơi | đi)
equal(Ta, Ttri) . equal(ti, dsi) . equal(ch, chci) . p(Ta) . p(ti | Ta) . p(ch | ti)


Here the equal functions measure how well each image matches the model of each word, and
the probability terms that follow them come from the language model. This is Bayes' rule
applied to a whole sentence, p(sentence | image) ∝ p(image | sentence) . p(sentence), except
that the sentence is divided into words and the image is divided into individual symbols.
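The sentence score described above can be sketched as an exhaustive search over the candidate words of each symbol. Everything concrete here is an assumption for illustration: the function name `best_sentence`, the ASCII spellings that stand in for the candidate words, and all similarity and bigram values, which are invented rather than taken from the paper.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def best_sentence(
    candidates: List[Dict[str, float]],           # per image: word -> equal(word, image)
    start_prob: Dict[str, float],                 # p(W) for the first word
    bigram_prob: Dict[Tuple[str, str], float],    # p(W_next | W_prev)
) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Return the word sequence with the highest equal * bigram score."""
    best, best_score = None, 0.0
    if not candidates:
        return best, best_score
    for words in product(*(c.keys() for c in candidates)):
        # Language-model part: p(w1) * p(w2 | w1) * ...
        score = start_prob.get(words[0], 0.0)
        for prev, nxt in zip(words, words[1:]):
            score *= bigram_prob.get((prev, nxt), 0.0)
        # Matching part: product of equal(word, image)
        for word, cand in zip(words, candidates):
            score *= cand[word]
        if score > best_score:
            best, best_score = words, score
    return best, best_score


# Invented toy values; ASCII spellings stand in for the Vietnamese words.
images = [
    {"toi": 0.8, "tra": 0.7},
    {"di": 0.9},
    {"choi": 0.6, "cho": 0.7, "chet": 0.8},
]
start = {"toi": 0.05, "tra": 0.001}
bigrams = {
    ("toi", "di"): 0.1, ("tra", "di"): 0.001,
    ("di", "choi"): 0.05, ("di", "cho"): 0.04, ("di", "chet"): 0.0001,
}
sentence, score = best_sentence(images, start, bigrams)
```

In this toy setup the third image matches "chet" best on shape alone, yet the language model makes that sentence so unlikely that a different word wins. Enumerating every combination grows exponentially with sentence length, so a dynamic-programming search would be the usual replacement for longer inputs.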
4. Recognizing continuous symbol sequences
A continuous symbol sequence is a symbol sequence for which we have no information that
separates the symbols into single words. In other words, the gaps between symbols do not
exist or are too small to detect, so we cannot split the sequence into words. Examples are
words spoken continuously in a news broadcast or football commentary, or words written so
densely and continuously on a line that they cannot be split into single words.
When the symbol sequence cannot be split, we have to process the whole sequence and treat it
as a single object, or a single symbol. There are two common approaches to recognizing
continuous symbol sequences. The first is to search for the sequence in the space of template
sequences. This can be a dictionary search, as in single-symbol recognition, but it can also
use an optimal-sequence algorithm such as the Viterbi algorithm [2, 3] to find the state
sequence that best matches the sequence to be recognized. The second is bottom-up synthesis,
using the bottom-up parsers presented in [9, 10, 11, 12] to produce a tree structure that
contains the nonterminals and the words of the language. This approach requires a grammar to
be compiled so that the parsers can operate.
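For the first approach, the Viterbi search for the best state sequence can be sketched as below for a discrete-observation HMM. As before, `A[i][j]`, `B[i][o]`, and `pi[i]` are assumed to hold transition, emission, and initial probabilities; the function name and data layout are illustrative choices, and the paper itself only refers to [2, 3] for the algorithm.

```python
from typing import List, Sequence, Tuple


def viterbi(
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    pi: Sequence[float],
    observations: Sequence[int],
) -> Tuple[List[int], float]:
    """Return the most probable state path and its joint probability."""
    n = len(pi)
    if not observations:
        return [], 1.0
    # delta[i]: probability of the best path that ends in state i
    delta = [pi[i] * B[i][observations[0]] for i in range(n)]
    back: List[List[int]] = []  # back[t][j]: best predecessor of state j
    for o in observations[1:]:
        prev_best = [max(range(n), key=lambda i: delta[i] * A[i][j]) for j in range(n)]
        delta = [delta[prev_best[j]] * A[prev_best[j]][j] * B[j][o] for j in range(n)]
        back.append(prev_best)
    # Backtrack from the best final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for prev_best in reversed(back):
        path.append(prev_best[path[-1]])
    path.reverse()
    return path, delta[last]
```

Unlike the forward recursion, which sums over all paths, this recursion keeps only the maximum at each step, so the result is the single best alignment of states to observations. Long sequences would normally be handled with log probabilities to avoid underflow.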
Conclusion
Bayes' rule is the basis for determining the probability of a hypothesis given the evidence.
When a piece of data S has to be recognized, we hypothesize that S may match any of the
previously known data templates M1, M2, ..., MK. We then choose the best hypothesis by
estimating the probability of each hypothesis with Bayes' rule. Bayes' rule also extends to
recognizing symbol sequences. In that case the prior probability, that is, the probability
that a word or a sentence occurs, is determined by language information, or more
specifically by the language model.
A grammar is an alternative source of language information. Grammar rules are very strict,
but they have to be compiled by hand. The statistical rules of a language model can be
generated automatically, and they also reflect the syntax, semantics, and pragmatics of
sentences in the language.
Abstract
Studies of recognition that use stochastic statistical methods usually apply Bayes' rule to
compute the probabilities of the hypotheses and select the hypothesis with the highest
probability as the recognition result. In this paper we introduce several forms of Bayes'
rule and its application to different recognition problems. Along the way we also introduce
concepts such as pattern space, language model, grammar, and hidden Markov model.
Keywords: Bayesian rule, speech recognition, handwriting recognition, language model, hidden Markov
model, context-free grammar.


Summary
Bayesian rule and its application to solve recognition problems
Tu Trung Hieu - { tutrunghieu@gmail.com }
Research on recognition with a stochastic approach usually uses the Bayesian rule to evaluate the
probabilities of hypotheses and selects the hypothesis with the maximum probability as the recognition
result. In this paper, we introduce the Bayesian rule and its application to different
recognition problems. In addition, we introduce some recognition concepts, such as pattern space,
language model, grammar, and hidden Markov model.

References

[1] E. T. Jaynes (2003), Probability Theory: The Logic of Science, Cambridge University Press.
[2] Steve Young, Dan Kershaw, Julian Odell, Dave Ollason, Valtcho Valtchev, Phil Woodland (2000),
The HTK Book.
[3] Lawrence R. Rabiner (1989), A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition, Proceedings of the IEEE, 77 (2), pp. 257-286.
[4] Gernot A. Fink, Thomas Plötz (2007), Markov Models for Handwriting Recognition, ICDAR
2007 Tutorial, Curitiba, Brazil.
[5] Fei Song, W. Bruce Croft (1999), A General Language Model for Information Retrieval.
[6] Jay M. Ponte, W. Bruce Croft (1998), A Language Modeling Approach to Information Retrieval.
[7] Jean-Michel Autebert, Jean Berstel, Luc Boasson (1997), Context-Free Languages and Push-Down
Automata.
[8] J. E. Hopcroft, J. D. Ullman (1979), Introduction to Automata Theory, Languages, and
Computation, Addison-Wesley.
[9] Philippe McLean, R. Nigel Horspool (1996), A Faster Earley Parser.
[10] Mark Hepple (1999), An Earley-style Predictive Chart Parsing Method for Lambek Grammars.
[11] Alon Lavie, Masaru Tomita (1993), GLR* An Efficient Noise-skipping Parsing Algorithm For
Context Free Grammars.
[12] J. C. Chappelier, M. Rajman (1998), A generalized CYK algorithm for parsing stochastic CFG.
