
Word Segmentation Algorithms

Contents

1. Approaches
2. Methods used
3. Applications of word segmentation
4. A popular algorithm
5. Further issues

Word segmentation is the process of identifying the boundaries of words in a sentence. Put simply, it is the process of picking out the individual words that make up a sentence. In language processing, determining the grammatical structure of a sentence or the part of speech of a word in it first requires knowing which units of the sentence are words. This looks simple to humans, but for computers it is a very hard problem.

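For this reason, word segmentation is considered an important processing step for Natural Language Processing systems. This is especially true for Asian languages such as Chinese, Japanese, Thai, and Vietnamese, in which spaces do not mark word boundaries. In inflectional languages such as English, word boundaries are largely just whitespace. In these languages, by contrast, syllables are closely bound to one another, and a single word may consist of one or more syllables. For example, the Vietnamese word "học sinh" ("student") is made of two syllables. For these languages, the core of the word segmentation problem is therefore resolving ambiguity in word boundaries.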

1. Approaches

In general, there are three main approaches to word segmentation:
- Dictionary-based approaches, relying on a fixed dictionary.
- Purely statistical approaches.
- Hybrid approaches that combine the two.
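To make the dictionary-based approach concrete, here is a minimal sketch of greedy left-to-right longest matching over Vietnamese syllables (space-separated tokens). The function name `longest_matching`, the toy dictionary and the four-syllable `max_len` limit are illustrative assumptions and do not come from any particular tool. Syllables that match nothing fall back to one-syllable words.

```python
def longest_matching(syllables, dictionary, max_len=4):
    """Greedy left-to-right longest matching (illustrative sketch).

    At each position, take the longest run of up to `max_len` syllables
    that is in `dictionary`; a single syllable is always accepted, so
    out-of-dictionary syllables become one-syllable words.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first and shrink until a match is found.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


# Toy dictionary (hypothetical): "học sinh" = student, "sinh học" = biology.
DICTIONARY = {"học", "sinh", "học sinh", "sinh học"}

print(longest_matching("học sinh học sinh học".split(), DICTIONARY))
# ['học sinh', 'học sinh', 'học']
```

The intended reading of "học sinh học sinh học" is "học sinh | học | sinh học" ("students study biology"). The greedy choice at the third syllable takes "học sinh" and commits to a wrong split. This is the boundary ambiguity described above, and it is why methods that compare whole segmentations are also used.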

2. Methods used

- Longest Matching
- Maximum Matching
- Hidden Markov Models (HMM)
- Transformation-based Learning (TBL)
- Weighted Finite State Transducers (WFST)
- Maximum Entropy (ME)
- Support Vector Machines (SVM)
- Conditional Random Fields (CRFs)

These methods can also be combined.

3. Applications of word segmentation

Word segmentation is the first, basic step for the following language processing tasks:
- Morphological analysis
  - Affix analysis
  - Named entity recognition
  - Phrase boundary detection
- Syntactic analysis (parsing)
  - Part-of-speech tagging
  - Phrase boundary tagging
  - Syntactic relation tagging
- Text processing
  - Spell checking
  - Grammar checking
  - Text classification
  - Text summarization
  - Text understanding
  - Text mining

Supporting resources:
- A Vietnamese dictionary
- A word-segmented Vietnamese corpus to support training

4. A popular algorithm

For Japanese, the most popular algorithm is the "minimum weight" method. It reduces the problem to a graph search as follows:
1. Create two virtual nodes, start and end (the beginning and the end of the sentence).
2. Compare every substring of the sentence, of any length, against an existing dictionary.
3. Each substring found in the dictionary becomes a new node in the graph.
4. The weight between two nodes is given by a function f(i, j), where i and j are the two words. Edges only join substrings that are adjacent in the sentence.
5. Find the path from start to end with the smallest total weight.
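The five steps above can be sketched as a small Python program. This is a minimal illustration under stated assumptions, not the implementation of any Japanese tool:

- Vietnamese syllables play the role of characters.
- The weight f(i, j) from step 4 is assumed to be the negative log of an interpolated bigram/unigram probability, since the text gives no formula.
- That probability is estimated from a tiny made-up segmented corpus.

The search is a Viterbi-style dynamic program: for each position and last word, it keeps only the cheapest path so far. The names `train_cost`, `segment_min_cost` and `lam`, and the corpus itself, are hypothetical.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # virtual start/end nodes (step 1)


def train_cost(segmented_corpus, lam=0.7):
    """Build f(prev, nxt) = -log(lam * P(nxt | prev) + (1 - lam) * P(nxt)).

    Probabilities are relative frequencies in a word-segmented corpus;
    the interpolation weight `lam` is an assumed value.
    """
    unigrams, bigrams, history = Counter(), Counter(), Counter()
    for words in segmented_corpus:
        padded = [START] + words + [END]
        unigrams.update(padded[1:])
        for prev, nxt in zip(padded, padded[1:]):
            bigrams[(prev, nxt)] += 1
            history[prev] += 1
    total = sum(unigrams.values())

    def f(prev, nxt):
        p_bi = bigrams[(prev, nxt)] / history[prev] if history[prev] else 0.0
        p = lam * p_bi + (1 - lam) * unigrams[nxt] / total
        # Words never seen in the corpus get infinite cost (no smoothing).
        return -math.log(p) if p > 0 else math.inf
    return f


def segment_min_cost(syllables, dictionary, f, max_len=4):
    """Cheapest start-to-end path through the word lattice (steps 2-5)."""
    n = len(syllables)
    # best[pos][word] = (cost, previous word) of the cheapest path whose
    # last node is `word`, ending just before syllable index `pos`.
    best = [{} for _ in range(n + 1)]
    best[0][START] = (0.0, None)
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            word = " ".join(syllables[start:end])
            # Steps 2-3: only dictionary substrings become nodes.
            if word not in dictionary or not best[start]:
                continue
            # Step 4: edge weight f(prev, word) between adjacent nodes.
            best[end][word] = min(
                (cost + f(prev, word), prev) for prev, (cost, _) in best[start].items()
            )
    if not best[n]:
        return None  # part of the sentence is not covered by the dictionary
    # Step 5: close the path at END, then follow back-pointers to START.
    total, word = min((cost + f(w, END), w) for w, (cost, _) in best[n].items())
    words, pos = [], n
    while word != START:
        words.append(word)
        prev = best[pos][word][1]
        pos -= len(word.split())
        word = prev
    return list(reversed(words)), total


CORPUS = [  # tiny, made-up word-segmented corpus
    ["học sinh", "học", "sinh học"],
    ["học sinh", "đọc", "sách"],
    ["sinh học", "khó"],
]
DICTIONARY = {"học", "sinh", "học sinh", "sinh học"}

f = train_cost(CORPUS)
words, cost = segment_min_cost("học sinh học sinh học".split(), DICTIONARY, f)
print(words)  # ['học sinh', 'học', 'sinh học']
```

With these counts, the path "học sinh | học | sinh học" is the cheapest. The bigram term is what separates it from "học sinh | học sinh | học" and "học | sinh học | sinh học". The lone word "sinh" never occurs in the toy corpus, so every path through it gets infinite cost; a real system would smooth such estimates.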

In step 4, f(i, j) is usually computed from unigram values (how likely a single word is to occur) and bigram values (how likely two words are to occur next to each other). Other factors, such as part of speech and how likely two parts of speech are to follow each other, are also used in f. In the past, these values were usually set by hand. The exception was the unigram and bigram values, which were taken from corpus statistics. Thanks to machine learning methods such as Hidden Markov Models and CRFs, these values are now usually computed automatically. In step 5, the minimum-weight path from start to end is usually found with the Viterbi algorithm, with complexity O(n), where n is …

5. Further issues

For Vietnamese, a word segmentation tool was developed in the VLSP project: http://www.loria.fr/~lehong/tools/vnTokenizer.php. Its accuracy reaches 97%. As far as I know, similar tools for Japanese (JUMAN, MeCab) reach up to 99%, so there is still much work to do.
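The text does not say how the 97% and 99% figures were measured. One common convention for segmentation is word-level precision, recall and F1. Under it, a predicted word counts as correct only if both of its boundaries match a hand-segmented reference. Below is a minimal sketch under that assumption; the helper names `word_spans` and `word_prf` are hypothetical.

```python
def word_spans(words):
    """Map a segmentation (words of space-separated syllables) to syllable spans."""
    spans, pos = set(), 0
    for word in words:
        size = len(word.split())
        spans.add((pos, pos + size))
        pos += size
    return spans


def word_prf(gold, predicted):
    """Word-level precision, recall and F1: a word is correct only if
    both of its boundaries match the gold segmentation."""
    g, p = word_spans(gold), word_spans(predicted)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


gold = ["học sinh", "học", "sinh học"]
predicted = ["học sinh", "học sinh", "học"]  # a wrong split of the same syllables
print(word_prf(gold, predicted))  # all three values are about 0.333
```

Only "học sinh" at the start matches both boundaries, so one of three predicted words and one of three gold words are correct.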

Another problem that arises in word segmentation is new words, meaning words that are not defined in the dictionary. This problem cannot be ignored: language is always changing and producing new words, while a dictionary naturally cannot keep up with all of them. Research on Japanese and Chinese handles this problem quite well. Given the grammatical closeness of those languages to Vietnamese, that research could be applied to Vietnamese.
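One simple way to keep dictionary-driven segmentation from failing outright on new words is sketched below. Any syllable that is not in the dictionary is still added to the word lattice as a one-syllable "unknown word" node. It carries a large penalty, so it is only chosen when nothing else fits. This is a simplified stand-in, not the technique used in the Japanese or Chinese research mentioned above. `build_lattice` and the penalty value 10.0 are hypothetical, and the loanword "online" serves as the example of a new word.

```python
def build_lattice(syllables, dictionary, max_len=4, unknown_cost=10.0):
    """Return lattice edges as (start, end, word, penalty) tuples.

    Dictionary words get penalty 0. Any single syllable missing from the
    dictionary is added as an 'unknown word' node with a large penalty,
    so every position has at least one outgoing edge and a start-to-end
    path always exists.
    """
    edges = []
    n = len(syllables)
    for start in range(n):
        for length in range(1, min(max_len, n - start) + 1):
            word = " ".join(syllables[start:start + length])
            if word in dictionary:
                edges.append((start, start + length, word, 0.0))
        if syllables[start] not in dictionary:
            edges.append((start, start + 1, syllables[start], unknown_cost))
    return edges


DICTIONARY = {"học", "sinh", "học sinh", "sinh học"}
for edge in build_lattice("học sinh học online".split(), DICTIONARY):
    print(edge)
```

A new word of several syllables would still come out as single syllables here. Recognizing it as one unit needs more than this fallback.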
