SVM2


Support Vector Machine

1. Support Vector Machine:


Support Vector Machine (SVM) is a classification method based on statistical learning theory, proposed by Vapnik (1995). For simplicity we first consider binary classification and then extend the discussion to the multiclass case. Consider the classification example in the figure: we must find a straight line such that every point to its left is red and every point to its right is blue. A problem in which a straight line is used to separate the classes in this way is called linear classification.

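The linear function that discriminates between the two classes is:

$$y(\mathbf{x}) = \mathbf{w}^T \phi(\mathbf{x}) + b \tag{1}$$

where $\mathbf{w}$ is the weight vector (the normal vector of the separating hyperplane), $T$ denotes the transpose, $b$ is the bias, and $\phi(\mathbf{x})$ is the feature vector, $\phi$ being the function that maps the input space to the feature space.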

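The input data set consists of N input vectors $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ with corresponding labels $\{t_1, \dots, t_N\}$, where $t_n \in \{-1, +1\}$. A note on terminology: "data point" and "sample" both refer to an input vector $\mathbf{x}_n$; in two dimensions the separator is a straight line, while in higher dimensions it is called a hyperplane. Assume the data set is perfectly linearly separable in feature space, i.e. every sample can be assigned to its correct class. Then there exist parameter values $\mathbf{w}$ and $b$ in (1) such that $y(\mathbf{x}_n) > 0$ for points with label $t_n = +1$ and $y(\mathbf{x}_n) < 0$ for points with $t_n = -1$, so that $t_n y(\mathbf{x}_n) > 0$ for every training point.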
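The discriminant (1) and the separability condition $t_n y(\mathbf{x}_n) > 0$ can be sketched in a few lines of Python. This is only an illustration: the identity feature map, the toy data and the helper names `y` and `is_separated` are assumptions, not part of the original notes.

```python
def phi(x):
    # Assumption: identity feature map, phi(x) = x.
    return x

def y(x, w, b):
    """Linear discriminant y(x) = w^T phi(x) + b, equation (1)."""
    return sum(wi * fi for wi, fi in zip(w, phi(x))) + b

def is_separated(X, t, w, b):
    """True when t_n * y(x_n) > 0 holds for every training point."""
    return all(tn * y(xn, w, b) > 0 for xn, tn in zip(X, t))

# Toy 2-D data: class +1 on the right, class -1 on the left.
X = [(2.0, 1.0), (3.0, -1.0), (-1.0, 0.5), (-2.0, -2.0)]
t = [1, 1, -1, -1]

w, b = (1.0, 0.0), 0.0  # the vertical line x1 = 0
print(is_separated(X, t, w, b))
```

Flipping the sign of `w` swaps the two sides of the line, so the same check then fails for this data.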

SVM approaches this problem through the concept of the margin. The margin is defined as the smallest distance from the separating surface to any data point, that is, the distance from the separating surface to the closest points.

In SVM, the best separator is the one with the largest margin: many separating lines with different orientations exist, and we choose the one whose margin is largest.

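The distance from a data point $\mathbf{x}$ to the separating surface is:

$$\frac{|y(\mathbf{x})|}{\|\mathbf{w}\|}$$

Because we are considering the case in which all data points are classified correctly, $t_n y(\mathbf{x}_n) > 0$ for all n. The distance from a point $\mathbf{x}_n$ to the separating surface can therefore be written as:

$$\frac{t_n y(\mathbf{x}_n)}{\|\mathbf{w}\|} = \frac{t_n \left(\mathbf{w}^T \phi(\mathbf{x}_n) + b\right)}{\|\mathbf{w}\|} \tag{2}$$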
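As a rough illustration of (2), the following Python sketch computes the distance of each point to a candidate separator and takes the minimum as the margin. The data, the identity feature map and the function names are assumptions made for this example.

```python
import math

def signed_distance(x, tn, w, b):
    """t_n * (w^T x + b) / ||w||, equation (2) with phi(x) = x (an assumption)."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return tn * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

def margin(X, t, w, b):
    """Smallest distance from the separator to any training point."""
    return min(signed_distance(xn, tn, w, b) for xn, tn in zip(X, t))

X = [(2.0, 1.0), (3.0, -1.0), (-1.0, 0.5), (-2.0, -2.0)]
t = [1, 1, -1, -1]

m1 = margin(X, t, (1.0, 0.0), 0.0)    # separator x1 = 0
m2 = margin(X, t, (1.0, 0.0), -0.5)   # separator x1 = 0.5
m3 = margin(X, t, (2.0, 0.0), -1.0)   # same separator as m2, rescaled by 2
print(m1, m2, m3)
```

For this data the second separator has the larger margin of the two candidates, and rescaling $\mathbf{w}$ and $b$ by a common factor leaves the distance unchanged.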

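The margin is the perpendicular distance to the closest data point $\mathbf{x}_n$ in the data set, and we want to find the values of $\mathbf{w}$ and $b$ that maximize this distance. The problem to solve is therefore:

$$\arg\max_{\mathbf{w}, b} \left\{ \frac{1}{\|\mathbf{w}\|} \min_n \left[ t_n \left(\mathbf{w}^T \phi(\mathbf{x}_n) + b\right) \right] \right\} \tag{3}$$

The factor $1/\|\mathbf{w}\|$ can be taken outside the minimization over n because $\mathbf{w}$ does not depend on n. Solving this problem directly would be very complex, so we convert it into an equivalent problem that is easier to solve. Rescaling $\mathbf{w} \to \kappa\mathbf{w}$ and $b \to \kappa b$ leaves the distance from every data point to the separating surface unchanged, so it does not alter the nature of the problem. We use this freedom to set, for the point closest to the surface:

$$t_n \left(\mathbf{w}^T \phi(\mathbf{x}_n) + b\right) = 1 \tag{4}$$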

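From now on, all data points satisfy the constraints:

$$t_n \left(\mathbf{w}^T \phi(\mathbf{x}_n) + b\right) \geq 1, \quad n = 1, \dots, N \tag{5}$$

The optimization problem, which asks us to maximize $\|\mathbf{w}\|^{-1}$, is converted into minimizing $\|\mathbf{w}\|^2$, and we rewrite it as:

$$\arg\min_{\mathbf{w}, b} \frac{1}{2} \|\mathbf{w}\|^2 \tag{6}$$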

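The factor $\frac{1}{2}$ makes the later differentiation more convenient.

Lagrange multiplier theory: the problem of maximizing a function $f(\mathbf{x})$ subject to the constraint $g(\mathbf{x}) \geq 0$ is rewritten as the optimization of the Lagrangian function:

$$L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda g(\mathbf{x})$$

where $\mathbf{x}$ and $\lambda$ must satisfy the Karush-Kuhn-Tucker (KKT) conditions:

$$g(\mathbf{x}) \geq 0, \qquad \lambda \geq 0, \qquad \lambda g(\mathbf{x}) = 0$$

If $f(\mathbf{x})$ is to be minimized instead, the Lagrangian function is:

$$L(\mathbf{x}, \lambda) = f(\mathbf{x}) - \lambda g(\mathbf{x})$$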
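A one-dimensional example may make the KKT conditions more concrete. The sketch below assumes the toy problem of minimizing $f(x) = x^2$ subject to $g(x) = x - 1 \geq 0$; the problem and all function names are chosen for illustration only. Stationarity of $L(x, \lambda) = f(x) - \lambda g(x)$ gives $2x = \lambda$, and the conditions above are met at $x = 1$, $\lambda = 2$.

```python
def f(x):
    return x * x          # objective to minimize (toy choice)

def g(x):
    return x - 1.0        # inequality constraint g(x) >= 0 (toy choice)

def lagrangian(x, lam):
    """L(x, lam) = f(x) - lam * g(x), the form used when minimizing f."""
    return f(x) - lam * g(x)

def kkt_satisfied(x, lam, tol=1e-9):
    stationary = abs(2.0 * x - lam) <= tol      # dL/dx = 2x - lam = 0
    feasible = g(x) >= -tol                     # g(x) >= 0
    nonnegative = lam >= -tol                   # lam >= 0
    complementary = abs(lam * g(x)) <= tol      # lam * g(x) = 0
    return stationary and feasible and nonnegative and complementary

print(kkt_satisfied(1.0, 2.0))   # constrained minimum
print(kkt_satisfied(0.0, 0.0))   # unconstrained minimum, infeasible here
```

The unconstrained minimum $x = 0$ is rejected because it violates $g(x) \geq 0$; the constraint is active at the solution, which is why $\lambda > 0$ there.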

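To solve the problem above, we rewrite it using the Lagrangian function:

$$L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2} \|\mathbf{w}\|^2 - \sum_{n=1}^{N} a_n \left\{ t_n \left(\mathbf{w}^T \phi(\mathbf{x}_n) + b\right) - 1 \right\} \tag{7}$$

where $\mathbf{a} = (a_1, \dots, a_N)^T$ are the Lagrange multipliers.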

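Note the minus sign in the Lagrangian: we minimize with respect to $\mathbf{w}$ and $b$ and maximize with respect to $\mathbf{a}$. Setting the derivatives of $L(\mathbf{w}, b, \mathbf{a})$ with respect to $\mathbf{w}$ and $b$ to zero gives:

$$\mathbf{w} = \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n) \tag{8}$$

$$0 = \sum_{n=1}^{N} a_n t_n \tag{9}$$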

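We eliminate $\mathbf{w}$ and $b$ from $L(\mathbf{w}, b, \mathbf{a})$ by substituting (8) and (9). This leads to the optimization problem of maximizing:

$$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m) \tag{10}$$

subject to the constraints:

$$a_n \geq 0, \quad n = 1, \dots, N \tag{11}$$

$$\sum_{n=1}^{N} a_n t_n = 0 \tag{12}$$

Here the kernel function is defined as $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^T \phi(\mathbf{x}')$.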
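To make the dual problem (10)-(12) concrete, here is a small Python sketch that evaluates the dual objective and checks the constraints on a two-point toy problem. The linear kernel, the data and the function names are assumptions for illustration. For this data, maximizing (10) by hand under (12) gives $a_1 = a_2 = 0.5$, so $\mathbf{w} = (1, 0)$ by (8), and the dual value equals $\frac{1}{2}\|\mathbf{w}\|^2 = 0.5$.

```python
def linear_kernel(x, z):
    # Assumption: phi(x) = x, so k(x, z) = x^T z.
    return sum(xi * zi for xi, zi in zip(x, z))

def dual_objective(a, X, t, kernel=linear_kernel):
    """Equation (10): sum_n a_n - 1/2 sum_n sum_m a_n a_m t_n t_m k(x_n, x_m)."""
    n = len(X)
    quad = sum(a[i] * a[j] * t[i] * t[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(a) - 0.5 * quad

def is_feasible(a, t, tol=1e-9):
    """Constraints (11) and (12): a_n >= 0 and sum_n a_n t_n = 0."""
    return (all(an >= -tol for an in a)
            and abs(sum(an * tn for an, tn in zip(a, t))) <= tol)

X = [(1.0, 0.0), (-1.0, 0.0)]
t = [1, -1]
a = [0.5, 0.5]

print(is_feasible(a, t), dual_objective(a, X, t))
```

Nearby feasible choices such as `[0.4, 0.4]` or `[0.6, 0.6]` give a smaller dual value, which is consistent with `[0.5, 0.5]` being the maximizer for this toy problem.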

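We set this problem aside for now and discuss the technique for solving (10) subject to (11) and (12) later. To classify a new data point with the trained model, we evaluate the sign of $y(\mathbf{x})$ from (1), substituting $\mathbf{w}$ from (8):

$$y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b \tag{13}$$

subject to the following KKT conditions:

$$a_n \geq 0 \tag{14}$$

$$t_n y(\mathbf{x}_n) - 1 \geq 0 \tag{15}$$

$$a_n \left\{ t_n y(\mathbf{x}_n) - 1 \right\} = 0 \tag{16}$$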

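Thus for every data point, either $a_n = 0$ or $t_n y(\mathbf{x}_n) = 1$. Data points with $a_n = 0$ do not appear in (13) and therefore contribute nothing to the prediction for a new data point. The remaining data points ($a_n > 0$) are called support vectors; they satisfy $t_n y(\mathbf{x}_n) = 1$, so they are the points lying on the margin of the hyperplane in feature space.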

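The support vectors are what SVM training is really concerned with: the classification of a new data point depends only on the support vectors. Suppose we have solved problem (10) and found the multipliers $\mathbf{a}$. We now need to determine the parameter $b$ from the support vectors $\mathbf{x}_n$, which satisfy $t_n y(\mathbf{x}_n) = 1$. Substituting (13):

$$t_n \left( \sum_{m \in S} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) + b \right) = 1 \tag{17}$$

where S is the set of support vectors. Although substituting a single support vector $\mathbf{x}_n$ is enough to find $b$, for numerical stability we compute $b$ as an average over all support vectors. First multiply (17) by $t_n$ (noting that $t_n^2 = 1$); the value of $b$ is then:

$$b = \frac{1}{N_S} \sum_{n \in S} \left( t_n - \sum_{m \in S} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) \right) \tag{18}$$

where $N_S$ is the total number of support vectors.

So far, to simplify the presentation, we have assumed that the data points are perfectly separable in the feature space $\phi(\mathbf{x})$. Insisting on perfect separation can lead to poor generalization: in practice some samples may have been mislabeled during data collection, and forcing a perfect separation makes the model overfit.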
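Before moving to the non-separable case, the hard-margin quantities in (13) and (18) can be checked on a toy problem. The sketch below assumes a linear kernel and a hand-solved data set in which the first two points are support vectors with $a_n = 0.5$ and the third point has $a_n = 0$; the data and the function names are illustrative assumptions.

```python
def linear_kernel(x, z):
    # Assumption: phi(x) = x, so k(x, z) = x^T z.
    return sum(xi * zi for xi, zi in zip(x, z))

def bias_from_support_vectors(a, X, t, kernel=linear_kernel, eps=1e-9):
    """Equation (18): average of t_n - sum_m a_m t_m k(x_n, x_m) over S."""
    S = [n for n in range(len(X)) if a[n] > eps]   # support vectors: a_n > 0
    total = 0.0
    for n in S:
        total += t[n] - sum(a[m] * t[m] * kernel(X[n], X[m]) for m in S)
    return total / len(S)

def predict(x, a, b, X, t, kernel=linear_kernel, eps=1e-9):
    """Sign of y(x) in equation (13); only support vectors contribute."""
    score = sum(a[n] * t[n] * kernel(x, X[n])
                for n in range(len(X)) if a[n] > eps) + b
    return 1 if score >= 0 else -1

# Hand-solved toy problem: w = (1, 0), b = -1, margins at x1 = 0 and x1 = 2.
X = [(2.0, 0.0), (0.0, 0.0), (4.0, 2.0)]
t = [1, -1, 1]
a = [0.5, 0.5, 0.0]   # the third point is not a support vector

b = bias_from_support_vectors(a, X, t)
print(b, predict((3.0, 5.0), a, b, X, t), predict((0.5, -1.0), a, b, X, t))
```

Removing the third training point would not change `b` or any prediction, which is the sense in which only the support vectors matter.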

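To counter overfitting, we allow a few points to be misclassified. To do this, we introduce slack variables $\xi_n \geq 0$, $n = 1, \dots, N$, one for each data point: $\xi_n = 0$ for points that lie on the margin or on the correct side of it, and $\xi_n = |t_n - y(\mathbf{x}_n)|$ for the remaining points. Thus points lying on the decision boundary $y(\mathbf{x}_n) = 0$ have $\xi_n = 1$, and misclassified points have $\xi_n > 1$.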

Constraint (5) is rewritten as:

$$t_n y(\mathbf{x}_n) \geq 1 - \xi_n, \quad n = 1, \dots, N \tag{20}$$

Our goal is now to maximize the margin while softly penalizing the points that violate it. The quantity to minimize becomes:

$$C \sum_{n=1}^{N} \xi_n + \frac{1}{2} \|\mathbf{w}\|^2 \tag{21}$$

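Here $C > 0$ controls the trade-off between the slack variable penalty and the margin. We now need to minimize (21) subject to the constraints (20) and $\xi_n \geq 0$. The corresponding Lagrangian is:

$$L(\mathbf{w}, b, \boldsymbol{\xi}, \mathbf{a}, \boldsymbol{\mu}) = \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} a_n \left\{ t_n y(\mathbf{x}_n) - 1 + \xi_n \right\} - \sum_{n=1}^{N} \mu_n \xi_n \tag{22}$$

where $\{a_n \geq 0\}$ and $\{\mu_n \geq 0\}$ are the Lagrange multipliers.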
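The slack variables and the soft-margin objective (21) can be evaluated directly for a fixed separator, as in the following Python sketch. The separator, the four data points and the function names are assumptions chosen so that each case described above (outside the margin, inside the margin, on the decision boundary, misclassified) appears once.

```python
def y(x, w, b):
    # Assumption: phi(x) = x.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack(tn, yn):
    """xi_n = 0 on or beyond the correct margin, |t_n - y(x_n)| otherwise."""
    return 0.0 if tn * yn >= 1 else abs(tn - yn)

def soft_margin_objective(w, xis, C):
    """Equation (21): C * sum_n xi_n + 1/2 ||w||^2."""
    return C * sum(xis) + 0.5 * sum(wi * wi for wi in w)

w, b = (1.0, 0.0), 0.0
X = [(2.0, 0.0), (0.5, 0.0), (0.0, 1.0), (1.0, 0.0)]
t = [1, 1, -1, -1]

xis = [slack(tn, y(xn, w, b)) for xn, tn in zip(X, t)]
print(xis, soft_margin_objective(w, xis, C=1.0))
```

With these values every point satisfies constraint (20) with equality or better, and increasing `C` makes the slack term dominate the objective.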

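The KKT conditions to be satisfied are:

$$a_n \geq 0 \tag{23}$$

$$t_n y(\mathbf{x}_n) - 1 + \xi_n \geq 0 \tag{24}$$

$$a_n \left( t_n y(\mathbf{x}_n) - 1 + \xi_n \right) = 0 \tag{25}$$

$$\mu_n \geq 0 \tag{26}$$

$$\xi_n \geq 0 \tag{27}$$

$$\mu_n \xi_n = 0 \tag{28}$$

for n = 1, ..., N. Setting the derivatives of (22) with respect to $\mathbf{w}$, $b$ and $\{\xi_n\}$ to zero gives:

$$\frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{n=1}^{N} a_n t_n \phi(\mathbf{x}_n) \tag{29}$$

$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{N} a_n t_n = 0 \tag{30}$$

$$\frac{\partial L}{\partial \xi_n} = 0 \;\Rightarrow\; a_n = C - \mu_n \tag{31}$$

Substituting (29), (30) and (31) into (22) gives:

$$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m) \tag{32}$$

From (23), (26) and (31) we have $a_n \leq C$.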

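The optimization problem is identical to the perfectly separable case; only the constraints differ:

$$0 \leq a_n \leq C \tag{33}$$

$$\sum_{n=1}^{N} a_n t_n = 0 \tag{34}$$

Substituting (29) into (1), we see that the prediction for a new data point has the same form as (13). As before, the points with $a_n = 0$ contribute nothing to the prediction for a new data point.

The remaining points form the support vectors. They have $a_n > 0$ and, by (25), satisfy:

$$t_n y(\mathbf{x}_n) = 1 - \xi_n \tag{35}$$

If $a_n < C$, then (31) gives $\mu_n > 0$, and (28) implies $\xi_n = 0$: these are the points lying on the margin.

Points with $a_n = C$ may be correctly classified points lying between the margin and the decision boundary if $\xi_n \leq 1$, or misclassified points if $\xi_n > 1$. To determine the parameter $b$ in (1) we use the support vectors with $0 < a_n < C$, which have $\xi_n = 0$ and therefore $t_n y(\mathbf{x}_n) = 1$:

$$t_n \left( \sum_{m \in S} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) + b \right) = 1 \tag{36}$$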

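Once again, for numerical stability we compute $b$ as an average:

$$b = \frac{1}{N_M} \sum_{n \in M} \left( t_n - \sum_{m \in S} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) \right) \tag{37}$$

where M is the set of points with $0 < a_n < C$ and $N_M$ is the number of points in M. To solve (10) and (32) we use the Sequential Minimal Optimization (SMO) algorithm, introduced by Platt in 1999.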
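To give a feel for how SMO works, the sketch below implements a simplified variant in pure Python. It is not Platt's full algorithm: the pair selection heuristics are replaced by a plain deterministic scan, and the kernel, tolerances, iteration caps and function names are all assumptions. Each step picks a multiplier that violates the KKT conditions, pairs it with a second one, and optimizes (32) over that pair analytically while keeping (33) and (34) satisfied.

```python
def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def output(i, a, b, t, K):
    """y(x_i) as in equation (13), using the precomputed Gram matrix K."""
    return sum(a[m] * t[m] * K[i][m] for m in range(len(a))) + b

def simplified_smo(X, t, C=10.0, tol=1e-3, max_passes=5, max_sweeps=1000,
                   kernel=linear_kernel):
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    a, b = [0.0] * n, 0.0
    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            E_i = output(i, a, b, t, K) - t[i]
            violates = ((t[i] * E_i < -tol and a[i] < C)
                        or (t[i] * E_i > tol and a[i] > 0))
            if not violates:
                continue
            for j in range(n):          # deterministic scan for a partner
                if j == i:
                    continue
                E_j = output(j, a, b, t, K) - t[j]
                # Box limits that keep 0 <= a <= C and sum_n a_n t_n fixed.
                if t[i] != t[j]:
                    lo, hi = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
                else:
                    lo, hi = max(0.0, a[i] + a[j] - C), min(C, a[i] + a[j])
                eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
                if lo == hi or eta >= 0:
                    continue
                aj = min(hi, max(lo, a[j] - t[j] * (E_i - E_j) / eta))
                if abs(aj - a[j]) < 1e-8:
                    continue
                ai = a[i] + t[i] * t[j] * (a[j] - aj)
                b1 = (b - E_i - t[i] * (ai - a[i]) * K[i][i]
                      - t[j] * (aj - a[j]) * K[i][j])
                b2 = (b - E_j - t[i] * (ai - a[i]) * K[i][j]
                      - t[j] * (aj - a[j]) * K[j][j])
                if 0 < ai < C:
                    b = b1
                elif 0 < aj < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                a[i], a[j] = ai, aj
                changed += 1
                break
        passes = passes + 1 if changed == 0 else 0
    return a, b

X = [(1.0, 0.0), (-1.0, 0.0)]
t = [1, -1]
a, b = simplified_smo(X, t)
print(a, b)
```

On this two-point problem a single pair update already reaches the solution $a_1 = a_2 = 0.5$, $b = 0$. For real data a library implementation is the safer choice, since convergence of this simplified scan is not guaranteed to be fast.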

2. MultiClass SVMs:

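We now consider the multiclass case with K > 2 classes. A K-class classifier can be built by combining several two-class discriminants, but this leads to some difficulties (Duda and Hart, 1973). In the one-versus-the-rest approach, K-1 binary classifiers are used to build the K-class classifier. In the one-versus-one approach, K(K-1)/2 binary classifiers are used. Both approaches produce ambiguous regions in the classification (as shown in the figure). We can avoid this problem by building the K-class classifier from K linear functions of the form:

$$y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$$

A point $\mathbf{x}$ is assigned to class $C_k$ if $y_k(\mathbf{x}) > y_j(\mathbf{x})$ for all $j \neq k$.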
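The K-function decision rule above amounts to an argmax over the K scores. The following Python sketch illustrates it with three hand-picked linear functions; the weights, the test points and the function names are assumptions for the example.

```python
def classify(x, W, w0):
    """Assign x to the class k whose y_k(x) = w_k^T x + w_k0 is largest."""
    scores = [sum(wi * xi for wi, xi in zip(wk, x)) + bk
              for wk, bk in zip(W, w0)]
    return max(range(len(scores)), key=scores.__getitem__)

def num_one_vs_one(K):
    """Number of binary classifiers in the one-versus-one approach."""
    return K * (K - 1) // 2

# Three classes in 2-D, one weight vector and one offset per class.
W = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]
w0 = [0.0, 0.0, 0.0]

print(classify((2.0, 0.5), W, w0),
      classify((-1.0, 0.2), W, w0),
      classify((0.1, 3.0), W, w0))
print(num_one_vs_one(4))
```

Because every point receives exactly one winning score (ties aside), this rule leaves no ambiguous region, unlike voting among separate binary classifiers.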

Another approach, proposed by Wu (2004), estimates class probabilities for multiclass classification.

3. Application to text classification:


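Implementation guide: Description of a document's feature vector: a vector whose dimensionality equals the number of features in the whole data set, where the features are pairwise distinct. A component has value 1 if the document contains the corresponding feature and 0 otherwise.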
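The binary feature vector described above can be sketched as follows. The notes do not say how features are extracted, so this example assumes that a feature is a lowercase whitespace-separated token; the sample documents and function names are likewise assumptions.

```python
def build_vocabulary(documents):
    """Map each distinct feature in the data set to a vector index."""
    # Assumption: a feature is a lowercase, whitespace-separated token.
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def to_binary_vector(document, vocab):
    """1 where the document contains the feature, 0 otherwise."""
    vec = [0] * len(vocab)
    for tok in set(document.lower().split()):
        if tok in vocab:            # features unseen in training are ignored
            vec[vocab[tok]] = 1
    return vec

docs = ["cheap pills buy now",
        "meeting agenda for monday",
        "buy cheap tickets now"]
vocab = build_vocabulary(docs)

print(len(vocab))
print(to_binary_vector(docs[0], vocab))
```

Repeated tokens still produce a 1, since the representation records presence rather than frequency.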

Implementing an SVM from scratch is fairly complex, so it is better to use existing libraries such as LibSVM or SVMLight. The algorithm has two phases, training and classification:
1. Training:
Input: the feature vectors of the documents in the training set (an M x N matrix, where M is the number of feature vectors in the training set and N is the number of features per vector); the label/class of each training feature vector; the SVM model parameters: C and the kernel parameter (for example γ for the commonly used Gaussian kernel).
Output: the SVM model (the support vectors, the Lagrange multipliers a, the parameter b).
2. Classification:
Input: the feature vector of the document to classify; the SVM model.
Output: the label/class of the document.
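As a sketch of the training input, the Python below converts labelled binary feature vectors into the sparse text format read by LibSVM (`label index:value ...`, with 1-based indices) and shows a Gaussian kernel with parameter γ. The function names, file names and parameter values are assumptions; the command lines in the comments use LibSVM's `svm-train` and `svm-predict` tools and should be checked against the installed version's README.

```python
import math

def to_libsvm_line(label, vector):
    """Format one sample as 'label index:value ...' (1-based, zeros omitted)."""
    pairs = [f"{i + 1}:{v}" for i, v in enumerate(vector) if v != 0]
    return " ".join([str(label)] + pairs)

def gaussian_kernel(x, z, gamma):
    """k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

vectors = [[0, 1, 1, 0, 0, 0, 1, 1, 0],
           [1, 0, 0, 1, 1, 1, 0, 0, 0]]
labels = [1, -1]

lines = [to_libsvm_line(lab, vec) for lab, vec in zip(labels, vectors)]
print("\n".join(lines))

# Hypothetical usage once the lines are saved to train.txt:
#   svm-train -t 2 -c 1 -g 0.5 train.txt model.txt    (training phase)
#   svm-predict test.txt model.txt predictions.txt    (classification phase)
```

The sparse format suits text data well, because most features are absent from any single document.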

4. References:


[1] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer (2007).
