
Artificial Intelligence

Nguyễn Nhật Quang

quangnn-fit@mail.hut.edu.vn
School of Information and Communication Technology
Hanoi University of Science and Technology
Academic year 2009-2010

Course contents:

Introduction to Artificial Intelligence
Agents
Problem solving: search, constraint satisfaction
Logic and inference
Knowledge representation
Reasoning with uncertain knowledge
Machine learning
  Introduction to machine learning
  Naive Bayes classification
  Nearest-neighbor learning
Planning

Introduction to machine learning

Definitions of machine learning
  A process by which a system improves its performance [Simon, 1983]
  A process by which a computer program improves its performance at a task through experience [Mitchell, 1997]
  Programming computers to optimize a performance criterion using example data or past experience [Alpaydin, 2004]

Formulating a machine learning problem [Mitchell, 1997]
  Machine learning = improving performance at a task through experience
    A task T
    With respect to a performance measure P
    Through (using) experience E

Examples of machine learning problems (1)

Filtering Web pages according to a user's interests
  T: Predict (in order to filter) which Web pages a user would like to read
  P: Percentage (%) of Web pages predicted correctly
  E: A set of Web pages the user marked as interesting and a set of Web pages the user marked as not interesting

[Figure: "Interested?"]

Examples of machine learning problems (2)

Classifying Web pages by topic
  T: Classify Web pages into predefined topics
  P: Percentage (%) of Web pages classified correctly
  E: A set of Web pages, each labeled with a topic

[Figure: "Which cat.?"]

Examples of machine learning problems (3)

Handwriting recognition
  T: Recognize and classify the words in images of handwriting
  P: Percentage (%) of words recognized and classified correctly
  E: A set of handwriting images, each labeled with the identity of a word

[Figure: "Which word?", with the handwritten words "we do in the right way"]

Examples of machine learning problems (4)

Autonomous driving robot
  T: A robot (equipped with observation cameras) drives autonomously on a highway
  P: Average distance the robot drives autonomously before an error (accident) occurs
  E: A set of examples recorded while observing a human driving on the highway, where each example consists of a sequence of images and the corresponding steering commands

[Figure: "Which steering command?" Go straight, Move left, Move right, Slow down, Speed up]

The machine learning process

The dataset is divided into three parts, each used at a different stage:
  Training set: used to train the system
  Validation set: used to optimize (tune) the system's parameters
  Test set: used to test the learned system
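The three-way split described in this slide can be sketched in Python as follows. This is only an illustration: the 60/20/20 proportions, the function name `split_dataset` and the fixed shuffle seed are assumptions, not something the slides prescribe.

```python
import random

def split_dataset(examples, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle the examples and cut them into training, validation and test sets."""
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                    # used to fit the system
    valid = shuffled[n_train:n_train + n_valid]   # used to tune its parameters
    test = shuffled[n_train + n_valid:]           # held out for the final evaluation
    return train, valid, test

train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))
```

Keeping the test set untouched until the end is what makes its accuracy an honest estimate of performance on future examples.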

Supervised vs. unsupervised learning

Supervised learning
  Each training example has 2 parts: a description (representation) of the example, and its class label (or desired output value)
  Classification problem:
    D_train = {(<representation_of_x>, <class_label_of_x>)}
  Prediction/regression problem:
    D_train = {(<representation_of_x>, <output_value_of_x>)}

Unsupervised learning
  Each training example contains only the description (representation) of the example, with no information about its class label or desired output value
  Clustering problem:
    D_train = {(<representation_of_x>)}

The machine learning problem: Main components (1)

Choosing the training (learning) examples
  Is the training feedback contained directly in the training examples, or is it provided indirectly (e.g. by the operating environment)?
  Are the training examples supervised or unsupervised?
  The training examples should be compatible with (representative of) the examples the system will face in the future (future test examples)

Defining the target function (hypothesis, concept) to be learned
  F: X → {0,1}
  F: X → {a set of class labels}
  F: X → R+ (the domain of positive real values)

The machine learning problem: Main components (2)

Choosing a representation for the target function to be learned
  A polynomial function
  A set of rules
  A decision tree
  An artificial neural network

Choosing a learning algorithm that can learn (approximate) the target function
  Regression-based learning
  Rule induction
  Decision tree learning (ID3 or C4.5)
  Back-propagation learning

Issues in machine learning (1)

Learning algorithm
  Which learning algorithms can learn (approximate) a given target function?
  Under what conditions will a chosen learning algorithm converge (asymptotically) to the target function?
  For a specific problem domain and a specific representation of the examples (objects), which learning algorithm performs best?

Issues in machine learning (2)

Training examples
  How many training examples are enough?
  How does the size of the training set affect the accuracy of the learned target function?
  How do erroneous (noisy) examples and/or examples with missing attribute values affect accuracy?

Issues in machine learning (3)

Learning process
  What is the optimal strategy for choosing the order in which the training examples are used (exploited)?
  How do these selection strategies change the complexity of the learning problem?
  How can problem-specific knowledge (beyond the training examples) contribute to the learning process?

Issues in machine learning (4)

Learning capability and limits
  Which target function should the system learn?
    Representing the target function: representational power (e.g. linear vs. nonlinear functions) vs. the complexity of the learning algorithm and learning process
  What are the (theoretical) limits on what learning algorithms can learn?
  How well can the system generalize from the training examples?
    To avoid over-fitting (high accuracy on the training set but low accuracy on the test set)
  Can the system automatically change (adapt) its internal representation (structure)?
    To improve its ability to represent and learn the target function

Over-fitting (1)

A learned target function (hypothesis) h is said to over-fit a training set if there exists another target function h' such that:
  h' fits the training set worse than h (it is less accurate on the training set), but
  h' is more accurate than h on the entire dataset (including the examples used after training)

Over-fitting is usually caused by:
  Errors (noise) in the training set, introduced while collecting or building the dataset
  Too few training examples, so that they do not represent the whole set (distribution) of examples of the learning problem

Over-fitting (2)

Let D be the set of all examples, and D_train the set of training examples

Let Err_D(h) be the error of hypothesis h on D, and Err_D_train(h) the error of h on D_train

Hypothesis h over-fits the training set D_train if there exists another hypothesis h' such that:
$$\mathrm{Err}_{D_{train}}(h) < \mathrm{Err}_{D_{train}}(h') \quad \text{and} \quad \mathrm{Err}_{D}(h) > \mathrm{Err}_{D}(h')$$
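The definition above is a pair of inequalities, so it can be checked mechanically once the four error values are known. The sketch below is a direct transcription; the hypothesis names and error rates are invented purely for illustration and do not come from the slides.

```python
def overfits(h, h_alt, err_train, err_all):
    """Return True if h over-fits relative to h_alt.

    err_train and err_all map a hypothesis name to its error on D_train and on D.
    """
    return err_train[h] < err_train[h_alt] and err_all[h] > err_all[h_alt]

# Hypothetical error rates, invented for illustration only.
err_train = {"deep_tree": 0.02, "shallow_tree": 0.10}
err_all = {"deep_tree": 0.25, "shallow_tree": 0.15}

print(overfits("deep_tree", "shallow_tree", err_train, err_all))
```

In practice the error on the full set D is unknown and is usually estimated on held-out data, which is an extra assumption on top of the definition.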

Over-fitting (3)

Among the hypotheses (target functions) that can be learned, which one generalizes best from the training examples?
  Note: the goal of machine learning is to achieve high accuracy when predicting future examples, not the training examples

[Figure: plot of f(x). Which target function f(x) achieves the highest accuracy on future examples?]

Occam's razor: prefer the simplest target function that fits (not necessarily perfectly) the training examples
  Generalizes better
  Easier to explain/interpret
  Lower computational complexity

Over-fitting: Example

Continuing the decision tree learning process reduces accuracy on the test set even though accuracy on the training set keeps increasing [Mitchell, 1997]

Naive Bayes classification

A family of supervised classification methods based on probability
  Based on a probabilistic model (function)
  Classification is based on the probabilities of the candidate hypotheses

One of the machine learning methods most commonly used in practical problems

Based on Bayes' theorem

Bayes' theorem

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

  P(h): prior probability that hypothesis (class) h is true
  P(D): prior probability that the dataset D is observed
  P(D|h): probability of observing the dataset D, given that hypothesis h is true
  P(h|D): probability that hypothesis h is true, given that the dataset D is observed

Bayes' theorem: Example (1)

Consider the following dataset [Mitchell, 1997]:

Day  Outlook   Temperature  Humidity  Wind    Play Tennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes

Bayes' theorem: Example (2)

  Dataset D: the days on which Outlook is Sunny and Wind is Strong
  Hypothesis (class) h: he plays tennis
  Prior probability P(h): the probability that he plays tennis (regardless of the Outlook and Wind attributes)
  Prior probability P(D): the probability of a day on which Outlook is Sunny and Wind is Strong
  P(D|h): the probability of a day on which Outlook is Sunny and Wind is Strong, given that he plays tennis
  P(h|D): the probability that he plays tennis, given that Outlook is Sunny and Wind is Strong

Naive Bayes classification is based on this conditional (posterior) probability!
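As a worked instance, the quantities above can be read off the 12-day table in Example (1). The slides do not carry out this computation; estimating each probability by its relative frequency in the table is an assumption made here. Eight of the 12 days are "Yes", two days (D2 and D11) are Sunny with Strong wind, and one of the eight "Yes" days (D11) is Sunny with Strong wind:

$$P(h) = \frac{8}{12}, \qquad P(D) = \frac{2}{12}, \qquad P(D \mid h) = \frac{1}{8}$$

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)} = \frac{(1/8)\,(8/12)}{2/12} = \frac{1}{2}$$

This agrees with a direct count: of the two Sunny, Strong-wind days, exactly one is a "Yes" day.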

Maximizing the posterior probability (MAP)

Given a set H of candidate hypotheses (classes), the learning system looks for the most probable hypothesis h (h ∈ H) given the observed data D

This hypothesis h is called the maximum a posteriori (MAP) hypothesis

$$h_{MAP} = \arg\max_{h \in H} P(h \mid D)$$

$$h_{MAP} = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} \quad \text{(by Bayes' theorem)}$$

$$h_{MAP} = \arg\max_{h \in H} P(D \mid h)\,P(h) \quad \text{(} P(D) \text{ is the same for all hypotheses } h \text{)}$$

MAP: Example

The set H contains 2 candidate hypotheses
  h1: he plays tennis
  h2: he does not play tennis

Compute the 2 conditional probabilities P(h1|D) and P(h2|D)

The most probable hypothesis is h_MAP = h1 if P(h1|D) ≥ P(h2|D); otherwise h_MAP = h2

Because P(D) = P(D,h1) + P(D,h2) is the same for both hypotheses h1 and h2, the quantity P(D) can be ignored

So we only need to compute the 2 expressions P(D|h1).P(h1) and P(D|h2).P(h2), and decide accordingly
  If P(D|h1).P(h1) ≥ P(D|h2).P(h2), conclude that he plays tennis
  Otherwise, conclude that he does not play tennis

Maximum likelihood estimation (MLE)

MAP method: given a set H of candidate hypotheses, find a hypothesis that maximizes P(D|h).P(h)

Assumption in maximum likelihood estimation (MLE): all hypotheses have the same prior probability: P(h_i) = P(h_j), ∀ h_i, h_j ∈ H

The MLE method finds the hypothesis that maximizes P(D|h), where P(D|h) is called the likelihood of the data D given h

Maximum likelihood hypothesis:

$$h_{MLE} = \arg\max_{h \in H} P(D \mid h)$$

MLE: Example

The set H contains 2 candidate hypotheses
  h1: he plays tennis
  h2: he does not play tennis
  D: the dataset (the days) on which Outlook is Sunny and Wind is Strong

Compute the 2 likelihood values of the data D for the 2 hypotheses, P(D|h1) and P(D|h2)
  P(Outlook=Sunny, Wind=Strong|h1) = 1/8
  P(Outlook=Sunny, Wind=Strong|h2) = 1/4

The MLE hypothesis is h_MLE = h1 if P(D|h1) ≥ P(D|h2); otherwise h_MLE = h2
  Because P(Outlook=Sunny, Wind=Strong|h1) < P(Outlook=Sunny, Wind=Strong|h2), the system concludes: he will not play tennis!
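The MAP and MLE decisions for this example can be reproduced from the 12-day table with a short Python sketch. Estimating probabilities by relative frequency, and the helper names `prior` and `likelihood`, are assumptions made for illustration. On this table the two MAP scores come out equal, so the "≥" rule from the MAP slide selects h1, while MLE selects h2.

```python
from fractions import Fraction

# (Outlook, Wind, PlayTennis) for days D1..D12 of the table in Example (1).
DAYS = [
    ("Sunny", "Weak", "No"), ("Sunny", "Strong", "No"), ("Overcast", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Rain", "Weak", "Yes"), ("Rain", "Strong", "No"),
    ("Overcast", "Strong", "Yes"), ("Sunny", "Weak", "No"), ("Sunny", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Sunny", "Strong", "Yes"), ("Overcast", "Strong", "Yes"),
]

def prior(label):
    """P(h): fraction of all days carrying this label."""
    return Fraction(sum(1 for d in DAYS if d[2] == label), len(DAYS))

def likelihood(outlook, wind, label):
    """P(D | h): fraction of the days with this label that match outlook and wind."""
    in_class = [d for d in DAYS if d[2] == label]
    hits = sum(1 for d in in_class if d[0] == outlook and d[1] == wind)
    return Fraction(hits, len(in_class))

hypotheses = ["Yes", "No"]  # h1: he plays tennis, h2: he does not play tennis
mle_scores = {h: likelihood("Sunny", "Strong", h) for h in hypotheses}
map_scores = {h: mle_scores[h] * prior(h) for h in hypotheses}

# max() keeps the first maximal element, which mirrors the ">=" rule favouring h1 on ties.
h_mle = max(hypotheses, key=lambda h: mle_scores[h])
h_map = max(hypotheses, key=lambda h: map_scores[h])
print(mle_scores, map_scores, h_mle, h_map)
```

The difference between the two answers comes entirely from the prior: "Yes" days are twice as frequent as "No" days, which offsets the lower likelihood under h1.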

Naive Bayes classification (1)

Formulating the classification problem
  A training set D_train, in which each training example x is represented as an n-dimensional vector (x_1, x_2, ..., x_n)
  A predefined set of class labels C = {c_1, c_2, ..., c_m}
  For a (new) example z, which class should z be assigned to?

Goal: determine the most probable (most suitable) class for z

$$c_{MAP} = \arg\max_{c_i \in C} P(c_i \mid z)$$

$$c_{MAP} = \arg\max_{c_i \in C} P(c_i \mid z_1, z_2, \ldots, z_n)$$

$$c_{MAP} = \arg\max_{c_i \in C} \frac{P(z_1, z_2, \ldots, z_n \mid c_i)\,P(c_i)}{P(z_1, z_2, \ldots, z_n)} \quad \text{(by Bayes' theorem)}$$

Naive Bayes classification (2)

To find the most probable class for z:

$$c_{MAP} = \arg\max_{c_i \in C} P(z_1, z_2, \ldots, z_n \mid c_i)\,P(c_i) \quad \text{(} P(z_1, z_2, \ldots, z_n) \text{ is the same for all classes)}$$

Assumption of Naive Bayes classification: the attributes are conditionally independent given the class

$$P(z_1, z_2, \ldots, z_n \mid c_i) = \prod_{j=1}^{n} P(z_j \mid c_i)$$

Naive Bayes classification finds the most probable class for z:

$$c_{NB} = \arg\max_{c_i \in C} P(c_i) \prod_{j=1}^{n} P(z_j \mid c_i)$$

Naive Bayes classification: Algorithm

Training phase, using a training set
  For each possible class (class label) c_i ∈ C:
    Compute the prior probability P(c_i)
    For each attribute value x_j, compute the probability of that attribute value given class c_i: P(x_j|c_i)

Classification phase, for a new example z
  For each class c_i ∈ C, compute the value of the expression (x_j here is the value of attribute j in the new example):

$$P(c_i) \prod_{j=1}^{n} P(x_j \mid c_i)$$

  Assign z to the most probable class c*:

$$c^{*} = \arg\max_{c_i \in C} P(c_i) \prod_{j=1}^{n} P(x_j \mid c_i)$$

Naive Bayes classification: Example (1)

Will a young student with a medium income and a fair credit rating buy a computer?

Rec. ID  Age     Income  Student  Credit_Rating  Buy_Computer
1        Young   High    No       Fair           No
2        Young   High    No       Excellent      No
3        Medium  High    No       Fair           Yes
4        Old     Medium  No       Fair           Yes
5        Old     Low     Yes      Fair           Yes
6        Old     Low     Yes      Excellent      No
7        Medium  Low     Yes      Excellent      Yes
8        Young   Medium  No       Fair           No
9        Young   Low     Yes      Fair           Yes
10       Old     Medium  Yes      Fair           Yes
11       Young   Medium  Yes      Excellent      Yes
12       Medium  Medium  No       Excellent      Yes
13       Medium  High    Yes      Fair           Yes
14       Old     Medium  No       Excellent      No

Source: http://www.cs.sunysb.edu/~cse634/lecture_notes/07classification.pdf

Naive Bayes classification: Example (2)

Formulating the classification problem
  z = (Age=Young, Income=Medium, Student=Yes, Credit_Rating=Fair)
  There are 2 possible classes: c1 (buys a computer) and c2 (does not buy a computer)

Compute the prior probability of each class
  P(c1) = 9/14
  P(c2) = 5/14

Compute the probability of each attribute value given each class
  P(Age=Young|c1) = 2/9;           P(Age=Young|c2) = 3/5
  P(Income=Medium|c1) = 4/9;       P(Income=Medium|c2) = 2/5
  P(Student=Yes|c1) = 6/9;         P(Student=Yes|c2) = 1/5
  P(Credit_Rating=Fair|c1) = 6/9;  P(Credit_Rating=Fair|c2) = 2/5

Naive Bayes classification: Example (3)

Compute the likelihood of example z for each class
  For class c1:
    P(z|c1) = P(Age=Young|c1).P(Income=Medium|c1).P(Student=Yes|c1).P(Credit_Rating=Fair|c1) = (2/9).(4/9).(6/9).(6/9) = 0.044
  For class c2:
    P(z|c2) = P(Age=Young|c2).P(Income=Medium|c2).P(Student=Yes|c2).P(Credit_Rating=Fair|c2) = (3/5).(2/5).(1/5).(2/5) = 0.019

Determine the most probable class
  For class c1: P(c1).P(z|c1) = (9/14).(0.044) = 0.028
  For class c2: P(c2).P(z|c2) = (5/14).(0.019) = 0.007

Conclusion: he (z) will buy a computer!
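The training and classification phases of the algorithm can be sketched in Python and run on the 14-record table of Example (1). The function names `train` and `score` and the use of exact fractions are choices made here for illustration; probabilities are estimated by plain relative frequencies, with no smoothing.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Rows of the Buy_Computer table: Age, Income, Student, Credit_Rating, then the class label.
ROWS = [
    ("Young", "High", "No", "Fair", "No"), ("Young", "High", "No", "Excellent", "No"),
    ("Medium", "High", "No", "Fair", "Yes"), ("Old", "Medium", "No", "Fair", "Yes"),
    ("Old", "Low", "Yes", "Fair", "Yes"), ("Old", "Low", "Yes", "Excellent", "No"),
    ("Medium", "Low", "Yes", "Excellent", "Yes"), ("Young", "Medium", "No", "Fair", "No"),
    ("Young", "Low", "Yes", "Fair", "Yes"), ("Old", "Medium", "Yes", "Fair", "Yes"),
    ("Young", "Medium", "Yes", "Excellent", "Yes"), ("Medium", "Medium", "No", "Excellent", "Yes"),
    ("Medium", "High", "Yes", "Fair", "Yes"), ("Old", "Medium", "No", "Excellent", "No"),
]

def train(rows):
    """Training phase: count classes and, per class, the values of each attribute."""
    class_count = Counter(row[-1] for row in rows)
    value_count = defaultdict(Counter)  # (class, attribute index) -> value counts
    for *values, label in rows:
        for j, v in enumerate(values):
            value_count[(label, j)][v] += 1
    return class_count, value_count

def score(z, label, class_count, value_count):
    """Classification phase: P(c) * prod_j P(z_j | c) for one class."""
    s = Fraction(class_count[label], sum(class_count.values()))
    for j, v in enumerate(z):
        s *= Fraction(value_count[(label, j)][v], class_count[label])
    return s

class_count, value_count = train(ROWS)
z = ("Young", "Medium", "Yes", "Fair")
scores = {c: score(z, c, class_count, value_count) for c in class_count}
best = max(scores, key=scores.get)
print({c: round(float(s), 3) for c, s in scores.items()}, best)
```

Rounded to three decimals, the two scores match the 0.028 and 0.007 obtained by hand in the example.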

Naive Bayes classification: Issues (1)

If no training example of class c_i has the attribute value x_j, then P(x_j|c_i) = 0, and therefore:

$$P(c_i) \prod_{j=1}^{n} P(x_j \mid c_i) = 0$$

Solution: use a Bayesian approach to estimate P(x_j|c_i)

$$P(x_j \mid c_i) = \frac{n(c_i, x_j) + m\,p}{n(c_i) + m}$$

  n(c_i): the number of training examples of class c_i
  n(c_i,x_j): the number of training examples of class c_i that have the attribute value x_j
  p: a prior estimate of the probability P(x_j|c_i)
    Uniform estimate: p = 1/k, where the attribute f_j has k possible values
  m: a weight
    It augments the n(c_i) actually observed examples with m additional virtual examples distributed according to the estimate p
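The smoothed estimate above (often called the m-estimate) is simple to express in code. In this sketch the function name `m_estimate` and the example counts are hypothetical, and the uniform prior p = 1/k from the slide is assumed.

```python
from fractions import Fraction

def m_estimate(n_c_x, n_c, k, m):
    """(n(c, x) + m * p) / (n(c) + m) with the uniform prior p = 1/k."""
    p = Fraction(1, k)
    return (Fraction(n_c_x) + m * p) / (n_c + m)

# Hypothetical counts: the value never occurs among 5 examples of the class,
# and the attribute has k = 3 possible values.
print(m_estimate(0, 5, k=3, m=3))  # smoothed estimate, no longer zero
print(m_estimate(0, 5, k=3, m=0))  # m = 0 falls back to the raw relative frequency
```

Choosing m = k with p = 1/k adds exactly one virtual example per attribute value, which is the familiar add-one (Laplace) smoothing.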

Naive Bayes classification: Issues (2)

Limited precision of computer arithmetic
  P(x_j|c_i) < 1 for every attribute value x_j and class c_i
  So when the number of attribute values is very large:

$$\lim_{n \to \infty} \prod_{j=1}^{n} P(x_j \mid c_i) = 0$$

Solution: take logarithms of the probabilities

$$c_{NB} = \arg\max_{c_i \in C} \log\left[ P(c_i) \prod_{j=1}^{n} P(x_j \mid c_i) \right]$$

$$c_{NB} = \arg\max_{c_i \in C} \left[ \log P(c_i) + \sum_{j=1}^{n} \log P(x_j \mid c_i) \right]$$
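The underflow problem and the logarithm fix can be observed directly with double-precision floats. The 400 probabilities of 0.1 below are made-up values chosen only to force underflow.

```python
import math

probs = [0.1] * 400  # hypothetical per-attribute probabilities P(x_j | c_i)

product = 1.0
for p in probs:
    product *= p  # the true value 1e-400 is below the smallest positive double

log_score = sum(math.log(p) for p in probs)  # roughly 400 * log(0.1), easily representable

print(product, log_score)
```

Because the logarithm is monotonically increasing, comparing classes by the sum of logs yields the same arg max as comparing the products.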

Text classification with Naive Bayes (1)

Formulating the text classification problem
  A training set D_train, in which each training example is a document representation labeled with a class: D = {(d_k, c_i)}
  A predefined set of class labels: C = {c_i}

Training phase
  From the documents in D_train, extract the set of keywords (terms): T = {t_j}
  Let D_c_i (⊆ D_train) be the set of documents in D_train with class label c_i
  For each class c_i:
    Compute the prior probability of class c_i:

$$P(c_i) = \frac{|D_{c_i}|}{|D|}$$

    For each keyword t_j, compute the probability that t_j occurs given class c_i, where n(d_k,t_j) is the number of occurrences of keyword t_j in document d_k:

$$P(t_j \mid c_i) = \frac{\left( \sum_{d_k \in D_{c_i}} n(d_k, t_j) \right) + 1}{\left( \sum_{d_k \in D_{c_i}} \sum_{t_m \in T} n(d_k, t_m) \right) + |T|}$$

Text classification with Naive Bayes (2)

To classify a new document d

Classification phase
  From document d, extract the set T_d of keywords t_j that are defined in T (T_d ⊆ T)
  Assumption: the probability that keyword t_j occurs given class c_i is independent of the position of the keyword in the document
    P(t_j at position k|c_i) = P(t_j at position m|c_i), ∀ k, m
  For each class c_i, compute the likelihood of document d for c_i:

$$P(c_i) \prod_{t_j \in T_d} P(t_j \mid c_i)$$

  Assign document d to the class c*:

$$c^{*} = \arg\max_{c_i \in C} P(c_i) \prod_{t_j \in T_d} P(t_j \mid c_i)$$
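The two phases of Naive Bayes text classification can be sketched as below. The corpus, the class names and the function names are invented for illustration. Documents are assumed to be already reduced to lists of keywords, and the score is computed in log space, which changes the numbers but not the arg max.

```python
import math
from collections import Counter

def train_text_nb(docs):
    """docs: list of (keyword_list, label). Returns priors, P(t | c) tables and the vocabulary T."""
    vocab = {t for words, _ in docs for t in words}
    doc_count = Counter(label for _, label in docs)
    term_count = {c: Counter() for c in doc_count}
    for words, label in docs:
        term_count[label].update(words)  # accumulates n(d_k, t_j) over the class
    priors = {c: doc_count[c] / len(docs) for c in doc_count}
    cond = {}
    for c in doc_count:
        total = sum(term_count[c].values())
        # Add-one smoothing: (count + 1) / (total count + |T|).
        cond[c] = {t: (term_count[c][t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond, vocab

def classify_text(words, priors, cond, vocab):
    """Pick the class maximizing log P(c) + sum of log P(t | c) over t in T_d."""
    t_d = set(words) & vocab  # keywords of d that are defined in T

    def log_score(c):
        return math.log(priors[c]) + sum(math.log(cond[c][t]) for t in t_d)

    return max(priors, key=log_score)

# Tiny invented corpus, for illustration only.
docs = [
    (["goal", "match", "team"], "sport"),
    (["team", "coach", "goal"], "sport"),
    (["stock", "market", "price"], "finance"),
    (["market", "bank", "price"], "finance"),
]
priors, cond, vocab = train_text_nb(docs)
print(classify_text(["goal", "team", "price"], priors, cond, vocab))
```

Words of the new document that are not in T are simply ignored, matching the definition of T_d as a subset of T.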

Nearest-neighbor learning

Other names for nearest-neighbor learning
  Instance-based learning
  Lazy learning
  Memory-based learning

The idea of nearest-neighbor learning
  Given a set of training examples:
    (Simply) store the training examples
    There is no need to build an explicit, general model (description) of the target function to be learned
  For an example to be classified/predicted:
    Examine the relationship between that example and the training examples in order to assign the value of the target function (a class label, or a real value)

Nearest-neighbor learning: Problem setting

Input representation
  Each example x is represented as an n-dimensional vector in the vector space X ⊆ R^n
  x = (x_1, x_2, ..., x_n), where each x_i (∈ R) is a real number

We consider 2 kinds of learning problem
  Classification
    Learn a discrete-valued target function
    The output of the system is one of a predefined set of discrete values (one of the class labels)
  Prediction/regression
    Learn a continuous-valued target function
    The output of the system is a real number

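Nearest-neighbor classification: Example

[Figure: training examples of class c1 and class c2 around the example z to be classified]

  Using 1 nearest neighbor: assign z to class c2
  Using 3 nearest neighbors: assign z to class c1
  Using 5 nearest neighbors: assign z to class c1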

k-NN classification algorithm

Each training example x is represented by 2 components:
  Description of the example: x = (x_1, x_2, ..., x_n), where x_i ∈ R
  Class label: c (∈ C, where C is a predefined set of class labels)

Training phase
  Simply store the training examples in the training set D = {x}

Classification phase: to classify a (new) example z
  For each training example x ∈ D, compute the distance between x and z
  Determine the set NB(z) of the nearest neighbors of z
    It consists of the k training examples in D that are nearest to z according to a distance function d
  Assign z to the majority class among the classes of the training examples in NB(z)
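A minimal Python sketch of the classification phase is given below. It assumes Euclidean distance (via `math.dist`, available from Python 3.8) and uses invented 2-D points; the function name `knn_classify` is not from the slides.

```python
import math
from collections import Counter

def knn_classify(train, z, k, dist=math.dist):
    """train: list of (vector, label). Returns the majority label among the k nearest to z."""
    neighbors = sorted(train, key=lambda ex: dist(ex[0], z))[:k]  # the set NB(z)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Invented 2-D points, for illustration only.
train = [((1.0, 1.0), "c1"), ((1.2, 0.8), "c1"), ((0.9, 1.3), "c1"),
         ((3.0, 3.0), "c2"), ((3.2, 2.9), "c2"), ((1.6, 1.6), "c2")]
z = (1.5, 1.5)
print(knn_classify(train, z, k=1), knn_classify(train, z, k=3))
```

With these points the single nearest neighbor belongs to c2 while the majority of the three nearest belongs to c1, so the answer depends on k. Sorting the whole training set keeps the sketch short; it is not meant as an efficient neighbor search.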

k-NN prediction algorithm

Each training example x is represented by 2 components:
  Description of the example: x = (x_1, x_2, ..., x_n), where x_i ∈ R
  Desired output value: y_x ∈ R (a real number)

Training phase
  Simply store the training examples in the training set D

Prediction phase: to predict the output value for an example z
  For each training example x ∈ D, compute the distance between x and z
  Determine the set NB(z) of the nearest neighbors of z
    It consists of the k training examples in D that are nearest to z according to a distance function d
  Predict the output value for z:

$$y_z = \frac{1}{k} \sum_{x \in NB(z)} y_x$$
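The prediction rule above averages the outputs of the k nearest neighbors. The sketch below assumes Euclidean distance and uses invented one-dimensional data; `knn_predict` is a hypothetical name.

```python
import math

def knn_predict(train, z, k, dist=math.dist):
    """train: list of (vector, y). Returns the mean output of the k nearest neighbors of z."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    neighbors = sorted(train, key=lambda ex: dist(ex[0], z))[:k]  # the set NB(z)
    return sum(y for _, y in neighbors) / k

# Invented data, for illustration only.
train = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((10.0,), 50.0)]
print(knn_predict(train, (1.2,), k=2))
```

Here the two nearest examples are at 1.0 and 2.0, so the distant example at 10.0 has no influence on the prediction.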

One neighbor or several?

Classification (or prediction) based on only the single nearest neighbor (the training example closest to the example to be classified/predicted) is often inaccurate
  If that training example is an outlier, very different from the other examples
  If that training example has a wrong class label, due to an error in collecting (building) the dataset

Usually the k (>1) training examples nearest to the example to be classified are considered, and the example is assigned to the majority class among these k nearest training examples

k is usually chosen to be an odd number, to avoid ties in classification
  Example: k = 3, 5, 7, ...

Distance functions (1)

The distance function d
  Plays a very important role in nearest-neighbor learning
  Is usually defined in advance and does not change during learning and classification/prediction

Choosing the distance function d
  Geometric distance functions: for problems whose input attributes are real-valued (x_i ∈ R)
  Hamming distance: for problems whose input attributes are binary (x_i ∈ {0,1})
  Cosine similarity: for text classification problems (x_i is the TF/IDF weight of the i-th keyword)

Distance functions (2)

Geometric distance functions

Manhattan distance:

$$d(x, z) = \sum_{i=1}^{n} |x_i - z_i|$$

Euclidean distance:

$$d(x, z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$

Minkowski distance (p-norm):

$$d(x, z) = \left( \sum_{i=1}^{n} |x_i - z_i|^p \right)^{1/p}$$

Chebyshev distance:

$$d(x, z) = \lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - z_i|^p \right)^{1/p} = \max_i |x_i - z_i|$$

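Distance functions (3)

Hamming distance
  For input attributes that are binary
  Example: x = (0,1,0,1,1)

$$d(x, z) = \sum_{i=1}^{n} \mathrm{Difference}(x_i, z_i)$$

$$\mathrm{Difference}(a, b) = \begin{cases} 1, & \text{if } a \ne b \\ 0, & \text{if } a = b \end{cases}$$

Cosine similarity
  For inputs that are vectors of keyword weights (TF/IDF)
  Note that this is a similarity: larger values mean the two examples are closer

$$d(x, z) = \frac{x \cdot z}{\|x\|\,\|z\|} = \frac{\sum_{i=1}^{n} x_i z_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} z_i^2}}$$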
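The distance and similarity functions from these slides translate almost line for line into Python. The function names and the sample vectors are chosen here for illustration.

```python
import math

def minkowski(x, z, p):
    """p-norm distance; p=1 gives Manhattan and p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1 / p)

def chebyshev(x, z):
    """Limit of the Minkowski distance as p grows: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, z))

def hamming(x, z):
    """Number of positions at which two binary vectors differ."""
    return sum(1 for a, b in zip(x, z) if a != b)

def cosine_similarity(x, z):
    """Dot product divided by the product of the vector lengths (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in z)))

x, z = (0.0, 3.0), (4.0, 0.0)
print(minkowski(x, z, 1), minkowski(x, z, 2), chebyshev(x, z))
print(hamming((0, 1, 0, 1, 1), (1, 1, 0, 0, 1)))
print(cosine_similarity((1.0, 0.0), (1.0, 1.0)))
```

When cosine similarity is used inside k-NN, the nearest neighbors are the examples with the largest similarity, not the smallest.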

Normalizing attribute ranges

Euclidean distance:

$$d(x, z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$

Suppose each example is represented by 3 attributes: Age, Income (per month), and Height (in meters)
  x = (Age=20, Income=12000, Height=1.68)
  z = (Age=40, Income=1300, Height=1.75)

The distance between x and z
  d(x,z) = [(20-40)^2 + (12000-1300)^2 + (1.68-1.75)^2]^(1/2)
  This distance is determined mainly by the difference between the 2 examples on the Income attribute
  Reason: the Income attribute has a very large range compared with the other attributes

The attribute ranges need to be normalized (brought to the same range of values)
  The range [0,1] is commonly used
  For each attribute i: x_i = x_i / maximum_value_of_attribute_i
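The effect of normalization on the Age/Income/Height example can be checked numerically. The maximum values used for scaling below (100, 20000 and 2.0) are assumptions made for illustration; the slides do not specify them.

```python
import math

x = (20, 12000, 1.68)  # (Age, Income, Height) from the example
z = (40, 1300, 1.75)
raw = math.dist(x, z)

# Assumed maximum value of each attribute; the slides do not give these.
max_values = (100, 20000, 2.0)

def normalize(v, max_values):
    """Divide each attribute by its maximum, mapping non-negative attributes into [0, 1]."""
    return tuple(a / m for a, m in zip(v, max_values))

scaled = math.dist(normalize(x, max_values), normalize(z, max_values))
income_share = (x[1] - z[1]) ** 2 / raw ** 2  # part of the squared raw distance due to Income
print(round(raw, 2), round(scaled, 4), income_share)
```

Before scaling, almost all of the squared distance comes from Income; after scaling, Age and Height contribute on a comparable scale.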

Attribute weights

Euclidean distance:

$$d(x, z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$

  All attributes have the same influence on the distance value

Different attributes may (should) have different degrees of influence on the distance value

The attribute weights need to be incorporated into the distance function, where w_i is the weight of attribute i:

$$d(x, z) = \sqrt{\sum_{i=1}^{n} w_i (x_i - z_i)^2}$$

How can the attribute weights be determined?
  From problem-specific knowledge (e.g. specified by experts in the problem domain)
  By an optimization process over the weight values (e.g. using a training set to learn an optimal set of weights)

Distances of the neighbors (1)

Consider the set NB(z) of the k training examples nearest to the example z to be classified/predicted

[Figure: test instance z]

Each of these examples (nearest neighbors) is at a different distance from z

Should these neighbors have the same influence on the classification/prediction of z? NO!

Each nearest neighbor should be given a degree of influence (contribution) that depends on its distance to z
  Greater influence for closer neighbors!

Distances of the neighbors (2)

Let v be a function that determines a weight from a distance
  For a given distance d(x,z) between x and z, v(x,z) is inversely related to d(x,z)

For classification:

$$c(z) = \arg\max_{c_j \in C} \sum_{x \in NB(z)} v(x, z) \cdot \mathrm{Identical}(c_j, c(x))$$

$$\mathrm{Identical}(a, b) = \begin{cases} 1, & \text{if } a = b \\ 0, & \text{if } a \ne b \end{cases}$$

For prediction (regression):

$$f(z) = \frac{\sum_{x \in NB(z)} v(x, z)\,f(x)}{\sum_{x \in NB(z)} v(x, z)}$$

Choosing a distance-weighting function, where α is a constant:

$$v(x, z) = \frac{1}{\alpha + d(x, z)} \qquad v(x, z) = \frac{1}{\alpha + [d(x, z)]^2} \qquad v(x, z) = e^{-d(x, z)^2}$$
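Distance-weighted voting and averaging can be sketched as follows. The weighting function used is the inverse-square form with a constant of 1.0, which is an arbitrary illustrative choice, and the data points are invented.

```python
import math
from collections import defaultdict

def weight(d, alpha=1.0):
    """v = 1 / (alpha + d^2); alpha = 1.0 is an assumed constant that also avoids division by zero."""
    return 1.0 / (alpha + d * d)

def weighted_knn_classify(train, z, k):
    """train: list of (vector, label). Each of the k nearest neighbors votes with weight v(x, z)."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], z))[:k]
    votes = defaultdict(float)
    for x, label in neighbors:
        votes[label] += weight(math.dist(x, z))
    return max(votes, key=votes.get)

def weighted_knn_predict(train, z, k):
    """train: list of (vector, y). Returns the weighted mean of the k nearest outputs."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], z))[:k]
    weights = [weight(math.dist(x, z)) for x, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Invented data: one close neighbor of class "a", two farther neighbors of class "b".
labeled = [((0.0,), "a"), ((3.0,), "b"), ((4.0,), "b")]
print(weighted_knn_classify(labeled, (1.0,), k=3))

valued = [((0.0,), 10.0), ((2.0,), 20.0)]
print(weighted_knn_predict(valued, (0.0,), k=2))
```

In the classification example a plain majority vote over the three neighbors would choose "b", whereas the weighted vote favors the single much closer "a" neighbor.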

Nearest-neighbor learning: When to use it?

  The examples are represented as vectors in a real-valued space (R^n)
  The number of input attributes (the dimensionality of the space) is not large
  The training set is fairly large (many training examples)

Advantages
  No training step is needed (the system simply stores the training examples)
  Works well for problems with a fairly large number of classes
    There is no need to learn n separate classifiers for n classes
  k-NN learning (k >> 1) can cope with noisy data
    Classification/prediction is based on the k nearest neighbors

Disadvantages
  A suitable distance function must be chosen
  Computational cost (in time and memory) at classification/prediction time
  May classify/predict wrongly because of irrelevant attributes

References

E. Alpaydin. Introduction to Machine Learning. The MIT Press, 2004.
T. M. Mitchell. Machine Learning. McGraw-Hill, 1997.
H. A. Simon. Why Should Machines Learn? In R. S. Michalski, J. Carbonell, and T. M. Mitchell (Eds.): Machine learning: An artificial intelligence approach, chapter 2, pp. 25-38. Morgan Kaufmann, 1983.
