Chapter 10 - Machine Learning
Course contents:
- Agents
- Machine learning
  - Introduction to machine learning
  - Naïve Bayes classification
- Planning
Artificial Intelligence
Introduction to Machine Learning
Representing a machine learning problem
[Mitchell, 1997]
Machine learning = improving performance on a task through experience
- A task T
- With respect to a performance measure P
- Through (using) experience E
Examples of machine learning problems (1)
Filtering Web pages according to a user's interests
[Figure: "Interested?"]
Examples of machine learning problems (2)
Classifying Web pages by topic
T: Classify Web pages into predefined topics
E: A set of Web pages, each associated with a topic
[Figure: "Which cat.?"]
Examples of machine learning problems (3)
Handwritten word recognition
P: Percentage (%) of words correctly recognized and classified
E: A set of handwritten word images, each labeled with the identifier of a word
[Figure: "Which word?", with a handwritten sample reading "we do in the right way"]
Examples of machine learning problems (4)
A robot that drives a car automatically
T: A robot (equipped with observation cameras) drives a car automatically on a highway
E: A set of examples recorded while observing a human driving on a highway, each consisting of a sequence of images and the corresponding vehicle control commands
[Figure: "Which steering command?" (Go straight / Move left)]
The machine learning process
The dataset is split into:
- Training set: used to train the system
- Validation set: used to optimize the system's parameters
- Test set: used to test the learned system
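A minimal sketch of the split described above, assuming the dataset is a plain Python sequence and that a random 60/20/20 split is acceptable. The fractions, the seed and the name `split_dataset` are illustrative choices, not taken from the slide.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def split_dataset(
    data: Sequence[Item],
    train_frac: float = 0.6,
    val_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Item], List[Item], List[Item]]:
    """Shuffle `data` and cut it into disjoint training, validation and test sets.

    Whatever remains after the training and validation portions becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("need train_frac > 0, val_frac >= 0 and train_frac + val_frac < 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]                  # used to train the system
    val = items[n_train:n_train + n_val]     # used to tune the system's parameters
    test = items[n_train + n_val:]           # used only for the final evaluation
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

The test set is kept out of both training and parameter tuning, so the error measured on it estimates performance on unseen examples.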
Supervised vs. unsupervised learning
Issues in machine learning (1)
Issues in machine learning (2)
Training examples
- How many training examples are enough?
- How does the size of the training set affect the accuracy of the learned target function?
- How do noisy examples and/or examples with missing attribute values (missing values) affect accuracy?
Issues in machine learning (3)
Issues in machine learning (4)
Can the system automatically change (adapt) its internal representation (structure)?
The over-fitting problem (1)
The over-fitting problem (2)
Let D be the set of all examples and D_train the set of training examples
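The slide sets up D and D_train for the usual definition of over-fitting (Mitchell, 1997): a hypothesis h over-fits D_train if some alternative hypothesis h' has a higher error than h on D_train but a lower error than h on D. The sketch below checks that condition on a hypothetical toy problem. The data, the noisy label and both hypotheses are invented for illustration, and D here holds the true (noise-free) labels.

```python
from typing import Callable, List, Tuple

Example = Tuple[int, int]          # (x, label)
Hypothesis = Callable[[int], int]  # maps x to a predicted label


def error(h: Hypothesis, examples: List[Example]) -> float:
    """Fraction of examples whose label h predicts incorrectly."""
    return sum(h(x) != y for x, y in examples) / len(examples)


def overfits(h: Hypothesis, h_alt: Hypothesis,
             d_train: List[Example], d_all: List[Example]) -> bool:
    """True if h beats h_alt on D_train but is worse than h_alt on D."""
    return (error(h, d_train) < error(h_alt, d_train)
            and error(h, d_all) > error(h_alt, d_all))


# Hypothetical toy problem: the true concept is "label = 1 iff x >= 5".
D = [(x, int(x >= 5)) for x in range(10)]           # all examples, true labels
D_train = [(1, 0), (3, 0), (4, 1), (6, 1), (8, 1)]  # (4, 1) carries a noisy label

_memory = dict(D_train)


def h_memo(x: int) -> int:
    """Memorises D_train exactly; predicts 0 for any x it has not seen."""
    return _memory.get(x, 0)


def h_simple(x: int) -> int:
    """A simple threshold rule."""
    return int(x >= 5)


print(error(h_memo, D_train), error(h_simple, D_train))  # 0.0 0.2
print(error(h_memo, D), error(h_simple, D))              # 0.4 0.0
print(overfits(h_memo, h_simple, D_train, D))            # True
```

The memorising hypothesis fits every training example, including the noisy one, yet is worse on D; the simpler threshold rule is the kind of hypothesis Occam's razor prefers.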
The over-fitting problem (3)
Which target function f(x) achieves the highest accuracy on future examples?
[Figure: f(x)]
Occam's razor: prefer the simplest target function that fits the training examples (not necessarily perfectly)
- Generalizes better
- Easier to explain/interpret
- Less computationally complex
The over-fitting problem: an example
[Mitchell, 1997]
Naïve Bayes classification
Classification is based on the probabilities that the possible hypotheses hold
Bayes' theorem
$$P(h \mid D) = \frac{P(D \mid h) \, P(h)}{P(D)}$$
- P(h): prior probability that hypothesis (class) h is true
- P(D): prior probability that data set D is observed (obtained)
- P(D|h): probability of observing (obtaining) data set D, given that hypothesis h is true
- P(h|D): probability that hypothesis h is true, given that data set D is observed
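As a small illustration of Bayes' theorem, the sketch below computes P(h|D) for every hypothesis in a finite set, obtaining P(D) by summing P(D|h)P(h) over the hypotheses. This assumes the hypotheses are mutually exclusive and exhaustive; the function name `posteriors` and the numbers in the usage are hypothetical. Exact fractions keep the results easy to check by hand.

```python
from fractions import Fraction
from typing import Dict


def posteriors(priors: Dict[str, Fraction],
               likelihoods: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """Bayes' theorem P(h|D) = P(D|h) P(h) / P(D) for every hypothesis h.

    P(D) comes from the law of total probability, P(D) = sum_h P(D|h) P(h),
    which assumes the hypotheses are mutually exclusive and exhaustive.
    """
    joint = {h: likelihoods[h] * priors[h] for h in priors}  # P(D|h) P(h) = P(D, h)
    p_d = sum(joint.values())                                # P(D)
    return {h: joint[h] / p_d for h in joint}


# Hypothetical numbers for two hypotheses.
post = posteriors(
    priors={"h1": Fraction(1, 4), "h2": Fraction(3, 4)},
    likelihoods={"h1": Fraction(1, 2), "h2": Fraction(1, 10)},
)
print(post["h1"], post["h2"])  # 5/8 3/8
```

Although h2 has the larger prior, the data are five times more likely under h1, so the posterior favours h1.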
Bayes' theorem: an example (1)
Consider the following data set:
Day   Outlook    Temperature  Humidity  Wind    Play Tennis
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
[Mitchell, 1997]
Bayes' theorem: an example (2)
P(D|h): the probability that a day has attribute Outlook = Sunny and attribute Wind = Strong, given that (if we know that) he plays tennis
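The sketch below estimates P(h) and P(D|h) for this example by counting over the 12 rows of the table. Two assumptions are not stated on the slide: probabilities are plain relative frequencies, and P(D|h) is factorized as P(Outlook=Sunny|h).P(Wind=Strong|h), i.e. Outlook and Wind are treated as conditionally independent given h (the same assumption Naïve Bayes makes).

```python
from fractions import Fraction

# The 12 rows of the PlayTennis table: (Outlook, Wind, PlayTennis).
# Temperature and Humidity are left out because this example does not use them.
ROWS = [
    ("Sunny", "Weak", "No"),        # D1
    ("Sunny", "Strong", "No"),      # D2
    ("Overcast", "Weak", "Yes"),    # D3
    ("Rain", "Weak", "Yes"),        # D4
    ("Rain", "Weak", "Yes"),        # D5
    ("Rain", "Strong", "No"),       # D6
    ("Overcast", "Strong", "Yes"),  # D7
    ("Sunny", "Weak", "No"),        # D8
    ("Sunny", "Weak", "Yes"),       # D9
    ("Rain", "Weak", "Yes"),        # D10
    ("Sunny", "Strong", "Yes"),     # D11
    ("Overcast", "Strong", "Yes"),  # D12
]


def prior(label: str) -> Fraction:
    """P(h): fraction of days with PlayTennis == label."""
    return Fraction(sum(r[2] == label for r in ROWS), len(ROWS))


def likelihood(label: str, outlook: str, wind: str) -> Fraction:
    """P(D|h) = P(Outlook=outlook|h) * P(Wind=wind|h).

    Assumes Outlook and Wind are conditionally independent given h.
    """
    days = [r for r in ROWS if r[2] == label]
    p_outlook = Fraction(sum(r[0] == outlook for r in days), len(days))
    p_wind = Fraction(sum(r[1] == wind for r in days), len(days))
    return p_outlook * p_wind


for label in ("Yes", "No"):
    print(label, prior(label), likelihood(label, "Sunny", "Strong"))
# Yes 2/3 3/32
# No 1/3 3/8
```

Counting the days that are both Sunny and Strong directly, without the independence assumption, would instead give 1/8 for Yes (only D11) and 1/4 for No (only D2).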
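Maximizing the conditional (posterior) probability
Given the observed data D, we look for the hypothesis h in the hypothesis set H with the highest posterior probability P(h|D). This hypothesis h is called the maximum a posteriori (MAP) hypothesis:
$$h_{MAP} = \arg\max_{h \in H} P(h \mid D)$$
$$= \arg\max_{h \in H} \frac{P(D \mid h) \, P(h)}{P(D)} \quad \text{(by Bayes' theorem)}$$
$$= \arg\max_{h \in H} P(D \mid h) \, P(h) \quad \text{(} P(D) \text{ is the same for all hypotheses } h \text{)}$$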
MAP: an example
Since P(D) = P(D,h1) + P(D,h2) is the same for both hypotheses h1 and h2, the quantity P(D) can be ignored
Therefore, we need to compute the two expressions P(D|h1).P(h1) and P(D|h2).P(h2), and decide accordingly:
- If P(D|h1).P(h1) ≥ P(D|h2).P(h2), conclude that he plays tennis
- Otherwise, conclude that he does not play tennis
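The MAP decision rule fits in a few lines. The probabilities in the usage are obtained by counting over the 12-row PlayTennis table under the assumption that Outlook and Wind are conditionally independent given the hypothesis (P(h1) = 8/12, P(D|h1) = (2/8).(3/8), P(h2) = 4/12, P(D|h2) = (3/4).(2/4)). They are not numbers given on the slide, and `map_hypothesis` is a hypothetical helper name.

```python
from fractions import Fraction
from typing import Dict


def map_hypothesis(priors: Dict[str, Fraction], likelihoods: Dict[str, Fraction]) -> str:
    """h_MAP = argmax_h P(D|h) P(h); P(D) is dropped since it is the same for every h.

    max() keeps the first maximal key, so with h1 listed first a tie goes to h1.
    """
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


# h1: he plays tennis, h2: he does not play tennis.
priors = {"h1": Fraction(2, 3), "h2": Fraction(1, 3)}
likelihoods = {"h1": Fraction(3, 32), "h2": Fraction(3, 8)}

for h in priors:
    print(h, likelihoods[h] * priors[h])
# h1 1/16
# h2 1/8
print(map_hypothesis(priors, likelihoods))  # h2 -> he does not play tennis
```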
MLE: an example
The set H consists of 2 possible hypotheses:
- h1: He plays tennis
- h2: He does not play tennis
D: the data set (the days) in which attribute Outlook has value Sunny and attribute Wind has value Strong
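This extract does not preserve the slide's definition of MLE. The standard one (Mitchell, 1997) is h_ML = argmax_{h∈H} P(D|h), i.e. MAP without the prior, and the two coincide when all hypotheses have the same prior. A minimal sketch comparing them: the first case uses P(D|h) estimated from the 12-row table (assuming Outlook and Wind are conditionally independent given h), the second uses hypothetical numbers where a strong prior makes the two rules disagree.

```python
from fractions import Fraction
from typing import Dict


def ml_hypothesis(likelihoods: Dict[str, Fraction]) -> str:
    """h_ML = argmax_h P(D|h): the prior P(h) plays no role."""
    return max(likelihoods, key=likelihoods.get)


def map_hypothesis(priors: Dict[str, Fraction], likelihoods: Dict[str, Fraction]) -> str:
    """h_MAP = argmax_h P(D|h) P(h), for comparison."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


# P(D|h1), P(D|h2) estimated from the PlayTennis table.
table_likelihoods = {"h1": Fraction(3, 32), "h2": Fraction(3, 8)}
print(ml_hypothesis(table_likelihoods))  # h2

# Hypothetical numbers: h1 is far more probable a priori.
priors = {"h1": Fraction(9, 10), "h2": Fraction(1, 10)}
likelihoods = {"h1": Fraction(1, 5), "h2": Fraction(1, 2)}
print(ml_hypothesis(likelihoods), map_hypothesis(priors, likelihoods))  # h2 h1
```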
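Naïve Bayes classification (1)
For a new example z = (z_1, z_2, ..., z_n), find the most probable class in the set of classes C:
$$c_{MAP} = \arg\max_{c_i \in C} P(c_i \mid z_1, z_2, \ldots, z_n) = \arg\max_{c_i \in C} \frac{P(z_1, z_2, \ldots, z_n \mid c_i) \, P(c_i)}{P(z_1, z_2, \ldots, z_n)} \quad \text{(by Bayes' theorem)}$$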
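Naïve Bayes classification (2)
To find the most probable class for z:
$$c_{MAP} = \arg\max_{c_i \in C} P(z_1, z_2, \ldots, z_n \mid c_i) \, P(c_i) \quad \text{(} P(z_1, z_2, \ldots, z_n) \text{ is the same for all classes)}$$
Naïve Bayes assumption: the attributes are conditionally independent given the class:
$$P(z_1, z_2, \ldots, z_n \mid c_i) = \prod_{j=1}^{n} P(z_j \mid c_i)$$
The class of z is determined as the most probable class c*:
$$c^* = \arg\max_{c_i \in C} P(c_i) \prod_{j=1}^{n} P(z_j \mid c_i)$$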
ID  Age     Income  Student  Credit_Rating  Buy_Computer
1   Young   High    No       Fair           No
2   Young   High    No       Excellent      No
3   Medium  High    No       Fair           Yes
4   Old     Medium  No       Fair           Yes
5   Old     Low     Yes      Fair           Yes
6   Old     Low     Yes      Excellent      No
7   Medium  Low     Yes      Excellent      Yes
8   Young   Medium  No       Fair           No
9   Young   Low     Yes      Fair           Yes
10  Old     Medium  Yes      Fair           Yes
11  Young   Medium  Yes      Excellent      Yes
12  Medium  Medium  No       Excellent      Yes
13  Medium  High    Yes      Fair           Yes
14  Old     Medium  No       Excellent      No
Source: http://www.cs.sunysb.edu/~cse634/lecture_notes/07classification.pdf
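Two classes: c1 (Buy_Computer = Yes, 9 examples) and c2 (Buy_Computer = No, 5 examples); P(c1) = 9/14, P(c2) = 5/14
Example to classify: z = (Age = Young, Income = Medium, Student = Yes, Credit_Rating = Fair)
P(Age=Young|c1) = 2/9; P(Age=Young|c2) = 3/5
P(Income=Medium|c1) = 4/9; P(Income=Medium|c2) = 2/5
P(Student=Yes|c1) = 6/9; P(Student=Yes|c2) = 1/5
P(Credit_Rating=Fair|c1) = 6/9; P(Credit_Rating=Fair|c2) = 2/5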
For class c2:
P(z|c2) = P(Age=Young|c2).P(Income=Medium|c2).P(Student=Yes|c2).P(Credit_Rating=Fair|c2) = (3/5).(2/5).(1/5).(2/5) = 0.019
P(c2).P(z|c2) = (5/14).(0.019) = 0.007
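The whole computation, including the score for class c1 that the text does not show, can be reproduced with a minimal categorical Naïve Bayes sketch. It hard-codes the 14 rows of the table and estimates P(c_i) and P(z_j|c_i) as plain relative frequencies with no smoothing, matching the fractions above. The function names `train_nb` and `nb_scores` are illustrative.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# The 14 rows of the Buy_Computer table: (Age, Income, Student, Credit_Rating, Buy_Computer)
DATA = [
    ("Young", "High", "No", "Fair", "No"),
    ("Young", "High", "No", "Excellent", "No"),
    ("Medium", "High", "No", "Fair", "Yes"),
    ("Old", "Medium", "No", "Fair", "Yes"),
    ("Old", "Low", "Yes", "Fair", "Yes"),
    ("Old", "Low", "Yes", "Excellent", "No"),
    ("Medium", "Low", "Yes", "Excellent", "Yes"),
    ("Young", "Medium", "No", "Fair", "No"),
    ("Young", "Low", "Yes", "Fair", "Yes"),
    ("Old", "Medium", "Yes", "Fair", "Yes"),
    ("Young", "Medium", "Yes", "Excellent", "Yes"),
    ("Medium", "Medium", "No", "Excellent", "Yes"),
    ("Medium", "High", "Yes", "Fair", "Yes"),
    ("Old", "Medium", "No", "Excellent", "No"),
]


def train_nb(rows):
    """Learning phase: class counts and per-class attribute-value counts."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> Counter of values
    for *attrs, c in rows:
        for j, value in enumerate(attrs):
            value_counts[(c, j)][value] += 1
    return class_counts, value_counts


def nb_scores(z, class_counts, value_counts):
    """P(c) * prod_j P(z_j | c) for every class c (relative frequencies, no smoothing)."""
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = Fraction(n_c, n)  # P(c)
        for j, value in enumerate(z):
            score *= Fraction(value_counts[(c, j)][value], n_c)  # P(z_j | c)
        scores[c] = score
    return scores


z = ("Young", "Medium", "Yes", "Fair")
scores = nb_scores(z, *train_nb(DATA))
print({c: round(float(s), 3) for c, s in scores.items()})  # {'No': 0.007, 'Yes': 0.028}
print(max(scores, key=scores.get))  # Yes  (class c1)
```

Because nothing is smoothed, a single attribute value never seen with a class would drive that class's score to exactly 0.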
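When n is large, the product of many probabilities smaller than 1 tends to 0 and underflows in floating-point arithmetic:
$$\lim_{n \to \infty} \prod_{j=1}^{n} P(x_j \mid c_i) = 0$$
A common remedy is to compare sums of logarithms instead:
$$c_{NB} = \arg\max_{c_i \in C} \left( \log P(c_i) + \sum_{j=1}^{n} \log P(x_j \mid c_i) \right)$$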
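The underflow problem is easy to reproduce. In this sketch (hypothetical numbers, with equal class priors so that log P(c_i) is the same for both classes and is omitted), both products round to 0.0 in double precision and can no longer be compared, while the sums of logarithms still rank the classes correctly.

```python
import math

# Hypothetical class-conditional probabilities for n = 1000 attributes.
n = 1000
p_a = [0.1] * n  # P(x_j | c_A) for every j
p_b = [0.2] * n  # P(x_j | c_B) for every j

print(math.prod(p_a), math.prod(p_b))  # 0.0 0.0  (both products underflow)

log_a = sum(math.log(p) for p in p_a)  # log of the product, computed safely
log_b = sum(math.log(p) for p in p_b)
print(round(log_a, 1), round(log_b, 1))  # -2302.6 -1609.4
print("c_B" if log_b > log_a else "c_A")  # c_B
```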
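Learning phase
From the documents in D_train, extract the set of keywords (terms): T = {t_j}
Let D_c_i (⊆ D_train) be the set of documents in D_train labeled with class c_i
For each class c_i:
- Compute the prior probability of class c_i:
$$P(c_i) = \frac{|D_{c_i}|}{|D_{train}|}$$
- For each keyword t_j, compute the probability that t_j occurs in class c_i:
$$P(t_j \mid c_i) = \frac{\sum_{d_k \in D_{c_i}} n(d_k, t_j) + 1}{\sum_{d_k \in D_{c_i}} \sum_{t_m \in T} n(d_k, t_m) + |T|}$$
where n(d_k, t_j) is the number of occurrences of keyword t_j in document d_k; the +1 and +|T| terms (add-one smoothing) prevent zero probabilities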
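A minimal sketch of the learning phase, assuming each document has already been reduced to a list of keyword tokens (the tokenization step and the toy corpus are hypothetical). It follows the formulas above, including the +1 / +|T| smoothing; `train_text_nb` is an illustrative name.

```python
from collections import Counter
from typing import Dict, List, Tuple

Doc = List[str]  # a document, already reduced to its keyword tokens


def train_text_nb(d_train: List[Tuple[Doc, str]]):
    """Learning phase: priors P(c_i) and smoothed keyword probabilities P(t_j|c_i)."""
    vocab = sorted({t for doc, _ in d_train for t in doc})  # T = {t_j}
    classes = sorted({c for _, c in d_train})
    priors: Dict[str, float] = {}
    cond: Dict[str, Dict[str, float]] = {}
    for c in classes:
        docs_c = [doc for doc, label in d_train if label == c]  # D_ci
        priors[c] = len(docs_c) / len(d_train)                  # |D_ci| / |D_train|
        counts = Counter(t for doc in docs_c for t in doc)       # sum over d_k of n(d_k, t)
        total = sum(counts.values())                             # all keyword occurrences in D_ci
        cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return vocab, priors, cond


# Hypothetical toy corpus.
d_train = [
    (["ball", "goal", "team"], "sport"),
    (["goal", "match"], "sport"),
    (["vote", "party"], "politics"),
]
vocab, priors, cond = train_text_nb(d_train)
print(vocab)                            # ['ball', 'goal', 'match', 'party', 'team', 'vote']
print(round(cond["sport"]["goal"], 4))  # 0.2727  (= (2 + 1) / (5 + 6))
print(cond["politics"]["goal"])         # 0.125   (= (0 + 1) / (2 + 6))
```

For each class the smoothed probabilities still sum to 1 over T, and a keyword never seen in a class (here "goal" in politics) keeps a small non-zero probability.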
Classification phase: classifying a new document d
From document d, extract the set T_d of keywords t_j that are defined in T (T_d ⊆ T)
Assumption: the probability that keyword t_j occurs in class c_i is independent of the keyword's position in the document:
P(t_j at position k | c_i) = P(t_j at position m | c_i), ∀k, m
Assign document d to class c*:
$$c^* = \arg\max_{c_i \in C} P(c_i) \prod_{t_j \in T_d} P(t_j \mid c_i)$$
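A matching sketch of the classification phase. It works with logarithms to avoid underflow on long documents and, following the formula above, lets each distinct keyword of d that appears in T contribute one factor; tokens outside T are ignored. The model parameters in the usage are hypothetical values, not estimates from real data.

```python
import math
from typing import Dict, Iterable, Set


def classify_text_nb(doc: Iterable[str], vocab: Set[str], priors: Dict[str, float],
                     cond: Dict[str, Dict[str, float]]) -> str:
    """c* = argmax_c P(c) * prod_{t_j in T_d} P(t_j|c), evaluated as a sum of logs."""
    t_d = {t for t in doc if t in vocab}  # T_d: distinct keywords of d that belong to T

    def log_score(c: str) -> float:
        return math.log(priors[c]) + sum(math.log(cond[c][t]) for t in t_d)

    return max(priors, key=log_score)


# Hypothetical model parameters (not learned from real data).
vocab = {"goal", "vote", "team"}
priors = {"sport": 0.5, "politics": 0.5}
cond = {
    "sport": {"goal": 0.6, "vote": 0.1, "team": 0.3},
    "politics": {"goal": 0.1, "vote": 0.6, "team": 0.3},
}
print(classify_text_nb(["goal", "team", "goal", "referee"], vocab, priors, cond))  # sport
print(classify_text_nb(["vote"], vocab, priors, cond))                             # politics
```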
Learning simply stores the training examples
There is no need to build an explicit, general model (description) of the target function to be learned
For an example that needs to be classified/predicted:
We consider 2 types of learning problems
Classification: learn a discrete-valued target function
The system's output is one of a predefined set of discrete values (one of the class labels)
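Nearest-neighbor (NN) classification: an example
- Considering 1 nearest neighbor: assign z to class c2
- Considering 3 nearest neighbors: assign z to class c1
- Considering 5 nearest neighbors: assign z to class c1
[Figure: training examples of class c1 and class c2 around the example z to be classified]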
The k-NN classification algorithm
Learning phase
Simply store the training examples in the training set D = {x}
Classification phase: classify a (new) example z
- For each training example x ∈ D, compute the distance between x and z
- Determine the set NB(z) of nearest neighbors of z: the k training examples in D that are nearest to z according to a distance function d
- Assign z to the majority class among the classes of the training examples in NB(z)
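A minimal k-NN classification sketch, assuming numeric attributes and Euclidean distance (`math.dist`, Python 3.8+), since the slide leaves the distance function d open. The toy coordinates are hypothetical, chosen so that k = 1, 3 and 5 reproduce the outcomes of the example slide (c2, c1, c1). Ties in the vote are broken by `Counter.most_common`, i.e. by first occurrence in NB(z), which is an arbitrary choice.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def knn_classify(d: List[Tuple[Point, str]], z: Point, k: int = 3,
                 dist: Callable[[Point, Point], float] = math.dist) -> str:
    """Assign z to the majority class among its k nearest training examples NB(z)."""
    # The learning phase is just keeping `d`; all the work happens at query time.
    nb_z = sorted(d, key=lambda ex: dist(ex[0], z))[:k]  # NB(z)
    votes = Counter(label for _, label in nb_z)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training set around z = (0, 0).
D = [
    ((1.0, 0.0), "c2"),   # distance 1
    ((0.0, 1.5), "c1"),   # distance 1.5
    ((-2.0, 0.0), "c1"),  # distance 2
    ((0.0, -3.0), "c2"),  # distance 3
    ((4.0, 0.0), "c1"),   # distance 4
]
for k in (1, 3, 5):
    print(k, knn_classify(D, (0.0, 0.0), k))
# 1 c2
# 3 c1
# 5 c1
```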
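The k-NN prediction algorithm
Learning phase
Simply store the training examples in the training set D
Prediction phase: the predicted output for a (new) example z is the mean output of its k nearest neighbors NB(z):
$$y_z = \frac{1}{k} \sum_{x \in NB(z)} y_x$$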
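The prediction variant only changes the last step: average the output values of NB(z) instead of voting. A sketch, again assuming Euclidean distance (`math.dist`) and hypothetical data; it divides by the actual size of NB(z), which equals k whenever D has at least k examples.

```python
import math
from typing import List, Sequence, Tuple


def knn_predict(d: List[Tuple[Sequence[float], float]], z: Sequence[float], k: int = 3) -> float:
    """y_z = mean of y_x over the k training examples nearest to z (NB(z))."""
    nb_z = sorted(d, key=lambda ex: math.dist(ex[0], z))[:k]
    return sum(y for _, y in nb_z) / len(nb_z)


# Hypothetical 1-D data: (x, y) pairs.
D = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 50.0)]
print(knn_predict(D, (1.2,), k=3))  # 2.0  (mean of 2.0, 3.0 and 1.0)
```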
In practice, we usually consider the k (> 1) training examples nearest to the example to be classified, and assign it to the majority class among these k nearest training examples
Distance functions (1)
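Distance functions (2)
Manhattan distance:
$$d(x, z) = \sum_{i=1}^{n} |x_i - z_i|$$
Euclidean distance:
$$d(x, z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$
Minkowski distance (p-norm):
$$d(x, z) = \left( \sum_{i=1}^{n} |x_i - z_i|^p \right)^{1/p}$$
Chebyshev distance:
$$d(x, z) = \lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - z_i|^p \right)^{1/p} = \max_i |x_i - z_i|$$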
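The four distance functions translate directly into code. A minimal sketch for equal-length numeric vectors; the sample vectors are hypothetical. With a large p, the Minkowski distance is already very close to the Chebyshev distance, as the limit states.

```python
from typing import Sequence

Vec = Sequence[float]


def minkowski(x: Vec, z: Vec, p: float) -> float:
    """(sum_i |x_i - z_i|^p)^(1/p), for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1.0 / p)


def manhattan(x: Vec, z: Vec) -> float:
    """Minkowski with p = 1."""
    return sum(abs(a - b) for a, b in zip(x, z))


def euclid(x: Vec, z: Vec) -> float:
    """Minkowski with p = 2."""
    return minkowski(x, z, 2)


def chebyshev(x: Vec, z: Vec) -> float:
    """Limit of Minkowski as p -> infinity: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, z))


x, z = (1.0, 5.0, 2.0), (4.0, 1.0, 2.0)  # coordinate differences: 3, 4, 0
print(manhattan(x, z))                                 # 7.0
print(euclid(x, z))                                    # 5.0
print(round(minkowski(x, z, 50), 3), chebyshev(x, z))  # 4.0 4.0
```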
Distance functions (3)
Hamming distance, for binary input attributes:
$$d(x, z) = \sum_{i=1}^{n} \mathrm{Difference}(x_i, z_i), \quad \mathrm{Difference}(a, b) = \begin{cases} 1, & \text{if } a \neq b \\ 0, & \text{if } a = b \end{cases}$$
Example: x = (0,1,0,1,1)
Cosine similarity, for inputs that are vectors of keyword weights (TF/IDF):
$$d(x, z) = \frac{x \cdot z}{\|x\| \, \|z\|} = \frac{\sum_{i=1}^{n} x_i z_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \sqrt{\sum_{i=1}^{n} z_i^2}}$$
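A sketch of the Hamming distance and the cosine measure. The second binary vector and the real-valued vectors are hypothetical. Cosine is a similarity rather than a distance: larger values mean more similar vectors, so a k-NN classifier using it should pick the k largest values. The sketch assumes neither vector is all zeros.

```python
import math
from typing import Sequence


def hamming(x: Sequence[int], z: Sequence[int]) -> int:
    """Sum of Difference(x_i, z_i): the number of positions where x and z differ."""
    return sum(1 if a != b else 0 for a, b in zip(x, z))


def cosine(x: Sequence[float], z: Sequence[float]) -> float:
    """x.z / (||x|| ||z||); 1.0 means same direction. Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in z)))


print(hamming((0, 1, 0, 1, 1), (1, 1, 0, 0, 1)))           # 2
print(round(cosine((1.0, 2.0, 0.0), (2.0, 4.0, 0.0)), 6))  # 1.0
print(cosine((1.0, 0.0), (0.0, 3.0)))                      # 0.0
```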
$$d(x, z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$
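Attribute weights
Unweighted Euclidean distance:
$$d(x, z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$
Weighted Euclidean distance, where w_i is the weight of attribute i:
$$d(x, z) = \sqrt{\sum_{i=1}^{n} w_i (x_i - z_i)^2}$$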
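Weighting attributes is a one-line change to the Euclidean distance. In this hypothetical example the second attribute has a much larger range than the first, so it dominates the unweighted distance; setting its weight to 0 ignores it entirely. How to choose the weights w_i is outside the scope of this sketch.

```python
import math
from typing import Sequence


def weighted_euclid(x: Sequence[float], z: Sequence[float], w: Sequence[float]) -> float:
    """sqrt(sum_i w_i (x_i - z_i)^2), with w_i >= 0 the weight of attribute i."""
    return math.sqrt(sum(wi * (xi - zi) ** 2 for xi, zi, wi in zip(x, z, w)))


# Hypothetical examples: the second attribute has a much larger range than the first.
x, z = (1.0, 100.0), (2.0, 300.0)
print(round(weighted_euclid(x, z, (1.0, 1.0)), 3))  # 200.002 (dominated by attribute 2)
print(weighted_euclid(x, z, (1.0, 0.0)))            # 1.0     (attribute 2 ignored)
```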
Consider the set NB(z) of the k training examples nearest to the example z to be classified/predicted
[Figure: test instance z]
Each nearest neighbor should be given a degree of influence (contribution) that depends on its distance to z
Closer neighbors get a higher influence!
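Distance-weighted classification:
$$c(z) = \arg\max_{c_j \in C} \sum_{x \in NB(z)} v(x, z) \cdot \mathrm{Identical}(c_j, c(x)), \quad \mathrm{Identical}(a, b) = \begin{cases} 1, & \text{if } a = b \\ 0, & \text{if } a \neq b \end{cases}$$
Distance-weighted prediction:
$$f(z) = \frac{\sum_{x \in NB(z)} v(x, z) \, f(x)}{\sum_{x \in NB(z)} v(x, z)}$$
Possible weight functions:
$$v(x, z) = \frac{1}{\alpha + [d(x, z)]^2}$$
$$v(x, z) = e^{-\frac{d(x, z)^2}{\sigma^2}}$$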
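A sketch of distance-weighted k-NN using the weight v(x,z) = 1/(α + d(x,z)^2), with α = 1 as an arbitrary assumption and Euclidean distance. In the hypothetical classification example, plain majority voting over the 3 neighbours would pick c2, but the much closer c1 neighbour wins once votes are weighted.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def v(x: Point, z: Point, alpha: float = 1.0) -> float:
    """Weight v(x, z) = 1 / (alpha + d(x, z)^2); alpha > 0 keeps it finite when d = 0."""
    return 1.0 / (alpha + math.dist(x, z) ** 2)


def nearest(data, z: Point, k: int):
    """NB(z): the k examples of `data` closest to z (Euclidean distance)."""
    return sorted(data, key=lambda ex: math.dist(ex[0], z))[:k]


def weighted_knn_classify(data: List[Tuple[Point, str]], z: Point, k: int = 3) -> str:
    """c(z) = argmax_c sum_{x in NB(z)} v(x, z) * Identical(c, c(x))."""
    votes: Dict[str, float] = defaultdict(float)
    for x, label in nearest(data, z, k):
        votes[label] += v(x, z)  # only the neighbour's own class receives its weight
    return max(votes, key=votes.get)


def weighted_knn_predict(data: List[Tuple[Point, float]], z: Point, k: int = 3) -> float:
    """f(z) = sum v(x, z) f(x) / sum v(x, z) over NB(z)."""
    nb_z = nearest(data, z, k)
    weights = [v(x, z) for x, _ in nb_z]
    return sum(w * y for w, (_, y) in zip(weights, nb_z)) / sum(weights)


# Hypothetical data: plain majority voting over k = 3 would choose "c2".
cls_data = [((0.1, 0.0), "c1"), ((3.0, 0.0), "c2"), ((0.0, 3.0), "c2")]
print(weighted_knn_classify(cls_data, (0.0, 0.0), k=3))  # c1

reg_data = [((0.0,), 1.0), ((2.0,), 5.0)]
print(round(weighted_knn_predict(reg_data, (0.0,), k=2), 4))  # 1.6667
```

Swapping in the Gaussian weight only requires changing `v`, e.g. to `math.exp(-math.dist(x, z) ** 2 / sigma ** 2)` for some chosen sigma.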
Nearest-neighbor learning: when to use it?
Advantages
- No training step (the system simply stores the training examples)
- Works well for problems with a fairly large number of classes
  - No need to learn n separate classifiers for n classes
- k-NN (k >> 1) can work with noisy data
Disadvantages
- A suitable distance function must be chosen
- Computational cost (in time and memory) at classification/prediction time
- May classify/predict incorrectly because of irrelevant attributes
References
E. Alpaydin. Introduction to Machine Learning. The MIT Press, 2004.
T. M. Mitchell. Machine Learning. McGraw-Hill, 1997.
H. A. Simon. Why Should Machines Learn? In R. S. Michalski, J. Carbonell, and T. M. Mitchell (Eds.): Machine Learning: An Artificial Intelligence Approach, chapter 2, pp. 25-38. Morgan Kaufmann, 1983.