4 Bn-En
Thanh-Nghi Do
Can Tho University
dtnghi@cit.ctu.edu.vn
Can Tho
Dec. 2019
Content
■ Introduction
■ Naive Bayes
■ Conclusion
Bayesian classification
[Figure: Bayesian classification illustration (source: KDnuggets)]
Naive Bayes
■ Naive
● All features are assumed equally important
● The Xi are assumed conditionally independent given Y:
P(X1 | X2, Y) = P(X1 | Y)
■ Comments
● The independence assumption is almost never correct!
● But … this scheme works well in practice ☺
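Under these two assumptions the classifier reduces to counting. A minimal sketch for categorical features (function and variable names are illustrative, not from the slides):

```python
from collections import Counter, defaultdict

def train(rows):
    """rows: list of (features_dict, label) pairs."""
    class_counts = Counter(label for _, label in rows)
    # feature_counts[label][feature][value] = how often value co-occurs with label
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in rows:
        for name, value in features.items():
            feature_counts[label][name][value] += 1
    return class_counts, feature_counts

def predict(class_counts, feature_counts, features):
    n = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / n                     # prior P(Y = label)
        for name, value in features.items():
            # conditional independence: multiply P(Xi = value | Y = label)
            score *= feature_counts[label][name][value] / count
        scores[label] = score
    return max(scores, key=scores.get)
```

Zero counts make a whole product vanish, which is why the Laplace estimator discussed later is needed in practice.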
Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?
■ Play = yes/no?
Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?
Bayes rule
Pr[yes | E] = Pr[Sunny | yes] × Pr[Cool | yes] × Pr[High | yes] × Pr[True | yes] × Pr[yes] / Pr[E]
            = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]
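The product can be checked numerically. The counts for the "no" class below come from the standard weather data set and are an assumption, since only the "yes" factors appear on the slide; Pr[E] cancels when the two scores are normalised:

```python
# Likelihood of each class for the instance (Sunny, Cool, High, True)
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ≈ 0.0053
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ≈ 0.0206 (assumed counts)

# Normalise so the two posteriors sum to 1 (Pr[E] cancels out)
p_yes = like_yes / (like_yes + like_no)             # ≈ 0.205
print(round(like_yes, 4), round(p_yes, 3))
```

So the classifier predicts Play = no for this day, even though the raw likelihoods are tiny; only their ratio matters.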
Laplace estimator
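The Laplace estimator replaces each raw frequency count/total with (count + 1)/(total + k), where k is the number of possible attribute values, so no conditional probability is ever exactly zero. A minimal sketch with illustrative numbers:

```python
def laplace_prob(count, total, n_values):
    """P(Xi = v | Y) with add-one (Laplace) smoothing over n_values possible values."""
    return (count + 1) / (total + n_values)

# Outlook has 3 values; suppose 'overcast' was never observed with class 'no'
# among 5 'no' days: the smoothed estimate is 1/8 instead of 0.
print(laplace_prob(0, 5, 3))   # 0.125
```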
Missing values
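The usual treatment of missing values in Naive Bayes, and presumably the one intended on this slide, is to simply omit the corresponding factor from the product at prediction time. A sketch (names and numbers are illustrative):

```python
def class_score(prior, cond_probs, instance):
    """Multiply prior by P(value | class) for each known attribute,
    skipping attributes whose value is missing (None)."""
    score = prior
    for attr, value in instance.items():
        if value is None:          # missing value: leave the factor out
            continue
        score *= cond_probs[attr][value]
    return score

# P(Sunny | yes) = 2/9; Windy is unknown for this day, so it contributes nothing
probs = {"outlook": {"sunny": 2/9}, "windy": {"true": 3/9}}
print(class_score(9/14, probs, {"outlook": "sunny", "windy": None}))
```

This works because the same attribute is dropped from every class's product, so the comparison between classes stays fair.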
Numerical features
● Standard deviation: s² = 1/(n−1) · Σ_{i=1}^{n} (x_i − µ)²
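As a check, the µ = 73 and s = 6.2 used below for temperature given play = yes can be recomputed with this formula from the nine "yes" temperatures of the standard weather data (an assumption: the raw values do not appear on this slide):

```python
import math

# Temperatures on the nine play = yes days of the standard weather data (assumed)
temps_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]
n = len(temps_yes)

mu = sum(temps_yes) / n                                       # sample mean
s = math.sqrt(sum((x - mu) ** 2 for x in temps_yes) / (n - 1))  # n-1 denominator
print(round(mu), round(s, 1))   # 73 6.2
```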
Numerical features
f(temperature = 66 | yes) = 1/(√(2π) · 6.2) · e^(−(66−73)² / (2 · 6.2²)) = 0.0340
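The density value can be verified directly by plugging µ = 73 and σ = 6.2 into the normal probability density function:

```python
import math

mu, sigma = 73.0, 6.2   # temperature statistics for class yes (from the slide)
x = 66.0

# Normal pdf: f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
f = math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
print(round(f, 4))   # 0.034
```

Note this is a density, not a probability; that is fine, because the same density factors are compared across classes.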
Numerical features
Outlook  Temp.  Humidity  Windy  Play
Sunny    66     90        true   ?
Conclusions
■ Naïve Bayes
● Based on the Bayes rule
● Works surprisingly well (even if the independence assumption is clearly violated)
● Why? Because classification doesn't require accurate probability estimates, as long as the maximum probability is assigned to the correct class
● However: adding too many redundant features will cause problems (e.g. identical features)
● Note also: many numeric features are not normally distributed → kernel density estimators
Extensions
■ Improvements
● Select the best features (e.g. with a greedy search): often works as well or better with just a fraction of all features
● Bayesian networks