Lec 21
§ Naïve Bayes
[Figure: Naïve Bayes model over features F1, F2, …, Fn]
§ Parameter estimation:
§ MLE, MAP, priors
§ Laplace smoothing
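The Laplace (add-k) smoothed estimate P(x) = (count(x) + k) / (N + k·|X|) from the recap above can be sketched as follows; the coin-flip data and function name are illustrative:

```python
from collections import Counter

def laplace_estimate(observations, domain, k=1):
    """Laplace (add-k) smoothing: P(x) = (count(x) + k) / (N + k * |X|).
    Unseen outcomes in the domain still get nonzero probability."""
    counts = Counter(observations)
    n = len(observations)
    return {x: (counts[x] + k) / (n + k * len(domain)) for x in domain}

# 3 observations over a 2-value domain:
probs = laplace_estimate(["H", "H", "T"], domain=["H", "T"], k=1)
# P(H) = (2+1)/(3+2) = 0.6, P(T) = (1+1)/(3+2) = 0.4
```

With k = 0 this reduces to the MLE (relative frequencies); larger k pulls the estimate toward the uniform prior.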
Example: inputs as feature vectors

Email text: "Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just ..."
  # free : 2
  YOUR_NAME : 0
  MISSPELLED : 2
  FROM_FRIEND : 0
  ...
  → label: SPAM

Digit image of a handwritten "2":
  PIXEL-7,12 : 1
  PIXEL-7,13 : 0
  ...
  NUM_LOOPS : 1
  → label: "2"
Some (Simplified) Biology
§ Very loose inspiration: human neurons
Linear Classifiers
[Figure: features f2, f3, … weighted by w2, w3, … feed into a sum Σ, then a >0? test]
§ If the sum is positive, output +1
§ If the sum is negative, output -1
Weights
§ Binary case: compare features to a weight vector
§ Learning: figure out the weight vector from examples
Weight vector w:
  # free : 4
  YOUR_NAME : -1
  MISSPELLED : 1
  FROM_FRIEND : -3
  ...

Feature vector f(x), example 1:
  # free : 2
  YOUR_NAME : 0
  MISSPELLED : 2
  FROM_FRIEND : 0
  ...

Feature vector f(x), example 2:
  # free : 0
  YOUR_NAME : 1
  MISSPELLED : 1
  FROM_FRIEND : 1
  ...

A positive dot product w · f(x) means the positive class.
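The weight-vector comparison above can be sketched with sparse feature dictionaries; the feature names and values follow the slide's example, and the helper name is illustrative:

```python
def score(w, f):
    """Dot product w . f over sparse feature dicts; missing keys count as 0."""
    return sum(w.get(k, 0) * v for k, v in f.items())

# Weight vector from the example above
w = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}

f1 = {"# free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
f2 = {"# free": 0, "YOUR_NAME": 1, "MISSPELLED": 1, "FROM_FRIEND": 1}

print(score(w, f1))  # 4*2 + 1*2 = 10 > 0  -> positive class
print(score(w, f2))  # -1 + 1 - 3 = -3 < 0 -> negative class
```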
Decision Rules
Binary Decision Rule
§ In the space of feature vectors
§ Examples are points
§ Any weight vector is a hyperplane
§ One side corresponds to Y = +1
§ Other corresponds to Y = -1

Weight vector:
  BIAS : -3
  free : 4
  money : 2
  ...

[Figure: the (free, money) plane with the boundary -3 + 4·free + 2·money = 0; the positive side is labeled +1 = SPAM, the negative side -1 = HAM]
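With the slide's example weights (BIAS -3, free 4, money 2), the decision rule for a point in the (free, money) plane can be sketched as:

```python
def classify(free, money, w_bias=-3, w_free=4, w_money=2):
    """Sign of the activation w . f(x). The decision boundary is the line
    -3 + 4*free + 2*money = 0 in the (free, money) plane."""
    activation = w_bias + w_free * free + w_money * money
    return +1 if activation >= 0 else -1   # +1 = SPAM, -1 = HAM

print(classify(1, 1))  # -3 + 4 + 2 = 3  -> +1 (SPAM side)
print(classify(0, 1))  # -3 + 0 + 2 = -1 -> -1 (HAM side)
```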
Weight Updates
Learning: Binary Perceptron
§ Start with weights = 0
§ For each training instance:
§ Classify with current weights
§ If correct, no change!
§ If wrong: lower the score of the wrong answer, raise the score of the right answer
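The update rule above can be sketched as a training loop; the toy data, feature names, and epoch count are illustrative:

```python
def train_perceptron(data, epochs=10):
    """Binary perceptron: start at w = 0; on a mistake, add y * f(x),
    which raises the score of the right answer and lowers the wrong one."""
    w = {}
    for _ in range(epochs):
        for f, y in data:                      # f: sparse feature dict, y: +1 or -1
            score = sum(w.get(k, 0) * v for k, v in f.items())
            pred = +1 if score >= 0 else -1
            if pred != y:                      # wrong: move w toward the label
                for k, v in f.items():
                    w[k] = w.get(k, 0) + y * v
    return w

data = [({"free": 1, "BIAS": 1}, +1),
        ({"hello": 1, "BIAS": 1}, -1)]
w = train_perceptron(data)
```

If the data is separable, this loop stops making updates once every training point is classified correctly.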
Example: Multiclass Perceptron

Weight vector for class 1 (initial → after update 1 → after update 2):
  BIAS : 1 → 0 → 1
  win  : 0 → -1 → 0
  game : 0 → 0 → 1
  vote : 0 → -1 → -1
  the  : 0 → -1 → 0
  ...

Weight vector for class 2 (initial → after update 1 → after update 2):
  BIAS : 0 → 1 → 0
  win  : 0 → 1 → 0
  game : 0 → 0 → -1
  vote : 0 → 1 → 1
  the  : 0 → 1 → 0
  ...

Weight vector for class 3 (never updated):
  BIAS : 0
  win  : 0
  game : 0
  vote : 0
  the  : 0
  ...
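One multiclass update (subtract f(x) from the mispredicted class, add it to the correct class) can be sketched as follows; the class names are placeholders, and the feature vector corresponds to a sentence over the slide's features (win, vote, the):

```python
def multiclass_step(weights, f, y_true):
    """One multiclass perceptron update: predict argmax_y w_y . f(x);
    if wrong, subtract f(x) from the predicted class, add it to the true class."""
    scores = {y: sum(w.get(k, 0) * v for k, v in f.items())
              for y, w in weights.items()}
    y_pred = max(scores, key=scores.get)
    if y_pred != y_true:
        for k, v in f.items():
            weights[y_pred][k] = weights[y_pred].get(k, 0) - v
            weights[y_true][k] = weights[y_true].get(k, 0) + v
    return y_pred

weights = {"class 1": {"BIAS": 1}, "class 2": {}, "class 3": {}}
f = {"BIAS": 1, "win": 1, "the": 1, "vote": 1}
pred = multiclass_step(weights, f, "class 2")
# class 1 scores highest (its BIAS weight is 1) but the label is class 2,
# so class 1 loses f and class 2 gains f
```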
Properties of Perceptrons
Separable
§ Separability: true if some parameters get the training
set perfectly correct
[Figure: training points annotated with graded class probabilities, from 0.9 | 0.1 down to 0.1 | 0.9]
How to get probabilistic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z = w · f(x) is very positive → want probability going to 1
§ If z = w · f(x) is very negative → want probability going to 0
§ Sigmoid function:

  φ(z) = 1 / (1 + e^(-z))
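The sigmoid's squashing behavior can be checked directly; this is a minimal sketch using the standard library:

```python
import math

def sigmoid(z):
    """phi(z) = 1 / (1 + e^(-z)): maps any activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(10))   # very positive activation -> probability near 1
print(sigmoid(-10))  # very negative activation -> probability near 0
print(sigmoid(0))    # 0.5: exactly on the decision boundary
```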
A 1D Example
The Soft Max
Best w?
§ Maximum likelihood estimation:
  max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i) ; w)

  with: P(y^(i) = +1 | x^(i) ; w) = 1 / (1 + e^(-w · f(x^(i))))

        P(y^(i) = -1 | x^(i) ; w) = 1 - 1 / (1 + e^(-w · f(x^(i))))
= Logistic Regression
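A minimal sketch of maximizing ll(w) by batch gradient ascent. It uses the identity P(y | x; w) = sigmoid(y · w·f(x)) for y in {+1, -1}, which follows from the two cases above; the toy data, learning rate, and step count are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, data):
    """ll(w) = sum_i log P(y_i | x_i; w), with P(y | x; w) = sigmoid(y * w . f(x))."""
    return sum(math.log(sigmoid(y * sum(wj * fj for wj, fj in zip(w, f))))
               for f, y in data)

def gradient_ascent_step(w, data, lr=0.1):
    """Batch gradient ascent: d ll / d w_j = sum_i y_i * f_j(x_i) * sigmoid(-y_i * z_i)."""
    new_w = list(w)
    for f, y in data:
        z = sum(wj * fj for wj, fj in zip(w, f))
        g = y * sigmoid(-y * z)
        for j, fj in enumerate(f):
            new_w[j] += lr * g * fj
    return new_w

# (f(x), y) pairs; f[0] plays the role of a bias feature
data = [([1.0, 2.0], +1), ([1.0, -1.0], -1)]
w = [0.0, 0.0]
for _ in range(100):
    w = gradient_ascent_step(w, data, lr=0.5)
# ll(w) increases toward 0 as both points are fit
```

Unlike the perceptron, this objective prefers weights that make confident, correct probabilities, not merely correct signs.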
Separable Case: Deterministic Decision – Many Options
Separable Case: Probabilistic Decision – Clear Preference
[Figure: a separable dataset; under the probabilistic decision, points near the boundary get graded probabilities 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7, so one separator is clearly preferred]
Multiclass Logistic Regression
§ Recall Perceptron:
§ A weight vector for each class: w_y
§ Softmax:

  with: P(y^(i) | x^(i) ; w) = e^(w_{y^(i)} · f(x^(i))) / Σ_y e^(w_y · f(x^(i)))
§ Optimization
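The softmax above can be sketched directly; the class names, feature name, and weights are illustrative, and the max-subtraction trick is a standard numerical-stability detail:

```python
import math

def softmax_probs(weights, f):
    """P(y | x; w) = exp(w_y . f(x)) / sum over y' of exp(w_y' . f(x))."""
    scores = {y: sum(w.get(k, 0) * v for k, v in f.items())
              for y, w in weights.items()}
    m = max(scores.values())               # subtract max before exp for stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

weights = {"A": {"f1": 1.0}, "B": {"f1": -1.0}, "C": {}}
p = softmax_probs(weights, {"f1": 2.0})
# scores 2, -2, 0 -> probabilities proportional to e^2, e^-2, e^0
```

The argmax of these probabilities matches the perceptron's highest-score class; softmax just turns the same scores into a normalized distribution.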