Logistic Regression
These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
Classification Based on Probability
• Instead of just predicting the class, give the probability of the instance being that class
  – i.e., learn $p(y \mid x)$
• Comparison to the perceptron:
  – The perceptron doesn't produce a probability estimate
  – Perceptrons (and other discriminative classifiers) are only interested in producing a discriminative model
• Recall that:
  $0 \le p(\text{event}) \le 1$
  $p(\text{event}) + p(\neg\text{event}) = 1$
Logistic Regression
• Takes a probabilistic approach to learning discriminative functions (i.e., a classifier)
• $h_\theta(x)$ should give $p(y = 1 \mid x; \theta)$
  – Want $0 \le h_\theta(x) \le 1$, so we can't just use linear regression with a threshold
• Use $h_\theta(x) = g(\theta^\top x)$, where $\theta^\top x$ should be a large negative value for negative instances and a large positive value for positive instances
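A minimal NumPy sketch of this hypothesis (an illustration, not from the slides), using the sigmoid $g(z) = 1/(1 + e^{-z})$ as the squashing function, which is the form the gradient descent slide later assumes:

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}): maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # Hypothesis h_theta(x) = g(theta^T x), read as p(y = 1 | x; theta)
    return sigmoid(theta @ x)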
Deriving the Cost Function via Maximum Likelihood Estimation
• The likelihood of the data is given by: $l(\theta) = \prod_{i=1}^n p\left(y^{(i)} \mid x^{(i)}; \theta\right)$
• Compare to linear regression: $J(\theta) = \frac{1}{2n} \sum_{i=1}^n \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$
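A small sketch of evaluating this likelihood (a hypothetical helper, not from the slides); in practice one maximizes the log-likelihood, since a product of many probabilities underflows:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # log l(theta) = sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]
    # where p_i = h_theta(x^(i)) = p(y = 1 | x^(i); theta)
    p = sigmoid(X @ theta)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

Negating this log-likelihood yields exactly the logistic regression cost $J(\theta)$ shown below.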
Intuition Behind the Objective

$\mathrm{cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$

If y = 1:
• Cost = 0 if the prediction is correct
• As $h_\theta(x) \to 0$, cost $\to \infty$
• Captures the intuition that larger mistakes should get larger penalties
  – e.g., predicting $h_\theta(x) = 0$ when y = 1

If y = 0:
• Cost = 0 if the prediction is correct
• As $1 - h_\theta(x) \to 0$, cost $\to \infty$
• Again captures the intuition that larger mistakes should get larger penalties

[Plots: cost as a function of $h_\theta(x) \in [0, 1]$, for y = 1 and for y = 0]
Based on an example by Andrew Ng
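The piecewise cost translates directly into code; a one-function sketch (illustrative only):

import numpy as np

def cost(h_x, y):
    # h_x is the predicted probability p(y = 1 | x); y is 0 or 1
    return -np.log(h_x) if y == 1 else -np.log(1.0 - h_x)

For example, cost(0.01, 1) ≈ 4.61 while cost(0.99, 1) ≈ 0.01: the larger mistake incurs a sharply larger penalty.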
Regularized Logistic Regression

$J(\theta) = -\sum_{i=1}^n \left[ y^{(i)} \log h_\theta\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right) \right]$

We can regularize exactly as before, penalizing all parameters except the intercept:

$J_{reg}(\theta) = J(\theta) + \frac{\lambda}{2} \left\|\theta_{[1:d]}\right\|_2^2$
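A sketch of this regularized cost in NumPy (illustrative; X is assumed to carry a leading column of ones, so theta[0] is the intercept):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_reg(theta, X, y, lam):
    p = sigmoid(X @ theta)                        # h_theta(x^(i)) for all i at once
    data = -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    reg = (lam / 2.0) * np.sum(theta[1:] ** 2)    # skip the intercept theta_0
    return data + reg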
Gradient Descent for Logistic Regression

$J_{reg}(\theta) = -\sum_{i=1}^n \left[ y^{(i)} \log h_\theta\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right) \right] + \frac{\lambda}{2} \left\|\theta_{[1:d]}\right\|_2^2$

• Initialize $\theta$
• Repeat until convergence (simultaneous update for j = 0 ... d):
  $\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J_{reg}(\theta)$

Use the natural logarithm (ln = log_e) to cancel with the exp() in $h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$.

Working out the partial derivatives gives the explicit updates:
  $\theta_0 \leftarrow \theta_0 - \alpha \sum_{i=1}^n \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)$
  $\theta_j \leftarrow \theta_j - \alpha \left[ \sum_{i=1}^n \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right] \quad \text{for } j = 1 \ldots d$
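These updates vectorize naturally; a sketch of batch gradient descent under these update rules (alpha, lam, and the iteration count are illustrative choices, and X is again assumed to carry a leading column of ones):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.01, lam=0.1, n_iters=1000):
    n, d_plus_1 = X.shape
    theta = np.zeros(d_plus_1)            # initialize theta
    for _ in range(n_iters):              # "repeat until convergence"
        err = sigmoid(X @ theta) - y      # h_theta(x^(i)) - y^(i), all i at once
        grad = X.T @ err                  # sum_i (h - y) x_j^(i) for each j
        grad[1:] += lam * theta[1:]       # + lambda * theta_j, but not for theta_0
        theta -= alpha * grad             # simultaneous update of all theta_j
    return theta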
[Plots: logistic regression decision boundaries in the $(x_1, x_2)$ feature space]
Multi-Class Logistic Regression
Split into One vs Rest:
[Plot: multi-class data in the $(x_1, x_2)$ feature space, split into one-vs-rest binary problems]
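A sketch of the one-vs-rest reduction (hypothetical helper names; reuses the sigmoid and gradient_descent sketches above): train one binary classifier per class, labeling that class 1 and everything else 0, then predict with the most confident classifier.

import numpy as np
# assumes sigmoid() and gradient_descent() from the earlier sketches

def ovr_train(X, y, classes, **kwargs):
    # One binary classifier per class c: relabel as "c vs. rest" and train
    return {c: gradient_descent(X, (y == c).astype(float), **kwargs) for c in classes}

def ovr_predict(X, thetas):
    # Predict the class whose classifier reports the highest p(y = 1 | x)
    classes = list(thetas)
    probs = np.column_stack([sigmoid(X @ thetas[c]) for c in classes])
    return np.asarray(classes)[np.argmax(probs, axis=1)]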