EX02 Logistic Regression Solution
Logistic Regression
Siming Bayer, Paul Stoewer
Pattern Recognition Lab (CS 5)
Winter Semester 2022/23
Last Exercise. . .
Bayes Classifier
Siming Bayer, Paul Stoewer | Pattern Recognition Lab (CS 5) | Pattern Recognition — Theory Exercises Winter Semester 2022/23 2
Generative vs Discriminative Models
Logistic Regression
Classification vs Regression
Logistic Regression
How can posterior probabilities be modeled?

$$
p(y=0\,|\,x) = \frac{p(y=0)\,p(x\,|\,y=0)}{p(x)}
= \frac{p(y=0)\,p(x\,|\,y=0)}{p(y=0)\,p(x\,|\,y=0) + p(y=1)\,p(x\,|\,y=1)}
= \frac{1}{1 + \frac{p(y=1)\,p(x\,|\,y=1)}{p(y=0)\,p(x\,|\,y=0)}}
$$
Posteriors and the Logistic Function (cont.)

$$
p(y=0\,|\,x) = \frac{1}{1 + \frac{p(y=1)\,p(x|y=1)}{p(y=0)\,p(x|y=0)}}
= \frac{1}{1 + e^{\log \frac{p(y=1)\,p(x|y=1)}{p(y=0)\,p(x|y=0)}}}
= \frac{1}{1 + e^{-\log \frac{p(y=0|x)}{p(y=1|x)}}}
$$
How can posterior probabilities be modeled?

We see that the posterior for class y = 0 can be written in terms of a logistic function, with F(x) = log (p(y=0|x) / p(y=1|x)):

$$
p(y=0\,|\,x) = \frac{1}{1+e^{-F(x)}}, \qquad
p(y=1\,|\,x) = 1 - p(y=0\,|\,x) = \frac{e^{-F(x)}}{1+e^{-F(x)}} = \frac{1}{1+e^{F(x)}}
$$
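The equivalence between the Bayes posterior and the logistic function of the log posterior ratio can be checked numerically. The priors and the 1-D Gaussian class-conditional densities below are made-up values, purely for illustration:

```python
import math

def gauss(x, mu, sigma):
    """1-D Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p0, p1 = 0.6, 0.4                      # hypothetical priors p(y=0), p(y=1)
mu0, s0, mu1, s1 = -1.0, 1.0, 2.0, 1.5  # hypothetical class-conditional parameters

x = 0.3
# Bayes posterior p(y=0|x) computed directly from its definition
post0 = p0 * gauss(x, mu0, s0) / (p0 * gauss(x, mu0, s0) + p1 * gauss(x, mu1, s1))
# Same value via the logistic function of F(x) = log[p(y=0)p(x|y=0) / (p(y=1)p(x|y=1))]
F = math.log(p0 * gauss(x, mu0, s0)) - math.log(p1 * gauss(x, mu1, s1))
logistic = 1.0 / (1.0 + math.exp(-F))
print(post0, logistic)  # the two values agree
```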
Logistic Regression

Answer:
• p(y = 0|x) = g(F(x)) can be modeled in terms of a function F(x) and the logistic function g
• F(x) > 0 ⇒ decide y = 0
• F(x) ≤ 0 ⇒ decide y = 1
Logistic Regression

b) What is the shape of the decision boundary for Gaussian distributed class-conditional densities?
Does it change if their covariance matrices are identical?
What is the shape of the decision boundary for Gaussian distributed class-conditional densities?

Example
Let us assume both classes have normally distributed d-dimensional feature vectors:

$$
p(x\,|\,y) = \frac{1}{\sqrt{\det(2\pi\Sigma_y)}}\, e^{-\frac{1}{2}(x-\mu_y)^T \Sigma_y^{-1} (x-\mu_y)}
$$

$$
p(y=0\,|\,x) = \frac{1}{1+e^{-F(x)}} = \frac{1}{1+e^{-(x^T A x + \alpha^T x + \alpha_0)}}
$$
What is the shape of the decision boundary for Gaussian distributed class-conditional densities?

Example cont.

$$
F(x) = \log\frac{p(y=0)}{p(y=1)}
+ \log\frac{\frac{1}{\sqrt{\det(2\pi\Sigma_0)}}\, e^{-\frac{1}{2}(x-\mu_0)^T \Sigma_0^{-1}(x-\mu_0)}}{\frac{1}{\sqrt{\det(2\pi\Sigma_1)}}\, e^{-\frac{1}{2}(x-\mu_1)^T \Sigma_1^{-1}(x-\mu_1)}}
$$

with the constant part

$$
c = \log\frac{p(y=0)}{p(y=1)} + \frac{1}{2}\log\frac{\det(2\pi\Sigma_1)}{\det(2\pi\Sigma_0)}
$$
What is the shape of the decision boundary for Gaussian distributed class-conditional densities?

Example cont.
Furthermore we have:

$$
\log\frac{e^{-\frac{1}{2}(x-\mu_0)^T \Sigma_0^{-1}(x-\mu_0)}}{e^{-\frac{1}{2}(x-\mu_1)^T \Sigma_1^{-1}(x-\mu_1)}}
= \frac{1}{2}\left[(x-\mu_1)^T \Sigma_1^{-1}(x-\mu_1) - (x-\mu_0)^T \Sigma_0^{-1}(x-\mu_0)\right]
$$

$$
= \frac{1}{2}\left[x^T(\Sigma_1^{-1} - \Sigma_0^{-1})x - 2(\mu_1^T\Sigma_1^{-1} - \mu_0^T\Sigma_0^{-1})x + \mu_1^T\Sigma_1^{-1}\mu_1 - \mu_0^T\Sigma_0^{-1}\mu_0\right]
$$
What is the shape of the decision boundary for Gaussian distributed class-conditional densities?

Example cont.
Now we have:

$$
A = \frac{1}{2}(\Sigma_1^{-1} - \Sigma_0^{-1})
$$

$$
\alpha^T = \mu_0^T\Sigma_0^{-1} - \mu_1^T\Sigma_1^{-1}
$$

$$
\alpha_0 = \log\frac{p(y=0)}{p(y=1)} + \frac{1}{2}\log\frac{\det(2\pi\Sigma_1)}{\det(2\pi\Sigma_0)} + \frac{1}{2}\left(\mu_1^T\Sigma_1^{-1}\mu_1 - \mu_0^T\Sigma_0^{-1}\mu_0\right)
$$
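These coefficient formulas can be verified numerically by comparing F(x) = xᵀAx + αᵀx + α₀ against the log posterior ratio computed directly from the densities. The 2-D Gaussian parameters and priors below are invented for the check, not taken from the exercise sheet:

```python
import numpy as np

# Hypothetical class parameters (made up for illustration)
mu0 = np.array([0.0, 0.0]); S0 = np.array([[2.0, 0.3], [0.3, 1.0]])
mu1 = np.array([2.0, 1.0]); S1 = np.array([[1.0, -0.2], [-0.2, 1.5]])
p0, p1 = 0.6, 0.4

S0i, S1i = np.linalg.inv(S0), np.linalg.inv(S1)

# Coefficients of F(x) = x^T A x + alpha^T x + alpha_0 from the slide
A = 0.5 * (S1i - S0i)
alpha = S0i @ mu0 - S1i @ mu1
alpha0 = (np.log(p0 / p1)
          + 0.5 * np.log(np.linalg.det(2 * np.pi * S1) / np.linalg.det(2 * np.pi * S0))
          + 0.5 * (mu1 @ S1i @ mu1 - mu0 @ S0i @ mu0))

def log_posterior_ratio(x):
    """log[p(y=0)p(x|y=0) / (p(y=1)p(x|y=1))], computed directly."""
    def log_gauss(v, mu, S, Si):
        d = v - mu
        return -0.5 * d @ Si @ d - 0.5 * np.log(np.linalg.det(2 * np.pi * S))
    return np.log(p0) + log_gauss(x, mu0, S0, S0i) - np.log(p1) - log_gauss(x, mu1, S1, S1i)

x = np.array([0.7, -0.4])
F = x @ A @ x + alpha @ x + alpha0
print(F, log_posterior_ratio(x))  # both forms agree
```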
What is the shape of the decision boundary for Gaussian distributed class-conditional densities?

[Figure: decision boundaries between two Gaussian classes for different priors — p(y=0) = p(y=1) = 0.5 (two panels), p(y=0) = 0.8, p(y=1) = 0.2, and p(y=0) = p(y=1) = 0.5 on a larger axis range.]

In general, the decision boundary is a quadric:

$$
F(x) = x^T A x + \alpha^T x + \alpha_0 = a x_1^2 + b x_1 x_2 + c x_2^2 + d x_1 + e x_2 + f \stackrel{!}{=} 0
$$
Logistic Regression

b) What is the shape of the decision boundary for Gaussian distributed class-conditional densities?
Does it change if their covariance matrices are identical?
Does it change if their covariance matrices are identical?

Example cont.
If both classes share the same covariance matrix, i.e. Σ = Σ0 = Σ1, then the argument of the sigmoid function is linear in the components of x:

$$
A = 0
$$

$$
\alpha^T = (\mu_0 - \mu_1)^T \Sigma^{-1}
$$

$$
\alpha_0 = \log\frac{p(y=0)}{p(y=1)} + \frac{1}{2}(\mu_1 + \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0)
$$
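A small numerical sketch makes the linear case concrete: with a shared covariance matrix the quadratic term vanishes, and with equal priors the resulting linear boundary passes through the midpoint of the two means. All parameter values here are hypothetical:

```python
import numpy as np

# Shared covariance (hypothetical values)
S = np.array([[1.5, 0.4], [0.4, 1.0]])
Si = np.linalg.inv(S)
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
p0, p1 = 0.5, 0.5

A = 0.5 * (Si - Si)                     # = 0 when Sigma_0 = Sigma_1
alpha = Si @ (mu0 - mu1)                # (mu_0 - mu_1)^T Sigma^{-1}, as a vector
alpha0 = np.log(p0 / p1) + 0.5 * (mu1 + mu0) @ Si @ (mu1 - mu0)

# F(x) = alpha^T x + alpha_0 is linear; with equal priors the boundary
# F(x) = 0 contains the midpoint of the two means:
mid = 0.5 * (mu0 + mu1)
print(np.allclose(A, 0), alpha @ mid + alpha0)
```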
Does it change if their covariance matrices are identical?

[Figure: with identical covariance matrices and priors p(y=0) = p(y=1) = 0.5, the decision boundary is a straight line.]
Logistic Regression

c) How does a change of prior affect the decision boundary?
How does a change of prior affect the decision boundary?
Example cont.

$$
F(x) = \log\frac{p(y=0)}{p(y=1)}
+ \log\frac{\frac{1}{\sqrt{\det(2\pi\Sigma_0)}}\, e^{-\frac{1}{2}(x-\mu_0)^T \Sigma_0^{-1}(x-\mu_0)}}{\frac{1}{\sqrt{\det(2\pi\Sigma_1)}}\, e^{-\frac{1}{2}(x-\mu_1)^T \Sigma_1^{-1}(x-\mu_1)}},
\qquad
c = \log\frac{p(y=0)}{p(y=1)} + \frac{1}{2}\log\frac{\det(2\pi\Sigma_1)}{\det(2\pi\Sigma_0)}
$$
We observe:
• Priors imply a constant offset of the decision boundary.
• If priors and covariance matrices of both classes are identical, this offset is c = 0.
Logistic Regression

d) For Gaussian distributed class-conditional densities, is it easier to calculate the decision boundary directly or to estimate the parameters of the distributions?

Answer:
• Estimating the Gaussians requires: p(y = 0), µ0, µ1, Σ0, Σ1
• Estimating the decision boundary requires only: A, α, α0
Logistic Regression

e) How can we obtain a decision boundary that is non-linear in the components of the feature vector?

Answer:
• We get a non-linear decision boundary when the covariance matrices of the involved Gaussians are not identical.
Logistic Regression
Log-Likelihood Function

$$
p(y=0\,|\,x) = 1 - g(\theta^T x), \qquad p(y=1\,|\,x) = g(\theta^T x)
$$
Log-Likelihood Function (cont.)

Before we work on the formulas of the ML estimator, we rewrite the posteriors as a Bernoulli probability:

$$
p(y\,|\,x) = g(\theta^T x)^{y}\,\left(1 - g(\theta^T x)\right)^{1-y}
$$

which shows the great benefit of the chosen notation for class numbers.
Log-Likelihood Function (cont.)

$$
L(\theta) = \log \prod_{i=1}^{m} p(y_i\,|\,x_i)
= \sum_{i=1}^{m} \log\left[ g(\theta^T x_i)^{y_i} \left(1 - g(\theta^T x_i)\right)^{1-y_i} \right]
= \sum_{i=1}^{m} y_i \log g(\theta^T x_i) + (1-y_i)\log\left(1 - g(\theta^T x_i)\right)
$$
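The rewriting from the log of a product of Bernoulli posteriors to the sum formula can be verified on a tiny dataset. Features, labels, and the parameter vector below are made up for the demonstration:

```python
import math

def g(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical 1-D features with an appended bias component; labels in {0, 1}
X = [(-2.0, 1.0), (-0.5, 1.0), (1.0, 1.0), (2.5, 1.0)]
y = [0, 0, 1, 1]
theta = (1.2, -0.3)  # hypothetical parameter vector

# Log of the product of Bernoulli posteriors ...
prod = 1.0
for xi, yi in zip(X, y):
    p = g(dot(theta, xi))
    prod *= p ** yi * (1 - p) ** (1 - yi)
log_prod = math.log(prod)

# ... equals the sum formula derived above
L = sum(yi * math.log(g(dot(theta, xi))) + (1 - yi) * math.log(1 - g(dot(theta, xi)))
        for xi, yi in zip(X, y))
print(log_prod, L)  # the two agree
```

In practice one always works with the sum form: the product of many small probabilities underflows, while the sum of their logarithms stays in a reasonable number range.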
Logistic Regression

Answer:
• The likelihood is a product of very small probabilities; the logarithm transforms it into a reasonable number range (and turns the product into a sum).
Logistic Regression

h) What important property with respect to optimization does the log-likelihood function fulfill?

Answer:
• The log-likelihood of logistic regression is concave in θ, so every local maximum is a global maximum.
Logistic Regression

What method did we use to maximize the log-likelihood?

Note: If you write the Newton-Raphson iteration in matrix form, you will end up with a weighted least squares iteration scheme (iteratively reweighted least squares, IRLS).
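A minimal sketch of this Newton-Raphson scheme, using the standard gradient and Hessian of the logistic log-likelihood; the toy dataset is invented for the demonstration:

```python
import numpy as np

def g(z):
    """Logistic (sigmoid) function, elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_newton(X, y, n_iter=20):
    """Newton-Raphson maximization of L(theta).

    Gradient: X^T (y - g(X theta))
    Hessian: -X^T W X with W = diag(g_i (1 - g_i))
    The update solves a weighted least-squares system (IRLS).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = g(X @ theta)
        W = p * (1 - p)                    # diagonal of the weight matrix
        grad = X.T @ (y - p)
        H = X.T @ (X * W[:, None])         # X^T W X
        theta += np.linalg.solve(H, grad)
    return theta

# Hypothetical, non-separable toy data: 1-D feature plus a bias column
X = np.array([[-2.0, 1], [-1.0, 1], [-0.5, 1], [0.5, 1], [1.0, 1], [2.0, 1]])
y = np.array([0, 0, 1, 0, 1, 1])
theta = fit_logreg_newton(X, y)
print(theta)  # fitted parameters (slope, intercept)
```

At the maximum the gradient Xᵀ(y − g(Xθ)) vanishes, which is a convenient convergence check.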
Logistic Regression

j) Could we also use a different method? What effect does the choice of optimization method have on the convergence speed of the algorithm?

Answer:
• Alternative method: gradient descent without step size control
• The Newton method has step size control ⇒ fewer steps are necessary for convergence