Session II
Pierre Michel
pierre.michel@univ-amu.fr
M2 EBDS
2021
1. Logistic regression
Classification problems
Some examples:
• Binary classification: y ∈ {0, 1}:
  ▶ 0: "negative class" (e.g. normal sinus rhythm)
  ▶ 1: "positive class" (e.g. atrial fibrillation)
• Multiclass classification: y ∈ {0, 1, 2, ..., K}, K ≥ 2
[Figure: malignant tumor? (1: yes, 0: no) plotted against tumor size]
We want 0 ≤ hθ(x) ≤ 1, so we take

hθ(x) = g(θᵀx)

where g(z) = 1/(1 + e⁻ᶻ) (g is the sigmoid or logistic function), that is:

hθ(x) = 1 / (1 + e^(−θᵀx))
Logistic function

g(z) = 1 / (1 + e⁻ᶻ)

[Figure: the sigmoid curve g(z) for z ∈ [−4, 4], increasing from 0 to 1]

Pierre Michel Prediction methods and Machine learning 9/57
1.2. Representation of the model
Interpretation of hθ(x)

hθ(x) = P(y = 1 | x; θ)

with

P(y = 0 | x; θ) + P(y = 1 | x; θ) = 1, i.e. P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)
Classification rule

hθ(x) = g(θᵀx) = P(y = 1 | x; θ), where g(z) = 1/(1 + e⁻ᶻ).

Classification rule: predict y = 1 if hθ(x) ≥ 0.5, and y = 0 otherwise.

[Figure: two classes (y = 0, y = 1) separated by a linear decision boundary]
Non-linear example:

hθ(x) = g(θ0 + θ1 x1 + θ2 x2 + θ3 x1² + θ4 x2²)
[Figure: classes y = 0 and y = 1 in the (x1, x2) plane, x1, x2 ∈ [−2, 2], separated by a non-linear decision boundary]
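To make this concrete, here is a minimal NumPy sketch of such a non-linear hypothesis; the parameter values are hypothetical, chosen so that the decision boundary is the unit circle x1² + x2² = 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: theta = (-1, 0, 0, 1, 1) gives
# theta^T x = -1 + x1^2 + x2^2, so the boundary h = 0.5 is the
# circle x1^2 + x2^2 = 1 (y = 1 is predicted outside the circle).
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

def h(x1, x2):
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return sigmoid(theta @ features)

print(h(0.0, 0.0) >= 0.5)  # False: inside the circle -> predict y = 0
print(h(2.0, 0.0) >= 0.5)  # True: outside the circle -> predict y = 1
```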
Reminder: probability

We have:

P(y = 1 | x; θ) = hθ(x)
P(y = 0 | x; θ) = 1 − hθ(x)

The likelihood of the sample is

L(θ) = ∏_{i=1}^{n} p(y^(i) | x^(i); θ)
     = ∏_{i=1}^{n} hθ(x^(i))^{y^(i)} (1 − hθ(x^(i)))^{1−y^(i)}

and the cost function is the negative mean log-likelihood:

J(θ) = −(1/n) ∑_{i=1}^{n} [ y^(i) log hθ(x^(i)) + (1 − y^(i)) log(1 − hθ(x^(i))) ]
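This cost can be written directly in NumPy (a minimal sketch, not course material):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Negative mean log-likelihood J(theta) for logistic regression.

    X is (n, p+1) with a leading column of ones, y is (n,) in {0, 1}.
    """
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Tiny check: with theta = 0, h = 0.5 everywhere, so J = log(2)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 1.0])
print(cost(np.zeros(2), X, y))  # log(2) ≈ 0.6931
```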
[1] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html
[2] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
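As the footnoted scikit-learn pages describe, `LogisticRegression` exposes the usual `fit`/`predict`/`predict_proba` interface; a minimal usage sketch on toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: y switches from 0 to 1 as x grows
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[1.0], [3.8]]))   # expect [0, 1]
print(clf.predict_proba([[2.25]]))   # probabilities for classes 0 and 1
```

Note that scikit-learn applies L2 regularization by default (parameter `C`, the inverse of λ), which connects to the regularization section below.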
1.5. Multiclass classification
Introduction
Example:
[3] https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html
[Figure: two scatter plots in the (x1, x2) plane, x1, x2 ∈ [0, 4]; left: two classes (y = 0, y = 1); right: three classes (y = 1, y = 2, y = 3)]
hθ^(k)(x) = P(y = k | x; θ)
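The one-vs-rest prediction step can be sketched as follows; the parameter values are hypothetical, for illustration only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_rest(thetas, x):
    """Pick the class k whose classifier h_theta^(k)(x) is most confident.

    `thetas` is a (K, p+1) array, one parameter vector per class.
    `x` is a (p+1,) feature vector with a leading 1.
    """
    scores = sigmoid(thetas @ x)  # h^(k)(x) for each class k
    return int(np.argmax(scores))

# Hypothetical parameters for K = 3 classes on one feature x1
thetas = np.array([[ 2.0, -2.0],   # class 0: confident when x1 is small
                   [ 0.0,  0.5],   # class 1
                   [-4.0,  2.0]])  # class 2: confident when x1 is large
print(predict_one_vs_rest(thetas, np.array([1.0, 0.0])))  # 0
print(predict_one_vs_rest(thetas, np.array([1.0, 5.0])))  # 2
```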
2. Regularization
Overfitting

[Figure: price (y) plotted against size]

Solutions

Two options:

Idea of regularization
Regularization

We obtain small values for the parameters θ0, θ1, ..., θp.

J(θ) = ∑_{i=1}^{n} (y^(i) − hθ(x^(i)))² + λ ∑_{j=1}^{p} θj²

If λ is very large, all the penalized parameters are shrunk towards zero and hθ(x) ≈ θ0 (underfitting).
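A minimal NumPy sketch of this regularized cost (following the common convention of not penalizing θ0):

```python
import numpy as np

def ridge_cost(theta, X, y, lam):
    """Regularized least-squares cost: residual sum of squares plus an
    L2 penalty on theta_1..theta_p (theta_0 is left unpenalized)."""
    residuals = y - X @ theta
    return np.sum(residuals**2) + lam * np.sum(theta[1:]**2)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
theta = np.array([0.0, 1.0])          # perfect fit: residuals are all 0
print(ridge_cost(theta, X, y, 0.0))   # 0.0 (no penalty)
print(ridge_cost(theta, X, y, 10.0))  # 10.0 (penalty term only)
```

Increasing `lam` makes large parameter values more costly, which is exactly the shrinkage effect described above.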
3. Evaluation of a Machine Learning model
Overfitting

A low training error does not guarantee a good model: the model may generalize poorly to new observations (test sample). We split the data into:
• the training sample: {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n), y^(n))}
• the test sample: {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n_test), y^(n_test))}
1. We fit the model on the training sample:

min_θ J(θ)

2. We compute the mean squared error (MSE) on the test sample:

MSE_test = (1/n_test) ∑_{i=1}^{n_test} (hθ(x_test^(i)) − y_test^(i))²
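The test MSE is straightforward to compute; a minimal sketch:

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error over a sample."""
    return np.mean((y_pred - y_true)**2)

y_test = np.array([1.0, 2.0, 3.0])
y_hat  = np.array([1.0, 2.5, 2.5])
print(mse(y_hat, y_test))  # (0 + 0.25 + 0.25) / 3 ≈ 0.1667
```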
For classification, after solving min_θ J(θ) on the training sample, we compute the error rate on the test sample:

err = (1/n_test) ∑_{i=1}^{n_test} I(y^(i) ≠ ŷ^(i))

with

ŷ^(i) = 1 if hθ(x^(i)) ≥ 0.5, 0 otherwise.
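The error rate with the 0.5 threshold rule can be sketched as:

```python
import numpy as np

def error_rate(probas, y_true):
    """Misclassification rate using the 0.5 threshold rule."""
    y_hat = (probas >= 0.5).astype(int)
    return np.mean(y_hat != y_true)

probas = np.array([0.9, 0.4, 0.6, 0.2])  # h_theta(x) on the test sample
y_true = np.array([1, 0, 0, 0])
print(error_rate(probas, y_true))  # 1 error out of 4 -> 0.25
```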
d = 1: hθ(x) = θ0 + θ1 x
d = 2: hθ(x) = θ0 + θ1 x + θ2 x²
d = 3: hθ(x) = θ0 + θ1 x + θ2 x² + θ3 x³
...
d = 10: hθ(x) = θ0 + θ1 x + θ2 x² + ... + θ10 x¹⁰

For each value of d, we obtain a parameter vector denoted θ^(d).
We can compute the MSE on the test sample for each value of d: MSE^(d).
Problem: the hyperparameter d is then chosen using the same test sample, which leads to overfitting the test sample.
Cross-validation
• the training sample: {(x(1) , y (1) ), (x(2) , y (2) ), ..., (x(n) , y (n) )}
• the validation sample:
• the test sample {(x(1) , y (1) ), (x(2) , y (2) ), ..., (x(ntest ) , y (ntest ) )}
Model selection
d = 1: hθ(x) = θ0 + θ1 x
d = 2: hθ(x) = θ0 + θ1 x + θ2 x²
d = 3: hθ(x) = θ0 + θ1 x + θ2 x² + θ3 x³
...
d = 10: hθ(x) = θ0 + θ1 x + θ2 x² + ... + θ10 x¹⁰

For each value of d, we obtain a parameter vector denoted θ^(d).
We compute the MSE on the validation sample for each value of d: MSE_cv^(d), and choose the value of d that minimizes MSE_cv^(d).
The generalization error is then computed on the test sample, with the chosen value of d.
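This selection procedure can be sketched with NumPy's polynomial fitting on synthetic data (illustration only; the data are simulated, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a quadratic trend plus small noise
x = rng.uniform(-1, 1, 60)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, 60)

# Split into training and validation samples
x_train, y_train = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

def val_mse(d):
    """Fit a degree-d polynomial on the training sample and
    return its MSE on the validation sample."""
    coefs = np.polyfit(x_train, y_train, d)
    return np.mean((np.polyval(coefs, x_val) - y_val)**2)

# Choose the degree d minimizing the validation MSE
best_d = min(range(1, 11), key=val_mse)
print(best_d)
```

A held-back test sample (not used here) would then give an unbiased estimate of the generalization error for the chosen degree.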
When using regularization, the most important choice is that of the regularization parameter λ, which can be selected in the same way.
3.4. Bias and variance
• Bias (underfitting):
  ▶ MSE_train high
  ▶ MSE_train ≈ MSE_cv
• Variance (overfitting):
  ▶ MSE_train low
  ▶ MSE_cv ≫ MSE_train
Classification example
Model selection
y = 1 if cancer, 0 otherwise.
Confusion matrix

                       Predicted class
                       0          1          Total
Observed class   0     TN         FP         TN + FP
                 1     FN         TP         FN + TP
      Total            TN + FN    FP + TP    n
• TN: true negatives
• FP: false positives
• FN: false negatives
• TP: true positives

• Precision: Precision = TP / (TP + FP)
• Recall: Recall = TP / (TP + FN)
Trade-off precision/recall

F1 score

F1 score = 2 × (Precision × Recall) / (Precision + Recall)
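Precision, recall and the F1 score can be computed directly from the confusion-matrix counts (a minimal sketch with made-up counts):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example counts: TP = 8, FP = 2, FN = 2
p, r, f1 = precision_recall_f1(8, 2, 2)
print(p, r, f1)  # all three equal 0.8 here, since precision = recall
```

The F1 score is the harmonic mean of precision and recall, so it is high only when both are high, which is why it is used to summarize the trade-off.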