Logistic Regression
Machine Learning – Lecture 5 (AI – ML)
Dr. Aicha Boutorh

Classification
Logistic Regression

Ø 0: Negative Class
Ø 1: Positive Class
Linear Classifier

§ Given examples (x(i), y(i)), learn a classifier that is able to predict y* given a new point x*.
§ Example:
Ø x2 : Fish Length
Ø y : Fish Species
Figure Source: Logistic Regression, Dr. Patras, Hospedales
§ Linear regression hypothesis: h𝛉(x) = 𝛉T x, thresholded at 0.5
Ø Classification: y = 0 or y = 1
Linear vs Logistic Regression

§ Linear and Logistic Regression use different Hypothesis / Representation / Model assumptions.

§ Logistic Regression hypothesis: h𝜽(x) = 𝒈(𝜽T x)
§ h𝜽(x) = 1 / (1 + e^(−𝜽T x))
§ 𝒈(z) = 1 / (1 + e^(−z)) is the Sigmoid Function, also called the Logistic Function.
Ø 0 ≤ h𝛉(x) ≤ 1, with 𝒈(0) = 0.5
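As a quick sketch, the sigmoid can be written directly from the formula above (plain Python; the function name is illustrative, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# g maps any real z into (0, 1), with g(0) = 0.5:
# large positive z gives values close to 1, large negative z values close to 0.
```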
Interpretation of Hypothesis Output

☞ h𝜽(x) = P(y = 1 | x; 𝜽)
Ø P(y = 0 | x; 𝜽) = 1 − P(y = 1 | x; 𝜽)
Ø Example: if h𝜽(x) = 0.8 for a tumor, the probability of the tumor being benign (y = 0) is 20%.
v Sigmoid Function, also called the Logistic Function

§ The function g(z) maps any real number to the (0, 1) interval, making any real-valued function better suited for classification.
§ h𝜽(x) gives the probability that the output is 1. The probability that the prediction is 0 is the complement of the probability that it is 1.
Logistic Regression

h𝜽(x) = 𝒈(𝜽T x) = P(y = 1 | x; 𝛉)

Ø Predict y = 1 if h𝜽(x) ≥ 0.5, i.e. 𝜽T x ≥ 0
Ø Predict y = 0 if h𝜽(x) < 0.5, i.e. 𝜽T x < 0
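A minimal sketch of this decision rule (illustrative names): since g is increasing with g(0) = 0.5, thresholding h𝜽(x) at 0.5 is the same as thresholding 𝜽T x at 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """Predict y = 1 iff h(x) = g(theta^T x) >= 0.5, i.e. iff theta^T x >= 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0
```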
Decision Boundary

§ On the boundary, h𝜽(x) = 1 / (1 + e^(−𝜽T x)) = 0.5
§ So 1 + e^(−𝜽T x) = 2, i.e. e^(−𝜽T x) = 1
§ Hence 𝜽T x = 0
Decision Boundary

v Example: 𝛉 = (−3, 1, 1)T
Ø h𝜽(x) = g(𝜽T x) = g(𝛉0 x0 + 𝛉1 x1 + 𝛉2 x2)
Ø 𝜽T x = −3 + x1 + x2
Ø Predict y = 1 if 𝜽T x ≥ 0, i.e. −3 + x1 + x2 ≥ 0, i.e. x1 + x2 ≥ 3
Ø Predict y = 0 if x1 + x2 < 3
Ø Decision Boundary: x1 + x2 = 3, where h𝜽(x) = 0.5
Source: ML – Andrew Ng
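The boundary in this example can be checked numerically; a tiny sketch with hypothetical test points (x0 = 1 is absorbed into the intercept):

```python
def predicts_positive(x1, x2):
    """theta = (-3, 1, 1): predict y = 1 exactly when -3 + x1 + x2 >= 0."""
    return -3 + x1 + x2 >= 0

# Points with x1 + x2 > 3 land on the y = 1 side of the boundary,
# points with x1 + x2 < 3 on the y = 0 side; the line x1 + x2 = 3 itself counts as y = 1.
```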
Ø Even if the data set is taken away, the decision boundary stays the same: it is a property of the hypothesis and its parameters, not of the data.

Non-Linear Decision Boundary

Ø With higher-order features, h𝜽(x) = g(𝛉0 x0 + 𝛉1 x1 + 𝛉2 x2 + 𝛉3 x1² + 𝛉4 x2²)
Ø A more complex decision boundary is obtained: it does not just try to separate the positive and negative examples with a straight line; the decision boundary is a circle.
Source: ML – Andrew Ng
Non-Linear Decision Boundary

Ø h𝜽(x) = g(𝛉0 x0 + 𝛉1 x1 + 𝛉2 x2 + 𝛉3 x1² + 𝛉4 x2²)
Ø h𝜽(x) = g(−1 + x1² + x2²)
Ø Predict y = 1 if −1 + x1² + x2² ≥ 0
Ø Decision Boundary: x1² + x2² = 1
Ø The training set is used to fit the parameters 𝜽, not to define the decision boundary.
Ø Example 2: with even higher-order polynomial features, even more complex decision boundaries can be obtained.
Source: ML – Andrew Ng
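The circular boundary behaves the same way; a sketch with hypothetical test points:

```python
def predicts_positive(x1, x2):
    """theta = (-1, 0, 0, 1, 1) on features (x0, x1, x2, x1^2, x2^2):
    predict y = 1 exactly when x1^2 + x2^2 >= 1, i.e. on or outside the unit circle."""
    return -1 + x1**2 + x2**2 >= 0
```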
Logistic Regression Cost Function

Ø Training set: {(x(1), y(1)), (x(2), y(2)), … , (x(m), y(m))}

§ Linear Regression: J(𝜽) = (1/m) Σᵢ₌₁ᵐ (1/2) (h𝜽(x(i)) − y(i))²
§ Logistic Regression: J(𝜽) = (1/m) Σᵢ₌₁ᵐ Cost(h𝜽(x(i)), y(i))
The Cost: Negative Log Likelihood

§ Cost(h𝜽(x), y) = − log(P(y = 1 | x; 𝜽)) if y = 1
§ Cost(h𝜽(x), y) = − log(P(y = 0 | x; 𝜽)) if y = 0

Ø The cost is the penalty that the algorithm pays for the value of h𝛉(x) (a number such as 0.8, the predicted probability), relative to the value of the label y.

v The cost is:
Ø − log(h𝛉(x)) if y = 1
Ø − log(1 − h𝛉(x)) if y = 0
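The two-branch cost can be sketched directly (illustrative names):

```python
import math

def cost(h, y):
    """Per-example negative log likelihood: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

# A confident correct prediction costs ~0; a confident wrong one costs a lot.
```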
Logistic Regression Cost Function

v Case y = 1:
Ø Cost = 0 if h𝛉(x) = 1
Ø Cost → ∞ as h𝛉(x) → 0
Ø If h𝛉(x) = 0 but y = 1, the learning algorithm is penalized by a very large cost.
Figure Source: https://www.geeksforgeeks.org/ml-cost-function-in-logistic-regression/
v Case y = 0:
Ø Cost = 0 if h𝛉(x) = 0
Ø The curve goes to plus infinity as h𝛉(x) goes to 1: if h𝛉(x) = 1 but y = 0, the learning algorithm is penalized by a very large cost.
Simplified Cost Function

q Rather than writing out this cost function in two separate cases, y = 1 and y = 0, the function can be simplified and the two lines can be compressed into one equation.
q This makes it more convenient to write out the cost function and derive gradient descent.
§ J(𝜽) = (1/m) Σᵢ₌₁ᵐ Cost(h𝜽(x(i)), y(i))
§ J(𝜽) = −(1/m) [ Σᵢ₌₁ᵐ y(i) log(h𝛉(x(i))) + (1 − y(i)) log(1 − h𝛉(x(i))) ]

Output: h𝜽(x) = 1 / (1 + e^(−𝜽T x))
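Putting the pieces together, the full cost J(𝜽) can be sketched as (plain Python, illustrative names; each x is assumed to already include the intercept term x0 = 1):

```python
import math

def J(theta, X, ys):
    """J(theta) = -(1/m) * sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]."""
    total = 0.0
    for x, y in zip(X, ys):
        h = 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / len(X)

# Sanity check: with theta = 0, every h is 0.5, so J = log(2) ~ 0.693
# regardless of the labels.
```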
Gradient Descent

§ J(𝜽) = −(1/m) [ Σᵢ₌₁ᵐ y(i) log(h𝛉(x(i))) + (1 − y(i)) log(1 − h𝛉(x(i))) ], with h𝜽(x) = 1 / (1 + e^(−𝜽T x))
§ Repeat until convergence: 𝛉j := 𝛉j − (α/m) Σᵢ₌₁ᵐ (h𝜽(x(i)) − y(i)) xj(i)
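A minimal batch gradient descent sketch of this update (illustrative names; the learning rate alpha and iteration count are arbitrary choices, and each x includes the intercept x0 = 1):

```python
import math

def gradient_descent(X, ys, alpha=0.1, iters=2000):
    """Repeat: theta_j := theta_j - (alpha/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for x, y in zip(X, ys):
            # h(x_i) = g(theta^T x_i)
            h = 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))
            for j in range(n):
                grad[j] += (h - y) * x[j]
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta
```

On a small linearly separable set, the learned 𝜽 places the decision boundary between the two classes.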
Multiclass Classification

§ Multiclass classification involves predicting one of more than two classes, y ∈ {1, 2, 3, …, K} for K possible classes.
§ Example:
Ø Work ☞ y = 1
Ø Friends ☞ y = 2
Ø Family ☞ y = 3
Ø Contacts ☞ y = 4
Ø Services ☞ y = 5
Multi-class Problems
Figure Source: ML – Andrew Ng
One-vs-All

§ Use K classifiers, each solving a two-class problem of separating class k from all others.
§ To classify a new x, select the class k that maximizes 𝒉𝜽(𝒌)(𝒙).
Figure Source: Logistic Regression, Dr. Patras, Hospedales
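A sketch of the One-vs-All prediction step, assuming one already-trained parameter vector per class (the parameter values in the test are hypothetical):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_all_predict(thetas, x):
    """thetas[k] is the parameter vector of the binary classifier for class k.
    Return the class k whose h_k(x) = g(thetas[k]^T x) is largest."""
    scores = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]
    return max(range(len(scores)), key=scores.__getitem__)
```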
Summary

§ Logistic Regression uses the sigmoid function to predict a probability in (0, 1), suitable for classification.
§ The cost is the penalty that the algorithm pays for the value of h𝜽(x) relative to the value of the label y.
§ One-vs-All selects the class k that maximizes 𝒉𝜽(𝒌)(𝒙) over the K trained logistic regression classifiers.
References

§ Introduction to Machine Learning, Andrew Ng, Stanford University