Lecture 5 - AI - ML
Logistic Regression

[...] and Machine Learning
2020/2021
§ Classification predictive modeling involves assigning a class label to input examples.

§ Binary classification refers to predicting one of two classes.

§ Multi-class classification involves predicting one of more than two classes.

§ Multi-label classification involves predicting one or more classes for each example.
ML - Logistic Regression

§ Imbalanced classification refers to classification tasks where the distribution of examples across the classes is not equal.
Logistic Regression
Goal: Take an input vector x and assign it to one of 2 classes y, with $y \in \{0, 1\}$.

Ø 0: Negative Class, e.g. Benign Tumor, Not Spam Email, …

Ø 1: Positive Class, e.g. Malignant Tumor, Spam Email, …

Linear Classifier
§ Given examples $(x^{(i)}, y^{(i)})$, learn a classifier that is able to predict $y^*$ given a new point $x^*$.

§ It should generalize well to new $x^*$.

§ Example:

Ø $x_1$: Fish Weight

Ø $x_2$: Fish Length

Ø $y$: Fish Species

Figure Source: Logistic Regression, Dr. Patras, Hospedales
$h_\theta(x) = \theta^T x$

Threshold the classifier output $h_\theta(x)$ at 0.5:

Ø If $h_\theta(x) \ge 0.5$, predict y = 1

Ø If $h_\theta(x) < 0.5$, predict y = 0

Ø A binary linear classifier is a classifier that separates two classes using a line, a plane, or a hyperplane.

Ø Classification: y = 0 or y = 1

Ø Logistic Regression: $0 \le h_\theta(x) \le 1$

§ Linear and Logistic Regression use different Hypothesis / Representation / Model Assumptions.

Ø Linear Regression: $h_\theta(x) \in (-\infty, +\infty)$

Ø Logistic Regression: $h_\theta(x) \in [0, 1]$

$h_\theta(x) = g(\theta^T x)$, with $z = \theta^T x$

$g(z) = \frac{1}{1 + e^{-z}}$, so $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

Logistic Regression: $0 \le h_\theta(x) \le 1$

$g(z)$: Sigmoid Function (Logistic Function)
Ø $h_\theta(x)$ estimates the probability that y = 1 on input x

Ø Example: $x^T = [x_0, x_1] = [1, \text{TumorSize}]$

If $h_\theta(x) = 0.8$, then predict y = 1

§ It signifies: 80% chance of the tumor being Malignant

☞ $h_\theta(x) = P(y = 1 \mid x; \theta)$

Probability that y = 1, given x, parametrized by $\theta$

Ø $P(y = 1 \mid x; \theta) + P(y = 0 \mid x; \theta) = 1$

Ø $P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)$, so the probability of the tumor being Benign is 20%
v Sigmoid Function, also called Logistic Function

The function $g(z)$ maps any real number to the (0, 1) interval, which makes it useful for transforming an arbitrary real-valued function into one better suited for classification.

$h_\theta(x)$ gives the probability that the output is 1. The probability that the output is 0 is the complement of that probability.
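
As a minimal sketch of the sigmoid mapping described above, the following Python snippet evaluates g(z) for a few inputs and shows that the two class probabilities are complements. The function names `sigmoid` and `prob_class_0` are illustrative choices, not part of the lecture.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)

def prob_class_0(z: float) -> float:
    """P(y = 0) is the complement of P(y = 1) = g(z)."""
    return 1.0 - sigmoid(z)

for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z), prob_class_0(z))
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and z = 0 gives exactly 0.5.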
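Figure: the sigmoid curve $g(\theta^T x)$ plotted against $\theta^T x$, with $h_\theta(x) \ge 0.5$ where $\theta^T x \ge 0$ and $h_\theta(x) < 0.5$ where $\theta^T x < 0$.

$h_\theta(x) = g(\theta^T x) = P(y = 1 \mid x; \theta)$

Ø Predict y = 1 if $h_\theta(x) \ge 0.5$ ☞ $\theta^T x \ge 0$

Ø Predict y = 0 if $h_\theta(x) < 0.5$ ☞ $\theta^T x < 0$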
Decision Boundary

§ The Decision Boundary is the boundary between the two classes, where:
$P(y = 1 \mid x; \theta) = P(y = 0 \mid x; \theta) = 0.5$

$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} = 0.5$

$1 + e^{-\theta^T x} = 2$

$\theta^T x = 0$

v Example: $\theta = [-3, 1, 1]^T$
Ø $h_\theta(x) = g(\theta^T x) = g(\theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2)$
Ø $\theta^T x = -3 + x_1 + x_2$ (with $x_0 = 1$)
Ø Predict y = 1 if $\theta^T x \ge 0$, i.e. if $-3 + x_1 + x_2 \ge 0$, i.e. if $x_1 + x_2 \ge 3$
Ø Predict y = 0 if $x_1 + x_2 < 3$
Ø Decision Boundary: $x_1 + x_2 = 3$ ☞ $h_\theta(x) = 0.5$

Even if the data set is taken away, the decision boundary (the line separating the region where y = 1 is predicted from the region where y = 0 is predicted) stays the same. It is a property of the hypothesis and its parameters, not a property of the data set.
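
The example above, where $\theta^T x = -3 + x_1 + x_2$, can be sketched in a few lines of Python. The helper names and the test points are illustrative assumptions; only the parameter values come from the example.

```python
import math

THETA = [-3.0, 1.0, 1.0]  # theta_0, theta_1, theta_2 from the example

def linear_score(theta, x1, x2):
    """theta^T x with the bias feature x0 = 1."""
    return theta[0] * 1.0 + theta[1] * x1 + theta[2] * x2

def hypothesis(theta, x1, x2):
    """h_theta(x) = g(theta^T x)."""
    return 1.0 / (1.0 + math.exp(-linear_score(theta, x1, x2)))

def predict(theta, x1, x2):
    """Predict 1 when theta^T x >= 0, which is the same as h_theta(x) >= 0.5."""
    return 1 if linear_score(theta, x1, x2) >= 0 else 0

# Points below, on, and above the line x1 + x2 = 3 (made-up values).
for point in [(1.0, 1.0), (1.5, 1.5), (3.0, 2.0)]:
    print(point, hypothesis(THETA, *point), predict(THETA, *point))
```

No training data appears anywhere in this snippet: the boundary is fixed entirely by the parameter vector.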
Non-Linear Decision Boundary

More complex example:
Given a training set like the one presented in the figure, how can logistic regression be made to fit this sort of data?

For example, the hypothesis can look like this:

$h_\theta(x) = g(\theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$

Two extra features were added: $x_1$ squared and $x_2$ squared.
v Take $\theta = [-1, 0, 0, 1, 1]^T$
Ø $h_\theta(x) = g(\theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$

Ø $h_\theta(x) = g(-1 + x_1^2 + x_2^2)$

Ø Predict y = 1 if $-1 + x_1^2 + x_2^2 \ge 0$, i.e. if $x_1^2 + x_2^2 \ge 1$

Ø Decision Boundary: $x_1^2 + x_2^2 = 1$

By adding these more complex, polynomial terms to the features, more complex decision boundaries are obtained. The boundary no longer has to separate the positive and negative examples with a straight line; here it is a circle.

Ø Once again, the decision boundary is a property of the hypothesis and its parameters, not of the training set. Once the parameter vector $\theta$ is given, it defines the decision boundary, which here is the circle.

Ø The training set is used to fit the parameters $\theta$, but not to define the decision boundary.

Example 2:
Ø $h_\theta(x) = g(\theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1^2 x_2 + \theta_5 x_1^2 x_2^2 + \theta_6 x_1^3 x_2)$
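
The circular boundary from the first example, $h_\theta(x) = g(-1 + x_1^2 + x_2^2)$, can be checked numerically. This is a minimal sketch: the `features` helper and the sample points are assumptions made for illustration, and the parameter values are the ones implied by that hypothesis.

```python
THETA = [-1.0, 0.0, 0.0, 1.0, 1.0]  # theta_0 .. theta_4 implied by g(-1 + x1^2 + x2^2)

def features(x1, x2):
    """Map (x1, x2) to [x0, x1, x2, x1^2, x2^2] with x0 = 1."""
    return [1.0, x1, x2, x1 * x1, x2 * x2]

def score(theta, x1, x2):
    """theta^T x over the polynomial feature vector."""
    return sum(t * f for t, f in zip(theta, features(x1, x2)))

def predict(theta, x1, x2):
    """Predict 1 on or outside the unit circle, 0 inside it."""
    return 1 if score(theta, x1, x2) >= 0 else 0

for point in [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (2.0, 2.0)]:
    print(point, score(THETA, *point), predict(THETA, *point))
```

The model is still linear in its parameters; only the feature vector has been extended, which is what turns the straight-line boundary into a circle.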
Ø Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$

Ø m examples in the dataset

Ø $x = [x_0, x_1, \dots, x_n]^T \in \mathbb{R}^{n+1}$, $x_0 = 1$, $y \in \{0, 1\}$

Ø How to choose parameters $\theta$?

§ Linear Regression: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

§ Logistic Regression: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$

§ $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, $\mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = \;??$
The Cost: Negative Log Likelihood

§ $\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(P(y = 1 \mid x; \theta)) & \text{if } y = 1 \\ -\log(1 - P(y = 1 \mid x; \theta)) & \text{if } y = 0 \end{cases}$

Logistic Regression Cost Function

Ø The cost is the penalty that the algorithm pays for the value of $h_\theta(x)$ (a number such as 0.8, which is the predicted value), relative to the value of the label y.

v The cost is:

Ø $-\log(h_\theta(x))$ if y = 1
Ø $-\log(1 - h_\theta(x))$ if y = 0
Case y = 1, cost $-\log(h_\theta(x))$:

Ø Cost = 0 if y = 1 and $h_\theta(x) = 1$

Ø Cost → ∞ as $h_\theta(x) \to 0$

Ø If $h_\theta(x) = 0$ but y = 1, the learning algorithm is penalized by a very large cost.

Case y = 0, cost $-\log(1 - h_\theta(x))$:

Ø The curve goes to plus infinity as $h_\theta(x)$ goes to 1.

Ø If the label is y = 0 but the hypothesis predicted y = 1, the algorithm pays a very large cost.
q Rather than writing out this cost function as two separate cases, y = 1 and y = 0, the function can be simplified and the two lines compressed into one equation.

q This makes it more convenient to write out the cost function and derive gradient descent.
$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$

Ø If y = 1: $\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$

Ø If y = 0: $\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$
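
A short Python sketch can confirm that the single-equation form agrees with the two-case form, and show how the penalty grows as the prediction moves away from the label. The function names and sample values are illustrative assumptions.

```python
import math

def cost_two_cases(h: float, y: int) -> float:
    """Piecewise form: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

def cost_combined(h: float, y: int) -> float:
    """Single-equation form: -y*log(h) - (1 - y)*log(1 - h)."""
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)

# h must lie strictly between 0 and 1, otherwise log(0) is undefined.
for h in (0.99, 0.8, 0.5, 0.01):
    for y in (0, 1):
        print(h, y, cost_two_cases(h, y), cost_combined(h, y))
```

Because y is either 0 or 1, one of the two terms in the combined form is always multiplied by zero, which is why both functions return the same value.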
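§ $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$

§ $J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$

§ Parameters $\theta$: find the $\theta$ that minimizes the cost $J(\theta)$

§ To make a prediction given a new x:

Output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$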
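
The cost J(θ) over a whole training set can be sketched as follows. The toy data set and the two parameter vectors are made up for illustration; the function names `h` and `J` simply mirror the notation above.

```python
import math

def h(theta, x):
    """h_theta(x) = 1 / (1 + exp(-theta^T x)); x includes the bias feature x0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def J(theta, X, y):
    """Average cost over the m training examples."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        p = h(theta, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / m

# Made-up toy data: one feature plus the bias term x0 = 1.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

print(J([0.0, 0.0], X, y))   # every h is 0.5, so J equals log 2
print(J([-4.5, 2.0], X, y))  # a theta that separates the toy data gives a lower cost
```

Choosing θ then amounts to searching for the parameter vector with the smallest value of `J`.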
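Gradient Descent

§ $J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$

$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

(Figure: gradient descent update, repeated until convergence.)

Figure Source: Logistic Regression, Dr. Patras, Hospedales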
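
The update rule itself is not legible in the text above, so the following is only a sketch under stated assumptions. It uses the standard batch update obtained by differentiating J(θ), namely $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$. The learning rate `alpha`, the fixed iteration count (used here in place of a convergence test), and the toy data are all arbitrary choices for illustration.

```python
import math

def h(theta, x):
    """h_theta(x) = 1 / (1 + exp(-theta^T x)); x includes x0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta): average cost over all examples."""
    m = len(X)
    return -sum(
        yi * math.log(h(theta, xi)) + (1 - yi) * math.log(1.0 - h(theta, xi))
        for xi, yi in zip(X, y)
    ) / m

def gradient_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j."""
    m = len(X)
    errors = [h(theta, xi) - yi for xi, yi in zip(X, y)]
    return [
        t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
        for j, t in enumerate(theta)
    ]

def fit(X, y, alpha=0.1, iterations=2000):
    """Run a fixed number of updates starting from theta = 0 (assumed settings)."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha)
    return theta

# Made-up toy data: one feature plus the bias term x0 = 1.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

theta = fit(X, y)
print(theta, cost(theta, X, y))
```

All parameters are updated from the same set of errors, so the update is simultaneous, as gradient descent requires.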
Multiclass classification
§ Multiclass classification involves predicting one of more than two classes, $y \in \{1, 2, 3, \dots, K\}$ for K possible classes

§ Example: Email Classification (K = 5)

Ø Work ☞ y = 1

Ø Friends ☞ y = 2

Ø Family ☞ y = 3

Ø Contacts ☞ y = 4

Ø Services ☞ y = 5
From Binary to Multiclass

§ We know how to model (Logistic function) and learn (Gradient Descent) binary classifiers (K = 2).

§ What about K > 2, i.e. multi-class problems?
Andrew Ng
One-vs-All
§ Use K classifiers, each solving a two-class problem of separating class k from all others.

§ For each class k, a logistic regression classifier $h_\theta^{(k)}(x)$ is trained to predict the probability that y = k, i.e. $P(y = k \mid x; \theta)$.

§ To make a prediction for a new instance x (input), select the class k that maximizes $h_\theta^{(k)}(x)$, i.e. $\arg\max_k h_\theta^{(k)}(x)$.
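
The prediction step of One-vs-All can be sketched as below. The three parameter vectors are hypothetical, hand-picked values for K = 3 classifiers on a single feature; in practice each would be fitted by training one binary logistic regression classifier per class.

```python
import math

def h(theta, x):
    """h_theta(x) = 1 / (1 + exp(-theta^T x)); x includes x0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """Return the 1-based class k whose classifier output is largest, plus all outputs."""
    scores = [h(theta, x) for theta in thetas]
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best + 1, scores

# Hypothetical parameters for K = 3 classifiers on x = [1, x1].
THETAS = [
    [2.0, -2.0],   # class 1: high output for small x1
    [-1.0, 0.0],   # class 2: constant, low-confidence output
    [-6.0, 2.0],   # class 3: high output for large x1
]

for x1 in (0.0, 2.0, 5.0):
    print(x1, predict_one_vs_all(THETAS, [1.0, x1]))
```

The K outputs are produced by independent classifiers, so they need not sum to 1; only their ranking is used to pick the class.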
One-vs-All: K Classes

Figure Source: Logistic Regression, Dr. Patras, Hospedales
Summary
§ Logistic Regression uses the sigmoid function to predict a probability in [0, 1], which is suitable for classification.

§ $h_\theta(x)$ gives the probability that the output is 1.

§ The decision boundary is a property of the hypothesis and its parameters, not a property of the data set.

§ The cost is the penalty that the algorithm pays for the value of $h_\theta(x)$ relative to the value of the label y.

§ Multi-class classification involves predicting one of K classes where K > 2.

§ One-vs-All trains K logistic regression classifiers and selects the class k whose classifier output $h_\theta^{(k)}(x)$ is largest.

