
ARTIFICIAL INTELLIGENCE
MACHINE LEARNING

Lecture 5 - AI - ML: Logistic Regression
Dr. Aicha BOUTORH
2020/2021

Classification

§ Classification predictive modeling involves assigning a class label to input examples.

§ Binary classification refers to predicting one of two classes.

§ Multi-class classification involves predicting one of more than two classes.

§ Multi-label classification involves predicting one or more classes for each example.

§ Imbalanced classification refers to classification tasks where the distribution of examples across the classes is not equal.

Logistic Regression

Goal: take an input vector x and assign it to one of two classes, y ∈ {0, 1}.

Ø 0: Negative Class (e.g. benign tumor, not-spam email, …)

Ø 1: Positive Class (e.g. malignant tumor, spam email, …)

Linear Classifier

§ Given examples (x⁽ⁱ⁾, y⁽ⁱ⁾), learn a classifier that is able to predict y* given a new point x*.

§ It should generalize well to new x*.

§ Example:
Ø x1: Fish Weight
Ø x2: Fish Length
Ø y: Fish Species

Figure Source: Logistic Regression, Dr. Patras, Hospedales

h_θ(x) = θᵀx

[Figure: h_θ(x) = θᵀx plotted against x, with a threshold at 0.5]

Threshold the classifier output h_θ(x) at 0.5:

Ø If h_θ(x) ≥ 0.5 then predict y = 1

Ø If h_θ(x) < 0.5 then predict y = 0

Ø A binary linear classifier is a classifier that separates two classes using a line, a plane, or a hyperplane.

Ø Classification: y = 0 or y = 1

Ø Logistic Regression: 0 ≤ h_θ(x) ≤ 1


Linear vs Logistic Regression

§ Linear and Logistic Regression use different hypothesis / representation / model assumptions:

Ø Linear Regression: h_θ(x) ∈ (-∞, +∞)

Ø Logistic Regression: h_θ(x) ∈ [0, 1]

h_θ(x) = g(θᵀx)

h_θ(x) = 1 / (1 + e^(−θᵀx))

g(z) = 1 / (1 + e^(−z))

Logistic Regression: 0 ≤ h_θ(x) ≤ 1

g(z) is called the Sigmoid Function, or Logistic Function.

[Figure: sigmoid curve of g(z), crossing 0.5 at z = 0]
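A minimal sketch of this hypothesis in code (assuming NumPy; the function names are illustrative, not from the lecture):

    import numpy as np

    def sigmoid(z):
        """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z)): maps any real z into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    def hypothesis(theta, x):
        """h_theta(x) = g(theta^T x): the predicted probability that y = 1."""
        return sigmoid(np.dot(theta, x))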
Interpretation of Hypothesis Output

Ø h_θ(x) estimates the probability that y = 1 on input x.

Ø Example: xᵀ = [x0, x1] = [1, TumorSize]

If h_θ(x) = 0.8, then predict y = 1.

§ It signifies an 80% chance of the tumor being malignant.

☞ h_θ(x) = P(y = 1 | x; θ), the probability that y = 1, given x, parametrized by θ.

Ø P(y = 1 | x; θ) + P(y = 0 | x; θ) = 1

Ø P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)  (here, the probability of the tumor being benign is 20%)
v Sigmoid Function, also called the Logistic Function

The function g(z) maps any real number to the (0, 1) interval, making a real-valued function better suited for classification.

h_θ(x) gives the probability that the output is 1. The probability that the prediction is 0 is the complement of the probability that it is 1.

h_θ(x) = g(θᵀx) = P(y = 1 | x; θ)

[Figure: sigmoid curve; h_θ(x) ≥ 0.5 when θᵀx ≥ 0, and h_θ(x) < 0.5 when θᵀx < 0]

Ø Predict y = 1 if h_θ(x) ≥ 0.5 ☞ θᵀx ≥ 0

Ø Predict y = 0 if h_θ(x) < 0.5 ☞ θᵀx < 0
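A one-line sketch of this decision rule (reusing the hypothetical sigmoid/hypothesis helpers sketched earlier):

    def predict(theta, x):
        """Return 1 if h_theta(x) >= 0.5 (equivalently, theta^T x >= 0), else 0."""
        return 1 if hypothesis(theta, x) >= 0.5 else 0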

Decision Boundary

§ The Decision Boundary is the boundary between the two classes, where:

P(y = 1 | x; θ) = P(y = 0 | x; θ) = 0.5

h_θ(x) = 1 / (1 + e^(−θᵀx)) = 0.5

1 + e^(−θᵀx) = 2

θᵀx = 0

Decision Boundary

v Example: θ = [−3, 1, 1]ᵀ

Ø h_θ(x) = g(θᵀx) = g(θ0·x0 + θ1·x1 + θ2·x2)

Ø θᵀx = −3 + x1 + x2

Ø Predict y = 1 if θᵀx ≥ 0, i.e. if −3 + x1 + x2 ≥ 0, i.e. if x1 + x2 ≥ 3

Ø Predict y = 0 if x1 + x2 < 3

Ø Decision Boundary: x1 + x2 = 3 ☞ h_θ(x) = 0.5

Even if the data set is taken away, the decision boundary stays the same: the regions where we predict y = 1 versus y = 0 are a property of the hypothesis and its parameters, not a property of the data set.

Source: ML – Andrew Ng
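As an illustrative check of this example in code (the point x is a hypothetical value I chose; the sigmoid is as sketched above):

    import numpy as np

    theta = np.array([-3.0, 1.0, 1.0])  # theta from the slide
    x = np.array([1.0, 2.5, 1.0])       # x0 = 1 (bias), x1 = 2.5, x2 = 1.0 (hypothetical point)

    z = np.dot(theta, x)                # theta^T x = -3 + 2.5 + 1.0 = 0.5
    prob = 1.0 / (1.0 + np.exp(-z))     # h_theta(x) ≈ 0.62
    print(int(prob >= 0.5))             # 1, since x1 + x2 = 3.5 >= 3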

Non-Linear Decision Boundary

A more complex example: given a training set as presented in the figure, how can logistic regression fit this sort of data?

For example, the hypothesis may look like this:

h_θ(x) = g(θ0·x0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x2²)

Two extra features were added: x1 squared and x2 squared.

Source: ML – Andrew Ng

Non-Linear Decision Boundary

v Take θ = [−1, 0, 0, 1, 1]ᵀ

Ø h_θ(x) = g(θ0·x0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x2²)

Ø h_θ(x) = g(−1 + x1² + x2²)

Ø Predict y = 1 if −1 + x1² + x2² ≥ 0, i.e. if x1² + x2² ≥ 1

Ø Decision Boundary: x1² + x2² = 1 ☞ h_θ(x) = 0.5

By adding these more complex, polynomial terms to the features, more complex decision boundaries are obtained. The classifier does not just try to separate the positive and negative examples with a straight line; here the decision boundary is a circle.

Ø Once again, the decision boundary is a property of the hypothesis and its parameters, not of the training set. As long as the parameter vector θ is given, it defines the decision boundary, which here is the circle.

Ø The training set is used to fit the parameters θ, but not to define the decision boundary.

Example 2: with higher-order terms, even more complex boundaries are possible:

h_θ(x) = g(θ0·x0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x1²·x2 + θ5·x1²·x2² + θ6·x1³·x2)

Source: ML – Andrew Ng
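A short sketch of the circle example's feature mapping (the helper name is illustrative, not from the lecture):

    import numpy as np

    def circle_features(x1, x2):
        """Map (x1, x2) to [1, x1, x2, x1^2, x2^2], the features in the slide's hypothesis."""
        return np.array([1.0, x1, x2, x1**2, x2**2])

    theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])          # theta from the slide
    print(np.dot(theta, circle_features(0.5, 0.5)) >= 0)  # False: inside the unit circle, y = 0
    print(np.dot(theta, circle_features(1.0, 1.0)) >= 0)  # True: x1^2 + x2^2 = 2 >= 1, y = 1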


Ø Training set: {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)}

Ø m examples in the dataset

Ø x = [x0, x1, x2, …, xn]ᵀ ∈ ℝⁿ⁺¹, with x0 = 1 and y ∈ {0, 1}

Ø How to choose the parameters θ?


Cost Function

§ Linear Regression: J(θ) = (1/m) Σᵢ₌₁ᵐ ½ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

§ Logistic Regression: J(θ) = (1/m) Σᵢ₌₁ᵐ Cost(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾)

§ With h_θ(x) = 1 / (1 + e^(−θᵀx)), what should Cost(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾) be?

The Cost: Negative Log Likelihood

§ Cost(h_θ(x), y) = − log(P(y = 1 | x; θ))     if y = 1

§ Cost(h_θ(x), y) = − log(1 − P(y = 1 | x; θ)) if y = 0

Logistic Regression Cost Function

Ø The cost is the penalty that the algorithm pays for the value of h_θ(x) (a number like 0.8, the predicted value), relative to the value of the label y.

v The cost is:

Ø − log(h_θ(x))     if y = 1

Ø − log(1 − h_θ(x)) if y = 0
Logistic Regression Cost Function

For y = 1:

Ø Cost = 0 if y = 1 and h_θ(x) = 1

Ø Cost → ∞ as h_θ(x) → 0

Ø If h_θ(x) = 0 but y = 1, the learning algorithm will be penalized by a very large cost.

Figure Source: https://www.geeksforgeeks.org/ml-cost-function-in-logistic-regression/
Logistic Regression Cost Function

For y = 0:

Ø The curve goes to plus infinity as h_θ(x) goes to 1.

Ø If the label y = 0 but the hypothesis predicted y = 1, then the algorithm pays a very large cost.

Figure Source: https://www.geeksforgeeks.org/ml-cost-function-in-logistic-regression/


q Rather than writing out this cost function as two separate cases, y = 1 and y = 0, the function can be simplified and the two lines compressed into one equation.

q This makes it more convenient to write out the cost function and derive gradient descent.

Cost(h_θ(x), y) = − y·log(h_θ(x)) − (1 − y)·log(1 − h_θ(x))

Ø If y = 1: Cost(h_θ(x), y) = − log(h_θ(x))

Ø If y = 0: Cost(h_θ(x), y) = − log(1 − h_θ(x))
Logistic Regression Cost Function

§ J(θ) = (1/m) Σᵢ₌₁ᵐ Cost(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾)

§ J(θ) = −(1/m) [ Σᵢ₌₁ᵐ y⁽ⁱ⁾ log(h_θ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ]

§ Parameters θ: find the θ that minimizes the cost J(θ).

§ To make a prediction given a new x, output h_θ(x) = 1 / (1 + e^(−θᵀx)).
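A vectorized sketch of this cost (assuming NumPy; X is an m×(n+1) design matrix with a leading column of ones, y a vector of 0/1 labels; the eps guard against log(0) is my addition, not from the lecture):

    import numpy as np

    def cost(theta, X, y, eps=1e-12):
        """Cross-entropy cost J(theta), averaged over the m training examples."""
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x^(i)) for all i at once
        return -np.mean(y * np.log(h + eps) + (1.0 - y) * np.log(1.0 - h + eps))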

Gradient Descent

§ J(θ) = −(1/m) [ Σᵢ₌₁ᵐ y⁽ⁱ⁾ log(h_θ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ]

with h_θ(x) = 1 / (1 + e^(−θᵀx))

§ Repeat until convergence, simultaneously updating every parameter:

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ = θⱼ − α · (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

Figure Source: Logistic Regression, Dr. Patras, Hospedales
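A minimal batch gradient descent sketch under the same assumptions (the learning rate alpha and the iteration count are illustrative choices, not values from the lecture):

    import numpy as np

    def gradient_descent(X, y, alpha=0.1, n_iters=1000):
        """Repeat the simultaneous update theta_j := theta_j - alpha * dJ/dtheta_j."""
        theta = np.zeros(X.shape[1])
        for _ in range(n_iters):
            h = 1.0 / (1.0 + np.exp(-(X @ theta)))         # predictions for all m examples
            theta -= alpha * (X.T @ (h - y)) / X.shape[0]  # vectorized gradient of J(theta)
        return theta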


Multiclass Classification

§ Multiclass classification involves predicting one of more than two classes, y ∈ {1, 2, 3, …, K} for K possible classes.

§ Example: Email classification (K = 5)

Ø Work ☞ y = 1
Ø Friends ☞ y = 2
Ø Family ☞ y = 3
Ø Contacts ☞ y = 4
Ø Services ☞ y = 5

From Binary to Multiclass

§ We know how to model the logistic function and learn binary classifiers (K = 2) with gradient descent.

§ What about K > 2, multi-class problems?

Source: Andrew Ng

One-vs-All

§ Use K classifiers, each solving a two-class problem of separating class k from all the others.

§ For each class k, a logistic regression classifier h_θ⁽ᵏ⁾(x) is trained to predict the probability that y = k, i.e. P(y = k | x; θ).

§ To make a prediction for a new instance x (input), select the class k that maximizes h_θ⁽ᵏ⁾(x), as in the sketch below.
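A sketch of One-vs-All built on the hypothetical gradient_descent helper above (classes assumed to be numbered 1..K):

    import numpy as np

    def train_one_vs_all(X, y, K, alpha=0.1, n_iters=1000):
        """Train K binary classifiers; classifier k treats class k as 1, the rest as 0."""
        return np.array([gradient_descent(X, (y == k).astype(float), alpha, n_iters)
                         for k in range(1, K + 1)])

    def predict_one_vs_all(thetas, x):
        """Select the class k whose classifier h_theta^(k)(x) is largest."""
        probs = 1.0 / (1.0 + np.exp(-(thetas @ x)))
        return int(np.argmax(probs)) + 1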
One-vs-All: K Classes

[Figure: one binary decision boundary per class k, separating class k from all others]

Figure Source: Logistic Regression, Dr. Patras, Hospedales


Summary

§ Logistic Regression uses the sigmoid function to predict a probability in [0, 1], suitable for classification.

§ h_θ(x) gives the probability that the output is 1.

§ The decision boundary is a property of the hypothesis and its parameters, not a property of the data set.

§ The cost is the penalty that the algorithm pays for the value of h_θ(x) relative to the value of the label y.

§ Multi-class classification involves predicting one of K classes, where K > 2.

§ One-vs-All selects the class k that maximizes h_θ⁽ᵏ⁾(x) over the K trained logistic regression classifiers.


References

§ Introduction to Machine Learning, Andrew Ng, Stanford University.

§ Introduction to Machine Learning with Python, Andreas C. Müller and Sarah Guido. O'Reilly; 2017. ISBN 978-1-449-36941-5.

§ Applied Machine Learning in Python, Kevyn Collins-Thompson, University of Michigan.

§ The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Hastie T, Tibshirani R, Friedman J. Springer Science & Business Media; 2009.

§ An Introduction to Statistical Learning, with Applications in R. James G, Witten D, Hastie T, Tibshirani R. Springer; 2013.

§ Mathematics for Machine Learning. Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong; 2020.