FUTURE INSTITUTE OF TECHNOLOGY, KOLKATA

MACHINE LEARNING APPLICATIONS, 2024


PCCAIML 601
Report On

MLE & LOSS FUNCTION

Author: SHRUTANIK CHATTERJEE


Course Instructor: DR. PRADIPTA KR. BANERJEE

Department of CSE (AI & ML)


University Roll No: 34230822046
University Registration No: 223420120181 of 2022-23
6th Semester
Contents

1 Show that the derivative of σ(z) is σ(z) · (1 − σ(z))

2 Derivation of Cross Entropy Loss from Maximum Likelihood Estimation of Logistic Regression

3 Draw MSE, log-loss and hinge-loss
3.1 Python Code Implementation

4 If f(x; p) = p · (1 − p)^{x−1} and the sample [4, 5, 6, 5, 6, 3] is drawn from this distribution, estimate the maximum likelihood of p
1 Show that the derivative of σ(z) is σ(z) · (1 − σ(z))

Show that σ′(z) = σ(z)(1 − σ(z)).

The sigmoid function σ(z) is defined as:

σ(z) = \frac{1}{1 + e^{-z}}

Let us denote σ(z) as f(z). The derivative of f(z) with respect to z is:

\frac{d}{dz} f(z) = \frac{d}{dz} \left( \frac{1}{1 + e^{-z}} \right)

To find this derivative, we will use the quotient rule (here u′ and v′ are the derivatives of u and v respectively):

\frac{d}{dz} \left( \frac{u}{v} \right) = \frac{u′v − uv′}{v^2}

Let u = 1 and v = 1 + e^{-z}; then u′ = 0 and v′ = −e^{-z}.

∴ Applying the quotient rule, we get:

\frac{d}{dz} f(z) = \frac{0 · (1 + e^{-z}) − 1 · (−e^{-z})}{(1 + e^{-z})^2} = \frac{e^{-z}}{(1 + e^{-z})^2}

Also note that f(z) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{e^{z} + 1}, and f(z) = σ(z).

∴ \frac{d}{dz} σ(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \left[ \frac{1}{1 + e^{-z}} \right] · \left[ \frac{e^{-z}}{1 + e^{-z}} \right]

= \left[ \frac{1}{1 + e^{-z}} \right] · \left[ 1 − \frac{1}{1 + e^{-z}} \right]

= σ(z) · [1 − σ(z)]

∴ Derivative of 𝜎(𝑧) = 𝜎(𝑧) ⋅ (1 − 𝜎(𝑧))
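As a quick numerical sanity check of this identity (an illustration added here, not part of the derivation above), the short Python sketch below compares the analytic derivative σ(z)(1 − σ(z)) with a central finite-difference approximation; the helper names sigmoid and sigmoid_grad are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Analytic derivative shown above: sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare against a central finite difference at several points
z = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)
print(np.max(np.abs(numeric - sigmoid_grad(z))))  # on the order of 1e-10: the two agree
```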

2 Derivation of Cross Entropy Loss from Maximum Likelihood Estimation of Logistic Regression:
Derive the cross entropy loss from MLE estimation of Logistic Regression.

To derive the cross-entropy loss from MLE in Logistic Regression, we start with the likelihood function. In logistic regression, we model the probability of a binary outcome:

P(y = 1 ∣ x) = σ(ω^⊤ x) = \frac{1}{1 + e^{-ω^⊤ x}}

P(y = 0 ∣ x) = 1 − P(y = 1 ∣ x)

Here σ(z) is the logistic function, σ(z) = \frac{1}{1 + e^{-z}}.

• ω is the weight vector and x is the input vector.


For a dataset {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, the likelihood function can be written as:

L(ω) = \prod_{i=1}^{N} P(y_i ∣ x_i)


Taking the logarithm of the likelihood function (log-likelihood) to simplify calculations:

log L(ω) = \sum_{i=1}^{N} log P(y_i ∣ x_i)

∴ log P(y_i ∣ x_i) = y_i log(σ(ω^⊤ x_i)) + (1 − y_i) log(1 − σ(ω^⊤ x_i))
To maximize the likelihood function, we minimize the negative log-likelihood (averaged over the N samples):

∴ J(ω) = −\frac{1}{N} \sum_{i=1}^{N} [ y_i log(σ(ω^⊤ x_i)) + (1 − y_i) log(1 − σ(ω^⊤ x_i)) ]

Now J(ω) = −\frac{1}{N} \sum_{i=1}^{N} [ y_i log(\frac{1}{1 + e^{-ω^⊤ x_i}}) + (1 − y_i) log(1 − \frac{1}{1 + e^{-ω^⊤ x_i}}) ]

Now y_i log(\frac{1}{1 + e^{-ω^⊤ x_i}}) = y_i log(1) − y_i log(1 + e^{-ω^⊤ x_i}) = −y_i log(1 + e^{-ω^⊤ x_i})   … (i)

and (1 − y_i) log(1 − \frac{1}{1 + e^{-ω^⊤ x_i}}) = (1 − y_i) log(\frac{e^{-ω^⊤ x_i}}{1 + e^{-ω^⊤ x_i}})

= (1 − y_i) log(e^{-ω^⊤ x_i}) − (1 − y_i) log(1 + e^{-ω^⊤ x_i})

= −(1 − y_i) ω^⊤ x_i − (1 − y_i) log(1 + e^{-ω^⊤ x_i})   … (ii)

Substituting (i) and (ii) into the negative log-likelihood, we get:


J(ω) = \frac{1}{N} \sum_{i=1}^{N} [ y_i log(1 + e^{-ω^⊤ x_i}) + (1 − y_i) ω^⊤ x_i + (1 − y_i) log(1 + e^{-ω^⊤ x_i}) ]

= \frac{1}{N} \sum_{i=1}^{N} [ (1 − y_i) ω^⊤ x_i + log(1 + e^{-ω^⊤ x_i}) ]

Equivalently, if the labels are recoded as y_i ∈ {−1, +1} instead of {0, 1}, this can be written compactly as

J(ω) = \frac{1}{N} \sum_{i=1}^{N} log(1 + e^{-y_i ω^⊤ x_i}) ← cross entropy loss from
MLE estimation of Logistic Regression
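To illustrate that the {0, 1} cross-entropy form and the recoded {−1, +1} form above give the same value, here is a minimal NumPy sketch on hypothetical random data; the names X, w and y are assumptions made for this example, not quantities from the report.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: N samples, d features, labels in {0, 1}
N, d = 100, 3
X = rng.normal(size=(N, d))
w = rng.normal(size=d)
y = rng.integers(0, 2, size=N)

z = X @ w                           # omega^T x_i for every sample
sigma = 1.0 / (1.0 + np.exp(-z))    # sigmoid probabilities

# Cross-entropy form: J = -(1/N) * sum[ y*log(sigma) + (1-y)*log(1-sigma) ]
J_cross_entropy = -np.mean(y * np.log(sigma) + (1 - y) * np.log(1 - sigma))

# Equivalent logistic-loss form with labels recoded to {-1, +1}
y_pm = 2 * y - 1
J_logistic = np.mean(np.log(1 + np.exp(-y_pm * z)))

print(J_cross_entropy, J_logistic)  # the two values agree up to floating-point error
```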
3 Draw MSE, log-loss and hinge-loss:

Draw MSE, Log Loss & Hinge Loss

(1) MSE (Mean Squared Error Loss): MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i − ŷ_i)^2

(2) Log Loss (Cross-Entropy Loss): L = −[ y log(ŷ) + (1 − y) log(1 − ŷ) ]

(3) Hinge Loss: L = max(0, 1 − y · f(x)), with y ∈ {−1, +1}

All three losses are plotted by the code in Section 3.1 below.
3.1 Python Code Implementation:
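The original code listing did not survive extraction, so the following is a minimal matplotlib sketch of what such a comparison could look like: it plots MSE against the residual (y − ŷ), and log loss and hinge loss against the margin y · f(x) with y ∈ {−1, +1}; those axis choices are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# x is interpreted as the residual (y - y_hat) for MSE,
# and as the margin y * f(x), y in {-1, +1}, for log loss and hinge loss.
x = np.linspace(-3, 3, 500)

mse = x ** 2                            # MSE as a function of the residual
log_loss = np.log(1 + np.exp(-x))       # logistic (cross-entropy) loss vs. margin
hinge_loss = np.maximum(0, 1 - x)       # hinge loss vs. margin

plt.figure(figsize=(7, 4))
plt.plot(x, mse, label="MSE")
plt.plot(x, log_loss, label="Log loss")
plt.plot(x, hinge_loss, label="Hinge loss")
plt.xlabel("residual / margin")
plt.ylabel("loss")
plt.title("Comparison of Loss Functions")
plt.legend()
plt.grid(True)
plt.show()
```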

Figure: Comparison of Loss Functions


4 If f(x; p) = p · (1 − p)^{x−1} and the sample [4, 5, 6, 5, 6, 3] is drawn from this distribution, estimate the maximum likelihood of p

If f(x; p) = p · (1 − p)^{x−1} and the following sample is drawn from this distribution: [4, 5, 6, 5, 6, 3], estimate the maximum likelihood of p.

1. Probability Mass Function: f(x; p) = p (1 − p)^{x−1}


2. L(p) = f(4; p) × f(5; p) × f(6; p) × f(5; p) × f(6; p) × f(3; p)

= p(1 − p)^3 × p(1 − p)^4 × p(1 − p)^5 × p(1 − p)^4 × p(1 − p)^5 × p(1 − p)^2

= p^6 × (1 − p)^{23}

To find the maximum likelihood estimate, we differentiate the likelihood function with respect to p.

\frac{∂}{∂p} L(p) = \frac{∂}{∂p} [ p^6 (1 − p)^{23} ]   [using the product rule: \frac{∂}{∂p}(uv) = u′v + uv′]

Here u = p^6 and v = (1 − p)^{23}.

\frac{∂}{∂p} L(p) = \frac{∂}{∂p} [ (1 − p)^{23} ] · p^6 + (1 − p)^{23} · \frac{∂}{∂p} [ p^6 ]

= 23(1 − p)^{22} · \frac{∂}{∂p}(1 − p) · p^6 + (1 − p)^{23} · 6p^5

= 23(1 − p)^{22} (0 − 1) p^6 + (1 − p)^{23} · 6p^5

= 6p^5 (1 − p)^{23} − 23 p^6 (1 − p)^{22}

= p^5 (1 − p)^{22} [ 6(1 − p) − 23p ]

= p^5 (1 − p)^{22} (6 − 29p)

Setting the derivative to zero: for the product to be equal to zero, at least one of the factors must be equal to zero. Thus we have three cases:

p^5 = 0 ⇒ p = 0,   (1 − p)^{22} = 0 ⇒ p = 1,   6 − 29p = 0 ⇒ p = 6/29

At p = 0 and p = 1 the likelihood itself is zero, so the maximum occurs at p = 6/29, which lies within the feasible range of probabilities 0 ⩽ p ⩽ 1.

∴ p = 6/29 ≈ 0.207

L(p) = (1 − 6/29)^{23} × (6/29)^6 ≈ 3.79 × 10^{−7}
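As a numerical cross-check of this result (a sketch added for illustration, not part of the original solution), the code below recomputes the estimate from the sample using the closed-form geometric MLE p̂ = n / Σ x_i and evaluates the likelihood at that point.

```python
import numpy as np

# Sample drawn from the geometric distribution f(x; p) = p * (1 - p)**(x - 1)
sample = np.array([4, 5, 6, 5, 6, 3])
n, total = len(sample), sample.sum()    # n = 6, total = 29

# Closed-form MLE for the geometric distribution: p_hat = n / sum(x_i)
p_hat = n / total
print(p_hat)                            # 6/29 ≈ 0.2069

# Likelihood at the estimate: L(p_hat) = p_hat**6 * (1 - p_hat)**23
L_hat = p_hat ** n * (1 - p_hat) ** (total - n)
print(L_hat)                            # ≈ 3.79e-07

# Grid search confirms the likelihood peaks at p = 6/29
p_grid = np.linspace(0.001, 0.999, 9999)
L_grid = p_grid ** n * (1 - p_grid) ** (total - n)
print(p_grid[np.argmax(L_grid)])        # ≈ 0.207
```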
