Shrutanik Chatterjee - 34230822046 - Machine Learning Applications
4 If $f(x; p) = p \cdot (1 - p)^{x-1}$ and the following sample is drawn from this distribution: [4, 5, 6, 5, 6, 3], estimate the maximum likelihood value of $p$. [8]
1 Show that the derivative of $\sigma(z)$ is $\sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))$.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Let's denote $\sigma(z)$ as $f(z)$. The derivative of $f(z)$ with respect to $z$ is:
$$\frac{\partial}{\partial z} f(z) = \frac{\partial}{\partial z}\left(\frac{1}{1 + e^{-z}}\right)$$
To find this derivative, we will use the quotient rule with $u = 1$ and $v = 1 + e^{-z}$:
$$\frac{\partial}{\partial z}\left(\frac{u}{v}\right) = \frac{u'v - uv'}{v^2}$$
[Here $u'$ and $v'$ are the derivatives of $u$ and $v$ respectively.]
$$\frac{\partial}{\partial z} f(z) = \frac{0 \cdot (1 + e^{-z}) - 1 \cdot (-e^{-z})}{(1 + e^{-z})^2} = \frac{e^{-z}}{(1 + e^{-z})^2}$$
$$\therefore f(z) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{e^{z} + 1} = \frac{1}{e^{-z} + 1}$$
and $f(z) = \sigma(z)$.
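$$\therefore \frac{\partial}{\partial z}\sigma(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \left[\frac{1}{1 + e^{-z}}\right] \cdot \left[\frac{e^{-z}}{1 + e^{-z}}\right]$$
$$= \left[\frac{1}{1 + e^{-z}}\right] \cdot \left[1 - \frac{1}{1 + e^{-z}}\right]$$
$$= \sigma(z) \cdot [1 - \sigma(z)]$$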
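The result above can be checked numerically. The following is a minimal Python sketch, not part of the original solution: it compares the closed form $\sigma(z)(1 - \sigma(z))$ with a central finite difference. The function names and the step size `h` are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    """Closed-form derivative sigma(z) * (1 - sigma(z)) derived above."""
    s = sigmoid(z)
    return s * (1.0 - s)


def central_difference(f, z: float, h: float = 1e-5) -> float:
    """Numerical derivative (f(z + h) - f(z - h)) / (2h); h is an assumed step size."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


# Compare the analytic and numerical derivatives at a few sample points.
for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    analytic = sigmoid_grad(z)
    numeric = central_difference(sigmoid, z)
    print(f"z={z:+.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```

At $z = 0$ the closed form gives $0.5 \cdot 0.5 = 0.25$, which is the largest value the derivative takes.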
To derive the cross-entropy loss from MLE in Logistic Regression, we start with the
likelihood function. In logistic regression we model the probability of a binary outcome as:
$$P(y = 1 \mid x) = \sigma(\omega^\top x) = \frac{1}{1 + e^{-\omega^\top x}}$$
$$P(y = 0 \mid x) = 1 - P(y = 1 \mid x)$$
Here $\sigma(z)$ is the logistic function and $\sigma(z) = \frac{1}{1 + e^{-z}}$.
The likelihood of the $N$ independent training examples can then be written as:
$$L(\omega) = \prod_{i=1}^{N} P(y_i \mid x_i)$$
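Since each label is binary, $P(y_i \mid x_i) = \sigma(\omega^\top x_i)^{y_i} \, (1 - \sigma(\omega^\top x_i))^{1 - y_i}$. Taking the negative logarithm of the likelihood and averaging over the $N$ examples gives:
$$\therefore J(\omega) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\left(\sigma(\omega^\top x_i)\right) + (1 - y_i) \log\left(1 - \sigma(\omega^\top x_i)\right) \right]$$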
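Now
$$J(\omega) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\left(\frac{1}{1 + e^{-\omega^\top x_i}}\right) + (1 - y_i) \log\left(1 - \frac{1}{1 + e^{-\omega^\top x_i}}\right) \right]$$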
Now
$$y_i \log\left(\frac{1}{1 + e^{-\omega^\top x_i}}\right) = y_i \log(1) - y_i \log\left(1 + e^{-\omega^\top x_i}\right) = -y_i \log\left(1 + e^{-\omega^\top x_i}\right)$$
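and
$$(1 - y_i) \log\left(1 - \frac{1}{1 + e^{-\omega^\top x_i}}\right) = (1 - y_i) \log\left(\frac{e^{-\omega^\top x_i}}{1 + e^{-\omega^\top x_i}}\right) = (1 - y_i) \log\left(e^{-\omega^\top x_i}\right) - (1 - y_i) \log\left(1 + e^{-\omega^\top x_i}\right)$$
$$= -(1 - y_i)\, \omega^\top x_i - (1 - y_i) \log\left(1 + e^{-\omega^\top x_i}\right)$$
Adding the two terms, the bracket for example $i$ becomes:
$$= -(1 - y_i)\, \omega^\top x_i - \log\left(1 + e^{-\omega^\top x_i}\right)$$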
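As a sanity check on the simplification above, the sketch below evaluates the cross-entropy loss in two ways: directly from $\sigma(\omega^\top x_i)$, and from the simplified per-example term $-(1 - y_i)\,\omega^\top x_i - \log(1 + e^{-\omega^\top x_i})$. The weights `w`, inputs `X`, and labels `y` are made-up toy values, and the function names are illustrative; none of them come from the original assignment.

```python
import math


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    """Inner product w^T x for two equal-length lists."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def cross_entropy(w, X, y):
    """J(w) = -(1/N) * sum[y_i log s_i + (1 - y_i) log(1 - s_i)], s_i = sigma(w^T x_i)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        s = sigmoid(dot(w, x_i))
        total += y_i * math.log(s) + (1 - y_i) * math.log(1.0 - s)
    return -total / len(y)


def cross_entropy_simplified(w, X, y):
    """Same loss, using the per-example term -(1 - y_i) * z_i - log(1 + e^(-z_i))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = dot(w, x_i)
        total += -(1 - y_i) * z - math.log1p(math.exp(-z))
    return -total / len(y)


# Hypothetical toy data: first feature is a constant bias input.
w = [0.5, -0.25]
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [1, 0, 1]

loss_direct = cross_entropy(w, X, y)
loss_simplified = cross_entropy_simplified(w, X, y)
print(loss_direct, loss_simplified)  # the two values should agree up to rounding
```

The direct form takes the logarithm of a probability and can lose precision when $\sigma(\omega^\top x_i)$ is very close to 0 or 1; the simplified form avoids that, although `math.exp(-z)` can still overflow for large negative `z` in this plain version.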
(2) Hinge Loss:
3.1 Python Code Implementation:
$$L(p) = \prod_{i=1}^{6} p (1 - p)^{x_i - 1}$$
$$= p (1 - p)^3 \times p (1 - p)^4 \times p (1 - p)^5 \times p (1 - p)^4 \times p (1 - p)^5 \times p (1 - p)^2$$
$$= p^6 \times (1 - p)^{23}$$
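$$
\begin{aligned}
\frac{\partial}{\partial p} L(p) &= \frac{\partial}{\partial p}\left[p^6 \times (1 - p)^{23}\right] \qquad \left[\frac{\partial}{\partial x}(uv) = u'v + v'u\right] \\
&= \frac{\partial}{\partial p}\left[(1 - p)^{23} \times p^6\right] \\
&= \frac{\partial}{\partial p}\left[(1 - p)^{23}\right] \cdot p^6 + (1 - p)^{23} \cdot \frac{\partial}{\partial p}\left[p^6\right] \\
&= \left\{23 (1 - p)^{22} \cdot \frac{\partial}{\partial p}(1 - p) \cdot p^6\right\} + \left\{(1 - p)^{23} \cdot 6 p^5\right\} \\
&= 23 (1 - p)^{22} \left(\frac{d}{dp}[1] - \frac{\partial}{\partial p}(p)\right) p^6 + (1 - p)^{23} \cdot 6 p^5
\end{aligned}
$$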
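Simplifying and setting the derivative equal to zero:
$$-23 p^6 (1 - p)^{22} + 6 p^5 (1 - p)^{23} = p^5 (1 - p)^{22} (6 - 29p) = 0$$
For the product to be equal to zero, at least one of the factors must be equal to zero. Thus we
have two cases: either $p^5 (1 - p)^{22} = 0$, which gives $p = 0$ or $p = 1$ (where $L(p) = 0$, so these are not maxima), or $6 - 29p = 0$.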
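$$\therefore p = \frac{6}{29} \approx 0.207$$
The likelihood at this value is
$$L\left(\tfrac{6}{29}\right) = \left(\tfrac{6}{29}\right)^6 \left(\tfrac{23}{29}\right)^{23} = 3.79 \times 10^{-7}$$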
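The estimate can be cross-checked numerically. The sketch below is an illustration, not part of the original solution: it evaluates $L(p) = \prod_i p(1-p)^{x_i - 1}$ for the sample, compares the stationary point $6/29$ (the sample size divided by the sum of the observations) with a coarse grid search, and prints the likelihood at the estimate. The grid resolution and the function name are assumptions made for this example.

```python
sample = [4, 5, 6, 5, 6, 3]


def likelihood(p, xs):
    """L(p) = product over the sample of p * (1 - p)^(x - 1)."""
    result = 1.0
    for x in xs:
        result *= p * (1.0 - p) ** (x - 1)
    return result


# Stationary point from the derivation: 6 - 29p = 0, i.e. n / sum(x_i).
n = len(sample)
p_hat = n / sum(sample)

# Coarse grid search over (0, 1) as an independent check (assumed step 1e-4).
grid = [k / 10000 for k in range(1, 10000)]
p_grid = max(grid, key=lambda p: likelihood(p, sample))

print("closed form p:", p_hat)
print("grid search p:", p_grid)
print("L(p_hat):", likelihood(p_hat, sample))
```

Maximizing $\log L(p) = 6 \log p + 23 \log(1 - p)$ leads to the same stationary point and is usually easier to differentiate by hand.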