
Machine Learning

Gradient Descent
Gradient Descent is just like Agile Methodology

The agile loop:
- Build something quickly
- Get it out there
- Get some feedback
- Make changes depending upon the feedback
- Repeat

Gradient descent follows the same loop: take a step, get feedback from the cost function, and adjust accordingly.
Gradient Descent

Let's have some function J(θ)

Want to minimize J(θ) with respect to θ

Algorithm:
- initialize θ's randomly
- keep changing θ's to reduce J(θ),
  until we hopefully end up at a minimum
Gradient Descent
Let's have some function J(θ)

Want to minimize J(θ) with respect to θ

Algorithm:
- initialize θ's randomly
- repeat until convergence {
      θi := θi − α · ∂J(θ)/∂θi
  }
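A minimal sketch of this update rule in Python (the names grad_J and theta_init are illustrative, and the tolerance-based convergence test is one simple choice among many):

import numpy as np

def gradient_descent(grad_J, theta_init, alpha=0.1, tol=1e-8, max_iters=10000):
    """Repeat θi := θi − α · ∂J/∂θi for every i until the steps become tiny."""
    theta = np.array(theta_init, dtype=float)
    for _ in range(max_iters):
        step = alpha * grad_J(theta)     # α scales the gradient
        theta = theta - step             # update all θi simultaneously
        if np.max(np.abs(step)) < tol:   # steps became tiny: call it converged
            break
    return theta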
Gradient Descent

Let's have some function J(θ1)

Want to minimize J(θ1) with respect to θ1

Algorithm:
- initialize θ1 randomly
- keep changing θ1 to reduce J(θ1),
  until we hopefully end up at a minimum
Gradient Descent
Let's have some function J(θ1)

Want to minimize J(θ1) with respect to θ1

Algorithm:
- initialize θ1 randomly
- repeat until convergence {
      θ1 := θ1 − α · ∂J(θ1)/∂θ1
  }
Gradient Descent
J(θ1) = (θ1 − 3)² + 5

Update rule: θ1 := θ1 − α · ∂J(θ1)/∂θ1

Derivative: ∂J(θ1)/∂θ1 = 2(θ1 − 3), with α = 0.1

θ1 | J(θ1)
−6 |  86
−5 |  69
−4 |  54
−3 |  41
−2 |  30
−1 |  21
 0 |  14
 1 |   9
 2 |   6
 3 |   5
 4 |   6
 5 |   9
 6 |  14
 7 |  21
 8 |  30
 9 |  41
10 |  54
11 |  69
12 |  86
13 | 105

Two initializations to try: θ1 = 10 and θ1 = −5.
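As a check against the table above, a short Python trace of the first few updates from each starting point (a sketch; variable names are illustrative):

alpha = 0.1

def dJ(theta1):
    return 2 * (theta1 - 3)   # derivative from the slide

for start in (10.0, -5.0):    # the two initializations above
    theta1 = start
    print(f"start θ1 = {start}")
    for step in range(4):
        J = (theta1 - 3) ** 2 + 5
        print(f"  step {step}: θ1 = {theta1:.3f}, J = {J:.3f}")
        theta1 -= alpha * dJ(theta1)
# From 10: θ1 → 8.6 → 7.48 → 6.584 …  From −5: θ1 → −3.4 → −2.12 → −1.096 …
# Both sequences move toward the minimum at θ1 = 3, where J = 5.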
Gradient Descent
Q&A
Impact of learning rate in Gradient Descent

If the learning rate α is too small, gradient descent takes tiny steps and converges slowly; if α is too large, each step can overshoot the minimum, and the iterates may fail to converge or even diverge.
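This is easy to see numerically. For the example function above, each update scales the distance to the minimum by (1 − 2α), so the behavior below follows directly (a sketch reusing the earlier derivative):

def dJ(theta1):
    return 2 * (theta1 - 3)   # derivative of J(θ1) = (θ1 − 3)² + 5

for alpha in (0.01, 0.1, 0.9, 1.1):
    theta1 = 10.0
    for _ in range(30):
        theta1 -= alpha * dJ(theta1)
    print(f"α = {alpha}: θ1 after 30 steps = {theta1:.3f}")
# α = 0.01: ≈ 6.82  (too small: 30 steps barely moved toward 3)
# α = 0.1 : ≈ 3.01  (converges smoothly)
# α = 0.9 : ≈ 3.01  (converges, but oscillates around 3 on the way)
# α = 1.1 : ≈ 1665  (too large: each step overshoots; it diverges)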
Q&A
How to implement Gradient Descent
J(θ1) = (θ1 − 3)² + 5

Algorithm:
- initialize θ1 randomly
- repeat until convergence {
      θ1 := θ1 − α · ∂J(θ1)/∂θ1
  }

Derivative: ∂J(θ1)/∂θ1 = 2(θ1 − 3)

Plugging the derivative into the loop:

repeat until convergence {
    θ1 := θ1 − α · 2(θ1 − 3)
}

Initialization θ1 = 10        Initialization θ1 = −5

(Table of θ1 and J(θ1) values as on the previous slide.)
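Putting the pieces together, a minimal runnable version of this loop (the tolerance and iteration cap are assumptions; the slide only says "until convergence"):

def gradient_descent_1d(theta1, alpha=0.1, tol=1e-6, max_iters=1000):
    """Minimize J(θ1) = (θ1 − 3)² + 5 via θ1 := θ1 − α · 2(θ1 − 3)."""
    for _ in range(max_iters):
        step = alpha * 2 * (theta1 - 3)
        theta1 -= step
        if abs(step) < tol:          # steps became tiny: call it converged
            break
    return theta1

print(gradient_descent_1d(10.0))    # ≈ 3.0
print(gradient_descent_1d(-5.0))    # ≈ 3.0 from the other initialization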
How to implement Gradient Descent

Cost function: J(θ0, θ1)

Goal: minimize J(θ0, θ1) with respect to θ0 and θ1

Algorithm:
- initialize θ's randomly
- repeat until convergence {
      θi := θi − α · ∂J(θ0, θ1)/∂θi    (for i = 0 and i = 1)
  }
How to implement Gradient Descent
Cost function: J(θ0, θ1)

Goal: minimize J(θ0, θ1) with respect to θ0 and θ1

Algorithm:
- initialize θ's randomly
- repeat until convergence {
      θi := θi − α · ∂J(θ0, θ1)/∂θi
  }

Correct: simultaneous update
    temp0 := θ0 − α · ∂J(θ0, θ1)/∂θ0
    temp1 := θ1 − α · ∂J(θ0, θ1)/∂θ1
    θ0 := temp0
    θ1 := temp1

Incorrect: sequential update
    temp0 := θ0 − α · ∂J(θ0, θ1)/∂θ0
    θ0 := temp0
    temp1 := θ1 − α · ∂J(θ0, θ1)/∂θ1    (this partial now sees the updated θ0)
    θ1 := temp1
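A small Python sketch contrasting the two orderings. The cost function here is an assumption chosen so the partials couple θ0 and θ1 (the slide leaves J(θ0, θ1) abstract); with it, the sequential version visibly lands on a different point:

# Assumed example cost: J(θ0, θ1) = (θ0 + θ1 − 3)², so both partial
# derivatives equal 2(θ0 + θ1 − 3) and each depends on the other parameter.
def grad(t0, t1):
    g = 2 * (t0 + t1 - 3)
    return g, g

alpha = 0.1

def correct_step(t0, t1):
    # Simultaneous update: both partials evaluated at the old (t0, t1).
    g0, g1 = grad(t0, t1)
    return t0 - alpha * g0, t1 - alpha * g1

def incorrect_step(t0, t1):
    # Sequential update: θ1's partial sees the already-updated θ0.
    g0, _ = grad(t0, t1)
    t0 = t0 - alpha * g0
    _, g1 = grad(t0, t1)     # recomputed with the new t0
    t1 = t1 - alpha * g1
    return t0, t1

print(correct_step(0.0, 0.0))    # (0.6, 0.6)
print(incorrect_step(0.0, 0.0))  # (0.6, 0.48) — a different point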
Q&A
