
Applied

Machine Learning
Lecture 2
One variable (simple)
linear regression
Ekarat Rattagan, Ph.D.

1
Outline
2.1 Linear regression
2.2 Learning algorithm
2.3 Iterative method: Gradient descent
2.4 Gradient descent for linear regression

2
2.1 Linear regression

3
Supervised Learning: Regression
• Given a training set {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}
• Learn a function f(x) to predict y given x
– y is real-valued == regression

Notation:
m = number of training examples
x’s = “input” or “independent” variables, or features
y’s = “output”, “target”, or “dependent” variable

[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) plotted by year, 1970–2020]
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)

4
Training set = {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}

Credit: Tom M. Mitchell 5


2.2 Learning algorithm

6
Learning algorithm

• Objective: to find the best target function h ∈ H

hθ(x) = θ0 + θ1x (Model representation, L1, P40)

• We need to specify a performance measure
• Cost function

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

7
What is a cost function?
• A function that measures the performance of a Machine
Learning model for given data.
• It quantifies the error between estimated and true values.
• It is used for parameter estimation.

8
Cost function (Sum of Squared Errors)
hθ(x) = θ0 + θ1x
[Figure: scatter plot of the example points (0,1), (2,1), and (3,4), with the line hθ(x) fitted through them]

Cost function: J(θ0, θ1) = Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))^2

9
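As a quick illustration, here is a minimal NumPy sketch of this cost, evaluated on the three example points from the plot above (the names `h` and `cost_sse` are illustrative, not from the slides):

```python
import numpy as np

x = np.array([0.0, 2.0, 3.0])   # example inputs from the scatter plot
y = np.array([1.0, 1.0, 4.0])   # corresponding targets

def h(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost_sse(theta0, theta1, x, y):
    """Sum of squared errors J(theta0, theta1) = sum_i (h(x^(i)) - y^(i))^2."""
    return np.sum((h(theta0, theta1, x) - y) ** 2)

print(cost_sse(0.0, 1.0, x, y))  # theta0=0, theta1=1: (0-1)^2 + (2-1)^2 + (3-4)^2 = 3.0
```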
Learning objective: minimize J(θ0, θ1) over θ0, θ1

That is, to find θ0 and θ1 so that the gap between hθ(x) and y is minimized.

10
Closed-form solution

• The Normal equation

θ = (XᵀX)⁻¹ Xᵀ y

Code: appendix
11
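The appendix holds the full code; a minimal sketch of the normal equation, assuming a small NumPy design matrix with a column of ones for the intercept θ0, looks like this:

```python
import numpy as np

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])

# Design matrix: a column of ones (for theta0) next to the feature values (for theta1).
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^(-1) X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # [theta0, theta1] minimizing the sum of squared errors
```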
Normal equation

• Limitation
• Non-invertible or Singular Matrix (det = 0)
• Some features are redundant

• Solution
• The Moore-Penrose Pseudo-inverse
• Singular Value Decomposition (SVD)

https://hadrienj.github.io/posts/Deep-Learning-Book-Series-2.9-The-Moore-Penrose-Pseudoinverse/
12
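As a sketch of the work-around above: `np.linalg.pinv` computes the Moore–Penrose pseudo-inverse via SVD, so it still returns a (minimum-norm) least-squares solution when XᵀX is singular, e.g., when one feature simply duplicates another (the toy matrix below is made up for illustration):

```python
import numpy as np

X = np.array([[1.0, 2.0, 2.0],
              [1.0, 0.0, 0.0],
              [1.0, 3.0, 3.0]])   # third column duplicates the second, so X^T X is singular
y = np.array([5.0, 1.0, 7.0])

# np.linalg.inv(X.T @ X) would fail here; the pseudo-inverse (computed via SVD) does not.
theta = np.linalg.pinv(X) @ y
print(theta)
```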
Normal equation

• Limitation
• Computational complexity O(n³)
• Slow when the number of features grows large (e.g., 100,000), because the matrix must be inverted

• Solution
• Another approach: iterative method

13
2.3 Iterative method:
(Batch) Gradient descent

14
Hypothesis: hθ(x) = θ0 + θ1x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))^2

Goal: minimize J(θ0, θ1) over θ0, θ1

15
[Figures: on the left, the data and hθ(x) plotted against x for a fixed θ1 (a function of x); on the right, the cost J plotted as a function of the parameter θ1. Shown for several values of θ1, e.g., θ1 = 0.5 and θ1 = 0.]
Credit: Andrew NG

19
[Figure: surface plot of the cost J(θ0, θ1) over the parameters θ0 and θ1]
Credit: Andrew NG
20
Gradient descent

reference: https://arxiv.org/pdf/1609.04747.pdf

http://sites.science.oregonstate.edu/math/home/programs/undergrad/CalculusQuestStudyGuides/vcalc/grad/grad.html 21
Gradient descent

Repeat until convergence: θj := θj − α ∂/∂θj J(θ0, θ1), for j = 0 and j = 1.

Simultaneous update: compute the new values of θ0 and θ1 from the current parameters before overwriting either of them.
22
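A minimal sketch of the simultaneous update; `dJ_dtheta0` and `dJ_dtheta1` are stand-in functions for the two partial derivatives of whatever cost J is being minimized, not code from the slides:

```python
def gradient_descent_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One gradient descent step with a simultaneous update of theta0 and theta1."""
    # Both partial derivatives are evaluated at the *old* (theta0, theta1) ...
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    # ... and only then are both parameters overwritten.
    return temp0, temp1
```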
Declare convergence

Declare convergence if J(θ0, θ1) decreases by less than some small threshold ε in one iteration.

23
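A minimal sketch of this stopping rule; `step`, `cost`, and the default threshold `eps` are illustrative placeholders rather than values from the slide:

```python
def run_until_converged(step, cost, theta, eps=1e-3, max_iters=100_000):
    """Repeat a gradient descent step until the cost decreases by less than eps in one iteration."""
    prev = cost(theta)
    for _ in range(max_iters):
        theta = step(theta)
        curr = cost(theta)
        if prev - curr < eps:   # declare convergence
            break
        prev = curr
    return theta
```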
If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

Figures from “Hands-on Machine Learning with Scikit-Learn, Keras, & TensorFlow, …” 24
2.4 Gradient descent
for linear regression

25
Gradient descent algorithm

Linear Regression Model

26
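Putting the model and the update rule together, here is a minimal sketch of batch gradient descent for the one-variable model. It scales the cost by 1/(2m), a common convention that changes the gradients only by a constant factor relative to the plain sum of squared errors on slide 9:

```python
import numpy as np

def gradient_descent_linreg(x, y, alpha=0.05, n_iters=5000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        error = theta0 + theta1 * x - y           # h_theta(x^(i)) - y^(i) for all i at once
        grad0 = np.sum(error) / m                 # dJ/dtheta0, with J scaled by 1/(2m)
        grad1 = np.sum(error * x) / m             # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1   # simultaneous update
    return theta0, theta1

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
print(gradient_descent_linreg(x, y))   # approaches the normal-equation solution for the same data
```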
Gradient descent algorithm

How to get these two equations?


27
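One way to answer the question on the slide above: differentiate the cost with respect to each parameter. The derivation below assumes the 1/(2m)-scaled cost; with the plain sum of squared errors from slide 9 the same expressions appear, just without the 1/m factor and multiplied by 2:

```latex
J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2,
\qquad h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}

\frac{\partial J}{\partial \theta_0}
  = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr),
\qquad
\frac{\partial J}{\partial \theta_1}
  = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}
```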
[Figures, slides 28–36: on the left, hθ(x) plotted against x for fixed θ0 and θ1 (a function of x); on the right, the contour plot of J(θ0, θ1) as a function of the parameters, with successive gradient descent updates moving toward the minimum. Credit: Andrew NG]
Gradient Descent

• Limitation
• Need to choose the learning rate α
• Can be very slow on large datasets, which may also not fit in memory

37
Gradient Descent VS Normal Equation

Gradient descent:
• Requires choosing a learning rate α and running many iterative update steps
• Works well even when the number of features is large

Normal equation:
• No learning rate and no iterations, but inverting XᵀX is O(n³), so it becomes slow when the number of features is large
• When XᵀX is singular (e.g., redundant features), use the pseudo-inverse

https://www.geeksforgeeks.org/difference-between-gradient-descent-and-normal-equation/ 38
