Lecture: Linear Regression


Linear Regression

Machine learning algorithms

                 Supervised Learning     Unsupervised Learning
    Discrete     Classification          Clustering
    Continuous   Regression              Dimensionality reduction
Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression



Regression: predict a real-valued output.

Training set → Learning Algorithm → Hypothesis h
x (size of house) → h → y (estimated price)
House pricing prediction

[Scatter plot of the training data: price ($) in 1000's vs. size in feet^2]
Training set (m = 47):

    Size in feet^2 (x)    Price ($) in 1000's (y)
    2104                  460
    1416                  232
    1534                  315
    852                   178
    …                     …

• Notation:
  • m = number of training examples
  • x = input variable / features
  • y = output variable / target variable
  • (x, y) = one training example
  • (x^(i), y^(i)) = i-th training example
  • Examples: x^(1) = 2104, y^(1) = 460, x^(2) = 1416, …
Model representation

Training set → Learning Algorithm → Hypothesis h
x (size of house) → h → y (estimated price)
(h is shorthand for h_θ)

[Plot of the training data: price ($) in 1000's vs. size in feet^2]

Linear regression with one variable: univariate linear regression.


Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression


Training set (m = 47):

    Size in feet^2 (x)    Price ($) in 1000's (y)
    2104                  460
    1416                  232
    1534                  315
    852                   178
    …                     …

• Hypothesis: h_θ(x) = θ_0 + θ_1·x
• θ_0, θ_1: parameters/weights
• How do we choose the θ's?

[Three plots of h_θ(x) over y-vs-x axes, one for each of three different choices of θ_0 and θ_1]
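To make the hypothesis concrete, here is a minimal Python/NumPy sketch of h_θ(x) = θ_0 + θ_1·x evaluated on the sizes from the table for a few hand-picked parameter settings; the specific θ values are illustrative only, not from the lecture.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x, dtype=float)

# House sizes (feet^2) from the training-set table.
sizes = np.array([2104, 1416, 1534, 852], dtype=float)

# Three illustrative parameter choices (made up), as in the "how do we choose the theta's?" plots.
for theta0, theta1 in [(0.0, 0.2), (100.0, 0.1), (50.0, 0.15)]:
    print(theta0, theta1, hypothesis(theta0, theta1, sizes))
```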
Cost function
• Idea: choose θ_0, θ_1 so that h_θ(x^(i)) is close to y^(i) for each training example (x^(i), y^(i))
• Hypothesis: h_θ(x) = θ_0 + θ_1·x
• Cost function (squared error):

    J(θ_0, θ_1) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

[Plot of the training data: price ($) in 1000's vs. size in feet^2, with a candidate line h_θ(x)]
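A minimal sketch of computing J(θ_0, θ_1) on the four listed training examples; the function and variable names are my own, not from the lecture.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(x)
    residuals = theta0 + theta1 * x - y
    return np.sum(residuals ** 2) / (2 * m)

x = np.array([2104, 1416, 1534, 852], dtype=float)   # size in feet^2
y = np.array([460, 232, 315, 178], dtype=float)      # price in $1000's

print(cost(0.0, 0.2, x, y))    # cost of one candidate line
print(cost(50.0, 0.15, x, y))  # another candidate; the smaller J, the better the fit
```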
Simplified
Full model:
• Hypothesis: h_θ(x) = θ_0 + θ_1·x
• Parameters: θ_0, θ_1
• Cost function: J(θ_0, θ_1) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
• Goal: minimize J(θ_0, θ_1) over θ_0, θ_1

Simplified (fix θ_0 = 0):
• Hypothesis: h_θ(x) = θ_1·x
• Parameters: θ_1
• Cost function: J(θ_1) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
• Goal: minimize J(θ_1) over θ_1

[Sequence of paired plots: left, h_θ(x) as a function of x over the training data; right, the corresponding value of J(θ_1) as a function of θ_1, traced out for several values of θ_1]
• Hypothesis: h_θ(x) = θ_0 + θ_1·x

• Parameters: θ_0, θ_1

• Cost function: J(θ_0, θ_1) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

• Goal: minimize J(θ_0, θ_1) over θ_0, θ_1

Cost function
How do we find good θ_0, θ_1 that minimize J(θ_0, θ_1)?
Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression


Gradient descent
Have some function J(θ_0, θ_1)
Want min over θ_0, θ_1 of J(θ_0, θ_1)

Outline:
• Start with some θ_0, θ_1
• Keep changing θ_0, θ_1 to reduce J(θ_0, θ_1)
  until we hopefully end up at a minimum
Gradient descent
Repeat until convergence {
    θ_j := θ_j − α · ∂/∂θ_j J(θ_0, θ_1)    (for j = 0 and j = 1)
}

α: learning rate (step size)
∂/∂θ_j J(θ_0, θ_1): partial derivative (rate of change)
Gradient descent
Correct (simultaneous update):
    temp0 := θ_0 − α · ∂/∂θ_0 J(θ_0, θ_1)
    temp1 := θ_1 − α · ∂/∂θ_1 J(θ_0, θ_1)
    θ_0 := temp0
    θ_1 := temp1
Incorrect: updating θ_0 first and then using the already-updated θ_0 when computing θ_1's update.

Intuition with a single parameter: θ_1 := θ_1 − α · ∂/∂θ_1 J(θ_1)
• If ∂/∂θ_1 J(θ_1) < 0 (negative slope), the update increases θ_1, moving it toward the minimum
• If ∂/∂θ_1 J(θ_1) > 0 (positive slope), the update decreases θ_1, moving it toward the minimum

[Plot of J(θ_1) vs. θ_1 illustrating both cases]
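A tiny numerical illustration of the sign argument above, using the simplified model h_θ(x) = θ_1·x so that J depends on a single parameter; the data, learning rate, and iteration count are made up for illustration.

```python
import numpy as np

# Simplified model h_theta(x) = theta1 * x; with x == y the minimizer is theta1 = 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def dJ(theta1):
    """Derivative of J(theta1) = 1/(2m) * sum_i (theta1 * x_i - y_i)^2."""
    m = len(x)
    return np.sum((theta1 * x - y) * x) / m

alpha = 0.1
for start in (3.0, -1.0):              # start on either side of the minimum
    theta1 = start
    for _ in range(50):
        theta1 -= alpha * dJ(theta1)   # negative slope -> theta1 grows; positive slope -> it shrinks
    print(start, "->", round(theta1, 4))
```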
Learning rate
• If α is too small, gradient descent takes tiny steps and converges slowly
• If α is too large, gradient descent can overshoot the minimum and may fail to converge
Gradient descent for linear regression
Repeat until convergence {
    θ_j := θ_j − α · ∂/∂θ_j J(θ_0, θ_1)    (for j = 0 and j = 1)
}

• Linear regression model:
    h_θ(x) = θ_0 + θ_1·x
    J(θ_0, θ_1) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Computing the partial derivatives:
• ∂/∂θ_j J(θ_0, θ_1) = ∂/∂θ_j (1/(2m)) · Σ_{i=1}^{m} (θ_0 + θ_1·x^(i) − y^(i))²
• j = 0: ∂/∂θ_0 J(θ_0, θ_1) = (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
• j = 1: ∂/∂θ_1 J(θ_0, θ_1) = (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
Gradient descent for linear regression
Repeat until convergence {
    θ_0 := θ_0 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
    θ_1 := θ_1 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
Update θ_0 and θ_1 simultaneously.
Batch gradient descent
• "Batch": each step of gradient descent uses all m training examples.
Repeat until convergence {
    θ_0 := θ_0 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
    θ_1 := θ_1 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
m: number of training examples
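Putting the two update rules together, a minimal batch gradient descent sketch for the univariate model; the size feature is divided by 1000 so this simple learning rate converges (feature scaling is discussed later), and all names and constants are illustrative rather than from the lecture.

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.1, num_iters=5000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x.
    Every iteration uses all m training examples."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        error = theta0 + theta1 * x - y               # h_theta(x^(i)) - y^(i), all i at once
        grad0 = np.sum(error) / m
        grad1 = np.sum(error * x) / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1   # simultaneous update
    return theta0, theta1

# Training data from the slides; sizes divided by 1000 so this learning rate converges.
x = np.array([2.104, 1.416, 1.534, 0.852])   # size in 1000's of feet^2
y = np.array([460.0, 232.0, 315.0, 178.0])   # price in $1000's

theta0, theta1 = batch_gradient_descent(x, y)
print(theta0, theta1)
print(theta0 + theta1 * 1.5)   # predicted price (in $1000's) for a 1500 feet^2 house
```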
Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression


Training dataset

    Size in feet^2 (x)    Price ($) in 1000's (y)
    2104                  460
    1416                  232
    1534                  315
    852                   178
    …                     …

h_θ(x) = θ_0 + θ_1·x
Multiple features (input variables)

    Size in feet^2 (x_1)   Number of bedrooms (x_2)   Number of floors (x_3)   Age of home in years (x_4)   Price ($) in 1000's (y)
    2104                   5                          1                        45                           460
    1416                   3                          2                        40                           232
    1534                   3                          2                        30                           315
    852                    2                          1                        36                           178
    …                      …                          …                        …                            …

Notation:
• n = number of features
• x^(i) = input features of the i-th training example
• x_j^(i) = value of feature j in the i-th training example
Hypothesis
Previously: h_θ(x) = θ_0 + θ_1·x

Now: h_θ(x) = θ_0 + θ_1·x_1 + θ_2·x_2 + … + θ_n·x_n
• For convenience of notation, define x_0 = 1
  (x_0^(i) = 1 for all examples), so that
  h_θ(x) = θ^T·x  with  x = [x_0, x_1, …, x_n]^T  and  θ = [θ_0, θ_1, …, θ_n]^T
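A sketch of the multivariate hypothesis in vectorized form: after prepending x_0 = 1 to each example, h_θ(x) is simply the inner product θ^T·x. The rows come from the table above; the θ vector is made up for illustration.

```python
import numpy as np

# One row per training example: size (feet^2), bedrooms, floors, age (years).
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
], dtype=float)

# Prepend the constant feature x_0 = 1 to every example.
X = np.hstack([np.ones((X.shape[0], 1)), X])

theta = np.array([80.0, 0.1, 10.0, 3.0, -2.0])   # illustrative parameter values only

predictions = X @ theta    # h_theta(x^(i)) = theta^T x^(i) for every example at once
print(predictions)
```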


Gradient descent
• Previously (n = 1):
  Repeat until convergence {
      θ_0 := θ_0 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
      θ_1 := θ_1 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
  }
• New algorithm (n ≥ 1):
  Repeat until convergence {
      θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)    (for j = 0, 1, …, n)
  }
Simultaneously update θ_j for every j.
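A vectorized sketch of the new update rule, in which every θ_j is updated simultaneously; it assumes X already contains the x_0 = 1 column and (for the tiny example) an already-scaled feature, and the function name is my own.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.
    X must already contain the x_0 = 1 column."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y           # h_theta(x^(i)) - y^(i) for all i
        gradient = X.T @ error / m      # (1/m) * sum_i error_i * x_j^(i), for every j
        theta -= alpha * gradient       # all theta_j updated simultaneously
    return theta

# Tiny example: x_0 = 1 column plus one already-scaled size feature.
X = np.array([[1.0, 1.41], [1.0, -0.14], [1.0, 0.13], [1.0, -1.40]])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(gradient_descent(X, y))
```

Because the whole gradient vector is computed from the current θ before any component is changed, the simultaneous update comes for free in the vectorized form.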
Gradient descent in practice: Feature scaling
• Idea: make sure features are on a similar scale (e.g., roughly −1 ≤ x_j ≤ 1)
• E.g. x_1 = size (0–2000 feet^2) and x_2 = number of bedrooms (1–5):
  rescale to x_1 = size / 2000 and x_2 = (number of bedrooms) / 5

[Two contour plots of J over (θ_1, θ_2): elongated contours without feature scaling, rounder contours with it, so gradient descent takes a more direct path to the minimum]
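One common way to put features on a similar scale is mean normalization: replace x_j by (x_j − μ_j)/s_j, where μ_j is the mean of feature j and s_j its standard deviation (or its range). A minimal sketch, with my own names:

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature (column): subtract its mean, divide by its standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# size (feet^2) and number of bedrooms; the x_0 = 1 column is added only after scaling.
X = np.array([[2104, 5], [1416, 3], [1534, 3], [852, 2]], dtype=float)
X_scaled, mu, sigma = mean_normalize(X)
print(X_scaled)   # both columns now vary over a similar, small range
```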
Gradient descent in practice: Learning rate
• Automatic convergence test: declare convergence when J(θ) decreases by less than a small threshold in one iteration
• α too small: slow convergence
• α too large: J(θ) may not decrease on every iteration; gradient descent may not converge

• To choose α, try a range of values such as
  0.001, …, 0.01, …, 0.1, …, 1
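A sketch of the suggested procedure: run a fixed number of gradient descent iterations for each candidate α and watch whether J(θ) keeps decreasing. The data and the extra value 2.5 (included only to show divergence) are illustrative, not from the lecture.

```python
import numpy as np

def run_gd(X, y, alpha, num_iters=100):
    """Run batch gradient descent and record J(theta) after every iteration."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
        history.append(np.sum((X @ theta - y) ** 2) / (2 * m))
    return history

# x_0 = 1 column plus one mean-normalized feature.
X = np.array([[1.0, 1.41], [1.0, -0.14], [1.0, 0.13], [1.0, -1.40]])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Try a range of learning rates; 2.5 is deliberately too large, to show divergence.
for alpha in (0.001, 0.01, 0.1, 1.0, 2.5):
    J = run_gd(X, y, alpha)
    print(f"alpha={alpha}: J after 1 iteration = {J[0]:.1f}, after 100 iterations = {J[-1]:.3g}")
```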


House prices prediction
• The choice of features matters: new features can be defined from existing ones (e.g., a single area feature)

Polynomial regression
• Fit a polynomial in the size by defining new features:
    x_1 = (size)
    x_2 = (size)^2
    x_3 = (size)^3
  so that h_θ(x) = θ_0 + θ_1·(size) + θ_2·(size)^2 + θ_3·(size)^3

[Plot of the training data: price ($) in 1000's vs. size in feet^2, with a curved (polynomial) fit]
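Polynomial regression reuses the same machinery by treating powers of the size as extra features; below is a minimal sketch of building x_1 = size, x_2 = size², x_3 = size³. Feature scaling matters even more here, because the powers have very different ranges.

```python
import numpy as np

sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in feet^2

# New features: x1 = size, x2 = size^2, x3 = size^3.
X_poly = np.column_stack([sizes, sizes ** 2, sizes ** 3])

# Feature scaling is essential here: the three columns span wildly different ranges.
X_poly = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)

# Add the x_0 = 1 column; the usual linear-regression hypothesis and gradient descent
# now fit a cubic curve in the original size variable.
X_poly = np.hstack([np.ones((len(sizes), 1)), X_poly])
print(X_poly.round(2))
```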
