05 Multivariate Regression IV
[Figure: Housing prices in Portland, OR — price (in 1000s of dollars) plotted against size (feet²)]
Supervised Learning: the "right answer" is given for each example in the data.
Regression Problem: predict a real-valued output.
Dr. Hashim Yasin Applied Machine Learning (CS4104)
Regression Example
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_j$'s: parameters
How do we choose the $\theta_j$'s?
Linear Regression with one Variable

Simplified:
Hypothesis: $h_\theta(x) = \theta_1 x$
Parameter: $\theta_1$
Cost function: $J(\theta_1) = \dfrac{1}{2m}\displaystyle\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Goal: $\min_{\theta_1} J(\theta_1)$
Setting the derivative of the cost to zero gives a closed form for $\theta_1$:

\[
\frac{\partial J(\theta_1)}{\partial \theta_1}
= \frac{\partial}{\partial \theta_1}\,\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
= \frac{\partial}{\partial \theta_1}\,\frac{1}{2m}\sum_{i=1}^{m}\left(\theta_1 x^{(i)} - y^{(i)}\right)^2
\]
\[
= 2 \cdot \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_1 x^{(i)} - y^{(i)}\right)\frac{\partial}{\partial \theta_1}\left(\theta_1 x^{(i)} - y^{(i)}\right)
= \frac{1}{m}\sum_{i=1}^{m}\left(\theta_1 x^{(i)} - y^{(i)}\right) x^{(i)} = 0
\]

Hence

\[
\theta_1 \sum_{i=1}^{m} \left(x^{(i)}\right)^2 = \sum_{i=1}^{m} x^{(i)} y^{(i)}
\qquad\Longrightarrow\qquad
\theta_1 = \frac{\sum_{i=1}^{m} x^{(i)} y^{(i)}}{\sum_{i=1}^{m} \left(x^{(i)}\right)^2}
= \frac{\mathrm{covar}(X, Y)}{\mathrm{var}(X)}
\quad \text{if } \mathrm{mean}(X) = \mathrm{mean}(Y) = 0
\]
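As a quick numerical sanity check, this closed form is easy to evaluate directly. The data below is illustrative (not from the lecture):

```python
import numpy as np

# Illustrative data (not from the lecture): y is roughly 3 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

# Closed form for the single-parameter model h(x) = theta1 * x:
# theta1 = sum(x * y) / sum(x ** 2)
theta1 = np.sum(x * y) / np.sum(x ** 2)
print(theta1)  # approximately 2.99
```

Because this simplified model has no intercept term, the fitted line is forced through the origin.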
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$ (with the convention $x_0 = 1$)
Cost function: $J(\theta_0, \theta_1) = \dfrac{1}{2m}\displaystyle\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
} (simultaneously update $j = 0$ and $j = 1$)
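The cost function above translates directly into code. This is a minimal sketch; the variable names and data are illustrative, not from the lecture:

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error cost J(theta) = (1/2m) * sum((X @ theta - y)^2).

    X is the m x (n+1) design matrix whose leading column of ones
    plays the role of x0 = 1.
    """
    m = len(y)
    residuals = X @ theta - y
    return residuals @ residuals / (2 * m)

# Sanity check: if theta reproduces y exactly, the cost is zero
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])            # exactly y = 1 + x
print(cost(np.array([1.0, 1.0]), X, y))  # 0.0
```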
Two parameters $(\theta_0, \theta_1)$:

\[
\frac{\partial J(\theta_0)}{\partial \theta_0}
= \frac{\partial}{\partial \theta_0}\,\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
= \frac{\partial}{\partial \theta_0}\,\frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)^2
\]
\[
= 2 \cdot \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)\frac{\partial}{\partial \theta_0}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)
= \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right) = 0
\]

With the sample means $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$ and $\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y^{(i)}$, this becomes

\[
\theta_0 + \theta_1 \cdot \frac{1}{m}\sum_{i=1}^{m} x^{(i)} = \frac{1}{m}\sum_{i=1}^{m} y^{(i)}
\qquad\Longrightarrow\qquad
\theta_0 = \bar{y} - \theta_1 \bar{x}
\]
Two parameters $(\theta_0, \theta_1)$:

\[
\frac{\partial J(\theta_1)}{\partial \theta_1}
= \frac{\partial}{\partial \theta_1}\,\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
= \frac{\partial}{\partial \theta_1}\,\frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)^2
\]
\[
= 2 \cdot \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)\frac{\partial}{\partial \theta_1}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)
= \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right) x^{(i)} = 0
\]
Two parameters $(\theta_0, \theta_1)$:

Substituting $\theta_0 = \bar{y} - \theta_1 \bar{x}$ into

\[
\frac{\partial J(\theta_1)}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right) x^{(i)} = 0
\]
\[
\theta_0 \cdot \frac{1}{m}\sum_{i=1}^{m} x^{(i)} + \theta_1 \cdot \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}\right)^2 = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} y^{(i)}
\]
\[
\theta_1 \sum_{i=1}^{m}\left(x^{(i)}\right)^2 + \bar{y}\sum_{i=1}^{m} x^{(i)} - \theta_1 \bar{x}\sum_{i=1}^{m} x^{(i)} = \sum_{i=1}^{m} x^{(i)} y^{(i)}
\]
\[
\theta_1 \sum_{i=1}^{m} x^{(i)}\left(x^{(i)} - \bar{x}\right) = \sum_{i=1}^{m} x^{(i)}\left(y^{(i)} - \bar{y}\right)
\]
Multivariate Regression
Two parameters $(\theta_0, \theta_1)$:

Starting from

\[
\theta_1 \sum_{i=1}^{m} x^{(i)}\left(x^{(i)} - \bar{x}\right) = \sum_{i=1}^{m} x^{(i)}\left(y^{(i)} - \bar{y}\right),
\qquad
\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \quad \bar{y} = \frac{1}{m}\sum_{i=1}^{m} y^{(i)}
\]

we solve for $\theta_1$:

\[
\theta_1
= \frac{\sum_{i=1}^{m} x^{(i)}\left(y^{(i)} - \bar{y}\right)}{\sum_{i=1}^{m} x^{(i)}\left(x^{(i)} - \bar{x}\right)}
= \frac{\sum_{i=1}^{m} x^{(i)} y^{(i)} - \bar{y}\sum_{i=1}^{m} x^{(i)}}{\sum_{i=1}^{m} \left(x^{(i)}\right)^2 - \bar{x}\sum_{i=1}^{m} x^{(i)}}
= \frac{\sum_{i=1}^{m} x^{(i)} y^{(i)} - m\,\bar{x}\bar{y}}{\sum_{i=1}^{m} \left(x^{(i)}\right)^2 - m\,\bar{x}^2}
\]
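These two closed-form expressions are easy to check numerically. The data below is illustrative (not from the lecture):

```python
import numpy as np

# Illustrative data (not from the lecture): y is roughly 2 * x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])
m = len(x)

x_bar, y_bar = x.mean(), y.mean()

# theta1 = (sum(x*y) - m*x_bar*y_bar) / (sum(x^2) - m*x_bar^2)
theta1 = (np.sum(x * y) - m * x_bar * y_bar) / (np.sum(x ** 2) - m * x_bar ** 2)
# theta0 = y_bar - theta1 * x_bar
theta0 = y_bar - theta1 * x_bar
print(theta0, theta1)  # approximately 0.21 and 1.93
```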
Two parameters $(\theta_0, \theta_1)$:

\[
\theta_0 = \bar{y} - \theta_1 \bar{x},
\qquad
\theta_1 = \frac{\sum_{i=1}^{m} x^{(i)} y^{(i)} - m\,\bar{x}\bar{y}}{\sum_{i=1}^{m} \left(x^{(i)}\right)^2 - m\,\bar{x}^2},
\qquad
\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \quad \bar{y} = \frac{1}{m}\sum_{i=1}^{m} y^{(i)}
\]

These equations can be summarized by the following matrix equation (also known as the normal equation):

\[
\begin{pmatrix}
m & \sum_{i=1}^{m} x^{(i)} \\
\sum_{i=1}^{m} x^{(i)} & \sum_{i=1}^{m} \left(x^{(i)}\right)^2
\end{pmatrix}
\begin{pmatrix} \theta_0 \\ \theta_1 \end{pmatrix}
=
\begin{pmatrix}
\sum_{i=1}^{m} y^{(i)} \\
\sum_{i=1}^{m} x^{(i)} y^{(i)}
\end{pmatrix}
\]
Two parameters $(\theta_0, \theta_1)$:

\[
\begin{pmatrix}
m & \sum_{i=1}^{m} x^{(i)} \\
\sum_{i=1}^{m} x^{(i)} & \sum_{i=1}^{m} \left(x^{(i)}\right)^2
\end{pmatrix}
\begin{pmatrix} \theta_0 \\ \theta_1 \end{pmatrix}
=
\begin{pmatrix}
\sum_{i=1}^{m} y^{(i)} \\
\sum_{i=1}^{m} x^{(i)} y^{(i)}
\end{pmatrix}
\]

This equation can be written in a more compact form. Suppose $\mathbf{X} = (\mathbf{1}\;\; \mathbf{x})$, where $\mathbf{1} = (1, 1, 1, \ldots)^T$ and $\mathbf{x} = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})^T$. Then

\[
\mathbf{X}^T \mathbf{X}
= \begin{pmatrix} \mathbf{1}^T \mathbf{1} & \mathbf{1}^T \mathbf{x} \\ \mathbf{x}^T \mathbf{1} & \mathbf{x}^T \mathbf{x} \end{pmatrix}
= \begin{pmatrix} m & \sum_{i=1}^{m} x^{(i)} \\ \sum_{i=1}^{m} x^{(i)} & \sum_{i=1}^{m} \left(x^{(i)}\right)^2 \end{pmatrix}
\]
\[
\mathbf{X}^T \mathbf{y} = (\mathbf{1}\;\; \mathbf{x})^T \mathbf{y}
= \begin{pmatrix} \mathbf{1}^T \mathbf{y} \\ \mathbf{x}^T \mathbf{y} \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{m} y^{(i)} \\ \sum_{i=1}^{m} x^{(i)} y^{(i)} \end{pmatrix}
\]
\[
\mathbf{X}^T \mathbf{X}\,\boldsymbol{\theta} = \mathbf{X}^T \mathbf{y},
\qquad \text{where } \boldsymbol{\theta} = (\theta_0, \theta_1)^T
\]
\[
\boldsymbol{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
\]

In general, with $n$ features, $\mathbf{X}$ is the $m \times (n+1)$ design matrix (the leading column of ones corresponds to $x_0 = 1$):

\[
\mathbf{X} =
\begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1n} \\
1 & x_{21} & x_{22} & \cdots & x_{2n} \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
1 & x_{m1} & x_{m2} & \cdots & x_{mn}
\end{pmatrix},
\qquad
\boldsymbol{\theta} = (\theta_0, \theta_1, \theta_2, \cdots, \theta_n)^T
\]
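A minimal sketch of the normal equation in NumPy; the data and names are illustrative. Solving the linear system $\mathbf{X}^T\mathbf{X}\,\boldsymbol{\theta} = \mathbf{X}^T\mathbf{y}$ with `np.linalg.solve` is preferred over forming the inverse explicitly:

```python
import numpy as np

# Illustrative design matrix: 4 examples, one feature, leading column of ones
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])   # exactly y = 1 + 2x

# Normal equation: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [1. 2.]
```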
Examples:

    x0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
    1    2104           5                    1                  45                    460
    1    1416           3                    2                  40                    232
    1    1534           3                    2                  30                    315
    1    852            2                    1                  36                    178

where $\boldsymbol{\theta} = (\theta_0, \theta_1, \theta_2, \theta_3, \theta_4)^T$
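With only the four table rows as data, there are more parameters (5) than examples (4), so $\mathbf{X}^T\mathbf{X}$ is singular and the inverse in the normal equation does not exist. A sketch using `np.linalg.lstsq`, which handles this case via a pseudo-inverse:

```python
import numpy as np

# The four examples from the table; x0 = 1 is the first column
X = np.array([[1.0, 2104.0, 5.0, 1.0, 45.0],
              [1.0, 1416.0, 3.0, 2.0, 40.0],
              [1.0, 1534.0, 3.0, 2.0, 30.0],
              [1.0,  852.0, 2.0, 1.0, 36.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])

# X^T X is 5x5 but only has rank 4 here, so it is not invertible;
# lstsq returns the minimum-norm least-squares solution instead.
theta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# With 4 linearly independent rows and 5 parameters the fit is exact
predictions = X @ theta
```

In the usual case of more examples than parameters, $\mathbf{X}^T\mathbf{X}$ is typically invertible and `lstsq` agrees with the normal equation.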
Example 2
\[
\boldsymbol{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
=
\begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix}
=
\begin{pmatrix} -153.51 \\ 1.24 \\ 12.08 \end{pmatrix}
\]
Outline:
• Start with some initial $\theta_0, \theta_1$
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$, until we hopefully end up at a minimum
Gradient Descent
Repeat until convergence {

\[
\theta_j := \theta_j - \alpha \cdot \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
\qquad \text{for } j = 0 \text{ and } j = 1
\]

} — update $\theta_0$ and $\theta_1$ simultaneously:

\[
\text{temp}_0 := \theta_0 - \alpha \cdot \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1),
\qquad
\text{temp}_1 := \theta_1 - \alpha \cdot \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1),
\qquad
\theta_0 := \text{temp}_0, \quad \theta_1 := \text{temp}_1
\]
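The simultaneous update can be sketched as follows: both gradients are computed before either parameter is overwritten, which is exactly what "simultaneous" means here. The data and learning rate are illustrative, not from the lecture:

```python
import numpy as np

# Illustrative data (not from the lecture): y is roughly 1 + 2 * x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])
m = len(x)

theta0, theta1 = 0.0, 0.0   # initial guess
alpha = 0.05                # learning rate, chosen by hand for this data

for _ in range(5000):
    err = theta0 + theta1 * x - y
    # Evaluate BOTH partial derivatives before updating either
    # parameter -- this is the simultaneous update.
    grad0 = err.sum() / m          # dJ/dtheta0
    grad1 = (err * x).sum() / m    # dJ/dtheta1
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)  # converges to about 1.08, 1.98
```

For this small, well-conditioned problem the loop converges to the same $(\theta_0, \theta_1)$ as the closed form $\theta_0 = \bar{y} - \theta_1\bar{x}$ derived earlier.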