
Linear Regression

Mauro Sebastián Innocente

PhD, MSc, MEng, FHEA

Autonomous Vehicles & Artificial Intelligence Laboratory (AVAILab)
Mauro.S.Innocente@gmail.com
https://msinnocente.com/
https://availab.org/

• The Regression Problem
• Simple Linear Regression
• Multiple Linear Regression

The Regression Problem
The regression problem is that of fitting a function to observed input-output data. For example, according to Newton's 2nd Law of Motion, force varies linearly with acceleration for a given mass: F = m · a. Another example: house prices as a function of property attributes.

• Suppose we have $n$ pairs $\{x_i, y_i\}$, $i = 1, \ldots, n$.

• Suppose we know/estimate the shape of the function that governs the system, $g(x)$.

• The MSE is the aggregation of the squared differences between corresponding observed and predicted data:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2$$

• Squaring the differences makes their sign irrelevant.
• Squaring also puts more pressure on reducing large differences.
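A minimal MATLAB sketch of this computation (the data and the candidate hypothesis g below are made up for illustration, not taken from the lecture):

    x = [0; 1; 2; 3];            % made-up observed inputs
    y = [1.1; 2.9; 5.2; 6.8];    % made-up observed outputs
    g = @(x) 1 + 2*x;            % made-up candidate hypothesis g(x)
    MSE = mean((y - g(x)).^2)    % mean of the squared residuals: 0.025 for this data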

Workflow: Training Data Set (known input-output pairs) → Learning Algorithm (training of the hypothesis / optimisation) → Hypothesis g(x)

For the house price example, the learnt hypothesis takes the form: Price [£] = a0 + a · x

Simple Linear Regression
• Approximation function (in Machine Learning jargon, the hypothesis):

$$g(x) = a_0 + a_1 \cdot x$$

• Variables: $a_0$, $a_1$
• Training: minimise the Mean Squared Error:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2$$

Training Set:
  x    y
  1    2
  2    4
  3    6

• What is the minimum MSE in the previous example? Can you tell why?
• Consider the new data set in the table below, and compute the MSE for $a_0 = 0$ and the cases $a_1 = 0$, $a_1 = 1$, $a_1 = 2$, $a_1 = 3$.

Training Set:
  x    y
  1    1.5
  2    4.5
  3    5.5

Resulting values: MSE = 17.5833 ($a_1 = 0$), 4.2500 ($a_1 = 1$), 0.2500 ($a_1 = 2$), 5.5833 ($a_1 = 3$).
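These values can be checked with a short MATLAB loop (a sketch, using the training set above with a0 fixed to zero):

    x = [1; 2; 3];                      % inputs from the training set above
    y = [1.5; 4.5; 5.5];                % outputs from the training set above
    for a1 = 0:3
        MSE = mean((y - a1*x).^2);      % g(x) = a1*x, since a0 = 0
        fprintf('a1 = %d  ->  MSE = %.4f\n', a1, MSE);
    end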

• Finding the coefficients that best fit the data implies solving an optimisation problem: the minimisation of an error measure, typically the mean squared error (MSE).
• A gradient-descent optimisation algorithm can be implemented more or less straightforwardly:

$$a_j^{(t)} = a_j^{(t-1)} - k \cdot \frac{\partial MSE}{\partial a_j}\Bigl( a_0^{(t-1)}, a_1^{(t-1)} \Bigr)$$

• Due to time constraints, we will use MATLAB's fminunc built-in function to solve the general optimisation problem embedded in the general regression problem (a worked sketch is given after the derivation below).
• For Linear Regression, there is also a straightforward closed-form solution:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - a_0 - a_1 \cdot x_i \bigr)^2$$

$$\frac{\partial MSE}{\partial a_0} = -\frac{2}{n} \sum_{i=1}^{n} \bigl( y_i - a_0 - a_1 \cdot x_i \bigr) = 0$$

$$\frac{\partial MSE}{\partial a_1} = -\frac{2}{n} \sum_{i=1}^{n} x_i \cdot \bigl( y_i - a_0 - a_1 \cdot x_i \bigr) = 0$$
$$\sum_{i=1}^{n} \bigl( y_i - a_0 - a_1 \cdot x_i \bigr) = \sum_{i=1}^{n} y_i - a_0 \cdot n - a_1 \cdot \sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; a_0 \cdot n + a_1 \cdot \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\sum_{i=1}^{n} x_i \cdot \bigl( y_i - a_0 - a_1 \cdot x_i \bigr) = \sum_{i=1}^{n} x_i \cdot y_i - a_0 \cdot \sum_{i=1}^{n} x_i - a_1 \cdot \sum_{i=1}^{n} x_i^2 = 0 \;\Rightarrow\; a_0 \cdot \sum_{i=1}^{n} x_i + a_1 \cdot \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i \cdot y_i$$

• Operating, the solution of this pair of simultaneous equations is:

$$a_1 = \frac{n \cdot \sum_{i=1}^{n} x_i \cdot y_i - \sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n \cdot \sum_{i=1}^{n} x_i^2 - \Bigl( \sum_{i=1}^{n} x_i \Bigr)^2}, \qquad a_0 = \bar{y} - a_1 \cdot \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the observed inputs and outputs.

• Note that, if $a_0 = 0$ is imposed, the equation derived from $\frac{\partial MSE}{\partial a_0} = 0$ does not apply. Then, the remaining equation gives:

$$a_1 = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{\sum_{i=1}^{n} x_i^2}$$
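A minimal MATLAB sketch of both routes on a made-up data set (the numbers below are illustrative only); the closed-form coefficients and the fminunc result should agree:

    % Made-up training data
    x = [1; 2; 3; 4; 5];
    y = [2.1; 3.9; 6.2; 8.1; 9.8];
    n = numel(x);

    % Closed-form solution of the normal equations derived above
    a1 = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);
    a0 = mean(y) - a1*mean(x);

    % Numerical minimisation of the MSE with fminunc (same problem)
    mse  = @(a) mean((y - a(1) - a(2)*x).^2);
    aOpt = fminunc(mse, [0; 0]);        % expected to be close to [a0; a1]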

• The goodness of fit can be quantified by the coefficient of determination:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2}{\sum_{i=1}^{n} \bigl( y_i - \bar{y} \bigr)^2}$$

where $\bar{y}$ is the mean of the observed outputs.

• Example: a simple linear regression through the origin ($a_0 = 0$ assumed) fitted to the training set below:

Training Set:
    x      y
   10    0.3337
   20    0.7227
   30    0.9050
   40    1.3939
   50    1.5043
   60    2.0251
   70    2.4487
   80    2.5065
   90    3.2735
  100    3.3543
  110    3.6600
  120    4.2511

$$a_1 = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{\sum_{i=1}^{n} x_i^2} = 0.03403$$

Resulting fit: $R^2 = 0.9892$, $MSE = 0.0157$.
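The figures above can be reproduced with a short MATLAB script (a sketch; it should return a1 ≈ 0.03403, MSE ≈ 0.0157 and R² ≈ 0.9892):

    x = (10:10:120)';                               % inputs from the training set
    y = [0.3337; 0.7227; 0.9050; 1.3939; 1.5043; 2.0251; ...
         2.4487; 2.5065; 3.2735; 3.3543; 3.6600; 4.2511];

    a1  = sum(x.*y) / sum(x.^2);                    % closed form with a0 = 0
    MSE = mean((y - a1*x).^2);                      % mean squared error
    R2  = 1 - sum((y - a1*x).^2) / sum((y - mean(y)).^2);   % coefficient of determination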

Multiple Linear Regression
• Back to the house price example:

Price [£] = a0 + a ∙ x

Here the input x (predictor / independent variable) is no longer a scalar but a 4D vector.
• For multiple linear regression, the approximation function is given by:

$$g(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} = a_0 \cdot x_0 + a_1 \cdot x_1 + \ldots + a_m \cdot x_m \quad \text{(vector notation, inner product)}$$

• In matrix notation,

$$g(\mathbf{x}) = \mathbf{a}^{\mathrm{T}} \mathbf{x} = a_0 \cdot x_0 + a_1 \cdot x_1 + \ldots + a_m \cdot x_m$$

• For convenience, x0 = 1. Hence there are m input variables (a.k.a. features).


• As in the 1D problem, an error such as the MSE can be computed.
• The coefficients that minimise the error are sought using an optimisation algorithm (or, as in the sketch below, via the least-squares closed form).
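As a sketch (the training data below are made up, with m = 2 features), the coefficients can be obtained in MATLAB either with fminunc, as before, or directly as the least-squares solution of the overdetermined system X·a = y via the backslash operator:

    % Made-up training set: n = 5 examples, m = 2 features
    X = [1.0 2.0;
         2.0 1.0;
         3.0 4.0;
         4.0 3.0;
         5.0 5.0];
    y = [6.1; 5.9; 12.2; 11.8; 16.0];

    Xa = [ones(size(X,1),1), X];    % prepend the column x0 = 1 (absorbs a0)
    a  = Xa \ y;                    % least-squares coefficients [a0; a1; a2]
    g  = @(x) [1, x] * a;           % hypothesis g(x) for a new 1-by-m input row x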
• Now every data point is given by an input vector and a scalar output.
• And the MSE is given by:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=0}^{m} a_j \cdot x_{ij} \Bigr)^2$$

Subindex $i$ stands for the $i$th training example; subindex $j$ stands for the $j$th input variable; $x_{ij}$ is the value of the $j$th variable of the $i$th training example.

• If a gradient descent algorithm were to be used for the optimisation (a sketch is given at the end of this section):

$$a_j^{(t)} = a_j^{(t-1)} - k \cdot \frac{\partial MSE}{\partial a_j}\Bigl( \mathbf{a}^{(t-1)} \Bigr) \;\Rightarrow\; a_j^{(t)} = a_j^{(t-1)} + k \cdot \frac{2}{n} \sum_{i=1}^{n} x_{ij} \cdot \Bigl( y_i - \sum_{s=0}^{m} a_s^{(t-1)} \cdot x_{is} \Bigr)$$

• Data Normalisation: if different variables have different orders of magnitude, scaling them to the same order (e.g. to [0, 1]) speeds up convergence.
• Mean Normalisation: subtracting each variable's mean, so as to work with zero-mean variables.
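A sketch of batch gradient descent combining both normalisations (the data, learning rate k and iteration count below are all assumptions):

    % Made-up raw data: n = 4 examples, m = 2 features with very different scales
    X = [2000 3; 1600 2; 2400 4; 3000 4];
    y = [250; 180; 310; 400];

    mu = mean(X);  sg = std(X);
    Xn = (X - mu) ./ sg;              % mean normalisation + scaling (zero mean, unit std)
    A  = [ones(size(Xn,1),1), Xn];    % prepend x0 = 1
    n  = size(A,1);

    a = zeros(size(A,2), 1);          % initial coefficients
    k = 0.1;                          % learning rate (assumed)
    for t = 1:500                     % fixed number of iterations (assumed)
        a = a + k*(2/n) * A' * (y - A*a);   % update rule derived above
    end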
