
Linear Regression

Mauro Sebastián Innocente

PhD, MSc, MEng, FHEA

Autonomous Vehicles & Artificial Intelligence Laboratory (AVAILab)
Mauro.S.Innocente@gmail.com
https://msinnocente.com/
https://availab.org/

• The Regression Problem
• Simple Linear Regression
• Multiple Linear Regression

The Regression Problem
The regression problem is that of fitting a function to observed input-output data. For example, according to Newton's 2nd Law of Motion, force varies linearly with acceleration for a given mass: F = m · a. Another example: house prices as a function of property attributes.

• Suppose we have $n$ pairs $\{x_i, y_i\}$, $i = 1, \ldots, n$.

• Suppose we know/estimate the shape of the function that governs the system, $g(x)$.

• The MSE is the aggregation of the squared differences between corresponding observed and predicted data:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2$$

• Squaring the differences makes their sign irrelevant.
• Squaring also puts more pressure on reducing large differences.
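A minimal MATLAB sketch of this computation (the data and the candidate hypothesis g below are made up for illustration, not taken from the lecture):

    x = [0; 1; 2; 3];            % made-up observed inputs
    y = [1.1; 2.9; 5.2; 6.8];    % made-up observed outputs
    g = @(x) 1 + 2*x;            % made-up candidate hypothesis g(x)
    MSE = mean((y - g(x)).^2)    % mean of the squared residuals: 0.025 for this data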

Workflow: Training Data Set (known input-output pairs) → Learning Algorithm (training of the hypothesis / optimisation) → Hypothesis g(x)

For the house price example, the learnt hypothesis takes the form: Price [£] = a0 + a · x

Simple Linear Regression
• Approximation function (in Machine Learning jargon, the hypothesis):

$$g(x) = a_0 + a_1 \cdot x$$

• Variables: $a_0$, $a_1$
• Training: minimise the Mean Squared Error:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2$$

Training Set:
  x    y
  1    2
  2    4
  3    6

• What is the minimum MSE in the previous example? Can you tell why?
• Consider the new data set in the table below, and compute the MSE for $a_0 = 0$ and the cases $a_1 = 0$, $a_1 = 1$, $a_1 = 2$, $a_1 = 3$.

Training Set:
  x    y
  1    1.5
  2    4.5
  3    5.5

Resulting values: MSE = 17.5833 ($a_1 = 0$), 4.2500 ($a_1 = 1$), 0.2500 ($a_1 = 2$), 5.5833 ($a_1 = 3$).
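These values can be checked with a short MATLAB loop (a sketch, using the training set above with a0 fixed to zero):

    x = [1; 2; 3];                      % inputs from the training set above
    y = [1.5; 4.5; 5.5];                % outputs from the training set above
    for a1 = 0:3
        MSE = mean((y - a1*x).^2);      % g(x) = a1*x, since a0 = 0
        fprintf('a1 = %d  ->  MSE = %.4f\n', a1, MSE);
    end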

• Finding the coefficients that best fit the data implies solving an optimisation problem: the minimisation of an error measure, typically the mean squared error (MSE).
• A gradient-descent optimisation algorithm can be implemented more or less straightforwardly:

$$a_j^{(t)} = a_j^{(t-1)} - k \cdot \frac{\partial MSE}{\partial a_j}\Bigl( a_0^{(t-1)}, a_1^{(t-1)} \Bigr)$$

• Due to time constraints, we will use MATLAB's fminunc built-in function to solve the general optimisation problem embedded in the general regression problem (a worked sketch is given after the derivation below).
• For Linear Regression, there is also a straightforward closed-form solution:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - a_0 - a_1 \cdot x_i \bigr)^2$$

$$\frac{\partial MSE}{\partial a_0} = -\frac{2}{n} \sum_{i=1}^{n} \bigl( y_i - a_0 - a_1 \cdot x_i \bigr) = 0$$

$$\frac{\partial MSE}{\partial a_1} = -\frac{2}{n} \sum_{i=1}^{n} x_i \cdot \bigl( y_i - a_0 - a_1 \cdot x_i \bigr) = 0$$
$$\sum_{i=1}^{n} \bigl( y_i - a_0 - a_1 \cdot x_i \bigr) = \sum_{i=1}^{n} y_i - a_0 \cdot n - a_1 \cdot \sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; a_0 \cdot n + a_1 \cdot \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\sum_{i=1}^{n} x_i \cdot \bigl( y_i - a_0 - a_1 \cdot x_i \bigr) = \sum_{i=1}^{n} x_i \cdot y_i - a_0 \cdot \sum_{i=1}^{n} x_i - a_1 \cdot \sum_{i=1}^{n} x_i^2 = 0 \;\Rightarrow\; a_0 \cdot \sum_{i=1}^{n} x_i + a_1 \cdot \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i \cdot y_i$$

• Operating, the solution of this pair of simultaneous equations is:

$$a_1 = \frac{n \cdot \sum_{i=1}^{n} x_i \cdot y_i - \sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n \cdot \sum_{i=1}^{n} x_i^2 - \Bigl( \sum_{i=1}^{n} x_i \Bigr)^2}, \qquad a_0 = \bar{y} - a_1 \cdot \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the observed inputs and outputs.

• Note that, if $a_0 = 0$ is imposed, the equation derived from $\frac{\partial MSE}{\partial a_0} = 0$ does not apply. Then, the remaining equation gives:

$$a_1 = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{\sum_{i=1}^{n} x_i^2}$$
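A minimal MATLAB sketch of both routes on a made-up data set (the numbers below are illustrative only); the closed-form coefficients and the fminunc result should agree:

    % Made-up training data
    x = [1; 2; 3; 4; 5];
    y = [2.1; 3.9; 6.2; 8.1; 9.8];
    n = numel(x);

    % Closed-form solution of the normal equations derived above
    a1 = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);
    a0 = mean(y) - a1*mean(x);

    % Numerical minimisation of the MSE with fminunc (same problem)
    mse  = @(a) mean((y - a(1) - a(2)*x).^2);
    aOpt = fminunc(mse, [0; 0]);        % expected to be close to [a0; a1]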

• The goodness of fit can be quantified by the coefficient of determination:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2}{\sum_{i=1}^{n} \bigl( y_i - \bar{y} \bigr)^2}$$

where $\bar{y}$ is the mean of the observed outputs.

• Example: a simple linear regression through the origin ($a_0 = 0$ assumed) fitted to the training set below:

Training Set:
    x      y
   10    0.3337
   20    0.7227
   30    0.9050
   40    1.3939
   50    1.5043
   60    2.0251
   70    2.4487
   80    2.5065
   90    3.2735
  100    3.3543
  110    3.6600
  120    4.2511

$$a_1 = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{\sum_{i=1}^{n} x_i^2} = 0.03403$$

Resulting fit: $R^2 = 0.9892$, $MSE = 0.0157$.
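The figures above can be reproduced with a short MATLAB script (a sketch; it should return a1 ≈ 0.03403, MSE ≈ 0.0157 and R² ≈ 0.9892):

    x = (10:10:120)';                               % inputs from the training set
    y = [0.3337; 0.7227; 0.9050; 1.3939; 1.5043; 2.0251; ...
         2.4487; 2.5065; 3.2735; 3.3543; 3.6600; 4.2511];

    a1  = sum(x.*y) / sum(x.^2);                    % closed form with a0 = 0
    MSE = mean((y - a1*x).^2);                      % mean squared error
    R2  = 1 - sum((y - a1*x).^2) / sum((y - mean(y)).^2);   % coefficient of determination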

Multiple Linear Regression
• Back to the house price example:

Price [£] = a0 + a ∙ x

Here the input x (predictor / independent variable) is no longer a scalar but a 4D vector.
• For multiple linear regression, the approximation function is given by:

$$g(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} = a_0 \cdot x_0 + a_1 \cdot x_1 + \ldots + a_m \cdot x_m \quad \text{(vector notation, inner product)}$$

• In matrix notation,

$$g(\mathbf{x}) = \mathbf{a}^{\mathrm{T}} \mathbf{x} = a_0 \cdot x_0 + a_1 \cdot x_1 + \ldots + a_m \cdot x_m$$

• For convenience, x0 = 1. Hence there are m input variables (a.k.a. features).


• As in the 1D problem, an error such as the MSE can be computed.
• The coefficients that minimise the error are sought using an optimisation algorithm (or, as in the sketch below, via the least-squares closed form).
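As a sketch (the training data below are made up, with m = 2 features), the coefficients can be obtained in MATLAB either with fminunc, as before, or directly as the least-squares solution of the overdetermined system X·a = y via the backslash operator:

    % Made-up training set: n = 5 examples, m = 2 features
    X = [1.0 2.0;
         2.0 1.0;
         3.0 4.0;
         4.0 3.0;
         5.0 5.0];
    y = [6.1; 5.9; 12.2; 11.8; 16.0];

    Xa = [ones(size(X,1),1), X];    % prepend the column x0 = 1 (absorbs a0)
    a  = Xa \ y;                    % least-squares coefficients [a0; a1; a2]
    g  = @(x) [1, x] * a;           % hypothesis g(x) for a new 1-by-m input row x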
• Now every data point is given by an input vector and a scalar output.
• And the MSE is given by:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=0}^{m} a_j \cdot x_{ij} \Bigr)^2$$

Subindex $i$ stands for the $i$th training example; subindex $j$ stands for the $j$th input variable; $x_{ij}$ is the value of the $j$th variable of the $i$th training example.

• If a gradient descent algorithm were to be used for the optimisation (a sketch is given at the end of this section):

$$a_j^{(t)} = a_j^{(t-1)} - k \cdot \frac{\partial MSE}{\partial a_j}\Bigl( \mathbf{a}^{(t-1)} \Bigr) \;\Rightarrow\; a_j^{(t)} = a_j^{(t-1)} + k \cdot \frac{2}{n} \sum_{i=1}^{n} x_{ij} \cdot \Bigl( y_i - \sum_{s=0}^{m} a_s^{(t-1)} \cdot x_{is} \Bigr)$$

• Data Normalisation: if different variables have different orders of magnitude, scaling them to the same order (e.g. to [0, 1]) speeds up convergence.
• Mean Normalisation: subtracting each variable's mean, so as to work with zero-mean variables.
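A sketch of batch gradient descent combining both normalisations (the data, learning rate k and iteration count below are all assumptions):

    % Made-up raw data: n = 4 examples, m = 2 features with very different scales
    X = [2000 3; 1600 2; 2400 4; 3000 4];
    y = [250; 180; 310; 400];

    mu = mean(X);  sg = std(X);
    Xn = (X - mu) ./ sg;              % mean normalisation + scaling (zero mean, unit std)
    A  = [ones(size(Xn,1),1), Xn];    % prepend x0 = 1
    n  = size(A,1);

    a = zeros(size(A,2), 1);          % initial coefficients
    k = 0.1;                          % learning rate (assumed)
    for t = 1:500                     % fixed number of iterations (assumed)
        a = a + k*(2/n) * A' * (y - A*a);   % update rule derived above
    end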
