Multivariate Linear Regression
24 704: Prob Est Eng Sys — Lec. 22
Example of Linear Regression
[Figure: two scatter plots of strength $y$ [MPa] vs. density $x$ [kg/m$^3$], $n = 40$ specimens, with the fitted regression line; $R^2 = 19\%$, slope estimate $\hat\theta_1 = 0.1368$ MPa·m$^3$/kg.]
$\alpha = 5\% \;\Rightarrow\; 1-\alpha = 95\%$ confidence.

$CI_n(\theta_0) = \left[\hat\theta_0 - t_{\alpha/2,\,n-2}\,\widehat{se}_0 \;;\; \hat\theta_0 + t_{\alpha/2,\,n-2}\,\widehat{se}_0\right] = [-500.9\,;\,-47.9]$ MPa

$CI_n(\theta_1) = \left[\hat\theta_1 - t_{\alpha/2,\,n-2}\,\widehat{se}_1 \;;\; \hat\theta_1 + t_{\alpha/2,\,n-2}\,\widehat{se}_1\right] = [0.0442\,;\,0.2295]$ MPa·m$^3$/kg

$t_{\alpha/2,\,n-2}$: critical value of the t-distribution with $n-2$ degrees of freedom.

$0 \notin CI_n(\theta_1) \;\Rightarrow\;$ reject $H_0: \theta_1 = 0$ at significance $\alpha = 5\%$.
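As a minimal numerical sketch (an addition, not from the slides), intervals like these can be computed with NumPy/SciPy; the function name and interface are illustrative:

```python
import numpy as np
from scipy import stats

def straight_line_ci(x, y, alpha=0.05):
    """(1 - alpha) confidence intervals for (theta0, theta1) in y = theta0 + theta1 * x."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])           # design matrix [1, x]
    theta = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares / MLE estimate
    r = y - X @ theta                              # residuals
    sigma2 = r @ r / (n - 2)                       # unbiased noise-variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)       # t_{alpha/2, n-2}
    return [(th - t * s, th + t * s) for th, s in zip(theta, se)]
```

Applied to the strength–density data above, this should reproduce the quoted intervals.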
Further topics on Regression
After performing a regression, we assess how much of the uncertainty in the output is explained by conditioning on the input. This is quantified by the coefficient of determination $R^2$.
Again, multivariate normal RVs: conditional variance

$Y \mid X = x \;\sim\; \mathcal{N}\left(\mu_{Y|x},\, \sigma^2_{Y|X}\right)$

The conditional distribution is normal, with
- conditional mean $\mu_{Y|x}$ varying linearly with $x$;
- conditional variance $\sigma^2_{Y|X}$ invariant with respect to $x$.

$\mathbb{E}_{Y|x}[Y|x] = \mu_{Y|x} = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)$

$\mathbb{V}_{Y|x}[Y|x] = \sigma^2_{Y|X} = (1-\rho^2)\,\sigma_Y^2$

Marginal variance: $\mathbb{V}_Y[Y] = \sigma_Y^2$

[Figure: joint density of $(X, Y)$ with the conditional distributions of $Y$ at several values of $x$.]
Contributions to total variance
[Figure: conditional density $P_{Y|X}(y|x)$, with the marginal band $\mu_Y \pm 2\sigma_Y$ and the narrower conditional band $\mu_{Y|x} \pm 2\sigma_{Y|x}$.]

For any pair of RVs, we can define marginal and conditional moments, and the law of total variance says that the total uncertainty about $Y$ is the sum of the explained and the unexplained uncertainties:

$\mathbb{V}_Y[Y] = \mathbb{V}_X\left[\mathbb{E}_{Y|X}[Y|X]\right] + \mathbb{E}_X\left[\mathbb{V}_{Y|X}[Y|X]\right]$

For MVN, $\sigma^2_{Y|x}$ does not change with $x$, so the decomposition reads $\sigma_Y^2 = \rho^2\sigma_Y^2 + (1-\rho^2)\,\sigma_Y^2$.
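A small simulation (an addition, with assumed parameter values) can check this decomposition numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
sig_x, sig_y, rho = 1.0, 2.0, 0.6
cov = np.array([[sig_x**2, rho * sig_x * sig_y],
                [rho * sig_x * sig_y, sig_y**2]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

# explained part: variance of the conditional mean E[Y|x] = mu_Y + rho*(sig_y/sig_x)*(x - mu_X)
explained = np.var(rho * (sig_y / sig_x) * x)     # -> rho^2 * sig_y^2
unexplained = (1 - rho**2) * sig_y**2             # conditional variance, constant in x
print(explained + unexplained, np.var(y))         # both approx. sig_y^2 = 4
```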
Univariate ANOVA
Univariate ANOVA, cont.
Sample estimator (prior uncertainty of $Y$):

$\sigma_Y^2 \cong \widehat{V}_{Y,n} = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{Y}_n\right)^2$

Adjusted definition [unbiased]:

$\sigma^2_{Y|X} = \sigma_\varepsilon^2 \cong \widehat{V}_{\varepsilon,n} = \frac{1}{n-m-1}\sum_{i=1}^{n} r_i^2$

$\bar{R}^2 = 1 - \frac{\widehat{V}_{\varepsilon,n}}{\widehat{V}_{Y,n}} = 1 - \frac{\sum_{i=1}^{n} r_i^2 \,/\, (n-m-1)}{\sum_{i=1}^{n}\left(y_i - \bar{Y}_n\right)^2 \,/\, (n-1)}$

The adjusted coefficient of determination $\bar{R}^2$ is less than the classical one, $R^2$.
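A possible sketch (an addition) of both coefficients of determination, assuming `y` holds the observations and `f` the fitted values from $m$ features plus an intercept:

```python
import numpy as np

def r2_and_adjusted(y, f, m):
    """Classical R^2 and adjusted R^2 for predictions f from m features (+ intercept)."""
    n = len(y)
    r = y - f                                      # residuals
    ss_tot = np.sum((y - y.mean())**2)
    v_y = ss_tot / (n - 1)                         # prior uncertainty of Y
    v_eps = np.sum(r**2) / (n - m - 1)             # unbiased residual variance
    r2 = 1 - np.sum(r**2) / ss_tot
    return r2, 1 - v_eps / v_y                     # adjusted R^2 <= classical R^2
```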
Multiple features
Outcome variable (aka dependent variable); $m$ features (aka independent variables, predictors, covariates).
Multivariate linear regression: basics
Matrix of regressors: $\mathbf{X} = \begin{bmatrix}\mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_n\end{bmatrix}$, size $n \times (m+1)$. Vector of outputs: $\mathbf{y} = \begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_n\end{bmatrix}$, size $n \times 1$.

Vector of predictions: $\mathbf{f} = \begin{bmatrix}f_1 \\ f_2 \\ \vdots \\ f_n\end{bmatrix}$, size $n \times 1$. Vector of parameters: $\boldsymbol{\theta} = \begin{bmatrix}\theta_0 \\ \theta_1 \\ \vdots \\ \theta_m\end{bmatrix}$, size $(m+1) \times 1$.
MLE for multiple regression with normal noise
To maximize the log-likelihood, we minimize $\mathrm{rss}_n$:

$\hat{\boldsymbol\theta} = \mathbf{X}^{+}\mathbf{y}; \qquad \widehat{\mathrm{rss}}_n \triangleq \mathrm{rss}_n(\hat{\boldsymbol\theta}); \qquad \hat\sigma_\varepsilon^2 = \frac{\widehat{\mathrm{rss}}_n}{n-m-1}$
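A minimal NumPy sketch of this estimator (an addition; the helper name `fit_mle` is illustrative):

```python
import numpy as np

def fit_mle(X, y):
    """OLS / Gaussian-MLE fit: theta_hat = X^+ y, plus unbiased noise variance."""
    n, p = X.shape                                 # p = m + 1 columns (incl. intercept)
    theta = np.linalg.pinv(X) @ y                  # Moore-Penrose pseudoinverse solution
    rss = np.sum((y - X @ theta)**2)               # residual sum of squares at theta_hat
    sigma2 = rss / (n - p)                         # = rss_n / (n - m - 1)
    return theta, sigma2
```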
Uncertainty in the estimation of regression parameters
Model: $\mathbf{Y} = \mathbf{X}\boldsymbol\theta + \boldsymbol\varepsilon$, with $\boldsymbol\varepsilon \sim \mathcal{N}(\mathbf{0}, \sigma_\varepsilon^2\,\mathbf{I})$. Estimation: $\hat{\boldsymbol\theta} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}$

$\hat{\boldsymbol\theta} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{X}\,\boldsymbol\theta + (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\boldsymbol\varepsilon = \boldsymbol\theta + \underbrace{\mathbf{X}^{+}\boldsymbol\varepsilon}_{\text{error}}$

$\mathbb{E}[\boldsymbol\varepsilon] = \mathbf{0} \;\Rightarrow\; \mathbb{E}\left[\hat{\boldsymbol\theta} \mid \boldsymbol\theta, \mathbf{X}, \sigma_\varepsilon^2\right] = \boldsymbol\theta$: unbiased estimator (as the error is zero mean).

$\mathbb{V}\left[\hat{\boldsymbol\theta} \mid \boldsymbol\theta, \mathbf{X}, \sigma_\varepsilon^2\right] = \sigma_\varepsilon^2\,\mathbf{X}^{+}(\mathbf{X}^{+})^\mathsf{T} = \sigma_\varepsilon^2 (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1} \triangleq \boldsymbol\Sigma_\Theta \;\cong\; \hat\sigma_\varepsilon^2 (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1} \triangleq \widehat{\boldsymbol\Sigma}_\Theta$

[Figure: graphical model — parameters $\boldsymbol\theta, \sigma_\varepsilon^2$ and inputs $\mathbf{X}$ generate $\mathbf{Y}$, from which the estimators $\hat{\boldsymbol\theta}, \hat\sigma_\varepsilon^2$ are computed.]

The covariance matrix of $\hat{\boldsymbol\theta}$ is proportional to $\sigma_\varepsilon^2$, while $(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}$ tends to the zero matrix for large $n$.
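In code, the estimated covariance $\widehat{\boldsymbol\Sigma}_\Theta$ could be computed as follows (a sketch, an addition to the slides):

```python
import numpy as np

def param_covariance(X, y):
    """Estimated covariance of theta_hat: Sigma_hat = sigma_hat^2 * (X^T X)^{-1}."""
    n, p = X.shape
    theta = np.linalg.pinv(X) @ y
    sigma2 = np.sum((y - X @ theta)**2) / (n - p)  # unbiased noise-variance estimate
    Sigma = sigma2 * np.linalg.inv(X.T @ X)        # shrinks toward zero as n grows
    se = np.sqrt(np.diag(Sigma))                   # standard errors of each theta_j
    return theta, Sigma, se
```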
Uncertainty in the estimation of noise level
Example of Multivariate Regression
| $j$ | name            | $\hat\theta_j$ | $\widehat{se}_j$ | $t_j$  | $\mathcal{P}_j$ | sig. |
|-----|-----------------|--------|-------|--------|--------|------|
| 0   | (Intercept)     | -589.4 | 167.6 | -3.59  | 0.0012 | **   |
| 1   | Age             | 1.041  | 0.446 | 2.331  | 0.0255 | *    |
| 2   | South State     | 11.29  | 13.24 | 0.853  | 0.3994 |      |
| 3   | Education       | 1.178  | 0.681 | 1.728  | 0.0926 |      |
| 4   | Expenditures    | 0.964  | 0.249 | 3.861  | 0.0005 | ***  |
| 5   | Labor           | 0.106  | 0.153 | 0.692  | 0.4935 |      |
| 6   | Numb. Males     | 0.303  | 0.222 | 1.363  | 0.1813 |      |
| 7   | Population      | 0.090  | 0.138 | 0.652  | 0.5185 |      |
| 8   | Unempl. (14-24) | -0.682 | 0.480 | -1.418 | 0.1648 |      |
| 9   | Unempl. (25-39) | 2.150  | 0.950 | 2.262  | 0.0299 | *    |
| 10  | Wealth          | -0.083 | 0.091 | -0.913 | 0.3672 |      |

$t_j = \hat\theta_j / \widehat{se}_j$; $\mathcal{P}_j = 2\,\Phi(-|t_j|)$.

If $\mathcal{P}_j$ is small, we are confident that $\theta_j$ should not be zero.

We use $\Phi$ under the normal approximation, and the Student's t CDF for small $n$.
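A sketch (an addition) of how $t_j$ and $\mathcal{P}_j$ might be computed, using SciPy's t and normal CDFs:

```python
import numpy as np
from scipy import stats

def t_and_p(theta, se, n, m, use_t=True):
    """Two-sided significance per coefficient: t_j = theta_j/se_j, P_j = 2*CDF(-|t_j|)."""
    t = theta / se
    if use_t:                                      # Student's t for small n
        return t, 2 * stats.t.cdf(-np.abs(t), df=n - m - 1)
    return t, 2 * stats.norm.cdf(-np.abs(t))       # normal approximation
```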
Prediction of new values of $f$ and $y$, given $\mathbf{x}$

How do we predict $y_*$, the outcome at features $\mathbf{x}_*$?

True value: $f_* = f(\mathbf{x}_*, \boldsymbol\theta) = \mathbf{x}_*\,\boldsymbol\theta$; estimator: $\hat f_* = \mathbf{x}_*\,\hat{\boldsymbol\theta}$

Including the noise in $y_*$: $\widehat{se}^2(y_*) = \hat\sigma_\varepsilon^2 \left(\mathbf{x}_* (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1} \mathbf{x}_*^\mathsf{T} + 1\right)$
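A possible implementation sketch (an addition; names are illustrative):

```python
import numpy as np

def predict_with_se(X, y, x_star):
    """Point prediction at row vector x_star and its standard error, including noise."""
    n, p = X.shape
    theta = np.linalg.pinv(X) @ y
    sigma2 = np.sum((y - X @ theta)**2) / (n - p)
    G = np.linalg.inv(X.T @ X)
    f_star = x_star @ theta                        # estimated regression value
    se2 = sigma2 * (x_star @ G @ x_star + 1.0)     # +1 accounts for the new noise term
    return f_star, np.sqrt(se2)
```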
Proof of orthogonality
We can derive the MLE estimator $\hat{\boldsymbol\theta}$ from the condition that the residual vector is orthogonal to the prediction vector:

$\mathbf{f} = \mathbf{X}\boldsymbol\theta$  (prediction)

$\boldsymbol\delta = \mathbf{y} - \mathbf{f} = \mathbf{y} - \mathbf{X}\boldsymbol\theta$  (residual, a function of $\boldsymbol\theta$)

$\mathbf{f} \perp \boldsymbol\delta \;\Leftrightarrow\; 0 = \mathbf{f}^\mathsf{T}\boldsymbol\delta = (\mathbf{X}\hat{\boldsymbol\theta})^\mathsf{T}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\theta})$

$\Leftrightarrow\; 0 = \hat{\boldsymbol\theta}^\mathsf{T}\left(\mathbf{X}^\mathsf{T}\mathbf{y} - \mathbf{X}^\mathsf{T}\mathbf{X}\,\hat{\boldsymbol\theta}\right)$

$\Longleftarrow\; \mathbf{0} = \mathbf{X}^\mathsf{T}\mathbf{y} - \mathbf{X}^\mathsf{T}\mathbf{X}\,\hat{\boldsymbol\theta}$  (the bi-directional inference $\Leftrightarrow$ is not certain here)

$\Leftrightarrow\; \hat{\boldsymbol\theta} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y} = \mathbf{X}^{+}\mathbf{y}$
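A quick numerical check (an addition, on synthetic data) that the MLE prediction and residual are indeed orthogonal:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=50)

theta_hat = np.linalg.pinv(X) @ y
f, delta = X @ theta_hat, y - X @ theta_hat
print(f @ delta)                                   # approx. 0: prediction ⟂ residual
print(X.T @ delta)                                 # approx. 0 vector: the normal equations
```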
Regression, depending on the feature set, I
[Figure: data and regression fit over $t \in [0, 15]$.]

Features — $X_0$: constant, $X_1$: linear time trend, $X_2$: temperature, $X_3$: load.

We assume measurements are generated as follows:

$f(\mathbf{X}, \boldsymbol\theta) = X_0\theta_0 + X_1\theta_1 + \dots + X_m\theta_m, \qquad Y = f + \varepsilon.$

We identify the parameters $\hat{\boldsymbol\theta}, \hat\sigma_\varepsilon^2$ from data analysis. The outcome depends on the set $S$ of features included.
Regression, depending on the feature set, II
[Figure: fit with the selected feature set over $t \in [0, 15]$.]

$f(\mathbf{X}, \boldsymbol\theta) = X_0\theta_0 + X_1\theta_1 + \dots + X_m\theta_m, \qquad Y = f + \varepsilon.$

We identify the parameters $\hat{\boldsymbol\theta}, \hat\sigma_\varepsilon^2$ from data analysis. The outcome depends on the set $S$ of features included.

$\hat\sigma_\varepsilon = 5.99$; the 95% $CI_n(\theta_0) = [5.8,\,7.7]$, while the true value is $-2.5$, outside the interval.
Regression, depending on the feature set, III
[Figure: data and regression fit over $t \in [0, 15]$, with a different feature set.]

Features — $X_0$: constant, $X_1$: linear time trend, $X_2$: temperature, $X_3$: load.

We assume measurements are generated as follows:

$f(\mathbf{X}, \boldsymbol\theta) = X_0\theta_0 + X_1\theta_1 + \dots + X_m\theta_m, \qquad Y = f + \varepsilon.$

We identify the parameters $\hat{\boldsymbol\theta}, \hat\sigma_\varepsilon^2$ from data analysis. The outcome depends on the set $S$ of features included.
Regression, depending on the feature set, IV
[Figure: fit with another feature set over $t \in [0, 15]$.]

$f(\mathbf{X}, \boldsymbol\theta) = X_0\theta_0 + X_1\theta_1 + \dots + X_m\theta_m, \qquad Y = f + \varepsilon.$

We identify the parameters $\hat{\boldsymbol\theta}, \hat\sigma_\varepsilon^2$ from data analysis. The outcome depends on the set $S$ of features included.
Regression, depending on the feature set, V
Building a base of functions for regression
E.g., we can define $X_1 = X,\; X_2 = X^2,\; X_3 = X^3, \dots$: polynomial fitting.

Polynomial regression function:

$f(\mathbf{x}, \boldsymbol\theta) = \theta_0 + \theta_1 X + \theta_2 X^2 + \dots + \theta_m X^m, \qquad Y = f + \varepsilon$

$m$: polynomial degree.

[Figure: data on $x \in [0, 2]$ with a fitted polynomial.]
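A minimal sketch (an addition) of the polynomial basis via a Vandermonde matrix:

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Fit a degree-m polynomial by linear regression on the basis 1, x, ..., x^m."""
    X = np.vander(x, N=m + 1, increasing=True)     # columns [1, x, x^2, ..., x^m]
    theta = np.linalg.pinv(X) @ y
    return theta, X
```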
Building a base of functions for regression, II
[Figure: the data on $x \in [0, 2]$ with the constant fit, $m = 0$.]
Building a base of functions for regression, III
If the assumed polynomial is a constant ($m = 0$), estimating $\theta_0$ amounts to estimating $\mu_Y$.

[Figure: constant fit, $m = 0$.]
Building a base of functions for regression, IV
If the assumed polynomial is linear ($m = 1$), the dataset is still underfitted.

[Figure: linear fit, $m = 1$.]
Building a base of functions for regression, V
If the assumed polynomial is parabolic ($m = 2$), the assumption is correct, and the actual parameters fall within the estimated confidence intervals.

[Figure: parabolic fit, $m = 2$.]
Building a base of functions for regression, VI
If the assumed polynomial is cubic ($m = 3$), the cubic coefficient is uncertain, and so the prediction is also highly uncertain.

[Figure: cubic fit, $m = 3$.]
Building a base of functions for regression, VII
If the assumed polynomial is of order four ($m = 4$), the coefficient of the fourth power is also uncertain, and so the prediction is even more uncertain.

[Figure: quartic fit, $m = 4$.]
Building a base of functions for regression, VIII
If the assumed polynomial is of order twelve ($m = 12$), the optimal coefficients cannot be identified, because of numerical issues related to collinearity.

[Figure: degree-12 fit, numerically unstable.]
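The numerical issue can be seen (a sketch, an addition) in the condition number of $\mathbf{X}^\mathsf{T}\mathbf{X}$, which explodes with the degree:

```python
import numpy as np

x = np.linspace(0, 2, 30)
for m in (2, 4, 12):
    X = np.vander(x, N=m + 1, increasing=True)
    # near-collinear columns -> (X^T X) is close to singular for large m
    print(m, np.linalg.cond(X.T @ X))
```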
Summary
Remarks:
1. What is the “meaning” of the regression parameter $\theta_i$? It represents how the regression function $f$ changes under the variation $x_i \to x_i + 1$:

$\Delta f = \left(\dots + \theta_i (x_i + 1) + \dots\right) - \left(\dots + \theta_i x_i + \dots\right) = \theta_i$
References and readings
https://en.wikipedia.org/wiki/Linear_regression
Proof of error formulas for straight-line regression
We derive the formulas for the covariance matrix in the two-parameter estimation of a straight line, starting from the general formula in vector-matrix notation: $\boldsymbol\Sigma_\Theta = \sigma_\varepsilon^2 (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}$.

Vector-matrix notation:

$\mathbf{y} = \theta_0\,\mathbf{x} + \theta_1\,\mathbf{1} = [\mathbf{x}\;\; \mathbf{1}]\,\boldsymbol\theta = \mathbf{X}\boldsymbol\theta, \quad \text{with } \mathbf{X} = [\mathbf{x}\;\; \mathbf{1}]$

$\mathbf{X}^\mathsf{T}\mathbf{X} = \begin{bmatrix}\mathbf{x}^\mathsf{T}\\ \mathbf{1}^\mathsf{T}\end{bmatrix}[\mathbf{x}\;\; \mathbf{1}] = \begin{bmatrix}\mathbf{x}^\mathsf{T}\mathbf{x} & \mathbf{x}^\mathsf{T}\mathbf{1}\\ \mathbf{x}^\mathsf{T}\mathbf{1} & \mathbf{1}^\mathsf{T}\mathbf{1}\end{bmatrix} = \begin{bmatrix}n\,\overline{X^2}_n & n\,\bar X_n\\ n\,\bar X_n & n\end{bmatrix} = n\begin{bmatrix}\overline{X^2}_n & \bar X_n\\ \bar X_n & 1\end{bmatrix}$

(with $\overline{X^2}_n$ the sample mean of $x_i^2$ and $\bar X_n$ the sample mean of $x_i$).

Matrix inversion:

$\begin{bmatrix}a & b\\ c & d\end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix}d & -b\\ -c & a\end{bmatrix} \;\Rightarrow\; \begin{bmatrix}\overline{X^2}_n & \bar X_n\\ \bar X_n & 1\end{bmatrix}^{-1} = \frac{1}{\underbrace{\overline{X^2}_n - \bar X_n^2}_{\widehat V_{X,n}}}\begin{bmatrix}1 & -\bar X_n\\ -\bar X_n & \overline{X^2}_n\end{bmatrix}$

Hence:

$\boldsymbol\Sigma_\Theta = \sigma_\varepsilon^2 (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1} = \frac{\sigma_\varepsilon^2}{n\,\widehat V_{X,n}}\begin{bmatrix}1 & -\bar X_n\\ -\bar X_n & \overline{X^2}_n\end{bmatrix}$, as reported in a past lecture.
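A numerical cross-check (an addition, with synthetic x) that the closed-form $2\times 2$ inverse matches the general formula:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(2.0, 1.5, size=1000)
X = np.column_stack([x, np.ones_like(x)])          # X = [x 1]

general = np.linalg.inv(X.T @ X)                   # (X^T X)^{-1}, general formula
v_x = np.mean(x**2) - np.mean(x)**2                # V_hat_{X,n}, the 1/n form used above
closed = np.array([[1.0, -x.mean()],
                   [-x.mean(), np.mean(x**2)]]) / (len(x) * v_x)
print(np.allclose(general, closed))                # True: the 2x2 formula matches
```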