Lecture 12 - Advanced Correlation and Multiple Regression


Linear Regression

Lecture 12
INSTRUCTOR:
DR. MAHA AMIN HASSANEIN
PROFESSOR, ENGINEERING MATHEMATICS AND PHYSICS DEPARTMENT
FACULTY OF ENGINEERING
CAIRO UNIVERSITY
Study Outline
Multiple Linear Regression
Variance-Covariance Matrix
R² Goodness of Fit



Multiple Linear Regression
Making predictions about an outcome Y based on multiple factors X
Y: the dependent variable
(X1, X2, …, Xk): the independent variables
Regression coefficients:
β0: the intercept
(β1, β2, …, βk): the slopes, which give the strength and direction of each relationship



Multiple Linear Regression
Under the same assumptions as in simple linear regression, the multiple linear regression equation is:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

where ε represents the error term (the unexplained variation in Y).



Illustrative Example
Data: Female (female/male indicator), Wage (yearly wage), Age (in years), Educ (education level 1 to 4), Parttime (part-time job indicator: 1 if part-time, 0 if full-time).
The linear relationship between the response (Wage) and the predictor variables is given by:

yi = β0 + β1x1i + β2x2i + β3x3i + β4x4i + ei

Dependent:
y = Wage
Independent:
x1 = Female, x2 = Age, x3 = Educ, x4 = Parttime
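
As a sketch of how this model could be fit in R (the data frame name wages and its columns are assumptions for illustration; no data set is provided with the slides):

# Hypothetical data frame 'wages' with columns Wage, Female, Age, Educ, Parttime
fit <- lm(Wage ~ Female + Age + Educ + Parttime, data = wages)
summary(fit)   # estimates b0, ..., b4 with standard errors and R-squared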



Multiple Linear Regression
Model (In Matrix Form)
For n observations and k factors

Y = Xβ + ε

Y is an n × 1 vector of responses
X is the n × (k+1) design matrix
β is a (k+1) × 1 vector of unknown parameters
ε is an n × 1 vector of error terms

The estimate of the parameter β is denoted by β̂ = b.



Least Squares Solution
The least squares solution of
y = Xb
is the vector b that minimizes the residual error
‖y − Xb‖²
The normal equations:
(XᵀX) b = Xᵀy
If X is of full rank, that is rank(X) = k + 1, then XᵀX is non-singular and

b = (XᵀX)⁻¹ Xᵀy
(prove)
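
As an illustrative sketch (not part of the original slides), the normal equations can be solved directly in R; here with the data of Example 1 below:

# Solve (X'X) b = X'y for a straight-line fit
x <- c(0, 1, 2, 3, 4)
y <- c(8, 9, 4, 3, 1)
X <- cbind(1, x)                     # design matrix with intercept column
b <- solve(t(X) %*% X, t(X) %*% y)   # least squares estimates
b                                    # returns 9 and -2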



Sum of Squares Error (SSE)
An estimate ŷ = Xb has sum of squares error

SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

In matrix form this is rewritten as

SSE = (y − ŷ)ᵀ(y − ŷ) = (y − Xb)ᵀ(y − Xb)



Estimate of σ²
An estimate of the residual variance σ² in ε ~ N(0, σ²), based on the residual sum of squares, is

σ̂² = sₑ² = (1 / (n − (k+1))) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

or, rewritten in terms of SSE,

sₑ² = SSE / (n − (k+1))

with degrees of freedom
df = n − (number of β's in the model) = n − (k+1)

The standard error of the estimate: sₑ = √(sₑ²)



Example 1 (case k = 1)
Use the matrix relations to fit a straight line to the data

X 0 1 2 3 4
y 8 9 4 3 1



Solution
Xᵀ = | 1 1 1 1 1 |     y = (8, 9, 4, 3, 1)ᵀ
     | 0 1 2 3 4 |

XᵀX = |  5  10 |     (XᵀX)⁻¹ = |  0.6  −0.2 |     Xᵀy = | 25 |
      | 10  30 |               | −0.2   0.1 |           | 30 |

Compute b = (XᵀX)⁻¹ Xᵀy:

b = |  0.6  −0.2 | | 25 |  =  |  9 |
    | −0.2   0.1 | | 30 |     | −2 |

The fitted equation: ŷ = 9 − 2x



Cont'd.
The fitted values: ŷ = Xb

     | 1 0 |            | 9 |
     | 1 1 |   |  9 |   | 7 |
ŷ =  | 1 2 |   | −2 | = | 5 |
     | 1 3 |            | 3 |
     | 1 4 |            | 1 |

Thus

y − ŷ = (−1, 2, −1, 0, 0)ᵀ

→ SSE = 6 and sₑ² = 6 / (5 − (1+1)) = 2.0
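
The same numbers can be checked with lm() in R (a sketch):

x <- c(0, 1, 2, 3, 4)
y <- c(8, 9, 4, 3, 1)
fit <- lm(y ~ x)
coef(fit)                              # intercept 9, slope -2
sum(resid(fit)^2)                      # SSE = 6
sum(resid(fit)^2) / df.residual(fit)   # s_e^2 = 6/3 = 2.0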



Case k = 2: Two predictor variables
Y = β0 + β1X1 + β2X2

The fitted values are ŷ = Xb with

    | 1  x11  x12 |
X = | 1  x21  x22 |,    b = (b0, b1, b2)ᵀ
    | ⋮    ⋮    ⋮  |
    | 1  xn1  xn2 |

       | n    Σx1    Σx2   |           | Σy   |
XᵀX = | Σx1  Σx1²   Σx1x2 |,   Xᵀy = | Σx1y |
       | Σx2  Σx2x1  Σx2²  |           | Σx2y |

b̂ = (XᵀX)⁻¹ Xᵀy

sₑ² = (1 / (n − 3)) (y − Xb)ᵀ(y − Xb)



Example (case k = 2)
Data: y = number of twists required to break an alloy bar, x1 = % of element A in the bar, x2 = % of element B in the bar.
Fit a least squares regression plane and use it to estimate the number of twists required to break a bar with x1 = 2.5, x2 = 12.

y   41  42  69  40  50  43
x1   1   2   3   1   2   4
x2   5   5   5  10  10  20



Solution
     | 1 1 1  1  1  1 |
Xᵀ = | 1 2 3  1  2  4 |,    y = (41, 42, 69, 40, 50, 43)ᵀ
     | 5 5 5 10 10 20 |

       |  6   13   55 |               |  0.915  −0.244  −0.024 |          |  285 |
XᵀX = | 13   35  140 |,   (XᵀX)⁻¹ ≈ | −0.244   0.233  −0.028 |,  Xᵀy = |  644 |
       | 55  140  675 |               | −0.024  −0.028   0.009 |          | 2520 |

Compute b = (XᵀX)⁻¹ Xᵀy:

    | 43.24  |
b = |  8.8   |
    | −1.615 |

The fitted equation: ŷ = 43.24 + 8.8x1 − 1.615x2

At x1 = 2.5, x2 = 12: ŷ = 43.24 + 8.8(2.5) − 1.615(12) ≈ 45.9 twists.
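
As a cross-check (a sketch in R, not from the slides), fitting the same data and computing the requested estimate:

y  <- c(41, 42, 69, 40, 50, 43)
x1 <- c(1, 2, 3, 1, 2, 4)
x2 <- c(5, 5, 5, 10, 10, 20)
fit <- lm(y ~ x1 + x2)
coef(fit)    # approximately 43.24, 8.80, -1.615
predict(fit, newdata = data.frame(x1 = 2.5, x2 = 12))   # about 45.9 twists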



Variance-Covariance Matrix
The estimated variances and covariances of the least squares estimators are expressed as follows (assuming the errors eᵢ have zero mean, constant variance E(eᵢ²) = σ², and E(eᵢeⱼ) = 0 for i ≠ j):

               | Var(b0)      Cov(b0, b1)  ⋯  Cov(b0, bk) |
sₑ² (XᵀX)⁻¹ = | Cov(b1, b0)  Var(b1)      ⋯  Cov(b1, bk) |
               | ⋮            ⋮            ⋱  ⋮           |
               | Cov(bk, b0)  Cov(bk, b1)  ⋯  Var(bk)     |

Let (XᵀX)⁻¹ = C. Then

σ̂²(bi) = Var(bi) = E[(bi − E(bi))²] = cii sₑ²
σ̂(bi, bj) = Cov(bi, bj) = E[(bi − E(bi))(bj − E(bj))] = cij sₑ²



Example 1.
Variance-Covariance Matrix
The variance-covariance matrix:

C = sₑ² (XᵀX)⁻¹ = 2.0 |  0.6  −0.2 |
                      | −0.2   0.1 |

The estimated variances and covariance:
σ̂²(b0) = Var(b0) = 2.0 × c11 = 2.0 × 0.6 = 1.2
σ̂²(b1) = Var(b1) = 2.0 × c22 = 2.0 × 0.1 = 0.2
Cov(b1, b0) = 2.0 × (−0.2) = −0.4
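
In R, the estimated variance-covariance matrix sₑ²(XᵀX)⁻¹ of the coefficients comes directly from vcov(); a sketch for Example 1:

x <- c(0, 1, 2, 3, 4)
y <- c(8, 9, 4, 3, 1)
fit <- lm(y ~ x)
vcov(fit)    # |  1.2  -0.4 |
             # | -0.4   0.2 |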



*Recall: Estimates α̂ and β̂
As α̂ = a and β̂ = b are linear functions of independent normal variables, they are random variables and normally distributed:

a ~ N(α, σa²) and b ~ N(β, σb²)

μa = α,   σa² = sₑ² (1/n + x̄²/Sxx) = sₑ² Σx² / (n Sxx)

μb = β,   σb² = sₑ² / Sxx



Example 1. CI of regression parameters
For 95% confidence with df = 5 − 2 = 3, tα/2 = 3.182

CI for the intercept:
a ∈ 9 ± 3.182 √(2.0 × 0.6)
CI for the slope:
b ∈ −2 ± 3.182 √(2.0 × 0.1)



Estimates β̂i
As β̂i = bi are linear functions of independent normal variables, they are random variables and normally distributed. Thus, the (1 − α)·100% CI for the parameters is

β̂i ± t(α/2, n−(k+1)) sₑ √cii

(prove)
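
In R, confint() produces these t-intervals from a fitted model; a sketch, again with Example 1:

x <- c(0, 1, 2, 3, 4)
y <- c(8, 9, 4, 3, 1)
fit <- lm(y ~ x)
confint(fit, level = 0.95)   # b_i -+ t(alpha/2, n-(k+1)) * s_e * sqrt(c_ii)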



R-squared
Measure of Quality of Fit
R-squared, also known as the coefficient of determination, measures how well the linear regression model fits the data. It ranges from 0 to 1.
It is the proportion of the variability in y explained by the linear model:

R² = (sum of squares due to regression) / (total sum of squares)
   = (SST − SSE) / SST = (Sxy² / Sxx) / Syy = Sxy² / (Sxx Syy) = r²



Recall: Correlation Coefficient r
The correlation coefficient is the sum of the products of the standardized observations:

r = (1/(n−1)) Σ [(xᵢ − x̄)/sx] [(yᵢ − ȳ)/sy]

Since

sx² = Σ(xᵢ − x̄)² / (n−1) = Sxx / (n−1),

a simple formula for r is

r = Sxy / √(Sxx Syy)



Values of R-squared
If SSE = 0, then r² = 1: the fit is perfect.
If SSE ≈ SST, then r² ≈ 0: a poor fit.

As a rough guide:
R-squared between 0 and 0.3: weak fit
R-squared between 0.3 and 0.7: moderate fit
R-squared above 0.7: strong fit



Example 3
The following are the numbers of minutes to complete a task in the morning, x, and in the late afternoon, y. Calculate the sample correlation coefficient.

x  11.1 10.3 12.0 15.1 13.7 18.5 17.3 14.2 14.8 15.3
y  10.9 14.2 13.8 21.5 13.2 21.1 16.4 19.3 17.4 19.0
Solution
The sums: Σx = 142.3, Σy = 166.8, Σx² = 2085.31, Σxy = 2434.69, Σy² = 2897.80

Sxx = 2085.31 − (142.3)²/10 = 60.381

Sxy = 2434.69 − (142.3)(166.8)/10 = 61.126

Syy = 2897.80 − (166.8)²/10 = 115.576

Hence, r = 61.126 / √(60.381 × 115.576) = 0.732
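
As a quick cross-check in R (a sketch with the ten data pairs):

x <- c(11.1, 10.3, 12.0, 15.1, 13.7, 18.5, 17.3, 14.2, 14.8, 15.3)
y <- c(10.9, 14.2, 13.8, 21.5, 13.2, 21.1, 16.4, 19.3, 17.4, 19.0)
cor(x, y)    # approximately 0.732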
Sol. Cont'd
Since r = 0.732, the proportion of variation in y attributed to x is

r² = (0.732)² = 0.536

That is, about 54% of the variability in the afternoon times is accounted for by a linear relationship with the morning times in this sample.



Summary of Boston Results
Call:
lm(formula = medv ~ lstat, data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max
 -15.168   -3.990   -1.318    2.034   24.500

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  34.55384     0.56263    61.41    <2e-16
lstat        -0.95005     0.03873   -24.53    <2e-16

Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared: 0.5441,  Adjusted R-squared: 0.5432
F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16
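
For reference, this output can be reproduced in R; the Boston data set ships with the MASS package (a sketch):

library(MASS)    # provides the Boston data set (medv, lstat, ...)
fit1 <- lm(medv ~ lstat, data = Boston)
summary(fit1)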



*Example 4 using R
data(mtcars)
str(mtcars)     # structure of the data set
head(mtcars)    # first six rows

model <- lm(mpg ~ wt, data = mtcars)    # regress mpg on weight
summary(model)

plot(mtcars$wt, mtcars$mpg, xlab = "Weight",
     ylab = "Miles per Gallon",
     main = "Scatter Plot of MPG vs. Weight")
abline(model, col = "red")              # add the fitted regression line

t.test(mpg ~ am, data = mtcars)         # compare mpg by transmission type
cor(mtcars$mpg, mtcars$hp)              # correlation of mpg with horsepower

Ans. cor = -0.7761684



Cnt’d
>summary(model)
Call: lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared: 0.7528,  Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10



Cnt’d.
Create a scatter plot with regression line and confidence interval:

library(ggplot2)
scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, color = "red") +   # fitted line with CI band
  labs(x = "Weight", y = "Miles per Gallon",
       title = "Scatter Plot of MPG vs. Weight") +
  theme_minimal()
scatter_plot



Textbook
Chapter 11, Sec. 11.1 - 11.7




Thank you for your attention

Maha

