
Multiple Linear Regression

Estimation and Inference


Indicator Variables
Summary and Remark

Lecture 4 Linear Regression II


STAT 441: Statistical Methods for Learning and Data Mining

Jinhan Xie

Department of Mathematical and Statistical Sciences


University of Alberta

Winter, 2023

Jinhan Xie (University of Alberta) Lecture 4 LR II Winter, 2023 1/23



Outline

1 Multiple Linear Regression

2 Estimation and Inference

3 Indicator Variables

4 Summary and Remark




Multiple Linear Regression

Multiple linear regression has more than one covariate:

Y = β0 + β1 X1 + · · · + βp Xp + ε,

where usually ε ∼ N(0, σ²).


We interpret βj as the average effect on Y of a one unit increase in Xj ,
while holding all the other covariates fixed.
In the advertising example, the model becomes

Sales = β0 + β1 × TV + β2 × Radio + β3 × Newspaper + ε.




Coefficient Interpretation

The ideal scenario is when the predictors are uncorrelated — a balanced design.
Each coefficient can then be estimated and tested separately.
Interpretations such as "a unit change in Xj is associated with a βj change in Y, while all the other variables stay fixed" are possible.
Correlations amongst predictors cause problems.
The variances of all coefficients tend to increase, sometimes dramatically.
Interpretations become hazardous — when Xj changes, everything else changes.
Claims of causality should be avoided for observational data.




The woes of regression coefficients

Data Analysis and Regression, Mosteller and Tukey 1977


A regression coefficient βj estimates the expected change in Y per unit change in Xj, with all other predictors held fixed. But predictors usually change together!
Example: Y = total amount of change in your pocket; X1 = # of coins; X2 = # of pennies, nickels and dimes. By itself, the regression coefficient of Y on X2 will be > 0. But how about with X1 in the model?
Y = number of tackles by a football player in a season; W and H are his weight and height. The fitted regression model is Ŷ = β̂0 + 0.50W − 0.10H. How do we interpret β̂2 = −0.10 < 0?
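Mosteller and Tukey's pocket-change example can be made concrete with a small numerical sketch. The data below are entirely made up for illustration: each pocket holds only pennies (1¢) and quarters (25¢), X1 counts all coins, and X2 counts the small coins (here just pennies). Assuming NumPy is available:

```python
import numpy as np

# Hypothetical pockets: (pennies, quarters) per pocket.
pennies = np.array([0, 1, 2, 3, 5])
quarters = np.array([1, 1, 2, 2, 3])

y = 1 * pennies + 25 * quarters  # Y: total value in cents
x1 = pennies + quarters          # X1: total number of coins
x2 = pennies                     # X2: number of small coins

# Regressing Y on X2 alone gives a positive slope, because pockets with
# more pennies also tend to hold more coins overall.
slope_alone = np.polyfit(x2, y, 1)[0]

# With X1 in the model, the X2 coefficient flips sign: holding the total
# number of coins fixed, swapping a quarter for a penny costs 24 cents.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here y = 25·X1 − 24·X2 exactly, so the multiple regression recovers a coefficient of −24 on X2 even though the simple regression slope on X2 is positive.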




Two quotes by famous Statisticians

Essentially, all models are wrong, but some are useful.
George Box (1919–2013)

The only way to find out what will happen when a complex system is disturbed is to disturb the system, not merely to observe it passively.
Fred Mosteller and John Tukey, paraphrasing George Box

Coefficient estimation

Given the estimates β̂0, β̂1, · · · , β̂p, the estimated regression line is

ŷ = β̂0 + β̂1 x1 + · · · + β̂p xp.

We estimate the coefficients βj, j = 0, 1, · · · , p, as the values that minimize the sum of squared residuals

RSS = Σi (yi − ŷi)²,

where ŷi = β̂0 + β̂1 xi1 + · · · + β̂p xip is the i-th predicted value.
This is done using standard statistical software. The values β̂0, β̂1, · · · , β̂p that minimize RSS are the multiple least squares regression coefficient estimates.
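As a quick illustration (not from the slides) that off-the-shelf software returns exactly the RSS-minimizing coefficients, the sketch below fits a two-predictor model with NumPy on made-up data and checks that perturbing the estimate can only increase the RSS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 3.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column; lstsq minimizes ||y - Xb||^2.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def rss(b):
    r = y - X @ b
    return float(r @ r)

best = rss(beta_hat)
# Any perturbation away from the least squares estimate increases the RSS.
worse = rss(beta_hat + np.array([0.1, 0.0, 0.0]))
```

Because β̂ minimizes RSS, `best` is no larger than the RSS at the true coefficients (2, 3, −1) that generated the data.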




Estimation Example

(Figure: data points and the fitted least squares regression surface, plotted against the two predictors X1 and X2.)



Observations: yi, xij, for i = 1, . . . , n and j = 1, . . . , p. Noise: ε1, . . . , εn.

Unknown parameters: β0, β1, . . . , βp, which we estimate from the data.

y1 = β0 + β1 x11 + · · · + βp x1p + ε1
y2 = β0 + β1 x21 + · · · + βp x2p + ε2
⋮
yn = β0 + β1 xn1 + · · · + βp xnp + εn




We write

Y = Xβ + ε,

where Y = (y1, . . . , yn)′ is n×1, β = (β0, β1, . . . , βp)′ is (p+1)×1, ε = (ε1, . . . , εn)′ is n×1, and X is the n×(p+1) design matrix whose i-th row is (1, xi1, xi2, . . . , xip). This matrix form is elegant and equivalent to the n equations above.




Least squares estimation

RSS = Σj (yj − b0 − b1 xj1 − · · · − bp xjp)²  (sum over j = 1, . . . , n)
    = (y − Xb)′(y − Xb).

When X has full column rank p + 1 ≤ n, the least squares estimate of β is

β̂ = (X′X)⁻¹X′y.

Derivation:
Let x ∈ Rᵐ, let A, B be constant matrices, and let c be a constant vector. Then
Dx⟨Ax, c⟩ = A′c,   Dx⟨Ax, Bx⟩ = B′Ax + A′Bx.
Therefore Db RSS = Db⟨Xb − y, Xb − y⟩ = 2X′(Xb − y).
Setting Db RSS = 0 gives the normal equations X′Xb = X′y, and solving them yields the estimate above.
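The stopping point of the derivation, Db RSS = 2X′(Xb − y) = 0, says the residual vector is orthogonal to every column of X. A small numerical check on made-up data, assuming NumPy:

```python
import numpy as np

# Made-up design: intercept column plus two predictors.
X = np.array([[1., 0., 1.],
              [1., 1., 0.],
              [1., 2., 2.],
              [1., 3., 1.],
              [1., 4., 3.]])
y = np.array([1., 2., 6., 7., 11.])

# Solve the normal equations X'X b = X'y directly.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# At the solution the gradient vanishes: X'(y - X beta) = 0,
# i.e. the residuals are orthogonal to the columns of X.
grad_check = X.T @ (y - X @ beta)
```

In practice one would use a QR-based solver such as `np.linalg.lstsq` rather than forming X′X, but the normal-equations solve mirrors the formula on this slide.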

Example
Fit a straight-line regression model

Y = β0 + β1 X1 + ε,

with data

X1: 0 1 2 3 4
Y : 1 4 3 8 9

Solution.
X has a column of 1's and the column (0, 1, 2, 3, 4)′, y = (1, 4, 3, 8, 9)′, and

X′X = [  5 10 ]
      [ 10 30 ].




Recall that if

A = [ a11 a12 ]
    [ a21 a22 ]

is invertible, then |A| = a11 a22 − a12 a21 and

A⁻¹ = (1/|A|) [  a22 −a12 ]
              [ −a21  a11 ],

since A⁻¹A = I. So

(X′X)⁻¹ = 1/(5 × 30 − 10 × 10) [  30 −10 ] = [  3/5 −1/5 ]
                               [ −10   5 ]   [ −1/5 1/10 ].




On the other hand,

X′y = (25, 70)′,

so

β̂ = (X′X)⁻¹X′y = [  3/5 −1/5 ] [ 25 ] = [ 1 ]
                 [ −1/5 1/10 ] [ 70 ]   [ 2 ].

The fitted line is therefore ŷ = 1 + 2x1.
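The hand computation can be verified numerically; the sketch below (assuming NumPy is available) reproduces β̂ = (1, 2)′ for the lecture's five data points:

```python
import numpy as np

# Design matrix: intercept column and X1 = 0, 1, 2, 3, 4.
X = np.column_stack([np.ones(5), np.arange(5)])
y = np.array([1., 4., 3., 8., 9.])

XtX = X.T @ X                          # [[5, 10], [10, 30]], as computed by hand
beta = np.linalg.solve(XtX, X.T @ y)   # beta = (1, 2): y-hat = 1 + 2 x1
```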




Inference

Is at least one predictor useful?

H0 : β1 = · · · = βp = 0  vs.  H1 : at least one βj ≠ 0.
Insight: this tests whether there is any linear relationship between Y and X1, . . . , Xp.
Test statistic:

F = [(TSS − RSS)/p] / [RSS/(n − p − 1)],

where

RSS = Σi (yi − ŷi)²,   TSS = Σi (yi − ȳ)²,   with sums over i = 1, . . . , n.

Under H0, F ∼ Fp,n−p−1, the F distribution with p and n − p − 1 degrees of freedom. One rejects H0 if F is too large (depending on the significance level of the test), in which case the regression is significant.
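For the five-point example fitted earlier (β̂ = (1, 2)′, so ŷ = 1 + 2x), the F statistic can be computed from RSS and TSS directly. A sketch assuming NumPy:

```python
import numpy as np

x = np.arange(5)
y = np.array([1., 4., 3., 8., 9.])
y_hat = 1.0 + 2.0 * x   # fitted values from the worked example

n, p = len(y), 1
rss = np.sum((y - y_hat) ** 2)       # residual sum of squares: 6.0
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares: 46.0

# Overall F statistic, to be compared against the F_{1,3} distribution.
F = ((tss - rss) / p) / (rss / (n - p - 1))   # 20.0
```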




Inference

What about an individual coefficient, say βj : is it useful?

H0 : βj = 0  vs.  H1 : βj ≠ 0.
Test statistic:

t = (β̂j − 0)/SE(β̂j) ∼ tn−p−1,

where tn−p−1 is the t-distribution with n − p − 1 degrees of freedom.
SE(β̂j) is called the estimated standard error, or simply the standard error, of the coefficient β̂j:

SE(β̂j) = √( RSS/(n − p − 1) · [(X′X)⁻¹]jj ),

where [(X′X)⁻¹]jj is the (j, j)-entry of the matrix (X′X)⁻¹.
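Continuing the five-point example: s² = RSS/(n − p − 1) = 2 and [(X′X)⁻¹] has 1/10 in the slope position, so the slope's standard error and t statistic follow directly. A NumPy sketch (not part of the slides):

```python
import numpy as np

X = np.column_stack([np.ones(5), np.arange(5)])
y = np.array([1., 4., 3., 8., 9.])
n, p = X.shape[0], X.shape[1] - 1

beta = np.linalg.solve(X.T @ X, X.T @ y)   # (1, 2)
rss = np.sum((y - X @ beta) ** 2)          # 6.0
s2 = rss / (n - p - 1)                     # 2.0
XtX_inv = np.linalg.inv(X.T @ X)

# Standard error and t statistic for the slope (j = 1),
# compared against the t distribution with n - p - 1 = 3 df.
se_slope = np.sqrt(s2 * XtX_inv[1, 1])     # sqrt(0.2)
t_slope = beta[1] / se_slope
```

With a single predictor, t² equals the overall F statistic (here 20), as it must.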


 




R² and Adjusted R²

R² = Σi (ŷi − ȳ)² / Σi (yi − ȳ)².

R² ∈ [0, 1] is called the coefficient of determination. It gives the proportion of the total variation in the yi's that is "explained" by, or attributable to, the predictor variables X1, . . . , Xp.
R² = 1 if the fitted equation passes through all the data points (perfect fit, ŷi = yi for every i).
R² = 0 if β̂0 = ȳ and β̂1 = · · · = β̂p = 0, in which case X1, . . . , Xp have no influence on the response Y.



R² and Adjusted R²
The adjusted R² statistic is

R²Adj = 1 − [RSS/(n − p − 1)] / [TSS/(n − 1)].

Since TSS = RSS + Σi (ŷi − ȳ)² (we skip the proof),

R² = 1 − RSS/TSS = Σi (ŷi − ȳ)² / TSS.

In general, R² never decreases when one more predictor variable (e.g. Xp+1) is added to the model, even if the new predictor contributes nothing.
Σi (yi − ŷi)²/(n − p − 1) is the residual mean square. When one more predictor is added, R²Adj increases only if the residual mean square decreases.
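The claim that R² never decreases while R²Adj penalizes useless predictors can be checked numerically. In the sketch below (assuming NumPy), a made-up "junk" predictor that carries no extra information about y is added to the five-point example:

```python
import numpy as np

def fit_r2(X, y):
    """Return (R^2, adjusted R^2) for a least squares fit; X includes the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    n, p = X.shape[0], X.shape[1] - 1
    return 1 - rss / tss, 1 - (rss / (n - p - 1)) / (tss / (n - 1))

x = np.arange(5.0)
y = np.array([1., 4., 3., 8., 9.])
junk = np.array([1., 0., 0., 0., -1.])   # made-up predictor, unrelated to y here

X1 = np.column_stack([np.ones(5), x])
X2 = np.column_stack([np.ones(5), x, junk])

r2_small, adj_small = fit_r2(X1, y)
r2_big, adj_big = fit_r2(X2, y)
# r2_big >= r2_small always; adj_big drops when the junk predictor
# fails to reduce the residual mean square.
```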

Estimating the regression function at x0

For the linear regression model Y = Xβ + ε, x0′β̂ = β̂0 + β̂1 x01 + · · · + β̂p x0p is the unbiased linear estimator of E[Y | x0]. If the errors ε1, . . . , εn are independently distributed and all have the N(0, σ²) distribution, then a 100(1 − α)% confidence interval for E[Y | x0] = x0′β is given by

x0′β̂ ± tn−p−1(α/2) √( x0′(X′X)⁻¹x0 · s² ),

where s² = RSS/(n − p − 1) = (1/(n − p − 1)) Σi (yi − ŷi)².
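For the five-point example, a 95% confidence interval for the mean response at x0 = (1, 2)′ (i.e. predicting at x1 = 2) can be sketched as follows, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

X = np.column_stack([np.ones(5), np.arange(5)])
y = np.array([1., 4., 3., 8., 9.])
n, p = X.shape[0], X.shape[1] - 1

beta = np.linalg.solve(X.T @ X, X.T @ y)
s2 = np.sum((y - X @ beta) ** 2) / (n - p - 1)

x0 = np.array([1.0, 2.0])   # predict the mean response at x1 = 2
fit = x0 @ beta             # 5.0

# Half-width: t quantile times the standard error of x0' beta-hat.
half = stats.t.ppf(1 - 0.05 / 2, n - p - 1) * np.sqrt(x0 @ np.linalg.inv(X.T @ X) @ x0 * s2)
lwr, upr = fit - half, fit + half
```

This mirrors `predict(..., interval="c")` in the R output on the next slide; a prediction interval for a new observation would add an extra s² inside the square root.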




Advertising example

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.938889 0.311908 9.422 <2e-16 ***
TV 0.045765 0.001395 32.809 <2e-16 ***
Radio 0.188530 0.008611 21.893 <2e-16 ***
Newspaper -0.001037 0.005871 -0.177 0.86
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.686 on 196 degrees of freedom


Multiple R-squared: 0.8972, Adjusted R-squared: 0.8956
F-statistic: 570.3 on 3 and 196 DF, p-value: < 2.2e-16

> predict(TVadlm, newdata, interval="c", level=0.95)


fit lwr upr
1 20.52397 19.99627 21.05168
> predict(TVadlm, newdata, interval="p", level=0.95)
fit lwr upr
1 20.52397 17.15828 23.88967




Indicator Variables

Some predictors are not quantitative but are qualitative, taking a discrete
set of values.
These are also called categorical predictors or factor variables.
Example: investigate difference in credit card balance between males
and females, ignoring the other variables. We create a new variable,
xi = 1 if the i-th person is female,
     0 if the i-th person is male.

Resulting model:

yi = β0 + β1 xi + εi = β0 + β1 + εi if the i-th person is female,
                       β0 + εi     if the i-th person is male.

Interpretation and more than two levels (categories)?
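With a 0/1 indicator, the least squares fit reproduces the two group means: β̂0 is the mean for the baseline (male) group and β̂1 is the female-minus-male difference. A sketch with made-up balances, assuming NumPy:

```python
import numpy as np

# Made-up credit card balances: two males, then two females.
balance = np.array([500., 520., 600., 640.])
female = np.array([0., 0., 1., 1.])   # indicator variable x_i

X = np.column_stack([np.ones(4), female])
beta, *_ = np.linalg.lstsq(X, balance, rcond=None)
# beta[0] = male mean (510), beta[1] = female mean - male mean (110)
```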



Indicator Variables

In general, if we have k levels, we need (k − 1) indicator variables.


For example, we have 3 levels — A, B, and C — for a covariate x:

xA = 1 if x is A, 0 if x is not A;   xB = 1 if x is B, 0 if x is not B.

If x is C, then xA = xB = 0. We call C the baseline.


βA is the contrast between A and C and βB is the contrast between B and
C.
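With C as baseline, the fitted intercept is the C mean and β̂A, β̂B are the contrasts with C. A sketch with made-up group data, assuming NumPy:

```python
import numpy as np

# Made-up responses, two observations per level: A, A, B, B, C, C.
y = np.array([20., 22., 14., 18., 10., 12.])
xA = np.array([1., 1., 0., 0., 0., 0.])
xB = np.array([0., 0., 1., 1., 0., 0.])   # C is the baseline: xA = xB = 0

X = np.column_stack([np.ones(6), xA, xB])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# intercept = mean(C) = 11, beta_A = mean(A) - mean(C) = 10,
# beta_B = mean(B) - mean(C) = 5
```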




Summary and Remark

Multiple linear regression


Estimation and inference
Indicator variables
Read textbook Chapter 3

