Multiple Linear Regression

Estimation and Inference

Indicator Variables
Summary and Remark

Lecture 4 Linear Regression II

STAT 441: Statistical Methods for Learning and Data Mining

Jinhan Xie

Department of Mathematical and Statistical Sciences

University of Alberta

Winter, 2023



1 Multiple Linear Regression

2 Estimation and Inference

3 Indicator Variables

4 Summary and Remark


Multiple Linear Regression

Multiple linear regression has more than one covariate:

Y = β0 + β1 X1 + · · · + βp Xp + ε,

where usually ε ∼ N(0, σ²).

We interpret βj as the average effect on Y of a one-unit increase in Xj,
while holding all the other covariates fixed.
In the advertising example, the model becomes

Sales = β0 + β1 × TV + β2 × Radio + β3 × Newspaper + ε.


Coefficient Interpretation

The ideal scenario is when the predictors are uncorrelated (a balanced design):

Each coefficient can be estimated and tested separately.
Interpretations such as "a unit change in Xj is associated with a βj change in
Y, while all the other variables stay fixed" are possible.

Correlations amongst predictors cause problems:

The variance of all coefficients tends to increase, sometimes dramatically.
Interpretations become hazardous: when Xj changes, everything else changes.

Claims of causality should be avoided for observational data.


The woes of regression coefficients

Data Analysis and Regression, Mosteller and Tukey 1977

A regression coefficient βj estimates the expected change in Y per unit
change in Xj , with all other predictors held fixed. But predictors usually
change together!
Example: Y = total amount of change in your pocket; X1 = # of coins;
X2 = # of pennies, nickels and dimes. By itself, the regression coefficient of
Y on X2 will be > 0. But how about with X1 in the model?
Example: Y = number of tackles by a football player in a season; W and H are his
weight and height. The fitted regression model is Ŷ = β̂0 + 0.50W − 0.10H.
How do we interpret β̂2 < 0 (the coefficient of H)?


Two quotes by famous Statisticians

"Essentially, all models are wrong, but some are useful."
George Box (1919 - 2013, aged 93)

"The only way to find out what will happen when a complex system is
disturbed is to disturb the system, not merely to observe it passively."
Fred Mosteller and John Tukey, paraphrasing George Box

Coefficient estimation

Given the estimates β̂0, β̂1, ..., β̂p, the estimated regression equation is

ŷ = β̂0 + β̂1 x1 + ··· + β̂p xp.

We estimate all the coefficients βj, j = 0, 1, ..., p, as the values that
minimize the sum of squared residuals

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

where ŷi = β̂0 + β̂1 xi1 + ··· + β̂p xip is the predicted value for the i-th observation.
This is done using standard statistical software. The values β̂0, β̂1, ..., β̂p
that minimize RSS are the multiple least squares regression
coefficient estimates.


Estimation Example




Observations: yi , xij , for i = 1, . . . , n and j = 0, 1, . . . , p. Noise: ε1 , . . . , εn .

Unknown parameters: β0 , β1 , . . . , βp , which we try to mine data for.

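$$
\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \cdots + \beta_p x_{1p} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \cdots + \beta_p x_{2p} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_{n1} + \cdots + \beta_p x_{np} + \varepsilon_n
\end{aligned}
$$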


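We write

$$
Y =
\begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
\begin{pmatrix}
\beta_0 \\ \beta_1 \\ \vdots \\ \beta_p
\end{pmatrix}
+ \varepsilon,
$$

that is,

$$
\underset{(n \times 1)}{Y} = \underset{(n \times (p+1))}{X}\;\underset{((p+1) \times 1)}{\beta} + \underset{(n \times 1)}{\varepsilon},
$$

which is an elegant and equivalent matrix form.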
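
As a minimal sketch of the matrix form above, the design matrix X can be assembled by prepending a column of ones (for the intercept β0) to the predictor values. The helper name `design_matrix` and the sample rows are illustrative assumptions, not part of the lecture.

```python
def design_matrix(rows):
    """Prepend the intercept column of ones to each row of predictor values."""
    return [[1.0] + [float(v) for v in row] for row in rows]


# Hypothetical data: n = 3 observations of p = 2 predictors.
predictors = [[2, 5], [3, 7], [4, 1]]
X = design_matrix(predictors)

n, cols = len(X), len(X[0])
print(X)        # each row has the form (1, x_i1, x_i2)
print(n, cols)  # n rows and p + 1 columns
```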


Least squares estimation

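$$
\mathrm{RSS} = \sum_{j=1}^{n} (y_j - b_0 - b_1 x_{j1} - \cdots - b_p x_{jp})^2 = (y - Xb)'(y - Xb).
$$

When X has full rank p + 1 ≤ n, the least squares estimate of β is

$$\hat{\beta} = (X'X)^{-1} X'y.$$

Let x ∈ Rᵐ, let A and B be constant matrices, and let c be a constant vector. Then

$$D_x \langle Ax, c \rangle = A'c, \qquad D_x \langle Ax, Bx \rangle = B'Ax + A'Bx.$$

Therefore

$$D_b\,\mathrm{RSS} = D_b \langle Xb - y, Xb - y \rangle = 2X'(Xb - y).$$

Setting D_b RSS = 0 gives the normal equations X'Xb = X'y, whose solution is the estimate above.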
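
The gradient formula can be checked numerically. The sketch below compares the closed form 2X'(Xb − y) with a central finite-difference approximation of RSS; the data, the trial coefficients, and the function names `rss` and `rss_gradient` are assumptions chosen for illustration.

```python
def rss(X, y, b):
    """Residual sum of squares (y - Xb)'(y - Xb)."""
    return sum((yi - sum(xij * bj for xij, bj in zip(row, b))) ** 2
               for row, yi in zip(X, y))


def rss_gradient(X, y, b):
    """Closed-form gradient 2 X'(Xb - y)."""
    resid = [sum(xij * bj for xij, bj in zip(row, b)) - yi
             for row, yi in zip(X, y)]
    return [2.0 * sum(row[k] * r for row, r in zip(X, resid))
            for k in range(len(b))]


# Illustrative data: an intercept column plus one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1.0, 4.0, 3.0, 8.0, 9.0]
b = [0.5, 1.5]  # arbitrary trial coefficients, not the least squares solution

# Central finite differences, one coordinate at a time.
h = 1e-6
numeric = []
for k in range(len(b)):
    up = b[:]
    up[k] += h
    down = b[:]
    down[k] -= h
    numeric.append((rss(X, y, up) - rss(X, y, down)) / (2 * h))

analytic = rss_gradient(X, y, b)
print(analytic)
print(numeric)

# At the least squares solution the gradient vanishes.
at_solution = rss_gradient(X, y, [1.0, 2.0])
print(at_solution)
```

The two gradients should agree up to rounding error, and the gradient at the minimizer should be zero in both coordinates.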

Fit a straight-line regression model

Y = β0 + β1 X1 + ε,

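with data

X1   0   1   2   3   4
Y    1   4   3   8   9

$$
X = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, \qquad
y = \begin{pmatrix} 1 \\ 4 \\ 3 \\ 8 \\ 9 \end{pmatrix}, \qquad
X'X = \begin{pmatrix} 5 & 10 \\ 10 & 30 \end{pmatrix}.
$$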


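Recall that if

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$

and if A is invertible, then |A| = a11 a22 − a12 a21 and

$$
\frac{1}{|A|}
\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
$$

Hence

$$
A^{-1} = \frac{1}{|A|} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix},
$$

and

$$
(X'X)^{-1} = \frac{1}{5 \times 30 - 10 \times 10}
\begin{pmatrix} 30 & -10 \\ -10 & 5 \end{pmatrix}
= \begin{pmatrix} \tfrac{3}{5} & -\tfrac{1}{5} \\ -\tfrac{1}{5} & \tfrac{1}{10} \end{pmatrix}.
$$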


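On the other hand,

$$
X'y = \begin{pmatrix} 25 \\ 70 \end{pmatrix}, \qquad
\hat{\beta} = (X'X)^{-1} X'y
= \begin{pmatrix} \tfrac{3}{5} & -\tfrac{1}{5} \\ -\tfrac{1}{5} & \tfrac{1}{10} \end{pmatrix}
\begin{pmatrix} 25 \\ 70 \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
$$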
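
The hand calculation for this straight-line example can be reproduced with exact rational arithmetic. This is a sketch using Python's standard `fractions` module; the variable names are illustrative choices.

```python
from fractions import Fraction

x = [0, 1, 2, 3, 4]
y = [1, 4, 3, 8, 9]
n = len(x)

# Entries of X'X and X'y for the design matrix with rows (1, x_i).
sx = sum(x)
sxx = sum(v * v for v in x)
sy = sum(y)
sxy = sum(a * b for a, b in zip(x, y))
XtX = [[n, sx], [sx, sxx]]
Xty = [sy, sxy]

# 2x2 inverse: swap the diagonal, negate the off-diagonal, divide by |A|.
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[Fraction(XtX[1][1], det), Fraction(-XtX[0][1], det)],
       [Fraction(-XtX[1][0], det), Fraction(XtX[0][0], det)]]

# beta_hat = (X'X)^{-1} X'y
beta_hat = [inv[0][0] * Xty[0] + inv[0][1] * Xty[1],
            inv[1][0] * Xty[0] + inv[1][1] * Xty[1]]

print(XtX, Xty, det)
print(inv)
print(beta_hat)
```

Using `Fraction` keeps the entries 3/5, −1/5 and 1/10 exact, so the result can be compared directly with the slide values.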



Is at least one predictor useful?

H0: β1 = ··· = βp = 0  vs.  H1: at least one βj ≠ 0.
Insight: this tests whether there is a linear relationship between Y and X1, ..., Xp.
Test statistic:

$$
F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}, \qquad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2.
$$

Under H0, F ∼ F_{p, n−p−1}, the F distribution with degrees of freedom p
and n − p − 1. One rejects H0 if F is too large (it exceeds the critical value
for the chosen significance level), in which case the regression is significant:
at least one predictor is useful.
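
As a sketch, the F statistic can be computed by hand for the straight-line example from earlier in the lecture (x = 0, ..., 4 with least squares estimates β̂0 = 1 and β̂1 = 2). The variable names are illustrative, and the critical value quoted in the comment is a table value rather than something computed here.

```python
x = [0, 1, 2, 3, 4]
y = [1, 4, 3, 8, 9]
b0, b1 = 1.0, 2.0   # least squares estimates for these data
n, p = len(y), 1    # five observations, one predictor

y_bar = sum(y) / n
y_hat = [b0 + b1 * xi for xi in x]

rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
tss = sum((yi - y_bar) ** 2 for yi in y)

# F = ((TSS - RSS) / p) / (RSS / (n - p - 1))
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
print(rss, tss, f_stat)

# Assumption: the 5% critical value of F(1, 3) is about 10.13 (from tables),
# so H0 would be rejected at the 5% level if f_stat exceeds it.
```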



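What about an individual coefficient: is βj useful?

H0: βj = 0  vs.  H1: βj ≠ 0.
Test statistic:

$$t = \frac{\hat{\beta}_j - 0}{\mathrm{SE}(\hat{\beta}_j)} \sim t_{n-p-1} \quad \text{under } H_0,$$

where t_{n−p−1} is the t-distribution with n − p − 1 degrees of freedom.
SE(β̂j) is called the estimated standard error, or simply the standard
error, of the coefficient β̂j:

$$
\mathrm{SE}(\hat{\beta}_j) = \sqrt{s^2 \, [(X'X)^{-1}]_{jj}}, \qquad s^2 = \frac{\mathrm{RSS}}{n - p - 1},
$$

where [(X'X)^{-1}]_{jj} is the (j, j)-entry of the matrix (X'X)^{-1}, with rows and columns indexed from 0 to p.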
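
A minimal sketch of the standard errors and t statistics, again for the straight-line example from earlier in the lecture (β̂0 = 1, β̂1 = 2). It assumes the usual form SE(β̂j) = sqrt(s² [(X'X)^{-1}]_{jj}) with s² = RSS/(n − p − 1); the variable names are illustrative.

```python
import math

x = [0, 1, 2, 3, 4]
y = [1, 4, 3, 8, 9]
beta_hat = [1.0, 2.0]  # least squares estimates for these data
n, p = len(y), 1

# Diagonal of (X'X)^{-1} for the design matrix with rows (1, x_i).
sx = sum(x)
sxx = sum(v * v for v in x)
det = n * sxx - sx * sx
inv_diag = [sxx / det, n / det]

# Residual variance estimate s^2 = RSS / (n - p - 1).
rss = sum((yi - (beta_hat[0] + beta_hat[1] * xi)) ** 2 for xi, yi in zip(x, y))
s2 = rss / (n - p - 1)

se = [math.sqrt(s2 * d) for d in inv_diag]
t_stats = [bj / sej for bj, sej in zip(beta_hat, se)]
print(se)
print(t_stats)
```

With a single predictor, the squared t statistic for the slope equals the overall F statistic, which gives a quick consistency check.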



R² and Adjusted R²

$$
R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

R² ∈ [0, 1] is called the coefficient of determination. It gives the
proportion of the total variation in the yi's that is "explained" by, or
attributable to, the predictor variables X1, ..., Xp.
R² = 1 if the fitted equation passes through all the data points (perfect fit,
ŷi = yi for all i).
R² = 0 if β̂0 = ȳ and β̂1 = ··· = β̂p = 0, in which case X1, ..., Xp have
no influence on the fitted response.


R² and Adjusted R²

Since

$$\mathrm{TSS} = \mathrm{RSS} + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

(we skip the proof),

$$
R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}.
$$

In general, R² never decreases when one more predictor variable (e.g.
X_{p+1}) is added to the model, even if the new predictor contributes little.

The adjusted R² statistic is

$$
R^2_{\mathrm{Adj}} = 1 - \frac{\mathrm{RSS}/(n - p - 1)}{\mathrm{TSS}/(n - 1)}.
$$

Here RSS/(n − p − 1) is the residual mean square. When one more
predictor is added, R²_Adj increases only if the residual mean square decreases.
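
A short sketch of both statistics, evaluated on the straight-line example from earlier in the lecture (fitted values ŷi = 1 + 2xi). The function names are illustrative assumptions, and p denotes the number of predictors.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS / TSS."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - rss / tss


def adjusted_r_squared(y, y_hat, p):
    """Adjusted R^2 = 1 - (RSS / (n - p - 1)) / (TSS / (n - 1))."""
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


x = [0, 1, 2, 3, 4]
y = [1, 4, 3, 8, 9]
y_hat = [1.0 + 2.0 * xi for xi in x]  # fitted values from the example

r2 = r_squared(y, y_hat)
r2_adj = adjusted_r_squared(y, y_hat, p=1)
print(r2, r2_adj)
```

The adjusted value is smaller than R² because the residual sum of squares is divided by its reduced degrees of freedom.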

Estimating the regression function at x0

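For the linear regression model Y = Xβ + ε, x0'β̂ = β̂0 + β̂1 x01 + ··· + β̂p x0p,
with x0 = (1, x01, ..., x0p)', is an unbiased linear estimator of E[Y|x0]. If the errors ε1, ..., εn are
independently distributed and all have the N(0, σ²) distribution, then a
100(1 − α)% confidence interval for E[Y|x0] = x0'β is given by

$$
x_0'\hat{\beta} \pm t_{n-p-1}(\alpha/2) \sqrt{\left(x_0'(X'X)^{-1}x_0\right) s^2},
$$

where t_{n−p−1}(α/2) is the upper α/2 quantile of the t-distribution with n − p − 1 degrees of freedom and

$$
s^2 = \frac{1}{n - p - 1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.
$$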
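
The interval can be sketched for the straight-line example from earlier in the lecture, evaluated at X1 = 2. The t quantile is hard-coded from a standard t table (an assumption of this sketch, since the Python standard library has no t distribution), and the variable names are illustrative.

```python
import math

x = [0, 1, 2, 3, 4]
y = [1, 4, 3, 8, 9]
n, p = len(y), 1
beta_hat = [1.0, 2.0]              # least squares estimates for these data
inv = [[0.6, -0.2], [-0.2, 0.1]]   # (X'X)^{-1} from the worked example

x0 = [1.0, 2.0]   # leading 1 for the intercept, then X1 = 2
t_crit = 3.182    # assumption: upper 2.5% point of t with 3 df, from a t table

# Point estimate x0' beta_hat.
fit = sum(a * b for a, b in zip(x0, beta_hat))

# s^2 = RSS / (n - p - 1).
rss = sum((yi - (beta_hat[0] + beta_hat[1] * xi)) ** 2 for xi, yi in zip(x, y))
s2 = rss / (n - p - 1)

# Quadratic form x0' (X'X)^{-1} x0.
q = sum(x0[i] * inv[i][j] * x0[j] for i in range(2) for j in range(2))

half_width = t_crit * math.sqrt(q * s2)
lower, upper = fit - half_width, fit + half_width
print(fit, lower, upper)
```

Because X1 = 2 is the sample mean of the predictor, the quadratic form reduces to 1/n here, which is where the interval is narrowest.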


Advertising example

             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.938889   0.311908   9.422   <2e-16 ***
TV           0.045765   0.001395  32.809   <2e-16 ***
Radio        0.188530   0.008611  21.893   <2e-16 ***
Newspaper   -0.001037   0.005871  -0.177     0.86
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.686 on 196 degrees of freedom

Multiple R-squared: 0.8972, Adjusted R-squared: 0.8956
F-statistic: 570.3 on 3 and 196 DF, p-value: < 2.2e-16

> predict(TVadlm, newdata, interval="c", level=0.95)

fit lwr upr
1 20.52397 19.99627 21.05168
> predict(TVadlm, newdata, interval="p", level=0.95)
fit lwr upr
1 20.52397 17.15828 23.88967
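
The reported summary can be cross-checked against the formulas from this lecture. The sketch below recomputes the t values, the adjusted R² and the F statistic from the rounded numbers printed above, so small rounding differences are expected; n = 200 is inferred from the 196 residual degrees of freedom with p = 3 predictors.

```python
# Values copied from the regression summary above (rounded as printed).
coefficients = {
    "(Intercept)": (2.938889, 0.311908),
    "TV": (0.045765, 0.001395),
    "Radio": (0.188530, 0.008611),
    "Newspaper": (-0.001037, 0.005871),
}
r2 = 0.8972
p = 3
n = 196 + p + 1  # residual degrees of freedom are n - p - 1 = 196

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se) in coefficients.items()}

# Adjusted R^2 written in terms of R^2: 1 - (1 - R^2)(n - 1)/(n - p - 1)
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# F statistic written in terms of R^2: (R^2 / p) / ((1 - R^2)/(n - p - 1))
f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))

for name, t in t_values.items():
    print(f"{name:12s} t = {t:.3f}")
print(f"adjusted R^2 = {r2_adj:.4f}, F = {f_stat:.1f}")
```

The recomputed values should match the printed t values, adjusted R-squared and F-statistic up to the rounding of the inputs.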


Indicator Variables

Some predictors are not quantitative but are qualitative, taking a discrete
set of values.
These are also called categorical predictors or factor variables.
Example: investigate difference in credit card balance between males
and females, ignoring the other variables. We create a new variable,
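$$
x_i = \begin{cases}
1 & \text{if the } i\text{-th person is female,} \\
0 & \text{if the } i\text{-th person is male.}
\end{cases}
$$

Resulting model:

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i = \begin{cases}
\beta_0 + \beta_1 + \varepsilon_i & \text{if the } i\text{-th person is female,} \\
\beta_0 + \varepsilon_i & \text{if the } i\text{-th person is male.}
\end{cases}
$$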

Interpretation and more than two levels (categories)?


Indicator Variables

In general, if we have k levels, we need (k − 1) indicator variables.

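For example, suppose a covariate x has 3 levels, A, B, and C:

$$
x_A = \begin{cases} 1 & \text{if } x \text{ is A,} \\ 0 & \text{if } x \text{ is not A;} \end{cases}
\qquad
x_B = \begin{cases} 1 & \text{if } x \text{ is B,} \\ 0 & \text{if } x \text{ is not B.} \end{cases}
$$

If x is C, then xA = xB = 0. We call C the baseline.

βA is the contrast between A and C, and βB is the contrast between B and C.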
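
The coding scheme can be sketched as a small helper that turns a categorical variable with k levels into k − 1 indicator columns, leaving the baseline level as all zeros. The function name `indicator_columns` and the sample values are hypothetical.

```python
def indicator_columns(values, baseline):
    """Encode a categorical variable with k levels as k - 1 indicator columns.

    The baseline level gets no column, so it is represented by all zeros.
    """
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical observations of a covariate with levels A, B, and C.
x = ["A", "C", "B", "A", "C"]
levels, rows = indicator_columns(x, baseline="C")

print(levels)  # the non-baseline levels, one indicator column each
for value, row in zip(x, rows):
    print(value, row)
```

Choosing a different baseline changes which contrasts the coefficients represent, but not the fitted values.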


Summary and Remark

Multiple linear regression

Estimation and inference
Indicator variables
Read textbook Chapter 3

