Lecture 4 Linear Regression II: STAT 441: Statistical Methods For Learning and Data Mining
Jinhan Xie
Winter, 2023
Outline
1 Multiple Linear Regression
2 Estimation and Inference
3 Indicator Variables
4 Summary and Remark
The multiple linear regression model is

Y = β0 + β1 X1 + · · · + βp Xp + ε,

where X1, …, Xp are the p predictors and ε is the error term.
Coefficient Interpretation
Coefficient estimation
Given the estimates β̂0, β̂1, …, β̂p, the estimated regression line is

ŷ = β̂0 + β̂1 X1 + · · · + β̂p Xp.

The coefficients are chosen to minimize the residual sum of squares

RSS = Σ_{i=1}^n (yi − ŷi)²,

where ŷi = β̂0 + β̂1 xi1 + · · · + β̂p xip is the predicted value for the i-th observation.
This is done using standard statistical software. The values β̂0, β̂1, …, β̂p that minimize RSS are the multiple least squares regression coefficient estimates.
Estimation Example

[Figure: least squares regression plane fitted to observations in (X1, X2, Y) space.]
y1 = β0 + β1 x11 + · · · + βp x1p + ε1
y2 = β0 + β1 x21 + · · · + βp x2p + ε2
⋮
yn = β0 + β1 xn1 + · · · + βp xnp + εn
We write

[ y1 ]   [ 1  x11  x12  · · ·  x1p ] [ β0 ]   [ ε1 ]
[ y2 ]   [ 1  x21  x22  · · ·  x2p ] [ β1 ]   [ ε2 ]
[ ⋮  ] = [ ⋮   ⋮    ⋮           ⋮  ] [ ⋮  ] + [ ⋮  ]
[ yn ]   [ 1  xn1  xn2  · · ·  xnp ] [ βp ]   [ εn ]

or,

Y = Xβ + ε,

where Y is n×1, X is n×(p+1), β is (p+1)×1, and ε is n×1.
RSS = Σ_{j=1}^n (yj − b0 − b1 xj1 − · · · − bp xjp)² = (y − Xb)'(y − Xb)
When X has full rank p + 1 ≤ n, the least squares estimate of β is

β̂ = (X'X)^{−1} X'y.
Derivation:
Let x ∈ R^m, A, B be constant matrices, and c be a constant vector. Then

D_x⟨Ax, c⟩ = A'c,   D_x⟨Ax, Bx⟩ = B'Ax + A'Bx.

Therefore D_b RSS = D_b⟨Xb − y, Xb − y⟩ = 2X'(Xb − y).
We set D_b RSS = 0 to obtain the estimate.
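As a sanity check (not from the slides), the gradient formula D_b RSS = 2X'(Xb − y) can be compared against a central finite-difference approximation on a small made-up data set; the numbers below are illustrative only.

```python
# Numerical check of the gradient formula D_b RSS = 2 X'(Xb - y)
# on a small made-up design matrix and response.

def rss(X, y, b):
    """Residual sum of squares for coefficient vector b."""
    return sum((yi - sum(xij * bj for xij, bj in zip(row, b))) ** 2
               for row, yi in zip(X, y))

def analytic_grad(X, y, b):
    """The closed-form gradient 2 X'(Xb - y) derived above."""
    resid = [sum(xij * bj for xij, bj in zip(row, b)) - yi
             for row, yi in zip(X, y)]
    return [2 * sum(X[i][k] * resid[i] for i in range(len(X)))
            for k in range(len(b))]

def numeric_grad(X, y, b, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = []
    for k in range(len(b)):
        bp, bm = b[:], b[:]
        bp[k] += h
        bm[k] -= h
        g.append((rss(X, y, bp) - rss(X, y, bm)) / (2 * h))
    return g

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # made-up design matrix
y = [1.0, 3.0, 2.0, 5.0]                              # made-up responses
b = [0.5, 1.5]                                        # arbitrary trial coefficients
print(analytic_grad(X, y, b))   # [0.0, 4.0]
print(numeric_grad(X, y, b))    # matches to finite-difference accuracy
```

The two gradients agree, which is exactly why setting 2X'(Xb − y) = 0 (the normal equations) yields the minimizer.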
Jinhan Xie (University of Alberta) Lecture 4 LR II Winter, 2023 11/23
Example
Fit a straight-line regression model
Y = β0 + β1 X1 + ε,
with data

X1 | 0  1  2  3  4
Y  | 1  4  3  8  9
Solution.

    [ 1  0 ]        [ 1 ]
    [ 1  1 ]        [ 4 ]
X = [ 1  2 ],   y = [ 3 ],   X'X = [  5  10 ]
    [ 1  3 ]        [ 8 ]          [ 10  30 ]
    [ 1  4 ]        [ 9 ]
Recall that if

A = [ a11  a12 ]
    [ a21  a22 ],

then

A^{−1} = (1/|A|) [  a22  −a12 ]
                 [ −a21   a11 ].
So

(X'X)^{−1} = 1/(5 × 30 − 10 × 10) [  30  −10 ]  =  [  3/5  −1/5 ]
                                  [ −10    5 ]     [ −1/5  1/10 ].
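To finish the computation numerically, a short pure-Python sketch (illustrative, not from the slides) forms X'X and X'y for this data and applies the 2 × 2 inverse above; X'y works out to (25, 70)' and the fitted line to ŷ = 1 + 2x.

```python
# Worked example: x = (0,1,2,3,4), y = (1,4,3,8,9), design matrix X = [1, x].

x = [0, 1, 2, 3, 4]
y = [1, 4, 3, 8, 9]
n = len(x)

xtx = [[n, sum(x)], [sum(x), sum(xi * xi for xi in x)]]   # X'X = [[5,10],[10,30]]
xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]      # X'y = [25, 70]

det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]       # 5*30 - 10*10 = 50
inv = [[ xtx[1][1] / det, -xtx[0][1] / det],              # (X'X)^{-1}
       [-xtx[1][0] / det,  xtx[0][0] / det]]              # = [[3/5,-1/5],[-1/5,1/10]]

beta = [inv[0][0] * xty[0] + inv[0][1] * xty[1],
        inv[1][0] * xty[0] + inv[1][1] * xty[1]]
print(beta)   # ≈ [1.0, 2.0], i.e. the fitted line is ŷ = 1 + 2x
```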
Inference
RSS = Σ_{i=1}^n (yi − ŷi)²,   TSS = Σ_{i=1}^n (yi − ȳ)²
R² and Adjusted R²

R² = Σ_{i=1}^n (ŷi − ȳ)² / Σ_{i=1}^n (yi − ȳ)²
The adjusted R² statistic is

R²_Adj = 1 − [RSS/(n − r − 1)] / [TSS/(n − 1)].
Since TSS = RSS + Σ_{i=1}^n (ŷi − ȳ)² (we skip the proof),

R² = Σ_{i=1}^n (ŷi − ȳ)² / TSS = 1 − RSS/TSS.
In general, R² never decreases when one more prediction variable (e.g.
X_{p+1}) is added to the model, even if the new predictor contributes
nothing.
Σ_{i=1}^n (yi − ŷi)²/(n − r − 1) is the residual mean square. When one more
predictor is added, R²_Adj increases only if this residual mean square decreases.
For the linear regression model Y = Xβ + ε, x0'β̂ = β̂0 + β̂1 x01 + · · · + β̂p x0p
is the unbiased linear estimator of E[Y|x0]. If the errors ε1, …, εn are
independently distributed, each with the N(0, σ²) distribution, then a
100(1 − α)% confidence interval for E[Y|x0] = x0'β is given by

x0'β̂ ± t_{n−p−1}(α/2) √( x0'(X'X)^{−1} x0 · s² ),

where s² = (1/(n − p − 1)) Σ_{i=1}^n (yi − ŷi)².
Advertising example
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.938889 0.311908 9.422 <2e-16 ***
TV 0.045765 0.001395 32.809 <2e-16 ***
Radio 0.188530 0.008611 21.893 <2e-16 ***
Newspaper -0.001037 0.005871 -0.177 0.86
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Indicator Variables
Some predictors are not quantitative but are qualitative, taking a discrete
set of values.
These are also called categorical predictors or factor variables.
Example: investigate difference in credit card balance between males
and females, ignoring the other variables. We create a new variable,
xi = { 1  if the i-th person is female,
     { 0  if the i-th person is male.
Resulting model
yi = β0 + β1 xi + εi = { β0 + β1 + εi   if the i-th person is female,
                       { β0 + εi       if the i-th person is male.
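A minimal sketch with made-up balances showing why this coding is convenient: the least squares fit recovers the two group means, with β̂0 equal to the male mean and β̂0 + β̂1 equal to the female mean.

```python
# Simple regression on a 0/1 indicator reproduces the group means
# (the balance values below are hypothetical).

balance = [110, 90, 130, 150, 170, 160]   # hypothetical credit card balances
female  = [0,   0,  0,   1,   1,   1]     # x_i = 1 if the i-th person is female

n = len(balance)
xbar = sum(female) / n
ybar = sum(balance) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(female, balance)) \
     / sum((xi - xbar) ** 2 for xi in female)
b0 = ybar - b1 * xbar

male_mean = sum(y for y, f in zip(balance, female) if f == 0) / 3
female_mean = sum(y for y, f in zip(balance, female) if f == 1) / 3
print(b0, b0 + b1)            # beta0_hat, beta0_hat + beta1_hat
print(male_mean, female_mean) # the same two numbers: the group means
```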