MAS316/Math352 Regression Analysis: 1 Multiple Linear Regression Models
A sample of n observations is observed in the form of (y_i, x_{i1}, ..., x_{ip}) for the i-th observation:

    Y      X_1      X_2      ...   X_p
    y_1    x_{11}   x_{12}   ...   x_{1p}
    y_2    x_{21}   x_{22}   ...   x_{2p}
    ...    ...      ...      ...   ...
    y_n    x_{n1}   x_{n2}   ...   x_{np}

where

    Y = Xβ + ε,   ε ~ N_n(0, σ²I_n),   (1.1)
where Y = (y_1, y_2, ..., y_n)', β = (β_0, β_1, ..., β_p)', ε = (ε_1, ε_2, ..., ε_n)', and X is the n × (p+1) design matrix whose i-th row is (1, x_{i1}, ..., x_{ip}).
Notations:

- β_0, β_1, ..., β_p: unknown regression coefficients;
- Y ~ N_n(Xβ, σ²I_n).

Note: The word "linear" in a linear model refers to the property that E[Y] is linear in the regression coefficients β_0, ..., β_p, but not necessarily in each explanatory variable. For example, y_i = β_0 + β_1 x_i + β_2 x_i² + ε_i is still a linear model, since it is linear in β_0, β_1, β_2.
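As a numerical illustration of this point (a sketch, not part of the notes; the coefficient values and data below are simulated assumptions), a model quadratic in x is still fitted as a linear model by putting x and x² in separate columns of X:

```python
import numpy as np

# A model linear in the coefficients but quadratic in x:
# y = b0 + b1*x + b2*x^2 + noise (all values here are illustrative).
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=40)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.1 * rng.normal(size=40)

# It still fits the framework Y = X beta + eps: columns 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(beta_hat, 1))  # close to the true coefficients 1.0, 0.5, -2.0
```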
2 Estimation of Regression Coefficients

In the multiple regression model, the least squares (LS) estimates β̂_0, β̂_1, ..., β̂_p for β_0, β_1, ..., β_p are chosen to minimize

    Σ_{i=1}^n (y_i − β_0 − β_1 x_{i1} − ... − β_p x_{ip})².   (2.1)
Since the above expression is cumbersome to work with directly, we seek a more compact matrix form. To this end, by (1.1), rewrite (2.1) as

    Σ_{i=1}^n ε_i² = ε'ε = (Y − Xβ)'(Y − Xβ).   (2.3)
To minimize (2.3) we take the derivative with respect to the vector β; (2.3) is a quadratic function of β. Using the chain rule we find

    d/dβ [(Y − Xβ)'(Y − Xβ)] = −2X'(Y − Xβ).

Setting this equal to zero and solving for β we obtain the normal equations

    2X'Xβ − 2X'Y = 0.

It follows that

    β̂ = (X'X)⁻¹X'Y.   (2.4)
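The closed form (2.4) can be checked numerically; the following sketch (not part of the original notes, with an assumed synthetic design matrix and noise level) solves the normal equations and compares the result with a library least-squares solver:

```python
import numpy as np

# Illustrative data: n = 30 observations, p = 2 predictors (assumed values).
rng = np.random.default_rng(0)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # first column of 1s
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Least squares estimate via the normal equations (2.4): beta_hat = (X'X)^{-1} X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Sanity check against NumPy's own least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

Solving the linear system X'Xβ = X'Y directly avoids forming the explicit inverse (X'X)⁻¹, which is both faster and numerically safer.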
3 Fitted Values and Residuals

The fitted values are then represented by

    Ŷ = Xβ̂   (3.1)

and the residuals by

    e = Y − Ŷ.   (3.2)

Define the hat matrix H = X(X'X)⁻¹X'. One may verify that

    HX = X,
    H² = H,
    H' = H,
    (I − H)² = I − H,
    (I − H)' = I − H.

In terms of the hat matrix H, we may further write (3.1) and (3.2) as

    Ŷ = HY   (3.3)

and

    e = (I − H)Y.   (3.4)
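The listed properties of the hat matrix are easy to verify numerically; this sketch (with an assumed random design matrix) checks each one:

```python
import numpy as np

# Illustrative design matrix (assumed values): intercept plus 2 predictors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])

# Hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.solve(X.T @ X, X.T)
I = np.eye(20)

# Verify: HX = X, H^2 = H, H' = H, and (I - H)^2 = I - H.
print(np.allclose(H @ X, X),
      np.allclose(H @ H, H),
      np.allclose(H.T, H),
      np.allclose((I - H) @ (I - H), I - H))  # True True True True
```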
4 An example

A data set taken from an environmental study that measured four variables for 30 consecutive days, containing 30 observations (rows) and 4 variables (columns), is shown in Table 1:
Table 1: Environmental data
ozone radiation temperature wind
3.45 190.00 67.00 7.40
3.30 118.00 72.00 8.00
2.29 149.00 74.00 12.60
2.62 313.00 62.00 11.50
2.84 299.00 65.00 8.60
2.67 99.00 59.00 13.80
2.00 19.00 61.00 20.10
2.52 256.00 69.00 9.70
2.22 290.00 66.00 9.20
2.41 274.00 68.00 10.90
2.62 65.00 58.00 13.20
2.41 334.00 64.00 11.50
3.24 307.00 66.00 12.00
1.82 78.00 57.00 18.40
3.11 322.00 68.00 11.50
2.22 44.00 62.00 9.70
1.00 8.00 59.00 9.70
2.22 320.00 73.00 16.60
1.59 25.00 61.00 9.70
3.17 92.00 61.00 12.00
2.84 13.00 67.00 12.00
3.56 252.00 81.00 14.90
4.86 223.00 79.00 5.70
3.33 279.00 76.00 7.40
3.07 127.00 82.00 9.70
4.14 291.00 90.00 13.80
3.39 323.00 87.00 11.50
2.84 148.00 82.00 8.00
2.76 191.00 77.00 14.90
3.33 284.00 72.00 20.70
Figure 1 displays a scatter plot matrix of the data, providing visual information about pairwise relationships among the 4 variables.
[Figure 1: scatter plot matrix of the variables ozone, radiation, temperature and wind.]
5 Model checks for adequacy
5.1 ANOVA table
Denote 1 = (1, ..., 1)' to be an n-dimensional column vector; then A = (1/n)11' is an n × n matrix with all entries equal to 1/n. Hence, formulas for the sums of squares are given as follows:

    Syy = Σ(y_i − ȳ)² = Σy_i² − nȳ² = Y'Y − Y'AY,
    SSE = Σ(y_i − ŷ_i)² = e'e = Y'(I_n − H)²Y = Y'(I_n − H)Y,
    SSR = Σ(ŷ_i − ȳ)² = Σŷ_i² − nȳ² = Y'H²Y − Y'AY = Y'(H − A)Y,

where we use (3.3) and (3.4) and the properties of the hat matrix H. From the above matrix expressions for the sums of squares one may easily verify the partition

    Syy = SSR + SSE.
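The quadratic-form expressions and the partition Syy = SSR + SSE can be checked numerically; the data in this sketch are simulated (not the ozone data):

```python
import numpy as np

# Simulated data (illustrative): n = 25 observations, 2 predictors plus intercept.
rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
A = np.full((n, n), 1.0 / n)            # A = (1/n) 1 1'

Syy = y @ (np.eye(n) - A) @ y           # Y'Y - Y'AY
SSE = y @ (np.eye(n) - H) @ y           # Y'(I - H)Y
SSR = y @ (H - A) @ y                   # Y'(H - A)Y

print(np.isclose(Syy, SSR + SSE))  # True
```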
The table below shows these analysis of variance results.

    Source       df         SS                     MS       F           p-value
    Regression   p          SSR = Σ(ŷ_i − ȳ)²      MSReg    MSReg/s²
    Residual     n − p − 1  SSE = Σ(y_i − ŷ_i)²    s²
    Total        n − 1      Syy = Σ(y_i − ȳ)²

where

    MSReg = SSR / df_R = SSR / p.
The difference SSR = Syy − SSE measures how effective the variables X_1, ..., X_p are, collectively, in explaining the variation in the response Y.
5.2 F test
It can be proved that E(s²) = σ², as for SLR. The expectation of MSReg is σ² plus a quantity that is nonnegative. For instance, when p = 2, we have

    E(MSReg) = σ² + (1/2)[β₁² Σ_{i=1}^n (x_{i1} − x̄₁)² + β₂² Σ_{i=1}^n (x_{i2} − x̄₂)² + 2β₁β₂ Σ_{i=1}^n (x_{i1} − x̄₁)(x_{i2} − x̄₂)].

Note that if both β₁ and β₂ are equal to zero, then E(MSReg) = σ²; otherwise E(MSReg) > σ².

This suggests that a comparison of MSReg and s² is useful for testing whether there is a regression relation between the response variable y and the predictor variables x₁, ..., x_p.
The formal F test for the significance of the MLR is equivalent to testing

    H₀: β₁ = ... = β_p = 0   vs   H₁: at least one β_j ≠ 0,

with test statistic

    F = MSReg / s² ~ F(p, n − p − 1) under H₀.

The null distribution follows from Cochran's Theorem: if A₁ + ... + A_m = I_n, then the quadratic forms Y'A_iY are mutually independent and

    Y'A_iY ~ σ²χ²_{r_i},   for i = 1, ..., m,

where r_i = tr(A_i), the trace of A_i. For the matrices involved in SSE and SSR one may verify that

    A + (I_n − H) + (H − A) = I_n.

Then, by Cochran's Theorem, we have that SSE and SSR are independent, and under H₀,

    SSR/σ² ~ χ²_p   and   SSE/σ² ~ χ²_{n−p−1}.
5.3 R² statistic

R² statistic:

    R² = SSR / Syy.

Adding more variables will never decrease R², because SSE never becomes larger when predictor variables are added and Syy is always the same for a given set of responses; so R² alone cannot be used to determine whether a variable should be added.

Adjusted R² statistic:

    R_a² = 1 − [SSE/(n − p − 1)] / [Syy/(n − 1)].

The sums of squares are adjusted by their degrees of freedom. Note that:

- R_a² is mainly used to compare the performance of fitted models on different data sets, especially with different sample sizes.
- Generally, for a good model, R² should be neither too small (< 60%) nor too large (> 95%).
Example 1 (cont'd)

ANOVA table:

    Source       df   SS        MS       F        p-value
    Regression   3    7.6685    2.5562   7.3244
    Residual     26   9.0738    0.3490
    Total        29   16.7423

For a size α test, we read the critical value from standard statistical tables or computer software. For example:

    α     Critical value F_{3,26}(α)
    5%    2.9752
    1%    4.6365

In either case, the F-ratio 7.3244 is bigger than the critical value, and we should reject H₀ at the stated significance level.
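The F-ratio above, together with the R² statistics of Section 5.3, can be reproduced by direct arithmetic from the ANOVA table entries (a sketch; the values are taken from the table above):

```python
# Reproducing the example's F-ratio and R^2 statistics from the ANOVA table.
SSR, SSE, Syy = 7.6685, 9.0738, 16.7423
p, n = 3, 30

MSReg = SSR / p                    # mean square for regression
s2 = SSE / (n - p - 1)             # residual mean square
F = MSReg / s2                     # F-ratio, about 7.3244 as in the table
R2 = SSR / Syy                     # R^2 statistic
R2_adj = 1 - s2 / (Syy / (n - 1))  # adjusted R^2

print(round(F, 4), round(R2, 4), round(R2_adj, 4))
```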
6 Inference for Regression Coefficients

Tests of model significance give no indication of which variable(s) in particular are important, so we need to consider tests for individual regression coefficients.

Since Y ~ N(Xβ, σ²I) we have β̂ ~ N_{p+1}(β, σ²(X'X)⁻¹). It follows that

    β̂_i ~ N(β_i, σ²(X'X)⁻¹_{ii}),

where (X'X)⁻¹_{ii} denotes the i-th diagonal element of (X'X)⁻¹. Note that σ² is unknown in practice. As in SLR, one may verify that s² = SSE/(n − p − 1) = MSE is an unbiased estimator for σ². Moreover, we have

    (β̂_i − β_i) / s.e.(β̂_i) ~ t_{n−p−1}.

To test H₀: β_i = 0 we use

    t = β̂_i / s.e.(β̂_i),

and the decision rule is to reject H₀ when |t| > t_{n−p−1}(α/2).

Example 1 (cont'd): For the ozone data, s² = 0.3490 and the diagonal element of (X'X)⁻¹ corresponding to β₂ is 0.0005338541, so that

    s.e.(β̂₂) = s × √0.0005338541 = 0.01365.
The 95% confidence interval for β₂ is

    β̂₂ ± t₂₆(0.025) × s.e.(β̂₂) = 0.045606 ± 2.0555 × 0.01365 = (0.0175, 0.0737).
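As an arithmetic check (a sketch; the critical value t₂₆(0.025) = 2.0555 is read from standard t tables), the interval can be reproduced from the coefficient estimate and standard error in the R output below, matching the confint result (0.01754858, 0.0736629):

```python
# 95% CI for beta_2 from the reported estimate and standard error.
beta2_hat = 0.045606          # estimate for x2 from the R summary
se = 0.013650                 # its standard error
t_crit = 2.0555               # t_{26}(0.025), from standard t tables

lower = beta2_hat - t_crit * se
upper = beta2_hat + t_crit * se
print(round(lower, 4), round(upper, 4))  # 0.0175 0.0737
```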
R code for reading the data and producing Figure 1:

ozone_raw<-read.table("C:/ozone.txt",header=FALSE)
ozone<-data.frame(y=ozone_raw$V1,x1=ozone_raw$V2,
x2=ozone_raw$V3,x3=ozone_raw$V4)
plot(ozone)
> mlr<-lm(y~x1+x2+x3,data=ozone)
> summary(mlr)
Call:
lm(formula = y ~ x1 + x2 + x3, data = ozone)
Residuals:
Min 1Q Median 3Q Max
-1.13583 -0.39280 0.00007 0.39270 1.41993
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.295271 0.998348 -0.296 0.76976
x1 0.001306 0.001080 1.209 0.23746
x2 0.045606 0.013650 3.341 0.00253 **
x3 -0.027843 0.030203 -0.922 0.36507
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(mlr)
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x1 1 3.1469 3.1469 9.0172 0.005845 **
x2 1 4.2250 4.2250 12.1062 0.001787 **
x3 1 0.2966 0.2966 0.8498 0.365073
Residuals 26 9.0738 0.3490
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R code for the confidence interval of β₂:
> confint(mlr,x2,level=0.95)
2.5 % 97.5 %
x2 0.01754858 0.0736629