
LINEAR REGRESSION MODELS W4315

HOMEWORK 1 ANSWERS
October 4, 2009

Instructor: Frank Wood (10:35-11:50)

1. (25 points) Let $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ be a linear regression model with the distribution of the error terms unspecified (but with mean $E(\epsilon_i) = 0$ and variance $V(\epsilon_i) = \sigma^2$, with $\sigma^2$ finite). Show that $s^2 = MSE = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n-2}$ is an unbiased estimator for $\sigma^2$, where $\hat{Y}_i = b_0 + b_1 X_i$ with
$$b_0 = \bar{Y} - b_1\bar{X} \qquad \text{and} \qquad b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$$

Answer:

First, let's introduce the following notation:
$$\hat{e}_i = y_i - \hat{y}_i$$
$$S_{XX} = \sum_{i=1}^n (x_i - \bar{x})^2, \qquad S_{YY} = \sum_{i=1}^n (y_i - \bar{y})^2, \qquad S_{XY} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$$

Now we set out to prove the following identity, which essentially yields the final result (the first equality below holds because $E(\hat{e}_i) = 0$):
$$\mathrm{Var}(\hat{e}_i) = E(\hat{e}_i^2) = \left(\frac{n-2}{n} + \frac{1}{S_{XX}}\left(\frac{1}{n}\sum_{j=1}^n x_j^2 + x_i^2 - 2(x_i-\bar{x})^2 - 2x_i\bar{x}\right)\right)\sigma^2$$
To prove the above, realize that:

$$\mathrm{Var}(\hat{e}_i) = \mathrm{Var}(y_i - \hat\beta_0 - \hat\beta_1 x_i) = \mathrm{Var}\big((y_i - \beta_0 - \beta_1 x_i) - (\hat\beta_0 - \beta_0) - x_i(\hat\beta_1 - \beta_1)\big)$$
$$= \mathrm{Var}(y_i) + \mathrm{Var}(\hat\beta_0) + x_i^2\,\mathrm{Var}(\hat\beta_1) - 2\,\mathrm{Cov}(y_i, \hat\beta_0) - 2x_i\,\mathrm{Cov}(y_i, \hat\beta_1) + 2x_i\,\mathrm{Cov}(\hat\beta_0, \hat\beta_1)$$

The last equality holds because the covariance between any random variable and a constant is zero, and the $y_i$'s are independent, so $\mathrm{Cov}(y_i, y_j) = 0$ for $i \neq j$. Also, $\mathrm{Var}(y_i) = \sigma^2$.

Notice that (some algebra is needed here, and the following tricks are crucial in reducing the amount of calculation):
$$\sum_i (x_i - \bar{x}) = 0, \qquad \hat\beta_1 = \frac{\sum_i (x_i - \bar{x})y_i}{S_{XX}}$$
So now we have:

$$\mathrm{Var}(\hat\beta_1) = \mathrm{Var}\left(\frac{S_{XY}}{S_{XX}}\right) = \mathrm{Var}\left(\frac{\sum_i (x_i - \bar{x})y_i}{S_{XX}}\right) = \frac{1}{S_{XX}^2}\sum_i (x_i - \bar{x})^2\,\mathrm{Var}(y_i) = \frac{\sigma^2}{S_{XX}}$$

And:

$$\mathrm{Var}(\hat\beta_0) = \mathrm{Var}(\bar{y} - \hat\beta_1\bar{x}) = \mathrm{Var}\left(\sum_i\left(\frac{1}{n} - \frac{(x_i-\bar{x})\bar{x}}{S_{XX}}\right)y_i\right) = \sum_i\left(\frac{1}{n} - \frac{(x_i-\bar{x})\bar{x}}{S_{XX}}\right)^2\sigma^2$$
$$= \sum_i\left[\frac{1}{n^2} + \frac{\bar{x}^2(x_i-\bar{x})^2}{S_{XX}^2} - \frac{2\bar{x}(x_i-\bar{x})}{n\,S_{XX}}\right]\sigma^2 = \left[\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}\right]\sigma^2 = \frac{\sum_i x_i^2}{n\,S_{XX}}\sigma^2$$

(the last equality uses $\sum_i x_i^2 = S_{XX} + n\bar{x}^2$)
For the other terms in the decomposition of $\mathrm{Var}(\hat{e}_i)$, we have:
$$\mathrm{Cov}(y_i, \hat\beta_1) = \mathrm{Cov}\left(y_i, \frac{\sum_j (x_j - \bar{x})y_j}{S_{XX}}\right) = \frac{x_i - \bar{x}}{S_{XX}}\mathrm{Var}(y_i) = \frac{x_i - \bar{x}}{S_{XX}}\sigma^2$$
and:

$$\mathrm{Cov}(y_i, \hat\beta_0) = \mathrm{Cov}(y_i, \bar{y} - \hat\beta_1\bar{x}) = \mathrm{Cov}\left(y_i, \frac{\sum_j y_j}{n} - \frac{\sum_j (x_j - \bar{x})y_j}{S_{XX}}\bar{x}\right) = \frac{\sigma^2}{n} - \frac{x_i - \bar{x}}{S_{XX}}\bar{x}\sigma^2$$
Finally, we have:
$$\mathrm{Cov}(\hat\beta_0, \hat\beta_1) = \mathrm{Cov}(\bar{y} - \hat\beta_1\bar{x},\ \hat\beta_1) = \mathrm{Cov}\left(\frac{\sum_j y_j}{n} - \sum_j\frac{(x_j-\bar{x})\bar{x}}{S_{XX}}y_j,\ \sum_j\frac{(x_j-\bar{x})y_j}{S_{XX}}\right)$$
$$= \sum_{j=1}^n\left(\frac{1}{n} - \frac{x_j-\bar{x}}{S_{XX}}\bar{x}\right)\frac{x_j-\bar{x}}{S_{XX}}\sigma^2 = -\frac{\bar{x}}{S_{XX}}\sigma^2$$
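As an optional numerical sanity check on the three variance/covariance formulas above (not part of the original derivation; it uses simulated data with arbitrary, hypothetical parameter values), one can simulate repeated samples in R and compare the empirical moments of $b_0$ and $b_1$ with the formulas:

# Monte Carlo check of Var(b1), Var(b0), and Cov(b0, b1) with fixed x's
set.seed(1)
n <- 20; beta0 <- 1; beta1 <- 0.5; sigma2 <- 4
x <- runif(n, 0, 10)
SXX <- sum((x - mean(x))^2)
est <- replicate(20000, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sqrt(sigma2))
  b1 <- sum((x - mean(x)) * y) / SXX
  b0 <- mean(y) - b1 * mean(x)
  c(b0, b1)
})
c(var(est[2, ]), sigma2 / SXX)                       # Var(b1) vs formula
c(var(est[1, ]), sigma2 * sum(x^2) / (n * SXX))      # Var(b0) vs formula
c(cov(est[1, ], est[2, ]), -mean(x) * sigma2 / SXX)  # Cov(b0, b1) vs formula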
Plugging all of these pieces back into the decomposition of $\mathrm{Var}(\hat{e}_i)$, we obtain:
$$\mathrm{Var}(\hat{e}_i) = \left(\frac{n-2}{n} + \frac{1}{S_{XX}}\left(\frac{1}{n}\sum_{j=1}^n x_j^2 + x_i^2 - 2(x_i-\bar{x})^2 - 2x_i\bar{x}\right)\right)\sigma^2$$
Thus, writing $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat{e}_i^2$,
$$E\,\hat\sigma^2 = E\,\frac{1}{n}\sum_{i=1}^n \hat{e}_i^2 = \frac{1}{n}\sum_{i=1}^n\left[\frac{n-2}{n} + \frac{1}{S_{XX}}\left(\frac{1}{n}\sum_{j=1}^n x_j^2 + x_i^2 - 2(x_i-\bar{x})^2 - 2x_i\bar{x}\right)\right]\sigma^2$$
$$= \left[\frac{n-2}{n} + \frac{1}{n\,S_{XX}}\left\{\sum_{j=1}^n x_j^2 + \sum_{i=1}^n x_i^2 - 2S_{XX} - \frac{2}{n}\Big(\sum_{i=1}^n x_i\Big)^2\right\}\right]\sigma^2$$
$$= \left(\frac{n-2}{n} + 0\right)\sigma^2 = \frac{n-2}{n}\sigma^2,$$

where the third equality holds because $\sum_i x_i\bar{x} = \frac{1}{n}\left(\sum_i x_i\right)^2$, and the second-to-last equality holds since $\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2 = S_{XX}$.
From the above, $E(MSE) = E\!\left[\frac{\sum_i (Y_i - \hat{Y}_i)^2}{n-2}\right] = \frac{n}{n-2}\,E\,\hat\sigma^2 = \frac{n}{n-2}\cdot\frac{n-2}{n}\sigma^2 = \sigma^2$, so $s^2 = MSE$ is an unbiased estimator of $\sigma^2$.
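As a quick Monte Carlo check of this conclusion (again only an illustration, with simulated data and arbitrary parameter values), the average of $\sum_i \hat{e}_i^2/(n-2)$ over many replications should be close to $\sigma^2$:

# Monte Carlo check that MSE = sum(e^2)/(n-2) is unbiased for sigma^2
set.seed(2)
n <- 15; sigma2 <- 9
x <- seq(1, 5, length.out = n)
mse <- replicate(20000, {
  y <- -1 + 2 * x + rnorm(n, sd = sqrt(sigma2))
  sum(resid(lm(y ~ x))^2) / (n - 2)
})
mean(mse)   # should be close to sigma2 = 9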

2. (25 points) Derive the maximum likelihood estimators $\hat\beta_0$, $\hat\beta_1$, and $\hat\sigma^2$ for the parameters $\beta_0$, $\beta_1$, and $\sigma^2$ in the normal linear regression model (i.e. $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$).

Answer:

To find the MLEs of the parameters, we first write down the likelihood of the data. Under the normality assumption, the log-likelihood function is
$$\log L(\beta_0, \beta_1, \sigma^2 \mid x, y) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}.$$
For any fixed value of $\sigma^2$, $\log L$ is maximized, as a function of $\beta_0$ and $\beta_1$, by the values that minimize
$$\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \qquad (1)$$

But minimizing this function is exactly the principle behind least squares, so the MLEs of $\beta_0$ and $\beta_1$ are the same as their least-squares estimators. Now, substituting these back into the log-likelihood, to find the MLE of $\sigma^2$ we need to maximize
$$-\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{2\sigma^2}$$

This maximization problem is nothing but the MLE of $\sigma^2$ in an ordinary normal sampling problem, and its solution is
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$

If you are not familiar with the MLE in the normal sampling setting, you can take the derivative with respect to $\sigma^2$ (N.B. not $\sigma$), set it equal to zero, and solve; the solution of that equation is the MLE of $\sigma^2$.
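To illustrate this result numerically (a sketch only; the simulated data and starting values below are arbitrary), one can maximize the log-likelihood with R's general-purpose optimizer and compare the answer with the least-squares fit from lm() and with the closed-form $\hat\sigma^2$:

# Numerical MLE for (beta0, beta1, sigma^2) vs. least squares
set.seed(3)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 1.5)
negloglik <- function(par) {
  b0 <- par[1]; b1 <- par[2]; s2 <- exp(par[3])   # optimize over log(sigma^2) so sigma^2 > 0
  0.5 * n * log(2 * pi) + 0.5 * n * log(s2) + sum((y - b0 - b1 * x)^2) / (2 * s2)
}
mle <- optim(c(0, 0, 0), negloglik)$par
fit <- lm(y ~ x)
c(mle[1], mle[2], exp(mle[3]))          # numerical MLEs of beta0, beta1, sigma^2
c(coef(fit), sum(resid(fit)^2) / n)     # LSEs and (1/n) * sum of squared residuals: should agree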

3. (50 points) Do problem 1.19 in the book.

Answer:

(a)
data <- read.table("f:/TA1.txt")   # GPA/ACT data: column 1 = GPA, column 2 = ACT score
attach(data)
x <- data[, 2]
y <- data[, 1]
SXX <- sum((x - mean(x))^2)
SYY <- sum((y - mean(y))^2)
SXY <- sum((x - mean(x)) * (y - mean(y)))
beta.hat <- SXY / SXX                       # slope estimate
alpha.hat <- mean(y) - beta.hat * mean(x)   # intercept estimate
We get that the LSEs of the intercept and the slope are 2.11 and .038, respectively.
The corresponding estimated regression line is thus
$$Y = 2.11 + .038X \qquad (2)$$

(b) From the graph, we can see that the regression line passes through the center of the bulk of the data, but does not capture all of its features.
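For reference, a minimal way to produce such a plot in R, reusing the variables defined in part (a) (assuming they are still in the workspace):

plot(x, y, xlab = "ACT score", ylab = "GPA")   # scatterplot of the data
abline(alpha.hat, beta.hat)                    # add the fitted regression line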
(c)
Plugging $x = 30$ into (2) gives the point estimate of the mean GPA for students with an ACT score of 30:
$$\hat{y} = \hat\alpha + \hat\beta x = 2.11 + .038 \times 30 \approx 3.28$$
(the value 3.28 comes from using the unrounded estimates of $\hat\alpha$ and $\hat\beta$).
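The same point estimate can also be obtained with lm() and predict(), again reusing x and y from part (a):

fit <- lm(y ~ x)
predict(fit, newdata = data.frame(x = 30))   # about 3.28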

(d)

This is just the slope of the estimated regression line, $\hat\beta = .038$: the mean GPA is estimated to increase by .038 when the ACT score increases by one point.
