
In the simple regression model, it is assumed that the error terms $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are i.i.d. from the normal distribution with mean $\mu = 0$ and variance $\sigma^2$, or $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$.

Q: How do we examine the above assumption?

Usually, there are two parts that need to be checked: the assumption of normality and the assumption of independence.

(1) For normality:

To check whether the estimated error terms (residuals) $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_n$ come from $N(0, \sigma^2)$, we use the normal probability plot to examine these data. The procedure consists of the following steps.

Step 1: Rank the observations $s_1, \ldots, s_n$ from the smallest to the largest.

Step 2: Compute the probabilities $z_1, \ldots, z_n$ with the following formula.

Step 3: Plot the points $(z_i, s_i)$ on the xy-plane.

Conclusion: If the plotted points lie close to a straight line, then the data can be treated as a normal sample.
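
The three steps above can be sketched in Python. Since the Step 2 formula does not appear in the text, this sketch assumes one common convention, $z_i = \Phi^{-1}((i - 0.5)/n)$, where $\Phi^{-1}$ is the standard normal quantile function. The function name `normal_plot_points` and the sample residuals are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return (z_i, s_i) pairs for a normal probability plot.

    ASSUMPTION: z_i = Phi^{-1}((i - 0.5) / n). The source text does not
    show its Step 2 formula, so this is one common convention only.
    """
    s = sorted(residuals)                      # Step 1: rank smallest to largest
    n = len(s)
    std_normal = NormalDist()                  # standard normal: mean 0, sd 1
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]  # Step 2
    return list(zip(z, s))                     # Step 3: the points to plot

# Hypothetical residuals, for illustration only.
points = normal_plot_points([0.3, -1.2, 0.8, -0.1, 0.5])
for z_i, s_i in points:
    print(f"{z_i:8.3f} {s_i:6.2f}")
```

Plotting the returned pairs with any plotting tool and judging how close they lie to a straight line corresponds to the conclusion above.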

(2) For independence:

The Durbin-Watson test is widely used to test hypotheses of the form $H_0$: the data are independent vs. $H_1$: not $H_0$. Interested readers can find this method in the related texts.
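
As a rough illustration, the Durbin-Watson statistic is commonly defined as $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$ for residuals $e_t$ taken in observation order. This definition is the standard one but is not given in the text above; the helper name `durbin_watson` and the data are hypothetical. Values of $d$ near 2 suggest no first-order autocorrelation. The formal test compares $d$ with tabulated bounds, which this sketch does not attempt.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for residuals in their observation order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips: d well above 2
drifting = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]      # long runs: d well below 2

print(durbin_watson(alternating))
print(durbin_watson(drifting))
```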


Multiple regression model

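The multiple regression model is defined to be

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i, \quad i = 1, \ldots, n,$$

or, in matrix form, $Y = X\beta + \varepsilon$, where $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $X$ is an $n \times (k + 1)$ matrix, and $k \ge 1$ is needed in this model.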

The process of finding estimates for the parameters in this model is similar to what we did in the simple regression model. That is, we find the estimates that minimize the SSE (sum of squared errors), or

$$\min_{\beta} Q = \min_{\beta} \sum_{i=1}^{n} \varepsilon_i^2 = \min_{\beta} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{1i} - \cdots - \beta_k X_{ki})^2.$$

One can show that the normal equations for this model in matrix form are

$$X'X\beta = X'Y$$

and hence the least squares estimator of $\beta$ is

$$\hat{\beta} = (X'X)^{-1} X'Y \quad \text{if } (X'X)^{-1} \text{ exists.}$$

One needs to check the second-order condition to guarantee that this solution minimizes the function.
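
A minimal NumPy sketch of the normal equations follows. The data are hypothetical, and the second-order check assumes the standard fact that the Hessian of $Q$ is $2X'X$, so positive eigenvalues of $X'X$ indicate a minimum.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([5.1, 5.9, 10.2, 10.8, 15.1, 16.0])

# Design matrix X is n x (k + 1): a column of ones for beta_0, then X1 and X2.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Solve the normal equations X'X beta = X'Y (no explicit inverse needed).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Second-order condition: all eigenvalues of X'X are positive (Hessian is 2 X'X).
is_minimum = bool(np.all(np.linalg.eigvalsh(X.T @ X) > 0))

# Cross-check against NumPy's general least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)
print(is_minimum, np.allclose(beta_hat, beta_lstsq))
```

Solving the linear system rather than forming $(X'X)^{-1}$ explicitly is a numerical design choice; both give the same estimator when the inverse exists.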


The formulas for SST, SSE, and SSR are the same as before, that is,

$$SST = Y'\left(I - \frac{1}{n} 1_n 1_n'\right) Y, \quad SSE = Y'\left(I - X(X'X)^{-1}X'\right) Y, \quad \text{and} \quad SSR = SST - SSE.$$

One can apply Cochran's theorem to obtain the corresponding distributions for the different sources above. The results are summarized in the following ANOVA table.

S.S.                                  d.f.        M.S.              E(M.S.)                   F-ratio
$SSR = SST - SSE$                     $k$         $SSR/k$           $\sigma^2 + \text{???}$   $\dfrac{SSR/k}{SSE/(n-k-1)}$
$SSE = \sum \hat{\varepsilon}_i^2$    $n-k-1$     $SSE/(n-k-1)$     $\sigma^2$
$SST = \sum (Y_i - \bar{Y})^2$        $n-1$

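Compared with the simple regression case, the modification in the above table is that the degrees of freedom for SSR change from 1 to $k$, and those for SSE change accordingly from $n - 2$ to $n - k - 1$.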
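
The quantities in the ANOVA table can be computed directly from the matrix formulas for SST and SSE. The following is a sketch with hypothetical data. It reports the F-ratio only; obtaining a p-value would additionally need the F distribution with $k$ and $n - k - 1$ degrees of freedom.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([5.1, 5.9, 10.2, 10.8, 15.1, 16.0])
X = np.column_stack([np.ones_like(X1), X1, X2])

n, p = X.shape
k = p - 1                                   # number of predictors

I = np.eye(n)
J = np.ones((n, n)) / n                     # (1/n) 1_n 1_n'
H = X @ np.linalg.solve(X.T @ X, X.T)       # X (X'X)^{-1} X'

SST = Y @ (I - J) @ Y                       # d.f. n - 1
SSE = Y @ (I - H) @ Y                       # d.f. n - k - 1
SSR = SST - SSE                             # d.f. k

MSR = SSR / k
MSE = SSE / (n - k - 1)
F = MSR / MSE

print(f"SSR={SSR:.4f}  d.f.={k}  MSR={MSR:.4f}  F={F:.4f}")
print(f"SSE={SSE:.4f}  d.f.={n - k - 1}  MSE={MSE:.4f}")
print(f"SST={SST:.4f}  d.f.={n - 1}")
```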

Interpretation of the parameters

(1) The interpretation of the parameter $\beta_r$ is as follows. If the independent variable $X_r$ increases by one unit while the other variables remain the same, then the dependent variable $Y$ will increase by $\beta_r$ units (a decrease if $\beta_r < 0$). This is because

$$\Delta Y = [\beta_0 + \beta_1 X_1 + \cdots + \beta_r (X_r + 1) + \cdots + \beta_k X_k] - [\beta_0 + \beta_1 X_1 + \cdots + \beta_r X_r + \cdots + \beta_k X_k] = \beta_r.$$

(2) The interpretation of the intercept $\beta_0$ is as follows: when all independent variables $X_1, \ldots, X_k$ are held at zero, $\beta_0$ is the response of the dependent variable $Y$. That is,

$$Y = \beta_0 + \beta_1 (0) + \cdots + \beta_k (0) = \beta_0.$$
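
Both interpretations can be checked numerically. In the sketch below the coefficient values, the point `x`, and the helper `predict` are hypothetical, and the error term is left out so that only the systematic part of the model is evaluated.

```python
def predict(beta, x):
    """Systematic part of the model: beta_0 + beta_1 x_1 + ... + beta_k x_k."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

beta = [2.0, 0.5, -1.5, 3.0]   # hypothetical beta_0, beta_1, beta_2, beta_3
x = [4.0, 2.0, 1.0]            # hypothetical values of X_1, X_2, X_3

r = 2                          # raise X_2 by one unit, hold the others fixed
x_plus = list(x)
x_plus[r - 1] += 1.0

delta_y = predict(beta, x_plus) - predict(beta, x)
intercept_response = predict(beta, [0.0, 0.0, 0.0])

print(delta_y, beta[r])               # the change in Y matches beta_2
print(intercept_response, beta[0])    # the response at zero matches beta_0
```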

We can use the multivariate normal distribution to obtain the distribution of $\hat{\beta}$ and apply those results to hypothesis testing. From the previous assumption, we have $Y \sim N(X\beta, \sigma^2 I)$ and thus $\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1})$. Some properties are listed in the following discussion.

Properties of $\hat{\beta}$

Some properties of $\hat{\beta}$ are introduced below.

(1) From the result for the multivariate normal distribution, the estimator $\hat{\beta}_i$ of $\beta_i$ is still normally distributed.

(2) $\hat{\beta}$ is an unbiased estimator of $\beta$.

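(3) We use the F-value in the ANOVA table to test the hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$.

(4) The sources SST, SSR, and SSE, each divided by $\sigma^2$, follow $\chi^2$-distributions with d.f. $n - 1$, $k$, and $n - k - 1$, respectively (for SST and SSR, this holds under the $H_0$ in (3)).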

(5) The estimator of $\sigma^2$ in multiple regression is

$$\hat{\sigma}^2 = \frac{SSE}{n - k - 1} = \frac{\sum \hat{\varepsilon}_i^2}{n - k - 1}.$$

(6) The estimator in (5), scaled as $(n - k - 1)\hat{\sigma}^2 / \sigma^2$, follows a $\chi^2$-distribution with d.f. $n - k - 1$.
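
To connect property (5) with the distribution $\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1})$, the sketch below computes $\hat{\sigma}^2$ and plugs it into $\sigma^2 (X'X)^{-1}$ to get an estimated covariance matrix of $\hat{\beta}$. The data are hypothetical, and the plug-in standard errors are a common follow-on step that the text itself does not spell out.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([5.1, 5.9, 10.2, 10.8, 15.1, 16.0])
X = np.column_stack([np.ones_like(X1), X1, X2])

n, p = X.shape
k = p - 1

XtX_inv = np.linalg.inv(X.T @ X)            # (X'X)^{-1}, assumed to exist
beta_hat = XtX_inv @ X.T @ Y                # least squares estimator

residuals = Y - X @ beta_hat                # estimated error terms
sigma2_hat = residuals @ residuals / (n - k - 1)   # property (5): SSE / (n - k - 1)

# Plug-in estimate of Var(beta_hat) = sigma^2 (X'X)^{-1}.
cov_beta_hat = sigma2_hat * XtX_inv
std_err = np.sqrt(np.diag(cov_beta_hat))    # one standard error per coefficient

print(sigma2_hat)
print(std_err)
```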
