
Some notes about the distributions of SST, SSE, and SSR

(1) In the simple (or multiple) regression model, SST and SSR (regression), each divided by $\sigma^2$, follow the exact $\chi^2$ distribution if the parameters $\beta_1 = \beta_2 = \dots = \beta_k = 0$. That is, if $(\beta_1, \beta_2, \dots, \beta_k) \neq 0$, then SST and SSR do not follow the (central) $\chi^2$ distribution.

(2) The F-ratio (the last column of the ANOVA table) follows the F-distribution under the hypothesis $H_0: \beta_1 = \dots = \beta_k = 0$.

(3) The distribution of SSE (error) is an exact $\chi^2$ distribution even if the hypothesis $H_0: \beta_1 = \dots = \beta_k = 0$ is not satisfied.

(4) In Cochran's theorem, the random vector $Y \sim N(0, I)$, and the 3 sources (SST, SSR, and SSE) each follow the exact $\chi^2$ distribution separately.

(5) Since the distribution of SSR (regression) is not the $\chi^2$ distribution in general, its expectation $E(SSR)$ is a function of the parameters $\beta_1, \beta_2, \dots, \beta_k$ (see the last question of the midterm).
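Points (1) to (5) can be checked with a small Monte Carlo experiment. The sketch below is only illustrative: the simple regression setup, the design points, the intercept 2, and the function name `mean_sums_of_squares` are assumptions made for this example and are not taken from the notes. It averages $SSE/\sigma^2$ and $SSR/\sigma^2$ over many simulated samples, once with $\beta_1 = 0$ and once with $\beta_1 = 3$.

```python
import random


def mean_sums_of_squares(beta1, n=20, sigma=1.0, reps=2000, seed=0):
    """Average SSE/sigma^2 and SSR/sigma^2 over simulated simple regressions."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # fixed design points in [0, 1]
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sse_total = 0.0
    ssr_total = 0.0
    for _ in range(reps):
        # Assumed true model: Y = 2 + beta1 * x + N(0, sigma^2) noise.
        y = [2.0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        # Least-squares slope and intercept for simple regression.
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b0 = y_bar - b1 * x_bar
        fitted = [b0 + b1 * xi for xi in x]
        sse_total += sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
        ssr_total += sum((fi - y_bar) ** 2 for fi in fitted)
    scale = reps * sigma ** 2
    return sse_total / scale, ssr_total / scale


for beta1 in (0.0, 3.0):
    sse_mean, ssr_mean = mean_sums_of_squares(beta1)
    print(f"beta1={beta1}: mean SSE/sigma^2={sse_mean:.2f}, "
          f"mean SSR/sigma^2={ssr_mean:.2f}")
```

If the statements above hold, the average of $SSE/\sigma^2$ should stay close to $n - 2 = 18$ for both values of $\beta_1$, while the average of $SSR/\sigma^2$ should be close to 1 only when $\beta_1 = 0$. For simple regression, $E(SSR) = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$, which is the dependence on the parameters mentioned in (5).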
How to fit a nonlinear regression line

We can use regression analysis to fit a nonlinear regression line, but the form of the model must be determined first.

[Figure: plot of the data (legend: Series 1); horizontal axis from 0 to 20, vertical axis from 0 to 3.5]

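From the picture above, we can fit these data with the following model:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

(or $Y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I_n)$), where

$$X = \begin{pmatrix} 1 & X_1 & X_1^2 \\ 1 & X_2 & X_2^2 \\ \vdots & \vdots & \vdots \\ 1 & X_n & X_n^2 \end{pmatrix}.$$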

The other procedures are the same as those we discussed, for example, the estimation of the parameters, the ANOVA table, and so on.
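Because the quadratic model is still linear in the parameters, the usual normal equations $(X'X)\hat{\beta} = X'Y$ apply once the column $X_i^2$ is added to the design matrix. The following is a minimal sketch of that idea. The data are made up for illustration (the values behind the figure are not available), and the helper names `solve_linear_system` and `fit_quadratic` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares estimates (b0, b1, b2) for Y = b0 + b1 X + b2 X^2 + error."""
    rows = [[1.0, x, x * x] for x in xs]  # rows of the design matrix X
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)  # solves (X'X) b = X'Y


# Made-up data lying exactly on y = 0.5 + 0.4 x - 0.02 x^2.
xs = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
ys = [0.5 + 0.4 * x - 0.02 * x * x for x in xs]
print(fit_quadratic(xs, ys))
```

Since the made-up data contain no noise, the estimates should reproduce the coefficients 0.5, 0.4, and -0.02 up to rounding error. With real data the same function returns the least-squares fit.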


Dummy variable

In real-world problems, the data may contain qualitative variables such as color, gender, and so on. Statistical methods cannot analyze them unless they are first translated into numerical variables.

The dummy variable is a method for handling such qualitative variables. We use the following example to illustrate how to use it.

Ex. 1: The incomes of 4 females and 6 males are listed below, and we would like to know whether income differs between genders.

income  1000  1150  1200   990  1400  1250  1320  1060  1180  1360
gender     F     F     M     M     M     F     M     F     M     M

In this example, gender is the qualitative variable, and we can translate it into numerical values as follows. For example, we can set 1 for male and 0 for female. That is,

$$D_i = \begin{cases} 1, & \text{if male} \\ 0, & \text{if female} \end{cases}$$

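The regression model for this problem can be assumed to be

$$Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

(or $Y = X\beta + \varepsilon$), where

$$X = \begin{pmatrix} 1 & D_1 \\ 1 & D_2 \\ \vdots & \vdots \\ 1 & D_{10} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \text{and} \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_{10} \end{pmatrix}.$$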

In this example, listing the four females first, we can let $Y' = (1000, 1150, 1250, 1060, 1200, 990, 1400, 1320, 1180, 1360)$ and

$$X' = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}.$$

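Using the formula $\hat{\beta} = (X'X)^{-1} X'Y$, we have

$$\hat{\beta} = \begin{pmatrix} 10 & 6 \\ 6 & 6 \end{pmatrix}^{-1} \begin{pmatrix} 11910 \\ 7450 \end{pmatrix} = \begin{pmatrix} 1115 \\ 126.7 \end{pmatrix},$$

and the corresponding regression model is

$$\hat{Y}_i = 1115 + 126.7 \, D_i.$$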
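The estimate can be reproduced numerically. The sketch below is one possible way to do it, using only the income and gender data of Ex. 1. The variable names are chosen for this illustration, and the 2x2 matrix $X'X$ is inverted in closed form rather than with a linear algebra library.

```python
income = [1000, 1150, 1200, 990, 1400, 1250, 1320, 1060, 1180, 1360]
gender = ["F", "F", "M", "M", "M", "F", "M", "F", "M", "M"]

# Dummy coding from the text: 1 for male, 0 for female.
d = [1 if g == "M" else 0 for g in gender]

n = len(income)
# Entries of X'X and X'Y for the design matrix with rows (1, D_i).
sum_d = sum(d)  # also equals the sum of D_i^2, because D_i is 0 or 1
sum_y = sum(income)
sum_dy = sum(di * yi for di, yi in zip(d, income))

# Closed-form inverse of X'X = [[n, sum_d], [sum_d, sum_d]] applied to X'Y.
det = n * sum_d - sum_d * sum_d
b0 = (sum_d * sum_y - sum_d * sum_dy) / det
b1 = (-sum_d * sum_y + n * sum_dy) / det

# Group averages, for comparison with b0 and b0 + b1.
female_mean = sum(y for y, di in zip(income, d) if di == 0) / d.count(0)
male_mean = sum(y for y, di in zip(income, d) if di == 1) / d.count(1)

print(b0, b1)
print(female_mean, male_mean)
```

The intercept `b0` should coincide with the female average and `b0 + b1` with the male average, because the dummy variable only shifts the mean between the two groups.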
From the above model, we have the following results.

(1) When $D_i = 0$, the model gives $\hat{Y} = 1115$. This means that the average income for females is \$1115, and this value is the same as $\frac{1000 + 1150 + 1250 + 1060}{4}$.

(2) When $D_i = 1$, the model gives $\hat{Y} = 1241.7$. This means that the average income for males is \$1241.7, and this value is the same as $\frac{1200 + 990 + \dots + 1360}{6}$.

(3) If we want to know whether income differs between genders, we need to test the hypothesis $H_0: \beta_1 = 0$. The procedure is listed in the following steps.


Step 1: Find the distribution of $\hat{\beta}_1$.
From the previous discussion, we have $\hat{\beta}_1 \sim N\left(\beta_1, \frac{10}{24}\sigma^2\right)$.

Step 2: If $\sigma^2$ is known, then the test statistic under $H_0$ is

$$z = \frac{\hat{\beta}_1 - 0}{\sqrt{\frac{10}{24}\sigma^2}} \sim N(0, 1).$$

We reject $H_0$ if $|z| > z_{\alpha/2}$, and accept $H_0$ otherwise.

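Step 3: If $\sigma^2$ is unknown, then the test statistic under $H_0$ is

$$t = \frac{\hat{\beta}_1 - 0}{\sqrt{\frac{10}{24}\hat{\sigma}^2}} \sim t_{(n-2)},$$

where $\hat{\sigma}^2 = \frac{SSE(\text{error})}{n-2}$.

We reject $H_0$ if $|t| > t_{\alpha/2}$, and accept $H_0$ otherwise.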
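The factor $\frac{10}{24}$ used in Steps 1 to 3 can be traced back to the design matrix. The short derivation below assumes the standard result $\operatorname{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$ and uses the matrix $X'X$ of this example; the variance of $\hat{\beta}_1$ is the lower-right entry.

```latex
\[
(X'X)^{-1}
= \begin{pmatrix} 10 & 6 \\ 6 & 6 \end{pmatrix}^{-1}
= \frac{1}{10 \cdot 6 - 6 \cdot 6}
  \begin{pmatrix} 6 & -6 \\ -6 & 10 \end{pmatrix}
= \frac{1}{24}
  \begin{pmatrix} 6 & -6 \\ -6 & 10 \end{pmatrix},
\qquad
\operatorname{Var}(\hat{\beta}_1) = \frac{10}{24}\,\sigma^2 .
\]
```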

Q: What are the values of $\hat{\sigma}^2$ and the $t$-value for this sample?
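One way to work this out is sketched below. It assumes the model and the dummy coding of Ex. 1 (1 for male, 0 for female) and uses the fact, shown in results (1) and (2), that the fitted values are the two group averages. The variable names are chosen for this illustration only.

```python
import math

income = [1000, 1150, 1200, 990, 1400, 1250, 1320, 1060, 1180, 1360]
gender = ["F", "F", "M", "M", "M", "F", "M", "F", "M", "M"]
d = [1 if g == "M" else 0 for g in gender]
n = len(income)

females = [y for y, di in zip(income, d) if di == 0]
males = [y for y, di in zip(income, d) if di == 1]
b0 = sum(females) / len(females)   # intercept = female average
b1 = sum(males) / len(males) - b0  # dummy coefficient = male minus female average

# SSE: squared residuals around the fitted values b0 + b1 * D_i.
sse = sum((y - (b0 + b1 * di)) ** 2 for y, di in zip(income, d))
sigma2_hat = sse / (n - 2)

# Var(beta1_hat) = sigma^2 * n / (n * sum(D) - sum(D)^2), i.e. (10/24) sigma^2 here.
var_factor = n / (n * sum(d) - sum(d) ** 2)
t_value = (b1 - 0) / math.sqrt(var_factor * sigma2_hat)

print(f"sigma2_hat = {sigma2_hat:.2f}, t = {t_value:.3f}")
```

Under these assumptions the script gives $\hat{\sigma}^2 \approx 18722.9$ and $t \approx 1.43$. For $\alpha = 0.05$ this is below the critical value $t_{0.025}$ of the $t$ distribution with 8 degrees of freedom (about 2.306), so $H_0$ would not be rejected for this sample.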

The above discussion focuses on finding the difference in income between genders. In practice, however, the model could include other independent variables, for example, working hours, education, working area, and so on. In that case, the model could be like


$$Y_i = \beta_0 + \beta_1 D_i + \gamma_1 \mathrm{Hr}_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

If we treat working hours as a numerical (quantitative) variable, then the analysis is easier, and the discussion proceeds in the following way.
