Regression 8

Some notes about the distributions of SST, SSE, and SSR
(1) In a simple (or multiple) regression model, SST and SSR(reg), each divided by \(\sigma^2\), follow the exact \(\chi^2\)-distribution if the parameters \(\beta_1 = \beta_2 = \dots = \beta_k = 0\). That is, if the parameters \((\beta_1, \beta_2, \dots, \beta_k) \neq 0\), then SST and SSR do not follow the \(\chi^2\)-distribution.
(2) The F-ratio (the last column of the ANOVA table) follows the F-distribution under the hypothesis \(H_0: \beta_1 = \dots = \beta_k = 0\).
(3) SSE(error), divided by \(\sigma^2\), follows the exact \(\chi^2\)-distribution even when the hypothesis \(H_0: \beta_1 = \dots = \beta_k = 0\) is not satisfied.
(4) In Cochran's theorem, the random vector \(Y \sim N(0, I)\), and the 3 sources of variation each follow the exact \(\chi^2\)-distribution, separately.
(5) Since SSR(reg) does not follow the \(\chi^2\)-distribution in general, its expectation \(E(SSR)\) is a function of the parameters \(\beta_1, \beta_2, \dots, \beta_k\) (see the last question of the midterm).
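Notes (1), (3), and (5) can be checked numerically. The following is a minimal simulation sketch and not part of the original notes: the design points, the intercept 2.0, the sample size, and the function name `simulate_sums_of_squares` are all illustrative assumptions. It estimates the means of SSR/\(\sigma^2\) and SSE/\(\sigma^2\) in a simple regression for a chosen slope.

```python
import numpy as np

def simulate_sums_of_squares(beta1, n=20, sigma=1.0, reps=20000, seed=0):
    """Monte Carlo means of SSR/sigma^2 and SSE/sigma^2 in simple regression."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)             # fixed, hypothetical design points
    X = np.column_stack([np.ones(n), x])     # design matrix with rows (1, x_i)
    H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix X (X'X)^{-1} X'
    # One simulated response vector per row; the intercept 2.0 is arbitrary.
    Y = 2.0 + beta1 * x + sigma * rng.standard_normal((reps, n))
    fitted = Y @ H                           # H is symmetric, so each row is (H y)'
    ssr = ((fitted - Y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    sse = ((Y - fitted) ** 2).sum(axis=1)
    return ssr.mean() / sigma**2, sse.mean() / sigma**2

for slope in (0.0, 3.0):
    mean_ssr, mean_sse = simulate_sums_of_squares(slope)
    print(f"beta1={slope}: mean SSR/sigma^2={mean_ssr:.2f}, mean SSE/sigma^2={mean_sse:.2f}")
```

With a slope of 0, the two averages should be close to 1 and n - 2, the means of the corresponding \(\chi^2\)-distributions. With a nonzero slope, only the SSE average should stay near n - 2, while the SSR average grows with the slope.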
How to fit a nonlinear regression line
Regression analysis can also be used to fit a nonlinear regression line, but the form of the model must be determined first.
[Figure: scatter plot of the data (legend: Series 1); horizontal axis from 0 to 20, vertical axis from 0 to 3.5.]
From the plot above, we can use the following quadratic model to fit these data:
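\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2) \]
(or \(Y = X\beta + \varepsilon\), \(\varepsilon \sim N(0, \sigma^2 I_n)\)), where
\[ X = \begin{pmatrix} 1 & X_1 & X_1^2 \\ 1 & X_2 & X_2^2 \\ \vdots & \vdots & \vdots \\ 1 & X_n & X_n^2 \end{pmatrix}. \]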
The remaining steps, for example the estimation of the parameters, the ANOVA table, and so on, are the same as discussed before.
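As a minimal sketch of the quadratic fit described above, the design matrix with rows \((1, X_i, X_i^2)\) can be built and solved by least squares with NumPy. The data points of the figure are not recoverable, so the demo data below are hypothetical and noise-free, and the name `fit_quadratic` is an assumption made for illustration.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x + b2*x^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x**2])   # rows (1, X_i, X_i^2)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||y - X b||^2
    return beta_hat

# Hypothetical data generated exactly from y = 0.5 + 0.4 x - 0.015 x^2.
x_demo = np.arange(0, 21, 2, dtype=float)
y_demo = 0.5 + 0.4 * x_demo - 0.015 * x_demo**2
print(fit_quadratic(x_demo, y_demo))
```

The model is nonlinear in \(X\) but still linear in the parameters, which is why the ordinary least-squares machinery applies unchanged.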

Dummy variable
In real-world problems, the data may contain qualitative variables such as color, gender, and so on. Statistical methods cannot analyze them unless they are first translated into numerical variables.
The dummy variable is a method for handling such qualitative variables. We use the following example to illustrate how to use the dummy variable.
Ex. 1: The incomes of 4 females and 6 males are listed below, and we would like to know whether income differs by gender.

income  1000  1150  1200   990  1400  1250  1320  1060  1180  1360
gender     F     F     M     M     M     F     M     F     M     M
In this example, gender is the qualitative variable, and we can translate it into numerical values as follows. For example, we can set 1 for male and 0 for female. That is,
\[ D_i = \begin{cases} 1, & \text{if male} \\ 0, & \text{if female.} \end{cases} \]
The regression model for this problem can be assumed to be

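\[ Y_i = \beta_0 + \beta_1 D_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2) \]
(or \(Y = X\beta + \varepsilon\)), where
\[ X = \begin{pmatrix} 1 & D_1 \\ 1 & D_2 \\ \vdots & \vdots \\ 1 & D_{10} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_{10} \end{pmatrix}. \]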
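In this example, listing the four females first, we can let \(Y' = (1000, 1150, 1250, 1060, 1200, 990, 1400, 1320, 1180, 1360)\) and
\[ X' = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}. \]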
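Using the formula \(\hat{\beta} = (X'X)^{-1} X'Y\), we have
\[ \hat{\beta} = \begin{pmatrix} 10 & 6 \\ 6 & 6 \end{pmatrix}^{-1} \begin{pmatrix} 11910 \\ 7450 \end{pmatrix} = \begin{pmatrix} 1115 \\ 126.7 \end{pmatrix}, \]
and the corresponding regression model is
\[ \hat{Y}_i = 1115 + 126.7\, D_i. \]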
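The estimate can be reproduced numerically. The sketch below uses the income and gender rows of Ex. 1 in their original order (the ordering of observations does not affect the estimate) and NumPy's `np.linalg.solve` in place of an explicit inverse; the variable names are illustrative choices, not notation from the notes.

```python
import numpy as np

# Data of Ex. 1, in the order given in the table.
income = np.array([1000, 1150, 1200, 990, 1400, 1250, 1320, 1060, 1180, 1360],
                  dtype=float)
gender = ["F", "F", "M", "M", "M", "F", "M", "F", "M", "M"]

# Dummy variable: 1 for male, 0 for female.
D = np.array([1.0 if g == "M" else 0.0 for g in gender])
X = np.column_stack([np.ones_like(D), D])   # design matrix with rows (1, D_i)

XtX = X.T @ X                               # X'X
XtY = X.T @ income                          # X'Y
beta_hat = np.linalg.solve(XtX, XtY)        # solves (X'X) b = X'Y

print("X'X =", XtX.tolist())
print("X'Y =", XtY.tolist())
print("beta_hat =", beta_hat)
```

The first component of `beta_hat` is the female sample mean, and the second is the difference between the male and female sample means.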
From the model above, we have the following results.
(1) When \(D_i = 0\), the model gives \(\hat{Y} = 1115\). This means that the average income for females is 1115, and this value is the same as \(\frac{1000 + 1150 + 1250 + 1060}{4}\).
(2) When \(D_i = 1\), the model gives \(\hat{Y} = 1241.7\). This means that the average income for males is 1241.7, and this value is the same as \(\frac{1200 + 990 + \dots + 1360}{6}\).
(3) If we want to know whether income differs between genders, we need to test the hypothesis \(H_0: \beta_1 = 0\). The process is listed in the following steps.

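Step 1: Find the distribution of \(\hat{\beta}_1\).
From the previous discussion, we have \(\hat{\beta}_1 \sim N\left(\beta_1, \frac{10}{24}\sigma^2\right)\), where \(\frac{10}{24}\) is the second diagonal entry of \((X'X)^{-1}\).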
Step 2: If \(\sigma^2\) is known, then the test statistic under \(H_0\) is
\[ z = \frac{\hat{\beta}_1 - 0}{\sqrt{\frac{10}{24}}\,\sigma} \sim N(0, 1). \]
We reject \(H_0\) if \(|z| > z_{\alpha/2}\), and do not reject \(H_0\) otherwise.
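Step 3: If \(\sigma^2\) is unknown, then the test statistic under \(H_0\) is
\[ t = \frac{\hat{\beta}_1 - 0}{\sqrt{\frac{10}{24}}\,\hat{\sigma}} \sim t(n-2), \]
where \(\hat{\sigma}^2 = \frac{SSE(\text{error})}{n-2}\).
We reject \(H_0\) if \(|t| > t_{\alpha/2}(n-2)\), and do not reject \(H_0\) otherwise.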
Q: What are the values of \(\hat{\sigma}^2\) and the \(t\)-value for this sample?
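One way to work through this question numerically is sketched below. It follows Steps 1 to 3 for the data of Ex. 1; the function name `dummy_t_test` and the variable names are assumptions made for illustration.

```python
import numpy as np

def dummy_t_test(y, d):
    """Return (sigma2_hat, t_value) for H0: beta_1 = 0 in y = b0 + b1*d + error."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), d])        # design matrix with rows (1, D_i)
    XtX_inv = np.linalg.inv(X.T @ X)            # its [1, 1] entry scales Var(beta_1 hat)
    beta_hat = XtX_inv @ X.T @ y                # least-squares estimates
    resid = y - X @ beta_hat
    sse = resid @ resid                         # SSE(error)
    sigma2_hat = sse / (n - 2)                  # estimate of sigma^2
    se_beta1 = np.sqrt(XtX_inv[1, 1] * sigma2_hat)
    t_value = (beta_hat[1] - 0.0) / se_beta1    # test statistic under H0
    return sigma2_hat, t_value

income = np.array([1000, 1150, 1200, 990, 1400, 1250, 1320, 1060, 1180, 1360],
                  dtype=float)
D = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1], dtype=float)   # 1 = male, 0 = female

sigma2_hat, t_value = dummy_t_test(income, D)
print(f"sigma2_hat = {sigma2_hat:.2f}, t = {t_value:.3f}")
```

The printed \(t\)-value can then be compared with the tabulated critical value \(t_{\alpha/2}(n-2)\) with \(n - 2 = 8\) degrees of freedom.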
The discussion above focuses on finding the difference in income between genders. In practice, however, the model could include other independent variables, for example, working hours, education, working area, and so on. In that case, the model could look like
\[ Y_i = \beta_0 + \beta_1 D_i + \gamma_1 \mathrm{Hr}_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2). \]
If we treat the working hours as a numerical (quantitative) value, then the analysis is easier, and the discussion is in the following way.
