Econometrics I Handout
The first section deals with the concept of a regression model. It also distinguishes between single and multiple regression models and discusses the basic assumptions about the Xi and ui in the classical linear regression model (CLRM). In section two we introduce the method of least squares with the aid of the two-variable linear regression model, a model in which the dependent variable is expressed as a linear function of only a single explanatory variable. OLS requires that we choose α̂ and β̂ as sample estimates of α and β, respectively, so that Σûi² is as small as possible. To minimize Σûi² we differentiate it with respect to α̂ and β̂ and equate the first-order partial derivatives to zero.
The third section defines the coefficient of determination and discusses the procedure for obtaining and interpreting it, along with the desirable properties of the OLS estimators and the Gauss-Markov theorem. Finally, it illustrates the procedures for testing hypotheses, performing the student's t test and constructing confidence intervals. It also provides a comparison between regression analysis and ANOVA.
Introduction
Regression analysis is one of the most commonly used tools in econometric analysis. Thus we start with a basic question: what is regression analysis? Regression¹ analysis is concerned with describing and explaining the relationship between a given variable (often called the dependent or explained variable) and one or more other variables (often called the explanatory or independent variables). We denote the dependent variable by Y and the explanatory variables by X1, X2, ..., Xk. If k = 1, that is, there is only one explanatory variable (or one of the X variables), we have what is known as simple regression. This is what we discuss in this unit. On the other hand, if k > 1, that is, there is more than one explanatory variable (or more of the X variables), we have what is known as multiple regression, which will be discussed in depth in unit 3.

¹ Note that the dictionary definition of "regression" is 'backward movement, a retreat, a return to an earlier stage of development'. Paradoxical as it may sound, regression analysis as it is currently used has nothing to do with the way dictionaries define the term.
As mentioned earlier, in simple regression the relationship between one dependent variable, denoted by Y, and one explanatory variable, denoted by X, is given by
Y = f(X) ------------------------------------------------ (2.1.1)
where f(X) is a function of X.
In regression analysis we are concerned with stochastic relationships, not deterministic ones. As an example, suppose that the relationship between sales (Y) and advertising expenditure (X) is
Y = 2500 + 100X − X²
This is a deterministic relationship. The sales for different levels of advertising expenditure can be determined exactly as follows:
X Y
0 2500
20 4100
50 5000
100 2500
On the other hand, suppose that the relationship between sales (Y) and advertising expenditure (X) is
Y = 2500 + 100X − X² + u
where u = +500 with probability ½
and u = −500 with probability ½.
Then the values of Y for different values of X cannot be determined exactly but can be described probabilistically. For example, if advertising expenditure is 50, sales will be 5500 with probability ½ and 4500 with probability ½. Thus the values of Y for different values of X are as follows:
X Y
0 2000 or 3000
20 3600 or 4600
50 4500 or 5500
100 2000 or 3000
The data on Y that we observe can be any one of the 2⁴ = 16 possible combinations of these values. For instance, we can have
X Y
0 2000
20 4600
50 5500
100 2000
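The two-point stochastic relationship above can be simulated directly. The following Python sketch (an illustration, not part of the handout) draws u = ±500 with probability ½ each and compares one realisation of the observed data with the deterministic values:

```python
import random

# Deterministic part of the sales/advertising relationship from the text:
# Y = 2500 + 100*X - X**2, plus a two-point error u = +500 or -500,
# each with probability 1/2.
def sales(x, rng=random):
    u = rng.choice([500, -500])  # stochastic component
    return 2500 + 100 * x - x ** 2 + u

# Deterministic values for comparison
deterministic = {x: 2500 + 100 * x - x ** 2 for x in (0, 20, 50, 100)}
print(deterministic)  # {0: 2500, 20: 4100, 50: 5000, 100: 2500}

# One possible realisation of the observed data
observed = {x: sales(x) for x in (0, 20, 50, 100)}
print(observed)       # each value is the deterministic value plus or minus 500
```

Every run of the sketch produces one of the possible combinations of Y values listed above.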
If the error term has a continuous distribution, say a normal distribution with mean 0 and variance 1, then for each value of X we have a normal distribution for Y, and the value of Y we observe can be any observation from this distribution. For instance, if the relationship between Y and X is
Y = 2 + X + u
where the error term u is N(0, 1), then for each value of X, Y will have a normal distribution. This is shown in Figure 2.1. The line drawn is the deterministic relationship Y = 2 + X.
Figure 2.1: A stochastic relationship (the line Y = 2 + X, with the observed values of Y scattered vertically around it for each X)
The actual values of Y for each X will be some points on the vertical lines shown. The
relationship between Y and X in such cases is called a stochastic or statistical relationship.
Going back to equation (2.1.1), we will assume that the function f(X) is linear in X, that is,
f(X) = α + βX
and we will assume that this relationship is stochastic. That is,
Y = α + βX + u ----------------------------------------- (2.1.2)
where u, which is called an error or disturbance term, has a known probability distribution (i.e., it is a random variable). In equation (2.1.2), α + βX is the deterministic component of Y and u is the stochastic or random component; α and β are called regression coefficients or regression parameters, which we estimate from the data on Y and X.
There is no reason why the deterministic and stochastic components must be additive. But we will start our discussion with a simple model and introduce complications later. For this reason we have taken f(X) to be a linear function and have assumed an additive error.
Why should we add an error term, and what are the sources of the error term (u) in equation (2.1.2)? There are three main sources of the error term (u):
1) The unpredictable element or randomness in human behavior,
2) The effect of a large number of variables that have been omitted, and
3) Measurement error in Y.
If we have n observations on Y and X, we can write the simple regression model (2.1.2) by adding subscripts as
Yi = α + βXi + ui,  i = 1, 2, ..., n -------------------- (2.1.3)
Furthermore, the above regression equation can be rearranged for the error term as
ui = Yi − α − βXi --------------------------------------- (2.2)
population, or how close Ŷ is to the true E(Y | Xi). To this end we must not only specify the functional form of the regression model as in equation (2.1.3) but also make some assumptions about the manner in which Yi is generated. To see why this is needed, look at the PRF: Yi = α + βXi + ui. It shows that Yi depends on Xi and ui. Therefore, unless we specify how Xi and ui are generated, there is no way we can make any statistical inference about Yi or, as we shall see, about α and β.
Therefore, the assumptions we make about the Xi and ui are extremely critical to the valid interpretation of the regression estimates.
1) Zero mean value of the disturbance (ui). That is, given the value of X, the mean or expected value of the disturbance term is zero. Technically, the conditional mean value of ui is zero. Symbolically,
E(ui) = 0 for all i, or
E(ui | Xi) = 0
2) Homoscedasticity or equal variance of ui. That is, given the value of X, the variance of ui is the same for all observations; the conditional variances of ui are identical. Symbolically,
var(ui | Xi) = E[ui − E(ui | Xi)]²
             = E(ui² | Xi)
so that var(ui) = σ² for all i
3) Independence or no autocorrelation between the disturbance terms. Given any two X values Xi and Xj (i ≠ j), the correlation between ui and uj (i ≠ j) is zero. Symbolically,
cov(ui, uj | Xi, Xj) = E{[ui − E(ui)] | Xi}{[uj − E(uj)] | Xj}
                     = E(ui | Xi)E(uj | Xj) = 0
where i and j are different observations.
4) Independence of uj and Xj; that is, uj and Xj are independent, or E(Xjuj) = 0 for all j. Symbolically,
cov(uj, Xj) = E[uj − E(uj)][Xj − E(Xj)]
            = E[uj(Xj − E(Xj))], since E(uj) = 0
            = E(ujXj) − E(Xj)E(uj), since E(Xj) is nonstochastic
            = E(ujXj), since E(uj) = 0
            = 0 by assumption
Assumption (4) says that the disturbance term uj and the explanatory variable Xj are uncorrelated.
5) Normality. That is, the ui are normally distributed for all i. In conjunction with assumptions 1, 2, and 3, this implies that the ui are independently and normally distributed with mean zero and a common variance σ². We write this as ui ~ IN(0, σ²).
6) Linear regression model. That is, the regression model is linear in the parameters, though it may not be linear in the variables. That is, α and β appear with power 1 only and are not multiplied or divided by other parameters (no terms such as αβ or β/α).
7) The X values are fixed in repeated sampling. Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be nonstochastic.
8) The number of observations (n) must be greater than the number of parameters (k) to be estimated. Alternatively, the number of observations should be greater than the number of explanatory variables.
9) Variability of X values. That is, the X's in a given sample must not all be the same. Technically, var(X) must be a finite positive number.
10) The regression model is correctly specified. Alternatively, there is no specification bias or error in the model used in the empirical analysis. That is, the variables included in the model, the functional form, and the statistical assumptions should be correct.
Under the first four assumptions, we can show that the method of least squares gives estimators that are unbiased and have minimum variance among the class of linear unbiased estimators. As for the normality assumption, we retain it because we will make inferences on the basis of the normal distribution as well as the t and F distributions. The first assumption is also retained throughout. Since E(ui) = 0, we can also write equation (2.1.3) as
E(Yi) = α + βXi ------------------------------------- (2.1.5)
This is also often termed the population regression function/model. When we substitute estimates of the parameters α and β into this, we get the sample regression function/model as
Ŷi = α̂ + β̂Xi ------------------------------------ (2.1.6)
SECTION TWO: THE METHOD OF LEAST SQUARES
Introduction
The method of ordinary least squares (OLS) has some attractive statistical properties that have made it one of the most powerful and popular methods of regression analysis. OLS requires that we choose α̂ and β̂ as sample estimates of α and β, respectively, so that Σûi² is as small as possible. To minimize Σûi² we differentiate it with respect to α̂ and β̂ and equate the first-order partial derivatives to zero.
However, as we noted above, the PRF is not directly observable. We estimate it from the sample regression function (SRF)
Yi = α̂ + β̂Xi + ûi ------------------------------------- (2.2.2)
Yi = Ŷi + ûi --------------------------------------------- (2.2.3)
where Ŷi is the estimated (conditional mean) value of Yi.
To understand how the SRF is determined, we first rearrange equation (2.2.3) as
ûi = Yi − Ŷi
ûi = Yi − α̂ − β̂Xi ------------------------------------- (2.2.4)
the following criterion: choose the SRF in such a way that the sum of the residuals, Σûi = Σ(Yi − Ŷi), is as small as possible. Graphically:
Figure 2.2: Residuals û1, û2, û3 and û4 about the SRF Ŷt = α̂ + β̂Xt
If we adopt the criterion of minimizing Σûi = Σ(Yi − Ŷi), figure (2.2) shows that the residuals û2 and û3 as well as û1 and û4 receive the same weight in the sum (û1 + û2 + û3 + û4), although the first two residuals are much closer to the SRF than the latter two. In other words, all the residuals receive equal importance no matter how close or how widely scattered the individual observations are from the SRF. Let the residuals û1, û2, û3, and û4 in figure (2.2) assume the values 10, −2, +2, and −10, respectively. The algebraic sum of these residuals is zero even though û1 and û4 are scattered more widely about the SRF than û2 and û3.
We can avoid this problem if we adopt the least-squares criterion, which states that the SRF should be fixed in such a way that
Σûi² = Σ(Yi − Ŷi)²
     = Σ(Yi − α̂ − β̂Xi)² ------------------------------ (2.2.5)
is as small as possible, where ûi² are the squared residuals. By squaring ûi, this method gives more weight to residuals such as û1 and û4 in figure (2.2) than to residuals such as û2 and û3. As noted previously, under the minimum Σûi criterion the sum can be small even though the ûi are widely spread about the SRF. But this is not possible under the least-squares procedure, for the larger the ûi (in absolute value), the larger the Σûi². A further justification for the least-squares method is that the estimators obtained by OLS have some very desirable statistical properties.
The partial derivatives of Σûi² with respect to α̂ and β̂ are:
∂Σûi²/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) = 0
⇒ 2ΣYi = 2nα̂ + 2β̂ΣXi ------------------------------------------ (2.2.6b)
∂Σûi²/∂β̂ = −2ΣXi(Yi − α̂ − β̂Xi) = 0
⇒ 2ΣXiYi = 2α̂ΣXi + 2β̂ΣXi² -------------------------------- (2.2.6c)
Equations (2.2.6b) and (2.2.6c) are called the normal equations. The first normal equation enables us to obtain the constant term of the simple regression model. Dividing both sides of the first normal equation, ΣYi = nα̂ + β̂ΣXi, by n we get
ΣYi/n = nα̂/n + β̂ΣXi/n
Ȳ = α̂ + β̂X̄ -------------------------------------- (2.2.8a)
Rearranging the above equation, we get the expression for the constant term of the simple regression model as
α̂ = Ȳ − β̂X̄ --------------------------- (2.2.8b)
Substituting the value of α̂ from equation (2.2.8b) into (2.2.6c) we get
ΣXiYi = ΣXi(Ȳ − β̂X̄) + β̂ΣXi² --------------- (2.2.9)
Note that ΣXi = nX̄ and ΣYi = nȲ. Substituting these into equation (2.2.9) we have
ΣXiYi = nX̄(Ȳ − β̂X̄) + β̂ΣXi²
ΣXiYi = nX̄Ȳ − β̂nX̄² + β̂ΣXi²
ΣXiYi − nX̄Ȳ = β̂(ΣXi² − nX̄²) ------------------------------ (2.2.10)
Finally, we obtain the expression for the least-squares estimate of β, the slope of the simple regression model, as
β̂ = (ΣXiYi − nX̄Ȳ)/(ΣXi² − nX̄²) ----------------------------- (2.2.11)
Let us define the deviations of the variables from their means, using lowercase letters, as follows:
Σyi² = Σ(Yi − Ȳ)² = ΣYi² − nȲ² --------------------------- (a)
Σxi² = Σ(Xi − X̄)² = ΣXi² − nX̄² ------------------------ (b)
Σxiyi = Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − nX̄Ȳ ------------- (c)
In this notation, equation (2.2.11) becomes
β̂ = Σ(Yi − Ȳ)(Xi − X̄)/Σ(Xi − X̄)² = Σxiyi/Σxi² ---------------------- (2.2.12)
where X̄ and Ȳ are the sample means of X and Y. We have also defined xi = (Xi − X̄) and yi = (Yi − Ȳ). Henceforth, we adopt the convention of letting lowercase letters denote deviations from the mean value. Finally, if Y is the dependent variable and X is an explanatory variable, then the sample regression function (SRF) of Y on X is written formally as
Ŷi = α̂ + β̂Xi --------------------------------------------- (2.2.13)
Proof: We can prove the equality of the numerators (and, similarly, the denominators) of equations (2.2.12) and (2.2.11):
Σxiyi = Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − ȲΣXi − X̄ΣYi + nX̄Ȳ
Recall that ΣXi = nX̄ and ΣYi = nȲ. Substituting these into the above expression we get
Σxiyi = ΣXiYi − nX̄Ȳ − nX̄Ȳ + nX̄Ȳ
The last two terms cancel out. As a result, we are left with
Σxiyi = ΣXiYi − nX̄Ȳ
This is also equal to
Σxiyi = ΣXiYi − (ΣXi)(ΣYi)/n
Example 1: Consider the data in Table (2.1). The data are 12 daily observations on the quantity of oranges supplied by a firm and the price per kilo. Given this information,
A) Find the regression of Y on X and determine the relationship between quantity supplied and the price of a kilo of oranges.
B) Estimate the elasticity from the estimated regression line.
Table 2.1: Supply of oranges by a firm
X (price)            9   12    6   10    9   10    7    8   12    6   11    8
Y (qty. supplied)   69   76   52   56   57   77   58   55   67   53   72   64
Solution:
A) Parameter estimates
To find the parameter estimates, we first compute ΣXi, ΣYi, ΣXiYi, ΣXi², and ΣYi², as well as the means X̄ and Ȳ.
Second, we compute the three quantities defined earlier, based on the data in the table above, as follows:
(1) Σxiyi = Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − nX̄Ȳ = 6960 − 12(9)(63) = 6960 − 6804 = 156
(2) Σxi² = Σ(Xi − X̄)² = ΣXi² − nX̄² = 1020 − 12(9)² = 1020 − 972 = 48
(3) Σyi² = Σ(Yi − Ȳ)² = ΣYi² − nȲ² = 48522 − 12(63)² = 48522 − 47628 = 894
Third, we obtain the least-squares estimates of the parameters β and α using equations (2.2.12) and (2.2.8b), respectively, as follows:
β̂ = (ΣXiYi − nX̄Ȳ)/(ΣXi² − nX̄²) = 156/48 = 3.25
α̂ = Ȳ − β̂X̄ = 63 − 3.25(9) = 63 − 29.25 = 33.75
Finally, with α̂ and β̂ estimated as sample counterparts of α and β, the estimated supply function of Y on X, or the sample regression function (SRF), is expressed formally as
Ŷi = 33.75 + 3.25Xi
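As a check, the whole of part A can be reproduced in a few lines of Python (an illustrative sketch, not part of the original handout), using the deviation formulas β̂ = Σxiyi/Σxi² and α̂ = Ȳ − β̂X̄:

```python
# OLS estimates for the orange-supply data in Table 2.1
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]          # price per kilo
Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]   # quantity supplied

n = len(X)
x_bar = sum(X) / n                                       # 9
y_bar = sum(Y) / n                                       # 63
sxy = sum(xi * yi for xi, yi in zip(X, Y)) - n * x_bar * y_bar   # 156
sxx = sum(xi ** 2 for xi in X) - n * x_bar ** 2                  # 48

beta_hat = sxy / sxx                   # slope estimate, 3.25
alpha_hat = y_bar - beta_hat * x_bar   # intercept estimate, 33.75
print(alpha_hat, beta_hat)             # 33.75 3.25
```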
The finding indicates that the slope parameter is plausible on a priori economic criteria. That is, since economic theory postulates a direct or positive relationship between supply and price, our finding that the coefficient β̂ is positive (β̂ > 0) implies that the parameter has the expected sign. Another a priori economic criterion to be considered before proceeding to the next steps in regression is the size of the estimated coefficients. That is, does economic theory support a response of quantity supplied as large as the β̂ = 3.25 obtained in our example? The coefficient can be as large as this if the market is characterized by something other than monopoly or duopoly. By contrast, had the regression coefficient been the marginal propensity to consume (MPC) or the marginal propensity to save (MPS) out of income, economic theory would not support a coefficient greater than or equal to one (β̂ ≥ 1).
Therefore, we can conclude that in regression analysis priority should always be given to the fulfillment of the a priori economic criteria (the sign and size of the estimates). Hence, one should proceed with the application of first-order and second-order tests of significance only when the a priori economic criteria are satisfied.
B) The elasticity
After estimating the parameters α̂ and β̂, we can estimate the elasticity from the estimated regression line. We have said that the estimated function
Ŷi = α̂ + β̂Xi
is the equation of a line whose Y intercept is α̂ and whose slope is β̂. The coefficient β̂ is the derivative of Y with respect to X; that is,
β̂ = dY/dX ------------------------------------------------------ (2.2.14)
Equation (2.2.14) shows the rate of change in Y as X changes by a very small amount. It should be noted that the estimated function we use as an example is a linear supply function. The coefficient β̂, which is dY/dX, is not the price elasticity itself but a component of the elasticity, which is defined by the formula
ηP = (dY/Y)/(dX/X) = (dY/dX)·(X/Y) ---------------------------------------- (2.2.15)
where ηP = price elasticity, Y = quantity supplied, and X = price.
Figure 2.3: The estimated parameters (α̂ and β̂) in the regression line SRF: Ŷi = α̂ + β̂Xi
Clearly β̂ = dY/dX is a component of the elasticity. From an estimated function we obtain an average elasticity as
ηP = β̂ · (X̄/Ȳ) ------------------------------------------------ (2.2.16)
where X̄ = the average (mean) price in the sample, and Ȳ = the average quantity supplied in the sample. Therefore, substituting the values of β̂, X̄, and Ȳ we have already obtained into equation (2.2.16), the price elasticity of our supply function is
ηP = 3.25 × (9/63) = 0.46
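The same computation in Python (an illustrative sketch, using the estimates from Example 1):

```python
# Average price elasticity at the sample means, eta = beta_hat * (x_bar / y_bar)
beta_hat, x_bar, y_bar = 3.25, 9, 63   # slope estimate and sample means from Example 1
eta = beta_hat * x_bar / y_bar
print(round(eta, 2))                   # 0.46
```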
SECTION THREE: RESIDUALS AND GOODNESS OF FIT
Introduction
After the estimation of the parameters and the determination of the least-squares regression line, we need, first, to judge the statistical reliability of the estimates using the standard errors of the parameter estimates and, second, to judge the explanatory power of the linear regression of Y on X using the goodness of fit (coefficient of determination), which measures the dispersion of the observations around the regression line.
In order to apply the standard tests of significance to the ordinary least squares (OLS) parameter estimates, we must compute their means, variances, and standard errors in turn. To begin with, we assume that we draw repeated samples of size n from the population of Y and X, and for each sample we estimate the parameters α̂ and β̂. This is known as the hypothetical repeated sampling procedure.
a. Linearity (for β̂)
Proposition: α̂ and β̂ are linear in Y.
For β̂ we have
β̂ = Σxiyi/Σxi² = Σxi(Yi − Ȳ)/Σxi² = (ΣxiYi − ȲΣxi)/Σxi² = ΣxiYi/Σxi²
(since Σxi = Σ(Xi − X̄) = ΣXi − nX̄ = nX̄ − nX̄ = 0).
Now let ki = xi/Σxi² (i = 1, 2, ..., n). Then
β̂ = ΣkiYi ------------------------------------------------ (2.19)
⇒ β̂ is linear in Y.
For α̂, substituting β̂ = ΣkiYi into α̂ = Ȳ − β̂X̄ gives
α̂ = ΣYi/n − X̄ΣkiYi = Σ(1/n − X̄ki)Yi
⇒ α̂ is also linear in Y.
Mean of the OLS estimates: The mean of the slope estimator (β̂) is given by
E(β̂) = β ---------------------------------------------------- (2.3.1)
We know that the slope estimator (β̂) in the simple regression model is obtained as
β̂ = Σxiyi/Σxi² ------------------------------------------------- (2.3.2)
We know also that the deviation of the dependent variable from its mean is designated by yi (lowercase) and is yi = (Yi − Ȳ). Substituting this in equation (2.3.2) above we obtain
β̂ = Σxi(Yi − Ȳ)/Σxi² = (ΣxiYi − ȲΣxi)/Σxi² -------------- (2.3.3)
Since by definition the sum of the deviations of a variable from its mean is identically equal to zero, we can eliminate the expression after the minus sign. In doing so, we get
β̂ = Σxiyi/Σxi² = ΣxiYi/Σxi² -------------------------------------- (2.3.4)
The values of X are assumed to be a set of fixed values, which do not change from sample to sample. Consequently, the ratio xi/Σxi² will be constant from sample to sample, and if we denote this ratio by ki we can write equation (2.3.4) in the form
β̂ = ΣkiYi ----------------------------------------------------------- (2.3.5)
The weights ki have the properties Σki = 0, ΣkiXi = 1, and Σki² = 1/Σxi².
Proof
Σki = Σxi/Σxi² = Σ(Xi − X̄)/Σxi² = 0
ΣkiXi = ΣxiXi/Σxi² = Σ(Xi − X̄)Xi/Σxi² = (ΣXi² − X̄ΣXi)/Σxi² = (ΣXi² − nX̄²)/Σxi² = Σxi²/Σxi² = 1
since
Σxi² = Σ(Xi − X̄)² = ΣXi² − 2X̄ΣXi + nX̄² = ΣXi² − nX̄², with X̄ = ΣXi/n
Substituting Yi = α + βXi + ui into equation (2.3.5) and using Σki = 0 and ΣkiXi = 1 gives
β̂ = Σki(α + βXi + ui) = αΣki + βΣkiXi + Σkiui = β + Σkiui ------- (2.3.7)
Taking the expected value of equation (2.3.7) and recalling that the Xi's (and hence the ki's) are fixed, we obtain
E(β̂) = E(β + Σkiui)
      = β + ΣkiE(ui)
      = β + Σki(0)
      = β ------------------------------------------------------------ (2.3.8)
Since the true parameter β is constant, E(β) = β. Moreover, since the mean of the error term ui is zero (E(ui) = 0), the second term on the right-hand side vanishes.
Equation (2.3.8) is thus read as: the mean of the OLS estimate β̂ is equal to the true value of the parameter, and hence β̂ is an unbiased estimator.
The mean of α̂: The process of computing the mean of the intercept term follows a similar procedure to that for the mean of the slope. Substituting Yi = α + βXi + ui into α̂ = Σ(1/n − X̄ki)Yi gives α̂ = α + Σ(1/n − X̄ki)ui, so that
E(α̂) = α
⇒ α̂ is an unbiased estimator of α.
The variance of the OLS estimates: It can be proved that the variance of the slope estimator (β̂) is
var(β̂) = E[β̂ − E(β̂)]² = E[β̂ − β]²
Since β̂ = β + Σkiui (equation 2.3.7), we have β̂ − β = Σkiui, and therefore
var(β̂) = E(Σkiui)² = Σki²E(ui²) + 2Σ(i≠j)kikjE(uiuj)
where E(ui²) = σ² (homoscedasticity) and E(uiuj) = 0 for i ≠ j (no autocorrelation), so that
var(β̂) = σ²Σki²
Using
Σki² = Σ(xi/Σxi²)² = Σxi²/(Σxi²)² = 1/Σxi²
we obtain
var(β̂) = σ² · 1/Σxi² = σ²/Σxi² ------------------------------------ (2.3.10a)
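The unbiasedness result E(β̂) = β and the variance formula var(β̂) = σ²/Σxi² can be checked by simulating the hypothetical repeated-sampling procedure described above. The following Python sketch is an illustration, not part of the handout; the values α = 33.75, β = 3.25 and σ = 2 are arbitrary choices for the simulation. It holds X fixed (assumption 7), redraws the errors many times, and compares the sample mean and variance of β̂ with β and σ²/Σxi²:

```python
import random

random.seed(1)
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]
n, alpha, beta, sigma = len(X), 33.75, 3.25, 2.0
x_bar = sum(X) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)   # sum of squared deviations, 48

betas = []
for _ in range(10000):
    # redraw only the normal errors; X stays fixed in repeated samples
    Y = [alpha + beta * xi + random.gauss(0, sigma) for xi in X]
    y_bar = sum(Y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
    betas.append(b)

mean_b = sum(betas) / len(betas)
var_b = sum((b - mean_b) ** 2 for b in betas) / len(betas)
print(mean_b)                     # close to the true beta, 3.25
print(var_b, sigma ** 2 / sxx)    # both close to 4/48 = 0.0833...
```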
The variance of α̂: It can also be proved that
var(α̂) = E[α̂ − E(α̂)]² = E[α̂ − α]²
Since α̂ = Σ(1/n − X̄ki)Yi, and var(Yi) = E[Yi − E(Yi)]² = E(ui²) = σ², we have
var(α̂) = Σ(1/n − X̄ki)² var(Yi)
        = σ²Σ(1/n − X̄ki)²
        = σ²(Σ1/n² − (2X̄/n)Σki + X̄²Σki²) ------------------------ (3)
Since Σki = 0 and Σki² = 1/Σxi², substituting these into (3) above we obtain
var(α̂) = σ²(1/n + X̄²/Σxi²) = σ²(Σxi² + nX̄²)/(nΣxi²) -------------- (4)
We have also proved earlier that Σxi² = Σ(Xi − X̄)² = ΣXi² − nX̄², so that Σxi² + nX̄² = ΣXi². Thus, substituting this into equation (4), the expression becomes
var(α̂) = σ²ΣXi²/(nΣxi²) ------------------------------------ (2.3.10b)
We have computed the variances of the OLS estimators. Now it is time to check whether these variances possess the minimum variance property compared with the variances of other linear unbiased estimators of the true α and β, other than α̂ and β̂.
1. Minimum variance of β̂
Suppose β* is an alternative linear and unbiased estimator of β, and let
β* = ΣwiYi ......................................... (2.29)
where wi ≠ ki; say wi = ki + ci, with the ci an arbitrary set of constants. Then
β* = Σwi(α + βXi + ui), since Yi = α + βXi + ui
   = αΣwi + βΣwiXi + Σwiui
For β* to be unbiased we require Σwi = 0 and ΣwiXi = 1.
Therefore Σci = 0, since Σwi = Σki + Σci = 0 and Σki = 0.
Again, ΣwiXi = Σ(ki + ci)Xi = ΣkiXi + ΣciXi.
Since ΣwiXi = 1 and ΣkiXi = 1, we have ΣciXi = 0 (and hence Σcixi = 0, so that Σkici = Σcixi/Σxi² = 0).
To check whether β̂ has minimum variance, let us compute var(β*) and compare it with var(β̂):
var(β*) = Σwi² var(Yi) = σ²Σwi²
Σwi² = Σ(ki + ci)² = Σki² + Σci² + 2Σkici = Σki² + Σci², since Σkici = 0
Therefore
var(β*) = σ²Σki² + σ²Σci² = var(β̂) + σ²Σci²
Given that the ci are arbitrary constants, σ²Σci² is non-negative, so var(β*) ≥ var(β̂). This proves that β̂ possesses the minimum variance property. In a similar way we can prove that the least-squares estimate of the intercept (α̂) possesses minimum variance.
2. Minimum variance of α̂
We take a new estimator α*, which we assume to be a linear and unbiased estimator of α. The least-squares estimator α̂ is given by
α̂ = Σ(1/n − X̄ki)Yi
By analogy with the proof of the minimum variance property of β̂, let us use the weights wi = ki + ci. Consequently,
α* = Σ(1/n − X̄wi)Yi
Since we want α* to be an unbiased estimator of the true α, that is, E(α*) = α, we substitute Yi = α + βXi + ui in α* and find the expected value of α*:
α* = Σ(1/n − X̄wi)(α + βXi + ui)
   = Σ(α/n + βXi/n + ui/n − αX̄wi − βX̄wiXi − X̄wiui)
   = α + βX̄ + Σui/n − αX̄Σwi − βX̄ΣwiXi − X̄Σwiui
For α* to be an unbiased estimator of the true α, the following must hold:
Σwi = 0 and ΣwiXi = 1, which imply Σci = 0 and ΣciXi = 0.
Then
var(α*) = Σ(1/n − X̄wi)² var(Yi)
        = σ²Σ(1/n − X̄wi)²
        = σ²(Σ1/n² + X̄²Σwi² − (2X̄/n)Σwi)
        = σ²(1/n + X̄²Σwi²), since Σwi = 0
        = σ²[1/n + X̄²(Σki² + Σci²)]
        = σ²(1/n + X̄²/Σxi²) + σ²X̄²Σci²
        = σ²ΣXi²/(nΣxi²) + σ²X̄²Σci²
The first term in this expression is var(α̂); hence
var(α*) = var(α̂) + σ²X̄²Σci² ≥ var(α̂)
which proves that α̂ possesses the minimum variance property.
The variance of the random variable (ui): The formulas for the variances of α̂ and β̂ involve the variance of the random term of the population, σ². However, since the values of ui are not observable, the true variance cannot be computed. Nevertheless, we may obtain an unbiased estimate of the true variance σ² from the expression
σ̂² = Σûi²/(n − k) -------------------------------------------------------------- (2.3.11)
where ûi = Yi − Ŷi, or ûi = Yi − α̂ − β̂Xi, and k is the number of parameters in the regression function. In the case of our supply function k = 2 (i.e., α and β). The expected value of this estimated variance is equal to the true variance, and hence it is an unbiased estimate of σ². That is,
E(σ̂²) = σ² ------------------------------------------------------------- (2.3.12)
Expression (2.3.12), together with the assumption that the mean of the error term is zero [E(ui) = 0] and the normality assumption, gives ui ~ N(0, σ²).
The standard error test of the least-squares estimates: As mentioned earlier, the standard error test helps us decide whether the parameter estimates α̂ and β̂ are statistically different from zero. The standard errors are derived as
Se(β̂) = √var(β̂) = √(σ̂²/Σxi²) ----------------------------------------- (2.3.14a)
Se(α̂) = √var(α̂) = √(σ̂²ΣXi²/(nΣxi²)) -------------------------------------- (2.3.14b)
Since σ² is not observable, for it belongs to the population regression function, we replace it by its estimate σ̂², obtained from the definition in equation (2.3.11):
σ̂² = Σûi²/(n − k)
Example 2: Estimate the variances as well as the standard errors of the parameter estimates for our supply function.
Solution: In our example, we know that n = 12, k = 2, and Σûi² = 387. Therefore,
σ̂² = 387/(12 − 2) = 38.7
It follows that the variances as well as the standard errors (standard deviations) of the estimated parameters α̂ and β̂ can easily be obtained as follows:
var(β̂) = σ̂²/Σxi² = 38.7/48 = 0.8062
Se(β̂) = √var(β̂) = √0.8062 = 0.8979
var(α̂) = σ̂²ΣXi²/(nΣxi²) = 38.7 × 1020/(12 × 48) = 38.7 × 1020/576 = 68.5313
Se(α̂) = √var(α̂) = √68.5313 = 8.2784
The estimated supply function, with the standard errors in parentheses below the coefficients, is then
Ŷi = 33.75 + 3.25Xi
Se:  (8.2784)  (0.8979)
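The variance and standard-error computations above can be verified in Python (an illustrative sketch using the numbers given in the example):

```python
import math

# Standard errors for the supply regression
n, k = 12, 2
rss = 387        # sum of squared residuals, as given in the text
sxx = 48         # sum(x_i^2) in deviation form
sum_X2 = 1020    # sum(X_i^2) in raw form

sigma2_hat = rss / (n - k)                   # 38.7
var_beta = sigma2_hat / sxx                  # 0.80625
var_alpha = sigma2_hat * sum_X2 / (n * sxx)  # 68.53125
se_beta = math.sqrt(var_beta)                # ~0.8979
se_alpha = math.sqrt(var_alpha)              # ~8.2784
print(round(se_alpha, 4), round(se_beta, 4))
```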
Figure d: Actual and estimated values of the dependent variable Y (the total deviation Y − Ȳ split into the explained part Ŷ − Ȳ and the residual e = Y − Ŷ about the line Ŷ = α̂ + β̂X)
As can be seen from fig. (d) above, Y − Ȳ measures the variation of the sample observations of the dependent variable around the mean. The variation in Y that can be attributed to the influence of X (i.e., to the regression line) is given by the vertical distance Ŷ − Ȳ. The part of the total variation in Y about Ȳ that cannot be attributed to X is equal to e = Y − Ŷ, which is referred to as the residual variation.
In summary:
ei = Yi − Ŷi = deviation of the observation Yi from the regression line.
Now we may write the observed Y as the sum of the predicted value (Ŷ) and the residual term (ei):
Yi = Ŷi + ei
(observed Yi = predicted Yi + residual)
From equation (2.34) we can write the above equation in deviation form: y = ŷ + e. By squaring and summing both sides, we obtain the following expressions:
Σy² = Σ(ŷ + e)²
Σy² = Σ(ŷ² + e² + 2ŷe)
But Σŷe = Σe(Ŷ − Ȳ) = Σe(α̂ + β̂Xi − Ȳ) = α̂Σe + β̂ΣeXi − ȲΣe, and since Σe = 0 and ΣeXi = 0,
Σŷe = 0 ………………………………………………(2.46)
Therefore,
Σy² = Σŷ² + Σe² ………………………………...(2.47)
(total variation = explained variation + unexplained variation)
or
total sum of squares = explained sum of squares + residual sum of squares,
i.e.,
TSS = ESS + RSS ……………………………………….(2.48)
Mathematically, the coefficient of determination is the explained variation expressed as a fraction of the total variation:
R² = ESS/TSS = Σŷ²/Σy² ……………………………………….(2.49)
From equation (2.37) we have ŷ = β̂x. Squaring and summing both sides gives us
Σŷ² = β̂²Σx² ……………………………………….(2.50)
We can substitute (2.50) in (2.49) and obtain:
R² = ESS/TSS = β̂²Σx²/Σy² …………………………………(2.51)
   = (Σxy/Σx²)²(Σx²/Σy²), since β̂ = Σxiyi/Σxi²
   = (Σxy)²/(Σx²Σy²) ………………………………………(2.52)
Comparing (2.52) with the formula for the correlation coefficient,
r = cov(X, Y)/(σxσy) = Σxy/(nσxσy) = Σxy/(Σx²Σy²)^½ ………(2.53)
we see that R² = r², the square of the correlation coefficient.
The limits of R²: The value of R² falls between zero and one, i.e., 0 ≤ R² ≤ 1.
Interpretation of R²
Suppose R² = 0.9. This means that the regression line gives a good fit to the observed data, since this line explains 90% of the total variation of the Y values around their mean. The remaining 10% of the total variation in Y is unaccounted for by the regression line and is attributed to the factors captured by the disturbance term ui.
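Using the sums computed in Example 1 (Σxi² = 48, Σyi² = 894, β̂ = 3.25), the decomposition TSS = ESS + RSS and the resulting R² can be computed as follows (a Python sketch for illustration; note that the residual sum of squares obtained here is the Σûi² = 387 used earlier for σ̂²):

```python
# Goodness of fit for the supply regression: TSS = ESS + RSS, R^2 = ESS/TSS
beta_hat = 3.25
sxx, syy = 48, 894         # sum(x^2) and sum(y^2) in deviation form

ess = beta_hat ** 2 * sxx  # explained sum of squares, 507.0
tss = syy                  # total sum of squares
rss = tss - ess            # residual sum of squares, 387.0
r2 = ess / tss
print(rss, round(r2, 3))   # 387.0 0.567
```

So the regression line explains about 57% of the total variation in quantity supplied.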
2. TESTING THE SIGNIFICANCE OF OLS PARAMETERS
To test the significance of the OLS parameter estimators we need the following:
The variances of the parameter estimators
An unbiased estimator of σ²
The assumption of normality of the distribution of the error term.
We have already derived that:
var(β̂) = σ̂²/Σx²
var(α̂) = σ̂²ΣX²/(nΣx²)
σ̂² = Σe²/(n − 2) = RSS/(n − 2)
For the purpose of estimating the parameters the assumption of normality is not used, but we use this assumption to test the significance of the parameter estimators, because the testing methods and procedures are based on the assumption of normality of the disturbance term. Hence, before we discuss the various testing methods, it is important to see whether the parameters are normally distributed or not.
We have already assumed that the error term is normally distributed with mean zero and variance σ², i.e., ui ~ N(0, σ²). Similarly, we also proved that Yi ~ N(α + βXi, σ²). Now we want to show the following:
1. β̂ ~ N(β, σ²/Σx²)
2. α̂ ~ N(α, σ²ΣX²/(nΣx²))
To show whether β̂ and α̂ are normally distributed or not, we need to make use of one property of the normal distribution: "any linear function of a normally distributed variable is itself normally distributed."
β̂ = ΣkiYi = k1Y1 + k2Y2 + .... + knYn
α̂ = ΣwiYi = w1Y1 + w2Y2 + .... + wnYn
Since α̂ and β̂ are linear functions of the normally distributed Yi, it follows that
β̂ ~ N(β, σ²/Σx²);  α̂ ~ N(α, σ²ΣX²/(nΣx²))
The OLS estimates α̂ and β̂ are obtained from a sample of observations on Y and X. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error and determine the degree of confidence in the validity of these estimates. This can be done by using various tests. The most common ones are:
i) Standard error test ii) Student's t-test iii) Confidence interval
All of these testing procedures lead to the same conclusion. Let us now see these testing methods one by one.
i) Standard error test
This test helps us decide whether the estimates α̂ and β̂ are significantly different from zero, i.e., whether the sample from which they have been estimated might have come from a population whose true parameters are zero (α = 0 and/or β = 0).
Formally we test the null hypothesis
H0: βi = 0 against the alternative hypothesis H1: βi ≠ 0
First: compute the standard errors of the parameters:
SE(β̂) = √var(β̂)
SE(α̂) = √var(α̂)
Second: compare the standard errors with the numerical values of α̂ and β̂.
Decision rule:
If SE(β̂i) > ½|β̂i|, accept the null hypothesis and reject the alternative hypothesis.
If SE(β̂i) < ½|β̂i|, reject the null hypothesis and accept the alternative hypothesis.
Numerical example: Suppose that from a sample of size n=30, we estimate the following
supply function.
Q = 120 + 0.6P + eᵢ

SE: (1.7) (0.025)

Test the significance of the slope parameter at the 5% level of significance using the standard error test.

SE(β̂) = 0.025

β̂ = 0.6

½β̂ = 0.3

Since SE(β̂) = 0.025 < ½β̂ = 0.3, we reject the null hypothesis: the slope parameter is statistically significant.
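The decision in the example above can be spelled out in a few lines, using the estimated values from the supply function:

```python
# The standard error test above, spelled out for the supply function
# (beta-hat = 0.6, SE = 0.025 from the example).
beta_hat = 0.6
se_beta = 0.025
half_beta = beta_hat / 2            # (1/2) * beta-hat = 0.3

# Reject H0: beta = 0 when SE(beta-hat) < (1/2) * beta-hat
significant = se_beta < half_beta   # True: the slope is significant
```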
(Here sₓ = √[Σ(X − X̄)²/(n − 1)] denotes the sample standard deviation of X, where n is the sample size.)
ii) Student's t-test

We can derive the t-values of the OLS estimates:

t_β̂ = β̂ / SE(β̂)

t_α̂ = α̂ / SE(α̂)

with n − k degrees of freedom,

where:

SE = standard error

k = number of parameters in the model.
Since we have two parameters in simple linear regression with an intercept different from zero, our degrees of freedom are n − 2. Like the standard error test, we formally test the hypothesis H₀: βᵢ = 0 against the alternative H₁: βᵢ ≠ 0 for the slope parameter, and H₀: α = 0 against the alternative H₁: α ≠ 0 for the intercept. The test statistic under the null hypothesis is

t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂)
Step 2: Choose the level of significance. The level of significance is the probability of making a 'wrong' decision, i.e. the probability of rejecting the null hypothesis when it is actually true, or the probability of committing a type I error. It is customary in econometric research to choose the 5% or the 1% level of significance. This means that in making our decision we allow (tolerate) five times out of a hundred to be 'wrong', i.e. to reject the hypothesis when it is actually true.

Step 3: Check whether it is a one-tail or a two-tail test. If the inequality sign in the alternative hypothesis is ≠, then it implies a two-tail test: divide the chosen level of significance by two and decide the critical region or critical value of t, called t_c. But if the inequality sign is either > or <, then it indicates a one-tail test and there is no need to divide the chosen level of significance by two to obtain the critical value of t from the t-table.
Example:

If we have H₀: βᵢ = 0

against H₁: βᵢ ≠ 0

then this is a two-tail test. If the level of significance is 5%, divide it by two to obtain the critical value of t from the t-table.

Step 4: Obtain the critical value of t, called t_c, at α/2 and n − 2 degrees of freedom for a two-tail test. Since the alternative hypothesis (H₁) is stated with an inequality sign (≠), it is a two-tail test; hence we divide α/2 = 0.05/2 = 0.025 to obtain the critical value of t, and compare it with the computed t*.
iii) Confidence interval

Rejection of the null hypothesis doesn't mean that our estimates α̂ and β̂ are the correct estimates of the true population parameters α and β. It simply means that our estimates come from a sample drawn from a population whose parameters are different from zero. In order to define how close the estimates are to the true parameters, we must construct a confidence interval for the true parameter; in other words, we must establish limiting values around the estimate within which the true parameter is expected to lie with a certain "degree of confidence". In this respect we say that with a given probability the population parameter will be within the defined confidence interval (confidence limits).
The value of t that leaves a probability of α/2 in each tail at n − 2 degrees of freedom satisfies

Pr(−t_c < t* < t_c) = 1 − (α/2 + α/2), i.e. 1 − α …………………………(2.57)

but t* = (β̂ − β)/SE(β̂) …………………………………………………….(2.58)

Substituting (2.58) into (2.57) we obtain the following expression:

Pr[−t_c < (β̂ − β)/SE(β̂) < t_c] = 1 − α ………………………………………..(2.59)

Pr[−SE(β̂)t_c < β̂ − β < SE(β̂)t_c] = 1 − α   by multiplying by SE(β̂)

Pr[−β̂ − SE(β̂)t_c < −β < −β̂ + SE(β̂)t_c] = 1 − α   by subtracting β̂

Pr[β̂ − SE(β̂)t_c < β < β̂ + SE(β̂)t_c] = 1 − α   by multiplying by −1

The limits within which the true β lies at the (1 − α)% degree of confidence are:

β̂ ± SE(β̂)·t_c
We test H₀: β = 0 against H₁: β ≠ 0.

Decision rule: If the hypothesized value of β in the null hypothesis lies within the confidence interval, accept H₀ and reject H₁; the implication is that β̂ is statistically insignificant. If the hypothesized value of β in the null hypothesis lies outside the confidence interval, reject H₀ and accept H₁; the implication is that β̂ is statistically significant.
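The resulting interval can be illustrated with the same supply-function numbers (β̂ = 0.6, SE = 0.025, t_c = 2.048 at 28 degrees of freedom); this sketch is not part of the handout's worked example:

```python
# A 95% confidence interval for beta, using the supply-function
# numbers (beta-hat = 0.6, SE = 0.025, t_c = 2.048 at 28 df).
beta_hat = 0.6
se_beta = 0.025
t_c = 2.048

lower = beta_hat - se_beta * t_c   # beta-hat - SE(beta-hat) * t_c
upper = beta_hat + se_beta * t_c   # beta-hat + SE(beta-hat) * t_c

# H0: beta = 0 is rejected because zero lies outside the interval
contains_zero = lower <= 0 <= upper
```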
F* = [ESS/(k − 1)] / [RSS/(n − k)] = [Σŷ²/(k − 1)] / [Σu²/(n − k)] = [r²_YX/(k − 1)] / [(1 − r²_YX)/(n − k)]
Suppose you are given the following regression line and intermediate results for 25 sample observations:

Ŷ = 89 + 2.88X

Se: (38.4) (0.85)

r² = 0.76

Σuᵢ² = 135
In order to compile the ANOVA table based on the information given, we need to obtain ESS (between the means) and TSS as follows.
r² = 1 − Σuᵢ²/Σy²

0.76 = 1 − 135/Σy²

We know that if r² = 0.76, the unexplained variation will be 1 − 0.76 = 0.24.

Thus, the TSS will be obtained as

135/Σy² = 0.24

Σy²(0.24) = 135

Σy² = 135/0.24 = 562.5
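The arithmetic above amounts to two lines of computation:

```python
# Recovering TSS and ESS from r^2 = 0.76 and RSS = 135
# (the handout's example).
rss = 135.0
r2 = 0.76

tss = rss / (1 - r2)   # 135 / 0.24 = 562.5
ess = tss - rss        # 562.5 - 135 = 427.5
```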
Hence, the ESS is 562.5 − 135 = 427.5. To appraise the findings from the regression analysis we construct the ANOVA table, obtain F*, and compare the value of F* with the tabulated F value as follows.
Table 2.4: ANOVA table for the two-variable regression model

Source of variation        Sum of squares   Degrees of freedom    MSE                  F*
Between the means          Σŷ² = 427.5      V₁ = (k − 1) = 1      427.5/1 = 427.5      F* = 427.5/5.869565
(due to regression, ESS)                                                               = 72.83333
Within the sample          Σeᵢ² = 135       V₂ = (n − k) = 23     135/23 = 5.869565
(due to residuals, RSS)
Total                      Σyᵢ² = 562.5     (n − 1) = 24          562.5/24 = 23.4375

The tabulated F₀.₉₅ with V₁ = 1 and V₂ = 23 is 4.28.
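The table's last column can be reproduced directly from ESS, RSS, and the degrees of freedom:

```python
# F* computed from the ANOVA table entries above
# (ESS = 427.5, RSS = 135, k = 2 parameters, n = 25 observations).
ess, rss = 427.5, 135.0
k, n = 2, 25

mse_regression = ess / (k - 1)          # 427.5 / 1 = 427.5
mse_residual = rss / (n - k)            # 135 / 23 = 5.869565...
f_star = mse_regression / mse_residual  # about 72.83

f_table = 4.28                          # F_0.95(1, 23) from the F-table
jointly_significant = f_star > f_table  # True
```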
To decide whether to reject or accept that the parameter estimates are jointly statistically significant, we need to compare the calculated F* with the tabulated F for a given confidence level. The F-distribution is tabulated for the upper 5% point (F₀.₉₅) and for the upper 1% point (F₀.₉₉). For a 95% confidence level, the decision rule for the F test is to reject the null hypothesis if F* > F₀.₉₅.

For instance, the tabulated F at the 95% confidence level, F₀.₉₅(1, 23), is 4.28. Note that, while reading the F distribution table, the first number in the bracket next to F₀.₉₅ refers to the degrees of freedom for the numerator of the F* formula in equation (5.1.5), or the first row, last column in ANOVA table 2.4, and the second entry is for the denominator of the F* formula, or the second row in the ANOVA table. Moreover, you should read horizontally (from left to right) for the first entry in the bracket and downward for the second entry.
F* = [Σŷ²/(K − 1)] / [Σuᵢ²/(n − K)] = [r²_Y,X/(K − 1)] / [(1 − r²_Y,X)/(n − K)]
In both methods we construct an ANOVA table from which we may compute the F ratio and use it for testing hypotheses related to the aim of the study.

It can be proved that for a test on an individual regression coefficient the t and F tests are formally equivalent, with t² = F.
Regression analysis is more powerful than ANOVA when analyzing market data, which are not experimental. Regression analysis provides all the information which we may obtain from the method of ANOVA; furthermore, it provides numerical estimates of the influence of each explanatory variable, whereas the ANOVA approach shows only the addition to the explanation of the total variation which is obtained by the introduction of an additional variable into the relationship. This is the only information provided by ANOVA. Hence, it is often argued that the ANOVA approach is more appropriate for the study of the influence of qualitative variables on a certain variable. This is because qualitative variables (like profession, sex, and religion) do not have numerical values, and hence their influence cannot be measured by regression analysis, while the ANOVA technique does not require knowledge of the values of the X's but depends solely on the values of Y. This argument, however, lost much of its merit with the expansion of dummy variables in regression analysis.
The results of the regression analysis are reported in conventional formats. It is not sufficient merely to report the estimates of the β's. In practice we report the regression coefficients together with their standard errors and the value of R². It has become customary to present the estimated equations with standard errors placed in parentheses below the estimated parameter values. Sometimes the estimated coefficients, the corresponding standard errors, the p-values, and some other indicators are presented in tabular form. These results are supplemented by R² (to the right side of the regression equation).

Example:

Ŷ = 128.5 + 2.88X,  R² = 0.93
    (38.2)  (0.85)

The numbers in the parentheses below the parameter estimates are the standard errors. Some econometricians report the t-values of the estimated coefficients in place of the standard errors.