Econometrics I Handout
The first section deals with the concept of a regression model. It also distinguishes between single and multiple regression models and discusses the basic assumptions about the Xi and ui in the classical linear regression model (CLRM). In section two we introduce the method of least squares with the aid of the two-variable linear regression model, a model in which the dependent variable is expressed as a linear function of only a single explanatory variable. OLS requires that we choose α̂ and β̂ as sample estimates of α and β, respectively, so that Σûi² is as small as possible. To minimize Σûi² we differentiate it with respect to α̂ and β̂ and equate the first-order partial derivatives to zero.
The third section defines the coefficient of determination and discusses the procedure for obtaining and interpreting it, along with the desirable properties of the OLS estimators and the Gauss-Markov theorem. Finally, it illustrates the procedures for testing hypotheses, performing the student's t test and constructing confidence intervals. It also provides a comparison between regression analysis and ANOVA.
Introduction
Regression analysis is one of the most commonly used tools in econometric analysis. Thus we start with a basic question: what is regression analysis? Regression¹ analysis is concerned with describing and explaining the relationship between a given variable (often called the dependent or explained variable) and one or more other variables (often called the explanatory or independent variables). We denote the dependent variable by Y and the explanatory variables by X1, X2, ..., Xk. If k = 1, that is, there is only one explanatory variable (or one of the X variables), we have what is known as simple regression. This is what we discuss in this unit. On the other hand, if k > 1, that is, there is more than one explanatory variable (or more of the X variables), we have what is known as multiple regression, which will be discussed in depth in unit 3.

¹ Note that the dictionary definition of "regression" is 'backward movement, a retreat, a return to an earlier stage of development'. Paradoxical as it may sound, regression analysis as it is currently used has nothing to do with the way dictionaries define the term.
As mentioned earlier, in simple regression the relationship between one dependent variable, denoted by Y, and one explanatory variable, denoted by X, is given by
Y = f(X) ------------------------------------------------ (2.1.1)
where f(X) is a function of X.
In regression analysis we are concerned with stochastic relationships, not deterministic ones. As an example, suppose that the relationship between sales (Y) and advertising expenditure (X) is
Y = 2500 + 100X − X²
This is a deterministic relationship. The sales for different levels of advertising expenditure can be determined exactly as follows:
X Y
0 2500
20 4100
50 5000
100 2500
On the other hand, suppose that the relationship between sales (Y) and advertising expenditure (X) is
Y = 2500 + 100X − X² + u
where u = +500 with probability ½
and u = −500 with probability ½.
Then the values of Y for different values of X cannot be determined exactly but can be described probabilistically. For example, if advertising expenditure is 50, sales will be 5500 with probability ½ and 4500 with probability ½. Thus the values of Y for different values of X are as follows:
X Y
0 2000 or 3000
20 3600 or 4600
50 4500 or 5500
100 2000 or 3000
The data on Y that we observe can be any one of the 2⁴ = 16 possible combinations of these values. For instance, we can have
X Y
0 2000
20 4600
50 5500
100 2000
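The two-point stochastic relationship above can be simulated directly. The following Python sketch (an illustration, not part of the handout) draws u = ±500 with probability ½ each and compares one realisation of the observed data with the deterministic values:

```python
import random

# Deterministic part of the sales/advertising relationship from the text:
# Y = 2500 + 100*X - X**2, plus a two-point error u = +500 or -500,
# each with probability 1/2.
def sales(x, rng=random):
    u = rng.choice([500, -500])  # stochastic component
    return 2500 + 100 * x - x ** 2 + u

# Deterministic values for comparison
deterministic = {x: 2500 + 100 * x - x ** 2 for x in (0, 20, 50, 100)}
print(deterministic)  # {0: 2500, 20: 4100, 50: 5000, 100: 2500}

# One possible realisation of the observed data
observed = {x: sales(x) for x in (0, 20, 50, 100)}
print(observed)       # each value is the deterministic value plus or minus 500
```

Every run of the sketch produces one of the possible combinations of Y values listed above.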
If the error term has a continuous distribution, say a normal distribution with mean 0 and variance 1, then for each value of X we have a normal distribution for Y, and the value of Y we observe can be any observation from this distribution. For instance, if the relationship between Y and X is
Y = 2 + X + u
where the error term u is N(0, 1), then for each value of X, Y will have a normal distribution. This is shown in Figure 2.1. The line drawn is the deterministic relationship Y = 2 + X.
Figure 2.1: A stochastic relationship (the line Y = 2 + X, with the observed values of Y scattered vertically around it for each X)
The actual values of Y for each X will be some points on the vertical lines shown. The
relationship between Y and X in such cases is called a stochastic or statistical relationship.
Going back to equation (2.1.1), we will assume that the function f(X) is linear in X, that is,
f(X) = α + βX
and we will assume that this relationship is stochastic. That is,
Y = α + βX + u ----------------------------------------- (2.1.2)
where u, which is called an error or disturbance term, has a known probability distribution (i.e., it is a random variable). In equation (2.1.2), α + βX is the deterministic component of Y and u is the stochastic or random component; α and β are called regression coefficients or regression parameters, which we estimate from the data on Y and X.
There is no reason why the deterministic and stochastic components must be additive. But we will start our discussion with a simple model and introduce complications later. For this reason we have taken f(X) to be a linear function and have assumed an additive error.
Why should we add an error term, and what are the sources of the error term (u) in equation (2.1.2)? There are three main sources of the error term (u):
1) The unpredictable element or randomness in human behavior,
2) The effect of a large number of variables that have been omitted, and
3) Measurement error in Y.
If we have n observations on Y and X, we can write the simple regression model (2.1.2) by adding subscripts as
Yi = α + βXi + ui,  i = 1, 2, ..., n -------------------- (2.1.3)
Furthermore, the above regression equation can be rearranged for the error term as
ui = Yi − α − βXi --------------------------------------- (2.2)
population, or how close Ŷ is to the true E(Y | Xi). To this end we must not only specify the functional form of the regression model as in equation (2.1.3) but also make some assumptions about the manner in which Yi is generated. To see why this is needed, look at the PRF: Yi = α + βXi + ui. It shows that Yi depends on Xi and ui. Therefore, unless we specify how Xi and ui are generated, there is no way we can make any statistical inference about Yi or, as we shall see, about α and β.
Therefore, the assumptions we make about the Xi and ui are extremely critical to the valid interpretation of the regression estimates.
1) Zero mean value of the disturbance (ui). That is, given the value of X, the mean or expected value of the disturbance term is zero. Technically, the conditional mean value of ui is zero. Symbolically,
E(ui) = 0 for all i, or
E(ui | Xi) = 0
2) Homoscedasticity or equal variance of ui. That is, given the value of X, the variance of ui is the same for all observations; the conditional variances of ui are identical. Symbolically,
var(ui | Xi) = E[ui − E(ui | Xi)]²
             = E(ui² | Xi)
so that var(ui) = σ² for all i
3) Independence or no autocorrelation between the disturbance terms. Given any two X values Xi and Xj (i ≠ j), the correlation between ui and uj (i ≠ j) is zero. Symbolically,
cov(ui, uj | Xi, Xj) = E{[ui − E(ui)] | Xi}{[uj − E(uj)] | Xj}
                     = E(ui | Xi)E(uj | Xj) = 0
where i and j are different observations.
4) Independence of uj and Xj; that is, uj and Xj are independent, or E(Xjuj) = 0 for all j. Symbolically,
cov(uj, Xj) = E[uj − E(uj)][Xj − E(Xj)]
            = E[uj(Xj − E(Xj))], since E(uj) = 0
            = E(ujXj) − E(Xj)E(uj), since E(Xj) is nonstochastic
            = E(ujXj), since E(uj) = 0
            = 0 by assumption
Assumption (4) says that the disturbance term uj and the explanatory variable Xj are uncorrelated.
5) Normality. That is, the ui are normally distributed for all i. In conjunction with assumptions 1, 2, and 3, this implies that the ui are independently and normally distributed with mean zero and a common variance σ². We write this as ui ~ IN(0, σ²).
6) Linear regression model. That is, the regression model is linear in the parameters, though it may not be linear in the variables. That is, α and β appear with power 1 only and are not multiplied or divided by other parameters (no terms such as αβ or β/α).
7) The X values are fixed in repeated sampling. Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be nonstochastic.
8) The number of observations (n) must be greater than the number of parameters (k) to be estimated. Alternatively, the number of observations should be greater than the number of explanatory variables.
9) Variability of X values. That is, the X's in a given sample must not all be the same. Technically, var(X) must be a finite positive number.
10) The regression model is correctly specified. Alternatively, there is no specification bias or error in the model used in the empirical analysis. That is, the variables included in the model, the functional form, and the statistical assumptions should be correct.
Under the first four assumptions, we can show that the method of least squares gives estimators that are unbiased and have minimum variance among the class of linear unbiased estimators. As for the normality assumption, we retain it because we will make inferences on the basis of the normal distribution as well as the t and F distributions. The first assumption is also retained throughout. Since E(ui) = 0, we can also write equation (2.1.3) as
E(Yi) = α + βXi ------------------------------------- (2.1.5)
This is also often termed the population regression function/model. When we substitute estimates of the parameters α and β into this, we get the sample regression function/model as
Ŷi = α̂ + β̂Xi ------------------------------------ (2.1.6)
SECTION TWO: THE METHOD OF LEAST SQUARES
Introduction
The method of ordinary least squares (OLS) has some attractive statistical properties that have made it one of the most powerful and popular methods of regression analysis. OLS requires that we choose α̂ and β̂ as sample estimates of α and β, respectively, so that Σûi² is as small as possible. To minimize Σûi² we differentiate it with respect to α̂ and β̂ and equate the first-order partial derivatives to zero.
However, as we noted above, the PRF is not directly observable. We estimate it from the sample regression function (SRF)
Yi = α̂ + β̂Xi + ûi ------------------------------------- (2.2.2)
Yi = Ŷi + ûi --------------------------------------------- (2.2.3)
where Ŷi is the estimated (conditional mean) value of Yi.
To understand how the SRF is determined, we first rearrange equation (2.2.3) as
ûi = Yi − Ŷi
ûi = Yi − α̂ − β̂Xi ------------------------------------- (2.2.4)
the following criterion: choose the SRF in such a way that the sum of the residuals, Σûi = Σ(Yi − Ŷi), is as small as possible. Graphically:
Figure 2.2: Residuals û1, û2, û3 and û4 about the SRF Ŷt = α̂ + β̂Xt
If we adopt the criterion of minimizing Σûi = Σ(Yi − Ŷi), figure (2.2) shows that the residuals û2 and û3 as well as û1 and û4 receive the same weight in the sum (û1 + û2 + û3 + û4), although the first two residuals are much closer to the SRF than the latter two. In other words, all the residuals receive equal importance no matter how close or how widely scattered the individual observations are from the SRF. Let the residuals û1, û2, û3, and û4 in figure (2.2) assume the values 10, −2, +2, and −10, respectively. The algebraic sum of these residuals is zero even though û1 and û4 are scattered more widely about the SRF than û2 and û3.
We can avoid this problem if we adopt the least-squares criterion, which states that the SRF should be fixed in such a way that
Σûi² = Σ(Yi − Ŷi)²
     = Σ(Yi − α̂ − β̂Xi)² ------------------------------ (2.2.5)
is as small as possible, where ûi² are the squared residuals. By squaring ûi, this method gives more weight to residuals such as û1 and û4 in figure (2.2) than to residuals such as û2 and û3. As noted previously, under the minimum Σûi criterion the sum can be small even though the ûi are widely spread about the SRF. But this is not possible under the least-squares procedure, for the larger the ûi (in absolute value), the larger the Σûi². A further justification for the least-squares method is that the estimators obtained by OLS have some very desirable statistical properties.
The partial derivatives of Σûi² with respect to α̂ and β̂ are:
∂Σûi²/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) = 0
⇒ 2ΣYi = 2nα̂ + 2β̂ΣXi ------------------------------------------ (2.2.6b)
∂Σûi²/∂β̂ = −2ΣXi(Yi − α̂ − β̂Xi) = 0
⇒ 2ΣXiYi = 2α̂ΣXi + 2β̂ΣXi² -------------------------------- (2.2.6c)
Equations (2.2.6b) and (2.2.6c) are called the normal equations. The first normal equation enables us to obtain the constant term of the simple regression model. Dividing both sides of the first normal equation, ΣYi = nα̂ + β̂ΣXi, by n we get
ΣYi/n = nα̂/n + β̂ΣXi/n
Ȳ = α̂ + β̂X̄ -------------------------------------- (2.2.8a)
Rearranging the above equation, we get the expression for the constant term of the simple regression model as
α̂ = Ȳ − β̂X̄ --------------------------- (2.2.8b)
Substituting the value of α̂ from equation (2.2.8b) into (2.2.6c) we get
ΣXiYi = ΣXi(Ȳ − β̂X̄) + β̂ΣXi² --------------- (2.2.9)
Note that ΣXi = nX̄ and ΣYi = nȲ. Substituting these into equation (2.2.9) we have
ΣXiYi = nX̄(Ȳ − β̂X̄) + β̂ΣXi²
ΣXiYi = nX̄Ȳ − β̂nX̄² + β̂ΣXi²
ΣXiYi − nX̄Ȳ = β̂(ΣXi² − nX̄²) ------------------------------ (2.2.10)
Finally, we obtain the expression for the least-squares estimate of β, the slope of the simple regression model, as
β̂ = (ΣXiYi − nX̄Ȳ)/(ΣXi² − nX̄²) ----------------------------- (2.2.11)
Let us define the deviations of the variables from their means, using lowercase letters, as follows:
Σyi² = Σ(Yi − Ȳ)² = ΣYi² − nȲ² --------------------------- (a)
Σxi² = Σ(Xi − X̄)² = ΣXi² − nX̄² ------------------------ (b)
Σxiyi = Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − nX̄Ȳ ------------- (c)
In this notation, equation (2.2.11) becomes
β̂ = Σ(Yi − Ȳ)(Xi − X̄)/Σ(Xi − X̄)² = Σxiyi/Σxi² ---------------------- (2.2.12)
where X̄ and Ȳ are the sample means of X and Y. We have also defined xi = (Xi − X̄) and yi = (Yi − Ȳ). Henceforth, we adopt the convention of letting lowercase letters denote deviations from the mean value. Finally, if Y is the dependent variable and X is an explanatory variable, then the sample regression function (SRF) of Y on X is written formally as
Ŷi = α̂ + β̂Xi --------------------------------------------- (2.2.13)
Proof: We can prove the equality of the numerators (and, similarly, the denominators) of equations (2.2.12) and (2.2.11):
Σxiyi = Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − ȲΣXi − X̄ΣYi + nX̄Ȳ
Recall that ΣXi = nX̄ and ΣYi = nȲ. Substituting these into the above expression we get
Σxiyi = ΣXiYi − nX̄Ȳ − nX̄Ȳ + nX̄Ȳ
The last two terms cancel out. As a result, we are left with
Σxiyi = ΣXiYi − nX̄Ȳ
This is also equal to
Σxiyi = ΣXiYi − (ΣXi)(ΣYi)/n
Example 1: Consider the data in Table (2.1). The data are 12 daily observations on the quantity of oranges supplied by a firm and the price per kilo. Given this information,
A) Find the regression of Y on X and determine the relationship between quantity supplied and the price of a kilo of oranges.
B) Estimate the elasticity from the estimated regression line.
Table 2.1: Supply of oranges by a firm
X (price)            9   12    6   10    9   10    7    8   12    6   11    8
Y (qty. supplied)   69   76   52   56   57   77   58   55   67   53   72   64
Solution:
A) Parameter estimates
To find the parameter estimates, we first compute ΣXi, ΣYi, ΣXiYi, ΣXi², and ΣYi², as well as the means X̄ and Ȳ.
Second, we compute the three quantities defined earlier, based on the data in the table above, as follows:
(1) Σxiyi = Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − nX̄Ȳ = 6960 − 12(9)(63) = 6960 − 6804 = 156
(2) Σxi² = Σ(Xi − X̄)² = ΣXi² − nX̄² = 1020 − 12(9)² = 1020 − 972 = 48
(3) Σyi² = Σ(Yi − Ȳ)² = ΣYi² − nȲ² = 48522 − 12(63)² = 48522 − 47628 = 894
Third, we obtain the least-squares estimates of the parameters β and α using equations (2.2.12) and (2.2.8b), respectively, as follows:
β̂ = (ΣXiYi − nX̄Ȳ)/(ΣXi² − nX̄²) = 156/48 = 3.25
α̂ = Ȳ − β̂X̄ = 63 − 3.25(9) = 63 − 29.25 = 33.75
Finally, with α̂ and β̂ estimated as sample counterparts of α and β, the estimated supply function of Y on X, or the sample regression function (SRF), is expressed formally as
Ŷi = 33.75 + 3.25Xi
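As a check, the whole of part A can be reproduced in a few lines of Python (an illustrative sketch, not part of the original handout), using the deviation formulas β̂ = Σxiyi/Σxi² and α̂ = Ȳ − β̂X̄:

```python
# OLS estimates for the orange-supply data in Table 2.1
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]          # price per kilo
Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]   # quantity supplied

n = len(X)
x_bar = sum(X) / n                                       # 9
y_bar = sum(Y) / n                                       # 63
sxy = sum(xi * yi for xi, yi in zip(X, Y)) - n * x_bar * y_bar   # 156
sxx = sum(xi ** 2 for xi in X) - n * x_bar ** 2                  # 48

beta_hat = sxy / sxx                   # slope estimate, 3.25
alpha_hat = y_bar - beta_hat * x_bar   # intercept estimate, 33.75
print(alpha_hat, beta_hat)             # 33.75 3.25
```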
The finding indicates that the slope parameter is plausible on a priori economic criteria. That is, since economic theory postulates a direct or positive relationship between supply and price, our finding that the coefficient β̂ is positive (β̂ > 0) implies that the parameter has the expected sign. Another a priori economic criterion to be considered before proceeding to the next steps in regression is the size of the estimated coefficients. That is, does economic theory support a response of quantity supplied as large as the β̂ = 3.25 obtained in our example? The coefficient can be as large as this if the market is characterized by something other than monopoly or duopoly. By contrast, had the regression coefficient been the marginal propensity to consume (MPC) or the marginal propensity to save (MPS) out of income, economic theory would not support a coefficient greater than or equal to one (β̂ ≥ 1).
Therefore, we can conclude that in regression analysis priority should always be given to the fulfillment of the a priori economic criteria (the sign and size of the estimates). Hence, one should proceed with the application of first-order and second-order tests of significance only when the a priori economic criteria are satisfied.
B) The elasticity
After estimating the parameters α̂ and β̂, we can estimate the elasticity from the estimated regression line. We have said that the estimated function
Ŷi = α̂ + β̂Xi
is the equation of a line whose Y intercept is α̂ and whose slope is β̂. The coefficient β̂ is the derivative of Y with respect to X; that is,
β̂ = dY/dX ------------------------------------------------------ (2.2.14)
Equation (2.2.14) shows the rate of change in Y as X changes by a very small amount. It should be noted that the estimated function we use as an example is a linear supply function. The coefficient β̂, which is dY/dX, is not the price elasticity itself but a component of the elasticity, which is defined by the formula
ηP = (dY/Y)/(dX/X) = (dY/dX)·(X/Y) ---------------------------------------- (2.2.15)
where ηP = price elasticity, Y = quantity supplied, and X = price.
Figure 2.3: The estimated parameters (α̂ and β̂) in the regression line SRF: Ŷi = α̂ + β̂Xi
Clearly β̂ = dY/dX is a component of the elasticity. From an estimated function we obtain an average elasticity as
ηP = β̂ · (X̄/Ȳ) ------------------------------------------------ (2.2.16)
where X̄ = the average (mean) price in the sample, and Ȳ = the average quantity supplied in the sample. Therefore, substituting the values of β̂, X̄, and Ȳ we have already obtained into equation (2.2.16), the price elasticity of our supply function is
ηP = 3.25 × (9/63) = 0.46
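The same computation in Python (an illustrative sketch, using the estimates from Example 1):

```python
# Average price elasticity at the sample means, eta = beta_hat * (x_bar / y_bar)
beta_hat, x_bar, y_bar = 3.25, 9, 63   # slope estimate and sample means from Example 1
eta = beta_hat * x_bar / y_bar
print(round(eta, 2))                   # 0.46
```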
SECTION THREE: RESIDUALS AND GOODNESS OF FIT
Introduction
After the estimation of the parameters and the determination of the least-squares regression line, we need, first, to judge the statistical reliability of the estimates using the standard errors of the parameter estimates and, second, to judge the explanatory power of the linear regression of Y on X using the goodness of fit (coefficient of determination), which measures the dispersion of the observations around the regression line.
In order to apply the standard tests of significance to the ordinary least squares (OLS) parameter estimates, we must compute their means, variances, and standard errors in turn. To begin with, we assume that we draw repeated samples of size n from the population of Y and X, and for each sample we estimate the parameters α̂ and β̂. This is known as the hypothetical repeated sampling procedure.
a. Linearity (for β̂)
Proposition: α̂ and β̂ are linear in Y.
For β̂ we have
β̂ = Σxiyi/Σxi² = Σxi(Yi − Ȳ)/Σxi² = (ΣxiYi − ȲΣxi)/Σxi² = ΣxiYi/Σxi²
(since Σxi = Σ(Xi − X̄) = ΣXi − nX̄ = nX̄ − nX̄ = 0).
Now let ki = xi/Σxi² (i = 1, 2, ..., n). Then
β̂ = ΣkiYi ------------------------------------------------ (2.19)
⇒ β̂ is linear in Y.
For α̂, substituting β̂ = ΣkiYi into α̂ = Ȳ − β̂X̄ gives
α̂ = ΣYi/n − X̄ΣkiYi = Σ(1/n − X̄ki)Yi
⇒ α̂ is also linear in Y.
Mean of the OLS estimates: The mean of the slope estimator (β̂) is given by
E(β̂) = β ---------------------------------------------------- (2.3.1)
We know that the slope estimator (β̂) in the simple regression model is obtained as
β̂ = Σxiyi/Σxi² ------------------------------------------------- (2.3.2)
We know also that the deviation of the dependent variable from its mean is designated by yi (lowercase) and is yi = (Yi − Ȳ). Substituting this in equation (2.3.2) above we obtain
β̂ = Σxi(Yi − Ȳ)/Σxi² = (ΣxiYi − ȲΣxi)/Σxi² -------------- (2.3.3)
Since by definition the sum of the deviations of a variable from its mean is identically equal to zero, we can eliminate the expression after the minus sign. In doing so, we get
β̂ = Σxiyi/Σxi² = ΣxiYi/Σxi² -------------------------------------- (2.3.4)
The values of X are assumed to be a set of fixed values, which do not change from sample to sample. Consequently, the ratio xi/Σxi² will be constant from sample to sample, and if we denote this ratio by ki we can write equation (2.3.4) in the form
β̂ = ΣkiYi ----------------------------------------------------------- (2.3.5)
The weights ki have the properties Σki = 0, ΣkiXi = 1, and Σki² = 1/Σxi².
Proof
Σki = Σxi/Σxi² = Σ(Xi − X̄)/Σxi² = 0
ΣkiXi = ΣxiXi/Σxi² = Σ(Xi − X̄)Xi/Σxi² = (ΣXi² − X̄ΣXi)/Σxi² = (ΣXi² − nX̄²)/Σxi² = Σxi²/Σxi² = 1
since
Σxi² = Σ(Xi − X̄)² = ΣXi² − 2X̄ΣXi + nX̄² = ΣXi² − nX̄², with X̄ = ΣXi/n
Substituting Yi = α + βXi + ui into equation (2.3.5) and using Σki = 0 and ΣkiXi = 1 gives
β̂ = Σki(α + βXi + ui) = αΣki + βΣkiXi + Σkiui = β + Σkiui ------- (2.3.7)
Taking the expected value of equation (2.3.7) and recalling that the Xi's (and hence the ki's) are fixed, we obtain
E(β̂) = E(β + Σkiui)
      = β + ΣkiE(ui)
      = β + Σki(0)
      = β ------------------------------------------------------------ (2.3.8)
Since the true parameter β is constant, E(β) = β. Moreover, since the mean of the error term ui is zero (E(ui) = 0), the second term on the right-hand side vanishes.
Equation (2.3.8) is thus read as: the mean of the OLS estimate β̂ is equal to the true value of the parameter, and hence β̂ is an unbiased estimator.
The mean of α̂: The process of computing the mean of the intercept term follows a similar procedure to that for the mean of the slope. Substituting Yi = α + βXi + ui into α̂ = Σ(1/n − X̄ki)Yi gives α̂ = α + Σ(1/n − X̄ki)ui, so that
E(α̂) = α
⇒ α̂ is an unbiased estimator of α.
The variance of the OLS estimates: It can be proved that the variance of the slope estimator (β̂) is
var(β̂) = E[β̂ − E(β̂)]² = E[β̂ − β]²
Since β̂ = β + Σkiui (equation 2.3.7), we have β̂ − β = Σkiui, and therefore
var(β̂) = E(Σkiui)² = Σki²E(ui²) + 2Σ(i≠j)kikjE(uiuj)
where E(ui²) = σ² (homoscedasticity) and E(uiuj) = 0 for i ≠ j (no autocorrelation), so that
var(β̂) = σ²Σki²
Using
Σki² = Σ(xi/Σxi²)² = Σxi²/(Σxi²)² = 1/Σxi²
we obtain
var(β̂) = σ² · 1/Σxi² = σ²/Σxi² ------------------------------------ (2.3.10a)
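The unbiasedness result E(β̂) = β and the variance formula var(β̂) = σ²/Σxi² can be checked by simulating the hypothetical repeated-sampling procedure described above. The following Python sketch is an illustration, not part of the handout; the values α = 33.75, β = 3.25 and σ = 2 are arbitrary choices for the simulation. It holds X fixed (assumption 7), redraws the errors many times, and compares the sample mean and variance of β̂ with β and σ²/Σxi²:

```python
import random

random.seed(1)
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]
n, alpha, beta, sigma = len(X), 33.75, 3.25, 2.0
x_bar = sum(X) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)   # sum of squared deviations, 48

betas = []
for _ in range(10000):
    # redraw only the normal errors; X stays fixed in repeated samples
    Y = [alpha + beta * xi + random.gauss(0, sigma) for xi in X]
    y_bar = sum(Y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
    betas.append(b)

mean_b = sum(betas) / len(betas)
var_b = sum((b - mean_b) ** 2 for b in betas) / len(betas)
print(mean_b)                     # close to the true beta, 3.25
print(var_b, sigma ** 2 / sxx)    # both close to 4/48 = 0.0833...
```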
The variance of α̂: It can also be proved that
var(α̂) = E[α̂ − E(α̂)]² = E[α̂ − α]²
Since α̂ = Σ(1/n − X̄ki)Yi, and var(Yi) = E[Yi − E(Yi)]² = E(ui²) = σ², we have
var(α̂) = Σ(1/n − X̄ki)² var(Yi)
        = σ²Σ(1/n − X̄ki)²
        = σ²(Σ1/n² − (2X̄/n)Σki + X̄²Σki²) ------------------------ (3)
Since Σki = 0 and Σki² = 1/Σxi², substituting these into (3) above we obtain
var(α̂) = σ²(1/n + X̄²/Σxi²) = σ²(Σxi² + nX̄²)/(nΣxi²) -------------- (4)
We have also proved earlier that Σxi² = Σ(Xi − X̄)² = ΣXi² − nX̄², so that Σxi² + nX̄² = ΣXi². Thus, substituting this into equation (4), the expression becomes
var(α̂) = σ²ΣXi²/(nΣxi²) ------------------------------------ (2.3.10b)
We have computed the variances of the OLS estimators. Now it is time to check whether these variances possess the minimum variance property compared with the variances of other linear unbiased estimators of the true α and β, other than α̂ and β̂.
1. Minimum variance of β̂
Suppose β* is an alternative linear and unbiased estimator of β, and let
β* = ΣwiYi ......................................... (2.29)
where wi ≠ ki; say wi = ki + ci, with the ci an arbitrary set of constants. Then
β* = Σwi(α + βXi + ui), since Yi = α + βXi + ui
   = αΣwi + βΣwiXi + Σwiui
For β* to be unbiased we require Σwi = 0 and ΣwiXi = 1.
Therefore Σci = 0, since Σwi = Σki + Σci = 0 and Σki = 0.
Again, ΣwiXi = Σ(ki + ci)Xi = ΣkiXi + ΣciXi.
Since ΣwiXi = 1 and ΣkiXi = 1, we have ΣciXi = 0 (and hence Σcixi = 0, so that Σkici = Σcixi/Σxi² = 0).
To check whether β̂ has minimum variance, let us compute var(β*) and compare it with var(β̂):
var(β*) = Σwi² var(Yi) = σ²Σwi²
Σwi² = Σ(ki + ci)² = Σki² + Σci² + 2Σkici = Σki² + Σci², since Σkici = 0
Therefore
var(β*) = σ²Σki² + σ²Σci² = var(β̂) + σ²Σci²
Given that the ci are arbitrary constants, σ²Σci² is non-negative, so var(β*) ≥ var(β̂). This proves that β̂ possesses the minimum variance property. In a similar way we can prove that the least-squares estimate of the intercept (α̂) possesses minimum variance.
2. Minimum variance of α̂
We take a new estimator α*, which we assume to be a linear and unbiased estimator of α. The least-squares estimator α̂ is given by
α̂ = Σ(1/n − X̄ki)Yi
By analogy with the proof of the minimum variance property of β̂, let us use the weights wi = ki + ci. Consequently,
α* = Σ(1/n − X̄wi)Yi
Since we want α* to be an unbiased estimator of the true α, that is, E(α*) = α, we substitute Yi = α + βXi + ui in α* and find the expected value of α*:
α* = Σ(1/n − X̄wi)(α + βXi + ui)
   = Σ(α/n + βXi/n + ui/n − αX̄wi − βX̄wiXi − X̄wiui)
   = α + βX̄ + Σui/n − αX̄Σwi − βX̄ΣwiXi − X̄Σwiui
For α* to be an unbiased estimator of the true α, the following must hold:
Σwi = 0 and ΣwiXi = 1, which imply Σci = 0 and ΣciXi = 0.
Then
var(α*) = Σ(1/n − X̄wi)² var(Yi)
        = σ²Σ(1/n − X̄wi)²
        = σ²(Σ1/n² + X̄²Σwi² − (2X̄/n)Σwi)
        = σ²(1/n + X̄²Σwi²), since Σwi = 0
        = σ²[1/n + X̄²(Σki² + Σci²)]
        = σ²(1/n + X̄²/Σxi²) + σ²X̄²Σci²
        = σ²ΣXi²/(nΣxi²) + σ²X̄²Σci²
The first term in this expression is var(α̂); hence
var(α*) = var(α̂) + σ²X̄²Σci² ≥ var(α̂)
which proves that α̂ possesses the minimum variance property.
The variance of the random variable (ui): The formulas for the variances of α̂ and β̂ involve the variance of the random term of the population, σ². However, since the values of ui are not observable, the true variance cannot be computed. Nevertheless, we may obtain an unbiased estimate of the true variance σ² from the expression
σ̂² = Σûi²/(n − k) -------------------------------------------------------------- (2.3.11)
where ûi = Yi − Ŷi, or ûi = Yi − α̂ − β̂Xi, and k is the number of parameters in the regression function. In the case of our supply function k = 2 (i.e., α and β). The expected value of this estimated variance is equal to the true variance, and hence it is an unbiased estimate of σ². That is,
E(σ̂²) = σ² ------------------------------------------------------------- (2.3.12)
Expression (2.3.12), together with the assumption that the mean of the error term is zero [E(ui) = 0] and the normality assumption, gives ui ~ N(0, σ²).
The standard error test of the least-squares estimates: As mentioned earlier, the standard error test helps us decide whether the parameter estimates α̂ and β̂ are statistically different from zero. The standard errors are derived as
Se(β̂) = √var(β̂) = √(σ̂²/Σxi²) ----------------------------------------- (2.3.14a)
Se(α̂) = √var(α̂) = √(σ̂²ΣXi²/(nΣxi²)) -------------------------------------- (2.3.14b)
Since σ² is not observable, for it belongs to the population regression function, we replace it by its estimate σ̂², obtained from the definition in equation (2.3.11):
σ̂² = Σûi²/(n − k)
Example 2: Estimate the variances as well as the standard errors of the parameter estimates for our supply function.
Solution: In our example, we know that n = 12, k = 2, and Σûi² = 387. Therefore,
σ̂² = 387/(12 − 2) = 38.7
It follows that the variances as well as the standard errors (standard deviations) of the estimated parameters α̂ and β̂ can easily be obtained as follows:
var(β̂) = σ̂²/Σxi² = 38.7/48 = 0.8062
Se(β̂) = √var(β̂) = √0.8062 = 0.8979
var(α̂) = σ̂²ΣXi²/(nΣxi²) = 38.7 × 1020/(12 × 48) = 38.7 × 1020/576 = 68.5313
Se(α̂) = √var(α̂) = √68.5313 = 8.2784
The estimated supply function, with the standard errors in parentheses below the coefficients, is then
Ŷi = 33.75 + 3.25Xi
Se:  (8.2784)  (0.8979)
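The variance and standard-error computations above can be verified in Python (an illustrative sketch using the numbers given in the example):

```python
import math

# Standard errors for the supply regression
n, k = 12, 2
rss = 387        # sum of squared residuals, as given in the text
sxx = 48         # sum(x_i^2) in deviation form
sum_X2 = 1020    # sum(X_i^2) in raw form

sigma2_hat = rss / (n - k)                   # 38.7
var_beta = sigma2_hat / sxx                  # 0.80625
var_alpha = sigma2_hat * sum_X2 / (n * sxx)  # 68.53125
se_beta = math.sqrt(var_beta)                # ~0.8979
se_alpha = math.sqrt(var_alpha)              # ~8.2784
print(round(se_alpha, 4), round(se_beta, 4))
```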
Figure d: Actual and estimated values of the dependent variable Y (the total deviation Y − Ȳ split into the explained part Ŷ − Ȳ and the residual e = Y − Ŷ about the line Ŷ = α̂ + β̂X)
As can be seen from fig. (d) above, Y − Ȳ measures the variation of the sample observations of the dependent variable around the mean. The variation in Y that can be attributed to the influence of X (i.e., to the regression line) is given by the vertical distance Ŷ − Ȳ. The part of the total variation in Y about Ȳ that cannot be attributed to X is equal to e = Y − Ŷ, which is referred to as the residual variation.
In summary:
ei = Yi − Ŷi = deviation of the observation Yi from the regression line.
Now we may write the observed Y as the sum of the predicted value (Ŷ) and the residual term (ei):
Yi = Ŷi + ei
(observed Yi = predicted Yi + residual)
From equation (2.34) we can write the above equation in deviation form: y = ŷ + e. By squaring and summing both sides, we obtain the following expressions:
Σy² = Σ(ŷ + e)²
Σy² = Σ(ŷ² + e² + 2ŷe)
But Σŷe = Σe(Ŷ − Ȳ) = Σe(α̂ + β̂Xi − Ȳ) = α̂Σe + β̂ΣeXi − ȲΣe, and since Σe = 0 and ΣeXi = 0,
Σŷe = 0 ………………………………………………(2.46)
Therefore,
Σy² = Σŷ² + Σe² ………………………………...(2.47)
(total variation = explained variation + unexplained variation)
or
total sum of squares = explained sum of squares + residual sum of squares,
i.e.,
TSS = ESS + RSS ……………………………………….(2.48)
Mathematically, the coefficient of determination is the explained variation expressed as a fraction of the total variation:
R² = ESS/TSS = Σŷ²/Σy² ……………………………………….(2.49)
From equation (2.37) we have ŷ = β̂x. Squaring and summing both sides gives us
Σŷ² = β̂²Σx² ……………………………………….(2.50)
We can substitute (2.50) in (2.49) and obtain:
R² = ESS/TSS = β̂²Σx²/Σy² …………………………………(2.51)
   = (Σxy/Σx²)²(Σx²/Σy²), since β̂ = Σxiyi/Σxi²
   = (Σxy)²/(Σx²Σy²) ………………………………………(2.52)
Comparing (2.52) with the formula for the correlation coefficient,
r = cov(X, Y)/(σxσy) = Σxy/(nσxσy) = Σxy/(Σx²Σy²)^½ ………(2.53)
we see that R² = r², the square of the correlation coefficient.
The limits of R²: The value of R² falls between zero and one, i.e., 0 ≤ R² ≤ 1.
Interpretation of R²
Suppose R² = 0.9. This means that the regression line gives a good fit to the observed data, since this line explains 90% of the total variation of the Y values around their mean. The remaining 10% of the total variation in Y is unaccounted for by the regression line and is attributed to the factors captured by the disturbance term ui.
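Using the sums computed in Example 1 (Σxi² = 48, Σyi² = 894, β̂ = 3.25), the decomposition TSS = ESS + RSS and the resulting R² can be computed as follows (a Python sketch for illustration; note that the residual sum of squares obtained here is the Σûi² = 387 used earlier for σ̂²):

```python
# Goodness of fit for the supply regression: TSS = ESS + RSS, R^2 = ESS/TSS
beta_hat = 3.25
sxx, syy = 48, 894         # sum(x^2) and sum(y^2) in deviation form

ess = beta_hat ** 2 * sxx  # explained sum of squares, 507.0
tss = syy                  # total sum of squares
rss = tss - ess            # residual sum of squares, 387.0
r2 = ess / tss
print(rss, round(r2, 3))   # 387.0 0.567
```

So the regression line explains about 57% of the total variation in quantity supplied.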
2. TESTING THE SIGNIFICANCE OF OLS PARAMETERS
To test the significance of the OLS parameter estimators we need the following:
The variances of the parameter estimators
An unbiased estimator of σ²
The assumption of normality of the distribution of the error term.
We have already derived that:
var(β̂) = σ̂²/Σx²
var(α̂) = σ̂²ΣX²/(nΣx²)
σ̂² = Σe²/(n − 2) = RSS/(n − 2)
For the purpose of estimating the parameters the assumption of normality is not used, but we use this assumption to test the significance of the parameter estimators, because the testing methods and procedures are based on the assumption of normality of the disturbance term. Hence, before we discuss the various testing methods, it is important to see whether the parameters are normally distributed or not.
We have already assumed that the error term is normally distributed with mean zero and variance σ², i.e., ui ~ N(0, σ²). Similarly, we also proved that Yi ~ N(α + βXi, σ²). Now we want to show the following:
1. β̂ ~ N(β, σ²/Σx²)
2. α̂ ~ N(α, σ²ΣX²/(nΣx²))
To show whether β̂ and α̂ are normally distributed or not, we need to make use of one property of the normal distribution: "any linear function of a normally distributed variable is itself normally distributed."
β̂ = ΣkiYi = k1Y1 + k2Y2 + .... + knYn
α̂ = ΣwiYi = w1Y1 + w2Y2 + .... + wnYn
Since α̂ and β̂ are linear functions of the normally distributed Yi, it follows that
β̂ ~ N(β, σ²/Σx²);  α̂ ~ N(α, σ²ΣX²/(nΣx²))
The OLS estimates α̂ and β̂ are obtained from a sample of observations on Y and X. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error and determine the degree of confidence in the validity of these estimates. This can be done by using various tests. The most common ones are:
i) Standard error test ii) Student's t-test iii) Confidence interval
All of these testing procedures lead to the same conclusion. Let us now see these testing methods one by one.
i) Standard error test
This test helps us decide whether the estimates α̂ and β̂ are significantly different from zero, i.e., whether the sample from which they have been estimated might have come from a population whose true parameters are zero (α = 0 and/or β = 0).
Formally we test the null hypothesis
H0: βi = 0 against the alternative hypothesis H1: βi ≠ 0
First: compute the standard errors of the parameters:
SE(β̂) = √var(β̂)
SE(α̂) = √var(α̂)
Second: compare the standard errors with the numerical values of α̂ and β̂.
Decision rule:
If SE(β̂i) > ½|β̂i|, accept the null hypothesis and reject the alternative hypothesis.
If SE(β̂i) < ½|β̂i|, reject the null hypothesis and accept the alternative hypothesis.
Numerical example: Suppose that from a sample of size n=30, we estimate the following
supply function.
Q = 120 + 0.6P + eᵢ

SE: (1.7) (0.025)

Test the significance of the slope parameter at the 5% level of significance using the standard error test.

SE(β̂) = 0.025

β̂ = 0.6

½β̂ = 0.3

Since SE(β̂) = 0.025 < ½β̂ = 0.3, we reject the null hypothesis: the slope parameter is statistically significant.
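The decision in the example above can be spelled out in a few lines, using the estimated values from the supply function:

```python
# The standard error test above, spelled out for the supply function
# (beta-hat = 0.6, SE = 0.025 from the example).
beta_hat = 0.6
se_beta = 0.025
half_beta = beta_hat / 2            # (1/2) * beta-hat = 0.3

# Reject H0: beta = 0 when SE(beta-hat) < (1/2) * beta-hat
significant = se_beta < half_beta   # True: the slope is significant
```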
(Here sₓ = √[Σ(X − X̄)²/(n − 1)] denotes the sample standard deviation of X, where n is the sample size.)
ii) Student's t-test

We can derive the t-values of the OLS estimates:

t_β̂ = β̂ / SE(β̂)

t_α̂ = α̂ / SE(α̂)

with n − k degrees of freedom,

where:

SE = standard error

k = number of parameters in the model.
Since we have two parameters in simple linear regression with an intercept different from zero, our degrees of freedom are n − 2. Like the standard error test, we formally test the hypothesis H₀: βᵢ = 0 against the alternative H₁: βᵢ ≠ 0 for the slope parameter, and H₀: α = 0 against the alternative H₁: α ≠ 0 for the intercept. The test statistic under the null hypothesis is

t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂)
Step 2: Choose the level of significance. The level of significance is the probability of making a 'wrong' decision, i.e. the probability of rejecting the null hypothesis when it is actually true, or the probability of committing a type I error. It is customary in econometric research to choose the 5% or the 1% level of significance. This means that in making our decision we allow (tolerate) five times out of a hundred to be 'wrong', i.e. to reject the hypothesis when it is actually true.

Step 3: Check whether it is a one-tail or a two-tail test. If the inequality sign in the alternative hypothesis is ≠, then it implies a two-tail test: divide the chosen level of significance by two and decide the critical region or critical value of t, called t_c. But if the inequality sign is either > or <, then it indicates a one-tail test and there is no need to divide the chosen level of significance by two to obtain the critical value of t from the t-table.
Example:

If we have H₀: βᵢ = 0

against H₁: βᵢ ≠ 0

then this is a two-tail test. If the level of significance is 5%, divide it by two to obtain the critical value of t from the t-table.

Step 4: Obtain the critical value of t, called t_c, at α/2 and n − 2 degrees of freedom for a two-tail test. Since the alternative hypothesis (H₁) is stated with an inequality sign (≠), it is a two-tail test; hence we divide α/2 = 0.05/2 = 0.025 to obtain the critical value of t, and compare it with the computed t*.
iii) Confidence interval

Rejection of the null hypothesis doesn't mean that our estimates α̂ and β̂ are the correct estimates of the true population parameters α and β. It simply means that our estimates come from a sample drawn from a population whose parameters are different from zero. In order to define how close the estimates are to the true parameters, we must construct a confidence interval for the true parameter; in other words, we must establish limiting values around the estimate within which the true parameter is expected to lie with a certain "degree of confidence". In this respect we say that with a given probability the population parameter will be within the defined confidence interval (confidence limits).
The value of t that leaves a probability of α/2 in each tail at n − 2 degrees of freedom satisfies

Pr(−t_c < t* < t_c) = 1 − (α/2 + α/2), i.e. 1 − α …………………………(2.57)

but t* = (β̂ − β)/SE(β̂) …………………………………………………….(2.58)

Substituting (2.58) into (2.57) we obtain the following expression:

Pr[−t_c < (β̂ − β)/SE(β̂) < t_c] = 1 − α ………………………………………..(2.59)

Pr[−SE(β̂)t_c < β̂ − β < SE(β̂)t_c] = 1 − α   by multiplying by SE(β̂)

Pr[−β̂ − SE(β̂)t_c < −β < −β̂ + SE(β̂)t_c] = 1 − α   by subtracting β̂

Pr[β̂ − SE(β̂)t_c < β < β̂ + SE(β̂)t_c] = 1 − α   by multiplying by −1

The limits within which the true β lies at the (1 − α)% degree of confidence are:

β̂ ± SE(β̂)·t_c
We test H₀: β = 0 against H₁: β ≠ 0.

Decision rule: If the hypothesized value of β in the null hypothesis lies within the confidence interval, accept H₀ and reject H₁; the implication is that β̂ is statistically insignificant. If the hypothesized value of β in the null hypothesis lies outside the confidence interval, reject H₀ and accept H₁; the implication is that β̂ is statistically significant.
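The resulting interval can be illustrated with the same supply-function numbers (β̂ = 0.6, SE = 0.025, t_c = 2.048 at 28 degrees of freedom); this sketch is not part of the handout's worked example:

```python
# A 95% confidence interval for beta, using the supply-function
# numbers (beta-hat = 0.6, SE = 0.025, t_c = 2.048 at 28 df).
beta_hat = 0.6
se_beta = 0.025
t_c = 2.048

lower = beta_hat - se_beta * t_c   # beta-hat - SE(beta-hat) * t_c
upper = beta_hat + se_beta * t_c   # beta-hat + SE(beta-hat) * t_c

# H0: beta = 0 is rejected because zero lies outside the interval
contains_zero = lower <= 0 <= upper
```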
F* = [ESS/(k − 1)] / [RSS/(n − k)] = [Σŷ²/(k − 1)] / [Σu²/(n − k)] = [r²_YX/(k − 1)] / [(1 − r²_YX)/(n − k)]
Suppose you are given the following regression line and intermediate results for 25 sample observations:

Ŷ = 89 + 2.88X

Se: (38.4) (0.85)

r² = 0.76

Σuᵢ² = 135
In order to compile the ANOVA table based on the information given, we need to obtain ESS (between the means) and TSS as follows.
r² = 1 − Σuᵢ²/Σy²

0.76 = 1 − 135/Σy²

We know that if r² = 0.76, the unexplained variation will be 1 − 0.76 = 0.24.

Thus, the TSS will be obtained as

135/Σy² = 0.24

Σy²(0.24) = 135

Σy² = 135/0.24 = 562.5
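The arithmetic above amounts to two lines of computation:

```python
# Recovering TSS and ESS from r^2 = 0.76 and RSS = 135
# (the handout's example).
rss = 135.0
r2 = 0.76

tss = rss / (1 - r2)   # 135 / 0.24 = 562.5
ess = tss - rss        # 562.5 - 135 = 427.5
```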
Hence, the ESS is 562.5 − 135 = 427.5. To appraise the findings from the regression analysis we construct the ANOVA table, obtain F*, and compare the value of F* with the tabulated F value as follows.
Table 2.4: ANOVA table for the two-variable regression model

Source of variation        Sum of squares   Degrees of freedom    MSE                  F*
Between the means          Σŷ² = 427.5      V₁ = (k − 1) = 1      427.5/1 = 427.5      F* = 427.5/5.869565
(due to regression, ESS)                                                               = 72.83333
Within the sample          Σeᵢ² = 135       V₂ = (n − k) = 23     135/23 = 5.869565
(due to residuals, RSS)
Total                      Σyᵢ² = 562.5     (n − 1) = 24          562.5/24 = 23.4375

The tabulated F₀.₉₅ with V₁ = 1 and V₂ = 23 is 4.28.
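The table's last column can be reproduced directly from ESS, RSS, and the degrees of freedom:

```python
# F* computed from the ANOVA table entries above
# (ESS = 427.5, RSS = 135, k = 2 parameters, n = 25 observations).
ess, rss = 427.5, 135.0
k, n = 2, 25

mse_regression = ess / (k - 1)          # 427.5 / 1 = 427.5
mse_residual = rss / (n - k)            # 135 / 23 = 5.869565...
f_star = mse_regression / mse_residual  # about 72.83

f_table = 4.28                          # F_0.95(1, 23) from the F-table
jointly_significant = f_star > f_table  # True
```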
To decide whether to reject or accept that the parameter estimates are jointly statistically significant, we need to compare the calculated F* with the tabulated F for a given confidence level. The F-distribution is tabulated for the upper 5% point (F₀.₉₅) and for the upper 1% point (F₀.₉₉). For a 95% confidence level, the decision rule for the F test is to reject the null hypothesis if F* > F₀.₉₅.

For instance, the tabulated F at the 95% confidence level, F₀.₉₅(1, 23), is 4.28. Note that, while reading the F distribution table, the first number in the bracket next to F₀.₉₅ refers to the degrees of freedom for the numerator of the F* formula in equation (5.1.5), or the first row, last column in ANOVA table 2.4, and the second entry is for the denominator of the F* formula, or the second row in the ANOVA table. Moreover, you should read horizontally (from left to right) for the first entry in the bracket and downward for the second entry.
F* = [Σŷ²/(K − 1)] / [Σuᵢ²/(n − K)] = [r²_Y,X/(K − 1)] / [(1 − r²_Y,X)/(n − K)]
In both methods we construct an ANOVA table from which we may compute the F ratio and use it for testing hypotheses related to the aim of the study.

It can be proved that for a test on an individual regression coefficient the t and F tests are formally equivalent, with t² = F.
Regression analysis is more powerful than ANOVA when analyzing market data, which are not experimental. Regression analysis provides all the information which we may obtain from the method of ANOVA; furthermore, it provides numerical estimates of the influence of each explanatory variable, whereas the ANOVA approach shows only the addition to the explanation of the total variation which is obtained by the introduction of an additional variable into the relationship. This is the only information provided by ANOVA. Hence, it is often argued that the ANOVA approach is more appropriate for the study of the influence of qualitative variables on a certain variable. This is because qualitative variables (like profession, sex, and religion) do not have numerical values, and hence their influence cannot be measured by regression analysis, while the ANOVA technique does not require knowledge of the values of the X's but depends solely on the values of Y. This argument, however, lost much of its merit with the expansion of dummy variables in regression analysis.
The results of the regression analysis are reported in conventional formats. It is not sufficient merely to report the estimates of the β's. In practice we report the regression coefficients together with their standard errors and the value of R². It has become customary to present the estimated equations with standard errors placed in parentheses below the estimated parameter values. Sometimes the estimated coefficients, the corresponding standard errors, the p-values, and some other indicators are presented in tabular form. These results are supplemented by R² (to the right side of the regression equation).

Example:

Ŷ = 128.5 + 2.88X,  R² = 0.93
    (38.2)  (0.85)

The numbers in the parentheses below the parameter estimates are the standard errors. Some econometricians report the t-values of the estimated coefficients in place of the standard errors.