
CHAPTER TWO

SIMPLE LINEAR REGRESSION

Introduction

As we have already mentioned in chapter one, one of the roles of econometrics is to lend empirical
support to economic theories and hypotheses. There are various techniques of econometric analysis
used to achieve this objective, and regression analysis is the most common and appropriate one.
Regression analysis refers to estimating functions that show the relationship between two or more
variables, together with the corresponding tests. Thus, the purpose of this chapter is to introduce
the concept of simple linear regression analysis.

2.1. Concept of Regression Function


2.1.1. Introduction to Simple Linear Regression

Correlation analysis is concerned with expressing and measuring the closeness of the relationship
between variables; it does not identify a cause-and-effect relationship between them. Regression, by
contrast, is suited to identifying cause-and-effect relationships among variables. In this section, we
will discuss the simplest form of regression: simple regression.

Regression analysis is concerned with the study of the relationship between one variable (known as
the dependent variable) and one or more other variables (known as the independent variable(s)). A
central point in regression analysis is estimating regression functions, accompanied by some preceding
and some succeeding steps in the econometric methodology. Such a function is generally given as
in equation 2.1.

Y = f(X_i) ………………………………………………………. 2.1

The equation means that the variable Y is a function of the other variable(s) given by X_i,
where i = 1, 2, 3, …

Note that the variable on the left-hand side, represented by 'Y', is known as the dependent variable,
explained variable and/or endogenous variable. The variable(s) on the right-hand side, represented
by 'X_i', are known as independent variable(s), explanatory variable(s) and/or exogenous
variable(s), as demonstrated below.
Y = f(X_1, X_2, X_3, …) ……………………. 2.2

Here Y is the dependent (explained, endogenous) variable, while X_1, X_2, X_3, … are the
independent (explanatory, exogenous) variables.

Note also that a regression function does not always imply a causal relationship among the variables.
Why regression functions? What are the objectives of the regression analysis?

The major objectives and uses of a regression function are:


1) To estimate the mean or average value of the dependent variable, given the value of the
independent variable(s);
2) To test hypotheses about the sign and magnitude of the relationship between the dependent
variable and one or more independent variable(s);
3) To predict or forecast future value(s) of the dependent variable, which in turn is used in
policy formulation; and
4) A combination of any two or more of the above objectives.

Having said this much about regression functions in general, let us now turn to our current topic,
'simple linear regression analysis', and consider the meaning of the words 'simple' and 'linear'.
In simple linear regression, the term 'simple' refers to the fact that we use only two variables (one
dependent and one independent variable). If the number of independent or explanatory variables is
greater than one, we do not use the term 'simple'; instead we use the term 'multiple', as we will see
in the next chapter.

Most econometric models obtain their specification from economic theory since, as mentioned in
chapter one, the coefficients of an econometric model are the constants of economic theory. Now let
us discuss simple linear regression using an example from economic theory. Consider the theory of
demand. In its simplest form it postulates that, ceteris paribus, there exists a negative relationship
between price and quantity demanded: as price rises, quantity demanded decreases, and vice versa.
Following the econometric procedure, our task is the specification of the demand model: determination
of the dependent and independent variables, the number of equations in the model and their precise
mathematical form. Economic theory provides the following information with regard to the demand
function.

a) The dependent variable is quantity demanded and the independent variable is price.
𝑌𝑖 = 𝑓(𝑋𝑖 )
Where 𝑌𝑖 = quantity demanded and 𝑋𝑖 = 𝑝𝑟𝑖𝑐𝑒
b) Economic theory does not specify whether demand is studied using one equation or a more
elaborate system of simultaneous equations. Let us use a single-equation model.
c) Economic theory is not clear about the mathematical form (linear or non-linear) of the demand
function. Let us choose a linear function. Therefore, the demand function is given as
𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 …………………………………………………………………………….. 2.3

d) The above form of the demand function implies that the relation between Y and X is exact. That
means the whole variation in Y is due to changes in X only, and there are no other factors
or variables affecting the dependent variable except X. If this were true, all quantity-price pairs
would fall on a straight line when plotted in the X-Y plane. However, if we gather information
from the market and plot it in the X-Y plane, the price-quantity pairs do not all fall on the
straight line: some of the points lie on the line, some lie above it and some lie below it. The
deviation of points from the straight line may be attributed to several reasons. These are:

1) Omitted Variables from the Function: In economics, each variable is influenced by a very
large number of factors. However, not every factor or variable can be included in the function,
for several reasons:
a) Some of the factors may not even be known;
b) Some of the factors cannot be measured statistically;
c) Some of the factors are random, appearing in unpredictable ways and at unpredictable times;
d) Some of the factors have a very small influence on the dependent variable;
e) Some of the factors may not have reliable statistical data.

Therefore, in most cases, only a few variables are explicitly included in the model.

2) Misspecification of the Mathematical Form of the Model: Misspecification of the model
can also result in the deviation of points from the straight line.
3) Errors of Aggregation: We often use aggregate data in which we add magnitudes referring
to individual behavior which are dissimilar. In this case, those variables expressing individual
peculiarities are missing.
4) Errors of Measurement: The deviation of points around the line may be due to the errors of
measurements of variables which are inevitable due to the methods of collecting and
processing statistical information.

In order to take into account the above sources of error, we include in econometric functions a
random variable, usually denoted by the letter 'U' and called the error term, the disturbance term
or the random term.

𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊 ………………………………………..…………………………… 2.4

The true relationship which connects the variables involved is split into two parts: a part represented
by the line and a part represented by the random term. Such models are called stochastic or
probabilistic models and are familiar in econometrics. The above model shows that the relationship
between the two variables is inexact, and the total variation of the dependent variable is split into
two additive components: explained variation and residual variation, which can be shown as

TSS = ESS + RSS

The variation in the dependent variable is not one hundred percent explained by the variation in the
explanatory variable. Thus, the variation in the dependent variable is expressed as the sum of
explained variation and random variation, as follows.

Y_i = (α + βX_i) + U_i

(Variation in Y_i) = (Systematic variation) + (Random variation)

(Variation in Y_i) = (Explained variation) + (Unexplained variation)

Let us now discuss the second term, 'linear', in our topic. What is the meaning of the term 'linear'?
Is it the same as its meaning in mathematics? The answer is no! The meaning of a linear regression
equation in econometrics is different from that of a linear equation in mathematics. The difference
between the two can best be captured by comparing 'linearity' in variables and 'linearity' in parameters.

According to the mathematical definition, a function is said to be linear in variables if the conditional
expectation of the dependent variable, given the values of the independent variable, is a linear
function of that variable: the variables enter the function with a power of one and are combined
linearly. The following are examples of equations that are linear in variables.

𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊 …………….…………………………………………………….. 2.5

𝒀𝒊 = 𝜶 + 𝟐𝜷𝑿𝒊 + 𝑼𝒊 ………………..………………………………………………… 2.6

As opposed to mathematical functions, in econometrics 'linearity' does not necessarily mean linearity
in variables; it basically means linearity in parameters. For instance, if the variation in the
dependent variable Y_i is better explained by the square of the explanatory variable (X_i²) than by its
actual values (X_i), the same concept of linear regression analysis can still be used, since all the
necessary transformations are made before estimation (for instance, squaring all values of X_i). Thus,
the term 'linearity' in regression analysis emphasizes linearity in parameters. A function is said to
be linear in parameters if the conditional expectation of the dependent variable for given values of
the variable is a linear function of its parameters. For instance,

Y_i = α + βX_i + U_i …………………………………………………………………. 2.7

Y_i = α + βX_i² + U_i ………………………………………………………………… 2.8
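As a rough illustration of this point (not part of the original text), the sketch below fits a model of the form of equation 2.8 by first squaring X and then applying the usual linear formulas to the transformed variable, so the estimation remains linear in the parameters α and β. Python with numpy is assumed, and the data are hypothetical.

```python
import numpy as np

# Hypothetical data: Y depends on the square of X plus noise.
rng = np.random.default_rng(0)
X = np.linspace(1, 10, 30)
Y = 2.0 + 0.5 * X**2 + rng.normal(0, 1, size=30)

# Transform the regressor first; the regression stays linear in the parameters.
Z = X**2
beta_hat = np.sum((Z - Z.mean()) * (Y - Y.mean())) / np.sum((Z - Z.mean())**2)
alpha_hat = Y.mean() - beta_hat * Z.mean()
print(alpha_hat, beta_hat)   # should be close to 2.0 and 0.5
```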

2.2. Estimating Simple Linear Regression Function

In econometrics research, the second step or stage is to find the numerical values of the population
parameters which can be used for several purposes. The magnitude of the estimator is highly
required by analysts, policy formulators and forecasters in executing their jobs. In this section, we
will discuss the method of estimating parameters of simple linear regression functions. There are
various methods of estimating regression functions such as Method of Moment, Method of Least
Square or Ordinary Least Square (OLS) method and the Maximum Likelihood (MLE) method.

First we give a brief highlight of these three methods, and then we discuss the OLS method in detail
owing to its advantages over the Method of Moments and the Maximum Likelihood method. The ordinary
least squares method is the easiest and the most commonly used method, as opposed to the maximum
likelihood (ML) method and the Method of Moments, which are limited by their assumptions. For
instance, the ML method is valid only for large samples, whereas the OLS method can also be applied
to smaller samples. Owing to this merit, our discussion mainly focuses on ordinary least squares (OLS).

This section is divided into three subsections. In the first subsection we explain the Method of
Moments and OLS techniques of estimating regression functions, along with the classical conditions
or assumptions of the OLS method. In the second subsection we concentrate entirely on the OLS
method of regression analysis; and finally we consider the related concepts, including measures of
goodness of fit and prediction with the simple linear regression function.

2.2.1. Method of Moments

Since our objective is to obtain estimates of the unknown parameters α and β in equation 2.4, we
have to make some assumptions about the error term U_i. These are:

i) Zero mean: E(U_i) = 0 for all i
ii) Constant variance: Var(U_i) = σ_u² for all i
iii) Independence: U_i and U_j are independent for all i ≠ j, i.e. Cov(U_i, U_j) = 0
iv) Independence of X: U_i and X_i are independent, i.e. Cov(X_i, U_i) = 0
v) Normality: the U_i are normally distributed with zero mean and constant (common)
variance, i.e. U_i ~ N(0, σ_u²)

The assumptions we have made about the error term u imply that
𝐸(𝑈𝑖 ) = 0 and 𝐶𝑜𝑣(𝑋𝑖 , 𝑈𝑖 ) = 𝐸(𝑋𝑖 𝑈𝑖 ) =0

In the method of moments, we replace these conditions by their sample counterparts. Let 𝛼̂ 𝑎𝑛𝑑 𝛽̂
be the estimators for 𝛼 𝑎𝑛𝑑 𝛽, respectively. The sample counterpart of 𝑈𝑖 is the estimated error 𝑒𝑖
(which is called the residual), defined as

e_i = Y_i − Ŷ_i, where Ŷ_i = α̂ + β̂X_i, so that

e_i = Y_i − α̂ − β̂X_i …………………………………………………………………2.9

The two equations to determine 𝛼̂ 𝑎𝑛𝑑 𝛽̂ are obtained by replacing the population assumptions by
their sample counterparts. That is, for 𝐸(𝑈𝑖 ) = 0 and 𝐶𝑜𝑣(𝑋𝑖 , 𝑈𝑖 ) = 0, the counterpart assumptions
are 𝐸(𝑒𝑖 ) = 0 and 𝐶𝑜𝑣(𝑋𝑖 , 𝑒𝑖 ) = 0.

Note: Here and hereafter, ∑ denotes the sum over i = 1, 2, …, n.


To do so, take the summation of both sides of equation (2.9):

∑e_i = ∑(Y_i − α̂ − β̂X_i)

But E(e_i) = ∑e_i / n = 0, which implies that ∑e_i = 0. Then

∑(Y_i − α̂ − β̂X_i) = 0
∑Y_i − nα̂ − β̂∑X_i = 0
nα̂ = ∑Y_i − β̂∑X_i
α̂ = Ȳ − β̂X̄ ……………………… 2.10

By assumption (iv) above, E(X_i e_i) = ∑X_i e_i / n = 0, which implies that ∑X_i e_i = 0.
Substituting e_i = Y_i − α̂ − β̂X_i gives

∑X_i (Y_i − α̂ − β̂X_i) = 0
∑Y_i X_i − α̂∑X_i − β̂∑X_i² = 0 ……………………………………………………………… 2.11

Solving the two equations simultaneously yields

β̂ = [n∑Y_i X_i − ∑Y_i ∑X_i] / [n∑X_i² − (∑X_i)²]   or, in deviation form,   β̂ = ∑y_i x_i / ∑x_i²

While estimating the unknown parameters 𝛼 𝑎𝑛𝑑 𝛽, we have derived two equations (equation 2.10
and 2.11). These equations are called normal equations.
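As a minimal numerical sketch (not in the original text; Python with numpy assumed, using the price-quantity data of Example 2.1 later in this chapter only for concreteness), the estimates obtained from the normal equations make the residuals satisfy the two sample moment conditions ∑e_i = 0 and ∑X_i e_i = 0 up to rounding error.

```python
import numpy as np

# Price-quantity data from Example 2.1 (used here only for illustration).
X = np.array([9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8], dtype=float)
Y = np.array([69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64], dtype=float)
n = X.size

beta_hat = (n * np.sum(Y * X) - np.sum(Y) * np.sum(X)) / \
           (n * np.sum(X**2) - np.sum(X)**2)        # solution of equation 2.11
alpha_hat = Y.mean() - beta_hat * X.mean()           # equation 2.10

e = Y - alpha_hat - beta_hat * X
print(round(np.sum(e), 10), round(np.sum(X * e), 10))  # both ~ 0
```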

2.2.2. Ordinary Least Square Method (OLS) and Classical Assumptions

There are two major ways of estimating regression functions: the (ordinary) least squares (OLS)
method and the maximum likelihood (ML) method. Both methods are basically similar in their
application to the point estimation you may be familiar with from statistics courses.

The ordinary least squares (OLS) method of estimating the parameters of a regression function is
about finding the values of the parameters (α and β) of the simple linear regression function
given below for which the errors or residuals are minimized. Thus, it is about minimizing the
residuals or the errors.

𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊 …………………………………………………………………… 2.12

You may recall that the above identity represents the population regression function (to be estimated
from a total enumeration of data on the entire population). But most of the time it is difficult to
obtain population data, owing to several reasons; instead we usually work with sample data and
estimate the sample regression function. Thus, we use the following sample regression function
for the derivation of the parameters and the related analysis.

Ŷ_i = α̂ + β̂X_i …………………………………………………..……………………. 2.13

Before discussing the details of the OLS estimation techniques, let’s see the major conditions
known as classical assumptions that are necessary for the validity of the analysis, interpretations
and conclusions of the regression function.

i) Classical Assumptions
1. The error terms U_i are randomly distributed and the disturbance terms are not correlated.
This means that there is no systematic relation among the values of the error terms
(U_i and U_j), where i = 1, 2, 3, …, j = 1, 2, 3, … and i ≠ j. This is represented by
zero covariance among the error terms: Cov(U_i, U_j) = 0 for i ≠ j. Note that the same
argument holds for the residual terms when we use sample data or the sample regression function.
Thus, Cov(e_i, e_j) = 0 for i ≠ j.

Otherwise, the error term does not serve an adjustment purpose; rather, it causes an
autocorrelation problem. This problem will be discussed in detail in chapter four.

2. The disturbance terms U_i have zero mean. The sum of the disturbance terms is zero: the
values of some of the disturbance terms are negative, some are zero and some are positive, and
their sum or average is zero. That is, E(U_i) = ∑U_i / n = 0, so that ∑U_i = 0. The same
argument is true for the sample regression function and so for the residual terms:
∑e_i = 0. If this condition is not met, then the position of the regression function (or curve)
will not be where it is supposed to be. This results in an upward shift (if the mean of
the error or residual term is positive) or a downward shift (if the mean of the error or
residual term is negative) of the regression function. For instance, suppose we have the
following regression function.

Y_i = α + βX_i + U_i
E(Y_i) = E(Y_i / X_i) = E(α + βX_i) + E(U_i)
E(Y_i) = E(Y_i / X_i) = α + βX_i   if E(U_i) = 0

Otherwise the estimated model will be biased, which causes the regression function to shift. For
instance, if E(U_i) > 0 (positive), the estimated function shifts up from the true
representative model. A similar argument holds for the residual term of the sample regression function.

3. The disturbance terms have constant variance in each period. This is given as follows:

Var(U_i) = E(U_i − E(U_i))² = σ_u² for all i. This assumption is known as the assumption of
homoscedasticity. If this condition is not fulfilled, that is, if the variance of the error terms varies
across observations or with the values of the explanatory variable, we have a heteroscedasticity
problem: Var(U_i) = σ_i², i = 1, 2, 3, … The heteroscedasticity problem, along with its detection
methods and remedial measures, will also be addressed in chapter four.

4. The explanatory variable X_i and the disturbance term U_i are uncorrelated or independent.
The covariance between the explanatory variable and the error term is zero: Cov(X_i, U_i) = 0.
It follows from this that, in the sample, ∑e_i X_i = 0. The value the error term assumes in any
period does not depend on the value of the explanatory variable in that period. If this condition
is not met by our data or variables, our regression function and the conclusions to be drawn from
it will be invalid.

5. The explanatory variable X_i is fixed in repeated samples. Each value of X_i does not vary,
for instance, owing to a change in sample size. This means the explanatory variables are
non-random and hence distribution-free.

6. Linearity of the model in parameters. The simple linear regression requires linearity in
parameters, but not necessarily linearity in variables. The same technique can be applied to
estimate regression functions of the following forms: Y = f(X); Y = f(X²); Y = f(X³);
Y = f(X − kX); and so on. What is important is transforming the data as required.

7. Normality assumption. The disturbance term U_i is assumed to have a normal distribution
with zero mean and a constant variance. This assumption is given as U_i ~ N(0, σ_u²).
It combines the zero-mean and homoscedasticity assumptions about the error term. This
assumption, or combination of assumptions, is used in testing hypotheses about the
significance of parameters, as we will see in chapter four. It is also useful both in
estimating parameters and in testing their significance in the maximum likelihood method.

8. Explanatory variables should not be perfectly or highly linearly correlated. Using
explanatory variables which are highly or perfectly correlated in a regression function produces
a biased function or model. It also results in the multicollinearity problem (to be discussed in
chapter four).

9. The variables are measured without error (the data are error free). Since wrong data leads to
wrong conclusion, it is important to make sure that our data is free from any type of error.
There must also be correct aggregation of values of related variables (macro variables).
Aggregate variables such as Gross Domestic Product (GDP), Trade Balance, Balance of
Payment (BOP), etc. should be correctly calculated from their components.

10. The relationship between the variables (or the model) is correctly specified. For instance, all the
necessary variables are included in the model, and the variables are in the form that best
describes the functional relationship. For instance, "Y = f(X²)" may reflect the
relationship between Y and X better than "Y = f(X)".

Note that some of these assumptions or conditions (those which refer to more than one
explanatory variable) are meant for the next chapters (along with all the other assumptions or
conditions). So, we may not restate these conditions in the next chapter even though they are
required there as well. Next, let us go directly to our discussion of techniques of estimating
the simple linear regression model.

ii) Estimation of OLS Model and Distribution of the Dependent variable Y

The dependent variable is normally distributed with the following mean and variance.

Mean of Y:  E(Y_i) = E(α + βX_i + U_i) = E(α) + E(βX_i) + E(U_i) = α + βX_i

And the variance of Y is given as

Var(Y_i) = E(Y_i − E(Y_i))² = E(α + βX_i + U_i − α − βX_i)² = E(U_i)² = σ_u²

Therefore, the variance of the dependent variable is the same as the variance of the error term,
which is constant:

Y_i ~ N(α + βX_i, σ_u²)

Estimating a linear regression function using the Ordinary Least Squares (OLS) method is simply
about calculating the parameters of the regression function for which the sum of squares of the error
terms is minimized. Suppose we want to estimate the following equation:

𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊 …………………………………………………….……………. 2.14

Since most of the time we use a sample (it is difficult to get population data), the corresponding
sample regression function is given as follows.

Ŷ_i = α̂ + β̂X_i …………………………………………………...…………………… 2.15

From this identity, we solve for the residual term e_i, square both sides, and then take the sum of
both sides. These three steps are given, respectively, as follows.

e_i = Y_i − Ŷ_i = Y_i − α̂ − β̂X_i ……………………………………………………………….. 2.16

∑e_i² = ∑(Y_i − α̂ − β̂X_i)² ……………………….……………………………………. 2.17

where ∑e_i² = RSS = Residual Sum of Squares.
The method of OLS involves finding the estimates of the intercept and the slope for which the sum
of squares given by Equation 2.17 is minimized. To minimize the residual sum of squares we take
the first-order partial derivatives and equate them to zero.

That is, the partial derivative with respect to α̂:

∂∑e_i²/∂α̂ = 2∑(Y_i − α̂ − β̂X_i)(−1) = 0 ………………………………………………..……… 2.18

∑(Y_i − α̂ − β̂X_i) = 0 ……………………………………………………………… 2.19

∑Y_i − nα̂ − β̂∑X_i = 0 …………………………………………………………… 2.20

∑Y_i = nα̂ + β̂∑X_i ……………………………………………………………… 2.21

Where n is the sample size.


Partial derivative with respect to β̂:

∂∑e_i²/∂β̂ = 2∑(Y_i − α̂ − β̂X_i)(−X_i) = 0 ………………………………......……………………… 2.22

∑(Y_iX_i − α̂X_i − β̂X_i²) = 0 …………………….………………………..…………… 2.23

∑Y_iX_i = α̂∑X_i + β̂∑X_i² ……………………………………………………… 2.24

Note that the equation ∑(𝑌𝑖 − 𝛼̂ − 𝛽̂ 𝑋𝑖 )2 is a composite function and we should apply a chain rule
in finding the partial derivatives with respect to the parameter estimates.

Equations 2.21 and 2.24 are together called the system of normal equations. Solving the system of
normal equations simultaneously, we obtain:

β̂ = [n∑Y_iX_i − ∑Y_i∑X_i] / [n∑X_i² − (∑X_i)²] ……………………………………………………… 2.25

or, equivalently, β̂ = [∑Y_iX_i − nȲX̄] / [∑X_i² − nX̄²]

and α̂ = Ȳ − β̂X̄ ……………………………………………………………….…………… 2.26

The regression coefficients can also be obtained by simple formulae by taking the deviations
between the original values and their means. Now, if

x_i = X_i − X̄ ……………………………………………………….………………………… 2.27
and
y_i = Y_i − Ȳ ………………………………………………………….………………………… 2.28
Then,

β̂ = ∑y_i x_i / ∑x_i² ………………………...…..…………………………….………………… 2.29

α̂ = Ȳ − β̂X̄ …………………………...…..…………………………….…………………… 2.30
Example 2.1: The following table gives the quantity supplied (in tons) and its price (pound per
ton) for a commodity over a period of twelve years.

Yi 69 76 52 56 57 77 58 55 67 53 72 64
Xi 9 12 6 10 9 10 7 8 12 6 11 8
a) Fit the simple linear regression equation; Y = f(X) and Interpret your result.
b) Test the significance of each parameter estimates
c) Determine the 95% confidence interval for the slope.
d) Predict the value of Y when X is 45.

Solution: The slope and the intercept are estimated to be:

β̂ = ∑xy / ∑x² = 156/48 = 3.25

α̂ = Ȳ − β̂X̄ = 63 − (3.25)(9) = 33.75

Thus the fitted regression function is given by:

Ŷ_i = 33.75 + 3.25X_i

Interpretation: The value of the intercept term, 33.75, implies that the value of the dependent
variable Y is 33.75 when the value of the explanatory variable is zero. The value of the slope
coefficient (β̂ = 3.25) is a measure of the marginal change in the dependent variable Y when the
value of the explanatory variable increases by one. For instance, in this model, the value of Y
increases on average by 3.25 units when X increases by one.
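A quick numerical check of Example 2.1 (a sketch, not part of the original text; Python with numpy assumed) reproduces these estimates using the deviation-form formulas 2.29 and 2.30.

```python
import numpy as np

# Data from Example 2.1: quantity supplied (Y) and price (X).
Y = np.array([69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64], dtype=float)
X = np.array([9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8], dtype=float)

x = X - X.mean()                              # deviations, equation 2.27
y = Y - Y.mean()                              # deviations, equation 2.28
beta_hat = np.sum(x * y) / np.sum(x**2)       # equation 2.29 -> 3.25
alpha_hat = Y.mean() - beta_hat * X.mean()    # equation 2.30 -> 33.75
print(beta_hat, alpha_hat)
```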

2.3. Residuals and Goodness of fit

After estimating the parameters and determining the regression line, we need to judge the
explanatory power as well as the statistical reliability of the regression of Y on X. There are two
most commonly used tests in econometrics. These are:

1) The square of the correlation coefficient, r², which is used for judging the explanatory power of
the linear regression of Y on X. In simple regression, the square of the correlation coefficient is
known as the coefficient of determination and is denoted by R². The coefficient of
determination measures the goodness of fit of the regression line to the observed sample
values of Y and X.
2) The standard error test of the parameter estimates, which is applied for judging the statistical
reliability of the estimates. This test measures the degree of confidence that we may attribute
to the estimates (section 2.6).

2.3.1. The Coefficient of determination (R2)

The coefficient of determination is the measure of the amount or proportion of the total variation
of the dependent variable that is determined or explained by the model, that is, by the presence of the
explanatory variable in the model. The total variation of the dependent variable is split into two
additive components: a part explained by the model and a part represented by the random term.

Total variation in Y_i = ∑(Y_i − Ȳ)²
Total explained variation = ∑(Ŷ_i − Ȳ)²
Total unexplained variation = ∑e_i²

The total variation of the dependent variable is given in the following form: TSS=ESS + RSS,
which means total sum of square of the dependent variable is split into explained sum of square
and residual sum of square.


e_i = y_i − ŷ_i
y_i = ŷ_i + e_i
y_i² = ŷ_i² + e_i² + 2ŷ_i e_i
∑y_i² = ∑ŷ_i² + ∑e_i² + 2∑ŷ_i e_i

But ∑ŷ_i e_i = 0.

Therefore, ∑y_i² = ∑ŷ_i² + ∑e_i²

The coefficient of determination is given by the formula

R² = Explained variation in Y / Total variation in Y = ∑(Ŷ_i − Ȳ)² / ∑(Y_i − Ȳ)² = ∑ŷ_i² / ∑y_i² …………………………………… 2.31

Since ∑ŷ_i² = β̂∑x_i y_i, the coefficient of determination can also be given as

R² = β̂∑x_i y_i / ∑y_i²

or

R² = 1 − Unexplained variation in Y / Total variation in Y = 1 − ∑(Y_i − Ŷ_i)² / ∑(Y_i − Ȳ)² = 1 − ∑e_i² / ∑y_i² ………………….…… 2.32

The higher the coefficient of determination is the better the fit. Conversely, the smaller the
coefficient of determination is the poorer the fit. That is why the coefficient of determination is
used to compare two or more models. One minus the coefficient of determination is called the
coefficient of non-determination, and it gives the proportion of the variation in the dependent
variable that remained undetermined or unexplained by the model.

Example 2.3: Refer to Example 2.1. Determine what percentage of the variation in the quantity
supplied is explained by the price of the commodity and what percentage remains unexplained.

Using Equation 2.32:

R² = 1 − ∑e_i² / ∑y_i² = 1 − 387/894 = 1 − 0.43 = 0.57

This result shows that 57% of the variation in the quantity supplied of the commodity under
consideration is explained by the variation in its price, and the remaining 43% is left unexplained
by the price of the commodity. In other words, there may be other important explanatory variables
left out that could contribute to the variation in the quantity supplied of the commodity under
consideration.
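The same figures can be verified with a short sketch (not part of the original text; Python with numpy assumed) that computes the residual and total sums of squares for the Example 2.1 data and applies equation 2.32.

```python
import numpy as np

# Data from Example 2.1.
Y = np.array([69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64], dtype=float)
X = np.array([9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8], dtype=float)

beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
alpha_hat = Y.mean() - beta_hat * X.mean()

e = Y - (alpha_hat + beta_hat * X)        # residuals
rss = np.sum(e**2)                        # ~ 387
tss = np.sum((Y - Y.mean())**2)           # ~ 894
r_squared = 1 - rss / tss                 # equation 2.32, ~ 0.57
print(rss, tss, r_squared)
```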

2.4. Properties of OLS Estimates and Gauss-Markov Theorem

There are various econometric methods with which we may obtain the estimates of the parameters
of economic relationships. We would like an estimate to be as close as possible to the value of the
true population parameter, i.e. to vary within only a small range around the true parameter. How are
we to choose, among the different econometric methods, the one that gives 'good' estimates? We
need some criteria for judging the 'goodness' of an estimate.

‘Closeness’ of the estimate to the population parameter is measured by the mean and variance or
standard deviation of the sampling distribution of the estimates of the different econometric
methods. We assume the usual process of repeated sampling i.e. we assume that we get a very
large number of samples each of size ‘n’; we compute the estimates 𝛽̂ ’s from each sample, and for
each econometric method and we form their distribution. We next compare the mean (expected
value) and the variances of these distributions and we choose among the alternative estimates the
one whose distribution is concentrated as close as possible around the population parameter. The
ideal or optimum properties that the OLS estimates possess may be summarized by well-known
theorem known as the Gauss-Markov Theorem.

Statement of the theorem: "Given the assumptions of the classical linear regression model, the
OLS estimators, in the class of linear and unbiased estimators, have the minimum variance; i.e. the
OLS estimators are BLUE." According to this theorem, under the basic assumptions of the classical
linear regression model, the least squares estimators are linear, unbiased and have minimum
variance (i.e. they are the best of all linear unbiased estimators). Sometimes the theorem is referred
to as the BLUE theorem, i.e. Best, Linear, Unbiased Estimator. An estimator is called BLUE if (the
detailed proofs of these properties are presented below):

a) Linear: it is a linear function of a random variable, such as the dependent variable Y.

b) Unbiased: its average or expected value is equal to the true population parameter.

c) Minimum variance: It has a minimum variance in the class of linear and unbiased
estimators. An unbiased estimator with the least variance is known as an efficient
estimator.

a) Linearity (for β̂)

Proposition: α̂ and β̂ are linear in Y.

Proof: From (2.29), the OLS estimator β̂ is given by:

β̂ = ∑x_i y_i / ∑x_i² = ∑x_i(Y_i − Ȳ) / ∑x_i² = (∑x_i Y_i − Ȳ∑x_i) / ∑x_i²

(but ∑x_i = ∑(X_i − X̄) = ∑X_i − nX̄ = nX̄ − nX̄ = 0)

⇒ β̂ = ∑x_i Y_i / ∑x_i².  Now, let K_i = x_i / ∑x_i²  (i = 1, 2, …, n)

⇒ β̂ = ∑K_i Y_i ……………………………………………………………………. 2.33

⇒ β̂ = K_1Y_1 + K_2Y_2 + K_3Y_3 + … + K_nY_n

⇒ β̂ is linear in Y.

Exercise: Show that α̂ is linear in Y.

b) Unbiasedness:

Proposition: α̂ and β̂ are the unbiased estimators of the true parameters α and β.

From your statistics course, you may recall that if θ̂ is an estimator of θ, then E(θ̂) − θ = the amount
of bias, and if θ̂ is an unbiased estimator of θ then the bias = 0, i.e. E(θ̂) − θ = 0 ⇒ E(θ̂) = θ.
In our case, α̂ and β̂ are estimators of the true parameters α and β. To show that they are the
unbiased estimators of their respective parameters means to prove that:

E(β̂) = β and E(α̂) = α

Proof (1): Prove that β̂ is unbiased, i.e. E(β̂) = β.

We know that β̂ = ∑K_iY_i = ∑K_i(α + βX_i + U_i)
                = α∑K_i + β∑K_iX_i + ∑K_iU_i,

but ∑K_i = 0 and ∑K_iX_i = 1, since

∑K_i = ∑x_i / ∑x_i² = ∑(X_i − X̄) / ∑x_i² = (∑X_i − nX̄) / ∑x_i² = (nX̄ − nX̄) / ∑x_i² = 0

⇒ ∑K_i = 0 ………………………………………..…………………………… 2.34

∑K_iX_i = ∑x_iX_i / ∑x_i² = ∑(X_i − X̄)X_i / ∑x_i² = (∑X_i² − X̄∑X_i) / (∑X_i² − nX̄²) = (∑X_i² − nX̄²) / (∑X_i² − nX̄²) = 1

⇒ ∑K_iX_i = 1 …………………………………………………….…………….. 2.35

Therefore, β̂ = β + ∑K_iU_i  ⇒  β̂ − β = ∑K_iU_i …………………………………………………. 2.36

E(β̂) = β + ∑K_iE(U_i), since the K_i are fixed

E(β̂) = β, since E(U_i) = 0.

Therefore, β̂ is an unbiased estimator of β.

Proof (2): Prove that α̂ is unbiased, i.e. E(α̂) = α.

From the proof of the linearity property, we know that:

α̂ = ∑(1/n − X̄K_i)Y_i
  = ∑(1/n − X̄K_i)(α + βX_i + U_i), since Y_i = α + βX_i + U_i
  = α + β(1/n)∑X_i + (1/n)∑U_i − αX̄∑K_i − βX̄∑K_iX_i − X̄∑K_iU_i
  = α + βX̄ + (1/n)∑U_i − βX̄ − X̄∑K_iU_i
  = α + (1/n)∑U_i − X̄∑K_iU_i

⇒ α̂ − α = ∑(1/n − X̄K_i)U_i ……………..…………………………… 2.37

E(α̂) = α + (1/n)∑E(U_i) − X̄∑K_iE(U_i)

E(α̂) = α ……………………………………………………………………… 2.38

⇒ α̂ is an unbiased estimator of α.
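Unbiasedness can also be illustrated by simulation (a sketch, not part of the original text; Python with numpy assumed, and the "true" parameter values and error variance are chosen arbitrarily for illustration). Repeatedly drawing samples from a model satisfying the classical assumptions, estimating the slope in each, and averaging the estimates gives a value very close to the true β, consistent with E(β̂) = β.

```python
import numpy as np

# Monte Carlo illustration of unbiasedness of the OLS slope.
rng = np.random.default_rng(1)
alpha, beta = 33.75, 3.25            # illustrative "true" values
X = np.array([9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8], dtype=float)  # fixed X

slopes = []
for _ in range(10000):
    U = rng.normal(0, 5, size=X.size)        # U_i ~ N(0, sigma^2), sigma chosen arbitrarily
    Y = alpha + beta * X + U
    x, y = X - X.mean(), Y - Y.mean()
    slopes.append(np.sum(x * y) / np.sum(x**2))

print(np.mean(slopes))   # close to 3.25, consistent with E(beta_hat) = beta
```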

c) Minimum variance of α̂ and β̂

Now, we have to establish that, out of the class of linear and unbiased estimators of α and β,
α̂ and β̂ possess the smallest sampling variances. For this, we shall first obtain the variances
of α̂ and β̂ and then establish that each has the minimum variance in comparison with the variances
of other linear and unbiased estimators obtained by econometric methods other than OLS.

i) Variance of β̂

Var(β̂) = E(β̂ − E(β̂))² = E(β̂ − β)² ………………………………………………... 2.39

Substituting (2.36) into (2.39), we get

Var(β̂) = E(∑K_iU_i)²
       = E[K_1²U_1² + K_2²U_2² + … + K_n²U_n² + 2K_1K_2U_1U_2 + … + 2K_(n−1)K_nU_(n−1)U_n]
       = E(∑K_i²U_i²) + E(∑∑K_iK_jU_iU_j),  i ≠ j
       = ∑K_i²E(U_i²) + 2∑∑K_iK_jE(U_iU_j) = σ²∑K_i²   (since E(U_iU_j) = 0)

Since K_i = x_i / ∑x_i², we have ∑K_i² = ∑x_i² / (∑x_i²)² = 1 / ∑x_i², and therefore

⇒ Var(β̂) = σ²∑K_i² = σ² / ∑x_i² ………………...…………………………………….. 2.40

ii) Variance of α̂

Var(α̂) = E(α̂ − E(α̂))² = E(α̂ − α)² …………………………………………………………….. 2.41

Substituting equation (2.37) into (2.41), we get

Var(α̂) = E[∑(1/n − X̄K_i)U_i]²
       = ∑(1/n − X̄K_i)² E(U_i²)   (the cross terms vanish since E(U_iU_j) = 0)
       = σ²∑(1/n − X̄K_i)²
       = σ²∑(1/n² − (2/n)X̄K_i + X̄²K_i²)
       = σ²(1/n − (2X̄/n)∑K_i + X̄²∑K_i²)
       = σ²(1/n + X̄²∑K_i²),  since ∑K_i = 0
       = σ²(1/n + X̄² / ∑x_i²),  since ∑K_i² = 1 / ∑x_i²

Again,

1/n + X̄² / ∑x_i² = (∑x_i² + nX̄²) / (n∑x_i²) = ∑X_i² / (n∑x_i²)

⇒ Var(α̂) = σ²(1/n + X̄² / ∑x_i²) = σ²[∑X_i² / (n∑x_i²)] ………………………………………….……… 2.42

We have computed the variances of the OLS estimators. Now it is time to check whether these OLS
estimators possess the minimum variance property compared with other linear and unbiased
estimators of the true α and β, other than α̂ and β̂.
To establish that α̂ and β̂ possess the minimum variance property, we compare their variances with
those of some other alternative linear and unbiased estimators of α and β, say α* and β*. We want
to prove that any other linear and unbiased estimator of the true population parameter, obtained from
any other econometric method, has a larger variance than the OLS estimators. Let us first show the
minimum variance of β̂ and then that of α̂.

1. Minimum variance of β̂

Suppose β* is an alternative linear and unbiased estimator of β, and let

β* = ∑w_iY_i …………………………………………………………..…...………2.43

where w_i ≠ K_i; but w_i = K_i + c_i.

β* = ∑w_i(α + βX_i + U_i), since Y_i = α + βX_i + U_i
   = α∑w_i + β∑w_iX_i + ∑w_iU_i

⇒ E(β*) = α∑w_i + β∑w_iX_i, since E(U_i) = 0

Since β* is assumed to be an unbiased estimator of β, it must be true that ∑w_i = 0 and ∑w_iX_i = 1
in the above equation.

But w_i = K_i + c_i, so
∑w_i = ∑(K_i + c_i) = ∑K_i + ∑c_i

Therefore, ∑c_i = 0, since ∑K_i = ∑w_i = 0.

Again, ∑w_iX_i = ∑(K_i + c_i)X_i = ∑K_iX_i + ∑c_iX_i.
Since ∑w_iX_i = 1 and ∑K_iX_i = 1 ⇒ ∑c_iX_i = 0.

From these values we can derive ∑c_ix_i = 0, where x_i = X_i − X̄:

∑c_ix_i = ∑c_i(X_i − X̄) = ∑c_iX_i − X̄∑c_i = 0

Thus, from the above calculations we can summarize the following results.

∑w_i = 0,  ∑w_iX_i = 1,  ∑c_i = 0,  ∑c_iX_i = 0,  ∑c_ix_i = 0

To check whether β̂ has minimum variance, let us compute Var(β*) and compare it with Var(β̂).

Var(β*) = Var(∑w_iY_i) = ∑w_i²Var(Y_i)

⇒ Var(β*) = σ²∑w_i², since Var(Y_i) = σ²

But ∑w_i² = ∑(K_i + c_i)² = ∑K_i² + 2∑K_ic_i + ∑c_i²

⇒ ∑w_i² = ∑K_i² + ∑c_i², since ∑K_ic_i = ∑c_ix_i / ∑x_i² = 0

Therefore, Var(β*) = σ²(∑K_i² + ∑c_i²) = σ²∑K_i² + σ²∑c_i²

Var(β*) = Var(β̂) + σ²∑c_i²

Given that the c_i are arbitrary constants (not all zero), σ²∑c_i² is positive, i.e. greater than zero.
Thus Var(β*) > Var(β̂). This proves that β̂ possesses the minimum variance property. In a similar
way we can prove that the least squares estimate of the intercept (α̂) possesses minimum variance.

Exercise 2: Check for α̂.

The variance of the random term (U_i)

You may observe that the variances of the OLS estimates involve σ², the population variance of the
random disturbance term. But it is difficult to obtain population data on the disturbance term for
technical and economic reasons; hence it is difficult to compute σ², which implies that the variances
of the OLS estimates are also difficult to compute. However, we can compute these variances if we
use the unbiased estimate of σ², namely σ̂², computed from the sample values of the disturbance
term e_i from the expression:

σ̂_u² = ∑e_i² / (n − 2) ………………………………………………………….………….. 2.44

To use σ̂² in the expressions for the variances of α̂ and β̂, we have to prove whether σ̂² is an
unbiased estimator of σ², i.e. whether

E(σ̂²) = E[∑e_i² / (n − 2)] = σ²

To prove this we have to compute ∑e_i² from the expressions for Y, Ŷ, y, ŷ and e_i.

Proof:

Y_i = α̂ + β̂X_i + e_i
Ŷ_i = α̂ + β̂X_i

⇒ Y_i = Ŷ_i + e_i …………………………………..……..…………………………… 2.45

⇒ e_i = Y_i − Ŷ_i ………………………………….……….………………………… 2.46

Summing (2.45) gives

∑Y_i = ∑Ŷ_i + ∑e_i = ∑Ŷ_i, since ∑e_i = 0

Dividing both sides by n gives

∑Y_i / n = ∑Ŷ_i / n  ⇒  Ȳ = Ŷ̄ ……………………………………………… 2.47

Putting (2.45) and (2.47) together and subtracting:

Y_i = Ŷ_i + e_i
Ȳ = Ŷ̄

⇒ (Y_i − Ȳ) = (Ŷ_i − Ŷ̄) + e_i

⇒ y_i = ŷ_i + e_i …………………………………………………………… 2.48

From (2.48):
e_i = y_i − ŷ_i ……………………………..………………………………... 2.49

where the y's are in deviation form.


Now, we have to express y_i and ŷ_i in other forms, as derived below.

From Y_i = α + βX_i + U_i and Ȳ = α + βX̄ + Ū, we get, by subtraction,

y_i = (Y_i − Ȳ) = β(X_i − X̄) + (U_i − Ū) = βx_i + (U_i − Ū)

⇒ y_i = βx_i + (U_i − Ū) ……………………………..……………………...……….. 2.50

Note that we assumed earlier that E(U) = 0, i.e. in taking a very large number of samples we
expect U to have a mean value of zero, but in any particular single sample Ū is not
necessarily zero.

Similarly, from Ŷ_i = α̂ + β̂X_i and Ŷ̄ = α̂ + β̂X̄, we get, by subtraction,

Ŷ_i − Ŷ̄ = β̂(X_i − X̄)

⇒ ŷ_i = β̂x_i ……………………………………………………………………. 2.51

Substituting (2.50) and (2.51) into (2.49) we get:

e_i = βx_i + (U_i − Ū) − β̂x_i
    = (U_i − Ū) − (β̂ − β)x_i

The summation of the squares of the residuals over the n sample values yields:

∑e_i² = ∑[(U_i − Ū) − (β̂ − β)x_i]²
      = ∑[(U_i − Ū)² + (β̂ − β)²x_i² − 2(β̂ − β)x_i(U_i − Ū)]
      = ∑(U_i − Ū)² + (β̂ − β)²∑x_i² − 2(β̂ − β)∑x_i(U_i − Ū)

Taking expected values, we have:

E(∑e_i²) = E[∑(U_i − Ū)²] + E[(β̂ − β)²∑x_i²] − 2E[(β̂ − β)∑x_i(U_i − Ū)] ………….………..… 2.52

The right-hand-side terms of (2.52) may be evaluated as follows:

a) E[∑(U_i − Ū)²] = E(∑U_i² − nŪ²)
   = E(∑U_i²) − nE(Ū²)
   = nσ_u² − n·E[(1/n)∑U_i]²
   = nσ_u² − (1/n)E(U_1 + U_2 + … + U_n)²,  since E(U_i²) = σ_u²
   = nσ_u² − (1/n)E(∑U_i² + 2∑∑U_iU_j),  i ≠ j
   = nσ_u² − (1/n)(nσ_u²) − (2/n)∑∑E(U_iU_j)
   = nσ_u² − σ_u²,  given E(U_iU_j) = 0
   = σ_u²(n − 1) ………………………………..…………………….. 2.53

b) [( ˆ   ) 2 xi ]  xi2 .(ˆ   ) 2


2

Given that the X’s are fixed in all samples and we know that
1
( ˆ   ) 2  var( ˆ )   u2
x 2
1
Hence xi2 .( ˆ   ) 2  xi2 .  u2 2
x

xi2 .( ˆ   ) 2   u2 ……………….………………………………….. 2.54

c) −2E[(β̂ − β)∑x_i(U_i − Ū)] = −2E[(β̂ − β)(∑x_iU_i − Ū∑x_i)]
   = −2E[(β̂ − β)∑x_iU_i],  since ∑x_i = 0

But from (2.36), (β̂ − β) = ∑K_iU_i, and substituting this into the above expression gives:

−2E[(β̂ − β)∑x_iU_i] = −2E[(∑K_iU_i)(∑x_iU_i)]
   = −2E[(∑x_iU_i)² / ∑x_i²],  since K_i = x_i / ∑x_i²
   = −2E[(∑x_i²U_i² + 2∑∑x_ix_jU_iU_j) / ∑x_i²]
   = −2[∑x_i²E(U_i²) + 2∑∑x_ix_jE(U_iU_j)] / ∑x_i²,  i ≠ j
   = −2∑x_i²E(U_i²) / ∑x_i²,  given E(U_iU_j) = 0
   = −2E(U_i²) = −2σ_u² ……………………………………………………………. 2.55

Consequently, equation (2.52) can be written in terms of (2.53), (2.54) and (2.55) as follows:

E(∑e_i²) = (n − 1)σ_u² + σ_u² − 2σ_u² = (n − 2)σ_u² …………………..……………….…………. 2.56

From which we get

E[∑e_i² / (n − 2)] = E(σ̂_u²) = σ_u² ……………………….……………………………….. 2.57

since σ̂_u² = ∑e_i² / (n − 2).

Thus, σ̂² = ∑e_i² / (n − 2) is an unbiased estimate of the true variance of the error term (σ²).

The conclusion we can draw from the above proof is that we can substitute σ̂² = ∑e_i² / (n − 2)
for σ² in the variance expressions of α̂ and β̂, since E(σ̂²) = σ². Hence the formulae for the
variances of α̂ and β̂ become:

Var(β̂) = σ̂² / ∑x_i² = ∑e_i² / [(n − 2)∑x_i²] ……………...………………………………… 2.58

Var(α̂) = σ̂²·∑X_i² / (n∑x_i²) = ∑e_i²·∑X_i² / [n(n − 2)∑x_i²] ………………………………………… 2.59

Note: ∑e_i² can be computed as ∑e_i² = ∑y_i² − β̂∑x_iy_i.
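As a sketch (not part of the original text; Python with numpy assumed), applying equations 2.44, 2.58 and 2.59 to the Example 2.1 data reproduces the standard errors used later in the chapter (about 0.90 for the slope and about 8.3 for the intercept).

```python
import numpy as np

# Data from Example 2.1.
Y = np.array([69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64], dtype=float)
X = np.array([9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8], dtype=float)
n = X.size

x, y = X - X.mean(), Y - Y.mean()
beta_hat = np.sum(x * y) / np.sum(x**2)
alpha_hat = Y.mean() - beta_hat * X.mean()

e = Y - alpha_hat - beta_hat * X
sigma2_hat = np.sum(e**2) / (n - 2)                          # equation 2.44, ~38.7
var_beta = sigma2_hat / np.sum(x**2)                         # equation 2.58, ~0.806
var_alpha = sigma2_hat * np.sum(X**2) / (n * np.sum(x**2))   # equation 2.59, ~68.5
print(np.sqrt(var_beta), np.sqrt(var_alpha))                 # se's: ~0.90 and ~8.3
```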

2.5. Maximum Likelihood Method

The Maximum Likelihood Method (MLM) is another method of obtaining estimates of the population
parameters from a random sample. The MLM chooses, among all possible estimates of the parameters,
those values which make the probability of obtaining the observed sample as large as possible.

Assumptions of the Maximum Likelihood Method

i) The form of the distribution of the population Y_i's is assumed to be known. In particular,
we assume that the distribution of the Y_i's is normal.
ii) The sample must be random and each U_i must be independent of any other value U_j:
E(U_iU_j) = 0 for i ≠ j.
iii) Any random sample must be representative of the population.

Definition: The function which defines the joint (total) probability of any sample being observed
is called the likelihood function. The general expression of the likelihood function is:

L = f(X_1, X_2, X_3, …, X_k; β_0, β_1, β_2, …, β_k)

where the first group of arguments are the variables and the second group are the parameters. For
the simple linear model with normal errors, the likelihood function is

L = [1/√(2πσ_u²)]^n · exp[ −(1/(2σ_u²))·∑(Y_i − β̂_0 − β̂_1X_i)² ]

Setting ∂L/∂β_0 = 0: since the exponential factor is never zero, this requires

(1/σ_u²)·∑(Y_i − β̂_0 − β̂_1X_i) = 0, that is

∑(Y_i − β̂_0 − β̂_1X_i) = 0 ………………………………………………………… 2.60

Similarly, by taking the partial derivative of L with respect to β_1 and setting ∂L/∂β_1 = 0, we
arrive at

β̂_0 = Ȳ − β̂_1X̄

β̂_1 = [n∑Y_iX_i − ∑Y_i∑X_i] / [n∑X_i² − (∑X_i)²]   or   β̂_1 = ∑y_i x_i / ∑x_i²
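As an illustrative sketch only (not part of the original text; Python with numpy and scipy assumed), maximizing the normal log-likelihood numerically for the Example 2.1 data returns estimates that coincide with the OLS values, which is what the derivation above implies for the intercept and slope.

```python
import numpy as np
from scipy.optimize import minimize

# Data from Example 2.1; maximize the normal log-likelihood numerically.
Y = np.array([69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64], dtype=float)
X = np.array([9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8], dtype=float)

def neg_log_likelihood(params):
    b0, b1, log_sigma = params
    sigma2 = np.exp(log_sigma) ** 2
    resid = Y - b0 - b1 * X
    n = Y.size
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum(resid**2) / (2 * sigma2)

start = [Y.mean(), 0.0, np.log(Y.std())]
result = minimize(neg_log_likelihood, x0=start, method="Nelder-Mead")
print(result.x[:2])   # close to (33.75, 3.25), the OLS values
```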

2.6. Confidence Interval and Hypothesis Testing
2.6.1. Testing the significance of a given regression coefficient

Since the sample values of the intercept and the coefficient are estimates of the true population
parameters, we have to test them for their statistical reliability. The significance of a model can be
seen in terms of the amount of variation in the dependent variable that it explains and the
significance of the regression coefficients. There are different tests that are available to test the
statistical reliability of the parameter estimates. The following are the common ones;

A) The standard error test


B) The student's t-test

A) The Standard Error Test

This test first establishes the two hypotheses to be tested, commonly known as the null and
alternative hypotheses. The null hypothesis states that the sample comes from a population whose
parameter is not significantly different from zero, while the alternative hypothesis states that the
sample comes from a population whose parameter is significantly different from zero. The two
hypotheses are given as follows:

H0: βi=0
H1: βi≠0
The standard error test is outlined as follows:

1. Compute the standard deviations (standard errors) of the parameter estimates using the above
formulae for the variances of the parameter estimates. This is because the standard deviation is the
positive square root of the variance:

se(β̂_1) = √(σ̂_u² / ∑x_i²)

se(β̂_0) = √(σ̂_u² ∑X_i² / (n∑x_i²))

2. Compare the standard errors of the estimates with the numerical values of the estimates and
make a decision.
A) If the standard error of the estimate is less than half of the numerical value of the estimate,
we can conclude that the estimate is statistically significant. That is, if se(β̂_i) < (1/2)|β̂_i|,
reject the null hypothesis and conclude that the estimate is statistically significant.
B) If the standard error of the estimate is greater than half of the numerical value of the
estimate, the parameter estimate is not statistically reliable. That is, if se(β̂_i) > (1/2)|β̂_i|,
accept the null hypothesis and conclude that the estimate is not statistically significant.

Example 2.4: The following is the estimated regression of supply on price, where the numbers in
parentheses are standard errors. Test the statistical significance of the estimates using the standard
error test.

Ŷ_i = 33.75 + 3.25X_i
       (8.3)   (0.9)

Solution: The following information is given for the decision.

β̂_0 = 33.75,  se(β̂_0) = 8.3
β̂_1 = 3.25,   se(β̂_1) = 0.9

Testing for β̂_1: since the standard error of β̂_1 (0.9) is less than half of the value of β̂_1 (1.625),
we reject the null hypothesis and conclude that the parameter estimate β̂_1 is statistically significant.

Testing for β̂_0: since the standard error of β̂_0 (8.3) is less than half of the numerical value of β̂_0
(16.875), we reject the null hypothesis and conclude that β̂_0 is statistically significant.

B) The Student T-Test

In conditions where the Z-test does not apply (small samples), the t-test can be used to test the
statistical reliability of the parameter estimates. The test depends on the degrees of freedom of the
sample. The test procedure of the t-test is similar to that of the Z-test, and is outlined as
follows:

1. Set up the hypotheses. The hypotheses for testing a given regression coefficient are:

H_0: β_i = 0
H_1: β_i ≠ 0 ………………………………………………………………………..……… 2.61

2. Determine the level of significance for carrying out the test. We usually use a 5% level of
significance in applied econometric research.
3. Determine the tabulated value of t from the table with n − k degrees of freedom, where k is the
number of parameters estimated.
4. Determine the calculated value of t. The test statistic (using the t-test) is given by:

t_cal = β̂_i / se(β̂_i) ……………………………………………..………..……………………… 2.62

The test rule or decision is given as follows: reject H_0 if

|t_cal| > t_(α/2, n−k) …………………………………………...……………………………………. 2.63

Example 2.5: Refer to Example 2.1. Is the price of the commodity significant in determining the
quantity supplied of the commodity under consideration? Use α = 0.05.

The hypothesis to be tested is:

H_0: β_1 = 0
H_1: β_1 ≠ 0

As we found in Example 2.1, β̂_1 = 3.25 and se(β̂_1) = 0.8979. Then

t_cal = β̂_1 / se(β̂_1) = 3.25 / 0.8979 = 3.62

The tabulated value, with 10 degrees of freedom at the 5% level of significance, is 2.228. Since the
calculated t is greater than the tabulated value, we reject the null hypothesis and conclude that the
price of the commodity is significant in determining the quantity supplied of the commodity. Note
that when the degrees of freedom are large, we can conduct the t-test without consulting the t-table
for the theoretical value of t. This rule is known as the "2t-rule". The t-table shows that the values
of t change very slowly once the degrees of freedom (n − k) are greater than 8. For example, the
value of t_0.025 changes from 2.30 (when n − k = 8) to 1.96 (when n − k = ∞). The change from
2.30 to 1.96 is obviously very slow. Consequently, we can ignore the degrees of freedom (when they
are greater than 8) and take the theoretical value of t to be about 2.0. Thus, a two-tailed test of a
null hypothesis at the 5% level of significance can be reduced to the following rules.

1. If t_cal is greater than 2 or less than −2, we reject the null hypothesis.
2. If t_cal lies between −2 and 2, we accept the null hypothesis.
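The calculation in Example 2.5 can be reproduced with a short sketch (not part of the original text; Python, with scipy assumed for the critical value).

```python
from scipy import stats

# t-test for the slope in Example 2.5 (Example 2.1 data).
beta_hat, se_beta = 3.25, 0.8979
n, k = 12, 2

t_cal = beta_hat / se_beta                       # equation 2.62, ~3.62
t_tab = stats.t.ppf(1 - 0.05 / 2, df=n - k)      # two-tailed 5% critical value, ~2.228
print(t_cal, t_tab, abs(t_cal) > t_tab)          # True -> reject H0
```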

2.6.2. Confidence Interval Estimation of the regression Coefficients

In the above section, we have seen how to test the reliability of parameter estimates. But one thing
that must be clear is that rejecting the null hypothesis does not mean that the parameter estimates
are correct estimates of the true population parameters. It means that the estimate comes from the
sample drawn from the population whose population parameter is significantly different from zero.
In order to define how close to the estimate the true parameter lies, we must construct a confidence
interval for the parameter. Like we constructed confidence interval estimates for a given
population mean using the sample mean (in Introduction to Statistics), we can construct 100(1 −
α)% confidence intervals for the sample regression coefficients. To do so we need to have the
standard errors of the sample regression coefficients. The standard error of a given coefficient is
the positive square root of the variance of the coefficient. As discussed above, the formulae for
finding the variances of the regression coefficients are given as follows.

Variance of the intercept (β̂_0):

Var(β̂_0) = σ̂_u² ∑X_i² / (n∑x_i²) …………………………………...…………………………………… 2.64

Variance of the slope (β̂_1):

Var(β̂_1) = σ̂_u² / ∑x_i² …………………………………………………………………………. 2.65

Where

σ̂_u² = ∑e_i² / (n − k) ………………………………………………….………………………………. 2.66

is the estimate of the variance of the random term and k is the number of parameters to be estimated
in the model.

The standard errors are the positive square roots of the variances, as repeatedly defined above, and
the 100(1 − α)% confidence interval for the slope is:

β̂_1 − t_(α/2)(n − k)·se(β̂_1) < β_1 < β̂_1 + t_(α/2)(n − k)·se(β̂_1)

β_1 = β̂_1 ± t_(α/2, n−k)·se(β̂_1) …………………………………………………………………….. 2.67

And for the intercept:

β_0 = β̂_0 ± t_(α/2, n−k)·se(β̂_0) ……………………………………………………………………. 2.68

Example 2.6: From Example 2.1 above, determine the 95% confidence interval for the slope.

β̂_1 = ∑xy / ∑x² = 156/48 = 3.25,   β̂_0 = Ȳ − β̂_1X̄ = 63 − (3.25)(9) = 33.75

σ̂_u² = ∑e_i² / (n − k) = 387 / (12 − 2) = 38.7,   Var(β̂_1) = σ̂_u² / ∑x² = 38.7·(1/48) = 0.80625

The standard error of the slope is:

se(β̂_1) = √Var(β̂_1) = √0.80625 = 0.8979

The tabulated value of t for 12 − 2 = 10 degrees of freedom and α/2 = 0.025 is 2.228. Hence, the 95%
confidence interval for the slope is given by:

β_1 = 3.25 ± (2.228)(0.8979) = 3.25 ± 2, i.e. (3.25 − 2, 3.25 + 2) = (1.25, 5.25)
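The same interval can be reproduced with a short sketch (not part of the original text; Python, with scipy assumed).

```python
from scipy import stats

# 95% confidence interval for the slope in Example 2.6.
beta_hat, se_beta = 3.25, 0.8979
t_tab = stats.t.ppf(0.975, df=12 - 2)     # ~2.228

lower = beta_hat - t_tab * se_beta        # ~1.25
upper = beta_hat + t_tab * se_beta        # ~5.25
print(lower, upper)
```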

2.7. Test of Model Adequacy (Overall Significance Test)

Is the estimated equation a useful one? To answer this, an objective measure of some sort is
desirable.

The total variation in the dependent variable Y can be partitioned into two parts: one that accounts
for the variation due to the regression equation (the explained portion) and another that is associated
with the unexplained portion of the model.

∑(Y − Ȳ)² = ∑(Ŷ − Ȳ)² + ∑(Y − Ŷ)²

TSS = ESS + RSS


In other words, the total sum of squares (TSS) is decomposed into regression (explained) sum of
squares (ESS) and error (residual or unexplained) sum of squares (RSS).

Thus, the coefficient of determination is:

R² = ∑(Ŷ_i − Ȳ)² / ∑(Y_i − Ȳ)² = ∑ŷ_i² / ∑y_i²

or

R² = β̂∑x_iy_i / ∑y_i²   or   R² = 1 − ∑(Y_i − Ŷ_i)² / ∑(Y_i − Ȳ)² = 1 − ∑e_i² / ∑y_i²

Note:
1) The proportion of total variation in the dependent variable (Y) that is explained by changes in
the independent variable (X) or by the regression line is equal to: R2 x 100%.
2) The proportion of total variation in the dependent variable (Y) that is due to factors other than
X (for example, due to excluded variables, chance, etc.) is equal to: (1– R2) x 100%.

Tests for the coefficient of determination (R2)

The largest value that R2 can assume is 1 (in which case all observations fall on the regression
line), and the smallest it can assume is zero. A low value of R2 is an indication that:
 X is a poor explanatory variable in the sense that variation in X leaves Y unaffected, or
 While X is a relevant variable, its influence on Y is weak as compared to some other
variables that are omitted from the regression equation, or
 The regression equation is mis-specified (for example, an exponential relationship might
be more appropriate).

Thus, a small value of R2 casts doubt about the usefulness of the regression equation. We do not,
however, pass final judgment on the equation until it has been subjected to an objective statistical
test. Such a test is accomplished by means of analysis of variance (ANOVA) which enables us to
test the significance of R2 (i.e., the adequacy of the linear regression model). The ANOVA table
for simple linear regression is given below:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square       Variance Ratio
Regression            ESS              k − 1                ESS/(k − 1)       F_cal = [ESS/(k − 1)] / [RSS/(n − k)]
Residual              RSS              n − k                RSS/(n − k)
Total                 TSS              n − 1

To test for the significance of R2, we compare the variance ratio with the critical value from the F
distribution with (k-1) and (n-k) degrees of freedom in the numerator and denominator,
respectively, for a given significance level α.

Decision: If the calculated variance ratio exceeds the tabulated value, that is, if F_cal > F_α(k − 1, n − k),
we conclude that R² is significant (or that the linear regression model is adequate).

Note that, the F test is designed to test the significance of all variables or a set of variables in a
regression model. In the two-variable model, however, it is used to test the explanatory power of
a single variable (X), and at the same time, is equivalent to the test of significance of R2.
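For the Example 2.1 regression, the ANOVA test can be sketched as follows (not part of the original text; Python, with scipy assumed). The sums of squares are those computed earlier in the chapter (TSS = 894, RSS = 387).

```python
from scipy import stats

# ANOVA / overall significance test for the Example 2.1 regression.
tss, rss = 894.0, 387.0
ess = tss - rss                                   # 507
n, k = 12, 2

f_cal = (ess / (k - 1)) / (rss / (n - k))         # ~13.1
f_tab = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)   # ~4.96
print(f_cal, f_tab, f_cal > f_tab)                # True -> R^2 is significant
```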

2.8. Prediction Using Simple Linear Regression Model

Once the estimated parameters are proved to be significant and valid, the model can be used to
forecast or predict future values of the dependent variable. Prediction or forecasting of future values
of economic variables such as demand, cost and so on is a crucial contribution of econometric
analysis to economic decision making, for instance in setting future prices, determining production
levels, making investment decisions and so on.

Hence, predicting the future values of the dependent variable is one of the key tasks in econometric
analysis. The estimated regression equation Ŷ_i = α̂ + β̂X_i is used for predicting the values of Y
for given values of X. To proceed, let X_0 be the given value of X. Then we predict the
corresponding value Y_0 of Y by

Ŷ_0 = α̂ + β̂X_0 ……………………………………...…………………. 2.69

The true value of Y is given by

Y_0 = α + βX_0 + U_0,  where U_0 is the error term.

Hence, the prediction error is:

Ŷ_0 − Y_0 = (α̂ − α) + (β̂ − β)X_0 − U_0

Since E(α̂ − α) = 0, E(β̂ − β) = 0 and E(U_0) = 0, we have E(Ŷ_0 − Y_0) = 0.

This equation shows that the predictant is unbiased. Note that the predictant is unbiased in the
sense that E(Ŷ_0) = E(Y_0), since both Ŷ_0 and Y_0 are random variables.

Example 2.7: From Example 2.1, predict the value of Y when X is 45.

Solution:

Ŷ_i = 33.75 + 3.25X_i = 33.75 + (3.25)(45) = 180

That means that when X assumes a value of 45, Y is on average expected to be 180.
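A one-line sketch of the same point prediction (not part of the original text; Python), using the fitted coefficients and equation 2.69:

```python
# Point prediction from the fitted Example 2.1 model (equation 2.69).
alpha_hat, beta_hat = 33.75, 3.25
x0 = 45
y0_hat = alpha_hat + beta_hat * x0
print(y0_hat)   # 180.0
```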
