Econometrics II Distance Module



UNITY UNIVERSITY

COLLEGE OF DISTANCE AND CONTINUING EDUCATION

DEPARTMENT OF ECONOMICS

ECONOMETRICS II (Econ-2062) Module


A. Course Overview
 Course Title: Econometrics II
 Course No: Econ-2062
 Credit Hours: 3
 Pre-requisite: Econ-2061
B. Course Description
This course is a continuation of Econometrics I. It aims at introducing the theory (and practice) of regression on qualitative information, time series and panel data econometrics, as well as simultaneous equation modeling. It first introduces the basic concepts in qualitative information modeling, such as dummy variable regression and binary choice models (LPM, logit and probit). Elementary time series models, estimation and tests for both stationary and non-stationary data will then be discussed. It also covers an introduction to simultaneous equation modeling with alternative estimation methods. Introductory pooled cross-sectional and panel data models will finally be highlighted.
C. Course objectives

After the completion of this course, students are expected to:

o Understand the basic concepts in regression involving dummy independent and dependent variables;
o Know the theory and practice of elementary time series econometrics;
o Understand the motivation and estimation methods of simultaneous equation modelling; and
o Get introductory ideas on linear panel data models.

D. Assessment/Evaluation
Attendance 5%
Quiz 5%
Test-I 15%
Test-II 15%
Assignment 10%
Final Exam 50%
Total Mark 100%
E. Required text book:
 Gujarati, D. N. (2009). Basic Econometrics, 5th edition, McGraw-Hill.
F. Additional readings:

 Maddala, G. S. (2013). Introduction to Econometrics, 4th edition, Macmillan.
 Wooldridge, J. (2004). Introductory Econometrics: A Modern Approach, 2nd edition.
 Enders, W. (2004). Applied Econometric Time Series, John Wiley & Sons: Singapore.
 Koutsoyiannis, A. (2001). Theory of Econometrics, Palgrave: New York.
 Johnston, J. (2015). Econometric Methods, 3rd edition.
 Kmenta, J. (2008). Elements of Econometrics, 2nd edition.
 Intriligator, M. D., R. G. Bodkin, and C. Hsiao (1996). Econometric Models, Techniques and Applications.


Chapter One

Regression Analysis with Qualitative Information: Binary (Dummy Variables)

Topic Learning Objectives

This chapter aims at introducing models with binary explanatory variables and the specification and estimation of models with qualitative/dummy variables.

In this chapter you will be able to:

 Familiarize yourself with regression models with binary explanatory variables
 Understand some of the specifications of the model for dummy dependent and dummy independent variables.
 Understand the basic concepts in qualitative information modeling, such as dummy variable regression and binary choice models (LPM, Logit and Probit).

Topic Outline

1. Introduction
2. Describing Qualitative Information
3. Dummy as Independent Variables
4. Dummy as Dependent Variable
4.1. The Linear Probability Model (LPM)
4.2. The Logit and Probit Models
4.3. Interpreting the Probit and Logit Model Estimates

5. Review questions

1.1 Introduction
As mentioned in the previous section, this chapter deals with the role of qualitative explanatory variables in regression analysis and the interpretation of the coefficients/parameter estimates of such models. It will be shown that the introduction
of qualitative variables, often called dummy variables, makes the linear regression
model an extremely flexible tool that is capable of handling many interesting problems
encountered in empirical studies. Having a brief introduction on such binary variables,
several models( such as LPM, Logit, Probit and Tobit models) in which the dependent
variable itself is qualitative in nature will be discussed.

1.2 Describing Qualitative Information

In the regression analysis you discussed in Econometrics I, the dependent variable as well as the explanatory variables were quantitative in nature. However, the dependent variable may be influenced not only by variables that can be readily quantified on some
well-defined scale (e.g., income, output, prices, costs, height, and temperature), but
also by variables that are essentially qualitative in nature (e.g., sex, race, color, religion,
nationality, and changes in government economic policy). For example, holding all
other factors constant, female college professors are found to earn less than their male
counterparts, and nonwhites are found to earn less than whites. This pattern may result
from sex or racial discrimination, but whatever the reason, qualitative variables such as sex and race do influence the dependent variable and clearly should be included
among the explanatory variables. Since such qualitative variables usually indicate the
presence or absence of a “quality” or an attribute, such as male or female, black or
white, one method of “quantifying” such attributes is by constructing artificial variables
that take on values of 1 or 0, 0 indicating the absence of an attribute and 1 indicating
the presence (or possession) of that attribute. For example, 1 may indicate that a
person is a male, and 0 may designate a female; or 1 may indicate that a person is a
college graduate, and 0 that he is not, and so on. Variables that assume such 0 and 1
values are called dummy variables. Alternative names are indicator variables, binary variables, categorical variables, and dichotomous variables.


1.3. Dummy Variable as Independent Variable

1.3.1. Regression on one qualitative variable with two classes, or categories

Dummy variables can be used in regression models just as easily as quantitative variables. As a matter of fact, a regression model may contain explanatory variables that are exclusively dummy, or qualitative, in nature.

Example:   Yi = α + βDi + ui   (1.1)

Where Yi = annual salary of a college professor
      Di = 1 if male college professor
         = 0 otherwise (i.e., female professor)

Note that (1.1) is like the two variable regression models encountered in econometrics I
except that instead of a quantitative X variable we have a dummy variable D (hereafter,
we shall designate all dummy variables by the letter D).

Model (1. 1) may enable us to find out whether sex makes any difference in a college
professor’s salary, assuming, of course, that all other variables such as age, degree
attained, and years of experience, are held constant. Assuming that the disturbances satisfy the usual assumptions of the classical linear regression model, we obtain from (1.1):

Mean salary of female college professor: E(Yi | Di = 0) = α
Mean salary of male college professor:   E(Yi | Di = 1) = α + β

That is, the intercept term α gives the mean salary of female college professors and the slope coefficient β tells by how much the mean salary of a male college professor differs from the mean salary of his female counterpart, α + β reflecting the mean
salary of the male college professor. A test of the null hypothesis that there is no sex discrimination (H0: β = 0) can be easily made by running regression (1.1) in the usual manner and finding out whether, on the basis of the t-test, the estimated β is statistically significant.
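To see what such a dummy-variable regression looks like in practice, here is a minimal sketch in Python using the numpy and statsmodels packages. The salary figures are invented purely for illustration; the point is that the fitted intercept equals the female sample mean, the coefficient on D equals the male-female differential, and its t-statistic provides the test of H0: β = 0.

# Minimal sketch of regression on a single dummy variable (invented data).
import numpy as np
import statsmodels.api as sm

salary = np.array([23.0, 19.5, 24.0, 21.0, 25.5, 22.0, 26.0, 20.5])  # hypothetical salaries
male = np.array([1, 0, 1, 0, 1, 0, 1, 0])                            # D = 1 if male, 0 if female

X = sm.add_constant(male)          # regressors: intercept (alpha) and the dummy D
res = sm.OLS(salary, X).fit()

print(res.params)                  # [alpha, beta] = female mean and male-female differential
print(res.tvalues[1])              # t-statistic for H0: beta = 0
print(salary[male == 0].mean())    # equals the fitted intercept alpha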

1.3.2. Regression on one quantitative variable and one qualitative variable with two classes

Consider the model:   Yi = α1 + α2Di + βXi + ui   (1.3)

Where: Yi = annual salary of a college professor
       Xi = years of teaching experience
       Di = 1 if male
          = 0 otherwise

Model (1.3) contains one quantitative variable (years of teaching experience) and one qualitative variable (sex) that has two classes (or levels, classifications, or categories), namely, male and female. What is the meaning of this equation? Assuming, as usual, that E(ui) = 0, we see that

Mean salary of female college professor: E(Yi | Xi, Di = 0) = α1 + βXi     (1.4)
Mean salary of male college professor:   E(Yi | Xi, Di = 1) = (α1 + α2) + βXi     (1.5)

Geometrically, we have the situation shown in Fig. 1.1 (for illustration, it is assumed that α1 > 0). In words, model (1.3) postulates that the male and female college professors' salary functions in relation to the years of teaching experience have the same slope (β) but different intercepts. In other words, it is assumed that the level of the male
professor’s mean salary is different from that of the female professor’s mean salary (by
α 2 ) but the rate of change in the mean annual salary by years of experience is the

same for both sexes.

If the assumption of common slopes is valid, a test of the hypothesis that the two
regressions (1.4) and (1.5) have the same intercept (i.e., there is no sex discrimination)
can be made easily by running the regression (1.3) and noting the statistical

significance of the estimated α2 on the basis of the traditional t-test. If the t-test shows that the estimated α2 is statistically significant, we reject the null hypothesis that the male and
female college professors’ levels of mean annual salary are the same.

Before proceeding further, note the following features of the dummy variable regression
model considered previously.

1. To distinguish the two categories, male and female, we have introduced only one dummy variable Di. For example, if Di = 1 always denotes a male, then when Di = 0 we know that it is a female, since there are only two possible outcomes.
Hence, one dummy variable suffices to distinguish two categories. The general
rule is this: If a qualitative variable has ‘m’ categories, introduce only
‘m-1’ dummy variables. In our example, sex has two categories, and hence
we introduced only a single dummy variable. If this rule is not followed, we shall fall into what might be called the dummy variable trap, that is, a situation of perfect multicollinearity (a small illustration follows this list).

2. The assignment of 1 and 0 values to two categories, such as male and female, is
arbitrary in the sense that in our example we could have assigned D=1 for
female and D=0 for male.

3. The group, category, or classification that is assigned the value of 0 is often referred to as the base, benchmark, control, comparison, reference, or omitted category. It is the base in the sense that comparisons are made with that category.

4. The coefficient α2 attached to the dummy variable D can be called the differential intercept coefficient because it tells by how much the value of the intercept term of the category that receives the value of 1 differs from the intercept coefficient of the base category.
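The dummy variable trap mentioned in point 1 is easy to demonstrate numerically. The sketch below (Python with pandas, numpy and statsmodels; the data are invented) builds dummies for a three-category education variable: with m − 1 = 2 dummies the regressor matrix has full column rank, but adding the third dummy alongside the intercept makes the columns exactly collinear.

# Sketch of the dummy variable trap with a three-category variable (invented data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "exp":  [1, 3, 5, 7, 9, 11, 13, 15],
    "educ": ["less_hs", "hs", "college", "hs", "less_hs", "college", "hs", "college"],
})

d = pd.get_dummies(df["educ"]).astype(float)     # one column per category

# Correct specification: intercept + m - 1 = 2 dummies ("less_hs" is the base category).
X_ok = sm.add_constant(pd.concat([df["exp"], d[["hs", "college"]]], axis=1))
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok.to_numpy()))      # 4 columns, rank 4

# Trap: intercept + all 3 dummies -> the dummy columns sum to the constant column.
X_trap = sm.add_constant(pd.concat([df["exp"], d], axis=1))
print(X_trap.shape[1], np.linalg.matrix_rank(X_trap.to_numpy()))  # 5 columns, rank only 4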

1.3.3. Regression on one quantitative variable and one qualitative variable with more than two classes

Suppose that, on the basis of the cross-sectional data, we want to regress the annual
expenditure on health care by an individual on the income and education of the
individual. Since the variable education is qualitative in nature, suppose we consider
three mutually exclusive levels of education: less than high school, high school, and college. Now, unlike the previous case, we have more than two categories of the qualitative variable education. Therefore, following the rule that the number of
dummies be one less than the number of categories of the variable, we should
introduce two dummies to take care of the three levels of education. Assuming that the
three educational groups have a common slope but different intercepts in the
regression of annual expenditure on health care on annual income, we can use the
following model:

Yi = α1 + α2D2i + α3D3i + βXi + ui   (1.6)

Where Yi = annual expenditure on health care
      Xi = annual income
      D2i = 1 if high school education
          = 0 otherwise
      D3i = 1 if college education
          = 0 otherwise

Note that in the preceding assignment of the dummy variables we are arbitrarily
treating the “less than high school education” category as the base category.

Therefore, the intercept α1 will reflect the intercept for this category. The differential intercepts α2 and α3 tell by how much the intercepts of the other two categories differ from the intercept of the base category, which can be readily checked as follows. Assuming E(ui) = 0, we obtain from (1.6):

E(Yi | D2 = 0, D3 = 0, Xi) = α1 + βXi
E(Yi | D2 = 1, D3 = 0, Xi) = (α1 + α2) + βXi
E(Yi | D2 = 0, D3 = 1, Xi) = (α1 + α3) + βXi
which are, respectively, the mean health care expenditure functions for the three levels of education, namely, less than high school, high school, and college. Geometrically, the situation is shown in Fig. 1.2 (for illustrative purposes it is assumed that α3 > α2).

1.3.4. Regression on one quantitative variable and two qualitative variables

The technique of dummy variable can be easily extended to handle more than one
qualitative variable. Let us revert to the college professors’ salary regression (1.3), but
now assume that in addition to years of teaching experience and sex, the skin color of
the teacher is also an important determinant of salary. For simplicity, assume that color
has two categories: black and white. We can now write (1.3) as:

Yi = α1 + α2D2i + α3D3i + βXi + ui   (1.7)

Where Yi = annual salary
      Xi = years of teaching experience
      D2i = 1 if male
          = 0 otherwise (i.e., female)
      D3i = 1 if white
          = 0 otherwise (i.e., black)

Notice that each of the two qualitative variables, sex and color, has two categories and hence needs one dummy variable for each. Note also that the omitted, or base, category now is "black female professor."

Assuming E(ui) = 0, we can obtain the following from (1.7):

Mean salary for black female professor: E(Yi | D2 = 0, D3 = 0, Xi) = α1 + βXi
Mean salary for black male professor: E(Yi | D2 = 1, D3 = 0, Xi) = (α1 + α2) + βXi
Mean salary for white female professor: E(Yi | D2 = 0, D3 = 1, Xi) = (α1 + α3) + βXi
Mean salary for white male professor: E(Yi | D2 = 1, D3 = 1, Xi) = (α1 + α2 + α3) + βXi

Once again, it is assumed that the preceding regressions differ only in the intercept

coefficient but not in the slope coefficient β .



An OLS estimation of (1.7) will enable us to test a variety of hypotheses. Thus, if α3 is statistically significant, it will mean that color does affect a professor's salary. Similarly, if α2 is statistically significant, it will mean that sex also affects a professor's salary. If
both these differential intercepts are statistically significant, it would mean sex as well
as color is an important determinant of professors’ salaries. From the preceding
discussion it follows that we can extend our model to include more than one
quantitative variable and more than two qualitative variables. The only precaution to be
taken is that the number of dummies for each qualitative variable should be one less
than the number of categories of that variable.

1.4. REGRESSION ON DUMMY DEPENDENT VARIABLE

1.4.1. INTRODUCTION

In all the regression models that we have considered so far, we have implicitly assumed
that the dependent variable Y is quantitative, whereas the explanatory variables are
either quantitative, qualitative (or dummy), or a mixture thereof. In fact, in the previous
sections, on dummy variables, we saw how the dummy explanatory variables are
introduced in a regression model and what role they play in specific situations. In this
section we consider several models in which the dependent variable itself is qualitative
in nature. Although increasingly used in various areas of social sciences and medical
research, qualitative response regression models pose interesting estimation and
interpretation challenges.

Suppose we want to study the labor-force participation of adult males as a function of the unemployment rate, average wage rate, family income, education, etc. A person either is in the labor force or not. Hence, the dependent variable, labor-force
participation, can take only two values: 1 if the person is in the labor force and 0 if he
or she is not. We can consider another example. A family may or may not own a house.
If it owns a house, it takes a value 1 and 0 if it does not. There are several such
examples where the dependent variable is dichotomous. A unique feature of all the
examples is that the dependent variable is of the type that elicits a yes or no response;
that is, it is dichotomous in nature. Now before we discuss the concept of qualitative
response models (QRM) involving dichotomous response variables, it is important to
note a fundamental difference between a regression model where the dependent
variable Y is quantitative and a model where it is qualitative. In a model where Y is
quantitative, our objective is to estimate its expected, or mean, value given the values
of the explanatory variables. In terms of Econometrics I, what we want is E(Yi | X1i, X2i,
. . . , Xki), where the X’s are explanatory variables, both quantitative and qualitative. In
models where Y is qualitative, our objective is to find the probability of something happening, such as owning a house or participating in the labour force. Hence, qualitative response regression models are often known as probability models.

1.4.2. QUALITATIVE RESPONSE MODELS

Now we start our discussion of qualitative response models (QRM). These are models
in which the dependent variable is a discrete outcome. These models are analyzed in a
general framework of probability models. There are two broad categories of QRM.

A. Binomial Models: where the choice is between two alternatives


Example 1.3.1:   Yi = β0 + β1Xi + Ui
Y = 1, if the individual attends college
  = 0, otherwise

Therefore, the dependent variable Y takes on only two values (i.e., 0 and 1).
Conventional regression methods cannot be used to analyze a qualitative dependent variable model.

B. Multinomial Models: The choice is between more than two alternatives


Example 1.3.2:   Yi = β0 + β1Xi + Ui

In this case, the dependent variable Y takes on more than two values. For instance,

Y = 1, if the individual attends college
Y = 2, if the individual attends only high school
Y = 3, otherwise

Since the discussion of multinomial QRMs is out of scope of this course, we start our
study of qualitative response models by considering the binary response regression
model. There are four approaches to estimating a probability model for a binary
response variable:

1. The Linear Probability Model (LPM)
2. The Logit Model
3. The Probit Model
4. The Tobit (Censored Regression) Model

1.4.2.1. THE LINEAR PROBABILITY MODEL (LPM)

The linear probability model is the regression model applied to a binary dependent
variable. To fix ideas, consider the following simple model:

Yi = β0 + β1Xi + Ui ……………………………(1)

Where Xi = family income and Ui = the disturbance term

Y = 1 if the family owns a house

= 0 if the family does not own a house


The independent variable Xi can be a discrete or a continuous variable. The model can be extended to include other additional explanatory variables. The above model expresses
the dichotomous Yi as a linear function of the explanatory variable Xi. Such models are called linear probability models (LPM) since E(Yi/Xi), the conditional expectation of Yi given Xi, can be interpreted as the conditional probability that the event will occur given Xi; that is, Pr(Yi = 1/Xi). Thus, in the preceding case, E(Yi/Xi) gives the probability of a family owning a house whose income is the given amount Xi. The justification of the name LPM can be seen as follows.

Assuming E(Ui) = 0, as usual (to obtain unbiased estimators), we obtain

E(Yi/Xi) = β0 + β1Xi …………………………………….(2)

Now, letting Pi = probability that Yi = 1 (that is, that the event occurs) and 1 – P i =
probability that Yi = 0 (that is, that the event does not occur), the variable Y i has the
following distributions:

Yi        Probability
0         1 − Pi
1         Pi
Total     1

Therefore, by the definition of mathematical expectation, we obtain

E(Yi) = 0 (1 – Pi) + 1(Pi) = Pi ……………………………………..(3)

Now, comparing (2) with (3), we can equate


E(Yi/Xi) = β0 + β1Xi = Pi ……………………………………(4)

i.e., the conditional expectation of model (1) can, in fact, be interpreted as the conditional probability of Yi. Since the probability Pi must lie between 0 and 1, we have the restriction 0 ≤ E(Yi/Xi) ≤ 1; i.e., the conditional expectation, or conditional probability, must lie between 0 and 1.

Example: The LPM estimated by OLS (on home ownership) is given as follows:

Ŷi = −0.9457 + 0.1021Xi
       (0.1228)   (0.0082)
t = (−7.6984)   (12.515)     R² = 0.8048

The above regression is interpreted as follows

 The intercept of –0.9457 gives the “probability” that a family with zero income
will own a house. Since this value is negative, and since probability cannot be
negative, we treat this value as zero. The slope value of 0.1021 means that for a
unit change in income, on the average the probability of owning a house
increases by 0.1021, or about 10 percent. This is so regardless of the level of income, which seems patently unrealistic. In reality one would expect Pi to be non-linearly related to Xi (see next section).
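The unrealistic implications are easy to see by simply plugging a few income values into the reported equation. The short Python sketch below does only that (it does not re-estimate anything); it shows that the fitted "probability" is negative at zero income and exceeds one at sufficiently high incomes, anticipating the problems discussed next.

# Evaluate the reported LPM, Y-hat = -0.9457 + 0.1021*X, at several income levels.
for x in [0, 5, 10, 15, 20, 25]:          # income in the units used in the example
    p_hat = -0.9457 + 0.1021 * x
    print(f"income = {x:>2}  fitted 'probability' = {p_hat:.4f}")
# The fitted values run from -0.9457 (below 0) up to about 1.61 (above 1).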

Problems with the LPM

From the preceding discussion it would seem that OLS can be easily extended to binary
dependent variable regression models. So, perhaps there is nothing new here.
Unfortunately, this is not the case, for the LPM poses several problems. That is, while
the interpretation of the parameters is unaffected by having a binary outcome, several
assumptions of the LPM are necessarily violated.
1. Heteroscedasticity


The variance of the disturbance terms depends on the X’s and is thus not constant. Let
us see this as follows. We have the following probability distributions for U.

Yi        Ui                    Probability
0         −β0 − β1Xi            1 − Pi
1         1 − β0 − β1Xi         Pi

Now, by definition, var(Ui) = E[Ui − E(Ui)]² = E(Ui²), since E(Ui) = 0 and E(UiUj) = 0 for all i ≠ j (no serial correlation) by assumption. Therefore, using the preceding probability distribution of Ui, we obtain

var(Ui) = E(Ui²) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)²(Pi)
        = (−β0 − β1Xi)²(1 − β0 − β1Xi) + (1 − β0 − β1Xi)²(β0 + β1Xi)
        = (β0 + β1Xi)(1 − β0 − β1Xi)

or var(Ui) = E(Yi/Xi)[1 − E(Yi/Xi)] = Pi(1 − Pi)

This shows that the variance of Ui is heteroscedastic because it depends on the conditional expectation of Y, which, of course, depends on the value taken by X. Thus the OLS estimator of β is inefficient and the standard errors are biased, resulting in incorrect tests.

2. Non-normality of Ui

Although OLS does not require the disturbances (Ui's) to be normally distributed, we assumed them to be so distributed for the purpose of statistical inference, that is, hypothesis testing, etc. But the assumption of normality for Ui is no longer tenable for the LPM because, like Yi, Ui takes on only two values:

Ui = Yi − β0 − β1Xi

Now when Yi = 1, Ui = 1 − β0 − β1Xi
and when Yi = 0, Ui = −β0 − β1Xi

Obviously, Ui cannot be assumed to be normally distributed. (Recall, however, that normality is not required for the OLS estimates to be unbiased.)

3. Non-fulfillment of 0 ≤ E(Yi/Xi) ≤ 1

The LPM can produce predicted values outside the admissible range of probabilities (0, 1); that is, it can predict values of Y that are negative or greater than 1. This is the real problem with the OLS estimation of the LPM.

4. Functional Form:

Since the model is linear, a unit increase in X results in a constant change of β1 in the
probability of an event, holding all other variables constant. The increase is the same
regardless of the current value of X. In many applications, this is unrealistic. When the
outcome is a probability, it is often substantively reasonable that the effects of
independent variables will have diminishing returns as the predicted probability
approaches 0 or 1.

Remark: Because of the above-mentioned problems, the LPM is not recommended for empirical work.


1.4.2.2. ALTERNATIVE MODEL TO LPM


As we have seen, the LPM is characterized by several problems such as non-normality


of ui, heteroscedasticity of ui and possibility of Yi lying outside the 0–1 range. But even
then the fundamental problem with the LPM is that it is not logically a very attractive
model because it assumes that Pi = E(Y =1| X) increases linearly with X, that is, the
marginal or incremental effect of X remains constant throughout. Thus, in our home
ownership example we found that as X increases by a unit (Birr 1000), the probability
of owning a house increases by the same constant amount of 0.10. This is so whether
the income level is Birr 8000, Birr 10,000, Birr 18,000, or Birr 22,000. This seems
patently unrealistic. In reality one would expect that Pi is nonlinearly related to Xi: At
very low income a family will not own a house but at a sufficiently high level of income,
say, X*, it most likely will own a house. Any increase in income beyond X* will have
little effect on the probability of owning a house. Thus, at both ends of the income
distribution, the probability of owning a house will be virtually unaffected by a small
increase in X.

Therefore, what we need is a (probability) model that has these two features: (1) As Xi
increases, Pi = E(Y = 1 | X) increases but never steps outside the 0–1 interval, and (2)
the relationship between Pi and Xi is nonlinear, that is, “one which approaches zero at
slower and slower rates as Xi gets small and approaches one at slower and slower rates
as Xi gets very large.’’ Geometrically, the model we want would look something like
Figure 1.1.

Fig 1.1: A cumulative distribution function (CDF)

The above S-shaped curve is very similar to the cumulative distribution function (CDF) of a random variable. (Note that the CDF of a random variable X is simply the probability that it takes a value less than or equal to X0, where X0 is some specified numerical value of X. In short, F(X), the CDF of X, is F(X0) = P(X ≤ X0).) Therefore, one can easily use the CDF to model regressions where the response variable is dichotomous, taking 0–1 values.

The CDFs commonly chosen to represent the 0–1 response models are:

1) the logistic – which gives rise to the logit model

2) the normal – which gives rise to the probit (or normit) model

1. THE LOGIT MODEL

Now let us see how one can estimate and interpret the logit model. Recall that the LPM
was (for home ownership).

Pi = E(Y = 1 | Xi) = β0 + β1Xi

Where X is income and Y = 1 means the family owns a house. Now consider the
following representation of home ownership.

Pi = E(Y = 1 | Xi) = 1 / (1 + e^(−(β0 + β1Xi)))

or, writing Zi = β0 + β1Xi,

Pi = 1 / (1 + e^(−Zi)) = e^(Zi) / (1 + e^(Zi))

This equation represents what is known as the (cumulative) logistic distribution function. As Zi ranges from −∞ to +∞, Pi ranges between 0 and 1, and Pi is nonlinearly
related to Zi (i.e., Xi), thus satisfying the two requirements considered earlier. However, since the above equation is nonlinear in both X and the β's, we cannot use the familiar OLS procedure to estimate the parameters. It can, however, be linearized as follows. If Pi is the probability of owning a house, then (1 − Pi), the probability of not owning a house, is

1 − Pi = 1 / (1 + e^(Zi))

Therefore,

Pi / (1 − Pi) = (1 + e^(Zi)) / (1 + e^(−Zi)) = e^(Zi)

Now Pi/(1 − Pi) is simply the odds ratio in favor of owning a house: the ratio of the probability that a family will own a house to the probability that it will not own a house. Taking the natural log of the odds ratio, we obtain

Li = ln(Pi / (1 − Pi)) = Zi = β0 + β1Xi

L (the log of the odds ratio) is linear in X as well as in the parameters. L is called the logit, and hence the name logit model.
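A small numerical sketch (Python with numpy; the coefficient values are made up) illustrates both points at once: whatever value the linear index Zi = β0 + β1Xi takes, the logistic function keeps Pi strictly between 0 and 1, and taking the log of the odds recovers the linear index exactly.

# The logistic function maps any Z into (0, 1); the log-odds maps back to Z.
import numpy as np

b0, b1 = -4.0, 0.5                       # hypothetical intercept and slope
X = np.array([-20.0, 0.0, 8.0, 20.0, 60.0])
Z = b0 + b1 * X                          # linear index: can be any real number
P = 1.0 / (1.0 + np.exp(-Z))             # logistic CDF: strictly between 0 and 1
L = np.log(P / (1.0 - P))                # the logit (log of the odds ratio)

print(P)                                 # probabilities, all inside (0, 1)
print(np.allclose(L, Z))                 # True: the logit is linear in X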

The Estimation and Interpretation of the Logit Model

Now for estimation purposes, let us write the logit model as

Li = ln(Pi / (1 − Pi)) = β0 + β1Xi + Ui

To estimate the above model we need values of Xi and Li. Recall that Pi = 1 if a family owns a house and Pi = 0 if it does not own a house. But if we put these values directly into the logit Li, we obtain:

Li = ln(1/0) if a family owns a house
Li = ln(0/1) if a family does not own a house

Obviously, these expressions are meaningless, so standard OLS cannot be applied to the individual data.

Therefore, estimation is carried out using the maximum likelihood (ML) method. Since the derivation of the parameters by the ML method is beyond the scope of this course, we will not discuss it here. The interpretation of the logit model is as follows:

β1 – The slope measures the change in L (the log-odds) for a unit change in X.

β0 – The intercept gives the value of the log-odds in favor of owning a house if income is zero. As with most intercepts, this interpretation may not have any physical meaning.

Example: Logit estimates. Assume that Y is linearly related to the variables Xi’s as
follows:

Yi = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + Ui

The logit estimate results are presented as:

Yi = -10.84 – 0.74X1 – 11.6X2 – 5.7X3 – 1.3X4 + 2.5X5

t = (-3.20) (-2.51) (-3.01) (-2.4) (-1.37) (1.62)

The variables X1, X2 and X3 are statistically significant at the 99 percent confidence level, and the variable X4 is significant at the 90 percent level. The estimated results show that the variables X1, X2 and X3 have a negative effect on the probability of the event occurring (i.e., Y = 1), while the variable X5 has a positive effect on the probability of the event occurring.

Note: Parameters of the model are not the same as the marginal effects we are used to
when analyzing OLS.
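In applied work the maximum likelihood estimation is done by standard software. The following minimal sketch (Python with numpy and statsmodels; the data are simulated and have nothing to do with the numbers quoted above) fits a binary logit by ML and prints the estimated coefficients and their asymptotic z-statistics.

# Minimal logit estimation by maximum likelihood on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
income = rng.uniform(0, 30, n)                    # hypothetical income variable
p_true = 1.0 / (1.0 + np.exp(-(-4.0 + 0.3 * income)))
owns_house = rng.binomial(1, p_true)              # 1 = owns a house, 0 = does not

X = sm.add_constant(income)
res = sm.Logit(owns_house, X).fit(disp=0)         # ML estimation of the logit model
print(res.params)                                 # estimated intercept and slope
print(res.tvalues)                                # asymptotic z-statistics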

1.5. THE PROBIT MODEL


The estimating model that emerges from the normal CDF is popularly known as the
probit model. Here the observed dependent variable Y takes on one of the values 0
and 1 using the following criteria.

Define a latent variable Yi* such that

Yi* = Xi'β + Ui

Yi = 1 if Yi* > 0
Yi = 0 if Yi* ≤ 0

The latent variable Y* is continuous (−∞ < Y* < ∞). It generates the observed binary variable Y.

An observed variable, Y can be observed in two states:

i. if an event occurs it takes a value of 1

ii. if an event does not occur it takes a value of 0

The latent variable Yi* is assumed to be a linear function of the observed X's through the structural model. Note that Yi* is a latent variable because it is unobserved.

Consider an example of a latent variable: let Y measure whether a person is employed or not; it is a binary variable taking values 0 and 1. Y* measures the willingness to participate in the labor market; this varies continuously and is unobserved. If X is the wage rate, then as X increases the willingness to participate in the labor market will increase (Y*, the willingness to participate, still cannot be observed). The individual's decision changes (becomes zero) if the wage rate is below a critical point.


To motivate the probit model, assume that in our home ownership example the decision of the family to own a house or not depends on an unobservable utility index Ii (also known as a latent variable) that is determined by one or more explanatory variables, say income Xi, in such a way that the larger the value of the index Ii, the greater the probability of a family owning a house. We express the index as

Ii = β0 + β1Xi

where Xi is the income of the family. Assume further that there is a critical or threshold level of the index, Ii*, such that if Ii exceeds Ii* the family will own a house, and otherwise it will not. Given the assumption of normality, the probability that Ii* is less than or equal to Ii can be computed from the standardized normal CDF as:

Pi = P(Y = 1 | X) = P(Ii* ≤ Ii) = P(Zi ≤ β0 + β1Xi) = F(β0 + β1Xi)

where Zi is a standard normal variable and F is the standard normal CDF.

But now consider the following representation of home ownership:

P(Y = 1 | Xi) = Φ(β0 + β1Xi)

where Φ(·) is the cumulative distribution function of the standard normal distribution.

Since Y* is continuous the model avoids the problems inherent in the LPM model (i.e.,
the problem of non-normality of the error term and heteroscedasticity). However, since
the latent dependent variable is unobserved the model cannot be estimated using OLS.
Maximum likelihood can be used instead.

Most often, the choice is between normal errors and logistic errors, resulting in the
probit (normit) and logit models, respectively. The coefficients derived from the
maximum likelihood (ML) function will be the coefficients for the probit model, if we
assume a normal distribution. If we assume that the appropriate distribution of the
error term is a logistic distribution, the coefficients that we get from the ML function will be the coefficients of the logit model. In both cases, as with the LPM, it is assumed that E(Ui | Xi) = 0. In the probit model it is assumed that Ui follows the standard normal distribution, while in the logit model Ui is assumed to follow the standard logistic distribution.
Hence the estimates of the parameters (β’s) from the two models are not directly
comparable. But as Amemiya suggests, a logit estimate of a parameter multiplied by
0.625 gives a fairly good approximation of the probit estimate of the same parameter.

Similarly, the coefficients of the LPM and logit models are related as follows:

β(LPM) ≈ 0.25 β(logit) for the slope coefficients, and
β(LPM) ≈ 0.25 β(logit) + 0.5 for the intercept.
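These rules of thumb are trivial to apply. The snippet below (Python) converts the intercept and one slope from the logit example above into approximate probit and LPM counterparts using the 0.625 and 0.25 factors just given.

# Rule-of-thumb conversions between logit, probit and LPM coefficients.
logit_coefs = {"const": -10.84, "x1": -0.74}       # taken from the logit example above

probit_approx = {k: 0.625 * v for k, v in logit_coefs.items()}     # probit ~ 0.625 * logit
lpm_approx = {k: 0.25 * v + (0.5 if k == "const" else 0.0)         # LPM ~ 0.25 * logit (+0.5 for intercept)
              for k, v in logit_coefs.items()}

print(probit_approx)
print(lpm_approx)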

Marginal Effects of Logit and Probit Model Estimates

In the linear regression model, as you have discussed in Econometrics I, the slope
coefficient measures the change in the average value of the regressand for a unit
change in the value of a regressor, with all other variables held constant.

In the LPM, the slope coefficient measures directly the change in the probability of an
event occurring as the result of a unit change in the value of a regressor, with the
effect of all other variables held constant. In the case of linear models, the marginal effect is given by ∂E(Y | X)/∂Xk = βk.

However, the coefficients in logit and probit models are not easy to interpret directly. Hence, we have to derive marginal effects to compare results from different models. That is, we need the derivative of the probability that Y equals 1 with respect to the kth element of X. Algebraically,

∂Pr(Y = 1 | X)/∂Xk = ∂F(X'β)/∂Xk = f(X'β) βk

Hence, in the probit and logit models, respectively,

∂Pi/∂Xk = φ(Xi'β) βk   and   ∂Pi/∂Xk = Λ(Xi'β)[1 − Λ(Xi'β)] βk = Pi(1 − Pi) βk
where φ(·) denotes the standard normal density function and Λ(·) the CDF of the standard logistic distribution. The marginal effects depend upon the values of the X's. The sign of a marginal effect corresponds to the sign of its coefficient and indicates the direction of the effect of the explanatory variable X on the dependent variable Y.

Therefore, in the logit model the slope coefficient of a variable gives the change in the
log of the odds associated with a unit change in that variable, again holding all other
variables constant. But as noted previously, for the logit model the rate of change in the
probability of an event happening is given by βk Pi(1 − Pi), where βk is the (partial regression) coefficient of the kth regressor. But in evaluating Pi, all the variables
included in the analysis are involved.

In the probit model, as we saw earlier, the rate of change in the probability is given by βk φ(Xi'β), where φ(·) is the density function of the standard normal variable. Thus, in both the
logit and probit models all the regressors are involved in computing the changes in
probability, whereas in the LPM only the kth regressor is involved.
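The sketch below (Python with numpy, scipy and statsmodels, on simulated data; none of the numbers relate to the module's examples) computes these marginal effects for a fitted logit and probit, first directly from the formulas above at the sample means of the regressors and then with statsmodels' built-in get_margeff() method.

# Marginal effects in logit and probit models (simulated data).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(0.0, 1.0, n)
y = (0.5 + 1.2 * x + rng.logistic(0.0, 1.0, n) > 0).astype(int)   # binary outcome

X = sm.add_constant(x)
logit_res = sm.Logit(y, X).fit(disp=0)
probit_res = sm.Probit(y, X).fit(disp=0)

# Marginal effect of x at the sample means, from the formulas in the text.
xbar = X.mean(axis=0)
p_bar = 1.0 / (1.0 + np.exp(-(xbar @ logit_res.params)))
print(p_bar * (1.0 - p_bar) * logit_res.params[1])                # logit: P(1-P)*beta_k
print(norm.pdf(xbar @ probit_res.params) * probit_res.params[1])  # probit: phi(x'b)*beta_k

# The same quantities via the built-in marginal effects at the means.
print(logit_res.get_margeff(at="mean").margeff)
print(probit_res.get_margeff(at="mean").margeff)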

Review Questions

Answer the following questions briefly.

1. Explain the role of qualitative explanatory variables in regression analysis.

2. What are the common and major features of the dummy variable regression
model? State them.

3. Suppose that a researcher wants to regress the annual salaries of economics


graduates on the number of years of experience and the education level of the graduates (here we have three levels of education, namely BA, MSc and PhD).

A. How many dummy variables will be included in the model?


B. Specify the model considering BA as the base category.

C. Find the mean values of the annual salaries corresponding to different


values of the regressors.

4. What is a Linear Probability Model (LPM)? What are the shortcomings of this


model?

5. Explain or outline the similarities and differences between the probit and logit
models.

6. Specify the mathematical form of the logit and probit models.

7. Can we use the standard OLS method to estimate the probit and logit models?
Why?

8. From data for 54 standard metropolitan statistical areas (SMSA), Demaris


estimated the following logit model to explain high murder rate versus low
murder rate:

ln Oi = 1.1387 + 0.0014Pi + 0.0561Ci – 0.4050Ri

se = (0.0009) (0.0227) (0.1568)

where O = the odds of a high murder rate,

P =1980 population size in thousands,

C = population growth rate from 1970 to 1980,

R = reading quotient, and

the se are the asymptotic standard errors.



A. How would you interpret the various coefficients?


B. Which of the coefficients are individually statistically significant?

C. What is the effect of a unit increase in the reading quotient on the odds of
having a higher murder rate?

D. What is the effect of a percentage point increase in the population growth


rate on the odds of having a higher murder rate?



Chapter Two

2. Introduction to Basic Regression Analysis with Time Series Data

Topic Learning objectives

The aim of this chapter is to extend the discussion of regression analysis by incorporating a brief discussion of time series econometrics.

After the student has completed this chapter, he/she will

 Understand the nature of time series data

 Understand the concept of stationarity.

 Distinguish between stationary stochastic and non- stationary stochastic process

 Distinguish between Trend Stationary (TS) and Difference Stationary (DS) Stochastic Processes

 Define the concept of ‘Spurious Regression’

 Conduct a unit root test


Topic Outline

1. Introduction


2. The nature of Time Series Data

3. Stationary and non-stationary stochastic Processes

4. Trend Stationary and Difference Stationary Stochastic Processes

5. Integrated Stochastic Process

6. Tests of Stationarity: The Unit Root Test

7. Review questions

2.1 Introduction

Time series data is one of the three important types of data used in empirical analysis, the others being cross-sectional and pooled/panel data. Time-
series data have become so frequently and intensively used in empirical research
that econometricians have recently begun to pay very careful attention to such
data. In this chapter, we first give an overview about the nature of time-series data
and then we will see the concept of stochastic process (both stationary and non-
stationary), integrated stochastic processes, unit root test and transforming non-
stationary time-series.

2.2 Nature of Time Series Data

Recall that one of the important types of data used in empirical analysis is time series
data.

A time series data set consists of observations on a variable or several variables over
time. These are data collected over time, for instance weekly, monthly, quarterly, semiannually, or annually. Examples of time series data include stock
prices, money supply, consumer price index, gross domestic product, and automobile
sales figures. Because past events can influence future events and lags in behavior are
prevalent in the social sciences, time is an important dimension in time series data set.
Unlike the arrangement of cross-sectional data, the chronological ordering of
observations in a time series conveys potentially important information. In this and the
following sections we take a closer look at such data not only because of the frequency
with which they are used in practice but also because they pose several challenges to
econometricians and practitioners.

2.3. Stochastic Processes

From a theoretical point of view, a time series is a collection of random variables (Xt).
Such a collection of random variables ordered in time is called a stochastic process.
Loosely speaking, a random or stochastic process is a collection of random variables
ordered in time. The word stochastic has a Greek origin and means "pertaining to
chance." If we let Y denote a random variable, and if it is continuous (in time), we denote it as Y(t); if it is discrete, we denote it as Yt. We distinguish two types of stochastic processes: stationary and non-stationary stochastic processes.

2.3.1. Stationary Stochastic Processes

A type of stochastic process that has received a great deal of attention by time series
analysts is the so-called stationary stochastic process. A stochastic process is said to be
stationary if its mean and variance are constant over time and the value of the
covariance between the two time periods depends only on the distance or gap or lag
between the two time periods and not the actual time at which the covariance is
computed. In the time series literature, such a stochastic process is known as a weakly stationary, or covariance stationary, or second-order stationary, or wide-sense stationary stochastic process. To explain stationarity, let Yt be a stochastic time series with these
properties:


Mean:        E(Yt) = μ                                   (2.1)
Variance:    var(Yt) = E(Yt − μ)² = σ²                   (2.2)
Covariance:  γk = E[(Yt − μ)(Yt+k − μ)]                  (2.3)

where γk, the covariance (or autocovariance) at lag k, is the covariance between the values of Yt and Yt+k, that is, between two Y values k periods apart. If k = 0, we obtain γ0, which is simply the variance of Y (= σ²); if k = 1, γ1 is the covariance between two adjacent values of Y (recall the first-order autoregressive scheme).

Suppose we shift the origin of Y from Yt to Yt+m (for instance, from the first quarter of 1970 to the first quarter of 1975, say, for GDP data). Now, if Yt is to be stationary, the mean, variance, and autocovariances of Yt+m must be the same as those of Yt. In short, if a time
series is stationary, its mean, variance, and autocovariance (at various lags) remain the
same no matter at what point we measure them; that is, they are time invariant. Such
a time series will tend to return to its mean (called mean reversion) and fluctuations
around this mean (measured by its variance) will remain constant.
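As a concrete check of this definition, the sketch below (Python with numpy; the series is simulated, not real data) generates a stationary AR(1) process with coefficient 0.5 and compares the sample mean, variance and lag-1 autocovariance over the first and second halves of the sample; for a stationary series these moments are roughly the same in both halves.

# Approximate time-invariance of the moments of a stationary AR(1): Y_t = 0.5*Y_{t-1} + u_t.
import numpy as np

rng = np.random.default_rng(42)
T = 4000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + rng.normal()

def moments(series, k=1):
    s = series - series.mean()
    gamma_k = np.mean(s[:-k] * s[k:])        # sample autocovariance at lag k
    return series.mean(), series.var(), gamma_k

print(moments(y[: T // 2]))    # mean, variance, gamma_1 in the first half
print(moments(y[T // 2:]))     # roughly the same values in the second half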

Why are stationary time series so important? Because if a time series is non-stationary,
we can study its behavior only for the time period under consideration. Each set of time
series data will therefore be for a particular time period. As a consequence, it is not
possible to generalize it to other time periods. Therefore, for the purpose of forecasting,
such (non-stationary) time series may be of little practical value. We call a stochastic process purely random (white noise) if it has zero mean, constant variance σ², and is serially uncorrelated. Loosely speaking, the error term ut is assumed to be a white noise process if ut ~ IIDN(0, σ²); that is, ut is independently and identically distributed as a normal distribution with zero mean and constant variance.


2.3.2. Nonstationary Stochastic Processes

Although our interest is in stationary time series, one often encounters non-stationary
time series, the classic example being the random walk model (RWM). That is, a
non-stationary time series will have a time varying mean or a time-varying variance or
both. It is often said that asset prices, such as stock prices or exchange rates, follow a
random walk; that is, they are non-stationary. We distinguish two types of random
walks: (1) random walk without drift (i.e., no constant or intercept term) and (2)
random walk with drift (i.e., a constant term is present).

A) Random Walk without Drift. Suppose ut is a white noise error term with mean 0 and variance σ². Then the series Yt is said to be a random walk if

Yt = Yt−1 + ut                                           (2.4)

In the random walk model, as (2.4) shows, the value of Y at time t is equal to its value
at time (t − 1) plus a random shock; thus it is an AR(1) model. We can think of (2.4) as
a regression of Y at time t on its value lagged one period. Believers in the efficient
capital market hypothesis argue that stock prices are essentially random and
therefore there is no scope for profitable speculation in the stock market: If one could
predict tomorrow’s price on the basis of today’s price, we would all be millionaires.

Now from (2.4) we can write

Y1 = Y0 + u1
Y2 = Y1 + u2 = Y0 + u1 + u2
Y3 = Y2 + u3 = Y0 + u1 + u2 + u3

In general, if the process started at some time 0 with a value of Y0, we have

Yt = Y0 + Σut                                            (2.5)


Therefore,

E(Yt) = E(Y0 + Σut) = Y0 (why?)                          (2.6)

In like fashion, it can be shown that

var(Yt) = tσ²                                            (2.7)

As the preceding expression shows, the mean of Y is equal to its initial, or starting,
value, which is constant, but as t increases, its variance increases indefinitely, thus
violating a condition of stationarity. In short, the RWM without drift is a non-stationary
stochastic process. In practice, Y0 is often set at zero, in which case E(Yt) = 0.

An interesting feature of RWM is the persistence of random shocks (i.e., random


errors), which is clear from (2.5): Yt is the sum of the initial Y0 plus the sum of random shocks. As a result, the impact of a particular shock does not die away. For example, if u2 = 2 rather than u2 = 0, then all Yt's from Y2 onward will be 2 units higher, so the effect of this shock never dies out. As Kerry Patterson notes, the random walk remembers the shock forever; that is, it has infinite memory.

Interestingly, if you write (2.4) as

Yt − Yt−1 = ΔYt = ut                                     (2.8)

where Δ is the first difference operator. It is easy to show that, while Yt is non-stationary, its first difference is stationary. In other words, the first differences of a random walk time series are stationary.
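The contrast between the level of a random walk and its first difference can be seen with a short simulation (Python with numpy; purely artificial data). Across many simulated paths the variance of Yt grows roughly in proportion to t, as in (2.7), while the variance of the first difference stays constant at about σ².

# Random walk without drift: the level is nonstationary, the first difference is not.
import numpy as np

rng = np.random.default_rng(7)
n_paths, T = 2000, 400
u = rng.normal(0.0, 1.0, size=(n_paths, T))   # white noise shocks, sigma^2 = 1
Y = u.cumsum(axis=1)                          # Y_t = u_1 + ... + u_t  (Y_0 = 0)

# Variance of Y_t across paths grows with t (about t * sigma^2) ...
print(Y[:, 49].var(), Y[:, 199].var(), Y[:, 399].var())

# ... while the first differences have a roughly constant variance (about sigma^2).
dY = np.diff(Y, axis=1)
print(dY[:, 49].var(), dY[:, 199].var(), dY[:, 398].var())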

B) Random Walk with Drift. Let us modify (2.4) as follows:


Yt = δ + Yt−1 + ut                                       (2.9)

where δ is known as the drift parameter. The name drift comes from the fact that if
we write the preceding equation as

Yt − Yt−1 = ΔYt = δ + ut                                 (2.10)

it shows that Yt drifts (moves) upward or downward, depending on δ being positive or


negative. Note that model (2.9) is also an AR(1) model. Following the procedure
discussed for random walk without drift, it can be shown that for the random walk with
drift model (2.9),

E(Yt) = Y0 + δt                                          (2.11)
var(Yt) = tσ²                                            (2.12)

Hence, for RWM with drift the mean as well as the variance increases over time, again
violating the conditions of stationarity. In short, RWM, with or without drift, is a non-
stationary stochastic process.

2.4. Unit Root Stochastic Process

Let us write the RWM (2.4) as:

Yt = ρYt−1 + ut,   −1 ≤ ρ ≤ 1                            (2.13)

This model resembles the Markov first-order autoregressive model that we discussed in
the case of autocorrelation. If ρ = 1, (2.13) becomes a RWM (without drift). If ρ is in
fact 1, we face what is known as the unit root problem, that is, a situation of non-
stationarity; we already know that in this case the variance of Yt is not stationary. The
name unit root is due to the fact that ρ = 1 (see Note 1 at the end of this section). Thus the terms non-stationarity, random
walk, and unit root can be treated as synonymous.

If, however, |ρ| < 1, that is, if the absolute value of ρ is less than one, then it can be shown that the time series Yt is stationary in the sense we have defined it (see Note 2 at the end of this section). In practice,
then, it is important to find out if a time series possesses a unit root. In the next section we will discuss tests of unit root, that is, tests of stationarity.

Note 1 (a technical point): If ρ = 1, we can write (2.13) as Yt − Yt−1 = ut. Now, using the lag operator L, so that LYt = Yt−1, L²Yt = Yt−2, and so on, we can write (2.13) as (1 − L)Yt = ut. The term unit root refers to the root of the polynomial in the lag operator: if you set (1 − L) = 0, you obtain L = 1, hence the name unit root.

Note 2: If in (2.13) it is assumed that the initial value of Y (= Y0) is zero, |ρ| < 1, and ut is white noise and distributed normally with zero mean and unit variance, then it follows that E(Yt) = 0 and var(Yt) = 1/(1 − ρ²). Since both of these are constants, by the definition of stationarity, Yt is stationary. On the other hand, as we saw before, if ρ = 1, Yt is a random walk or nonstationary.

2.5. Trend Stationary (TS) and Difference Stationary (DS) Stochastic Processes

The distinction between stationary and non-stationary stochastic processes (or time
series) has a crucial bearing on whether the trend is deterministic or stochastic.
Broadly speaking, if the trend in a time series is completely predictable and not
variable, we call it a deterministic trend, whereas if it is not predictable, we call it a
stochastic trend. To make the definition more formal, consider the following model of
the time series Yt:

Yt = β1 + β2t + β3Yt−1 + ut                              (2.14)

where ut is a white noise error term and t is time measured chronologically. Now
we have the following possibilities:

Pure random walk: If in (2.14) β1 = 0, β2 = 0, β3 = 1, we get

Yt = Yt−1 + ut                                           (2.15)

which is nothing but a RWM without drift and is therefore nonstationary. But note that,
if we write (2.15) as

ΔYt = (Yt − Yt−1) = ut                                   (2.8)

it becomes stationary, as noted before. Hence, a RWM without drift is a difference stationary process (DSP).

Random walk with drift: If in (2.14) β1 ≠ 0, β2 = 0, β3 = 1, we get

Yt = β1 + Yt−1 + ut                                      (2.16a)

which is a random walk with drift and is therefore non-stationary. If we write it as

ΔYt = Yt − Yt−1 = β1 + ut                                (2.16b)

this means Yt will exhibit a positive (β1 > 0) or negative (β1 < 0) trend. Such a trend is called a stochastic trend. Equation (2.16b) is a DSP process because the non-stationarity in Yt can be eliminated by taking first differences of the time series.

Deterministic trend: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 = 0, we obtain

Yt = β1 + β2t + ut                                       (2.17)

which is called a trend stationary process (TSP). Although the mean of Yt is β1 + β2t, which is not constant, its variance (= σ²) is. Once the values of β1 and β2 are known, the mean can be forecast perfectly. Therefore, if we subtract the mean of Yt from Yt, the resulting series will be stationary, hence the name trend stationary. This procedure of removing the (deterministic) trend is called detrending.
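Detrending is therefore nothing more than regressing the series on time and keeping the residuals. A minimal sketch (Python with numpy and statsmodels; the trend stationary series is simulated) is given below.

# Detrending a trend stationary series: regress Y_t on t and keep the residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 300
t = np.arange(T)
y = 2.0 + 0.05 * t + rng.normal(0.0, 1.0, T)   # Y_t = b1 + b2*t + u_t (a TSP)

trend_fit = sm.OLS(y, sm.add_constant(t)).fit()
detrended = y - trend_fit.fittedvalues          # the detrended (stationary) series

print(trend_fit.params)                         # estimates of b1 and b2
print(detrended.mean(), detrended.var())        # mean near 0, variance near sigma^2 = 1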


Random walk with drift and deterministic trend: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 = 1, we obtain

Yt = β1 + β2t + Yt−1 + ut                                (2.18a)

which is a random walk with drift and a deterministic trend. This can be seen if we write the equation as

ΔYt = β1 + β2t + ut                                      (2.18b)

Deterministic trend with stationary AR(1) component: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 < 1, then we get

Yt = β1 + β2t + β3Yt−1 + ut                              (2.19)

which is stationary around the deterministic trend.

2.6. Integrated Stochastic Processes

The random walk model is a specific case of a more general class of stochastic processes known as integrated processes. Recall that the RWM without drift is non-stationary, but its first difference, as shown in (2.8), is stationary. Therefore, we call the RWM without drift integrated of order 1, denoted as Yt ~ I(1). Similarly, if a time series has to be differenced twice (i.e., take the first difference of the first differences) to make it stationary, we call such a time series integrated of order 2. In general, if a (nonstationary) time series has to be differenced d times to make it stationary, that time series is said to be integrated of order d, denoted as Yt ~ I(d). If a time series is stationary to begin with (i.e., it does not require any differencing), it is said to be integrated of order zero, denoted by Yt ~ I(0). Thus, we will use
the terms "stationary time series" and "time series integrated of order zero" to mean the same thing. Most economic time series are generally I(1); that is, they generally
become stationary only after taking their first differences.

Properties of Integrated Series

The following properties of integrated time series may be noted: Let Xt , Yt ,and Zt be
three time series.

1. If Xt ~ I(0) and Yt ~ I(1), then Zt = (Xt + Yt) ~ I(1); that is, a linear combination or sum of stationary and non-stationary
time series is non-stationary.

2. If Xt ~ I(d), then Zt = (a + bXt) ~ I(d), where a and b are constants. That is, a linear combination of an I(d) series is
also I(d). Thus, if Xt ~ I(0), then Zt = (a + bXt) ~ I(0).

3. If Xt ~ I(d1) and Yt ~ I(d2), then Zt = (aXt + bYt) ~ I(d2), where d1 < d2.

4. If Xt ~ I(d) and Yt ~ I(d), then Zt = (aXt + bYt) ~ I(d*); d* is generally equal to d, but in some cases d* < d. As you
can see from the preceding statements, one has to pay careful attention in combining
two or more time series that are integrated of different order.
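In practice the order of integration of a series is found by differencing it until it passes a stationarity test. The helper below is an illustrative sketch only: it borrows the ADF test of Section 2.8 as implemented in the statsmodels package, which is an assumption of this example rather than something the module prescribes.

import numpy as np
from statsmodels.tsa.stattools import adfuller

def order_of_integration(y, max_d=3, alpha=0.05):
    """Return the smallest d such that the d-th difference appears stationary."""
    x = np.asarray(y, dtype=float)
    for d in range(max_d + 1):
        pvalue = adfuller(x, regression="c")[1]   # ADF test with drift
        if pvalue < alpha:
            return d
        x = np.diff(x)                            # difference once more
    return max_d

rng = np.random.default_rng(1)
rw = np.cumsum(rng.standard_normal(400))          # an I(1) series by construction
print(order_of_integration(rw))                   # typically prints 1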

2.7. Spurious Regression

To see why stationary time series are so important, consider the following two random
walk models:

Yt = Yt−1 + ut    (2.20)

Xt = Xt−1 + vt    (2.21)

where we generated 500 observations of ut from N(0, 1) and 500 observations of vt from
N(0, 1) and assumed that the initial values of both Y and X were zero. We also assumed
that ut and vt are serially uncorrelated as well as mutually uncorrelated. As you know by


now, both these time series are nonstationary; that is, they are I(1) or exhibit
stochastic trends.

Suppose we regress Yt on Xt. Since Yt and Xt are uncorrelated I(1) processes, the R² from the
regression of Y on X should tend to zero; that is, there should not be any relationship
between the two variables. Consider the regression results:

_________________________________________________________________
Variable        Coefficient     Std. error      t statistic
-----------------------------------------------------------------
C               -13.2556        0.6203          -21.36856
X                 0.3376        0.0443            7.61223
-----------------------------------------------------------------
R² = 0.1044     d = 0.0121
_________________________________________________________________

As you can see, the coefficient of X is highly statistically significant, and, although the
R2 value is low, it is statistically significantly different from zero. From these results, you
may be tempted to conclude that there is a significant statistical relationship between Y
and X, whereas a priori there should be none. This is in a nutshell the phenomenon of
spurious or nonsense regression, first discovered by Yule. Yule showed that
(spurious) correlation could persist in non-stationary time series even if the sample is
very large.

The regression result is characterized by the extremely low Durbin–Watson d value,
which suggests very strong first-order autocorrelation. According to Granger and
Newbold, R² > d is a good rule of thumb to suspect that the estimated regression is
spurious. That the regression results presented above are meaningless can be easily
seen from regressing the first differences of Yt (= ΔYt) on the first differences of Xt (= ΔXt);


remember that although Yt and Xt are non-stationary, their first differences are stationary. In
such a regression you will find that R² is practically zero, as it should be, and the
Durbin–Watson d is about 2.
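A minimal sketch of this experiment is given below (illustrative only, not from the text): two independent random walks are generated, regressed on each other in levels and then in first differences, and the R² and Durbin–Watson statistics are compared. The statsmodels package is assumed for the OLS fits.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(42)
n = 500
Y = np.cumsum(rng.standard_normal(n))        # (2.20): a random walk
X = np.cumsum(rng.standard_normal(n))        # (2.21): an independent random walk

levels = sm.OLS(Y, sm.add_constant(X)).fit()             # spurious regression in levels
diffs = sm.OLS(np.diff(Y), sm.add_constant(np.diff(X))).fit()   # regression in first differences

print("levels: R2 =", round(levels.rsquared, 3), " DW =", round(durbin_watson(levels.resid), 3))
print("diffs:  R2 =", round(diffs.rsquared, 3), " DW =", round(durbin_watson(diffs.resid), 3))

You should see a non-trivial R² with a very low Durbin–Watson d in levels, and an R² near zero with d near 2 in first differences.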

2.8. Tests of Stationarity

By now you have a good idea about the nature of stationary stochastic processes and
their importance. In practice we face two important questions: (1) how do we find out
if a given time series is stationary? (2) If we find that a given time series is not
stationary, is there a way that it can be made stationary?

2.8.1. The Unit Root Test

A test of stationarity (or non-stationarity) that has become widely popular over the past
several years is the unit root test. We will first explain it, and then illustrate it. To
start with, we consider equation (2.13).

Yt = ρYt−1 + ut,    −1 ≤ ρ ≤ 1    (2.13)

where ut is a white noise error term.

We know that if ρ = 1, that is, in the case of the unit root, (2.13) becomes a random
walk model without drift, which we know is a non-stationary stochastic process.
Therefore, to test whether the series Yt is stationary or not, we regress Yt on its (one period)
lagged value Yt−1 and find out if the estimated ρ is statistically equal to 1. If it is, then Yt is
non-stationary. This is the general idea behind the unit root test of stationarity. For
theoretical reasons, we manipulate (2.13) as follows: subtract Yt−1 from both sides of
(2.13) to obtain:


Yt − Yt−1 = ρYt−1 − Yt−1 + ut = (ρ − 1)Yt−1 + ut    (2.22)

which can be alternatively written as:

ΔYt = δYt−1 + ut    (2.23)

where δ = (ρ − 1) and Δ, as usual, is the first-difference operator.

In practice, therefore, instead of estimating (2.13), we estimate (2.23) and test the
(null) hypothesis that δ = 0. If δ = 0, then ρ = 1, that is, we have a unit root, meaning
the time series under consideration is non-stationary. Before we proceed to estimate
(2.23), it may be noted that if δ = 0, (2.23) will become

ΔYt = (Yt − Yt−1) = ut    (2.24)

Since ut is a white noise error term, it is stationary, which means that the first differences
of a random walk time series are stationary, a point we have already made before.

Now let us turn to the estimation of (2.23). We take the first differences of Yt and regress
them on Yt−1 and see if the estimated slope coefficient in this regression (δ̂) is zero or not.
If it is zero, we conclude that Yt is non-stationary. But if it is negative, we conclude that Yt is
stationary. The only question is which test we use to find out if the estimated
coefficient of Yt−1 in (2.23) is zero or not. You might be tempted to say, why not use the
usual t test? Unfortunately, under the null hypothesis that δ = 0 (i.e., ρ = 1), the t
value of the estimated coefficient of Yt−1 does not follow the t distribution even in large
samples; that is, it does not have an asymptotic normal distribution. This suggests that
the t test is not applicable; instead we use the widely used Dickey–Fuller test.


Dickey and Fuller have shown that under the null hypothesis that δ = 0, the estimated t
value of the coefficient of Yt−1 in (2.23) follows the τ (tau) statistic. A sample of these
critical values is given in the DF table. In the literature the tau statistic or test is known as
the Dickey–Fuller (DF) test, in honor of its discoverers.

The actual procedure of implementing the DF test involves several decisions. In
discussing the nature of the unit root process in sections 2.3 and 2.4, we noted that a
random walk process may have no drift, or it may have drift, or it may have both
deterministic and stochastic trends. To allow for the various possibilities, the DF test is
estimated in three different forms, that is, under three different null hypotheses.

Yt is a random walk:  ΔYt = δYt−1 + ut    (2.23)

Yt is a random walk with drift:  ΔYt = β1 + δYt−1 + ut    (2.25)

Yt is a random walk with drift
around a deterministic trend:  ΔYt = β1 + β2t + δYt−1 + ut    (2.26)

where t is the time or trend variable. In each case, the null hypothesis is that δ = 0;
that is, there is a unit root and the time series is non-stationary. The alternative hypothesis
is that δ is less than zero; that is, the time series is stationary. If the null hypothesis is
rejected, it means that Yt is a stationary time series with zero mean in the case of (2.23),
that Yt is stationary with a non-zero mean [= β1/(1 − ρ)] in the case of (2.25), and that Yt is
stationary around a deterministic trend in (2.26).

It is extremely important to note that the critical values of the tau test for the
hypothesis that δ = 0 are different for each of the preceding three specifications of the
DF test, as can be seen from the tau table. Moreover, if, say, specification (2.25) is correct
but we estimate (2.23), we will be committing a specification error, whose
consequences we already know from Chapter 4 of Econometrics I.


The actual estimation procedure is as follows: estimate (2.23), (2.25) or (2.26) by
OLS; divide the estimated coefficient of Yt−1 in each case by its standard error to compute
the tau (τ) statistic; and refer to the DF tables (or any statistical package). If the
computed absolute value of the tau statistic (|τ|) exceeds the DF or MacKinnon critical
tau value, we reject the hypothesis that δ = 0, in which case the time series is
stationary. On the other hand, if the computed |τ| does not exceed the critical tau
value, we do not reject the null hypothesis, in which case the time series is non-stationary. Make sure that you use the appropriate critical τ values.
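The sketch below illustrates this procedure on an artificial random walk. It relies on the adfuller function from the statsmodels package (an assumption of this illustration, not a tool required by the module); with maxlag = 0 the test reduces to the simple DF regressions above, and the regression options "n", "c" and "ct" correspond roughly to specifications (2.23), (2.25) and (2.26). Option names can differ slightly across statsmodels versions.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
y = np.cumsum(rng.standard_normal(300))      # a random walk, so the unit-root null is true

for spec in ("n", "c", "ct"):                # no drift, drift, drift plus trend
    res = adfuller(y, regression=spec, autolag=None, maxlag=0)
    tau, pvalue, crit = res[0], res[1], res[4]
    print(spec, "tau =", round(tau, 3),
          "5% critical =", round(crit["5%"], 3),
          "p-value =", round(pvalue, 3))

Because the series really has a unit root, |τ| should normally fall short of the critical value and the null hypothesis should not be rejected.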

Example: Applying the DF regression with drift, (2.25), to the U.S. GDP series, the estimated
regression (with ΔGDPt as the dependent variable) is:

ΔGDPt = 28.2054 − 0.00136 GDPt−1    (2.27)

t = (1.1576) (−0.2191)   R² = 0.00056   d = 1.35

Our primary interest here is in the t (= τ) value of the GDPt−1 coefficient. The critical 1, 5, and
10 percent τ values are −3.5064, −2.8947, and −2.5842. The estimated δ coefficient is
negative, implying that the estimated ρ is less than 1. The estimated τ value is
−0.2191, which in absolute value is below even the 10 percent critical value of
−2.5842. Since, in absolute terms, the former is smaller than the latter, our conclusion
is that the GDP time series is not stationary.

2.8.2. Transforming Non-Stationary Time Series

Now that we know the problems associated with non-stationary time series, the
practical question is what to do. To avoid the spurious regression problem that may
arise from regressing a non-stationary time series on one or more non-stationary time
series, we have to transform non-stationary time series to make them stationary. The


transformation method depends on whether the time series are difference stationary
(DSP) or trend stationary (TSP). We consider each of these methods in turn.

Difference-Stationary Processes

If a time series has a unit root, the first differences of such a time series are stationary.
Therefore, the solution here is to take the first differences of the time series. Returning
to our U.S. GDP time series, we have already seen that it has a unit root. Let us now
see what happens if we take the first differences of the GDP series.

For convenience, let Dt = ΔGDPt = (GDPt − GDPt−1). Applying the DF regression to this first-differenced series, we obtain:

ΔDt = 16.0049 − 0.06827 Dt−1    (2.28)

t = (3.6402) (−6.6303)

R² = 0.3435   d = 2.0344

The 1 percent critical DF τ value is −3.5073. Since the computed τ (= t) is more
negative than the critical value, we conclude that the first-differenced GDP is stationary;
that is, it is I(0).

Trend-Stationary Process

The simplest way to make a trend-stationary time series stationary is to regress
it on time; the residuals from this regression will then be stationary. In other words,
run the following regression:

Yt = β1 + β2t + ut    (2.29)


where Yt is the time series under study and t is the trend variable measured
chronologically. Now

ût = (Yt − β̂1 − β̂2t)    (2.30)

will be stationary. ût is known as a (linearly) detrended time series.
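A short sketch of this detrending step is given below (illustrative only; the trend-stationary series is simulated and the statsmodels package is assumed for the OLS fit).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
t = np.arange(1, 201)
y = 2.0 + 0.3 * t + rng.standard_normal(200)       # a trend-stationary series, (2.17)

trend_fit = sm.OLS(y, sm.add_constant(t)).fit()     # estimate Yt = b1 + b2*t + ut, (2.29)
detrended = trend_fit.resid                         # u_hat_t = Yt - b1_hat - b2_hat*t, (2.30)
print(round(float(detrended.mean()), 3), round(float(detrended.std()), 3))

The residual series has (by construction) mean zero and no trend, which is exactly what "linearly detrended" means.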

It should be pointed out that if a time series is DSP but we treat it as TSP, this is called
underdifferencing. On the other hand, if a time series is TSP but we treat it as DSP,
this is called overdifferencing. The consequences of these types of specification
errors can be serious, depending on how one handles the serial correlation properties of
the resulting error terms.

In passing it may be noted that most macroeconomic time series are DSP rather than
TSP.

Review Questions

Answer the following questions briefly.

1. What is meant by stationarity and unit roots?

2. Discuss the concept of spurious regression.

3. Distinguish between a Trend Stationary Process (TSP) and a Difference Stationary Process (DSP).

4. Prove mathematically that a random walk model without a drift is non-stationary while its first difference is stationary.

5. A time-series that has a unit root is called a random walk. Explain.




Chapter Three

An Introduction to Simultaneous Equation Models

Topic Learning Objectives

The purpose of this chapter is to introduce the student very briefly about the
concept of simultaneous dependence of economic variables.

After the student has completed this chapter, he/she will

 Understand the concept of simultaneous equation

 Distinguish between endogenous and exogenous variables in a model

 Be able to derive reduced form equations from structural equations

 Understand the concepts of underidentified, exactly identified and overidentified equations.

 Be able to conduct tests of simultaneity.

Topic Outline

1. Introduction

2. The Nature of Simultaneous Equation Models

3. Simultaneity bias

4. Order and rank conditions of identification (without proof)


5. Indirect least squares (ILS) and 2SLS estimation of structural equations

6. Review questions


3.1. Introduction

In all the previous chapters discussed so far, we have been focusing exclusively on
the problems and estimation of single equation regression models. In such models, a
dependent variable is expressed as a linear function of one or more explanatory
variables. The cause-and-effect relationship in such models between the dependent
and independent variables is unidirectional: the explanatory variables are the
cause and the dependent variable is the effect. But there are situations where such
one-way or unidirectional causation is not meaningful. This occurs if, for
instance, Y (the dependent variable) is not only a function of the X’s (explanatory variables) but
all or some of the X’s are, in turn, determined by Y. There is, therefore, a two-way
flow of influence between Y and (some of) the X’s, which in turn makes the distinction
between dependent and independent variables doubtful. Under such
circumstances, we need to consider more than one regression equation, one for each
interdependent variable, to capture the multi-directional flow of influence among the variables.
This is precisely what is done in simultaneous equation models and what we are going
to see in this chapter.

3.2. Simultaneity Bias

A system describing the joint dependence of variables is called a system of
simultaneous equations or a simultaneous equations model. The number of
equations in such models is equal to the number of jointly dependent or endogenous
variables involved in the phenomenon under analysis. Unlike single equation
models, in simultaneous equation models it is not usually possible (possible only under
specific assumptions) to estimate a single equation of the model without taking into
account the information provided by the other equations of the system. If one applies OLS
to estimate the parameters of each equation disregarding the other equations of the model,
the estimates so obtained are not only biased but also inconsistent; i.e. even if the
sample size increases indefinitely, the estimators do not converge to their true values.


The bias arising from the application of such a procedure of estimation, which treats each
equation of the simultaneous equations model as though it were a single-equation model, is
known as simultaneity bias or simultaneous equation bias.

What happens to the parameters of the relationship if we estimate by applying OLS to
each equation without taking into account the information provided by the other
equations in the system? The application of OLS to estimate the parameters of
economic relationships presupposes the classical assumptions discussed in
Econometrics I. One of the crucial assumptions of OLS is that the explanatory
variables and the disturbance term are independent, i.e. the disturbance term is truly
exogenous; symbolically, Ε(Xi Ui) = 0. As a result, the linear model can be interpreted as
describing the conditional expectation of the dependent variable (Y) given a set of
explanatory variables. In simultaneous equation models, such independence of the
explanatory variables and the disturbance term is violated, i.e. Ε(Xi Ui) ≠ 0. If this assumption is
violated, the OLS estimator is biased and inconsistent, and hence we can treat
simultaneity bias as a violation of the assumption of independence of the explanatory variables
and the disturbance term.

Simultaneity bias of OLS estimators: The two-way causation in a relationship leads
to violation of this important assumption of the linear regression model: one variable
can be the dependent variable in one equation of the simultaneous-equation model but appear
as an explanatory variable in other equations. In this case cov(X, U) may
be different from zero. To show simultaneity bias, let us consider the following simple
simultaneous equation model.


Y = α0 + α1X + U
X = β0 + β1Y + β2Z + V    -------------------------------------------------- (3.1)

Suppose that the following assumptions hold (recall the assumptions of the classical regression
model):

Ε(U) = 0,  Ε(V) = 0
Ε(U²) = σ²u,  Ε(V²) = σ²v
Ε(Ui Uj) = 0 for i ≠ j,  Ε(Vi Vj) = 0 for i ≠ j,  and also Ε(Ui Vi) = 0

where X and Y are endogenous variables and Z is an exogenous variable.

The reduced form for X in the above model is obtained by substituting the equation for Y into the equation
for X:

X = β0 + β1(α0 + α1X + U) + β2Z + V

X = (β0 + α0β1)/(1 − α1β1) + [β2/(1 − α1β1)]Z + (β1U + V)/(1 − α1β1)    ------------------------- (3.2)

Applying OLS to the first equation of the above structural model will result in a biased
estimator because cov(Xi, Ui) = Ε(Xi Ui) ≠ 0. Now, let us prove that this is the case.

cov(X, U) = Ε[{X − Ε(X)}{U − Ε(U)}]

= Ε[{X − Ε(X)}U]    -------------------------------------------- (3.3)

From (3.2), and since Ε(U) = Ε(V) = 0 while Z is exogenous, the deviation of X from its expectation is the stochastic part of (3.2), that is,

X − Ε(X) = (β1U + V)/(1 − α1β1)

Therefore,

cov(X, U) = Ε[(β1U + V)U/(1 − α1β1)]

= [1/(1 − α1β1)] Ε(β1U² + UV)

= [β1/(1 − α1β1)] Ε(U²) = β1σ²u/(1 − α1β1) ≠ 0,  since Ε(UV) = 0.
, since

That is, the covariance between X and U is not zero. As a consequence, if OLS is applied to
each equation of the model separately, the coefficients will turn out to be biased. Now,
let us examine how the non-zero covariance of the error term and the explanatory
variable leads to bias in the OLS estimates of the parameters. If we apply OLS to
the first equation of the structural model (3.1), Y = α0 + α1X + U, we obtain (with lower-case letters denoting deviations from sample means, x = X − X̄, y = Y − Ȳ):

α̂1 = Σxy/Σx² = Σx(Y − Ȳ)/Σx² = ΣxY/Σx²    (since Σx = 0, the term Ȳ Σx/Σx² vanishes)

= Σx(α0 + α1X + U)/Σx² = α0Σx/Σx² + α1ΣxX/Σx² + ΣxU/Σx²

But we know that Σx = 0 and ΣxX/Σx² = 1; hence

α̂1 = α1 + ΣxU/Σx²    ------------------------------------------ (3.4)

Taking the expected values on both sides:

Ε(α̂1) = α1 + Ε(ΣxU/Σx²)

Since we have already proved that Ε(XU) ≠ 0, the last term does not vanish. Consequently, Ε(α̂1) ≠ α1; that is, α̂1 will be biased by an amount
equivalent to ΣxU/Σx².
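The following Monte Carlo sketch (illustrative only; every parameter value is made up) generates data from model (3.1) through its reduced form (3.2) and shows that the OLS estimate of α1 in the first equation does not centre on the true value.

import numpy as np

rng = np.random.default_rng(0)
a0, a1, b0, b1, b2 = 1.0, 0.5, 2.0, 0.4, 1.0     # illustrative structural parameters
n, reps = 200, 2000
estimates = []

for _ in range(reps):
    Z = rng.standard_normal(n)                   # exogenous variable
    U = rng.standard_normal(n)
    V = rng.standard_normal(n)
    # reduced form (3.2) for X, then Y from the first structural equation of (3.1)
    X = (b0 + a0 * b1 + b2 * Z + b1 * U + V) / (1 - a1 * b1)
    Y = a0 + a1 * X + U
    x = X - X.mean()
    estimates.append((x * Y).sum() / (x * x).sum())   # OLS slope, as in (3.4)

print("true a1 =", a1, " mean OLS estimate =", round(float(np.mean(estimates)), 3))

With b1 > 0 the average OLS estimate exceeds the true α1, which is exactly the simultaneity bias derived above.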


3.3. Definitions of Some Concepts

I) Endogenous and exogenous variables

In simultaneous equation models variables are classified as endogenous and
predetermined. The endogenous variables are variables whose values are determined
within the model, and the predetermined variables are those whose values are determined outside
the model. Predetermined variables can be divided into two categories: exogenous
variables (current and lagged) and lagged endogenous variables. For instance, Xt and Xt−1 denote the current and lagged
exogenous variables and Yt−1 denotes a lagged endogenous variable, on the
assumption that the X’s symbolize the exogenous variables and the Y’s symbolize the
endogenous variables. Thus, Xt, Xt−1 and Yt−1 are regarded as predetermined
variables.

Consider the demand and supply functions.

Qd = β0 + β1P + β2Y + U1    ------------------------------------------ (3.5)

Qs = α0 + α1P + α2R + U2    ------------------------------------------ (3.6)

where: Q = quantity, Y = income, P = price, R = rainfall, and U1 and U2 are error terms.

Here P and Q are endogenous variables and Y and R are exogenous variables.


II) Structural models


A structural model describes the complete structure of the relationships among the
economic variables. Structural equations of the model may be expressed in terms of
endogenous variables, exogenous variables and disturbances (random variables). The
parameters of structural model express the direct effect of each explanatory variable on
the dependent variable. A variable not appearing in a function explicitly may have an
indirect effect, which is taken into account by the simultaneous solution of the system. For
instance, a change in consumption affects investment indirectly and is not
considered in the consumption function. The effect of consumption on investment
cannot be measured directly by any structural parameter, but is measured indirectly by
considering the system as a whole.

Example: The following simple Keynesian model of income determination can be


considered as a structural model.

C=α+ βY +U ----------------------------------------------- (3.7)

Y =C +Z ---------------------------------------------------- (3.8)

for α > 0 and 0 < β < 1

where: C=consumption expenditure

Z=non-consumption expenditure

Y=national income

C and Y are endogenous variables while Z is exogenous variable.

III) Reduced form of the model:


The reduced form of a structural model is the model in which the endogenous variables
are expressed as functions of the predetermined variables and the error terms only.

Illustration: Find the reduced form of the above structural model.

Since C and Y are endogenous variables and only Z is exogenous, we
have to express C and Y in terms of Z. To do this, substitute Y = C + Z into equation (3.7).

C = α + β(C + Z) + U

C = α + βC + βZ + U

C − βC = α + βZ + U

C(1 − β) = α + βZ + U

C = α/(1 − β) + [β/(1 − β)]Z + U/(1 − β)    ---------------------------------- (3.9)

Substituting (3.9) into (3.8) we get:

Y = α/(1 − β) + [1/(1 − β)]Z + U/(1 − β)    -------------------------------- (3.10)

Equations (3.9) and (3.10) are called the reduced form of the above structural model.
We can write this more formally as:

Structural form equations:            Reduced form equations:

C = α + βY + U                        C = α/(1 − β) + [β/(1 − β)]Z + U/(1 − β)

Y = C + Z                             Y = α/(1 − β) + [1/(1 − β)]Z + U/(1 − β)

Parameters of the reduced form measure the total effect (direct and indirect) of a
change in the exogenous variables on the endogenous variable. For instance, in the above
reduced form equation (3.9), β/(1 − β) measures the total effect of a unit change in the
non-consumption expenditure on consumption. This total effect is β, the direct effect,
times 1/(1 − β), the indirect effect.

The reduced form equations can be obtained in two ways:

1) To express the endogenous variables directly as a function of the predetermined


variables.

2) To solve the structural system of endogenous variables in terms of the


predetermined variables, the structural parameters, and the disturbance terms.
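As a check on such derivations, the reduced form can also be obtained by solving the structural system symbolically. The sketch below does this for (3.7) and (3.8) with the sympy package, which is an assumption of this illustration rather than a tool required by the module.

import sympy as sp

C, Y, Z, U, alpha, beta = sp.symbols("C Y Z U alpha beta")

structural = [sp.Eq(C, alpha + beta * Y + U),    # (3.7) consumption function
              sp.Eq(Y, C + Z)]                   # (3.8) income identity
reduced = sp.solve(structural, [C, Y], dict=True)[0]

print(sp.simplify(reduced[C]))   # C = (alpha + beta*Z + U)/(1 - beta), i.e. (3.9)
print(sp.simplify(reduced[Y]))   # Y = (alpha + Z + U)/(1 - beta), i.e. (3.10)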

3. 4. The Identification Problem

Note that since the reduced form coefficients can be estimated by the OLS method and
these coefficients are combinations of the structural coefficients, the possibility exists
that the structural coefficients can be “retrieved” from the reduced-form coefficients,
and it is in the estimation of the structural parameters that we may be ultimately
interested. Unfortunately, retrieving the structural coefficients from the reduced-form
coefficients is not always possible; this problem is one way of viewing the identification
problem.


By the identification problem we mean whether numerical estimates of the parameters
of a structural equation can be obtained from the estimated reduced-form coefficients.
If this can be done, we say that the particular equation is identified. If this cannot be
done, then we say that the equation under consideration is unidentified, or under-identified. Note that the identification problem is a mathematical (as opposed to
statistical) problem associated with simultaneous equation systems. It is concerned with
the question of the possibility or impossibility of obtaining meaningful estimates of the
structural parameters.

An identified equation may be either exactly (or fully or just) identified or overidentified. It is said to be overidentified if more than one numerical value can be
obtained for some of the parameters of the structural equations. The circumstances
under which each of these cases occurs will be shown in the following discussion.

a) Under Identification

Example: Consider the following linear demand and supply model.

Demand function:  Qdt = α0 + α1Pt + U1t ……………………………... (3.11)

Supply function:  Qst = β0 + β1Pt + U2t ………………………………… (3.12)

Equilibrium condition:  Qdt = Qst ……………….……………………….. (3.13)

where Qdt = quantity demanded, Qst = quantity supplied, P = price and t = time.

By the equilibrium condition (i.e., Qdt = Qst) we obtain

α0 + α1Pt + U1t = β0 + β1Pt + U2t …………………………… (3.14)

Solving (3.14) using the substitution technique, we obtain the equilibrium price

Pt = π0 + Vt …………………………………………..(3.15)

where π0 = (β0 − α0)/(α1 − β1) and Vt = (U2t − U1t)/(α1 − β1).

Then, we obtain the following equilibrium quantity:

Qt = π1 + Wt …………………………….. (3.16)

where π1 = (α1β0 − α0β1)/(α1 − β1) and Wt = (α1U2t − β1U1t)/(α1 − β1).

Note that π0 and π1 (the reduced-form coefficients) contain all four structural parameters: α0, α1, β0 and β1. But there is no way in which the four structural unknowns can be
estimated from only two reduced-form coefficients. Recall from Algebra for Economists
that to estimate four unknowns we must have four (independent) equations, and in
general, to estimate k unknowns we must have k (independent) equations. What all
this means is that, given time series data on P (price) and Q (quantity) and no other
information, there is no way the researcher can know whether he/she is estimating the
demand function or the supply function. That is, a given Pt and Qt represent simply the
point of intersection of the appropriate demand and supply curves because of the
equilibrium condition that demand is equal to supply.

b) Just or Exact Identification

The reason we could not identify the preceding demand function or the supply function
was that the same variables P and Q are present in both functions and there is no
additional information. But suppose we consider the following demand and supply
model.

Demand function:  Qdt = α0 + α1Pt + α2It + U1t ,  α1 < 0, α2 > 0 ……………... (3.17)

Supply function:  Qst = β0 + β1Pt + β2Pt−1 + U2t ,  β1 > 0, β2 > 0 ……………… (3.18)

where I = income of the consumer, an exogenous variable, and
Pt−1 = price lagged one period, usually incorporated in the model to explain the supply of
many agricultural commodities. Note that Pt−1 is a predetermined variable because its
value is known at time t. By the market-clearing mechanism we have


α0 + α1Pt + α2It + U1t = β0 + β1Pt + β2Pt−1 + U2t …………...............… (3.19)

Solving this equation, we obtain the following equilibrium price

Pt = π0 + π1It + π2Pt−1 + Vt ……………….............................. …………(3.20)

where π0 = (β0 − α0)/(α1 − β1),  π1 = −α2/(α1 − β1),  π2 = β2/(α1 − β1)  and  Vt = (U2t − U1t)/(α1 − β1).

Substituting the equilibrium price (3.20) into the demand or supply equation of (3.17)
or (3.18), we obtain the corresponding equilibrium quantity:

Qt = π3 + π4It + π5Pt−1 + Wt …......................................……..(3.21)

where the reduced-form coefficients are

π3 = (α1β0 − α0β1)/(α1 − β1),  π4 = −α2β1/(α1 − β1),  π5 = α1β2/(α1 − β1)  and  Wt = (α1U2t − β1U1t)/(α1 − β1).

The demand-and-supply model given in equations (3.17) and (3.18) contains six
structural coefficients – α0, α1, α2, β0, β1, and β2 – and there are six reduced-form
coefficients – π0, π1, π2, π3, π4 and π5 – to estimate them. Thus, we have six equations in
six unknowns, and normally we should be able to obtain unique estimates. Therefore,
the parameters of both the demand and supply equations can be identified and the
system as a whole can be identified.

c) Over identification

Note that for certain goods and services, wealth of the consumer is another important
determinant of demand. Therefore, the demand function (3.17) can be modified as
follows, keeping the supply function as before:

Demand function:  Qdt = α0 + α1Pt + α2It + α3Rt + U1t……………... (3.22)

Supply function:  Qst = β0 + β1Pt + β2Pt−1 + U2t ………………………… (3.23)

where R represents wealth.


Equating demand to supply, we obtain the following equilibrium price and quantity

Pt = π0 + π1It + π2Rt + π3Pt−1 + Vt ……………………………..….. (3.24)

Qt = π4 + π5It + π6Rt + π7Pt−1 + Wt ……………………………….... (3.25)

where  π0 = (β0 − α0)/(α1 − β1),  π1 = −α2/(α1 − β1),  π2 = −α3/(α1 − β1),  π3 = β2/(α1 − β1),

π4 = (α1β0 − α0β1)/(α1 − β1),  π5 = −α2β1/(α1 − β1),  π6 = −α3β1/(α1 − β1),  π7 = α1β2/(α1 − β1),

Vt = (U2t − U1t)/(α1 − β1)  and  Wt = (α1U2t − β1U1t)/(α1 − β1).

The demand and supply model in (3.22) and (3.23) contains seven structural
coefficients, but there are eight equations to estimate them – the eight reduced-form
coefficients given above (i.e., π0 … π7). Notice that the number of equations is greater
than the number of unknowns. As a result, unique estimation of all the parameters of
our model is not possible. For example, one can solve for β1 in the following two ways:

β1 = π6/π2   or   β1 = π5/π1

That is, there are two estimates of the price coefficient in the supply function, and there
is no guarantee that these two values or solutions will be identical. Moreover, since β1
appears in the expressions for the other structural coefficients, this ambiguity
will be transmitted to the other estimates. Note that the supply function is identified in the
system (3.17) and (3.18) but not in the system (3.22) and (3.23), although in both
cases the supply function remains the same. This is because we have “too much” or an
over-sufficiency of information to identify the supply curve. The over-sufficiency of the
information results from the fact that in the model (3.17) and (3.18) the exclusion of
the income variable from the supply function was enough to identify it, but in the model
(3.22) and (3.23) the supply function excludes not only the income variable but also the
wealth variable. However, this situation does not imply that over-identification is
necessarily bad, since the problem of too much information can be handled.


Notice that the situation is the opposite of the case of under identification where there
is too little information. The only way in which the structural parameters of unidentified
(or under identified) equations can be identified (and thus be capable of being
estimated) is through imposition of further restrictions, or use of more extraneous
information. Such restrictions, of course, must be imposed only if their validity can be
defended.

In a simple example such as the foregoing, it is easy to check for identification; in more
complicated systems, however, it is not so easy. This time-consuming procedure can be
avoided by resorting to either the order condition or the rank condition of
identification. Although the order condition is easy to apply, it provides only a necessary
condition for identification. On the other hand, the rank condition is both a necessary
and sufficient condition for identification. Hence, in the next section we discuss the order
and rank conditions of identification.

3.5.1. Establishing Identification from the Structural Form of the Model

There are two conditions which must be fulfilled for an equation to be identified.

1. The Order Condition for Identification

This condition is based on a counting rule of the variables included and excluded from
the particular equation. It is a necessary but not sufficient condition for the
identification of an equation. The order condition may be stated as follows.

For an equation to be identified, the total number of variables (endogenous and
exogenous) excluded from it must be equal to or greater than the number of
endogenous variables in the model less one. Given that in a complete model the
number of endogenous variables is equal to the number of equations of the model, the

order condition for identification is sometimes stated in the following equivalent form.
For an equation to be identified the total number of variables excluded from it but
included in other equations must be at least as great as the number of equations of the
system less one.

Let: G = total number of equations (= total number of endogenous variables)

K= number of total variables in the model (endogenous and predetermined)

M= number of variables, endogenous and exogenous, included in a particular


equation.

Then the order condition for identification may be symbolically expressed as:

(K − M) ≥ (G − 1)
[excluded variables]   [total number of equations less one]

For example, if a system contains 10 equations with 15 variables, ten endogenous and
five exogenous, an equation containing 11 variables is not identified, while another
containing 5 variables is identified.

a. For the first equation we have

G = 10,  K = 15,  M = 11

Order condition:

(K − M) ≥ (G − 1)
(15 − 11) < (10 − 1);  that is, the order condition is not satisfied.

b. For the second equation we have

G = 10,  K = 15,  M = 5

Order condition:

(K − M) ≥ (G − 1)
(15 − 5) > (10 − 1);  that is, the order condition is satisfied.

The order condition for identification is necessary for a relation to be identified, but it is
not sufficient, that is, it may be fulfilled in any particular equation and yet the relation
may not be identified.
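Because the order condition is a pure counting rule, it is easy to mechanize. The helper below is an illustrative sketch (not part of the text) that applies K − M ≥ G − 1 and reports the counting-rule verdict; remember that this is only a necessary condition.

def order_condition(K, M, G):
    """K: total variables in the model, M: variables in the equation, G: equations."""
    excluded, needed = K - M, G - 1
    if excluded < needed:
        return "under-identified (order condition fails)"
    if excluded == needed:
        return "exactly identified (by the counting rule)"
    return "over-identified (by the counting rule)"

print(order_condition(K=15, M=11, G=10))   # first equation above: condition fails
print(order_condition(K=15, M=5,  G=10))   # second equation above: condition satisfied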

2. The Rank Condition for Identification

The rank condition states that: in a system of G equations any particular equation is
identified if and only if it is possible to construct at least one non-zero determinant of
order (G-1) from the coefficients of the variables excluded from that particular equation
but contained in the other equations of the model. The practical steps for tracing the
identifiability of an equation of a structural model may be outlined as follows.

Firstly. Write the parameters of all the equations of the model in a separate table,
noting that the parameter of a variable excluded from an equation is equal to zero.

Example 1: For example let a structural model be:

y1 = 3y2 − 2x1 + x2 + u1

y2 = y3 + x3 + u2

y3 = y1 − y2 − 2x3 + u3

where the y’s are the endogenous variables and the x’s are the predetermined
variables. This model may be rewritten in the form

−y1 + 3y2 + 0y3 − 2x1 + x2 + 0x3 + u1 = 0

0y1 − y2 + y3 + 0x1 + 0x2 + x3 + u2 = 0

y1 − y2 − y3 + 0x1 + 0x2 − 2x3 + u3 = 0

Ignoring the random disturbances, the table of the parameters of the model is as follows:

                   Y1   Y2   Y3   X1   X2   X3
1st equation      -1    3    0   -2    1    0
2nd equation       0   -1    1    0    0    1
3rd equation       1   -1   -1    0    0   -2

Secondly. Strike out the row of coefficients of the equation which is being examined for
identification. For example, if we want to examine the identifiability of the second
equation of the model we strike out the second row of the table of coefficients.

Thirdly. Strike out the columns in which a non-zero coefficient of the equation being
examined appears. By deleting the relevant row and columns we are left with the
coefficients of variables not included in the particular equation, but contained in the
other equations of the model. For example, if we are examining for identification the
second equation of the system, we will strike out the second, third and the sixth
columns of the above table, thus obtaining the following tables.

Table of structural parameters                 Table of parameters of excluded variables

       Y1   Y2   Y3   X1   X2   X3                    Y1   X1   X2
1st    -1    3    0   -2    1    0             1st    -1   -2    1
2nd     0   -1    1    0    0    1             3rd     1    0    0
3rd     1   -1   -1    0    0   -2

Fourthly. Form the determinant(s) of order (G−1) and examine their value. If at least
one of these determinants is non-zero, the equation is identified. If all the determinants
of order (G−1) are zero, the equation is underidentified. In the above example of
exploring the identifiability of the second structural equation, we have three
determinants of order (G−1) = 3 − 1 = 2 (the symbol Δ stands for ‘determinant’). They are:

Δ1 = | -1  -2 |      Δ2 = | -1   1 |      Δ3 = | -2   1 |
     |  1   0 |           |  1   0 |           |  0   0 |

Here Δ1 = 2 and Δ2 = −1 are non-zero, while Δ3 = 0. We see that we can form two non-zero
determinants of order G − 1 = 3 − 1 = 2; hence the second equation of our system is
identified.

Fifthly. To see whether the equation is exactly identified or overidentified we use the
order condition (K − M) ≥ (G − 1). With this criterion, if the equality sign is satisfied,
that is if (K − M) = (G − 1), the equation is exactly identified. If the inequality sign
holds, that is, if (K − M) > (G − 1), the equation is overidentified.

In the case of the second equation we have:

G = 3,  K = 6,  M = 3

And the counting rule (K − M) ≥ (G − 1) gives

(6 − 3) > (3 − 1)

Therefore the second equation of the model is overidentified.
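The rank condition can likewise be checked numerically: delete the row of the equation under study, keep only the columns of the variables it excludes, and look for a non-zero determinant of order G − 1. The numpy sketch below (illustrative only, not from the text) does this for Example 1.

import numpy as np
from itertools import combinations

#             y1  y2  y3  x1  x2  x3
A = np.array([[-1,  3,  0, -2,  1,  0],    # 1st equation
              [ 0, -1,  1,  0,  0,  1],    # 2nd equation
              [ 1, -1, -1,  0,  0, -2]])   # 3rd equation

def rank_condition(A, eq):
    G = A.shape[0]
    keep_rows = [i for i in range(G) if i != eq]
    keep_cols = [j for j in range(A.shape[1]) if A[eq, j] == 0]   # variables excluded from eq
    B = A[np.ix_(keep_rows, keep_cols)]
    # identified if some (G-1)x(G-1) minor of B has a non-zero determinant
    return any(abs(np.linalg.det(B[:, list(c)])) > 1e-12
               for c in combinations(range(B.shape[1]), G - 1))

print(rank_condition(A, eq=1))   # True: the second equation is identified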


Example 2. Assume that we have a model describing the market of an agricultural


product. From the theory of partial equilibrium we know that the price in a market is
determined by the forces of demand and supply. The main determinants of the demand
are the price of the commodity, the prices of other commodities, incomes and tastes of
consumers. Similarly, the most important determinants of the supply are the price of
the commodity, other prices, technology, the prices of factors of production, and
weather conditions. The equilibrium condition is that demand be equal to supply. The
above theoretical information may be expressed in the form of the following
mathematical model.

D = a0 + a1P1 + a2P2 + a3Y + a4t + u

S = b0 + b1P1 + b2P2 + b3C + b4t + w

D = S

Where: D= quantity demanded

S= quantity supplied

P1= price of the given commodity

P2 = price of other commodities

Y= income

C= costs (index of prices of factors of production)

t = time trend. In the demand function it stands for ‘tastes’; in the supply
function it stands for ‘technology’.

The above model is mathematically complete in the sense that it contains three
equations in three endogenous variables, D, S and P1. The remaining variables, Y, P2, C,


t are exogenous. Suppose we want to identify the supply function. We apply the two
criteria for identification:

1. Order condition: ( K−M )≥(G−1 )

In our example we have: K=7 M=5 G=3

Therefore, (K-M)=(G-1) or (7-5)=(3-1)=2

Consequently the second equation satisfies the first condition for identification.

2. Rank condition

The table of the coefficients of the structural model is as follows.

                   D    P1   P2   Y    t    S    C
1st equation      -1    a1   a2   a3   a4   0    0
2nd equation       0    b1   b2   0    b4  -1    b3
3rd equation       1    0    0    0    0   -1    0

Following the procedure explained earlier, we strike out the second row and the second,
third, fifth, sixth and seventh columns. Thus we are left with the table of the
coefficients of excluded variables:

Table of parameters of the variables excluded from the second equation:

                   D     Y
1st equation      -1    a3
3rd equation       1     0

From this table we can form only one determinant of order (G − 1) = (3 − 1) = 2:

Δ = | -1   a3 |
    |  1    0 |

The value of the determinant is non-zero, provided that a3 ≠ 0.

We see that both the order and rank conditions are satisfied. Hence the second
equation of the model is identified. Furthermore, we see that in the order condition the
equality holds: (7-5) = (3-1) = 2. Consequently the second structural equation is
exactly identified.

3.6. Approaches to Estimation

As we have discussed in section 3.2, the bias arising from the application of OLS estimation,
which treats each equation of the simultaneous equations model as though it were a
single-equation model, is known as simultaneity bias or simultaneous equation bias. To avoid this
bias we use other methods of estimation, such as Indirect Least Squares (ILS), Two-Stage Least Squares (2SLS), Three-Stage Least Squares (3SLS), Maximum Likelihood
methods and the Method of Instrumental Variables (IV). However, in view of the
introductory nature of this course we shall consider very briefly the following
techniques.


1. The Method of Indirect Least Squares (ILS)

For a just or exactly identified structural equation, the method of obtaining the estimates
of the structural coefficients from the OLS estimators of the reduced-form coefficients is
known as the method of indirect least squares (ILS). ILS involves the following three
steps:

Step I: - We first obtain the reduced form equations.

Step II: - Apply OLS to the reduced form equations individually.

Step III: - Obtain estimates of the original structural coefficients from the estimated
reduced form coefficients obtained in step II.

2. The Method of two stage least squares (2SLS)

This method is applied in estimating an over-identified equation. Theoretically, two-stage least squares may be considered an extension of the ILS method. The 2SLS
method boils down to the application of ordinary least squares in two stages. That is, in
the first stage, we apply least squares to the reduced form equations in order to obtain
estimates of the systematic (exact) components of the endogenous variables
appearing on the right-hand side of the structural equation; in the second stage we replace those
endogenous variables with their estimated (fitted) values and then apply OLS to the transformed
original equation to obtain estimates of the structural parameters.

Note, however, that since 2SLS is equivalent to ILS in the just-identified case, it is
usually applied uniformly to all identified equations in the system.
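The two stages can be traced by hand on simulated data. The sketch below is illustrative only: the data-generating values and the instruments z1 and z2 are made up, and the statsmodels package is assumed for the OLS fits. It estimates the reduced form first and then re-estimates the structural equation with the fitted values.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)    # predetermined variables
u, v = rng.standard_normal(n), rng.standard_normal(n)
x = 1.0 + 0.8 * z1 + 0.5 * z2 + v + 0.7 * u                 # endogenous regressor, correlated with u
y = 2.0 + 1.5 * x + u                                       # structural equation of interest

# Stage 1: regress the endogenous regressor on all predetermined variables (reduced form)
stage1 = sm.OLS(x, sm.add_constant(np.column_stack([z1, z2]))).fit()
x_hat = stage1.fittedvalues

# Stage 2: replace x with its fitted values and apply OLS to the structural equation
stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
print("OLS slope: ", round(sm.OLS(y, sm.add_constant(x)).fit().params[1], 3))
print("2SLS slope:", round(stage2.params[1], 3))

The single-equation OLS slope drifts away from the true value of 1.5 because of the simultaneity, while the two-stage estimate is centred close to it.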

Review Questions

Answer the following questions briefly.



1. What do we mean by simultaneous equation?


2. What do we mean by simultaneity bias?

3. When do we say a model is over identified? What is the consequence


of over identification?

4. Briefly explain the ILS and 2SLS approaches to estimation.

5. What is the economic meaning of the imposition of “Zero restriction”


on the parameters of a model?

6. Consider the following hypothetical model:

It = α0 + α1Yt + ut

Yt = Qt + It

A. Identify the endogenous and exogenous variables.

B. Derive the reduced form equations.

7. Given the following simple Keynesian model of income determination, derive
the reduced form of the behavioral equations.

Ct = α0 + α1Yt + u1

It = β0 + β1Yt + β2Yt−1 + u2

Yt = Ct + It + Gt + u3

8. Take as a model of wage-price behavior:

Wt = a0 + a1(UN)t + a2Pt + ε1t ,

Pt = b0 + b1Mt + b2(UN)t + b3Wt + ε2t ,


where

W = the percentage change in wages,

UN = the rate of unemployment,

P = the percentage change in prices,

Mt = the percentage change in the money supply, and

ε 1 and ε 2 = disturbance terms .

Assume that
ε 1 t and ε 2t have zero means, constant variances, are not

autocorrelated, and are independent of (UN)t and Mt.

A. Are the above equations identified? Explain. If so, what type of identification is it?

B. Outline an estimation procedure for the identified equation.

9. Suppose that private investment spending is such that

Iit = a + b1rit + b2Si(t−1) + uit ,   i = 1, ..., N,

rit = rt + b3Iit + εit ,

where
Iit = investment expenditures of the ith firm at time t,

rit = the rate of interest it must pay for investment funds,


Si(t-1) = its sales in period t – 1 , and

rt = the economy-wide average interest rate for investment funds,

We assume that these N firms are large so that the level of their investment
expenditure affects the interest rate they face. Assume the standard conditions

concerning uit and εit. Assume also that we have only cross-sectional data.

Discuss whether or not the equations are identified.(Hint: use the rank condition for
identification)

10. Consider the model

Ct = b0 + b1Ct−1 + b2Yt + ε1t ,    (1)

Yt = It + Ct ,    (2)

It = a0 + a1Yt + a2Yt−1 + a3rt + ε2t ,    (3)

where C, I, Y, and r are, respectively, consumer expenditures, investment,

income, and the interest rate. Assume that ε 1 and ε 2 are not autocorrelated and
are independent of rt.

a. List the endogenous variables and the predetermined variables in the model.

b. How would you estimate equation (1)?

c. How would you estimate equation (3)?



Chapter four

Panel Data Regression Models

Topic Learning Objectives

The aim of this chapter is to introduce students to the basic concepts of panel data
and panel data regression models.

After studying this chapter students will be able to

 Understand the nature of panel data

 Identify the advantages of panel data.

 See the estimation of panel data regression models

 Distinguish between Fixed effect and Random effect models.


Topic outline

1. Introduction

2. Estimation of Panel Data Regression Model: The Fixed Effects Approach

3. Estimation of Panel Data Regression Model: The Random Effects Approach

4. Review exercise

In Econometrics I we discussed briefly the types of data that are generally available for
empirical analysis, namely, time series, cross section, and panel. In time series
data we observe the values of one or more variables over a period of time (e.g., GDP
for several quarters or years). In cross-section data, values of one or more variables are
collected for several sample units, or entities, at the same point in time (e.g., crime
rates for 50 states in the United States for a given year). In panel data the same cross-
sectional unit (say a family or a firm or a state) is surveyed over time. In short, panel
data have space as well as time dimensions.

There are other names for panel data, such as pooled data (pooling of time series and
cross-sectional observations), combination of time series and cross-section data,
micropanel data, longitudinal data (a study over time of a variable or group of
subjects), event history analysis (e.g., studying the movement over time of subjects
through successive states or conditions), cohort analysis (e.g., following the career
path of 1965 graduates of a business school). Although there are subtle variations, all
these names essentially connote movement over time of cross-sectional units. We will
therefore use the term panel data in a generic sense to include one or more of these
terms. And we will call regression models based on such data panel data regression
models.


4.2. Why Panel Data?

What are the advantages of panel data over cross-section or time series data? Baltagi
lists the following advantages of panel data:

1. Since panel data relate to individuals, firms, states, countries, etc., over time, there is
bound to be heterogeneity in these units. The techniques of panel data estimation can
take such heterogeneity explicitly into account by allowing for individual-specific
variables, as we shall show shortly. We use the term individual in a generic sense to
include microunits such as individuals, firms, states, and countries.

2. By combining time series of cross-section observations, panel data give “more
informative data, more variability, less collinearity among variables, more degrees of
freedom and more efficiency.”

3. By studying the repeated cross section of observations, panel data are better suited
to study the dynamics of change. Spells of unemployment, job turnover, and labor
mobility are better studied with panel data.

4. Panel data can better detect and measure effects that simply cannot be observed in
pure cross-section or pure time series data. For example, the effects of minimum wage
laws on employment and earnings can be better studied if we include successive waves
of minimum wage increases in the federal and/or state minimum wages.

5. Panel data enables us to study more complicated behavioral models. For example,
phenomena such as economies of scale and technological change can be better handled
by panel data than by pure cross-section or pure time series data.

6. By making data available for several thousand units, panel data can minimize the
bias that might result if we aggregate individuals or firms into broad aggregates.

In short, panel data can enrich empirical analysis in ways that may not be possible if we
use only cross-section or time series data. This is not to suggest that there are no


problems with panel data modeling. We will discuss them after we cover some theory
and discuss an example.

Panel data: an illustrative example

To set the stage, let us consider a concrete example. Consider the famous study of
investment theory proposed by Y. Grunfeld. Grunfeld was interested in finding out how
real gross investment (Y) depends on the real value of the firm (X2) and real capital stock
(X3). Although the original study covered several companies, for illustrative purposes we
have obtained data on four companies, General Electric (GE), General Motors (GM), U.S.
Steel (US), and Westinghouse. Data for each company on the preceding three variables
are available for the period 1935–1954. Thus, there are four cross-sectional units and
20 time periods. In all, therefore, we have 80 observations. A priori, Y is expected to be
positively related to X2 and X3.

In principle, we could run four time series regressions, one for each company or we
could run 20 cross-sectional regressions, one for each year, although in the latter case
we will have to worry about the degrees of freedom. Pooling, or combining, all the 80
observations, we can write the Grunfeld investment function as:

Yit = β1 + β2X2it + β3X3it + uit ,    i = 1, 2, 3, 4 and t = 1, 2, . . . , 20    (4.1)

where i stands for the ith cross-sectional unit and t for the tth time period.

As a matter of convention, we will let i denote the cross-section identifier and t the time
identifier. It is assumed that there are a maximum of N cross-sectional units or
observations and a maximum of T time periods. If each cross-sectional unit has the
same number of time series observations, then such a panel (data) is called a


balanced panel. In the present example we have a balanced panel, as each company
in the sample has 20 observations.

If the number of observations differs among panel members, we call such a panel an
unbalanced panel. In this chapter we will largely be concerned with a balanced panel.
Initially, we assume that the X’s are non-stochastic and that the error term follows the
classical assumptions, namely, uit ~ N(0, σ²). Notice carefully the double and triple
subscripted notation, which should be self-explanatory. How do we estimate (4.1)? The
answer follows.

4.3. Estimation of Panel Data Regression Models: The Fixed Effects Approach

Estimation of (4.1) depends on the assumptions we make about the intercept, the slope
coefficients, and the error term, uit. There are several possibilities:

1. Assume that the intercept and slope coefficients are constant across time and space
and the error term captures differences over time and individuals.

2. The slope coefficients are constant but the intercept varies over individuals.

3. The slope coefficients are constant but the intercept varies over individuals and time.

4. All coefficients (the intercept as well as slope coefficients) vary over individuals.

5. The intercept as well as slope coefficients vary over individuals and time.

As you can see, each of these cases introduces increasing complexity (and perhaps
more reality) in estimating panel data regression models, such as (4.1). Of course, the
complexity will increase if we add more regressors to the model because of the
possibility of collinearity among the regressors. To cover each of the preceding


categories in depth will require a separate book, and there are already several ones on
the market. In what follows, we will cover some of the main features of the various
possibilities, especially the first four. Our discussion is nontechnical.

1. All Coefficients Constant across Time and Individuals

The simplest, and possibly naive, approach is to disregard the space and time
dimensions of the pooled data and just estimate the usual OLS regression. That is,
stack the 20 observations for each company one on top of the other, thus giving in all
80 observations for each of the variables in the model. The OLS results are as follows:

Ŷit = −63.3041 + 0.1101X2it + 0.3034X3it

se = (29.6124) (0.0137) (0.0493)
t = (−2.1376) (8.0188) (6.1545) ……………………………………………(4.2)
R2 = 0.7565, Durbin–Watson = 0.2187, n = 80 and df = 77
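Pooled results of this kind can be reproduced with any regression package. A minimal sketch in Python using statsmodels, assuming (illustratively) that the 80 stacked observations are in a pandas DataFrame df with columns Y, X2 and X3:

    import statsmodels.formula.api as smf

    # pooled OLS: ignore the panel structure and regress Y on X2 and X3
    pooled = smf.ols('Y ~ X2 + X3', data=df).fit()

    # the summary reports the coefficients, standard errors, R-squared and the Durbin-Watson statistic
    print(pooled.summary())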
If you examine the results of the pooled regression and apply the conventional
criteria, you will see that all the coefficients are individually statistically significant, the
slope coefficients have the expected positive signs, and the R2 value is reasonably high.
As expected, Y is positively related to X2 and X3. The “only” fly in the ointment is that
the estimated Durbin–Watson statistic is quite low, suggesting that perhaps there is
autocorrelation in the data. Of course, as
we know, a low Durbin–Watson value could be due to specification errors also. For
instance, the estimated model assumes that the intercept values of GE, GM, US, and
Westinghouse are the same. It also assumes that the slope coefficients of the two X
variables are identical for all four firms. Obviously, these are highly restrictive
assumptions. Therefore, despite its simplicity, the pooled regression (4.1) may distort
the true picture of the relationship between Y and the X’s across the four companies.
What we need to do is find some way to take into account the specific nature of the
four companies. How this can be done is explained next.


2. Slope Coefficients Constant but the Intercept Varies across Individuals: The Fixed Effects or Least-Squares Dummy Variable (LSDV) Regression Model

One way to take into account the “individuality” of each company or each cross-
sectional unit is to let the intercept vary for each company but still assume that the
slope coefficients are constant across firms. To see this, we write model (4.1) as:

Yit = β1i + β2X2it + β3X3it + uit …………………………………… (4.3)

Notice that we have put the subscript i on the intercept term to suggest that the
intercepts of the four firms may be different; the differences may be due to special
features of each company, such as managerial style or managerial philosophy.

In the literature, model (4.3) is known as the fixed effects (regression) model (FEM).
The term “fixed effects” is due to the fact that, although the intercept may differ across
individuals (here the four companies), each individual’s intercept does not vary over
time; that is, it is time invariant. Notice that if we were to write the intercept as β1it , it
would suggest that the intercept of each company or individual is time variant. It may be
noted that the FEM given in (4.3) assumes that the (slope) coefficients of the
regressors do not vary across individuals or over time.

How do we actually allow for the (fixed effect) intercept to vary between companies?
We can easily do that by the dummy variable technique, particularly, the differential
intercept dummies. Therefore, we write (4.3) as:

Yit = α1 + α2D2i + α3D3i + α4D4i + β2X2it + β3X3it + uit …………………………………..(4.4)


where D2i = 1 if the observation belongs to GM, 0 otherwise; D3i = 1 if the observation
belongs to US, 0 otherwise; and D4i = 1 if the observation belongs to WEST, 0 otherwise.
Since we have four companies, we have used only three dummies to avoid falling into
the dummy-variable trap (i.e., the situation of perfect collinearity). Here there is no
dummy for GE. In other words, α1 represents the intercept of GE and α2, α3, and α4, the
differential intercept coefficients, tell by how much the intercepts of GM, US, and WEST
differ from the intercept of GE. In short, GE becomes the comparison company.

Of course, you are free to choose any company as the comparison company.
Incidentally, if you want explicit intercept values for each company, you can introduce
four dummy variables provided you run your regression through the origin, that is, drop
the common intercept in (4.4); if you do not do this, you will fall into the dummy
variable trap. Since we are using dummies to estimate the fixed effects, in the literature
the model (4.4) is also known as the LSDV model. So, the terms fixed effects and
LSDV can be used interchangeably. In passing, note that the LSDV model (4.4) is also
known as the covariance model and X2 and X3 are known as covariates.
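Because the fixed effects enter only through differential intercept dummies, the LSDV model (4.4) can be estimated by ordinary OLS once the company dummies are generated. A hedged sketch with statsmodels, again assuming a DataFrame df with columns Y, X2, X3 and a firm identifier (the names are illustrative):

    import statsmodels.formula.api as smf

    # C(firm) expands the firm identifier into differential intercept dummies;
    # one firm is absorbed into the common intercept and serves as the comparison company
    lsdv = smf.ols('Y ~ X2 + X3 + C(firm)', data=df).fit()
    print(lsdv.summary())

If a particular firm is to serve as the comparison company, the reference level can be set explicitly, for example C(firm, Treatment(reference='GE')), provided the firm labels match the data.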

The results based on (4.4) are as follows:

Ŷit = −245.7924 + 161.5722D2i + 339.6328D3i + 186.5666D4i + 0.1079X2it + 0.3461X3it

se = (35.8112) (46.4563) (23.9863) (31.5068) (0.0175) (0.0266)

t = (−6.8635) (3.4779) (14.1594) (5.9214) (6.1653) (12.9821)

R2 = 0.9345, d = 1.1076, and df = 74 …………………………………………….(4.5)

Compare this regression with (4.2). In (4.5) all the estimated coefficients are
individually highly significant, as the p values of the estimated t coefficients are
extremely small. The intercept values of the four companies are statistically different,
being −245.7924 for GE, −84.2202 (=−245.7924 + 161.5722) for GM, 93.8774


(=−245.7924 + 339.6328) for US, and −59.2258 (=−245.7924 + 186.5666) for WEST.
These differences in the intercepts may be due to unique features of each company,
such as differences in management style or managerial talent.

Which regression is better—(4.2) or (4.5)? The answer should be obvious, judging by the
statistical significance of the estimated coefficients, the fact that the R2 value has
increased substantially, and the fact that the Durbin–Watson d value is much higher,
suggesting that model (4.1) was mis-specified. The increased R2 value, however, should
not be surprising as we have more variables in model (4.5).

We can also provide a formal test of the two models. In relation to (4.5), model (4.1) is
a restricted model in that it imposes a common intercept on all the companies.
Therefore, we can use the restricted F test discussed in Econometrics I. Using that
formula, the reader can easily check that in the present instance the F value is:

F = [(R2ur − R2r)/m] / [(1 − R2ur)/(n − k)] = [(0.9345 − 0.7565)/3] / [(1 − 0.9345)/74] ≈ 66.9980 ………………… (4.6)

where the restricted R2 value is from (4.2), the unrestricted R2 is from (4.5), and the
number of restrictions is 3, since model (4.1) assumes that the intercepts of GE, GM,
US, and WEST are the same. Clearly, the F value of 66.9980 (for 3
numerator df and 74 denominator df) is highly significant and, therefore, the restricted
regression (4.1) seems to be invalid.
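The same F statistic can be computed directly from the two R2 values; a small sketch (using scipy for the p value), taking the figures reported in (4.2) and (4.5) as given:

    from scipy import stats

    r2_restricted = 0.7565     # pooled regression (4.2)
    r2_unrestricted = 0.9345   # LSDV regression (4.5)
    m, df_denom = 3, 74        # number of restrictions and residual df of (4.5)

    F = ((r2_unrestricted - r2_restricted) / m) / ((1 - r2_unrestricted) / df_denom)
    p_value = stats.f.sf(F, m, df_denom)
    print(F, p_value)          # F is roughly 67; the p value is essentially zero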

The Time Effect. Just as we used the dummy variables to account for individual
(company) effect, we can allow for time effect in the sense that the Grunfeld
investment function shifts over time because of factors such as technological changes,
changes in government regulatory and/or tax policies, and external effects such as wars
or other conflicts. Such time effects can be easily accounted for if we introduce time


dummies, one for each year. Since we have data for 20 years, from 1935 to 1954, we
can introduce 19 time dummies (why?), and write the model (4.4) as:

Yit = λ0 + λ1Dum35 + λ2Dum36 + · · · + λ19Dum53 + β2X2it + β3X3it + uit ………………(4.7)

where Dum35 takes a value of 1 for observation in year 1935 and 0 otherwise, etc. We
are treating the year 1954 as the base year, whose intercept value is given by λ0
(why?)
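In practice the 19 year dummies need not be typed out one by one; a categorical year variable can be expanded automatically. A sketch in the same style as before (column names remain assumptions):

    import statsmodels.formula.api as smf

    # C(year) creates one dummy per year and drops one year as the base period;
    # note that the software picks the omitted category (often the first year), not necessarily 1954
    time_effects = smf.ols('Y ~ X2 + X3 + C(year)', data=df).fit()
    print(time_effects.summary())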

We are not presenting the regression results based on (4.7), for none of the individual
time dummies was statistically significant. The R2 value of (4.7) was
0.7697, whereas that of (4.2) was 0.7565, an increment of only 0.0132. It is left as an
exercise for the reader to show that, on the basis of the restricted F test, this increment
is not significant, which probably suggests that the year or time effect is not significant.
This might suggest that perhaps the investment function has not changed much over
time.

We have already seen that the individual company effects were statistically significant,
but the individual year effects were not. Could it be that our model is mis-specified in
that we have not taken into account both individual and time effects together? Let us
consider this possibility.

3. Slope Coefficients Constant but the Intercept Varies over Individuals as Well as Time

To consider this possibility, we can combine (4.4) and (4.7), as follows:

Yit = α1 + α2D2i + α3D3i + α4D4i + λ1Dum35 + λ2Dum36 + · · · + λ19Dum53 + β2X2it + β3X3it + uit …………..(4.8)


When we run this regression, we find that the company dummies as well as the
coefficients of the X's are individually statistically significant, but none of the time dummies are.
Essentially, we are back to (4.5). The overall conclusion that emerges is that perhaps
there is pronounced individual company effect but no time effect. In other words, the
investment functions for the four companies are the same except for their intercepts. In
all the cases we have considered, the X variables had a strong impact on Y.
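For completeness, the combined model (4.8) is estimated by including both sets of dummies at once; a brief sketch under the same assumptions as earlier:

    import statsmodels.formula.api as smf

    # company dummies and year dummies together: the two-way fixed effects (LSDV) specification
    two_way = smf.ols('Y ~ X2 + X3 + C(firm) + C(year)', data=df).fit()
    print(two_way.summary())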

4. All Coefficients Vary across Individuals

Here we assume that the intercepts and the slope coefficients are different for all
individual, or cross-sectional, units. This is to say that the investment functions of GE,
GM, US, and WEST are all different. We can easily extend our LSDV model to take care
of this situation. Recall from Chapter One that we introduced the individual dummies in
an additive manner and showed how interactive, or differential, slope dummies can
account for differences in slope coefficients. To do this in the context of the Grunfeld
investment function, what we have to do is multiply each of the company dummies by
each of the X variables [this will add six more variables].
That is, we estimate the following model:

Yit = α1 + α2D2i + α3D3i + α4D4i + β2X2it + β3X3it + γ1(D2i X2it) + γ2(D2i X3it)

+γ3(D3i X2it) + γ4(D3i X3it) + γ5(D4i X2it) + γ6(D4i X3it) + uit …………..(4.9)

You will notice that the γ ’s are the differential slope coefficients, just as α2, α3, and α4
are the differential intercepts. If one or more of the γ coefficients are statistically
significant, it will tell us that one or more slope coefficients are different from the base
group. For example, say β2 and γ1 are statistically significant. In this case ( β2 + γ1)
will give the value of the slope coefficient of X2 for General Motors, suggesting that the
GM slope coefficient of X2 is different from
that of General Electric, which is our comparison company. If all the differential


intercept and all the differential slope coefficients are statistically significant, we can
conclude that the investment functions of General Motors, United States Steel, and
Westinghouse are different from that of General Electric. If this is in fact the case, there
may be little point in estimating the pooled regression.
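In formula notation, the interaction terms of (4.9) can also be generated automatically rather than constructed by hand. A hedged sketch, with the usual caveat that the variable names are assumptions:

    import statsmodels.formula.api as smf

    # C(firm)*(X2 + X3) expands to the firm dummies, X2, X3, and all firm-by-regressor
    # interactions, i.e. the differential intercepts and differential slopes of model (4.9)
    varying_slopes = smf.ols('Y ~ C(firm) * (X2 + X3)', data=df).fit()
    print(varying_slopes.summary())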

Let us examine the regression results based on (4.9). For ease of reading, the
regression results of (4.9) are given in tabular form in Table 4.1. As these results
reveal, Y is significantly related to X2 and X3, and several differential slope
coefficients are statistically significant. For instance, the slope coefficient of X2 is 0.0902
for GE, but 0.1828 (= 0.0902 + 0.0926) for GM. Interestingly, none of the differential
intercepts are statistically significant.

Table 4.1: Results of regression (4.9)

Variable      Coefficient    Std. error    t value     p value
Intercept     -9.9563        76.3518       -0.1304     0.8966
D2i           -139.5104      109.2808      -1.2766     0.2061
D3i           -40.1217       129.2343      -0.3104     0.7572
D4i           9.3759         93.1172       0.1006      0.9201
X2i           0.0926         0.0424        2.1844      0.0324
X3i           0.1516         0.0625        2.4250      0.0180
D2iX2i        0.0926         0.0424        2.1844      0.0324
D2iX3i        0.2198         0.0682        3.2190      0.0020
D3iX2i        0.1448         0.0646        2.2409      0.0283
D3iX3i        0.2570         0.1204        2.1333      0.0365
D4iX2i        0.0265         0.1114        0.2384      0.8122
D4iX3i        -0.0600        0.3785        -0.1584     0.8745

R2 = 0.9511    d = 1.0896

All in all, it seems that the investment functions of the four companies are different.
This might suggest that the data of the four companies are not “poolable,” in which
case one can estimate the investment functions for each company separately. This is a
reminder that panel data regression models may not be appropriate in each situation,
despite the availability of both time series and cross-sectional data.

A Caution on the Use of the Fixed Effects, or LSDV, Model. Although easy to use,
the LSDV model has some problems that need to be borne in mind.

First, if you introduce too many dummy variables, as in the case of model (4.8), you
will run up against the degrees of freedom problem. In that model we have 80
observations but only 55 degrees of freedom—we lose 3 df for the three company
dummies, 19 df for the 19 year dummies, 2 for the two slope coefficients, and 1 for the
common intercept. Second, with so many variables in the
model, there is always the possibility of multicollinearity, which might make precise
estimation of one or more parameters difficult.

Third, suppose in the FEM we also include variables such as sex, color, and ethnicity,
which are time invariant because an individual's sex, color, or ethnicity does not
change over time. Hence, the LSDV approach may not be able to identify the impact of
such time-invariant variables.


Fourth, we have to think carefully about the error term uit. All the results we have
presented so far are based on the assumption that the error term follows the classical
assumptions, namely,

uit N(0, σ2). Since the i index refers to cross-sectional observations and t to time series
observations, the classical assumption for uit may have to be modified. There are
several possibilities.

1. We can assume that the error variance is the same for all cross section units or we
can assume that the error variance is heteroscedastic.

2. For each individual we can assume that there is no autocorrelation over time. Thus,
for example, we can assume that the error term of the investment function for General
Motors is nonautocorrelated. Or we could assume that it is autocorrelated, say, of the
AR(1) type.

3. For a given time, it is possible that the error term for General Motors is correlated
with the error term for, say, U.S. Steel or both U.S. Steel and Westinghouse. Or, we
could assume that there is no such correlation.

4. We can think of other permutations and combinations of the error term. As you will
quickly realize, allowing for one or more of these possibilities will make the analysis that
much more complicated. Space and mathematical demands preclude us from
considering all the possibilities.

However, some of the problems may be alleviated if we resort to the so-called random
effects model, which we discuss next.

4.4. Estimation of Panel Data Regression Models: The Random Effects Approach


Although straightforward to apply, fixed effects, or LSDV, modeling can be expensive in
terms of degrees of freedom if we have several cross-sectional units. Besides, as
Kmenta notes:

An obvious question in connection with the covariance [i.e., LSDV] model is whether
the inclusion of the dummy variables—and the consequent loss of the number of
degrees of freedom—is really necessary. The reasoning underlying the covariance
model is that in specifying the regression model we have failed to include relevant
explanatory variables that do not change over time (and possibly others that do change
over time but have the same value for all cross-sectional units), and that the inclusion
of dummy variables is a cover up of our ignorance. If the dummy variables do in fact
represent a lack of knowledge about the (true) model, why not express this ignorance
through the disturbance term uit? This is precisely the approach suggested by the
proponents of the so-called error components model (ECM) or random effects
model (REM).

The basic idea is to start with:

Yit = β1i + β2X2it + β3X3it + uit ………………………………………………………….(4.10)

Instead of treating β1i as fixed, we assume that it is a random variable with a mean
value of β1 (no subscript i here). And the intercept value for an individual company can
be expressed as

β1i = β1 + εi,   i = 1, 2, . . . , N ……………………………………………………….(4.11)

where εi is a random error term with a mean value of zero and a variance of σ2ε.

What we are essentially saying is that the four firms included in our sample are a
drawing from a much larger universe of such companies and that they have a common


mean value for the intercept ( = β1) and the individual differences in the intercept
values of each company are reflected in the error term εi .

Substituting (4.11) into (4.10), we obtain:

Yit = β1 + β2X2it + β3X3it + εi + uit

    = β1 + β2X2it + β3X3it + wit ……………………………………………(4.12)

where

wit = εi + uit (4.13)

The composite error term wit consists of two components, εi , which is the cross-
section, or individual-specific, error component, and uit , which is the combined time
series and cross-section error component. The term error components model derives its
name because the composite error term wit consists of two (or more) error
components.

The usual assumptions made by ECM are that

εi ∼ N(0, σ2ε)

uit ∼ N(0, σ2u) …………………………………………………………………………. (4.14)

E(εi uit) = 0;   E(εi εj) = 0   (i ≠ j)

E(uit uis) = E(uit ujt) = E(uit ujs) = 0   (i ≠ j; t ≠ s)

that is, the individual error components are not correlated with each other and are not
autocorrelated across both cross-section and time series units. Notice carefully the
difference between FEM and ECM. In FEM each cross-sectional unit has its own (fixed)
intercept value; in all, there are N such values for N cross-sectional units. In ECM, on the other
hand, the intercept β1 represents the mean value of all the (cross-sectional) intercepts

Course Title: Econometrics II Prepared by:-------------------------------


College of Business, Economics, and Social Sciences; Department of Common and Supportive Courses

and the error component εi represents the (random) deviation of individual intercept
from this mean value. However, keep in mind that εi is not directly observable; it is
what is known as an unobservable, or latent, variable.

As a result of the assumptions stated above, it follows that

E(wit) = 0 ……………………………………………………….(4.15)

var (wit) = σ2ε+ σ2u ……………………………………………..(4.16)

Now if σ2ε= 0, there is no difference between models (4.1) and (4.12), in which case
we can simply pool all the (cross-sectional and time series) observations and just run
the pooled regression, as we did in (4.2).

As (4.16) shows, the error term wit is homoscedastic. However, it can be shown that
wit and wis (t ≠ s) are correlated; that is, the error terms of a given cross-sectional
unit at two different points in time are correlated. The correlation coefficient, corr (wit,
wis), is as follows:

corr (wit, wis) = σ2ε / (σ2ε + σ2u) ……………………………………..(4.17)

Notice two special features of the preceding correlation coefficient. First, for any given
cross-sectional unit, the value of the correlation between error terms at two different
times remains the same no matter how far apart the two time periods are. This is in
strong contrast to the first-order [AR(1)] scheme, where we found that the correlation
between time periods declines over time. Second, the correlation structure given in
(4.17) remains the same for all cross sectional units; that is, it is identical for all
individuals.

If we do not take this correlation structure into account, and estimate (4.12) by OLS,
the resulting estimators will be inefficient. The most appropriate method here is the
method of generalized least squares (GLS).
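The GLS computations themselves are best left to software. One possible sketch uses the third-party linearmodels package (an assumption; a random-intercept model in statsmodels, such as MixedLM, is an alternative), with illustrative column names and the data indexed by firm and year:

    import statsmodels.api as sm
    from linearmodels.panel import RandomEffects

    # linearmodels expects an entity-time MultiIndex on the data
    panel = df.set_index(['firm', 'year'])

    exog = sm.add_constant(panel[['X2', 'X3']])      # common intercept beta_1 plus the two regressors
    re_results = RandomEffects(panel['Y'], exog).fit()
    print(re_results)                                # error components (random effects) estimates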


We will not discuss the mathematics of GLS in the present context because of its
complexity. Since most modern statistical software packages now have routines to
estimate ECM (as well as FEM), we will only present the results for our investment
example. But before we do that, it may be noted that we can easily extend (4.13) to
allow for a random error component to take into account variation over time.

The results of the ECM estimation of the Grunfeld investment function are presented in
Table 4.2. Several aspects of this regression should be noted. First, if you sum the
random effect values given for the four companies, the sum will be zero, as it should (why?).
Second, the mean value of the random error component, εi, is the common intercept
value of −73.0353. The random effect value of GE of −169.9282 tells us by how much
the random error component of GE differs from the common intercept value. Similar
interpretation applies to the other three values of the random effects. Third, the R2
value is obtained from the transformed GLS regression.

If you compare the results of the ECM model given in Table 4.2 with those
obtained from FEM, you will see that generally the coefficient values of the two X
variables do not seem to differ much, except for those given in Table 4.1,
where we allowed the slope coefficients of the two variables to differ across cross-
sectional units.

Table 4.2: ECM estimation of the Grunfeld investment function

Variable        Coefficient    Std. error    t statistic    p value
Intercept       -73.0353       83.9495       -0.8699        0.3870
X2              0.1076         0.0168        6.4016         0.0000
X3              0.3457         0.0168        13.0235        0.0000

Random effects:
GE              -169.9282
GM              -9.5078
USS             165.5613
Westinghouse    13.87475

R2 = 0.9323 (GLS)

4.5. Fixed Effects (LSDV) versus Random Effects Model

The challenge facing a researcher is: Which model is better, FEM or ECM? The answer
to this question hinges on the assumption one makes about the likely correlation
between the individual, or cross-section specific, error component εi and the X
regressors.

If it is assumed that εi and the X’s are uncorrelated, ECM may be appropriate, whereas
if εi and the X’s are correlated, FEM may be appropriate. Why would one expect
correlation between the individual error component εi and one or more regressors?
Consider an example. Suppose
we have a random sample of a large number of
individuals and we want to model their wage, or earnings, function. Suppose earnings
are a function of education, work experience, etc. Now if we let εi stand for innate
ability, family background, etc., then when we model the earnings function including εi


it is very likely to be correlated with education, for innate ability and family background
are often crucial determinants of education. As Wooldridge contends, “In many
applications, the whole reason for using panel data is to allow the unobserved effect
[i.e., εi] to be correlated with the explanatory variables.” The assumptions underlying
ECM is that the εi are a random drawing from a much larger population. But sometimes
this may not be so. For example, suppose we want to study the crime rate across the
50 states in the United States. Obviously, in this case, the assumption that the 50 states
are a random sample is not tenable.

Keeping this fundamental difference in the two approaches in mind, what more can we
say about the choice between FEM and ECM? Here the observations made by Judge et
al. may be helpful:

1. If T (the number of time series data) is large and N (the number of cross-sectional
units) is small, there is likely to be little difference in the values of the parameters
estimated by FEM and ECM. Hence the choice here is based on computational
convenience. On this score, FEM may be preferable.

2. When N is large and T is small, the estimates obtained by the two methods can differ
significantly. Recall that in ECM β1i = β1 + εi , where εi is the cross-sectional random
component, whereas in FEM we treat β1i as fixed and not random. In the latter case,
statistical inference is conditional on the observed cross-sectional units in the sample.
This is appropriate if we strongly believe that the individual, or cross-sectional, units in
our sample are not random drawings from a larger sample. In that case, FEM is
appropriate. However, if the cross-sectional units in the sample are regarded as random
drawings, then ECM is appropriate, for in that case statistical inference is unconditional.
3. If the individual error component εi and one or more regressors are correlated, then
the ECM estimators are biased, whereas those obtained from FEM are unbiased.


4. If N is large and T is small, and if the assumptions underlying ECM hold, ECM
estimators are more efficient than FEM estimators.

Is there a formal test that will help us to choose between FEM and ECM? Yes, a test
was developed by Hausman in 1978. We will not discuss the details of this test, for they
are beyond the scope of this book. The null hypothesis underlying the Hausman test is
that the FEM and ECM estimators do not differ substantially. The test statistic developed
by Hausman has an asymptotic χ2 distribution. If the null hypothesis is rejected, the
conclusion is that ECM is not appropriate and that we may be better off using FEM, in
which case statistical inferences will be conditional on the εi in the sample.
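Although many packages report the Hausman statistic directly, it can also be computed by hand as H = (b_FE − b_RE)' [var(b_FE) − var(b_RE)]^(−1) (b_FE − b_RE), which is asymptotically χ2 with degrees of freedom equal to the number of slope coefficients compared. A hedged sketch, again assuming the linearmodels package and illustrative column names:

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats
    from linearmodels.panel import PanelOLS, RandomEffects

    panel = df.set_index(['firm', 'year'])
    y = panel['Y']
    X = panel[['X2', 'X3']]

    fe_res = PanelOLS(y, X, entity_effects=True).fit()     # fixed effects (within) estimator
    re_res = RandomEffects(y, sm.add_constant(X)).fit()    # error components estimator

    cols = ['X2', 'X3']                                    # compare only the slope coefficients
    d = (fe_res.params[cols] - re_res.params[cols]).to_numpy()
    V = (fe_res.cov.loc[cols, cols] - re_res.cov.loc[cols, cols]).to_numpy()

    hausman = float(d @ np.linalg.inv(V) @ d)
    p_value = stats.chi2.sf(hausman, df=len(cols))
    print(hausman, p_value)   # a small p value favors the fixed effects model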

4.6. Panel Data Regressions: Some Concluding Comments

As noted at the outset, the topic of panel data modeling is vast and complex. We have
barely scratched the surface. Among the topics that we have not discussed, the
following may be mentioned.

1. Hypothesis testing with panel data.

2. Heteroscedasticity and autocorrelation in ECM.

3. Unbalanced panel data.

4. Dynamic panel data models in which the lagged value(s) of the regressand ( Yit)
appears as an explanatory variable.

5. Simultaneous equations involving panel data.

6. Qualitative dependent variables and panel data.


One or more of these topics can be found in the references cited in this chapter, and
the reader is urged to consult them to learn more about this topic. These references


also cite several empirical studies in various areas of business and economics that have
used panel data regression models. The beginner is well advised to read some of these
applications to get a feel about how researchers have actually implemented such
models.

Summary and conclusions

1. Panel regression models are based on panel data. Panel data consist of observations
on the same cross-sectional, or individual, units over several time periods.

2. There are several advantages to using panel data. First, they increase the sample
size considerably. Second, by studying repeated cross-section observations, panel data
are better suited to study the dynamics of change. Third, panel data enable us to study
more complicated behavioral models.

3. Despite their substantial advantages, panel data pose several estimation and
inference problems. Since such data involve both cross-section and time dimensions,
problems that plague cross-sectional data (e.g., heteroscedasticity) and time series data
(e.g., autocorrelation) need to be addressed. There are some additional problems, such as
cross-correlation in individual units at the same point in time.

4. There are several estimation techniques to address one or more of these problems.
The two most prominent are (1) the FEM and (2) the REM or error components model
(ECM).

5. In FEM the intercept in the regression model is allowed to differ among individuals in
recognition of the fact that each individual, or cross-sectional, unit may have some special
characteristics of its own. To take into account the differing intercepts, one can use
dummy variables. The FEM using dummy variables is known as the least-squares


dummy variable (LSDV) model. FEM is appropriate in situations where the individual
specific intercept may be correlated with one or more regressors. A disadvantage of
LSDV is that it consumes a lot of degrees of freedom when the number of cross-
sectional units, N, is very large, in which case we will have to introduce N dummies (but
suppress the common intercept term).

6. An alternative to FEM is ECM. In ECM it is assumed that the intercept of an individual
unit is a random drawing from a much larger population with a constant mean value.
The individual intercept is then expressed as a deviation from this constant mean value.
One advantage of ECM over FEM is that it is economical in degrees of freedom, as we
do not have to estimate N cross-sectional intercepts. We need only to estimate the
mean value of the intercept and its variance. ECM is appropriate in situations where the
(random) intercept of each cross-sectional unit is uncorrelated with the regressors.

7. The Hausman test can be used to decide between FEM and ECM.

8. Despite its increasing popularity in applied research, and despite the increasing
availability of such data, panel data regressions may not be appropriate in every
situation. One has to use some practical judgment in each case.

Review questions

Answer the following questions briefly

1. Explain the differences among cross-sectional, time series, and pooled data.

2. List the advantages of panel data.


3. Differentiate FEM from REM.

4. Explain the Hausman test.
