
Welcome to the eleventh issue of e-Tutorial. In this issue we introduce simultaneous dynamic equations and exogeneity (Hausman) tests. I would like to remark that the theoretical background given in class is essential to proceed with the computational exercise below. Thus, I recommend that you consult Prof. Koenker's Lecture Notes as you go through the tutorial.
The first thing you need is to download the data sets in ASCII format by clicking on the respective names: system1.dat and system2.dat. Save them in your preferred location (I'll save mine as "C:/system1.dat" and "C:/system2.dat"). Then I suggest you open the files in Notepad (or another text editor) and type the name of the variable "year" in the first row, first column, i.e., before the variable "w". Use <Tab> to separate the names of variables. Save both files in text format in your favorite directory (I will save mine as "C:/system1a.txt" and "C:/system2a.txt", respectively).
Part 1:
For the first part of the problem set, go to STATA and type:
infile year ... using "C:/system1a.txt"

(Replace "..." with the remaining variable names, typed in the order the columns appear in the file.)

Next you need to declare your data as time series:


gen quarter=q(1947q1)+_n-1
tsset quarter

To obtain a flavor of the data, use the command summarize, detail. To work with time-series functions, see previous tutorials. The model to be estimated is:

(Supply)    Q_t = a1 + a2*p_{t-1} + a3*z_t + u_t
(Demand)    p_t = b1 + b2*Q_t + b3*w_t + v_t

Question 1:
Here you just need to run the system above, using OLS:
*Supply equation:
regress Q L.p z
*Demand equation:
regress p Q w

For the graph, first consider the equations in the steady state: setting p_{t-1} = p_t = p, the supply curve becomes Q = a1 + a2*p + a3*z. Then use the last observed values of z and w. Finally, plot the two equations in a single Cartesian graph. In Matlab, you can do that using the following routine (replace "yournumber" with the values from the equation system you have estimated):
Q=0:.1:20;
p_supply=yournumber+yournumber*Q;
p_demand=yournumber+yournumber*Q;
plot(Q, p_supply, Q, p_demand, ':')
legend('supply','demand')
xlabel('Quantity')
ylabel('Price')
title('Dynamic Relationship Between Demand and Supply')

If you don't have access to Matlab, use any other graphical device, or simply draw it by hand. Find the equilibrium price and quantity, and compare them with your graph. Check whether there is convergence in the graph (see Prof. Koenker's class notes).
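To solve for the equilibrium analytically, substitute the steady-state supply curve into the demand curve. Below is a minimal sketch in Stata, assuming you have stored your estimates in scalars a1-a3 and b1-b3, and the last observed values of z and w in zbar and wbar (all hypothetical names; fill in your own values):

*In equilibrium p solves p = b1 + b2*(a1 + a2*p + a3*zbar) + b3*wbar:
scalar pstar = (b1 + b2*(a1 + a3*zbar) + b3*wbar)/(1 - a2*b2)
scalar Qstar = a1 + a2*pstar + a3*zbar
display "equilibrium price = " pstar "   equilibrium quantity = " Qstar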
Question 2:
Here I suggest you construct a table as follows:

n periods ahead        Q          p          z          w
      0             2.4168     6.7186     2.5089     .112727
      1              (...)      (...)     2.5089     .112727
      2              (...)      (...)     2.5089     .112727
    (...)            (...)      (...)     2.5089     .112727
     20              (...)      (...)     2.5089     .112727

The values at n=0 come from the last observation in the data set. After that, z and w stay fixed. To find p(n=1), you first need to calculate the forecast of Q(n=1), using the system of equations you estimated in Question 1. Do the same for p(n=i), i=2,...,20. Find the equilibrium prices and quantities.
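A minimal sketch of this iteration in Stata, under the same hypothetical names as before (scalars a1-a3 and b1-b3 hold the estimated coefficients, p0 the price at n=0, and zbar and wbar the fixed values of z and w):

scalar pn = p0
forvalues n = 1/20 {
    *supply forecasts Q(n) from last period's price:
    scalar Qn = a1 + a2*pn + a3*zbar
    *demand then gives p(n):
    scalar pn = b1 + b2*Qn + b3*wbar
    display "n = `n'   Q = " Qn "   p = " pn
}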
Question 3:
A) Explaining the effects of autocorrelation:
The problem asks you to explain why L.p cannot be considered exogenous. Please also explain
the consequences of applying OLS estimators in the presence of autocorrelated disturbances.
B) Testing for the presence of autocorrelation: see e-Tutorial 10.
C) Correcting the Model: see e-Tutorial 10.

Question 4:

Here you can use the Hausman specification test you saw in Lecture 11. Make sure to explain the null and the alternative hypotheses, and describe how the test is computed. The choice of instruments is crucial: you need to select instruments that are exogenous, i.e., orthogonal to the errors, but correlated with the included variables. In STATA, you can calculate the Hausman test as follows:
*2SLS with full set of instruments:
regress Q L.p z ( your instruments z )
matrix bFull=get(_b)
matrix varFull=get(VCE)
*2SLS with reduced set of instruments:
regress Q L.p z ( your instruments only )
matrix bRedux=get(_b)
matrix varRedux=get(VCE)
*Hausman Specification Test:
matrix Omega=varRedux-varFull
matrix Omegainv=syminv(Omega)
matrix q=bFull-bRedux
matrix Delta=q*Omegainv*q'
matrix list Delta

After you obtain your test statistic Delta (a scalar), you should compare it with a Chi-squared(1) critical value. The degrees of freedom correspond to the number of dubious variables, i.e., the number of variables included as instruments in the first equation but not in the second. Compare your results with the built-in command hausman in STATA.
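A sketch of how the built-in command could be applied here, following the save-then-compare pattern used with xtreg later in these tutorials (under the null the full-instrument estimator is efficient, so the reduced-instrument estimates are saved first as the consistent but less efficient ones):

*2SLS with reduced set of instruments (consistent under both hypotheses):
regress Q L.p z ( your instruments only )
hausman, save
*2SLS with full set of instruments (efficient under the null):
regress Q L.p z ( your instruments z )
hausman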

Part 2:
For the second part of the problem set, go to STATA and type:
infile year ... p1 using "C:/system2a.txt"

(Again, replace "..." with the remaining variable names in column order.)

Next you need to declare your data as time series:


gen quarter=q(1947q1)+_n-1
tsset quarter

To obtain a flavor of the data, use the command summarize, detail. To work with time-series functions, see previous tutorials. The model to be estimated is:

(Supply)    Q_t = a1 + a2*p_t + a3*p_{t-1} + a4*z_t + u_t
(Demand)    p_t = b1 + b2*Q_t + b3*w_t + v_t

The variable p1 in the data set was created to substitute for the lag of p, so you don't lose any observations.
Question 5:
Here you just need to compare the OLS estimators with the 2SLS estimators for the whole
system:
*Supply
*OLS:
regress Q p p1 z
*2SLS:
regress Q p p1 z ( your instruments )
*Demand:
*OLS:
regress p Q w
*2SLS:
regress p Q w ( your instruments )

As an econometrician, base your analysis on the results above. Check the signs, significance, and economic sense of the results.

Question 6:
This question is a simple hypothesis testing exercise. Nevertheless, you don't know whether you should test using OLS or 2SLS, and the results may differ across models. So, I suggest you implement a Hausman-Wu test, similar to the one you did in Question 4, but comparing a simple OLS regression against 2SLS. The choice of instruments is yours. To implement the test, follow the commands in Question 4, adjusting for the OLS equation. Based on your diagnostic, choose the best model (OLS or 2SLS), and type in STATA the following commands:
*Is the long-run supply response to a change in price equal to 1?
test p+p1=1
*Are the first and second periods price effects the same?
test p=p1

Optional: Checking whether the errors are correlated in a system of equations:

Sometimes it is useful to know whether the errors of the supply equation are correlated with the errors of the demand equation. For example, using 2SLS methods:
*Supply:
regress Q p p1 z (z w p1)
*Obtain the residuals of the supply:
predict sres, res
*Demand:
regress p Q w (z w p1)
*Obtain the residuals of the demand:
predict dres, res
*Regress the estimated residuals of the supply against the estimated residuals of the demand:
regress sres dres, noconst

If you find that the residuals are correlated, it may be because some of the variables are not exogenous. You can then proceed with the Hausman test to verify which of the instruments are not exogenous.

Welcome to the twelfth issue of e-Tutorial. Here I will talk about the basic fundamentals of
panel data estimation techniques: from the organization of your panel data sets to the tests of
fixed effects versus random effects. In the example below I will use the theoretical
background of Prof. Koenker's Lecture Note 13 (2004) to reproduce the results of Greene
(1997). I insert STATA estimation techniques (plus some comments) whenever necessary. I
also provide a short introduction to panel data in R. Have fun!!!
Example:
Greene (1997) provides a small panel data set with information on costs and output of 6 different firms, in 4 different periods of time (1955, 1960, 1965, and 1970). Your job is to estimate a cost function using basic panel data techniques.
Stacking your data:
The data is shown below in stacked form, i.e., the first "T" lines (here T=4) correspond to firm 1, the next "T" lines to firm 2, and so on. The columns are self-explanatory. To facilitate your work, I included firm-specific dummy variables, represented by columns D1-D6. The data is described below and available here in ASCII format for download.
Year   Firm    Cost     Output   D1  D2  D3  D4  D5  D6
1955    1       3.154       214    1   0   0   0   0   0
1960    1       4.271       419    1   0   0   0   0   0
1965    1       4.584       588    1   0   0   0   0   0
1970    1       5.849      1025    1   0   0   0   0   0
1955    2       3.859       696    0   1   0   0   0   0
1960    2       5.535       811    0   1   0   0   0   0
1965    2       8.127      1640    0   1   0   0   0   0
1970    2      10.966      2506    0   1   0   0   0   0
(...)  (...)    (...)      (...)          (...)
1955    6      73.050     11796    0   0   0   0   0   1
1960    6      98.846     15551    0   0   0   0   0   1
1965    6     138.880     27218    0   0   0   0   0   1
1970    6     191.560     30958    0   0   0   0   0   1

Save the data in your preferred path (I will save mine as "C:/econ508/greene14.txt") and open your preferred software.
In R:
Appendix A contains a panel data session in R with the main results derived in this tutorial.
In STATA:
The first step is to download your data into the software:
infile Year Firm Cost Output D1 D2 D3 D4 D5 D6 using "C:/econ508/greene14.txt"

Drop the first line of observations containing missing values (due to the labels of
variables in the text file).
The next step is to generate the log values of costs and outputs:
gen lnc=log(Cost)
gen lny=log(Output)

Finally you declare your data set as panel:

iis Firm
tis Year

where iis refers to the cross-sectional unit identification, and tis to the time-series identification.
Theoretical Background:
Consider a simplified version of equation (1) in Koenker's Lecture 13:

(1)    yit = xit*b + ai + uit

a) Pooled OLS:
The most basic estimator for panel data sets is the pooled OLS (POLS) estimator. Johnston & DiNardo (1997) recall that POLS ignores the panel structure of the data, treating observations as serially uncorrelated for a given individual, with homoscedastic errors across individuals and time periods:

(2)    bPOLS = (X'X)^(-1)X'y

In STATA, you can obtain the POLS as follows:


regress lnc lny

  Source |       SS       df       MS             Number of obs =      24
---------+------------------------------         F(  1,    22) =  728.51
   Model |   33.617333     1   33.617333         Prob > F      =  0.0000
Residual |  1.01520396    22  .046145635         R-squared     =  0.9707
---------+------------------------------         Adj R-squared =  0.9694
   Total |  34.6325369    23  1.50576248         Root MSE      =  .21482

------------------------------------------------------------------------------
     lnc |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     lny |   .8879868   .0328996    26.991    0.000       .8197573    .9562164
   _cons |  -4.174783   .2768684   -15.079    0.000      -4.748973   -3.600593
------------------------------------------------------------------------------

scalar R2OLS=_result(7)

b) Fixed Effects (Within-Groups) Estimators:
In Koenker's Lecture 13 we examined the effects of applying the matrices P and Q to the data:

P = D(D'D)^(-1)D' : transforms the data into individual means
Q = I - P : transforms the data into deviations from individual means.

The within-groups (or fixed effects) estimator is then given by:

(3)    bW = (X'QX)^(-1)X'Qy

Given that Q is idempotent, this is equivalent to regressing Qy on QX, i.e., using data in the form of deviations from individual means. In STATA, you can obtain the within-groups estimators using the built-in function xtreg, fe:
xtreg lnc lny, fe

Fixed-effects (within) regression               Number of obs      =        24
Group variable (i) : Firm                       Number of groups   =         6

R-sq:  within  = 0.8774                         Obs per group: min =         4
       between = 0.9833                                        avg =       4.0
       overall = 0.9707                                        max =         4

corr(u_i, Xb)  = 0.8495                         F(1,17)            =    121.66
                                                Prob > F           =    0.0000

------------------------------------------------------------------------------
     lnc |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     lny |   .6742789   .0611307    11.030    0.000       .5453044    .8032534
   _cons |  -2.399009    .508593    -4.717    0.000      -3.472046   -1.325972
---------+--------------------------------------------------------------------
 sigma_u |  .36730483
 sigma_e |  .12463167
     rho |  .89675322   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(5,17) =    9.67         Prob > F = 0.0002

matrix bW=get(_b)
matrix VW=get(VCE)

Note: The intercept shown above is an average of the individual intercepts. If you are interested in obtaining firm-specific intercepts, go to Appendix B.
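As a cross-check, here is a minimal sketch of the within regression computed by hand: egen builds the firm means, and OLS is run on the deviations. The slope should match xtreg, fe, though the standard errors differ slightly because xtreg adjusts the degrees of freedom for the estimated firm means.

egen mlnc = mean(lnc), by(Firm)
egen mlny = mean(lny), by(Firm)
gen qlnc = lnc - mlnc
gen qlny = lny - mlny
*no constant, since the demeaned data have mean zero:
regress qlnc qlny, noconstant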
Between-Groups Estimators:
Another useful estimator is obtained when you use only the group means, i.e., transforming your data by applying the matrix P to equation (1) above:

(4)    bB = [X'PX]^(-1)X'Py

In STATA, you can obtain the between-groups estimators using the built-in function xtreg, be:
xtreg lnc lny, be

Between regression (regression on group means)  Number of obs      =        24
Group variable (i) : Firm                       Number of groups   =         6

R-sq:  within  = 0.8774                         Obs per group: min =         4
       between = 0.9833                                        avg =       4.0
       overall = 0.9707                                        max =         4

sd(u_i + avg(e_i.)) = .1838474                  F(1,4)             =    236.23
                                                Prob > F           =    0.0001

------------------------------------------------------------------------------
     lnc |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     lny |   .9110734   .0592772    15.370    0.000       .7464935    1.075653
   _cons |  -4.366618   .4982409    -8.764    0.001      -5.749957   -2.983279
------------------------------------------------------------------------------

matrix bB=get(_b)
matrix VB=get(VCE)
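Equivalently, a minimal sketch of the between regression computed by hand: collapse the data to the six firm means and run OLS on them (collapse replaces the data in memory, hence the preserve/restore pair).

preserve
collapse (mean) lnc lny, by(Firm)
regress lnc lny
restore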

c) Random Effects:
Following Koenker's Lecture 13, consider the ai's as random. The model is then estimated via GLS:

(5)    bGLS = [X'Omega^(-1)X]^(-1)X'Omega^(-1)y,   where Omega = sigma_u^2*I_nT + T*sigma_a^2*P

You can obtain the GLS estimators in STATA by using the built-in function xtreg, re:
xtreg lnc lny, re

Random-effects GLS regression                   Number of obs      =        24
Group variable (i) : Firm                       Number of groups   =         6

R-sq:  within  = 0.8774                         Obs per group: min =         4
       between = 0.9833                                        avg =       4.0
       overall = 0.9707                                        max =         4

Random effects u_i ~ Gaussian                   Wald chi2(1)       =    268.10
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     lnc |      Coef.   Std. Err.        z    P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     lny |   .7963203   .0486336    16.374    0.000       .7010002    .8916404
   _cons |  -3.413094   .4131166    -8.262    0.000      -4.222788     -2.6034
---------+--------------------------------------------------------------------
 sigma_u |  .17296414
 sigma_e |  .12463167
     rho |  .65823599   (fraction of variance due to u_i)
------------------------------------------------------------------------------

GLS as a Combination of Within- and Between-Groups Estimators:

You can recover the GLS estimator from a combination of the between and within estimators, as shown in Koenker's Lecture 13:

(5.a)    bGLS = Delta*bB + (1 - Delta)*bW,   where Delta = VW*(VW + VB)^(-1)

In STATA, you can recover random effects GLS estimators as follows:

matrix V=VW+VB
matrix Vinv=syminv(V)
matrix D=VW*Vinv
matrix P1=D*bB'
matrix I2=I(2)
matrix RD=I2-D
matrix P2=RD*bW'
matrix bRE=P1+P2
matrix list bRE
bRE[2,1]
               y1
  lny   .79632032
_cons   -3.413094

What should I use: Fixed Effects or Random Effects? A Hausman (1978) Test Approach
Hausman (1978) suggested a test to check whether the individual effects (ai) are correlated with the regressors (Xit):
- Under the Null Hypothesis: Orthogonality, i.e., no correlation between individual effects and explanatory variables. Both the random effects and fixed effects estimators are consistent, but the random effects estimator is efficient, while fixed effects is not.
- Under the Alternative Hypothesis: Individual effects are correlated with the X's. In this case, the random effects estimator is inconsistent, while the fixed effects estimator remains consistent.
Greene (1997) recalls that, under the null, the estimates should not differ systematically. Thus, the test is based on a contrast vector H:
(6)    H = [bGLS - bW]' [V(bW) - V(bGLS)]^(-1) [bGLS - bW]   ~   Chi-squared(k)

where k is the number of regressors in X (excluding constant). In STATA, you can obtain that
as follows:
xtreg lnc lny, fe
hausman, save
xtreg lnc lny, re
hausman

                 ---- Coefficients ----
         |      (b)          (B)         (b-B)     sqrt(diag(V_b-V_B))
         |     Prior       Current     Difference        S.E.
---------+--------------------------------------------------------------
     lny |   .6742789     .7963203     -.1220414       .0370369
---------+--------------------------------------------------------------
           b = less efficient estimates obtained previously from xtreg.
           B = fully efficient estimates obtained from xtreg.

Test:  Ho:  difference in coefficients not systematic

        chi2( 1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                 =    10.86
       Prob>chi2 =    0.0010

So, based on the test above, we can see that the test statistic (10.86) is greater than the critical value of a Chi-squared (1 df, 5%) = 3.84. Therefore, we reject the null hypothesis. Given this result, the preferred model is the fixed effects one.
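For comparison, here is a sketch of computing the same statistic by hand from the matrices saved above (bW and VW from xtreg, fe), excluding the intercept from the contrast:

xtreg lnc lny, re
matrix bRE=get(_b)
matrix VRE=get(VCE)
*contrast on the slope only (element 1,1):
matrix q=bW[1,1]-bRE[1,1]
matrix Vd=VW[1,1]-VRE[1,1]
matrix H=q*syminv(Vd)*q'
matrix list H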
Appendix A: Quick Session in R
The first thing to do is to download the data, save it in your preferred directory (I will save mine as C:/econ508/greene14.txt), and read the data into R:

greene14<-read.table("C:/econ508/greene14.txt", header=T)
greene14

Next you need to extract each variable from the data set:
year<-greene14$Year
firm<-greene14$Firm
cost<-greene14$Cost
output<-greene14$Output
d1<-greene14$D1
d2<-greene14$D2
d3<-greene14$D3
d4<-greene14$D4
d5<-greene14$D5
d6<-greene14$D6
summary(greene14)

And transform them into logs (usually you don't need to, but it will facilitate the use of panel
functions later).
lnc<-log(cost)
lny<-log(output)

Finally, you will call the library MASS, to use the vcov function.
library(MASS)
help(vcov)

Pooled OLS
pols<-lm(log(cost)~log(output))
summary(pols)
anova(pols)
bpols<-cbind(coefficients(pols))
vcov.lm(pols)

Fixed Effects:
In order to obtain the fixed effects we need to transform the data into means and deviations from means. The function panmat, available at the Econ 508 webpage (Routines, panel.R), does this transformation. You can copy the function below and paste it into the R console.
#Start copying here:
#This function computes matrices of means and deviations from means
#used by the panel2 function.
# Input:  x  = a data matrix indexed by id
#         id = a factor variable indexing x
# Output: list containing: xm  = matrix of means
#                          xdm = matrix of deviations from means.
"panmat"<-function(x,id)
{
x<-as.matrix(x)
id<-as.factor(id)
xm<- apply(x,2,function(y,z) tapply(y,z, mean), z=id)
xdm<- x-apply(xm, 2, function(y,z) rep(y,table(z)),z=id)
list(xm=xm, xdm=xdm)
}
#Finish copying here.

Next, you will extract the between and the within data:
lncwe<-panmat(lnc,firm)$xdm
lncbe<-panmat(lnc,firm)$xm
lnywe<-panmat(lny,firm)$xdm
lnybe<-panmat(lny,firm)$xm
#Fixed Effects (Within Estimators):
within<-lm(lncwe~lnywe-1)
summary(within)
bwe<-coefficients(within)

vwe<-vcov(within)
#Between Estimator:
between<-lm(lncbe~lnybe)
summary(between)
vbe<-vcov(between)
vbe

Appendix B: Recovering Alphas from Fixed Effects (Least Squares Dummy Variables)
Suppose you are interested in obtaining a specific regression for firm 3. E.g., many international economists need to find a country-specific equation when they are dealing with country panels. If you are in this situation, don't worry. The fixed effects estimators already take all individual effects into account. The only mysterious thing happening is that the individual intercepts are not shown in the regression output. In the example above, the intercept shown in the fixed effects output is not specific to any firm. Instead, it is an average of all firms' intercepts.
You can recover the intercept of your cross-sectional unit after using fixed effects estimators. For the example above, let's calculate the fixed effects model including dummy variables for each firm, instead of a common intercept (some authors call this Least Squares Dummy Variables, but it is the same fixed effects you saw earlier). In STATA:
regress lnc lny D1 D2 D3 D4 D5 D6, noconst
  Source |       SS       df       MS             Number of obs =      24
---------+------------------------------         F(  7,    17) = 2581.72
   Model |  280.714267     7  40.1020382         Prob > F      =  0.0000
Residual |  .264061918    17  .015533054         R-squared     =  0.9991
---------+------------------------------         Adj R-squared =  0.9987
   Total |  280.978329    24  11.7074304         Root MSE      =  .12463

------------------------------------------------------------------------------
     lnc |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     lny |   .6742789   .0611307    11.030    0.000       .5453044    .8032534
      D1 |  -2.693527   .3827874    -7.037    0.000      -3.501137   -1.885916
      D2 |  -2.911731   .4395755    -6.624    0.000      -3.839154   -1.984308
      D3 |  -2.439957   .5286852    -4.615    0.000      -3.555386   -1.324529
      D4 |  -2.134488   .5587981    -3.820    0.001      -3.313449    -.955527
      D5 |  -2.310839     .55325    -4.177    0.001      -3.478094   -1.143583
      D6 |  -1.903512   .6080806    -3.130    0.006       -3.18645   -.6205737
------------------------------------------------------------------------------

The slope is obviously the same. The only change is the substitution of a common intercept for
6 dummies, each of them representing a cross-sectional unit.
Now suppose you would like to know if the difference in the firms effects is statistically
significant. How to do that?
- Run the fixed effects regression above again, this time including the intercept along with the dummies:
regress lnc lny D1 D2 D3 D4 D5 D6
  Source |       SS       df       MS             Number of obs =      24
---------+------------------------------         F(  6,    17) =  368.77
   Model |   34.368475     6  5.72807917         Prob > F      =  0.0000
Residual |  .264061918    17  .015533054         R-squared     =  0.9924
---------+------------------------------         Adj R-squared =  0.9897
   Total |  34.6325369    23  1.50576248         Root MSE      =  .12463

------------------------------------------------------------------------------
     lnc |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     lny |   .6742789   .0611307    11.030    0.000       .5453044    .8032534
      D1 |  (dropped)
      D2 |  -.2182041   .1052027    -2.074    0.054      -.4401624    .0037542
      D3 |   .2535693   .1716665     1.477    0.158      -.1086153    .6157538
      D4 |   .5590387   .1982915     2.819    0.012       .1406801    .9773973
      D5 |   .3826881   .1933058     1.980    0.064      -.0251516    .7905277
      D6 |   .7900151   .2436915     3.242    0.005        .275871    1.304159
   _cons |  -2.693527   .3827874    -7.037    0.000      -3.501137   -1.885916
------------------------------------------------------------------------------

Note that one of the dummies is dropped (due to perfect collinearity with the constant), and all other dummies are represented as the difference between their original value and the constant. (The value of the constant in this second regression equals the value of the dropped dummy in the previous regression; the dropped dummy is the benchmark.)
- Obtain the R-squared from the restricted (POLS) and unrestricted (fixed effects with dummies) models:
scalar R2LSDV=_result(7)
scalar list
R2OLS = .97068641
R2LSDV = .99237532

- Perform the traditional F-test, comparing the unrestricted regression with the restricted
regression:
(7)    F(n-1, nT-n-K) = [ (Ru^2 - Rp^2) / (n-1) ] / [ (1 - Ru^2) / (nT - n - K) ]

where the subscript "u" refers to the unrestricted regression (fixed effects with dummies), and the subscript "p" to the restricted regression (POLS). Under the null hypothesis (a common intercept for all firms), POLS is efficient.
scalar F=((R2LSDV-R2OLS)/(6-1))/((1-R2LSDV)/(24-6-1))
scalar list F
F = 9.6715307

The result above can be compared with the critical value of F(5,17), which equals 4.34 at 1%
level. Therefore, we reject the null hypothesis of common intercept for all firms.
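If you prefer a p-value to a table lookup, Stata's Ftail() function gives the upper-tail probability of the F statistic computed above:

display Ftail(5,17,F)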
References:
Greene, William, 1997, Econometric Analysis, Third Edition, NJ: Prentice-Hall.
Hausman, Jerry, 1978, "Specification Tests in Econometrics," Econometrica, 46, pp. 1251-1271.
Johnston, Jack, and John DiNardo, 1997, Econometric Methods, Fourth Edition, NY: McGraw-Hill.
Koenker, Roger, 2004, "Panel Data," Lecture 13, mimeo, University of Illinois at Urbana-Champaign.

Welcome to the thirteenth issue of e-Tutorial. Here I will apply the Hausman-Taylor (1981) instrumental variables approach to the phuzics data of PS4. The estimation strategy is explained in Koenker's Lecture 16 (2005), and the respective routines to implement such strategies are given in both STATA and R. I hope this helps in PS4. Have fun!!!
Downloading your data:
You can download your data from the Econ 508 webpage (here) and save the file in your
preferred directory (I'll save mine as "C:\phuzics01.txt"). Note that the names of the
variables are slightly different from the PS4. In the data set:

id: person identifier
sex: gender (female==1)
yr: current year - 1900
phd: year of PhD - 1900
rphd: rank of PhD
ru: dummy for research university (res. pos.==1)
y: page equivalent in current year
Y: discounted cumulative page equivalent
s: current annual salary
In R:
See Appendix A for a quick session in R.
In STATA:
First you need to expand the memory dedicated to STATA in your computer:
set memory 1g

Then you can infile the data by typing:


infile id yr phd sex rphd ru y Y s using "C:\phuzics01.txt"

Note: Drop the first line of obs with missing values (due to the labels of variables in the .txt file).
After that you declare your data set as panel, sort by id, and summarize the data:
iis id
tis yr
sort id
summarize

Finally you can save it in STATA format (I will save mine as "C:\phuzics01.dta"), as shown below, and later load it from the little STATA programs you are going to write with your panel functions.
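A one-line sketch, using the path assumed above:

save "C:\phuzics01.dta", replace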
PQ.do
The first step towards panel data estimation is to transform your data into group means and deviations from group means. There's a specific code in STATA for that, called PQ.do:
* A simple program for computing group means (P) and
* deviations from group means (Q).
capture program drop PQ
program define PQ
version 4.0
local options "Level(integer $S_level)"
local varlist "req ex"
parse "`*'"
parse "`varlist'",parse(" ")
sort id
quietly by id: gen P`1'=sum(`1')/sum(`1'~=.)
quietly by id: replace P`1'=P`1'[_N]
quietly gen Q`1'=`1'-P`1'
end

You can download the code at the Econ 508 webpage (Routines, PQ.do) and save it. In STATA, go to "Files", "Do...", and select the PQ.do file you have saved. As you open the file in STATA, it automatically runs the code. After that you can use the function by typing "PQ variablename". For example, if you type PQ y, two transformations of y will be added to your list of variables:

Py    the group means of y (used by the between estimators), and
Qy    the deviations from group means of y (used by the within estimators).

You should apply this function to all variables used in your estimations. For example, you will see that the PQ routine is used inside the program ht.do to run the Hausman-Taylor Instrumental Variables estimators.
ht.do
The second step is to write your own program to compute the HTIV estimators. The Econ 508 webpage (Routines) provides a base program for this, called ht.do. You can download the file in the same way you did above. Some details must be explained, though:
1) If you haven't run PQ.do yet, please do so. Otherwise the program ht.do will not work.
2) The program ht.do contains some features that should be adjusted according to the user, such as the path to access the data set, the directory in which to create a log file, etc. So, don't forget to adjust the program to your machine.
3) The most important detail: the user should specify the model, create new variables, and decide which variables will be included in the regression and/or treated as instruments. Thus, it is essential to read Koenker's Lecture 13 (2004) and Hausman-Taylor (1981), and to interpret PS4 and the auxiliary papers carefully, in order to understand what the program is doing and how you need to adjust it.
To facilitate your job, I included below a sample of the ht.do program (with small
adjustments) to compute the productivity and the wages regressions:
The Productivity Equation
. do "C:\produc01.do"
. use "C:\phuzics01.dta", clear
. iis id
. tis yr
. sort id
. replace y=log(y)
(10623 real changes made)
. gen exp=yr-phd
. gen expsq=exp^2
. gen ier=1/(exp*rphd)
. gen d60=0
. replace d60=1 if phd>60
(8788 real changes made)
. quietly by id: gen y1=y[_n-1]
. quietly by id: gen y2=y[_n-2]
. drop if y2==.
(1000 observations deleted)
. * Why can't we omit the above command?
. * Stata can handle missing obs, but here we will
. * have problems if we let Stata do our work. Why?
. * Did you forget to run PQ.do before this program? If so, try again; otherwise, go ahead.
. PQ y
. PQ y1
. PQ y2
. PQ exp
. PQ expsq
. PQ rphd
. PQ ier
. PQ d60
. PQ sex
. PQ ru
. PQ Y
. PQ s
. * Note the effect of PQ on the time-fixed variables. E.g.: Pd60=d60, Qd60=0, Psex=sex, Qsex=0.
. * Nonetheless, we need Pd60 and Psex later. Can you see where and why?
. * POOLED OLS
. regress y y1 y2 exp expsq ier d60 sex
  Source |       SS       df       MS             Number of obs =    9623
---------+------------------------------         F(  7,  9615) = 3485.58
   Model |  2482.67217     7  354.667452         Prob > F      =  0.0000
Residual |  978.354138  9615    .1017529         R-squared     =  0.7173
---------+------------------------------         Adj R-squared =  0.7171
   Total |  3461.02631  9622  .359699263         Root MSE      =  .31899

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
      y1 |   .7788283   .0099088    78.600    0.000        .759405    .7982516
      y2 |  -.2373115   .0095526   -24.843    0.000      -.2560365   -.2185865
     exp |   .0837589   .0021972    38.120    0.000       .0794519     .088066
   expsq |  -.0016344   .0000441   -37.102    0.000      -.0017207    -.001548
     ier |   3.407355   .1403287    24.281    0.000       3.132281    3.682429
     d60 |   .0204149   .0098334     2.076    0.038       .0011395    .0396904
     sex |  -.0189182   .0093459    -2.024    0.043      -.0372382   -.0005982
   _cons |   .4432838   .0192268    23.056    0.000       .4055953    .4809723
------------------------------------------------------------------------------

. * WITHIN ESTIMATORS (FIXED EFFECTS)
. xtreg y y1 y2 exp expsq ier d60 sex, fe

Fixed-effects (within) regression               Number of obs      =      9623
Group variable (i) : id                         Number of groups   =       500

R-sq:  within  = 0.6535                         Obs per group: min =         5
       between = 0.7696                                        avg =      19.2
       overall = 0.6594                                        max =        46

corr(u_i, Xb)  = 0.2409                         F(5,9118)          =   3438.90
                                                Prob > F           =    0.0000

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
      y1 |   .6573427   .0098663    66.625    0.000       .6380026    .6766828
      y2 |  -.3369351   .0094686   -35.585    0.000      -.3554957   -.3183745
     exp |    .101091   .0023692    42.669    0.000       .0964469    .1057351
   expsq |  -.0020004    .000046   -43.470    0.000      -.0020906   -.0019102
     ier |   .2556901   .2487721     1.028    0.304       -.231959    .7433392
     d60 |  (dropped)
     sex |  (dropped)
   _cons |   1.037784   .0282838    36.692    0.000       .9823415    1.093227
---------+--------------------------------------------------------------------
 sigma_u |  .20105547
 sigma_e |   .3014717
     rho |  .30784989   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(499,9118) =    3.30       Prob > F = 0.0000

. hausman, save
. * BETWEEN ESTIMATORS
. xtreg y y1 y2 exp expsq ier d60 sex, be

Between regression (regression on group means)  Number of obs      =      9623
Group variable (i) : id                         Number of groups   =       500

R-sq:  within  = 0.5435                         Obs per group: min =         5
       between = 0.9916                                        avg =      19.2
       overall = 0.6503                                        max =        46

sd(u_i + avg(e_i.)) = .0323035                  F(7,492)           =   8303.54
                                                Prob > F           =    0.0000

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
      y1 |    1.49359   .0292156    51.123    0.000       1.436188    1.550993
      y2 |   -.519665   .0294238   -17.661    0.000      -.5774767   -.4618533
     exp |   .0045644   .0027111     1.684    0.093      -.0007622    .0098911
   expsq |  -.0001547   .0000683    -2.264    0.024      -.0002889   -.0000204
     ier |   .0081957   .0830353     0.099    0.921      -.1549517    .1713432
     d60 |  -.0079103   .0113073    -0.700    0.485      -.0301268    .0143062
     sex |  -.0039254   .0042214    -0.930    0.353      -.0122197    .0043688
   _cons |   .0810791   .0155373     5.218    0.000       .0505515    .1116067
------------------------------------------------------------------------------

. * GLS ESTIMATORS (RANDOM EFFECTS):
. xtreg y y1 y2 exp expsq ier d60 sex, re

Random-effects GLS regression                   Number of obs      =      9623
Group variable (i) : id                         Number of groups   =       500

R-sq:  within  = 0.6324                         Obs per group: min =         5
       between = 0.9390                                        avg =      19.2
       overall = 0.7173                                        max =        46

Random effects u_i ~ Gaussian                   Wald chi2(7)       =  24399.03
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.        z    P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
      y1 |   .7788283   .0099088    78.600    0.000       .7594074    .7982492
      y2 |  -.2373115   .0095526   -24.843    0.000      -.2560342   -.2185889
     exp |   .0837589   .0021972    38.120    0.000       .0794524    .0880654
   expsq |  -.0016344   .0000441   -37.102    0.000      -.0017207    -.001548
     ier |   3.407355   .1403287    24.281    0.000       3.132316    3.682395
     d60 |   .0204149   .0098334     2.076    0.038       .0011419     .039688
     sex |  -.0189182   .0093459    -2.024    0.043      -.0372359   -.0006005
   _cons |   .4432838   .0192268    23.056    0.000       .4056001    .4809675
---------+--------------------------------------------------------------------
 sigma_u |          0
 sigma_e |   .3014717
     rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
. hausman

                 ---- Coefficients ----
         |      (b)          (B)         (b-B)     sqrt(diag(V_b-V_B))
         |     Prior       Current     Difference        S.E.
---------+--------------------------------------------------------------
      y1 |   .6573427     .7788283     -.1214856           .
      y2 |  -.3369351    -.2373115     -.0996236           .
     exp |    .101091     .0837589       .017332       .0008861
   expsq |  -.0020004    -.0016344      -.000366       .0000133
     ier |   .2556901     3.407355     -3.151665       .2054152
---------+--------------------------------------------------------------
           b = less efficient estimates obtained previously from xtreg.
           B = fully efficient estimates obtained from xtreg.

Test:  Ho:  difference in coefficients not systematic

        chi2( 5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                 =  2114.69
       Prob>chi2 =   0.0000
. * INSTRUMENTAL VARIABLES (1ST ROUND)


. regress y y1 y2 exp expsq ier d60 sex (Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex)
Instrumental variables (2SLS) regression

  Source |       SS       df       MS             Number of obs =    9623
---------+------------------------------         F(  7,  9615) = 2063.13
   Model |  2274.04125     7  324.863035         Prob > F      =  0.0000
Residual |  1186.98506  9615  .123451384         R-squared     =  0.6570
---------+------------------------------         Adj R-squared =  0.6568
   Total |  3461.02631  9622  .359699263         Root MSE      =  .35136

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
      y1 |   .6607854   .0114819    57.550    0.000       .6382785    .6832923
      y2 |  -.3351626   .0110306   -30.385    0.000       -.356785   -.3135403
     exp |   .1023073   .0027215    37.593    0.000       .0969726     .107642
   expsq |  -.0020375   .0000529   -38.501    0.000      -.0021412   -.0019338
     ier |   .3347733   .2895833     1.156    0.248       -.232871    .9024175
     d60 |   .0324825    .010838     2.997    0.003       .0112378    .0537272
     sex |  -.0198867    .010298    -1.931    0.053      -.0400729    .0002995
   _cons |   .9895473   .0332799    29.734    0.000       .9243117    1.054783
------------------------------------------------------------------------------

. predict r,res
. PQ r
. gen Prsq=Pr^2
. quietly by id: gen mark=_n
. *What does mark do? (see next regression)
. quietly by id: gen T=_N
. gen iT=1/T
. regress Prsq iT if mark==1
  Source |       SS       df       MS             Number of obs =     500
---------+------------------------------         F(  1,   498) =    4.15
   Model |  .011654194     1  .011654194         Prob > F      =  0.0420
Residual |  1.39687301   498  .002804966         R-squared     =  0.0083
---------+------------------------------         Adj R-squared =  0.0063
   Total |   1.4085272   499    .0028227         Root MSE      =  .05296

------------------------------------------------------------------------------
    Prsq |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
      iT |   .1525195   .0748252     2.038    0.042       .0055075    .2995315
   _cons |   .0291845   .0053924     5.412    0.000       .0185899    .0397791
------------------------------------------------------------------------------
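. * (Note added for clarity; not part of the original log.) Since Pr is the
. * group mean of the residuals, E[Prsq] is approximately sigma_e^2*(1/T) + sigma_a^2,
. * so the slope on iT estimates sigma_e^2 and the constant estimates sigma_a^2.
. * Hence theta computed below, sqrt(sigma_e^2/(sigma_e^2 + T*sigma_a^2)), is the
. * usual GLS quasi-demeaning weight.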

. matrix b=get(_b)
. gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))
. *Now you need to transform the variables included in your model
. replace y=y-(1-theta)*Py
(9623 real changes made)
. replace y1=y1-(1-theta)*Py1
(9623 real changes made)
. replace y2=y2-(1-theta)*Py2
(9623 real changes made)
. replace exp=exp-(1-theta)*Pexp
(9623 real changes made)
. replace expsq=expsq-(1-theta)*Pexpsq
(9623 real changes made)
. replace ier=ier-(1-theta)*Pier
(9623 real changes made)
. replace d60=d60-(1-theta)*Pd60
(7874 real changes made)
. replace sex=sex-(1-theta)*Psex
(1358 real changes made)
. * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
. regress y y1 y2 exp expsq ier d60 sex theta (Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta), noconstant

Instrumental variables (2SLS) regression

  Source |       SS       df       MS             Number of obs =    9623
---------+------------------------------         F(  8,  9615) =       .
   Model |  18502.9839     8  2312.87299         Prob > F      =       .
Residual |  907.099508  9615  .094342123         R-squared     =       .
---------+------------------------------         Adj R-squared =       .
   Total |  19410.0834  9623  2.01705117         Root MSE      =  .30715

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
      y1 |   .6578081     .01005    65.454    0.000        .638108    .6775082
      y2 |  -.3367606   .0096466   -34.910    0.000      -.3556698   -.3178513
     exp |   .1012754   .0024055    42.101    0.000         .09656    .1059907
   expsq |  -.0020057   .0000468   -42.898    0.000      -.0020974   -.0019141
     ier |   .2589409   .2534411     1.022    0.307       -.237857    .7557388
     d60 |    .033342    .023551     1.416    0.157      -.0128229     .079507
     sex |    -.01201   .0203523    -0.590    0.555      -.0519049    .0278848
   theta |   1.009543   .0361539    27.924    0.000       .9386737    1.080412
------------------------------------------------------------------------------
-----------------------------------------------------------------------------. *Why do we have theta as a variable and no intercept here?
. matrix list b

b[1,2]
            iT       _cons
y1   .15251948   .02918446

. summarize theta

Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+------------------------------------------------------
   theta |    9623    .4463058    .0806179   .3194048   .7148795
. clear
end of do-file

The Wages Equation


. do "C:\wages01.do"
. use "C:\phuzics01.dta", clear
. iis id
. tis yr
. sort id
. quietly by id: gen s1=s[_n-1]
. quietly by id: gen Y1=Y[_n-1]
. *Shall I drop the missing variables here?
. drop if s1==.
(500 observations deleted)
. replace s=log(s/s1)
(10123 real changes made)
. replace Y=log(Y/Y1)
(10123 real changes made)
. PQ s
. PQ Y

. PQ sex
. PQ ru
. * POOLED OLS
. regress s Y ru sex
  Source |       SS       df       MS             Number of obs =   10123
---------+------------------------------         F(  3, 10119) = 3335.56
   Model |  7.67338445     3  2.55779482         Prob > F      =  0.0000
Residual |  7.75952035 10119  .000766827         R-squared     =  0.4972
---------+------------------------------         Adj R-squared =  0.4971
   Total |  15.4329048 10122  .001524689         Root MSE      =  .02769

------------------------------------------------------------------------------
       s |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       Y |   .1795768   .0019558    91.816    0.000        .175743    .1834107
      ru |   .0142779   .0006994    20.414    0.000        .012907    .0156489
     sex |  -.0082762   .0007914   -10.458    0.000      -.0098275   -.0067249
   _cons |  -.0026954   .0004037    -6.676    0.000      -.0034869    -.001904
------------------------------------------------------------------------------

. * WITHIN ESTIMATORS (FIXED EFFECTS)
. xtreg s Y ru sex, fe

Fixed-effects (within) regression               Number of obs      =     10123
Group variable (i) : id                         Number of groups   =       500

R-sq:  within  = 0.4614                         Obs per group: min =         6
       between = 0.6455                                        avg =      20.2
       overall = 0.4855                                        max =        47

corr(u_i, Xb)  = 0.0939                         F(2,9621)          =   4120.97
                                                Prob > F           =    0.0000

------------------------------------------------------------------------------
       s |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       Y |   .1793187   .0021088    85.035    0.000       .1751851    .1834523
      ru |   .0064123    .001264     5.073    0.000       .0039346    .0088899
     sex |  (dropped)
   _cons |  -.0022472   .0004157    -5.406    0.000      -.0030619   -.0014324
---------+--------------------------------------------------------------------
 sigma_u |  .01011922
 sigma_e |  .02691058
     rho |  .12388261   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(499,9621) =    2.19       Prob > F = 0.0000

. hausman, save
. * BETWEEN ESTIMATORS
. xtreg s Y ru sex, be

Between regression (regression on group means)  Number of obs      =     10123
Group variable (i) : id                         Number of groups   =       500

R-sq:  within  = 0.4591                         Obs per group: min =         6
       between = 0.7096                                        avg =      20.2
       overall = 0.4971                                        max =        47

sd(u_i + avg(e_i.)) = .0088949                  F(3,496)           =    404.06
                                                Prob > F           =    0.0000

------------------------------------------------------------------------------
       s |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       Y |   .2020762    .007132    28.334    0.000       .1880636    .2160888
      ru |   .0167146   .0011768    14.203    0.000       .0144025    .0190268
     sex |  -.0080017   .0011612    -6.891    0.000      -.0102832   -.0057203
   _cons |  -.0069383   .0011596    -5.983    0.000      -.0092167     -.00466
------------------------------------------------------------------------------

. * GLS ESTIMATORS (RANDOM EFFECTS):
. xtreg s Y ru sex, re

Random-effects GLS regression                   Number of obs      =     10123
Group variable (i) : id                         Number of groups   =       500

R-sq:  within  = 0.4602                         Obs per group: min =         6
       between = 0.7063                                        avg =      20.2
       overall = 0.4969                                        max =        47

Random effects u_i ~ Gaussian                   Wald chi2(3)       =   9324.37
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       s |      Coef.   Std. Err.        z    P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       Y |   .1785767   .0019929    89.608    0.000       .1746707    .1824826
      ru |   .0123244   .0008565    14.390    0.000       .0106458     .014003
     sex |  -.0081517    .001123    -7.259    0.000      -.0103527   -.0059508
   _cons |  -.0022302   .0005162    -4.320    0.000      -.0032419   -.0012184
---------+--------------------------------------------------------------------
 sigma_u |  .00597206
 sigma_e |  .02691058
     rho |  .04693787   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
. hausman

                 ---- Coefficients ----
         |      (b)          (B)         (b-B)     sqrt(diag(V_b-V_B))
         |     Prior       Current     Difference        S.E.
---------+--------------------------------------------------------------
       Y |   .1793187     .1785767      .0007421       .0006894
      ru |   .0064123     .0123244     -.0059121       .0009296
---------+--------------------------------------------------------------
           b = less efficient estimates obtained previously from xtreg.
           B = fully efficient estimates obtained from xtreg.

Test:  Ho:  difference in coefficients not systematic

        chi2( 2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                 =    56.15
       Prob>chi2 =   0.0000
. * INSTRUMENTAL VARIABLES (1ST ROUND)
. regress s Y ru sex (PY QY Qru Psex)
Instrumental variables (2SLS) regression

  Source |       SS       df       MS             Number of obs =   10123
---------+------------------------------         F(  3, 10119) = 3168.81
   Model |  7.58349777     3  2.52783259         Prob > F      =  0.0000
Residual |  7.84940703 10119   .00077571         R-squared     =  0.4914
---------+------------------------------         Adj R-squared =  0.4912
   Total |  15.4329048 10122  .001524689         Root MSE      =  .02785

------------------------------------------------------------------------------
       s |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       Y |   .1835144   .0020489    89.569    0.000       .1794983    .1875306
      ru |   .0067056   .0013071     5.130    0.000       .0041434    .0092678
     sex |  -.0084882   .0007966   -10.656    0.000      -.0100496   -.0069268
   _cons |  -.0016707   .0004326    -3.862    0.000      -.0025186   -.0008227
------------------------------------------------------------------------------

. predict r,res
. PQ r
. gen Prsq=Pr^2
. quietly by id: gen mark=_n
. *What does mark do? (see next regression)
. quietly by id: gen T=_N
. gen iT=1/T
. regress Prsq iT if mark==1
  Source |       SS       df       MS             Number of obs =     500
---------+------------------------------         F(  1,   498) =    4.46
   Model |  7.4757e-08     1  7.4757e-08         Prob > F      =  0.0352
Residual |  8.3462e-06   498  1.6759e-08         R-squared     =  0.0089
---------+------------------------------         Adj R-squared =  0.0069
   Total |  8.4209e-06   499  1.6876e-08         Root MSE      =  .00013

------------------------------------------------------------------------------
    Prsq |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
      iT |   .0004526   .0002143     2.112    0.035       .0000316    .0008736
   _cons |   .0000653   .0000141     4.630    0.000       .0000376     .000093
------------------------------------------------------------------------------

. matrix b=get(_b)
. gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))
. *Now you need to transform the variables included in your model
. replace s=s-(1-theta)*Ps
(10078 real changes made)
. replace Y=Y-(1-theta)*PY
(10123 real changes made)
. replace sex=sex-(1-theta)*Psex
(1426 real changes made)
. replace ru=ru-(1-theta)*Pru
(6318 real changes made)
. * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
. regress s Y ru sex theta (PY QY Qru Psex theta), noconstant
Instrumental variables (2SLS) regression

  Source |       SS       df       MS             Number of obs =   10123
---------+------------------------------         F(  4, 10119) =       .
   Model |  8.00935131     4  2.00233783         Prob > F      =       .
Residual |  7.19637615 10119  .000711175         R-squared     =       .
---------+------------------------------         Adj R-squared =       .
   Total |  15.2057275 10123  .001502097         Root MSE      =  .02667

------------------------------------------------------------------------------
       s |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       Y |   .1809964   .0020571    87.988    0.000       .1769642    .1850287
      ru |   .0065199   .0012524     5.206    0.000        .004065    .0089748
     sex |  -.0080601   .0015735    -5.122    0.000      -.0111445   -.0049757
   theta |  -.0014172     .00067    -2.115    0.034      -.0027306   -.0001039
------------------------------------------------------------------------------
-----------------------------------------------------------------------------. *Why do we have theta as a variable and no intercept here?
. matrix list b

b[1,2]
            iT       _cons
y1   .00045259    .0000653

. summarize theta

Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+------------------------------------------------------
   theta |   10123    .4892002    .0797415   .3584998   .7321285
. clear
end of do-file

Appendix A: Hausman-Taylor Instrumental Variables in R


Here you can reproduce the results above using R. You should start by running the functions PQ.R, tsls.R, and htiv.R, available at the Econ 508 webpage (Routines, panel.R):
"htiv" <function(x, y, id, d, z = NULL)
{
#input:
# x design matrix partitioned as given in d
# y response vector
# id strata indicator
# d list of column numbers indicating partitioning of x
# x[,d[[1]]] is x1 -- exogonous time varying vars
# x[,d[[2]]] is x2 -- endogonous time varying vars
# x[,d[[3]]] is z1 -- exogonous time invariant vars
# x[,d[[4]]] is z2 -- endogonous time invariant vars
# z may contain excluded exogonous variables if there are any
# NB. intercept is automatically included
x <- as.matrix(cbind(x, 1))
Tx <- PQ(x, id)
d[[3]] <- c(d[[3]], dim(x)[2])
Z <- cbind(z, Tx$Ph[, d[[1]]], Tx$Qh[, d[[1]]], Tx$Qh[, d[[2]]],
x[, d[[ 3]]])
r <- tsls(x, Z, y, int = F)$resid
Ti <- table(id)
Ti.inv <- 1/table(id)
rdot2 <- tapply(r, id, mean)^2
v <- lm(rdot2~Ti.inv)
v <- v$coef
theta <- as.vector(sqrt(v[2]/(v[2] + v[1] * Ti[Tx$is])))
x <- x - (1 - theta) * Tx$Ph

y <- y - (1 - theta) * PQ(y, id)$Ph


fit <- tsls(x, Z, y, int = F)
list(fit=fit,v=v)
}
"PQ" <function(h, id)
{
if(is.vector(h))
h <- matrix(h, ncol = 1)
Ph <- unique(id)
Ph <- cbind(Ph, table(id))
for(i in 1:ncol(h))
Ph <- cbind(Ph, tapply(h[, i], id, mean))
is <- tapply(id, id)
Ph <- Ph[is, - (1:2)]
Qh <- h - Ph
list(Ph=as.matrix(Ph), Qh=as.matrix(Qh), is=is)
}
"tsls" <function(x, z, y, int = T)
{
# two stage least squares
if(int){
x <- cbind(1, x)
z <- cbind(1, z)
}
xhat <- lm(x~z-1)$fitted.values
R <- lm(y ~ xhat -1)
R$residuals <- c(y - x %*% R$coef)
return(R)
}

Next you should prepare your data and then run htiv.R. Here is a code for that:
#a simple analysis of wages
d <- read.table("data.05",header=TRUE)
n <- length(d[,1])
h <- as.matrix(d)
h <- cbind(h[3:n,],h[3:n,]-h[2:(n-1),],h[2:(n-1),]-h[1:(n-2),])#difference
h <- h[h[,10]==0,]#ignore obs whose first diff confounds people
h <- h[h[,19]==0,]#ignore obs whose second diff confounds people
h <- cbind(h[,1:9],h[,7:9]-h[,16:18],h[,7:9]-h[,16:18]-h[,25:27])
h <- cbind(h[,1:4],h[,2]-h[,3],(h[,2]-h[,3])^2,h[,5:15])
dimnames(h)[[2]] <- c("id","yr","phd","sex","exp","exp^2","rphd","ru",
"y","Y","s","y_1","Y_1","s_1","y_2","Y_2","s_2")
#h <- h[h[,1]<124,]
#fit productivity equation by Hausman Taylor Method
y <- log(h[,9])
x <- h[,c(12,15,5:7,3,4)]
x[,1:2] <- log(x[,1:2])
#x[,5] <- 1/x[,5]#rank of phd program
x[,6] <- as.numeric(x[,6]>60) #vintage effect of phd
x[,5] <- 1/(x[,3]*x[,5])#rank of phd program
id <- h[,1]
dimnames(x)[[2]] <- c("y","y_1","exp","exp2","rphd","phd","sex")
vl <- list(3:4,c(1:2,5),6:7,NULL)
v <- htiv(x,y,id,vl)
print("these are the variance component estimates")
print(v$v)
print(summary(v$fit))

References:
Hausman, Jerry, 1978, "Specification Tests in Econometrics," Econometrica, 46, pp. 1251-1271.
Hausman, Jerry, and William Taylor, 1981, "Panel Data and Unobservable Individual Effects," Econometrica, 49, No. 6, pp. 1377-1398.
Koenker, Roger, 2004, "Panel Data," Lecture 13, mimeo, University of Illinois at Urbana-Champaign.
