
Section 10 - ECON 140

GSI: Caroline, Chris, Jimmy, Kaushiki & Leah*

1 Panel Models
Recall that in Economics, we use 3 different types of data:

• Cross-sectional data: data which describe the activities of individuals, firms, or other units
at a given point in time.

• Time series data: data which describe the movement of a variable over time. They can be
daily, weekly, monthly, quarterly, or annual.

• Panel data: a combination of the above two data types.

Panel data consist of repeated observations at different time points for the same units (firms,
individuals, or other economic agents). Let $y_{it}$ denote the outcome for unit $i$ in period $t$, and let
$x_{1,it}, \ldots, x_{k,it}$ denote the $k$ explanatory variables. The index $i$ denotes the unit and runs from 1 to $N$, and the
index $t$ denotes time and runs from 1 to $T$. Typically $T$ is relatively small (as small as 2), and
$N$ is relatively large. As a result, when we consider the large-sample properties of the estimators
(consistency), we typically assume that $N$ goes to infinity while keeping $T$ fixed.
Here we mainly look at balanced panels, where T is the same for each unit. An unbalanced
panel has potentially different numbers of observations for each unit. This may arise because of
units dropping out of the sample; a practical example would be firms going out of business.

1.1 General set-up


• Regression model: $y_{it} = \beta_1 x_{1,it} + \ldots + \beta_k x_{k,it} + c_i + u_{it}$, where:

– $i = 1, \ldots, N$.
– $t = 1, \ldots, T$.
– $y_{it}$ is the outcome variable.
– $x_{l,it}$ is explanatory variable $l = 1, \ldots, k$.
– $u_{it}$ is the error term.
– $c_i$ is the unobserved individual effect. It varies across units but does not change over time
within unit $i$. $c_i$ can be used to represent unobservable characteristics of unit $i$ that
do not change over time.
– The coefficients of interest are the $\beta$'s. We are usually not interested in knowing $c_i$.

* Thanks to previous GSIs for sharing section notes.

• The core issue that panel data addresses is the possibility that individual units may differ
in important, unobserved ways that affect their outcomes in a manner that is constant over
time. From a statistical standpoint, however, the key issue with panel data is that error
terms within the same unit will be correlated because of the common ci term.

Table 1: A panel data model: $inc_{it} = \beta \, train_{it} + c_i + u_{it}$

Unit   Time   income   training   individual effect (unobserved)
1 1 Y11 X11 c1
1 2 Y12 X12 c1
1 3 Y13 X13 c1
2 1 Y21 X21 c2
2 2 Y22 X22 c2
2 3 Y23 X23 c2
3 1 Y31 X31 c3
3 2 Y32 X32 c3
3 3 Y33 X33 c3
4 1 Y41 X41 c4
4 2 Y42 X42 c4
4 3 Y43 X43 c4

• The two main approaches to deal with such issues are fixed effects and random effects models.
The key distinction is whether we allow correlation between the individual effects, ci , and the
regressors, x1,it , ..., xk,it , or whether we assume that they are not correlated. In both cases
we assume that the error terms are independent across individuals.

• We will be covering only the fixed effects model. There is an assumption analogous to the
exogeneity assumption we had in cross-sectional models: $\mathrm{Cov}(u_{it}, x_{l,it}) = 0$
for all $l = 1, \ldots, k$ and $\mathrm{Cov}(u_{it}, c_i) = 0$; that is, the error term is uncorrelated with the regressors and with the
individual effects.

1.2 Fixed Effects Model (FE)


We will only use the exogeneity assumption:

$\mathrm{Cov}(u_{it}, x_{l,it}) = 0 \;\; \forall\, l = 1, \ldots, k \quad \text{and} \quad \mathrm{Cov}(u_{it}, c_i) = 0.$

With only this assumption, running OLS on the equation above, which folds the individual effect into a composite error term $\varepsilon_{it} = c_i + u_{it}$, can no longer give us consistent estimates of
our parameters of interest, because

$\mathrm{Cov}(x_{l,it}, \varepsilon_{it}) = \mathrm{Cov}(x_{l,it}, c_i + u_{it}) = \mathrm{Cov}(x_{l,it}, c_i) + \mathrm{Cov}(x_{l,it}, u_{it}) = \mathrm{Cov}(x_{l,it}, c_i) \neq 0 \quad \forall\, l = 1, \ldots, k.$

In other words, we have an endogeneity problem or an OVB problem (or whatever you want to
call the fact that the regressors and the error term are correlated) unless we are willing to rely on the additional
assumption $\mathrm{Cov}(x_{l,it}, c_i) = 0$ (which defines the Random Effects (RE) model). Instead, we are going to work directly
with the original model and make simple transformations that will help us deal with this problem.
Consider the original model:

$y_{it} = \beta_0 + \beta_1 x_{1,it} + \ldots + \beta_k x_{k,it} + c_i + u_{it}$    (1)

where the assumptions $\mathrm{Cov}(u_{it}, x_{l,it}) = 0$ for all $l = 1, \ldots, k$ and $\mathrm{Cov}(u_{it}, c_i) = 0$ guarantee us
consistent estimates for the $\beta$'s. The problem is, of course, that the $c_i$ are not observed. There are, in
general, 3 ways to deal with the problem:

1. Estimate all ci ’s using dummy variables (the FE estimator).

2. Use the “demeaning” approach (the within estimator).

3. Use the “differencing” approach (the differencing estimator, use only when T = 2).

The FE estimator and within estimator are numerically identical. People sometimes use the
terms FE and within estimator interchangeably. The differencing estimator, however, is only
equivalent to the FE when T = 2.

1.2.1 Fixed Effect Estimator (FE)


The first approach tries to estimate all $c_i$'s. Since the $c_i$'s are assumed to be constant for each unit,
we call the $c_i$'s fixed effects. Recall that the $c_i$'s are like the constant parameter in cross-sectional data
models. The only difference is that we have a different parameter for each unit (individual, for
example). Since we have $T$ observations for each unit, we can use those observations to estimate
each $c_i$. Algebraically, we include a dummy variable for each unit $i$: $D_i = 1$ for unit $i$ and $D_i = 0$
for all other units. The coefficients of the dummy variables give us the estimates of the $c_i$'s. Therefore,
altogether we include $N$ dummy variables (or $N - 1$ if we also include the constant term) in the
model.
What's the problem here? As we mentioned at the beginning of this section, $T$ is usually small
in panel data (as small as 2). That implies that we usually cannot confidently estimate the $c_i$'s, because
consistency is meaningful only when we have a large number of observations (i.e., as $T$ becomes large).
But this does not matter as long as we are not interested in the $c_i$'s, which is usually the case. We
estimate the model with a full set of dummy variables for the units, but do not use the estimates
of the $c_i$'s for inference.
After we control for the unit dummy variables, the identification of the $\beta$'s comes from within-unit
variation. Therefore the FE estimator is sometimes called the within estimator.
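
To make this concrete, here is a minimal Stata sketch, where y, x, id, and year are hypothetical placeholder names rather than variables from any particular dataset. The dummy-variable regression and the built-in FE command return the same slope on x.

    * y, x, id, year are placeholders; load your own panel first.
    * FE via explicit unit dummies (Stata drops one dummy to avoid the trap).
    reg y x i.id

    * The same slope via the built-in fixed effects estimator.
    xtset id year
    xtreg y x, fe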

1.2.2 Within Estimator


In practice, it can be hard to estimate the FE model if $N$ is very large.
Fortunately, since the estimates of the $c_i$ are not reliable anyway, we can apply a simple demeaning
transformation to the data that gives us consistent estimates of the $\beta$'s. This estimator is known as the
within estimator because it identifies the $\beta$'s using within-unit variation. As we mentioned above, it is
numerically identical to the FE estimator.
Compute the unit-specific averages for unit $i$ as

$\bar{y}_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \bar{x}_{l,i\cdot} = \frac{1}{T}\sum_{t=1}^{T} x_{l,it} \;\;\text{for all } l = 1, \ldots, k, \qquad \bar{u}_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T} u_{it}$

Then the demeaned dependent and independent variables will be

$\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot}, \qquad \tilde{x}_{l,it} = x_{l,it} - \bar{x}_{l,i\cdot}, \qquad \tilde{u}_{it} = u_{it} - \bar{u}_{i\cdot}$

The demeaned version of Equation 1 is

$\tilde{y}_{it} = \beta_1 \tilde{x}_{1,it} + \ldots + \beta_k \tilde{x}_{k,it} + \tilde{u}_{it}$    (2)

because $c_i$ does not vary within unit $i$, so the demeaned $c_i$ term disappears: $\tilde{c}_i = c_i - \bar{c}_i = 0$.
Because of this simplicity, and because it is identical to the FE estimator, software packages
usually run FE models using the within estimator procedure.
Note that because $\mathrm{Cov}(\tilde{x}_{l,it}, \tilde{u}_{it}) = 0$ for all $l = 1, \ldots, k$ holds true by our exogeneity
assumption, running OLS on this model gives us consistent estimates.
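
As a sketch of what the demeaning does, the within estimator can be reproduced by hand in Stata (again with hypothetical placeholder names y, x, and id). This is for intuition only; in practice xtreg, fe or areg does the demeaning for you and uses the correct degrees of freedom for the standard errors.

    * Compute unit-specific means and demean the variables by hand.
    bysort id: egen ybar = mean(y)
    bysort id: egen xbar = mean(x)
    gen y_dm = y - ybar
    gen x_dm = x - xbar

    * OLS on the demeaned data reproduces the slope from xtreg y x, fe,
    * though its reported standard errors ignore the estimated means.
    reg y_dm x_dm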

1.2.3 Differencing Estimator


Another class of estimators that can get rid of the $c_i$ terms is the differencing estimator. Run an
OLS regression for

$\Delta y_{it} = \beta_1 \Delta x_{1,it} + \ldots + \beta_k \Delta x_{k,it} + \Delta u_{it}$    (3)

where $\Delta y_{it} = y_{it} - y_{i,t-1}$, $\Delta x_{l,it} = x_{l,it} - x_{l,i,t-1}$, and $\Delta u_{it} = u_{it} - u_{i,t-1}$. Notice that $\Delta c_{it} = c_i - c_i = 0$, so the individual effect
again disappears. Similarly, $\mathrm{Cov}(\Delta x_{l,it}, \Delta u_{it}) = 0$ for all $l = 1, \ldots, k$ by assumption,
so we can obtain consistent estimates of the $\beta$'s by running OLS. When $T = 2$, the differencing
estimator is numerically identical to the FE estimator. Therefore, this method is usually used when
we observe units in only 2 periods of time.
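
A minimal sketch of the differencing estimator in Stata, assuming the panel has been declared with xtset so that the lag operator L. respects unit boundaries (y, x, id, and year are placeholder names):

    * Declare the panel so L. refers to the previous period within each unit.
    xtset id year

    * First-difference the outcome and the regressor, then run OLS.
    gen d_y = y - L.y
    gen d_x = x - L.x
    reg d_y d_x

With T = 2 this gives the same slope as the FE estimator, as noted above.
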
Notice that in both the within estimator and the differencing estimator we cannot estimate the unit
fixed effects $c_i$, because they get canceled out. We can only get estimates of the $c_i$'s in the FE estimator,
when we include a whole set of dummy variables for all units. Since we are usually not interested
in these fixed effects, and more importantly, we cannot obtain reliable estimates of the fixed effects
due to the small number of time periods, there is really no advantage to using the FE estimator over the within
estimator.

1.3 Time Fixed Effects


In addition to the usual unit fixed effects models, in which we use dummy variables to control for all
unobserved characteristics of units that do not vary over time, it is also common to include
time fixed effects in the model to control for unobserved variables that are common to all units
at a given point in time. The idea and method are similar to unit fixed effects. However, since
panel data models usually have a relatively small number of time periods, we do not need to use the
within estimator to handle time fixed effects. The common practice is to simply include a full
set of time dummy variables (since there aren't many) in the model and proceed as in a FE model.
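
A minimal Stata sketch of a model with both unit and time fixed effects, using placeholder names y, x, id, and year; i.year adds the full set of time dummies.

    * Unit fixed effects via xtreg, time fixed effects via year dummies.
    xtset id year
    xtreg y x i.year, fe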

1.4 STATA commands


Commands having to do with fixed effects:

• “xtset entity time” tells STATA to treat your dataset as a panel, where entity is the name
of the variable identifying entities (e.g., state codes) and time identifies time periods (e.g.,
years).

• Once you use the “xtset” command you can write “xtreg y x, fe”. This will run a regression
with entity fixed effects and automatically report an F-stat for the joint significance test that
all your fixed effects are 0.

• “reg y x i.entity” automatically adds dummy variables for all categories of the variable “entity”.
It does exclude one entity to avoid the dummy variable trap. This will get you the same
results as xtreg y x, fe. To run a joint significance test on your fixed effects, type testparm
i.entity after you run reg y x i.entity.

• “reg y x i.entity i.time” Adds fixed effects for entities and time periods.

• “areg y x, absorb(entity)” Estimates the within estimator for the $\beta$ on x. Remember this
is the same $\beta$ you would get if you added dummies to your regression for all the entities
except one. The advantage of this command is that it computes the $\beta$'s faster because it
demeans the data rather than creating a lot of dummy variables. The downside is that you can't
run an F-test on all your fixed effects.

• “xtreg Y X, fe cluster(entityvar)” or “xtreg Y X, fe vce(cluster entityvar)” will cluster your
standard errors by entityvar (e.g., by state) and will provide you with the correct (HAC) standard
errors. Don't worry about this too much.

Notes

1. All of the above methods give the same coefficients.

2. How do we test whether fixed effects were necessary at all? Just include all the relevant dummies
and conduct a simple joint test for the exclusion of the dummies (an F-test), as in the sketch below.
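
Putting the commands above together, a short do-file sketch might look like this (entity, time, y, and x are placeholders for your own variable names):

    * Declare the panel structure.
    xtset entity time

    * Entity fixed effects; Stata reports an F-test that all fixed effects are zero.
    xtreg y x, fe

    * Same slope via explicit dummies; testparm runs the joint F-test on them.
    reg y x i.entity
    testparm i.entity

    * Same slope via demeaning, and a version with time effects added.
    areg y x, absorb(entity)
    reg y x i.entity i.time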

Lagged variables Consider the model below where we examine the effect of seatbelt usage on
traffic fatalities across the 50 US states and years 1983-1987.

$traffic\_fatalities_{it} = \beta_0 + \beta_1 sb\_usage_{i,t} + \beta_2 traffic\_fatalities_{i,t-1} + \beta_3 drinkingage_{i,t} + u_{i,t}$

We have introduced a “lagged” variable by including $traffic\_fatalities_{i,t-1}$ as an X variable.
This variable is the traffic fatalities for the previous year for state $i$. You would include a lagged
variable if you think past levels of the variable affect the present level of the variable. The table
below shows an example of what a lagged variable would look like in a spreadsheet of data for
three of the states, Alabama, Alaska, and Arizona.

state year fatalityrate lagged_fatality
AL 1983 0.029969 .
AL 1984 0.028276 0.029969
AL 1985 0.025135 0.028276
AL 1986 0.031791 0.025135
AL 1987 0.029685 0.031791
AK 1983 0.044669 .
AK 1984 0.037336 0.044669
AK 1985 0.033073 0.037336
AK 1986 0.0252 0.033073
AK 1987 0.019487 0.0252
AZ 1983 0.03442 .
AZ 1984 0.042158 0.03442
AZ 1985 0.041381 0.042158
AZ 1986 0.04443 0.041381
AZ 1987 0.029594 0.04443

So for the year 1984, Alaska's (AK) lagged fatality is equal to its fatality rate in 1983, 0.044669.
Notice that the lagged fatality rate for the year 1983 is missing for all three states. That is because we
don't have data on the fatality rates in the years before 1983, so we cannot create a lagged
value for the year 1983. The commands to create lagged_fatality in Stata are as follows:

1. sort state year

2. by state: gen lagged_fatality = fatality[_n-1]

In this case we have to use the “by state” command because we have panel data. If we didn’t
have panel data and we just had a time series for only one state, then the commands to create a
lagged variable would be

1. sort year

2. gen lagged_fatality = fatality[_n-1]

You can see examples of all these commands in the Section 11 do file on bcourses.
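
As an aside (an alternative to the commands above, not necessarily what the section do file does), once the panel is declared with xtset you can also build the lag with Stata's time-series operator L., which automatically leaves the first year missing within each state:

    * xtset needs a numeric panel identifier, so encode the state strings first.
    encode state, gen(state_id)
    xtset state_id year
    gen lagged_fatality = L.fatality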

2 Exercises
1. A study, published in 1993, used U.S. state panel data to investigate the relationship between
minimum wages and employment of teenagers. The sample period was 1977 to 1989 for all
50 states. The author estimated a model of the following type:

$\ln(E_{it}) = \beta_0 + \beta_1 \ln(M_{it}/W_{it}) + \gamma_2 S2_i + \ldots + \gamma_{50} S50_i + \delta_2 D2_t + \ldots + \delta_{13} D13_t + \varepsilon_{it},$

where E is the employment to population ratio of teenagers, M is the nominal minimum wage, and W is average hourly earnings in manufacturing. In addition, other explanatory variables, such as the adult unemployment rate, the teenage population share, and the teenage enrollment rate in school, were included.

(a) Name some of the factors that might be picked up by state and time fixed effects.
(b) The author decided to use eight regional dummy variables instead of the 49 state dummy
variables. What is the implicit assumption made by the author? Could you test for its
validity? How?
(c) The results, using time and region fixed effects only, were as follows:

$\widehat{\ln(E_{it})} = \underset{(0.036)}{-0.182} \times \ln(M_{it}/W_{it}) + \ldots, \qquad R^2 = 0.727$

Interpret the result briefly.


(d) State minimum wages do not often exceed the federal minimum wage. As a result, the
author decided to use the federal minimum wage in the specification above. How
does this change your interpretation? How is the original regression equation affected
by this?

2. A researcher investigates the determinants of crime in the United Kingdom using data for
42 police regions over 22 years. She estimates by OLS the following regression:

$\ln(cmrt_{it}) = \alpha_i + \lambda_t + \beta_1 unrtm_{it} + \beta_2 proyth_{it} + \beta_3 \ln(pp_{it}) + u_{it}$

for $i = 1, \ldots, 42$ and $t = 1, \ldots, 22$. The variable cmrt is the crime rate per head of population,
unrtm is the unemployment rate of males, proyth is the proportion of youths, and pp is the
probability of punishment measured by (number of convictions)/(number of crimes
reported). The $\alpha_i$ and $\lambda_t$ are region and year fixed effects, respectively.

(a) Why does the researcher need to exclude one of the year fixed effects?
(b) What are the terms $\alpha_i$ and $\lambda_t$ likely to pick up?
(c) Discuss the advantages of using panel data for this type of investigation.
(d) Estimation by OLS (using properly-corrected standard errors) results in the following
output:
$\widehat{\ln(cmrt_{it})} = \underset{(0.109)}{0.063} \times unrtm_{it} + \underset{(0.179)}{3.739} \times proyth_{it} - \underset{(0.024)}{0.588} \times \ln(pp_{it})$
where standard errors appear in parentheses, the coefficients of the various fixed effects
are not reported, and $R^2 = 0.904$. Comment on the results. In particular, what is the
effect of a ten percent increase in the probability of punishment?
(e) To test for the relevance of the region fixed effects, you restrict the regression by
dropping all entity fixed effects and adding a single constant. The relevant F-statistic
is 135.28. What are the degrees of freedom? What is the critical value from your F
table?
(f) Although the test rejects the hypothesis of eliminating the fixed effects from the regres-
sion, you want to analyze what happens to the coefficients and their standard errors
when the equation is re-estimated without fixed effects. In the resulting regression, $\hat{\beta}_2$
and $\hat{\beta}_3$ do not change by much, although their standard errors roughly double. However,
$\hat{\beta}_1$ is now +1.340 with a standard error of 0.234. Why do you think that is?

3. E10.2 Traffic crashes are the leading cause of death for Americans between the ages of 5
and 32. Through various spending policies, the federal government has encouraged states to
institute mandatory seat belt laws to reduce the number of fatalities and serious injuries. In
this exercise you will investigate how effective these laws are in increasing seat belt use and
reducing fatalities. On the textbook Web site you will find a data file Seatbelts that contains
a panel of data from 50 U.S. states plus the District of Columbia for the years 1983 through
1997. A detailed description is given in Seatbelts_Description, available on the Web site.

(a) Estimate the effect of seat belt use on fatalities by regressing FatalityRate on sb_useage,
speed65, speed70, ba08, drinkage21, ln(income), and age. Does the estimated regression
suggest that increased seat belt use reduces fatalities?
(b) Do the results change when you add state fixed effects? Provide an intuitive explanation
for why the results changed.
(c) Do the results change when you add time fixed effects plus state fixed effects?
(d) Which regression specification, (a), (b), or (c), is most reliable? Explain why.
(e) Using the result in (c), discuss the size of the coefficient on sb_useage. Is it large?
Small? How many lives would be saved if seat belt use increased from 52% to 90%?
(f) There are two ways that mandatory seat belt laws are enforced: “primary” enforcement
means that a police officer can stop a car and ticket the driver if the officer observes an
occupant not wearing a seat belt; “secondary” enforcement means that a police officer
can write a ticket if an occupant is not wearing a seat belt, but must have another reason
to stop the car. In the data set, primary is a binary variable for primary enforcement
and secondary is a binary variable for secondary enforcement. Run a regression of
sb_usage on primary, secondary, speed65, speed70, ba08, drinkage21, ln(income), and
age, including fixed state and time effects in the regression. Does primary enforcement
lead to more seat belt use? What about secondary enforcement?
(g) In 2000, New Jersey changed from secondary enforcement to primary enforcement.
Estimate the number of lives saved per year by making this change.

3 Solutions
1(a). Time effects will pick up the effect of omitted variables that are common to all 50 states
at a given point in time. Federal fiscal and monetary variables, exchange rate and U.S.
terms of trade movements, aggregate business cycle developments, etc., are candidates here.
State fixed effects will include variables that are characteristics of states that are expected
to remain constant over time (at least during the considered time frame) within a specific
state such as attitudes toward employment or labor force participation, state specific labor
market policies, industrial and labor force composition, etc.
1(b). The implicit assumption made by the author is that the coefficients on the state fixed effects are
identical within a region but differ between regions. Since these coefficients imply linear
restrictions, they can be tested using the F-test. This is equivalent to testing $S$ null hypotheses
of the kind $H_0: \gamma_{1s} = \cdots = \gamma_{N_s s}$, one for each of the $s = 1, \ldots, S$ regions, where $\gamma_{is}$ is the
fixed effect of state $i$ in region $s$, and $N_s$ is the number of states in that region.
1(c). Consider a ten percent increase in minimum wages, say from $5 to $5.50, with constant
average hourly earnings. This corresponds to a ten percent increase in relative minimum
wages. The resulting decrease in the employment to population ratio for teenagers is 1.8 percent, or
almost 2 percent. The regression explains roughly 73 percent of the variation of the employment
to population ratio of teenagers across time and states.
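
The calculation behind this interpretation, using the log approximation:

$\Delta \ln(E) \approx \hat{\beta}_1 \, \Delta \ln(M/W) = (-0.182) \times 0.10 = -0.0182 \approx -1.8\%$
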
1(d). This choice in effect drops the $i$ subscript from the minimum wage, since there is no variation
by state. The original equation then reads

$\ln(E_{it}) = \beta_0 + \beta_1 \ln(M_t/W_{it}) + \gamma_2 S2_i + \ldots + \gamma_{50} S50_i + \delta_2 D2_t + \ldots + \delta_{13} D13_t + \varepsilon_{it}.$

Furthermore, since the federal minimum wage is constant across the nine regions at a point
in time, its level is absorbed by the time effects. The coefficient on the relative minimum wage is now
identified through the variation in average hourly earnings in manufacturing across time
and regions.
2(a). Answer: Adding up all the region fixed effects will result in a variable that equals 1 for all
observations. This would result in the dummy variable trap if there was a constant in the
regression, but there isn’t one. However, adding up all the year fixed effects will also result in
a variable equal to 1 so it is like having a constant in the regression. Therefore, the entity and
time fixed effects taken together are perfectly multicollinear. If just one of them (either time
or region fixed effects) is dropped, then this will not be true. (An equivalent specification
would include a constant term, but then would have to drop one time fixed effect and one
region fixed effect to prevent perfect multicollinearity.)
2(b). Answer: The $\alpha_i$'s pick up omitted variables that are specific to police regions and do not vary
over time. Attitudes toward crime may vary between rural regions and metropolitan areas, and
these differences persist over time. They would be hard to capture through measurable
variables. The $\lambda_t$'s pick up factors that are common to all regions in a year but vary over
time. Common macroeconomic shocks (e.g., inflation, recession) that affect all regions equally
will be captured by the time fixed effects.
2(c). Answer: Although some variables that are time invariant or region invariant could be introduced
as regressors, the list of possible variables is long. By introducing fixed effects, all of
them are taken into account.

2(d). Answer: Given the positive coefficients on unrtm and proyth, a higher male unemployment
rate and a higher proportion of youths tend to increase the crime rate. The negative coefficient
on ln(pp) says that the crime rate falls with a higher probability of punishment. The coefficients
on proyth and ln(pp) are statistically significant, while that for unrtm is not. The regression
explains roughly 90 percent of the variation in crime rates in the sample. Because there is
a log-log specification of the relationship between the crime rate and the probability of punishment,
the coefficient on ln(pp) is the “punishment elasticity of the crime rate.” Therefore the
interpretation of the estimate of -0.588 is that a 10% increase in the probability of punishment
results in a reduction in the crime rate of 5.88%.
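
The elasticity calculation in one line:

$\%\Delta cmrt \approx \hat{\beta}_3 \times \%\Delta pp = (-0.588) \times 10\% = -5.88\%$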

2(e). Answer: Consider first the reasoning behind this test strategy. You might think that you
would just perform the usual test of the joint hypothesis that all 42 region fixed effects are
zero: $\alpha_i = 0$, $i = 1, 2, \ldots, 42$. But because you excluded the first time fixed effect, i.e., $\lambda_1 = 0$,
there would be no fixed effect for the first year. This is repaired by introducing a constant
term into the regression, in which case the remaining year fixed effects are interpreted as a
year's impact on the crime rate relative to the excluded case, i.e., year = 1. “Dropping” all
the region fixed effects is the same as saying that you restricted all of their coefficients to
zero. There are 42 of these coefficients, so it seems like imposing 42 restrictions. But, since
one should add the “full” constant term to the regression (which was previously restricted
to zero), one restriction is gained. On net, the test imposes $q = 42 - 1 = 41$ restrictions.
Thus we would perform the joint test using the F-statistic, which is distributed according
to $F_{q,\,n-k}$. Since there are $n = 42 \times 22 = 924$ observations, we can use the large-sample
approximation of this distribution, i.e., $F_{q,\infty} = F_{41,\infty}$. Look for the critical value in Table
4 in the Appendix. That table goes up to only 30 degrees of freedom, but since 135.28 >
1.70, which is the critical value with 30 degrees of freedom at the 1% significance level, 135.28
also exceeds the critical value for 41 degrees of freedom, and so we reject the null hypothesis that all
region fixed effects are zero.
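
The degrees-of-freedom bookkeeping, written out:

$q = 42 - 1 = 41, \qquad n = 42 \times 22 = 924, \qquad F \sim F_{q,\,n-k} \approx F_{41,\infty} \text{ under } H_0$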

2(f). Answer: Dropping all the region and year fixed effects results in an estimate of the coefficient
on the unemployment rate that is now significant, whereas before it was not. (The other two
regressors remain significant even after a doubling of their standard errors.) It appears
that region and year fixed effects were controlling for variation that is now picked up by
the unrtm variable. It is likely that different regions have different levels of unemployment
rates that persist over time due to the nature of the local economy (e.g., agriculture or
manufacturing based). Those differences were picked up by the region fixed effects in the
original specification. It's possible that there is a nationwide business cycle over this 22-year
period which is highly correlated with unemployment rates. That would have been picked
up by the year fixed effects in the original specification and is now captured by the common
time component of local unemployment rates.

