Panel Data Regression

Panel Regression

Brodjol Sutijo
Introduction

• A panel data set, or longitudinal data set, is one where there are repeated
observations on the same units.
• The units may be individuals, households, enterprises, countries, or any
set of entities that remain stable through time.
• The National Longitudinal Survey of Youth (NLSY) is an example. The
same respondents were interviewed every year from 1979 to 1994. Since
1994 they have been interviewed every two years.
• A balanced panel is one where every unit is surveyed in every time period.
The NLSY is unbalanced because some individuals have not been
interviewed in some years. Some could not be located, some refused, and
a few have died.
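
The balanced/unbalanced distinction can be checked mechanically. A minimal sketch, assuming the panel is stored as (unit, period) pairs; the helper `is_balanced` and the toy data are hypothetical and only illustrate the definition:

```python
from collections import defaultdict

def is_balanced(observations):
    """Return True if every unit is observed in every time period.

    `observations` is an iterable of (unit_id, period) pairs.
    """
    periods_by_unit = defaultdict(set)
    all_periods = set()
    for unit, period in observations:
        periods_by_unit[unit].add(period)
        all_periods.add(period)
    return all(p == all_periods for p in periods_by_unit.values())

# Hypothetical toy panel: n = 3 units, T = 3 periods, so nT = 9 observations.
balanced = [(i, t) for i in ("a", "b", "c") for t in (1979, 1980, 1981)]
# Drop one interview: unit "b" could not be located in 1980.
unbalanced = [obs for obs in balanced if obs != ("b", 1980)]

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```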
Introduction
• Panel data sets have several advantages over cross-section data sets:
• They may make it possible to overcome a problem of bias caused
by unobserved heterogeneity.
• They make it possible to investigate dynamics without relying on
retrospective questions that may yield data subject to measurement
error.
• They are often very large. If there are n units and T time periods,
the potential number of observations is nT.
• Because they tend to be expensive to undertake, they are often well
designed and have high response rates. The NLSY is an example.
Introduction

• We will start with an example of the use of panel data to investigate simple
dynamics. We will use data from the 1988 round of the NLSY for 1,538 males in
full-time employment.
• Here is the result of regressing the logarithm of hourly earnings on a
dummy variable for being married and a set of control variables (years of
schooling, ASVABC score, years of tenure and its square, years of work
experience and its square, etc.; coefficients not shown).

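$$\widehat{LGEARN} = \ldots + \underset{(0.024)}{0.129}\, MARRIED, \qquad R^2 = 0.271, \quad n = 1538$$

• Married males earn about 12.9 percent more than single males, other things
being equal, and the effect is highly significant (standard error 0.024, shown
in parentheses).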
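
As a rough check on the two claims above, the t ratio and the exact proportional effect implied by a log-earnings coefficient can be worked out directly. The t ratio uses the standard error 0.024 reported in the results table; the conversion $e^{b} - 1$ is the standard reading of a dummy coefficient in a semilogarithmic model and is not stated in the slides.

```latex
t = \frac{0.129}{0.024} \approx 5.4,
\qquad
e^{0.129} - 1 \approx 0.138
```

So 12.9 percent is the usual log approximation; the exact implied premium is closer to 13.8 percent.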
Introduction

NLSY 1988 data. Dependent variable: LGEARN (standard errors in parentheses)

              OLS        Fixed effects
MARRIED       0.129      0.163
              (0.024)    (0.028)
SOONMARR      -          0.096
                         (0.037)
SINGLE        -          -
R2            0.271      0.274
n             1538       1538

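• Two dummy variables: MARRIED (married by 1988) and SOONMARR (single in 1988
but married within the following four years). The omitted reference category
is SINGLE (still single four years later).
• If the alternative hypothesis is true (the premium reflects unobserved
characteristics of men who marry rather than an effect of marriage itself), the
coefficient of SOONMARR should be equal to that of MARRIED, but it is lower.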
Introduction

NLSY 1988 data. Dependent variable: LGEARN (standard errors in parentheses)

              OLS        Fixed effects   Fixed effects
MARRIED       0.129      0.163           -
              (0.024)    (0.028)
SOONMARR      -          0.096           -0.066
                         (0.037)         (0.034)
SINGLE        -          -               -0.163
                                         (0.028)
R2            0.271      0.274           0.274
n             1538       1538            1538

• To test whether it is significantly lower, the easiest method is to change the reference
category to those who were married by 1988 and to introduce a new dummy variable
SINGLE that is equal to 1 if the respondent was still single four years later.
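
With MARRIED as the reference category, the SOONMARR coefficient directly measures the gap between the two groups, so its t ratio is the test. A small sketch using only the figures in the table; the helper `t_ratio` is illustrative, and the comparison with a critical value of about 1.96 assumes a conventional large-sample two-sided test at the 5 percent level, which the slides do not specify:

```python
def t_ratio(coefficient, standard_error):
    """t ratio for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error

# SOONMARR in the third column (reference category: married by 1988).
t_soonmarr = t_ratio(-0.066, 0.034)
print(round(t_soonmarr, 2))  # -1.94

# Consistency check on the change of reference category: the new SOONMARR
# coefficient should equal the old SOONMARR coefficient minus the old
# MARRIED coefficient, up to rounding in the reported figures.
implied_gap = 0.096 - 0.163
print(round(implied_gap, 3))  # -0.067, against the reported -0.066
```

Under that assumed critical value the ratio falls just short of significance, so the evidence that the SOONMARR premium is lower than the MARRIED premium is borderline.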
Introduction
• Regression analysis with panel data:

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \sum_{p=1}^{s} \gamma_p Z_{pi} + \delta t + \varepsilon_{it}$$

• where the $X_j$ variables are observed and the $Z_p$ variables are unobserved
(unobserved heterogeneity).
• The index i refers to the unit of observation, t refers to the time period, and j and p
are used to differentiate between different observed and unobserved explanatory
variables.
• Note that the unobserved heterogeneity is assumed to be unchanging and
accordingly the $Z_p$ variables do not have a time subscript.
Introduction

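• Panel data model, with the unobserved effect $\alpha_i$:

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \sum_{p=1}^{s} \gamma_p Z_{pi} + \delta t + \varepsilon_{it}, \qquad \alpha_i = \sum_{p=1}^{s} \gamma_p Z_{pi}$$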

• Because the $Z_p$ variables are unobserved, there is no means of obtaining information
about the $\sum_p \gamma_p Z_{pi}$ component of the model, and it is convenient to define a term $\alpha_i$,
known as the unobserved effect, representing the joint impact of the $Z_p$ variables on $Y_{it}$.

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \alpha_i + \delta t + \varepsilon_{it}$$

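• If the $X_j$ variables capture all the relevant characteristics of the individual, there
is no relevant unobserved heterogeneity. In that case the $\alpha_i$ term may be dropped
and pooled OLS may be used to fit the model, treating all the observations for all of
the time periods as a single sample.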
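
What goes wrong when the unobserved effect is dropped although it matters can be sketched with a toy panel. The data below are hypothetical and noise-free, with the unobserved effect deliberately correlated with X and no time trend (delta = 0); `ols_slope` is an illustrative helper, not a library function:

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical panel: 3 units, 3 periods, true slope 0.5, intercept 1.
beta = 0.5
alpha = {0: 0.0, 1: 3.0, 2: 6.0}  # unobserved effects, rising with X
x_panel = {0: [0.0, 1.0, 2.0], 1: [2.0, 3.0, 4.0], 2: [4.0, 5.0, 6.0]}

# Pool all observations into a single sample, ignoring the unobserved effect.
x_pooled, y_pooled = [], []
for i, xs in x_panel.items():
    for x in xs:
        x_pooled.append(x)
        y_pooled.append(1.0 + beta * x + alpha[i])

pooled_beta = ols_slope(x_pooled, y_pooled)
print(pooled_beta)  # far above the true 0.5: X picks up the omitted effect
```

In this constructed case the pooled slope absorbs the correlation between X and the omitted effect, which is the unobserved heterogeneity bias that panel methods are meant to remove.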
Fixed Effects
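$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \alpha_i + \delta t + \varepsilon_{it}$$

$$\bar{Y}_i = \beta_1 + \sum_{j=2}^{k} \beta_j \bar{X}_{ji} + \alpha_i + \delta \bar{t} + \bar{\varepsilon}_i$$

$$Y_{it} - \bar{Y}_i = \sum_{j=2}^{k} \beta_j (X_{jit} - \bar{X}_{ji}) + \delta (t - \bar{t}) + \varepsilon_{it} - \bar{\varepsilon}_i$$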

• The last specification is known as the 'within-groups' method because the model explains the
variations about the mean of the dependent variable in terms of the variations about the
means of the explanatory variables for the group of observations relating to a given
individual.
• Subtracting the individual means eliminates the unobserved effect $\alpha_i$. The intercept
$\beta_1$ and any X variable that remains constant for each individual also drop out of the model.
• A drawback of the fixed effects approach is that the variables are likely to have much smaller
variances than in the original specification, because they are now measured as deviations from
the individual mean rather than as absolute amounts. This tends to reduce the precision of
the coefficient estimates.
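
The within-groups transformation can be sketched for a single regressor. The panel is hypothetical and noise-free, with an unobserved effect that rises with X and no time trend (delta = 0); `within_slope` is an illustrative helper, not a library function:

```python
def within_slope(x_panel, y_panel):
    """Within-groups (fixed effects) slope for a single regressor.

    Each variable is measured as a deviation from its individual mean,
    then the deviations for all units are pooled in one regression.
    No intercept is needed because the deviations have mean zero.
    """
    sxy = 0.0
    sxx = 0.0
    for i in x_panel:
        xs, ys = x_panel[i], y_panel[i]
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx

# Hypothetical panel: true slope 0.5, intercept 1, unit effects 0, 3, 6.
beta = 0.5
alpha = {0: 0.0, 1: 3.0, 2: 6.0}
x_panel = {0: [0.0, 1.0, 2.0], 1: [2.0, 3.0, 4.0], 2: [4.0, 5.0, 6.0]}
y_panel = {i: [1.0 + beta * x + alpha[i] for x in xs]
           for i, xs in x_panel.items()}

fe_beta = within_slope(x_panel, y_panel)
print(fe_beta)  # recovers the true slope; the unit effects cancel out
```

The individual means absorb both the intercept and the unit effects, which is also why a regressor that never changes within a unit would have zero deviations and could not be estimated this way.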
Fixed Effects: First Differences

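$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \alpha_i + \delta t + \varepsilon_{it}$$

$$Y_{it-1} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit-1} + \alpha_i + \delta (t - 1) + \varepsilon_{it-1}$$

$$Y_{it} - Y_{it-1} = \sum_{j=2}^{k} \beta_j (X_{jit} - X_{jit-1}) + \delta + \varepsilon_{it} - \varepsilon_{it-1}$$

$$\Delta Y_{it} = \sum_{j=2}^{k} \beta_j \Delta X_{jit} + \delta + \varepsilon_{it} - \varepsilon_{it-1}$$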

• Note that the error term is now $(\varepsilon_{it} - \varepsilon_{it-1})$. Thus the differencing gives rise to
moving average autocorrelation if $\varepsilon_{it}$ satisfies the regression model assumptions.
• However, if $\varepsilon_{it}$ is subject to AR(1) autocorrelation and $\rho$ is close to 1, taking first
differences may approximately solve the problem.
$$\varepsilon_{it} = \rho \varepsilon_{it-1} + v_{it}$$

$$\varepsilon_{it} - \varepsilon_{it-1} = v_{it} - (1 - \rho) \varepsilon_{it-1} \approx v_{it}$$
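
The moving average autocorrelation mentioned in the first note can be made explicit. A short derivation, assuming the $\varepsilon_{it}$ are serially uncorrelated with constant variance $\sigma_\varepsilon^2$ (the symbols $\sigma_\varepsilon^2$ and $\Delta\varepsilon_{it}$ are introduced here for illustration and do not appear in the slides):

```latex
\Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{it-1},
\qquad
\operatorname{Var}(\Delta\varepsilon_{it}) = 2\sigma_\varepsilon^2,
\\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-1})
  = \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{it-1},\;
                       \varepsilon_{it-1} - \varepsilon_{it-2})
  = -\sigma_\varepsilon^2
```

Successive differenced errors share $\varepsilon_{it-1}$ with opposite signs, which gives a first-order correlation of $-1/2$ and no correlation at longer lags.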
