A Practical Guide to Using Panel Data

Author: Simonetta Longhi, Alita Nandi

Pub. Date: 2017
Methods: Survey research, Panel data, Sampling
Keywords: surveying, households, unemployment, waves
Why Panel Surveys?


Different types of data are available for analysis. This chapter is a very brief introduction to longitudinal data
and discusses their advantages over cross-section data.

What are Longitudinal Data?

Cross-section data consist of one set of observations for each unit of observation. They give us a snapshot of
the population of interest at a particular point in time. In contrast, longitudinal data represent multiple snap-
shots of the same units of observation. Units of observation may be individuals, households, firms, schools,
countries, and so on. While cross-section data may show who is poor, unemployed or in poor health at any
point in time, longitudinal data also show whether people move into and out of poverty, how often people are
unemployed, whether poor health follows periods of unemployment or vice versa.

Longitudinal data may be collected from one single interview by asking people about both their current and
their previous situation. For example, people may be asked about the characteristics of their current job but
also about their previous jobs and spells of unemployment and inactivity, or they may be asked about their
current marriage but also their previous marriages and spells of cohabitation. These are often referred to as
retrospective data. Alternatively the data may be collected via multiple interviews asking about the current sit-
uation (for example, marriage or employment) at the time of the interview. In this case we have repeated ob-
servations for each unit of observation and these successive interviews are often called ‘waves’, ‘sweeps’ or
‘rounds’ (of interviews). Longitudinal data collected using these prospective methods are referred to as panel
data. The datasets discussed in this book fall into this category although they may include some retrospective
data as well. In most of the surveys discussed in this book, households are randomly selected at a point in
time and all household members are interviewed at that time and at regular intervals after that. We discuss
different aspects of panel surveys in detail in Chapter 2.

Some surveys do not interview the same set of people or households at each point in time, but sample new
ones at each wave. In this case we do not have a longitudinal dataset – of the same people – but pooled
cross-sections. These are also known as repeated cross-sections; one example is the Family Resources Sur-
vey, where a sample of approximately 25,000 UK households is selected and interviewed each year. While
these datasets do not qualify as longitudinal data, if we are interested in group averages, say regional av-
erage pay, we can use these surveys to compute average pay of all people living (or working) in the same
region and construct longitudinal data of regions, thus creating a macro or pseudo panel.
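
As a minimal sketch of how a pseudo panel can be built from repeated cross-sections, the Python code below averages pay within each region and year. The records, field layout and the function name build_pseudo_panel are hypothetical and chosen only for illustration; they do not come from the Family Resources Survey.

```python
from collections import defaultdict

# Hypothetical repeated cross-sections: different people are sampled each year.
# Fields: (year, person_id, region, weekly_pay). All values are invented.
cross_sections = [
    (2014, "a1", "North", 400.0),
    (2014, "a2", "North", 500.0),
    (2014, "a3", "South", 600.0),
    (2015, "b1", "North", 420.0),
    (2015, "b2", "South", 640.0),
    (2015, "b3", "South", 660.0),
]


def build_pseudo_panel(rows):
    """Average pay by (region, year), so that regions become the units of observation."""
    totals = defaultdict(lambda: [0.0, 0])  # (region, year) -> [sum of pay, count]
    for year, _person_id, region, pay in rows:
        cell = totals[(region, year)]
        cell[0] += pay
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


pseudo_panel = build_pseudo_panel(cross_sections)
for (region, year), mean_pay in sorted(pseudo_panel.items()):
    print(region, year, mean_pay)
```

No individual appears twice in the input, yet each region is observed in both years, which is what makes the result a panel of regions rather than of people.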

Another common type of longitudinal data consists of macro or pseudo panels. Macro panels consist of ag-
gregates such as unemployment rate, inflation, and so on, observed over time for a certain number of areas.
In this case the cross-sectional part of the panel is not made of people, but may be made of, for example,
countries or regions. Typically, in these types of panel datasets the number of units of observation is much
lower than in the case of individual or household panels, where we often have thousands of people or households.

Although there are commonalities, econometric techniques used for macro panels can be rather different from
those used for individual and household panels. For example, individual and random effects, discussed in
Part II of this book, have a different interpretation when they refer to regions rather than individuals. Tech-
niques such as spatial econometrics are applied to macro panels, but not to individual and household panels.
It is also worth noting that while individual and household surveys generally include only a sample of the pop-
ulation of interest, macro panels tend to include the whole universe of areas the researcher is interested in.
In this book we focus on individual and household data and their related econometric techniques.

Advantages of Longitudinal versus Cross-Section Data

Compared with longitudinal data, cross-section data are quite common and perhaps easier to deal with.
However, since they represent a snapshot of a population at a specific point in time, the type of econometric
analysis that they allow is relatively limited. These types of data rarely allow analyses of transitions or
changes over time.

The first advantage of longitudinal data is that repeated observations for the same individual also allow us to
use econometric techniques such as fixed and random effects methods. These methods allow us to control
for certain types of individual-specific time-invariant factors that are not observed in the dataset (often referred
to as individual unobserved heterogeneity). For example, we may observe that those people who change
their residence are more likely to earn higher wages than those who do not move. One reason why some
people earn higher wages may be that they have higher levels of motivation. Those who have higher levels
of motivation may also be the ones who are more likely to change place of residence. So, if we observe that
those who move earn more than those who do not, does that mean that by changing the place of residence
of individuals we can increase their earnings? No: since motivation levels are not observed in the data and
we cannot directly include them in the models, the correlation between wages and the probability of changing
residence may simply reflect differences in wages between the more and the less motivated. If these unobserved
characteristics do not change over time, panel data methods such as the fixed or random effects method can
be used to better identify such causal effects. These and other panel data methods are described in Part II of
this book.
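
The motivation example can be illustrated with a small sketch, written here in Python. It is not the estimation code used in this book: the data are invented, wages are constructed under the assumption that moving raises pay by 2 and that motivation is constant within each person, and the helper names ols_slope and demean are hypothetical. The sketch compares a pooled regression with a simple within (fixed effects) estimator that subtracts each person's own mean before regressing.

```python
# Hypothetical two-wave panel (all numbers invented for illustration).
# Wages were constructed as: wage = 10 + motivation + 2 * moved,
# where motivation is constant within a person and unobserved by the analyst.
panel = {
    "A": {"moved": [0, 0], "wage": [10.0, 10.0]},  # motivation 0, never moves
    "B": {"moved": [0, 0], "wage": [10.0, 10.0]},  # motivation 0, never moves
    "C": {"moved": [0, 1], "wage": [15.0, 17.0]},  # motivation 5, moves by the second wave
    "D": {"moved": [0, 1], "wage": [15.0, 17.0]},  # motivation 5, moves by the second wave
}


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def demean(values):
    """Subtract a person's own mean, which removes anything constant over time."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Pooled regression: ignores that observations belong to the same person.
pooled_x = [m for person in panel.values() for m in person["moved"]]
pooled_y = [w for person in panel.values() for w in person["wage"]]
pooled_slope = ols_slope(pooled_x, pooled_y)

# Within estimator: regress person-demeaned wages on person-demeaned moves.
within_x = [d for person in panel.values() for d in demean(person["moved"])]
within_y = [d for person in panel.values() for d in demean(person["wage"])]
within_slope = ols_slope(within_x, within_y)

print(round(pooled_slope, 2))  # 5.33
print(round(within_slope, 2))  # 2.0
```

In these constructed data the pooled slope mixes the effect of moving with the wage advantage of the more motivated, while the within estimator recovers the assumed effect of 2 because motivation does not change over time.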

The second advantage of having repeated observations is that they allow a better study of dynamics. For ex-
ample, if we want to analyse the correlation between bad health and unemployment, observing people over
time allows us to see whether bad health tends to appear before unemployment or after it. The direction of
causation (whether unemployment leads to bad health or bad health leads to unemployment) is likely to be
clearer when longitudinal data are available.

The key advantage here is the possibility to measure change. For example, we may know that the unem-
ployment rate in a certain country has been 6% for the last four years. However, this figure does not tell us
if it is the same few people who are unemployed over a long period of time, or if there are many people who
transition into and out of unemployment and experience short spells of unemployment. While it is possible to
study the macro-level phenomenon with repeated cross-sections (6% unemployment rate for four years), lon-
gitudinal data are necessary to analyse transitions into and out of unemployment. More generally, duration or
survival analysis methods which investigate what drives staying in a particular state such as unemployment
or poverty and what determines transitions out of these states can only be used with longitudinal data. These
methods are described in Part III of this book.
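
The point about measuring change can be sketched in Python with two invented populations of ten people observed over four waves. Both have the same unemployment rate in every wave (20% here, chosen only to keep the example small), but very different individual histories. The data and the function names are hypothetical.

```python
# Employment status per person and wave: "U" = unemployed, "E" = employed.
# Both populations are invented for illustration.

# Persistent: the same two people are unemployed in all four waves.
persistent = {i: (["U"] * 4 if i < 2 else ["E"] * 4) for i in range(10)}

# Churning: a different pair of people is unemployed in each wave.
churning = {i: ["U" if i // 2 == wave else "E" for wave in range(4)] for i in range(10)}


def unemployment_rate(panel, wave):
    """Cross-sectional unemployment rate in a given wave."""
    return sum(states[wave] == "U" for states in panel.values()) / len(panel)


def count_transitions(panel, origin, destination):
    """Number of moves from one state to another between consecutive waves."""
    return sum(
        1
        for states in panel.values()
        for previous, current in zip(states, states[1:])
        if previous == origin and current == destination
    )


def ever_unemployed(panel):
    """Number of people unemployed in at least one wave."""
    return sum("U" in states for states in panel.values())


for name, panel in [("persistent", persistent), ("churning", churning)]:
    rates = [unemployment_rate(panel, wave) for wave in range(4)]
    print(name, rates, count_transitions(panel, "E", "U"), ever_unemployed(panel))
```

Repeated cross-sections would reveal only the identical wave-by-wave rates; the counts of transitions and of people ever unemployed require following the same individuals over time.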

Missing Data, Balanced and Unbalanced Panels

In panel surveys, sometimes it is not possible to obtain further interviews with all individuals, households or
firms who were interviewed in the first wave. The different reasons for non-interviews are discussed in Chapter 7.
We have ‘wave non-response’ if some non-interviews are followed by interviews in successive waves,
and ‘panel attrition’ if the units of observation drop out of the survey permanently. Together these are referred
to as unit non-response.
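
A rough Python sketch of how the two forms of unit non-response might be told apart from a record of interview outcomes follows. The function name and the input layout (one True/False flag per wave, with at least one interview) are assumptions made for illustration. Note that within a finite observation window a unit that has stopped responding can only be assumed, not known, to have dropped out permanently.

```python
def classify_unit_nonresponse(responded):
    """Return (wave_nonresponse, attrition) for one unit.

    responded: list of booleans, one per wave, True if interviewed.
    Assumes the unit was interviewed at least once.
    """
    last_interview = max(i for i, r in enumerate(responded) if r)
    # A missed wave that is followed by a later interview is wave non-response.
    wave_nonresponse = not all(responded[: last_interview + 1])
    # No interview from some wave up to the end of the window is treated as attrition.
    attrition = last_interview < len(responded) - 1
    return wave_nonresponse, attrition


# Hypothetical response patterns over four waves.
patterns = {
    "always interviewed": [True, True, True, True],
    "missed wave 2 only": [True, False, True, True],
    "dropped out after wave 2": [True, True, False, False],
    "missed wave 2, then dropped out": [True, False, True, False],
}

for label, responded in patterns.items():
    print(label, classify_unit_nonresponse(responded))
```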

Even those interviewed in every wave may not answer all questions in every interview, resulting in ‘item
non-response’. Non-response and attrition have serious implications for analysis, and extensive efforts are
made to minimise non-response and attrition in surveys. But despite these efforts non-response and attrition
are present in almost all surveys and analysts often use statistical methods to minimise these problems.

Unit and item non-response also result in the data being unbalanced: that is, not every unit is observed
at every interview wave. For example, if we have a panel of 100 people interviewed for three years, we
should have three interviews for each of the 100 persons. However, this may not happen because of attrition
and non-response. We can convert an unbalanced into a balanced panel by dropping all people who have
missed at least one interview; this, however, would reduce the sample size, sometimes considerably. Lucki-
ly, most estimation techniques and estimation commands work effectively for both balanced and unbalanced
data and researchers often use panels that are unbalanced (see, for example, Baltagi 2009).
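
Converting an unbalanced panel into a balanced one can be sketched in a few lines of Python. The records and the function name balance_panel are hypothetical; each record is a (person, wave) pair indicating that an interview took place.

```python
from collections import defaultdict

# Hypothetical unbalanced panel over three waves: one (person, wave) pair per interview.
records = [
    ("p1", 1), ("p1", 2), ("p1", 3),
    ("p2", 1), ("p2", 3),             # missed wave 2
    ("p3", 1), ("p3", 2), ("p3", 3),
    ("p4", 1),                        # dropped out after wave 1
]


def balance_panel(rows, waves):
    """Keep only the people who were interviewed in every wave."""
    waves_seen = defaultdict(set)
    for person, wave in rows:
        waves_seen[person].add(wave)
    complete = {person for person, seen in waves_seen.items() if seen >= set(waves)}
    return [row for row in rows if row[0] in complete]


balanced = balance_panel(records, waves=[1, 2, 3])
print(len(records), len(balanced))  # 9 6
```

Here balancing discards a third of the interviews, including valid ones from people who missed only one wave, which illustrates why researchers often prefer to work with the unbalanced panel.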

Summary and Suggestions for Further Reading

For those who are unfamiliar with longitudinal data, in this chapter we have very briefly discussed the main
differences between longitudinal and cross-section data, and the advantages of being able to use longitudinal
data. Although this book focuses on longitudinal and panel data, data management techniques such as those
discussed in Chapters 3, 5 and 6, and econometric estimations such as those discussed in Chapters 7 and 8,
are relevant also for the analysis of cross-section data.

Key Points

• Cross-section data provide only one observation at a particular point in time per individual/household/
firm, while longitudinal data provide multiple observations at different points in time per individual/
household/firm (either collected once or in repeated interviews).
• Longitudinal data allow us to analyse changes, transitions, temporal order of events, and persistence
in particular states. More importantly, longitudinal data allow us to control for the effects of individual-
specific time-invariant unobserved factors. None of this is possible with cross-section data.
• Data may not be available for all units of observation at all points in time when the data were col-
lected, either because of unit non-response (wave non-response and attrition) or because of item
non-response (specific questions not answered). This results in unbalanced panels. Estimation tech-
niques for unbalanced panels are the same as those for balanced panels. However, non-response
also has serious consequences for population estimates based on these data. Estimation methods
are available to address these issues.

Suggestions for Further Reading

• For a more detailed discussion on the advantages and disadvantages of panel data see:
  • Chapter 1 of Baltagi, B.H. (2009) Econometric Analysis of Panel Data. London, Wiley.
  • Chapter 1 of Hsiao, C. (2003) Analysis of Panel Data. Cambridge, Cambridge University Press.
• For a discussion of different types of longitudinal data see Chapter 1 of Taris, T.W. (2000) A Primer
in Longitudinal Data Analysis. London, Sage.
• For a discussion of survey non-response see Groves, R.M., Dillman, D.A., Eltinge, J.L. and Little,
R.J.A. (2001) Survey Non-response. New York, Wiley.

