01-Intro - Ekonometrika Terapan

Applied Econometrics: An introduction

Muhammad Halley Yudhistira

Department of Economics, Universitas Indonesia

August 2019

1 Introduction

2 Standard Procedure in Research

3 Econometrics and Causality

4 Random assignment

5 What’s next

Introduction to Our Course

• This is an introductory class in applied econometrics. I hope you still
remember your math and statistics classes from matriculation.
• Still, I hope this class leaves you with at least one good memory.
• Grading consists of:
• Paper (20)
• Tutor (10)
• Midterm (35)
• Final (35)

Software and Textbooks

• I still have to discuss this with your TA, but we will most probably
use Stata. You may use any version of EViews and Stata.
• Textbooks:
• Brooks, Chris. Introductory Econometrics for Finance, 2nd edition.
Cambridge University Press. (CB)
• Halcoussis, Dennis. Understanding Econometrics, 1st edition.
South-Western. (DH)
• Wooldridge, Jeffrey M. Introductory Econometrics: A Modern
Approach, 5th edition. South-Western Cengage Learning. (JW)
• Verbeek, Marno. A Guide to Modern Econometrics, 4th edition. John
Wiley. (MV)
• Angrist, Joshua and Jörn-Steffen Pischke. Mostly Harmless
Econometrics. (JJ-MHE)
• Angrist, Joshua and Jörn-Steffen Pischke. Mastering Metrics. (JJ-MM)

What we will cover

• This class aims to (hopefully) make you familiar with regression as
one of the empirical tools in economics.
• We will cover:
• Ordinary least squares (OLS)
• Limited dependent variable models
• Panel data
• Introduction to time series
• We will try to keep it as applied as possible.

Standard Procedure in Research

Data Analysis in Research

Figure: Data analysis process


• Population vs sample
• In most cases, we cannot obtain
population data. The most we can
do is draw some observations
from the whole population and
analyze that sample.
• Careful sampling gives us the
ability to infer the behavior of
the population.
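
The population-versus-sample idea can be sketched with a small simulation. Everything below is made up for illustration: the population is 100,000 hypothetical incomes drawn from a normal distribution, and the sample is a simple random sample of 1,000 of them. The point is only that a carefully (here, randomly) drawn sample has a mean close to the population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: income (arbitrary units) of 100,000 people.
population = [random.gauss(50, 10) for _ in range(100_000)]

# Simple random sample of 1,000 people, drawn without replacement.
sample = random.sample(population, 1_000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In practice we only ever see the second number; inferential statistics is about how much it tells us about the first.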

“Cooking” the Data

• Once you get your data, what kind of “recipe” do you want to follow?
• Descriptive statistics: collecting, presenting, and describing the data
• Inferential statistics: drawing conclusions about population behavior
from the behavior of our sample

Types of Data

We may categorize our dataset into three types according to the period:
• Time series: a sequence of data points recorded over a time interval.
• Cross-section: data collected by observing many subjects (such as
individuals, countries, or regions) at the same point in time. Ex:
census data
• Pooled data: a combination of time-series and cross-section data. Ex:
annual GDP data for all ASEAN countries
Source of data:
• Primary data: data collected first-hand by the user directly from
the source. Ex: a field survey on perception
• Secondary data: data collected by someone other than the user. Ex:
data from BPS

Data Presentation

Econometrics and Causality

Why Econometrics

• Descriptive analysis using tables and graphs is not enough on its own;
it serves a limited purpose.
• Further techniques enable us to express the relationship between
two (or more) variables in the form of a specific function.
• For example, how do we analyze the relationship between price and
quantity demanded in our usual demand function?
• Econometric techniques will help us. Econometrics uses statistical
tests to tackle various questions, such as:
• How well or badly does the model describe the observed data?
• Does another available model describe the observed data any better?
• In any model, how large is the estimated effect of one variable on
another, and how reliable is the estimate?
• How far into the future, and with what degree of reliability, can the
model predict any variable of interest?
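
As a rough illustration of "a relationship in the form of a specific function", the sketch below fits a straight line to price and quantity data by least squares. The data are simulated under an assumed demand curve (quantity = 100 - 2 × price + noise); the numbers and the helper names `ols_simple` and `r_squared` are hypothetical and chosen only for this example. The slope answers "how large is the effect", and R-squared is one simple answer to "how well does the model describe the data".

```python
import random
import statistics

random.seed(1)

# Hypothetical data: quantity = 100 - 2 * price + noise (made-up numbers).
prices = [random.uniform(5, 20) for _ in range(500)]
quantities = [100 - 2 * p + random.gauss(0, 3) for p in prices]

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    y_bar = statistics.fmean(y)
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

intercept, slope = ols_simple(prices, quantities)
fit = r_squared(prices, quantities, intercept, slope)

print(f"estimated demand: quantity = {intercept:.1f} + ({slope:.2f}) * price")
print(f"R-squared: {fit:.2f}")
```

With real market data, price and quantity are determined jointly, so a fitted line like this describes an association and is not automatically a demand curve.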

Correlation vs Causation
• The organization of the regression equation often leads people to
assume the explanatory variables cause the dependent variable, but
this interpretation is not necessarily correct.
• Correlation does not prove causation. If two variables, A and B, are
correlated, then it could be that:
• A causes B, or vice versa
• Both A and B are caused by some other event
• The correlation is due to random chance
• Studenmund (2017): “Don’t be deceived by the words dependent and
independent, however. Although many economic relationships are
causal by their very nature, a regression result, no matter how
statistically significant, cannot prove causality. All regression analysis
can do is test whether a significant quantitative relationship exists.
Judgments as to causality must also include a healthy dose of
economic theory and common sense.”
• Let’s watch the talk.
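
The case "both A and B are caused by some other event" can be made concrete with a short simulation. In this sketch, which uses invented data, a common cause C drives both A and B, and A has no effect on B at all. The two are still clearly correlated, and the correlation disappears once the part due to C is removed.

```python
import random
import statistics

random.seed(2)

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((p - x_bar) * (q - y_bar) for p, q in zip(x, y))
    sxx = sum((p - x_bar) ** 2 for p in x)
    syy = sum((q - y_bar) ** 2 for q in y)
    return sxy / (sxx * syy) ** 0.5

n = 5_000
c = [random.gauss(0, 1) for _ in range(n)]   # common cause C
a = [ci + random.gauss(0, 1) for ci in c]    # A depends only on C
b = [ci + random.gauss(0, 1) for ci in c]    # B depends only on C

corr_ab = correlation(a, b)

# Remove the part of A and B that comes from C (known here by construction).
a_net = [ai - ci for ai, ci in zip(a, c)]
b_net = [bi - ci for bi, ci in zip(b, c)]
corr_net = correlation(a_net, b_net)

print(f"corr(A, B)             = {corr_ab:.2f}")
print(f"corr(A, B) net of C    = {corr_net:.2f}")
```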

Bringing Causality

• In recent applied econometrics, researchers are preoccupied with
establishing causality. “Does A cause B?” has become a mainstream question.
• Does a social assistance program (e.g., PKH) improve welfare?
• Is the Trans-Java highway beneficial for household welfare?
• Does the odd-even policy reduce traffic congestion?
• Assume you are the governor of Jakarta and aim to evaluate the
effect of KJP on students’ UAS results. How do you quantify the effect?

Challenges: How to build a correct “counterfactual”

• Consider the following example. Two new students are admitted to
MPKP and offered an MPKP-customized health insurance plan by Pak
Triman. One student decides to join the program and the other
does not. As an SPS, you try to evaluate the effect of the program.

                                                   Khuzdar   Maria
Potential outcome without insurance: $Y_{0i}$         3        5
Potential outcome with insurance: $Y_{1i}$            4        5
Treatment (insurance status chosen): $D_i$            1        0
Actual health outcome: $Y_i$                          4        5
Treatment effect: (XX)                                XX       XX

Challenges: How to build a correct “counterfactual” (2)

• The causal effect of the health insurance on individual $i$ is
$Y_{1i} - Y_{0i}$. Here the effect is nonzero only for Khuzdar.
• If we have a group of $n$ people, the average causal effect is
$\mathrm{Avg}_n[Y_{1i} - Y_{0i}]$, where

$$\mathrm{Avg}_n[Y_{1i} - Y_{0i}] = \frac{1}{n}\sum_{i=1}^{n}[Y_{1i} - Y_{0i}] = \frac{1}{n}\sum_{i=1}^{n} Y_{1i} - \frac{1}{n}\sum_{i=1}^{n} Y_{0i} \tag{1}$$

                                                   Khuzdar   Maria
Potential outcome without insurance: $Y_{0i}$         3        5
Potential outcome with insurance: $Y_{1i}$            4        5
Treatment (insurance status chosen): $D_i$            1        0
Actual health outcome: $Y_i$                          4        5
Treatment effect: $Y_{1i} - Y_{0i}$                   1        0
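
The table can be reproduced in a few lines of code. This is only a sketch of the bookkeeping: the dictionary layout and the helper names `treatment_effect` and `observed_outcome` are choices made for this example, while the numbers come from the table. Note that in real data only one of the two potential outcomes is ever observed for each person.

```python
# Potential outcomes from the table (y0: without insurance, y1: with insurance).
students = {
    "Khuzdar": {"y0": 3, "y1": 4, "d": 1},
    "Maria": {"y0": 5, "y1": 5, "d": 0},
}

def treatment_effect(s):
    """Individual causal effect Y1i - Y0i."""
    return s["y1"] - s["y0"]

def observed_outcome(s):
    """Actual outcome: Y1i if insured (d = 1), otherwise Y0i."""
    return s["y1"] if s["d"] == 1 else s["y0"]

effects = {name: treatment_effect(s) for name, s in students.items()}
observed = {name: observed_outcome(s) for name, s in students.items()}

# Equation (1) with n = 2: the average of the individual effects.
average_effect = sum(effects.values()) / len(effects)

print(effects)         # individual effects
print(observed)        # what we actually see
print(average_effect)  # average causal effect
```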

Challenges: How to build a correct “counterfactual” (3)

• What do we see in the real world?
• The actual health outcomes of both students after the health insurance decision
• It is tempting to take the difference between the observed health outcomes of
Khuzdar and Maria as the causal effect ($Y_K - Y_M = Y_{1K} - Y_{0M} = -1$).
• This leads to a misleading conclusion and, further, to misleading policy implications.
• Mistakes in choosing the counterfactual are common in
causal analysis. The key: comparability

Why Is It Misleading?

• Let’s look more closely at our misleading result. We may rewrite it as:

$$\begin{aligned}
Y_K - Y_M &= Y_{1K} - Y_{0M} \\
&= (Y_{1K} - Y_{0K}) + (Y_{0K} - Y_{0M}) \\
&= 1 + (-2)
\end{aligned} \tag{2}$$

• The causal effect is masked by the difference in initial health status, which
affects each student’s decision to join the program. This is what we call selection bias.
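
The decomposition in Equation (2) can be checked with the numbers from the two-student example. The variable names are illustrative; the values are the potential outcomes given in the table.

```python
# Potential outcomes taken from the two-student table.
y0_khuzdar, y1_khuzdar = 3, 4  # Khuzdar: without / with insurance
y0_maria = 5                   # Maria stays uninsured, so we observe her Y0

# What we observe: Khuzdar is insured, Maria is not.
naive_difference = y1_khuzdar - y0_maria

# Equation (2): causal effect on Khuzdar plus the gap in baseline health.
causal_effect = y1_khuzdar - y0_khuzdar
selection_bias = y0_khuzdar - y0_maria

print(naive_difference, "=", causal_effect, "+", selection_bias)
# prints: -1 = 1 + -2
```

The naive comparison has the wrong sign because the baseline gap (-2) is larger in size than the true effect (+1).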

Back to counterfactual
• Now assume that more than two people join MPKP; some take up
the health insurance and others skip it. You attempt to evaluate the
effect on health status $Y_i$.
• Let $D_i = 1$ if individual $i$ is insured and $D_i = 0$ otherwise.
$\mathrm{Avg}_n[Y_i \mid D_i = 1]$ is the average health status among the insured, while
$\mathrm{Avg}_n[Y_i \mid D_i = 0]$ is the average among the uninsured.
• What we want to know (Why?)

$$\mathrm{Avg}_n[Y_{1i} \mid D_i = 1] - \mathrm{Avg}_n[Y_{0i} \mid D_i = 1] \tag{3}$$

• Unfortunately, what we know (Why?)

$$\mathrm{Avg}_n[Y_i \mid D_i = 1] - \mathrm{Avg}_n[Y_i \mid D_i = 0] \tag{4}$$

$$= \mathrm{Avg}_n[Y_{1i} \mid D_i = 1] - \mathrm{Avg}_n[Y_{0i} \mid D_i = 0] \tag{5}$$

Constant-effects formula

• Let us further assume that insurance makes everyone healthier by the same
amount $\beta$, the average causal effect of insurance on health; that is,
$Y_{1i} = Y_{0i} + \beta$.
• Substituting into Equation (5), we have

$$\begin{aligned}
\mathrm{Avg}_n[Y_{1i} \mid D_i = 1] - \mathrm{Avg}_n[Y_{0i} \mid D_i = 0]
&= (\beta + \mathrm{Avg}_n[Y_{0i} \mid D_i = 1]) - \mathrm{Avg}_n[Y_{0i} \mid D_i = 0] \\
&= \beta + (\mathrm{Avg}_n[Y_{0i} \mid D_i = 1] - \mathrm{Avg}_n[Y_{0i} \mid D_i = 0])
\end{aligned}$$

• The causal effect is masked by the last term of the expression.
What is it? Can we drop it? How?
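
A simulation may help with the "What is it?" question. The sketch below assumes a constant effect of 1 and an invented selection rule in which less healthy people are more likely to buy insurance; both are assumptions for illustration, not part of the course material. Because the simulation generates $Y_{0i}$ for everyone, it can compute the last term directly, which is impossible with real data.

```python
import random
import statistics

random.seed(3)
BETA = 1.0  # assumed constant causal effect of insurance

people = []
for _ in range(20_000):
    y0 = random.gauss(5, 1)  # health without insurance
    # Assumed selection rule: less healthy people tend to buy insurance.
    d = 1 if y0 + random.gauss(0, 1) < 5 else 0
    y = y0 + BETA if d == 1 else y0  # observed outcome
    people.append((y0, d, y))

def avg(values):
    return statistics.fmean(values)

# Difference in observed group means, as in Equation (4).
naive = (avg([y for _, d, y in people if d == 1])
         - avg([y for _, d, y in people if d == 0]))

# Last term of the constant-effects formula: the gap in Y0 between groups.
selection_bias = (avg([y0 for y0, d, _ in people if d == 1])
                  - avg([y0 for y0, d, _ in people if d == 0]))

print(f"difference in means: {naive:.2f}")
print(f"selection term:      {selection_bias:.2f}")
print(f"difference - term:   {naive - selection_bias:.2f}  (equals BETA)")
```

Under this selection rule the insured look less healthy than the uninsured even though insurance helps everyone, because the negative selection term outweighs the effect.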

Random assignment

Random assignment for removing selection bias

• By randomly assigning the treatment, we expect the probability of
being treated to be similar across groups.
• Random assignment works by ensuring that the mix of individuals
being compared is the same, not by eliminating individual differences.
It creates the ceteris paribus condition.
• Note: The sample should be large enough and
representative for us to draw any conclusion at the population level.
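
To see why random assignment removes the selection term, the sketch below assigns insurance by a coin flip in a simulated population. The constant effect of 1 and the distribution of baseline health are assumptions made only for this example. With a large sample, the treated and control groups end up with almost the same average $Y_{0i}$, so the simple difference in means is close to the true effect.

```python
import random
import statistics

random.seed(4)
BETA = 1.0  # assumed constant causal effect

rows = []
for _ in range(20_000):
    y0 = random.gauss(5, 1)          # health without insurance
    d = random.randint(0, 1)         # coin flip: treatment ignores y0
    y = y0 + BETA if d == 1 else y0  # observed outcome
    rows.append((y0, d, y))

def group_mean(data, column, d_value):
    """Mean of one column (0 = y0, 2 = y) within a treatment group."""
    return statistics.fmean(r[column] for r in data if r[1] == d_value)

naive = group_mean(rows, 2, 1) - group_mean(rows, 2, 0)
selection_bias = group_mean(rows, 0, 1) - group_mean(rows, 0, 0)

print(f"difference in means: {naive:.2f}")
print(f"selection term:      {selection_bias:.2f}")
```

Individuals still differ in baseline health; it is only the group averages that are balanced, and only approximately in a finite sample.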

Random assignment in practice

• Popular term: Randomized Controlled Trial (RCT)
• Having random assignment also means that you do not have to use
“complicated” econometrics.
• Even a simple t-test of the difference in means between the treatment and
control groups gives you almost the whole story.
• You may consider skipping the next class afterwards.
• In reality, an RCT is perhaps the most difficult approach:
• Careful preparation and design
• Costly
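
The "simple t-test" can be sketched as follows. The outcome data are simulated (200 treated and 200 control units with an assumed true difference of 1), and the function `welch_t` is a hypothetical helper that computes the unequal-variance (Welch) version of the t statistic by hand. A statistics package would also report the p-value.

```python
import math
import random
import statistics

random.seed(5)

# Hypothetical outcomes from a randomized experiment.
treated = [random.gauss(6, 1) for _ in range(200)]
control = [random.gauss(5, 1) for _ in range(200)]

def welch_t(x, y):
    """Return (difference in means, standard error, t statistic)."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return diff, se, diff / se

diff, se, t_stat = welch_t(treated, control)

print(f"difference in means: {diff:.2f}")
print(f"standard error:      {se:.3f}")
print(f"t statistic:         {t_stat:.1f}")
```

As a rule of thumb in large samples, an absolute t statistic above about 2 indicates a difference that is statistically significant at the 5% level.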

What’s next

Get away from bias

• You’ve (hopefully) already understood that a simple comparison
between treated and control groups tends to give a misleading
causal effect unless random assignment is applied.
• Question: Are there any alternative ways to escape from the bias
(control for the selection)?


• In the next session, we will learn how the regression framework can
provide us with a causal estimate.
• Specifically, we aim for

$$Y_i = \alpha + \beta D_i + X_i \gamma + e_i \tag{6}$$

and hope to obtain $\beta$ as the causal effect by controlling for other factors that
may affect the outcome.
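
As a preview, the sketch below simulates Equation (6) with a single control variable and compares two estimates of $\beta$. All numbers are invented: $X_i$ is an assumed baseline-health index that drives both the insurance decision and the outcome. To stay within the standard library, the controlled estimate uses the partialling-out (Frisch-Waugh-Lovell) shortcut, which gives the same coefficient on $D_i$ as the multiple regression: remove the part of $Y_i$ and $D_i$ explained by $X_i$, then regress residual on residual. This works here only because selection depends on an observed variable.

```python
import random
import statistics

random.seed(6)
BETA, GAMMA = 1.0, 2.0  # assumed true coefficients in Equation (6)

n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]                     # control X
d = [1.0 if xi + random.gauss(0, 1) < 0 else 0.0 for xi in x]  # D depends on X
y = [5 + BETA * di + GAMMA * xi + random.gauss(0, 1)
     for di, xi in zip(d, x)]

def ols_simple(u, v):
    """Return (intercept, slope) from regressing v on u."""
    u_bar, v_bar = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    slope = suv / suu
    return v_bar - slope * u_bar, slope

def residuals(u, v):
    """Part of v that is not explained by a linear function of u."""
    a, b = ols_simple(u, v)
    return [vi - a - b * ui for ui, vi in zip(u, v)]

# Without controls: the slope on D equals the difference in group means.
_, naive = ols_simple(d, y)

# With the control: partial X out of both D and Y, then regress.
_, beta_hat = ols_simple(residuals(x, d), residuals(x, y))

print(f"no control:    {naive:.2f}")
print(f"controlling X: {beta_hat:.2f}")
```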

