
Introduction Causality vs. correlation Structure of data

QMUL, ECN224 - Econometrics 1


Topic 1: Economic questions, data, causality

Sarolta Laczó

Autumn 2018

Outline

Introduction to the field of econometrics: combining economic
and statistical models

Causality versus correlation - controlled experiments

Structure of data

What is econometrics?

Econometrics...
uses economic theory and statistical techniques to analyse
economic data.

aims to answer quantitative questions.

Main tool: regression analysis


We want to determine the effect of one variable (X) on another
variable (Y ).

Statistical Methods in Economics: statistical analysis of one variable

Econometrics 1: analysis of the relationship between two (or more)
variables

Quantitative questions
Examples of questions of interest:
How does demand change with the price of the good?
What is the effect of a new marketing campaign on sales?
How does class size affect education outcomes (e.g. test scores)?
By how much does an additional year of schooling increase wages?
What is the relationship between credit scores and loan default
rates?
How do taxes on cigarettes influence smoking?
Do more policemen reduce crime?
Does increasing the incarceration rate reduce crime?
If free bed nets are distributed, will the prevalence of malaria
decrease?
How can we predict inflation, or the pound/euro exchange rate?

Economic models

Economic models express relationships between economic variables.

They study the direction and magnitudes of relationships.

They answer questions concerning the signs and magnitudes of
unknown and unobservable parameters.

Question: How do we introduce parameters into an economic
model and how do we estimate them?

Economic models

Economic theory is not exact: it does not claim to be able to predict
the specific behaviour of any individual or firm, but rather describes
the average or systematic behaviour of many individuals or firms.

For example, when studying consumption we recognise that actual
consumption is the sum of a systematic part and a random,
unpredictable component, the random error.

Statistical model

A statistical model representing aggregate consumption can be
written as
$c = f(i) + e$,
where $c$ denotes consumption, $i$ is income, and $e$ is a random error
which accounts for the many factors that affect consumption that we
have omitted from this simple model (e.g. wealth).

Economic theory describes the systematic part, $f(i)$.

$e$ is the non-systematic, random error component that we know is
present but cannot be observed.

Adding random errors converts our economic model into a
statistical one, which gives us a basis for statistical inference.
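
To make the decomposition concrete, here is a minimal simulation sketch of c = f(i) + e. The slides leave f unspecified, so the linear form, the parameter values, and the normally distributed error are all assumptions chosen purely for illustration.

```python
import random


def systematic_consumption(income, intercept=10.0, slope=0.8):
    """Hypothetical systematic part f(i); the linear form is an assumption."""
    return intercept + slope * income


def simulate_consumption(incomes, error_sd=5.0, seed=0):
    """Return c = f(i) + e for each income, with e drawn from N(0, error_sd**2)."""
    rng = random.Random(seed)
    return [systematic_consumption(i) + rng.gauss(0.0, error_sd) for i in incomes]


# Many households with the same income: each c differs because of e,
# but the sample average is close to the systematic part f(i).
incomes = [100.0] * 10_000
consumption = simulate_consumption(incomes)
average_c = sum(consumption) / len(consumption)

print(f"f(100) = {systematic_consumption(100.0):.2f}")
print(f"average simulated c = {average_c:.2f}")
```

No single simulated household consumes exactly f(i), which mirrors the point that theory describes average behaviour rather than individual behaviour.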

Causality

We will use regression analysis to understand causal effects
between economic variables.

Most often empirical work is carried out using observational (i.e.
non-experimental) data.

For example: estimate the returns to education using data on
educational choices made by individuals.

We will deal with difficulties arising from using observational data to
estimate causal effects.

Causality

Observational data pose major challenges:

Confounding effects (omitted factors): schooling decisions follow
from self-selection on e.g. motivation or socio-economic
background.

Simultaneous causality: e.g. housing prices and school quality.

Often only a correlation can be established, and correlation is
NOT causality.
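
The confounding problem can be illustrated with a small simulation. In this sketch an unobserved factor (labelled motivation) drives both schooling and wages, while schooling has no effect on wages at all by construction. All numbers and functional forms are hypothetical assumptions, not estimates from any dataset.

```python
import random


def correlation(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n = 5000

# Unobserved confounder.
motivation = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Motivation raises schooling ...
schooling = [12.0 + 2.0 * m + rng.gauss(0.0, 1.0) for m in motivation]
# ... and raises wages directly; schooling itself does not enter the wage equation.
wage = [20.0 + 5.0 * m + rng.gauss(0.0, 1.0) for m in motivation]

r = correlation(schooling, wage)
print(f"correlation(schooling, wage) = {r:.2f}, true causal effect of schooling = 0")
```

The correlation is strongly positive even though changing schooling would leave wages unchanged in this simulated world.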

An example

Suppose that we are interested in trying to understand how the
quantity of Citroen cars that consumers purchase is determined.
We summarise a few determinants of the quantity of Citroen cars
demanded in the following equation:

$q^D = f(p, p_S, p_C, i)$,
where
$q^D$: quantity of Citroen cars demanded
$p$: price of a Citroen car
$p_S$: price of cars that are substitutes
$p_C$: price of complements (e.g. petrol)
$i$: level of income

$q^D$ is called the dependent variable and $(p, p_S, p_C, i)$ are called
explanatory variables.

Controlled experiment

Ideally, how would we find out the causal effect?

Run controlled experiments!

Set the levels of the explanatory variables.

Run the experiment to obtain one observation of the dependent
variable.

Repeat many times.



Controlled experiment about Citroen cars

To find out, for example, how a Citroen car’s own price affects its
demand:
Set a price

Observe the number of Citroen cars sold

Set different prices, while keeping other explanatory variables
unchanged (other prices, income)

By repeating this process a number of times, we create a sample of
economic data.

However, there are also omitted random variables in this economic
experiment (consumer confidence, characteristics of a Citroen relative
to competition, etc.)
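
The experimental procedure above can be sketched in code. The demand function below, including its linear form, coefficients, and error distribution, is entirely hypothetical; it only stands in for the unknown process that generates sales once the experimenter has set the price.

```python
import random


def demand(price, price_substitute, price_complement, income, rng):
    """Hypothetical demand: a linear systematic part plus a random error.

    The error stands for omitted random factors such as consumer confidence.
    """
    systematic = (500.0 - 8.0 * price + 3.0 * price_substitute
                  - 2.0 * price_complement + 0.5 * income)
    return systematic + rng.gauss(0.0, 10.0)


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2)

# The experimenter chooses the price levels and repeats each one 25 times.
price_levels = [10.0 + 0.5 * k for k in range(41)]
prices = price_levels * 25

# Other explanatory variables are held fixed in every run.
quantities = [demand(p, 20.0, 5.0, 300.0, rng) for p in prices]

slope = ols_slope(prices, quantities)
print(f"estimated own-price effect = {slope:.2f} (value built into the simulation: -8)")
```

Because only the price varies across runs, the slope recovers the own-price effect up to the noise from the omitted random factors.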

‘Uncontrolled’ experiments

Economists typically cannot perform controlled experiments.

Instead, to determine the relationship between each explanatory
variable $(p, p_S, p_C, i)$ and the outcome ($q^D$), we use a sample of $T$
observations on values of the explanatory variables and the outcome
variable.

Often, we will want to know (estimate from the data) the effect of a
change in one explanatory variable ($p$) on the average or expected
outcome ($q^D$).

To do this, we combine economic theory and statistics, the
combination being econometrics. We typically do this on a
computer.
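
One way to write down the object of interest is sketched below. The linear functional form and the coefficient names $\beta_1, \dots, \beta_5$ are assumptions made for illustration (the slides only state a general $f$), as is the condition that the error $e_t$ has conditional mean zero.

```latex
q^D_t = \beta_1 + \beta_2 p_t + \beta_3 p_{S,t} + \beta_4 p_{C,t} + \beta_5 i_t + e_t,
\qquad t = 1, \dots, T,
\qquad E(e_t \mid p_t, p_{S,t}, p_{C,t}, i_t) = 0,

\beta_2 = \frac{\partial E(q^D_t \mid p_t, p_{S,t}, p_{C,t}, i_t)}{\partial p_t}.
```

Under these assumptions, $\beta_2$ is the effect of a one-unit change in the own price on expected demand, holding the other explanatory variables fixed.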

Another example: class size and test scores

Social scientists, educators, and parents have long been concerned
with the causal effects of class size, a key input to education
production.

Many countries have implemented important reforms to cap class size.

Small classes are costly, therefore evidence on their effectiveness is
welcome.

Policy question: What is the effect on test scores (or some other
outcome measure) of reducing class size by one student per class?

Data on school inputs and test scores

What do data say about the class size - test score relationship?

First step: get data

Our dataset: California Standardised Testing and Reporting, 1999

All K-6 and K-8 California school districts (n = 420)

District average of test scores (combined math and reading)

Student to teacher ratio, i.e., number of students in the district
divided by number of full-time equivalent teachers

Student to teacher ratio and test scores

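[Figure: scatter plot of district test scores (testscr, vertical axis, 600 to 700) against the student to teacher ratio (str, horizontal axis, 15 to 25)]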

Student to teacher ratio and test scores

From the figure we can detect a (weak) negative relationship between
the student to teacher ratio and test scores. The sample correlation is
−0.23.
Note: A regression with one explanatory variable will provide us with
similar information.

Does this imply a causal relationship?

Only if we have reason to believe that all other things are held
constant when the student to teacher ratio changes. The Latin
phrase ceteris paribus is often used.

Holding other factors constant is key to scientific inquiry. In this
example, we have to seek to screen out factors that perturb the
relationship between class size and achievement. This is in order to
get closer to the ideal of controlled experiments.
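
On the earlier note that a regression with one explanatory variable carries similar information to the sample correlation: the least-squares slope equals the correlation multiplied by the ratio of the two sample standard deviations. The sketch below checks this identity on synthetic data; the numbers are made up and are not the California dataset.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


rng = random.Random(3)

# Synthetic districts: hypothetical student to teacher ratios and test scores.
str_ratio = [rng.uniform(14.0, 26.0) for _ in range(420)]
test_score = [700.0 - 2.0 * s + rng.gauss(0.0, 18.0) for s in str_ratio]

sd_x = sample_cov(str_ratio, str_ratio) ** 0.5
sd_y = sample_cov(test_score, test_score) ** 0.5

r = sample_cov(str_ratio, test_score) / (sd_x * sd_y)      # sample correlation
slope = sample_cov(str_ratio, test_score) / sd_x ** 2      # simple regression slope

print(f"correlation = {r:.3f}")
print(f"slope = {slope:.3f}, r * sd_y / sd_x = {r * sd_y / sd_x:.3f}")
```

Both numbers share the same sign and are rescalings of one another, so neither is more causal than the other.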

Confounding factors
E.g. districts with small classes might be richer, so students in small
classes may have better opportunities for learning outside the school.

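[Figure: scatter plot of log average district income (logavginc, vertical axis, 1.5 to 4) against the student to teacher ratio (str, horizontal axis, 15 to 25)]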

Districts with small classes are indeed richer!


Note: In multiple regression we will control for observable potential
confounding factors.
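
A simulation sketch of what controlling for an observable confounder does. Here district income lowers the student to teacher ratio and raises test scores directly, and the effect of class size built into the simulation is -1. All numbers are hypothetical. The controlled estimate uses the partialling-out (Frisch-Waugh-Lovell) route, which gives the same coefficient as a multiple regression of scores on both variables.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Part of y that is not linearly explained by x."""
    intercept, slope = fit_line(x, y)
    return [b - intercept - slope * a for a, b in zip(x, y)]


rng = random.Random(4)
n = 5000
TRUE_EFFECT = -1.0  # effect of the ratio on scores built into the simulation

log_income = [rng.gauss(0.0, 1.0) for _ in range(n)]  # centred, hypothetical
str_ratio = [20.0 - 2.0 * z + rng.gauss(0.0, 1.0) for z in log_income]
test_score = [650.0 + 10.0 * z + TRUE_EFFECT * s + rng.gauss(0.0, 5.0)
              for z, s in zip(log_income, str_ratio)]

# Regression that omits income: the slope also picks up the income effect.
_, naive_slope = fit_line(str_ratio, test_score)

# Partial income out of both variables, then regress residual on residual.
_, controlled_slope = fit_line(residuals(log_income, str_ratio),
                               residuals(log_income, test_score))

print(f"omitting income:    {naive_slope:.2f}")
print(f"controlling income: {controlled_slope:.2f} (built-in effect: {TRUE_EFFECT})")
```

The regression without income overstates the benefit of small classes because richer districts have both smaller classes and higher scores.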

Some descriptive statistics

Before turning to regression analysis and testing, we take a first look
at data by computing descriptive statistics.

Note: This table doesn’t tell us anything about the relationship
between test scores and the STR.
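
A minimal sketch of how such descriptive statistics can be computed, one variable at a time. The six observations are invented for illustration and are not taken from the California data.

```python
import statistics


def describe(values):
    """Univariate summary of one variable."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


# Hypothetical observations for six districts.
str_ratio = [17.0, 18.0, 19.0, 20.0, 21.0, 25.0]
test_score = [680.0, 665.0, 660.0, 650.0, 655.0, 630.0]

for name, column in [("str", str_ratio), ("testscr", test_score)]:
    print(name, describe(column))

# Re-pairing scores with districts changes the relationship between the two
# variables, but each column's summary is computed from that column alone.
reordered_scores = sorted(test_score)
print("testscr (re-paired)", describe(reordered_scores))
```

Each summary uses a single column, which is why statistics of this kind cannot reveal how the two variables move together.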
Structure of data

Cross-sectional data consist of a sample of multiple entities
(individuals, households, firms, schools, cities, states, countries, etc.)
taken at a given point in time.

Example: 420 California school districts, 1999

Time-series data consist of a single entity observed at multiple time
periods.

Example: UK CPI, January 2007 – December 2016

Panel/longitudinal data consist of multiple entities observed at
multiple time periods.

Example: British Household Panel Survey (BHPS), 2000 – 2008

Example: British Household Panel Survey (BHPS), 2000 – 2008
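
The three data structures differ in how an observation is indexed, which the following sketch makes explicit. The entity names, periods, and values are all made up for illustration, and the helper functions are hypothetical.

```python
# Cross-sectional data: many entities, one point in time (entity -> value).
cross_section = {"district_A": 654.2, "district_B": 661.9, "district_C": 640.5}

# Time-series data: one entity, many periods (period -> value).
time_series = {"2007-01": 100.0, "2007-02": 100.4, "2007-03": 100.9}

# Panel data: many entities, many periods ((entity, period) -> value).
panel = {
    ("household_1", 2000): 21.5,
    ("household_1", 2001): 22.1,
    ("household_2", 2000): 30.2,
    ("household_2", 2001): 29.8,
}


def cross_section_at(data, period):
    """Slice a panel at one period: the result is a cross-section."""
    return {entity: value for (entity, t), value in data.items() if t == period}


def time_series_for(data, entity):
    """Follow one entity through a panel: the result is a time series."""
    return {t: value for (e, t), value in data.items() if e == entity}


print(cross_section_at(panel, 2000))
print(time_series_for(panel, "household_1"))
```

A panel therefore contains one cross-section per period and one time series per entity.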
