3 Identification of Variation (Causality)
Giacomo Pasini
Lecture overview
1 Identification
2 Causal Diagrams
Scientists believe that there are regular laws that govern the way the universe works.
These laws are an example of a data generating process (DGP).
Example:
We can see that if you let go of a ball, it drops to the ground. That’s our observation,
our data.
Gravity is a part of the data generating process for that ball. That’s the underlying
law.
DGPs in the social sciences are generally not as well-behaved and precise as the
ones in the physical sciences.
Regardless, if we believe that observational data come from at least somewhat
regular laws, we are saying there’s a DGP.
Identification Causal Diagrams Drawing Causal Diagrams
In the graph of avocado prices against quantities sold, there’s clearly a negative
relationship. What this tells us is that avocado sales tend to be lower in weeks
where the price of avocados is high.
What we can see in the graph is the negative covariation or correlation between
price and quantity of avocados. THAT’S ALL!!!
We might be tempted to say something like “an increase in the price of avocados
drives down sales.” But that’s not actually on the graph!
Or we might want to say that an increase in prices makes people demand fewer
avocados, but that’s not on there either.
DGP: both supply and demand are part of what generates market prices and
quantities, and we’ve got no way of pulling just the demand parts out yet.
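To see why the raw relationship is not the demand relationship, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the linear demand and supply curves, their coefficients, and the shock sizes are invented, not estimated from avocado data.

```python
import random


def simulate_market(n_weeks=5000, seed=0):
    """Simulate weekly equilibrium prices and quantities.

    Assumed (hypothetical) data generating process:
        demand: Q = 10 - 2 * P + u   (demand slope is -2)
        supply: Q =  1 + 1 * P + v   (supply slope is +1)
    """
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n_weeks):
        u = rng.gauss(0, 1)  # demand shock
        v = rng.gauss(0, 1)  # supply shock
        # The market clears where demand equals supply:
        # 10 - 2P + u = 1 + P + v  =>  P = (9 + u - v) / 3
        p = (9 + u - v) / 3
        q = 1 + p + v
        prices.append(p)
        quantities.append(q)
    return prices, quantities


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


prices, quantities = simulate_market()
slope = ols_slope(prices, quantities)
print(f"slope of quantity on price in the data: {slope:.2f}")
print("demand slope built into the simulation: -2.00")
```

Under these assumptions the fitted slope is negative, like the graph, but it is a blend of the supply and demand slopes and does not recover the demand slope of -2.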
Perhaps we’re interested in how price-sensitive consumers are - what is the effect
of a price increase on the number of avocados people buy?
Or maybe we should ask: what is the effect of the number of avocados brought to
market on the price that sellers choose to charge?
What is the effect of the price on the number of avocados brought to market?
What is the effect of the quantity sold one week on the number of avocados people
will want to buy the next week?
...and many more
Each of these questions would require us to dig out a different part of the variation.
The graph shows the covariation of price and quantity - how they move together or
apart.
But these variables move around for all sorts of reasons! Focus on two consecutive
data points.
Variables move around for all sorts of reasons. Those reasons would be reflected in
the DGP.
But when we have a research question in mind, we are usually interested in
only one of those reasons.
How can we find the variation in the data that answers our question?
Somewhere inside the data, our reason for variation is hiding. How can we get it
out?
1 Using theory and what you know about the context the data come from,
paint the most accurate picture possible of what the data generating process looks
like.
2 Use that data generating process to figure out the reasons our data might look the
way it does that don’t answer our research question.
3 Find ways to block out those alternate reasons and so dig out the variation we need.
Causality
We can say that X causes Y if, were we to intervene and change the value of X, then
the distribution of Y would also change as a result.
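One common way to write this definition down uses Pearl’s do-operator. The notation is an addition for illustration and is not used elsewhere in these slides; do(X = x) means “we intervene and set X to x”, as opposed to merely observing that X equals x.

```latex
X \text{ causes } Y
\iff
\exists\, x \neq x' :\;
P\bigl(Y \mid \mathrm{do}(X = x)\bigr) \neq P\bigl(Y \mid \mathrm{do}(X = x')\bigr)
```

The comparison is between whole distributions of Y, so a change in the probability of Y counts, even if Y does not change for every individual.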
We can observe that the number of people who wear shorts is much higher on days
when people eat ice cream.
If we were to intervene and swap out someone’s pants for shorts, would it make
them more likely to eat ice cream? Probably not! So this is a non-causal
relationship.
Surely the price of cigarettes has no direct causal effect on your health.
But if we were to intervene and raise the price of cigarettes, that would likely
reduce the number of cigarettes smoked.
So the price of cigarettes causes cigarette smoking (to go down).
Also, if we were to intervene and reduce the number of cigarettes smoked, that
would cause your health (to improve).
In sum, the price of cigarettes causes health, indirectly, through smoking.
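The chain from price to smoking to health can be sketched as a small simulation in which we set the price by intervention and look at the resulting average health. All functional forms and numbers below are hypothetical and chosen only to make the chain visible.

```python
import random


def simulate_health(price, n_people=20000, seed=1):
    """Average health score when the cigarette price is set by intervention.

    Hypothetical chain: Price -> Smoking -> Health.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_people):
        # Price affects smoking: a higher price means fewer cigarettes per day.
        cigarettes = max(0.0, 20 - 2 * price + rng.gauss(0, 3))
        # Smoking affects health. Price does not appear in this equation.
        health = 80 - 1.5 * cigarettes + rng.gauss(0, 5)
        total += health
    return total / n_people


low_price_health = simulate_health(price=4)
high_price_health = simulate_health(price=8)
print(f"average health at price 4: {low_price_health:.1f}")
print(f"average health at price 8: {high_price_health:.1f}")
```

Price never enters the health equation, yet changing it shifts the distribution of health, which is what the definition of causality asks for.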
Important: we’ll still say X causes Y even if changing X doesn’t always change Y,
but just changes the probability that Y occurs. As I said earlier, it changes the
distribution of Y.
Does buying a child a copy of Alice in Wonderland cause them to read it?
Not always! Some kids won’t read it no matter what, and some kids would manage
to read it on their own without you buying it for them.
But in general, buying children copies of Alice in Wonderland increases the
probability that a child reads it, and so we’d say that buying them the book causes
them to read it.
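A minimal sketch of this probabilistic notion of causality, assuming three hypothetical types of children with made-up population shares: those who never read the book, those who always find a way to read it, and those who read it only if it is bought for them.

```python
import random


def share_who_read(bought, n_kids=10000, seed=2):
    """Share of children who read the book, with or without buying it for them."""
    rng = random.Random(seed)
    readers = 0
    for _ in range(n_kids):
        # Hypothetical shares: 30% never read, 20% always read,
        # 50% read only if the book is bought for them.
        kid = rng.choices(["never", "always", "if_bought"],
                          weights=[0.3, 0.2, 0.5])[0]
        reads = kid == "always" or (kid == "if_bought" and bought)
        readers += reads
    return readers / n_kids


p_without = share_who_read(bought=False)
p_with = share_who_read(bought=True)
print(f"share reading without the gift: {p_without:.2f}")
print(f"share reading with the gift:    {p_with:.2f}")
```

Buying the book changes nothing for two of the three types, but it still raises the overall probability of reading, so under the definition above it counts as a cause.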
Causal diagram
A simple example
Each variable on the graph may take multiple values: this is different from a
flowchart!
The arrow just tells us that one variable causes another. It doesn’t say anything
about whether that causal effect is positive or negative.
We must include the causal relationships between all the variables. One variable
might cause multiple things (like CoinFlip), and other variables might be caused by
multiple things (like Money).
When one variable is caused by multiple things, the diagram doesn’t tell us exactly
how those things come together.
All (non-trivial) variables relevant to the data generating process should be
included, even if we can’t measure or see them.
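A causal diagram can be stored as a plain mapping from each variable to the variables it causes. The sketch below is only an illustration: CoinFlip and Money come from the text above, while Cake and TerryInRoom and the exact set of arrows are assumed here because the figure is not reproduced.

```python
# Each key causes every variable in its list (an arrow from key to value).
# The edges are an assumed illustration, not a copy of the original figure.
diagram = {
    "CoinFlip": ["Cake", "Money"],   # one variable causing multiple things
    "TerryInRoom": ["Money"],        # Money is caused by multiple things
    "Cake": [],
    "Money": [],
}


def causes_of(diagram, variable):
    """Variables with an arrow pointing into `variable`."""
    return sorted(c for c, effects in diagram.items() if variable in effects)


def effects_of(diagram, variable):
    """Variables that `variable` points to."""
    return sorted(diagram.get(variable, []))


print("CoinFlip causes:", effects_of(diagram, "CoinFlip"))
print("Money is caused by:", causes_of(diagram, "Money"))
```

Note that the mapping records only which arrows exist. Like the diagram itself, it says nothing about the sign of an effect or about how several causes combine.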
Unobserved variables 1
People are more likely to wear shorts on days they eat ice cream, but shorts don’t
cause you to eat ice cream and ice cream doesn’t cause you to wear shorts.
There has to be something between them, otherwise they wouldn’t be correlated.
But we can’t have an arrow from one to the other, since neither causes the other.
In these cases we imagine that there’s a latent variable causing both of them, and
we can put that on the diagram.
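Here is a small simulation sketch of a latent common cause. It assumes, purely for illustration, that the latent variable is whether the day is hot, and that the probabilities below are reasonable; neither comes from data. Shorts and ice cream never affect each other in the code.

```python
import random


def simulate_days(n_days=20000, seed=3):
    """Generate (hot, shorts, ice_cream) for each day from an assumed DGP."""
    rng = random.Random(seed)
    days = []
    for _ in range(n_days):
        hot = rng.random() < 0.5                         # latent common cause
        shorts = rng.random() < (0.8 if hot else 0.2)    # depends on hot only
        ice_cream = rng.random() < (0.7 if hot else 0.1) # depends on hot only
        days.append((hot, shorts, ice_cream))
    return days


def ice_cream_rate(days, shorts, hot=None):
    """Share of ice-cream days among days with the given shorts (and hot) value."""
    selected = [ic for h, s, ic in days
                if s == shorts and (hot is None or h == hot)]
    return sum(selected) / len(selected)


days = simulate_days()
raw_gap = ice_cream_rate(days, True) - ice_cream_rate(days, False)
hot_gap = ice_cream_rate(days, True, hot=True) - ice_cream_rate(days, False, hot=True)
cold_gap = ice_cream_rate(days, True, hot=False) - ice_cream_rate(days, False, hot=False)
print(f"gap in ice cream rate, shorts vs. no shorts: {raw_gap:.2f}")
print(f"same gap on hot days only:  {hot_gap:.2f}")
print(f"same gap on cold days only: {cold_gap:.2f}")
```

In the raw data shorts and ice cream move together, but once we compare days with the same value of the latent variable the gap is close to zero. In practice the latent variable is often unobserved, so this comparison is not always available.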
Assumptions
We should be able to see which parts of the diagram answer our research question
of interest.
We should be interested in any parts of it that allow PolicePerCapita to cause
Crime.
Simplify
The real world is complex. The true data generating process is too.
Problem: the whole point of having a model like a causal diagram is to help us
make sense of the DGP and, eventually, figure out how we can use it to identify the
answer to our research question.
We want to simplify where we can without getting so simple that our diagram no
longer represents the true DGP.
How to simplify
Unimportance. If the arrows coming in and out of a variable are likely to be tiny
and unimportant effects, we can probably remove the variable.
Redundancy. If several variables on the diagram have arrows coming in from the
same variables and going out to the same variables, we can probably
combine them and describe them together.
Mediators. If one variable is only on the graph as a way for one variable to affect
another (i.e. B in A → B → C where nothing else connects to B), then we can
probably remove it and just have A → C directly.
Irrelevance. Some variables are an important part of the data generating process
but irrelevant to the research question at hand. If a variable isn’t on any path
between the treatment and outcome variables, we can probably remove the variable.
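As one concrete case, the mediator rule can be sketched as a small function over a list of arrows. This is an illustrative implementation under assumed conventions (arrows as `(cause, effect)` pairs, a hypothetical `keep` argument for variables we never want to drop), not a standard library routine. The variables D and E are added to the A → B → C example only to show a case that is left alone.

```python
def collapse_mediators(edges, keep=()):
    """Replace A -> B -> C with A -> C when nothing else connects to B.

    `edges` is an iterable of (cause, effect) pairs; variables listed in
    `keep` (for example the treatment and the outcome) are never removed.
    """
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        nodes = {node for edge in edges for node in edge}
        for b in sorted(nodes):
            if b in keep:
                continue
            parents = [a for a, child in edges if child == b]
            children = [c for parent, c in edges if parent == b]
            # A pure mediator has exactly one arrow in and one arrow out.
            if len(parents) == 1 and len(children) == 1:
                a, c = parents[0], children[0]
                edges -= {(a, b), (b, c)}
                edges.add((a, c))
                changed = True
                break  # the graph changed, so rescan from the start
    return edges


original = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("E", "D")]
simplified = sorted(collapse_mediators(original))
print(simplified)
```

B is removed because it only passes A’s effect on to C. D stays because a second variable, E, also points into it.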
Cycles
In a causal diagram there cannot be a cycle. You shouldn’t be able to start at one
variable, follow down the path of the arrows, and end up back where you started.
Problem: there are plenty of real-world data generating processes with feedback
loops: the rich get richer, and if I punch you that makes you punch me, which
makes me punch you.
Let’s pay attention to when these punches are thrown, as is common in statistical
applications where time is a factor.
Whenever we have a cycle in our diagram, we can get out of it by thinking about
adding a time dimension: time’s arrow only moves in one direction.
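The punching example can be sketched in code: first detect the cycle, then give every variable a time index so that arrows only run from one period to the next. The variable names, the `_t1`, `_t2` suffixes, and the choice of three periods are all assumptions made for illustration.

```python
def has_cycle(diagram):
    """Depth-first search for a directed cycle in {cause: [effects]}."""
    state = {}  # node -> "visiting" (on the current path) or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True   # we came back to a node on the current path
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        if any(visit(nxt) for nxt in diagram.get(node, [])):
            return True
        state[node] = "done"
        return False

    return any(visit(node) for node in list(diagram))


def unroll_in_time(diagram, periods):
    """Copy each variable once per period; arrows go from t to t + 1 only."""
    unrolled = {}
    for t in range(1, periods + 1):
        for cause, effects in diagram.items():
            targets = [f"{e}_t{t + 1}" for e in effects] if t < periods else []
            unrolled[f"{cause}_t{t}"] = targets
    return unrolled


punches = {"IPunchYou": ["YouPunchMe"], "YouPunchMe": ["IPunchYou"]}
unrolled = unroll_in_time(punches, periods=3)
print("cycle in the original diagram:", has_cycle(punches))
print("cycle after adding time:", has_cycle(unrolled))
```

After unrolling, my punch in period 1 causes your punch in period 2, which causes my punch in period 3, and no path ever leads back to where it started.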