Lec 3EFCFull
For
Consultants
Data, everywhere
Instructor: Tirthatanmoy Das
Lecture 3
Jun 12, 2024
Last class…
Software for analysis
Data fitting?
Causality
Multivariate linear regression
Causal interpretation of results
Is this GDPN’s causal effect?
Are the estimates confounded?
Is the estimate $\hat{\beta} = 0.92$ the causal effect?
Causal effect of $X_1$
Causality comes from the domain: concepts/theory establish causal relationships, not the data
Think of thought experiments
Consider
$Y = f(X_1, X_2, \ldots, X_k)$
▷ Causal relationship: a change in $X_1$ causes a change in $Y$, holding the other $X$s constant (ceteris paribus)
How to hold other factors constant
Causal effect of $X_1$
With linearity:
$Y = B_0 + B_1 X_1 + \cdots + B_k X_k$
▷ But if you run
$Y = B_0 + B_1 X_1$
▷ Then you are missing
$u = B_2 X_2 + \cdots + B_k X_k$
$u$: the regression error
▷ All omitted determinants of $Y$
▷ Any random noise
Confounding?
The imperfect regression model
$Y = B_0 + B_1 X + u$
What can cause $\mathrm{Corr}(X, u) \neq 0$?
Investigate the root cause of the violation
Ask the following for $\mathrm{Cov}[X, u] \neq 0$:
▷ Are any relevant determinants of $Y$ omitted that could be correlated with $X$?
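The question above can be illustrated with a small simulation. This is a minimal sketch, not the lecture's own example: the data-generating process, the coefficient values (2 on $X_1$, 3 on $X_2$), and the helper name `ols` are all assumptions made for illustration. When $X_2$ is omitted and is correlated with $X_1$, it moves into $u$, and the slope on $X_1$ picks up part of its effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # X2 is correlated with X1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_short = ols(y, x1)       # X2 omitted, so it sits inside the error term u
b_long = ols(y, x1, x2)    # X2 included as a regressor

print("slope on X1 with X2 omitted :", round(b_short[1], 2))
print("slope on X1 with X2 included:", round(b_long[1], 2))
```

Under these assumptions the short regression's slope should land near 2 + 3 × 0.5 = 3.5 rather than the true 2, while the long regression should recover a value close to 2.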
Today…
Causality
Another example – US manufacturing data?
$Y = B_0 + B_1 X_1 + \cdots + B_k X_k + u$
A more complete model emerges
Multivariate regression
$Y = B_0 + B_1 X_1 + \cdots + B_k X_k + u$
GDPN, CVN, PP are indeed correlated
Is this GDPN’s causal effect?
Answer: possibly yes, unfortunately!
The ‘pharma’ example
Including other factors determining ‘pharma’ price:
$P_i = B_0 + B_{GDPN}\,\mathrm{GDPN}_i + B_{CVN}\,\mathrm{CVN}_i + B_{DPC}\,\mathrm{DPC}_i + B_{IPC}\,\mathrm{IPC}_i + B_{PP}\,\mathrm{PP}_i + u_i$
The model is applicable to every entity in the population
Multivariate regression model
$Y_i = B_0 + B_1 X_{1i} + \cdots + B_k X_{ki} + u_i$
▷ Deterministic part, $E(Y \mid X)$: $XB = B_0 + B_1 X_{1i} + \cdots + B_k X_{ki}$
▷ Error: $u_i$
Nomenclature of the model
$Y_i = B_0 + B_1 X_{1i} + \cdots + B_k X_{ki} + u_i$, where $i = 1, 2, \ldots, N$
▷ $Y$: regressand, outcome, or dependent variable
▷ $X$: vector of regressors, determinants, or independent variables
▷ $B_0, B_1, \ldots, B_k$: regression coefficients or regression parameters
▷ $N$: the number of observations
▷ Reminder: linearity is assumed, though it doesn't have to be
Obtaining the causal effect from the regression (remember, $u$ is uncorrelated with the $X$s)
Causal effect
When $X_1$ changes by 1 unit, $Y$ changes by $B_1$ units (on average), ceteris paribus:
$\frac{\partial E[Y \mid X]}{\partial X_1} = \frac{\partial (B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_k X_k)}{\partial X_1} = B_1$
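The ceteris paribus reading of $B_1$ can be checked numerically. In this minimal sketch the coefficient values and the function name `conditional_mean` are hypothetical, chosen only for illustration; it raises $X_1$ by one unit while holding $X_2$ fixed and compares the two conditional means.

```python
# Hypothetical coefficients B_0, B_1, B_2 (assumed for illustration only).
B0, B1, B2 = 4.0, 1.5, -0.5


def conditional_mean(x1, x2):
    """E[Y | X] = B_0 + B_1*X_1 + B_2*X_2 for the assumed linear model."""
    return B0 + B1 * x1 + B2 * x2


# Raise X_1 by one unit while holding X_2 fixed (ceteris paribus).
x1, x2 = 10.0, 3.0
effect = conditional_mean(x1 + 1.0, x2) - conditional_mean(x1, x2)
print(effect)  # prints 1.5, which equals B_1
```

Because the model is linear, the same difference results at any starting values of $X_1$ and $X_2$.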
Sample counterpart of the regression
$Y_i = b_0 + b_1 X_{1i} + \cdots + b_k X_{ki} + e_i$
▷ where $b_0, b_1, \ldots, b_k$ are the estimates of $B_0, B_1, \ldots, B_k$
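A minimal sketch of how the sample estimates $b$ and residuals $e$ could be computed by OLS. The simulated data and the population coefficients in `B` are assumptions for illustration, and the error is drawn independently of the $X$s, matching the condition stated above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical population coefficients B_0, B_1, B_2.
B = np.array([1.0, 2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
u = rng.normal(size=n)             # error, drawn independently of the Xs
Y = X @ B + u

# OLS: b solves the normal equations (X'X) b = X'Y.
b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b                      # residuals, the sample counterpart of u

print("estimates b:", np.round(b, 2))
print("X'e (numerically zero):", np.round(X.T @ e, 8))
```

The estimates should be close to, but not exactly equal to, the assumed population values. By construction the residuals are orthogonal to every column of `X`.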
Causality
Multivariate linear regression
Evaluating the results from regression – credibility and reliability
Result evaluations
How reliable are OLS results? Check boxes
Goodness of fit: $R^2$
▷ $R^2$: the coefficient of determination
▷ Overall measure of goodness of fit
How good is the fit?
▷ $R^2$ is a value between 0 (no fit) and 1 (perfect fit)
How good is the fit: the components of $R^2$
▷ Explained sum of squares: $ESS = \sum (\hat{Y} - \bar{Y})^2$
▷ Residual sum of squares: $RSS = \sum e^2$
▷ Total sum of squares: $TSS = \sum (Y - \bar{Y})^2$
▷ Then, $R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$
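The sums of squares above can be computed directly from a fitted regression. This is a minimal sketch on simulated data (the data-generating numbers are assumptions); it shows that the two expressions for $R^2$ agree when the regression includes an intercept, because then $TSS = ESS + RSS$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # hypothetical data

# Fit Y on an intercept and x by least squares.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
e = y - y_hat

ess = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
rss = np.sum(e ** 2)                     # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)        # total sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1.0 - rss / tss
print(round(r2_from_ess, 4), round(r2_from_rss, 4))
```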
Higher $R^2$ is better, but it is not the only thing to consider
Misuse of $R^2$
▷ Do not attempt to maximize $R^2$ only
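One reason not to chase $R^2$ alone is that it cannot fall when regressors are added, even irrelevant ones. The following is a minimal sketch on simulated data, where the setup and the helper name `r_squared` are assumptions for illustration: 20 pure-noise columns are added one at a time and $R^2$ is tracked.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # only x truly determines y


def r_squared(y, X):
    """R^2 = 1 - RSS/TSS for an OLS fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return 1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)


X = np.column_stack([np.ones(n), x])
r2_values = [r_squared(y, X)]
for _ in range(20):
    # Append a regressor that is pure noise, unrelated to y.
    X = np.column_stack([X, rng.normal(size=n)])
    r2_values.append(r_squared(y, X))

print("R^2 with x only        :", round(r2_values[0], 4))
print("R^2 with 20 noise terms:", round(r2_values[-1], 4))
```

The larger model fits the sample better, yet the added regressors carry no information about $Y$.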
Example: misuse of $R^2$
Effect of income on Mozzarella consumption
▷ Model 1: $R^2 = 0.88$
▷ Model 2: $R^2 = 0.97$
Applied regression: steps
Applied regression analysis progresses according to a set of steps (6 steps):
▷ Review domain concepts/literature/theory
▷ Based on domain concepts/literature/theory, select the $Y$ and $X$s
▷ Based on domain concepts/literature/theory, select the functional form (linear for now)
▷ Hypothesize the expected signs of the coefficients, based on theory
▷ Collect the data
▷ Estimate and evaluate the equation
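The later steps can be sketched in code. This is only an illustration under stated assumptions: the variable names `x1` and `x2`, the hypothesized signs, and the simulated "collected" data are all hypothetical, standing in for choices that would come from domain theory and real data.

```python
import numpy as np

# Steps 1-4 (assumed done on paper): theory picks Y, the Xs, a linear form,
# and the expected sign of each coefficient.
expected_sign = {"x1": +1, "x2": -1}

# Step 5: collect the data. Simulated here as a stand-in for real data.
rng = np.random.default_rng(4)
n = 2_000
data = {"x1": rng.normal(size=n), "x2": rng.normal(size=n)}
y = 3.0 + 1.2 * data["x1"] - 0.8 * data["x2"] + rng.normal(size=n)

# Step 6: estimate the equation by OLS ...
names = list(expected_sign)
X = np.column_stack([np.ones(n)] + [data[name] for name in names])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# ... and evaluate: do the estimated signs match the hypothesized ones?
sign_matches = {}
for name, coef in zip(names, b[1:]):
    sign_matches[name] = bool(np.sign(coef) == expected_sign[name])
    print(name, round(coef, 2), "sign as expected:", sign_matches[name])
```

A sign that contradicts theory is a prompt to revisit the earlier steps, for example to look for an omitted determinant, rather than a result to report as is.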