1 EMF univariateOLS
Nova SBE
2022
4 Goodness of Fit
5 Exercise
y = β0 + β1 x + u
y:
x:
u:
β0 :
β1 :
RET = β0 + β1 SAT + u
y = β0 + β1 x + u
β1 measures the (linear) causal effect of a change in x on y :
∆y = β1 ∆x
when ∆u = 0
β1 is the ceteris paribus effect of x on y, i.e. keeping everything else
constant, a change in x by 1 unit will cause y to change by β1 units
Since β1 is unobservable, we need to estimate it using data about x
and y
How can we hope to learn about the effect of x on y holding other
factors fixed, when we are ignoring all those other factors?
RET = β0 + β1 SAT + u
Interpretations
▶ Causal:
▶ Non-causal:
E(u|x) = E(u) = 0
E(u) = 0
E(y|x) = β0 + β1 x
y = E(y|x) + u
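A minimal simulation sketch of what the zero conditional mean assumption implies: if u is drawn independently of x, the average of y at each value of x should sit close to β0 + β1 x. The parameter values, sample size and distributions below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters, chosen only for illustration.
beta0, beta1 = 1.0, 0.5
n = 200_000

x = rng.integers(0, 5, size=n)      # regressor taking the values 0, 1, 2, 3, 4
u = rng.normal(0.0, 1.0, size=n)    # drawn independently of x, so E(u|x) = 0
y = beta0 + beta1 * x + u

# Compare the sample mean of y at each value of x with beta0 + beta1 * x.
cond_means = {int(v): float(y[x == v].mean()) for v in np.unique(x)}
for v, m in cond_means.items():
    print(f"x = {v}: mean(y | x) = {m:.3f}, beta0 + beta1 * x = {beta0 + beta1 * v:.3f}")
```

If u were instead correlated with x, the conditional means would no longer line up with β0 + β1 x.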
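$$\min_{\hat\beta_0,\hat\beta_1} SSR = \min_{\hat\beta_0,\hat\beta_1} \sum_{i=1}^{N} \hat u_i^2 = \min_{\hat\beta_0,\hat\beta_1} \sum_{i=1}^{N} (y_i - \hat y_i)^2 = \min_{\hat\beta_0,\hat\beta_1} \sum_{i=1}^{N} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$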
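$$\hat\beta_1 = \frac{\sum_{i=1}^{N} (y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^{N} (x_i - \bar x)^2} = \frac{\text{sample covariance}(x, y)}{\text{sample variance}(x)}$$
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$$
where $\bar x = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $\bar y = \frac{1}{N}\sum_{i=1}^{N} y_i$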
(Derivation: video)
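The estimator formulas can be checked numerically. The sketch below uses made-up data (not from the course) and compares the hand-computed slope and intercept with NumPy's built-in least-squares line fit; the variable names are illustrative.

```python
import numpy as np

# Made-up data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations over the sum of squared deviations of x.
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: forces the fitted line through the point (x_bar, y_bar).
beta0_hat = y_bar - beta1_hat * x_bar

# Same slope written as sample covariance over sample variance.
slope_cov_var = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Cross-check against NumPy's least-squares fit of a degree-1 polynomial,
# which returns the coefficients with the highest degree first.
slope_np, intercept_np = np.polyfit(x, y, deg=1)

print(beta0_hat, beta1_hat)
print(slope_cov_var, slope_np, intercept_np)
```

The 1/(N−1) factors in the sample covariance and sample variance cancel, which is why the ratio of the two reproduces the ratio of sums.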
Properties of OLS estimators that follow directly from the algebra and are
therefore always true. In other words, the OLS estimators β̂0 and β̂1
are chosen such that:
1. $\sum_{i=1}^{N} \hat u_i = 0$
The sum (and the sample average) of the OLS residuals is zero
2. $\sum_{i=1}^{N} x_i \hat u_i = 0$
The sample covariance between the regressor(s) and the OLS residuals
is zero¹
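¹ $\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar x)(\hat u_i - \bar{\hat u}) = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar x)\hat u_i = \frac{1}{N-1}\sum_{i=1}^{N} x_i \hat u_i - \bar x \frac{1}{N-1}\sum_{i=1}^{N} \hat u_i = \frac{1}{N-1}\sum_{i=1}^{N} x_i \hat u_i = 0 \Leftrightarrow \sum_{i=1}^{N} x_i \hat u_i = 0$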
Robert Hill Empirical Methods for Finance 24 / 34
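Because the two residual properties are algebraic, they should hold in any sample, up to floating-point rounding. A short sketch on simulated data (the data-generating values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary simulated sample; any data set would do.
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)

# OLS estimates from the closed-form expressions.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
u_hat = y - (beta0_hat + beta1_hat * x)   # OLS residuals

sum_resid = u_hat.sum()                       # property 1: sum of residuals
sum_x_resid = (x * u_hat).sum()               # property 2: sum of x_i * residual_i
sample_cov = np.cov(x, u_hat, ddof=1)[0, 1]   # sample covariance of x and residuals

print(sum_resid, sum_x_resid, sample_cov)     # all zero up to rounding error
```

Changing the seed or the data-generating values changes the estimates but not these three results.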
Errors vs. Residuals
Errors ui
▶ all other factors that affect y
▶ the vertical distances between observations and the PRF
▶ never observed
▶ assumptions of the model are built around u
Residuals ûi
▶ computed from the data
▶ the vertical distances between observations and the estimated
regression function
▶ have several important algebraic properties
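The distinction is easiest to see in a simulation, where the errors are known only because we generate them ourselves. The sketch below assumes a hypothetical PRF with β0 = 1 and β1 = 2; with real data the vector `u` would never be available.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population regression function (PRF); unknown in real data.
beta0, beta1 = 1.0, 2.0
n = 30

x = rng.uniform(0.0, 10.0, size=n)
u = rng.normal(0.0, 1.0, size=n)   # true errors: visible here only because we simulate
y = beta0 + beta1 * x + u

# Estimated regression function from the sample.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
u_hat = y - (beta0_hat + beta1_hat * x)   # residuals: always computable from the data

print("sum of errors   :", u.sum())       # generally not zero in a finite sample
print("sum of residuals:", u_hat.sum())   # zero up to floating-point rounding

# Residuals differ from errors by the estimation error in the fitted line.
gap = (beta0_hat - beta0) + (beta1_hat - beta1) * x
print(np.allclose(u_hat, u - gap))
```

The residuals satisfy the algebraic properties by construction, while the errors only satisfy the model assumptions in expectation.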
Sum of Squares Total (SST): measures the total sample variation in the yi
$$SST = \sum_{i=1}^{N} (y_i - \bar y)^2$$
Sum of Squares Explained (SSE): measures the sample variation in the ŷi
$$SSE = \sum_{i=1}^{N} (\hat y_i - \bar{\hat y})^2$$
Sum of Squares Residual (SSR): measures the sample variation in the ûi
$$SSR = \sum_{i=1}^{N} (\hat u_i - \bar{\hat u})^2 = \sum_{i=1}^{N} \hat u_i^2$$
The total variation can be decomposed into variation explained and variation
residual (unexplained):
SST = SSE + SSR
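A sketch of why the decomposition holds, using only the two algebraic properties of the residuals (their sum is zero, and the sum of x_i û_i is zero). The first property also implies that the sample mean of the fitted values equals the sample mean of y, which is used in the last step.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{N} (y_i - \bar y)^2
     = \sum_{i=1}^{N} \bigl[(y_i - \hat y_i) + (\hat y_i - \bar y)\bigr]^2 \\
    &= \sum_{i=1}^{N} \hat u_i^2
     + 2 \sum_{i=1}^{N} \hat u_i (\hat y_i - \bar y)
     + \sum_{i=1}^{N} (\hat y_i - \bar y)^2 \\[4pt]
\sum_{i=1}^{N} \hat u_i (\hat y_i - \bar y)
    &= \hat\beta_0 \sum_{i=1}^{N} \hat u_i
     + \hat\beta_1 \sum_{i=1}^{N} x_i \hat u_i
     - \bar y \sum_{i=1}^{N} \hat u_i = 0 \\[4pt]
\Rightarrow\quad SST &= \sum_{i=1}^{N} \hat u_i^2 + \sum_{i=1}^{N} (\hat y_i - \bar{\hat y})^2
     = SSR + SSE
     \qquad \text{since } \bar{\hat y} = \bar y
\end{aligned}
```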
Intuitively, a good measure of the regression fit is the share of the total
variation that the model can explain. This is the definition of R²:
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$
Some remarks:
▶ R²: proportion of the variation in y explained by variation in x
▶ R² is always between 0 and 1
▶ A higher R² means that a higher proportion of the variation in yi is explained
by the variation in xi
▶ Low R² values are not uncommon, especially for cross-sectional data
▶ A high R² is useless if the correlation is spurious
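A small numerical sketch of these quantities on made-up data, assuming the standard definition R² = SSE/SST = 1 − SSR/SST. It also checks that, with a single regressor and an intercept, R² coincides with the squared sample correlation between x and y.

```python
import numpy as np

# Made-up data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# OLS fit of a straight line (coefficients returned highest degree first).
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x   # fitted values
u_hat = y - y_hat               # residuals

sst = np.sum((y - y.mean()) ** 2)          # total variation in y
sse = np.sum((y_hat - y_hat.mean()) ** 2)  # explained variation
ssr = np.sum(u_hat ** 2)                   # residual variation

r2 = sse / sst
r2_alt = 1.0 - ssr / sst
corr_sq = np.corrcoef(x, y)[0, 1] ** 2     # squared sample correlation

print(sst, sse + ssr)        # the two should agree
print(r2, r2_alt, corr_sq)   # three ways of computing the same number
```

A value near 1 here says only that the line fits this sample closely; it says nothing about whether the relationship is causal.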
Exercise