
Simple Linear Regression

Empirical Methods for Finance

Prof. Robert Hill

Nova SBE

2022

Robert Hill Empirical Methods for Finance 1 / 34


Outline

1 The Linear Regression Model

2 Population Regression Function and Fitted Line

3 Ordinary Least Squares (OLS)

4 Goodness of Fit

5 Exercise



The Linear Regression Model



Simple Linear Regression Model

y = β0 + β1 x + u

This describes the data generating process of y in the population

▶ y and x are linearly related


▶ The relationship is not exact
▶ The model implies that u captures every determinant of y other than
x. This often includes many factors!



Simple Linear Regression Model: Variables and Parameters

y = β0 + β1 x + u

(y , x, u) are random variables


(y , x) are observable (we have a sample from the population)
u is (always!) unobservable
(β0 , β1 ) are unobservable population parameters. This is what we
want to estimate.



Simple Linear Regression Model: Terminology

y = β0 + β1 x + u

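y: dependent variable (also: explained variable, regressand)
x: independent variable (also: explanatory variable, regressor)
u: error term (also: disturbance)
β0 : intercept parameter
β1 : slope parameter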



Example: Does Pedigree Predict Performance?

RET = β0 + β1 SAT + u

RET is the return of a fund above the return on a benchmark portfolio
SAT is the average SAT score of students at the undergraduate institution
of the fund’s manager
What about u? All other factors that determine a fund’s performance
Does the quality of a manager’s education explain differences in performance
across mutual fund (MF) managers?



Ceteris Paribus: everything else held constant

Definition of the causal effect of x on y :


How y changes when only x changes
This means, when all other factors that possibly affect y are held
unchanged (ceteris paribus = everything else equal)
Causation is different from correlation (correlation does not
imply causation)
▶ Correlation: x moves with y
▶ Causation: x moves y

Most interesting questions are ceteris paribus questions



Correlation vs Causation
An empirical observation:

Spurious correlation! Due to coincidence or to the variation of an omitted
factor that is driving both variables (“confounding factor”)
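
A confounding factor can be illustrated with a small simulation. This is a minimal sketch on made-up data: the variable z, the coefficients, and the sample size are assumptions chosen for illustration, not taken from the examples in these slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# z is the omitted (confounding) factor: it drives both x and y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)  # x depends on z, not on y
y = z + 0.5 * rng.normal(size=n)  # y depends on z, not on x

# x has no causal effect on y, yet the two are strongly correlated.
corr_xy = np.corrcoef(x, y)[0, 1]

# Once the common factor z is removed, the correlation disappears.
corr_without_z = np.corrcoef(x - z, y - z)[0, 1]

print(f"corr(x, y)         = {corr_xy:.2f}")
print(f"corr(x - z, y - z) = {corr_without_z:.2f}")
```

Here x moves with y only because both move with z: correlation without causation.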
Correlation vs Causation
Another example of spurious correlation
Is Facebook driving the Greek debt crisis?



Ceteris Paribus Interpretation of the Linear Regression Model

y = β0 + β1 x + u
β1 measures the (linear) causal effect of a change in x on y :

∆y = β1 ∆x
when ∆u = 0
β1 is the ceteris paribus effect of x on y , i.e. keeping everything else
constant, a one-unit change in x will cause y to change by β1 units
Since β1 is unobservable, we need to estimate it using data about x
and y
How can we hope to learn about the effect of x on y holding other
factors fixed, when we are ignoring all those other factors?



Example: Does Pedigree Predict Performance?

RET = β0 + β1 SAT + u

Another empirical observation: managers who attended higher-SAT
undergraduate institutions have systematically higher excess returns

Interpretations

▶ Causal:

▶ Non-causal:

Important distinction: is the cost of an Ivy League education worth it?



Key assumption for causality

Zero conditional mean assumption

E (u|x) = E (u) = 0

Makes two assumptions:


(1) Mean independence of the error term

E (u|x) = E (u), for all values x

(2) Zero mean:


E (u) = 0



(1) Mean independence of the error term

E (u|x) = E (u), for all values x

▶ The average value of u does not depend on the value of x


▶ This is the key assumption. A very strong assumption!
▶ In the example RET = β0 + β1 SAT + u, u contains innate ability
among other things
▶ Mean independence of u means that E (ability |SAT ) = E (ability ), i.e.
that the average level of ability is the same across people from different
institutions
▶ This implies E (ability |SAT = 1500 [Princeton]) = E (ability |SAT =
1177 [U. of Alabama]). Is this realistic?



(2) Also assumes that u is zero in expectation

E (u) = 0

▶ Harmless assumption (normalization) as long as there is an intercept


▶ The constant (intercept) will absorb any non-zero mean of u
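
To see why the intercept absorbs a non-zero mean, the model can be rewritten as follows. The symbols α0, β̃0 and ũ are introduced here only for illustration and do not appear elsewhere in the slides.

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x + u, \qquad E(u) = \alpha_0 \neq 0 \\
  &= (\beta_0 + \alpha_0) + \beta_1 x + (u - \alpha_0) \\
  &= \tilde{\beta}_0 + \beta_1 x + \tilde{u}, \qquad E(\tilde{u}) = E(u) - \alpha_0 = 0
\end{aligned}
```

The slope β1 is unchanged. Only the intercept is redefined, so assuming E (u) = 0 costs nothing.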



Population Regression Function and Fitted Line



The Population Regression Function (PRF)

Under the zero conditional mean assumption E (u|x) = E (u) = 0

E (y |x) = β0 + β1 x

▶ The PRF gives the average level of y at each level of x. Whether the
actual y is above or below the PRF depends on the unobserved factors in u

▶ β1 now tells us how the average value of y changes with x

▶ y can be decomposed into a systematic and an idiosyncratic part

y = E (y |x) + u
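
The PRF can be illustrated by simulation. The sketch below assumes hypothetical population parameters (β0 = 1, β1 = 2), a regressor taking only four values, and an error drawn independently of x so that E (u|x) = 0 holds by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 2.0          # hypothetical population parameters
n = 40_000

x = rng.integers(0, 4, size=n)   # x takes the values 0, 1, 2, 3
u = rng.normal(size=n)           # independent of x, so E(u|x) = 0
y = beta0 + beta1 * x + u

# Compare the average of y at each value of x with the PRF at that value.
for value in range(4):
    avg_y = y[x == value].mean()
    prf = beta0 + beta1 * value
    print(f"x = {value}: sample mean of y = {avg_y:.2f}, PRF = {prf:.2f}")
```

Individual observations scatter around the PRF because of u, but the average of y at each x sits close to β0 + β1 x.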



Expected Values and Errors

For a sample of the population {yi , xi }, i = 1 . . . n


yi = E (y |xi ) + ui

For a given value of x, we observe different values of y because of the
randomness in u



Fitted Values and Residuals
Given a sample {yi , xi }, i = 1 . . . n we estimate
ŷ = β̂0 + β̂1 x
Regression residuals are defined as
ûi = yi − ŷi ⇔ yi = ŷi + ûi



Ordinary Least Squares (OLS)



Ordinary Least Squares (OLS)

The most common estimator is known as OLS (ordinary least squares)


Choose β̂0 and β̂1 such that, collectively, the difference between the true
value yi and the fitted value ŷi is minimized
This is achieved by minimizing the sum of the squared residuals:

$$\min_{\hat\beta_0,\hat\beta_1} SSR = \min_{\hat\beta_0,\hat\beta_1} \sum_{i=1}^{N} \hat u_i^2 = \min_{\hat\beta_0,\hat\beta_1} \sum_{i=1}^{N} (y_i - \hat y_i)^2 = \min_{\hat\beta_0,\hat\beta_1} \sum_{i=1}^{N} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$



OLS Estimators

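$$\hat\beta_1 = \frac{\sum_{i=1}^{N} (y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^{N} (x_i - \bar x)^2} = \frac{\text{sample covariance}(x, y)}{\text{sample variance}(x)}$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

where $\bar x = \frac{1}{N} \sum_{i=1}^{N} x_i$ and $\bar y = \frac{1}{N} \sum_{i=1}^{N} y_i$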

(Derivation: video)

You can compute them manually. In practice, this is done by
econometric packages (e.g. STATA)
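
As an illustration of the manual computation, the two estimator formulas can be coded directly. This is a minimal sketch in Python with NumPy rather than STATA. The function name ols_simple and the four data points are made up for the example.

```python
import numpy as np

def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) for the regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared deviations of x.
    beta1_hat = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: makes the fitted line pass through (x_bar, y_bar).
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Hypothetical sample of four observations.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
b0, b1 = ols_simple(x, y)
print(f"beta0_hat = {b0:.2f}, beta1_hat = {b1:.2f}")
```

The same numbers should come out of any econometric package when y is regressed on x with a constant.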



Algebraic Properties of OLS

Properties of OLS estimators that follow directly from algebra and are
therefore always true. In other words, OLS estimators β̂0 and β̂1
are chosen such that:

1 $\sum_{i=1}^{N} \hat u_i = 0$
The sum (and the sample average) of the OLS residuals is zero

2 $\sum_{i=1}^{N} x_i \hat u_i = 0$
The sample covariance between the regressor(s) and the OLS residuals
is zero¹

3 The point (x̄, ȳ) always lies on the regression line

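¹ $\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar x)(\hat u_i - \bar{\hat u}) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar x)\hat u_i = \frac{1}{N-1} \sum_{i=1}^{N} x_i \hat u_i - \bar x \frac{1}{N-1} \sum_{i=1}^{N} \hat u_i = \frac{1}{N-1} \sum_{i=1}^{N} x_i \hat u_i = 0 \Leftrightarrow \sum_{i=1}^{N} x_i \hat u_i = 0$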
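
The three algebraic properties can be checked numerically on any sample. The sketch below uses simulated data. The data-generating values are arbitrary assumptions, and the properties hold whatever they are.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 + 1.5 * x + rng.normal(size=200)   # hypothetical data

# OLS estimates from the closed-form formulas.
beta1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
resid = y - (beta0_hat + beta1_hat * x)

sum_resid = resid.sum()                          # property 1
sum_x_resid = np.sum(x * resid)                  # property 2
line_at_x_bar = beta0_hat + beta1_hat * x.mean() # property 3

print(f"sum of residuals      = {sum_resid:.2e}")
print(f"sum of x * residuals  = {sum_x_resid:.2e}")
print(f"fitted value at x_bar = {line_at_x_bar:.4f}, y_bar = {y.mean():.4f}")
```

The two sums are zero up to floating-point rounding, and the fitted value at x̄ equals ȳ.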
Errors vs. Residuals
Errors ui
▶ all other factors that affect y
▶ the vertical distances between observations and the PRF
▶ never observed
▶ assumptions of the model are built around u

Residuals ûi
▶ computed from the data
▶ the vertical distances between observations and the estimated
regression function
▶ have several important algebraic properties



Goodness of Fit



Goodness of Fit: Some definitions

Sum of Squares Total (SST): measures the total sample variation in the yi

$$SST = \sum_{i=1}^{N} (y_i - \bar y)^2$$

Sum of Squares Explained (SSE): measures the sample variation in the ŷi

$$SSE = \sum_{i=1}^{N} (\hat y_i - \bar{\hat y})^2$$

Sum of Squares Residual (SSR): measures the sample variation in the ûi

$$SSR = \sum_{i=1}^{N} (\hat u_i - \bar{\hat u})^2 = \sum_{i=1}^{N} \hat u_i^2$$



Goodness of Fit: R-squared

The total variation can be decomposed into variation explained and variation
residual (unexplained):
SST = SSE + SSR

Intuitively, a good measure of the regression fit is how much of the total
variation the model can explain. This is the definition of R²

R² = SSE/SST = 1 − SSR/SST

Some remarks:
▶ R²: proportion of the variation in y explained by variation in x
▶ R² is always between 0 and 1
▶ Higher R² means that a higher proportion of variation in yi is explained
by the variation in xi
▶ Low R² values are not uncommon, especially for cross-sectional data
▶ High R² is useless if correlation is spurious
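
The decomposition SST = SSE + SSR and the two expressions for R² can be verified on data. This is a minimal sketch on simulated data. The coefficients and sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = 1.0 + 0.8 * x + rng.normal(size=500)   # hypothetical data

# OLS fit from the closed-form formulas.
beta1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x
resid = y - y_hat

sst = np.sum((y - y.mean()) ** 2)          # total variation
sse = np.sum((y_hat - y_hat.mean()) ** 2)  # explained variation
ssr = np.sum(resid ** 2)                   # residual variation
r2 = sse / sst

print(f"SST = {sst:.2f}, SSE + SSR = {sse + ssr:.2f}")
print(f"R^2 = {r2:.3f} (also 1 - SSR/SST = {1 - ssr / sst:.3f})")
```

In simple regression, R² also equals the squared sample correlation between x and y, which is one more way to see that it measures association, not causation.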
Exercise



Can you help my friends?

Some friends recently started a start-up that produces vegan salmon
The company is doing great, but they have no quantitative background
They are collecting data about the production activity but they
cannot figure out how to get the information they need out of it...



Can you help my friends?

Each week they record


1 how many units of product 1 they produced
2 how many units of product 2 they produced
3 the total number of hours worked
For each product, they need to know how many units they produce in
one hour
