Week 1: Linear Models I

Introduction to Analytics in Finance

Katerina Chrysikou

KBS

Week 1

Katerina Chrysikou (KBS) Week 1 1 / 56


Outline

Introduction
Typical Problems
Data and Notation
Model formulation
The classical Linear Regression Model
Univariate regression Models
Ordinary Least Squares
Deriving the Estimator
Assumptions
Properties of Estimators
Statistical Inference
Testing Hypotheses
Empirical analysis
References



Finance questions that may be solved by
Econometricians I

Testing whether financial markets are weak-form informationally efficient.
Testing whether the CAPM or APT represent superior models for
the determination of returns on risky assets.
Measuring and forecasting the volatility of bond returns.
Explaining the determinants of bond credit ratings used by the
ratings agencies.
Modelling long-term relationships between prices and exchange
rates.



Finance questions that may be solved by
Econometricians II

Determining the optimal hedge ratio for a spot position in oil.


Testing technical trading rules to determine which makes the
most money.
Testing the hypothesis that earnings or dividend announcements
have no effect on stock prices.
Testing whether spot or futures markets react more rapidly to
news.
Forecasting the correlation between the returns to the stock
indices of two countries.



Special Characteristics and Types of Data

Frequency and quantity of data: Stock market prices are measured every time there is a trade or somebody posts a new quote.
Quality: Recorded asset prices are usually those at which the
transaction took place. No possibility for measurement error but
financial data are “noisy”.
There are 3 types of data which econometricians might use for
analysis:
1 Time series data;
2 Cross-sectional data;
3 Panel data, a combination of 1. and 2.



Time Series Data I

Examples of time series data:

Series                                        Frequency
GDP, industrial production, unemployment      monthly, or quarterly
government budget deficit                     annually
money supply                                  weekly
value of a stock market index                 end of day
value of a stock price                        as transactions occur



Time Series Data II

Examples of Problems that Could be Tackled Using a Time Series Regression:
How the value of a country’s stock index has varied with that
country’s macroeconomic fundamentals.
How the value of a company’s stock price has varied when it
announced the value of its dividend payment.
The effect on a country’s currency of an increase in its interest
rate.



Cross-sectional Data

Cross-sectional data are data on one or more variables collected at a single point in time, e.g.
A poll of usage of internet stock broking services
Cross-section of stock returns on the New York Stock Exchange
A sample of bond credit ratings for UK banks

Examples of Problems that Could be Tackled Using a Cross-Sectional Regression:
The relationship between company size and the return to investing
in its shares.
The relationship between a country’s GDP level and the probability
that the government will default on its sovereign debt.



Notation

A usual notation includes the following symbols:


denote each observation by the letter t and the total number of
observations by T for time series data, t = 1, 2, ..., T
denote each unit by the letter i and the total number of
observations by N for cross-sectional data, i = 1, 2, ..., N.

Example:
The stock price of asset i at time t: pit.



Prices of Financial Assets
# Make a plot of the price series
# (assumes d = vector of dates and p = vector of prices are already loaded)
plot(d, p, main="SPY", type="l", xlab="Time", ylab="Prices ($)")

Figure: SPY prices ($) over time, 2005–2015.
Returns for Financial Assets

It is preferable not to work directly with asset prices (why???), so we usually convert the raw prices into a series of returns. There are two ways to do this. Let Rt denote the return at time t and pt the asset price at time t.
Simple (percentage) returns → modelling & calculation of profits:

    Rt = (pt − pt−1) / pt−1   (×100%)

Log returns → modelling only:

    Rt = log(pt / pt−1)   (×100%)

We also ignore any dividend payments, or alternatively assume that the price series have already been adjusted to account for them.
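Both definitions are one-liners in R; the price series below is assumed purely for illustration:

```r
# Hypothetical price series (values assumed for illustration)
p <- c(100, 102, 101, 105)
simple <- diff(p) / head(p, -1) * 100   # simple percentage returns
logret <- diff(log(p)) * 100            # log (continuously compounded) returns
round(cbind(simple, logret), 3)
```

Note how close the two are for small price changes.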



Log Returns

Log returns are also known as log price relatives. There are a number of reasons for using them:
They have the nice property that they can be interpreted as
continuously compounded returns.
Can add them up, e.g. if we want a weekly return and we have
calculated daily log returns:
    r1 = log(p1/p0) = log p1 − log p0
    r2 = log(p2/p1) = log p2 − log p1
    r3 = log(p3/p2) = log p3 − log p2
    r4 = log(p4/p3) = log p4 − log p3
    r5 = log(p5/p4) = log p5 − log p4

Summing, r1 + r2 + r3 + r4 + r5 = log p5 − log p0 = log(p5/p0).
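This additivity is easy to verify numerically (the prices below are assumed for illustration):

```r
# Six daily prices p0..p5 (values assumed)
p <- c(100, 101, 103, 102, 104, 106)
r <- diff(log(p))        # daily log returns r1..r5
sum(r)                   # weekly log return: sum of the daily log returns
log(p[6] / p[1])         # the same number, log(p5/p0)
```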



Disadvantage of Log Returns

The simple return on a portfolio of assets is a weighted average of the simple returns on the individual assets:

    Rpt = ∑_{i=1}^{N} wi Rit,

where wi is the weight of asset i in the portfolio.

But this does not work for the continuously compounded returns.
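A small numerical sketch (returns and weights assumed for illustration) of why additivity across assets holds for simple returns but fails for log returns:

```r
# Two assets, equal weights (numbers assumed for illustration)
R1 <- 0.10; R2 <- -0.05
w  <- c(0.5, 0.5)
Rp <- w[1]*R1 + w[2]*R2                # exact for simple returns
log(1 + Rp)                            # portfolio log return
w[1]*log(1 + R1) + w[2]*log(1 + R2)    # weighted average of log returns: differs
```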



Steps involved in the formulation of econometric
models

1 Economic or Financial Theory (Previous Studies)


2 Collection of Data
3 Formulation of a (testable) Theoretical Model
4 Model Estimation. Is the Model Statistically Adequate?
No:
Reformulate Model (go back to Step 3)
Yes:
Interpret Model
Use for Analysis



Regression

Regression is probably the single most important tool at the econometrician’s disposal.
But what is regression analysis?
It is concerned with describing and evaluating the relationship
between a given variable (usually called the dependent variable)
and one or more other variables (usually known as the
independent variable(s)).



Notation

Denote the dependent variable by y

and the independent variable(s) by x1 , x2 , . . . , xk , where there are k independent variables.

Note that there can be many x variables

but we will limit ourselves to the case where there is only one x
variable to start with.

In our set-up, there is only one y variable.



Regression vs Correlation

If we say y and x are correlated, it means they are linearly associated and that we are treating y and x in a completely symmetrical way.
It does not imply that changes in x cause changes in y.

In regression, we treat the dependent variable (y) and the independent variable(s) (xs) very differently.
The y variable is assumed to be random or “stochastic” in some
way, i.e. to have a probability distribution.
The x variables are, however, assumed to have fixed
(“non-stochastic”) values in repeated samples.



Simple Regression

For simplicity, say k = 1.

This is the situation where y depends on only one x variable.

Examples of the kind of relationship that may be of interest include:
How asset returns vary with their level of market risk.
Measuring the long-term relationship between stock prices and
dividends.
Constructing an optimal hedge ratio.



Simple Regression I

Suppose that we have the following data on the excess returns on a fund manager’s portfolio (“fund XXX”) together with the excess returns on a market index:

Year t    Excess return rXXX,t − rf,t    Excess return on market index rm,t − rf,t
1         17.8                           13.7
2         39.0                           23.2
3         12.8                            6.9
4         24.2                           16.8
5         17.2                           12.3
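For reference, one way to enter these data in R — the vector names `mi` and `mp` below are the ones the later code snippets assume:

```r
# Enter the table above into R
mp <- c(17.8, 39.0, 12.8, 24.2, 17.2)   # excess returns on fund XXX (y)
mi <- c(13.7, 23.2,  6.9, 16.8, 12.3)   # excess returns on market index (x)
```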



Simple Regression II

We have some intuition that the beta on this fund is positive.

We therefore want to find whether there appears to be a relationship between x and y given the data that we have.

The first stage would be to form a scatter plot of the two variables.



Return Plot
# Specify the axes ranges if you want, using xlim and ylim
# (mi = market index excess returns, mp = fund XXX excess returns)
plot(mi, mp, main="Scatterplot", xlab="Market Index", ylab="Fund XXX",
     type="p", pch=19, col="blue", cex=2, xlim=c(0,25), ylim=c(0,45))

Figure: Scatterplot of the excess returns on fund XXX against the excess returns on the market index.


Finding a Line of Best Fit (Regression Line)

We can use the general equation for a straight line,

y = a + bx

to get the line that best “fits” the data.

However, this equation (y = a + bx) is completely deterministic. Is this realistic?

No. So what we do is to add a random disturbance term, u, into the equation:

    yt = α + βxt + ut ,   t = 1, 2, 3, 4, 5.



Why a Disturbance Term?

The disturbance term can capture a number of features:


We always leave out some determinants of yt
There may be errors in the measurement of yt that cannot be
modelled.
Random outside influences on yt which we cannot model

So how do we determine what α and β are?


Choose α and β so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible):



Determining the Regression Coefficient
Figure: The scatterplot again; the regression line is chosen to minimise the vertical distances from the data points to the line.
Ordinary Least Squares

The most common method used to fit a line to the data is known as Ordinary Least Squares (OLS).

What we actually do is take each distance, square it, and minimise the total sum of the squares (hence least squares).

Tightening up the notation, let:


yt denote the actual data point
ŷt denote the fitted value from the regression line
ût denote the residual yt − ŷt



How OLS Works

So

    min (û1² + û2² + û3² + û4² + û5²) = min ∑ ût² ,  summing over t = 1, . . . , 5.

This is known as the residual sum of squares.


But what was ut? It was the difference between the actual point and the line, yt − α − βxt. So minimising ∑(yt − α − βxt)² is equivalent to minimising ∑ ût² with respect to α and β.



Deriving the OLS Estimator I
Let L = ∑(yt − α − βxt)². We want to minimise L w.r.t. α and β. So the FOC are:

    ∂L/∂α = −2 ∑(yt − α − βxt) = 0,   (1)
    ∂L/∂β = −2 ∑ xt (yt − α − βxt) = 0.   (2)

From (1),

    ∑ yt − T α̂ − β̂ ∑ xt = 0.

But ∑ yt = T ȳ and ∑ xt = T x̄. So we can write ȳ − α̂ − β̂ x̄ = 0, and obtain

    α̂ = ȳ − β̂ x̄.   (3)



Deriving the OLS Estimator II

We can substitute (3) into (2) and get

    ∑ xt (yt − ȳ + β̂ x̄ − β̂ xt) = 0
    ∑ xt yt − ȳ ∑ xt + β̂ x̄ ∑ xt − β̂ ∑ xt² = 0
    ∑ xt yt − T ȳ x̄ + β̂ T x̄² − β̂ ∑ xt² = 0

from which one easily gets

    β̂ = (∑ xt yt − T x̄ ȳ) / (∑ xt² − T x̄²).
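As a check, the closed-form expressions can be computed directly for the fund XXX data introduced earlier and compared with `lm()`:

```r
# Compute alpha-hat and beta-hat from the closed-form expressions
# (data from the fund XXX example table)
mp <- c(17.8, 39.0, 12.8, 24.2, 17.2)   # excess returns on fund XXX (y)
mi <- c(13.7, 23.2,  6.9, 16.8, 12.3)   # excess returns on market index (x)
T  <- length(mp)
b.hat <- (sum(mi*mp) - T*mean(mi)*mean(mp)) / (sum(mi^2) - T*mean(mi)^2)
a.hat <- mean(mp) - b.hat*mean(mi)
round(c(alpha = a.hat, beta = b.hat), 3)   # same values as coef(lm(mp ~ mi))
```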



What Do We Use α̂ and β̂ For?

In the CAPM example used above, plugging the 5 observations into the formulae given above leads to the estimates α̂ = −1.74 and β̂ = 1.64.
# Do the regression and save the regression output
out.reg <- lm(mp~mi)

> lm(mp~mi)

Call:
lm(formula = mp ~ mi)

Coefficients:
(Intercept) mi
-1.737 1.642



We could write the fitted line as:

ŷt = −1.74 + 1.64xt .

Question: If an analyst tells you that she expects the market to yield a
return 20% higher than the risk-free rate next year, what would you
expect the return on fund XXX to be?
Solution: The expected value of y is −1.74 + 1.64 × (value of x), so plug x = 20 into the fitted equation to get the expected value for y:

ŷt = −1.74 + 1.64 × 20 = 31.06.
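The same prediction can be obtained with `predict()`; the snippet re-fits the regression so it is self-contained (31.06 above uses the rounded coefficients, so the unrounded prediction is about 31.1):

```r
# Prediction at mi = 20 via predict()
mp <- c(17.8, 39.0, 12.8, 24.2, 17.2)
mi <- c(13.7, 23.2,  6.9, 16.8, 12.3)
out.reg <- lm(mp ~ mi)
predict(out.reg, newdata = data.frame(mi = 20))   # about 31.1
```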



Linearity I

In order to use OLS, we need a model which is linear in the parameters. It does not necessarily have to be linear in the variables. Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed, etc.

Some models can be transformed to linear ones by a suitable substitution or manipulation, e.g. the exponential regression model

    Yt = e^α Xt^β e^(ut).

Then let yt = log Yt and xt = log Xt and get

yt = α + βxt + ut .
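A quick simulated sketch (all parameter values here are assumed for illustration) showing that OLS on the logged model recovers α and β:

```r
# Simulate the exponential model and recover (alpha, beta) by OLS on logs
set.seed(1)
alpha <- 0.5; beta <- 2
X <- exp(rnorm(100))                  # a positive regressor
u <- rnorm(100, sd = 0.1)
Y <- exp(alpha) * X^beta * exp(u)     # Y_t = e^alpha * X_t^beta * e^(u_t)
fit <- lm(log(Y) ~ log(X))            # linear in the parameters
coef(fit)                             # close to (0.5, 2)
```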



Linearity II

Similarly, if theory suggests that y and x should be inversely related:

    yt = α + β/xt + ut ,

then the regression can be estimated using OLS by substituting

    zt = 1/xt .

But some models are intrinsically non-linear, e.g.

    yt = α + xt^β + ut .



The assumptions underlying the CLRM

We usually make the following set of assumptions:


1 E(ut) = 0
2 var(ut) = σ² < ∞
3 cov(ut , us) = 0 for t ≠ s
4 cov(xt , ut) = 0

A fifth assumption is required if we want to make exact inference about the population parameters (the actual α and β) from the sample parameters (α̂ and β̂). Additional assumption:

5 ut is normally distributed.



Properties of the OLS Estimator

If Assumptions 1 through 4 hold, then the estimators obtained by OLS are known as Best Linear Unbiased Estimators (BLUE).

What does the acronym stand for?


Best: Estimator with the minimum variance within the class of
linear unbiased estimators (Gauss-Markov theorem).
Linear: It is a linear estimator.
Unbiased: Its expectation is the true value of the parameter.



Consistency/Unbiasedness/Efficiency

Consistency: means that the estimates will converge to the true value as the sample size increases. Formally,

    lim(T→∞) Pr(|β̂ − β| > δ) = 0, for all δ > 0.

Unbiasedness: implies that

    E(β̂) = β.

Efficiency: relates to the variance of the estimator.



Standard Errors

We need some measure of the reliability or precision of the estimators (α̂ and β̂). The precision of an estimate is given by its standard error. Given Assumptions 1–4 above, the standard errors can be shown to be given by

    SE(α̂) = s √[ ∑ xt² / (T ∑(xt − x̄)²) ] = s √[ ∑ xt² / (T ∑ xt² − T² x̄²) ],

    SE(β̂) = s √[ 1 / ∑(xt − x̄)² ] = s √[ 1 / (∑ xt² − T x̄²) ],

where s is the estimated standard error of the residuals.



Estimating the Variance of the Disturbance Term

The variance of the random variable ut is given by:

    var(ut) = E[(ut − E(ut))²] = E(ut²).

Since the ut are not observable, we use the residuals:

    s² = (1/T) ∑ ût².

But this estimator is a biased estimator of σ². An unbiased estimator is given by:

    s² = (1/(T − 2)) ∑ ût².
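Using the fund XXX data from earlier, the unbiased formula reproduces the residual standard error that R reports (the snippet re-fits the regression so it is self-contained):

```r
# Reproduce R's residual standard error by hand (fund XXX data)
mp <- c(17.8, 39.0, 12.8, 24.2, 17.2)
mi <- c(13.7, 23.2,  6.9, 16.8, 12.3)
out.reg <- lm(mp ~ mi)
T <- length(mp)
s <- sqrt(sum(resid(out.reg)^2) / (T - 2))  # divide by T - 2, not T
s                                           # matches summary(out.reg)$sigma
```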



Use the previous example and calculate!

# Break the formulae into pieces so you can follow the algebra
# (assumes x = regressor values, t = number of observations, and
#  s = residual standard error are already defined in the session)
sx2 <- sum(x^2)
SEa.denom1 <- t*sx2
SEa.denom2 <- (t^2) * (mean(x)^2)
SEa <- s * sqrt( sx2 / (SEa.denom1 - SEa.denom2) )
SEb <- s * sqrt( 1 / (sx2 - t*(mean(x)^2)) )

> SEa
[1] 4.113994

> SEb
[1] 0.2647783



Also check the regression output!

# Check the standard error


> summary(out.reg)

Call:
lm(formula = mp ~ mi)

Residuals:
1 2 3 4 5
-2.955 2.648 3.209 -1.645 -1.257

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.7366 4.1140 -0.422 0.70135
mi 1.6417 0.2648 6.200 0.00845 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.179 on 3 degrees of freedom


Multiple R-squared: 0.9276,Adjusted R-squared: 0.9035
F-statistic: 38.45 on 1 and 3 DF, p-value: 0.008452



Some Comments on the Standard Error Estimators I

1 Both SE(α̂) and SE( β̂) depend on s2 . The greater the variance s2 ,
then the more dispersed the errors are about their mean value and
therefore the more dispersed y will be about its mean value.

2 The sum of the squares of x about their mean appears in both formulae. The larger the sum of squares, the smaller the coefficient variances.



Small ∑(xt − x̄)²
# See what a small dispersion looks like
# by generating 20 random numbers (all very close to their mean value)
y <- rnorm(20, mean=5, sd=0.5)
x <- rnorm(20, mean=5, sd=0.5)
plot(x, y, type="p", pch=19, col="blue", cex=1.2, xlim=c(0,10), ylim=c(0,10))
abline(lm(y~x), col="red", lwd=2)

Figure: A scatterplot in which x has small dispersion about its mean.
Large ∑(xt − x̄)²
y <- rnorm(20, mean=5, sd=0.5)
x <- rnorm(20, mean=5, sd=2)
plot(x, y, type="p", pch=19, col="blue", cex=1.2, xlim=c(0,10), ylim=c(0,10))
abline(lm(y~x), col="red", lwd=2)

Figure: A scatterplot in which x has large dispersion about its mean.
Some Comments on the Standard Error Estimators II

The larger the sample size, T, the smaller will be the coefficient
variances.

T appears explicitly in SE(α̂) and implicitly in SE( β̂).

T appears implicitly since the sum ∑(xt − x)2 is from t = 1 to T.



An Introduction to Statistical Inference

We want to make inference about the likely population values from the
regression parameters.

From the regression results we have a single (point) estimate of the unknown population parameter. How “reliable” is this estimate?

The reliability of the point estimate is measured by the coefficient’s standard error.



Hypothesis Testing: Some Concepts

We can use the information in the sample to make inferences about the population.

We will always have two hypotheses that go together: the null hypothesis (denoted H0) and the alternative hypothesis (denoted H1). The null hypothesis is the statement or the statistical hypothesis that is actually being tested. The alternative hypothesis represents the remaining outcomes of interest.

For example, suppose that, given the regression results above, we are interested in the hypothesis that the true value of β is in fact 0.5. We would use the notation

    H0 : β = 0.5
    H1 : β ≠ 0.5

This would be known as a two-sided test.
The Probability Distribution of the Least Squares
Estimators I

We assume that ut ∼ N(0, σ2 )

The least squares estimators are linear combinations of the random variables, i.e. β̂ = ∑ wt yt.

Since a weighted sum of normal random variables is itself normally distributed, it follows that

    α̂ ∼ N(α, var(α̂)) and β̂ ∼ N(β, var(β̂)).
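This can be illustrated with a small Monte Carlo sketch (all parameter values here are assumed for illustration): drawing many samples with normal errors, the β̂ estimates centre on the true β.

```r
# Monte Carlo sketch: sampling distribution of beta-hat under normal errors
set.seed(42)
alpha <- 1; beta <- 2
x <- 1:50
b.hats <- replicate(2000, {
  y <- alpha + beta*x + rnorm(50)   # fresh normal errors each sample
  coef(lm(y ~ x))[2]                # record beta-hat
})
c(mean = mean(b.hats), sd = sd(b.hats))  # centred on beta = 2
```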



The Probability Distribution of the Least Squares
Estimators II

What if the errors are not normally distributed?

Will the parameter estimates still be normally distributed?

Yes, approximately, if the other assumptions of the CLRM hold and the sample size is sufficiently large (the estimators are then asymptotically normally distributed).

Standard normal variates can be constructed from α̂ and β̂:

    (α̂ − α)/√var(α̂) ∼ N(0, 1) and (β̂ − β)/√var(β̂) ∼ N(0, 1).



The Probability Distribution of the Least Squares
Estimators III

But var(α̂) and var(β̂) are unknown.

So we have

    (α̂ − α)/SE(α̂) ∼ t(T−2) and (β̂ − β)/SE(β̂) ∼ t(T−2).



Testing Hypotheses: The Test of Significance
Approach I

Assume the regression equation is given by

yt = α + βxt + ut , t = 1, 2, . . . , T.

The steps involved in doing a test of significance are:


1 Calculate α̂, β̂ and SE(α̂), SE( β̂) in the usual way.
2 Calculate the test statistic. This is given by the formula

    (β̂ − β*) / SE(β̂),

where β* is the value of β under the null hypothesis.
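Using the estimates from the fund XXX regression output reported earlier (β̂ = 1.6417, SE = 0.2648), a sketch of the computation for the illustrative null H0 : β = 0.5:

```r
# Test statistic for H0: beta = 0.5, numbers from the regression output
b.hat <- 1.6417
se.b  <- 0.2648
(b.hat - 0.5) / se.b   # about 4.31
```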



Testing Hypotheses: The Test of Significance
Approach II

We need some tabulated distribution with which to compare the estimated test statistic.

Test statistics derived in this way can be shown to follow a t-distribution with T − 2 degrees of freedom.

As the number of degrees of freedom increases, we need to be less cautious in our approach, since we can be more confident that our results are robust.



Testing Hypotheses: The Test of Significance
Approach III

We need to choose a “significance level”, often denoted α (which is different from the regression constant!).

This is also sometimes called the size of the test and it determines the
region where we will reject or not reject the null hypothesis that we are
testing.

It is conventional to use a significance level of 5%. The intuitive explanation is that we would expect a result as extreme as this, or more extreme, only 5% of the time as a consequence of chance alone.

A 5% size of test is conventional, but 10% and 1% are also commonly used.



Testing Hypotheses: The Test of Significance
Approach IV

Use the t-tables to obtain a critical value or values with which to compare the test statistic.

Finally, perform the test. If the test statistic lies in the rejection region
then reject the null hypothesis (H0 ), else do not reject H0 .
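A sketch of the full test in R for the fund example (5% two-sided test; the estimate and standard error are taken from the regression output reported earlier):

```r
# Critical value from the t-distribution via qt(), T = 5 observations
T <- 5
crit <- qt(0.975, df = T - 2)          # two-sided 5% critical value, 3 df
test.stat <- (1.6417 - 0.5) / 0.2648   # test statistic for H0: beta = 0.5
abs(test.stat) > crit                  # TRUE: in the rejection region, reject H0
```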



A Special Type of Hypothesis Test: The t-ratio I

Recall that the formula for the test of significance approach to hypothesis testing using a t-test was

    (β̂ − β*) / SE(β̂).

If the test is

    H0 : β = 0,
    H1 : β ≠ 0,

i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a t-ratio test.



A Special Type of Hypothesis Test: The t-ratio II

Since β* = 0, the test statistic is

    β̂ / SE(β̂).

t-ratios are conventionally reported by all the econometric packages.



Calculate the t-ratios

# Calculate the t-ratios for true values equal to 0


tra <- (out.reg$coefficients[1] - 0)/SEa
trb <- (out.reg$coefficients[2] - 0)/SEb

> tra
-0.4221322

> trb
6.200453

# compare with the output


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.7366 4.1140 -0.422 0.70135
mi 1.6417 0.2648 6.200 0.00845 **



References

Papers:
Jensen, M. C. (1968). The performance of mutual funds in the period 1945–1964. Journal of Finance, 23(2), 389–416.
